Spencer Caplan
A DISSERTATION
in
Linguistics
Presented to the Faculties of the University of Pennsylvania
in
Partial Fulfillment of the Requirements for the
Degree of Doctor of Philosophy
2021
Dissertation Committee
© COPYRIGHT
2021
ACKNOWLEDGEMENT
A dissertation is the culmination of a lot of work and a lot of growth. On both fronts I
have an immense amount to be thankful for. Yet, as I write this I can’t help but feel a bit
conflicted. This bookends a major chapter in my life, both personally and academically; and
I will always be tremendously grateful for the wonderful opportunities and conversations I’ve
had at Penn over the last six years. But there’s also an element of bittersweetness for me
(you’re only a graduate student once after all!), particularly as I reflect on an environment
that will never again be quite as I remember it. Nonetheless, many acknowledgments are in
order.
First among those whom I’d like to thank for guiding and supporting me throughout my
time at Penn is my exceptional team of advisors: Charles Yang, Mitch Marcus, and John
Trueswell, who never failed to guide me towards the real questions, wherever they were
hiding. I'm certain that I would not have made it here without them.
I am immensely thankful for the time I spent in conversation/debate with Tony Kroch,
who taught me as much as anyone else in grad school. I could always express exactly what
I was thinking to Tony, and I’d get nothing but the same intensity and earnestness right
back. I would also like to thank Lila Gleitman — whose depth of knowledge and infectiously
invested attitude were unparalleled in psycholinguistics — for many wonderful comments and conversations.
I have co-authored papers with a number of people during my time here, all of whom
deserve mention: Deniz Beser, Kajsa Djärv, Alon Hafri, Jordan Kodner, Mitch Marcus, and others.
My time at Penn was also made much brighter by a host of friends, both near and far. Worthy
of first mention is my academic sibling; the CIS- and Ling-fueled journey was far more
fulfilling taken together. Thank you to Doug Guilbeault, with whom I've shared countless
laughs and an even greater number of good days. A big thank you to Patrick O'Callahan and
Billy Shinevar for our continued weekly correspondence over the last six years: the
Adorno-inspired, Zohar-curious, Freudo-Marxist reading group was as good an intellectual
outlet as it gets, and in many ways represents the best of what these years have given me.
Thanks as well to the cohort of 2015 (the Smartbeginners) and the licorice-fueled DARPA
LORELEI team.
Additional thanks to Faruk Akkuş, Ryan Budnick, Andrea Ceolin, Victor Gomes, Alex
Kalomoiros, Steve O’Neill, Vichet Ou, Zack Wiener, Hongzhi Xu, and many others not
listed here. And lastly, thank you to Hannah Brooks, whose radiant and persistent kindness
is so rare in this world and always appreciated.
I would like to thank everyone I’ve known through the Ballroom dance community, as dance
was a particularly helpful outlet to escape the stresses of graduate life. In particular I’d
like to acknowledge my dance partners Maria Peifer and Alexa Gamburg for hundreds of
hours of practice, and Emanuele Pappacena and Francesca Lazzari for never letting my ego
get too big.
Finally, I would like to thank my parents, who taught me to learn independently and never
offered anything less than their unconditional support. And to Julian, you still have such an
impact on me. I only wish you would have been able to read this and tell me it's beautiful.
ABSTRACT
Spencer Caplan
Charles Yang
John C. Trueswell
This dissertation investigates the wide-ranging implications of a simple fact: language un-
folds over time. Whether as cognitive symbols in our minds, or as their physical realization
in the world, if linguistic computations are not made over transient and shifting information
as it occurs, they cannot be made at all. This dissertation explores the interaction between
the computations, mechanisms, and representations of language acquisition and language
processing — with a central theme being the temporal restrictions in-
herent to information processing that I term the immediacy of linguistic computation. This
program motivates the study of intermediate representations recruited during online pro-
cessing and acquisition rather than simply an Input/Output mapping. While ultimately
extracted from linguistic input, such intermediate representations may differ significantly
from the underlying distributional signal. I demonstrate, across four case studies, that due
to the immediacy of linguistic computation such intermediate representations do indeed
diverge from the input in systematic and informative ways.
First, I present experimental evidence from a perceptual learning paradigm that the inter-
mediate representation of speech maintains information about phonological
categories but includes no direct information about the original acoustic-phonetic signal.
In the second case study, I present a model of word learning grounded in category formation.
Instead of retaining experiential statistics over words and all their potential meanings, my
model constructs hypotheses for word meanings as they occur. Uses of the same word
are evaluated (and revised) with respect to the learner’s intermediate representation rather
than to their complete distribution of experience. In the third case study, I probe predic-
tions about the time-course, content, and structure of these intermediate representations of
meaning via a new eye-tracking paradigm. Finally, the fourth case study uses large-scale
corpus data to explore syntactic choices during language production. I demonstrate how a
mechanistic account of production can give rise to highly “efficient” outcomes even without
explicit optimization. Taken together, these case studies represent a rich analysis of the im-
mediacy of linguistic computation and its system-wide impact on the mental representations
of language.
TABLE OF CONTENTS
ACKNOWLEDGEMENT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
CHAPTER 1 : Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Outline of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.3 Study 3: A More Direct Probe of Intermediate Representations during
Word Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.4 Study 4: Choices in Language Production . . . . . . . . . . . . . . . . 6
1.2 In Sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
of Speech . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Experiment 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.1 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.2 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.3 Stimuli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.4 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.5 Predictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.6 Exclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.7 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.8 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Experiment 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.1 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.2 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.3 Stimuli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.4 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.0.1 Word Learning and Generalization . . . . . . . . . . . . . . . . . . . . 35
3.0.3 Organization of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.1 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.3.2 Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.4.1 Scoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.5 General Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Meanings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.1 Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.1.1 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.1.2 Stimuli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.1.3 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.1.5 Exclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.1.6 Predictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.3.5 Prior Mention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
A.2 Full Null Model Structures for Mixed Effects Regression Analyses . . . . . . . 125
C.4 Timecourse Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
LIST OF TABLES
TABLE 1 : Output of the best fitting model predicting /t/ responses on the
first half of test trials for Experiment 1. Bracketed values are 95%
confidence intervals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
TABLE 2 : Output of the best fitting model predicting /t/ responses on the
first half of test trials for Experiment 2. Bracketed values are 95%
confidence intervals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
TABLE 3 : Data from Lewis and Frank (2018). Dependent variable is the out-
come of broad vs. narrow generalization on all trials. Mixed-effects
logistic regression predicting generalization based on listed effects as
well as random slopes for subject and stimulus class. PSE and SCE
emerge as significant main effects along with a three-way interaction
between Presentation-Style, Training-Number, and Block-Order. . . . 48
TABLE 4 : Mixed-effects logistic regression predicting generalization outcome
on second-block trials (data from Lewis and Frank (2018)) based on
listed effects (Presentation-Style, Training-Number, Number-Timing
Interaction) as well as random slopes for each subject and stimulus
class. Neither SCE nor PSE manifest on second-block trials. . . . . . 49
TABLE 5 : Mixed-effects logistic regression predicting generalization outcome
on first-block trials (data from Lewis and Frank (2018)) based on
listed effects (Presentation-Style, Training-Number, Number-Timing
Interaction) as well as random slopes for each subject and stimulus
class. PSE and SCE emerge as significant main effects. . . . . . . . . 49
TABLE 6 : In this toy example, the initial training set contains a single instance
of a dalmatian with features (A:1, B:1, C:1, D:0). From this, the
learner extracts a mental representation of (A:0.3, B:0.8, C:0.3, D:0).
During testing, a few potential items are all compared against the
mental representation in order to select category members. Only values
present in the mental representation but missing from the evaluated
items incur a penalty. If the maximum category cutoff were 1.0, then
both the dalmatian and the poodle (shown with shaded background)
would be selected in this case. . . . . . . . . . . . . . . . . . . . . . . 60
TABLE 7 : Major patterns to be captured by models of word learning and gen-
eralization. Both the size of the training set (SCE) as well as the
temporal manner of presentation (PSE) have reliable effects on the
meanings posited by learners. “0.15” represents the typical standard
deviation from results in Spencer et al. (2011) . . . . . . . . . . . . . 62
TABLE 10 : Evaluating cases of N=2 or more words . . . . . . . . . . . . . . . . . 118
TABLE 11 : Evaluating cases of N=4 or more words. The effect of conditional
probability is absent, while the effects of frequency, object length,
and definiteness remain. . . . . . . . . . . . . . . . . . . . . . . . . . 118
TABLE 20 : Output of the best fitting model on the last half of test trials for
Experiment 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
TABLE 21 : Output of the best fitting model on the first half of test trials, text-
before condition for Experiment 2 . . . . . . . . . . . . . . . . . . . . 129
TABLE 22 : Output of the best fitting model on the first half of test trials, text-
after condition for Experiment 2 . . . . . . . . . . . . . . . . . . . . . 129
TABLE 23 : Exclusions with ceiling/floor cutoff for each experiment (by condition) . . 137
TABLE 24 : Tuned parameter values from Section 3.4.3 . . . . . . . . . . . . . . . 151
TABLE 25 : Data from Lewis and Frank (2018). Dependent variable is the generalization-
level outcome on all trials. Linear mixed model predicting general-
ization based on listed effects as well as random slopes for subject
and stimulus class. PSE and SCE emerge as significant main effects.
TABLE 29 : Potential feature alternations for each domain. . . . . . . . . . . . . . 154
LIST OF ILLUSTRATIONS
FIGURE 3 : Participants in the shifted-/d/ condition heard audio with ambiguous
voice-onset times (VOTs) paired with "d" text, whereas participants
in the shifted-/t/ condition heard audio with ambiguous VOTs paired
with "t" text. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
FIGURE 4 : Psychometric functions for Experiment 1: proportion of /t/ choices
as a function of voice-onset time (VOT) and shifted-phone condition
(/t/ or /d/), plotted separately for the text-before and text-after
conditions. Data points are the average of participant means, and
error bars are within-subject 95% confidence intervals. Adaptation
occurred in the text-before condition, but did not occur in the text-
after condition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
FIGURE 5 : Psychometric functions for Experiment 2: proportion of /t/ choices
as a function of voice-onset time (VOT) and shifted-phone condition
FIGURE 8 : Computation of mental representation from single training example
and subsequent comparison to test objects. Values are schematic
and for illustration only. . . . . . . . . . . . . . . . . . . . . . . . . 55
FIGURE 9 : Algorithmic flow charts highlighting some possible paths of NGM
behavior. This illustrates the common difference in experimental
outcome under parallel (left) and sequential (right) presentation of
stimuli. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
FIGURE 10 : Implementation of distance computation between an object and
mental representation under the NGM . . . . . . . . . . . . . . . . 59
FIGURE 11 : Chart of all seven training configurations. Conditions used for pa-
rameter tuning shown in light red. Time during training is indicated
within each block vertically; the objects in the parallel condition are
co-present at the same time, while the “sequential” trials training
objects are never co-present. . . . . . . . . . . . . . . . . . . . . . . 63
FIGURE 12 : Training on a single item. Experimental results from Spencer et al.
(2011) are shown in gold. Output of NGM in grey. Bars indicate
standard deviations. . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
FIGURE 13 : Training items presented in sequence. Experimental results in gold.
Output of NGM in grey. Bars indicate standard deviations. . . . . . 64
FIGURE 14 : Training items presented simultaneously. Experimental results in
gold. Output of NGM in grey. Bars indicate standard deviations. . 64
FIGURE 15 : Visualization of parallel vs. sequential training conditions. Word1
in red and Word2 in blue. The total number of exemplars and
display time remained constant across conditions. . . . . . . . . . . 76
FIGURE 16 : Sample pairs of maximally divergent stimuli (differing on all five
features) for each domain. . . . . . . . . . . . . . . . . . . . . . . . 79
FIGURE 17 : Example AOI calculation. The RF might be the front and the
bottom, shown in blue, while the NF are the tail and the top, shown in red. . . 82
FIGURE 24 : Example illustration of IG output for a verb-particle construction.
Variations in lemma access or constituent assembly speed mani-
fest as comparable variations in linear order (when permitted by
the grammar): whichever element is retrieved and constructed first
is sent off to positional processing first. . . . . . . . . . . . . . . . . 103
FIGURE 25 : Distribution of object length (in words) within the present sample
of verb-particle data (67,905 sentences) . . . . . . . . . . . . . . . . 117
thresholds at test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
FIGURE 30 : Psychometric functions for phoneme categorization during testing
in Experiment 1. Output split by Shifted Phone (/t/ or /d/), Tim-
ing condition (text-before or text-after), and test-phase half (first
four test blocks or remaining five). Adaptation occurred in the text-
before but not text-after condition and faded over the course of the
test phase (first vs. last half). Data points are subject means and
error bars are within-subject 95% confidence intervals (Morey, 2008). . . . 141
FIGURE 31 : Psychometric functions for phoneme categorization during testing
in Experiment 2. Output split by Shifted Phone (/t/ or /d/), Tim-
ing condition (text-before or text-after), and test-phase half (first
four test blocks or remaining five). Adaptation occurred in the text-
before but not text-after condition and faded over the course of the
test phase (first vs. last half). Data points are subject means and
error bars are within-subject 95% confidence intervals (Morey, 2008). . . . 142
FIGURE 32 : Violin plot of the median 50% threshold for “t” / “d” categorization
for each continuum in the norming study. Red line shows the over-
all median 50% threshold at 46.9ms. “_SHIFTED” and “_ORIG”
correspond to pitch-edited and original-pitch CV continua respec-
tively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
FIGURE 33 : Assignment of audio to text in Experiment S1. There is a confound
between “edited speech” and the particular phonological category
under manipulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
FIGURE 34 : Main results for Experiment S1. Psychometric functions for phoneme
discrimination during testing. Output split by Shifted Phone (/t/
or /d/) and Timing condition (text-before or text-after). Unlike in
Experiments 1 and 2, adaptation occurred in both the text-before
and text-after conditions. Data points are subject means and error
bars are within-subject 95% confidence intervals (Morey, 2008). . . 148
FIGURE 35 : Heatmap of gaze (within stimulus bounding box) throughout all
training trials and all stimulus domains. . . . . . . . . . . . . . . . . 155
FIGURE 36 : Heatmaps of overall gaze split by domain and overlaid on example
stimulus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
FIGURE 37 : Plots showing timecourse of gaze-time to the RF-set as a function
of learning outcome . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
CHAPTER 1 : Introduction
One of the fundamental questions of linguistics asks: "what does a person know when they
know a language?” What are the mental representations that underlie our cognitive system
of linguistic meaning, how do we learn them, and how are they processed in real time?
To address these questions, this dissertation investigates the wide-ranging implications of a
simple fact: language unfolds over time.
Whether as cognitive symbols in our minds, or as their physical realization in sound waves
and ever-changing referents in the world, if linguistic computations are not made over tran-
sient and shifting information as it occurs, they cannot be made at all. This dissertation
explores the interaction between the computations, mechanisms, and representations of lan-
guage acquisition and language processing — with a central theme being the unique study
of the temporal restrictions inherent to information processing that I term the immediacy
of linguistic computation. This program motivates the study of intermediate representations
recruited during online processing and acquisition rather than simply an Input/Output map-
ping. While ultimately extracted from linguistic input, such intermediate representations
may differ significantly from the underlying distributional signal. I demonstrate in sev-
eral lines of work that, due to the immediacy of linguistic computation, such intermediate
representations do indeed diverge from the input in systematic and informative ways.
Given the system-wide impact that temporal restrictions have on language processing and
acquisition, this dissertation examines the immediacy of computation and its impact across
the language system. In particular, I address:
1. How are linguistic hypotheses (i.e. "what phone did I just hear?", or "what does that
word mean?") formed in real time, and what contents do these hypotheses comprise?
How does the process of generating hypotheses shape language, compared with the
process of evaluating them? It is not that linguistic knowledge and behavior are somehow
governed by the raw computing power available to the brain; instead, I argue that
temporal restrictions have a far greater impact by shaping which representations are
available at any given moment.
2. How do the algorithms implemented in our minds use simple tools to actually carry
out the computations required for language processing and acquisition? A set of outputs
is often largely consistent with many possible algorithms — this work attempts to identify
which algorithms are most likely at play and what intermediate representations they recruit.
1.1. Outline of the Dissertation
The dissertation comprises four distinct studies: a set of experiments
probing the intermediate representation of speech during online processing (Chapter 2), a
computational model and corresponding eye-tracking study of how learners handle semantic
ambiguity during word learning (Chapters 3 and 4 respectively), and a statistical analysis of
how speakers make real-time choices during language production (Chapter 5). I discuss each
of these projects and the methods used to address these questions below in the remainder
of Chapter 1. While the individual findings stand on their own, when taken together they
represent a rich analysis of the immediacy of linguistic computation and its system-wide
impact on the mental representations of language.

The first study (Chapter 2) examines what happens to the speech
signal after it enters the mind of a listener. Previous work (Connine et al., 1991, inter alia)
demonstrates that listeners maintain intermediate speech representations over time. Such
representations must be maintained in order for listeners to use subsequent context to aid
in the interpretation of prior phonetic input.

1 Thus an alternative title of the dissertation might have been: "How to Get (Linguistically) Rich when
you're (Computationally) Poor"

Consider for instance how one would decide between the interpretations of a potentially
ambiguous word pair like "[t/d]ent" in a sentence such as "That was the [t/d]ent that
they ...", where the disambiguating material arrives only later. But the content of these
intermediate representations — whether they maintain the acoustic-phonetic signal or more
general information about the probability of possi-
ble categories — has remained underspecified. I present experimental evidence from a novel
perceptual learning (“accent adaptation”) paradigm which supports the view that informa-
tion about the acoustic-phonetic signal is not maintained over time. In particular, I exposed
listeners to a modified distribution of acoustic evidence con-
cerning phones/words and manipulated the temporal availability of disambiguating cues via
visually presented text (i.e., presentation before or after each utterance). Results show that
listeners adapt to the modified acoustic distribution only when disambiguating text is pro-
vided before the auditory information, but not after. This finding supports the position that
listeners do not maintain information about the original acoustic-phonetic
signal. Such results have ramifications far beyond speech processing: limits to the
storage of sensory input place real limits on mental representations. This may inform long-
standing debates in other areas of linguistics regarding the exemplar vs. abstract/discrete
nature of linguistic representations.

Children famously face ambiguity during morphological and syntactic acquisition (Yang,
2002, 2016; Tyler and Nagy, 1989; Pinker, 1989; Rumelhart and McClelland, 1985): how
does the learner deal with such ambiguity when multiple grammars are, in principle, con-
sistent with the words and sentences they have heard (Gold, 1967)? While words, unlike
syntactic units, are often thought of as atomic, a fundamental question in word learning
is strikingly similar: how, given only evidence about what objects a word has previously
referred to, are children able to generalize to the total class? How does a child end up
knowing that “poodle” picks out a specific subset of dogs despite their overlapping exten-
sions? Chapter 3 presents a model of word learning grounded in category formation (the
Naïve Generalization Model or “NGM”). While learners have been argued to display optimal
behavior by performing statistical inference over the input distribution of their experience
(e.g. via Bayesian inference — Xu and Tenenbaum (2007b)), they are also sensitive to input
conditions that are orthogonal to purely statistical reasoning (Spencer et al., 2011), like the
timing with which referents are encountered (for instance, whether stimuli are co-present on
screen or instead presented one at a time).

I contrast the NGM with the popular Bayesian inference theory of generalization (Xu and
Tenenbaum, 2007b). On the Bayesian account, learners have some representation of many
potential meanings for a word, and engage in statistically sensitive calculations to select the
hypothesis that is most probable given a distribution of attested exemplars. The “heavy-
lifting” and explanatory power resides in evaluation (via statistical inference) of many hy-
potheses without specifying the process which generates them. In contrast with previous
Bayesian (Xu and Tenenbaum, 2007b) or associative (Regier, 2005) accounts, computation
in the NGM is local and lacks any global optimization over an evaluation metric. On my
view, word learning is an incremental and mechanistic process. Instead of retaining experi-
ential statistics over words and all their potential meanings, the NGM constructs hypotheses
for word meanings as they occur. Uses of the same word are evaluated (and revised) with
respect to the learner’s intermediate representation (e.g. their current working conception)
rather than to their complete distribution of experience. While in some cases this “working
conception” ends up being extremely similar to the distribution of experience, other cases
lead to divergent and highly-informative outcomes, in particular when stimuli are presented
sequentially rather than simultaneously. What you see (during learning) is not necessarily
what you get (as the final learned meaning).
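To make the two styles of computation concrete, consider first a minimal sketch of the batch, selection-based picture. The candidate hypotheses, priors, and likelihood below are invented for illustration and serve only as a schematic stand-in for the Bayesian account, not the actual model of Xu and Tenenbaum (2007b): every candidate meaning is re-scored against the complete set of attested exemplars, and the most probable candidate is selected.

# Minimal illustrative sketch (Python) of batch Bayesian hypothesis selection.
# Hypotheses, priors, and the likelihood are invented for illustration; this is
# not the model of Xu and Tenenbaum (2007b).

hypotheses = {
    # candidate meaning -> prior probability and candidate extension
    "dalmatian": {"prior": 0.2, "extension": {"dal1", "dal2"}},
    "dog":       {"prior": 0.5, "extension": {"dal1", "dal2", "poodle1", "terrier1"}},
    "animal":    {"prior": 0.3, "extension": {"dal1", "dal2", "poodle1", "terrier1", "cat1"}},
}

def posterior(hypotheses, exemplars):
    """Score every hypothesis against the full distribution of attested exemplars.
    The likelihood treats exemplars as sampled uniformly from a hypothesis's
    extension, so narrower consistent meanings gain support as exemplars accumulate."""
    scores = {}
    for name, h in hypotheses.items():
        if all(x in h["extension"] for x in exemplars):
            likelihood = (1.0 / len(h["extension"])) ** len(exemplars)
        else:
            likelihood = 0.0
        scores[name] = h["prior"] * likelihood
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()} if total else scores

# After three exemplars that all happen to be dalmatians, the narrow meaning
# overtakes the broader ones despite its lower prior.
print(posterior(hypotheses, ["dal1", "dal2", "dal1"]))

The relevant point is simply that, on this style of account, the learner's full history of exemplars must remain available so that every stored hypothesis can be re-evaluated against it.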
I evaluate, and find support for, the NGM on a range of experimental data — varying the
size of the training set and the temporal manner of presenta-
tion in word learning (Xu and Tenenbaum, 2007b; Spencer et al., 2011). Learning behavior
is shaped by the immediacy of linguistic computation: learners are limited to locally eval-
uating only the fit of whatever structures they posit. Through this temporally constrained
process, one hypothesis will end up winning out because it offers a satisfactory fit to the
data, but this does not mean that the final meaning or grammar is provably optimal (as
often assumed by alternative accounts). Learners do the best job they can, not the best job
possible.
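For contrast, here is an equally minimal sketch of the incremental logic described above. The feature weights and category cutoff are invented for illustration (the cutoff of 1.0 merely echoes the toy example in Table 6), and the code is a schematic illustration rather than the actual NGM implementation.

# Minimal illustrative sketch (Python) of an incremental, NGM-style learner.
# Feature weights and the cutoff are invented; this is a schematic of the
# incremental logic described above, not the NGM itself.

def incremental_word_learner(exemplars, retain=0.8):
    """Maintain a single working hypothesis for a word's meaning, revising it
    against each exemplar as it occurs rather than re-estimating over the
    learner's complete history of experience."""
    hypothesis = None
    for exemplar in exemplars:                  # exemplar: dict of feature -> 0/1
        if hypothesis is None:
            # First use: posit a hypothesis directly from this exemplar,
            # storing down-weighted feature values rather than the raw token.
            hypothesis = {f: v * retain for f, v in exemplar.items()}
        else:
            # Later uses: compare the exemplar against the current working
            # conception only, and drop features the new use fails to share.
            for f, v in exemplar.items():
                if v == 0 and hypothesis.get(f, 0) > 0:
                    hypothesis[f] = 0.0
    return hypothesis

def is_member(hypothesis, item, cutoff=1.0):
    """Penalize only feature values present in the hypothesis but missing from
    the candidate item; admit the item if the total penalty is within the cutoff."""
    penalty = sum(w for f, w in hypothesis.items() if w > 0 and item.get(f, 0) == 0)
    return penalty <= cutoff

# Toy usage: one training exemplar, then a candidate test item missing one feature.
hyp = incremental_word_learner([{"A": 1, "B": 1, "C": 1, "D": 0}])
print(is_member(hyp, {"A": 1, "B": 1, "C": 0, "D": 0}))  # True: penalty 0.8 <= 1.0

Only the current working conception is consulted at each step; the complete distribution of prior experience is never revisited, which is the sense in which the computation is local and immediate.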
1.1.3. Study 3: A More Direct Probe of Intermediate Representations during Word Learning
The experiments modeled in Chapter 3 are informative as to the word representations that
result from successful learning, and the NGM makes predictions about intermediate states
of acquisition, but this does not provide direct evidence as to the fine-grained time-course
over which the relevant semantic generalizations emerge. Just like the generation/evaluation
distinction above, there is a gap between positing an underlying
function (intension) and measuring its output (extension), as this is frequently a many-to-
one mapping. Chapter 4 introduces and presents results from a new eye-tracking paradigm
(inspired by Rehder and Hoffman (2005a)) designed to test the predictions of broad classes
of word learning theories: accounts grounded in hypothesis generation like the NGM in
contrast with accounts based on the statistical accumulation and evaluation of evidence. The
paradigm uses artificially created stimuli with spatially distributed features — each region
of the stimulus carries a distinct feature. By treating gaze to a region as an index
of selective attention to these individual features, we are able to study the content and time-
course of the meanings learners entertain. A statistical
accumulation theory predicts that learners should initially attend to all the dimensions that
they can in order to extract a representative sample before applying any evaluative filter
for the most likely meaning. Conversely the NGM predicts that learners should extract an
intermediate hypothesis on the basis of initial exposure; given the immediacy of linguistic
computation, subsequent trials are evaluated only with respect to the hypothesized meaning.2
I find that, consistent with the NGM, learners’ attention is limited only to the features
2 Perhaps the modernist movement had it right all along! "Nothing is less real than realism. Details are
confusing. It is only by selection, by elimination, by emphasis, that we get at the real meaning of things"
(attributed to Georgia O'Keeffe — as quoted in Stuhlman (2007))