Constraint-Based Learning of Phonological Processes
Shraddha Barke, Rose Kunkel, Nadia Polikarpova, Eric Meinhardt, Eric Baković, and Leon Bergen
{sbarke,wkunkel,npolikarpova,emeinhar,ebakovic,lbergen}@ucsd.edu
approach in a system called SYPHON and show that it is capable of generating accurate phonological rules in under a minute and from just 5–30 data points.

2 Background and Problem Definition

In this section, we illustrate phonological processes and the problem of phonological rule induction using our running example of English verbs.

2.1 Rule-Based Phonology

Phonological features. Phones (speech sounds) are described using a feature system that groups similar-sounding phones together. For instance, voiced consonants (consonants produced with vibrating vocal cords, like [z], [d], [b]) possess the features +consonant and +voice, while voiceless consonants (like [s], [t], [p]) possess the features +consonant and −voice. Each phone can be uniquely identified by a feature vector: for example [−voice −strident +anterior −distributed] uniquely identifies the sound [t]. However, some phones may be uniquely identified by several feature vectors, and not all feature vectors correspond to phones (the feature system is redundant). For example, the feature vector [+low +high] does not correspond to any phones, as no phone can have both a raised and a lowered tongue body.

Phonological rules. In rule-based phonology, a phonological process is formalized as a conditional rewrite rule that transforms an underlying form of a word (roughly, the unique stored form of the word) into its surface form (the word as it is intended to be pronounced). In our English past tense example, the underlying form /zIpd/—formed by concatenating the stem /zIp/ and past tense suffix /d/—is transformed into the surface form [zIpt] by a rule that makes an obstruent voiceless when it occurs after a voiceless obstruent:

    [−sonorant] → [−voice] / [−voice] _

In general, phonological rules have the form A → B / L _ R, where all of A, B, L, and R are feature vectors. The rule means that any phone that matches A and occurs between two phones that match L and R, respectively, will be rewritten to match B (leaving the features not mentioned by B intact). A is called the target of the rule, B is called the structural change, and L and R are the left and the right contexts.¹ In the example above, the right context is omitted, because it is irrelevant to the rule's application; formally, A, L, and R may each be empty feature vectors, which are defined to match any phone.

Hereafter, we refer to the sequence LAR of the target and the context as the condition of the rule. If the condition is empty, the rule applies unconditionally. In addition to + and −, the values of features in the condition of the rule may be variables, which enforce that features have the same value in different parts of the condition. For example,

    A → B / [αconsonant] _ [αconsonant]

describes a rule which applies between pairs of consonants and pairs of vowels, but not between a consonant and a vowel.

¹ In this work we only consider a subset of strictly local k=3 rules (Chandlee et al., 2014), where either side of the context is restricted to at most a single phone.
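As a concrete illustration of this rule format (our own sketch, not part of SYPHON; the toy inventory and feature values below are illustrative), phones can be represented as feature-value maps and a condition L A R checked against a trigram, including the agreement enforced by α-variables:

```python
# Illustrative sketch (not SYPHON code): phones as feature-value maps,
# and matching of a rule condition L A R, including alpha-variables.
PHONES = {  # tiny toy inventory; feature values are '+' or '-'
    "p": {"consonant": "+", "voice": "-", "sonorant": "-"},
    "b": {"consonant": "+", "voice": "+", "sonorant": "-"},
    "s": {"consonant": "+", "voice": "-", "sonorant": "-"},
    "z": {"consonant": "+", "voice": "+", "sonorant": "-"},
    "a": {"consonant": "-", "voice": "+", "sonorant": "+"},
}

def matches(phone, fvec, bindings):
    """Does `phone` satisfy feature vector `fvec`?  Values may be '+', '-',
    or a variable name that must take the same value everywhere it occurs."""
    for feat, val in fvec.items():
        actual = PHONES[phone][feat]
        if val in ("+", "-"):
            if actual != val:
                return False
        else:  # alpha-variable: bind on first use, check on later uses
            if bindings.setdefault(val, actual) != actual:
                return False
    return True

def condition_holds(left, target, right, L, A, R):
    """Empty feature vectors match any phone, as in the rule format A -> B / L _ R."""
    bindings = {}
    return (matches(left, L, bindings) and matches(target, A, bindings)
            and matches(right, R, bindings))

# A -> B / [alpha consonant] _ [alpha consonant]: context phones agree in consonantality.
L = {"consonant": "alpha"}; A = {}; R = {"consonant": "alpha"}
print(condition_holds("p", "z", "s", L, A, R))   # True: consonant _ consonant
print(condition_holds("a", "z", "a", L, A, R))   # True: vowel _ vowel
print(condition_holds("p", "z", "a", L, A, R))   # False: consonant _ vowel
```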
2.2 Problem Definition

The input to our problem is a matrix of surface forms, such as the one shown in Fig. 1, left. These forms are arranged into rows, corresponding to different stems, and columns, corresponding to different inflections (in this case, the third-person singular and past tense of English verbs). In the interest of space, we only show four rows from this data set, but a typical input in a phonology textbook problem is only slightly larger and ranges from 5 to 30 rows.

Given these data, our task is to infer the latent underlying forms for each of the words in the input such that the resulting matrix of underlying forms factorizes into stems and suffixes, and to learn a sequence of phonological rules which, when applied elementwise to the matrix of underlying forms, reproduces the matrix of surface forms.

This learned sequence of phonological rules is generative in the following sense: given the underlying form for a new word, such as /æskz/, we can deterministically apply these rules to generate the surface form of that word, [æsks]. We use this property to evaluate the accuracy of the rule set we learned by holding out a portion of the words from the data, and then applying the rules to the underlying forms of those words, which were determined through phonological research.

2.3 Phonological Intuition

The design of our system is informed by how linguists solve the problem of phonological rule induction. When a phonologist analyzes these data, they begin by positing underlying forms that are likely to result in the simplest set of rules. For example, they observe that the substring shared in each row is most likely the stem, which surfaces without change; the underlying suffix in the first column in Fig. 1 is likely /z/, which sometimes surfaces as [s] and other times as [@z]; and similarly, the underlying suffix in the second column is likely /d/, which can change to [t] or [@d]. The choice of /z/ and /d/ as the underlying suffixes is preferred to, say, /s/ and /t/, because this choice lets us explain all the observed data using only three edits: /z/ → [s], /d/ → [t], and ∅ → [@].

The next step is to merge and generalize individual edits: the first two edits are both devoicing an obstruent, so they can be merged into [−sonorant] → [−voice], while the last edit is an insertion and cannot be generalized.

The final step of the analysis is to infer the conditions under which each of the two structural changes occurs. By contrasting examples in the first column, we infer that the insertion happens when the suffix /z/ occurs after a strident (like /s/ in /mIs/); otherwise, /z/ and /d/ are devoiced whenever they occur after a voiceless obstruent (like /p/ in /zIp/). The full data set can be explained using the two rules in Fig. 1, right. Note that in order to capture the data in both columns, the insertion rule says that [@] is inserted whenever the stridency of the left and right context matches. Note also that in this case the order of rules matters: for words like /mIsz/, insertion is applied first, which prevents the devoicing rule from applying.
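The ordering argument can be replayed mechanically; the sketch below (ours, with stridency and voicing facts hard-coded only for the phones in this example, and [@] written as '@') applies the two rules of Fig. 1 in the stated order, so that insertion bleeds devoicing in [mIs@z] while /zIpz/ still surfaces as [zIps]:

```python
# Illustrative sketch of ordered rule application for the English suffix example.
# Stridency/voicing values are hard-coded only for the phones that occur here.
STRIDENT = {"s": True, "z": True, "m": False, "I": False,
            "p": False, "b": False, "E": False, "g": False}
VOICELESS_OBSTRUENT = {"p", "t", "k", "s", "f"}

def insert_schwa(word):
    # 0 -> [@] / [alpha strident] _ [alpha strident]  (here: between two stridents)
    out = []
    for i, ph in enumerate(word):
        if i > 0 and STRIDENT[word[i - 1]] and STRIDENT[ph]:
            out.append("@")
        out.append(ph)
    return "".join(out)

def devoice(word):
    # [-sonorant] -> [-voice] / [-voice] _  (obstruent devoices after a voiceless obstruent)
    voiceless_of = {"z": "s", "d": "t", "b": "p", "g": "k"}
    out = list(word)
    for i in range(1, len(out)):
        if out[i - 1] in VOICELESS_OBSTRUENT and out[i] in voiceless_of:
            out[i] = voiceless_of[out[i]]
    return "".join(out)

def derive(underlying):
    return devoice(insert_schwa(underlying))   # insertion is ordered before devoicing

print(derive("mIsz"))  # mIs@z : inserting [@] bleeds devoicing of /z/
print(derive("zIpz"))  # zIps  : no insertion, so /z/ devoices after /p/
```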
[Figure 1: The general structure of the problem, shown concretely for English verbs. The input surface forms Xij ([zIps] [zIpt]; [bEgz] [bEgd]; [mIs@z] [mIst]; [nidz] [nid@d]) are explained by latent underlying forms Uij (/zIp/+/z/, /zIp/+/d/; /bEg/+/z/, /bEg/+/d/; /mIs/+/z/, /mIs/+/d/; /nid/+/z/, /nid/+/d/), from which SYPHON learns the rule set R: ∅ → [@] / [αstrident] _ [αstrident] and [−sonorant] → [−voice] / [−voice] _. The learned rules are validated on held-out forms such as /æsk/+/z/ → [æsks].]
3 Learning Phonological Rules

As illustrated in Fig. 1, the input to our learning problem is a matrix of surface forms Xij with I rows and J columns. The goal is to learn a discrete rule set R, while jointly inferring the latent set of I stems Si and J affixes Aj.

Hypothesis space. The hypothesis space for R can be formalized as a context-free grammar:

    R ⇒ R*        R ⇒ C → C / C _ C
    C ⇒ (V F)*    V ⇒ + | −                                   (1)
    F ⇒ consonant | voice | ...

According to this grammar, R is a sequence of rules R; each R is defined in terms of four feature vectors C; each feature vector is a sequence of pairs of feature values V and feature names F.
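Grammar (1) can also be read as an abstract syntax for rule sets; a minimal sketch of such a representation (ours; the class and field names are illustrative):

```python
# Illustrative AST for the hypothesis space of grammar (1):
# a rule set is a list of rules; a rule is four feature vectors (target,
# change, left context, right context); a feature vector is a list of
# (value, feature) pairs, where a value is '+', '-', or an alpha-variable.
from dataclasses import dataclass, field
from typing import List, Tuple

FeatureVector = List[Tuple[str, str]]   # e.g. [("-", "sonorant")]

@dataclass
class Rule:
    target: FeatureVector
    change: FeatureVector
    left: FeatureVector = field(default_factory=list)    # empty = matches any phone
    right: FeatureVector = field(default_factory=list)

    def length(self) -> int:
        """A simple description length: number of feature-value pairs used."""
        return len(self.target) + len(self.change) + len(self.left) + len(self.right)

RuleSet = List[Rule]

# [-sonorant] -> [-voice] / [-voice] _
devoicing = Rule(target=[("-", "sonorant")], change=[("-", "voice")],
                 left=[("-", "voice")])
print(devoicing.length())   # 3
```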
The final step of the analysis is to infer the condi- formula that holds of a phone a if a possesses all
tions under which each of the two structural changes features in C; we denote by |C| the number of
occurs. By contrasting examples in the first column, models of this formula, i.e. phones in the inventory
we infer that the insertion happens when the suffix /z/ Φ that satisfy C. Similarly, CR is a Boolean formula
occurs after a strident (like /s/ in /mIs/); otherwise, over trigrams of phones. A rewrite of a trigram abc
/z/ and /d/ are devoiced whenever they occur after a by rule R is defined as:
(
voiceless obstruent (like /p/ in /zIp/). The full data BR (b) if CR (abc)
set can be explained using the two rules in Fig. 1, right. R(abc) =
b otherwise
Note that in order to capture the data in both columns,
the insertion rule says that [@] is inserted whenever The notion of rewrites can be extended to words and
the stridency of the left and right context matches. rule sets.
Note also that in this case the order of rules matters: Learning as constrained optimization. We can
for words like /mIsz/, insertion is applied first, which now formalize our problem as a hard correctness
prevents the devoicing rule from applying. constraint over rules and underlying forms Uij :
R(Uij ) = Xij where Uij , Aj [Si ] (2)
3 Learning Phonological Rules Here, Aj [Si ] denotes a concatenation of the
prefix/suffix Aj with the stem Si .
As illustrated in Fig. 1, the input to our learning
There might be many rule sets R that are consistent
problem is a matrix of surface forms Xij with I rows
with all the data, and what we would like is to pick
and J columns. The goal is to learn a discrete rule
one that generalizes to other data that exhibits the
set R, while jointly inferring the latent set of I stems
same phonological process (for example, the rule
Si and J affixes Aj .
inferred in Fig. 1 should generalize to other regular
Hypothesis space. The hypothesis space for R can English verbs). Hence we frame the learning problem
6178
[Figure 2: Probabilistic model of a phonological process. A rule set R is sampled from a description length prior. We observe a set of N surface phonemes xk; each xk is generated by sampling a rule Rk from R and an underlying trigram uk, and deterministically applying Rk to uk (coin flip bk decides whether uk should match Rk's condition).]

3.1 Bayesian Model

Generative process. Intuitively, to generate surface forms Xij, we must sample a single rule set R, I stems Si, and J affixes Aj, and then deterministically apply R to each Aj[Si]. Prior work on phonological rule learning (Ellis et al., 2015) assumed that Si and Aj are sampled uniformly from the language and independently of R. We observe, however, that in most data sets of interest, underlying forms are in fact sampled to contrast the contexts in which R does and does not apply. We model this intuition as a strong sampling process depicted in Fig. 2.

For simplicity, in this model each observation corresponds to an individual rule application to an underlying trigram u that produces a surface phoneme x. For example, the rewrite /zIpz/ → [zIps] is represented as four observations: /#zI/ → [z], /zIp/ → [I], /Ipz/ → [p], and /pz#/ → [s] (where # encodes the word boundary).

Our generative process first samples a ruleset R from the description length prior over the hypothesis space (1):

    P(R) ∝ 2^(−ws · ∑R∈R ℓ(R))

where ℓ(R) is the length of rule R and ws > 0 is a model hyperparameter. For each observation k ∈ 1..N, we pick a rule Rk uniformly from R. Before sampling the underlying trigram uk, we flip a coin bk to decide whether we want to sample a positive or a negative trigram, i.e. whether CRk(uk) should hold true; we then sample uk uniformly from the set of all positive (resp. negative) trigrams (subject to the hard constraint that they form a factorizable matrix Uij). Finally, we deterministically compute xk ≜ Rk(uk). Hence we can define:

    P(xk, uk | Rk) =  0                    if Rk(uk) ≠ xk
                      P(bk = ⊤) / |CRk|    if CRk(uk)
                      P(bk = ⊥) / |¬CRk|   otherwise

Our goal is to maximize

    P(R, R1, ..., RN, u1, ..., uN | x1, ..., xN) ∝ P(R) ∏k=1..N P(xk, uk | Rk) P(Rk | R)

Objective function. Taking logs, we can derive the following approximate minimization objective for our constrained optimization problem:

    ∑R∈R ws ℓ(R) + NR+ · log(|CR|)                             (3)

where NR+ is the number of positive examples for this rule. (Note that this objective ignores P(Rk | R) and bk, which are assumed to be uniform. It also ignores the negative examples. This provides a reasonable approximation, under the assumption that |¬CR| ≫ |CR| for each rule R, which holds in the current setting.) This function includes a simplicity term, which favors rules with shorter (and hence, more general) conditions, and a likelihood term, which favors more specific conditions if there are sufficient positive examples to support them. This likelihood term stems from our strong sampling assumption; we demonstrate its importance for inferring accurate rules in Sec. 5.
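Objective (3) is cheap to evaluate for a candidate rule set once each rule's description length, number of positive examples, and condition size are known; a toy sketch (ours; the weight and the per-rule numbers below are made up for illustration):

```python
# Illustrative evaluation of objective (3): w_s * len(R) + N_R^+ * log|C_R|,
# summed over the rules in a candidate rule set.  Each rule is a toy triple:
# (description length, number of positive examples, number of phones in the
# inventory that satisfy its condition).
import math

def objective(rules, w_s=1.0):
    total = 0.0
    for length, n_pos, cond_size in rules:
        total += w_s * length + n_pos * math.log(cond_size)
    return total

# A more specific condition (smaller |C_R|) pays a higher simplicity cost
# (larger length) but a lower likelihood cost per positive example.
general  = (3, 10, 40)   # short rule, condition true of 40 phones
specific = (5, 10, 5)    # longer rule, condition true of 5 phones
print(objective([general]), objective([specific]))  # the specific rule wins here
```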
3.2 Inference by Program Synthesis

To solve the constrained optimization problem we build upon a technique from programming languages called constraint-based program synthesis (Solar-Lezama, 2013).

Constraint-based synthesis. The input to (inductive) program synthesis is a DSL that defines the space of possible programs and a set of input-output examples E = {⟨i, o⟩}; the goal is to find a program whose behavior is consistent with the examples. In constraint-based synthesis, this search problem is reduced to solving a boolean constraint. To this end, we index the DSL by a bitvector c, called a control vector. We then define a mapping from control vectors to program behaviors via an evaluation relation ϕ(c, y, z)—a boolean formula that holds if and only if a program indexed by c produces output z on input y. Given the evaluation relation, the synthesis problem reduces to solving the following boolean constraint:

    ∃c. ⋀⟨i,o⟩∈E ϕ(c, i, o)
An SMT solver (de Moura and Bjørner, 2008) is then used to find a satisfying assignment for c, which allows us to recover the corresponding program. For this approach to succeed, the evaluation relation has to be designed carefully so that it only uses constraints that the solver can efficiently reason about.
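As a toy illustration of this style of synthesis (ours, written against the Z3 Python bindings rather than SYPHON's actual encoding), the control vector below decides which features a condition tests and with which values, and the constraint forces the chosen condition to hold of positive example phones and fail on negative ones:

```python
# Toy constraint-based synthesis with Z3 (ours, not SYPHON's encoding):
# the "control vector" decides which of two features the condition tests
# and with which value; the evaluation relation says whether the indexed
# condition classifies each example phone as required.
from z3 import Bools, And, Not, Implies, Solver, sat

test_voi, val_voi, test_son, val_son = Bools("test_voi val_voi test_son val_son")

def condition(voiced, sonorant):
    # The condition holds of a phone iff every tested feature has the chosen value.
    return And(Implies(test_voi, val_voi == voiced),
               Implies(test_son, val_son == sonorant))

# (voiced, sonorant, should_condition_hold): /z/ and /d/ are positive examples,
# /s/ (voiceless) and a vowel (sonorant) are negative examples.
examples = [(True, False, True), (True, False, True),
            (False, False, False), (True, True, False)]

s = Solver()
for voiced, son, positive in examples:
    c = condition(voiced, son)
    s.add(c if positive else Not(c))

if s.check() == sat:
    m = s.model()
    print({d.name(): m[d] for d in m.decls()})  # e.g. test both features: +voice, -sonorant
```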
Synthesis of phonological rules. In our setting, the DSL is the space of all rule sets R (up to a certain size), and the evaluation relation ϕ(c, U, X) is the correctness condition (2). Importantly, our setting differs from traditional program synthesis in two ways: first, we have to search for both the control vector and the inputs, and second, in addition to satisfying the correctness condition, we also seek to minimize the objective function (3). If we encode the objective function as ψ(c, ⟨U, X⟩), we can reduce rule learning to the following constrained optimization:

    minimize    ψ(c, ⟨Aj[Si], Xij⟩)
    subject to  ⋀i,j=1..N,M ϕ(c, Aj[Si], Xij)

Given a proper encoding of ϕ and ψ, this constraint can be solved by an optimizing SMT solver (Bjørner et al., 2015); this is the approach used in prior work (Ellis et al., 2015). However, this is a very computationally intensive problem. The reason is the astronomical size of the search space: for a problem of factorizing a 10×2 matrix Xij into stems of length ℓS = 3 and affixes of length ℓA = 2, if we limit the maximum number of rules NR to 2 and consider an inventory Φ with 90 phones and a feature set F with 30 features, we can estimate the size of the search space as 3^(|F|·NR) · |Φ|^(I·ℓS + J·ℓA) ≈ 2^600.

Decomposition. To achieve scalable inference, we decompose the global constrained optimization problem into three steps, inspired by phonological intuition we described in Sec. 2.3:

1. Underlying form inference. In the first step we use an SMT solver to generate likely underlying stems and suffixes. We rank them based on the heuristic that underlying forms Uij that have a smaller edit distance from surface forms Xij are more likely to produce simple rules (Sec. 3.3).

2. Change inference. Given the set of edits between each Uij and the corresponding Xij, we identify the smallest set of structural changes B that can describe all the edits (Sec. 3.4).

3. Condition inference. Finally, for each structural change B, we use program synthesis to infer the condition under which this change occurs (Sec. 3.5). If this step fails, we go back to step 1 and generate the next candidate matrix Uij.

In the rest of this section we detail these three steps. For illustration purposes, in all examples we will assume that our feature set has just three features: voice v, sonorant s, and continuant c.
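Schematically, the decomposition is a generate-and-test loop over candidate underlying forms; the control-flow sketch below is ours, with the three per-step inference procedures left as placeholders for Secs. 3.3–3.5:

```python
# Schematic control flow of the decomposition (ours): candidate underlying
# forms are proposed in order of increasing edit distance; if condition
# inference fails for some change, the next candidate matrix is tried.
def learn_rules(surface_forms, propose_underlying, infer_changes, infer_condition):
    for underlying in propose_underlying(surface_forms):      # Sec. 3.3
        changes = infer_changes(underlying, surface_forms)    # Sec. 3.4
        rules = []
        for change in changes:
            condition = infer_condition(change, underlying, surface_forms)  # Sec. 3.5
            if condition is None:      # synthesis failed: backtrack to step 1
                rules = None
                break
            rules.append((change, condition))
        if rules is not None:
            return underlying, rules
    raise ValueError("no consistent rule set found")
```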
3.3 Underlying Form Inference

The input to this step is the matrix of surface forms Xij and the output is a set of aligned pairs ⟨U, X⟩ij. Tab. 1 illustrates this for a 2 × 2 matrix of English verbs. For example, ⟨U, X⟩11 = ⟨[zIpz], [zIps]⟩; we use red to show alignment information (in this case, a single substitution z → s). Insertions and deletions are represented by alignment with null segments.

    Input               Output
    [zIps]  [zIpt]      ⟨[zIpz], [zIps]⟩   ⟨[zIpd], [zIpt]⟩
    [nidz]  [nid@d]     ⟨[nidz], [nidz]⟩   ⟨[nid d], [nid@d]⟩

    Table 1: Underlying form inference on English verbs.

The output matrix ⟨U, X⟩ij has to satisfy two properties: (i) the matrix Uij can be factorized into stems Si and affixes Aj, and (ii) each pair ⟨U, X⟩ has a small edit distance. Our intuition is that underlying forms that have a small edit distance from surface forms are likely to produce simpler rules. Hence we generate candidate matrices ⟨U, X⟩ij in the order of increasing edit distance, until rule inference succeeds for one of them. This strategy will always eventually find a matrix of underlying forms which can be related to the surface forms by a rule set we can infer, as long as one exists. This process is not guaranteed to find the global minimum of the objective function (3), but we show empirically that it produces good results.

We can encode the properties (i) and (ii) as a boolean constraint over unknown strings with concatenation and length, which can be solved efficiently by the Z3str2 solver (Zheng et al., 2017). From the solutions for those unknowns it is straightforward to recover not only the stems and suffixes, but also the required alignment information between the underlying and surface forms.
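A minimal sketch of property (i) as string constraints (ours, written against Z3's built-in string theory rather than the Z3str2 solver, and ignoring edits, so the underlying forms are simply required to concatenate into the given strings):

```python
# Toy factorization constraint (ours): find stems S_i and suffixes A_j such
# that each form is the concatenation S_i + A_j.  Here we pretend the
# underlying forms equal the given strings; the real system also allows a
# bounded number of edits between underlying and surface forms.
from z3 import String, Concat, Length, StringVal, Solver, sat

forms = [["zIpz", "zIpd"],
         ["bEgz", "bEgd"]]

stems = [String(f"S_{i}") for i in range(2)]
suffixes = [String(f"A_{j}") for j in range(2)]

s = Solver()
for i in range(2):
    for j in range(2):
        s.add(Concat(stems[i], suffixes[j]) == StringVal(forms[i][j]))
for a in suffixes:
    s.add(Length(a) >= 1)   # require non-empty suffixes

if s.check() == sat:
    m = s.model()
    print([m[x] for x in stems], [m[x] for x in suffixes])
    # stems ["zIp", "bEg"], suffixes ["z", "d"]
```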
3.4 Change Inference

The input to change inference is the set of all edits in the aligned pairs ⟨U, X⟩ij, computed in the previous step, and the output is a set of structural changes that captures all the edits. Tab. 2 illustrates this for the edits from Tab. 1; columns LHS and RHS show relevant features of the left- and right-hand sides of the edit.

    Edit         LHS           RHS           Change
    /z/ → [s]    [+v −s +c]    [−v −s +c]    [−v]
    /d/ → [t]    [+v −s −c]    [−v −s −c]    [−v]
    ∅ → [@]      ∅             [@]           [@]

    Table 2: Change inference on the edits from Tab. 1.

    u        ℓ    Features
    /#zI/    ⊥    [+#] [+v −s] [+v +s]
    /zIp/    ⊥    [+v −s] [+v +s] [−v −s]
    /Ipz/    ?    [+v +s] [−v −s] [+v −s]
    /pz#/    ⊤    [−v −s] [+v −s] [+#]

    Table 3: Input to condition inference for change [−v] on /zIpz/ → [zIps].
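Under these assumptions, change inference amounts to computing, for each edit, the features on which its two sides differ and grouping edits with identical differences; a sketch over the three-feature toy system (ours; the feature values assumed for [@] are illustrative):

```python
# Illustrative change inference (ours): an edit's change is the set of
# feature-value pairs on which its right-hand side differs from its
# left-hand side; edits with the same change are grouped into one rule.
from collections import defaultdict

def change_of(lhs, rhs):
    """lhs/rhs are feature dicts; insertions have an empty lhs."""
    return frozenset((f, v) for f, v in rhs.items() if lhs.get(f) != v)

edits = [
    ("z->s", {"v": "+", "s": "-", "c": "+"}, {"v": "-", "s": "-", "c": "+"}),
    ("d->t", {"v": "+", "s": "-", "c": "-"}, {"v": "-", "s": "-", "c": "-"}),
    ("0->@", {},                             {"v": "+", "s": "+", "c": "+"}),
]

groups = defaultdict(list)
for name, lhs, rhs in edits:
    groups[change_of(lhs, rhs)].append(name)

for change, names in groups.items():
    print(sorted(change), names)
# [('v', '-')] ['z->s', 'd->t']   <- the devoicing change [-v] from Tab. 2
# the insertion edit ends up in its own group
```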
Finally, as the solver does not support logarithms, we encode log using a lookup table. This is tractable, since we only need to evaluate the log of each |CR|, which is at most the size of our inventory, roughly 100 phones.
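Such a lookup table can be expressed as a chain of If terms over the possible values of |CR|; a small sketch with the Z3 Python bindings (ours, not SYPHON's encoding):

```python
# Illustrative log-as-lookup-table encoding (ours): since |C_R| ranges over
# 1..inventory_size, log|C_R| can be precomputed and selected with a chain
# of If terms, which the SMT solver reasons about natively.
import math
from z3 import Int, If, RealVal, Optimize, sat

INVENTORY_SIZE = 100

def log_table(size_var):
    expr = RealVal(math.log(INVENTORY_SIZE))          # default: largest value
    for v in range(INVENTORY_SIZE - 1, 0, -1):
        expr = If(size_var == v, RealVal(math.log(v)), expr)
    return expr

cond_size = Int("cond_size")
opt = Optimize()
opt.add(cond_size >= 5, cond_size <= 100)
opt.minimize(log_table(cond_size))
if opt.check() == sat:
    print(opt.model()[cond_size])    # 5: the smallest admissible condition size
```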
3.7 Current limitations

SYPHON currently leverages three simplifying assumptions about rules for domain-specific problem decomposition and SMT encoding, which are crucial to making learning computationally tractable.

Conjunctive conditions. Rule conditions are conjunctions of equalities over feature values, and each rule has a unique change. We can thus decompose the learning process into change inference and condition inference: change inference greedily groups all observed edits into changes, and from then on we assume that each change uniquely corresponds to a rule.

Local context. The condition of each rule is only a function of the target phone and two surrounding phones. This allows us to encode condition inference as learning a formula over trigrams of phones, which has a compact encoding as SMT constraints.

Rule interaction. One rule's change does not create conditions for another. This allows us to perform condition inference for each rule independently.

Many attested patterns in real languages go beyond these limitations. We believe that it is possible to lift these restrictions and still leverage the structure of conditional rewrite rules to retain most of the benefits of our problem decomposition. We leave this extension to future work.

4 Data

We evaluate our system on two broad categories of datasets: lexical databases and textbook problems.

4.1 Lexical Databases

We use large lexical databases to investigate two (morpho)phonological processes in English: flapping (6457 rows) and regular verb inflections (2756 rows). We process the CMU pronouncing dictionary (Weide, 2014) to create underlying and surface form pairs exemplifying flapping, as in Gildea and Jurafsky (1996). For verb inflections, we combine morphological information extracted from CELEX-2 (Baayen et al., 1993) with CMU transcriptions to create a database of regular verbs, where each row of the database contains the third-person singular present tense wordform and past tense wordform for a given verb. For both datasets we have gold standard solutions for both rule sets and underlying forms, provided by one of our coauthors, who is a phonologist.

4.2 Textbook Problems

For this category, we curated a set of 34 problems from phonology textbooks (Gussenhoven and Jacobs, 2017; Odden, 2005; Roca and Johnson, 1999) by selecting problems with local, non-interacting rules. For each problem, the input data set is tailored (by the textbook author) to illustrate a particular phonological process, and contains 20–50 surface forms. For all of these problems we have gold standard solutions, either provided with the textbook or inferred by a phonologist. Gold standard solutions range in complexity from one to two rules. Out of the 34, 21 problems are shared with Ellis et al. (2019), which we use as the baseline for inference times.

Following the textbooks, these problems are further subdivided into 10 matrix problems, 20 alternation problems, and 4 supervised problems. The matrix problems follow the format presented in Sec. 2. The alternation and supervised problems are easier, as we are given more information about the underlying form. For alternation problems, we are essentially given a set of choices for what the underlying form might be, and for supervised problems the underlying form is given exactly. These problems include examples of phones in complementary distribution. Our problem decomposition allows us to switch out underlying form inference to handle different kinds of input. According to this classification, the flapping lexical database is an alternation problem and the verbs database is a matrix problem.

5 Experiments

We evaluate our system on the two categories of data sets discussed in Sec. 4. We split the 34 textbook problems into 24 development and 10 test problems. Our system has several free parameters (most importantly, the simplicity weight ws). These were hand-tuned on all of the data except the test problems. For the test problems, we only added missing sounds to the inventory as needed. The 10 test problems are all alternation problems. We leave for future work the investigation of these hyperparameter settings on new matrix problems.

5.1 Lexical Database Experiments

We evaluate our system on two large English datasets, one demonstrating flapping, and the other verbs. For each dataset, we learn a rule set from 20, 50 and 100 training examples.
    Data Set    Precision       Recall          Rule Match      UF
                SP     SP-      SP     SP-      SP     SP-
    Flap 20     76     52       50     66       31     25       100
    Flap 50     93     79       86     71       86     71       100
    Flap 100    100    79       100    71       100    71       100
    Verb 20     86     73       48     42       83     61       100

        Data Set     Inferred Rule
    1   Flap 20      [+cor −voi] → [+approx] / [+1stress] _
    2   Flap 50      [+cor −voi −del. rel.] → [+voi +approx] / [+stress] _ [+syll]
    3   Flap 100     [+ant −voi −del. rel.] → [+voi +approx] / [+stress] _ [+syll]
             Accuracy                       Rule Match       UF
             Precision       Recall
             SP     SP-      SP     SP-    SP     SP-       SP
    MAT      100    100      70     69     77     69        100
    ALT      100    100      66     61     71     62        100
    SUP      100    100      63     60     71     64        -
    TEST     100    100      54     52     61     48        100

    Table 6: Accuracy of textbook problems. We use (-) for supervised problems without underlying form inference.

             Inference Time (secs)
             SYPHON    Baseline    Speedup
    MAT      30.0      3100        124.6
    ALT      10.7      N/A         N/A
    SUP      5.3       6333        378.3
    TEST     8.3       N/A         N/A

    Table 7: Comparison of the inference times of textbook problems with baseline. We report the median execution times and geometric mean of the speedups. N/A indicates examples where baseline results are unavailable.

6 Related Work

Learning (morpho-)phonology is a rich and active area of research; in this overview, we focus on approaches that share our goal of inferring fully interpretable models of phonological processes. Most closely related to ours is the work of Ellis et al. (2015) and its (unpublished) follow-up (Ellis et al., 2019) on using program synthesis to infer phonological rules. As mentioned above, the main difference is that SYPHON is two orders of magnitude faster than their system thanks to a novel decomposition and efficient SMT encoding. On the other hand, we impose extra restrictions on the hypothesis space (i.e. we only support local rules), which means that SYPHON is unable to solve some of the harder textbook problems that Ellis et al. (2019) can solve. In addition, Ellis et al. (2019) propose a method for inducing phonological representations which are universal across languages.

Beyond program synthesis, Rasin et al. (2017) use a comparable description length-based approach to unsupervised joint inference of underlying phonological forms and rewrite rule representations of phonological processes, but use a genetic algorithm to find approximate solutions. Gildea and Jurafsky (1996) and Chandlee et al. (2014) discuss supervised learning of restricted classes of finite-state transducer representations of several phonological processes (including English flapping). To date, such work either requires thousands of training observations (Gildea and Jurafsky, 1996) or has used abstracted and greatly simplified symbol inventories and training data (Chandlee et al., 2014).

Hayes and Wilson (2008), Goldsmith and Riggle (2012), and Futrell et al. (2017) propose different methods for learning probabilistic models of phonotactics, which represent gradient co-occurrence restrictions between surface segments within a word. Unlike the current implementation of SYPHON, these models include representational structures that enable them to capture certain non-local phenomena. However, because these models focus on phonotactics, they do not infer underlying forms or rules which relate underlying forms to surface forms.

Finally, much work has focused on learning representations of phonological processes as mappings that minimally violate a set of ranked or weighted constraints (Prince and Smolensky, 2004; Legendre et al., 1990), but such work has generally taken the constraint definitions as given and focused on learning rankings or weights (see e.g. Goldwater and Johnson, 2003; Tesar and Smolensky, 2000; Boersma and Hayes, 2001), with some exceptions (Doyle et al., 2014; Doyle and Levy, 2016).

7 Conclusion

We have presented a new approach to learning fully interpretable phonological rules from sets of related surface forms. We have shown that our approach produces rules that largely match linguists' intuition from a handful of examples and within minutes. The contributions of this paper are a novel decomposition of the global inference problem into three local problems, as well as an encoding of these problems into constraints that can be efficiently solved by an SMT solver.

References
Adam Albright and Bruce Hayes. 2002. Modeling English past tense intuitions with minimal generalization. In Proceedings of the 2002 Workshop on Morphological Learning, Association for Computational Linguistics.

Rajeev Alur, Rastislav Bodík, Garvit Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. 2013. Syntax-guided synthesis. In Formal Methods in Computer-Aided Design, FMCAD 2013, pages 1–8.

Rajeev Alur, Arjun Radhakrishna, and Abhishek Udupa. 2017. Scaling enumerative program synthesis via divide and conquer. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 319–336. Springer.

R. Harald Baayen, Richard Piepenbrock, and H. van Rijn. 1993. The CELEX lexical database on CD-ROM.

Nikolaj Bjørner, Anh-Dung Phan, and Lars Fleckenstein. 2015. νZ - an optimizing SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, pages 194–199.

Paul Boersma and Bruce Hayes. 2001. Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry, 32(1):45–86.

Jane Chandlee, Rémi Eyraud, and Jeffrey Heinz. 2014. Learning strictly local subsequential functions. Transactions of the Association for Computational Linguistics, 2:491–504.

Noam Chomsky and Morris Halle. 1968. The Sound Pattern of English. Studies in Language. Harper & Row.

Leonardo Mendonça de Moura and Nikolaj Bjørner. 2008. Z3: an efficient SMT solver. In TACAS, volume 4963 of LNCS, pages 337–340. Springer.

Gabriel Doyle, Klinton Bicknell, and Roger Levy. 2014. Nonparametric learning of phonological constraints in Optimality Theory. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 1094–1103.

Gabriel Doyle and Roger Levy. 2016. Data-driven learning of symbolic constraints for a log-linear model in a phonological setting. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2217–2226.

Greg Durrett and John DeNero. 2013. Supervised learning of complete morphological paradigms. In Proceedings of the North American Chapter of the Association for Computational Linguistics.

Kevin Ellis, Adam Albright, Armando Solar-Lezama, Joshua B. Tenenbaum, and Timothy J. O'Donnell. 2019. Synthesizing theories of human language with Bayesian program induction. In preparation.

Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, and Josh Tenenbaum. 2018. Learning to infer graphics programs from hand-drawn images. In Advances in Neural Information Processing Systems 31, pages 6059–6068. Curran Associates, Inc.

Kevin Ellis, Armando Solar-Lezama, and Josh Tenenbaum. 2015. Unsupervised learning by program synthesis. In Advances in Neural Information Processing Systems, pages 973–981.

Richard Futrell, Adam Albright, Peter Graff, and Timothy J. O'Donnell. 2017. A generative model of phonotactics. Transactions of the Association for Computational Linguistics, 5:73–86.

Daniel Gildea and Daniel Jurafsky. 1996. Learning bias and phonological-rule induction. Computational Linguistics, 22(4):497–530.

John Goldsmith and Jason Riggle. 2012. Information theoretic approaches to phonological structure: the case of Finnish vowel harmony. Natural Language & Linguistic Theory, 30(3):859–896.

Sharon Goldwater and Mark Johnson. 2003. Learning OT constraint rankings using a maximum entropy model. In Proceedings of the Stockholm Workshop on Variation within Optimality Theory, pages 111–120.

Noah D. Goodman, Joshua B. Tenenbaum, Jacob Feldman, and Thomas L. Griffiths. 2008. A rational analysis of rule-based concept learning. Cognitive Science, 32(1):108–154.

Dan Gusfield. 1997. Algorithms on Strings, Trees and Sequences. Cambridge University Press.

Carlos Gussenhoven and Haike Jacobs. 2017. Understanding Phonology. Routledge.

Bruce Hayes and Colin Wilson. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry, 39(3):379–440.

Géraldine Legendre, Yoshiro Miyata, and Paul Smolensky. 1990. Harmonic Grammar – a formal multi-level connectionist theory of linguistic well-formedness: theoretical foundations. Technical Report ICS #90-5, CU-CS-465-90, University of Colorado.

David Odden. 2005. Introducing Phonology. Cambridge University Press.

José Oncina, Pedro García, and Enrique Vidal. 1993. Learning subsequential transducers for pattern recognition interpretation tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(5):448–458.

Alan Prince and Paul Smolensky. 2004. Optimality Theory: Constraint Interaction in Generative Grammar. Wiley-Blackwell.

Ezer Rasin, Iddo Berger, Nur Lan, and Roni Katzir. 2017. Acquiring opaque phonological interactions using Minimum Description Length. In Supplemental Proceedings of the 2017 Annual Meeting on Phonology.

Iggy Roca and Wyn Johnson. 1999. A Workbook in Phonology. Blackwell.

Rohit Singh, Venkata Vamsikrishna Meduri, Ahmed K. Elmagarmid, Samuel Madden, Paolo Papotti, Jorge-Arnulfo Quiané-Ruiz, Armando Solar-Lezama, and Nan Tang. 2017. Synthesizing entity matching rules by examples. PVLDB, 11(2):189–202.

Armando Solar-Lezama. 2013. Program sketching. International Journal on Software Tools for Technology Transfer, 15(5-6):475–495.

Bruce Tesar and Paul Smolensky. 2000. Learnability in Optimality Theory. MIT Press.

Abhinav Verma, Vijayaraghavan Murali, Rishabh Singh, Pushmeet Kohli, and Swarat Chaudhuri. 2018. Programmatically interpretable reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning, pages 5052–5061.

R. L. Weide. 2014. The CMU pronouncing dictionary. Release 0.7b.