An Introduction to Multiple Context Free Grammars for Linguists

Alexander Clark

September 20, 2014
This is a gentle introduction to Multiple Context Free Grammars (MCFGs), intended for linguists who are familiar with context free grammars and movement based analyses of displaced constituents, but unfamiliar with Minimalist Grammars or other mildly context-sensitive formalisms.¹

¹ I originally wrote this for LOT 2012. This is a lightly edited version with a few corrections.
Introduction
      S
   a  S  b
      a  b
We can view the CF production S → aSb as an instruction to build the tree starting from the top and going down: that is to say, if we have a partial tree like this:

      S
   a  [S]  b

we can view the productions as saying that we can replace the bracketed S node in the tree with a subtree either of the same type or of the type:

      S
   a     b
An alternative way is to view it as a bottom-up construction. We can think of the production S → ab as saying, not that we can rewrite S as ab, but rather that we can take an a and a b and we can stick them together to form an S. (The switch to bottom-up derivations is also a part of the Minimalist Program.) That is to say, the production S → aSb says that we can take the three chunks

a  and  w  and  b

and combine them to form a new unit like this:

      S
   a  S  b
      w
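This bottom-up reading is easy to make concrete in code. The following is a minimal sketch (the function names are invented for illustration), with one function per production of the grammar:

```python
# Bottom-up reading of the CFG with productions S -> ab and S -> aSb:
# each production is an instruction for building a bigger S from pieces.

def s_from_ab():
    """S -> ab: take an 'a' and a 'b' and stick them together."""
    return "a" + "b"

def s_wrap(w):
    """S -> aSb: take an 'a', an already built S (the string w), and a 'b'."""
    return "a" + w + "b"

s = s_from_ab()   # 'ab'
s = s_wrap(s)     # 'aabb'
s = s_wrap(s)     # 'aaabbb'
```

Each call builds a larger S out of smaller, already completed units, which is exactly the bottom-up view of the derivation.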
S(aabb). (1)

If u is a P and v is a Q, then uv is an N. Writing this using the predicate notation, this just says: P(u) and Q(v) implies N(uv). We will write this therefore as the following production:

N(uv) ← P(u), Q(v)

Note that the terminal symbols a and b end up here on the left hand side of the rule. This is because we know that an a is an a! We could define predicates like A, which is true only of a, and then have the rule S(uwv) ← A(u), S(w), B(v), together with the rules with an empty right hand side, A(a) and B(b). Here A and B would be like preterminals in a CFG, or lexical categories in some sense:
      S
   A  S  B
   a      b
Let us consider a point which is kind of trivial here but becomes nontrivial later. Compare the two following rules:

N(uv) ← P(u), Q(v)
N(uv) ← Q(v), P(u)

If you think about this for a moment, they say exactly the same thing. Two context free rules N → PQ and N → QP are different, because we express the two different concatenations uv versus vu by the order of the nonterminals on the right hand side of the rule. In the case of this new format, we express the concatenation explicitly using variables. This means we have a little bit of spurious ambiguity in the rules. We can stipulate that we only want the first one, for example by requiring that the order of the variables on the left hand side must match the order of the variables on the right hand side.
Think a little more here: if we have two strings u and v, there are basically two ways to stick them together: uv and vu. We can express this easily using the standard CFG rule, just by using the two orders PQ and QP. But are there only two ways to stick them together? What about uuv? Or uuuu? Or uvuvvuuu? Actually there are, once we broaden our horizons, a whole load of different ways of combining u and v beyond the two trivial ones uv and vu. But somehow the first two ways are more natural than the others, because they satisfy two conditions: first, we don't copy any of them, and secondly we don't discard any of them. That is to say, the total length of uv and vu is going to be just the sum of the lengths of u and v, no matter how long u and v are. Each variable on the right hand side of the rule occurs exactly once on the left hand side of the rule, and there aren't any other variables. uuuv is bad because we copy u several times, and uu is bad because we discard v (and copy u). We will say that concatenation rules that satisfy these two conditions are linear.

So the rule N(uv) ← P(u), Q(v) is linear, but N(uvu) ← P(u), Q(v) is not.
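The linearity condition is purely a matter of counting variable occurrences, so it can be checked mechanically. Here is a small sketch (the representation of rules is invented for illustration): a rule is given by the sequence of variables in its left-hand-side concatenation and the sequence of variables introduced on the right-hand side.

```python
def is_linear(lhs_vars, rhs_vars):
    # A rule is linear iff each right-hand-side variable occurs exactly
    # once on the left-hand side (no copying) and the left-hand side
    # contains no other variables (no discarding, no new variables).
    return sorted(lhs_vars) == sorted(rhs_vars)

print(is_linear(list("uv"), ["u", "v"]))    # N(uv):  linear -> True
print(is_linear(list("uvu"), ["u", "v"]))   # N(uvu): copies u -> False
print(is_linear(list("uu"), ["u", "v"]))    # N(uu):  discards v -> False
```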
Summarising, the grammar for a^n b^n above just has two productions, which we write as S(ab) and S(awb) ← S(w).
When we have a derivation tree for a string like aabb, we can write this with each label of the tree being a nonterminal, as we did earlier (this is the standard way), or we can write it where each node is labeled with the string that that subtree derives, or we can label them with both.
      S              aabb              S(aabb)
   a  S  b         a  ab  b         a  S(ab)  b
      a  b            a  b              a  b
The last of these three trees has the most information, but if we
have long strings, then the last two types of tree will get long and
unwieldy.
Finally, when we look at the last two trees, we can see that we
have a string attached to each node of the tree. At the leaves of the
tree we just have individual letters, but at the nonleaf nodes we have
sequences that may be longer. Each of these sequences corresponds
to an identifiable subsequence of the surface string. That is to say, when we consider the node labeled S(ab) in the final tree, the ab is exactly the ab in the middle of the string.
      S(aabb)
   a  S(ab)  b
      a  b

   a  a  b  b
[NP the [N book [RP [R that] [S [NP I] [VP [V read] [t e]]]]]]
of strings will correspond to the two parts of this: the first element of the pair will correspond to the segment of the constituent to the left of the gap (before the gap), and the second element of the pair will correspond to the segment after the gap. This is not the only interpretation: we might also have some construction that would be modeled as movement; in that case we might have one part of the pair representing the moving constituent, and the other part representing the constituent that it is moving out of.
So in our toy examples above we had a nonterminal N, and we would write N(w) to mean that w is an N. We will now consider a two-place predicate on strings, that applies not just to one string, but to two strings. So we will have some predicates, say M, which are two place predicates, and we will write M(u, v) to mean that the ordered pair of strings u, v satisfies the predicate M. Rather than having a nonterminal generating a single string, S(aabb), we will have a nonterminal that might generate a pair of strings, M(aa, bb). Since we want to define a formal language, which is a set of strings, we will always want to have some normal predicates which just generate strings, which will include S. Terminologically, we will say that all predicates have a dimension: one-place predicates have dimension 1 and two-place predicates have dimension 2, which we will indicate with a subscript, as in S1 and N2.

S1(uv) ← N2(u, v). (8)

N2(au, bv) ← N2(u, v). (9)

N2(a, b). (10)

Finally we have a trivial rule: this just asserts that (a, b) is generated by N2. This is just an analog of the way that S → ab becomes S(ab), but in this case we have an ordered pair of strings rather than a single string.
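To see the pair-building in action, here is a sketch (the function names are invented for illustration) of a small grammar of exactly this kind: a dimension-2 predicate N2 that generates pairs of strings, and an S1 that concatenates the two halves into a single string.

```python
# N2 builds pairs of strings; S1 glues the two components together.

def n2_base():           # the trivial rule: N2 generates the pair (a, b)
    return ("a", "b")

def n2_step(pair):       # prefix an 'a' to the first component
    u, v = pair          # and a 'b' to the second
    return ("a" + u, "b" + v)

def s1_rule(pair):       # concatenate the two components into one string
    u, v = pair
    return u + v

p = n2_step(n2_step(n2_base()))   # ('aaa', 'bbb')
print(s1_rule(p))                 # aaabbb
```

Note that the a's and b's in the two components grow in lockstep: this is the source of the cross-serial pairing discussed below.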
An MCFG consists, just like a CFG, of a collection of nonterminals and some productions. Let's look at the grammar that has just these three rules. What language does it define? Just as with a CFG, we consider the language defined by an MCFG to be the set of strings generated by the symbol S1: since this has dimension 1 by definition, we know that it generates a set of strings and not a set of pairs of strings. N2, on the other hand, generates a set of pairs of strings. Let us think what this set is.
In the derivation tree for a CFG rule like S → aSb, the daughters in the tree occur in exactly the order of the symbols on the right hand side, aSb. But as we saw earlier, we don't have such a rigid notion of order, as the order is moved into the specific function that we put on the left hand side of the rule. So we could write it in a number of ways:
     S1                 S1                 S1
     N2                 N2                 N2
  a  b  N2           a  N2  b           N2  a  b
     a  b  N2           a  N2  b           N2  a  b
        a  b               a  b               a  b
This difference doesn't matter. What matters crucially is this: while we have defined exactly the same language as the CFG earlier, the structures we have defined are completely different. In each local tree we have an a and a b, just as with the CFG, but think about which pairs of a's and b's are derived simultaneously. To see this easily, we can index the a's and b's in the derivation tree:
  N2
  a1  b1  N2
          a2  b2  N2
                  a3  b3
Working bottom up in this tree, we see that the bottommost N2 derives the pair (a3, b3), the next one up derives (a2a3, b2b3), and the top one derives (a1a2a3, b1b2b3); thus S1 derives a1a2a3b1b2b3. Thus we have cross-serial dependencies in this case, as opposed to the hierarchical dependencies.
We can write this into the tree like this, as what is sometimes called a value tree.

  S1(aaabbb)                     a1a2a3b1b2b3
    N2(aaa, bbb)                   (a1a2a3, b1b2b3)
      a  b  N2(aa, bb)               a1  b1  (a2a3, b2b3)
            a  b  N2(a, b)                 a2  b2  (a3, b3)
                  a  b                           a3  b3
Let us now draw the links from the nodes in the derivation trees to
the symbols in the surface string as before.
  S1(aaabbb)
    N2(aaa, bbb)
      a  b  N2(aa, bb)
            a  b  N2(a, b)
                  a  b

  a  a  a  b  b  b
Because we have restricted the rules in this way, we can identify for each node in the derivation tree a subsequence (which might be discontinuous) of symbols in the surface string.
Clearly, this gives us some additional descriptive power. Let's look at a larger class of productions now. Suppose we have three nonterminals of dimension 2: N2, P2 and Q2, and consider productions that form an N2 from a P2 and a Q2. A simple example looks like this:

[Figure: a derivation tree in which a node N of dimension 2, embedded below the root, derives the pair (u, v).]
We know that the final string will look like lumvr: that is to say, we know that there will be a substring u and a substring v occurring in that order, together with some other material before, between and after them. This is the crucial point from a linguistic point of view: (u, v) forms a discontinuous constituent. We can simultaneously construct the u and the v even if they are far apart from each other. From a semantic viewpoint this gives us a domain of locality which is not local with respect to the surface order of the string; this means that, for semantic interpretation, a word can be local at two different points in the string at the same time.
Crucially for the computational complexity of this, the formalism maintains the context free property at a more abstract level. That is to say, the validity of a step in the derivation does not depend on the context that the symbol occurs in, only on the symbol that is being processed. Thinking of it in terms of trees, this means that if we have a valid derivation with a subtree headed by some symbol, whether it is of dimension one or two, we can freely swap that subtree for any other subtree that has the same symbol as its root, and the result will also be a valid derivation.
Non-context-free languages
The language we defined before was a context-free language and could be defined by a context free grammar, so in spite of the fact that the structures we got were not those that could be defined by a context free grammar, it wasn't really exploiting the power of the formalism. Let's look at a classic example, indeed the classic example, of a linguistic construction that requires more than context free power: the famous case of cross-serial dependencies in Swiss German.⁸

⁸ S. Shieber. Evidence against the context-freeness of natural language. Linguistics and Philosophy, 8:333–343, 1985.
The figure above shows the dependencies in the standard exam-
ple. The crucial point is that we have a sequence of noun phrases
that are marked for either accusative or dative case, and a sequence
of verb phrases that require arguments that are marked for either
accusative or dative case, and, finally, the dependencies overlap in
the way shown in the diagram. To get this right requires the sort of MCFG apparatus that we have developed.

(6) ... das mer em Hans es huus halfed aastriiche
    ... that we Hans-DAT house-ACC helped paint
    ... that we let the children help Hans paint the house

The argumentation goes as follows: assume that L is context-free. Then the intersection of a regular language with the image of L under a homomorphism must be context-free as well. Find a homomorphism and a regular language such that the result is a non-context-free language. Contradiction.

Swiss German allows constructions of the form (Jan sait) das mer (dchind)^i (em Hans)^j es huus haend wele (laa)^i (halfe)^j aastriiche. In these constructions the number of accusative NPs dchind must equal the number of verbs (here laa) selecting for an accusative, and the number of dative NPs em Hans must equal the number of verbs (here halfe) selecting for a dative object. Furthermore, the order must be the same, in the sense that if all accusative NPs precede all dative NPs, then all verbs selecting an accusative must precede all verbs selecting a dative.

The homomorphism:

f(dchind) = a     f(Jan sait das mer) = w
f(em Hans) = b    f(es huus haend wele) = x
f(laa) = c        f(aastriiche) = y
f(halfe) = d      f(s) = z otherwise

A further possible sentence:

(8) ... das mer dchind em Hans es huus haend wele laa halfe aastriiche
    ... that we the children-ACC Hans-DAT house-ACC have wanted let help paint
    ... that we have wanted to let the children help Hans paint the house

Let's abstract this a little bit and consider a formal language for this non context free fragment of Swiss German. We consider that we have the following words or word types: Na and Nd, which are respectively accusative and dative noun phrases; Va and Vd, which are verb phrases that require accusative and dative noun phrases respectively; and finally C, which is a complementizer which appears at the beginning of the clause. Thus the language we are looking at consists of sequences like

(14) CNa Va
(15) CNd Vd
(16) CNa Na Nd Va Va Vd

but crucially does not contain examples where the sequence of accusative/dative markings on the noun sequence is different from the sequence of requirements on the verbs. So it does not contain CNd Va, because the verb requires an accusative and it only has a dative, nor does it include CNa Nd Vd Va, because though there are the right number of accusative and dative arguments (one each), they are in the wrong order, the reverse order. (Actually this is incorrect: according to Shieber the nested orders are acceptable as well. We neglect this for ease of exposition.)

We can write down a simple grammar with just two nonterminals: we will have one nonterminal which corresponds to the whole clause, S1, and one nonterminal that sticks together the nouns and the verbs, which we will call T2. No linguistic interpretation is intended for these two labels; they are just arbitrary symbols. As the subscripts indicate, the S1 is of dimension 1 and just derives whole strings, and the T2 has dimension 2, and derives pairs of strings. The first of the pair will be a sequence of Ns and the second of the pair will be a matching sequence of Vs.
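Before writing the grammar, it is worth pinning down the target language operationally. Here is a sketch of a recognizer for it (the function name and token encoding are my own): a string is in the language iff it starts with C, is followed by a block of nouns and then an equally long block of verbs, and the case subscripts match up in the same order.

```python
def in_language(tokens):
    # C, then a block of nouns, then a block of verbs of the same length,
    # with matching (cross-serial) case requirements.
    if not tokens or tokens[0] != "C":
        return False
    rest = tokens[1:]
    half, odd = divmod(len(rest), 2)
    if half == 0 or odd:
        return False
    nouns, verbs = rest[:half], rest[half:]
    if any(t not in ("Na", "Nd") for t in nouns):
        return False
    if any(t not in ("Va", "Vd") for t in verbs):
        return False
    # the i-th noun's case must match the i-th verb's requirement
    return all(n[1] == v[1] for n, v in zip(nouns, verbs))

print(in_language("C Na Na Nd Va Va Vd".split()))   # True  (example 16)
print(in_language("C Na Nd Vd Va".split()))         # False (wrong order)
```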
The grammar has the following rules:

S1(Cuv) ← T2(u, v)
T2(Na, Va)
T2(Nd, Vd)
T2(Na u, Va v) ← T2(u, v)
T2(Nd u, Vd v) ← T2(u, v)

The first rule has a right hand side which directly introduces the terminal symbol C and concatenates the two separate parts of the T2. The next two rules just introduce the simplest matching of noun and verb, and the final two give the recursive rules that allow a potentially unbounded sequence of nouns and verbs to occur. Note that in the final two rules we add the nouns and verbs to the left of the strings u and v. This is important because, as you can see by looking at the original Swiss German example, the topmost noun and verb occur on the left of the string.
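These rules translate directly into string-building functions, in the same style as before (the function names are invented, and the rules are as I have reconstructed them from the description in the text; T2 values are represented as pairs of token lists):

```python
def t2_na_va():                  # T2(Na, Va)
    return (["Na"], ["Va"])

def t2_nd_vd():                  # T2(Nd, Vd)
    return (["Nd"], ["Vd"])

def t2_add_a(pair):              # T2(Na u, Va v) <- T2(u, v)
    u, v = pair
    return (["Na"] + u, ["Va"] + v)

def t2_add_d(pair):              # T2(Nd u, Vd v) <- T2(u, v)
    u, v = pair
    return (["Nd"] + u, ["Vd"] + v)

def s1_clause(pair):             # S1(C u v) <- T2(u, v)
    u, v = pair
    return " ".join(["C"] + u + v)

# Derive example (16): start from T2(Nd, Vd) and prefix two accusatives.
t = t2_add_a(t2_add_a(t2_nd_vd()))
print(s1_clause(t))              # C Na Na Nd Va Va Vd
```

Note how each recursive rule extends both halves of the pair at once; this is what keeps the nouns and the verbs in matching order.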
Let us look at how this grammar will generate the right strings and structures for these examples. We will put some derivation trees for the examples above, together with the matching value trees on the right. We will start with the two trivial ones, which aren't very interesting.

First the example CNa Va:

  S1(CNa Va)                 CNa Va
    C  T2(Na, Va)              C  (Na, Va)
         Na  Va                     Na  Va

Now we have the example CNd Vd:

  S1(CNd Vd)                 CNd Vd
    C  T2(Nd, Vd)              C  (Nd, Vd)
         Nd  Vd                     Nd  Vd
Now let's consider the longer example with three pairs: CNa Na Nd Va Va Vd. We have a derivation tree that looks like the following diagram.

  S1(CNa Na Nd Va Va Vd)
    C  T2(Na Na Nd, Va Va Vd)
         Na  Va  T2(Na Nd, Va Vd)
                   Na  Va  T2(Nd, Vd)
                             Nd  Vd
This is a bit cluttered, so in the next diagram of the same sentence we have drawn dotted lines from the terminal symbols in the derivation tree to the symbols in their positions in the original string, so that the structure of the derivation can be seen more easily.
  S1
    C  T2
         Na  Va  T2
                   Na  Va  T2
                             Nd  Vd

  C  Na  Na  Nd  Va  Va  Vd
We also do the same thing, but where we mark one of the nonterminals, here enclosed in brackets, and also mark in brackets the discontinuous constituent that is derived from that nonterminal.

  S1
    C  T2
         Na  Va  [T2]
                   Na  Va  T2
                             Nd  Vd

  C  Na  [Na  Nd]  Va  [Va  Vd]
Relative clauses

[Figure: trees for the NP "the book that I read", built from the pair (that I read, book) and the strings "I read book" and "read book".]
In fact we will make a minor notational difference and switch the orders of the moving and head labels, so that they reflect the final order in the surface string. (This is because we want to use only non-permuting rules. This means that when we look at Figure 4, the diagram on the right hand side has concatenations that do not permute.)

[Figure: a tree with nodes I, N, V deriving the string "I book read".]
We can now write down the sorts of rules we need here: for each local tree, assuming that the not very plausible labels on the trees are ok, we write down a rule. The nonterminal symbols in the MCFG will just be one for each label that we get in the tree. Note that these correspond in some sense to tuples of more primitive labels. Thus the node in the tree labeled N, S will turn into an atomic symbol, NS, in the MCFG we are writing, but it has some internal structure that we flatten out: it is an S with an N that is moving out of it.
element using the symbol T1 for trace, we can define a rule T1(ε) that introduces the empty string ε. We then define a nonterminal of dimension 2, NT2, which generates the moving word and the trace. We can then rewrite the rule above using these two rules.
Discussion
Acknowledgments
References