Dependency Grammars and Context-Free Grammars
Steven Abney
University of Tübingen
The question arises from time to time what the relation is between depen-
dency grammars (DG’s) and phrase-structure grammars. A classic paper by
Gaifman [1] would appear to have laid the issue to rest, by proving that depen-
dency grammars are a special case of context-free grammars (CFG’s). Gaifman
proves that dependency grammars are equivalent to a proper subset of phrase-
structure grammars, those of degree ≤ 1, which I will dub d1-CFG’s. (As degree
cannot be explained in a few words, I will leave it undefined for the moment.)
Under a weaker notion of correspondence, which Gaifman attributes to Hays
[2], dependency grammars correspond to finite-degree CFG’s, which represent
a larger subset of CFG’s, but a proper subset nonetheless.
I submit, however, that Gaifman correlates DG’s with a proper subset of
CFG’s only by suppressing the essential property of DG’s, namely, their head-
edness. I would like to show that, if we take headedness seriously, we are led
to the conclusion that DG’s and CFG’s both represent equivalence classes of
what I will call headed context-free grammars (HCFG’s), but that neither DG’s
nor CFG’s subsume the other. Nonetheless, Gaifman’s result is preserved in
a different form, in that the equivalence classes defined by CFG’s include all
HCFG’s, but the equivalence classes defined by DG’s include only finite-degree
HCFG’s.
In particular, each HCFG has a unique characteristic grammar, a CFG that
abstracts away from the choice of heads in the HCFG. Each HCFG also has
a unique projection grammar, a DG representing the dependencies among pro-
jections in the headed trees generated by the HCFG. The projection grammar
abstracts away from the order in which dependent projections are combined
with governing projections. Each relation defines equivalence classes: the class
of HCFG’s having the same characteristic grammar, the class of HCFG’s having
the same projection grammar. But the equivalence classes are incomparable.
There are HCFG’s that have the same characteristic grammar, but different pro-
jection grammars; and there are HCFG’s with the same projection grammar,
but different characteristic grammars.
To flesh out this sketch, we need to define some terms. A DG is a tuple
G = (Σ, P, S), where Σ is a set of word categories, P is a set of productions,
and S ⊆ Σ is the set of start symbols. Productions are of the form X(α; β),
where X is a category, and α, β are sequences of categories. The productions
license dependency trees. A dependency tree is licensed by G iff every node
is licensed by some production of G. A production X(Y₁, …, Yₘ; Z₁, …, Zₙ)
licenses a node iff the node is of category X, its left dependents are of categories
Y₁, …, Yₘ, in that order, and its right dependents are of categories Z₁, …, Zₙ,
in that order.
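To make the definition concrete, here is a small illustrative sketch in Python; the representation and names (productions as triples, nodes as triples, licensed) are mine, not part of the formal development.

    # A production X(alpha; beta) is a triple (X, alpha, beta), where
    # alpha and beta are tuples of categories.  A dependency-tree node
    # is a triple (category, left-dependents, right-dependents).

    def licensed(node, productions):
        """A tree is licensed iff every node matches some production:
        its category is X and the categories of its left and right
        dependents, in order, are alpha and beta."""
        cat, left, right = node
        key = (cat,
               tuple(d[0] for d in left),
               tuple(d[0] for d in right))
        return key in productions and all(
            licensed(d, productions) for d in left + right)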
To relate a dependency tree to a sequence of words, we also require a lexicon,
which assigns words to categories. For example, the following DG-cum-lexicon
licenses (generates) the dependency tree (1), among others. (In the tree diagrams
below, a dependency node is written word:category.)
    X(W; ε)        x ∈ X        S = {Y}
    Y(X; Z)        y ∈ Y
                   z ∈ Z
                   w ∈ W
(1)        y:Y
         /     \
      x:X       z:Z
       |
      w:W
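Continuing the sketch above, the example grammar and tree (1) can be written out directly. I add productions W(ε; ε) and Z(ε; ε) for the leaf categories, which the listing above leaves implicit; the lexicon plays no role in licensing, so it is omitted here.

    productions = {
        ('X', ('W',), ()),      # X(W; ε)
        ('Y', ('X',), ('Z',)),  # Y(X; Z)
        ('W', (), ()),          # leaf categories take no dependents
        ('Z', (), ()),
    }

    w = ('W', (), ())
    x = ('X', (w,), ())         # x has left dependent w
    z = ('Z', (), ())
    tree1 = ('Y', (x,), (z,))   # tree (1): y governs x and z

    assert licensed(tree1, productions)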
(2)  [Y [X [W w] [*X x]] [*Y y] [Z z]]
Gaifman relates dependency trees to phrase-structure trees by a simple induction. The induced
phrase-structure tree is identical to the dependency tree, with the exception that
extra terminal nodes are introduced for internal nodes in the dependency tree.
This is necessary because only terminal nodes correspond to words in a phrase-
structure tree, whereas in a dependency tree, no distinction is made between
terminal and nonterminal nodes. (2) is the tree induced by (1); *X and *Y are
new categories representing the heads of X and Y , respectively.
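The induction is easy to state as code; the sketch below (the function name is mine) works over the category-only trees used above, with words left to the lexicon, and uses the starred categories of Gaifman's construction.

    def induce(node):
        """Map a dependency tree to its induced phrase-structure tree:
        an internal node of category X becomes a phrase X with an extra
        head child of new category *X; a leaf stays a plain terminal."""
        cat, left, right = node
        if not left and not right:
            return cat
        return (cat, tuple(induce(d) for d in left)
                     + ('*' + cat,)
                     + tuple(induce(d) for d in right))

    # induce(tree1) == ('Y', (('X', ('W', '*X')), '*Y', 'Z')),
    # i.e. tree (2) with the lexicon left aside.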
This seems innocuous enough, but it has serious consequences. Gaifman
proceeds to define two grammars to be equivalent if they generate/induce the
same set of unlabelled trees. But as Gaifman himself points out, dependency
trees that differ not only in labels, but also in structure, induce the same unla-
belled phrase-structure trees. For example, the dependency trees (3a) and (3b)
induce the same unlabelled phrase-structure tree (3c).
(3) a.     y:Y
         /     \
      x:X       z:Z
       |
      w:W

    b.     z:Z
         /     \
      w:W       y:Y
        \
         x:X

    c.  [ [ [w] [x] ] [y] [z] ]
Gaifman justifies his decision by pointing out that different dependency trees
induce different labelled phrase-structure trees. For example, the dependency
trees in (3) induce the labelled trees in (4).
(4) a.  [Y [X [W w] [*X x]] [*Y y] [Z z]]

    b.  [Z [W [*W w] [X x]] [Y y] [*Z z]]
This justification is weaker than it may appear. Since the DG’s and the d1-CFG’s are both countably infinite classes of grammars
(assuming fixed, infinite sets of categories), we know that equivalence relations
between them exist, and it’s not too surprising if we can construct one, with a
bit of cleverness. But we could equally well construct an equivalence relation
to show that the class of CFG’s is equivalent to some proper subset of DG’s.
For example, we could convert each CFG to Greibach normal form, following a
convention regarding category names to guarantee that each CFG is mapped to
a different GNF grammar, then map the GNF grammar to a DG in the obvious
way. This construction would permit us to say that CFG’s are equivalent to
a proper subset of the DG’s, namely, those in which no production has any
left-dependents.
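One way to spell out “the obvious way,” as a sketch: the construction below is my reconstruction, not given in the text. It takes one DG category per GNF production, so that category names record enough of the derivation to keep the map one-one.

    from itertools import product

    def gnf_to_dg(gnf, start):
        """gnf: a set of GNF productions (A, a, (B1, ..., Bn)).  Each
        production p becomes a DG category; its terminal a is p's sole
        lexical item; and p licenses nodes with no left dependents and
        right dependents q1 ... qn, where qi is any production whose
        left-hand side is Bi."""
        dg = set()
        for p in gnf:
            _, _, rhs = p
            options = [[q for q in gnf if q[0] == B] for B in rhs]
            for deps in product(*options):
                dg.add((p, (), deps))            # production p(ε; deps)
        lexicon = {p: p[1] for p in gnf}         # category p licenses word a
        starts = {p for p in gnf if p[0] == start}
        return dg, lexicon, starts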
In brief, mapping dependency trees to phrase-structure trees is interesting
only if no significant properties of the dependency trees are suppressed. Since the
distinction between a node and its dependents is clearly a significant property
of dependency trees, we need to preserve it in the mapping.
Therefore, instead of mapping dependency trees to phrase-structure trees
simpliciter, let us map dependency trees to headed phrase-structure trees. In
a headed phrase-structure tree, a unique child of each node is distinguished as
the head. Formally, we define a node (without label) as a pair (α, i), where
α (a sequence of nodes) is the node’s children, and i is the index of the head
node. Notationally, Gaifman’s star convention is as good as any other, so we will
write headed phrase-structure trees as in (4).¹ However, the stars are now to be
understood not as part of the category-name, but as a mark distinguishing the
head node. The unlabelled headed tree corresponding to (4a) is not (3c), but
rather, (3c) with stars on the second and third terminal nodes. The unlabelled
headed tree corresponding to (4b) would have stars on the first and fourth
terminal nodes instead.
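For the later sketches it is convenient to have a concrete rendering. Formally a node is a pair (children, i); the sketches below carry the category along as well, writing internal nodes as (category, children, i), terminals as bare category names, and indexing from 0 rather than 1. The unlabelled headed tree just described:

    # (3c) with the heads of (4a) marked: heads on the second and third
    # terminal nodes, as nested (children, i) pairs.
    u4a = (((('w', 'x'), 1), 'y', 'z'), 1)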
To generate headed phrase-structure trees, we define a headed context-free
grammar to be a context-free grammar in which a unique element on the right-
hand side of each production is distinguished as the head. Formally, we define
productions as pairs (r, i), where r is a CFG production, and i is the index of
the head, with 1 ≤ i ≤ |r|, where |r| is the length of the right-hand side of r.
(We assume there are no epsilon productions; with a separate lexicon, we can
instead introduce an epsilon terminal category to which the lexicon assigns the
empty string.)
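In the same spirit, a headed production (r, i) can be flattened to a triple (the representation is mine); for example, the two possible head choices for the production S → a b:

    hcfg1 = {('S', ('a', 'b'), 0)}   # S -> *a b
    hcfg2 = {('S', ('a', 'b'), 1)}   # S -> a *b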
The characteristic grammar C(G) of a HCFG G is the CFG obtained by
omitting the head indices. Every HCFG has a unique characteristic grammar,
and every CFG is the characteristic grammar of at least one HCFG. C is not
one-one. More than one HCFG may have the same characteristic grammar; or
said the other way round, we may choose heads for a CFG in more than one
way.
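As code, C is just forgetting the third component, and it is visibly many-one:

    def characteristic(hcfg):
        """C(G): forget the head indices."""
        return {(lhs, rhs) for (lhs, rhs, head) in hcfg}

    # characteristic(hcfg1) == characteristic(hcfg2) == {('S', ('a', 'b'))}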
¹ Another convenient convention, which I will adopt without further comment, is to draw trees such that heads are connected to their parent by a vertical line, and non-heads are connected to their parent by oblique lines.

Two HCFG’s are characteristic-equivalent if they have the same characteristic
grammar. Two characteristic-equivalent grammars are also strongly
equivalent, in the following sense. We can map headed phrase-structure trees
to unheaded phrase-structure trees by erasing the head-markings (informally
speaking). In this manner, we can define the unheaded tree-set generated by
an HCFG by mapping each headed tree it generates to an unheaded one. An
HCFG is strongly equivalent to its characteristic grammar in the sense that
both generate the same set of unheaded trees.
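The informal erasure of head-markings, on the tree representation used above (the function name is mine):

    def erase_heads(node):
        """Map a headed tree to the corresponding unheaded
        phrase-structure tree by dropping the head index."""
        if isinstance(node, str):               # terminal category
            return node
        cat, children, _ = node
        return (cat, tuple(erase_heads(c) for c in children))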
We can also relate an HCFG to a unique DG, which I will call its projection-
dependency grammar. A projection in a headed tree is a path from a terminal
node upward, from head to parent, terminating in the root or in a node that is
not the head of its parent. The degree of a headed tree is the length of the longest projection it
contains. We define the degree of a headed grammar as the maximal degree of
any headed tree it generates. The degree of a CFG, in turn, is the minimal degree
for any choice of heads (yielding an HCFG). This is not Gaifman’s definition of
degree, inasmuch as he does not talk about headed trees or headed grammars,
but it is extensionally equivalent as applied to CFG’s.
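Projections and degree, on the same representation. I count the steps along a projection, so that the trees induced by Gaifman’s construction, whose longest projections have the form (*X, X), come out with degree 1; this is an assumption on my part, since the text leaves the unit of “length” implicit.

    def projections(node, above=()):
        """The projection-categories of a headed tree.  `above` is the
        chain of categories that this node heads, bottom-up."""
        if isinstance(node, str):                # terminal category
            return [(node,) + above]
        cat, children, head = node
        return [p for i, child in enumerate(children)
                  for p in projections(
                         child, (cat,) + above if i == head else ())]

    def degree(node):
        """Number of steps in the longest projection."""
        return max(len(p) - 1 for p in projections(node))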
We next need to assign categories to projections. Define a projection-category
to be a sequence of categories. A projection is a sequence of nodes; its projection-
category is the corresponding sequence of node categories. For example, there
are three projections in tree (5), and their projection-categories are (A), (B, D, S),
and (C), respectively.
(5)  [S [A x] [*D [*B y] [C z]]]
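On this representation, tree (5) and its projection-categories:

    t5 = ('S', ('A', ('D', ('B', 'C'), 0)), 1)    # tree (5)
    assert projections(t5) == [('A',), ('B', 'D', 'S'), ('C',)]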
Each headed tree corresponds to a unique dependency tree, which I will call its
projection-dependency tree: there is one node per projection, labelled with that
projection’s projection-category, and each projection depends on the projection
of the node to which its topmost node attaches. The projection-dependency
tree of (5) is (6).
(6)       y:BDS
         /      \
      x:A        z:C

The projection grammar Π(G) of an HCFG G is the DG whose productions are
those read off the projection-dependency trees of the headed trees G generates.²
The projection grammar abstracts away from the order in which dependent
projections combine with the governing projection: distinct headed trees may
have the same projection-dependency tree. For example, the headed trees (7a)
and (7b), with heads as marked, both have (6) as their projection-dependency
tree.
(7) a.  [S [A x] [*D [*B y] [C z]]]

    b.  [S [*D [A x] [*B y]] [C z]]
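A sketch of the correspondence, again on the representation above (the function name is mine): build the dependency node for a projection by following the chain of heads, attaching non-head children as dependents; dependents met lower on the projection sit closer to the head, which fixes their order. Trees (7a) and (7b) then collapse to (6):

    def projection_tree(node, above=()):
        """The projection-dependency tree of a headed tree: a node is
        (projection-category, left-dependents, right-dependents)."""
        if isinstance(node, str):                # terminal category
            return ((node,) + above, [], [])
        cat, children, head = node
        pcat, left, right = projection_tree(children[head], (cat,) + above)
        left = [projection_tree(c) for c in children[:head]] + left
        right = right + [projection_tree(c) for c in children[head + 1:]]
        return (pcat, left, right)

    t7a = ('S', ('A', ('D', ('B', 'C'), 0)), 1)   # tree (7a)
    t7b = ('S', (('D', ('A', 'B'), 1), 'C'), 0)   # tree (7b)
    assert projection_tree(t7a) == projection_tree(t7b)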
² Strictly speaking, the categories of a projection grammar are the projection-categories of the underlying HCFG. Not all DG’s use projection-categories as categories; but every DG is identical up to choice of category names to some DG that is a projection grammar.
Two HCFG’s are projection-equivalent if they have the same projection grammar. When two
HCFG’s are projection-equivalent, they are strongly equivalent in the sense of
generating the same (projection-)dependency trees. (This is of course a different
sense of strong equivalence from that relating HCFG’s to their characteristic
CFG’s.)
As we have seen, both C and Π induce equivalence classes of HCFG’s. But
the equivalence classes defined by C and Π are incomparable. There are HCFG’s
that have the same characteristic grammar, but different projection grammars.
Conversely, there are HCFG’s with the same projection grammar, but different
characteristic grammars. (8a) provides an example of the former; (8b) of the
latter.
(8) a.  G:     S → *a b         S → a *b
        C(G):  S → a b          S → a b
        Π(G):  aS(ε; b)         bS(a; ε)

    b.  G:     S → a *A         S → *A c
               A → *b c         A → a *b
        C(G):  S → a A          S → A c
               A → b c          A → a b
        Π(G):  bAS(a; c)        bAS(a; c)
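Each grammar in (8) generates exactly one headed tree, so the claims can be checked directly on trees, using the earlier sketches:

    # (8a): same characteristic tree, different projection trees.
    t8a1 = ('S', ('a', 'b'), 0)                    # S -> *a b
    t8a2 = ('S', ('a', 'b'), 1)                    # S -> a *b
    assert erase_heads(t8a1) == erase_heads(t8a2)
    assert projection_tree(t8a1) != projection_tree(t8a2)

    # (8b): different characteristic trees, same projection tree.
    t8b1 = ('S', ('a', ('A', ('b', 'c'), 0)), 1)   # S -> a *A, A -> *b c
    t8b2 = ('S', (('A', ('a', 'b'), 1), 'c'), 0)   # S -> *A c, A -> a *b
    assert erase_heads(t8b1) != erase_heads(t8b2)
    assert projection_tree(t8b1) == projection_tree(t8b2)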
Why does this matter? Linguists have tended to treat choices among equivalent
notations as substantive questions of fact. In the Government-Binding literature,
for example, it was argued that phrase-structure rules are redundant, given
X-bar theory and independent principles, and since then, phrase-structure rules
have been virtually taboo. Or again, in
the area of GB parsing there has been a general consensus that, with respect to
the structures assigned to sentences, a vector of parameter settings is probably
equivalent to some augmented context-free grammar. Nonetheless, the view is
adopted very persistently that the GB description is true, and the augmented
context-free description is false, and that the parser must therefore use GB
principles and parameters as its data structures.
One contributor to this attitude is perhaps Chomsky’s early characteriza-
tion of simplicity in terms of the number of symbols used in writing down a
grammar. That view of simplicity is a valid approach in the context of gram-
matical inference, where the grammar to be inferred is drawn from a specified
class of grammars, and one must choose somehow among the grammars that all
describe the input sample equally well. But Chomsky has long since abandoned
the view that grammatical inference consists in choosing from the class of TG’s
the simplest TG compatible with the input.
A grammar is a mathematical characterization of a set of syntactic struc-
tures. Any other equivalent description is to be freely used if it is more conve-
nient for some purpose. As for the syntactic structures, they are also mathe-
matical descriptions of properties of sentences, and any equivalent description
may be used if it is more convenient for establishing some result or stating some
constraint. The ability to translate among representations is itself a powerful
and desirable tool.
If the result reported here is of linguistic interest, it is because some con-
straints on syntactic structures are most easily established using projection-
dependency trees, and others are most easily established using characteristic
trees. The result does not mean that phrase-structure trees are right and de-
pendency trees are wrong. Indeed, if one is a true description of a sentence,
the other is ipso facto true; though they may be differentially useful in different
contexts.
References
[1] Haim Gaifman. Dependency systems and phrase structure systems. Technical Report P-2315, The RAND Corporation, Santa Monica, CA, May 1961.
[2] David G. Hays. Grouping and dependency theories. Technical Report P-1910, The RAND Corporation, Santa Monica, CA, 1960.