CHAPTER 12
Constituency Grammars
The study of grammar has an ancient pedigree; Panini’s grammar of Sanskrit was
written over two thousand years ago and is still referenced today in teaching San-
skrit. Despite this history, knowledge of grammar remains spotty at best. In this
chapter, we make a preliminary stab at addressing some of these gaps in our knowl-
edge of grammar and syntax, as well as introducing some of the formal mechanisms
that are available for capturing this knowledge in a computationally useful manner.
The word syntax comes from the Greek sýntaxis, meaning “setting out together
or arrangement”, and refers to the way words are arranged together. We have seen
various syntactic notions in previous chapters. The regular languages introduced in
Chapter 2 offered a simple way to represent the ordering of strings of words, and
Chapter 3 showed how to compute probabilities for these word sequences. Chap-
ter 8 showed that part-of-speech categories could act as a kind of equivalence class
for words. In this chapter and the next few we introduce a variety of syntactic phe-
nomena and models for syntax that go well beyond these simpler approaches.
The bulk of this chapter is devoted to the topic of context-free grammars. Context-
free grammars are the backbone of many formal models of the syntax of natural
language (and, for that matter, of computer languages). As such, they are integral to
many computational applications, including grammar checking, semantic interpreta-
tion, dialogue understanding, and machine translation. They are powerful enough to
express sophisticated relations among the words in a sentence, yet computationally
tractable enough that efficient algorithms exist for parsing sentences with them (as
we show in Chapter 13). In Chapter 14, we show that adding probability to context-
free grammars gives us a powerful model of disambiguation. And in Chapter 17 we
show how they provide a systematic framework for semantic interpretation.
The constituency grammars we introduce here, however, are not the only pos-
sible formal mechanism for modeling syntax. Chapter 15 will introduce syntactic
dependencies, an alternative model that is the core representation for dependency
parsing. Both constituency and dependency formalisms are important for language
processing.
In addition to introducing grammar formalism, this chapter also provides a brief
overview of the grammar of English. To illustrate our grammars, we have chosen
a domain that has relatively simple sentences, the Air Traffic Information System
(ATIS) domain (Hemphill et al., 1990). ATIS systems were an early example of
spoken language systems for helping book airline reservations. Users try to book
flights by conversing with the system, specifying constraints like I’d like to fly from
Atlanta to Denver.
12.1 Constituency
The fundamental notion underlying the idea of constituency is that of abstraction
— groups of words behaving as single units, or constituents. A significant part of
developing a grammar involves discovering the inventory of constituents present in
the language.
How do words group together in English? Consider the noun phrase, a sequence
of words surrounding at least one noun. Here are some examples of noun phrases
(thanks to Damon Runyon):
Harry the Horse
the Broadway coppers
they
a high-class spot such as Mindy’s
the reason he comes into the Hot Box
three parties from Brooklyn
What evidence do we have that these words group together (or “form constituents”)?
One piece of evidence is that they can all appear in similar syntactic environments,
for example, before a verb.
three parties from Brooklyn arrive. . .
a high-class spot such as Mindy’s attracts. . .
the Broadway coppers love. . .
they sit
But while the whole noun phrase can occur before a verb, this is not true of each
of the individual words that make up a noun phrase. The following are not grammat-
ical sentences of English (recall that we use an asterisk (*) to mark fragments that
are not grammatical English sentences):
*from arrive. . . *as attracts. . .
*the is. . . *spot sat. . .
Thus, to correctly describe facts about the ordering of these words in English, we
must be able to say things like “Noun Phrases can occur before verbs”.
Other kinds of evidence for constituency come from what are called preposed or postposed constructions. For example, the prepositional phrase on September sev-
enteenth can be placed in a number of different locations in the following examples,
including at the beginning (preposed) or at the end (postposed):
On September seventeenth, I’d like to fly from Atlanta to Denver
I’d like to fly on September seventeenth from Atlanta to Denver
I’d like to fly from Atlanta to Denver on September seventeenth
But again, while the entire phrase can be placed differently, the individual words making up the phrase cannot be:
*On September, I’d like to fly seventeenth from Atlanta to Denver
*On I’d like to fly September seventeenth from Atlanta to Denver
*I’d like to fly on September from Atlanta to Denver seventeenth
12.2 Context-Free Grammars

The most commonly used formal system for modeling constituent structure in natural language is the context-free grammar, or CFG. Context-free grammars are also called Phrase-Structure Grammars, and the formalism
is equivalent to Backus-Naur Form, or BNF. The idea of basing a grammar on
constituent structure dates back to the psychologist Wilhelm Wundt (1900) but was
not formalized until Chomsky (1956) and, independently, Backus (1959).
A context-free grammar consists of a set of rules or productions, each of which expresses the ways that symbols of the language can be grouped and ordered together, and a lexicon of words and symbols. For example, the following productions express that an NP (or noun phrase) can be composed of either a ProperNoun or
a determiner (Det) followed by a Nominal; a Nominal in turn can consist of one or
more Nouns.
NP → Det Nominal
NP → ProperNoun
Nominal → Noun | Nominal Noun
Context-free rules can be hierarchically embedded, so we can combine the pre-
vious rules with others, like the following, that express facts about the lexicon:
Det → a
Det → the
Noun → flight
The symbols that are used in a CFG are divided into two classes. The symbols
that correspond to words in the language (“the”, “nightclub”) are called terminal symbols; the lexicon is the set of rules that introduce these terminal symbols. The symbols that express abstractions over these terminals are called non-terminals. In
each context-free rule, the item to the right of the arrow (→) is an ordered list of one
or more terminals and non-terminals; to the left of the arrow is a single non-terminal
symbol expressing some cluster or generalization. Notice that in the lexicon, the
non-terminal associated with each word is its lexical category, or part-of-speech,
which we defined in Chapter 8.
A CFG can be thought of in two ways: as a device for generating sentences
and as a device for assigning a structure to a given sentence. Viewing a CFG as a
generator, we can read the → arrow as “rewrite the symbol on the left with the string
of symbols on the right”.
So starting from the symbol: NP
we can use our first rule to rewrite NP as: Det Nominal
and then rewrite Nominal as: Det Noun
and finally rewrite these parts-of-speech as: a flight
We say the string a flight can be derived from the non-terminal NP. Thus, a CFG
can be used to generate a set of strings. This sequence of rule expansions is called a
derivation of the string of words. It is common to represent a derivation by a parse tree (commonly shown inverted with the root at the top). Figure 12.1 shows the tree
representation of this derivation.
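To make the generator view concrete, here is a minimal sketch (not from the text) that encodes the rules above as a Python dictionary and rewrites the leftmost non-terminal at each step; the rule indices passed to expand are just one hypothetical way of choosing which production to apply.

```python
# A toy CFG: each non-terminal maps to a list of possible right-hand sides.
GRAMMAR = {
    "NP":      [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Nominal", "Noun"]],
    "Det":     [["a"], ["the"]],
    "Noun":    [["flight"]],
}

def expand(symbols, choices):
    """Apply one rule per step, always rewriting the leftmost non-terminal."""
    for rule_index in choices:
        for i, sym in enumerate(symbols):
            if sym in GRAMMAR:                        # found leftmost non-terminal
                symbols = symbols[:i] + GRAMMAR[sym][rule_index] + symbols[i + 1:]
                break
    return symbols

# NP => Det Nominal => a Nominal => a Noun => a flight
print(expand(["NP"], [0, 0, 0, 0]))                   # ['a', 'flight']
```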
In the parse tree shown in Fig. 12.1, we can say that the node NP dominates
all the nodes in the tree (Det, Nom, Noun, a, flight). We can say further that it
immediately dominates the nodes Det and Nom.
The formal language defined by a CFG is the set of strings that are derivable
from the designated start symbol. Each grammar must have one designated start
symbol, which is often called S. Since context-free grammars are often used to define
sentences, S is usually interpreted as the “sentence” node, and the set of strings that
are derivable from S is the set of sentences in some simplified version of English.
Figure 12.1 The parse tree for a flight, which can also be written in bracketed form as [NP [Det a] [Nom [Noun flight]]].
Let’s add a few additional rules to our inventory. The following rule expresses
the fact that a sentence can consist of a noun phrase followed by a verb phrase:
S → NP VP I prefer a morning flight
A verb phrase in English consists of a verb followed by assorted other things;
for example, one kind of verb phrase consists of a verb followed by a noun phrase:
VP → Verb NP prefer a morning flight
Or the verb may be followed by a noun phrase and a prepositional phrase:
VP → Verb NP PP leave Boston in the morning
Or the verb phrase may have a verb followed by a prepositional phrase alone:
VP → Verb PP leaving on Thursday
A prepositional phrase generally has a preposition followed by a noun phrase.
For example, a common type of prepositional phrase in the ATIS corpus is used to
indicate location or direction:
PP → Preposition NP from Los Angeles
The NP inside a PP need not be a location; PPs are often used with times and
dates, and with other nouns as well; they can be arbitrarily complex. Here are ten
examples from the ATIS corpus:
to Seattle
in Minneapolis
on Wednesday
in the evening
on the ninth of July
on these flights
about the ground transportation in Chicago
of the round trip flight on United Airlines
of the AP fifty seven flight
with a stopover in Nashville
Figure 12.2 gives a sample lexicon, and Fig. 12.3 summarizes the grammar rules
we’ve seen so far, which we’ll call L0 . Note that we can use the or-symbol | to
indicate that a non-terminal has alternate possible expansions.
We can use this grammar to generate sentences of this “ATIS-language”. We
start with S, expand it to NP VP, then choose a random expansion of NP (let’s say, to
I), and a random expansion of VP (let’s say, to Verb NP), and so on until we generate
the string I prefer a morning flight. Figure 12.4 shows a parse tree that represents a
complete derivation of I prefer a morning flight.
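If the NLTK library is available, a fragment of L0 can be written down directly and used to analyze this sentence; the following is a hedged sketch (the grammar string covers only the rules needed for this one example, and previews the parsing algorithms of Chapter 13).

```python
import nltk

# A fragment of the L0 grammar, just large enough for "I prefer a morning flight".
l0 = nltk.CFG.fromstring("""
S -> NP VP
NP -> Pronoun | Det Nominal
Nominal -> Nominal Noun | Noun
VP -> Verb NP
Pronoun -> 'I'
Verb -> 'prefer'
Det -> 'a'
Noun -> 'morning' | 'flight'
""")

parser = nltk.ChartParser(l0)
for tree in parser.parse("I prefer a morning flight".split()):
    # roughly: (S (NP (Pronoun I)) (VP (Verb prefer) (NP (Det a) (Nominal ...))))
    print(tree)
```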
It is sometimes convenient to represent a parse tree in a more compact format
called bracketed notation; here is the bracketed representation of the parse tree of Fig. 12.4:
Figure 12.3 Some of the grammar rules of L0, with example phrases for each rule:

NP → Pronoun                 I
   | Proper-Noun             Los Angeles
   | Det Nominal             a + flight
Nominal → Nominal Noun       morning + flight
   | Noun                    flights
VP → Verb                    do
   | Verb NP                 want + a flight
   | Verb NP PP              leave + Boston + in the morning
   | Verb PP                 leaving + on Thursday
(12.1) [S [NP [Pro I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [Nom [N flight]]]]]]
A CFG like that of L0 defines a formal language. We saw in Chapter 2 that a for-
mal language is a set of strings. Sentences (strings of words) that can be derived by a
grammar are in the formal language defined by that grammar and are called grammatical sentences. Sentences that cannot be derived by a given formal grammar are not in the language defined by that grammar and are referred to as ungrammatical.
This hard line between “in” and “out” characterizes all formal languages but is only
a very simplified model of how natural languages really work. This is because de-
termining whether a given sentence is part of a given natural language (say, English)
often depends on the context. In linguistics, the use of formal languages to model
natural languages is called generative grammar since the language is defined by
the set of possible sentences “generated” by the grammar.
Figure 12.4 The parse tree for “I prefer a morning flight” according to grammar L0; (12.1) above gives the same tree in bracketed notation.
For the remainder of the book we adhere to the following conventions when dis-
cussing the formal properties of context-free grammars (as opposed to explaining
particular facts about English or other languages).
Capital letters like A, B, and S:           Non-terminals
S:                                          The start symbol
Lower-case Greek letters like α, β, and γ:  Strings drawn from (Σ ∪ N)∗
Lower-case Roman letters like u, v, and w:  Strings of terminals
A language is defined through the concept of derivation. One string derives an-
other one if it can be rewritten as the second one by some series of rule applications.
More formally, following Hopcroft and Ullman (1979):

if A → β is a production of R and α and γ are any strings in the set (Σ ∪ N)∗, then we say that αAγ directly derives αβγ, or αAγ ⇒ αβγ.
Derivation is then a generalization of direct derivation:
Let α1, α2, . . . , αm be strings in (Σ ∪ N)∗, m ≥ 1, such that

α1 ⇒ α2, α2 ⇒ α3, . . . , αm−1 ⇒ αm

We say that α1 derives αm, or α1 ⇒∗ αm.
We can then formally define the language LG generated by a grammar G as the set of strings composed of terminal symbols that can be derived from the designated start symbol S:

LG = {w | w is in Σ∗ and S ⇒∗ w}
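A single step of this definition is easy to render in code. The sketch below (an illustration, not part of the formal definition) rewrites one occurrence of a non-terminal A at a chosen position, producing αβγ from αAγ; the derives relation is then just a chain of such single steps.

```python
def directly_derives(string, rule, pos):
    """Apply the production rule = (A, beta) at position pos of string (alpha A gamma)."""
    lhs, rhs = rule
    assert string[pos] == lhs, "the rule's left-hand side must match the symbol at pos"
    return string[:pos] + rhs + string[pos + 1:]

# NP VP directly derives NP Verb NP via the rule VP -> Verb NP
print(directly_derives(["NP", "VP"], ("VP", ["Verb", "NP"]), 1))
# ['NP', 'Verb', 'NP']
```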
The problem of mapping from a string of words to its parse tree is called syntactic parsing; we define algorithms for parsing in Chapter 13.
12.3 Some Grammar Rules for English

One frequent sentence-level construction not covered by S → NP VP is the imperative, which typically begins with a VP and has no subject, as in Show the lowest fare; imperatives can be modeled with the rule:

S → VP
Sentences with yes-no question structure are often (though not always) used to
ask questions; they begin with an auxiliary verb, followed by a subject NP, followed
by a VP. Here are some examples. Note that the third example is not a question at
all but a request; Chapter 26 discusses the uses of these question forms to perform
different pragmatic functions such as asking, requesting, or suggesting.
Do any of these flights have stops?
Does American’s flight eighteen twenty five serve dinner?
Can you give me the same information for United?
Here’s the rule:
S → Aux NP VP
The most complex sentence-level structures we examine here are the various wh-
structures. These are so named because one of their constituents is a wh-phrase, that is, one that includes a wh-word (who, whose, when, where, what, which, how, why).
These may be broadly grouped into two classes of sentence-level structures. The
wh-subject-question structure is identical to the declarative structure, except that
the first noun phrase contains some wh-word.
What airlines fly from Burbank to Denver?
Which flights depart Burbank after noon and arrive in Denver by six p.m?
Whose flights serve breakfast?
Here is a rule. Exercise 12.7 discusses rules for the constituents that make up the
Wh-NP.
S → Wh-NP VP
In the wh-non-subject-question structure, the wh-phrase is not the subject of the
sentence, and so the sentence includes another subject. In these types of sentences
the auxiliary appears before the subject NP, just as in the yes-no question structures.
Here is an example followed by a sample rule:
What flights do you have from Burbank to Tacoma Washington?
S → Wh-NP Aux NP VP
The Determiner
Noun phrases can begin with simple lexical determiners, as in the following exam-
ples:
a stop the flights this flight
those flights any flights some flights
The role of the determiner in English noun phrases can also be filled by more
complex expressions, as follows:
United’s flight
United’s pilot’s union
Denver’s mayor’s mother’s canceled flight
In these examples, the role of the determiner is filled by a possessive expression
consisting of a noun phrase followed by an ’s as a possessive marker, as in the
following rule.
Det → NP ’s
The fact that this rule is recursive (since an NP can start with a Det) helps us model
the last two examples above, in which a sequence of possessive expressions serves
as a determiner.
Under some circumstances determiners are optional in English. For example,
determiners may be omitted if the noun they modify is plural:
(12.2) Show me flights from San Francisco to Denver on weekdays
As we saw in Chapter 8, mass nouns also don’t require determination. Recall that
mass nouns often (not always) involve something that is treated like a substance
(including e.g., water and snow), don’t take the indefinite article “a”, and don’t tend
to pluralize. Many abstract nouns are mass nouns (music, homework). Mass nouns
in the ATIS domain include breakfast, lunch, and dinner:
(12.3) Does this flight serve dinner?
The Nominal
The nominal construction follows the determiner and contains any pre- and post-
head noun modifiers. As indicated in grammar L0 , in its simplest form a nominal
can consist of a single noun.
Nominal → Noun
As we’ll see, this rule also provides the basis for the bottom of various recursive
rules used to capture more complex nominal constructions.
Head nouns can be followed by postmodifiers; one common postmodifier is a prepositional phrase:

Nominal → Nominal PP
The three most common kinds of non-finite postmodifiers are the gerundive (-ing), -ed, and infinitive forms.
Gerundive postmodifiers are so called because they consist of a verb phrase that
begins with the gerundive (-ing) form of the verb. Here are some examples:
any of those [leaving on Thursday]
any flights [arriving after eleven a.m.]
flights [arriving within thirty minutes of each other]
We can define the Nominals with gerundive modifiers as follows, making use of a new non-terminal GerundVP:

Nominal → Nominal GerundVP
We can make rules for GerundVP constituents by duplicating all of our VP pro-
ductions, substituting GerundV for V.
GerundVP → GerundV NP
| GerundV PP | GerundV | GerundV NP PP
The phrases in italics below are examples of the two other common kinds of
non-finite clauses, infinitives and -ed forms:
the last flight to arrive in Boston
I need to have dinner served
Which is the aircraft used by this flight?
A postnominal relative clause (more correctly a restrictive relative clause) is a clause that often begins with a relative pronoun (that and who are the most common). In the simplest cases the relative pronoun functions as the subject of the embedded verb. The relative pronoun may also function as the object of the embedded verb; we leave for the reader the exercise of writing grammar rules for more complex relative clauses of this kind.
Figure 12.5 A parse tree for “all the morning flights from Denver to Tampa leaving before 10”.
A verb phrase consists of the verb and a number of other constituents. In the rules we have introduced so far, these other constituents include NPs and PPs and combinations of the two:

VP → Verb                    disappear
VP → Verb NP prefer a morning flight
VP → Verb NP PP leave Boston in the morning
VP → Verb PP leaving on Thursday
Verb phrases can be significantly more complicated than this. Many other kinds of constituents, such as an entire embedded sentence, can follow the verb. These are called sentential complements:

You [VP [V said] [S you had a two hundred sixty-six dollar fare]]
[VP [V Tell] [NP me] [S how to get from the airport in Philadelphia to downtown]]
I [VP [V think] [S I would like to take the nine thirty flight]]
Here’s a rule for these:
VP → Verb S
Similarly, another potential constituent of the VP is another VP. This is often the
case for verbs like want, would like, try, intend, need:
I want [VP to fly from Milwaukee to Orlando]
Hi, I want [VP to arrange three flights]
While a verb phrase can have many possible kinds of constituents, not every
verb is compatible with every verb phrase. For example, the verb want can be used
either with an NP complement (I want a flight . . . ) or with an infinitive VP comple-
ment (I want to fly to . . . ). By contrast, a verb like find cannot take this sort of VP
complement (* I found to fly to Dallas).
This idea that verbs are compatible with different kinds of complements is a very old one; traditional grammar distinguishes between transitive verbs like find, which take a direct object NP (I found a flight), and intransitive verbs like disappear, which do not (*I disappeared a flight).
Where traditional grammars subcategorize verbs into these two categories (transitive and intransitive), modern grammars distinguish as many as 100 subcategories. We say that a verb like find subcategorizes for an NP, and a verb like want subcategorizes for either an NP or a non-finite VP. We also call these constituents the complements of the verb (hence our use of the term sentential complement above). So we say that want can take a VP complement. These possible sets of complements are called the subcategorization frame for the verb. Another way of talking about the relation between the verb and these other constituents is to think of the verb as a logical predicate and the constituents as logical arguments of the predicate. So we can think of such predicate-argument relations as FIND(I, A FLIGHT) or WANT(I, TO FLY). We talk more about this view of verbs and arguments in Chapter 16 when we talk about predicate calculus representations of verb semantics. Subcategorization frames for a set of example verbs are given in Fig. 12.6.
We can capture the association between verbs and their complements by making separate subtypes of the class Verb (e.g., Verb-with-NP-complement, Verb-with-Inf-VP-complement, Verb-with-S-complement, and so on). Each VP rule could then be modified to require the appropriate verb subtype:
VP → Verb-with-no-complement disappear
VP → Verb-with-NP-comp NP prefer a morning flight
VP → Verb-with-S-comp S said there were two flights
A problem with this approach is the significant increase in the number of rules
and the associated loss of generality.
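An alternative to multiplying verb subtypes in the rules is to record each verb's subcategorization frames in the lexicon and check them separately. The following sketch uses invented frame names (NP, VPto, S, none) purely for illustration; it is an assumption about representation, not the book's formalism.

```python
# Hypothetical subcategorization lexicon; frame names are illustrative only.
SUBCAT = {
    "disappear": {"none"},
    "prefer":    {"NP"},
    "want":      {"NP", "VPto"},
    "find":      {"NP"},
    "say":       {"S"},
}

def allows(verb, frame):
    """True if the verb's lexical entry licenses this complement frame."""
    return frame in SUBCAT.get(verb, set())

print(allows("want", "VPto"))   # True:  "I want to fly to ..."
print(allows("find", "VPto"))   # False: "*I found to fly to Dallas"
```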
12.3.5 Coordination
The major phrase types discussed here can be conjoined with conjunctions like and, or, and but to form larger constructions of the same type. For example, a coordinate
noun phrase can consist of two other noun phrases separated by a conjunction:
Please repeat [NP [NP the flights] and [NP the costs]]
I need to know [NP [NP the aircraft] and [NP the flight number]]
Here’s a rule that allows these structures:
NP → NP and NP
Note that the ability to form coordinate phrases through conjunctions is often
used as a test for constituency. Consider the following examples, which differ from
the ones given above in that they lack the second determiner.
Please repeat the [Nom [Nom flights] and [Nom costs]]
I need to know the [Nom [Nom aircraft] and [Nom flight number]]
The fact that these phrases can be conjoined is evidence for the presence of the underlying Nominal constituent we have been making use of. Here's a rule for this:

Nominal → Nominal and Nominal

Verb phrases and sentences can be conjoined in the same way:

VP → VP and VP
S → S and S
Since all the major phrase types can be conjoined in this fashion, it is also possible to represent this conjunction fact more generally; a number of grammar formalisms such as GPSG (Gazdar et al., 1985) do this using metarules such as the following:
X → X and X
This metarule simply states that any non-terminal can be conjoined with the same
non-terminal to yield a constituent of the same type. Of course, the variable X
must be designated as a variable that stands for any non-terminal rather than a non-
terminal itself.
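In an implementation, such a metarule can simply be expanded into one ordinary CFG rule per phrase type; the sketch below does this for a hypothetical, assumed list of non-terminals.

```python
PHRASE_TYPES = ["NP", "Nominal", "VP", "S"]          # assumed inventory

def expand_coordination_metarule(categories):
    """Instantiate the metarule X -> X and X for every non-terminal X."""
    return [(x, [x, "and", x]) for x in categories]

for lhs, rhs in expand_coordination_metarule(PHRASE_TYPES):
    print(lhs, "->", " ".join(rhs))
# NP -> NP and NP
# Nominal -> Nominal and Nominal
# ...
```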
12.4 Treebanks
Sufficiently robust grammars consisting of context-free grammar rules can be used
to assign a parse tree to any sentence. This means that it is possible to build a
corpus where every sentence in the collection is paired with a corresponding parse
tree. Such a syntactically annotated corpus is called a treebank. Treebanks play an important role in parsing as well as in linguistic investigations of syntactic phenomena.
((S
  (NP-SBJ (DT That)
    (JJ cold) (, ,)
    (JJ empty) (NN sky) )
  (VP (VBD was)
    (ADJP-PRD (JJ full)
      (PP (IN of)
        (NP (NN fire)
          (CC and)
          (NN light) ))))
  (. .) ))

(a)

((S
  (NP-SBJ The/DT flight/NN )
  (VP should/MD
    (VP arrive/VB
      (PP-TMP at/IN
        (NP eleven/CD a.m/RB ))
      (NP-TMP tomorrow/NN )))))

(b)

Figure 12.7 Parsed sentences from the LDC Treebank3 version of the Brown (a) and ATIS (b) corpora.
Figure 12.9 shows a tree from the Wall Street Journal. This tree shows an-
other feature of the Penn Treebanks: the use of traces (-NONE- nodes) to mark long-distance dependencies or syntactic movement. For example, quotations often
follow a quotative verb like say. But in this example, the quotation “We would have
to wait until we have collected on those assets” precedes the words he said. An
empty S containing only the node -NONE- marks the position after said where the
quotation sentence often occurs. This empty node is marked (in Treebanks II and
III) with the index 2, as is the quotation S at the beginning of the sentence. Such
co-indexing may make it easier for some parsers to recover the fact that this fronted
or topicalized quotation is the complement of the verb said. A similar -NONE- node
1 The Penn Treebank project released treebanks in multiple languages and in various stages; for ex-
ample, there were Treebank I (Marcus et al., 1993), Treebank II (Marcus et al., 1994), and Treebank III
releases of English treebanks. We use Treebank III for our examples.
Figure 12.8 The tree corresponding to the Brown corpus sentence of Fig. 12.7(a), drawn as a graphical parse tree (with nodes such as NP-SBJ, VP, and ADJP-PRD).
marks the fact that there is no syntactic subject right before the verb to wait; instead,
the subject is the earlier NP We. Again, they are both co-indexed with the index 1.
( (S (‘‘ ‘‘)
(S-TPC-2
(NP-SBJ-1 (PRP We) )
(VP (MD would)
(VP (VB have)
(S
(NP-SBJ (-NONE- *-1) )
(VP (TO to)
(VP (VB wait)
(SBAR-TMP (IN until)
(S
(NP-SBJ (PRP we) )
(VP (VBP have)
(VP (VBN collected)
(PP-CLR (IN on)
(NP (DT those)(NNS assets)))))))))))))
(, ,) (’’ ’’)
(NP-SBJ (PRP he) )
(VP (VBD said)
(S (-NONE- *T*-2) ))
(. .) ))
Figure 12.9 A sentence from the Wall Street Journal portion of the LDC Penn Treebank.
Note the use of the empty -NONE- nodes.
The Penn Treebank II and Treebank III releases added further information to
make it easier to recover the relationships between predicates and arguments. Cer-
Grammar
S → NP VP .
S → NP VP
S → “ S ” , NP VP .
S → -NONE-
NP → DT NN
NP → DT NNS
NP → NN CC NN
NP → CD RB
NP → DT JJ , JJ NN
NP → PRP
NP → -NONE-
VP → MD VP
VP → VBD ADJP
VP → VBD S
VP → VBN PP
VP → VB S
VP → VB SBAR
VP → VBP VP
VP → VBN PP
VP → TO VP
SBAR → IN S
ADJP → JJ PP
PP → IN NP

Lexicon
PRP → we | he
DT → the | that | those
JJ → cold | empty | full
NN → sky | fire | light | flight | tomorrow
NNS → assets
CC → and
IN → of | at | until | on
CD → eleven
RB → a.m.
VB → arrive | have | wait
VBD → was | said
VBP → have
VBN → collected
MD → should | would
TO → to

Figure 12.10 A sample of the CFG grammar rules and lexical entries that would be extracted from the three treebank sentences in Fig. 12.7 and Fig. 12.9.
tain phrases were marked with tags indicating the grammatical function of the phrase
(as surface subject, logical topic, cleft, non-VP predicates), its presence in particular
text categories (headlines, titles), and its semantic function (temporal phrases, lo-
cations) (Marcus et al. 1994, Bies et al. 1995). Figure 12.9 shows examples of the
-SBJ (surface subject) and -TMP (temporal phrase) tags. Figure 12.8 shows in addi-
tion the -PRD tag, which is used for predicates that are not VPs (the one in Fig. 12.8
is an ADJP). We’ll return to the topic of grammatical function when we consider
dependency grammars and parsing in Chapter 15.
Treebank grammars tend to be very flat, with many long rules; the following bracketed noun phrases give a sense of the flat structures involved:

[DT The] [JJ state-owned] [JJ industrial] [VBG holding] [NN company] [NNP Instituto]
[NNP Nacional] [FW de] [NNP Industria]
[NP Shearson’s] [JJ easy-to-film], [JJ black-and-white] “[SBAR Where We Stand]”
[NNS commercials]
Viewed as a large grammar in this way, the Penn Treebank III Wall Street Journal
corpus, which contains about 1 million words, also has about 1 million non-lexical
rule tokens, consisting of about 17,500 distinct rule types.
Various facts about the treebank grammars, such as their large numbers of flat
rules, pose problems for probabilistic parsing algorithms. For this reason, it is com-
mon to make various modifications to a grammar extracted from a treebank. We
discuss these further in Chapter 14.
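Reading a grammar off a treebank amounts to collecting one rule per local tree. If NLTK is available, this can be sketched as follows; the bracketed string here is a simplified, hand-written version of the ATIS tree in Fig. 12.7(b), not an actual treebank file.

```python
import nltk

# A simplified bracketed parse in the style of Fig. 12.7(b).
t = nltk.Tree.fromstring(
    "(S (NP-SBJ (DT The) (NN flight)) "
    "(VP (MD should) (VP (VB arrive) "
    "(PP-TMP (IN at) (NP (CD eleven) (RB a.m))) (NP-TMP (NN tomorrow)))))"
)

for prod in t.productions():     # one CFG production per local tree
    print(prod)
# S -> NP-SBJ VP
# NP-SBJ -> DT NN
# DT -> 'The'
# ...
```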
In many computational applications it is useful to associate each syntactic constituent with its lexical head: the word in the phrase that is grammatically the most central. A lexicalized parse tree annotates every non-terminal with the headword of its constituent.

Figure 12.11 A lexicalized tree from Collins (1999), with headwords shown in parentheses (S(dumped), NP(workers), VP(dumped), and so on).
Figure 12.11 shows an example of such a tree from Collins (1999), in which each
non-terminal is annotated with its head.
For the generation of such a tree, each CFG rule must be augmented to identify
one right-side constituent to be the head daughter. The headword for a node is
then set to the headword of its head daughter. Choosing these head daughters is
simple for textbook examples (NN is the head of NP) but is complicated and indeed
controversial for most phrases. (Should the complementizer to or the verb be the
head of an infinitive verb phrase?) Modern linguistic theories of syntax generally
include a component that defines heads (see, e.g., (Pollard and Sag, 1994)).
An alternative approach to finding a head is used in most practical computational
systems. Instead of specifying head rules in the grammar itself, heads are identified
dynamically in the context of trees for specific sentences. In other words, once
a sentence is parsed, the resulting tree is walked to decorate each node with the
appropriate head. Most current systems rely on a simple set of handwritten rules,
such as a practical one for Penn Treebank grammars given in Collins (1999) but
developed originally by Magerman (1995). Collins (1999, p. 238), for example, gives such a rule for finding the head of an NP.
Selected other rules from this set are shown in Fig. 12.12. For example, for VP
rules of the form VP → Y1 · · · Yn , the algorithm would start from the left of Y1 · · ·
Yn looking for the first Yi of type TO; if no TOs are found, it would search for the
first Yi of type VBD; if no VBDs are found, it would search for a VBN, and so on.
See Collins (1999) for more details.
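The procedure just described is easy to express as a priority list plus a left-to-right scan. The sketch below is a hedged, abbreviated version; the VP priority list is truncated and assumed rather than copied from Fig. 12.12.

```python
# Abbreviated, assumed priority list for VP head finding (cf. Fig. 12.12).
VP_PRIORITY = ["TO", "VBD", "VBN", "MD", "VBZ", "VB", "VBG", "VBP", "VP"]

def vp_head_index(children):
    """Return the index of the head daughter of a VP -> Y1 ... Yn rule."""
    for label in VP_PRIORITY:
        for i, child in enumerate(children):    # left-to-right scan for this label
            if child == label:
                return i
    return 0                                    # fallback: leftmost daughter

print(vp_head_index(["VBD", "NP", "PP"]))       # 0: VBD is the head daughter
print(vp_head_index(["MD", "VP"]))              # 0: MD found before VP
```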
A grammar is in Chomsky normal form (CNF) if every rule has on its right-hand side either two non-terminals or a single terminal; any context-free grammar can be converted into a CNF grammar that generates the same strings. For example, a rule with more than two symbols on its right-hand side, such as

A → B C D

can be converted into the following two CNF rules (Exercise 12.8 asks the reader to
formulate the complete algorithm):
A → B X
X → C D
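A minimal sketch of this step (one simple strategy, not the full conversion algorithm of Exercise 12.8) introduces a fresh non-terminal for each extra symbol on the right-hand side; the symbol-naming scheme is an arbitrary choice made here for readability.

```python
def binarize(lhs, rhs):
    """Split an n-ary rule into a chain of binary rules using fresh non-terminals."""
    rules = []
    while len(rhs) > 2:
        new_sym = lhs + "|" + "_".join(rhs[1:])   # fresh, human-readable symbol name
        rules.append((lhs, [rhs[0], new_sym]))
        lhs, rhs = new_sym, rhs[1:]
    rules.append((lhs, rhs))
    return rules

for l, r in binarize("A", ["B", "C", "D"]):
    print(l, "->", " ".join(r))
# A -> B A|C_D
# A|C_D -> C D
```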
Sometimes using binary branching can actually produce smaller grammars. For
example, the sentences that might be characterized as
VP -> VBD NP PP*
are represented in the Penn Treebank by this series of rules:
VP → VBD NP PP
VP → VBD NP PP PP
VP → VBD NP PP PP PP
VP → VBD NP PP PP PP PP
...
but could also be generated by the following two-rule grammar:
VP → VBD NP PP
VP → VP PP
The generation of a symbol A with a potentially infinite sequence of symbols B with
a rule of the form A → A B is known as Chomsky-adjunction.
Combinatory Categorial Grammar (CCG) builds on categorial grammar, a lexicalized approach in which a grammar has three components: a set of categories, a lexicon that associates words with categories, and a set of rules that govern how categories combine in context.

Categories
Categories are either atomic elements or single-argument functions that return a category as a value when provided with a desired category as argument. More formally, we can define C, the set of categories for a grammar, as follows:

• A ⊆ C, where A is a given set of atomic elements
• (X/Y), (X\Y) ∈ C, if X, Y ∈ C
The slash notation shown here is used to define the functions in the grammar. It specifies the type of the expected argument, the direction in which it is expected to be found, and the type of the result. Thus, (X/Y) is a function that seeks a constituent of type
Y to its right and returns a value of X; (X\Y) is the same except it seeks its argument
to the left.
The set of atomic categories is typically very small and includes familiar el-
ements such as sentences and noun phrases. Functional categories include verb
phrases and complex noun phrases among others.
The Lexicon
The lexicon in a categorial approach consists of assignments of categories to words.
These assignments can either be to atomic or functional categories, and due to lexical
ambiguity words can be assigned to multiple categories. Consider the following
sample lexical entries.
flight : N
Miami : NP
cancel : (S\NP)/NP
Nouns and proper nouns like flight and Miami are assigned to atomic categories,
reflecting their typical role as arguments to functions. On the other hand, a transitive
verb like cancel is assigned the category (S\NP)/NP: a function that seeks an NP on
its right and returns as its value a function with the type (S\NP). This function can,
in turn, combine with an NP on the left, yielding an S as the result. This captures the kind of subcategorization information discussed in Section 12.3.4; here, however, the information has a rich, computationally useful, internal structure.
Ditransitive verbs like give, which expect two arguments after the verb, would
have the category ((S\NP)/NP)/NP: a function that combines with an NP on its
right to yield yet another function corresponding to the transitive verb (S\NP)/NP
category such as the one given above for cancel.
Rules
The rules of a categorial grammar specify how functions and their arguments com-
bine. The following two rule templates constitute the basis for all categorial gram-
mars.
X/Y Y ⇒ X (12.4)
Y X\Y ⇒ X (12.5)
The first rule applies a function to its argument on the right, while the second
looks to the left for its argument. We’ll refer to the first as forward function appli-
cation, and the second as backward function application. The result of applying
either of these rules is the category specified as the value of the function being ap-
plied.
Given these rules and a simple lexicon, let’s consider an analysis of the sentence
United serves Miami. Assume that serves is a transitive verb with the category
(S\NP)/NP and that United and Miami are both simple NPs. Using both forward
and backward function application, the derivation would proceed as follows:
United serves Miami
NP (S\NP)/NP NP
>
S\NP
<
S
Categorial grammar derivations are illustrated growing down from the words; rule applications are shown with a horizontal line that spans the elements involved, with the type of the operation indicated at the right end of the line. In this
example, there are two function applications: one forward function application indi-
cated by the > that applies the verb serves to the NP on its right, and one backward
function application indicated by the < that applies the result of the first to the NP
United on its left.
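The two application rules are simple enough to state directly in code. In the toy sketch below, an atomic category is a string and a functor is a tuple (slash, result, argument); this encoding is an assumption made for illustration, not a standard CCG library API.

```python
# Categories: atomic ones are strings; functors are (slash, result, argument).
S, NP = "S", "NP"
serves = ('/', ('\\', S, NP), NP)      # (S\NP)/NP, the category of "serves"

def forward_apply(fn, arg):            # X/Y  Y   =>  X
    if isinstance(fn, tuple) and fn[0] == '/' and fn[2] == arg:
        return fn[1]

def backward_apply(arg, fn):           # Y  X\Y   =>  X
    if isinstance(fn, tuple) and fn[0] == '\\' and fn[2] == arg:
        return fn[1]

vp = forward_apply(serves, NP)         # "serves Miami"  : S\NP
print(vp)                              # ('\\', 'S', 'NP')
print(backward_apply(NP, vp))          # "United (serves Miami)" : S
```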
With the addition of another rule, the categorial approach provides a straight-
forward way to implement the coordination metarule described earlier on page 14.
Recall that English permits the coordination of two constituents of the same type,
resulting in a new constituent of the same type. The following rule provides the
mechanism to handle such examples.
X CONJ X ⇒ X (12.6)
This rule states that when two constituents of the same category are separated by a
constituent of type CONJ they can be combined into a single larger constituent of
the same type. The following derivation illustrates the use of this rule.
We flew to Geneva and drove to Chamonix
NP (S\NP)/PP PP/NP NP CONJ (S\NP)/PP PP/NP NP
> >
PP PP
> >
S\NP S\NP
<Φ>
S\NP
<
S
Here the two S\NP constituents are combined via the conjunction operator <Φ>
to form a larger constituent of the same type, which can then be combined with the
subject NP via backward function application.
These examples illustrate the lexical nature of the categorial grammar approach.
The grammatical facts about a language are largely encoded in the lexicon, while the
rules of the grammar are boiled down to a set of three rules. Unfortunately, the basic
categorial approach does not give us any more expressive power than we had with
traditional CFG rules; it just moves information from the grammar to the lexicon. To
move beyond these limitations, CCG includes operations that operate over functions.
The first pair of operators permits us to compose adjacent functions:

X/Y Y/Z ⇒ X/Z (12.7)
Y\Z X\Y ⇒ X\Z (12.8)

The first rule, forward composition, combines a function seeking a Y to its right with an adjacent function that yields a Y; the result is a single function that directly seeks the second function's argument. Backward composition is its mirror image.
The second pair of operators performs type raising, which turns an argument into a function: one that seeks, as its own argument, a function that takes the original category as its argument. The following schemas show two versions of type raising: one for arguments to the right, and one for the left:

X ⇒ T/(T\X) (12.9)
X ⇒ T\(T/X) (12.10)

The category T in these rules can correspond to any of the atomic or functional categories already present in the grammar.
A particularly useful example of type raising transforms a simple NP argument
in subject position to a function that can compose with a following VP. To see how
this works, let’s revisit our earlier example of United serves Miami. Instead of clas-
sifying United as an NP which can serve as an argument to the function attached to
serve, we can use type raising to reinvent it as a function in its own right as follows.
NP ⇒ S/(S\NP)
Combining this type-raised constituent with the forward composition rule (12.7)
permits the following alternative to our previous derivation.
United serves Miami
NP (S\NP)/NP NP
>T
S/(S\NP)
>B
S/NP
>
S
By type raising United to S/(S\NP), we can compose it with the transitive verb
serves to yield the (S/NP) function needed to complete the derivation.
There are several interesting things to note about this derivation. First, it provides a left-to-right, word-by-word derivation that more closely mirrors the way
humans process language. This makes CCG a particularly apt framework for psy-
cholinguistic studies. Second, this derivation involves the use of an intermediate
unit of analysis, United serves, that does not correspond to a traditional constituent
in English. This ability to make use of such non-constituent elements provides CCG
with the ability to handle the coordination of phrases that are not proper constituents,
as in the following example.
(12.11) We flew IcelandAir to Geneva and SwissAir to London.
Here, the segments that are being coordinated are IcelandAir to Geneva and
SwissAir to London, phrases that would not normally be considered constituents, as
can be seen in the following standard derivation for the verb phrase flew IcelandAir
to Geneva.
flew IcelandAir to Geneva
(VP/PP)/NP NP PP/NP NP
> >
VP/PP PP
>
VP
In this derivation, there is no single constituent that corresponds to IcelandAir
to Geneva, and hence no opportunity to make use of the <Φ> operator. Note that
complex CCG categories can get a little cumbersome, so we'll use VP as a
shorthand for (S\NP) in this and the following derivations.
An alternative derivation can provide the required element through the combined use of backward type raising (12.10) and backward function composition (12.8).
Finally, let’s examine how these advanced operators can be used to handle long-
distance dependencies (also referred to as syntactic movement or extraction). As
mentioned in Section 12.3.1, long-distance dependencies arise from many English
constructions including wh-questions, relative clauses, and topicalization. What
these constructions have in common is a constituent that appears somewhere dis-
tant from its usual, or expected, location. Consider the following relative clause as
an example.
the flight that United diverted
Here, divert is a transitive verb that expects two NP arguments, a subject NP to its
left and a direct object NP to its right; its category is therefore (S\NP)/NP. However,
in this example the direct object the flight has been “moved” to the beginning of the
clause, while the subject United remains in its normal position. What is needed is a
way to incorporate the subject argument, while dealing with the fact that the flight is
not in its expected location.
The following derivation accomplishes this, again through the combined use of
type raising and function composition.
the flight that United diverted
NP/N N (NP\NP)/(S/NP) NP (S\NP)/NP
> >T
NP S/(S\NP)
>B
S/NP
>
NP\NP
<
NP
As we saw with our earlier examples, the first step of this derivation is type raising
United to the category S/(S\NP) allowing it to combine with diverted via forward
composition. The result of this composition is S/NP which preserves the fact that we
are still looking for an NP to fill the missing direct object. The second critical piece
is the lexical category assigned to the word that: (NP\NP)/(S/NP). This function
seeks a verb phrase missing an argument to its right, and transforms it into an NP
seeking a missing element to its left, precisely where we find the flight.
CCGBank

CCGbank (Hockenmaier and Steedman, 2007) is a corpus of CCG derivations and dependency structures extracted from the Penn Treebank.
12.7 Summary
This chapter has introduced a number of fundamental concepts in syntax through
the use of context-free grammars.
Bibliographical and Historical Notes

As we have already noted, grammars based on context-free rules are not ubiqui-
tous. Various classes of extensions to CFGs are designed specifically to handle long-
distance dependencies. We noted earlier that some grammars treat long-distance-
dependent items as being related semantically but not syntactically; the surface syn-
tax does not represent the long-distance link (Kay and Fillmore 1999, Culicover and
Jackendoff 2005). But there are alternatives.
One extended formalism is Tree Adjoining Grammar (TAG) (Joshi, 1985).
The primary TAG data structure is the tree, rather than the rule. Trees come in two
kinds: initial trees and auxiliary trees. Initial trees might, for example, represent
simple sentential structures, and auxiliary trees add recursion into a tree. Trees are
combined by two operations called substitution and adjunction. The adjunction
operation handles long-distance dependencies. See Joshi (1985) for more details.
An extension of Tree Adjoining Grammar, called Lexicalized Tree Adjoining Gram-
mars is discussed in Chapter 14. Tree Adjoining Grammar is a member of the family
of mildly context-sensitive languages.
We mentioned on page 15 another way of handling long-distance dependencies,
based on the use of empty categories and co-indexing. The Penn Treebank uses
this model, which draws (in various Treebank corpora) from the Extended Standard
Theory and Minimalism (Radford, 1997).
Readers interested in the grammar of English should get one of the three large
reference grammars of English: Huddleston and Pullum (2002), Biber et al. (1999),
and Quirk et al. (1985). Another useful reference is McCawley (1998).
There are many good introductory textbooks on syntax from different perspec-
tives. Sag et al. (2003) is an introduction to syntax from a generative perspective,
focusing on the use of phrase-structure rules, unification, and the type hierarchy in
Head-Driven Phrase Structure Grammar. Van Valin, Jr. and La Polla (1997) is an
introduction from a functional perspective, focusing on cross-linguistic data and on
the functional motivation for syntactic structures.
Exercises
12.1 Draw tree structures for the following ATIS phrases:
1. Dallas
2. from Denver
3. after five p.m.
4. arriving in Washington
5. early flights
6. all redeye flights
7. on Thursday
8. a one-way fare
9. any delays in Denver
12.2 Draw tree structures for the following ATIS sentences:
1. Does American Airlines have a flight between five a.m. and six a.m.?
2. I would like to fly on American Airlines.
3. Please repeat that.
4. Does American 487 have a first-class section?
5. I need to fly between Philadelphia and Atlanta.
6. What is the fare from Atlanta to Denver?
Ajdukiewicz, K. (1935). Die syntaktische Konnexität. Studia Philosophica, 1, 1–27. English translation “Syntactic Connexion” by H. Weber in McCall, S. (Ed.) 1967. Polish Logic, pp. 207–231, Oxford University Press.

Backus, J. W. (1959). The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM Conference. In Information Processing: Proceedings of the International Conference on Information Processing, Paris, 125–132. UNESCO.

Backus, J. W. (1996). Transcript of question and answer session. In Wexelblat, R. L. (Ed.), History of Programming Languages, p. 162. Academic Press.

Bar-Hillel, Y. (1953). A quasi-arithmetical notation for syntactic description. Language, 29, 47–58.

Bazell, C. E. (1952/1966). The correspondence fallacy in structural linguistics. In Hamp, E. P., Householder, F. W., and Austerlitz, R. (Eds.), Studies by Members of the English Department, Istanbul University (3), reprinted in Readings in Linguistics II (1966), 271–298. University of Chicago Press.

Biber, D., Johansson, S., Leech, G., Conrad, S., and Finegan, E. (1999). Longman Grammar of Spoken and Written English. Pearson.

Bies, A., Ferguson, M., Katz, K., and MacIntyre, R. (1995). Bracketing guidelines for Treebank II style Penn Treebank Project.

Bloomfield, L. (1914). An Introduction to the Study of Language. Henry Holt and Company.

Bloomfield, L. (1933). Language. University of Chicago Press.

Bresnan, J. (Ed.). (1982). The Mental Representation of Grammatical Relations. MIT Press.

Charniak, E. (1997). Statistical parsing with a context-free grammar and word statistics. In AAAI-97, 598–603.

Chomsky, N. (1956). Three models for the description of language. IRE Transactions on Information Theory, 2(3), 113–124.

Chomsky, N. (1956/1975). The Logical Structure of Linguistic Theory. Plenum.

Chomsky, N. (1957). Syntactic Structures. Mouton, The Hague.

Chomsky, N. (1963). Formal properties of grammars. In Luce, R. D., Bush, R., and Galanter, E. (Eds.), Handbook of Mathematical Psychology, Vol. 2, 323–418. Wiley.

Chomsky, N. (1981). Lectures on Government and Binding. Foris.

Collins, M. (1999). Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia.

Culicover, P. W. and Jackendoff, R. (2005). Simpler Syntax. Oxford University Press.

Gazdar, G., Klein, E., Pullum, G. K., and Sag, I. A. (1985). Generalized Phrase Structure Grammar. Blackwell.

Harris, Z. S. (1946). From morpheme to utterance. Language, 22(3), 161–183.

Hemphill, C. T., Godfrey, J., and Doddington, G. (1990). The ATIS spoken language systems pilot corpus. In Proceedings DARPA Speech and Natural Language Workshop, 96–101.

Hockenmaier, J. and Steedman, M. (2007). CCGbank: A corpus of CCG derivations and dependency structures extracted from the Penn Treebank. Computational Linguistics, 33(3), 355–396.

Hopcroft, J. E. and Ullman, J. D. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley.

Huddleston, R. and Pullum, G. K. (2002). The Cambridge Grammar of the English Language. Cambridge University Press.

Joshi, A. K. (1985). Tree adjoining grammars: How much context-sensitivity is required to provide reasonable structural descriptions?. In Dowty, D. R., Karttunen, L., and Zwicky, A. (Eds.), Natural Language Parsing, 206–250. Cambridge University Press.

Kay, P. and Fillmore, C. J. (1999). Grammatical constructions and linguistic generalizations: The What’s X Doing Y? construction. Language, 75(1), 1–33.

Magerman, D. M. (1995). Statistical decision-tree models for parsing. In ACL-95, 276–283.

Marcus, M. P., Kim, G., Marcinkiewicz, M. A., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., and Schasberger, B. (1994). The Penn Treebank: Annotating predicate argument structure. In ARPA Human Language Technology Workshop, 114–119. Morgan Kaufmann.

Marcus, M. P., Santorini, B., and Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn treebank. Computational Linguistics, 19(2), 313–330.

McCawley, J. D. (1998). The Syntactic Phenomena of English. University of Chicago Press.

Naur, P., Backus, J. W., Bauer, F. L., Green, J., Katz, C., McCarthy, J., Perlis, A. J., Rutishauser, H., Samelson, K., Vauquois, B., Wegstein, J. H., van Wijngaarden, A., and Woodger, M. (1960). Report on the algorithmic language ALGOL 60. CACM, 3(5), 299–314. Revised in CACM 6:1, 1–17, 1963.

Nivre, J., de Marneffe, M.-C., Ginter, F., Goldberg, Y., Hajič, J., Manning, C. D., McDonald, R., Petrov, S., Pyysalo, S., Silveira, N., Tsarfaty, R., and Zeman, D. (2016). Universal Dependencies v1: A multilingual treebank collection. In LREC-16.

Percival, W. K. (1976). On the historical source of immediate constituent analysis. In McCawley, J. D. (Ed.), Syntax and Semantics Volume 7, Notes from the Linguistic Underground, 229–242. Academic Press.

Pollard, C. and Sag, I. A. (1994). Head-Driven Phrase Structure Grammar. University of Chicago Press.

Quirk, R., Greenbaum, S., Leech, G., and Svartvik, J. (1985). A Comprehensive Grammar of the English Language. Longman.

Radford, A. (1997). Syntactic Theory and the Structure of English: A Minimalist Approach. Cambridge University Press.

Sag, I. A., Wasow, T., and Bender, E. M. (Eds.). (2003). Syntactic Theory: A Formal Introduction. CSLI Publications, Stanford, CA.

Steedman, M. (1989). Constituency and coordination in a combinatory grammar. In Baltin, M. R. and Kroch, A. S. (Eds.), Alternative Conceptions of Phrase Structure, 201–231. University of Chicago.