Semantic Research in Computational Linguistics
1. Introduction
…sentence meaning (Section 3.). These models have much better coverage, at the
expense of the level of detail, precision, and conceptual clarity of the semantic representations. We conclude with an outlook on some novel directions
of research, which are aimed at comparing and integrating the worlds of
logical and statistical methods (Section 4.).
2.
Figure 108.1: Derivation of one reading of the sentence every man loves a woman:
S: ∀x. man(x) → ∃y. (woman(y) ∧ love(y)(x))
NP (every man): λP.∀x. man(x) → P(x)
VP (loves a woman): λx.∃y. woman(y) ∧ love(y)(x)
V (loves): λy.λx. love(y)(x)
NP (a woman): λQ.∃y. woman(y) ∧ Q(y)
2.1. Semantic construction
Compositional semantics. In the early 1970s, Richard Montague presented a framework for a strictly compositional interpretation of natural-language sentences in terms of type theory, including a formal treatment
of quantifier scope (Montague 1973). His work not only provided the basis
for modern semantic theory, but has also had great influence on the development of computational semantics. Computational semantics in the "standard model" takes it as given that we can assign lambda terms to lexicon entries, combine them by traversing the parse tree bottom-up, and compute
lambda terms for larger phrases compositionally out of those for smaller
phrases, using functional application and beta reduction. An abbreviated
example of the derivation of one reading of the sentence every man loves
a woman is shown in Fig. 108.1.
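To make the mechanics concrete, here is a minimal sketch of such a bottom-up construction in Python (our own illustration under simplifying assumptions, not part of any standard system): Python closures stand in for λ-terms, functional application is function call, and formulas are assembled as strings.

    # Sketch of Montague-style semantic construction. Python closures
    # stand in for lambda terms; beta reduction is just function call.
    lex = {
        "man":   lambda x: f"man({x})",
        "woman": lambda y: f"woman({y})",
        "loves": lambda y: lambda x: f"love({y})({x})",
        "every": lambda n: lambda p: f"∀x. {n('x')} → {p('x')}",   # λN.λP.∀x. N(x) → P(x)
        "a":     lambda n: lambda q: f"∃y. {n('y')} ∧ {q('y')}",   # λN.λQ.∃y. N(y) ∧ Q(y)
    }

    subj = lex["every"](lex["man"])                      # NP: λP.∀x. man(x) → P(x)
    obj  = lex["a"](lex["woman"])                        # NP: λQ.∃y. woman(y) ∧ Q(y)
    vp   = lambda x: obj(lambda y: lex["loves"](y)(x))   # VP: λx.∃y. woman(y) ∧ love(y)(x)

    print(subj(vp))   # ∀x. man(x) → ∃y. woman(y) ∧ love(y)(x)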
Montague's original framework was based on an idiosyncratic version of
categorial grammar. Computational linguists mostly used the formalism of
unification grammar, i.e., phrase-structure grammar extended with feature
unification, when they first started developing large-scale grammars in the
1980s. Unification grammars such as LFG (Dalrymple et al. 1995) and
HPSG (Pollard & Sag 1994) offered an elegant and simple way to compute
predicate-argument structures by filling the argument positions of a head
with the semantic contributions of its complements using unification (see e.g.
Pereira & Shieber 1987). These methods were later extended to cover more
complex problems in semantic construction (Dalrymple 1999; Copestake, Lascarides & Flickinger 2001).
Quantifier storage approaches. One non-local aspect of semantic construction that has received particular attention in computational semantics
is scope ambiguity. From the perspective of theoretical linguistics, the basic problem of semantic construction for sentences with scope ambiguities
was essentially solved by the Quantifier Raising (QR) operation in Montague Grammar. However, QR-based approaches cannot be used effectively
in computational semantics because the development of efficient parsing algorithms becomes very complicated, and it is inconvenient to develop large
grammars. A second major challenge for a computational treatment of scope
is that the number of readings quickly becomes very large as the sentence
grows longer, and the algorithm must still remain efficient even when this
happens. Algorithms for semantic construction can differ by a huge degree
in this respect; recent underspecification-based methods can perform tasks
that used to be completely infeasible (requiring years of computation time
for one sentence) in milliseconds.
A first step towards removing the reliance on QR was quantifier storage, which was first proposed by Cooper (1983) and then refined by Keller
(1988). The key idea in Cooper Storage was to replace Montague's treatment of scope ambiguity by a storage technique for quantifiers: Nodes in a
(phrase-structure) syntax tree are assigned structured semantic representations, consisting of content (a λ-expression of appropriate type) and a quantifier store (a set of λ-expressions representing noun phrase meanings). As
the parse tree is traversed bottom-up, a noun phrase may be applied
in situ to form new content; for the example sentence every man loves a
woman, this leads to narrow scope for the object, in essentially the same
way as in the Montague-style derivation of Fig. 108.1. Alternatively, we
may move the content into the quantifier store at any NP node (as shown
at the node for a woman in Fig. 108.2) and then retrieve an item from
the store and apply it to the content at the sentence node. This enables the
non-deterministic derivation of different scope readings of a sentence from a
surface-oriented phrase-structure grammar analysis.
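The storage discipline itself can be sketched in a few lines of Python (a toy illustration with invented helper names; real implementations operate on full parse trees):

    # Toy sketch of Cooper storage: a node's meaning is a pair
    # (content, store); a stored NP leaves behind an indexed variable
    # that retrieval re-binds at the sentence node.

    def store_np(np_meaning, index, store):
        """Move an NP meaning into the store, leaving a placeholder."""
        placeholder = lambda p: p(f"x{index}")    # acts like the term x_i
        return placeholder, store + [(index, np_meaning)]

    def retrieve(content, store):
        """Scope the most recently stored NP over the content."""
        (index, np_meaning), rest = store[-1], store[:-1]
        return np_meaning(lambda y: content.replace(f"x{index}", y)), rest

    love      = lambda y: lambda x: f"love({y})({x})"
    every_man = lambda p: f"∀x. man(x) → {p('x')}"
    a_woman   = lambda q: f"∃y. woman(y) ∧ {q('y')}"

    obj, store = store_np(a_woman, 1, [])          # store "a woman" as x1
    vp = lambda x: obj(lambda y: love(y)(x))       # λx. love(x1)(x)
    s, store = retrieve(every_man(vp), store)
    print(s)   # ∃y. woman(y) ∧ ∀x. man(x) → love(y)(x)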
A related approach was first proposed by Hobbs & Shieber (1987), and
later generalized to Quasi-Logical Form (QLF; Alshawi & Crouch 1992),
which became a central part of SRI's Core Language Engine (CLE; Alshawi
1990): During parsing, preliminary semantic representations (QLFs) are
built up, which contain the quantifier representations in the argument positions of their main predicate. In a second step, rewrite rules on the QLFs
move quantifiers to their appropriate position, leaving a variable behind to
bring about proper binding. For the above example, this system would first
derive the QLF term love(⟨every, x, man⟩, ⟨some, y, woman⟩), from which it
Figure 108.2: A Cooper Storage derivation for the second reading of the
sentence Every man loves a woman:
S: ∀x. man(x) → love(x1)(x), {⟨λQ.∃y. woman(y) ∧ Q(y)⟩1};
   after retrieval: ∃y. woman(y) ∧ (∀x. man(x) → love(y)(x)), {}
NP (every man): λP.∀x. man(x) → P(x), {}
VP (loves a woman): λx. love(x1)(x), {⟨λQ.∃y. woman(y) ∧ Q(y)⟩1}
V (loves): λy.λx. love(y)(x), {}
NP (a woman): x1, {⟨λQ.∃y. woman(y) ∧ Q(y)⟩1}
would derive the two readings in Fig. 108.1 and Fig. 108.2 by either scoping
⟨every, x, man⟩ over love first and then ⟨some, y, woman⟩ over the result, or
vice versa.
Underspecification. As grammars grew larger in order to extend their
coverage of free text, a further problem emerged. In a sentence with multiple
scope ambiguities, the number of readings can grow exponentially with the
number of quantifiers or other scope-bearing operators (such as negations
or modal operators) in the sentence. The following sentence from Poesio
(1994), which has (5!)² = 14400 readings in which each quantifier and modal
operator takes scope in its own clause alone, illustrates this problem. In
practice, the problem is even worse because large-scale grammars tend to
make generalizing assumptions (e.g., that all noun phrases take scope) that
can cause innocent-looking sentences to be assigned millions of readings.
(1)
A politician can fool most voters on most issues most of the time, but
no politician can fool every voter on every single issue all of the time.
The standard approach to handling massive ambiguity like this in large-scale grammars today is underspecification. Underspecification approaches
derive a compact representation of all readings from the syntactic analysis,
and proceed to single specific readings only as needed, after irrelevant
Figure 108.3: A dominance graph for every man loves a woman (a), along
with the two trees it describes (b, c). The graph contains fragments for the
two quantifiers (with restrictions man(x) and woman(y)) and for love(y)(x);
the dotted dominance edges require love(y)(x) to end up below both quantifiers, and the two trees correspond to the two relative scopings.
readings have been filtered out by inferences. Most underspecification approaches that are used in practice specify the parts from which a semantic
representation is supposed to be built, plus constraints that govern how the
parts may be combined. For instance, the dominance graph (Egg, Koller &
Niehren 2001; Althaus et al. 2003) for the earlier example sentence every
man loves a woman is shown in Fig. 108.3a. The parts of this graph may
be combined in all possible ways that respect the dotted dominance edges,
yielding the two trees in Fig. 108.3b,c. These trees represent the semantic
representations that we also derived in Fig. 108.2.
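The combination step can be sketched as a small search problem (a toy rendering with hardcoded fragments; practical solvers for dominance graphs, MRS, or Hole Semantics operate on much richer structures):

    # Toy sketch: enumerate the trees described by a dominance graph.
    from itertools import permutations

    fragments = {"∀x. man(x) → □": 1, "∃y. woman(y) ∧ □": 1, "love(y)(x)": 0}
    # Dominance edges: the hole of the first fragment must dominate the second.
    dominance = [("∀x. man(x) → □", "love(y)(x)"),
                 ("∃y. woman(y) ∧ □", "love(y)(x)")]

    def configurations():
        quantifiers = [f for f, holes in fragments.items() if holes > 0]
        for order in permutations(quantifiers):
            chain = list(order) + ["love(y)(x)"]
            # keep only pluggings that respect every dominance edge
            if all(chain.index(a) < chain.index(b) for a, b in dominance):
                tree = chain[-1]
                for f in reversed(chain[:-1]):
                    tree = f.replace("□", f"({tree})")
                yield tree

    for reading in configurations():
        print(reading)
    # ∀x. man(x) → (∃y. woman(y) ∧ (love(y)(x)))
    # ∃y. woman(y) ∧ (∀x. man(x) → (love(y)(x)))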
Most modern large-scale grammars use underspecification in one form or
another. HPSG grammars use Minimal Recursion Semantics (MRS, Copestake et al. 2005). The Glue Logic system used by many LFG grammars (Dalrymple 1999) can be seen as an underspecification approach as well; note
that some recent LFG grammars also use a simpler rewriting mechanism
for semantic construction (Crouch & King 2006). Underspecification-based
semantic construction algorithms have also been defined for Tree Adjoining Grammars (Kallmeyer & Romero 2008; Gardent 2003). Hole Semantics
(Blackburn & Bos 2005) is a particularly easy-to-understand underspecification formalism. The algorithmic foundations of underspecification have
been worked out particularly well for dominance graphs, into which MRS
and Hole Semantics can be translated. Dominance graphs also support powerful inference algorithms for efficiently reducing the set of possible readings
without even computing them (Koller & Thater 2010). For more information
about underspecification, we refer to article 24 Semantic underspecification
in this handbook.
One popular grammar formalism in computational linguistics that follows the original Montagovian program more directly is Combinatory Categorial Grammar (Steedman 2000; Bos et al. 2004). CCG is a variant of
categorial grammar, with which it shares a very elegant and direct mapping of syntactic to semantic representations. Although this forces CCG
into modeling semantic ambiguities as syntactic ambiguities, CCG can still
be parsed efficiently by representing both kinds of ambiguity together in a
parse chart.
2.2. Inference
The major added value of logic as a representational framework in computational linguistics is its suitability for the development of provably correct inference procedures. Because logical deduction is backed by the truth-conditional concept of logical entailment, it is possible to define under what
conditions a deduction system is sound and complete, and to develop such
systems. This is crucial when we model the processes which people perform
when interpreting or producing an utterance: e.g., deriving relevant implicit information from the utterance's semantic interpretation, integrating
meaning information into their knowledge, or reducing ambiguity by the
exclusion of inconsistent interpretations.
For first-order predicate logic, theorem provers, that is, computer programs that test formulas for validity or unsatisfiability, have become efficient enough to support the practical application of deduction systems. Theoretically, first-order logic is undecidable; but theorem provers, which were
originally designed for mathematical applications, have nonetheless achieved
an impressive average performance on standard tasks. Currently, a variety
of highly efficient off-the-shelf theorem provers are available which can be
used as general purpose inference engines for natural language processing
(Riazanov & Voronkov 2002; Hillenbrand 2003); there are also tools called
model builders which can test a formula for satisfiability and build satisfying models for it (McCune 1998; Claessen & Sorensson 2003). There has
been some research on theorem provers for dynamic logics, such as DRT (van
Eijck, Hegueiabehere & Ó Nualláin 2001; Kohlhase 2000), but these provers
have not been engineered as thoroughly as standard first-order provers, and
it is more efficient in practice to translate dynamic logic into static logic and
use the standard tools (Bos 2001). One example of an end-to-end system
of the standard model, involving semantic construction and the use of
first-order theorem provers, is Bos & Markert (2005).
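To give a flavor of how such off-the-shelf provers are used, the following sketch writes a small first-order problem in the standard TPTP syntax and hands it to an external prover; the binary name ("eprover") and the output test are placeholders for whatever prover is actually installed:

    # Sketch: calling a first-order theorem prover as an inference engine.
    import subprocess, tempfile

    axioms = [
        "fof(ax1, axiom, ![X]: (man(X) => mortal(X))).",
        "fof(ax2, axiom, man(socrates)).",
    ]
    conjecture = "fof(goal, conjecture, mortal(socrates))."

    with tempfile.NamedTemporaryFile("w", suffix=".p", delete=False) as f:
        f.write("\n".join(axioms + [conjecture]))
        problem = f.name

    # E and Vampire, among others, read TPTP input and report an
    # "SZS status" line such as "Theorem" on success.
    result = subprocess.run(["eprover", "--auto", problem],
                            capture_output=True, text=True)
    print("entailed" if "Theorem" in result.stdout else "not proved")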
It is known that first-order logic is not expressive enough to represent
genuinely higher-order or intensional phenomena in natural language, such
as embedding under propositional attitudes. Some researchers have directly
applied theorem provers for higher-order logic (e.g., Andrews & Brown 2006)
to natural-language inference tasks; see e.g. Gardent & Konrad (2000). However, higher-order theorem provers are much less efficient in practice than
first-order provers. To compensate for this restriction, computational semantics has a strong tendency towards avoiding higher-order constructs,
choosing first-order analyses where semantic theory offers them as
an option, and sometimes even using first-order representations to approximate phenomena that would be modeled appropriately with higher-order
logic only (e.g. in the ontological promiscuity approach (Hobbs 1985); see
also Pulman (2007) for a more recent case study).
Conversely, one can explore the use of logics that are less expressive
than first-order logic in order to maximize efficiency, for restricted tasks
and applications. Description logics (Baader et al. 2003) are a family of
fragments of first-order logic designed to model terminological knowledge
and reasoning about the membership of objects in the denotation of concepts, of which the KL-ONE system is an early representative (Brachman
& Schmolze 1985). They are supported by very fast reasoning systems
(Haarslev & Möller 2001; Tsarkov, Horrocks & Patel-Schneider 2007). Because they offer only restricted types of quantification, however, they have
mostly been used for small domains or for specific problems, such as the
resolution (Koller et al. 2004) and generation (Areces, Koller & Striegnitz
2008) of referring expressions.
Historically, another fragment of first-order logic that experienced widespread use in computational semantics is Horn Clause Logic, which underlies the programming language Prolog. Horn Clause Logic is limited by its
inability to express true logical negation, which in Prolog must be approximated as negation by failure: a negation ¬A is considered true iff
A cannot be proved from the database. Prolog has been widely used in
computational linguistics (Pereira & Shieber 1987; Blackburn & Bos 2005)
among other reasons, because it can model the full process of natural-language understanding (including parsing, semantic construction, and inference) uniformly, by using logical deduction. However, its use has declined
due to the availability of fast theorem provers and of NLP software libraries
for mainstream programming languages, as well as the growing importance
of numeric processing for statistical methods (see Section 3. below).
A final challenge is the modeling of common-sense reasoning. Inference
steps needed in the process of natural-language understanding may be valid
only in the typical case, and thus their results can be overwritten, if more
specific contradicting information is added. Knowing that Tweety is a bird
allows us to infer that Tweety can fly; adding the information that Tweety is
a penguin forces us to revise the derived information. This raises the infer…
3.
The standard model we have presented so far enables us to compute logicbased meaning representations, which can be used by theorem provers to
draw inferences. This works efficiently and with impressive accuracy, if
hand-crafted grammars and knowledge resources are available that cover all
information that is required for the interpretation. However, logic-based
semantic methods run into a number of fundamental problems:
– Natural language is extremely ambiguous, and understanding utterances implies ambiguity resolution: the determination of a contextually appropriate reading. Underspecification methods enable an efficient representation of semantic ambiguity, but they make no attempt to resolve it. A particular challenge is word-sense disambiguation, because lexical ambiguity comprises a large and extremely heterogeneous class of individual phenomena.

– Modeling inference for open-domain text understanding with logic requires us to encode a huge amount of world knowledge in logic-based knowledge bases, as we have discussed. Such knowledge bases are not available; even large-scale efforts at manual resource creation like WordNet and Cyc have coverage problems.

– Despite the progress in hand-crafting large grammars with semantic information, many free-text sentences cannot be completely analyzed by these grammars: Knowledge-based grammar processing still faces coverage problems. Because traditional algorithms for semantic construction can only work on complete parses, no semantic representations can be computed for these sentences. That is, semantic construction procedures are not robust to coverage problems.
As a consequence, logic-based methods for computational semantics have
not been very successful as part of applications in language technology. …
3.1.
Word-sense disambiguation. Lexical ambiguity is pervasive in natural languages, and the determination of the contextually appropriate word
meaning, known as word-sense disambiguation (WSD), has long been recognized as a hard problem in computational linguistics. Over fifty years ago,
Yehoshua Bar-Hillel argued in his famous report on automatic translation
(Bar-Hillel 1960) that a translation machine should not only be supplied
with a dictionary but also with a universal encyclopedia. For example, to
appropriately translate the sentence "The box was in the pen" into another language, a
computer program must know about typical sizes and shapes of boxes and
pens to conclude that pen is used here in the "enclosure" sense rather than
the "writing implement" sense. Bar-Hillel commented that any attempt to
solve this problem with knowledge-based methods was "utterly chimerical"
and hardly deserved any further discussion.
We can get a first grasp on the problem of WSD from lexical-semantic
resources that define an inventory of possible word senses for each word of a
language. Two such resources for English are WordNet (Fellbaum 1998) and
Roget's Thesaurus (Chapman 1977). WordNet lists Bar-Hillel's two senses
for the noun pen, along with the senses "correctional institution" and
"female swan". English WordNet contains about 29,000 polysemous words,
each of them with 3 different senses on average. Neither of these resources
contains the information (e.g., box and pen sizes) that is necessary to reliably
determine the sense in which a word was used in a given sentence.
Machine learning and WSD. WSD in early large-scale NLP systems
was typically done by hand-written rules that were developed specifically for
the application and the relevant domain (see e.g. Toma 1977; Hobbs et al.
1992; Koch, Küssner & Stede 2000). Early attempts at defining generic
rule-based methods for WSD include Wilks (1975) and Hirst & Charniak (1982). The
weighted abduction approach by Hobbs et al. (1993) supported a generic,
logic-based mechanism for disambiguation, but suffered from efficiency issues
and required a large hand-coded knowledge base to work.
By contrast, statistical approaches attempt to solve the WSD problem
by automatically learning the choice of the appropriate word sense from text
corpora. The fundamental idea of such a machine learning approach is to
build a classifier, which for each occurrence of a word w in some context c
determines the sense s of this occurrence of w. This classifier is automatically
learned from observations in a text corpus, in which each occurrence of each
word has been manually annotated with its sense; one corpus that has been
annotated with WordNet senses is the SemCor corpus (Landes, Leacock &
Tengi 1998).
Machine learning approaches in which the training data is assumed to be
annotated in this way are called supervised. The context c is usually approximated by a collection f of features that can be automatically extracted from
the text. The machine learning system is trained on the annotated training
corpus, i.e., it observes the pairs of sense annotations and extracted feature
instantiations, for all instances of w, and derives from these data a statistical model of the correlation between feature patterns and word senses.
The system can then be executed on unseen, unlabeled documents to label
each word token automatically with its most plausible word sense, given the
feature information extracted from the token's context.
Different approaches to statistical WSD are distinguished by the features
they use and the machine learning method. The simplest choice for the
features is to use context words. For instance, Yarowsky's (1995) system
automatically identified the context words life, animal, and species as strong
statistical indicators of the biological sense of the target word plant, and
manufacturing, equipment, and employee as strong indicators of its factory
sense. To address the disambiguation problem in a systematic way, we might
determine the 2000 most frequent content words w1, …, w2000 in the corpus.
For any occurrence of a target word w, we could then assign the feature fi
the value 1 if the context word wi occurs within a window of n words (for
n = 5, 10, 30, …) before or after w, and 0 otherwise. Approaches to machine
learning differ substantially in the exact way in which they make use of the
feature information to solve their classification task. For an overview of
different approaches to machine learning, see Mitchell (1997), Russell &
Norvig (2010), or Witten, Frank & Hall (2011).
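As a minimal illustration of this setup (toy training data and invented sense labels; a real system would train on a sense-annotated corpus such as SemCor), context-word features can be fed to any standard classifier:

    # Sketch of supervised WSD with bag-of-context-words features.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Each instance: the words around one occurrence of "plant".
    contexts = [
        "the workers at the production plant went on strike",
        "water the plant and keep it in the garden",
    ]
    senses = ["factory", "flora"]

    vectorizer = CountVectorizer()              # context words as count features
    X = vectorizer.fit_transform(contexts)
    classifier = MultinomialNB().fit(X, senses)

    test = vectorizer.transform(["a wild plant growing in the garden"])
    print(classifier.predict(test))             # expected: ['flora']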
Modeling context. The choice of features is a crucial part of designing a
successful machine-learning-based WSD system: Since only the information
encoded in features is visible to the machine learning system, the design of
the feature space entails a decision about the information made available to
the disambiguation process. The simplistic view of context as a set of co-occurring content words can be refined by adding more features representing
different kinds of information. We can, e.g., include precedence information
(does the context word occur to the left or to the right of the target?) or
use positional information (does the context word occur as the immediate
left or right neighbor of the target instance?). We may enrich the context
information with linguistic information provided by available, reasonably
efficient and reliable analysis tools: Using lemma and part-of-speech information is standard; adding syntactic information through shallow syntactic
parsing is another frequently chosen option.
In principle, it would be desirable to use deeper and more informative
context features than this. However, extracting such features tends to be
expensive (it may again require large hand-crafted grammar and knowledge
resources) or extremely noisy, if it can be done at all. Nevertheless, even
the simple context-word approach can capture a remarkable amount of information on different levels of contextual knowledge and their interaction.
Consider the following example; the common noun dish is ambiguous between a "plate" and a "food" sense.
(6)
The verb order contributes selectional preference information for its object position, and restaurant provides relevant topical or situational information. The two pieces of contextual evidence interact in a way that supports a strong prediction of the "food" sense of dish. Explicit modeling
of the inference process leading to the correct reading would require very
specific common-sense knowledge. A simple statistical model is able to predict the effects of this interaction with good results, based on the simple
co-occurrence counts of these context words.
Measuring system performance. A machine learning system generalizes from observations without human intervention, and typically only has
access to shallow features. The goal in designing such a system is therefore
never that it is infallible. Instead, the aim is to balance maximum coverage with making relatively few mistakes. In order to examine the quality
of such a system, one evaluates it on data for which the correct responses
are known. To this end, one splits the manually annotated corpus into two
separate portions for training and testing. The machine learning system is
trained on the training corpus, and then used to classify every single word in
the test corpus. One can, e.g., compute the accuracy, i.e., the percentage of
word tokens in the test corpus for which the system computed the annotated
word sense. This makes it possible to compare the performance of different
systems using well-defined measures.
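The computation itself is elementary; what matters is the strict separation of training and test data. A schematic helper (with invented sense labels):

    def accuracy(gold, predicted):
        """Fraction of test tokens whose predicted sense matches the annotation."""
        correct = sum(g == p for g, p in zip(gold, predicted))
        return correct / len(gold)

    print(accuracy(["pen/enclosure", "pen/writing"],
                   ["pen/enclosure", "pen/enclosure"]))   # 0.5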
WSD has been an active field of research in computational semantics
for the last two decades. An early successful WSD system was presented by
Yarowsky (1992). One can get a sense of the current state of the art from the
results of the Coarse-grained English All Words Task (Navigli, Litkowski
& Hargraves 2007), a competition advertised for the SemEval 2007 workshop. This task consists in annotating the words in a given corpus with a
coarse-grained sense inventory derived from WordNet. The random baseline, which assigns each word a random sense, achieved an accuracy of about
52% on this task. Because one sense of a word is often strongly predominant, the simple policy of assigning the instances of each word always its
globally most frequent sense achieves 79% accuracy on the dataset, which
is a much more demanding baseline for WSD systems. On the other hand,
the inter-annotator agreement, i.e. the percentage of tokens for which the
human annotators agreed when creating the SemEval 2007 test data, was
94%. This is usually taken to indicate the upper bound for automatic processing. The best-performing WSD system in the 2007 competition reached
an accuracy of about 88%, beating the most-frequent-sense baseline significantly. Although the WSD system does not reach human performance yet, it
does come rather close. Recent overview articles about WSD are McCarthy
(2009) and Navigli (2009).
3.2.
FrameNet and PropBank. Research on SRL in computational linguistics therefore tends to use semantic role inventories which do not assume
universal semantic roles, either in FrameNet (Fillmore & Baker 2010) or in
PropBank style (Palmer, Gildea & Kingsbury 2005).
FrameNet organizes the lexicon into frames, which correspond to situation types. The FrameNet database currently contains about 12,000 lexical
units, organized into 1,100 frames. Semantic roles (called frame elements)
are then assumed to be specific to frames. For example, the verbs replace and substitute (as well as exchange and switch, and the nouns replacement and substitution) evoke the REPLACING frame; core roles
of this frame are Agent, Old, and New. The names of these roles are meaningful only within a given frame. This makes the role concept of FrameNet
rather specific and concrete, and makes it possible to annotate role information with high intuitive confidence. Two major corpora that have been
annotated with FrameNet data are the Berkeley FrameNet Corpus (Baker,
Fillmore & Cronin 2003) and the SALSA Corpus for German (Burchardt
et al. 2006). An example that illustrates how different verbs can induce the
same predicate-argument structure in FrameNet is shown in (8).
(8) a. [Agent Lufthansa] is replacingREPLACING [Old its 737s]
[New with Airbus A320s].
b. [Agent Lufthansa] is substitutingREPLACING [New Airbus A320s]
[Old for its 737s].
The PropBank approach proposes an even more restricted notion of a
semantic role. PropBank assumes specific roles called arg0, arg1, arg2, . . .
for the senses of each verb separately, and thus only relates syntactic alternations of the same predicate to each other. Role label identity between
complements of different verbs is not informative, as the examples in (9)
illustrate:
(9) a. [Arg0 Lufthansa] is replacing [Arg1 its 737s]
[Arg2 with Airbus A320s].
b. [Arg0 Lufthansa] is substituting [Arg1 Airbus A320s]
[Arg3 for its 737s].
Of the two approaches, FrameNet is the more ambitious one, in that it
supports a more informative encoding of predicate-argument structure than
PropBank role labeling. However, annotating a corpus with PropBank roles
is easier and can be done much more quickly than for FrameNet. As a consequence, exhaustively annotated corpora are available for several languages;
the English PropBank corpus is a version of the Penn Treebank (Marcus,
Santorini & Marcinkiewicz 1993) in which the arguments of all verb tokens
are annotated with semantic roles.
Semantic role labeling systems. The SRL task for FrameNet or PropBank can be split into two steps. First, because roles are specific to FrameNet
frames or PropBank verb senses, we must determine the frame or sense in
which a given verb token is being used. This is a WSD task, and is usually
handled with WSD methods.
Assuming that each predicate in the sentence has been assigned a frame,
the second step is to identify the arguments and determine the semantic
roles they fill. The first system that did this successfully was presented by
Gildea & Jurafsky (2002), originally for FrameNet; the approach has
also been adapted for PropBank (see Palmer, Gildea & Kingsbury 2005).
It uses a set of features providing information about the target verb, the
candidate role-filler phrase, and their mutual relation. Most of the features
refer to some kind of syntactic information, which is typically provided by
a statistical parser. Features used include the phrase type (e.g., NP, PP,
S); the head word of the candidate phrase; the voice of the head verb; the
position of the candidate phrase relative to the head verb (left or right); and
the path between candidate phrase and head verb, described as a string of
non-terminals. Based on this information, the system estimates the probability that the candidate phrase stands in certain role relations to the target
predicate, and selects the most probable one for labeling.
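A toy rendering of this feature extraction (with a hardcoded parse and a crude head rule; real systems take the tree from a statistical parser and use the up/down path notation of the original paper):

    # Sketch of Gildea & Jurafsky-style features for one candidate phrase.
    from nltk import Tree

    tree = Tree.fromstring(
        "(S (NP Lufthansa) (VP (VBZ replaces) (NP (PRP$ its) (NNS 737s))))")
    cand_pos, pred_pos = (1, 1), (1, 0)   # object NP and the verb's preterminal

    def path_feature(tree, a, b):
        """Label path from candidate up to the common ancestor, down to the verb."""
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        up = [tree[a[:i]].label() for i in range(len(a), k - 1, -1)]
        down = [tree[b[:i]].label() for i in range(k + 1, len(b) + 1)]
        return "^".join(up) + "!" + "!".join(down)

    cand = tree[cand_pos]
    features = {
        "phrase_type": cand.label(),          # NP
        "head_word": cand.leaves()[-1],       # crude rightmost-head rule: 737s
        "position": "right",                  # candidate follows the predicate
        "path": path_feature(tree, cand_pos, pred_pos),   # NP^VP!VBZ
    }
    print(features)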
Feature design and the sparse data problem. The Gildea & Jurafsky
system (as well as more recent approaches to SRL) uses syntactic information, but only looks at a handful of specific features of a syntax tree; much
of the available information that the syntax tree contains is hidden from
the machine learning system. Even a human annotator would sometimes
have difficulties in predicting the correct semantic roles given just this information. If the SRL system assumes that it has full syntactic information
anyway, why does it ignore most of it? Couldn't its performance be improved by adding additional features that represent more detailed syntactic
information?
This question touches upon a fundamental challenge in using statistical
methods, the sparse data problem. Every statistical model is trained from
a limited set of observations in the corpus, and is expected to make accurate predictions on unseen data. The reliability of these predictions depends
greatly on the size of the training corpus and the number of features. If we
add features, we increase the number of possible combinations of feature-value pairs, i.e., the size of the feature space. For a given size of the training
data, this means that certain feature-value combinations will be seen only
once or not at all in training, which implies that the estimate of the statistical model becomes too inaccurate to make good predictions. Smoothing
and back-off techniques can improve the performance of systems by assigning some kind of positive probability to combinations that have never or
rarely been seen in training. But even these methods ultimately reduce the
system's predictions on rare events to educated guesses.
The trade-off between informativity and occurrence frequency is one of
the major challenges to statistical NLP. Sensible feature design, i.e. selecting
a feature set which provides maximal information while keeping the feature
space manageable, is a task where combined technical and linguistic expertise is required.
Further reading. For a more detailed introduction to standard SRL, we
refer the reader to Jurafsky & Martin (2008). Just as for WSD, a good
starting point to get a sense of the state of the art is to look at recent SRL
competitions (Carreras & Marquez 2004; Carreras & Marquez 2005; Hajic
et al. 2009).
3.3.
All data-intensive methods we have described so far are supervised methods: They require manually annotated corpora for training. The sparse data
problem we just mentioned arises because annotating a corpus is costly and
time-intensive, which limits the size of available corpora (Ng 1997). As a result, supervised methods can only be used with relatively
inexpressive features.
Data expansion methods attempt to work around this problem by partially automating the annotation process. These methods train an initial
model on a small amount of manually annotated seed data; use this model
to identify instances in a large un-annotated corpus whose correct annotation
can be predicted with high confidence; add the automatically annotated instances to the corpus; use the extended corpus to retrain the model; and then
repeat the entire process in a bootstrapping cycle, as sketched below. Such semi-supervised
methods have been quite successful in early WSD systems (Yarowsky 1995). …
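Schematically (train and predict are placeholders for any supervised learner that returns a label together with a confidence score):

    # Sketch of the bootstrapping cycle behind semi-supervised learning.
    def self_train(seed, unlabeled, train, predict, threshold=0.95, rounds=5):
        labeled = list(seed)
        for _ in range(rounds):
            model = train(labeled)
            confident = [(x, lab) for x in unlabeled
                         for lab, conf in [predict(model, x)]
                         if conf >= threshold]
            if not confident:
                break                          # nothing new to learn from
            labeled.extend(confident)
            done = {x for x, _ in confident}
            unlabeled = [x for x in unlabeled if x not in done]
        return train(labeled)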
              factory  flower  tree  plant  water  fork
  grow             15     147   330    517    106     3
  garden            5     200   198    316    118    17
  worker          279       0     5     84     18     0
  production      102       6     9    130     28     0
  wild              3     216    35     96     30     0

Figure 108.4: Some co-occurrence vectors from the British National Corpus.
Figure 108.5: Graphical illustration of the co-occurrence vectors for factory, plant, tree, and flower.
…through counts of context words occurring in the neighborhood of target
word instances. Take, as in the WSD example above, the n (e.g., 2000)
most frequent content words in a corpus as the set of relevant context words;
then count, for each word w, how often each of these context words occurred
in a context window of fixed size before or after each occurrence of w. Fig. 108.4
shows the co-occurrence counts for a number of target words (columns),
and a selection of context words (rows) obtained from a 10% portion of the
British National Corpus (Clear 1993).
The resulting frequency pattern encodes information about the meaning
of w. According to the Distributional Hypothesis, we can model the semantic
similarity between two words by computing the similarity between their cooccurrences with the context words. In the example of Fig. 108.4, the target
flower co-occurs frequently with the context words grow and garden, and
infrequently with production and worker. The target word tree has a similar
distribution, but the target factory shows the opposite co-occurrence pattern
with these four context words. This is evidence that trees and flowers are
more similar to each other than to factories.
Technically, we represent each word w as a vector in a high-dimensional
vector space, with one dimension for each context word; the value of the vector at a certain dimension v is the co-occurrence frequency of w with v. We
define a similarity measure between words based on their respective vector
representations. A commonly used measure is the cosine of the angle between the two vectors, which can be computed easily from the co-occurrence
counts. It assumes the value 1 if the vectors' directions coincide (i.e., the
proportions of their context-word frequencies are identical), and 0 if the vectors are orthogonal (i.e., the distributions are maximally dissimilar). In the
5-dimensional word-space of our example, we obtain a high distributional
similarity between the targets tree and flower (cosine of 0.752, representing
an angle of about 40°), and a low similarity (cosines of 0.045 and 0.073, respectively, representing angles of about 85°) between either of the two and
the target factory, as illustrated in Fig. 108.5.
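These scores can be recomputed directly from the counts in Fig. 108.4; a small numpy sketch (values agree with the figures quoted above up to rounding):

    # Cosine similarities over the co-occurrence vectors of Fig. 108.4.
    import numpy as np

    # dimensions: grow, garden, worker, production, wild
    tree    = np.array([330, 198,   5,   9,  35])
    flower  = np.array([147, 200,   0,   6, 216])
    factory = np.array([ 15,   5, 279, 102,   3])

    def cosine(v, w):
        return v @ w / (np.linalg.norm(v) * np.linalg.norm(w))

    print(round(cosine(tree, flower), 3))     # high: ≈ 0.75
    print(round(cosine(tree, factory), 3))    # low:  ≈ 0.07
    print(round(cosine(flower, factory), 3))  # low:  ≈ 0.05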
Discussion. Standard distributional models offer only a rough approximation to lexical meaning. Strictly speaking, they do not model semantic
similarity in terms of the likeness of lexical meaning, but a rather vague
notion of semantic relatedness, which includes synonymy, topical relatedness, and even antonymy (Budanitsky & Hirst 2006). This is in part because
the notion of context is rather crude. A deeper problem is that textual co-occurrence patterns provide essentially incomplete and indirect information
about natural-language meaning, whose primary function is to connect language to the world. We will come back to this issue in Section 4.4.
Nevertheless, distributional approaches to semantics are attractive because they are fully unsupervised: They do not require any annotation or
other preparatory manual work, in contrast to the supervised and semisupervised methods sketched above. Therefore, one gets wide-coverage models almost for free; the only prerequisite is a text corpus of sufficient size.
In particular, distributional models can be easily obtained for languages for
which no lexicon resources exist, and adapted to arbitrary genre-specific or
domain-specific sub-languages. They have proven practically useful for several language-technology tasks. Examples are word-sense disambiguation
(McCarthy & Carroll 2003; Li, Roth & Sporleder 2010; Thater, Fürstenau
& Pinkal 2011), word-sense induction (Schütze 1998), information retrieval
(Manning, Raghavan & Schütze 2008), and question answering (Dinu 2011).
Contextualization. An obvious flaw of the basic distributional approach
is that it counts words rather than word senses. Because of lexical ambiguity,
the distributional pattern of a word is therefore a mixture of the distributional patterns of its individual senses. …
4. Current developments

4.1. Textual entailment
As we have argued above, inference is the touchstone for computational semantics. It is the capability of supporting inferences that makes semantic
processing potentially useful in applications. The performance of a semantic processing method is therefore strongly dependent on its performance
in modeling inference. While the evaluation of WSD or SRL systems is
straightforward, the question of how to assess a system's performance on
the more global task of modeling inference appropriately has long been an
open issue in the computational semantics community.
FraCaS. A first step in this direction was the creation of a test suite of
inference problems by the FraCaS project in the 1990s (Cooper et al. 1996).
Each problem consisted of a premise and a candidate conclusion (phrased
as a yes/no question), plus information about their logical relation; systems
could then be evaluated by making them decide the logical relation between
the sentences and comparing the result against the gold standard. Two of
the roughly 350 examples are shown below:
(10) P: ITEL won more orders than APCOM
Q: Did ITEL win some orders?
YES
(11) P: Smith believed that ITEL had won the contract in 1992
Q: Had ITEL won the contract in 1992?
UNKNOWN
The FraCaS test suite was hand-crafted to cover challenging semantic
phenomena (such as quantifiers, plural, anaphora, temporal reference, and
attitudes), while minimizing the impact of problems like syntactic complexity and word-sense ambiguity. This made it a valuable diagnostic tool for
semanticists, but it also limited its usefulness for the performance evaluation of semantic processing systems on real-world language data, in which
syntactic complexity is uncontrolled and word-sense ambiguity is prevalent.
RTE. A milestone in the development of an organized and realistic evaluation framework for natural-language inference was the Recognizing Textual
Entailment (RTE) challenge initiated by Ido Dagan and his colleagues in the
PASCAL network (Dagan, Glickman & Magnini 2006). The RTE dataset
consists of pairs of sentences (a text T and a hypothesis H) derived from
text that naturally occurred in applications such as question answering, information retrieval, and machine translation, plus an annotation specifying
whether each sentence pair stands in an entailment relation.
In RTE, entailment is defined as follows:
We say that T entails H if the meaning of H can be inferred from
the meaning of T, as would typically be interpreted by people.
This somewhat informal definition is based on (and assumes)
common human understanding of language as well as common
background knowledge. (Dagan, Glickman & Magnini 2006)
For instance, the following sentence pair from the second RTE challenge
(Bar-Haim et al. 2006) is in the entailment relation.
(12) T: In 1954, in a gesture of friendship to mark the 300th anniversary
of Ukrainian union with Russia, Soviet Premier Nikita Khrushchev
gave Crimea to Ukraine.
H: Crimea became part of Ukraine in 1954.
YES
Crucially, textual entailment is not a logical notion; it is a relation
between textual objects. The above definition has been criticized for its
vagueness and for its insufficient theoretical grounding, in that it blurs the
distinction between logical entailment, common-sense inference, presupposition, and conversational implicature (Zaenen, Karttunen & Crouch 2005).
However, it was deliberately intended as a specification of a pre-theoretic …
Pure logic-based systems, located at the other end of the spectrum, have
completely failed at the RTE task, which was shown impressively by Bos
& Markert (2005). They applied a state-of-the-art logic-based system along
the lines of Section 2. Where this system claimed entailment for a given
sentence pair, its judgment was quite reliable; but because it claimed
entailment for less than 6% of the pairs, it gave far fewer correct answers
overall than a simple word-overlap model. This demonstrates the severity
of the knowledge bottleneck in logic-based semantics, which we mentioned
above.
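For comparison, a complete word-overlap baseline fits in a few lines (the threshold here is an invented illustration, not a tuned value):

    # Minimal word-overlap baseline for RTE.
    import re

    def tokens(s):
        return set(re.findall(r"[a-z0-9]+", s.lower()))

    def overlap_entails(text, hypothesis, threshold=0.6):
        h = tokens(hypothesis)
        return len(h & tokens(text)) / len(h) >= threshold

    t = ("In 1954, in a gesture of friendship to mark the 300th anniversary "
         "of Ukrainian union with Russia, Soviet Premier Nikita Khrushchev "
         "gave Crimea to Ukraine.")
    h = "Crimea became part of Ukraine in 1954."
    print(overlap_entails(t, h))   # True: 5 of 7 hypothesis words occur in T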
A standard system architecture that emerged from the experiences in
RTE combines syntactic and semantic knowledge with machine learning
technology. A typical inventory of knowledge types includes syntactic dependency information contributed by knowledge-based or statistical parsers plus
lexical semantic information taken from WordNet or distributional models,
potentially complemented by semantic role information (FrameNet, PropBank) and lexical semantic and world knowledge from other sources (e.g.,
DIRT (Lin & Pantel 2001), VerbOcean (Chklovski & Pantel 2004), or the
YAGO knowledge base (Suchanek, Kasneci & Weikum 2008)). This information is used as input to a supervised machine-learning system, which learns
to predict the entailment status of a sentence pair from features indicating
structural and semantic similarity. Systems enhanced with linguistic knowledge in such ways typically outperform the purely overlap-based systems,
but only by a rather modest margin, with an accuracy around 65% (see e.g.
Giampiccolo et al. (2007) for an overview).
A notable exception is Hickl & Bensley (2007), a system submitted by
an industrial company (LCC) in the RTE-3 Challenge, which achieved 80%
accuracy, using a variety of rich resources in a machine learning approach. A
second LCC system (Tatu & Moldovan 2007) used a special-purpose theorem
prover (Moldovan et al. 2007) and reached a high accuracy as well. Although
neither the knowledge repositories nor the details about the method are
available to the public, it is likely that the success of these systems stems
from language and knowledge resources of various kinds that have been
built over years with enormous manpower, accompanied by a consistent
optimization of methods based on repeated task-oriented evaluations. This
suggests that at the end of the day, the decisive factor in building high-performing systems for entailment checking is not a single theoretical insight
or design decision, but rather the availability of huge amounts of information
about language and the world. The key difference between the logic-based
and machine-learning paradigms is that the latter degrades more gracefully
when this information is not sufficiently available.
Discussion. Between 2005 and 2010, a total of about 300 different systems
were evaluated. This has helped a lot in providing a clear picture
of the potential of different methods and resources on the task. However,
the RTE Challenges reveal a current state of the art that is not entirely
satisfactory. Statistical systems appear to hit a ceiling in modeling inference.
This is not just a technical problem: the fundamental shortcoming of purely
text-based approaches is that they do not model the truth conditions of
the sentences involved, and therefore cannot ground entailment in truth. It
is difficult to imagine how a notion of inference for semantically complex
sentences can be approximated by a model that does not in some way or
another subsume the conceptual framework of logic-based semantics. On the
other hand, direct implementations of the logic-based framework do not solve
the problem either, because such systems are rendered practically unusable
by the lack of formalized knowledge. Resolving this tension remains the
central challenge for computational semantics today.
4.2.
One promising direction of research that might help solve the dilemma is to
model truth-based entailment directly in natural language, without resorting
to explicit logical representations. The idea is old (indeed, before the
introduction of formal logic, it was the only way of analyzing inference),
but was revived and formalized in the 1980s by Johan von Benthem under
the heading of natural logic (van Benthem 1986; Sánchez-Valencia 1991).
Consider the following examples:
(14) a. Last year, John bought a German convertible.
b. Last year, John bought a German car.
To determine the entailment relation between (14a) and (14b), we need
not compute the respective logical representations and employ a deduction
system. We just need to know that convertible is a hyponym of car.
The argument does not apply in general. Replacing convertible with car
in "John didn't buy a convertible" or "John bought two convertibles" has
different semantic effects: In the former case, entailment holds in the inverse
direction; in the latter, the two sentences are logically independent. The
differences are due to the different monotonicity properties (in the sense of
Barwise & Cooper 1981) of the contexts in which the respective substitutions
take place. In addition to knowledge about lexical inclusion relations, we
need syntactic information, a mechanism for monotonicity marking, and
monotonicity or polarity information for the functor expressions (in the sense
of categorial grammar or type theory).
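The core mechanism can be stated in a few lines (a toy sketch with a hardcoded lexical relation and hand-assigned polarities; real systems compute polarities compositionally):

    # Toy sketch of monotonicity reasoning: the entailment direction of
    # a lexical substitution depends on the polarity of its context.
    hyponym = {("convertible", "car")}        # convertible ⊑ car

    def substitution_entails(old, new, polarity):
        """Does replacing old by new preserve truth in this context?"""
        if polarity == "upward":              # e.g. "John bought a __"
            return (old, new) in hyponym      # generalizing word: entailed
        if polarity == "downward":            # e.g. "John didn't buy a __"
            return (new, old) in hyponym      # inference runs the other way
        return False                          # non-monotone context

    print(substitution_entails("convertible", "car", "upward"))    # True
    print(substitution_entails("convertible", "car", "downward"))  # False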
Natural logic and RTE. MacCartney & Manning (2008) and MacCartney (2009) propose a model for textual entailment recognition which is based
on natural logic and extends and complements it in several respects. Compared to the original approach of Sánchez-Valencia, they use
a refined inventory of semantic relations. Wide-coverage knowledge about
lexical semantic relations is obtained from WordNet, with distributional similarity as a fallback. Monotonicity handling includes the polarity analysis
of implicative and factive verbs (Nairn, Condoravdi & Karttunen 2006),
in addition to the standard operators (negation, determiners, conjunctions,
modal expressions) and constructions. Their full model also processes sentence pairs that require multiple substitutions, deletions, or insertions; the
global entailment relation between the sentences is computed as the joint
entailment effect of the individual edit steps.
(15) a. Some people are happy without a car.
b. Some professors are happy without an expensive convertible.
(16) Some SUBST(people, professors) are happy without an
INSERT(expensive) SUBST(car, convertible).
Because the preposition without is downward monotonic in its internal argument, the system can
make the correct, but nontrivial, judgment that (15a) and (15b) do not
entail each other, based on the edits shown in (16).
MacCartney's NATLOG system has been shown to achieve an accuracy
of 70% on the FraCaS test suite. This demonstrates that the system can
handle logically non-trivial inference problems, although some phenomena,
like ellipsis, are outside the system's coverage. On the RTE-3 test set, the
system has an accuracy of 59%, which does not exceed the performance
achieved by simple word-overlap systems. However, the positive message is
that the natural-logic-based approach is able to avoid the robustness
issues that make semantic construction for standard logic-based systems so
difficult. Combining NATLOG with the shallow Stanford RTE system
(de Marneffe et al. 2006) increases the accuracy of the shallow system
(60.5%) by 4 percentage points, which shows that the deep inferences captured by the
natural-logic-based system are able to complement shallow RTE methods in
a substantial way.
Discussion. The natural logic approach does not capture all inferences
that a predicate logic approach would. It does not deal with inferences that
require multiple premises, and can only relate sentence pairs in which the
lexical material is exchanged while the global structure stays the same (e.g.,
de Morgan's Law is outside its reach). However, the approach does cover
many inference patterns that are relevant in natural language, and the overhead for semantic construction and the disambiguation of irrelevant parts
of sentences is eliminated, because no translation to logical representation
is required.
4.3.
One reason for the low performance of logic-based inference systems in the
standard framework of computational semantics is the lack of wide-coverage
semantic construction procedures. Natural logic gets around the problem
by dispensing with semantic construction altogether. An alternative that
has recently been explored is the use of machine learning techniques for the
automatic assignment of rich semantic representations.
To get a better idea of the task, it is helpful to consider its relationship
to systems for syntactic parsing. The two problems are similar from a high-level perspective, in that both compute structured linguistic representations
for natural language expressions. The dominant approach in syntactic parsing is to apply supervised statistical approaches to syntactically annotated
corpora, in order to learn grammars and estimate the parameters of a syntactic probability model. For semantic construction, statistical approaches
have been much less successful. Even for Semantic Role Labeling, the results
are noisier than for syntax. The assignment of complex logical structures
as representations for full sentences is harder, due to the fine granularity of
the target representations and the difficulty of finding surface features that
are indicative of deep semantic phenomena. This makes the specification of
annotation guidelines that would allow non-experts to reliably annotate a
corpus challenging.
Nevertheless, a considerable amount of research in the past few years has
investigated the use of supervised learning in semantic parsers, trained on
…
these methods have so far been applied only to relatively small corpora from
limited domains, and it remains to be seen how well they will scale up.
4.4.
5. Conclusion
6. References
von Ahn, Luis & Laura Dabbish 2004. Labeling images with a computer
game. In: Proceedings of the ACM CHI Conference.
Alshawi, Hiyan (ed.) 1990. The Core Language Engine. MIT Press.
Alshawi, Hiyan & Richard Crouch 1992. Monotonic semantic interpretation. In: Proceedings of the 30th Annual Meeting of the Association for
Computational Linguistics.
Althaus, Ernst, Denys Duchier, Alexander Koller, Kurt Mehlhorn, Joachim
Niehren & Sven Thiel 2003. An efficient graph algorithm for dominance
constraints. Journal of Algorithms 48, 194–219.
Andrews, Peter B. & Chad E. Brown 2006. TPS: A hybrid automatic-interactive system for developing proofs. Journal of Applied Logic 4,
367–395.
Areces, Carlos, Alexander Koller & Kristina Striegnitz 2008. Referring expressions as formulas of description logic. In: Proceedings of the 5th
International Natural Language Generation Conference. Salt Fork.
Asher, Nicholas & Alex Lascarides 2003. Logics of Conversation. Cambridge
University Press.
Baader, Franz, Diego Calvanese, Deborah McGuiness, Daniele Nardi & Peter Patel-Schneider (eds.) 2003. The Description Logic Handbook: Theory, implementation and applications. Cambridge University Press.
Baker, Collin, Charles Fillmore & Beau Cronin 2003. The structure of the
FrameNet database. International Journal of Lexicography 16, 281–296.
Bar-Haim, Roy, Ido Dagan, Bill Dolan, Lisa Ferro, Danilo Giampiccolo,
Bernardo Magnini & Idan Szpektor 2006. The second PASCAL Recognising Textual Entailment challenge. In: Proceedings of the Second PASCAL Challenges Workshop on Recognising Textual Entailment.
Bar-Hillel, Yehoshua 1960. The present status of automatic translation of
languages. Advances in Computers 1, 91–163.
Barwise, Jon & Robin Cooper 1981. Generalized quantifiers and natural
language. Linguistics & Philosophy 4, 159–219.
van Benthem, Johan 1986. Essays in Logical Semantics. Dordrecht: Reidel.
Blackburn, Patrick & Johan Bos 2005. Representation and Inference for
Natural Language. A First Course in Computational Semantics. CSLI
Publications.
Bos, Johan 2001. DORIS 2001: Underspecification, Resolution and Inference
for Discourse Representation Structures. In: Proceedings of the Third
International Workshop on Inference in Computational Semantics.
Bos, Johan, Stephen Clark, Mark Steedman, James Curran & Julia Hockenmaier 2004. Wide-coverage semantic representations from a CCG parser.
In: Proceedings of the 20th International Conference on Computational
Linguistics (COLING).
Bos, Johan & Katja Markert 2005. Recognising textual entailment with logical inference. In: Proceedings of the Conference on Empirical Methods
in Natural Language Processing. 628–635.
Brachman, Ronald & James Schmolze 1985. An overview of the KL-ONE
knowledge representation system. Cognitive Science 9, 171–216.
Branavan, S.R.K., Harr Chen, Luke S. Zettlemoyer & Regina Barzilay 2009.
Reinforcement learning for mapping instructions to actions. In: Proceedings of the Joint Conference of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing (ACL-IJCNLP).
Budanitsky, Alexander & Graeme Hirst 2006. Evaluating WordNet-based
measures of semantic distance. Computational Linguistics 32(1), 13–47.
Burchardt, Aljoscha, Katrin Erk, Anette Frank, Andrea Kowalski, Sebastian
Padó & Manfred Pinkal 2006. The SALSA Corpus: a German corpus
resource for lexical semantics. In: Proceedings of the 5th International
Conference on Language Resources and Evaluation (LREC). 969–974.
Carreras, Xavier & Lluis Marquez 2004. Introduction to the CoNLL-2004
shared task: Semantic role labeling. In: Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL).
Carreras, Xavier & Lluis Marquez 2005. Introduction to the CoNLL-2005
shared task: Semantic role labeling. In: Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL). 152–164.
Chapman, R. 1977. Roget's International Thesaurus. New York: Harper &
Row.
Chen, David & Bill Dolan 2011. Building a persistent workforce on Mechanical Turk for multilingual data collection. In: Proceedings of the 25th
Conference on Artificial Intelligence (AAAI-11).
Chen, David L., Joohyun Kim & Raymond J. Mooney 2010. Training a Multilingual Sportscaster: Using Perceptual Context to Learn Language.
Journal of Artificial Intelligence Research 37, 397–435.
Chiang, David 2007. Hierarchical phrase-based translation. Computational
Linguistics 33, 201–228.
Chklovski, Timothy & Patrick Pantel 2004. VerbOcean: Mining the Web
for fine-grained semantic verb relations. In: Proceedings of EMNLP.
Claessen, Koen & Niklas Sorensson 2003. New techniques that improve
MACE-style model finding. In: Proceedings of the CADE-19 Workshop
on Model Computation – Principles, Algorithms, Applications. 11–27.
Clear, Jeremy 1993. The British National Corpus. MIT Press.
Cooper, Robin 1983. Quantification and Syntactic Theory. Dordrecht: Reidel.
Cooper, Robin, Richard Crouch, Jan van Eijck, Chris Fox, Johan
van Genabith, Jan Jaspars, Hans Kamp, David Milward, Manfred
Pinkal, Massimo Poesio & Steve Pulman 1996. Using the framework. FraCaS project deliverable D-16, Technical Report LRE 62-051;
ftp://ftp.cogsci.ed.ac.uk/pub/FRACAS/del16.ps.gz.
Copestake, Ann, Dan Flickinger, Carl Pollard & Ivan Sag 2005. Minimal Recursion Semantics: An Introduction. Research on Language and Computation 3, 281–332.
Copestake, Ann, Alex Lascarides & Dan Flickinger 2001. An algebra for
semantic construction in constraint-based grammars. In: Proceedings of
the 39th Annual Meeting of the Association for Computational Linguistics. Toulouse, 132–139.
Crouch, Dick & Tracy Holloway King 2006. Semantics via f-structure rewriting. In: Proceedings of the LFG06 Conference.
Dagan, Ido, Oren Glickman & Bernardo Magnini 2006. The PASCAL Recognising Textual Entailment challenge. In: J. Quiñonero-Candela, I. Dagan, B. Magnini & F. d'Alché-Buc (eds.) Machine Learning Challenges.
Springer. 177–190.
Dahl, Deborah A., Madeleine Bates, Michael Brown, William Fisher, Kate
Hunicke-Smith, David Pallett, Christine Pao, Alexander Rudnicky &
Elizabeth Shriberg 1994. Expanding the scope of the ATIS task: the
ATIS-3 corpus. In: Proceedings of the ARPA Human Language Technology Workshop.
Dalrymple, Mary (ed.) 1999. Semantics and Syntax in Lexical Functional
Grammar: The Resource Logic Approach. MIT Press.
Dalrymple, Mary, Ronald M. Kaplan, John T. Maxwell & Annie Zaenen
(eds.) 1995. Formal Issues in Lexical-Functional Grammar. CSLI Publications.
Dinu, Georgiana 2011. Word Meaning in Context: A Probabilistic Model
and its Application to Question Answering. Ph.D. thesis, Saarland University.
Dinu, Georgiana & Mirella Lapata 2010. Topic models for meaning similarity in context. In: Proceedings of the 23rd International Conference on
Computational Linguistics (COLING).
Egg, Markus, Alexander Koller & Joachim Niehren 2001. The constraint language for lambda structures. Journal of Logic, Language, and Information 10, 457–485.
van Eijck, Jan, Juan Hegueiabehere & Breanndán Ó Nualláin 2001. Tableau reasoning and programming with dynamic first order logic. Logic Journal of the IGPL.
Erk, Katrin & Sebastian Padó 2008. A structured vector space model for
word meaning in context. In: Proceedings of EMNLP.
Fellbaum, Christiane (ed.) 1998. WordNet: An Electronic Lexical Database.
MIT Press.
Fillmore, Charles 1968. Lexical entries for verbs. Foundations of Language
4, 373–393.
Fillmore, Charles J. & Collin F. Baker 2010. A frame approach to semantic
analysis. In: B. Heine & H. Narrog (eds.) Oxford Handbook of Linguistic
Analysis, Oxford: Oxford University Press.
Firth, John 1957. Papers in Linguistics 1934–1951. Oxford University Press.
Fleischman, Michael & Deb Roy 2005. Intentional context in situated language learning. In: Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL).
Fürstenau, Hagen & Mirella Lapata 2009. Semi-supervised semantic role labeling. In: Proceedings of the 12th EACL.
Gardent, Claire 2003. Semantic construction in feature-based TAG. In: Proceedings of the 10th Meeting of the European Chapter of the Association
for Computational Linguistics (EACL). 123–130.
Gardent, Claire & Karsten Konrad 2000. Understanding Each Other. In:
Proceedings of the Sixth Applied Natural Language Processing Conference (ANLP). 319–326.
Giampiccolo, Danilo, Bernardo Magnini, Ido Dagan & Bill Dolan 2007. The
third PASCAL Recognizing Textual Entailment challenge. In: Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing.
Gildea, Daniel & Daniel Jurafsky 2002. Automatic labeling of semantic roles.
Computational Linguistics 28, 245–288.
Girju, Roxana, Adriana Badulescu & Dan Moldovan 2006. Automatic discovery of part-whole relations. Computational Linguistics 32.
Glickman, Oren, Ido Dagan & Moshe Koppel 2005. A probabilistic classification approach for lexical textual entailment. In: Proceedings of the
20th National Conference on Artificial Intelligence (AAAI).
Gold, Kevin & Brian Scassellati 2007. A robot that uses existing vocabulary
to infer non-visual word meanings from observation. In: Proceedings of
the Twenty-Second Conference on Artificial Intelligence (AAAI).
Grefenstette, Edward & Mehrnoosh Sadrzadeh 2011. Experimental support
for a categorical compositional distributional model of meaning. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Groenendijk, Jeroen & Martin Stokhof 1991. Dynamic predicate logic. Linguistics and Philosophy 14, 39–100.
Grosz, Barbara, Aravind Joshi & Scott Weinstein 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics 21, 203–225.
Grosz, Barbara & Candace Sidner 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12, 175–204.
Haarslev, Volker & Ralf Möller 2001. Description of the RACER system and its applications. In: Proceedings of the International Workshop on Description Logics (DL-2001). 131–141.
Hajič, Jan, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, Pavel Straňák, Mihai Surdeanu, Nianwen Xue & Yi Zhang 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL): Shared Task.
Hamp, Birgit & Helmut Feldweg 1997. GermaNet – a lexical-semantic net for German. In: Proceedings of the ACL Workshop on Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications.
Harris, Zellig S. 1951. Methods in Structural Linguistics. University of Chicago Press.
Hearst, Marti A. 1992. Automatic acquisition of hyponyms from large text
corpora. In: Proceedings of the Fourteenth Conference on Computational
Linguistics (COLING). Nantes, 539–545.
Hickl, Andrew & Jeremy Bensley 2007. A discourse commitment-based
framework for recognizing textual entailment. In: Proceedings of the
ACL-PASCAL Workshop on Textual Entailment and Paraphrasing.
Keller, William 1988. Nested Cooper storage: The proper treatment of quantification in ordinary noun phrases. In: U. Reyle & C. Rohrer (eds.)
Natural Language Parsing and Linguistic Theories, Dordrecht: Reidel.
432–447.
Kipper-Schuler, Karin 2006. VerbNet: A Broad-coverage, Comprehensive
Verb Lexicon. Ph.D. thesis, University of Pennsylvania.
Koch, Stephan, Uwe Küssner & Manfred Stede 2000. Contextual disambiguation. In: W. Wahlster (ed.) Verbmobil: Foundations of Speech-to-Speech Translation, Heidelberg: Springer. 466–480.
Kohlhase, Michael 2000. Model generation for discourse representation theory. In: Proceedings of the 14th European Conference on Artificial Intelligence (ECAI). 441–445.
Kohlhase, Michael, Susanna Kuschert & Manfred Pinkal 1996. A type-theoretic semantics for λ-DRT. In: P. Dekker & M. Stokhof (eds.) Proceedings of the 10th Amsterdam Colloquium. 479–498.
Koller, Alexander, Ralph Debusmann, Malte Gabsdil & Kristina Striegnitz
2004. Put my galakmid coin into the dispenser and kick it: Computational linguistics and theorem proving in a computer game. Journal of
Logic, Language, and Information 13, 187–206.
Koller, Alexander & Stefan Thater 2010. Computing weakest readings. In:
Proceedings of the 48th ACL. Uppsala.
Kruijff, Geert-Jan M., Hendrik Zender, Patric Jensfelt & Henrik I. Christensen 2007. Situated dialogue and spatial organization: What, where
... and why? International Journal of Advanced Robotic Systems 4, 125–138.
Kwiatkowski, Tom, Luke Zettlemoyer, Sharon Goldwater & Mark Steedman 2010. Inducing probabilistic CCG grammars from logical form with
higher-order unification. In: Proceedings of the Conference on Empirical
Methods in Natural Language Processing (EMNLP).
Kwiatkowski, Tom, Luke Zettlemoyer, Sharon Goldwater & Mark Steedman 2011. Lexical generalization in CCG grammar induction for semantic parsing. In: Proceedings of the Conference on Empirical Methods in
Natural Language Processing (EMNLP).
Landauer, Thomas, Peter Foltz & Darrell Laham 1998. An introduction to
latent semantic analysis. Discourse Processes 25, 259–284.
Landes, Shari, Claudia Leacock & Randee I. Tengi 1998. Building semantic
concordances. In: C. Fellbaum (ed.) WordNet: An Electronic Lexical
Database, Cambridge, MA: MIT Press.
Lenat, Douglas 1995. CYC: A large-scale investment in knowledge infrastructure. Communications of the ACM 38, 33–38.
Li, Linlin, Benjamin Roth & Caroline Sporleder 2010. Topic models for word
sense disambiguation and token-based idiom detection. In: Proceedings
of the 48th Annual Meeting of the Association for Computational Linguistics (ACL).
Lin, Dekang & Patrick Pantel 2001. Discovery of inference rules for question
answering. Natural Language Engineering 7, 343–360.
Lipton, Peter 2001. Inference to the Best Explanation. London: Routledge.
MacCartney, Bill 2009. Natural language inference. Ph.D. thesis, Stanford
University.
MacCartney, Bill & Christopher D. Manning 2008. Modeling semantic containment and exclusion in natural language inference. In: Proceedings of the 22nd International Conference on Computational Linguistics
(COLING-08).
Manning, Christopher 2006. Local Textual Inference: It's Hard to Circumscribe, but You Know It When You See It and NLP Needs It. Ms., Stanford University. http://nlp.stanford.edu/manning/papers/LocalTextualInference.pdf.
Manning, Christopher, Prabhakar Raghavan & Hinrich Schütze 2008. Introduction to Information Retrieval. Cambridge University Press.
Manning, Christopher & Hinrich Schütze 1999. Foundations of Statistical Natural Language Processing. MIT Press.
Marcus, Mitchell P., Beatrice Santorini & Mary Ann Marcinkiewicz 1993.
Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19, 313–330.
de Marneffe, Marie-Catherine, Bill MacCartney, Trond Grenager, Daniel
Cer, Anna Rafferty & Christopher Manning 2006. Learning to distinguish valid textual entailments. In: Proceedings of the Second PASCAL
Workshop on Recognizing Textual Entailment.
Marszalek, Marcin, Ivan Laptev & Cordelia Schmid 2009. Actions in context.
In: Proceedings of the IEEE Computer Society Conference on Computer
Vision and Pattern Recognition.
Ng, Hwee Tou 1997. Getting serious about word sense disambiguation. In:
Proceedings of the ACL SIGLEX Workshop on Tagging Text with Lexical
Semantics: Why, What, and How? 1–7.
Ng, Vincent 2010. Supervised noun phrase coreference research: The first
fifteen years. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL). 1396–1411.
Orkin, Jeff & Deb Roy 2007. The Restaurant Game: Learning social behavior and language from thousands of players online. Journal of Game
Development 3, 39–60.
Palmer, Martha, Daniel Gildea & Paul Kingsbury 2005. The Proposition
Bank: An annotated corpus of semantic roles. Computational Linguistics
31, 71–105.
Pereira, Fernando C. N. & Stuart M. Shieber 1987. Prolog and Natural-Language Analysis. CSLI Publications.
Poesio, Massimo 1994. Ambiguity, underspecification, and discourse interpretation. In: Proceedings of the First International Workshop on Computational Semantics.
Pollard, Carl & Ivan Sag 1994. Head-driven Phrase Structure Grammar.
University of Chicago Press.
Pulman, Stephen 2007. Formal and computational semantics: A case study.
In: J. Geertzen, E. Thijsse, H. Bunt & A. Schiffrin (eds.) Proceedings
of the Seventh International Workshop on Computational Semantics
(IWCS). 181–196.
Ravichandran, Deepak & Eduard Hovy 2002. Learning surface text patterns
for a question answering system. In: Proceedings of the 40th ACL.
Reiter, Raymond 1980. A logic for default reasoning. Artificial Intelligence 13, 81–132.
Riazanov, Alexandre & Andrei Voronkov 2002. The design and implementation of VAMPIRE. AI Communications 15, 91–110.
Roy, Deb & Ehud Reiter 2005. Connecting language to the world. Artificial
Intelligence 167, 1–12.
Russell, Stuart & Peter Norvig 2010. Artificial Intelligence: A Modern Approach. Prentice Hall.
Sánchez-Valencia, Víctor 1991. Studies on Natural Logic and Categorial Grammar. Ph.D. thesis, University of Amsterdam.
Turney, Peter & Patrick Pantel 2010. From frequency to meaning: Vector
space models of semantics. Journal of Artificial Intelligence Research 37,
141–188.
Vossen, Piek 2004. EuroWordNet: a multilingual database of autonomous
and language-specific wordnets connected via an Inter-Lingual-Index.
International Journal of Lexicography 17.
Wilks, Yorick 1975. A preferential, pattern-seeking, semantics for natural
language inference. Artificial Intelligence 6.
Witten, Ian H., Eibe Frank & Mark A. Hall 2011. Data Mining: Practical
Machine Learning Tools and Techniques. Morgan Kaufmann.
Wong, Yuk Wah & Raymond J. Mooney 2007. Learning synchronous grammars for semantic parsing with lambda calculus. In: Proceedings of the
45th Annual Meeting of the Association for Computational Linguistics
(ACL).
Yarowsky, David 1992. Word-sense disambiguation using statistical models
of Roget's categories trained on large corpora. In: Proceedings of COLING.
Yarowsky, David 1995. Unsupervised word sense disambiguation rivaling
supervised methods. In: Proceedings of the 33rd ACL.
Zaenen, Annie, Lauri Karttunen & Richard Crouch 2005. Local textual inference: Can it be defined or circumscribed? In: Proceedings of the
43rd Annual Meeting of the Association for Computational Linguistics
(ACL).
Zelle, John M. & Raymond J. Mooney 1996. Learning to parse database
queries using Inductive Logic Programming. In: Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI). 1050–1055.
Zettlemoyer, Luke S. & Michael Collins 2005. Learning to map sentences to
logical form: Structured classification with probabilistic categorial grammars. In: Proceedings of the Twenty-First Conference on Uncertainty in
Artificial Intelligence (UAI).
Keywords:
computational linguistics, knowledge-based methods, corpus-based methods