Anaphora Processing: Linguistic, Cognitive and Computational Modelling, 1st Edition
Editors: António Branco, Anthony Mark McEnery, Ruslan Mitkov
ISBN: 9789027247773 (9027247773)
Edition: 1
File details: PDF, 2.24 MB
Year: 2005
Language: English
ANAPHORA PROCESSING
AMSTERDAM STUDIES IN THE THEORY AND
HISTORY OF LINGUISTIC SCIENCE
General Editor
E.F. KONRAD KOERNER
(Zentrum für Allgemeine Sprachwissenschaft, Typologie
und Universalienforschung, Berlin)
Volume 263
Anaphora Processing
Linguistic, cognitive and computational modelling
ANAPHORA PROCESSING
LINGUISTIC, COGNITIVE
AND COMPUTATIONAL MODELLING
Edited by
ANTÓNIO BRANCO
Universidade de Lisboa
TONY McENERY
Lancaster University
RUSLAN MITKOV
University of Wolverhampton
This volume includes extended versions of the best papers from DAARC’2002.
The selection process was highly competitive: all authors of papers at
DAARC’2002 were invited to submit an extended and updated version of their
DAARC’2002 paper, which was then reviewed anonymously by three reviewers,
members of a Paper Selection Committee of leading international researchers. It
is worth mentioning that whilst we were delighted to have so many
contributions at DAARC’2002, restrictions on the number of papers and pages
which could be included in this volume forced us to be more selective than we
would have liked. Of the 44 papers presented at the colloquium, we were able
to select only the 20 best.
FOREWORD
The book is organised thematically. The papers in the volume have been
topically grouped into three sections:
(i) Computational treatment (6 papers)
(ii) Theoretical, psycholinguistic and cognitive issues (7 papers)
(iii) Corpus-based studies (7 papers)
However, this classification should not be regarded as too strict or absolute, as
some of the papers touch on issues pertaining to more than one of the three
topical groups above.
We would like to thank all authors who submitted papers both to the
colloquium and to the call for papers associated with this volume. Their original
and revised contributions made this project materialise.
Mira Ariel
(Tel Aviv University, Israel)
Amit Bagga
(Avaya Inc., USA)
Branimir Boguraev
(IBM T. J. Watson Research Center, USA)
Peter Bosch
(University of Osnabrück, Institute of Cognitive Science, Germany)
Donna Byron
(The Ohio State University, Computer and Information Science Dept., USA)
Francis Cornish
(CNRS and Université de Toulouse-Le Mirail, Département des Sciences du
Langage, France)
Dan Cristea
(University “Al. I. Cuza” of Iasi, Faculty of Computer Science, Romania)
Robert Dale
(Macquarie University, Division of Information and Communication Sciences,
Centre for Language Technology, Australia)
Iason Demiros
(Institute for Language and Speech Processing, Greece)
Richard Evans
(University of Wolverhampton, School of Humanities, Languages and Social
Sciences, UK)
Martin Everaert
(OTS, The Netherlands)
Claire Gardent
(INRIA-Lorraine, LORIA, France)
Jeanette Gundel
(University of Minnesota, USA and NTNU, Norway)
Sanda Harabagiu
(University of Texas, USA)
Graeme Hirst
(University of Toronto, Department of Computer Science, Canada)
Yan Huang
(University of Reading, Department of Linguistics, UK)
Andrew Kehler
(University of California, San Diego, Department of Linguistics, USA)
Rodger Kibble
(University of London, Department of Computing, UK)
Andrej Kibrik
(Russian Academy of Sciences, Institute of Linguistics, Russia)
Emiel Krahmer
(Tilburg University, The Netherlands)
Shalom Lappin
(King's College, UK)
Naila Pimodri
(University of Cambridge)
Maria Mercedes Piñango
(Yale University, Department of Linguistics, USA)
Massimo Poesio
(University of Essex, Department of Computer Science, UK)
Eric Reuland
(OTS, The Netherlands)
Marco Rocha
(Universidade Federal de Santa Catarina, Brazil)
Antonio Fernandez Rodriguez
(University of Alicante, Spain)
Monique Rolbert
(Université de Marseille, France)
Tony Sanford
(University of Glasgow, Department of Psychology, UK)
Roland Stuckardt
(Johann Wolfgang Goethe University Frankfurt am Main, Germany)
Linda Van Guilder
(MITRE, USA)
We would also like to acknowledge the help received from our series editor,
Prof. Konrad Koerner, and from Ms Anke de Looper of John Benjamins in
Amsterdam. We are also very grateful to João Silva for his patience and
considerable help with the nitty-gritty of file formatting. Without them this book
would not have been a viable project.
September 2004
António Branco
Tony McEnery
Ruslan Mitkov
Section I
Computational Treatment
A Sequenced Model of Anaphora and Ellipsis Resolution
Shalom Lappin
Department of Computer Science, King's College
1 Earlier versions of this paper were presented at the 4th Discourse Anaphora and Anaphora Resolution Colloquium in Lisbon in September 2002, the Linguistics Colloquium at the University of Toronto in November 2002, and the Linguistics Colloquium at the University of Reading in January 2003. I am grateful to the audiences of these forums for useful discussion of the ideas presented here. I would also like to thank Ruslan Mitkov and Andy Kehler for their encouragement and their helpful comments on this work.
2 See (Mitkov, 2002) for a recent study of anaphora resolution that includes a history of the problem within natural language processing. See (Mitkov et al., 2001) for examples of current work on anaphora resolution. (Huang, 2000) offers an extensive cross-linguistic investigation of anaphora and examines alternative linguistic theories of this relation. See (Lappin, 1996) and (Lappin & Benmamoun, 1999) for theoretical and computational discussions of ellipsis resolution.
e. not_want(e3,John,e5) ∧ have(e5,Bill,ck)
f. not_want(e4,John,e6) ∧ drive(e6,Bill)
g. drunk(e2,Bill)
The main strength of knowledge-based systems is their capacity to capture
fine-grained semantic and pragmatic distinctions not encoded in syntactic
features or frequency of co-occurrence patterns. These distinctions are not
accessible to knowledge-poor approaches. They are crucial to correctly
resolving pronominal anaphora and VP ellipsis in a small but important set of
cases that arise in text and dialogue.
The two main difficulties with these systems are that (i) they require a large
database of axioms encoding real world knowledge, and (ii) they apply
defeasible inference rules which produce combinatorial blow up very quickly.
Assigning cost values to inference rules and invoking a cost driven preference
system for applying these rules (as in (Hobbs et al., 1993)) may reduce the blow
up to some extent, but the problem remains significant.
As a result, knowledge-based models of anaphora resolution are generally
not robust. Their rules are often domain-dependent and hard to formulate in a
way that will support inference over more than a small number of cases.
Moreover, the semantic/discourse representations to which the inference rules
apply are not reliably generated for large texts.
3 Knowledge-Poor Approaches
Knowledge-poor systems of anaphora resolution rely on features of the input
which can be identified without reference to deep semantic information or
detailed real world knowledge. One version of this approach employs syntactic
structure and grammatical roles to compute the relative salience of candidate
antecedents. Another uses machine-learning strategies to evaluate the
probability of alternative pronoun-antecedent pairings by training on large
corpora in which antecedent links are marked.
Hobbs (1978) suggests one of the first instances of a syntactic salience
procedure for resolving pronouns. He formulates a tree search algorithm that
uses syntactic configuration and sequential ordering to select NP antecedents of
pronouns through left-right, breadth-first traversal of a tree. Lappin and Leass
(1994) propose an algorithm which relies on weighted syntactic measures of
salience and recency to rank a filtered set of NP candidates. This algorithm
applies to full syntactic parses. Kennedy and Boguraev (1996), Mitkov (1998),
and Stuckardt (2001) modify and extend this approach to yield results for
partial syntactic representations rather than full and unambiguous parse
structures. Grosz et al. (1995) employ a grammatical role hierarchy and
preference rules for managing informational state change to select the local NP
centre (focus) for each element of the sequence of sentences in a discourse.
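The salience-and-recency strategy described above can be rendered as a short sketch in the style of Lappin and Leass (1994): candidates that survive a morphological filter are ranked by weighted salience factors plus a recency bonus. The factor names, weights, and feature encoding below are illustrative placeholders, not the published algorithm or its values.

```python
# Illustrative sketch of salience-based pronoun resolution: filter
# candidate NPs by agreement, then rank by weighted salience factors.
# Weights are placeholders, not Lappin and Leass's published values.

SALIENCE_WEIGHTS = {
    "subject": 80,        # grammatical subjects are highly salient
    "direct_object": 50,
    "oblique": 40,
}
RECENCY_BONUS = 100       # NPs in the sentence currently being processed

def agree(pronoun, candidate):
    """Morphological filter: number and gender agreement."""
    return (pronoun["number"] == candidate["number"]
            and pronoun["gender"] == candidate["gender"])

def salience(candidate, current_sentence):
    score = SALIENCE_WEIGHTS.get(candidate["role"], 0)
    if candidate["sentence"] == current_sentence:
        score += RECENCY_BONUS
    return score

def resolve(pronoun, candidates):
    """Return the agreeing candidate with the highest salience score."""
    filtered = [c for c in candidates if agree(pronoun, c)]
    if not filtered:
        return None
    return max(filtered, key=lambda c: salience(c, pronoun["sentence"]))

# "John gave Bill the report. He read it."
candidates = [
    {"text": "John", "role": "subject", "number": "sg", "gender": "m", "sentence": 0},
    {"text": "Bill", "role": "direct_object", "number": "sg", "gender": "m", "sentence": 0},
    {"text": "the report", "role": "direct_object", "number": "sg", "gender": "n", "sentence": 0},
]
he = {"text": "He", "number": "sg", "gender": "m", "sentence": 1}
print(resolve(he, candidates)["text"])  # "John": subject outranks object
```

The same design carries over to the variants cited above: Kennedy and Boguraev, Mitkov, and Stuckardt in effect change what the filter and the weighted factors are computed from, rather than the ranking scheme itself.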
A recent instance of the application of machine learning to anaphora is
(Soon et al., 2001). They describe a procedure for training a classifier on a
corpus annotated with coreference chains, where the NP elements of these
chains are assigned a set of features. The classifier goes through all pairs of
referential NP's in a text to identify a subset of coreferential pairs.
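The pairwise classification setup Soon et al. describe can be sketched as follows: every pair of referential NPs in a text becomes a feature vector that a trained classifier labels as coreferential or not. The two features and the hand-set rule below are illustrative stand-ins for their trained decision tree and richer feature set.

```python
# Sketch of pairwise coreference classification: run a classifier over
# all pairs of referential NPs. The features and the hand-coded
# decision rule stand in for a classifier trained on annotated data.

from itertools import combinations

def features(np1, np2):
    return {
        "string_match": np1["head"].lower() == np2["head"].lower(),
        "number_agree": np1["number"] == np2["number"],
    }

def classify(feats):
    # Stand-in for a trained model: head-string match plus number
    # agreement is taken as evidence of coreference.
    return feats["string_match"] and feats["number_agree"]

def coreferent_pairs(nps):
    """Identify the subset of NP pairs classified as coreferential."""
    return [(a["id"], b["id"])
            for a, b in combinations(nps, 2)
            if classify(features(a, b))]

nps = [
    {"id": 0, "head": "president", "number": "sg"},
    {"id": 1, "head": "election", "number": "sg"},
    {"id": 2, "head": "President", "number": "sg"},
]
print(coreferent_pairs(nps))  # [(0, 2)]
```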
The obvious advantage of knowledge-poor systems relative to knowledge-
based models is that the former are computationally inexpensive and potentially
robust. However, these claims of resource efficiency and wide coverage must
be qualified by recognition of the expense involved in generating accurate
syntactic representations for systems that apply to full parses or detailed
grammatical role information. Salience-driven systems also require domain
specific and, possibly, language specific values for syntactic salience measures.
In the case of machine learning techniques, it is necessary to factor in the cost
of annotating large corpora and training classification procedures.
An important weakness of these models is that they cannot handle a small
but significant core of anaphora resolution cases in which salience cannot be
identified solely on the basis of syntactic and morphological properties, and
relative recency. These features are also the basis for the candidate rankings
that machine learning methods generate.
Dagan et al. (1995) attempt to enrich a syntactic salience system by
modelling (a certain amount of) semantic and real world information cheaply.
They combine the Lappin-Leass algorithm with a statistically trained lexical co-
occurrence preference module. Elements of the candidate antecedent list are
assigned both salience and lexical preference scores. The latter are based on
frequency counts for verb-NP and prep-NP pairs in a corpus, and the
substitution of the candidate for the pronoun in the observed head-argument
relation of the pronoun. When the difference between the salience scores of the
two highest ranked candidates is below an (experimentally) determined threshold
and the lexical preference score of another candidate Ci exceeds that of the first
by an (experimentally) specified ratio, then Ci is selected.
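That decision rule can be written down directly; the threshold and ratio values below are illustrative, since the paper determined them experimentally.

```python
# Sketch of the Dagan et al. (1995) combination rule: the lexical
# preference module overrides the salience ranking only when the top
# two salience scores are too close to be trusted and another
# candidate's preference score wins by a sufficient ratio.
# Threshold and ratio values are illustrative placeholders.

SALIENCE_THRESHOLD = 10
PREFERENCE_RATIO = 2.0

def select_antecedent(candidates):
    """candidates: dicts with 'name', 'salience', 'preference' scores."""
    by_salience = sorted(candidates, key=lambda c: c["salience"], reverse=True)
    first, second = by_salience[0], by_salience[1]
    if first["salience"] - second["salience"] < SALIENCE_THRESHOLD:
        # Salience ranking unreliable: consult lexical preference.
        best = max(candidates, key=lambda c: c["preference"])
        if best is not first and best["preference"] >= PREFERENCE_RATIO * first["preference"]:
            return best
    return first

# Schematic rendering of example (5): 'utility' outranks 'file' on
# salience, but corpus counts for "print file" far exceed those for
# "print utility" (the numbers are invented for illustration).
candidates = [
    {"name": "utility", "salience": 100, "preference": 2.0},
    {"name": "file", "salience": 95, "preference": 12.0},
]
print(select_antecedent(candidates)["name"])  # "file"
```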
Consider the pronoun it in 5.
(5) The utility (CDVU) shows you a LIST4250, LIST38PP, or LIST3820 file on your
terminal for a format similar to that in which it will be printed.
The statistical preference module overrides the higher syntactic salience
ranking of utility to select file as the antecedent of it. This preference is due to
the fact that print file has a significantly higher frequency count than print
utility. The statistical module improved the performance of Lappin and
3 Asher et al. (2001) also invoke this condition to resolve pronouns in ambiguous elided VP's.
relative to every student. The DRS's of the wide scope reading for a test do not
produce a theme for the DRS's of the wide scope reading of every student.
Several other instances of knowledge-based and inference-driven models of
VP ellipsis interpretation are as follows. Hobbs and Kehler (1997), and Kehler
(2002) use parallelism constraints for text coherence to identify VP antecedents.
Dalrymple et al. (1991) and Shieber et al. (1996) apply higher-order unification
to resolve the predicate variable in the semantic representation of an elided VP.
Crouch (1999) constructs derivations in linear logic to provide alternative ways
of assembling the constituents in the representation of an antecedent in order to
obtain possible interpretations of the clause containing the ellipsis site.
This approach to VP ellipsis enjoys the same advantages and suffers from
the same weaknesses that we noted with respect to the knowledge intensive
view of pronominal anaphora resolution.
Turning to a knowledge-poor model, Hardt (1997) describes a procedure for
identifying the antecedent of an elided VP in text that applies to the parse
structures of the Penn Treebank.4
It constructs a list of candidate VP's to which it applies a syntactic filter. The
elements of the filtered candidate list are assigned scores on the basis of
syntactic salience factors and recency.
On a blind test of 96 examples from the Wall Street Journal the procedure
achieved a success rate of 94.8% according to a head verb overlap criterion (the
head verb of the system's selected candidate is contained in, or contains the
head verb of the coder's choice of antecedent). It achieved 85.4% for exact head
verb match and 76% for full antecedent match. A comparison procedure that
relies only on recency scored 75% for head verb overlap, 61.5% for exact head
verb match, and 14.6% for full antecedent match.
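The three match criteria used in this evaluation can be sketched as simple predicates; representing an antecedent as a token list with a designated head verb is an illustrative simplification.

```python
# Sketch of the three evaluation criteria reported for Hardt's (1997)
# VP-ellipsis procedure: head verb overlap, exact head verb match, and
# full antecedent match. Antecedents are simplified to token lists
# with a designated head verb.

def head_verb_overlap(system, gold):
    """The system's head verb is contained in the gold antecedent, or
    the gold head verb is contained in the system's candidate."""
    return system["head"] in gold["tokens"] or gold["head"] in system["tokens"]

def exact_head_match(system, gold):
    return system["head"] == gold["head"]

def full_match(system, gold):
    return system["tokens"] == gold["tokens"]

system = {"head": "read", "tokens": ["read", "the", "report"]}
gold = {"head": "read", "tokens": ["read", "the", "report", "carefully"]}

print(head_verb_overlap(system, gold))  # True
print(exact_head_match(system, gold))   # True
print(full_match(system, gold))         # False
```

The ordering of the reported scores (94.8% > 85.4% > 76%) follows from the fact that each criterion strictly entails the weaker ones above it.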
Hardt's syntactic salience-based procedure uses essentially the same strategy
and design for identifying the antecedent of an elided VP as Lappin and Leass’
(1994) algorithm applies to pronominal anaphora resolution. Its higher success
rate may, in part, be due to the fact that recency and syntactic filtering tend to
reduce the set of candidates more effectively for elided VP's than for pronouns.
As in the case of pronouns, a small set of elided VP cases are not accessible
to resolution by salience ranking or statistically modelled lexical preference.
The following examples clearly indicate that inference based on semantic and
real world knowledge appears to be inescapable for these cases.5
(9) Mary and Irv want to go out, but Mary can't, because her father disapproves of Irv.
(Webber, 1979)
Mary can't go out with Irv
(10) Harry used to be a great speaker, but he can't anymore, because he lost his voice.
(Hardt, 1993)
he can't speak anymore

4 Hardt's procedure applies to elided VP's that have already been recognized. Nielsen (2003, 2004) presents preliminary results for the application of a variety of machine learning methods to the identification of elided VP's in text.

5 Dalrymple (1991), Hardt (1993), and Kehler (2002) claim that the fact that inference is required to identify the antecedents of the elided VP's in (9) and (10) shows that ellipsis resolution applies to semantic rather than syntactic representations. In fact, it is not obvious that the need for inference in (some cases of) ellipsis resolution in itself determines the nature of the representation to which the inference rules apply. Lappin (1996) argues that inference can apply to syntactic representations of sentences to generate structures corresponding to (i) and (ii).
(i) Mary wants to go out with Irv.
(ii) Harry used to speak.
These structures supply appropriate antecedents for the syntactic reconstruction of the elided VP's in (9) and (10), respectively. The need for inference in ellipsis resolution, on one hand, and the nature of the level of representation to which inference and ellipsis resolution apply, on the other, are independent questions which should be distinguished.

5 The Interpretation of Fragments in Dialogue

Fernández et al. (to appear) present SHARDS, a system for interpreting non-sentential phrasal fragments in dialogue. Examples of such fragments are short answers (11), sluices (short questions, 12), and bare adjuncts (13). The latter are possible even when no wh-phrase adjunct appears in the antecedent to anchor them, as in (14).

(11) A: Who saw Mary?
B: John.
John saw Mary.
(12) A: A student saw John.
B: Who?
Which student saw John?
(13) A: When did Mary arrive?
B: At 2.
Mary arrived at 2.
(14) A: John completed his paper.
B: When?
When did John complete his paper?

SHARDS is a Head Driven Phrase Structure Grammar (HPSG)-based system for the resolution of fragments in dialogue. It treats the task of resolving fragment ellipsis as locating for the (target) ellipsis element a parallel (source) element in the context, and computing from contextual information a property which composes with the target to yield the resolved content. This basic view of ellipsis resolution is similar in spirit to the higher-order unification (HOU) approach of Dalrymple et al. (1991) and Pulman (1997).
Two new attributes are defined within the CONTEXT feature structure: the
Maximal Question Under Discussion (MAXQUD) and the Salient Utterance
(SALUTT). The MAXQUD is the most salient question that needs to be answered
in the course of a dialogue. The SALUTT represents a distinguished constituent
of the utterance whose content is the current value of MAXQUD. In information
structure terms, the SALUTT specifies a potential parallel element correlated with
an element in the antecedent question or assertion. The SALUTT is the element
of the MAXQUD that corresponds to the fragment phrase. By deleting the SALUTT
from the MAXQUD, SHARDS produces the representation of a property from
which the propositional core of the CONTENT value for the fragment can be
constructed.
(15) is the (simplified) typed feature structure that (Fernández et al., to
appear) posit for a bare fragment phrase.
(15) bare-arg-ph ⇒
     [ STORE    {}
       CONT | SOA | NUCL    [1]
       CTXT     [ MAX-QUD | SOA | NUCL    [1]
                  SAL-UTT  [ CAT           [2]
                             CONT | INDEX  [3] ] ]
       HD-DTRS  [ CAT           [2]
                  CONT | INDEX  [3] ] ]
SHARDS interprets a fragment in dialogue by computing from context
(represented as a dialogue record) the values of MAXQUD and SALUTT for the
assertion or question clause that the fragment expresses. It uses these feature
values to specify the CONTENT feature of the clause for the fragment. The basic
propositional content of the fragment clause is recovered from the MAXQUD,
whose NUCL feature value is shared with the NUCL feature of the fragment
clause's CONT feature.
The value of SALUTT is of type sign, enabling the system to encode syntactic
categorial parallelism conditions, including case assignment for the fragment.
The SALUTT is computed as the (sub)utterance associated with the role bearing
widest quantificational scope within the MAXQUD.
SHARDS computes the possible MAXQUD's from each sentence which it
processes and adds them to the list of MAXQUD candidates in the dialogue
record. When a fragment phrase FP is encountered, SHARDS selects the most
recent element of the MAXQUD candidate list which is compatible with FP's
clausal semantic type.
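The resolution step just described can be rendered schematically: maintain a dialogue record of candidate MAXQUDs, take the most recent candidate compatible with the fragment's clausal type, and substitute the fragment for the SALUTT in it. The data structures below are simplifications of the HPSG feature structures SHARDS actually manipulates, and the string substitution stands in for the feature-structure composition described in the text.

```python
# Schematic sketch of SHARDS-style fragment resolution: select the
# most recent type-compatible MAXQUD from the dialogue record and
# substitute the fragment for its SALUTT. Strings stand in for the
# typed feature structures of the real system.

dialogue_record = []  # most recent candidates appended last

def add_maxqud(question, salutt, clause_type):
    dialogue_record.append(
        {"maxqud": question, "salutt": salutt, "type": clause_type})

def resolve_fragment(fragment, clause_type):
    """Find the most recent type-compatible MAXQUD and substitute the
    fragment for the SALUTT in it."""
    for cand in reversed(dialogue_record):
        if cand["type"] == clause_type:
            return cand["maxqud"].replace(cand["salutt"], fragment)
    return None

# A: Who saw Mary?   B: John.   (short answer, as in (11))
add_maxqud("who saw Mary", "who", "question")
print(resolve_fragment("John", "question"))  # "John saw Mary"
```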
(16) is the Attribute Value Matrix (AVM) produced for the CONT of Who
saw Mary. 1 is the index value of who and 2 of Mary:
(16)
     [ PARAMS  { [ INDEX  [1]
                   RESTR  { person-rel([1]) } ] }
       SOA | NUCL  [ saw-rel
                     see-er  [1]
                     seen    [2] ] ]
This is the feature structure counterpart of the λ-abstract λπ.(…π…).
The (abbreviated) AVM for the SALUTT who is (17).
(17)
     [ CAT      NP[+nom]
       CONTENT  [1]
       STORE    { [ INDEX  [1]
                    RESTR  { person([1]) } ] } ]
(18) is the AVM produced for John as a short answer, where 1 is the index
value of John and 2 of Mary.
(18)
     [ SOA | NUCL  [ see-rel
                     see-er  [1]
                     seen    [2] ]
       RESTR  { person-rel([1]) } ]
6 (24) is from the dialogue component of the British National Corpus, File KB4, sentences 144-150. I am grateful to Raquel Fernández for providing this example.
6 A Sequenced Model
As we have seen, work on anaphora and ellipsis within the framework of the
knowledge-poor approach indicates that syntactic measures of salience
combined with recency provide a highly effective procedure for antecedent
identification across a wide range of ellipsis and anaphora resolution tasks in
text and dialogue. These methods are computationally inexpensive and
generally robust. It is possible to deal with a subset of the (significant) minority
of cases which are not amenable to syntactic salience-based resolution through
statistical modelling of semantic and real world knowledge as lexical preference
patterns. For the remaining cases abductive inference appears to be
unavoidable. These considerations suggest that a promising approach is to
apply the techniques in an ascending sequence of computational cost. (25) gives
the outline of a plausible architecture for such an integrated sequenced model of
anaphora and ellipsis resolution.
(25) <P,Candidate_Antecedent_List> ⇒
Module 1
Syntactic Salience & Recency Measures +
Syntactic & Morphological Filtering →
Ranked Candidate List →
Confidence Metric 1 →
correctly resolved;
unresolved ⇒
Module 2
Statistically Determined Lexical Preference Measures →
New Ranked Candidate List →
Confidence Metric 2 →
correctly resolved;
unresolved ⇒
Module 3
Abductive Inference ⇒
resolved
The sequenced model of anaphora and ellipsis resolution proposed here
moves successively from computationally inexpensive and interpretationally
rough-grained procedures to increasingly costly and fine-grained methods. The
model encodes a strategy of maximizing the efficiency of an anaphora (ellipsis)
resolution system by invoking fine-grained techniques only when necessary.
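One way to render the architecture in (25) as code is a cascade in which each module proposes a ranked candidate list and a confidence metric decides whether to accept its top candidate or fall through to the next, more expensive module. All module bodies, scores, and thresholds below are placeholders; only the control structure reflects the proposal.

```python
# Sketch of the sequenced model in (25): modules in ascending order of
# computational cost, each gated by a confidence metric over its
# candidate ranking. Module internals are stand-ins.

def confident(ranking, threshold):
    """Trust a ranking only if the top two scores are far enough apart
    (the Dagan et al.-style reliability criterion discussed above)."""
    if len(ranking) < 2:
        return bool(ranking)
    return ranking[0][1] - ranking[1][1] >= threshold

def sequenced_resolve(pronoun, candidates, modules):
    """modules: (ranker, threshold) pairs in ascending cost order.
    The last module's answer is accepted unconditionally."""
    for i, (ranker, threshold) in enumerate(modules):
        ranking = ranker(pronoun, candidates)
        last = i == len(modules) - 1
        if ranking and (last or confident(ranking, threshold)):
            return ranking[0][0]
    return None

# Placeholder modules: syntactic salience, lexical preference, and a
# stand-in for abductive inference (Module 3).
salience = lambda p, cs: sorted(((c, c["sal"]) for c in cs), key=lambda x: -x[1])
lexpref = lambda p, cs: sorted(((c, c["pref"]) for c in cs), key=lambda x: -x[1])
abduction = lambda p, cs: [(cs[0], 1.0)]

cands = [{"name": "utility", "sal": 100, "pref": 2.0},
         {"name": "file", "sal": 95, "pref": 12.0}]
winner = sequenced_resolve({}, cands, [(salience, 10), (lexpref, 5), (abduction, 0)])
print(winner["name"])  # "file": Module 1 is unconfident, Module 2 decides
```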
In order to succeed, this strategy must use reliable confidence metrics to
evaluate the candidate rankings which the first two modules produce. Such
metrics can be constructed on the model of the criteria that Dagan et al. (1995)
use to evaluate the reliability of salience scores. When the distance between the
salience scores of the two top candidates in a list falls below a certain threshold,
the ranking is taken as an unreliable basis for antecedent selection and the
statistical lexical preference module is activated. Intensive experimental work
using machine learning techniques will be needed to determine optimal values
for both the salience factors of Module 1 and the confidence metrics used to
assess the outputs of Modules 1 and 2.
A computationally viable abductive inference component will require
resource sensitive inference rules to control the size and number of the
inference chains that it generates.7 Resource sensitivity and upper bounds on
derivations in abductive inference are essential to rendering the procedures of
Module 3 tractable.
7 Conclusions and Future Work
While the knowledge-based and inference driven approach to anaphora and
ellipsis resolution can deal with cases that require fine-grained semantic
interpretation and detailed real world knowledge, it does not provide the basis
for developing computationally efficient, wide coverage systems. By contrast,
knowledge-poor methods are inexpensive and potentially robust, but they miss
an important minority of recalcitrant cases for which real world knowledge and
inference are indispensable. A promising solution to this engineering problem is
to construct an integrated system that orders the application of anaphora and
ellipsis interpretation techniques in a sequence of modules that apply
increasingly fine-grained techniques of interpretation with an attendant rise in
computational cost. Confidence metrics filter the output of each module to
insure that the more expensive components are invoked only when needed.
In order to implement the proposed model, it is important to achieve
optimisation of the selection of salience parameters and their relative values
through statistical analysis of experimental results. A considerable amount of
work has been done on the application of salience parameters and values to
minimal syntactic representations rather than fully specified parse structures.
This is a fruitful area of investigation which merits further research, as it holds
out the promise of major gains in efficiency and robustness for the salience
methods that comprise the first module of an integrated system. Another
problem worth pursuing is the generalization of lexical preference patterns to
relations between semantic classes. Measuring preference in terms of semantic
categories rather than specific lexical head-argument and head-adjunct patterns
will increase the power and reliability of Module 2. The viability of the entire
system depends upon determining reliable confidence metrics for both salience-
based and lexical preference-based antecedent selection. Finally, to implement
the third module much work must be done to develop efficiently resource
sensitive procedures for abductive inference in different domains.

7 Kohlhase and Koller (2003) propose resource sensitive inference rules for model generation in which the salience of a referential NP in discourse is used to compute the relative cost of applying inference rules to entities introduced by this NP. They measure salience in discourse largely in terms of the sorts of syntactic and recency factors that Lappin and Leass (1994) use in their anaphora resolution algorithm.
References
Asher, N. 1993. Reference to Abstract Objects in English. Dordrecht: Kluwer.
Asher, N., D. Hardt and J. Busquets 2001. “Discourse Parallelism, Ellipsis, and Ambiguity.”
Journal of Semantics 18.
Crouch, D. 1999. “Ellipsis and Glue Languages.” S. Lappin and E. Benmamoun (eds.)
Fragments: Studies in Ellipsis and Gapping. New York: Oxford University Press. 32-67.
Dagan, I., J. Justeson, S. Lappin, H. Leass and A. Ribak 1995. “Syntax and Lexical Statistics in
Anaphora Resolution.” Applied Artificial Intelligence 9:6.633-644.
Dalrymple, M. 1991. “Against Reconstruction in Ellipsis.” Xerox PARC, Palo Alto, CA:
unpublished ms.
----------, S. Shieber and F. Pereira 1991. “Ellipsis and Higher-Order Unification.” Linguistics
and Philosophy 14.399-452.
Fernández, R., J. Ginzburg, H. Gregory and S. Lappin. To appear. “SHARDS: Fragment
Resolution in Dialogue.” H. Bunt and R. Muskens (eds.) Computing Meaning 3. Dordrecht:
Kluwer.
Ginzburg, J. and R. Cooper. To appear. “Clarification, Ellipsis, and the Nature of Contextual
Updates.” Linguistics and Philosophy.
Grosz, B., A. Joshi and S. Weinstein 1995. “Centering: A Framework for Modeling the Local
Coherence of Discourse.” Computational Linguistics 21.203-225.
Hardt, D. 1993. “Verb Phrase Ellipsis: Form, Meaning, and Processing.” University of
Pennsylvania: unpublished Ph.D dissertation.
---------- 1997. “An Empirical Approach to VP Ellipsis.” Computational Linguistics 23.
Hobbs, J. 1978. “Resolving Pronoun References.” Lingua 44.339-352.
---------- and A. Kehler 1997. “A Theory of Parallelism and the Case of VP Ellipsis.” Proc. of
the 35th Conference of the ACL. 394-401. Madrid.
----------, M. Stickel, D. Appelt and P. Martin 1993. “Interpretation as Abduction.” Artificial
Intelligence 63.69-142.
Huang, Y. 2000. Anaphora: A Cross-Linguistic Study. Oxford: Oxford University Press.
Kehler, A. 2000. “Pragmatics, Chapter 18.” D. Jurafsky and J. Martin (eds.) Speech and
Language Processing. Upper Saddle River, NJ: Prentice Hall.
---------- 2002. Coherence, Reference, and the Theory of Grammar. Stanford, CA: CSLI.
Kennedy, C. and B. Boguraev 1996. “Anaphora for Everyone: Pronominal Anaphora
Resolution without a Parser.” Proc. of the 16th Int. Conference on Computational Linguistics
(COLING'96). Copenhagen. 113-118.
Kohlhase, M. and A. Koller 2003. “Resource-Adaptive Model Generation as a Performance
Model.” Logic Journal of the IGPL. 435-456.
Lappin, S. 1996. “The Interpretation of Ellipsis.” S. Lappin (ed.) Handbook of Contemporary
Semantic Theory. Oxford: Blackwell. 145-175.
---------- and E. Benmamoun (eds.) 1999. Fragments: Studies in Ellipsis and Gapping. New
York: Oxford University Press.
---------- and H. Leass 1994. “A Syntactically Based Algorithm for Pronominal Anaphora
Resolution.” Computational Linguistics 20.535-561.
Mitkov, R. 1998. “Robust Pronoun Resolution with Limited Knowledge.” Proc. of ACL'98 and
COLING'98. 869-875. Montreal.
---------- 2002. Anaphora Resolution. London: Longman.
----------, B. Boguraev and S. Lappin (eds.) 2001. Computational Linguistics 27. Special Issue
on Computational Anaphora Resolution.
Nielsen, L. 2003. “Using Machine Learning Techniques for VPE Detection”. Proc. of RANLP
2004. 339-346. Borovets.
Nielsen, L. 2004. “Verb Phrase Ellipsis Detection Using Automatically Parsed Text”. Proc. of
COLING 2004. Geneva.
Pulman, S. 1997. “Focus and Higher-Order Unification.” Linguistics and Philosophy 20.
Shieber, S., F. Pereira and M. Dalrymple 1996. “Interactions of Scope and Ellipsis.” Linguistics
and Philosophy 19.527-552.
Soon, W., H. Ng and D. Lim 2001. “A Machine Learning Approach to Coreference Resolution
of Noun Phrases.” Computational Linguistics 27.521-544.
Stuckardt, R. 2001. “Design and Enhanced Evaluation of a Robust Anaphor Resolution
Algorithm.” Computational Linguistics 27.479-506.
How to Deal with Wicked Anaphora?
This paper revises a framework (called AR-engine) capable of easily defining and
operating models of anaphora resolution. The proposed engine envisages the linguistic
and semantic entities involved in the cognitive process of anaphora resolution as
represented in three layers: the layer of referential expressions, the projected layer of
the referential expressions' features, and the semantic layer of discourse entities. Within
this framework, cases of anaphora resolution usually considered difficult to tackle are
investigated and solutions are proposed. Among them, one finds relations triggered by
syntactic constraints, lemma and number disagreement, and bridging anaphora. The
investigation uses a contiguous text from the belletristic register. The research is
motivated by the view that interpretation of free language in modern applications,
especially those related to the semantic web, requires ever more sophisticated tools.
1 Introduction
Although it is generally accepted that semantic features are essential for
anaphora resolution, due to the difficulty and complexity of achieving a correct
semantic approach, authors of automatic systems have mainly preferred to avoid
the extensive use of semantic information (Lappin & Leass, 1994; Mitkov, 1997;
Kameyama, 1997). It is well known that anaphora studies reveal a
psychological threshold around 80% precision and recall which present systems
seem unable to surmount (Mitkov, 2002). It is our belief that one of the causes
of the current impasse in devising an anaphora resolution (AR) system with a
very high degree of confidence should also be sought in the choice of a
sub-semantic limitation. Relying mainly on strict matching criteria, in which
morphological and syntactic features are of great value, these systems disregard
resolution decisions based on more subtle strategies that would allow lemma
and number mismatch, gender variation, split antecedents, bridging anaphora or
cataphora resolution. Moreover, types of anaphora different from strict
coreference, like type/token, subset/superset, is-element-of/has-as-element,
is-part-of/has-as-part, etc., often impose more complex types of decision-making,
which may reach down to the semantic level as well.
Our study makes use of the AR framework defined by Cristea and Dima
(2001), and Cristea et al. (2002a) (called AR-engine) with the aim of applying
18 CRISTEA AND POSTOLACHE
1 AR-engine and the related documentation are freely available for research purposes at
http://consilr.info.uaic.ro.
2 We will restrict this study only to nominal referential expressions.
Within such a view, two basic types of anaphoric references can be expressed:
coreferences, inducing equivalence classes of all REs in a text which participate
in a coreference chain, and functional references (Markert et al., 1996), also
called indirect anaphora or associative anaphora (Mitkov, 2002), which express
semantic relations between different discourse entities, including type/token, is-
part-of/has-as-part, is-element-of/has-as-element, etc. As sketched in Figure 1,
chains of coreferential REs are represented as corresponding to a unique DE on
the semantic layer, whereas functional references are represented as relational
text layer REa REb text layer REa REb
a. b. relation
links between the DEs of the corresponding REs.
Figure 1: Representation of anaphoric relations revealing their semantic nature:
a. coreferences; b. functional references.
Representations involving only REs and DEs are the result of an
interpretation process applied to a text. Even if the semantic level is kept
hidden, these types of representations are implicitly assumed by the majority of
anaphora resolution annotation tasks. Indeed, DEs of the semantic layer can be
short-circuited by appropriate tags associated with coreferential REs, where
each RE points either to the first RE of the chain or to the most recent
antecedent RE. Analogously, in the case of functional references, the
annotation tags associated with the surface REs name the nature of the
referential function. However, if we are interested in modelling the
interpretation process itself, in a way that simulates the cognitive processes
at work in a human mind during text reading, the need for another,
intermediate layer can immediately be argued for. On this layer, which we will
call the projection layer, feature structures (in the following, projected
structures – PSs) are filled in with information fetched from the text layer,
and all resolution decisions are negotiated between PSs of the projection
layer and DEs of the semantic layer. We will say that a PS is projected from
an RE and that a DE is proposed (if it appears for the first time in the
discourse) or evoked (if it already exists) by a PS (Figure 2).
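The project/propose/evoke cycle described above can be sketched in a few lines of Python. This is a minimal illustration under our own naming assumptions, not the AR-engine's actual API: a PS is a plain feature dictionary, and the matching predicate is supplied by the model.

```python
# Sketch of the three-layer mechanism: REs on the text layer project PSs,
# and each PS either evokes an existing DE or proposes a new one.
# All names here are illustrative, not the AR-engine's real interface.

class DE:
    """Discourse entity on the semantic layer."""
    def __init__(self, features):
        self.features = dict(features)   # accumulated knowledge about the entity
        self.re_links = []               # bidirectional links back to surface REs

def project(re_text, features):
    """Project an RE onto the projection layer as a feature structure (PS)."""
    return {"re": re_text, **features}

def propose_or_evoke(ps, discourse_entities, matches):
    """Evoke an existing DE if `matches` accepts one (most recent first),
    otherwise propose a new DE. The PS is discarded after the decision;
    only the RE<->DE link survives."""
    for de in reversed(discourse_entities):
        if matches(ps, de):
            de.features.update({k: v for k, v in ps.items() if k != "re"})
            de.re_links.append(ps["re"])
            return de                    # evoked
    de = DE({k: v for k, v in ps.items() if k != "re"})
    de.re_links.append(ps["re"])
    discourse_entities.append(de)
    return de                            # proposed
```

The `matches` predicate stands in for the model's matching rules; a real model would combine morphological filters, syntactic constraints and semantic criteria rather than a single test.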
to help the proposal/identification of a discourse entity; once this task has
been fulfilled, the projected structure can be discarded. The result is a
bidirectional link that will be kept between REa and the corresponding DEa.
Some moments later, when a referential expression REb is identified on the
text layer, it projects a feature structure PSb on the projection layer
(Figure 3c). Finally, if the model decides that PSb evokes DEa, a
bidirectional link between REb and DEa is established and PSb is discarded
(Figure 3d). A similar sequence takes place when types of anaphoric relations
other than strict coreference are established.
4 In Spanish: <genderless possessive pronoun> supreme Majesty (feminine noun) … he.
Majesty… he, displays no such problem, because English nouns do not have
grammatical gender). Also, most languages with grammatical gender distinctions
have a number of nouns or phrases that can be referred to by both masculine
and feminine pronouns, according to the natural gender of the person
designated (le docteur… elle; in English the doctor… she). Though we do not
share Barlow's view that morphology should be ignored in AR altogether, a less
categorical approach to filtering rules based on morphology is preferable.
b. syntactic features:
- full syntactic description of REs as constituents of a syntactic tree (Lappin &
Leass, 1994; Hobbs, 1978);
- marking of the syntactic role for subject position or obliqueness (the
subcategorisation function with respect to the verb) of the REs, as in all
centering based approaches (Grosz et al., 1995; Brennan et al., 1987),
syntactic domain based approaches (Chomsky, 1981; Reinhart, 1981;
Gordon & Hendricks, 1998; Kennedy & Boguraev, 1996);
- the quality of being an adjunct, embedded, or the complement of a preposition
(Kennedy & Boguraev, 1996);
- inclusion or not in an existential construction (Kennedy & Boguraev, 1996);
- syntactic patterns in which the RE is involved, which can lead to the
determination of syntactic parallelism (Kennedy & Boguraev, 1996; Mitkov,
1997);
- the quality of being in an apposition or a predicative noun position.
c. lexico-semantic features:
- lemma;
- person;5
- name (for proper nouns);
- natural gender;
- the part-of-speech of the head word of the RE. The domain of this feature
contains: zero pronoun (also called zero anaphora, a non-text string), clitic
pronoun, full-fledged pronoun, reflexive pronoun, possessive pronoun,
demonstrative pronoun, reciprocal pronoun, expletive "it", bare noun
(undetermined), indefinite determined noun, definite determined noun,
proper noun (name);6
- the sense of the head word of the RE, as for instance, given by a wordnet;7
- position of the head of the RE in a conceptual hierarchy (hypo/hypernymy)
as in all models using wordnets (Poesio et al., 1997; Cristea et al., 2002a).
5 Since, among the nominal REs, only pronouns can distinguish person, for our purposes person is a lexical feature.
6 As mentioned already, this classification takes into account only nominal anaphors, therefore ignoring verbal, adverbial, adjectival, etc. (Mitkov, 2002).
7 We prefer to use wordnet as a common noun when we refer to any language variant (Vossen, 1998; Tufiş & Cristea, 2002a) of the original American English WordNet (Miller et al., 1993).
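In a concrete implementation, the three feature classes just listed (morphological, syntactic, lexico-semantic) might all be gathered into a single projected structure. A hypothetical PS for the RE "the beautiful queen", with illustrative attribute names of our own choosing:

```python
# Hypothetical projected structure (PS) for "the beautiful queen".
# Attribute names are illustrative, not the AR-engine's actual slot names.
ps = {
    # a. morphological features
    "number": "sg",
    "gender": "fem",
    # b. syntactic features
    "syn_role": "subject",
    "apposition_of": None,
    # c. lexico-semantic features
    "lemma": "queen",
    "pos_head": "definite determined noun",
    "natural_gender": "fem",
    "modifiers": ["beautiful"],
}
```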
discourse units whose vein structure (Cristea et al., 1998, 2000) is the one
depicted in Figure 5 in bold line,8 then the order in which to consider the
candidate referents for PSc (projected from REc) is DEa first and DEb after,
since, hierarchically, REa (and therefore its corresponding DEa) is closer to
REc than REb (and its corresponding DEb).
8 The vein expression of an elementary discourse unit (edu) u, following Veins Theory, is a sequence of edus preceding, including and following u, which amounts to the minimal coherent sub-discourse focused on u. The gray lines in Figure 5 exemplify a situation in which REb, linearly the most recent RE before REc, is short-circuited by the vein expression of the edu that REc belongs to, which means that REa is more prominent in the reader's memory than REb when REc is read.
In the third compulsory phase, the completion phase, the data contained in
the resolved PS is combined with the data configuring the found referent, if
such a DE has been identified, or, simply, the PS content is copied onto the
newly built DE if none of the already existing DEs has been recognised. The
resolved PS is afterwards deleted from the projection layer, since any
information that it used to capture can now be recovered from the DE. So, to
give an extreme example, if for some reason a model chooses to look for
previous syntactic patterns of chained REs, they can be found on the semantic
level. Although apparently contradictory to the "semantic" significance of the
layer, this behaviour can mimic the short-term memory that records information
of value for immediate anaphoric resolution.
Finally, the optional re-evaluation phase is triggered if postponed PSs
remain on the projection layer from a former step. The intent is to apply the
matching rules again to all of them. Humans usually resolve anaphors at the
time of reading, but sometimes decisions must be postponed until the
acquisition of complementary information adds enough data to allow
disambiguation. Cases of postponed resolution will be discussed in
Section 7.2. At the end of processing, each RE should record a link towards
its corresponding DE, and each DE should record a list of links towards its
surface REs.
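The sequence of phases just described, including the handling of postponed PSs, can be rendered as a short processing loop. This is a sketch under our own assumptions about data shapes (a PS is a dictionary, a DE a dictionary of accumulated features); the real engine's phases are driven by declarative rules rather than callbacks.

```python
def complete(ps, de, des):
    """Completion phase: merge the resolved PS into the found DE, or copy it
    onto a newly built DE; the PS itself is then discarded."""
    if de is None:
        de = {"features": {}, "res": []}   # propose a new DE
        des.append(de)
    de["features"].update(ps["features"])
    de["res"].append(ps["re"])             # DE records links to its surface REs

def resolve_text(res, rules):
    """Run the phases over a sequence of REs: project each RE, try to
    propose/evoke a DE, complete it, and re-evaluate postponed PSs at the end."""
    des, postponed = [], []
    for re_ in res:
        ps = rules["project"](re_)          # projection phase
        de = rules["match"](ps, des)        # proposing/evoking phase
        if de is None and rules["postpone"](ps):
            postponed.append(ps)            # decision deferred
            continue
        complete(ps, de, des)
    for ps in postponed:                    # optional re-evaluation phase
        complete(ps, rules["match"](ps, des), des)
    return des
```

A model supplies `project`, `match` and `postpone`; for instance, `match` could return the most recent DE passing number and gender filters, and `postpone` could flag pronouns with no surviving candidate.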
As we shall see in Sections 3 to 6, when referential relations other than
strict coreference are to be revealed, DE attributes that are not directly
triggered from the corresponding PSs appear necessary. As mentioned at
item 2 of the proposing/evoking phase, a section dedicated to the actions to
be performed for filling in specific attributes following a proposing action
is opened in the third component of the framework – the one dedicated to rules
and heuristics.
In the following examples, we will mark REs in italics (as a car) and
their corresponding DEs by a paraphrasing text in bold and within square
brackets (as [the car]). The following sections will analyse, within the AR-
engine framework, a set of AR cases usually considered difficult to interpret.
The discussion intends to evidence specific difficulties inherent to a large
range of anaphoric phenomena, to imagine solutions in terms of an AR model
by indicating knowledge sources and rules/heuristics capable of dealing with
the identified tasks, and to informally appreciate the tractability of these
solutions. The discussion remains under the umbrella of the universal panacea
for all failures in AR: world knowledge (WK).
9 From G. Orwell's "1984".
3.2 Apposition
(4) Mrs. Parsons, the wife of a neighbour on the same floor
(5) Nefertiti, Amenomphis the IVth's wife
(6) Jane, beautiful girl, come to me!
(7) a sort of gaping solemnity, a sort of edified boredom
An apposition usually brings supplementary knowledge about a discourse
entity. In line with other approaches (Mitkov, 2002), but in disagreement
with the annotation convention of MUC-7, which sees the apposition as one RE
and the pair of the two elements as another RE, we consider the two elements
of the apposition to be different REs. In the model that we have built, the
type of relation linking the two REs obeys the following heuristic: definite
determined NPs, genitival appositions and undetermined NPs, as in (4), (5)
and (6), yield coreference, whereas indefinite noun appositions, as in (7),
yield a type-of relation from the DE corresponding to the second RE towards
the DE corresponding to the first RE. Let RE2 be an apposition of RE1 on the
text level. We will suppose a knowledge source capable of applying syntactic
criteria in order to fetch an apposition-of=RE1 slot attached to PS2. As PS1
should have matched a DE1 by the moment PS2 is being processed, a certifying
rule must unify PS2 with DE1 in case RE2 is a definite determined NP, an
undetermined NP or a genitival construction. As a result, DE1 will accumulate
all the attributes of PS2. Examples of cases correctly interpreted following
this strategy are:10
(Emmanuel Goldstein), (the Enemy of the People); (the primal traitor), (the
earliest defiler of the Party's purity). If the apposition is an indefinite
determined NP, a demolishing rule will rule out, as a possible antecedent,
the argument of the apposition-of attribute in the current PS. As a
consequence, the usual proposing/evoking mechanism will work, finalised in
finding a target DE. Then, only if the found DE is new, a rule in the
attribute-filling section of the set of rules/heuristics will exploit the
apposition-of=RE1 slot attached to PS2 in order to transform it into a
type-of=DE1 value. This strategy will correctly interpret an apposition like
(a narrow scarlet sash), (emblem of the Junior Anti-Sex League).
Unfortunately, the knowledge source responsible for detecting appositions can
easily err. This is the case when the apposition is iterated over more than
just two adjacent constituents: (the most bigoted adherents of the Party),
(the swallowers of slogans), (the amateur spies) and (nosers-out of
unorthodoxy); (a man named O'Brien), (a member of the Inner Party) and
(holder of some post so important and remote), where clear criteria to
disambiguate from enumerations or from
10 From G. Orwell's "1984".
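The apposition heuristic discussed above, with its split between certifying and demolishing rules, can be sketched as a small decision function. The determination kinds and the helper name are our own illustrative choices, not the AR-engine's rule syntax.

```python
# Sketch of the apposition heuristic: RE2 is an apposition of RE1.
# Definite, undetermined and genitival appositions trigger the certifying
# rule (coreference); indefinite appositions trigger the demolishing rule
# (a type-of relation from DE2 towards DE1). Names are hypothetical.

COREFERENTIAL_KINDS = {"definite", "undetermined", "genitival"}

def apposition_decision(kind_of_re2):
    """Return the relation that RE2's DE should bear to RE1's DE."""
    if kind_of_re2 in COREFERENTIAL_KINDS:
        return "coreference"   # certifying rule: unify PS2 with DE1
    if kind_of_re2 == "indefinite":
        return "type-of"       # demolishing rule: DE2 is a type of DE1
    return None                # no apposition-specific decision applies
```

For example, (Nefertiti), (Amenomphis the IVth's wife) is a genitival apposition and yields coreference, whereas (a sort of gaping solemnity), (a sort of edified boredom) is indefinite and yields type-of.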
11 The implicit assumption here was that WSD capabilities were used as a knowledge source.
12 From G. Orwell's "1984".
13 The present model does not implement specific criteria to deal with modalities.
substitute for the person faced in the photo necessitates deep WK. To offer a
substitute for a solution in cases like this, a generic relation like
metaphoric-type-of can be adopted.
The solution we adopted for representing discourse entities subject to
change over time, different from the one proposed in MUC-7 (Hirschman &
Chinchor, 1997), is described in (Cristea & Dima, 2001): we have linked
entities such as the ones in example (11) with the same-as relation,
triggered by the occurrence of the interposed predicate become.
In all cases (8) to (11), a complication arises when the resolution of RE1
(the subject) was postponed until the moment RE2 (the predicative noun) is
processed.14 If this happens, either the unification makes PS2 coreferential
with the postponed PS1, or the semantic relation is established between the
currently proposed DE and the postponed PS1. Later on, when the postponed PS
is lowered to the semantic level, these relations are maintained.
4 Lemma disagreement of common nouns
4.1 Common NPs displaying identical grammatical number but different
lemmas
(12) Amenomphis the IVth's wife … the beautiful queen
Discovering the coreference relation in this case should mainly be
similarity-based. In principle, a queen should be found more similar to a wife
than to a pharaoh, supposing Amenomphis is known to be one. If, instead, this
elaborate knowledge is not available, and all that is known about
Amenomphis, as contributed by a named-entity recogniser knowledge source, is
his quality of being a man, then, by the moment the beautiful queen is
processed, a queen should again be found more similar to a wife than to a
man. Many approaches to measuring similarity in NLP are already known, and
some use wordnets (e.g. Resnik, 1999). When a sense disambiguation procedure
is lacking, a wordnet-driven similarity that counts the common hypernyms of
all senses of the two lemmas could be a useful substitute in some cases.15
Still, criteria for deciding similarity are not elementary, and a simple
intersection of the wordnet hypernymic paths of the anaphor lemma and the
candidate antecedent lemma often does not work. The following is an example
of a chain of erroneous coreferences found on the basis of this simplistic
criterion: the centre of the hall opposite the big telescreen | his place |
some post so important and
14 The same is true for apposition.
15 There is good reason to believe that such an approach is successful when lexical ontologies as finely graded in word senses as WordNet are used. This criterion is based on the assumption that senses displaying common ancestors must be more similar than ones whose hierarchical paths do not intersect.
remote | the back of one's neck | a chair | places away | the end of the room | the
protection of his foreign paymasters.16
Sometimes, a useful criterion for identifying coreferential common-noun REs
with different lemmas could be natural gender (queen and wife are both
feminine in natural gender). In other cases, the antecedent can be recovered
by looking at the modifiers of the head nouns. Consider example (13):
(13) the most beautiful women… those beauties
A promoting rule should be able to match the lemma beauty against the
modifiers of the head women in the DE for [the most beautiful women].
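The hypernym-overlap idea discussed above can be illustrated with a toy measure. The mini-taxonomy below is hand-coded for the example (it is our assumption, not WordNet's actual hierarchy); a real system would take the hypernym paths from a wordnet.

```python
# Toy hypernym-overlap similarity: count the hypernyms shared by two lemmas.
# The taxonomy is hand-coded so that queen is closer to wife than to man,
# mirroring the example in the text; a wordnet would supply real paths.

HYPERNYMS = {
    "queen":   ["female_aristocrat", "female", "person", "entity"],
    "wife":    ["woman", "female", "person", "entity"],
    "man":     ["male", "person", "entity"],
    "pharaoh": ["ruler", "person", "entity"],
    "chair":   ["furniture", "artifact", "entity"],
}

def hypernym_overlap(lemma_a, lemma_b):
    """Crude similarity score: number of shared hypernyms."""
    return len(set(HYPERNYMS.get(lemma_a, [])) & set(HYPERNYMS.get(lemma_b, [])))

def best_antecedent(anaphor, candidates):
    """Pick the candidate lemma with the greatest hypernym overlap."""
    return max(candidates, key=lambda c: hypernym_overlap(anaphor, c))
```

As the erroneous coreference chain above shows, such a raw intersection count is far too permissive on its own; it would need to be combined with the natural-gender and modifier-matching criteria before promoting a candidate.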
4.2 Common NPs with different grammatical number and different lemmas
(14) a patrol … the soldiers
(15) the government… the ministers
According to WordNet, in two out of three senses a patrol is a group and, in
one sense out of four, government is also a group. This suggests filling in a
sem=group feature if the group, grouping -- (any number of
entities (members) considered as a unit) synset is found on a
hypernymic path of the lemma of a candidate antecedent of the plural NP (see
examples (14) and (15)). However, this criterion could prove weak, because
many words have senses that correspond to groups (a garden, for instance,
has a sense that means a group of flowers, and in a text like A patrol
stopped by the garden. The soldiers… there is a high chance of finding the
soldiers coreferring with [the garden] rather than with [the patrol]).
Different criteria should be combined to maximise the degree of confidence,
among them a similarity criterion, for instance based on wordnet glosses (as
in forest – the trees and other plants in a large densely
wooded area) or on meronymy (as in flock – a group of sheep or
goats – HAS MEMBER: sheep – woolly usu. horned ruminant mammal
related to the goat), or even the simple identification of antecedents
within a fixed collection of collective nouns, as suggested in (Barbu et al.,
2002). In principle, this case is similar to the preceding one if an
attribute of being a group is included in the representation of the DE
referent.
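The sem=group criterion can be sketched as follows. The per-sense hypernym paths below are hand-coded stand-ins (our assumption, not WordNet's actual data); in practice they would come from a wordnet, and, as noted above, the test is deliberately permissive across all senses when no WSD is available.

```python
# Sketch of the sem=group criterion: a candidate antecedent may corefer with
# a plural NP if ANY of its senses has "group" on a hypernymic path.
# The sense inventory is hand-coded for illustration only.

SENSE_PATHS = {
    "patrol":     [["group", "entity"], ["activity", "act", "entity"]],
    "government": [["group", "entity"], ["polity", "entity"],
                   ["authority", "entity"], ["act", "entity"]],
    "table":      [["furniture", "artifact", "entity"],
                   ["array", "arrangement", "entity"]],
}

def may_refer_to_group(lemma):
    """True if some sense of `lemma` has 'group' on a hypernym path, so the
    DE can receive a sem=group attribute."""
    return any("group" in path for path in SENSE_PATHS.get(lemma, []))
```

The garden example in the text shows exactly why this all-senses test overgenerates; combining it with gloss- or meronymy-based similarity, as suggested above, narrows the candidate set.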
4.3 Common nouns referring to proper nouns
(16) Bucharest… the capital
16 From G. Orwell's "1984".
There are no other means of solving this reference than enforcing the
labelling of Bucharest, in its corresponding DE, the very moment it is
processed, with, for instance, a city1 value of a sem attribute. If this
labelling information is available, fetched by a named-entity recogniser,
the framework processes the reference in the same way it does common nouns
with different lemmas.
5 Number disagreement