ANAPHORA PROCESSING
AMSTERDAM STUDIES IN THE THEORY AND
HISTORY OF LINGUISTIC SCIENCE
General Editor
E.F. KONRAD KOERNER
(Zentrum für Allgemeine Sprachwissenschaft, Typologie
und Universalienforschung, Berlin)

Series IV – CURRENT ISSUES IN LINGUISTIC THEORY

Advisory Editorial Board

Lyle Campbell (Salt Lake City); Sheila Embleton (Toronto)


Brian D. Joseph (Columbus, Ohio); John E. Joseph (Edinburgh)
Manfred Krifka (Berlin); E. Wyn Roberts (Vancouver, B.C.)
Joseph C. Salmons (Madison, Wis.); Hans-Jürgen Sasse (Köln)

Volume 263

António Branco, Tony McEnery and Ruslan Mitkov (eds)

Anaphora Processing
Linguistic, cognitive and computational modelling
ANAPHORA PROCESSING
LINGUISTIC, COGNITIVE
AND COMPUTATIONAL MODELLING

Edited by

ANTÓNIO BRANCO
Universidade de Lisboa
TONY McENERY
Lancaster University
RUSLAN MITKOV
University of Wolverhampton

JOHN BENJAMINS PUBLISHING COMPANY


AMSTERDAM/PHILADELPHIA
The paper used in this publication meets the minimum requirements of American
National Standard for Information Sciences — Permanence of Paper for Printed
Library Materials, ANSI Z39.48-1984.

Library of Congress Cataloging-in-Publication Data


DAARC 2002 (2002 : Lisbon, Portugal)
Anaphora processing : linguistic, cognitive, and computational modelling : selected papers from
DAARC 2002 / edited by António Branco, Anthony Mark McEnery, Ruslan Mitkov.
p. cm. -- (Amsterdam studies in the theory and history of linguistic science. Series IV,
Current issues in linguistic theory, ISSN 0304-0763 ; v. 263)
Papers presented at the 4th Discourse Anaphora and Anaphor Resolution Colloquium held in
Lisbon in Sept. 2002.
Includes bibliographical references and index.
1. Anaphora (Linguistics)--Data processing--Congresses. 2. Anaphora (Linguistics)--Psychological
aspects--Congresses.
P299.A5 D3 2002
45--dc22 2004062375
ISBN 90 272 4777 3 (Eur.) /  588 62 2 (US) (Hb; alk. paper)
© 2005 – John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other
means, without written permission from the publisher.
John Benjamins Publishing Co. • P.O. Box 36224 • 1020 ME Amsterdam • The Netherlands
John Benjamins North America • P.O. Box 27519 • Philadelphia PA 19118-0519 • USA
CONTENTS

Editors’ Foreword vii

Section I – Computational Treatment

A Sequenced Model of Anaphora and Ellipsis Resolution 3
Shalom Lappin

How to Deal with Wicked Anaphora? 17
Dan Cristea and Oana-Diana Postolache

A Machine Learning Approach to Preference Strategies for Anaphor Resolution 47
Roland Stuckardt

Decomposing Discourse 73
Joel Tetreault

A Lightweight Approach to Coreference Resolution for Named Entities in Text 97
Marin Dimitrov, Kalina Bontcheva, Hamish Cunningham and Diana Maynard

A Unified Treatment of Spanish se 113
Randy Sharp

Section II – Theoretical, Psycholinguistic and Cognitive Issues

Binding and Beyond: Issues in Backward Anaphora 139
Eric Reuland and Sergey Avrutin

Modelling Referential Choice in Discourse: A Cognitive Calculative Approach and a Neural Network Approach 163
André Grüning and Andrej A. Kibrik

Degrees of Indirectness: Two Types of Implicit Referents and their Retrieval via Unaccented Pronouns 199
Francis Cornish

Pronominal Interpretation and the Syntax-Discourse Interface: Real-time Comprehension and Neurological Properties 221
Maria Mercedes Piñango and Petra Burkhardt

Top-down and Bottom-up Effects on the Interpretation of Weak Object Pronouns in Greek 239
Stavroula-Thaleia Kousta

Different Forms Have Different Referential Properties: Implications for the Notion of ‘Salience’ 261
Elsi Kaiser

Referential Accessibility and Anaphor Resolution: The Case of the French Hybrid Demonstrative Pronoun Celui-Ci/Celle-Ci 283
Marion Fossard and François Rigalleau

Section III – Corpus-Based Studies

The Predicate-Argument Structure of Discourse Connectives: A Corpus-Based Study 303
Cassandre Creswell, Katherine Forbes, Eleni Miltsakaki, Rashmi Prasad, Aravind Joshi and Bonnie Webber

Combining Centering-Based Models of Salience and Information Structure for Resolving Intersentential Pronominal Anaphora 329
Costanza Navarretta

Pronouns Without NP Antecedents: How do we Know when a Pronoun is Referential? 351
Jeanette Gundel, Nancy Hedberg and Ron Zacharski

Syntactic Form and Discourse Accessibility 365
Gregory Ward and Andrew Kehler

Coreference and Anaphoric Relations of Demonstrative Noun Phrases in Multilingual Corpus 385
Renata Vieira, Susanne Salmon-Alt and Caroline Gasperin

Anaphoric Demonstratives: Dealing with the Hard Cases 403
Marco Rocha

Focus, Activation, and This-Noun Phrases: An Empirical Study 429
Massimo Poesio and Natalia N. Modjeska
EDITORS’ FOREWORD
Anaphora is a central topic in the study of natural language and has long been
the object of research in a wide range of disciplines such as theoretical, corpus
and computational linguistics, philosophy of language, psycholinguistics and
cognitive psychology. At the same time, the correct interpretation of anaphora
has played an increasingly vital role in real-world natural language processing
applications, including machine translation, automatic abstracting, information
extraction and question answering. As a result, the processing of anaphora has
become one of the most popular and productive topics of multi- and
inter-disciplinary research, and has enjoyed increased interest and attention in
recent years.

In this context, the biennial Discourse Anaphora and Anaphor Resolution
Colloquia (DAARC) have emerged as the major regular forum for presentation
and discussion of the best research results in this area. Initiated in 1996 at
Lancaster University and taken over in 2000 by the University of Lisbon, the
DAARC series established itself as a specialised and competitive forum for the
presentation of the latest results on anaphora processing, ranging from
theoretical linguistic approaches through psycholinguistic and cognitive work
to corpus studies and computational modelling. The series is unique in that it
covers anaphora from such a variety of multidisciplinary perspectives. The
fourth Discourse Anaphora and Anaphor Resolution Colloquium
(DAARC’2002) took place in Lisbon in September 2002 and featured 44
state-of-the-art presentations (2 invited talks and 42 papers selected from 61
submissions) by 72 researchers from 20 countries.

This volume includes extended versions of the best papers from DAARC’2002.
The selection process was highly competitive: all authors of papers at
DAARC’2002 were invited to submit an extended and updated version of their
paper, and each submission was reviewed anonymously by three reviewers,
members of a Paper Selection Committee of leading international researchers. It
is worth mentioning that whilst we were delighted to have so many
contributions at DAARC’2002, restrictions on the number of papers and pages
which could be included in this volume forced us to be more selective than we
would have liked. From the 44 papers presented at the colloquium, we were
able to select only the 20 best.

The book is organised thematically. The papers in the volume have been
topically grouped into three sections:
(i) Computational treatment (6 papers)
(ii) Theoretical, psycholinguistic and cognitive issues (7 papers)
(iii) Corpus-based studies (7 papers)
However, this classification should not be regarded as strict or absolute, as
some of the papers touch on issues pertaining to more than one of the three
topical groups above.

We believe this book provides a unique, up-to-date overview of recent
significant work on the processing of anaphora from a multi- and inter-
disciplinary angle. It will be of interest and practical use to readers from fields
as diverse as theoretical linguistics, corpus linguistics, computational
linguistics, computer science, natural language processing, artificial
intelligence, human language technology, psycholinguistics, cognitive science
and translation studies. The readership will include but will not be limited to
university lecturers, researchers, postgraduate and senior undergraduate
students.

We would like to thank all authors who submitted papers both to the
colloquium and to the call for papers associated with this volume. Their original
and revised contributions made this project materialise.

We would like to express our gratitude to all members of the DAARC
programme committee as well as the members of this volume’s paper selection
committee. Without their help, we would not have been able to arrive at such a
high quality selection. The following is a list of those who participated in the
selection process for the papers in this volume:

Mira Ariel
(Tel Aviv University, Israel)
Amit Bagga
(Avaya Inc., USA)
Branimir Boguraev
(IBM T. J. Watson Research Center, USA)
Peter Bosch
(University of Osnabrück, Institute of Cognitive Science, Germany)

Donna Byron
(The Ohio State University, Computer and Information Science Dept., USA)
Francis Cornish
(CNRS and Université de Toulouse-Le Mirail, Département des Sciences du
Langage, France)
Dan Cristea
(University “Al. I. Cuza” of Iasi, Faculty of Computer Science, Romania)
Robert Dale
(Macquarie University, Division of Information and Communication Sciences,
Centre for Language Technology, Australia)
Iason Demiros
(Institute for Language and Speech Processing, Greece)
Richard Evans
(University of Wolverhampton, School of Humanities, Languages and Social
Sciences, UK)
Martin Everaert
(OTS, The Netherlands)
Claire Gardent
(INRIA-Lorraine, LORIA, France)
Jeanette Gundel
(University of Minnesota, USA and NTNU, Norway)
Sanda Harabagiu
(University of Texas, USA)
Graeme Hirst
(University of Toronto, Department of Computer Science, Canada)
Yan Huang
(University of Reading, Department of Linguistics, UK)
Andrew Kehler
(University of California, San Diego, Department of Linguistics, USA)
Rodger Kibble
(University of London, Department of Computing, UK)
Andrej Kibrik
(Russian Academy of Sciences, Institute of Linguistics, Russia)
Emiel Krahmer
(Tilburg University, The Netherlands)

Shalom Lappin
(King's College, UK)
Naila Pimodri
(University of Cambridge)
Maria Mercedes Piñango
(Yale University, Department of Linguistics, USA)
Massimo Poesio
(University of Essex, Department of Computer Science, UK)
Eric Reuland
(OTS, The Netherlands)
Marco Rocha
(Universidade Federal de Santa Catarina, Brazil)
Antonio Fernandez Rodriguez
(University of Alicante, Spain)
Monique Rolbert
(Université de Marseille, France)
Tony Sanford
(University of Glasgow, Department of Psychology, UK)
Roland Stuckardt
(Johann Wolfgang Goethe University Frankfurt am Main, Germany)
Linda Van Guilder
(MITRE, USA)

We would also like to acknowledge the help received from our series editor,
Prof. Konrad Koerner, and from Ms Anke de Looper of John Benjamins in
Amsterdam. We are also very grateful to João Silva for his patience and
considerable help with the nitty-gritty of file formatting. Without them this book
would not have been a viable project.

September 2004

António Branco
Tony McEnery
Ruslan Mitkov
Section I
Computational Treatment
A Sequenced Model of Anaphora and Ellipsis Resolution
Shalom Lappin
Department of Computer Science, King's College

I compare several types of knowledge-based and knowledge-poor approaches to
anaphora and ellipsis resolution. The former are able to capture fine-grained distinctions
that depend on lexical meaning and real world knowledge, but they are generally not
robust. The latter show considerable promise for yielding wide coverage systems.
However, they consistently miss a small but significant subset of cases that are not
accessible to rough-grained techniques of interpretation. I propose a sequenced model
which first applies the most computationally efficient and inexpensive methods to
resolution and then progresses successively to more costly techniques to deal with cases
not handled by previous modules. Confidence measures evaluate the judgements of each
component in order to determine which instances of anaphora or ellipsis are to be passed
on to the next, more fine-grained subsystem.1
1 Introduction
Anaphora and ellipsis resolution have been an important focus for work in
natural language processing over the past twenty-five years. Providing adequate
solutions to these tasks is necessary for the development of genuinely robust
systems for (among other applications) text interpretation, dialogue
management, query answering, and machine translation. A wide variety of
methods have been applied to the treatment of anaphora and ellipsis ranging
from knowledge intensive and inference-based techniques to statistical
modelling and machine learning. In this paper, I will provide an overview of the
main approaches and summarize their comparative strengths and limitations.
My concern in this survey is not to offer a detailed account of the numerous
computational treatments of anaphora and ellipsis that appear in the literature
but to indicate the main advantages and shortcomings of the primary
approaches that have been suggested.2

1 Earlier versions of this paper were presented at the 4th Discourse Anaphora and Anaphora Resolution
Colloquium in Lisbon in September 2002, the Linguistics Colloquium at the University of Toronto in
November 2002, and the Linguistics Colloquium at the University of Reading in January 2003. I am grateful to
the audiences of these forums for useful discussion of the ideas presented here. I would also like to thank
Ruslan Mitkov and Andy Kehler for their encouragement and their helpful comments on this work.
2 See (Mitkov, 2002) for a recent study of anaphora resolution that includes a history of the problem within
natural language processing. See (Mitkov et al., 2001) for examples of current work on anaphora resolution.
(Huang, 2000) offers an extensive cross-linguistic investigation of anaphora and examines alternative linguistic
theories of this relation. See (Lappin, 1996) and (Lappin & Benmamoun, 1999) for theoretical and
computational discussions of ellipsis resolution.

I will then sketch an integrated model which employs alternative techniques
in a sequence of ascending computational cost and domain specificity. This
model first invokes relatively inexpensive wide coverage procedures for
selecting an antecedent for a pronoun or an elided element. It then moves
through successively more expensive, fine-grained measures to handle the cases
not resolved by the preceding modules. It applies confidence measures to the
decisions of each module to evaluate the reliability of its output. In this way it
determines, for each module, which cases have been correctly resolved and
which ones are passed on to the following component.
In Section 2, I look at knowledge-based and inference driven approaches to
pronominal anaphora resolution. Section 3 considers various knowledge-poor
methods for anaphora interpretation. Section 4 extends the comparison to VP
ellipsis, and Section 5 takes up fragment interpretation in dialogue viewed as a
type of ellipsis. Finally, in Section 6, I describe the proposed sequenced model.
Section 7 states conclusions and indicates directions for future work.
2 Knowledge-Based and Inference-Driven Approaches to Anaphora
Knowledge-based approaches to anaphora resolution generally rely on rules of
inference that encode semantic and real world information in order to identify
the most likely antecedent candidate of a pronoun in discourse. An interesting
example of this approach is Kehler's (2000, 2002) use of Hobbs et al.’s (1993)
model of abductive reasoning to establish coherence relations among the
sentences of a text. In Kehler's theory, pronouns are assigned antecedents
through the abductive inference chains required for text coherence. Hobbs and
Kehler (1997), and Kehler (2002) also invoke abductive inference to interpret
elided VP's and resolve pronouns within VP ellipsis.
To illustrate this approach, consider (1), to which Kehler (2000) assigns the
representation (2).
(1) John hid Bill's keys. He was drunk.
(2) a. hide(e1,John,Bill,ck) ∧ car_keys(ck,Bill)
b. drunk(e2,he)
He uses axioms like those in (3) to construct the backwards abductive
inference chain in (4) from (2) to a conclusion in which he is resolved to Bill
(4g).
(3) a. ∀ei,ej (cause(ej,ei) ⇒ explanation(ei,ej))
b. ∀x,y,ei (drunk(ei,x) ⇒ ∃ej,ek (not_want(ej,y,ek) ∧ drive(ek,x) ∧ cause(ei,ej)))
(4) a. explanation(e1,e2)
b. cause(e2,e1)
c. cause(e2,e3) ∧ cause(e3,e1)
d. cause(e2,e4) ∧ cause(e4,e3)
e. not_want(e3,John,e5) ∧ have(e5,Bill,ck)
f. not_want(e4,John,e6) ∧ drive(e6,Bill)
g. drunk(e2,Bill)
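To make the mechanics concrete, the sketch below renders this cost-based abductive choice in miniature for example (1). The rules, costs, and the whole encoding are illustrative assumptions rather than the published axioms; a real system searches a large axiom base with unification.

```python
# Toy cost-based abduction for "John hid Bill's keys. He was drunk."
# Rules and costs are illustrative assumptions, not the published
# axioms of Hobbs et al. (1993) or Kehler (2000).

CANDIDATES = ["John", "Bill"]

def assumption_cost(antecedent):
    """Cost of the cheapest cause-chain linking drunk(antecedent) to
    hide(e1,John,ck), mirroring chain (4) above."""
    cost = 1    # assume drunk(x) causes someone not to want x to drive
    # not_want(drive(x)) coherently causes hiding the keys x would use;
    # if x is not the keys' owner, an extra costly assumption is needed
    # (that John would drive with Bill's keys).
    if antecedent != "Bill":
        cost += 10
    return cost

print(min(CANDIDATES, key=assumption_cost))    # -> Bill
```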
The main strength of knowledge-based systems is their capacity to capture
fine-grained semantic and pragmatic distinctions not encoded in syntactic
features or frequency of co-occurrence patterns. These distinctions are not
accessible to knowledge-poor approaches. They are crucial to correctly
resolving pronominal anaphora and VP ellipsis in a small but important set of
cases that arise in text and dialogue.
The two main difficulties with these systems are that (i) they require a large
database of axioms encoding real world knowledge, and (ii) they apply
defeasible inference rules which produce combinatorial blow-up very quickly.
Assigning cost values to inference rules and invoking a cost-driven preference
system for applying these rules (as in (Hobbs et al., 1993)) may reduce the
blow-up to some extent, but the problem remains significant.
As a result, knowledge-based models of anaphora resolution are generally
not robust. Their rules are often domain-dependent and hard to formulate in a
way that will support inference over more than a small number of cases.
Moreover, the semantic/discourse representations to which the inference rules
apply are not reliably generated for large texts.
3 Knowledge-Poor Approaches
Knowledge-poor systems of anaphora resolution rely on features of the input
which can be identified without reference to deep semantic information or
detailed real world knowledge. One version of this approach employs syntactic
structure and grammatical roles to compute the relative salience of candidate
antecedents. Another uses machine-learning strategies to evaluate the
probability of alternative pronoun-antecedent pairings by training on large
corpora in which antecedent links are marked.
Hobbs (1978) suggests one of the first instances of a syntactic salience
procedure for resolving pronouns. He formulates a tree search algorithm that
uses syntactic configuration and sequential ordering to select NP antecedents of
pronouns through left-right, breadth-first traversal of a tree. Lappin and Leass
(1994) propose an algorithm which relies on weighted syntactic measures of
salience and recency to rank a filtered set of NP candidates. This algorithm
applies to full syntactic parses. Kennedy and Boguraev (1996), Mitkov (1998),
and Stuckardt (2001) modify and extend this approach to yield results for
partial syntactic representations rather than full and unambiguous parse
structures. Grosz et al. (1995) employ a grammatical role hierarchy and

preference rules for managing informational state change to select the local NP
centre (focus) for each element of the sequence of sentences in a discourse.
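As a minimal illustration of this family of methods, the sketch below filters candidates by agreement and ranks the survivors by weighted salience factors degraded with distance. The halving of salience per intervening sentence follows the flavour of Lappin and Leass (1994), but the weights and the candidate encoding here are simplifying assumptions.

```python
# A much-simplified salience ranker in the style of Lappin & Leass (1994).
# Factor names are indicative; the weights and NP encoding are assumptions.

WEIGHTS = {
    "subject": 80,
    "direct_object": 50,
    "non_adverbial": 50,
    "head_noun": 80,
}

def salience(candidate, pronoun_sentence):
    score = sum(WEIGHTS[f] for f in candidate["factors"])
    distance = pronoun_sentence - candidate["sentence"]
    return score / (2 ** distance)          # recency: halve per sentence

def agree(candidate, pronoun):
    return (candidate["gender"] == pronoun["gender"]
            and candidate["number"] == pronoun["number"])

def resolve(pronoun, candidates):
    filtered = [c for c in candidates if agree(c, pronoun)]  # hard filters
    return max(filtered, default=None,
               key=lambda c: salience(c, pronoun["sentence"]))

candidates = [
    {"np": "the utility", "sentence": 0, "gender": "n", "number": "sg",
     "factors": ["subject", "head_noun", "non_adverbial"]},
    {"np": "a file", "sentence": 0, "gender": "n", "number": "sg",
     "factors": ["direct_object", "head_noun", "non_adverbial"]},
]
pronoun = {"form": "it", "sentence": 1, "gender": "n", "number": "sg"}
print(resolve(pronoun, candidates)["np"])    # -> 'the utility'
```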
A recent instance of the application of machine learning to anaphora is
(Soon et al., 2001). They describe a procedure for training a classifier on a
corpus annotated with coreference chains, where the NP elements of these
chains are assigned a set of features. The classifier goes through all pairs of
referential NP's in a text to identify a subset of coreferential pairs.
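Schematically, this pairwise set-up looks like the sketch below; the feature set is a small assumed subset of Soon et al.'s, and the hand-written stub stands in for the classifier that would actually be trained on an annotated corpus.

```python
# Sketch of the pairwise-classification set-up of Soon et al. (2001).
# Features and the decision stub are illustrative assumptions.

def pair_features(np_i, np_j):
    return {
        "string_match": np_i["head"] == np_j["head"],
        "number_agree": np_i["number"] == np_j["number"],
        "gender_agree": np_i["gender"] == np_j["gender"],
        "j_is_pronoun": np_j["is_pronoun"],
        "distance": np_j["sentence"] - np_i["sentence"],
    }

def coreferent(np_i, np_j):
    """Stand-in for the trained classifier."""
    f = pair_features(np_i, np_j)
    return f["number_agree"] and f["gender_agree"] and (
        f["string_match"] or (f["j_is_pronoun"] and f["distance"] <= 2))

# At resolution time each anaphoric NP is paired with the preceding NPs;
# the classifier's positive decisions induce the coreference chains.
```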
The obvious advantage of knowledge-poor systems relative to knowledge-
based models is that the former are computationally inexpensive and potentially
robust. However, these claims of resource efficiency and wide coverage must
be qualified by recognition of the expense involved in generating accurate
syntactic representations for systems that apply to full parses or detailed
grammatical role information. Salience-driven systems also require domain
specific and, possibly, language specific values for syntactic salience measures.
In the case of machine learning techniques, it is necessary to factor in the cost
of annotating large corpora and training classification procedures.
An important weakness of these models is that they cannot handle a small
but significant core of anaphora resolution cases in which the antecedent cannot
be identified solely on the basis of syntactic and morphological properties and
relative recency. These features are also the basis for the candidate rankings
that machine learning methods generate.
Dagan et al. (1995) attempt to enrich a syntactic salience system by
modelling (a certain amount of) semantic and real world information cheaply.
They combine the Lappin-Leass algorithm with a statistically trained lexical co-
occurrence preference module. Elements of the candidate antecedent list are
assigned both salience and lexical preference scores. The latter are based on
frequency counts for verb-NP and prep-NP pairs in a corpus, and the
substitution of the candidate for the pronoun in the observed head-argument
relation of the pronoun. When the difference between the salience scores of the
two highest ranked candidates is below an (experimentally) determined threshold
and the lexical preference score of another candidate Ci exceeds that of the first
by an (experimentally) specified ratio, then Ci is selected.
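The combination rule just described can be sketched as follows, with placeholder values standing in for the experimentally determined threshold and ratio, and anticipating example (5) below:

```python
# The Dagan et al. (1995) combination rule in miniature. THRESHOLD and
# RATIO are placeholders, not the published constants.

THRESHOLD = 30     # salience gap under which the ranking is unreliable
RATIO = 2.0        # required lexical-preference advantage

def combine(ranked):
    """ranked: list of (candidate, salience, lexical_preference),
    sorted by salience, best first."""
    best, runner_up = ranked[0], ranked[1]
    if best[1] - runner_up[1] < THRESHOLD:
        # salience is indecisive: let lexical statistics decide
        challenger = max(ranked, key=lambda c: c[2])
        if challenger[2] >= RATIO * best[2]:
            return challenger[0]
    return best[0]

ranked = [("the utility", 105, 3), ("a file", 90, 48)]
print(combine(ranked))   # -> 'a file': "print file" beats "print utility"
```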
Consider the pronoun it in (5).
(5) The utility (CDVU) shows you a LIST4250, LIST38PP, or LIST3820 file on your
terminal for a format similar to that in which it will be printed.
The statistical preference module overrides the higher syntactic salience
ranking of utility to select file as the antecedent of it. This preference is due to
the fact that print file has a significantly higher frequency count than print
utility. The statistical module improved the performance of Lappin and
Leass' (1994) syntactic salience-based algorithm from 86.1% to 88.6% on a
blind test of 360 pronoun cases in a set of sentences taken from a corpus of
computer manuals.
However, there are cases which still resist resolution even under the finer
grain of lexical co-occurrence information that such a statistical preference
module provides. The contrast between (6) (= 1) and (7) illustrates the limits of
syntactic salience enriched with a statistically trained lexical preference metric.
(6) John hid Bill's keys. He was drunk.
(7) John hid Bill's keys. He was playing a joke on him.
John receives the highest syntactic salience ranking in both (6) and (7).
Lexical preference conditions do not select between John and Bill in these
cases. Reliance on real world knowledge and inference are needed to identify
Bill as the antecedent of he in (6), and John and Bill as the antecedents of he
and him, respectively, in (7).
4 VP Ellipsis
Asher et al. (2001) specify a knowledge-based approach to the interpretation of
VP ellipsis. They employ a general parallelism constraint based on Asher’s
(1993) notion of Maximal Common Theme (MCT) to resolve ambiguity in VP
ellipsis. They define a Theme for a Discourse Representation Structure (DRS) K
as a DRS K' obtained from K by the application of 0 or more operations of a
certain type on K. These operations are (i) deletion of a discourse marker, (ii)
deletion of an atomic condition, and (iii) systematic renaming of a bound
discourse marker. A Common Theme (CT) of two DRS's J and K is a theme
of both J and K; the Maximal Common Theme (MCT) is a CT T such that any
other CT T' of J and K is a theme of T. Asher et al.'s maximalisation constraint states that in resolving
scope ambiguity within a VP ellipsis construction, the preferred DRS for the
elided VP and its antecedent is the DRS that provides the MCT for the DRS's
representing each clausal constituent. This constraint effectively constitutes a
unification principle for the discourse representations of the sentences
containing the elided and antecedent VP's.3
The MCT condition selects the parallel wide scope reading of the quantified
NP every student relative to a test in (8).
(8) John gave every student a test, and Bill did too.
This is because the DRS's corresponding to this reading of each clause yield
a CT that is a theme of the DRS's for the wide scope interpretation of a test

3 Asher et al. (2001) also invoke this condition to resolve pronouns in ambiguous elided VP's.

relative to every student. The DRS's of the wide scope reading for a test do not
produce a theme for the DRS's of the wide scope reading of every student.
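A drastically simplified sketch of the MCT idea follows: if each DRS is flattened to a set of atomic conditions and marker renaming is ignored, themes become subsets and the maximal common theme is the intersection. Real DRS's are nested, so this conveys only the preference at work, not Asher et al.'s actual definition.

```python
# Flattened-DRS rendering of the Maximal Common Theme; an illustrative
# simplification, not Asher et al.'s (2001) definition.

from itertools import combinations

def themes(drs):
    """All DRS's obtainable by deleting zero or more conditions."""
    conds = list(drs)
    for r in range(len(conds) + 1):
        for combo in combinations(conds, r):
            yield frozenset(combo)

def maximal_common_theme(j, k):
    common = set(themes(j)) & set(themes(k))
    return max(common, key=len)          # == j & k under these assumptions

clause1 = frozenset({"every(x,student)", "a(y,test)", "give(john,x,y)"})
clause2 = frozenset({"every(x,student)", "a(y,test)", "give(bill,x,y)"})
print(sorted(maximal_common_theme(clause1, clause2)))
# -> ['a(y,test)', 'every(x,student)']: parallel scopings share the most
```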
Several other instances of knowledge-based and inference-driven models of
VP ellipsis interpretation are as follows. Hobbs and Kehler (1997), and Kehler
(2002) use parallelism constraints for text coherence to identify VP antecedents.
Dalrymple et al. (1991) and Shieber et al. (1996) apply higher-order unification
to resolve the predicate variable in the semantic representation of an elided VP.
Crouch (1999) constructs derivations in linear logic to provide alternative ways
of assembling the constituents in the representation of an antecedent in order to
obtain possible interpretations of the clause containing the ellipsis site.
This approach to VP ellipsis enjoys the same advantages and suffers from
the same weaknesses that we noted with respect to the knowledge intensive
view of pronominal anaphora resolution.
Turning to a knowledge-poor model, Hardt (1997) describes a procedure for
identifying the antecedent of an elided VP in text that applies to the parse
structures of the Penn Treebank.4
It constructs a list of candidate VP's to which it applies a syntactic filter. The
elements of the filtered candidate list are assigned scores on the basis of
syntactic salience factors and recency.
On a blind test of 96 examples from the Wall Street Journal the procedure
achieved a success rate of 94.8% according to a head verb overlap criterion (the
head verb of the system's selected candidate is contained in, or contains the
head verb of the coder's choice of antecedent). It achieved 85.4% for exact head
verb match and 76% for full antecedent match. A comparison procedure that
relies only on recency scored 75% for head verb overlap, 61.5% for exact head
verb match, and 14.6% for full antecedent match.
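The three match criteria can be stated as simple predicates; tokenisation and head-verb identification are assumed as given, and the encoding is illustrative:

```python
# Hardt's (1997) three match criteria sketched over token lists.

def full_match(system, coder):
    return system == coder

def head_verb_match(system_head, coder_head):
    return system_head == coder_head

def head_verb_overlap(system, coder, system_head, coder_head):
    # one selection's head verb is contained in the other's antecedent
    return system_head in coder or coder_head in system

system = "used to be a great speaker".split()
coder = "be a great speaker".split()
print(head_verb_overlap(system, coder, "used", "be"))   # True
print(head_verb_match("used", "be"))                    # False
```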
Hardt's syntactic salience-based procedure uses essentially the same strategy
and design for identifying the antecedent of an elided VP as Lappin and Leass’
(1994) algorithm uses for pronominal anaphora resolution. Its higher success
rate may, in part, be due to the fact that recency and syntactic filtering tend to
reduce the set of candidates more effectively for elided VP's than for pronouns.
As in the case of pronouns, a small set of elided VP cases is not accessible
to resolution by salience ranking or statistically modelled lexical preference.
The following examples indicate that inference based on semantic and
real world knowledge is inescapable for these cases.5
4 Hardt's procedure applies to elided VP's that have already been recognized. Nielsen (2003, 2004) presents
preliminary results for the application of a variety of machine learning methods to the identification of elided
VP's in text.
5 Dalrymple (1991), Hardt (1993), and Kehler (2002) claim that the fact that inference is required to identify
the antecedents of the elided VP's in (9) and (10) shows that ellipsis resolution applies to semantic rather than
syntactic representations. In fact, it is not obvious that the need for inference in (some cases of) ellipsis
resolution in itself determines the nature of the representation to which the inference rules apply. Lappin (1996)
argues that inference can apply to syntactic representations of sentences to generate structures corresponding to
(i) and (ii).
(i) Mary wants to go out with Irv.
(ii) Harry used to speak.
These structures supply appropriate antecedents for the syntactic reconstruction of the elided VP's in (9) and
(10), respectively. The need for inference in ellipsis resolution, on one hand, and the nature of the level of
representation to which inference and ellipsis resolution apply, on the other, are independent questions which
should be distinguished.

(9) Mary and Irv want to go out, but Mary can't, because her father disapproves of Irv.
(Webber, 1979)
Mary can't go out with Irv
(10) Harry used to be a great speaker, but he can't anymore, because he lost his voice.
(Hardt, 1993)
he can't speak anymore
5 The Interpretation of Fragments in Dialogue
Fernández et al. (to appear) present SHARDS, a system for interpreting non-
sentential phrasal fragments in dialogue. Examples of such fragments are short
answers (11), sluices (short questions, 12), and bare adjuncts (13). The latter are
possible even when no wh-phrase adjunct appears in the antecedent to anchor
them, as in (14).
(11) A: Who saw Mary?
B: John.
John saw Mary.
(12) A: A student saw John.
B: Who?
Which student saw John?
(13) A: When did Mary arrive?
B: At 2.
Mary arrived at 2.
(14) A: John completed his paper.
B: When?
When did John complete his paper?
SHARDS is a Head Driven Phrase Structure Grammar (HPSG)-based
system for the resolution of fragments in dialogue. It treats the task of resolving
fragment ellipsis as locating for the (target) ellipsis element a parallel (source)
element in the context, and computing from contextual information a property
which composes with the target to yield the resolved content. This basic view of
ellipsis resolution is similar in spirit to the higher-order unification (HOU)
approach of Dalrymple et al. (1991) and Pulman (1997).


Two new attributes are defined within the CONTEXT feature structure: the
Maximal Question Under Discussion (MAXQUD) and the Salient Utterance
(SALUTT). The MAXQUD is the most salient question that needs to be answered
in the course of a dialogue. The SALUTT represents a distinguished constituent
of the utterance whose content is the current value of MAXQUD. In information
structure terms, the SALUTT specifies a potential parallel element correlated with
an element in the antecedent question or assertion. The SALUTT is the element
of the MAXQUD that corresponds to the fragment phrase. By deleting the SALUTT
from the MAXQUD, SHARDS produces the representation of a property from
which the propositional core of the CONTENT value for the fragment can be
constructed.
(15) is the (simplified) typed feature structure that (Fernández et al., to
appear) posit for a bare fragment phrase.
(15) bare-arg-ph ⇒
     [ STORE            {}
       CONT|SOA|NUCL    [1]
       CTXT [ MAX-QUD|SOA|NUCL [1]
              SAL-UTT [ CAT        [2]
                        CONT|INDEX [3] ] ]
       HD-DTRS [ CAT        [2]
                 CONT|INDEX [3] ] ]
SHARDS interprets a fragment in dialogue by computing from context
(represented as a dialogue record) the values of MAXQUD and SALUTT for the
assertion or question clause that the fragment expresses. It uses these feature
values to specify the CONTENT feature of the clause for the fragment. The basic
propositional content of the fragment clause is recovered from the MAXQUD,
whose NUCL feature value is shared with the NUCL feature of the fragment
clause's CONT feature.
The value of SALUTT is of type sign, enabling the system to encode syntactic
categorial parallelism conditions, including case assignment for the fragment.
The SALUTT is computed as the (sub)utterance associated with the role bearing
widest quantificational scope within the MAXQUD.
SHARDS computes the possible MAXQUD's from each sentence which it
processes and adds them to the list of MAXQUD candidates in the dialogue
record. When a fragment phrase FP is encountered, SHARDS selects the most
recent element of the MAXQUD candidate list which is compatible with FP's
clausal semantic type.
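The resolution loop just described can be sketched as follows; the dictionary encoding and the compatibility test are stand-in assumptions, since SHARDS itself operates over HPSG feature structures.

```python
# Sketch of SHARDS-style fragment resolution: choose the most recent
# compatible MAXQUD, delete its SALUTT, and let the fragment fill the
# vacated role. Encodings are illustrative assumptions.

def compatible(maxqud, fragment):
    # stand-in for the clausal-type and categorial parallelism checks
    return maxqud["salutt"]["cat"] == fragment["cat"]

def resolve_fragment(fragment, dialogue_record):
    for maxqud in reversed(dialogue_record):     # recency: newest first
        if compatible(maxqud, fragment):
            content = dict(maxqud["nucleus"])
            content[maxqud["salutt"]["role"]] = fragment["index"]
            return content
    return None

record = [{"nucleus": {"rel": "see", "see-er": None, "seen": "mary"},
           "salutt": {"role": "see-er", "cat": "NP"}}]
fragment = {"cat": "NP", "index": "john"}     # the short answer "John."
print(resolve_fragment(fragment, record))
# -> {'rel': 'see', 'see-er': 'john', 'seen': 'mary'}   "John saw Mary."
```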
(16) is the Attribute Value Matrix (AVM) produced for the CONT of Who
saw Mary. 1 is the index value of who and 2 of Mary:

(16) [ PARAMS { [ INDEX [1]
                  RESTR {person-rel([1])} ] }
       SOA|NUCL [ see-rel
                  see-er [1]
                  seen   [2] ] ]
This is the feature structure counterpart of the λ-abstract λπ.(…π…).
The (abbreviated) AVM for the SALUTT who is (17).
(17) [ CAT     NP[+nom]
       CONTENT [1]
       STORE { [ INDEX [1]
                 RESTR {person([1])} ] } ]

(18) is the AVM produced for John as a short answer, where 1 is the index
value of John and 2 of Mary.
(18) [ SOA [ NUCL [ see-rel
                    see-er [1]
                    seen   [2] ] ]
       RESTR {person-rel([1])} ]
(19) is the feature structure generated for Who as a sluice in response to A
student saw John. 1 is the index value of Who and 2 of John.
(19) [ PARAMS { [ INDEX [1]
                  RESTR {person-rel([1]), student-rel([1])} ] }
       SOA|NUCL [ see-rel
                  see-er [1]
                  seen   [2] ] ]
For at least some cases it is necessary to relax the requirement of strict
syntactic category match between the fragment and the SALUTT to allow
correspondence to be specified in terms of an equivalence class of categories.

(20) A: What does Mary want most?
B: a good job/that people should like her/to have her freedom
(21) A: When did John complete his paper?
B: yesterday/on Wednesday/after the teacher spoke to him
There are also instances where the scope criterion for determining the
SALUTT must be overridden.
(22)a. A: Each student will consult a supervisor.
B: Which one?
b. Which supervisor will each student consult?
(22b), which selects a supervisor as the SALUTT, is the most natural
interpretation of which one in (22a) B, even when a supervisor receives a
narrow scope reading relative to each student in (22a) A.
Similarly, the recency condition for selecting the MAXQUD from the list of
MAXQUD candidates in the dialogue record does not always yield the correct
results, as the dialogue sequence in (23) illustrates.
(23) A: Why did Mary arrive early?
B: I can't tell you.
A: Why can't you tell me?
B: Okay, if you must know, to surprise you.
The fragment phrase to surprise you is a reply to the first question that A
asks, Why did Mary arrive early?, rather than the second, Why can't you tell
me?.
Knowledge-based inference is required to select the more distant question as
the preferred MAXQUD in this case.
The following example, from the British National Corpus, is a naturally
occurring dialogue in which the recency criterion for determining the
MAXQUD is defeasible.6
(24) A1: That new tyre law comes in soon dunnit?
B2: That what?
A3: New tyre law.
C4: First of <pause> first of November it came in.
A5: Oh.
C6: Why?
A7: I'd better check my two back ones then.
The sluice in (24) C6 is a case of clarificatory ellipsis (Ginzburg & Cooper,
to appear). It takes as its MAXQUD antecedent the illocutionary statement
corresponding to (24) A1 rather than the statement in (24) C4.

6 (24) is from the dialogue component of the British National Corpus, File KB4, sentences 144-150. I am
grateful to Raquel Fernández for providing this example.

6 A Sequenced Model
As we have seen, work on anaphora and ellipsis within the framework of the
knowledge-poor approach indicates that syntactic measures of salience
combined with recency provide a highly effective procedure for antecedent
identification across a wide range of ellipsis and anaphora resolution tasks in
text and dialogue. These methods are computationally inexpensive and
generally robust. It is possible to deal with a subset of the (significant) minority
of cases which are not amenable to syntactic salience-based resolution through
statistical modelling of semantic and real world knowledge as lexical preference
patterns. For the remaining cases abductive inference appears to be
unavoidable. These considerations suggest that a promising approach is to
apply the techniques in an ascending sequence of computational cost. (25) gives
the outline of a plausible architecture for such an integrated sequenced model of
anaphora and ellipsis resolution.
(25) <P, Candidate_Antecedent_List> ⇒
     Module 1: Syntactic Salience & Recency Measures +
               Syntactic & Morphological Filtering →
               Ranked Candidate List → Confidence Metric 1 →
               correctly resolved; unresolved ⇒
     Module 2: Statistically Determined Lexical Preference Measures →
               New Ranked Candidate List → Confidence Metric 2 →
               correctly resolved; unresolved ⇒
     Module 3: Abductive Inference ⇒ resolved
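In code, the cascade of (25) amounts to something like the sketch below, where the module internals and the confidence margins are placeholders for the experimentally tuned components discussed next:

```python
# The cascade in (25) as a minimal control loop. Module internals and
# thresholds are placeholders for experimentally tuned components.

def confident(ranked, margin):
    """Dagan et al.-style reliability test: trust the ranking only if
    the top two scores are separated by a sufficient margin."""
    return len(ranked) < 2 or ranked[0][1] - ranked[1][1] >= margin

def resolve(case, salience_rank, lexical_rank, abductive_resolve,
            margin1=30, margin2=10):
    ranked = salience_rank(case)            # Module 1: cheap and robust
    if confident(ranked, margin1):
        return ranked[0][0]
    ranked = lexical_rank(case, ranked)     # Module 2: lexical statistics
    if confident(ranked, margin2):
        return ranked[0][0]
    return abductive_resolve(case, ranked)  # Module 3: costly inference
```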
The sequenced model of anaphora and ellipsis resolution proposed here
moves successively from computationally inexpensive and interpretationally
rough-grained procedures to increasingly costly and fine-grained methods. The
model encodes a strategy of maximizing the efficiency of an anaphora (ellipsis)
resolution system by invoking fine-grained techniques only when necessary.
In order to succeed, this strategy must use reliable confidence metrics to
evaluate the candidate rankings which the first two modules produce. Such
metrics can be constructed on the model of the criteria that Dagan et al. (1995)
use to evaluate the reliability of salience scores. When the distance between the

salience scores of the two top candidates in a list falls below a certain threshold,
the ranking is taken as an unreliable basis for antecedent selection and the
statistical lexical preference module is activated. Intensive experimental work
using machine learning techniques will be needed to determine optimal values
for both the salience factors of Module 1 and the confidence metrics used to
assess the outputs of Modules 1 and 2.
A computationally viable abductive inference component will require
resource sensitive inference rules to control the size and number of the
inference chains that it generates.7 Resource sensitivity and upper bounds on
derivations in abductive inference are essential to rendering the procedures of
Module 3 tractable.
7 Conclusions and Future Work
While the knowledge-based and inference driven approach to anaphora and
ellipsis resolution can deal with cases that require fine-grained semantic
interpretation and detailed real world knowledge, it does not provide the basis
for developing computationally efficient, wide coverage systems. By contrast,
knowledge-poor methods are inexpensive and potentially robust, but they miss
an important minority of recalcitrant cases for which real world knowledge and
inference are indispensable. A promising solution to this engineering problem is
to construct an integrated system that orders the application of anaphora and
ellipsis interpretation techniques in a sequence of modules that apply
increasingly fine-grained techniques of interpretation with an attendant rise in
computational cost. Confidence metrics filter the output of each module to
ensure that the more expensive components are invoked only when needed.
In order to implement the proposed model, it is important to achieve
optimisation of the selection of salience parameters and their relative values
through statistical analysis of experimental results. A considerable amount of
work has been done on the application of salience parameters and values to
minimal syntactic representations rather than fully specified parse structures.
This is a fruitful area of investigation which merits further research, as it holds
out the promise of major gains in efficiency and robustness for the salience
methods that comprise the first module of an integrated system. Another
problem worth pursuing is the generalization of lexical preference patterns to
relations between semantic classes. Measuring preference in terms of semantic
categories rather than specific lexical head-argument and head-adjunct patterns
7 Kohlhase and Koller (2003) propose resource sensitive inference rules for model generation in which the
salience of a referential NP in discourse is used to compute the relative cost of applying inference rules to
entities introduced by this NP. They measure salience in discourse largely in terms of the sorts of syntactic and
recency factors that Lappin and Leass (1994) use in their anaphora resolution algorithm.

will increase the power and reliability of Module 2. The viability of the entire
system depends upon determining reliable confidence metrics for both salience-
based and lexical preference-based antecedent selection. Finally, to implement
the third module much work must be done to develop efficiently resource
sensitive procedures for abductive inference in different domains.
References
Asher, N. 1993. Reference to Abstract Objects in English. Dordrecht: Kluwer.
Asher, N., D. Hardt and J. Busquets 2001. “Discourse Parallelism, Ellipsis, and Ambiguity.”
Journal of Semantics 18.
Crouch, D. 1999. “Ellipsis and Glue Languages.” S. Lappin and E. Benmamoun (eds.)
Fragments: Studies in Ellipsis and Gapping. New York: Oxford University Press. 32-67.
Dagan, I., J. Justeson, S. Lappin, H. Leass and A. Ribak 1995. “Syntax and Lexical Statistics in
Anaphora Resolution.” Applied Artificial Intelligence 9:6.633-644.
Dalrymple, M. 1991. “Against Reconstruction in Ellipsis.” Xerox PARC, Palo Alto, CA:
unpublished ms.
----------, S. Shieber and F. Pereira 1991. “Ellipsis and Higher-Order Unification.” Linguistics
and Philosophy 14.399-452.
Fernández, R., J. Ginzburg, H. Gregory and S. Lappin. To appear. “SHARDS: Fragment
Resolution in Dialogue.” H. Bunt and R. Muskens (eds.) Computing Meaning 3. Dordrecht:
Kluwer.
Ginzburg, J. and R. Cooper. To appear. “Clarification, Ellipsis, and the Nature of Contextual
Updates.” Linguistics and Philosophy.
Grosz, B., A. Joshi and S. Weinstein 1995. “Centering: A Framework for Modeling the Local
Coherence of Discourse.” Computational Linguistics 21.203-225.
Hardt, D. 1993. “Verb Phrase Ellipsis: Form, Meaning, and Processing.” University of
Pennsylvania: unpublished Ph.D dissertation.
---------- 1997. “An Empirical Approach to VP Ellipsis.” Computational Linguistics 23.
Hobbs, J. 1978. “Resolving Pronoun References.” Lingua 44.339-352.
---------- and A. Kehler 1997. “A Theory of Parallelism and the Case of VP Ellipsis.” Proc. of
the 35th Conference of the ACL. 394-401. Madrid.
----------, M. Stickel, D. Appelt and P. Martin 1993. “Interpretation as Abduction.” Artificial
Intelligence 63.69-142.
Huang, Y. 2000. Anaphora: A Cross-Linguistic Study. Oxford: Oxford University Press.
Kehler, A. 2000. “Pragmatics, Chapter 18.” D. Jurafsky and J. Martin (eds.) Speech and
Language Processing. Upper Saddle River, NJ: Prentice Hall.
---------- 2002. Coherence, Reference, and the Theory of Grammar. Stanford, CA: CSLI.
Kennedy, C. and B. Boguraev 1996. “Anaphora for Everyone: Pronominal Anaphora
Resolution without a Parser.” Proc. of the 16th Int. Conference on Computational Linguistics
(COLING'96). Copenhagen. 113-118.
Kohlhase, M. and A. Koller 2003. “Resource-Adaptive Model Generation as a Performance
Model.” Logic Journal of the IGPL. 435-456.
Lappin, S. 1996. “The Interpretation of Ellipsis.” S. Lappin (ed.) Handbook of Contemporary
Semantic Theory Oxford: Blackwell. 145-175.
---------- and E. Benmamoun (eds.) 1999. Fragments: Studies in Ellipsis and Gapping. New
York: Oxford University Press.

---------- and H. Leass 1994. “A Syntactically Based Algorithm for Pronominal Anaphora
Resolution.” Computational Linguistics 20.535-561.
Mitkov, R. 1998. “Robust Pronoun Resolution with Limited Knowledge.” Proc. of ACL'98 and
COLING'98. 869-875. Montreal.
---------- 2002. Anaphora Resolution. London: Longman.
----------, B. Boguraev and S. Lappin (eds.) 2001. Computational Linguistics 27. Special Issue
on Computational Anaphora Resolution.
Nielsen, L. 2003. “Using Machine Learning Techniques for VPE Detection”. Proc. of RANLP
2003. 339-346. Borovets.
Nielsen, L. 2004. “Verb Phrase Ellipsis Detection Using Automatically Parsed Text”. Proc. of
COLING 2004. Geneva.
Pulman, S. 1997. “Focus and Higher-Order Unification.” Linguistics and Philosophy 20.
Shieber, S., F. Pereira and M. Dalrymple 1996. “Interactions of Scope and Ellipsis.” Linguistics
and Philosophy 19.527-552.
Soon, W., H. Ng and D. Lim 2001. “A Machine Learning Approach to Coreference Resolution
of Noun Phrases.” Computational Linguistics 27.521-544.
Stuckardt, R. 2001. “Design and Enhanced Evaluation of a Robust Anaphor Resolution
Algorithm.” Computational Linguistics 27.479-506.
How to Deal with Wicked Anaphora?

Dan Cristea1,2 and Oana-Diana Postolache1


1 Al.I.Cuza University, Faculty of Computer Science
2 Romanian Academy, Institute for Theoretical Computer Science

This paper revises a framework (called AR-engine) capable of easily defining and
operating models of anaphora resolution. The proposed engine envisages the linguistic
and semantic entities involved in the cognitive process of anaphora resolution as
represented on three layers: the layer of referential expressions, the projection layer of
referential expressions’ features, and the semantic layer of discourse entities. Within this
framework, cases of anaphora resolution usually considered difficult to tackle are
investigated and solutions are proposed. Among them one finds relations triggered by
syntactic constraints, lemma and number disagreement, and bridging anaphora. The
investigation uses a contiguous text from the belletrist register. The research is
motivated by the view that the interpretation of free language in modern applications,
especially those related to the semantic web, requires ever more sophisticated tools.
1 Introduction
Although it is generally accepted that semantic features are essential for
anaphora resolution, the difficulty and complexity of achieving a correct
semantic approach have led the authors of automatic systems mainly to avoid
extensive use of semantic information (Lappin & Leass, 1994; Mitkov, 1997;
Kameyama, 1997). It is well known that anaphora studies reveal a
psychological threshold around 80% precision and recall that present systems
seem unable to surmount (Mitkov, 2002). It is our belief that one of the causes
of the current impasse in devising an anaphora resolution (AR) system with a
very high degree of confidence should also be sought in this choice of a
sub-semantic limitation. Drawing mainly on strict matching criteria, in which
morphological and syntactic features are of great value, these systems disregard
resolution decisions based on more subtle strategies that would allow lemma
and number mismatch, gender variation, split antecedents, bridging anaphora or
cataphora resolution. Moreover, types of anaphora other than strict coreference,
like type/token, subset/superset, is-element-of/has-as-element,
is-part-of/has-as-part, etc., often impose more complex types of
decision-making, which may reach down to the semantic level as well.
Our study makes use of the AR framework defined by Cristea and Dima
(2001) and Cristea et al. (2002a) (called AR-engine), with the aim of applying
it to the treatment of cases of anaphora resolution usually considered difficult.
The AR-engine approach rests on a view that sees anaphoric relations as having
a semantic nature (Halliday & Hasan, 1976), as opposed to a textual nature.
This paper discusses the tractability of implementing AR-models capable of
tackling cases of anaphora usually considered difficult. The validation of the
approach is currently being done on a contiguous free text by informally
assessing the computational feasibility of the proposed solutions within the
AR-engine framework.
The research is motivated by the belief that the interpretation of free language
in modern applications, especially those related to the semantic web, justifies
ever more sophisticated tools. We think that our investigation is a step forward
towards dealing with the really hard anaphora resolution problems that occur in
free texts. The study intends to determine a psychological boundary beyond
which it is really hard to process anaphora. It is our belief that the usual lack of
interest in considering hard cases of anaphora in practical settings is not always
justified by high modelling and computational costs, and that their tacitly
accepted notoriety as “untouchables” is exaggerated. The really hard part of
dealing with AR begins only when world knowledge has to be put on the table.
In this paper, we try to show that until then, there is still a lot to do.
The presentation proceeds as follows: Section 2 describes the AR-engine: its
basic principles, the constituent parts of a model definition within the
framework, and the basic functionality of the engine when set to analyse a free
text. Sections 3 to 7 discuss cases of AR, from simpler to more complex.
Finally, Section 8 presents preliminary evaluation data and conclusions.
2 The framework
2.1 The AR-engine1 basic principles
In (Cristea & Dima, 2001; Cristea et al., 2002a) a framework having the
functionality of a general AR engine and able to accommodate different AR
models is proposed. This approach recognizes the intrinsic incrementality of the
cognitive process of anaphora interpretation while reading a text or listening to
a discourse. It sees the linguistic and semantic entities involved in the process of
AR as situated on two fundamental layers: a text layer – populated with
referential expressions (REs)2 – and a deep semantic layer – where discourse
entities (DEs), representations of entities the discourse is about, are placed.

1 The AR-engine and the related documentation are freely available for research purposes at
http://consilr.info.uaic.ro.
2 We will restrict this study only to nominal referential expressions.

Within such a view, two basic types of anaphoric references can be expressed:
coreferences, inducing equivalence classes of all REs in a text which participate
in a coreference chain, and functional references (Markert et al., 1996), also
called indirect anaphora or associative anaphora (Mitkov, 2002), which express
semantic relations between different discourse entities, including type/token, is-
part-of/has-as-part, is-element-of/has-as-element, etc. As sketched in Figure 1,
chains of coreferential REs are represented as corresponding to a unique DE on
the semantic layer, whereas functional references are represented as relational
links between the DEs of the corresponding REs.

Figure 1: Representation of anaphoric relations revealing their semantic nature:
a. coreferences; b. functional references.
Representations involving only REs and DEs are the result of an
interpretation process applied to a text. Even if the semantic level is kept
hidden, these types of representations are implicitly assumed by the majority of
anaphora resolution annotation tasks. Indeed, DEs of the semantic layer could
be short-circuited by appropriate tags associated with coreferential REs, where
each RE points either to the first RE of the chain or to the most recent
antecedent RE. Analogously, in the case of functional references, the
annotation tags associated with the surface REs name the nature of the referential
function. However, if we are interested in modelling the interpretation process
itself, in a way that simulates the cognitive processes at work in a human
mind during text reading, the need for another, intermediate layer can
immediately be argued for. On this layer, which we will call the projection layer,
feature structures (in the following, projected structures – PSs) are filled in
with information fetched from the text layer, and all resolution decisions are
negotiated between PSs of the projection layer and DEs of the semantic
layer. We will say that a PS is projected from an RE and that a DE is proposed
(if it appears for the first time in the discourse) or evoked (if it exists already)
by a PS (Figure 2).

Figure 2: The three-layer representation of:
a. two coreferring expressions; b. two functional referential expressions
We term referential expression (RE) any noun phrase having a referential
function, including the first mention of an entity. The coreference relation (two
REs are coreferent if they refer to the same entity (Hirschman et al., 1997)) is,
in most cases, anaphoric,3 while not all anaphoric relations are
coreferential (e.g. bridging anaphora). Then, according to the usual understanding
(see for instance (Mitkov, 2002)), if REb corefers with REa, with REb following
REa in the text, we say that REb is the anaphor and REa the antecedent. To
stress the semantic nature of anaphora as a referential relation (Halliday &
Hasan, 1976): while anaphors and antecedents remain intrinsically connected to
the text, discourse entities belong to the semantic layer and are said to be the
referents of REs. The unique DE that is referred to by a set of REs disposed in
sequence thus reveals the equivalence class of these REs as a chain of
coreferring expressions.
Figure 3 presents a sequence of phases during the functioning of the AR-
engine in which two referential expressions are found to corefer. First, the
referential expression REa is identified on the text layer. It projects down to the
projection layer a feature structure composed of a set of attribute-value pairs –
PSa (Figure 3a). Supposing the model decides, during interpretation, in favour of
considering REa as introducing a new discourse entity, the feature structure
PSa proposes an adequate semantic representation on the semantic layer – DEa,
essentially a copy of PSa (Figure 3b). Because the aim of the projected structure is
to help the proposal/identification of a discourse entity, once this task has been
fulfilled the projected structure can be discarded. The result is a bidirectional
link that will be kept between REa and the corresponding DEa. Some moments
later, when a referential expression REb is identified on the text layer, it
projects a feature structure PSb on the projection layer (Figure 3c). Finally, if
the model decides that PSb evokes DEa, a bidirectional link between
REb and DEa is established and PSb is discarded (Figure 3d). A similar
sequence takes place when anaphoric relations other than strict coreference
are established.

3 For the definition of anaphoric relations we adopt a somewhat different position than Deemter and Kibble
(2000), for instance. They argue that, following the definition of anaphora – an NP α1 is said to take an NP α2 as
its anaphoric antecedent if and only if α1 depends on α2 for its interpretation (e.g. (Kamp & Reyle, 1993)) –
W.J.Clinton and Hillary Rodham’s husband are not anaphoric, since Hillary Rodham’s husband can be
understood as W.J.Clinton by itself, therefore without the help of the former RE. Our reading of α1 depends
on α2 for its interpretation is that α1 and α2 are related in the given setting. In this sense, the two REs above are
anaphoric if the intent of the writer is to let the reader establish a link between the two mentions, in this
particular case, as the same person. In (Cristea, 2000), co-referential non-anaphoric references are called pseudo
references. These are REs which, although referring to the same entity, can be understood independently,
without making the text interpretation suffer if a relation between them is not established (for instance, two
mentions of the sun: I woke up this morning when the sun rose; and later on: I read a book about Amenomphis
the IVth, the Egyptian pharaoh, son of the sun).
Figure 3: a. Projection of PSa from REa; b. Proposing of DEa from PSa;
c. Projection of PSb from REb; d. Evocation of DEa by PSb

2.2 Definition of an AR model


The AR-engine framework can accommodate different AR models. Such a
model is defined in terms of four components. The first component specifies
the set of attributes of the objects populating the projection and semantic
layers, together with their corresponding types. Different approaches to AR may
lead to specific options regarding which features of the anaphor and the referent
are considered important in the resolution process. An analysis of the state of
the art suggests a classification of the possible features (attributes) along the
following lines (a sketch of such an attribute declaration is given after the list):
a. morphological features:
- grammatical number;
- grammatical gender;
- case.
All known approaches use morphological criteria to filter out antecedents.
However, there are frequent cases in which the elimination of possible referential
links based on mismatches of morphological features may lead to erroneous
conclusions. Barlow (1998), for instance, presents examples in which gender
agreement between a pronominal anaphor and a common noun antecedent seems
not to be observed (Su Majestad suprema… él,4 in which the antecedent is a
feminine NP and the anaphor a masculine pronoun; the English his supreme
Majesty… he displays no such problem because English nouns have no
grammatical gender). Also, most languages marking gender distinctions have a
number of nouns or phrases that can be referred to by both masculine and
feminine pronouns, according to the natural gender of the person designated (le
docteur… elle; in English the doctor… she). Though we do not share Barlow’s
view in this respect, namely that morphology should be ignored in AR, a less
categorical approach with respect to a filtering rule based on morphology is
preferable.

4 In Spanish: <genderless possessive pronoun> supreme Majesty (feminine noun) … he.
b. syntactic features:
- full syntactic description of REs as constituents of a syntactic tree (Lappin &
Leass, 1994; Hobbs, 1978);
- marking of the syntactic role for subject position or obliqueness (the
subcategorisation function with respect to the verb) of the REs, as in all
centering-based approaches (Grosz et al., 1995; Brennan et al., 1987) and
syntactic-domain-based approaches (Chomsky, 1981; Reinhart, 1981;
Gordon & Hendricks, 1998; Kennedy & Boguraev, 1996);
- the quality of being an adjunct, embedded, or the complement of a preposition
(Kennedy & Boguraev, 1996);
- inclusion or not in an existential construction (Kennedy & Boguraev, 1996);
- syntactic patterns in which the RE is involved, which can lead to the
determination of syntactic parallelism (Kennedy & Boguraev, 1996; Mitkov,
1997);
- the quality of being in an apposition or a predicative noun position.
c. lexico-semantic features:
- lemma;
- person;5
- name (for proper nouns);
- natural gender;
- the part-of-speech of the head word of the RE. The domain of this feature
contains: zero-pronoun (also called zero-anaphora or non-text string), clitic
pronoun, full-fledged pronoun, reflexive pronoun, possessive pronoun,
demonstrative pronoun, reciprocal pronoun, expletive “it”, bare noun
(undetermined), indefinite determined noun, definite determined noun,
proper noun (name);6
- the sense of the head word of the RE, as given, for instance, by a wordnet;7
- position of the head of the RE in a conceptual hierarchy (hypo/hypernymy),
as in all models using wordnets (Poesio et al., 1997; Cristea et al., 2002a).
Features such as animacy, sex (or natural gender) and concreteness can be
considered simplified semantic tags derived from a conceptual hierarchy;
- inclusion in a wordnet synonymy class;
- semantic roles, from which selectional restrictions, inferential links,
pragmatic limitations, semantic parallelism and object preference can be
verified.
d. positional features:
- offset of the first token of the RE (an NP) in the text (Kennedy & Boguraev,
1996);
- inclusion in an utterance, sentence or clause, considered as a discourse unit
(Azzam et al., 1998; Cristea et al., 1998). This feature allows, for instance,
the calculation of the proximity between the anaphor and the antecedent in terms
of the number of intervening discourse units.
e. other features:
- inclusion or not of the RE in a specific lexical field, dominant in the text
(this is called “domain concept” in (Mitkov, 1997));
- frequency of the term in the text (Mitkov, 1997);
- occurrence of the term in a heading (Mitkov, 1997).

5 Since, among the nominal REs, only pronouns can distinguish the person, for our purposes person is a lexical
feature.
6 As mentioned already, this classification takes into account only nominal anaphors, therefore ignoring verbal,
adverbial, adjectival, etc. ones (Mitkov, 2002).
7 We prefer to use wordnet as a common noun when we refer to any language variant (Vossen, 1998; Tufiş &
Cristea, 2002a) of the original American English WordNet (Miller et al., 1993).
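
Purely as an illustration of what such a first component might look like (the attribute inventory and the names below are our hypothetical choices, not the engine’s actual declaration), the classification above could be encoded as a mapping from attribute names to admissible values:

    # Hypothetical first component: attribute names of PS/DE objects mapped to
    # their admissible values (or value types).
    MODEL_ATTRIBUTES = {
        # a. morphological
        "number": {"sg", "pl"},
        "gender": {"masc", "fem", "neut"},
        "case":   {"nom", "acc", "gen", "dat"},
        # b. syntactic
        "syn_role": {"subject", "object", "oblique", "adjunct"},
        # c. lexico-semantic
        "lemma": str,
        "person": {1, 2, 3},
        "pos_head": {"zero-pronoun", "clitic-pronoun", "full-fledged-pronoun",
                     "reflexive-pronoun", "possessive-pronoun",
                     "demonstrative-pronoun", "reciprocal-pronoun", "expletive-it",
                     "bare-noun", "indefinite-NP", "definite-NP", "proper-noun"},
        "sem": str,              # a wordnet sense identifier, e.g. person1
        # d. positional
        "offset": int,
        "discourse_unit": int,
    }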
The second component of a model is a set of knowledge sources intended to
fetch values from the text into the attributes of the PS. A knowledge source is a
virtual processor able to fill in values for one single attribute on the projection
layer. Depending on the application the AR-engine is coupled to, as well as on
the format of the input, more than one such virtual processor
could sometimes be served by a single NLP processor. Thus, a morpho-syntactic tagger usually
serves several knowledge sources, as it can provide at least the lemma, grammatical
number and gender, case, person and part of speech of the head word of the RE
(Brill, 1992; Tufiş, 1999). An FDG (functional dependency grammar) parser
(Järvinen & Tapanainen, 1997) fetches the syntactic role of the RE, while
wordnet access functions can bring all the head-word senses (or synsets) and
their position in a conceptual hierarchy. If word sense disambiguation (WSD) is
available as a knowledge source, then the exact word sense of the head-word in
the corresponding context can be determined. The membership of an RE in a
certain segment can be the contribution of a discourse segmenter or a syntactic
parser.
The third component is a set of matching rules and heuristics responsible
for deciding whether the PS corresponding to an RE introduces a new DE or, if
not, which of the existing DEs it evokes. This set includes rules of the
following four types (a minimal sketch of such rules is given after the list):
- certifying rules, which, if evaluated to 'true' on a pair (PS, DE), certify without
ambiguity the DE as a referent of the PS. For instance, coreference based on
proper name identity could be implemented, in most application settings, by a
certifying rule;
- demolishing rules, which rule out a possible DE as a referent candidate of a PS
(and, therefore, of its corresponding RE). These rules lead to a filtering phase
that eliminates from among the candidates those discourse entities that cannot
possibly be referred to by the RE under investigation. The order of application
of certifying and demolishing rules is specified in the model through priority
declarations;
- promoting/demoting rules (applied after the certifying and demolishing rules),
which increase/decrease a resolution score associated with a pair (PS, DE). The
evaluation of these rules drives a proposing/evoking phase, in which
either the best DE candidate of a PS is chosen from the ones remaining after the
demolishing rules have been applied, or a new entity is introduced. The use of
promoting/demoting rules can be likened to the preferences paradigm
employed by many classical approaches;
- a special section of the third component is dedicated to attribute-filling rules,
which are activated each time a new DE is proposed. These rules, behaving
similarly to the certifying ones, are responsible for the setting of anaphoric
relations of a functional type. Each such rule receives as parameters the name
of an attribute (a functional relation) and a pair (DE1, DE2), in which DE1 is
the current DE and DE2 a DE previously introduced. If a match is
verified, the attribute of DE1 mentioned as the rule’s first parameter receives
as value the identifier of DE2.
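
The following minimal sketch, in Python over plain feature dictionaries, shows one possible shape for each rule type; the feature names and the concrete conditions are our own illustrative assumptions, not the engine’s rule language.

    def certify_proper_name_identity(ps, de):
        """Certifying rule: an identical proper name certifies the DE as referent outright."""
        return (ps.get("pos_head") == "proper-noun"
                and ps.get("lemma") is not None
                and ps.get("lemma") == de.get("lemma"))

    def demolish_number_clash(ps, de):
        """Demolishing rule: rule out a DE whose grammatical number clashes with the PS."""
        n_ps, n_de = ps.get("number"), de.get("number")
        return n_ps is not None and n_de is not None and n_ps != n_de

    def promote_recency(ps, de, distance_in_units):
        """Promoting/demoting rule: return a score increment favouring nearer DEs."""
        return 1.0 / (1 + distance_in_units)

    def fill_nesting_relation(attribute, de1, de2):
        """Attribute-filling rule: if de1 carries a nesting slot naming de2, set the
        given functional attribute (e.g. 'belongs-to') of de1 to de2's identifier."""
        if de1.get("nesting") == de2.get("identifier"):
            de1[attribute] = de2["identifier"]
            return True
        return False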
Finally, the fourth component is a set of heuristics that configure the
domain of referential accessibility, establishing the order in which DEs have
to be checked, or certain proximity restrictions. For instance, if we want to
narrow the search for an antecedent to a vicinity of five sentences (or discourse
units), with the intent of reducing the resolution effort on the basis that the great
majority of anaphors can find an antecedent within this range, e.g.
(McEnery et al., 1997), then the fourth component of the model will record that
only those DEs linked with REs belonging to the last five discourse units are
considered. Not least, the domain of referential accessibility can model a
linear search-back order (Mitkov, 2000) or a hierarchical search-back order on
the discourse tree structure. Figures 4 and 5 display an example of a domain of
referential accessibility for the linear and the hierarchical case, respectively
(a sketch of such a linear domain is given after Figure 4).
Figure 4a shows a case in which REa evokes DEa and REb evokes DEb. Then the
order in which to search the candidate referents for PSc (projected from REc) is DEb first,
then DEa. If a match between PSc and DEa is found (Figure 4b), then, for a
subsequent REd, the order in which to search the candidate referent matching the
corresponding PSd is DEa first, then DEb (Figure 4c). If, instead, hierarchical
order is preferred, considering that REa, REb and REc belong to three adjacent
discourse units whose vein structure (Cristea et al., 1998, 2000) is the one
depicted in Figure 5 in bold line,8 then the order to consider the candidate
referents for PSc (projected from REc) is DEa first and DEb after, since,
hierarchically, REa (and therefore its corresponding DEa) is closer to REc than
REb (and its corresponding DEb).
Figure 4: Linear search order
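
As an illustration (our code, not the engine’s), a linear domain of referential accessibility restricted to a five-unit window could look as follows; each DE is a feature dictionary carrying the discourse unit of its latest mention:

    def linear_accessibility_domain(des, current_unit, window=5):
        """Keep only DEs mentioned within `window` discourse units of the current
        one, ordered most-recent-first (the linear search-back order of Figure 4)."""
        recent = [de for de in des
                  if current_unit - de.get("discourse_unit", 0) < window]
        return sorted(recent, key=lambda de: de.get("discourse_unit", 0), reverse=True)

Evoking a DE would update its discourse_unit feature, which is what reorders DEa before DEb between Figures 4b and 4c.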
In certain cases, it could be of help to see the domain of referential
accessibility as dynamically scaled according to the type of the anaphor. A synthesis
made by Mitkov (2002: 24) shows that demonstrative anaphors find their
antecedents at greater distances than pronouns, while this distance could be even
greater in the case of definite nouns and proper nouns. Rules of this kind could
be included in the fourth component of the AR-engine.
The framework is language independent, in the sense that the adjustment to
one language or another consists in defining a specific set of attributes,
establishing the language-specific knowledge sources capable of filling them and
devising evoking heuristics/rules specific to each language. The domain of
referential accessibility is thought to be stable across languages.
Figure 5: Hierarchical search order

2.3 Processing anaphors with AR-engine


Figure 3 depicts the main processing stream of the AR-engine. The fundamental
assumption is that anaphors should be resolved in a left-to-right order in
left-to-right reading languages, and vice versa in right-to-left reading languages.
This way, the linear processing done by humans while reading, from the
beginning of the text to its end, is mimicked. At any moment during processing,
just one RE is under investigation, which we will call the current RE. As the
current RE is momentarily the last one on the input stream, all resulting activity
is performed against DEs already existent and, therefore, all found relations
will point towards the beginning of the text. One processing cycle of the engine
deals with the resolution of one RE and develops along three compulsory
phases and an optional one.

8 The vein expression of an elementary discourse unit (edu) u, following Veins Theory, is a sequence of edus,
preceding, including and following u, which amounts to the minimal coherent sub-discourse focused on u. The
gray lines in Figure 5 exemplify a situation in which REb, the RE linearly closest to REc, is short-
circuited by the vein expression of the edu REc belongs to, which means that REa is more salient in the
reader’s memory than REb when REc is read.
The first (mandatory) phase is the projection phase, when a PS (called the
current PS) is built on the projection layer, using the information centred on the
current RE, obtained from the text layer with the contribution of the available
knowledge sources.
The second (mandatory) phase, proposing/evoking, is responsible for
matching the current PS against one DE, either by proposing a new discourse
entity or by deciding on the best candidate among the existing ones. This process
involves first running the certifying and demolishing rules (if available),
followed by the promoting/demoting rules. In the end, either an existing DE is
firmly identified by a certifying rule, or matching scores between the current PS
and a class of referent DEs are computed. Based on these scores, three
possibilities can be distinguished (a sketch of this decision is given after the list):
1. all candidate DEs score under thresholdmin, a parameter of the engine in the
range 0 to 1: the interpretation is that none of the preceding DEs is sufficiently
convincing as a referent for the current RE, and therefore a new DE is built.
Each time a DE is created, a relation (type-of, is-part-of, etc.) is searched
for between the new DE and previous DEs within a window of a certain length.
Responsible for this activity are the attribute-filling rules;
2. the best-rated scores are above thresholdmin, but more than one candidate
falls within the thresholddiff range (a parameter usually less than 0.1): this
situation should be interpreted as a lack of sufficient evidence to firmly consider
one referent (the best-scored one) as the selected candidate. Consequently,
the decision to choose a referent is postponed in order to allow subsequent
resolutions to bring supplementary clues to the resolution of the current RE, and
the corresponding postponed PS is left on the projection layer;
3. the best score rates above thresholdmin and no other score lies within the
thresholddiff range below it: the interpretation is that the corresponding candidate
clearly individualises itself among the rest of the DE candidates. It will be
confirmed as the referent, and any preceding REs of the current RE
which correspond to the identified DE should be considered antecedents of the
current RE.
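
A sketch of this three-way decision follows (our encoding; threshold_min and threshold_diff stand for thresholdmin and thresholddiff, and the default values are arbitrary choices within the ranges given above):

    def decide(scored_candidates, threshold_min=0.5, threshold_diff=0.1):
        """scored_candidates: list of (de, score) pairs computed by the
        promoting/demoting rules for the current PS.
        Returns one of: ('NEW', None), ('POSTPONE', ties), ('EVOKE', best_de)."""
        above = [(de, s) for de, s in scored_candidates if s >= threshold_min]
        if not above:
            # case 1: no candidate is convincing enough -- a new DE will be proposed
            return "NEW", None
        above.sort(key=lambda pair: pair[1], reverse=True)
        best_de, best_score = above[0]
        ties = [de for de, s in above if best_score - s <= threshold_diff]
        if len(ties) > 1:
            # case 2: several candidates too close together -- postpone the decision
            return "POSTPONE", ties
        # case 3: a clear winner -- the DE is evoked
        return "EVOKE", best_de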
In the third compulsory phase, the completion phase, the data contained in
the resolved PS is combined with the data configuring the found referent, if
such a DE has been identified, or, simply, the PS content is copied onto the
newly built DE if none of the already existing DEs has been recognised. The
resolved PS is afterwards deleted from the projection layer, since any
information that it used to capture can now be recovered from the DE. So, to
give an extreme example, if for some reason a model chooses to look for
previous syntactic patterns of chained REs, they can be found on the semantic
level. Although apparently contradictory to the “semantic” significance of the
layer, this behaviour can mimic the short-term memory that records information
of value for immediate anaphoric resolution.
Finally, the optional re-evaluation phase is triggered if postponed PSs
remained on the projection layer from a former step. The intent is to apply the
matching rules again to all of them. Humans usually resolve anaphors at the
time of reading, but sometimes decisions must be postponed until the
acquisition of complementary information adds enough data to allow
disambiguation. Cases of postponed resolution will be discussed in
Section 7.2. At the end of processing, each RE should record a link towards its
corresponding DE and each DE should record a list of links towards its surface
REs.
As we shall see in Sections 3 to 6, when referential relations other than
strict coreference are to be revealed, DE attributes which are not directly
triggered from the corresponding PSs prove necessary. As mentioned in the
description of the proposing/evoking phase, a section dedicated to actions to be
performed for the filling-in of specific attributes following a proposing action is
opened in the third component of the framework – the one dedicated to rules
and heuristics.
In the following examples, we will mark REs by italic letters (as a car) and
their corresponding DEs by a paraphrasing text in bold fonts and within square
brackets (as [the car]). The following sections will analyse, within the AR-
engine framework, a set of AR cases, usually considered difficult to interpret.
The discussion intends to evidence specific difficulties inherent to a large
range of anaphoric phenomena, to imagine solutions in terms of an AR model
by indicating knowledge sources and rules/heuristics capable of dealing with the
identified tasks, and to appreciate informally the tractability of these solutions.
The discussion stops short of the universal panacea for all failures in AR:
world knowledge (WK).
3 Relations triggered by positional and/or syntactic constraints


3.1 Nested referential expressions
(1) the University building
(2) Amenomphis the IVth's wife
(3) the face of the beautiful queen
In constructions of these types, two included (nested) REs are involved.
They refer to two distinct DEs, which are linked by a certain relation. In (1), the
two DEs are [the University building] and [University], where [the
University building] belongs-to [University]. In (2), between
[Amenomphis the IVth's wife] and [Amenomphis the IVth] a variant of the
belongs-to relation holds, perhaps a commitment. In (3), between [the face
of the beautiful queen] and [the beautiful queen] a still different type of
belongs-to relation holds, perhaps an is-part-of relation. In all cases, the
possessed object (or the part) corresponds to the outer RE, while the possessing
entity (or the whole) corresponds to the inner RE on the surface string. The
incremental type of processing, including surface string parsing, and the
nested pattern of the REs allow the processing of the possessing entity
(corresponding to the inner RE) to be performed before that of the possessed entity
(corresponding to the outer RE). If RE1 is nested in RE2 on the text layer, a
knowledge source should fetch the value RE1 into a nesting slot of the PS
corresponding to RE2. On DE2 of the semantic layer, this slot will later be
transformed, by an attribute-filling rule, into a belongs-to (or some variation
of it) attribute indicating the DE corresponding to RE1 (a sketch is given below).
Other constructions where a belongs-to relation or variations of it are correctly
identified are:9 (the center of (the hall opposite the big telescreen)), (emblem of
(the Junior Anti-Sex League)), (one of (the middle rows)), (one of (them)), (one
of (the novel-writing machines)). In some cases the rule should be applied
recursively: (the waist of ((her) overalls)), (the shapeliness of ((her) hips)).
However, in expressions like (the hall opposite (the big telescreen)),
(preparation for (the Two Minutes Hate)), (some mechanical job on (one of the
novel-writing machines)), (a bold-looking girl, of (about twenty-seven)), (the
girl with (dark hair)), the relation between the two constituents is different from
belongs-to or its variations. Our refinement of the types of relations to be
considered did not go that far. Moreover, a demolishing rule should always
prevent a coreference relation between the DEs corresponding to the two REs.

9 From G. Orwell’s “1984”.
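
A sketch of this two-step treatment of nested REs (hypothetical slot names, over plain feature dictionaries): a knowledge source records the nested RE on the outer PS, a demolishing rule blocks coreference between the two, and an attribute-filling rule later converts the slot into a belongs-to link.

    def nesting_knowledge_source(outer_ps, inner_re_id):
        """Knowledge source: record on the outer PS the identifier of the nested RE."""
        outer_ps["nesting"] = inner_re_id

    def demolish_nested_coreference(ps, de_id, re_to_de):
        """Demolishing rule: an outer RE never corefers with the DE of its nested RE."""
        nested = ps.get("nesting")
        return nested is not None and re_to_de.get(nested) == de_id

    def fill_belongs_to(new_de, re_to_de):
        """Attribute-filling rule, run when the outer DE is proposed: turn the
        nesting slot into a belongs-to attribute pointing at the nested RE's DE."""
        nested = new_de.pop("nesting", None)
        if nested in re_to_de:
            new_de["belongs-to"] = re_to_de[nested]

    # Example (1): 'the University building' nests 'the University'.
    ps = {"lemma": "building"}
    nesting_knowledge_source(ps, "RE_University")
    new_de = dict(ps)                                  # the PS content copied to a new DE
    fill_belongs_to(new_de, {"RE_University": "DE_University"})
    print(new_de)   # {'lemma': 'building', 'belongs-to': 'DE_University'}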
3.2 Apposition
(4) Mrs. Parsons, the wife of a neighbour on the same floor
(5) Nefertiti, Amenomphis the IVth's wife
(6) Jane, beautiful girl, come to me!
(7) a sort of gaping solemnity, a sort of edified boredom
An apposition usually brings supplementary knowledge about a discourse
entity. In accordance with other approaches (Mitkov, 2002), but in disagreement
with the annotation convention of MUC-7, which sees the apposition as one RE
and the pair of the two elements as another RE, we consider the two elements
of the apposition as different REs. In the model that we have built, the type of
relation linking the two REs obeys the following heuristic: definite determined
NPs, genitival appositions and undetermined NPs, as in (4), (5) and (6), yield
coreferences, whereas indefinite noun appositions, as in (7), yield type-of
relations from the DE corresponding to the second RE towards the DE
corresponding to the first RE. Let RE2 be an apposition of RE1 on the text
level. We will suppose a knowledge source capable of applying syntactic criteria in
order to fetch an apposition-of=RE1 slot attached to PS2. As PS1 should
have matched a DE1 by the moment PS2 is being processed, a certifying rule must
unify PS2 with DE1 in case RE2 is a definite determined NP, an undetermined NP
or a genitival construction. As a result, DE1 will accumulate all the attributes of
PS2. Examples of cases correctly interpreted following this strategy are:10
(Emmanuel Goldstein), (the Enemy of the People); (the primal traitor), (the
earliest defiler of the Party's purity). If the apposition is an indefinite
determined NP, a demolishing rule will rule out as a possible antecedent the
argument of the apposition-of attribute in the current PS. As a
consequence, the usual proposing/evoking mechanism will work, finalised by
finding a target DE. Then, only if the found DE is new, a rule in the attribute-
filling section of the set of rules/heuristics will exploit the apposition-
of=RE1 slot attached to PS2 in order to transform it into a type-of=DE1
value. This strategy will correctly interpret an apposition like (a narrow scarlet
sash), (emblem of the Junior Anti-Sex League). Unfortunately, the knowledge
source responsible for detecting appositions can easily run into errors. This is the
case when the apposition is iterated over more than just two adjacent constituents,
as in (the most bigoted adherents of the Party), (the swallowers of slogans), (the
amateur spies) and (nosers-out of unorthodoxy); (a man named O'Brien),
(a member of the Inner Party) and (holder of some post so important and
remote). Clear criteria to disambiguate appositions from enumerations or from
indications of location (as in (the same row as Winston), (a couple of places
away)) – the only two types of exceptions found so far matching the patterns of
our apposition-finding knowledge source – are difficult to devise.

10 From G. Orwell’s “1984”.
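
The heuristic could be encoded roughly as below (our sketch; the pos_head values, including genitival-NP, extend the hypothetical attribute inventory sketched in Section 2.2):

    def resolve_apposition(ps2, de1):
        """ps2: feature dict projected from the second element of the apposition;
        de1: the DE already matched by the first element."""
        kind = ps2.get("pos_head")
        if kind in {"definite-NP", "bare-noun", "genitival-NP"}:
            # certifying rule: unify PS2 with DE1; DE1 accumulates PS2's attributes
            de1.update(ps2)
            return "coreference"
        if kind == "indefinite-NP":
            # a demolishing rule excludes DE1 as antecedent; if the usual
            # proposing/evoking then creates a new DE, an attribute-filling rule
            # will record type-of=DE1 on it
            return "type-of"
        return "unknown"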
3.3 The subject – predicative noun relation
(8) Maria is the best student of the whole class.
(9) John is a high school teacher.
(10) Your rival is a photo.
(11) The young lady became a wife.
Supposing RE1 is the subject and RE2 the predicative noun, a knowledge
source of a syntactic nature should be able to fetch a predicative-noun-
of=RE1 attribute into the PS2 corresponding to the predicative noun RE2.
Definite determined predicative nouns, such as the best student of the whole class in
(8), are, in our model, considered coreferential with the subject. The resolution
should aim at injecting into the DE [Maria] the information brought by the
predicative noun RE2 and temporarily stored on PS2. Suppose the DE [Maria]
is something of the kind: [name="Maria", sem=person1, Ngen=fem,
num=sg], where person1 is the first sense of the word person according to
WordNet. Then, the fact that she is now seen also as a student must not affect
any of the attributes name, Ngen (natural gender) or num (grammatical number),
but instead add to the description an attribute lemma=student (if only the
head of the RE is considered in the representation, or a more sophisticated
description if the constituents are also kept: the best of the whole class) and
replace the person1 value of the sem attribute with a more specific one:
student1.11 When the predicative noun is an indefinite NP, as in (9), our
model interprets it as the semantic type of the subject. The more general
concept is replaced with a more specific one both when a concept is predicated
as a more specific one (the animal is an elephant) and when the reverse
predication holds (the elephant is a heavy animal with a trunk). Other
examples of the same kind are:12 (one of them) was (a girl); (she) was (a bold-
looking girl, of about twenty-seven); (who) were (the most bigoted adherents of
the Party); (the other person) was (a man named O'Brien); (O'Brien) was (a
large, burly man); (she) might be (an agent of the Thought Police).13
Conceptual hierarchies like WordNet can help to identify, in examples like
(10), that a photo (an object) cannot be a type for [the rival] (a hyponym of
person, according to WordNet). On the contrary, finding out that a photo is a
substitute for the person shown in the photo necessitates deep WK. To offer a
substitute of a solution in cases like that, a generic relation like metaphoric-
type-of can be adopted.

11 The implicit assumption here was that WSD capabilities were used as a knowledge source.
12 From G. Orwell’s “1984”.
13 The present model does not implement specific criteria to deal with modalities.
The solution we adopted for representing discourse entities subject to changes
in time, different from the one proposed in MUC-7 (Hirschman & Chinchor,
1997), is described in (Cristea & Dima, 2001): we have linked entities such as the
ones in example (11) with the same-as relation, triggered by the occurrence of
the interposed predicate become.
In all cases (8) to (11), a complication arises when the resolution of RE1 (the
subject) was postponed until the moment RE2 (the predicative noun) is
processed.14 If this happens, either the unification makes PS2 coreferential with
the postponed PS1, or the semantic relation is established between the currently
proposed DE and the postponed PS1. Later on, when the postponed PS is
lowered to the semantic level, these relations are maintained.
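
For example (8), the completion step could look as follows (a sketch over plain feature dictionaries; the attribute names follow the description above):

    # The DE [Maria] before the predicative noun is processed.
    maria = {"name": "Maria", "sem": "person1", "Ngen": "fem", "num": "sg"}

    def complete_with_predicative_noun(de, pred_ps):
        """Inject the predicative noun's content into the subject's DE: name, Ngen
        and num stay untouched; the lemma is added and sem is specialised."""
        de["lemma"] = pred_ps["lemma"]   # e.g. "student"
        de["sem"] = pred_ps["sem"]       # student1 replaces the more general person1

    complete_with_predicative_noun(maria, {"lemma": "student", "sem": "student1"})
    print(maria)
    # {'name': 'Maria', 'sem': 'student1', 'Ngen': 'fem', 'num': 'sg', 'lemma': 'student'}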
4 Lemma disagreement of common nouns
4.1 Common NPs displaying identical grammatical number but different
lemmas
(12) Amenomphis the IVth's wife … the beautiful queen
The discovery of the coreference relation in this case should mainly be
similarity-based. In principle, a queen should be found more similar to a wife
than to a pharaoh, supposing Amenomphis is known to be one. If, instead,
this elaborate knowledge is not available, and all that is known about
Amenomphis, as contributed by a named-entity recogniser knowledge source, is
his quality of being a man, then, the moment the beautiful queen is processed, a
queen should again be found more similar to a wife than to a man. Many
approaches to measuring similarity in NLP are already known, and some use
wordnets (e.g. (Resnik, 1999)). When a sense disambiguation procedure is
lacking, a wordnet-driven similarity that counts the common hypernyms of
all senses of the two lemmas could be a useful substitute in some cases.15 Still,
criteria to decide similarity are not elementary, and a simple intersection of the
wordnet hypernymic paths of the anaphor lemma and the candidate antecedent
lemma often does not work. The following is an example of a chain of
erroneous coreferences found on the basis of this simplistic criterion: the centre
of the hall opposite the big telescreen | his place | some post so important and
remote | the back of one's neck | a chair | places away | the end of the room | the
protection of his foreign paymasters.16

14 The same is true for apposition.
15 There is good reason to believe that such an approach is successful when lexical ontologies as finely grained in
word senses as WordNet are used. This criterion is based on the assumption that senses displaying common
ancestors must be more similar than the ones whose hierarchical paths do not intersect.
16 From G. Orwell’s “1984”.
Sometimes, a useful criterion for the identification of coreferential common
noun REs with different lemmas could be the natural gender (queen and wife
are both feminine in natural gender). In other cases, the antecedent could be
recovered by looking at the modifiers of the head nouns. Consider example
(13):
(13) the most beautiful women… those beauties
A promoting rule should be able to confront the lemma beauty with the
modifiers of the head women in the DE for [the most beautiful women].
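
One possible realisation of such a WSD-free similarity, using NLTK’s WordNet interface (our assumption; the chapter does not prescribe a particular toolkit):

    from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

    def common_hypernym_count(lemma1, lemma2):
        """Count the hypernyms shared by any noun sense of lemma1 and any noun
        sense of lemma2 -- a crude similarity needing no sense disambiguation."""
        def hypernyms(lemma):
            return {h for s in wn.synsets(lemma, pos=wn.NOUN)
                      for path in s.hypernym_paths() for h in path}
        return len(hypernyms(lemma1) & hypernyms(lemma2))

    # The expectation from example (12) is that 'queen' comes out closer to 'wife'
    # than to 'pharaoh' or 'man'; the raw counts depend on the WordNet version.
    for other in ("wife", "pharaoh", "man"):
        print(other, common_hypernym_count("queen", other))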
4.2 Common NPs with different grammatical number and different lemmas
(14) a patrol … the soldiers
(15) the government… the ministers
According to WordNet, in two out of three senses a patrol is a group and, in
one sense out of four, a government is also a group. This suggests filling in a
sem=group feature if the group, grouping -- (any number of
entities (members) considered as a unit) synset is found on a
hypernymic path of the lemma of a candidate antecedent of the plural NP (see
examples (14) and (15)). However, this criterion could prove weak,
because many words have senses that correspond to groups (a garden, for
instance, has a sense that means a group of flowers, and in a text like A patrol
stopped by the garden. The soldiers… there is a high chance of finding the soldiers
coreferring with [the garden] rather than with [the patrol]). Different criteria
should be combined to maximise the degree of confidence, among them a
similarity criterion, for instance one based on wordnet glosses (as in forest – the
trees and other plants in a large densely wooded area) or
on meronymy (as in flock – a group of sheep or goats – HAS
MEMBER: sheep – woolly usu. horned ruminant mammal related
to the goat), or even the simple identification of antecedents within a fixed
collection of collective nouns, as suggested in (Barbu et al., 2002). In principle,
this case is similar to the preceding one if an attribute of being a group is
included in the representation of the DE referent.
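
The group test could be sketched with the same NLTK WordNet interface (again an assumption of ours): a lemma is a candidate group antecedent if the group, grouping synset occurs on a hypernymic path of one of its senses.

    from nltk.corpus import wordnet as wn

    GROUP = wn.synset("group.n.01")   # "any number of entities (members) considered as a unit"

    def may_denote_group(lemma):
        """True if some noun sense of `lemma` has group.n.01 on one of its
        hypernymic paths, making it a candidate antecedent of a plural anaphor."""
        return any(GROUP in path
                   for s in wn.synsets(lemma, pos=wn.NOUN)
                   for path in s.hypernym_paths())

    for word in ("patrol", "government", "garden"):
        print(word, may_denote_group(word))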
4.3 Common nouns referring to proper nouns
(16) Bucharest… the capital

There are no other means to solve this reference than enforcing the labelling
of Bucharest, in its corresponding DE, the very moment it is processed,
with, for instance, a city1 value of the sem attribute. If this labelling
information is available, fetched by a named-entity recogniser, then the
framework processes the reference the same way it does for common nouns
with different lemmas.
5 Number disagreement

5.1 Plural pronouns identifying split antecedents


(17) John waited for Maria. They went for a pizza.
Despite the opinion of other scholars on the matter (see, for instance,
(Eschenbach et al., 1998)), we do not think that, during the interpretation of (17)
above, a discourse entity for the group [John, Maria] must be
proposed as soon as the referential expression Maria is parsed. Otherwise, we
would have to face a very uncomfortable indecision regarding which groups to
consider and when. The mentioned group is seen as a DE only because, at a certain
moment, as the text unfolds, an anaphor coreferring to it appears: they. In (18)
below, there is no need for such a group representation, as the reader is perhaps
not conscious of its existence:
(18) John waited for Maria. He invited her for a pizza.
Neither vicinity in the location space of the story, nor textual vicinity, nor
framing in a wording pattern is a sufficiently constraining criterion for proposing
groups on the semantic layer; see examples (19) and (20):
(19) John was in New York when Maria wrote him that she finally made up her mind. They
got married the next month.
(20) John finished his classes. He went to a football match. As it was a rainy day, no more
than 10 people were in the stadium. Maria happened to be there too. They went for a
pizza and one month later got married.
To make life even harder, note that in (20) 12 people are candidates for
different groups of persons ([John, 10pers], [10pers, Maria], [John, 10pers,
Maria], [John, Maria] or only [10pers]). Nevertheless, the reader has no
difficulty in identifying they with the group [John, Maria]. But why not attach
to the group also [John's classes], [the football match], [the rainy day] or
[the stadium]? The obvious WK-based answer is: because none of the others
can go for a pizza! And also because getting married is an occupation for
exactly two people! But this is deep WK and, as agreed, we would not want to
rely on it.
From the discussion above, we know that group formation is triggered by a
first reference to it. A group, unless it is verbalised as such in the text, does
not exist until it is referred to. Still, two questions remain: how much can be done
in the absence of WK for the identification of the group content, and what are the
criteria that trigger the creation of group DEs – in other words, by what means is a plural
pronoun considered to refer to a group. The answer to the first question
lies again in the use of similarity measures (common association basis in
(Eschenbach et al., 1998)) to identify members of groups in the text preceding
the plural pronoun. As for the second question, the framework policy is to
propose new DEs when no match between the current PS and the preceding
DEs rises above thresholdmin. This policy is good enough for our purpose as
long as no plural DEs, which the plural anaphor could match, are in the
recent proximity. If an ambiguity arises, then the second framework policy, to
postpone resolution until sufficient discriminating criteria leave a unique
candidate within the thresholddiff range, is well suited again. The combination
of these two policies in example (21) below, for instance, would maintain the
indecision as to whether they corefers with [John, Maria] or with [the classes] as
long as no WK is available to state that only people can go for a pizza, and this
is a correct behaviour.
(21) John waited for Maria when the classes were over. They went for a pizza.

5.2 Plural nouns identifying split antecedents


In addition to the problems identified above, when the anaphor is a noun,
the similarity criteria found to characterise the group should extend to the
anaphor as well. Consider the following example:
(22) Athos, Porthos and Aramis … the musketeers
The similarity criterion sketched above yields person, individual,
someone, somebody, mortal, human, soul – (a human being) as
the WordNet concept characteristic of the discovered group, while the word
musketeer also denotes a person. As such, there is enough evidence to conclude
that a DE [the musketeers] should be proposed that points to each of the DEs
[Athos], [Porthos] and [Aramis] as members. As already discussed in Section
2.3, the decoration of existing DEs with attributes different from those inherited
from the PSs they evolve from – in our case the completion of the DE [the
musketeers] with an attribute has-as-element=<x,y,z>, with x, y, z being
identifiers of the DEs [Athos], [Porthos] and [Aramis] – is an action
characteristic of the attribute-filling rules.
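
In the representation sketched earlier, the attribute-filling action for example (22) reduces to decorating the newly proposed group DE with a has-as-element attribute; all identifiers below are illustrative.

    # The group DE proposed for 'the musketeers', as a plain feature dictionary.
    musketeers = {
        "lemma": "musketeer",
        "num": "pl",
        "sem": "person1",   # the common similarity basis found for the members
        # set by an attribute-filling rule once the members are identified:
        "has-as-element": ["DE_Athos", "DE_Porthos", "DE_Aramis"],
    }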