Reference Scope Identification in Citing Sentences

Authors: Amjad Abu-Jbara, Dragomir Radev (University of Michigan)
Conference: NAACL 2012
Expositor: Akihiro Kameda (Aizawa Lab., The University of Tokyo)

Abstract
● Problem:
● Multiple citations in one sentence: a cited work may relate to only part of the sentence
● There are many POS taggers developed using
different techniques for many major languages such
as transformation-based error-driven learning (Brill,
1995), decision trees (Black et al., 1992), Markov
model (Cutting et al., 1992), maximum entropy
methods (Ratnaparkhi, 1996) etc for English.
● Approach: preprocessing, then 2 + 1 + 2×3 + 1 = 10 methods (Approaches 1, 2, 3 and AR2011)

Reference Preprocessing
(tagging, grouping, non-syntactic reference removal)
● These constraints can be lexicalized (REF.1; REF.2),
unlexicalized (REF.3; TREF.4) or automatically learned
(REF.5; REF.6).

● These constraints can be lexicalized (GREF.1), unlexicalized
(GTREF.2) or automatically learned (GREF.3).

● (GTREF.1) apply fuzzy techniques for integrating source
syntax into hierarchical phrase-based systems (REF.2).
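The grouping step in the second example can be sketched in Python. This is a minimal illustration, not the authors' code. It assumes the references are already tagged as `REF.n`/`TREF.n` inside parentheses. The function name `group_references` and the regular expression are hypothetical, and non-syntactic reference removal is not covered.

```python
import re

# Hypothetical pattern: a parenthesized, semicolon-separated list of tagged
# references such as "(REF.3; TREF.4)", or a single one such as "(REF.2)".
CITATION_RE = re.compile(r"\((T?REF\.\d+(?:;\s*T?REF\.\d+)*)\)")


def group_references(sentence):
    """Collapse each parenthesized list of references into one unit.

    Lists of two or more references become GREF.k, or GTREF.k when they
    contain the target reference; single references keep REF/TREF.
    Units are renumbered from left to right.
    """
    counter = 0

    def replace(match):
        nonlocal counter
        counter += 1
        refs = re.split(r";\s*", match.group(1))
        has_target = any(ref.startswith("TREF") for ref in refs)
        if len(refs) == 1:
            tag = "TREF" if has_target else "REF"
        else:
            tag = "GTREF" if has_target else "GREF"
        return f"({tag}.{counter})"

    return CITATION_RE.sub(replace, sentence)


print(group_references(
    "These constraints can be lexicalized (REF.1; REF.2), unlexicalized "
    "(REF.3; TREF.4) or automatically learned (REF.5; REF.6)."
))
# -> These constraints can be lexicalized (GREF.1), unlexicalized (GTREF.2) or automatically learned (GREF.3).
```

Renumbering with a shared counter matches the third example, where a single reference after a group becomes `(REF.2)`.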

Approach 1 (SVM, LR)
● Word classification
● with an SVM and a logistic regression classifier
● Features: Distance, Position (Before/After), Same segment (, . ;
and, but, for, nor, or, so, yet), POS tag, Dependency
Distance, Dependency Relations, Common Ancestor Node,
Syntactic Distance
● Problem Example:
● There are many POS taggers developed using different
techniques for many major languages such as transformation-
based error-driven learning (Brill, 1995), decision trees (Black et
al., 1992), Markov model (Cutting et al., 1992), maximum entropy
methods (Ratnaparkhi, 1996) etc for English.
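Three of the surface features listed above (distance, position, same segment) can be computed directly from the token list. The minimal sketch below does that under some assumptions: distance is counted in tokens, the target reference is a single token, and the delimiter set is the one from the slide. `surface_features` is a hypothetical name. POS and dependency features need a tagger and a parser, so they are omitted.

```python
# Segment delimiters from the slide: , . ; and the coordinating conjunctions.
SEGMENT_DELIMITERS = {",", ".", ";", "and", "but", "for", "nor", "or", "so", "yet"}


def surface_features(tokens, target_idx):
    """Return one feature dict per token, relative to the target reference."""
    features = []
    for i, token in enumerate(tokens):
        lo, hi = sorted((i, target_idx))
        between = tokens[lo + 1:hi]
        if i < target_idx:
            position = "before"
        elif i > target_idx:
            position = "after"
        else:
            position = "target"  # the reference token itself (an assumption)
        features.append({
            "word": token,
            "distance": hi - lo,  # token offset from the target
            "position": position,
            # Same segment: no delimiter occurs between the word and the target.
            "same_segment": not any(t.lower() in SEGMENT_DELIMITERS for t in between),
        })
    return features


tokens = ["decision", "trees", "TREF", ",", "Markov", "model", "REF"]
for feats in surface_features(tokens, target_idx=2):
    print(feats)
```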

Approach 2 (CRF)
● Sequence Labeling with CRF
● Features are the same as in Approach 1
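For sequence labeling, each sentence becomes one sequence of per-word feature sets, each with an inside/outside (I/O) label. The sketch below writes that sequence in a whitespace format of the kind MALLET's SimpleTagger reads: one token per line, features first, label last, and a blank line between sentences. The helper name `to_crf_lines` and the `name=value` feature encoding are assumptions, not the authors' setup.

```python
def to_crf_lines(token_features, labels):
    """Format one sentence for a CRF: one line per token, label in the last column."""
    if len(token_features) != len(labels):
        raise ValueError("expected exactly one label per token")
    lines = []
    for feats, label in zip(token_features, labels):
        # Each feature becomes a binary "name=value" string. Spaces would
        # break the column format, so values are assumed to contain none.
        names = [f"{name}={value}" for name, value in sorted(feats.items())]
        lines.append(" ".join(names + [label]))
    # A blank line marks the end of the sequence.
    return "\n".join(lines) + "\n\n"


sentence = [
    {"word": "decision", "position": "before", "same_segment": True},
    {"word": "Markov", "position": "after", "same_segment": False},
]
print(to_crf_lines(sentence, ["I", "O"]), end="")
```

Concatenating the returned blocks for all sentences gives one training file with blank-line-separated sequences.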

Approach 3-S1-* (CRF/segment)
● segmentation (1)
● punctuation marks
● coordination conjunctions
– and, but, for, nor, or, so, yet
● a set of special expressions
– "for example", "for instance", "including", "includes",
"such as", "like", etc.
● [Rerankers have been successfully applied to numerous
NLP tasks such as] [parse selection (GTREF)], [parse
reranking (GREF)], [question-answering (REF)].
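The S1 splitting rules above can be sketched as a single left-to-right pass. This is a simplified illustration under stated assumptions. The punctuation set `, . ; :` is a guess at "punctuation marks". Conjunctions and special expressions are kept at the end of the segment they close, as "such as" is in the example, while punctuation is dropped. `segment_s1` is a hypothetical name.

```python
PUNCTUATION = {",", ".", ";", ":"}  # assumed set of punctuation marks
CONJUNCTIONS = {"and", "but", "for", "nor", "or", "so", "yet"}
SPECIAL_EXPRESSIONS = [("for", "example"), ("for", "instance"), ("including",),
                       ("includes",), ("such", "as"), ("like",)]


def segment_s1(tokens):
    """Split a tokenized sentence into segments (S1 rules, simplified)."""
    segments, current, i = [], [], 0
    while i < len(tokens):
        if tokens[i] in PUNCTUATION:
            # Punctuation closes the current segment and is dropped.
            if current:
                segments.append(current)
            current = []
            i += 1
            continue
        window = tuple(t.lower() for t in tokens[i:i + 2])
        # Multi-word expressions are checked first, so "for example" wins over "for".
        marker = next((e for e in SPECIAL_EXPRESSIONS if window[:len(e)] == e), None)
        if marker is None and window[0] in CONJUNCTIONS:
            marker = (window[0],)
        if marker is not None:
            # The marker ends the current segment and stays attached to it.
            current.extend(tokens[i:i + len(marker)])
            segments.append(current)
            current = []
            i += len(marker)
            continue
        current.append(tokens[i])
        i += 1
    if current:
        segments.append(current)
    return segments


sentence = ("Rerankers have been successfully applied to numerous NLP tasks such as "
            "parse selection ( GTREF ) , parse reranking ( GREF ) , "
            "question-answering ( REF ) .")
for segment in segment_s1(sentence.split()):
    print(segment)
```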

Approach 3-S2-* (CRF/segment)
● segmentation (2)
● chunking tool
– noun groups
– verb groups
– preposition groups
– adjective groups
– adverb groups
– all other words form segments by themselves
● [To] [score] [the output] [of] [the coreference models], [we]
[employ] [the commonly-used MUC scoring program (REF)]
[and] [the recently-developed CEAF scoring program (TREF)].
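Given the chunker's output, S2 segments follow directly from the chunk boundaries. The sketch below assumes CoNLL-2000-style BIO chunk tags (`B-NP`, `I-NP`, `O`, and so on). LT-TTT's own output format differs, so the input format and the name `segment_s2` are assumptions. Words outside noun, verb, preposition, adjective and adverb groups each become a one-word segment.

```python
# Chunk types treated as groups; tag names follow the CoNLL-2000 style (assumed).
GROUP_TYPES = {"NP", "VP", "PP", "ADJP", "ADVP"}


def segment_s2(tagged):
    """Turn (token, BIO chunk tag) pairs into segments (S2 rules, simplified)."""
    segments = []  # list of (chunk_type or None, tokens)
    for token, tag in tagged:
        prefix, _, chunk = tag.partition("-")
        continues_group = (
            prefix == "I" and chunk in GROUP_TYPES
            and bool(segments) and segments[-1][0] == chunk
        )
        if continues_group:
            segments[-1][1].append(token)
        elif chunk in GROUP_TYPES:
            segments.append((chunk, [token]))
        else:
            # Any other word forms a segment by itself.
            segments.append((None, [token]))
    return [tokens for _, tokens in segments]


tagged = [("we", "B-NP"), ("employ", "B-VP"), ("the", "B-NP"), ("MUC", "I-NP"),
          ("program", "I-NP"), ("(REF)", "I-NP"), ("and", "O")]
print(segment_s2(tagged))
```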

Approach 3-*-R1,2,3 (CRF/segment)
● R1: a segment takes the majority label of the words it contains
● R2: inside if any word is inside
● R3: outside if any word is outside
● [I O O O O] [I I I] [O O]
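The three rules above collapse the word-level I/O labels inside each segment into one segment label. The sketch applies them to the example. How a tie under R1 is broken is not stated on the slide, so the tie-break toward "I" is an assumption, and the function names are hypothetical.

```python
def segment_label(labels, rule):
    """Collapse the word-level I/O labels of one segment into a single label."""
    inside, outside = labels.count("I"), labels.count("O")
    if rule == "R1":
        # Majority label; a tie goes to "I" (an assumption, not from the slide).
        return "I" if inside >= outside else "O"
    if rule == "R2":
        return "I" if inside > 0 else "O"
    if rule == "R3":
        return "O" if outside > 0 else "I"
    raise ValueError(f"unknown rule: {rule}")


def relabel(segments, rule):
    """Give every word in a segment that segment's label."""
    return [[segment_label(seg, rule)] * len(seg) for seg in segments]


example = [["I", "O", "O", "O", "O"], ["I", "I", "I"], ["O", "O"]]
for rule in ("R1", "R2", "R3"):
    print(rule, [segment_label(seg, rule) for seg in example])
# R1 ['O', 'I', 'O']
# R2 ['I', 'I', 'O']
# R3 ['O', 'I', 'O']
```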

AR2011

● the link grammar parser (Sleator and Temperley, 1991)

Data
● ACL Anthology Network Corpus
● 3,300 sentences, each containing ≥ 2 citations

Annotation agreement
● Measured on 500 of the 3,300 sentences
● Preprocessing is perfect
● Kappa coefficient of scope (the second equality uses chance agreement P(E) = 0.5):
$$K = \frac{P(A) - P(E)}{1 - P(E)} = 2P(A) - 1 = 0.61$$
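To make the arithmetic concrete, the sketch below computes P(A) as word-level agreement between two annotators and plugs it into the formula with P(E) = 0.5. The word-level unit is an assumption, since the slide does not state it. Solving 2P(A) − 1 = 0.61 gives the implied observed agreement P(A) = 0.805.

```python
def observed_agreement(labels_a, labels_b):
    """P(A): share of words given the same label by both annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return same / len(labels_a)


def kappa(p_a, p_e=0.5):
    """K = (P(A) - P(E)) / (1 - P(E)); with P(E) = 0.5 this equals 2P(A) - 1."""
    return (p_a - p_e) / (1 - p_e)


p_a = observed_agreement(list("IIOO"), list("IOOO"))  # 3 of 4 words agree
print(p_a, kappa(p_a))  # 0.75 0.5
# K = 0.61 corresponds to P(A) = (0.61 + 1) / 2 = 0.805.
print(round(kappa(0.805), 2))  # 0.61
```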

Tools
● Edinburgh Language Technology Text
Tokenization Toolkit (LT-TTT)
● text tokenization, part-of-speech tagging, chunking,
and noun phrase head identification.
● Stanford parser
● syntactic and dependency parsing
● LIBSVM with a linear kernel
● Weka
● logistic regression classification
● Machine Learning for Language Toolkit
(MALLET)
● CRF

Validation
● 10-fold cross validation
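A minimal sketch of 10-fold cross validation over the annotated sentences follows. It assumes folds are formed at the sentence level from a seeded shuffle. The slides do not say how the folds were drawn, and `k_fold_splits` is a hypothetical helper.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducible folds
    for fold in range(k):
        test = indices[fold::k]  # every k-th shuffled index
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


# With the 3300 annotated sentences, each fold tests on 330 and trains on 2970.
folds = list(k_fold_splits(3300))
print(len(folds), len(folds[0][1]), len(folds[0][0]))  # 10 330 2970
```

Every sentence lands in exactly one test fold, so averaging the ten fold scores uses each sentence once for evaluation.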

Experiment (Preprocessing)
● Tagging: 98.3% precision and 93.1% recall
– These constraints can be lexicalized (REF.1; REF.2), unlexicalized (REF.3; TREF.4) or automatically learned (REF.5; REF.6).
● Grouping: perfect
– These constraints can be lexicalized (GREF.1), unlexicalized (GTREF.2) or automatically learned (GREF.3).
● Non-syntactic reference removal: 90.08% precision and 90.1% recall
– (GTREF.1) apply fuzzy techniques for integrating source syntax into hierarchical phrase-based systems (REF.2).

Experiment (Main)
● Best setting:
– CRF
– Chunking (S2)
– Majority (R1)

Feature Analysis
● Feature: Distance, Position(Before/After), Same
segment(,.; and, but, for, nor, or, so, yet), POS
tag, Dependency Distance, Dependency
Relations, Common Ancestor Node, Syntactic
Distance

Summary
● Identified the reference scope in sentences that contain multiple citations
● Best setting:
– CRF
– Chunking (S2)
– Majority (R1)
