DiscourseMarkersLREC 2008 Samy
discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/220746906
All content following this page was uploaded by Doaa Samy on 25 April 2016.
Abstract
Discourse structure and coherence relations are among the main inferential challenges addressed by computational pragmatics. The
present study focuses on discourse markers as key elements in guiding the inferences drawn from statements in natural language. Through a
rule-based approach for the automatic identification, classification and annotation of discourse markers in a multilingual parallel
corpus (Arabic-Spanish-English), this research provides a valuable resource for the community. Two main aspects define the novelty of
the present study. First, it offers a multilingual computational processing of discourse markers, grounded in a theoretical framework and
implemented in an XML tagging scheme. The XML scheme represents a set of pragmatic and grammatical attributes, considered basic
features of the different kinds of discourse markers. In addition, the scheme provides a typology of discourse markers based on their
discursive functions, including hypothesis, co-argumentation, cause, consequence, concession, generalization, topicalization,
reformulation, enumeration, synthesis, etc. Second, the Arabic language is addressed from a computational pragmatic perspective, where the
identification, classification and annotation processes are carried out using the information provided by the tagging of Spanish
discourse markers and the alignments.
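As a rough illustration of the kind of pipeline the abstract describes, the following minimal sketch tags discourse markers found in a small lexicon with an XML element carrying a function label, and then carries those labels over to aligned segments. It is not the authors' implementation: the element name `dm`, the attribute name `function`, the three-entry lexicon, the function names `tag_markers` and `project_functions`, and the segment identifiers are all assumptions made for the example; only the function labels (cause, consequence, reformulation) come from the typology listed above.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mini-lexicon (an assumption for illustration, not the paper's
# resource): Spanish marker -> discursive function label from the typology.
SPANISH_DM_LEXICON = {
    "por lo tanto": "consequence",
    "es decir": "reformulation",
    "porque": "cause",
}


def tag_markers(sentence, lexicon):
    """Wrap each lexicon match in a <dm> element carrying its function label."""
    # Try longer markers first so multiword markers win over shorter overlaps.
    alternatives = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(m) for m in alternatives) + r")\b",
        re.IGNORECASE,
    )
    root = ET.Element("s")
    last = None  # most recently added <dm>; its tail holds the text after it
    pos = 0
    for match in pattern.finditer(sentence):
        text_before = sentence[pos:match.start()]
        if last is None:
            root.text = text_before
        else:
            last.tail = text_before
        last = ET.SubElement(root, "dm", function=lexicon[match.group(1).lower()])
        last.text = match.group(1)
        pos = match.end()
    rest = sentence[pos:]
    if last is None:
        root.text = rest
    else:
        last.tail = rest
    return root


def project_functions(source_functions, alignment):
    """Carry function labels from source segments to their aligned target segments.

    source_functions: dict of source segment id -> list of function labels
    alignment: dict of source segment id -> target segment id (hypothetical ids)
    """
    projected = {}
    for src_id, functions in source_functions.items():
        tgt_id = alignment.get(src_id)
        if tgt_id is not None:  # unaligned segments are simply skipped
            projected.setdefault(tgt_id, []).extend(functions)
    return projected


if __name__ == "__main__":
    tagged = tag_markers("Llueve, por lo tanto no salimos.", SPANISH_DM_LEXICON)
    print(ET.tostring(tagged, encoding="unicode"))
    print(project_functions({"es1": ["consequence"]}, {"es1": "ar1"}))
```

The projection step here works only at the segment level and leaves unaligned segments unlabeled; locating the marker itself inside the Arabic segment would need additional rules that this sketch does not attempt.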
[Figure: chart by discourse marker type. Axis label: DM_types; value axis from 0 to 300. Category labels: Time, Coargumentation (four labels, truncated in the source), Topicalization, Generalization, Reformulation, Purpose, No, Condition, Option, Concretion, Concession, Consequence, Cause, Simultaneity, Hypothesis, How.]
Spanish and English regarding the position of occurrence
of discourse markers and the use of punctuation marks
increased rates of the heuristics search affecting the overall
precision. For the 558 Spanish source markers, the applied
[Figure: chart by discourse marker type. Category labels: Contrargumentation, Coargumentation3, Coargumentation2, Coargumentation1, Time, Coargumentation, Concretion, Purpose, Concession, Consequence, Cause, No, Option, Topicalization, Generalization, Reformulation, Condition, Hypothesis, Simultaneity.]