An Approach To Abstractive Text Summarization

This document describes an approach to abstractive text summarization based on discourse rules, syntactic constraints, and word graphs. It proposes using discourse rules and syntactic constraints to generate sentences from keywords, and using a word graph to represent word relations and combine multiple sentences. The approach aims to address issues with generating incorrect meanings from existing word graph methods by separating the process into sentence reduction and sentence combination stages.


An approach to abstractive text summarization

Article · March 2015

DOI: 10.1109/SOCPAR.2013.7054161


An approach to Abstractive Text Summarization

Huong Thanh Le and Tien Manh Le
Hanoi University of Science and Technology, Hanoi, Vietnam

Abstract: Abstractive summarization is the technique of generating a summary of a text from its main ideas, rather than by copying the most salient sentences verbatim from the text. This is an important and challenging task in natural language processing. In this paper, we propose an approach to abstractive text summarization based on discourse rules, syntactic constraints, and a word graph. Discourse rules and syntactic constraints are used in the process of generating sentences from keywords. The word graph is used in the sentence combination process to represent word relations in the text and to combine several sentences into one. Experimental results show that our approach is promising in solving the abstractive summarization task.

Keywords: abstractive text summarization, discourse relation, word graph

I. INTRODUCTION
Automatic text summarization is the technique of automatically creating an abstract or summary of a text. It has gained widespread interest due to the overwhelming amount of textual information available in electronic format. Text summarization techniques can be broadly grouped into abstractive summarization (AS) and extractive summarization (ES). Most research on text summarization concerns ES [2,11], since it is easier and faster than AS. ES extracts the most salient sentences verbatim from the text. AS, in contrast, relies on Natural Language Processing (NLP) techniques to copy and paste sentence fragments from the input document and possibly combine the selected content with extra linguistic information in order to generate the final summary. There are two main problems with ES. First, textual coherence is not guaranteed, since anaphora resolution is not addressed in this approach. Second, redundant phrases still exist in the summary. AS can solve these problems by applying NLP techniques to post-process the output of ES, such as sentence truncation, aggregation, generalization, reference adjustment, and rewording [4,6,8]. However, AS is still a major challenge for the NLP community despite some work on sub-sentential modification [4,6].
Recent approaches in AS use word graphs to represent a document [3,9]. These graphs are then used to produce document abstracts, allowing the algorithm to compress and merge information. Representing documents by word graphs is a new and promising approach for generating abstractive summaries. However, this approach still has many problems, as discussed in Section 2. In this paper, we concentrate on rhetorical structure and the word graph to generate an abstractive summary. Several strategies are proposed to solve existing problems with word graph based approaches. The input of our system is an extractive summary after anaphora resolution. That means all pronouns have been replaced by the corresponding nouns/noun phrases (NPs).
The rest of this paper is organized as follows. Section 2 analyzes existing problems with the word graph and proposes our strategies to deal with them. Our sentence reduction method is introduced in Section 3. Section 4 presents our method of merging sentences using the word graph. Experimental results are discussed in Section 5. Finally, Section 6 concludes the paper and gives some insight into future work.

II. CONSTRUCTING GRAPH
A word graph consists of nodes and edges. Existing approaches to AS [3,9] use nodes to store information about words and their POS tags, and edges to represent adjacency relations between word pairs. A new sentence is generated by connecting all words in a path of the word graph.
Approaches that use a word graph for single-document summarization still have problems, as many sentences with incorrect meaning can be generated. This is because the generation algorithms find paths among words in the graph regardless of syntactic correctness and of the original text. An example of sentences with incorrect meaning is shown below.
Example 1: Mẹ Bách mua thuốc về cho uống. Sau_khi uống, Bách có biểu_hiện đỏ môi và nổi bọng nước ở tay và chân.
Gloss: mother Bach buy medicine back for drink. after drink, Bach has symptom red lip and appear bubble water at hand and leg.

Figure 1. The word graph representation for Example 1

In Fig. 1, each small circle represents a word of the text; the symbol ⊗ means the end of a sentence. Each arrow is created by connecting two adjacent words in a sentence. The above word graph can generate the sentences “Bách mua thuốc về cho uống” (Bach buy medicine back for drink) and “Mẹ Bách có biểu_hiện đỏ môi và nổi bọng nước ở tay và chân” (mother Bach has symptom red lip and appear bubble water at hand and leg), which do not reflect the correct meaning of the text. In addition, “đỏ môi và chân” (red lip and leg) should not be generated, since it does not reflect the correct meaning of the original text. The PageRank method and the sentence position information used in [3] and [9] cannot deal with this problem.
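The over-generation problem described for Example 1 can be reproduced with a minimal sketch. It assumes a simplified adjacency-only word graph in which a node is identified by its word form alone (POS tags are ignored), and it uses the string `<end>` in place of the end-of-sentence symbol of Fig. 1. The names `build_word_graph` and `is_path` are illustrative and do not come from [3] or [9].

```python
from collections import defaultdict

END = "<end>"  # assumed stand-in for the end-of-sentence symbol in Fig. 1


def build_word_graph(sentences):
    """Link every pair of adjacent words; the last word links to END."""
    graph = defaultdict(set)
    for tokens in sentences:
        for left, right in zip(tokens, tokens[1:] + [END]):
            graph[left].add(right)
    return graph


def is_path(graph, tokens):
    """True if each consecutive pair of tokens is joined by an edge."""
    return all(b in graph.get(a, set()) for a, b in zip(tokens, tokens[1:]))


# The two sentences of Example 1, already word-segmented.
s1 = "Mẹ Bách mua thuốc về cho uống".split()
s2 = ("Sau_khi uống , Bách có biểu_hiện đỏ môi và nổi bọng nước "
      "ở tay và chân").split()
graph = build_word_graph([s1, s2])

# A sentence that never occurs in the text is still a complete path.
spurious = ("Mẹ Bách có biểu_hiện đỏ môi và nổi bọng nước "
            "ở tay và chân").split()
print(is_path(graph, spurious + [END]))      # True
print(spurious in (s1, s2))                  # False

# The misleading phrase "đỏ môi và chân" is also a path in the graph.
print(is_path(graph, "đỏ môi và chân".split()))  # True
```

Because both occurrences of “Bách” and both occurrences of “và” collapse into single nodes, a path may switch from one sentence to the other at those nodes, which is how the incorrect sentences arise.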
The problem of incorrect meaning of a new phrase does not only happen with NPs, but also with other phrases such as verb phrases (VPs) and adjective phrases (AdjPs). To solve this problem, instead of finding paths containing keywords using scores or shortest paths as in [3] and [9], our abstractive summary generation process is separated into two stages: sentence reduction and sentence combination. The sentence reduction step is based on the input sentences, the keywords of the original text, and syntactic constraints. The word graph is used only in the sentence combination stage.
The problem of incorrect meaning of a new phrase is solved in the sentence reduction stage using two strategies. First, all basic phrases¹ (including basic NPs, basic VPs, and basic AdjPs) from the extractive sentences that contain keywords are used as essential materials for the sentence reduction stage. A new sentence is created by connecting the first phrase to the last one in the original sentence and then expanding its left and right sides to satisfy syntactic constraints. A detailed description of this procedure is given in Section 3.
¹ The Vietnamese chunker, created by Nguyen Le Minh and Cao Hoang Tru, which belongs to the VLSP project (http://vlsp.vietlp.org:8080/demo/?page=home), is used for extracting basic phrases from sentences.
In the above example, to generate a new sentence from the original sentence “Mẹ Bách mua thuốc về cho uống.”, the basic NP in this sentence that contains the keyword “Bách” is “Mẹ Bách”. Therefore, “Mẹ Bách” (not “Bách”) is used as the subject of this sentence. To generate a new sentence from the original sentence “Sau_khi uống, Bách có biểu_hiện đỏ môi và nổi bọng nước ở tay và chân.”, “Mẹ Bách” cannot be the subject of the new sentence, since it does not appear in the original sentence.
The second strategy to solve the first problem is to treat stop words, prepositions, numerals, auxiliary words, and negative words (e.g., “không” (not), “chẳng” (never)) as separate nodes. Otherwise, the real meaning of the sentence could be changed when generating new sentences.
The second problem of existing approaches using a word graph is that ungrammatical sentences can be generated by the word graph. Let us consider the example below.
Example 2: Hà_Giang trở_thành điểm hấp_dẫn khách du_lịch trong và ngoài nước. Sở Văn_hóa và Du_lịch Hà_Giang đã ký hợp_tác phát_triển du_lịch với Sở Văn_hóa và Du_lịch của nhiều thành_phố.
Gloss: HaGiang becomes place attractive guest tourist inside and outside country. department culture and tourist HaGiang has sign cooperation develop tourist with department culture and tourist of many city.

Figure 2. The word graph representation for Example 2

The sentence “Sở văn_hóa và ngoài nước.” (department culture and outside country) is generated by the above word graph. However, this is only an NP, not a sentence. Moreover, this NP is an incorrect name of a Vietnamese department. This is because the word “và” (and) in the middle of the original department name has two outgoing branches in the word graph: one to the rest of the department name and one to another branch. The sentence is created by following the second branch from the word “và”. This sentence never appears in our system, since it is a case of incorrect meaning and is handled by the two strategies in the sentence reduction stage mentioned above.
Another drawback of existing word graph based approaches is that they do not take word/phrase meaning into account. Different words/phrases that refer to the same concept are represented as different nodes in their graphs. As a result, sentences that contain these nodes cannot be merged to create a new sentence with richer information than the old ones. To solve this problem, an anaphora resolution module² has been integrated into our ES system. The output of our ES system is then used as the input of our abstractive summarizer. This differs from [3] and [9], in which anaphora resolution is not considered. Given such input, all nodes that refer to a concept are grouped into one. That is, the text field of a node stores multiple values, as illustrated in Fig. 3. If the original sentence uses one value in this text field, the system can use any value in this group to generate a new sentence. We consider two cases: (i) synonyms; and (ii) different expressions that refer to the same concept.
² Due to the scope of this paper, the anaphora resolution step is not described here.
To deal with the first case, a synonym dictionary is used. For example, “phát_biểu” (say) and “tuyên_bố” (declare) are synonyms, so they are treated as one node in the graph.
The second case is solved by using coreference resolution. For example, if the original sentence is “Vũ_Dư dùng thuốc Biseptol.” (VuDu use medicine Biseptol) and, by coreference resolution, we know that “bệnh_nhân” (patient), “Vũ_Dư”, “Dư”, and “bệnh_nhân Vũ_Dư” refer to the same object, the original word graph in Fig. 3a can be expanded as in Fig. 3b. Such coreference
resolution rules are proposed by us and are integrated into our system.

(a) Vũ_Dư → dùng → thuốc → Biseptol → .
(b) [Bệnh_nhân | Vũ_Dư | Dư | Bệnh_nhân Vũ_Dư | Bệnh nhân Dư] → dùng → thuốc → Biseptol → .
Figure 3. The graph representation for the sentence “Vũ_Dư dùng thuốc Biseptol.”

If the next sentence involves “Vũ_Dư”, such as “Bệnh_nhân bị biến_chứng nặng” (patient suffer from side_effect heavy), the word “bệnh_nhân” is also mapped to the first node of the graph, as in Fig. 3b.
Our graph representing the input text is organized as follows. The graph G = (V, E) consists of a set of vertexes (nodes) V and a set of edges E. A vertex keeps four kinds of information:
• a text field, which stores words or phrases that refer to a concept;
• a POS field, which stores the grammatical role of the text field. If the text field has several values, the largest POS tag will be assigned.
An edge connects two vertexes in the graph. Two vertexes are connected if their texts are adjacent in the input text.
The input of the graph construction algorithm is the original document and its extractive summary. The extractive summary has been tokenized and POS tagged, its coreferences have been resolved, and its unsplittable phrases have been defined. The words and unsplittable phrases are called the textual units of the input text. The output of the algorithm is a graph G = (V, E) that represents the extractive summary. The steps to generate this graph are shown below:
• Detect all phrases that refer to a proper name in the original text (e.g., “bệnh_nhân” (patient), “Vũ_Dư”, “Dư”, and “bệnh_nhân Vũ_Dư”). These phrases are called unsplittable phrases.
• For each textual unit from the sentences in the extractive summary:
  • Add a vertex vi corresponding to this textual unit when: (i) the textual unit is a stop word, preposition, numeral, or negative word; or (ii) the textual unit with its POS does not exist in the graph. If the textual unit is a proper name or an unsplittable phrase, add all coreferent words/phrases of this textual unit to the text field of the new node.
  • Create a directed edge connecting the vertex of the previous textual unit to the vertex of the current textual unit.
As mentioned earlier in Section 2, our process of generating an abstractive summary is divided into two stages: sentence reduction and sentence combination. The sentence reduction stage is introduced next.

III. SENTENCE REDUCTION
The input of the sentence reduction module is the extractive summary of the document and the keywords of the original document. Keywords are extracted from the original document by computing tf-isf (term frequency - inverse sentence frequency) and taking the top k per cent of terms with the highest tf-isf. The optimal value of k in our system is 15%, as determined by our experiments. The output of this module is another version of the summary that is shorter than its input text.
By studying how humans write summaries, Jing and McKeown [6] found that professional abstractors often reuse the text of an original document and then edit the extracted sentences to produce the summary. Applying this idea, instead of creating new sentences from keywords, we locate important phrases in the original sentences (basic phrases in the original sentences that contain keywords) and use them as essential materials for generating an abstractive summary. This method allows us to reduce ungrammatical phrases and produce sentences whose meaning is close to that of the original sentences.
To create a new sentence from the important phrases of the original sentence, the fragment that spans from the first important phrase to the last one in the original sentence is generated. This fragment is considered the essential part of the original sentence. Then other words of the original sentence are added to the beginning and end of this fragment to create a syntactically correct sentence that still retains the meaning of the original sentence.
The process of generating a sentence from the essential fragment of a sentence is divided into two steps: (i) completing the beginning of a sentence; and (ii) completing the end of a sentence.

A. Completing the end of a sentence
The input of this process is the original sentence, divided into basic phrases (NP, VP, etc.), and the essential fragment of the original sentence. To investigate grammatical problems with fragments generated by our system, we carried out an experiment using a data set of 200 documents collected from online newspapers. The experimental results showed that fragments ending in the following ways cannot be the end of a syntactically correct sentence:
• The fragment ends with an NP that can be the subject of a clause/sentence, or the object of the main VP of the original sentence.
• The fragment ends with a VP that is followed by an NP or an AdjP in the original sentence; or the VP ends with a verb (V) and is followed by a prepositional phrase in the original sentence.

Based on our observation, the process of filling the end of a sentence is as follows:
• If the fragment ends with an NP and there is an AdjP or a VP right after it in the original sentence, connect that AdjP or VP to the end of the fragment.
• If the fragment ends with a VP and there is an NP, an AdjP, or a VP right after it in the original sentence,
connect that NP, AdjP, or VP to the end of the fragment.
• If the fragment ends with a VP that ends with a verb and is followed by a prepositional phrase in the original sentence, connect that prepositional phrase to the end of the fragment.

B. Completing the beginning of a sentence
The input of this process is the original sentence, divided into basic phrases (NP, VP, etc.), and the essential fragment of the original sentence after its end has been completed. By investigating the fragments returned in our experiments, we found that the NP and the VP at the beginning of a fragment may not be the main NP or the main VP of the original sentence. This is because the keywords of the sentence may be in the object of the main verb of the sentence or in a prepositional phrase of the sentence.
Finding the subject or the main verb of the sentence by locating the first NP or the first VP of the sentence, respectively, is not always correct, since these phrases can be in the adverbial phrase (adVP) of the sentence. Finding these phrases by locating the NP or the VP right before these phrases is not always correct either. Therefore, filling the beginning of the fragment is more complicated than filling the end of the fragment.
By studying written text, we found that the important parts of a sentence are often located at the beginning of the sentence. Therefore, when creating a new sentence from the essential fragment of the original sentence, the beginning of the fragment is expanded to the beginning of the original sentence. After that, some rules are applied to remove unimportant parts at the beginning of the new fragment. To recognize these unimportant parts, rules for detecting discourse relations at the sentence level [8] are applied. To make these rules understandable, Rhetorical Structure Theory (RST) is introduced next.
1) Rhetorical Structure Theory
Rhetorical Structure Theory (RST) [10] is a method of representing the coherence of a text. It models the rhetorical structure of a text by a hierarchical tree that labels discourse relations between spans. This hierarchical tree diagram is called a “rhetorical tree” or “discourse tree”. The leaves of an RST tree correspond to elementary discourse units (edus), which are clauses or clause-like units with independent functional integrity, whereas the internal tree nodes correspond to larger spans.
Example 3: [You should meet Thanh today (3.1)] [after you finish this work (3.2)]. [He will go to Saigon tomorrow. (3.3)]

Figure 4. The Discourse Tree of Example 3 (the root span 3.1-3.3, labeled EXPLANATION, joins span 3.1-3.2 and edu 3.3; span 3.1-3.2, labeled CIRCUMSTANCE, joins edus 3.1 and 3.2)

Fig. 4 represents the discourse tree of Example 3. Instead of displaying the full text of each tree node, we cite the first and last edus that contribute to it (e.g., “3.1-3.2”, “3.1-3.3”). An internal tree node contains one or several names (e.g., elaboration, explanation) of the discourse relations that hold between adjacent, non-overlapping spans. A span that participates in a discourse relation is either a nucleus (N) or a satellite (S). The nucleus plays a more important role than the satellite with respect to the writer’s intention. If both spans have equal roles, they are both considered nuclei in the relation.
To construct the discourse structure of a text, the following tasks should be performed: (i) segmenting the text into edus; (ii) recognizing discourse relations between spans; and (iii) constructing a discourse tree that represents the discourse structure of the text.
Most research on RST for English relies on cue phrases such as because, but, although, etc. to segment text [12]. For example, the sentence “We cannot be sure the product is safe although we have tested it.” can be split into the two edus “We cannot be sure the product is safe” and “although we have tested it.”, based on the cue phrase although. In addition to cue phrases, syntactic information is also used in [8] to segment text into edus.
Researchers have defined many discourse relations such as list, sequence, elaboration, cause, result, evidence, etc. These relations are divided into three types: N-N, N-S, and S-N. In this research, since we are only concerned with removing the unimportant part at the beginning of sentences, only S-N relations are considered. Identifying the names of discourse relations and constructing the discourse tree of the text are out of the scope of this research.
The next section introduces our method of recognizing S-N relations in sentences and removing the unimportant part at the beginning of a sentence.
2) Removing unimportant part at the beginning of a sentence
The first step of this stage is to recognize S-N relations in sentences. Based on this, sentence reduction is done by keeping the N part in the summary.
As mentioned in Section 3.2.1, text segmentation can be done by using cue phrases [12] and syntactic information [8]. As far as we know, there is no Vietnamese syntactic parser whose accuracy is higher than 90%. Therefore, it is not reliable to use the output of a syntactic parser for the text segmentation task. The segmentation process for Vietnamese cannot rely simply on cue phrases either, as analyzed below.
Since Vietnamese is a monosyllabic language, a cue phrase may be recognized incorrectly as a part of another word. A word may also be recognized incorrectly as a cue phrase. Let us consider Example 4 below:
Example 4:
a. Tôi rất buồn khi em không đến. (Gloss: I very sad when you did not come.)
b. Chẳng_mấy_khi anh đến nhà tôi. (Gloss: rarely you come to house my.)
In Example 4a, the word “khi” (when) is a cue phrase. In Example 4b, “khi” is a part of the word “chẳng_mấy_khi” (rarely), and it is not a cue phrase. To deal with this problem, information about cue phrases is
combined with information about words and their POS tags to detect cue phrases in a given sentence³.
³ The software packages vnTokenizer and vnTagger, created by Le Hong Phuong (at http://mim.hus.vnu.edu.vn/phuonglh/softwares), are used for segmenting a Vietnamese text into words and tagging POS.
The list of cue phrases is created from our empirical research on Vietnamese text and by inheriting cue phrases and their templates from [5,8,12]. Examples of our templates using cue phrases are:
Bởi_vì (since) S nên (therefore) N.
Nếu (if) S thì (then) N.
In general, the strategy of sentence reduction is language independent. However, the process of filling the end of a sentence is language dependent, since each language has its own grammar principles.

IV. SENTENCE COMBINATION
After sentence reduction, the process of sentence combination is carried out. By studying how humans write summaries, we found that the following cases can be merged to create a new sentence with richer information:
a. Two short and consecutive sentences with the same subject:
<sentence 1> = <noun|NP> <VP 1>
<sentence 2> = <noun|NP> <VP 2>
Two sentences are considered consecutive if they are adjacent in the extractive summary. Two sentences have the same subject (i.e., <noun|NP>) if they start from the same node in the graph and the POS of that node is a noun or an NP. The merged sentence in this case is
<new sentence> = <noun|NP> <VP 1> và (and) <VP 2>

b. A sentence has a component that provides more detailed information for a noun or an NP of the previous sentence.
Such a sentence always starts with a phrase that refers to the previous sentence, such as “đó là” (this is) or “điều đó” (this problem). The list of such phrases is manually created from our empirical research.
<sentence 1> = <left text 1> <noun|NP1> <right text 1>
<sentence 2> = <a phrase referring to the previous sentence> <left text 2> <NP2> <right text 2>
in which <NP2> starts with <noun|NP1> and contains a proper name in its remaining part. In this case, an edge is created from the node corresponding to <NP2> to the node corresponding to <right text 1> in the graph. If all keywords of <sentence 2> are in <NP2> only, the two sentences are merged into one:
<new sentence> = <left text 1> <NP2> <right text 1>
Example 5: Các_em chỉ ăn cơm với muối. Đó là tình_cảnh cuộc_sống của các_em học_sinh Trường Mường_Lý.
Gloss: they only eat rice with salt. this is situation life of they pupil school MuongLy.
The graph representation for Example 5 is shown in Fig. 5. The second sentence in Example 5 starts with the phrase “đó là” (this is) and contains the NP “các_em học_sinh Trường Mường_Lý”, which is a detailed description of the noun “Các_em” in the previous sentence. Since all keywords of the second sentence are in the NP “các_em học_sinh Trường Mường_Lý”, the two sentences in Example 5 are combined to create the new sentence “các_em học_sinh Trường Mường_Lý chỉ ăn cơm với muối” (they pupil school MuongLy only eat rice with salt).

Figure 5. The graph representation for Example 5

c. A sentence has a component that provides more detailed information for a clause of the previous sentence.
<sentence 1> = <left text 1> <clause 1>
<sentence 2> = <left text 2> <component 2>
<component 2> in <sentence 2> starts with a phrase with a similar meaning to <clause 1> in <sentence 1>. Notice that two consecutive sentences rarely use the same words to express a meaning; synonyms are used instead. A synonym dictionary was created by us to detect such cases. If all keywords of <sentence 2> are in <component 2>, the two sentences are merged into one.
<new sentence> = <left text 1> <component 2>
Example 6: “Mỹ đã bày_tỏ lo_ngại về mối đe_dọa xâm_nhập mạng ngày_càng gia_tăng”, ông Hagel phát_biểu. Điều đáng chú_ý là ông Hagel đã đưa ra tuyên_bố ngay trước mặt các đại_diện của chính_phủ Trung_Quốc tại Đối_thoại Shangri-La.
Gloss: “U.S. has expressed concern about threat intrusion Internet day by day increasing”, Mr. Hagel said. the problem worth attention is Mr. Hagel has issue statement in front of representatives of government Chinese in dialogue Shangri-La.

Figure 6. The graph representation for Example 6

In Example 6, the clause “ông Hagel phát_biểu” (Mr. Hagel said) in the first sentence has the same meaning as “ông Hagel đã đưa ra tuyên_bố” (Mr. Hagel has issue statement) in the second sentence. Therefore, these two sentences are combined to create a new sentence:
“Mỹ đã bày_tỏ lo_ngại về mối đe_dọa xâm_nhập mạng ngày_càng gia_tăng”, ông Hagel đã đưa ra tuyên_bố ngay trước mặt các đại_diện của chính_phủ Trung_Quốc tại Đối_thoại Shangri-La.
The strategy of sentence combination is language independent.
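Merging case (b) of Section IV can be illustrated with a minimal sketch on the data of Example 5. It works on pre-segmented token lists rather than on the word graph itself, and it makes several assumptions that are not in the paper: the `Segmented` structure, the function name `merge_case_b`, the two-entry list of referring phrases, the use of a capitalised token as a crude stand-in for the proper-name check, and the keyword set chosen for the second sentence.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Assumed, abbreviated list of phrases that refer to the previous sentence.
REFERRING_PHRASES = [["đó", "là"], ["điều", "đó"]]


@dataclass
class Segmented:
    """A sentence split as <left text> <NP> <right text> (hypothetical)."""
    left: List[str]
    np: List[str]
    right: List[str]


def merge_case_b(s1: Segmented, s2: Segmented,
                 keywords2: Set[str]) -> Optional[str]:
    """Return <left text 1> <NP2> <right text 1>, or None if case (b) fails."""
    # Sentence 2 must start with a phrase referring to the previous sentence.
    left2 = [t.lower() for t in s2.left]
    if not any(left2[:len(p)] == p for p in REFERRING_PHRASES):
        return None
    # <NP2> must start with <noun|NP1> and continue with a proper name
    # (approximated here by a capitalised token in the remainder).
    np1 = [t.lower() for t in s1.np]
    np2 = [t.lower() for t in s2.np]
    remainder = s2.np[len(np1):]
    if np2[:len(np1)] != np1 or not any(t[:1].isupper() for t in remainder):
        return None
    # All keywords of sentence 2 must lie inside <NP2>.
    if any(t in keywords2 for t in s2.left + s2.right):
        return None
    return " ".join(s1.left + s2.np + s1.right)


s1 = Segmented(left=[], np=["Các_em"],
               right=["chỉ", "ăn", "cơm", "với", "muối"])
s2 = Segmented(left=["Đó", "là", "tình_cảnh", "cuộc_sống", "của"],
               np=["các_em", "học_sinh", "Trường", "Mường_Lý"], right=[])
keywords2 = {"học_sinh", "Mường_Lý"}  # hypothetical keyword set

merged = merge_case_b(s1, s2, keywords2)
print(merged)
```

With these inputs the sketch yields the merged sentence given in the text for Example 5. If a keyword of the second sentence falls outside <NP2>, the function returns `None` and the two sentences stay separate.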
V. EXPERIMENTAL RESULTS AND DISCUSSION VI. CONCLUSIONS AND FUTURE WORK
As far as we know, there is no abstractive summarizing This paper has introduced an approach to abstractive text
corpus for Vietnamese. Therefore, to carry out experiments summarization, which consists of two stages: sentence
with the summarizing system, we have to create a corpus by reduction and sentence combination. The sentence reduction
ourselves. Our corpus consists of 50 documents collected stage is based on discourse rules to remove redundant
from several Vietnamese newspaper websites (e.g., Dantri, clauses at the beginning of a sentence, and syntactic
VnExpress, etc.) and belongs to two categories: economy constraints to complete the end of the reduced sentence. The
and culture. The lengths of documents are various from 300 sentence combination stage is based on word graph to
words to 1000 words. Each document has 22 sentences in present relations among words, clauses and sentences from
average. The abstractive summaries were created manually the input text. New sentences that combine information from
by hand (one summary per document) with approximately several sentences are generated by using word graph.
100 words in length. Experimental results show that our approach is promising in
The input of our abstractive summarizer is the output of solving the AS task.
our extractive one, which generates summaries with To improve the system, our future works include: (i)
approximately 120 words in length. The output of our propose methods to improve the meaning completeness of
abstractive summarizer contains 100 words in average. sentences generated in the sentence reduction phrase; (ii)
Among 433 sentences generated by our abstractive propose methods to further compress sentences; and (iii)
summarizer, 95% of the generated sentences are syntactically correct; 72% of those sentences are complete in meaning, with unimportant parts at the end of the sentences removed. Most cases of incomplete sentences are due to the process of completing the end of a new sentence in the sentence reduction phase. Reasons for this problem are:
• When an elaborative clause is situated between the main NP and the main VP of a sentence, the system misrecognizes the VP of the clause as the main VP of the sentence.
• The basic phrases of a sentence are detected incorrectly by the Vietnamese chunker, whereas information about basic phrases is the key to completing the end of a new sentence.
The abstractive summaries generated by our system are also compared with the summaries in the corpus, using the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measure [7]. The ROUGE measures count the number of overlapping units, such as n-grams, word sequences, and word pairs, between the computer-generated summary and the ideal summaries created by humans. In our experiments, since each document has only one summary, we compare each candidate summary with a single reference summary. Using the above formula, we get ROUGE-1 and ROUGE-2 values of 0.2513 and 0.1344, respectively. Since no other work generates abstractive summaries from the same corpus as ours, we cannot directly compare our experimental results with other research. However, according to [3], the ROUGE-1 and ROUGE-2 values obtained when comparing abstractive summaries created by two people are 0.3088 and 0.1069, respectively.⁴ This indicates that our approach is promising for the text summarization task. However, since text generation in general, and automatic abstractive text summarization in particular, is still a challenging task, more work should be done to improve the quality of the system.

investigating strategies to efficiently combine sentences in the summary.

ACKNOWLEDGMENTS
This work was supported by the Vietnam Ministry project, under Grant B2012-01-24.

REFERENCES
[1] Dijkstra, E. W. 1959. A note on two problems in connexion with graphs. Numerische Mathematik, vol. 1, pp. 269–271.
[2] Erkan, G. and Radev, D. R. 2004. LexRank: graph-based lexical centrality as salience in text summarization. J. Artif. Int. Res., 22(1):457–479.
[3] Ganesan, K., Zhai, C. and Han, J. 2010. Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly Redundant Opinions. In Proc. of Coling 2010, pages 340–348.
[4] Knight, K. and Marcu, D. 2000. Statistics-based summarization - step one: sentence compression. In Proc. of AAAI 2000.
[5] Hoang, T. P. 1980. Vietnamese grammar. Publisher of professional school.
[6] Jing, H. and McKeown, K. R. 2000. Cut and paste based text summarization. In Proc. of NAACL 2000.
[7] Lin, C. Y. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Proc. of NTCIR Workshop 2004.
[8] Le, H. T., Abeysinghe, G. and Huyck, C. 2004. Generating Discourse Structures for Written Texts. In Proc. of COLING 2004, Switzerland.
[9] Lloret, E. and Palomar, M. 2011. Analyzing the Use of Word Graphs for Abstractive Text Summarization. In Proc. of IMMM 2011.
[10] Mann, W. C. and Thompson, S. A. 1988. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text, vol. 8(3), pp. 243–281.
[11] Mihalcea, R. and Tarau, P. 2004. TextRank: Bringing order into texts. In Proc. of EMNLP-04.
[12] Marcu, D. 1997. The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts. PhD Thesis, Department of Computer Science, University of Toronto.
⁴ The corpus used in [3] is different from ours.
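
As an illustration of the ROUGE-N measure [7] used in the evaluation, the following sketch computes n-gram recall of a candidate summary against a single reference summary, matching the one-reference setting described above. It is a minimal sketch and not the ROUGE package itself: the function names `ngrams` and `rouge_n_recall` are hypothetical, the example sentences are invented for illustration, and whitespace tokenization is an assumption (Vietnamese text would normally need word segmentation first).

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter (multiset) of the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """N-gram recall of `candidate` against one `reference` summary.

    Assumption: tokens are separated by whitespace. Each reference
    n-gram is matched at most as many times as it occurs in the
    candidate (clipped counts).
    """
    cand_counts = ngrams(candidate.split(), n)
    ref_counts = ngrams(reference.split(), n)
    total = sum(ref_counts.values())
    if total == 0:
        # The reference has no n-grams of this order.
        return 0.0
    matched = sum(min(count, cand_counts[gram])
                  for gram, count in ref_counts.items())
    return matched / total


# Invented example sentences, not taken from the corpus.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
r1 = rouge_n_recall(candidate, reference, 1)  # 5 of 6 unigrams matched
r2 = rouge_n_recall(candidate, reference, 2)  # 3 of 5 bigrams matched
print(f"ROUGE-1 = {r1:.4f}, ROUGE-2 = {r2:.4f}")
```

Dividing by the number of n-grams in the reference makes this a recall-oriented score, which is why a candidate that omits reference content is penalized even if everything it does contain is correct.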
