@article{10.5120/11587-6925,
author = { R. V. V. Murali Krishna and S. Y. Pavan Kumar and Ch. Satyananda Reddy },
title = { A Hybrid Method for Query based Automatic Summarization System },
journal = { International Journal of Computer Applications },
issue_date = { April 2013 },
volume = { 68 },
number = { 6 },
month = { April },
year = { 2013 },
issn = { 0975-8887 },
pages = { 39-43 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume68/number6/11587-6925/ },
doi = { 10.5120/11587-6925 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
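International Journal of Computer Applications (0975 – 8887)
Volume 68– No.6, April 2013

$$P = \frac{|S_{cand} \cap S_{ref}|}{|S_{cand}|}$$

$$R = \frac{|S_{cand} \cap S_{ref}|}{|S_{ref}|}$$

where $S_{cand}$ is the set of sentences in the candidate (system-generated) summary and $S_{ref}$ is the set of sentences in the reference summary.

$$F_1 = \frac{2PR}{P + R}$$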
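The following Java sketch (Java being the system's implementation language) computes the three measures for one candidate summary. It assumes each summary is represented as a set of sentence strings and that a sentence counts as a match only when it appears verbatim in both summaries; the class and method names are hypothetical and are not taken from the authors' code.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Sentence-level precision, recall and F1 (illustrative sketch).
 * Assumption: a summary is a set of sentence strings, and a sentence
 * matches only if it occurs verbatim in both summaries.
 */
final class SummaryScores {
    private SummaryScores() {}

    /** Number of sentences shared by the candidate and reference summaries. */
    static int overlap(Set<String> candidate, Set<String> reference) {
        Set<String> common = new HashSet<>(candidate); // copy so the caller's set is not modified
        common.retainAll(reference);
        return common.size();
    }

    /** P = |cand ∩ ref| / |cand|; returns 0 for an empty candidate. */
    static double precision(Set<String> candidate, Set<String> reference) {
        return candidate.isEmpty() ? 0.0 : (double) overlap(candidate, reference) / candidate.size();
    }

    /** R = |cand ∩ ref| / |ref|; returns 0 for an empty reference. */
    static double recall(Set<String> candidate, Set<String> reference) {
        return reference.isEmpty() ? 0.0 : (double) overlap(candidate, reference) / reference.size();
    }

    /** F1 = 2PR / (P + R); defined as 0 when P + R = 0. */
    static double f1(double p, double r) {
        return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }
}
```

For a four-sentence candidate that shares three sentences with a six-sentence reference, this gives P = 3/4 = 0.75, R = 3/6 = 0.5 and F1 = 0.6.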
The second measure we use is ROUGE (Recall-Oriented Understudy for Gisting Evaluation), which NIST adopted for evaluating automatic summarization systems in the Document Understanding Conference (DUC). ROUGE has been shown to be effective for evaluating document summaries. It measures summary quality by counting overlapping units, such as N-grams, word sequences and word pairs, between the candidate summary and the reference summary. The ROUGE-N measure [9] compares the N-grams of two summaries and counts the number of matches. It is defined as follows:
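$$\text{ROUGE-N} = \frac{\sum_{S \in \{\text{Reference Summaries}\}} \sum_{\text{gram}_N \in S} \text{Count}_{match}(\text{gram}_N)}{\sum_{S \in \{\text{Reference Summaries}\}} \sum_{\text{gram}_N \in S} \text{Count}(\text{gram}_N)}$$

where N is the length of the N-gram, $\text{Count}_{match}(\text{gram}_N)$ is the maximum number of times the N-gram $\text{gram}_N$ co-occurs in the candidate summary and a reference summary, and $\text{Count}(\text{gram}_N)$ is the number of occurrences of $\text{gram}_N$ in the reference summaries.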
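A minimal Java sketch of the ROUGE-N formula follows. It assumes lower-cased whitespace tokenization with no stemming or stop-word removal, which may differ from the preprocessing used by the official ROUGE toolkit or by the authors; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative ROUGE-N. Assumptions: lower-cased whitespace tokenization,
 * no stemming, no stop-word removal.
 */
final class RougeN {
    private RougeN() {}

    /** Counts every N-gram in the text; each N-gram is keyed by its space-joined tokens. */
    static Map<String, Integer> ngramCounts(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        String trimmed = text.toLowerCase().trim();
        if (trimmed.isEmpty()) {
            return counts; // an empty text has no N-grams
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i + n <= tokens.length; i++) {
            String gram = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Sum of Count_match over the N-grams of all reference summaries, divided by
     * the total number of reference N-grams. Count_match clips each reference
     * N-gram count to the number of times that N-gram occurs in the candidate.
     */
    static double score(String candidate, List<String> references, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        Map<String, Integer> cand = ngramCounts(candidate, n);
        long matches = 0;
        long total = 0;
        for (String reference : references) {
            for (Map.Entry<String, Integer> e : ngramCounts(reference, n).entrySet()) {
                matches += Math.min(e.getValue(), cand.getOrDefault(e.getKey(), 0));
                total += e.getValue();
            }
        }
        return total == 0 ? 0.0 : (double) matches / total;
    }
}
```

For the candidate "the cat sat on the mat" and the single reference "the cat lay on the mat", three of the five reference bigrams ("the cat", "on the", "the mat") occur in the candidate, so ROUGE-2 is 3/5 = 0.6.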
5. IMPLEMENTATION
The entire system is implemented in the Java programming language (J2SE). The inputs and output of the developed system are listed below.
Input Document: Automatic summarization of a text document based on user query.doc
Query: what is automatic summarization of a text document Based on a user query?
Output (Summary):
Technology of automatic text
summarization plays an important role in Information Retrieval
Systems and Text Mining, and provides a solution to the
information overload problem. Automatic text summarization is
the creation of a shortened version of a text by a computer
program. Automatic text summarization can be generic or query
specific. This paper is focused on Query specific text
summarization. Query specific text summarization is based on the
calculation of the relationship between sentences in the text
document and the query given. The technology of automatic text
summarization is maturing and may provide a solution to the
information overload problem. Automatic document
summarization aims to condense the original text into essential
content and to assist in filtering and selection of necessary
information.
6. RESULT ANALYSIS
TABLE 1. Number of sentences output when each method is applied individually to the source document (186 sentences)

| Sentence scoring method | Resulting no. of sentences from the source document |
|---|---|
| Word Form Similarity | 3 |
| N-gram Based Similarity | 30 |
| Word Order Similarity | 7 |
| Semantic Similarity | 9 |
| Proposed Similarity Measure | 12 |
Figure 2: Graphical comparison of the sentence scoring methods by F1-measure and ROUGE-2 (scale 0 to 1)
TABLE 2. Comparison of sentence scoring methods based on standard summary evaluation measures

| Sentence Scoring Method | Precision | Recall | F1-Measure | ROUGE-2 |
|---|---|---|---|---|
| Word Form Similarity | 0.27 | 0.33 | 0.29 | 0.64 |
| N-gram Based Similarity | 0.16 | 0.55 | 0.24 | 0.95 |
| Word Order Similarity | 0.14 | 0.11 | 0.12 | 0.85 |
| Semantic Similarity | 0.33 | 0.33 | 0.32 | 0.87 |
| Proposed Similarity | 0.50 | 0.66 | 0.56 | 0.93 |
Note: The abstract of the source document is used as the reference summary when evaluating the automatic summaries.
7. CONCLUSION
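The results in Table 2 show that the proposed sentence scoring method achieves the highest precision (0.50), recall (0.66) and F1-measure (0.56) of the methods compared, with a ROUGE-2 score (0.93) second only to N-gram based similarity (0.95). The summary it produces is therefore more relevant and closer to the manually written summary.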
8. FUTURE SCOPE
The proposed sentence scoring method takes the plain average of the scores produced by the statistical and linguistic techniques. Instead, a weighted average could be used, with the weight of each score chosen according to how appropriate the corresponding method is for the content of the given document. In addition, since a large document usually yields a large number of relevant sentences, the clustering process could be parallelized, for example with Java threads, to speed up summarization as a whole.
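The difference between the current averaging and the suggested weighted average can be sketched in Java as below. This is an assumption-laden illustration: which individual similarity values are averaged (for example word form, N-gram, word order and semantic similarity) is not specified here, and the weights are hypothetical placeholders, not values proposed by the authors.

```java
/**
 * Combines the individual similarity scores of one sentence into a single score.
 * The weights passed to weightedAverage are hypothetical; choosing them per
 * document is the future work described above.
 */
final class ScoreCombiner {
    private ScoreCombiner() {}

    /** Plain average of the scores; returns 0 for an empty array. */
    static double average(double[] scores) {
        if (scores.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double s : scores) {
            sum += s;
        }
        return sum / scores.length;
    }

    /** Weighted average: sum(w_i * s_i) / sum(w_i), with non-negative weights. */
    static double weightedAverage(double[] scores, double[] weights) {
        if (scores.length != weights.length) {
            throw new IllegalArgumentException("scores and weights must have the same length");
        }
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int i = 0; i < scores.length; i++) {
            if (weights[i] < 0) {
                throw new IllegalArgumentException("weights must be non-negative");
            }
            weightedSum += weights[i] * scores[i];
            weightTotal += weights[i];
        }
        if (weightTotal == 0.0) {
            throw new IllegalArgumentException("weights must not all be zero");
        }
        return weightedSum / weightTotal;
    }
}
```

With scores {0.2, 0.4, 0.6} the plain average is 0.4, while weights {1, 1, 2}, which favour the third method, give (0.2 + 0.4 + 1.2) / 4 = 0.45. Equal weights reduce the weighted form to the plain average, so the current method is a special case of the suggested one.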
9. REFERENCES
[1] S. Gholamrezazadeh, M. A. Salehi, and B. Gholamzadeh, "A Comprehensive Survey on Text Summarization Systems," in Proc. 2nd International Conference on Computer Science and its Applications (CSA '09), 10-12 Dec. 2009, pp. 1-6. doi: 10.1109/CSA.2009.5404226
[2] Y. Ko and J. Seo, "An Effective Sentence-Extraction Technique Using Contextual Information and Statistical Approaches for Text Summarization," Pattern Recognition Letters. doi: 10.1016/j.patrec.2008.02.008
[3] M. Wasson, "Using leading text for news summaries: Evaluation results and implications for commercial summarization applications," in Proc. 17th International Conference on Computational Linguistics and 36th Annual Meeting of the ACL, 1998, pp. 1364-1368.
[4] G. Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, Addison-Wesley Publishing Company, 1989.
[5] W. Al-Sanie, "Towards an infrastructure for Arabic text summarization using rhetorical structure theory," Master's thesis, Department of Computer Science, King Saud University, Riyadh, Kingdom of Saudi Arabia, 2005.
[6] http://web.science.mq.edu.au/~swan/summarization/projects_full.htm
[7] J. Bellegarda, "Exploiting latent semantic information in statistical language modeling," Proceedings of the IEEE, vol. 88, no. 8, pp. 1279-1296, Aug. 2000.
[8] R. L. Cilibrasi and P. M. B. Vitanyi, "The Google similarity distance," IEEE Trans. Knowl. Data Eng., vol. 19, no. 3, pp. 370-383, 2007.
[9] Zhang Pei-ying and Li Cun-he, "Automatic text summarization based on sentences clustering and extraction," IEEE, 2009.
[10] Y. Li, Z. Bandar, D. McLean, and J. O'Shea, "A Method for Measuring Sentence Similarity and its Application to Conversational Agents."
[11] H. Karnick and V. Mishra, "Query Specific Multi-Document Summarization," Indian Institute of Technology, Kanpur, Apr. 25, 2010.
[12] G. Adamson and J. Boreham, "The use of an association measure based on character structure to identify semantically related pairs of words and document titles."