Multi Document Summarization Research Paper 1
Virendra P. Yadav, Anshul M. Gedam, Sanskar Korekar, Pranay There, Sujal Bitle, Tarang Bhaisare
Department of Computer Science and Engineering, P.C.E. Nagpur, India
1.Abstract :
2.Introduction :
Literature Review :
An extractive summariser identifies the most frequently repeated words and
uses them to decide which sentences and points are most important, based on
how often they occur in the uploaded documents.
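The frequency-based selection described above can be sketched as follows. This is a minimal illustration, not the exact method used in this study: the stop-word list, the sentence splitter, and the function names (`tokenize`, `split_sentences`, `top_sentences`) are assumptions introduced for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "in", "to", "it"}


def tokenize(text):
    """Lowercase the text and return its words, excluding stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (a naive splitter)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def top_sentences(documents, k=2):
    """Return the k sentences whose words are most frequent across all documents."""
    sentences = [s for doc in documents for s in split_sentences(doc)]
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average corpus frequency of the words in the sentence.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    return sorted(sentences, key=score, reverse=True)[:k]


docs = ["Cats sleep. Cats purr loudly.", "Dogs bark. Cats sleep often."]
print(top_sentences(docs, k=2))
```

Averaging the word frequencies, rather than summing them, keeps long sentences from being favoured purely because of their length.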
The main purpose of the clustering approach is to determine the topics and
then summarise the text from the uploaded documents in a proper sequence of
topics and subtopics. The clustering approach identifies the topics for the
domain and arranges the sentences under the corresponding topic.
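As an illustration of grouping sentences by topic, the sketch below runs a small K-Means loop over bag-of-words vectors. It is a simplified example under stated assumptions: the vectorisation by raw word counts, the deterministic initialisation from the first k sentences, and the function names are all chosen for the example and are not taken from the system described in this paper.

```python
import math
from collections import Counter


def vectorize(sentences):
    """Turn each sentence into a word-count vector over the shared vocabulary."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vectors = []
    for s in sentences:
        counts = Counter(s.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(vectors, k, iterations=10):
    """Assign each vector to one of k clusters and return the cluster labels."""
    # Assumption: initialise centroids with the first k vectors (deterministic).
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda c: distance(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


sentences = ["cats purr", "dogs bark", "cats purr softly", "dogs bark loudly"]
print(k_means(vectorize(sentences), k=2))
```

Sentences that share a label can then be placed under the same topic or subtopic in the summary.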
5)Domain-oriented Summarisation:-
Methodology :
Rule Engine :
In this study, a rule engine provides a structured and readable framework for
defining rules. The rules are conventionally written as "if-then" statements:
conditions are evaluated, and the corresponding actions are executed when
they are satisfied. Rule engines automate complex decision-making
processes, oversee workflows, and enforce organisational policies.
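A minimal sketch of the "if-then" structure is given below. The `Rule` class, the `run_rules` function, and the two example rules (including the 1000-word threshold and the `summary_sentences` key) are hypothetical and serve only to show how a condition and an action can be paired; they do not reproduce the rules used in this research.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Facts = Dict[str, Any]


@dataclass
class Rule:
    name: str
    condition: Callable[[Facts], bool]  # the "if" part
    action: Callable[[Facts], None]     # the "then" part


def run_rules(rules: List[Rule], facts: Facts) -> List[str]:
    """Evaluate each rule in order and run its action when the condition holds."""
    fired = []
    for rule in rules:
        if rule.condition(facts):
            rule.action(facts)
            fired.append(rule.name)
    return fired


# Hypothetical rules: choose a summary length from the input size.
rules = [
    Rule("long_input",
         condition=lambda f: f["word_count"] > 1000,
         action=lambda f: f.update(summary_sentences=10)),
    Rule("short_input",
         condition=lambda f: f["word_count"] <= 1000,
         action=lambda f: f.update(summary_sentences=3)),
]

facts = {"word_count": 2500}
print(run_rules(rules, facts), facts)
```

Keeping conditions and actions as separate callables means rules can be added or reordered without changing the evaluation loop.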
Within the scope of this research, the rule engine is harnessed to mechanise
the process of generating pertinent summaries from a collection of uploaded
documents. The rule engine employs a rule-based approach for the extraction
of sentences, paragraphs, or key phrases, with a focus on content relevance
within the uploaded documents.
The central objective of the rule engine lies in the curation of significant and
valuable information within the resulting summary. In the context of this
research paper, the rule engine leverages cross-document coherence rules to
discern connections among the concepts, events, and entities present in the
provided documents. It also dynamically adjusts the length and structure of the
summary in response to the complexity of the source material.
Incorporating the rule engine has made the summarisation process more
systematic, flexible, and adaptable when handling multi-document
summarisation tasks. This paper examines how the rule engine, through
rule-based techniques and cross-document coherence rules, captures the
essence of the source documents and supports a more refined and coherent
summarisation process.
This research underscores the significance of this innovative approach, which
allows the rule engine to optimise the summarisation process by rearranging
the order of Clustering, TF-IDF, Topic Modelling, and Cosine Similarity
operations based on the inherent needs of the documents and the rules it
employs. This adaptive and rule-centric methodology contributes to the rule
engine's ability to enhance the efficiency and effectiveness of multi-document
summarisation, offering a novel perspective in the field of automated document
summarisation.
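One way such rule-driven reordering could look is sketched below. Everything in it is an assumption made for illustration: the default order, the document statistics (`num_documents`, `vocabulary_overlap`), the thresholds, and the two rules are hypothetical and are not the rules employed in this study.

```python
# Hypothetical default order of the four operations named in the text.
DEFAULT_ORDER = ["tfidf", "cosine_similarity", "clustering", "topic_modelling"]


def choose_order(doc_stats):
    """Return the stage order selected by two illustrative rules."""
    order = list(DEFAULT_ORDER)
    # Hypothetical rule 1: for large collections, cluster first.
    if doc_stats["num_documents"] > 10:
        order.remove("clustering")
        order.insert(0, "clustering")
    # Hypothetical rule 2: when documents share little vocabulary,
    # run topic modelling before cosine similarity.
    if doc_stats["vocabulary_overlap"] < 0.2:
        order.remove("topic_modelling")
        order.insert(order.index("cosine_similarity"), "topic_modelling")
    return order


def run_pipeline(order, stages, data):
    """Apply the stage functions to the data in the chosen order."""
    for name in order:
        data = stages[name](data)
    return data


# Placeholder stages that only record the order in which they ran.
stages = {name: (lambda d, name=name: d + [name]) for name in DEFAULT_ORDER}
order = choose_order({"num_documents": 20, "vocabulary_overlap": 0.1})
print(run_pipeline(order, stages, []))
```

In a real system each placeholder stage would be replaced by the corresponding operation, while the ordering logic stays separate from the operations themselves.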
PDF Text Extraction using PyPDF2:
Library: PyPDF2 is a Python library for working with PDF files. It allows
you to extract text and manipulate PDF documents.
Usage: This technique is essential for extracting text content from PDF
documents, which may contain text, images, and various other elements.
Extracted text can then be processed, analysed, and summarised as
needed.
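A minimal sketch of the extraction step is shown below. It assumes a PyPDF2 release that provides the `PdfReader` class (older releases used a different class name); the function names `extract_text_from_reader` and `extract_pdf_text` and the file name in the usage comment are hypothetical.

```python
def extract_text_from_reader(reader):
    """Join the text of every page of a PdfReader-like object."""
    page_texts = []
    for page in reader.pages:
        # Pages without extractable text (for example scanned images)
        # may yield an empty result, so fall back to an empty string.
        page_texts.append(page.extract_text() or "")
    return "\n".join(page_texts)


def extract_pdf_text(path):
    """Open a PDF file with PyPDF2 and return its text content."""
    # Imported here so the helper above can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader
    return extract_text_from_reader(PdfReader(path))


# Usage (hypothetical file name):
# text = extract_pdf_text("uploaded_document.pdf")
```

The extracted string can then be passed to the preprocessing stage. Text extraction quality depends on how the PDF was produced, so image-only pages contribute no text.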
Text Preprocessing :
• Lowercasing:
• Purpose: Convert all text to lowercase to ensure uniformity and
reduce the impact of letter case on text analysis. This makes
"Hello" and "hello" identical for analysis.
• Removing Numbers:
• Purpose: Exclude numerical digits from the text. This is useful
for tasks where numerical values are not relevant, such as
sentiment analysis or topic modelling.
• Handling Emojis:
• Purpose: Depending on the analysis task, you can choose to
remove, replace, or retain emojis. Emojis can convey sentiment
and add meaning to text, so handling them depends on your
specific goals.
• Removing Diacritics:
• Purpose: Use the “unicodedata” module to remove accents and
diacritical marks from characters. This simplifies the text and
ensures that words with and without diacritics are treated the
same way.
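The four preprocessing steps listed above can be combined as in the following sketch. It uses only the Python standard library. The emoji ranges in `_EMOJI_RE` are an approximation chosen for the example and do not cover every emoji, and the function names and the `keep_emojis` parameter are assumptions.

```python
import re
import unicodedata

# Approximate emoji ranges (an assumption; not an exhaustive list).
_EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]+"
)


def strip_diacritics(text):
    """Decompose characters and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def preprocess(text, keep_emojis=False):
    """Lowercase, remove digits, optionally remove emojis, strip diacritics."""
    text = text.lower()
    text = re.sub(r"\d+", "", text)
    if not keep_emojis:
        text = _EMOJI_RE.sub("", text)
    text = strip_diacritics(text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


print(preprocess("Café 2023 Hello \U0001F600"))
```

Whether emojis are removed or retained is left as a parameter because, as noted above, the right choice depends on the analysis task.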
Topic Modelling :
How it works:
2. Gensim:
Key Features:
• Interpreting Topics:
• Examine the top words associated with each topic to
understand the themes it represents.
• Analyse the document-topic distributions to see which
topics are prevalent in each document.
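The two interpretation steps can be sketched on plain Python lists as follows. The vocabulary, the topic-word weights, and the document-topic distributions are invented for illustration; with Gensim, comparable values would typically be read from a trained model (for example through `LdaModel.show_topic` and `LdaModel.get_document_topics`) rather than written by hand.

```python
def top_words(topic_word_weights, vocabulary, n=3):
    """For each topic, return the n words with the highest weight."""
    result = []
    for weights in topic_word_weights:
        ranked = sorted(range(len(vocabulary)), key=lambda i: weights[i], reverse=True)
        result.append([vocabulary[i] for i in ranked[:n]])
    return result


def dominant_topic(doc_topic_distribution):
    """Return the index of the most prevalent topic in one document."""
    return max(range(len(doc_topic_distribution)),
               key=lambda t: doc_topic_distribution[t])


# Hypothetical data: two topics over a six-word vocabulary, two documents.
vocabulary = ["match", "goal", "team", "election", "vote", "party"]
topic_word_weights = [
    [0.30, 0.25, 0.20, 0.10, 0.10, 0.05],
    [0.05, 0.05, 0.10, 0.30, 0.28, 0.22],
]
doc_topics = [[0.9, 0.1], [0.2, 0.8]]

print(top_words(topic_word_weights, vocabulary))
print([dominant_topic(d) for d in doc_topics])
```

Reading the top words gives each topic a human-interpretable theme, and the dominant topic per document shows where each source document sits among those themes.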
How it works:
Key Features:
Clustering :
1. K-Means Clustering:
How it works:
2. NumPy:
Extractive Summarisation :
1. Extractive Summarisation:
How it works:
Sequencing sentences
Results:
Analysis:
Future Scope :
Conclusion :