Text Summarization Using NLP
NLP
BY:
SOUNDARAJULU R – 21MDT1069
GUIDED BY: DR. SOMNATH BERA
CONTENTS
• Background
• Project objective
• About dataset
• About algorithms
• Algorithms implemented
• Comparisons
• Conclusion
• Future work
• References
INTRODUCTION TO NATURAL LANGUAGE PROCESSING
• NLP stands for Natural Language Processing, which is used to help machines understand and interpret human language.
• Essentially, it is the automatic manipulation of natural language, such as speech and text, by software for further analysis to extract the required information.
• This enables machines to process human language in the form of text or voice data.
• It cannot be applied to a single paragraph alone, because it requires a larger amount of text data.
LITERATURE REVIEW
• Vishnu Preethi K, Vijaya MS, 16 April 2018, "Text Summarizers for Education News Articles": In this paper, a performance analysis was carried out, finding that the well-known TextRank algorithm, which had not been used much in text summarization research, produces improved results for both datasets.
• Termite: Visualization Techniques for Assessing Textual Topic Models, Jason Chuang, Christopher D. Manning, Jeffrey Heer, Advanced Visual Interfaces, 2012: In this paper, they examine document-topic probabilities, and also focus on understanding terms and term-topic distributions.
• Carson Sievert and Kenneth Shirley. 2014. LDAvis: A method for visualizing and interpreting topics. In Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, pages 63–70, Baltimore, Maryland, USA. Association for Computational Linguistics: In this paper, they explain LDAvis, a web-based interactive visual representation for examining topic-term relations, implemented as the R package "LDAvis".
PROJECT OBJECTIVE
• To summarize text documents using NLP and compare the results of the algorithms implemented.
DATASET
ALGORITHMS IMPLEMENTED
• TextRank Algorithm
• LexRank Algorithm
TEXT SUMMARIZATION
• Text summarization is the process of shortening the number of sentences and words in a document without changing its meaning.
• There are various methods to extract information from raw text data and use it for a summarization model; generally they can be categorized as extractive and abstractive.
STEPS IN TEXT SUMMARIZATION
• Text cleaning
• Sentence tokenization
• Word tokenization
• Summarization
TEXT CLEANING
• Removing Punctuations
• Removing Numbers, Extra Cases
• Removing HTML Tags
• Removing & Finding URL
• Removing & Finding Email ID
• Removing Stop Words
• Spell Check
• Remove the less frequent words
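A minimal sketch of these cleaning steps in Python (assumptions: NLTK's stopword list has been downloaded; the regular expressions are illustrative, not exhaustive; spell checking and rare-word filtering are omitted for brevity):

import re
import string
from nltk.corpus import stopwords  # needs nltk.download("stopwords") once

def clean_text(text):
    text = re.sub(r"<[^>]+>", " ", text)                # remove HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # find & remove URLs
    text = re.sub(r"\S+@\S+", " ", text)                # find & remove email IDs
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    text = re.sub(r"\d+", " ", text)                    # remove numbers
    words = text.lower().split()                        # normalize case, drop extra spaces
    stop_words = set(stopwords.words("english"))
    words = [w for w in words if w not in stop_words]   # remove stop words
    return " ".join(words)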
SENTENCE AND WORD TOKENIZATION
• WORD TOKENIZATION:-
• Word tokenization is the process of splitting a large sample of text into words.
• Each word is captured and subjected to additional analysis, such as classifying and counting words for a specific sentiment and so on.
• SENTENCE TOKENIZATION:-
• Sentence tokenization is the process of splitting text into individual sentences.
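A minimal sketch of both steps with NLTK (assuming the punkt tokenizer data has been downloaded; the sample text is a placeholder):

from nltk.tokenize import sent_tokenize, word_tokenize  # needs nltk.download("punkt") once

text = "Text summarization shortens a document. It keeps the meaning intact."
sentences = sent_tokenize(text)  # ['Text summarization shortens a document.', 'It keeps the meaning intact.']
words = word_tokenize(text)      # ['Text', 'summarization', 'shortens', 'a', 'document', '.', 'It', ...]
print(sentences)
print(words)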
TEXTRANK ALGORITHM
• TextRank (2004) is an unsupervised graph-based ranking model for text processing, based on Google's PageRank algorithm.
• First the whole text is split into sentences; then the algorithm builds a graph in which sentences are the nodes and word overlap between sentences forms the edges (links).
• Finally, it identifies the most important nodes of this network and selects those sentences for the summary.
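A minimal sketch of applying TextRank through the sumy library, one of several libraries offering a TextRank implementation (the input text is a placeholder):

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.text_rank import TextRankSummarizer

document = "..."  # placeholder: the raw article text goes here
parser = PlaintextParser.from_string(document, Tokenizer("english"))
summarizer = TextRankSummarizer()
for sentence in summarizer(parser.document, 3):  # keep the 3 highest-ranked sentences
    print(sentence)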
LEXRANK ALGORITHM
• LexRank is an unsupervised graph based approach for text summarization in which the scoring of
sentences is done using the graph method.
• The main idea is that sentences "suggest" other similar sentences to the reader.
• Ex: This is an example of the article. This is the second example sentence. This is the third sentence, which is the most important because it says the other sentences are just examples.
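A matching sketch for LexRank, again via sumy (the example text above is reused as a placeholder):

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

document = ("This is an example of the article. This is the second example sentence. "
            "This is the third sentence, which is the most important.")
parser = PlaintextParser.from_string(document, Tokenizer("english"))
summarizer = LexRankSummarizer()
for sentence in summarizer(parser.document, 1):  # keep the single most central sentence
    print(sentence)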
LEXRANK SCORES
• Cosine Similarity
• Adjacency Matrix
• Connectivity Matrix
• Eigenvector Centrality
These scores define the similarity matrix of classical and continuous LexRank.
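A rough sketch of how these pieces combine, under the assumption of TF-IDF sentence vectors: cosine similarity fills the similarity matrix, thresholding it gives the connectivity (adjacency) matrix of classical LexRank, and power iteration approximates the eigenvector centrality scores (the 0.1 threshold and the three toy sentences are illustrative):

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = ["This is an example of the article.",
             "This is the second example sentence.",
             "This is the third sentence, which is the most important."]

tfidf = TfidfVectorizer().fit_transform(sentences)
similarity = cosine_similarity(tfidf)          # cosine similarity matrix

adjacency = (similarity > 0.1).astype(float)   # connectivity/adjacency matrix (classical LexRank)
transition = adjacency / adjacency.sum(axis=1, keepdims=True)  # row-normalize to a Markov matrix

scores = np.ones(len(sentences)) / len(sentences)
for _ in range(50):                            # power iteration approximates eigenvector centrality
    scores = transition.T @ scores
print(scores)                                  # higher score = more central sentence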
[Output comparison: LexRank vs. TextRank summaries]
ROUGE
• The ROUGE metric is used to measure the performance of automatic summarization and machine translation tasks.
• ROUGE-N measures the number of matching ‘n-grams’ between our model-generated text and a
‘reference’.
• A unigram (1-gram) would consist of a single word. A bigram (2-gram) consists of two consecutive
words and so on.
RECALL, PRECISION & F1 SCORE
Recall:-
• Recall ensures our model captures all of the information contained in the reference.
• Recall counts the number of overlapping n-grams found in both the model output and the reference, then divides this number by the total number of n-grams in the reference.
Precision:-
• Precision is calculated in almost the exact same way, but rather than dividing by the reference n-gram
count, we divide by the model n-gram count.
F1 Score:-
• The F1 score is (2 × Recall × Precision) divided by (Recall + Precision).
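A minimal sketch computing ROUGE-1 recall, precision, and F1 by hand (clipped unigram counting; a real evaluation would typically use a library such as rouge-score, and the two sentences are placeholders):

def rouge_n(candidate, reference, n=1):
    def ngrams(text, n):
        tokens = text.lower().split()
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum(min(cand.count(g), ref.count(g)) for g in set(cand))  # clipped overlap count
    recall = overlap / len(ref)         # overlap / n-grams in reference
    precision = overlap / len(cand)     # overlap / n-grams in model output
    f1 = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return recall, precision, f1

print(rouge_n("the cat sat on the mat", "the cat is on the mat"))  # (0.833..., 0.833..., 0.833...)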
ROUGE FOR TEXTRANK
• R – Recall
• P – Precision
• F – F1 Score
ROUGE FOR LEXRANK
• R – Recall
• P – Precision
• F – F1 Score
LSA (LATENT SEMANTIC ANALYSIS)
• LSA (Latent Semantic Analysis), also known as LSI (Latent Semantic Indexing), uses the bag-of-words (BoW) model, which results in a term-document matrix (the occurrence of terms in each document).
• Rows represent terms and columns represent documents. LSA learns latent topics by performing a matrix decomposition on this matrix using singular value decomposition (SVD).
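A minimal sketch of LSA with scikit-learn, using TF-IDF weights as a common variant of the BoW matrix described above (the toy documents and the choice of 2 components are illustrative):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["stock markets fell sharply today",
        "the new phone has a faster processor",
        "investors worry about rising interest rates"]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)            # document-term matrix
svd = TruncatedSVD(n_components=2)            # SVD uncovers 2 latent topics
doc_topics = svd.fit_transform(X)             # document-topic weights

terms = vectorizer.get_feature_names_out()
for i, component in enumerate(svd.components_):         # top terms per latent topic
    top = [terms[j] for j in component.argsort()[-3:]]
    print(f"topic {i}: {top}")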
TEXT CLASSIFICATION VS TOPIC MODELING
https://www.datacamp.com/tutorial/discovering-hidden-topics-python
• Text classification is a supervised machine learning problem, where a text document or article is classified into a pre-defined set of classes. Topic modeling is the process of discovering groups of co-occurring words in text documents.
• Topic modeling can be used to help solve the text classification problem. Topic modeling identifies the topics present in a document, while text classification classifies the text into a single class.
IMPLEMENTING LSA
Here, only the tech-related news articles appear to have a wider spread, whereas the other news articles are nicely clustered.
• LDA is a generative probabilistic model that assumes each topic is a mixture over an underlying set of words, and each document is a mixture over a set of topic probabilities.
IMPLEMENTING LDA
• Loading data
• Data cleaning
• EDA
• Preparing data for LDA analysis
• LDA model training
• Analyzing LDA model results
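A condensed sketch of the "preparing data" and "model training" steps with gensim (the toy token lists stand in for the cleaned, tokenized dataset; num_topics and passes are illustrative choices):

from gensim import corpora
from gensim.models import LdaModel

docs = [["stock", "market", "investor", "rates"],
        ["phone", "processor", "battery", "screen"],
        ["market", "rates", "economy", "bank"]]   # stand-in for the cleaned, tokenized dataset

dictionary = corpora.Dictionary(docs)               # term <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in docs]  # bag-of-words vector per document
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10, random_state=0)

for topic_id, terms in lda.print_topics():          # inspect the learned topics
    print(topic_id, terms)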
LSA VS LDA
• LSA and LDA take the same input: a bag-of-words matrix. LSA focuses on reducing the matrix dimensionality, while LDA solves the topic modeling problem.
• Both LDA and LSA are unsupervised.
CONCLUSION
• We performed text cleaning and preprocessing steps, then applied word tokenization and sentence tokenization to produce the summary, although the resulting summary was too large. We then implemented LexRank and TextRank on the dataset and obtained F1 measures. Comparing the results, we conclude that LexRank gives an F1 score of 98%. We also implemented LSA and LDA on different datasets (BBC News and NIPS papers).
• We performed EDA and document clustering for LSA, and generated the summarization output by query.
• For LDA, we created a word cloud and LDA visualizations (intertopic distance map via multidimensional scaling, marginal topic distribution, overall term frequency, and estimated term frequency within the selected topic).
• Both LDA and LSA are used for topic summarization, depending on the dataset.
REFERENCES
• [1] Jason Chuang, Christopher D. Manning, Jeffrey Heer. Termite: Visualization Techniques for Assessing Textual Topic Models. Advanced Visual Interfaces, 2012.
• [2] Carson Sievert and Kenneth Shirley. 2014. LDAvis: A method for visualizing and interpreting topics. In Proceedings of the
Workshop on Interactive Language Learning, Visualization, and Interfaces, pages 63–70, Baltimore, Maryland, USA. Association
for Computational Linguistics.
• [3] Vishnu Preethi K, Vijaya MS. Text Summarizers for Education News Articles. 16 April 2018, Invention Journals. https://www.ijesi.org/v7i4(version-2).html
• [4] Alomari, A., Idris, N., Sabri, A. Q. M., & Alsmadi, I. (2022). Deep reinforcement and transfer learning for abstractive text
summarization: A review. Computer Speech & Language, 71, 101276.
• [5] Wazery, Y. M., Saleh, M. E., Alharbi, A., & Ali, A. A. (2022). Abstractive Arabic Text Summarization Based on Deep Learning.
Computational Intelligence and Neuroscience, 2022.
• [6] Laskar, M. T. R., Hoque, E., & Huang, J. X. (2022). Domain Adaptation with Pre-trained Transformers for Query-Focused
Abstractive Text Summarization. Computational Linguistics, 48(2), 279-320.
• [7] Suleiman, D., & Awajan, A. (2022). Multilayer encoder and single-layer decoder for abstractive Arabic text summarization.
Knowledge-Based Systems, 237, 107791.
• [8] Ertam, F., & Aydin, G. (2022). Abstractive text summarization using deep learning with a new Turkish summarization
benchmark dataset. Concurrency and Computation: Practice and Experience, 34(9), e6482.
• [9] Aliakbarpour, H., Manzuri, M. T., & Rahmani, A. M. (2022). Improving the readability and saliency of abstractive text
summarization using combination of deep neural networks equipped with auxiliary attention mechanism. The Journal of
Supercomputing, 78(2), 2528-2555.
• [10] Aggarwal, C. C. (2022). Text summarization. In Machine Learning for Text (pp. 393-418). Springer, Cham.
• [11] Gupta, A., Chugh, D., & Katarya, R. (2022). Automated news summarization using transformers. In Sustainable
Advanced Computing (pp. 249-259). Springer, Singapore.
• [12] Khurana, A., & Bhatnagar, V. (2022). Investigating entropy for extractive document summarization. Expert Systems
with Applications, 187, 115820.
• [13] Zhong, M., Liu, Y., Xu, Y., Zhu, C., & Zeng, M. (2022, June). Dialoglm: Pre-trained model for long dialogue
understanding and summarization. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 10, pp.
11765-11773).
• [14] Ma, C., Zhang, W. E., Guo, M., Wang, H., & Sheng, Q. Z. (2022). Multi-document summarization via deep learning
techniques: A survey. ACM Computing Surveys, 55(5), 1-37.
• [15] Moro, G., & Ragazzi, L. (2022, February). Semantic Self-segmentation for Abstractive Summarization of Long Legal Documents in Low-resource Regimes. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, Virtual (Vol. 22).
• [16] Patil, P., Rao, C., Reddy, G., Ram, R., & Meena, S. M. (2022). Extractive Text Summarization Using BERT. In
Proceedings of the 2nd International Conference on Recent Trends in Machine Learning, IoT, Smart Cities and
Applications (pp. 741-747). Springer, Singapore.
• [17] Mohan, G. B., & Kumar, R. P. (2022). A Comprehensive Survey on Topic Modeling in Text Summarization. Micro-
Electronics and Telecommunication Engineering, 231-240.
FUTURE WORK
• In the future, we can explore other techniques to improve the summaries, such as Natural Language Understanding, Natural Language Generation, Multi-Document Summarization, Personalized Summarization, Cross-Lingual Summarization, and Visual Summarization. These are just a few of the many potential directions for future research and development in text summarization.
THANK YOU