Topic Modelling: Unveiling Hidden Themes in Text

Topic Modelling to Discover Underlying Themes in Documents
Topic modelling is a powerful technique used in natural language
processing (NLP) to uncover hidden topics within a collection of
documents. This presentation explores two prominent topic
modelling approaches: Latent Dirichlet Allocation (LDA) and Non-
Negative Matrix Factorization (NMF), comparing their strengths and
weaknesses in generating coherent and interpretable topics.
The Problem of Unstructured Text Data

Exponential Growth
The volume of unstructured text data is rapidly increasing across various domains, making it challenging to extract meaningful insights.

Identifying Patterns
Organizations and researchers need to identify meaningful patterns and themes within large collections of documents.

Topic Modelling as a Solution


Topic modelling provides a solution by revealing hidden topics
within unstructured data, offering valuable insights into the
underlying themes.
LDA and NMF: Two Approaches to Topic
Modelling
Latent Dirichlet Allocation (LDA)
A probabilistic model that assumes documents are generated as mixtures of topics, each characterized by a specific distribution over words.

Non-Negative Matrix Factorization (NMF)
A linear-algebra-based approach that decomposes the document-term matrix into non-negative parts that can be reconstructed through latent topic representations.
Research Questions
1 Coherence and Interpretability
How do LDA and NMF compare in terms of the coherence and interpretability of the generated topics?

2 Computational Efficiency and Scalability
What are the differences in computational efficiency and scalability between LDA and NMF?

3 Topic Separation and Interpretability
Which algorithm better separates topics and is more interpretable on a given dataset of texts?
Methodology: Data Collection and Preprocessing
1 Data Collection
A sample of multiple short documents belonging to different thematic categories (sports, technology,
general knowledge) was created.

2 Data Cleaning
Documents were cleaned to ensure consistency and reduce noise, using tokenization, lowercase conversion, stop word removal, and lemmatization.

3 Vectorization
The pre-processed documents were vectorized using Count Vectorization for LDA and TF-IDF Vectorization for NMF (a sketch of these steps follows below).
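A minimal preprocessing and vectorization sketch, assuming NLTK and scikit-learn; the document list here is a hypothetical stand-in for the actual corpus, which is not reproduced in the slides:

# Preprocessing and vectorization sketch (hypothetical corpus; NLTK and scikit-learn assumed)
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

docs = [
    "The team won the football match in the final minute.",
    "New smartphones ship with faster processors and better cameras.",
    "The encyclopedia covers world history, geography, and science.",
]

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocess(text):
    # Tokenize, lowercase, drop stop words and non-alphabetic tokens, then lemmatize
    tokens = nltk.word_tokenize(text.lower())
    return " ".join(lemmatizer.lemmatize(t) for t in tokens if t.isalpha() and t not in stop_words)

cleaned = [preprocess(d) for d in docs]

# Count vectors feed LDA; TF-IDF vectors feed NMF
count_vectorizer = CountVectorizer()
count_matrix = count_vectorizer.fit_transform(cleaned)
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(cleaned)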
Topic Modelling Algorithms: LDA and NMF
Algorithm | Input | Parameters
LDA | Count-vectorized document-term matrix | Alpha (document-topic density), Beta (topic-word density)
NMF | TF-IDF-vectorized document-term matrix | Number of topics, initialization method, regularization settings
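A hedged sketch of fitting both models with scikit-learn, continuing from the vectorized matrices above; the parameter values shown are illustrative assumptions, not values reported in the slides:

# Fit LDA on count vectors and NMF on TF-IDF vectors (parameter values are illustrative)
from sklearn.decomposition import LatentDirichletAllocation, NMF

n_topics = 3  # assumed to match the three thematic categories

# doc_topic_prior corresponds to alpha, topic_word_prior to beta in the slide's terminology
lda = LatentDirichletAllocation(
    n_components=n_topics,
    doc_topic_prior=0.1,
    topic_word_prior=0.01,
    random_state=0,
)
lda_doc_topics = lda.fit_transform(count_matrix)

# init selects the initialization method; alpha_W / alpha_H control regularization strength
nmf = NMF(
    n_components=n_topics,
    init="nndsvd",
    alpha_W=0.0,
    alpha_H="same",
    random_state=0,
)
nmf_doc_topics = nmf.fit_transform(tfidf_matrix)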
Evaluation Metrics: Assessing Topic Quality

Topic Coherence
Measures the semantic similarity of the top words within each topic, indicating the interpretability of the topics.

Perplexity (for LDA)
Measures the model's ability to represent the data, with lower perplexity indicating a better fit.

Human Interpretability
Evaluates the clarity and relevance of the topics based on human judgment.

Computational Efficiency
Measures the time taken for the model to converge and memory usage during training.
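A rough sketch of how perplexity, coherence, and fit time could be measured, reusing the objects above; scikit-learn's perplexity method is used for LDA, while gensim's CoherenceModel is an assumed choice for coherence (the slides do not name a library):

# Perplexity, topic coherence, and fit-time sketch (gensim usage is an assumption)
import time
from gensim.corpora import Dictionary
from gensim.models.coherencemodel import CoherenceModel

# Perplexity of the fitted LDA model on the count matrix: lower is better
print("LDA perplexity:", lda.perplexity(count_matrix))

# Extract the top words of each LDA topic and score them with c_v coherence
vocab = count_vectorizer.get_feature_names_out()
top_words = [[vocab[i] for i in comp.argsort()[-10:]] for comp in lda.components_]

tokenized = [doc.split() for doc in cleaned]
coherence = CoherenceModel(
    topics=top_words,
    texts=tokenized,
    dictionary=Dictionary(tokenized),
    coherence="c_v",
).get_coherence()
print("LDA c_v coherence:", coherence)

# Computational efficiency: wall-clock time to refit each model
start = time.perf_counter()
lda.fit(count_matrix)
print("LDA fit time (s):", time.perf_counter() - start)

start = time.perf_counter()
nmf.fit(tfidf_matrix)
print("NMF fit time (s):", time.perf_counter() - start)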
Results and Analysis:
Comparing LDA and NMF
LDA
Produces more coherent topics, capturing the
probabilistic nature of word distribution across topics.

NMF
Offers greater interpretability, with topics centered
around more specific keywords.

Computational Efficiency
NMF outperforms LDA in terms of computational speed
and scalability.
Conclusion: Choosing the Right Topic
Modelling Approach
The choice between LDA and NMF depends on the specific requirements of the application. LDA is well-suited for
complex datasets with overlapping themes, while NMF is computationally efficient and provides clearer topics for
simpler datasets.
Future Work: Exploring
Hybrid Models and Deep
Learning
Future research could explore hybrid models that combine the
strengths of LDA and NMF, or integrate deep learning models to
further enhance topic coherence and interpretability. This research
provides valuable insights into the practical considerations
involved in choosing a topic modelling technique, guiding data
scientists and researchers in selecting the most appropriate
approach for their specific data and goals.
