Topic Modelling: Unveiling Hidden Themes in Text
Latent Dirichlet Allocation (LDA): A probabilistic model that assumes documents are generated as mixtures of topics, each characterized by a specific distribution of words.

Non-negative Matrix Factorization (NMF): A linear-algebra-based approach that decomposes documents into parts that can be reconstructed through latent topic representations.
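As a rough formal sketch (the notation below is chosen here for illustration and is not taken from the study), LDA treats the probability of a word in a document as a mixture over K latent topics, while NMF approximately factorizes the non-negative document-term matrix into a document-topic matrix and a topic-term matrix:

```latex
% LDA: word probability in a document as a mixture over K topics
p(w \mid d) = \sum_{k=1}^{K} p(w \mid z = k)\, p(z = k \mid d)

% NMF: document-term matrix V approximated by non-negative factors
% W (document-topic) and H (topic-term)
V \approx W H, \qquad V \in \mathbb{R}_{\ge 0}^{n \times m}, \;
W \in \mathbb{R}_{\ge 0}^{n \times K}, \; H \in \mathbb{R}_{\ge 0}^{K \times m}
```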
Research Questions
1 Coherence and Interpretability
How do LDA and NMF compare in terms of the coherence and interpretability of the generated topics?

2 Computational Efficiency and Scalability
What are the differences in computational efficiency and scalability between LDA and NMF?
2 Data Cleaning
Documents were cleaned to ensure consistency and reduce noise; the cleaning steps included tokenization, lowercase conversion, stop word removal, and lemmatization.
3 Vectorization
The pre-processed documents were vectorized using Count Vectorization for LDA and TF-IDF Vectorization for NMF. A sketch of these two pre-processing steps follows below.
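A minimal sketch of steps 2 and 3, assuming NLTK for cleaning and scikit-learn for vectorization; the toy corpus, library choices, and settings are illustrative assumptions rather than the study's actual pipeline:

```python
# Illustrative sketch of steps 2 and 3 (assumed libraries: NLTK and scikit-learn).
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# One-time downloads for the NLTK resources used below.
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

# A toy corpus standing in for the study's document collection.
docs = [
    "Topic models uncover hidden themes in large text collections.",
    "LDA is a probabilistic model, while NMF factorizes the document-term matrix.",
]

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean(text):
    """Step 2: tokenize, lowercase, drop stop words, lemmatize."""
    tokens = re.findall(r"[a-z]+", text.lower())  # simple tokenization + lowercasing
    tokens = [t for t in tokens if t not in stop_words]
    return " ".join(lemmatizer.lemmatize(t) for t in tokens)

cleaned = [clean(d) for d in docs]

# Step 3: Count Vectorization for LDA, TF-IDF Vectorization for NMF.
count_matrix = CountVectorizer().fit_transform(cleaned)  # raw term counts
tfidf_matrix = TfidfVectorizer().fit_transform(cleaned)  # TF-IDF weights
```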
Topic Modelling Algorithms: LDA and NMF
Each algorithm was configured with its own set of input parameters.
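The study's exact input parameters are not listed in this section, so the following scikit-learn sketch uses illustrative settings; the number of topics, random seed, and NMF initialization are assumptions:

```python
# Illustrative LDA and NMF configuration (assumed settings, not the study's values).
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "Topic models uncover hidden themes in large text collections.",
    "LDA is a probabilistic model, while NMF factorizes the document-term matrix.",
    "Coherence, interpretability, and scalability are common evaluation criteria.",
]

n_topics = 2  # assumed; normally tuned per dataset

# LDA is fitted on raw term counts (vectorization repeated here so the sketch stands alone).
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
lda_doc_topics = lda.fit_transform(counts)  # per-document topic distributions
lda_topic_terms = lda.components_           # per-topic term weights

# NMF is fitted on TF-IDF weights.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
nmf = NMF(n_components=n_topics, init="nndsvd", random_state=0)
nmf_doc_topics = nmf.fit_transform(tfidf)   # W: document-topic matrix
nmf_topic_terms = nmf.components_           # H: topic-term matrix
```

Inspecting the highest-weighted terms in each row of components_ yields the topic keywords behind the interpretability findings that follow.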
Coherence and Interpretability
NMF offers greater interpretability, with topics centered around more specific keywords.

Computational Efficiency
NMF outperforms LDA in terms of computational speed and scalability.
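One way such comparisons can be made is sketched below: the highest-weighted terms per topic give a simple proxy for interpretability, and wall-clock fit time gives a rough proxy for computational efficiency. The benchmark corpus, topic count, and vocabulary cap are illustrative assumptions, not the study's setup.

```python
# Illustrative comparison sketch: topic keywords and fit time (assumed data and settings).
import time

from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import NMF, LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# A public benchmark corpus used purely for illustration.
docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data[:2000]

def top_words(components, feature_names, n=8):
    # Highest-weighted terms per topic: a simple proxy for interpretability.
    return [[feature_names[i] for i in row.argsort()[-n:][::-1]] for row in components]

def timed_fit(model, matrix):
    # Wall-clock fit time: a rough proxy for computational efficiency.
    start = time.perf_counter()
    model.fit(matrix)
    return time.perf_counter() - start

count_vec = CountVectorizer(stop_words="english", max_features=5000)
tfidf_vec = TfidfVectorizer(stop_words="english", max_features=5000)
counts = count_vec.fit_transform(docs)
tfidf = tfidf_vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=10, random_state=0)
nmf = NMF(n_components=10, init="nndsvd", random_state=0)

print("LDA fit seconds:", timed_fit(lda, counts))
print("NMF fit seconds:", timed_fit(nmf, tfidf))
print("LDA topics:", top_words(lda.components_, count_vec.get_feature_names_out()))
print("NMF topics:", top_words(nmf.components_, tfidf_vec.get_feature_names_out()))
```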
Conclusion: Choosing the Right Topic Modelling Approach
The choice between LDA and NMF depends on the specific requirements of the application. LDA is well-suited for
complex datasets with overlapping themes, while NMF is computationally efficient and provides clearer topics for
simpler datasets.
Future Work: Exploring Hybrid Models and Deep Learning
Future research could explore hybrid models that combine the
strengths of LDA and NMF, or integrate deep learning models to
further enhance topic coherence and interpretability. This research
provides valuable insights into the practical considerations
involved in choosing a topic modelling technique, guiding data
scientists and researchers in selecting the most appropriate
approach for their specific data and goals.