
Text Analysis using Topic Modeling

Speaker Introduction
● Educational Background
○ Bachelor of Mathematics, Universitas Indonesia 2012-2017

● Working Experience
○ Backend Engineer at Dattabot (2016-2017)
○ Data Analyst at Home Credit Indonesia (2018)
○ Senior Data Engineer at Detik.com (2018-2023)
Outline
● Introduction to Topic Modelling
● Topic Modelling Application
● Document Representation
● Latent Dirichlet Allocation
● LDA Intuition and Idea
● LDA Process
● Tools and Framework
Introduction to Topic Modelling
● Topic Modelling is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents.
● Topic Modelling aims to group articles by the similarity of their "topics" without knowing the "topic" labels in advance.
● Topic Modelling can therefore be classified as an unsupervised method, essentially a clustering method for documents or articles.
What Are "Topics"?
Each group of words below forms one "topic" (examples drawn from Indonesian news articles):

● Joko Widodo, pemerintahan, Luhut Binsar Pandjaitan, Anies Baswedan, pemilu, koalisi
● Messi, Ronaldo, Liga Inggris, Inter Milan, gol, pinalti
● iPhone, ChatGPT, prosesor, deep learning, artificial intelligence, Javascript
● polisi, tersangka, begal, pasal, korban, penipuan
● Marvel, Blackpink, Last of Us, Isyana, konser, syuting
● resesi, inflasi, neraca dagang, investasi, suku bunga, Bank Indonesia
Text Representation
● Documents or texts are not quantitative values, so they must be converted into numbers before a computer can calculate or measure anything about them.
● Common ways to quantify text are the frequencies of words (bag-of-words counts) or the probabilities of the words themselves, as sketched below.
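
As a minimal sketch (my own illustration, not taken from the slides), a frequency-based representation can be built with scikit-learn's CountVectorizer; the toy documents are hypothetical:

from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy documents (not from the slides)
docs = [
    "love heart soul tears joy",
    "scientific knowledge work research mathematics",
    "mathematics research work scientific",
]

# Bag-of-words: each document becomes a vector of word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)          # sparse document-term matrix

print(vectorizer.get_feature_names_out())   # vocabulary columns (scikit-learn >= 1.0)
print(X.toarray())                          # word frequencies per document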
Topic Modelling Intuition
● Idea:
○ A "topic" is a label for a document that captures a consistent pattern of words.
○ Documents are mixtures of topics, and a topic is a probability distribution over words.

● Aim:
○ Discover patterns of word use and connect documents that exhibit similar patterns.
Latent Dirichlet Allocation (LDA)
● LDA is a generative probabilistic model: each document is modelled as a mixture of topics, each topic as a probability distribution over words, and Dirichlet priors are placed on both distributions.
LDA Formula
Let θ_i be the topic distribution for document i,
α be the parameter of the Dirichlet prior on the per-document topic distributions,
β be the parameter of the Dirichlet prior on the per-topic word distributions.

The probability that word w in document d is generated by topic t is

p(word w with topic t) = p(topic t | document d) * p(word w | topic t)

and summing over topics gives the probability of the word in the document:

p(word w | document d) = Σ_t p(topic t | document d) * p(word w | topic t)

(A simulation of this generative process is sketched below.)
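
As a minimal illustration (my own sketch, with assumed vocabulary, topic-word distributions, and α), the generative process behind this formula can be simulated with NumPy: draw a topic mixture from a Dirichlet prior, then draw a topic and a word for each position.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary and per-topic word distributions (each row sums to 1)
vocab = ["heart", "love", "soul", "scientific", "mathematics"]
phi = np.array([
    [0.4, 0.4, 0.2, 0.0, 0.0],   # topic 0: "emotion" words
    [0.0, 0.0, 0.0, 0.5, 0.5],   # topic 1: "science" words
])
alpha = [0.5, 0.5]               # Dirichlet prior on the per-document topic mixture

def generate_document(n_words=8):
    theta = rng.dirichlet(alpha)              # p(topic | document)
    words = []
    for _ in range(n_words):
        t = rng.choice(len(alpha), p=theta)   # pick a topic for this position
        w = rng.choice(len(vocab), p=phi[t])  # pick a word from that topic
        words.append(vocab[w])
    return theta, words

theta, doc = generate_document()
print("topic mixture:", theta.round(2))
print("document:", " ".join(doc))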


LDA
Assume there are two topics, and every word belongs entirely to one of them:

word          P(word | topic = 1)   P(word | topic = 2)
Heart         0.2                   0
Love          0.2                   0
Soul          0.2                   0
Tears         0.2                   0
Joy           0.2                   0
Scientific    0                     0.2
Knowledge     0                     0.2
Work          0                     0.2
Research      0                     0.2
Mathematics   0                     0.2
LDA
Each example document below is a mixture of the two topics; the pair on the right is its topic distribution {P(topic=1), P(topic=2)}. A small numeric sketch follows the table.

● MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK → {0, 1}
● SCIENTIFIC KNOWLEDGE MATHEMATICS SCIENTIFIC HEART LOVE TEARS KNOWLEDGE HEART → {0.25, 0.75}
● MATHEMATICS HEART RESEARCH LOVE MATHEMATICS WORK TEARS SOUL KNOWLEDGE HEART → {0.5, 0.5}
● WORK JOY SOUL TEARS MATHEMATICS TEARS LOVE LOVE LOVE SOUL → {0.75, 0.25}
● TEARS LOVE JOY SOUL LOVE TEARS SOUL SOUL TEARS JOY → {1, 0}
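
To connect the formula with these tables, here is a minimal sketch (my own illustration) that computes p(word | document) = Σ_t p(topic t | document) * p(word | topic t) for the 50/50 document above:

# Per-topic word distributions from the first table
p_word_given_topic = {
    "topic1": {"heart": 0.2, "love": 0.2, "soul": 0.2, "tears": 0.2, "joy": 0.2},
    "topic2": {"scientific": 0.2, "knowledge": 0.2, "work": 0.2,
               "research": 0.2, "mathematics": 0.2},
}

# Topic mixture of the third example document: {P(topic=1), P(topic=2)} = {0.5, 0.5}
p_topic_given_doc = {"topic1": 0.5, "topic2": 0.5}

vocab = set(p_word_given_topic["topic1"]) | set(p_word_given_topic["topic2"])

# p(word | document) = sum over topics of p(topic | document) * p(word | topic)
for word in sorted(vocab):
    p = sum(p_topic_given_doc[t] * p_word_given_topic[t].get(word, 0.0)
            for t in p_topic_given_doc)
    print(f"{word:12s} {p:.2f}")   # every word comes out as 0.5 * 0.2 = 0.10 here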
Tools and Framework
● Python 3.6+
● Pandas
● Sklearn
● Seaborn/Matplotlib
● Gensim
● pyLDAvis (requires pandas <= 1.2)
Implementation
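The implementation itself is not included in these slides. As a hedged sketch of what it could look like with the tools listed above (the toy corpus and the parameter choices below are my own assumptions), Gensim can fit the model:

from gensim import corpora
from gensim.models import LdaModel

# Hypothetical toy corpus (already tokenized); a real corpus would be
# preprocessed articles, e.g. after stopword removal and stemming.
texts = [
    ["mathematics", "knowledge", "research", "work", "scientific"],
    ["heart", "love", "soul", "tears", "joy"],
    ["mathematics", "research", "work", "knowledge", "scientific"],
    ["love", "tears", "soul", "joy", "heart"],
]

# Map each token to an integer id, then represent documents as bags of words
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Fit LDA with 2 topics (num_topics, passes, and alpha are assumed values)
lda = LdaModel(corpus=corpus, id2word=dictionary,
               num_topics=2, passes=20, alpha="auto", random_state=0)

# Per-topic word distributions
for topic_id, words in lda.print_topics(num_words=5):
    print(topic_id, words)

# Topic mixture of the first document
print(lda.get_document_topics(corpus[0]))

Depending on the pyLDAvis version, pyLDAvis.gensim or pyLDAvis.gensim_models provides a prepare() helper that turns the fitted model, corpus, and dictionary into an interactive visualization.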
Reference
● https://www.geeksforgeeks.org/latent-dirichlet-allocation/
● https://socs.binus.ac.id/2018/11/29/latent-dirichlet-allocation-lda/
● https://towardsdatascience.com/latent-dirichlet-allocation-lda-9d1cd064ffa2
● https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf
