Efficient Methods for Incorporating
Knowledge into Topic Models
[Yang, Downey and Boyd-Graber 2015]
2015/10/24
EMNLP 2015 Reading
@shuyo
Large-scale Topic Model
• In academic papers
– Up to 10^3 topics
• Industrial applications
– 10^5~10^6 topics!
– Search engines, online ads, and so on
– To capture infrequent topics
• This paper handles up to 500 topics...
really?
(Standard) LDA
[Blei+ 2003, Griffiths+ 2004]
• "Conventional" Gibbs sampling
$$P(z = t \mid \boldsymbol{z}_-, w) \propto q_t := (n_{d,t} + \alpha)\,\frac{n_{w,t} + \beta}{n_t + V\beta}$$
– $T$ : number of topics
– For $U \sim \mathcal{U}\!\left(0, \sum_{z=1}^{T} q_z\right)$, find $t$ s.t. $\sum_{z=1}^{t-1} q_z < U < \sum_{z=1}^{t} q_z$
• For large $T$, this is computationally intensive: $O(T)$ per token (a sketch of the draw follows below)
– $n_{w,t}$ is sparse
– When $T$ is very large (e.g. $T = 10^6 > n_d$), $n_{d,t}$ is sparse too
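A minimal sketch of this draw (not the paper's code; the dense NumPy count arrays `n_dt`, `n_wt`, `n_t` and the function name are illustrative):

```python
import numpy as np

def sample_topic(d, w, n_dt, n_wt, n_t, alpha, beta, V):
    """One collapsed Gibbs draw for standard LDA: O(T) work per token."""
    # q_t = (n_{d,t} + alpha) * (n_{w,t} + beta) / (n_t + V*beta) for every topic t
    q = (n_dt[d] + alpha) * (n_wt[w] + beta) / (n_t + V * beta)
    U = np.random.uniform(0.0, q.sum())            # U ~ Uniform(0, sum_z q_z)
    return int(np.searchsorted(np.cumsum(q), U))   # first t whose cumulative mass exceeds U
```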
SparseLDA [Yao+ 2009]
$$P(z = t \mid \boldsymbol{z}_-, w) \propto \underbrace{\frac{\alpha\beta}{n_t + V\beta}}_{s_t} + \underbrace{\frac{n_{d,t}\,\beta}{n_t + V\beta}}_{r_t} + \underbrace{\frac{(n_{d,t} + \alpha)\,n_{w,t}}{n_t + V\beta}}_{q_t}$$
– $s_t$ is independent of $w$ and $d$; $r_t$ depends on $d$ only
• $s = \sum_t s_t$, $r = \sum_t r_t$, $q = \sum_t q_t$
• For $U \sim \mathcal{U}(0, s + r + q)$:
– If $0 < U < s$, find $t$ s.t. $\sum_{z=1}^{t-1} s_z < U < \sum_{z=1}^{t} s_z$
– If $s < U < s + r$, find $t$ with $n_{d,t} > 0$ s.t. $\sum_{z=1}^{t-1} r_z < U - s < \sum_{z=1}^{t} r_z$
– If $s + r < U < s + r + q$, find $t$ with $n_{w,t} > 0$ s.t. $\sum_{z=1}^{t-1} q_z < U - s - r < \sum_{z=1}^{t} q_z$
• Faster because $n_{w,t}$ and $n_{d,t}$ are sparse (a sketch follows below)
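A hedged sketch of the three-bucket draw, using the same illustrative arrays as above. It is written densely for clarity; the actual speedup comes from caching $s$ globally, updating $r$ per document, and iterating $q$ only over the topics where $n_{w,t} > 0$:

```python
import numpy as np

def sparse_lda_sample(d, w, n_dt, n_wt, n_t, alpha, beta, V):
    """SparseLDA-style draw: split the total mass into s + r + q buckets."""
    denom = n_t + V * beta
    s_t = alpha * beta / denom                  # smoothing bucket: global, cacheable
    r_t = n_dt[d] * beta / denom                # document bucket: sparse in t
    q_t = (n_dt[d] + alpha) * n_wt[w] / denom   # topic-word bucket: sparse in t
    s, r, q = s_t.sum(), r_t.sum(), q_t.sum()
    U = np.random.uniform(0.0, s + r + q)
    if U < s:
        return int(np.searchsorted(np.cumsum(s_t), U))
    elif U < s + r:                             # only topics with n_dt[d, t] > 0 matter
        return int(np.searchsorted(np.cumsum(r_t), U - s))
    else:                                       # only topics with n_wt[w, t] > 0 matter
        return int(np.searchsorted(np.cumsum(q_t), U - s - r))
```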
Leveraging Prior Knowledge
• The objective function of topic models
does not correlate with human
judgements
Word correlation prior
knowledge
• Must-link
– “quarterback” and “fumble” are both
related to American football
• Cannot-link
– “fumble” and “bank” imply two different
topics
SC-LDA [Yang+ 2015]
• 𝑚 ∈ 𝑀 : Prior knowledge
• 𝑓𝑚(𝑧, 𝑤, 𝑑) : Potential function of prior
knowledge 𝑚 about word 𝑤 with topic
𝑧 in document 𝑑
• $\psi(\boldsymbol{z}, M) = \prod_{z \in \boldsymbol{z}} \exp f_m(z, w, d)$
• $P(\boldsymbol{w}, \boldsymbol{z} \mid \alpha, \beta, M) = P(\boldsymbol{w} \mid \boldsymbol{z}, \beta)\, P(\boldsymbol{z} \mid \alpha)\, \psi(\boldsymbol{z}, M)$
– maybe $\propto$ rather than $=$; maybe the product should run over $m \in M$ and all $w$ with $z$ in all $d$
– "SC" stands for Sparse Constrained
Inference for SC-LDA
(The slide's algorithm figure did not survive the text extraction.)
Word correlation prior
knowledge for SC-LDA
• $f_m(z, w, d) = \sum_{u \in M_w^m} \log \max(\lambda, n_{u,z}) + \sum_{v \in M_w^c} \log \frac{1}{\max(\lambda, n_{v,z})}$
– where $M_w^m$ : must-links of $w$, $M_w^c$ : cannot-links of $w$
• $P(z = t \mid \boldsymbol{z}_-, w, M) \propto \left[ \frac{\alpha\beta}{n_t + V\beta} + \frac{n_{d,t}\,\beta}{n_t + V\beta} + \frac{(n_{d,t} + \alpha)\,n_{w,t}}{n_t + V\beta} \right] \prod_{u \in M_w^m} \max(\lambda, n_{u,t}) \prod_{v \in M_w^c} \frac{1}{\max(\lambda, n_{v,t})}$
(a sketch of this draw follows below)
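A sketch of this update, under the same illustrative names as the earlier snippets; `must` and `cannot` are assumed dicts mapping each word id to its link lists, and `lam` is the $\lambda$ floor from the potential:

```python
import numpy as np

def sc_lda_sample(d, w, n_dt, n_wt, n_t, alpha, beta, V, lam, must, cannot):
    """SC-LDA word-correlation draw: the standard LDA weight times exp(f_m)."""
    denom = n_t + V * beta
    # s_t + r_t + q_t collected over a common denominator
    weight = (alpha * beta + n_dt[d] * beta + (n_dt[d] + alpha) * n_wt[w]) / denom
    for u in must.get(w, ()):      # boost topics where must-linked words are already frequent
        weight = weight * np.maximum(lam, n_wt[u])
    for v in cannot.get(w, ()):    # suppress topics shared with cannot-linked words
        weight = weight / np.maximum(lam, n_wt[v])
    U = np.random.uniform(0.0, weight.sum())
    return int(np.searchsorted(np.cumsum(weight), U))
```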
Factor Graph
• The paper says prior knowledge is incorporated "by adding a factor graph to encode prior knowledge," but the factor graph is never actually drawn.
• The potential function $f_m(z, w, d)$ contains $n_{w,z}$, and $\varphi_{w,z} \propto n_{w,z} + \beta$.
• So the model seems to be the one in Fig. b:
(Fig. a and Fig. b: two candidate factor-graph structures; the images did not survive the extraction.)
[Ramage+ 2009] Labeled LDA
• Supervised LDA for labeled documents
– It is equivalent to SC-LDA with the following potential function (see the sketch below):
$$f_m(z, w, d) = \begin{cases} 1 & \text{if } z \in m_d \\ -\infty & \text{otherwise} \end{cases}$$
where $m_d$ is the label set of document $d$
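As a sketch under the same assumptions, this potential simply zeroes out topics outside the document's label set, since $\exp(-\infty) = 0$ (the constant $\exp(1)$ on allowed topics cancels on normalization):

```python
import numpy as np

def f_labeled(z, d, labels):
    """f_m(z, w, d) = 1 if topic z is in d's label set m_d, else -inf."""
    return 1.0 if z in labels[d] else -np.inf

def label_mask(T, d, labels):
    """exp(f_m) per topic: e for allowed topics, 0.0 for forbidden ones."""
    return np.exp(np.array([f_labeled(z, d, labels) for z in range(T)]))
```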
Experiments
• Baselines
– Dirichlet Forest-LDA [Andrzejewski+ 2009]
– Logic-LDA [Andrzejewski+ 2011]
– MRF-LDA [Xie+ 2015]
• Encodes word correlations in LDA as MRF
– SparseLDA
DATASET     DOCS        TYPES     TOKENS (APPROX.)   EXPERIMENTS
NIPS        1,500       12,419    1,900,000          Word correlation
NYT-NEWS    3,000,000   102,660   100,000,000        Word correlation
20NG        18,828      21,514    1,946,000          Labeled docs
Generate Word Correlation
• Must-link
– Obtain synsets from WordNet 3.0
– A pair becomes a must-link when the word2vec-embedding similarity between the word and a member of its synset exceeds a threshold of 0.2 (see the sketch below)
• Cannot-link
– Nothing? (the paper does not seem to describe how cannot-links are generated)
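One possible reconstruction of the must-link step, assuming NLTK's WordNet interface and pretrained word2vec vectors loaded with gensim (the model path is hypothetical; the 0.2 threshold is from the slide):

```python
from nltk.corpus import wordnet as wn
from gensim.models import KeyedVectors

# Hypothetical pretrained word2vec model file
kv = KeyedVectors.load_word2vec_format("word2vec.bin", binary=True)

def must_links(word, threshold=0.2):
    """Pair `word` with synset members whose embedding similarity exceeds the threshold."""
    links = set()
    for synset in wn.synsets(word):            # WordNet 3.0 synsets of the word
        for lemma in synset.lemma_names():
            if lemma != word and word in kv and lemma in kv:
                if kv.similarity(word, lemma) > threshold:
                    links.add((word, lemma))
    return links
```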
Convergence Speed
(Figure) The average running time per iteration over 100 iterations, averaged over 5 seeds, on the 20NG dataset.
Coherence [Mimno+ 2011]
• $C(t; V^{(t)}) = \sum_{m=2}^{M} \sum_{l=1}^{m-1} \log \dfrac{F(v_m^{(t)}, v_l^{(t)}) + \epsilon}{F(v_l^{(t)})}$
– $F(v)$ : document frequency of word type $v$
– $F(v, v')$ : co-document frequency of word types $v$ and $v'$ — does it mean documents that "include" both?
– $\epsilon$ is very small, e.g. $10^{-12}$ [Röder+ 2015]
(Figure: coherence results; only the scores −39.1 and −36.6 survive the extraction. A sketch of the score follows below.)
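A small sketch of computing this score for one topic, given its top $M$ words and precomputed document frequencies; the names `df` (word → document frequency) and `co_df` (word pair → co-document frequency) are illustrative:

```python
import math

def coherence(top_words, df, co_df, eps=1e-12):
    """[Mimno+ 2011]: sum over word pairs of log((F(v_m, v_l) + eps) / F(v_l))."""
    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            v_m, v_l = top_words[m], top_words[l]
            co = co_df.get(frozenset((v_m, v_l)), 0)   # F(v_m, v_l)
            score += math.log((co + eps) / df[v_l])    # eps avoids log(0)
    return score
```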
References
• [Yang+ 2015] Efficient methods for incorporating knowledge into topic models.
• [Blei+ 2003] Latent Dirichlet allocation.
• [Griffiths+ 2004] Finding scientific topics.
• [Yao+ 2009] Efficient methods for topic model inference on streaming document
collections.
• [Ramage+ 2009] Labeled LDA: A supervised topic model for credit attribution in
multilabeled corpora.
• [Andrzejewski+ 2009] Incorporating domain knowledge into topic modeling via
Dirichlet forest priors.
• [Andrzejewski+ 2011] A framework for incorporating general domain knowledge
into latent Dirichlet allocation using first-order logic.
• [Xie+ 2015] Incorporating word correlation knowledge into topic modeling.
• [Mimno+ 2011] Optimizing semantic coherence in topic models.
• [Röder+ 2015] Exploring the space of topic coherence measures.