constructing aspect-based sentiment lexicons with topic modeling

Elena Tutubalina¹ and Sergey I. Nikolenko¹,²,³,⁴

¹ Kazan (Volga Region) Federal University, Kazan, Russia
² Steklov Institute of Mathematics at St. Petersburg
³ National Research University Higher School of Economics, St. Petersburg
⁴ Deloitte Analytics Institute, Moscow, Russia

April 7, 2016
intro: topic modeling and sentiment analysis
overview
• Very brief overview of the paper:
• we would like to do sentiment analysis;
• there are topic model extensions that deal with sentiment;
• but they always rely on an external dictionary of sentiment words;
• in this work, we show a way to extend this dictionary automatically
from that same topic model.
opinion mining
• Sentiment analysis / opinion mining:
• traditional approaches set positive/negative labels by hand;
• recently, machine learning models have been trained to assign sentiment scores to most words in a corpus;
• however, they can’t really work totally unsupervised, and
high-quality manual annotation is expensive;
• moreover, reviews discuss different aspects, and sentiment can be aspect-specific.
• Problem: automatically mine sentiment lexicons for specific
aspects.
topic modeling with lda
• Latent Dirichlet Allocation (LDA) – topic modeling for a corpus of
texts:
• a document is represented as a mixture of topics;
• a topic is a distribution over words;
• to generate a document, for each word we sample a topic and
then sample a word from that topic;
• by learning these distributions, we learn what topics appear in a
dataset and in which documents.
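As a quick illustration, here is a minimal sketch of this generative process in Python (toy sizes and hyperparameter values are our own, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
T, W = 5, 1000           # number of topics and vocabulary size (toy values)
alpha, beta = 0.1, 0.01  # symmetric Dirichlet hyperparameters (hypothetical)

# a topic is a distribution over words
phi = rng.dirichlet(beta * np.ones(W), size=T)

def generate_document(n_words):
    theta = rng.dirichlet(alpha * np.ones(T))  # the document's topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choice(T, p=theta)             # sample a topic for this position
        words.append(rng.choice(W, p=phi[z]))  # sample a word from that topic
    return words

doc = generate_document(50)
```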
• Sample LDA result from (Blei, 2012); the figure itself is not preserved in this transcript.
• There are two major approaches to inference in probabilistic
models with a loopy factor graph like LDA:
• variational approximations simplify the graph by approximating
the underlying distribution with a simpler one, but with new
parameters that are subject to optimization;
• Gibbs sampling approaches the underlying distribution by
sampling a subset of variables conditional on fixed values of all
other variables.
• Both approaches have been applied to LDA.
• We will extend the Gibbs sampling approach.
lda likelihood
• The total likelihood of the LDA model is
$$p(z, w \mid \alpha, \beta) = \int_{\theta,\varphi} p(\theta \mid \alpha)\, p(z \mid \theta)\, p(w \mid z, \varphi)\, p(\varphi \mid \beta)\, d\theta\, d\varphi.$$
gibbs sampling
• And in collapsed Gibbs sampling, we sample
$$p(z_j = t \mid z_{-j}, w, \alpha, \beta) \propto \frac{n^{\neg j}_{*,t,d} + \alpha}{n^{\neg j}_{*,*,d} + T\alpha} \cdot \frac{n^{\neg j}_{w,t,*} + \beta}{n^{\neg j}_{*,t,*} + W\beta},$$
where $z_{-j}$ denotes the set of all $z$ values except $z_j$.
• Samples are then used to estimate model variables:
$$\theta_{td} = \frac{n_{*,t,d} + \alpha}{n_{*,*,d} + T\alpha}, \qquad \varphi_{wt} = \frac{n_{w,t,*} + \beta}{n_{*,t,*} + W\beta}.$$
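A sketch of one such update in Python, with our own count-array naming (n_td[t,d] = n_{*,t,d}, n_wt[w,t] = n_{w,t,*}, n_t[t] = n_{*,t,*}); not the authors' implementation:

```python
import numpy as np

def resample_token(j, w, d, z, n_td, n_wt, n_t, alpha, beta, W, rng):
    """One collapsed Gibbs step for token j (word w in document d)."""
    t_old = z[j]
    # remove token j from the counts to obtain the ¬j statistics
    n_td[t_old, d] -= 1; n_wt[w, t_old] -= 1; n_t[t_old] -= 1
    # the document-length denominator is constant in t, so it can be dropped
    p = (n_td[:, d] + alpha) * (n_wt[w, :] + beta) / (n_t + W * beta)
    t_new = rng.choice(len(p), p=p / p.sum())
    # add the token back under its new topic assignment
    n_td[t_new, d] += 1; n_wt[w, t_new] += 1; n_t[t_new] += 1
    z[j] = t_new
```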
lda extensions
• There exist many LDA extensions:
• DiscLDA: LDA for classification with a class-dependent
transformation in the topic mixtures;
• Supervised LDA: documents with a response variable, we mine
topics that are indicative of the response;
• TagLDA: words have tags that mark context or linguistic features;
• Tag-LDA: documents have topical tags, the goal is to recommend
new tags to documents;
• Topics over Time: topics change their proportions with time;
• hierarchical modifications with nested topics are also important.
• In particular, there are extensions tailored for sentiment
analysis.
joint sentiment-topic
• JST: topics depend on sentiments drawn from a document's sentiment distribution $\pi_d$; words are conditional on sentiment–topic pairs.
• Generative process – for each word position $j$:
(1) sample a sentiment label $l_j \sim \mathrm{Mult}(\pi_d)$;
(2) sample a topic $z_j \sim \mathrm{Mult}(\theta_{d,l_j})$;
(3) sample a word $w \sim \mathrm{Mult}(\varphi_{l_j,z_j})$.
• In Gibbs sampling, one can marginalize out $\pi_d$:
$$p(z_j = t, l_j = k \mid z_{-j}, w, \alpha, \beta, \gamma, \lambda) \propto \frac{n^{\neg j}_{*,k,t,d} + \alpha_{tk}}{n^{\neg j}_{*,k,*,d} + \sum_t \alpha_{tk}} \cdot \frac{n^{\neg j}_{w,k,t,*} + \beta_{kw}}{n^{\neg j}_{*,k,t,*} + \sum_w \beta_{kw}} \cdot \frac{n^{\neg j}_{*,k,*,d} + \gamma}{n^{\neg j}_{*,*,*,d} + S\gamma},$$
where $n_{w,k,t,d}$ is the number of words $w$ generated with topic $t$ and sentiment label $k$ in document $d$, and $\alpha_{tk}$ is the Dirichlet prior for topic $t$ with sentiment label $k$.
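The same machinery extends to the joint (sentiment, topic) case; a sketch under our own array conventions (again, not the authors' code):

```python
import numpy as np

def resample_jst_token(w, d, n_ktd, n_wkt, n_kt, n_kd, n_d,
                       alpha, beta, gamma, rng):
    """Jointly sample (sentiment k, topic t) for one token; all count arrays
    are assumed to already exclude this token (the ¬j statistics).
    Shapes (our convention): n_ktd[k,t,d], n_wkt[w,k,t], n_kt[k,t],
    n_kd[k,d], n_d[d]; alpha[k,t] = α_{tk}, beta[k,w] = β_{kw}."""
    S, T = n_kt.shape
    p = ((n_ktd[:, :, d] + alpha) / (n_kd[:, d] + alpha.sum(axis=1))[:, None]
         * (n_wkt[w] + beta[:, w, None]) / (n_kt + beta.sum(axis=1)[:, None])
         * ((n_kd[:, d] + gamma) / (n_d[d] + S * gamma))[:, None])
    idx = rng.choice(S * T, p=(p / p.sum()).ravel())
    return divmod(idx, T)  # -> (k, t)
```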
aspect and sentiment unification model
• ASUM: aspect-based analysis + sentiment for user reviews; a review is broken down into sentences, assuming that each sentence speaks about only one aspect.
• Basic model – Sentence LDA (SLDA): for each review $d$ with topic distribution $\theta_d$, for each sentence $s$ in $d$,
(1) choose its sentiment label $l_s \sim \mathrm{Mult}(\pi_d)$,
(2) choose a topic $t_s \sim \mathrm{Mult}(\theta_{d,l_s})$ conditional on the sentiment label $l_s$,
(3) generate words $w \sim \mathrm{Mult}(\varphi_{l_s,t_s})$.
gibbs sampling for asum
• Denoting by $s_{k,t,d}$ the number of sentences (rather than words) assigned topic $t$ and sentiment label $k$ in document $d$:
$$p(z_j = t, l_j = k \mid l_{-j}, z_{-j}, w, \gamma, \alpha, \beta) \propto \frac{s^{\neg j}_{k,t,d} + \alpha_t}{s^{\neg j}_{k,*,d} + \sum_t \alpha_t} \cdot \frac{s^{\neg j}_{k,*,d} + \gamma_k}{s^{\neg j}_{*,*,d} + \sum_{k'} \gamma_{k'}} \times$$
$$\times \frac{\Gamma\!\left(n^{\neg j}_{*,k,t,*} + \sum_w \beta_{kw}\right)}{\Gamma\!\left(n^{\neg j}_{*,k,t,*} + \sum_w \beta_{kw} + W_{*,j}\right)} \prod_w \frac{\Gamma\!\left(n^{\neg j}_{w,k,t,*} + \beta_{kw} + W_{w,j}\right)}{\Gamma\!\left(n^{\neg j}_{w,k,t,*} + \beta_{kw}\right)},$$
where $W_{w,j}$ is the number of occurrences of word $w$ in sentence $j$ (so $W_{*,j}$ is the sentence length); a numerically stable way to evaluate the $\Gamma$-ratios is sketched after this list.
• There are other models and extensions (USTM).
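Since the sentence-level update multiplies many Γ-ratios, it is best evaluated in log space; a sketch using scipy.special.gammaln, with our own naming:

```python
import numpy as np
from scipy.special import gammaln

def log_sentence_term(n_wkt_kt, n_kt_total, beta_k, W_wj):
    """Log of the Γ-ratio factor for assigning one sentence to (k, t).

    n_wkt_kt : counts n_{w,k,t,*} for this (k,t), shape (W,)  [¬j statistics]
    n_kt_total : n_{*,k,t,*} = n_wkt_kt.sum()
    beta_k : priors β_{kw} for sentiment k, shape (W,)
    W_wj : per-word counts of the sentence, shape (W,)
    """
    B = beta_k.sum()
    log_term = gammaln(n_kt_total + B) - gammaln(n_kt_total + B + W_wj.sum())
    active = W_wj > 0  # only words occurring in the sentence contribute
    log_term += np.sum(gammaln(n_wkt_kt[active] + beta_k[active] + W_wj[active])
                       - gammaln(n_wkt_kt[active] + beta_k[active]))
    return log_term
```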
learning sentiment priors
idea
• All of the models above assume that we have prior sentiment
information from an external vocabulary:
• in JST and Reverse-JST, word-sentiment priors λ are drawn from an
external dictionary and incorporated into β priors; βkw = β if
word w can have sentiment label k and βkw = 0 otherwise;
• in ASUM, prior sentiment information is also encoded in the β
prior, making βkw asymmetric similar to JST;
• the same holds for other extensions such as USTM.
• Dictionaries of sentiment words do exist.
• But they are often incomplete; for instance, we wanted to apply these models to Russian, where there are few such dictionaries.
• It would be great to extend topic models for sentiment analysis
to train sentiment for new words automatically!
• We can assume access to a small seed vocabulary with
predefined sentiment, but the goal is to extend it to new words
and learn their sentiment from the model.
• In all of these models, word sentiments are input as different β
priors for sentiment labels.
• If only we could train these priors automatically...
• ...and we can do it with EM!
General EM scheme:
  1: while inference has not converged do
  2:     for N steps do                        ▷ M-step
  3:         run one Gibbs sampling update step
  4:     update βkw priors                     ▷ E-step
em to train β
• This scheme works for every LDA extension considered above.
• At the E-step, we update $\beta_{kw} \propto n_{w,k,*,*}$; since we can choose the normalization coefficient ourselves, we start with high variance and then gradually refine the $\beta_{kw}$ estimates, as in simulated annealing:
$$\beta_{kw} = \frac{1}{\tau}\, n_{w,k,*,*},$$
where $\tau$ is a regularization coefficient (temperature) that starts large (high variance) and then decreases (lower variance).
• Thus, the final algorithm is as follows:
• start with some initial approximation to $\beta_{kw}$ (from a small seed dictionary, possibly smoothed after a simpler learning method used for initialization);
• then, iteratively,
• at the E-step of iteration $i$, update $\beta_{kw}$ as $\beta_{kw} = \frac{1}{\tau(i)}\, n_{w,k,*,*}$ with, e.g., $\tau(i) = \max(1, 200/i)$;
• at the M-step, perform several iterations of Gibbs sampling for the corresponding model with fixed values of $\beta_{kw}$.
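A sketch of the whole loop, against a hypothetical sentiment-topic model object whose interface (seed_beta, gibbs_sweep, counts_wk) we invent here for illustration:

```python
def train_beta_em(model, n_iters=100, n_gibbs=5):
    """EM with annealed β updates around a Gibbs sampler (interface is ours)."""
    beta = model.seed_beta()            # initial β_{kw} from the seed lexicon
    for i in range(1, n_iters + 1):
        for _ in range(n_gibbs):        # M-step: Gibbs sampling with β fixed
            model.gibbs_sweep(beta)
        tau = max(1.0, 200.0 / i)       # temperature schedule τ(i)
        beta = model.counts_wk() / tau  # E-step: β_{kw} = n_{w,k,*,*} / τ(i)
    return beta
```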
word embeddings
• Earlier (MICAI 2015), we showed that this approach improves sentiment prediction quality.
• In this work, we use the improved sentiment–topic models to learn new aspect-based sentiment dictionaries.
• To do so, we use distributed word representations (word embeddings).
• Distributed word representations map each word occurring in
the dictionary to a Euclidean space, attempting to capture
semantic relationships between the words as geometric
relationships in the Euclidean space.
• Started back in (Bengio et al., 2003), exploded after the works of
Bengio et al. and Mikolov et al. (2009–2011), now used
everywhere; we use embeddings trained on a very large Russian
dataset (thanks to Nikolay Arefyev and Alexander Panchenko!).
(figure: CBOW and skip-gram architectures, not preserved in this transcript)
how to extend lexicons
• Intuition: words that are similar in some aspect of their meaning, e.g., sentiment, are expected to be close in the semantic Euclidean space.
• To expand the top words of resulting topics:
• extract word vectors for all top words from the distribution φ in
topics and all words in available general-purpose sentiment
lexicons;
• for every top word in the topics, construct a list of its nearest neighbors, by cosine similarity in the $\mathbb{R}^{500}$ embedding space, among the sentiment words from the lexicons (20 neighbors is almost always enough).
• We have experimented with other similarity metrics (L1, L2,
variations on L∞) with either worse or very similar results.
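A sketch of the expansion step, assuming embeddings are given as a dict of unit-normalized 500-dimensional numpy vectors (the dict and its source are our assumption):

```python
import numpy as np

def nearest_sentiment_words(top_word, embeddings, lexicon_words, k=20):
    """The k sentiment-lexicon words closest to top_word by cosine similarity."""
    v = embeddings[top_word]
    candidates = [w for w in lexicon_words if w in embeddings]
    sims = np.array([embeddings[w] @ v for w in candidates])  # cosine for unit vectors
    order = np.argsort(-sims)[:k]
    return [(candidates[i], float(sims[i])) for i in order]
```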
experiments

dataset
• Dataset of Russian-language restaurant reviews released for the SentiRuEval-2015 task (Loukachevitch et al., 2015).
• In total, 17,132 unlabeled reviews were used to train the
Reverse-JST model.
• Preprocessing natural for topic modeling: remove stopwords
and punctuation, convert to lowercase, normalize the text with
Mystem, remove too rare words (< 3 occurrences).
• For initial β priors, we used a manually constructed sentiment
lexicon.
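A sketch of this preprocessing pipeline, assuming the pymystem3 wrapper for Mystem and an externally supplied Russian stopword list:

```python
import re
from collections import Counter
from pymystem3 import Mystem  # Python wrapper around the Mystem lemmatizer

mystem = Mystem()
STOPWORDS = set()  # assumed to be loaded from a Russian stopword list

def preprocess(reviews, min_count=3):
    docs = []
    for text in reviews:
        text = re.sub(r"[^\w\s]+", " ", text.lower())  # drop punctuation, lowercase
        lemmas = (w.strip() for w in mystem.lemmatize(text))
        docs.append([w for w in lemmas if w and w not in STOPWORDS])
    counts = Counter(w for doc in docs for w in doc)
    # remove too rare words (< min_count occurrences)
    return [[w for w in doc if counts[w] >= min_count] for doc in docs]
```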
sample topics
topic  sent.  sentiment words
1      neu    соус [sauce], салат [salad], кусочек [slice], сыр [cheese], тарелка [plate], овощ [vegetable], масло [oil], лук [onions], перец [pepper]
1      pos    приятный [pleasant], атмосфера [atmosphere], уютный [cozy], вечер [evening], музыка [music], ужин [dinner], романтический [romantic]
1      neg    ресторан [restaurant], официант [waiter], внимание [attention], сервис [service], обращать [to notice], обслуживать [to serve], уровень [level]
2      neu    столик [table], заказывать [to order], вечер [evening], стол [table], приходить [to come], место [place], заранее [in advance], встречать [to meet]
2      pos    место [place], хороший [good], вкус [taste], самый [most], приятный [pleasant], вполне [quite], отличный [excellent], интересный [interesting]
2      neg    еда [food], вообще [in general], никакой [none], заказывать [to order], оказываться [to appear], вкус [taste], ужасный [awful], ничто [nothing]
3      neu    девушка [girl], спрашивать [to ask], вопрос [question], подходить [to come], официантка [waitress], официант [waiter], говорить [to speak]
3      pos    большой [big], место [place], выбор [choice], хороший [good], блюдо [dish], цена [price], порция [portion], небольшой [small], плюс [plus]
3      neg    цена [price], обслуживание [service], качество [quality], уровень [level], кухня [cuisine], средний [average], ценник [price tag], высоко [high]
mining aspects
• The resulting aspect-based lexicons contain 726 topical aspects
commonly divided into three types:
(1) explicit aspects that denote parts of a product (e.g., сотрудник
[worker], баранина [lamb], овощ [vegetable], мексиканский
[mexican]);
(2) implicit aspects that refer indirectly to a product (e.g., чисто
[clean], ароматный [aromatic], сытно [filling], шумно [noisy]);
(3) narrative words that relate to major topics in the text and indirectly reflect its sentiment polarity (e.g., пересолить [to oversalt], пожелать [to wish], почувствовать [to sense], отсутствовать [to be missing]).
• Next we applied the mined aspects to sentiment classification
to see whether there is an improvement.
sentiment classification
• Classifier from (Ivanov, Tutubalina et al., 2015) based on a
max-entropy model.
• It uses term frequency features in the context of an aspect term
and lexicon-based features.
• Specifically, the following features from an aspect’s context
window of 4 words:
(1) lowercased character n-grams with document frequency greater
than two;
(2) lexicon-based unigrams and context unigrams and bigrams;
(3) aspect-based bigrams as a combination of the aspect term itself and surrounding words;
(4) lexicon-based features: the maximum sentiment score, the minimum sentiment score, and the total and averaged sums of the words' sentiment scores.
• We compare classifiers with lexicon-based features:
(1) computed on a manually constructed general-purpose lexicon
(baseline classifier),
(2) computed on a general-purpose lexicon for all words and
aspect-based lexicons for individual aspects.
• We evaluated three different versions of sentiment scores:
(1) scoresDict: take sentiment score from the manually created
lexicon if the word occurs in the lexicon with a positive or negative
label; otherwise, set the score to 0;
(2) scoresMult: set the sentiment score of a word as a product of the
dictionary score and the similarity;
(3) scoresCos: set the sentiment score to cosine similarity score if
similarity between the word in question and хороший [good] is
higher than similarity with плохой [bad]; otherwise, shift
sentiment score towards the opposite polarity.
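Our reading of the three variants as code (lexicon maps words to signed scores; sim is a cosine-similarity function over embeddings; both interfaces are assumptions):

```python
def scores_dict(word, lexicon):
    # dictionary score if the word is in the lexicon, 0 otherwise
    return lexicon.get(word, 0.0)

def scores_mult(word, lexicon, sim):
    # dictionary score of the nearest lexicon word, weighted by similarity
    best = max(lexicon, key=lambda w: sim(word, w))
    return lexicon[best] * sim(word, best)

def scores_cos(word, sim):
    # compare similarity to the anchors хороший [good] and плохой [bad]
    pos, neg = sim(word, "хороший"), sim(word, "плохой")
    return pos if pos >= neg else -neg  # shift toward the opposite polarity
```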
classification results
Max-Entropy Classifier

                        micro-averaged         macro-averaged
                        P      R      F1       P      R      F1
baseline - Lexicon1     0.595  0.344  0.436    0.738  0.649  0.676
  scoresDict            0.592  0.344  0.436    0.737  0.649  0.676
  scoresMult            0.600  0.351  0.442    0.740  0.653  0.680
  scoresCos             0.610  0.372  0.462    0.748  0.663  0.691
baseline - Lexicon2     0.572  0.341  0.427    0.727  0.646  0.671
  scoresDict            0.568  0.345  0.430    0.725  0.647  0.672
  scoresMult            0.556  0.338  0.420    0.719  0.643  0.667
  scoresCos             0.566  0.368  0.447    0.725  0.657  0.680
baseline - Lex1 + Lex2  0.594  0.348  0.439    0.738  0.651  0.679
  scoresDict            0.595  0.376  0.461    0.741  0.663  0.689
  scoresMult            0.590  0.372  0.457    0.738  0.661  0.687
  scoresCos             0.602  0.376  0.463    0.744  0.664  0.690
sample aspect-related sentiment words
aspect: sentiment words
баранина [lamb]: вкусный [tasty], сытный [filling], аппетитный [delicious], душистый [fragrant], деликатесный [speciality], сладкий [sweet]
караоке [karaoke]: музыкальный [musical], попсовый [pop], классно [awesome], развлекательный [entertaining], улетный [mind-blowing]
пирог [pie]: вкусный [tasty], аппетитный [delicious], обсыпной [crumb-topped], сытный [filling], черствый [stale], ароматный [aromatic], сладкий [sweet]
ресторан [restaurant]: шикарный [upscale], фешенебельный [fashionable], уютный [cozy], люкс [luxe], роскошный [luxurious], недорогой [affordable]
вывеска [sign]: обветшалый [decayed], выцветший [faded], аляповатый [flashy], фешенебельный [fashionable], фанерный [plywood]
администратор [manager]: люкс [luxe], неисполнительный [careless], ответственный [responsible], компетентный [competent], толстяк [fatty]
интерьер [interior]: уют [comfort], уютный [cozy], стильный [stylish], просторный [spacious], помпезный [pompous], роскошный [luxurious], шикарный [upscale]
вежливый [polite]: вежливый [polite], учтивый [courteous], обходительный [affable], доброжелательный [benevolent], тактичный [tactful]
conclusion
• We have presented a method for automatically extracting
aspect-based sentiment lexicons based on an extension of
sentiment-related topic models augmented with similarity
search based on distributed word representations.
• We extract important new sentiment words for aspect-specific
lexicons and show improvements in sentiment classification on
standard benchmarks.
• Future work:
• can we train a more informative relation between sentiment priors
and distributed word representations?
• maybe distributed word representations can be fed directly into
the priors?
thank you!
Thank you for your attention!