Concurrent Inference of Topic Models and
Distributed Vector Representations
Debakar Shamanta¹, Sheikh Motahar Naim¹, Parang Saraf², Naren Ramakrishnan², and M. Shahriar Hossain¹

¹ Dept of CS, University of Texas at El Paso, El Paso, TX 79968
² Dept of CS, Virginia Tech, Arlington, VA 22203
dshamanta@miners.utep.edu, snaim@miners.utep.edu, parang@cs.vt.edu, naren@cs.vt.edu, mhossain@utep.edu
Abstract. Topic modeling techniques have been widely used to uncover
dominant themes hidden inside an unstructured document collection.
Though these techniques first originated in the probabilistic analysis of
word distributions, many deep learning approaches have been adopted
recently. In this paper, we propose a novel neural network based architec-
ture that produces distributed representation of topics to capture topical
themes in a dataset. Unlike many state-of-the-art techniques for generat-
ing distributed representation of words and documents that directly use
neighboring words for training, we leverage the outcome of a sophisti-
cated deep neural network to estimate the topic labels of each document.
The networks, for topic modeling and generation of distributed represen-
tations, are trained concurrently in a cascaded style with better runtime
without sacrificing the quality of the topics. Empirical studies reported
in the paper show that the distributed representations of topics repre-
sent intuitive themes using fewer dimensions than conventional topic
modeling approaches.
Keywords: Topic Modeling, Distributed Representation
1 Introduction
The representation of textual datasets in vector space has been a long-standing
central issue in data mining with a veritable cottage industry devoted to repre-
senting domain-specific information. Most representations consider features as
localized chunks, as a result of which the interpretation of the features might lack
generalizability. Researchers have recently become interested in distributed rep-
resentations [12, 8, 14, 19] because distributed representations generalize features
based on the facts captured from the entire dataset rather than one single object
or a small group of objects. Moreover, modern large and unstructured datasets
involve too many heterogeneous entries for which local subspaces cannot capture
relationships between the features. For example, publication datasets nowadays
come with a substantial number of features like author information, scientific
area, and keywords along with the actual text for each document. News article
datasets have author information, time stamp data, category, and sometimes
tweets and comments posted against the articles. Movie clips are accompanied
by synopsis, production information, rating, and text reviews. The focus of this
paper is on the design of a flexible mechanism that can generate multiple types of
features in the same space. We show that the proposed method is not only able to
generate feature vectors for labeled information available with the datasets but
also for discovered information that is not readily available with the dataset
as labels, for example, topics. The current state of the art of distributed representa-
tions for unstructured text datasets can model two different types of elements
in the same hyperspace, as described by Le and Mikolov [16]. Le and Mikolov’s
framework generates distributed vectors of documents (or paragraphs) and words
in the same space using a deep neural network. A further generalization, which we
describe in this paper, can provide distributed representations for hetero-
geneous elements of a dataset in the same hyperspace. However, the problem
of creating distributed representations becomes more challenging when the label
information is not contained within the dataset. The focus of this paper is on the
generation of topical structures and their representations in the same space as
documents and words. The capability of representing topics, documents, words,
and other labeled information in the same space opens up the opportunity to
compute syntactic and semantic relationships between not only words but also
between topics and documents directly by using simple vector algebra.
Estimating the topic labels for documents is another challenge while using
distributed representations. Earlier topic modeling techniques [9, 13] used to de-
fine a document as a mixture of topics and estimate the probability p(t|d) of a
topic t given a document d through probabilistic reasoning. More recently, topic
models are seen from a neural network point of view [26, 15, 6] where these prob-
abilities are generated from the hidden nodes of a network. Such neural networks
require compact numeric representations of words and documents for effective
training, which are not easy to estimate with traditional vector space based
document modeling techniques that represent the documents using a very high
dimensional space. There have been attempts to use the compact distributed
representations of words and documents learned from a general purpose large
dataset [6], but the precomputed vectors may not always be appropriate for many
new domain specific datasets. Furthermore, the vocabulary shifts in new directions
over time, resulting in changes in the distributed representations.
Specific contributions of this paper are as follows.
1. We formulate the problem of computing distributed representation of topics
in the same space as documents and words using a novel fusion of a neural
network based topic modeling and a distributed representation generation
technique.
2. The tasks of computing topics for documents and generating distributed rep-
resentations are simultaneous in the proposed method unlike closely related
state-of-the-art techniques where precomputed distributed vectors of words
are leveraged to compute topics. Additionally, none of the state-of-the-art
methods generates distributed representation of topics to the best of our
knowledge.
3. Our proposed method generates the distributed vectors using a smaller num-
ber of dimensions than the actual text feature space. Even though the space has
a lower number of dimensions, the vectors capture syntactic and semantic
relationships between language components.
4. We demonstrate that the generated topic vectors explain domain specific
properties of datasets, help identify topical similarities, and exhibit topic-
specific relationships with document vectors.
2 Related Work
Distributed representations have been used in diverse fields of scientific research
with notable success due to their superiority in capturing generalized view of
information over local representations. Rumelhart et al. [22] designed a neural
network based approach for distributed representation of words which has been
followed by many efforts in language modeling. One such model is the neural
probabilistic model [2] proposed by Bengio et al. This framework uses a sliding
window based context of a word to generate compact representations. Mikolov
et al. [17] introduce continuous bag-of-words (CBOW) and skip-gram models to
compute continuous vector representations of words efficiently from very large
data sets. The skip-gram model was significantly improved in [18], which includes
phrase vectors along with words. Le and Mikolov [16] extended the CBOW
model to learn distributed representation of higher level texts like paragraphs
and documents. Our proposed model further enriches the literature by including
the capability to generate (1) vectors for arbitrary labels in the dataset and
(2) vectors for topics for which a text dataset does not contain any labeled
information.
Finding hidden themes in a document collection has been of great interest
to data mining and information retrieval researchers for more than two decades.
An earlier work in the literature is latent semantic indexing (LSI) [9] that maps
documents and terms into a special “latent semantic” space by applying dimen-
sionality reduction on traditional bag-of-words vector space representations of
documents. A probabilistic version of LSI, pLSI [13], introduces a mixture model
where each document is represented by a mixing proportion of hidden “topics”.
Latent Dirichlet Allocation (LDA) [5], a somewhat generalized but more sophis-
ticated version of pLSI, is one of the most notable ones in the literature. It
provides a generative probabilistic approach for document modeling assuming a
random process by which the documents are created. LDA spawned a deluge of
work exploring different aspects of topic modeling. For example, the Dynamic
Topic Model (DTM) [4] captures the evolution of topics in a time-labeled corpus.
Online LDA (OLDA) [1] handles streams of documents with a dynamic vocabulary;
Wallach [25] and Griffiths et al. [11] exploit the sentence structures of documents;
and the Correlated Topic Model (CTM) [3] captures the correlation between topics.
More recently, neural network based models have received great attention
from the data mining community. Wan et al. [26] introduce a hybrid model
in computer vision settings; DocNADE [15] provides an autoregressive neural
network for topic modeling; Cao et al. [6] propose a neural topic model (NTM)
with supervised extension. The latter work has close resemblance to a part of
our proposed model that focuses on generating topics for each document.
[Fig. 1: The proposed framework, combining a Topic Generation Module (nodes lt, ld, and ls with weight matrices W1 and W2) and a Distributed Vector Generation module; forward propagation supplies topic labels, and the document, topic, and word vectors are updated using backpropagation.]
3 Problem Formulation
Let D = {d1, d2, . . . , dN } be a text dataset containing N documents taking terms
from the set of M words W = {w1, w2, . . . , wM }. Each document can contain
an arbitrary number of words in any sequence. The objective is to generate a
universal distributed representation for the labeled items (e.g., words and docu-
ments) and latent topics of each document of dataset D. Let T = {t1, t2, . . . , tK}
be the set of topics. Consider that the expected number of dimensions in the
distributed representation of words, documents, and topics is L. L should be
much smaller than the number of words M. Word vectors W ∈ R^{M×L}, document vectors D ∈ R^{N×L}, and topic vectors T ∈ R^{K×L} generated in the same L-dimensional space should maintain two specific properties: (1) the distributed representation of each type should be capable of capturing the semantic, syntactic, and topical aspects of conventional language models, and (2) all types of vectors (topics, documents, and words) organized in the L-dimensional hyperspace must be comparable to each other.
The first property aligns the framework with the objectives of any language model where features are generated for common data mining tasks like clustering and classification. The second property, however, is unique and specific to relating vectors of different types of entities like topics, documents, and words. In word2vec [17], the authors show that distributed representations of words can retrieve linguistic similarities between pairs of words. For example, W_King − W_Man is close to W_Queen − W_Woman. The ability to model topics in the same hyperspace extends this property by capturing similarity between relationships among topics and documents. For example, if two documents d_i and d_j are drawn from the same topic t_p, then T_p − D_i should be close to T_p − D_j. Similarly, if two documents d_i and d_j are drawn from two different topics t_p and t_q, then T_p − D_i should tend to be different from T_p − D_j.
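To make the two properties concrete, the following minimal sketch checks such relationships with cosine similarity; the vectors, dictionary keys, and indices below are illustrative stand-ins rather than outputs of the model.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
L = 32
W = {w: rng.normal(size=L) for w in ["king", "man", "queen", "woman"]}  # word vectors
T = rng.normal(size=(4, L))        # topic vectors
D = rng.normal(size=(10, L))       # document vectors

# Property 1: analogy-style relations between word vectors
analogy = cosine(W["king"] - W["man"], W["queen"] - W["woman"])

# Property 2: documents i and j of the same topic p should give similar
# offsets T_p - D_i and T_p - D_j
p, i, j = 0, 2, 5
same_topic_similarity = cosine(T[p] - D[i], T[p] - D[j])
```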
4 Methodology
The main objective of the proposed framework is to generate a compact dis-
tributed representation for topics, documents, and words of a document collec-
tion in the same hyperspace in such a way that all these heterogeneous objects are
comparable to each other and capture the semantic, syntactic and thematic prop-
erties. The proposed framework has three major components. First, we adopt
a generic neural network that can generate distributed vectors for documents,
words, and any given labels. Second, we propose a deep neural network based
topic modeling that can take distributed representations of words and docu-
ments, and estimate topic distribution for each document. Finally, we convolute
both these networks so that they can share information and train simultane-
ously. Fig. 1 shows the proposed framework. The following subsections describe
the model in a sequence.
4.1 Distributed Representation of Heterogeneous Entities
Inferring a distributed representation W for the words of a document collection
D having vocabulary W is based on predicting a word given other words in the
same context. The objective of such a word representation model is to maximize
the average log probability
(1/M) Σ_{m=p}^{M−p} log p(wm | wm−p, …, wm+p)    (1)
The individual probabilities in Equation 1 are estimated by training a multi-class
deep neural network with a softmax output layer. They can be computed as:
p(wm | wm−p, …, wm+p) = e^{ym} / Σ_i e^{yi}    (2)
where yi is the unnormalized log-probability for every output word wi.
yi = b + U h(wm−p, …, wm+p; W)    (3)
Here, U and b are the softmax parameters. h is constructed by a concatenation
or average of relevant word vectors. We use hierarchical softmax [17] instead of
softmax for faster training, and calculate the gradient using stochastic gradient
descent. After the training converges, words with similar meaning are mapped to
a similar position in the vector space. To obtain a document vector, a document
is thought of as another word. The only change in the model is in Equation 3,
where h is constructed using W and D.
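The following is a minimal sketch of one training step of this word/document prediction model (Equations 1-3), assuming a plain softmax output layer instead of the hierarchical softmax used for speed; all sizes, indices, and the learning rate are illustrative.

```python
import numpy as np

M, N, L = 1000, 50, 32                   # vocabulary size, documents, vector size (assumed)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(M, L))   # word vectors
D = rng.normal(scale=0.1, size=(N, L))   # document vectors
U = rng.normal(scale=0.1, size=(M, L))   # softmax weights
b = np.zeros(M)                          # softmax bias

def train_step(W, D, U, b, doc_id, context_ids, target_id, lr=0.025):
    """Predict the target word from its context words and its document,
    then take one stochastic gradient step on all involved vectors."""
    denom = len(context_ids) + 1
    h = (W[context_ids].sum(axis=0) + D[doc_id]) / denom   # averaged context
    y = b + U @ h                            # unnormalized log-probabilities (Eq. 3)
    p = np.exp(y - y.max())
    p /= p.sum()                             # softmax (Eq. 2)
    grad_y = p.copy()
    grad_y[target_id] -= 1.0                 # gradient of -log p(target)
    grad_h = U.T @ grad_y
    U -= lr * np.outer(grad_y, h)
    b -= lr * grad_y
    W[context_ids] -= lr * grad_h / denom
    D[doc_id] -= lr * grad_h / denom
    return -np.log(p[target_id])             # this example's term in Equation 1

loss = train_step(W, D, U, b, doc_id=3, context_ids=[10, 42, 7, 99], target_id=5)
```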
Algorithm 1: LearnDistRep – algorithm for learning topic vectors
input : Document id d; set of topics in d, Td; word to predict, w; context of w, Cw
parameter: Distributed representations D, W and T
1 Calculate y using Equation 4;
2 Calculate the gradient gr using stochastic gradient descent;
3 Update the document vector Dd, the topic vectors TTd, and the word vectors WCw using gr;
Algorithm 2: LearnTopic – algorithm for learning topic distribution
input : Document id d; n-gram or context id g
parameter: Distributed representations D, W and T; weight matrices W1 and W2
output : Updated weight matrices
1 Calculate ls(g, d) using the equations for lt and ld;
2 Determine the error in the output node with respect to the ideal value: δ^(3) = ls(g, d) − 1;
3 Compute the error in the n-gram-topic hidden node: δ^(2)_1 = (δ^(3) × ld(d)) · (lt(g) · (1 − lt(g)));
4 Update W2: W2 = W2 + α[δ^(2)_1 × Wg + λ × W2];
5 Compute the error in the document-topic hidden node: new_ld(d) = ld(d) + α[δ^(3) × lt(g) + λ × ld(d)];
6 δ^(2)_2 = new_ld(d) − ld(d);
7 Update W1: W1 = W1 + α[δ^(2)_2 × Dd + λ × W1];
Inclusion of further labels, for example, authors, topics, and tags, can be done
the same way document vectors are added. Our focus in this paper is to in-
corporate topics instead of additional labels. Incorporation of topic vectors is
challenging because the topics are not given and rather should be generated us-
ing the documents and words. For the time being, let us assume that topic is just
a given label that comes with the data. In contrast to the word vector matrix W
that is shared across all the documents, a topic vector can be shared only across
the documents which contain that particular topic. Considering topic vectors
along with the vectors for words and documents, Equation 3 is modified to:
y = b + U h(wt−k, …, wt+k, dq, tr1, tr2, …, trs; W, D, T)    (4)
For training purposes, we sample variable-length contexts using a sliding window
over each document. Such a sliding window is commonly referred to as an n-gram.
We use n-grams instead of single words (unigrams) since n-grams
produce representative contexts around each word [18]. A procedure for training
this generic network for topics, documents, and words is explained in Algorithm 1.
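A small sketch of how the input h of Equation 4 can be assembled from word, document, and topic vectors follows; averaging the vectors (rather than concatenating them), and the sizes and index names below, are assumptions of this illustration.

```python
import numpy as np

def build_h(W, D, T, context_ids, doc_id, topic_ids):
    """Assemble the prediction input h of Equation 4 from the context word
    vectors, the document vector, and the document's topic vectors.
    Averaging (rather than concatenation) is an assumption of this sketch."""
    total = W[context_ids].sum(axis=0) + D[doc_id] + T[topic_ids].sum(axis=0)
    return total / (len(context_ids) + 1 + len(topic_ids))

rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 32))   # word vectors
D = rng.normal(size=(50, 32))     # document vectors
T = rng.normal(size=(10, 32))     # topic vectors
h = build_h(W, D, T, context_ids=[10, 42, 7], doc_id=3, topic_ids=[2])
# h then feeds y = b + U h as in Equation 3, and the gradient flows back into
# the rows of W, D, and T that contributed to it (Algorithm 1).
```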
4.2 Estimating Topic Labels of Documents
[Fig. 2: Basic neural network representation of topic modeling: input nodes for document d and n-gram g, hidden nodes ld and lt connected through weight matrices W1 and W2, and output node ls.]
As stated earlier, the generic model described
in Section 4.1 requires topics as labels of each
document. This section focuses on a topic
modeling technique that can generate topic
labels taking document vectors and word vec-
tors into account. For effective and efficient
generation of topic vectors, the topic model-
ing technique must synchronize with the it-
erations of the distributed vector generation
part. Several topic modeling techniques have been proposed in the literature to
find the topic distribution of documents in such unlabeled datasets. In a general topic
model, each document is seen as a mixture of topics, and each topic is represented
as a probability distribution over the vocabulary of the entire corpus. The condi-
tional probability p(w|d) of a word and a document is computed from word-topic
distribution and topic-document distribution as p(w|d) = Σ_{i=1}^{K} p(w|ti) p(ti|d), where K is the number of topics and ti is a latent topic. This equation can be re-written as

p(w|d) = φ(w) × θ(d)^T    (5)

where φ(w) = [p(w|t1), p(w|t2), …, p(w|tK)] is the vector of conditional probabilities of w under all the topics and θ(d) = [p(t1|d), p(t2|d), …, p(tK|d)] is the topic distribution of d.
We can view topic models from a neural network perspective considering the form of Equation 5. Fig. 2 shows the architecture of a neural network with two input nodes for the sliding-window n-gram g and the document d, two hidden nodes representing φ(g) and θ(d), and one output node producing the conditional probability p(g|d). The topic-document node ld ∈ R^{1×K} computes the topic distribution of a document (similar to θ in topic models) using the weight matrix W1 ∈ R^{L×K}. It is computed by the equation ld(d) = softmax(Dd × W1), which uses a softmax function to maintain the probabilistic constraint that all the topic probabilities of a document must sum to 1. The n-gram-topic node lt ∈ R^{1×K} stands for the topic representation of the input n-gram and is calculated as lt(g) = sigmoid(Wg × W2), where W2 ∈ R^{L×K} denotes the weight matrix between the n-gram input node and the n-gram-topic node. This vector follows a probabilistic form similar to φ in topic models.

The output node ls ∈ R gives the matching score of an n-gram g and a document d by computing the dot product of lt(g) and ld(d). The resulting score ls(g, d) = lt(g) × ld(d)^T is a value between 0 and 1, similar to the conditional probability p(g|d).
The n-gram-document probability p(g|d), which initially is expected to be
very different from the ideal value, is estimated by performing a forward prop-
agation in the network. Algorithm 2 describes the training procedure for the
neural topic model part of our proposed model. For each n-gram-document pair
(g, d) the expected output value is 1 due to the fact that g is taken from docu-
ment d. The weights are updated using backpropagation to mitigate that error
(Steps 3 to 7 in Algorithm 2).
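A minimal sketch of one LearnTopic update (Fig. 2 and Algorithm 2), mirroring the steps as printed, follows; the sizes, the learning rate α, and the regularization λ are illustrative assumptions.

```python
import numpy as np

L, K = 32, 10                             # vector size and number of topics (assumed)
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(L, K))   # document-to-topic weights
W2 = rng.normal(scale=0.1, size=(L, K))   # n-gram-to-topic weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def learn_topic(D_d, W_g, W1, W2, alpha=0.05, lam=1e-4):
    """One LearnTopic update for a (document, n-gram) pair, mirroring the
    steps of Algorithm 2. D_d: document vector (L,); W_g: averaged word
    vectors of the n-gram (L,)."""
    ld = softmax(D_d @ W1)                # topic distribution of the document
    lt = sigmoid(W_g @ W2)                # topic representation of the n-gram
    ls = lt @ ld                          # matching score; the ideal value is 1
    delta3 = ls - 1.0                                     # step 2
    delta2_1 = (delta3 * ld) * (lt * (1.0 - lt))          # step 3
    W2 += alpha * (np.outer(W_g, delta2_1) + lam * W2)    # step 4
    new_ld = ld + alpha * (delta3 * lt + lam * ld)        # step 5
    delta2_2 = new_ld - ld                                # step 6
    W1 += alpha * (np.outer(D_d, delta2_2) + lam * W1)    # step 7
    return ld, ls

D_d = rng.normal(scale=0.1, size=L)       # a document vector
W_g = rng.normal(scale=0.1, size=L)       # an n-gram (context) vector
topic_dist, score = learn_topic(D_d, W_g, W1, W2)
```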
4.3 Concurrent Training
The training process runs concurrently for both topic modeling and distributed
vector generation. Fig. 1 shows the proposed combination of the two networks.
Notice that the training is simultaneous, unlike NTM [6], where already trained word
vectors are used for topic modeling. All the weights (W1 and W2 matrices) and
vectors (W, D and T matrices) in both the networks are initialized with random
values (Step 1 and 2 of Algorithm 3). As shown in the loop at Step 3 of Algo-
rithm 3, the combined framework reads each document in sequence of n words
(context) using a continuous window. For a particular document, the topic mod-
eling network gives its topic distribution as the output of the hidden node ld. We
select the k most probable topics from this distribution – with the assumption that
a document is made up of k topics – and provide them as input to
the distributed vector generation network. The call to the method LearnTopics
in Step 7 of Algorithm 3 accomplishes this task. The corresponding word, document,
and topic vectors are updated using the method LearnDistRep in Step 8 of
Algorithm 3. Methods LearnTopics and LearnDistRep are explained
in Algorithms 2 and 1, respectively.
Notice that the document and word vectors of context (n-gram) generated by
Algorithm 1 are provided as input to the topic modeling network of Algorithm
2. Also the top k topics generated for each document using Algorithm 2 are
provided to the distributed vector generation part (Algorithm 1). Algorithm 3
combines all these steps.
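A compact sketch of the concurrent loop of Algorithm 3 follows; learn_topic (the Algorithm 2 update sketched above) and learn_dist_rep (an Algorithm 1 update, for example the earlier train_step with h built from context, document, and topic vectors) are passed in as assumed callbacks, and docs is a hypothetical tokenized corpus.

```python
import numpy as np

def concurrent_train(docs, D, W, T, W1, W2, learn_topic, learn_dist_rep,
                     k=1, window=2, epochs=5):
    """Simultaneous training of both networks in the spirit of Algorithm 3.
    learn_topic and learn_dist_rep are supplied by the caller."""
    for _ in range(epochs):
        for d, words in enumerate(docs):
            ld = np.exp(D[d] @ W1)
            ld /= ld.sum()                               # topic distribution of d
            Td = np.argsort(ld)[::-1][:k]                # top-k topics of document d
            for i, w in enumerate(words):
                ctx = words[max(0, i - window):i] + words[i + 1:i + window + 1]
                if not ctx:
                    continue
                Wg = W[ctx].mean(axis=0)                 # context (n-gram) representation
                learn_topic(D[d], Wg, W1, W2)            # topic network update (Alg. 2)
                learn_dist_rep(W, D, T, d, Td, ctx, w)   # vector updates (Alg. 1)
    return D, W, T
```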
5 Complexity Analysis
Although both the neural networks in our proposed framework are concurrently
trained, we analyze their complexities separately for simplicity. For every ex-
ample during the training of the distributed vector generation network, there
are P words (context length), k topics and one document as input resulting in
I = P +k+1 input nodes. These inputs are projected into a L dimensional space.
Although there are V = N +M +K output nodes, this part of the network needs
to update only O(log V ) nodes using the gradient vector since the model uses
hierarchical softmax. I input nodes get updated during backpropagation making
the complexity for training a single example, Cdr = I × L + O(log V ) × L.
The topic modeling network takes the same document and input words. Cal-
culating Wg from the words in n-gram g takes O(P × L) time. Calculating each
of ld and lt takes O(L × K) operations and ls requires O(K) operations. Back-
propagation (steps 3 to 7 of Algorithm 2) runs in O(L × K) time, incurring a total
cost of Ctm = O(P × L) + O(L × K) + O(K) + O(L × K), or Ctm = O(L × K) given K > P, for every example.
Algorithm 3: ConcurrentTrain – algorithm for simultaneous training of
both networks
input : Document collection D
parameter: Distributed representations D, W and T
Weight matrices W1 and W2 of topic modeling network
output : D, W and T
1 Randomly initialize D, W and T ;
2 Randomly initialize W1 and W2 ;
3 for each document d ∈ D do
4 Topics in d, Td ← top k topics from ld(d) ;
5 for each word w of d do
6 Cw ← context of w ;
7 LearnTopics(d, Cw) ;
8 LearnDistRep(d, Td, w, Cw) ;
9 end
10 end
Therefore, the cost of training the combined network for each example is C = Ctm + Cdr.
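As a concrete illustration under assumed sizes, the per-example cost can be tallied directly (constants and the base of the logarithm are ignored in the O-notation above):

```python
import math

# Illustrative per-example cost tally; every size below is an assumption.
P, k, L, K = 8, 1, 100, 50       # context words, topics per document, vector size, topics
N, M = 10_000, 50_000            # documents and vocabulary size
V = N + M + K                    # output nodes of the distributed-vector network
I = P + k + 1                    # input nodes per example

C_dr = I * L + math.ceil(math.log2(V)) * L   # distributed-vector part (hierarchical softmax)
C_tm = P * L + 2 * (L * K) + K               # topic-model part: Wg, then ld and lt, then ls
C = C_dr + C_tm                              # total cost per training example
```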
6 Evaluation
We use a number of metrics to evaluate the quality of our results. Some of
these metrics are generally used to evaluate clustering results when ground truth
labels are not available. Two such evaluations are the Dunn Index (DI) [10]
and the Average Silhouette Coefficient (ASC) [21]. DI measures the separation
between groups of vectors and larger values are better. ASC is a measure that
takes both cohesion and separation of groups into account (higher values are
better). In our experiments, we utilize ASC and DI together to evaluate the
final topic assignments of the documents. Topics are analogous to clusters in
those evaluations. ASC and DI give us an idea about how crisply the topics are
distributed across the documents.
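As an illustration, both measures can be computed on the generated document vectors with their topic assignments treated as cluster labels; the sketch below uses random stand-in data, scikit-learn for the silhouette coefficient, and a straightforward Dunn index.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics import silhouette_score

def dunn_index(X, labels):
    """Minimum inter-cluster distance divided by the maximum cluster diameter."""
    clusters = [X[labels == c] for c in np.unique(labels)]
    min_between = min(cdist(a, b).min()
                      for i, a in enumerate(clusters)
                      for b in clusters[i + 1:])
    max_diameter = max(cdist(c, c).max() for c in clusters)
    return min_between / max_diameter

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))           # document vectors (stand-in)
labels = rng.integers(0, 8, size=200)    # topic assignments (stand-in)
asc = silhouette_score(X, labels)        # average silhouette coefficient
di = dunn_index(X, labels)               # Dunn index
```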
In the presence of ground truth labels, we evaluated the assigned topics
using Normalized Mutual Information (NMI) [7], Adjusted Rand Index (ARI)
[24], and hypergeometric distribution-based enrichment. Both NMI and ARI
estimate the agreement between two topic assignments, irrespective of permutations.
Higher values are better for NMI and ARI. While NMI is an information-
theoretic approach to evaluate agreement between two sets of assignments, ARI
is a normalized ratio of total positive agreements of pairs of documents of be-
ing in the same or different topics over all possible pairs. The normalization of
ARI ensures that the score is very low with random assignments. Hypergeomet-
ric enrichment [23] maps topics to available ground truth labels. This allows
us to measure significance based on the hypergeometric distribution of the topic
assignments over the already known labels. A higher number of enriched topics is
better.
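A sketch of this label-based evaluation follows, using scikit-learn for NMI and ARI and SciPy's hypergeometric distribution for enrichment; the label arrays and the specific topic/label pairing are illustrative stand-ins.

```python
import numpy as np
from scipy.stats import hypergeom
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

rng = np.random.default_rng(0)
true_labels = rng.integers(0, 8, size=500)   # benchmark categories (stand-in)
topics = rng.integers(0, 8, size=500)        # topics assigned by a model (stand-in)

nmi = normalized_mutual_info_score(true_labels, topics)
ari = adjusted_rand_score(true_labels, topics)

def enrichment_pvalue(topics, true_labels, topic_id, label_id):
    """Hypergeometric tail probability of observing at least the measured
    overlap between one topic and one ground-truth label."""
    pop = len(topics)                              # population size
    successes = np.sum(true_labels == label_id)    # documents carrying the label
    draws = np.sum(topics == topic_id)             # documents assigned the topic
    overlap = np.sum((topics == topic_id) & (true_labels == label_id))
    return hypergeom.sf(overlap - 1, pop, successes, draws)   # P(X >= overlap)

p_value = enrichment_pvalue(topics, true_labels, topic_id=0, label_id=0)
```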
Our proposed model is able to generate topic and document vectors in the
same hyperspace. In an ideal case, all angles between a topic vector and each
document vector assigned to this topic should be similar and the standard devi-
ation of those angles should be small. We use this concept to compute alignment
between a topic vector and a given set of document vectors. Given a topic vector
Ti of topic ti, and a set of document vectors Dtj
that are assigned a topic tj, we
compute alignment using the following formula:

A(Ti, D^{tj}) = sqrt( (1/|D^{tj}|) Σ_{m=1}^{|D^{tj}|} ( (Ti · D^{tj}_m) / (||Ti|| ||D^{tj}_m||) − µ )^2 )    (6)

where D^{tj}_m refers to the document vector of the mth document in topic tj, and

µ = (1/|D^{tj}|) Σ_{m=1}^{|D^{tj}|} (Ti · D^{tj}_m) / (||Ti|| ||D^{tj}_m||)    (7)
Notice that Equation 6 is the standard deviation of the cosine angles between the topic vector and the document vectors. Lower values are expected when ti = tj and higher values are expected when ti ≠ tj.
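A direct sketch of Equations 6 and 7 with NumPy follows; the vector sizes and the synthetic document sets are illustrative.

```python
import numpy as np

def alignment(T_i, D_tj):
    """Standard deviation of cosine similarities between one topic vector
    T_i (L,) and the document vectors D_tj (n, L) assigned to topic t_j."""
    cos = (D_tj @ T_i) / (np.linalg.norm(D_tj, axis=1) * np.linalg.norm(T_i))
    mu = cos.mean()                            # Equation 7
    return np.sqrt(np.mean((cos - mu) ** 2))   # Equation 6

rng = np.random.default_rng(0)
T_i = rng.normal(size=32)                             # a topic vector
docs_same_topic = rng.normal(size=(100, 32)) + T_i    # loosely aligned documents
docs_other_topic = rng.normal(size=(100, 32))         # unrelated documents
print(alignment(T_i, docs_same_topic), alignment(T_i, docs_other_topic))
```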
7 Experiments
In this section, we seek to answer the following questions to justify the capabil-
ities and correctness of the proposed model.
1. Can our framework establish relationships between distributed representa-
tions of topics and documents? (Section 7.1)
2. Are the generated topic vectors expressive enough to capture similarity be-
tween topics and to distinguish difference between them? (Section 7.2)
3. How do our topic modeling results compare with the results produced by
other topic modeling algorithms? (Section 7.3)
4. Do the generated topics bring documents with similar domain-specific themes
together? (Section 7.4)
5. How does the runtime of the proposed framework scale with the size of the
distributed representations, increasing number of documents, and increasing
number of topics? (Section 7.5)
We used seven different text datasets¹ with different numbers of documents
and words. The datasets are listed in Table 1. Some of these datasets are widely
used in the text processing literature (e.g., the Reuters, WebKB, and 20 Newsgroups
datasets), while we have collected most of the other corpora from the public do-
main. The PubMed dataset is collected from publicly available citation databases
for biomedical literature provided by the US National Library of Medicine. The
PubMed dataset contains abstracts of cancer-related publications. The Spanish
news dataset was collected as a part of the EMBERS [20] project. The articles
covered news stories from 207 countries around the world.
7.1 Analysis of Distributed Representations of Topics and
Documents
The topic and document vectors generated by the proposed framework maintain
consistent relationships that can be leveraged in many applications to study the
topics of a stream of unseen documents. To be able to develop such applications, a
relationship between a topic vector Ti and any of its document vectors D^{ti}_p should be different than the relationship between another topic Tj and a document vector D^{tj}_q. In contrast, such topic-document relationships should be similar for two documents of the same topic.
Table 1: Summary of the datasets.
Dataset #Docs #Words Additional information
Synthetic 400 40,000 Four lower and two upper level groups.
20 Newsgroups 18,821 2,654,769 20 categories in seven groups.
Reuters R8 7,674 495,226 Eight category labels.
Reuters R52 9,100 624,456 52 groups.
WebKB 4,199 559,984 Four overlapping categories
PubMed 1.3 million 220 million Publication abstracts related to cancer.
Spanish news 3.7 million 3 billion News articles from 2013 and 2014.
¹ Data and software source codes are provided here: http://dal.cs.utep.edu/projects/tvec/.
[Fig. 3: Heat map of the standard deviation of cosine similarity between the ith topic vector and all documents of topic j, for (a) the synthetic data and (b) the Reuters R8 dataset. Darker cells on the diagonal indicate that the standard deviation is lower for angles between a topic vector and its own document vectors.]
[Fig. 4: Experiment with a synthetic dataset. (a) Sets of terms (term sets 1-7) shared among four document sets G1-G4, used to prepare the synthetic text corpus; (b) dendrogram generated from the four topic vectors using complete linkage.]
Each plot of Fig. 3 shows a heat map of the alignment between a topic vector Ti of topic ti and all document vectors D^{tj} of topic tj, computed using Equation 6. Fig. 3(a) shows the heat map with the four topics of the synthetic dataset and Fig. 3(b) shows the map with the eight topics of the Reuters R8 data. In these heat maps, lower alignment values result in darker cells. With both datasets, the dark diagonal cells indicate stronger topic-document alignment for topic and document vectors of the same topic, whereas weaker alignments are exhibited when document vectors are chosen from a different topic. This indicates that our proposed framework captures topical structures and models relationships between topics and documents in the same hyperspace.
7.2 Expressiveness of Topic Vectors
As described in Section 4.3, k-best topics generated by the topic modeling part of
the proposed model are selected as input to the distributed representation gen-
eration part. We set k = 1 for all our experiments, including the ones described in this subsection. To examine how expressive our distributed topic vectors are, we prepared a synthetic corpus containing documents with terms from seven sets as illustrated by Fig. 4(a). Four groups of documents contain terms specific to each group. The same dataset can also be divided into two higher-level groups of documents because pairs of document groups share a common upper-level set of terms. Additionally, all sets of documents share a common set of terms. We generated topic, document, and word vectors using our proposed framework. A dendrogram for the generated four topic vectors is shown in Fig. 4(b). As expected, the dendrogram exhibits the topical structure: pairs of topic vectors merge separately, and then those two groups merge at the top of the hierarchy. The dendrogram of topic vectors reflects the grouping mechanism we used to create the dataset.
[Fig. 5: Dendrogram prepared with the 20 category vectors of the 20 Newsgroups dataset, using complete linkage.]
In a second experiment, we used a dataset that already has category labels (20 Newsgroups) to verify how intuitive the topic vectors are in bringing similar categories together. To be able to generate distributed vectors for existing categories along with document and word vectors, we directly provided the known labels to the distributed representation generation part of the model as inputs, as opposed to providing topics generated by the topic modeling network. The official site for the 20 Newsgroups dataset reports that some of the newsgroups are very closely related to each other (e.g., comp.sys.ibm.pc.hardware and comp.sys.mac.hardware), while others may be highly unrelated (e.g., misc.forsale and soc.religion.christian). Our target is to verify if the generated category vectors can provide insights about how the topics should be merged. Fig. 5 shows the dendrogram prepared for the 20 category vectors of the 20 Newsgroups dataset. There are some differences between the official grouping and the grouping we have discovered using the category vectors; for example, sci.electronics is grouped with comp.sys.mac.hardware and comp.sys.ibm.pc.hardware. The label sci.electronics is far away from sci.space even though they share the prefix "sci". Our observation is that sci.electronics has many documents containing hardware related discussions. As a result, sci.electronics has greater similarity with hardware than with sci.space. Similar evidence is found for the rec.* groups. For example, the rec.sport.* groups are different from rec.motorcycles and rec.autos, but the latter two groups are closely related, as evident in the dendrogram.
7.3 Comparison of Quality of Generated Topics
Fig. 6 shows a comparison of results generated by our framework and two other
topic modeling methods, LDA and NTM, when applied on four classification
datasets — synthetic, Reuters-R8, Reuters-R52, WebKB, and 20 Newsgroups.
Fig. 6 (a) and (b) use adjusted Rand index (ARI) and normalized mutual in-
formation (NMI) to compare the topic assignments of the documents with the
[Fig. 6: Evaluation of the proposed framework against NTM and LDA on the synthetic, Reuters-R8, Reuters-R52, WebKB, and 20-newsgroups datasets: (a) adjusted Rand index and (b) normalized mutual information (benchmark labels); (c) Dunn index and (d) average Silhouette score (locality of the topics).]
ARI and NMI are larger for the proposed method for all the datasets. This implies that our framework realizes the expected themes of the collections better than LDA and NTM. Not only do the expected categories better match the topic assignments, but the generated topics are also local in the corresponding space of our framework. A higher Dunn index and a higher average
corresponding space of our framework. Higher Dunn index and higher average
silhouette coefficient for all the datasets, as depicted in Fig. 6(c) and (d), imply
that our model provides high quality local topics. Notice that Fig. 6(c) and (d)
do not have NTM. This is because Dunn index and average silhouette coefficient
require document vectors, but NTM [6] does not directly use any document
vector; rather, it uses precomputed word vectors only.
[Fig. 7: Comparison of the numbers of topics enriched by the hypergeometric distribution for NTM, LDA, and the proposed method on the five datasets.]
We also used a hypergeometric distribution based procedure to map each topic to a class label. Fig. 7 shows that the topic assignments using our framework have a higher number of enriched topics than any other method. This indicates that the topics generated by our method have higher thematic resemblance with the benchmark labels.
Table 2: Evaluation using the EMBERS news article dataset.

Method            Dunn index   Silhouette score
NTM               0.04         0.01
LDA               0.01         -0.015
Proposed method   0.1          0.05
All the datasets described so far in this subsection are labeled and are widely used as ground truths in many data mining and machine learning evaluations. In addition to
these datasets, we used our EMBERS data
containing around 3.7 million news articles to
compare locality of the topics with other methods. Table 2 shows that our
method produces topics with greater Dunn index and average silhouette score
than other methods. This indicates that our method performs even better when
the datasets are very large.
[Fig. 8: Comparison of our method and LDA using MeSH terms associated with the PubMed abstracts: (a) ratio of true positives and (b) ratio of true negatives against the top n MeSH terms.]
7.4 Evaluation using Domain Specific Information
In this experiment, we used the PubMed dataset to compute overlap of domain
specific information for documents in the same topic (i.e., true positive) and
lack of such overlap for a pair of documents from two different topics (i.e., true
negative). In the PubMed dataset, each abstract is provided with some major
Medical Subject Header (MeSH) terms which come from a predefined ontology.
We used these MeSH terms as domain specific information to evaluate the topics.
It is expected that the sets of MeSH terms of two documents of the same topic will have some common entries, whereas the sets of MeSH terms of two documents from two different topics will have fewer or no overlapping records. For each abstract, we ordered the MeSH terms based on the Jaccard similarity between a MeSH term and the abstract. Notice that if we pick the n best MeSH terms for two documents from the same topic, the chance that these two sets of n best MeSH terms have common entries increases with larger n. This trend is observed in Fig. 8(a) for both our framework and LDA. The true positive ratio quickly reaches around 80% with only the five best MeSH terms for each pair of documents. In contrast, the top n MeSH terms of two documents from two different topics should rarely overlap when n is small, since the topical similarity of these two documents is minimal. As n increases, the true negative ratio will decrease due to the inclusion of more general entries in the lists of n best MeSH terms. Fig. 8(b) shows the expected trend for both LDA and our framework. We selected 5,000 random pairs of documents from the same topics and another 5,000 pairs from different topics for the two plots, Fig. 8(a) and (b), respectively. Fig. 8(a) and (b) demonstrate that our method follows the expected trend of sharing domain specific information. Although the true positive values of our method are slightly lower than LDA's in some cases, the true negative values are always greater than LDA's. This indicates that our model generates topics containing similar biological themes, while documents of different topics, as expected, have less similarity in domain specific information.
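A small sketch of this MeSH-based check follows; the term lists, the abstract text, and the ranking step are illustrative stand-ins for the actual PubMed records.

```python
# Illustrative MeSH-overlap check; the term lists and abstract are stand-ins.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def top_n_overlap(mesh_a, mesh_b, n):
    """True if two documents share any of their n highest-ranked MeSH terms."""
    return len(set(mesh_a[:n]) & set(mesh_b[:n])) > 0

doc1 = ["Neoplasms", "Mutation", "DNA Repair", "Apoptosis"]
doc2 = ["Neoplasms", "Apoptosis", "Cell Line", "Prognosis"]
doc3 = ["Influenza, Human", "Vaccination", "Antibodies"]

# rank a document's MeSH terms by Jaccard similarity with its abstract words
abstract_words = "mutation induced apoptosis observed in several neoplasms".split()
ranked_doc1 = sorted(doc1, key=lambda t: jaccard(t.lower().split(), abstract_words),
                     reverse=True)

true_positive = top_n_overlap(ranked_doc1, doc2, n=3)       # same topic: overlap expected
true_negative = not top_n_overlap(ranked_doc1, doc3, n=3)   # different topics: none expected
```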
7.5 Runtime Characteristics
Fig. 9 depicts the runtime behavior of our proposed framework with varying numbers of documents, topics, and vector sizes. The runtime increases almost linearly with each of these variables. This indicates that our proposed framework is scalable to large amounts of data.
[Fig. 9: Execution time with varying (a) number of documents, (b) number of topics, and (c) vector size, for documents of 25, 50, 75, and 100 words each.]
The experiments in this section were done using synthetic data with different numbers of words per document, as depicted by the multiple lines in each of the plots of Fig. 9.
8 Conclusion
We have presented a framework to generate distributed vectors for elements in a
corpus as well as the underlying latent topics. All types of vectors — topics, doc-
uments, and words — share the same space allowing the framework to compute
relationships between all types of elements. Our results show that the framework
can efficiently discover latent topics and generate distributed vectors simultane-
ously. The proposed framework is expressive and able to capture domain specific
information in a lower-dimensional space. In the future, we will investigate how one
can study the information genealogy of a document collection with temporal
signatures using the proposed framework. We are inspired by the fact that we
can train the distributed vector generation network in a sequence as found in the
temporal signatures associated with the documents and observe the shift of the
word probabilities at the output of the network. We can also observe how the
probability distributions of the topic generation network change over the given
time sequence. This would help identify how one topic influences and transcends
another and how the topical vocabulary shifts over time.
Acknowledgments. This work is supported in part by M. S. Hossain’s startup
grant at UTEP, University Research Institute (URI, Office of Research and
Sponsored Projects, UTEP), and the Intelligence Advanced Research Projects
Activity (IARPA) via DoI/NBC contract number D12PC000337. The funders
had no role in study design, data collection and analysis, decision to publish, or
preparation of the manuscript. The US Government is authorized to reproduce
and distribute reprints of this work for Governmental purposes notwithstanding
any copyright annotation thereon.
References
1. L. AlSumait, D. Barbará, and C. Domeniconi. On-line LDA: Adaptive topic models
for mining text streams with applications to topic detection and tracking. In
ICDM’08, pages 3–12, 2008.
2. Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic lan-
guage model. Machine Learning Research, 3:1137–1155, 2003.
3. D. Blei and J. Lafferty. Correlated topic models. Advances in Neural Information
Processing Systems, 18:147, 2006.
4. D. M. Blei and J. D. Lafferty. Dynamic topic models. In ICML’06, pages 113–120,
2006.
5. D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. Machine
Learning Research, 3:993–1022, 2003.
6. Z. Cao, S. Li, Y. Liu, W. Li, and H. Ji. A novel neural topic model and its
supervised extension. In AAAI’15, 2015.
7. G. J. Chaitin. Algorithmic information theory. Wiley Online Library, 1982.
8. D. J. Chalmers. Syntactic transformations on distributed representations. In Con-
nectionist Natural Language Processing, pages 46–55. Springer, 1992.
9. S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harsh-
man. Indexing by latent semantic analysis. American Society for Information
Science, 41(6):391–407, 1990.
10. J. C. Dunn. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics, 3(3):32–57, 1973.
11. T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum. Integrating topics
and syntax. In NIPS’04, pages 537–544, 2004.
12. G. E. Hinton. Learning distributed representations of concepts. In CogSci’86,
volume 1, page 12, 1986.
13. T. Hofmann. Probabilistic latent semantic indexing. In SIGIR’99, pages 50–57.
ACM, 1999.
14. J. E. Hummel and K. J. Holyoak. Distributed representations of structure: A
theory of analogical access and mapping. Psychological Review, 104(3):427, 1997.
15. H. Larochelle and S. Lauly. A neural autoregressive topic model. In NIPS’12,
pages 2708–2716, 2012.
16. Q. V. Le and T. Mikolov. Distributed representations of sentences and documents.
In ICML’14, pages 1188–1196, 2014.
17. T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word
representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
18. T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed rep-
resentations of words and phrases and their compositionality. In NIPS’13, pages
3111–3119, 2013.
19. J. B. Pollack. Recursive distributed representations. Artificial Intelligence,
46(1):77–105, 1990.
20. N. Ramakrishnan et al. ‘Beating the news’ with EMBERS: Forecasting civil unrest
using open source indicators. In SIGKDD’14, pages 1799–1808, 2014.
21. P. J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53–65, 1987.
22. D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by
back-propagating errors. Cognitive Modeling, 5, 1988.
23. I. Rivals, L. Personnaz, L. Taing, and M.-C. Potier. Enrichment or depletion of a GO category within a class of genes: which test? Bioinformatics, 23(4):401–407, 2007.
24. D. Steinley. Properties of the Hubert-Arabie adjusted Rand index. Psychological
Methods, 9(3):386, 2004.
25. H. M. Wallach. Topic modeling: beyond bag-of-words. In ICML’06, pages 977–984,
2006.
26. L. Wan, L. Zhu, and R. Fergus. A hybrid neural network-latent topic model. In
AISTATS’12, pages 1287–1294, 2012.
IJERA Editor
 
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLINGA TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
cscpconf
 
Ay3313861388
Ay3313861388Ay3313861388
Ay3313861388
IJMER
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
Bhaskar Mitra
 
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
ijseajournal
 
Low Resource Domain Subjective Context Feature Extraction via Thematic Meta-l...
Low Resource Domain Subjective Context Feature Extraction via Thematic Meta-l...Low Resource Domain Subjective Context Feature Extraction via Thematic Meta-l...
Low Resource Domain Subjective Context Feature Extraction via Thematic Meta-l...
AI Publications
 
Co-Clustering For Cross-Domain Text Classification
Co-Clustering For Cross-Domain Text ClassificationCo-Clustering For Cross-Domain Text Classification
Co-Clustering For Cross-Domain Text Classification
paperpublications3
 
Identifying Hot Topic Trends in Streaming Text Data Using News Sequential Evo...
Identifying Hot Topic Trends in Streaming Text Data Using News Sequential Evo...Identifying Hot Topic Trends in Streaming Text Data Using News Sequential Evo...
Identifying Hot Topic Trends in Streaming Text Data Using News Sequential Evo...
Shakas Technologies
 
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coheren...
ijaia
 
Semantic Annotation: The Mainstay of Semantic Web
Semantic Annotation: The Mainstay of Semantic WebSemantic Annotation: The Mainstay of Semantic Web
Semantic Annotation: The Mainstay of Semantic Web
Editor IJCATR
 
Automatically converting tabular data to
Automatically converting tabular data toAutomatically converting tabular data to
Automatically converting tabular data to
IJwest
 
EXPERT OPINION AND COHERENCE BASED TOPIC MODELING
EXPERT OPINION AND COHERENCE BASED TOPIC MODELINGEXPERT OPINION AND COHERENCE BASED TOPIC MODELING
EXPERT OPINION AND COHERENCE BASED TOPIC MODELING
ijnlc
 
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...
ijnlc
 
OUTCOME ANALYSIS IN ACADEMIC INSTITUTIONS USING NEO4J
OUTCOME ANALYSIS IN ACADEMIC INSTITUTIONS USING NEO4JOUTCOME ANALYSIS IN ACADEMIC INSTITUTIONS USING NEO4J
OUTCOME ANALYSIS IN ACADEMIC INSTITUTIONS USING NEO4J
ijcsity
 
Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...Navigation through citation network based on content similarity using cosine ...
Navigation through citation network based on content similarity using cosine ...
Salam Shah
 
A semantic framework and software design to enable the transparent integratio...
A semantic framework and software design to enable the transparent integratio...A semantic framework and software design to enable the transparent integratio...
A semantic framework and software design to enable the transparent integratio...
Patricia Tavares Boralli
 
A scalable gibbs sampler for probabilistic entity linking
A scalable gibbs sampler for probabilistic entity linkingA scalable gibbs sampler for probabilistic entity linking
A scalable gibbs sampler for probabilistic entity linking
Sunny Kr
 
Ad

More from Parang Saraf (20)

Email and Network Analyzer
Email and Network AnalyzerEmail and Network Analyzer
Email and Network Analyzer
Parang Saraf
 
Slides: Safeguarding Abila through Multiple Data Perspectives
Slides: Safeguarding Abila through Multiple Data PerspectivesSlides: Safeguarding Abila through Multiple Data Perspectives
Slides: Safeguarding Abila through Multiple Data Perspectives
Parang Saraf
 
Slides: Safeguarding Abila: Real-time Streaming Analysis
Slides: Safeguarding Abila: Real-time Streaming AnalysisSlides: Safeguarding Abila: Real-time Streaming Analysis
Slides: Safeguarding Abila: Real-time Streaming Analysis
Parang Saraf
 
Slides: Safeguarding Abila: Spatio-Temporal Activity Modeling
Slides: Safeguarding Abila: Spatio-Temporal Activity ModelingSlides: Safeguarding Abila: Spatio-Temporal Activity Modeling
Slides: Safeguarding Abila: Spatio-Temporal Activity Modeling
Parang Saraf
 
Safeguarding Abila: Discovering Evolving Activist Networks
Safeguarding Abila: Discovering Evolving Activist NetworksSafeguarding Abila: Discovering Evolving Activist Networks
Safeguarding Abila: Discovering Evolving Activist Networks
Parang Saraf
 
News Analyzer
News AnalyzerNews Analyzer
News Analyzer
Parang Saraf
 
EMBERS AutoGSR: Automated Coding of Civil Unrest Events
EMBERS AutoGSR: Automated Coding of Civil Unrest EventsEMBERS AutoGSR: Automated Coding of Civil Unrest Events
EMBERS AutoGSR: Automated Coding of Civil Unrest Events
Parang Saraf
 
EMBERS at 4 years: Experiences operating an Open Source Indicators Forecastin...
EMBERS at 4 years: Experiences operating an Open Source Indicators Forecastin...EMBERS at 4 years: Experiences operating an Open Source Indicators Forecastin...
EMBERS at 4 years: Experiences operating an Open Source Indicators Forecastin...
Parang Saraf
 
Slides: Forex-Foreteller: Currency Trend Modeling using News Articles
Slides: Forex-Foreteller: Currency Trend Modeling using News ArticlesSlides: Forex-Foreteller: Currency Trend Modeling using News Articles
Slides: Forex-Foreteller: Currency Trend Modeling using News Articles
Parang Saraf
 
Slides: Epidemiological Modeling of News and Rumors on Twitter
Slides: Epidemiological Modeling of News and Rumors on TwitterSlides: Epidemiological Modeling of News and Rumors on Twitter
Slides: Epidemiological Modeling of News and Rumors on Twitter
Parang Saraf
 
EMBERS AutoGSR: Automated Coding of Civil Unrest Events
EMBERS AutoGSR: Automated Coding of Civil Unrest EventsEMBERS AutoGSR: Automated Coding of Civil Unrest Events
EMBERS AutoGSR: Automated Coding of Civil Unrest Events
Parang Saraf
 
DMAP: Data Aggregation and Presentation Framework
DMAP: Data Aggregation and Presentation FrameworkDMAP: Data Aggregation and Presentation Framework
DMAP: Data Aggregation and Presentation Framework
Parang Saraf
 
EMBERS Posters
EMBERS PostersEMBERS Posters
EMBERS Posters
Parang Saraf
 
Bayesian Model Fusion for Forecasting Civil Unrest
Bayesian Model Fusion for Forecasting Civil UnrestBayesian Model Fusion for Forecasting Civil Unrest
Bayesian Model Fusion for Forecasting Civil Unrest
Parang Saraf
 
‘Beating the News’ with EMBERS: Forecasting Civil Unrest using Open Source In...
‘Beating the News’ with EMBERS: Forecasting Civil Unrest using Open Source In...‘Beating the News’ with EMBERS: Forecasting Civil Unrest using Open Source In...
‘Beating the News’ with EMBERS: Forecasting Civil Unrest using Open Source In...
Parang Saraf
 
Safeguarding Abila through Multiple Data Perspectives
Safeguarding Abila through Multiple Data PerspectivesSafeguarding Abila through Multiple Data Perspectives
Safeguarding Abila through Multiple Data Perspectives
Parang Saraf
 
Safeguarding Abila: Real-time Streaming Analysis
Safeguarding Abila: Real-time Streaming AnalysisSafeguarding Abila: Real-time Streaming Analysis
Safeguarding Abila: Real-time Streaming Analysis
Parang Saraf
 
Safeguarding Abila: Spatio-Temporal Activity Modeling
Safeguarding Abila: Spatio-Temporal Activity ModelingSafeguarding Abila: Spatio-Temporal Activity Modeling
Safeguarding Abila: Spatio-Temporal Activity Modeling
Parang Saraf
 
Safeguarding Abila: Discovering Evolving Activist Networks
Safeguarding Abila: Discovering Evolving Activist NetworksSafeguarding Abila: Discovering Evolving Activist Networks
Safeguarding Abila: Discovering Evolving Activist Networks
Parang Saraf
 
Forex-Foreteller: Currency Trend Modeling using News Articles
Forex-Foreteller: Currency Trend Modeling using News ArticlesForex-Foreteller: Currency Trend Modeling using News Articles
Forex-Foreteller: Currency Trend Modeling using News Articles
Parang Saraf
 
Email and Network Analyzer
Email and Network AnalyzerEmail and Network Analyzer
Email and Network Analyzer
Parang Saraf
 
Slides: Safeguarding Abila through Multiple Data Perspectives
Slides: Safeguarding Abila through Multiple Data PerspectivesSlides: Safeguarding Abila through Multiple Data Perspectives
Slides: Safeguarding Abila through Multiple Data Perspectives
Parang Saraf
 
Slides: Safeguarding Abila: Real-time Streaming Analysis
Slides: Safeguarding Abila: Real-time Streaming AnalysisSlides: Safeguarding Abila: Real-time Streaming Analysis
Slides: Safeguarding Abila: Real-time Streaming Analysis
Parang Saraf
 
Slides: Safeguarding Abila: Spatio-Temporal Activity Modeling
Slides: Safeguarding Abila: Spatio-Temporal Activity ModelingSlides: Safeguarding Abila: Spatio-Temporal Activity Modeling
Slides: Safeguarding Abila: Spatio-Temporal Activity Modeling
Parang Saraf
 
Safeguarding Abila: Discovering Evolving Activist Networks
Safeguarding Abila: Discovering Evolving Activist NetworksSafeguarding Abila: Discovering Evolving Activist Networks
Safeguarding Abila: Discovering Evolving Activist Networks
Parang Saraf
 
EMBERS AutoGSR: Automated Coding of Civil Unrest Events
EMBERS AutoGSR: Automated Coding of Civil Unrest EventsEMBERS AutoGSR: Automated Coding of Civil Unrest Events
EMBERS AutoGSR: Automated Coding of Civil Unrest Events
Parang Saraf
 
EMBERS at 4 years: Experiences operating an Open Source Indicators Forecastin...
EMBERS at 4 years: Experiences operating an Open Source Indicators Forecastin...EMBERS at 4 years: Experiences operating an Open Source Indicators Forecastin...
EMBERS at 4 years: Experiences operating an Open Source Indicators Forecastin...
Parang Saraf
 
Slides: Forex-Foreteller: Currency Trend Modeling using News Articles
Slides: Forex-Foreteller: Currency Trend Modeling using News ArticlesSlides: Forex-Foreteller: Currency Trend Modeling using News Articles
Slides: Forex-Foreteller: Currency Trend Modeling using News Articles
Parang Saraf
 
Slides: Epidemiological Modeling of News and Rumors on Twitter
Slides: Epidemiological Modeling of News and Rumors on TwitterSlides: Epidemiological Modeling of News and Rumors on Twitter
Slides: Epidemiological Modeling of News and Rumors on Twitter
Parang Saraf
 
EMBERS AutoGSR: Automated Coding of Civil Unrest Events
EMBERS AutoGSR: Automated Coding of Civil Unrest EventsEMBERS AutoGSR: Automated Coding of Civil Unrest Events
EMBERS AutoGSR: Automated Coding of Civil Unrest Events
Parang Saraf
 
DMAP: Data Aggregation and Presentation Framework
DMAP: Data Aggregation and Presentation FrameworkDMAP: Data Aggregation and Presentation Framework
DMAP: Data Aggregation and Presentation Framework
Parang Saraf
 
Bayesian Model Fusion for Forecasting Civil Unrest
Bayesian Model Fusion for Forecasting Civil UnrestBayesian Model Fusion for Forecasting Civil Unrest
Bayesian Model Fusion for Forecasting Civil Unrest
Parang Saraf
 
‘Beating the News’ with EMBERS: Forecasting Civil Unrest using Open Source In...
‘Beating the News’ with EMBERS: Forecasting Civil Unrest using Open Source In...‘Beating the News’ with EMBERS: Forecasting Civil Unrest using Open Source In...
‘Beating the News’ with EMBERS: Forecasting Civil Unrest using Open Source In...
Parang Saraf
 
Safeguarding Abila through Multiple Data Perspectives
Safeguarding Abila through Multiple Data PerspectivesSafeguarding Abila through Multiple Data Perspectives
Safeguarding Abila through Multiple Data Perspectives
Parang Saraf
 
Safeguarding Abila: Real-time Streaming Analysis
Safeguarding Abila: Real-time Streaming AnalysisSafeguarding Abila: Real-time Streaming Analysis
Safeguarding Abila: Real-time Streaming Analysis
Parang Saraf
 
Safeguarding Abila: Spatio-Temporal Activity Modeling
Safeguarding Abila: Spatio-Temporal Activity ModelingSafeguarding Abila: Spatio-Temporal Activity Modeling
Safeguarding Abila: Spatio-Temporal Activity Modeling
Parang Saraf
 
Safeguarding Abila: Discovering Evolving Activist Networks
Safeguarding Abila: Discovering Evolving Activist NetworksSafeguarding Abila: Discovering Evolving Activist Networks
Safeguarding Abila: Discovering Evolving Activist Networks
Parang Saraf
 
Forex-Foreteller: Currency Trend Modeling using News Articles
Forex-Foreteller: Currency Trend Modeling using News ArticlesForex-Foreteller: Currency Trend Modeling using News Articles
Forex-Foreteller: Currency Trend Modeling using News Articles
Parang Saraf
 
Ad

Recently uploaded (20)

computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
Deloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining ProjectsDeloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining Projects
Process mining Evangelist
 
03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia03 Daniel 2-notes.ppt seminario escatologia
03 Daniel 2-notes.ppt seminario escatologia
Alexander Romero Arosquipa
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
Modern_Distribution_Presentation.pptx Aa
Modern_Distribution_Presentation.pptx AaModern_Distribution_Presentation.pptx Aa
Modern_Distribution_Presentation.pptx Aa
MuhammadAwaisKamboh
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
Deloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining ProjectsDeloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining Projects
Process mining Evangelist
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
FPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptxFPET_Implementation_2_MA to 360 Engage Direct.pptx
FPET_Implementation_2_MA to 360 Engage Direct.pptx
ssuser4ef83d
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
Medical Dataset including visualizations
Medical Dataset including visualizationsMedical Dataset including visualizations
Medical Dataset including visualizations
vishrut8750588758
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
Modern_Distribution_Presentation.pptx Aa
Modern_Distribution_Presentation.pptx AaModern_Distribution_Presentation.pptx Aa
Modern_Distribution_Presentation.pptx Aa
MuhammadAwaisKamboh
 
Principles of information security Chapter 5.ppt
Principles of information security Chapter 5.pptPrinciples of information security Chapter 5.ppt
Principles of information security Chapter 5.ppt
EstherBaguma
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 

Concurrent Inference of Topic Models and Distributed Vector Representations

The generalization described in this paper provides distributed representations for heterogeneous elements of a dataset in the same hyperspace. The problem of creating distributed representations becomes more challenging, however, when the label information is not contained within the dataset. The focus of this paper is the generation of topical structures and their representations in the same space as documents and words. Representing topics, documents, words, and other labeled information in one space makes it possible to compute syntactic and semantic relationships not only between words but also between topics and documents directly, using simple vector algebra.

Estimating the topic labels of documents is another challenge when using distributed representations. Earlier topic modeling techniques [9, 13] define a document as a mixture of topics and estimate the probability p(t|d) of a topic t given a document d through probabilistic reasoning. More recently, topic models have been viewed from a neural network perspective [26, 15, 6], where these probabilities are generated from the hidden nodes of a network. Such neural networks require compact numeric representations of words and documents for effective training, which are hard to obtain with traditional vector space models that represent documents in a very high dimensional space. There have been attempts to use compact distributed representations of words and documents learned from a large general-purpose dataset [6], but precomputed vectors may not always be appropriate for new domain-specific datasets. Furthermore, the vocabulary shifts in new directions over time, which changes the distributed representations.

The specific contributions of this paper are as follows.

1. We formulate the problem of computing distributed representations of topics in the same space as documents and words using a novel fusion of a neural network based topic model and a distributed representation generation technique.
2. In the proposed method, the tasks of computing topics for documents and generating distributed representations are carried out simultaneously, unlike closely related state-of-the-art techniques in which precomputed distributed word vectors are leveraged to compute topics. Additionally, to the best of our knowledge, none of the state-of-the-art methods generates distributed representations of topics.
3. The proposed method generates the distributed vectors using far fewer dimensions than the actual text feature space, yet even in this lower-dimensional space the vectors capture syntactic and semantic relationships between language components.
4. We demonstrate that the generated topic vectors explain domain-specific properties of datasets, help identify topical similarities, and exhibit topic-specific relationships with document vectors.
2 Related Work

Distributed representations have been used in diverse fields of scientific research with notable success because they capture a more generalized view of information than local representations. Rumelhart et al. [22] designed a neural network based approach for distributed representation of words, which has been followed by many efforts in language modeling. One such model is the neural probabilistic model of Bengio et al. [2], which uses a sliding-window context around a word to generate compact representations. Mikolov et al. [17] introduced the continuous bag-of-words (CBOW) and skip-gram models to compute continuous vector representations of words efficiently from very large datasets. The skip-gram model was significantly improved in [18], which includes phrase vectors along with word vectors. Le and Mikolov [16] extended the CBOW model to learn distributed representations of higher-level texts such as paragraphs and documents. Our proposed model further enriches this line of work by adding the capability to generate (1) vectors for arbitrary labels in the dataset and (2) vectors for topics, for which a text dataset does not contain any labeled information.

Finding hidden themes in a document collection has been of great interest to data mining and information retrieval researchers for more than two decades. An early work in this literature is latent semantic indexing (LSI) [9], which maps documents and terms into a special "latent semantic" space by applying dimensionality reduction to traditional bag-of-words vector space representations of documents. A probabilistic version of LSI, pLSI [13], introduces a mixture model in which each document is represented by a mixing proportion of hidden "topics". Latent Dirichlet Allocation (LDA) [5], a generalized and more sophisticated version of pLSI, is one of the most notable models in the literature; it provides a generative probabilistic approach to document modeling that assumes a random process by which the documents are created. LDA spawned a deluge of work exploring different aspects of topic modeling. For example, the Dynamic Topic Model (DTM) [4] captures the evolution of topics in a time-labeled corpus, Online LDA (OLDA) [1] handles streams of documents with a dynamic vocabulary, Wallach [25] and Griffiths et al. [11] exploit the sentence structures of documents, and the Correlated Topic Model (CTM) [3] captures the correlation between topics. More recently, neural network based models have received great attention from the data mining community. Wan et al. [26] introduce a hybrid model in a computer vision setting, DocNADE [15] provides an autoregressive neural network for topic modeling, and Cao et al. [6] propose a neural topic model (NTM) with a supervised extension. The latter work closely resembles the part of our proposed model that generates topics for each document.
Fig. 1: The proposed framework (a topic generation module and a distributed vector generation module; forward propagation produces topic labels, and the document, topic, and word vectors are updated using backpropagation).

3 Problem Formulation

Let D = {d_1, d_2, ..., d_N} be a text dataset containing N documents taking terms from a set of M words W = {w_1, w_2, ..., w_M}. Each document can contain an arbitrary number of words in any sequence. The objective is to generate a universal distributed representation for the labeled items (e.g., words and documents) and the latent topics of each document of dataset D. Let T = {t_1, t_2, ..., t_K} be the set of topics, and let L be the expected number of dimensions of the distributed representations of words, documents, and topics, where L is much smaller than the number of words M. The word vectors W ∈ R^{M×L}, document vectors D ∈ R^{N×L}, and topic vectors T ∈ R^{K×L} generated in the same L-dimensional space should maintain two specific properties: (1) the distributed representation of each type should capture the semantic, syntactic, and topical aspects of conventional language models, and (2) all types of vectors (topics, documents, and words) organized in the L-dimensional hyperspace must be comparable to each other.

The first property aligns the framework with the objectives of any language model in which features are generated for common data mining tasks such as clustering and classification. The second property, however, is unique and specific to relating vectors of different types of entities such as topics, documents, and words. In word2vec [17], the authors show that distributed representations of words can retrieve linguistic similarities between pairs of words; for example, W_King − W_Man is close to W_Queen − W_Woman. The ability to model topics in the same hyperspace extends this property by capturing similarity between relationships among topics and documents. For example, if two documents d_i and d_j are drawn from the same topic t_p, then T_p − D_i should be similar to T_p − D_j; if d_i and d_j are drawn from two different topics t_p and t_q, then T_p − D_i should tend to differ from T_p − D_j.
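To make the second property concrete, the short numpy sketch below (not from the paper; the vectors and the shared per-topic offset are toy assumptions used only for illustration) compares the offsets T_p − D_i with cosine similarity for documents drawn from the same topic and from different topics.

import numpy as np

# Illustrative sketch only: random stand-in vectors; the shared "shift"
# per topic is a toy assumption that mimics documents of the same topic.
rng = np.random.default_rng(0)
L = 50                                               # shared space dimensionality
T_p, T_q = rng.normal(size=L), rng.normal(size=L)    # two topic vectors
shift_p, shift_q = rng.normal(size=L), rng.normal(size=L)
D_i = T_p + shift_p + 0.1 * rng.normal(size=L)       # documents of topic t_p
D_j = T_p + shift_p + 0.1 * rng.normal(size=L)
D_q = T_q + shift_q + 0.1 * rng.normal(size=L)       # a document of topic t_q

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(T_p - D_i, T_p - D_j))   # high: same-topic offsets look alike
print(cosine(T_p - D_i, T_p - D_q))   # near zero: offsets across topics differ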
4 Methodology

The main objective of the proposed framework is to generate a compact distributed representation for the topics, documents, and words of a document collection in the same hyperspace in such a way that all these heterogeneous objects are comparable to each other and capture the semantic, syntactic, and thematic properties. The proposed framework has three major components. First, we adopt a generic neural network that can generate distributed vectors for documents, words, and any given labels. Second, we propose a deep neural network based topic model that can take distributed representations of words and documents and estimate the topic distribution of each document. Finally, we convolute both these networks so that they can share information and train simultaneously. Fig. 1 shows the proposed framework. The following subsections describe the model in sequence.

4.1 Distributed Representation of Heterogeneous Entities

Inferring a distributed representation W for the words of a document collection D having vocabulary W is based on predicting a word given other words in the same context. The objective of such a word representation model is to maximize the average log probability

  \frac{1}{M} \sum_{m=p}^{M-p} \log p(w_m \mid w_{m-p}, \ldots, w_{m+p})    (1)

The individual probabilities in Equation 1 are estimated by training a multi-class deep neural network, such as softmax. They can be computed as

  p(w_m \mid w_{m-p}, \ldots, w_{m+p}) = \frac{e^{y_m}}{\sum_i e^{y_i}}    (2)

where y_i is the unnormalized log-probability for every output word w_i, with

  y = b + U\, h(w_{m-p}, \ldots, w_{m+p}; W)    (3)

Here, U and b are the softmax parameters, and h is constructed by a concatenation or average of the relevant word vectors. We use hierarchical softmax [17] instead of softmax for faster training, and calculate the gradient using stochastic gradient descent. After the training converges, words with similar meaning are mapped to a similar position in the vector space. To obtain a document vector, a document is thought of as another word; the only change in the model is in Equation 3, where h is constructed using W and D.

Algorithm 1: LearnDistRep, the algorithm for learning topic vectors
  Input: document id d; set of topics in d, T_d; word to predict, w; context of w, C_w
  Parameters: distributed representations D, W, and T
  1. Calculate y using Equation 4.
  2. Calculate the gradient gr using stochastic gradient descent.
  3. Update the document vector D_d, the topic vectors T_{T_d}, and the word vectors W_{C_w} using gr.
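The numpy sketch below illustrates one LearnDistRep-style gradient step in the spirit of Algorithm 1 and Equations 2-3. It is a simplified stand-in, not the paper's implementation: it uses a plain softmax output instead of hierarchical softmax, forms h by averaging the input vectors, and all sizes, names, and data are placeholders.

import numpy as np

# Minimal sketch under the assumptions stated above (plain softmax, toy sizes).
rng = np.random.default_rng(1)
M, N, K, L = 1000, 100, 8, 50          # words, documents, topics, dimensions
W = 0.01 * rng.normal(size=(M, L))     # word vectors
D = 0.01 * rng.normal(size=(N, L))     # document vectors
T = 0.01 * rng.normal(size=(K, L))     # topic vectors
U = 0.01 * rng.normal(size=(M, L))     # softmax weights (U in Equation 3)
b = np.zeros(M)                        # softmax bias
alpha = 0.025                          # learning rate

def learn_dist_rep(d, topics, w, context):
    """Predict word w from its context, document, and topic vectors, then
    push the error back into U, b, and the input vectors (Algorithm 1)."""
    rows = [W[c] for c in context] + [D[d]] + [T[t] for t in topics]
    h = np.mean(rows, axis=0)                 # averaged input projection h
    y = b + U @ h                             # unnormalized log-probabilities
    p = np.exp(y - y.max())
    p /= p.sum()                              # softmax (Equation 2)
    err = p.copy()
    err[w] -= 1.0                             # gradient of -log p(w | context)
    grad_h = (U.T @ err) / len(rows)
    U[:] -= alpha * np.outer(err, h)          # update softmax parameters
    b[:] -= alpha * err
    for c in context:
        W[c] -= alpha * grad_h                # update word vectors
    D[d] -= alpha * grad_h                    # update the document vector
    for t in topics:
        T[t] -= alpha * grad_h                # update the topic vectors

learn_dist_rep(d=0, topics=[2], w=7, context=[3, 5, 11, 42])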
Further labels, for example authors, topics, and tags, can be included the same way document vectors are added. Our focus in this paper is to incorporate topics rather than additional labels. Incorporating topic vectors is challenging because the topics are not given; rather, they must be generated using the documents and words. For the time being, let us assume that a topic is just a given label that comes with the data. In contrast to the word vector matrix W, which is shared across all the documents, a topic vector is shared only across the documents that contain that particular topic. Considering topic vectors along with the vectors for words and documents, Equation 3 is modified to

  y = b + U\, h(w_{t-k}, \ldots, w_{t+k}, d_q, t_{r_1}, t_{r_2}, \ldots, t_{r_s}; W, D, T)    (4)

For training, we sample variable-length contexts using a sliding window over each document. Such a sliding window is commonly referred to as an n-gram. We use n-grams instead of single words (unigrams) since n-grams produce representative contexts around each word [18]. A procedure for training this generic network for topics, documents, and words is given in Algorithm 1; a small illustrative sketch of this context sampling follows below.
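The sketch promised above shows how sliding-window context sampling can be implemented. The tokenization, window size, and example sentence are assumptions made for illustration; they are not taken from the paper.

def sliding_contexts(tokens, window=2):
    """Yield (target word, surrounding context) pairs from a token list."""
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:                               # skip degenerate one-word documents
            yield target, context

doc = "topic vectors live in the same space as word vectors".split()
for target, ctx in sliding_contexts(doc):
    print(target, "<-", ctx)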
4.2 Estimating Topic Labels of Documents

Fig. 2: Basic neural network representation of topic modeling (inputs d and g, hidden nodes l_d and l_t with weight matrices W1 and W2, and output node l_s).

As stated earlier, the generic model described in Section 4.1 requires topics as labels for each document. This section focuses on a topic modeling technique that can generate topic labels taking document vectors and word vectors into account. For effective and efficient generation of topic vectors, the topic modeling technique must synchronize with the iterations of the distributed vector generation part. Several topic modeling techniques have been proposed in the literature to find the topic distribution of documents in such unlabeled datasets. In a general topic model, each document is seen as a mixture of topics, and each topic is represented as a probability distribution over the vocabulary of the entire corpus. The conditional probability p(w|d) of a word w given a document d is computed from the word-topic and topic-document distributions as p(w|d) = \sum_{i=1}^{K} p(w \mid t_i)\, p(t_i \mid d), where K is the number of topics and t_i is a latent topic. This equation can be rewritten as

  p(w \mid d) = \phi(w)\, \theta(d)^{T}    (5)

where \phi(w) = [p(w|t_1), p(w|t_2), \ldots, p(w|t_K)] holds the conditional probabilities of w under all the topics and \theta(d) = [p(t_1|d), p(t_2|d), \ldots, p(t_K|d)] is the topic distribution of d.

We can view topic models from a neural network perspective by following the form of Equation 5. Fig. 2 shows the architecture of a neural network with two input nodes for the sliding window (n-gram) g and the document d, two hidden nodes representing \phi(g) and \theta(d), and one output node producing the conditional probability p(g|d). The topic-document node l_d ∈ R^{1×K} computes the topic distribution of a document (similar to \theta in topic models) using the weight matrix W1 ∈ R^{L×K}; it is computed as l_d(d) = softmax(D_d × W1), where the softmax maintains the probabilistic constraint that all the topic probabilities of a document must sum to 1. The n-gram-topic node l_t ∈ R^{1×K} stands for the topic representation of the input n-gram and is calculated as l_t(g) = sigmoid(W_g × W2), where W2 ∈ R^{L×K} denotes the weight matrix between the n-gram input node and the n-gram-topic node; this vector follows a probabilistic form similar to \phi in topic models. The output node l_s ∈ R gives the matching score of an n-gram g and a document d by computing the dot product of l_t(g) and l_d(d). The output score l_s(g, d) = l_t(g) × l_d(d)^T is a value between 0 and 1, similar to the conditional probability p(g|d).

The n-gram-document probability p(g|d), which initially is expected to be very different from its ideal value, is estimated by performing a forward propagation in the network. Algorithm 2 describes the training procedure for the neural topic model part of our proposed model. For each n-gram-document pair (g, d), the expected output value is 1 because g is taken from document d. The weights are updated using backpropagation to mitigate the error (Steps 3 to 7 in Algorithm 2).

Algorithm 2: LearnTopic, the algorithm for learning the topic distribution
  Input: document id d; n-gram (context) id g
  Parameters: distributed representations D, W, and T; weight matrices W1 and W2
  Output: updated weight matrices
  1. Calculate l_s(g, d) using the equations for l_t and l_d.
  2. Determine the error at the output node with respect to the ideal value: \delta^{(3)} = l_s(g, d) − 1.
  3. Compute the error at the n-gram-topic hidden node: \delta^{(2)}_1 = (\delta^{(3)} × l_d(d)) · (l_t(g) · (1 − l_t(g))).
  4. Update W2: W2 = W2 + \alpha[\delta^{(2)}_1 × W_g + \lambda × W2].
  5. Compute the error at the document-topic hidden node: new_l_d(d) = l_d(d) + \alpha[\delta^{(3)} × l_t(g) + \lambda × l_d(d)].
  6. \delta^{(2)}_2 = new_l_d(d) − l_d(d).
  7. Update W1: W1 = W1 + \alpha[\delta^{(2)}_2 × D_d + \lambda × W1].
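A compact numpy sketch of the forward pass just described follows: l_d(d) = softmax(D_d W1), l_t(g) = sigmoid(W_g W2), and the matching score l_s(g, d) = l_t(g) · l_d(d), which Algorithm 2 then pushes toward 1 by backpropagation. All data and sizes below are random placeholders, not values from the paper.

import numpy as np

rng = np.random.default_rng(2)
L, K = 50, 8                              # vector size and number of topics
W1 = 0.01 * rng.normal(size=(L, K))       # document-topic weights
W2 = 0.01 * rng.normal(size=(L, K))       # n-gram-topic weights
D_d = rng.normal(size=L)                  # a document vector
W_g = rng.normal(size=L)                  # averaged vector of the n-gram's words

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

l_d = softmax(D_d @ W1)                   # topic distribution of the document
l_t = sigmoid(W_g @ W2)                   # topic representation of the n-gram
l_s = float(l_t @ l_d)                    # matching score, trained toward 1
print("p(g|d) estimate:", l_s)
print("top topic of d:", int(np.argmax(l_d)))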
4.3 Concurrent Training

The training process runs concurrently for both topic modeling and distributed vector generation. Fig. 1 shows the proposed combination of the two networks. Notice that the training is simultaneous, unlike NTM [6], where already trained word vectors are used for topic modeling. All the weights (the W1 and W2 matrices) and vectors (the W, D, and T matrices) in both networks are initialized with random values (Steps 1 and 2 of Algorithm 3). As shown in the loop at Step 3 of Algorithm 3, the combined framework reads each document in sequences of n words (contexts) using a continuous window. For a particular document, the topic modeling network gives its topic distribution as the output of the hidden node l_d. We select the k most probable topics from this distribution, under the assumption that a document is made up of k topics, and provide them as input to the distributed vector generation network. The call to the method LearnTopics in Step 7 of Algorithm 3 accomplishes this task, and the corresponding word, document, and topic vectors are updated using the method LearnDistRep in Step 8 of Algorithm 3. The methods LearnTopics and LearnDistRep are explained in Algorithms 2 and 1, respectively. Notice that the document and word vectors of the context (n-gram) generated by Algorithm 1 are provided as input to the topic modeling network of Algorithm 2, while the top k topics generated for each document by Algorithm 2 are provided to the distributed vector generation part (Algorithm 1). Algorithm 3 combines all these steps.

Algorithm 3: ConcurrentTrain, the algorithm for simultaneous training of both networks
  Input: document collection D
  Parameters: distributed representations D, W, and T; weight matrices W1 and W2 of the topic modeling network
  Output: D, W, and T
  1. Randomly initialize D, W, and T.
  2. Randomly initialize W1 and W2.
  3. for each document d ∈ D do
  4.   Topics in d, T_d ← top k topics from l_d(d)
  5.   for each word w of d do
  6.     C_w ← context of w
  7.     LearnTopics(d, C_w)
  8.     LearnDistRep(d, T_d, w, C_w)
  9.   end
  10. end
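The skeleton below sketches how the outer loop of Algorithm 3 can be wired together, assuming learn_topics, learn_dist_rep, and top_k_topics stand for Algorithm 2, Algorithm 1, and the top-k selection from l_d(d). The stubs, toy corpus, and parameter values are placeholders, not the paper's code.

import numpy as np

rng = np.random.default_rng(3)
K, k, window = 8, 1, 2

def top_k_topics(d):                            # stand-in for argmax over l_d(d)
    return list(rng.choice(K, size=k, replace=False))

def learn_topics(d, context):                   # stand-in for Algorithm 2
    pass

def learn_dist_rep(d, topics, w, context):      # stand-in for Algorithm 1
    pass

corpus = {0: "topic models meet distributed representations".split(),
          1: "documents words and topics share one space".split()}

for d, tokens in corpus.items():                # Step 3 of Algorithm 3
    topics_d = top_k_topics(d)                  # Step 4: current top-k topics of d
    for i, w in enumerate(tokens):              # Step 5
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        learn_topics(d, context)                # Step 7: update W1 and W2
        learn_dist_rep(d, topics_d, w, context) # Step 8: update D, W, and T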
5 Complexity Analysis

Although both neural networks in our proposed framework are trained concurrently, we analyze their complexities separately for simplicity. For every example during the training of the distributed vector generation network, there are P words (the context length), k topics, and one document as input, resulting in I = P + k + 1 input nodes. These inputs are projected into an L-dimensional space. Although there are V = N + M + K output nodes, this part of the network needs to update only O(log V) nodes using the gradient vector since the model uses hierarchical softmax. The I input nodes are updated during backpropagation, making the complexity of training a single example C_dr = I × L + O(log V) × L.

The topic modeling network takes the same document and input words. Calculating W_g from the words in n-gram g takes O(P × L) time, calculating each of l_d and l_t takes O(L × K) operations, and l_s requires O(K) operations. Backpropagation (Steps 3 to 7 of Algorithm 2) runs in O(L × K) time, incurring a total cost of C_tm = O(P × L) + O(L × K) + O(K) + O(L × K), or C_tm = O(L × K) given K > P, for every example. Therefore, the cost of training the combined network for each example is C = C_tm + C_dr.

6 Evaluation

We use a number of metrics to evaluate the quality of our results. Some of these metrics are generally used to evaluate clustering results when ground truth labels are not available. Two such evaluations are the Dunn Index (DI) [10] and the Average Silhouette Coefficient (ASC) [21]. DI measures the separation between groups of vectors, and larger values are better. ASC is a measure that takes both cohesion and separation of groups into account (higher values are better). In our experiments, we use ASC and DI together to evaluate the final topic assignments of the documents; topics are analogous to clusters in those evaluations. ASC and DI give us an idea of how crisply the topics are distributed across the documents.

In the presence of ground truth labels, we evaluate the assigned topics using Normalized Mutual Information (NMI) [7], the Adjusted Rand Index (ARI) [24], and hypergeometric distribution-based enrichment. Both NMI and ARI estimate the agreement between two topic assignments, irrespective of permutations, and higher values are better. While NMI is an information-theoretic approach to evaluating agreement between two sets of assignments, ARI is a normalized ratio of the total positive agreements of pairs of documents being in the same or different topics over all possible pairs. The normalization of ARI ensures that the score is very low for random assignments. Hypergeometric enrichment [23] maps topics to available ground truth labels, which allows us to measure the significance of the topic assignments over the already known labels based on the hypergeometric distribution. A higher number of enriched topics is better.

Our proposed model is able to generate topic and document vectors in the same hyperspace. In an ideal case, all angles between a topic vector and each document vector assigned to this topic should be similar, and the standard deviation of those angles should be small. We use this concept to compute the alignment between a topic vector and a given set of document vectors. Given a topic vector T_i of topic t_i and a set of document vectors D^{t_j} that are assigned topic t_j, we compute the alignment using the following formula:

  A(T_i, D^{t_j}) = \sqrt{\frac{1}{|D^{t_j}|} \sum_{m=1}^{|D^{t_j}|} \left( \frac{T_i \cdot D^{t_j}_m}{\|T_i\|\,\|D^{t_j}_m\|} - \mu \right)^2}    (6)

where D^{t_j}_m refers to the document vector of the m-th document in topic t_j, and

  \mu = \frac{1}{|D^{t_j}|} \sum_{m=1}^{|D^{t_j}|} \frac{T_i \cdot D^{t_j}_m}{\|T_i\|\,\|D^{t_j}_m\|}    (7)

Notice that Equation 6 is the standard deviation of the cosine similarities between the topic vector and the document vectors. Lower values are expected when t_i = t_j, and higher values are expected when t_i ≠ t_j.
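As a concrete reading of Equations 6 and 7, the following numpy sketch computes the alignment as the standard deviation of cosine similarities between one topic vector and a set of document vectors. The vectors are random placeholders chosen only to show the expected behavior.

import numpy as np

def alignment(topic_vec, doc_vecs):
    """Equation 6 with mu from Equation 7: lower values mean the documents
    sit at a consistent angle to the topic vector."""
    cos = doc_vecs @ topic_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(topic_vec))
    return float(np.sqrt(np.mean((cos - cos.mean()) ** 2)))

rng = np.random.default_rng(4)
T_i = rng.normal(size=50)
docs_same_topic = T_i + 0.05 * rng.normal(size=(20, 50))
docs_other_topic = rng.normal(size=(20, 50))
print(alignment(T_i, docs_same_topic))    # expected to be small
print(alignment(T_i, docs_other_topic))   # expected to be larger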
7 Experiments

In this section, we seek to answer the following questions to justify the capabilities and correctness of the proposed model.

1. Can our framework establish relationships between distributed representations of topics and documents? (Section 7.1)
2. Are the generated topic vectors expressive enough to capture similarity between topics and to distinguish differences between them? (Section 7.2)
3. How do our topic modeling results compare with the results produced by other topic modeling algorithms? (Section 7.3)
4. Do the generated topics bring documents with similar domain-specific themes together? (Section 7.4)
5. How does the runtime of the proposed framework scale with the size of the distributed representations, increasing numbers of documents, and increasing numbers of topics? (Section 7.5)

We used seven different text datasets¹ with different numbers of documents and words; they are listed in Table 1. Some of these datasets are widely used in the text processing literature (e.g., the Reuters, WebKB, and 20 Newsgroups datasets), while we collected most of the other corpora from the public domain. The PubMed dataset is collected from publicly available citation databases for biomedical literature provided by the US National Library of Medicine and contains abstracts of cancer-related publications. The Spanish news dataset was collected as part of the EMBERS [20] project; the articles cover news stories from 207 countries around the world.

Table 1: Summary of the datasets.

  Dataset         #Docs        #Words       Additional information
  Synthetic       400          40,000       Four lower and two upper level groups.
  20 Newsgroups   18,821       2,654,769    20 categories in seven groups.
  Reuters R8      7,674        495,226      Eight category labels.
  Reuters R52     9,100        624,456      52 groups.
  WebKB           4,199        559,984      Four overlapping categories.
  PubMed          1.3 million  220 million  Publication abstracts related to cancer.
  Spanish news    3.7 million  3 billion    News articles from 2013 and 2014.

¹ Data and software source codes are provided at http://dal.cs.utep.edu/projects/tvec/.
7.1 Analysis of Distributed Representations of Topics and Documents

The topic and document vectors generated by the proposed framework maintain consistent relationships that can be leveraged in many applications to study the topics of a stream of unseen documents. To develop such applications, the relationship between a topic vector T_i and any of its document vectors D^{t_i}_p should differ from the relationship between another topic T_j and one of its document vectors D^{t_j}_q. In contrast, such topic-document relationships should be similar for two documents of the same topic. Each plot of Fig. 3 shows a heat map of the alignment between a topic vector T_i of topic t_i and all document vectors D^{t_j} of topic t_j, computed using Equation 6. Fig. 3(a) shows the heat map for the four topics of the synthetic dataset, and Fig. 3(b) shows the map for the eight topics of the Reuters R8 data. In these heat maps, lower alignment values result in darker cells. For both datasets, the dark diagonal cells indicate stronger topic-document alignment for topic and document vectors of the same topic, whereas weaker alignments are exhibited when the document vectors are chosen from a different topic. This indicates that our proposed framework captures topical structures and models relationships between topics and documents in the same hyperspace.

Fig. 3: Heat map of the standard deviation of cosine similarity between the i-th topic vector and all documents of topic j: (a) result with the synthetic data; (b) result with the Reuters R8 dataset. Darker cells on the diagonal indicate that the standard deviation is lower for angles between a topic vector and its own document vectors.
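Heat maps in the spirit of Fig. 3 can be built by evaluating the alignment of every topic vector against the document vectors of every topic. The sketch below uses random stand-in vectors and the same alignment helper as the Equation 6 sketch above; it is illustrative only.

import numpy as np

rng = np.random.default_rng(7)
K, L, docs_per_topic = 4, 50, 25
topics = rng.normal(size=(K, L))                         # stand-in topic vectors
docs_by_topic = [topics[j] + 0.05 * rng.normal(size=(docs_per_topic, L))
                 for j in range(K)]                      # stand-in document vectors

def alignment(topic_vec, doc_vecs):
    cos = doc_vecs @ topic_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(topic_vec))
    return float(np.sqrt(np.mean((cos - cos.mean()) ** 2)))

heat = np.array([[alignment(topics[i], docs_by_topic[j]) for j in range(K)]
                 for i in range(K)])
print(np.round(heat, 3))     # the diagonal is expected to hold the smallest values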
7.2 Expressiveness of Topic Vectors

As described in Section 4.3, the k best topics generated by the topic modeling part of the proposed model are selected as input to the distributed representation generation part. We set k = 1 for all our experiments, including the ones described in this subsection. To examine how expressive our distributed topic vectors are, we prepared a synthetic corpus containing documents with terms drawn from seven sets, as illustrated by Fig. 4(a). Four groups of documents contain terms specific to each group. The same dataset can also be divided into two higher-level groups of documents, because each such group contains terms from a specific higher-level set of words; additionally, all sets of documents share a common set of terms. We generated topic, document, and word vectors using our proposed framework. A dendrogram for the four generated topic vectors is shown in Fig. 4(b). As expected, the dendrogram exhibits the topical structure: pairs of topic vectors merge separately, and those two groups then merge at the top of the hierarchy. The dendrogram of topic vectors thus reflects the grouping mechanism we used to create the dataset.

Fig. 4: Experiment with a synthetic dataset: (a) sets of terms used to prepare the synthetic text corpus; (b) dendrogram generated from the topic vectors (complete linkage).

In a second experiment in this space, we used a dataset that already has category labels (20 Newsgroups) to verify how intuitive the topic vectors are in bringing similar categories together. To generate distributed vectors for the existing categories along with the document and word vectors, we directly provided the known labels to the distributed representation generation part of the model as inputs, as opposed to providing topics generated by the topic modeling network. The official site for the 20 Newsgroups dataset reports that some of the newsgroups are very closely related to each other (e.g., comp.sys.ibm.pc.hardware and comp.sys.mac.hardware), while others are highly unrelated (e.g., misc.forsale and soc.religion.christian). Our target is to verify whether the generated category vectors can provide insights about how the topics should be merged. Fig. 5 shows the dendrogram prepared from the 20 category vectors of the 20 Newsgroups dataset. There are some differences between the official grouping and the grouping we discovered using the category vectors; for example, sci.electronics is grouped with comp.sys.mac.hardware and comp.sys.ibm.pc.hardware, and the label sci.electronics is far away from sci.space even though they share the prefix "sci". Our observation is that sci.electronics contains many documents with hardware related discussions; as a result, sci.electronics has greater similarity with the hardware groups than with sci.space. Similar evidence is found for the rec.* groups: the rec.sport.* groups are different from rec.motorcycles and rec.autos, but the latter two groups are closely related, as evident in the dendrogram.

Fig. 5: Dendrogram prepared with the 20 category vectors of the 20 Newsgroups dataset.
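A dendrogram like the ones in Fig. 4(b) and Fig. 5 can be obtained by complete-linkage clustering of the topic (or category) vectors. The sketch below uses scipy on four random stand-in vectors built from two higher-level themes; the data, labels, and distance choice are assumptions for illustration.

import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(5)
base = rng.normal(size=(2, 50))                      # two higher-level themes
topic_vectors = np.vstack([base[0] + 0.3 * rng.normal(size=50),
                           base[0] + 0.3 * rng.normal(size=50),
                           base[1] + 0.3 * rng.normal(size=50),
                           base[1] + 0.3 * rng.normal(size=50)])

Z = linkage(topic_vectors, method="complete", metric="cosine")
tree = dendrogram(Z, labels=["Topic 1", "Topic 2", "Topic 3", "Topic 4"],
                  no_plot=True)                      # structure only, no figure
print(tree["ivl"])    # leaf order: related topics end up adjacent
print(Z[:, :2])       # merge sequence of the hierarchy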
7.3 Comparison of Quality of Generated Topics

Fig. 6: Evaluation of the proposed framework using benchmark labels (a and b) and locality of the topics (c and d). (a) Adjusted Rand index; (b) normalized mutual information; (c) Dunn index; (d) average silhouette score.

Fig. 6 shows a comparison of results generated by our framework and two other topic modeling methods, LDA and NTM, when applied to five classification datasets: synthetic, Reuters-R8, Reuters-R52, WebKB, and 20 Newsgroups. Fig. 6(a) and (b) use the adjusted Rand index (ARI) and normalized mutual information (NMI) to compare the topic assignments of the documents with the expected classes. ARI and NMI are larger for the proposed method on all the datasets. This implies that our framework captures the expected themes of the collections better than LDA and NTM. Not only do the expected categories match the topic assignments better, but the generated topics are also local in the corresponding space of our framework. The higher Dunn index and higher average silhouette coefficient for all the datasets, as depicted in Fig. 6(c) and (d), imply that our model provides high-quality local topics. Notice that Fig. 6(c) and (d) do not include NTM. This is because the Dunn index and the average silhouette coefficient require document vectors, but NTM [6] does not directly use any document vectors; rather, it uses precomputed word vectors only.

Fig. 7: Comparison of the numbers of topics enriched by the hypergeometric distribution.

We also used a hypergeometric distribution based procedure to map each topic to a class label. Fig. 7 shows that the topic assignments using our framework have a higher number of enriched topics than any other method. This indicates that the topics generated by our method have higher thematic resemblance with the benchmark labels.
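The enrichment computation behind Fig. 7 maps each topic to its most over-represented benchmark label. The section does not spell out the exact procedure or significance cutoff, so the snippet below is only a sketch of a one-sided hypergeometric test in the spirit of [23]; the variable names and the 0.05 threshold are assumptions.

from scipy.stats import hypergeom

def best_enriched_label(topic_doc_ids, label_of, all_doc_ids, alpha=0.05):
    """Return the class label most significantly enriched in a topic,
    or None if no label passes the significance threshold."""
    population = len(all_doc_ids)
    sample = len(topic_doc_ids)
    best_label, best_p = None, 1.0
    for label in set(label_of.values()):
        in_population = sum(label_of[d] == label for d in all_doc_ids)
        in_topic = sum(label_of[d] == label for d in topic_doc_ids)
        # probability of drawing at least `in_topic` documents of this label by chance
        p = hypergeom.sf(in_topic - 1, population, in_population, sample)
        if p < best_p:
            best_label, best_p = label, p
    return best_label if best_p <= alpha else None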
Table 2: Evaluation using the EMBERS news article dataset.

Method            Dunn index   Silhouette score
NTM               0.04          0.01
LDA               0.01         -0.015
Proposed method   0.1           0.05

All the datasets described so far in this subsection are labeled and are widely used as ground truth in many data mining and machine learning evaluations. In addition to these datasets, we used our EMBERS data [20], containing around 3.7 million news articles, to compare the locality of the topics with the other methods. Table 2 shows that our method produces topics with a greater Dunn index and average silhouette score than the other methods. This indicates that our method performs even better when the datasets are very large.
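For reference, the external measures (ARI, NMI) and the locality measures (Dunn index, average silhouette score) used in Fig. 6 and Table 2 can be computed from the document vectors and topic assignments as sketched below. The cosine metric and the variable names (doc_vecs, topic_assignments, true_labels) are our assumptions; the Dunn index is implemented directly because scikit-learn does not provide it.

import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics import (adjusted_rand_score,
                             normalized_mutual_info_score,
                             silhouette_score)

def dunn_index(doc_vecs, labels, metric='cosine'):
    """Smallest inter-cluster distance divided by the largest intra-cluster diameter."""
    clusters = [doc_vecs[labels == c] for c in np.unique(labels)]
    inter = min(cdist(a, b, metric=metric).min()
                for i, a in enumerate(clusters) for b in clusters[i + 1:])
    intra = max(cdist(c, c, metric=metric).max() for c in clusters)
    return inter / intra

ari = adjusted_rand_score(true_labels, topic_assignments)              # Fig. 6(a)
nmi = normalized_mutual_info_score(true_labels, topic_assignments)     # Fig. 6(b)
dunn = dunn_index(doc_vecs, topic_assignments)                         # Fig. 6(c), Table 2
sil = silhouette_score(doc_vecs, topic_assignments, metric='cosine')   # Fig. 6(d), Table 2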
Fig. 8: Comparison of our method and LDA using MeSH terms associated with the PubMed abstracts. (a) Ratio of true positives; (b) ratio of true negatives, both over the top n MeSH terms.

7.4 Evaluation using Domain Specific Information

In this experiment, we used the PubMed dataset to compute the overlap of domain specific information for documents in the same topic (i.e., true positives) and the lack of such overlap for pairs of documents from two different topics (i.e., true negatives). In the PubMed dataset, each abstract is provided with some major Medical Subject Headings (MeSH) terms that come from a predefined ontology. We used these MeSH terms as domain specific information to evaluate the topics. It is expected that the sets of MeSH terms of two documents of the same topic will have some common entries, whereas the sets of MeSH terms of two documents from two different topics will have fewer or no overlapping records. For each abstract, we ordered the MeSH terms based on the Jaccard similarity between each MeSH term and the abstract. Notice that if we pick the n best MeSH terms for two documents of the same topic, the chance that these two sets of n best MeSH terms have common entries increases with larger n. This trend is observed in Fig. 8(a) for both our framework and LDA. The true positive ratio quickly reaches around 80% with only the five best MeSH terms for each pair of documents. Conversely, the top n MeSH terms of two documents from two different topics should show a higher absence of overlapping terms for smaller n, since the topical similarity of these two documents is minimal. As n increases, the true negative ratio decreases due to the inclusion of more general entries in the lists of the n best MeSH terms. Fig. 8(b) shows the expected trend for both LDA and our framework. We selected 5,000 random pairs of documents from the same topics and another 5,000 pairs from different topics for the two plots, Fig. 8(a) and (b) respectively. Fig. 8(a) and (b) demonstrate that our method follows the expected trend of sharing domain specific information. Although the true positive values of our method are slightly lower than those of LDA in some cases, the true negative values are always greater than those of LDA. This indicates that our model generates topics containing similar biological themes, while documents of different topics, as expected, have less similarity in domain specific information.
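The true positive and true negative ratios of Fig. 8 can be computed as in the sketch below. This is only an illustration of the procedure described above; the tokenization, the exact Jaccard computation between a MeSH term and an abstract, and the document representation (a dict with 'tokens' and 'mesh' keys) are our own assumptions.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def top_n_mesh(doc, n):
    """Order a document's MeSH terms by Jaccard similarity with the abstract tokens
    and keep the n best ones."""
    ranked = sorted(doc['mesh'],
                    key=lambda term: jaccard(term.lower().split(), doc['tokens']),
                    reverse=True)
    return set(ranked[:n])

def overlap_ratio(doc_pairs, n):
    """Fraction of document pairs whose top-n MeSH sets share at least one term."""
    hits = sum(bool(top_n_mesh(d1, n) & top_n_mesh(d2, n)) for d1, d2 in doc_pairs)
    return hits / len(doc_pairs)

# Fig. 8(a): true positive ratio over pairs drawn from the same topic
#   tp = overlap_ratio(same_topic_pairs, n)
# Fig. 8(b): true negative ratio over pairs drawn from different topics
#   tn = 1.0 - overlap_ratio(different_topic_pairs, n)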
7.5 Runtime Characteristics

Fig. 9: Execution time with varying (a) number of documents, (b) number of topics, and (c) vector size.

Fig. 9 depicts the runtime behavior of our proposed framework with varying numbers of documents, topics, and vector sizes. The runtime increases almost linearly with each of these variables, which indicates that our proposed framework scales to large amounts of data. The experiments in this space were done using synthetic data with different numbers of words per document, as depicted by the multiple lines in each of the plots of Fig. 9.

8 Conclusion

We have presented a framework to generate distributed vectors for the elements of a corpus as well as for the underlying latent topics. All types of vectors (topics, documents, and words) share the same space, allowing the framework to compute relationships between all types of elements. Our results show that the framework can efficiently discover latent topics and generate distributed vectors simultaneously. The proposed framework is expressive and able to capture domain specific information in a lower-dimensional space. In the future, we will investigate how one can study the information genealogy of a document collection with temporal signatures using the proposed framework. We are inspired by the fact that we can train the distributed vector generation network in the sequence given by the temporal signatures associated with the documents and observe the shift of the word probabilities at the output of the network. We can also observe how the probability distributions of the topic generation network change over the given time sequence. This would help identify how one topic influences and transcends another and how the topical vocabulary shifts over time.

Acknowledgments. This work is supported in part by M. S. Hossain's startup grant at UTEP, the University Research Institute (URI, Office of Research and Sponsored Projects, UTEP), and the Intelligence Advanced Research Projects Activity (IARPA) via DoI/NBC contract number D12PC000337. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The US Government is authorized to reproduce and distribute reprints of this work for Governmental purposes notwithstanding any copyright annotation thereon.

References

1. L. AlSumait, D. Barbará, and C. Domeniconi. On-line LDA: Adaptive topic models for mining text streams with applications to topic detection and tracking. In ICDM'08, pages 3-12, 2008.
2. Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137-1155, 2003.
3. D. Blei and J. Lafferty. Correlated topic models. Advances in Neural Information Processing Systems, 18:147, 2006.
4. D. M. Blei and J. D. Lafferty. Dynamic topic models. In ICML'06, pages 113-120, 2006.
5. D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
6. Z. Cao, S. Li, Y. Liu, W. Li, and H. Ji. A novel neural topic model and its supervised extension. In AAAI'15, 2015.
7. G. J. Chaitin. Algorithmic information theory. Wiley Online Library, 1982.
8. D. J. Chalmers. Syntactic transformations on distributed representations. In Connectionist Natural Language Processing, pages 46-55. Springer, 1992.
9. S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990.
10. J. C. Dunn. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. 1973.
11. T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum. Integrating topics and syntax. In NIPS'04, pages 537-544, 2004.
12. G. E. Hinton. Learning distributed representations of concepts. In CogSci'86, volume 1, page 12, 1986.
13. T. Hofmann. Probabilistic latent semantic indexing. In SIGIR'99, pages 50-57. ACM, 1999.
14. J. E. Hummel and K. J. Holyoak. Distributed representations of structure: A theory of analogical access and mapping. Psychological Review, 104(3):427, 1997.
15. H. Larochelle and S. Lauly. A neural autoregressive topic model. In NIPS'12, pages 2708-2716, 2012.
16. Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. In ICML'14, pages 1188-1196, 2014.
17. T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
18. T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS'13, pages 3111-3119, 2013.
19. J. B. Pollack. Recursive distributed representations. Artificial Intelligence, 46(1):77-105, 1990.
20. N. Ramakrishnan et al. 'Beating the news' with EMBERS: Forecasting civil unrest using open source indicators. In SIGKDD'14, pages 1799-1808, 2014.
21. P. J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53-65, 1987.
22. D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Cognitive Modeling, 5, 1988.
23. I. Rivals, L. Personnaz, L. Taing, and M.-C. Potier. Enrichment or depletion of a GO category within a class of genes: which test? Bioinformatics, 23(4):401-407, 2007.
24. D. Steinley. Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods, 9(3):386, 2004.
25. H. M. Wallach. Topic modeling: beyond bag-of-words. In ICML'06, pages 977-984, 2006.
26. L. Wan, L. Zhu, and R. Fergus. A hybrid neural network-latent topic model. In AISTATS'12, pages 1287-1294, 2012.