International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 680
Mining Users Rare Sequential Topic Patterns from Tweets based on
Topic Extraction
Bhakti Patil1, Sachin Takmare2, Rahul Mirajkar3, Pramod Kharade4
1Student, Dept. of Computer Science & Engineering, Bharati Vidyapeeth’s College of Engg,
Kolhapur, Maharashtra, India.
2,3,4 Professor, Dept. of Computer Science & Engineering, Bharati Vidyapeeth’s College of Engg,
Kolhapur, Maharashtra, India.
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Twitter is an online news and social networking service where users spontaneously post and interact with messages called "tweets". Most existing work is dedicated to discovering the abstract "topics" that occur in a collection of documents, and treats each topic as a discrete unit. As a result, when a specific user publishes successive documents, the sequential relation between their topics is ignored entirely. In this paper, we propose a different approach that detects users’ Sequential Topic Patterns (STPs), which characterize and reveal personalized and abnormal user behaviors, and we formulate the problem of mining Users Rare Sequential Topic Patterns (URSTPs) from tweets. URSTPs are rare for all users as a whole but relatively frequent for some specific users, so the approach can be applied in many real-life scenarios, such as real-time monitoring of abnormal user behaviors. We present a group of algorithms that solve this mining problem in several phases: preprocessing to extract probabilistic topics, identifying sessions for different users, generating all STP candidates, and selecting URSTPs by user-aware rarity analysis of the derived STPs. Experiments show that our approach can find special users and interpretable URSTPs, which significantly indicate users’ characteristics.
Key Words: Sequential topics, Web mining, Topic
Extraction, Keyword Extraction, frequent patterns,
clustering.
1. INTRODUCTION
Social networking services such as Facebook, Twitter, and LinkedIn create an environment where users spend a lot of time and which they use for different purposes. This interaction produces a huge amount of data for each individual user. Documents on such services focus on particular topics, and these topics reveal users’ characteristics. Text mining is the main way to extract topics from this information. Generally, probabilistic topic models such as LDA [1], the classical PLSI [5], and their extensions [3], [4], [6], [7], [8], [9] are used for topic extraction.
In the literature, most researchers concentrate on the evolution of a single topic in order to identify and predict social events and user behaviors [10], [11], [12]. Few have studied the relation between the different topics of successive documents published by the same user, so some hidden but important information that uncovers the personalized behaviors of that user has been neglected.
In this paper we concentrate mainly on the relations between the extracted sequential topics and refer to them as Sequential Topic Patterns (STPs), which indirectly reflect user behaviors. For a document stream, some STPs may occur frequently and thus reflect common behaviors of the users involved. Beyond these, there may also exist other patterns that are infrequent for the general population but occur relatively frequently for some specific user or group of users. We refer to them as User-aware Rare STPs (URSTPs). Compared with frequent patterns, discovering rare patterns is especially interesting and important. It formulates a new problem of rare event mining, which makes it possible to characterize the personalized and abnormal behaviors of special users.
In our case, STPs can characterize the complete browsing behaviors of readers. Compared with statistical methods, mining URSTPs can therefore better find the special interests and browsing habits of users, and is thus capable of giving effective, context-aware recommendations to them. Our approach concentrates on published document streams.
Solving the problem of mining URSTPs in document streams raises new technical challenges, which are addressed in this paper. First, the input of the approach is a text stream, so existing techniques for probabilistic databases cannot be applied directly. A preprocessing phase is required to obtain abstract, probabilistic descriptions of documents by topic extraction, and then to identify complete and repeated activities of users by session identification. Second, given the real-time requirements of many applications, both the precision and the efficiency of the mining algorithms are important, especially for the probability computation process. Third, unlike frequent patterns, user-aware rare patterns can effectively characterize most personalized and abnormal behaviors of users and can be applied to different application scenarios. Correspondingly, unsupervised mining algorithms for
this kind of rare pattern need to be designed differently from existing frequent pattern mining algorithms.
2. LITERATURE REVIEW
Topic mining in document collections has been extensively studied in the literature. The Topic Detection and Tracking (TDT) task covers the detection and tracking of topics (events) in news based on keywords. Many probabilistic generative models for extracting topics from documents have also been proposed, such as LDA [1], PLSI [5], and their extensions [2], as well as models for short texts such as Twitter-LDA [3].
LDA is a three-level hierarchical Bayesian model [1]. Each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. Blei, Ng, and Jordan presented efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation.
Li and McCallum proposed the pachinko allocation model (PAM) [2], which captures arbitrary and sparse correlations between topics using a directed acyclic graph (DAG), something that is not possible with LDA. Each leaf node of the DAG represents an individual word in the vocabulary, while each interior node represents a correlation among its children, which may be words or other interior nodes (topics).
Zhao, Jiang, Weng, He, Lim, Yan, and Li compared tweets with a traditional news medium by using unsupervised topic modeling [3]. They discover topics with the Twitter-LDA model and then compare Twitter topics with news topics using text mining techniques. They also examine the relation between tweets and retweets.
In real applications, the content of a document collection is temporal, so various dynamic topic models have been proposed to discover topics over time in document streams and then predict offline social events. One of these is the dynamic topic model proposed by Blei and Lafferty [4], which uses state space models on the natural parameters of the multinomial distributions that represent the topics. Approximate posterior inference over the latent topics is carried out by variational approximations based on Kalman filters and nonparametric wavelet regression. Dynamic topic models provide a qualitative perspective into the contents of a large document collection.
Sequential pattern mining is an important problem in data mining. The frequency of a sequential pattern is evaluated by its support. Mining algorithms such as PrefixSpan, FreeSpan, and SPADE have been proposed on this basis. These algorithms find frequent sequential patterns whose support values are not less than a user-defined threshold, and were later extended by SLPMiner to deal with length-decreasing support constraints. Muzammal et al. concentrated on sequence-level uncertainty in sequential databases and proposed methods to calculate the frequency of a sequential pattern based on expected support, using generate-and-test or pattern-growth frameworks. This paper is an extension of our previous work.
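To make the notion of support used by these algorithms concrete, the following minimal sketch computes the support of a pattern in a small, deterministic sequence database as the fraction of sequences that contain the pattern as an ordered (not necessarily contiguous) subsequence. It is a brute-force illustration only, not PrefixSpan, FreeSpan, or SPADE; the function names, the toy data, and the threshold value are assumptions made for this example.

```python
from typing import List, Sequence


def contains_subsequence(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` consumes the iterator, so the order of the pattern is respected.
    return all(item in it for item in pattern)


def support(database: List[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of sequences in `database` that contain `pattern`."""
    if not database:
        return 0.0
    hits = sum(contains_subsequence(seq, pattern) for seq in database)
    return hits / len(database)


# Toy database (hypothetical data): each row is one topic sequence.
database = [
    ["sports", "politics", "music"],
    ["sports", "music"],
    ["politics", "sports"],
    ["music", "sports", "politics", "music"],
]

min_support = 0.5  # user-defined threshold (assumed value)
pattern = ["sports", "music"]
is_frequent = support(database, pattern) >= min_support
```

Here the pattern is contained in three of the four sequences, so it counts as frequent under the assumed threshold. A rare-pattern miner would instead look for patterns whose support falls below a threshold.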
3. PRELIMINARIES
We first define some basic concepts in the usual way.
Definition 1 (Document)
A text document d in a document collection D consists of a number of words from a fixed vocabulary V = {w1, w2, ..., w|V|}. A document can be represented as d = {c(d, w)} where w ∈ V and c(d, w) denotes the number of occurrences of the word w in d.
Definition 2 (Topic)
A topic z in the text collection D is represented by a probability distribution over the words in the given vocabulary V.
Definition 3 (Topic-Level Document)
Given an original document d ∈ D and a topic set T, the corresponding topic-level document td_d is defined as a set of topic-probability pairs, in the form {(z, p(z|d))} where z ∈ T.
Definition 4 (Document Stream)
A document stream is defined as a sequence of documents in which the i-th document di is published by user ui at time ti on a specific website, and ti ≤ tj for all i ≤ j.
Definition 5 (Sequential Topic Pattern)
A Sequential Topic Pattern (STP) is defined as a sequence of topics [z1, z2, ..., zn], where each topic zi ∈ T.
Definition 6 (Session)
A session s is defined as a subsequence of the topic-level document stream associated with the same user, i.e. it is a sequence of topic-level documents of one user, each with its associated time.
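Definitions 3, 4, and 6 can be illustrated with a small sketch. A topic-level document is held as a mapping from topic to p(z|d), the stream as a time-ordered list of (document, user, time) entries, and sessions are obtained by grouping per user. The rule that starts a new session after a time gap, the `max_gap` parameter, and all probability values are assumptions made for this example; the definitions above do not fix a particular session identification rule.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A topic-level document maps each topic z to p(z | d); values are made up.
TopicDoc = Dict[str, float]
# One stream entry: (topic-level document, user id, timestamp in seconds).
Entry = Tuple[TopicDoc, str, int]
Session = List[Tuple[TopicDoc, int]]


def split_sessions(stream: List[Entry], max_gap: int) -> Dict[str, List[Session]]:
    """Group a stream by user, opening a new session whenever two consecutive
    documents of the same user are more than `max_gap` seconds apart
    (an assumed rule, used here only for illustration)."""
    sessions: Dict[str, List[Session]] = defaultdict(list)
    last_time: Dict[str, int] = {}
    # Sorting enforces the stream condition ti <= tj for i <= j.
    for doc, user, t in sorted(stream, key=lambda entry: entry[2]):
        if user not in last_time or t - last_time[user] > max_gap:
            sessions[user].append([])  # open a new session for this user
        sessions[user][-1].append((doc, t))
        last_time[user] = t
    return dict(sessions)


stream: List[Entry] = [
    ({"sports": 0.8, "music": 0.2}, "u1", 0),
    ({"politics": 0.9, "sports": 0.1}, "u2", 10),
    ({"music": 0.7, "sports": 0.3}, "u1", 50),
    ({"politics": 1.0}, "u1", 5000),
]
sessions = split_sessions(stream, max_gap=3600)
```

With these toy values, user u1 ends up with two sessions (two documents, then one) and user u2 with a single session.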
Definition 8 (Support of STP)
The support of an STP is defined as the probability that its topic sequence occurs, taken with respect to the sessions.
Definition 9 (User-Aware Rare STP)
An STP a is called a User-aware Rare STP (URSTP) if and only if, for some user u, both of the following hold: the scaled support is less than or equal to the scaled support threshold, and the relative rarity is greater than or equal to the relative rarity threshold.
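The two-sided condition of Definition 9 can be written as a simple predicate. The sketch below assumes that the scaled support and the relative rarity of an STP have already been computed for each user; how they are computed is not shown, and the container `StpStats`, the parameter names `hss` and `hrr`, and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StpStats:
    """Precomputed values of one STP for one user (hypothetical container)."""
    scaled_support: float
    relative_rarity: float


def is_urstp(stats: StpStats, hss: float, hrr: float) -> bool:
    """Definition 9: scaled support at most `hss` AND relative rarity at least `hrr`."""
    return stats.scaled_support <= hss and stats.relative_rarity >= hrr


def urstp_users(per_user: Dict[str, StpStats], hss: float, hrr: float) -> List[str]:
    """Users for whom the STP satisfies both conditions."""
    return [user for user, stats in per_user.items() if is_urstp(stats, hss, hrr)]


# Made-up values for one STP and two users.
per_user = {
    "u1": StpStats(scaled_support=0.02, relative_rarity=0.40),
    "u2": StpStats(scaled_support=0.02, relative_rarity=0.05),
}
special_users = urstp_users(per_user, hss=0.05, hrr=0.30)
```

The STP is a URSTP as soon as the returned list is non-empty, which matches the "for some user u" wording of the definition.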
4. MINING USERS RARE SEQUENTIAL TOPIC
PATTERNS
First, we have a list of users’ tweets collected using some API. The one that we used was... Topic detection over the whole set of documents needs some initial preprocessing. In the first step, we remove stop words and repeated posts. Stop words are words such as “at”, “the”, and “how”. We then have a list of cleaned Twitter posts, i.e. a list of cleaned documents: Tweets = [list of tweets of all users], where each entry holds the list of posts of one user. Therefore, for each user we remove repeated words and stop words. This new list of tweets is the input of the keyword extraction algorithm. Fig. 1 shows different keyword extraction methods.
Fig -1: Keyword Extraction Methods
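The cleaning step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the stop-word list is a tiny hypothetical sample, posts are compared case-insensitively to detect repeats, and tokens are split on whitespace. The user names and posts are made up.

```python
from typing import Dict, List

# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"at", "the", "how", "a", "is", "to"}


def clean_posts(posts: List[str]) -> List[List[str]]:
    """Drop repeated posts, then strip stop words from each remaining post."""
    seen = set()
    cleaned: List[List[str]] = []
    for post in posts:
        key = post.strip().lower()
        if key in seen:  # repeated post: skip it
            continue
        seen.add(key)
        tokens = [word for word in key.split() if word not in STOP_WORDS]
        if tokens:  # ignore posts that consist only of stop words
            cleaned.append(tokens)
    return cleaned


# Tweets = [list of tweets of all users], keyed by user here for readability.
tweets: Dict[str, List[str]] = {
    "user_a": ["How is the match", "how is the match", "Goal at the end"],
    "user_b": ["The election results"],
}
cleaned = {user: clean_posts(posts) for user, posts in tweets.items()}
```

The cleaned token lists are what a keyword extraction step would then receive as input.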
After that, we cluster the keywords using cosine similarity. This means that we cluster posts which are similar to each other. The approach is trained to work as unsupervised topic detection. New tweets are then the input to our next step, topic extraction. Its output is the index into a lookup table, which gives the list of topics as shown in Fig. 2.
Fig -2: Overview of Keyword Extraction Process
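As an illustration of clustering by cosine similarity, the sketch below represents each cleaned post as a bag-of-words vector and assigns it greedily to the first cluster whose first member is similar enough. The greedy single-pass strategy, the `threshold` value, and the sample posts are assumptions chosen for brevity; they are not necessarily the clustering procedure used in our implementation.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_posts(posts: List[List[str]], threshold: float) -> List[List[int]]:
    """Greedy single-pass clustering: a post joins the first cluster whose
    first member has similarity >= `threshold`, otherwise it starts a new one.
    Returns clusters as lists of post indices."""
    vectors = [Counter(post) for post in posts]
    clusters: List[List[int]] = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:  # no existing cluster was similar enough
            clusters.append([i])
    return clusters


posts = [
    ["match", "goal", "team"],
    ["election", "vote"],
    ["team", "goal", "win"],
    ["vote", "election", "result"],
]
clusters = cluster_posts(posts, threshold=0.5)
```

With the sample data, the two sports posts and the two election posts fall into separate clusters, and each cluster index can then serve as a key into a topic lookup table.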
The naive Bayes classifier is a simple probabilistic classifier based on applying Bayes’ theorem with strong (naive) independence assumptions between the features. We use it as a baseline method for text categorization. This popular method solves the problem of judging documents as belonging to one category or another, such as sports or politics, using word frequencies as features. A naive Bayes classifier requires a number of parameters linear in the number of variables, i.e. features/predictors.
A naive Bayes model assigns class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. All naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable. A naive Bayes classifier considers each of these features to contribute independently to the probability, regardless of any possible correlations. It requires only a small amount of training data to estimate the parameters required for classification. Using Bayesian probability terminology,
Posterior = (likelihood * prior) / evidence
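The rule above can be turned into a small text classifier. The following is a minimal sketch of a multinomial naive Bayes model with add-one (Laplace) smoothing, written from scratch; the smoothing choice, the function names `train` and `predict`, and the four training documents are assumptions for illustration. Because the evidence term is identical for every class, it is omitted when comparing classes.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train(samples: List[Tuple[List[str], str]]):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for words, label in samples:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict(words: List[str], model) -> str:
    """Return the class with the largest log prior + summed log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in words:
            # Add-one smoothing keeps unseen words from zeroing the product.
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(p)  # log likelihood of one feature
        if score > best_score:
            best_label, best_score = label, score
    return best_label


samples = [
    (["match", "goal", "team"], "sports"),
    (["team", "win", "league"], "sports"),
    (["election", "vote", "party"], "politics"),
    (["party", "minister", "vote"], "politics"),
]
model = train(samples)
label = predict(["goal", "team"], model)
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow on longer documents, which is a common design choice for this kind of classifier.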
We use the Apriori algorithm, which operates on databases containing transactions (for example, collections of items bought by customers, or details of website visits). Each transaction is a set of items (an itemset). Given a threshold C, the algorithm identifies the itemsets which are subsets of at least C transactions in the database.
In this method, frequent subsets are extended one item at a time, and groups of candidates are tested against the data. Candidate itemsets are counted by using breadth-first search and a hash tree structure. The algorithm generates candidate itemsets of length k from itemsets of length k-1. Then it prunes the candidates which have an infrequent sub-pattern.
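The level-wise procedure just described can be sketched in a few lines. The version below follows the join, prune, and count steps, but for brevity it counts candidates with a plain scan over the transactions instead of a hash tree; the function name, the `min_count` parameter (playing the role of the threshold C), and the shopping-basket data are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset contained in at least `min_count` transactions."""
    # Level 1: count single items.
    counts: Dict[FrozenSet[str], int] = {}
    for transaction in transactions:
        for item in transaction:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions per level.
        current = {}
        for candidate in candidates:
            n = sum(1 for transaction in transactions if candidate <= transaction)
            if n >= min_count:
                current[candidate] = n
        frequent.update(current)
        k += 1
    return frequent


transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
frequent = apriori(transactions, min_count=2)
```

On the sample data every single item and every pair is frequent, while the only triple survives pruning but is rejected in the count step because it appears in a single transaction.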
We now propose an approach to mining rare sequential topic patterns in document streams. The main processing framework is shown in Fig. 3.
Fig -3: Processing framework of URSTP mining
It consists of three main phases. First, text documents are collected from micro-blog sites or forums (in our case we crawled tweets from Twitter using the Twitter API) and form a document stream that is the input of our approach. Then, in the preprocessing phase, we first remove useless symbols such as “@” and “#”, URLs, and stop words from the input tweets. The original stream is then transformed into a topic-level document stream and divided into many sessions to identify complete user behaviors. Last and most importantly, we find all the STP candidates in the document stream for all
users, and then pick out the significant URSTPs associated with specific users by user-aware rarity analysis.
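The symbol-removal part of the preprocessing phase can be sketched with regular expressions. The patterns below are assumptions: they treat anything starting with http:// or https:// as a URL and simply delete the “@” and “#” characters while keeping the word that follows them. Stop-word filtering is left out to keep the sketch short, and the sample tweet is made up.

```python
import re
from typing import List

URL_RE = re.compile(r"https?://\S+")   # assumed URL pattern
SYMBOL_RE = re.compile(r"[@#]")        # the symbols named in the text


def strip_symbols(tweet: str) -> List[str]:
    """Remove URLs and the '@' / '#' symbols, then lowercase and tokenize."""
    text = URL_RE.sub(" ", tweet)      # remove URLs first so their text is not kept
    text = SYMBOL_RE.sub("", text)     # keep the tagged word, drop the symbol
    return text.lower().split()


tokens = strip_symbols("Great win for #India @team https://example.com/news")
```

The resulting token list would then be passed on to stop-word removal and topic extraction.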
5. RESULT ANALYSIS
In this section, we apply our URSTP mining mechanism to find rare topics. To simulate the proposed architecture, we implemented the approach on a machine with a minimum of 1 GB RAM and a 60 GB (or larger) hard disk. The experiments were carried out with different file sizes. As the file size increases, the execution time increases for both the Naïve Bayes algorithm and the Apriori algorithm, but Naïve Bayes requires less time than Apriori. The results are shown in Chart 1 and Chart 2.
Chart -1: Time costs of the two algorithms
Chart -2: Performance Time Difference
In Chart 1 we compare the time required to execute the Naïve Bayes algorithm and the Apriori algorithm for file sizes ranging from 1 KB to 1000 KB.
In Chart 2 we compare the time required to find Sequential Topic Patterns and User-aware Rare Sequential Topic Patterns for file sizes from 1 KB to 1000 KB.
6. CONCLUSIONS
The capacity of social networking sites such as Twitter to generate tweets is really high. To exploit this capacity, we propose some new methods. First, we extract users’ posts through an API, and then we extract appropriate topics based on certain keywords. We show that creating clusters based on keywords makes it easier to detect topics from users’ tweets. The extracted topics are then useful for real-time monitoring of abnormal user behaviors. We also proposed an effective approach to discover special users and interesting URSTPs from document streams, i.e. user tweets, which captures users’ personalized and abnormal behaviors and characteristics. We are also interested in the dual problem, i.e., discovering sequential topic patterns that occur frequently on the whole but are relatively rare for specific users.
REFERENCES
[1] D. Blei, A. Ng, and M. Jordan, “Latent Dirichlet allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, 2003.
[2] W. Li and A. McCallum, “Pachinko allocation: DAG-structured mixture models of topic correlations,” in Proc. ACM Int. Conf. Mach. Learn., 2006, vol. 148, pp. 577–584.
[3] W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li, “Comparing Twitter and traditional media using topic models,” in Proc. 33rd Eur. Conf. Adv. Inf. Retrieval, 2011, pp. 338–349.
[4] D. M. Blei and J. D. Lafferty, “Dynamic topic models,” in Proc. ACM Int. Conf. Mach. Learn., 2006, pp. 113–120.
[5] T. Hofmann, “Probabilistic latent semantic indexing,” in Proc. 22nd Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 1999, pp. 50–57.
[6] D. Blei and J. Lafferty, “Correlated topic models,” Adv. Neural Inf. Process. Syst., vol. 18, pp. 147–154, 2006.
[7] L. Hong and B. D. Davison, “Empirical study of topic modeling in Twitter,” in Proc. 1st Workshop Soc. Media Anal., 2010, pp. 80–88.
[8] Z. Hu, H. Wang, J. Zhu, M. Li, Y. Qiao, and C. Deng, “Discovery of rare sequential topic patterns in document stream,” in Proc. SIAM Int. Conf. Data Mining, 2014, pp. 533–541.
[9] A. Krause, J. Leskovec, and C. Guestrin, “Data association for topic intensity tracking,” in Proc. ACM Int. Conf. Mach. Learn., 2006, pp. 497–504.
[10] Q. Mei, C. Liu, H. Su, and C. Zhai, “A probabilistic approach to spatiotemporal theme pattern mining on weblogs,” in Proc. 15th Int. Conf. World Wide Web, 2006, pp. 533–542.
[11] W. Dou, X. Wang, D. Skau, W. Ribarsky, and M. X. Zhou, “LeadLine: Interactive visual analysis of text data through event identification and exploration,” in Proc. IEEE Conf. Vis. Anal. Sci. Technol., 2012, pp. 93–102.
[12] G. P. C. Fung, J. X. Yu, P. S. Yu, and H. Lu, “Parameter free bursty events detection in text streams,” in Proc. 31st Int. Conf. Very Large Data Bases, 2005, pp. 181–192.
IRJET-A Review on Topic Detection and Term-Term Relation Analysis in Big Data
IRJET Journal
 
Dr31564567
Dr31564567Dr31564567
Dr31564567
IJMER
 
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR...
IJDKP
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED  ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
dannyijwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
IJwest
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
dannyijwest
 
Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase
IJECEIAES
 
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLINGA TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
cscpconf
 
EXPERT OPINION AND COHERENCE BASED TOPIC MODELING
EXPERT OPINION AND COHERENCE BASED TOPIC MODELINGEXPERT OPINION AND COHERENCE BASED TOPIC MODELING
EXPERT OPINION AND COHERENCE BASED TOPIC MODELING
ijnlc
 
Reviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clusteringReviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clustering
IRJET Journal
 
Ad

More from IRJET Journal (20)

Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
BRAIN TUMOUR DETECTION AND CLASSIFICATION
BRAIN TUMOUR DETECTION AND CLASSIFICATIONBRAIN TUMOUR DETECTION AND CLASSIFICATION
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ..."Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
Breast Cancer Detection using Computer Vision
Breast Cancer Detection using Computer VisionBreast Cancer Detection using Computer Vision
Breast Cancer Detection using Computer Vision
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the HeliosphereAnalysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
A Novel System for Recommending Agricultural Crops Using Machine Learning App...A Novel System for Recommending Agricultural Crops Using Machine Learning App...
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the HeliosphereAnalysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
FIR filter-based Sample Rate Convertors and its use in NR PRACH
FIR filter-based Sample Rate Convertors and its use in NR PRACHFIR filter-based Sample Rate Convertors and its use in NR PRACH
FIR filter-based Sample Rate Convertors and its use in NR PRACH
IRJET Journal
 
Kiona – A Smart Society Automation Project
Kiona – A Smart Society Automation ProjectKiona – A Smart Society Automation Project
Kiona – A Smart Society Automation Project
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
Invest in Innovation: Empowering Ideas through Blockchain Based CrowdfundingInvest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUBSPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...
AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...
AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...
IRJET Journal
 
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
BRAIN TUMOUR DETECTION AND CLASSIFICATION
BRAIN TUMOUR DETECTION AND CLASSIFICATIONBRAIN TUMOUR DETECTION AND CLASSIFICATION
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ..."Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
Breast Cancer Detection using Computer Vision
Breast Cancer Detection using Computer VisionBreast Cancer Detection using Computer Vision
Breast Cancer Detection using Computer Vision
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the HeliosphereAnalysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
A Novel System for Recommending Agricultural Crops Using Machine Learning App...A Novel System for Recommending Agricultural Crops Using Machine Learning App...
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the HeliosphereAnalysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
FIR filter-based Sample Rate Convertors and its use in NR PRACH
FIR filter-based Sample Rate Convertors and its use in NR PRACHFIR filter-based Sample Rate Convertors and its use in NR PRACH
FIR filter-based Sample Rate Convertors and its use in NR PRACH
IRJET Journal
 
Kiona – A Smart Society Automation Project
Kiona – A Smart Society Automation ProjectKiona – A Smart Society Automation Project
Kiona – A Smart Society Automation Project
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
Invest in Innovation: Empowering Ideas through Blockchain Based CrowdfundingInvest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUBSPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...
AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...
AR Application: Homewise VisionMs. Vaishali Rane, Om Awadhoot, Bhargav Gajare...
IRJET Journal
 
Ad

Recently uploaded (20)

Avnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights FlyerAvnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights Flyer
WillDavies22
 
DT REPORT by Tech titan GROUP to introduce the subject design Thinking
DT REPORT by Tech titan GROUP to introduce the subject design ThinkingDT REPORT by Tech titan GROUP to introduce the subject design Thinking
DT REPORT by Tech titan GROUP to introduce the subject design Thinking
DhruvChotaliya2
 
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdffive-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
AdityaSharma944496
 
Reagent dosing (Bredel) presentation.pptx
Reagent dosing (Bredel) presentation.pptxReagent dosing (Bredel) presentation.pptx
Reagent dosing (Bredel) presentation.pptx
AlejandroOdio
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
charlesdick1345
 
Smart_Storage_Systems_Production_Engineering.pptx
Smart_Storage_Systems_Production_Engineering.pptxSmart_Storage_Systems_Production_Engineering.pptx
Smart_Storage_Systems_Production_Engineering.pptx
rushikeshnavghare94
 
Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.
anuragmk56
 
Smart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineeringSmart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineering
rushikeshnavghare94
 
Introduction to Zoomlion Earthmoving.pptx
Introduction to Zoomlion Earthmoving.pptxIntroduction to Zoomlion Earthmoving.pptx
Introduction to Zoomlion Earthmoving.pptx
AS1920
 
Metal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistryMetal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistry
mee23nu
 
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)
samueljackson3773
 
Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...
Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...
Structural Response of Reinforced Self-Compacting Concrete Deep Beam Using Fi...
Journal of Soft Computing in Civil Engineering
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
theory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptxtheory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptx
sanchezvanessa7896
 
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E..."Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
Infopitaara
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Development of MLR, ANN and ANFIS Models for Estimation of PCUs at Different ...
Journal of Soft Computing in Civil Engineering
 
Data Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptxData Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptx
RushaliDeshmukh2
 
Avnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights FlyerAvnet Silica's PCIM 2025 Highlights Flyer
Avnet Silica's PCIM 2025 Highlights Flyer
WillDavies22
 
DT REPORT by Tech titan GROUP to introduce the subject design Thinking
DT REPORT by Tech titan GROUP to introduce the subject design ThinkingDT REPORT by Tech titan GROUP to introduce the subject design Thinking
DT REPORT by Tech titan GROUP to introduce the subject design Thinking
DhruvChotaliya2
 
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdffive-year-soluhhhhhhhhhhhhhhhhhtions.pdf
five-year-soluhhhhhhhhhhhhhhhhhtions.pdf
AdityaSharma944496
 
Reagent dosing (Bredel) presentation.pptx
Reagent dosing (Bredel) presentation.pptxReagent dosing (Bredel) presentation.pptx
Reagent dosing (Bredel) presentation.pptx
AlejandroOdio
 
Introduction to FLUID MECHANICS & KINEMATICS
Introduction to FLUID MECHANICS &  KINEMATICSIntroduction to FLUID MECHANICS &  KINEMATICS
Introduction to FLUID MECHANICS & KINEMATICS
narayanaswamygdas
 
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
DATA-DRIVEN SHOULDER INVERSE KINEMATICS YoungBeom Kim1 , Byung-Ha Park1 , Kwa...
charlesdick1345
 
Smart_Storage_Systems_Production_Engineering.pptx
Smart_Storage_Systems_Production_Engineering.pptxSmart_Storage_Systems_Production_Engineering.pptx
Smart_Storage_Systems_Production_Engineering.pptx
rushikeshnavghare94
 
Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.Fort night presentation new0903 pdf.pdf.
Fort night presentation new0903 pdf.pdf.
anuragmk56
 
Smart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineeringSmart Storage Solutions.pptx for production engineering
Smart Storage Solutions.pptx for production engineering
rushikeshnavghare94
 
Introduction to Zoomlion Earthmoving.pptx
Introduction to Zoomlion Earthmoving.pptxIntroduction to Zoomlion Earthmoving.pptx
Introduction to Zoomlion Earthmoving.pptx
AS1920
 
Metal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistryMetal alkyne complexes.pptx in chemistry
Metal alkyne complexes.pptx in chemistry
mee23nu
 
π0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalizationπ0.5: a Vision-Language-Action Model with Open-World Generalization
π0.5: a Vision-Language-Action Model with Open-World Generalization
NABLAS株式会社
 
International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)
samueljackson3773
 
ELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdfELectronics Boards & Product Testing_Shiju.pdf
ELectronics Boards & Product Testing_Shiju.pdf
Shiju Jacob
 
theory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptxtheory-slides-for react for beginners.pptx
theory-slides-for react for beginners.pptx
sanchezvanessa7896
 
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E..."Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...
Infopitaara
 
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdfMAQUINARIA MINAS CEMA 6th Edition (1).pdf
MAQUINARIA MINAS CEMA 6th Edition (1).pdf
ssuser562df4
 
Data Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptxData Structures_Introduction to algorithms.pptx
Data Structures_Introduction to algorithms.pptx
RushaliDeshmukh2
 

Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extraction

International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 04 Issue: 09 | Sep-2017 | www.irjet.net
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 680

Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extraction

Bhakti Patil1, Sachin Takmare2, Rahul Mirajkar3, Pramod Kharade4
1 Student, Dept. of Computer Science & Engineering, Bharati Vidyapeeth’s College of Engg, Kolhapur, Maharashtra, India.
2,3,4 Professor, Dept. of Computer Science & Engineering, Bharati Vidyapeeth’s College of Engg, Kolhapur, Maharashtra, India.

Abstract - Twitter is an online news and social networking service where users spontaneously post and interact with messages called "tweets". Most existing works are dedicated to discovering the abstract "topics" that occur in a collection of documents and treat each topic in isolation. As a result, when a specific user publishes successive documents, the sequential relation between their topics is ignored. In this paper, we propose a different approach that detects users’ Sequential Topic Patterns (STPs), which characterize and detect personalized and abnormal behaviors of users, and we then formulate the problem of mining User-aware Rare Sequential Topic Patterns (URSTPs) from tweets. URSTPs are rare for all users but relatively frequent for some specific users, so this approach can be applied in many real-life scenarios, such as real-time monitoring of abnormal user behaviors. We present a group of algorithms that solve this mining problem in several phases: preprocessing to extract probabilistic topics, identifying sessions for different users, generating all STP candidates, and selecting URSTPs by user-aware rarity analysis on the derived STPs.
Experiments show that our approach can effectively find special users and interpretable URSTPs, which significantly indicate users’ characteristics.

Key Words: Sequential topics, Web mining, Topic Extraction, Keyword Extraction, frequent patterns, clustering.

1. INTRODUCTION

Social networking services such as Facebook, Twitter, and LinkedIn create an environment where users spend a lot of time and which they use for different purposes. These interactions produce a huge amount of data for each individual user. Documents on such services focus on particular topics, and topics reveal users' characteristics. Text mining is the way to mine this information and extract topics. Generally, probabilistic topic models such as LDA [1], classical PLSI [5], and their extensions [3], [4], [6], [7], [8], [9] are used for topic extraction.

In the literature, most researchers concentrate on adapting a single topic to identify and visualize social events and user behaviors [10], [11], [12]. The relations between the topics of successive documents published by the same user have received little attention, so some hidden but important information that could uncover that user's personalized behaviors has been neglected. In this paper we concentrate on the relations between the extracted sequential topics and refer to them as Sequential Topic Patterns (STPs), which indirectly reflect user behaviors.

For a document stream, some STPs may occur frequently and thus reflect common behaviors of the users involved. Beyond these, there may still exist other patterns that are infrequent for the general population but occur relatively frequently for some specific user or group of users. We refer to them as User-aware Rare STPs (URSTPs).
Compared with frequent patterns, discovering rare patterns is interesting and important. It formulates a new rare event mining problem that makes it possible to characterize the personalized and abnormal behaviors of special users. In our case, STPs can characterize the complete browsing behaviors of readers. Compared with statistical methods, mining URSTPs can better find the special interests and browsing habits of users, and is thus capable of giving them effective and context-aware recommendations. Our approach concentrates on published document streams.

Mining URSTPs in document streams raises new technical challenges, which are addressed in this paper. First, the input of the approach is a text stream, so existing techniques for probabilistic databases cannot be applied directly. A preprocessing phase is required to obtain conceptual and probabilistic descriptions of documents by topic extraction, and then to identify complete and repeated activities of users by session identification. Second, given the real-time requirements of many applications, both the precision and the efficiency of the mining algorithms are important, especially for the probability computation process. Third, unlike frequent patterns, user-aware rare patterns can effectively characterize most personalized and abnormal behaviors of users and can be applied to different application scenarios. Correspondingly, unsupervised mining algorithms for this kind of rare pattern need to be designed differently from existing frequent pattern mining algorithms.

2. LITERATURE REVIEW

Topic mining in document collections has been extensively studied in the literature. The Topic Detection and Tracking (TDT) task covers the detection and tracking of topics (events) in news based on keywords. Many probabilistic generative models for extracting topics from documents have also been proposed, such as LDA [1], PLSI and their extensions [2], as well as models for short texts such as Twitter-LDA [3].

LDA is a three-level hierarchical Bayesian model. Each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. Blei, Ng, and Jordan presented efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation.

Li and McCallum proposed the pachinko allocation model (PAM), which captures arbitrary and sparse correlations between topics using a directed acyclic graph (DAG), something LDA cannot do. Each leaf node of the DAG represents an individual word in the vocabulary, while each interior node represents a correlation among its children, which may be words or other interior nodes (topics).

Zhao, Jiang, Weng, He, Lim, Yan, and Li compared tweets with a traditional news medium by using unsupervised topic modeling. They discover topics with the Twitter-LDA model and then compare Twitter topics with news topics using text mining techniques. They also examine the relation between tweets and retweets.
In real applications, the content of a document collection is temporal, so various dynamic topic models have been proposed to discover topics over time in document streams and then to predict offline social events. One of these is the dynamic topic model proposed by Blei and Lafferty [4], which uses a state space model to represent the topics. Approximate posterior inference over the latent topics is carried out by variational approximations based on Kalman filters and nonparametric wavelet regression. Dynamic topic models provide a qualitative window into the contents of a large document collection.

Mining sequential patterns is an important problem in data mining. The frequency of a sequential pattern is evaluated by its support. Mining algorithms such as PrefixSpan, FreeSpan, and SPADE have been proposed based on support. These algorithms find frequent sequential patterns whose support values are not less than a user-defined threshold, and were later extended by SLPMiner to deal with length-decreasing support constraints. Muzammal et al. concentrated on sequence-level uncertainty in sequential databases and proposed methods to calculate the frequency of a sequential pattern based on expected support, using generate-and-test or pattern-growth frameworks. This paper is an extension of our previous work.

3. PRELIMINARIES

We first define some basic concepts in the usual way.

Definition 1 (Document) A text document d in a document collection D consists of a number of words from a fixed vocabulary V = {w1, w2, ..., w|V|}. A document can be represented as d = {c(d, w)}, where w ∈ V and c(d, w) denotes the number of occurrences of the word w in d.

Definition 2 (Topic) A topic z in the text collection D is represented by a probability distribution over the words in the given vocabulary V.

Definition 3 (Topic-Level Document) Given an original document d ∈ D and a topic set T, the corresponding topic-level document td_d is defined as a set of topic-probability pairs of the form {(z, p(z|d))}, where z ∈ T.
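Definitions 1 to 3 can be illustrated with a minimal Python sketch. It is not the topic extraction method of this paper: it assumes a uniform prior over topics and word independence, so that p(z|d) is proportional to the product of p(w|z) raised to c(d, w). The names `to_document`, `topic_level_document`, `TOPICS`, and the smoothing constant `floor` are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def to_document(text):
    """Definition 1: represent a document by its word counts c(d, w)."""
    return Counter(text.lower().split())


def topic_level_document(doc, topics, floor=1e-6):
    """Definition 3: return {z: p(z|d)} for every topic z in `topics`.

    Assumption (not from the paper): uniform prior over topics and
    independent words, so p(z|d) is proportional to prod p(w|z)**c(d, w).
    Words unseen in a topic get the small probability `floor`.
    """
    log_scores = {
        z: sum(count * math.log(word_dist.get(w, floor))
               for w, count in doc.items())
        for z, word_dist in topics.items()
    }
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {z: math.exp(s - top) for z, s in log_scores.items()}
    total = sum(weights.values())
    return {z: wt / total for z, wt in weights.items()}


# Definition 2: each (hypothetical) topic is a distribution over words.
TOPICS = {
    "sports": {"match": 0.5, "goal": 0.4, "vote": 0.1},
    "politics": {"vote": 0.6, "election": 0.3, "match": 0.1},
}

doc = to_document("goal goal match")
td = topic_level_document(doc, TOPICS)
```

Here `td` plays the role of the topic-level document: a mapping from each topic to its probability for the given document, with the probabilities summing to one.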
Definition 4 (Document Stream) A document stream is defined as a sequence of triples (di, ui, ti), where di is a document published by user ui at time ti on a specific website, and ti ≤ tj for all i ≤ j.

Definition 5 (Sequential Topic Pattern) A Sequential Topic Pattern (STP) is defined as a sequence of topics [z1, z2, ..., zn], where each topic zi ∈ T.

Definition 6 (Session) A session s is defined as a subsequence of the topic-level document stream associated with the same user, i.e., it is a set of topic-level documents with their associated times, kept separately for each user.

Definition 8 (Support of STP) The support of an STP is defined as the probability of its topics with respect to the sessions.

Definition 9 (User-Aware Rare STP) An STP α is called a User-aware Rare STP (URSTP) if and only if, for some user u, both of the following hold: its scaled support is less than or equal to the scaled support threshold, and its relative rarity is greater than or equal to the relative rarity threshold.
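The session and URSTP definitions can be sketched in Python as follows. This is only an illustration under stated assumptions: the paper does not say how sessions are delimited, so the sketch assumes a simple time-gap heuristic (`max_gap`), and it takes the scaled support and relative rarity values as given inputs rather than computing them. The function names and the thresholds `h_ss` and `h_rr` are hypothetical.

```python
from collections import defaultdict


def split_sessions(stream, max_gap):
    """Group (user, time, topic_doc) records into per-user sessions.

    Assumed heuristic (not from the paper): a new session starts when
    two consecutive documents of the same user are more than `max_gap`
    time units apart.
    """
    sessions = defaultdict(list)
    last_time = {}
    for user, t, topic_doc in sorted(stream, key=lambda r: r[1]):
        if user not in last_time or t - last_time[user] > max_gap:
            sessions[user].append([])  # open a new session for this user
        sessions[user][-1].append((t, topic_doc))
        last_time[user] = t
    return dict(sessions)


def is_urstp(scaled_support, relative_rarity, h_ss, h_rr):
    """Definition 9 for one user: both threshold conditions must hold."""
    return scaled_support <= h_ss and relative_rarity >= h_rr


# A tiny hypothetical topic-level document stream.
stream = [
    ("u1", 0, {"sports": 0.9, "politics": 0.1}),
    ("u2", 1, {"sports": 0.2, "politics": 0.8}),
    ("u1", 2, {"sports": 0.7, "politics": 0.3}),
    ("u1", 50, {"sports": 0.1, "politics": 0.9}),
]
sessions = split_sessions(stream, max_gap=10)
```

With these inputs, user `u1` gets two sessions (times 0 and 2, then time 50) and user `u2` gets one.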
4. MINING USERS RARE SEQUENTIAL TOPIC PATTERNS

We start with a list of users’ tweets collected through an API. The one that we used was... Topic detection over the whole set of documents first needs some preprocessing. As a first step, we remove stop words (such as “at”, “the”, “how”) and repeated posts. This gives a list of cleaned Twitter posts, that is, a list of cleaned documents: Tweets = [list of tweets of all users], where each entry holds a list of posts. For each user, we therefore remove repeated words and stop words. This new list of tweets is the input of the keyword extraction algorithm. Fig. 1 shows different topic extraction methods.

Fig -1: Keyword Extraction Methods

Next, we cluster the keywords using cosine similarity, which means that posts that are similar to each other are clustered together. This approach works as unsupervised topic detection. New tweets are then the input to the next step, topic extraction, whose output is the index of the lookup table that gives the list of topics, as shown in Fig. 2.

Fig -2: Overview of Keyword Extraction Process

Naïve Bayes classifiers are simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. We use this baseline method for text categorization. It is a popular method for judging documents as belonging to one category or another (such as sports or politics) using word frequencies. Naive Bayes classifiers require a number of parameters linear in the number of variables (features/predictors).
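The cleaning and similarity steps described earlier in this section (removing repeated posts and stop words, then comparing posts by cosine similarity) can be sketched as below. This is a minimal illustration, not the implementation used in the paper: the stop word set is a small hypothetical subset, posts are compared on raw word counts, and the names `clean_posts` and `cosine_similarity` are introduced only for this example.

```python
from collections import Counter
import math

STOP_WORDS = {"at", "the", "how", "a", "is", "in"}  # illustrative subset only


def clean_posts(posts):
    """Drop repeated posts, then drop stop words from each remaining post."""
    seen, cleaned = set(), []
    for post in posts:
        key = post.lower().strip()
        if key in seen:  # repeated post: skip it
            continue
        seen.add(key)
        cleaned.append([w for w in key.split() if w not in STOP_WORDS])
    return cleaned


def cosine_similarity(words_a, words_b):
    """Cosine similarity between the word-count vectors of two posts."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


posts = [
    "The match is at the stadium",
    "the match is at the stadium",   # repeated post (differs only in case)
    "How the election vote went",
]
cleaned = clean_posts(posts)
similarity = cosine_similarity(cleaned[0], cleaned[1])
```

Posts whose similarity exceeds a chosen threshold could then be placed in the same cluster; the threshold itself would have to be tuned on real data.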
A naive Bayes model assigns class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. All naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable. A naive Bayes classifier considers each of these features to contribute independently to the probability, regardless of any possible correlations. It requires only a small amount of training data to estimate the parameters needed for classification. In Bayesian probability terminology:

Posterior = (likelihood * prior) / evidence

We use the Apriori algorithm to operate on databases containing transactions (for example, collections of items bought by customers, or details of website visits). Each transaction is a set of items (an itemset). Given a threshold C, the algorithm identifies the itemsets that are subsets of at least C transactions in the database. In this method, frequent subsets are extended one item at a time, and groups of candidates are tested against the data. Candidate itemsets are counted using breadth-first search and a hash tree structure. The algorithm generates candidate itemsets of length k from itemsets of length k-1, and then prunes the candidates that have an infrequent sub-pattern.

We now propose an approach to mining rare sequential topic patterns in document streams. The main processing framework is shown in Fig. 3.

Fig -3: Processing framework of URSTP mining

It consists of three main phases. First, text documents are collected from micro-blog sites or forums (in our case, we crawled tweets from Twitter using the Twitter API) and form a document stream that is the input of our approach. Then, in the preprocessing phase, we first remove stop words and useless symbols such as “@”, “#”, and URLs from the input tweets.
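The level-wise Apriori procedure described earlier in this section can be sketched as follows. The sketch keeps the join and prune steps but, as a simplification, counts candidates with a plain scan over the transactions instead of the hash tree mentioned in the text. The function name `apriori`, the parameter `min_count` (the threshold C), and the shopping-basket data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for itemsets in at least `min_count` transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count each candidate by scanning all transactions (no hash tree here).
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    frequent = count({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(transactions, min_count=2)
```

In this example the candidate {bread, milk, butter} is pruned without being counted, because its subset {milk, butter} appears in only one transaction.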
The original stream is then transformed into a topic-level document stream and divided into many sessions to identify complete user behaviors. Finally, and most importantly, we find all the STP candidates in the document stream for all
users, and then pick out the significant URSTPs associated with specific users by user-aware rarity analysis.

5. RESULT ANALYSIS

In this section, we apply our URSTP mining mechanism to find rare topics. To evaluate the proposed architecture, we implemented the approach on a machine with at least 1 GB of RAM and a 60 GB (or larger) hard disk. Experiments were carried out with different file sizes. As the file size increases, the execution time increases for both the Naïve Bayes algorithm and the Apriori algorithm, but Naïve Bayes requires less time than Apriori. The results are shown in Chart 1 and Chart 2.

Chart -1: Time costs of the two algorithms

Chart -2: Performance Time Difference

In Chart 1, we compare the time required to execute the Naïve Bayes algorithm and the Apriori algorithm for file sizes ranging from 1 KB to 1000 KB. In Chart 2, we compare the time required to find Sequential Topic Patterns and User-Aware Rare Sequential Topic Patterns over the same range of file sizes, from 1 KB to 1000 KB.

6. CONCLUSIONS

The capability of social networking sites such as Twitter to generate tweets is very high. To exploit this capability, we propose some new methods. First, we extract users’ posts through an API, and then we extract appropriate topics based on certain keywords. We show that creating clusters based on keywords makes it easier to detect topics from users’ tweets. The extracted topics are then useful for real-time monitoring of abnormal user behaviors. We also proposed an effective approach to discover special users and interesting URSTPs from document streams (i.e., user tweets), which captures users’ personalized and abnormal behaviors and characteristics.
We are also interested in the dual problem, i.e., discovering sequential topic patterns that occur frequently on the whole but are relatively rare for specific users.

REFERENCES

[1] D. Blei, A. Ng, and M. Jordan, “Latent Dirichlet allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, 2003.
[2] W. Li and A. McCallum, “Pachinko allocation: DAG-structured mixture models of topic correlations,” in Proc. ACM Int. Conf. Mach. Learn., 2006, vol. 148, pp. 577–584.
[3] W. X. Zhao, J. Jiang, J. Weng, J. He, E.-P. Lim, H. Yan, and X. Li, “Comparing Twitter and traditional media using topic models,” in Proc. 33rd Eur. Conf. Adv. Inf. Retrieval, 2011, pp. 338–349.
[4] D. M. Blei and J. D. Lafferty, “Dynamic topic models,” in Proc. ACM Int. Conf. Mach. Learn., 2006, pp. 113–120.
[5] T. Hofmann, “Probabilistic latent semantic indexing,” in Proc. 22nd Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 1999, pp. 50–57.
[6] D. Blei and J. Lafferty, “Correlated topic models,” Adv. Neural Inf. Process. Syst., vol. 18, pp. 147–154, 2006.
[7] L. Hong and B. D. Davison, “Empirical study of topic modeling in Twitter,” in Proc. 1st Workshop Soc. Media Anal., 2010, pp. 80–88.
[8] Z. Hu, H. Wang, J. Zhu, M. Li, Y. Qiao, and C. Deng, “Discovery of rare sequential topic patterns in document stream,” in Proc. SIAM Int. Conf. Data Mining, 2014, pp. 533–541.
[9] A. Krause, J. Leskovec, and C. Guestrin, “Data association for topic intensity tracking,” in Proc. ACM Int. Conf. Mach. Learn., 2006, pp. 497–504.
[10] Q. Mei, C. Liu, H. Su, and C. Zhai, “A probabilistic approach to spatiotemporal theme pattern mining on weblogs,” in Proc. 15th Int. Conf. World Wide Web, 2006, pp. 533–542.
[11] W. Dou, X. Wang, D. Skau, W. Ribarsky, and M. X. Zhou, “LeadLine: Interactive visual analysis of text data through event identification and exploration,” in Proc. IEEE Conf. Vis. Anal. Sci. Technol., 2012, pp. 93–102.
[12] G. P. C. Fung, J. X. Yu, P. S. Yu, and H. Lu, “Parameter free bursty events detection in text streams,” in Proc. 31st Int. Conf. Very Large Data Bases, 2005, pp. 181–192.