International Journal of Information Sciences and Techniques (IJIST) Vol.2, No.3, May 2012
DOI: 10.5121/ijist.2012.2304
TEXT SENTIMENTS FOR FORUMS HOTSPOT
DETECTION
K. Nirmala Devi¹ and Dr. V. Murali Bhaskaran²
¹ Department of CSE, Kongu Engineering College, Perundurai, Erode, Tamil Nadu
k_nirmal@kongu.ac.in
² Principal, Paavai College of Engineering, Pachal, Namakkal, Tamil Nadu
murali66@gmail.com
ABSTRACT
User-generated content on the web is growing rapidly. Technology increasingly exploits this
content to capture what users express and to surface the useful parts to information seekers.
Most existing research on text information processing focuses on the factual domain rather than
the opinion domain. In this paper we detect online hotspot forums by performing sentiment
analysis on the text data available in each forum. The approach analyses the forum text and
computes a sentiment value for each word. It combines K-means clustering with the Support Vector
Machine with Particle Swarm Optimization (SVM-PSO) classification algorithm to group the forums
into two clusters, hotspot forums and non-hotspot forums, within the current time span. The
accuracy of the proposed system is compared with that of other classification algorithms, namely
Naïve Bayes, Decision Tree and SVM. The experiments indicate that K-means and SVM-PSO together
achieve highly consistent results.
KEYWORDS
Sentiment analysis, SVM-PSO, hot spot, k-means, Text mining
1. INTRODUCTION
Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful
information from data, which can help businesses make proactive, knowledge-driven decisions. It
uses machine learning, statistical and visualization techniques to discover and present
knowledge that previously went unnoticed. Opinion mining is an important sub-discipline within
data mining and natural language processing (NLP) that automatically extracts, classifies, and
interprets the opinions expressed by users. These techniques also help to enhance the value of
existing information resources, which can be integrated with new products and systems as they
are brought online.
The tremendous growth of online information across forums has made it difficult for customers
to find the information that is useful to them. This has motivated work on the detection of
hotspot forums [5], where useful information is quickly made available to customers and can
support their decision making. In topic-based classification, topic-related words are important.
Efficient statistical and machine learning techniques can be applied to process the enormous
amount of online data. An emerging technique called emotional polarity computation, also known
as sentiment analysis [6], can also be applied during online text mining. In opinion
classification, however, topic-related words are not very important. What matters are opinion
words that indicate positive or negative opinions, e.g., great, excellent, amazing, horrible,
bad, worst. Most methodologies for opinion mining apply some form of machine learning for
classification.
Customized algorithms specifically for opinion classification have also been developed; these
exploit opinion words and phrases together with scoring functions. In this paper we detect
hotspot forums by computing text sentiment. The method quantifies user attention on each forum,
from which hotspot forums can be identified. The sentiment computation is then integrated with
K-means clustering and the Support Vector Machine with Particle Swarm Optimization (SVM-PSO)
algorithm, which groups the forums into two clusters, hotspot forums and non-hotspot forums,
within each time window.
The rest of the paper is structured as follows. Section 2 discusses related work on existing
semantic orientation-based sentiment classification approaches. The proposed approach, including
the Support Vector Machine with Particle Swarm Optimization (SVM-PSO) algorithm, is described in
Section 3. The experimental results are discussed in Section 4. Finally, Section 5 concludes the
paper.
2. RELATED WORK
This section reviews three streams of related work: review mining, sentiment classification,
and machine learning techniques for predicting hotspots.
2.1. Analysis of Review Mining
Mining of online reviews has become a flourishing research frontier because it can provide a
solid basis for predicting future events. For example, Chaovalit and Zhou [1] state that online
reviews have become more useful and influence sales because they provide potential consumers
with important information about the product.
A multi-knowledge based approach has been proposed in which WordNet, statistical analysis and
movie knowledge are integrated. The experimental results show the effectiveness of the approach
in movie review mining and summarization.
Hu and Liu [4] proposed a method that analyses reviews using a generated list of adjectives
labelled with their semantic orientation. Finally, it is observed that machine learning is used
to depict the interacting structure of reviews.
2.2. Sentiment Classification
Documents available on the web can be classified by various criteria, including topic, author,
structure, and so forth. Classification based on sentiment has become a new frontier for the
text mining community. The task of sentiment classification is to determine the semantic
orientation of words, sentences or documents. Most of the early work on this topic used words as
the processing unit.
Automatic sentiment classification at the document level was performed by Pang, Lee and
Vaithyanathan [7], who applied several machine learning approaches with common text features to
classify movie reviews from IMDB. It has been pointed out that direct marketing is a promotion
process that motivates customers to place orders through various channels [6, 9]. For this to
work, accurate customer segmentation based on a good understanding of the customers is needed,
so that relevant product information can be delivered to different customer segments. Thelwall
et al. [11] state that analysing Twitter gives insights into why certain events resonate with
people.
Customers who were used to having only a limited range of product choices, owing to physical
and/or time constraints, now face the problem of information overload. An effective way of
increasing customer satisfaction, and consequently customer loyalty, is to help customers
identify products according to their interests. This in turn calls for personalized product
recommendations [8, 9]. Hofmann and Puzicha used the Latent Class Model (LCM) to circumvent the
aforementioned problems. Paltoglou and Thelwall [2, 10] found that incorporating sentiment
information into Vector Space Model (VSM) values using supervised methods is helpful for
sentiment analysis.
2.3. Machine Learning Techniques for Predicting Hot Spots
Li and Wu [5] proposed two machine learning techniques, K-means and SVM, for predicting online
hotspot forums. Unlike other learning methods, SVM's performance is related not to the number of
features in the system but to the margin with which it separates the data. SVM [3] produces a
grouping by classifying each forum as either a hotspot forum or a non-hotspot forum.
3. PROPOSED WORK
The proposed work detects hotspot forums and achieves highly consistent results by combining
SVM with an optimization algorithm (SVM-PSO). Figure 1 depicts the conceptual diagram of the
proposed approach.
Figure 1. Conceptual diagram of the proposed approach
3.1. Preprocessing
The data set used in our experiments was acquired from forums.digitalpoint.com; after data
cleaning it comprises 37 forums and 1616 threads. Data collection starts by crawling the names
of the first 50 forums. The parsed forum names are stored in a table. All thread posts and reply
posts contained in the corresponding web pages are then parsed and stored in a separate table.
After crawling, the data are cleaned by manually removing noise data and irrelevant data. Noise
data include forums with picture postings that are not clearly shown online. Irrelevant data
come from forums where the posting contents are not related to the forum threads at all. Threads
that have no replies and forums that have no threads across the time windows are also removed.
The data before and after cleaning are listed in Table 1. After cleaning, 37 forums remain
within the time span from January to October 2011, divided into time windows of one month each.
Table 1. Data view before cleaning and after cleaning
                     Before cleaning         After cleaning
Time period          Jan 2007 to Oct 2011    Jan 2011 to Oct 2011
Number of forums     50                      37
Number of threads    2430                    1616
Number of replies    39239                   19370
[Figure 1 box labels: Text data collected from forums; Sentiment values for postings; K-means
clustering model; SVM-PSO classification model; Predicted cluster using K-means; Identified
binary classes (hotspots, non-hotspots); Cluster analysis; Class analysis; Result comparison and
evaluation]
3.2. Feature Extraction
Preprocessing is followed by feature extraction. For each forum, five features are extracted in
each time window: the number of threads, the average number of replies per thread, the average
sentiment value of the threads, the fraction of positive threads among all threads, and the
fraction of negative threads among all threads. The sentiment value of each thread is obtained
by computing text sentiment.
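The five features can be illustrated with a minimal Python sketch. The `Thread` record and the `forum_features` helper are hypothetical names introduced here, and the rule that a thread counts as positive when its sentiment value is above zero (negative when below zero) is an assumption, since the paper does not state the threshold.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    replies: int    # number of replies the thread received in the time window
    sentiment: int  # integer sentiment value of the thread


def forum_features(threads):
    """Return the five-element feature vector of one forum for one time window."""
    n = len(threads)
    if n == 0:
        # Forums without threads are removed during data cleaning.
        raise ValueError("forum has no threads in this time window")
    avg_replies = sum(t.replies for t in threads) / n
    avg_sentiment = sum(t.sentiment for t in threads) / n
    # Assumption: sign of the sentiment value decides positive / negative.
    frac_positive = sum(1 for t in threads if t.sentiment > 0) / n
    frac_negative = sum(1 for t in threads if t.sentiment < 0) / n
    return [n, avg_replies, avg_sentiment, frac_positive, frac_negative]


# Illustrative data only, not taken from the paper's data set.
threads = [Thread(10, 3), Thread(4, -2), Thread(7, 0), Thread(3, 5)]
features = forum_features(threads)
print(features)
```

Each forum yields one such vector per monthly time window, which is the representation later passed to clustering and classification.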
3.3. Sentiment Computation on Forum Text
Feature extraction includes text sentiment analysis, which calculates an integer value for each
piece of text. It is a semantic orientation based approach in which the sentiment values of all
keywords are added to obtain the sentiment value of the whole text. The replies of a thread are
decomposed into a set of keywords, and a sentiment value is assigned to each keyword. The sum of
the sentiment values of all keywords gives the sentiment value of the thread.
Suppose the replies of a thread t are decomposed into a set of keywords. For each keyword w_i
(i = 1, 2, ..., n), let the sentiment value be s_i. Then the sentiment value S_t of the thread t
is given by Eq. (1):
S_t = \sum_{i=1}^{n} s_i    (1)
The sentiment values are calculated with SentiStrength, an algorithm for text sentiment analysis
that estimates the sentiment values of texts.
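The summation over keyword sentiment values can be sketched in Python as follows. The small lexicon and its scores are hypothetical stand-ins chosen for illustration; they are not SentiStrength's word list, and this naive lookup ignores features such as negation or booster words that a real tool may handle.

```python
import re

# Hypothetical keyword lexicon; the values are illustrative only.
LEXICON = {
    "great": 3, "excellent": 4, "amazing": 4,
    "horrible": -4, "bad": -2, "worst": -5,
}


def thread_sentiment(replies, lexicon=LEXICON):
    """Sum the sentiment values s_i of all keywords w_i found in the replies."""
    total = 0
    for reply in replies:
        # Decompose the reply into lowercase word tokens.
        for word in re.findall(r"[a-z']+", reply.lower()):
            total += lexicon.get(word, 0)  # words outside the lexicon count as 0
    return total


replies = ["Great tool, excellent support!", "Worst update ever."]
score = thread_sentiment(replies)
print(score)  # 3 + 4 - 5 = 2
```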
3.4. Forum Clustering Using K-means
After the features are extracted, clustering is carried out using the K-means algorithm in the
RapidMiner tool. Each forum is represented as a data point in a vector space. The vector
produced by feature extraction represents the emotional polarity of a forum and is composed of
five elements: the number of threads, the average number of replies per thread, the average
sentiment value of the threads, the fraction of positive threads among all threads, and the
fraction of negative threads among all threads. These vectors are given as input to K-means
clustering, which yields a clustered view of all the forums. The hotspot and non-hotspot forums
obtained within each time window are those closest to the theoretical centres of the clusters.
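The paper runs K-means inside RapidMiner; the sketch below is only a plain-Python illustration of two-cluster K-means on five-element forum vectors. The function name `kmeans`, the choice of the first k points as initial centres, and the four sample vectors are all assumptions made for the example.

```python
import math


def kmeans(points, k=2, max_iters=100):
    """Basic K-means; returns (labels, centres)."""
    # Assumption: deterministic start from the first k points.
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: attach every point to its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centres are too
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Illustrative vectors: [threads, avg replies, avg sentiment, frac pos, frac neg]
forums = [
    [5.0, 22.0, 3.0, 0.7, 0.1],
    [4.0, 9.0, 0.5, 0.4, 0.3],
    [4.8, 23.5, 2.5, 0.6, 0.2],
    [3.5, 8.0, -0.5, 0.3, 0.4],
]
labels, centres = kmeans(forums, k=2)
print(labels)
```

Because the features live on different scales (counts versus fractions), the unscaled Euclidean distance here is dominated by the reply averages; whether and how the original experiments normalised the features is not stated in the paper.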
3.5. Forum Classification using SVM-PSO
Classification is carried out using the Support Vector Machine with Particle Swarm Optimization
(SVM-PSO) algorithm. PSO is a computational method that optimizes a problem by iteratively
trying to improve a candidate solution with regard to a given measure of quality. In the
proposed work, PSO aims at optimizing the accuracy of the SVM classifier. SVM-PSO takes a set of
input data and predicts, for each given input, which of two possible classes the input belongs
to.
SVM-PSO is employed to realize hotspot forecasting. To forecast the hotspot forums within the
current time window, the clustering result obtained by K-means for the previous time window is
used. SVM-PSO performs forum classification iteratively and tries to find the optimized
solution. The input to SVM-PSO is a forum's representation vector, and the optimized output is
obtained by classifying each forum as either a hotspot forum or a non-hotspot forum. The
accuracy in predicting hotspot forums is improved with the proposed model, and the consistency
of the model is validated in the performance evaluation (Section 4.1).
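The PSO loop itself can be sketched in a few lines of Python. The paper does not say which quantities PSO adjusts, so this sketch assumes each particle encodes two SVM hyperparameters (log C and log gamma); `toy_accuracy` is a made-up smooth function standing in for the validation accuracy of an SVM trained with those settings, and the inertia and acceleration constants are conventional illustrative values rather than the authors' configuration.

```python
import random


def pso(fitness, bounds, n_particles=20, iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise `fitness` over the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # best position seen by each particle
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]   # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def toy_accuracy(params):
    # Hypothetical stand-in for "train an SVM with these settings and return
    # its accuracy"; it peaks at log C = 1, log gamma = -2.
    log_c, log_gamma = params
    return 1.0 - 0.05 * ((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)


best, best_fit = pso(toy_accuracy, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
print(best, best_fit)
```

In an actual SVM-PSO setup the fitness call would train and evaluate a classifier on the forum vectors, using the K-means labels from the previous time window as described above.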
4. EXPERIMENTAL RESULT
The data collected for our empirical study are from forums.digitalpoint.com. A list of posts in
the form of threads and replies was crawled, covering January 2007 to October 2011. The data
before and after cleaning are summarized in Table 1. After cleaning, the data are narrowed to 37
forums from January 2011 to October 2011, and the features are then extracted, which includes
computing sentiment values for threads.
Feature extraction is followed by K-means clustering and classification using Support Vector
Machine with Particle Swarm Optimization (SVM-PSO) among the 37 leaf forums for each time window
in 2011. Clustering and classification are done using the RapidMiner tool. The results obtained
using SVM-PSO show a noticeable consistency with the results achieved by K-means clustering.
The forums that are most popular among users based on the average number of threads include
Search Marketing, Publisher Network, AdCenter and General Marketing. The forums that are popular
based on the average number of replies include Affiliate Programs-Google, Affiliate Network,
Payments and Google-Google+. The classification model for forums from forums.digitalpoint.com is
shown in Figure 2.
Figure 2. Classification model using SVM-PSO
Table 2 shows the initial data view for user attention that consists of average number of threads
and average number of replies for the 37 forums across the 10 time windows.
Table 2. Data view for forums over time window
Forum ID  Forum name  Avg. number of threads  Avg. number of replies
4 Guidelines / Compliance 4.0 9.475
5 Placement / Reviews / Examples 4.9 4.71428
6 Reporting & Stats 4.8 14.89583
7 Payments 4.2 17.95238
8 AdWords 5.0 8.48
9 Analytics 4.2 8.61904
10 Google-Google+ 4.5 16.28888
11 Affiliate Network 4.8 22.22916
12 Sitemaps 4.4 9.40909
13 Google API 4.6 9.19565
14 Product Search 4.5 14.48888
16 Publisher Network 5.0 14.48
17 Search Marketing 5.1 9.74509
18 Yahoo API 4.9 9.26530
20 AdCenter 5.0 11.82
21 All Other Search Engines 4.6 16.82608
23 Solicitations & Announcements 3.8 9.07894
24 ODP / DMOZ 4.8 12.20833
26 General Marketing 5.0 10.74
28 Keywords 4.6 7.97826
29 Sandbox 4.3 8.46511
32 Facebook API 3.0 13.4
33 Twitter 3.0 14.26666
34 Social Network-Google+ 4.4 10.36363
35 Link Development 4.7 14.89361
37 Digital Point Ads 4.0 9.5
38 Google AdWords 3.3 4.18181
39 Yahoo Search Marketing 3.7 9.59459
40 Microsoft adCenter 3.6 7.75
43 Commission Junction 4.7 10.06382
44 Affiliate Programs-Google 4.3 23.79069
45 Pepperjam 4.4 10.29545
46 Azoogle 3.8 13.02631
47 Amazon 4.3 14.06976
48 EBay 4.2 9.30952
49 ClickBank 4.9 14.22448
50 Chitika 4.5 9.77777
The forums most often identified as hotspots by both K-means clustering and Support Vector
Machine with Particle Swarm Optimization (SVM-PSO) over the time windows from January 2011 to
October 2011 are shown in Table 3.
Table 3. Forums mostly identified as hot spot by K-means and SVM-PSO
Forum ID   Forum name
11         Affiliate Network
44         Google
10         Google+
6          Reporting & Stats
49         ClickBank
4.1. Performance Evaluation
The consistency between the K-means and Support Vector Machine with Particle Swarm Optimization
(SVM-PSO) results is validated using five metrics: accuracy, sensitivity, specificity, positive
predictive value (PPV) and negative predictive value (NPV).
The five metrics are computed for each time window and are defined as follows.
Accuracy = (TP + TN) / (TP + TN + FP + FN) (2)
Sensitivity = TP / (TP + FN) (3)
Specificity = TN / (TN + FP) (4)
PPV = TP / (TP + FP) (5)
NPV = TN / (TN + FN) (6)
Here TP denotes the number of forums estimated as hotspots by both K-means and SVM-PSO. TN
denotes the number of forums estimated as non-hotspots by both K-means and SVM-PSO. FP denotes
the number of forums estimated as hotspots by SVM-PSO but as non-hotspots by K-means. FN denotes
the number of forums estimated as non-hotspots by SVM-PSO but as hotspots by K-means.
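The five metrics can be computed directly from the two label lists, as in the following minimal Python sketch. The function name `agreement_metrics` and the ten example labels (1 for hotspot, 0 for non-hotspot) are hypothetical; the K-means labels play the role of the reference, matching the definitions of TP, TN, FP and FN above.

```python
def agreement_metrics(kmeans_labels, svm_pso_labels):
    """Compare SVM-PSO labels against K-means labels (1 = hotspot, 0 = non-hotspot)."""
    pairs = list(zip(kmeans_labels, svm_pso_labels))
    tp = sum(1 for k, s in pairs if k == 1 and s == 1)
    tn = sum(1 for k, s in pairs if k == 0 and s == 0)
    fp = sum(1 for k, s in pairs if k == 0 and s == 1)  # hotspot only by SVM-PSO
    fn = sum(1 for k, s in pairs if k == 1 and s == 0)  # hotspot only by K-means

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Illustrative labels for ten forums in one time window (not the paper's data).
kmeans_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
svm_pso_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = agreement_metrics(kmeans_labels, svm_pso_labels)
print(metrics)
```

For these labels TP = 2, TN = 6, FP = 1 and FN = 1, so the accuracy is 8/10 = 0.8, while sensitivity and PPV are both 2/3 and specificity and NPV are both 6/7.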
Using formulae (2)-(6), the performance is evaluated for each time window. The time windows are
those used in the SVM-PSO classification process. Table 4 shows that the proposed classification
algorithm matches or exceeds the accuracy of the other classification algorithms in every time
window except window 5. The performance can be evaluated similarly using the other four metrics
and the results compared. Figure 3 shows a graphical view of the accuracy results.
Table 4. Comparison of accuracy (%) of different algorithms with SVM-PSO
Time window   Naïve Bayes   Decision Tree   SVM      SVM-PSO
2             64            80              84       96
3             60            54.1            60       60
4             61.54         60              61.54    65.38
5             96.54         99              99       96.15
6             60            62.22           60       73.33
7             84.38         80.99           81.25    84.38
8             68.57         64              65.71    68.57
9             90            93.1            94.59    94.59
10            48.65         58.2            60       74
Figure 3. Accuracy comparison while using SVM-PSO and other algorithms
5. CONCLUSIONS
This paper proposes a new approach for predicting hotspot forums. In this approach, the
emotional polarity of the text is obtained by computing a sentiment value for each piece of
text. The sentiment values are then combined with K-means clustering and SVM-PSO classification
for forum cluster analysis. The computation indicates that K-means and SVM-PSO produce
consistent grouping results. Efficient detection of hotspot forums based on sentiment analysis
may therefore help members of internet social networks in their decision making.
REFERENCES
[1] Chaovalit P. and Zhou L. (2005), “Movie review mining: a comparison between supervised and
unsupervised classification approaches”, Proceedings of the 38th Hawaii International Conference
on System Sciences.
[2] Paltoglou G. and Thelwall M. (2010), “A study of information retrieval weighting schemes for
sentiment analysis”, Proceedings of the ACL, pp. 1386-1395.
[3] Nirmala Devi K., Preethi T., and Murali Bhaskaran V. (2012), “A semantic enhanced approach
for online hotspot forums detection”, Proceedings of the Second International Conference on
Recent Trends in Information Technology, pp. 497-501.
[4] Hu M. and Liu B. (2004), “Mining and summarizing customer reviews”, Proceedings of the ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 168-177.
[5] Li N. and Wu D. D. (2010), “Using text mining and sentiment analysis for online forums
hotspot detection and forecast”, Decision Support Systems 48, pp. 354-368.
[6] Li F., Huang M., and Zhu X. (2010), “Sentiment analysis with global topics and local
dependency”, Proceedings of AAAI, pp. 1371-1376.
[7] Pang B., Lee L., and Vaithyanathan S. (2002), “Thumbs up? Sentiment classification using
machine learning techniques”, Proceedings of the Conference on Empirical Methods in Natural
Language Processing, pp. 79-86.
[8] Popescu A. and Etzioni O. (2005), “Extracting product features and opinions from reviews”,
Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 339-346.
[9] Sindhwani V. and Melville P. (2008), “Document-word co-regularization for semi-supervised
sentiment analysis”, Eighth IEEE International Conference on Data Mining, pp. 1025-1030.
[10] Thelwall M., Buckley K., Paltoglou G., Cai D., and Kappas A. (2010), “Sentiment strength
detection in short informal text”, Journal of the American Society for Information Science and
Technology, pp. 2544-2558.
[11] Thelwall M., Buckley K., and Paltoglou G. (2011), “Sentiment in Twitter events”, Journal of
the American Society for Information Science and Technology, 62(2), pp. 406-418.
Authors
Dr. V. Murali Bhaskaran, M.E., Ph.D., is Principal of Paavai College of Engineering, Pachal,
Namakkal-637 018, India. He obtained his Bachelor's degree in Computer Science and Engineering
from Bharathidasan University, Thiruchirapalli, his MS in Computer Science from BITS, Pilani,
and his Master's degree in Computer Science and Engineering from Bharathiar University,
Coimbatore. He completed his PhD in Network Security at Bharathiar University, Coimbatore. He
has presented 22 papers at national and international conferences and published 14 papers in
international journals. He received the “Best Staff” award for the year 1991-1992 at Sathyabama
Engineering College, Chennai, and for 2002-2003 at Kongu Engineering College, Perundurai. He is
guiding 10 research scholars, and his areas of interest are Cryptography and Network Security,
High Speed Networks, and Computer Architecture.
K. Nirmala Devi, M.C.A., M.E., (Ph.D.), is an Assistant Professor (SLG) in the Department of
Computer Science and Engineering, Kongu Engineering College, Perundurai, Erode-638 052, Tamil
Nadu, India. She obtained her Bachelor's degree in Computer Science and her Master of Computer
Applications from Bharathiar University, Coimbatore, and her M.E. degree in CSE from Anna
University, Chennai. She is currently doing research in data mining under Anna University,
Coimbatore. Her areas of interest are data mining, soft computing, data structures and analysis
of algorithms, and compiler design. She has presented papers at national and international
conferences and published papers in national and international journals.

More Related Content

Similar to TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION (20)

PDF
A Novel Hybrid Classification Approach for Sentiment Analysis of Text Document
IJECEIAES
 
PDF
Analysis Levels And Techniques A Survey
Liz Adams
 
PDF
Using NLP Approach for Analyzing Customer Reviews
cscpconf
 
PDF
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
csandit
 
PDF
An Unsupervised Approach For Reputation Generation
Kayla Jones
 
PDF
Web Opinion Mining
Erhard Dinhobl
 
PDF
L017358286
IOSR Journals
 
PDF
Sentiment Features based Analysis of Online Reviews
iosrjce
 
PDF
IRJET - Support Vector Machine versus Naive Bayes Classifier:A Juxtaposition ...
IRJET Journal
 
PDF
Ijetcas14 580
Iasir Journals
 
PDF
Text pre-processing of multilingual for sentiment analysis based on social ne...
IJECEIAES
 
PDF
International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
PDF
International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
PDF
A Study On Sentiment Analysis Methods And Tools
Jim Jimenez
 
PDF
A Survey on Sentiment Analysis and Opinion Mining
IJSRD
 
PDF
A Survey on Sentiment Analysis and Opinion Mining
IJSRD
 
PDF
B021202011015
theijes
 
PDF
Ijmet 10 01_094
IAEME Publication
 
PDF
Anu paper(IJARCCE)
Anu Maheshwari
 
PDF
A scalable, lexicon based technique for sentiment analysis
ijfcstjournal
 
A Novel Hybrid Classification Approach for Sentiment Analysis of Text Document
IJECEIAES
 
Analysis Levels And Techniques A Survey
Liz Adams
 
Using NLP Approach for Analyzing Customer Reviews
cscpconf
 
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
csandit
 
An Unsupervised Approach For Reputation Generation
Kayla Jones
 
Web Opinion Mining
Erhard Dinhobl
 
L017358286
IOSR Journals
 
Sentiment Features based Analysis of Online Reviews
iosrjce
 
IRJET - Support Vector Machine versus Naive Bayes Classifier:A Juxtaposition ...
IRJET Journal
 
Ijetcas14 580
Iasir Journals
 
Text pre-processing of multilingual for sentiment analysis based on social ne...
IJECEIAES
 
International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
A Study On Sentiment Analysis Methods And Tools
Jim Jimenez
 
A Survey on Sentiment Analysis and Opinion Mining
IJSRD
 
A Survey on Sentiment Analysis and Opinion Mining
IJSRD
 
B021202011015
theijes
 
Ijmet 10 01_094
IAEME Publication
 
Anu paper(IJARCCE)
Anu Maheshwari
 
A scalable, lexicon based technique for sentiment analysis
ijfcstjournal
 

More from ijistjournal (20)

PDF
TECHNIQUES FOR ATTACKING WEB APPLICATION SECURITY
ijistjournal
 
PPTX
Submit Your Research Articles - International Journal of Information Sciences...
ijistjournal
 
PDF
A SURVEY OF BIG DATA ANALYTICS..........
ijistjournal
 
PDF
7th International Conference on Machine Learning & Applications (CMLA 2025)
ijistjournal
 
PDF
International Journal of Information Sciences and Techniques (IJIST)
ijistjournal
 
PPTX
Call for Papers - International Journal of Information Sciences and Technique...
ijistjournal
 
PDF
6th International Conference on Natural Language Computing Advances (NLCA 2025)
ijistjournal
 
PDF
CLOUD COMPUTING – KEY PILLAR FOR DIGITAL INDIA
ijistjournal
 
PPTX
Online Paper Submission - International Journal of Information Sciences and T...
ijistjournal
 
PDF
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGES
ijistjournal
 
PDF
6th International Conference on Advances in Artificial Intelligence Technique...
ijistjournal
 
PPTX
Submit Your Research Articles - International Journal of Information Sciences...
ijistjournal
 
PDF
AN OVERVIEW OF CLOUD COMPUTING FOR E-LEARNING WITH ITS KEY BENEFITS
ijistjournal
 
PDF
6th International Conference on Artificial Intelligence and Machine Learning ...
ijistjournal
 
PPTX
Call for Papers - International Journal of Information Sciences and Technique...
ijistjournal
 
PDF
7th International Conference on Machine Learning & Applications (CMLA 2025)
ijistjournal
 
PDF
PHISHING DETECTION IN IMS USING DOMAIN ONTOLOGY AND CBA – AN INNOVATIVE RULE ...
ijistjournal
 
PDF
International Journal of Information Sciences and Techniques (IJIST)
ijistjournal
 
PPTX
Online Paper Submission - International Journal of Information Sciences and T...
ijistjournal
 
PDF
FUZZY BASED HYPERSPECTRAL IMAGE SEGMENTATION USING SUBPIXEL DETECTION
ijistjournal
 
TECHNIQUES FOR ATTACKING WEB APPLICATION SECURITY
ijistjournal
 
Submit Your Research Articles - International Journal of Information Sciences...
ijistjournal
 
A SURVEY OF BIG DATA ANALYTICS..........
ijistjournal
 
7th International Conference on Machine Learning & Applications (CMLA 2025)
ijistjournal
 
International Journal of Information Sciences and Techniques (IJIST)
ijistjournal
 
Call for Papers - International Journal of Information Sciences and Technique...
ijistjournal
 
6th International Conference on Natural Language Computing Advances (NLCA 2025)
ijistjournal
 
CLOUD COMPUTING – KEY PILLAR FOR DIGITAL INDIA
ijistjournal
 
Online Paper Submission - International Journal of Information Sciences and T...
ijistjournal
 
STUDY OF NAMED ENTITY RECOGNITION FOR INDIAN LANGUAGES
ijistjournal
 
6th International Conference on Advances in Artificial Intelligence Technique...
ijistjournal
 
Submit Your Research Articles - International Journal of Information Sciences...
ijistjournal
 
AN OVERVIEW OF CLOUD COMPUTING FOR E-LEARNING WITH ITS KEY BENEFITS
ijistjournal
 
6th International Conference on Artificial Intelligence and Machine Learning ...
ijistjournal
 
Call for Papers - International Journal of Information Sciences and Technique...
ijistjournal
 
7th International Conference on Machine Learning & Applications (CMLA 2025)
ijistjournal
 
PHISHING DETECTION IN IMS USING DOMAIN ONTOLOGY AND CBA – AN INNOVATIVE RULE ...
ijistjournal
 
International Journal of Information Sciences and Techniques (IJIST)
ijistjournal
 
Online Paper Submission - International Journal of Information Sciences and T...
ijistjournal
 
FUZZY BASED HYPERSPECTRAL IMAGE SEGMENTATION USING SUBPIXEL DETECTION
ijistjournal
 
Ad

Recently uploaded (20)

PPTX
Comparison of Flexible and Rigid Pavements in Bangladesh
Arifur Rahman
 
PDF
13th International Conference of Security, Privacy and Trust Management (SPTM...
ijcisjournal
 
PDF
NFPA 10 - Estandar para extintores de incendios portatiles (ed.22 ENG).pdf
Oscar Orozco
 
PDF
تقرير عن التحليل الديناميكي لتدفق الهواء حول جناح.pdf
محمد قصص فتوتة
 
PDF
lesson4-occupationalsafetyandhealthohsstandards-240812020130-1a7246d0.pdf
arvingallosa3
 
PDF
Python Mini Project: Command-Line Quiz Game for School/College Students
MPREETHI7
 
PPT
دراسة حاله لقرية تقع في جنوب غرب السودان
محمد قصص فتوتة
 
PPTX
Precooling and Refrigerated storage.pptx
ThongamSunita
 
PDF
Rapid Prototyping for XR: Lecture 2 - Low Fidelity Prototyping.

TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION

International Journal of Information Sciences and Techniques (IJIST) Vol.2, No.3, May 2012
DOI : 10.5121/ijist.2012.2304

K. Nirmala Devi (1) and Dr. V. Murali Bhaskaran (2)
(1) Department of CSE, Kongu Engineering College, Perundurai, Erode, Tamil Nadu
k_nirmal@kongu.ac.in
(2) Principal, Paavai College of Engineering, Pachal, Namakkal, Tamil Nadu
murali66@gmail.com

ABSTRACT
User-generated content on the web is growing rapidly in this emergent information age. Evolving technology makes use of such information to capture the essence of what users express, and the useful information is finally exposed to information seekers. Most existing research on text information processing focuses on the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment analysis for the text data available in each forum. This approach analyses the forum text data and computes a value for each word of text. The proposed approach combines K-means clustering with a Support Vector Machine with PSO (SVM-PSO) classification algorithm, which can be used to group the forums into two clusters, hotspot forums and non-hotspot forums, within the current time span. The accuracy of the proposed system is compared with that of other classification algorithms such as Naïve Bayes, Decision Tree and SVM. The experiment shows that K-means and SVM-PSO together achieve highly consistent results.

KEYWORDS
Sentiment analysis, SVM-PSO, hot spot, k-means, Text mining

1. INTRODUCTION
Data mining is the process of nontrivial extraction of implicit, previously unknown, and potentially useful information from data, which can help businesses make proactive and knowledge-driven decisions. It uses machine learning, statistical and visualization techniques to discover and present knowledge that previously went unnoticed.
Opinion mining is an important subdiscipline within data mining and natural language processing (NLP), which automatically extracts, classifies, and understands the opinions generated by various users. These techniques also help to enhance the value of existing information resources, which can be integrated with new products and systems as they are brought online. The tremendous growth of online information from various forums has made it very difficult for customers to acquire the information that is useful to them. This has motivated the detection of hotspot forums [5], where useful information is quickly made available to customers and may benefit their decision making. In topic-based classification, topic-related words are important.
Efficient statistical and machine learning techniques can be applied to process the enormous amount of online data. An emergent technique called emotional polarity computation, also known as sentiment analysis [6], can also be performed during online text mining. In opinion classification, by contrast, topic-related words are not very important; what matters are opinion words that indicate positive or negative opinions, e.g., great, excellent, amazing, horrible, bad, worst. Most methodologies for opinion mining apply some form of machine learning for classification. Customized algorithms specifically for opinion classification have also been developed, which exploit opinion words and phrases together with scoring functions.

In this paper we detect hotspot forums by computing text sentiment analysis. This method quantifies the user attention on each forum, from which hotspot forums can be identified. The proposed work is then integrated with K-means clustering and the Support Vector Machine with Particle Swarm Optimization (SVM-PSO) algorithm. It groups the forums into two clusters, hotspot forums and non-hotspot forums, within each time window.

The rest of the paper is structured as follows: Section 2 discusses related work and describes various existing semantic-orientation-based sentiment classification approaches. The proposed Support Vector Machine with Particle Swarm Optimization (SVM-PSO) algorithm is discussed in Section 3. The experimental results are discussed in Section 4. Finally, Section 5 concludes the paper.

2. RELATED WORK
This section covers several streams of related work: analysis of review mining, sentiment classification, and machine learning techniques for predicting hotspots.

2.1. Analysis of Review Mining
Mining of online reviews has become a flourishing frontier, as it can provide a solid basis for predicting future events. For example, Chaovalit and Zhou [1] stated in 2005 that online reviews have become more useful and influence sales, as they provide important information about a product to potential consumers. A multi-knowledge based approach has been proposed in which WordNet, statistical analysis and movie knowledge are integrated. The experimental results have shown the effectiveness of the approach in movie review mining and summarizing. Hu and Liu [4] proposed a method in which a generated list of adjectives, labelled with semantic orientation, is used for the analysis. Finally, it is observed that machine learning is used to depict the interacting structure of reviews.

2.2. Sentiment Classification
The documents available on the web can be classified based on various metrics including topics, authors, structures, and so forth. Classification based on sentiment has become a new frontier for the text mining community. The task of sentiment classification is to determine the semantic orientation of words, sentences or documents. Most of the early work on this topic used words as the processing unit.
Automatic sentiment classification at the document level was carried out by Pang, Lee and Vaithyanathan [7], who used several machine learning approaches with common text features to classify movie reviews from IMDB. It has been pointed out that direct marketing is a promotion process that motivates customers to place orders through various channels [6, 9]. For this to work, one needs an accurate customer segmentation based on a good understanding of the customers, so that relevant product information can be delivered to different customer segments. Thelwall et al. [11] stated that analysing Twitter gives insights into why certain events resonate with people. Customers who were used to having only a limited range of product choices, due to physical and/or time constraints, now face the problem of information overload. An effective way of increasing customer satisfaction, and consequently customer loyalty, is to help customers identify products according to their interests, which in turn calls for personalized product recommendations [8, 9]. Hofmann and Puzicha used the Latent Class Model (LCM) to circumvent the aforementioned problems. Paltoglou and Thelwall [2, 10] found that incorporating sentiment information into Vector Space Model (VSM) values using supervised methods was helpful for sentiment analysis.

2.3. Machine Learning Techniques for Predicting Hot Spots
For predicting online hotspot forums, two machine learning techniques, K-means and SVM, were proposed by Li and Wu [5]. Unlike other learning methods, SVM's performance is related not to the number of features in the system, but to the margin with which it separates the data. SVM [3] arrives at a two-group result by classifying each forum as either a hotspot forum or a non-hotspot forum.

3. PROPOSED WORK
The proposed work helps in detecting hotspot forums and achieves highly consistent results by applying an efficient optimization algorithm in the form of SVM-PSO. Figure 1 depicts the conceptual diagram of the proposed approach.
Figure 1. Conceptual diagram of the proposed approach (box labels: text data collected from forums; sentiment values for postings; K-means clustering model; cluster analysis; predicted cluster using K-means; SVM-PSO classification model; class analysis; identified binary classes (hotspots, non-hotspots); result comparison and evaluation)

3.1. Pre-processing
The data set used in our experimental research was acquired from forums.digitalpoint.com; after data cleaning it comprises 37 forums and 1616 threads. Data collection starts by crawling the forum names of the first 50 forums. The parsed forum names are stored in a table. Then all the thread posts and reply posts contained in the corresponding web pages are parsed and stored separately in a table. After crawling is complete, data cleaning is carried out, in which noise data and irrelevant data are removed manually. Noise data include forums with picture postings that are not clearly shown online. Irrelevant data come from forums where the posting contents are not related to the forum threads at all. Threads that have no replies, and forums that have no threads across the time window, are also removed. The data before and after cleaning are listed in Table 1. After cleaning, 37 forums remain within the time span from January to October 2011, and each time window is one month long.

Table 1. Data view before cleaning and after cleaning

                      Before cleaning         After cleaning
Time period           2007 Jan to 2011 Oct    2011 Jan to 2011 Oct
Number of forums      50                      37
Number of threads     2430                    1616
Number of replies     39239                   19370
3.2. Feature Extraction
Pre-processing is followed by feature extraction. For each forum, five features are extracted in each time window: the number of threads, the average number of replies per thread, the average sentiment value of threads, the fraction of positive threads among all threads, and the fraction of negative threads among all threads. The sentiment value of each thread is calculated by computing text sentiment.

3.3. Sentiment Computation on Forum Text
Feature extraction includes text sentiment analysis, which aims at calculating an integer value for each piece of text. It is a semantic-orientation-based approach in which the sentiment values of all keywords are added to obtain the sentiment value of the whole article. The replies of a thread are decomposed into a set of keywords, and a sentiment value is assigned to each keyword. The sum of the sentiment values of all the keywords gives the sentiment value of the thread. Suppose that for a thread $t$, its replies are decomposed into a set of keywords, and that for each keyword $w_i$ ($i = 1, 2, \ldots, n$) the sentiment value is $s_i$. Then the sentiment value $S_t$ of the thread $t$ can be calculated using Eq. (1):

$$S_t = \sum_{i=1}^{n} s_i \qquad (1)$$

The calculation of sentiment values is based on SentiStrength, an algorithm for text sentiment analysis that estimates sentiment values for texts.

3.4. Forum Clustering Using K-means
After the features are extracted, clustering is carried out using the K-means algorithm in the RapidMiner tool. Each forum may be represented as a data point in a vector space.
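The feature extraction and sentiment summation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy lexicon stands in for SentiStrength (which the paper actually uses), whitespace tokenisation is a simplification, and the names `Thread`, `thread_sentiment` and `forum_features` are hypothetical. Treating a thread as positive when its summed value is above zero, and negative when below zero, is likewise an assumption.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon; the paper obtains keyword scores from SentiStrength.
TOY_LEXICON = {"great": 2, "excellent": 3, "amazing": 3,
               "horrible": -3, "bad": -2, "worst": -3}


@dataclass
class Thread:
    replies: list  # reply texts belonging to one thread


def thread_sentiment(thread, lexicon=TOY_LEXICON):
    """Eq. (1): sum the sentiment values s_i of all keywords in the replies."""
    total = 0
    for reply in thread.replies:
        for word in reply.lower().split():
            # Words missing from the lexicon contribute zero.
            total += lexicon.get(word.strip(".,!?"), 0)
    return total


def forum_features(threads, lexicon=TOY_LEXICON):
    """Five-element vector for one forum in one time window."""
    n = len(threads)
    if n == 0:
        raise ValueError("forum has no threads in this time window")
    scores = [thread_sentiment(t, lexicon) for t in threads]
    return [
        n,                                         # number of threads
        sum(len(t.replies) for t in threads) / n,  # average replies per thread
        sum(scores) / n,                           # average sentiment value
        sum(1 for s in scores if s > 0) / n,       # fraction of positive threads
        sum(1 for s in scores if s < 0) / n,       # fraction of negative threads
    ]


threads = [
    Thread(["Great tool, excellent support!", "Amazing results."]),
    Thread(["Worst update ever.", "Really bad.", "Horrible."]),
    Thread(["Any news on this?"]),
]
features = forum_features(threads)
```

One such vector would be produced per forum and per monthly time window and then passed on to clustering.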
During feature extraction, a vector is used to represent the emotional polarity of each forum. It is composed of five elements: the number of threads, the average number of replies per thread, the average sentiment value of threads, the fraction of positive threads among all threads, and the fraction of negative threads among all threads. These vectors are given as input to K-means clustering, which yields a clustered view of all the forums. Within each time window, the forums obtained as hotspot and non-hotspot forums are those closest to the theoretical centres of the two clusters.

3.5. Forum Classification using SVM-PSO
Classification is carried out using the Support Vector Machine with Particle Swarm Optimization (SVM-PSO) algorithm. PSO is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. In the proposed work, PSO aims at optimizing the accuracy of the SVM classifier. The standard SVM-PSO takes a set of input data and predicts, for each given input, which of two possible classes the input belongs to. SVM-PSO is employed to realize hotspot forecasting. In order to forecast the hotspot forums within the current time window, the clustering result obtained by K-means from the previous time window is used. SVM-PSO performs forum classification iteratively and tries to find the optimized solution. For each SVM-PSO run, the input is a forum's representation vector, and the optimized output is achieved by classifying each forum as either hotspot forum or non-hotspot
forum. The accuracy in predicting hotspot forums is improved with the proposed model, and the consistency of the model is validated for its performance.

4. EXPERIMENTAL RESULT
The data collected for our empirical studies are from forums.digitalpoint.com. A list of posts in the form of threads and replies was crawled, covering January 2007 to October 2011. The data view before and after cleaning is given in Table 1. After cleaning, the data are narrowed to 37 forums from January 2011 to October 2011, and the features are then extracted, which includes computing sentiment values for threads. Feature extraction is followed by K-means clustering and by classification using Support Vector Machine with Particle Swarm Optimization (SVM-PSO) among the 37 leaf forums for each time window in 2011. Clustering and classification are done using the RapidMiner tool. The results obtained using SVM-PSO show a noticeable consistency with the results achieved by K-means clustering. The forums that are most popular among users based on the average number of threads include Search Marketing, Publisher Network, AdCenter and General Marketing. The forums that are popular based on the average number of replies include Affiliate Programs-Google, Affiliate Network, Payments and Google-Google+. The classification model for forums from forums.digitalpoint.com is shown in Figure 2.

Figure 2. Classification model using SVM-PSO
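To make the clustering step more concrete, here is a minimal plain-Python sketch of two-cluster K-means (Lloyd's algorithm). It is not the RapidMiner operator used in the paper. The function name `kmeans_two`, the deterministic initialisation (first point plus the point farthest from it) and the use of only two features are assumptions made for illustration; the paper clusters five-element vectors per time window. The sample points are the (average threads, average replies) pairs reported in Table 2 for forums 11, 44, 8, 9 and 28.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans_two(points, max_iter=100):
    """Lloyd's algorithm with k = 2. Returns (labels, centroids)."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Assumed initialisation: the first point and the point farthest from it.
    c0 = list(points[0])
    c1 = list(max(points, key=lambda p: sq_dist(p, c0)))
    centroids = [c0, c1]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = mean(members)
    return labels, centroids


# (avg threads, avg replies) for forums 11, 44, 8, 9, 28, taken from Table 2.
points = [(4.8, 22.22916), (4.3, 23.79069), (5.0, 8.48),
          (4.2, 8.61904), (4.6, 7.97826)]
labels, centroids = kmeans_two(points)
```

In practice the features would likely need to be scaled to comparable ranges before clustering, since a feature with a large numeric range (here, average replies) otherwise dominates the distance. Deciding which of the two clusters is the hotspot cluster is a separate step that this sketch does not attempt.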
Table 2 shows the initial data view for user attention, consisting of the average number of threads and the average number of replies for the 37 forums across the 10 time windows.

Table 2. Data view for forums over time window

Forum Id  Forum name                       Avg num of threads  Avg num of replies
4         Guidelines / Compliance          4.0                 9.475
5         Placement / Reviews / Examples   4.9                 4.71428
6         Reporting & Stats                4.8                 14.89583
7         Payments                         4.2                 17.95238
8         AdWords                          5.0                 8.48
9         Analytics                        4.2                 8.61904
10        Google-Google+                   4.5                 16.28888
11        Affiliate Network                4.8                 22.22916
12        Sitemaps                         4.4                 9.40909
13        Google API                       4.6                 9.19565
14        Product Search                   4.5                 14.48888
16        Publisher Network                5.0                 14.48
17        Search Marketing                 5.1                 9.74509
18        Yahoo API                        4.9                 9.26530
20        AdCenter                         5.0                 11.82
21        All Other Search Engines         4.6                 16.82608
23        Solicitations & Announcements    3.8                 9.07894
24        ODP / DMOZ                       4.8                 12.20833
26        General Marketing                5.0                 10.74
28        Keywords                         4.6                 7.97826
29        Sandbox                          4.3                 8.46511
32        Facebook API                     3.0                 13.4
33        Twitter                          3.0                 14.26666
34        Social Network-Google+           4.4                 10.36363
35        Link Development                 4.7                 14.89361
37        Digital Point Ads                4.0                 9.5
38        Google AdWords                   3.3                 4.18181
39        Yahoo Search Marketing           3.7                 9.59459
40        Microsoft adCenter               3.6                 7.75
43        Commission Junction              4.7                 10.06382
44        Affiliate Programs-Google        4.3                 23.79069
45        Pepperjam                        4.4                 10.29545
46        Azoogle                          3.8                 13.02631
47        Amazon                           4.3                 14.06976
48        EBay                             4.2                 9.30952
49        ClickBank                        4.9                 14.22448
50        Chitika                          4.5                 9.77777
The forums most often identified as hotspots by both K-means clustering and Support Vector Machine with Particle Swarm Optimization (SVM-PSO) over the time windows from January 2011 to October 2011 are shown in Table 3.

Table 3. Forums mostly identified as hot spot by K-means and SVM-PSO

Forum ID  Forum name
11        Affiliate Network
44        Google
10        Google+
6         Reporting & Stats
49        ClickBank

4.1. Performance Evaluation
The consistency between K-means and Support Vector Machine with Particle Swarm Optimization (SVM-PSO) is validated using five metrics: accuracy, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). These five metrics are applied to each time window and are defined as follows.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2)$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (3)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (4)$$

$$\text{PPV} = \frac{TP}{TP + FP} \qquad (5)$$

$$\text{NPV} = \frac{TN}{TN + FN} \qquad (6)$$

Here, TP denotes the number of forums estimated as hotspots by both K-means and SVM-PSO. TN denotes the number of forums estimated as non-hotspots by both K-means and SVM-PSO. FP denotes the number of forums estimated as hotspots by SVM-PSO but as non-hotspots by K-means. FN denotes the number of forums estimated as non-hotspots by SVM-PSO but as hotspots by K-means. Using formulae (2)-(6), the performance is evaluated for each time window. The time windows are those used in the SVM-PSO classification process. Table 4 suggests that the proposed classification algorithm matches or exceeds the accuracy of the other classification algorithms in every time window except window 5, where it is slightly lower. The performance can similarly be evaluated using the other four metrics and the results compared. Figure 3 shows a graphical view of the accuracy result.

Table 4. Comparison of Accuracy using different algorithms with SVM-PSO

Time window   Accuracy (%)
              Naïve Bayes   Decision Tree   SVM     SVM-PSO
2             64            80              84      96
3             60            54.1            60      60
4             61.54         60              61.54   65.38
5             96.54         99              99      96.15
6             60            62.22           60      73.33
7             84.38         80.99           81.25   84.38
8             68.57         64              65.71   68.57
9             90            93.1            94.59   94.59
10            48.65         58.2            60      74

Figure 3. Accuracy comparison while using SVM-PSO and other algorithms

5. CONCLUSIONS
This paper proposes a new approach for predicting hotspot forums. In this approach the emotional polarity of the text is obtained by computing a value for each piece of text. After the sentiment values are calculated, the method is integrated with K-means clustering and SVM-PSO classification for forum cluster analysis. The computation indicates that K-means and SVM-PSO produce consistent grouping results. Efficient detection of hotspot forums based on sentiment analysis may therefore benefit members of internet social networks in their decision making.

REFERENCES
[1] Chaovalit P. and Zhou L. (2005), "Movie review mining: a comparison between supervised and unsupervised classification approaches", Proceedings of the 38th Hawaii International Conference on System Sciences.
[2] Paltoglou G. and Thelwall M. (2010), "A study of information retrieval weighting schemes for sentiment analysis", Proceedings of the ACL, pp 1386-1395.
[3] Nirmala Devi K., Preethi T., and Murali Bhaskaran V. (2012), "A semantic enhanced approach for online hotspot forums detection", Proceedings of the Second International Conference on Recent Trends in Information Technology, pp 497-501.
[4] Hu M. and Liu B. (2004), "Mining and summarizing customer reviews", Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 168-177.
[5] Li N. and Wu D. D. (2010), "Using text mining and sentiment analysis for online forums hotspot detection and forecast", Decision Support Systems 48, pp 354-368.
[6] Li F., Huang M., and Zhu X. (2010), "Sentiment analysis with global topics and local dependency", Proceedings of AAAI, pp 1371-1376.
[7] Pang B., Lee L., and Vaithyanathan S. (2002), "Thumbs up? Sentiment classification using machine learning techniques", Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp 79-86.
[8] Popescu A. and Etzioni O. (2005), "Extracting product features and opinions from reviews", Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp 339-346.
[9] Sindhwani V. and Melville P. (2008), "Document-Word Co-regularization for Semi-supervised Sentiment Analysis", Eighth IEEE International Conference on Data Mining, pp 1025-1030.
[10] Thelwall M., Buckley K., Paltoglou G., Cai D., and Kappas A. (2010), "Sentiment strength detection in short informal text", Journal of the American Society for Information Science and Technology, pp 2544-2558.
[11] Thelwall M., Buckley K., and Paltoglou G. (2011), "Sentiment in Twitter Events", Journal of the American Society for Information Science and Technology, 62(2), pp 406-418.

Authors
Dr. V. Murali Bhaskaran, M.E., Ph.D., Principal, Paavai College of Engineering, Pachal, Namakkal-637 018, India. He obtained his Bachelor's degree in Computer Science and Engineering from Bharathidasan University, Thiruchirapalli, an MS in Computer Science from BITS, Pilani, and a Master's degree in Computer Science and Engineering from Bharathiar University, Coimbatore. He completed his PhD in Network Security at Bharathiar University, Coimbatore. He has presented 22 papers at national and international conferences and published 14 papers in international journals. He is presently working as Principal of Paavai College of Engineering, Pachal, Namakkal. He received the "Best Staff" award for the year 1991-1992 at Sathyabama Engineering College, Chennai, and for 2002-2003 at Kongu Engineering College, Perundurai. He is guiding 10 research scholars, and his areas of interest are Cryptography and Network Security, High Speed Networks, and Computer Architecture.

K. Nirmala Devi, M.C.A., M.E., (Ph.D.), Assistant Professor (SLG), Kongu Engineering College, Perundurai, Erode-638 052, India. She obtained her Bachelor's degree in Computer Science and her Master of Computer Applications from Bharathiar University, Coimbatore, and her M.E. degree in CSE from Anna University, Chennai. She is currently doing research in data mining under Anna University, Coimbatore. She is presently working as an Assistant Professor (SLG) in the Department of Computer Science and Engineering, Kongu Engineering College, Perundurai, Tamil Nadu, India. Her areas of interest are Data Mining, Soft Computing, Data Structures and Analysis of Algorithms, and Compiler Design. She has presented papers at national and international conferences and has published papers in national and international journals.