IV. WORD EMBEDDINGS AND DOCUMENT VECTORS

Word embedding is a technique in which each word is given a unique vector representation that takes its semantic meaning into consideration. This distributed representation of text data has been a breakthrough for the performance of deep learning techniques on NLP problems. Each word is mapped to a vector in a predefined vector space, and these vectors are learned using neural networks. The learning process can be carried out with a neural network model or with an unsupervised process based on document statistics. In this sentiment analysis we make use of a neural network model that incorporates a few aspects of deep learning.

A. Doc2Vec Model

Numeric representation of words is a tough and challenging task. There are alternative techniques such as [...]

Fig. 1. Framework of the Doc2vec Distributed Memory Model (PV-DM). The average of the vectors of the three words is calculated to predict the fourth word in the sentence. The paragraph id holds the information about the missing word and thus acts as a memory.
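As a minimal sketch of how the PV-DM document vectors of Fig. 1 can be trained (assuming the gensim library; the tokenized tweets below are hypothetical placeholders for the airline corpus):

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Hypothetical pre-tokenized tweets standing in for the airline dataset.
    tweets = [["the", "flight", "was", "delayed", "again"],
              ["great", "service", "and", "friendly", "crew"],
              ["lost", "my", "luggage", "very", "disappointed"]]

    # Each tweet is tagged with a unique paragraph id, the "memory" of PV-DM.
    corpus = [TaggedDocument(words=t, tags=[i]) for i, t in enumerate(tweets)]

    # dm=1 selects the Distributed Memory model (PV-DM) shown in Fig. 1.
    model = Doc2Vec(corpus, dm=1, vector_size=100, window=2, min_count=1,
                    epochs=40)

    # Infer a fixed-length vector for an unseen tweet; vectors like this are
    # the features fed to the classifiers of Section V.
    vec = model.infer_vector(["flight", "was", "cancelled"])
    print(vec.shape)  # (100,)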
V. CLASSIFICATION TECHNIQUES

Here we describe seven classifiers built with different classification techniques. These techniques are generally used for text classification and can also be applied to Twitter sentiment analysis.

A. Decision Tree Classifier

The decision tree classifier is a simple and widely used algorithm for classifying data. A decision tree is a tree-like structure whose internal nodes represent test conditions and whose leaf nodes hold the class labels. This classification approach poses carefully crafted questions about the attributes of the test data set; each time an answer is received, another follow-up question is asked until the class of the test data can be determined. This classifier handles over-fitting with post-pruning approaches.
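A minimal sketch with scikit-learn; the feature arrays are random stand-ins for the real Doc2vec tweet vectors:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical Doc2vec features with 3 sentiment labels
    # (0 = negative, 1 = neutral, 2 = positive).
    rng = np.random.default_rng(0)
    X_train, y_train = rng.normal(size=(300, 100)), rng.integers(0, 3, 300)
    X_test = rng.normal(size=(10, 100))

    # ccp_alpha > 0 enables cost-complexity pruning, one post-pruning
    # approach to curbing over-fitting.
    tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
    tree.fit(X_train, y_train)
    print(tree.predict(X_test))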
B. Random Forest Classifier

The random forest classifier is an ensemble learning classification algorithm. It is very similar to the decision tree but contains a multitude of decision trees, and the class label is the mode of the classes predicted by the individual trees. The algorithm is efficient at handling large datasets with thousands of input variables without deleting any of them, and the model can deal with over-fitting of data points. For a dataset D with N instances and A attributes, the general procedure for building a Random Forest ensemble classifier is as follows. Each time a candidate decision tree is built, a subset d of the dataset D is sampled with replacement as its training dataset. In each decision tree, a random subset a of the attributes A is selected at every node as the candidate attributes for splitting that node. By building K decision trees in this way, a Random Forest classifier is obtained. The forest takes a majority vote and returns the class label that receives the maximum number of votes from the individual trees.
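This procedure maps directly onto scikit-learn's implementation; a minimal sketch, with random placeholder features again:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X_train, y_train = rng.normal(size=(300, 100)), rng.integers(0, 3, 300)
    X_test = rng.normal(size=(5, 100))

    # n_estimators is K; bootstrap=True samples d from D with replacement;
    # max_features="sqrt" draws the random attribute subset a at every split.
    forest = RandomForestClassifier(n_estimators=100, bootstrap=True,
                                    max_features="sqrt", random_state=0)
    forest.fit(X_train, y_train)

    # predict() returns the mode of the labels voted by the individual trees.
    print(forest.predict(X_test))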
C. Logistic Regression Classifier

This algorithm is named after its core function, the logistic function, also known as the sigmoid function. It is an S-shaped curve that takes any real value as input and maps it into the range between 0 and 1. The sigmoid function is defined as follows:

    S(x) = \frac{1}{1 + e^{-x}} = \frac{e^{x}}{e^{x} + 1}    (1)
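A short illustration of equation (1) and of the multiclass logistic model in scikit-learn, with placeholder features as before:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def sigmoid(x):
        # Equation (1): squashes any real input into the open interval (0, 1).
        return 1.0 / (1.0 + np.exp(-x))

    print(sigmoid(np.array([-4.0, 0.0, 4.0])))  # ~[0.018, 0.5, 0.982]

    # scikit-learn generalizes the binary logistic model to our 3 classes.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(300, 100)), rng.integers(0, 3, 300)
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.predict_proba(X[:1]).round(3))  # three probabilities summing to 1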
D. Support Vector Machine Classifier

This algorithm works on a simple strategy of separating hyperplanes. Given training data, the algorithm categorizes test data according to an optimal hyperplane. The data points are plotted in an n-dimensional vector space, where n depends on the features of the data points. The SVM algorithm is used for binary classification and regression tasks, but in our case we have a 3-class sentiment analysis, which makes it a multiclass SVM classification. We adopt the pairwise classification technique, in which one SVM classifier is trained to separate each pair of classes. The overall accuracy of this classifier takes the accuracies of every pairwise SVM classification into account [2]. On performing classification, we then find hyperplanes that differentiate the 3 classes well.
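The pairwise scheme can be made explicit with scikit-learn's one-vs-one wrapper; a minimal sketch:

    import numpy as np
    from sklearn.multiclass import OneVsOneClassifier
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train, y_train = rng.normal(size=(300, 100)), rng.integers(0, 3, 300)

    # One SVM per pair of classes: with 3 classes this yields 3 classifiers
    # (negative/neutral, negative/positive, neutral/positive).
    ovo = OneVsOneClassifier(SVC(kernel="linear"))
    ovo.fit(X_train, y_train)
    print(len(ovo.estimators_))      # 3 pairwise SVMs
    print(ovo.predict(X_train[:5]))  # label chosen by pairwise voting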
E. Gaussian Naïve Bayes Classifier

Naïve Bayes is a popular text classifier, and it is highly scalable. The algorithm makes use of the Bayes theorem of conditional probability [7]. Since we are dealing with continuous values, we make use of the Gaussian distribution. Gaussian NB is easy to work with, as we only need to compute the mean and standard deviation from the training data. The classifier takes each tweet and calculates, for each class label (positive, negative, and neutral), the product of the probabilities of every feature present in the tweet; the tweet is then assigned the sentiment label with the largest product. The likelihood of a feature value x_i under the normal distribution of class y is described as

    P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^{2}}} \exp\left(-\frac{(x_i - \mu_y)^{2}}{2\sigma_y^{2}}\right)    (2)

where \mu_y and \sigma_y are the per-class mean and standard deviation of that feature.
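A minimal scikit-learn sketch; only the per-class means and variances are stored, and equation (2) is evaluated at prediction time (placeholder features again):

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    rng = np.random.default_rng(0)
    X_train, y_train = rng.normal(size=(300, 100)), rng.integers(0, 3, 300)

    gnb = GaussianNB().fit(X_train, y_train)
    # Per-class feature means and variances, i.e. the mu and sigma^2 of (2)
    # (var_ is named sigma_ in scikit-learn versions before 1.0).
    print(gnb.theta_.shape, gnb.var_.shape)  # (3, 100) and (3, 100)

    # predict combines the per-feature likelihoods (via their logs) with the
    # class prior and returns the label with the largest product.
    print(gnb.predict(X_train[:5]))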
F. AdaBoost Classifier

Adaptive Boosting, or AdaBoost, is a meta-algorithm formulated by Yoav Freund and Robert Schapire. It is used with other learning algorithms to obtain improved performance. The output of the weak learners (other classifiers) is combined into a weighted sum, which gives the output of the AdaBoost classifier. One drawback of this classifier is that it is very sensitive to noise points and outliers, so the training data fed to it must be of high quality.

Fig. 2. Framework of AdaBoost Classifier (Ensemble Classifier)
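A minimal sketch with scikit-learn, using depth-1 decision trees (stumps) as the weak learners; note that the estimator argument is named base_estimator in scikit-learn versions before 1.2:

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X_train, y_train = rng.normal(size=(300, 100)), rng.integers(0, 3, 300)

    # Each boosting round re-weights the data so the next weak learner
    # focuses on previously misclassified points; the final prediction is
    # the weighted vote of all weak learners.
    ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                             n_estimators=50, random_state=0)
    ada.fit(X_train, y_train)
    print(ada.predict(X_train[:5]))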
G. K-Nearest Neighbour Classifier

The KNN classifier is an instance-based learner used for both classification and regression tasks. The algorithm does not use the training data to make any generalizations; it is based on feature similarity. A test sample is classified by a majority vote of its neighbours: the class assigned to the test sample is the most common class among its k nearest neighbours [3]. When used for regression, the output value is the average of the outputs of its k nearest neighbours. The classifier is a lazy learner because nothing is done with the training data until the model tries to classify the test data. We have taken the k value to be 3, which gave us the most accurate result. The k value must not be so large that it includes noise points or points that belong to a neighbouring class.

Fig. 3. Representation of Classification by KNN
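A minimal scikit-learn sketch with k = 3, using placeholder features as before:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X_train, y_train = rng.normal(size=(300, 100)), rng.integers(0, 3, 300)

    # Fitting only stores the training set (lazy learning); all work happens
    # at prediction time, when the 3 nearest neighbours are searched.
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)

    # Each test tweet receives the majority class of its 3 nearest neighbours.
    print(knn.predict(X_train[:5]))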
Fig. 7. Reasons for Negative Feedback

TABLE II. ACCURACY OF CLASSIFIERS FOR THE 3-CLASS DATASET

    Classifier              Precision   Recall   F-Measure
    Decision Tree           63%         64.6%    64.5%
    Random Forest           85.6%       86.5%    86.5%
    SVM                     81.2%       84.4%    84.8%
    Gaussian Naïve Bayes    64.2%       64.7%    64.6%
    AdaBoost                84.5%       83.5%    83.5%
    Logistic Regression     81%         81.6%    81.9%
    KNN                     59%         59.2%    59.3%
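The figures in Table II can be reproduced from a classifier's test-set predictions; a minimal sketch with scikit-learn, assuming weighted averaging over the three classes (the label arrays are hypothetical):

    import numpy as np
    from sklearn.metrics import precision_recall_fscore_support

    # Hypothetical gold labels and one classifier's test-set predictions.
    y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1, 0, 2])
    y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 1, 0, 2])

    # Weighted averaging folds the per-class scores of the 3-class task into
    # single precision/recall/F-measure numbers like those in Table II.
    p, r, f, _ = precision_recall_fscore_support(y_true, y_pred,
                                                 average="weighted")
    print(f"precision={p:.1%}  recall={r:.1%}  f-measure={f:.1%}")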
VII. CONCLUSION

This paper makes an empirical contribution to the fields of data science and sentiment analysis. We apply various traditional classification techniques and compare their accuracies. Very little research has been done in the domain of sentiment analysis for airline services, and the past work does a word-level analysis of tweets without preserving the word order. In this research, however, we have done a phrase-level analysis of tweets using document vectors (Doc2vec), which considers the word ordering as well. The classification techniques used include ensemble approaches such as AdaBoost, which combines several other classifiers into one strong classifier and gives an accuracy of 84.5%. The accuracies attained by the classifiers are high enough for the airline industry to use them in customer-satisfaction investigations. There is still scope for improvement in this analysis, as the major setback is the limited number of tweets used in training the model; by increasing the number of tweets, we can build a stronger model and thus obtain better classification accuracy. The approach described in this paper can be used by airline companies to analyze Twitter data.
REFERENCES

[1] Pang, Bo, and Lillian Lee, "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval 2.1-2 (2008): 1-135.
[2] Xia, Rui, Chengqing Zong, and Shoushan Li, "Ensemble of feature sets and classification algorithms for sentiment classification." Information Sciences 181.6 (2011): 1138-1152.
[3] Han, Jiawei, Micheline Kamber, and Jian Pei, Data Mining, Southeast Asia Edition: Concepts and Techniques. Morgan Kaufmann, 2006.
[4] E. Cambria, H. Wang, and B. White, "Guest editorial: Big social data analysis," Knowledge-Based Systems, vol. 69, 2014, pp. 1-2.
[5] Quoc Le and Tomas Mikolov, "Distributed Representations of Sentences and Documents," Cornell University, 2014.
[6] S. Kamal, N. Dey, A. S. Ashour, S. Ripon, V. E. Balas, and M. Kaysar, "FbMapping: An automated system for monitoring Facebook data," Neural Network World, 2017.
[7] Pak, Alexander, and Patrick Paroubek, "Twitter as a Corpus for Sentiment Analysis and Opinion Mining." LREC. Vol. 10. 2010.
[8] Melville, Prem, Wojciech Gryc, and Richard D. Lawrence, "Sentiment analysis of blogs by combining lexical knowledge with text classification." Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009.