
Arabic_Sentiment_Analysis

This paper presents a study on sentence-level sentiment analysis for Arabic tweets, addressing the limited research in this area compared to English. The authors implement a machine learning approach using classifiers like Support Vector Machine and Naïve Bayes to classify sentiments as positive or negative based on feature vectors derived from tweet data. The study highlights the challenges of Arabic sentiment analysis and proposes a methodology for improving classification accuracy through feature extraction and data annotation.

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/233859568

Sentence-Level Arabic Sentiment Analysis

Conference Paper · May 2012


DOI: 10.1109/CTS.2012.6261103

Citations: 252 · Reads: 5,843

2 authors:
Amira Shoukry, The American University in Cairo (5 publications, 329 citations)
Ahmed Rafea, The American University in Cairo (168 publications, 2,728 citations)

All content following this page was uploaded by Ahmed Rafea on 26 May 2014.



Sentence-Level Arabic Sentiment Analysis
Amira Shoukry, Ahmed Rafea
Department of Computer Science and Engineering
The American University in Cairo
Cairo, Egypt
[email protected], [email protected]

Abstract – Research on Arabic sentiment analysis is currently very limited. While sentiment analysis has many applications in English, Arabic is still taking its first steps in this field. In this paper, we present an application of Arabic sentiment analysis by implementing sentiment classification for Arabic tweets. The retrieved tweets are analyzed to determine their sentiment polarity (positive or negative). Since this data is collected from the social network Twitter, it is of particular importance for the Middle East region, which mostly speaks Arabic.

Keywords: Sentiment; Feature; Tweets; Polarity

I. INTRODUCTION

Sentiment analysis, or opinion mining, is currently considered one of the fastest-emerging research fields, driven by the large volume of opinionated web content coming from blogs and social network websites. Sentiment analysis is the task of identifying positive and negative opinions, emotions, and evaluations. In general, sentiment analysis aims to determine the attitude of a writer with respect to some topic, or the overall tonality of a document [4]. In this study, we are interested in sentiment classification for Arabic at the sentence level. Here the aim is to classify a sentence, whether from a blog, a review, a tweet, etc., as holding an overall positive, negative, or neutral sentiment with regard to the given target. It should be noted that this work is part of a larger project that will include extracting the sentiment topic and other features.

Choosing to work with the Arabic language is due to several factors. First, the complexity of the language, in both its morphology and its structure, has created many challenges. As a result, very few tools are currently available for sentiment analysis and opinion mining [6]. On the other hand, Arabic sentiment analysis is of growing importance because of its already large audience. Second, the Arabic language is both challenging and interesting because of its history, the strategic importance of its people and the region they occupy, and its cultural and literary heritage.

There are two main approaches to sentiment classification: machine learning (ML) and semantic orientation (SO). The ML approach is typically supervised: a set of data labeled with its class, such as “positive” or “negative”, is represented by feature vectors. These vectors are then used as training data by a classifier built with one of the supervised categorization algorithms, which infers that a combination of specific features yields a specific class [17]. Examples of categorization algorithms are Support Vector Machines (SVM), the Naïve Bayes classifier, Maximum Entropy, etc.

The SO approach, on the other hand, is unsupervised. A sentiment lexicon is created in which each word is given a number that expresses its semantic intensity and indicates its class. This lexicon is then used to extract all sentiment words from a sentence and sum their polarities. The sum determines whether the sentence has an overall positive or negative sentiment, as well as its intensity, i.e., whether that sentiment is strong or weak [17]. The SO approach is domain-independent, since one lexicon is built for all domains.

We chose the ML approach for sentiment classification because we do not have a lexicon of Arabic sentiment words. This approach is based on selecting a set of features, building feature vectors from them, and training a classifier.

The remainder of the paper describes in more detail our work on analyzing and extracting sentiment from Arabic tweets. Section II summarizes related work in this area, while Section III proposes the system architecture and discusses the implementation details. Section IV describes the experiments conducted and their results. Finally, Section V discusses the challenges, conclusions, and future work.

II. RELATED WORK

Sentiment analysis processes differ according to three factors: the type of classes to predict (positive or negative, subjective or objective), the level of classification (sentence, phrase, or document), and the technique used (ML or SO).

The author in [15] determines the class of a sentence using the average semantic orientation of its phrases. A similarity measure, Pointwise Mutual Information (PMI), is used to determine the semantic orientation of each phrase. It compares the phrase's similarity to a set of 7 ideal positive words with its similarity to a set of 7 ideal negative words. Also, the authors in [6] determine the class of
the sentence using a list that stores the semantic orientation of some Arabic word roots, which are extracted using a stemmer program. In the classification process, the root of each word is extracted using an Arabic stemmer and then checked against the stored dictionary. If the root is present, its sentiment is extracted as positive, negative, or neutral. Otherwise, the system asks the user to identify the polarity of the word it has not learned, and adds its root to the list of learned roots.

On the other hand, the authors in [2] determine the class of opinions in Web forums in two languages, English and Arabic. They use an ML technique, SVM, with a combination of syntactic and stylistic features. The syntactic features used for Arabic were N-gram frequencies, word-root frequencies, and punctuation-mark occurrences; POS N-grams were employed only for English, not for Arabic. The stylistic features included total words, total characters, special-character frequencies, word-length distributions, the character length of forum messages, etc. [6]. A high accuracy (90%) was obtained by combining syntactic and stylistic features.

Also, the authors in [16] built an opinion corpus for Arabic and applied two different ML techniques, SVM and Naïve Bayes (NB). They used various N-gram models (unigrams, bigrams, and trigrams) with term frequency as the weighting scheme. Comparing the two learning algorithms shows that SVM slightly outperforms NB: the best SVM accuracy exceeds the best NB accuracy by 3.43%.

Sentiment analysis of Twitter data has recently attracted the interest of several researchers as Twitter has become more popular. Most of the research in this field has used the ML approach to classify the sentiment of English tweets. Very little work has classified the sentiment of tweets in other languages such as Arabic. For example, the authors in [19] used a supervised K-Nearest Neighbor (KNN)-like classifier to classify English tweets, with hashtags and smileys as features. The authors in [20], on the other hand, used SVM classifiers to classify the sentiment of tweets in a two-step approach with abstract features. The training data for their system was gathered from the output of three existing Twitter sentiment classification websites.

III. TOOLS AND METHODS

After reviewing most of the work done on sentence-level sentiment analysis for Arabic, we wanted to propose an approach that differs from and improves upon those approaches. In our approach, the preprocessing of the tweets differs from the preprocessing done in earlier Arabic sentiment analysis work, because it uses a different stop-word list built specifically for the Egyptian dialect. This approach also uses different machine learning classifiers and feature sets. The classifiers used are Naïve Bayes (NB) and Support Vector Machines (SVM), and the features used are unigrams and bigrams. Classification and feature extraction are performed in two separate components. This lets us easily try different combinations of classifiers and features until we reach the ones yielding the highest accuracy.

Figure 1 summarizes the ML process for sentence-level sentiment analysis of Arabic tweets from the social network website Twitter. The process starts by retrieving tweets from Twitter. We then go through each tweet and label it as positive or negative. After that, the features of each tweet are extracted and represented as a feature vector, and these feature vectors are used in the training phase of the classifier. We used the Weka suite for the classification process.

Figure 1. The ML process of the sentence’s sentiment analysis

A. Getting Data from Twitter (Arabic Tweets)

Although Arabic is one of the top 10 languages most used on the Internet [13], it is considered a content-poor language on the web, unlike English [3], with very few web pages that specialize in Arabic reviews. We searched for a source that is used to communicate real opinions and in which those opinions are written in Arabic. For this reason, we used Twitter’s APIs to retrieve the required tweets, as Twitter provides a search API that lets you search for tweets in a given language [12]. By setting the language to Arabic (lang=ar), we were able to retrieve Arabic tweets.

It was also important to obtain a large set of Arabic sentences so that the classifier could be trained and then classify any newly supplied sentence, and Twitter was one of the main sources of large amounts of data. We collected more than 4000 tweets, from which we extracted 1000 tweets: 500 positive and 500 negative. We chose tweets that are subjective, hold only one opinion, are not sarcastic, and come from different topics. We then used these tweets as our corpus to train the classifier.

B. Tweets Cleaning and Annotation

The next step is determining the class of each tweet by annotating it as positive or negative. Two raters were used to determine the sentiment of the tweets. They had a high degree of agreement in their classification of the tweets, and for the tweets whose sentiment they disagreed about, a third rater was used to determine the final
sentiment. Figure 2 shows samples of the annotated tweets. After annotating the tweets, we converted them into a format that the classifier can process efficiently. This process involved removing the user names, pictures, hashtags, URLs, and all non-Arabic words.

Positive: إنت راجل محترم ...... انا صوتي لن يخرج من دائرة تضمك مع العظيم أبو الفتوح ياريت تتحدوا علي رئيس و نائب ..... جبهة لا تقهر
“You are a respectable person… I will definitely vote for you and Abu El Fotoh. I hope you can challenge for president and vice president… an unbeatable front.”

Negative: حرام عليك اللي عملته فينا وفي نفسك والله مصر ام الدنيا حرام عليك قوي
“Shame on you for what you have done to us and to yourself. By God, Egypt is the mother of the world. Shame on you, really.”

Figure 2. Samples of positive and negative tweets

C. Feature Extraction and Feature Vector

The feature vectors given to the classifier consist of term frequencies, as we are using statistical machine learning [7]. The process starts by extracting all the unigrams and bigrams in the corpus whose frequency exceeds a certain threshold. For example, all unigrams and bigrams with frequencies greater than 5 were extracted as our candidate features. We chose to work with unigrams and bigrams because our work is on word/phrase-level sentiment analysis [14], and this can easily be extended to trigrams. Then, for each tweet, we count the frequency of each candidate feature found in it. Thus, the following feature vector was constructed for each tweet using term frequency:

({word1: frequency1, word2: frequency2, …}, “polarity”)

For simplicity, in this experiment we ignored some factors that should eventually be considered. One of these factors is negation in phrases [10]; Arabic has around 20 negation words [3]. In simple terms, the negation mechanism inverts the polarity of a phrase when it is preceded by one of the negation words. Negation can be local (e.g., “not good”). It can also involve longer-distance dependencies, such as the negation of the proposition (e.g., “does not look very good”) or the negation of the subject [10]. In general, sentiment analysis seems to require more understanding than the usual topic-based classification.

D. Weka Suite Software

Weka version 3.6.4 was used for the classification process. Weka is written in Java and provides several ML algorithms, such as SVM and NB, as well as feature selection methods such as information gain (IG). It also provides a number of test options, such as cross-validation and percentage split. It can be run either directly, by loading the dataset into the program, or from the command line (when the dataset is large).

IV. EXPERIMENTATION AND EVALUATION

A. Results

The two classifiers, SVM and NB, were first trained using the frequencies of unigrams only. They were then trained using a combination of unigrams and bigrams, in an attempt to capture negation and sentiment-switching phrases. The results for each classifier using 10-fold cross-validation were as follows:

TABLE I. SVM RESULTS

Features             Accuracy   Precision   Recall   F-Measure
Unigrams             0.721      0.721       0.721    0.721
Unigrams + Bigrams   0.721      0.721       0.721    0.721

a. SVM results before removing stop words

TABLE II. NB RESULTS

Features             Accuracy   Precision   Recall   F-Measure
Unigrams             0.654      0.654       0.654    0.654
Unigrams + Bigrams   0.654      0.654       0.654    0.654

b. NB results before removing stop words

TABLE III. SVM RESULTS

Features             Accuracy   Precision   Recall   F-Measure
Unigrams             0.726      0.728       0.726    0.725
Unigrams + Bigrams   0.726      0.728       0.726    0.725

c. SVM results after removing stop words

TABLE IV. NB RESULTS AFTER REMOVING STOP WORDS

Features             Accuracy   Precision   Recall   F-Measure
Unigrams             0.652      0.662       0.652    0.646
Unigrams + Bigrams   0.652      0.672       0.652    0.646

d. NB results after removing stop words
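The pipeline behind Tables I–IV can be approximated outside Weka with the short Python sketch below, which follows the steps of Section III.C. Candidate unigrams and bigrams are collected over the whole corpus and kept if their total frequency exceeds a threshold. Each tweet then becomes a ({feature: term frequency}, polarity) pair, and a classifier is scored with k-fold cross-validation.

This is a minimal illustration under stated assumptions, not the authors' implementation:
- Tweets are assumed to be already cleaned and whitespace-tokenized.
- Every function name is hypothetical.
- The classifier is a hand-written multinomial Naïve Bayes with add-one smoothing, standing in for Weka's NB; the paper does not say which NB variant it used.
- SVM is omitted.
- The folds are simple interleaved splits rather than stratified ones.

The default min_count=5 follows the "frequencies more than 5" of Section III.C. The discussion in Section IV mentions a threshold of 4, so the value is left as a parameter.

```python
import math
from collections import Counter, defaultdict

# Minimal sketch, not the authors' code: all names are hypothetical, and
# tweets are assumed to be already cleaned and whitespace-tokenized.


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined with single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract(text, orders=(1, 2)):
    """Unigrams followed by bigrams (by default) of one tweet."""
    tokens = text.split()
    return [g for n in orders for g in ngrams(tokens, n)]


def candidate_features(corpus, min_count=5):
    """N-grams whose total frequency over the corpus exceeds min_count."""
    counts = Counter(g for text, _ in corpus for g in extract(text))
    return {g for g, c in counts.items() if c > min_count}


def feature_vector(text, features):
    """Term frequency of each candidate feature found in one tweet."""
    return dict(Counter(g for g in extract(text) if g in features))


def build_dataset(corpus, min_count=5):
    """Map (text, polarity) pairs to ({feature: tf}, polarity) pairs."""
    features = candidate_features(corpus, min_count)
    return [(feature_vector(text, features), label) for text, label in corpus]


def train_nb(data):
    """Count statistics for a multinomial Naive Bayes (stand-in for Weka's NB)."""
    class_docs = Counter(label for _, label in data)
    term_counts = defaultdict(Counter)
    for vec, label in data:
        term_counts[label].update(vec)  # adds each feature's tf to its class
    vocab = {f for vec, _ in data for f in vec}
    return class_docs, term_counts, vocab


def predict_nb(model, vec):
    """Label with the highest log-posterior, using add-one smoothing."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        denom = sum(term_counts[label].values()) + len(vocab)
        score = math.log(n_docs / total_docs)  # log prior
        for f, tf in vec.items():
            if f in vocab:  # features never seen in training are skipped
                score += tf * math.log((term_counts[label][f] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best


def cross_val_accuracy(data, k=10):
    """Plain k-fold accuracy; fold i holds every k-th example from index i."""
    correct = 0
    for i in range(k):
        test = data[i::k]
        train = [d for j, d in enumerate(data) if j % k != i]
        model = train_nb(train)
        correct += sum(predict_nb(model, vec) == label for vec, label in test)
    return correct / len(data)
```

Suppose a hypothetical list corpus holds the 1000 annotated (tweet, polarity) pairs. Then cross_val_accuracy(build_dataset(corpus), k=10) would give a 10-fold accuracy for this stand-in classifier. Because the candidate features are chosen from the whole corpus before splitting, as Section III.C describes, selecting them inside each training fold would be the stricter choice.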
Tables I and II show the results of the SVM and NB classifiers, respectively, using the term-frequency scheme before removing the stop words. Tables III and IV show the results of the same techniques after removing the stop words. Comparing the two pairs of tables shows only very small changes. SVM accuracy rose slightly, from 0.721 to 0.726. NB gained slightly in precision but lost slightly in accuracy (from 0.654 to 0.652) and in F-measure (from 0.654 to 0.646).

We might explain this behavior by the fact that no stop-word lists are available for the Egyptian dialect. We therefore developed this list from scratch, including all the words that we believe are stop words. The absence of a large improvement suggests either that we removed some important words that should have been kept, or that some other stop words still need to be removed. We are therefore developing a reliable list of stop words that can increase performance.

B. Discussion

Comparing the results of SVM and NB in both cases, it is clear that SVM performs better than NB. Its best accuracy exceeds that of NB by about 7 percentage points: 0.721 vs. 0.654 before removing stop words, and 0.726 vs. 0.652 after. This behavior has been observed in more than one study, as SVM usually produces more accurate results than NB. This may be because NB is a purely probabilistic model, whereas SVM is better suited to inputs with high dimensionality.

Regarding the n-gram model, the bigram model clearly did not improve on the results of the unigram model. This is because only 12 bigrams in the corpus exceeded the frequency threshold of 4, so they had little effect on the feature vector of each tweet. It should be noted that we used only the 1000 cleaned and annotated tweets to build the unigram and bigram models, as cleaning is currently done manually.

On the other hand, SVM has been shown to be highly effective in sentiment analysis, outperforming NB. Because of its principal advantages, SVM has been applied successfully in several sentiment analysis tasks. These advantages include: “First, they are robust in high dimensional spaces; second, any feature is relevant; third, they are robust when there is a sparse set of samples; and, finally, most text categorization problems are linearly separable” [16]. More generally, results reported for SVM in sentiment analysis show that it tends to outperform other machine learning techniques.

V. CONCLUSION AND FUTURE WORK

Research in sentiment analysis for Arabic has been very limited compared to other languages such as English, whether at the sentence level or at the document level. In this study, we investigated the ML approach for sentence-level sentiment analysis of Arabic using 1000 tweets from Twitter. The results obtained are very promising as a first step.

In our approach, we applied the feature vectors to the NB and SVM classifiers to compare their results and choose the classifier with the higher accuracy.

One problem with the training data is that, through retweeting, some tweets occur many times without any change. This gives a misleading boost to the weight of the terms in those tweets; some tweets are retweeted more than 7 times in the corpus. Opinion spamming, or untruthful opinions, could also affect classification accuracy, since the classifier would then be built on misleading tweets. As for the test tweets, a tweet may contain dual opinions, which makes its sentiment ambiguous to some extent.

For future work, we will continue this line of research by improving our corpus, for example by enlarging it or applying fine-grained annotation. We will also focus on adding some stylistic features and will consider adding some semantic features, creating a hybrid approach that combines the ML and SO approaches. This will be achieved by building a more comprehensive list of positive and negative sentiment words for the Egyptian dialect, since no such list exists.

Negations and valence shifters will also be considered as features in the ML approach, because their presence in a sentence can change the sentiment of the whole tweet. For example, “حلو” (nice) implies positive sentiment, but when it is preceded by a negation word, as in “مش حلو” (not good), it implies negative sentiment. Finally, neutral tweets have to be considered, as they cannot be ignored in real-world applications.

ACKNOWLEDGMENT

The authors would like to thank ITIDA for sponsoring this project, entitled “Semantic Analysis and Opinion Mining for Arabic Web”. They also thank the team of the Egyptian industrial company LINK Development for their help in developing a tool for collecting and annotating the tweets.

REFERENCES

[1] K. Yessenov and S. Misailovic, “Sentiment Analysis of Movie Review Comments”, graduation project, May 17, 2009.
[2] A. Abbasi, H. Chen, and A. Salem, “Sentiment analysis in multiple languages: Feature selection for opinion classification in web forums”, ACM Transactions on Information Systems (TOIS), Vol. 26, Issue 3, June 2008.
[3] M. Elhawary and M. Elfeky, “Mining Arabic Business Reviews”, Google Inc., Mountain View, CA, USA, 2010 IEEE International Conference on Data Mining Workshops.
[4] http://en.wikipedia.org/wiki/Sentiment_analysis
[5] T. Helmy and A. Daud, “Intelligent Agent for Information Extraction from Arabic Text without Machine Translation”, Information and Computer Science Department, College of Computer Science and Engineering, King Fahd University of Petroleum and Minerals.
[6] N. Farra, E. Challita, R. Abou Assi, and H. Hajj, “Sentence-level and Document-level Sentiment Mining for Arabic Texts”, Department of Electrical and Computer Engineering, American University of Beirut, Beirut, Lebanon.
[7] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment Classification using Machine Learning Techniques”, Department of Computer Science, Cornell University; IBM Almaden Research Center.
[8] Y. Lu, M. Castellanos, U. Dayal, and C. Zhai, “Automatic
Construction of a Context-Aware Sentiment Lexicon: An
Optimization Approach”, UIUC Computer Science, 201 N. Goodwin
Avenue, Intelligent Information Management Lab, HP Laboratories.
[9] A. Farghaly and K. Shaalan, “Arabic Natural Language Processing: Challenges and Solutions”, Monterey Institute of International Studies; The British University in Dubai.
[10] T. Wilson, J. Wiebe, and P. Hoffmann, “Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis”, Intelligent Systems Program, Department of Computer Science, University of Pittsburgh.
[11] Zhang Wei, “Opinion Mining and Sentiment Analysis: A Survey”, Department of Computer Science, School of Computing, National University of Singapore.
[12] Twitter search API, http://search.twitter.com/search.atom?lang=ar&rpp=100&page={0}&q={1}
[13] http://www.internetworldstats.com
[14] L. Khreisat, “A machine learning approach for Arabic text classification using N-gram frequency statistics”, Journal of Informetrics, Vol. 3, Issue 1, January 2009, pp. 72-77.
[15] P. Turney, “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews”, in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 417-424, July 2002.
[16] M. Rushdi-Saleh, M. T. Martín-Valdivia, L. A. Ureña-López, and J. M. Perea-Ortega, “OCA: Opinion corpus for Arabic”, Journal of the American Society for Information Science and Technology, 62: 2045–2054, 2011. doi: 10.1002/asi.21598.
[17] S. Morsy, “Recognizing Contextual Valence Shifters in Document-
Level Sentiment Classification”. Department of Computer Science and
Engineering, The American University in Cairo (AUC). 2011
[18] http://www.cs.waikato.ac.nz/ml/weka
[19] D. Davidov, O. Tsur, and A. Rappoport, “Enhanced Sentiment Learning Using Twitter Hashtags and Smileys”, COLING 2010.
[20] L. Barbosa and J. Feng, “Robust Sentiment Detection on Twitter from
Biased and Noisy Data”. Coling 2010.
