International Journal of Computer Science Trends and Technology (IJCST) – Volume 9 Issue 4, Jul-Aug 2021
RESEARCH ARTICLE OPEN ACCESS
A Review to Classify Sentiments Using Some Machine Learning Techniques
G. Bala Krishna Priya [1], Dr. Jabeen Sultana [2], Prof. M. Usha Rani [3]
[1] Department of Computer Science, SPMVV, Tirupati, Andhra Pradesh - India
[2] Department of Computer Science, Majmaah University, Kingdom of Saudi Arabia
[3] Department of Computer Science, SPMVV, Tirupati, Andhra Pradesh - India
ABSTRACT
In this age of information, the tremendous increase in network connections among devices has led to the generation of huge
volumes of data. Popular social networking sites are now heavily used by people of all age groups across the globe.
Websites such as Twitter, Instagram and Facebook are in great demand among individuals who wish to express their
sentiments towards a particular topic or subject. Analysing these sentiments helps decision makers, such as businesses, to
make decisions in line with public demand. Sentiment analysis has many real-life applications in sectors such as
marketing, stock exchange, movie reviews and education, to name a few. This paper reviews the different methods used for
performing sentiment analysis on different data using machine learning techniques. The reviewed studies suggest that
supervised learning, using classifiers such as Naive Bayes (NB), Support Vector Machine (SVM), Multi-Layer Perceptron (MLP)
and Long Short-Term Memory (LSTM), classifies sentiments better than unsupervised learning. A few mathematical
calculations and parameters of the existing models can be adjusted in order to attain the highest possible
classification accuracy.
Keywords:- Classification, Sentiment, Supervised Learning, Unsupervised Learning, NB, SVM, MLP and LSTM.
I. INTRODUCTION
Sentiment analysis has gained huge popularity due to the web and has become a hot area among researchers. Sentiment
is another name for feeling, emotion, review or feedback. Analysing emotions or sentiments plays a prominent role in
the current technological age, as individuals depend on the reviews of others before deciding to purchase a product
or reaching some other decision. Much work has been done to analyse sentiments in various texts on different
platforms, but these days Twitter, a social networking site, is grabbing the attention of the public, researchers and
businesses. Analysing the sentiments obtained from Twitter helps businesses and researchers to make important
decisions based on the sentiments of people from all walks of society. Micro-blog data is easy to access because a
large volume of it is produced on Twitter in the form of tweets. Anyone with a Twitter account and access to a Twitter
developer account can extract millions of tweets in very little time. The challenge then lies in cleaning the tweets
and classifying them into the classes positive, negative and neutral [1]. As more and more communication takes place
over the internet, micro-blogging has drawn the attention of the online community, and because a rich source of
information is gathered in Twitter in the form of tweets, it is preferred over other social networking sites for
analysing sentiments [2] [3].

Natural Language Processing techniques are used to clean the text of tweets by removing noise, URLs, extra spaces and
unnecessary attributes; once the text is cleaned, classification techniques are applied to classify it. Tweets express
different kinds of sentiments, mainly customer feedback and reviews related to the marketing sector, educational
sector, finance sector, weather prediction, movie reviews and some other areas. Sentiments are obtained from the
tweets to make highly important decisions. The examination of these opinions and sentiments has spread across several
sectors, namely marketing, consumer information, applications, books and websites, and sentiment analysis has become
an essential field in decision making [4]. Classifying tweets is the process of dividing text into two or more classes
based on the subject of the tweets. Each word in a tweet is assigned a polarity value and, based on the polarity score
of the whole tweet, the tweet is classified as positive, negative or neutral. Tweets can be classified manually or
automatically. Manual classification consumes more time and requires each tweet to be assigned accurately to the
proper class, whereas automatic classification requires less time. Text classification has many applications, such as
document indexing, document organization and text ordering [5]. Classification using machine learning techniques
involves arranging the tweets into a data set and then splitting the data set into two parts: a training set
comprising 70% of the tweets and a test set comprising the remaining 30%. Initially, the training set is fed to a
machine learning classifier to obtain a model, and the model is later tested for its classification accuracy.
Specificity, sensitivity and F-measure are also calculated besides accuracy to check the efficiency of the model.
Various machine learning techniques are used to classify tweets, such as MLP, SVM, NB, Logistic Regression, LSTM, KNN
and N-gram sentiment analysis. SVM, NB and LSTM have yielded good classification results. The NB classifier is
extensively used to classify tweets in regional languages and is the basic classifier for document classification for
content present in
ISSN: 2347-8578 www.ijcstjournal.org Page 18
Indian languages [6]. When classes are undefined in the text data, classification relies on the available lexical
content [7].

The foremost objective of this review is to compare the text classification results for different sentiments present
in tweets pertaining to movie reviews, customer feedback, educational sentiments and so on, obtained using machine
learning techniques.

II. LITERATURE REVIEW

A. Sentiment Analysis of Regional Languages: Tamil, Kannada, Bangla, Malayalam, Telugu, Hindi, Turkish and Saudi Dialect
Sentiment analysis was performed on Tamil text taken from a Tamil opinion corpus using supervised machine learning
approaches, obtaining 79% accuracy [8]. Sentiment analysis on Tamil tweets and the Saudi dialect was carried out using
deep learning algorithms such as LSTM and BiLSTM. The results were reported in terms of accuracy and F-score, with 80%
of the data used for training and 20% for testing [9,12,13]. Sentiment analysis was also performed using natural
language processing together with learning approaches such as RNN and Naive Bayes. Here, tweets were collected from
Twitter using the API keys of a Twitter account, and the results were reported in terms of accuracy, precision,
F-score and recall for the polarities negative, positive and neutral [10].

Supervised classification was performed to analyse tweets in the Turkish language and the Saudi dialect using SVM and
Random Forest algorithms, and the results were reported in terms of accuracy [11, 13]. Sentiment analysis for tweets
in languages other than English was reviewed, and the various challenges involved were identified [14]. Sentiment
analysis was performed for the regional language Kannada based on domain knowledge and a decision tree classifier
[15]. An experimental analysis of the word embedding tool Word2vec was carried out for the Bangla language [16].

Malayalam tweets were retrieved using a rule-based approach [17] and their sentiments were analysed accordingly.
Negative sentiments need to be analysed carefully, as negative tweets may comprise words that do not actually express
negativity. Negative sentiments can be easily detected and analysed in movie reviews, since the comments clearly state
that the viewers are not satisfied. Further rules are applied to perform sentiment analysis with low error rates.

In one study [18], a corpus was first extracted from Malayalam novels and stories, and the sentences were typed
manually in order to avoid minor errors. At most one hundred sentences are grouped at the start of sentiment analysis,
and adjectives and adverbs are detected manually. Words used inside a sentence are manually tagged as adverbs and
adjectives, which are more significant in identifying sentiments. Sometimes negative and positive words resemble each
other, and such words are difficult to classify. As the Malayalam language contains words with similar meanings, it is
quite difficult to classify customer-related tweets as positive or negative in order to perform sentiment analysis
[19]. In addition, it is very difficult to build a parser for Malayalam tweets, as the words of this language depend
on each other.

Sentences are tagged to obtain better classification results when analysing the sentiments of Telugu tweets. In this
work, more importance is given to improving the dependencies in the parser. A framework for sentiment analysis of
Telugu tweets [20] enables users to analyse Telugu text and determine its sentiment using user-defined deep learning
algorithms. As more and more tweets in different languages relate to the public sector, health, education and so on,
the demand for analysing sentiments before making any important decision is at its peak. These corresponding facts are
used across distinct dialects to transfer data from high-level to low-level languages.

In Hindi, related words exist with related senses and may have distinct meanings in particular tweets. It is therefore
highly challenging to capture a whole tweet in a lexicon, and when a model is trained it is somewhat problematic to
cover all the alternative meanings. Sentiment analysis of Hindi tweets is therefore quite challenging: Hindi is
written in Devanagari, and sufficient tools and corpora are not available even though it is a widely used language
across the world [21]. Students who prefer to study in the Hindi medium are very rare, Hindi text is difficult to
parse because of inefficient parsers and corpora, and Hindi tweets are tagged manually. Each language encounters some
difficulties in classifying semantics and syntax.

B. Sentiment Analysis of Different Text: COVID-19 Data, Educational Data, Breast Cancer Data and Marketing Data
Public sentiments towards COVID-19 in India were extracted from Twitter, classified and analysed. This research work
collected tweets related to the impact of COVID-19 in India and the sentiments of people towards COVID-19 using the
Twitter API. Pre-processing strategies were applied to the COVID tweets in order to organize them by subject into the
classes positive, negative and neutral. It was found that the conventional deep learning method MLP outperforms in
classifying the tweets, with a high accuracy of 97%, a low root mean square error of 0.12, and precision and recall of
0.97. The outcomes also show a good F-score of 0.95 and an ROC curve area of 0.99 [22].

Presently, researchers are showing interest in predicting sentiments in the learning track of students by means of
various data mining, deep learning and
NLP techniques. Data mining techniques have been applied to natural language processing with some success [23].
Sentiment classification of online learning posts and comments using a hybrid supervised technique was proposed, and
it was found that the chi-statistics method dominates the other feature selection methods. The sentiments of students
towards the learning environment and the difficulties they faced were analysed.

In this research work, the authors describe the use of various data mining methods for grouping breast cancer related
data and classifying it as benign or malignant. The data set comprises 569 samples with more than 30 attributes.
Initially, the cancer data is pre-processed, and a number of machine learning classifiers such as NB, MLP, DT and
multi-classifiers are used to classify it. Training is carried out to build the model, and the test set is then
applied to obtain the results. The model that yields the best classification results is identified by comparing its
results with those attained by the other classifiers. Decision trees failed to yield good rules, whereas logistic
regression methods yielded good results [24]. A related effort was made by the authors to diagnose breast cancer using
different machine learning classifiers, and it was observed that Support Vector Machine and multi-classifiers
performed best [25].

Association rule mining on market data was carried out in order to learn from customer feedback, such as what
customers prefer and which products they associate at the time of purchase. Association rule mining (ARM) yielded the
best possible results in associating products with one another [26]. The authors focused on improving the learning
process of students by analysing student feedback after course completion. The sentiments of students during the
course were collected for a limited time span, and the students' difficulties were analysed through tweets. At the
initial stage, data was roughly extracted, and the tweets were later pre-processed. The processed tweets were
classified into positive, negative and neutral classes. Machine learning and deep learning techniques were used to
train on the tweets and build a model, which was later tested for classification accuracy, error rate, specificity and
sensitivity. The results show that the Multi-Layer Perceptron performed best with 96% accuracy, compared with 93% for
Convolutional Neural Networks [27]. Educational sentiments were also classified and analysed with other AI methods,
and it was found that the Multi-Layer Perceptron gave the best outcomes [28].

Machine learning classifiers help to improve classification results so that educational tweets are classified
proficiently. These days, students post the sentiments they experience during the course period, such as difficulties
faced in learning or grasping the material, whether the pace of the tutor did not match that of the student, the rate
of absence and other factors. The authors of this research work extracted tweets of Indian students in order to reveal
the valuable sentiments of students and to advise the higher authorities of colleges or the educational sector in
making educational reforms. Machine learning and deep learning classifiers, namely Multi-Layer Perceptron, Decision
Trees, Naive Bayes and SVM, were used to classify the tweets. The Multi-Layer Perceptron yielded 90% classification
accuracy, followed by SVM with 80% accuracy. In addition, top-quality tweets were analysed and important suggestions
were recommended accordingly [29].

Sentiment analysis on Twitter data was carried out to find the observations and sentiments of the public on education
in India. The tweets were processed and classified into three classes, namely positive, negative and neutral.
Classifiers such as SVM, Naïve Bayes, Decision Tree, KNN and MLP were employed. A model was obtained by training on the
dataset and was then tested. The results show that SVM stands out by yielding 91% accuracy, followed by Decision Tree
and KNN with 90% accuracy, and Naïve Bayes with 68% [30]. An effective method for classifying undergraduates'
performance was proposed using a deep learning classifier on educational data collected from Kaggle, and it was found
that MLP performs best [31].

III. CONCLUSIONS
Opinions, feelings, thoughts or sentiments have always been an important and integral part of our society, since our
sentiments express what we are by impelling our actions. Public sentiment is given prime importance when making
decisions about improving a product, planning a new product, or changing the rules or making reforms in educational
organizations. Large volumes of data are obtained from the most popular social media site, Twitter, as it offers a
wide range of sentiments on various topics shared by its users. Users express their sentiments on a particular product
or movie, and these sentiments help in analysing views and ultimately lead to decision making. In this work, text
mining and data mining techniques used to pre-process and mine educational tweets, breast cancer data and customer
reviews are reviewed, along with the analysis of sentiments in different kinds of text for reforming education, the
health sector and marketing. We conclude that tweets in English can be analysed for their sentiments with less
difficulty than tweets in Indian regional languages. Also, classifying and analysing tweets depends on the kind of
tweets chosen and the machine learning technique selected. Naïve Bayes, SVM, Passive Aggressive Classifier, MLP and
RNN are the classifiers most often used in text classification for better results. In future, we plan to work on audio
and image data classification.
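To make the workflow outlined in the Introduction concrete (cleaning tweets, a 70/30 train/test split, training a
classifier, and reporting accuracy, sensitivity, specificity and F-measure), the following is a minimal Python sketch.
It is an illustration under stated assumptions, not the implementation used in any of the reviewed studies: the
function names clean_tweet, split_70_30 and evaluate, the small NaiveBayes class, and the toy tweets are all
hypothetical, the cleaning rule keeps ASCII letters only (so it suits English tweets, not the regional languages
discussed above), and sensitivity and specificity are computed one-vs-rest for a single chosen class.

```python
import math
import random
import re
from collections import Counter, defaultdict


def clean_tweet(text):
    """Lower-case a tweet, drop URLs and @mentions, keep only letters, and tokenize."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text.lower())
    text = re.sub(r"@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)  # assumption: ASCII-only; removes '#', digits, punctuation
    return text.split()


def split_70_30(samples, seed=0):
    """Shuffle labelled samples and return (train, test) in a 70/30 ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10
    return shuffled[:cut], shuffled[cut:]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.class_counts = Counter(label for _, label in samples)
        self.word_counts = defaultdict(Counter)
        for text, label in samples:
            self.word_counts[label].update(clean_tweet(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        words = [w for w in clean_tweet(text) if w in self.vocab]  # skip unseen words
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def evaluate(y_true, y_pred, positive="positive"):
    """Overall accuracy, plus one-vs-rest metrics for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(a, b):
        return a / b if b else 0.0

    sensitivity = ratio(tp, tp + fn)  # true-positive rate (recall)
    specificity = ratio(tn, tn + fp)  # true-negative rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(sum(t == p for t, p in pairs), len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Hypothetical toy tweets, for illustration only.
    tweets = [
        ("Loved the course, great tutor! http://t.co/x", "positive"),
        ("Great lectures and helpful notes", "positive"),
        ("Really good teaching, loved it", "positive"),
        ("Helpful tutor and good material", "positive"),
        ("Boring lectures, bad pace @college", "negative"),
        ("Terrible course, too fast", "negative"),
        ("Bad notes and boring material", "negative"),
        ("Too fast, terrible teaching", "negative"),
        ("Lecture is at 10 am today", "neutral"),
        ("Course material posted today", "neutral"),
    ]
    train, test = split_70_30(tweets)
    model = NaiveBayes().fit(train)
    predictions = [model.predict(text) for text, _ in test]
    print(evaluate([label for _, label in test], predictions))
```

With only ten toy tweets the printed figures carry no meaning; the sketch is intended only to show how the split, the
trained model and the four measures fit together. In practice a library classifier (for example an SVM or MLP) would
take the place of the hand-written NaiveBayes class.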
REFERENCES
[1] Karanasou M, Ampla A, Doulkeridis C & Halkidi M, “Scalable and Real-time Sentiment Analysis of Twitter Data”, 16th IEEE International Conference on Data Mining Workshops (ICDMW), (2016), pp.944-951.
[2] Bilgin M & Şentürk İF, “Sentiment analysis on Twitter data with semi-supervised Doc2Vec”, IEEE International Conference on Computer Science and Engineering, (2017), pp.661-666.
[3] Twitter.Com, Twitter Developer, https://dev.twitter.com/streaming/overview, Founded on 21st March, 2006.
[4] Cheng, Li-Chen, and Song-Lin Tsai. "Deep learning for automated sentiment analysis of social media." Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 2019.
[5] Sebastiani, F., Machine learning in automated text categorization, ACM Computing Surveys, 34(2002), 1-47.
[6] Murthy K.N., Automatic Categorization of Telugu News Articles, Department of Computer and Information Sciences, University of Hyderabad, Hyderabad, 2003, DOI= 202.41.85.68.
[7] Youngjoong Ko, Jungyun Seo, Automatic Text Categorization by Unsupervised Learning, Proceedings of the 18th conference on Computational linguistics, 1(2000), 453-459.
[8] Thavareesan S., and Sinnathamby Mahesan, “Sentiment Analysis in Tamil Texts: A Study on Machine Learning Techniques and Feature Representation” 2019 14th International Conference on Industrial and Information Systems (ICIIS). IEEE, 2019.
[9] Anbukkarasi, S., and S. Varadhaganapathy. "Analyzing Sentiment in Tamil Tweets using Deep Neural Network." 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC). IEEE, 2020.
[10] Goel, Vikas, Amit Kr Gupta, and Narendra Kumar. "Sentiment Analysis of Multilingual Twitter Data using Natural Language Processing." 2018 8th International Conference on Communication Systems and Network Technologies (CSNT). IEEE, 2018.
[11] Demirci, Gözde Merve, Şeref Recep Keskin, and Gülüstan Doğan. "Sentiment Analysis in Turkish with Deep Learning." 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019.
[12] Monika, R., S. Deivalakshmi, and B. Janet. "Sentiment Analysis of US Airlines Tweets Using LSTM/RNN." 2019 IEEE 9th International Conference on Advanced Computing (IACC). IEEE, 2019.
[13] Alahmary, Rahma M., Hmood Z. Al-Dossari, and Ahmed Z. Emam. "Sentiment analysis of Saudi dialect using deep learning techniques." 2019 International Conference on Electronics, Information, and Communication (ICEIC). IEEE, 2019.
[14] Djatmiko, Fahim, Ridi Ferdiana, and Muhammad Faris. "A review of sentiment analysis for non-English language." 2019 International Conference of Artificial Intelligence and Information Technology (ICAIIT). IEEE, 2019.
[15] Rohini, V., Merin Thomas, and C. A. Latha. "Domain based sentiment analysis in regional Language-Kannada using machine learning algorithm." 2016 IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT). IEEE, 2016.
[16] Sumit, Sakhawat Hosain, et al. "Exploring word embedding for bangla sentiment analysis." 2018 International Conference on Bangla Speech and Language Processing (ICBSLP). IEEE, 2018.
[17] Nair DS, Jayan JP & Sherly E, “Sentiment Analysis of Malayalam film review using machine learning techniques”, IEEE International Conference on Advances in Computing, Communications and Informatics, (2015).
[18] Shankar R, Shilpa KM, Patil S & Swamy S, “A Survey on Sentimental Analysis in Different Indian Dialects”, International Journal of Advanced Research in Computer and Communication Engineering, Vol.5, No.4, (2016), pp.1072-1076.
[19] Nagaraju G, Mangathayaru N & Rani BP, “Dependency Parser for Telugu Language”, Proceedings of the Second ACM International Conference on Information and Communication Technology for Competitive Strategies, (2016), pp.138-139.
[20] Bala Krishna Priya, G., Usha Rani, M., (2020). A Framework for Sentiment Analysis of Telugu Tweets. In ‘International Journal of Engineering and Advanced Technology (IJEAT)’, ISSN: 2249-8958 (Online), Volume-9 Issue-6, August 2020, Page No. 523-525.
[21] Mishra D, Venugopalan M & Gupta D, “Context Specific Lexicon for Hindi Reviews”, Procedia Computer Science, (2016), pp.554-563.
[22] Jabeen Sultana, et. al. “Predicting Indian Sentiments of COVID-19 Using MLP and Adaboost”. Turkish Journal of Computer and Mathematics Education, pp.706-714, Vol. 12 No. 10 (2021).
[23] Sultana, J., Rani, M. U., & Farquad, M. A. H. (2020). An Extensive Survey on Some Deep-Learning Applications. In Emerging Research in Data Engineering Systems and Computer Communications (pp. 511-519). Springer, Singapore.
[24] Sultana, J., & Jilani, A. K. (2018). Predicting Breast Cancer using Logistic Regression and Multi-Class Classifiers. International Journal of Engineering & Technology, 7(4.20), 22-26.
[25] Sultana, J., Sadaf, K., Jilani, A. K., & Alabdan, R. (2019, December). Diagnosing Breast Cancer using Support Vector Machine and Multi-Classifiers. In 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE) (pp. 449-451). IEEE.
[26] Sultana, J., & Nagalaxmi, G. (2015). How Efficient is Apriori: A Comparative Analysis. International Journal of Current Engineering and Scientific Research, ISSN (PRINT):2393-8374, (ONLINE): 2394-0697, 2(8), 91-99.
[27] Sultana, J., Rani, M. U., & Farquad, M. A. H. (2019, November). Knowledge Discovery from Recommender Systems using Deep Learning. In 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT) (pp. 1074-1078). IEEE.
[28] Sultana, J., Sultana, N., Yadav, K., & AlFayez, F. (2018, April). Prediction of sentiment analysis on educational data based on deep learning approach. In 2018 21st Saudi Computer Society National Computer Conference (NCC) (pp. 1-5). IEEE.
[29] Sultana, J., Rani, M. U., & Farquad, M. A. H. (2019, December). Deep Learning Based Recommender System Using Sentiment Analysis to Reform Indian Education. In International Conference On Computational and Bio Engineering (pp. 143-150). Springer, Cham.
[30] Sultana, M. J., Rani, M. U., & Farquad, M. A. H. (2020). Sentiment Analysis based Recommender System for Reforming Indian Education using Multi-Classifiers.
[31] Sultana, M. J., Rani, M. U., & Farquad, M. A. H. (2018). An Efficient Deep Learning Method to Predict Student’s Performance.