Emotion Detection Analysis Documentation

This document provides an outline for a research paper on emotion detection from tweets using machine learning techniques. Section 1 introduces the topic and outlines the goal of developing an accurate sentiment classifier for tweets. Section 2 provides motivation for the research, defines the problem statement as classifying tweets into emotion categories, and introduces social media and Twitter as the domain. Section 3 reviews related work and limitations of prior approaches. Sections 4-8 describe the methodology, including data acquisition, machine learning techniques, algorithm design, and implementation details. Sections 9-10 discuss evaluation metrics and results analysis. Sections 11-12 provide the conclusion and discuss future work. The paper aims to classify tweets into emotion categories like sad, joy, anger using a Naive Bayes classifier.

Uploaded by

Kotha Sindhu


1. ABSTRACT
2. INTRODUCTION
2.1 Motivation
2.2 Problem Statement
2.3 Domain Introduction
3. REVIEW OF LITERATURE
3.1 Limitations of Prior Art
3.2 Related Work
3.3 Social Media
4. DATA CHARACTERISTICS
5. DATA ACQUISITION
6. METHODOLOGY
6.1 Machine Learning Techniques
6.2 The Classification Problem
6.3 Naïve Bayes Classifier
6.4 Algorithm
7. SYSTEM DESIGN
7.1 Fetching Tweets
7.2 Processing and Storing Tweets
7.3 Extraction of Emotion from Tweets
8. IMPLEMENTATION
8.1 A Bag of Words
8.2 Application of Sentiment Analysis
8.3 Bayesian Theorem Applied to Document
9. EVALUATION METRICS
10. RESULT AND ANALYSIS
11. CONCLUSION
12. FUTURE WORK
13. REFERENCES

1. ABSTRACT
This project addresses the problem of emotion analysis on Twitter, that is, classifying tweets
according to the emotion expressed in them: sad, angry, love, happy, guilt, etc. Twitter is an
online micro-blogging and social-networking platform which allows users to write short status
updates of at most 140 characters. It is a rapidly expanding service with over 350
million registered users, of which 250 million are active users and half log on to
Twitter on a daily basis, generating nearly 500 million tweets per day. Given this large amount of
usage, we hope to obtain a reflection of public sentiment by analysing the sentiments expressed
in the tweets. Analysing public sentiment is important for many applications, such as firms
trying to find out the market response to their products, predicting political elections and
predicting socioeconomic phenomena such as stock market movements. The aim of this project is to
develop a functional classifier for accurate and automatic sentiment classification of an unknown
tweet stream.

Keywords: Emotion detection, Naive Bayes, Python.

2. INTRODUCTION
2.1 MOTIVATION:

Emotions play an important role in social media, where they stimulate the cognitive
processes involved in strategy making. Emotions are another form of language, universally
spoken and understood. The identification and classification of emotions has been a research area
since Charles Darwin's time. Textual information can be collected from many sources, such as
books, newspapers, web pages and e-mail messages. Nowadays the Internet is the most popular
communication medium, and it is rich in emotion. Emotions can be extracted from textual input by
analyzing punctuation, emotional keywords, syntactic structure and semantic information with
the help of natural language processing techniques. A large amount of textual information is
available on the web, and it is interesting to extract emotions from it for different goals, such
as business ones. For example, for luxury goods, emotional aspects such as brand, uniqueness and
prestige matter more in purchasing decisions than rational aspects such as technical features,
functionality or price. In this case the customer is happy to buy a product even at a high price.
Emotional marketing aims to stimulate emotions in customers in order to tie them to a brand and so
increase the sales of a product or service. Nowadays it is not the product itself that is sold,
since each category offers a wide choice; the focus is instead on the relationship that the
consumer establishes with the brand and on the emotions that the product communicates.

Twitter is a large and rapidly growing micro-blogging and social networking website where people
express their opinions in a short and simple manner. It is common practice for merchants selling
products on the Web to ask their customers to review the products, and a growing number of
customer reviews of different products appear on Twitter. Electronic products are a common domain
for such reviews, and among electronic products mobile phones attract particularly keen customer
interest. The sheer number of reviews makes it difficult for a potential customer to read them all
in order to decide whether to buy a product.

An emotion is a particular feeling that characterizes a state of mind, such as joy, anger, love or
fear. Automatic emotion detection from text has attracted growing attention due to its potentially
useful applications. For example, psychologists can better assist their patients by analyzing
their session transcripts for subtle emotions; reliable emotion detection can help develop
powerful human-computer interaction devices; and deep emotional analysis of public data such as
tweets and blogs could reveal interesting insights into human nature and behavior. Emotion is
expressed as joy, sadness, anger, surprise, hate, fear and so on. According to W. Gerrod Parrott,
human emotions are classified through an emotion hierarchy into six classes at the primary level:
Love, Joy, Anger, Sadness, Fear and Surprise. The Naïve Bayes classifier is an efficient
classifier with good accuracy. This project aims to automatically detect emotions in tweets using
a Naïve Bayes classifier. Tweets are extracted and stored in a MySQL database, and the data is
made available in CSV format. Section 3.2 details the related work; Section 7 and Section 6.3
explain the modular design and the Naïve Bayes classifier, respectively.

2.2 PROBLEM STATEMENT:

Given a tweet, classify it as expressing one of the following emotions: sadness, joy, anger,
disgust, guilt or love. The tweets are stored in CSV files, and those files are given as
input. Performing the emotion detection analysis in Python requires the following libraries and
modules: nltk (including nltk.corpus and nltk.stem), numpy, textblob.classifiers, KFold and
OptionParser.
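As a minimal sketch of the input step, the snippet below reads labeled tweets from a CSV source using only the standard library. The two-column layout (tweet text, then emotion label) and the function name `load_labeled_tweets` are assumptions made for illustration; the report does not specify the exact column layout of its files.

```python
import csv
import io


def load_labeled_tweets(csv_file):
    """Return a list of (text, emotion) pairs read from an open CSV file.

    Assumed layout (hypothetical): column 0 is the tweet text,
    column 1 is the emotion label. Rows with fewer columns are skipped.
    """
    reader = csv.reader(csv_file)
    return [(row[0].strip(), row[1].strip().lower())
            for row in reader if len(row) >= 2]


# In-memory stand-in for a CSV file on disk.
sample = io.StringIO(
    '"I miss my old friends so much",sad\n'
    '"Best concert ever!",joy\n'
)
data = load_labeled_tweets(sample)
```

With a real file, the same function could be called on the handle returned by `open(path, newline="", encoding="utf-8")`.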

2.3 DOMAIN INTRODUCTION:

This project of analyzing the sentiments of tweets falls under the domains of “Pattern
Classification” and “Data Mining”. These two terms are closely related and intertwined,
and they can be formally defined as the process of discovering “useful” patterns in a large set of
data, either automatically (unsupervised) or semi-automatically (supervised). The project
relies heavily on “Natural Language Processing” techniques to extract significant patterns
and features from the large data set of tweets, and on “Machine Learning” techniques to
accurately classify individual unlabelled data samples (tweets) according to whichever pattern
model best describes them.

The features that can be used for modeling patterns and classification fall into two
main groups: formal language based and informal blogging based. Language-based features
deal with formal linguistics and include the prior sentiment polarity of individual words
and phrases and the part-of-speech tagging of the sentence. Prior sentiment polarity means that
some words and phrases have an innate tendency to express a particular sentiment. For example,
the word “excellent” has a strong positive connotation, while the word “evil” has a strong
negative connotation. So whenever a word with a positive connotation is used in a sentence,
chances are that the entire sentence expresses a positive sentiment. Part-of-speech tagging, on
the other hand, is a syntactic approach to the problem. It means automatically identifying the
part of speech of each individual word in a sentence: noun, pronoun, adverb, adjective, verb,
interjection, etc. Patterns can be extracted by analyzing the frequency distribution of these
parts of speech (either individually or in combination with other parts of speech) in a
particular class of labeled tweets.

Twitter-based features are more informal and relate to how people express themselves on
online social platforms and compress their sentiments into the limited space of 140 characters
offered by Twitter. They include Twitter hashtags, retweets, word capitalization, word
lengthening, question marks, the presence of URLs in tweets, exclamation marks, internet
emoticons and internet shorthand/slang.
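The Twitter-based features listed above can be sketched with a few regular expressions. This is only an illustration under stated assumptions: the function name `twitter_features`, the feature names, and the deliberately small emoticon pattern are all hypothetical and are not taken from the project's code.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Deliberately small emoticon pattern (assumption): eyes, optional nose, mouth.
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]")
# A word counts as "lengthened" if one character repeats three or more times.
LENGTHENED_RE = re.compile(r"(\w)\1{2,}")


def twitter_features(tweet):
    """Count informal, Twitter-specific cues in a single tweet."""
    words = tweet.split()
    text_without_urls = URL_RE.sub("", tweet)  # avoid matching ":/" in "http://"
    return {
        "hashtags": len(re.findall(r"#\w+", tweet)),
        "is_retweet": bool(re.search(r"\bRT\b", tweet)),
        "all_caps_words": sum(
            1 for w in words
            if len(w) > 1 and w.isalpha() and w.isupper() and w != "RT"
        ),
        "lengthened_words": sum(1 for w in words if LENGTHENED_RE.search(w)),
        "question_marks": tweet.count("?"),
        "exclamation_marks": tweet.count("!"),
        "urls": len(URL_RE.findall(tweet)),
        "emoticons": len(EMOTICON_RE.findall(text_without_urls)),
    }


features = twitter_features("RT @bob: I am SOOOO happy!!! :) #joy http://t.co/x")
```

Such counts could be appended to word-based features before training a classifier; whether they help would need to be checked experimentally.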
3. REVIEW OF LITERATURE
3.1 Limitations of Prior Art:

Sentiment analysis in the domain of micro-blogging is a relatively new research topic,
so there is still a lot of room for further research in this area. A decent amount of related
prior work has been done on sentiment analysis of user reviews [x], documents, web
blogs/articles and general phrase-level sentiment analysis. These differ from Twitter mainly
because of the limit of 140 characters per tweet, which forces users to express their opinion
in very short text. The best results in sentiment classification have been reached with
supervised learning techniques such as Naive Bayes and Support Vector Machines, but the manual
labelling required for the supervised approach is very expensive. Some work has been done on
unsupervised and semi-supervised approaches, and there is a lot of room for improvement.
Researchers testing new features and classification techniques often just compare their
results to baseline performance. There is a need for proper and formal comparisons between
the results obtained with different features and classification techniques in order to select
the best features and the most efficient classification techniques for particular
applications.

3.2 Related Work:

S. Radha Krishna et al. studied different spectral features such as MFCC, pitch chroma,
skewness and centroid for emotion recognition. The emotions considered in this study are Fear,
Anger, Neutral and Happy. The system was evaluated for various combinations of spectral
features, and the combination of MFCC and skewness was found to give better recognition
performance than the other combinations. The experiments were conducted and
evaluated using Gaussian Mixture Models (GMMs). The database used in this study is the Telugu
emotion speech corpus (IIIT-KGP).

Taner Danisman et al. [2] studied the automatic classification of anger, disgust, fear, joy and
sadness in text. The study was conducted on the ISEAR (International Survey on Emotion
Antecedents and Reactions) dataset. For the classification they used a Vector Space Model (VSM)
with a total of 801 news headlines provided by the “Affective Text” task of the SemEval 2007
workshop, which focuses on the classification of emotions and valences in text. They compared
their results with ConceptNet and with powerful text-based classifiers including Naive Bayes and
Support Vector Machines. Their experiments showed that VSM classification gives better
performance than ConceptNet, Naive Bayes and SVM based classifiers for emotion detection in
sentences. They obtained an overall F-measure of 32.22% and a kappa value of 0.18 for five-class
emotional text classification on the SemEval dataset, which is better than Naive Bayes (28.52%)
and SVM (28.6%).

Swati D. Bhutekar et al. [3] present a methodology to extract emotion from text in real time and
add the expression to the textual content during speech synthesis. The paper also focuses on the
implementation of corpus creation, the emotion recognition module, etc. In the text analysis, all
emotional keywords and emotion modification words are manually defined. The test was carried out
on a set of textual sentences, with preliminary rules written for 34 different emotions. These
rules are used in an automated procedure that assigns emotional state values to words. These
values are then used by the speech synthesizer to add emotions to the speech for the input
sentence. A pitch detection algorithm has been implemented for pitch recognition. The system is
language dependent.

Changqin Quan et al. [4] analyse sentence emotion based on emotion words using Ren-CECps (a
Chinese emotion corpus). Several classification methods (including the C4.5 decision tree, SVM,
NaiveBayes, ZeroR and DecisionTable) are compared. A supervised machine learning method (the
polynomial kernel method) is then proposed to recognize the eight basic emotions (Expect, Joy,
Love, Surprise, Anxiety, Sorrow, Angry and Hate). Using Ren-CECps, they obtain emotion lexicons
for the eight basic emotions. The polynomial kernel (PK) method is used to compute the
similarities between sentences and the eight emotion lexicons. The experiential knowledge derived
from Ren-CECps is then used to recognize whether each of the eight emotion categories is present
in a sentence. The experiments showed promising results.

Ali Houjeij, Layla Hamieh et al. [5] designed a system that adopts a novel approach to emotional
classification of human dialogue based on text and speech context. Their main objective was to
boost the accuracy of speech emotional classification by accounting for the features extracted
from the spoken text. The proposed system concatenates text and speech features and feeds them as
one input to the classifier. The work builds on past research on music mood classification based
on the combination of lyrics and audio features. The innovation in their approach lies in the
specific application of text and speech fusion to emotion classification and in the choice of
features. Furthermore, in the absence of benchmark data, a dataset of movie quotes was developed
for testing emotional classification and for future benchmarking. A comparison of the results
obtained in each case shows that the hybrid text-speech approach achieves better accuracy than
speech or text mining alone.

Amira F. El Gohary et al. [6] are concerned with the automatic detection of emotions in Arabic
text. Their approach is based on a moderate-sized Arabic emotion lexicon used to annotate Arabic
children's stories for the six basic emotions: Joy, Fear, Sadness, Anger, Disgust and Surprise.
Their approach achieves 65% accuracy for emotion detection in Arabic text.

Shadi Shaheen, Wassim El-Hajj et al. [7] proposed a framework for emotion classification in
English sentences where emotions are treated as generalized concepts extracted from the
sentences. They generated an intermediate emotional data representation of a given input sentence
based on its syntactic and semantic structure. They then generalized this representation using
various ontologies such as WordNet and ConceptNet, which resulted in an emotion seed called an
emotion recognition rule (ERR). Finally, a suite of classifiers is used to compare the generated
ERR with a set of reference ERRs extracted from a training set in a similar fashion. The
classifiers used are k-nearest neighbors (KNN) with a handcrafted similarity measure, Pointwise
Mutual Information (PMI), and PMI with Information Retrieval (PMI-IR). When applied to different
datasets, the proposed approach significantly outperformed the existing state-of-the-art machine
learning and rule-based classifiers, with an average F-score of 84%.

3.3 Social media:

Social media can be described as the “group of internet-based applications that build on the
ideological and technological foundations of Web 2.0, and that allow the creation and exchange
of user-generated content”, as defined by Kaplan and Haenlein. In recent years, in addition to
the leaders of the World Wide Web such as Facebook, Google+, LinkedIn and Twitter, new
services have appeared for different groups of users: social networks for students, networks for
specific groups of professionals, communities of ethnic minorities, and even a special network
for all the world’s drinkers. This extends the scope to very different kinds of research, from
consumer preferences to psychological characteristics. In early 2015 Facebook retained first
place among social platforms, and Twitter was also in the top ten. According to the same study by
Simon Kemp [41], more than 2 billion people worldwide are active users of social networks and
blogs. On Twitter, the number of monthly active users was 284 million in 2015. In 2016 the number
of monthly active users exceeded 320 million.
4. DATA CHARACTERISTICS
Twitter is a social networking and microblogging service that lets its users post real-time
messages, called tweets. Tweets have many unique characteristics, which pose new
challenges and shape the way sentiment analysis is carried out on them compared to other
domains.

Following are some key characteristics of tweets:

• Message Length: The maximum length of a Twitter message is 140 characters. This is
different from previous sentiment classification research that focused on classifying
longer texts, such as product and movie reviews.

• Writing technique: Incorrect spellings and cyber slang occur more often in tweets than
in other domains. As the messages are quick and short, people use
acronyms, misspell, and use emoticons and other characters that convey special meanings.

• Availability: The amount of data available is immense. More people tweet in the public
domain than post publicly on Facebook (as Facebook has many privacy settings), thus making
data more readily available. The Twitter API facilitates the collection of tweets for training.

• Topics: Twitter users post messages about a range of topics, unlike other sites which are
designed for a specific topic. This differs from a large fraction of past research, which
focused on specific domains such as movie reviews.

• Real time: Blogs are updated at longer intervals of time, as blogs are longer in nature and
writing them takes time. Tweets, on the other hand, are limited to 140 characters and are
updated very often. This gives a more real-time feel and represents the first reactions to
events.

We now describe some basic terminology related to Twitter.

• Emoticons: These are pictorial representations of facial expressions using punctuation and
letters. The purpose of emoticons is to express the user’s mood.

• Target: Twitter users make use of the “@” symbol to refer to other users on Twitter. Users
are automatically alerted if they have been mentioned in this fashion.

• Hash tags: Users use hash tags “#” to mark topics. They are used by Twitter users to make
their tweets visible to a greater audience.

• Special symbols: “RT” is used to indicate that a tweet is a repeat of someone else’s earlier
tweet.

5. DATA ACQUISITION
Data in the form of raw tweets is acquired using the Python library “tweetstream”, which
provides a simple package for the Twitter streaming API. This API allows two modes of accessing
tweets: SampleStream and FilterStream. SampleStream simply delivers a small, random sample
of all the tweets being streamed in real time. FilterStream delivers tweets which match certain
criteria. It can filter the delivered tweets according to three criteria:

• Specific keyword(s) to track/search for in the tweets

• Specific Twitter user(s) according to their user-id’s

• Tweets originating from specific location(s) (only for geo-tagged tweets).

A programmer can specify any one of these filtering criteria or a combination of
them. For our purpose, however, we have no such restriction and thus stick to the SampleStream
mode.

Since we wanted to increase the generality of our data, we acquired it in portions at different
points in time instead of acquiring all of it in one go. Had we used the latter approach, the
generality of the tweets might have been compromised, since a significant portion of the tweets
would refer to some trending topic and would thus share more or less the same
general mood or sentiment. We observed this phenomenon when going through
our sample of acquired tweets. For example, the sample acquired near Christmas and New Year
had a significant portion of tweets referring to these joyous events, which were thus of a
generally positive sentiment. Sampling our data in portions at different points in time is meant
to minimize this problem. We therefore acquired data at four different points in time: 17th
of December 2011, 29th of December 2011, 19th of January 2012 and 8th of February 2012.

A tweet acquired by this method has a lot of raw information in it which we may or may not find
useful for our particular application. It comes in the form of the Python “dictionary” data type
with various key-value pairs. Some of these key-value pairs are listed below:

• Whether a tweet has been favourited

• User ID

• Screen name of the user

• Original Text of the tweet

• Presence of hashtags

• Whether it is a re-tweet

• Language under which the twitter user has registered their account

• Geo-tag location of the tweet

• Date and time when the tweet was created.

Since this is a lot of information, we keep only the information that we need and discard the
rest. For our particular application we iterate through all the tweets in our sample and save the
actual text content of each tweet in a separate file, provided that the language of
the Twitter user’s account is specified to be English. The original text content of the tweet is
given under the dictionary key “text”, and the language of the user’s account is given under “lang”.
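The extraction step described above can be sketched as follows. The keys “text” and “lang” come from the description; the language code "en", the function name `english_texts` and the sample dictionaries are assumptions for illustration only.

```python
def english_texts(tweets, language="en"):
    """Yield the text of every tweet dictionary whose "lang" value matches.

    The value "en" for English is an assumption; tweets missing either key
    are skipped rather than raising an error.
    """
    for tweet in tweets:
        if tweet.get("lang") == language and "text" in tweet:
            yield tweet["text"]


# Hypothetical sample of tweet dictionaries with only the relevant keys.
sample_tweets = [
    {"text": "Feeling great today", "lang": "en"},
    {"text": "Quel beau jour", "lang": "fr"},
    {"lang": "en"},
]
texts = list(english_texts(sample_tweets))
```

The resulting texts could then be written one per line to the separate file mentioned above.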

Since human labelling is an expensive process, we further filter the tweets to be labelled so
that we have the greatest amount of variation in tweets without loss of generality. The filtering
criteria applied are stated below:

• Remove retweets (any tweet which contains the string “RT”)

• Remove very short tweets (tweets with a length of less than 20 characters)

• Remove non-English tweets (by comparing the words of each tweet with a list of 2,000
common English words; tweets with less than 15% of their content matching are discarded)

• Remove similar tweets (by comparing every tweet with every other tweet; a tweet with more
than 90% of its content matching some other tweet is discarded).

After this filtering roughly 30% of tweets remain for human labelling on average per sample,
which made a total of 10,173 tweets to be labelled.
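The four filtering criteria can be sketched in a few lines of Python. The thresholds (20 characters, 15%, 90%) are those stated above; everything else is an assumption for illustration: the function names, the tiny stand-in for the 2,000-word list, and in particular the use of Jaccard overlap of word sets as the measure of "content matching", which the report does not define.

```python
import re


def words_of(text):
    """Lower-cased alphabetic tokens of a tweet."""
    return re.findall(r"[a-z']+", text.lower())


def english_ratio(text, common_words):
    """Fraction of the tweet's words found in the common-word list."""
    tokens = words_of(text)
    return sum(t in common_words for t in tokens) / len(tokens) if tokens else 0.0


def similarity(a, b):
    """Jaccard overlap of word sets (assumed similarity measure)."""
    set_a, set_b = set(words_of(a)), set(words_of(b))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def filter_tweets(texts, common_words, min_len=20, min_english=0.15, max_sim=0.90):
    kept = []
    for text in texts:
        if "RT" in text:                      # retweet: contains the string "RT"
            continue
        if len(text) < min_len:               # very short tweet
            continue
        if english_ratio(text, common_words) < min_english:   # likely non-English
            continue
        if any(similarity(text, other) > max_sim for other in kept):  # near-duplicate
            continue
        kept.append(text)
    return kept


COMMON = {"this", "is", "a", "good", "day", "for", "all", "of", "us"}  # toy word list
kept = filter_tweets([
    "RT @a: this is a good day for all of us",
    "too short",
    "this is a good day for all of us",
    "this is a good day for all of us!",
    "zxcv qwer asdf hjkl uiop bnmm tyui",
], COMMON)
```

Comparing every tweet with every kept tweet is quadratic in the number of tweets, which matches the "every tweet with every other tweet" description but may be slow on large samples.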

6. METHODOLOGY

Approaches to sentiment classification:

The existing approaches to sentiment classification fall into two large categories:

1. Approaches based on lexicon and rules.

2. Machine learning approach.

The rule-based approach uses a set of rules, based on an analysis of the object domain, that can
explain and predict the polarity of a text (or of a single sentence). The rules tend to become
increasingly complicated as they are refined to increase the accuracy of the results.

The disadvantages are the large amount of time and expertise needed to design such
rules. Using this approach to analyse micro-blogging may be difficult because of noisy data.
The performance of rule-based approaches tends to be comparatively robust across domains and
texts, but it is typically inferior to the performance of machine learning methods of polarity
classification. Rule-based methods mostly rely on lexicons that list words and their associated
sentiment scores. The sentiment scores of the words in a text are typically combined (e.g., summed
or averaged) in accordance with predefined rules and assumptions in order to obtain the text’s
overall sentiment score, which can be used as an indicator of the text’s polarity. A lexicon-based
approach uses an affective lexicon to derive the polarity of the examined text. Affective lexicons
contain lists of words, either divided into sentiment classes (e.g. positive, negative, neutral)
or given as a single list in which each word is associated with a numerical value representing its
polarity. The following dictionaries (affective lexicons) can be used for the English language:

• ANEW (Affective Norms for English Words): a set of normative emotional
ratings for 1034 English words developed by Bradley and Lang at the NIMH Center for the Study of
Emotion and Attention (CSEA) at the University of Florida. For each word in the dataset,
there are scores for three dimensions of emotional assessment: valence (ranging from
pleasant to unpleasant), arousal (ranging from calm to excited) and dominance (ranging
from in-control to dominated). This dataset is a useful tool for emotion studies as well as for
sentiment analysis.

• WordNet: WordNet is one of the largest lexical resources for the English language and is
extensively used in scientific research. Multiple words can form a synset, a set of words
that may be used synonymously within a word sense. Additionally, is-a (hypernym/hyponym)
relations are defined between synsets, leading to a taxonomy structure. These relations may
be used for generalization over objects (e.g., cheese and bread are types of food).
WordNet 3.0 lists 117,798 nouns in 82,115 synsets. WordNet also contains words of other
parts of speech, such as adjectives and verbs; however, they have much lower coverage and
their taxonomies are relatively flat.

The process of creating an emotion wordnet is as follows:

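def emotion_word_set(emotions):
    """Map each emotion name to its list of affect words."""
    # readfile() and affect_wordlist() are helper functions
    # defined elsewhere in the project.
    word_set = {}
    for e in emotions:
        # words read from the file for emotion e
        representative_words = readfile(e)
        # affect word list derived from those words
        wordlist = affect_wordlist(representative_words)
        word_set[e] = wordlist
    return word_set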

Figure: Graphical visualization of WordNet. Nodes represent synsets, edges represent relations
between synsets.

• SentiWordNet: SentiWordNet is a lexical resource for sentiment analysis developed by
Baccianella et al. It was constructed by automatic annotation of WordNet synsets.
SentiWordNet assigns to each synset of WordNet three sentiment scores: positivity,
negativity, objectivity.
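To make the lexicon-based scoring concrete, here is a minimal sketch that sums or averages per-word scores to obtain a text's overall score and polarity. The four-word lexicon and its scores are invented for illustration (they are not taken from ANEW, WordNet or SentiWordNet), and the function names and the zero threshold are likewise assumptions.

```python
import re

# Toy affective lexicon with made-up polarity scores (hypothetical values).
TOY_LEXICON = {"excellent": 1.0, "good": 0.5, "bad": -0.5, "evil": -1.0}


def lexicon_score(text, lexicon, combine="mean"):
    """Combine the scores of all lexicon words found in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        return 0.0                      # no lexicon word: treat as neutral
    total = sum(scores)
    return total if combine == "sum" else total / len(scores)


def polarity(score, threshold=0.0):
    """Turn a numeric score into a polarity label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


summed = lexicon_score("An excellent and good phone", TOY_LEXICON, combine="sum")
averaged = lexicon_score("An excellent and good phone", TOY_LEXICON)
```

A real system would replace the toy dictionary with scores loaded from one of the lexicons above and would typically add rules for negation and intensifiers.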
Machine learning methods involve training models on specific collections of
documents (i.e., corpora), mostly by means of supervised methods that exploit patterns in vector
representations of natural language text. In supervised learning, the class attribute values of
the documents in the dataset are known. This data is called training data, and it
consists of a set of training examples. To evaluate the performance of the learned
model after training is finished, one applies it to a different set of data, known as test data.
Sometimes part of the whole dataset is also used as a validation dataset for model selection,
i.e., to select, from all the models obtained from the training process, the one with the best
performance on the validation dataset. Machine learning methods can be divided into
classification, regression and clustering. When the class attribute is discrete, the task is
called classification; when the class attribute is continuous, it is regression. In clustering,
the data is usually unlabeled, so the label of each instance is not known to the clustering
algorithm.
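The division into training, validation and test data can be sketched with the standard library alone. The split proportions, the fixed seed and the function name `split_dataset` are assumptions for illustration; the report does not state which proportions were used.

```python
import random


def split_dataset(examples, train=0.6, validation=0.2, seed=0):
    """Shuffle labeled examples and split them into three disjoint parts.

    Whatever remains after the training and validation portions
    becomes the test set.
    """
    items = list(examples)
    random.Random(seed).shuffle(items)   # seeded for reproducibility
    n_train = round(len(items) * train)
    n_val = round(len(items) * validation)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


# Ten hypothetical (tweet, emotion) pairs.
examples = [("tweet %d" % i, "joy" if i % 2 else "sad") for i in range(10)]
train_set, validation_set, test_set = split_dataset(examples)
```

Model selection would then compare candidate models on `validation_set`, and only the chosen model would be scored once on `test_set`.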

Since this thesis considers the task of sentiment analysis in social media, classification is
used. Classification methods such as decision trees, the naive Bayes classifier and others are
considered in Section 6.1. Many researchers have used a combination of the two approaches,
machine learning and the approach based on affective lexicons and rules, because the hybrid
approach shows the best results in practice. In this thesis, machine learning classification
methods are considered, since these methods show good results in the sentiment analysis of
social media (blog sites and reviews). The approaches based on rules and affective lexicons
described above are beyond the scope of this thesis.

6.1 Machine Learning Techniques:

We begin this section with a formal definition of the text classification problem. Then we
consider different techniques which are used for sentiment analysis in social media.
6.2 The Classification Problem:
In general, the problem of text classification is defined as follows [50]. Given a description
d ∈ X of a document, where X is the document space, a fixed set of classes C = {c1, c2, . . . , cm}
and a training set D of labeled documents ⟨d, c⟩, where ⟨d, c⟩ ∈ X × C, we use a learning
algorithm Γ to learn a classifier (classification function) Γ(D) = γ that maps documents to
classes:

γ : X → C.

For the sentiment analysis of social media, the set C consists of three classes

C = {positive, negative, neutral}.

For the emotion detection task of this project, C instead contains the emotion labels listed in
Section 2.2 (sad, joy, anger, disgust, guilt and love).

6.3 Naive Bayes Classifier:

Among the many methods that use Bayes' theorem, the naive Bayes classifier is the simplest one.
Given two random variables X and Y, Bayes' theorem states that

$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)} \qquad (2.1)$$

In the Naive Bayes classifier, Y represents the class variable and X represents the instance
features. Let X be (x_1, x_2, . . . , x_m), where x_j represents the value of feature j.

Let {y_1, y_2, . . . , y_n} be the values that the class attribute Y can take. Then the class
attribute value of instance X can be calculated by finding

$$\arg\max_{y_i} P(y_i \mid X) \qquad (2.2)$$

Based on Bayes' theorem,

$$P(y_i \mid X) = \frac{P(X \mid y_i)\,P(y_i)}{P(X)} \qquad (2.3)$$

Note that P(X) is constant and independent of y_i, so we can ignore the denominator of
Equation 2.3 when maximizing Equation 2.2. The Naive Bayes classifier also assumes
conditional independence to make the calculations easier; that is, given the class
attribute value, the feature attributes are conditionally independent of one another. This
assumption, though unrealistic, performs well in practice and greatly simplifies calculation.

$$P(X \mid y_i) = \prod_{j=1}^{m} P(x_j \mid y_i) \qquad (2.4)$$

Substituting P(X | y_i) from Equation 2.4 into Equation 2.3, we get

$$P(y_i \mid X) = \frac{\left(\prod_{j=1}^{m} P(x_j \mid y_i)\right) P(y_i)}{P(X)} \qquad (2.5)$$

where m is the total number of features (words) in X.

The Naive Bayes algorithm is also called a probabilistic method. The Naïve Bayes algorithm and
binary keywords have been used together to produce a one-dimensional degree of sentiment
embedded in tweets from the Twitter network. There are two different ways to set up the Naive
Bayes classifier: the multinomial model and the multivariate Bernoulli model. In this thesis the
multinomial Naive Bayes model is considered.
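The difference between the two models shows up in how a document is turned into a feature vector: the multinomial model uses word counts, while the multivariate Bernoulli model only records whether each vocabulary word is present. The sketch below illustrates this; the function names and the three-word vocabulary are assumptions made for the example.

```python
from collections import Counter


def multinomial_features(tokens, vocabulary):
    """Word counts per vocabulary entry (multinomial model)."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def bernoulli_features(tokens, vocabulary):
    """Presence (1) or absence (0) per vocabulary entry (Bernoulli model)."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


vocabulary = ["happy", "sad", "so"]          # hypothetical vocabulary
tokens = "so so happy".split()
counts_vector = multinomial_features(tokens, vocabulary)
presence_vector = bernoulli_features(tokens, vocabulary)
```

Here the repeated word "so" contributes twice in the multinomial vector but only once in the Bernoulli vector.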

Testing a naive bayes classifier is as follows:

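def testing(cl, test):
    # cl: trained classifier with a classify(text) method
    # test: list of (sentence, expected_emotion) pairs
    print(cl.classify('angry'))   # quick check on a single word
    for s, e in test:
        r = cl.classify(s)        # predicted emotion for sentence s
        print(s, e, r)
        if r == e:
            print("*")            # mark a correct prediction
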
6.4 ALGORITHM:

1. Consider a training data set D consisting of documents that belong to different classes,
say class A and class B.

2. Calculate the prior probability of each class:
P(A) = number of documents of class A / total number of documents.
P(B) = number of documents of class B / total number of documents.

3. Calculate the total word frequency of each class:
n(A) = the total word frequency of class A.
n(B) = the total word frequency of class B.

4. Calculate the conditional probability of each keyword occurring in a given class, where
wordcount is the number of times the word occurs in that class:

P(word1 | class A) = wordcount / n(A)

P(word1 | class B) = wordcount / n(B)

P(word2 | class A) = wordcount / n(A)

P(word2 | class B) = wordcount / n(B)

…………………………………………

P(wordn | class B) = wordcount / n(B)

5. Apply smoothing based on a uniform distribution to avoid the zero-frequency problem, in which a
word never seen with a class forces the whole product in step 6 to zero.

6. Classify a new document M, containing the words W = (word1, word2, …, wordn), by calculating
the probability of each class given W. Both values are taken up to the common factor 1 / P(W):

a) P(A | W) = P(A) * P(word1 | class A) * P(word2 | class A) …… * P(wordn | class A).

b) P(B | W) = P(B) * P(word1 | class B) * P(word2 | class B) …… * P(wordn | class B).

7. Assign the new document M to the class with the higher probability.
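The steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this project: the function names `train` and `classify`, the tiny training set, and the choice of add-one smoothing for step 5 are all assumptions. Scores are accumulated as log-probabilities, which leaves the winning class unchanged.

```python
import math
from collections import Counter, defaultdict

def train(documents):
    """documents: list of (list_of_words, class_label) pairs (hypothetical format)."""
    class_docs = Counter()               # documents per class, for the priors (step 2)
    word_counts = defaultdict(Counter)   # word frequencies per class (steps 3 and 4)
    vocabulary = set()
    for words, label in documents:
        class_docs[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    total_docs = sum(class_docs.values())
    priors = {c: n / total_docs for c, n in class_docs.items()}
    return priors, word_counts, vocabulary

def classify(words, priors, word_counts, vocabulary):
    scores = {}
    for label, prior in priors.items():
        n_class = sum(word_counts[label].values())
        # Sum of logs instead of a product of probabilities (step 6).
        score = math.log(prior)
        for w in words:
            # Assumed add-one smoothing (step 5): unseen words never give zero.
            p = (word_counts[label][w] + 1) / (n_class + len(vocabulary))
            score += math.log(p)
        scores[label] = score
    # Step 7: pick the class with the higher score.
    return max(scores, key=scores.get), scores

# Illustrative training data only; not taken from the project's dataset.
training = [
    (["good", "fine", "fantastic"], "A"),
    (["good", "good", "fine"], "A"),
    (["hate", "terrible"], "B"),
    (["terrible", "heavy", "hate"], "B"),
]
model = train(training)
label, scores = classify(["good", "terrible", "good"], *model)
print(label, scores)
```

With smoothing, the word "terrible" lowers the score of class A without forcing it to zero, so the two occurrences of "good" can still decide the result.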

Fig: General approach to the naïve Bayes method. The learning algorithm performs Bayes training on the training set to produce a Bayes model; Bayes testing applies the model to the test set to give the predicted results.

7. SYSTEM DESIGN

The proposed methodology is divided into three modules. The first module fetches the
tweets. The second module processes the tweets and stores them in a MySQL database, and the third module extracts
emotions from the tweets using a Naïve Bayes classifier.

7.1 Fetching tweets

In this module, tweets are fetched from the Twitter website. The twitter package is imported for
this purpose.
There are two submodules within this module.

1 Twitter OAuth

Here the application authenticates with Twitter in order to access Twitter data. For this, a
Twitter account has to be created. Twitter provides developers with four parameters with which
they and their applications can be authenticated: consumer key, consumer secret, access token and
access secret. After authentication, data can be fetched from Twitter.

2 Extraction of tweets

Tweets are extracted from the site using the searchTwitter() function. The function takes the
following inputs:

• Search query to be issued to Twitter.
• Maximum number of tweets to be returned.
• Language.
• Date since which the tweets have to be obtained.

The search for tweets is done using the hashtag of the respective phone name. The search can also be
done using Twitter handles.

7.2 Processing and storing tweets

In this module, tweets are processed and then stored in the database. MongoDB [11] is used as the
database.
This module can be viewed as two submodules.

1) Processing of tweets:
Here the tweets are subjected to a set of procedures that make them suitable for sentiment
analysis in the next module. The following steps are involved in the processing of tweets:
• Removal of re-tweets.
• Removal of hashtags and handles (# etc.).
• Removal of punctuation.
• Removal of digits.
• Removal of web addresses (https:// etc.).
• Removal of blank spaces.
• Conversion of the entire text to lower case.
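The processing steps listed above can be sketched with Python's standard library. This is an illustrative sketch rather than the project's code: the names `clean_tweet` and `preprocess` are hypothetical, a retweet is assumed to be a tweet that starts with "RT ", and hashtags and handles are assumed to be dropped together with the word attached to them.

```python
import re
import string

def clean_tweet(text):
    # Remove web addresses first, before punctuation removal breaks them up.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Remove hashtags and handles together with the attached word (assumption).
    text = re.sub(r"[@#]\w+", " ", text)
    # Remove punctuation, then digits.
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", " ", text)
    # Collapse runs of blank space and convert to lower case.
    return re.sub(r"\s+", " ", text).strip().lower()

def preprocess(tweets):
    # Drop retweets (assumed here to start with "RT "), then clean the rest.
    return [clean_tweet(t) for t in tweets if not t.lstrip().startswith("RT ")]

sample = ["RT @a: hi", "Loving my #iPhone 12!! https://t.co/x @apple"]
print(preprocess(sample))
```

The order matters: web addresses and handles are removed before punctuation, otherwise fragments such as "httpstcox" would survive as ordinary words.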

2) Storing tweets
The processed tweets are then converted into a dataframe and inserted into a MySQL collection.

7.3 Extraction of emotions from tweets

Emotions are extracted from the tweets stored in MySQL using a Naive Bayes classifier. In this paper
we consider emotion categories such as fear, joy and sadness. Tweets from the database are
fed to the classifier, which assigns each tweet to an emotion class. The methodology
is applied to the latest 500 tweets extracted from Twitter. Five mobile phone brands are considered
for study: Samsung, Nexus, iPhone, Lenovo and Motorola. The interface for the emotion categories is
created using Shiny.

The evaluation metrics used are precision, recall and accuracy.

Precision is a measure of correctness: of the tweets assigned to a class, the fraction that truly belong to it.

Recall is a measure of completeness: of the tweets that truly belong to a class, the fraction assigned to it.

Accuracy measures the overall performance of an individual classifier: the fraction of all tweets that are classified correctly.


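Code to get the accuracy on the test data:

def get_accuracy(test_data, classifier):
    # Start from a perfect score and subtract one for every misclassified sample.
    total = accuracy = float(len(test_data))
    for data in test_data:
        # Each item is a (text, expected_label) pair; the classifier is assumed to
        # expose the same classify() method used in the testing code of Section 6.3.
        if classifier.classify(data[0]) != data[1]:
            accuracy -= 1
    print('Total accuracy: %f%% (%d/%d).' % (accuracy / total * 100, accuracy, total))
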
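Precision and recall for a single emotion can be computed in the same spirit. The sketch below is illustrative only: the function name `precision_recall` and the sample labels are hypothetical and are not results from this project.

```python
def precision_recall(expected, predicted, label):
    pairs = list(zip(expected, predicted))
    # True positives: tweets of this class that were assigned to it.
    tp = sum(1 for e, p in pairs if e == label and p == label)
    # False positives: tweets of other classes assigned to this class.
    fp = sum(1 for e, p in pairs if e != label and p == label)
    # False negatives: tweets of this class assigned elsewhere.
    fn = sum(1 for e, p in pairs if e == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels, for illustration only.
expected = ["joy", "joy", "fear", "anger", "joy"]
predicted = ["joy", "fear", "fear", "anger", "joy"]
for emotion in ("joy", "fear", "anger"):
    print(emotion, precision_recall(expected, predicted, emotion))
```

Reporting the two values per class shows which emotions are over-predicted (low precision) and which are missed (low recall), something a single accuracy figure hides.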
8. IMPLEMENTATION
In the proposed work, we discuss how sentiment is extracted from a tweet or text using a Twitter
dataset. Twitter is a place where users post their views and opinions on the situation at hand. The
main objective of the proposed system is to analyse tweets that carry sentiment, which greatly
helps business intelligence in predicting the future. Classification is first performed on the tweets
using a naïve Bayes classifier, and each tweet is represented by its sentiment: positive, negative or
neutral. Sentiment analysis is vital for businesses because it reveals what the public sees as the pros
and cons of their products in the market, which helps improve business productivity. The aim of this
project is to develop a machine learning classification technique that gives accurate results and
automatically classifies the sentiment of an unknown tweet.

The dataset contains 65536 tweets, collected across all topics. The database has four attributes:
item-id, sentiment, sentiment source and sentiment text; only sentiment text is considered in the
proposed research. The first attribute, item-id, contains the id of the tweet. The second attribute,
sentiment, is a Boolean value (1 or 0): a tweet containing sentiment is taken as 0 and a tweet without
any sentiment is declared as 1. The third attribute, sentiment source, represents the source from
which the tweet is taken, with a maximum length of 140 characters. The last attribute, sentiment text,
holds the text of the tweet, whether or not it contains sentiment. Our main aim is to analyse these
tweets and determine which are positive and which are negative.

To classify the data, we first need to perform the following steps.

• Tokenization:

This method divides a document into small parts called tokens. These
tokens may be words, numbers or punctuation marks.

Ex: it is going to rain today

After tokenization the sentence is divided into tokens as follows:

“It”, “is”, “going”, “to”, “rain”, “today”.

• Stop words:

These are common words that are ignored, which reduces the size of the dataset
and the number of words (tokens). In Python we use the Natural
Language Toolkit (NLTK), which provides lists of stop words in 16 different languages.

Ex: I like dancing, so I dance.

After removing stop words the sentence becomes: like, dancing, dance.

• The bag-of-words concept is applied to these tokens.

• Finally, our classification technique, the Naïve Bayes classifier, is applied. It
calculates the probability of all words in the document and gives the probability of
each tweet being positive and being negative.

• The results show, for each tweet, whether it is
positive or negative.

• URLs and user references (identified by the tokens “http” and “@”) are removed if we are
interested in analyzing only the text of the tweet.

• Punctuation marks and digits/numerals may be removed if, for example, we wish to
compare the tweet to a list of English words.
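The tokenization and stop-word steps described above can be illustrated without any external library. This is a simplified sketch: the regular-expression tokenizer keeps only words (it drops numbers and punctuation), and `STOP_WORDS` is a deliberately tiny, hypothetical list, not the NLTK stop-word list.

```python
import re

# Hypothetical, deliberately small stop-word list for illustration only.
STOP_WORDS = {"i", "so", "it", "is", "to", "the", "am", "a", "an", "and"}

def tokenize(text):
    # Split the text into word tokens; punctuation and digits are discarded.
    return re.findall(r"[A-Za-z']+", text)

def remove_stop_words(tokens):
    # Compare in lower case so that "I" and "i" are treated alike.
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = tokenize("I like dancing, so I dance.")
filtered = remove_stop_words(tokens)
print(tokens)
print(filtered)
```

In a real pipeline the stop-word list would normally come from a maintained resource such as NLTK rather than being written by hand.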

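The process of removing unwanted tokens (those containing encoding debris, newlines, commas or full stops) is as follows:

import nltk

def removal(sentences):
    # Characters whose presence causes a token to be dropped.
    characters = ["Ã¡", "\xc3", "\xa1", "\n", ",", "."]
    sentence_list = []
    for count, sen in enumerate(sentences, start=1):
        # Progress output, kept from the original for debugging.
        print(count)
        print(sen)
        print(type(sen))
        s = nltk.word_tokenize(sen)
        # Keep only the tokens that contain none of the unwanted characters.
        new = ' '.join([i for i in s if not any(e in i for e in characters)])
        sentence_list.append(new)
    return sentence_list
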
• Lowercase Conversion:

A tweet may be normalized by converting it to lowercase, which makes its comparison with an
English dictionary easier.

• Stemming:

Stemming is the text normalization process of reducing a derived word to its root or stem [28]. For example,
a stemmer would reduce the words “stemmer”, “stemmed” and “stemming” to the root word
“stem”. The advantage of stemming is that it makes comparison between words simpler, as we do not
need to deal with complex grammatical transformations of the word. In our case we applied the
Porter stemming algorithm to both the tweets and the dictionary whenever a comparison
was needed. The process of stemming is as follows:

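import nltk
from nltk.stem import PorterStemmer

def stemming(sentences):
    # Each sentence is expected to be an iterable of word tokens.
    sentence_list = []  # nltk.Text objects, one per sentence
    sen_string = []     # stemmed sentences as space-separated strings
    sen_token = []      # stemmed sentences as token lists
    stemmer = PorterStemmer()
    for i, sen in enumerate(sentences):
        print(i, end=' ')  # progress indicator
        st = ""
        for word in sen:
            word_l = word.lower()
            # Skip very short words (fewer than 3 characters).
            if len(word_l) >= 3:
                st += stemmer.stem(word_l) + " "
        sen_string.append(st)
        w_set = nltk.word_tokenize(st)
        sen_token.append(w_set)
        w_text = nltk.Text(w_set)
        sentence_list.append(w_text)
    return sentence_list, sen_string, sen_token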

• Parts-of-Speech Tagging:

POS-Tagging is the process of assigning a tag to each word in the sentence as to which
grammatical part of speech that word belongs to, i.e. noun, verb, adjective, adverb, coordinating
conjunction etc.

8.1 Bag-of-words:
A bag-of-words is a representation of text that describes the occurrence of words within a
document. The occurrence of each word is represented as a numerical feature. It is a way of extracting
features from text for use in modelling, such as with machine learning algorithms. The
approach is simple and flexible and can be used for extracting features from documents. The
complexity lies in two areas: designing the vocabulary of known words
and scoring the presence of known words. Consider two classes,
a positive class and a negative class. The positive class contains
a bag of positive words (slow, fine, good, fantastic) and the negative class contains a bag of
negative words (hate, terrible, heavy). We give a text/sentence as input and
count the frequency of each word in the document, which indicates whether the
text/sentence belongs to the positive class or the negative class.

Example: “it is so hot today”

“today I am not going outside”

“I am going to watch the season premiere”

Tokenization is now performed on these lines, which gives

Line 1: It, is, so, hot, today.

Line 2: Today, I, am, not, going, outside.

Line 3: I, am, going, to, watch, the, season, premiere.

We now maintain a table with two columns, word and count,
i.e., we count the frequency of each word across the documents.

Table 1: Frequency of each word represented as count

Word       Count
It         1
Is         1
So         1
Hot        1
Today      2
I          2
Am         2
Not        1
Going      2
Outside    1
To         1
Watch      1
The        1
Season     1
Premiere   1

Once the table is complete, we know which words occur frequently and
which occur rarely. We therefore do not need to consider all the different words
that appear in the documents; we consider only a fixed set of words. In the table above
there are 15 distinct words in total, but keeping only the 10 most frequent words reduces the number of
comparisons.
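The word-count table can be produced with a few lines of Python. The sketch below uses the three tokenized lines from the example; the function name `bag_of_words` and the use of `collections.Counter` are choices made for this illustration, not the project's code.

```python
from collections import Counter

lines = [
    "it is so hot today",
    "today I am not going outside",
    "I am going to watch the season premiere",
]

def bag_of_words(lines):
    counts = Counter()
    for line in lines:
        # Lower-case so that "Today" and "today" are counted as the same word.
        counts.update(line.lower().split())
    return counts

counts = bag_of_words(lines)
# Keep only a fixed number of the most frequent words as the vocabulary.
top10 = counts.most_common(10)
for word, count in top10:
    print(word, count)
```

Limiting the vocabulary with `most_common` is one simple way to fix the set of words; other criteria, such as a minimum count, would work equally well.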

8.2 Application of sentiment analysis:

The Naïve Bayes classifier is a supervised classification technique that assigns a
text/sentence to a particular class. It is a probabilistic algorithm that calculates the
probability of each word in the text/sentence under every class, and the class with the highest probability is given
as the output.

• Consider a document a.
• The document a belongs to one of a set of classes B = { b1, b2, … , bn}.
• Consider a training set of m documents, each of which is already known to belong to
a particular class.

We train our classification algorithm on this training set to obtain a trained classifier, which
can then be used to classify a new document.

8.3 Bayesian Theorem applied to Documents.

For a document a and a class b, Bayes' theorem gives

P (b | a) = [P (a | b) * P (b)] / [P (a)]

The terms are represented as follows:

• P (a | b): representing the document a as a set of features (words or tokens) x1, x2, x3, …, xn, we
can rewrite P (a | b) as P (x1, x2, x3, …, xn | b).
• P (b) is the overall probability of a class, given by the relative frequency of class b.

Example: consider two classes, positive and negative. P (b) is calculated without analysing the
input document, by counting the relative frequency of each class in a corpus.

E.g. out of 10 reviews we have seen, 4 have been classified as positive.

P (positive) = 4 / 10
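The prior in this example can be computed directly from the class labels of a corpus. In the sketch below the function name `class_priors` is hypothetical, and the six reviews that are not positive are assumed to be negative, which the text does not state.

```python
from collections import Counter

def class_priors(labels):
    # Relative frequency of each class label in the corpus.
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}

# 4 positive reviews out of 10; the remaining 6 are assumed negative.
labels = ["positive"] * 4 + ["negative"] * 6
priors = class_priors(labels)
print(priors)
```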

Example: consider a tweet or a sentence such as “It is so hot today”. We apply the
naïve Bayes classifier to decide whether the sentence is positive or negative. The
bag of words contains positive and negative words together with
their frequency counts. Each word in the sentence is compared with the
positive and negative words in the bag of words, and a probability is calculated for the positive class and for the
negative class. The class with the higher probability is chosen, i.e., if
the positive class has the higher probability the tweet is considered positive, and vice
versa.

Under the independence assumption, the probability of the sentence factorizes over its words:

P(it is so hot today) = P(it) P(is) P(so) P(hot) P(today)

Removing stop words reduces the number of words in this product.

For the positive class we calculate

P(it | positive) P(is | positive) P(so | positive) P(hot | positive) P(today | positive)

which gives an overall probability for the positive class. For the negative class we calculate

P(it | negative) P(is | negative) P(so | negative) P(hot | negative) P(today | negative)

which gives an overall probability for the negative class.

Comparing the two probabilities, the class with the higher value is chosen.
This tells us whether the new input tweet or sentence is positive or negative.
9. EVALUATION METRICS:
The table below gives the precision, recall and accuracy for each emotion extracted from the
latest 500 tweets.

Emotion    Precision   Recall   Accuracy (%)
Joy        1           0.98     99.31
Fear       1           0.982    99.31
Anger      0.8         1        99.31
Sadness    0.98        0.9      99.31
Disgust    1           0.8      99.31
Shame      0.9         0.98     99.31
Guilt      0.8         0.88     99.31

10. RESULT AND ANALYSIS

These are the results obtained after the preprocessing and processing phases are applied. The
dataset of actual tweets covers all the emotions: joy, fear, sadness, disgust, anger and shame.
An example of an actual tweet is “I was in a state of joy and shock with your results”. After
the preprocessing phase is completed, the results cover joy, fear, sadness, disgust, anger and shame.

The table below gives the total number of tweets for each emotion.

Total number of tweets   Joy    Fear   Anger   Sadness   Disgust   Shame
7529                     1213   1320   1530    1600      520       350

The graph below represents the total number of tweets posted for each emotion.

[Figure: number of tweets per emotion (Joy, Fear, Anger, Sadness, Disgust, Shame), with max, avg and min series.]

In addition, when reporting the results of
polarity classification (which differentiates between the positive and negative classes), we impose the condition that only
tweets labelled subjective are used to calculate the results. In the final
classification approach this condition is removed, and both objectivity and polarity
classification are applied to all tweets regardless of whether they are labelled objective or
subjective. If we compare these results with those of Wilson et al. (displayed in
Table 1 and Table 2 of this report), we see that the accuracy of the neutral class falls from
82.1% to 73% when our classification is used instead of theirs. For all other classes, however, we
report significantly better results. Although the results presented by Wilson et al. are not from
Twitter data, they are for phrase-level sentiment analysis, which is very close in concept to Twitter
sentiment analysis. Next we compare our results with those presented by Go et al., which are as follows:

Features         Naive Bayes   Max Entropy
Unigram          81.3%         80.5%
Bigram           81.6%         79.1%
Unigram+POS      79.9%         79.9%
Unigram+bigram   82.7%         83.0%

11. CONCLUSION
This work is a case study that aims to compare standard machine learning techniques
applied to sentiment analysis of social media, specifically to the sentiment classification of
data gathered from the social network Twitter, and to find the best-performing method for three
datasets of various sizes and from different domains collected from Twitter. To answer the research
questions we conducted experiments. Since the datasets differ in sample size
and contain tweets from different domains, experiments were carried out for each dataset. Still, the main
tendencies were the same for all datasets.

One of the research questions concerned the data itself: which techniques for preprocessing text data
provide the best foundation for classifiers. To find the most
accurate method for classifying these datasets, the data were first analyzed and,
before machine learning techniques were applied, represented in vector
form. To find the best data representation, experiments were conducted with the
Bag-of-Words model (vector of unigrams) and the Bag-of-N-grams model (vectors of bigrams and
trigrams). The Bag-of-Words model (unigrams) showed
the best results for all methods and improved the overall accuracy of
the machine learning techniques compared with the Bag-of-N-grams model (bigrams and trigrams).

The primary research question is about comparing machine learning models for sentiment
classification of the datasets collected from Twitter and then analysing the outcome. Several
machine learning methods were used during the experimentation sessions, including Naive Bayes. All the
methods were compared in terms of accuracy. As it turned out, all the machine learning models
applied to these datasets gave good performance results.

The main point for further work is to use recurrent neural networks, particularly the LSTM (Long Short-Term
Memory) model, and deep learning models for experimentation. The resulting vector representations of the text data
could be used, if necessary, as part of deep learning models. For
representing text as vectors, other techniques could also be used that take into account the
frequency with which a term appears in the collection of tweets. Examples are TF
(Term Frequency) and TF-IDF (Term Frequency - Inverse Document
Frequency).

12. FUTURE WORK

Although we have made a step towards the analysis and prediction of the Naive Bayes MapReduce
implementation, much work remains that needs more and deeper research.

(1) The collected sample data is limited. The streaming data may be large enough; however, less
than 5% of it has geographical information available, so the number of tweets that finally flowed
into our system is very small. A method to predict or deduce the location of a tweet based on the
tweet's information and the user's information should be found in the future.

(2) The parallel Naive Bayes MapReduce implementation in this thesis can be
applied not only to text classification or sentiment analysis; it can also be extended to other areas such as
image processing or gene analysis.

(3) We have derived a performance prediction model related to the size of the input data and the
number of nodes in a cluster, and we have a certain understanding of the relationship between
performance, the global counters and the hardware configuration. For the latter, we still need
more data and experiments to work out the details, such as how many global counters should be used
for a certain use case, or in what situations upgrading the hardware is worth the
improvement it brings.

(4) When the MapReduce framework deals with algorithms that have a high time
complexity in the data mining area, resource consumption increases, since the exchange
of data brings a lot of I/O transfer and network communication during the iteration.
Therefore, how to reduce and optimize the data exchange in the MapReduce framework is also an
issue that needs to be solved.

13. REFERENCES

Steven Bird, Ewan Klein and Edward Loper. Natural Language Processing with Python.

W. G. Parrott. Emotions in Social Psychology. Psychology Press, Philadelphia, 2001.

S. Radha Krishna and R. Rajeswara Rao. Automatic Text-Independent Emotion Recognition
Using Spectral Features. Journal of Innovation in Computer Science and Engineering.

Taner Danisman and Adil Alpkocak. Feeler: Emotion Classification of Text Using Vector Space
Model.

Albert Bifet and Eibe Frank. Sentiment Knowledge Discovery in Twitter Streaming Data.
Discovery Science, Lecture Notes in Computer Science, 2010, Volume 6332/2010.

Alec Go, Richa Bhayani and Lei Huang. Twitter Sentiment Classification using Distant
Supervision. Project Technical Report, Stanford University, 2009.

Alexander Pak and Patrick Paroubek. Twitter as a Corpus for Sentiment Analysis and Opinion
Mining. In Proceedings of international conference on Language Resources and Evaluation
(LREC), 2010.

Andranik Tumasjan, Timm O. Sprenger, Philipp G. Sandner and Isabell M. Welpe. Predicting
Elections with Twitter: What 140 Characters Reveal about Political Sentiment. In Proceedings of
AAAI Conference on Weblogs and Social Media (ICWSM), 2010.

Bo Pang, Lillian Lee and Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification
using Machine Learning Techniques. In Proceedings of the Conference on Empirical Methods in
Natural Language Processing (EMNLP), 2002.

Chenhao Tan, Lillian Lee, Jie Tang, Long Jiang, Ming Zhou and Ping Li. User Level Sentiment
Analysis Incorporating Social Networks. In Proceedings of ACM Special Interest Group on
Knowledge Discovery and Data Mining (SIGKDD), 2011.

Sentiment Analysis On Twitter
100% (2)
Sentiment Analysis On Twitter
8 pages
Plan de Lectie Engleza
100% (1)
Plan de Lectie Engleza
7 pages
Emotion Based Music Recommendation System
No ratings yet
Emotion Based Music Recommendation System
4 pages
Emotion Detection
No ratings yet
Emotion Detection
17 pages
CAPTCHA
No ratings yet
CAPTCHA
10 pages
Sentiment Analysis For Movie Reviews
No ratings yet
Sentiment Analysis For Movie Reviews
3 pages
Face Detection and Feature Extraction For Facial Emotion Detection
No ratings yet
Face Detection and Feature Extraction For Facial Emotion Detection
6 pages
62 Green Screen Background Remover Using OpenCV System PY062
No ratings yet
62 Green Screen Background Remover Using OpenCV System PY062
8 pages
Human Emotion Detection System
50% (6)
Human Emotion Detection System
54 pages
EDandSA On Social Media
No ratings yet
EDandSA On Social Media
8 pages
Comparison of Various ML and DL Models For Emotion Recognition Using Twitter
No ratings yet
Comparison of Various ML and DL Models For Emotion Recognition Using Twitter
6 pages
crowd sourcing platform IEEE paper 1
No ratings yet
crowd sourcing platform IEEE paper 1
7 pages
ML Paper (Namrit & Ritika)
No ratings yet
ML Paper (Namrit & Ritika)
16 pages
Anger Detection in Social Media For Resource Scarce Languages
No ratings yet
Anger Detection in Social Media For Resource Scarce Languages
5 pages
Polarity Identification Through Emoticon Using Context Based Sentiment Analysis_1605073640
No ratings yet
Polarity Identification Through Emoticon Using Context Based Sentiment Analysis_1605073640
5 pages
CP5074 - SNA Unit V Notes
No ratings yet
CP5074 - SNA Unit V Notes
21 pages
Twitter Sentiment Analysis Using Deep Learning
No ratings yet
Twitter Sentiment Analysis Using Deep Learning
17 pages
Classification of Airline Tweet Using Nave-Bayes Classifier For Sentiment Analysis
No ratings yet
Classification of Airline Tweet Using Nave-Bayes Classifier For Sentiment Analysis
6 pages
Sentiment Analysis of Tweets Using Machine Learning
No ratings yet
Sentiment Analysis of Tweets Using Machine Learning
22 pages
TSA Synopsis
No ratings yet
TSA Synopsis
18 pages
93-Submission+548+pp-616-622
No ratings yet
93-Submission+548+pp-616-622
7 pages
Emotions Detection From Messages Using Machine Learning: Abstract
No ratings yet
Emotions Detection From Messages Using Machine Learning: Abstract
4 pages
Sustainability 15 12539
No ratings yet
Sustainability 15 12539
24 pages
Emotion Detection Using Text
No ratings yet
Emotion Detection Using Text
5 pages
Depression Detection Emotion AI
No ratings yet
Depression Detection Emotion AI
5 pages
Twitter Sentiment Analysis System
No ratings yet
Twitter Sentiment Analysis System
5 pages
Wang 2020
No ratings yet
Wang 2020
30 pages
Sentiment Analysis of Tweets: An Emoticon- Focused Method
No ratings yet
Sentiment Analysis of Tweets: An Emoticon- Focused Method
6 pages
Twiiter Sentiment Analysis
No ratings yet
Twiiter Sentiment Analysis
15 pages
Social Media Sentiment Analysis Based on COVID-19
No ratings yet
Social Media Sentiment Analysis Based on COVID-19
16 pages
Sciencedirect
No ratings yet
Sciencedirect
6 pages
Michael Final Project
100% (1)
Michael Final Project
59 pages
Sentimental Analysis On Twitter Data Using Naive Bayes: Ijarcce
No ratings yet
Sentimental Analysis On Twitter Data Using Naive Bayes: Ijarcce
4 pages
1806.00674v1
No ratings yet
1806.00674v1
14 pages
TWITTER SENTIMENT NLP Projectt
No ratings yet
TWITTER SENTIMENT NLP Projectt
19 pages
Sentiment Analysis of User-Generated Twitter Updates Using Various Classification Techniques
No ratings yet
Sentiment Analysis of User-Generated Twitter Updates Using Various Classification Techniques
18 pages
Pre Processing
No ratings yet
Pre Processing
9 pages
Ijcse What
No ratings yet
Ijcse What
10 pages
Sentiment Analysis - Beyond Polarity
No ratings yet
Sentiment Analysis - Beyond Polarity
42 pages
Introduction
No ratings yet
Introduction
27 pages
Enhancing Emotion Detection in Textual Data: A Comparative Analysis of Machine Learning Models and Feature Extraction Techniques
No ratings yet
Enhancing Emotion Detection in Textual Data: A Comparative Analysis of Machine Learning Models and Feature Extraction Techniques
7 pages
Implementation of Sentiment Analysis On Twitter Data
No ratings yet
Implementation of Sentiment Analysis On Twitter Data
6 pages
10 1109@icaccs48705 2020 9074208
No ratings yet
10 1109@icaccs48705 2020 9074208
3 pages
Sentiment Analysis On Twitter in R
No ratings yet
Sentiment Analysis On Twitter in R
3 pages
Machine Learning With Advance Model
No ratings yet
Machine Learning With Advance Model
19 pages
S Arlan 2014
No ratings yet
S Arlan 2014
5 pages
Sentiment Analysis On Twitter Using Streaming Api: Abstract
No ratings yet
Sentiment Analysis On Twitter Using Streaming Api: Abstract
5 pages
Abstract
No ratings yet
Abstract
2 pages
Mondal - Mining Emotions On Plutchik's Wheel
No ratings yet
Mondal - Mining Emotions On Plutchik's Wheel
6 pages
10.1016 J.ins.2013.12.059 Sentiment Topic Models For Social Emotion Mining
No ratings yet
10.1016 J.ins.2013.12.059 Sentiment Topic Models For Social Emotion Mining
11 pages
Project Report
No ratings yet
Project Report
10 pages
Sentiment Analysis On Twitter Data
No ratings yet
Sentiment Analysis On Twitter Data
23 pages
Large Scale Sentiment Analysis On Twitter With Spark: Nikolaos Nodarakis Spyros Sioutas
No ratings yet
Large Scale Sentiment Analysis On Twitter With Spark: Nikolaos Nodarakis Spyros Sioutas
8 pages
IJRPR6548
No ratings yet
IJRPR6548
5 pages
571 Document Mod
No ratings yet
571 Document Mod
30 pages
Emotional Sentiment Analysis of Social Media Conte
No ratings yet
Emotional Sentiment Analysis of Social Media Conte
12 pages
Machine Learning Algorithm For Sentimental Analysis of Twitter Feeds
No ratings yet
Machine Learning Algorithm For Sentimental Analysis of Twitter Feeds
4 pages
Social Media Sentiment
No ratings yet
Social Media Sentiment
8 pages
W14-6905
No ratings yet
W14-6905
8 pages
Challenges_and_Opportunities_of_Text-Based_Emotion_Detection_A_Survey
No ratings yet
Challenges_and_Opportunities_of_Text-Based_Emotion_Detection_A_Survey
35 pages
Ppt- Sentiment Analysis Using Machine Learning Algorithms
No ratings yet
Ppt- Sentiment Analysis Using Machine Learning Algorithms
23 pages
Reference Word
100% (2)
Reference Word
41 pages
Simple Past: TH TH ND
No ratings yet
Simple Past: TH TH ND
4 pages
3rd TERM: Extra-Work For 2 Bachillerato
No ratings yet
3rd TERM: Extra-Work For 2 Bachillerato
3 pages
Asking For Help and Directions
No ratings yet
Asking For Help and Directions
11 pages


TABLE OF CONTENTS

Keywords: Emotion detection, Naive Bayes, Python.

2.2 PROBLEM STATEMENT:

2.3 DOMAIN INTRODUCTION:

Sentiment analysis in the domain of micro-blogging is a relatively new research topic

3.2 Related Work:

3.3 Social media:

Following are some key characteristics of tweets:

• Specific keyword(s) to track/search for in the tweets

• Specific Twitter user(s) according to their user IDs

• Tweets originating from specific location(s) (only for geo-tagged tweets).

• Whether a tweet has been favourited

• Screen name of the user

• Original Text of the tweet

• Geo-tag location of the tweet

• Date and time when the tweet was created.

• Remove Retweets (any tweet which contains the string “RT”)
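The retweet rule above can be sketched as a simple filter. This is only an illustration: the function name `remove_retweets` and the sample tweets are hypothetical, and the check is the literal substring test the rule describes.

```python
def remove_retweets(tweets):
    """Drop any tweet whose text contains the string "RT"."""
    return [text for text in tweets if "RT" not in text]


# Hypothetical sample tweets, used only to exercise the filter.
sample = ["RT @user: great day!", "Feeling sad today", "so happy :)"]
kept = remove_retweets(sample)
```

A plain substring test also discards tweets that merely contain an upper-case word such as "ALERT", so a stricter implementation might match "RT" only as a separate token.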

Approaches to sentiment classification:

1. Approaches based on lexicon and rules.

2. Machine learning approach.

• ANEW (Affective Norms for English Words) is a set of normative emotiona

The process of creating an emotion wordnet is as follows:

Figure: Graphical visualization of WordNet. Nodes represent synsets, edges

• SentiWordNet: SentiWordNet is a lexical resource for sentiment analysis developed by

6.1 Machine Learning Techniques:

C = {positive, negative, neutral}.

6.3 Naive Bayes Classifier:

$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)} \qquad (2.1)$$

Based on the Bayes theorem,

$$P(y_i \mid X) = \frac{P(X \mid y_i)\,P(y_i)}{P(X)} \qquad (2.3)$$

$$P(X \mid y_i) = \prod_{j=1}^{m} P(x_j \mid y_i) \qquad (2.4)$$

Substituting P(X|yi) from Equation 2.4 in Equation 2.3, we get

$$P(y_i \mid X) = \frac{\left(\prod_{j=1}^{m} P(x_j \mid y_i)\right) P(y_i)}{P(X)} \qquad (2.5)$$

where m is the total number of words in X.
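Because P(X) is the same for every class, it can be dropped when classes are compared. Under that observation, Equation 2.5 gives the decision rule below. The symbol $\hat{y}$ for the predicted class is introduced here only for illustration.

$$\hat{y} = \arg\max_{y_i} \; P(y_i) \prod_{j=1}^{m} P(x_j \mid y_i)$$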

A Naive Bayes classifier is tested as follows:

def testing(cl, test):

4. Calculate the conditional probability of keyword occurrence for a given class

P(word1 | class A) = wordcount / ni(A)

P(word1 | class B) = wordcount / ni(B)

P(word2 | class A) = wordcount / ni(A)

P(word2 | class B) = wordcount / ni(B)

P(wordn | class B) = wordcount / ni(B)

5. A uniform distribution is applied to the counts (smoothing) in order to avoid the zero-frequency problem.
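The counting and smoothing steps above can be sketched as follows. This is a minimal illustration, not the project's code. It assumes that ni(class) means the total number of words seen in that class, and that the uniform correction in step 5 is add-one (Laplace) smoothing. The function name `train_naive_bayes` and the toy training pairs are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """Estimate class priors and smoothed word likelihoods.

    labelled_docs is a list of (tokens, label) pairs.
    likelihoods[label][word] = (count + 1) / (n_label + vocabulary size),
    i.e. add-one smoothing (an assumption about step 5).
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = sum(class_counts.values())
    priors = {c: n / total_docs for c, n in class_counts.items()}

    likelihoods = {}
    for c in class_counts:
        n_c = sum(word_counts[c].values())  # total words in class c
        likelihoods[c] = {
            w: (word_counts[c][w] + 1) / (n_c + len(vocab)) for w in vocab
        }
    return priors, likelihoods, vocab


# Toy training data, invented for the example.
train = [(["happy", "great", "day"], "joy"), (["sad", "bad", "day"], "sad")]
priors, likelihoods, vocab = train_naive_bayes(train)
```

With smoothing, a word never seen in a class (for example "sad" in the "joy" class) still gets a small non-zero probability, so one unseen word cannot force the whole product to zero.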

Fig: General approach to the Naïve Bayes method (diagram labels: Training Set, Bayes training)

7.1 Fetching tweets

• Search query to be issued to Twitter.

7.2 Processing and storing tweets

7.3 Extraction of emotions from tweets

Evaluation metrics used are precision, recall and accuracy.

Precision is the measurement of correctness.

Recall is the measurement of completeness.

Accuracy, the overall performance of an individual classifier, is measured by:

Code to get accuracy of the data:
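A minimal sketch of the three metrics, assuming the gold and predicted labels are available as two parallel lists. The function name `evaluate` and the sample labels are illustrative and not taken from the project code. Precision and recall are computed for one chosen class, accuracy over all classes.

```python
def evaluate(gold, predicted, target):
    """Return (precision, recall, accuracy).

    Precision and recall refer to the class `target`;
    accuracy is the share of all items labelled correctly.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = sum(1 for g, p in pairs if g == p) / len(pairs)
    return precision, recall, accuracy


# Invented labels, used only to exercise the function.
gold = ["joy", "sad", "joy", "anger"]
pred = ["joy", "joy", "joy", "anger"]
precision, recall, accuracy = evaluate(gold, pred, "joy")
```

Here two of the three tweets predicted as "joy" are correct, both true "joy" tweets are found, and three of the four predictions match the gold labels.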

To classify the data, we first need to perform the following steps.

Ex: it is going to rain today

“It”, “is”, “going”, “to”, “rain”, “today”.

Ex: I like dancing, so I dance.

After removing stop words, the sentence becomes: like, dancing, dance.

• The bag-of-words concept is applied to these tokens.

• Finally, our classification technique, the Naïve Bayes classifier, is applied which

• Punctuation marks and digits/numerals may be removed if for example we wish to

The process of removing stopwords is as follows:

Example: “it is so hot today”

“today I am not going outside”

“I am going to watch the season premiere”

Tokenization is now performed on these lines, which gives:

Line 1: It, is, so, hot, today.

Line 2: Today, I, am, not, going, outside.

Line 3: I, am, going, to, watch, the, season, premiere.

Table 1: Frequency of each word, represented as a count
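The word counts for the three tokenized lines can be reproduced with a short sketch. It assumes simple whitespace tokenization and lower-casing. The helper name `bag_of_words` is illustrative and not part of the project code.

```python
from collections import Counter

# The three example lines, as they appear after tokenization.
lines = [
    "it is so hot today",
    "today I am not going outside",
    "I am going to watch the season premiere",
]


def bag_of_words(text):
    """Map each lower-cased token to the number of times it occurs."""
    return Counter(text.lower().split())


per_line = [bag_of_words(line) for line in lines]  # one bag per line
total = sum(per_line, Counter())                   # counts over all lines
```

Each entry of `per_line` corresponds to one row of a frequency table, and `total` holds the corpus-wide count for each word. Word order is discarded, which is the defining property of the bag-of-words representation.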

8.2 Application of sentiment analysis:

 Let us consider a document a

8.3 Bayesian Theorem applied to Documents.

For a document a and a class b, using Bayes' theorem,

The term P(a|b) is represented as

P(it is so hot today) = P(it) P(is) P(so) P(hot) P(today)

P(it | positive) P(is | positive) P(so | positive) P(hot | positive) P(today | positive)

P(it | negative) P(is | negative) P(so | negative) P(hot | negative) P(today | negative)
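The two products above can be compared with a small sketch. The word probabilities and the equal class priors are hypothetical values chosen only so the example runs; they are not results from the project. The sketch sums logarithms instead of multiplying raw probabilities. This gives the same ranking of classes and avoids numeric underflow on longer texts.

```python
import math


def score(tokens, prior, likelihood):
    """Return log P(class) plus the sum of log P(word | class)."""
    return math.log(prior) + sum(math.log(likelihood[t]) for t in tokens)


tokens = ["it", "is", "so", "hot", "today"]

# Hypothetical probabilities, for illustration only.
likelihoods = {
    "positive": {"it": 0.2, "is": 0.2, "so": 0.1, "hot": 0.05, "today": 0.1},
    "negative": {"it": 0.2, "is": 0.2, "so": 0.1, "hot": 0.15, "today": 0.1},
}
priors = {"positive": 0.5, "negative": 0.5}

scores = {c: score(tokens, priors[c], likelihoods[c]) for c in priors}
predicted = max(scores, key=scores.get)  # class with the highest score
```

With these made-up numbers only P(hot | class) differs between the two classes, so the class that gives "hot" the higher probability wins.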