
Big Data and Cognitive Computing

Article
A Data-Centric Approach to Understanding the 2020 U.S.
Presidential Election
Satish Mahadevan Srinivasan 1, * and Yok-Fong Paat 2

1 Engineering Department, Pennsylvania State University, Great Valley, Malvern, PA 19355, USA
2 Department of Social Work, The University of Texas at El Paso, El Paso, TX 79968, USA; [email protected]
* Correspondence: [email protected]

Abstract: The application of analytics to Twitter feeds is a popular field of research. A tweet
with a 280-character limitation can reveal a wealth of information on how individuals express their
sentiments and emotions within their network or community. Upon collecting, cleaning, and mining
tweets from different individuals on a particular topic, we can capture not only the sentiments and
emotions of an individual but also the sentiments and emotions expressed by a larger group. Using
the well-known Lexicon-based NRC classifier, we classified nearly seven million tweets across seven
battleground states in the U.S. to understand the emotions and sentiments expressed by U.S. citizens
toward the 2020 presidential candidates. We used the emotions and sentiments expressed within
these tweets as proxies for their votes and predicted the swing directions of each battleground state.
When compared to the actual outcome of the 2020 presidential election, we were able to accurately predict
the swing directions of four battleground states (Arizona, Michigan, Texas, and North Carolina), thus
revealing the potential of this approach in predicting future election outcomes. The week-by-week
analysis of the tweets using the NRC classifier corroborated well with the various political events that
took place before the election, making it possible to understand the dynamics of the emotions and
sentiments of the supporters in each camp. These research strategies and evidence-based insights
may be translated into real-world settings and practical interventions to improve election outcomes.

Keywords: NRC classifier; lexicon-based classifier; emotion classification; US presidential election; social media

Citation: Srinivasan, S.M.; Paat, Y.-F. A Data-Centric Approach to Understanding the 2020 U.S. Presidential Election. Big Data Cogn. Comput. 2024, 8, 111. https://doi.org/10.3390/bdcc8090111

Academic Editor: Danilo Ardagna

Received: 1 July 2024; Revised: 20 August 2024; Accepted: 22 August 2024; Published: 4 September 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Big Data Cogn. Comput. 2024, 8, 111. https://www.mdpi.com/journal/bdcc

1. Introduction

Social media and blogging sites have gained popularity, enabling individuals to express their opinions and thoughts. For over a decade now, these sites have revolutionized the digital landscape, empowering every individual and community to freely express their opinions, feelings, and thoughts on a variety of topics. Users of these sites are generally allowed to post short, character-limited texts. Even though these texts are limited in size, they hold a wealth of information: they convey thoughts, emotions, and feelings on a particular topic within a social network. Upon mining these texts, it is possible to determine the emotions and feelings expressed by an individual on a particular topic. An individual can express any emotion or feeling, including anger, disgust, fear, joy, love, sadness, surprise, etc. These texts can be examined at different ecological levels. Sometimes, more than one emotion can be expressed via the same text. Given the inherent lack of structure and the variability in the size of these texts, understanding these emotions, whether from an individual's or a larger group's perspective (also known as emotion classification), can pose challenges. Further, despite significant breakthroughs in sentiment analysis within the field of Data Mining and Machine Learning, the wide range of emotions associated with human behavior has yet to be fully addressed. Knowing the exact emotion underlying a topic of investigation, rather than a generic sentiment, is critical. Since more than one emotion can be expressed via text, it becomes necessary to analyze each sentence one by one to get a better grasp of the overall emotion associated with it. Additionally, the popularity of social media is encouraging users to submit short messages, replacing the use of traditional electronic documents and article styles for expressing views and opinions.
Twitter tweets are short messages, unlike conventional texts. They are very peculiar
in terms of size and structure. As indicated earlier, they are restricted to 280 characters.
Because of this limitation, users have to use very restrictive language to express their
feelings and emotions. The language used in tweets is very different from the language
used in other digitized documents such as blogs, articles, and news [1]. A large variety
of features (i.e., words) appears in these texts, which poses a significant challenge: when each text is represented as a vector of features, the size of the feature space grows very quickly, since the corpus would contain a million features for a given topic [2]. Another significant challenge lies in manually classifying the text within tweets into different emotion classes. Manual classification of tweets into emotion classes has been tried previously; however, manual annotation is not free from ambiguity and does not guarantee 100% accuracy [2]. The inherent complexity of the various emotional states, and the difficulty of differentiating the emotion classes from one another, pose further complications. According to the Circumplex model [3], human beings have
28 different types of affect emotions. To explain this, Russell proposed a two-dimensional
circular space model in which it was demonstrated that the 28 different emotion types differ
from each other by a slight angle. Russell clearly showed that several emotion types are
clustered so closely together that it becomes difficult to differentiate between them. Thus, it
becomes very difficult for humans to label those texts accurately. When humans try to label
these texts, there is a notable risk of mislabeling the emotions that are subtly different or
close to each other. This is a serious issue because it eventually inhibits the classifier from
learning the critical features that can be used to identify emotions hidden in the texts.
In this article, we focus on analyzing the tweets collected during the 2020 presidential
elections. Using the lexicon-based NRC classifier, we analyzed the emotions and sentiments
expressed by people toward the two presidential candidates, Donald Trump and Joe Biden,
on various topics. Based on these emotions and sentiments, we predicted the swing
direction of the 2020 presidential election in a subset of states deemed battleground and
key to the election. To begin with, we have provided a short review of the emotion
classification work performed in the past. Following that, we discussed the materials and
methods employed in this study, presented the results and discussions from this study, and
concluded our research with discussions on the scope for future work.
Literature Review
Studies related to sentiment and emotional classification have recently garnered consider-
able empirical attention. This popularity is due to the increase in the amount of unstructured
opinion-rich text resources from social media, blogs, and other textual corpora. These texts have
given researchers and companies access to the opinions of a larger group of individuals
around the globe. Meanwhile, the advances in ML and natural language processing (NLP)
have also sparked increased interest in sentiment and emotion classification. For example,
Hasan, Rundensteiner, and Agu (2014) have proposed the use of EMOTEX, which can detect
emotions in text messages. EMOTEX uses supervised classifiers for emotion classification.
Using Naïve Bayes (N.B.), Support Vector Machine (SVM), Decision trees, and the KNN
(k-nearest neighbor), they have demonstrated 90% precision for a four-class emotion classifi-
cation on the Twitter dataset [2]. Other studies, including the work by Pak et al. (2010) and
Barbosa et al. (2010), have considered using ML techniques on Twitter datasets. They both
have demonstrated accuracies ranging between 60% and 80% for distinguishing between
positive and negative classes [4,5]. Go et al. (2009) have also performed sentiment analysis
on the Twitter dataset using Western-style emoticons. They have used the N.B., SVM, and
Maximum Entropy and have reported an accuracy of 80% [6].
Furthermore, Brynielsson et al. (2014) have demonstrated close to 60% accuracy on a four-class emotion (positive, fear, anger, and others) classification of tweets related to Hurricane Sandy using the SVM classifier [7]. Last but not least, Roberts et al. (2012) have proposed EmpaTweet, which can be used to annotate and detect emotions on Twitter
posts. In their work, they have discussed developing a synthetic corpus containing tweets
for seven different emotion types (anger, disgust, fear, joy, love, sadness, and surprise).
On their constructed synthetic dataset, they used seven different binary SVM classifiers
and classified the tweets. Using their ensemble classification technique, they have classified
each tweet to determine if a particular emotion was present. In addition, they have reported
that their corpus contained tweets with multiple emotion labels [8].
Emotion and sentiment classification has been widely researched using various ma-
chine learning and deep learning techniques.
Bhowmick et al. (2010) performed an experiment where they observed that humans and
machine learning models exhibited a very similar level of performance for emotion and senti-
ment classification on multiple data sets. Therefore, they concluded that the machine learning
(deep learning) models can be trusted for this task [9]. Chatterjee et al. (2019) also confirmed
through their study that methods employing Deep neural networks outperform other off-
the-shelf models for emotion classification in textual data [10]. Kim (2014) performed several
experiments on emotion classification using CNN on multiple benchmark datasets, including
the fine-grained Stanford Sentiment Treebank. A simple CNN with slight hyperparameter
tuning demonstrated excellent results for binary classifications of different emotions [11]. In a
work by Kalchbrenner et al., 2014 Dynamic Convolutional Neural Network (DCNN) has been
explored for sentiment classification on the Twitter dataset. According to them, DCNN is
capable of handling varying lengths of input texts in any language. They have reasoned that
the use of Dynamic k-Max Pooling makes DCNN a potential method for sentiment analysis
of Twitter data [12]. Acharya et al. (2018) have explored emotion detection in EEG signals; their study demonstrated the potential of a complex 13-layer CNN architecture [13]. In another study, Hamdi et al. (2020) utilized CNN streams and
the pre-trained word embeddings (Word2Vec) to achieve a staggering 84.9% accuracy on the
Stanford Twitter Sentiment dataset [14]. In contrast, Zhang et al. (2016) have proposed the
Dependency Sensitive Convolutional Neural Networks (DSCNN) that outperforms traditional
CNNs. They have reported 81.5% accuracy in the sentiment analysis of Movie Review Data
(MR) [15]. Zhou et al. (2015) have proposed C-LSTM, which utilizes both the CNN and LSTM
for a 5-class classification task. However, they have only reported an accuracy of 49.2% [16].
Since emotion and sentiment classification is a sequence problem, several studies
have focused on exploring recurrent neural networks or RNNs. Lai et al. (2015) have
explored RNNs and have determined that RNNs have the capability to capture the key
features and phases in texts that can help boost performance for emotion and sentiment
classification [17]. Abdul-Mageed and Ungar (2017) have explored Gated RNN or GRNN
for emotion and sentiment classification in several dimensions and have demonstrated
significantly high accuracies [18]. Kratzwald et al. (2018) have explored six benchmark
datasets for emotion classification using the combination of both the RNN and sent2affect.
They have reported exceptional performance of this combination when compared against
any traditional machine learning algorithm [19].
Using the Recursive Neural Tensor Network (RNTN) for the famous Stanford Sen-
timent Treebank dataset (SST), Socher et al. (2013) have reported 85.4% accuracy for
sentiment classification [20]. Zhou et al. (2016) used the BLSTM-2DCNN architecture for
Stanford Sentiment Treebank binary and fine-grained classification tasks, achieving a mere
52.4% accuracy. In their study, they observed that the BLSTM-2DCNN architecture was
very efficient in capturing long-term sentence dependencies [21]. Czarnek et al. (2022)
used the Linguistic Inquiry and Word Count (LIWC) and NRC Word-Emotion Associa-
tion Lexicon (NRC) to investigate whether older people have more positive expressions
through their language use. They examined nearly five million tweets created by 3573
people between 18 and 78 years old and found that both methods show an increase in
positive affect until age 50. They also concluded that according to NRC, the growth of
positive affect increases steadily until age 65 and then levels off [22]. Barnes, J. (2023)
has presented a systematic comparison of sentiment and emotion classification methods.
In that study, different methods for sentiment and emotion classification were compared, ranging from rule- and dictionary-based methods to recently proposed few-shot and prompting methods with large language models. It was reported that, across different settings (including in-domain, out-of-domain, and cross-lingual), the rule- and dictionary-based methods outperformed the few-shot and prompting methods in low-resource settings [23].
There are three types of classifiers for emotion and sentiment classification: supervised,
unsupervised, and lexicon-based classifiers. Supervised classifiers are more commonly
used to address the emotion and sentiment classification problem [2,4–6,8,24–32]. However,
a training dataset is required to employ a supervised classifier for the classification problem.
More specifically, a domain-specific training dataset is required. Obtaining a domain-specific training dataset for the task at hand is hard, as one might not always be available. Therefore, it is wise to explore unsupervised or lexicon-based classifiers.
Unsupervised classifiers are utilized to model the underlying structure or the distribution
of the data. Therefore, these algorithms are left on their own to discover and present interesting
patterns. Upon using unsupervised learning, users are left to look at those patterns and assign
the class labels. In the Lexicon-based approach, the aim is to identify certain patterns that
occur together with a seed list of sentiment/emotion-related words. More specifically, similar
sentiment/emotion-related words are identified from a large corpus with the same feature-
specific orientations. For this study, the unavailability of the domain-specific corpus is a
major challenge [33]. Therefore, we have opted for a lexicon-based classifier, NRC [34,35], to
determine the emotions and sentiments expressed within the collected tweets. In previous
works related to US presidential elections [36–41], it has been clearly demonstrated that the
NRC classifiers are best suited for emotion and sentiment classification of tweets compared to
several different supervised learning techniques.
2. Methods and Materials
2.1. Methodology
Collecting and mining Twitter feeds related to political events provides insight into the
opinion expressed by an individual or an entire community. In this particular context, the
Twitter feeds have the potential to serve as a proxy for a voter's vote. These tweets can also help predict and understand the major events related to a presidential election and, ultimately, its outcome. Srinivasan et al. (2019) clearly
demonstrated that the collected Twitter feeds relating to the 2016 presidential election had
the potential to predict the major events that led to the final outcome of the election [36].
In this study, we chose to collect data from a secondary source (social media sites). To address the research questions in this study, we identified Twitter as the source from which to collect tweets. Here, we first introduce the lexicon-based NRC classifier used to classify the tweets, and then discuss the data collection process in detail.
2.1.1. NRC Classifier
The tweets collected for this study were classified using the NRC classifier. We implemented the NRC classifier in R (version 3.6.0) using the Syuzhet package. The Syuzhet package implements the NRC classifier through the function get_nrc_sentiment(). This function classifies tweets across eight emotions and two sentiments.
The NRC is a lexicon-based classifier with annotations for about 14,182 words. Using
these words, the NRC classifier can classify texts into eight different emotions—anger,
anticipation, disgust, fear, joy, sadness, surprise, and trust—as well as sentiments: negative
and positive. This lexicon corpus (words) of the NRC classifier was constructed based on
two measures, namely the Strength of Association (SOA) and Pointwise Mutual Information
(PMI). The use of the two measures mentioned above ensures that the lexicon corpus has
the potential to determine a particular emotion class in a sentence [34,35]. A significant
drawback of the NRC classifier is that it cannot classify those sentences that do not contain
the words that belong to the lexicon corpus.
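The matching logic of such a lexicon-based classifier can be illustrated with a short sketch. The paper's pipeline uses the Syuzhet package in R; the following is a minimal language-neutral illustration in Python, with a tiny hypothetical lexicon standing in for the roughly 14,182-word NRC corpus (the word lists below are invented for illustration, not taken from NRC).

```python
# Minimal sketch of lexicon-based emotion/sentiment scoring.
# TOY_LEXICON is a hypothetical stand-in for the NRC corpus, which maps
# ~14,182 words to eight emotions and two sentiments.
TOY_LEXICON = {
    "win":   {"joy", "trust", "positive"},
    "fraud": {"anger", "disgust", "fear", "negative"},
    "hope":  {"anticipation", "joy", "positive"},
}

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]
SENTIMENTS = ["negative", "positive"]

def nrc_style_scores(text):
    """Count lexicon hits per emotion/sentiment label for one tweet."""
    scores = {label: 0 for label in EMOTIONS + SENTIMENTS}
    for word in text.lower().split():
        for label in TOY_LEXICON.get(word, ()):
            scores[label] += 1
    return scores

# A tweet containing no lexicon words scores zero on every label -- the
# drawback noted above: such tweets cannot be classified at all.
```

For example, a tweet like "hope we win" would accumulate counts under joy, anticipation, trust, and positive, while a tweet sharing no words with the lexicon scores zero everywhere and would be dropped.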
Across each state, we determined each candidate's average net sentiment score, which is the difference between the fraction of positive-sentiment tweets and the fraction of negative-sentiment tweets, each relative to the total number of tweets. We predicted that a state would swing in favor of a candidate if that candidate received the highest net sentiment score. Similarly, we determined the average score for each of the eight emotions as a fraction of the total number of tweets.
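As a worked illustration of this prediction rule, the net sentiment score and the swing prediction can be sketched as follows (Python is used for illustration; the candidate names and counts are hypothetical, not the paper's data).

```python
def net_sentiment_score(n_positive, n_negative, n_total):
    """Average net sentiment score: fraction of positive-sentiment tweets
    minus fraction of negative-sentiment tweets."""
    return n_positive / n_total - n_negative / n_total

def predict_swing(counts_by_candidate):
    """Predict that the state swings toward the candidate with the
    highest net sentiment score. Values are (positive, negative, total)."""
    return max(counts_by_candidate,
               key=lambda c: net_sentiment_score(*counts_by_candidate[c]))

# Hypothetical counts for one state (not the paper's data):
state_counts = {
    "Candidate A": (4000, 3500, 10000),   # net score 0.05
    "Candidate B": (3000, 2000, 10000),   # net score 0.10
}
# predict_swing(state_counts) -> "Candidate B"
```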
2.1.2. Data Collection
To retrieve the tweets from Twitter, we developed an automated script. This script was designed to retrieve tweets using both the Search API, which is part of the Twitter REST API, and the built-in Twitter API package within RStudio. To successfully execute
the automated script, we established a developer account on Twitter that provided us
with access to various Tokens and API keys. The automated script identified and used
appropriate handles for both candidates, Donald Trump and Joe Biden, to selectively
retrieve the tweets. Tweets were collected over eight months in 2020.
Using 20 different hashtags, a total of 7,653,518 tweets were collected for both presidential
candidates between 3 March 2020 and 30 October 2020. Of these hashtags, the most popular
were Trump, Trump2000, and Joe Biden. The unique number of tweeters who tweeted about
Donald Trump and Joe Biden was 6,258,545 and 1,394,973, respectively. Table 1 provides a
monthly distribution of the collected tweets. It is evident that at least five times as many tweets were collected for Donald Trump as for Joe Biden (see Table 1).
Table 1. Month-wise distribution of tweets collected for the U.S. 2020 presidential candidates.

Months      Total Number of Tweets    Donald Trump    Joe Biden
March       574,020                   460,509         113,511
April       579,452                   525,014         54,438
May         557,608                   494,448         63,160
June        723,169                   663,650         59,519
July        506,005                   446,262         59,743
August      1,598,082                 1,288,131       309,951
September   1,502,981                 1,166,885       336,096
October     1,612,201                 1,213,646       398,555
Total       7,653,518                 6,421,511       1,232,007

Once the tweets were collected, extensive cleaning was performed. To clean the tweets, we used the gsub function from the stringr package in R, following the cleaning steps outlined by Stanton [42]. The focus of data cleaning was to remove unnecessary spaces, URLs, retweet headers, hashtags, and references to other links. Using the R package cldr, we identified that ~10% of the tweets were posted in 38 different languages; the rest of the tweets were in English. All the non-English tweets were identified and filtered out.
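The cleaning steps above were performed with gsub in R; as an illustrative sketch, an equivalent sequence of substitutions can be written with Python's re module (the exact patterns used in the paper are not given, so these regular expressions are assumptions).

```python
import re

def clean_tweet(text):
    """Illustrative tweet cleaning mirroring the gsub-based steps:
    retweet header, URLs, user references, hashtags, extra whitespace."""
    text = re.sub(r"^RT\s+@\w+:\s*", "", text)  # retweet header
    text = re.sub(r"https?://\S+", "", text)    # URLs and links
    text = re.sub(r"@\w+", "", text)            # references to other users
    text = re.sub(r"#\w+", "", text)            # hashtags
    return re.sub(r"\s+", " ", text).strip()    # unnecessary spaces

clean_tweet("RT @user1: Big rally tonight! #Vote https://t.co/abc")
# -> "Big rally tonight!"
```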
Both data collection and cleaning took about ten hours per Twitter handle. The entire
workload was distributed evenly across several Google Cloud Computing (GCC) engines.
Two machines, each running seven different GCC virtual machines, took approximately
seven hours to collect the data. Once the data was collected, the data was validated
by a two-fold mechanism. A Python script was first used to compare the daily tweets
collected by the “Streaming APIs” of Twitter in order to confirm the completeness of the
tweets’ content and attributes. Another R script using the same “Streaming APIs” was
compared against the main script, but instead of comparing the daily tweets collected, it
was compared on a weekly basis to confirm the completeness of the collected tweets.
All the collected tweets were classified using the NRC classifier, which was implemented in R using the Syuzhet package. This package contains an implementation of a function called get_nrc_sentiment(). All the tweets were classified into one
of the eight different emotions and one of the two sentiments. For each tweet, the NRC
classifier gives a nominal value ranging between zero (low) and seven (high) across eight
emotions and two sentiments. We assigned a unique label to each tweet by determining
which emotion was prominent (high nominal value) within it. If a tweet had more than one
prominent emotion, then we duplicated those tweets as a separate instance and assigned all
the respective prominent emotions to it. While classifying the tweets, we also encountered
situations where tweets had no emotion or sentiment label attached. This was because the
NRC classifier assigned a zero (0) nominal value across eight emotions and two sentiments.
Such tweets were removed from further processing and analysis. In addition, we computed the average net sentiment score for each candidate, defined as the difference between the fraction of positive-sentiment tweets and the fraction of negative-sentiment tweets. We predicted that a state would swing in favor of the candidate with the highest net sentiment score, and we likewise determined the average score for each of the eight emotions as a fraction of the total number of tweets.
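The labeling rule described above (assign each tweet its prominent emotion, duplicate tweets that have several equally prominent emotions, and drop tweets whose scores are all zero) can be sketched as follows; this is an illustrative Python reimplementation, not the paper's R code.

```python
EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

def label_tweet(scores):
    """Return the prominent-emotion label(s) for one tweet.

    `scores` maps each emotion to the NRC nominal value (0..7). A tweet
    whose scores are all zero gets no label and is dropped; a tweet with
    several equally prominent emotions yields one label per emotion (the
    tweet is duplicated, one instance per label).
    """
    top = max(scores[e] for e in EMOTIONS)
    if top == 0:
        return []   # unclassifiable -> removed from further analysis
    return [e for e in EMOTIONS if scores[e] == top]

label_tweet({"anger": 2, "anticipation": 0, "disgust": 2, "fear": 1,
             "joy": 0, "sadness": 0, "surprise": 0, "trust": 0})
# -> ["anger", "disgust"]
```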
3. Results
Using the NRC classifier, we determined the general population’s emotions and
sentiments expressed toward both presidential candidates in these tweets. Figure 1 shows
a higher overall positive sentiment for Biden (0.917) than for Trump (0.827). Overall, the negative sentiment was similar, slightly higher toward Trump (0.869) than Biden (0.859). People expressed more anger toward Biden than Trump but were more surprised by, more disgusted with, and less trusting toward Trump.

Figure 1. Sentiment and emotion analysis of people toward the 2020 U.S. presidential candidates.

Figure 2 indicates that the average positive sentiment for Trump increased steadily from June but decreased again in October as the election approached. On the other hand, the average negative sentiment for Trump had been decreasing since mid-July, while the negative sentiment for Biden had been increasing since August.

Figure 2. Month-wise analyses of positive and negative sentiments for both the 2020 U.S. presidential candidates.

Figure 3 shows the net sentiment (net sentiment = positive − negative) score for both presidential candidates, and it is clear that Trump had been gaining ground steadily since August compared to Biden and was holding a slight advantage.

Figure 3. Month-wise analyses of the net sentiment score for both the 2020 U.S. Presidential Candidates.

We analyzed both candidates' stands on healthcare, immigration, the economy, race relations, trade and tariffs, foreign affairs, and climate change, which were essential to the electorate in the 2020 election cycle. Based on Figure 4, Biden held an advantage on most of the issues except for the matters pertaining to trade and tariffs as well as race relations. Under Trump's presidency, the US economy shrank by 4.8%, which dealt a significant blow to Trump in the 2020 presidential election. In terms of the economy, foreign affairs, and immigration, Trump's average net score was low until July, but then it started picking up as the election got closer. With respect to trade and tariffs, people showed more faith in Trump than in Biden.

Figure 4. Candidates' stand on the issues important to the electorate in the 2020 election cycle.
Figure 5 shows the overall sentiment for Trump and Biden and important events between March and October 2020. Interestingly, the increase in unemployment had a much more dramatic effect on Biden than on Trump. Trump’s attempt to block John Bolton’s book and his views on delaying the 2020 presidential election impacted the overall sentiment he garnered. By contrast, Joe Biden’s selection of his vice presidential running mate did not seem to have helped him. Overall, Trump held a slight edge over Biden since the first presidential debate.

Figure 5. Month-wise analysis of the overall average sentiment for both the 2020 U.S. Presidential Candidates (blue for Biden, red for Trump) correlated with major events.
We also performed sentiment and emotion analyses on tweets collected from seven battleground states (Texas, Florida, Arizona, Michigan, Wisconsin, Pennsylvania, and North Carolina). A total of 1,507,525 tweets were analyzed across the seven battleground states collected since July 2020. As depicted in Figure 6, a vast majority of the tweets came from Texas and Florida, and Trump received relatively more tweets than Biden.

Figure 6. Distribution of tweets collected from the seven battleground states.

As shown in Figure 7, Biden secured a higher net sentiment score in Pennsylvania and Arizona, but Trump was in a close race in North Carolina and Wisconsin and held a lead in Florida, Michigan, and Texas.
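The weekly net sentiment scores behind Figure 7 follow the usual lexicon-based recipe: count positive and negative lexicon hits per tweet, take the difference, and average over the period. The exact aggregation used in the study is not spelled out here, so the sketch below is an illustrative assumption, and the toy lexicon stands in for the real NRC EmoLex.

```python
# Illustrative net sentiment scoring (assumed definition: positive minus
# negative lexicon hits per tweet, averaged over a period).
# TOY_LEXICON is a stand-in for the real NRC EmoLex.
TOY_LEXICON = {
    "win":   {"positive", "joy", "trust"},
    "hope":  {"positive", "joy"},
    "fraud": {"negative", "anger", "disgust"},
    "fear":  {"negative", "fear"},
}

def net_sentiment(tweet):
    """Positive minus negative lexicon hits for a single tweet."""
    words = tweet.lower().split()
    pos = sum("positive" in TOY_LEXICON.get(w, set()) for w in words)
    neg = sum("negative" in TOY_LEXICON.get(w, set()) for w in words)
    return pos - neg

def average_net_sentiment(tweets):
    """Average net sentiment over a collection of tweets (e.g., one week)."""
    return sum(net_sentiment(t) for t in tweets) / len(tweets)

week = ["hope we win", "fraud and fear everywhere", "win win"]
print(average_net_sentiment(week))  # (2 - 2 + 2) / 3 ≈ 0.667
```

Averaging the per-tweet differences week by week, per candidate, yields curves of the kind plotted in Figure 7.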
Figure 7. Weekly distribution of the average net sentiment score for the 2020 U.S. presidential candidates across seven battleground states.

4. Analysis of the Battleground States

4.1. Arizona

In Arizona, Trump had gained a significant lead in trust over Biden. Even though the public expressed greater anger and disgust toward him (see Table 2), Trump had garnered a steady increase in positive sentiment. However, in October, as shown in Figure 8, there was a substantial increase in the positive sentiment for Biden and a significant drop in positive sentiment for Trump, suggesting that Arizona would likely swing toward Biden.

Table 2. Emotions toward both the presidential candidates in Arizona.

Emotion Donald Trump Joe Biden
Anger 0.4606 0.3950
Disgust 0.2918 0.2490
Fear 0.4458 0.3675
Joy 0.3822 0.3185
Trust 0.6545 0.5840

Figure 8. Monthly-wise net sentiment analysis of both the presidential candidates in Arizona.

4.2. Florida
In Florida, people showed more trust toward Trump but also expressed more anger, disgust, joy, and fear toward him than Biden (see Table 3). As shown in Figure 9, we noticed a substantial increase in the positive sentiment for Biden since October and a significant drop in positive sentiment for Trump, thus suggesting that Florida would likely swing toward Biden.

Table 3. Emotions toward presidential candidates in Florida.

Emotion Donald Trump Joe Biden
Anger 0.4106 0.3629
Disgust 0.2662 0.2250
Fear 0.4302 0.3484
Joy 0.3682 0.3171
Trust 0.6535 0.5882
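Emotion intensities like those in Tables 2 and 3 derive from the NRC lexicon, which tags words with the emotions they evoke; a per-candidate score can then be read as the share of that candidate’s tweets expressing each emotion. The aggregation below is an assumption for illustration, and the toy lexicon stands in for the real NRC EmoLex.

```python
# Illustrative NRC-style emotion profile: the fraction of tweets that
# contain at least one word tagged with each emotion.
# TOY_EMOLEX is a stand-in for the real NRC EmoLex word-emotion lists.
TOY_EMOLEX = {
    "furious": {"anger", "disgust"},
    "scared":  {"fear"},
    "happy":   {"joy"},
    "honest":  {"trust"},
}
EMOTIONS = ("anger", "disgust", "fear", "joy", "trust")

def emotion_profile(tweets):
    """Map each emotion to the fraction of tweets expressing it."""
    counts = dict.fromkeys(EMOTIONS, 0)
    for tweet in tweets:
        tags = set()
        for word in tweet.lower().split():
            tags |= TOY_EMOLEX.get(word, set())
        for tag in tags:
            counts[tag] += 1
    return {e: counts[e] / len(tweets) for e in EMOTIONS}

tweets = ["furious and scared", "happy honest speech", "furious again"]
print(emotion_profile(tweets))
```

Running such a profile separately over each candidate’s tweets from one state gives a per-candidate row of emotion scores of the kind tabulated per state above.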

Figure 9. Monthly-wise net sentiment analysis of both presidential candidates in Florida.


4.3. Michigan
In Michigan, we noted an increase in trust in Trump. Comparatively, in Michigan, Trump experienced more disgust, fear, anger, and joy than Biden (see Table 4). As indicated in Figure 10, Trump consistently outperformed Biden in the net sentiment score. However, soon after the second presidential debate in late October, Biden gained a significant net sentiment, suggesting that Michigan could swing toward him.
Table 4. Emotions toward presidential candidates in Michigan.

Emotion Donald Trump Joe Biden


Anger 0.3941 0.3657
Disgust 0.2576 0.2329
Fear 0.4447 0.3532
Joy 0.3768 0.2937
Trust 0.6549 0.5619

Figure 10. Monthly-wise net sentiment analysis of both the presidential candidates in Michigan.

4.4. North Carolina
In North Carolina, Trump held a significant lead over Biden in trust but also received more disgust, anger, and fear (see Table 5). We also observed a substantial increase in the net sentiment toward Trump since September, even though the official death toll due to the COVID-19 pandemic surpassed 200,000. For both candidates, the number of tweets collected from North Carolina was very close, with slightly more tweets collected for Trump (i.e., ~9k for Trump and ~6k for Biden). Although we noticed a substantial rise in the net sentiment for Biden toward the end of September and the beginning of October, there was a steep drop in positive sentiment for Biden, suggesting that North Carolina would undoubtedly swing toward Trump (see Figure 11).

Table 5. Emotions toward presidential candidates in North Carolina.

Emotion Donald Trump Joe Biden
Anger 0.4248 0.3453
Disgust 0.2966 0.2124
Fear 0.4430 0.3232
Joy 0.3685 0.2959
Trust 0.6402 0.5153

Figure 11. Monthly-wise net sentiment analysis of both the presidential candidates in North Carolina.

4.5. Pennsylvania
In Pennsylvania, both candidates experienced similar levels of trust, with Trump slightly
ahead of Biden. At the same time, Trump was also experiencing more disgust, anger, and
fear among people in Pennsylvania (see Table 6). Trump encountered a lot of negative sentiment there for a variety of reasons, including his suggestion that the election be delayed, his recommendation that voters in North Carolina vote twice, and the official COVID-19 death toll surpassing 200,000. However, in the final weeks of
the election, Trump surpassed Biden in the net average sentiment, strongly suggesting that
Pennsylvania would likely swing toward Trump (see Figure 12). Before the election’s final
weeks, this state appeared to be consistently swinging toward Biden.
Table 6. Emotions toward presidential candidates in Pennsylvania.

Emotion Donald Trump Joe Biden


Anger 0.4651 0.3666
Disgust 0.3156 0.2304
Fear 0.5068 0.3660
Joy 0.3423 0.3495
Trust 0.6551 0.6115

Figure 12. Monthly-wise net sentiment analysis of both the presidential candidates in Pennsylvania.
4.6. Texas
In Texas, Trump garnered notably greater trust among people over Biden. At the same time, people showed slightly more anger, disgust, fear, and joy toward him (see Table 7). In the later part of October, after the second presidential debate, Biden experienced a substantial decline in the positive sentiment, consequently ceding ground to Trump, suggesting that Texas would swing toward Trump. From Figure 13, it is evident that the pandemic had no significant impact on the candidates. Even as the U.S. death toll surpassed 200,000, both candidates were experiencing an increase in the average net sentiment score, with Biden receiving a steeper rise, surpassing Trump in the latter part of September.

Table 7. Emotions toward presidential candidates in Texas.

Emotion Donald Trump Joe Biden


Anger 0.3986 0.3708
Disgust 0.2595 0.2233
Fear 0.4154 0.3849
Joy 0.3644 0.2875
Trust 0.6085 0.5477

Figure 13. Monthly-wise net sentiment analysis of both the presidential candidates in Texas.
4.7. Wisconsin
In Wisconsin, Trump experienced significantly more trust than Biden but also more anger and fear. On the other hand, Biden received slightly more disgust (see Table 8). Unlike in different states, Biden encountered a steep increase in the average net sentiment soon after he nominated Senator Kamala Harris as his running mate. However, as noted in Texas, Biden experienced a decrease in the positive sentiment following the second presidential debate, bringing both Biden and Trump into a close race in late October (see Figure 14). As the election date approached, Trump surpassed Biden on the average net sentiment score, suggesting that Wisconsin would likely favor Trump.

Table 8. Emotions toward presidential candidates in Wisconsin.

Emotion Donald Trump Joe Biden
Anger 0.4172 0.4013
Disgust 0.2811 0.2966
Fear 0.4261 0.3718
Joy 0.3473 0.3205
Trust 0.6083 0.5799

Figure 14. Monthly-wise net sentiment analysis of both the presidential candidates in Wisconsin.
Table 9 compares our study predictions with the outcome across the seven battleground states. The cells in Table 9 with bolded text indicate the battleground states where our predictions matched the outcomes. Out of the seven battleground states we analyzed, our predictions were accurate for four states (North Carolina, Texas, Arizona, and Michigan).
Table 9. Comparison of our study predictions with the final outcome.

Battleground State OUR PREDICTION Actual Outcome of the 2020 U.S. Presidential Election
Pennsylvania Likely Trump Biden
Florida Likely Biden Trump
North Carolina Likely Trump Trump
Wisconsin Likely Trump Biden
Texas Likely Trump Trump
Michigan Likely Biden Biden
Arizona Likely Biden Biden
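Tallying Table 9 programmatically makes the four-of-seven hit rate explicit; the dictionaries below simply transcribe the table.

```python
# Predictions vs. actual outcomes, transcribed from Table 9.
predictions = {
    "Pennsylvania": "Trump", "Florida": "Biden", "North Carolina": "Trump",
    "Wisconsin": "Trump", "Texas": "Trump", "Michigan": "Biden",
    "Arizona": "Biden",
}
outcomes = {
    "Pennsylvania": "Biden", "Florida": "Trump", "North Carolina": "Trump",
    "Wisconsin": "Biden", "Texas": "Trump", "Michigan": "Biden",
    "Arizona": "Biden",
}
# States where the prediction matched the outcome.
correct = sorted(s for s in predictions if predictions[s] == outcomes[s])
print(correct)                           # the four states predicted correctly
print(f"{len(correct)}/{len(predictions)} correct")
```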
A spider chart is a convenient way to visualize multivariate data. The spider chart in Figure 15 compares the media predictions and the actual outcome of the 2020 U.S. presidential election. The projections by the media were closer to the outcomes in Georgia, Nevada, Arizona, and North Carolina. However, the differences were huge in Wisconsin, Michigan, and Florida. Figure 16 compares our predictions against the actual outcomes. Our predictions were closer to the actual outcomes in North Carolina, Nevada, Wisconsin, Florida, and the U.S. However, our predictions varied mainly in Pennsylvania and Ohio. Our predictions were better than the media predictions for North Carolina, Nevada, Wisconsin, Florida, and the U.S. However, in Michigan, Pennsylvania, Georgia, and Ohio, the media predictions outperformed ours. In Arizona and Texas, both our predictions and the predictions by the media were comparable (see Figures 15 and 16).
Figure 15. Spider chart comparing the media predictions to the actual outcomes of the 2020 presidential election.

Figure 16. Spider chart comparing our predictions to the actual outcomes of the 2020 presidential election.

5. Discussion and Conclusions

In this study, we have demonstrated the potential and utility of the NRC classifier for emotion and sentiment classification. Using the NRC classifier, we have classified about seven million tweets related to the 2020 U.S. presidential election. In four battleground states (Arizona, Michigan, Texas, and North Carolina), we were able to understand the emotions and sentiments expressed by the supporters and have determined their swing directions. In North Carolina, Nevada, Wisconsin, Florida, and the U.S., our predictions were more accurate than the media predictions, suggesting that the emotions and sentiments expressed by individuals over the tweets have the potential to serve as proxies for their votes. The emotion and sentiment classification by NRC for each week before the elections corroborated well and accurately with the various political events that took place during that period, thus making it possible to understand the dynamics in the emotions and sentiments of the supporters. This study has evidently highlighted the potential of mining social media data and the wealth of information it holds. At the same time, advances in the big data infrastructure and technology have paved the way for capturing, storing, and processing large volumes of social media data from different sources. Together, they have made it possible to design and implement automated real-time predictive analytics systems. In sum, analyzing emotions and sentiments embedded on Twitter using a data-centric approach has provided valuable insights into understanding public sentiment in real time. We encourage refining these predictive models to help policymakers better understand important societal trends in order to make informed decisions and facilitate effective targeted interventions.

This study highlights the superior performance of the NRC classifier in identifying the emotions and sentiments expressed by individuals or a community toward the candidates
of the 2020 U.S. presidential election. We believe this approach of mining social media data and understanding the emotions and sentiments of an individual or a community has broad
applicability not just in predicting the outcomes of political events but also in studying
public sentiment surrounding social policy and public policy issues. Therefore, we believe
that this study could be a great case for assessing public sentiment regarding major party
platforms or ballot initiatives.

Author Contributions: Conceptualization, S.M.S. and Y.-F.P.; methodology, S.M.S.; validation, S.M.S.;
formal analysis, S.M.S.; investigation, S.M.S. and Y.-F.P.; data curation, S.M.S.; writing—original draft
preparation, S.M.S.; writing—review and editing, Y.-F.P. All authors have read and agreed to the
published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The original contributions presented in the study are included in the article.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Ling, R.; Baron, N.S. Text Messaging and I.M.: Linguistic Comparison of American College Data. J. Lang. Soc. Psychol. 2007, 26, 291–298.
[CrossRef]
2. Hasan, M.; Rundensteiner, E.; Agu, E. EMOTEX: Detecting Emotions in Twitter Messages. In Proceedings of the ASE BIG-
DATA/SOCIALCOM/CYBERSECURITY Conference, Stanford, CA, USA, 27–31 May 2014; pp. 1–10.
3. Russell, J.A. A Circumplex model of affect. J. Personal. Soc. Psychol. 1980, 39, 1161–1178. [CrossRef]
4. Pak, A.; Paroubek, P. Twitter as a corpus for sentiment analysis and opinion mining. In Proceedings of the Seventh International
Conference on Language Resources and Evaluation (LREC 2010), Valletta, Malta, 17–23 May 2010; pp. 1320–1326.
5. Barbosa, L.; Feng, J. Robust sentiment detection on Twitter from biased and noisy data. In Proceedings of the 23rd International
Conference on Computational Linguistics (COLING 2010), Beijing, China, 23–27 August 2010; pp. 36–44.
6. Go, A.; Bhayani, R.; Huang, L. Twitter Sentiment Classification Using Distant Supervision. In CS224N Project Report; Stanford
University Press: Redwood City, CA, USA, 2009; pp. 1–12.
7. Bryneilsson, J.; Johansson, F.; Jonsson, C.; Westling, A. Emotion classification of social media posts for estimating people’s
reactions to communicated alert messages during crises. Secur. Inform. 2014, 3, 7. [CrossRef]
8. Roberts, K.; Roach, M.A.; Johnson, J.; Guthrie, J.; Harabagiu, S.M. EmpaTweet: Annotating and Detecting Emotions on Twitter.
LREC 2012, 12, 3806–3813.
9. Bhowmick; Kumar, P.; Basu, A.; Mitra, P. Classifying emotion in news sentences: When machine classification meets human
classification. Int. J. Comput. Sci. Eng. 2010, 2, 98–108.
10. Chatterjee, A.; Gupta, U.; Chinnakotla, M.K.; Srikanth, R.; Galley, M.; Agrawal, P. Understanding emotions in text using deep
learning and big data. Comput. Hum. Behav. 2019, 93, 309–317. [CrossRef]
11. Yoon, K. Convolutional neural networks for sentence classification. arXiv 2014, arXiv:1408.5882v2.
12. Kalchbrenner, N.; Grefenstette, E.; Blunsom, P. A convolutional neural network for modelling sentences. arXiv 2014, arXiv:1404.2188v1.
13. Acharya, U.R.; Oh, S.L.; Hagiwara, Y.; Tan, J.H.; Adeli, H. Deep convolutional neural network for the automated detection and
diagnosis of seizure using EEG signals. Comput. Biol. Med. 2018, 100, 270–278. [CrossRef]
14. Hamdi, E.; Rady, S.; Aref, M. A Deep Learning Architecture with Word Embeddings to Classify Sentiment in Twitter. In Proceedings
of the International Conference on Advanced Intelligent Systems and Informatics, Cairo, Egypt, 19–21 October 2020; pp. 115–125.
15. Zhang, R.; Lee, H.; Radev, D. Dependency sensitive convolutional neural networks for modeling sentences and documents. arXiv
2016, arXiv:1611.02361.
16. Zhou, C.; Sun, C.; Liu, Z.; Lau, F. A C-LSTM neural network for text classification. arXiv 2015, arXiv:1511.08630.
17. Lai, S.; Xu, L.; Liu, K.; Zhao, J. Recurrent convolutional neural networks for text classification. In Proceedings of the Twenty-Ninth
Aaai Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015.
18. Abdul-Mageed, M.; Ungar, L. Emonet: Fine-grained emotion detection with gated recurrent neural networks. In Proceedings of
the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada,
30 July–4 August 2017; pp. 718–728.
19. Kratzwald, B.; Ilić, S.; Kraus, M.; Feuerriegel, S.; Prendinger, H. Deep learning for affective computing: Text-based emotion
recognition in decision support. Decis. Support Syst. 2018, 115, 24–35. [CrossRef]
20. Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C.D.; Ng, A.Y.; Potts, C. Recursive deep models for semantic compositional-
ity over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing,
Seattle, WA, USA, 18–21 October 2013; pp. 1631–1642.
21. Zhou, P.; Qi, Z.; Zheng, S.; Xu, J.; Bao, H.; Xu, B. Text classification improved by integrating bidirectional LSTM with two-
dimensional max pooling. arXiv 2016, arXiv:1611.06639v1.
Big Data Cogn. Comput. 2024, 8, 111 19 of 19

22. Czarnek, G.; Stillwell, D. Two is better than one: Using a single emotion lexicon can lead to unreliable conclusions. PLoS ONE
2022, 17, e0275910. [CrossRef] [PubMed] [PubMed Central]
23. Barnes, J. Sentiment and Emotion Classification in Low-resource Settings. In Proceedings of the 13th Workshop on Computational
Approaches to Subjectivity, Sentiment, & Social Media Analysis, Toronto, ON, Canada, 14 July 2023; pp. 290–304.
24. Peng, B.; Lee, L.; Vaithyanathan, S. Thumbs us? Sentiment classification using machine learning techniques. In Proceedings of the Seventh
Conference on Empirical Methods in Natural Language Processing (EMNLP-02), Philadelphia, PA, USA, 6–7 July 2002; pp. 79–86.
25. Lliou, T.; Anagnostopoulos, C.N. Comparison of Different Classifiers for Emotion Recognition. In Proceedings of the 13th
Panhellenic IEEE Conference on Informatics, Corfu, Greece, 10–12 September 2009; Available online: http://ieeexplore.ieee.org/
document/5298878/ (accessed on 10 September 2016).
26. Badshah, A.M.; Ahmad, J.; Lee, M.Y.; Baik, S.W. Divide-and-Conquer based Ensemble to Spot Emotions in Speech using MFCC
and Random Forest. In Proceedings of the 2nd International Integrated Conference & Concert on Convergence, Saint Petersburg,
Russia, 7–14 August 2016; pp. 1–8.
27. Ghazi, D.; Inkpen, D.; Szpakowicz, S. Hierarchical versus Flat Classification of Emotions in Text. In Proceedings of the NAACL
HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, Los Angels, CA, USA, 5 June
2010; pp. 140–146.
28. Aman, S.; Szpakowicz, S. Identifying Expressions of Emotion in Text. In Text, Speech and Dialogue; Springer: Berlin/Heidelberg,
Germany, 2007; Volume 4629, pp. 196–205.
29. Chaffar, S.; Inkpen, D. Using a Heterogeneous Dataset for Emotion Analysis in Text. In Advances in Artificial Intelligence, Proceedings of
the 24th Canadian Conference on Artificial Intelligence, St. John’s, Canada, 25–27 May 2011; Springer: Berlin/Heidelberg, Germany, 2011.
30. Purver, M.; Battersby, S. Experimenting with distant supervision for emotion classification. In Proceedings of the 13th EACL.
Association for Computational Linguistics, Avignon, France, 23–27 April 2012; pp. 482–491.
31. Choudhury, M.D.; Gamon, M.; Counts, S.; Horvitz, E. Predicting depression via social media. In Proceedings of the International
AAAI Conference on Weblogs and Social Media (ICWSM’13), Cambridge, MA, USA, 8–11 July 2013.
32. Thelwall, M.; Buckley, K.; Platoglou, G.; Kappas, A. Sentiment strength detection in short informal text. J. Am. Soc. Inf. Sci.
Technol. 2010, 61, 2544–2558. [CrossRef]
33. Rohini, V.; Thomas, M. Comparison of Lexicon based and Naïve Bayes Classifier in Sentiment Analysis. Int. J. Sci. Res. Dev. 2015,
3, 1265–1269.
34. Mohammad, S.; Turney, P. Emotions Evoked by Common Words and Phrases: Using Mechanical Turk to Create an Emotion
Lexicon. In Proceedings of the NAACL-HLT 2010 Workshop on Computational Approaches to Analysis and Generation of
Emotion in Text, Los Angeles, CA, USA, 5 June 2010.
35. Mohammad, S. Emotional Tweets. In Proceedings of the First Joint Conference on Lexical and Computational Semantics, Montréal,
QC, Canada, 7–8 June 2012.
36. Srinivasan, S.M.; Sangwan, R.S.; Neill, C.J.; Zu, T. Twitter data for predicting election results: Insights from emotion classification. IEEE Technol. Soc. Mag. 2019, 38, 58–63. [CrossRef]
37. Zhou, M.; Srinivasan, S.M.; Tripathi, A. Four-Class Emotion Classification Problem using Deep-Learning Classifiers. J. Big Data-Theory Pract. (JBDTP) 2022, 1, 42–50. [CrossRef]
38. Srinivasan, S.M.; Chari, R.; Tripathi, A. Modelling and Visualizing Emotions in Twitter Feeds. Int. J. Data Min. Model. Manag.
2021, 13, 337–350.
39. Srinivasan, S.M.; Ramesh, P. Comparing different Classifiers and Feature Selection techniques for Emotion Classification. Int. J.
Soc. Syst. Sci. 2018, 10, 259–284.
40. Srinivasan, S.M.; Sangwan, R.S.; Neill, C.J.; Zu, T. Power of Predictive Analytics: Using Emotion Classification of Twitter Data for
Predicting 2016 US Presidential Elections. J. Soc. Media Soc. 2019, 8, 211–230.
41. Srinivasan, S.M. Predictive modeling and visualization of emotions in Twitter feeds. In Proceedings of the 42nd Annual Meeting
of Northeastern Association of Business, Economics and Technology, University Park, PA, USA, 7–8 November 2019.
42. Stanton, J. An Introduction to Data Science. 2013. Available online: https://ia804509.us.archive.org/35/items/DataScienceBookV3/DataScienceBookV3.pdf (accessed on 10 June 2022).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.