Big Data Cogn. Comput. 2024, 8, 111
Article
A Data-Centric Approach to Understanding the 2020 U.S.
Presidential Election
Satish Mahadevan Srinivasan 1, * and Yok-Fong Paat 2
1 Engineering Department, Pennsylvania State University, Great Valley, Malvern, PA 19355, USA
2 Department of Social Work, The University of Texas at El Paso, El Paso, TX 79968, USA; [email protected]
* Correspondence: [email protected]
Abstract: The application of analytics on Twitter feeds is a popular field of research. A tweet
with a 280-character limitation can reveal a wealth of information on how individuals express their
sentiments and emotions within their network or community. Upon collecting, cleaning, and mining
tweets from different individuals on a particular topic, we can capture not only the sentiments and
emotions of an individual but also the sentiments and emotions expressed by a larger group. Using
the well-known Lexicon-based NRC classifier, we classified nearly seven million tweets across seven
battleground states in the U.S. to understand the emotions and sentiments expressed by U.S. citizens
toward the 2020 presidential candidates. We used the emotions and sentiments expressed within
these tweets as proxies for their votes and predicted the swing directions of each battleground state.
When compared to the actual outcome of the 2020 presidential election, we accurately predicted
the swing directions of four battleground states (Arizona, Michigan, Texas, and North Carolina), thus
revealing the potential of this approach in predicting future election outcomes. The week-by-week
analysis of the tweets using the NRC classifier aligned well with the various political events that
took place before the election, making it possible to understand the dynamics of the emotions and
sentiments of the supporters in each camp. These research strategies and evidence-based insights
may be translated into real-world settings and practical interventions to improve election outcomes.
1. Introduction
Social media encourages users to submit short messages, thus replacing the use of traditional
electronic documents and article styles for expressing views and opinions.
Twitter tweets are short messages, unlike conventional texts. They are very peculiar
in terms of size and structure. As indicated earlier, they are restricted to 280 characters.
Because of this limitation, users have to use very restrictive language to express their
feelings and emotions. The language used in tweets is very different from the language
used in other digitized documents such as blogs, articles, and news [1]. A large variety
of features (i.e., words) appears in these texts, which poses a significant challenge:
when tweets are represented as feature vectors, the feature space grows rapidly, and a
corpus on a single topic can contain millions of features [2]. Another significant challenge
lies in manually classifying the text within tweets into different emotion classes. Manual
annotation of tweets into emotion classes has been attempted previously, but it is not free
from ambiguity and does not guarantee 100% accuracy [2]. The inherent complexity of the
various emotional states, and the difficulty of differentiating the emotion classes from one
another, pose further challenges. According to the Circumplex model [3], human beings
exhibit 28 different types of affect. Russell proposed a two-dimensional circular space
model demonstrating that these 28 affect types differ from one another by only a small
angle, with several types clustered so closely together that it becomes difficult to
differentiate between them. Thus, it
becomes very difficult for humans to label those texts accurately. When humans try to label
these texts, there is a notable risk of mislabeling the emotions that are subtly different or
close to each other. This is a serious issue because it eventually inhibits the classifier from
learning the critical features that can be used to identify emotions hidden in the texts.
In this article, we focus on analyzing tweets collected during the 2020 presidential
election. Using the lexicon-based NRC classifier, we analyzed the emotions and sentiments
expressed by people toward the two presidential candidates, Donald Trump and Joe Biden,
on various topics. Based on these emotions and sentiments, we predicted the swing
direction of the 2020 presidential election in a subset of states deemed battleground and
key to the election. We begin with a short review of prior work on emotion classification,
then describe the materials and methods employed in this study, present the results and
discussion, and conclude with the scope for future work.
Literature Review
Studies related to sentiment and emotional classification have recently garnered consider-
able empirical attention. This popularity is due to the increase in the amount of unstructured
opinion-rich text resources from social media, blogs, and other textual corpora. These texts have
given researchers and companies access to the opinions of a larger group of individuals
around the globe. Meanwhile, the advances in ML and natural language processing (NLP)
have also sparked increased interest in sentiment and emotion classification. For example,
Hasan, Rundensteiner, and Agu (2014) have proposed the use of EMOTEX, which can detect
emotions in text messages. EMOTEX uses supervised classifiers for emotion classification.
Using Naïve Bayes (N.B.), Support Vector Machine (SVM), Decision trees, and the KNN
(k-nearest neighbor), they have demonstrated 90% precision for a four-class emotion classifi-
cation on the Twitter dataset [2]. Other studies, including the work by Pak et al. (2010) and
Barbosa et al. (2010), have considered using ML techniques on Twitter datasets. They both
have demonstrated accuracies ranging between 60% and 80% for distinguishing between
positive and negative classes [4,5]. Go et al. (2009) have also performed sentiment analysis
on the Twitter dataset using Western-style emoticons. They have used the N.B., SVM, and
Maximum Entropy and have reported an accuracy of 80% [6].
Furthermore, Brynielsson et al. (2014) have demonstrated close to 60% accuracy on a
four-class emotion (positive, fear, anger, and others) classification on the tweets related to
the Sandy hurricane using the SVM classifier [7]. Last but not least, Roberts et al. (2012)
have proposed EmpaTweet, which can be used to annotate and detect emotions on Twitter
posts. In their work, they have discussed developing a synthetic corpus containing tweets
for seven different emotion types (anger, disgust, fear, joy, love, sadness, and surprise).
On their constructed synthetic dataset, they used seven different binary SVM classifiers
and classified the tweets. Using their ensemble classification technique, they have classified
each tweet to determine if a particular emotion was present. In addition, they have reported
that their corpus contained tweets with multiple emotion labels [8].
Emotion and sentiment classification has been widely researched using various ma-
chine learning and deep learning techniques.
Bhowmick et al. (2010) performed an experiment where they observed that humans and
machine learning models exhibited a very similar level of performance for emotion and senti-
ment classification on multiple data sets. Therefore, they concluded that machine learning
models can be trusted for this task [9]. Chatterjee et al. (2019) also confirmed
through their study that methods employing Deep neural networks outperform other off-
the-shelf models for emotion classification in textual data [10]. Kim (2014) performed several
experiments on emotion classification using CNN on multiple benchmark datasets, including
the fine-grained Stanford Sentiment Treebank. A simple CNN with slight hyperparameter
tuning demonstrated excellent results for binary classifications of different emotions [11]. In a
work by Kalchbrenner et al. (2014), the Dynamic Convolutional Neural Network (DCNN) was
explored for sentiment classification on the Twitter dataset. According to the authors, the
DCNN can handle input texts of varying lengths in any language, and its use of Dynamic
k-Max Pooling makes it a promising method for sentiment analysis
of Twitter data [12]. Acharya et al. (2018) have explored emotion detection in EEG signals. In
their study, they have explored and demonstrated the potential of using the complex 13-layer
CNN architecture [13]. In one of the studies, Hamdi et al. (2020) utilized the CNN streams and
the pre-trained word embeddings (Word2Vec) to achieve a staggering 84.9% accuracy on the
Stanford Twitter Sentiment dataset [14]. In contrast, Zhang et al. (2016) have proposed
Dependency Sensitive Convolutional Neural Networks (DSCNN), which outperform traditional
CNNs. They have reported 81.5% accuracy in the sentiment analysis of Movie Review Data
(MR) [15]. Zhou et al. (2015) have proposed C-LSTM, which utilizes both the CNN and LSTM
for a 5-class classification task. However, they have only reported an accuracy of 49.2% [16].
Since emotion and sentiment classification is a sequence problem, several studies
have focused on exploring recurrent neural networks or RNNs. Lai et al. (2015) have
explored RNNs and have determined that RNNs have the capability to capture the key
features and phases in texts that can help boost performance for emotion and sentiment
classification [17]. Abdul-Mageed and Ungar (2017) have explored Gated RNN or GRNN
for emotion and sentiment classification in several dimensions and have demonstrated
significantly high accuracies [18]. Kratzwald et al. (2018) have explored six benchmark
datasets for emotion classification using the combination of both the RNN and sent2affect.
They have reported exceptional performance of this combination when compared against
any traditional machine learning algorithm [19].
Using the Recursive Neural Tensor Network (RNTN) for the famous Stanford Sen-
timent Treebank dataset (SST), Socher et al. (2013) have reported 85.4% accuracy for
sentiment classification [20]. Zhou et al. (2016) used the BLSTM-2DCNN architecture for the
Stanford Sentiment Treebank binary and fine-grained classification tasks, achieving
52.4% accuracy on the fine-grained task. In their study, they observed that the BLSTM-2DCNN architecture was
very efficient in capturing long-term sentence dependencies [21]. Czarnek et al. (2022)
used the Linguistic Inquiry and Word Count (LIWC) and NRC Word-Emotion Associa-
tion Lexicon (NRC) to investigate whether older people have more positive expressions
through their language use. They examined nearly five million tweets created by 3573
people between 18 and 78 years old and found that both methods show an increase in
positive affect until age 50. They also concluded that according to NRC, the growth of
positive affect increases steadily until age 65 and then levels off [22]. Barnes (2023)
has presented a systematic comparison of sentiment and emotion classification methods.
In this study, different methods for sentiment and emotion classification have been com-
pared, ranging from rule- and dictionary-based methods to recently proposed few-shot
and prompting methods with large language models. In this study, it has been reported
that in different settings—including the in-domain, out-of-domain, and cross-lingual—the
rule- and dictionary-based methods outperformed the few-shot and prompting methods in
low-resource settings [23].
There are three types of classifiers for emotion and sentiment classification: supervised,
unsupervised, and lexicon-based classifiers. Supervised classifiers are more commonly
used to address the emotion and sentiment classification problem [2,4–6,8,24–32]. However,
a training dataset is required to employ a supervised classifier for the classification problem.
More specifically, a domain-specific training dataset is required, and obtaining one for
the task at hand is hard, as it might not always be available.
Therefore, it is wise to explore unsupervised or lexicon-based classifiers.
Unsupervised classifiers are utilized to model the underlying structure or the distribution
of the data. Therefore, these algorithms are left on their own to discover and present interesting
patterns. Upon using unsupervised learning, users are left to look at those patterns and assign
the class labels. In the Lexicon-based approach, the aim is to identify certain patterns that
occur together with a seed list of sentiment/emotion-related words. More specifically, similar
sentiment/emotion-related words are identified from a large corpus with the same feature-
specific orientations. For this study, the unavailability of a domain-specific corpus was a
major challenge [33]. Therefore, we opted for the lexicon-based NRC classifier [34,35] to
determine the emotions and sentiments expressed within the collected tweets. In previous
works related to US presidential elections [36–41], it has been clearly demonstrated that the
NRC classifiers are best suited for emotion and sentiment classification of tweets compared to
several different supervised learning techniques.
2. Methods and Materials
2.1. Methodology
Collecting and mining Twitter feeds related to political events provides insight into the
opinion expressed by an individual or an entire community. In this particular context, the
Twitter feeds have the potential to serve as a proxy for the voter’s vote. These tweets also
have the potential to predict and understand all the major events related to the presidential
elections, ultimately leading to determining the outcomes. Srinivasan et al. (2019) clearly
demonstrated that the collected Twitter feeds relating to the 2016 presidential election had
the potential to predict the major events that led to the final outcome of the election [36].
In this study, we chose to collect data from a secondary source (social media sites);
specifically, we collected tweets from Twitter to address our research questions. Below, we
first introduce the lexicon-based NRC classifier used to classify the tweets and then
discuss the data collection process in detail.
2.1.1. NRC Classifier
The tweets collected for this study were classified using the NRC classifier, which we
implemented in R (version 3.6.0) using the Syuzhet package. The Syuzhet package exposes the
NRC classifier through the function get_nrc_sentiment(), which scores tweets across eight
emotions and two sentiments.
The NRC is a lexicon-based classifier with annotations for about 14,182 words. Using
these words, the NRC classifier can classify texts into eight different emotions—anger,
anticipation, disgust, fear, joy, sadness, surprise, and trust—as well as sentiments: negative
and positive. This lexicon corpus (words) of the NRC classifier was constructed based on
two measures, namely the Strength of Association (SOA) and Pointwise Mutual Information
(PMI). The use of the two measures mentioned above ensures that the lexicon corpus has
the potential to determine a particular emotion class in a sentence [34,35]. A significant
drawback of the NRC classifier is that it cannot classify those sentences that do not contain
the words that belong to the lexicon corpus.
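For reference, these two measures are commonly defined as follows (a sketch; the exact estimation details are in [34,35]). The PMI between a word w and an emotion category e measures how much more often they co-occur than expected by chance:

PMI(w, e) = log2 [ P(w, e) / (P(w) P(e)) ]

and the strength of association contrasts an emotion with its complement:

SOA(w, e) = PMI(w, e) - PMI(w, ¬e)

A word is taken to be associated with an emotion when its SOA is positive, i.e., when it co-occurs with that emotion more often than with its absence.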
Across each state, we determined each candidate's average net sentiment score, defined as
the fraction of that candidate's tweets with positive sentiment minus the fraction with
negative sentiment. We predicted that a state would swing in favor of the candidate who
received the higher net sentiment score. Similarly, we determined the average score for
each of the eight emotions as the fraction of tweets expressing that emotion.
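As a minimal sketch of this scoring rule (the tweet counts below are hypothetical, purely for illustration):

```python
# Sketch of the net sentiment scoring and swing prediction described above.
# The per-candidate tweet counts are hypothetical, for illustration only.

def net_sentiment(positive: int, negative: int, total: int) -> float:
    """Fraction of positive-sentiment tweets minus fraction of negative ones."""
    return positive / total - negative / total

def predict_swing(scores: dict) -> str:
    """The state is predicted to swing toward the candidate with the higher score."""
    return max(scores, key=scores.get)

# Hypothetical counts for one state: (positive, negative, total)
counts = {"Trump": (4200, 3900, 9000), "Biden": (2900, 2500, 6000)}
scores = {c: net_sentiment(*v) for c, v in counts.items()}
print(predict_swing(scores))  # Biden: 400/6000 > 300/9000
```

Note that the rule compares fractions rather than raw counts, so a candidate with fewer but proportionally more positive tweets can still win a state.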
2.1.2. Data Collection
To retrieve the tweets from Twitter, we developed an automated script. This script was
designed to retrieve tweets using both the Search API, which is part of the Twitter REST
API, and the built-in Twitter API package within the R Studio. To successfully execute
the automated script, we established a developer account on Twitter that provided us
with access to various Tokens and API keys. The automated script identified and used
appropriate handles for both candidates, Donald Trump and Joe Biden, to selectively
retrieve the tweets. Tweets were collected over eight months in 2020.
Using 20 different hashtags, a total of 7,653,518 tweets were collected for both presidential
candidates between 3 March 2020 and 30 October 2020. Of these hashtags, the most popular
were Trump, Trump2000, and Joe Biden. The unique number of tweeters who tweeted about
Donald Trump and Joe Biden was 6,258,545 and 1,394,973, respectively. Table 1 provides a
monthly distribution of the collected tweets. It is evident that at least five times as many
tweets were collected for Donald Trump as for Joe Biden (see Table 1).
Table 1. Month-wise distribution of tweets collected for the U.S. 2020 presidential candidates.
Once the tweets were collected, extensive cleaning was performed. To clean the tweets,
we used the gsub function from the stringr package in R. Stanton outlined the steps for
cleaning the tweets [42]. The focus of data cleaning was to remove unnecessary spaces,
URLs, retweet headers, hashtags, and references to other links.
We used the R statistical package cldr and identified that ~10% of the tweets were posted in
38 different languages. The rest of the tweets were in English. All the non-English tweets
were identified and filtered out.
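The paper performs this cleaning in R with gsub; a rough Python equivalent of the same steps might look as follows (the regex patterns and sample tweet are illustrative assumptions, not the authors' exact ones):

```python
import re

def clean_tweet(text: str) -> str:
    """Apply cleaning steps similar to those described above (illustrative)."""
    text = re.sub(r"^RT\s+@\w+:\s*", "", text)   # remove retweet header
    text = re.sub(r"https?://\S+", "", text)     # remove URLs and other links
    text = re.sub(r"@\w+", "", text)             # remove references to other users
    text = re.sub(r"#\w+", "", text)             # remove hashtags
    text = re.sub(r"\s+", " ", text).strip()     # collapse unnecessary spaces
    return text

print(clean_tweet("RT @someone: Great rally today! #Vote2020 https://t.co/abc"))
# Great rally today!
```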
Both data collection and cleaning took about ten hours per Twitter handle. The entire
workload was distributed evenly across several Google Cloud Computing (GCC) engines.
Two machines, each running seven different GCC virtual machines, took approximately
seven hours to collect the data. Once collected, the data was validated by a two-fold
mechanism. First, a Python script compared the daily tweets against those collected
through Twitter's "Streaming APIs" to confirm the completeness of the tweets' content
and attributes. Second, an R script using the same "Streaming APIs" ran the same
comparison against the main script on a weekly basis to confirm the completeness of the
collected tweets.
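Conceptually, each validation pass reduces to comparing the set of tweet IDs gathered by the main script against a reference collection for the same period; a simplified sketch (the IDs are made up):

```python
# Simplified sketch of the two-fold validation described above: compare tweet
# IDs from the main collection script against a reference collection (e.g.,
# one gathered via the Streaming APIs). The IDs here are made up.

def missing_ids(main: set, reference: set) -> set:
    """Tweets present in the reference collection but absent from ours."""
    return reference - main

main_collection = {"101", "102", "103"}
streaming_reference = {"101", "102", "103", "104"}

gaps = missing_ids(main_collection, streaming_reference)
print(sorted(gaps))  # IDs to re-fetch or flag as incomplete
```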
All the collected tweets were classified using the NRC classifier, which was imple-
mented in R using the Syuzhet package. This package contains an implementation of
an interface (function) called get_nrc_sentiment(). All the tweets were classified into one
of the eight different emotions and one of the two sentiments. For each tweet, the NRC
classifier gives a nominal value ranging between zero (low) and seven (high) across eight
emotions and two sentiments. We assigned a unique label to each tweet by determining
which emotion was prominent (high nominal value) within it. If a tweet had more than one
prominent emotion, then we duplicated those tweets as a separate instance and assigned all
the respective prominent emotions to it. While classifying the tweets, we also encountered
situations where tweets had no emotion or sentiment label attached. This was because the
NRC classifier assigned a zero (0) nominal value across eight emotions and two sentiments.
Such tweets were removed from further processing and analysis. In addition, we computed
each candidate's average net sentiment score and the average score for each of the eight
emotions, as described in Section 2.1.1, and used the net sentiment score to predict each
state's swing direction.
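A minimal sketch of this labeling rule (the score vectors are hypothetical stand-ins for get_nrc_sentiment() output):

```python
# Sketch of the prominent-emotion labeling rule described above. The score
# vectors are hypothetical stand-ins for get_nrc_sentiment() output (0-7).

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

def label_tweet(scores: dict) -> list:
    """Return one label per prominent (maximal, nonzero) emotion.

    A tweet with several tied prominent emotions is duplicated, one instance
    per emotion; a tweet whose scores are all zero is dropped (empty list).
    """
    top = max(scores[e] for e in EMOTIONS)
    if top == 0:
        return []  # no emotion detected: removed from further analysis
    return [e for e in EMOTIONS if scores[e] == top]

neutral = {e: 0 for e in EMOTIONS}
print(label_tweet(neutral))                            # []
print(label_tweet({**neutral, "joy": 3, "trust": 3}))  # ['joy', 'trust']
```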
3. Results
Using the NRC classifier, we determined the general population’s emotions and
sentiments expressed toward both presidential candidates in these tweets. Figure 1 shows
a higher overall positive sentiment for Biden (0.917) than for Trump (0.827). Overall, the
negative sentiment was similar, slightly more toward Trump (0.869) than Biden (0.859).
People expressed more anger toward Biden than Trump, but expressed more surprise and
disgust, and less trust, toward Trump.
Figure 1. Sentiment and emotion analysis of people toward the 2020 U.S. presidential candidates.
Figure 2 indicates that the average positive sentiment for Trump increased steadily
since June but decreased again in October as the election approached. On the other
hand, the average negative sentiment for Trump has been decreasing since mid-July,
while the negative sentiment for Biden has been increasing since August.
Figure 2. Month-wise analyses of positive and negative sentiments for both the 2020 U.S. presidential candidates.
Figure 3 shows the net sentiment (net sentiment = positive − negative) score for both
presidential candidates, and it is clear that Trump has been gaining ground steadily since
August compared to Biden and was holding a slight advantage.
Figure 3. Month-wise analyses of the net sentiment score for both the 2020 U.S. presidential candidates.
We analyze both candidates' stands related to healthcare, immigration, the economy, race
relations, trade and tariffs, foreign affairs, and climate change, which were essential to
the electorate in the 2020 election cycle. Based on Figure 4, Biden held an advantage on
most of the issues except for matters pertaining to trade and tariffs as well as race
relations. Under Trump's presidency, the US economy shrank by 4.8%, which dealt a
significant blow to Trump in the 2020 presidential election. In terms of the economy,
foreign affairs, and immigration, Trump's average net score was low until July, but then
it started picking up as the election got closer. With respect to trade and tariffs,
people showed more faith in Trump than in Biden.
Figure 4. Candidates' stand on the issues important to the electorate in the 2020 election cycle.
Figure 5 shows the overall sentiment for Trump and Biden and important events between
March and October 2020. Interestingly, the increase in unemployment had a much more
dramatic effect on Biden than on Trump. Trump's attempt to block John Bolton's book and
his views on delaying the 2020 presidential election impacted the overall sentiment he
garnered. By contrast, Joe Biden's selection of his vice presidential running mate did not
seem to have helped him. Overall, Trump held a slight edge over Biden since the first
presidential debate.
Figure 5. Month-wise analysis of the overall average sentiment for both the 2020 U.S. presidential candidates (blue for Biden, red for Trump) correlated with major events.

We also performed sentiment and emotion analyses on tweets collected from seven
battleground states (Texas, Florida, Arizona, Michigan, Wisconsin, Pennsylvania, and
North Carolina). A total of 1,507,525 tweets, collected since July 2020, were analyzed
across the seven battleground states. As depicted in Figure 6, a vast majority of the
tweets came from Texas and Florida, and Trump received relatively more tweets than Biden.
Figure 6. Distribution of tweets collected from the seven battleground states.
As shown in Figure 7, Biden secured a higher net sentiment score in Pennsylvania and
Arizona, but Trump was in a close race in North Carolina and Wisconsin and held a lead in
Florida, Michigan, and Texas.
4. Analysis of the Battleground States
Figure 8. Month-wise net sentiment analysis of both the presidential candidates in Arizona.
Table 2. Emotions toward both the presidential candidates in Arizona.
Figure 10. Month-wise net sentiment analysis of both the presidential candidates in Michigan.
4.4. North Carolina
In North Carolina, Trump held a significant lead over Biden in trust but also received more disgust, anger, and fear (see Table 5). We also observed a substantial increase in the net sentiment toward Trump since September, even though the official death toll due to the COVID-19 pandemic had surpassed 200,000. The number of tweets collected from North Carolina was comparable for the two candidates, with more tweets collected for Trump (~9k for Trump vs. ~6k for Biden). Although we noticed a substantial rise in the net sentiment for Biden toward the end of September and the beginning of October, it was followed by a steep drop in positive sentiment for Biden, suggesting that North Carolina would swing toward Trump (see Figure 11).

Table 5. Emotions toward presidential candidates in North Carolina.

Emotion    Donald Trump    Joe Biden
Anger      0.4248          0.3453
Disgust    0.2966          0.2124
Fear       0.4430          0.3232
Joy        0.3685          0.2959
Trust      0.6402          0.5153
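The emotion proportions reported in Table 5 (and in the analogous tables for the other states) come from lexicon lookups: a tweet counts toward an emotion when at least one of its tokens carries that emotion in the NRC word–emotion lexicon. A minimal sketch of this scoring follows, using a tiny hypothetical mini-lexicon in place of the full NRC resource; the lexicon entries and function names here are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of NRC-style lexicon scoring. MINI_LEXICON is a hypothetical
# stand-in for the NRC EmoLex, which maps thousands of words to eight
# emotions and two sentiment polarities.
from collections import Counter

MINI_LEXICON = {
    "great":  {"joy", "trust", "positive"},
    "afraid": {"fear", "negative"},
    "fraud":  {"anger", "disgust", "fear", "negative"},
    "win":    {"joy", "trust", "positive"},
}

def score_tweets(tweets):
    """Return the fraction of tweets expressing each emotion/sentiment."""
    counts = Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        # A tweet counts once per emotion, however many lexicon hits it has.
        emotions = set().union(*(MINI_LEXICON.get(t, set()) for t in tokens))
        counts.update(emotions)
    n = len(tweets)
    return {emo: c / n for emo, c in counts.items()}

scores = score_tweets([
    "great rally tonight",
    "afraid of election fraud",
    "we will win big",
])
```

Dividing by the number of tweets yields per-emotion proportions that are comparable across candidates, which is how values such as 0.6402 (trust toward Trump) can be read.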
Figure 11. Monthly-wise net sentiment analysis of both the presidential candidates in North Carolina.
4.5. Pennsylvania
In Pennsylvania, both candidates experienced similar levels of trust, with Trump slightly ahead of Biden. At the same time, Trump also drew more disgust, anger, and fear among people in Pennsylvania (see Table 6). Trump encountered considerable negative sentiment in this state for a variety of reasons, including his suggestion that the election be delayed, his recommendation that voters in North Carolina vote twice, and the official COVID-19 death toll surpassing 200,000, among others. Before the election's final weeks, the state appeared to be consistently swinging toward Biden. In the final weeks, however, Trump surpassed Biden in the net average sentiment, strongly suggesting that Pennsylvania would likely swing toward Trump (see Figure 12).
Table 6. Emotions toward presidential candidates in Pennsylvania.
Figure 12. Monthly-wise net sentiment analysis of both the presidential candidates in Pennsylvania.
4.6. Texas
In Texas, Trump garnered notably greater trust than Biden. At the same time, people showed slightly more anger, disgust, fear, and joy toward him (see Table 7). In the later part of October, after the second presidential debate, Biden experienced a substantial decline in positive sentiment, consequently ceding ground to Trump and suggesting that Texas would swing toward Trump. From Figure 13, it is evident that the pandemic had no significant impact on either candidate: even as the U.S. death toll surpassed 200,000, both candidates experienced an increase in the average net sentiment score, with Biden receiving a steeper rise and surpassing Trump in the latter part of September.
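The monthly curves in Figures 10–14 follow from a simple aggregation: each tweet contributes its positive hits minus its negative hits, and the scores are averaged per calendar month. A sketch of this aggregation follows, under the assumption that each record is a (date, positive-hit, negative-hit) tuple; this data shape is illustrative, not the paper's exact pipeline.

```python
# Sketch of monthly net-sentiment aggregation. Each record is assumed to be
# (ISO date string, positive-word hits, negative-word hits) for one tweet.
from collections import defaultdict

def monthly_net_sentiment(records):
    """Average (positive - negative) per tweet, grouped by calendar month."""
    sums, counts = defaultdict(float), defaultdict(int)
    for date, pos, neg in records:
        month = date[:7]               # "YYYY-MM"
        sums[month] += pos - neg
        counts[month] += 1
    return {m: sums[m] / counts[m] for m in sums}

trend = monthly_net_sentiment([
    ("2020-09-03", 2, 1),
    ("2020-09-21", 0, 2),
    ("2020-10-05", 3, 1),
])
```

Plotting one such series per candidate gives the month-by-month comparison used throughout Section 4 to judge which way a state was leaning.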
Figure 13. Monthly-wise net sentiment analysis of both the presidential candidates in Texas.
4.7. Wisconsin
In Wisconsin, Trump experienced significantly more trust than Biden but also more anger and fear. On the other hand, Biden received slightly more disgust (see Table 8). Unlike in the other states, Biden encountered a steep increase in the average net sentiment soon after he named Senator Kamala Harris as his running mate. However, as noted in Texas, Biden experienced a decrease in positive sentiment following the second presidential debate, bringing both Biden and Trump into a close race in late October (see Figure 14). As the election date approached, Trump surpassed Biden on the average net sentiment score, suggesting that Wisconsin would likely favor Trump.

Table 8. Emotions toward presidential candidates in Wisconsin.

Emotion    Donald Trump    Joe Biden
Anger      0.4172          0.4013
Disgust    0.2811          0.2966
Fear       0.4261          0.3718
Joy        0.3473          0.3205
Trust      0.6083          0.5799
Figure 14. Monthly-wise net sentiment analysis of both the presidential candidates in Wisconsin.
Table 9 compares our study predictions with the outcomes across the seven battleground states. The cells in Table 9 with bolded text indicate the battleground states where our predictions matched the outcomes. Out of the seven battleground states we analyzed, our predictions were accurate for four states (North Carolina, Texas, Arizona, and Michigan).
Table 9. Comparison of our study predictions with the final outcome.
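The comparison in Table 9 amounts to a per-state equality check between the predicted and actual swing directions. A sketch follows with illustrative values: only states whose outcomes are discussed above are included, and this is not the paper's full Table 9.

```python
# Sketch of the prediction-vs-outcome comparison behind Table 9.
# The dictionaries below are illustrative placeholders, not the full table.
def compare_predictions(predicted, actual):
    """Return the matched states and the fraction of correct predictions."""
    matched = [s for s in predicted if predicted[s] == actual[s]]
    return matched, len(matched) / len(predicted)

predicted = {"Arizona": "Biden", "Michigan": "Biden",
             "North Carolina": "Trump", "Texas": "Trump",
             "Pennsylvania": "Trump"}
actual    = {"Arizona": "Biden", "Michigan": "Biden",
             "North Carolina": "Trump", "Texas": "Trump",
             "Pennsylvania": "Biden"}
matched, acc = compare_predictions(predicted, actual)
```

In this sketch, Pennsylvania is the miss (predicted Trump, won by Biden), mirroring the kind of mismatch the bolded cells in Table 9 highlight.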
Figure 15. Spider chart comparing the media predictions to the actual outcomes of the 2020 presidential election.
Figure 16. Spider chart comparing our predictions to the actual outcomes of the 2020 presidential election.
of the 2020 U.S. presidential election. We believe this approach of mining social media data to understand the emotions and sentiments of an individual or a community has broad applicability, not just in predicting the outcomes of political events but also in studying public sentiment surrounding social and public policy issues. This study could therefore serve as a strong case for assessing public sentiment regarding major party platforms or ballot initiatives.
Author Contributions: Conceptualization, S.M.S. and Y.-F.P.; methodology, S.M.S.; validation, S.M.S.;
formal analysis, S.M.S.; investigation, S.M.S. and Y.-F.P.; data curation, S.M.S.; writing—original draft
preparation, S.M.S.; writing—review and editing, Y.-F.P. All authors have read and agreed to the
published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The original contributions presented in the study are included in the article.
Conflicts of Interest: The authors declare no conflicts of interest.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.