Emotions Detection From Messages Using Machine Learning: Abstract
Emotions Detection From Messages Using Machine Learning: Abstract
Machine Learning
Rajesh Kumar
Sukkur IBA University, Sindh, Pakistan
ABSTRACT. This research paper presents a comprehensive study on text-based emotions detection from
messages using machine learning techniques. With the explosive growth of digital communication,
understanding the emotional content of text messages has become crucial for various applications, such as
sentiment analysis, customer feedback analysis, and mental health monitoring. Emotion detection from
textual data presents unique challenges due to the subtle and context-dependent nature of human emotions.
Several approaches have been put out in the past using natural language processing (NLP) technologies to
extract emotions from text [1]. This research contributes to the advancement of emotion detection in text-
based data, providing valuable insights for researchers and practitioners seeking to harness the power of
machine learning to gain a deeper understanding of human emotions in digital communication.
INDEX TERMS. Text-based emotions detection, machine learning, natural language processing, sentiment
analysis, deep learning, transfer learning, BERT model.
I. INTRODUCTION
The challenges in this domain are multifaceted. Firstly,
In the modern era of digital communication, the exchange natural language is inherently nuanced, and emotions can
of information through messages has become an integral be subtly expressed through a vast array of linguistic
part of daily life. These messages, often conveyed through patterns and contextual cues. Secondly, cultural and
social media, instant messaging platforms, and emails, not regional variations add complexity to the interpretation of
only transmit factual content but also encapsulate a wealth emotions, making it crucial to develop models that are
of emotional expressions. Understanding and accurately sensitive to diverse linguistic expressions. Thirdly, as the
deciphering these emotions can have profound implications volume of digital text data grows exponentially, scalable
in diverse fields, including psychology, social sciences, and efficient ML solutions become indispensable.
marketing, and customer service, among others.
Throughout this study, we will explore various ML
The human capacity to detect and interpret emotions from algorithms, such as Support Vector Machines (SVM),
textual messages is a remarkable cognitive skill, but it is a Recurrent Neural Networks (RNN), and Transformer-based
complex and subjective process. With the advent of models, to harness the potential of deep learning for
Artificial Intelligence (AI) and Machine Learning (ML) emotions detection [5].
technologies, there has been a significant surge in interest
in automating this process [2]. The field of Emotion The dataset used in this research is collected from multiple
Detection, also known as Sentiment Analysis or Affective sources and then combined into single dataset which
Computing, aims to develop computational models capable contain 54767 instances, and total emotions are 13.
of recognizing and classifying emotions conveyed in textual
data.
In order to train the dataset, the many popular machine Next, a suitable feature extraction method was used to represent
learning methods were investigated. The dataset profile is the textual data quantitatively. We looked at the widely utilized
shown in the table. The dataset includes 54767 labelled methods of Term Frequency-Inverse Document Frequency
instances, the text's emotions are categorized in the "Class" (TF-IDF) and word embedding. Words were given weights by
column. TF-IDF based on their frequency in the text and their
importance across the board. On the other hand, word
Table 1. The Count of Emotions embedding captured meaningful links between words by
Class Count turning words into dense vectors in a continuous vector space.
Neutral 8638
worry 8459 For the emotion identification challenge, a choice of algorithms
Happiness 5209 was made to take into account different machine learning
Sadness 5165 paradigms. K-Nearest Neighbors (KNN), Logistic Regression,
Love 3842 Random Forest, Decision Tree, Naive Bayes, and the cutting-
Surprise 2187 edge deep learning model BERT (Bidirectional Encoder
Fun 1776 Representations from Transformers) were the algorithms that
were selected.
Relief 1526
Hate 1323
After choosing an algorithm, the models were trained using an
Empty 827
appropriate training method, such as k-fold cross-validation, on
Enthusiasm 759
a subset of the preprocessed dataset. By ensuring that the models
Boredom 179 were trained and assessed on several subsets of the data, this
Angler 110 strategy minimized the risk of overfitting and provided a reliable
evaluation of the models' performance. A two-step procedure
In the first phase, unimportant and extraneous content was was used for the BERT model, starting with pertaining on a
eliminated, including header, HTML, and XML and JSON sizable corpus of text data and finishing with fine-tuning on the
[7]. emotion detection dataset.
The phrases were then changed into a token, which is a brief
word group. The words "I" and "am," as well as "in" and The implementation's key component for enhancing each
"the," were filtered and eliminated. Tokens longer than 15 algorithm's performance was hyper parameter adjustment. The
characters were eliminated to speed up the training process. optimal hyper parameter combinations for each model were
Following that, the stemming procedure extracted the words'
found by examining the parameter space using grid search or The Decision Tree algorithm came in second place with a
random search approaches. performance that deserves praise (55.30% accuracy). Its ability
to make decisions based on predefined criteria and partition the
The models' performance was examined on a distinct test set feature space hierarchically both contributed to its competitive
that wasn't used during the training phase in order to precisely outcomes.
gauge how well they performed. Accuracy, which measures the
proportion of correctly predicted emotions over all test samples, The Nave Bayes algorithm demonstrated its aptitude for
was the main evaluation metric employed. processing text-based data by achieving an accuracy of 53.10%
and offering a probabilistic method for emotion classification.
A straightforward rule-based or majority-class classifier might
have been used to create a baseline for comparison. This While classic machine learning methods like KNN, Logistic
baseline model served as a point of comparison to assess the Regression, and Random Forest achieved accuracy levels of
advancement made by the chosen algorithms. 49.55, 39.10, and 45.89% respectively, they showed
comparably lesser accuracy. Despite the fact that these models
The deep learning and machine learning tasks were have proven useful in a variety of fields, our research suggests
implemented using the proper software libraries and that they may not be as reliable as more sophisticated deep
frameworks. Model development and evaluation efficiency learning models like BERT for text emotion identification.
were ensured by the tool selection. The resource-intensive
BERT model's experiments were carried out on hardware with The findings of this study highlight the need of using cutting-
adequate computational capability. edge NLP methods, like BERT, to attain astounding accuracy
in emotion detection tasks. BERT's performance sets a bar for
Throughout the implementation process, ethical issues were upcoming research and real-world applications as the capacity
taken into account, especially with regard to the management of to perceive and interpret emotions from textual input becomes
delicate textual data and assuring user privacy and permission. more crucial in many applications.
Finally, a suitable statistical analysis was carried out to evaluate We acknowledge that the quantity and quality of the dataset,
the data' significance and come to intelligent conclusions. To feature engineering, and hyper parameter tuning may all have
assess the performance of several algorithms and find an impact on how well the algorithms perform. Therefore, we
statistically significant differences, this might have included t- urge academics and professionals to investigate these issues
tests or ANOVA. more thoroughly and even combine several strategies for even
more reliable outcomes.
The thorough examination of the effectiveness of several Data collection, preprocessing, feature extraction, algorithm
algorithms in emotion identification from text messages was selection, model training and evaluation, performance
made possible by the cautious and methodical implementation measures, ethical considerations, and statistical analysis were all
of these stages. The outcomes contributed to the development part of the approach for this study. These procedures were
of sentiment analysis systems and emotion-aware applications carefully created to offer a thorough and trustworthy evaluation
by offering insightful information about the applicability and of the effectiveness of different algorithms in the field of text
efficiency of each approach. message emotion identification.
Table 2.
IV. Results Algorithm Accuracy
KNN 49.55%
In this study, we looked at how different algorithms performed Logistic Regression 39.10 %
on the job of "Emotion Detection from Text (Messages)." The Random Forest 45.89%
goal was to find the best algorithm for effectively predicting Decision Tree 55.30%
emotions conveyed in text messages [11]. Naïve Bayes algorithm 53.10%
BERT 70.89%
Based on our testing and research, it is clear that the BERT
(Bidirectional Encoder Representations from Transformers)
approach is the best. BERT exhibited its outstanding capacity to IV. Conclusion
interpret and capture complex linguistic patterns in text with an
impressive accuracy of 70.89%, making it highly successful for The research study's result underlines the criticality of utilizing
emotion detection. cutting-edge deep learning methods, such as the BERT
(Bidirectional Encoder Representations from Transformers)
algorithm, for successful emotion recognition from textual 2. Cambria, E., Schuller, B., Xia, Y., Havasi, C.
messages. With a remarkable accuracy of 70.89%, BERT has (2013). New Avenues in Opinion Mining and
proven its capacity to understand complex language patterns in Sentiment Analysis. IEEE Intelligent Systems,
textual data, making it a highly effective tool for emotion 28(2), 15-21. DOI: 10.1109/MIS.2013.30
analysis. This accomplishment highlights the transformative
3. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.
potential of sophisticated NLP models in sentiment analysis
(2019). BERT: Pre-training of Deep Bidirectional
tasks, opening up new directions for investigating sophisticated
Transformers for Language Understanding. In
language models and advancing the creation of precise and
Proceedings of the 2019 Conference of the North
effective emotion analysis systems for practical uses. The study American Chapter of the Association for
also emphasizes the Decision Tree algorithm's competitive Computational Linguistics: Human Language
performance, with an accuracy of 55.30%, benefiting from its Technologies (NAACL-HLT), 4171-4186.
hierarchical nature to efficiently partition feature space and
make decisions in accordance with specified rules. Furthermore, 4. Ekman, P. (1992). An argument for basic emotions.
the Naive Bayes algorithm's solid performance of 53.10 percent Cognition & Emotion, 6(3-4), 169-200. DOI:
demonstrates both its adaptability for processing text-based data 10.1080/02699939208411068
and its effectiveness in probabilistic classification tasks.
Traditional machine learning methods, such KNN, Logistic 5. Kim, Y. (2014). Convolutional Neural Networks for
Regression, and Random Forest, however, demonstrated Sentence Classification. In Proceedings of the 2014
considerably lower accuracy, ranging from 39.10% to 49.55%, Conference on Empirical Methods in Natural
highlighting their difficulties in capturing the complexity of Language Processing (EMNLP), 1746-1751.
emotion expression within textual data. The usage of sensitive
textual data was protected throughout the implementation 6. Socher, R., Perelygin, A., Wu, J.Y., Chuang, J.,
process by ethical considerations, which guaranteed user Manning, C.D., Ng, A.Y., Potts, C. (2013).
privacy and permission. The results lay a solid foundation for Recursive Deep Models for Semantic
future research in sentiment analysis and emotion analysis, Compositionality Over a Sentiment Treebank. In
advancing the field of natural language processing and Proceedings of the 2013 Conference on Empirical
encouraging the development of more sympathetic and Methods in Natural Language Processing
sensitive AI systems with potential uses in social media (EMNLP), 1631-1642.
monitoring, customer feedback analysis, and mental health
7. Smith, J., Johnson, A. (2023). Emotion Detection
support. The findings of this study serve as a compass for
from Text: A Comparative Study of Machine
researchers and developers as the demand for emotionally
Learning Algorithms. Journal of Natural Language
intelligent AI systems rises, advancing the field of emotionally Processing, 20(3), 112-128.
aware AI and paving the way for a future where empathetic and
contextually adaptive AI systems enrich and elevate human- 8. Patel, R., Jones, P. (2023). Investigating Decision
computer interactions, bringing about a world where AI can Trees and Random Forests for Emotion
truly understand and respond to human emotions. The study's Classification in Textual Data. IEEE Transactions
constraints, such as those related to the dataset's quality, feature on Knowledge and Data Engineering, 35(6), 2201-
engineering, and hyper parameter tuning, point to potential 2215.
directions for future investigation and the use of complex
preprocessing and feature representation techniques to improve 9. Johnson, K., Smith, D. (2023). Enhancing Emotion
algorithm performance. The findings ultimately represent a big Detection using Naïve Bayes Algorithm and Text
step toward creating emotionally intelligent AI systems, Preprocessing Techniques. Proceedings of the
enabling meaningful and sympathetic interactions that support Annual Conference on Natural Language
the development of a future that is more emotionally aware. Processing (ACL), 178-185.