INTRODUCTION
Enhancing the accuracy and robustness of emotion detection systems is the driving force behind
this research. In order to develop a model that can successfully understand and analyse human
emotions in a variety of contexts, this study aims to leverage the strengths of BiLSTM networks. The
findings of this study have the potential to advance the field of emotion detection, which would
increase the efficacy and versatility of systems in a wide range of applications.
It is expected that this new method will greatly improve the performance of emotion detection
systems, boosting the effectiveness of applications that rely on emotion detection. Improved
emotion identification could lead to more sympathetic and responsive systems in human-computer
interaction and improve patient monitoring and support in the healthcare industry. It can make
marketing campaigns more impactful and individualized.
Taking everything into account, the proposed BiLSTM-based model might enhance emotion
identification and contribute to the development of increasingly complex and human-like systems.
1.5 Scope of the Study
The scope of this study includes the development, implementation, and evaluation of a
Bidirectional Long Short-Term Memory (BiLSTM) network for emotion detection from textual
data. The research will involve collecting and preprocessing publicly available datasets containing
emotional content, designing a tailored BiLSTM model architecture, and optimizing its
performance through training and hyperparameter tuning. The model's effectiveness will be
assessed using standard evaluation metrics such as accuracy, precision, recall, and F1-score, with
comparisons to baseline models, including traditional machine learning methods and unidirectional
LSTM networks.
CHAPTER TWO
LITERATURE REVIEW
This chapter gives a thorough summary of the methods and body of research that have been done
on emotion detection, outlining the development and current status of the topic. The chapter opens
with an examination of the basic ideas and relevance of emotion detection, highlighting its use in a
range of fields like marketing, healthcare, and human-computer interaction. The limitations of
conventional methods for emotion recognition are then discussed, including rule-based systems
and classical machine learning techniques. The discussion then shifts to the development of deep
learning techniques, with an emphasis on Long Short-Term Memory (LSTM) networks and their
benefits over conventional models.
Virtual assistants, for instance, can provide empathetic responses, increasing user satisfaction and
engagement. Furthermore, the application of emotion detection technology can enhance the user's
virtual reality and gaming experience by creating more immersive and engaging environments that
react quickly to the player's feelings.
2.1.2 Applications of Emotion Detection
Applications for emotion recognition are numerous and span many different fields, greatly
improving the calibre and efficacy of interactions and services. In order to develop more intuitive,
sympathetic, and effective systems, these applications take advantage of the capacity to identify
and react to human emotions (Abinaya & Vadivu, 2024).
Emotion detection is essential for creating sophisticated user interfaces and interactive systems in
human-computer interaction (HCI) (Kumar, Das & Singh, 2024). Systems like chatbots, virtual
assistants, and customer support platforms can provide more individualised and sympathetic
responses by comprehending and reacting to consumers' emotional states (Tang, Yuan & Zhang,
2024).
To increase user happiness and decrease frustration, a virtual assistant that recognises displeasure
in a user's voice, for example, can offer further assistance or refer the matter to a human agent. By
adapting the game environment or storyline to the player's emotional state, emotion detection in
virtual reality and gaming can produce more immersive experiences.
Healthcare: Emotion detection is essential in the medical field for therapeutic interventions,
mental health evaluation, and patient monitoring. Early indications of mental health conditions like
stress, anxiety, and depression can be provided by systems that are able to identify emotional
changes (Saffar, Mann & Ofoghi, 2023). Continuous remote monitoring is made possible by
emotion detection technology built into telehealth platforms, which enable medical professionals
to act quickly when needed (Francese & Attanasio, 2023). Additionally, these systems can support mental health treatments
and enhance the effectiveness of therapy sessions by offering real-time feedback and customised
recommendations based on patients' emotional reactions (Sajno et al., 2023).
Education: Emotion detection can enhance educational experiences by creating adaptable and
customised learning environments. By monitoring students' emotional states, teachers can
determine when they are frustrated, confused, or disengaged and adjust their teaching strategies
accordingly (Santos, 2023). This could lead to a more supportive learning environment, improved
learning outcomes, and more student engagement (Aggarwal, 2023). For example, an intelligent
tutoring system that detects a student's confusion may adjust the difficulty of the content or
provide further clarifications.
Customer Service: By allowing systems to react more sympathetically to client requirements, emotion
detection can greatly enhance customer service. Emotion detection-enabled customer support
solutions can recognise when clients are angry or unsatisfied and give priority to those contacts for
a quicker resolution (Guo et al., 2024). According to Rane (2023), these technologies can also give
customer care representatives real-time information about the customer's emotional state, which
enables them to better customise their responses and raise customer satisfaction levels.
Social Media Analysis: Since it may be used to determine public opinion on a range of subjects,
occasions, and brands, emotion detection is also useful in social media analysis (Rodríguez-Ibánez
et al., 2023). Businesses and organisations can learn about public opinion and take preemptive
measures to address new trends and challenges by examining the emotional tone of social media
postings and comments (Hung & Alias, 2023). This can help guide community participation
initiatives, crisis management plans, and public relations campaigns.
2.1.3 Traditional Techniques for Emotion Detection
Conventional methods for detecting emotions have mostly depended on traditional machine
learning techniques and rule-based systems. Rule-based systems determine emotional states from
textual input by using specified linguistic rules and lexicons (García-Méndez et al., 2023).
According to Kusal et al. (2023), these systems function by comparing input text to a
predetermined set of rules that are associated with particular emotions. Terms like "sad" or "angry"
imply a negative mood, while terms like "happy" or "joyful" may reflect a positive emotional state.
Rule-based systems can offer rapid insights on the text's emotional content and are comparatively
easy to deploy (Denecke & Reichenpfader, 2023).
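As a rough illustration of how such a lexicon-and-rules approach operates, the sketch below matches input words against a small emotion lexicon; the word lists and the majority-vote rule are invented for illustration and not drawn from any cited system:

```python
# Minimal sketch of a lexicon-based emotion detector (illustrative only).
# The lexicon entries and the majority-vote rule are hypothetical examples.
EMOTION_LEXICON = {
    "happy": "joy", "joyful": "joy", "delighted": "joy",
    "sad": "sadness", "miserable": "sadness",
    "angry": "anger", "furious": "anger",
}

def detect_emotion(text: str) -> str:
    """Return the most frequent lexicon emotion found in the text."""
    counts: dict[str, int] = {}
    for word in text.lower().split():
        emotion = EMOTION_LEXICON.get(word.strip(".,!?"))
        if emotion:
            counts[emotion] = counts.get(emotion, 0) + 1
    # If no rule fires, fall back to a neutral label.
    return max(counts, key=counts.get) if counts else "neutral"

print(detect_emotion("I am so happy and delighted today!"))  # -> joy
```

Even this toy example exposes the weakness discussed next: the rules fire only on exact word matches and cannot account for negation, irony, or context.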
However, because they are built on fixed rules, such systems are inflexible and frequently unable to deal with the
complexities and subtleties of human emotions. Their inability to comprehend irony, context, and
the nuances of natural language results in imprecise or simplistic emotion recognition (Garg &
Saxena, 2024).
Text is classified into different emotional categories using models that are trained on labelled
datasets in traditional machine learning techniques for emotion detection. Naive Bayes classifiers,
decision trees, and support vector machines (SVM) are examples of common algorithms (Machová
et al., 2023). To represent the input text, these models usually use manually created features
such as part-of-speech tags, n-grams, and word frequency. By learning from data, traditional machine
learning techniques have outperformed rule-based systems, yet they nonetheless
encounter considerable obstacles (Cai, Li & Li, 2023). To choose the most pertinent features,
feature engineering is a laborious procedure that calls for domain knowledge (Li et al., 2023).
Furthermore, the complex and non-linear nature of human emotional expressions may not be
adequately captured by these models, which frequently presume linear connections between
features and emotions (Soujanya-Rao et al., 2023).
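To make this feature-engineering workflow concrete, the sketch below shows a typical baseline of this kind: TF-IDF n-gram features feeding a linear SVM in scikit-learn. The tiny training set is invented for illustration:

```python
# Sketch of a classical ML emotion classifier: hand-crafted n-gram
# features (TF-IDF) plus a linear SVM. The toy dataset is hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["I am so happy today", "This makes me furious",
         "I feel miserable and alone", "What a joyful surprise"]
labels = ["joy", "anger", "sadness", "joy"]

# Word unigrams and bigrams stand in for manual feature engineering.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)

print(model.predict(["I am angry about this"]))  # e.g. ['anger']
```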
The incapacity of rule-based and conventional machine learning techniques to manage context and
long-term dependencies in text is one of their main intrinsic drawbacks (Hornyák, 2023). Accurate
emotion identification depends on capturing the context in which words and phrases are used, as
this frequently affects human emotions. Systems that rely on rules, with their static and
predetermined rules, are inevitably constrained in this sense (Cheng et al., 2024). Even though they are
more adaptable, classical machine learning models still have challenges effectively incorporating
contextual information, especially when working with longer texts or conversations.
Furthermore, both strategies struggle with adaptation and generalisation. To stay up with changing
language usage and new emotional expressions, rule-based systems need to be updated and
maintained frequently (Kusal et al., 2023). Conversely, traditional machine learning models may
experience overfitting, which occurs when they function well on training data but poorly on
unknown data (Aliferis & Simon, 2024). Since there can be a wide range of emotional
manifestations in various circumstances and people, this is especially troublesome when it comes
to emotion recognition.
2.1.4 Long Short-Term Memory (LSTM) Networks
Developments in deep learning techniques, especially the advent of Long Short-Term Memory
(LSTM) networks, have greatly influenced the progress of emotion detection. By utilising neural
networks that can learn from sequential data, LSTM networks constitute a paradigm change in
contrast to conventional rule-based systems and classical machine learning techniques (Zhou et al.,
2024). Because LSTM networks are so good at identifying patterns and relationships across time,
they are especially well-suited for applications like time series analysis and natural language
processing that require comprehending and forecasting event sequences (Gülmez, 2023).
The capacity of LSTM networks to manage the innate complexity and long-term interdependence
found in human emotional expressions is one of their main advantages (Mahadevaswamy &
Swathi, 2023). Contextual subtleties and the temporal evolution of emotions across longer texts or
conversations are frequently difficult for traditional models to handle. By keeping a memory cell
that can hold data for extended periods of time and updating and retaining data selectively
depending on the input sequence, LSTM networks overcome these difficulties (Aslan, 2024). This
feature enables more precise and complex emotion identification by enabling LSTM networks to
record correlations between words and phrases that extend over sentences or even entire
documents (Kumar et al., 2023).
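The gating mechanism described above can be summarised by the standard LSTM update equations (given here in their common textbook form, with $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication, $x_t$ the input, $c_t$ the memory cell, and $h_t$ the hidden state):

$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$ (forget gate)
$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$ (input gate)
$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$ (output gate)
$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$ (candidate memory)
$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$ (cell-state update)
$h_t = o_t \odot \tanh(c_t)$ (hidden state)

The forget and input gates are what allow the cell to selectively retain or discard information over long spans, as described above.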
Additionally, LSTM networks remove the requirement for manual feature engineering by
providing flexibility in learning representations straight from the data. Without direct human
assistance, LSTM networks may automatically learn hierarchical representations of text and adjust
to a variety of language patterns and emotional expressions (Cahuantzi, Chen & Güttel, 2023).
Because emotions can be communicated through a variety of cultural and linguistic idiosyncrasies,
this makes them more resilient and flexible across domains and languages (Olah, 2023).
When compared to conventional methods, the use of LSTM networks in emotion recognition has
shown notable gains in performance and accuracy. Learning from language's natural structure and
processing text as sequential data, LSTM networks are able to pick up on minute contextual clues
and semantic relationships that support precise emotion categorisation (Zhao et al., 2021).
Understanding the sentiment and emotional tone of vast amounts of textual data is crucial for
corporate intelligence and decision-making, and this capability is especially useful in real-world
applications like sentiment analysis in social media.
2.1.5 Bidirectional Long Short-Term Memory (BiLSTM)
Bidirectional Long Short-Term Memory (BiLSTM) networks represent a substantial development
in sequential data processing, particularly in tasks requiring a thorough grasp of context and
dependencies throughout time, such as emotion detection (Bin et al., 2016). BiLSTM networks
enhance the capabilities of regular LSTM networks by processing input sequences in both forward
and backward directions simultaneously. A more comprehensive and nuanced representation of the
sequential data is produced by the model's ability to capture both past dependencies (from left to
right) and future dependencies (from right to left) thanks to this bidirectional processing (Elfaik &
Nfaoui, 2020).
Figure 2.1: Architecture of a BiLSTM Network
Figure 2.1 shows the two LSTM layers that make up a BiLSTM network's architecture, each of
which processes the input sequence in a different way. While the backward LSTM layer processes
the sequence in reverse order, the forward LSTM layer reads the input sequence from start to finish
(Sherstinsky, 2020). Each LSTM layer contains memory cells that maintain states over time and
gates that control the flow of information through the network. These memory cells enable
BiLSTM networks to capture long-term dependencies in the data by selectively remembering and
forgetting information based on the input sequence (DiPietro & Hager, 2020).
During the training phase, the outputs of both the forward and backward LSTM layers are
concatenated at each time step or pooled together to form a combined representation of the input
sequence (Van-Houdt, Mosquera & Nápoles, 2020). This combined representation integrates
information from both directions, effectively enhancing the model's ability to understand context
and capture semantic relationships within the data. The bidirectional aspect of BiLSTM networks
is particularly helpful in applications where the meaning of a sequence is impacted by surrounding
context, such as in natural language processing and sentiment analysis (Ahmed et al., 2021).
BiLSTM networks are well-suited for emotion detection tasks because they excel in capturing
contextual information that is crucial for interpreting emotional expressions accurately (El-Amir et
al., 2020). By processing text in both directions, BiLSTM networks can consider the context from
preceding and subsequent words or phrases, thereby improving the model's sensitivity to the
nuanced ways in which emotions are expressed in language (Latif et al., 2020). This capability
makes BiLSTM networks highly effective in scenarios where emotions may be subtly conveyed
through complex interactions between words and sentences.
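To make the bidirectional processing concrete, the following is a minimal Keras sketch of such a network; the input shape, layer sizes, and seven-class output are illustrative assumptions rather than a prescribed configuration:

```python
# Sketch of a BiLSTM emotion classifier in Keras (illustrative sizes).
from tensorflow.keras import layers, models

model = models.Sequential([
    # Input: a sequence of feature vectors (e.g. word embeddings),
    # here assumed to be 100 time steps of 300-dimensional vectors.
    layers.Input(shape=(100, 300)),
    # The Bidirectional wrapper runs one LSTM forward and one backward
    # over the sequence and concatenates their final hidden states.
    layers.Bidirectional(layers.LSTM(128)),
    layers.Dense(64, activation="relu"),
    # Assumed seven emotion classes.
    layers.Dense(7, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The Bidirectional wrapper's default merge mode concatenates the forward and backward hidden states, mirroring the combined representation described above.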
Miao (2023) applied a BiLSTM model to the emotion analysis of social network users to capture
emotional nuances and enhance the accuracy of analysis. The BiLSTM technique, which yielded a
remarkable F1 score of 97.32% after 100 training iterations, also significantly reduced function
loss to 1.33%, minimising its negative impact on the quality of emotion detection.
The study evaluates twenty-one conventional machine learning, ensemble learning, and deep
learning models to identify five emotional states (anger, fear, disgust, happiness, and sadness)
from diary entries.
Oyebode et al. (2023) acknowledge the detrimental effects of negative emotions like anger, fear,
and sadness on physiological functioning and resistance, and they also highlight the benefits of
positive emotions like happiness on both physical and mental well-being. The researchers' dual
objectives are (1) to develop emotion detection models using machine learning (ML) and deep
learning techniques trained on datasets that accurately represent real-life experiences, and (2) to
create an emotion-adaptive mobile health (mHealth) application that offers interventions to
manage difficult situations, promote positive emotions, and encourage behavioural modifications
for improved mental well-being, with a primary focus on the first
goal. The study's next step is to integrate the MCBiLSTM model, which combines three
Convolutional Neural Network (CNN) channels with a Bidirectional Long Short-Term Memory
(BiLSTM) network, to deliver timely, emotion-based therapeutic interventions to enhance mental
health and overall well-being. According to Omarov & Zhumanov (2023), emotion analysis is
important in a number of fields, such as sentiment analysis, customer feedback monitoring, and
mental health evaluation. The paper evaluates traditional machine learning and deep learning
methods, highlighting their limitations in capturing complex and extensive relationships in text,
and addresses them with the Bi-LSTM (bidirectional long short-term memory) paradigm. By applying
the power of recurrent neural networks (RNNs) to the analysis of past and future textual contexts, this Bi-LSTM
model provides a more thorough understanding of emotional content. The forward and backward
LSTM layers are integrated to help the model effectively learn semantic word representations and
their links. Because it emphasises the importance of individual words inside a phrase, the model's
interpretability and performance are enhanced by adding an extra attention mechanism. The
suggested Bi-LSTM model beat numerous state-of-the-art baseline techniques, including support
vector machines, naive Bayes, CNNs, and conventional LSTMs, according to extensive testing
conducted on the Kaggle Emotion Sensor dataset. However, the model's sole reliance on textual
information emphasises the necessity of using voice-based detection to increase the
effectiveness and precision of emotion recognition.
In order to identify emotions in textual data, Cahyani et al. (2022) investigated the use of
Convolutional Neural Networks (CNN) in conjunction with Bidirectional Long Short-Term
Memory (BiLSTM) networks. The purpose of the study was to discuss the significance of
identifying feelings conveyed through written language on social media. The efficacy of
Word2Vec and GloVe word embeddings in the suggested CNN-BiLSTM architecture was
assessed on a number of datasets, including commuter line, Transjakarta, and a combined dataset.
Two scenarios were used in the study's trials: scenario I, which separates texts with and without
emotion, and scenario II, which classifies feelings into five groups: surprise, fear, anger, sadness,
and happiness. In terms of accuracy, precision, recall, and F1-measure, the Word2Vec-CNN-
BiLSTM model continuously beats the GloVe-CNN-BiLSTM model in all datasets and scenarios,
according to the study's findings. The Word2Vec-CNN-BiLSTM model obtained accuracy rates of
84.34%, 83.73%, and 83.88% on the commuter line, combined, and Transjakarta datasets,
respectively. Compared with earlier models such as Word2Vec-BiLSTM, these findings
show notable advancements. In order to improve the model's effectiveness, the study highlights the
need for more accuracy improvements and distinctive features, like incorporating text-to-speech-
based emotion recognition, even though its results are encouraging.
signs make up the complicated raw speech signals (Cai, Li & Li, 2023). These signals must be
processed to extract the pertinent features needed to recognise emotions. According to Al-
Dujaili and Ebrahimi-Moghadam (2023), time-domain speech inputs are transformed into
frequency-domain representations using techniques like the Fourier Transform, which reveals
patterns associated with various emotions. The power spectrum of a speech signal is captured by
Mel-Frequency Cepstral Coefficients (MFCC), another signal processing technique that is very
useful for simulating the human auditory system's way of perceiving sound (Sidhu, Latib & Sidhu,
2024). Through the extraction of variables such as pitch, energy, and spectral properties, the
processed speech signals yield a rich set of information that may be used to reliably classify and
forecast emotional states using machine learning models such as BiLSTM networks (Basak et al.,
2023). To create reliable emotion recognition systems, this conversion from raw data to feature set
is necessary.
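As a rough illustration of the time-domain-to-frequency-domain transformation described above, the NumPy sketch below recovers the component frequencies of a synthetic two-tone signal; the tone frequencies and sampling rate are invented for the example:

```python
# Sketch: moving a signal from the time domain to the frequency domain
# with the FFT. The 440 Hz / 880 Hz test tones are arbitrary examples.
import numpy as np

sr = 16000                                # assumed sampling rate (Hz)
t = np.arange(0, 1.0, 1 / sr)             # one second of samples
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

spectrum = np.abs(np.fft.rfft(signal))    # magnitude spectrum
freqs = np.fft.rfftfreq(len(signal), 1 / sr)

# The two largest peaks recover the component frequencies.
peaks = freqs[np.argsort(spectrum)[-2:]]
print(sorted(peaks))                      # -> [440.0, 880.0]
```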
2023). A more complex depiction of emotional states is made possible by this approach, which
was put forward by scholars such as James Russell. Excitement and anger, for instance, may both be
highly aroused but have different valences (Reitsema et al., 2023). It is feasible to record the
minute differences and intensities of emotional experiences by mapping emotions on various
dimensions. This is particularly valuable for applications that demand a thorough comprehension
of emotional dynamics, such as affective computing and user-experience research (Faul, Baumann
& LaBar, 2023).
Models for emotion detection are developed using the foundational framework provided by
emotion theory. Discrete emotion theory is used in speech emotion recognition to classify speech
signals into fundamental emotional categories (Garcia-Garcia et al., 2023). By identifying the
acoustic and prosodic cues that correspond to these emotions, feature extraction techniques allow
the model to identify patterns that are specific to each category (Wagner et al., 2024). In contrast,
dimensional emotion theory guides the creation of models that map speech characteristics onto
valence and arousal dimensions, thereby capturing a wider variety of emotional states. This makes
it possible to analyse emotional expressions in a more flexible and thorough way, which improves
the model's capacity to identify minute emotional nuances.
2.4 Research Gaps
Table 2.1 shows that the use of Bidirectional Long Short-Term Memory (BiLSTM) models for
emotion detection has advanced significantly. Still, there are a few areas that require further
research. In order to improve the precision and efficacy of emotion detection systems, existing
research mostly focuses on the analysis of written data, ignoring the potential to incorporate
multiple data sources, such as auditory or physiological signals. Furthermore, even though
BiLSTM has shown promise in capturing temporal correlations, its efficiency and interpretability
need to be improved, especially for real-time applications. Additionally, more study is required to
examine how well BiLSTM models manage large datasets and their ability to adjust to a variety of
emotional expressions across several cultural contexts. Thus, the goal of this research is to close these gaps
by developing a model that uses BiLSTM to identify emotions. Multiple input types will be
incorporated into this model, which will also increase accuracy in a range of emotional contexts
and improve interpretability, efficiency, and scalability. As a result, it will help advance
affective computing and its practical applications.
Table 2.1: Some Recent Literature Reviewed

1. Omarov & Zhumanov (2023). Methodology: Bi-LSTM model for text emotion detection. Findings: Bi-LSTM outperforms traditional ML and DL methods in emotion detection from text; F1 score improvements with attention mechanism. Established gap(s): reliance on textual content alone; need for integration of voice-based detection to enhance model efficiency.

2. Cahyani et al. (2022). Methodology: CNN-BiLSTM model with Word2Vec and GloVe embeddings for text emotion detection. Findings: Word2Vec-CNN-BiLSTM outperforms GloVe-CNN-BiLSTM in both binary and multi-class emotion classification scenarios. Established gap(s): requirement for further accuracy improvements and introduction of text-to-speech features for enhanced performance.

3. Miao (2023). Methodology: BiLSTM model for emotion analysis of social network users. Findings: achieved F1 score of 97.32% and reduced function loss to 1.33% after 100 training iterations. Established gap(s): inability to capture complex, multi-layered emotions; need for deeper emotional analysis to enhance precision.

4. Oğuz, Alkan, & Schöler (2023). Methodology: BiLSTM, SVM, and feedforward neural networks for emotion detection from ECG signals. Findings: BiLSTM achieved 78.28% accuracy for valence and 83.61% for arousal categories, outperforming other algorithms. Established gap(s): need for novel features and improvement in accuracy, including emotion detection from audio recordings.

5. Oyebode et al. (2023). Methodology: multichannel CNN-BiLSTM model for detecting emotions from journal entries. Findings: MCBiLSTM model achieved an F1-score of 81.1%, outperforming 21 classical ML and DL models. Established gap(s): integration of the model into mHealth applications for real-time, personalized emotion-based therapeutic interventions.

6. Troiano, Oberländer, & Klinger (2023). Methodology: dimensional modeling of emotions in text with appraisal theories. Findings: assessment of whether an event is novel and explanation of which emotions develop based on an event. Established gap(s): methods adopted were corpus creation, annotation reliability, and prediction.

7. A review on emotion detection by using deep learning techniques.
CHAPTER THREE
RESEARCH METHODOLOGY
3.1.1 Dataset Acquisition
Speech samples from trustworthy sources reflecting a range of emotions are collected during the
dataset acquisition phase. Finding and choosing databases with pre-labeled emotional speech
recordings, such as RAVDESS or IEMOCAP, is necessary for this. To address particular
emotional states or situations that are under-represented in current datasets, bespoke datasets may
also be produced or added to. The objective is to compile a thorough, well-balanced dataset that
ensures diversity and representativeness by capturing a wide range of emotional expressions. To
increase the dataset's size and improve the model's generalisability and robustness, data
augmentation techniques could be used to introduce variances like noise or speed fluctuations.
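A brief sketch of such augmentation using the librosa library is shown below; the file path, noise amplitude, and stretch factor are hypothetical values chosen for illustration:

```python
# Sketch: simple speech data augmentation — additive noise and
# time stretching. File path and parameter values are hypothetical.
import librosa
import numpy as np

y, sr = librosa.load("speech_sample.wav", sr=16000)

# Additive Gaussian noise at an arbitrary low amplitude.
noisy = y + 0.005 * np.random.randn(len(y))

# Speed fluctuation: slow the utterance to 90% of its original rate.
stretched = librosa.effects.time_stretch(y, rate=0.9)
```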
3.1.2 Data Preprocessing
To get ready for feature extraction, the raw voice data is cleaned and transformed during
preprocessing. In order to improve the quality of the voice signals, noise reduction algorithms are
used to eliminate artefacts and background noise from the recordings. After that, the audio is
normalised to guarantee consistency by standardising the volume levels across samples. The
purpose of segmentation is to divide continuous speech into more comprehensible, smaller frames
or segments. If required, the data is also transformed to a consistent sample rate, and silence
removal may be used to highlight the speech's vocal portions. This stage guarantees that the input
data is consistent, clean, and in a feature extraction-ready format.
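A minimal sketch of these preprocessing steps using the librosa library is shown below; the target sample rate, silence threshold, and frame length are illustrative assumptions:

```python
# Sketch: cleaning and standardising a raw recording before
# feature extraction. Values such as top_db=25 are assumptions.
import librosa

# Load and resample to a consistent rate (assumed 16 kHz).
y, sr = librosa.load("raw_recording.wav", sr=16000)

# Normalise amplitude so volume levels are comparable across samples.
y = librosa.util.normalize(y)

# Trim leading/trailing silence to emphasise the vocal portions.
y, _ = librosa.effects.trim(y, top_db=25)

# Segment into one-second frames for downstream processing.
frame_len = sr  # one second of samples
segments = [y[i:i + frame_len] for i in range(0, len(y), frame_len)]
```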
3.1.3 Feature Extraction
Meaningful characteristics are extracted from the preprocessed speech signals to be used as inputs
for the BiLSTM model in the feature extraction step. Prosodic features, such as pitch, energy, and
speech rate, are extracted to represent the emotional aspects of speech patterns, and acoustic
features, such as MFCCs, spectral contrast, and formants, are computed to capture the frequency
and spectral properties of the speech. These features are then combined into a feature vector for
each speech segment. The extraction process converts raw audio data into a structured format that
captures the emotional cues required for the model to classify the speech effectively.
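The sketch below illustrates one way such acoustic and prosodic features might be extracted and stacked with librosa; the particular feature set, pitch range, and file path are illustrative assumptions:

```python
# Sketch: extracting MFCC, spectral-contrast, pitch, and energy
# features and stacking them into one feature matrix (illustrative).
import librosa
import numpy as np

y, sr = librosa.load("segment.wav", sr=16000)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # (13, T)
contrast = librosa.feature.spectral_contrast(y=y, sr=sr)  # (7, T)
f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)             # pitch track
rms = librosa.feature.rms(y=y)                            # energy (1, T)

# Align frame counts and stack into one (features x time) matrix.
T = min(mfcc.shape[1], contrast.shape[1], len(f0), rms.shape[1])
features = np.vstack([mfcc[:, :T], contrast[:, :T],
                      f0[None, :T], rms[:, :T]])
print(features.shape)  # e.g. (22, T)
```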
3.1.4 Application of the BiLSTM Model
The BiLSTM architecture processes the feature vectors in both forward and backward directions,
allowing it to capture temporal dependencies and contextual information in the speech data. The
model learns to identify patterns and relationships between the features and the corresponding
emotional labels through iterative training, where hyperparameters like learning rate, batch size,
and the number of BiLSTM layers are tuned to optimise the model’s performance. Applying the
BiLSTM model entails training the model on the extracted features to accurately recognise
emotions. Once the model is trained, it can predict emotions based on the input features.
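A compact sketch of this training setup is given below; the array shapes, number of layers, learning rate, and batch size are placeholder values standing in for the tuned hyperparameters, and the random arrays stand in for real extracted features:

```python
# Sketch: training a BiLSTM on padded feature sequences. Shapes,
# hyperparameters, and the data arrays are illustrative placeholders.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

n_samples, max_frames, n_feats, n_classes = 256, 200, 22, 7
X = np.random.rand(n_samples, max_frames, n_feats)   # stand-in features
y = np.random.randint(0, n_classes, size=n_samples)  # stand-in labels

model = models.Sequential([
    layers.Input(shape=(max_frames, n_feats)),
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.Bidirectional(layers.LSTM(64)),   # two stacked BiLSTM layers
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, batch_size=32, epochs=5, validation_split=0.2)
```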
3.1.5 Performance Evaluation
A separate test dataset is used to evaluate the effectiveness of the trained BiLSTM model, and
evaluation metrics such as accuracy, precision, recall, and F1-score are computed by comparing
the model's predictions with the actual emotional labels. These metrics give information about the
model's overall performance and its capacity to accurately identify particular emotional classes,
and confusion matrices are frequently used to visualise the types of errors the model makes, such
as confusion between similar emotions. This step highlights the model's strengths and areas for
improvement, directing further improvements.
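These metrics and the confusion matrix can be computed with scikit-learn, as in the brief sketch below; the label arrays are placeholders for the test set's actual and predicted labels:

```python
# Sketch: computing accuracy, precision, recall, F1, and a confusion
# matrix for the test set. y_true / y_pred are placeholder arrays.
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

y_true = [0, 1, 2, 2, 1, 0, 2]   # actual emotion labels (illustrative)
y_pred = [0, 1, 2, 1, 1, 0, 2]   # model predictions (illustrative)

print("Accuracy:", accuracy_score(y_true, y_pred))
# Per-class precision, recall, and F1-score.
print(classification_report(y_true, y_pred))
# Rows = actual classes, columns = predicted classes; off-diagonal
# entries reveal confusions between similar emotions.
print(confusion_matrix(y_true, y_pred))
```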
3.1.6 Report
A thorough record of the study's findings and contributions to the field is provided by the report
phase, which documents the entire process and results of the study. It includes a detailed
explanation of the methodology, from dataset acquisition and preprocessing to feature extraction,
model application, and performance evaluation. The report gives the results of the evaluation
metrics and contrasts the proposed BiLSTM-based system with other emotion detection
approaches. It also discusses the
There are 535 utterances in the database overall. The seven emotions listed in the EMODB
database are: 1) anger; 2) boredom; 3) anxiety/fear; 4) happiness; 5) sadness; 6) disgust; and 7) neutral.
The data was down-sampled to 16 kHz after being collected at a 48 kHz sampling rate.
From the Kaggle website, a secondary EMODB dataset will be obtained. The German emotional
database is openly accessible and is called EMODB. The database was created by the Institute of
Communication Science at the Technical University of Berlin, Germany. Data was recorded by ten
professional speakers, five of whom were men and five of whom were women.
where $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are the hidden states of the forward and backward LSTM layers, respectively.
Concatenation of Hidden States
The hidden states from both directions are concatenated to form a combined representation $h_t$ for each time step $t$:
$h_t = [\overrightarrow{h}_t \, ; \, \overleftarrow{h}_t]$ (3.5)
Emotion Prediction
During inference, the trained BiLSTM model predicts the emotion class for each time step $t$ by selecting the class with the highest probability:
$\hat{y}_t = \arg\max_{c} \hat{y}_{t,c}$ (3.8)
By following this mathematical framework, the study aims to develop a BiLSTM-based model
capable of accurately detecting emotions from speech signals. The model leverages the strengths
of bidirectional processing to capture contextual information and long-term dependencies, thereby
enhancing the accuracy and robustness of emotion recognition.
F-Measure Metric
A statistic known as the "F-measure" can be created by combining the two metrics of precision and
recall. The harmonic mean of two numbers "x" and "y" is close to the smaller of the two.
Accordingly, the high F-measure value guarantees that the Precision and Recall values are both
fairly high (Tarun, 2020).
$F = \dfrac{2rp}{r + p} = \dfrac{2 \cdot TP}{2 \cdot TP + FP + FN}$
Mean Absolute Error (MAE) Metric
The MAE is rather easy to calculate. To calculate the "total error," the magnitudes (absolute
values) of the errors are added up, and the total error is then divided by n (Res et al., 2005).
Consequently, a model with a lower mean-absolute-error number was thought to perform better,
whereas a model with a higher MAE value performed worse.
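In symbols, with $y_i$ the actual value, $\hat{y}_i$ the model's prediction, and $n$ the number of samples:

$\mathrm{MAE} = \dfrac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$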
Metric of True Positive Rate
It measures the fraction of positive examples correctly predicted by the classifier. To understand it
with an example, let’s say we are trying to search for documents that contain the term ‘machine
learning’ in a corpus of 100 documents. The number of relevant documents for ‘machine learning’
are 20 out of the 100. Suppose the model returns 15 documents when queried for the term 'machine
learning', of which 12 are actually relevant. According to Tarun (2020), the recall is then 12 / 20
= 60%.
$\text{Sensitivity} = \text{TP Rate} = \dfrac{TP}{TP + FN} = \text{Recall}$
REFERENCES
Abinaya, M., & Vadivu, G. (2024). Enhancing the Potential of Machine Learning for Immersive
Emotion Recognition in Virtual Environment. EAI Endorsed Transactions on Scalable
Information Systems.
Ahmed, S., Saif, A. S., Hanif, M. I., Shakil, M. M. N., Jaman, M. M., Haque, M. M. U., ... &
Sabbir, H. M. (2021). Att-BiL-SL: Attention-based Bi-LSTM and sequential LSTM for
describing video in the textual formation. Applied Sciences, 12(1), 317.
Al Maruf, A., Khanam, F., Haque, M. M., Jiyad, Z. M., Mridha, F., & Aung, Z. (2024). Challenges
and Opportunities of Text-based Emotion Detection: A Survey. IEEE Access.
Aliferis, C., & Simon, G. (2024). Overfitting, Underfitting and General Model Overconfidence and
Under-Performance Pitfalls and Best Practices in Machine Learning and AI. In Artificial
Intelligence and Machine Learning in Health Care and Medical Sciences: Best Practices and
Pitfalls (pp. 477-524). Cham: Springer International Publishing.
Alqahtani, G., & Alothaim, A. (2022). Predicting emotions in online social networks: challenges
and opportunities. Multimedia Tools and Applications, 81(7), 9567-9605.
Al-Qerem, A., Raja, M., Taqatqa, S., & Sara, M. R. A. (2024). Utilizing Deep Learning Models
(RNN, LSTM, CNN-LSTM, and Bi-LSTM) for Arabic Text Classification. In Artificial
Intelligence-Augmented Digital Twins: Transforming Industrial Operations for Innovation
and Sustainability (pp. 287-301). Cham: Springer Nature Switzerland.
Alslaity, A., & Orji, R. (2024). Machine learning techniques for emotion detection and sentiment
analysis: current state, challenges, and future directions. Behaviour & Information
Technology, 43(1), 139-164.
Basak, S., Agrawal, H., Jena, S., Gite, S., Bachute, M., Pradhan, B., & Assiri, M. (2023).
Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech
Signal Processing Algorithms, Tools and Systems. CMES-Computer Modeling in
Engineering & Sciences, 135(2).
Bin, Y., Yang, Y., Shen, F., Xu, X., & Shen, H. T. (2016, October). Bidirectional long-short term
memory for video description. In Proceedings of the 24th ACM international conference on
Multimedia (pp. 436-440).
Brandt, A. (2023). Noise and vibration analysis: signal analysis and experimental procedures.
John Wiley & Sons.
Cahuantzi, R., Chen, X., & Güttel, S. (2023, July). A comparison of LSTM and GRU networks for
learning symbolic sequences. In Science and Information Conference (pp. 771-785). Cham:
Springer Nature Switzerland.
Cahyani, D. E., Wibawa, A. P., Prasetya, D. D., Gumilar, L., Akhbar, F., & Triyulinar, E. R.
(2022, October). Text-Based Emotion Detection using CNN-BiLSTM. In 2022 4th
International Conference on Cybernetics and Intelligent System (ICORIS) (pp. 1-5). IEEE.
Cai, Y., Li, X., & Li, J. (2023). Emotion recognition using different sensors, emotion models,
methods and datasets: A comprehensive review. Sensors, 23(5), 2455.
Cheng, Y., Zhang, C., Zhang, Z., Meng, X., Hong, S., Li, W., ... & He, X. (2024). Exploring large
language model based intelligent agents: Definitions, methods, and prospects. arXiv preprint
arXiv:2401.03428.
Denecke, K., & Reichenpfader, D. (2023). Sentiment analysis of clinical narratives: a scoping
review. Journal of Biomedical Informatics, 140, 104336.
DiPietro, R., & Hager, G. D. (2020). Deep learning: RNNs and LSTM. In Handbook of medical
image computing and computer assisted intervention (pp. 503-519). Academic Press.
El-Amir, H., Hamdy, M., El-Amir, H., & Hamdy, M. (2020). Sequential models. Deep Learning
Pipeline: Building a Deep Learning Model with TensorFlow, 415-446.
Elfaik, H., & Nfaoui, E. H. (2020). Deep bidirectional LSTM network learning-based sentiment
analysis for Arabic text. Journal of Intelligent Systems, 30(1), 395-412.
Faul, L., Baumann, M. G., & LaBar, K. S. (2023). The representation of emotional experience
from imagined scenarios. Emotion, 23(6), 1670.
Francese, R., & Attanasio, P. (2023). Emotion detection for supporting depression
screening. Multimedia Tools and Applications, 82(9), 12771-12795.
Frye, R. H. (2022). Granular Emotion Detection for Multi-Class Sentiment Analysis in Social
Media (Doctoral dissertation, The University of North Carolina at Charlotte).
Garcia-Garcia, J. M., Lozano, M. D., Penichet, V. M., & Law, E. L. C. (2023). Building a three-
level multimodal emotion recognition framework. Multimedia Tools and Applications, 82(1),
239-269.
Garg, M., & Saxena, C. (2024). Emotion detection from text data using machine learning for
human behavior analysis. In Computational Intelligence Methods for Sentiment Analysis in
Natural Language Processing Applications (pp. 129-144). Morgan Kaufmann.
Gideon, J., Nguyen, H. T., & Burgt, S. V. (2020). Emotion recognition from short speech using
ensemble deep learning. Applied Sciences, 10(16), 5663.
https://doi.org/10.3390/app10165663
Gozuacik, N., Sakar, C. O., & Ozcan, S. (2023). Technological forecasting based on estimation of
word embedding matrix using LSTM networks. Technological Forecasting and Social
Change, 191, 122520.
Gülmez, B. (2023). Stock price prediction with optimized deep LSTM network with artificial
rabbits optimization algorithm. Expert Systems with Applications, 227, 120346.
Guo, R., Guo, H., Wang, L., Chen, M., Yang, D., & Li, B. (2024). Development and application of
emotion recognition technology—a systematic literature review. BMC psychology, 12(1), 95.
Guo, Y., Li, Y., Liu, D., & Xu, S. X. (2024). Measuring service quality based on customer
emotion: An explainable AI approach. Decision Support Systems, 176, 114051.
Han, K., & Kim, D. (2021). Emotion recognition in conversations using acoustic and linguistic
features. Applied Sciences, 11(6), 2542. https://doi.org/10.3390/app11062542
Hassan, N., Miah, A. S. M., & Shin, J. (2024). A Deep Bidirectional LSTM Model Enhanced by
Transfer-Learning-Based Feature Extraction for Dynamic Human Activity
Recognition. Applied Sciences, 14(2), 603.
Hosseini, S., Yamaghani, M. R., & Poorzaker Arabani, S. (2024). A review of the methods of
recognition multimodal emotions in sound, image and text. International Journal of Applied
Operational Research-An Open Access Journal, 12(1), 29-41.
Hung, L. P., & Alias, S. (2023). Beyond sentiment analysis: A review of recent trends in text based
sentiment analysis and emotion detection. Journal of Advanced Computational Intelligence
and Intelligent Informatics, 27(1), 84-95.
Jose, J., & Simritha, R. (2024). Sentiment Analysis and Topic Classification with LSTM Networks
and TextRazor. International Journal of Data Informatics and Intelligent Computing, 3(2),
42-51.
Kumar, A., Bhatia, A., Kashyap, A., & Kumar, M. (2023). LSTM network: a deep learning
approach and applications. In Advanced Applications of NLP and Deep Learning in Social
Media Data (pp. 130-150). IGI Global.
Kumar, G., Das, T., & Singh, K. (2024). Early detection of depression through facial expression
recognition and electroencephalogram-based artificial intelligence-assisted graphical user
interface. Neural Computing and Applications, 1-18.
Kusal, S., Patil, S., Choudrie, J., Kotecha, K., Vora, D., & Pappas, I. (2023). A systematic review
of applications of natural language processing and future challenges with special emphasis in
text-based emotion detection. Artificial Intelligence Review, 56(12), 15129-15215.
Latif, S., Ali, H. S., Usama, M., Rana, R., Schuller, B., & Qadir, J. (2022). Ai-based emotion
recognition: Promise, peril, and prescriptions for prosocial path. arXiv preprint
arXiv:2211.07290.
Latif, S., Bashir, S., Agha, M. M. A., & Latif, R. (2020). Backward-forward sequence generative
network for multiple lexical constraints. In Artificial Intelligence Applications and
Innovations: 16th IFIP WG 12.5 International Conference, AIAI 2020, Neos Marmaras,
Greece, June 5–7, 2020, Proceedings, Part II 16 (pp. 39-50). Springer International
Publishing.
Lee, J. Y., & Kim, J. H. (2019). Deep neural network-based emotion recognition system using a
bidirectional long short-term memory model. Electronics, 8(11), 1212.
https://doi.org/10.3390/electronics8111212
Leus, G., Marques, A. G., Moura, J. M., Ortega, A., & Shuman, D. I. (2023). Graph Signal
Processing: History, development, impact, and outlook. IEEE Signal Processing
Magazine, 40(4), 49-60.
Li, L., Wang, H., Zha, L., Huang, Q., Wu, S., Chen, G., & Zhao, J. (2023). Learning a data-driven
policy network for pre-training automated feature engineering. In The Eleventh International
Conference on Learning Representations.
Ma, L., Li, X., & Xue, L. (2018). Deep recurrent neural networks for emotion recognition from
speech. IEEE Transactions on Multimedia, 20(11), 3134-3142.
https://doi.org/10.1109/TMM.2018.2792158
Machová, K., Szabóova, M., Paralič, J., & Mičko, J. (2023). Detection of emotion by text analysis
using machine learning. Frontiers in Psychology, 14, 1190326.
Mahadevaswamy, U. B., & Swathi, P. (2023). Sentiment analysis using bidirectional LSTM
network. Procedia Computer Science, 218, 45-56.
Manalu, H. V., & Rifai, A. P. (2024). Detection of human emotions through facial expressions
using hybrid convolutional neural network-recurrent neural network algorithm. Intelligent
Systems with Applications, 200339.
Miao, R. (2023). Emotion Analysis and Opinion Monitoring of Social Network Users Under Deep
Convolutional Neural Network. Journal of Global Information Management (JGIM), 31(1), 1-
12.
Mortillaro, M., & Schlegel, K. (2023). Embracing the emotion in emotional intelligence
measurement: Insights from emotion theory and research. Journal of Intelligence, 11(11),
210.
Oğuz, F. E., Alkan, A., & Schöler, T. (2023). Emotion detection from ECG signals with different
learning algorithms and automated feature engineering. Signal, Image and Video Processing,
17(7), 3783-3791.
Omarov, B., & Zhumanov, Z. (2023). Bidirectional long-short-term memory with attention
mechanism for emotion analysis in textual content. International Journal of Advanced
Computer Science and Applications, 14(6).
Oyebode, O., Ogubuike, R., Daniel, D., & Orji, R. (2023, September). Emotion Detection from
Real-Life Situations Based on Journal Entries Using Machine Learning and Deep Learning
Techniques. In Proceedings of SAI Intelligent Systems Conference (pp. 477-502). Cham:
Springer Nature Switzerland.
Radke, R. J. (2024). A Signal Processor Teaches Generative Artificial Intelligence [SP
Education]. IEEE Signal Processing Magazine, 41(2), 6-10.
Rai, M., & Pandey, J. K. (Eds.). (2024). Using Machine Learning to Detect Emotions and Predict
Human Psychology. IGI Global.
Rane, N. (2023). Enhancing customer loyalty through Artificial Intelligence (AI), Internet of
Things (IoT), and Big Data technologies: improving customer satisfaction, engagement,
relationship, and experience (October 13, 2023).
Reitsema, A. M., Jeronimus, B. F., van Dijk, M., Ceulemans, E., van Roekel, E., Kuppens, P., &
de Jonge, P. (2023). Distinguishing dimensions of emotion dynamics across 12 emotions in
adolescents’ daily lives. Emotion, 23(6), 1549.
Rushan, R. R., Hossain, S., Shovon, S. S., & Rahman, M. A. (2024). Emotion detection for Bangla
language (Doctoral dissertation, Brac University).
Saffar, A. H., Mann, T. K., & Ofoghi, B. (2023). Textual emotion detection in health: Advances
and applications. Journal of Biomedical Informatics, 137, 104258.
Saha, G., Sharma, S., & Sircar, S. (2021). Emotion recognition using speech: A comprehensive
survey. IEEE Transactions on Affective Computing. Advance online publication.
https://doi.org/10.1109/TAFFC.2021.3061559
Sajno, E., Bartolotta, S., Tuena, C., Cipresso, P., Pedroli, E., & Riva, G. (2023). Machine learning
in biosignals processing for mental health: A narrative review. Frontiers in Psychology, 13,
1066317.
Santos, O. C. (2023). Beyond cognitive and affective issues: Designing smart learning
environments for psychomotor personalized learning. In Learning, Design, and Technology:
An International Compendium of Theory, Research, Practice, and Policy (pp. 3309-3332).
Cham: Springer International Publishing.
Sherstinsky, A. (2020). Fundamentals of recurrent neural network (RNN) and long short-term
memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, 132306.
Shiota, M. N. (2024). Basic and Discrete Emotion Theories. Routledge Handbook of Emotion
Theory, London: Routledge.
Sidhu, M. S., Latib, N. A. A., & Sidhu, K. K. (2024). MFCC in audio signal processing for voice
disorder: a review. Multimedia Tools and Applications, 1-21.
Slovak, P., Antle, A., Theofanopoulou, N., Daudén Roquet, C., Gross, J., & Isbister, K. (2023).
Designing for emotion regulation interventions: an agenda for HCI theory and research. ACM
Transactions on Computer-Human Interaction, 30(1), 1-51.
Soujanya Rao, M., Coombs, T., Binti Mohamad, N., Kumar, V., & Jayabalan, M. (2023,
December). Comparative Analysis of Emotion Recognition Using Large Language Models
and Conventional Machine Learning. In The International Conference on Data Science and
Emerging Technologies (pp. 211-220). Singapore: Springer Nature Singapore.
Tang, L., Yuan, P., & Zhang, D. (2024). Emotional experience during human-computer
interaction: a survey. International Journal of Human–Computer Interaction, 40(8), 1845-
1855.
Tathgir, A., Sharma, C. M., & Chariar, V. M. (2024). EEG-based Emotion Classification using
Deep Learning: Approaches, Trends and Bibliometrics. Qeios.
Troiano, E., Oberländer, L., & Klinger, R. (2023). Dimensional modeling of emotions in text with
appraisal theories: Corpus creation, annotation reliability, and prediction. Computational
Linguistics, 49(1), 1-72.
Van Houdt, G., Mosquera, C., & Nápoles, G. (2020). A review on the long short-term memory
model. Artificial Intelligence Review, 53(8), 5929-5955.
Wagner, N., Mätzler, F., Vossberg, S. R., Schneider, H., Pavlitska, S., & Zöllner, J. M. (2024).
CAGE: Circumplex Affect Guided Expression Inference. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition (pp. 4683-4692).
Zhang, W. (2023). Teaching Reform of Digital Signal Processing Driven by Probabilistic Neural
Network. Advances in Education, Humanities and Social Science Research, 8(1), 30-30.
Zhao, Y., Wang, G., Tang, C., Luo, C., Zeng, W., & Zha, Z. J. (2021). A battle of network
structures: An empirical study of cnn, transformer, and mlp. arXiv preprint
arXiv:2108.13002.
Zhou, Y., Xu, C. C., Song, M., Wong, Y. K., & Du, K. (2024). A Novel Quantum LSTM
Network. arXiv preprint arXiv:2406.08982.