
CHAPTER ONE

INTRODUCTION

1.1 Background to the Study


In a number of domains, such as social media analysis, marketing, healthcare, and human-computer interaction, emotion detection has become essential. Precisely recognising and comprehending human emotions can greatly improve patient care, user experience, and marketing tactics, and offer a profound comprehension of social dynamics (Tathgir, Sharma & Chariar, 2024).
Although they have had some success, traditional techniques for identifying emotions, like rule-
based systems and classical machine learning algorithms, frequently struggle to handle the
complexities and subtleties of human emotions (Kusal et al., 2022). These approaches usually use
linear models and manually created features, which might not adequately represent the complex
relationships and patterns found in emotional data (Al Maruf et al., 2024; Alqahtani & Alothaim,
2022).
The subject of emotion recognition has seen a significant transformation in recent years thanks to
deep learning techniques. Because of their capacity to effectively capture contextual information
and long-term dependencies in sequential data, Long Short-Term Memory (LSTM) networks have
attracted a lot of interest among the available options (Latif et al., 2022; Qian, 2023).
However, because LSTM networks process information in a unidirectional fashion, they have a limited ability to grasp context from both past and future states (Gozuacik, Sakar & Ozcan, 2023). This
restriction is especially apparent in emotion detection since the context can have a significant
impact on how an expression or statement is understood (Jose & Simritha, 2024).
BiLSTM networks present a compelling answer to this problem. By using bidirectional data
processing, BiLSTM networks are able to gather context from both previous and subsequent stages
(Hassan, Miah & Shin, 2024). This makes it easier to comprehend the data as a whole. Since the
interaction between several environmental cues may be crucial for effective interpretation, the
bidirectional technique is especially beneficial in emotion detection (Al-Qerem et al., 2024).
Although BiLSTM networks show promise, thorough research and assessments of their
performance in emotion detection tasks are required.

Enhancing the accuracy and robustness of emotion detection systems is the driving force behind
this research. In order to develop a model that can successfully understand and analyse human
emotions in a variety of contexts, this study aims to use the benefits of BiLSTM networks. The
findings of this study have the potential to advance the field of emotion detection, which would
increase the efficacy and versatility of systems in a wide range of applications.

1.2 Statement of the Problem


From mental health evaluation to human-computer interaction, emotion detection is essential in
many applications. The intricacy and subtlety of human emotions are too great for the rule-based
systems and traditional machine learning algorithms that are currently used for emotion
identification (Frye, 2022). Conventional models, like decision trees and support vector machines,
use hand-crafted features to categorise emotions. However, their shortcomings result in decreased
robustness and accuracy in emotion detection, particularly in dynamic and varied real-world
situations (Rushan et al., 2024; Miao, 2023; Cahyani et al., 2023; Oyebode et al., 2023).
The identification and interpretation of human emotions from audio data is the main emphasis of
this work. This is important in a number of fields, including marketing tactics, healthcare, social
media trend analysis, and human-computer interaction. Creating an emotion detection model based
on Bidirectional Long Short-Term Memory (BiLSTM) networks—which have the ability to
process data sequences both forward and backward—is the suggested remedy. By capturing
contextual information from both past and future states, this bidirectional processing enables the
model to comprehend the input more thoroughly. By taking into account the entire context of an
emotional expression, the BiLSTM-based model solves the problem of contextual awareness in
emotion recognition, producing more accurate and subtle detections.

It is expected that this new method will greatly improve the performance of emotion detection
systems, boosting the effectiveness of applications that rely on emotion detection. Improved
emotion identification could lead to more sympathetic and responsive systems in human-computer
interaction and improve patient monitoring and support in the healthcare industry. It can make
marketing campaigns more impactful and individualized.

Taking everything into account, the proposed BiLSTM-based model might enhance emotion
identification and contribute to the development of increasingly complex and human-like systems.

1.3 Objectives of the Study


The aim of this study is to develop an emotion detection model using Bidirectional Long Short-
Term Memory (BiLSTM).
The specific objectives of the study include:
i. Formulate a BiLSTM-based model for emotion detection
ii. Develop an algorithm for the model formulated in (i)
iii. Evaluate the performance of the model using standard evaluation metrics such as
accuracy, precision and recall

1.4 Significance of the Study


The significance of this work lies in its potential to revolutionize emotion detection, thereby
improving a wide range of applications across various fields. This research aims to address the
limitations of current approaches by developing a BiLSTM-based model for emotion recognition
that is more accurate and contextually sensitive. Progress in human-computer interaction could
lead to the creation of systems that demonstrate greater empathy and responsiveness to user needs,
ultimately enhancing user experience and satisfaction. Accurate emotion detection in the
healthcare field could improve patient monitoring, assess mental health, and deliver personalized
therapy, ultimately leading to improved patient outcomes. This technology has the potential to
boost customer engagement and loyalty by enabling more effective and personalized marketing
methods. Furthermore, the knowledge gained from this research could advance the broader field of
natural language processing (NLP) by offering innovative approaches and frameworks that can be
applied to other NLP tasks. In summary, this work has the potential to significantly advance the
current state of expertise in emotion recognition, with far-reaching implications for technology,
healthcare, and other areas.

1.5 Scope of the Study
The scope of this study includes the development, implementation, and evaluation of a
Bidirectional Long Short-Term Memory (BiLSTM) network for emotion detection from textual
data. The research will involve collecting and preprocessing publicly available datasets containing
emotional content, designing a tailored BiLSTM model architecture, and optimizing its
performance through training and hyperparameter tuning. The model's effectiveness will be
assessed using standard evaluation metrics such as accuracy, precision, recall, and F1-score, with
comparisons to baseline models, including traditional machine learning methods and unidirectional
LSTM networks.

CHAPTER TWO
LITERATURE REVIEW

2.1 Conceptual Review

This chapter gives a thorough summary of the methods and body of research that have been done
on emotion detection, outlining the development and current status of the topic. The chapter opens
with an examination of the basic ideas and relevance of emotion detection, highlighting its use in a
range of fields like marketing, healthcare, and human-computer interaction. The limitations of
conventional methods for emotion recognition are then discussed, including rule-based systems
and classical machine learning techniques. The discussion then shifts to the development of deep
learning techniques, with an emphasis on Long Short-Term Memory (LSTM) networks and their
benefits over conventional models.

2.1.1 Fundamental Concepts of Emotion Detection


The technique of identifying and comprehending human emotions from a variety of data sources,
such as text, voice, facial expressions, and physiological indicators, is known as emotion detection.
Understanding people's emotional states is the goal of emotion detection, which is essential for
improving human-technology interactions (Hosseini, Yamaghani, & Poorzaker-Arabani, 2024).
The ability to recognise different emotional states, such as joy, grief, rage, fear, and surprise, is one
of the fundamentals of emotion detection. These emotions are represented through a variety of
modalities, necessitating sophisticated processes to accurately gather and assess them (Guo et al.,
2024). Emotion detection is a vital area of investigation and real-world application, since advancements in natural language processing, computer vision, and machine learning have significantly increased the ability to recognise and classify emotions (Alslaity & Orji, 2024). The field of
human-computer interaction (HCI) relies heavily on emotion detection. More responsive and
intuitive interfaces are becoming more and more in demand as technology becomes increasingly
integrated into daily life (Manalu & Rifai, 2024). According to Rai and Pandey (2024), systems
that possess emotional awareness are able to modify their responses based on the user's emotional
state, leading to more genuine and satisfying interactions. Emotion-aware chatbots and virtual
assistants, for instance, can provide empathetic responses, increasing user satisfaction and
engagement. Furthermore, the application of emotion detection technology can enhance the user's
virtual reality and gaming experience by creating more immersive and engaging environments that
react quickly to the player's feelings.
2.1.2 Applications of Emotion Detection
Applications for emotion recognition are numerous and span many different fields, greatly
improving the calibre and efficacy of interactions and services. In order to develop more intuitive,
sympathetic, and effective systems, these applications take advantage of the capacity to identify
and react to human emotions (Abinaya & Vadivu, 2024).
Emotion detection is essential for creating sophisticated user interfaces and interactive systems in
human-computer interaction (HCI) (Kumar, Das & Singh, 2024). Systems like chatbots, virtual
assistants, and customer support platforms can provide more individualised and sympathetic
responses by comprehending and reacting to consumers' emotional states (Tang, Yuan & Zhang,
2024).
To increase user happiness and decrease frustration, a virtual assistant that recognises displeasure
in a user's voice, for example, can offer further assistance or refer the matter to a human agent. By
adapting the game environment or storyline to the player's emotional state, emotion detection in
virtual reality and gaming can produce more immersive experiences.
Medical care: Emotion detection is essential in the medical field for therapeutic interventions,
mental health evaluation, and patient monitoring. Early indications of mental health conditions like
stress, anxiety, and depression can be provided by systems that are able to identify emotional
changes (Saffar, Mann & Ofoghi, 2023). Continuous remote monitoring is made possible by
emotion detection technology built into telehealth platforms, which enable medical professionals
to act quickly when needed (Francese & Attanasio, 2023). Additionally, these systems can support mental health treatments
and enhance the effectiveness of therapy sessions by offering real-time feedback and customised
recommendations based on patients' emotional reactions (Sajno et al., 2023).
Education: Emotion detection can enhance educational experiences by creating adaptable and
customised learning environments. By monitoring students' emotional states, teachers can
determine when they are frustrated, confused, or disengaged and adjust their teaching strategies
accordingly (Santos, 2023). This could lead to a more supportive learning environment, improved
learning outcomes, and more student engagement (Aggarwal, 2023). For example, an intelligent
tutoring system that detects a student's confusion may adjust the difficulty of the content or
provide further clarifications.
Customer service: By allowing systems to react more sympathetically to client requirements, emotion
detection can greatly enhance customer service. Emotion detection-enabled customer support
solutions can recognise when clients are angry or unsatisfied and give priority to those contacts for
a quicker resolution (Guo et al., 2024). According to Rane (2023), these technologies can also give
customer care representatives real-time information about the customer's emotional state, which
enables them to better customise their responses and raise customer satisfaction levels.
Social Media Analysis: Since it may be used to determine public opinion on a range of subjects,
occasions, and brands, emotion detection is also useful in social media analysis (Rodríguez-Ibánez
et al., 2023). Businesses and organisations can learn about public opinion and take preemptive
measures to address new trends and challenges by examining the emotional tone of social media
postings and comments (Hung & Alias, 2023). This can help guide community participation
initiatives, crisis management plans, and public relations campaigns.
2.1.3 Traditional Techniques for Emotion Detection
Conventional methods for detecting emotions have mostly depended on traditional machine
learning techniques and rule-based systems. Rule-based systems determine emotional states from
textual input by using specified linguistic rules and lexicons (García-Méndez et al., 2023).
According to Kusal et al. (2023), these systems function by comparing input text to a
predetermined set of rules that are associated with particular emotions: terms like "sad" or "angry"
imply a negative mood, while terms like "happy" or "joyful" may reflect a positive emotional state.
Rule-based systems can offer rapid insights on the text's emotional content and are comparatively
easy to deploy (Denecke & Reichenpfader, 2023).
But because they are based on fixed rules, they are rigid and frequently unable to deal with the
complexities and subtleties of human emotions. Their inability to comprehend irony, context, and
the nuances of natural language results in imprecise or simplistic emotion recognition (Garg &
Saxena, 2024).
Text is classified into different emotional categories using models that are trained on labelled
datasets in traditional machine learning techniques for emotion detection. Naive Bayes classifiers,
decision trees, and support vector machines (SVM) are examples of common algorithms (Machová
et al., 2023). To represent the input text, these models usually use manually created characteristics
such as part-of-speech tags, n-grams, and word frequency. By learning from data, traditional machine
learning techniques have outperformed rule-based systems, yet they nonetheless
encounter considerable obstacles (Cai, Li & Li, 2023). To choose the most pertinent features,
feature engineering is a laborious procedure that calls for domain knowledge (Li et al., 2023).
Furthermore, the complex and non-linear nature of human emotional expressions may not be
adequately captured by these models, which frequently presume linear connections between
features and emotions (Soujanya-Rao et al., 2023).
The incapacity of rule-based and conventional machine learning techniques to manage context and
long-term dependencies in text is one of their main intrinsic drawbacks (Hornyák, 2023). Accurate
emotion identification depends on capturing the context in which words and phrases are used, as
this frequently affects human emotions. Rule-based systems, with their static and predetermined
rules, are inevitably constrained in this sense (Cheng et al., 2024). Even though they are
more adaptable, classical machine learning models still have challenges effectively incorporating
contextual information, especially when working with longer texts or conversations.
Furthermore, both strategies struggle with adaptation and generalisation. To stay up with changing
language usage and new emotional expressions, rule-based systems need to be updated and
maintained frequently (Kusal et al., 2023). Conversely, traditional machine learning models may
experience overfitting, which occurs when they function well on training data but poorly on
unknown data (Aliferis & Simon, 2024). Since there can be a wide range of emotional
manifestations in various circumstances and people, this is especially troublesome when it comes
to emotion recognition.
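To make the classical pipeline concrete, the following is a minimal Python sketch of a hand-crafted-feature approach using scikit-learn; the three example sentences and labels are illustrative assumptions, not a real training corpus.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Manually specified n-gram features feed a linear classifier,
    # mirroring the traditional approach described above.
    texts = ["I am so happy today", "This makes me very angry", "I feel sad and alone"]
    labels = ["happiness", "anger", "sadness"]

    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),   # unigram and bigram features
        LinearSVC(),                           # linear decision boundary
    )
    model.fit(texts, labels)
    print(model.predict(["today I am angry"]))

Because the features are fixed in advance and the decision boundary is linear, such a pipeline cannot model the contextual, non-linear structure of emotional language, which motivates the deep learning methods discussed next.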
2.1.4 Long Short-Term Memory (LSTM) Networks
Developments in deep learning techniques, especially the advent of Long Short-Term Memory
(LSTM) networks, have greatly influenced the progress of emotion detection. By utilising neural
networks that can learn from sequential data, LSTM networks constitute a paradigm change in
contrast to conventional rule-based systems and classical machine learning techniques (Zhou et al.,
2024). Because LSTM networks are so good at identifying patterns and relationships across time,
they are especially well-suited for applications like time series analysis and natural language
processing that require comprehending and forecasting event sequences (Gülmez, 2023).
The capacity of LSTM networks to manage the innate complexity and long-term interdependence
found in human emotional expressions is one of their main advantages (Mahadevaswamy &
Swathi, 2023). Contextual subtleties and the temporal evolution of emotions across longer texts or
conversations are frequently difficult for traditional models to handle. By keeping a memory cell
that can hold data for extended periods of time and updating and retaining data selectively
depending on the input sequence, LSTM networks overcome these difficulties (Aslan, 2024). This
feature enables more precise and complex emotion identification by enabling LSTM networks to
record correlations between words and phrases that extend over sentences or even entire
documents (Kumar et al., 2023).
Additionally, LSTM networks remove the requirement for manual feature engineering by
providing flexibility in learning representations straight from the data. Without direct human
assistance, LSTM networks may automatically learn hierarchical representations of text and adjust
to a variety of language patterns and emotional expressions (Cahuantzi, Chen & Güttel, 2023).
Because emotions can be communicated through a variety of cultural and linguistic idiosyncrasies,
this makes them more resilient and flexible across domains and languages (Olah, 2023).
When compared to conventional methods, the use of LSTM networks in emotion recognition has
shown notable gains in performance and accuracy. Learning from language's natural structure and
processing text as sequential data, LSTM networks are able to pick up on minute contextual clues
and Semantic relationships that support precise emotion categorisation (Zhao et al., 2021).
Understanding the sentiment and emotional tone of vast amounts of textual data is crucial for
corporate intelligence and decision-making, and this capability is especially useful in real-world
applications like sentiment analysis in social media.
2.1.5 Bidirectional Long Short-Term Memory (BiLSTM)
Bidirectional Long Short-Term Memory (BiLSTM) networks represent a substantial development
in sequential data processing, particularly in tasks requiring a thorough grasp of context and
dependencies throughout time, such as emotion detection (Bin et al., 2016). BiLSTM networks
enhance the capabilities of regular LSTM networks by processing input sequences in both forward
and backward directions simultaneously. A more comprehensive and nuanced representation of the
sequential data is produced by the model's ability to capture both past dependencies (from left to
right) and future dependencies (from right to left) thanks to this bidirectional processing (Elfaik &
Nfaoui, 2020).

Figure 2.1: Architecture of a BiLSTM Network

Figure 2.1 shows the two LSTM layers that make up a BiLSTM network's architecture, each of
which processes the input sequence in a different way. While the backward LSTM layer processes
the sequence in reverse order, the forward LSTM layer reads the input sequence from start to finish
(Sherstinsky, 2020). Each LSTM layer contains memory cells that maintain states over time and
gates that control the flow of information through the network. These memory cells enable
BiLSTM networks to capture long-term dependencies in the data by selectively remembering and
forgetting information based on the input sequence (DiPietro & Hager, 2020).
During the training phase, the outputs of both the forward and backward LSTM layers are
concatenated at each time step or pooled together to form a combined representation of the input
sequence (Van-Houdt, Mosquera & Nápoles, 2020). This combined representation integrates
information from both directions, effectively enhancing the model's ability to understand context
and capture semantic relationships within the data. The bidirectional aspect of BiLSTM networks
is particularly helpful in applications where the meaning of a sequence is impacted by surrounding
context, such as in natural language processing and sentiment analysis (Ahmed et al., 2021).
BiLSTM networks are well-suited for emotion detection tasks because they excel in capturing
contextual information that is crucial for interpreting emotional expressions accurately (El-Amir et
al., 2020). By processing text in both directions, BiLSTM networks can consider the context from
preceding and subsequent words or phrases, thereby improving the model's sensitivity to the
nuanced ways in which emotions are expressed in language (Latif et al., 2020). This capability
makes BiLSTM networks highly effective in scenarios where emotions may be subtly conveyed
through complex interactions between words and sentences.
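As a brief illustration of this bidirectional concatenation, the following Python sketch uses the Keras Bidirectional wrapper; the layer size and feature dimension are illustrative assumptions.

    import tensorflow as tf

    units = 64
    inputs = tf.keras.Input(shape=(None, 40))   # (time steps, features)
    bilstm = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(units, return_sequences=True)
    )(inputs)
    model = tf.keras.Model(inputs, bilstm)

    # The last dimension is 2 * units because the forward and backward
    # hidden states are concatenated at every time step.
    print(model.output_shape)   # (None, None, 128)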

2.2 Empirical Review


Oğuz, Alkan, and Schöler (2023) examine the value of using physiological data, specifically ECG
signals, to identify emotions and gain insight into psychological states. Their research aims to
create applications using biofeedback, which is highly pertinent given the increasing popularity of
the metaverse and the incorporation of physiological signal trackers in smart devices. The study
uses ECG signal recordings from the MAHNOB-HCI database and presents an algorithm based on
the valence-arousal emotion model. Noise reduction and the Pan-Tompkins technique are used in
the preprocessing step to identify R peaks, which involves extracting morphological features, such
as P-QRS-T pieces, as well as the peak and nadir values. regarding P, Q, R, S, and T waves. To
create a complete feature vector, these attributes are then combined with specific heart rate
variability data. By increasing the number of instances and the significance of distinguishing
attributes, an automated feature engineering technique improves these feature vectors. Three
learning algorithms—feedforward neural networks, bidirectional long short-term memory
(BiLSTM), and support vector machines—are used to further categorise the enhanced attributes.
The study shows that the BiLSTM algorithm performs better than other algorithms, with accuracy
rates of 83.61% for the arousal category and 78.28% for the valence category. The results
demonstrate that BiLSTM is more successful than conventional techniques covered in earlier
studies at detecting emotions from ECG signals, showcasing its potential in biofeedback and smart device applications; however, to improve its performance, the model needs to be made more accurate through new features, such as the ability to detect emotions from audio recordings or text.
Miao's (2023) study explores the intricate relationship between emotions and
user behaviours in social networks, emphasising the importance of identifying emotions in the
advancement of mobile communication technologies and the application of network intelligence in
industries. The study examines public statements made on social networks, specifically user
expressions, to glean emotional trends. The results demonstrate the ability of deep learning
techniques to accurately identify user emotions, which is essential for the intelligent progression of
online social models and information management systems. Although the study presents promising
findings, it acknowledges its limitations in adequately capturing complex and multifaceted
emotions. Therefore, it is recommended that future studies should explore more profound emotional nuances and enhance the accuracy of analysis. The BiLSTM technique, which yielded a remarkable F1 score of 97.32% after 100 training iterations, also significantly reduced function loss to 1.33%, minimising its negative impact on the quality of emotion detection.
Oyebode et al. (2023) acknowledge the detrimental effects of negative emotions like anger, fear, and sadness on physiological functioning and resistance, and they also highlight the benefits of positive emotions like happiness on both physical and mental well-being. The researchers' dual objectives are (1) to develop emotion detection models using machine learning (ML) and deep learning techniques trained on datasets that accurately represent real-life experiences, and (2) to create an emotion-adaptive mobile health (mHealth) application that offers interventions to manage difficult situations, promote positive emotions, and encourage behavioural modifications for improved mental well-being, with a primary focus on the first goal. The study evaluates twenty-one conventional machine learning, ensemble learning, and deep learning models to identify five emotional states (anger, fear, disgust, happiness, and sadness) from diary entries. The study's next step is to integrate the MCBiLSTM model, which combines three Convolutional Neural Network (CNN) channels with a Bidirectional Long Short-Term Memory (BiLSTM) network, to deliver timely, emotion-based therapeutic interventions to enhance mental health and overall well-being.
According to Omarov & Zhumanov (2023), emotion analysis is
important in a number of fields, such as sentiment analysis, customer feedback monitoring, and
mental health evaluation. The paper evaluates traditional machine learning and deep learning
methods, highlighting their limitations in capturing complex and extensive relationships in text, and proposes the Bi-LSTM (bidirectional long short-term memory) paradigm. By applying the power of recurrent neural networks (RNNs) to the analysis of past and future textual contexts, this Bi-LSTM
model provides a more thorough understanding of emotional content. The forward and backward
LSTM layers are integrated to help the model effectively learn semantic word representations and
their links. Because it emphasises the importance of individual words inside a phrase, the model's
interpretability and performance are enhanced by adding an extra attention mechanism. The
suggested Bi-LSTM model beat numerous state-of-the-art baseline techniques, including support
vector machines, naive Bayes, CNNs, and conventional LSTMs, according to extensive testing
conducted on the Kaggle Emotion Sensor dataset. However, the model's sole reliance on textual information emphasises the necessity of voice-based detection to increase the effectiveness and precision of emotion recognition.
In order to identify emotions in textual data, Cahyani et al. (2022) investigated the use of
Convolutional Neural Networks (CNN) in conjunction with Bidirectional Long Short-Term
Memory (BiLSTM) networks. The purpose of the study was to discuss the significance of
identifying feelings conveyed through written language on social media. The efficacy of
Word2Vec and GloVe word embeddings in the suggested CNN-BiLSTM architecture was
assessed on a number of datasets, including a combined dataset, commuter line, and trans-Jakarta.
Two scenarios were used in the study's trials: scenario I, which separates texts with and without emotion, and scenario II, which classifies feelings into five groups: surprise, fear, anger, sadness, and happiness. In terms of accuracy, precision, recall, and F1-measure, the Word2Vec-CNN-BiLSTM model consistently beats the GloVe-CNN-BiLSTM model in all datasets and scenarios, according to the study's findings. The Word2Vec-CNN-BiLSTM model obtained accuracy rates of 84.34%, 83.73%, and 83.88% on the commuter line, combined, and Trans-Jakarta datasets, respectively. With respect to earlier models like Word2Vec-BiLSTM, these findings show notable advancements. Although its results are encouraging, the study highlights the need for further accuracy improvements and distinctive features, such as incorporating text-to-speech-based emotion recognition, to improve the model's effectiveness.

2.3 Theoretical Framework


Two theories will serve as the foundation for this investigation: the theory of signal processing and
the theory of emotion. The two theories are thoroughly examined in the ensuing subsections.
2.3.1 Signal Processing Theory
The study of signal analysis, synthesis, and modification falls under the fields of applied
mathematics and electrical engineering known as signal processing theory. According to Brandt
(2023), signals are typically expressed as functions of time or space and can be audio, video,
speech, pictures, or sensor readings. Techniques and algorithms for signal processing are used to
improve, extract valuable information from, or change signals into a format that is better suited for
analysis (Leus et al., 2023). Digital filtering, which eliminates undesired components or noise from
a signal, and the Fourier Transform, which breaks a signal down into its component frequencies,
are important ideas (Zhang, 2023). These methods allow raw, noisy data to be transformed into cleaner formats that are easy to understand and use for additional analysis and decision-making (Radke, 2024).
During the first steps of data preparation for emotion identification from speech signals, signal
processing theory is essential. A combination of linguistic content, speaker traits, and emotional signs makes up the complicated raw speech signals (Cai, Li & Li, 2023). These signals must be processed to extract the features pertinent to recognising emotions. According to Al-Dujaili and Ebrahimi-Moghadam (2023), time-domain speech inputs are transformed into frequency-domain representations using techniques like the Fourier Transform, which reveals patterns associated with various emotions. The power spectrum of a speech signal is captured by Mel-Frequency Cepstral Coefficients (MFCC), another signal processing technique that is very useful for simulating the human auditory system's way of perceiving sound (Sidhu, Latib & Sidhu, 2024). Through the extraction of variables such as pitch, energy, and spectral properties, the
processed speech signals yield a rich set of information that may be used to reliably classify and
forecast emotional states using machine learning models such as BiLSTM networks (Basak et al.,
2023). To create reliable emotion recognition systems, this conversion from raw data to feature set
is necessary.
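To illustrate the frequency-domain transformation described above, the following minimal Python sketch applies the Fourier Transform to a synthetic signal; the two test tones are illustrative assumptions.

    import numpy as np

    sr = 16000                                    # sampling rate in Hz
    t = np.arange(0, 1.0, 1.0 / sr)               # one second of samples
    signal = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

    spectrum = np.fft.rfft(signal)                # frequency-domain representation
    power = np.abs(spectrum) ** 2                 # power spectrum
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)

    # The two dominant component frequencies (220 Hz and 440 Hz) stand out.
    print(freqs[np.argsort(power)[-2:]])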

2.3.2 Emotion Theory


A variety of theories and frameworks are included in emotion theory, which aims to describe the
nature, causes, and operations of human emotions (Elfenbein, 2023). In disciplines like
psychology, neurology, and affective computing, these theories are especially important for
comprehending how emotions can be identified and interpreted (Slovak et al., 2024).
Discrete Emotion Theory
According to discrete emotion theory, there are only a few fundamental emotions that are felt and
understood by people from all walks of life (Mortillaro & Schlegel, 2023). This idea, which was
developed by psychologists like Paul Ekman, distinguishes between basic emotions including fear,
contempt, anger, happiness, sadness, and surprise (Shiota, 2024). These feelings are linked to
particular physiological and expressive patterns and are thought to be innate. For example, a frown
and a lower tone may indicate melancholy, but a grin and an increase in pitch may indicate
happiness (Wilson & Lewandowska-Tomaszczyk, 2023). Many emotion recognition systems,
which use predefined criteria to categorise speech or facial gestures into discrete emotional groups,
are based on this principle.
Dimensional Emotion Theory
On the other hand, according to the dimensional emotion theory, emotions can be characterised
along continuous dimensions rather than being discrete entities. Arousal (calm to enthusiastic) and
valence (positive to negative) are the most prevalent dimensions (Troiano, Oberländer & Klinger, 2023). A more complex depiction of emotional states is made possible by this approach, which was put forward by scholars such as James Russell. Excitement and anger, for instance, may both be highly arousing but have different valences (Reitsema et al., 2023). Mapping emotions onto these dimensions makes it feasible to record the minute differences and intensities of emotional experiences, which is especially valuable in applications that demand a thorough comprehension of emotional dynamics, such as affective computing and research on user experience (Faul, Baumann & LaBar, 2023).
Models for emotion detection are developed using the foundational framework provided by
emotion theory. Discrete emotion theory is used in speech emotion recognition to classify speech
signals into fundamental emotional categories (Garcia-Garcia et al., 2023). By identifying the
acoustic and prosodic cues that correspond to these emotions, feature extraction techniques allow
the model to identify patterns that are specific to each category (Wagner et al., 2024). In contrast,
dimensional emotion theory guides the creation of models that map speech characteristics onto
valence and arousal dimensions, thereby capturing a wider variety of emotional states. This makes
it possible to analyse emotional expressions in a more flexible and thorough way, which improves
the model's capacity to identify minute emotional nuances.
2.4 Research Gaps
Table 2.1 shows that the use of Bidirectional Long Short-Term Memory (BiLSTM) models for
emotion detection has advanced significantly. Still, there are a few areas that require further
research. In order to improve the precision and efficacy of emotion detection systems, existing
research mostly focusses on the analysis of written data, ignoring the potential to incorporate
multiple data sources, such as auditory or physiological signals. Furthermore, even though
BiLSTM has shown promise in capturing temporal correlations, its efficiency and interpretability
need to be improved, especially for real-time applications. Additionally, more study is required to
examine how well BiLSTM models manage large datasets and their ability to adjust to a variety of
emotional expressions in several cultural contexts. Thus, the goal of this research is to close these gaps
by developing a model that uses BiLSTM to identify emotions. Multiple input types will be
incorporated into this model, which will also increase accuracy in a range of emotional contexts
and improve interpretability, efficiency, and scalability. Because of this, it will help develop
affective computing and its useful applications.

Table 2.1: Some Recent Literature Reviewed

| S/N | Author/Year | Methodology | Findings | Established Gap(s) |
| --- | --- | --- | --- | --- |
| 1 | Omarov & Zhumanov (2023) | Bi-LSTM model for text emotion detection | Bi-LSTM outperforms traditional ML and DL methods in emotion detection from text; F1 score improvements with attention mechanism. | Reliance on textual content alone; need for integration of voice-based detection to enhance model efficiency. |
| 2 | Cahyani et al. (2022) | CNN-BiLSTM model with Word2Vec and GloVe embeddings for text emotion detection | Word2Vec-CNN-BiLSTM outperforms GloVe-CNN-BiLSTM in both binary and multi-class emotion classification scenarios. | Requirement for further accuracy improvements and introduction of text-to-speech features for enhanced performance. |
| 3 | Miao (2023) | BiLSTM model for emotion analysis of social network users | Achieved an F1 score of 97.32% and reduced function loss to 1.33% after 100 training iterations. | Inability to capture complex, multi-layered emotions; need for deeper emotional analysis to enhance precision. |
| 4 | Oğuz, Alkan, & Schöler (2023) | BiLSTM, SVM, and feedforward neural networks for emotion detection from ECG signals | BiLSTM achieved 78.28% accuracy for the valence and 83.61% for the arousal categories, outperforming the other algorithms. | Need for novel features and improvement in accuracy, including emotion detection from audio recordings. |
| 5 | Oyebode et al. (2023) | Multichannel CNN-BiLSTM model for detecting emotions from journal entries | MCBiLSTM model achieved an F1-score of 81.1%, outperforming 21 classical ML and DL models. | Integration of the model into mHealth applications for real-time, personalized emotion-based therapeutic interventions. |
| 6 | Troiano, Oberländer, & Klinger (2023) | Dimensional modeling of emotions in text with appraisal theories | Assesses whether an event is novel and explains which emotions develop based on the event. | Methods adopted were limited to corpus creation, annotation reliability, and prediction. |
| 7 | | A review on emotion detection by using deep learning techniques | | |
| 8 | Basak, Agrawal, Jena, Gite, Bachute, & Assiri (2023) | Speech signal processing algorithms | Challenges and limitations in speech recognition technology: a critical review of speech signal processing algorithms, tools and systems. | The findings were limited to speech recognition only. |
| 9 | Cahuantzi, Chen, & Güttel (2023) | Comparison of LSTM and GRU networks for learning symbolic sequences | GRUs outperform LSTM networks on low-complexity sequences, while LSTMs perform better on high-complexity sequences. | |
| 10 | Cahyani, Wibawa, Prasetya, Gumilar, & Akhbar (2022) | Sentiment analysis detecting specific emotions rather than positive, negative or neutral polarity | Text-based emotion detection using CNN-BiLSTM outperforms keyword- and lexicon-based approaches as it focuses on semantic relations. | The research is limited to text-based emotion detection using CNN-BiLSTM. |
| 11 | Elfenbein (2023) | Review of the scientific field of emotion in organizations, drawing on classic theories and cutting-edge advances to integrate a disparate body of research | Emotion in organizations: theory and research. | The research is limited to workplace settings and includes some examples from the literature rather than all relevant papers due to space constraints. |
| 12 | Elfaik & Nfaoui (2020) | Forward-backward encapsulation of contextual information from Arabic feature sequences | Deep bidirectional LSTM network learning-based sentiment analysis for Arabic text. | The research was limited to Arabic sentiment and lacks explicit sentiment words due to the complexity and ambiguity of Arabic sentiment analysis. |
| 13 | Francese & Attanasio (2023) | Beck Depression Inventory-II (BDI-II) and questionnaire | Emotion detection for supporting depression screening. | Only a mobile application was used for testing, with 79% accuracy. |

CHAPTER THREE
RESEARCH METHODOLOGY

3.1 Research Design


In essence, the research design is the set of guidelines or protocols used to collect and analyse data in accordance with the research problem. The research project shown in Figure 3.1 comprises the following tasks: (i) dataset acquisition, (ii) data preprocessing, (iii) feature extraction, (iv) applying the BiLSTM model, (v) performance evaluation, and (vi) reporting.

3.1.1 Dataset Acquisition
Speech samples from trustworthy sources reflecting a range of emotions are collected during the
dataset acquisition phase. Finding and choosing databases with pre-labeled emotional speech
recordings, such as RAVDESS or IEMOCAP, is necessary for this. To address particular
emotional states or situations that are under-represented in current datasets, bespoke datasets may
also be produced or added to. The objective is to compile a thorough, well-balanced dataset that
ensures diversity and representativeness by capturing a wide range of emotional expressions. To
increase the dataset's size and improve the model's generalisability and robustness, data
augmentation techniques could be used to introduce variances like noise or speed fluctuations.
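As an example of such augmentation, the following is a minimal Python sketch using librosa; the file name, noise amplitude, and stretch rate are illustrative assumptions.

    import numpy as np
    import librosa

    y, sr = librosa.load("speech.wav", sr=16000)

    # Additive Gaussian noise at a small amplitude relative to the signal.
    noisy = y + 0.005 * np.random.randn(len(y))

    # Speed fluctuation via time stretching (a rate above 1 speeds the sample up).
    faster = librosa.effects.time_stretch(y, rate=1.1)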
3.1.2 Data Preprocessing

To get ready for feature extraction, the raw voice data is cleaned and transformed during
preprocessing. In order to improve the quality of the voice signals, noise reduction algorithms are
used to eliminate artefacts and background noise from the recordings. After that, the audio is
normalised to guarantee consistency by standardising the volume levels across samples. The
purpose of segmentation is to divide continuous speech into more comprehensible, smaller frames
or segments. If required, the data is also transformed to a consistent sample rate, and silence
removal may be used to highlight the speech's vocal portions. This stage guarantees that the input
data is consistent, clean, and in a feature extraction-ready format.
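As an illustration, the following is a minimal Python preprocessing sketch using librosa; the file name, silence threshold, and frame sizes are illustrative assumptions.

    import numpy as np
    import librosa

    # Load and resample to a consistent 16 kHz rate.
    y, sr = librosa.load("speech.wav", sr=16000)

    # Peak-normalise the volume so amplitudes are comparable across samples.
    y = y / np.max(np.abs(y))

    # Remove leading and trailing silence below a 30 dB threshold.
    y_trimmed, _ = librosa.effects.trim(y, top_db=30)

    # Segment the speech into 25 ms frames with a 10 ms hop.
    frames = librosa.util.frame(y_trimmed, frame_length=400, hop_length=160)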

3.1.3 Feature Extraction

Meaningful characteristics are extracted from the preprocessed speech signals to be used as inputs
for the BiLSTM model in the feature extraction step. Prosodic features, such as pitch, energy, and
speech rate, are extracted to represent the emotional aspects of speech patterns, and acoustic
features, such as MFCCs, spectral contrast, and formants, are computed to capture the frequency
and spectral properties of the speech. These features are then combined into a feature vector for
each speech segment. The extraction process converts raw audio data into a structured format that
captures the emotional cues required for the model to classify the speech effectively.
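A minimal Python sketch of this step with librosa follows; the number of MFCCs and the pitch range are illustrative assumptions.

    import numpy as np
    import librosa

    y, sr = librosa.load("speech.wav", sr=16000)

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # acoustic features
    energy = librosa.feature.rms(y=y)                    # frame-level energy
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)        # pitch estimate per frame

    # Align the frame counts and stack into one feature vector per time step.
    n = min(mfcc.shape[1], energy.shape[1], len(f0))
    features = np.vstack([mfcc[:, :n], energy[:, :n], f0[np.newaxis, :n]]).T
    print(features.shape)   # (time steps, 15)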

3.1.4 Applying BiLSTM Model

The BiLSTM architecture processes the feature vectors in both forward and backward directions,
allowing it to capture temporal dependencies and contextual information in the speech data. The
model learns to identify patterns and relationships between the features and the corresponding
emotional labels through iterative training, where hyperparameters like learning rate, batch size,
and the number of BiLSTM layers are tuned to optimise the model’s performance. Applying the
BiLSTM model entails training the model on the extracted features to accurately recognise
emotions. Once the model is trained, it can predict emotions based on the input features.
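As an illustration of this step, the following is a minimal Keras sketch of such a BiLSTM classifier; the layer sizes, dropout rate, and feature dimension are assumptions to be refined through the hyperparameter tuning described above.

    import tensorflow as tf

    num_features = 15   # size of each frame-level feature vector (assumed)
    num_classes = 7     # emotion classes in EMODB

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(None, num_features)),   # variable-length sequences
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])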

3.1.5 Performance Evaluation

A separate test dataset is used to evaluate the effectiveness of the trained BiLSTM model, and
evaluation metrics such as accuracy, precision, recall, and F1-score are computed by comparing
the model's predictions with the actual emotional labels. These metrics give information about the
model's overall performance and its capacity to accurately identify particular emotional classes,
and confusion matrices are frequently used to visualise the types of errors the model makes, such
as confusion between similar emotions. This step highlights the model's strengths and areas for
improvement, directing further improvements.
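As a sketch of this step, the snippet below computes the listed metrics and a confusion matrix with scikit-learn, assuming a trained model and test arrays X_test and y_test (integer labels) already exist.

    import numpy as np
    from sklearn.metrics import classification_report, confusion_matrix

    y_prob = model.predict(X_test)        # class probabilities per sample
    y_pred = np.argmax(y_prob, axis=1)    # predicted emotion classes

    # Per-class precision, recall and F1-score, plus overall accuracy.
    print(classification_report(y_test, y_pred))

    # Rows are true emotions, columns predictions; off-diagonal cells
    # reveal confusions between similar emotions.
    print(confusion_matrix(y_test, y_pred))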

3.1.6 Report

A thorough record of the study's findings and contributions to the field is provided by the report
phase, which documents the entire process and results of the study. It includes a detailed
explanation of the methodology, from dataset acquisition and preprocessing to feature extraction,
model application, and performance evaluation. The report presents the results of the evaluation metrics, contrasts the proposed BiLSTM-based system with other emotion detection approaches, and discusses the study's limitations and directions for future work.

3.2 Population, Sample and Sampling Technique

There are 535 utterances in the database overall. The seven emotions listed in the EMODB
database are: 1) anger; 2) boredom; 3) anxiety/fear; 4) happiness; 5) sadness; 6) disgust; and 7) neutral.
The data was down-sampled to 16 kHz after being collected at a 48 kHz sampling rate.

3.3 Method of Data Collection

A secondary dataset, EMODB, will be obtained from the Kaggle website. EMODB is an openly accessible German emotional speech database created by the Institute of Communication Science at the Technical University of Berlin, Germany. The data was recorded by ten professional speakers, five men and five women.

3.4 Techniques for Data Analysis

3.4.1 BiLSTM Based Model for Emotion Detection


We must specify the essential elements and formulas needed to create a model for emotion detection utilising Bidirectional Long Short-Term Memory (BiLSTM) networks. These include input data preprocessing, feature extraction, the structure and operation of the BiLSTM network, and the final classification.
Preprocessing and Feature Extraction
Let $x_t$ represent the speech signal at time step $t$. Acoustic and prosodic features $f_t$ are then extracted from $x_t$; these features could include pitch, energy, formant frequencies, and other relevant attributes. The feature vector $f_t$ for each time step is:

$f_t = \mathrm{FeatureExtraction}(x_t)$   (3.1)

Input Sequence
Let

$F = [f_1, f_2, \dots, f_T]$   (3.2)

Equation (3.2) is the sequence of feature vectors extracted from the speech signal, where $T$ is the total number of time steps.

BiLSTM Network
A BiLSTM network consists of two LSTM layers: forward and backward. The forward LSTM processes the sequence from $t = 1$ to $t = T$:

$\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}(f_t, \overrightarrow{h}_{t-1})$   (3.3)

The backward LSTM processes the sequence from $t = T$ to $t = 1$:

$\overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}(f_t, \overleftarrow{h}_{t+1})$   (3.4)

where $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are the hidden states of the forward and backward LSTM layers, respectively.

Concatenation of Hidden States
The hidden states from both directions are concatenated to form a combined representation $h_t$ for each time step $t$:

$h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$   (3.5)

Fully Connected Layer and SoftMax Classification
The concatenated hidden states $h_t$ are passed through a fully connected (dense) layer to produce the output logits $z_t$:

$z_t = W_h h_t + b_h$   (3.6)

where $W_h$ is the weight matrix and $b_h$ is the bias vector. The output logits $z_t$ are then passed through a SoftMax activation function to obtain the probability distribution over the emotion classes, $\hat{y}_t$.

Loss Function
The model's performance is evaluated using a suitable loss function, such as categorical cross-entropy, which measures the difference between the predicted probabilities $\hat{y}_t$ and the true emotion labels $y_t$:

$L = -\frac{1}{T} \sum_{t=1}^{T} \sum_{c=1}^{C} y_{t,c} \log(\hat{y}_{t,c})$   (3.7)

where $C$ is the number of emotion classes, $y_{t,c}$ is the true label (1 if the emotion is $c$ at time step $t$, otherwise 0), and $\hat{y}_{t,c}$ is the predicted probability for class $c$ at time step $t$.
Model Training
The model is trained by minimizing the loss function $L$ using an optimization algorithm such as the Adam optimizer. The training process involves adjusting the weights $W_h$ and biases $b_h$ through backpropagation.

Emotion Prediction
During inference, the trained BiLSTM model predicts the emotion class for each time step $t$ by selecting the class with the highest probability:

$\hat{y}_t = \arg\max_c \hat{y}_{t,c}$   (3.8)

By following this mathematical framework, the study aims to develop a BiLSTM-based model
capable of accurately detecting emotions from speech signals. The model leverages the strengths
of bidirectional processing to capture contextual information and long-term dependencies, thereby
enhancing the accuracy and robustness of emotion recognition.
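To make Equations (3.6) and (3.7) concrete, the following is a minimal Python sketch computing the SoftMax probabilities and the categorical cross-entropy loss for a single time step; the logit values are arbitrary illustrative assumptions.

    import numpy as np

    z_t = np.array([2.0, 0.5, -1.0])            # output logits for 3 emotion classes
    y_t = np.array([1.0, 0.0, 0.0])             # one-hot true label (class 0)

    y_hat = np.exp(z_t) / np.sum(np.exp(z_t))   # SoftMax probabilities, Eq. (3.6)
    loss = -np.sum(y_t * np.log(y_hat))         # cross-entropy for this step, Eq. (3.7)

    print(y_hat)   # approx. [0.786, 0.175, 0.039]
    print(loss)    # approx. 0.241

Predicting the emotion as in Equation (3.8) then amounts to np.argmax(y_hat), which returns class 0 here.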

3.4.2 Algorithm for BiLSTM Based Model


Algorithm 1: BiLSTM Based Model
1. Collect dataset of speech signals 𝑆 with corresponding emotion labels E
2. Split the dataset into training set, validation set, and test set.
3. For each speech signal in the dataset
Extract features to form a feature sequence F
4. Initialize the BiLSTM model M with:
a. Forward LSTM layer
b. Backward LSTM layer
c. Fully connected (dense) layer
5. For each epoch from 1 to N:
a. For each training example (F, E)
i. Forward pass:
 Compute forward and backward hidden states.
 Concatenate hidden states.
 Compute output logits and predicted probabilities.
ii. Backward pass:
Compute gradients and update model weights using the learning rate
b. Evaluate the model on the validation set.
6. Evaluate the trained model M on the test set.
7. Compute evaluation metrics: accuracy, precision, recall, F1-score.
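The algorithm above maps directly onto standard deep-learning tooling. The following is a minimal Python sketch of Algorithm 1 using Keras and scikit-learn, assuming the model defined earlier and arrays X (padded feature sequences) and y (one-hot emotion labels); split ratios and epoch count are illustrative assumptions.

    from sklearn.model_selection import train_test_split

    # Steps 1-2: split the dataset into training, validation, and test sets.
    X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=42)
    X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

    # Step 5: forward and backward passes with weight updates for N epochs,
    # evaluating on the validation set after each epoch.
    model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=30, batch_size=32)

    # Steps 6-7: evaluate the trained model on the held-out test set.
    test_loss, test_acc = model.evaluate(X_test, y_test)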

3.4.3 Performance Metrics Used


The following highlights and discusses the performance evaluation metrics utilised for this study:
Accuracy (Percent Correct) Metric
It is the degree to which the measurements resemble a particular value. Put more simply, if a measurement is near the actual value of the amount being measured after numerous attempts, it is considered accurate (Tarun, 2020).

$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} = \frac{f_{11} + f_{00}}{f_{11} + f_{10} + f_{01} + f_{00}}$

F-Measure Metric
A statistic known as the F-measure can be created by combining the two metrics of precision and recall. Because the harmonic mean of two numbers lies near the smaller of the two, a high F-measure value guarantees that the precision and recall values are both fairly high (Tarun, 2020).

$F_1 = \frac{2rp}{r + p} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$

where $r$ is recall and $p$ is precision.

Mean Absolute Error Metric
The MAE is rather easy to calculate: the magnitudes (absolute values) of the errors are added up to form the total error, which is then divided by $n$ (Res et al., 2005), that is, $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert e_i \rvert$. Consequently, a model with a lower mean absolute error is considered to perform better, whereas a model with a higher MAE value performs worse.

True Positive Rate Metric
It measures the fraction of positive examples correctly predicted by the classifier. As an example, suppose we search for documents containing the term 'machine learning' in a corpus of 100 documents, of which 20 are relevant. If the model retrieves 15 documents when queried, 12 of which are relevant, then according to Tarun (2020) the recall is 12 / 20 = 60%.

$\text{Sensitivity} = \text{TP Rate} = \frac{TP}{TP + FN} = \text{Recall}$
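As a worked illustration consistent with the recall example above (12 of 20 relevant documents retrieved), the following minimal Python sketch computes these metrics from raw confusion counts; the FP and TN values are illustrative assumptions.

    TP, FP, FN, TN = 12, 3, 8, 77              # illustrative binary confusion counts

    accuracy = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)                    # true positive rate / sensitivity
    f1 = 2 * precision * recall / (precision + recall)

    print(accuracy, precision, recall, f1)     # 0.89 0.8 0.6 0.686 (approx.)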

REFERENCES

Abinaya, M., & Vadivu, G. (2024). Enhancing the Potential of Machine Learning for Immersive
Emotion Recognition in Virtual Environment. EAI Endorsed Transactions on Scalable
Information Systems.

Aggarwal, D. (2023). Integration of innovative technological developments and AI with education for an adaptive learning pedagogy. China Petroleum Processing and Petrochemical Technology, 23(2), 709-714.

Ahmed, S., Saif, A. S., Hanif, M. I., Shakil, M. M. N., Jaman, M. M., Haque, M. M. U., ... &
Sabbir, H. M. (2021). Att-BiL-SL: Attention-based Bi-LSTM and sequential LSTM for
describing video in the textual formation. Applied sciences, 12(1), 317.

Al Maruf, A., Khanam, F., Haque, M. M., Jiyad, Z. M., Mridha, F., & Aung, Z. (2024). Challenges
and Opportunities of Text-based Emotion Detection: A Survey. IEEE Access.

Al-Dujaili, M. J., & Ebrahimi-Moghadam, A. (2023). Speech emotion recognition: a comprehensive survey. Wireless Personal Communications, 129(4), 2525-2561.

Aliferis, C., & Simon, G. (2024). Overfitting, Underfitting and General Model Overconfidence and
Under-Performance Pitfalls and Best Practices in Machine Learning and AI. In Artificial
Intelligence and Machine Learning in Health Care and Medical Sciences: Best Practices and
Pitfalls (pp. 477-524). Cham: Springer International Publishing.

Alqahtani, G., & Alothaim, A. (2022). Predicting emotions in online social networks: challenges
and opportunities. Multimedia Tools and Applications, 81(7), 9567-9605.

Al-Qerem, A., Raja, M., Taqatqa, S., & Sara, M. R. A. (2024). Utilizing Deep Learning Models
(RNN, LSTM, CNN-LSTM, and Bi-LSTM) for Arabic Text Classification. In Artificial
Intelligence-Augmented Digital Twins: Transforming Industrial Operations for Innovation
and Sustainability (pp. 287-301). Cham: Springer Nature Switzerland.

Alslaity, A., & Orji, R. (2024). Machine learning techniques for emotion detection and sentiment
analysis: current state, challenges, and future directions. Behaviour & Information
Technology, 43(1), 139-164.

Aslan, A. (2024). Understanding Long Short-Term Memory (LSTM) Networks: A Comprehensive Guide.

Basak, S., Agrawal, H., Jena, S., Gite, S., Bachute, M., Pradhan, B., & Assiri, M. (2023).
Challenges and Limitations in Speech Recognition Technology: A Critical Review of Speech
Signal Processing Algorithms, Tools and Systems. CMES-Computer Modeling in
Engineering & Sciences, 135(2).

Bin, Y., Yang, Y., Shen, F., Xu, X., & Shen, H. T. (2016, October). Bidirectional long-short term
memory for video description. In Proceedings of the 24th ACM international conference on
Multimedia (pp. 436-440).

Brandt, A. (2023). Noise and vibration analysis: signal analysis and experimental procedures.
John Wiley & Sons.

Cahuantzi, R., Chen, X., & Güttel, S. (2023, July). A comparison of LSTM and GRU networks for
learning symbolic sequences. In Science and Information Conference (pp. 771-785). Cham:
Springer Nature Switzerland.

Cahyani, D. E., Wibawa, A. P., Prasetya, D. D., Gumilar, L., Akhbar, F., & Triyulinar, E. R.
(2022, October). Text-Based Emotion Detection using CNN-BiLSTM. In 2022 4th
International Conference on Cybernetics and Intelligent System (ICORIS) (pp. 1-5). IEEE.

Cai, Y., Li, X., & Li, J. (2023). Emotion recognition using different sensors, emotion models,
methods and datasets: A comprehensive review. Sensors, 23(5), 2455.

Cheng, Y., Zhang, C., Zhang, Z., Meng, X., Hong, S., Li, W., ... & He, X. (2024). Exploring large
language model based intelligent agents: Definitions, methods, and prospects. arXiv preprint
arXiv:2401.03428.

Denecke, K., & Reichenpfader, D. (2023). Sentiment analysis of clinical narratives: a scoping
review. Journal of Biomedical Informatics, 140, 104336.

DiPietro, R., & Hager, G. D. (2020). Deep learning: RNNs and LSTM. In Handbook of medical
image computing and computer assisted intervention (pp. 503-519). Academic Press.

El-Amir, H., Hamdy, M., El-Amir, H., & Hamdy, M. (2020). Sequential models. Deep Learning
Pipeline: Building a Deep Learning Model with TensorFlow, 415-446.

Elfaik, H., & Nfaoui, E. H. (2020). Deep bidirectional LSTM network learning-based sentiment
analysis for Arabic text. Journal of Intelligent Systems, 30(1), 395-412.

Elfenbein, H. A. (2023). Emotion in organizations: Theory and research. Annual Review of Psychology, 74(1), 489-517.

Faul, L., Baumann, M. G., & LaBar, K. S. (2023). The representation of emotional experience
from imagined scenarios. Emotion, 23(6), 1670.

Francese, R., & Attanasio, P. (2023). Emotion detection for supporting depression
screening. Multimedia Tools and Applications, 82(9), 12771-12795.

Frye, R. H. (2022). Granular Emotion Detection for Multi-Class Sentiment Analysis in Social
Media (Doctoral dissertation, The University of North Carolina at Charlotte).

Garcia-Garcia, J. M., Lozano, M. D., Penichet, V. M., & Law, E. L. C. (2023). Building a three-
level multimodal emotion recognition framework. Multimedia Tools and Applications, 82(1),
239-269.

García-Méndez, S., de Arriba-Pérez, F., Barros-Vila, A., & González-Castaño, F. J. (2023). Targeted aspect-based emotion analysis to detect opportunities and precaution in financial Twitter messages. Expert Systems with Applications, 218, 119611.

Garg, M., & Saxena, C. (2024). Emotion detection from text data using machine learning for
human behavior analysis. In Computational Intelligence Methods for Sentiment Analysis in
Natural Language Processing Applications (pp. 129-144). Morgan Kaufmann.

Gideon, J., Nguyen, H. T., & Burgt, S. V. (2020). Emotion recognition from short speech using
ensemble deep learning. Applied Sciences, 10(16), 5663.
https://doi.org/10.3390/app10165663
Gozuacik, N., Sakar, C. O., & Ozcan, S. (2023). Technological forecasting based on estimation of
word embedding matrix using LSTM networks. Technological Forecasting and Social
Change, 191, 122520.

Gülmez, B. (2023). Stock price prediction with optimized deep LSTM network with artificial
rabbits optimization algorithm. Expert Systems with Applications, 227, 120346.

Guo, R., Guo, H., Wang, L., Chen, M., Yang, D., & Li, B. (2024). Development and application of
emotion recognition technology—a systematic literature review. BMC psychology, 12(1), 95.

Guo, Y., Li, Y., Liu, D., & Xu, S. X. (2024). Measuring service quality based on customer
emotion: An explainable AI approach. Decision Support Systems, 176, 114051.

Han, K., & Kim, D. (2021). Emotion recognition in conversations using acoustic and linguistic
features. Applied Sciences, 11(6), 2542. https://doi.org/10.3390/app11062542

Hassan, N., Miah, A. S. M., & Shin, J. (2024). A Deep Bidirectional LSTM Model Enhanced by
Transfer-Learning-Based Feature Extraction for Dynamic Human Activity
Recognition. Applied Sciences, 14(2), 603.

Hornyák, O. (2023, October). An Overview on Evaluation Methods of Sequence Prediction
Problems. In International Conference Interdisciplinarity in Engineering (pp. 427-440).
Cham: Springer Nature Switzerland.

Hosseini, S., Yamaghani, M. R., & Poorzaker Arabani, S. (2024). A review of the methods of
recognition multimodal emotions in sound, image and text. International Journal of Applied
Operational Research-An Open Access Journal, 12(1), 29-41.

Hung, L. P., & Alias, S. (2023). Beyond sentiment analysis: A review of recent trends in text based
sentiment analysis and emotion detection. Journal of Advanced Computational Intelligence
and Intelligent Informatics, 27(1), 84-95.

Jose, J., & Simritha, R. (2024). Sentiment Analysis and Topic Classification with LSTM Networks
and TextRazor. International Journal of Data Informatics and Intelligent Computing, 3(2),
42-51.

Kumar, A., Bhatia, A., Kashyap, A., & Kumar, M. (2023). LSTM network: a deep learning
approach and applications. In Advanced Applications of NLP and Deep Learning in Social
Media Data (pp. 130-150). IGI Global.

Kumar, G., Das, T., & Singh, K. (2024). Early detection of depression through facial expression
recognition and electroencephalogram-based artificial intelligence-assisted graphical user
interface. Neural Computing and Applications, 1-18.

Kusal, S., Patil, S., Choudrie, J., Kotecha, K., Vora, D., & Pappas, I. (2023). A systematic review
of applications of natural language processing and future challenges with special emphasis in
text-based emotion detection. Artificial Intelligence Review, 56(12), 15129-15215.

Latif, S., Ali, H. S., Usama, M., Rana, R., Schuller, B., & Qadir, J. (2022). AI-based emotion
recognition: Promise, peril, and prescriptions for prosocial path. arXiv preprint
arXiv:2211.07290.

Latif, S., Bashir, S., Agha, M. M. A., & Latif, R. (2020). Backward-forward sequence generative
network for multiple lexical constraints. In Artificial Intelligence Applications and
Innovations: 16th IFIP WG 12.5 International Conference, AIAI 2020, Neos Marmaras,
Greece, June 5–7, 2020, Proceedings, Part II 16 (pp. 39-50). Springer International
Publishing.

Lee, J. Y., & Kim, J. H. (2019). Deep neural network-based emotion recognition system using a
bidirectional long short-term memory model. Electronics, 8(11), 1212.
https://doi.org/10.3390/electronics8111212

Leus, G., Marques, A. G., Moura, J. M., Ortega, A., & Shuman, D. I. (2023). Graph Signal
Processing: History, development, impact, and outlook. IEEE Signal Processing
Magazine, 40(4), 49-60.

Li, L., Wang, H., Zha, L., Huang, Q., Wu, S., Chen, G., & Zhao, J. (2023). Learning a data-driven
policy network for pre-training automated feature engineering. In The Eleventh International
Conference on Learning Representations.

Ma, L., Li, X., & Xue, L. (2018). Deep recurrent neural networks for emotion recognition from
speech. IEEE Transactions on Multimedia, 20(11), 3134-3142.
https://doi.org/10.1109/TMM.2018.2792158
Machová, K., Szabóova, M., Paralič, J., & Mičko, J. (2023). Detection of emotion by text analysis
using machine learning. Frontiers in Psychology, 14, 1190326.

Mahadevaswamy, U. B., & Swathi, P. (2023). Sentiment analysis using bidirectional LSTM
network. Procedia Computer Science, 218, 45-56.

Manalu, H. V., & Rifai, A. P. (2024). Detection of human emotions through facial expressions
using hybrid convolutional neural network-recurrent neural network algorithm. Intelligent
Systems with Applications, 200339.

Miao, R. (2023). Emotion Analysis and Opinion Monitoring of Social Network Users Under Deep
Convolutional Neural Network. Journal of Global Information Management (JGIM), 31(1), 1-
12.

Mortillaro, M., & Schlegel, K. (2023). Embracing the emotion in emotional intelligence
measurement: Insights from emotion theory and research. Journal of Intelligence, 11(11),
210.

Oğuz, F. E., Alkan, A., & Schöler, T. (2023). Emotion detection from ECG signals with different
learning algorithms and automated feature engineering. Signal, Image and Video Processing,
17(7), 3783-3791.

Olah, C. (2015). Understanding LSTM Networks.
https://colah.github.io/posts/2015-08-Understanding-LSTMs/ (accessed Apr. 2023).

Omarov, B., & Zhumanov, Z. (2023). Bidirectional long-short-term memory with attention
mechanism for emotion analysis in textual content. International Journal of Advanced
Computer Science and Applications, 14(6).

Oyebode, O., Ogubuike, R., Daniel, D., & Orji, R. (2023, September). Emotion Detection from
Real-Life Situations Based on Journal Entries Using Machine Learning and Deep Learning
Techniques. In Proceedings of SAI Intelligent Systems Conference (pp. 477-502). Cham:
Springer Nature Switzerland.

Qian, Y. (2023). Enhancing Automatic Emotion Recognition for Clinical Applications: A
Multimodal, Personalized Approach and Quantification of Emotional Reaction Intensity with
Transformers.

Radke, R. J. (2024). A Signal Processor Teaches Generative Artificial Intelligence [SP
Education]. IEEE Signal Processing Magazine, 41(2), 6-10.

Rai, M., & Pandey, J. K. (Eds.). (2024). Using Machine Learning to Detect Emotions and Predict
Human Psychology. IGI Global.

Rane, N. (2023). Enhancing customer loyalty through Artificial Intelligence (AI), Internet of
Things (IoT), and Big Data technologies: improving customer satisfaction, engagement,
relationship, and experience (October 13, 2023).

Reitsema, A. M., Jeronimus, B. F., van Dijk, M., Ceulemans, E., van Roekel, E., Kuppens, P., &
de Jonge, P. (2023). Distinguishing dimensions of emotion dynamics across 12 emotions in
adolescents’ daily lives. Emotion, 23(6), 1549.

Rodríguez-Ibáñez, M., Casánez-Ventura, A., Castejón-Mateos, F., & Cuenca-Jiménez, P. M.
(2023). A review on sentiment analysis from social media platforms. Expert Systems with
Applications, 223, 119862.

Rushan, R. R., Hossain, S., Shovon, S. S., & Rahman, M. A. (2024). Emotion detection for Bangla
language (Doctoral dissertation, Brac University).

Saffar, A. H., Mann, T. K., & Ofoghi, B. (2023). Textual emotion detection in health: Advances
and applications. Journal of Biomedical Informatics, 137, 104258.

Saha, G., Sharma, S., & Sircar, S. (2021). Emotion recognition using speech: A comprehensive
survey. IEEE Transactions on Affective Computing. Advance online publication.
https://doi.org/10.1109/TAFFC.2021.3061559

Sajno, E., Bartolotta, S., Tuena, C., Cipresso, P., Pedroli, E., & Riva, G. (2023). Machine learning
in biosignals processing for mental health: A narrative review. Frontiers in Psychology, 13,
1066317.

Santos, O. C. (2023). Beyond cognitive and affective issues: Designing smart learning
environments for psychomotor personalized learning. In Learning, Design, and Technology:
An International Compendium of Theory, Research, Practice, and Policy (pp. 3309-3332).
Cham: Springer International Publishing.

Sherstinsky, A. (2020). Fundamentals of recurrent neural network (RNN) and long short-term
memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, 132306.

Shiota, M. N. (2024). Basic and Discrete Emotion Theories. Routledge Handbook of Emotion
Theory, London: Routledge.

Sidhu, M. S., Latib, N. A. A., & Sidhu, K. K. (2024). MFCC in audio signal processing for voice
disorder: a review. Multimedia Tools and Applications, 1-21.

Slovak, P., Antle, A., Theofanopoulou, N., Daudén Roquet, C., Gross, J., & Isbister, K. (2023).
Designing for emotion regulation interventions: an agenda for HCI theory and research. ACM
Transactions on Computer-Human Interaction, 30(1), 1-51.

Soujanya Rao, M., Coombs, T., Binti Mohamad, N., Kumar, V., & Jayabalan, M. (2023,
December). Comparative Analysis of Emotion Recognition Using Large Language Models
and Conventional Machine Learning. In The International Conference on Data Science and
Emerging Technologies (pp. 211-220). Singapore: Springer Nature Singapore.

Tang, L., Yuan, P., & Zhang, D. (2024). Emotional experience during human-computer
interaction: a survey. International Journal of Human–Computer Interaction, 40(8), 1845-
1855.

Tathgir, A., Sharma, C. M., & Chariar, V. M. (2024). EEG-based Emotion Classification using
Deep Learning: Approaches, Trends and Bibliometrics. Qeios.

Troiano, E., Oberländer, L., & Klinger, R. (2023). Dimensional modeling of emotions in text with
appraisal theories: Corpus creation, annotation reliability, and prediction. Computational
Linguistics, 49(1), 1-72.

Van Houdt, G., Mosquera, C., & Nápoles, G. (2020). A review on the long short-term memory
model. Artificial Intelligence Review, 53(8), 5929-5955.

Wagner, N., Mätzler, F., Vossberg, S. R., Schneider, H., Pavlitska, S., & Zöllner, J. M. (2024).
CAGE: Circumplex Affect Guided Expression Inference. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition (pp. 4683-4692).

Wilson, P., & Lewandowska-Tomaszczyk, B. (2023). Prototypes in emotion concepts. Lodz
Papers in Pragmatics, 19(1), 125-143.

Zhang, W. (2023). Teaching Reform of Digital Signal Processing Driven by Probabilistic Neural
Network. Advances in Education, Humanities and Social Science Research, 8(1), 30-30.

Zhao, Y., Wang, G., Tang, C., Luo, C., Zeng, W., & Zha, Z. J. (2021). A battle of network
structures: An empirical study of cnn, transformer, and mlp. arXiv preprint
arXiv:2108.13002.

Zhou, Y., Xu, C. C., Song, M., Wong, Y. K., & Du, K. (2024). A Novel Quantum LSTM
Network. arXiv preprint arXiv:2406.08982.
