
JOURNAL OF APPLIED SCIENCE AND TECHNOLOGY TRENDS
Vol. 02, No. 01, pp. 73-79 (2021)
ISSN: 2708-0757 | doi: 10.38094/jastt20291
www.jastt.org

Multimodal Emotion Recognition using Deep Learning

Sharmeen M. Saleem Abdullah 1, Siddeeq Y. Ameen 2, Mohammed A. M. Sadeeq 3, Subhi R. M. Zeebaree 4

1 Duhok Polytechnic University, Duhok, Iraq, [email protected]
2 Duhok Polytechnic University, Duhok, Iraq, [email protected]
3 Duhok Polytechnic University, Duhok, Iraq, [email protected]
4 Duhok Polytechnic University, Duhok, Iraq, [email protected]

Abstract

Recent research into human-computer interaction seeks to take the user's emotional state into account in order to provide a seamless human-computer interface. Such systems could be deployed in a wide range of fields, including education and medicine. Human emotions can be inferred through multiple channels, including speech, facial images, physiological signals, and neuroimaging. This paper reviews emotion recognition from multimodal signals using deep learning and compares the applications reported in current studies. Multimodal affective computing systems are examined alongside unimodal solutions because they offer higher classification accuracy. Accuracy varies with the number of emotions considered, the features extracted, the classification scheme, and the consistency of the database. The review covers current theories and methodologies of emotion detection and should help researchers better understand physiological signals, the current state of the science, and the open problems in emotional awareness.

Keywords: Emotion recognition, Facial recognition, Physiological signals, Deep Learning


Received: January 09th, 2021 / Accepted: May 14th, 2021 / Online: May 30th, 2021

I. INTRODUCTION

Recognition of emotions is a dynamic process that targets the emotional state of the person; the emotions corresponding to each individual's actions are different [1, 2]. Human beings, in general, communicate their feelings in different ways [3, 4]. To ensure meaningful communication, accurate interpretation of these emotions is important [5, 6]. Emotional recognition in our everyday lives is important for social contact, and emotions play an important role in deciding human actions [7, 8].

People communicate their feelings in various ways, both verbally and nonverbally, including expressive speech, facial gestures, and body language [9, 10]. Therefore, emotional signals from multiple modalities may be used to predict the emotional state of a subject [11-13]. A single-modal model, however, cannot easily judge the user's emotion [14, 15]: we cannot decide that someone is emotional simply by looking at a particular entity or occurrence in front of our eyes [16, 17]. This is one reason why emotion recognition should be treated as a multimodal problem [18, 19].

Affective computing is applied in education [20, 21], smart cards, motor vehicles [22], and domains ranging from entertainment to health care [23]. It is also relevant to human-computer interface architecture, intelligent robotics, and safety control [24, 25].

For the last few years, deep learning [23, 26], relying on the most modern systems, has achieved great success in many areas, such as signal processing, artificial intelligence, and emotion detection. Deep belief networks (DBN) [27, 28], convolutional neural networks (CNN) [29-31], and recurrent neural networks (RNN) [32] are the most commonly used deep learning approaches [33].

In this paper, we present a review of recent advancements in emotion research using multimodal signals, covering feature extraction and classification methodologies based on deep learning, particularly for emotion elicitation stimuli.


This review aims to assess real-time emotion detection systems and to track the latest progress in this technology. The most recent observations, from 2019 and 2020, are discussed along with the topics and contributions of each work.

The remainder of this paper is structured as follows: Section II outlines the concepts of recognizing human feelings and the role of deep learning. Section III explains how multimodal emotional signals are combined and provides a summary of recent research work. Section IV offers a discussion and comparison, and Section V gives a general conclusion of this emotion recognition study.

II. EMOTION RECOGNITION AND DEEP LEARNING

A. Facial expression recognition

Facial gestures are an important way of expressing feelings in nonverbal communication, and facial expression recognition (FER) plays an important role in many applications, including human-computer interaction and health care [34, 35]. Mehrabian observed that 7% of information moves between people through writing, 38% through voice, and 55% through facial expression [36]. Ekman et al. [37] defined six basic emotions: happiness, sadness, surprise, fear, disgust, and anger, and showed that humans perceive these emotions regardless of their culture [32]. Emotions can also be expressed using two orthogonal dimensions, valence and arousal, as proposed by Feldman [38], who suggested that everyone communicates feelings differently; moreover, there are strong variations between people when they are asked to express particular emotions [39, 40]. Valence ranges from positive to negative and arousal from calm to excited [41], and work in this setting categorizes the input by its values of valence and arousal [42].

Early methods of facial expression feature extraction were handcrafted, with developers designing algorithms for the extracted features, such as Gabor wavelets, the Weber Local Descriptor (WLD), Local Binary Patterns (LBP) [43], multi-feature fusion, etc. These features are not resilient against variation between subjects and may lose much of the texture information in the original image [44]. Applying deep neural network models to facial expression analysis is currently the most active topic in facial recognition [45]. FER also has a wide variety of uses in social life, such as smart security, lie detection, and smart medical practice [46], [47]. [27] reviewed facial expression recognition models based on deep learning techniques, including DBN, deep CNN, Long Short-Term Memory (LSTM) [48, 49], and their combinations.

B. Speech emotion recognition

Speech emotion recognition is one of the main elements of human-computer interface systems, in which users communicate their feelings through voice and face [50]. Speech recognition systems are widely used for emotion detection [12, 45]. The earliest experiments on emotion recognition in speech relied on extracting hand-crafted speech features for classification. Liscombe et al. (2003) extracted a series of continuous speech features based on fundamental pitch, amplitude, and spectral tilt and evaluated their relationship with various emotions [51]. Various algorithms for recognizing feelings in human speech have been proposed over the years [52], many of them built on machine learning algorithms such as support vector machines, hidden Markov models, and Gaussian mixture models. Deep learning has since been widely used in several speech domains, including speech recognition [50]. Convolutional neural networks have also been used for speech emotion recognition [53], and [54] shows that a bidirectional RNN (Bi-LSTM) is better at extracting the essential speech characteristics needed for good speech emotion recognition performance [55].

C. Multimodal emotion recognition

Multimodal emotion processing continues to see widespread application in science [9]. It helps to understand emotions better by drawing on the complementary information of related modalities (video, audio, sensor data, etc.). Many different approaches and strategies are integrated to meet this goal, and many of them use big data techniques, semantic principles, and deep learning [30].

III. MULTIMODAL EMOTION RECOGNITION

Emotions are dynamic psychophysiological processes that often occur nonverbally, which makes emotion identification complicated. Multimodal learning is a far more efficient form of learning than unimodal learning [56]. Studies have therefore tried integrating signals from different modalities for better efficiency and precision, such as facial expressions and audio, audio and written text, physiological signals, and various combinations of these modalities [57]. At present, this technique is being used to increase the precision of emotion detection further: a multimodal fusion model can achieve emotion detection by integrating physiological signals in various ways [58]. With the recent developments in Deep Learning (DL) architectures, deep learning has been applied to multimodal emotion recognition [59]. The techniques used include deep belief networks, deep convolutional neural networks, LSTM [60], support vector machines (SVM) [61], and their combinations [27].
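To make the fusion idea concrete, the sketch below shows one minimal way such a two-branch model could be wired up in PyTorch: a small CNN encodes a face crop, a bidirectional LSTM encodes a sequence of acoustic frames, and the two embeddings are concatenated before a shared classification layer. The layer sizes, input shapes, and six-class output are illustrative assumptions made for this review, not the architecture of any specific study discussed here.

```python
import torch
import torch.nn as nn

class LateFusionEmotionNet(nn.Module):
    """Illustrative two-branch model: a small CNN over face crops and a
    bidirectional LSTM over acoustic frames, fused by concatenation."""
    def __init__(self, n_emotions=6, n_acoustic_feats=40):
        super().__init__()
        # Visual branch: 48x48 grayscale face crop -> 64-dim embedding
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * 12 * 12, 64), nn.ReLU(),
        )
        # Acoustic branch: sequence of frame features (e.g. MFCCs) -> 64-dim embedding
        self.bilstm = nn.LSTM(n_acoustic_feats, 32, batch_first=True,
                              bidirectional=True)
        # Fusion: concatenate both embeddings, then classify
        self.classifier = nn.Linear(64 + 64, n_emotions)

    def forward(self, face, audio):
        v = self.cnn(face)                  # (batch, 64)
        _, (h, _) = self.bilstm(audio)      # h: (2, batch, 32), one state per direction
        a = torch.cat([h[0], h[1]], dim=1)  # (batch, 64)
        return self.classifier(torch.cat([v, a], dim=1))

# Example usage with random tensors standing in for real data
model = LateFusionEmotionNet()
logits = model(torch.randn(8, 1, 48, 48), torch.randn(8, 100, 40))
print(logits.shape)  # torch.Size([8, 6])
```

The same skeleton generalizes to the other modality pairs surveyed below: each branch is replaced by an encoder suited to its signal, and the fusion point can be moved earlier (feature level) or later (decision level).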


A. Multimodal emotion recognition combining audio and text, or image and text

Pan et al. [62] studied a hybrid fusion process, referred to as a multimodal attention network (MMAN), to make use of visual and textual cues in speech emotion recognition. They propose a new multimodal attention mechanism, cLSTM-MMA, which promotes attention across three modalities and fuses information selectively; during late fusion, cLSTM-MMA is combined with other unimodal sub-networks. The tests demonstrate that speech emotion recognition profits immensely from visual and textual cues, and that cLSTM-MMA alone is as accurate as other fusion approaches while having a much more compact network structure.

Siriwardhana et al. [63] investigated the use of pre-trained "BERT-like" self-supervised learning (SSL) architectures to represent both the speech and text modalities for multimodal emotion recognition. They demonstrate that a simple fusion mechanism (Shallow-Fusion) simplifies the overall structure and improves on more complex fusion mechanisms.

Priyasad et al. [64] presented a deep learning-based approach built on attention-driven fusion of acoustic and textual cues. Acoustic features are extracted from raw audio through a SincNet layer, a band-pass filtering technique followed by a neural network, and the output of the band-pass filters is fed to a DCNN. For the text, representations at the n-gram level are first computed by a bidirectional recurrent neural network and then processed by another recurrent neural network with cross attention before being combined into a final score.

Krishna et al. [50] introduced a new way of combining a raw-waveform-based convolutional neural network with cross-modal attention. They process raw audio with one-dimensional convolutional models and apply attention between the audio and text features to obtain enhanced emotion detection. Their models achieve state-of-the-art emotion classification results.

Caihua [24] found through experiments that an SVM-based machine learning approach is powerful for voice sentiment analysis and proposed an SVM-based multimodal speech emotion recognition method. The experimental findings reveal that the SVM algorithm improves speech emotion recognition considerably when applied to a common database classification problem. Finally, the approach is applied to interpret emotional expression and produces successful speech emotion recognition results.

Lee et al. [65] developed a multimodal deep learning model that utilizes facial images and textual descriptions of the circumstances. To classify the characters' facial images in the Korean TV series 'Misaeng: The Incomplete' into seven emotions (rage, disgust, terror, joyful, neutral, sad, and surprise), they built two multimodal models that use photographs and text to identify emotions. The experimental findings indicated that using a text description of the characters' behaviour dramatically increases recognition performance.

Liu et al. [66] developed a new multimodal music emotion classification system based on the audio and the lyric text of each song. For the audio, classification with an LSTM network is proposed, and the classification effect is greatly improved relative to other machine learning methods. For the lyrics, BERT is used to model their emotions, which essentially addresses the long-term dependency problem, and LSFM is suggested for multimodal fusion. An emotion dictionary is used to adjust the emotional classification of the lyrics, and the network performs fusion at the decision-making stage through linear weighting, which increases efficiency and precision.

Table I compares several works in terms of deep learning algorithms/techniques, accuracy, and dataset used.

TABLE I. COMBINING SIGNALS FROM AUDIO AND TEXT, IMAGE AND TEXT

Author | Architecture (classification) | Technique (method) | Accuracy | Dataset used
Pan et al. [62] | LSTM | MMAN, fusion method | 73.98% | IEMOCAP
Siriwardhana et al. [63] | SSL models (Speech-BERT, RoBERTa) | Shallow fusion | __ | IEMOCAP, CMU-MOSEI, CMU-MOSI
Priyasad et al. [64] | DCNN with a SincNet layer, RNN | Band-pass filters | 80.51% | IEMOCAP
Krishna et al. [50] | 1D CNN | Cross-modal attention | 1.9% improvement | IEMOCAP
Caihua [24] | SVM | Fusion method | 72.52% | Berlin Emotional DB
Lee et al. [65] | CNN | Natural Language Processing (NLP) | __ | Characters from an Asian TV drama series
Liu et al. [66] | LSTM model, LSFM | BERT | 5.77% improvement | 777 songs (Music Mood Classification Data Sets)

B. Multimodal emotion recognition combining facial and body physiological signals

Dhaouadi et al. [67] evaluated LSTM and Deep Neural Network (DNN) models for real-time stress monitoring of young gamers, focusing on the reactions of the body. Physiological signals such as electrocardiography (ECG), electrodermal activity (EDA), and electromyography (EMG), measured by non-invasive wearable sensors, are used for this purpose. The results obtained show that, in this case, the LSTM model was more effective and reliable than the DNN for the parameters they fixed.

Yang et al. [68] proposed an emotion detection method that uses electrocardiogram (ECG) and photoplethysmogram (PPG) signals as its input sources. Three emotional states (positive, neutral, negative) were defined as the classification outputs. A convolutional neural network (CNN) was developed to map the features extracted from the ECG and PPG signals to the subject's emotions efficiently. With just two signals tracked, the performance is equal to or better than that of similar works.
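As a rough illustration of how a network of the kind described in [68] might be organized, the following sketch maps fixed-length two-channel (ECG and PPG) windows to three emotion classes with a 1D CNN. The filter sizes, window length, and the assumed 128 Hz sampling rate are placeholders chosen for this example, not the configuration reported in that work.

```python
import torch
import torch.nn as nn

class BioSignalCNN(nn.Module):
    """Illustrative 1D CNN over two physiological channels (ECG and PPG),
    classifying each fixed-length window as positive, neutral, or negative."""
    def __init__(self, n_channels=2, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.AdaptiveAvgPool1d(1),   # collapse the time axis to one value per filter
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):              # x: (batch, channels, samples)
        return self.classifier(self.features(x).squeeze(-1))

# A 10-second window at an assumed 128 Hz sampling rate -> 1280 samples
model = BioSignalCNN()
print(model(torch.randn(4, 2, 1280)).shape)  # torch.Size([4, 3])
```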


Zhang et al. [69] proposed a novel multimodal fusion scheme with regularization, based on a new kernel-matrix perspective and a deep network architecture. In representation learning, they use the superior capacity of the deep network architecture to transform the native space of a predefined kernel into a task-specific feature space. They adopt a shared representation layer to learn the fused representation, which implies a tacit combination of many kernels; at the same time, a new regularization term is added to the loss function to model the relationships between the representations and enhance the efficiency of multimodal fusion.

Nakisa et al. [70] suggest a temporal fusion approach with a deep learning model to capture the non-linear emotional associations and enhance emotion classification efficiency. The proposed model is evaluated with two separate fusion approaches, early fusion and late fusion. Specifically, after learning each modality with a single deep network, they use a convolutional (ConvNet) LSTM model to combine the EEG and BVP signals and learn the strongly coupled representation of feelings across modalities. The findings revealed that the temporal multimodal deep learning models effectively grouped human emotions into the four quadrants of dimensional emotion under both early and late fusion.

Asghar et al. [71] employed features from four previously trained DNN architectures to classify EEG data, generating the training data by transforming the captured signals into two-dimensional images. They proposed an innovative and powerful form of emotion detection based on a high-quality feature selection, and by using DFC-based features they shorten the training time of the network. The results suggest that the model significantly improves the extracted features and the feature classification efficiency.

Nie et al. [72] proposed a multi-layer LSTM architecture to fuse the multimodal inputs of an emotion recognition system, with major improvements coming from neural-network-based add-on ideas at the utterance level. They use an LSTM to analyze the video in order to extract global speech features. The experimental findings showed outstanding and important efficiency improvements over the baseline.

Table II compares several works in terms of deep learning algorithms/techniques, accuracy, and dataset used.

TABLE II. COMBINING SIGNALS FROM PHYSIOLOGICAL SIGNALS
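The two fusion strategies that recur throughout the works above can also be illustrated in code. The sketch below contrasts feature-level fusion (modality embeddings concatenated before a single classifier) with decision-level fusion (per-modality classifiers whose scores are averaged), in the spirit of the early/late fusion comparison of [70]; the encoder sizes, channel counts, window lengths, and four-class output are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """1D CNN followed by an LSTM, encoding one signal group (e.g. EEG or BVP)."""
    def __init__(self, in_channels, emb_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(16, emb_dim, batch_first=True)

    def forward(self, x):                    # x: (batch, channels, time)
        h = self.conv(x).transpose(1, 2)     # (batch, time, 16)
        _, (hn, _) = self.lstm(h)
        return hn[-1]                        # (batch, emb_dim)

class FeatureLevelFusion(nn.Module):
    """Concatenate the modality embeddings, then use one shared classifier."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.eeg, self.bvp = ModalityEncoder(32), ModalityEncoder(1)
        self.head = nn.Linear(64, n_classes)

    def forward(self, eeg, bvp):
        return self.head(torch.cat([self.eeg(eeg), self.bvp(bvp)], dim=1))

class DecisionLevelFusion(nn.Module):
    """Classify each modality separately and average the class scores."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.eeg, self.bvp = ModalityEncoder(32), ModalityEncoder(1)
        self.eeg_head = nn.Linear(32, n_classes)
        self.bvp_head = nn.Linear(32, n_classes)

    def forward(self, eeg, bvp):
        return 0.5 * (self.eeg_head(self.eeg(eeg)) + self.bvp_head(self.bvp(bvp)))

# 32-channel EEG and single-channel BVP windows of 512 samples (assumed shapes)
eeg, bvp = torch.randn(4, 32, 512), torch.randn(4, 1, 512)
print(FeatureLevelFusion()(eeg, bvp).shape, DecisionLevelFusion()(eeg, bvp).shape)
```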


IV. DISCUSSION

Several scientific experiments on emotion recognition have been carried out during the last decade using multimodal approaches, mixing modal signals such as facial and audio gestures, audio and written language, physiological signals, and different combinations of these modalities.

In the studies reviewed, every work used more than one modality to identify emotions, combining different methods and techniques with deep learning. Deep learning itself offers multiple algorithms, methods, and architectures for classification and feature extraction, and these choices directly affect the depth of understanding achieved and hence the precision, reliability, and performance.

After reviewing the latest research in multimodal emotion recognition, we divided the studies into two groups. The first group combines signals from audio and text or from image and text; the second group combines facial and body physiological signals. Table I and Table II show the specifics of the work undertaken so far to identify different feelings using multimodal signals.

Table I covers seven works from the first group. Previously used techniques were paired with deep learning, but on new data sets. We find that some of these studies break their results down fairly thoroughly, while others are less transparent, making comparison a difficult challenge. In [37], a CNN over the raw waveform was used, followed by cross-modal attention between the audio and text features, and in [30] a deep learning (CNN) methodology was developed to process and merge text and acoustic data to recognize emotions; both achieved good results and improved performance, with accuracy reaching 80.5% on the IEMOCAP data set. In [24], the author used SVM-based multimodal voice emotion recognition; the results show that the SVM algorithm improved speech emotion recognition considerably, with an accuracy of 72.52% on the Berlin Emotional DB. In [50] and [56], the authors used LSTM for classification, and both record significant improvements in performance, with accuracy reaching 73.98% and a 5.77% improvement, respectively, on different databases. In [51], speech and text are combined, whereas in [44] image and text are combined as the recognition signals; both studies benefited from deep learning and observed a performance improvement, although the accuracy rate was not reported.

Table II contains six works from the second group. We can see that emotion elicitation techniques assist emotion classification. In [57] and [61], the authors both used LSTM and deep learning to classify features, but with different physiological signals: when three physiological signals (ECG, EDA, EMG) are considered, the classification accuracy was 95% using LSTM, whereas with only two physiological signals (EEG, BVP) the classification accuracy was 71.61%. In [58] and [62], both used a CNN with two physiological signals, but in [62] more than one classification algorithm was used, which had a great impact on the results, with an accuracy of more than 97% on the SEED database. From this we notice that, in some cases, the results are better when more than one algorithm is used. In [60], a new algorithm and architecture using a kernel matrix and a deep neural network are proposed to boost the efficiency of multimodal fusion; this method yields different accuracies on different databases, approximately 63% on the DEAP data set and less, about 57%, on the DECAF data set.

V. CONCLUSION

In this paper, we analyzed and discussed various multimodal approaches to identifying human emotions using deep learning. The findings indicate that emotion recognition can be performed with greater precision when a multimodal approach uses biological signals to identify emotional states.

Emotion modulates nearly all modes of human expression, such as facial expression, movements, posture, voice tone, word choice, breathing, and skin temperature and clamminess. Emotions can alter a message significantly: often what matters most is not what has been said, but how it has been said. Faces tend to be the most obvious channel for conveying feelings; however, unlike the voice and other modes of expression, they are also easily manipulated in response to different social circumstances.

Emotional variations, however, can be observed in physiological signals only for a very short period of around 3-15 seconds [73]. It would therefore give better results to extract the details at the moment of the emotional response, which, across the different physiological signals, entails a window-dependent strategy.

Furthermore, a consumer emotion detection system will achieve higher classification accuracy by using comprehensive and novel feature extraction, feature selection, and classification techniques.

We also note that researchers have trained and tested their models on many datasets to assess the proposed neural network structures, and the recognition rates differ from one database to another even with the same deep learning model.

REFERENCES

[1] N. Perveen, D. Roy, and K. M. Chalavadi, "Facial Expression Recognition in Videos Using Dynamic Kernels," IEEE Transactions on Image Processing, vol. 29, pp. 8316-8325, 2020.
[2] S. Bateman and S. Ameen, "Comparison of algorithms for use in adaptive adjustment of digital data receivers," IEE Proceedings I (Communications, Speech and Vision), vol. 137, pp. 85-96, 1990.
[3] H. I. Dino and M. B. Abdulrazzaq, "Facial expression classification based on SVM, KNN and MLP classifiers," in 2019 International Conference on Advanced Science and Engineering (ICOASE), 2019, pp. 70-75.
[4] O. F. Mohammad, M. S. M. Rahim, S. R. M. Zeebaree, and F. Y. Ahmed, "A survey and analysis of the image encryption methods," International Journal of Applied Engineering Research, vol. 12, pp. 13265-13280, 2017.
[5] V. Shrivastava, V. Richhariya, and V. Richhariya, "Puzzling Out Emotions: A Deep-Learning Approach to Multimodal Sentiment Analysis," in 2018 International Conference on Advanced Computation and Telecommunication (ICACAT), 2018, pp. 1-6.
[6] D. A. Zebari, H. Haron, S. R. Zeebaree, and D. Q. Zeebaree, "Enhance the Mammogram Images for Both Segmentation and Feature Extraction Using Wavelet Transform," in 2019 International Conference on Advanced Science and Engineering (ICOASE), 2019, pp. 100-105.
[7] L. Chen, Y. Ouyang, Y. Zeng, and Y. Li, "Dynamic facial expression recognition model based on BiLSTM-Attention," in 2020 15th International Conference on Computer Science & Education (ICCSE), 2020, pp. 828-832.
[8] M. Wu, W. Su, L. Chen, W. Pedrycz, and K. Hirota, "Two-stage Fuzzy Fusion based-Convolution Neural Network for Dynamic Emotion Recognition," IEEE Transactions on Affective Computing, 2020.
[9] A. Clark, S. Abdullah, and S. Ameen, "A comparison of decision-feedback equalizers for a 9600 bit/s modem," Journal of the Institution of Electronic and Radio Engineers, vol. 58, pp. 74-83, 1988.
[10] S. Ammen, M. Alfarras, and W. Hadi, "OFDM System Performance Enhancement Using Discrete Wavelet Transform and DS-SS System Over Mobile Channel," ACTA Press Advances in Computer and Engineering, 2010.
[11] J. Liang, S. Chen, and Q. Jin, "Semi-supervised Multimodal Emotion Recognition with Improved Wasserstein GANs," in 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2019, pp. 695-703.
[12] A. A. Yazdeen, S. R. Zeebaree, M. M. Sadeeq, S. F. Kak, O. M. Ahmed, and R. R. Zebari, "FPGA Implementations for Data Encryption and Decryption via Concurrent and Parallel Computation: A Review," Qubahan Academic Journal, vol. 1, pp. 8-16, 2021.
[13] A. A. Salih and M. B. Abdulrazaq, "Combining best features selection using three classifiers in intrusion detection system," in 2019 International Conference on Advanced Science and Engineering (ICOASE), 2019, pp. 94-99.
[14] H. Dino, M. B. Abdulrazzaq, S. R. Zeebaree, A. B. Sallow, R. R. Zebari, H. M. Shukur, et al., "Facial Expression Recognition based on Hybrid Feature Extraction Techniques with Different Classifiers," TEST Engineering & Management, vol. 83, pp. 22319-22329, 2020.
[15] S. S. R. Zeebaree, S. Ameen, and M. Sadeeq, "Social Media Networks Security Threats, Risks and Recommendation: A Case Study in the Kurdistan Region," International Journal of Innovation, Creativity and Change, vol. 13, pp. 349-365, 2020.
[16] S. Y. Ameen and S. W. Nourildean, "Coordinator and router investigation in IEEE 802.15.4 ZigBee wireless sensor network," in 2013 International Conference on Electrical Communication, Computer, Power, and Control Engineering (ICECCPCE), 2013, pp. 130-134.
[17] M. R. Al-Sultan, S. Y. Ameen, and W. M. Abduallah, "Real Time Implementation of Stegofirewall System," International Journal of Computing and Digital Systems, vol. 8, pp. 498-504, 2019.
[18] E. Chandra and J. Y.-j. Hsu, "Deep Learning for Multimodal Emotion Recognition-Attentive Residual Disconnected RNN," in 2019 International Conference on Technologies and Applications of Artificial Intelligence (TAAI), 2019, pp. 1-8.


[19] M. B. Abdulrazzaq and J. N. Saeed, "A comparison of three classification algorithms for handwritten digit recognition," in 2019 International Conference on Advanced Science and Engineering (ICOASE), 2019, pp. 58-63.
[20] J. Chen, Y. Lv, R. Xu, and C. Xu, "Automatic social signal analysis: Facial expression recognition using difference convolution neural network," Journal of Parallel and Distributed Computing, vol. 131, pp. 97-102, 2019.
[21] M. R. Mahmood, M. B. Abdulrazzaq, S. Zeebaree, A. K. Ibrahim, R. R. Zebari, and H. I. Dino, "Classification techniques' performance evaluation for facial expression recognition," Indonesian Journal of Electrical Engineering and Computer Science, vol. 21, pp. 176-1184, 2021.
[22] L. Hu, W. Li, J. Yang, G. Fortino, and M. Chen, "A Sustainable Multi-modal Multi-layer Emotion-aware Service at the Edge," IEEE Transactions on Sustainable Computing, 2019.
[23] E. Ghaleb, M. Popa, and S. Asteriadis, "Multimodal and Temporal Perception of Audio-visual Cues for Emotion Recognition," in 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII), 2019, pp. 552-558.
[24] C. Caihua, "Research on Multi-modal Mandarin Speech Emotion Recognition Based on SVM," in 2019 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS), 2019, pp. 173-176.
[25] M. B. Abdulrazzaq and K. I. Khalaf, "Handwritten Numerals' Recognition in Kurdish Language Using Double Feature Selection," in 2019 2nd International Conference on Engineering Technology and its Applications (IICETA), 2019, pp. 167-172.
[26] K. B. Obaid, S. Zeebaree, and O. M. Ahmed, "Deep Learning Models Based on Image Classification: A Review," International Journal of Science and Business, vol. 4, pp. 75-81, 2020.
[27] T. D. Nguyen, "Multimodal emotion recognition using deep learning techniques," Queensland University of Technology, 2020.
[28] S. Y. Ameen, F. M. Almusailkh, and M. H. Al-Jammas, "FPGA Implementation of Neural Networks Based Symmetric Cryptosystem," in 6th International Conference: Sciences of Electronic, Technologies of Information and Telecommunications, May 2011, pp. 12-15.
[29] K. Mohan, A. Seal, O. Krejcar, and A. Yazidi, "Facial Expression Recognition using Local Gravitational Force Descriptor based Deep Convolution Neural Networks," IEEE Transactions on Instrumentation and Measurement, 2020.
[30] C. Marechal, D. Mikolajewski, K. Tyburek, P. Prokopowicz, L. Bougueroua, C. Ancourt, et al., "Survey on AI-Based Multimodal Methods for Emotion Detection," 2019.
[31] E. S. Hussein, U. Qidwai, and M. Al-Meer, "Emotional Stability Detection Using Convolutional Neural Networks," in 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT), 2020, pp. 136-140.
[32] H. I. Dino and M. B. Abdulrazzaq, "A Comparison of Four Classification Algorithms for Facial Expression Recognition," Polytechnic Journal, vol. 10, pp. 74-80, 2020.
[33] H. Miao, Y. Zhang, W. Li, H. Zhang, D. Wang, and S. Feng, "Chinese Multimodal Emotion Recognition in Deep and Traditional Machine Learning Approaches," in 2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia), 2018, pp. 1-6.
[34] D. Liu, X. Ouyang, S. Xu, P. Zhou, K. He, and S. Wen, "SAANet: Siamese action-units attention network for improving dynamic facial expression recognition," Neurocomputing, vol. 413, pp. 145-157, 2020.
[35] S. Zhang, X. Tao, Y. Chuang, and X. Zhao, "Learning deep multimodal affective features for spontaneous speech emotion recognition," Speech Communication, 2020.
[36] W. Mellouk and W. Handouzi, "Facial emotion recognition using deep learning: review and insights," Procedia Computer Science, vol. 175, pp. 689-694, 2020.
[37] P. Ekman and W. V. Friesen, Unmasking the Face: A Guide to Recognizing Emotions from Facial Clues: Ishk, 2003.
[38] L. A. Feldman, "Valence focus and arousal focus: Individual differences in the structure of affective experience," Journal of Personality and Social Psychology, vol. 69, p. 153, 1995.
[39] S. R. Zeebaree, O. Ahmed, and K. Obid, "CSAERNet: An Efficient Deep Learning Architecture for Image Classification," in 2020 3rd International Conference on Engineering Technology and its Applications (IICETA), 2020, pp. 122-127.
[40] M. B. Abdulrazaq, M. R. Mahmood, S. R. Zeebaree, M. H. Abdulwahab, R. R. Zebari, and A. B. Sallow, "An Analytical Appraisal for Supervised Classifiers' Performance on Facial Expression Recognition Based on Relief-F Feature Selection," in Journal of Physics: Conference Series, 2021, p. 012055.
[41] S. Y. Ameen and M. R. Al-Badrany, "Optimal image steganography content destruction techniques," in International Conference on Systems, Control, Signal Processing and Informatics, 2013, pp. 453-457.
[42] E. S. Salama, R. A. El-Khoribi, M. E. Shoman, and M. A. W. Shalaby, "A 3D-convolutional neural network framework with ensemble learning techniques for multi-modal emotion recognition," Egyptian Informatics Journal, 2020.
[43] I. A. Khalifa, S. R. Zeebaree, M. Ataş, and F. M. Khalifa, "Image steganalysis in frequency domain using co-occurrence matrix and Bpnn," Science Journal of University of Zakho, vol. 7, pp. 27-32, 2019.
[44] A. Chen, H. Xing, and F. Wang, "A Facial Expression Recognition Method Using Deep Convolutional Neural Networks Based on Edge Computing," IEEE Access, vol. 8, pp. 49741-49751, 2020.
[45] S. Dou, Z. Feng, X. Yang, and J. Tian, "Real-time multimodal emotion recognition system based on elderly accompanying robot," in Journal of Physics: Conference Series, 2020, p. 012093.
[46] G. Wen, T. Chang, H. Li, and L. Jiang, "Dynamic Objectives Learning for Facial Expression Recognition," IEEE Transactions on Multimedia, 2020.
[47] I. Lasri, A. R. Solh, and M. El Belkacemi, "Facial Emotion Recognition of Students using Convolutional Neural Network," in 2019 Third International Conference on Intelligent Computing in Data Sciences (ICDS), 2019, pp. 1-6.
[48] S. Rajan, P. Chenniappan, S. Devaraj, and N. Madian, "Novel deep learning model for facial expression recognition based on maximum boosted CNN and LSTM," IET Image Processing, vol. 14, pp. 1373-1381, 2020.
[49] J. Saeed and S. Zeebaree, "Skin Lesion Classification Based on Deep Convolutional Neural Networks Architectures," Journal of Applied Science and Technology Trends, vol. 2, pp. 41-51, 2021.
[50] D. Krishna and A. Patil, "Multimodal Emotion Recognition using Cross-Modal Attention and 1D Convolutional Neural Networks," Proc. Interspeech 2020, pp. 4243-4247, 2020.
[51] J. Liscombe, J. Venditti, and J. Hirschberg, "Classifying subject ratings of emotional speech using acoustic features," in Eighth European Conference on Speech Communication and Technology, 2003.
[52] R. Ibrahim, S. Zeebaree, and K. Jacksi, "Survey on Semantic Similarity Based on Document Clustering," Adv. Sci. Technol. Eng. Syst. J., vol. 4, pp. 115-122, 2019.
[53] D. Bertero and P. Fung, "A first look into a convolutional neural network for speech emotion detection," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, pp. 5115-5119.
[54] J. Lee and I. Tashev, "High-level feature representation using recurrent neural network for speech emotion recognition," in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
[55] D. A. Hasan, B. K. Hussan, S. R. Zeebaree, D. M. Ahmed, O. S. Kareem, and M. A. Sadeeq, "The Impact of Test Case Generation Methods on the Software Performance: A Review," International Journal of Science and Business, vol. 5, pp. 33-44, 2021.
[56] Y.-T. Lan, W. Liu, and B.-L. Lu, "Multimodal Emotion Recognition Using Deep Generalized Canonical Correlation Analysis with an Attention Mechanism," in 2020 International Joint Conference on Neural Networks (IJCNN), 2020, pp. 1-6.


[57] P. Bhattacharya, R. K. Gupta, and Y. Yang, "The Contextual Dynamics of Multimodal Emotion Recognition in Videos," arXiv preprint arXiv:2004.13274, 2020.
[58] H. Zhang, "Expression-EEG Based Collaborative Multimodal Emotion Recognition Using Deep AutoEncoder," IEEE Access, vol. 8, pp. 164130-164143, 2020.
[59] F. Al-Naima, S. Y. Ameen, and A. F. Al-Saad, "Destroying steganography content in image files," in IEEE Proceedings of Fifth International Symposium on Communication Systems, Networks and Digital Signal Processing, University of Patras, Patras, Greece, 2006.
[60] S. Bouktif, A. Fiaz, A. Ouni, and M. A. Serhani, "Multi-sequence LSTM-RNN deep learning and metaheuristics for electric load forecasting," Energies, vol. 13, p. 391, 2020.
[61] M. Verma, S. K. Vipparthi, G. Singh, and S. Murala, "LEARNet: Dynamic imaging network for micro expression recognition," IEEE Transactions on Image Processing, vol. 29, pp. 1618-1627, 2019.
[62] Z. Pan, Z. Luo, J. Yang, and H. Li, "Multi-modal Attention for Speech Emotion Recognition," arXiv preprint arXiv:2009.04107, 2020.
[63] S. Siriwardhana, A. Reis, R. Weerasekera, and S. Nanayakkara, "Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition," arXiv preprint arXiv:2008.06682, 2020.
[64] D. Priyasad, T. Fernando, S. Denman, S. Sridharan, and C. Fookes, "Attention Driven Fusion for Multi-Modal Emotion Recognition," in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 3227-3231.
[65] J.-H. Lee, H.-J. Kim, and Y.-G. Cheong, "A Multi-modal Approach for Emotion Recognition of TV Drama Characters Using Image and Text," in 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), 2020, pp. 420-424.
[66] G. Liu and Z. Tan, "Research on Multi-modal Music Emotion Classification Based on Audio and Lyric," in 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), 2020, pp. 2331-2335.
[67] S. Dhaouadi and M. M. B. Khelifa, "A Multimodal Physiological-Based Stress Recognition: Deep Learning Models' Evaluation in Gamers' Monitoring Application," in 2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), 2020, pp. 1-6.
[68] C.-J. Yang, N. Fahier, W.-C. Li, and W.-C. Fang, "A Convolution Neural Network Based Emotion Recognition System using Multimodal Physiological Signals," in 2020 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-Taiwan), 2020, pp. 1-2.
[69] X. Zhang, J. Liu, J. Shen, S. Li, K. Hou, B. Hu, et al., "Emotion Recognition From Multimodal Physiological Signals Using a Regularized Deep Fusion of Kernel Machine," IEEE Transactions on Cybernetics, 2020.
[70] B. Nakisa, M. N. Rastgoo, A. Rakotonirainy, F. Maire, and V. Chandran, "Automatic Emotion Recognition Using Temporal Multimodal Deep Learning," IEEE Access, 2020.
[71] M. A. Asghar, M. J. Khan, M. Rizwan, R. M. Mehmood, and S.-H. Kim, "An Innovative Multi-Model Neural Network Approach for Feature Selection in Emotion Recognition Using Deep Feature Clustering," Sensors, vol. 20, p. 3765, 2020.
[72] W. Nie, Y. Yan, D. Song, and K. Wang, "Multi-modal feature fusion based on multi-layers LSTM for video emotion recognition," Multimedia Tools and Applications, pp. 1-10, 2020.
[73] H. Gunes and M. Pantic, "Automatic, dimensional and continuous emotion recognition," International Journal of Synthetic Emotions (IJSE), vol. 1, pp. 68-99, 2010.

