Abstract Multimodal sentiment analysis has become an important research area in the field of artificial intelligence. With the latest advances in deep learning, this technology has reached new heights. It has great potential for both application and research, making it a popular research topic. This review provides an overview of the definition, background, and development of multimodal sentiment analysis. It also covers recent datasets and advanced models, emphasizing the challenges and future prospects of this technology. Finally, it looks ahead to future research directions. It should be noted that this review provides constructive suggestions for promising research directions and for building better-performing multimodal sentiment analysis models, which can help researchers in this field.

Keywords Multimodal Sentiment Analysis · Multimodal Fusion · Affective Computing · Computer Vision

Songning Lai ([email protected]), Xifeng Hu ([email protected]), Haoxuan Xu ([email protected]) and Zhi Liu ([email protected]) are with the School of Information Science and Engineering, Shandong University, Qingdao, China. Zhaoxia Ren ([email protected]) is with the Assets and Laboratory Management Department, Shandong University, Qingdao, China. Corresponding authors: Zhi Liu and Zhaoxia Ren ([email protected], [email protected]).

1 Introduction

Emotion is a subjective reaction of an organism to external stimuli [?, 1]. Humans possess a powerful capacity for sentiment analysis, and researchers are currently exploring ways to make this ability available to artificial agents [2].

Sentiment analysis involves analyzing sentiment polarity through available information [3, 4]. With the rapid development of fields such as artificial intelligence, computer vision, and natural language processing, it is becoming increasingly possible for artificial agents to perform sentiment analysis. Sentiment analysis is an interdisciplinary research area spanning computer science, psychology, social science, and other fields [5–7]. Scientists have been working for decades to endow AI agents with sentiment analysis capabilities, a key component of human-like AI.

Sentiment analysis has significant research value [8–11]. With the explosive growth of Internet data, vendors can use evaluative data such as reviews and review videos to improve their products. Sentiment analysis also has many practical uses, such as lie detection, interrogation, and entertainment. The following sections elaborate on the application and research value of sentiment analysis.

In the past, sentiment analysis mostly focused on a single modality (visual, speech, or text) [12]. Text-based sentiment analysis [13–15] has come a long way in NLP. Vision-based sentiment analysis pays more attention to human facial expressions [16] and body postures. Speech-based sentiment analysis mainly extracts features such as pitch, timbre, and tempo from speech [17]. With the development of deep learning, all three modalities have gained a foothold in sentiment analysis.

However, using a single modality for sentiment analysis has limitations [18–21]. The emotional information contained in a single modality is limited and incomplete. Combining information from multiple modalities can reveal deeper emotional polarity.
Fig. 1 The architecture of a classical multimodal sentiment analysis model. The overall architecture consists of three parts: feature extraction for each individual modality, fusion of the per-modality features, and sentiment analysis of the fused features. All three parts are important, and researchers have begun to optimize them one by one.
Analyzing only one modality yields limited results and makes it difficult to accurately analyze the emotion behind an action.

Researchers have gradually realized the need for multimodal sentiment analysis, and many multimodal sentiment analysis models have emerged to accomplish this task. Text features dominate and play a key role in the analysis of deep emotions [22]. Extracting expression and pose features from the visual modality can effectively aid text-based sentiment analysis and judgment [23]. The speech modality can, on the one hand, yield text features and, on the other hand, reveal the speaker's tone at each time point [24]. Figure 1 shows the architecture of a classical multimodal sentiment analysis model. The overall architecture consists of three parts: feature extraction for each individual modality, fusion of the per-modality features, and sentiment analysis of the fused features. All three parts are important, and researchers have begun to optimize them one by one [25].
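To make this three-part pipeline concrete, the following minimal sketch (a hypothetical PyTorch example, not any specific published model; the feature dimensions are assumptions) wires per-modality encoders, a simple concatenation-based fusion step, and a sentiment regression head.

import torch
import torch.nn as nn

class SimpleMSAModel(nn.Module):
    """Minimal three-stage pipeline: per-modality encoders -> fusion -> sentiment head."""
    def __init__(self, text_dim=768, audio_dim=74, visual_dim=35, hidden=128):
        super().__init__()
        # Stage 1: feature extraction (one small encoder per modality).
        self.text_enc = nn.GRU(text_dim, hidden, batch_first=True)
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        self.visual_enc = nn.GRU(visual_dim, hidden, batch_first=True)
        # Stage 2: fusion (here, plain concatenation followed by an MLP).
        self.fusion = nn.Sequential(nn.Linear(3 * hidden, hidden), nn.ReLU())
        # Stage 3: sentiment prediction (a regression score, e.g. in [-3, 3]).
        self.head = nn.Linear(hidden, 1)

    def forward(self, text, audio, visual):
        # Each input: (batch, seq_len, feature_dim); keep the final hidden state.
        _, t = self.text_enc(text)
        _, a = self.audio_enc(audio)
        _, v = self.visual_enc(visual)
        fused = self.fusion(torch.cat([t[-1], a[-1], v[-1]], dim=-1))
        return self.head(fused)

model = SimpleMSAModel()
score = model(torch.randn(4, 20, 768), torch.randn(4, 50, 74), torch.randn(4, 50, 35))
print(score.shape)  # torch.Size([4, 1])

Most of the models surveyed below can be read as refinements of one or more of these three stages.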
In this review, we provide a comprehensive overview of the field of multimodal sentiment analysis. The review includes a summary and brief introduction of datasets, which can help researchers select appropriate datasets. We compare and analyze models of significant research value in multimodal sentiment analysis and provide suggestions for model construction. We elaborate on three types of modal fusion methods and explain the advantages and disadvantages of each. Finally, we look ahead to the challenges and future development directions of multimodal sentiment analysis, providing several promising research directions. Compared to other reviews in the same field, our focus is on providing constructive suggestions for promising research directions and for building better-performing multimodal sentiment analysis models. We emphasize the challenges and future prospects of these technologies.

2 Multimodal Sentiment Analysis Datasets

With the growth of the Internet, an era of data explosion has arrived [26–28]. Numerous researchers have collected such data from the Internet (videos, reviews, etc.) and built sentiment datasets according to their own needs. Table 1 summarizes the commonly used multimodal datasets. The first column indicates the name of the dataset. The second column is the year in which it was released. The third column lists the modalities it includes. The fourth column is the platform from which the data came. The fifth column is the language of the dataset. The sixth column is the amount of data it contains. Each dataset has its own characteristics. This section lists well-known datasets in the community, aiming to help researchers sort out the characteristics of each dataset and make it easier to choose among them.

Table 1 Commonly used multimodal datasets. The first column indicates the name of the dataset. The second column is the year in which it was released. The third column lists the modalities it includes. The fourth column is the platform from which the data came. The fifth column is the language of the dataset. The sixth column is the amount of data it contains.

2.1 IEMOCAP [29]
IEMOCAP, a sentiment analysis dataset released by the Speech Analysis and Interpretation Laboratory in 2008, is a multimodal dataset comprising 1,039 conversational segments with a total video length of 12 hours. Participants in the study engaged in five different scenarios, performing emotions according to pre-set scenarios. The dataset includes not only audio, video, and text information but also facial expression and posture information obtained through additional sensors. Data points are categorized into ten emotions: neutral, happy, sad, angry, surprised, scared, disgusted, frustrated, excited, and other. Overall, IEMOCAP provides a rich resource for researchers exploring sentiment analysis across multiple modalities.

2.2 DEAP [30]

DEAP is a multimodal dataset for emotion analysis based on physiological signals (including EEG and peripheral physiological recordings) collected while participants watched music videos [30].

2.3 CMU-MOSI [31]

The CMU-MOSI dataset comprises 93 opinion videos collected from YouTube that cover a range of topics [31]. These videos were carefully selected to ensure that each features a single speaker facing the camera, allowing for clear capture of facial expressions. While there were no restrictions on camera model, distance, or scene, all presentations and comments were made in English by 89 different speakers, including 41 women and 48 men. The 93 videos were divided into 2,199 subjective opinion segments and annotated with sentiment intensity ranging from strongly negative to strongly positive (-3 to 3). Overall, the CMU-MOSI dataset provides a valuable resource for researchers studying sentiment analysis.
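CMU-MOSI-style labels are continuous sentiment scores in [-3, 3]; many papers additionally report binary and 7-class accuracy derived from that score. The helper below is an illustrative sketch of such a conversion (an assumption about common evaluation practice, not part of any official toolkit).

def mosi_score_to_labels(score: float):
    """Map a CMU-MOSI-style sentiment score in [-3, 3] to common discrete labels."""
    binary = "negative" if score < 0 else "positive"           # sign-based 2-class label
    seven_class = int(round(max(-3.0, min(3.0, score)))) + 3   # clamp and round -> 0..6
    return binary, seven_class

print(mosi_score_to_labels(1.8))   # ('positive', 5)
print(mosi_score_to_labels(-0.4))  # ('negative', 3)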
Sentiment intensity markers range from strongly negative to strongly positive (-3 to 3). Overall, CMU-MOSEI is an invaluable resource for researchers exploring sentiment analysis across multiple modalities.

MELD is a comprehensive dataset that includes video clips from the popular television series Friends. The dataset comprises textual, audio, and video information that corresponds to the textual data. It contains 1,400 videos, which are further divided into 13,000 individual segments. Each segment is annotated with one of seven emotion categories (anger, disgust, sadness, joy, neutral, surprise, and fear) and one of three sentiment labels (positive, negative, and neutral).

... includes whether the speaker expressed an opinion or made an objective statement. Emotions are divided into six categories for each sentence: happiness, sadness, fear, disgust, surprise, and anger.

FACTIFY is a fake news detection dataset that focuses on fact verification. It includes data for both the image and text modalities and contains 50,000 sets of data. Most of the claims refer to politics and government. The dataset is annotated into three categories: support, no evidence, and refutation. It is a valuable resource for researchers interested in detecting and combating the spread of fake news.
Another approach is to use recurrent neural networks and adversarial learning to learn joint representations between different modalities, thereby improving single-modal representations and handling missing modalities or noise.

3.1.5 MCTN (Multimodal Cyclic Translation Network) [43]

A medium-term, model-based multimodal fusion approach involves feeding multimodal data into the network, where the intermediate layers of the model perform feature fusion between the modalities. Model-based modality fusion methods can select the location of modality feature fusion to achieve intermediate interactions. Model-based fusion typically uses multiple kernel learning, neural networks, graph models, or alternative methods.
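As a minimal illustration of model-based (intermediate) fusion, the sketch below performs fusion inside the network via cross-modal attention rather than on raw features or final decisions (an illustrative example only; the layer sizes and the choice of cross-modal attention are assumptions, not a description of any surveyed model).

import torch
import torch.nn as nn

class IntermediateFusionBlock(nn.Module):
    """Fuses modality streams inside the network via cross-modal attention."""
    def __init__(self, dim=128, heads=4, text_dim=300, audio_dim=74):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, dim)
        self.audio_proj = nn.Linear(audio_dim, dim)
        # Text queries attend over audio keys/values in an intermediate layer.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, text_feats, audio_feats):
        t = self.text_proj(text_feats)     # (batch, seq_t, dim)
        a = self.audio_proj(audio_feats)   # (batch, seq_a, dim)
        fused, _ = self.cross_attn(query=t, key=a, value=a)
        return self.ff(fused + t)          # residual connection keeps the text stream

block = IntermediateFusionBlock()
out = block(torch.randn(2, 20, 300), torch.randn(2, 50, 74))
print(out.shape)  # torch.Size([2, 20, 128])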
Fig. 3 The overall framework of medium-term, model-based approaches to multimodal fusion. This class of methods feeds the feature information of each modality into multiple kernel learning, neural networks, graph models, or alternative methods to complete the fusion of modalities. Most of the nodes of its modal fusion are variable.

3.2.2 BERT-like (Self Supervised Models to Improve Multimodal Speech Emotion Recognition) [45]

This model is a Transformer-based multimodal sentiment analysis method that leverages the self-attention mechanism to achieve alignment and fusion between text and image. The model adopts a self-supervised learning method, which can effectively handle multimodal emotion recognition tasks with high accuracy and robustness. However, the model may be affected by the quality of the data and annotations.
... and attention mechanisms are used to extract visual features. Visual features are mapped to text features and combined with text modality features for sentiment analysis.

4.0.5 MISA [53]

The proposed model presents a novel multimodal sentiment analysis framework. Each modality is mapped into two distinct feature spaces after feature extraction. One feature space mainly learns the invariant features of the modality, and the other learns its unique features.

4.0.6 MAG-BERT [54]

The authors propose a multimodal adaptation gate (MAG) architecture and apply it to BERT. The model can receive input from multiple modalities during fine-tuning. MAG can be thought of as a vector embedding structure that allows us to input multimodal information and embed it as a sequence for BERT.

4.0.7 TIMF [55]

The main idea of this model is that each modality learns features separately, and the features of each modality are then fused. In the feature fusion stage, the fusion of the per-modality features is implemented by a tensor fusion network. In the decision fusion stage, the upstream results are fused by soft fusion to adjust the decision results.
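The tensor-fusion step mentioned above can be illustrated with the classic outer-product construction (a simplified sketch of the general idea, not the exact TIMF implementation).

import torch

def tensor_fusion(z_text, z_audio, z_visual):
    """Outer-product tensor fusion of per-modality embeddings (a constant 1 is appended
    so that unimodal and bimodal interaction terms are retained as well)."""
    ones = torch.ones(z_text.size(0), 1)
    t = torch.cat([z_text, ones], dim=1)    # (batch, dt + 1)
    a = torch.cat([z_audio, ones], dim=1)   # (batch, da + 1)
    v = torch.cat([z_visual, ones], dim=1)  # (batch, dv + 1)
    # Triple outer product: every uni-, bi- and tri-modal interaction term.
    fused = torch.einsum("bi,bj,bk->bijk", t, a, v)
    return fused.flatten(start_dim=1)       # (batch, (dt+1)*(da+1)*(dv+1))

fused = tensor_fusion(torch.randn(4, 32), torch.randn(4, 16), torch.randn(4, 16))
print(fused.shape)  # torch.Size([4, 9537])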
4.0.8 Auto-ML based Fusion [56]

The authors propose to combine individual text and image sentiment analyses into a final fused classification based on AutoML. This approach combines the individual classifiers into a final classification using the best model generated by AutoML. This is a typical model for decision-level fusion.
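Decision-level fusion of this kind can be sketched as combining the class probabilities of independently trained unimodal classifiers. The example below uses fixed weights for illustration; in [56] the combiner itself is selected by AutoML.

import numpy as np

def decision_level_fusion(prob_text, prob_image, weights=(0.6, 0.4)):
    """Weighted soft voting over unimodal class-probability vectors."""
    fused = weights[0] * np.asarray(prob_text) + weights[1] * np.asarray(prob_image)
    return fused / fused.sum()  # renormalise to a probability distribution

# Hypothetical 3-class (negative / neutral / positive) outputs of two unimodal models.
p = decision_level_fusion([0.1, 0.2, 0.7], [0.3, 0.4, 0.3])
print(p, p.argmax())  # highest mass falls on the "positive" class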
4.0.9 Self-MM [57]
In [57], the authors combine self-supervised learning and multi-task learning to construct a novel multimodal sentiment analysis architecture. To learn the private information of each modality, the authors construct a single-modal label generation module (ULGM) based on self-supervised learning. The loss function corresponding to this module is designed to incorporate the private features learned by the three self-supervised learning subtasks into the original multimodal sentiment analysis model using a weight adjustment strategy. The proposed model performs well, and the self-supervised ULGM module is also able to calibrate single-modal labels.
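The multi-task training idea can be sketched as a weighted sum of the multimodal loss and three unimodal losses computed against self-generated labels (a simplified illustration; Self-MM's actual ULGM update rules and weighting strategy are more involved [57]).

import torch
import torch.nn.functional as F

def self_mm_style_loss(pred_m, label_m, unimodal_preds, unimodal_labels, weights):
    """Weighted multi-task loss: one multimodal term plus one term per modality.

    In Self-MM the unimodal labels would come from a ULGM-style self-supervised
    label generator; here they are simply passed in as tensors."""
    loss = F.l1_loss(pred_m, label_m)
    for w, p, y in zip(weights, unimodal_preds, unimodal_labels):
        loss = loss + w * F.l1_loss(p, y)
    return loss

pred_m, label_m = torch.randn(8, 1), torch.randn(8, 1)
uni_preds = [torch.randn(8, 1) for _ in range(3)]   # text, audio, vision heads
uni_labels = [torch.randn(8, 1) for _ in range(3)]  # self-generated unimodal labels
print(self_mm_style_loss(pred_m, label_m, uni_preds, uni_labels, weights=[0.5, 0.3, 0.2]))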
4.0.10 DISRFN [58]

The model is a dynamic invariant-specific representation fusion network. First, the joint domain separation network is improved to obtain a joint domain-separated representation for all modalities, so that redundant information can be effectively utilized. Second, a hierarchical graph fusion network (HGFN) is used to dynamically fuse the feature information of each modality and learn the features of multiple modal interactions. At the same time, a loss function that improves the fusion effect is constructed to help the model learn the representation information of each modality in the subspace.

4.0.11 TEDT [59]

This model proposes a multimodal encoding-decoding translation network with a transformer to address the challenges of multimodal sentiment analysis, specifically the impact of individual modal data and the poor quality of non-natural-language features. The proposed method uses text as the primary information and sound and image as the secondary information, and a modality reinforcement cross-attention module converts non-natural-language features into natural-language features to improve their quality. Additionally, a dynamic filtering mechanism filters out error information generated in the cross-modal interaction. The strength of this model lies in its ability to improve the effect of multimodal fusion and more accurately analyze human sentiment. However, it may require significant computational resources and may not be suitable for real-time analysis.

4.0.12 TETFN [60]

The Text Enhanced Transformer Fusion Network (TETFN) is a novel method for multimodal sentiment analysis (MSA) that addresses the challenge of the different contributions of the textual, visual, and acoustic modalities. The proposed method learns text-oriented pairwise cross-modal mappings to obtain effective unified multimodal representations. It incorporates textual information in learning sentiment-related nonlinguistic representations through text-based multi-head attention and retains differentiated information among modalities through unimodal label prediction. Additionally, the vision pre-trained model Vision Transformer is utilized to extract visual features from the original videos to preserve both global and local information of a human face. The strength of this model lies in its ability to incorporate textual information to improve the effectiveness of nonlinguistic modalities in MSA while preserving inter- and intra-modality relationships.

4.0.13 SPIL [61]

This model proposes a deep modal shared-information learning module for effective representation learning in multimodal sentiment analysis tasks. The proposed module captures both shared and private information in a complete modal representation, using a covariance matrix to capture shared information between modalities and a self-supervised learning strategy to capture private information. The module is plug-and-play and can adjust the information exchange relationship between modalities to learn private or shared information. Additionally, a multi-task learning strategy is employed to help the model focus its attention on modality-differentiated training data. The proposed model outperforms current state-of-the-art methods on most metrics of three public datasets, and further combinatorial techniques for using the module are explored.
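The role of the covariance matrix can be illustrated with a rough sketch of a cross-covariance objective that encourages two modality representations to share structure (a simplified illustration only, not the exact SPIL formulation [61]).

import torch

def cross_covariance_alignment(z_a, z_b):
    """Encourage shared structure by rewarding cross-covariance between two batches
    of modality embeddings (returned as a loss to be minimised)."""
    z_a = z_a - z_a.mean(dim=0, keepdim=True)
    z_b = z_b - z_b.mean(dim=0, keepdim=True)
    cov = (z_a.T @ z_b) / (z_a.size(0) - 1)  # (d_a, d_b) cross-covariance matrix
    return -cov.pow(2).mean()                # more shared covariance -> lower loss

loss = cross_covariance_alignment(torch.randn(16, 64), torch.randn(16, 64))
print(loss)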
5 Model comparison and suggestions

This section evaluates state-of-the-art multimodal sentiment analysis models: DFF-ATMF, MAG-BERT, TIMF, Self-MM, and DISRFN, followed by TEDT, TETFN, and SPIL. While DFF-ATMF does not consider the vision modality, the other models analyze sentiment from the three modalities of audio, text, and vision.

For the interaction relations of multimodal data, DFF-ATMF and TIMF build transformer-based models to learn complex relationships among the data. MAG-BERT uses a simple yet effective multimodal adaptive gate fusion strategy. Self-MM uses self-supervised multi-task learning as the fusion strategy, with self-supervised generation of single-modal labels that are combined to complete the multimodal sentiment analysis task. DISRFN uses a dynamic invariant-specific representation fusion network to obtain jointly domain-separated representations of all modalities and dynamically fuses them through a hierarchical graph fusion network.
DFF-ATMF uses two parallel branches to fuse the audio and text modalities. Its core mechanisms are feature vector fusion and multimodal attention fusion, which can learn more comprehensive sentiment information. However, due to the use of multi-layer neural networks and sophisticated fusion methods, overfitting may occur. Advantages: simple structure, easy to implement, and able to learn comprehensive sentiment information through feature vector fusion and multimodal attention fusion. Disadvantages: does not consider the vision modality, and may suffer from overfitting due to the use of multi-layer neural networks and sophisticated fusion methods.

MAG-BERT adapts the interior of BERT using multimodal adaptation gates, which employ a simple yet effective fusion strategy without changing the structure or parameters of BERT. However, the multimodal attention can only be performed within the same timestep, not across timesteps, which may ignore some temporal relationships. Additionally, MAG-BERT requires freezing the parameters of BERT rather than fine-tuning it, which may result in a BERT representation that is not adapted to a specific task or domain. Advantages: uses a simple yet effective multimodal adaptive gate fusion strategy without changing the structure or parameters of BERT. Disadvantages: multimodal attention can only be performed within the same timestep, not across timesteps, which may ignore some temporal relationships; freezing the parameters of BERT may result in a representation that is not adapted to a specific task or domain.
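The gating idea behind MAG can be roughly sketched as follows: audio and visual features produce a gated displacement that is added to each word representation before it re-enters BERT (a simplified illustration of the mechanism in [54]; the gate form, scaling, and feature sizes here are assumptions).

import torch
import torch.nn as nn

class MultimodalAdaptationGate(nn.Module):
    """Shifts word vectors by a gated displacement computed from audio/visual features."""
    def __init__(self, text_dim=768, audio_dim=74, visual_dim=47, beta=1.0):
        super().__init__()
        self.gate_a = nn.Linear(text_dim + audio_dim, 1)
        self.gate_v = nn.Linear(text_dim + visual_dim, 1)
        self.proj_a = nn.Linear(audio_dim, text_dim)
        self.proj_v = nn.Linear(visual_dim, text_dim)
        self.beta = beta

    def forward(self, h_text, h_audio, h_visual):
        # Gates decide how much audio/visual information to inject per token.
        g_a = torch.sigmoid(self.gate_a(torch.cat([h_text, h_audio], dim=-1)))
        g_v = torch.sigmoid(self.gate_v(torch.cat([h_text, h_visual], dim=-1)))
        shift = g_a * self.proj_a(h_audio) + g_v * self.proj_v(h_visual)
        # Scale the displacement and add it to the word representation.
        return h_text + self.beta * shift

mag = MultimodalAdaptationGate()
out = mag(torch.randn(2, 20, 768), torch.randn(2, 20, 74), torch.randn(2, 20, 47))
print(out.shape)  # torch.Size([2, 20, 768])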
TIMF leverages the self-attention mechanism of Transformers to learn complex interactions between multimodal data and generate unified sentiment representations. While it has the advantage of being able to learn complex relationships between modalities, it may suffer from high computational complexity, long training times, and the need for large amounts of labeled data. Advantages: leverages the self-attention mechanism of Transformers to learn complex interactions between multimodal data and generate unified sentiment representations. Disadvantages: may suffer from high computational complexity, long training times, and the need for large amounts of labeled data.

Self-MM is a self-supervised multimodal sentiment analysis model that uses a multi-task learning strategy to learn both multimodal and unimodal emotion recognition tasks. Its advantage is that it can generate single-modal labels using a self-supervised approach, saving the cost and time of manual labeling. However, interference and imbalance between multiple tasks can occur, and an appropriate weight adjustment strategy needs to be designed to balance the learning progress of the different tasks. Advantages: self-supervised multi-task learning as the fusion strategy, with self-supervised generation of single-modal labels that are combined to complete the multimodal sentiment analysis task. Disadvantages: may require a large amount of labeled data to achieve good performance, and the self-supervised learning process may be computationally expensive.

DISRFN is a deep residual network-based multimodal sentiment analysis model that exploits a dynamic invariant-specific representation fusion network to improve sentiment recognition capability. Its advantage is that it can efficiently utilize redundant information to obtain joint domain-separated representations of all modalities through a modified joint domain separation network and dynamically fuse each representation through a hierarchical graph fusion network to obtain the interaction information of multimodal data. However, as with Self-MM, interference and imbalance between multiple tasks can occur, and a suitable weight adjustment strategy needs to be designed to balance the learning progress of the different tasks. Advantages: uses a dynamic invariant-specific representation fusion network to obtain jointly domain-separated representations of all modalities and dynamically fuses each representation through a hierarchical graph fusion network. Disadvantages: may require a large amount of labeled data to achieve good performance, and the hierarchical graph fusion network may be computationally expensive.

TEDT proposes a multimodal encoding-decoding translation network with a transformer to address the challenges of multimodal sentiment analysis. The strength of this model lies in its ability to improve the effect of multimodal fusion and more accurately analyze human sentiment. By incorporating the modality reinforcement cross-attention module and the dynamic filtering mechanism, the model is able to address the impact of individual modal data and the poor quality of non-natural-language features. To build effective multimodal sentiment analysis models, it is recommended to carefully consider the contribution of each modality and how to integrate them effectively. Attention should also be paid to challenges such as the impact of individual modal data and the poor quality of non-natural-language features. Finally, it is important to consider the computational requirements of the model and ensure that it is suitable for the intended use case.
Table 3 Performance of the DFF-ATMF, MAG-BERT, TIMF, Self-MM, DISRFN, TEDT, TETFN, and SPIL models on the CMU-MOSI and CMU-MOSEI datasets. The evaluation metrics are MAE, Corr, Acc, and F1-Score.

                        CMU-MOSI                        CMU-MOSEI
Model        MAE    Corr   Acc    F1-Score    MAE    Corr   Acc    F1-Score
DFF-ATMF     –      –      80.9   81.3        –      –      77.2   78.3
MAG-BERT     0.712  0.796  –      86          0.623  0.677  82     82.1
TIMF         0.373  0.93   92.3   92.3        0.645  0.669  79.5   79.5
Self-MM      0.723  0.797  84.8   84.8        0.534  0.764  84.1   84.1
DISRFN       0.798  0.734  83.4   83.6        0.591  0.78   87.5   87.5
TEDT         0.709  0.812  0.893  0.892       0.524  0.749  0.862  0.861
TETFN        0.717  0.800  0.841  0.838       0.551  0.748  0.843  0.842
SPIL         0.704  0.794  0.851  0.854       0.523  0.766  0.850  0.849
TETFN is a novel method for MSA that addresses the challenge of the different contributions of the textual, visual, and acoustic modalities. Compared to the TEDT model, the TETFN model focuses on incorporating textual information to improve the effectiveness of nonlinguistic modalities in MSA while preserving inter- and intra-modality relationships. The TETFN model achieves this by using text-based multi-head attention and unimodal label prediction to retain differentiated information among modalities. In contrast, the TEDT model uses a modality reinforcement cross-attention module to convert non-natural-language features into natural-language features and a dynamic filtering mechanism to filter out error information generated in the cross-modal interaction. The strength of the TETFN model lies in its ability to effectively incorporate textual information to improve the effectiveness of nonlinguistic modalities in MSA while preserving inter- and intra-modality relationships. Additionally, the use of the vision pre-trained model Vision Transformer helps to extract visual features from the original videos, preserving both global and local information of a human face. To build effective multimodal sentiment analysis models, it is recommended to carefully consider the contribution of each modality and how to integrate them effectively, and to address challenges such as the impact of individual modal data and the poor quality of non-natural-language features.

SPIL proposes a deep modal shared-information learning module for effective representation learning in multimodal sentiment analysis tasks (Section 4.0.13). Compared to the TEDT and TETFN models, the SPIL model also focuses on capturing shared and private information in a complete modal representation. However, the SPIL model uses a covariance matrix to capture shared information between modalities and a self-supervised learning strategy to capture private information, while the TETFN model uses text-based multi-head attention and unimodal label prediction to retain differentiated information among modalities. The SPIL model also employs a multi-task learning strategy to help the model focus its attention on modality-differentiated training data, while the TEDT and TETFN models do not explicitly mention this. The strength of the SPIL model lies in its ability to capture both shared and private information in a complete modal representation, which can be adjusted based on the specific task at hand, and the multi-task learning strategy further improves performance by focusing attention on modality-differentiated training data. The SPIL model's approach of capturing both shared and private information in a complete modal representation is worth considering in future models.

Table 3 shows the performance metrics of these models on the CMU-MOSI and CMU-MOSEI datasets. Based on the performance of the models on these datasets, we recommend using BERT to extract features from text information, while using LSTMs to extract features for the video and audio modalities, since these streams require capturing modality information over time.
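That recommendation can be sketched as follows (illustrative only; it assumes the Hugging Face transformers package and hypothetical audio/visual feature sizes).

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TextAudioVisionEncoders(nn.Module):
    """BERT for text; LSTMs for the sequential audio and visual feature streams."""
    def __init__(self, audio_dim=74, visual_dim=35, hidden=128):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.audio_lstm = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.visual_lstm = nn.LSTM(visual_dim, hidden, batch_first=True)

    def forward(self, input_ids, attention_mask, audio, visual):
        # [CLS] token as the sentence-level text representation.
        text_repr = self.bert(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state[:, 0]
        _, (a, _) = self.audio_lstm(audio)    # final hidden state summarises the sequence
        _, (v, _) = self.visual_lstm(visual)
        return text_repr, a[-1], v[-1]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = TextAudioVisionEncoders()
batch = tokenizer(["the movie was great"], return_tensors="pt")
t, a, v = enc(batch["input_ids"], batch["attention_mask"],
              torch.randn(1, 50, 74), torch.randn(1, 50, 35))
print(t.shape, a.shape, v.shape)  # (1, 768) (1, 128) (1, 128)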
DFF-ATMF does not consider the visual modality, resulting in relatively low performance metrics. Visual information can provide additional cues about human expressions, poses, scenes, and so on, which can both enhance and complement the information of the text and speech modalities. Therefore, visual modality information deserves to be considered and explored in multimodal sentiment analysis.

6 Challenges and Future Scope
With the development of deep learning, multimodal sentiment analysis techniques have also developed rapidly [62–65]. However, multimodal sentiment analysis still faces many challenges. This section analyzes the current state of research, the challenges, and future developments in multimodal sentiment analysis.

6.1 Dataset

In multimodal sentiment analysis, the dataset plays a crucial role. Currently, a large dataset covering multiple languages is missing. Given the diversity of languages and ethnicities in many countries, a large, diverse dataset could be used to train a multimodal sentiment analysis model with strong generalization and wide applicability. Additionally, current multimodal datasets still have low annotation accuracy and have not yet reached truly continuous label values, requiring researchers to label multimodal datasets more finely. Most current multimodal data contain only the visual, speech, and text modalities and lack modal information combined with physiological signals such as brain waves and pulse.

6.2 Detection of Hidden Emotions

There has always been a recognized difficulty in multimodal sentiment analysis tasks: the analysis of hidden emotions. Hidden emotions [66, 67] include sarcastic emotions (such as sarcastic words), emotions that must be analyzed concretely in context, and mixed emotions [68, 69] (such as a person feeling both happiness and sadness). It is important to explore these hidden emotions; they mark the gap between human and artificial intelligence [70].

6.3 Multiple forms of video data

In multimodal sentiment analysis tasks, video data is particularly challenging. Although in many datasets the speaker faces the camera and the video resolution is high, real-world situations are more complicated and require models that are robust to noise and applicable to low-resolution video. Capturing the micro-expressions and micro-gestures of speakers for sentiment analysis is also an area worth exploring.

6.4 Multiform language data

The form of text data in multimodal sentiment analysis tasks is typically uniform. However, evaluation texts in online communities are often cross-lingual, with reviewers using multiple languages to make more vivid comments. Text data with mixed emotions also remains a challenge for multimodal sentiment analysis tasks. Making good use of memes mixed into the text is an important research topic, as memes often carry extremely strong emotional messages from reviewers. Additionally, most text data is transcribed directly from speech, making it particularly difficult to analyze a person's emotions when multiple people are talking. Combined with the cultural characteristics of different regions and countries, the same text data may reflect different emotions.

6.5 Future Prospects

The future of multimodal sentiment analysis techniques is extremely bright, and some future applications are listed below: multimodal emotion analysis for real-time assessment of mental health [71–73]; multimodal criminal linguistic deception detection [74]; offensive language detection; and human-like emotion-aware robots. Multimodal emotion analysis is a technique for recognizing and analyzing emotions, and models that combine multimodal information for sentiment analysis can effectively improve the accuracy of sentiment analysis. In the future, multimodal sentiment analysis techniques will be gradually improved. Perhaps one day there will be a multimodal sentiment analysis model with a large number of parameters that matches human sentiment analysis capabilities; that would be a delightful prospect.

7 Conclusion

Multimodal sentiment analysis techniques have been recognized as important by researchers in various fields, making them a central research topic in natural language processing and computer vision. In this review, we provide a detailed description of various aspects of multimodal sentiment analysis, including its research background, definition, and development process. We also summarize commonly used benchmark datasets in Table 1 and compare and analyze recent state-of-the-art multimodal sentiment analysis models. Finally, we present the challenges posed by the field of multimodal sentiment analysis and explore possible future developments.

Many prospective works are being actively carried out and have even been largely implemented. However, there are still challenges to be addressed, leading to the following meaningful research directions:
(1) Construct a large multimodal sentiment dataset in multiple languages.
(2) Solve the domain transfer problem of video, text, and speech modal data.
(3) Build a unified, large-scale multimodal sentiment analysis model with excellent generalization performance.
(4) Reduce model parameters, optimize algorithms, and reduce algorithmic complexity.
(5) Solve the multilingual code-mixing problem in multimodal sentiment analysis.
(6) Discuss the weighting problem of modal fusion and provide a reasonable scheme for assigning weights to different modalities in different cases.
(7) Discuss the correlation between modalities and separate their shared and private information to improve model performance and interpretability.
(8) Construct a multimodal sentiment analysis model that can handle hidden emotions well.

Declarations

Availability of data and materials

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported in part by the Joint Fund for Smart Computing of the Shandong Natural Science Foundation under Grant ZR2020LZH013; the open project of the State Key Laboratory of Computer Architecture under Grant CARCHA202001; the Major Scientific and Technological Innovation Projects in Shandong Province under Grants 2021CXG010506 and 2022CXG010504; and the "New University 20 Items" Funding Project of Jinan under Grants 2021GXRC108 and 2021GXRC024.

Acknowledgments

Not applicable.

References

1. Julien Deonna and Fabrice Teroni. The emotions: A philosophical introduction. Routledge, 2012.
2. Clayton Hutto and Eric Gilbert. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the International AAAI Conference on Web and Social Media, volume 8, pages 216–225, 2014.
3. Soo-Min Kim and Eduard Hovy. Determining the sentiment of opinions. In COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics, pages 1367–1373, 2004.
4. Erik Cambria, Björn Schuller, Yunqing Xia, and Catherine Havasi. New avenues in opinion mining and sentiment analysis. IEEE Intelligent Systems, 28(2):15–21, 2013.
5. Arshi Parvaiz, Muhammad Anwaar Khalid, Rukhsana Zafar, Huma Ameer, Muhammad Ali, and Muhammad Moazam Fraz. Vision transformers in medical computer vision—a contemplative retrospection. Engineering Applications of Artificial Intelligence, 122:106126, 2023.
6. Bo Zhang, Jun Zhu, and Hang Su. Toward the third generation artificial intelligence. Science China Information Sciences, 66(2):1–19, 2023.
7. Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9):1–35, 2023.
8. Jireh Yi-Le Chan, Khean Thye Bea, Steven Mun Hong Leow, Seuk Wai Phoong, and Wai Khuen Cheng. State of the art: a review of sentiment analysis based on sequential transfer learning. Artificial Intelligence Review, 56(1):749–780, 2023.
9. Mayur Wankhade, Annavarapu Chandra Sekhara Rao, and Chaitanya Kulkarni. A survey on sentiment analysis methods, applications, and challenges. Artificial Intelligence Review, 55(7):5731–5780, 2022.
10. Hui Li, Qi Chen, Zhaoman Zhong, Rongrong Gong, and Guokai Han. E-word of mouth sentiment analysis for user behavior studies. Information Processing & Management, 59(1):102784, 2022.
11. Ashima Yadav and Dinesh Kumar Vishwakarma. Sentiment analysis using deep learning architectures: a review. Artificial Intelligence Review, 53(6):4335–4385, 2020.
12. Ganesh Chandrasekaran, Tu N Nguyen, and Jude Hemanth D. Multimodal sentimental analysis for social media applications: A comprehensive review. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 11(5):e1415, 2021.
13. Bernhard Kratzwald, Suzana Ilic, Mathias Kraus, Stefan Feuerriegel, and Helmut Prendinger. Decision support with text-based emotion recognition: Deep learning for affective computing. arXiv preprint arXiv:1803.06397, 2018.
14. Carlo Strapparava and Rada Mihalcea. SemEval-2007 task 14: Affective text. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 70–74, 2007.
15. Yang Li, Quan Pan, Suhang Wang, Tao Yang, and Erik Cambria. A generative model for category text generation. Information Sciences, 450:301–315, 2018.
16. Rong Dai. Facial expression recognition method based on facial physiological features and deep learning. Journal of Chongqing University of Technology (Natural Science), 34(6):146–153, 2020.
17. Zhu Ren, Jia Jia, Quan Guo, Kuo Zhang, and Lianhong Cai. Acoustics, content and geo-information based sentiment prediction from large-scale networked voice data. In 2014 IEEE International Conference on Multimedia and Expo (ICME), pages 1–4. IEEE, 2014.
18. Liu Jiming, Zhang Peixiang, Liu Ying, Zhang Weidong, and Fang Jie. Summary of multi-modal sentiment analysis technology. Journal of Frontiers of Computer Science & Technology, 15(7):1165, 2021.
19. Feiran Huang, Xiaoming Zhang, Zhonghua Zhao, Jie Xu, and Zhoujun Li. Image–text sentiment analysis via deep multimodal attentive fusion. Knowledge-Based Systems, 167:26–37, 2019.
20. Akshi Kumar and Geetanjali Garg. Sentiment analysis of multimodal twitter data. Multimedia Tools and Applications, 78:24103–24119, 2019.
21. Ankita Gandhi, Kinjal Adhvaryu, and Vidhi Khanduja. Multimodal sentiment analysis: review, application domains and future directions. In 2021 IEEE Pune Section International Conference (PuneCon), pages 1–5. IEEE, 2021.
22. Vaibhav Rupapara, Furqan Rustam, Hina Fatima Shahzad, Arif Mehmood, Imran Ashraf, and Gyu Sang Choi. Impact of SMOTE on imbalanced text features for toxic comments classification using RVVC model. IEEE Access, 9:78621–78634, 2021.
23. Jia Li, Ziyang Zhang, Junjie Lang, Yueqi Jiang, Liuwei An, Peng Zou, Yangyang Xu, Sheng Gao, Jie Lin, Chunxiao Fan, et al. Hybrid multimodal feature extraction, mining and fusion for sentiment analysis. In Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge, pages 81–88, 2022.
24. Anna Favaro, Chelsie Motley, Tianyu Cao, Miguel Iglesias, Ankur Butala, Esther S Oh, Robert D Stevens, Jesús Villalba, Najim Dehak, and Laureano Moro-Velázquez. A multi-modal array of interpretable features to evaluate language and speech patterns in different neurological disorders. In 2022 IEEE Spoken Language Technology Workshop (SLT), pages 532–539. IEEE, 2023.
25. Soujanya Poria, Erik Cambria, Rajiv Bajpai, and Amir Hussain. A review of affective computing: From unimodal analysis to multimodal fusion. Information Fusion, 37:98–125, 2017.
26. Sathyan Munirathinam. Industry 4.0: Industrial internet of things (IIoT). In Advances in Computers, volume 117, pages 129–164. Elsevier, 2020.
27. Esteban Ortiz-Ospina and Max Roser. The rise of social media. Our World in Data, 2023.
28. Abdul Haseeb, Enjun Xia, Shah Saud, Ashfaq Ahmad, and Hamid Khurshid. Does information and communication technologies improve environmental quality in the era of globalization? An empirical analysis. Environmental Science and Pollution Research, 26:8594–8608, 2019.
29. Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N Chang, Sungbok Lee, and Shrikanth S Narayanan. IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42:335–359, 2008.
30. Sander Koelstra, Christian Muhl, Mohammad Soleymani, Jong-Seok Lee, Ashkan Yazdani, Touradj Ebrahimi, Thierry Pun, Anton Nijholt, and Ioannis Patras. DEAP: A database for emotion analysis using physiological signals. IEEE Transactions on Affective Computing, 3(1):18–31, 2011.
31. Amir Zadeh, Rowan Zellers, Eli Pincus, and Louis-Philippe Morency. MOSI: Multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos. arXiv preprint arXiv:1606.06259, 2016.
32. AmirAli Bagher Zadeh, Paul Pu Liang, Soujanya Poria, Erik Cambria, and Louis-Philippe Morency. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2236–2246, 2018.
33. Soujanya Poria, Devamanyu Hazarika, Navonil Majumder, Gautam Naik, Erik Cambria, and Rada Mihalcea. MELD: A multimodal multi-party dataset for emotion recognition in conversations. arXiv preprint arXiv:1810.02508, 2018.
34. Nan Xu, Wenji Mao, and Guandan Chen. Multi-interactive memory network for aspect based multimodal sentiment analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 371–378, 2019.
35. Wenmeng Yu, Hua Xu, Fanyang Meng, Yilin Zhu, Yixiao Ma, Jiele Wu, Jiyun Zou, and Kaicheng Yang. CH-SIMS: A Chinese multimodal sentiment analysis dataset with fine-grained annotation of modality. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3718–3727, 2020.
36. Amir Zadeh, Yan Sheng Cao, Simon Hessner, Paul Pu Liang, Soujanya Poria, and Louis-Philippe Morency. CMU-MOSEAS: A multimodal language dataset for Spanish, Portuguese, German and French. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, volume 2020, page 1801. NIH Public Access, 2020.
37. Shreyash Mishra, S Suryavardan, Amrit Bhaskar, Parul Chopra, Aishwarya Reganti, Parth Patwa, Amitava Das, Tanmoy Chakraborty, Amit Sheth, Asif Ekbal, et al. Factify: A multi-modal fact verification dataset. In Proceedings of the First Workshop on Multimodal Fact-Checking and Hate Speech Detection (DE-FACTIFY), 2022.
38. Sathyanarayanan Ramamoorthy, Nethra Gunti, Shreyash Mishra, S Suryavardan, Aishwarya Reganti, Parth Patwa, Amitava Das, Tanmoy Chakraborty, Amit Sheth, Asif Ekbal, et al. Memotion 2: Dataset on sentiment and emotion analysis of memes. In Proceedings of De-Factify: Workshop on Multimodal Fact Checking and Hate Speech Detection, CEUR, 2022.
39. Louis-Philippe Morency, Rada Mihalcea, and Payal Doshi. Towards multimodal sentiment analysis: Harvesting opinions from the web. In Proceedings of the 13th International Conference on Multimodal Interfaces, pages 169–176, 2011.
40. Paul Pu Liang, Ziyin Liu, Amir Zadeh, and Louis-Philippe Morency. Multimodal language analysis with recurrent multistage fusion. arXiv preprint arXiv:1808.03920, 2018.
41. Yansen Wang, Ying Shen, Zhun Liu, Paul Pu Liang, Amir Zadeh, and Louis-Philippe Morency. Words can shift: Dynamically adjusting word representations using nonverbal behaviors. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7216–7223, 2019.
42. Sijie Mai, Haifeng Hu, and Songlong Xing. Divide, conquer and combine: Hierarchical feature fusion network with local and global perspectives for multimodal affective computing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 481–492, 2019.
43. Hai Pham, Paul Pu Liang, Thomas Manzini, Louis-Philippe Morency, and Barnabás Póczos. Found in translation: Learning robust joint representations by cyclic translations between modalities. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 6892–6899, 2019.
44. Soujanya Poria, Erik Cambria, and Alexander Gelbukh. Deep convolutional neural network textual features and multiple kernel learning for utterance-level multimodal sentiment analysis. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2539–2544, 2015.
45. Shamane Siriwardhana, Andrew Reis, Rivindu Weerasekera, and Suranga Nanayakkara. Jointly fine-tuning "BERT-like" self supervised models to improve multimodal speech emotion recognition. arXiv preprint arXiv:2008.06682, 2020.
46. Behnaz Nojavanasghari, Deepak Gopinath, Jayanth Koushik, Tadas Baltrušaitis, and Louis-Philippe Morency. Deep multimodal fusion for persuasiveness prediction. In Proceedings of the 18th ACM International Conference on Multimodal Interaction, pages 284–288, 2016.
47. Haohan Wang, Aaksha Meghawat, Louis-Philippe Morency, and Eric P Xing. Select-additive learning: Improving generalization in multimodal sentiment analysis. In 2017 IEEE International Conference on Multimedia and Expo (ICME), pages 949–954. IEEE, 2017.
48. Hongliang Yu, Liangke Gui, Michael Madaio, Amy Ogan, Justine Cassell, and Louis-Philippe Morency. Temporally selective attention model for social and affective state recognition in multimedia content. In Proceedings of the 25th ACM International Conference on Multimedia, pages 1743–1751, 2017.
49. Nan Xu and Wenji Mao. MultiSentiNet: A deep semantic network for multimodal sentiment analysis. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 2399–2402, 2017.
50. Feiyang Chen, Ziqian Luo, Yanyan Xu, and Dengfeng Ke. Complementary fusion of multi-features and multi-modalities in sentiment analysis. arXiv preprint arXiv:1904.08138, 2019.
51. Jie Xu, Zhoujun Li, Feiran Huang, Chaozhuo Li, and S Yu Philip. Social image sentiment analysis by exploiting multimodal content and heterogeneous relations. IEEE Transactions on Industrial Informatics, 17(4):2974–2982, 2020.
52. Weidong Wu, Yabo Wang, Shuning Xu, and Kaibo Yan. SFNN: Semantic features fusion neural network for multimodal sentiment analysis. In 2020 5th International Conference on Automation, Control and Robotics Engineering (CACRE), pages 661–665. IEEE, 2020.
53. Devamanyu Hazarika, Roger Zimmermann, and Soujanya Poria. MISA: Modality-invariant and -specific representations for multimodal sentiment analysis. In Proceedings of the 28th ACM International Conference on Multimedia, pages 1122–1131, 2020.
54. Wasifur Rahman, Md Kamrul Hasan, Sangwu Lee, Amir Zadeh, Chengfeng Mao, Louis-Philippe Morency, and Ehsan Hoque. Integrating multimodal information in large pretrained transformers. In Proceedings of the Conference. Association for Computational Linguistics. Meeting, volume 2020, page 2359. NIH Public Access, 2020.
55. Jianguo Sun, Hanqi Yin, Ye Tian, Junpeng Wu, Linshan Shen, and Lei Chen. Two-level multimodal fusion for sentiment analysis in public security. Security and Communication Networks, 2021:1–10, 2021.
56. Vasco Lopes, António Gaspar, Luís A Alexandre, and João Cordeiro. An AutoML-based approach to multimodal image sentiment analysis. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–9. IEEE, 2021.
57. Wenmeng Yu, Hua Xu, Ziqi Yuan, and Jiele Wu. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 10790–10797, 2021.
58. Jing He, Haonan Yanga, Changfan Zhang, Hongrun Chen, and Yifu Xua. Dynamic invariant-specific representation fusion network for multimodal sentiment analysis. Computational Intelligence and Neuroscience, 2022, 2022.
59. Fan Wang, Shengwei Tian, Long Yu, Jing Liu, Junwen Wang, Kun Li, and Yongtao Wang. TEDT: Transformer-based encoding–decoding translation network for multimodal sentiment analysis. Cognitive Computation, 15(1):289–303, 2023.
60. Di Wang, Xutong Guo, Yumin Tian, Jinhui Liu, LiHuo He, and Xuemei Luo. TETFN: A text enhanced transformer fusion network for multimodal sentiment analysis. Pattern Recognition, 136:109259, 2023.
61. Songning Lai, Xifeng Hu, Yulong Li, Zhaoxia Ren, Zhi Liu, and Danmin Miao. Shared and private information learning in multimodal sentiment analysis with deep modal alignment and self-supervised multi-task learning. arXiv preprint arXiv:2305.08473, 2023.
62. Mahesh G Huddar, Sanjeev S Sannakki, and Vijay S Rajpurohit. A survey of computational approaches and challenges in multimodal sentiment analysis. Int. J. Comput. Sci. Eng., 7(1):876–883, 2019.
63. Ramandeep Kaur and Sandeep Kautish. Multimodal sentiment analysis: A survey and comparison. Research Anthology on Implementing Sentiment Analysis Across Multiple Disciplines, pages 1846–1870, 2022.
64. Lukas Stappen, Alice Baird, Lea Schumann, and Björn Schuller. The multimodal sentiment analysis in car reviews (MuSe-CaR) dataset: Collection, insights and improvements. IEEE Transactions on Affective Computing, 2021.
65. Anurag Illendula and Amit Sheth. Multimodal emotion classification. In Companion Proceedings of the 2019 World Wide Web Conference, pages 439–449, 2019.
66. Donglei Tang, Zhikai Zhang, Yulan He, Chao Lin, and Deyu Zhou. Hidden topic–emotion transition model for multi-level social emotion detection. Knowledge-Based Systems, 164:426–435, 2019.
67. Petr Hajek, Aliaksandr Barushka, and Michal Munk. Fake consumer review detection using deep neural networks integrating word embeddings and emotion mining. Neural Computing and Applications, 32:17259–17274, 2020.
68. Soonil Kwon. A CNN-assisted enhanced audio signal processing for speech emotion recognition. Sensors, 20(1):183, 2019.
69. Umar Rashid, Muhammad Waseem Iqbal, Muhammad Akmal Skiandar, Muhammad Qasim Raiz, Muhammad Raza Naqvi, and Syed Khuram Shahzad. Emotion detection of contextual text using deep learning. In 2020 4th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), pages 1–5. IEEE, 2020.
70. Fereshteh Ghanbari-Adivi and Mohammad Mosleh. Text emotion detection in social networks using a novel ensemble classifier based on Parzen tree estimator (TPE). Neural Computing and Applications, 31(12):8971–8983, 2019.
71. Zhentao Xu, Verónica Pérez-Rosas, and Rada Mihalcea. Inferring social media users' mental health status from multimodal information. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 6292–6299, 2020.
72. Rahee Walambe, Pranav Nayak, Ashmit Bhardwaj, and Ketan Kotecha. Employing multimodal machine learning for stress detection. Journal of Healthcare Engineering, 2021:1–12, 2021.
73. Nujud Aloshban, Anna Esposito, and Alessandro Vinciarelli. What you say or how you say it? Depression detection through joint modeling of linguistic and acoustic aspects of speech. Cognitive Computation, 14(5):1585–1598, 2022.
74. Safa Chebbi and Sofia Ben Jebara. Deception detection using multimodal fusion approaches. Multimedia Tools and Applications, pages 1–30, 2021.