2020 11th International Conference on Information and Communication Systems (ICICS)

EmoDet2: Emotion Detection in English Textual Dialogue using BERT and BiLSTM Models

Hani Al-Omari, Dept. Computer Science, Jordan Univ. of Science and Technology, Irbid, Jordan, [email protected]
Malak A. Abdullah, Dept. Computer Science, Jordan Univ. of Science and Technology, Irbid, Jordan, [email protected]
Samira Shaikh, Dept. Computer Science, University of North Carolina-Charlotte, NC, USA, [email protected]

978-1-7281-6227-0/20/$31.00 ©2020 IEEE. DOI: 10.1109/ICICS49469.2020.239539

Abstract—Emotion detection is one of the most challenging problems in the automated understanding of language. Understanding human emotions from text alone, without facial expressions, is considered a complicated task. Building a machine that understands the context of sentences and differentiates between emotions has therefore motivated the machine learning community in recent years. We propose a system to detect emotions using deep learning approaches. The main input to the system is a combination of GloVe word embeddings, BERT embeddings, and a set of psycholinguistic features (e.g., from the AffectiveTweets Weka package). The proposed system (EmoDet2) combines a fully connected neural network architecture with a BiLSTM neural network and obtains results that show a substantial improvement (F1-score 0.748) over the baseline model provided by the SemEval-2019 Task 3 organizers (F1-score 0.58).

Index Terms—Neural Network, BERT, Deep Learning, Machine learning, Emotions, Sentiment.

I. INTRODUCTION

The challenge of defining emotions has motivated psychology researchers for a long time. Lindsley [1] defined emotion as a complex behavioral phenomenon involving many levels of neural and chemical integration. The list of basic emotions varies in content and length [2]. Ekman [3] identified six basic emotions: anger, disgust, fear, happiness, sadness, and surprise. Izard's list of basic emotions [4], by contrast, included anger, contempt, disgust, sadness, enjoyment-joy, fear, interest-excitement, and surprise-astonishment (and possibly guilt, shame, and shyness).

In the past decades, we have seen rapid growth of user-generated content on social media platforms such as Facebook and Twitter, covering a variety of topics on a daily basis. This content contains people's sentiments and emotions, expressing happiness, sadness, and anger. Using social media data, we can analyze and track public opinion to help predict attitudes towards certain products or political issues, or even to help prevent depressed people from committing suicide [5] [6]. However, detecting emotions from text alone, without combining it with facial expressions, is difficult. Few corpora exist for emotion labeling of text. To understand emotion, we need a way to extract better knowledge from the labeled data and to predict new unlabeled data using machine learning [7] and deep learning [8] techniques. Several machine learning approaches have been used to detect and predict emotions and sentiments [9]; however, deep learning approaches have been observed to achieve better performance than traditional machine learning approaches [10] [11].

Our system can determine the emotion in English textual dialogue and classify it into four categories (Happy, Sad, Angry and Other). The main input to the system consists of utterances along with three turns of context. We extract features using a combination of GloVe word embeddings, BERT embeddings and a set of psycholinguistic features (e.g., from the AffectiveTweets Weka package). The proposed system combines a fully connected neural network architecture with a BiLSTM neural network. We have used the data provided by the organizers of Task 3 (EmoContext) of SemEval 2019. The results of the model show a substantial improvement (F1-score 0.748) over the baseline model provided by the SemEval-2019 Task 3 organizers (F1-score 0.58).

The paper is organized as follows. Section II reviews related work. Section III gives more details about the model architecture. Section IV covers the testing and evaluation of the model. Section V concludes the paper.

II. RELATED WORK

Emotions can be defined as a complex state of feelings that comes from physical and psychological changes in our lives and depends on the mood and personality of the speaker. Several researchers have sought to define emotions, including Ekman [3], who identified six basic emotions: anger, disgust, fear, happiness, sadness and surprise. Machine learning researchers have built algorithms to understand emotion. Chatterjee et al. [12] worked on an LSTM model fed with two types of word embedding: semantic word embeddings using GloVe [13] and sentiment word embeddings using Sentiment Specific Word Embedding (SSWE). In [10], the researchers used the data provided by the shared task (Task 3: EmoContext) of the SemEval-2019 workshop to build the EmoDet model, which ensembled a fully connected neural network architecture and an LSTM neural network. The SEDAT model [11] detects sentiments and emotions in Arabic tweets, using word and document embeddings and a set of semantic features in a CNN-LSTM and a fully connected neural
network architecture. In EmoNet [14], the researchers worked on building emotional chatbots that better understand humans. They used huge labeled datasets and built a system to classify 24 fine-grained emotions. Their system consisted of a Gated Recurrent Neural Network (GRNN), which is considered simpler and faster than the LSTM model. Illendula and Sheth [15] studied the effect of emojis and images. They used a BiLSTM model with an attention mechanism and fed it with fastText embeddings and EmojiNet, together with features extracted from the images. Rosenthal et al. [16] used a multi-view ensemble approach to detect emotions. They trained models on feature spaces such as bag-of-words and word2vec, using traditional machine learning approaches: Logistic Regression and Support Vector Machines.

III. OUR APPROACH

Our system, EmoDet2, can determine the emotion and sentiment in English textual dialogue and classify it into four categories (Happy, Sad, Angry and Other). In this section, we describe the overall system design.

A. Collecting and Pre-Processing Data

For our approach, we have used the public data from the shared task (Task 3: EmoContext) of SemEval 2019 [17]. The task provides training, development and testing datasets to be used by all participants. The number of samples for each emotion in the training, development and testing datasets is shown in Table I. The distribution across the classes is clearly not balanced.

TABLE I
TRAINING AND TESTING DATASETS

         Train Data   Dev Data   Test Data
Anger    5506         150        298
Happy    4243         142        284
Sad      5463         125        250
Other    14948        2338       4677
Total    30160        2755       5509

The training corpus contains 5 columns:
• ID – A unique number to identify each training sample.
• Turn 1 – The first turn in the three-turn conversation, written by User 1.
• Turn 2 – The second turn, which is a reply to the first turn in the conversation and is written by User 2.
• Turn 3 – The third turn, which is a reply to the second turn in the conversation and is written by User 1.
• Label – The human-judged emotion label of Turn 3, based on the conversation for the given training sample. It is always one of four values: 'happy', 'sad', 'angry', and 'other'.

We did not apply standard pre-processing steps such as stemming and stopword removal. We converted all of the emojis in the text to textual forms and used the Ekphrasis [18] package to handle spelling mistakes and add empty strings between the special characters. It is worth mentioning that we did not apply preprocessing to the data for the BERT model, because it can get more features from the raw data.

B. Extracting Feature Vectors

We have explored different encoding techniques to convert the text into a vector representation, such as word2vec (Google News), GloVe, and fastText embeddings. We have also explored different features to represent each turn in the dataset as well as the concatenated turns. Our approach extracts feature vectors from the text as follows:

First, we extracted a 300-dimensional vector using the pretrained word2vec embedding model trained on Google News [19]. We also extracted GloVe embeddings, which consist of 300-dimensional vectors. BERT embeddings were used to obtain a 173-dimensional vector using the transformers package [20].

Second, we extracted semantic features by converting the whole conversation to a 145-dimensional vector using three vectors from the AffectiveTweets Weka package [21]: 43 features extracted using TweetToLexiconFeatureVectorAttribute, which calculates attributes for sentences using a variety of lexical resources; a two-dimensional vector using the SentimentStrength features from the same package; and a 100-dimensional vector obtained by vectorizing the sentence to embedding attributes.

C. Network Architecture

EmoDet2 has been built using ensembling methods with different sub-models, as shown in Fig. 1: EmoDense, EmoDet-BiLSTM-submodel1, EmoDet-BiLSTM-submodel2, EmoDet-BERT-BiLSTM (cased), and EmoDet-BERT-BiLSTM (uncased). These sub-models are described in more detail in the following three subsections.

Fig. 1. Overall Architecture Model

1) EmoDense: This sub-model uses a feed-forward neural network that consists of four dense hidden layers with 512, 256, 128 and 64 neurons. The activation function for all layers is ReLU [22]. We added 0.2 dropout between layers. The output layer consists of four sigmoid neurons to predict the class of the conversation. For optimization, we used the Adam optimizer [23] with a 0.0001 learning rate and Mean Squared Error as the loss function. Moreover, we saved the output prediction weights to predict the testing dataset. The fit function uses number of epochs = 40, batch size = 16, validation split = 0.33. The best epoch on the validation set was chosen to be applied to the test data (more details in Fig. 2).

Fig. 2. EmoDense Submodel

2) EmoDet-BiLSTM: The standard Recurrent Neural Network (RNN) [24] is distinguished from a feed-forward network by having a memory. A special kind of RNN is the Long Short-Term Memory (LSTM), which is composed of a memory cell, an input gate, an output gate and a forget gate. The Bidirectional Long Short-Term Memory (BiLSTM) [25] is an advanced form of LSTM that processes the data twice: once from the beginning to the end, and once from the end to the beginning. This lets the network learn more information from the data. We have applied two types of models using BiLSTM.

For EmoDet-BiLSTM-submodel1, we have one input, the encoded sentence, which goes into a lookup table of 300-dimensional pretrained GloVe vectors that represent words. It then goes into a BiLSTM that consists of 2 layers, each with 256 neurons and followed by 0.2 dropout to avoid overfitting. We then take the output from the BiLSTM, flatten it, and feed it into a fully connected neural network with four dense hidden layers of 512, 256, 128 and 64 neurons. The activation function for each layer is ReLU, with 0.2 dropout between layers. The output layer consists of 4 sigmoid neurons to predict the class of the conversation. For optimization, we again use the Adam optimizer with a 0.0001 learning rate and Mean Squared Error as the loss function. We saved the output prediction weights to predict the testing dataset. The fit function uses number of epochs = 100, batch size = 32, validation split = 0.33. Fig. 3 shows EmoDet-BiLSTM-submodel 1.

For EmoDet-BiLSTM-submodel 2, there are two inputs: the encoded sentence, obtained with 300-dimensional pretrained GloVe embeddings, and the features extracted from the AffectiveTweets Weka package for Turn 3, which consist of 445 dimensions. The encoded sentence goes into two BiLSTM layers, each with 256 nodes and 0.2 dropout to avoid overfitting. The output is then flattened. The second input goes into a fully connected neural network with four dense hidden layers of 512, 256, 128 and 64 neurons. The activation function for each layer is ReLU, with 0.2 dropout between layers. After that, the output of the second input is concatenated with the output of the first input and fed into a fully connected neural network with four dense layers of 512, 256, 128 and 64 neurons. The activation function of each layer is ReLU, with 0.2 dropout. The output layer consists of 4 sigmoid neurons to predict the class of the conversation. For optimization, we use the Adam optimizer with a 0.0001 learning rate and Mean Squared Error as the loss function. The fit function uses number of epochs = 100, batch size = 32, validation split = 0.33. See Fig. 4 for EmoDet-BiLSTM-submodel 2.
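The dimensions quoted in Section III-B and for submodel 2 can be checked with simple bookkeeping: the three AffectiveTweets parts add up to 43 + 2 + 100 = 145, and 445 is consistent with appending those 145 features to a 300-dimensional GloVe vector. The sketch below assumes plain concatenation in that order, which the paper does not state explicitly; all function and constant names are hypothetical, and the values are placeholder zeros.

```python
# Illustrative dimension bookkeeping only; names are hypothetical and the
# concatenation order is an assumption, not taken from the authors' code.

LEXICON_DIM = 43          # TweetToLexiconFeatureVectorAttribute features
SENTISTRENGTH_DIM = 2     # SentimentStrength features
EMBEDDING_ATTR_DIM = 100  # sentence-to-embedding attributes
GLOVE_DIM = 300           # pretrained GloVe vector


def affective_features(lexicon, sentistrength, embedding_attrs):
    """Concatenate the three AffectiveTweets parts into one flat vector."""
    expected = (LEXICON_DIM, SENTISTRENGTH_DIM, EMBEDDING_ATTR_DIM)
    parts = (lexicon, sentistrength, embedding_attrs)
    for part, size in zip(parts, expected):
        if len(part) != size:
            raise ValueError(f"expected {size} values, got {len(part)}")
    return list(lexicon) + list(sentistrength) + list(embedding_attrs)


def combined_features(glove_vector, affective_vector):
    """Assumed layout: GloVe vector followed by the AffectiveTweets vector."""
    if len(glove_vector) != GLOVE_DIM:
        raise ValueError(f"expected {GLOVE_DIM} values, got {len(glove_vector)}")
    return list(glove_vector) + list(affective_vector)


affective = affective_features([0.0] * 43, [0.0] * 2, [0.0] * 100)
combined = combined_features([0.0] * GLOVE_DIM, affective)
print(len(affective), len(combined))  # 145 445
```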

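Each sub-model described above ends in four sigmoid output neurons trained with Mean Squared Error. A minimal sketch of what that output stage computes is given here, assuming one-hot targets over the four labels and a prediction taken as the label with the largest output. The label order, the helper names, and the example pre-activation values are assumptions made for illustration.

```python
import math

LABELS = ["happy", "sad", "angry", "other"]  # assumed label order


def sigmoid(z):
    """Logistic function applied to one pre-activation value."""
    return 1.0 / (1.0 + math.exp(-z))


def one_hot(label):
    """Encode a label as a 4-dimensional target vector."""
    return [1.0 if name == label else 0.0 for name in LABELS]


def mse(outputs, target):
    """Mean Squared Error between network outputs and the target vector."""
    return sum((o - t) ** 2 for o, t in zip(outputs, target)) / len(outputs)


def predict(outputs):
    """Pick the label whose output neuron has the largest activation."""
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return LABELS[best]


logits = [-2.0, 3.0, -1.0, 0.5]          # made-up pre-activation values
outputs = [sigmoid(z) for z in logits]   # four independent sigmoid neurons
loss = mse(outputs, one_hot("sad"))
print(predict(outputs), loss)
```

Because the sigmoid neurons are independent, the four outputs need not sum to one; the largest one is simply taken as the predicted class in this sketch.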

Fig. 3. EmoDet BiLSTM Sub-Model 1

Fig. 4. EmoDet BiLSTM Sub-Model 2

3) EmoDet-BERT-BiLSTM: In this sub-model, the input is the raw data without any preprocessing step, because BERT can get better information from the original data than from the processed data. First, we extracted BERT embeddings and fed them into two BiLSTM layers, each with 128 nodes and 0.2 dropout to avoid overfitting. The output from the BiLSTM layers was then flattened and fed into a fully connected neural network with four dense hidden layers of 512, 256, 128 and 64 neurons. The activation function for each layer is ReLU, with 0.2 dropout between layers. The output layer consists of 4 sigmoid neurons to predict the class of the conversation. For optimization, we again used the Adam optimizer with a 0.0001 learning rate and Mean Squared Error as the loss function. We saved the output prediction weights to predict the testing dataset. The fit generator function uses number of epochs = 100, batch size = 128, validation split = 0.20. Fig. 5 shows the EmoDet-BERT-BiLSTM sub-model.

D. Ensembling

After several experiments, we obtained the best result using the overall architecture shown in Fig. 1. It is worth mentioning that we applied grid search to get the best set of weights for each model. The next section gives more details on the experiments that we applied to each sub-model to get the best performance.

IV. TESTING AND EVALUATION

The techniques and methodologies we applied were tested extensively to arrive at the final architecture. This section shows the progress of the system step by step.

For pre-processing, we applied the Ekphrasis package as stated before, mainly to perform spelling correction on the data. We then converted the emojis from symbols to their text format using the Emoji package (https://pypi.org/project/emoji/). We tested the preprocessing step in two different ways: (1) taking each turn alone and (2) taking all the turns together. Moreover, we encoded the words in the conversation using Word2vec, GloVe Wiki, and GloVe Common Crawl embeddings. Tables II, III and IV show the experiments that we applied to the whole-conversation input to choose the best embedding system.

TABLE II
EVALUATING THE MODEL USING WORD2VEC

Epoch, Batch   Optimizer   Accuracy   Precision   Recall   F1
(40,16)        SGD         0.8034     0.3798      0.6322   0.4745
(40,32)        SGD         0.7744     0.3153      0.5457   0.3996
(40,16)        Adam        0.8112     0.3984      0.6550   0.4955
(40,32)        Adam        0.7942     0.3741      0.6803   0.4827
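The F1 column in these tables appears to be the harmonic mean of the precision and recall columns: the standard formula reproduces, for example, the first row of Table II and the overall result in Table XIII. The short sketch below shows the calculation; the function names are ours. The `precision` helper also illustrates why a precision entry can be reported as nan, which would be consistent with a model that makes no positive predictions at all (recall 0.0000), as in some rows of Table VI.

```python
import math


def precision(true_positives, false_positives):
    """Precision is undefined (nan) when nothing is predicted as positive."""
    predicted = true_positives + false_positives
    if predicted == 0:
        return float("nan")
    return true_positives / predicted


def f1_score(p, r):
    """Harmonic mean of precision and recall; 0.0 if undefined or both zero."""
    if math.isnan(p) or p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


table_ii_row = f1_score(0.3798, 0.6322)  # (40,16) SGD row of Table II
table_xiii = f1_score(0.6900, 0.8161)    # overall model, Table XIII
print(round(table_ii_row, 4), round(table_xiii, 4))  # 0.4745 0.7478
```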

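Section III-D states that grid search was used to find the ensemble weights, but does not give the grid or the selection criterion. The following sketch shows one way such a search could look, assuming each sub-model outputs per-class scores, the ensemble takes a weighted average, and the weights are chosen by accuracy on held-out data (the paper reports F1; accuracy is used here only to keep the example short). The toy scores, the grid values, and all names are hypothetical.

```python
from itertools import product


def weighted_average(score_sets, weights):
    """Combine the class scores of several models for a single sample."""
    total = sum(weights)
    n_classes = len(score_sets[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, score_sets)) / total
        for c in range(n_classes)
    ]


def ensemble_predict(model_scores, weights):
    """model_scores[m][i] holds the class scores of model m for sample i."""
    predictions = []
    for i in range(len(model_scores[0])):
        avg = weighted_average([scores[i] for scores in model_scores], weights)
        predictions.append(max(range(len(avg)), key=avg.__getitem__))
    return predictions


def accuracy(predictions, gold):
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def grid_search_weights(model_scores, gold, grid=(0.0, 0.5, 1.0)):
    """Try every weight combination on the grid and keep the best one."""
    best_weights, best_score = None, -1.0
    for weights in product(grid, repeat=len(model_scores)):
        if sum(weights) == 0:
            continue  # an all-zero weighting is not a valid ensemble
        score = accuracy(ensemble_predict(model_scores, weights), gold)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score


# Toy validation data: two models, three samples, two classes.
gold = [0, 1, 1]
model_a = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]  # wrong on the second sample
model_b = [[0.4, 0.6], [0.1, 0.9], [0.3, 0.7]]  # wrong on the first sample
best_weights, best_score = grid_search_weights([model_a, model_b], gold)
print(best_weights, best_score)
```

In this toy case each model alone misclassifies one sample, while a weighted combination classifies all three correctly, which is the effect a weighted ensemble is meant to exploit.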

Fig. 5. EmoDet BERT BiLSTM Submodel

TABLE III
EVALUATING THE MODEL USING GLOVE WIKI

Epoch, Batch   Optimizer   Accuracy   Precision   Recall   F1
(40,16)        SGD         0.8041     0.1885      0.1731   0.1805
(40,32)        SGD         0.8488     0.3333      0.0012   0.0024
(40,16)        Adam        0.8116     0.3923      0.5841   0.4693
(40,32)        Adam        0.7949     0.3642      0.5950   0.4518

TABLE IV
EVALUATING THE MODEL USING GLOVE COMMON CRAWL

Epoch, Batch   Optimizer   Accuracy   Precision   Recall   F1
(40,16)        SGD         0.8034     0.3829      0.6370   0.4783
(40,32)        SGD         0.7878     0.3501      0.5938   0.4405
(40,16)        Adam        0.8321     0.4384      0.6070   0.5091
(40,32)        Adam        0.8319     0.4377      0.6166   0.5120

When fine-tuning the parameters, we found that the best settings for our model are as follows: Dropout = 0.4, the text for Word2Vec and GloVe Wiki embeddings should be in lower case, and GloVe Common Crawl should use the cased text. The best accuracy was obtained with the GloVe Common Crawl pretrained embedding; we think that the cased text gives extra features to the model so that it can understand the context better.

We have tested the model using Turn 1 only and Turn 2 only (shown in Tables V and VI). The hyperparameters are as follows: Dropout = 0.4 and the text in lower case.

TABLE V
EVALUATING THE MODEL USING WORD2VEC (TURN 1 ONLY)

Epoch, Batch   Optimizer   Accuracy   Precision   Recall   F1
(40,16)        SGD         0.8461     0.1053      0.0048   0.0092
(40,32)        SGD         0.8435     0.0877      0.0060   0.0112
(40,16)        Adam        0.7954     0.2809      0.2788   0.2799
(40,32)        Adam        0.7873     0.2718      0.2957   0.2832

TABLE VI
EVALUATING THE MODEL USING WORD2VEC (TURN 2 ONLY)

Epoch, Batch   Optimizer   Accuracy   Precision   Recall   F1
(40,16)        SGD         0.8461     0.2222      0.0144   0.0271
(40,32)        SGD         0.8441     nan         0.0000   0.0000
(40,16)        Adam        0.8455     0.2708      0.0156   0.0295
(40,32)        Adam        0.8490     nan         0.0000   0.0000

Next, we tested the model using Turn 3 only and extracted the word encodings using three pretrained embedding models: GloVe Common Crawl trained on normal text (cased and uncased) and Word2vec trained on uncased text (Tables VII and VIII). The hyperparameters are as follows: Dropout = 0.4, the text for Word2Vec and GloVe Wiki is lowercased, and the text for GloVe Common Crawl is cased. We conclude that the best accuracy is obtained with the GloVe Common Crawl pretrained embedding and that the third turn of the conversation provides better predictions of the emotion class.

TABLE VII
EVALUATING THE MODEL USING WORD2VEC (TURN 3 ONLY)

Epoch, Batch   Optimizer   Accuracy   Precision   Recall   F1
(40,16)        SGD         0.8726     0.5441      0.7260   0.6220
(40,32)        SGD         0.8539     0.4658      0.5901   0.5207
(40,16)        Adam        0.8880     0.5950      0.7380   0.6588
(40,32)        Adam        0.8546     0.4521      0.5156   0.4818

TABLE VIII
EVALUATING THE MODEL USING GLOVE COMMON CRAWL (TURN 3 ONLY)

Epoch, Batch   Optimizer   Accuracy   Precision   Recall   F1
(40,16)        SGD         0.8711     0.5377      0.7536   0.6276
(40,32)        SGD         0.8533     0.4891      0.7260   0.5844
(40,16)        Adam        0.8934     0.6099      0.7440   0.6703
(40,32)        Adam        0.8840     0.5755      0.7692   0.6584

We have added the AffectiveTweets features, which consist of 145 dimensions, to the third-turn features from the GloVe Common Crawl embeddings. The experiments are shown in Table IX. More experiments for EmoDet BiLSTM Submodel 1 are shown in Table X. We have used GloVe Common Crawl
embeddings to encode the text, and the text was in cased form. We used the EmoDet BiLSTM Submodel 2 model to get a better result than the previous model (Table XI).

TABLE IX
EVALUATING THE MODEL USING GLOVE COMMON CRAWL WITH AFFECTIVETWEETS (TURN 3 ONLY)

Text type   Accuracy   Precision   Recall   F1
Cased       0.8989     0.6196      0.7752   0.6887
Lower       0.8840     0.5755      0.7692   0.6584

TABLE X
TESTING BILSTM SUB-MODEL 1

Text type   Accuracy   Precision   Recall   F1
Cased       0.8889     0.5836      0.7885   0.6708
Lower       0.8737     0.5440      0.7656   0.6360

TABLE XI
TESTING BILSTM SUB-MODEL 2

Text type   Accuracy   Precision   Recall   F1
Cased       0.9002     0.6229      0.7921   0.6974
Lower       0.8840     0.5755      0.7692   0.6584

For the EmoDet BERT BiLSTM model, the BERT embeddings were extracted using the transformers package by summing the last hidden layers of the BERT base model (Table XII).

TABLE XII
TESTING EMODET BERT BILSTM

Text type   Accuracy   Precision   Recall   F1
Cased       0.8920     0.6012      0.7356   0.6616
Lower       0.9100     0.6615      0.7680   0.7108

Ensembling all the models together gives the best result, with an F1-score of 0.7478. We used the weighted ensembling method shown in Fig. 1; the results are given in Table XIII. The BERT model also increased performance: the system improved when BERT embeddings were extracted and fed into an LSTM network.

Finally, EmoDet2 (F1-score 0.75) outperforms both the previous version, the EmoDet model [10] (F1-score 0.67), and the baseline model (F1-score 0.58) from Shared Task 3 of the SemEval-2019 workshop competition.

TABLE XIII
TESTING OVERALL MODEL ARCHITECTURE

Accuracy   Precision   Recall   F1
0.9199     0.6900      0.8161   0.7478

V. CONCLUSION

In this paper, we have presented our system EmoDet2, which uses deep learning architectures for detecting the existence of emotions in a text. The performance of the system (F1-score 0.75) surpasses that of the baseline model (F1-score 0.58), indicating that our approach is promising. In this system, we used word embedding models with feature vectors extracted using AffectiveTweets. We also extracted contextual word embeddings from the BERT base model. These vectors feed different deep neural network architectures, namely feed-forward and LSTM models, to obtain the predictions. We use the SemEval-2019 Task 3 datasets as input for our system and show that EmoDet2 is highly proficient at detecting emotions in conversational text and surpasses the F1-score of the baseline model provided by the SemEval-2019 Task 3 organizers.

REFERENCES

[1] D. B. Lindsley, “Emotion.” S. S. Stevens (Ed.), Handbook of experimental psychology, pp. (473-516), 1951.
[2] P. R. Shaver, H. J. Morgan, and S. Wu, “Is love a “basic” emotion?” Personal Relationships, vol. 3, no. 1, pp. 81–96, 1996.
[3] P. Ekman, “Basic emotions,” Handbook of cognition and emotion, vol. 98, no. 45-60, p. 16, 1999.
[4] C. Izard, “Emotions, personality, and psychotherapy. the psychology of emotions. new york, ny, us,” 1991.
[5] M. De Choudhury, S. Counts, and E. Horvitz, “Social media as a measurement tool of depression in populations,” in Proceedings of the 5th Annual ACM Web Science Conference, 2013, pp. 47–56.
[6] B. Bataineh, R. Duwairi, and M. Abdullah, “Ardep: An arabic lexicon for detecting depression,” in Proceedings of the 2019 3rd International Conference on Advances in Artificial Intelligence, 2019, pp. 146–151.
[7] D. Michie, D. J. Spiegelhalter, C. Taylor et al., “Machine learning,” Neural and Statistical Classification, vol. 13, 1994.
[8] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no. 7553, pp. 436–444, 2015.
[9] C. O. Alm, D. Roth, and R. Sproat, “Emotions from text: machine learning for text-based emotion prediction,” in Proceedings of the conference on human language technology and empirical methods in natural language processing. Association for Computational Linguistics, 2005, pp. 579–586.
[10] H. Al-Omari, M. Abdullah, and N. Bassam, “Emodet at semeval-2019 task 3: Emotion detection in text using deep learning,” in Proceedings of the 13th International Workshop on Semantic Evaluation, 2019, pp. 200–204.
[11] M. Abdullah, M. Hadzikadic, and S. Shaikh, “Sedat: sentiment and emotion detection in arabic text using cnn-lstm deep learning,” in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2018, pp. 835–840.
[12] A. Chatterjee, U. Gupta, M. K. Chinnakotla, R. Srikanth, M. Galley, and P. Agrawal, “Understanding emotions in text using deep learning and big data,” Computers in Human Behavior, vol. 93, pp. 309–317, 2019.
[13] J. Pennington, R. Socher, and C. Manning, “Glove: Global vectors for word representation,” in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp. 1532–1543.
[14] M. Abdul-Mageed and L. Ungar, “Emonet: Fine-grained emotion detection with gated recurrent neural networks,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017, pp. 718–728.
[15] A. Illendula and A. Sheth, “Multimodal emotion classification,” in Companion Proceedings of The 2019 World Wide Web Conference. ACM, 2019, pp. 439–449.
[16] S. Rosenthal, N. Farra, and P. Nakov, “Semeval-2017 task 4: Sentiment analysis in twitter,” arXiv preprint arXiv:1912.00741, 2019.
[17] A. Chatterjee, K. N. Narahari, M. Joshi, and P. Agrawal, “Semeval-2019 task 3: Emocontext contextual emotion detection in text,” in Proceedings of the 13th International Workshop on Semantic Evaluation, 2019, pp. 39–48.
[18] C. Baziotis, N. Pelekis, and C. Doulkeridis, “DataStories at SemEval-2017 task 4: Deep LSTM with attention for message-level and topic-based sentiment analysis,” in Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). Vancouver, Canada: Association for Computational Linguistics, Aug. 2017, pp. 747–754. [Online]. Available: https://www.aclweb.org/anthology/S17-2126
[19] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
[20] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, and J. Brew, “Huggingface’s transformers: State-of-the-art natural language processing,” ArXiv, vol. abs/1910.03771, 2019.
[21] S. Mohammad, F. Bravo-Marquez, M. Salameh, and S. Kiritchenko, “Semeval-2018 task 1: Affect in tweets,” in Proceedings of the 12th international workshop on semantic evaluation, 2018, pp. 1–17.
[22] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, “Maxout networks,” arXiv preprint arXiv:1302.4389, 2013.
[23] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[24] T. Mikolov, M. Karafiát, L. Burget, J. Černockỳ, and S. Khudanpur, “Recurrent neural network based language model,” in Eleventh annual conference of the international speech communication association, 2010.
[25] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
