0% found this document useful (0 votes)
9 views6 pages

Efficacy of BERT Embeddings On Predicting Disaster From Twitter Data

The document discusses exploring the efficacy of BERT embeddings compared to traditional embeddings for predicting disasters from Twitter data. It provides context on how social media like Twitter is used to share information during disasters in real-time. However, understanding sentiments from short tweets is challenging due to issues like context. The document aims to show that BERT embeddings, which provide different embeddings based on context, perform better than traditional embeddings for disaster prediction from tweets.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views6 pages

Efficacy of BERT Embeddings On Predicting Disaster From Twitter Data

The document discusses exploring the efficacy of BERT embeddings compared to traditional embeddings for predicting disasters from Twitter data. It provides context on how social media like Twitter is used to share information during disasters in real-time. However, understanding sentiments from short tweets is challenging due to issues like context. The document aims to show that BERT embeddings, which provide different embeddings based on context, perform better than traditional embeddings for disaster prediction from tweets.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

Efficacy of BERT embeddings on predicting disaster from

Twitter data
Ashis Kumar Chanda
[email protected]
Temple University
Philadelphia, PA, USA

ABSTRACT dataset that becomes an important source of data to make different


Social media like Twitter provide a common platform to share and types of research analysis. Moreover, the social media data are real
communicate personal experiences with other people. People of- time data and accessible to monitor. Therefore, several research
arXiv:2108.10698v1 [cs.CL] 8 Aug 2021

ten post their life experiences, local news, and events on social works are conducted to perform different types real time predictions
media to inform others. Many rescue agencies monitor this type using the social media data, such as stock movement prediction [20],
of data regularly to identify disasters and reduce the risk of lives. relation extraction [26], and natural disaster prediction [17, 32].
However, it is impossible for humans to manually check the mass Twitter is such a social media site that can be accessed through
amount of data and identify disasters in real-time. For this pur- people’s laptops and smartphones. The rapid growth of smartphone
pose, many research works have been proposed to present words in or laptop usages enables people to share an emergency that they
machine-understandable representations and apply machine learn- observe in real time. For this reason, many disaster relief organi-
ing methods on the word representations to identify the sentiment zations and news agencies are interested in monitoring Twitter
of a text. The previous research methods provide a single represen- data programmatically. However, unlike long articles, tweets are
tation or embedding of a word from a given document. However, short length text, and they tend to have more challenges due to
the recent advanced contextual embedding method (BERT) con- their shortness, sparsity (i.e., diverse word content) [5], velocity
structs different vectors for the same word in different contexts. (rapid growth of short text like SMS and tweet) and misspelling [2].
BERT embeddings have been successfully used in different natural For these reasons, it is very challenging to understand whether a
language processing (NLP) tasks, yet there is no concrete analy- person’s words are announcing a disaster or not. For example, a
sis of how these representations are helpful in disaster-type tweet tweet like this, "#𝑜𝑙𝑑𝐵𝑎𝑛𝑑 𝑎𝑚𝑎𝑧𝑖𝑛𝑔 𝑝𝑒𝑟 𝑓 𝑜𝑟𝑚𝑎𝑛𝑐𝑒! 𝑙𝑖𝑔ℎ𝑡, 𝑐𝑜𝑙𝑜𝑟, 𝑓 𝑖𝑟𝑒
analysis. In this research work, we explore the efficacy of BERT 𝑜𝑛 𝑠𝑡𝑎𝑔𝑒! 𝑙𝑜𝑡𝑠 𝑜 𝑓 𝑝𝑒𝑜𝑝𝑙𝑒 𝑎𝑛𝑑 ℎ𝑢𝑔𝑒 𝑐ℎ𝑎𝑜𝑠!" tells us an experience of a
embeddings on predicting disaster from Twitter data and compare person in a concert and we can say from that he enjoyed it, because
these to traditional context-free word embedding methods (GloVe, of the word, "𝑎𝑚𝑎𝑧𝑖𝑛𝑔". Even though it contains the word, "𝑓 𝑖𝑟𝑒",
Skip-gram, and FastText). We use both traditional machine learning it does not mean any danger or emergency; rather, it is used to
methods and deep learning methods for this purpose. We provide describe the colorful decoration of the stage. Let’s assume another
both quantitative and qualitative results for this study. The results tweet like this, "𝐶𝑎𝑙𝑖 𝑓 𝑜𝑟𝑛𝑖𝑎 𝐻𝑤𝑦. 20 𝑐𝑙𝑜𝑠𝑒𝑑 𝑖𝑛 𝑏𝑜𝑡ℎ 𝑑𝑖𝑟𝑒𝑐𝑡𝑖𝑜𝑛𝑠 𝑑𝑢𝑒
show that the BERT embeddings have the best results in disaster 𝑡𝑜 𝐿𝑎𝑘𝑒 𝐶𝑜𝑢𝑛𝑡𝑟𝑦 𝑓 𝑖𝑟𝑒". Here, the word “𝑓 𝑖𝑟𝑒" means disaster, and
prediction task than the traditional word embeddings. Our codes the tweet describes an emergency. The two examples show that one
are made freely accessible to the research community. word could have multiple meanings based on its context. There-
fore, understanding the context of words is important to analyze a
KEYWORDS tweet’s sentiment.
Different researchers proposed different methods to understand
Twitter data, social media data, disaster prediction, BERT, Kaggle
the meaning of a word by representing it in embedding or vector
competition, natural language processing (NLP)
[4, 18, 23]. Neural network-based methods such as Skip-gram [18],
Reference Format: FastText [4] are popular for learning word embeddings from large
Ashis Kumar Chanda. 2021. Efficacy of BERT embeddings on predicting word corpus and have been used for solving different types of
disaster from Twitter data. In Archive ’21. USA, 6 pages. NLP tasks. These methods are also used for sentiment analysis of
Twitter data [6, 24]. However, those embedding learning methods
1 INTRODUCTION provide static embedding for a single word in a document. Hence,
In the current age of internet, online social media sites have be- the meaning of the word,“ 𝑓 𝑖𝑟𝑒" would remain the same in the above
come available to all people, and people tend to post their personal two examples for these methods.
experiences, current events, local and global news. For this reason, To handle this problem, the authors of [7] proposed a contextual
the daily usages of social media are growing up and making a large embedding learning model, Bidirectional Encoder Representations
from Transformers (BERT), that provides embeddings of a word
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
based on its context words. In different types of NLP tasks such
for profit or commercial advantage and that copies bear this notice and the full citation as text classification [28], text summarization [15], entity recogni-
on the first page. To copy otherwise, or republish, to post on servers or to redistribute tion [8], BERT model outperformed traditional embedding learning
to lists, requires prior specific permission. Request permissions from the author.
Archive 2021, August 08, 2021, . models. However, it is interesting to discover how the contextual
© 2021 copyright. embeddings could help to understand disaster-type texts. For this
Archive 2021, August 08, 2021, . Chanda. et al.

reason, we plan to analyze the disaster prediction task from Twitter The word embeddings are used in both traditional machine learning
data using both context-free and contextual embeddings in this methods and deep learning models as input for disaster prediction.
study. We use traditional machine learning methods and neural
network models for the prediction task where the word embeddings 3.1 BOW embeddings
are used as input to the models. We show that contextual embed- The bag-of-words (BOW) model is a common approach for text
dings work better in predicting disaster-types tweets than the other representation of a word document. If there are 𝑉 words in a text
word embeddings. Finally, we provide an extensive discussion to vocabulary, then BOW is a binary vector or array of length |𝑉 |
analyze the results. where each index of the array is used to present one word of the
The main contributions of this paper are summarized as follows. vocabulary. If a word exists in a document, then the corresponding
(1) We analyze a real-life natural language online social net- array index of the word becomes one; otherwise, it contains zero. We
work dataset, Twitter data, to identify challenges in human use BOW embeddings of Twitter data in three traditional machine
sentiment analysis for disaster-type tweet prediction. learning methods such as decision tree, random forest, and logistic
(2) We apply both contextual and context-free embeddings in regression to predict the sentiment of a tweet.
tweet representations for disaster prediction through ma- Even though BOW is good for representing words of a document,
chine learning methods and show that context-free embed- it loses contextual information because the order of words is not
dings (BERT) can improve the accuracy of disaster prediction recorded in the binary structure. However, contextual information
compared with contextual embeddings. is required to understand and analyze the sentiment of a text. For
(3) We provide a detailed explanation of our method and results this reason, we also plan to use context-based embeddings for this
and share our codes publicly that will enable researchers to sentiment analysis task.
run our experiments and reproduce our results for future
research work directions 1 . 3.2 Context-free embeddings
The rest of the paper is organized as follows. In section 2, some Many existing research works proposed to learn word embeddings
related works are introduced. The main methodology of this paper based on the co-occurrences of word pairs in documents. GloVe [23]
is elaborated in section 3. The dataset and the experiments are is one common method for learning word embeddings from the
presented in section 4 and 5, respectively. Finally, the conclusion is co-occurrences of words in documents. However, neural network-
drawn in section 6. based models such as Skip-gram [18], FastText [4] became popular
recently to learn word representations from documents and used
for sentiment analysis.
2 RELATED WORKS
In our research study, we use the pre-trained embeddings of
Many research works analyzed Twitter data for understanding three context-free embedding models (GloVe, Skip-gram, FastText)
emergency situation and predicting disaster analysis [3, 12, 21, in a neural network-based model to analyze the sentiment of tweet
33]. One group of researchers used text mining and statistical ap- data and predict disaster types tweets. To represent a tweet in
proaches to understand crises [12, 33], another group of researchers context-free embeddings, we took the average of word embeddings
focused on clustering text data to identify a group of tweets that of a tweet following the same strategy of [13]. For the calculated
belong to disaster [3, 21]. Later, different traditional machine learn- vector of a tweet, we use softmax to predict the sentiment of the
ing models are used to analyze Twitter data and predict disaster tweet. Let’s suppose that the vector of a tweet is 𝑣, we have a set of
or emergency situations where words of a tweet are represented labels, 𝐿 = {"positive", "negative"} and 𝑍 ∈ R ( |𝐿 |×𝑑) is the weight
as embeddings [1, 22, 27]. For example, Palshikar et al. [22] pro- matrix of softmax function. Then, the probability of the tweet to
posed a weakly supervised model where words are presented with be positive or disaster is calculated as follows
a bag of words (BOW) model. Moreover, frequency-based word
representation is used in [1] for disaster prediction from Twitter 𝑒 𝑍𝑖 ·𝑣
data using Naive Bayes, Logistic Regression, Random Forest, and 𝑝 (𝑦 (𝑙𝑖 ) = 1) = Í 𝑍 ·𝑣
𝑙𝑘 ∈𝐿 𝑒 𝑘
SVM methods. The authors in [27] used a Markov-based model to
predict the location of Tweets during a disaster. In a recent work Recently, deep neural networks are also used for sentiment anal-
[25], the authors proposed a pre-processing method for BERT-based ysis. To observe how the context-free embeddings work on deep
sentiment analysis of tweets. However, it is interesting to explore neural networks, we used a bidirectional recurrent neural network
the model performance on different word embeddings to observe with LSTM gates [9]. The Bi-LSTM model processes the input words
how the context words help to predict a tweet as a disaster. of a tweet from right to left and in reverse. The Bi-LSTM block is
followed by a fully connected layer and sigmoid function to output
as an activation function.
3 METHODOLOGY
In this section, we discuss our approach of leveraging word em- 3.3 Contextual embeddings
bedding for disaster prediction from Twitter data using machine
Unlike the other word embeddings, BERT embeddings [7] generates
learning methods. We consider three types of word embeddings, 1)
different vectors for the same word in different contexts. Recent
bag of words (BOW), 2) context-free, and 3) contextual embeddings.
advances in NLP have shown that BERT model have outperformed
traditional embeddings in different NLP tasks, like entity extrac-
1 https://github.com/ashischanda/sentiment-analysis tion, next sentence prediction. In our study, we plan to investigate
Efficacy of BERT embeddings on predicting disaster from Twitter data Archive 2021, August 08, 2021, .

how well do contextual embeddings work better than traditional 4.1 Data pre-processing
embeddings in sentiment analysis. For this purpose, we use the pre- Since the twitter data is natural language text, and it contains
trained embeddings of BERT models in the same neural network different types of typos, punctuation, abbreviations, and numbers.
models to predict disaster types tweets. For this reason, before training machine learning models on the
natural language text, a text pre-processing step is required to
remove stop words and word tokenization. Hence, we removed
all the stopwords and punctuations from the training data and
converted all the words into lower-case letters. Table 1 shows some
pre-processed tweets with the original tweets.

4.2 Data analysis


Before running any machine learning methods on our data, we
analyzed our dataset to obtain some insights about the data. Ta-
ble 2 shows some statistical results on the training data after pre-
processing the text. From the table, we find that there are 43% tweets
that are annotated as real disasters and 57% are not. There are a total
of 21,940 unique words, while only 6,816 words have frequency
>1. The average length of tweets is 12.5. However, it is important
to check the length of positive and negative tweets separately to
verify whether they have common characteristics. Figure 1 shows
word distribution for both the positive and negative tweets. The
Figure 1: Twitter length distribution in training data figure shows many negative tweets with small word lengths (<10),
but most positive and negative tweets are in a word length of 10 to
20.
We also analyzed the word frequency for positive and negative
4 DATASET tweets. Figure 2 shows the most frequent words in a word cloud
For this study, we used a Twitter dataset from a recent Kaggle where the high font of a word presents high frequency. We can
competition (Natural Language Processing with Disaster Tweets find some common words in both types of tweets (i.e., https, t,
2 ). Kaggle competition is a very well-known platform for machine
co, people). However, Figure 2(a) highlights many disaster-related
learning researchers where many research agencies share their words like storm, fire, bomber, death, and earthquake. On the other
data to solve different types of research problems. For example, hand, Figure 2(b) highlights daily used words such as think, good,
many researchers used data from Kaggle competitions to analyze love, now, time. From this figure, it is clear that the most frequent
real-life problems and propose models to solve the problems, such words are different in the two types of tweets, and understanding
as sentiment analysis, feature detection, diagnosis prediction [10, the meaning of words is important to classify them.
14, 29–31].
In the selected Kaggle competition, a dataset of 10,876 tweets is 5 EXPERIMENTS
given to predict which tweets are about real disasters and which
In our experimental study, we conduct several experiments based
ones aren’t using machine learning model. This dataset has two
on the real Twitter data to predict disaster-types tweets. At first, we
separate files, train (7,613 tweets) and test (3,263 tweets) data, where
describe the experimental settings and model training procedures
each row of the train data contains id, natural language text or
in this section. Then, we analyze the experimental results in detail.
tweet, and label. The labels are manually annotated by humans.
They labeled a tweet as positive or one if it is about real disaster,
otherwise as negative or zero. On the other hand, the test data 5.1 Experimental settings
has D and natural language text but no label. The competition site 5.1.1 Traditional ML models with BOW embeddings. From the Ta-
stores the labels of test data privately and uses that to calculate ble 2, we find that the training data has 21,940 unique words where
test scores based on user’s machine learning model predictions 6,816 words have frequency more than 1. To avoid infrequent words,
and create leader-board for the competition based on the test score. we considered only the vocabulary of 6,816 words in our BOW rep-
Moreover, this dataset was created by the figure-eight company resentations. To represent a tweet in BOW embeddings, we took a
and originally shared on their website 3 . binary array of 6,816 length where it had 1 if a word of tweet was
We used the training data to train different machine learning present in the vocabulary, otherwise 0. We used the BOW embed-
models and predict test data labels using trained models. We re- dings to predict the sentiment of a tweet using three traditional
ported both the train and test data score in our experiment. Note machine learning models, 1) decision tree, 2) random forest and 3)
that our purpose is not to get a high score in the competition, rather logistic regression. We used python Sklearn package 4 and used
use Twitter data to study our research goals. all the default parameters to train the models on our train dataset.
2 https://www.kaggle.com/c/nlp-getting-started
3 https://appen.com/open-source-datasets/ 4 https://scikit-learn.org/stable/
Archive 2021, August 08, 2021, . Chanda. et al.

Table 1: Sample pre-processed tweets

Tweet (original) Tweet (after preprocessing)


#RockyFire Update => California Hwy. 20 closed in both rockyfire update california hwy 20 closed
directions due to Lake County fire - #CAfire #wildfires directions due lake county fire cafire wildfires
@TheAtlantic That or they might be killed in an airplane theatlantic might killed airplane accident
accident in the night a car wreck! night car wreck

Figure 2: Showing the most frequent words in training data

Table 2: Training data statistic entropy loss function. We took 1% of training data as validation
data and used the validation data to stop the training model if the
Total train data 7,613 loss value for the validation data didn’t decrease in last ten epochs.
Total positive data (or disaster tweets) 3,271 Similarly to the softmax model, we also trained our Bi-LSTM model
Total unique words 21,940 using batch gradient descent algorithm for 100 epochs to minimize
Total unique words with frequency > 1 6,816 the binary cross entropy loss function. We followed the same stop
Avg. length of tweets 12.5 rule for this model.
Median length of tweets 13
Maximum length of tweets 29 5.1.3 Deep learning models with contextual embeddings. To ob-
Minimum length of tweets 1 tain contextual embeddings, we downloaded publicly available pre-
trained BERT model (Bert-base-uncased) [7] from the official site
of the authors 6 . We gave tweets as inputs in the BERT model and
After training the model, we used the test data to get labels and took the hidden states of the [𝐶𝐿𝑆] token of the last layer from the
submit that in Kaggle to have test score. model as embeddings of the given tweets. Then, the embeddings is
used in our sigmoid model to predict sentiment of tweets. The same
5.1.2 Deep learning models with context-free embeddings. For this
setting is used in a previous paper [11] to predict patient diagnosis
experiment, we chose three context-free methods, 1) Skip-gram
from medical note words using pre-trained BERT model.
[18], 2) FastText [4], and 2) GloVe [23]. We used publicly available
Moreover, we can find embeddings of each words of a tweet
pre-trained embeddings of Skip-gram and fastText models that
from the pre-trained BERT model. The BERT’s pre-trained word
are trained on Wikipedia data 5 . The pre-trained embeddings of
embeddings are used as input to our Bi-LSTM model. The authors
FastText is collected from Mikolov et. al. 2018 [19]. The size of all
of [16] used the similar setting for the sentiment analysis of text
the pre-trained embeddings or feature is 300.
data.
The proposed softmax model is trained for 100 epochs using
a stochastic gradient algorithm to minimize the categorical cross
5 https://nlp.stanford.edu/projects/glove/ 6 https://github.com/google-research/bert
Efficacy of BERT embeddings on predicting disaster from Twitter data Archive 2021, August 08, 2021, .

5.2 Evaluation metric Table 3: Performance of different machine learning models


on disaster prediction for different types of word represen-
Three different metrics are used in our experiment to evaluate
tations or embeddings
the performance of the machine learning models on the disaster
prediction task such as, 1) accuracy, 2) F1 score, and 3) Area Under
the Curve (AUC). In our experiment, we considered disaster tweets Model Train data Test data
as ’positive class’ and others as ’negative class’. Hence, True Positive AUC F1 Acc Acc
(TP) means the actual disaster tweets that are predicted as disaster BOW embeddings
while False Positive (FP) shows the tweets that are actually false, Decision tree 0.6320 0.5896 0.6273 0.6380
but predicted as true. True Negative (TN) and False Negative (FN) Random forest 0.8313 0.7320 0.7848 0.7042
imply in the same way. The accuracy is the number of correctly Logistic regression 0.8660 0.7443 0.7927 0.7293
predicted tweets among all of the tweets and it is calculated as Context-free embeddings
follows. Skip-gram+Softmax 0.8281 0.7301 0.7769 0.7649
TP+TN FastText+Softmax 0.8336 0.7231 0.7769 0.7826
Accuracy (Acc) =
TP+FP+TN+FN GloVe+Softmax 0.8246 0.7323 0.7717 0.7827
F1 score is another popular metric to test predictive performance Skip-gram+Bi-LSTM 0.8272 0.7440 0.7808 0.7775
of a model. The F1 score is measured by the harmonic mean of FastText+Bi-LSTM 0.8327 0.7369 0.7817 0.7955
recall and precision where recall means the number of true labels GloVe+Bi-LSTM 0.8351 0.7500 0.7991 0.8093
are predicted by a model among the total number of existing true
Contextual embeddings
labels and precision means the number of true labels are predicted
by a model divided by the total number of labels are predicted by BERT+Softmax 0.8513 0.8254 0.8292 0.8250
the model. The F1 score is calculated as follows. BERT+Bi-LSTM 0.8578 0.8316 0.8351 0.8308

TP
Recall (R) =
TP+FN
TP 5.3.2 Qualitative results. Table 3 shows us quantitative results
Precision (P) =
TP+FP for the prediction of disaster tweets where the neural model with
contextual embeddings outperformed the other models. However,
2 × (P × R)
F1 score (F1) = it is difficult to understand from the result that when the contextual
(P+R)
embeddings predict a disaster tweet successfully while context-free
On the other hand, AUC tells us how much a model is capable of models fail. For this purpose, we observe the prediction results of
distinguishing between classes. The higher score of the AUC means the Bi-LSTM model for both the context-free (GloVe) and contextual
the model works better at predicting negative classes as zero and embeddings (BERT). Table 4 shows the model predictions with true
positive classes as one. labels for some sample tweets. From the table, we can find that
the predictions for GloVe embeddings for the first two tweets are
5.3 Experimental results positive, maybe because of the word, “𝑎𝑐𝑐𝑖𝑑𝑒𝑛𝑡”, in the tweets, but
5.3.1 Quantitative results. Table 3 provides the results of all the the true labels are negative for the two tweets. If we read the tweets,
machine learning models on the disaster prediction tasks for all then we can understand that the tweets are not related to disaster
three types of embeddings. The table shows results for both the or crisis. On the other hand, since the BERT model generates word
training and test data. Since the test data results are collected from embeddings based on the context words, it successfully predicts
the Kaggle competition, we only can report the accuracy score. the tweets as negative.
The table shows that the logistic regression model has the best re- The predictions for GloVe embeddings for the third and fourth
sults for the BOW embeddings among the three traditional machine tweets are false while they are true. Note that no disaster-related
learning models. However, the results of neural network models words are used in the two tweets, but the tweets described serious
for context-free embeddings are better than the traditional ma- situations. The predictions for BERT embeddings are also correct
chine learning models that used context-free embeddings as inputs. for the third and fourth tweets. The predictions of GloVe and BERT
Among the three context-free embeddings (Skip-gram, FastText, embeddings for the fifth and sixth tweets of Table 3 are correct.
GloVe), the GloVe with Bi-LSTM model has the best train and test Since there are some disaster-related words (i.e., suicide, bomber,
score for all the three evaluation metrics. Note that the results also bombing) in the tweets, both models successfully labeled them.
show us that deep learning model like Bi-LSTM has better results After analyzing the results of Table 4, it can be implied that the
than the shallow neural network model such as the softmax model. context-free embeddings are helpful to predict a tweet as a disaster
Moreover, when we used the same shallow neural network and if disaster-related words (i.e., accident, bomb) exist in the tweet. In
deep learning models for contextual embeddings such as BERT; we contrast, contextual embeddings help to understand the context of
found that there are 2% improvements on AUC and Acc over the a tweet that is challenging and important for the sentiment analysis
context-free embeddings. It means that contextual embeddings are task. Although every tweet has a short length text, contextual
helpful and have the best performance for the disaster prediction embeddings works efficiently to understand the sentiment of a
task. tweet.
Archive 2021, August 08, 2021, . Chanda. et al.

Table 4: Showing sentiment predictions of Bi-LSTM model for pre-trained GloVe and BERT embeddings

Sample tweets Prediction True


GloVe BERT label
1 I swear someone needs to take it away from me, cuase I’m just accident prone. Yes No No
2 @Dave if I say that I met her by accident this week- would you be
super jelly Dave? :p Yes No No
3 Schoolgirl attacked in Seaton Delaval park by ’pack of animals’ No Yes Yes
4 Not sure how these fire-workers rush into burning buildings
but I’m grateful they do. #TrueHeroes No Yes Yes
5 A suicide bomber has blown himself up at a mosque in the south Yes Yes Yes
6 Bombing of Hiroshima 1945 Yes Yes Yes

6 CONCLUSION [16] Zhibin Lu, Pan Du, and Jian-Yun Nie. 2020. VGCN-BERT: augmenting BERT with
graph embedding for text classification. Advances in Information Retrieval 12035
In this paper, we described an extensive analysis for predicting (2020), 369.
disaster from Twitter data using different types of word embeddings. [17] Monica Mai, Carson K Leung, Justin MC Choi, and Long Kei Ronnie Kwan. 2020.
Big data analytics of Twitter data and its application for physician assistants: who
Our experimental results show that contextual embeddings have the is talking about your profession in Twitter? In Data Management and Analysis.
best result for predicting disaster from tweets. We also showed that Springer, 17–32.
deep neural network models outperformed traditional machine [18] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient
Estimation of Word Representations in Vector Space. CoRR abs/1301.3781 (2013).
learning methods in the disaster prediction task. Advance deep arXiv:1301.3781 http://arxiv.org/abs/1301.3781
neural network models such as multi-layer convolutional models [19] Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, and Ar-
can also be used for this prediction task to achieve higher accuracy. mand Joulin. 2018. Advances in Pre-Training Distributed Word Representations.
In Proceedings of the International Conference on Language Resources and Evalua-
tion (LREC 2018).
[20] Thien Hai Nguyen, Kiyoaki Shirai, and Julien Velcin. 2015. Sentiment analysis
REFERENCES on social media for stock movement prediction. Expert Systems with Applications
[1] Siddu P Algur and S Venugopal. 2021. Classification of Disaster Specific Tweets-A 42, 24 (2015), 9603–9611.
Hybrid Approach. In 2021 8th International Conference on Computing for Sustain- [21] Alexandra Olteanu, Carlos Castillo, Fernando Diaz, and Sarah Vieweg. 2014.
able Global Development (INDIACom). IEEE, 774–777. Crisislex: A lexicon for collecting and filtering microblogged communications in
[2] Issa Alsmadi and Keng Hoon Gan. 2019. Review of short-text classification. crises. In Eighth international AAAI conference on weblogs and social media.
International Journal of Web Information Systems (2019). [22] Girish Keshav Palshikar, Manoj Apte, and Deepak Pandita. 2018. Weakly su-
[3] Zahra Ashktorab, Christopher Brown, Manojit Nandi, and Aron Culotta. 2014. pervised and online learning of word models for classification to detect disaster
Tweedr: Mining twitter to inform disaster response.. In ISCRAM. Citeseer, 269– reporting tweets. Information Systems Frontiers 20, 5 (2018), 949–959.
272. [23] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove:
[4] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. En- Global vectors for word representation. In Proceedings of the 2014 conference on
riching word vectors with subword information. arXiv preprint arXiv:1607.04606 empirical methods in natural language processing (EMNLP). 1532–1543.
(2016). [24] A Poornima and K Sathiya Priya. 2020. A comparative sentiment analysis of
[5] Mengen Chen, Xiaoming Jin, and Dou Shen. 2011. Short text classification sentence embedding using machine learning techniques. In 2020 6th International
improved by learning multi-granularity topics. In Twenty-second international Conference on Advanced Computing and Communication Systems (ICACCS). IEEE,
joint conference on artificial intelligence. Citeseer. 493–496.
[6] B Oscar Deho, A William Agangiba, L Felix Aryeh, and A Jeffery Ansah. 2018. Sen- [25] Marco Pota, Mirko Ventura, Hamido Fujita, and Massimo Esposito. 2021. Multi-
timent analysis with word embedding. In 2018 IEEE 7th International Conference lingual evaluation of pre-processing for BERT-based sentiment analysis of tweets.
on Adaptive Science & Technology (ICAST). IEEE, 1–4. Expert Systems with Applications 181 (2021), 115119.
[7] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: [26] Alan Ritter, Evan Wright, William Casey, and Tom Mitchell. 2015. Weakly
Pre-training of deep bidirectional transformers for language understanding. arXiv supervised extraction of computer security events from twitter. In Proceedings of
preprint arXiv:1810.04805 (2018). the 24th international conference on world wide web. 896–905.
[8] Kai Hakala and Sampo Pyysalo. 2019. Biomedical named entity recognition with [27] Jyoti Prakash Singh, Yogesh K Dwivedi, Nripendra P Rana, Abhinav Kumar, and
multilingual BERT. In Proceedings of The 5th Workshop on BioNLP Open Shared Kawaljeet Kaur Kapoor. 2019. Event classification and location prediction from
Tasks. 56–61. tweets during disasters. Annals of Operations Research 283, 1 (2019), 737–757.
[9] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural [28] Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to fine-tune bert
computation 9, 8 (1997), 1735–1780. for text classification?. In China National Conference on Chinese Computational
[10] Vladimir Iglovikov, Sergey Mushinskiy, and Vladimir Osin. 2017. Satellite imagery Linguistics. Springer, 194–206.
feature detection using deep convolutional neural network: A kaggle competition. [29] Alexey Tolkachev, Ilyas Sirazitdinov, Maksym Kholiavchenko, Tamerlan
arXiv preprint arXiv:1706.06169 (2017). Mustafaev, and Bulat Ibragimov. 2020. Deep Learning for Diagnosis and Segmen-
[11] Shaoxiong Ji, Matti Hölttä, and Pekka Marttinen. 2021. Does the Magic of tation of Pneumothorax: The Results on The Kaggle Competition and Validation
BERT Apply to Medical Code Assignment? A Quantitative Study. arXiv preprint Against Radiologists. IEEE Journal of Biomedical and Health Informatics (2020).
arXiv:2103.06511 (2021). [30] Xulei Yang and Jie Ding. 2020. A computational framework for iceberg and
[12] Amir Karami, Vishal Shah, Reza Vaezi, and Amit Bansal. 2020. Twitter speaks: A ship discrimination: Case study on Kaggle competition. IEEE Access 8 (2020),
case of national disaster situational awareness. Journal of Information Science 46, 82320–82327.
3 (2020), 313–324. [31] Xulei Yang, Zeng Zeng, Sin G Teo, Li Wang, Vijay Chandrasekhar, and Steven
[13] Tom Kenter, Alexey Borisov, and Maarten De Rijke. 2016. Siamese cbow: Hoi. 2018. Deep learning for practical image recognition: Case study on kaggle
Optimizing word embeddings for sentence representations. arXiv preprint competitions. In Proceedings of the 24th ACM SIGKDD International Conference
arXiv:1606.04640 (2016). on Knowledge Discovery & Data Mining. 923–931.
[14] Athanasia Koumpouri, Iosif Mporas, and Vasileios Megalooikonomou. 2015. Eval- [32] SoYeop Yoo, JeIn Song, and OkRan Jeong. 2018. Social media contents based
uation of Four Approaches for" Sentiment Analysis on Movie Reviews" The sentiment analysis and prediction system. Expert Systems with Applications 105
Kaggle Competition. In Proceedings of the 16th International Conference on Engi- (2018), 102–111.
neering Applications of Neural Networks (INNS). 1–5. [33] Lei Zou, Nina SN Lam, Heng Cai, and Yi Qiang. 2018. Mining Twitter data for
[15] Yang Liu and Mirella Lapata. 2019. Text summarization with pretrained encoders. improved understanding of disaster resilience. Annals of the American Association
arXiv preprint arXiv:1908.08345 (2019). of Geographers 108, 5 (2018), 1422–1441.

You might also like