This article has been accepted for publication in a future issue of this journal, but has not been

fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3012595, IEEE Access


COVID-19 Sensing: Negative Sentiment Analysis on Social Media in China via BERT Model
TIANYI WANG¹, KE LU², KAM PUI CHOW³, AND QING ZHU⁴
¹ Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong, China (e-mail: [email protected])
² Department of Social Work and Social Administration, The University of Hong Kong, Pokfulam, Hong Kong, China (e-mail: [email protected])
³ Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong, China (e-mail: [email protected])
⁴ Department of Cardiology, Qilu Hospital of Shandong University, 107 West Wenhua Road, Jinan, 250012, PR China (e-mail: [email protected])
Corresponding authors: Kam Pui Chow (e-mail: [email protected]), Qing Zhu (e-mail: [email protected]).
This work received no financial support.

ABSTRACT Coronavirus disease 2019 (COVID-19) poses massive challenges for the world. Analyzing public
sentiment during the outbreak provides useful information for making appropriate public health
responses. On Sina Weibo^a, a popular Chinese social media platform, posts with negative sentiment are
valuable for analyzing public concerns. We analyze 999,978 randomly selected COVID-19 related Weibo posts
from 1 January 2020 to 18 February 2020. Specifically, the BERT (Bidirectional Encoder Representations
from Transformers) model, which is pre-trained without supervision, is fine-tuned to classify sentiment
categories (positive, neutral, and negative), and the TF-IDF (term frequency-inverse document frequency)
model is used to summarize the topics of posts. Trend analysis and thematic analysis are conducted to
identify characteristics of negative sentiment. Overall, the fine-tuned BERT classifies sentiment with
considerable accuracy, and the topics extracted by TF-IDF convey the characteristics of posts regarding
COVID-19. We observe that people are concerned about four aspects of COVID-19: the virus Origin (Gamey
Food, 3.08%; Bat, 2.70%; Conspiracy Theory, 1.43%), Symptom (Fever, 2.13%; Cough, 1.19%), Production
Activity (Go to Work, 1.94%; Resume Work, 1.12%; School New Semester Beginning, 1.06%) and Public Health
Control (Temperature Taking, 1.39%; Coronavirus Cover-up, 1.26%; City Shutdown, 1.09%). The results from
Weibo posts offer constructive guidance for public health responses: transparent information sharing and
scientific guidance might help alleviate public concerns.
^a https://www.weibo.com/

INDEX TERMS COVID-19 Sensing, Public Health, Sentiment Classification, Social Media in China.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

I. INTRODUCTION
Coronavirus Disease 2019 (COVID-19), caused by a new coronavirus with higher reproductivity than SARS [1], first emerged in the People’s Republic of China in December 2019 [2]. Early outbreak data grew rapidly at an exponential rate [3], and human-to-human transmission also occurred [4], [5], which brought severe challenges to China and the whole world. Soon, vicarious traumatization caused by COVID-19 was found spreading among members of the medical teams aiding COVID-19 control and among the general public [6]. Social media has been found to be a key platform for the public to gather information and engage in social learning to manage uncertainty and risks during a public health crisis. Gui et al. [7] investigated public concerns about the Zika virus crisis and reported mechanisms of personal risk assessment and travel-related decision making during the crisis. Meanwhile, social media have been widely used by public health professionals for epidemiological monitoring and for understanding public reactions to urgent public health issues. Pei et al. [8] developed methods to detect the intensity of social reaction with a word-to-vector technique and context analysis. Tibebu et al. [9] analyzed real-time information on Twitter about opioid use and perceptions in Canada, which facilitated public health practice and the response to the opioid crisis.
To explain and predict public emotional responses, especially the sentiment of distress and grief towards COVID-19, we analyze 999,978 microblogging posts from January 1, 2020 to February 18, 2020 on Sina Weibo (Weibo for short), one of the most popular social media platforms in China, with 550 million monthly active users in the first quarter of 2020. Weibo features instant messaging, transparent sharing and public accessibility. In this work, a deep Natural Language Processing (NLP) model and a topic modelling method are utilized. Specifically, we fine-tune BERT to classify posts into three sentiment categories (positive, neutral and negative), achieving an accuracy of 75.65%, which surpasses many NLP baseline algorithms. The number of posts on each date is analyzed based on sentiment classification. Thereafter, the TF-IDF model is adopted to extract central topics of posts. As the public sentiment on social media reflects people’s psychological well-being and the spread of posts with negative sentiment may lead to social disruption and challenges for infection prevention [10], we analyze 11 dominant and distinctive topics extracted from Weibo posts with negative sentiment and investigate the trends of sentiment development and the underlying major themes to understand public concerns.
Outbreaks are now taking place in many countries around the world, especially in Europe and North America [11]. For instance, as of 25 July 2020, more than 4 million people have been affected in the U.S. Under this circumstance, the main contributions of this study include:
• We fine-tune the BERT model for sentiment classification on Chinese Weibo posts about COVID-19 and achieve considerable accuracy that beats all the baseline NLP algorithms we tested.
• The study demonstrates how public sentiment on social media evolves as COVID-19 spreads.
• We extract representative topics and discuss the dominant discourse of public distress about COVID-19 caused by related social events. Findings of this study could assist governments worldwide in making efficient and effective public health protection decisions.

FIGURE 1. Workflow of sentiment analysis model.

II. RELATED WORK
Coronavirus Disease 2019 (COVID-19) is a newly emerged disease, and little related research had been published at the time of our study. However, there have been some studies on text sentiment classification, which relate to our work to some extent. Ye et al. [12] applied the SVM [13] machine learning model to Chinese product reviews for sentiment (positive or negative) classification and achieved better performance than the classical Semantic Orientation approach. Narayanan et al. [14] built a fast sentiment classifier using Naïve Bayes, which achieved 88.80% accuracy on the popular IMDB movie reviews dataset. In recent years, researchers have increasingly used deep neural network techniques for sentiment classification. Ren et al. [15] enhanced word representation with character embeddings and mainly applied CNN for context-sensitive sentiment classification of Twitter content. Tang et al. [16] proposed document-level sentiment classification using a combination of LSTM/CNN embedding and gated RNN.

III. METHODOLOGY
Our sentiment analysis model consists of two parts, as shown in the workflow of Fig. 1. In particular, we first use fine-tuned BERT [17] to classify the sentiment of Weibo posts into positive, neutral and negative categories and analyze the trends of posts. Then we apply the TF-IDF [18] algorithm to extract topics of posts with different sentiment. Specifically, 11 topics for negative posts are identified, and we then analyze the underlying patterns.
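
The two-stage workflow can be illustrated with a minimal Python sketch. It is not the authors' implementation: `classify` is a placeholder for the fine-tuned BERT classifier, `tokenize` stands in for a Chinese word segmenter such as jieba, and all function names are hypothetical. The scoring follows the log-scaled TF-IDF form given in (3) of Section III-B, computed here directly rather than through jieba's built-in function, and the toy posts are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tf_idf_top_terms(docs, k=5):
    """For each tokenized document, return the k tokens with the highest TF-IDF.

    tf = log(1 + freq), idf = log(N / document frequency), as in (3).
    Ties are broken alphabetically so the output is deterministic.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each token once per document

    results = []
    for tokens in docs:
        counts = Counter(tokens)
        scores = {
            t: math.log(1 + c) * math.log(n_docs / doc_freq[t])
            for t, c in counts.items()
        }
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


def run_pipeline(posts, classify, tokenize, k=5):
    """Stage 1: label every post. Stage 2: tally top TF-IDF tokens per label."""
    labels = [classify(p) for p in posts]
    docs = [tokenize(p) for p in posts]
    top_terms = tf_idf_top_terms(docs, k)

    topics_by_label = defaultdict(Counter)
    for label, terms in zip(labels, top_terms):
        topics_by_label[label].update(terms)
    return topics_by_label


def toy_classify(post):
    # Hypothetical keyword rule standing in for the fine-tuned BERT model.
    if "thanks" in post:
        return "positive"
    if "worried" in post or "fever" in post:
        return "negative"
    return "neutral"


posts = [
    "thanks doctors thanks nurses",
    "fever again so worried",
    "worried about work resumption",
    "city reports new cases",
]
topics = run_pipeline(posts, toy_classify, str.split, k=2)
for label, counter in topics.items():
    print(label, dict(counter))
```

Keeping the classifier and the tokenizer as injected callables separates the two stages, so either one can be replaced (for example by a real BERT model or by jieba) without touching the topic-counting logic.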

A. SENTIMENT CLASSIFICATION AND DESCRIPTIVE ANALYSIS
Bidirectional Encoder Representations from Transformers (BERT), a neural network-based technique for natural language processing pre-training, has been widely applied in sentiment analysis [19]. The BERT model can be fine-tuned with proper input and output layers to create state-of-the-art models in a wide range of text analysis tasks [17]. The core of BERT is the adoption of the transformer technique [20], which applies the encoder-decoder model [21] and the attention mechanism [22], [23], [24] to NLP tasks. In particular, the attention mechanism allows the model to focus on the relevant parts of the input sequence as needed when the input sequence is too long for typical NLP models to memorize all input features. In our work, the attention mechanism in the BERT model helps determine the relevance weight of each word token in the input Weibo posts and generate corresponding hidden states that best describe the characteristics of the different sentiment of Weibo posts during the training process by (1), where Q, K, and V are the matrices of query, key and value vectors created from the input embedding, and d_k is the dimension of the key vectors.

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V \qquad (1)$$

After sentiment classification, the percentage of each sentiment category is calculated. In a public health crisis, people’s responses evolve over time [25]. To gain insight into people’s reactions, Weibo posts with positive, neutral and negative sentiment on each day from 1 January 2020 to 18 February 2020 are compared. Furthermore, in order to identify dates with rapid changes in the number of posts in the different sentiment categories, the increasing rate is calculated by (2) and further analyzed according to critical events that happened on the corresponding dates.

$$\text{increasing rate} = \frac{\text{number of posts on a day} - \text{number of posts on the day before}}{\text{number of posts on the day before}} \qquad (2)$$

B. TOPIC EXTRACTION AND THEMATIC ANALYSIS
Term frequency-inverse document frequency (TF-IDF) is a numerical statistic reflecting how important a word is to a document in a collection or corpus [26]. The TF-IDF method catches words that occur frequently by calculating term frequency, and uses inverse document frequency to avoid treating insignificant words that occur in every document as important. Da Silva and Lopes [27] used TF-IDF to find the most informative Relevant Expression in each document in the corpus of their research. We apply the trained automatic sentiment classification model to the unlabeled Weibo posts, and apply the built-in TF-IDF function in jieba [28], a Chinese word segmentation tool that operates based on its large pre-trained corpus, to posts in each labeled sentiment class for topic extraction by (3) for word t in a Weibo post d from the entire dataset D of 999,978 Weibo posts, where N is the number of posts in D.

$$\begin{cases} \mathrm{tf}(t, d) = \log(1 + \mathrm{freq}(t, d)) \\ \mathrm{idf}(t, D) = \log\left(\dfrac{N}{\mathrm{count}(d \in D : t \in d)}\right) \\ \mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \times \mathrm{idf}(t, D) \end{cases} \qquad (3)$$

In detail, we treat each Weibo post as a document and each segmented Chinese token as a potential topic. Because most Weibo posts are not long (the length distribution, excluding an outlier at 884, is shown in Fig. 2), the topics with the top 5 TF-IDF scores in each post are extracted and analyzed. Topics in each sentiment category represent public focus and concerns regarding COVID-19. Further analysis based on the extracted topics is performed.

FIGURE 2. Post lengths of the whole Weibo post dataset. The x-axis represents the post lengths, and the y-axis represents the number of posts of each length.

Thematic analysis is a common tool to understand the perceptions and reasons behind people’s posts with negative sentiment [29]. In our case, topics appearing in more than 1% of total posts in each category of sentiment (positive, neutral and negative) are collected. Five topics are excluded, namely Pneumonia (肺炎), Outbreak (疫情), Virus (病毒), Coronavirus (冠状病毒), and COVID-19 (新冠), because they directly denote COVID-19 and provide no valuable information for interpreting public sentiment. As a result, there are 38 key topics for positive sentiment, 19 for neutral sentiment, and 19 for negative sentiment. The 19 key topics for negative sentiment are compared with those of neutral and positive sentiment. Among them, 11 topics are found to be distinctive for negative sentiment and 8 topics are shared with posts of neutral or positive sentiment; the latter are Masks (口罩), Wuhan (武汉), Definite Diagnosis (确诊), Doctor (医生), Case (病例), Infection (感染), Quarantine (隔离), and Hospital (医院). In addition, by exploring the semantics of the topics, we divide the 11 negative-distinctive topics into four themes for further analysis: Origin, Symptom, Production Activity, and Public Health Control. Moreover, we analyze the frequency of the 11 topics from 1 January 2020 to 18 February 2020 to visualize their trends.

IV. EXPERIMENTS AND RESULTS ANALYSIS

TABLE 1. Performance Measures of Sentiment Classification Model
Sentiment   Precision   Recall   F1-score
Positive    0.6990      0.7477   0.7225
Neutral     0.7975      0.7797   0.7885
Negative    0.6434      0.6246   0.6339
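
The per-class F1-scores in Table 1 can be cross-checked from the listed precision and recall using the harmonic-mean definition in (4). The sketch below is an illustration, not the authors' evaluation code, and its function names are hypothetical. Because the paper does not list the class weights w, the weighted average uses the sentiment shares reported for the full 999,978-post dataset (27.4% positive, 56.2% neutral, 16.4% negative) purely as stand-in weights, so its result (about 0.745) is only expected to be close to, not identical to, the reported 0.7458.

```python
def f1_score(precision, recall):
    """F1 as the harmonic mean of precision and recall, as defined in (4)."""
    return 2.0 / (1.0 / recall + 1.0 / precision)


def weighted_f1(f1_by_class, weights):
    """Weighted sum of per-class F1-scores; weights are assumed to sum to 1."""
    return sum(weights[name] * f1_by_class[name] for name in f1_by_class)


# (precision, recall, reported F1) copied from Table 1.
table1 = {
    "positive": (0.6990, 0.7477, 0.7225),
    "neutral": (0.7975, 0.7797, 0.7885),
    "negative": (0.6434, 0.6246, 0.6339),
}

computed = {}
for name, (precision, recall, reported) in table1.items():
    computed[name] = f1_score(precision, recall)
    print(f"{name}: computed F1 = {computed[name]:.4f}, reported = {reported:.4f}")

# ASSUMPTION: stand-in weights taken from the predicted class shares of the
# full dataset; the weights actually used in the paper are not stated.
stand_in_weights = {"positive": 0.274, "neutral": 0.562, "negative": 0.164}
approx_weighted = weighted_f1(
    {name: values[2] for name, values in table1.items()}, stand_in_weights
)
print(f"weighted F1 with stand-in weights = {approx_weighted:.4f}")
```

The recomputed per-class values agree with Table 1 to four decimal places, which also confirms that the table entries are fractions in [0, 1] rather than percentages.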
A. DATASET
Based on a list of 230 COVID-19 related key phrases, 2.4 million Weibo posts from 1 January 2020 to 19 February 2020 (posts on 19 February are incomplete and thus excluded from our final sample) were crawled by the organizer of CCIR 2020 (26th China Conference on Information Retrieval) [30]. The crawler mainly uses SciPy and Beautiful Soup techniques, and duplicates and reposts are deleted to construct the Weibo posts dataset. The dataset includes posts by around 640 thousand users, with user location information excluded.
Manual sentiment labelling (positive, neutral and negative) was carried out by the CCIR 2020 organizer. In particular, open source Chinese sentiment analysis tools were adopted for preprocessing, and 12 volunteers were invited to manually label 120 thousand randomly selected Weibo posts from the dataset based on the preprocessing results and human judgment of the sentiment of the Weibo content. In detail, the label of each Weibo post was decided by majority vote among 3 volunteers who did not know each other.
As a result, one million randomly selected posts from the whole dataset are shared with the public, and 10% of them are manually labeled with three sentiment categories (positive, neutral and negative) by the CCIR 2020 organizer.

B. SENTIMENT CLASSIFICATION
Manually labeled Weibo posts are randomly split into training and testing sets with a ratio of 5 to 5. In our experiment, we fine-tuned the Chinese BERT-BASE model with 12 layers and hidden dimension 768. Following the hyperparameter suggestions of the original BERT paper [17], we used 4 epochs, a learning rate of 2e-5 and a batch size of 32, and applied a softmax output layer to train a three-category (positive, neutral and negative) sentiment classifier using the training dataset.
After training, the sentiment classification model achieves 75.65% accuracy on the testing set. F1-scores on the testing set, along with precision and recall for each sentiment category, are summarized in Table 1. An overall weighted F1-score of 0.7458 for the classification model is obtained from the data in Table 1 by (4), where w is the weight of each sentiment category. The sentiment classification is then performed on the remaining unlabeled Weibo posts.

$$\begin{cases} F_1 = \dfrac{2}{\mathrm{recall}^{-1} + \mathrm{precision}^{-1}} \\ \text{Weighted } F_1 = w_{\text{positive}} \times F_{1,\text{positive}} + w_{\text{neutral}} \times F_{1,\text{neutral}} + w_{\text{negative}} \times F_{1,\text{negative}} \end{cases} \qquad (4)$$

We performed comparative tests on the same dataset using different baseline sentiment classification algorithms with the same data split ratios and random state values. Results are presented in Table 2.

TABLE 2. Comparative Test on NLP Baseline Algorithms
System                  Accuracy(%)
Fine-tuned BERT-BASE    75.65
SVM                     70.66
Naïve Bayes             66.97
Logistic Regression     70.02
CNN                     71.19
LSTM                    57.73

For the 999,978 posts, most (56.2%) are neutral, and there are more posts with positive sentiment (27.4%) than with negative sentiment (16.4%). Examples of posts in the different sentiment categories are given in Table 3. Specifically, posts expressing gratitude are labelled as positive, while those about fear of COVID-19 or life arrangements affected by COVID-19 are categorized as negative. Neutral posts include the mere circulation of public information or plain descriptions of some life changes.

TABLE 3. Examples of Posts With Different Sentiment Categories
Sentiment   Post
Positive    #致敬白衣战士#辛苦了,感谢你们
            #Salute to the warriors wearing white clothes (the doctor)# Thanks for your hard work.
Positive    中国加油,武汉加油,我为我的家乡加油
            Stay strong China, stay strong Wuhan, I wish all the best for my hometown.
Neutral     世卫组织正式命名新冠肺炎为COVID-19
            WHO officially names the novel coronavirus pneumonia COVID-19.
Neutral     我屋的窗户对着小区里的花园步道,还算空旷,平时经常有人跑步散步,今早正式通告了我家小区的确诊病例之后,一夜之间外面连个人影都很难看到了......?
            The view of my bedroom window is the often-spacious garden walkway of the neighborhood. There used to be people running or walking. However, after a confirmed case was announced in our community, it’s hard to see even a single person.
Negative    好不容易要出院了然后我现在发烧38.5这是怎么了怎么了?
            Finally getting out of hospital and now I’m having a fever of 38.5°C. What’s wrong with me, what’s wrong with me?
Negative    过于无聊,复工时间又推后了,真担心哪天公司没了
            So boring. Work resumption has been postponed again. I’m so worried the company will be gone one day.

From the upper part of Fig. 3, it is clear that COVID-19 related posts increase over time and posts in the different sentiment categories increase accordingly. Before 19 January, the number of posts about COVID-19 is quite stable. On 20 January, there is an increase of 25.81% and 33.03% in negative posts and total posts, respectively, and a decrease of 16.30% in positive posts as compared to 19 January. Regarding the cumulative confirmed cases of COVID-19 in the lower part of Fig. 3, it
is clear that the number of posts on each day starts to grow from 20 January, in line with the increase in cumulative confirmed cases. The number remains relatively stable from early February during the early outbreak.
COVID-19 was first noticed in China in December 2019 [1], but the number of Weibo posts remains stable and relatively low during the early period. On 20 January 2020, there is a surge in total Weibo posts and in posts with negative sentiment, and the number of posts stays at that level thereafter. One important event might be related to this: on the night of 20 January, Dr. Zhong Nanshan confirmed the human-to-human transmission of COVID-19 on China Central Television [31]. This surge of Weibo posts indicates the tremendous influence of information revealed by government-related media, and it is therefore important for the government to handle public depression about COVID-19 with discretion.

FIGURE 3. Posts with three different sentiment categories with respect to the dates (upper graph) and number of accumulative confirmed cases with respect to the dates (lower graph).

C. NEGATIVE SENTIMENT ANALYSIS
As listed in Table 4, the 11 key topics of Weibo posts with negative sentiment fall into four themes: Origin, Symptom, Production Activity, and Public Health Control. The number of posts in each theme for each day from 1 January 2020 to 18 February 2020 is plotted in Figs. 4-7 for further analysis.

TABLE 4. 11 Key Topics in Weibo Posts with Negative Sentiment Along with the Corresponding Themes and Frequency of Occurrences in Weibo Posts
Key Topics                               Themes                  Percentage(%)
野味 (Gamey Food)                        Origin                  3.08
蝙蝠 (Bat)                               Origin                  2.70
阴谋论 (Conspiracy Theory)               Origin                  1.43
发烧 (Fever)                             Symptom                 2.13
咳嗽 (Cough)                             Symptom                 1.19
上班 (Go to Work)                        Production Activity     1.94
复工 (Resume Work)                       Production Activity     1.12
开学 (School New Semester Beginning)     Production Activity     1.06
测体温 (Temperature Taking)              Public Health Control   1.39
瞒报 (Coronavirus Cover-up)              Public Health Control   1.26
封城 (City Shutdown)                     Public Health Control   1.09
The definitions of the four themes in this work are as follows:
Origin: Discussions about where COVID-19 comes from.
Symptom: Change of body or mind indicating the possible infection of COVID-19.
Production Activity: Work life arrangements influenced by the outbreak of COVID-19.
Public Health Control: Measures to stop the spread of COVID-19.

There are three topics about the Origin of COVID-19, as shown in Fig. 4. In particular, “Gamey Food” (3.08%) and “Bat” (2.70%) are two primary assumptions for the origins of COVID-19 [5], [32]. They remain at low frequency in early January. However, there is a sharp increase in posts about “Gamey Food” on 20 January 2020 and about “Bat” on 22 January 2020; both then keep a high frequency thereafter and reach the highest number of posts with negative sentiment per day among the 11 topics. The “Conspiracy Theory” (1.43%), which suggests that COVID-19 does not have a natural origin, is condemned by scientists [33] but widespread on social media. However, it remains at a relatively low frequency in comparison with “Gamey Food” and “Bat”. The discussions of the origins of the virus are deeply correlated with negative sentiment and trigger the largest number of posts with negative sentiment, as demonstrated in Fig. 4. However, as discussed, rumors and unconfirmed information may overwhelm the discussion [33]. Consequently, it is important for the government to report transparently on the progress of the investigation of origins.

FIGURE 4. Number of posts related to the Origin with respect to the dates.

“Fever” (2.13%) and “Cough” (1.19%) are identified as the representative symptoms of COVID-19 [34]. For posts about Symptom, as shown in Fig. 5, posts with “Fever” as topic outnumber those with “Cough”, and the gap is quite large in early January but declines in February as COVID-19 spreads. Symptoms of COVID-19 might lead to negative sentiment but could also be beneficial for self-detection of infection. Therefore, typical symptoms of the disease should be revealed to the public clearly and in a timely manner, which would benefit early detection of the disease.

FIGURE 5. Number of posts related to the Symptom with respect to the dates.

Production Activity, as pictured in Fig. 6, summarizes topics about work-life arrangements under the threat of COVID-19. “Go to Work” (1.94%) and “Resume Work” (1.12%) portray the public concern over work, and “School New Semester Beginning” (1.06%) indicates people’s worries about students going back to school. The concern about “Go to Work” starts from early January and remains relatively high, while the concern about “School New Semester Beginning” starts to grow from 20 January 2020 and the worries about “Resume Work” start to grow from 26 January 2020. The arrangements for production activity might be an important driving force for the general depression. The peak of concern for “School New Semester Beginning” in Fig. 6 comes earlier than that for “Resume Work”; thus earlier arrangements are necessary to ease people’s tension.

FIGURE 6. Number of posts related to the Production Activity with respect to the dates.

Three topics about Public Health Control (Fig. 7) depict different aspects. “Temperature Taking” (1.39%) is an important method for COVID-19 diagnosis. “Coronavirus Cover-up” (1.26%) is the misbehavior or fault in public health control which might lead to risk of more infections [35]. “City Shutdown” (1.09%) describes the strict public health control taken by the Chinese government [36]. There are almost no negative posts about Public Health Control in early January. Discussion peaks first for “City Shutdown” on January 23, second for “Coronavirus Cover-up” on January 26, and third for “Temperature Taking” on January 27, and the three topics reach the same level of attention around 18 February 2020. This trend demonstrates the focus on different control measures as the situation develops. Public health control and opacity of information may thus also contribute to people’s depression. Therefore, the public health control measures in force at a given time should be explained to gain public acceptance.

FIGURE 7. Number of posts related to the Public Health Control with respect to the dates.

V. CONCLUSION
The spread of COVID-19 has become a worldwide pandemic [37]. Public health concerns relate not only to infection prevention but also to the psychological status of people experiencing the disaster [38]. Therefore, analyzing posts with negative sentiment from social media could contribute to understanding the experiences of the Chinese general public during the outbreak of COVID-19 and offer examples for other countries. Our analyses provide insights into the evolution of social sentiment over time and the topic themes connected to negative sentiment in Weibo posts. Fig. 3 illustrates the clear outbreak dates for public attention to COVID-19. Moreover, concerns about Origin, Symptom, Production Activity, and Public Health Control are deeply intertwined with public sentiment.
This study collects data on social media from the early stage of COVID-19 transmission in China. Based on the data analysis and discussion, several advantages emerge. First, the state-of-the-art fine-tuned BERT classification model and the TF-IDF topic extraction model deliver results with considerable accuracy. Second, the approach can be implemented as an online platform for real-time monitoring of public sentiment during other crises in the future. Third, this study reveals important topic themes which are deeply connected to the sentiment of depression. As COVID-19 keeps spreading worldwide, insights from this study may contribute to public administration and the prevention of social disruption.
Despite the informative results found in this study, further improvements are expected on the classification model to achieve a higher accuracy. Furthermore, only information on Sina Weibo is used in this study, which may lead to bias
by neglecting posts on other social media platforms. Finally, [13] Vapnik Vladimir Naumovic., “SVM AND LOGISTIC REGRESSION,” in
in order to focus on the centrality of topics, only topics The nature of statistical learning theory, New York, MI: Springer, 2000,
pp. 156–163.
appearing in more than 1% of total posts are selected in [14] V. Narayanan, I. Arora, and A. Bhatia, “Fast and Accurate Sentiment Clas-
each category of sentiment. This may lead to the overlook sification Using an Enhanced Naive Bayes Model,” presented at Intelligent
of important topics with less percentage. Future studies by Data Engineering and Automated Learning - IDEAL 2013 Lecture Notes
in Computer Science, pp. 194–201, 2013.
incorporating information in empirical data from different [15] Y. Ren, Y. Zhang, M. Zhang, and D. Ji, "Context-sensitive twitter senti-
social media platforms and different countries may contribute ment classification using neural network," in Proceedings of the Thirtieth
to a more solid conclusion. AAAI Conference on Artifical Intelligence, pp. 215–221, 2016.
[16] D. Tang, B. Qin, and T. Liu, “Document Modeling with Gated Recurrent
New outbreaks are taking places in many other countries Neural Network for Sentiment Classification,” in Proceedings of the 2015
all around the world. The sentiment classification model and Conference on Empirical Methods in Natural Language Processing, 2015.
DOI: 10.18653/v1/D15-1167.
findings of this study would provide constructive instructions
[17] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training
for governments worldwide on making efficient and effective of Deep Bidirectional Transformers for Language Understanding,” in
public health protection decisions. Proceedings of NAACL-HLT 2019, pp. 4171–4186.
[18] C. Sammut and G. I. Webb, “TF-IDF,” in Encyclopedia of Machine
Learning, Boston, MA: Springer, 2011, pp. 986–987.
ACKNOWLEDGMENT [19] Z. Gao, A. Feng, X. Song, and X. Wu, “Target-Dependent Sentiment
Classification With BERT,” IEEE Access, vol. 7, pp. 154290–154299,
The authors thank Jing Ma from Shandong University for 2019.
cogent advice on data analysis viewpoint and format issues. [20] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L.
Kaiser, and I. Polosukhin, “Attention is all you need,” presented at Neural
Information Processing Systems (NIPS), pp. 6000–6010, 2017.
REFERENCES [21] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to Sequence Learning
[1] Y. Liu, A. A. Gayle, A. Wilder-Smith, and J. Rocklov, “The reproductive with Neural Networks,” in Proceedings of the 27th International Confer-
number of COVID-19 is higher compared to SARS coronavirus,” J Travel ence on Neural Information Processing Systems, vol. 2, pp.3104–3112.
Med, vol. 27, no. 2, Mar 13, 2020, DOI: 10.1093/jtm/taaa021. [22] M.-T. Luong, H. Pham, and C. D. Manning, “Effective Approaches
[2] “Novel Coronavirus (2019-nCoV) SITUATION REPORT – 1,” World to Attention-based Neural Machine Translation,” in Proceedings of the
Health Organization. Accessed: Mar. 22, 2020. [Online]. Avail- 2015 Conference on Empirical Methods in Natural Language Processing,
able: https://www.who.int/docs/default-source/coronaviruse/situation-re- pp.1412–1421.
ports/20200121-sitrep-1-2019-ncov.pdf?sfvrsn=20a99c10_4 [23] D. D. Bahdanau, K. Cho, and Y. Bengio, “Neural Machine Translation by
Jointly Learning to Align and Translate,” presented at the 6th Int. Conf. on
[3] S. Zhao, Q. Lin, J. Ran, S. S. Musa, G. Yang, W. Wang, Y. Lou, D. Gao,
Learning Representations, Vancouver, BC, Canada, April 30-May. 3, 2018,
L. Yang, D. He, and M. H. Wang, “Preliminary estimation of the basic
arXiv:1409.0473. [Online]. Available: https://arxiv.org/abs/1409.0473
reproduction number of novel coronavirus (2019-nCoV) in China, from
[24] Y. Kim, C. Denton, L. Hoang, and A. M. Rush, “Structured Attention
2019 to 2020: A data-driven analysis in the early phase of the outbreak,”
Networks,” presented at the 5th Int. Conf. on Learning Representa-
Int J Infect Dis, vol. 92, pp. 214–217, Mar, 2020.
tions, Palais des Congrès Neptune, Toulon, France, April 24-26, 2017,
[4] “Coronavirus disease 2019 (COVID-19) Situation Report – 45,”
arXiv:1702.00887. [Online]. Available: https://arxiv.org/abs/1702.00887
World Health Organization. Accessed: Mar. 22, 2020. [Online]. Avail-
[25] B.T. Burkholder, and M. J. Toole, “Evolution of complex disasters,” The
able: https://www.who.int/docs/default-source/coronaviruse/situation-re-
Lancet, vol. 346, no. 8981, pp. 1012–1015, 1995.
ports/20200305-sitrep-45-covid-19.pdf?sfvrsn=ed2ba78b_4
[26] A. Rajaraman and J. D. Ullman, “Data Mining,” in Mining of Massive
[5] C. C. Lai, T. P. Shih, W. C. Ko, H. J. Tang, and P. R. Hsueh, “Severe Datasets, Cambridge: Cambridge University Press, 2011, pp. 1–17.
acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus [27] J. F. D. Silva and G. P. Lopes, “A Document Descriptor Extractor Based
disease-2019 (COVID-19): The epidemic and the challenges,” Int J An- on Relevant Expressions,” presented at Progress in Artificial Intelligence
timicrob Agents, vol. 55, no. 3, pp. 105924, Mar, 2020. (EPIA 2009), pp. 646–657, 2009.
[6] Z. Li, J. Ge, M. Yang, J. Feng, M. Qiao, R. Jiang, J. Bi, G. Zhan, X. [28] “jieba.” PyPI. https://pypi.org/project/jieba (accessed: Mar. 22, 2020).
Xu, L. Wang, Q. Zhou, C. Zhou, Y. Pan, S. Liu, H. Zhang, J. Yang, B. [29] L. Mollema, I. A. Harmsen, E. Broekhuizen, R. Clijnk, H. De Melker, T.
Zhu, Y. Hu, K. Hashimoto, Y. Jia, H. Wang, R. Wang, C. Liu, and C. Paulussen, G. Kok, R. Ruiter, and E. Das, “Disease detection or public
Yang, “Vicarious traumatization in the general public, members, and non- opinion reflection? Content analysis of tweets, other social media, and
members of medical teams aiding in COVID-19 control,” Brain, Behavior, online newspapers during the measles outbreak in The Netherlands in
and Immunity, Mar. 2020. DOI: 10.1016/j.bbi.2020.03.007. 2013,” J Med Internet Res, vol. 17, no. 5, pp. e128, May 26, 2015.
[7] X. Gui, Y. Kou, K. H. Pine, and Y. Chen, “Managing Uncertainty: Using [30] “CCIR 2020.” 全 国 信 息 检 索 学 术 会 议CCIR2020.
Social Media for Risk Assessment during a Public Health Crisis,” in http://www.cvnis.net/ccir2020/index.html (accessed: Mar. 22, 2020).
Proceedings of the 2017 CHI Conference on Human Factors in Computing [31] “钟 南 山 肯 定 新 型 冠 状 病 毒 肺 炎 人 传 人.” Huanqiu.com.
Systems – CHI ’17, pp. 4520–4533. https://china.huanqiu.com/article/9CaKrnKoZPT (accessed: Mar. 22,
[8] J. Pei, G. Yu, X. Tian, and M. R. Donnelley, “A new method for early 2020).
detection of mass concern about public health issues,” Journal of Risk [32] Z. Hou, L. Lin, L. Lu, F. Du, M. Qian, Y. Liang, J. Zhang, and H. Yu,
Research, vol. 20, no. 4, pp. 516–532, 2015. “Public Exposure to Live Animals, Behavioural Change, and Support in
[9] S. Tibebu, V. C. Chang, C. A. Drouin, W. Thompson, and M. T. Do, “At-a- Containment Measures in response to COVID-19 Outbreak: a population-
glance - What can social media tell us about the opioid crisis in Canada?,” based cross sectional survey in China,” medRxiv preprints, 2020. DOI:
Health Promot Chronic Dis Prev Can, vol. 38, no. 6, pp. 263–267, Jun, 10.1101/2020.02.21.20026146.
2018. [33] C. Calisher, D. Carroll, R. Colwell, R. B. Corley, P. Daszak, C. Drosten, L.
[10] X. Ji, S. A. Chun, Z. Wei, and J. Geller, “Twitter sentiment classification Enjuanes, J. Farrar, H. Field, J. Golding, A. Gorbalenya, B. Haagmans,
for measuring public health concerns,” Social Network Analysis and Min- J. M. Hughes, W. B. Karesh, G. T. Keusch, S. K. Lam, J. Lubroth, J.
ing, vol. 5, no. 1, 2015. DOI: 10.1007/s13278-015-0253-5. S. Mackenzie, L. Madoff, J. Mazet, P. Palese, S. Perlman, L. Poon, B.
[11] “World Map,” Centers for Disease Control and Prevention. Accessed: Mar. Roizman, L. Saif, K. Subbarao, and M. Turner, “Statement in support of
22, 2020. [Online]. Available: https://www.cdc.gov/coronavirus/2019- the scientists, public health professionals, and medical professionals of
ncov/cases-updates/world-map.html China combatting COVID-19,” The Lancet, vol. 395, no. 10226, pp. e42–
[12] Q. Ye, B. Lin and Y. Li, "Sentiment classification for Chinese reviews: e43, 2020.
a comparison between SVM and semantic approaches," in 2005 Inter- [34] Z. Xu, L. Shi, Y. Wang, J. Zhang, L. Huang, C. Zhang, S. Liu, P. Zhao,
national Conference on Machine Learning and Cybernetics, Guangzhou, H. Liu, L. Zhu, Y. Tai, C. Bai, T. Gao, J. Song, P. Xia, J. Dong, J. Zhao,
China, 2005, pp. 2341–2346 Vol. 4, DOI: 10.1109/ICMLC.2005.1527335. and F.-S. Wang, “Pathological findings of COVID-19 associated with acute


QING ZHU received the Ph.D. degree in Cardiology from Shandong University, Jinan, China, in 2003. She is currently a Cardiologist and a Chief Physician at Qilu Hospital of Shandong University, and also an Associate Professor and a Master's tutor at the Medical College of Shandong University. Her research interests include the Clinical Prevention, Treatment, and basic research of Lipid Metabolism Abnormality and Atherosclerosis, the Electrophysiological Mechanism of Arrhythmia and Radiofrequency Ablation, and Chronic Disease Management of Cardiovascular Disease.

TIANYI WANG is currently pursuing the Ph.D. degree at the Department of Computer Science at the University of Hong Kong. His major research interests include Machine Learning, Deep Learning, and Cyber Security.

KE LU is currently pursuing the Ph.D. degree at the Department of Social Work and Social Administration at the University of Hong Kong. His major research interests include Nonprofits, Organization Development, Social Enterprise, Social Innovation, Crowdfunding, and Philanthropy. Recently, he has been studying the organization development of education nonprofits in China.

KAM PUI CHOW received the Ph.D. degree in Computer Science from the University of California, Santa Barbara, United States. He is an Associate Professor in the Department of Computer Science and the Director of the Center for Information Security and Cryptography (CISC) at the University of Hong Kong (HKU).
From 1994 to 1997, Dr. Chow, together with other professionals and a team of software engineers, developed the search engine for Hong Kong Telecom’s 108 Telephone Directory Enquiry System using state-of-the-art technology in main-memory databases and distributed computing. In recent years, his research interests have shifted to digital forensics and computer security, and he is the leader of the Computer Forensics Research Group (CFRG).
His research interests include Computer Forensics, Digital Investigation, Data Privacy, Cryptography, and Computer Security.


This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
