This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2020.3012595, IEEE Access
ABSTRACT Coronavirus disease 2019 (COVID-19) poses massive challenges for the world. Analyzing
public sentiment during the outbreak provides information that can guide appropriate public health
responses. On Sina Weibo (https://www.weibo.com/), a popular Chinese social media platform, posts with
negative sentiment are valuable for analyzing public concerns. We analyze 999,978 randomly selected
COVID-19 related Weibo posts from 1 January 2020 to 18 February 2020. Specifically, the BERT
(Bidirectional Encoder Representations from Transformers) model, pre-trained in an unsupervised manner,
is fine-tuned to classify posts into sentiment categories (positive, neutral, and negative), and a TF-IDF
(term frequency-inverse document frequency) model is used to summarize the topics of posts. Trend
analysis and thematic analysis are conducted to identify characteristics of negative sentiment. In general,
the fine-tuned BERT classifies sentiment with considerable accuracy (75.65%), and the topics extracted by
TF-IDF convey the characteristics of posts regarding COVID-19. We observe that people are concerned
about four aspects of COVID-19: the virus Origin (Gamey Food, 3.08%; Bat, 2.70%; Conspiracy Theory,
1.43%), Symptom (Fever, 2.13%; Cough, 1.19%), Production Activity (Go to Work, 1.94%; Resume Work,
1.12%; School New Semester Beginning, 1.06%), and Public Health Control (Temperature Taking, 1.39%;
Coronavirus Cover-up, 1.26%; City Shutdown, 1.09%). The results suggest that transparent information
sharing and scientific guidance might help alleviate public concerns, which offers constructive guidance
for public health responses.
INDEX TERMS COVID-19 Sensing, Public Health, Sentiment Classification, Social Media in China.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
COVID-19, we analyze 999,978 microblogging posts from January 1, 2020 to February 18, 2020 on Sina Weibo (Weibo for short), one of the most popular social media platforms in China, with 550 million monthly active users in the first quarter of 2020. Weibo is characterized by instant messaging, transparent sharing, and public accessibility. In this work, a deep Natural Language Processing (NLP) model and a topic modelling method are utilized. To be specific, we fine-tune BERT to classify posts into three sentiment categories (positive, neutral, and negative), achieving an accuracy of 75.65%, which surpasses the five NLP baseline algorithms compared in Table 2. The number of posts on each date is analyzed based on the sentiment classification. Thereafter, a TF-IDF model is adopted to extract the central topics of posts. As public sentiment on social media reflects people's psychological well-being, and the spread of posts with negative sentiment may lead to social disruption and challenges for infection prevention [10], we analyze 11 dominant and distinctive topics extracted from Weibo posts with negative sentiment and investigate the trends of sentiment development and the underlying major themes to understand public concerns.

Outbreaks are now taking place in many countries around the world, especially in Europe and North America [11]. For instance, as of 25 July 2020, more than 4 million people had been affected in the U.S. Under these circumstances, the main contributions of this study include:
• We fine-tune a BERT model for sentiment classification on Chinese Weibo posts about COVID-19 and achieve an accuracy (75.65%) that beats all baseline NLP algorithms tested.
• The study demonstrates how public sentiment on social media evolves as COVID-19 spreads.
• We extract representative topics and discuss the dominant discourse of public distress about COVID-19 caused by related social events. Findings of this study could assist governments worldwide in making efficient and effective public health protection decisions.

II. RELATED WORK
Coronavirus Disease 2019 (COVID-19) is a newly emerged disease, and little related research had been published at the time of our study. However, there have been some studies on text sentiment classification, which relate to our work to some extent. Ye et al. [12] applied an SVM [13] machine learning model to Chinese product reviews for sentiment (positive or negative) classification and achieved better performance than the classical Semantic Orientation approach. Narayanan et al. [14] built a fast sentiment classifier using Naïve Bayes, which achieved 88.80% accuracy on the popular IMDB movie reviews dataset. In recent years, researchers have increasingly used deep learning techniques for sentiment classification. Ren et al. [15] enhanced word representations with character embeddings and mainly applied a CNN for context-sensitive sentiment classification of Twitter content. Tang et al. [16] proposed document-level sentiment classification by combining LSTM/CNN embeddings with a gated RNN.

III. METHODOLOGY
Our sentiment analysis model consists of two parts, as shown in the workflow of Fig. 1. In particular, we first use fine-tuned BERT [17] to classify the sentiment of Weibo posts
we analyze the frequency of the 11 topics from 1 January 2020 to 18 February 2020 to visualize their trends.

IV. EXPERIMENTS AND RESULTS ANALYSIS
A. DATASET
Based on a list of 230 COVID-19 related key phrases, 2.4 million Weibo posts from 1 January 2020 to 19 February 2020 were crawled by the organizer of CCIR 2020 (26th China Conference on Information Retrieval) [30]; posts on 19 February are incomplete and are thus excluded from our final sample. The crawler mainly uses SciPy and Beautiful Soup, and duplicates and reposts are deleted to construct the Weibo posts dataset. The dataset includes posts by around 640 thousand users, with user location information excluded.

TABLE 1. Performance Measures of Sentiment Classification Model
            Precision(%)   Recall(%)   F1-score(%)
Positive    69.90          74.77       72.25
Neutral     79.75          77.97       78.85
Negative    64.34          62.46       63.39

TABLE 2. Comparative Test on NLP Baseline Algorithms
System                 Accuracy(%)
Fine-tuned BERT_BASE   75.65
SVM                    70.66
Naïve Bayes            66.97
Logistic Regression    70.02
CNN                    71.19
LSTM                   57.73

TABLE 3. Examples of Posts With Different Sentiment Categories
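As a quick consistency check on Table 1, the F1-score of each class can be recomputed as the harmonic mean of its precision and recall. The sketch below is illustrative only: the function name f1_score and the dictionary layout are assumptions, and the numbers are the Table 1 values written as fractions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per class, taken from Table 1 as fractions.
table1 = {
    "Positive": (0.6990, 0.7477, 0.7225),
    "Neutral": (0.7975, 0.7797, 0.7885),
    "Negative": (0.6434, 0.6246, 0.6339),
}

# Recompute F1 for every class and print it next to the reported value.
recomputed = {label: f1_score(p, r) for label, (p, r, _) in table1.items()}
for label, (_, _, reported) in table1.items():
    print(f"{label}: recomputed F1 = {recomputed[label]:.4f}, reported = {reported:.4f}")
```

The recomputed values agree with the reported ones to four decimal places, so the three columns of Table 1 are mutually consistent.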
FIGURE 4. Number of posts related to the Origin with respect to the dates.
FIGURE 3. Posts with three different sentiment categories with respect to the
dates (upper graph) and number of accumulative confirmed cases with respect
to the dates (lower graph).
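The daily series in the upper graph of Fig. 3 amount to counting classified posts per date and per sentiment category. A minimal sketch of that bookkeeping follows; the sample records and the helper daily_counts are hypothetical and are not taken from the dataset or from the authors' code.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical classified posts: (posting date, predicted sentiment label).
posts = [
    (date(2020, 1, 20), "negative"),
    (date(2020, 1, 20), "neutral"),
    (date(2020, 1, 21), "negative"),
    (date(2020, 1, 21), "negative"),
    (date(2020, 1, 21), "positive"),
]


def daily_counts(labelled_posts):
    """Return {date: Counter({sentiment: number of posts})}."""
    counts = defaultdict(Counter)
    for day, sentiment in labelled_posts:
        counts[day][sentiment] += 1
    return dict(counts)


series = daily_counts(posts)
for day in sorted(series):
    # One line per date, e.g. the inputs for one x-position of the plot.
    print(day.isoformat(), dict(series[day]))
```

Plotting one line per sentiment label over the sorted dates would give a chart of the same kind as Fig. 3.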
TABLE 4. 11 Key Topics in Weibo Posts with Negative Sentiment Along with
the Corresponding Themes and Frequency of Occurrences in Weibo Posts
FIGURE 5. Number of posts related to the Symptom with respect to the dates.
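Topic percentages such as those in Table 4 can be read as the share of posts in which a term occurs, with TF-IDF used to rank the terms that characterize a post. The sketch below assumes the plain tf × log(N/df) weighting, which may differ from the variant the authors used, and it runs on a few hypothetical, already segmented posts (English tokens stand in for segmented Chinese text).

```python
import math
from collections import Counter

# Hypothetical, already-segmented posts with negative sentiment.
docs = [
    ["fever", "cough", "hospital"],
    ["fever", "mask", "work"],
    ["work", "resume", "mask"],
    ["bat", "origin", "fever"],
]


def tf_idf(documents):
    """Return (per-document {term: tf-idf} dicts, document-frequency Counter).

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores, doc_freq


scores, doc_freq = tf_idf(docs)

# Share of posts mentioning each term, analogous to the percentages in Table 4.
share = {term: df / len(docs) for term, df in doc_freq.items()}

# Highest-scoring terms of the first post.
top_terms = sorted(scores[0], key=scores[0].get, reverse=True)
print(top_terms, share["fever"])
```

In this toy corpus "fever" appears in three of the four posts, so its share is high while its TF-IDF weight in any single post is low; a term that is rare across the corpus, such as "cough", scores higher within the post that contains it.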
FIGURE 7. Number of posts related to the Public Health Control with respect to the dates.

discussions of the origins of the virus are deeply correlated with negative sentiment and trigger the largest number of posts with negative sentiment, as demonstrated in Fig. 4. However, as discussed, rumors and unconfirmed information may overwhelm the discussion [33]. Consequently, it is important for the government to release transparent updates on the progress of the investigation into the origins.

“Fever” (2.13%) and “Cough” (1.19%) are identified as the representative symptoms of COVID-19 [34]. For posts about Symptom, as shown in Fig. 5, posts with “Fever” as the topic outnumber those with “Cough”; the gap is quite large in early January but narrows in February as COVID-19 spreads. Discussion of COVID-19 symptoms might lead to negative sentiment but could also be beneficial for self-detection of infection. Therefore, typical symptoms of the disease should be communicated to the public clearly and promptly, which would benefit early detection of the disease.

Production Activity, as pictured in Fig. 6, summarizes topics about work-life arrangements under the threat of COVID-19. “Go to Work” (1.94%) and “Resume Work” (1.12%) portray the public concern over work, and “School New Semester Beginning” (1.06%) indicates people’s worries about students going back to school. The concern about “Go to Work” starts in early January and remains relatively high, while the concern about “School New Semester Beginning” starts to grow from 20 January 2020 and the worries for

V. CONCLUSION
The spread of COVID-19 has turned into a worldwide pandemic [37]. Public health concerns relate not only to infection prevention but also to the psychological status of people experiencing the disaster [38]. Therefore, analyzing posts with negative sentiment from social media could contribute to understanding the experiences of the Chinese general public during the outbreak of COVID-19 and offer examples for other countries. Our analyses provide insights into the evolution of social sentiment over time and the topic themes connected to negative sentiment in Weibo posts. Fig. 3 illustrates the dates on which public attention to COVID-19 clearly surged. Moreover, concerns about Origin, Symptom, Production Activity, and Public Health Control are deeply intertwined with public sentiment.

This study collects data from social media during the early stage of COVID-19 transmission in China. Based on the data analysis and discussion, several advantages emerge. First, the state-of-the-art fine-tuned BERT classification model and the TF-IDF topic extraction model deliver results with considerable accuracy. Second, the approach can further be implemented as an online platform for real-time monitoring of public sentiment during other crises in the future. Third, this study reveals important topic themes that are deeply connected to sentiment of depression. As COVID-19 keeps spreading worldwide, insights from this study may contribute to public administration and the prevention of social disruption.

Despite the informative results found in this study, further improvements to the classification model are expected to achieve a higher accuracy. Furthermore, only information on Sina Weibo is used in this study, which may lead to bias
by neglecting posts on other social media platforms. Finally, in order to focus on the centrality of topics, only topics appearing in more than 1% of total posts are selected in each category of sentiment. This may lead to important topics with smaller percentages being overlooked. Future studies that incorporate empirical data from different social media platforms and different countries may contribute to a more solid conclusion.

New outbreaks are taking place in many other countries all around the world. The sentiment classification model and findings of this study could provide constructive guidance for governments worldwide in making efficient and effective public health protection decisions.

ACKNOWLEDGMENT
The authors thank Jing Ma from Shandong University for cogent advice on data analysis viewpoint and format issues.

REFERENCES
[1] Y. Liu, A. A. Gayle, A. Wilder-Smith, and J. Rocklov, “The reproductive number of COVID-19 is higher compared to SARS coronavirus,” J Travel Med, vol. 27, no. 2, Mar 13, 2020, DOI: 10.1093/jtm/taaa021.
[2] “Novel Coronavirus (2019-nCoV) SITUATION REPORT – 1,” World Health Organization. Accessed: Mar. 22, 2020. [Online]. Available: https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200121-sitrep-1-2019-ncov.pdf?sfvrsn=20a99c10_4
[3] S. Zhao, Q. Lin, J. Ran, S. S. Musa, G. Yang, W. Wang, Y. Lou, D. Gao, L. Yang, D. He, and M. H. Wang, “Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak,” Int J Infect Dis, vol. 92, pp. 214–217, Mar, 2020.
[4] “Coronavirus disease 2019 (COVID-19) Situation Report – 45,” World Health Organization. Accessed: Mar. 22, 2020. [Online]. Available: https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200305-sitrep-45-covid-19.pdf?sfvrsn=ed2ba78b_4
[5] C. C. Lai, T. P. Shih, W. C. Ko, H. J. Tang, and P. R. Hsueh, “Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease-2019 (COVID-19): The epidemic and the challenges,” Int J Antimicrob Agents, vol. 55, no. 3, pp. 105924, Mar, 2020.
[6] Z. Li, J. Ge, M. Yang, J. Feng, M. Qiao, R. Jiang, J. Bi, G. Zhan, X. Xu, L. Wang, Q. Zhou, C. Zhou, Y. Pan, S. Liu, H. Zhang, J. Yang, B. Zhu, Y. Hu, K. Hashimoto, Y. Jia, H. Wang, R. Wang, C. Liu, and C. Yang, “Vicarious traumatization in the general public, members, and non-members of medical teams aiding in COVID-19 control,” Brain, Behavior, and Immunity, Mar. 2020. DOI: 10.1016/j.bbi.2020.03.007.
[7] X. Gui, Y. Kou, K. H. Pine, and Y. Chen, “Managing Uncertainty: Using Social Media for Risk Assessment during a Public Health Crisis,” in Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems – CHI ’17, pp. 4520–4533.
[8] J. Pei, G. Yu, X. Tian, and M. R. Donnelley, “A new method for early detection of mass concern about public health issues,” Journal of Risk Research, vol. 20, no. 4, pp. 516–532, 2015.
[9] S. Tibebu, V. C. Chang, C. A. Drouin, W. Thompson, and M. T. Do, “At-a-glance - What can social media tell us about the opioid crisis in Canada?,” Health Promot Chronic Dis Prev Can, vol. 38, no. 6, pp. 263–267, Jun, 2018.
[10] X. Ji, S. A. Chun, Z. Wei, and J. Geller, “Twitter sentiment classification for measuring public health concerns,” Social Network Analysis and Mining, vol. 5, no. 1, 2015. DOI: 10.1007/s13278-015-0253-5.
[11] “World Map,” Centers for Disease Control and Prevention. Accessed: Mar. 22, 2020. [Online]. Available: https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/world-map.html
[12] Q. Ye, B. Lin, and Y. Li, “Sentiment classification for Chinese reviews: a comparison between SVM and semantic approaches,” in 2005 International Conference on Machine Learning and Cybernetics, Guangzhou, China, 2005, pp. 2341–2346 Vol. 4, DOI: 10.1109/ICMLC.2005.1527335.
[13] V. N. Vapnik, “SVM and Logistic Regression,” in The Nature of Statistical Learning Theory, New York, NY: Springer, 2000, pp. 156–163.
[14] V. Narayanan, I. Arora, and A. Bhatia, “Fast and Accurate Sentiment Classification Using an Enhanced Naive Bayes Model,” presented at Intelligent Data Engineering and Automated Learning - IDEAL 2013 Lecture Notes in Computer Science, pp. 194–201, 2013.
[15] Y. Ren, Y. Zhang, M. Zhang, and D. Ji, “Context-sensitive twitter sentiment classification using neural network,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 215–221, 2016.
[16] D. Tang, B. Qin, and T. Liu, “Document Modeling with Gated Recurrent Neural Network for Sentiment Classification,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015. DOI: 10.18653/v1/D15-1167.
[17] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Proceedings of NAACL-HLT 2019, pp. 4171–4186.
[18] C. Sammut and G. I. Webb, “TF-IDF,” in Encyclopedia of Machine Learning, Boston, MA: Springer, 2011, pp. 986–987.
[19] Z. Gao, A. Feng, X. Song, and X. Wu, “Target-Dependent Sentiment Classification With BERT,” IEEE Access, vol. 7, pp. 154290–154299, 2019.
[20] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” presented at Neural Information Processing Systems (NIPS), pp. 6000–6010, 2017.
[21] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to Sequence Learning with Neural Networks,” in Proceedings of the 27th International Conference on Neural Information Processing Systems, vol. 2, pp. 3104–3112.
[22] M.-T. Luong, H. Pham, and C. D. Manning, “Effective Approaches to Attention-based Neural Machine Translation,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421.
[23] D. D. Bahdanau, K. Cho, and Y. Bengio, “Neural Machine Translation by Jointly Learning to Align and Translate,” presented at the 6th Int. Conf. on Learning Representations, Vancouver, BC, Canada, April 30-May. 3, 2018, arXiv:1409.0473. [Online]. Available: https://arxiv.org/abs/1409.0473
[24] Y. Kim, C. Denton, L. Hoang, and A. M. Rush, “Structured Attention Networks,” presented at the 5th Int. Conf. on Learning Representations, Palais des Congrès Neptune, Toulon, France, April 24-26, 2017, arXiv:1702.00887. [Online]. Available: https://arxiv.org/abs/1702.00887
[25] B. T. Burkholder and M. J. Toole, “Evolution of complex disasters,” The Lancet, vol. 346, no. 8981, pp. 1012–1015, 1995.
[26] A. Rajaraman and J. D. Ullman, “Data Mining,” in Mining of Massive Datasets, Cambridge: Cambridge University Press, 2011, pp. 1–17.
[27] J. F. D. Silva and G. P. Lopes, “A Document Descriptor Extractor Based on Relevant Expressions,” presented at Progress in Artificial Intelligence (EPIA 2009), pp. 646–657, 2009.
[28] “jieba.” PyPI. https://pypi.org/project/jieba (accessed: Mar. 22, 2020).
[29] L. Mollema, I. A. Harmsen, E. Broekhuizen, R. Clijnk, H. De Melker, T. Paulussen, G. Kok, R. Ruiter, and E. Das, “Disease detection or public opinion reflection? Content analysis of tweets, other social media, and online newspapers during the measles outbreak in The Netherlands in 2013,” J Med Internet Res, vol. 17, no. 5, pp. e128, May 26, 2015.
[30] “CCIR 2020.” 全国信息检索学术会议 CCIR2020 [China Conference on Information Retrieval, CCIR 2020]. http://www.cvnis.net/ccir2020/index.html (accessed: Mar. 22, 2020).
[31] “钟南山肯定新型冠状病毒肺炎人传人” [Zhong Nanshan confirms human-to-human transmission of novel coronavirus pneumonia]. Huanqiu.com. https://china.huanqiu.com/article/9CaKrnKoZPT (accessed: Mar. 22, 2020).
[32] Z. Hou, L. Lin, L. Lu, F. Du, M. Qian, Y. Liang, J. Zhang, and H. Yu, “Public Exposure to Live Animals, Behavioural Change, and Support in Containment Measures in response to COVID-19 Outbreak: a population-based cross sectional survey in China,” medRxiv preprints, 2020. DOI: 10.1101/2020.02.21.20026146.
[33] C. Calisher, D. Carroll, R. Colwell, R. B. Corley, P. Daszak, C. Drosten, L. Enjuanes, J. Farrar, H. Field, J. Golding, A. Gorbalenya, B. Haagmans, J. M. Hughes, W. B. Karesh, G. T. Keusch, S. K. Lam, J. Lubroth, J. S. Mackenzie, L. Madoff, J. Mazet, P. Palese, S. Perlman, L. Poon, B. Roizman, L. Saif, K. Subbarao, and M. Turner, “Statement in support of the scientists, public health professionals, and medical professionals of China combatting COVID-19,” The Lancet, vol. 395, no. 10226, pp. e42–e43, 2020.
[34] Z. Xu, L. Shi, Y. Wang, J. Zhang, L. Huang, C. Zhang, S. Liu, P. Zhao, H. Liu, L. Zhu, Y. Tai, C. Bai, T. Gao, J. Song, P. Xia, J. Dong, J. Zhao, and F.-S. Wang, “Pathological findings of COVID-19 associated with acute
respiratory distress syndrome,” The Lancet Respiratory Medicine, vol. 8, no. 4, pp. 420–422, Apr. 2020.
[35] Z. Xu, S. Li, S. Tian, H. Li, and L.-q. Kong, “Full spectrum of COVID-19 severity still being depicted,” The Lancet, vol. 395, no. 10228, pp. 947–948, 2020.
[36] Y. Xiao and M. E. Torok, “Taking the right measures to control COVID-19,” The Lancet Infectious Diseases, vol. 20, no. 5, pp. 523–524, May 2020.
[37] “Coronavirus Disease (COVID-19) - events as they happen,” World Health Organization. Accessed: Mar. 22, 2020. [Online]. Available: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/events-as-they-happen
[38] S. C. Vos and M. M. Buckner, “Social Media Messages in an Emerging Health Crisis: Tweeting Bird Flu,” J Health Commun, vol. 21, no. 3, pp. 301–308, 2016.

QING ZHU received the Ph.D. degree in cardiology from Shandong University, Jinan, China, in 2003. She is currently a Cardiologist and a Chief Physician at Qilu Hospital of Shandong University, and also an Associate Professor and Master's tutor at the Medical College of Shandong University. Her research interests include clinical prevention, treatment, and basic research of lipid metabolism abnormality and atherosclerosis; the electrophysiological mechanism of arrhythmia and radiofrequency ablation; and chronic disease management of cardiovascular disease.