
BERT for Stock Market Sentiment Analysis

Matheus Gomes de Sousa, Kenzo Sakiyama, Lucas de Souza Rodrigues,
Pedro Henrique de Moraes, Eraldo Rezende Fernandes, Edson Takashi Matsubara

FACOM/UFMS, Campo Grande, Brazil
[email protected]

Abstract—When breaking news occurs, stock quotes can change abruptly in a matter of seconds. The human analysis of breaking news can take several minutes, and investors in the financial markets need to make quick decisions. Such challenging scenarios require faster ways to support investors. In this work, we propose the use of Bidirectional Encoder Representations from Transformers (BERT) to perform sentiment analysis of news articles and provide relevant information for decision making in the stock market. This model is pre-trained on a large amount of general-domain documents by means of a self-learning task. To fine-tune this powerful model on sentiment analysis for the stock market, we manually labeled stock news articles as positive, neutral or negative. This dataset is freely available and amounts to 582 documents from several financial news sources. We fine-tune a BERT model on this dataset and achieve an F-score of 72.5%. Then, we perform experiments highlighting how the output of the obtained model can provide valuable information to predict subsequent movements of the Dow Jones Industrial (DJI) Index.

Index Terms—natural language processing, sentiment analysis, stock market

I. INTRODUCTION

News is one of the main drivers of abrupt changes in stock prices. Because of this, financial market analysts need to constantly monitor and evaluate financial news to support stock buying and selling decisions. However, stock prices can vary quickly, and the few minutes it takes to read a news article can cost millions of dollars due to a late decision. Another factor that hinders the individual analysis of news is the amount of information generated by hundreds of sources. Thus, two problems arise: the quantity of news and the time needed to analyse it. More news requires longer reading and analysis time; conversely, reducing response time requires analysing less news. Easing one problem worsens the other, which makes the task difficult to solve with traditional techniques.

A possible solution for the quick analysis of a large volume of news is the use of computational algorithms for automatic text analysis with Natural Language Processing (NLP). One NLP task is sentiment analysis, which seeks to identify whether a text conveys a positive, negative or neutral sentiment. In this work, we argue that sentiment analysis applied to financial news can improve the quality of the decisions of financial agents.

Mäntylä et al. [1] present a literature review of the evolution of sentiment analysis. Tubishat et al. [2] present a review of sentiment analysis focused on the aspect level. Zimbra et al. [3] review sentiment analysis on Twitter. According to Zimbra et al. [3], the overall average sentiment classification accuracy is around 61% for general-purpose systems, and domain-specific approaches improve this performance by an average of 11%. This improvement is generally attributed to domain-specific indicators of sentiment that help domain-specific models.

The literature review by Zhang et al. [4] covers sentiment analysis using deep learning techniques. According to this study, the standard text representation in modern methods is based on word embeddings [5]. The most frequently used learning models are recurrent neural networks such as LSTM [6] and GRU [7]. Studies using attention mechanisms [8], [9] in sentiment analysis are also growing. Works that combine embeddings, bidirectional strategies and attention mechanisms have started to appear in the literature, beating many state-of-the-art algorithms.

A recent model based on the attention mechanism, called Bidirectional Encoder Representations from Transformers (BERT) [10], obtained state-of-the-art results on eleven natural language processing tasks. The pre-trained BERT model was fine-tuned with one additional output layer to create these state-of-the-art models. Our proposal in this paper is to evaluate BERT on the financial news sentiment analysis problem in order to improve stock market prediction. As a short paper, this research is under development, and we show preliminary results.

Thus, this work aims to experimentally evaluate BERT on the task of stock market sentiment analysis. Further steps of this research will focus on improving stock market prediction. So far, the main contributions of this work are listed below.

• A corpus of 582 financial news articles manually labeled with sentiment¹, collected from CNBC, Forbes, New York Times, Washington Post, Business Insider and other news websites.
• BERT code extended for fine-tuning on sentiment analysis², and the additional code needed to reproduce this work³, all freely available.
• An experimental evaluation comparing BERT, Support Vector Machines, Naive Bayes, and a Convolutional Neural Network.
• A data analysis highlighting the relation between the Dow Jones Industrial index and the developed BERT sentiment classifier.

¹https://drive.google.com/open?id=1eqNwkqb1tnaJm_l975K6LJBic8pMof1x
²https://github.com/stocks-predictor/bert
³https://github.com/stocks-predictor/stocks-time-series
The rest of the paper is organised as follows. Section II summarises BERT. Section III describes the three main parts of the proposal. Section IV shows the experiments conducted to validate the effectiveness of the proposal. Finally, Section V concludes the study.

II. BERT: BIDIRECTIONAL ENCODER REPRESENTATIONS FROM TRANSFORMERS

Pre-trained generic language models [10]–[13] have achieved great results on different NLP tasks. Such models are trained in an unsupervised manner on large amounts of text and may later be applied to potentially any task. BERT [10] is one of the most successful language models available. This model is based on the Transformer encoder [14]. The Transformer is a sequence-to-sequence architecture based solely on attention mechanisms for both the encoder and the decoder. The BERT architecture drops the decoder network, using only a Transformer encoder, since it is not a sequence-to-sequence model (although it can be used in such tasks).

Most language models are based on unidirectional architectures, i.e., outputs are conditioned only on previous words (left context). When applying such models to downstream tasks, the fine-tuned models are also limited to being left-conditioned. This is a limitation for tasks in which the whole text is available during prediction. BERT introduces a bidirectional language model architecture in order to exploit such knowledge. Sentiment analysis is modeled as text classification and can thus benefit from this aspect.

Fig. 1. BERT architecture with a two-layer encoder. E_i, for i = 1, 2, ..., N, are the input representation vectors (one vector for each input token w_i). T_i^(1) are the attention-based representations in the first encoder layer, and T_i^(2) are the same in the second encoder layer. T_i = T_i^(2) are the output representation vectors, again one per token. (Adapted from [10])

In Figure 1, we illustrate the basic BERT architecture. The input to the network is the token representation vectors E_i, each of which is the sum of three representation vectors: a typical word embedding vector, a position embedding vector and a sentence vector. The position embedding provides the model with information about the position of the token within its sentence, since Transformer models do not have this notion. The sentence vector is used only when the task requires a context broader than a sentence, which is not the case for sentiment analysis (we consider a document as a sentence).

The attention-based layers produce, for each input token representation (E_i for the first layer, T_i^(1) for the second, for instance), a new representation (T_i^(1) and T_i^(2), respectively) computed as an (adaptive) weighted sum of the representations of all tokens within the sentence. This is the main strength of Transformer models, i.e., each token representation is based on the representations of all the tokens. Thus, the context is limited only by the input sentence. The output of one attention-based layer is provided as input to the next one. The output of the last attention layer comprises the model output.

In BERT, each input sentence is augmented with an initial artificial token denoted [CLS], as can be observed in Figure 2. When fine-tuning the model on a text classification task, the output representation of this artificial token is used to feed the classification layer, which is a typical softmax layer.

Fig. 2. Sentiment classification using BERT. (Adapted from [10])
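To make this concrete, below is a minimal sketch of such a classification setup. It is not the authors' code (they extend the original BERT repository, see footnote 2); it uses the HuggingFace transformers API instead, and the input sentence is an invented example.

```python
# Sketch only: the paper extends Google's original BERT code; here the same
# idea is shown with the HuggingFace transformers API.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)  # positive / neutral / negative

inputs = tokenizer("Stocks rally as trade talks resume.",
                   return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits    # linear head fed by the [CLS] representation
probs = torch.softmax(logits, dim=-1)  # softmax over the three sentiment classes
```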
In the following, we give more details about the pre-trained BERT model employed in this work.

III. PROPOSAL

The objective of the proposal is to indicate the trend of the Dow Jones Index before its opening time. We estimate the sentiment of the market using financial news published before the opening and use it to predict the DJI trend for the day. The proposal can be split into three parts: (1) collecting and pre-processing stock news articles; (2) a BERT-based model for sentiment analysis; and (3) leveraging the developed model to improve decision making related to stock market prediction. Figure 3 illustrates these three parts, and the following sections describe the details of each part.

Fig. 3. Proposal.

A. Data Acquisition and Pre-Processing

News articles were collected from the different website sources shown in Table I. The data was collected from May 26th to February 4th, 2019. We crawled the news articles using the Selenium tool [15]. Four volunteer members of our research group manually labeled the dataset as positive, neutral or negative sentiment.

TABLE I
Websites used to collect news to build the corpus.

Source             # Articles      %
Business Insider       51         8.7
CNBC                   77        13.2
Forbes                 32         5.5
Investopedia           41         7.0
New York Times         45         7.7
Washington Post        31         5.3
Others                305        52.4
Total                 582       100.0

After data acquisition, each document is transformed into a token sequence. The tokenization is done using WordPiece [16] with a 30,000-token vocabulary. WordPiece allows BERT, even with a vocabulary of only 30,000 "words", to tokenize almost every single word in the English language.

In addition, the Alpha Vantage API [17] is used to collect the historical data of the Dow Jones Industrial Average (DJI) index over the same period as the news articles.
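To illustrate the subword splitting, here is a small sketch using the HuggingFace tokenizer for bert-base-uncased (whose vocabulary has roughly 30,000 tokens); the authors used the tokenizer shipped with the original BERT code, but the behaviour is analogous.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Out-of-vocabulary words are split into subword pieces (marked with '##'),
# so nearly any English word can be represented with ~30,000 tokens.
print(tokenizer.tokenize("Dow plunges amid stagflation fears"))
# rare words such as 'stagflation' come out as several '##'-prefixed pieces
```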

B. Sentiment Analysis

The creators of BERT proposed two models with different values for the parameters L (layers), H (hidden layer size) and A (attention heads): a smaller one, called BERT_BASE, with L = 12, H = 768 and A = 12, and a bigger one, called BERT_LARGE, with L = 24, H = 1024 and A = 16. In this research, due to our limited computational power, we used the smaller BERT_BASE.

We fine-tuned this pre-trained BERT_BASE model using our labeled set. For experimental evaluation purposes, we performed 10-fold cross-validation; for the deployed model, we used the model trained on all the labeled data.
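Below is a minimal sketch of this evaluation protocol. The training and scoring helpers are hypothetical stand-ins (a trivial scikit-learn classifier on random placeholder features), not the actual BERT fine-tuning code.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

def train_model(X, y):         # hypothetical stand-in for BERT fine-tuning
    return DummyClassifier(strategy="most_frequent").fit(X, y)

def score_model(model, X, y):  # macro F1 (one plausible choice; the paper reports F1)
    return f1_score(y, model.predict(X), average="macro")

rng = np.random.default_rng(42)
X = rng.random((582, 8))             # placeholder features, 582 documents
y = rng.choice([0, 1, 2], size=582)  # positive / neutral / negative labels

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = [score_model(train_model(X[tr], y[tr]), X[te], y[te])
          for tr, te in skf.split(X, y)]
print(f"F1: {np.mean(scores):.3f} ± {np.std(scores):.3f}")
```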

C. Data Analysis

The data analysis evaluates the mood of the financial news before the stock market opening time. The idea is to reproduce the scenario of a financial agent who is restricted to operating only at the stock market opening time. Therefore, in this part, the system estimates the mood of the news available between OT − HB and OT, where OT is the opening time and HB (hours before) is a parameter. The proportion of positive news within this time frame is used to indicate the direction of the DJI.
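A minimal sketch of this rule using pandas; the column names, timestamps and the 09:30 opening time are illustrative assumptions.

```python
import pandas as pd

def market_mood(news: pd.DataFrame, opening: pd.Timestamp, hb: int = 5) -> float:
    """Proportion of positive news published between OT - HB hours and OT."""
    window = news[(news["published"] >= opening - pd.Timedelta(hours=hb))
                  & (news["published"] < opening)]
    if window.empty:
        return float("nan")
    return (window["sentiment"] == "positive").mean()

# Illustrative data; 'published' and 'sentiment' are assumed column names.
news = pd.DataFrame({
    "published": pd.to_datetime(["2019-04-23 05:10", "2019-04-23 07:45"]),
    "sentiment": ["positive", "negative"],
})
print(market_mood(news, pd.Timestamp("2019-04-23 09:30")))  # 0.5
```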

IV. EXPERIMENTAL EVALUATION

This section evaluates the performance of BERT compared with naive Bayes, support vector machines (SVM) [18] and TextCNN [19]. The first two algorithms require a tabular data format, so we converted the texts into bag-of-words (bow) and term frequency–inverse document frequency (tfidf) [20] representations. For TextCNN, we used the average vector of word embeddings obtained from fastText [21]. We performed a 10-fold cross-validation procedure to evaluate the learning algorithms. In Table II, we show performance in terms of accuracy, precision, recall and F1. The best results are presented in boldface. We adjusted the parameters of the SVM using Random Search [22] with 20 iterations, varying C on an exponential scale of 100 and gamma on an exponential scale of 0.1, with an RBF kernel.
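A sketch of this tuning setup with scikit-learn's RandomizedSearchCV; the paper does not fully specify the search distributions, so the grids and the tiny placeholder corpus below are assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

texts = ["stocks rally on earnings", "markets slide on trade fears"] * 10
labels = [1, 0] * 10  # placeholder corpus and labels

pipe = make_pipeline(TfidfVectorizer(), SVC(kernel="rbf"))
search = RandomizedSearchCV(
    pipe,
    param_distributions={
        # exponential grids around the paper's stated scales (assumed ranges)
        "svc__C": np.logspace(-2, 4, 7),      # ..., 1, 100, 10000
        "svc__gamma": np.logspace(-4, 1, 6),  # ..., 0.1, 1, 10
    },
    n_iter=20, cv=3, random_state=42)
search.fit(texts, labels)
print(search.best_params_)
```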
Clearly, BERT outperformed the other methods. A paired t-test at the 0.05 significance level shows a significant difference between BERT and TextCNN.

TABLE II
Experimental results with 10-fold cross-validation. The number after ± represents the standard deviation.

Algorithm   Accuracy        Precision       Recall          F1
NB bow      0.610 ± 0.060   0.593 ± 0.196   0.557 ± 0.069   0.503 ± 0.103
SVM bow     0.628 ± 0.063   0.627 ± 0.074   0.609 ± 0.066   0.601 ± 0.071
NB tfidf    0.610 ± 0.062   0.607 ± 0.102   0.568 ± 0.065   0.542 ± 0.080
SVM tfidf   0.624 ± 0.076   0.631 ± 0.104   0.595 ± 0.083   0.578 ± 0.099
textCNN     0.739 ± 0.05    0.703 ± 0.18    0.500 ± 0.14    0.569 ± 0.12
BERT        0.825 ± 0.04    0.750 ± 0.17    0.713 ± 0.16    0.725 ± 0.15
For a more detailed analysis, we constructed the ROC curve of the BERT results (Figure 4). The area under the ROC curve (ROC AUC) is 0.87, which indicates how well the model can distinguish between the positive and negative classes.

Fig. 4. ROC curve for the BERT classifier (AUC = 0.87).
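This analysis can be reproduced with scikit-learn; in the sketch below, the labels and positive-class scores are placeholders standing in for the fine-tuned model's outputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])            # placeholder labels
y_score = np.array([.9, .2, .7, .6, .4, .1, .8, .35])  # placeholder P(positive)

fpr, tpr, _ = roc_curve(y_true, y_score)       # points of the ROC curve
print("AUC:", roc_auc_score(y_true, y_score))  # the paper reports 0.87 for BERT
```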
A. Analysis of the Time Series of the Stock Market

We evaluate the relationship between market sentiment and the stock exchange time series by using BERT to classify the news collected by the crawler over approximately one month, together with the DJI. We plotted the time series of the positive news rate for each hour, overlapped with the stock exchange variation, and smoothed the curve with a moving average over the previous 10 hours, as seen in Figure 5. Note that the DJI index score was normalized using the minmax_scale method of the sklearn [23] tool. From the chart, we do not find much correlation between the news sentiment and the Dow Jones index.
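A sketch of the smoothing and normalization steps on placeholder data (the real inputs are the hourly positive-news rates and the DJI quotes).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import minmax_scale

rng = np.random.default_rng(0)
hourly_pos_rate = pd.Series(rng.uniform(0, 1, 240))   # placeholder hourly rates
smoothed = hourly_pos_rate.rolling(window=10).mean()  # 10-hour moving average
dji = pd.Series(minmax_scale(rng.uniform(25000, 27000, 240)))  # DJI scaled to [0, 1]
print(dji.corr(smoothed))  # on the real data, Figure 5 shows little correlation
```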
However, to verify whether sentiment analysis can be useful for identifying the trend of the index falling or rising during the day, the following strategy was adopted. At the beginning of each day, 5 hours before the stock exchange opened (HB = 5), the average sentiment of the news was calculated. We hypothesize that the average sentiment preceding the stock market opening is a stronger indicator of the mood of the market during the period in which the stock exchange is closed.

This mood was compared to the opening and closing stock market index in order to assess whether the news rating really is indicative of stock market fluctuations. The sentiment was considered positive whenever the positive news rate in the period was higher than 50%; otherwise, it was considered negative (see Table III).

TABLE III
Dow Jones index opening and closing data (normalized) and the sentiment of the news in the period.

Date         Open DJI   Close DJI   Sentiment   DJI Variation
08-04-2019   0.56       0.40        negative    decrease
09-04-2019   0.40       0.08        negative    decrease
11-04-2019   0.08       0.06        negative    decrease
12-04-2019   0.06       0.52        positive    increase
15-04-2019   0.52       0.49        positive    decrease
16-04-2019   0.49       0.58        positive    increase
17-04-2019   0.58       0.64        positive    increase
18-04-2019   0.64       0.82        negative    increase
22-04-2019   0.82       0.72        positive    decrease
23-04-2019   0.72       0.95        positive    increase
24-04-2019   0.95       0.90        positive    decrease
25-04-2019   0.90       0.68        negative    decrease
26-04-2019   0.68       0.72        positive    increase

In the analyzed period, the sentiment of the news was consistent with the stock exchange variation in 69% of the periods between the opening and closing of the stock market. However, the collection period was short, and longer periods must be evaluated to verify whether the observed behaviour is significant.
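As a sanity check, the 69% hit rate can be recomputed directly from Table III: the sentiment sign matches the DJI variation on 9 of the 13 listed days.

```python
# Recomputing the hit rate from Table III: "positive" sentiment counts as
# correct when the DJI increased, "negative" when it decreased.
sentiment = ["neg", "neg", "neg", "pos", "pos", "pos", "pos",
             "neg", "pos", "pos", "pos", "neg", "pos"]
variation = ["dec", "dec", "dec", "inc", "dec", "inc", "inc",
             "inc", "dec", "inc", "dec", "dec", "inc"]
hits = sum((s == "pos") == (v == "inc") for s, v in zip(sentiment, variation))
print(f"hit rate: {hits}/{len(sentiment)} = {hits/len(sentiment):.0%}")  # 9/13 = 69%
```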
V. CONCLUSION

The results indicate that BERT outperforms the convolutional neural network with word embeddings approach by about 8.6 percentage points of hit rate (accuracy). The results comparing the time series of news sentiment and the Dow Jones index are very noisy and difficult to analyze. We use the sentiment analysis of economic news as an indicator of the index falling or rising during the day. The proposal achieved a 69% hit rate in the prediction of the stock exchange variation. Although the data collection period was short, the data presented in Table III gives a good indication of the effectiveness of the implemented predictor.

The Dow Jones Industrial Average index includes several stocks such as MSFT (Microsoft), INTC (Intel) and BA (Boeing), among others. Observing only the DJI, it can be challenging to evaluate which stocks are more likely to change, as a significant increase in the price of a particular stock may not increase the index. This suggests, as future work, extracting news about individual companies and performing data processing and analysis on the value of the shares of those companies. Also, as an extension of this work, one could observe news about a company, collect its accounting data, and build a more precise predictor.
Fig. 5. Variation of the normalized Dow Jones Index (green) and the market sentiment moving average (blue), plotted by day and time.

REFERENCES

[1] M. V. Mäntylä, D. Graziotin, and M. Kuutila, "The evolution of sentiment analysis—a review of research topics, venues, and top cited papers," Computer Science Review, vol. 27, pp. 16–32, 2018.
[2] M. Tubishat, N. Idris, and M. A. Abushariah, "Implicit aspect extraction in sentiment analysis: Review, taxonomy, opportunities, and open challenges," Information Processing & Management, vol. 54, no. 4, pp. 545–563, 2018.
[3] D. Zimbra, A. Abbasi, D. Zeng, and H. Chen, "The state-of-the-art in Twitter sentiment analysis: A review and benchmark evaluation," ACM Transactions on Management Information Systems (TMIS), vol. 9, no. 2, p. 5, 2018.
[4] L. Zhang, S. Wang, and B. Liu, "Deep learning for sentiment analysis: A survey," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 8, no. 4, p. e1253, 2018.
[5] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Advances in Neural Information Processing Systems, 2013, pp. 3111–3119.
[6] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[7] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," arXiv preprint arXiv:1412.3555, 2014.
[8] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.
[9] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[10] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171–4186.
[11] J. Howard and S. Ruder, "Universal language model fine-tuning for text classification," in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018.
[12] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, "Deep contextualized word representations," in Proc. of NAACL, 2018.
[13] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, "Language models are unsupervised multitask learners," CoRR, 2019.
[14] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," CoRR, 2017.
[15] SeleniumHQ, "Selenium automates browsers," https://www.seleniumhq.org/.
[16] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi et al., "Google's neural machine translation system: Bridging the gap between human and machine translation," arXiv preprint arXiv:1609.08144, 2016.
[17] Alpha Vantage Inc., "Free APIs for realtime and historical financial data, technical analysis, charting, and more!" https://www.alphavantage.co/, 2019.
[18] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[19] S. Rosenthal, N. Farra, and P. Nakov, "SemEval-2017 task 4: Sentiment analysis in Twitter," in Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), 2017, pp. 502–518.
[20] A. Aizawa, "An information-theoretic perspective of tf–idf measures," Information Processing & Management, vol. 39, no. 1, pp. 45–65, 2003.
[21] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, "Enriching word vectors with subword information," Transactions of the Association for Computational Linguistics, vol. 5, pp. 135–146, 2017.
[22] J. Bergstra and Y. Bengio, "Random search for hyper-parameter optimization," Journal of Machine Learning Research, vol. 13, pp. 281–305, 2012.
[23] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
