Application of Fake News Detection in Stock Market Analyzer and Predictor Using Sentiment Analysis
8.1 Introduction
With the continuous development of Internet finance and the stock market, massive
amounts of information-rich stock data have been generated. Because effective methods
and practical technologies for extracting valuable information from such complex data
are lacking, it is hard to understand and draw conclusions from this data. How to use
an algorithm or data processing technology to mine the rules hidden in stock data and
identify stock price trends has therefore become a hot issue.
Y. Yao
School of Mathematical Sciences, Peking University, Beijing, China
X. Wu
School of Cyberspace Security, Nanjing University of Science and Technology, Nanjing, China
P. Zhang (B)
Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
e-mail: [email protected]
School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
Artificial intelligence algorithms combined with big data platforms have a strong
ability to approximate the nonlinear relationships in massive stock market data.
As a result, people have gradually begun to use computer algorithms to invest in the
stock market. In the whole trading process, predicting stock price movement is an
extremely important step.
In today’s constantly growing economy, predicting and analyzing the stock market
is crucial because the market reflects the economic situation, but it is also a
challenging task because stock data is nonlinear. Other factors that make stock prices
difficult to predict include society, the economy, public opinion, media, trading
behavior, and politics. To make predictions, professional traders have created a series
of analytical methods, including technical, fundamental, and quantitative analysis.
The availability of massive data has prompted researchers to apply machine learning
to the stock market, and some of these studies have produced quite promising
results [8].
However, although the financial market was one of the earliest applications to adopt
machine learning (ML), its benefit from recent advances has been limited. Since the
1980s, people have been using ML to discover patterns in the market. ML has made
significant progress in predicting market outcomes, but recent deep learning has not
significantly improved the prediction of financial markets. Deep learning and other ML
technologies have made Alexa, Google Assistant, and Google Photos possible, yet there
seems to be little progress in the stock market [2, 3].
As stated in the Efficient Market Hypothesis, markets cannot be fully predicted,
which makes it extremely difficult to turn these research results into real-world
investment trading techniques and price predictions. Exploration in this field has
a long development history, from support vector machines to time series analysis,
and from random forests to memory neural networks. Technical analysis methods are by
now mature, but the inherent unpredictability of financial markets makes it difficult
for pure technical analysis to develop further [5]. Several problems stand out. Deep
models need large amounts of training data, and compared to datasets for image
classification, this obvious requirement is difficult to meet for most financial
datasets. Another issue is equally important: dealing with the difference between the
future and history is a common challenge in machine learning. In addition to ensuring
that the test and training datasets have similar distributions, it is also necessary
to ensure that the model is applied only when future data follows the
training/validation distribution [6]. The final problem is also the fundamental one.
High-frequency trading and algorithmic trading dominate short-term prices, while news
and public opinion dominate the price trend over many days. Different factors work
together in the financial system, which makes it difficult for simple single-factor
machine learning to cope.
Among all the factors that affect the stock price, such as transaction price and
trading volume, investor sentiment is the most intuitive and important [1]. The
earliest quantitative indicators of sentiment were derived from market transaction
data, such as the discount and turnover rate of closed-end funds. These data reflect
investor sentiment only indirectly, but under past technologies and data conditions,
quantifying indirect indicators was the best available means. Since then, many
researchers have made improvements, hoping to use more direct means to learn the views
of investors. With the progress of Internet and computer technology, analyzing stock
market sentiment through text data has become a commonly used method in recent years.
The Internet provides enough text data (such as news, we media, and Twitter) for such
methods. Artificial intelligence algorithms analyze investors' emotional text with
natural language processing in order to find patterns in stock prices [4, 15].
Compared with traditional methods, such text data contain more emotional information
and cover a wider range, but their credibility is not sufficient. Removing fake news
to improve the quality of the dataset can therefore be a feasible way to improve
model performance [7].
At the same time, machine learning models for fake information detection have
matured: both simple models based on the LIAR dataset and more complex models
perform well [16, 19]. We can therefore use a fake information detection model to
screen the collected emotional texts and so improve the stock price prediction model.
8.2 Experiment
8.2.1 Methods
We reproduce a stock price forecasting model based on emotional text analysis. Price
data for several NASDAQ stocks and related news from media such as Reuters are
collected and then preprocessed: uppercase letters are replaced by lowercase letters,
punctuation is removed, tense and singular/plural forms are unified, and stop words
are removed [9, 10].
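The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the pipeline used in the experiment: the stop-word list, the suffix rules, and the names `STOP_WORDS`, `crude_stem`, and `preprocess` are all assumptions introduced here, and the suffix stripping is only a crude stand-in for proper stemming or lemmatization.

```python
import string

# Illustrative stop-word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "or",
              "is", "are", "was", "were"}

# Crude suffix rules standing in for proper stemming or lemmatization.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip one common inflectional suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, drop punctuation, remove stop words, normalize inflections."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [crude_stem(tok) for tok in tokens if tok not in STOP_WORDS]


tokens = preprocess("Stocks jumped after the earnings reports.")
print(tokens)  # ['stock', 'jump', 'after', 'earning', 'report']
```

The order matters slightly: stop words are filtered before stemming so that the list can be written with ordinary word forms.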
First, the words of the preprocessed news are passed through a learnable embedding
matrix (randomly initialized) and mapped into a vector
$f = \begin{pmatrix} p \\ n \end{pmatrix} \in \mathbb{R}^{m+q}$, in which $n$
represents the features extracted from the news and $p$ represents the price data of
the ticker related to the news. A convolutional neural network is trained on all these
vectors [11]. Cross entropy is used as the loss function, and normal noise
proportional to the learning rate is added during gradient descent, so that training
follows Stochastic Gradient Langevin Dynamics and becomes more robust. Because model
training is random, we train the model multiple times on the same training data,
select the model that performs best on the validation set, and test its performance on
the test set.
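A single noisy gradient step of this kind can be sketched as below. This is an illustration under stated assumptions: the noise standard deviation $\sqrt{2 \cdot lr}$ follows the common SGLD convention, whereas the chapter only says the noise scales with the learning rate, and the function names `learning_rate` and `sgld_step` are hypothetical. The decaying schedule mirrors the learning rate $0.001 \times \text{epoch}^{-1}$ stated later in this section.

```python
import math
import random


def learning_rate(epoch: int, base_lr: float = 0.001) -> float:
    """Decaying schedule base_lr / epoch, with epochs counted from 1."""
    return base_lr / epoch


def sgld_step(params, grads, lr, rng):
    """One SGLD-style update: a gradient-descent step plus Gaussian noise.

    The noise standard deviation sqrt(2 * lr) is an assumption taken from the
    usual SGLD convention, not a value given in the chapter.
    """
    noise_std = math.sqrt(2.0 * lr)
    return [p - lr * g + rng.gauss(0.0, noise_std) for p, g in zip(params, grads)]


rng = random.Random(0)          # seeded so that the run is reproducible
params = [1.0, -2.0]            # toy parameters
grads = [0.5, -0.5]             # toy gradients of the cross-entropy loss
for epoch in range(1, 4):
    params = sgld_step(params, grads, learning_rate(epoch), rng)
print(params)
```

Because the injected noise shrinks together with the learning rate, later epochs behave more and more like plain gradient descent.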
We then took an open-source fake news detection model [14] and adapted it. The model
is trained on the LIAR dataset [12, 13]; it mainly extracts features from the content
of the news and also employs an attention mechanism to exploit side information such
as the subject, author, and title. A learnable embedding matrix maps words to vectors.
A context query vector $q \in \mathbb{R}^{1 \times d}$ records the summarized side
information. A convolutional neural network (CNN) is applied over the word
representation matrix $E \in \mathbb{R}^{n \times d}$, where $n$ is the number of
words in the statement, and extracts features $f_c \in \mathbb{R}^{F \times n \times 1}$
from $E$. The CNN model learns from these features to output an authenticity rating as
well as a fakeness rating.
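To make the shapes concrete, the sketch below shows one generic way a query vector of length d can weight the n rows of a word matrix. It is not the adapted model's actual mechanism, which the text does not specify: scaled dot-product scoring is an assumption, and `softmax` and `attend` are hypothetical helper names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(E, q):
    """Weight each word vector in E (n x d) by its scaled dot product with q (length d)."""
    d = len(q)
    scores = [sum(e_i * q_i for e_i, q_i in zip(row, q)) / math.sqrt(d) for row in E]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the rows of E.
    context = [sum(w * row[j] for w, row in zip(weights, E)) for j in range(d)]
    return weights, context


E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy embeddings, n = 3 words, d = 2
q = [2.0, 0.0]                             # toy side-information query
weights, context = attend(E, q)
print(weights, context)
```

Words whose embeddings align with the query receive larger weights, which is the sense in which side information can steer what the model reads from the statement.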
We collate the collected news data of the different stocks, use the fake news
detection model to score their authenticity, keep the news with higher authenticity
scores, and retrain the stock price prediction model on the filtered data. We then
compare the accuracy of the models obtained under different authenticity thresholds.
For the baseline, we train the model for 500 epochs with no early stopping. The
learning rate was set to $0.001 \times \text{epoch}^{-1}$. We train 4 other models on
datasets processed under different Authenticity Standards (we compute the ratio of the
authenticity score to the fake score; if the ratio is higher than the standard, the
news is accepted) [17, 18].
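The acceptance rule can be written down in a few lines. The sketch below is illustrative only: the `ScoredNews` container, the function `filter_by_standard`, and the example scores are assumptions, and scores are assumed to be non-negative.

```python
from dataclasses import dataclass


@dataclass
class ScoredNews:
    text: str
    authenticity: float  # detector's score that the item is genuine
    fakeness: float      # detector's score that the item is fake


def filter_by_standard(items, standard):
    """Keep items whose authenticity-to-fakeness ratio exceeds the standard.

    The comparison is written as authenticity > standard * fakeness, which is
    equivalent to the ratio test for positive fakeness and avoids dividing by zero.
    """
    return [item for item in items if item.authenticity > standard * item.fakeness]


news = [
    ScoredNews("Company A beats earnings estimates", 0.6, 0.4),   # ratio 1.5
    ScoredNews("Company B secretly bankrupt, insiders say", 0.3, 0.7),  # ratio ~0.43
]
print([n.text for n in filter_by_standard(news, 0.5)])
```

Raising the standard shrinks the training set, so each threshold trades data volume against data quality.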
We use cross entropy as the loss and record the accuracy of each model. We also
record the accuracy restricted to predictions (a variable between 0 and 1, in which 1
represents an increase and 0 a decrease) that are >0.7 or <0.3, and to predictions
that are >0.9 or <0.1. Because of the randomness in model training, we repeat training
5 times for each model and select the runs with better performance.
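The confidence-restricted accuracy described above can be computed as in the following sketch. The function name `accuracy`, its parameters, and the toy predictions are assumptions for illustration; in particular, treating 0.5 as the cut-off for the unrestricted accuracy and skipping predictions exactly at the cut-off are choices made here, not details given in the text.

```python
def accuracy(predictions, labels, low=0.5, high=0.5):
    """Accuracy over predictions that are > high (up) or < low (down).

    With low = high = 0.5 every prediction except exactly 0.5 is counted.
    Returns None when no prediction falls in the confident region.
    """
    correct = 0
    counted = 0
    for p, y in zip(predictions, labels):
        if p > high:
            guess = 1
        elif p < low:
            guess = 0
        else:
            continue  # not confident enough, leave this sample out
        counted += 1
        correct += int(guess == y)
    return correct / counted if counted else None


preds = [0.95, 0.8, 0.4, 0.05, 0.6]   # toy model outputs
labels = [1, 0, 0, 0, 1]              # 1 = price rose, 0 = price fell
print(accuracy(preds, labels))             # all predictions
print(accuracy(preds, labels, 0.3, 0.7))   # prediction >0.7 or <0.3
print(accuracy(preds, labels, 0.1, 0.9))   # prediction >0.9 or <0.1
```

Tighter thresholds leave fewer samples in the evaluation, so the restricted accuracies are estimated from less data than the overall figure.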
From Table 8.1 we can see that the loss improves by up to 10.74% and the accuracy by
up to 4.37%. Although the improvement is modest, compared with the baseline, the
models trained on data filtered under the different standards all improve steadily in
loss and accuracy (except for accuracy when the prediction is >0.9 or <0.1). The
smallest improvement in loss is 7.33%, obtained with standard 0.5, and the smallest
improvement in accuracy is 0.29%, obtained with standard 0.6. This shows that fake
news detection helps improve the quality of the dataset and the performance of the
stock price movement prediction model (Fig. 8.1).
Fig. 8.1 Each model is trained 5 times and the runs with better performance are selected
learning technology to analyze emotional text can obtain more comprehensive and
sensitive information, better understand market conditions, and assist investors in
making wiser investment decisions.
Secondly, using machine learning techniques to analyze emotional text to predict
stock prices can improve the regulatory capabilities of financial regulatory agencies.
These agencies need to understand and regulate market conditions to ensure market
stability and fair competition. By using machine learning technology to analyze
emotional text, they can obtain more comprehensive market information, better monitor
market risks, and better protect the interests of investors.
Of course, using machine learning techniques to analyze emotional text to predict
stock prices also faces some challenges and limitations. The biggest challenge is
how to accurately identify and quantify emotional signals and relate them to actual
fluctuations in stock prices. In addition, it is necessary to consider multiple factors that
affect stock prices, such as the company’s financial situation, market environment,
etc., to ensure the stability and accuracy of the prediction results.
Overall, using machine learning techniques to analyze emotional text to predict
stock prices has broad application prospects in the future. With the continuous
improvement of artificial intelligence technology and data processing capabilities,
this technology is expected to become an important reference for financial market
investment decisions and have a positive promoting effect on financial regulation.
Any technology has its drawbacks, and for predicting stock prices through emotional
text analysis, the following points should be noted:
Firstly, the analysis results of emotional texts are uncertain. Sentiment analysis is
influenced by many factors, such as the context of the text, cultural background, and
personal emotions. Using it as the main basis for predicting stock prices may
therefore introduce errors and biases.
Secondly, the stock market is very complex. In addition to emotional factors, the
stock market is also influenced by numerous economic, political, and social factors.
The changes in these factors may have a greater impact on stock prices, while the
impact of emotional texts is relatively small.
Finally, the reliability of the data needs to be ensured. When using machine
learning technology to analyze emotional texts, it is necessary to ensure the
reliability of data sources and the accuracy of data annotation. Problems in the data
may affect the results of the analysis and thereby the accuracy of the prediction.
Therefore, using machine learning technology to analyze emotional texts to predict
stock prices requires comprehensive consideration of multiple factors, as well as
strict screening and validation of the data, to ensure the accuracy and reliability of
the prediction results. The approach presented in this chapter aims to compensate for
shortcomings in data filtering. We hope that future research will further explore the
role and methods of data filtering and will also find solutions to the other key
difficulties.
References
1. H. Lee, M. Surdeanu, B. MacCartney, et al., On the importance of text analysis for stock price
prediction. surdeanu info (2014)
2. J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional
transformers for language understanding (2018). arXiv preprint arXiv:1810.04805
3. M. Welling, Y. Whye Teh, Bayesian learning via stochastic gradient Langevin dynamics. ICML
(2011)
4. S. Kogan, T.J. Moskowitz, M. Niessner, Fake news: evidence from financial markets (2019).
Available at SSRN 3237763
5. T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector
space (2013). arXiv preprint arXiv:1301.3781
6. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L.
Antiga, A. Lerer, Automatic differentiation in PyTorch (2017)
7. J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional
transformers for language understanding, in Proceedings of the 16th Annual Conference of the
North American Chapter of the Association for Computational Linguistics (2018)
8. R.A. Kamble, Short and long-term stock trend prediction using decision tree (2017)
9. N. Vo, K. Lee, Hierarchical multi-head attentive network for evidence-aware fake news
detection (2021)
10. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, I. Polosukhin,
Attention is all you need, in Advances in Neural Information Processing Systems, pp. 5998–6008
(2017)
11. J. Pennington, R. Socher, C. Manning, GloVe: global vectors for word representation,
in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing
(EMNLP), pp. 1532–1543 (2014)
12. W. Yang Wang, “liar, liar pants on fire”: a new benchmark dataset for fake news detection,
in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics
(Volume 2: Short Papers) (2017)
13. S. Madge, Predicting stock price direction using support vector machines (2015)
14. P. Shi, J. Rao, J. Lin, Simple attention-based representation learning for ranking short social
media posts (2018). arXiv preprint arXiv:1811.01013
15. Y. Guo, Stock price prediction based on LSTM neural network: the effectiveness of news
sentiment analysis (2020)
16. Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, Q.V. Le, XLNet: generalized
autoregressive pretraining for language understanding (2019). arXiv preprint arXiv:1906.08237
17. E. Ranjan, Fake news detection by learning convolution filters through contextualized attention
(2019)
18. X. Ding, Deep learning for event-driven stock prediction. IJCAI (2015)
19. W.Y. Wang, “liar, liar pants on fire”: a new benchmark dataset for fake news detection (2017).
arXiv preprint arXiv:1705.00648
20. H. Grigoryan, A stock market prediction method based on support vector machines (SVM) and
independent component analysis (ICA) (2016)
21. T. Loughran, B. McDonald, When is a liability not a liability? Textual analysis, dictionaries,
and 10-Ks. J. Financ. 66(1), 35–65 (2011)
22. Y. Kim, Convolutional neural networks for sentence classification (2014). arXiv preprint
arXiv:1408.5882