
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2021.3077067, IEEE Access


Interpretable Stock Anomaly Detection Based on Spatio-temporal Relation Networks with Genetic Algorithm
MEI-SEE CHEONG¹, MEI-CHEN WU¹, and SZU-HAO HUANG²
¹ Institute of Information Management, National Chiao Tung University, Hsinchu 30010, Taiwan
² Department of Information Management and Finance, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan
Corresponding author: Szu-Hao Huang ([email protected])
This work was supported in part by the Ministry of Science and Technology, Taiwan, under Contract MOST 110-2622-8-009-014-TM1, Contract MOST 109-2221-E-009-139, Contract MOST 109-2622-E-009-002-CC2, and Contract MOST 109-2218-E-009-015; and in part by the Financial Technology (FinTech) Innovation Research Center, National Yang Ming Chiao Tung University.

ABSTRACT Instability in financial markets represents a considerable risk to investors; examples of
instability include a market crash caused by systematic risks and abnormal stock price volatility caused
by artificial hype. The early detection of abnormal behavior can help investors adjust their strategy and
reduce investment risks. We proposed a spatiotemporal convolutional neural network–based relational
network (STCNN-RN) model that can learn the complex correlations between multiple financial time-
series data sets, and we used genetic algorithms with a constrained gene to discover the time points for
outlier companies by fitting the STCNN-RN model; we used these outlier points to identify abnormal
situations. Most research on identifying anomalous patterns has been unable to sufficiently explain the
reason for anomalies to investors. We applied an interpretability model to enable investors to understand
these anomalous time points in relation to companies and discover the key factors giving rise to the
anomalies. The experiment results revealed that the proposed model can be used to model multiple financial
time-series data sets and to capture anomalous situations in relevant companies. Because this study explored
the discovery of anomaly phenomena in all transaction data and the explanation of these abnormalities,
investors can understand a stock market situation holistically.

INDEX TERMS Anomaly Detection, Genetic Algorithm, Interpretable Model, Relation Network,

I. INTRODUCTION
Although artificial intelligence has proved useful in numerous fields and has been applied to financial problems, technologies for processing financial market data are not as mature as those for processing general text or image data. General scientific data can be modeled and analyzed simply. Text or image data can be analyzed from a purely scientific perspective, and common logic can be used to summarize a set of logical, potential, or inherent rules. Financial data tend to change due to factors relating to the environment, politics, the military, and the media. Because numerous factors can affect financial data, it is difficult to summarize the main factors that cause changes in the financial markets. First, it is necessary to collect raw data with complete and multiple characteristics in a short time; second, deep learning must be used to model the current situation of the financial market; and third, the prediction results must be close to the real price changes. Thus, the challenge is monumental. Even though representative mathematical methods or prediction models have delivered highly accurate results for less challenging problems, they have delivered less accurate results for financial data. Even some promising prediction results have suddenly failed, and such prediction methods seem far less effective than judgments and decisions made by humans using past experience.
The financial markets are highly complex, chaotic, and continuously dynamic environments that exert substantial effects on economies. Therefore, the rise and fall of stock prices substantially affect investor earnings. Predicting stock market prices or trends is a major and difficult challenge. In the early twenty-first century, considerable research has addressed this challenge. Machine learning methods have been used to predict stock prices. Alkhatib et al. [1] used a k-nearest neighbor (KNN) algorithm and nonlinear regression

VOLUME 4, 2016 1

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

to predict stock prices; their experiment results revealed that the prediction results of the KNN algorithm were highly similar to actual stock prices. Fenghua et al. [2] employed various economic features in a support vector machine (SVM) to make price predictions. The results indicated that predictive methods that combine the price features into SVMs have stronger performance. Hegazy et al. [3] proposed an algorithm that integrates particle swarm optimization (PSO) and a least-squares SVM (LS-SVM) to predict stock prices. The results revealed that the proposed model had a more favorable prediction accuracy and that the PSO algorithm has potential for optimizing an LS-SVM. Kazem et al. [4] proposed a forecasting model based on chaotic mapping, a firefly algorithm, and support vector regression (SVR) to predict stock prices. Compared with related algorithms, the proposed model exhibited the highest performance in terms of two error measures: mean squared error (MSE) and mean absolute percentage error (MAPE). The preceding discussion indicates that machine learning techniques exhibit favorable performance in stock market price prediction.
In the past, machine learning methods used human knowledge for feature extraction from data. The difference between machine learning and deep learning is that deep learning uses a multilayer neural network to scrutinize the data and extract relevant characteristics [5]. Numerous deep learning methods have been published. Bao et al. [6] proposed a novel deep learning framework in which wavelet transforms (WTs), stacked autoencoders (SAEs), and long short-term memory (LSTM) are combined for stock price forecasting. Their results revealed that the proposed model outperforms other similar models in both predictive accuracy and profitability performance. Hafezi et al. [7] proposed a bat-neural network multiagent system (BNNMAS) to predict stock prices. The results revealed that the BNNMAS performs accurately and reliably; thus, it can be considered a suitable tool for predicting stock prices, especially over the long term. Khare et al. [8] used feedforward neural networks and recurrent neural networks (RNNs) to forecast short-term stock prices. Their results indicated that the feedforward multilayer perceptron outperforms LSTM at predicting short-term stock prices. Selvin et al. [9] identified the latent dynamics in data by using deep learning architectures. These researchers employed RNN, LSTM, and convolutional neural network (CNN) architectures for price prediction of National Stock Exchange–listed companies and compared the performance of these architectures. The results revealed that the proposed system is capable of identifying some interrelations within the data. These results highlight that a CNN architecture can be applied to identify changes in trends. The preceding discussion reveals that many studies on deep learning techniques have focused on stock price forecasting.
Some studies have used statistical models or traditional financial methods [10], [11] to predict stock prices and trends. However, we believe that financial market data already contain some of the existing knowledge. Any financial system exhibits interactions between financial commodities. Scholars have published numerous reasoning models for objects or entities; these models may reason about the relations among the original data. Among reasoning models, relational networks (RNs) [12] are the most common. The method of Santoro et al., proposed at DeepMind in 2017, involves the use of RNs as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning. To date, most RN studies have focused on pattern recognition and classification tasks.
In addition to studying the interactions between financial products to help investors understand financial markets, many studies have found that anomaly detection plays a vital role in financial investigations. Detecting anomalies can help investors make investment decisions and can also reduce investment risks. Scholars have applied anomaly detection to crowded scenes [13], pricing data [14], network intrusion [15], hyperspectral images [16], and various other topics. For research on anomaly detection in the financial field, the vast majority of studies have used traditional methods as the main research tool. Typical anomalies in the financial market can be divided into market anomalies and pricing anomalies. “Market anomalies” refers to differences in returns that contradict efficient-market assumptions. “Pricing anomalies” means that the pricing of something (such as stocks or securities) is different from the pricing predicted by the model; the two most representative models are the capital asset pricing model (CAPM) [17] and the Fama–French three-factor model [18]. The CAPM and Fama–French three-factor model are linear models of traditional finance. Compared with existing deep learning models, these two models do not completely fit the relevant theories. The complexity inherent in existing deep learning models enables them to simulate these theories well, indicating that existing complex neural network models have a certain generalization ability. Researchers have also applied deep learning techniques to financial market data [19], [20] for anomaly detection.
Most research on anomaly detection has not been able to explain the reasons for anomalies, nor has it effectively alerted investors to crucial details. The frequency of anomalous events is irregular, and no clear definition of anomalies can be found in financial data. Financial market data do not have exact labels, making them dissimilar to text or image data when used in supervised learning; this complicates the use of traditional neural networks for training, detection, and response.
In the context of the aforementioned shortcomings and concerns, this paper presents methods for learning stock market trends and capturing the anomalous time points relevant to companies. It is vital to explain and to analyze the causes of anomalies. However, financial deep learning is currently limited to learning the interactive relationship between two sequences. Even though investors seek to comprehend the trends of entire financial markets, the existing models simply cannot meet this challenge. However, research on applying RNs to the visual question answering (VQA) framework has

been quite successful. Such systems perform image recognition tasks accurately, and the results for some data sets are superior to human judgments. Given that the input data applied to the learning task of the RN are mostly image data and an RN model can perform object reasoning, we propose a spatiotemporal convolutional neural network–based relational network (STCNN-RN), which can simultaneously process spatiotemporal data and consider the complex interactive relationships of the input data. Initially, we use an RN to construct our model, and when our model successfully fits the training data during a training process, we must discover the most striking outlier time points that cannot fit the model. Such discovery is a complex optimization problem, so we use a genetic algorithm with a constrained gene (GACG) to discover the most notable outliers among the anomalous time points. Consequently, we contend that when the model fits most of the training data, the time points that cannot fit the model are anomalous time points that are incompatible with the entire financial market. This fundamentally describes our strategy to discover anomalies. To enable investors to understand these anomalous time points, we use the local interpretable model-agnostic explanations (LIME) interpretable model to discover the key factors of abnormalities.
This study proposes an effective anomaly detection system. This system includes an improved RN to learn the performance levels and interactions of various companies in the stock market, the results of which can help investors comprehend the anomalous time points exhibited by companies. Furthermore, investors can be informed of the causes of anomalies and can improve their understanding of anomaly patterns. The research framework that we have established can be used to conduct arbitrage. Despite numerous publications on anomaly detection and analytical methods, most have not addressed time series data, and only a few studies have focused on financial data, such as accounting data [21], [22] and credit card data [19]. Unlike these approaches, we propose a novel system that can capture the relationships between multiple companies and examine anomalous trends by using time series data and analyzing the causes of anomaly patterns.
The major contributions of this research can be summarized as follows:
• RN-based market model: Our proposed research framework is the first to use the STCNN-RN to model the complex cross-correlations between various companies in the stock markets; this framework can fit regular market behaviors between each pair of companies.
• Genetic algorithm–based anomaly detection method: Our proposed STCNN-RN and GACG can discover anomalous time points relevant to companies and can discover major events in stock markets.
• LIME-based interpretable model: We use the LIME interpretable model to analyze and explain the causes of anomalies. Using interpretable models to explain anomaly patterns makes it easier for investors to understand the market situation.
• Practical applications in various financial markets: We validated our studies by using multiple data sets. Minute price data of S&P 100 constituent stocks were taken from Wharton Research Data Services Trade and Quote (WRDS TAQ) [23]. Minute price data of FTSE TWSE Taiwan 50 index constituent stocks were taken from the Taiwan Stock Exchange Corporation (TWSE). Daily price data of SSE 50 index constituent stocks were taken from the Shanghai Stock Exchange (SSE).
The remainder of this paper is organized as follows: In section 2, a brief introduction to some related works is provided. Section 3 presents the proposed STCNN-RN and GACG method, and the experimental results are provided in section 4. Finally, the conclusion of this paper and future research suggestions are provided in section 5.

II. RELATED WORKS
In this section, we explain our learning-to-reason model and how we perform anomaly detection through deep learning and explainable artificial intelligence approaches.

A. LEARNING TO REASON MODEL
We are convinced that existing knowledge and information exist in financial market data. In financial market information, various financial commodities have interactive relationships with the financial market, and various relationships operate between different financial commodities. The learning-to-reason model can reason about (and discover) the relationships between various objects or entities. This learning-to-reason model entails the process of discovering whether the system has meaningful patterns of information flow or data transformation [24].
To analyze the relationships between time series data and to model such data, some researchers have used traditional statistical methods and complex mathematical models. Podobnik et al. [25] proposed a method based on detrended cross-correlation analysis (DCCA) in physics, physiology, and finance to analyze the relationship between two sequences; their method may have added diagnostic capabilities to the statistical methods that were current at the time. Wang and other researchers [26] conducted research on the China stock market. From the perspective of statistical analysis, the authors applied detrended cross-correlation analysis to the return series of the China A-share and B-share markets and found that the system has long-term and short-term cross-correlations. Li and Liu [27] conducted research on cross-correlations between the agricultural commodity markets and the oil markets. The authors also used the DCCA method to discover that high oil prices caused the food crisis between 2006 and 2008. Kullmann et al. [28] sought a time correlation of returns between New York Stock Exchange stocks and studied whether the return of one stock can affect the return of another stock at different times.
The aforementioned research sought interactive relationships between various time series data, but those studies only modeled the cross-correlation between two sequences.
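The two-sequence setting that these studies share can be illustrated with a minimal sketch. The code below computes an ordinary lagged Pearson correlation between two return series, in the spirit of asking whether one stock's return leads another's; it is not the detrended cross-correlation analysis of [25]–[27]. The function names `pearson` and `lagged_correlation` and the toy return values are assumptions made for illustration only.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag] for a non-negative lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])


# Toy returns (made-up numbers): stock B repeats stock A one step later.
returns_a = [0.01, -0.02, 0.03, 0.00, -0.01, 0.02]
returns_b = [0.00, 0.01, -0.02, 0.03, 0.00, -0.01]

# Pick the lag (0, 1, or 2 steps) at which A best explains B.
best_lag = max(range(3), key=lambda k: lagged_correlation(returns_a, returns_b, k))
print(best_lag)
```

A measure of this kind covers only one pair of series at a time, so a market of N companies needs on the order of N² separate comparisons and still yields no joint model.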

Because the financial market is a complex and changeable environment, we require a model that can learn from multiple time series data sets to capture the complex interactions of financial market data. To surmount the previously discussed shortcomings, a novel model called RN [12] was proposed. RN is a neural network model proposed by DeepMind in 2017 to solve the relational reasoning problem. Most applications of RNs focus on pattern recognition and classification, such as object detection, few-shot learning [29], and image recognition [30]. Just as CNNs have spatial translation invariance, RNs are inherently capable of relational reasoning. By constraining the functional form of a neural network, an RN has the core common properties of relational reasoning. Scholars have researched image recognition by combining an RN with a VQA system [31]. The system performed exceptionally well with both three-dimensional-rendered objects and a text-based VQA data set.
In light of the aforementioned research, we observe that the learning-to-reason model has some deficiencies. Most of the current research on RNs has focused on image tasks for learning, but few studies have been published in the financial field. Much can be improved in the existing RN model. Because the tasks that the existing RN must address are pattern recognition and classification, most tested data sets consist of images, and thus, the relation module focuses on the spatial relationships among data. We contend that as long as different convolution operations are used on the input data, the relation module can be arbitrarily mobilized, meaning that both spatial and temporal relationships can be addressed. Given the success of the RN in pattern recognition and classification and its ability to process spatiotemporal data, we propose that the RN has the ability to process financial data in multiple time series. Therefore, we further contend that the RN can be used to learn the characteristics of the inference problem between objects and to discover the interactive relationships between different time points relevant to companies.

B. ANOMALY DETECTION USING DEEP LEARNING
Anomaly detection techniques can identify anomalies, novel data, and outlier data within vast quantities of data. Zenati et al. [32] presented high-performance generative adversarial networks (GANs) for anomaly detection using image data sets and network activity data sets. Leangarun et al. [33] demonstrated a model that merged a long short-term memory network (LSTM) and GANs to detect stock price manipulation. This was the first study that used LSTM-GANs to investigate stock price manipulation using time series data. However, Wang et al. [34] found that a company’s characteristic features can effectively be used to distinguish between manipulated and nonmanipulated stocks. Published models have rarely incorporated this type of discriminatory feature. The authors applied an RNN-EL model for stock price manipulation problems using data sets with trading data and characteristic company features. Islam et al. [35] proposed detecting illegal insider trading of stocks by using proactive data mining on illegal insider trading cases and historical stock volume data. Furthermore, this research was the first such study of illegal insider trading using real cases.
Schreyer et al. [20] argued that the general anomaly detection method is to discover existing anomaly patterns from existing accounting data. Although a set of manual rules for catching anomalies can be learned from these existing anomaly patterns and the effect can be exceptional, it is to be expected that fraudsters will gradually discover methods to avoid these anomaly detection tools. Therefore, the authors proposed a method based on deep autoencoder neural networks to detect anomalous accounting data. Their results proved that the F1 scores obtained by this method were higher than those of the benchmark method and the false alarm rate was also lower than that of the benchmark method. Roy and other researchers [19] conducted research on anomaly detection in credit card data. They sought to discover anomalies in credit card fraud from existing credit card transaction data and historical customer data, thus providing a solution to the problem of credit card fraud detection. They proposed a deep neural network model to solve the problem of credit card fraud based on deep learning methods. LSTM and gated recurrent unit (GRU) models were found to perform significantly better than a typical neural network at distinguishing abnormal transaction data from typical transaction data.
Few studies have been published on anomaly detection for time series data in the deep learning field; the present paper explains a deep learning method that can discover anomalous time points for stocks and companies in time series data. Few studies have applied deep learning methods to detect fraud in accounting data or assess whether credit card transaction data contain fraudulent data. To date, no studies have used deep learning to discover anomalies in stock data.

C. EXPLAINABLE ARTIFICIAL INTELLIGENCE APPROACHES
Deep learning has developed rapidly in the early twenty-first century. Deep learning tasks (e.g., natural language processing, image processing, recommendation systems, and anomaly detection) have been distinguishing themselves by high performance levels. Among typical deep learning tasks, image processing is the most commonly studied; it is a core research field of computer science and engineering. Of the successful applications of deep learning, training CNNs to recognize patterns and images has received considerable attention. CNNs that recognize patterns and images tend to encounter problems and vulnerabilities, such as some outliers or adversarial examples that can confuse neural networks.
Therefore, common algorithms and methods for interpretability and model inspection can be applied to neural networks [36]. Those methods can help us understand how a model works. Studies in a variety of fields have proven that intrinsically interpretable models can look at internal model parameters and make self-interpretations. A considerable number of studies of counterfactual explanations have changed some of the features to change predicted outcomes in a relevant manner. Several representative methods
have been published for interpretable models. For example, Ribeiro and colleagues [37] proposed LIME, a method that interprets the prediction of any classifier in an interpretable and faithful manner by learning a local interpretable model around the prediction. Through experiments, the authors proved that the effectiveness of interpretation can be demonstrated in various situations that require trust. In 2018, Ribeiro et al. [38] proposed an algorithm that can explain a black-box model with a high-probability guarantee and proved that the algorithm can predict the behavior of the model with less-than-usual effort and higher-than-usual accuracy. Guidotti et al. [39] considered decision-making systems with high accuracy but ambiguous judgment reasons. To explain these black-box subsystem decisions, the authors required some interpretation tools to reveal the reasons for predicting variables to make specific decisions. Therefore, LORE was proposed, which can provide interpretable and faithful explanations. The experiment results proved that the proposed method is superior to the benchmark method.
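The idea of a local surrogate model can be sketched in a few lines. The example below is a simplified illustration, not the method of [37] as published and not the API of any LIME library. It perturbs an instance, weights each perturbed sample by its proximity to the instance, and fits a weighted linear model whose coefficients serve as the explanation. LIME's interpretable binary representation and sparse feature selection are omitted. The names `explain_locally`, `solve_linear_system`, and `toy_model`, along with the kernel width and sampling scale, are assumptions chosen for illustration.

```python
import math
import random


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def explain_locally(black_box, instance, n_samples=500, scale=0.5, width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around `instance`.

    Returns [intercept, weight_1, ..., weight_d]; a larger |weight| means the
    feature matters more to the black box near this instance.
    """
    rng = random.Random(seed)
    d = len(instance)
    gram = [[0.0] * (d + 1) for _ in range(d + 1)]  # accumulates X^T W X
    moment = [0.0] * (d + 1)                         # accumulates X^T W y
    for _ in range(n_samples):
        sample = [v + rng.gauss(0.0, scale) for v in instance]
        dist_sq = sum((s - v) ** 2 for s, v in zip(sample, instance))
        weight = math.exp(-dist_sq / (width ** 2))   # closer samples count more
        row = [1.0] + sample
        target = black_box(sample)
        for i in range(d + 1):
            moment[i] += weight * row[i] * target
            for j in range(d + 1):
                gram[i][j] += weight * row[i] * row[j]
    return solve_linear_system(gram, moment)


def toy_model(x):
    """Stand-in black box (hypothetical); here it happens to be linear."""
    return 3.0 * x[0] - 2.0 * x[1] + 1.0


coefficients = explain_locally(toy_model, instance=[0.2, -0.1])
print([round(c, 6) for c in coefficients])
```

Because the stand-in black box is itself linear, the surrogate recovers its coefficients almost exactly. For a nonlinear model the fitted weights would describe its behavior only in the neighborhood of the chosen instance.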
Some scholars have found that anomaly detection studies
have not explained to investors why such anomalies are
caught; the user cannot understand the reason for the forma-
tion of the anomaly. We expect that in addition to catching
anomalies related to stock time series data, we can also
explain the anomalies that were caught, enabling investors
to understand the reasons for anomalies.
The preceding research summary indicates that the linear models of traditional finance do not completely fit the relevant theories. The complexity of deep learning models enables them to simulate these theories accurately, indicating that existing complex neural network models have a certain generalization ability. Most studies have employed deep learning methods to identify the interactive relationship between two sets of financial time-series data. Our contribution involves determining the interaction between various companies at multiple time points. Moreover, most research on RNs has focused on image tasks for learning, but few studies have investigated anomaly detection for time-series data in the financial field. Our proposed research framework is the first to use RNs to model the complex cross-correlations between various companies in stock markets.
Financial market data do not have exact labels, making them dissimilar to text or image data when used in supervised learning. Therefore, traditional neural networks have some difficulty when training with such data. Our proposed STCNN-RN can learn the complex correlations between multiple financial time-series data sets; by using genetic algorithms with a constrained gene (GACG) to discover the time points for outlier companies, we can employ these outlier points to identify the abnormal situation.
Finally, most anomaly detection studies have been unable to explain to investors why such anomalies are identified and have provided investors with few crucial details; thus, the user cannot understand the reason for the formation of the anomaly. We used the local interpretable model-agnostic explanations (LIME) model to analyze and explain the causes of anomalies, thus helping investors to understand the market situation.

III. PROPOSED METHOD
In this section, we first present an overview of our system in Section 3.1, and we then introduce the STCNN-RN and GACG, our main proposed methods, in detail in Section 3.2 and Section 3.3, respectively.

A. SYSTEM OVERVIEW
Fig. 1 displays an overview of our system. The ultimate goal of the system is to identify the most abnormal time points in the financial time-series data. The anomaly information that is caught must include which companies have anomaly phenomena at which times. Moreover, these caught anomalies should be explained reasonably and clearly, enabling system users or investors in the stock market to grasp the causes of the anomalies.
Fig. 1. Overview of our system
The main topic of this paper is anomaly detection. To perform anomaly detection on financial market data, two

Fig. 2. Network structure of the STCNN-RN (figure not reproduced; input label: Multiple Historical Stock Transaction Data)

major problems must be addressed: (1) how to select anomalous time points from among all companies and times and (2) how to analyze the causes of the abnormalities at the selected time points and determine the types to which they belong. The three major parts of the system we designed address these two problems. The first part is trend prediction (orange area in Fig. 1). For trend forecasting, we constructed a problem network structure that combines RNs and a visual question answering (VQA) system to capture the trends of diverse companies at various times and the interactions between them. We employed this approach because we believe that forecasting stock trends is more feasible than forecasting stock prices, and the learning task is more meaningful. After showing that our model can better fit the data, we expanded the number of stocks it learned from, from individual stocks to the entire stock market. This can help investors focus on the stocks they are interested in and subsequently gain insight into stock market trends.

The second part of the system involves anomaly detection (blue area in Fig. 1). In the preceding paragraph, we outlined how we fit the data to our model. After the data were fitted to our model, we introduced traditional genetic algorithms to improve and optimize the model. Through this optimization, we obtained genetic algorithms with constrained genes (GACGs) that meet the problem settings of our experiments. We used GACGs to conduct anomaly detection experiments on the VQA architecture combined with the input of the RN model. The GACGs identify the companies and the time points that are the strongest outliers in the model fit; thus, we endeavor to solve the first problem in our research through the first and second parts of our system.

The third part of the system involves anomaly explanation (green area in Fig. 1). We introduced the LIME model to explain the abnormal patterns identified in the first and second parts. Moreover, we use the introduced model to evaluate which features are more important for the model’s judgment; we call these features “key factors.” By providing such information, (1) we hope that investors can more clearly understand the trend of the entire market and anomalous phenomena, and (2) we address the second problem in our study.

B. STCNN-RN
In the previous chapter, we mentioned that the application of RNs combined with a VQA system to pattern recognition, classification tasks, and relational reasoning problems can be successful because an RN considers the relations between objects in unstructured data. Our RN can be combined with a VQA system to address strength-and-weakness and volatility question types, fit financial market data, and capture the interactions of an entire financial market. We combine the RN with a VQA system to meet our problem requirements and achieve the ultimate goal of our research. Our network architecture was inspired by the RN [12] proposed by DeepMind in 2017, which can process unstructured data such as an image or a series of sentences and implicitly infer the relations between the objects contained in it. We thus propose the STCNN-RN, which can process spatiotemporal data and manage the complex interactive relationships between multiple time series.

As demonstrated in Figure 2, we constructed a problem network structure that combines RNs and a VQA system to capture the trends of different companies at various times and the interactions between them. This is because we believe that forecasting stock trends is more straightforward than forecasting stock prices and the learning task is more meaningful. After proving that our model delivered excellent data

fitting, we expanded the number of stocks it learned from one stock to the entire stock market. This can help investors pay attention to the stocks they are interested in and gain insight into stock market trends.

A typical RN operates on objects in the simplest possible form, and thus it does not explicitly operate on images or natural language. The main contribution of the RN is to provide enough flexibility such that relatively unstructured inputs (such as CNN or LSTM embeddings) can be regarded as a set of objects in an RN. Although an RN would typically take the object representation as input, the system has no requirement to specify the semantics of the object, and the RN learning process can induce upstream processing, thereby generating a set of useful objects from the distributed representation. To explain the operation mode and principle of our proposed model in detail, we divide our model into four parts, namely Visual, Question, RN, and Answering.

1) Visual
The RN-with-VQA system has two inputs; a time series data set serves as one of the inputs of our model in the visual part, and this input also serves as one of the objects of our model. The CNN convolves the returned data, a time series matrix of size $C \times T$, where $C$ is the number of companies and $T$ is the length of time. Although we use a two-dimensional (2D) convolutional layer, the order of the companies in the input time series matrix does not matter. Therefore, it is illogical to perform convolution from top to bottom in the direction of the company axis. After 2D convolution, the information of each company can be collected according to the output unit setting of the previous convolution layer and then mapped to a one-dimensional feature map vector with different channel sizes. This may result in the loss of information about each company because the result is a weighted sum of all companies. Consequently, we use a one-dimensional (1D) filter of size $(1, w)$ for the 2D convolution operation, where $w$ is the window size of the 1D filter. The size of the filter along the company axis is locked to 1, and the 1D convolution operation is simulated by the filter length $w$. We can obtain the feature map of the input data through Eq. 1:

$$F_{k \times T'} = f\left(I_{C \times T} * k + b\right) \tag{1}$$

where $F_{k \times T'}$ represents the feature maps of the final convolutional layer, $I_{C \times T}$ denotes the input data of the model, $k$ represents the kernels (which can also be called filters) in the final convolutional layer, and $b$ indicates the bias of the convolution layer. We know nothing about what specific image features should constitute the object. Therefore, after convolution of the image, the feature map is marked with arbitrary coordinates indicating its relative spatial position and is regarded as an object of the RN. This means that “objects” can include backgrounds, specific physical objects, textures, combinations of physical objects, and various other items. This provides the model with great flexibility in the learning process.

2) Question
As previously discussed, this model has two inputs; one input belongs to the visual part, and the other belongs to the question part. A text or sentence can be used as the input data of our question part. In this part, we address the bottlenecks of existing research and argue that relative and interactive relationships exist between different financial commodities in the time series data of the financial market. Therefore, we use two question types to learn and model the stock trends. These questions are based on the volatility of the stock and the strengths and weaknesses of the stock. A question about stock volatility is usually not much different from the following example: we can ask “Is the volatility of company A larger than the volatility of company B?” or “Is the volatility of company A less than the volatility of company B?” If the problem is related to the strengths and weaknesses of stocks, the question becomes “Is Company A stronger than Company B?” or “Is Company A weaker than Company B?” and so on. The system thus always has four ways to pose a question.

Whether a relation between objects exists and is meaningful depends on the question. To obtain the question encoding $q$, we use one-hot encoding. The encoding results are ordered according to the order of the words. We encode the two companies in the question, and then we encode the type of question. We have two question types, one concerning the volatility of the stock and the other the strengths and weaknesses of the stock. Finally, we encode the condition type to which the problem belongs. In one-hot encoding, the number “1” indicates that the feature exists, and the number “0” indicates that the feature does not exist.

3) RN
The RN module has two inputs, which are the feature maps from the CNN and the problem encoding results from one-hot encoding. Now we must build an RN module. The input to this network is a set of “objects” $O = \{o_1, o_2, \ldots, o_n\}$. For example, the number of objects depends on the sequence length of the final CNN feature map. After convolution, each feature map must be 2D (which retains the company axis and time axis); the target time step and company information become a vector (which consists of the three-dimensional collected values output along with the 2D convolution layer). This retains company information by convolving each company separately. In addition, because the order of companies in the matrix is arbitrary, we avoid convolution in the direction of the company axis at the same time. If we want to extract a set of objects for relation calculation, we must use the third axis value of the final CNN as the object. We must mark these objects with time and company coordinates, express them in the form of a key vector, and feed the object pairs to the RN module. Then

we use two MLPs to infer the object relationships. First, the $g_\theta$ MLP calculates each object-to-object relationship, which represents a partial relation. The $f_\phi$ MLP then takes the element-by-element sum of the outputs of the $g_\theta$ MLP and calculates the overall relation. Finally, the RN module can learn the relationships between all the object pairs, embed the question (considered as a condition), and integrate all these components to answer the question. The simplest function of the RN model is formalized as

$$\mathrm{RN}(O) = f_\phi\left(\sum_{i,j} g_\theta\left((o_i, c_i, o_j, c_j), q\right)\right) \tag{2}$$

where the input to this network is a set of “objects” $O = \{o_1, o_2, \ldots, o_n\}$, $(o_i, o_j)$ represents each object-object pair, $c_i$ and $c_j$ denote the corresponding coordinates of each object, $q$ is the question encoding, $g_\theta$ stands for the MLP with parameters $\theta$ that calculates partial relations, and $f_\phi$ indicates the second MLP with parameters $\phi$ that calculates the entire relation. The RN module can consider all pairs of objects, the embedding of the question, the time tags, and the company coordinates of the objects; the RN can integrate all these components to answer the corresponding questions.
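As a minimal sketch of Eq. 2, the following pure-Python example sums a small $g_\theta$ over all object pairs and passes the result to $f_\phi$. The layer sizes, the random weights, the single-layer MLPs, and the helper names (`make_linear`, `linear`, `relu`, `relation_network`) are illustrative assumptions, not the architecture used in the paper.

```python
import random

def make_linear(n_in, n_out, rng):
    """Hypothetical helper: weight matrix (n_out x n_in) with small random values, zero bias."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b

def linear(x, layer):
    """Affine map w @ x + b on plain Python lists."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def relation_network(objects, coords, q, g_layer, f_layer):
    """Eq. 2: f_phi( sum over all (i, j) of g_theta((o_i, c_i, o_j, c_j), q) )."""
    total = None
    for o_i, c_i in zip(objects, coords):
        for o_j, c_j in zip(objects, coords):
            pair = list(o_i) + list(c_i) + list(o_j) + list(c_j) + list(q)
            partial = relu(linear(pair, g_layer))  # partial relation for one pair
            total = partial if total is None else [a + b for a, b in zip(total, partial)]
    return linear(total, f_layer)  # overall relation from the summed partial relations

rng = random.Random(0)
objects = [[0.1, 0.3], [0.2, -0.1], [0.0, 0.5]]  # toy objects, 2 features each
coords = [(0, 0), (1, 0), (2, 0)]                # assumed (company, time) tags
q = [1, 0, 0, 1, 1, 0]                           # toy one-hot question encoding
pair_dim = 2 * (2 + 2) + len(q)
g_layer = make_linear(pair_dim, 8, rng)
f_layer = make_linear(8, 1, rng)
out = relation_network(objects, coords, q, g_layer, f_layer)
```

Because the partial relations are summed over all pairs, the output does not depend on the order in which the objects are listed, which matches the statement that the order of companies is arbitrary.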

4) Answering
In the answering part, we obtain the output of the $g_\theta$ MLP, which is called the partial relation. The sum of all partial relations, taken element by element, forms the overall relation vector. This overall relation vector $V_{or}$ enters a multilayer feed-forward neural network $f_\phi$, and the last layer of the feed-forward neural network is processed through the sigmoid activation function in Eq. 3 to limit the output value range to between 0 and 1:

$$\mathrm{sigmoid}(t) = \frac{1}{1 + e^{-t}} \tag{3}$$

where $t$ denotes the output of $f_\phi$. The output of the model is the prediction for the corresponding question, which gives the predicted answer $\hat{A}$ as in Equation 4.

$$\hat{A} = \mathrm{sigmoid}\left(f_\phi(V_{or})\right) \tag{4}$$

To measure the accuracy of our proposed model's prediction, we can use binary cross-entropy to calculate the gap between the predicted answer and the actual answer. The formula of binary cross-entropy is

$$\mathrm{Loss}_{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ A_i \log(\hat{A}_i) + (1 - A_i) \log(1 - \hat{A}_i) \right] \tag{5}$$

where $N$ represents the output size of $\hat{A}$, which indicates the size of each batch, $\hat{A}$ denotes the predicted answer, and $A$ is the actual answer. When we are training the model, we use Equation 5 to minimize the loss of our model (which is equivalent to optimizing learning).

Fig. 3. Flow Chart of the GACG

The preceding passages fully explain the operation and principles of our RN. Because of the successes of previous RNs and the ability of RNs to process spatiotemporal data, we introduce the RN into our research. We are convinced that some interactions exist between various financial commodities, and thus, we use the RN to learn stock trends. After appropriately training the proposed model, we can use it to capture the interactions of financial commodities in the entire financial market. To retain company and time information, we have added some features to the RN. For example, our RN performs a 1D convolution operation on a 2D convolution layer and adds corresponding coordinates to the feature maps of the final convolution layer.

C. GACG
The preceding subsection completely describes the characteristics of the RN, its operational methods, and the principles of the model. Through the complete training of the RN to fit the data of the entire financial market, the RN can learn the interactive relationships between financial commodities to ensure that the model can fully grasp the relationships in the entire financial market. For any set of data, after we have trained this RN, we hope to discover the data that cannot fit our model. Therefore, we hope to discover training data that cannot be learned by our model in the anomaly detection phase. A particular subset of captured training data contains
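The answering step of Eqs. 3 to 5 can be sketched in a few lines of pure Python. The function names, the toy logits, and the clipping constant `eps` are assumptions added for illustration; the clipping is a numerical safeguard that the text does not mention.

```python
import math

def sigmoid(t):
    """Logistic function of Eq. 3: maps a real-valued t into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def bce_loss(actual, predicted, eps=1e-12):
    """Binary cross-entropy of Eq. 5, averaged over a batch of size N."""
    n = len(actual)
    total = 0.0
    for a, p in zip(actual, predicted):
        p = min(max(p, eps), 1.0 - eps)  # assumed safeguard against log(0)
        total += a * math.log(p) + (1 - a) * math.log(1 - p)
    return -total / n

logits = [2.0, -1.0, 0.0]             # toy outputs of f_phi for three questions
answers = [1, 0, 1]                   # toy ground-truth yes/no answers
preds = [sigmoid(t) for t in logits]  # Eq. 4: predicted answers in (0, 1)
loss = bce_loss(answers, preds)
```

A prediction of exactly 0.5 for a single question yields a loss of log 2, which is a convenient sanity check for any implementation of Eq. 5.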

information about a particular company at a particular point in time. The ultimate goal is to discover the most extreme outliers among the training data. We contend that the data that cannot be fitted by the model may describe a certain period in the financial market that is markedly different from most periods for financial products in the entire market. At the same time, these captured data are different from the usual changes, meaning that our RN cannot simulate these data. However, how to discover the most extreme outlier data and the most anomalous training data among multisource financial time series data is a complicated optimization problem. Therefore, we introduce a genetic algorithm to solve this problem. We can make changes and optimizations based on the GACG to meet our problem setting and achieve the optimal effect. To clearly explain the operation of the algorithm and its principle, we divide the algorithm we propose into six parts and describe the detailed operation of each part. The six parts are generate, fitness function, selection, crossover, mutation, and assess and limit the gene.

Before this genetic algorithm is executed, we must train the STCNN-RN so that it can achieve high training accuracy. To successfully execute the GACG, we must perform some preliminary operations and prepare the input data it requires. We must send the training data (segmented according to the date) to a properly trained STCNN-RN for evaluation. Through this method, an accuracy matrix $A_{C \times C \times T}$ is created, where $C$ represents the number of companies and $T$ represents all dates of the training data. This accuracy matrix $A_{C \times C \times T}$ serves as the population in the real-world solution space. As shown in Figure 3, we propose the genetic algorithm with constrained genes (GACG) to conduct anomaly detection experiments on the VQA architecture combined with the input of the relational network model. The GACG finds the companies and the time points that are the strongest outliers in the model fit, and thus we hope to solve the first problem in our research through the first part and the second part.

Algorithm 1 Genetic Algorithm with Constrained Gene for Anomaly Detection
Input:
  A_{C×C×T}: the accuracy matrix generated from the proposed model, which consists of the accuracy results of all C companies during a period of time T;
  N: population size for each generation;
  C_rate: cross rate, which is the mating probability for chromosome crossover;
  M_rate: mutation rate, which is the mutation probability of each chromosome;
Output:
  Chromosome p_highest: the best chromosome in all generations.
 1: Generate initial population P_0 = {p_0, p_1, ..., p_{N-1}}
 2: Evaluate the fitness values of population P_0 using A_{C×C×T} by Equation 9
 3: Find the chromosome p_highest with the highest fitness value by Equation 10
 4: Calculate the accuracy GA_{p_highest_acc} of chromosome p_highest by Equation 11
 5: Find the Rule-based_acc by Equation 12
 6: while (GA_{p_highest_acc} < Rule-based_acc) do
 7:     P_new = []
 8:     Append (select the top-k chromosomes p_top_k by fitness value) into P_new
 9:     Select N-k chromosomes from population P according to the fitness values
10:     for each chromosome i ∈ N-k chromosomes do
11:         Crossover chromosome i with C_rate
12:         Mutation chromosome i with M_rate
13:         Check up and limit the gene i
14:         Append (chromosome i) into P_new
15:     end for
16:     P = P_new
17:     Evaluate the fitness values of population P
18:     Find the chromosome p_highest with the highest fitness value
19:     Calculate the accuracy GA_{p_highest_acc} of chromosome p_highest
20: end while
21: return Chromosome p_highest

1) Generate
After completing the preliminary operations of the genetic algorithm, we start its first step, which is to generate the population. First, we need to decide on the representation we use for a solution of the genetic algorithm. An unsuitable representation will cause the genetic algorithm to perform poorly. The simplest and most frequently used representation in past genetic algorithms has been the binary representation. This form of representation is relatively easy to use in the computing space to express solutions in a way that the computing system understands and operates on. In this study, we use a binary representation of $C \times C \times T$ elements to represent a solution. In this binary representation of chromosome $p$, the $p_{i,t}$-th element represents the training accuracy rate of the $i$-th company at time point $t$. We define the $p^t_{i,j}$-th element as follows:

$$p^t_{i,j} = \begin{cases} 1, & \text{anomaly} \\ 0, & \text{normal} \end{cases} \tag{6}$$

If the $p^t_{i,j}$-th element is an anomaly, the definition returns 1; otherwise, it returns 0. In our research, the population is defined as a subset of problem solutions. In the process of generating a population, we can ensure the diversity of the population to avoid the problem of premature convergence. We must repeatedly experiment to determine the optimal population size because any excessively large

population can cause GACG’s operation speed to be unacceptably slow. Conversely, any excessively small population can cause the problem of insufficient population diversity. After deciding the size of our population, we fill only a few reasonable solutions into the set of all initialized solutions, and we fill the rest of the population with random solutions.

2) Fitness Function
After generating an initial population according to the population size, we must evaluate the quality of each solution. Therefore, we introduce the fitness function we designed to calculate the fitness values of the population and evaluate the gap between the population and the target solution. The fitness function mentioned here is a function that takes candidate solutions to the problem as input and produces as output a measure of how “fit” (acceptable) the solution is for the problem under consideration. The calculation of the fitness value is repeated in genetic algorithms and must be adequately fast. Slow calculation of the fitness values may adversely affect genetic algorithms by making them extremely slow. The fitness function must therefore have the following characteristics: it should be calculated fast enough, and it must quantitatively measure the suitability of a given solution or how fit the individuals produced from the given solution can be. In some cases, due to the inherent complexity of the problem at hand, it may not be possible to directly calculate the fitness function. In this case, we can perform fitness approximation to meet our needs. Before entering the fitness function we designed, we must decode each solution. When the value of $p^t_{i,j}$ is 1, we set the accuracy to 0; otherwise, if the value of $p^t_{i,j}$ is 0, we take the accuracy rate from the accuracy matrix entry $A^t_{i,j}$ corresponding to the same position as $p^t_{i,j}$.

$$\mathrm{Gene}_{values}(p^t_{i,j}) = \begin{cases} 0, & \text{if } p_{i,t} = 1 \text{ or } p_{j,t} = 1 \\ A^t_{i,j}, & \text{otherwise} \end{cases} \tag{7}$$

Then, we sum all the accuracy values of $p^t_{i,j}$ to obtain $p_{values}(p_x)$, which can be expressed as the following equation:

$$p_{values}(p_x) = \frac{\sum_{t=1}^{T} \sum_{i=1}^{C} \sum_{j=1}^{C} \mathrm{Gene}_{values}(p^t_{i,j})}{C * C * T - (C * C * T * 0.05)} \tag{8}$$

where $p_x$ represents a chromosome of the population, $C$ denotes the number of companies, $T$ stands for all the transaction days of the training data, and $p^t_{i,j}$ describes the training accuracy of the $i$-th company and $j$-th company at time point $t$. After obtaining all the accuracy rates, we obtain their average by dividing their sum by $C * C * T - (C * C * T * 0.05)$. To evaluate the fitness values between solution and target, we have designed a fitness function that meets our problem setting. The fitness function we designed is as follows:

$$f(p_{values}) = p_{values} + (1e{-3}) - \min(p_{values}) \tag{9}$$

Equation 9 makes the calculated fitness values $f(p_{values})$ large when the accuracy values $p_{values}(p_x)$ of the chromosome are large. To allow every chromosome in the population to have a chance of producing offspring, we add a small number to the fitness function, and consequently, the smallest $p_{values}$ maps to a small positive number instead of 0.

3) Stop Condition
After calculating the fitness values belonging to each chromosome, we must evaluate whether any chromosome in this generation meets the suspension conditions. Before doing so, we must first find the chromosome with the highest fitness value in this generation, which then serves as a candidate for measuring whether the stop conditions are met. The equation for finding the chromosome with the highest fitness value, $p_{highest}$, is expressed as follows:

$$p_{highest} = \max\left(f(p_{values})\right) \tag{10}$$

After finding the chromosome with the highest fitness value, $p_{highest}$, we calculate its accuracy and denote that accuracy as $GA_{p\_highest\_acc}$:

$$GA_{p\_highest\_acc} = p_{values}(p_{highest}) \tag{11}$$

In addition, we must find a benchmark for the stop conditions. We find this benchmark by locating the worst 5% of accuracy rates in the accuracy matrix $A_{c,t}$. We then set these accuracy rates to 0, sum all the accuracy rates, and average them. These steps yield the greedy accuracy, Greedy. It can be expressed as the following equation:

$$\mathrm{Greedy} = p_{values}(p_{greedy}) \tag{12}$$

After finding the accuracy $GA_{p\_highest\_acc}$ of the candidate chromosome and the accuracy Greedy of the baseline chromosome, we can compare them. If the accuracy of the candidate chromosome $GA_{p\_highest\_acc}$ exceeds the baseline accuracy Greedy, we stop the operation of the GACG. Conversely, if $GA_{p\_highest\_acc}$ is less than the baseline accuracy rate Greedy, we resume executing the GACG.

4) Selection
When the candidate accuracy $GA_{p\_highest\_acc}$ is less than the baseline accuracy rate Greedy, we enter the selection part. In this part, we select the $k$ chromosomes with the highest fitness values to retain for the next generation. This gives them the opportunity to reproduce, and thus, our candidate chromosome can obtain a higher accuracy rate. Then, we can use fitness values to calculate the probability of the screening chromosome and use this probability
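The fitness computation of Eqs. 7 to 9 can be sketched as follows. This is a minimal illustration under stated assumptions: the accuracy matrix is stored as nested lists `A[t][i][j]`, a chromosome has the same shape and is decoded element by element (the reading given in the prose before Eq. 7), and the 0.05 in Eq. 8 is exposed as a hypothetical `ratio` parameter so that the toy example stays small. The function names are not from the paper.

```python
def gene_value(a, p):
    """Eq. 7, element-wise reading: a flagged gene contributes 0, otherwise its accuracy."""
    return 0.0 if p == 1 else a

def p_values(A, chrom, ratio=0.05):
    """Eq. 8: sum of decoded accuracies divided by the number of entries kept."""
    total = 0.0
    count = 0
    for A_t, p_t in zip(A, chrom):
        for A_row, p_row in zip(A_t, p_t):
            for a, p in zip(A_row, p_row):
                total += gene_value(a, p)
                count += 1
    return total / (count - count * ratio)

def fitness(values, eps=1e-3):
    """Eq. 9: shift by the minimum so the worst chromosome keeps a small positive fitness."""
    lowest = min(values)
    return [v + eps - lowest for v in values]

# Toy accuracy matrix with T = 1 time point and C = 2 companies.
A = [[[0.9, 0.2],
      [0.8, 0.7]]]
chrom_a = [[[0, 1], [0, 0]]]  # flags the poorly fitted entry (accuracy 0.2)
chrom_b = [[[1, 0], [0, 0]]]  # flags a well fitted entry (accuracy 0.9)

pa = p_values(A, chrom_a, ratio=0.25)  # one of four entries is flagged
pb = p_values(A, chrom_b, ratio=0.25)
fits = fitness([pa, pb])
```

In this toy case the chromosome that flags the low-accuracy entry obtains the higher fitness, while the other chromosome still receives the small positive value `eps` and can therefore be drawn during selection.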

to select $N-k$ chromosomes. The equation for calculating the probability of screening a chromosome is as follows:

$$P\left(f(p_{values(p_i)})\right) = \frac{f(p_{values(p_i)})}{\sum_{j=1}^{N} f(p_{values(p_j)})} \tag{13}$$

where $p_i$ represents a chromosome of the population, $f(p_{values(p_i)})$ stands for the fitness value of the chromosome, and $N$ denotes the size of the population for each generation. Then we can screen out $N-k$ chromosomes according to the calculated probability $P(f(p_{values(p_i)}))$. Each of these $N-k$ chromosomes must enter the crossover part.

5) Crossover
First, each of the $N-k$ chromosomes takes a turn as the parent, which we call the “father.” We generate a random number and compare it with the cross rate $C_{rate}$. If the random number is larger than the cross rate $C_{rate}$, the chromosome does not enter the crossover stage, and the father’s chromosome directly becomes a chromosome of the next generation. Conversely, if this random number is less than the cross rate $C_{rate}$, we enter the crossover stage. After entering the crossover part, we randomly select one of the $N-k$ chromosomes as the mating target, which we call the “mother.” Further, we find and record the positions in the mother’s gene that hold the value 1. Suppose we set the number of anomalous time points to 60. Then, we extract half of the positions with gene value 1 from the mother’s gene, and the other half are extracted from the father’s gene. For the remaining gene positions, we fill in 0 as the gene. We combine the extracted mother and father genes to form new genes. This describes the mating process of chromosomes.

6) Mutation
We take the chromosome that just emerged from the crossover and enter it into the gene mutation part. First, we generate a random number for each gene to determine whether that gene must mutate. Whether a mutation is required is judged by comparing the randomly generated number with the mutation rate $M_{rate}$ we set. If this random number is greater than the mutation rate $M_{rate}$, we do not enter the mutation procedure, and the chromosome can be directly used as the new chromosome of the next generation. Conversely, if this random number is less than the mutation rate $M_{rate}$, we enter the stage of gene mutation. In the stage of gene mutation, we turn a gene that was originally 1 into 0. Conversely, genes that were originally 0 are set to 1.

7) Assess and limit the gene
In this part, we ensure that in each chromosome, the number of genes with value 1 matches the number of anomalous time points we set. We have specially set up the system to assess and limit the gene. After each chromosome has gone through selection, mating, and mutation, the system can confirm that the number of genes with value 1 matches the number of anomalous time points we set. First, we assess in each chromosome whether the number of genes with value 1 matches the number of anomalous time points we set. If it does, the chromosome can be regarded as a new chromosome of the next generation. Conversely, if the number of genes with value 1 does not match the number we set, we limit the number of genes. First, we find and record the positions of all 1s in the gene and then extract the number we set from them as the new 1s; then, we set the remaining positions to 0. After this stage is completed, the new chromosome is added to the population $P_{new}$ of the next generation.

When we obtain a new population $P_{new}$, we recalculate the fitness values of the new population. Then, we find the candidate accuracy $GA_{p\_highest\_acc}$ of the new population $P_{new}$ and compare it with the baseline accuracy rate Greedy, repeating the process until the stop conditions are met. In the end, the algorithm returns an optimal solution. The best solution is the one that achieves the best accuracy after deleting the anomalous time points.

IV. EXPERIMENTS
To ensure and verify the validity of our proposed model, we conducted three experiments to evaluate and measure the proposed model. Before discussing the experiments, we describe the data sets used in the experiments. The baseline methods used to compare the different models are then defined. After a detailed description of the experimental setup, the experimental results are displayed and the results of the three experiments are discussed.

A. DATA SET DESCRIPTION
We used four data sets in three experiments: the S&P 500 constituent stocks Top 20 Tech Companies data set, S&P 100 constituent stocks data set, FTSE TWSE Taiwan 50 index constituent stocks data set, and SSE 50 index constituent stocks data set.

TABLE 1. The company list of the S&P 500 constituent stocks’ top 20 tech companies data set

Ticker Symbol    AAPL, MSFT, V, MA, INTC, CSCO, ADBE, CRM, NVDA, ACN,
                 AVGO, ORCL, IBM, TXN, QCOM, FIS, ADP, INTU, FISV, MU

S&P 500 constituent stocks Top 20 Tech Companies data set: As shown in Table 1, we collected the minute data of S&P 500 constituent stocks from 2015 to 2016. The S&P 500 constituent stock data contain five features, which are open, high, low, close, and volume. We can select companies in the S&P 500 constituent stocks according to their published weights. Therefore, we selected the 20 technology companies in the S&P 500 with the greatest weights as our first data set. The number of stocks in the 2 years is different. After discarding some incomplete stock data, the number of stocks

in 2015 and 2016 are 483 and 490, respectively. In the process of screening stocks, we selected 20 technology companies that coexisted during the 2 relevant years.

TABLE 2. The company list of S&P 100 constituent stocks dataset

Ticker Symbol
AAPL, ABBV, ABT, ACN, AIG, ALL, AMGN,
AMZN, AXP, BA, BAC, BIIB, BK, BMY,
BRKB, C, CAT, CELG, CL, CMCSA, COF,
COP, COST, CSCO, CVS, CVX, DIS, EMR,
EXC, F, FB, FDX, FOXA, GD, GE, GILD,
GM, GOOG, GOOGL, GS, HAL, HD, HON,
IBM, INTC, JNJ, JPM, KMI, KO, LLY,
LMT, LOW, MA, MCD, MDLZ, MDT, MET,
MMM, MO, MON, MRK, MS, MSFT, NKE,
ORCL, OXY, PEP, PFE, PG, PM, QCOM,
RTN, SBUX, SLB, SO, SPG, T, TGT,
TWX, TXN, UNH, UNP, UPS, USB,
UTX, V, VZ, WBA, WFC, WMT

The S&P 100 constituent stocks data set: As shown in Table 2, the second data set used in Experiment 1 is the minute stock price data of S&P 100 constituent stocks. Minute price data of S&P 100 constituent stocks were taken from Wharton Research Data Services Trade and Quote (WRDS TAQ) [23]. To compare the effects between the S&P 500 constituent stocks Top 20 Tech Companies data set and the S&P 100 constituent stocks data set, we took the years shared by the two data sets and finally screened out the 2015 and 2016 data for model training. Therefore, when screening S&P 100 constituent stocks, we also only took stocks shared in the 2 years. The list of S&P 100 companies in the 2 years is different; of the 100 companies, we kept the stock price data of the 90 companies that existed in both years.

The FTSE TWSE Taiwan 50 Index constituent stocks data set: The FTSE TWSE Taiwan 50 index is an index jointly compiled by the Taiwan Stock Exchange (TWSE) and the FTSE Index. The FTSE TWSE Taiwan 50 Index Components cover the top 50 listed companies in the Taiwan stock market by market capitalization, and the index is highly correlated with the broader market. The minute price data of stocks were taken from the TWSE, and we collected data on intraday stock transactions from 2016 to 2017, including information such as opening, high, low, closing, and volume. Because the lists of FTSE TWSE Taiwan 50 Index constituent stocks in 2016 and 2017 are different, we only took stocks shared in 2016 and 2017 as our training data. We screened 48 listed companies from 50 listed companies in Experiment 2.

The SSE 50 Index constituent stocks data set: Daily price data of SSE 50 index constituent stocks were taken from the Shanghai Stock Exchange (SSE). The SSE 50 Index is an index compiled by the SSE. It is regarded as an index representing the overall situation of the most influential companies on the SSE. It selects the most representative stocks with a large scale and excellent liquidity in the Shanghai stock market to form the index. The goal of compiling this index is to establish a large-scale investment index with active transactions that can be used as the basis for derivative financial instruments. The method of selecting component stocks is to comprehensively rank stocks based on market capitalization and turnover and to select the top 50 stocks to form a sample, excluding stocks that have abnormal market performance and are deemed inappropriate by the expert committee. Due to changes in the SSE's rules for selecting constituent stocks, the list of SSE constituent stocks has changed greatly. We collected daily trading data of constituent stocks in 2019 and 2020. We selected from 50 listed companies coexisting on the constituent stock list in 2019 and 2020 and finally selected 37 for our training and testing data sets in Experiment 2.

B. PERFORMANCE EVALUATION OF THE PROPOSED ANOMALY DETECTOR
An experiment was conducted to compare the performance of the STCNN-RN with other baseline models.

A multilayer deep neural network (multilayer DNN) is a model composed of multiple dense layers. We introduced such a model as our experimental baseline model. The input of the visual part in this model is I_{C×T}, but we chose to directly import the flattening layer to process this input. The input processing method of the question part is the same as that of the STCNN-RN. Then, we import the concatenated layer to process the output from the visual part and question part and connect four dense layers behind this concatenated layer. The output of the last dense layer passes through the activation function “sigmoid” before the answer A is returned. The process previously discussed is how this model works.

Another baseline model used in our experiment was the temporal convolution neural network–based relational network (TCNNRN). We applied financial time series data in a model based on 1D convolution. The difference between the TCNNRN and STCNN-RN is that in the visual part, the TCNNRN can use a multilayer 1D-CNN to process the input matrix to generate a set of objects for the RN module. After returning the data for convolution operation, we can connect the object pairs extracted from the final CNN feature map with the corresponding time coordinates and question one-hot encoding vector. Then, we feed the connected vector to the RN module to calculate the relation, and the final output is the corresponding answer A.

To verify the effectiveness of the GACG, we also proposed other baseline methods for determining genes, which are None, Random Choice, and Greedy. None is the method to measure the ability of our model to capture the relationship between different time series data, for which we introduced binary cross-entropy to calculate the performance of the model on the binary classification problems. To evaluate the ability of the GACG to catch anomalies, we introduced the random choice method. The concept of random choice is to randomly generate N chromosomes and then calculate the accuracy for each chromosome, after which we sum the N calculated accuracies and divide by N to obtain the accuracy of random choice. In the Greedy method, we select the worst 5% of the accuracy values in the accuracy matrix and set them to 0. Then

TABLE 3. Anomaly detection accuracy of existing models with S&P 100 constituent stocks data set

                                    Rolling Test
Model            Anomaly Detection  1       2       3       4       5       6       7       8       9       10      Average
Multi-layer DNN  None               0.5576  0.5067  0.7545  0.6108  0.6654  0.6054  0.6183  0.6355  0.6662  0.5168  0.6083
Multi-layer DNN  Random Choice      0.5766  0.5251  0.7744  0.6299  0.6848  0.623   0.6353  0.6532  0.6847  0.5381  0.6415
Multi-layer DNN  Greedy             0.5773  0.5258  0.7752  0.6305  0.6856  0.6238  0.6363  0.6542  0.6855  0.5388  0.6423
Multi-layer DNN  Proposed GACG      0.5776  0.526   0.7757  0.6307  0.6858  0.6241  0.6367  0.6546  0.6859  0.5391  0.6426
TCNN-RN          None               0.5747  0.5103  0.5649  0.544   0.5138  0.5164  0.5508  0.5203  0.5219  0.5172  0.5334
TCNN-RN          Random Choice      0.5812  0.516   0.5713  0.5501  0.5196  0.5222  0.557   0.5262  0.5278  0.523   0.5394
TCNN-RN          Greedy             0.5819  0.5168  0.572   0.5508  0.5203  0.5228  0.5577  0.5269  0.5284  0.5237  0.5401
TCNN-RN          Proposed GACG      0.8661  0.5192  0.5759  0.554   0.5228  0.5252  0.5612  0.5292  0.5305  0.5252  0.5709
STCNN-RN         None               0.7964  0.7664  0.7867  0.7799  0.795   0.7772  0.7698  0.7745  0.818   0.8127  0.7876
STCNN-RN         Random Choice      0.8054  0.775   0.7956  0.7887  0.8039  0.7859  0.7784  0.7833  0.8272  0.8218  0.7965
STCNN-RN         Greedy             0.8067  0.7761  0.7964  0.7896  0.8049  0.787   0.7796  0.7843  0.8282  0.823   0.7976
STCNN-RN         Proposed GACG      0.8212  0.7907  0.8093  0.8027  0.818   0.8006  0.7928  0.7978  0.8429  0.8368  0.8113

we add the entire accuracy matrix A_{C×C×T} and divide by C × C × T to obtain the Greedy accuracy.

In Experiment 1, we used two data sets to train our proposed model. The two data sets are the S&P 500 constituent stocks Top 20 Tech Companies data set and the S&P 100 constituent stocks data set. The purpose of using these two data sets was to verify the effectiveness of our proposed model. Moreover, we also want to prove that our model can learn the interactions between stocks regardless of having only a small amount of time series data or a large amount of time series data.

For both data sets, we used the closing price to calculate the return data required by the model and then processed them into the input required by the model. The results in Tables 3 and 4 indicate that the extent to which the model can learn financial time series data affects the model’s fitting accuracy. Based on the results of the experiments, we conclude that the STCNN-RN has excellent fitting accuracy for the S&P 500 constituent stocks Top 20 Tech Companies data set and the S&P 100 constituent stocks data set. This means that the STCNN-RN has the ability to learn the financial time series data and is suitable for processing spatiotemporal data. For comparing the results of these models, we used bold fonts to highlight the highest anomaly detection accuracy.

To conduct the anomaly detection accuracy experiments detailed in Tables 3 and 4, we used the random choice method and Greedy method, respectively, to screen our best solutions. The solution obtained by the STCNN-RN combined with the GACG achieved the highest accuracy after deleting all anomalous time points with companies, followed by the performance of the multilayer DNN combined with the GACG and the TCNNRN combined with the GACG. Table 3 shows that the STCNN-RN combined with the GACG has a higher accuracy rate than the other baseline models, and the anomaly accuracy rate ranges from 0.7907 to 0.8429 across the rolling tests. Table 4 shows the same pattern, with the anomaly accuracy rate ranging from 0.7809 to 0.9457. Therefore, we can draw the following conclusions from the results of Experiment 1: the STCNN-RN can accurately capture the interactions between different companies, and the STCNN-RN has the ability to process spatiotemporal data. Compared with other baseline models, the experiment with the STCNN-RN combined with the GACG shows that the model can discover the best solution to obtain a higher anomaly detection accuracy rate.

C. ANOMALY DETECTION IN VARIOUS FINANCIAL MARKETS
To prove that our model has a certain generalization ability, in addition to using the S&P 100 data set to verify the performance of our proposed model on the US stock market, we introduced FTSE TWSE Taiwan 50 Index Components and SSE 50 Index Components; they represent the Taiwan stock market and the Shanghai stock market, respectively. In Table 5, we list the outcome of subjecting three data sets to our proposed model and baseline models. In this experiment, the three data sets represented different stock markets. The component stock data set of the S&P 100 Index represented the US stock market, the component stock data set of the FTSE TWSE Taiwan 50 Index represented the Taiwan stock market, and the SSE 50 Index represented the Shanghai stock market.

It turns out that the proposed model can fit the data of different stock markets well. Even when facing data sets with different data frequencies, the fitting accuracy of the STCNN-RN was still better than that of the TCNNRN or multilayer DNN. Moreover, this proved that the proposed model can learn the interactive relationships in different time series data regardless of whether it is intraday transaction data or daily transaction data and that it is more than capable of predicting future trends. However, the performance of the TCNNRN was slightly inferior to that of the multilayer DNN, possibly due to the use of the 1D-CNN to process input data to make object pairs, which prevents the model from learning how to predict stock trends.

Table 5 lists the accuracy results of the three models combined with different baseline methods of the best solution. Comprehensive assessment of the experiment results indicates that the STCNN-RN combined with the GACG has a higher accuracy rate than the other baseline models combined with different baseline methods and the anomaly

TABLE 4. Anomaly detection accuracy of existing models with FTSE TWSE Taiwan 50 Index constituent stocks data set

                                    Rolling Test
Model            Anomaly Detection  1       2       3       4       5       6       7       8       9       10      Average
Multi-layer DNN  None               0.7721  0.829   0.741   0.792   0.7511  0.6959  0.7322  0.7179  0.7027  0.7167  0.7451
Multi-layer DNN  Random Choice      0.7886  0.8467  0.7569  0.8088  0.7671  0.7107  0.7478  0.7331  0.7178  0.7321  0.761
Multi-layer DNN  Greedy             0.7898  0.8479  0.7582  0.8104  0.7685  0.7122  0.7491  0.7345  0.7191  0.7338  0.7624
Multi-layer DNN  Proposed GACG      0.8119  0.8715  0.7804  0.8323  0.7896  0.7319  0.7697  0.7547  0.7402  0.7564  0.7839
TCNN-RN          None               0.7331  0.7453  0.7171  0.7508  0.7233  0.7249  0.7521  0.715   0.7309  0.7111  0.7304
TCNN-RN          Random Choice      0.7487  0.7612  0.7323  0.7668  0.7387  0.7403  0.7681  0.7302  0.7465  0.7263  0.7459
TCNN-RN          Greedy             0.75    0.7626  0.7337  0.7682  0.7401  0.7418  0.7694  0.7316  0.7481  0.7278  0.7473
TCNN-RN          Proposed GACG      0.7713  0.7851  0.7544  0.7898  0.7625  0.7624  0.7914  0.7516  0.77    0.7487  0.7687
STCNN-RN         None               0.852   0.8851  0.8599  0.8271  0.7417  0.8477  0.8803  0.851   0.9027  0.8797  0.8527
STCNN-RN         Random Choice      0.8702  0.904   0.8782  0.8447  0.7576  0.8657  0.8991  0.8691  0.9219  0.8984  0.8709
STCNN-RN         Greedy             0.8711  0.9046  0.8789  0.846   0.7592  0.8665  0.8996  0.8701  0.9224  0.8989  0.8717
STCNN-RN         Proposed GACG      0.8941  0.9273  0.9008  0.8687  0.7809  0.889   0.9222  0.8935  0.9457  0.9212  0.8943

TABLE 5. Anomaly detection accuracy of existing models with SSE 50 Index constituent stocks data set

                                    Rolling Test
Model            Anomaly Detection  1       2       3       4       5       6       7       8       9       10      Average
Multi-layer DNN  None               0.8352  0.793   0.7967  0.7815  0.8148  0.8447  0.8143  0.8352  0.8246  0.8092  0.8149
Multi-layer DNN  Random Choice      0.9171  0.876   0.8817  0.8592  0.9167  0.9542  0.9043  0.9171  0.903   0.8807  0.901
Multi-layer DNN  Greedy             0.9213  0.8815  0.8868  0.8642  0.9213  0.9569  0.9087  0.9222  0.908   0.8877  0.9059
Multi-layer DNN  Proposed GACG      0.9527  0.9115  0.9174  0.8933  0.9536  0.9607  0.94    0.9542  0.9383  0.9175  0.9339
TCNN-RN          None               0.545   0.5355  0.5664  0.5119  0.5198  0.5268  0.5699  0.5843  0.5595  0.5741  0.5493
TCNN-RN          Random Choice      0.728   0.6569  0.7353  0.6407  0.6581  0.6449  0.724   0.7537  0.7224  0.6588  0.6923
TCNN-RN          Greedy             0.7284  0.666   0.7366  0.6492  0.6657  0.6467  0.7229  0.763   0.7173  0.6667  0.6963
TCNN-RN          Proposed GACG      0.77    0.6918  0.775   0.6756  0.6939  0.6781  0.7606  0.7938  0.7653  0.6915  0.7296
STCNN-RN         None               0.8133  0.8448  0.938   0.8567  0.9092  0.8904  0.9077  0.9106  0.9035  0.9388  0.8913
STCNN-RN         Random Choice      0.9446  0.9294  0.9716  0.9476  0.9603  0.9628  0.9626  0.9792  0.9496  0.9767  0.9584
STCNN-RN         Greedy             0.9487  0.9364  0.9741  0.9515  0.9634  0.967   0.9663  0.9806  0.9566  0.9787  0.9623
STCNN-RN         Proposed GACG      0.9822  0.97    0.9754  0.9857  0.9939  0.9985  0.998   0.9816  0.9959  0.9799  0.9861

accuracy rate ranges from 0.97 to 0.9985 across the rolling tests. The second highest performance was that of the multilayer DNN combined with the GACG, followed by the performance of the TCNNRN combined with the GACG. In this experiment, the time intervals in the three data sets were different, but their rolling windows and shift month methods were the same. Therefore, we once again proved the quality of the performance of our model in the learning tasks and complex correlation of multiple financial time series data in this experiment. This indicates that when facing various stock markets, data frequencies, or data intervals, the STCNN-RN combined with the GACG method has outstanding performance.

D. COMPARE DIFFERENT ALTERNATIVE MODELS
Tables 3-5 indicate that the solution obtained by the STCNN-RN combined with the GACG achieved the highest accuracy. Therefore, we used the STCNN-RN combined with the GACG method to perform comparisons with other benchmark models. We selected the one-class SVM (OC-SVM) [40] and self-organizing maps (SOMs) [41] as the control group.

The OC-SVM is an unsupervised algorithm, and the training data have only one classification. In this algorithm, a decision boundary is learned through the characteristics of normal samples, and this boundary is used to determine whether the new data point is similar to the training data; data beyond the boundary are regarded as abnormal data. Shawe-Taylor and Žličar [40] applied the OC-SVM to identify potential anomalies in financial time-series data and to find the distribution and the timing of the occurrence of the anomalous behavior in these data. The experiment results indicated that the OC-SVM detected changes in anomalous behavior in synthetic data sets and in several empirical data sets.

An SOM [42] is an unsupervised clustering algorithm. An SOM differs from other clustering algorithms in that it has a topological map that is used to express the distribution of each output or cluster. Therefore, SOMs can express the original high-dimensional space data through the visualized low-dimensional space, and the visualized result can also effectively explain the result of the grouping. Li et al. [41] used an SOM in a dynamic environment for discovering the abnormal financial behaviors of corporations. The experiment results indicated that the combination of macroeconomic indicators and financial indicators is superior to the use of financial indicators alone. The utilization of a hierarchical SOM helps in identifying the abnormal financial behaviors associated with corporate operations more effectively.

The results for the OC-SVM, SOM, and our proposed method are provided in Table 6. Our method outperformed the other methods in anomaly detection accuracy, and it exhibited the strongest detection capabilities in all 10 rolling tests. To validate that our proposed model has more favorable detection capabilities, we used a t test. The t test is used to determine whether a significant difference exists between two

TABLE 6. Anomaly detection accuracy of different alternative models with S&P 100 constituent stocks data set

                   Rolling Test
Anomaly Detection  1       2       3       4       5       6       7       8       9       10      Average
OCSVM              0.5509  0.5394  0.5442  0.5501  0.5439  0.5361  0.5508  0.5432  0.5652  0.5335  0.5457
SOM                0.5951  0.5306  0.5881  0.5949  0.6219  0.5598  0.5497  0.6349  0.6518  0.5803  0.5907
Proposed GACG      0.8212  0.7907  0.8093  0.8027  0.818   0.8006  0.7928  0.7978  0.8429  0.8368  0.8113

TABLE 7. Statistical test of proposed GACG and OCSVM

                      Rolling Test
                      1          2          3          4          5          6          7          8          9          10
F-value               47.70736   9.42140    36.30940   13.45678   33.73106   52.48977   23.38555   29.89704   38.24091   2.39872
P-value (F test)      4.20E-09   0.00326    1.24E-07   0.00053    2.82E-07   1.14E-09   1.01E-05   1.01E-06   6.77E-08   0.12687
Test result (F test)  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Accept H0
T-value               124.38467  413.67958  120.46406  113.82810  133.68709  112.32497  148.25440  149.31730  190.73750  429.87419
P-value (t test)      4.12E-72   2.44E-102  2.62E-71   6.92E-70   6.37E-74   1.49E-69   1.61E-76   1.06E-76   7.46E-83   2.63E-103
Test result (t test)  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0

TABLE 8. Statistical test of proposed GACG and SOM

                      Rolling Test
                      1          2          3          4          5          6          7          8          9          10
F-value               31.61851   9.08195    50.03186   97.66312   46.27081   11.23722   51.40315   77.79999   75.52385   28.86643
P-value (F test)      5.65E-07   0.00382    2.21E-09   4.80E-14   6.29E-09   1.42E-03   1.52E-09   2.63E-12   4.31E-12   1.43E-06
Test result (F test)  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0
T-value               84.07712   167.15383  40.20941   45.73598   69.99179   141.44962  38.64878   56.08296   51.86619   155.89619
P-value (t test)      2.66E-62   1.55E-79   4.68E-44   3.34E-47   9.98E-58   2.44E-75   4.29E-43   3.16E-52   2.69E-50   8.79E-78
Test result (t test)  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0  Reject H0

averages. Before the statistical test, the population variance was unknown. We used the f test to verify that the populations had the same variance.

[Figure: AAPL price chart, panel title "2015-08-10 - AAPL"; x-axis: Date, 2015-08-05 to 2015-09-01]
Fig. 4. An anomalous event of Apple (AAPL) in the S&P 100 constituent stocks data set

When the p values of the f test and t test are less than the significance level of 0.05, the null hypothesis can be rejected, which means the sampling was from different populations. In our results, the t test returned significant p values; thus, the null hypothesis was rejected. Therefore, the proposed method outperformed the control group. To obtain the value of each time period fairly, we ran each model 30 times. Tables 7 and 8 provide the comparison results of our method versus the OC-SVM and SOM, respectively. The t-test results revealed that our model outperformed the control group in terms of detecting abnormalities.
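
The testing procedure described in this subsection can be sketched in Python as follows. This is a minimal illustration only: the paper does not state which variant of the F test or t test was used, so the variance-ratio F statistic and the equal-variance t statistic below are assumptions, and the function names and the two lists of run accuracies are hypothetical. Only the list of p values is taken from the source, namely the F-test row of Table 7.

```python
from math import sqrt
from statistics import mean, variance

ALPHA = 0.05  # significance level stated in the text


def f_statistic(a, b):
    """Variance-ratio F statistic (larger sample variance over smaller)."""
    var_a, var_b = variance(a), variance(b)
    return max(var_a, var_b) / min(var_a, var_b)


def pooled_t_statistic(a, b):
    """Two-sample t statistic under the equal-variance assumption."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


def reject_null(p_value, alpha=ALPHA):
    """Reject H0 when the p value is below the significance level."""
    return p_value < alpha


# Hypothetical accuracies from repeated runs of two models (not from the paper).
proposed_runs = [0.82, 0.81, 0.83, 0.80, 0.82]
baseline_runs = [0.55, 0.54, 0.56, 0.53, 0.55]
f_value = f_statistic(proposed_runs, baseline_runs)
t_value = pooled_t_statistic(proposed_runs, baseline_runs)

# F-test p values for rolling tests 1-10, copied from Table 7.
table7_f_p_values = [4.20e-09, 0.00326, 1.24e-07, 0.00053, 2.82e-07,
                     1.14e-09, 1.01e-05, 1.01e-06, 6.77e-08, 0.12687]
decisions = [reject_null(p) for p in table7_f_p_values]

if __name__ == "__main__":
    print(f"F = {f_value:.3f}, t = {t_value:.3f}")
    print(decisions)
```

Applied to the Table 7 p values, the decision rule rejects the null hypothesis in rolling tests 1 to 9 and not in rolling test 10, which matches the "Accept H0" entry in that table.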

E. INTERPRETATION OF DETECTED ANOMALIES


To verify the effectiveness of the proposed method for de-
tecting anomalies, we used case studies to illustrate the
anomalous time points captured by our model and the cor-
responding real events to prove the accuracy of the model in
capturing anomalies. We want to emphasize that these results,
which are based on unsupervised anomaly detection, only
obstruct progress. Accidents and events differ, and they may
appear suddenly without prior activity. The proposed method
allows for detection of such events and related anomalies. We
provide two examples of accidents to prove that our method
can detect anomalous behavior in multiple time series data
when an accident occurs.

Fig. 5. Local explanation for an anomalous event of Apple (AAPL)


[Figure: KMI price chart, panel title "2015-11-30 - KMI"; x-axis: dates from 2015-10-01 to 2016-01-01]
Fig. 6. Anomaly event of Kinder Morgan (KMI) in S&P 100 constituent stocks data set

[Figure: bar chart, panel title "Local explanation for class 1"; feature conditions ranked by weight, led by CMCSA <= 0.50, ABT <= 0.46, and V <= 0.54]
Fig. 7. Local explanation for anomaly event of Kinder Morgan (KMI)
a: Apple Inc. (AAPL)
iPhone Growth Concerns and Yuan Devaluation. 12 August 2015. As depicted in Figure 4, the GACG judges Apple’s data on August 10, 2015, as anomalous (indicated by the red line), and the corresponding event is the Chinese stock market crash on August 12, 2015. This judgment also reflects the effect of early judgment rather than a reaction after the news was released. The global stock market crash set off by China caused a sharp drop in US stock markets. Even Apple Inc., which was preparing to launch a new product in September 2015, was strongly adversely affected, and its stock price plummeted. The slowdown in China’s economy on August 12, 2015, became the main reason for the hit to Apple Inc.’s stock price. The stock market was concerned about the slowdown in Apple’s growth. This is partly attributable to the slowdown in the growth of the Chinese market raising doubts about the demand for the new product, which put the stock price under pressure. When Apple’s iPhone 6 and iPhone 6 Plus launched in 2014, they dominated the world in sales, recording the highest sales numbers ever. The key to success is the Chinese market, which accounts for one-fourth of the profit of global companies. However, as the Chinese economy weakened and consumer willingness to replace devices decreased, Apple’s stock price collapsed. This trend may continue and is likely to put pressure on Apple’s China business. Although Apple also has substantial manufacturing operations in China, the cost–benefit ratio from the devaluation of the Yuan is unlikely to be significant because Apple may calculate it in US dollars with its contract manufacturers. Even if the company has some influence over its contract manufacturers, the potential losses in revenue and growth in China may exceed any benefits. As shown in Figure 5, the Southern Company (SO) has the most influence on this decision, followed by Johnson & Johnson (JNJ) and Lockheed Martin Corporation (LMT).

b: Kinder Morgan, Inc. (KMI)
Slashes Dividend and Negative Headline. 04 December 2015. Figure 6 displays how the GACG judged Kinder Morgan’s data on November 30, 2015, as anomalous (indicated by the red line), and the anomalous activity is the “slashes dividend and negative headline” published on December 4, 2015. Kinder Morgan’s stock performance was slightly below analyst expectations. It caused some investors to worry that the company might be affected by the economic recession more than previously thought. After Kinder Morgan’s poor performance in the second quarter of 2015, those worried investors began to feel genuine fear after the third quarter results were released in October. Although these results were within the expected range, the company announced on December 4, 2015, that it would reduce its dividend, and the negative news reports eventually caused disappointment among investors and caused the stock to plummet. As shown in Figure 7, Comcast Corporation (CMCSA) has the most influence on this decision, followed by Abbott Laboratories (ABT) and Visa Inc. (V).

Based on the previously discussed description, these cases demonstrate that the STCNN-RN combined with the GACG can discover anomalous activities that are forming. The model can detect anomalous activities early instead of at the time the news is released.

V. CONCLUSION
In this study, we focused on using the STCNN-RN combined with the GACG to discover anomalous time points with companies in the multiple financial time series data sets. Furthermore, we also introduced the LIME interpretable model to interpret and analyze the causes of anomalies.

First, we proposed the STCNN-RN to learn the complex correlation between multiple financial time series data sets. In the experiment, we used the S&P 100 Index Component stock data for the US stock market, the FTSE TWSE Taiwan 50 Index Component stock data set for the Taiwan stock market, and the SSE 50 Index Component stock data set for the Shanghai stock market. These three data sets include both intraday transaction data and daily transaction data. Relative to the baseline models, our proposed model was able to capture the interaction between different financial products. The STCNN-RN model proved able to fit the data accurately in these three data sets. Compared with the multilayer DNN and TCNNRN, the STCNN-RN is more capable of processing data with temporal and spatial characteristics.
Using different data sets, we proved the effectiveness of the STCNN-RN model regardless of stock market, data frequency, or time interval. The model was proved to have a generalization ability for different data sets.

To catch the anomalous time points with companies and explain the anomaly phenomenon, we proposed the GACG and used this algorithm to discover the anomalous time points with companies where the model cannot be fitted among all transaction data. To prove the effectiveness of our proposed method, we also introduced baseline methods, including the Greedy method and the Random Choice method. From different models with different methods of selecting the best solution, we proved that the combination of the STCNN-RN and GACG can model the multiple financial time series data accurately and can also capture the anomalous time points with companies. Comprehensive analysis of the experiment results indicated that the STCNN-RN combined with GACG has a higher anomaly detection accuracy rate than other baseline models combined with different baseline methods in most of the rolling tests. At the same time, we also introduced our LIME interpretable model to interpret and analyze the results of our STCNN-RN combined with the GACG. Finally, we also verified the effectiveness of the STCNN-RN combined with the GACG to capture the anomaly through case studies.

The above description can be summarized as follows:

• The STCNN-RN and GACG can model the multiple financial time series data accurately and can also capture the anomalous time points with companies.
• Using different data sets, we showed that the STCNN-RN model has a generalization ability for different data sets.
• The STCNN-RN combined with GACG has a higher anomaly detection accuracy rate than other baseline models in most of the rolling tests.
• We also introduced our LIME interpretable model to interpret and analyze the results of our STCNN-RN combined with the GACG.
• Finally, we also verified the effectiveness of the STCNN-RN combined with the GACG to capture the anomaly through case studies.

REFERENCES
[1] K. Alkhatib, H. Najadat, I. Hmeidi, and M. K. A. Shatnawi, “Stock price prediction using k-nearest neighbor (knn) algorithm,” International Journal of Business, Humanities and Technology, vol. 3, no. 3, pp. 32–44, 2013.
[2] W. Fenghua, X. Jihong, H. Zhifang, and G. Xu, “Stock price prediction based on ssa and svm,” Procedia Computer Science, vol. 31, pp. 625–631, 2014.
[3] O. Hegazy, O. S. Soliman, and M. A. Salam, “A machine learning model for stock market prediction,” arXiv preprint arXiv:1402.7351, 2014.
[4] A. Kazem, E. Sharifi, F. K. Hussain, M. Saberi, and O. K. Hussain, “Support vector regression with chaos-based firefly algorithm for stock market price forecasting,” Applied soft computing, vol. 13, no. 2, pp. 947–958, 2013.
[5] Y.-Y. Chen, C.-T. Chen, C.-Y. Sang, Y.-C. Yang, and S.-H. Huang, “Adversarial attacks against reinforcement learning-based portfolio management strategy,” IEEE Access, vol. 9, pp. 50 667–50 685, 2021.
[6] W. Bao, J. Yue, and Y. Rao, “A deep learning framework for financial time series using stacked autoencoders and long-short term memory,” PloS one, vol. 12, no. 7, p. e0180944, 2017.
[7] R. Hafezi, J. Shahrabi, and E. Hadavandi, “A bat-neural network multi-agent system (bnnmas) for stock price prediction: Case study of dax stock price,” Applied Soft Computing, vol. 29, pp. 196–210, 2015.
[8] K. Khare, O. Darekar, P. Gupta, and V. Attar, “Short term stock price prediction using deep learning,” in 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT). IEEE, 2017, pp. 482–486.
[9] S. Selvin, R. Vinayakumar, E. A. Gopalakrishnan, V. K. Menon, and K. P. Soman, “Stock price prediction using lstm, rnn and cnn-sliding window model,” in 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2017, pp. 1643–1647.
[10] A. A. Adebiyi, A. O. Adewumi, and C. K. Ayo, “Comparison of arima and artificial neural networks models for stock price prediction,” Journal of Applied Mathematics, vol. 2014, 2014.
[11] A. A. Ariyo, A. O. Adewumi, and C. K. Ayo, “Stock price prediction using the arima model,” in 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, 2014, pp. 106–112.
[12] A. Santoro, D. Raposo, D. G. Barrett, M. Malinowski, R. Pascanu, P. Battaglia, and T. Lillicrap, “A simple neural network module for relational reasoning,” in Advances in neural information processing systems, 2017, pp. 4967–4976.
[13] W. Li, V. Mahadevan, and N. Vasconcelos, “Anomaly detection and localization in crowded scenes,” IEEE transactions on pattern analysis and machine intelligence, vol. 36, no. 1, pp. 18–32, 2013.
[14] P.-Y. Wang, C.-T. Chen, J.-W. Su, T.-Y. Wang, and S.-H. Huang, “Deep learning model for house price prediction using heterogeneous data analysis along with joint self-attention mechanism,” IEEE Access, vol. 9, pp. 55 244–55 259, 2021.
[15] A. Lazarevic, L. Ertoz, V. Kumar, A. Ozgur, and J. Srivastava, “A comparative study of anomaly detection schemes in network intrusion detection,” in Proceedings of the 2003 SIAM international conference on data mining. SIAM, 2003, pp. 25–36.
[16] H. Kwon and N. M. Nasrabadi, “Kernel rx-algorithm: A nonlinear anomaly detector for hyperspectral imagery,” IEEE transactions on Geoscience and Remote Sensing, vol. 43, no. 2, pp. 388–397, 2005.
[17] W. F. Sharpe, “Capital asset prices: A theory of market equilibrium under
Future studies should improve some aspects of our model. conditions of risk,” The journal of finance, vol. 19, no. 3, pp. 425–442,
1964.
Because this study explored the discovery of anomaly phe- [18] E. F. Fama and K. R. French, “Common risk factors in the returns on stocks
nomena in all transaction data and the explanation of these and bonds,” Journal of, 1993.
abnormalities, investors can understand a stock market sit- [19] A. Roy, J. Sun, R. Mahoney, L. Alonzi, S. Adams, and P. Beling, “Deep
learning detecting fraud in credit card transactions,” in 2018 Systems and
uation holistically. However, the data sets that our model Information Engineering Design Symposium (SIEDS). IEEE, 2018, pp.
learned were all without anomaly labels; this situation pre- 129–134.
vented our ensuring the effectiveness of the proposed model [20] M. Schreyer, T. Sattarov, D. Borth, A. Dengel, and B. Reimer, “Detec-
tion of anomalies in large scale accounting data using deep autoencoder
for catching all possible anomalies. Therefore, we can use
networks,” arXiv preprint arXiv:1709.05254, 2017.
some data set with labels to build models in future work. [21] D. Huang, D. Mu, L. Yang, and X. Cai, “Codetect: financial fraud detection
We can also use the STCNN-RN combined with a GA to with anomaly feature detection,” IEEE Access, vol. 6, pp. 19 161–19 174,
formalize a prediction model, and the resultant model can 2018.
[22] S. Thiprungsri and M. A. Vasarhelyi, “Cluster analysis for anomaly
remind investors of anomaly phenomena and reduce invest- detection in accounting data: An audit approach.” International Journal
ment risks. of Digital Accounting Research, vol. 11, 2011.
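As a rough illustration of why labeled data matters for the limitation noted above, the following minimal sketch compares a detector's flagged anomalies with ground-truth labels and reports precision and recall. It is not part of the proposed method: the function name `precision_recall`, the `(date, ticker)` representation, and the toy values are all assumptions made for illustration only.

```python
def precision_recall(flagged, labeled):
    """Compare detector output with ground-truth anomaly labels.

    flagged: (date, ticker) pairs the detector marked as anomalous.
    labeled: (date, ticker) pairs known to be anomalous.
    Both arguments are hypothetical inputs, not data from the paper.
    """
    flagged, labeled = set(flagged), set(labeled)
    true_pos = len(flagged & labeled)
    # Precision: share of flagged points that are real anomalies.
    precision = true_pos / len(flagged) if flagged else 0.0
    # Recall: share of real anomalies that the detector caught.
    recall = true_pos / len(labeled) if labeled else 0.0
    return precision, recall


# Toy values for illustration only; these are not results from the study.
flagged = {("2020-03-09", "AAA"), ("2020-03-12", "BBB"), ("2020-03-16", "CCC")}
labeled = {
    ("2020-03-09", "AAA"),
    ("2020-03-16", "CCC"),
    ("2020-03-18", "DDD"),
    ("2020-03-20", "EEE"),
}
precision, recall = precision_recall(flagged, labeled)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Without labels, only the flagged set is available, so recall (the share of true anomalies that were caught) cannot be computed; this is the gap that labeled data sets would close.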


This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

MEI-SEE CHEONG received the B.S. degree in information management from National Taiwan University of Science and Technology, Taipei City, Taiwan, in 2018, and the M.S. degree in information management from National Chiao Tung University, Hsinchu, Taiwan, in 2020. Her research interests include relational learning and explainable AI.

MEI-CHEN WU received the B.S. and M.S. degrees in information management from Yu Da University of Science and Technology, Miaoli, Taiwan, in 2012 and 2013, respectively. She is currently pursuing the Ph.D. degree in information management at National Chiao Tung University, Hsinchu, Taiwan. Her research interests include information hiding, digital watermarking, deep learning, artificial intelligence, and financial technology.

SZU-HAO HUANG (Member, IEEE) received the B.E. and Ph.D. degrees in computer science from National Tsing Hua University, Hsinchu, Taiwan, in 2001 and 2009, respectively. He is currently an Assistant Professor with the Department of Information Management and Finance and the Chief Director of the Financial Technology (FinTech) Innovation Research Center, National Yang Ming Chiao Tung University, Hsinchu, Taiwan. His research interests include artificial intelligence, deep learning, recommender systems, computer vision, and financial technology. He has authored more than 50 papers published in related international journals and conferences. He is also the Principal Investigator of the MOST Financial Technology Innovation Industrial-Academic Alliance and of several cooperation projects with leading companies in Taiwan.
