to predict stock prices; their experiment results revealed that the prediction results of the KNN algorithm were highly similar to actual stock prices. Fenghua et al. [2] employed various economic features in a support vector machine (SVM) to make price predictions. The results indicated that predictive methods that combine price features into SVMs have stronger performance. Hegazy et al. [3] proposed an algorithm that integrates particle swarm optimization (PSO) and a least-squares SVM (LS-SVM) to predict stock prices. The results revealed that the proposed model had more favorable prediction accuracy and that the PSO algorithm has potential for optimizing an LS-SVM. Kazem et al. [4] proposed a forecasting model based on chaotic mapping, a firefly algorithm, and support vector regression (SVR) to predict stock prices. Compared with related algorithms, the proposed model exhibited the highest performance in terms of two error measures: mean squared error (MSE) and mean absolute percent error (MAPE). The preceding discussion indicates that machine learning techniques exhibit favorable performance in stock market price prediction.

In the past, machine learning methods used human knowledge for feature extraction from data. The difference between machine learning and deep learning is that deep learning uses a multilayer neural network to scrutinize the data and extract relevant characteristics [5]. Numerous deep learning methods have been published. Bao et al. [6] proposed a novel deep learning framework in which wavelet transforms (WTs), stacked autoencoders (SAEs), and long short-term memory (LSTM) are combined for stock price forecasting. Their results revealed that the proposed model outperforms other similar models in both predictive accuracy and profitability. Hafezi et al. [7] proposed a bat-neural network multiagent system (BNNMAS) to predict stock prices. The results revealed that the BNNMAS performs accurately and reliably; thus, it can be considered a suitable tool for predicting stock prices, especially over the long term. Khare et al. [8] used feedforward neural networks and recurrent neural networks (RNNs) to forecast short-term stock prices. Their results indicated that the feedforward multilayer perceptron outperforms LSTM at predicting short-term stock prices. Selvin et al. [9] identified the latent dynamics in data by using deep learning architectures. These researchers employed RNN, LSTM, and convolutional neural network (CNN) architectures for price prediction of national stock exchange-listed companies and compared the performance of these architectures. The results revealed that the proposed system is capable of identifying some interrelations within the data. These results highlight that a CNN architecture can be applied to identify changes in trends. The preceding discussion reveals that many studies on deep learning techniques have focused on stock price forecasting.

Some studies have used statistical models or traditional financial methods [10], [11] to predict stock prices and trends. However, we believe that financial market data already contain some of the existing knowledge. Any financial system exhibits interactions between financial commodities. Scholars have published numerous reasoning models for objects or entities; these models can reason over the relations between the original data. Among reasoning models, relational networks (RNs) [12] are the most common. The method of Santoro et al., proposed at DeepMind in 2017, involves the use of RNs as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning. To date, most RN studies have focused on pattern recognition and classification tasks.

In addition to studying the interactions between financial products to help investors understand financial markets, many studies have found that anomaly detection plays a vital role in financial investigations. Detecting anomalies can help investors make investment decisions and can also reduce investment risks. Scholars have applied anomaly detection to crowded scenes [13], pricing data [14], network intrusion [15], hyperspectral images [16], and various other topics. In research on anomaly detection in the financial field, the vast majority of studies have used traditional methods as the main research tool. Typical anomalies in the financial market can be divided into market anomalies and pricing anomalies. "Market anomalies" refers to differences in returns that contradict the efficient market assumption. "Pricing anomalies" means that the pricing of an asset (such as a stock or security) differs from the pricing predicted by a model; the two most representative models are the capital asset pricing model (CAPM) [17] and the Fama–French three-factor model [18]. The CAPM and the Fama–French three-factor model are linear models of traditional finance. Compared with existing deep learning models, these two models do not completely fit the relevant theories. The complexity inherent in existing deep learning models enables them to simulate these theories well, indicating that existing complex neural network models have a certain generalization ability. Researchers have also applied deep learning techniques to financial market data [19], [20] for anomaly detection. Most research on anomaly detection has not been able to explain the reasons for anomalies, nor has it effectively reminded investors of crucial details. The frequency of anomalous events is irregular, and no clear definition of anomalies can be found in financial data. Financial market data do not have exact labels, making them dissimilar to text or image data when used in supervised learning; this complicates the use of traditional neural networks for training, detection, and response.

In the context of the aforementioned shortcomings and concerns, this paper presents methods for learning stock market trends and capturing the anomalous time points relevant to companies. It is vital to explain and analyze the causes of anomalies. However, financial deep learning is currently limited to learning the interactive relationship between two sequences. Even though investors seek to comprehend the trends of entire financial markets, the existing models simply cannot meet this challenge. However, research on applying RNs to the visual question answering (VQA) framework has been quite successful.
Such systems perform image recognition tasks accurately, and the results for some data sets are superior to human judgments. Given that the input data applied to the learning task of the RN are mostly image data and an RN model can perform object reasoning, we propose a spatiotemporal convolutional neural network–based relational network (STCNN-RN), which can simultaneously process spatiotemporal data and consider the complex interactive relationships of the input data. Initially, we use an RN to construct our model, and once the model successfully fits the training data during training, we must discover the most striking outlier time points that cannot fit the model. Such discovery is a complex optimization problem, so we use a genetic algorithm with a constrained gene (GACG) to discover the most notable outliers among the anomalous time points. Consequently, we contend that when the model fits most of the training data, the time points that cannot fit the model are anomalous time points that are incompatible with the entire financial market. This fundamentally describes our strategy for discovering anomalies. To enable investors to understand these anomalous time points, we use the local interpretable model–agnostic explanations (LIME) interpretable model to discover the key factors of the abnormalities.

This study proposes an effective anomaly detection system. The system includes an improved RN to learn the performance levels and interactions of various companies in the stock market, the results of which can assist investors to comprehend the anomalous time points exhibited by companies. Furthermore, investors can be informed of the causes of anomalies and can improve their understanding of anomaly patterns. The research framework that we have established can also be used to conduct arbitrage. Despite numerous publications on anomaly detection and analytical methods, most have not addressed time series data, and only a few studies have focused on financial data, such as accounting data [21], [22] and credit card data [19]. Unlike these approaches, we propose a novel system that can capture the relationships between multiple companies, examine anomalous trends in time series data, and analyze the causes of anomaly patterns.

The major contributions of this research can be summarized as follows:
• RN-based market model: Our proposed research framework is the first to use the STCNN-RN to model the complex cross-correlations between various companies in the stock markets; this framework can fit regular market behaviors between each pair of companies.
• Genetic algorithm–based anomaly detection method: Our proposed STCNN-RN and GACG can discover anomalous time points relevant to companies and can discover major events in stock markets.
• LIME-based interpretable model: We use the LIME interpretable model to analyze and explain the causes of anomalies. Using interpretable models to explain anomaly patterns makes it easier for investors to understand the market situation.
• Practical applications in various financial markets: We validated our studies by using multiple data sets. Minute price data of S&P 100 constituent stocks were taken from Wharton Research Data Services Trade and Quote (WRDS TQA) [23]. Minute price data of FTSE TWSE Taiwan 50 index constituent stocks were taken from the Taiwan Stock Exchange Corporation (TWSE). Daily price data of SSE 50 index constituent stocks were taken from the Shanghai Stock Exchange (SSE).

The remainder of this paper is organized as follows: In section 2, a brief introduction to related works is provided. Section 3 presents the proposed STCNN-RN and GACG method, and the experimental results are provided in section 4. Finally, the conclusion of this paper and future research suggestions are provided in section 5.

II. RELATED WORKS
In this section, we explain our learning-to-reason model and how we perform anomaly detection through deep learning and explainable artificial intelligence approaches.

A. LEARNING TO REASON MODEL
We are convinced that existing knowledge and information exist in financial market data. In financial market information, various financial commodities have interactive relationships with the financial market, and various relationships operate between different financial commodities. The learning-to-reason model can reason about (and discover) the relationships between various objects or entities. This learning-to-reason model entails the process of discovering whether the system has meaningful patterns of information flow or data transformation [24].

To analyze the relationships between time series data and to model such data, some researchers have used traditional statistical methods and complex mathematical models. Podobnik et al. [25] proposed a method based on detrended cross-correlation analysis (DCCA) in physics, physiology, and finance to analyze the relationship between two sequences; their method may have added diagnostic capabilities to the statistical methods that were current at the time. Wang and other researchers [26] conducted research on the China stock market. From the perspective of statistical analysis, the authors used detrended cross-correlation analysis between the return series of the China A-share and B-share markets and found that the system has long-term and short-term cross-correlations. Li and Liu [27] conducted research on cross-correlations between the agricultural commodity markets and the oil markets. The authors also used the DCCA method to discover that high oil prices caused the food crisis between 2006 and 2008. Kullmann et al. [28] sought a time correlation of returns between New York Stock Exchange stocks and studied whether the return of one stock can affect the return of another stock at different times.

The aforementioned research sought interactive relationships between various time series data, but those studies only modeled the cross-correlation between two sequences.
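To make this limitation concrete, the following minimal Python sketch (our illustration, not code from any of the cited studies; it uses synthetic return series and a plain lagged Pearson correlation as a simplified stand-in for full DCCA) shows that such pairwise analyses relate exactly one pair of sequences at a time:

    # Illustrative only: correlating exactly two return series, as in the
    # pairwise studies cited above (a simplified stand-in for DCCA, not a
    # reproduction of any cited method).
    import numpy as np

    rng = np.random.default_rng(0)

    # Two synthetic daily return series standing in for two stocks.
    returns_a = rng.normal(0.0, 0.01, 250)
    returns_b = 0.6 * returns_a + rng.normal(0.0, 0.008, 250)

    def lagged_correlation(x, y, lag):
        """Pearson correlation between x(t) and y(t + lag)."""
        if lag > 0:
            x, y = x[:-lag], y[lag:]
        elif lag < 0:
            x, y = x[-lag:], y[:lag]
        return float(np.corrcoef(x, y)[0, 1])

    # Cross-correlation at a few lags; each call relates only ONE pair of
    # series, which is the restriction discussed in the text.
    for lag in (-5, 0, 5):
        print(lag, round(lagged_correlation(returns_a, returns_b, lag), 3))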
Because the financial market is a complex and changeable environment, we require a model that can learn from multiple time series data sets to capture the complex interactions of financial market data. To surmount the previously discussed shortcomings, a novel model called the RN [12] was proposed. The RN is a neural network model proposed by DeepMind in 2017 to solve the relational reasoning problem. Most applications of RNs focus on pattern recognition and classification, such as object detection, few-shot learning [29], and image recognition [30]. Just as CNNs have spatial translation invariance, RNs are inherently capable of relational reasoning. By constraining the functional form of a neural network, an RN has the core common properties of relational reasoning. Scholars have researched image recognition by combining an RN with a VQA system [31]. The system performed exceptionally well with both three-dimensional-rendered objects and a text-based VQA data set.

In light of the aforementioned research, we observe that the learning-to-reason model has some deficiencies. Most of the current research on RNs has focused on image tasks for learning, but few studies have been published in the financial field. Much can be improved in the existing RN model. Because the tasks that the existing RN must address are pattern recognition and classification, most tested data sets consist of images, and thus, the relation module focuses on the spatial relationships among data. We contend that as long as different convolution operations are used on the input data, the relation module can be arbitrarily mobilized, meaning that both spatial and temporal relationships can be addressed. Given the success of the RN in pattern recognition and classification and its ability to process spatiotemporal data, we propose that the RN has the ability to process financial data in multiple time series. Therefore, we further contend that the RN can be used to learn the characteristics of the inference problem between objects and to discover the interactive relationships between different time points relevant to companies.

B. ANOMALY DETECTION USING DEEP LEARNING
Anomaly detection techniques can identify anomalies, novel data, and outlier data within vast quantities of data. Zenati et al. [32] presented high-performance generative adversarial networks (GANs) for anomaly detection using image data sets and network activity data sets. Leangarun et al. [33] demonstrated a model that merged a long short-term memory network (LSTM) and GANs to detect stock price manipulation. This was the first study that used LSTM-GANs to investigate stock price manipulation using time series data. However, Wang et al. [34] found that a company's characteristic features can effectively be used to distinguish between manipulated and nonmanipulated stocks. Published models have rarely incorporated this type of discriminatory feature. The authors applied an RNN-EL model to stock price manipulation problems using data sets with trading data and characteristic company features. Islam et al. [35] proposed detecting illegal insider trading of stocks by using proactive data mining on illegal insider trading cases and historical stock volume data. Furthermore, this research was the first such study of illegal insider trading using real cases.

Schreyer et al. [20] argued that the general anomaly detection method is to discover existing anomaly patterns from existing accounting data. Although a set of manual rules for catching anomalies can be learned from these existing anomaly patterns and the effect can be exceptional, it is to be expected that fraudsters will gradually discover methods to avoid these anomaly detection tools. Therefore, the authors proposed a method based on deep autoencoder neural networks to detect anomalous accounting data. Their results proved that the F1 scores obtained by this method were higher than those of the benchmark method and that the false alarm rate was also lower. Roy and other researchers [19] conducted research on anomaly detection in credit card data. They sought to discover anomalies in credit card fraud from existing credit card transaction data and historical customer data, thus providing a solution to the problem of credit card fraud detection. They proposed a deep neural network model to solve the problem of credit card fraud based on deep learning methods. It was found that LSTM and GRU networks are significantly better than a typical neural network at distinguishing abnormal transaction data from typical transaction data.

Few studies have been published on anomaly detection for time series data in the deep learning field; the present paper explains a deep learning method that can discover anomalous time points for stocks and companies in time series data. Few studies have applied deep learning methods to detect fraud in accounting data or to assess whether credit card transaction data contain fraudulent data. To date, no studies have used deep learning to discover anomalies in stock data.

C. EXPLAINABLE ARTIFICIAL INTELLIGENCE APPROACHES
Deep learning has been developing rapidly in the early twenty-first century. Deep learning tasks (e.g., natural language processing, image processing, recommendation systems, and anomaly detection) have distinguished themselves by high performance levels. Among typical deep learning tasks, image processing is the most commonly studied; it is a core research field of computer science and engineering. Of the successful applications of deep learning, training CNNs to recognize patterns and images has received considerable attention. CNNs that recognize patterns and images tend to encounter problems and vulnerabilities, such as outliers or adversarial examples that can confuse neural networks. Therefore, common algorithms and methods of interpretability and model inspection can be applied to neural networks [36]. Those methods can help us understand how a model works. Studies in a variety of fields have proven that intrinsically interpretable models can look at internal model parameters and make self-interpretations. A considerable number of studies of counterfactual explanations have changed some of the features to change predicted outcomes.
Two major problems must be addressed: (1) how to select anomalous time points from the total number of companies and times and (2) how to analyze the causes of the abnormalities at the selected time points and determine the types to which they belong. The three major parts of the system we designed can solve these two problems well. The first part is trend prediction (orange area in Fig. 1). For trend forecasting, we constructed a problem network structure that combines RNs and a visual question answering (VQA) system to capture the trends of diverse companies at various times and the interactions between them. We employed this approach because we believe that forecasting stock trends is more feasible than forecasting stock prices, and the learning task is more meaningful. After showing that our model can fit the data well, we expanded the number of stocks it learned from, moving from individual stocks to the entire stock market. This can help investors to focus on the stocks they are interested in and subsequently gain insight into stock market trends.

The second part of the system involves anomaly detection (blue area in Fig. 1). In the preceding paragraph, we outlined how we fit the data to our model. After the data were fitted to our model, we introduced traditional genetic algorithms to improve and optimize the model. By optimizing and improving on traditional genetic algorithms, we obtained the genetic algorithm with constrained genes (GACG) to meet the problem settings of our experiments. We used the GACG to conduct anomaly detection experiments on the VQA architecture combined with the input of the RN model. The GACG identifies the companies and the time points that are the strongest outliers in the model fit; thus, we endeavor to solve the first problem in our research through the first and second parts of our system.

The third part of the system involves anomaly explanation (green area in Fig. 1). We introduced the LIME model to explain the abnormal patterns identified in the first part and the second part. Moreover, the features that the introduced model evaluates as more important for the model's judgment are identified; we call these factors "key factors." By providing such information, (1) we hope that investors can more clearly understand the trend of the entire market and anomalous phenomena, and (2) we address the second problem in our study.

B. STCNN-RN
In the previous chapter, we mentioned that applying RNs combined with a VQA system to pattern recognition, classification, and relational reasoning problems can be successful because an RN considers the relations between objects in unstructured data. Our RN can be combined with a VQA system so that their strengths and weaknesses complement each other. The combination can address volatility question types to fit financial market data and capture the interactions of an entire financial market. We combine the RN with a VQA system to meet our problem requirements and achieve the ultimate goal of our research. Our network architecture was inspired by the RN [12] proposed by DeepMind in 2017, which can process unstructured data such as an image or a series of sentences and implicitly infer the relations between the objects contained in it. We thus propose the STCNN-RN, which has the ability to process spatiotemporal data and manage the complex interactive relationships between multiple time series.

As demonstrated in Figure 2, we constructed a problem network structure that combines RNs and a VQA system to capture the trends of different companies at various times and the interactions between them. This is because we believe that forecasting stock trends is more straightforward than forecasting stock prices and the learning task is more meaningful.
After proving that our model delivered excellent data fitting, we expanded the number of stocks it learned from one stock to the entire stock market. This can help investors pay attention to the stocks they are interested in and gain insight into stock market trends.

A typical RN operates on objects in the simplest possible form, and thus it does not explicitly operate on images or natural language. The main contribution of the RN is to provide enough flexibility that relatively unstructured inputs (such as CNN or LSTM embeddings) can be regarded as a set of objects in an RN. Although an RN would typically take the object representation as input, the system has no requirement to specify the semantics of the objects, and the RN learning process can induce upstream processing, thereby generating a set of useful objects from the distributed representation. To explain the operation mode and principle of our proposed model in detail, we divide the model into four parts, namely Visual, Question, RN, and Answering.

1) Visual
The RN-with-VQA system has two inputs; a time series data set serves as one of the inputs of our model in the visual part, and this input also serves as one of the objects of our model. The CNN convolves the return data, a time series matrix of size C × T, where C is the number of companies and T is the length of time. Although we use a two-dimensional (2D) convolutional layer, the order of the companies in the input time series matrix does not matter. Therefore, it is illogical to perform convolution from top to bottom in the direction of the company axis. After 2D convolution, the information of each company would be collected according to the output unit setting of the previous convolution layer and then mapped to a one-dimensional feature map vector with different channel sizes. This may result in the loss of information about each company because the result is a weighted sum of all companies. Consequently, we use a one-dimensional (1D) filter of size (1, w) for the 2D convolution operation, where w is the window size of the 1D filter. The size of the filter along the company axis is locked to 1, and the 1D convolution operation is simulated by the length w of the filter. We can obtain the feature map of the input data through Eq. 1:

    F_{k×T'} = f(I_{C×T} ∗ k + b)    (1)

where F_{k×T'} represents the feature maps of the final convolutional layer, I_{C×T} denotes the input data of the model, k represents the kernels (which can also be called filters) in the final convolutional layer, and b indicates the bias of the convolution layer. We know nothing about what specific image features should constitute an object. Therefore, after convolution of the image, the feature map is marked with arbitrary coordinates indicating its relative spatial position and is regarded as an object of the RN. This means that "objects" can include backgrounds, specific physical objects, textures, combinations of physical objects, and various other items. This provides the model with great flexibility in the learning process.

2) Question
As previously discussed, this model has two inputs; one input falls on the visual part, and the other input falls on the question part. Text or sentences can be used as the input data of the question part. In this part, we address the bottleneck of existing research and argue that relative and interactive relationships exist between different financial commodities in the time series data of the financial market. Therefore, we use two question types to learn and model the stock trends. These questions are based on the volatility of a stock and the strengths and weaknesses of a stock. A question about stock volatility is usually not much different from the following example: we can ask "Is the volatility of company A larger than the volatility of company B?" or "Is the volatility of company A less than the volatility of company B?" If the problem is related to the strengths and weaknesses of stocks, the question becomes "Is company A stronger than company B?" or "Is company A weaker than company B?" and so on. The system thus always has four ways to ask each question.

Whether there is a relationship and meaning between objects depends on the question. To obtain the question encoding q, we use one-hot encoding to encode the question. The order of the encoding results follows the order of words. We first encode the two companies in the question, and then we encode the question type. We have two question types, one concerning the volatility of the stock and the other the strengths and weaknesses of the stock. Finally, we encode the condition types to which the problem belongs. In one-hot encoding, the presence of the number "1" indicates that the feature exists, and the number "0" indicates that the feature does not exist.

3) RN
The RN module has two inputs, which are the feature maps from the CNN and the problem encoding results from one-hot encoding. Now we must build an RN module. The input to this network is a set of "objects" O = {o1, o2, ..., on}; the number of objects depends on the sequence length of the final CNN feature map. After convolution, each feature map remains 2D (retaining the company axis and time axis), and the target time step and company information become a vector (consisting of the values collected along the third dimension output by the 2D convolution layer). Company information is retained by convolving each company separately. In addition, because the order of companies in the matrix is arbitrary, we avoid convolution in the direction of the company axis. To extract a set of objects for relation calculation, we use the third-axis values of the final CNN feature map as the objects. We mark these objects with time and company coordinates, express them in the form of a key vector, and feed the object pairs to the RN module.
4) Answering
In the answering part, we obtain the output of the gθ MLP, which is called the partial relation. The sums of all partial relation elements can be written as a matrix, and

Fig. 3. Flow Chart of the GACG
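To make the Visual, Question, RN, and Answering parts described in the preceding subsections concrete, the following PyTorch sketch wires them together. It is our schematic reading of the text, not the authors' released code: the filter counts, window size w, MLP widths, and question dimension are placeholder assumptions, and the pairing over all objects is written for clarity rather than efficiency.

    # Minimal sketch of an STCNN-RN-style module (assumed hyperparameters).
    import torch
    import torch.nn as nn

    class STCNNRNSketch(nn.Module):
        def __init__(self, n_companies, seq_len, q_dim, w=5, channels=24, hidden=256):
            super().__init__()
            # Visual part: 2D convolution with a (1, w) kernel, so each
            # company's return series is convolved separately (Eq. 1) and no
            # convolution is performed along the company axis.
            self.conv = nn.Sequential(
                nn.Conv2d(1, channels, kernel_size=(1, w)), nn.ReLU(),
                nn.Conv2d(channels, channels, kernel_size=(1, w)), nn.ReLU(),
            )
            obj_dim = channels + 2  # feature channels + (company, time) coordinates
            # RN part: g_theta scores one (object_i, object_j, question) triple.
            self.g_theta = nn.Sequential(
                nn.Linear(2 * obj_dim + q_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            # Answering part: f_phi maps the summed partial relations to a
            # yes/no answer through a sigmoid.
            self.f_phi = nn.Sequential(
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, returns, question):
            # returns: (batch, companies, time); question: (batch, q_dim) one-hot.
            b = returns.size(0)
            fmap = self.conv(returns.unsqueeze(1))          # (b, ch, companies, t')
            _, ch, c, t = fmap.shape
            objs = fmap.permute(0, 2, 3, 1).reshape(b, c * t, ch)
            # Tag every object with its (company, time) coordinates.
            coords = torch.stack(torch.meshgrid(
                torch.arange(c, dtype=torch.float32),
                torch.arange(t, dtype=torch.float32),
                indexing="ij"), dim=-1).reshape(1, c * t, 2).expand(b, -1, -1)
            objs = torch.cat([objs, coords.to(objs.device)], dim=-1)
            n = objs.size(1)
            # Form all object pairs and append the question encoding to each pair.
            oi = objs.unsqueeze(2).expand(b, n, n, -1)
            oj = objs.unsqueeze(1).expand(b, n, n, -1)
            q = question.unsqueeze(1).unsqueeze(1).expand(b, n, n, -1)
            pairs = torch.cat([oi, oj, q], dim=-1)
            partial = self.g_theta(pairs).sum(dim=(1, 2))   # sum of partial relations
            return torch.sigmoid(self.f_phi(partial))       # answer A in [0, 1]

Note that the number of object pairs grows quadratically with the size of the final feature map, so in practice the object set is kept small.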
information about a particular company at a particular point in time. The ultimate goal is to discover the most extreme outliers among the training data. We contend that the data that cannot be fitted by the model may describe a certain period in the financial market that is markedly different from most periods for financial products in the entire market. At the same time, these captured data differ from the usual changes, meaning that our RN cannot simulate them. However, discovering the most extreme outlier data and the most anomalous training data among multisource financial time series data is a complicated optimization problem. Therefore, we introduce a genetic algorithm to solve this problem. We make changes and optimizations based on the genetic algorithm to obtain the GACG that meets our problem setting and achieves the optimal effect. To clearly explain the operation of the algorithm and its principle, we divide the proposed algorithm into six parts and describe the detailed operation of each part. The six parts are generate, fitness function, selection, crossover, mutation, and assess and limit the gene.

Before this genetic algorithm is executed, we must train the STCNN-RN so that it achieves high training accuracy. To successfully execute the GACG, we must perform some preliminary operations and prepare the input data it requires. We must send the training data (segmented according to the date) to a properly trained STCNN-RN for evaluation. Through this method, an accuracy matrix A_{C×C×T} is created, where C represents the number of companies and T represents all dates of the training data. This accuracy matrix A_{C×C×T} serves as the population in the actual real-world solution space. As shown in Figure 3, we propose the genetic algorithm with constrained gene (GACG) to conduct anomaly detection experiments on the VQA architecture combined with the input of the relational network model. The GACG finds the companies and the time points that are the strongest outliers in the model fit, so we hope to solve the first problem in our research through the first part and the second part.

Algorithm 1: Genetic Algorithm with Constrained Gene for Anomaly Detection
Input:
  A_{C×C×T}: the accuracy matrix generated from the proposed model, which consists of the accuracy results of all C companies over a period of time T;
  N: population size for each generation;
  C_rate: cross rate, the mating probability for chromosome crossover;
  M_rate: mutation rate, the mutation probability of each chromosome.
Output:
  Chromosome p_highest: the best chromosome over all generations.
 1: Generate initial population P0 = {p0, p1, ..., pN-1}
 2: Evaluate the fitness values of population P0 using A_{C×C×T} by Equation 9
 3: Find the chromosome p_highest with the highest fitness value by Equation 10
 4: Calculate the accuracy GA_p_highest_acc of chromosome p_highest by Equation 11
 5: Find the Rule-based_acc by Equation 12
 6: while (GA_p_highest_acc < Rule-based_acc) do
 7:   P_new = []
 8:   Append the top-k chromosomes by fitness value (p_top_k) into P_new
 9:   Select N-k chromosomes from population P according to the fitness values
10:   for each chromosome i in the N-k chromosomes do
11:     Crossover chromosome i with C_rate
12:     Mutate chromosome i with M_rate
13:     Check and limit the genes of chromosome i
14:     Append chromosome i into P_new
15:   end for
16:   P = P_new
17:   Evaluate the fitness values of population P
18:   Find the chromosome p_highest with the highest fitness value
19:   Calculate the accuracy GA_p_highest_acc of chromosome p_highest
20: end while
21: return Chromosome p_highest

1) Generate
After we have completed the pre-operations of the genetic algorithm, we officially start its first step, which is to generate a population. First, we need to decide on the representation used to express a solution of the genetic algorithm. An incorrect representation will cause the genetic algorithm to perform poorly. The simplest and most frequently used representation in past genetic algorithms has been the binary representation. This form of representation is relatively easy to use in the computing space to express solutions in a way that the computing system understands and operates on. In this study, we use a binary representation of C × C × T elements to represent a solution. In this binary representation of a chromosome p, the element p^t_{i,j} corresponds to the training accuracy of the model for companies i and j at time point t. We define the element p^t_{i,j} as follows:

    p^t_{i,j} = 1 (anomaly) or 0 (normal)    (6)

If the element p^t_{i,j} is an anomaly, it is set to 1; otherwise, it is set to 0. In our research, the population is defined as a subset of problem solutions. In the process of generating a population, we ensure the diversity of the population to avoid the problem of premature convergence.
We must repeatedly experiment to determine the optimal population size, because an excessively large population can cause the GACG's operation speed to be unacceptably slow, whereas an excessively small population can cause insufficient population diversity. After deciding on the size of our population, we fill only a few reasonable solutions into the set of initialized solutions and fill the rest of the population with random solutions.

Equation 9 makes the calculated fitness value f(p_values) large when the accuracy values of chromosome p_values(p_x) are large. To allow every chromosome in the population to have a chance of producing offspring, we add a small number to the fitness function; consequently, a chromosome with smaller p_values receives a small fitness value instead of 0.
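Because Equations 7 through 12 are not reproduced in this excerpt, the following Python sketch is only our illustrative reading of the GACG machinery described around them: a chromosome is a 0/1 vector over the C × C × T accuracy matrix, its fitness rewards high accuracy on the entries it keeps (plus the small constant mentioned above), parents are drawn with fitness-proportional probabilities as in Equation 13, and the gene count is constrained to the preset number of anomalous points. The function names and toy sizes are our own.

    # Illustrative GACG-style step under our stated assumptions.
    import numpy as np

    rng = np.random.default_rng(1)
    EPS = 1e-6          # the "small number" added to the fitness
    N_ANOMALIES = 60    # constrained number of genes allowed to be 1

    def fitness(chromosome, acc):
        """chromosome: flat 0/1 vector over the C*C*T accuracy matrix `acc`."""
        kept = acc.ravel()[chromosome == 0]
        return kept.mean() + EPS

    def roulette_select(population, fits, k):
        """Fitness-proportional selection (cf. Eq. 13)."""
        probs = fits / fits.sum()
        idx = rng.choice(len(population), size=k, p=probs)
        return [population[i] for i in idx]

    def limit_gene(chromosome, n_ones=N_ANOMALIES):
        """'Limit the gene': keep exactly n_ones positions set to 1."""
        ones = np.flatnonzero(chromosome)
        out = np.zeros_like(chromosome)
        if len(ones) >= n_ones:
            out[rng.choice(ones, size=n_ones, replace=False)] = 1
        else:  # too few 1s: top up with random positions
            out[ones] = 1
            zeros = np.flatnonzero(out == 0)
            out[rng.choice(zeros, size=n_ones - len(ones), replace=False)] = 1
        return out

    # Toy usage with a random 5 x 5 x 10 accuracy matrix.
    acc = rng.uniform(0.4, 1.0, size=(5, 5, 10))
    population = [limit_gene(rng.integers(0, 2, acc.size)) for _ in range(8)]
    fits = np.array([fitness(p, acc) for p in population])
    parents = roulette_select(population, fits, k=4)
    print(len(parents), round(fits.max(), 4))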
to select N-k chromosomes. The equation for calculating the probability of selecting a chromosome is as follows:

    P(f(p_values(p_i))) = f(p_values(p_i)) / Σ_{i=1}^{N} f(p_values(p_i))    (13)

where p_i represents a chromosome of the population, f(p_values(p_i)) stands for the fitness value of that chromosome, and N denotes the size of the population for each generation. We then screen out N-k chromosomes according to the calculated probability P(f(p_values(p_i))). Each of these N-k chromosomes must enter the crossover part.

5) Crossover
First, we let each of the N-k chromosomes take turns as the parent, which we call the "father." We generate a random number and compare it with the cross rate C_rate. If the random number is larger than the cross rate C_rate, the chromosome does not enter the crossover stage, and the father's chromosome directly becomes a chromosome of the next generation. Conversely, if this random number is less than the cross rate C_rate, we officially enter the crossover stage. After entering the crossover part, we randomly select one of the N-k chromosomes as the mating target, which we call the "mother." Further, we discover and remember the positions of the mother's genes with value 1. Suppose we set the number of anomalous time points to 60. Then, we extract half of the gene positions with value 1 from the mother's genes, and the other half of the gene positions are extracted from the father's genes. For the remaining gene positions, we fill in 0 as the gene value. We combine the extracted mother and father genes to form the new genes. This describes the mating process of chromosomes.

6) Mutation
We take the chromosome that has just emerged from the crossover and enter it into the gene mutation part. First, we generate a random number for each gene to determine whether that gene must be mutated. One can judge whether a mutation is required by comparing a randomly generated number with the mutation rate M_rate we set. If this random number is greater than the mutation rate M_rate, we do not enter the mutation procedure, and the chromosome can be used directly as a new chromosome of the next generation. Conversely, if this random number is less than the mutation rate M_rate, we enter the stage of gene mutation. In the stage of gene mutation, we turn an original 1 gene into 0; conversely, genes that were originally 0 are set to 1.

7) Assess and limit the gene
In this part, we ensure that in each chromosome the number of genes with value 1 matches the number of anomalous time points we set; the system is specially set up to assess and limit the genes. After each chromosome has gone through selection, mating, and mutation, the system confirms whether the number of genes with value 1 matches the number of anomalous time points we set. If it matches, the chromosome can be regarded as a new chromosome of the next generation. Conversely, if the number of genes with value 1 does not meet the number we set, we limit the number of genes: we discover and record the positions of all 1s in the genes, extract the set number of them as the new 1s, and set the remaining positions to 0. After this stage is completed, the new chromosomes become the population P_new of the next generation.

When we obtain a new population P_new, we recalculate the fitness values of the new population. Then, we find the candidate chromosome accuracy GA_p_highest_acc of the new population P_new and compare it with the baseline accuracy rate (Greedy) to check the stopping conditions, iterating until the stopping conditions are met. In the end, the algorithm returns an optimal solution. The best solution achieves the best accuracy after deleting the anomalous time points.

IV. EXPERIMENTS
To ensure and verify the validity of our proposed model, we conducted three experiments to evaluate and measure the proposed model. Before discussing the experiments, we describe the data sets used in the experiments. The baseline methods used to compare the different models are then defined. After a detailed description of the experimental setup, the results of the three experiments are displayed and discussed.

A. DATA SET DESCRIPTION
We used four data sets in three experiments: the S&P 500 constituent stocks Top 20 Tech Companies data set, the S&P 100 constituent stocks data set, the FTSE TWSE Taiwan 50 index constituent stocks data set, and the SSE 50 index constituent stocks data set.

TABLE 1. The company list of the S&P 500 constituent stocks' top 20 tech companies data set
Ticker Symbol: AAPL, MSFT, V, MA, INTC, CSCO, ADBE, CRM, NVDA, ACN, AVGO, ORCL, IBM, TXN, QCOM, FIS, ADP, INTU, FISV, MU

S&P 500 constituent stocks Top 20 Tech Companies data set: As shown in Table 1, we collected minute data of S&P 500 constituent stocks from 2015 to 2016. The S&P 500 constituent stock data contain five features: open, high, low, close, and volume. We selected companies from the S&P 500 constituent stocks according to their published weights. Therefore, we selected the 20 technology companies in the S&P 500 with the greatest weights as our first data set. The number of stocks in the 2 years is different.
After discarding some incomplete stock data, the number of stocks in 2015 and 2016 are 483 and 490, respectively. In the process of screening stocks, we selected 20 technology companies that coexisted during the 2 relevant years.

TABLE 2. The company list of the S&P 100 constituent stocks data set
Ticker Symbol: AAPL, ABBV, ABT, ACN, AIG, ALL, AMGN, AMZN, AXP, BA, BAC, BIIB, BK, BMY, BRKB, C, CAT, CELG, CL, CMCSA, COF, COP, COST, CSCO, CVS, CVX, DIS, EMR, EXC, F, FB, FDX, FOXA, GD, GE, GILD, GM, GOOG, GOOGL, GS, HAL, HD, HON, IBM, INTC, JNJ, JPM, KMI, KO, LLY, LMT, LOW, MA, MCD, MDLZ, MDT, MET, MMM, MO, MON, MRK, MS, MSFT, NKE, ORCL, OXY, PEP, PFE, PG, PM, QCOM, RTN, SBUX, SLB, SO, SPG, T, TGT, TWX, TXN, UNH, UNP, UPS, USB, UTX, V, VZ, WBA, WFC, WMT

The S&P 100 constituent stocks data set: As shown in Table 2, the second data set we used in Experiment 1 is the minute stock price data of the S&P 100 constituent stocks. Minute price data of S&P 100 constituent stocks were taken from Wharton Research Data Services Trade and Quote (WRDS TQA) [23]. To compare the effects between the S&P 500 constituent stocks Top 20 Tech Companies data set and the S&P 100 constituent stocks data set, we took the years shared by the two data sets and finally screened out the 2015 and 2016 data for model training. Therefore, when screening the S&P 100 constituent stocks, we also only took stocks shared in the 2 years. The list of S&P 100 companies differs between the 2 years; from the 100 companies, we took the stock price data of the 90 companies that existed in both years.

The FTSE TWSE Taiwan 50 Index constituent stocks data set: The FTSE TWSE Taiwan 50 index is an index jointly compiled by the Taiwan Stock Exchange (TWSE) and the FTSE Index. The FTSE TWSE Taiwan 50 Index components cover the top 50 listed companies in the Taiwan stock market by market capitalization, and the index is highly correlated with the broader market. The minute price data of stocks were taken from the TWSE, and we collected data on intraday stock transactions from 2016 to 2017, including information such as opening, high, low, closing, and volume. Because the lists of FTSE TWSE Taiwan 50 Index constituent stocks in 2016 and 2017 are different, we only took stocks shared in 2016 and 2017 as our training data. We screened 48 listed companies from the 50 listed companies in Experiment 2.

The SSE 50 Index constituent stocks data set: Daily price data of SSE 50 index constituent stocks were taken from the Shanghai Stock Exchange (SSE). The SSE 50 Index is an index compiled by the SSE. It is regarded as an index representing the overall situation of the most influential companies on the SSE. It selects the most representative stocks with a large scale and excellent liquidity in the Shanghai stock market to form the index. The goal of compiling this index is to establish a large-scale investment index with active transactions that can be used as the basis for derivative financial instruments. The method of selecting component stocks is to comprehensively rank stocks based on market capitalization and turnover and to select the top 50 stocks to form a sample, excluding stocks that have abnormal market performance and are deemed inappropriate by the expert committee. Due to changes in the SSE's selection rules for constituent stocks, the list of SSE constituent stocks has changed greatly. We collected daily trading data of the constituent stocks in 2019 and 2020. We selected from the 50 listed companies coexisting on the constituent stock lists in 2019 and 2020 and finally selected 37 for our training and testing data sets in Experiment 2.

B. PERFORMANCE EVALUATION OF THE PROPOSED ANOMALY DETECTOR
An experiment was conducted to compare the performance of the STCNN-RN with other baseline models.

A multilayer deep neural network (multilayer DNN) is a model composed of multiple dense layers. We introduced a model composed of multiple dense layers as our experimental baseline model. The input of the visual part in this model is I_{C×T}, but we chose to directly import a flattening layer to process this input. The input processing method of the question part is the same as that of the STCNN-RN. Then, we import a concatenation layer to process the outputs from the visual part and the question part and connect four dense layers behind this concatenation layer. The output of the last dense layer passes through the activation function "sigmoid" before the answer A is returned. This is how the model works.

Another baseline model used in our experiment was the temporal convolutional neural network–based relational network (TCNN-RN). We applied the financial time series data in a model based on 1D convolution. The difference between the TCNN-RN and the STCNN-RN is that in the visual part, the TCNN-RN uses a multilayer 1D-CNN to process the input matrix to generate a set of objects for the RN module. After convolving the return data, we connect the object pairs extracted from the final CNN feature map with the corresponding time coordinates and the question one-hot encoding vector. Then, we feed the connected vector to the RN module to calculate the relation, and the final output is the corresponding answer A.

To verify the effectiveness of the GACG, we also proposed other baseline methods for determining genes, namely None, Random Choice, and Greedy. None measures the ability of our model to capture the relationship between different time series data; for this method, we introduced binary cross-entropy to calculate the performance of the model on the binary classification problems. To evaluate the ability of the GACG to catch anomalies, we introduced the Random Choice method. The concept of Random Choice is to randomly generate N chromosomes and then calculate the accuracy for each chromosome, after which we sum the N calculated accuracies and divide by N to obtain the accuracy of Random Choice. We also select the worst 5% of accuracy rates in the accuracy matrix and set those values to 0.
TABLE 3. Anomaly detection accuracy of existing models with the S&P 100 constituent stocks data set
(Each row lists the anomaly detection accuracy for rolling tests 1-10, followed by the average.)

Multi-layer DNN
  None:           0.5576, 0.5067, 0.7545, 0.6108, 0.6654, 0.6054, 0.6183, 0.6355, 0.6662, 0.5168; average 0.6083
  Random Choice:  0.5766, 0.5251, 0.7744, 0.6299, 0.6848, 0.623, 0.6353, 0.6532, 0.6847, 0.5381; average 0.6415
  Greedy:         0.5773, 0.5258, 0.7752, 0.6305, 0.6856, 0.6238, 0.6363, 0.6542, 0.6855, 0.5388; average 0.6423
  Proposed GACG:  0.5776, 0.526, 0.7757, 0.6307, 0.6858, 0.6241, 0.6367, 0.6546, 0.6859, 0.5391; average 0.6426
TCNN-RN
  None:           0.5747, 0.5103, 0.5649, 0.544, 0.5138, 0.5164, 0.5508, 0.5203, 0.5219, 0.5172; average 0.5334
  Random Choice:  0.5812, 0.516, 0.5713, 0.5501, 0.5196, 0.5222, 0.557, 0.5262, 0.5278, 0.523; average 0.5394
  Greedy:         0.5819, 0.5168, 0.572, 0.5508, 0.5203, 0.5228, 0.5577, 0.5269, 0.5284, 0.5237; average 0.5401
  Proposed GACG:  0.8661, 0.5192, 0.5759, 0.554, 0.5228, 0.5252, 0.5612, 0.5292, 0.5305, 0.5252; average 0.5709
STCNN-RN
  None:           0.7964, 0.7664, 0.7867, 0.7799, 0.795, 0.7772, 0.7698, 0.7745, 0.818, 0.8127; average 0.7876
  Random Choice:  0.8054, 0.775, 0.7956, 0.7887, 0.8039, 0.7859, 0.7784, 0.7833, 0.8272, 0.8218; average 0.7965
  Greedy:         0.8067, 0.7761, 0.7964, 0.7896, 0.8049, 0.787, 0.7796, 0.7843, 0.8282, 0.823; average 0.7976
  Proposed GACG:  0.8212, 0.7907, 0.8093, 0.8027, 0.818, 0.8006, 0.7928, 0.7978, 0.8429, 0.8368; average 0.8113
Then, we add up the entire accuracy matrix A_{C×C×T} and divide by C × C × T to obtain the Greedy accuracy.

In Experiment 1, we used two data sets to train our proposed model: the S&P 500 constituent stocks Top 20 Tech Companies data set and the S&P 100 constituent stocks data set. The purpose of using these two data sets was to verify the effectiveness of our proposed model. Moreover, we also wanted to prove that our model can learn the interactions between stocks regardless of whether only a small amount or a large amount of time series data is available.

We used the closing price to calculate the return data required by the model and then processed them into the input required by the model. The results in Table 3 and Table 4 indicate that the extent to which the model can learn financial time series data affects the model's fitting accuracy. Based on the results of the experiments, we conclude that the STCNN-RN has excellent fitting accuracy for the S&P 500 constituent stocks Top 20 Tech Companies data set and the S&P 100 constituent stocks data set. This means that the STCNN-RN has the ability to learn financial time series data and is suitable for processing spatiotemporal data. To compare the results of these models, we used bold fonts to highlight the highest anomaly detection accuracy.

To conduct the anomaly detection accuracy experiments detailed in Table 3 and Table 4, we used the Random Choice method and the Greedy method, respectively, to screen our best solutions. The solution obtained by the STCNN-RN combined with the GACG achieved the highest accuracy after deleting all anomalous time points with companies, followed by the multilayer DNN combined with the GACG and the TCNN-RN combined with the GACG. Table 3 shows that the STCNN-RN combined with the GACG has a higher accuracy rate than the other baseline models, and the anomaly accuracy rate ranges from 0.7907 to 0.8429 in most of the rolling tests. Table 4 shows the same result: the anomaly accuracy rate ranges from 0.7809 to 0.9457. Therefore, we can draw the following conclusions from the results of Experiment 1: the STCNN-RN can accurately capture the interactions between different companies, and the STCNN-RN has the ability to process spatiotemporal data. Compared with other baseline models, the experiment with the STCNN-RN combined with the GACG proves that the model can discover the best solution and obtain a higher anomaly detection accuracy rate.

C. ANOMALY DETECTION IN VARIOUS FINANCIAL MARKETS
To prove that our model has a certain generalization ability, in addition to using the S&P 100 data set to verify the performance of our proposed model on the US stock market, we introduced the FTSE TWSE Taiwan 50 Index components and the SSE 50 Index components; they represent the Taiwan stock market and the Shanghai stock market, respectively.

In Table 5, we list the outcome of subjecting three data sets to our proposed model and the baseline models. In this experiment, the three data sets represented different stock markets. The component stock data set of the S&P 100 Index represented the US stock market, the component stock data set of the FTSE TWSE Taiwan 50 Index represented the Taiwan stock market, and the SSE 50 Index represented the Shanghai stock market.

It turns out that the proposed model can fit different stock markets well. Even when facing data sets with different data frequencies, the STCNN-RN's data-fitting accuracy was still better than that of the TCNN-RN or the multilayer DNN. Moreover, this proved that the proposed model can learn the interactive relationships in different time series data regardless of whether it is intraday transaction data or daily transaction data and that it is more than capable of predicting future trends. However, the performance of the TCNN-RN was slightly inferior to that of the multilayer DNN, possibly due to the use of the 1D-CNN to process the input data to make object pairs, which prevents the model from learning how to predict stock trends.

Table 5 lists the accuracy results of the three models combined with the different baseline methods for the best solution. A comprehensive assessment of the experiment results indicates that the STCNN-RN combined with the GACG has a higher accuracy rate than the other baseline models combined with different baseline methods and the anomaly
TABLE 4. Anomaly detection accuracy of existing models with the FTSE TWSE Taiwan 50 Index constituent stocks data set

Model / Anomaly Detection        | Rolling Test 1–10                                                         | Average
Multi-layer DNN / None           | 0.7721 0.829 0.741 0.792 0.7511 0.6959 0.7322 0.7179 0.7027 0.7167       | 0.7451
Multi-layer DNN / Random Choice  | 0.7886 0.8467 0.7569 0.8088 0.7671 0.7107 0.7478 0.7331 0.7178 0.7321    | 0.761
Multi-layer DNN / Greedy         | 0.7898 0.8479 0.7582 0.8104 0.7685 0.7122 0.7491 0.7345 0.7191 0.7338    | 0.7624
Multi-layer DNN / Proposed GACG  | 0.8119 0.8715 0.7804 0.8323 0.7896 0.7319 0.7697 0.7547 0.7402 0.7564    | 0.7839
TCNN-RN / None                   | 0.7331 0.7453 0.7171 0.7508 0.7233 0.7249 0.7521 0.715 0.7309 0.7111     | 0.7304
TCNN-RN / Random Choice          | 0.7487 0.7612 0.7323 0.7668 0.7387 0.7403 0.7681 0.7302 0.7465 0.7263    | 0.7459
TCNN-RN / Greedy                 | 0.75 0.7626 0.7337 0.7682 0.7401 0.7418 0.7694 0.7316 0.7481 0.7278      | 0.7473
TCNN-RN / Proposed GACG          | 0.7713 0.7851 0.7544 0.7898 0.7625 0.7624 0.7914 0.7516 0.77 0.7487      | 0.7687
STCNN-RN / None                  | 0.852 0.8851 0.8599 0.8271 0.7417 0.8477 0.8803 0.851 0.9027 0.8797      | 0.8527
STCNN-RN / Random Choice         | 0.8702 0.904 0.8782 0.8447 0.7576 0.8657 0.8991 0.8691 0.9219 0.8984     | 0.8709
STCNN-RN / Greedy                | 0.8711 0.9046 0.8789 0.846 0.7592 0.8665 0.8996 0.8701 0.9224 0.8989     | 0.8717
STCNN-RN / Proposed GACG         | 0.8941 0.9273 0.9008 0.8687 0.7809 0.889 0.9222 0.8935 0.9457 0.9212     | 0.8943
TABLE 5. Anomaly detection accuracy of existing models with the SSE 50 Index constituent stocks data set

Model / Anomaly Detection        | Rolling Test 1–10                                                         | Average
Multi-layer DNN / None           | 0.8352 0.793 0.7967 0.7815 0.8148 0.8447 0.8143 0.8352 0.8246 0.8092     | 0.8149
Multi-layer DNN / Random Choice  | 0.9171 0.876 0.8817 0.8592 0.9167 0.9542 0.9043 0.9171 0.903 0.8807      | 0.901
Multi-layer DNN / Greedy         | 0.9213 0.8815 0.8868 0.8642 0.9213 0.9569 0.9087 0.9222 0.908 0.8877     | 0.9059
Multi-layer DNN / Proposed GACG  | 0.9527 0.9115 0.9174 0.8933 0.9536 0.9607 0.94 0.9542 0.9383 0.9175      | 0.9339
TCNN-RN / None                   | 0.545 0.5355 0.5664 0.5119 0.5198 0.5268 0.5699 0.5843 0.5595 0.5741     | 0.5493
TCNN-RN / Random Choice          | 0.728 0.6569 0.7353 0.6407 0.6581 0.6449 0.724 0.7537 0.7224 0.6588      | 0.6923
TCNN-RN / Greedy                 | 0.7284 0.666 0.7366 0.6492 0.6657 0.6467 0.7229 0.763 0.7173 0.6667      | 0.6963
TCNN-RN / Proposed GACG          | 0.77 0.6918 0.775 0.6756 0.6939 0.6781 0.7606 0.7938 0.7653 0.6915       | 0.7296
STCNN-RN / None                  | 0.8133 0.8448 0.938 0.8567 0.9092 0.8904 0.9077 0.9106 0.9035 0.9388     | 0.8913
STCNN-RN / Random Choice         | 0.9446 0.9294 0.9716 0.9476 0.9603 0.9628 0.9626 0.9792 0.9496 0.9767    | 0.9584
STCNN-RN / Greedy                | 0.9487 0.9364 0.9741 0.9515 0.9634 0.967 0.9663 0.9806 0.9566 0.9787     | 0.9623
STCNN-RN / Proposed GACG         | 0.9822 0.97 0.9754 0.9857 0.9939 0.9985 0.998 0.9816 0.9959 0.9799       | 0.9861
The second highest performance was achieved by the multilayer DNN combined with the GACG, followed by the TCNN-RN combined with the GACG. In this experiment, the time intervals of the three data sets were different, but their rolling windows and shift-month methods were the same. Therefore, this experiment once again demonstrated the quality of our model in learning the complex correlations among multiple financial time series. This indicates that when facing various stock markets, data frequencies, or data intervals, the STCNN-RN combined with the GACG method delivers outstanding performance.
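As an illustration of the rolling-window evaluation with a monthly shift mentioned above, the following is a minimal sketch; the window lengths and date range are illustrative assumptions rather than the values used in this study.

```python
# Minimal sketch of a rolling-window split with a fixed monthly shift, as used
# for the rolling tests described above. The window lengths and date range are
# illustrative assumptions, not the values used in the paper.
import pandas as pd


def rolling_windows(dates: pd.DatetimeIndex, train_months: int = 12,
                    test_months: int = 3, shift_months: int = 1):
    """Yield (train_dates, test_dates) pairs, shifting the window month by month."""
    start = dates.min()
    while True:
        train_end = start + pd.DateOffset(months=train_months)
        test_end = train_end + pd.DateOffset(months=test_months)
        if test_end > dates.max():
            break
        yield (dates[(dates >= start) & (dates < train_end)],
               dates[(dates >= train_end) & (dates < test_end)])
        start += pd.DateOffset(months=shift_months)


# Example: the first ten rolling tests over synthetic business days.
dates = pd.date_range("2015-01-01", "2019-12-31", freq="B")
for i, (train, test) in enumerate(rolling_windows(dates), start=1):
    if i > 10:
        break
    print(f"Rolling test {i}: train {train[0].date()} to {train[-1].date()}, "
          f"test {test[0].date()} to {test[-1].date()}")
```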
D. COMPARISON OF DIFFERENT ALTERNATIVE MODELS
Tables 3-5 indicate that the solution obtained by the STCNN-RN combined with the GACG achieved the highest accuracy. Therefore, we used the STCNN-RN combined with the GACG method for comparisons with other benchmark models. We selected the one-class SVM (OC-SVM) [40] and the self-organizing map (SOM) [41] as the control group.

The OC-SVM is an unsupervised algorithm in which the training data have only one class. In this algorithm, a decision boundary is learned from the characteristics of normal samples, and this boundary is used to determine whether a new data point is similar to the training data; data beyond the boundary are regarded as abnormal. Shawe-Taylor and Žličar [40] applied the OC-SVM to identify potential anomalies in financial time series data and to find the distribution and timing of the anomalous behavior in these data. Their results indicated that the OC-SVM detected changes in anomalous behavior in synthetic data sets and in several empirical data sets.

An SOM [42] is an unsupervised clustering algorithm. An SOM differs from other clustering algorithms in that it has a topological map that expresses the distribution of each output or cluster. Therefore, an SOM can express high-dimensional data in a visualized low-dimensional space, and the visualized result can also effectively explain the clustering outcome. Li et al. [41] used an SOM in a dynamic environment to discover abnormal financial behaviors of corporations. Their results indicated that combining macroeconomic indicators with financial indicators is superior to using financial indicators alone, and that a hierarchical SOM helps identify the abnormal financial behaviors associated with corporate operations more effectively.
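For concreteness, the following is a minimal sketch, not the code used in this study, of how an OC-SVM control group of the kind described above could be fitted to return data with scikit-learn; the synthetic return matrix and the `nu` setting are illustrative assumptions.

```python
# Minimal sketch (not the code used in this study) of the OC-SVM control group
# described above, fitted with scikit-learn. The return matrix is synthetic and
# the nu value is an illustrative assumption; nu upper-bounds the fraction of
# training samples treated as outliers.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(500, 20))    # 500 trading days x 20 stocks

X = StandardScaler().fit_transform(returns)

oc_svm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X)
labels = oc_svm.predict(X)                          # +1 = normal, -1 = anomalous
anomalous_days = np.flatnonzero(labels == -1)
print(f"{anomalous_days.size} of {len(X)} days flagged as anomalous")
```

An SOM baseline can be used in a similar way, for example by flagging samples whose quantization error to the best-matching unit is unusually large.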
TABLE 6. Anomaly detection accuracy of different alternative models with the S&P 100 constituent stocks data set

Model          | Rolling Test 1–10                                                         | Average
OC-SVM         | 0.5509 0.5394 0.5442 0.5501 0.5439 0.5361 0.5508 0.5432 0.5652 0.5335    | 0.5457
SOM            | 0.5951 0.5306 0.5881 0.5949 0.6219 0.5598 0.5497 0.6349 0.6518 0.5803    | 0.5907
Proposed GACG  | 0.8212 0.7907 0.8093 0.8027 0.818 0.8006 0.7928 0.7978 0.8429 0.8368     | 0.8113

Significance tests, first comparison (Rolling Tests 1–10):
F-value     | 47.70736 9.42140 36.30940 13.45678 33.73106 52.48977 23.38555 29.89704 38.24091 2.39872
P-value     | 4.20E-09 0.00326 1.24E-07 0.00053 2.82E-07 1.14E-09 1.01E-05 1.01E-06 6.77E-08 0.12687
Test result | Reject H0 for tests 1–9; accept H0 for test 10
T-value     | 124.38467 413.67958 120.46406 113.82810 133.68709 112.32497 148.25440 149.31730 190.73750 429.87419
P-value     | 4.12E-72 2.44E-102 2.62E-71 6.92E-70 6.37E-74 1.49E-69 1.61E-76 1.06E-76 7.46E-83 2.63E-103
Test result | Reject H0 for all ten tests

Significance tests, second comparison (Rolling Tests 1–10):
F-value     | 31.61851 9.08195 50.03186 97.66312 46.27081 11.23722 51.40315 77.79999 75.52385 28.86643
P-value     | 5.65E-07 0.00382 2.21E-09 4.80E-14 6.29E-09 1.42E-03 1.52E-09 2.63E-12 4.31E-12 1.43E-06
Test result | Reject H0 for all ten tests
T-value     | 84.07712 167.15383 40.20941 45.73598 69.99179 141.44962 38.64878 56.08296 51.86619 155.89619
P-value     | 2.66E-62 1.55E-79 4.68E-44 3.34E-47 9.98E-58 2.44E-75 4.29E-43 3.16E-52 2.69E-50 8.79E-78
Test result | Reject H0 for all ten tests
The results for the OC-SVM, the SOM, and our proposed method are provided in Table 6. Our method outperformed the other methods in anomaly detection accuracy and exhibited the strongest detection capability in all 10 rolling tests. To validate that our proposed model has more favorable detection capabilities, we used a t test. The t test is used to determine whether a significant difference exists between two groups, which would mean that the samples were drawn from different populations. In our results, the t test returned significant p values; thus, the performance difference between our proposed method and the benchmark models is statistically significant.
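As an illustration of this kind of significance testing, the following is a minimal sketch using SciPy; the accuracy samples are synthetic stand-ins, and pairing an F test for equal variances with a two-sample t test mirrors the F-value/T-value structure of Table 6 under that assumption.

```python
# Minimal sketch of the significance testing reported in Table 6, using SciPy.
# The accuracy samples are synthetic stand-ins; pairing an F test for equal
# variances with a two-sample t test mirrors the F-value/T-value rows of the
# table under that assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
acc_proposed = rng.normal(0.81, 0.01, size=30)   # hypothetical accuracy samples
acc_baseline = rng.normal(0.55, 0.01, size=30)

# Two-sided F test on the ratio of sample variances.
f_stat = np.var(acc_proposed, ddof=1) / np.var(acc_baseline, ddof=1)
df1, df2 = acc_proposed.size - 1, acc_baseline.size - 1
f_cdf = stats.f.cdf(f_stat, df1, df2)
f_pvalue = 2 * min(f_cdf, 1 - f_cdf)

# Choose the t-test variant based on the variance test: pooled if the variances
# look equal, Welch's otherwise.
t_stat, t_pvalue = stats.ttest_ind(acc_proposed, acc_baseline,
                                   equal_var=(f_pvalue > 0.05))

print(f"F = {f_stat:.3f} (p = {f_pvalue:.3g}), t = {t_stat:.3f} (p = {t_pvalue:.3g})")
if t_pvalue < 0.05:
    print("Reject H0: the two methods' accuracy distributions differ significantly.")
```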
[23] WRDS. (2015) Wharton research data services. [Online]. Available: https://wrds-www.wharton.upenn.edu/
[24] C.-H. Kuo, C.-T. Chen, S.-J. Lin, and S.-H. Huang, "Improving generalization in reinforcement learning–based trading by using a generative adversarial market model," IEEE Access, vol. 9, pp. 50738–50754, 2021.
[25] B. Podobnik and H. E. Stanley, "Detrended cross-correlation analysis: A new method for analyzing two nonstationary time series," Physical Review Letters, vol. 100, no. 8, p. 084102, 2008.
[26] Y. Wang, Y. Wei, and C. Wu, "Cross-correlations between Chinese A-share and B-share markets," Physica A: Statistical Mechanics and its Applications, vol. 389, no. 23, pp. 5468–5478, 2010.
[27] L. Liu, "Cross-correlations between crude oil and agricultural commodity markets," Physica A: Statistical Mechanics and its Applications, vol. 395, pp. 293–302, 2014.
[28] L. Kullmann, J. Kertész, and K. Kaski, "Time-dependent cross-correlations between different stock returns: A directed network of influence," Physical Review E, vol. 66, p. 026125, Aug. 2002. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevE.66.026125
[29] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales, "Learning to compare: Relation network for few-shot learning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1199–1208.
[30] Y. Hua, L. Mou, and X. X. Zhu, "Relation network for multilabel aerial image classification," IEEE Transactions on Geoscience and Remote Sensing, 2020.
[31] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. Lawrence Zitnick, and D. Parikh, "VQA: Visual question answering," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2425–2433.
[32] H. Zenati, C. S. Foo, B. Lecouat, G. Manek, and V. R. Chandrasekhar, "Efficient GAN-based anomaly detection," arXiv preprint arXiv:1802.06222, 2018.
[33] T. Leangarun, P. Tangamchit, and S. Thajchayapong, "Stock price manipulation detection using generative adversarial networks," in 2018 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2018, pp. 2104–2111.
[34] Q. Wang, W. Xu, X. Huang, and K. Yang, "Enhancing intraday stock price manipulation detection by leveraging recurrent neural networks with ensemble learning," Neurocomputing, vol. 347, pp. 46–58, 2019.
[35] S. R. Islam, S. K. Ghafoor, and W. Eberle, "Mining illegal insider trading of stocks: A proactive approach," in 2018 IEEE International Conference on Big Data (Big Data). IEEE, 2018, pp. 1397–1406.
[36] D. Gunning, "Explainable artificial intelligence (XAI)," Defense Advanced Research Projects Agency (DARPA), nd Web, vol. 2, p. 2, 2017.
[37] M. T. Ribeiro, S. Singh, and C. Guestrin, ""Why should I trust you?" Explaining the predictions of any classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144.
[38] ——, "Anchors: High-precision model-agnostic explanations," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
[39] R. Guidotti, A. Monreale, S. Ruggieri, D. Pedreschi, F. Turini, and F. Giannotti, "Local rule-based explanations of black box decision systems," arXiv preprint arXiv:1805.10820, 2018.
[40] J. Shawe-Taylor and B. Žličar, "Novelty detection with one-class support vector machines," in Advances in Statistical Models for Data Analysis. Springer, 2015, pp. 231–257.
[41] S.-C. Li, C.-F. Huang, C.-C. Tu, and A.-P. Chen, "Discovery of abnormal financial behavior in a dynamic finance environment with hierarchical self-organizing mapping," in The 2nd International Conference on Software Engineering and Data Mining. IEEE, 2010, pp. 450–455.
[42] T. Kohonen, "Cybernetic systems: Recognition, learning, self-organization," Research Studies Press, Ltd., Letchworth, Hertfordshire, UK, p. 3, 1984.

MEI-SEE CHEONG received the B.S. degree in information management from National Taiwan University of Science and Technology, Taipei City, Taiwan, in 2018, and the M.S. degree in information management from National Chiao Tung University, Hsinchu, Taiwan, in 2020. Her research interests include relational learning and explainable AI.

MEI-CHEN WU received the B.S. and M.S. degrees in information management from Yu Da University of Science and Technology, Miaoli, Taiwan, in 2012 and 2013, respectively. She is currently pursuing the Ph.D. degree in information management at National Chiao Tung University, Hsinchu, Taiwan. Her research interests include information hiding, digital watermarking, deep learning, artificial intelligence, and financial technology.

SZU-HAO HUANG (Member, IEEE) received the B.E. and Ph.D. degrees in computer science from National Tsing Hua University, Hsinchu, Taiwan, in 2001 and 2009, respectively. He is currently an Assistant Professor with the Department of Information Management and Finance and the Chief Director of the Financial Technology (FinTech) Innovation Research Center, National Yang Ming Chiao Tung University, Hsinchu, Taiwan. His research interests include artificial intelligence, deep learning, recommender systems, computer vision, and financial technology. He has authored more than 50 papers published in related international journals and conferences. He is also the Principal Investigator of the MOST Financial Technology Innovation Industrial-Academic Alliance and several cooperation projects with leading companies in Taiwan.