Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 202 (2022) 194–202
www.elsevier.com/locate/procedia

International Conference on Identification, Information and Knowledge in the Internet of Things, 2021

Statistical Arbitrage with Momentum Using Machine Learning

Maojun Zhang (a), Xiaohai Tang (a), Shengpei Zhao (b), Wenhua Wang (c), Yang Zhao (a)
(a) School of Business, Suzhou University of Science and Technology, Suzhou 215009, China
(b) Department of Computer Science, University College London, London, WC1E 6BT, UK
(c) Business School, Dalian University of Technology, Panjin 124000, China

Abstract

In this paper, machine learning is used to investigate statistical arbitrage in the Chinese stock market. We use HS300 index
constituent stocks to construct pairs trading. The daily and monthly momentums of these stocks are used as new input factors to
forecast the stock price. We develop a trading approach and find that random forest (RF) outperforms deep neural network (DNN),
XGBoost, support vector machine (SVM) and LSTM from January 2013 to August 2017.
1877-0509 © 2022 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the International Conference on Identification, Information and
Knowledge in the Internet of Things, 2021
DOI: 10.1016/j.procs.2022.04.027
Keywords: Momentum; Random forest; Statistical arbitrage; Machine learning

1. Introduction

Predicting stock price changes is a very difficult task. Statistical arbitrage is based on the mean reversion
principle: if the price process of an asset is a stationary time series, then after a short-term deviation its price
will return to the mean in the following period through the equilibrium mechanism. Statistical arbitrage looks for a
pair of stocks whose prices converge strongly, so that the price fluctuations of the two stocks are highly similar:
when the price of one stock rises, the price of the other rises at the same time. Over a given period, the prices of
the two stocks may deviate from each other to some extent because of random factors specific to each company.
The literature on statistical arbitrage mostly uses traditional mathematical modeling methods, such as time
series analysis and stochastic control, to find the pairs and solve for the optimal trading signal. Machine learning
algorithms are expected to extract information from data more effectively. Huck (2009) developed a statistical
arbitrage strategy that combines neural networks with multi-criteria decision-making. His method consists of three
steps: prediction, ranking and trading. Huck (2010) further improved this method through multi-step predictions.
Takeuchi and Lee (2013) developed an enhanced momentum strategy for the CRSP stock market from 1965 to 2009.
Moritz and Zimmermann (2014) tested a statistical arbitrage strategy based on random forest on CRSP stock market
data from 1968 to 2012, and found an average monthly risk-adjusted excess return of 2%. When the feature set
includes 86 company features, the return increases to 2.28% per month. Krauss et al. (2017) applied deep neural
networks, gradient-boosted trees and random forests to the S&P 500 index from 1992 to 2015. Using return-based
features, they found that the combination of the above methods can produce a return of 0.45% per day (before
transaction costs). Fischer and Krauss (2018) use the LSTM network for the same prediction task. Huck (2019) shows
that technical indicators can generate trading signals for portfolios with a significant reversal effect and a short
holding period (one to five days).
The momentum effect means that assets that have performed well in the past will often continue to perform well
in the future. The momentum effect was documented by Jegadeesh and Titman (1993, 2001). Rouwenhorst (1998)
conducted an empirical study on the stocks of 2000 listed companies in twelve European countries and found that
the momentum effect is significant in the European market. Schiereck et al. (1999) concluded that the momentum
strategy in the German market performs better in the medium and long term. Chui et al. (2000) show that momentum
strategies can outperform the market in Asian stock markets. Barroso and Santa-Clara (2015) show that momentum
strategies can achieve an excess return of more than 10% in several major global stock markets.
Therefore, this paper selects HS300 constituent stocks to investigate statistical arbitrage using machine learning
algorithms. We take the momentum factor as the input feature to explore whether machine learning can make full
use of the momentum factor of the stock price to predict the rise and fall of stocks. The key task of the
employed machine learning methods is to accurately predict whether a stock outperforms the HS300 index, which
serves as the benchmark.
The remainder of this paper is organized as follows. Section 2 briefly covers the data sample, software packages,
and our methodology, i.e., the generation of training and trading sets, the construction of input sequences, the model
architecture and training, as well as the forecasting and trading steps. Section 3 presents the results and discusses our
most relevant findings. Finally, Section 4 concludes.

2. Data and Methodology

2.1. Data and features

We take 84 stocks from the HS300 index, listed on the Shanghai and Shenzhen stock exchanges. The data come from
the Wind database and cover a 9-year period from January 2, 2011 to December 31, 2019. This paper uses the sliding
window method to generate the training set and the trading set.
The construction of input features mainly follows the processing method of Takeuchi and Lee (2013). First,
extract the daily returns of the stocks over the past 21 days, and then extract the monthly returns of the stocks over
the preceding 12 months. Next, use the daily returns of the past 21 days to calculate the cumulative returns, and use
the monthly returns of the 12 months to calculate the 12 monthly cumulative returns. After calculating all the
daily momentum factors and monthly momentum factors, calculate the quantile of each momentum factor across all
84 stocks. The quantile of each momentum factor is used as the input feature. In this way, a total of 33
feature variables are constructed.
Let $P^s = (P_t^s)_{t \in T}$ represent the price process of stock $s$, where $s = 1, \ldots, n$. Then the return is
defined as

$$R_{t,m}^{s} = \frac{P_t^s}{P_{t-m}^s} - 1. \tag{1}$$

For the daily returns, the period $m$ ranges over $m \in \{1, 2, 3, \ldots, 19, 20, 21\}$; for the monthly returns,
it ranges over $m \in \{42, \ldots, 252, 273\}$. A binary variable

$$Y_{s \mid t+l} \in \{0, 1\} \tag{2}$$

is constructed for a stock $s$ to represent the rise and fall trend. If the return of stock $s$ exceeds the median return of
all stocks, the binary variable is $Y_{s \mid t+l} = 1$ (category 1); otherwise $Y_{s \mid t+l} = 0$ (category 0).
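The feature and label construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the monthly lags are assumed to advance in steps of 21 trading days (which gives 12 lags and 33 features in total), the quantile is taken as the empirical rank fraction across stocks, and the function names and the `price_table` layout are hypothetical.

```python
import statistics
from bisect import bisect_right

DAILY_LAGS = list(range(1, 22))           # m = 1, ..., 21
MONTHLY_LAGS = list(range(42, 274, 21))   # m = 42, ..., 252, 273 (step of 21 is an assumption)


def momentum(prices, t, m):
    """Return R_{t,m} = P_t / P_{t-m} - 1 for one price series."""
    return prices[t] / prices[t - m] - 1.0


def cross_sectional_quantile(values):
    """Map each value to its empirical quantile in (0, 1] across stocks."""
    ordered = sorted(values)
    n = len(values)
    return [bisect_right(ordered, v) / n for v in values]


def build_features(price_table, t):
    """price_table: dict stock -> list of prices. Returns dict stock -> 33 quantile features."""
    stocks = sorted(price_table)
    features = {s: [] for s in stocks}
    for m in DAILY_LAGS + MONTHLY_LAGS:
        raw = [momentum(price_table[s], t, m) for s in stocks]
        for s, q in zip(stocks, cross_sectional_quantile(raw)):
            features[s].append(q)
    return features


def build_labels(price_table, t, l=1):
    """Y = 1 if the return from t to t+l exceeds the cross-sectional median, else 0."""
    stocks = sorted(price_table)
    fwd = {s: price_table[s][t + l] / price_table[s][t] - 1.0 for s in stocks}
    med = statistics.median(fwd.values())
    return {s: int(fwd[s] > med) for s in stocks}
```

Note that `t` must be at least 273 so that the longest lag is available, and the label uses prices after `t`, so it is only known once day `t + l` has passed.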
To evaluate the real return level of the statistical arbitrage strategy in a sound way, we use some classical
performance evaluation indicators, including annualized return, return volatility, Sharpe ratio, Sortino ratio, etc.
For each trading day $t+1$, the probability, estimated at $t$, that the return of stock $s$ exceeds the median return
of all stocks is $p^{s}_{t+1 \mid t}$. Ranking the stocks by this probability, we find the undervalued stocks at the
top of the ranking and the overvalued stocks at the bottom. Thus we buy the stocks with the highest rising
probability and sell the stocks with the highest falling probability.
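The ranking and trading step can be illustrated with a short sketch. The number of stocks `k` on each side and the equal weighting are assumptions made for the example, since the text does not state them; the function names are hypothetical.

```python
def long_short_portfolio(probabilities, k):
    """probabilities: dict stock -> predicted probability of beating the median.

    Returns (longs, shorts): the k highest-ranked and the k lowest-ranked stocks.
    """
    if k < 1 or 2 * k > len(probabilities):
        raise ValueError("k must be positive and at most half the number of stocks")
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return ranked[:k], ranked[-k:]


def portfolio_return(longs, shorts, realized):
    """Equal-weighted long-short return: mean long return minus mean short return."""
    long_ret = sum(realized[s] for s in longs) / len(longs)
    short_ret = sum(realized[s] for s in shorts) / len(shorts)
    return long_ret - short_ret
```

For example, with probabilities `{"A": 0.9, "B": 0.2, "C": 0.6, "D": 0.4}` and `k = 1`, the sketch goes long A and short B.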

2.2. Methodology

According to the empirical research of Moritz and Zimmermann (2014), Krauss et al. (2017), Fischer and
Krauss (2018), Huck (2019), etc., the random forest algorithm performs well in financial time series
prediction. The random forest algorithm is not a single machine learning algorithm, but an ensemble algorithm
based on the decision tree model. Its base estimator is a decision tree. The classification performance of each
decision tree determines the quality of the random forest's classification and prediction. As a decision tree grows,
features are selected according to the principle of minimum impurity.
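To make the minimum-impurity principle concrete, the sketch below computes the Gini impurity of a set of binary labels and scans candidate thresholds on a single feature for the split with the lowest weighted impurity. It is a simplified illustration of how one tree node could be split, with hypothetical function names, and it is not a full decision tree.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted impurity of the two child nodes."""
    n = len(left) + len(right)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)


def best_threshold(feature, labels):
    """Scan midpoints between distinct feature values; return (threshold, impurity)."""
    pairs = list(zip(feature, labels))
    values = sorted(set(feature))
    best = (None, gini_impurity(labels))   # no split is the baseline
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        imp = split_impurity(left, right)
        if imp < best[1]:
            best = (thr, imp)
    return best
```

A perfectly mixed node such as `[0, 0, 1, 1]` has impurity 0.5, and a split that separates the two classes brings the weighted impurity down to 0.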
SVM is built on statistical learning theory and has a rigorous theoretical basis in VC dimension theory and
the structural risk minimization principle. It introduces the kernel function, which allows the algorithm to
map data into a high-dimensional space while avoiding the complex calculation there, and thus effectively overcomes
the curse of dimensionality. Because of these advantages, SVM has been applied in many fields with good results,
including pattern recognition, probability density function estimation, time series prediction and regression
estimation. Although SVM theory and algorithms have developed considerably, open issues remain, such as training
speed, the choice of kernel function and the storage required for computation.
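For reference, the standard kernel form of the SVM decision function is given below. It is the textbook formulation, stated with labels $y_i \in \{-1, +1\}$ rather than the $\{0, 1\}$ coding used for $Y$ in this paper, and the Gaussian (RBF) kernel is shown only as one common choice; the paper does not state which kernel was used.

```latex
f(x) = \operatorname{sign}\!\Big( \sum_{i=1}^{N} \alpha_i \, y_i \, K(x_i, x) + b \Big),
\qquad
K(x_i, x) = \langle \phi(x_i), \phi(x) \rangle,
\qquad
K_{\mathrm{RBF}}(x_i, x) = \exp\!\big( -\gamma \, \lVert x_i - x \rVert^{2} \big)
```

Here $\alpha_i \ge 0$ are the dual coefficients, $b$ is the bias, and $\phi$ is the implicit feature map. Only the kernel values are computed, never $\phi$ itself, which is how the complex calculation in the high-dimensional space is avoided.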
Gradient boosting is one of the most powerful techniques for building prediction models. It is a representative
boosting algorithm among ensemble methods. An ensemble algorithm constructs multiple weak learners on the
data and aggregates the results of all weak learners to obtain better regression or classification
performance than a single model. A weak learner is a model that performs at least as well as random guessing,
that is, any model with a prediction accuracy of no less than 50%. There are many ways to combine
weak learners. One is bagging, which builds multiple independent weak learners in parallel. Another is
boosting, which builds weak learners one by one and accumulates them over many iterations. The most famous
boosting algorithms include AdaBoost and the gradient boosting decision tree (GBDT). XGBoost is developed from
GBDT. Traditional GBDT uses only first-order derivative information when optimizing, while XGBoost applies a
second-order Taylor expansion to the loss function.
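The second-order expansion mentioned above can be written out as follows. This is the standard form of the XGBoost objective at boosting round $k$; the symbols $l$, $f_k$, $g_i$, $h_i$ and $\Omega$ are introduced here for illustration and do not appear elsewhere in this paper.

```latex
\mathcal{L}^{(k)} \approx \sum_{i=1}^{N} \Big[ l\big(y_i, \hat{y}_i^{(k-1)}\big)
  + g_i \, f_k(x_i) + \tfrac{1}{2} \, h_i \, f_k(x_i)^{2} \Big] + \Omega(f_k),
\qquad
g_i = \frac{\partial \, l\big(y_i, \hat{y}_i^{(k-1)}\big)}{\partial \, \hat{y}_i^{(k-1)}},
\quad
h_i = \frac{\partial^{2} l\big(y_i, \hat{y}_i^{(k-1)}\big)}{\partial \big(\hat{y}_i^{(k-1)}\big)^{2}}
```

Here $f_k$ is the tree added in round $k$, $\hat{y}_i^{(k-1)}$ is the prediction of the first $k-1$ trees, and $\Omega$ is the regularization term. Dropping the $h_i$ term recovers the first-order information used by traditional GBDT.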
A deep neural network is composed of an input layer, one or more hidden layers and an output layer. The dimension
of the input layer equals the number of input features. The output layer is a classification or regression layer that
matches the output space. All layers are composed of neurons, the basic unit of this model. In the classical
feedforward architecture, each neuron is fully connected to all neurons in the previous layer, and each connection
carries a weight. Moreover, the input layer and the hidden layers of the neural network have bias units, which act
as the activation threshold of the neurons in the subsequent layer.
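A forward pass through such a fully connected network can be sketched in a few lines. The layer sizes, the ReLU and sigmoid activations and the weights in the usage note are illustrative assumptions; the paper does not report the DNN architecture it used.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one weight row per neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """layers: list of (weights, biases, activation) applied in order."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x
```

For instance, a network with three inputs, two hidden ReLU neurons and one sigmoid output neuron is described by two `(weights, biases, activation)` tuples, and `forward` returns a single probability-like value in (0, 1).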
RNN is a recurrent network structure with the ability to retain information. The recurrent module in an RNN
passes information from one step of the network to the next, so the output of the hidden layer at each time step
depends on the information from the previous time step. This chain-like structure makes RNN closely related to
sequence labeling. Training an RNN suffers from exploding and vanishing gradients, and an RNN has difficulty
keeping memory over long horizons. The LSTM network is an extension of RNN that is specially designed to handle
the long-term dependency problem. The repeating module of LSTM has a different structure from that of the naive
RNN: it contains four neural network layers that interact in a special way.
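The four interacting layers of the LSTM module are usually written as below. This is the standard textbook formulation, given for illustration; the weight matrices $W$, biases $b$ and the concatenation $[h_{t-1}, x_t]$ are notation introduced here.

```latex
\begin{aligned}
f_t &= \sigma\big(W_f \, [h_{t-1}, x_t] + b_f\big) && \text{(forget gate)} \\
i_t &= \sigma\big(W_i \, [h_{t-1}, x_t] + b_i\big) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\big(W_c \, [h_{t-1}, x_t] + b_c\big) && \text{(candidate cell state)} \\
o_t &= \sigma\big(W_o \, [h_{t-1}, x_t] + b_o\big) && \text{(output gate)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The additive update of the cell state $c_t$ is what lets information and gradients pass across many time steps, in contrast to the purely multiplicative recurrence of the naive RNN.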

3. Performance analysis

The research method of this paper is divided into four steps. First, the data set in a research cycle is
divided into two parts, the training set and the trading set. The training set is used to train the machine learning
model, and the trading set is used to verify the predictive performance of the model. The second step is to generate
the input features and the output labels. The third step is to train the random forest model on the training set
and determine the optimal parameter setting of the random forest model. The fourth step is to use the random forest
to predict on the trading set, rank the stocks according to the prediction results, go long the stocks with the
highest rising probability and go short the stocks with the highest falling probability.
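The first step, splitting the sample into successive training and trading sets with a sliding window, can be sketched as follows. The window lengths are parameters of the sketch and the function name is hypothetical; the paper does not state the exact lengths used.

```python
def sliding_windows(n_days, train_len, trade_len):
    """Yield (train_range, trade_range) index pairs over n_days observations.

    Each trading window immediately follows its training window, and the
    whole study period advances by trade_len so trading windows never overlap.
    """
    start = 0
    while start + train_len + trade_len <= n_days:
        train = range(start, start + train_len)
        trade = range(start + train_len, start + train_len + trade_len)
        yield train, trade
        start += trade_len
```

With `n_days=10`, `train_len=4` and `trade_len=2`, the generator produces three study periods whose trading windows cover days 4 to 9 without gaps or overlap.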

3.1. Performance for different algorithms

The cumulative returns of the five algorithms during the trading period are shown in Figure 1. The cumulative
profit of the random forest algorithm is much higher than that of the other four algorithms. The performance of
the LSTM algorithm is close to that of the SVM algorithm. During the whole trading period, the trend of its
cumulative return is better than that of the HS300 index. The XGBoost algorithm performs worse than the HS300
index in the early stage, performs better in the late trading period, and finally outperforms the HS300 index.
During the whole empirical period, the cumulative return of the DNN algorithm is poor, and it finally fails to
exceed the HS300 index. In addition, to explore the profitability of the machine learning algorithms, the
proportion of trading days with returns greater than 0 is counted both before and after transaction costs, as
shown in Table 1.
Before transaction costs, the proportions for RF, SVM and XGBoost are higher than that of the HS300 index, while
the proportions for DNN and LSTM are slightly lower than that of the HS300 index. After transaction costs, only
RF and XGBoost have a higher proportion than the HS300 index, while the other three algorithms have a lower
proportion. This analysis indicates that most of the machine learning algorithms have some ability to predict
China's stock market. In addition, the RF algorithm performs better than the other four algorithms.
Based on the accuracy of the random forest model on the training set, the number of estimators is set to 39,
the Gini criterion is used for splitting, the maximum depth of the trees is 20, the maximum number of features
is 20, the minimum number of samples for splitting a node is 30, and the minimum number of samples in a leaf node
is 25. The random forest model adopted here uses 39 estimators, and each estimator has many branches and a large
width.
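One way to write this parameter setting down is shown below, assuming scikit-learn's `RandomForestClassifier` (the paper does not name its software package). The mapping of "minimum segmented leaf node" to `min_samples_split` and of "minimum number of leaf nodes" to `min_samples_leaf` is an interpretation, and `make_random_forest` and `random_state` are additions for the sketch.

```python
# Hyperparameters as reported in the text; the two min_samples_* mappings are assumptions.
RF_PARAMS = {
    "n_estimators": 39,        # number of trees
    "criterion": "gini",       # split quality measure
    "max_depth": 20,           # maximum depth of each tree
    "max_features": 20,        # features considered per split (out of 33)
    "min_samples_split": 30,   # assumed meaning of "minimum segmented leaf node"
    "min_samples_leaf": 25,    # assumed meaning of "minimum number of leaf nodes"
}


def make_random_forest(params=None, random_state=0):
    """Build a classifier with the reported settings (requires scikit-learn)."""
    from sklearn.ensemble import RandomForestClassifier  # imported lazily

    settings = dict(RF_PARAMS if params is None else params)
    return RandomForestClassifier(random_state=random_state, **settings)
```

The classifier's `predict_proba` output for class 1 would then play the role of the rising probability used for ranking.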

Figure 1. The cumulative returns of different algorithms (before transaction costs)

3.2. Profitability over time

We display strategy performance over time from January 2013 to December 2019. The transaction cost is 1.5‰.
The evaluation indicators of profitability in the three stages are shown in Table 1. From January 2013 to May
2015, the profitability of the statistical arbitrage strategy is better than that of the HS300 index, with a daily
average return of 0.0036, while the daily average return of the HS300 index is 0.0012. In addition, the annualized
return of the statistical arbitrage strategy is 1.2438, while the annualized return of the HS300 index is 0.3143.
The alpha value of the strategy is 1.4862. In terms of risk, the annualized volatility of the statistical arbitrage
strategy is 0.4222, while the annualized volatility of the HS300 index is 0.2216. The Sharpe ratio of the
statistical arbitrage strategy is 2.1295, while the Sharpe ratio of the HS300 index in the same period is 1.3451.
According to the above analysis, although the volatility of the strategy is higher than that of the HS300 index,
the statistical arbitrage strategy before transaction costs performs much better than the HS300 index. After
transaction costs, the risk-return trade-off of the statistical arbitrage strategy based on momentum factors and
random forest is roughly equivalent to that of the HS300 index.
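The indicators quoted above can be computed from a series of daily returns as sketched below. The sketch assumes 252 trading days per year, a zero risk-free rate and a cost of 1.5‰ deducted once per trading day; these conventions are assumptions, although the first and last are consistent with Table 1 (for example, 0.0266 × √252 ≈ 0.422 and 0.0036 − 0.0015 = 0.0021).

```python
import math

TRADING_DAYS = 252      # assumed number of trading days per year
COST_PER_DAY = 0.0015   # 1.5 per mille, assumed to be charged once per trading day


def net_returns(daily_returns, cost=COST_PER_DAY):
    """Daily returns after deducting the transaction cost."""
    return [r - cost for r in daily_returns]


def annualized_volatility(daily_returns):
    """Sample standard deviation of daily returns scaled by sqrt(252)."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    var = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return math.sqrt(var) * math.sqrt(TRADING_DAYS)


def sharpe_ratio(daily_returns, risk_free=0.0):
    """Annualized mean excess return divided by annualized volatility."""
    mean = sum(daily_returns) / len(daily_returns)
    return (mean - risk_free) * TRADING_DAYS / annualized_volatility(daily_returns)


def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the compounded wealth curve (a negative number)."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst
```

Applying `net_returns` before the other functions gives the after-cost versions of the same indicators.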
From May 2015 to August 2017, the profitability of the statistical arbitrage strategy is better than that of the
HS300 index regardless of whether the transaction cost is deducted, with a daily average return of 0.0031 before
transaction costs and 0.0016 after transaction costs, while the daily average return of the HS300 index is -0.0002
in the same period. In addition, during this period, the annualized return of the statistical arbitrage strategy is
0.9331 and the annualized return after transaction costs is 0.3242, while the annualized return of the HS300 index
is -0.0851. Compared with the HS300 index, the alpha value of the statistical arbitrage strategy is 1.1764, and the
alpha value after transaction costs is 0.4909. In terms of risk, the annualized volatility of the statistical
arbitrage strategy is 0.4902, the annualized volatility after transaction costs is 0.4895, and the annualized
volatility of the HS300 index is 0.2818. The Sharpe ratio of the statistical arbitrage strategy is 1.5893 and the
Sharpe ratio after transaction costs is 0.8171, while the Sharpe ratio of the HS300 index is -0.1729. Considering
risk and return together, the statistical arbitrage strategy is much better than the HS300 index both before and
after transaction costs.

From August 2017 to December 2019, although the cumulative return of the statistical arbitrage strategy
eventually exceeds that of the HS300 index before transaction costs, the strategy fluctuates greatly and
is inferior to the HS300 index. The average daily return of the statistical arbitrage strategy is 0.0009 and the
average daily return after transaction costs is -0.0006, while the average daily return of the HS300 index in the
same period is 0.0001. In addition, during this period, the annualized return of the statistical arbitrage strategy
is 0.1448 and the annualized return after transaction costs is -0.2158, while the annualized return of the HS300
index is 0.0184. Compared with the HS300 index, the alpha value of the statistical arbitrage strategy is 0.2401,
and the alpha value after transaction costs is -0.1505. In terms of risk, the annualized volatility of the
statistical arbitrage strategy is 0.4133, the annualized volatility after transaction costs is 0.4127, and the
annualized volatility of the HS300 index in the same period is 0.1971. The Sharpe ratio of the statistical
arbitrage strategy based on momentum factors and random forest is 0.5337 and the Sharpe ratio after transaction
costs is -0.3823, while the Sharpe ratio of the HS300 index in the same period is 0.1908. The risk-return
trade-off of the statistical arbitrage strategy after transaction costs is far worse than that of the HS300 index.
Therefore, from August 2017 to December 2019 the statistical arbitrage strategy can still predict the rise and
fall of the market fairly accurately, but its predictive ability is not enough to profit from the market.
The above empirical analysis of the statistical arbitrage strategy based on momentum factors and random
forest leads to the following conclusions. First, the statistical arbitrage strategy obtains a return far
exceeding that of the HS300 index from January 2013 to August 2017. From August 2017 to December 2019, the
strategy can still predict the rise and fall of the market fairly accurately, but its predictive ability is
not enough to make profits from the market, and the return after transaction costs is worse than that of the
HS300 index. Second, the profit of the statistical arbitrage strategy continues to decline over time, which may be
because the parameters of the random forest were optimized in the first research cycle.

3.3. Robustness test

A common method to test the robustness of a machine learning algorithm is to change the accuracy of the algorithm
and observe whether the experimental results change when the parameter setting changes, so as to test
whether the model is stable. This paper changes the classification accuracy of the model by changing the number of
estimators in the random forest, and observes how the cumulative return and the Sharpe ratio change with the number
of estimators. As shown in Table 2, when the number of estimators of the statistical arbitrage strategy based on
momentum factors and random forest is between 37 and 41, the daily average return, annualized return, annualized
volatility, alpha value, beta value, Sharpe ratio, Sortino ratio and Calmar ratio of the strategy are at a very
stable level. When the number of estimators is more than 41 or less than 37, these indicators begin to fluctuate
greatly. Therefore, when the number of estimators is 37 to 41, the random forest model constructed in this paper is
in a stable state.
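As a small worked check of this statement, the Sharpe ratios reported in Table 2 can be compared inside and outside the 37 to 41 range. The values below are copied from Table 2; using the max-minus-min spread as the stability measure is a choice made for this illustration.

```python
ESTIMATORS = [36, 37, 38, 39, 40, 41, 42]
SHARPE = [0.9636, 1.3263, 1.4496, 1.4275, 1.3069, 1.3717, 1.1664]  # Sharpe ratio row of Table 2


def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


# Sharpe ratios for 37 to 41 estimators only.
inner = [s for n, s in zip(ESTIMATORS, SHARPE) if 37 <= n <= 41]

inner_spread = spread(inner)    # variation within the stable range
full_spread = spread(SHARPE)    # variation over all seven settings
```

The spread of the Sharpe ratio over all seven settings is more than three times the spread within the 37 to 41 range, which is in line with the stability described above.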

Table 1. Performance in different stages

                        Before transaction cost                After transaction cost                 HS300 index
Period                  13/01-15/05  15/05-17/08  17/08-19/12  13/01-15/05  15/05-17/08  17/08-19/12  13/01-15/05  15/05-17/08  17/08-19/12
Mean return              0.0036       0.0031       0.0009       0.0021       0.0016      -0.0006       0.0012      -0.0002       0.0001
Max                      0.1238       0.1846       0.1075       0.1222       0.1829       0.1059       0.0461       0.0671       0.0595
Quartile 1               0.0187       0.0157       0.0145       0.0171       0.0141       0.0130       0.0080       0.0063       0.0068
Median                   0.0049       0.0033       0.0009       0.0033       0.0018      -0.0006       0.0005       0.0008       0.0001
Quartile 3              -0.0104      -0.0098      -0.0135      -0.0119      -0.0113      -0.0150      -0.0061      -0.0049      -0.0064
Min                     -0.1206      -0.1356      -0.0928      -0.1219      -0.1369      -0.0941      -0.0770      -0.0875      -0.0584
Standard deviation       0.0266       0.0309       0.0260       0.0266       0.0308       0.0260       0.0140       0.0178       0.0124
Skewness                -0.5212       0.3569      -0.0143      -0.5212       0.3569      -0.0143      -0.2538      -1.0446      -0.0563
Kurtosis                 2.8906       5.7621       1.5208       2.8906       5.7621       1.5208       3.0785       5.8067       2.7703
Annualized return        1.2438       0.9331       0.1448       0.5371       0.3242      -0.2158       0.3143      -0.0851       0.0184
Annualized volatility    0.4222       0.4902       0.4133       0.4216       0.4895       0.4127       0.2216       0.2818       0.1971
Cumulative returns       5.0248       3.3262       0.3505       1.5993       0.8665      -0.4173       0.8357      -0.1793       0.0413
Alpha                    1.4862       1.1764       0.2401       0.7032       0.4909      -0.1505       0.0000       0.0000       0.0000
Beta                    -0.0447      -0.0057       0.1399      -0.0447      -0.0057       0.1397       1.0000       1.0000       1.0000
Sharpe ratio             2.1295       1.5893       0.5337       1.2328       0.8171      -0.3823       1.3451      -0.1729       0.1908
Downside risk            0.2885       0.3185       0.2852       0.2988       0.3285       0.2971       0.1472       0.2195       0.1387
Sortino ratio            3.1167       2.4461       0.7734       1.7395       1.2177      -0.5309       2.0240      -0.2220       0.2713
Maximum drawdown        -0.2594      -0.4588      -0.4797      -0.2856      -0.4730      -0.5796      -0.2482      -0.4670      -0.3246
Calmar ratio             4.7951       2.0338       0.3018       1.8805       0.6855      -0.3723       1.2667      -0.1822       0.0566
Omega ratio              1.4437       1.3600       1.0955       1.2382       1.1718       0.9367       1.2673       0.9648       1.0339

Table 2. Robustness test

Estimators               36        37        38        39        40        41        42
Mean return              0.0018    0.0023    0.0025    0.0025    0.0023    0.0024    0.0021
Maximum                  0.1846    0.2001    0.1846    0.1846    0.1846    0.1846    0.1846
Quartile 1               0.0155    0.0160    0.0155    0.0162    0.0157    0.0157    0.0158
Median                   0.0029    0.0032    0.0032    0.0031    0.0033    0.0031    0.0028
Quartile 3              -0.0110   -0.0112   -0.0108   -0.0112   -0.0113   -0.0112   -0.0113
Minimum                 -0.1411   -0.1356   -0.1356   -0.1356   -0.1356   -0.1356   -0.1356
Standard dev.            0.0290    0.0280    0.0274    0.0279    0.0282    0.0283    0.0283
Skewness                -0.1670    0.0528    0.0010    0.0156   -0.0345    0.1036   -0.0831
Kurtosis                 4.4126    4.6122    4.1974    4.0709    4.0915    4.1059    3.9580
Annualized return        0.4011    0.6324    0.7078    0.7060    0.6229    0.6731    0.5254
Annualized volatility    0.4609    0.4441    0.4347    0.4434    0.4475    0.4489    0.4488
Alpha                    0.5507    0.7977    0.8704    0.8784    0.7871    0.8409    0.6808
Beta                     0.0527    0.0193    0.0341    0.0180    0.0374    0.0495    0.0384
Sharpe ratio             0.9636    1.3263    1.4496    1.4275    1.3069    1.3717    1.1664
Sortino ratio            1.3770    1.9570    2.1527    2.1255    1.9219    2.0520    1.6994
Max. drawdown           -0.5344   -0.4985   -0.4420   -0.4797   -0.4681   -0.4282   -0.4935
Calmar ratio             0.7507    1.2686    1.6013    1.4717    1.3306    1.5719    1.0647
Omega ratio              1.1927    1.2697    1.2973    1.2922    1.2649    1.2809    1.2326
Downside risk            0.3226    0.3010    0.2928    0.2978    0.3043    0.3001    0.3081

4. Conclusion

This paper constructs the input features from the perspective of momentum, and explores the profitability of the
statistical arbitrage strategy based on momentum factors and random forest in the Chinese stock market. Specifically,
to predict the rise and fall of a stock one day ahead, we first calculate 33 indicators of the stock's daily
cumulative returns and monthly cumulative returns. After calculating all daily momentum factors and monthly
momentum factors, we calculate the quantile of each momentum factor across all 84 stocks. The quantile
corresponding to each momentum factor is used as the input feature of the random forest to predict the rise and fall
of the stock.
This paper finds that the statistical arbitrage strategy based on momentum factors and random forest obtains a
return far beyond that of the HS300 index from January 2013 to August 2017. From August 2017 to December 2019, the
statistical arbitrage strategy can still predict the rise and fall of the market fairly accurately, but its
predictive ability is not enough to make profits. In general, the statistical arbitrage strategy obtains returns
beyond the market during the trading period. However, over the whole empirical period this profitability continues
to decline as trading time extends. The decline is likely because the training data change from one training cycle
to the next, so the patterns in the data may change accordingly, while the parameters of the random forest model
were selected as the best ones for the first training cycle. In addition, the empirical study of this paper shows
that the statistical arbitrage strategy is better than the traditional momentum and reversal strategies. It covers
the momentum information of the market over the past period and contains more information about the market, so it
has stronger profitability.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. 71961004,
72061007, 71461005, and by the Scientific Research of Suzhou University of Science and Technology under Grant Nos.
332111807, 332111801.

References

[1] Chui A, Titman S, Wei K.C.J. Momentum, ownership structure, and financial crises: An analysis of Asian stock markets. Working paper,
University of Texas at Austin, 2000.
[2] Fischer T, Krauss C. Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational
Research, 2018, 270(2): 654-669.
[3] Huck N. Pairs selection and outranking: An application to the S&P 100 index. European Journal of Operational Research, 2009, 196(2):
819-825.
[4] Huck N. Pairs trading and outranking: The multi-step-ahead forecasting case. European Journal of Operational Research, 2010, 207(3):
1702-1716.
[5] Huck N. Large data sets and machine learning: Applications to statistical arbitrage. European Journal of Operational Research, 2019,
278(1): 330-342.
[6] Jegadeesh N, Titman S. Returns to buying winners and selling losers: Implications for stock market efficiency. Journal of Finance,
1993, 48: 65-91.
[7] Jegadeesh N, Titman S. Profitability of momentum strategies: An evaluation of alternative explanations. Journal of Finance, 2001,
56(2): 699-720.
[8] Krauss C, Do X A, Huck N. Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500. European
Journal of Operational Research, 2017, 259(2): 689-702.
[9] Moritz B, Zimmermann T. Tree-based conditional portfolio sorts: The relation between past and future stock returns. Available at
http://dx.doi.org/10.2139/ssrn.2740751, 2016.
[10] Barroso P, Santa-Clara P. Momentum has its moments. Journal of Financial Economics, 2015, 116(1): 111-120.
[11] Rouwenhorst K.G. International momentum strategies. Journal of Finance, 1998, 53: 267-284.
[12] Schiereck D, De Bondt W, Weber M. Contrarian and momentum strategies in Germany. Financial Analysts Journal, 1999,
55(6): 104-116.
[13] Takeuchi L, Lee Y.-Y. A. Applying deep learning to enhance momentum trading strategies in stocks. Working paper, Stanford, 2013.


2. Data and Methodology

2.1. Data and features

3.1. Performance for different algorithms

Figure 1. The cumulative returns of different algorithms (before transaction costs)

3.2. Profitability over time

3.3. Robustness test

Table 1 Performance with different stages

Before transaction cost    After transaction cost    The HS300 index

Table 2 Robustness test

Mean return              0.0018   0.0023   0.0025   0.0025   0.0023   0.0024   0.0021
Maximum                  0.1846   0.2001   0.1846   0.1846   0.1846   0.1846   0.1846
Quartile 1               0.0155   0.0160   0.0155   0.0162   0.0157   0.0157   0.0158
Median                   0.0029   0.0032   0.0032   0.0031   0.0033   0.0031   0.0028
Quartile 3              -0.0110  -0.0112  -0.0108  -0.0112  -0.0113  -0.0112  -0.0113
Minimum                 -0.1411  -0.1356  -0.1356  -0.1356  -0.1356  -0.1356  -0.1356
Standard dev.            0.0290   0.0280   0.0274   0.0279   0.0282   0.0283   0.0283
Skewness                -0.1670   0.0528   0.0010   0.0156  -0.0345   0.1036  -0.0831
Kurtosis                 4.4126   4.6122   4.1974   4.0709   4.0915   4.1059   3.9580
Annualized return        0.4011   0.6324   0.7078   0.7060   0.6229   0.6731   0.5254
Annualized volatility    0.4609   0.4441   0.4347   0.4434   0.4475   0.4489   0.4488
Alpha                    0.5507   0.7977   0.8704   0.8784   0.7871   0.8409   0.6808
Beta                     0.0527   0.0193   0.0341   0.0180   0.0374   0.0495   0.0384
Sharpe ratio             0.9636   1.3263   1.4496   1.4275   1.3069   1.3717   1.1664
Sortino ratio            1.3770   1.9570   2.1527   2.1255   1.9219   2.0520   1.6994
Max. drawdown           -0.5344  -0.4985  -0.4420  -0.4797  -0.4681  -0.4282  -0.4935
Calmar ratio             0.7507   1.2686   1.6013   1.4717   1.3306   1.5719   1.0647
Omega ratio              1.1927   1.2697   1.2973   1.2922   1.2649   1.2809   1.2326
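
Several of the risk measures in Table 2 can be computed from a series of daily returns. The Python sketch below is one possible way to do so and is not taken from the paper. It assumes 252 trading days per year, a zero risk-free rate, a sample standard deviation, and a zero threshold for the Omega ratio. All function names are hypothetical. Under these assumptions, the Calmar ratio is the annualized return divided by the absolute maximum drawdown, which reproduces the printed values to within rounding.

```python
import math

TRADING_DAYS = 252  # assumption: trading days per year used for annualization


def annualized_volatility(daily_returns):
    """Sample standard deviation of daily returns, scaled by sqrt(TRADING_DAYS)."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    variance = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return math.sqrt(variance * TRADING_DAYS)


def sharpe_ratio(daily_returns):
    """Annualized arithmetic mean return over annualized volatility (risk-free rate assumed 0)."""
    mean = sum(daily_returns) / len(daily_returns)
    return mean * TRADING_DAYS / annualized_volatility(daily_returns)


def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the compounded wealth curve (0 or negative)."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst


def calmar_ratio(annualized_return, max_dd):
    """Annualized return divided by the absolute maximum drawdown."""
    return annualized_return / abs(max_dd)


def omega_ratio(daily_returns, threshold=0.0):
    """Sum of gains above the threshold divided by the sum of losses below it."""
    gains = sum(max(r - threshold, 0.0) for r in daily_returns)
    losses = sum(max(threshold - r, 0.0) for r in daily_returns)
    return math.inf if losses == 0.0 else gains / losses


# Check against the third column of Table 2: annualized return 0.7078 and
# maximum drawdown -0.4420 give a Calmar ratio close to the printed 1.6013.
calmar_check = calmar_ratio(0.7078, -0.4420)
```

The same relation holds for the first column, where 0.4011 / 0.5344 is approximately 0.75. Small differences in the last digit are expected because the table inputs are themselves rounded.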
