
Statistical Arbitrage in Pairs Trading

1 Abstract

This paper aims to exploit statistical arbitrage opportunities by constructing a pairs trading strategy using two indices from Dow Jones. We first explain our pair selection process, then construct copula and LSTM models to develop statistical criteria for identifying trading signals. Based on these criteria, we separate our data into training and testing sections and set up our trading strategy, which we then apply to the testing data set. In the last part of the paper, we discuss our results and the intuition behind them and identify areas for further improvement.

2 Introduction

"Markets can remain irrational longer than you can remain solvent" is a famous quote attributed to John Maynard Keynes. Many investors would recognize LTCM as an example of this adage, as the firm failed in the face of liquidity risks. Yet, LTCM had generated relatively high returns in its initial years via its primary trading strategy, which was to exploit mispricings (ironically, often arising because of differences in liquidity) between pairs of bonds sharing common fundamentals, an example of pairs trading.

A pairs trading strategy is a convergence strategy in which investors seek to identify temporary mispricings of two highly correlated assets and then profit by buying low and selling high [13].

The methods and techniques for pairs trading, such as the distance and cointegration approaches [10], are traditionally based on mean-reversion concepts [2]. More recently, copula modeling techniques have been applied to capture non-linear tail dependence between equities [5][12].

In this paper, we investigate a pairs trading strategy applying copula and machine learning techniques to two Dow Jones sector equity indices: the Dow Jones U.S. Pharmaceuticals Index (DJUSPR) and the Dow Jones U.S. Health Care Index (DJUSHC).

3 Pair Selection

This section summarizes the underlying data selection criteria for a pair of equity indices.

Figure 1: Pair Selection Procedure
The data used in this paper are daily close prices, rolling 200-day simple moving averages of those close prices, and daily volumes of the selected pairs of indices, sourced from the Bloomberg Terminal. We select sector equity indices from the S&P 500 and Dow Jones index families, as well as equity indices from the Chinese market [1]. The dataset covers eighteen years, from January 1st, 2005, to December 31st, 2022, to ensure a broad range of economic and market cycles is represented.

Below is the four-step pair selection process, which follows the procedure illustrated in Figure 1 (above).

3.1 Data Normalization

We normalize the data by dividing the daily prices by the 200-day simple moving average (SMA200) to smooth the series and construct a stationary data set for each index pair. We compute the daily and normalized daily returns and denote the normalized prices of the indices by S1t and S2t.

3.2 Cointegration Test

Cointegration offers a means to identify a stationary linear relationship between multivariate time series. We test for cointegration between index pairs using the Engle-Granger two-step approach [6]. A linear combination of the two indices' normalized price series must be stationary, as expressed by the formula below,

S1t = µt + β S2t

We run an ordinary least squares (OLS) regression to estimate β from the two normalized price series, then perform a stationarity test on the estimated residual term µt. If the p-value of this test is less than 0.05, the pair passes the cointegration test.

3.3 Hurst Exponent Test

We compute the Hurst exponent (H) of each selected index [4], which helps us identify whether the normalized price time series follows a short- and long-term mean-reversion pattern; an exponent below 0.5 indicates mean reversion. The Hurst exponent is defined through the rescaled-range relation below,

E[Ri(n) / Si(n)] = C · n^H, as n → ∞, i = 1, 2,

where Ri(n) and Si(n) are the range and standard deviation of the first n observations of series i.

3.4 Half-Life Test

Continuing from the data processing above, we find the mean-reversion half-life of the indices that passed the Hurst exponent test [4]. The half-life measures the time it takes a price series to revert halfway back from a divergent value toward its mean. We construct the series of differences of the daily normalized prices and the normalized price series with one lag, then regress the differences (dependent) on the lagged prices (independent),

∆t = p(t) − p(t − 1)
∆t = α + β p(t − 1) + ϵ

The least-squares estimate of β then gives the half-life of each index in the pair:

t1/2 = −ln(2) / β

If the calculated half-life of each index is longer than one week and shorter than one year, the pair passes the half-life test. This range suits our analysis: the spread converges relatively easily, while the strategy still has enough time to capture fluctuations in spread movements.
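The Hurst and half-life screens above can be sketched in Python. This is a minimal illustration on synthetic data, not the paper's code: the Hurst estimate below uses the common lagged-difference scaling shortcut rather than the full rescaled-range (R/S) statistic, and the half-life follows the AR(1) regression described above.

```python
import numpy as np

def hurst_exponent(prices: np.ndarray, max_lag: int = 50) -> float:
    """Shortcut Hurst estimate: std(p[t+lag] - p[t]) scales like lag**H."""
    lags = np.arange(2, max_lag)
    tau = np.array([np.std(prices[lag:] - prices[:-lag]) for lag in lags])
    # Slope of the log-log regression is the Hurst exponent H.
    return np.polyfit(np.log(lags), np.log(tau), 1)[0]

def half_life(prices: np.ndarray) -> float:
    """AR(1) half-life: regress daily changes on lagged prices,
    delta_t = alpha + beta * p[t-1] + eps, then t_half = -ln(2) / beta."""
    delta = np.diff(prices)
    lagged = prices[:-1]
    X = np.column_stack([np.ones_like(lagged), lagged])
    alpha, beta = np.linalg.lstsq(X, delta, rcond=None)[0]
    return -np.log(2.0) / beta

# Synthetic mean-reverting series (AR(1) with phi = 0.95, so the true
# half-life is about ln(2)/0.05, i.e. roughly two to three weeks).
rng = np.random.default_rng(7)
p = np.zeros(5000)
for t in range(1, 5000):
    p[t] = 0.95 * p[t - 1] + rng.normal()

print(half_life(p))       # close to the true half-life of ~14 days
print(hurst_exponent(p))  # below 0.5, consistent with mean reversion
```

For a pure random walk the same estimator returns a Hurst exponent near 0.5, which is why the paper's low values (0.26 and 0.27) support the mean-reversion assumption.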
3.5 Selection Process

                       HC      Pharm
Hurst Exponent         0.26    0.27
Half-Life (days)       46.7    52.8
Cointegration p-value      4.21e-05

Based on the above metrics, we selected the Dow Jones U.S. Pharmaceuticals Index (DJUSPR) and the Dow Jones U.S. Health Care Index (DJUSHC) as the most significant pair of price series for our pairs trading strategy.

The table above shows that this pair passes our selection metrics: the cointegration test has a low p-value, and both indices have low Hurst exponents. DJUSHC (HC) has a half-life of 46.7 days, and DJUSPR (Pharm) has a half-life of 52.8 days. The half-lives of HC and Pharm not only pass our test but are also similar, which means the two indices have a similar pattern of divergence and convergence.

Figure 2: Normalized Price by 200SMA

4 Methodology

This section summarizes the construction of a copula model using the Gaussian copula approach, a machine learning model using the long short-term memory (LSTM) method to determine trading signals, and the development of a pairs trading strategy to capture returns from the mispricing opportunities [7].

4.1 Copula Methodology

Firstly, we use t-distributions to approximate the normalized price distributions [7]. A linear trading strategy can only generate accurate trading signals when the price series are normally distributed, which rarely happens in financial markets. To better account for both linear and non-linear dependencies between the two selected indices, we use a copula approach to identify dependency features and trading signals for our pairs trading strategy [13].

Figure 3: DJUSHC Fitted t-distribution

Figure 4: DJUSPR Fitted t-distribution
We denote the normalized daily returns of our two indices by RHC(t) and RPharm(t), respectively. Based on Sklar's theorem (1958), we can derive a copula function C between the fitted continuous marginal distribution functions FHC and FPharm. The corresponding joint distribution function J of the return series is below,

C(F(RHC), F(RPharm)) = J(RHC, RPharm)

We use Python's statsmodels and copulas packages to fit a Gaussian copula to the joint distribution of the two normalized index returns, as illustrated in Figure 5. The simulated distribution agrees well with the observed return distribution.

The next step is to find the conditional probability functions of the two indices' returns, FHC|Pharm(HC) and FPharm|HC(Pharm), below,

FHC|PR(HC) = P(RHC < rHC | RPR = rPR)
FPR|HC(PR) = P(RPR < rPR | RHC = rHC)

Figure 5: Simulated Distribution Copula and Simulated Spreads

The conditional probabilities can be calculated from the first-order partial derivatives of the copula function,

∂C / ∂F(RHC),  ∂C / ∂F(RPharm)

Since the statistical tools could not recover the exact joint distribution and copula function in Python, we developed an alternative method to simulate the signal-selection process. We use the correlation and degrees of freedom estimated by the copula to fit t-distributions to the returns of the two indices, and compute the simulated spread as the difference between the simulated returns. The simulated spread series fits the observed spreads well, as illustrated in Figure 6. We will use the simulated spreads to design the trading signals in our strategy, as explained further in the Trading Signal section.

Figure 6: Simulated Spread and Original Spread

4.2 Machine Learning

In recent years, neural network approaches have been widely used on financial data due to their strong pattern-recognition ability. Among them, the recurrent neural network (RNN) is the most suitable structure in our case because of its ability to incorporate calculations from previous steps in the sequence [8].
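The simulation step above can be sketched as follows. This is a hand-rolled Gaussian copula with t marginals built from scipy so the example stays self-contained (the paper uses the copulas package); the two return series are synthetic stand-ins for the Bloomberg data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Stand-in daily return series for the two indices.
r_hc = rng.standard_t(df=5, size=1000) * 0.01
r_ph = 0.9 * r_hc + 0.4 * rng.standard_t(df=5, size=1000) * 0.01

# Step 1: fit t marginals to each return series.
df_hc, loc_hc, sc_hc = stats.t.fit(r_hc)
df_ph, loc_ph, sc_ph = stats.t.fit(r_ph)

# Step 2: map to uniforms through the fitted marginal CDFs, then
# estimate the Gaussian-copula correlation on the normal scores.
u = stats.t.cdf(r_hc, df_hc, loc_hc, sc_hc)
v = stats.t.cdf(r_ph, df_ph, loc_ph, sc_ph)
z = stats.norm.ppf(np.column_stack([u, v]).clip(1e-6, 1 - 1e-6))
rho = np.corrcoef(z.T)[0, 1]

# Step 3: simulate correlated uniforms from the Gaussian copula, map
# back through the t marginals, and take the difference of the
# simulated returns as the simulated spread.
n = 10_000
cov = np.array([[1.0, rho], [rho, 1.0]])
sims = rng.multivariate_normal([0.0, 0.0], cov, size=n)
u_sim = stats.norm.cdf(sims[:, 0])
v_sim = stats.norm.cdf(sims[:, 1])
sim_hc = stats.t.ppf(u_sim, df_hc, loc_hc, sc_hc)
sim_ph = stats.t.ppf(v_sim, df_ph, loc_ph, sc_ph)
sim_spread = sim_hc - sim_ph
```

The distribution of `sim_spread` is what the trading rule later compares the observed spread against.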
Among the different RNN structures, the LSTM has proven effective in most scenarios due to its carefully designed architecture [11]. Its units, such as the input gate, forget gate, and output gate, make the LSTM capable of remembering past calculations and of using the more informative ones to improve the overall loss. In this way, the LSTM mitigates the vanishing-gradient problem through its overall structure [11].

Thus, we choose the LSTM as our model for realized volatility prediction on our selected indices, HC and Pharm. Compared with machine learning approaches that assume a linear relationship, the LSTM captures nonlinear relationships that incorporate variables from both current and historical time steps.
The return movements of the indices are time series with a certain level of autocorrelation. Thus, the LSTM method is a good choice for handling such data, since it can capture the relationship between different time points of the sequence.

Figure 7: LSTM Model Process

We design a feature engineering process. For instance, we calculate the intercept α and coefficient β of a linear regression of returns on the volatilities of HC and Pharm, and use them as features in the LSTM under the assumption that index returns follow a random walk: the intercept corresponds to the drift term µ and the coefficient to the volatility term σ. We also apply lagged volatility, volume, and other characteristics as model features. We then preprocess the data by eliminating outliers, replacing NaN (Not a Number) values with the mean, and normalizing all the features.

The input layer is a 5 × 12 matrix, which means the LSTM considers five days of data across all twelve selected features. The inputs first pass through a one-dimensional convolution layer, which performs 1-D horizontal feature extraction from the original inputs. The extracted features are then fed into the first LSTM layer, whose outputs are passed to a Dropout layer; the Dropout layer randomly sets inputs to zero to prevent the network from memorizing a fixed pattern. The output of the first Dropout layer is then fed into a second LSTM layer followed by a second Dropout layer. Lastly, a dense layer produces the estimated volatilities as model outputs. We apply the LSTM model to each index separately.

We use mean squared error (MSE) as the evaluation metric for the model's performance. Compared with other metrics, MSE is relatively more sensitive to large errors, which helps ensure we do not generate outlier predictions that would misguide our trading activities. Eventually, our model reaches a loss value of 64, as shown in Figure 8, with the fitting process indicating that the loss gradually decreases.
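The 5 × 12 input arrangement described above can be sketched as follows; `make_windows` is a hypothetical helper (not the paper's code), and the feature matrix is random stand-in data.

```python
import numpy as np

def make_windows(features: np.ndarray, lookback: int = 5) -> np.ndarray:
    """Arrange a (T, n_features) matrix into overlapping
    (lookback, n_features) windows, the per-sample input shape the
    paper's LSTM expects (5 days x 12 features)."""
    T, _ = features.shape
    return np.stack([features[t - lookback:t] for t in range(lookback, T + 1)])

# Hypothetical preprocessed feature matrix: 100 days x 12 features
# (lagged volatility, volume, regression alpha/beta, etc.).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 12))
windows = make_windows(X, lookback=5)
print(windows.shape)  # (96, 5, 12)
```

Each `(5, 12)` window is one training sample; the corresponding target would be the next day's realized volatility.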
Figure 8: Loss Function

4.3 Trading Signal

We use the Gaussian copula and LSTM methods to identify trading signals. As the algorithm simulates new Student's t-distributions based on the copula outputs [7], the corresponding Z-score is computed to determine whether a trading signal exists for our paired indices,

z = (observed − E(simulated)) / σ(simulated)

If the Z-score falls outside the 99% confidence interval, ±2.58, the algorithm identifies a divergence signal between the two selected indices. Once a divergence is detected by the Z-score requirement, we use the daily estimated volatilities from the LSTM model to construct an F-test [3] at the 95% confidence level, shown below,

F-Score = σ²HC / σ²Pharm,

where σ²HC and σ²Pharm are the variances of the two indices.

We use the F-test to identify whether there is a statistically significant difference between the volatilities of the two indices. The correlation between the two indices is 0.93 based on the data in the training set, which means the two indices have similar driving factors. We believe that fluctuations in the volatilities, besides fundamental changes happening in the market, constitute a significant cause of divergence.

After both conditional tests, if the p-value is less than 0.05, the trading signal is set to 1; otherwise, it is 0.

4.4 Trading Strategy

We divide the data into training and testing sets so that both contain excited and calm market periods. We use data from January 1st, 2005, to December 31st, 2017, as the training set, and data from January 1st, 2018, to December 31st, 2022, as the testing set.

We use the normalized return data to construct the copula model. For each day in the testing sample, we re-estimate the copula model with the new pair of daily observed returns, so the correlation between the two indices is updated automatically. We also simulate new t-distributions and compute the corresponding Z-scores to determine whether a trading signal exists on each day of the testing set.

Figure 9: Correlation for the Test Set

Figure 9 shows the dynamic correlations of the testing set, where an unexpected jump in the daily correlation of 0.005 occurred at the beginning of the COVID pandemic. The chart indicates an average correlation of 0.93 over the testing set. Yet, over the entire testing data set, the correlation only changes by 0.02, which we consider insignificant. From the analysis of the dynamic correlation generated by our algorithm, we conclude that only one index of the selected pair generates a significant price movement at any given time.

For simplicity, at the beginning of our trading period, we ensure the nominal values of DJUSHC (HC) and DJUSPR (Pharm) are the same, e.g., $1 for each index. In our algorithm, the difference between the daily returns of HC and Pharm is computed and denoted as the absolute spread of returns,

Abs Spread = |HC RetDaily − Pharm RetDaily|

The strategy tracks successive daily returns until a mispricing signal is observed. When trading signals are detected, we iterate over the normalized prices of the paired indices to determine the direction of the trading position. For example, if the normalized price of HC > Pharm, the direction is noted as 1; otherwise, it is noted as -1. By multiplying the absolute spread calculated above by the direction signal, we ensure our trading strategy follows the "buy low, sell high" method, as the trade always goes long the low-return index and short the high-return index.

The strategy continues to operate while the two indices remain in divergence. In this case, we keep our positions open and record the cumulative return during the period. Once the algorithm indicates an exit signal (the trading signal turns to 0 when the price movements of the paired indices converge back), we close out our positions and wait for the next trading signal.

To specify the convergence condition of the paired indices, the strategy reverses the tests above: the Z-score must fall back inside the 99% confidence interval, and the F-test is passed when its p-value is larger than 0.05.

To ensure our strategy remains solvent, we designed a stop-loss trigger for our pairs trading strategy to protect us from unexpected market developments. The trigger is based on the cumulative return during the trading cycle: if the cumulative return in the trading cycle falls below -25%, the algorithm immediately closes the positions even if the trading signal remains 1. From the half-life test in the pair selection section, the mean-reversion half-lives of HC and Pharm are 46.7 and 52.8 days. As the pairs trading strategy helps to hedge sector and market risk, we remain risk averse. We tracked the 5-year annualized historical volatility of the S&P 500 from the Bloomberg Terminal as 17.17% and set the risk-aversion constant to 2, with the function below,

Stop Loss = (Daily Volatility × Half-Life) / Risk Aversion

This stop-loss trigger can immunize us from potentially significant losses in edge cases. For instance, if the two indices take an unexpectedly long time to converge compared to the expected mean-reversion half-life, our stop-loss trigger effectively shields the strategy from the unexpected market volatility.

5 Strategy Results

In this section, we interpret the results of our trading strategy applied to the testing set. Cumulative returns and drawdowns at different confidence intervals are shown below.
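The two-stage signal test and the stop-loss formula can be sketched as below. `trading_signal` and `stop_loss_threshold` are hypothetical helpers, not the paper's code, and the variance sample size `n` is an assumption, since the paper does not state how the F-test degrees of freedom are chosen.

```python
import numpy as np
from scipy import stats

def trading_signal(observed: float, simulated: np.ndarray,
                   var_hc: float, var_pharm: float, n: int,
                   z_crit: float = 2.58, alpha: float = 0.05) -> int:
    """Return 1 when both the copula Z-score test and the volatility
    F-test indicate a divergence, otherwise 0."""
    z = (observed - simulated.mean()) / simulated.std()
    if abs(z) <= z_crit:  # spread still inside the 99% band
        return 0
    # Two-sided F-test on the LSTM-estimated variances.
    f_score = var_hc / var_pharm
    p = 2.0 * min(stats.f.sf(f_score, n - 1, n - 1),
                  stats.f.cdf(f_score, n - 1, n - 1))
    return int(p < alpha)

def stop_loss_threshold(daily_vol: float, half_life_days: float,
                        risk_aversion: float = 2.0) -> float:
    """Stop Loss = Daily Volatility x Half-Life / Risk Aversion."""
    return daily_vol * half_life_days / risk_aversion

rng = np.random.default_rng(0)
sims = rng.normal(0.0, 1.0, 10_000)  # copula-simulated spreads
print(trading_signal(5.0, sims, var_hc=4.0, var_pharm=1.0, n=100))  # 1
print(trading_signal(0.1, sims, var_hc=4.0, var_pharm=1.0, n=100))  # 0
```

The first call fires because the observed spread lies far outside the simulated band and the variances differ significantly; the second stays at 0 because the Z-score never leaves the 99% interval.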

Figure 10: Cumulative Returns of the Portfolio with Different CI

Figure 11: Drawdowns with Different CI

Based on the 99% confidence level copula model and the 95% confidence level F-test on volatilities discussed above, we identified 208 trading signals in the testing set. After applying our trading strategy, we obtained a 5-year cumulative return of 17.96%, as illustrated in Figure 13. The average annualized return is approximately 3.36%, benchmarked against the S&P 500 index's average 5-year annual price return of 8.18%, as shown in Figure 10.

We also compute performance metrics for our trading strategy. Assuming the risk-free rate is zero, the daily Sharpe ratio is 0.0298 and the annualized Sharpe ratio is 0.473. The max drawdown during the testing period is illustrated in Figure 12. The largest drawdown appears in 2022 due to the high volatility in the market, with a value of 13.8%.

Figure 12: Drawdowns with 99% CI

Figure 13: Cumulative Return and Normalized Price with 99% CI

6 Discussion

This section analyzes the results generated from our pairs trading strategy and compares the results for different confidence level scenarios.

Although our cumulative return is less than the return of the S&P 500 index, we believe the
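As a sanity check on these figures, annualizing the daily Sharpe ratio with the usual √252 trading-day convention (and the zero risk-free rate assumed above) recovers the reported annualized value:

```python
import numpy as np

# Daily Sharpe ratio reported in the text, scaled by sqrt(252).
daily_sharpe = 0.0298
annual_sharpe = daily_sharpe * np.sqrt(252)
print(round(annual_sharpe, 3))  # 0.473
```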
trading strategy is a useful tool for risk diversification when investing in a portfolio containing financial instruments related to the S&P 500 index. We computed the correlation between the returns of our trading strategy and the returns of the S&P 500 index. The correlation is -0.1761, showing a negative relationship. In Figure 10, we can see that when the S&P 500 index experienced a sharp decrease in cumulative returns in 2019 and 2020, the cumulative return of our strategy retained a stable increasing pattern.

We experimented with our trading strategy at 90% and 95% confidence intervals to compare with our choice of 99%. The algorithm generates 412 and 320 trading signals, respectively. The cumulative returns for these two cases are presented in Figure 10. The 99% confidence scenario has the highest cumulative return because only a large divergence can pass the Z-score test at the 99% confidence level, resulting in the most accurate detection of trading signals. The cumulative return at the 90% confidence interval is higher than that at the 95% confidence interval. We believe more false positive signals arise as the confidence level is lowered.

7 Future Research

The large number of hyperparameters in the LSTM model leads to a slow training process, so grid search and random search are not suitable for the hyperparameter tuning of the LSTM. In the future, we can use the Bayesian method for the tuning process. The Bayesian method finds the optimal hyperparameter combination by estimating a prior probability distribution of the optimal hyperparameters and continually updating it by training the model on different values to observe its actual performance [9]. This method reduces the number of hyperparameter scenarios that need to be tried. In addition, we can also try more types of non-linear copula functions to find the function that best describes the joint probability distribution of the two indices.

References

[1] Avellaneda, Marco, and Jeong-Hyun Lee. Statistical Arbitrage in the US Equities Market. Quantitative Finance, vol. 10, no. 7, Aug. 2010, pp. 761–82. https://doi.org/10.1080/14697680903124632.

[2] Bao, Yong, et al. Distribution of the Mean Reversion Estimator in the Ornstein–Uhlenbeck Process. Econometric Reviews, vol. 36, no. 6–9, July 2017, pp. 1039–56. https://doi.org/10.1080/07474938.2017.1307977.

[3] Bogomolov, Timofei. Pairs Trading Based on Statistical Variability of the Spread Process. Quantitative Finance, vol. 13, no. 9, Sept. 2013, pp. 1411–30. https://doi.org/10.1080/14697688.2012.748934.

[4] Cho, Poongjin, and Minhyuk Lee. Forecasting the Volatility of the Stock Index with Deep Learning Using Asymmetric Hurst Exponents. Fractal and Fractional, vol. 6, no. 7, July 2022, p. 394. https://doi.org/10.3390/fractalfract6070394.

[5] Krauss, Christopher. Statistical Arbitrage Pairs Trading Strategies: Review and Outlook. IWQW Discussion Papers, No. 09/2015, Friedrich-Alexander-Universität Erlangen-Nürnberg, Institut für Wirtschaftspolitik und Quantitative Wirtschaftsforschung (IWQW), Nürnberg, 2015.

[6] Lee, Hyejin, and Junsoo Lee. More Powerful Engle–Granger Cointegration Tests. Journal of Statistical Computation and Simulation, vol. 85, no. 15, 2014, pp. 3154–3171. https://doi.org/10.1080/00949655.2014.957206.

[7] Luo, Xiaolin, and Pavel V. Shevchenko. The t Copula with Multiple Parameters of Degrees of Freedom: Bivariate Characteristics and Application to Risk Management. Quantitative Finance, vol. 10, no. 9, Nov. 2010, pp. 1039–54. https://doi.org/10.1080/14697680903085544.

[8] Pawar, K., R.S. Jalem, and V. Tiwari. Stock Market Price Prediction Using LSTM RNN. Advances in Intelligent Systems and Computing, vol. 841, Springer, Singapore. https://doi.org/10.1007/978-981-13-2285.

[9] Putatunda, Sayan, and Kiran Rama. A Modified Bayesian Optimization Based Hyper-Parameter Tuning Approach for Extreme Gradient Boosting. 2019 Fifteenth International Conference on Information Processing (ICINPRO), IEEE, 2019. https://doi.org/10.1016/j.sigpro.2022.108826.

[10] Rad, Hossein, et al. The Profitability of Pairs Trading Strategies: Distance, Cointegration and Copula Methods. Quantitative Finance, vol. 16, no. 10, Oct. 2016, pp. 1541–58. https://doi.org/10.1080/14697688.2016.1164337.

[11] Sherstinsky, Alex. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network. Physica D, vol. 404, Mar. 2020. https://doi.org/10.1016/j.physd.2019.132306.

[12] Stübinger, Johannes, et al. Statistical Arbitrage with Vine Copulas. Quantitative Finance, vol. 18, no. 11, Nov. 2018, pp. 1831–49. https://doi.org/10.1080/14697688.2018.1438642.

[13] Nadaf, Tayyebeh, et al. Revisiting the Copula-Based Trading Method Using the Laplace Marginal Distribution Function. Mathematics, vol. 10, no. 5, Mar. 2022, p. 783. https://doi.org/10.3390/math10050783.
