6 - Enhancing a Pairs Trading Strategy With ML
Abstract—Pairs Trading is one of the most valuable market-neutral strategies used by hedge funds. It is particularly interesting as it overcomes the arduous process of valuing securities by focusing on relative pricing. By buying a relatively undervalued security and selling a relatively overvalued one, a profit can be made upon the pair’s price convergence. However, with the growing availability of data, it has become increasingly hard to find rewarding pairs. In this work we address two problems: (i) how to find profitable pairs while constraining the search space and (ii) how to avoid long decline periods due to prolonged divergent pairs. To manage these difficulties, the application of promising Machine Learning techniques is investigated in detail. We propose the integration of an Unsupervised Learning algorithm, OPTICS, to handle problem (i). The results obtained demonstrate that the suggested technique can outperform the common pairs’ search methods, achieving an average portfolio Sharpe ratio of 3.79, in comparison to 3.58 and 2.59 obtained by standard approaches. For problem (ii), we introduce a forecasting-based trading model capable of reducing the periods of portfolio decline by 75%. Yet, this comes at the expense of decreasing overall profitability. The proposed strategy is tested using an ARMA model, an LSTM and an LSTM Encoder-Decoder. This work’s results are simulated over varying periods between January 2009 and December 2018, using 5-minute price data from a group of 208 commodity-linked ETFs, and accounting for transaction costs.

Index Terms—Pairs Trading, Market Neutral, Machine Learning, Deep Learning, Unsupervised Learning

I. INTRODUCTION

Pairs Trading is a popular trading strategy widely used by hedge funds and investment banks. It is capable of obtaining profits irrespective of the market direction.

This is accomplished with a two-step procedure. First, a pair of assets whose prices have historically moved together is detected. Then, assuming the equilibrium relationship should persist in the future, the spread between the prices of the two assets is monitored and, in case it deviates from its historical mean, the investor shorts the overvalued asset and buys the undervalued one. Both positions are closed upon price convergence.

However, with the growing availability of data, it is becoming increasingly hard to find robust pairs. In this work, we address two specific problems: (i) how to find profitable pairs while constraining the search space and (ii) how to avoid long decline periods due to prolonged divergent pairs.

The remainder of this document is organized as follows: in section II we introduce the main concepts of Pairs Trading while describing the associated research work. In section III we suggest a new pairs selection framework to address the first problem motivating this research work. In section IV we propose a new trading model in response to the second problem. Next, in section V we design the simulation environment to test the proposed approaches, for which the results are presented in section VI.

II. BACKGROUND AND RELATED WORK

Each stage composing a Pairs Trading strategy is described in detail along with the most relevant related work.

A. Pairs Selection

The pairs selection stage encompasses (i) finding the appropriate candidate pairs and (ii) selecting the most promising ones.

Starting with (i), the investor should select the securities of interest (e.g., stocks, ETFs) and search for possible combinations. In the literature, two methodologies are typically suggested for this stage: performing an exhaustive search over all possible combinations among the selected securities, or grouping them by sector and constraining the combinations to pairs formed by securities within the same sector. While the former may find more unusual yet interesting pairs, the latter reduces the likelihood of finding spurious relations. For example, [1, 2] impose no restriction on the universe from which to select the pairs. In contrast, some research work, such as [3–5], arranges the securities into category groups and selects pairs within the same group.

Concerning (ii), the investor must define what criteria should be used to select a pair. The most common approaches are the distance, correlation, and cointegration approaches.

The distance approach, suggested in [3], selects pairs which minimize the historic sum of squared distances between the two assets’ price series. This method is widely used, but according to [6] it is analytically suboptimal. If $p_{i,t}$ is a realization of the normalized price process $P_i = (P_{i,t})_{t \in T}$ of an asset $i$, the average sum of squared distances $\mathrm{ssd}_{P_i,P_j}$ in the formation period¹ of a pair formed by assets $i$ and $j$ is given by

$$\mathrm{ssd}_{P_i,P_j} = \frac{1}{T}\sum_{t=1}^{T}\left(p_{i,t} - p_{j,t}\right)^2. \tag{1}$$

¹ The formation period corresponds to the period in which securities are being analyzed to form potential pairs.

Thus, an optimal pair would be one that minimizes Eq. (1). However, this implies that a zero-spread pair is considered optimal,
which logically may not be the case, as such a pair would provide no trading opportunities.

The application of Pearson correlation as a selection metric is analyzed in [7]. The authors examine its application on return series with the same data sample used in [3] and find that correlation shows better performance, with a reported monthly average of 1.70% raw returns, almost twice as high as the one obtained using the distance approach. Nevertheless, this criterion is not foolproof, as two securities correlated at the return level might not share an equilibrium relationship, in which case divergence reversions cannot be explained theoretically.

Finally, the cointegration approach entails selecting pairs for which the two constituents are found to be cointegrated. If two securities, $Y_t$ and $X_t$, are found to be cointegrated, then by definition the series constructed as

$$S_t = Y_t - \beta X_t, \tag{2}$$

where $\beta$ is the cointegration factor, must be stationary. Defining the spread series in this way is particularly convenient since under these conditions the spread is expected to be mean-reverting, meaning that every spread divergence is expected to be followed by convergence. Hence, this approach finds econometrically more sound equilibrium relationships. The most cited work in this field is [8], which proposes a set of heuristics for cointegration-based strategies. Furthermore, [9] performs a comparison study between the cointegration approach and the distance approach and finds that the cointegration approach significantly outperforms the distance method.

B. Trading Models

The most common trading model follows from [3], and can be described as indicated below:

i Calculate the pair’s spread ($S_t = Y_t - X_t$) mean, $\mu_s$, and standard deviation, $\sigma_s$, during the formation period.
ii Define the model thresholds: the threshold that triggers a long position, $\alpha_L$, the threshold that triggers a short position, $\alpha_S$, and the exit threshold, $\alpha_{exit}$, that defines the level at which a position should be exited.
iii Monitor the evolution of the spread, $S_t$, and check whether any threshold is crossed.
iv In case $\alpha_L$ is crossed, go long the spread by buying Y and selling X. If $\alpha_S$ is triggered, short the spread by selling Y and buying X. Exit the position when $\alpha_{exit}$ is crossed.

The simplicity of this model is particularly appealing, motivating its frequent application in the field. Nonetheless, the entry points defined may not be optimal, since no information concerning the spread’s subsequent direction is incorporated in the trading decision. Several efforts have sought to propose more robust models. Techniques from different fields, such as stochastic control theory, statistical process modelling and Machine Learning, have been studied. In particular, the results obtained by Machine Learning approaches have proved very promising. Dunis et al. [10, 11] explore the application of Artificial Neural Networks to forecast the spread change for two famous spreads. Thomaidis et al. [12] propose an experimental statistical arbitrage system based on Neural Network Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models for modeling the mispricing-correction mechanism between the relative prices composing a pair. Huck [13, 14] uses RNNs to generate a one-week-ahead forecast, from which the predicted returns are calculated. Lastly, Krauss et al. [1] analyze the effectiveness of deep neural networks, gradient-boosted trees and random forests in the context of statistical arbitrage using S&P 500 stocks. Apart from this, Machine Learning techniques remain fairly unexplored in this field, and the results obtained indicate this is a promising direction for future research.

III. PROPOSED PAIRS SELECTION FRAMEWORK

At this research stage we aim to explore how an investor may find promising pairs without exposing himself to the adversities of the common pairs searching techniques. On the one hand, if the investor limits the search to securities within the same sector, he is less likely to find pairs not yet being traded in large volumes, leaving a small margin for profit. On the other hand, if the investor does not impose any limitation on the search space, he might have to explore excessive combinations and possibly find spurious relations.

We intend to reach an equilibrium with the application of an Unsupervised Learning algorithm, on the expectation that it will infer meaningful clusters of assets from which to select the pairs.

A. Dimensionality reduction

The first step towards this direction consists in finding a compact representation for each asset, starting from its price series. The application of Principal Component Analysis (PCA) is proposed. PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables, the principal components. Each component can be seen as representing a risk factor. We suggest the application of PCA to the normalized return series, defined as

$$R_{i,t} = \frac{P_{i,t} - P_{i,t-1}}{P_{i,t-1}}, \tag{3}$$

where $P_{i,t}$ is the price series of an asset $i$. Using the price series instead could result in the detection of spurious correlations due to underlying time trends. The number of principal components used defines the number of features for each asset representation. Considering that an Unsupervised Learning algorithm will be applied to these data, the number of features should not be large. High data dimensionality presents a dual problem. First, in the presence of more attributes, the likelihood of finding irrelevant features increases. Second, there is the curse of dimensionality, caused by the exponential increase in volume associated with adding extra dimensions to the space. According to [15], this effect starts to be severe for dimensions greater than 15. Taking this into consideration, the number of PCA dimensions is upper bounded at this value and is chosen empirically.
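The dimensionality-reduction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: it assumes prices arrive as a NumPy array with one column per asset, that "normalized" means z-scoring each asset’s returns, and that each asset is represented by its loadings on the retained components (one common way of turning PCA into per-asset features). The helper names `simple_returns` and `asset_features` are hypothetical; PCA and standardization come from scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

MAX_COMPONENTS = 15  # upper bound motivated by the curse-of-dimensionality argument


def simple_returns(prices: np.ndarray) -> np.ndarray:
    """Eq. (3): R_{i,t} = (P_{i,t} - P_{i,t-1}) / P_{i,t-1}.

    prices has shape (T, N): T timestamps (rows) by N assets (columns).
    The result has shape (T - 1, N).
    """
    return np.diff(prices, axis=0) / prices[:-1]


def asset_features(prices: np.ndarray, n_components: int = 5) -> np.ndarray:
    """Return an (N, n_components) matrix: one compact feature row per asset."""
    if not 1 <= n_components <= MAX_COMPONENTS:
        raise ValueError(f"n_components must be in [1, {MAX_COMPONENTS}]")
    # Assumption: "normalized" = each asset's return series standardized
    # to zero mean and unit variance before PCA.
    returns = StandardScaler().fit_transform(simple_returns(prices))
    # Observations are timestamps and variables are assets, so each principal
    # component is a direction in asset space (a candidate risk factor).
    pca = PCA(n_components=n_components).fit(returns)
    # components_ has shape (n_components, N); transposing gives each asset's
    # loadings on the retained components.
    return pca.components_.T
```

For example, with the 208 ETFs and the five components adopted in the results section, `asset_features` would return a 208 × 5 matrix, one row per ETF. Working on returns rather than prices follows the argument above about spurious, trend-driven correlations.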
B. Unsupervised Learning clustering
Having constructed a compact representation for each asset, a clustering technique may be applied. To decide which algorithm is more appropriate, some problem-specific requisites are first defined:
– No need to specify the number of clusters in advance.
– No need to group all securities.
– Strict assignment that accounts for outliers.
– No assumptions regarding the clusters’ shape.

Fig. 1. Clusters with varying density. Adapted from [17].
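The requisites above point towards a density-based method, and the abstract names OPTICS as the algorithm adopted in this work. The sketch below is only an illustration under assumptions: it runs scikit-learn’s `OPTICS` estimator on a per-asset feature matrix (for instance, PCA loadings as described in section III-A) and then enumerates candidate pairs only within each cluster. The helper name `candidate_pairs` and the value `min_samples=3` are hypothetical choices, not parameters reported in this work. No number of clusters is specified (first requisite), and assets labeled `-1` are treated as outliers and left ungrouped (second and third requisites).

```python
from itertools import combinations
from typing import List, Sequence, Tuple

import numpy as np
from sklearn.cluster import OPTICS


def candidate_pairs(
    features: np.ndarray, tickers: Sequence[str], min_samples: int = 3
) -> Tuple[np.ndarray, List[Tuple[str, str]]]:
    """Cluster assets with OPTICS and list the pairs formed inside each cluster.

    features: (N, d) array, one row per asset (e.g. its PCA loadings).
    tickers:  N asset identifiers, aligned with the rows of `features`.
    """
    # OPTICS needs no preset number of clusters and labels noise points -1.
    labels = OPTICS(min_samples=min_samples).fit_predict(features)
    pairs: List[Tuple[str, str]] = []
    for label in np.unique(labels):
        if label == -1:  # outliers are not assigned to any cluster
            continue
        members = [t for t, lab in zip(tickers, labels) if lab == label]
        # Restrict the search space: only intra-cluster combinations are kept.
        pairs.extend(combinations(members, 2))
    return labels, pairs
```

The candidate pairs produced this way would still have to pass the pairs selection criteria (for example, a cointegration check in the spirit of section II-A); that filtering step is deliberately left out of the sketch.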
E. Evaluation metrics

Regarding the trading evaluation, we propose analyzing the strategy’s Return on Investment (ROI), Sharpe Ratio (SR) and the portfolio Maximum Drawdown (MDD).

The ROI is calculated as the net profit divided by the initial capital, which we set to $1.

The portfolio SR is calculated as

$$\mathrm{SR}_{\text{year}} = \frac{R_{\text{port}} - R_f}{\sigma_{\text{port}}} \times \text{annualization factor}, \tag{8}$$

where $R_{\text{port}}$ represents the average daily portfolio return and $R_f$ the risk-free rate⁶. The portfolio volatility, $\sigma_{\text{port}}$, is calculated as

$$\sigma_{\text{port}} = \sqrt{\sum_{i=1}^{N}\sum_{j=1}^{N} \omega_i \,\mathrm{cov}(i,j)\, \omega_j}, \tag{9}$$

where $\omega_i$ is the relative weight of asset $i$ in the portfolio and $N$ is the number of assets. The annualization factor is set according to the methodology proposed by Lo [27] (Table 2 in [27]), to prevent imprecise approximations.

⁶ The average 3-Month Treasury bill rate, taken from [26], during the corresponding test period, converted to a daily basis for consistency with the formula.

As expected, when no restrictions are imposed on the search space, a larger set of ETFs emerges and consequently more pairs are selected. In contrast, when grouping ETFs into five partitions (according to the categories described in section V-C), there is a reduction in the number of possible pair combinations. This reduction is not more pronounced because of the imbalance across the categories considered: since energy-linked ETFs represent close to half of all ETFs, the combinations within this sector are still vast. Lastly, the number of possible pair combinations when using OPTICS is remarkably lower. Although the number of clusters is higher than when grouping by category, their smaller size results in fewer combinations. We proceed to analyze in more detail the results obtained with this algorithm.

The results concerning the OPTICS application are obtained using five principal components to describe the data. We empirically verified that up to the 15-dimension boundary (motivated in section III-A) the results are not significantly affected. We adopt 5 dimensions since we find it more adequate to settle the ETFs’ representation in a lower dimension, given that there is no evidence favoring higher dimensions.

To validate the clusters formed and gain insight into their composition, we examine the results obtained in the period of Jan 2014 to Dec 2017⁷. To represent the clusters in a 2-D setting, the data must be reduced from 5 dimensions. We consider the application of t-SNE [29] for this purpose. Figure 8 illustrates the clusters formed. The ETFs not clustered are represented by the smaller circles, which were not labeled to facilitate the visualization.

(a) Normalized prices in Cluster 1.
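The 2-D projection used for the cluster figure can be sketched with scikit-learn’s `TSNE`. This is an illustrative sketch, not the code behind Figure 8: the perplexity, the PCA initialization and the random seed are assumptions chosen for a small example, and `project_2d` is a hypothetical helper. The clusters themselves would still be computed on the 5-dimensional representation; t-SNE only supplies plotting coordinates, since it mainly preserves local neighbourhoods rather than global distances.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_2d(features: np.ndarray, perplexity: float = 10.0, seed: int = 0) -> np.ndarray:
    """Embed an (N, d) feature matrix into (N, 2) coordinates, for visualization only."""
    n_samples = features.shape[0]
    # scikit-learn requires the perplexity to be smaller than the number of samples.
    perplexity = min(perplexity, n_samples - 1)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(features)
```

The resulting coordinates could then be drawn with any plotting library, colouring each point by its OPTICS label and drawing the unclustered assets (label -1) as smaller markers, as described above.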
TABLE IV
FORECASTING RESULTS COMPARISON.
TABLE V
TRADING RESULTS COMPARISON USING AN 8-YEAR-LONG FORMATION PERIOD.
Each spread in Figure 10 is fitted by the forecasting algorithms. The forecasting score is obtained by averaging the mean-square error (MSE) over the five spreads.

The results indicate that if robustness is evaluated by the number of days the portfolio value does not decline (highlighted in Table V), then the proposed trading model does provide an improvement. The forecasting-based models display a total of 2 (LSTM), 11 (ARMA) and 22 (LSTM Encoder-Decoder) days of portfolio decline, in comparison with 87 days obtained when using the standard model. This finding suggests the forecasting-based model is capable of defining more precise entry points, and hence of reducing the number of unprofitable days. However, that comes at the expense of a reduction in both portfolio SR and ROI, which questions the benefits provided by the proposed model after all. We suspect the long required formation period is also responsible for this profitability decline. Therefore, we proceed to analyze the standard trading model in the 3-year-long period.

TABLE VI
TRADING RESULTS FOR STANDARD TRADING MODEL USING A 3-YEAR-LONG FORMATION PERIOD.

By comparison, the performance in the 10-year-long period seems greatly affected by the long required duration, suggesting the less satisfactory returns emerge not simply from the trading model itself, but also from the underlying time settings. Following this line of reasoning, if the forecasting-based models’ performance increases in the same proportion as that of the standard trading model when reducing the formation period, the results obtained could be much more satisfactory.

VII. CONCLUSIONS

We explored how Pairs Trading could be enhanced with the integration of Machine Learning. First, we proposed a new approach to search for pairs based on the application of the OPTICS algorithm followed by robust pairs selection criteria. The strategy achieved better risk-adjusted returns when using this method. Secondly, we introduced a forecasting-based model aiming to reduce decline periods associated with untimely market positions and prolonged divergent pairs. We demonstrated the proposed model is capable of reducing the average decline period by more than 75%, although that comes at the expense of declining profitability. In addition, this work contributes empirical evidence of the suitability of ETFs traded in a 5-minute setting in the context of Pairs Trading.

REFERENCES

[1] C. Krauss, X. A. Do, and N. Huck, “Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500,” European Journal of Operational Research, vol. 259, no. 2, pp. 689–702, 2017.
[2] J. Caldeira and G. V. Moura, “Selection of a portfolio of pairs based on cointegration: A statistical arbitrage strategy,” Available at SSRN 2196391, 2013.
[3] E. Gatev, W. N. Goetzmann, and K. G. Rouwenhorst, “Pairs trading: Performance of a relative-value arbitrage rule,” The Review of Financial Studies, vol. 19, no. 3, pp. 797–827, 2006.
[4] B. Do and R. Faff, “Does simple pairs trading still work?” Financial Analysts Journal, vol. 66, no. 4, pp. 83–95, 2010. [Online]. Available: https://doi.org/10.2469/faj.v66.n4.1
[5] C. L. Dunis, G. Giorgioni, J. Laws, and J. Rudy, “Statistical arbitrage and high-frequency data with an application to Eurostoxx 50 equities,” Liverpool Business School, Working paper, 2010.
[6] C. Krauss, “Statistical arbitrage pairs trading strategies: Review and outlook,” Journal of Economic Surveys, vol. 31, no. 2, pp. 513–545, 2017.
[7] H. Chen, S. Chen, Z. Chen, and F. Li, “Empirical investigation of an equity pairs trading strategy,” Management Science, 2017.
[8] G. Vidyamurthy, Pairs Trading: Quantitative Methods and Analysis. John Wiley & Sons, 2004, vol. 217.
[9] N. Huck and K. Afawubo, “Pairs trading and selection methods: Is cointegration superior?” Applied Economics, vol. 47, no. 6, pp. 599–613, 2015.
[10] C. L. Dunis, J. Laws, and B. Evans, “Modelling and trading the gasoline crack spread: A non-linear story,” Derivatives Use, Trading & Regulation, vol. 12, no. 1-2, pp. 126–145, 2006.
[11] C. L. Dunis, J. Laws, P. W. Middleton, and A. Karathanasopoulos, “Trading and hedging the corn/ethanol crush spread using time-varying leverage and nonlinear models,” The European Journal of Finance, vol. 21, no. 4, pp. 352–375, 2015.
[12] N. S. Thomaidis, N. Kondakis, and G. Dounias, “An intelligent statistical arbitrage trading system,” in SETN, 2006.
[13] N. Huck, “Pairs selection and outranking: An application to the S&P 100 index,” European Journal of Operational Research, vol. 196, no. 2, pp. 819–825, 2009.
[14] N. Huck, “Pairs trading and outranking: The multi-step-ahead forecasting case,” European Journal of Operational Research, vol. 207, no. 3, pp. 1702–1716, 2010.
[15] P. Berkhin, “A survey of clustering data mining techniques,” in Grouping Multidimensional Data. Springer, 2006, pp. 25–71.
[16] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise.”
[17] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander, “OPTICS: Ordering points to identify the clustering structure,” in ACM SIGMOD Record, vol. 28, no. 2. ACM, 1999, pp. 49–60.
[18] T. Kleinow, “Testing continuous time models in financial markets,” 2002.
[19] E. Chan, Algorithmic Trading: Winning Strategies and Their Rationale. John Wiley & Sons, 2013, vol. 625.
[20] Y.-W. Si and J. Yin, “OBST-based segmentation approach to financial time series,” Engineering Applications of Artificial Intelligence, vol. 26, no. 10, pp. 2581–2596, 2013.
[21] R. C. Cavalcante, R. C. Brasileiro, V. L. Souza, J. P. Nobrega, and A. L. Oliveira, “Computational intelligence and financial markets: A survey and future directions,” Expert Systems with Applications, vol. 55, pp. 194–211, 2016.
[22] M. Perlin, “M of a kind: A multivariate approach at pairs trading,” 2007.
[23] “Find the Right ETF - Tools, Ratings, News,” https://www.etf.com/, accessed: 2019-06-30.
[24] H. Rad, R. K. Y. Low, and R. Faff, “The profitability of pairs trading strategies: Distance, cointegration and copula methods,” Quantitative Finance, vol. 16, no. 10, pp. 1541–1558, 2016. [Online]. Available: https://doi.org/10.1080/14697688.2016.1164337
[25] B. Do and R. Faff, “Are pairs trading profits robust to trading costs?” Journal of Financial Research, vol. 35, no. 2, pp. 261–287, 2012.
[26] “3-Month Treasury Bill: Secondary Market Rate,” https://fred.stlouisfed.org/series/TB3MS, accessed: 2019-07-11.
[27] A. W. Lo, “The statistics of Sharpe ratios,” Financial Analysts Journal, vol. 58, no. 4, pp. 36–52, 2002.
[28] S. Moraes Sarmento, “Github repository: Pairs trading,” https://github.com/simaomsarmento/PairsTrading, 2019.
[29] L. v. d. Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. Nov, pp. 2579–2605, 2008.
[30] F. A. Gers, D. Eck, and J. Schmidhuber, “Applying LSTM to time series predictable through time-window approaches,” in Neural Nets WIRN Vietri-01. Springer, 2002, pp. 193–200.