Stat Arb
Week 10
traders.berkeley.edu
Announcements
● Brainteaser
● What is stat arb?
● Stat arb pipeline
○ Identifying relationships
○ Trading relationships
○ Managing risk
Problem of the Day
I have a dataset which I can split into four parts (say, by splitting on a
categorical variable). A linear model can achieve an R^2 of 0.8 on each of these
individual datasets. What can you say about the R^2 a linear model can achieve on the full dataset?
Useful Concepts
● Correlation
○ Pearson: linear relationship between two random variables
■ Cov(X,Y)/[SD(X)·SD(Y)]
○ Spearman: monotonic relationship between two variables
■ Pearson correlation between ranks
● Cointegration
○ Some linear combination of the (non-stationary) time series is stationary
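A minimal sketch of these concepts (not from the slides), assuming numpy, scipy, and statsmodels are available and using synthetic data:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(0)

# Pearson vs. Spearman on a linear relationship with noise.
x = rng.standard_normal(1000)
y = 0.7 * x + rng.standard_normal(1000)
print("Pearson:", pearsonr(x, y))    # linear association
print("Spearman:", spearmanr(x, y))  # monotonic association (Pearson on ranks)

# Cointegration: two random walks driven by a shared stochastic trend,
# so a linear combination of them is stationary.
common = np.cumsum(rng.standard_normal(1000))
p1 = common + rng.standard_normal(1000)
p2 = 2.0 * common + rng.standard_normal(1000)
t_stat, p_value, _ = coint(p1, p2)   # Engle-Granger two-step test
print("Cointegration p-value:", p_value)
```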
Identifying Relationships
Regression Hypothesis Testing
● First, we have to specify a null distribution. The test statistic for correlation (and
for regression coefficients) generally follows a Student’s t distribution under the null.
○ Null Hypothesis: 𝝆=0 Alternative: 𝝆≠0 with some significance level ɑ.
● Test for time series stationarity (Augmented Dickey-Fuller Test).
○ This is important because if your time series is non-stationary (e.g., it has a trend or unit root),
you usually cannot perform valid inference on the whole dataset.
● Most parametric tests can also be performed in non-parametric ways using
bootstrapping.
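As a sketch of the testing ideas above (assuming scipy and statsmodels; the data is synthetic, and the resampling test shown is a permutation test, a close relative of the bootstrap):

```python
import numpy as np
from scipy.stats import pearsonr
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
x = rng.standard_normal(500)
y = 0.2 * x + rng.standard_normal(500)

# Parametric test: pearsonr's p-value is based on a Student's t distribution
# under the null hypothesis rho = 0.
r, p_param = pearsonr(x, y)

# Resampling alternative: build the null distribution of the correlation by
# breaking the pairing between x and y.
n_resamples = 2000
null_rs = np.array([np.corrcoef(x, rng.permutation(y))[0, 1]
                    for _ in range(n_resamples)])
p_resample = np.mean(np.abs(null_rs) >= abs(r))
print(f"r={r:.3f}, parametric p={p_param:.4f}, resampling p={p_resample:.4f}")

# Stationarity: the ADF null hypothesis is a unit root (non-stationary).
prices = np.cumsum(rng.standard_normal(500))   # random walk: should not reject
returns = np.diff(prices)                      # differenced series: should reject
print("ADF p-value (prices): ", adfuller(prices)[1])
print("ADF p-value (returns):", adfuller(returns)[1])
```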
Using Correlation
● Usually, we want to measure the relationship between the returns, not the prices.
● Returns are generally stationary time series, which tells us that price is an order-one (I(1)) time series (see the sketch below).
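A small illustration of why we correlate returns rather than prices (synthetic data; two independent random walks frequently show a large spurious price-level correlation, while their returns do not):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two independent random walks ("prices"): order-one, I(1), series.
p1 = np.cumsum(rng.standard_normal(2000))
p2 = np.cumsum(rng.standard_normal(2000))

# Differencing gives stationary "returns".
r1, r2 = np.diff(p1), np.diff(p2)

print("corr(prices): ", np.corrcoef(p1, p2)[0, 1])   # often far from 0 despite independence
print("corr(returns):", np.corrcoef(r1, r2)[0, 1])   # close to 0, as it should be
```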
Error Rate Analysis
● We will generally run many hypothesis tests, and we want to ensure that our
discoveries are real.
● Therefore we have to consider the False Discovery Rate (FDR) and the
Family-wise Error Rate (FWER).
● The FDR is E[V/R], i.e., the expected fraction of discoveries that are false
(V = number of false discoveries, R = total number of discoveries).
● The FWER, the probability of making at least one false discovery, is 1-(1-ɑ)^n when we run n
independent hypothesis tests, each at level ɑ (see the quick check below).
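A quick numeric check of the FWER formula above (pure Python, no data involved):

```python
alpha = 0.05
for n in (1, 5, 20, 100):
    # Probability of at least one false discovery across n independent tests.
    print(n, round(1 - (1 - alpha) ** n, 3))
# 1 -> 0.05, 5 -> 0.226, 20 -> 0.642, 100 -> 0.994: the error rate blows up quickly.
```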
Controlling the FWER: Bonferroni Correction
● Let’s say we run n hypothesis tests and we want a global 5% error rate (FWER).
● We can just run each test at ɑ=.05/n.
● This achieves our target FWER, but is super conservative, because it assumes all of
our rejection regions are disjoint.
● This is literally the worst case.
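A minimal Bonferroni sketch (the p-values here are made up for illustration):

```python
import numpy as np

alpha = 0.05
pvals = np.array([0.001, 0.004, 0.030, 0.200])   # hypothetical p-values
n = len(pvals)

# Bonferroni: run each individual test at alpha / n.
reject = pvals < alpha / n
print(reject)   # [ True  True False False] -- only very small p-values survive
```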
Controlling the FDR: Benjamini-Hochberg
● The Benjamini-Hochberg procedure controls the FDR as follows:
1. Run hypothesis tests.
2. Collect p values and sort them.
3. Plot the line (k/m)·ɑ, where k is the rank of the current test and m is the
number of tests.
4. Find the largest rank k whose p-value falls below this line, and reject the null
hypothesis for all tests with rank at most k.
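A sketch of the procedure in code (hand-rolled for clarity; statsmodels' multipletests with method="fdr_bh" gives the same decisions, and the p-values below are made up):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of hypotheses rejected under BH FDR control."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    ranked = pvals[order]
    # Compare sorted p-values to the line (k / m) * alpha.
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank whose p-value is below the line
        reject[order[: k + 1]] = True      # reject every test up to that rank
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.300, 0.900]
print(benjamini_hochberg(pvals))
# Equivalent check with statsmodels:
# from statsmodels.stats.multitest import multipletests
# multipletests(pvals, alpha=0.05, method="fdr_bh")[0]
```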
Also Important
● Online FDR control (you don’t know how many hypothesis tests you’re going
to run)
○ Use the LORD algorithm
○ Or generalized alpha-investing (GAI, SAFFRON)
Out-of-Sample Testing
● Usually, you don’t just train a model on your dataset and hope it works; you
hold out a portion of the data to test on after training.
● Beware: data leakage. What is wrong with the following situations?
○ In order to test the performance of a strategy on the S&P, I take its current constituent stocks
and trade a momentum-based strategy on their returns.
○ I take a returns series, shuffle it and split it into a train and test set. I then train a model and
see shockingly good results on the test set.
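A sketch of a leak-free, time-ordered train/test split (variable names are hypothetical; contrast it with the shuffled split in the second example above):

```python
import numpy as np

rng = np.random.default_rng(3)
returns = rng.standard_normal(1000)   # stand-in for a daily return series, ordered in time

# Time-ordered split: the training period strictly precedes the test period,
# so the model never sees information from the future.
split = int(0.7 * len(returns))
train, test = returns[:split], returns[split:]

# What NOT to do: shuffling before splitting mixes future observations into
# the training set, which is the leak in the second example above.
# shuffled = rng.permutation(returns)
```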
Trading Stat Arb
No (Stat) Arbitrage Bounds
● Whenever you take a position, you have to cross the spread.
● You should not take the position when E[profit] ≤ 2*spread
● The following example does not yield a profit