Cleaning Correlation Matrices
J.-P. Bouchaud
with: M. Potters, L. Laloux, R. Allez, J. Bun, S. Majumdar
http://www.cfm.fr
Portfolio theory: Basics
Portfolio weights $w_i$, asset returns $X_i^t$
Markowitz Optimization
Find the portfolio with maximum expected return for a given risk or, equivalently, minimum risk for a given return $\mathcal{G}$.
In matrix notation:
$$w_C = \mathcal{G}\,\frac{C^{-1} g}{g^T C^{-1} g}$$
where all gains are measured with respect to the risk-free rate and $\sigma_i = 1$ (absorbed in $g_i$).
Markowitz Optimization
More explicitly:
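$$w \propto \sum_\alpha \lambda_\alpha^{-1}\,(\Psi_\alpha \cdot g)\,\Psi_\alpha = g + \sum_\alpha \left(\lambda_\alpha^{-1} - 1\right)(\Psi_\alpha \cdot g)\,\Psi_\alpha$$
where $\lambda_\alpha$ and $\Psi_\alpha$ are the eigenvalues and eigenvectors of C.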
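The two forms of the Markowitz weights can be checked numerically. The sketch below is our own NumPy illustration, not code from the talk: the function names are hypothetical, and the overall constant is fixed by the normalization $g^T w = \mathcal{G}$.

```python
import numpy as np


def markowitz_weights(C, g, gain=1.0):
    """w_C = G C^{-1} g / (g^T C^{-1} g): minimum risk for an expected gain G."""
    Cinv_g = np.linalg.solve(C, g)          # C^{-1} g without forming the inverse
    return gain * Cinv_g / (g @ Cinv_g)


def markowitz_weights_eig(C, g, gain=1.0):
    """Same weights from the eigen-decomposition C = sum_a lambda_a Psi_a Psi_a^T:
    w is proportional to g + sum_a (1/lambda_a - 1) (Psi_a . g) Psi_a."""
    lam, psi = np.linalg.eigh(C)            # columns of psi are the Psi_a
    proj = psi.T @ g                        # Psi_a . g for every a
    w = g + psi @ ((1.0 / lam - 1.0) * proj)
    return gain * w / (g @ w)               # rescale so that g^T w = G


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    C = A @ A.T + 5 * np.eye(5)             # an arbitrary positive-definite matrix
    g = rng.standard_normal(5)
    print(markowitz_weights(C, g))
    print(markowitz_weights_eig(C, g))      # same vector up to rounding
```

The eigen form makes explicit that directions with small $\lambda_\alpha$ receive the largest weight, so any noise in the small eigenvalues is amplified in $w$.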
Empirical Correlation Matrix
Risk of Optimized Portfolios
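In-sample risk (taking $\mathcal{G} = 1$, with $w_E$ the optimal weights computed using E):
$$\mathcal{R}_{in}^2 = w_E^T E\, w_E = \frac{1}{g^T E^{-1} g}$$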
Out-of-sample risk
$$\mathcal{R}_{out}^2 = w_E^T C\, w_E = \frac{g^T E^{-1} C E^{-1} g}{\left(g^T E^{-1} g\right)^2}$$
Risk of Optimized Portfolios
In fact, using RMT (with $q = N/T$):
$$\mathcal{R}_{out}^2 = \mathcal{R}_{true}^2\,(1-q)^{-1} = \mathcal{R}_{in}^2\,(1-q)^{-2},$$
independent of C! (For large N.)
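A quick Monte Carlo check of these large-N relations, assuming Gaussian returns, a hypothetical diagonal C with eigenvalues drawn uniformly in [0.5, 2], random gains g and $\mathcal{G} = 1$. Combining the two equalities gives $\mathcal{R}_{in}^2 = \mathcal{R}_{true}^2(1-q)$, so the in-sample ratio should come out close to $1-q$ and the out-of-sample ratio close to $1/(1-q)$, whatever C is.

```python
import numpy as np


def portfolio_risks(C, E, g):
    """Return (R_in^2, R_out^2, R_true^2) for the Markowitz portfolio with gain G = 1."""
    Einv_g = np.linalg.solve(E, g)
    w_E = Einv_g / (g @ Einv_g)                  # weights computed with the empirical E
    r_in = w_E @ E @ w_E                         # = 1 / (g^T E^{-1} g)
    r_out = w_E @ C @ w_E                        # risk actually realized under C
    r_true = 1.0 / (g @ np.linalg.solve(C, g))   # optimal risk if C were known
    return r_in, r_out, r_true


def simulate(N=400, T=2000, seed=42):
    """Gaussian returns with a hypothetical diagonal C (eigenvalues drawn in [0.5, 2])."""
    rng = np.random.default_rng(seed)
    lam = rng.uniform(0.5, 2.0, size=N)
    C = np.diag(lam)
    X = np.sqrt(lam)[:, None] * rng.standard_normal((N, T))  # N x T returns, covariance C
    E = X @ X.T / T
    g = rng.standard_normal(N)                   # hypothetical expected gains
    r_in, r_out, r_true = portfolio_risks(C, E, g)
    return r_in / r_true, r_out / r_true, N / T


if __name__ == "__main__":
    ratio_in, ratio_out, q = simulate()
    print(f"R_in^2/R_true^2  = {ratio_in:.3f}   vs 1 - q     = {1 - q:.3f}")
    print(f"R_out^2/R_true^2 = {ratio_out:.3f}   vs 1/(1 - q) = {1 / (1 - q):.3f}")
```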
In Sample vs. Out of Sample
[Figure: return vs. risk for raw in-sample, cleaned in-sample, cleaned out-of-sample and raw out-of-sample portfolios]
Rotational invariance hypothesis (RIH)
In the absence of any cogent prior on the eigenvectors, one can assume that C is a member of a Rotationally Invariant Ensemble (the RIH).
Surely not true for the market mode $\vec v_1 \approx (1, 1, \ldots, 1)/\sqrt{N}$, with $\lambda_1 \sim N$, but OK in the bulk (see below).
RMT: from $\rho_C(\lambda)$ to $\rho_E(\lambda)$
Solution using different techniques (replicas, diagrams, free matrices) gives the resolvent $G_E(z) = N^{-1}\,\mathrm{Tr}\,(zI - E)^{-1}$ as:
$$G_E(z) = \int d\lambda\, \rho_C(\lambda)\, \frac{1}{z - \lambda\left(1 - q + q z G_E(z)\right)}$$
Note: one should work from $\rho_C$ to $G_E$.
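For $\rho_C(\lambda) = \delta(\lambda - 1)$, i.e. C = I, the self-consistent equation reduces to the quadratic $qzG^2 - (z - 1 + q)G + 1 = 0$, and $\rho_E(\lambda) = \lim_{\eta\to 0}\mathrm{Im}\,G_E(\lambda - i\eta)/\pi$ should reproduce the Marcenko-Pastur density. A small NumPy sketch of that check (helper names are ours; q < 1 assumed):

```python
import numpy as np


def resolvent_identity(z, q):
    """G_E(z) = N^{-1} Tr (zI - E)^{-1} for C = I (rho_C = delta(lambda - 1)).
    The self-consistent equation reduces to q z G^2 - (z - 1 + q) G + 1 = 0;
    for Im z < 0 the physical root is the one with Im G > 0."""
    z = np.asarray(z, dtype=complex)
    disc = np.sqrt((z - 1 + q) ** 2 - 4 * q * z)
    g_plus = (z - 1 + q + disc) / (2 * q * z)
    g_minus = (z - 1 + q - disc) / (2 * q * z)
    return np.where(g_plus.imag > g_minus.imag, g_plus, g_minus)


def density_from_resolvent(lam, q, eta=1e-9):
    """rho_E(lambda) = lim_{eta -> 0} Im G_E(lambda - i eta) / pi."""
    return resolvent_identity(np.asarray(lam) - 1j * eta, q).imag / np.pi


def marchenko_pastur(lam, q):
    """Closed-form Marcenko-Pastur density for C = I, q = N/T < 1."""
    lam = np.asarray(lam, dtype=float)
    lp, lm = (1 + np.sqrt(q)) ** 2, (1 - np.sqrt(q)) ** 2
    return np.sqrt(np.clip((lp - lam) * (lam - lm), 0.0, None)) / (2 * np.pi * q * lam)
```

For a general $\rho_C$ the same equation has to be solved numerically; the C = I case is only a sanity check of sign conventions.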
Eigenvalue clipping
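Eigenvalue clipping is commonly implemented as follows: eigenvalues of E below the Marcenko-Pastur edge $\lambda_+ = (1+\sqrt{q})^2$ are deemed noise and replaced by their average (which preserves the trace), while the eigenvalues above the edge and all eigenvectors are kept. The sketch assumes E is a correlation matrix (unit variances); the one-factor test data and the function names are hypothetical.

```python
import numpy as np


def clip_eigenvalues(E, q):
    """Eigenvalue clipping: eigenvalues below the Marcenko-Pastur edge
    lambda_+ = (1 + sqrt(q))^2 are treated as noise and replaced by their mean
    (so the trace is preserved); eigenvectors are left untouched."""
    lam, psi = np.linalg.eigh(E)
    lam_plus = (1.0 + np.sqrt(q)) ** 2
    noise = lam < lam_plus
    cleaned = lam.copy()
    if noise.any():
        cleaned[noise] = lam[noise].mean()
    return (psi * cleaned) @ psi.T            # sum_k cleaned_k psi_k psi_k^T


def one_factor_sample(N=200, T=1000, beta=0.5, seed=0):
    """Hypothetical test data: a common 'market' factor plus idiosyncratic noise, unit variances."""
    rng = np.random.default_rng(seed)
    f = rng.standard_normal(T)
    X = beta * f + np.sqrt(1 - beta**2) * rng.standard_normal((N, T))
    return X @ X.T / T, N / T
```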
Empirical Correlation Matrix
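[Figure: eigenvalue density $\rho(\lambda)$ of the empirical correlation matrix (data) compared with the Marcenko-Pastur law and with raw and dressed power-law models (exponent 2); inset plotted against eigenvalue rank]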
Eigenvalue cleaning
[Figure: risk $\mathcal{R}^2$ obtained with four eigenvalue-cleaning schemes: classical shrinkage, Ledoit-Wolf shrinkage, power-law substitution and eigenvalue clipping]
A RIH Bayesian approach
All the above schemes lack a rigorous framework and are, at best, ad hoc recipes.
$$V(C, \{X_i\}) = \frac{1}{2q}\,\mathrm{Tr}\left[\log C + E\,C^{-1}\right] + V_0(C)$$
A Bayesian approach: a fully soluble case
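$$\rho_C(\lambda) \propto \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{\lambda^2}; \qquad \lambda_\pm = \frac{1 + b \pm \sqrt{(1+b)^2 - b^2/4}}{b}$$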
The general case: HCIZ integrals
An instanton approach to large N HCIZ
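Finally, the action associated with these trajectories is:
$$S \approx \frac{1}{2}\int dt \int dx\, \rho\left[v^2 + \frac{\pi^2}{3}\rho^2\right] - \frac{1}{2}\left[\iint dx\,dy\,\rho(x)\rho(y)\ln|x - y|\right]_{\rho=\rho_A}^{\rho=\rho_B}$$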
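Now, the link with HCIZ comes from noticing that the propagator of the Brownian motion in matrix space is:
$$P(B|A) \propto \exp\left[-\frac{N}{2}\,\mathrm{Tr}(A - B)^2\right] = \exp\left[-\frac{N}{2}\left(\mathrm{Tr}A^2 + \mathrm{Tr}B^2 - 2\,\mathrm{Tr}\,A O B O^\dagger\right)\right]$$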
Disregarding the eigenvectors of B (i.e., integrating over O) leads to another expression for $P(\{B_i\}|\{A_j\})$ in terms of HCIZ integrals, which can be compared with the one obtained using instantons.
Back to eigenvalue cleaning...
The Ledoit-Péché magic formula
The Ledoit-Péché [2011] formula is a non-linear shrinkage, given by:
$$\hat\lambda_C = \frac{\lambda_E}{\left|1 - q + q\,\lambda_E \lim_{\eta \to 0} G_E(\lambda_E - i\eta)\right|^2}$$
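On a finite sample the limit $\eta \to 0$ cannot be taken literally; a common workaround is to evaluate the empirical resolvent at a small finite $\eta$. The sketch below assumes $\eta = N^{-1/2}$ and the convention $G_E(z) = N^{-1}\mathrm{Tr}(zI - E)^{-1}$; it is an illustration, not a reference implementation. For C = I the cleaned eigenvalues should all be close to 1 (exactly 1 in the large-N, $\eta \to 0$ limit), whereas the raw ones spread over the Marcenko-Pastur bulk.

```python
import numpy as np


def ledoit_peche(E, q, eta=None):
    """Non-linear shrinkage of the eigenvalues of E:
        xi_k = lambda_k / |1 - q + q lambda_k G_E(lambda_k - i eta)|^2,
    with G_E(z) = N^{-1} Tr (zI - E)^{-1} estimated from the sample spectrum at a
    small but finite eta (assumption: eta = N^{-1/2}). Eigenvectors are kept."""
    lam, psi = np.linalg.eigh(E)
    N = lam.size
    if eta is None:
        eta = N ** -0.5
    z = lam - 1j * eta
    G = np.mean(1.0 / (z[:, None] - lam[None, :]), axis=1)  # G_E evaluated at each z_k
    xi = lam / np.abs(1.0 - q + q * lam * G) ** 2
    return xi, (psi * xi) @ psi.T                          # cleaned eigenvalues and matrix
```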
Eigenvalue cleaning: Ledoit-Péché
What about eigenvectors?
Up to now, most results using RMT focus on eigenvalues
What about eigenvectors?
Correlation matrices need a certain time T to be measured
What about eigenvectors?
$H = H_0 + \epsilon H_1$, where $H_0$ is deterministic or random (e.g. GOE) and $H_1$ is random.
Eigenvectors exchange
An issue: upon pseudo-collisions of eigenvalues, eigenvectors exchange.
Example: 2 × 2 matrices, $H_{11} = a$, $H_{22} = a + \epsilon$, $H_{21} = H_{12} = c$:
$$\lambda_\pm = a + \frac{\epsilon}{2} \pm \sqrt{c^2 + \frac{\epsilon^2}{4}}$$
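A numerical illustration of this avoided crossing (our own sketch, helper name hypothetical): sweeping $\epsilon$ through 0 at small fixed c, the eigenvalues never come closer than 2c, while the lower eigenvector rotates from $e_2$ to $e_1$ over a window of width of order c.

```python
import numpy as np


def two_by_two(a, eps, c):
    """Eigen-decomposition of H = [[a, c], [c, a + eps]] plus the closed-form eigenvalues
    lambda_pm = a + eps/2 pm sqrt(c^2 + eps^2/4)."""
    H = np.array([[a, c], [c, a + eps]])
    lam, vec = np.linalg.eigh(H)                       # ascending: lam[0] = lambda_-
    closed = a + eps / 2 + np.array([-1.0, 1.0]) * np.sqrt(c**2 + eps**2 / 4)
    return lam, vec, closed


if __name__ == "__main__":
    # Sweep eps through the near-collision at eps = 0 with a small coupling c.
    for eps in np.linspace(-1.0, 1.0, 9):
        lam, vec, closed = two_by_two(a=0.0, eps=eps, c=0.01)
        weight = vec[0, 0] ** 2                        # |<e_1|psi_->|^2
        print(f"eps={eps:+.2f}  lambda_-={lam[0]:+.4f}  |<e1|psi_->|^2={weight:.4f}")
```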
Subspace stability
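An idea: follow the subspace spanned by P eigenvectors:
$$|\psi_{k+1}\rangle, |\psi_{k+2}\rangle, \ldots, |\psi_{k+P}\rangle \;\longrightarrow\; |\psi'_{k+1}\rangle, |\psi'_{k+2}\rangle, \ldots, |\psi'_{k+P}\rangle$$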
Intermezzo
Non-equal-time correlation matrices:
$$E_{ij}^{\tau} = \frac{1}{T}\sum_t X_i^t X_j^{t+\tau}$$
$N \times N$ but not symmetric: leader-lagger relations.
Example: $Y_j^t = X_j^{t+\tau}$, $N = M$.
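A minimal sketch of the lagged correlation matrix, assuming standardized returns stored as an N × T array and a sum restricted to the T − τ dates where $X^{t+\tau}$ exists (the 1/T normalization is kept as written above). The two-asset leader-lagger sample is hypothetical: asset 0 leads asset 1 by one step, so $E^{1}_{01}$ is large while $E^{1}_{10}$ is close to zero.

```python
import numpy as np


def lagged_correlation(X, tau):
    """E_ij(tau) = (1/T) sum_t X_i^t X_j^{t+tau} for an N x T array of standardized returns;
    the sum runs over the T - tau dates where X^{t+tau} is available."""
    N, T = X.shape
    return X[:, : T - tau] @ X[:, tau:].T / T


def leader_lagger_sample(T=5000, seed=0):
    """Hypothetical two-asset example in which asset 0 leads asset 1 by one step."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(T + 1)
    x0 = z[1:]                                        # x0^t = z_{t+1}
    x1 = 0.8 * z[:-1] + 0.6 * rng.standard_normal(T)  # x1^{t+1} = 0.8 x0^t + noise
    return np.vstack([x0, x1])
```

At τ = 0 the matrix is the usual symmetric estimator; for τ > 0 the asymmetry $E^\tau_{ij} \neq E^\tau_{ji}$ is what signals who leads whom.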
Intermezzo: Singular values
Singular values: square roots of the non-zero eigenvalues of $GG^T$ or $G^TG$, with associated eigenvectors $u_k$ and $v_k$:
$$1 \ge s_1 > s_2 > \ldots > s_{\min(M,N)} \ge 0$$
Intermezzo: Benchmark
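Null hypothesis: no correlations between the Xs and the Ys, $G_{true} \equiv 0$.
Whiten the data, $\hat X = C_X^{-1/2} X$ and $\hat Y = C_Y^{-1/2} Y$, and define $G = \hat Y \hat X^T$.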
Intermezzo: Random SVD
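Final result ([Wachter] (1980); [Laloux, Miceli, Potters, JPB]), with $n = N/T$ and $m = M/T$:
$$\rho(s) = (m + n - 1)^+\,\delta(s - 1) + \frac{\sqrt{(s^2 - \gamma_-)(\gamma_+ - s^2)}}{\pi s\,(1 - s^2)}$$
with
$$\gamma_\pm = n + m - 2mn \pm 2\sqrt{mn(1 - n)(1 - m)}, \qquad 0 \le \gamma_\pm \le 1$$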
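The benchmark can be simulated directly. The sketch below (helper names hypothetical) whitens independent Gaussian X and Y with their sample covariances, checks that the singular values of G are the square roots of the eigenvalues of $G^TG$, and compares the top of the spectrum with $\gamma_+$. Under the null hypothesis the average of $s^2$ over the N singular values (for N ≤ M) should also be close to M/T, the expected overlap of two independent random projections of ranks N and M.

```python
import numpy as np


def inv_sqrt(S):
    """S^{-1/2} for a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T


def whitened_cross_matrix(X, Y):
    """G = C_Y^{-1/2} (Y X^T / T) C_X^{-1/2} for X (N x T) and Y (M x T);
    its singular values lie in [0, 1]."""
    T = X.shape[1]
    Cx, Cy = X @ X.T / T, Y @ Y.T / T
    return inv_sqrt(Cy) @ (Y @ X.T / T) @ inv_sqrt(Cx)


def wachter_edges(n, m):
    """gamma_pm = n + m - 2mn pm 2 sqrt(mn(1-n)(1-m)): support of s^2, n = N/T, m = M/T."""
    root = 2.0 * np.sqrt(m * n * (1 - n) * (1 - m))
    base = n + m - 2 * m * n
    return base - root, base + root
```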
Back to eigenvectors
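Extend the target subspace to avoid edge effects:
$$|\psi_{k+1}\rangle, |\psi_{k+2}\rangle, \ldots, |\psi_{k+P}\rangle \;\longrightarrow\; Q > P \text{ eigenvectors } |\psi'_j\rangle \text{ around the same ranks}$$
$$D = -\frac{1}{P}\sum_{i=1}^{P} \ln s_i$$
where the $s_i$ are the singular values of the $P \times Q$ overlap matrix between the two sets.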
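A sketch of the stability measure D, assuming both matrices are symmetric N × N, eigenvectors ranked by decreasing eigenvalue and, as our own hypothetical choice since the exact target window is not specified here, a window of Q eigenvectors centred on the same ranks. D = 0 when the P vectors lie inside the target subspace and grows as they rotate out of it.

```python
import numpy as np


def eigvecs_by_rank(E):
    """Eigenvectors of a symmetric matrix, columns sorted by decreasing eigenvalue."""
    _, psi = np.linalg.eigh(E)
    return psi[:, ::-1]


def subspace_distance(E1, E2, k, P, Q):
    """D = -(1/P) sum_i ln s_i, where s_i are the singular values of the P x Q overlap
    matrix between eigenvectors k+1..k+P of E1 and a window of Q >= P eigenvectors of E2.
    Assumption: the Q-window is centred on the same ranks (clamped at the edges)."""
    N = E1.shape[0]
    A = eigvecs_by_rank(E1)[:, k:k + P]                # |psi_{k+1}>, ..., |psi_{k+P}>
    start = max(0, min(k - (Q - P) // 2, N - Q))
    B = eigvecs_by_rank(E2)[:, start:start + Q]        # target subspace
    s = np.linalg.svd(A.T @ B, compute_uv=False)       # P singular values in [0, 1]
    return -np.mean(np.log(np.clip(s, 1e-300, None)))
```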
Null hypothesis
Note: if P and Q are large, D can be accidentally small
Back to eigenvectors: perturbation theory
$H = H_0 + \epsilon H_1$
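Standard first-order perturbation theory gives $\delta\lambda_k = \epsilon\langle\psi_k|H_1|\psi_k\rangle$ and leaves the eigenvectors almost unchanged as long as $\epsilon|\langle\psi_l|H_1|\psi_k\rangle| \ll |\lambda_k - \lambda_l|$, which is precisely the condition that fails near pseudo-collisions. A quick check, assuming for illustration that both $H_0$ and $H_1$ are drawn from the GOE and that $\epsilon$ is small (function names hypothetical):

```python
import numpy as np


def goe(N, rng):
    """GOE matrix normalized so that its spectrum fills [-2, 2] for large N."""
    A = rng.standard_normal((N, N))
    return (A + A.T) / np.sqrt(2 * N)


def first_order_check(N=50, eps=1e-4, seed=1):
    """Compare the exact eigen-decomposition of H = H0 + eps H1 with first-order theory."""
    rng = np.random.default_rng(seed)
    H0, H1 = goe(N, rng), goe(N, rng)
    lam0, psi0 = np.linalg.eigh(H0)
    lam, psi = np.linalg.eigh(H0 + eps * H1)
    # delta lambda_k = eps <psi_k|H1|psi_k> at first order
    first_order = lam0 + eps * np.einsum("ik,ij,jk->k", psi0, H1, psi0)
    eig_error = np.abs(lam - first_order).max()
    # overlaps |<psi_k^0|psi_k>|^2 between unperturbed and perturbed eigenvectors
    overlaps = np.einsum("ik,ik->k", psi0, psi) ** 2
    return eig_error, overlaps
```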
GOE: the full SV spectrum
Strong fluctuations: $\rho(s) \sim s^2$ at small $s$, with $\delta s_{\min}/s_{\min} \sim 1$ and $s_{\max} \approx 1$.
The case of correlation matrices
Consider the empirical correlation matrix:
$$E = C + \varepsilon, \qquad \varepsilon = \frac{1}{T}\sum_{t=1}^{T}\left(X^t (X^t)^T - C\right)$$
T t=1
Stability of eigenvalues: Correlations
Stability of eigenspaces: Correlations
$D(\tau)$ for a given $T$, $P = 5$, $Q = 10$
Stability of eigenspaces: Correlations
$D(\tau = T)$ for $P = 5$, $Q = 10$
Conclusion
Many RMT tools are available to understand the eigenvalue spectrum and to suggest cleaning schemes.
Bibliography