
Cleaning correlation matrices, Random Matrix Theory & HCIZ integrals

J.-P. Bouchaud
with: M. Potters, L. Laloux, R. Allez, J. Bun, S. Majumdar

http://www.cfm.fr

Portfolio theory: Basics
Portfolio weights $w_i$, asset returns $X_i^t$.

If expected/predicted gains are $g_i$, then the expected gain of the portfolio is
$$G = \sum_i w_i\, g_i$$

Let risk be defined as the variance of the portfolio returns (maybe not a good definition!):
$$\mathcal{R}^2 = \sum_{ij} w_i\, \sigma_i\, C_{ij}\, \sigma_j\, w_j$$
where $\sigma_i^2$ is the variance of asset $i$, and $C_{ij}$ is the correlation matrix.

J.-P. Bouchaud
Markowitz Optimization
Find the portfolio with maximum expected return for a given risk or, equivalently, minimum risk for a given return $G$.

In matrix notation:
$$w_C = G\, \frac{C^{-1} g}{g^T C^{-1} g}$$
where all gains are measured with respect to the risk-free rate and $\sigma_i = 1$ (absorbed in $g_i$).

Note: in the presence of non-linear constraints, e.g.
$$\sum_i |w_i| \leq A,$$
this becomes a spin-glass problem! (see [JPB, Galluccio, Potters])

J.-P. Bouchaud
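As an illustration of the formulas above, a minimal numpy sketch (variable names and test data are illustrative, not from the slides) computing the Markowitz weights and the corresponding gain and risk:

```python
import numpy as np

# Illustrative inputs: N assets, predicted gains g (vs. the risk-free rate),
# a valid correlation matrix C; volatilities sigma_i are set to 1, as assumed above.
rng = np.random.default_rng(0)
N = 5
g = rng.normal(size=N)
A = rng.normal(size=(N, N))
C = A @ A.T
C = C / np.sqrt(np.outer(np.diag(C), np.diag(C)))

G = 1.0                                  # target expected gain
Cinv_g = np.linalg.solve(C, g)
w = G * Cinv_g / (g @ Cinv_g)            # w_C = G C^{-1} g / (g^T C^{-1} g)

expected_gain = w @ g                    # equals G by construction
risk_sq = w @ C @ w                      # R^2 = sum_ij w_i C_ij w_j
print(expected_gain, risk_sq)
```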
Markowitz Optimization

More explicitly:
$$w \propto C^{-1} g = \sum_\alpha \frac{1}{\lambda_\alpha}\, (v_\alpha \cdot g)\, v_\alpha = g + \sum_\alpha \left(\frac{1}{\lambda_\alpha} - 1\right) (v_\alpha \cdot g)\, v_\alpha$$

Compared to the naive allocation $w \propto g$:

Eigenvectors with $\lambda \gg 1$ are projected out

Eigenvectors with $\lambda \ll 1$ are over-allocated

Very important for stat. arb. strategies (for example)

J.-P. Bouchaud
Empirical Correlation Matrix

Before inverting them, how should one estimate/clean correlation matrices?

Empirical equal-time correlation matrix $E$:
$$E_{ij} = \frac{1}{T} \sum_t \frac{X_i^t X_j^t}{\sigma_i \sigma_j}$$

Order $N^2$ quantities estimated with $NT$ data points.

When $T < N$, $E$ is not even invertible.

Typically: $N = 500$ to $2000$; $T = 500$ to $2500$ days (10 years; beware of high frequencies), so $q := N/T = O(1)$.

J.-P. Bouchaud
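A minimal sketch of the estimator $E_{ij}$ above (function and variable names are illustrative):

```python
import numpy as np

def empirical_correlation(X):
    """X: T x N matrix of returns (rows = dates t, columns = assets i).
    Returns the empirical equal-time correlation matrix E."""
    T = X.shape[0]
    X = X - X.mean(axis=0)          # remove sample means
    sigma = X.std(axis=0)           # volatilities sigma_i
    Xn = X / sigma                  # standardized returns X_i^t / sigma_i
    return Xn.T @ Xn / T            # E_ij = (1/T) sum_t X_i^t X_j^t / (sigma_i sigma_j)

rng = np.random.default_rng(1)
N, T = 400, 1000                    # q = N/T = 0.4 is O(1), as in the slides
E = empirical_correlation(rng.normal(size=(T, N)))
print(E.shape, N / T)
```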
Risk of Optimized Portfolios

In-sample risk (for $G = 1$):
$$\mathcal{R}^2_{\rm in} = w_E^T E\, w_E = \frac{1}{g^T E^{-1} g}$$

True minimal risk:
$$\mathcal{R}^2_{\rm true} = w_C^T C\, w_C = \frac{1}{g^T C^{-1} g}$$

Out-of-sample risk:
$$\mathcal{R}^2_{\rm out} = w_E^T C\, w_E = \frac{g^T E^{-1} C E^{-1} g}{(g^T E^{-1} g)^2}$$

J.-P. Bouchaud
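A small simulation sketch comparing the three risks (the choice $C = \mathbb{I}$ and all names are illustrative assumptions); it also illustrates the $(1-q)$ factors stated on the next slide:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 200, 400
q = N / T
C = np.eye(N)                            # "true" correlation matrix (illustrative choice)
g = rng.normal(size=N)

X = rng.normal(size=(T, N))              # returns with correlation C = I
E = X.T @ X / T                          # empirical correlation matrix

def min_risk_weights(M, g, G=1.0):
    Minv_g = np.linalg.solve(M, g)
    return G * Minv_g / (g @ Minv_g)

w_E, w_C = min_risk_weights(E, g), min_risk_weights(C, g)
R2_in   = w_E @ E @ w_E                  # in-sample risk
R2_true = w_C @ C @ w_C                  # true minimal risk
R2_out  = w_E @ C @ w_E                  # out-of-sample risk
print(R2_in, R2_true, R2_out)
print(R2_true / (1 - q), R2_in / (1 - q) ** 2)   # RMT prediction for R2_out
```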
Risk of Optimized Portfolios

Let $E$ be a noisy, unbiased estimator of $C$. Using convexity arguments, and for large matrices:
$$\mathcal{R}^2_{\rm in} \leq \mathcal{R}^2_{\rm true} \leq \mathcal{R}^2_{\rm out}$$

In fact, using RMT:
$$\mathcal{R}^2_{\rm out} = \mathcal{R}^2_{\rm true}\, (1 - q)^{-1} = \mathcal{R}^2_{\rm in}\, (1 - q)^{-2},$$
independent of $C$! (For large $N$.)

If $C$ has some time dependence (beyond observation noise), one expects an even worse underestimation.

J.-P. Bouchaud
In Sample vs. Out of Sample

[Figure: risk vs. return scatter comparing Raw in-sample, Cleaned in-sample, Cleaned out-of-sample and Raw out-of-sample portfolios.]

J.-P. Bouchaud
Rotational invariance hypothesis (RIH)
In the absence of any cogent prior on the eigenvectors, one can assume that $C$ is a member of a Rotationally Invariant Ensemble (RIH).

Surely not true for the market mode $\vec{v}_1 \approx (1, 1, \dots, 1)/\sqrt{N}$, with $\lambda_1 \propto N$, but OK in the bulk (see below).

A more plausible assumption: a factor model, i.e. hierarchical, block-diagonal $C$s (Parisi matrices).

Cleaning $E$ within the RIH: keep the eigenvectors, play with the eigenvalues.

The simplest, classical scheme, shrinkage:
$$\lambda_{\widehat{C}} = (1 - \alpha)\, \lambda_E + \alpha, \qquad \widehat{C} = (1 - \alpha)\, E + \alpha\, \mathbb{I}, \quad \alpha \in [0, 1]$$

J.-P. Bouchaud
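A minimal sketch of the shrinkage recipe above ($\alpha$ is a free parameter here):

```python
import numpy as np

def linear_shrinkage(E, alpha):
    """C_hat = (1 - alpha) * E + alpha * I, with alpha in [0, 1]."""
    return (1.0 - alpha) * E + alpha * np.eye(E.shape[0])

# Example: C_hat = linear_shrinkage(E, alpha=0.3)
```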
RMT: from $\rho_C(\lambda)$ to $\rho_E(\lambda)$

Solution using different techniques (replicas, diagrams, free matrices) gives the resolvent $G_E(z) = N^{-1}\, \mathrm{Tr}\, (z\, \mathbb{I} - E)^{-1}$ as:
$$G_E(z) = \int d\lambda\, \rho_C(\lambda)\, \frac{1}{z - \lambda\, (1 - q + q z G_E(z))},$$
Note: one should work from $\rho_C \to G_E$.

Example 1: $C = \mathbb{I}$ (null hypothesis): Marcenko-Pastur [67]
$$\rho_E(\lambda) = \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{2\pi q \lambda}, \qquad \lambda \in [\lambda_-, \lambda_+] = [(1 - \sqrt{q})^2, (1 + \sqrt{q})^2]$$

Suggests a second cleaning scheme (eigenvalue clipping, [Laloux et al. 1997]): any eigenvalue beyond the Marcenko-Pastur edge can be trusted, the rest is noise.

J.-P. Bouchaud
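A sketch of the Marcenko-Pastur edges and density for $C = \mathbb{I}$, used by the clipping scheme on the next slide:

```python
import numpy as np

def mp_edges(q):
    """Edges of the Marcenko-Pastur bulk for C = I and q = N/T < 1."""
    return (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2

def mp_density(lam, q):
    """Marcenko-Pastur density rho_E(lambda) for C = I."""
    lm, lp = mp_edges(q)
    lam = np.asarray(lam, dtype=float)
    rho = np.zeros_like(lam)
    m = (lam > lm) & (lam < lp)
    rho[m] = np.sqrt((lp - lam[m]) * (lam[m] - lm)) / (2 * np.pi * q * lam[m])
    return rho

print(mp_edges(0.25))    # (0.25, 2.25)
```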
Eigenvalue clipping

All eigenvalues $\lambda < \lambda_+$ are replaced by a single value, chosen so as to preserve $\mathrm{Tr}\, \widehat{C} = N$.

J.-P. Bouchaud
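A minimal sketch of this clipping recipe (the Marcenko-Pastur edge for $C = \mathbb{I}$ is used as the threshold):

```python
import numpy as np

def eigenvalue_clipping(E, q):
    """Replace all eigenvalues of E below the Marcenko-Pastur edge lambda_+
    by a common value, chosen so that Tr C_hat = N."""
    lam, V = np.linalg.eigh(E)
    lam_plus = (1 + np.sqrt(q)) ** 2
    noisy = lam < lam_plus
    if noisy.any():
        lam[noisy] = (len(lam) - lam[~noisy].sum()) / noisy.sum()
    return V @ np.diag(lam) @ V.T

# Example: C_hat = eigenvalue_clipping(E, q=N/T)
```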
RMT: from C () to E ()
Solution using different techniques (replicas, diagrams, free matrices) gives the resolvent $G_E(z)$ as:
$$G_E(z) = \int d\lambda\, \rho_C(\lambda)\, \frac{1}{z - \lambda\, (1 - q + q z G_E(z))},$$
Note: one should work from $\rho_C \to G_E$.

Example 2: power-law spectrum (motivated by data)
$$\rho_C(\lambda) = \frac{A}{(\lambda - \lambda_{\min})^{1+\mu}} \qquad (\lambda \geq \lambda_0)$$

Suggests a third cleaning scheme (eigenvalue substitution, Potters et al. 2009, El Karoui 2010): the $k$-th eigenvalue of $E$ is replaced by the theoretical eigenvalue of $C$ with the same rank $k$.
J.-P. Bouchaud
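A sketch of this substitution scheme; how the ranked theoretical eigenvalues are generated from the fitted power-law $\rho_C$ is left out here:

```python
import numpy as np

def eigenvalue_substitution(E, theoretical_eigenvalues):
    """Keep the eigenvectors of E, but replace its k-th ranked eigenvalue
    by the k-th ranked theoretical eigenvalue of the fitted rho_C."""
    lam, V = np.linalg.eigh(E)                       # ascending order
    lam_th = np.sort(np.asarray(theoretical_eigenvalues, dtype=float))
    assert lam_th.shape == lam.shape
    return V @ np.diag(lam_th) @ V.T
```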
Empirical Correlation Matrix
[Figure: empirical eigenvalue density $\rho(\lambda)$ and fits; legend: Data, Dressed power law ($\mu = 2$), Raw power law ($\mu = 2$), Marcenko-Pastur; inset: eigenvalue vs. rank.]

MP and generalized MP fits of the spectrum

J.-P. Bouchaud
Eigenvalue cleaning
[Figure: out-of-sample risk $\mathcal{R}^2$ as a function of the cleaning parameter, for Classical Shrinkage, Ledoit-Wolf Shrinkage, Power Law Substitution and Eigenvalue Clipping.]

Out-of-sample risk for different one-parameter cleaning schemes

J.-P. Bouchaud
A RIH Bayesian approach
All the above schemes lack a rigorous framework and are at best ad-hoc recipes.

A Bayesian framework: suppose $C$ belongs to a RIE, with measure $P(C)$, and assume Gaussian returns. Then one needs:
$$\langle C \rangle_{|\{X_i^t\}} = \int \mathcal{D}C\; C\; P(C \,|\, \{X_i^t\})$$
with
$$P(C \,|\, \{X_i^t\}) = Z^{-1} \exp\left[-N\, \mathrm{Tr}\, V(C, \{X_i^t\})\right];$$
where (Bayes):
$$V(C, \{X_i^t\}) = \frac{1}{2q}\left[\log C + E\, C^{-1}\right] + V_0(C)$$

J.-P. Bouchaud
A Bayesian approach: a fully soluble case

$V_0(C) = (1 + b) \ln C + b\, C^{-1}$, $b > 0$: Inverse Wishart prior
$$\rho_C(\lambda) \propto \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{\lambda^2}; \qquad \lambda_\pm = \frac{1 + b \pm \sqrt{1 + 2b}}{b}$$

In this case, the matrix integral can be done, leading exactly to the shrinkage recipe, with $\alpha = f(b, q)$.

Note that $b$ can be determined from the empirical spectrum of $E$, using the generalized MP formula.

J.-P. Bouchaud
The general case: HCIZ integrals

A Coulomb gas approach: integrate over the orthogonal group, writing $C = O \Lambda O^T$ where $\Lambda$ is diagonal:
$$\int \mathcal{D}O\, \exp\left[-\frac{N}{2q}\, \mathrm{Tr}\left(\log \Lambda + E\, O \Lambda^{-1} O^T + 2q\, V_0(\Lambda)\right)\right]$$

Can one obtain a large-$N$ estimate of the HCIZ integral
$$F(\rho_A, \rho_B) = \lim_{N \to \infty} N^{-2} \ln \int \mathcal{D}O\, \exp\left[\frac{N}{2q}\, \mathrm{Tr}\, A\, O B O^T\right]$$
in terms of the spectra of $A$ and $B$?

J.-P. Bouchaud
The general case: HCIZ integrals

Can one obtain a large-$N$ estimate of the HCIZ integral
$$F(\rho_A, \rho_B) = \lim_{N \to \infty} N^{-2} \ln \int \mathcal{D}O\, \exp\left[\frac{N}{2q}\, \mathrm{Tr}\, A\, O B O^T\right]$$
in terms of the spectra of $A$ and $B$?

When $A$ (or $B$) is of finite rank, such a formula exists in terms of the R-transform of $B$ [Marinari, Parisi & Ritort, 1995].

When the ranks of $A$ and $B$ are of order $N$, there is a formula due to Matytsin [94] (in the unitary case), later shown rigorously by Zeitouni & Guionnet, but its derivation is quite obscure...

J.-P. Bouchaud
An instanton approach to large N HCIZ

Consider Dyson's Brownian motion matrices. The eigenvalues obey:
$$dx_i = \sqrt{\frac{2}{N}}\, dW_i + \frac{1}{N} \sum_{j \neq i} \frac{1}{x_i - x_j}\, dt,$$

Constrain $x_i(t = 0) = A_i$ and $x_i(t = 1) = B_i$. The probability of such a path is given by a large deviation/instanton formula, with:
$$\frac{d^2 x_i}{dt^2} = -\frac{2}{N^2} \sum_{j \neq i} \frac{1}{(x_i - x_j)^3}.$$

J.-P. Bouchaud
An instanton approach to large N HCIZ

Constrain $x_i(t = 0) = A_i$ and $x_i(t = 1) = B_i$. The probability of such a path is given by a large deviation/instanton formula, with:
$$\frac{d^2 x_i}{dt^2} = -\frac{2}{N^2} \sum_{j \neq i} \frac{1}{(x_i - x_j)^3}.$$

This can be interpreted as the motion of particles interacting through an attractive two-body potential $\phi(r) = -(N r)^{-2}$.

Using the virial formula, one finally gets Matytsin's equations:
$$\partial_t \rho + \partial_x[\rho v] = 0, \qquad \partial_t v + v\, \partial_x v = \pi^2 \rho\, \partial_x \rho.$$

J.-P. Bouchaud
An instanton approach to large N HCIZ
Finally, the action associated to these trajectories is:
$$S \approx \frac{1}{2}\int dx\, \rho \left[v^2 + \frac{\pi^2}{3}\rho^2\right] + \frac{1}{2}\left[\iint dx\, dy\, \rho_Z(x)\, \rho_Z(y) \ln|x - y|\right]_{Z=A}^{Z=B}$$

Now, the link with HCIZ comes from noticing that the propagator of the Brownian motion in matrix space is:
$$P(B|A) \propto \exp\left[-\frac{N}{2}\mathrm{Tr}(A - B)^2\right] = \exp\left[-\frac{N}{2}\left(\mathrm{Tr} A^2 + \mathrm{Tr} B^2 - 2\, \mathrm{Tr}\, A O B O^T\right)\right]$$

Disregarding the eigenvectors of $B$ (i.e. integrating over $O$) leads to another expression for $P(\{B_i\}|\{A_j\})$ in terms of HCIZ that can be compared to the one using instantons.

The final result for $F(\rho_A, \rho_B)$ is exactly Matytsin's expression, up to details (!)

J.-P. Bouchaud
Back to eigenvalue cleaning...

Estimating HCIZ at large $N$ is only the first step, but...

...one still needs to apply it to $B = C^{-1}$, $A = E$, and also to compute correlation functions such as
$$\langle O_{ij}^2 \rangle_{E C^{-1}}$$
with the HCIZ weight.

As we were working on this, we discovered the work of Ledoit-Péché that solves the problem exactly using tools from RMT...
J.-P. Bouchaud
The Ledoit-Péché magic formula

The Ledoit-Péché [2011] formula is a non-linear shrinkage, given by:
$$\hat{\lambda}_{\widehat{C}} = \frac{\lambda_E}{\left|1 - q + q\, \lambda_E \lim_{\epsilon \to 0} G_E(\lambda_E - i\epsilon)\right|^2}.$$

Note 1: Independent of $C$: only $G_E$ is needed (and it is observable)!

Note 2: When applied to the case where $C$ is inverse Wishart, this gives back the linear shrinkage.

Note 3: Still to be done: re-obtain these results using the HCIZ route (many interesting intermediate results to hope for!)

J.-P. Bouchaud
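A sketch of this non-linear shrinkage, estimating $G_E$ directly from the sample spectrum; the Stieltjes convention $G_E(z) = N^{-1}\mathrm{Tr}(z\,\mathbb{I} - E)^{-1}$ and the regularizer $\eta \sim N^{-1/2}$ are assumptions made for this illustration:

```python
import numpy as np

def ledoit_peche_clean(E, q, eta=None):
    """Keep the eigenvectors of E; replace each eigenvalue lambda by
    lambda / |1 - q + q * lambda * G_E(lambda - i*eta)|^2."""
    lam, V = np.linalg.eigh(E)
    N = len(lam)
    if eta is None:
        eta = 1.0 / np.sqrt(N)                       # small regularizer (heuristic choice)
    z = lam - 1j * eta
    G = np.mean(1.0 / (z[:, None] - lam[None, :]), axis=1)   # G_E(z) estimated from the spectrum
    xi = lam / np.abs(1.0 - q + q * lam * G) ** 2
    return V @ np.diag(xi) @ V.T

# Example: C_hat = ledoit_peche_clean(E, q=N/T)
```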
Eigenvalue cleaning: Ledoit-Péché

Fit of the empirical distribution with $V_0'(z) = a/z + b/z^2 + c/z^3$.

J.-P. Bouchaud
What about eigenvectors?
Up to now, most results using RMT focus on eigenvalues.

What about eigenvectors? What is a natural null-hypothesis beyond the RIH?

Are eigenvalues/eigen-directions stable in time?

Important source of risk for market/sector-neutral portfolios: a sudden/gradual rotation of the top eigenvectors!

...a little movie...

J.-P. Bouchaud
What about eigenvectors?
Correlation matrices need a certain time $T$ to be measured.

Even if the "true" $C$ is fixed, its empirical determination fluctuates:
$$E_t = C + \text{noise}$$

What is the dynamics of the empirical eigenvectors induced by measurement noise?

Can one detect a genuine evolution of these eigenvectors beyond noise effects?

J.-P. Bouchaud
What about eigenvectors?

More generally, can one say something about the eigenvectors of randomly perturbed matrices
$$H = H_0 + \epsilon H_1,$$
where $H_0$ is deterministic or random (e.g. GOE) and $H_1$ is random?

J.-P. Bouchaud
Eigenvectors exchange
An issue: upon pseudo-collisions of eigenvalues, the corresponding eigenvectors exchange.

Example: $2 \times 2$ matrices with
$$H_{11} = a, \quad H_{22} = a + \epsilon, \quad H_{21} = H_{12} = c,$$
$$\lambda_\pm = a + \frac{\epsilon}{2} \pm \sqrt{c^2 + \frac{\epsilon^2}{4}}$$

Let $c$ vary: quasi-crossing for $c \approx 0$, with an exchange of the top eigenvector: $(1, -1) \to (1, 1)$.

For large matrices, these exchanges are extremely numerous: a labelling problem.

J.-P. Bouchaud
Subspace stability
An idea: follow the subspace spanned by $P$ eigenvectors:
$$|\psi_{k+1}\rangle, |\psi_{k+2}\rangle, \dots, |\psi_{k+P}\rangle \;\longrightarrow\; |\psi'_{k+1}\rangle, |\psi'_{k+2}\rangle, \dots, |\psi'_{k+P}\rangle$$

Form the $P \times P$ matrix of scalar products:
$$G_{ij} = \langle \psi_{k+i} | \psi'_{k+j} \rangle$$

The determinant of this matrix is insensitive to label permutations and is a measure of the overlap between the two $P$-dimensional subspaces.

$D = -P^{-1} \ln |\det G|$ is a measure of how well the first subspace can be approximated by the second.

J.-P. Bouchaud
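A sketch of this overlap measure (illustrative names; eigenvectors are taken as the columns of the matrices returned by `numpy.linalg.eigh`):

```python
import numpy as np

def subspace_overlap_D(V_old, V_new, k, P):
    """D = -(1/P) ln|det G|, with G_ij = <psi_{k+i} | psi'_{k+j}>;
    the eigenvectors are the columns of V_old and V_new."""
    G = V_old[:, k:k + P].T @ V_new[:, k:k + P]    # P x P matrix of scalar products
    sign, logabsdet = np.linalg.slogdet(G)
    return -logabsdet / P

# D = 0 when the two subspaces coincide; D grows as they rotate apart.
```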
Intermezzo
Non-equal-time correlation matrices:
$$E_{ij}^\tau = \frac{1}{T} \sum_t \frac{X_i^t X_j^{t+\tau}}{\sigma_i \sigma_j}$$
$N \times N$ but not symmetrical: leader-lagger relations.

General rectangular correlation matrices:
$$G_{\alpha i} = \frac{1}{T} \sum_{t=1}^{T} Y_\alpha^t X_i^t$$
$N$ input factors $X$; $M$ output factors $Y$.

Example: $Y_\alpha^t = X_j^{t+\tau}$, $N = M$.

J.-P. Bouchaud
Intermezzo: Singular values
Singular values: square roots of the non-zero eigenvalues of $G G^T$ or $G^T G$, with associated eigenvectors $u_\alpha^k$ and $v_i^k$:
$$1 \geq s_1 > s_2 > \dots > s_{\min(M,N)} \geq 0$$

Interpretation: $k = 1$ gives the best linear combination of input variables, with weights $v_i^1$, to optimally predict the linear combination of output variables with weights $u_\alpha^1$, with a cross-correlation $= s_1$.

$s_1$: a measure of the predictive power of the set of $X$s with respect to the $Y$s.

Other singular values: orthogonal, less predictive, linear combinations.

J.-P. Bouchaud
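A sketch of the rectangular cross-correlation matrix and its singular values (standardized inputs and outputs; all names and test data are illustrative):

```python
import numpy as np

def cross_correlation_singular_values(X, Y):
    """X: T x N input factors, Y: T x M output factors.
    Returns the singular values of G_{alpha i} = (1/T) sum_t Y_alpha^t X_i^t."""
    T = X.shape[0]
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    G = Y.T @ X / T                              # M x N cross-correlation matrix
    return np.linalg.svd(G, compute_uv=False)    # s_1 >= s_2 >= ... >= 0

rng = np.random.default_rng(3)
T, N, M = 2000, 50, 30
s = cross_correlation_singular_values(rng.normal(size=(T, N)), rng.normal(size=(T, M)))
print(s[:5])
```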
Intermezzo: Benchmark
Null hypothesis: no correlations between the $X$s and the $Y$s:
$$G_{\rm true} \equiv 0$$

But arbitrary correlations among the $X$s ($C_X$) and among the $Y$s ($C_Y$) are possible.

Consider exact normalized principal components for the sample variables $X$s and $Y$s:
$$\hat{X}_i^t = \frac{1}{\sqrt{\lambda_i}} \sum_j U_{ij} X_j^t; \qquad \hat{Y}_\alpha^t = \dots$$
and define $\hat{G} = \frac{1}{T}\, \hat{Y} \hat{X}^T$.

J.-P. Bouchaud
Intermezzo: Random SVD
Final result ([Wachter] (1980); [Laloux, Miceli, Potters, JPB]):
$$\rho(s) = (m + n - 1)^+\, \delta(s - 1) + \frac{\sqrt{(s^2 - \gamma_-)(\gamma_+ - s^2)}}{\pi s (1 - s^2)}$$
with $n = N/T$, $m = M/T$ and
$$\gamma_\pm = n + m - 2mn \pm 2\sqrt{mn(1 - n)(1 - m)}, \qquad 0 \leq \gamma_\pm \leq 1$$

Analogue of the Marcenko-Pastur result for rectangular correlation matrices.

Many applications: finance, econometrics (large models), genomics, etc., and subspace stability!

J.-P. Bouchaud
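A sketch of the continuous part of this null-hypothesis density (the $1/\pi$ normalization and the names are assumptions of this illustration; the delta peak at $s = 1$ is omitted):

```python
import numpy as np

def random_svd_density(s, n, m):
    """Null-hypothesis density of singular values (continuous part only),
    for n = N/T and m = M/T."""
    root = 2 * np.sqrt(m * n * (1 - n) * (1 - m))
    g_minus, g_plus = n + m - 2 * m * n - root, n + m - 2 * m * n + root
    s = np.asarray(s, dtype=float)
    rho = np.zeros_like(s)
    inside = (s ** 2 > g_minus) & (s ** 2 < g_plus)
    rho[inside] = (np.sqrt((s[inside] ** 2 - g_minus) * (g_plus - s[inside] ** 2))
                   / (np.pi * s[inside] * (1 - s[inside] ** 2)))
    return rho

# The bulk edge is sqrt(gamma_+); singular values above it signal genuine cross-correlations.
```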
Back to eigenvectors
Extend the target subspace to avoid edge effects:
$$|\psi_{k+1}\rangle, |\psi_{k+2}\rangle, \dots, |\psi_{k+P}\rangle \;\longrightarrow\; |\psi'_{k-Q+1}\rangle, |\psi'_{k-Q+2}\rangle, \dots, |\psi'_{k+Q}\rangle$$

Form the $P \times Q$ matrix of scalar products:
$$G_{ij} = \langle \psi_{k+i} | \psi'_{k+j} \rangle$$

The singular values of $G$ indicate how well the $Q$ perturbed vectors approximate the initial ones:
$$D = -\frac{1}{P} \sum_i \ln s_i$$

J.-P. Bouchaud
Null hypothesis
Note: if $P$ and $Q$ are large, $D$ can be accidentally small.

One can compute $D$ exactly in the limit $P, Q, N \to \infty$, with fixed $p = P/N$, $q = Q/N$:

Final result (same problem as above!):
$$D = -\int_0^1 ds\, \ln s\; \rho(s)$$
with:
$$\rho(s) = \frac{\sqrt{(s^2 - \gamma_-)(\gamma_+ - s^2)}}{\pi s (1 - s^2)}$$
and
$$\gamma_\pm = p + q - 2pq \pm 2\sqrt{pq(1 - p)(1 - q)}, \qquad 0 \leq \gamma_\pm \leq 1$$

J.-P. Bouchaud
Back to eigenvectors: perturbation theory

Consider a randomly perturbed matrix:
$$H = H_0 + \epsilon H_1$$

Perturbation theory to second order in $\epsilon$ yields:
$$D \approx \frac{\epsilon^2}{2P} \sum_{i \in \{k+1, \dots, k+P\}} \;\; \sum_{j \notin \{k-Q+1, \dots, k+Q\}} \left(\frac{\langle i | H_1 | j \rangle}{\lambda_i - \lambda_j}\right)^2.$$

The full distribution of $s$ can again be computed exactly (in some limits) using free random matrix tools.

J.-P. Bouchaud
GOE: the full SV spectrum

Initial eigenspace: spanned by $[a, b] \subset [-2, 2]$, $b - a = \dots$

Target eigenspace: spanned by $[a - \dots, b + \dots] \subset [-2, 2]$

Two cases (set $\tilde{s} = \dots$):

Weak fluctuations: $\rho(\tilde{s})$ is a semi-circle centered around 1, of width $\dots$

Strong fluctuations: $\rho(\tilde{s}) \propto \tilde{s}^{2}$, with $\tilde{s}_{\min}/\tilde{s}_{\max} \to 1$ and $\tilde{s}_{\max} \ll 1$.

J.-P. Bouchaud
The case of correlation matrices
Consider the empirical correlation matrix:
$$E = C + \eta, \qquad \eta = \frac{1}{T} \sum_{t=1}^{T} \left(X^t X^{t\,T} - C\right)$$

The noise is correlated as:
$$\left\langle \eta_{ij}\, \eta_{kl} \right\rangle = \frac{1}{T}\left(C_{ik} C_{jl} + C_{il} C_{jk}\right)$$

from which one derives:
$$D \approx \frac{1}{2 T P} \sum_{i=1}^{P} \sum_{j=Q+1}^{N} \frac{\lambda_i \lambda_j}{(\lambda_i - \lambda_j)^2}.$$

(and a similar equation for eigenvalues)

J.-P. Bouchaud
Stability of eigenvalues: Correlations

Eigenvalues clearly change: well-known correlation crises

J.-P. Bouchaud
Stability of eigenspaces: Correlations

$D(\tau)$ for a given $T$, $P = 5$, $Q = 10$

J.-P. Bouchaud
Stability of eigenspaces: Correlations

$D(\tau = T)$ for $P = 5$, $Q = 10$

J.-P. Bouchaud
Conclusion
Many RMT tools are available to understand the eigenvalue spectrum and to suggest cleaning schemes.

The understanding of eigenvectors is comparatively poorer.

The dynamics of the top eigenvector (aka the market mode) is relatively well understood.

A plausible, realistic model for the "true" evolution of $C$ is still lacking (many crazy attempts: multivariate GARCH, BEKK, etc., but second-generation models are on their way).

J.-P. Bouchaud
Bibliography

J.-P. Bouchaud, M. Potters, Financial Applications of Random Matrix Theory: a short review, in The Oxford Handbook of Random Matrix Theory (2011)

R. Allez and J.-P. Bouchaud, Eigenvector dynamics: general theory and some applications, arXiv:1108.4258

P.-A. Reigneron, R. Allez and J.-P. Bouchaud, Principal regression analysis and the index leverage effect, Physica A 390 (2011) 3026-3035.

J.-P. Bouchaud
