Deterministic Matrices
Matrices appear in all corners of science, from mathematics to physics, computer science,
biology, economics and quantitative finance. In fact, before Schrödinger’s equation, quan-
tum mechanics was formulated by Heisenberg in terms of what he called “Matrix Mechan-
ics”. In many cases, the matrices that appear are deterministic, and their properties are
encapsulated in their eigenvalues and eigenvectors. This first chapter gives several elemen-
tary results in linear algebra, in particular concerning eigenvalues. These results will be
extremely useful in the rest of the book where we will deal with random matrices, and in
particular the statistical properties of their eigenvalues and eigenvectors.
Dynamical System
Consider a generic dynamical system describing the time evolution of a certain
N-dimensional vector x(t), for example the three-dimensional position of a point in
space. Let us write the equation of motion as
dx/dt = F(x),   (1.1)
where F(x) is an arbitrary vector field. Equilibrium points x∗ are such that F(x∗ ) = 0.
Consider now small deviations from equilibrium, i.e. x = x∗ + εy where ε ≪ 1. To first
order in ε, the dynamics becomes linear, and given by
dy/dt = Ay,   (1.2)
where A is a matrix whose elements are given by Aij = ∂j Fi(x∗), where i, j are indices
that run from 1 to N. When F can itself be written as the gradient of some potential V, i.e.
Fi = −∂i V(x), the matrix A becomes symmetric, i.e. Aij = Aji = −∂ij V. But this is not
always the case; in general the linearized dynamics is described by a matrix A without any
particular property – except that it is a square N × N array of real numbers.
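To make the linearization concrete, here is a minimal numerical sketch (in Python with NumPy, a choice of ours rather than of the text) that builds the matrix Aij = ∂j Fi(x∗) by finite differences for a toy gradient field F = −∇V; the specific quadratic potential is a hypothetical example. As stated above, the resulting A comes out symmetric.

```python
import numpy as np

# Hypothetical quadratic potential V(x) = 0.5 * x^T H x, so that F = -grad V = -H x
H = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 3.0]])

def F(x):
    """Vector field of the gradient flow dx/dt = F(x); equilibrium at x* = 0."""
    return -H @ x

x_star = np.zeros(3)              # F(x*) = 0
eps = 1e-6

# A_ij = dF_i/dx_j at x*, estimated by centered finite differences
A = np.zeros((3, 3))
for j in range(3):
    dx = np.zeros(3)
    dx[j] = eps
    A[:, j] = (F(x_star + dx) - F(x_star - dx)) / (2 * eps)

print(np.allclose(A, A.T))        # True: A is symmetric because F is a gradient
print(np.allclose(A, -H))         # True: here A_ij = -d_i d_j V = -H_ij
```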
Master Equation
Another standard setting is the so-called Master equation for the evolution of probabilities.
Call i = 1, . . . ,N the different possible states of a system and Pi (t) the probability to
find the system in state i at time t. When memory effects can be neglected, the dynamics
is called Markovian and the evolution of Pi (t) is described by the following discrete time
equation:
Pi(t + 1) = Σ_{j=1}^{N} Aij Pj(t),   (1.3)
meaning that the system has a probability Aij to jump from state j to state i between t and
t +1. Note that all elements of A are positive; furthermore, since all jump possibilities must
be exhausted, one must have, for each j, Σ_i Aij = 1. This ensures that Σ_i Pi(t) = 1 at
all times, since
Σ_{i=1}^{N} Pi(t + 1) = Σ_{i=1}^{N} Σ_{j=1}^{N} Aij Pj(t) = Σ_{j=1}^{N} (Σ_{i=1}^{N} Aij) Pj(t) = Σ_{j=1}^{N} Pj(t) = 1.   (1.4)
Matrices such that all elements are positive and such that the elements of each column
sum to unity are called stochastic matrices. In matrix form, Eq. (1.3) reads
P(t + 1) = AP(t), leading to P(t) = At P(0), i.e. A raised to the t-th power applied to the
initial distribution.
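As an illustration of Eqs. (1.3) and (1.4), the following sketch (Python/NumPy, our own illustration on a randomly generated example) builds a column-stochastic matrix and checks that iterating the Master equation conserves total probability.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5

# Column-stochastic matrix: positive entries, each column summing to 1
A = rng.random((N, N))
A /= A.sum(axis=0, keepdims=True)      # enforce sum_i A_ij = 1 for each j

P = np.ones(N) / N                     # initial distribution
for t in range(50):
    P = A @ P                          # P(t+1) = A P(t), Eq. (1.3)
    assert np.isclose(P.sum(), 1.0)    # total probability is conserved, Eq. (1.4)

print(P)                               # approaches the stationary distribution
```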
Covariance Matrices
As a third important example, let us consider random, N -dimensional real vectors X, with
some given multivariate distribution P(X). The covariance matrix C of the X’s is defined as
Cij := E[Xi Xj] − E[Xi] E[Xj],   (1.5)
where E means that we are averaging over the distribution P(X). Clearly, the matrix C is
real and symmetric. It is also positive semi-definite, in the sense that for any vector x,
xT Cx ≥ 0. (1.6)
If it were not the case, it would be possible to find a linear combination of the vectors X
with a negative variance, which is obviously impossible.
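A short numerical check of these two properties (Python/NumPy, our own illustration; the data-generating choice is arbitrary): the sample covariance of any set of vectors is symmetric and positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 10_000

# T samples of an N-dimensional vector X with some arbitrary correlations
X = rng.standard_normal((T, N)) @ rng.standard_normal((N, N))

C = np.cov(X, rowvar=False)            # sample estimate of C_ij = E[X_i X_j] - E[X_i]E[X_j]

print(np.allclose(C, C.T))                      # real symmetric
print(np.linalg.eigvalsh(C).min() >= -1e-12)    # eigenvalues >= 0: positive semi-definite
x = rng.standard_normal(N)
print(x @ C @ x >= 0)                           # x^T C x >= 0 for any x, Eq. (1.6)
```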
The three examples above are all such that the corresponding matrices are N × N square
matrices. Examples where matrices are rectangular also abound. For example, one could
consider two sets of random real vectors: X of dimension N1 and Y of dimension N2 . The
cross-covariance matrix defined as
or Avj = λj vj. If the eigenvectors are linearly independent (which is not true for all
matrices), the matrix inverse V^{-1} exists and one can therefore write A as
A = V Λ V^{-1},   (1.11)
which is called the eigenvalue decomposition of the matrix A.
Symmetric matrices (such that A = AT ) have very nice properties regarding their
eigenvalues and eigenvectors.
1 The characteristic polynomial QN(λ) = det(λ1 − A) always has a coefficient 1 in front of its highest power (QN(λ) = λ^N + O(λ^{N−1})); such polynomials are called monic.
A = O Λ O^T,   (1.12)
where Λ is a diagonal matrix containing the eigenvalues associated with the eigenvectors
in the columns of O. A symmetric matrix can be diagonalized by an orthogonal matrix.
Remark that an N × N orthogonal matrix is fully parameterized by N(N − 1)/2 “angles”,
whereas Λ contains N diagonal elements. So the total number of parameters of the diagonal
decomposition is N(N − 1)/2 + N, which is identical, as it should be, to the number of
different elements of a symmetric N × N matrix.
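The decomposition (1.12) can be checked numerically; the sketch below (Python/NumPy, our own illustration on a random symmetric matrix) diagonalizes a symmetric matrix with an orthogonal matrix and recalls the parameter count discussed above.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4

# A generic symmetric matrix
A = rng.standard_normal((N, N))
A = (A + A.T) / 2

lam, O = np.linalg.eigh(A)        # eigenvalues (Lambda) and orthonormal eigenvectors (columns of O)

print(np.allclose(O @ O.T, np.eye(N)))           # O is orthogonal
print(np.allclose(O @ np.diag(lam) @ O.T, A))    # A = O Lambda O^T, Eq. (1.12)
# Parameter count: N(N-1)/2 angles in O plus N eigenvalues = N(N+1)/2,
# the number of independent entries of a symmetric N x N matrix.
```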
Let us come back to our dynamical system example, Eq. (1.2). One basic question is to
know whether the perturbation y will grow with time, or decay with time. The answer to
this question is readily given by the eigenvalues of A. For simplicity, we assume F to be
a gradient such that A is symmetric. Since the eigenvectors of A are orthonormal, one can
decompose y in terms of the v’s as
y(t) = Σ_{i=1}^{N} ci(t) vi.   (1.13)
Taking the dot product of Eq. (1.2) with vi then shows that the dynamics of the coefficients
ci (t) are decoupled and given by
dci/dt = λi ci,   (1.14)
where λi is the eigenvalue associated with vi . Therefore, any component of the initial
perturbation y(t = 0) that is along an eigenvector with positive eigenvalue will grow expo-
nentially with time, until the linearized approximation leading to Eq. (1.2) breaks down.
Conversely, components along directions with negative eigenvalues decrease exponentially
with time. An equilibrium x∗ is called stable provided all eigenvalues are negative, and
marginally stable if some eigenvalues are zero while all others are negative.
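The decoupling of Eq. (1.14) is easy to verify numerically. The sketch below (Python with NumPy and SciPy, our own illustration on a random symmetric A) evolves each mode as ci(0) e^{λi t} and compares the result with the direct matrix-exponential solution of Eq. (1.2).

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
N, t = 3, 0.7

A = rng.standard_normal((N, N))
A = (A + A.T) / 2                      # symmetric linearized dynamics
lam, V = np.linalg.eigh(A)             # eigenvalues lambda_i, eigenvectors v_i (columns of V)

y0 = rng.standard_normal(N)
c0 = V.T @ y0                          # c_i(0) = v_i . y(0), Eq. (1.13)

# Each mode evolves independently: c_i(t) = c_i(0) exp(lambda_i t), Eq. (1.14)
y_modes = V @ (c0 * np.exp(lam * t))

# Direct solution of dy/dt = A y for comparison
y_direct = expm(A * t) @ y0

print(np.allclose(y_modes, y_direct))  # True: the modes decouple
```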
The important message carried by the example above is that diagonalizing a matrix
amounts to finding a way to decouple the different degrees of freedom, and convert a matrix
equation into a set of N scalar equations, as in Eq. (1.14). We will see later that the same
idea holds for covariance matrices as well: their diagonalization allows one to find a set of
uncorrelated vectors. This is usually called Principal Component Analysis (PCA).
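As a foretaste of PCA, the following sketch (Python/NumPy, our own illustration with an arbitrary correlation structure) shows that projecting data onto the eigenvectors of its covariance matrix yields uncorrelated components.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 3, 10_000

# Correlated data: X = z L^T with z i.i.d. standard normal (hypothetical example)
L = rng.standard_normal((N, N))
X = rng.standard_normal((T, N)) @ L.T

C = np.cov(X, rowvar=False)
lam, O = np.linalg.eigh(C)             # principal axes = eigenvectors of C

Y = X @ O                              # project the data on the principal components
print(np.round(np.cov(Y, rowvar=False), 6))   # diagonal: the components are uncorrelated
```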
A = V S U^T,   (1.16)
where S is a non-negative diagonal matrix, whose elements are called the singular values
of A, and U, V are two real, orthogonal matrices. Whenever A is symmetric positive semi-
definite, one has S = Λ and U = V.
Equation (1.16) also holds for rectangular N ×T matrices, where V is N ×N orthogonal,
U is T × T orthogonal and S is N × T diagonal as defined below. To construct the
singular value decomposition (svd) of A, we first introduce two matrices B and B̃, defined
as B := AA^T and B̃ := A^T A. It is plain to see that these matrices are symmetric, since
B^T = (AA^T)^T = (A^T)^T A^T = AA^T = B (and similarly for B̃). They are also positive semi-definite as
for any vector x we have x^T B x = ||A^T x||^2 ≥ 0.
We can show that B and B̃ have the same non-zero eigenvalues. In fact, let λ > 0 be an
eigenvalue of B and let v ≠ 0 be a corresponding eigenvector. Then we have, by definition,
AA^T v = λv.   (1.17)
Let u = A^T v; then we get from the above equation that
A^T AA^T v = λ A^T v  ⇒  B̃ u = λu.   (1.18)
Moreover,
||u||^2 = v^T AA^T v = v^T B v ≠ 0  ⇒  u ≠ 0.   (1.19)
Hence λ is also an eigenvalue of B̃. Note that for a degenerate eigenvalue λ of B, an
orthogonal set of corresponding eigenvectors {v} gives rise to an orthogonal set {A^T v}
of eigenvectors of B̃. Hence the multiplicity of λ in B̃ is at least that in B. Similarly, we can
show that any non-zero eigenvalue of B̃ is also an eigenvalue of B. This finishes the proof
of the claim.
Note that B has at most N non-zero eigenvalues and B̃ has at most T non-zero eigen-
values. Thus, by the above claim, if T > N, B̃ has at least T − N zero eigenvalues, and if
T < N, B has at least N − T zero eigenvalues. We denote the other min{N,T} eigenvalues
of B and B̃ by {λk}_{1≤k≤min{N,T}}. Then the svd of A is expressed as Eq. (1.16), where V is
the N × N orthogonal matrix consisting of the N normalized eigenvectors of B, U is the
T × T orthogonal matrix consisting of the T normalized eigenvectors of B̃, and S is an
N × T rectangular diagonal matrix with Skk = √λk ≥ 0, 1 ≤ k ≤ min{N,T}, and all other
entries equal to zero.
For instance, if N < T, we have
S = ⎛ √λ1   0    0    0   ···  0 ⎞
    ⎜  0   √λ2   0    0   ···  0 ⎟
    ⎜  0    0    ⋱    0   ···  0 ⎟ .   (1.20)
    ⎝  0    0    0   √λN  ···  0 ⎠
Although (non-degenerate) normalized eigenvectors are unique up to a sign, the choice of
the positive sign for the square-root √λk imposes a condition on the combined sign for the
left and right singular vectors vk and uk. In other words, simultaneously changing both vk
and uk to −vk and −uk leaves the matrix A invariant, but for non-zero singular values one
cannot individually change the sign of either vk or uk.
The recipe to find the svd, Eq. (1.16), is thus to diagonalize both AA^T (to obtain V
and S^2) and A^T A (to obtain U and again S^2). It is insightful to again count the number of
parameters involved in this decomposition. Consider a general N × T matrix with T ≥ N
(the case N ≥ T follows similarly). The N eigenvectors of AA^T are generically unique
up to a sign, while for T − N > 0 the matrix A^T A will have a degenerate eigenspace
associated with the eigenvalue 0 of size T − N, hence its eigenvectors are only unique up
to rotations within this degenerate eigenspace.
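This recipe can be checked numerically. The sketch below (Python/NumPy, our own illustration on a random rectangular matrix) verifies that B = AA^T and B̃ = A^T A share their non-zero eigenvalues, that their square roots are the singular values of A, and that the sign convention uk = A^T vk / √λk reconstructs A.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 3, 5                      # rectangular case N < T

A = rng.standard_normal((N, T))
B  = A @ A.T                     # N x N
Bt = A.T @ A                     # T x T  (B-tilde)

lam_B  = np.sort(np.linalg.eigvalsh(B))[::-1]
lam_Bt = np.sort(np.linalg.eigvalsh(Bt))[::-1]

# Non-zero eigenvalues of B and B-tilde coincide; the extra T - N ones vanish
print(np.allclose(lam_B, lam_Bt[:N]))
print(np.allclose(lam_Bt[N:], 0.0))

# Their square roots are the singular values of A
print(np.allclose(np.sqrt(lam_B), np.linalg.svd(A, compute_uv=False)))

# Right singular vectors with the sign convention u_k = A^T v_k / sqrt(lambda_k)
lam, V = np.linalg.eigh(B)                 # ascending order
lam, V = lam[::-1], V[:, ::-1]             # reorder descending
U = A.T @ V / np.sqrt(lam)                 # T x N, columns are the u_k
S = np.diag(np.sqrt(lam))
print(np.allclose(V @ S @ U.T, A))         # reconstructs A (thin svd)
```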
[Figure 1.1 (plot of Im(λ) versus Re(λ)): The three complex eigenvalues of the matrix (1.22) (crosses) and its three Gershgorin circles. The first eigenvalue λ1 ≈ 0.92 falls in the first circle while the other two λ2,3 ≈ 2.54 ± 0.18i fall in the third one.]
Application: Suppose A is a stochastic matrix, such that all its elements are positive
and satisfy Σ_i Aij = 1, ∀j. Then clearly the vector 1 is an eigenvector of A^T, with
eigenvalue λ = 1. But since the Perron–Frobenius theorem can be applied to A^T, the inequalities
(1.24) ensure that λ = 1 is the top eigenvalue of A^T, and thus also of A. All the elements of the
corresponding eigenvector v∗ are positive, and describe the stationary state of the associated
Master equation, i.e.
Pi∗ = Σ_j Aij Pj∗  −→  Pi∗ = vi∗ / Σ_k vk∗.   (1.25)
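Numerically, the stationary distribution of Eq. (1.25) can be obtained from the top eigenvector of A; the sketch below (Python/NumPy, our own illustration) does this for a random column-stochastic matrix.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 5

A = rng.random((N, N))
A /= A.sum(axis=0, keepdims=True)       # column-stochastic: sum_i A_ij = 1

# The top eigenvalue of A is 1; its eigenvector gives the stationary distribution
w, V = np.linalg.eig(A)
k = np.argmax(w.real)                   # locate the eigenvalue closest to 1
v = np.abs(V[:, k].real)                # Perron-Frobenius: can be taken positive
P_star = v / v.sum()                    # normalization as in Eq. (1.25)

print(np.isclose(w[k].real, 1.0))       # top eigenvalue is indeed 1
print(np.allclose(A @ P_star, P_star))  # stationary: P* = A P*
```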
for invertible A.
2 See: P. Denton, S. Parke, T. Tao, X. Zhang, Eigenvalues from Eigenvectors: a survey of a basic identity in linear algebra,
arXiv:1908.03795.
Q11^{-1} = M11 − M12 (M22)^{-1} M21,   (1.32)
where the right hand side is called the Schur complement of the block M22 of the matrix M.
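Assuming, as the notation suggests, that Q denotes the inverse of the block matrix M (its definition lies in text not reproduced here), the identity (1.32) can be checked directly; the sketch below is our own illustration in Python/NumPy on a random block matrix.

```python
import numpy as np

rng = np.random.default_rng(7)
n1, n2 = 2, 3
M = rng.standard_normal((n1 + n2, n1 + n2))

# Split M into blocks M11 (n1 x n1), M12, M21, M22 (n2 x n2)
M11, M12 = M[:n1, :n1], M[:n1, n1:]
M21, M22 = M[n1:, :n1], M[n1:, n1:]

Q = np.linalg.inv(M)                            # assumed definition: Q = M^{-1}
Q11 = Q[:n1, :n1]

schur = M11 - M12 @ np.linalg.inv(M22) @ M21    # Schur complement of the block M22
print(np.allclose(np.linalg.inv(Q11), schur))   # Eq. (1.32)
```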
where F(Λ) is the diagonal matrix obtained by applying the function F to each (diag-
onal) entry of Λ. The function F(M) is now a matrix-valued function of a matrix. Scalar
polynomial functions can obviously be extended directly as
F(x) = Σ_{k=0}^{K} ak x^k  ⇒  F(M) = Σ_{k=0}^{K} ak M^k.   (1.35)
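For a symmetric matrix, the polynomial definition (1.35) agrees with applying F to the eigenvalues in the eigenbasis of M, i.e. with O F(Λ) O^T; the sketch below (Python/NumPy, our own illustration with the hypothetical polynomial F(x) = 3 + 2x + x^2) checks this.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 4
M = rng.standard_normal((N, N))
M = (M + M.T) / 2                                      # symmetric M

a = [3.0, 2.0, 1.0]                                    # F(x) = 3 + 2x + x^2

# Polynomial definition, Eq. (1.35): F(M) = sum_k a_k M^k
F_poly = sum(ak * np.linalg.matrix_power(M, k) for k, ak in enumerate(a))

# Spectral definition: apply F to the eigenvalues in the eigenbasis of M
lam, O = np.linalg.eigh(M)
F_spec = O @ np.diag(sum(ak * lam**k for k, ak in enumerate(a))) @ O.T

print(np.allclose(F_poly, F_spec))                     # the two definitions agree
```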
We can take the derivative of a scalar-valued function Tr F(M) with respect to each
element of the matrix M:
d Tr(F(M)) / d[M]ij = [F′(M)]ij  ⇒  d Tr(F(M)) / dM = F′(M).   (1.37)
Equation (1.37) is easy to derive when F(x) is a monomial ak x^k, and extends by linearity to
polynomial or Taylor-series F(x).
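A finite-difference check of Eq. (1.37) for the example F(x) = x^3 (Python/NumPy, our own illustration; the symmetric test matrix and the tolerance, which absorbs the finite-difference error, are both choices of ours):

```python
import numpy as np

rng = np.random.default_rng(9)
N, eps = 4, 1e-6
M = rng.standard_normal((N, N))
M = (M + M.T) / 2                      # symmetric test matrix

def F(X):  return X @ X @ X            # F(x) = x^3 applied to a matrix
def Fp(X): return 3 * X @ X            # F'(x) = 3 x^2

# Finite-difference gradient of Tr F(M) with respect to each entry M_ij
grad = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        E = np.zeros((N, N))
        E[i, j] = eps
        grad[i, j] = (np.trace(F(M + E)) - np.trace(F(M - E))) / (2 * eps)

# For symmetric M this matches [F'(M)]_ij entry by entry, as in Eq. (1.37)
print(np.allclose(grad, Fp(M), atol=1e-4))
```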