Eigenvalues, Eigenvectors (CDT-28) : April 2020
Luciano da F. Costa
University of São Paulo
Abstract
The concept and properties of eigenvalues and eigenvectors are presented in a concise and introductory manner. The
importance of eigenvalues and eigenvectors in several areas is also briefly illustrated with respect to characterization of
scalar field extrema, dynamical systems, Markov chains, and multivariate statistics.
2 Some Basic Concepts

Given a matrix A, which we will consider as being associated to the linear transformation it implements, we can define one of its eigenvalues λ and a respective eigenvector ~v as obeying:

A ~v = λ ~v

When A has N linearly independent eigenvectors, a matrix P can be found such that P⁻¹AP is diagonal, with the eigenvalues of A along the diagonal of the resulting matrix.

An eigenvalue λi can also have a respective geometric multiplicity, γA(λi), which is given as:

γA(λi) = N − rank(A − λi I)     (11)
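As a quick illustration in R (the matrix below is an arbitrary example, not taken from the text), the eigenvalues and the geometric multiplicity of Equation 11 can be obtained as:

A <- matrix(c(2, 1,
              0, 2), nrow = 2, byrow = TRUE)   # example with a repeated eigenvalue
e <- eigen(A)
e$values                                       # 2 appears twice (algebraic multiplicity 2)
lambda <- 2
N <- nrow(A)
N - qr(A - lambda * diag(N))$rank              # geometric multiplicity = 1, as in Equation 11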
The above result allows us to obtain all the N eigenvalues (in case they exist) given matrix A and its respective eigenvectors.

Observe that a matrix A may not have an inverse. In this case, its singular value decomposition can be used for the respective diagonalization (e.g. [1]), being applicable even to non-square matrices.

Now, if we right-multiply both sides of Equation 10 by V⁻¹, and assuming a set of eigenvalues given as Λ, we have:

A = V Λ V⁻¹,     (17)

which provides a method for designing a matrix A with pre-specified eigenvalues and eigenvectors.

Given a column vector ~v with at least 2 components, it is easy to obtain a matrix A of which it is an eigenvector. This can be done as:

A = ~v ~v^T     (18)

If ~p is another vector orthogonal to ~v with the same dimension, a matrix A having ~v and ~p as eigenvectors, but a priori unspecified eigenvalues, can now be obtained as:

A = ~v ~v^T + ~p ~p^T     (19)

Up to N vectors N × 1, each one orthogonal to all the others, can be combined in this manner so as to obtain an N × N matrix A having them as eigenvectors.
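A minimal sketch in R of the construction in Equation 19 (the vectors ~v and ~p below are arbitrary illustrative choices, not taken from the text):

v <- c(1, 2, 2)                      # example vector
p <- c(2, 1, -2)                     # orthogonal to v: sum(v * p) = 0
A <- outer(v, v) + outer(p, p)       # Equation 19
A %*% v / v                          # constant ratio 9 = ||v||^2, so v is an eigenvector
A %*% p / p                          # constant ratio 9 = ||p||^2, so p is an eigenvector
eigen(A)$values                      # 9, 9 and 0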
4 Gershgorin Discs

Each Gershgorin disc of an N × N matrix A is centered at a diagonal entry ai,i and has radius equal to the sum of the absolute values of the other entries in row i; the Gershgorin theorem states that every eigenvalue of A lies within at least one of these discs.

The following R algorithm can be used to visualize the Gershgorin discs of the N × N input matrix A. Observe that Nth is the angular resolution for plotting the discs.

Algorithm 1: Gershgorin(A)

Gershgorin <- function(A) {
  N   <- nrow(A)
  Nth <- 300                                # angular resolution of each disc
  th  <- seq(0, 2 * pi, length.out = Nth)
  a   <- 7                                  # half-width of the plotting window
  for (i in seq(1, N)) {
    R  <- sum(abs(A[i, ])) - abs(A[i, i])   # radius: off-diagonal magnitudes of row i
    Gx <- R * cos(th) + A[i, i]             # disc centered at the diagonal entry
    Gy <- R * sin(th)
    plot(Gx, Gy, xlim = c(-a, a), ylim = c(-a, a), type = "l")
    par(new = TRUE)                         # overlay the next disc on the same axes
  }
}

Figure 2 illustrates the Gershgorin discs (in salmon), as well as the actual three eigenvalues (green triangles, calculated by using an eigenvalue/eigenvector library), of the following matrix:

A = [ −1   1    0
      −2   2   −1
       1   2    0 ]     (21)
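A usage sketch for the matrix of Equation 21 (assuming the Gershgorin function listed in Algorithm 1 above), comparing the discs with the eigenvalues returned by R's eigen():

A <- matrix(c(-1,  1,  0,
              -2,  2, -1,
               1,  2,  0), nrow = 3, byrow = TRUE)
Gershgorin(A)        # draws the three discs
eigen(A)$values      # each eigenvalue falls inside at least one disc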
The more spread out the diagonal entries of A are, preserving all the other entries, the higher also will be the dispersion of the discs along the real axis.

In addition, the larger the absolute values along the respective rows of A, the higher the uncertainty, as inferred by this approach, in bounding the eigenvalues. However, we should not take larger bounding regions to necessarily imply larger eigenvalues.

An interesting situation arises when we have a row i in which only the element corresponding to the diagonal, ai,i, is non-zero. In this case, one of the eigenvalues will necessarily be equal to ai,i.
5 Extrema of Multivariate Functions

In our first application example, we will briefly address the classification of the extrema of a scalar field ψ(x1, x2, . . . , xN) defined on ℝ^N.

First, we identify the points yielding a null gradient, i.e. ||∇ψ(x1, x2, . . . , xN)|| = 0. These so-called critical points are candidates for being extrema of ψ. However, additional testing is required, considering the eigenvalues of the Hessian matrix of ψ, namely:

Hψ(~p) = [ ∂²ψ/∂x²    ∂²ψ/∂x∂y   ∂²ψ/∂x∂z
           ∂²ψ/∂y∂x   ∂²ψ/∂y²    ∂²ψ/∂y∂z
           ∂²ψ/∂z∂x   ∂²ψ/∂z∂y   ∂²ψ/∂z²  ]     (22)

The following criteria can then be used while trying to classify the types of extrema of a scalar field:

1. Positive-definite Hessian: All eigenvalues of H(x̃, ỹ, z̃) are positive (i.e. the Hessian is positive definite) =⇒ (x̃, ỹ, z̃) is a local minimum point;

2. Negative-definite Hessian: All eigenvalues of H(x̃, ỹ, z̃) are negative (i.e. the Hessian is negative definite) =⇒ (x̃, ỹ, z̃) is a local maximum point;

3. Indefinite Hessian: The eigenvalues of H(x̃, ỹ, z̃) are a mixture of positive and negative values =⇒ (x̃, ỹ, z̃) is a saddle point;

4. Otherwise: One or more null eigenvalues =⇒ additional analysis is needed.

Observe the importance of the eigenvalues of H(ψ) in identifying the types of extrema of a scalar field.

Let's illustrate the identification of the extrema of the scalar field ψ = x² − y². Its gradient is ∇ψ = 2x î − 2y ĵ, which has null magnitude only when x = y = 0, so that we have X̃ = (0, 0) as the sole critical point of ψ. The Hessian of ψ is given as:

H(X̃) = [ 2   0
          0  −2 ]     (23)

We have, from property [P4] in Table 1, that this H(X̃) has eigenvalues λ1 = 2 and λ2 = −2, implying the found critical point to be a saddle point.
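This classification can be sketched in a few lines of R for the example above (the Hessian of ψ = x² − y² is constant, so it is entered directly):

H <- matrix(c(2,  0,
              0, -2), nrow = 2, byrow = TRUE)  # Hessian of psi = x^2 - y^2
lambda <- eigen(H)$values                       # 2 and -2
if (all(lambda > 0)) {
  cat("local minimum\n")
} else if (all(lambda < 0)) {
  cat("local maximum\n")
} else if (any(lambda > 0) && any(lambda < 0)) {
  cat("saddle point\n")                         # printed for this example
} else {
  cat("additional analysis needed\n")           # one or more null eigenvalues
}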
6 Linear Dynamical Systems

A homogeneous (no constant terms in the equations) linear dynamical system involving N variables (ℝ^N) with constant coefficients can be expressed as:

S:  ẋ1(t) = a1,1 x1(t) + a1,2 x2(t) + . . . + a1,N xN(t)
    ẋ2(t) = a2,1 x1(t) + a2,2 x2(t) + . . . + a2,N xN(t)
    . . .
    ẋN(t) = aN,1 x1(t) + aN,2 x2(t) + . . . + aN,N xN(t)

where ai,j, i, j = 1, 2, . . . , N are real values.

The system S can be placed in the equivalent matrix form:

[ ẋ1(t) ]     [ a1,1  a1,2  . . .  a1,N ] [ x1(t) ]
[ ẋ2(t) ]  =  [ a2,1  a2,2  . . .  a2,N ] [ x2(t) ]     (24)
[  ...   ]     [  ...   ...          ...  ] [  ...   ]
[ ẋN(t) ]     [ aN,1  aN,2  . . .  aN,N ] [ xN(t) ]

or, more synthetically:

~ẋ(t) = A ~x(t)     (25)

where A is an N × N matrix, and both ~ẋ and ~x are column vectors.

It can be shown (e.g. [7]) that the general solution of a linear system of ordinary differential equations with constant coefficients can be expressed as:

~x(t) = c1 e^(λ1 t) ~v1 + c2 e^(λ2 t) ~v2 + . . . + cN e^(λN t) ~vN     (26)

where c1, c2, . . . , cN are constants and λ1, λ2, . . . , λN are the eigenvalues of A with respective eigenvectors ~v1, ~v2, . . . , ~vN, provided they can be found.

In case an initial condition ~x0 is provided, the constants can be determined as:

~C = [ c1 ; c2 ; . . . ; cN ] = [ ~v1 ~v2 . . . ~vN ]⁻¹ [ x0,1 ; x0,2 ; . . . ; x0,N ]     (27)

where the inverted matrix has the eigenvectors as its columns.

As an example, let's consider the solution of the linear ODE system with:

A = [ 1   0   3
      2   0   1
      0   1   3 ]     (28)
We obtain λ1 = 3.81912..., λ2 = 0.09043... + 1.14062...i, and λ3 = 0.09043... − 1.14062...i, with respective eigenvectors:

~v1 = ( 0.63557... ,  0.48922... ,  0.59725... );

~v2 = ( 0.15670... + 0.50738...i ,  0.80703... ,  −0.24042... − 0.09425...i );

~v3 = ( 0.15670... − 0.50738...i ,  0.80703... ,  −0.24042... + 0.09425...i ),

the latter being the complex conjugate of ~v2.
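A sketch in R of Equations 26-28 for this example (the initial condition ~x0 below is an assumed illustrative value, not taken from the text):

A  <- matrix(c(1, 0, 3,
               2, 0, 1,
               0, 1, 3), nrow = 3, byrow = TRUE)
e  <- eigen(A)                       # eigenvalues and eigenvectors, Equation 28
x0 <- c(1, 0, 0)                     # assumed initial condition
cc <- solve(e$vectors, x0)           # constants c_i, Equation 27
xt <- function(t) {                  # general solution, Equation 26
  Re(e$vectors %*% (cc * exp(e$values * t)))
}
xt(0)                                # recovers x0
xt(1)                                # state at t = 1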
Let us now consider the following ordinary differential equation:

~ẋ(t) = f(~x(t))     (32)

Its discretization in time yields:

~ẋ(t) = lim_{Δt→0} [ ~x(t + Δt) − ~x(t) ] / Δt ≈ [ ~x(t + Δt) − ~x(t) ] / Δt ≈ f(~x, t),     (33)

which implies:

~x(t + Δt) ≈ ~x(t) + Δt f(~x, t).
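A minimal sketch of the resulting explicit (Euler) update in R, using the linear field f(~x) = A~x of the previous example (the time step and initial condition are assumed values):

A  <- matrix(c(1, 0, 3,
               2, 0, 1,
               0, 1, 3), nrow = 3, byrow = TRUE)
dt <- 0.01                           # assumed time step
x  <- c(1, 0, 0)                     # assumed initial condition
for (k in 1:100) {
  x <- x + dt * (A %*% x)            # explicit Euler step implied by Equation 33
}
x                                    # Euler approximation of x(t = 1)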
7 Markov Chains

Applying the Gershgorin theorem to the columns of a stochastic matrix, whose non-negative entries add to one, we have that its left-eigenvalues are bound in magnitude to 1. However, as we also know that square matrices have the same left- and right-eigenvalues, we conclude that the right-eigenvalues of a stochastic matrix are also bound in magnitude by 1.

In addition, we have that every column-stochastic matrix has at least one left-eigenvector identical to ~1 (a row vector with all entries identical to 1), because ~1 A = ~1 effectively implements the sum of each of the columns of A. This vector is associated to the eigenvalue 1, so every row- and column-stochastic matrix has at least one eigenvalue equal to 1.

If A is irreducible, we have from the Perron-Frobenius theorem that the eigenvector associated to λ = 1 can be placed in a form with strictly positive elements. This eigenvector will be associated to the stationary state of the respective Markov chain, also implying that every respective state will have a non-null probability.

It should be kept in mind that a stochastic matrix A can have: (i) more than one eigenvalue equal to 1; (ii) eigenvalues equal to zero; (iii) negative eigenvalues; (iv) complex eigenvalues.

If A is a stochastic matrix, Equation 36 defines a respective Markov chain on the states in ~x(t).

Let's consider that A is irreducible and regular. This effectively means that the state xi(t) associated to any node i of the graph representing A will, along time, influence any of the other nodes with a non-null contribution.

As already observed, the eigenvector associated to the eigenvalue 1 of A corresponds to the equilibrium or stationary distribution of probabilities of the Markov chain states, i.e.:

A ~p = ~p     (37)

Interestingly, this eigenvector does not depend on the initial state ~x(t = 0), and therefore has no 'memory' of the past dynamics or initial condition.

Let's consider the simple example of a Markov chain presented in Figure 4.

Figure 4: The graph associated to the stochastic matrix in the considered example. It is also interesting to imagine a uniformly random walk performed by a hypothetical agent along this graph, taking the outgoing links according to the respective transition probabilities.

The respective transition matrix A can be obtained as:

A = [ 0.3   0     0.1   0.6
      0.7   0.9   0     0.1
      0     0.1   0.5   0
      0     0     0.4   0.3 ]     (38)

This matrix can be verified to be irreducible. The respective eigenvalues are λ1 = 1, λ2 = 0.45461... + 0.30132...i, λ3 = 0.4546... − 0.3013...i, and λ4 = 0.0907..., with corresponding eigenvectors:

~v1 = ( −0.122... ,  −0.967... ,  −0.193... ,  −0.110... );
~v2 = ( 0.435... − 0.334...i ,  −0.745... ,  0.036... + 0.241...i ,  −0.273... + 0.092...i );
~v3 = ( 0.435... + 0.334...i ,  −0.745... ,  0.036... − 0.241...i ,  0.273... − 0.092...i );
~v4 = ( −0.734... ,  0.600... ,  −0.146... ,  0.280... )

Observe the coexistence of real and complex eigenvalues and eigenvectors.

As expected, A has one eigenvalue identical to 1, with an associated real eigenvector corresponding to the stationary state. This can be transformed into probabilities by normalizing ~p1 = ~v1 / sum(~v1), which yields:

~p1 = ( 0.0878... ,  0.6940... ,  0.1388... ,  0.0793... )

In case we understand the transition probabilities in A as corresponding to a uniformly random walk on the respective system, the obtained distribution ~p1 indicates that node 2 will be much more frequently visited than the others, followed by the third, first and fourth nodes.

Figure 5 illustrates the unfolding of the state values associated to the nodes along the discrete time steps t = 0, 1, 2, . . . , 10.

Figure 5: The values of the state probabilities (frequency of visits by a hypothetical agent) along the discrete time t from the initial condition ~x0 = [0, 0, 0, 1]. Observe how, as a consequence of the interconnections and respective probability transitions, the value of the state of node 4, initially equal to 1, decreases quickly, as node 2 progressively concentrates the density of visiting agents.
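The example can be reproduced with a short R sketch (the initial state matches the one used in Figure 5):

A  <- matrix(c(0.3, 0.0, 0.1, 0.6,
               0.7, 0.9, 0.0, 0.1,
               0.0, 0.1, 0.5, 0.0,
               0.0, 0.0, 0.4, 0.3), nrow = 4, byrow = TRUE)
colSums(A)                            # all equal to 1: A is column-stochastic
e  <- eigen(A)
i1 <- which.min(abs(e$values - 1))    # position of the eigenvalue equal to 1
p1 <- Re(e$vectors[, i1])
p1 <- p1 / sum(p1)                    # stationary distribution, Equation 37
x  <- c(0, 0, 0, 1)                   # initial state concentrated on node 4
for (t in 1:10) x <- A %*% x          # x(t + 1) = A x(t)
cbind(p1, x)                          # the iterated state approaches p1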
8 Multivariate Statistics

The multivariate normal distribution (e.g. [9]) is particularly important for modeling and trying to make predictions on a whole set of random variables (measurements).
Considering an N-dimensional domain, the multivariate normal probability density function with average vector ~µ and covariance matrix K can be expressed as:

g_{~µ,K}(~X) = (2π)^(−N/2) |K|^(−1/2) exp( −(1/2) (~X − ~µ)^T K⁻¹ (~X − ~µ) )     (39)

Observe the quadratic form of K⁻¹ in the argument of the exponential above.

Given N random variables Xi, i = 1, 2, . . . , N, represented as the random vector ~X, and their respective joint probability density function, the corresponding covariances can be defined as:

cov(Xi, Xj) = ∫_{−∞}^{+∞} (Xi − µXi)(Xj − µXj) p(~X) d~X     (40)

The respective unbiased estimator, from M samples of the random variables, is given as:

cov(Xi, Xj) ≈ [1/(M − 1)] Σ_{k=1}^{M} (Xi,k − µXi)(Xj,k − µXj)     (41)

where Xi,k denotes the k-th sample of Xi.

The covariance matrix K can now be defined so that each of its elements ki,j = cov(Xi, Xj). This matrix has important properties, some of which are presented as follows.

First, we have that, as a consequence of its own definition, K is necessarily real and symmetric. Then, as the diagonal elements correspond to the respective variances (i.e. var(Xi) = cov(Xi, Xi), an average of squared values), all diagonal elements are necessarily non-negative, and so is the trace, implying that the respective eigenvalues add to a nonnegative value.
Indeed, the covariance matrix can be shown to be positive semidefinite, i.e. its eigenvalues are all larger than or equal to 0, and therefore its determinant is also nonnegative.

In addition, by being symmetric, its eigenvalues are all real, and the eigenvectors ~vi corresponding to distinct eigenvalues are orthogonal.

Therefore, if we define the matrix:

P = [ ← ~v1 → ;  ← ~v2 → ;  . . . ;  ← ~vN → ]     (42)

that is, with the eigenvectors of K stacked as rows, it will be orthogonal. We can apply this matrix on the original random vector, yielding a new random vector ~Y whose elements are linear combinations of the original random variables, i.e.:

~Y = P ~X     (43)

which corresponds to a linear statistical transformation known as the discrete Karhunen-Loève transform, which provides the basis for the Principal Component Analysis (PCA) methodology [10, 11]. PCA implements a rotation of the original coordinate axes so as to align the first axes with the directions of largest variation, as quantified by the respective variances. The obtained random variables result completely uncorrelated, and their variances are equal to the eigenvalues associated to the respective axes.
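A minimal R sketch of this construction (the two correlated variables below are synthetic, generated only for illustration):

set.seed(1)
n  <- 500
X1 <- rnorm(n)
X2 <- 0.8 * X1 + 0.3 * rnorm(n)       # two correlated random variables
X  <- cbind(X1, X2)
K  <- cov(X)                          # sample covariance matrix, Equation 41
e  <- eigen(K)                        # real eigenvalues, orthogonal eigenvectors
P  <- t(e$vectors)                    # eigenvectors as rows, Equation 42
Y  <- X %*% t(P)                      # Y = P X applied to every sample (rows of X)
round(cov(Y), 3)                      # diagonal: the eigenvalues; off-diagonal: 0 (uncorrelated)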
9 Concluding Remarks

The present work presented, briefly and in an introductory manner, the concept, properties and applications of eigenvalues and eigenvectors from an 'eigen-centered' position. By starting with a review of some of their important properties, it was possible to discuss the subsequent applications in a more integrated and systematic way. The interesting Gershgorin approach was also briefly outlined and illustrated.

The addressed eigenvalue and eigenvector applications included scalar field extrema characterization, the solution of linear dynamical systems with constant coefficients, Markov chains, as well as some aspects of multivariate statistics.

The already large potential of theoretical and practical applications of eigenvalues and eigenvectors is being constantly further enhanced thanks to continuing advances in computer science, allowing the respective calculations on matrices of ever increasing sizes. This opens up new prospects in theoretical and applied research. It is hoped that the covered presentation may motivate the reader to probe further in this interesting area.
Acknowledgments. Luciano da F. Costa thanks CNPq (grant no. 307085/2018-0) for sponsorship. This work has benefited from FAPESP grant 15/22308-2.
References

[1] G. H. Golub and C. F. van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.

[2] Wikipedia. Eigenvalues and eigenvectors. Wikipedia, the free encyclopedia, 2020. https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors. [Online; accessed 10-Apr-2020.]

[3] H. Sagan. Boundary and Eigenvalue Problems in Mathematical Physics. Dover, 1989.

[4] H. von Helmholtz. Die Lehre von den Tonempfindungen als Physiologische Grundlage für die Theorie der Musik. Braunschweig: Druck und Verlag von Friedrich Vieweg und Sohn, 1896.

[5] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 2012.

[6] S. Gershgorin. Über die Abgrenzung der Eigenwerte einer Matrix. Izv. Akad. Nauk. URSS Otd. Fiz.-Mat. Nauk, pages 749–754, 1931.

[7] R. K. Nagle, E. B. Saff, and A. D. Snider. Fundamentals of Differential Equations. Pearson, 2017.

[8] J. G. Kemeny and J. L. Snell. Finite Markov Chains. Van Nostrand, Princeton, 1960.

L. da F. Costa. Features Transformation and Normalization: A Visual Approach (CDT-24). https://www.researchgate.net/publication/340114268_Features_Transformation_and_Normalization_A_Visual_Approach_CDT-24. [Online; accessed 10-Apr-2020.]

Costa's Didactic Texts – CDTs

CDTs intend to be a halfway point between a formal scientific article and a dissemination text in the sense that they: (i) explain and illustrate concepts in a more informal, graphical and accessible way than the typical scientific article; and (ii) provide more in-depth mathematical developments than a more traditional dissemination work.

It is hoped that CDTs can also incorporate new insights and analogies concerning the reported concepts and methods. We hope these characteristics will contribute to making CDTs interesting both to beginners as well as to more senior researchers.

Each CDT focuses on a limited set of interrelated concepts. Though attempting to be relatively self-contained, CDTs also aim at being relatively short. Links to related material are provided in order to complement the covered subjects.

Observe that CDTs, which come with absolutely no warranty, are non-distributable and for non-commercial use only.

The complete set of CDTs can be found at: https://www.researchgate.net/project/Costas-Didactic-Texts-CDTs.
Table 1: Some of the properties of eigenvalues and eigenvectors of a matrix A.