
Eigenvalue and Generalized Eigenvalue Problems: Tutorial

Benyamin Ghojogh (bghojogh@uwaterloo.ca)
Department of Electrical and Computer Engineering,
Machine Learning Laboratory, University of Waterloo, Waterloo, ON, Canada

Fakhri Karray (karray@uwaterloo.ca)
Department of Electrical and Computer Engineering,
Centre for Pattern Analysis and Machine Intelligence, University of Waterloo, Waterloo, ON, Canada

Mark Crowley (mcrowley@uwaterloo.ca)
Department of Electrical and Computer Engineering,
Machine Learning Laboratory, University of Waterloo, Waterloo, ON, Canada

arXiv:1903.11240v1 [stat.ML] 25 Mar 2019

Abstract

This paper is a tutorial for eigenvalue and generalized eigenvalue problems. We first introduce the eigenvalue problem, eigen-decomposition (spectral decomposition), and the generalized eigenvalue problem. Then, we mention the optimization problems which lead to the eigenvalue and generalized eigenvalue problems. We also provide examples from machine learning, including principal component analysis, kernel supervised principal component analysis, and Fisher discriminant analysis, which result in eigenvalue and generalized eigenvalue problems. Finally, we introduce the solutions to both eigenvalue and generalized eigenvalue problems.

1. Introduction

Eigenvalue and generalized eigenvalue problems play important roles in different fields of science, especially in machine learning. In the eigenvalue problem, the eigenvectors represent the directions of the spread or variance of the data, and the corresponding eigenvalues are the magnitudes of the spread in these directions (Jolliffe, 2011). In the generalized eigenvalue problem, these directions are impacted by another matrix. If the other matrix is the identity matrix, this impact is canceled and we have the eigenvalue problem capturing the directions of maximum spread.

In this paper, we introduce the eigenvalue problem and the generalized eigenvalue problem, together with their solutions. We also introduce the optimization problems which lead to the eigenvalue and generalized eigenvalue problems. Some examples of these optimization problems in machine learning are also introduced for better illustration. The examples include principal component analysis, kernel supervised principal component analysis, and Fisher discriminant analysis.

2. Introducing Eigenvalue and Generalized Eigenvalue Problems

In this section, we introduce the eigenvalue problem and the generalized eigenvalue problem.

2.1. Eigenvalue Problem

The eigenvalue problem (Wilkinson, 1965; Golub & Van Loan, 2012) of a symmetric matrix A ∈ R^{d×d} is defined as:

    Aφ_i = λ_i φ_i, ∀i ∈ {1, . . . , d},    (1)

and in matrix form, it is:

    AΦ = ΦΛ,    (2)

where the columns of R^{d×d} ∋ Φ := [φ_1, . . . , φ_d] are the eigenvectors and the diagonal elements of R^{d×d} ∋ Λ := diag([λ_1, . . . , λ_d]⊤) are the eigenvalues. Note that φ_i ∈ R^d and λ_i ∈ R.

Note that the matrix A in an eigenvalue problem can also be non-symmetric. If the matrix is symmetric, its eigenvectors are orthogonal/orthonormal; if it is non-symmetric, its eigenvectors are not necessarily orthogonal.

Eq. (2) can be restated as:

    AΦ = ΦΛ ⟹ AΦΦ⊤ = ΦΛΦ⊤ ⟹ A = ΦΛΦ⊤ = ΦΛΦ⁻¹,    (3)


where ΦΦ⊤ = I and Φ⊤ = Φ⁻¹ because Φ is an orthogonal matrix. Moreover, note that we always have Φ⊤Φ = I for an orthogonal Φ, but we only have ΦΦ⊤ = I if "all" the columns of the orthogonal Φ exist (it is not truncated, i.e., it is a square matrix). Eq. (3) is referred to as "eigenvalue decomposition", "eigen-decomposition", or "spectral decomposition".

2.2. Generalized Eigenvalue Problem

The generalized eigenvalue problem (Parlett, 1998; Golub & Van Loan, 2012) of two symmetric matrices A ∈ R^{d×d} and B ∈ R^{d×d} is defined as:

    Aφ_i = λ_i Bφ_i, ∀i ∈ {1, . . . , d},    (4)

and in matrix form, it is:

    AΦ = BΦΛ,    (5)

where the columns of R^{d×d} ∋ Φ := [φ_1, . . . , φ_d] are the eigenvectors and the diagonal elements of R^{d×d} ∋ Λ := diag([λ_1, . . . , λ_d]⊤) are the eigenvalues. Note that φ_i ∈ R^d and λ_i ∈ R.

The generalized eigenvalue problem of Eq. (4) or (5) is denoted by (A, B). The (A, B) is called a "pair" or "pencil" (Parlett, 1998); the order in the pair matters. The Φ and Λ are called the generalized eigenvectors and eigenvalues of (A, B). The (Φ, Λ) or (φ_i, λ_i) is called the "eigenpair" of the pair (A, B) in the literature (Parlett, 1998).

Comparing Eqs. (1) and (4) or Eqs. (2) and (5) shows that the eigenvalue problem is a special case of the generalized eigenvalue problem where B = I.

3. Eigenvalue Optimization

In this section, we introduce the optimization problems which lead to the eigenvalue problem.

3.1. Optimization Form 1

Consider the following optimization problem with the variable φ ∈ R^d:

    maximize_φ  φ⊤Aφ,
    subject to  φ⊤φ = 1,    (6)

where A ∈ R^{d×d}. The Lagrangian (Boyd & Vandenberghe, 2004) for Eq. (6) is:

    L = φ⊤Aφ − λ(φ⊤φ − 1),

where λ ∈ R is the Lagrange multiplier. Setting the derivative of the Lagrangian to zero gives:

    R^d ∋ ∂L/∂φ = 2Aφ − 2λφ = 0 ⟹ Aφ = λφ,

which is an eigenvalue problem for A according to Eq. (1). The φ is the eigenvector of A and the λ is the eigenvalue. As Eq. (6) is a maximization problem, the desired eigenvector is the one having the largest eigenvalue; if Eq. (6) were a minimization problem, it would be the one having the smallest eigenvalue.

3.2. Optimization Form 2

Consider the following optimization problem with the variable Φ ∈ R^{d×d}:

    maximize_Φ  tr(Φ⊤AΦ),
    subject to  Φ⊤Φ = I,    (7)

where A ∈ R^{d×d}, tr(·) denotes the trace of a matrix, and I is the identity matrix. Note that according to the cyclic property of the trace, the objective function can be written as any of: tr(Φ⊤AΦ) = tr(ΦΦ⊤A) = tr(AΦΦ⊤).

The Lagrangian (Boyd & Vandenberghe, 2004) for Eq. (7) is:

    L = tr(Φ⊤AΦ) − tr(Λ⊤(Φ⊤Φ − I)),

where Λ ∈ R^{d×d} is a diagonal matrix whose entries are the Lagrange multipliers. Setting the derivative of L to zero gives:

    R^{d×d} ∋ ∂L/∂Φ = 2AΦ − 2ΦΛ = 0 ⟹ AΦ = ΦΛ,

which is an eigenvalue problem for A according to Eq. (2). The columns of Φ are the eigenvectors of A and the diagonal elements of Λ are the eigenvalues.

As Eq. (7) is a maximization problem, the eigenvalues and eigenvectors in Λ and Φ are sorted from the largest to the smallest eigenvalue; if Eq. (7) were a minimization problem, they would be sorted from the smallest to the largest.

3.3. Optimization Form 3

Consider the following optimization problem with the variable φ ∈ R^d:

    minimize_φ  ||X − φφ⊤X||²_F,
    subject to  φ⊤φ = 1,    (8)

where X ∈ R^{d×n} and ||·||_F denotes the Frobenius norm of a matrix.
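Before moving on, the eigen-decomposition of Eq. (3) and the solution of Optimization Form 1 can be checked numerically. The following is a minimal sketch, assuming NumPy is available (the matrix and variable names are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random symmetric matrix A (so its eigenvectors are orthonormal).
M = rng.standard_normal((4, 4))
A = M + M.T

# Eigenvalue problem A Phi = Phi Lambda (Eq. (2)); eigh handles symmetric A.
eigvals, Phi = np.linalg.eigh(A)
Lam = np.diag(eigvals)

# Eigen-decomposition (Eq. (3)): A = Phi Lambda Phi^T, with Phi orthogonal.
assert np.allclose(A, Phi @ Lam @ Phi.T)
assert np.allclose(Phi.T @ Phi, np.eye(4))

# Optimization Form 1 (Eq. (6)): maximize phi^T A phi s.t. phi^T phi = 1.
# The maximizer is the eigenvector with the largest eigenvalue.
phi_top = Phi[:, -1]  # eigh sorts eigenvalues in ascending order
for _ in range(1000):
    u = rng.standard_normal(4)
    u /= np.linalg.norm(u)  # random unit vector
    assert u @ A @ u <= phi_top @ A @ phi_top + 1e-9
```

No random unit direction beats the top eigenvector, and the maximal objective value equals the largest eigenvalue, as the derivation above predicts.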
The objective function in Eq. (8) is simplified as:

    ||X − φφ⊤X||²_F
    = tr((X − φφ⊤X)⊤(X − φφ⊤X))
    = tr((X⊤ − X⊤φφ⊤)(X − φφ⊤X))
    = tr(X⊤X − 2X⊤φφ⊤X + X⊤φ φ⊤φ φ⊤X)    (using φ⊤φ = 1)
    = tr(X⊤X − X⊤φφ⊤X)
    = tr(X⊤X) − tr(X⊤φφ⊤X)
    = tr(X⊤X) − tr(XX⊤φφ⊤)
    = tr(X⊤X − XX⊤φφ⊤).

The Lagrangian (Boyd & Vandenberghe, 2004) is:

    L = tr(X⊤X) − tr(XX⊤φφ⊤) − λ(φ⊤φ − 1),

where λ is the Lagrange multiplier. Setting the derivative of L to zero gives:

    R^d ∋ ∂L/∂φ = 2XX⊤φ − 2λφ = 0 ⟹ XX⊤φ = λφ ⟹ Aφ = λφ,

where the last step is because we take R^{d×d} ∋ A := XX⊤. The Aφ = λφ is an eigenvalue problem for A according to Eq. (1). The φ is the eigenvector of A and the λ is the eigenvalue.

3.4. Optimization Form 4

Consider the following optimization problem with the variable Φ ∈ R^{d×d}:

    minimize_Φ  ||X − ΦΦ⊤X||²_F,
    subject to  Φ⊤Φ = I,    (9)

where X ∈ R^{d×n}.

Similar to what we had for Eq. (8), the objective function in Eq. (9) is simplified as:

    ||X − ΦΦ⊤X||²_F = tr(X⊤X − XX⊤ΦΦ⊤).

The Lagrangian (Boyd & Vandenberghe, 2004) is:

    L = tr(X⊤X) − tr(XX⊤ΦΦ⊤) − tr(Λ⊤(Φ⊤Φ − I)),

where Λ ∈ R^{d×d} is a diagonal matrix containing the Lagrange multipliers. Setting the derivative of L to zero gives:

    R^{d×d} ∋ ∂L/∂Φ = 2XX⊤Φ − 2ΦΛ = 0 ⟹ XX⊤Φ = ΦΛ ⟹ AΦ = ΦΛ,

which is an eigenvalue problem for A := XX⊤ according to Eq. (2). The columns of Φ are the eigenvectors of A and the diagonal elements of Λ are the eigenvalues.

3.5. Optimization Form 5

Consider the following optimization problem with the variable φ ∈ R^d:

    maximize_φ  (φ⊤Aφ)/(φ⊤φ).    (10)

According to the Rayleigh–Ritz quotient method (Croot, 2005), this optimization problem can be restated as:

    maximize_φ  φ⊤Aφ,
    subject to  φ⊤φ = 1.    (11)

The Lagrangian (Boyd & Vandenberghe, 2004) is:

    L = φ⊤Aφ − λ(φ⊤φ − 1),

where λ is the Lagrange multiplier. Setting the derivative of L to zero gives:

    ∂L/∂φ = 2Aφ − 2λφ = 0 ⟹ Aφ = λφ,

which is an eigenvalue problem for A according to Eq. (1). The φ is the eigenvector of A and the λ is the eigenvalue. As Eq. (10) is a maximization problem, the desired eigenvector is the one having the largest eigenvalue; if Eq. (10) were a minimization problem, it would be the one having the smallest eigenvalue.

4. Generalized Eigenvalue Optimization

In this section, we introduce the optimization problems which lead to the generalized eigenvalue problem.

4.1. Optimization Form 1

Consider the following optimization problem with the variable φ ∈ R^d:

    maximize_φ  φ⊤Aφ,
    subject to  φ⊤Bφ = 1,    (12)

where A ∈ R^{d×d} and B ∈ R^{d×d}. The Lagrangian (Boyd & Vandenberghe, 2004) for Eq. (12) is:

    L = φ⊤Aφ − λ(φ⊤Bφ − 1),

where λ ∈ R is the Lagrange multiplier. Setting the derivative of the Lagrangian to zero gives:

    R^d ∋ ∂L/∂φ = 2Aφ − 2λBφ = 0 ⟹ Aφ = λBφ,

which is a generalized eigenvalue problem (A, B) according to Eq. (4). The φ is the eigenvector and the λ is the eigenvalue for this problem.
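This first generalized form can be verified numerically. The sketch below assumes SciPy is available (the matrix and variable names are ours): `scipy.linalg.eigh` accepts a second symmetric positive-definite matrix and solves Aφ = λBφ directly.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
d = 4

# Symmetric A and symmetric positive-definite B, so the pencil (A, B)
# is well behaved.
M = rng.standard_normal((d, d))
A = M + M.T
N = rng.standard_normal((d, d))
B = N @ N.T + d * np.eye(d)

# Generalized eigenvalue problem A phi = lambda B phi (Eq. (4)).
eigvals, Phi = eigh(A, B)

# Matrix form A Phi = B Phi Lambda (Eq. (5)).
assert np.allclose(A @ Phi, B @ Phi @ np.diag(eigvals))

# SciPy normalizes the eigenvectors so that Phi^T B Phi = I, i.e. each
# phi satisfies the constraint phi^T B phi = 1 of Eq. (12).
assert np.allclose(Phi.T @ B @ Phi, np.eye(d))

# The maximizer of phi^T A phi subject to phi^T B phi = 1 is the
# eigenvector with the largest generalized eigenvalue.
phi_top = Phi[:, -1]
assert np.isclose(phi_top @ A @ phi_top, eigvals[-1])
```

The maximal objective value of Eq. (12) equals the largest generalized eigenvalue, consistent with the derivation above.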
As Eq. (12) is a maximization problem, the desired eigenvector is the one having the largest eigenvalue; if Eq. (12) were a minimization problem, it would be the one having the smallest eigenvalue.

Comparing Eqs. (6) and (12) shows that the eigenvalue problem is a special case of the generalized eigenvalue problem where B = I.

4.2. Optimization Form 2

Consider the following optimization problem with the variable Φ ∈ R^{d×d}:

    maximize_Φ  tr(Φ⊤AΦ),
    subject to  Φ⊤BΦ = I,    (13)

where A ∈ R^{d×d} and B ∈ R^{d×d}. Note that according to the cyclic property of the trace, the objective function can be written as any of: tr(Φ⊤AΦ) = tr(ΦΦ⊤A) = tr(AΦΦ⊤).

The Lagrangian (Boyd & Vandenberghe, 2004) for Eq. (13) is:

    L = tr(Φ⊤AΦ) − tr(Λ⊤(Φ⊤BΦ − I)),

where Λ ∈ R^{d×d} is a diagonal matrix whose entries are the Lagrange multipliers. Setting the derivative of L to zero gives:

    R^{d×d} ∋ ∂L/∂Φ = 2AΦ − 2BΦΛ = 0 ⟹ AΦ = BΦΛ,

which is a generalized eigenvalue problem (A, B) according to Eq. (5). The columns of Φ are the eigenvectors and the diagonal elements of Λ are the eigenvalues.

As Eq. (13) is a maximization problem, the eigenvalues and eigenvectors in Λ and Φ are sorted from the largest to the smallest eigenvalue; if Eq. (13) were a minimization problem, they would be sorted from the smallest to the largest.

4.3. Optimization Form 3

Consider the following optimization problem with the variable φ ∈ R^d:

    minimize_φ  ||X − φφ⊤X||²_F,
    subject to  φ⊤Bφ = 1,    (14)

where X ∈ R^{d×n}.

Similar to what we had for Eq. (8), the objective function in Eq. (14) is simplified as:

    ||X − φφ⊤X||²_F = tr(X⊤X − XX⊤φφ⊤).

The Lagrangian (Boyd & Vandenberghe, 2004) is:

    L = tr(X⊤X) − tr(XX⊤φφ⊤) − λ(φ⊤Bφ − 1),


where λ is the Lagrange multiplier. Setting the derivative of L to zero gives:

    R^d ∋ ∂L/∂φ = 2XX⊤φ − 2λBφ = 0 ⟹ XX⊤φ = λBφ ⟹ Aφ = λBφ,

which is a generalized eigenvalue problem (A, B) according to Eq. (4), with A := XX⊤. The φ is the eigenvector and the λ is the eigenvalue.

4.4. Optimization Form 4

Consider the following optimization problem with the variable Φ ∈ R^{d×d}:

    minimize_Φ  ||X − ΦΦ⊤X||²_F,
    subject to  Φ⊤BΦ = I,    (15)

where X ∈ R^{d×n}.

Similar to what we had for Eq. (9), the objective function in Eq. (15) is simplified as:

    ||X − ΦΦ⊤X||²_F = tr(X⊤X − XX⊤ΦΦ⊤).

The Lagrangian (Boyd & Vandenberghe, 2004) is:

    L = tr(X⊤X) − tr(XX⊤ΦΦ⊤) − tr(Λ⊤(Φ⊤BΦ − I)),

where Λ ∈ R^{d×d} is a diagonal matrix containing the Lagrange multipliers. Setting the derivative of L to zero gives:

    R^{d×d} ∋ ∂L/∂Φ = 2XX⊤Φ − 2BΦΛ = 0 ⟹ XX⊤Φ = BΦΛ ⟹ AΦ = BΦΛ,

which is a generalized eigenvalue problem (A, B) according to Eq. (5), with A := XX⊤. The columns of Φ are the eigenvectors and the diagonal elements of Λ are the eigenvalues.

4.5. Optimization Form 5

Consider the following optimization problem (Parlett, 1998) with the variable φ ∈ R^d:

    maximize_φ  (φ⊤Aφ)/(φ⊤Bφ).    (16)

According to the Rayleigh–Ritz quotient method (Croot, 2005), this optimization problem can be restated as:

    maximize_φ  φ⊤Aφ,
    subject to  φ⊤Bφ = 1.    (17)

The Lagrangian (Boyd & Vandenberghe, 2004) is:

    L = φ⊤Aφ − λ(φ⊤Bφ − 1),

where λ is the Lagrange multiplier. Setting the derivative of L to zero gives:

    ∂L/∂φ = 2Aφ − 2λBφ = 0 ⟹ Aφ = λBφ,

which is a generalized eigenvalue problem (A, B) according to Eq. (4). The φ is the eigenvector and the λ is the eigenvalue. As Eq. (16) is a maximization problem, the desired eigenvector is the one having the largest eigenvalue; if Eq. (16) were a minimization problem, it would be the one having the smallest eigenvalue.

5. Examples for the Optimization Problems

In this section, we introduce some examples in machine learning which use the introduced optimization problems.

5.1. Examples for Eigenvalue Problem

5.1.1. Variance in Principal Component Analysis

In Principal Component Analysis (PCA) (Pearson, 1901; Friedman et al., 2009), if we want to project onto one vector (a one-dimensional PCA subspace), the problem is:

    maximize_u  u⊤Su,
    subject to  u⊤u = 1,    (18)

where u is the projection direction and S is the covariance matrix. Therefore, u is the eigenvector of S with the largest eigenvalue.

If we want to project onto a PCA subspace spanned by several directions, we have:

    maximize_U  tr(U⊤SU),
    subject to  U⊤U = I,    (19)

where the columns of U span the PCA subspace.

5.1.2. Reconstruction in Principal Component Analysis

We can look at PCA from another perspective: PCA is the best linear projection with the smallest reconstruction error. If we have one PCA direction, the projection is u⊤X and the reconstruction is uu⊤X. We want the error between the reconstructed data and the original data to be minimized:

    minimize_u  ||X − uu⊤X||²_F,
    subject to  u⊤u = 1.    (20)

Therefore, u is the eigenvector of the covariance matrix S = XX⊤ (the X is already centered by removing its mean).

If we consider several PCA directions, i.e., the columns of U, the minimization of the reconstruction error is:

    minimize_U  ||X − UU⊤X||²_F,
    subject to  U⊤U = I.    (21)

Thus, the columns of U are the eigenvectors of the covariance matrix S = XX⊤ (the X is already centered by removing its mean).

5.2. Examples for Generalized Eigenvalue Problem

5.2.1. Kernel Supervised Principal Component Analysis

Kernel Supervised PCA (SPCA) (Barshan et al., 2011) uses the following optimization problem:

    maximize_Θ  tr(Θ⊤ K_x H K_y H K_x Θ),
    subject to  Θ⊤ K_x Θ = I,    (22)

where K_x and K_y are the kernel matrices over the training data and the labels of the training data, respectively, H := I − (1/n)11⊤ is the centering matrix, and the columns of Θ span the kernel SPCA subspace.

According to Eq. (13), the solution to Eq. (22) is:

    K_x H K_y H K_x Θ = K_x Θ Λ,    (23)

which is the generalized eigenvalue problem (K_x H K_y H K_x, K_x) according to Eq. (5), where Θ and Λ are the eigenvector and eigenvalue matrices, respectively.

5.2.2. Fisher Discriminant Analysis

Another example is Fisher Discriminant Analysis (FDA) (Fisher, 1936; Friedman et al., 2009), in which the Fisher criterion (Xu & Lu, 2006) is maximized:

    maximize_w  (w⊤S_B w)/(w⊤S_W w),    (24)

where w is the projection direction and S_B and S_W are the between- and within-class scatters:

    S_B = Σ_{j=1}^{c} (µ_j − µ_t)(µ_j − µ_t)⊤,    (25)

    S_W = Σ_{j=1}^{c} Σ_{i=1}^{n_j} (x_{j,i} − µ_j)(x_{j,i} − µ_j)⊤,    (26)

where c is the number of classes, n_j is the sample size of the j-th class, x_{j,i} is the i-th data point in the j-th class, µ_j is the mean of the j-th class, and µ_t is the total mean.

According to the Rayleigh–Ritz quotient method (Croot, 2005), the optimization problem in Eq. (24) can be restated as:

    maximize_w  w⊤S_B w,
    subject to  w⊤S_W w = 1.    (27)
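As a numerical illustration of FDA, the sketch below builds the scatter matrices of Eqs. (25) and (26) on toy two-class data (the data and variable names are ours, for illustration only) and verifies that the top generalized eigenvector of the pair (S_B, S_W) maximizes the Fisher criterion of Eq. (24):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)

# Toy two-class data in R^2 (c = 2), with well-separated class means.
X1 = rng.standard_normal((50, 2))
X2 = rng.standard_normal((50, 2)) + np.array([4.0, 1.0])
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
mu_t = np.vstack([X1, X2]).mean(axis=0)

# Between-class scatter S_B (Eq. (25)) and within-class scatter S_W (Eq. (26)).
S_B = np.outer(mu1 - mu_t, mu1 - mu_t) + np.outer(mu2 - mu_t, mu2 - mu_t)
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)

# The Fisher criterion (Eq. (24)) is maximized by the eigenvector of the
# generalized eigenvalue problem S_B w = lambda S_W w (Eq. (27)) with the
# largest generalized eigenvalue.
eigvals, W = eigh(S_B, S_W)
w = W[:, -1]

def fisher(v):
    return (v @ S_B @ v) / (v @ S_W @ v)

# No random direction beats the top generalized eigenvector.
for _ in range(1000):
    v = rng.standard_normal(2)
    assert fisher(v) <= fisher(w) + 1e-9
```

The Fisher criterion attained by w equals the largest generalized eigenvalue of (S_B, S_W), as the derivation that follows shows.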
The Lagrangian (Boyd & Vandenberghe, 2004) for Eq. (27) is:

    L = w⊤S_B w − λ(w⊤S_W w − 1),

where λ is the Lagrange multiplier. Setting the derivative of L to zero gives:

    ∂L/∂w = 2S_B w − 2λS_W w = 0 ⟹ S_B w = λS_W w,

which is a generalized eigenvalue problem (S_B, S_W) according to Eq. (4). The w is the eigenvector with the largest eigenvalue and the λ is the corresponding eigenvalue.

6. Solution to Eigenvalue Problem

In this section, we introduce the solution to the eigenvalue problem. Consider Eq. (1):

    Aφ_i = λ_i φ_i ⟹ (A − λ_i I)φ_i = 0,    (28)

which is a linear system of equations. According to Cramer's rule, a linear system of equations has non-trivial solutions if and only if the determinant vanishes. Therefore:

    det(A − λ_i I) = 0,    (29)

where det(·) denotes the determinant of a matrix. Eq. (29) gives us a d-degree polynomial equation which has d roots (answers). Note that if A is not full rank (i.e., it is a singular matrix), some of the roots will be zero. Moreover, if A is positive semi-definite, i.e., A ⪰ 0, all the roots are non-negative.

The roots (answers) from Eq. (29) are the eigenvalues of A. After finding the roots, we put every answer in Eq. (28) and find its corresponding eigenvector φ_i ∈ R^d. Note that putting a root in Eq. (28) gives us a vector which can be normalized, because it is the direction of the eigenvector that matters, not its magnitude; the information of the magnitude exists in its corresponding eigenvalue.

7. Solution to Generalized Eigenvalue Problem

In this section, we introduce the solution to the generalized eigenvalue problem. Recall Eq. (16):

    maximize_φ  (φ⊤Aφ)/(φ⊤Bφ).

Let ρ be this fraction, named the Rayleigh quotient (Croot, 2005):

    ρ(u; A, B) := (u⊤Au)/(u⊤Bu), ∀u ≠ 0.    (30)

The ρ is stationary at φ ≠ 0 if and only if:

    (A − λB)φ = 0,    (31)

for some scalar λ (Parlett, 1998). Eq. (31) is a linear system of equations. This system of equations can also be obtained from Eq. (4):

    Aφ_i = λ_i Bφ_i ⟹ (A − λ_i B)φ_i = 0.    (32)

As we mentioned earlier, the eigenvalue problem is a special case of the generalized eigenvalue problem (where B = I), which is obvious by comparing Eqs. (28) and (32).

According to Cramer's rule, a linear system of equations has non-trivial solutions if and only if the determinant vanishes. Therefore:

    det(A − λ_i B) = 0.    (33)

Similar to the explanations for Eq. (29), we can solve for the roots of Eq. (33). However, note that Eq. (33) is obtained from Eq. (4) or (16), where only one eigenvector φ is considered.

For solving Eq. (5) in the general case, there exist two solutions for the generalized eigenvalue problem, one of which is a quick and dirty solution and the other is a rigorous method. Both methods are explained in the following.

7.1. The Quick & Dirty Solution

Consider Eq. (5) again:

    AΦ = BΦΛ.

If B is not singular (i.e., it is invertible), we can left-multiply the expressions by B⁻¹:

    B⁻¹AΦ = ΦΛ ⟹ CΦ = ΦΛ,    (34)

where we take C := B⁻¹A. Eq. (34) is the eigenvalue problem for C according to Eq. (2) and can be solved using the approach of Eq. (29).

Note that even if B is singular, we can use a numeric hack (which is a little dirty) and slightly strengthen its main diagonal in order to make it full rank:

    (B + εI)⁻¹AΦ = ΦΛ ⟹ CΦ = ΦΛ,    (35)

where ε is a very small positive number, e.g., ε = 10⁻⁵, large enough to make B full rank.

7.2. The Rigorous Solution

Consider Eq. (5) again:

    AΦ = BΦΛ.
There exists a rigorous method to solve the generalized eigenvalue problem (Wang, 2015), which is explained in the following.

Consider the eigenvalue problem for B:

    BΦ_B = Φ_B Λ_B,    (36)

where Φ_B and Λ_B are the eigenvector and eigenvalue matrices of B, respectively. Then, we have:

    BΦ_B = Φ_B Λ_B ⟹ Φ_B⁻¹ B Φ_B = Φ_B⁻¹ Φ_B Λ_B = Λ_B ⟹ Φ_B⊤ B Φ_B = Λ_B,    (37)

where the last step is because Φ_B is an orthogonal matrix (its columns are orthonormal) and thus Φ_B⁻¹ = Φ_B⊤.

We multiply Eq. (37) by Λ_B^{−1/2} from the left and right hand sides:

    Λ_B^{−1/2} Φ_B⊤ B Φ_B Λ_B^{−1/2} = Λ_B^{−1/2} Λ_B Λ_B^{−1/2} = I ⟹ Φ̆_B⊤ B Φ̆_B = I,

where:

    Φ̆_B := Φ_B Λ_B^{−1/2}.    (38)

We define Ă as:

    Ă := Φ̆_B⊤ A Φ̆_B.    (39)

The Ă is symmetric because:

    Ă⊤ = (Φ̆_B⊤ A Φ̆_B)⊤ = Φ̆_B⊤ A⊤ Φ̆_B = Φ̆_B⊤ A Φ̆_B = Ă,

where the middle step uses the symmetry of A.

The eigenvalue problem for Ă is:

    Ă Φ_A = Φ_A Λ_A,    (40)

where Φ_A and Λ_A are the eigenvector and eigenvalue matrices of Ă. Left-multiplying Eq. (40) by Φ_A⁻¹ gives us:

    Φ_A⁻¹ Ă Φ_A = Φ_A⁻¹ Φ_A Λ_A ⟹ Φ_A⊤ Ă Φ_A = Λ_A,    (41)

where the last step is because Φ_A is an orthogonal matrix (its columns are orthonormal), so Φ_A⁻¹ = Φ_A⊤. Note that Φ_A is an orthogonal matrix because Ă is symmetric (if a matrix is symmetric, its eigenvectors are orthogonal/orthonormal). Eq. (41) diagonalizes the matrix Ă.

Plugging Eq. (39) into Eq. (41) gives us:

    Φ_A⊤ Φ̆_B⊤ A Φ̆_B Φ_A = Λ_A
    ⟹ Φ_A⊤ Λ_B^{−1/2} Φ_B⊤ A Φ_B Λ_B^{−1/2} Φ_A = Λ_A    (by Eq. (38))
    ⟹ Φ⊤ A Φ = Λ_A,    (42)

where:

    Φ := Φ̆_B Φ_A = Φ_B Λ_B^{−1/2} Φ_A.    (43)

The Φ also diagonalizes B because (I is a diagonal matrix):

    Φ⊤ B Φ = (Φ_B Λ_B^{−1/2} Φ_A)⊤ B (Φ_B Λ_B^{−1/2} Φ_A)
    = Φ_A⊤ Λ_B^{−1/2} (Φ_B⊤ B Φ_B) Λ_B^{−1/2} Φ_A
    = Φ_A⊤ Λ_B^{−1/2} Λ_B Λ_B^{−1/2} Φ_A    (by Eq. (37))
    = Φ_A⊤ Φ_A = Φ_A⁻¹ Φ_A = I,    (44)

where the last steps use the orthogonality of Φ_A. From Eq. (44), we have:

    Φ⊤ B Φ = I ⟹ Φ⊤ B Φ Λ_A = Λ_A ⟹ Φ⊤ B Φ Λ_A = Φ⊤ A Φ    (by Eq. (42))
    ⟹ B Φ Λ_A = A Φ,    (45)

where the last step is because Φ is invertible.

Comparing Eqs. (5) and (45) shows us:

    Λ_A = Λ.    (46)

To summarize, for finding Φ and Λ in Eq. (5), we do the following steps (note that A and B are given):

1. From Eq. (36), we find Φ_B and Λ_B.

2. From Eq. (38), we find Φ̆_B. In case Λ_B^{1/2} is singular in Eq. (38), we can use the numeric hack Φ̆_B ≈ Φ_B(Λ_B^{1/2} + εI)⁻¹, where ε is a very small positive number, e.g., ε = 10⁻⁵, large enough to make Λ_B^{1/2} full rank.

3. From Eq. (39), we find Ă.

4. From Eq. (40), we find Φ_A and Λ_A. From Eq. (46), Λ is found.

5. From Eq. (43), we find Φ.

The above instructions are given as an algorithm in Algorithm 1.

Algorithm 1: Solution to the generalized eigenvalue problem AΦ = BΦΛ.
1: Φ_B, Λ_B ← solve BΦ_B = Φ_B Λ_B
2: Φ̆_B ← Φ_B Λ_B^{−1/2} ≈ Φ_B(Λ_B^{1/2} + εI)⁻¹
3: Ă ← Φ̆_B⊤ A Φ̆_B
4: Φ_A, Λ_A ← solve ĂΦ_A = Φ_A Λ_A
5: Λ ← Λ_A
6: Φ ← Φ̆_B Φ_A
7: return Φ and Λ
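Algorithm 1 is straightforward to implement. The following sketch (assuming NumPy and SciPy are available; the function name `generalized_eig` is ours) follows the five steps above, including the εI hack of step 2, and cross-checks the result against SciPy's generalized symmetric solver:

```python
import numpy as np
from scipy.linalg import eigh

def generalized_eig(A, B, eps=1e-5):
    """Sketch of Algorithm 1: solve A Phi = B Phi Lambda for symmetric A
    and symmetric positive semi-definite B."""
    # Step 1 (Eq. (36)): eigen-decomposition of B.
    lam_B, Phi_B = np.linalg.eigh(B)
    # Step 2 (Eq. (38)): Phi_breve = Phi_B Lambda_B^{-1/2}, with the
    # numeric hack (Lambda_B^{1/2} + eps I)^{-1} against singularity.
    Phi_breve = Phi_B @ np.diag(1.0 / (np.sqrt(lam_B) + eps))
    # Step 3 (Eq. (39)): A_breve = Phi_breve^T A Phi_breve (symmetric).
    A_breve = Phi_breve.T @ A @ Phi_breve
    # Step 4 (Eq. (40)): eigen-decomposition of A_breve; Lambda = Lambda_A.
    lam_A, Phi_A = np.linalg.eigh(A_breve)
    # Step 5 (Eq. (43)): Phi = Phi_breve Phi_A.
    return Phi_breve @ Phi_A, lam_A

rng = np.random.default_rng(2)
d = 5
M = rng.standard_normal((d, d))
A = M + M.T                          # symmetric A
N = rng.standard_normal((d, d))
B = N @ N.T + d * np.eye(d)          # symmetric positive-definite B

Phi, lam = generalized_eig(A, B)

# Phi and Lambda satisfy A Phi = B Phi Lambda (Eq. (5)) up to the small
# perturbation introduced by the eps hack ...
assert np.allclose(A @ Phi, B @ Phi @ np.diag(lam), atol=1e-3)
# ... and the eigenvalues match SciPy's generalized symmetric solver.
assert np.allclose(lam, eigh(A, B, eigvals_only=True), atol=1e-3)
```

In practice one would call a library solver directly; the point of the sketch is that the five steps reduce the pencil (A, B) to an ordinary symmetric eigenvalue problem, exactly as derived above.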
8. Conclusion

This paper was a tutorial introducing the eigenvalue and generalized eigenvalue problems. The problems were introduced, the optimization problems leading to them were mentioned, and some examples from machine learning were provided for them. Moreover, the solutions to the eigenvalue and generalized eigenvalue problems were introduced.

References
Barshan, Elnaz, Ghodsi, Ali, Azimifar, Zohreh, and
Jahromi, Mansoor Zolghadri. Supervised principal com-
ponent analysis: Visualization, classification and regres-
sion on subspaces and submanifolds. Pattern Recogni-
tion, 44(7):1357–1371, 2011.
Boyd, Stephen and Vandenberghe, Lieven. Convex opti-
mization. Cambridge university press, 2004.
Croot, Ernie. The Rayleigh principle for finding eigenvalues. Technical report, Georgia Institute of Technology, School of Mathematics, 2005. Online: http://people.math.gatech.edu/~ecroot/notes_linear.pdf, Accessed: March 2019.
Fisher, Ronald A. The use of multiple measurements in
taxonomic problems. Annals of eugenics, 7(2):179–188,
1936.
Friedman, Jerome, Hastie, Trevor, and Tibshirani, Robert. The elements of statistical learning, volume 2. Springer Series in Statistics, Springer, New York, NY, USA, 2009.
Golub, Gene H. and Van Loan, Charles F. Matrix compu-
tations, volume 3. The Johns Hopkins University Press,
2012.
Jolliffe, Ian. Principal component analysis. Springer, 2011.
Parlett, Beresford N. The symmetric eigenvalue problem.
Classics in Applied Mathematics, 20, 1998.
Pearson, Karl. LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572, 1901.
Wang, Ruye. Generalized eigenvalue problem.
http://fourier.eng.hmc.edu/e161/lectures/algebra/node7.html,
2015. Accessed: January 2019.
Wilkinson, James Hardy. The algebraic eigenvalue prob-
lem, volume 662. Oxford Clarendon, 1965.
Xu, Yong and Lu, Guangming. Analysis on Fisher discrim-
inant criterion and linear separability of feature space. In
2006 International Conference on Computational Intel-
ligence and Security, volume 2, pp. 1671–1676. IEEE,
2006.
