Chapter 23
Matrix Algebra
23.1 Introduction
Sylvester developed the modern concept of matrices in the 19th century.
For him, a matrix was an array of numbers. Sylvester worked with systems
of linear equations, where matrices provided a convenient way of working
with their coefficients, and matrix algebra was developed to generalize number
operations to matrices. Nowadays, matrix algebra is used in all branches of
mathematics and the sciences and constitutes the basis of most statistical
procedures.
23.2 Matrices: Definition
A matrix is a set of numbers arranged in a table. For example, Toto, Marius,
and Olivette are looking at their possessions, and they are counting how
many balls, cars, coins, and novels they each possess. Toto has 2 balls, 5
cars, 10 coins, and 20 novels. Marius has 1, 2, 3, and 4, and Olivette has 6, 1,
3, and 10. These data can be displayed in a table where each row represents
a person and each column a possession:
           balls   cars   coins   novels
Toto         2       5     10       20
Marius       1       2      3        4
Olivette     6       1      3       10
We can also say that these data are described by the matrix denoted A
equal to:
A =
\begin{bmatrix}
2 & 5 & 10 & 20 \\
1 & 2 & 3 & 4 \\
6 & 1 & 3 & 10
\end{bmatrix} .   (23.1)
Matrices are denoted by boldface uppercase letters.
To identify a specific element of a matrix, we use its row and column
numbers. For example, the cell defined by Row 3 and Column 1 contains
the value 6. We write that a_{3,1} = 6. With this notation, elements of a matrix
are denoted with the same letter as the matrix but written in lowercase
italic. The first subscript always gives the row number of the element (i.e.,
3) and the second subscript always gives its column number (i.e., 1).
A generic element of a matrix is identified with indices such as i and
j. So, a_{i,j} is the element at the i-th row and j-th column of A. The
total number of rows and columns is denoted with the same letters as the
indices but in uppercase letters. The matrix A has I rows (here I = 3)
and J columns (here J = 4) and it is made of I × J elements a_{i,j} (here
3 × 4 = 12). We often use the term dimensions to refer to the number of
rows and columns, so A has dimensions I by J.
As a shortcut, a matrix can be represented by its generic element written
in brackets. So, A with I rows and J columns is denoted:

A = [a_{i,j}] =
\begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,j} & \cdots & a_{1,J} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,j} & \cdots & a_{2,J} \\
\vdots  & \vdots  & \ddots & \vdots  & \ddots & \vdots  \\
a_{i,1} & a_{i,2} & \cdots & a_{i,j} & \cdots & a_{i,J} \\
\vdots  & \vdots  & \ddots & \vdots  & \ddots & \vdots  \\
a_{I,1} & a_{I,2} & \cdots & a_{I,j} & \cdots & a_{I,J}
\end{bmatrix} .   (23.2)
For either convenience or clarity, we can also indicate the number of
rows and columns as subscripts below the matrix name:

A = \underset{I \times J}{A} = [a_{i,j}] .   (23.3)
23.2.1 Vectors
A matrix with one column is called a column vector or simply a vector. Vectors
are denoted with bold lowercase letters. For example, the first column
of matrix A (of Equation 23.1) is a column vector which stores the number
of balls of Toto, Marius, and Olivette. We can call it b (for balls), and so:
b =
\begin{bmatrix}
2 \\ 1 \\ 6
\end{bmatrix} .   (23.4)
Vectors are the building blocks of matrices. For example, A (of Equa-
tion 23.1) is made of four column vectors which represent the number of
balls, cars, coins, and novels, respectively.
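As a practical illustration (an added sketch assuming NumPy, which the chapter itself does not use), the matrix A of Equation 23.1 and the vector b of Equation 23.4 can be represented as arrays:

```python
import numpy as np

# The matrix A of Equation 23.1: rows are persons, columns are possessions.
A = np.array([[2, 5, 10, 20],
              [1, 2,  3,  4],
              [6, 1,  3, 10]])

print(A.shape)   # (3, 4): I = 3 rows, J = 4 columns
print(A[2, 0])   # 6, the element a_{3,1} (NumPy indices start at 0)

# The first column of A is the vector b of Equation 23.4.
b = A[:, 0]
print(b)         # [2 1 6]
```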
23.2.2 Norm of a vector
We can associate to a vector a quantity, related to its variance and standard
deviation, called the norm or length. The norm of a vector is the square
root of the sum of the squares of its elements; it is denoted by putting the
name of the vector between double bars (e.g., ‖x‖). For example, for
x =
\begin{bmatrix}
2 \\ 1 \\ 2
\end{bmatrix} ,   (23.5)

we find

\|x\| = \sqrt{2^2 + 1^2 + 2^2} = \sqrt{4 + 1 + 4} = \sqrt{9} = 3 .   (23.6)
23.2.3 Normalization of a vector
A vector is normalized when its norm is equal to one. To normalize a vector,
we divide each of its elements by its norm. For example, normalizing the
vector x from Equation 23.5 gives:

\frac{x}{\|x\|} =
\begin{bmatrix}
\frac{2}{3} \\ \frac{1}{3} \\ \frac{2}{3}
\end{bmatrix} .   (23.7)
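As a quick numerical check (an added NumPy sketch, not part of the original text), Equations 23.6 and 23.7 can be reproduced as follows:

```python
import numpy as np

x = np.array([2.0, 1.0, 2.0])

norm_x = np.linalg.norm(x)     # sqrt(2**2 + 1**2 + 2**2) = 3.0
x_normalized = x / norm_x      # [2/3, 1/3, 2/3]

print(norm_x)                           # 3.0
print(x_normalized)                     # [0.6667 0.3333 0.6667]
print(np.linalg.norm(x_normalized))     # 1.0, as expected for a normalized vector
```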
23.3 Operations for matrices
23.3.1 Transposition
If we exchange the roles of the rows and the columns of a matrix, we transpose
it. This operation is called transposition, and the new matrix is
called the transposed matrix. The transpose of A is denoted A^T. For example:

if A = \underset{3 \times 4}{A} =
\begin{bmatrix}
2 & 5 & 10 & 20 \\
1 & 2 & 3 & 4 \\
6 & 1 & 3 & 10
\end{bmatrix}
then A^T = \underset{4 \times 3}{A^T} =
\begin{bmatrix}
2 & 1 & 6 \\
5 & 2 & 1 \\
10 & 3 & 3 \\
20 & 4 & 10
\end{bmatrix} .   (23.8)
23.3.2 Addition (sum) of matrices
When two matrices have the same dimensions, we compute their sum by
adding the corresponding elements. For example, with
A =
\begin{bmatrix}
2 & 5 & 10 & 20 \\
1 & 2 & 3 & 4 \\
6 & 1 & 3 & 10
\end{bmatrix}
\quad\text{and}\quad
B =
\begin{bmatrix}
3 & 4 & 5 & 6 \\
2 & 4 & 6 & 8 \\
1 & 2 & 3 & 5
\end{bmatrix} ,   (23.9)
we find

A + B =
\begin{bmatrix}
2+3 & 5+4 & 10+5 & 20+6 \\
1+2 & 2+4 & 3+6 & 4+8 \\
6+1 & 1+2 & 3+3 & 10+5
\end{bmatrix}
=
\begin{bmatrix}
5 & 9 & 15 & 26 \\
3 & 6 & 9 & 12 \\
7 & 3 & 6 & 15
\end{bmatrix} .   (23.10)
In general,

A + B =
\begin{bmatrix}
a_{1,1}+b_{1,1} & a_{1,2}+b_{1,2} & \cdots & a_{1,j}+b_{1,j} & \cdots & a_{1,J}+b_{1,J} \\
a_{2,1}+b_{2,1} & a_{2,2}+b_{2,2} & \cdots & a_{2,j}+b_{2,j} & \cdots & a_{2,J}+b_{2,J} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
a_{i,1}+b_{i,1} & a_{i,2}+b_{i,2} & \cdots & a_{i,j}+b_{i,j} & \cdots & a_{i,J}+b_{i,J} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
a_{I,1}+b_{I,1} & a_{I,2}+b_{I,2} & \cdots & a_{I,j}+b_{I,j} & \cdots & a_{I,J}+b_{I,J}
\end{bmatrix} .   (23.11)
Matrix addition behaves very much like usual addition. Specifically,
matrix addition is commutative (i.e., A + B = B + A) and associative
[i.e., A + (B + C) = (A + B) + C].
23.3.3 Multiplication of a matrix by a scalar
In order to differentiate matrices from the usual numbers, we call the latter
scalar numbers or simply scalars. To multiply a matrix by a scalar, multiply
each element of the matrix by this scalar. For example:
10 \times B = 10 \times
\begin{bmatrix}
3 & 4 & 5 & 6 \\
2 & 4 & 6 & 8 \\
1 & 2 & 3 & 5
\end{bmatrix}
=
\begin{bmatrix}
30 & 40 & 50 & 60 \\
20 & 40 & 60 & 80 \\
10 & 20 & 30 & 50
\end{bmatrix} .   (23.12)
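The operations seen so far map directly onto NumPy; the following added sketch checks Equations 23.8, 23.10, and 23.12:

```python
import numpy as np

A = np.array([[2, 5, 10, 20],
              [1, 2,  3,  4],
              [6, 1,  3, 10]])
B = np.array([[3, 4, 5, 6],
              [2, 4, 6, 8],
              [1, 2, 3, 5]])

print(A.T)       # transpose: a 4 x 3 matrix (Equation 23.8)
print(A + B)     # element-wise sum (Equation 23.10)
print(10 * B)    # multiplication by a scalar (Equation 23.12)

# Matrix addition is commutative.
print(np.array_equal(A + B, B + A))   # True
```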
23.3.4 Multiplication: Product or products?
There are several ways of generalizing the concept of product to matrices.
We will look at the most frequently used of these matrix products. Each
of these products will behave like the product between scalars when the
matrices have dimensions 1 × 1.
23.3.5 Hadamard product
When generalizing the product to matrices, the first approach is to multiply
the corresponding elements of the two matrices that we want to multiply.
This is called the Hadamard product, denoted by ⊙. The Hadamard
product exists only for matrices with the same dimensions. Formally, it is
defined as:
A \odot B = [a_{i,j}\, b_{i,j}]
=
\begin{bmatrix}
a_{1,1}b_{1,1} & a_{1,2}b_{1,2} & \cdots & a_{1,j}b_{1,j} & \cdots & a_{1,J}b_{1,J} \\
a_{2,1}b_{2,1} & a_{2,2}b_{2,2} & \cdots & a_{2,j}b_{2,j} & \cdots & a_{2,J}b_{2,J} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
a_{i,1}b_{i,1} & a_{i,2}b_{i,2} & \cdots & a_{i,j}b_{i,j} & \cdots & a_{i,J}b_{i,J} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
a_{I,1}b_{I,1} & a_{I,2}b_{I,2} & \cdots & a_{I,j}b_{I,j} & \cdots & a_{I,J}b_{I,J}
\end{bmatrix} .   (23.13)
For example, with
A =
\begin{bmatrix}
2 & 5 & 10 & 20 \\
1 & 2 & 3 & 4 \\
6 & 1 & 3 & 10
\end{bmatrix}
\quad\text{and}\quad
B =
\begin{bmatrix}
3 & 4 & 5 & 6 \\
2 & 4 & 6 & 8 \\
1 & 2 & 3 & 5
\end{bmatrix} ,   (23.14)
we get:

A \odot B =
\begin{bmatrix}
2 \times 3 & 5 \times 4 & 10 \times 5 & 20 \times 6 \\
1 \times 2 & 2 \times 4 & 3 \times 6 & 4 \times 8 \\
6 \times 1 & 1 \times 2 & 3 \times 3 & 10 \times 5
\end{bmatrix}
=
\begin{bmatrix}
6 & 20 & 50 & 120 \\
2 & 8 & 18 & 32 \\
6 & 2 & 9 & 50
\end{bmatrix} .   (23.15)
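In NumPy the Hadamard product is simply the element-wise `*` operator; this added sketch reproduces Equation 23.15:

```python
import numpy as np

A = np.array([[2, 5, 10, 20],
              [1, 2,  3,  4],
              [6, 1,  3, 10]])
B = np.array([[3, 4, 5, 6],
              [2, 4, 6, 8],
              [1, 2, 3, 5]])

hadamard = A * B   # element-wise (Hadamard) product; requires equal shapes
print(hadamard)
# [[  6  20  50 120]
#  [  2   8  18  32]
#  [  6   2   9  50]]
```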
23.3.6 Standard (a.k.a. Cayley) product
The Hadamard product is straightforward but, unfortunately, it is not the
matrix product most often used. The most often used product is called the standard or
Cayley product, or simply the product (i.e., when the name of the product
is not specified, this is the standard product). Its definition comes from
the original use of matrices to solve equations. Its definition looks surprising
at first because it is defined only when the number of columns of the
first matrix is equal to the number of rows of the second matrix. When
two matrices can be multiplied together they are called conformable. This
product will have the number of rows of the first matrix and the number
of columns of the second matrix.
So, A with I rows and J columns can be multiplied by B with J rows
and K columns to give C with I rows and K columns. A convenient way
of checking that two matrices are conformable is to write the dimensions
of the matrices as subscripts. For example:
\underset{I \times J}{A}\; \underset{J \times K}{B} = \underset{I \times K}{C} ,   (23.16)

or even:

{}_{I}A_{J}\; {}_{J}B_{K} = \underset{I \times K}{C} .   (23.17)
An element c_{i,k} of the matrix C is computed as:

c_{i,k} = \sum_{j=1}^{J} a_{i,j}\, b_{j,k} .   (23.18)
So, c_{i,k} is the sum of J terms, each term being the product of the corresponding
element of the i-th row of A with the k-th column of B.
For example, let:
A =
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix}
\quad\text{and}\quad
B =
\begin{bmatrix}
1 & 2 \\
3 & 4 \\
5 & 6
\end{bmatrix} .   (23.19)
The product of these matrices is denoted C = A × B = AB (the × sign
can be omitted when the context is clear). To compute c_{2,1} we add 3 terms:
(1) the product of the first element of the second row of A (i.e., 4) with the
first element of the first column of B (i.e., 1); (2) the product of the second
element of the second row of A (i.e., 5) with the second element of the first
column of B (i.e., 3); and (3) the product of the third element of the second
row of A (i.e., 6) with the third element of the first column of B (i.e., 5).
Formally, the term c_{2,1} is obtained as

c_{2,1} = \sum_{j=1}^{J=3} a_{2,j}\, b_{j,1}
= (a_{2,1} \times b_{1,1}) + (a_{2,2} \times b_{2,1}) + (a_{2,3} \times b_{3,1})
= (4 \times 1) + (5 \times 3) + (6 \times 5)
= 49 .   (23.20)
Matrix C is obtained as:

AB = C = [c_{i,k}] = \left[\sum_{j=1}^{J=3} a_{i,j}\, b_{j,k}\right]
=
\begin{bmatrix}
1 \times 1 + 2 \times 3 + 3 \times 5 & 1 \times 2 + 2 \times 4 + 3 \times 6 \\
4 \times 1 + 5 \times 3 + 6 \times 5 & 4 \times 2 + 5 \times 4 + 6 \times 6
\end{bmatrix}
=
\begin{bmatrix}
22 & 28 \\
49 & 64
\end{bmatrix} .   (23.21)
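For reference (an added sketch, not part of the original chapter), the standard product of Equation 23.21 can be computed with NumPy's `@` operator:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 2],
              [3, 4],
              [5, 6]])

C = A @ B        # standard (Cayley) product: (2 x 3) times (3 x 2) gives 2 x 2
print(C)
# [[22 28]
#  [49 64]]

print(C[1, 0])   # 49, the element c_{2,1} of Equation 23.20
```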
23.3.6.1 Properties of the product
Like the product between scalars, the product between matrices is associative
and distributive relative to addition. Specifically, for any set of three
conformable matrices A, B, and C:

(AB)C = A(BC) = ABC \quad (\text{associativity})   (23.22)
A(B + C) = AB + AC \quad (\text{distributivity}).   (23.23)

The matrix products AB and BA do not always exist, but when they
do, these products are not, in general, commutative:

AB \neq BA .   (23.24)
For example, with

A =
\begin{bmatrix}
2 & -1 \\
2 & -1
\end{bmatrix}
\quad\text{and}\quad
B =
\begin{bmatrix}
1 & 1 \\
2 & 2
\end{bmatrix}   (23.25)
we get:

AB =
\begin{bmatrix}
2 & -1 \\
2 & -1
\end{bmatrix}
\begin{bmatrix}
1 & 1 \\
2 & 2
\end{bmatrix}
=
\begin{bmatrix}
0 & 0 \\
0 & 0
\end{bmatrix} .   (23.26)
But

BA =
\begin{bmatrix}
1 & 1 \\
2 & 2
\end{bmatrix}
\begin{bmatrix}
2 & -1 \\
2 & -1
\end{bmatrix}
=
\begin{bmatrix}
4 & -2 \\
8 & -4
\end{bmatrix} .   (23.27)
Incidentally, we can combine transposition and product and get the following
equation:

(AB)^T = B^T A^T .   (23.28)
23.3.7 Exotic product: Kronecker
Another product is the Kronecker product, also called the direct, tensor, or
Zehfuss product. It is denoted ⊗ and is defined for all matrices. Specifically,
with two matrices A = [a_{i,j}] (with dimensions I by J) and B (with
dimensions K by L), the Kronecker product gives a matrix C (with dimensions
(I × K) by (J × L)) defined as:
A \otimes B =
\begin{bmatrix}
a_{1,1}B & a_{1,2}B & \cdots & a_{1,j}B & \cdots & a_{1,J}B \\
a_{2,1}B & a_{2,2}B & \cdots & a_{2,j}B & \cdots & a_{2,J}B \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
a_{i,1}B & a_{i,2}B & \cdots & a_{i,j}B & \cdots & a_{i,J}B \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
a_{I,1}B & a_{I,2}B & \cdots & a_{I,j}B & \cdots & a_{I,J}B
\end{bmatrix} .   (23.29)
For example, with
A =
\begin{bmatrix}
1 & 2 & 3
\end{bmatrix}
\quad\text{and}\quad
B =
\begin{bmatrix}
6 & 7 \\
8 & 9
\end{bmatrix}   (23.30)
we get:

A \otimes B =
\begin{bmatrix}
1 \times 6 & 1 \times 7 & 2 \times 6 & 2 \times 7 & 3 \times 6 & 3 \times 7 \\
1 \times 8 & 1 \times 9 & 2 \times 8 & 2 \times 9 & 3 \times 8 & 3 \times 9
\end{bmatrix}
=
\begin{bmatrix}
6 & 7 & 12 & 14 & 18 & 21 \\
8 & 9 & 16 & 18 & 24 & 27
\end{bmatrix} .   (23.31)
The Kronecker product is used to write design matrices. It is an essential
tool for the derivation of expected values and sampling distributions.
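NumPy provides `np.kron` for the Kronecker product; the following added sketch reproduces Equation 23.31:

```python
import numpy as np

A = np.array([[1, 2, 3]])   # a 1 x 3 matrix
B = np.array([[6, 7],
              [8, 9]])      # a 2 x 2 matrix

C = np.kron(A, B)           # Kronecker product: (1*2) x (3*2) = 2 x 6
print(C)
# [[ 6  7 12 14 18 21]
#  [ 8  9 16 18 24 27]]
```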
23.4 Special matrices
Certain special matrices have specific names.
23.4.1 Square and rectangular matrices
A matrix with the same number of rows and columns is a square matrix.
By contrast, a matrix with different numbers of rows and columns is a
rectangular matrix. So:
A =
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 5 \\
7 & 8 & 0
\end{bmatrix}   (23.32)
is a square matrix, but
B =
\begin{bmatrix}
1 & 2 \\
4 & 5 \\
7 & 8
\end{bmatrix}   (23.33)
is a rectangular matrix.
23.4.2 Symmetric matrix
A square matrix A with a_{i,j} = a_{j,i} is symmetric. So:
A =
\begin{bmatrix}
10 & 2 & 3 \\
2 & 20 & 5 \\
3 & 5 & 30
\end{bmatrix}   (23.34)
is symmetric, but
A =
\begin{bmatrix}
12 & 2 & 3 \\
4 & 20 & 5 \\
7 & 8 & 30
\end{bmatrix}   (23.35)
is not.
Note that for a symmetric matrix:
A = A^T .   (23.36)
A common mistake is to assume that the standard product of two symmetric
matrices is commutative. But this is not true, as shown by the following
example, with:
A =
\begin{bmatrix}
1 & 2 & 3 \\
2 & 1 & 4 \\
3 & 4 & 1
\end{bmatrix}
\quad\text{and}\quad
B =
\begin{bmatrix}
1 & 1 & 2 \\
1 & 1 & 3 \\
2 & 3 & 1
\end{bmatrix} .   (23.37)
We get

AB =
\begin{bmatrix}
9 & 12 & 11 \\
11 & 15 & 11 \\
9 & 10 & 19
\end{bmatrix} ,
\quad\text{but}\quad
BA =
\begin{bmatrix}
9 & 11 & 9 \\
12 & 15 & 10 \\
11 & 11 & 19
\end{bmatrix} .   (23.38)
Note, however, that combining Equations 23.28 and 23.36 gives, for symmetric
matrices A and B, the following equation:

AB = (BA)^T .   (23.39)
23.4.3 Diagonal matrix
A square matrix is diagonal when all its elements, except the ones on the
diagonal, are zero. Formally, a matrix is diagonal if a_{i,j} = 0 when i \neq j. So:
A =
\begin{bmatrix}
10 & 0 & 0 \\
0 & 20 & 0 \\
0 & 0 & 30
\end{bmatrix}
\quad\text{is diagonal} .   (23.40)
Because only the diagonal elements matter for a diagonal matrix, we
just need to specify them. This is done with the following notation:
A = diag\{[a_{1,1}, \ldots, a_{i,i}, \ldots, a_{I,I}]\} = diag\{[a_{i,i}]\} .   (23.41)
For example, the previous matrix can be rewritten as:
A =
\begin{bmatrix}
10 & 0 & 0 \\
0 & 20 & 0 \\
0 & 0 & 30
\end{bmatrix}
= diag\{[10, 20, 30]\} .   (23.42)
The operator diag can also be used to isolate the diagonal of any square
matrix. For example, with:
A =
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{bmatrix}   (23.43)
we get:

diag\{A\} = diag\left\{
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{bmatrix}\right\}
=
\begin{bmatrix}
1 \\ 5 \\ 9
\end{bmatrix} .   (23.44)
Note, incidentally, that:

diag\{diag\{A\}\} =
\begin{bmatrix}
1 & 0 & 0 \\
0 & 5 & 0 \\
0 & 0 & 9
\end{bmatrix} .   (23.45)
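NumPy's `np.diag` behaves like the diag operator described here: applied to a matrix it extracts the diagonal, and applied to a vector it builds a diagonal matrix (an added sketch mirroring Equations 23.44 and 23.45):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

d = np.diag(A)   # extract the diagonal: [1 5 9]       (Equation 23.44)
D = np.diag(d)   # build a diagonal matrix from it     (Equation 23.45)

print(d)
print(D)
# [[1 0 0]
#  [0 5 0]
#  [0 0 9]]
```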
23.4.4 Multiplication by a diagonal matrix
Diagonal matrices are often used to multiply by a scalar all the elements of
a given row or column. Specifically, when we pre-multiply a matrix by a diagonal
matrix, the elements of each row of the second matrix are multiplied
by the corresponding diagonal element. Likewise, when we post-multiply
a matrix by a diagonal matrix, the elements of each column of the first matrix
are multiplied by the corresponding diagonal element. For example, with:
A =
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix} ,
\quad
B =
\begin{bmatrix}
2 & 0 \\
0 & 5
\end{bmatrix} ,
\quad
C =
\begin{bmatrix}
2 & 0 & 0 \\
0 & 4 & 0 \\
0 & 0 & 6
\end{bmatrix} ,   (23.46)
we get

BA =
\begin{bmatrix}
2 & 0 \\
0 & 5
\end{bmatrix}
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix}
=
\begin{bmatrix}
2 & 4 & 6 \\
20 & 25 & 30
\end{bmatrix}   (23.47)
and

AC =
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix}
\begin{bmatrix}
2 & 0 & 0 \\
0 & 4 & 0 \\
0 & 0 & 6
\end{bmatrix}
=
\begin{bmatrix}
2 & 8 & 18 \\
8 & 20 & 36
\end{bmatrix}   (23.48)
and also

BAC =
\begin{bmatrix}
2 & 0 \\
0 & 5
\end{bmatrix}
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix}
\begin{bmatrix}
2 & 0 & 0 \\
0 & 4 & 0 \\
0 & 0 & 6
\end{bmatrix}
=
\begin{bmatrix}
4 & 16 & 36 \\
40 & 100 & 180
\end{bmatrix} .   (23.49)
23.4.5 Identity matrix
A diagonal matrix whose diagonal elements are all equal to 1 is called an
identity matrix and is denoted I. If we need to specify its dimensions, we
use subscripts such as

\underset{3 \times 3}{I} = I =
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\quad\text{(this is a } 3 \times 3 \text{ identity matrix).}   (23.50)
The identity matrix is the neutral element for the standard product. So:
I A = A I = A   (23.51)

for any matrix A conformable with I. For example:
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 5 \\
7 & 8 & 0
\end{bmatrix}
=
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 5 \\
7 & 8 & 0
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
=
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 5 \\
7 & 8 & 0
\end{bmatrix} .   (23.52)
23.4.6 Matrix full of ones
A matrix whose elements are all equal to 1 is denoted 1 or, when we need
to specify its dimensions, by \underset{I \times J}{1}. These matrices are neutral elements for
the Hadamard product. So:
\underset{2 \times 3}{A} \odot \underset{2 \times 3}{1} =
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix}
\odot
\begin{bmatrix}
1 & 1 & 1 \\
1 & 1 & 1
\end{bmatrix}   (23.53)

=
\begin{bmatrix}
1 \times 1 & 2 \times 1 & 3 \times 1 \\
4 \times 1 & 5 \times 1 & 6 \times 1
\end{bmatrix}
=
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix} .   (23.54)
Matrices of ones can also be used to compute sums of rows or columns:

\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
= (1 \times 1) + (2 \times 1) + (3 \times 1) = 1 + 2 + 3 = 6 ,   (23.55)
or also

\begin{bmatrix} 1 & 1 \end{bmatrix}
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix}
= \begin{bmatrix} 5 & 7 & 9 \end{bmatrix} .   (23.56)
23.4.7 Matrix full of zeros
A matrix whose elements are all equal to 0 is the null or zero matrix. It
is denoted by 0 or, when we need to specify its dimensions, by \underset{I \times J}{0}. Null
matrices are neutral elements for addition:
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}
+ \underset{2 \times 2}{0}
=
\begin{bmatrix}
1+0 & 2+0 \\
3+0 & 4+0
\end{bmatrix}
=
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix} .   (23.57)
They are also null elements for the Hadamard product.
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}
\odot \underset{2 \times 2}{0}
=
\begin{bmatrix}
1 \times 0 & 2 \times 0 \\
3 \times 0 & 4 \times 0
\end{bmatrix}
=
\begin{bmatrix}
0 & 0 \\
0 & 0
\end{bmatrix}
= \underset{2 \times 2}{0}   (23.58)
and for the standard product:

\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}
\; \underset{2 \times 2}{0}
=
\begin{bmatrix}
1 \times 0 + 2 \times 0 & 1 \times 0 + 2 \times 0 \\
3 \times 0 + 4 \times 0 & 3 \times 0 + 4 \times 0
\end{bmatrix}
=
\begin{bmatrix}
0 & 0 \\
0 & 0
\end{bmatrix}
= \underset{2 \times 2}{0} .   (23.59)
23.4.8 Triangular matrix
A matrix is lower triangular when a_{i,j} = 0 for i < j. A matrix is upper
triangular when a_{i,j} = 0 for i > j. For example:
A =
\begin{bmatrix}
10 & 0 & 0 \\
2 & 20 & 0 \\
3 & 5 & 30
\end{bmatrix}
\quad\text{is lower triangular,}   (23.60)
and
B =
\begin{bmatrix}
12 & 2 & 3 \\
0 & 20 & 5 \\
0 & 0 & 30
\end{bmatrix}
\quad\text{is upper triangular.}   (23.61)
23.4.9 Cross-product matrix
A cross-product matrix is obtained by multiplication of a matrix by its transpose.
Therefore a cross-product matrix is square and symmetric. For example,
the matrix:
A =
\begin{bmatrix}
1 & 1 \\
2 & 4 \\
3 & 4
\end{bmatrix}   (23.62)
pre-multiplied by its transpose
A^T =
\begin{bmatrix}
1 & 2 & 3 \\
1 & 4 & 4
\end{bmatrix}   (23.63)
gives the cross-product matrix:

A^T A =
\begin{bmatrix}
1 \times 1 + 2 \times 2 + 3 \times 3 & 1 \times 1 + 2 \times 4 + 3 \times 4 \\
1 \times 1 + 4 \times 2 + 4 \times 3 & 1 \times 1 + 4 \times 4 + 4 \times 4
\end{bmatrix}
=
\begin{bmatrix}
14 & 21 \\
21 & 33
\end{bmatrix} .   (23.64)
23.4.9.1 A particular case of cross-product matrix:
Variance/Covariance
A particular case of cross-product matrices are correlation or covariance
matrices. A variance/covariance matrix is obtained from a data matrix
with three steps: (1) subtract the mean of each column from each element
of this column (this is centering); (2) compute the cross-product matrix
from the centered matrix; and (3) divide each element of the cross-product
matrix by the number of rows of the data matrix. For example, if we take
the I = 3 by J = 2 matrix A:
A =
\begin{bmatrix}
2 & 1 \\
5 & 10 \\
8 & 10
\end{bmatrix} ,   (23.65)
we obtain the means of each column as:

m = \frac{1}{I}\, \underset{1 \times I}{1}\, \underset{I \times J}{A}
= \frac{1}{3}
\begin{bmatrix} 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix}
2 & 1 \\
5 & 10 \\
8 & 10
\end{bmatrix}
= \begin{bmatrix} 5 & 7 \end{bmatrix} .   (23.66)
To center the matrix we subtract the mean of each column from all its elements.
This centered matrix gives the deviations of each element from the
mean of its column. Centering is performed as:
D = A - \underset{I \times 1}{1}\, m =
\begin{bmatrix}
2 & 1 \\
5 & 10 \\
8 & 10
\end{bmatrix}
-
\begin{bmatrix}
1 \\ 1 \\ 1
\end{bmatrix}
\begin{bmatrix} 5 & 7 \end{bmatrix}   (23.67)

=
\begin{bmatrix}
2 & 1 \\
5 & 10 \\
8 & 10
\end{bmatrix}
-
\begin{bmatrix}
5 & 7 \\
5 & 7 \\
5 & 7
\end{bmatrix}
=
\begin{bmatrix}
-3 & -6 \\
0 & 3 \\
3 & 3
\end{bmatrix} .   (23.68)
We denote by S the variance/covariance matrix derived from A; it is computed
as:

S = \frac{1}{I}\, D^T D
= \frac{1}{3}
\begin{bmatrix}
-3 & 0 & 3 \\
-6 & 3 & 3
\end{bmatrix}
\begin{bmatrix}
-3 & -6 \\
0 & 3 \\
3 & 3
\end{bmatrix}
= \frac{1}{3}
\begin{bmatrix}
18 & 27 \\
27 & 54
\end{bmatrix}
=
\begin{bmatrix}
6 & 9 \\
9 & 18
\end{bmatrix} .   (23.69)
(Variances are on the diagonal, covariances are off-diagonal.)
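The three steps above translate directly into NumPy (an added sketch; note that dividing by I, as Equation 23.69 does, gives the "population" covariance):

```python
import numpy as np

A = np.array([[2.0,  1.0],
              [5.0, 10.0],
              [8.0, 10.0]])
I = A.shape[0]

m = A.mean(axis=0)    # column means: [5. 7.]
D = A - m             # centered data matrix (Equation 23.68)
S = (D.T @ D) / I     # variance/covariance matrix (Equation 23.69)

print(S)
# [[ 6.  9.]
#  [ 9. 18.]]

# np.cov with bias=True (divide by I rather than I - 1) gives the same result.
print(np.cov(A, rowvar=False, bias=True))
```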
23.5 The inverse of a square matrix
An operation similar to division exists, but only for (some) square matrices.
This operation uses the notion of inverse operation and defines the
inverse of a matrix. The inverse is defined by analogy with the scalar number
case, for which division actually corresponds to multiplication by the
inverse, namely:

\frac{a}{b} = a \times b^{-1} \quad\text{with}\quad b \times b^{-1} = 1 .   (23.70)
The inverse of a square matrix A is denoted A^{-1}. It has the following
property:

A A^{-1} = A^{-1} A = I .   (23.71)

The definition of the inverse of a matrix is simple, but its computation is
complicated and is best left to computers.
For example, for:

A =
\begin{bmatrix}
1 & 2 & 1 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix} ,   (23.72)

the inverse is:

A^{-1} =
\begin{bmatrix}
1 & -2 & -1 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix} .   (23.73)
Not all square matrices have an inverse. The inverse of
a matrix does not exist if the rows (and the columns) of this matrix are
linearly dependent. For example,
A =
\begin{bmatrix}
3 & 4 & 2 \\
1 & 0 & 2 \\
2 & 1 & 3
\end{bmatrix} ,   (23.74)
does not have an inverse since the second column is a linear combination
of the two other columns:

\begin{bmatrix} 4 \\ 0 \\ 1 \end{bmatrix}
= 2
\begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}
-
\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}
=
\begin{bmatrix} 6 \\ 2 \\ 4 \end{bmatrix}
-
\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix} .   (23.75)
A matrix without an inverse is called singular. When A^{-1} exists, it is unique.
Inverse matrices are used for solving linear equations and least squares
problems in multiple regression analysis or analysis of variance.
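As an added numerical sketch, NumPy computes inverses with `np.linalg.inv`; the singular matrix of Equation 23.74 can be recognized by its (numerically) zero determinant:

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

A_inv = np.linalg.inv(A)    # the inverse shown in Equation 23.73
print(A_inv)
print(A @ A_inv)            # the identity matrix, up to rounding error

# The matrix of Equation 23.74 has linearly dependent columns,
# so its determinant is zero and it has no inverse.
B = np.array([[3.0, 4.0, 2.0],
              [1.0, 0.0, 2.0],
              [2.0, 1.0, 3.0]])
print(np.linalg.det(B))     # approximately 0: B is singular
```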
23.5.1 Inverse of a diagonal matrix
The inverse of a diagonal matrix is easy to compute: the inverse of

A = diag\{a_{i,i}\}   (23.76)

is the diagonal matrix

A^{-1} = diag\{a_{i,i}^{-1}\} = diag\{1/a_{i,i}\} .   (23.77)
For example,

\begin{bmatrix}
1 & 0 & 0 \\
0 & .5 & 0 \\
0 & 0 & 4
\end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix}
1 & 0 & 0 \\
0 & 2 & 0 \\
0 & 0 & .25
\end{bmatrix} ,   (23.78)

are the inverses of each other.
23.6 The Big tool: eigendecomposition
So far, matrix operations are very similar to operations with numbers. The
next notion is specific to matrices: the idea of decomposing a matrix
into simpler matrices. A lot of the power of matrices follows from this.
A first decomposition is called the eigendecomposition; it applies only
to square matrices. The generalization of the eigendecomposition to rectangular
matrices is called the singular value decomposition.
Eigenvectors and eigenvalues are vectors and numbers associated with
square matrices; together they constitute the eigendecomposition. Even
though the eigendecomposition does not exist for all square matrices, it
has a particularly simple expression for a class of matrices often used in
multivariate analysis such as correlation, covariance, or cross-product matrices.
The eigendecomposition of these matrices is important in statistics
because it is used to find the maximum (or minimum) of functions involving
these matrices. For example, principal component analysis is obtained
from the eigendecomposition of a covariance or correlation matrix and
gives the least squares estimate of the original data matrix.
23.6.1 Notations and definition
An eigenvector of matrix A is a vector u that satisfies the following equation:

A u = \lambda u ,   (23.79)

where \lambda is a scalar called the eigenvalue associated to the eigenvector. When
rewritten, Equation 23.79 becomes:

(A - \lambda I) u = 0 .   (23.80)
Therefore u is an eigenvector of A if the multiplication of u by A changes
the length of u but not its orientation. For example,

A =
\begin{bmatrix}
2 & 3 \\
2 & 1
\end{bmatrix}   (23.81)
has for eigenvectors:

u_1 =
\begin{bmatrix} 3 \\ 2 \end{bmatrix}
\quad\text{with eigenvalue}\quad \lambda_1 = 4   (23.82)

and

u_2 =
\begin{bmatrix} -1 \\ 1 \end{bmatrix}
\quad\text{with eigenvalue}\quad \lambda_2 = -1 .   (23.83)
When u_1 and u_2 are multiplied by A, only their length changes. That is,

A u_1 = \lambda_1 u_1 =
\begin{bmatrix}
2 & 3 \\
2 & 1
\end{bmatrix}
\begin{bmatrix} 3 \\ 2 \end{bmatrix}
=
\begin{bmatrix} 12 \\ 8 \end{bmatrix}
= 4 \begin{bmatrix} 3 \\ 2 \end{bmatrix}   (23.84)
and

A u_2 = \lambda_2 u_2 =
\begin{bmatrix}
2 & 3 \\
2 & 1
\end{bmatrix}
\begin{bmatrix} -1 \\ 1 \end{bmatrix}
=
\begin{bmatrix} 1 \\ -1 \end{bmatrix}
= -1 \begin{bmatrix} -1 \\ 1 \end{bmatrix} .   (23.85)
This is illustrated in Figure 23.1.
For convenience, eigenvectors are generally normalized such that:

u^T u = 1 .   (23.86)

For the previous example, normalizing the eigenvectors gives:

u_1 =
\begin{bmatrix} .8321 \\ .5547 \end{bmatrix}
\quad\text{and}\quad
u_2 =
\begin{bmatrix} -.7071 \\ .7071 \end{bmatrix} .   (23.87)
Figure 23.1: Two eigenvectors of a matrix.
We can check that:

\begin{bmatrix}
2 & 3 \\
2 & 1
\end{bmatrix}
\begin{bmatrix} .8321 \\ .5547 \end{bmatrix}
=
\begin{bmatrix} 3.3284 \\ 2.2188 \end{bmatrix}
= 4 \begin{bmatrix} .8321 \\ .5547 \end{bmatrix}   (23.88)
and

\begin{bmatrix}
2 & 3 \\
2 & 1
\end{bmatrix}
\begin{bmatrix} -.7071 \\ .7071 \end{bmatrix}
=
\begin{bmatrix} .7071 \\ -.7071 \end{bmatrix}
= -1 \begin{bmatrix} -.7071 \\ .7071 \end{bmatrix} .   (23.89)
23.6.2 Eigenvector and eigenvalue matrices
Traditionally, we store the eigenvectors of A as the columns of a matrix denoted
U. Eigenvalues are stored in a diagonal matrix (denoted \Lambda). Therefore,
Equation 23.79 becomes:

A U = U \Lambda .   (23.90)
For example, with A (from Equation 23.81), we have

\begin{bmatrix}
2 & 3 \\
2 & 1
\end{bmatrix}
\begin{bmatrix}
3 & -1 \\
2 & 1
\end{bmatrix}
=
\begin{bmatrix}
3 & -1 \\
2 & 1
\end{bmatrix}
\begin{bmatrix}
4 & 0 \\
0 & -1
\end{bmatrix} .   (23.91)
23.6.3 Reconstitution of a matrix
The eigendecomposition can also be used to build back a matrix from its
eigenvectors and eigenvalues. This is shown by rewriting Equation 23.90
as

A = U \Lambda U^{-1} .   (23.92)
For example, because

U^{-1} =
\begin{bmatrix}
.2 & .2 \\
-.4 & .6
\end{bmatrix} ,

we obtain:

A = U \Lambda U^{-1} =
\begin{bmatrix}
3 & -1 \\
2 & 1
\end{bmatrix}
\begin{bmatrix}
4 & 0 \\
0 & -1
\end{bmatrix}
\begin{bmatrix}
.2 & .2 \\
-.4 & .6
\end{bmatrix}
=
\begin{bmatrix}
2 & 3 \\
2 & 1
\end{bmatrix} .   (23.93)
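As an added illustration, `np.linalg.eig` returns the eigenvalues and the (normalized) eigenvectors, from which the matrix can be rebuilt as in Equation 23.92:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

eigenvalues, U = np.linalg.eig(A)   # eigenvalues 4 and -1 (order not guaranteed)
Lam = np.diag(eigenvalues)          # the diagonal matrix of eigenvalues

print(eigenvalues)
print(U)                            # normalized eigenvectors, one per column

# Reconstitution of A from its eigendecomposition (Equation 23.92).
A_rebuilt = U @ Lam @ np.linalg.inv(U)
print(np.allclose(A, A_rebuilt))    # True
```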
23.6.4 Digression:
An infinity of eigenvectors for one eigenvalue

It is only through a slight abuse of language that we talk about the eigenvector
associated with one eigenvalue. Any scalar multiple of an eigenvector
is an eigenvector, so for each eigenvalue there is an infinite number of
eigenvectors, all proportional to each other. For example,
\begin{bmatrix} -1 \\ 1 \end{bmatrix}   (23.94)

is an eigenvector of the matrix A:

\begin{bmatrix}
2 & 3 \\
2 & 1
\end{bmatrix} .   (23.95)
Therefore:

2 \times
\begin{bmatrix} -1 \\ 1 \end{bmatrix}
=
\begin{bmatrix} -2 \\ 2 \end{bmatrix}   (23.96)
is also an eigenvector of A:

\begin{bmatrix}
2 & 3 \\
2 & 1
\end{bmatrix}
\begin{bmatrix} -2 \\ 2 \end{bmatrix}
=
\begin{bmatrix} 2 \\ -2 \end{bmatrix}
= -1 \times 2
\begin{bmatrix} -1 \\ 1 \end{bmatrix} .   (23.97)
23.6.5 Positive (semi-)definite matrices
Some matrices, such as \bigl[\begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix}\bigr], do not have an eigendecomposition. Fortunately, the
matrices used most often in statistics belong to a category called positive semi-definite.
The eigendecomposition of these matrices always exists and has
a particularly convenient form. A matrix is positive semi-definite when it
can be obtained as the product of a matrix by its transpose. This implies
that a positive semi-definite matrix is always symmetric. So, formally, the
matrix A is positive semi-definite if it can be obtained as:

A = X X^T   (23.98)

for a certain matrix X. Positive semi-definite matrices include correlation,
covariance, and cross-product matrices.
The eigenvalues of a positive semi-definite matrix are always positive
or null. Its eigenvectors are composed of real values and are pairwise orthogonal
when their eigenvalues are different. This implies the following
equality:

U^{-1} = U^T .   (23.99)

We can, therefore, express the positive semi-definite matrix A as:

A = U \Lambda U^T   (23.100)

where U is the matrix of the normalized eigenvectors (with U^T U = I).
For example,

A =
\begin{bmatrix}
3 & 1 \\
1 & 3
\end{bmatrix}   (23.101)
can be decomposed as:

A = U \Lambda U^T
=
\begin{bmatrix}
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}
\end{bmatrix}
\begin{bmatrix}
4 & 0 \\
0 & 2
\end{bmatrix}
\begin{bmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\
-\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}
\end{bmatrix}
=
\begin{bmatrix}
3 & 1 \\
1 & 3
\end{bmatrix} ,   (23.102)
with

U^T U =
\begin{bmatrix}
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\
-\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}
\end{bmatrix}
\begin{bmatrix}
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix} .   (23.103)
23.6.5.1 Diagonalization
When a matrix is positive semi-definite, we can rewrite Equation 23.100 as

A = U \Lambda U^T \quad\Longleftrightarrow\quad \Lambda = U^T A U .   (23.104)

This shows that we can transform A into a diagonal matrix. Therefore the
eigendecomposition of a positive semi-definite matrix is often called its
diagonalization.
23.6.5.2 Another denition for positive semi-denite matrices
A matrix A is positive semi-definite if for any non-zero vector x we have:

x^T A x \geq 0 \quad \forall\, x .   (23.105)

When all the eigenvalues of a matrix are positive, the matrix is positive
definite. In that case, Equation 23.105 becomes:

x^T A x > 0 \quad \forall\, x .   (23.106)
23.6.6 Trace, Determinant, etc.
The eigenvalues of a matrix are closely related to three important numbers
associated to a square matrix: the trace, the determinant, and the rank.
23.6.6.1 Trace
The trace of A, denoted trace{A}, is the sum of its diagonal elements. For
example, with:
A =
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{bmatrix}   (23.107)
we obtain:
trace{A} = 1 + 5 + 9 = 15 . (23.108)
The trace of a matrix is also equal to the sum of its eigenvalues:

trace\{A\} = \sum_{\ell} \lambda_{\ell} = trace\{\Lambda\}   (23.109)

with \Lambda being the diagonal matrix of the eigenvalues of A. For the previous example,
we have:

\Lambda = diag\{16.1168, -1.1168, 0\} .   (23.110)

We can verify that:

trace\{A\} = 16.1168 - 1.1168 + 0 = 15 .

The eigendecomposition is also the tool used to find the orthogonal weight
matrix Q that gives factor scores F = XQ with maximal variance for a data
matrix X, under the constraint that

Q^T Q = I .   (23.118)
This amounts to defining the following equation

L = trace\left\{F^T F - \Lambda\left(Q^T Q - I\right)\right\}
= trace\left\{Q^T X^T X Q - \Lambda\left(Q^T Q - I\right)\right\} ,   (23.119)

where \Lambda is a diagonal matrix of Lagrange multipliers.
The values of Q which give the maximum values of L are found by first
computing the derivative of L relative to Q:

\frac{\partial L}{\partial Q} = 2 X^T X Q - 2 Q \Lambda ,   (23.120)

and setting this derivative to zero:

X^T X Q - Q \Lambda = 0 \quad\Longleftrightarrow\quad X^T X Q = Q \Lambda .   (23.121)

Because \Lambda is diagonal, this is an eigendecomposition problem: \Lambda is
the matrix of eigenvalues of the positive semi-definite matrix X^T X, ordered
from the largest to the smallest, and Q is the matrix of eigenvectors of X^T X.
Finally, the factor matrix is

F = X Q .   (23.122)
The variance of the factor scores is equal to the eigenvalues:

F^T F = Q^T X^T X Q = \Lambda .   (23.123)

Because the sum of the eigenvalues is equal to the trace of X^T X, the first
factor scores extract as much of the variance of the original data as possible,
the second factor scores extract as much of the variance left unexplained
by the first factor, and so on for the remaining factors. The diagonal
elements of the matrix \Lambda^{\frac{1}{2}}, which are the standard deviations of the
factor scores, are called the singular values of X.
23.7 A tool for rectangular matrices:
The singular value decomposition
The singular value decomposition (SVD) generalizes the eigendecomposition
to rectangular matrices. The eigendecomposition decomposes a matrix
into two simple matrices, and the SVD decomposes a rectangular matrix
into three simple matrices: two orthogonal matrices and one diagonal
matrix. The SVD uses the eigendecomposition of a positive semi-definite
matrix to derive a similar decomposition for rectangular matrices.
23.7.1 Definitions and notations
The SVD decomposes matrix A as:

A = P \Delta Q^T ,   (23.124)

where P is the (normalized) matrix of eigenvectors of the matrix A A^T (i.e., P^T P =
I); the columns of P are called the left singular vectors of A. Q is the
(normalized) matrix of eigenvectors of the matrix A^T A (i.e., Q^T Q = I); the columns
of Q are called the right singular vectors of A. \Delta is the diagonal matrix
of the singular values, \Delta = \Lambda^{\frac{1}{2}}, with \Lambda being the diagonal matrix of the
eigenvalues of A A^T and of A^T A.
The SVD is derived from the eigendecomposition of a positive semi-definite
matrix. This is shown by considering the eigendecomposition of
the two positive semi-definite matrices obtained from A: namely A A^T
and A^T A. If we express these matrices in terms of the SVD of A, we find:

A A^T = P \Delta Q^T Q \Delta P^T = P \Delta^2 P^T = P \Lambda P^T ,   (23.125)

and

A^T A = Q \Delta P^T P \Delta Q^T = Q \Delta^2 Q^T = Q \Lambda Q^T .   (23.126)

This shows that \Delta is the square root of \Lambda, that P is the matrix of eigenvectors of
A A^T, and that Q is the matrix of eigenvectors of A^T A.
For example, the matrix:
A =
\begin{bmatrix}
1.1547 & 1.1547 \\
1.0774 & 0.0774 \\
0.0774 & 1.0774
\end{bmatrix}   (23.127)
can be expressed as:

A = P \Delta Q^T
=
\begin{bmatrix}
0.8165 & 0 \\
0.4082 & 0.7071 \\
0.4082 & -0.7071
\end{bmatrix}
\begin{bmatrix}
2 & 0 \\
0 & 1
\end{bmatrix}
\begin{bmatrix}
0.7071 & 0.7071 \\
0.7071 & -0.7071
\end{bmatrix}
=
\begin{bmatrix}
1.1547 & 1.1547 \\
1.0774 & 0.0774 \\
0.0774 & 1.0774
\end{bmatrix} .   (23.128)
We can check that:

A A^T =
\begin{bmatrix}
0.8165 & 0 \\
0.4082 & 0.7071 \\
0.4082 & -0.7071
\end{bmatrix}
\begin{bmatrix}
2^2 & 0 \\
0 & 1^2
\end{bmatrix}
\begin{bmatrix}
0.8165 & 0.4082 & 0.4082 \\
0 & 0.7071 & -0.7071
\end{bmatrix}
=
\begin{bmatrix}
2.6667 & 1.3333 & 1.3333 \\
1.3333 & 1.1667 & 0.1667 \\
1.3333 & 0.1667 & 1.1667
\end{bmatrix}   (23.129)
and that:

A^T A =
\begin{bmatrix}
0.7071 & 0.7071 \\
0.7071 & -0.7071
\end{bmatrix}
\begin{bmatrix}
2^2 & 0 \\
0 & 1^2
\end{bmatrix}
\begin{bmatrix}
0.7071 & 0.7071 \\
0.7071 & -0.7071
\end{bmatrix}
=
\begin{bmatrix}
2.5 & 1.5 \\
1.5 & 2.5
\end{bmatrix} .   (23.130)
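`np.linalg.svd` returns the same three matrices (up to the usual sign conventions for singular vectors); the added sketch below checks the decomposition of Equation 23.128:

```python
import numpy as np

A = np.array([[1.1547, 1.1547],
              [1.0774, 0.0774],
              [0.0774, 1.0774]])

P, delta, Qt = np.linalg.svd(A, full_matrices=False)

print(delta)                          # singular values, approximately [2. 1.]
print(P)                              # left singular vectors (columns), up to sign
print(Qt)                             # right singular vectors (rows of Q^T), up to sign

A_rebuilt = P @ np.diag(delta) @ Qt   # A = P Delta Q^T (Equation 23.124)
print(np.allclose(A, A_rebuilt))      # True
```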
23.7.2 Generalized or pseudo-inverse
The inverse of a matrix is defined only for full rank square matrices. The
generalization of the inverse to other matrices is called the generalized inverse,
pseudo-inverse, or Moore–Penrose inverse, and is denoted by A^+. The
pseudo-inverse of A is the unique matrix that satisfies the following four
constraints:
A A^+ A = A \quad (i)
A^+ A A^+ = A^+ \quad (ii)
(A A^+)^T = A A^+ \quad (\text{symmetry 1}) \quad (iii)
(A^+ A)^T = A^+ A \quad (\text{symmetry 2}) \quad (iv) .   (23.131)
For example, with

A =
\begin{bmatrix}
1 & -1 \\
1 & -1 \\
1 & 1
\end{bmatrix}   (23.132)
we find that the pseudo-inverse is equal to

A^+ =
\begin{bmatrix}
.25 & .25 & .5 \\
-.25 & -.25 & .5
\end{bmatrix} .   (23.133)
This example shows that the product of a matrix and its pseudo-inverse
does not always give the identity matrix:

A A^+ =
\begin{bmatrix}
1 & -1 \\
1 & -1 \\
1 & 1
\end{bmatrix}
\begin{bmatrix}
.25 & .25 & .5 \\
-.25 & -.25 & .5
\end{bmatrix}
=
\begin{bmatrix}
.5 & .5 & 0 \\
.5 & .5 & 0 \\
0 & 0 & 1
\end{bmatrix} .   (23.134)
23.7.3 Pseudo-inverse and singular value decomposition
The SVD is the building block for the Moore–Penrose pseudo-inverse,
because any matrix A with SVD equal to P \Delta Q^T has for pseudo-inverse:

A^+ = Q \Delta^{-1} P^T .   (23.135)
For the matrix A of Equation 23.127, we obtain:

A^+ =
\begin{bmatrix}
0.7071 & 0.7071 \\
0.7071 & -0.7071
\end{bmatrix}
\begin{bmatrix}
2^{-1} & 0 \\
0 & 1^{-1}
\end{bmatrix}
\begin{bmatrix}
0.8165 & 0.4082 & 0.4082 \\
0 & 0.7071 & -0.7071
\end{bmatrix}
=
\begin{bmatrix}
0.2887 & 0.6443 & -0.3557 \\
0.2887 & -0.3557 & 0.6443
\end{bmatrix} .   (23.136)
Pseudo-inverse matrices are used to solve multiple regression and analysis
of variance problems.
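NumPy exposes the Moore–Penrose pseudo-inverse directly as `np.linalg.pinv` (computed internally via the SVD); this added sketch reproduces Equation 23.136 and checks two of the Moore–Penrose conditions:

```python
import numpy as np

A = np.array([[1.1547, 1.1547],
              [1.0774, 0.0774],
              [0.0774, 1.0774]])

A_pinv = np.linalg.pinv(A)   # Moore-Penrose pseudo-inverse via the SVD
print(A_pinv)
# approximately:
# [[ 0.2887  0.6443 -0.3557]
#  [ 0.2887 -0.3557  0.6443]]

# Conditions (i) and (ii) of Equation 23.131 hold.
print(np.allclose(A @ A_pinv @ A, A))            # True
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))  # True
```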