00 Lectureslides LinAlg
David Barber
University College London
These slides accompany the book Bayesian Reasoning and Machine Learning. The book and demos can be downloaded from
www.cs.ucl.ac.uk/staff/D.Barber/brml. Feedback and corrections are also available on the site. Feel free to adapt these slides for your own purposes,
but please include a link to the above website.
Matrices
Matrix addition
For two matrices A and B of the same size, addition is defined element-wise:
\[
[\mathbf{A} + \mathbf{B}]_{ij} = A_{ij} + B_{ij}
\]
The matrix I is the identity matrix, necessarily square, with 1’s on the diagonal
and 0’s everywhere else. For clarity we may also write Im for a square m × m
identity matrix. Then for an m × n matrix A, Im A = AIn = A. The identity
matrix has elements [I]ij = δij given by the Kronecker delta:
\[
\delta_{ij} \equiv \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}
\qquad
\mathbf{I} = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}
\]
Transpose
The transpose of a matrix interchanges its rows and columns:
\[
\left[\mathbf{A}^{\mathsf{T}}\right]_{ij} = A_{ji}
\]
Vectors
Let x denote the n-dimensional column vector with components
\[
\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
\]
Vector addition
\[
\mathbf{x} + \mathbf{y} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
+ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
= \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix}
\]
Scalar product
\[
\mathbf{w} \cdot \mathbf{x} = \sum_{i=1}^{n} w_i x_i = \mathbf{w}^{\mathsf{T}} \mathbf{x} = |\mathbf{w}|\,|\mathbf{x}| \cos\theta
\]
where θ is the angle between the two vectors. Thus if the lengths of two vectors are
fixed their inner product is largest when θ = 0, whereupon one vector is a constant
multiple of the other. If the scalar product xT y = 0, then x and y are orthogonal.
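A minimal numpy sketch of these facts (the vector values are arbitrary, chosen only for illustration):

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0])

# Scalar (inner) product: sum_i w_i x_i, equivalently w^T x.
dot = w @ x

# Angle between the vectors, from w . x = |w| |x| cos(theta).
cos_theta = dot / (np.linalg.norm(w) * np.linalg.norm(x))
theta = np.arccos(cos_theta)

# Orthogonality: perpendicular vectors have zero scalar product.
assert np.isclose(np.array([1.0, 0.0]) @ np.array([0.0, 1.0]), 0.0)
```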
Linear dependence
Suppose we wish to resolve the vector a into its components along the orthogonal
directions specified by the unit vectors e and e∗. That is, |e| = |e∗| = 1 and
e · e∗ = 0. We are required to find the scalar values α and β such that
\[
\mathbf{a} = \alpha \mathbf{e} + \beta \mathbf{e}^{*}
\]
Taking the scalar product of both sides with e and with e∗ gives
\[
\mathbf{a} \cdot \mathbf{e} = \alpha\, \mathbf{e} \cdot \mathbf{e} + \beta\, \mathbf{e}^{*} \cdot \mathbf{e},
\qquad
\mathbf{a} \cdot \mathbf{e}^{*} = \alpha\, \mathbf{e} \cdot \mathbf{e}^{*} + \beta\, \mathbf{e}^{*} \cdot \mathbf{e}^{*}
\]
From orthogonality and unit lengths of the vectors e and e∗ , this becomes
\[
\mathbf{a} \cdot \mathbf{e} = \alpha, \qquad \mathbf{a} \cdot \mathbf{e}^{*} = \beta
\]
Hence
\[
\mathbf{a} = (\mathbf{a} \cdot \mathbf{e})\, \mathbf{e} + (\mathbf{a} \cdot \mathbf{e}^{*})\, \mathbf{e}^{*}
\]
The projection of a vector a onto a direction specified by a general vector f is
\[
\frac{\mathbf{a} \cdot \mathbf{f}}{|\mathbf{f}|^{2}}\, \mathbf{f}
\]
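A small numpy sketch of the decomposition and the projection formula (the directions e, e∗ and the vector a are arbitrary choices):

```python
import numpy as np

# Two orthonormal directions in the plane.
e = np.array([1.0, 1.0]) / np.sqrt(2)
e_star = np.array([1.0, -1.0]) / np.sqrt(2)

a = np.array([3.0, 1.0])

# Components along each direction: alpha = a . e, beta = a . e*.
alpha, beta = a @ e, a @ e_star
assert np.allclose(a, alpha * e + beta * e_star)

# Projection of a onto a general (not unit-length) direction f.
f = np.array([2.0, 1.0])
proj = (a @ f) / (f @ f) * f
```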
Determinant
For a square matrix A, the determinant is the volume of the transformation of the
matrix A (up to a sign change). That is, we take a hypercube of unit volume and
map each vertex under the transformation. The volume of the resulting object is
defined as the determinant. Writing [A]ij = aij ,
\[
\det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_{11} a_{22} - a_{21} a_{12}
\]
The determinant in the (3 × 3) case has the form
\[
a_{11} \det \begin{pmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{pmatrix}
- a_{12} \det \begin{pmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{pmatrix}
+ a_{13} \det \begin{pmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}
\]
More generally, the determinant can be computed recursively as an expansion
along the top row of determinants of reduced matrices.
The absolute value of the determinant is the volume of the transformation.
\[
\det\left(\mathbf{A}^{\mathsf{T}}\right) = \det\left(\mathbf{A}\right)
\]
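A quick numerical check of these determinant facts (a sketch; the matrix entries are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# 2x2 determinant: a11*a22 - a21*a12.
assert np.isclose(np.linalg.det(A), 1.0 * 4.0 - 3.0 * 2.0)

# det(A^T) = det(A).
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))

# |det(A)| equals the area of the parallelogram spanned by the columns,
# i.e. the image of the unit square under the transformation A.
area = abs(np.cross(A[:, 0], A[:, 1]))
assert np.isclose(area, abs(np.linalg.det(A)))
```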
Matrix inversion
For a square matrix A, the inverse (when it exists) is the matrix A−1 satisfying
\[
\mathbf{A}^{-1} \mathbf{A} = \mathbf{I} = \mathbf{A} \mathbf{A}^{-1}
\]
It is not always possible to find a matrix A−1 such that A−1 A = I, in which case
A is singular. Geometrically, singular matrices correspond to projections: if we
transform each of the vertices v of a binary hypercube using Av, the volume of
the transformed hypercube is zero (A has determinant zero). Given a vector y and
a singular transformation, A, one cannot uniquely identify a vector x for which
y = Ax. Provided the inverses exist,
\[
(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1} \mathbf{A}^{-1}
\]
Pseudo inverse
For a non-square matrix A such that AAT is invertible,
\[
\mathbf{A}^{\dagger} = \mathbf{A}^{\mathsf{T}} \left( \mathbf{A} \mathbf{A}^{\mathsf{T}} \right)^{-1}
\]
satisfies AA† = I.
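A minimal numpy sketch, assuming a wide matrix A with linearly independent rows so that AA^T is invertible:

```python
import numpy as np

# A 2x3 matrix with linearly independent rows, so A A^T is invertible.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

# Right pseudo-inverse via the formula above.
A_dag = A.T @ np.linalg.inv(A @ A.T)
assert np.allclose(A @ A_dag, np.eye(2))

# numpy's pinv (computed via the SVD) agrees in this full-row-rank case.
assert np.allclose(A_dag, np.linalg.pinv(A))
```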
Solving Linear Systems
Problem
Given a square N × N matrix A and vector b, find the vector x that satisfies
Ax = b
Solution
Algebraically, we have the inverse:
x = A−1 b
Complexity
Solving a linear system is O(N^3), which can be very expensive for large N.
Approximate methods include conjugate gradient and related approaches.
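In practice one rarely forms A−1 explicitly; a dedicated solver is preferred. A minimal numpy sketch with an arbitrary random system:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
A = rng.standard_normal((N, N))
b = rng.standard_normal(N)

# Solve A x = b directly (via an LU factorisation internally), O(N^3).
x = np.linalg.solve(A, b)
assert np.allclose(A @ x, b)

# Algebraically equivalent, but slower and numerically less stable:
x_inv = np.linalg.inv(A) @ b
assert np.allclose(x, x_inv)
```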
Matrix rank
For a collection of vectors x1 , . . . , xn , form the matrix with these vectors as its columns:
\[
\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)
\]
The rank of X is the maximum number of linearly independent columns.
Full rank
An n × n square matrix is full rank if its rank is n, in which case the matrix
is non-singular. Otherwise the matrix is reduced rank and is singular.
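A short numpy illustration of full versus reduced rank (a sketch with hand-picked columns):

```python
import numpy as np

# The third column is the sum of the first two: only 2 independent columns.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])
assert np.linalg.matrix_rank(X) == 2

# A reduced-rank square matrix is singular: its determinant is zero.
assert np.isclose(np.linalg.det(X), 0.0)
```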
Trace and Det
\[
\operatorname{trace}\left(\mathbf{A}\right) = \sum_i A_{ii} = \sum_i \lambda_i,
\qquad
\det\left(\mathbf{A}\right) = \prod_i \lambda_i
\]
where the λi are the eigenvalues of A.
Trace-Log formula
For a positive definite matrix A,
\[
\operatorname{trace}\left(\log \mathbf{A}\right) = \log \det\left(\mathbf{A}\right)
\]
The above logarithm of a matrix is not the element-wise logarithm. In general for
an analytic function f (x), f (M) is defined via the power-series expansion of the
function. On the right, since det (A) is a scalar, the logarithm is the standard
logarithm of a scalar.
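A numerical check of the trace-log formula using scipy's matrix logarithm (a sketch; the positive definite A is constructed arbitrarily):

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)  # positive definite by construction

# logm is the matrix logarithm (power-series sense), not element-wise log.
lhs = np.trace(logm(A))
rhs = np.log(np.linalg.det(A))
assert np.isclose(lhs.real, rhs)
```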
Orthogonal matrix
A square matrix A is orthogonal if
\[
\mathbf{A} \mathbf{A}^{\mathsf{T}} = \mathbf{I} = \mathbf{A}^{\mathsf{T}} \mathbf{A}
\]
From the properties of the determinant, we see therefore that an orthogonal matrix
has determinant ±1 and hence corresponds to a volume preserving transformation.
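For example, a 2D rotation matrix is orthogonal (a minimal sketch):

```python
import numpy as np

theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(R @ R.T, np.eye(2))
assert np.isclose(np.linalg.det(R), 1.0)  # a reflection would give -1

# Volume (and length) preserving: |R x| = |x| for any x.
x = np.array([1.0, 2.0])
assert np.isclose(np.linalg.norm(R @ x), np.linalg.norm(x))
```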
Linear transformations
Linear transformation
A linear transformation of x is given by matrix multiplication by some matrix A:
\[
\mathbf{A}\mathbf{x} = \sum_i x_i \mathbf{A} \mathbf{u}_i = \sum_i x_i \mathbf{a}_i
\]
where ui is the i-th standard unit vector and ai is the i-th column of A.
Eigenvalues and eigenvectors
A nonzero vector e is an eigenvector of a square matrix A, with corresponding eigenvalue λ, if
\[
\mathbf{A}\mathbf{e} = \lambda \mathbf{e}
\]
Equivalently,
\[
\underbrace{(\mathbf{A} - \lambda \mathbf{I})}_{\mathbf{B}}\, \mathbf{e} = \mathbf{0}
\]
For a nonzero solution e to exist, B must be singular, so the eigenvalues are the solutions of
\[
\det\left(\mathbf{A} - \lambda \mathbf{I}\right) = 0
\]
It may be that for an eigenvalue λ the eigenvector is not unique and there is a
space of corresponding vectors.
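numpy's eig returns eigenvalue/eigenvector pairs satisfying Ae = λe (a sketch with an arbitrary matrix):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Columns of V are the eigenvectors; lam holds the eigenvalues.
lam, V = np.linalg.eig(A)
for i in range(len(lam)):
    assert np.allclose(A @ V[:, i], lam[i] * V[:, i])

# The eigenvalues solve the characteristic equation det(A - lambda I) = 0.
assert np.isclose(np.linalg.det(A - lam[0] * np.eye(2)), 0.0)
```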
Spectral decomposition
A real symmetric N × N matrix A has the eigen-decomposition
\[
\mathbf{A} = \sum_{i=1}^{N} \lambda_i \mathbf{e}_i \mathbf{e}_i^{\mathsf{T}}
\]
where the λi are the eigenvalues and the eigenvectors ei can be taken orthonormal.
Computational Complexity
It generally takes O(N^3) time to compute the eigen-decomposition.
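For symmetric matrices numpy's eigh exploits symmetry and returns orthonormal eigenvectors; the outer-product sum then reconstructs A (a sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2  # construct an arbitrary real symmetric matrix

lam, E = np.linalg.eigh(A)  # orthonormal eigenvectors in the columns of E

# A = sum_i lambda_i e_i e_i^T
A_rebuilt = sum(lam[i] * np.outer(E[:, i], E[:, i]) for i in range(4))
assert np.allclose(A, A_rebuilt)
```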
Singular Value Decomposition
An n × p matrix X can be decomposed as
\[
\mathbf{X} = \mathbf{U} \mathbf{S} \mathbf{V}^{\mathsf{T}}
\]
where U and V are orthogonal and S is diagonal, containing the non-negative singular values.
Computational Complexity
It generally takes O(max(n, p) min(n, p)^2) time to compute the SVD.
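A minimal numpy sketch with an arbitrary rectangular matrix (using the reduced form of the SVD):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))  # n = 5, p = 3

# Reduced SVD: U is 5x3, s holds the 3 singular values, Vt is 3x3.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
assert np.allclose(X, U @ np.diag(s) @ Vt)

# The columns of U (and rows of Vt) are orthonormal.
assert np.allclose(U.T @ U, np.eye(3))
```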
Positive definite matrix
A symmetric matrix A with the property that xT Ax ≥ 0 for any vector x is called
positive semidefinite. A symmetric matrix A, with the property that xT Ax > 0 for
any vector x ≠ 0 is called positive definite. A positive definite matrix has full rank
and is thus invertible.
Eigen-decomposition
Using the eigen-decomposition of A,
\[
\mathbf{x}^{\mathsf{T}} \mathbf{A} \mathbf{x}
= \sum_i \lambda_i\, \mathbf{x}^{\mathsf{T}} \mathbf{e}_i\, \mathbf{e}_i^{\mathsf{T}} \mathbf{x}
= \sum_i \lambda_i \left( \mathbf{x}^{\mathsf{T}} \mathbf{e}_i \right)^2
\]
which is positive for every x ≠ 0 if and only if all the eigenvalues are positive.
Hence A is positive definite if and only if all its eigenvalues are positive.
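Following this argument, positive definiteness can be checked via the eigenvalues (a sketch; the matrix is constructed to be positive definite):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
A = B @ B.T + np.eye(3)  # symmetric positive definite by construction

# A symmetric matrix is positive definite iff all eigenvalues are positive.
lam = np.linalg.eigvalsh(A)
assert np.all(lam > 0)

# Spot check: x^T A x > 0 for a random nonzero x.
x = rng.standard_normal(3)
assert x @ A @ x > 0
```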