
Linear Algebra

Review

Ngoc Hoang Luong, Viet-Hang Duong

Faculty of Computer Science


University of Information Technology (UIT)
Vietnam National University - Ho Chi Minh City (VNU-HCM)

Maths for Computer Science, Fall 2024



Machine Learning and Linear Algebra

[Figure: xkcd comic, https://xkcd.com/1838/]
References

The contents of this document are taken mainly from the following
sources:
I Gilbert Strang. Linear Algebra and Learning from Data.
https://math.mit.edu/~gs/learningfromdata/
I Gilbert Strang. Introduction to Linear Algebra.
http://math.mit.edu/~gs/linearalgebra/
I Gilbert Strang. Linear Algebra for Everyone.
http://math.mit.edu/~gs/everyone/



Table of Contents

1 Matrix-Vector Multiplication Ax

2 Matrix-Matrix Multiplication AB

3 The Four Fundamental Subspaces of A: C(A), C(Aᵀ), N(A), N(Aᵀ)

4 Elimination and A = LU

5 Orthogonal Matrices, Subspaces, and Projections



Table of Contents

1 Matrix-Vector Multiplication Ax

2 Matrix-Matrix Multiplication AB

3 The Four Fundamental Subspaces of A: C(A), C(Aᵀ), N(A), N(Aᵀ)

4 Elimination and A = LU

5 Orthogonal Matrices, Subspaces, and Projections



Vectors

I Vectors are arrays of numerical values.
I Each numerical value is referred to as a coordinate, component, entry, or dimension.
I The number of components is the dimensionality of the vector.
I e.g., a vector representation of a person: 25 years old (Age), making 30 dollars an hour (Salary), having 6 years of experience (Experience): [25, 30, 6].
I Vectors are special objects that can be added together and multiplied
by scalars to produce another object of the same kind.



Geometric Vectors

I Geometric vectors are often visualized as a quantity that has a magnitude as well as a direction.
I e.g., the velocity of a person moving at 1 meter/second in the eastern
direction and 3 meters/second in the northern direction can be
described as a directed line from the origin to (1, 3).
I The tail of the vector is at the origin. The head is at (1, 3).
I Geometric vectors can have arbitrary tails.
I Two geometric vectors can be added, such that x + y = z is another
geometric vector.
I Multiplication by a scalar λx, λ ∈ R, is also a geometric vector.



Vectors

[Figure: two coordinate grids (axes from −4 to 4) showing example vectors drawn in the plane.]



Vectors

I Polynomials are vectors. Adding two polynomials results in another polynomial. Multiplied by a scalar, the result is also a polynomial.
I Audio signals are also vectors. Addition of two audio signals and scalar multiplication result in new audio signals.
I Elements of Rn (tuples of n real numbers) are vectors. For example,
      a = (6, 14, −3) ∈ R3
  is a triplet of numbers. Adding two vectors a, b ∈ Rn component-wise results in another vector a + b = c ∈ Rn. Multiplying a ∈ Rn by λ ∈ R results in a scaled vector λa ∈ Rn.



Basic Operations with Vectors

I Vectors of the same dimensionality can be added or subtracted.
I Consider two d-dimensional vectors x = (x1, . . . , xd) and y = (y1, . . . , yd):
      x + y = (x1 + y1, . . . , xd + yd)        x − y = (x1 − y1, . . . , xd − yd)
I Vector addition is commutative: x + y = y + x.



Basic Operations with Vectors

I A vector x ∈ Rd can be scaled by a factor a ∈ R as follows:
      v = ax = a (x1, . . . , xd) = (ax1, . . . , axd)
I Scalar multiplication scales the “length” of the vector, but does not change its “direction” (i.e., the relative values of the components).
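These operations map directly onto array arithmetic. A minimal NumPy sketch (the numbers are made up for illustration):

import numpy as np

x = np.array([25.0, 30.0, 6.0])   # e.g., the [Age, Salary, Experience] vector from before
y = np.array([20.0, 28.0, 3.0])

print(x + y)       # component-wise addition
print(x - y)       # component-wise subtraction
print(2.0 * x)     # scalar multiplication: same direction, scaled length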



Basic Operations with Vectors

I The dot product between two vectors x, y ∈ Rd is the sum of the element-wise products of their individual components:
      x · y = Σ_{i=1}^{d} xi yi
I The dot product is commutative:
      x · y = Σ_{i=1}^{d} xi yi = Σ_{i=1}^{d} yi xi = y · x

I The dot product is distributive:

x · (y + z) = x · y + x · z



Basic Operations with Vectors

I The dot product of a vector with itself produces the squared Euclidean norm. The norm defines the vector length and is denoted by ‖ · ‖:
      ‖x‖² = x · x = Σ_{i=1}^{d} xi²
I The Euclidean norm of x ∈ Rd is defined as
      ‖x‖₂ = √( Σ_{i=1}^{d} xi² ) = √(x · x)
  and computes the Euclidean distance of x from the origin.
I The Euclidean norm is also known as the L2-norm.



Basic Operations with Vectors

I A generalization of the Euclidean norm is the Lp-norm, denoted by ‖ · ‖p:
      ‖x‖p = ( Σ_{i=1}^{d} |xi|^p )^(1/p)
  where p is a positive value.
I When p = 1, we have the Manhattan norm, or the L1-norm.
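A short NumPy sketch (illustrative vector) checking these definitions against numpy.linalg.norm:

import numpy as np

x = np.array([3.0, -4.0, 12.0])
l2 = np.sqrt(np.sum(x**2))              # Euclidean (L2) norm from the definition
l1 = np.sum(np.abs(x))                  # Manhattan (L1) norm
p = 3
lp = np.sum(np.abs(x)**p) ** (1.0 / p)  # general Lp norm
print(l2, np.linalg.norm(x))            # the two values agree
print(l1, np.linalg.norm(x, ord=1))
print(lp, np.linalg.norm(x, ord=p))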



Basic Operations with Vectors

I Vectors can be normalized to unit length by dividing them by their norm:
      x′ = x / ‖x‖ = x / √(x · x)
I The resulting vector is a unit vector.
I The squared Euclidean distance between x, y ∈ Rd can be shown to be the dot product of x − y with itself:
      ‖x − y‖² = (x − y) · (x − y) = Σ_{i=1}^{d} (xi − yi)²



Basic Operations with Vectors

I Cauchy-Schwarz Inequality: the dot product between a pair of vectors is bounded above in magnitude by the product of their lengths:
      |x · y| = | Σ_{i=1}^{d} xi yi | ≤ ‖x‖ ‖y‖
I Triangle Inequality: Consider the triangle formed by the origin, x, and y; the side length ‖x − y‖ is no greater than the sum ‖x‖ + ‖y‖ of the other two sides.



Basic Operations with Vectors

I Consider the triangle created by the origin, x, and y. Find the angle θ between x and y.
I The side lengths of this triangle are a = ‖x‖, b = ‖y‖, and c = ‖x − y‖. Using the cosine law, we have:
      cos(θ) = (a² + b² − c²) / (2ab)
             = (‖x‖² + ‖y‖² − ‖x − y‖²) / (2‖x‖‖y‖)
             = (‖x‖² + ‖y‖² − (x − y) · (x − y)) / (2‖x‖‖y‖)
             = 2(x · y) / (2‖x‖‖y‖)
             = (x · y) / (‖x‖‖y‖)
I Two vectors are orthogonal if their dot product is 0.
I The vector 0 is considered orthogonal to every vector.
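A small NumPy sketch (made-up vectors) that computes the angle from this formula and checks an orthogonal pair:

import numpy as np

x = np.array([1.0, 3.0])
y = np.array([2.0, 1.0])
cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.degrees(np.arccos(cos_theta)))                                # angle between x and y in degrees
print(np.isclose(np.array([1.0, 3.0]) @ np.array([3.0, -1.0]), 0.0))   # True: dot product 0, so orthogonal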



Matrices

Definition
With m, n ∈ N, a real-valued (m, n) matrix A is an m · n-tuple of
elements aij , i = 1, . . . , m, j = 1, . . . , n, which is ordered according to a
rectangular scheme consisting of m rows and n columns:
 
      A = [a11 a12 . . . a1n; a21 a22 . . . a2n; . . . ; am1 am2 . . . amn],   aij ∈ R

Rm×n is the set of all real-valued (m, n)-matrices.


A ∈ Rm×n can also be represented as a ∈ Rmn by stacking all n columns
of the matrix into a long vector.



Matrices

I A matrix that has the same number of rows as columns is a square matrix. Otherwise, it is a rectangular matrix.
I A matrix having more rows than columns is referred to as tall, while a
matrix having more columns than rows is referred to as wide or fat.
I A scalar can be considered as a 1 × 1 “matrix”.
I A d-dimensional vector can be considered a 1 × d matrix when it is
treated as a row vector.
I A d-dimensional vector can be considered a d × 1 matrix when it is
treated as a column vector.
I By default, vectors are assumed to be column vectors.



Matrix-Vector Multiplication

[Figure: xkcd comic, https://xkcd.com/184/]
Matrix-Vector Multiplication Ax

I Multiply A times x using rows of A.
      [2 3; 2 4; 3 7] [x1; x2] = [2x1 + 3x2; 2x1 + 4x2; 3x1 + 7x2] = [a1* x; a2* x; a3* x]
  Ax = dot products of the rows of A with x.
I Multiply A times x using columns of A.
      [2 3; 2 4; 3 7] [x1; x2] = x1 [2; 2; 3] + x2 [3; 4; 7] = x1 a1 + x2 a2
  Ax = combination of the columns a1, a2 (of A) scaled by the scalars x1, x2 respectively.
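Both views are easy to check numerically. A NumPy sketch with the 3 × 2 matrix from this slide and an arbitrary x:

import numpy as np

A = np.array([[2.0, 3.0], [2.0, 4.0], [3.0, 7.0]])
x = np.array([1.0, 2.0])
print(A @ x)                                 # built-in matrix-vector product
print(np.array([row @ x for row in A]))      # row view: dot product of each row with x
print(x[0] * A[:, 0] + x[1] * A[:, 1])       # column view: combination of the columns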



Linear Combinations of Columns

Ax
Ax is a linear combination of the columns of A.
      [a11 a12 . . . a1n; a21 a22 . . . a2n; . . . ; am1 am2 . . . amn] [x1; x2; . . . ; xn]
         = x1 [a11; a21; . . . ; am1] + x2 [a12; a22; . . . ; am2] + · · · + xn [a1n; a2n; . . . ; amn]
      Ax = x1 a1 + x2 a2 + · · · + xn an

Column space of A = C(A) = all vectors Ax


= all linear combinations of the columns



Column Space of A

     
      Ax = [2 3; 2 4; 3 7] [x1; x2] = x1 [2; 2; 3] + x2 [3; 4; 7]

I Each Ax is a vector in the R3 space.


I All combinations Ax = x1 a1 + x2 a2 produce what part of R3 ?
I Answer: a plane, containing:
the line of all vectors x1 a1 ,
the line of all vectors x2 a2 ,
the sum of any vector on one line + any vector on the other line, filling
out an infinite plane containing the two lines, but not the whole R3 .

Definition
The combinations of the columns fill out the column space of A.



Column Space of A

     
      Ax = [2 3; 2 4; 3 7] [x1; x2] = x1 [2; 2; 3] + x2 [3; 4; 7]

I C(A) is a plane.
I The plane includes (0, 0, 0), produced when x1 = x2 = 0.
I The plane includes (5, 6, 10) = a1 + a2 and (−1, −2, −4) = a1 − a2 .
Every combination x1 a1 + x2 a2 is in C(A).
I What is the probability that the plane does not include a random point rand(3,1)?
  Which points are in the plane?

Ax = b
b is in C(A) exactly when Ax = b has a solution x.
x shows how to express b as a combination of the columns of A.



Column Space of A
I b = (1, 1, 1) is not in C(A) because
      Ax = [2 3; 2 4; 3 7] [x1; x2] = [1; 1; 1]   is unsolvable.
I What is the column space of A2 = [2 3 5; 2 4 6; 3 7 10]?
  Its third column a3 = a1 + a2 is already in C(A), the plane of a1 and a2.
  Including this dependent column does not go beyond C(A): C(A2) = C(A).
I What is the column space of A3 = [2 3 1; 2 4 1; 3 7 1]?
  Its third column a3 = (1, 1, 1) is not in the plane C(A).
  Visualize the xy-plane and a third vector (x3, y3, z3) out of the plane (meaning that z3 ≠ 0).
  C(A3) = R3.
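One numerical way to test whether b lies in C(A) is to compare ranks: b is in the column space exactly when appending it as an extra column does not raise the rank. A NumPy sketch:

import numpy as np

A = np.array([[2.0, 3.0], [2.0, 4.0], [3.0, 7.0]])

def in_column_space(A, b):
    # b is in C(A) exactly when rank([A | b]) == rank(A)
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

print(in_column_space(A, np.array([5.0, 6.0, 10.0])))   # a1 + a2: True
print(in_column_space(A, np.array([1.0, 1.0, 1.0])))    # (1, 1, 1): False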
Column Spaces of R3

I Subspaces of R3 :
The zero vector (0, 0, 0).
A line of all vectors x1 a1 .
A plane of all vectors x1 a1 + x2 a2 .
The whole R3 with all vectors x1 a1 + x2 a2 + x3 a3 .
I Vectors a1 , a2 , a3 need to be independent. The only combination
that gives the zero vector is 0a1 + 0a2 + 0a3 .
I The zero vector is in every subspace.



Linear Dependence

[Figure source: https://mathsci2.appstate.edu/~sjg/class/2240/hf14.html]
Independent Columns, Basis, and Rank of A

Definition
A basis for a subspace is a full set of independent vectors: All vectors in the space are combinations of the basis vectors.

Create a matrix C whose columns come directly from A:


I If column 1 of A is not all zero, put it into C.
I If column 2 of A is not a multiple of column 1, put it into C.
I If column 3 of A is not a combination of columns 1 and 2, put it into
C. Continue.
I At the end, C will have r columns (r ≤ n). They are independent
columns, and they are a “basis” for the column space C(A).



Independent Columns, Basis, and Rank of A
   
If A = [1 3 8; 1 2 6; 0 1 2] then C = [1 3; 1 2; 0 1]     (n = 3 columns in A, r = 2 columns in C)
If A = [1 2 3; 0 4 5; 0 0 6] then C = A                   (n = 3 columns in A, r = 3 columns in C)
If A = [1 2 5; 1 2 5; 1 2 5] then C = [1; 1; 1]           (n = 3 columns in A, r = 1 column in C)

I The number r counts independent columns.


I It is the “dimension” of the column space of A and C (same space).

Definition
The rank of a matrix is the dimension of its column space.
Rank Factorization A = CR
I The matrix C connects to A by a third matrix R: A = CR.
I A ∈ Rm×n , C ∈ Rm×r , R ∈ Rr×n
   
      A = [1 3 8; 1 2 6; 0 1 2] = [1 3; 1 2; 0 1] [1 0 2; 0 1 2] = CR

I C times the first column of R produces column 1 of A.
I C times the second column of R produces column 2 of A.
I C times the third column of R produces column 3 of A.
I Combinations of the columns of C produce the columns of A
−→ Put the right numbers in R.

Definition
R = rref(A) = the row-reduced echelon form of A (with its zero rows removed).
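A sketch of computing C and R with SymPy (Matrix.rref is the library call; building C from the pivot columns and dropping the zero rows of the rref follows the recipe above):

from sympy import Matrix

A = Matrix([[1, 3, 8], [1, 2, 6], [0, 1, 2]])
R_full, pivots = A.rref()                        # reduced row echelon form and pivot-column indices
C = Matrix.hstack(*[A.col(j) for j in pivots])   # the independent columns of A
R = R_full[:len(pivots), :]                      # rref with its zero rows removed
print(C)
print(R)
print(C * R == A)   # True: A = CR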
Rank Factorization A = CR

   
      A = [1 3 8; 1 2 6; 0 1 2] = [1 3; 1 2; 0 1] [1 0 2; 0 1 2] = CR

I The matrix R has r = 2 rows, r1* and r2*.
I Multiplying row 1 of C (which is [1 3]) by R gives r1* + 3r2* → row 1 of A.
I Multiplying row 2 of C by R gives r1* + 2r2* → row 2 of A.
I Multiplying row 3 of C by R gives 0r1* + 1r2* → row 3 of A.
I R has independent rows: No row is a combination of the other rows.
  Hint: Look at the zeros and ones in R - the identity matrix I in R.
I The rows of R are a basis for the row space of A.
I Notation: The row space of matrix A = C(Aᵀ).



Rank Factorization A = CR

1 The r columns of C are independent (by their construction).


2 Every column of A is a combination of those r columns of C
(because A = CR).
3 The r rows of R are independent (they contain the matrix Ir ).
4 Every row of A is a combination of those r rows of R (because
A = CR).
Key facts:
I The r columns of C are a basis for C(A): dimension r.
I The r rows of R are a basis for C(Aᵀ): dimension r.

Notice
The number of independent columns = The number of independent rows.
The column space and row space of A both have dimension r.
The column rank of A = The row rank of A.



Q&A

Question: If an n × n matrix A has n independent columns, then C = ?, R = ?
Answer: C = A, R = I.



Table of Contents

1 Matrix-Vector Multiplication Ax

2 Matrix-Matrix Multiplication AB

3 The Four Fundamental Subspaces of A: C(A), C(Aᵀ), N(A), N(Aᵀ)

4 Elimination and A = LU

5 Orthogonal Matrices, Subspaces, and Projections



Matrix-Matrix Multiplication AB

[Figure source: https://mathsci2.appstate.edu/~sjg/class/2240/hf14.html]
Compute AB by Inner Products

I Inner products (rows times columns) produce each of the numbers in AB = C:
      [· · ·; a21 a22 a23; · · ·] [· · b13; · · b23; · · b33] = [· · ·; · · c23; · · ·]
I cij = (row i of A) · (column j of B):
      cij = ai1 b1j + ai2 b2j + · · · + ain bnj = Σ_{k=1}^{n} aik bkj = ai* bj



Rank-1 Matrix

I Outer products (columns times rows) produce rank one matrices.
      uvᵀ = [2; 1] [3 4 6] = [6 8 12; 3 4 6]
I An m × 1 matrix (a column u) times a 1 × p matrix (a row vᵀ) gives an m × p matrix.
I All columns of uvᵀ are multiples of u.
I All rows of uvᵀ are multiples of vᵀ.
I The column space of uvᵀ is the line through u.
I The row space of uvᵀ is the line through v.
I All non-zero matrices uvᵀ have rank one.



AB = Sum of Rank-1 Matrices

I The product AB is the sum of columns ak times rows bk*:
      AB = [a1 . . . an] [b1*; . . . ; bn*] = a1 b1* + a2 b2* + · · · + an bn*
I Example:
      [1 0; 3 1] [2 4; 0 5] = [1; 3] [2 4] + [0; 1] [0 5] = [2 4; 6 12] + [0 0; 0 5] = [2 4; 6 17]
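A NumPy check that the product equals the sum of the rank-one (outer-product) pieces, using the 2 × 2 example above:

import numpy as np

A = np.array([[1.0, 0.0], [3.0, 1.0]])
B = np.array([[2.0, 4.0], [0.0, 5.0]])
outer_sum = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))   # sum of column-times-row pieces
print(A @ B)
print(outer_sum)            # the same matrix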



Insight from Column times Row

I Looking for the important part of a matrix A.


I Factor A into CR and look at the pieces ck rk* of A = CR.
I Factoring A into CR is the reverse of multiplying CR = A.
I The inside information about A is not visible until A is factored.

Important Factorizations
1 A = LU : elimination
2 A = QR: orthogonalization
3 S = QΛQᵀ: eigenvalues and orthonormal eigenvectors
4 A = XΛX⁻¹: diagonalization
5 A = UΣVᵀ: Singular Value Decomposition (SVD)



Inverse Matrices
I The square matrix A is invertible if there exists a matrix A⁻¹ such that
      A⁻¹A = I and AA⁻¹ = I

I The matrix A cannot have two different inverses. Suppose BA = I


and also AC = I. Then B = C.

B(AC) = (BA)C gives BI = IC or B = C.

I If A is invertible, the one and only solution to Ax = b is x = A⁻¹b.
I If Ax = 0 for a nonzero vector x, then A has no inverse.
I If A and B are invertible then so is AB. The inverse of AB is
      (AB)⁻¹ = B⁻¹A⁻¹
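A quick NumPy sketch (illustrative 2 × 2 matrices) checking the inverse properties above:

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 1.0]])
B = np.array([[1.0, 2.0], [0.0, 1.0]])
print(np.linalg.inv(A) @ A)         # the identity matrix, up to rounding
lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))        # True: the inverse of AB is B^-1 A^-1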



Table of Contents

1 Matrix-Vector Multiplication Ax

2 Matrix-Matrix Multiplication AB

3 The Four Fundamental Subspaces of A: C(A), C(Aᵀ), N(A), N(Aᵀ)

4 Elimination and A = LU

5 Orthogonal Matrices, Subspaces, and Projections



Example 1


      A = [1 2; 3 6] = uvᵀ
I Column space C(A) is the line through u = [1; 3].
I Row space C(Aᵀ) is the line through v = [1; 2].
I Nullspace N(A) is the line through x = [2; −1]:  Ax = 0.
I Left nullspace N(Aᵀ) is the line through y = [3; −1]:  Aᵀy = 0.



Example 1

      A = [1 2; 3 6] = uvᵀ
[Figure: two coordinate grids for this A. One shows the row space and nullspace as orthogonal lines in R2; the other shows the column space and left nullspace as orthogonal lines in R2.]

Definition
The column space C(A) contains all combinations of the columns of A.
The row space C(Aᵀ) contains all combinations of the columns of Aᵀ.
The nullspace N(A) contains all solutions x to Ax = 0.
The left nullspace N(Aᵀ) contains all solutions y to Aᵀy = 0.



Example 2

 
      B = [1 −2 −2; 3 −6 −6]
I The row space C(Bᵀ) is the infinite line through v1 = (1/3)(1, −2, −2).
I Bx = 0 has solutions x1 = (2, 1, 0) and x2 = (2, 0, 1).
I x1 and x2 are in the same plane as v2 = (1/3)(2, −1, 2) and v3 = (1/3)(2, 2, −1).
I The nullspace N(B) has an orthonormal basis v2 and v3; it is the infinite plane of v2 and v3.
I v1, v2, v3: an orthonormal basis for R3.



Subspaces of A

   

      If Ax = 0 then [row 1; . . . ; row m] x = [0; . . . ; 0]
I x is orthogonal to every row of A.
I Every x in the nullspace of A is orthogonal to the row space of A.
I Every y in the nullspace of Aᵀ is orthogonal to the column space of A.
      N(A) ⊥ C(Aᵀ)   with dimensions (n − r) + r = n
      N(Aᵀ) ⊥ C(A)   with dimensions (m − r) + r = m

I Two pairs of orthogonal subspaces. The dimensions add to n and to m.



Table of Contents

1 Matrix-Vector Multiplication Ax

2 Matrix-Matrix Multiplication AB

3 The Four Fundamental Subspaces of A: C(A), C(Aᵀ), N(A), N(Aᵀ)

4 Elimination and A = LU

5 Orthogonal Matrices, Subspaces, and Projections



Ax = b by Elimination

The usual order:


I Column 1.
Row 1 is the first pivot row.
Multiply row 1 by numbers l21 , l31 , . . . , ln1 and subtract from rows
2, 3, . . . , n of A respectively.
      Multipliers: l21 = a21/a11,  l31 = a31/a11,  . . . ,  ln1 = an1/a11
      [A | b] = [2 1 −1 2 | 5; 4 5 −3 6 | 9; −2 5 −2 6 | 4; 4 11 −4 8 | 2]
              → [2 1 −1 2 | 5; 0 3 −1 2 | −1; 0 6 −3 8 | 9; 0 9 −2 4 | −8]



Ax = b by Elimination

The usual order:


I Column 2.
The new row 2 is the second pivot row.
Multiply row 2 by numbers l32 , l42 , . . . , ln2 and subtract from rows
3, 4, . . . , n of A respectively.
      Multipliers: l32 = a32/a22,  l42 = a42/a22,  . . . ,  ln2 = an2/a22
      [2 1 −1 2 | 5; 0 3 −1 2 | −1; 0 6 −3 8 | 9; 0 9 −2 4 | −8]
              → [2 1 −1 2 | 5; 0 3 −1 2 | −1; 0 0 −1 4 | 11; 0 0 1 −2 | −5]



Ax = b by Elimination

The usual order:


I Column 3.
The new row 3 is the third pivot row.
Multiply row 3 by numbers l43 , l53 , . . . , ln3 and subtract from rows
4, 5, . . . , n of A respectively.
      Multipliers: l43 = a43/a33,  l53 = a53/a33,  . . . ,  ln3 = an3/a33
      [2 1 −1 2 | 5; 0 3 −1 2 | −1; 0 0 −1 4 | 11; 0 0 1 −2 | −5]
              → [2 1 −1 2 | 5; 0 3 −1 2 | −1; 0 0 −1 4 | 11; 0 0 0 2 | 6] = [U | c]

I Columns 3 to n: continue eliminating on A until obtaining the upper triangular U with n pivots on its diagonal.



Ax = b by Elimination

2x1 + x2 − x3 + 2x4 = 5
3x2 − x3 + 2x4 = −1
− x3 + 4x4 = 11
2x4 = 6
By back substitution, we get

x4 = 3, x3 = 1, x2 = −2, x1 = 1



Lower Triangular L and Upper Triangular U
I Elimination on Ax = b produces the upper triangular matrix
 
      U = [2 1 −1 2; 0 3 −1 2; 0 0 −1 4; 0 0 0 2]
I and the lower triangular matrix
      L = [1 0 0 0; l21 1 0 0; l31 l32 1 0; l41 l42 l43 1] = [1 0 0 0; 2 1 0 0; −1 2 1 0; 2 3 −1 1]

I Elimination factors A into a lower triangular L times an upper triangular U:
      A = LU
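A small sketch (plain NumPy, no row exchanges) that records the multipliers l_ij and reproduces L and U for the matrix above:

import numpy as np

def lu_no_pivot(A):
    # Elimination without row exchanges: returns L (unit lower triangular) and U with A = LU
    n = A.shape[0]
    U = A.astype(float).copy()
    L = np.eye(n)
    for j in range(n - 1):
        for i in range(j + 1, n):
            L[i, j] = U[i, j] / U[j, j]       # multiplier l_ij = a_ij / a_jj
            U[i, :] -= L[i, j] * U[j, :]      # subtract l_ij times the pivot row
    return L, U

A = np.array([[2., 1., -1., 2.], [4., 5., -3., 6.], [-2., 5., -2., 6.], [4., 11., -4., 8.]])
L, U = lu_no_pivot(A)
print(L)
print(U)
print(np.allclose(L @ U, A))   # True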
The Factorization A = LU

    
A = [1 0 0 0; l21 1 0 0; l31 l32 1 0; l41 l42 l43 1] [pivot row 1; pivot row 2; pivot row 3; pivot row 4] = [2 1 −1 2; 4 5 −3 6; −2 5 −2 6; 4 11 −4 8]

  = [1; l21; l31; l41] (pivot row 1) + [0 0 0 0; 0 x x x; 0 x x x; 0 x x x],      where lij = aij / ajj

  = [1; 2; −1; 2] [2 1 −1 2] + [0 0 0 0; 0 x x x; 0 x x x; 0 x x x]

  = [2 1 −1 2; 4 2 −2 4; −2 −1 1 −2; 4 2 −2 4] + [0 0 0 0; 0 3 −1 2; 0 6 −3 8; 0 9 −2 4]

The first step reduces the 4 × 4 problem to a 3 × 3 problem by removing l1 u1*.


The Factorization A = LU

    
A = [1 0 0 0; l21 1 0 0; l31 l32 1 0; l41 l42 l43 1] [pivot row 1; pivot row 2; pivot row 3; pivot row 4] = l1 u1* + [0 0 0 0; 0 3 −1 2; 0 6 −3 8; 0 9 −2 4]

  = l1 u1* + [0; 1; l32; l42] (pivot row 2) + [0 0 0 0; 0 0 0 0; 0 0 x x; 0 0 x x],      where lij = aij / ajj

  = l1 u1* + [0; 1; 2; 3] [0 3 −1 2] + [0 0 0 0; 0 0 0 0; 0 0 x x; 0 0 x x]

  = l1 u1* + [0 0 0 0; 0 3 −1 2; 0 6 −2 4; 0 9 −3 6] + [0 0 0 0; 0 0 0 0; 0 0 −1 4; 0 0 1 −2]

The second step reduces the 3 × 3 problem to a 2 × 2 problem by removing l2 u2*.


The Factorization A = LU
    
1 0 00 pivot row 1 0 0 0 0
l21 1 00 pivot row 2 ∗ ∗
0 0 0 0 
A=   = l1 u1 + l2 u2 +  
l31 l32 1
0 pivot row 3 0 0 −1 4
l41 l42 l43 1 pivot row 4 0 0 1 −2
   
0 0 0 0 0
∗ ∗
 0   0 0 0 0  aij
= l1 u1 + l2 u2 +   pivot row 3 + 
    lij =
1 0 0 0 0 ajj
l43 0 0 0 x
   
0 0 0 0 0
 0   0 0 0 0 
= l1 u∗1 + l2 u∗2 + 
 1  0 0 −1 4 + 0 0 0 0 
  

−1 0 0 0 x
   
0 0 0 0 0 0 0 0
0 0 0 0 
= l1 u∗1 + l2 u∗2 +   + 0 0 0 0
 
0 0 −1 4  0 0 0 0
0 0 1 −4 0 0 0 2
= l1 u∗1 + l2 u∗2 + l3 u∗3 + l4 u∗4
The third step reduces the 2 × 2 problem to a single number by removing l3 u∗3 .
Elimination and A = LU

   
I Start from [A | b] = [LU | b].
I Elimination produces [U | L⁻¹b] = [U | c].
I Elimination on Ax = b produces the equation Ux = c, which is ready for back substitution.
I A = LU = Σ li ui* = a sum of rank-one matrices.
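For completeness, a sketch that solves the same system with SciPy's LU routines (these use partial pivoting, so the computed factors may differ from the hand elimination above by row exchanges, but the solution is the same):

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2., 1., -1., 2.], [4., 5., -3., 6.], [-2., 5., -2., 6.], [4., 11., -4., 8.]])
b = np.array([5., 9., 4., 2.])
lu, piv = lu_factor(A)             # factor A once
x = lu_solve((lu, piv), b)         # then forward/back substitution
print(x)                           # [ 1. -2.  1.  3.], matching the back-substitution result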



Table of Contents

1 Matrix-Vector Multiplication Ax

2 Matrix-Matrix Multiplication AB

3 The Four Fundamental Subspaces of A: C(A), C(Aᵀ), N(A), N(Aᵀ)

4 Elimination and A = LU

5 Orthogonal Matrices, Subspaces, and Projections



Orthogonality
I Orthogonal ∼ perpendicular.
I Orthogonal vectors x and y:
      xᵀy = x1 y1 + x2 y2 + · · · + xn yn = 0
Law of Cosines: θ is the angle between x and y:
      ‖x − y‖² = ‖x‖² + ‖y‖² − 2‖x‖‖y‖ cos θ
Orthogonal vectors have cos θ = 0.
Pythagoras Law:
      ‖x − y‖² = ‖x‖² + ‖y‖²
      (x − y)ᵀ(x − y) = xᵀx + yᵀy
      xᵀx + yᵀy − xᵀy − yᵀx = xᵀx + yᵀy
      xᵀy = 0

Orthogonal Basis

I Orthogonal basis for a subspace: Every pair of basis vectors has viᵀvj = 0.
I Orthonormal basis: Orthogonal basis of unit vectors: Every viᵀvi = 1 (length 1).
I From orthogonal to orthonormal, divide every basis vector vi by its length ‖vi‖.
I The standard basis is orthogonal (and orthonormal) in Rn:
      Standard basis i, j, k in R3:   i = [1; 0; 0],  j = [0; 1; 0],  k = [0; 0; 1]
I Every subspace of Rn has an orthogonal basis.



Orthogonal Subspaces

I Subspace S is orthogonal to subspace T: Every vector in S is


orthogonal to every vector in T.



Orthogonal Subspaces

I The row space C(Aᵀ) is orthogonal to the nullspace N(A):
      Ax = [row 1; . . . ; row m] x = [0; . . . ; 0]
I The column space C(A) is orthogonal to the left nullspace N(Aᵀ):
      Aᵀy = [(column 1)ᵀ; . . . ; (column n)ᵀ] y = [0; . . . ; 0]



Orthogonal Subspaces
I Every vector v in Rn has a row space component v_row and a nullspace component v_null: v = v_row + v_null.
      Ax = [1 0 0; 0 1 0] [x1; x2; x3] = [0; 0]
I The row space C(Aᵀ) is the plane of all vectors β1 a1* + β2 a2*.
I The nullspace N(A) is the line through u = (0, 0, 1): all vectors β3 u.
      v = [v1; v2; v3] ∈ R3:   v = β1 [1; 0; 0] + β2 [0; 1; 0] + β3 [0; 0; 1], where the first two terms form v_row and the last term is v_null.
I Dimensions: dim C(Aᵀ) + dim N(A) = r + (n − r) = n.
I A row space basis (r vectors) and a nullspace basis (n − r vectors) produce a basis for the whole Rn (n vectors).
The Big Picture
Fundamental Theorem of Linear Algebra
The row space and nullspace of A are orthogonal complements in Rn.

[Figure: the "big picture" of the four fundamental subspaces. In Rn, the row space C(Aᵀ) (dimension r) is orthogonal to the nullspace N(A) (dimension n − r). In Rm, the column space C(A) (dimension r) is orthogonal to the left nullspace N(Aᵀ) (dimension m − r). A sends the row-space component to the column space (Ax_row = b) and the nullspace component to zero (Ax_null = 0).]

Figure: Two pairs of orthogonal subspaces.



Projection onto a Line
I e = b − p
I p = xa
I Because e is orthogonal to a:
      aᵀe = 0
      aᵀ(b − p) = 0
      aᵀ(b − xa) = 0
      x aᵀa = aᵀb
      x = aᵀb / aᵀa
I Therefore, p = ax = a (aᵀb / aᵀa).
I There is a projection matrix P such that p = P b:
      P = aaᵀ / aᵀa
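A quick NumPy check of these formulas (the vectors are made up):

import numpy as np

a = np.array([1.0, 3.0])
b = np.array([3.0, 3.0])
P = np.outer(a, a) / (a @ a)       # projection matrix onto the line through a
p = P @ b
e = b - p
print(p)
print(a @ e)                       # ~0: the error is orthogonal to a
print(np.allclose(P @ P, P))       # True: projecting twice changes nothing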
Projection onto a Line

      P = aaᵀ / aᵀa
I Column space of A: matrix-vector multiplication gives Ax ∈ C(A).
I p = P b. What is the column space C(P)?
I C(P) is the line through a.
I Is P symmetric?
      Pᵀ = (aaᵀ / aᵀa)ᵀ = aaᵀ / aᵀa = P.   Yes.
I What if we project b twice?
      P² = (aaᵀ / aᵀa)(aaᵀ / aᵀa) = a(aᵀa)aᵀ / (aᵀa)² = aaᵀ / aᵀa = P


Projection onto a Subspace

I Why bother with projection?
I Because Ax = b may have no solution (m > n). b might not be in the column space C(A).
I Solve Ax̂ = p instead, where p is the projection of b onto the column space C(A).



Projection onto a Subspace

I Choose two independent vectors a1, a2 in the plane to form a basis.
      A = [a1 a2]   (a1 and a2 as columns)
I Plane of a1, a2 = Column space of A.
I p is a linear combination of a1, a2:
      p = x̂1 a1 + x̂2 a2 = Ax̂
I Find x̂.



Projection onto a Subspace
I p = Ax̂. Find x̂.
I e = b − p is perpendicular to the plane:
      a1ᵀe = 0 and a2ᵀe = 0, that is, Aᵀe = 0
      Aᵀ(b − Ax̂) = 0
      AᵀAx̂ = Aᵀb
      x̂ = (AᵀA)⁻¹Aᵀb
I We have p = Ax̂ = A(AᵀA)⁻¹Aᵀb.
I The projection matrix P:
      P = A(AᵀA)⁻¹Aᵀ


Projection onto a Subspace

      P = A(AᵀA)⁻¹Aᵀ
I Is P symmetric?
      Pᵀ = (A(AᵀA)⁻¹Aᵀ)ᵀ = A((AᵀA)⁻¹)ᵀAᵀ = A((AᵀA)ᵀ)⁻¹Aᵀ = A(AᵀA)⁻¹Aᵀ = P.   Yes.
I Is P² = P?
      P² = A(AᵀA)⁻¹AᵀA(AᵀA)⁻¹Aᵀ = A(AᵀA)⁻¹(AᵀA)(AᵀA)⁻¹Aᵀ = A(AᵀA)⁻¹Aᵀ = P.   Yes.
Q with Orthonormal Columns
 
      Q1 = (1/3) [2; 2; −1]                        Q1ᵀQ1 = 1
      Q2 = (1/3) [2 2; 2 −1; −1 2]                 Q2ᵀQ2 = [1 0; 0 1]
      Q3 = (1/3) [2 2 −1; 2 −1 2; −1 2 2]          Q3ᵀQ3 = [1 0 0; 0 1 0; 0 0 1]
I The columns of each Q are orthonormal.
I Each one of those matrices has QᵀQ = I.
I Qᵀ is a left inverse of Q.
I Q3Q3ᵀ = I: Q3ᵀ is also a right inverse (Q3 is square).
Orthogonal Projection

I All the matrices P = QQᵀ have Pᵀ = P:
      Pᵀ = (QQᵀ)ᵀ = QQᵀ = P
I All the matrices P = QQᵀ have P² = P:
      P² = (QQᵀ)(QQᵀ) = Q(QᵀQ)Qᵀ = QQᵀ = P
I P is a projection matrix.

Orthogonal Projection
If P² = P = Pᵀ then P b is the orthogonal projection of b onto the column space of P.



Orthogonal Projection
I Project b = (3, 3, 3) onto the Q1 line. P1 = Q1Q1ᵀ:
      P1 b = (1/9) [2; 2; −1] [2 2 −1] [3; 3; 3] = (1/9) [2; 2; −1] (9) = [2; 2; −1]
I P1 splits b into 2 perpendicular parts: the projection P1 b and the error e = b − P1 b.
Orthogonal Projection
I Project b = (3, 3, 3) onto the Q2 plane. P2 = Q2Q2ᵀ:
      P2 b = (1/9) [2 2; 2 −1; −1 2] [2 2 −1; 2 −1 2] [3; 3; 3] = (1/9) [2 2; 2 −1; −1 2] [9; 9] = [4; 1; 1]
I P2 projects b onto the column space of Q2.
I The error vector b − P2 b is shorter than b − P1 b.
Orthogonal Projection

 
      Q3 = (1/3) [2 2 −1; 2 −1 2; −1 2 2]
I What is P3 b = Q3Q3ᵀ b?
I It projects b onto the whole space R3.
I P3 = Q3Q3ᵀ = I. Thus, P3 b = b. Vector b is in R3 already.
I The error e is zero!



Orthogonalization

I Determine if a list of vectors a1, a2, . . . , ak is linearly independent.

Gram-Schmidt algorithm
given vectors a1, a2, . . . , ak
for i = 1, . . . , k
  1. Orthogonalization. q̃i = ai − (q1ᵀ ai) q1 − . . . − (qi−1ᵀ ai) qi−1
  2. Test for linear dependence. If q̃i = 0, quit.
  3. Normalization. qi = q̃i / ‖q̃i‖

I If the vectors are linearly independent, the Gram-Schmidt algorithm produces an orthonormal collection of vectors q1, . . . , qk (a NumPy transcription is sketched below).
I If the vectors a1, . . . , aj−1 are linearly independent, but a1, . . . , aj are linearly dependent, the algorithm detects this and terminates.
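A direct Python/NumPy transcription of the algorithm (a sketch; the tolerance in the zero test is an arbitrary choice):

import numpy as np

def gram_schmidt(vectors, tol=1e-10):
    # Returns the orthonormal vectors q_1, ..., q_k, or None if linear dependence is detected.
    q = []
    for a in vectors:
        q_tilde = a.astype(float)
        for qi in q:
            q_tilde = q_tilde - (qi @ a) * qi             # orthogonalization: remove the component along q_i
        if np.linalg.norm(q_tilde) < tol:                 # test for linear dependence
            return None
        q.append(q_tilde / np.linalg.norm(q_tilde))       # normalization
    return q

a1 = np.array([-1., 1., -1., 1.])
a2 = np.array([-1., 3., -1., 3.])
a3 = np.array([1., 3., 5., 7.])
print(gram_schmidt([a1, a2, a3]))   # reproduces q1, q2, q3 from the worked example on the next slides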



Orthogonalization: Example

a1 = (−1, 1, −1, 1), a2 = (−1, 3, −1, 3), a3 = (1, 3, 5, 7)

Applying the Gram-Schmidt algorithm gives the following results.
I i = 1:
      q̃1 = a1
      q1 = q̃1 / ‖q̃1‖ = (−1/2, 1/2, −1/2, 1/2)
I i = 2:
      q̃2 = a2 − (q1ᵀ a2) q1 = (−1, 3, −1, 3) − 4 (−1/2, 1/2, −1/2, 1/2) = (1, 1, 1, 1)
      q2 = q̃2 / ‖q̃2‖ = (1/2, 1/2, 1/2, 1/2)



Orthogonalization: Example

I i = 3:
      q̃3 = a3 − (q1ᵀ a3) q1 − (q2ᵀ a3) q2
          = (1, 3, 5, 7) − 2 (−1/2, 1/2, −1/2, 1/2) − 8 (1/2, 1/2, 1/2, 1/2) = (−2, −2, 2, 2)
      q3 = q̃3 / ‖q̃3‖ = (−1/2, −1/2, 1/2, 1/2)
I The completion of the Gram-Schmidt algorithm without early termination indicates that the vectors a1, a2, a3 are linearly independent.



QR factorization: A = QR

A = QR
      [a1 a2 . . . an] = [q1 q2 . . . qn] [r11 r12 . . . r1n; 0 r22 . . . r2n; . . . ; 0 0 . . . rnn]
      rkk = ‖q̃k‖,   rk−1,k = qk−1ᵀ ak   (and in general rjk = qjᵀ ak for j < k)
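A NumPy check using the vectors from the Gram-Schmidt example as the columns of A (np.linalg.qr may flip the sign of some columns of Q and rows of R, which is an equally valid factorization):

import numpy as np

A = np.column_stack([[-1., 1., -1., 1.], [-1., 3., -1., 3.], [1., 3., 5., 7.]])
Q, R = np.linalg.qr(A)          # reduced QR: Q is 4 x 3 with orthonormal columns, R is 3 x 3 upper triangular
print(Q)
print(R)
print(np.allclose(Q @ R, A))    # True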



QR factorization: A = QR

x̂ = (AᵀA)⁻¹Aᵀb
   = ((QR)ᵀ(QR))⁻¹(QR)ᵀb
   = (RᵀQᵀQR)⁻¹RᵀQᵀb
   = (RᵀR)⁻¹RᵀQᵀb      (because QᵀQ = I)
   = R⁻¹R⁻ᵀRᵀQᵀb
   = R⁻¹Qᵀb

Solve for x̂ by solving Rx̂ = Qᵀb with back-substitution.
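A sketch of least squares via QR in NumPy/SciPy (illustrative data; solve_triangular performs the back-substitution on the upper triangular R):

import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])
Q, R = np.linalg.qr(A)
x_hat = solve_triangular(R, Q.T @ b)             # solve R x = Q^T b by back-substitution
print(x_hat)
print(np.linalg.lstsq(A, b, rcond=None)[0])      # same answer from the library least-squares solver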

