
MA214:

Mathematics and Contemplations On


Linear Algebra and Its Applications
towards building a “Data Scientist”

Waleed A. Yousef, Ph.D.,

Human Computer Interaction Lab.,


Computer Science Department,
Faculty of Computers and Information,
Helwan University,
Egypt.

March 24, 2019

Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Lecture Notes (http://www.helwan.edu.eg/university/staff/Dr.WaleedYousef/HTML/Home.html)
follow:

Strang, G., 2016. Introduction to linear algebra, 5th Edition. Wellesley-Cambridge Press.

Searle, S. R., 1982. Matrix algebra useful for statistics. Wiley, New York.

Schott, J. R., 2005. Matrix analysis for statistics, 2nd Edition. Wiley, Hoboken, N.J.

Golub, G. H., Van Loan, C. F., 1996. Matrix computations, 3rd Edition. Johns Hopkins University Press, Baltimore.

i Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Linear Algebra: FCIHOCW vs. MITOCW

• Arabic vs. English.

• More rigorous treatment.

• Teaching with “Data Science” in mind.

ii Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Course Objectives

• Developing rigorous treatment.

• Developing mathematical foundations for many courses and areas, in particular “Data Science”.

• Building intuition.

• Linking to CS applications (e.g., Pattern Recognition, Image Processing, etc.)

iii Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Linear Algebra, Prerequisites, and
Applications

iv Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Data Science, Pattern Recognition, Machine Learning, Data Analysis, ...

(Figure: the prerequisite hierarchy; Discrete Mathematics at the base, then Calculus and Linear Algebra, then Optimization, Probability, and Data Visualization, then Statistics and Multivariate Statistics, feeding the fields above.)

• Some prerequisites are not so strict; others are possible, e.g., GPU, Algorithms, etc.
• It differs between researchers and practitioners; see the pattern recognition course and the big-picture talk.

v Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Computer Graphics

vi Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Contents

Contents vii

1 Introduction 1
1.0 Back to School: visual space! . . . . . . 2
1.1 Angle, Lengths, and Dot Products (visual space and school again) . . . . . . 6
1.2 Extension and Abstraction: Vectors and Linear Combinations . . . . . . 10

2 Solving Linear Equations 12
2.1 Vectors and Linear Equations . . . . . . 13
2.1.1 Three Equations in Three Unknowns . . . . . . 14
2.2 The Idea of Elimination . . . . . . 15
2.3 Rules for Matrix Operations . . . . . . 18
2.3.1 Matrix Transpose . . . . . . 19
2.3.2 Matrix Partitioning . . . . . . 20
2.3.3 Matrix Trace . . . . . . 23
2.3.4 Addition, Subtraction, and Scaling . . . . . . 24
2.3.5 Matrix Multiplication . . . . . . 25
2.3.6 The Laws of Algebra . . . . . . 36
2.4 Elimination Using Matrices . . . . . . 44
2.5 Inverse Matrices . . . . . . 51
2.6 Elimination Using Matrices is A = LU Factorization . . . . . . 60
2.7 Computational Issues: Scientific Computing Environments (SCEs), Examples, and Complexity . . . . . . 65
2.7.1 On Scientific Computing Environments and Libraries . . . . . . 66
2.7.2 Issues on Complexity* . . . . . . 67

3 Vector Spaces and Subspaces 70
3.1 Spaces of Vectors . . . . . . 71
3.1.1 Properties of Vector Spaces (seems trivial for R but deep for others!)* . . . . . . 73
3.1.2 Subspaces . . . . . . 74
3.1.3 The column space of the matrix A . . . . . . 76
3.2 The Nullspace of A: Solving Ax = 0 and Rx = 0 . . . . . . 78
3.2.1 Systematic solution using pivot columns, free columns, and reduced echelon form . . . . . . 81
3.2.2 Gauss Elimination Algorithm: revisited and detailed . . . . . . 82
3.3 The Complete Solution to Ax = b . . . . . . 86
3.4 Independence, Basis and Dimension . . . . . . 87
3.5 Dimensions of the Four Subspaces . . . . . . 88

Bibliography

viii Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Chapter 1

Introduction

1 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


1.0 Back to School: visual space!
• We locate a point in a 3D space by three numbers.

• The coordinates are perpendicular.

• The order of the axes X, Y, Z: “right-hand” rule.

• The 3-tuple (3 ordered elements, or triple)

(v_1, v_2, v_3) ∈ R × R × R = R³ = {(x, y, z) | x, y, z ∈ R},

the set of all points.

• The following are equivalent (some books differentiate; we do not):

• the 3-tuple v = (v_1, v_2, v_3).
• the point v = (v_1, v_2, v_3).
• the arrow connecting O to v, i.e., the vector v = \vec{Ov} = (v_1, v_2, v_3).

• The line segment Ov consists of all points, not only v.

2 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Definition 1 (Geometric Manipulation).

• A vector is used to indicate a displacement in some direction; the starting point is not important.

• Start at any point A, move a distance in the direction of \vec{Ov}, and end at B. Then \vec{AB} = \vec{Ov} = v. (B ≠ \vec{AB}; but v = \vec{Ov}.)

• Addition: u + v.

• Scalar Multiplication: if c is a scalar, then u = cv is a vector whose length is |c| × the length of v and whose direction is that of v for c > 0 and the opposite for c < 0.

• Scalar and Addition:

3 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Definition 2 (Algebraic Treatment). Addition and scalar: if a = (a_1, a_2, a_3), b = (b_1, b_2, b_3):

a + b = (a_1 + b_1, a_2 + b_2, a_3 + b_3),
ca = (c a_1, c a_2, c a_3).

Proof of equivalence. Trivial.

Hint: The displacement is added algebraically: given P = (a_1, a_2, a_3), and any A = (x, y, z). Then:

P = \vec{OP} = \vec{AB},
B = A + P = (x + a_1, y + a_2, z + a_3).

4 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Lemma 3 (Properties of Vectors). For any vectors a, b, c and scalars c, d,

a + b = b + a
a + (b + c) = (a + b) + c
a + 0 = a
a + (−a) = 0
c(a + b) = ca + cb
(c + d)a = ca + da

Proof. It is quite straightforward to prove (HW).

Example 4 Consider the vector a,

a = (a_1, a_2, a_3)
  = a_1 i + a_2 j + a_3 k,
i = (1, 0, 0),
j = (0, 1, 0),
k = (0, 0, 1).

5 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


1.1 Angle, Lengths, and Dot Products (visual space and school again)
• Notation: the vector u with 3-tuple (u_1, u_2, u_3) is written as the column u = (u_1, u_2, u_3)′, or u′ = (u_1, u_2, u_3).

• It is a school business to prove that (whether 2D or 3D):

c² = a² + b² − 2ab cos θ
∥u − v∥² = ∥u∥² + ∥v∥² − 2∥u∥∥v∥ cos θ
2∥u∥∥v∥ cos θ = (u_1² + u_2²) + (v_1² + v_2²) − (u_1 − v_1)² − (u_2 − v_2)² = 2u_1 v_1 + 2u_2 v_2
cos θ = (u_1 v_1 + u_2 v_2) / (∥u∥∥v∥).

• This is why we defined the dot product to be:

u′v = u_1 v_1 + u_2 v_2 = ∥u∥∥v∥ cos θ.

• When u′v is zero we say they are orthogonal.

• If u = v, then θ = 0 and u′u = u_1 u_1 + u_2 u_2 = ∥u∥².

• u is a unit vector if ∥u∥ = 1. Then, for any u ≠ 0, u/∥u∥ is a unit vector.

u′v = ∥v∥∥u∥ cos θ = ∥v∥ × projection length of u on v
u′(v/∥v∥) = projection length of u on v
v′(u/∥u∥) = projection length of v on u.
6 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
Lemma 5 (Properties).

• Basic properties:

u′v = v′u
∥au∥ = |a| ∥u∥
a(u′v) = (au)′v = au′v
(au + bv)′w = au′w + bv′w
(u + v)′(u + v) = u′u + 2u′v + v′v.

• Cauchy–Schwarz inequality: −∥u∥∥v∥ ≤ u′v ≤ ∥u∥∥v∥

Proof. Immediate from both: −1 ≤ cos θ ≤ 1 and u′v = ∥u∥∥v∥ cos θ.

• Triangle inequality: ∥u + v∥ ≤ ∥u∥ + ∥v∥

Proof. ∥u + v∥² = (u + v)′(u + v) = u′u + 2u′v + v′v ≤ ∥u∥² + 2∥u∥∥v∥ + ∥v∥² = (∥u∥ + ∥v∥)².

• Then, we can generalize this definition to higher dimensions, and define the angle between two vectors
for p > 3.

7 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Example 6 w = (−1, 2)′, v = (4, 2)′; then

cos θ = w′v / (∥w∥∥v∥) = ((−1)(4) + (2)(2)) / (√((−1)² + (2)²) √((4)² + (2)²)) = 0 / (√5 √20) = 0

Example 7 (3D).

Hint: To save space, we write, e.g., v = (4, 2)′. Sometimes we drop the prime if there is no confusion.
8 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
Example 8 (Linear Combination) .

• We can generalize to p-dimensions, although we cannot visualize.

• What is the picture for ALL linear combinations? “spanning” the space, independence ...
9 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
1.2 Extension and Abstraction: Vectors and Linear Combinations
Extension in both: meaning and number of components to treat applications.

Definition 9 (Vector) The ordered p-tuple (v_1, v_2, ⋯, v_p), v_i ∈ R, is called a p-dimensional vector.

Definition 10 (dot product (inner product), length, angle)

⟨u, v⟩ = u·v = u′v = (u_1, ⋯, u_p) (v_1, ⋯, v_p)′
       = u_1 v_1 + ⋯ + u_p v_p = Σ_{i=1}^{p} u_i v_i
∥u∥ = √(u′u) = √(u_1² + u_2² + ⋯ + u_p²)
cos θ = u′v / (∥u∥∥v∥).

Now, we have to reprove the Cauchy–Schwarz inequality; then the triangle inequality follows directly!

Proof. ∀λ ∈ R (we will later set λ = u′v/∥v∥²):

0 ≤ ∥u − λv∥² = ∥u∥² − 2λu′v + ∥λv∥² = ∥u∥² − 2(u′v)²/∥v∥² + (u′v)²∥v∥²/∥v∥⁴ = ∥u∥² − (u′v)²/∥v∥²
(u′v)² ≤ ∥u∥²∥v∥² ⟹ −∥u∥∥v∥ ≤ u′v ≤ ∥u∥∥v∥

10 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Definition 11 (Linear Combination: generalization to adding vectors; this is the abstraction).
Consider the two p-dimensional vectors v and w, and c, d ∈ R. We call cv + dw a linear combination:

c (v_1, ⋯, v_p)′ + d (w_1, ⋯, w_p)′ = (c v_1 + d w_1, ⋯, c v_p + d w_p)′.

11 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Chapter 2

Solving Linear Equations

12 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


2.1 Vectors and Linear Equations

x − 2y = 1          [1 −2] [x]   [ 1]
3x + 2y = 11   ≡    [3  2] [y] = [11]   ≡   Ax = b

Column picture (linear combination):
x (1, 3)′ + y (−2, 2)′ = (1, 11)′; indeed 3 (1, 3)′ + 1 (−2, 2)′ = (1, 11)′.

Row picture (vector equation of line intersection):
(1 −2)(x, y)′ = 1
(3  2)(x, y)′ = 11

Projection onto a row α (here the second row, α = (3, 2)′): (x, y)(α/∥α∥) = b/∥α∥ = 11/√13.

13 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


2.1.1 Three Equations in Three Unknowns

x + 2y + 3z = 6         [1  2  3] [x]   [6]
2x + 5y + 2z = 4   ≡    [2  5  2] [y] = [4]   ≡   Ax = b ≡ C_1 x + C_2 y + C_3 z = (R_1 x, R_2 x, R_3 x)′ = b,
6x − 3y + z = 2         [6 −3  1] [z]   [2]

where x = (x, y, z)′, C_j denotes the j-th column of A, and R_i its i-th row.

14 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


2.2 The Idea of Elimination
A systematic way to solve linear equations:

• Find the pivot (1 in this example).

• Form the upper triangular system of equations.

• Back-substitution.

Example 12 (2 equations)

x − 2y = 1
3x + 2y = 11      (Before)

x − 2y = 1
    8y = 8        (After)

Example 13 (3 equations)

2x + 4y − 2z = 2       2x + 4y − 2z = 2       2x + 4y − 2z = 2
4x + 9y − 3z = 8           1y + 1z = 4            1y + 1z = 4
−2x − 3y + 7z = 10         1y + 5z = 12               4z = 8
Step 0                 Step 1                 Step 2

The solution is: z = 2, y = 2, x = −1; i.e., (−1, 2, 2)

15 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Failure 1: no solution

x − 2y = 1
3x − 6y = 11      (Before)

x − 2y = 1
    0y = 8        (After)

Interpretation from two perspectives

16 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Failure 2: infinite solutions

x − 2y = 1
3x − 6y = 3       (Before)

x − 2y = 1
    0y = 0        (After)

Interpretation from two perspectives

17 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


2.3 Rules for Matrix Operations
Definition 14 (Matrix): A matrix A_{m×n} is a rectangular array (of size m × n) of “objects” (could be numbers, could be other blocks of matrices). The element a_ij is located in row i and column j. We say A = (a_ij) or, in some books, A = ((a_ij)) to denote:

[a_11  a_12  ⋯  a_1n]
[a_21  a_22  ⋯  a_2n]
[  ⋮     ⋮    ⋱    ⋮ ]
[a_m1  a_m2  ⋯  a_mn]

• Languages store matrices differently; e.g., Matlab (column-wise, as with images), C (row-wise), Fortran (column-wise), etc.

• Traversing matrices is Θ(m × n).
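
As a tiny illustration of the row-wise storage convention in C, here is a minimal sketch; the matrix is assumed stored as a 1D array of length m·n, and the function name sum_entries is made up for this example:

    /* Traverse all m*n entries of a row-major matrix: Theta(m*n) steps.
       A[i*n + j] is the (i,j) element. */
    double sum_entries(const double *A, int m, int n) {
        double s = 0.0;
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                s += A[i * n + j];
        return s;
    }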

18 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


2.3.1 Matrix Transpose
Definition 15 The transpose of the matrix A_{m×n} is (A′)_{n×m}, where A_ij = (A′)_ji.

Example 16

A = [18  17  11]        A′ = [18  19]
    [19  −4   0]             [17  −4]
                             [11   0]

Notice:

• (A′)′ = A.

• For vectors:

x = (19, −4, 0)′,  x′ = (19 −4 0).

We usually write x = (19 −4 0)′, or x′ = (19 −4 0), to save vertical space.

Definition 17 (Symmetric Matrices (around diagonal)) A square matrix A_{m×m} is called symmetric if A_ij = A_ji; i.e., A = A′.

Example 18 (write software to check the symmetry of; see the sketch below):

A = [18  17  11]
    [17  −4   0]
    [11   0   2]
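
A minimal C sketch of such a check, assuming row-major 1D storage and exact floating-point comparison (the name is_symmetric is made up):

    #include <stdbool.h>

    /* Returns true iff the m-by-m matrix A satisfies A_ij == A_ji. */
    bool is_symmetric(const double *A, int m) {
        for (int i = 0; i < m; i++)
            for (int j = i + 1; j < m; j++)   /* only the upper triangle */
                if (A[i * m + j] != A[j * m + i])
                    return false;
        return true;
    }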

19 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


2.3.2 Matrix Partitioning

Definition 19 A matrix A_{p×q} is said to be partitioned into r ≤ p rows and c ≤ q columns if it is written in the form

    [A_11  A_12  ⋯  A_1c]  ↕ p_1
A = [A_21  A_22  ⋯  A_2c]  ↕ p_2        (2.1)
    [  ⋮     ⋮    ⋱    ⋮ ]     ⋮
    [A_r1  A_r2  ⋯  A_rc]  ↕ p_r
     ←q_1→ ←q_2→  ⋯  ←q_c→

where the block (submatrix) A_ij is a matrix of size p_i × q_j, and of course

Σ_{i=1}^{r} p_i = p,   Σ_{j=1}^{c} q_j = q.

Example 20

    [1 6 8 9 | 3 8]
    [2 4 1 6 | 1 1]
A = [3 3 6 1 | 2 1]  =  [(A_11)_{3×4}  (A_12)_{3×2}]
    [9 1 4 6 | 8 7]     [(A_21)_{2×4}  (A_22)_{2×2}]
    [6 8 1 4 | 3 2]

From the definition, a partitioning whose cuts do not run through the whole matrix (different rows cut at different columns) is not allowed.

Application: dividing vectors in regression.

20 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Transposing Partitioned Matrix

It is quite easy to show that the transpose of a partitioned matrix (2.1) is given by

     [A_11  A_12  ⋯  A_1c]′   [A′_11  A′_21  ⋯  A′_r1]
A′ = [A_21  A_22  ⋯  A_2c]  = [A′_12  A′_22  ⋯  A′_r2]
     [  ⋮     ⋮    ⋱    ⋮ ]    [   ⋮      ⋮    ⋱     ⋮ ]
     [A_r1  A_r2  ⋯  A_rc]    [A′_1c  A′_2c  ⋯  A′_rc]

Example 21

A = [2 8 9] = (A_11  A_12)
    [3 7 4]

A′ = [A′_11]   [2 3]
     [A′_12] = [8 7]
               [9 4]

21 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Partitioning into Vectors

Suppose that a_j is the j-th column of A_{r×c}. Then

A = (a_1  a_2  ⋯  a_c) = (A_11  A_12  ⋯  A_1c),

where each submatrix is just an r × 1 vector.

Similarly, A can be partitioned into r rows, where α′_i is the i-th row:

    [A_11]   [α′_1]
A = [A_21] = [α′_2]
    [  ⋮ ]   [  ⋮ ]
    [A_r1]   [α′_r]

22 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


2.3.3 Matrix Trace
Definition 22 For a square matrix A_{m×m}, the trace, trace(A) (for short, tr(A)), is defined as the sum of the diagonal elements; i.e.,

tr(A) = Σ_{i=1}^{m} A_ii.

HW: write a C function to calculate the trace (of course Θ(m)); a sketch follows.
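
One possible answer to the HW, assuming the matrix is stored row-major as a 1D array (the name trace is the obvious one, but any would do):

    /* Trace of an m-by-m matrix: sum of the diagonal; Theta(m) steps. */
    double trace(const double *A, int m) {
        double t = 0.0;
        for (int i = 0; i < m; i++)
            t += A[i * m + i];   /* the diagonal element A_ii */
        return t;
    }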

Corollary 23

tr(A) = tr(A′).
tr(x) = x ∀x ∈ R.

Proof.

tr(A) = Σ_i A_ii = Σ_i (A′)_ii = tr(A′).

Example 24

A = [1   7   6]
    [8   3   9]   ⟹   tr(A) = −4.
    [4  −2  −8]

23 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


2.3.4 Addition, Subtraction, and Scaling
Definition 25 For equal-size matrices A_{m×n} and B_{m×n}, and for a scalar λ:

• the matrix C = A ± B is defined as

C_ij = A_ij ± B_ij,

• the matrix D = λA is defined as

D_ij = λ A_ij,

• we say that A = B if A_ij = B_ij ∀i, j,

• and a matrix, all of whose components are zeros, is written as 0_{m×n}.

• Of course, A + 0 = A.

Corollary 26 It is quite easy to show that

(A + B)′ = A′ + B′
tr(A + B) = tr(A) + tr(B)

Proof. Show that the general element ij of the LHS equals that of the RHS:

((A + B)′)_ij = (A + B)_ji = A_ji + B_ji = (A′)_ij + (B′)_ij = (A′ + B′)_ij.
tr(A + B) = Σ_i (A + B)_ii = Σ_i (A_ii + B_ii) = Σ_i A_ii + Σ_i B_ii = tr(A) + tr(B).

24 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


2.3.5 Matrix Multiplication

    [a_11  ⋯  a_1n] [b_11  ⋯  b_1p]
C = [  ⋮   ⋱    ⋮ ] [  ⋮   ⋱    ⋮ ] = C_{m×p}
    [a_m1  ⋯  a_mn] [b_n1  ⋯  b_np]

The general element C_ik is the dot product of row i (of A) and column k (of B):

C_ik = a′_i b_k = Σ_{j=1}^{n} a_ij b_jk = a_i1 b_1k + a_i2 b_2k + ⋯ + a_in b_nk.

However, we can partition either (or both) of A_{m×n} and B_{n×p} into rows and/or columns to see the multiplication differently. This has great value in mathematical treatments and semantics. We have only 4 ways to do that (block shapes of A and B respectively):

1. A as m×1 blocks, B as 1×p blocks (dot products).

2. A as 1×n blocks, B as n×p blocks (linear combinations of columns of A).

3. A as m×n blocks, B as n×1 blocks (linear combinations of rows of B).

4. A as 1×n blocks, B as n×1 blocks (summation of outer products).

Now, we will treat each case in detail.

25 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


1- As dot products

    [a′_1]
C = [  ⋮ ] (b_1 ⋯ b_p)        (A as m×1, B as 1×p partitioning)
    [a′_m]

    [a′_1 b_1  ⋯  a′_1 b_p]   [Σ_{j=1}^{n} a_1j b_j1  ⋯  Σ_{j=1}^{n} a_1j b_jp]
  = [    ⋮     ⋱       ⋮  ] = [          ⋮            ⋱            ⋮          ]
    [a′_m b_1  ⋯  a′_m b_p]   [Σ_{j=1}^{n} a_mj b_j1  ⋯  Σ_{j=1}^{n} a_mj b_jp]

C_ik = a′_i b_k = Σ_{j=1}^{n} a_ij b_jk = a_i1 b_1k + ⋯ + a_in b_nk

Example 27

[1 4] [3 2  0]   [3+4  2+16  0−4]   [7  18  −4]
[1 5] [1 4 −1] = [3+5  2+20  0−5] = [8  22  −5]

26 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


2- As linear combinations of columns of A

                 [b_11  ⋯  b_1p]
C = (a_1 ⋯ a_n)  [  ⋮   ⋱    ⋮ ]        (A as 1×n, B as n×p partitioning)
                 [b_n1  ⋯  b_np]

  = (b_11 a_1 + ⋯ + b_n1 a_n   ⋯   b_1p a_1 + ⋯ + b_np a_n)
  = (Σ_j b_j1 a_j   ⋯   Σ_j b_jp a_j)
  = (c_1 ⋯ c_p)

C_ik = (c_k)_i = (Σ_j b_jk a_j)_i = Σ_j b_jk (a_j)_i = Σ_j b_jk a_ij.

Example 28

C = [1 4] [3 2  0]
    [1 5] [1 4 −1]

  = ( 3(1, 1)′ + 1(4, 5)′   2(1, 1)′ + 4(4, 5)′   0(1, 1)′ + (−1)(4, 5)′ )

  = [3+4  2+16  0−4]   [7  18  −4]
    [3+5  2+20  0−5] = [8  22  −5]

27 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


3- As linear combinations of rows of B

    [a_11  ⋯  a_1n] [b′_1]
C = [  ⋮   ⋱    ⋮ ] [  ⋮ ]        (A as m×n, B as n×1 partitioning)
    [a_m1  ⋯  a_mn] [b′_n]

    [a_11 b′_1 + ⋯ + a_1n b′_n]   [Σ_{j=1}^{n} a_1j b′_j]   [c′_1]
  = [            ⋮            ] = [          ⋮          ] = [  ⋮ ]
    [a_m1 b′_1 + ⋯ + a_mn b′_n]   [Σ_{j=1}^{n} a_mj b′_j]   [c′_m]

C_ik = (c′_i)_k = (Σ_j a_ij b′_j)_k = Σ_j a_ij (b′_j)_k = Σ_j a_ij b_jk.

Example 29

[1 4] [3 2  0]   [1 (3 2 0) + 4 (1 4 −1)]   [3+4  2+16  0−4]   [7  18  −4]
[1 5] [1 4 −1] = [1 (3 2 0) + 5 (1 4 −1)] = [3+5  2+20  0−5] = [8  22  −5]

28 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


4- As summation of outer products, each is a matrix

                 [b′_1]
C = (a_1 ⋯ a_n)  [  ⋮ ]        (A as 1×n, B as n×1 partitioning)
                 [b′_n]

  = a_1 b′_1 + ⋯ + a_n b′_n = Σ_{j=1}^{n} a_j b′_j,

           [a_1j]         [a_1j b′_j]
a_j b′_j = [  ⋮ ] b′_j  = [    ⋮    ],
           [a_mj]         [a_mj b′_j]

C_ik = (Σ_j a_j b′_j)_ik = Σ_j (a_j b′_j)_ik = Σ_j a_ij b_jk.

Example 30

[1 4] [3 2  0]   [1](3 2 0) + [4](1 4 −1)   [3 2 0]   [4 16 −4]   [7  18  −4]
[1 5] [1 4 −1] = [1]          [5]         = [3 2 0] + [5 20 −5] = [8  22  −5]
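
The four views are just four loop orderings of the same triple loop. A minimal C sketch of view 4, accumulating one outer product a_j b′_j per iteration of the outer loop (matrices row-major; the name matmul_outer is made up; C must be zeroed by the caller):

    /* C (m-by-p, pre-zeroed) += sum over j of the outer product a_j b'_j,
       where a_j is column j of A (m-by-n) and b'_j is row j of B (n-by-p). */
    void matmul_outer(const double *A, const double *B, double *C,
                      int m, int n, int p) {
        for (int j = 0; j < n; j++)            /* one outer product per j */
            for (int i = 0; i < m; i++)
                for (int k = 0; k < p; k++)
                    C[i * p + k] += A[i * n + j] * B[j * p + k];
    }

Moving the j loop innermost instead would give view 1 (dot products); the arithmetic and the result are identical.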
j j j

29 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Partitioned Matrices and Multiplication (general case)

Subdivide each matrix into a conforming number of blocks, e.g.,

AB = [A_11  A_12] [B_11]   [A_11 B_11 + A_12 B_21]
     [A_21  A_22] [B_21] = [A_21 B_11 + A_22 B_21]        (must conform)

In general, for A_{m×n} B_{n×p}: partition A into r × c blocks whose column widths are n_1, n_2, …, n_c, and B into c × k blocks whose row heights are the same n_1, n_2, …, n_c, with

n_1 + ⋯ + n_c = n,

so that every block product A_il B_lj conforms.

30 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Product with Diagonal Matrix

Definition 31 A matrix D is diagonal if D_ij = 0 ∀i ≠ j; i.e.,

    [d_1  0   ⋯   0 ]
D = [0   d_2      ⋮ ]
    [⋮        ⋱   0 ]
    [0    ⋯   0  d_m]

Since there is no confusion, we subscript d_i instead of d_ii. We also, for short, write D = diag(d_1, …, d_m).

Row scaling:

D_{m×m} A_{m×n} = diag(d_1, …, d_m) (a′_1; ⋯; a′_m) = (d_1 a′_1; ⋯; d_m a′_m)

Column scaling:

A_{m×n} D_{n×n} = (a_1 ⋯ a_n) diag(d_1, …, d_n) = (a_1 d_1 ⋯ a_n d_n)
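
Both scalings are Θ(mn) and never require forming D. A C sketch (row-major storage; the function names are made up):

    /* D A: multiply row i of A (m-by-n) by d[i], in place. */
    void scale_rows(double *A, const double *d, int m, int n) {
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                A[i * n + j] *= d[i];
    }

    /* A D: multiply column j of A (m-by-n) by d[j], in place. */
    void scale_cols(double *A, const double *d, int m, int n) {
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                A[i * n + j] *= d[j];
    }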

Definition 32 The identity matrix I is a special-case diagonal matrix, defined as

I_{m×m} = diag(1, …, 1).

It is obvious that IA = AI = A.
31 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
Transpose of a Product

Lemma 33 For conforming matrices A_{m×n} and B_{n×p},

(AB)′ = B′A′,

and more generally

(A_1 ⋯ A_n)′ = A′_n ⋯ A′_1.

Proof. The general element of (AB)′ is given by

((AB)′)_ki = (AB)_ik = Σ_{j=1}^{n} A_ij B_jk = Σ_{j=1}^{n} (A′)_ji (B′)_kj = Σ_{j=1}^{n} (B′)_kj (A′)_ji = (B′A′)_ki.

Proving the second part is immediate by induction.

32 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Trace of a Product

The trace is defined only for a square matrix; hence, for a product to have a trace it must be A_{m×n} B_{n×m}.

Lemma 34 For two-side conforming matrices A_{m×n} and B_{n×m},

tr(AB) = tr(BA),

and more generally

tr(A_1 ⋯ A_n) = tr(A_n ⋯ A_1).

Proof. A_{m×n} B_{n×m} = C_{m×m}, B_{n×m} A_{m×n} = D_{n×n}:

tr(AB) = Σ_{i=1}^{m} (AB)_ii = Σ_{i=1}^{m} Σ_{j=1}^{n} A_ij B_ji = Σ_{j=1}^{n} Σ_{i=1}^{m} B_ji A_ij = Σ_{j=1}^{n} (BA)_jj = tr(BA).

Remark 1 From the proof above, we see that

tr(AB) = Σ_{j=1}^{n} Σ_{i=1}^{m} B_ji A_ij = Σ_{j=1}^{n} Σ_{i=1}^{m} (B′)_ij A_ij,

i.e., it is the sum of products of each element of A multiplied by the corresponding element of B′. And if B = A′,

tr(AA′) = tr(A′A) = Σ_{j=1}^{n} Σ_{i=1}^{m} A_ij A_ij = Σ_{j=1}^{n} Σ_{i=1}^{m} A²_ij,

i.e., it is the sum of squares of all elements.


33 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
Example 35

A = [ 1  2  3]
    [−4  3  0],

tr(AA′) = 1² + 2² + 3² + (−4)² + 3² + 0² = 39
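
Remark 1 also says that tr(AB) can be computed in Θ(mn) without ever forming the product AB (which would cost Θ(m·n·m)). A C sketch (row-major; the name trace_product is made up):

    /* tr(AB) for A (m-by-n) and B (n-by-m), without forming AB:
       tr(AB) = sum over i,j of A_ij * B_ji. */
    double trace_product(const double *A, const double *B, int m, int n) {
        double t = 0.0;
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                t += A[i * n + j] * B[j * m + i];
        return t;
    }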

34 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Power of a Matrix

A^k = A A ⋯ A, k times (A must be square; why?)

Example 36 (Graph Theory):

• The traffic is represented as a matrix T, where a path from S_i to S_j exists if T_ij = 1:

T = [0 1 1 0 0]
    [0 0 1 1 0]
    [0 0 0 0 1]
    [0 0 0 0 1]
    [0 0 1 0 0]

• The number of ways of getting from S_i to S_k in exactly 2 steps is Σ_j T_ij T_jk = (T²)_ik.

• The number of ways of getting from S_i to S_k in exactly 3 steps is Σ_j (T²)_ij T_jk = (T³)_ik.

     [0 0 1 1 1]        [0 0 1 0 2]
     [0 0 0 0 2]        [0 0 2 0 0]
T² = [0 0 1 0 0],  T³ = [0 0 0 0 1]
     [0 0 1 0 0]        [0 0 0 0 1]
     [0 0 0 0 1]        [0 0 1 0 0]

• The number of ways of getting from S_i to S_k in exactly r steps is Σ_j (T^{r−1})_ij T_jk = (T^r)_ik.

• There is no path from S_i to S_k only if Σ_{r=1}^{∞} (T^r)_ik = 0.

• What is Σ_{r=1}^{∞} T^r? (A sketch of the 2-step computation follows.)
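
A C sketch of the 2-step walk count, i.e., one matrix multiplication of the 0/1 adjacency matrix with itself (row-major; the name walks2 is made up):

    /* T2 = T * T for an n-by-n adjacency matrix:
       (T^2)_ik = sum over j of T_ij * T_jk = number of 2-step walks i -> k. */
    void walks2(const int *T, int *T2, int n) {
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++) {
                T2[i * n + k] = 0;
                for (int j = 0; j < n; j++)
                    T2[i * n + k] += T[i * n + j] * T[j * n + k];
            }
    }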
35 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
2.3.6 The Laws of Algebra
Theorem 37 ∀A_{m×n}, B_{m×n}, C_{m×n} and scalar c, we have

A + B = B + A                  (commutative)
c(A + B) = cA + cB             (distributive)
A + (B + C) = (A + B) + C      (associative)

and

C(A + B) = CA + CB             (∀A_{m×n}, B_{m×n}, C_{k×m})
(A + B)C = AC + BC             (∀A_{m×n}, B_{m×n}, C_{n×p})
A(BC) = (AB)C                  (∀A_{m×n}, B_{n×p}, C_{p×q})
A_{m×n} B_{n×m} ≠ B_{n×m} A_{m×n}    (in general)
A_{m×m} B_{m×m} ≠ B_{m×m} A_{m×m}    (in general)

Example 38 (Counter example for AB ≠ BA) Some pairs do commute:

[1 2] [0 2]   [6   8]   [0 2] [1 2]
[3 4] [3 3] = [12 18] = [3 3] [3 4]

but in general they do not:

[1 2] [0 1]   [6  11]     [3   4]   [0 1] [1 2]
[3 4] [3 5] = [12 23]  ≠  [18 26] = [3 5] [3 4]

36 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Proof of the multiplication associative rule. For A_{m×n} B_{n×p} C_{p×q}:

(AB)_ij = Σ_{r=1}^{n} A_ir B_rj

((AB)C)_ik = Σ_{j=1}^{p} (AB)_ij C_jk
           = Σ_{j=1}^{p} Σ_{r=1}^{n} A_ir B_rj C_jk
           = Σ_{r=1}^{n} Σ_{j=1}^{p} A_ir B_rj C_jk
           = Σ_{r=1}^{n} A_ir (BC)_rk
           = (A(BC))_ik

37 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Example 39 Factor Y = XPX + QX² + X and find the constraints on the orders of the matrices.
It is clear that all matrices must be of order m × m:

Y = XPX + QX² + X
  = XPX + QXX + X
  = (XP + QX + I)X

38 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Product with Scalar and Quadratic Forms

Back to Definition 2: it is sometimes very important to keep even scalars conforming; i.e., we write

y_{m×1} a_{1×1}   NOT   a y.

This is because, sometimes, a_{1×1} itself is a matrix product that, if disassembled, should conform with the rest of the equation:

a_{1×1} = x′_{1×m} A_{m×m} x_{m×1}
y_{n×1} a_{1×1} = y_{n×1} x′_{1×m} A_{m×m} x_{m×1}
a_{1×1} y_{n×1} = x′_{1×m} A_{m×m} x_{m×1} y_{n×1}        (WRONG!)

39 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.



Example 40 (Quadratic Form) For any square matrix A, the form y_{1×1} = x′_{1×n} A_{n×n} x_{n×1} is called a quadratic form; it contains all quadratic and bilinear terms. (For the scalar case, simply y = xax = ax².)

y = (x_1 ⋯ x_n) [a_11 ⋯ a_1n; ⋮ ⋱ ⋮; a_n1 ⋯ a_nn] (x_1, …, x_n)′
  = (Σ_i x_i a_i1   Σ_i x_i a_i2   ⋯   Σ_i x_i a_in) (x_1, …, x_n)′
  = Σ_j (Σ_i x_i a_ij) x_j
  = Σ_j Σ_i a_ij x_j x_i
  = Σ_i a_ii x_i²  +  Σ_{i≠j} a_ij x_i x_j                 (all off-diagonal; Ver. 1)
  = Σ_i a_ii x_i²  +  Σ_{i>j} (a_ij + a_ji) x_i x_j         (LT and UT paired; Ver. 2)

This is because, e.g., a_13 x_1 x_3 + a_31 x_3 x_1 = (a_13 + a_31) x_1 x_3.

Complexity of Ver. 1: we sum (n² − n) off-diagonal terms, each term costing 2 multiplications (a_ij x_i x_j); therefore the total number of steps is given by:

#steps = (n² − n)(2M) + (n² − n − 1)(S)
       = 2(n² − n) M + (n² − n − 1) S

Complexity of Ver. 2: we sum (n² − n)/2 lower-triangular terms, each term costing one addition and 2 multiplications; therefore

#steps = ((n² − n)/2)(2M + 1S) + ((n² − n)/2 − 1) S
       = (n² − n) M + (n² − n − 1) S.

Ver. 2 needs half the number of multiplications of Ver. 1; almost double the speed, gained by a simple trick (sketched in code below).
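
A C sketch of Ver. 2, assuming row-major storage (the name quadratic_form is made up; the diagonal terms are handled separately, and each off-diagonal pair (i, j), (j, i) is touched once):

    /* y = x'Ax via y = sum_i a_ii x_i^2 + sum_{i>j} (a_ij + a_ji) x_i x_j,
       halving the multiplications of the naive double loop. */
    double quadratic_form(const double *A, const double *x, int n) {
        double y = 0.0;
        for (int i = 0; i < n; i++) {
            y += A[i * n + i] * x[i] * x[i];              /* diagonal terms */
            for (int j = 0; j < i; j++)                   /* lower triangle */
                y += (A[i * n + j] + A[j * n + i]) * x[i] * x[j];
        }
        return y;
    }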
For the following quadratic form y, expand column-wise (Σ_j Σ_i):

y = (x_1 x_2 x_3) [1 2 3; 4 7 6; 2 −2 0] (x_1, x_2, x_3)′
  = x_1² + 4x_2 x_1 + 2x_3 x_1 + 2x_1 x_2 + 7x_2² − 2x_3 x_2 + 3x_1 x_3 + 6x_2 x_3
  = x_1² + (2 + 4) x_1 x_2 + (3 + 2) x_1 x_3 + 7x_2² + (6 − 2) x_2 x_3
  = x_1² + 6x_1 x_2 + 5x_1 x_3 + 7x_2² + 4x_2 x_3.

41 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Without expansion, it is obvious that, e.g.,

(x_1 x_2 x_3) [1 2 3; 4 7 6; 2 −2 0] (x_1, x_2, x_3)′ = (x_1 x_2 x_3) [1 1 1; 5 7 3; 4 1 0] (x_1, x_2, x_3)′,
                      A                                                       B

because A_ij + A_ji = B_ij + B_ji ∀i, j.

Hence, we can replace the matrix A in any quadratic form y = x′Ax by the symmetric matrix Σ = (A + A′)/2, whose diagonal and off-diagonal elements satisfy:

σ_ii = (a_ii + a_ii)/2 = a_ii
σ_ij + σ_ji = (a_ij + a_ji)/2 + (a_ji + a_ij)/2 = a_ij + a_ji

x′Ax = x′Σx

42 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Example 41 Expand and simplify y = (x − µ)′ Σ (x − µ), where x and µ are vectors and Σ is a symmetric matrix.

y = (x − µ)′ Σ (x − µ)
  = (x′ − µ′) Σ (x − µ)
  = x′Σx − x′Σµ − µ′Σx + µ′Σµ
  = x′Σx − x′Σµ − (µ′_{1×p} Σ_{p×p} x_{p×1})′ + µ′Σµ        (scalar′ = scalar)
  = x′Σx − x′Σµ − x′Σµ + µ′Σµ
  = x′Σx − 2x′Σµ + µ′Σµ

43 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


2.4 Elimination Using Matrices
Back to the linear system of equations (Ex. 13) (and using pivots for elimination):

2x + 4y − 2z = 2           [ 2  4 −2] [x]   [ 2]
4x + 9y − 3z = 8    ⟹     [ 4  9 −3] [y] = [ 8]
−2x − 3y + 7z = 10         [−2 −3  7] [z]   [10]

(col1) x + (col2) y + (col3) z = ((row1)·x, (row2)·x, (row3)·x)′ = b

To eliminate: R_2^new = R_2 + (−2) × R_1, which can be accomplished by the matrix multiplication:

[ 1 0 0] [ 2  4 −2] [x]   [ 1 0 0] [ 2]
[−2 1 0] [ 4  9 −3] [y] = [−2 1 0] [ 8]
[ 0 0 1] [−2 −3  7] [z]   [ 0 0 1] [10]

[ 2  4 −2] [x]   [ 2]
[ 0  1  1] [y] = [ 4].
[−2 −3  7] [z]   [10]

We denote the elimination matrix

[ 1 0 0]
[−2 1 0]
[ 0 0 1]

by E_21(−2).

Definition 42 The elimination matrix E_ij(l) is an identity matrix except for the element e_ij = l, so that it performs R_i^new = R_i + l × R_j. If not ambiguous, we write E_ij.
44 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
 
1 0 0
E 31 (1) = 0 1 0
1 0 1
      
1 0 0 2 4 −2 x 1 0 0 2
0 1 0  0 1 1   y  = 0 1 0  4 
1 0 1 −2 −3 7 z 1 0 1 10
    
2 4 −2 x 2
0 1 1   y  =  4 
0 1 5 z 12

Finally,
 
1 0 0

E 32 (−1) = 0 1 0
0 −1 1
      
1 0 0 2 4 −2 x 1 0 0 2
0 1 0 0 1 1    
y = 0 1 0   4
0 −1 1 0 1 5 z 0 −1 1 12
    
2 4 −2 x 2
0 1 1    
y = 4 ,
0 0 4 z 8

whose solution is z = 2, y = 2, x = −1.


The summary of that is
E 32 (−1)E 31 (1)E 21 (−2)AX = E 32 (−1)E 31 (1)E 21 (−2)b.
45 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
Just for simpler notation (with everything else the same), we could have formed the augmented matrix

(A|b) = [ 2  4 −2 |  2]
        [ 4  9 −3 |  8]
        [−2 −3  7 | 10]

and then computed E_32 E_31 E_21 (A|b).

Definition 43 The permutation matrix P_ij is an identity matrix except that in rows i and j (to be permuted) the ones are located at positions (i, j) and (j, i) respectively; e.g.,

       [1 0 0]
P_23 = [0 0 1].
       [0 1 0]

Of course, P_ij = P_ji.

This is needed to swap equations when the pivot is zero.

46 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Example 44

x + 2y + 2z = 1                   [1 2 2 | 1]
4x + 8y + 9z = 3    ⟹  (A|b) =  [4 8 9 | 3]
     3y + 2z = 1                  [0 3 2 | 1]

[ 1 0 0]        [1 2 2 |  1]
[−4 1 0] ↙  =  [0 0 1 | −1]        (E_21(−4))
[ 0 0 1]        [0 3 2 |  1]

[1 0 0]        [1 2 2 |  1]
[0 0 1] ↙  =  [0 3 2 |  1].        (P_23)
[0 1 0]        [0 0 1 | −1]

This is called Gauss elimination, and by back-substitution, z = −1, y = 1, x = 1. Jordan would go further to get pivots on the diagonal and zeros elsewhere.

47 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


 ¯ 
1 2 2 ¯¯ 1
=  0 3 2 ¯¯ 1 
0 0 1 ¯ −1
   ¯ 
1 0 0 1 2 2 ¯¯ 1
0 1 −2 ↙ =  0 3 0 ¯¯ 3  (E 23 (−2))
0 0 1 0 0 1 ¯ −1
   ¯ 
1 0 −2 1 2 0 ¯¯ 3
0 1 0  ↙ =  0 3 0 ¯¯ 3  (E 13 (−2))
0 0 1 0 0 1 ¯ −1
   ¯ 
1 − 23 0 1 0 0 ¯¯ 1
0 1 0 ↙ =  0 3 0 ¯¯ 3  (E 12 ( −2
3 ))
0 0 1 0 0 1 ¯ −1
   ¯ 
1 0 0 1 0 0 ¯¯ 1
0 1
3 0 ↙ =  0 1 0 ¯¯ 1  (D(1, 13 , 1))
0 0 1 0 0 1 ¯ −1
= (I | sol ut i on) ,

where the solution is : x = 1, y = 1, z = −1.


The summary of that is:
DE 12 E 13 E 23 P 23 E 21 (A| b)
Solution of system or linear equations is nothing but multiplication by E s, P s, and finally D
48 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
Example 45 (Elimination by blocks) Using the first pivot, we can eliminate all elements underneath using a single matrix. Write the matrix A as

    [a_11 a_12 a_13]   [a_11  A_12]
A = [a_21 a_22 a_23] = [A_21  A_22]
    [a_31 a_32 a_33]

In general we eliminate by:

(E_{n,n−1}) ⋯ (E_{n2} ⋯ E_{42} E_{32}) (E_{n1} ⋯ E_{31} E_{21})

E = E_31 E_21

  = [1 0 0] [    1       0 0]
    [0 1 0] [−a_21/a_11  1 0]
    [−a_31/a_11 0 1] [0  0 1]

  = [    1       0 0]   [     1       0]
    [−a_21/a_11  1 0] = [−A_21/a_11   I]
    [−a_31/a_11  0 1]

The power of the block treatment allows us to write

EA = [     1       0] [a_11  A_12]
     [−A_21/a_11   I] [A_21  A_22]

   = [a_11              A_12         ]
     [0     A_22 − A_21 A_12 / a_11].

49 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Of course A can be replaced by (A|b). For this example

(A|b) = [1 2 2 | 1]
        [4 8 9 | 3]
        [0 3 2 | 1]

A_22 − A_21 A_12 / a_11 = [8 9 | 3] − [4] (2 2 | 1) / 1
                          [3 2 | 1]   [0]

                        = [8 9 | 3] − [8 8 | 4] = [0 1 | −1]
                          [3 2 | 1]   [0 0 | 0]   [3 2 |  1]

This gives

[1 2 2 |  1]
[0 0 1 | −1],
[0 3 2 |  1]

which would of course be obtained by multiplying by E_21(−4).

50 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


2.5 Inverse Matrices
Definition 46 The square matrix A_{p×p} is invertible if there exist matrices A_l^{−1} and A_r^{−1} such that

A_l^{−1} A = A A_r^{−1} = I_{p×p}

Hint: we will show soon that A_l^{−1} = A_r^{−1} = A^{−1}. But we have to be cautious and rigorous, since AB ≠ BA in general.

Motivation: for a scalar a,

aX = b
a^{−1} a X = a^{−1} b
1 X = a^{−1} b
X = a^{−1} b

Analogously, what is A^{−1} such that

AX = b
A^{−1} A X = A^{−1} b
I X = A^{−1} b
X = A^{−1} b,

although finding A^{−1} is more computationally expensive than solving by elimination, as we will see.
51 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
Lemma 47 If both left and right inverses exist, they are equal.

Proof. Suppose the left and right inverses of A are A_l^{−1} and A_r^{−1} (so that A_l^{−1} A = A A_r^{−1} = I); then consider A_l^{−1} A A_r^{−1}:

A_r^{−1} = (A_l^{−1} A) A_r^{−1} = A_l^{−1} (A A_r^{−1}) = A_l^{−1}.

This Lemma is different from the last two statements in Lemma 51 (to be proven shortly), from which we can say:

1. If a left (or right) inverse exists, then the right (or left) inverse exists and equals it. Stated differently, if AB = I then BA = I.

2. If the inverse exists it is unique. So we cannot find B_1 A = I and B_2 A = I with B_1 ≠ B_2.

Therefore,
Either: the square matrix A has no inverse,
Or: the left and right inverses are identical and unique.

52 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Lemma 48 (inverses of special matrices):

1. Any 2 × 2 matrix:

[a b]⁻¹       1      [ d −b]
[c d]    = ――――――――  [−c  a]
            ad − bc

2. Any n × n diagonal matrix:

diag(d_1, …, d_n)⁻¹ = diag(1/d_1, …, 1/d_n)

3. Any pivot-cancellation matrix:

E_ij(l)⁻¹ = E_ij(−l)

4. Any permutation matrix:

(P_ij)⁻¹ = P_ij

Proof. The proof is by direct multiplication from both sides; it is obvious for 1 and 2. For 3, the product E_ij(−l) E_ij(l) leaves I unchanged except at position (i, j), where it places −l × 1 + l = 0:

E_ij(−l) E_ij(l) = E_ij(l) E_ij(−l) = I.

Proving 4 follows exactly the same line. In few words, since P_ij is I with rows i and j swapped, P_ij P_ij swaps the same rows again to bring it back to I.
53 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
Example 49 Consider E_21(−5); then

E_21(−5) A = [ 1 0 0] [R_1]   [R_1       ]
             [−5 1 0] [R_2] = [R_2 − 5R_1],
             [ 0 0 1] [R_3]   [R_3       ]

E_21(5) (E_21(−5) A) = [1 0 0] [R_1       ]   [R_1]
                       [5 1 0] [R_2 − 5R_1] = [R_2],
                       [0 0 1] [R_3       ]   [R_3]

i.e., it subtracts what E added. Of course, E E⁻¹ = E⁻¹ E = I.
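
In code, applying E_ij(l) is just a row operation; no matrix multiplication is needed. A C sketch (row-major; the name row_op is made up), where calling it again with −l undoes the operation:

    /* Apply E_ij(l) to A: R_i <- R_i + l * R_j, in place.
       ncols is the number of columns of A. */
    void row_op(double *A, int ncols, int i, int j, double l) {
        for (int k = 0; k < ncols; k++)
            A[i * ncols + k] += l * A[j * ncols + k];
    }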

54 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Calculating A⁻¹ by Gauss–Jordan Elimination

Consider A_{n×n} and suppose its right inverse exists: A_r^{−1} = (x_1 ⋯ x_n). Then

A (x_1 ⋯ x_n) = I = (e_1 ⋯ e_n)
(Ax_1 ⋯ Ax_n) = (e_1 ⋯ e_n)
Ax_1 = e_1
⋮
Ax_n = e_n

Then, finding A⁻¹ is nothing but solving by elimination n systems of equations, each n × n:

Ax_i = e_i,  i = 1, …, n.

55 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Example 50.

    [ 2 −1  0]
A = [−1  2 −1].
    [ 0 −1  2]

Find A⁻¹ using the augmented matrix (A|I):

(A|I) = [ 2 −1  0 | 1 0 0]
        [−1  2 −1 | 0 1 0]
        [ 0 −1  2 | 0 0 1]

→ [2  −1   0 | 1   0 0]
  [0 3/2  −1 | 1/2 1 0]        (E_21(1/2))
  [0  −1   2 | 0   0 1]

→ [2  −1   0  | 1   0   0]
  [0 3/2  −1  | 1/2 1   0]     (E_32(2/3); Gauss stops here)
  [0  0   4/3 | 1/3 2/3 1]

→ [2  −1   0  | 1   0   0  ]
  [0 3/2   0  | 3/4 3/2 3/4]   (E_23(3/4))
  [0  0   4/3 | 1/3 2/3 1  ]

→ [2  0    0  | 3/2 1   1/2]
  [0 3/2   0  | 3/4 3/2 3/4]   (E_12(2/3))
  [0  0   4/3 | 1/3 2/3 1  ]

→ [1 0 0 | 3/4 1/2 1/4]
  [0 1 0 | 1/2 1   1/2]        (D(1/2, 2/3, 3/4))
  [0 0 1 | 1/4 1/2 3/4]

In summary:
D E_12 E_23 E_32 E_21 (A|I) = (I | A_r^{−1})        (reduced echelon form)
56 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
Lemma 51 (Connection between A⁻¹ and pivots):

1. If A has n pivots, A⁻¹ exists and

M = A_l^{−1} = A^{−1} = A_r^{−1} = X,

where M is the product of the elimination matrices and X is the solution of AX = I.

2. If either inverse exists, then A has n pivots and hence A⁻¹ exists. (This means if AB = I then BA = I.)

3. If the inverse exists then it is unique, along with the pivots and the solution to Ax = b.

Proof. 1: If the pivots exist then elimination has been carried out to solve the problem AX = I, and the solution X appears on the right side; therefore the solution X is A_r^{−1}. In parallel, the solution is nothing but a series of matrix multiplications:

D (E_12) ⋯ (E_{1,n−1} ⋯ E_{n−3,n−1} E_{n−2,n−1}) (E_{1,n} ⋯ E_{n−2,n} E_{n−1,n}) · (E_{n,n−1}) ⋯ (E_{n2} ⋯ E_{42} E_{32}) (E_{n1} ⋯ E_{31} E_{21}) A = I,

in the form MA = I; hence M is A_l^{−1}. Since both inverses exist, they are equal (Lemma 47).

2: If A_r^{−1} exists (AX = I) we will prove A has n pivots by contradiction. Assume that A does not have n pivots (so the elimination matrices M applied to A produce a matrix with a zero row):

zero-row matrix = (MA) X = M(AX) = M I = M.

However, M cannot have a zero row; otherwise it would produce a zero-row matrix, while by construction it should produce n pivots, not a zero row; a contradiction. Hence, A has n pivots, and from 1 above M = A_l^{−1} = A^{−1} = A_r^{−1} = X (which means XA = I).
57 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
If A_l^{−1} exists (XA = I), then X has a right inverse, so by the argument just given X⁻¹ exists and A = X⁻¹; hence AX = I, with which we started above.

3: Assume that A has two inverses A_1 and A_2, so that A_1 A = A A_1 = I and A_2 A = A A_2 = I:

A_1 A = I
A_1 A A_2 = A_2
A_1 = A_2

Since the inverse is unique, the elimination process cannot produce different pivots; hence they are unique too, and the solution to Ax = b ∀b will be unique as well and equal to A⁻¹ b.

Lemma 52 If A is symmetric, then its inverse is symmetric.

Proof. Suppose that B is an inverse; then

BA = I
A′B′ = I        (transpose both sides)
AB′ = I         (A′ = A)
BAB′ = B
B′ = B.

58 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Lemma 53

1. Suppose that A, B are invertible; then (AB)⁻¹ = B⁻¹A⁻¹.

2. And in general, (A_1 ⋯ A_n)⁻¹ = A_n⁻¹ ⋯ A_1⁻¹.

Proof. For the first part,

(AB)⁻¹ (AB) = I
(AB)⁻¹ (AB) B⁻¹A⁻¹ = B⁻¹A⁻¹
(AB)⁻¹ = B⁻¹A⁻¹

(AB) (AB)⁻¹ = I
B⁻¹A⁻¹ (AB) (AB)⁻¹ = B⁻¹A⁻¹
(AB)⁻¹ = B⁻¹A⁻¹.

The proof of part 2 is immediate by induction.

Lemma 54 If AX = 0 and X ≠ 0 then A is not invertible.

Proof. Given X ≠ 0, suppose that A⁻¹ exists;

AX = 0
A⁻¹ A X = A⁻¹ 0
X = 0,

a contradiction; hence A⁻¹ does not exist.


59 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
2.6 Elimination Using Matrices is A = LU Factorization
Back to Sec. 2.4. Intuitively (with A_i, U_i denoting rows and l_ij the multipliers):

[1  0 0] [1 0 0] [ 1 0 0] [ 2  4 −2]   [2 4 −2]        U_3 = A_3 − l_31 U_1 − l_32 U_2
[0  1 0] [0 1 0] [−2 1 0] [ 4  9 −3] = [0 1  1]
[0 −1 1] [1 0 1] [ 0 0 1] [−2 −3  7]   [0 0  4]        A_3 = U_3 + l_31 U_1 + l_32 U_2

[ 1  0 0] [ 2  4 −2]   [2 4 −2]
[−2  1 0] [ 4  9 −3] = [0 1  1]
[ 3 −1 1] [−2 −3  7]   [0 0  4]

E_32(−1) E_31(1) E_21(−2) A = M_L A = U

A = M_L⁻¹ U = (E_32(−1) E_31(1) E_21(−2))⁻¹ U
  = (E_21(2) E_31(−1) E_32(1)) U
A = L U

[ 2  4 −2]   [ 1 0 0] [2 4 −2]
[ 4  9 −3] = [ 2 1 0] [0 1  1]
[−2 −3  7]   [−1 1 1] [0 0  4]

             [ 1 0 0] [2 0 0] [1 2 −1]
           = [ 2 1 0] [0 1 0] [0 1  1]
             [−1 1 1] [0 0 4] [0 0  1]

A = L D U.

Remark: L stores the Gauss-elimination steps on A, which end in U.

60 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Example 55 (Using LU in solving equations:)

AX = b ≡ L(UX) = b

Then solve LC = b to find C, then solve UX = C to find X. (A sketch of the two triangular solves in C follows.)

[ 1 0 0] [2 4 −2] [x]   [ 2]
[ 2 1 0] [0 1  1] [y] = [ 8]
[−1 1 1] [0 0  4] [z]   [10]

[ 1 0 0] [c_1]   [ 2]
[ 2 1 0] [c_2] = [ 8]        (Gauss elimination applied to b)
[−1 1 1] [c_3]   [10]

c_1 = 2;   2c_1 + c_2 = 8 → c_2 = 4;   −c_1 + c_2 + c_3 = 10 → c_3 = 8

[2 4 −2] [x]   [2]
[0 1  1] [y] = [4]        (same as obtained with augmenting)
[0 0  4] [z]   [8]

4z = 8 → z = 2;   y + z = 4 → y = 2;   2x + 4y − 2z = 2 → x = −1
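
A C sketch of forward and back substitution, assuming L is unit lower triangular and U upper triangular, both n-by-n row-major (the name lu_solve is made up):

    /* Solve L c = b (forward substitution), then U x = c (back substitution). */
    void lu_solve(const double *L, const double *U, const double *b,
                  double *x, int n) {
        double c[n];                        /* C99 variable-length array */
        for (int i = 0; i < n; i++) {       /* forward: L has unit diagonal */
            c[i] = b[i];
            for (int j = 0; j < i; j++)
                c[i] -= L[i * n + j] * c[j];
        }
        for (int i = n - 1; i >= 0; i--) {  /* backward: divide by pivot */
            x[i] = c[i];
            for (int j = i + 1; j < n; j++)
                x[i] -= U[i * n + j] * x[j];
            x[i] /= U[i * n + i];
        }
    }

Each solve costs O(n²), which is why factoring once and reusing L and U for many right-hand sides b is cheaper than re-eliminating.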

61 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Lemma 56 (A = LU factorization): for the case of no permutation, we get

(E_{n,n−1}) ⋯ (E_{n2} ⋯ E_{42} E_{32}) (E_{n1} ⋯ E_{31} E_{21}) A = U
M_L A = U
A = M_L⁻¹ U
  = (E_21⁻¹ E_31⁻¹ ⋯ E_n1⁻¹) (E_32⁻¹ E_42⁻¹ ⋯ E_n2⁻¹) ⋯ (E_{n,n−1}⁻¹) U
  = L U,

where: (1) both M_L and M_L⁻¹ (= L) are LTMs (lower triangular matrices), and (2) L has L_ij equal directly to the element of the corresponding E_ij⁻¹, as opposed to M_L. The proof is immediate from the following two more general lemmas. Hint: to prove that the elements of M_L are not directly the elements of the E_ij, a single counter-example is enough.

Lemma 57 The product of two lower (or upper) triangular matrices is a lower (or upper) triangular matrix. The diagonal will be ones if A_ii B_ii = 1 (A_ii = B_ii = 1 is a special case).

Proof. Suppose A, B are LTMs; i.e., A_ij = B_ij = 0 ∀i < j. Then the element C_ij, i ≤ j, will be

C_ij = Σ_k A_ik B_kj = Σ_{k<i} A_ik B_kj + A_ii B_ij + Σ_{k>i} A_ik B_kj = Σ_{k<i} A_ik · 0 + A_ii B_ij + Σ_{k>i} 0 · B_kj = A_ii B_ij,

which is 0 for i < j and A_ii B_ii for i = j. Hence, it is obvious that M_L is an LTM with ones on the diagonal.
62 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Lemma 58 Consider any two LTMs A, B (with columns A_j, B_j) with the following properties:

A_ij = B_ij = 0 ∀i < j
A_ii = B_ii = 1
A_j = e_j ∀j > J
B_j = e_j ∀j < J
A_iJ = 0 ∀I < i
B_iJ = 0 ∀J < i ≤ I.

Since C_j = Σ_i B_ij A_i, we get:

C_j = Σ_{i≠j} B_ij A_i + B_jj A_j = 0 + A_j = A_j,   (∀j < J)
C_j = Σ_i B_ij A_i = Σ_{i<j} B_ij A_i + Σ_{j≤i} B_ij A_i = 0 + Σ_{j≤i} B_ij e_i = B_j,   (∀j > J)
C_J = Σ_{i<J} B_iJ A_i + B_JJ A_J + Σ_{J<i≤I} B_iJ A_i + Σ_{I<i} B_iJ A_i = 0 + A_J + 0 + Σ_{I<i} B_iJ e_i.   (j = J)

Hence, each element of A and B goes to C directly in the same position.

63 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Example 59 (Common Mistake:) Do the elements of the E's go directly to M_L = L⁻¹, and hence

(L⁻¹)_ij = −L_ij,  i > j,
(L⁻¹)_ij = L_ij = 1,  i = j,
(L⁻¹)_ij = L_ij = 0,  i < j?

No: a single counter-example is enough (see M_L in Sec. 2.6, whose (3,1) element is 3, while L_31 = −1).

Lemma 60 If A has a row starting with zero, so does the same row in L; and when a column in A starts with zero, so does the same column in U.

Proof. If A_i1 = 0, then L_i1 = 0 is immediate from

0 = A_i1 = Σ_k L_ik U_k1 = L_i1 U_11 + Σ_{k>1} L_ik · 0;

it is also immediate from the fact that if a row in A starts with zero, it does not need elimination, and hence the element of its E matrix will be zero. This saves computer time.
On the other hand, if A_1j = 0, then U_1j = 0 is immediate from

0 = A_1j = Σ_k L_1k U_kj = 1 · U_1j + Σ_{k>1} 0 · U_kj,

which completes the proof.

64 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


2.7 Computational Issues:
Scientific Computing Environments (SCEs), Examples, and Complexity

65 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


2.7.1 On Scientific Computing Environments and Libraries
EISPACK: early 1970s, for solving symmetric, unsymmetric, and generalized eigenproblems.

LINPACK: late 1970s, for solving linear equations and least-squares problems.

BLAS (Basic Linear Algebra Subprograms): performs common linear algebra operations very efficiently.

ATLAS (Automatically Tuned Linear Algebra Software): a BLAS implementation with higher performance.

LAPACK (Linear Algebra PACKage): stands on EISPACK and LINPACK and heavily on BLAS (all written in Fortran) to make them run efficiently on shared-memory vector and parallel processors.

Matlab: a commercial SW:
• late 1970s, written to access EISPACK and LINPACK without learning Fortran.
• Then was written in C.
• Then, in 2000, rewritten to use LAPACK.

Mathematica: commercial SW for symbolic (and, of course, numeric) mathematical computations.

R: a free software environment for statistical computing and graphics.

Python: a widely used high-level, general-purpose, interpreted, dynamic programming language.

Sage: SageMath is a free open-source mathematics software system licensed under the GPL.
• It builds on top of many existing open-source packages: NumPy, SciPy, matplotlib, Sympy, Maxima, GAP, FLINT, R, and many more.
• Access their combined power through a common, Python-based language or directly via interfaces or wrappers.
• Mission: creating a viable free open-source alternative to Magma, Maple, Mathematica, and Matlab.
• Examples and Sage cheat sheet:
66 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
2.7.2 Issues on Complexity*
To measure algorithm complexity we need to define a step; we adopt the definition of FLOP (Floating Point Operation) from the great and very mature reference for matrix computations (Golub and Van Loan, 1996, Sec. 1.2.4): □×□ + □ (almost the inner loop).

Example 61 (LU factorization):

Steps on the side of b:

(n − 1) + (n − 2) + ⋯ + 1 = (1/2)(n − 1)n = (1/2)n² − (1/2)n = O(n²).

LU factorization steps:

(n)(n − 1) + (n − 1)(n − 2) + ⋯ + 2·1 = Σ_{i=1}^{n} (n − i + 1)(n − i)
  = Σ_i (i² − (2n + 1)i + n(n + 1))
  = ((1/3)n³ + (1/2)n² + (1/6)n) − (2n + 1)((1/2)n(n + 1)) + n²(n + 1)
  = (1/3)n³ − (1/3)n = O(n³).

    sage: var('i,j,k,n');
    sage: sum((n-i+1)*(n-i), i, 1, n)
    1/3*n^3 - 1/3*n
67 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
Example 62 (Elaboration on Lemma 57 and looping over LT (or UT)).

Multiplication A_{m×n} B_{n×p}, C_ij = Σ_k A_ik B_kj; mnp (or n³) steps:

    C = 0
    for i = 1:m
      for j = 1:p
        for k = 1:n
          C(i,j) = A(i,k) B(k,j) + C(i,j)

If both A, B are LT: C_ij = 0 ∀i < j, and C_ij = Σ_{k=j}^{i} A_ik B_kj ∀j ≤ i:

    C = 0
    for i = 1:n
      for j = 1:i        // (to access the UT: j = i:n)
        for k = j:i      // B = 0 for k < j, A = 0 for i < k
          C(i,j) = A(i,k) B(k,j) + C(i,j)

no. of steps = Σ_{i=1}^{n} Σ_{j=1}^{i} Σ_{k=j}^{i} 1
             = Σ_{i=1}^{n} Σ_{j=1}^{i} (i + 1 − j)
             = Σ_{i=1}^{n} ((i + 1)i − (1/2)i(i + 1)) = (1/2) Σ_{i=1}^{n} (i + i²)
             = (1/2)((1/2)n(n + 1) + (1/3)n³ + (1/2)n² + (1/6)n)
             = (1/6)n³ + (1/2)n² + (1/3)n.

    sage: var('i,j,k,n');
    sage: sum(sum(sum(1, k, j, i), j, 1, i), i, 1, n)
    1/6*n^3 + 1/2*n^2 + 1/3*n
68 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
Example 63 (Matrix round-off error and LU partial pivoting) (Golub and Van Loan, 1996, Sec. 3.3).
Suppose the PC has floating-point arithmetic with t = 3 digits; what is the LU factorization/solution of:

[.001  1.00] [x_1]   [1.00]
[1.00  2.00] [x_2] = [3.00]

Infinite-precision solution (exact):

L = [1 0; 1000 1],  U = [.001 1; 0 −998],  LU = [.001 1.00; 1 2.00] = A,
(x_1, x_2)′ = (500/499, 997/998)′ = (1.002004, 0.998998)′

3-digit precision:

L = [1.00 0; 1000 1.00],  U = [.001 1.00; 0 −1000],  LU = [.001 1.00; 1.00 0.00] ≠ A,
(x_1, x_2)′ = (0.00, 1.00)′

Some calculation steps:
−1000 × 1 + 2 = −1.00 × 10³ + 0.002 × 10³ = (−1.00 + 0.00) × 10³ = −1000.
1 × c_1 = 1 → c_1 = 1;  1000 c_1 + c_2 = 3 → c_2 = −1000;  −1000 x_2 = −1000 → x_2 = 1;  .001 x_1 + x_2 = 1 → x_1 = 0.

3-digit precision with partial pivoting (swap the two equations first):

L = [1.00 0; .001 1.00],  U = [1.00 2.00; 0 1.00],  LU = [1 2.00; .001 1.00] (the permuted A),
(x_1, x_2)′ = (1.00, .996)′

69 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Chapter 3

Vector Spaces and Subspaces

70 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


3.1 Spaces of Vectors
Definition 64 (A Real Vector Space) is a set V of vectors (each an n-tuple) over R with an addition and a scalar multiplication on V such that:

commutativity u + v = v + u ∈ V ∀u, v ∈ V .

associativity (u + v) + w = u + (v + w) ∈ V and (ab)v = a(bv) ∈ V ∀u, v, w ∈ V , a, b ∈ R .

additive identity ∃ 0 ∈ V such that v + 0 = v, ∀v ∈ V .

additive inverse ∀v ∈ V ∃w ∈ V such that v + w = 0. (we may denote w by −v)

multiplicative identity 1v = v ∀v ∈ V .

distributive properties a(u + v) = au + av ∈ V and (a + b)u = au + bu ∈ V ∀u, v ∈ V , a, b ∈ R .

Hint:

• Informally: it is a set of vectors in which all sums and scalar multiples lie in the set as well.

• Any linear combination lies in the space (from the first and last properties).

Example 65 (R² = {(x_1, x_2) | x_1, x_2 ∈ R} is an example, vs. V = {(x_1, x_2) | −a ≤ x_1, x_2 ≤ a}, which is NOT).

71 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Hint:

• We expanded from the visual n = 3 to general n.

• We can expand from the set R to any field F; the vector space will then be defined over this F.

• x = (x_1, x_2) ∈ R² is a point, vector, 2-tuple, element in R².

• We can generalize to R^n, or even C^n, or polynomials, or others.

• The human brain cannot visualize or provide geometric models of R^n, n ≥ 4.

• Edwin A. Abbott, 1884, “Flatland: a romance of many dimensions”: can help creatures living in three-dimensional space, such as ourselves, imagine a physical space of four or more dimensions.

• However, we can do mathematics defined ∀n which complies with the geometry of 1 ≤ n ≤ 3.

Example 66 (Many other spaces):

Real: p = (1, 4, √3, −1, 0) ∈ R⁵.

Complex: p = (1 + i, −2i, −√2 + 3i) ∈ C³.

Polynomial: p(z) = a_0 + a_1 z + ⋯ + a_m z^m.

72 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


3.1.1 Properties of Vector Spaces (seems trivial for R but deep for others!)*
Proposition 67 For ANY vector space satisfying Definition 64 we have the following properties:

1. the additive identity is unique.

2. the additive inverse of every element is unique.

3. 0v = 0 ∀v ∈ V.

4. a0 = 0 ∀a ∈ F.

5. (−1)v is the additive inverse of v, i.e., −v, ∀v ∈ V.

The proof is very trivial:

Proof. (Suppose 0′ is another identity, and w, w′ are two inverses of v:)

0′ = 0′ + 0 = 0
w = w + 0 = w + (v + w′) = (w + v) + w′ = 0 + w′ = w′
0v = (0 + 0)v = 0v + 0v ⟹ 0v = 0
a0 = a(0 + 0) = a0 + a0 ⟹ a0 = 0
v + (−1)v = (1)v + (−1)v = (1 − 1)v = 0v = 0 ⟹ (−1)v is the additive inverse of v

73 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


3.1.2 Subspaces
Definition 68 A subset U of V is called a subspace of V if U is also a vector space (of course using the same addition and scalar multiplication as V).
{ }
Example 69 U = (x 1 , x 2 , 0)|x 1 , x 2 ∈ R is a subspace of R 3 since it satisfies all the properties of a space.

Proposition 70 For any space V and a subset U ⊂ V , U is a space (or a subspace) if the following hold:

additive identity 0 ∈ U .

closed under addition ∀u, v ∈ U , u + v ∈ U .

closed under scalar multiplication ∀a ∈ R , au ∈ U .

Proof. The proof is obvious since other properties are satisfied immediately on the subset as long as they are
satisfied on the whole set.

Corollary 71 The smallest subspace of R^n is {0}.

74 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Example 72 Which of the following is a subspace (draw):

• U = {(x_1, x_2) | x_1 ∈ R, x_2 = ax_1 + b, a ≠ 0}; compare it to R², then find the condition for it to be a subspace of R².
1. (0, 0) ∈ U ⟹ (0, a·0 + b) ∈ U ⟹ b = 0.
2. (x_1, ax_1) + (x_2, ax_2) = ((x_1 + x_2), a(x_1 + x_2)) ∈ U.
3. k(x_1, ax_1) = ((kx_1), a(kx_1)) ∈ U.

• U = {(x_1, x_2) | 0 ≤ x_1, x_2}. (Not closed under multiplication by negative scalars.)

• U = {(x_1, x_2) | x_1 ∈ R, x_2 = ax_1², a ≠ 0}:
1. (0, 0) ∈ U.
2. (x_1, ax_1²) + (x_2, ax_2²) = ((x_1 + x_2), a(x_1² + x_2²)) ≠ ((x_1 + x_2), a(x_1 + x_2)²); not closed under addition.

75 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


3.1.3 The column space of the matrix A
Definition 73 (Column Space) of a matrix A_{m×n}, denoted by C(A), is the vector subspace of R^m (or possibly the whole of R^m) consisting of all linear combinations of the matrix columns; i.e., Ax. Said differently:

C(A) = {Ax | ∀x ∈ R^n}.

C(A) is the span of the columns of A.

Proof that C(A) is really a subspace: 0 ∈ C(A) by choosing x = 0; Ax_1 + Ax_2 = A(x_1 + x_2) ∈ C(A); and a(Ax_1) = A(ax_1) ∈ C(A).

Remark 2 This recalls: when does a solution of Ax = b exist? b must be in the column space of A.

Example 74 What is the column space of the matrix A = [1 0; 4 3; 2 3]?

It is the set C(A) = {Ax = (1, 4, 2)′ x_1 + (0, 3, 3)′ x_2, ∀x_1, x_2}, which is actually a plane passing through zero.

76 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Example 75 Describe the column spaces of each of the following:

I = [1 0; 0 1],   A = [1 2; 2 4],   B = [1 2 3; 0 0 4].

It is obvious that all are subspaces of R² (possibly R² itself).

77 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


3.2 The Nullspace of A: Solving Ax = 0 and Rx = 0
It is natural to define the row space of a matrix analogously to the column space; but nothing new!

Definition 76 (Row Space R(A) ⊆ R^n)

R(A) = {x′A | ∀x ∈ R^m} = {(A′x)′ | ∀x ∈ R^m}
R(A) = C(A′),

with no distinction between x and x′ (both are in R^m).

Now: it is natural to define a space from ONLY x (not Ax or x′A), under some constraint.

Definition 77 (Null Space N(A) ⊆ R^n), constructed such that N(A) ⊥ R(A):

N(A) = {x | Ax = 0, x ∈ R^n}.

Proof that N(A) is really a subspace:

x = 0 ⟹ A0 = 0
x_1, x_2 ∈ N(A) ⟹ Ax_1 = Ax_2 = 0 ⟹ Ax_1 + Ax_2 = A(x_1 + x_2) = 0
x_1 ∈ N(A) ⟹ Ax_1 = 0 ⟹ a Ax_1 = A(ax_1) = 0.

Remark 3.

1. It is impossible for {x | Ax = b, x ∈ R^n} to be a subspace except for b = 0; why?

2. Ax = 0 means both: x ⊥ R(A), and x gives a zero linear combination of the columns, i.e., 0 ∈ C(A).


78 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
Example 78 What is the null space of A = [1 2; 3 6]? It is of course the solution to Ax = 0 (by def.):

[1 2]        [1 2]                                                                  (E_21(−3))
[3 6]  ⟶   [0 0]  ≡  [1 2; 0 0] (x_1, x_2)′ = 0  ≡  1x_1 + 2x_2 = 0  ⟶  x = x_2 (−2, 1)′, ∀x_2 ∈ R.

x ⊥ (1, 2)′ ONLY. The null space of A is the set of vectors constituting this line.

Example 79 What is the null space of the matrix of: x_1 + 2x_2 + 3x_3 = 0? Here A = (1 2 3). No pivot cancellation is needed:

x = (−2x_2 − 3x_3, x_2, x_3)′ = x_2 (−2, 1, 0)′ + x_3 (−3, 0, 1)′

The solution is the set of all linear combinations of these two (2 = 3 − 1) simple vectors; A PLANE: let's draw it.

Example 80 Suppose

A = [1 2  3]   ⟶   [1  2  3]   ⟶   x_2 = −7x_3, x_1 = 11x_3   ⟶   x = x_3 (11, −7, 1)′.
    [1 1 −4]        [0 −1 −7]

Much easier: continue from U to the reduced echelon form R:

[1  2  3]   ⟶   [1  0 −11]   ⟶   [1 0 −11]
[0 −1 −7]        [0 −1  −7]        [0 1   7]

So the solution is the set of all linear combinations of this single (1 = 3 − 2) vector; A LINE: let's see Sage.
79 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
Example 81 (Motivation from data science) :

• Data reduction and compression.

• Data Interpretation.

• Data modeling and prediction.

80 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


3.2.1 Systematic solution using pivot columns, free columns, and reduced echelon form
Example 82 Find N(A):

[1 2 2  4]   ⟶   [1 2 2  4]   ⟶   [1 0 2 −6]   ⟶   [1 0 2 −6]
[3 8 6 22]        [0 2 0 10]        [0 2 0 10]        [0 1 0  5].

In reduced echelon form, we get r pivot variables p and n − r free variables f, in the form p = −αf:

x_1 = −2x_3 + 6x_4
x_2 = 0x_3 − 5x_4

x = (x_1, x_2, x_3, x_4)′ = x_3 (−2, 0, 1, 0)′ + x_4 (6, −5, 0, 1)′

Example 83 After pivot cancellation of A:

[1 0 0 a c]
[0 1 0 b d]   ⟶   x = x_4 (−a, −b, 0, 1, 0)′ + x_5 (−c, −d, −e, 0, 1)′
[0 0 1 0 e]
[0 0 0 0 0]

[1 0 a 0 c]
[0 1 b 0 d]   ⟶   x = x_3 (−a, −b, 1, 0, 0)′ + x_5 (−c, −d, 0, −e, 1)′
[0 0 0 1 e]
[0 0 0 0 0]
81 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


3.2.2 Gauss Elimination Algorithm: revisited and detailed

    i = 0; j = 0;                      // previous pivot location
    while ( (i < m) && (j < n) ) {
      i++; j++;
      do {                             // search for the largest candidate pivot
        I = argmax( |A(k,j)|, i <= k <= m );
        j += !A(I,j);                  // move right if the whole column is zero
      } while ( !A(I,j) && (j <= n) );
      if ( A(I,j) ) {                  // pivot found, or boundary reached
        Swap( R_i, R_I );
        PivotElimination( i, j );
      }
    }

Corollary 84 (Gauss elimination algorithm).
For A_{m×n} that produces r pivots:

1. R_ij = 0 ∀i > I, j < J, where R(I, J) is a pivot.

2. the number of pivots r, the number of column pivots, and the number of row pivots are all equal.

3. r ≤ m, n; i.e., r ≤ min(m, n).

4. the m − r non-pivot rows are all zeros and are deferred to the end of R.

5. the n − r non-pivot columns have zeros under the previous pivot.

Proof. It is trivial, and is already a by-product of the construction of elimination!

(Figure: staircase patterns of possible echelon forms; pivots move down and to the right, and zero rows collect at the bottom.)
82 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
Lemma 85 In pivot cancellation, a column will have no pivot if and only if it is a linear combination of preceding columns. A row will become zero if and only if it is a linear combination of preceding rows.

Proof.
For columns (blocking the first row and column as in Example 45):

[     1      0] [a_11  a_21               ⋯  αa_11 + βa_21            ⋯]
[−A_1/a_11   I] [A_1   A_2                ⋯  αA_1 + βA_2              ⋯]

= [a_11  a_21                  ⋯  αa_11 + βa_21              ⋯]
  [0     A_2 − A_1 a_21/a_11   ⋯  β(A_2 − A_1 a_21/a_11)     ⋯]

so the second pivot cancellation will provide no pivot in the linear-combination column.

For rows:

[a_11            R_1         ]        [a_11  R_1                                                          ]
[a_21            R_2         ]   ⟶   [0     R_2 − R_1 (a_21/a_11)                                         ]
[  ⋮              ⋮          ]        [⋮     ⋮                                                            ]
[αa_11 + βa_21   αR_1 + βR_2]        [0     (αR_1 + βR_2) − R_1 (αa_11 + βa_21)/a_11 = β(R_2 − R_1 (a_21/a_11))]

Definition 86 (Rank of a matrix) is defined as the number of its pivots, r.

Later, an equivalent definition is provided, and we will show that r is the number of independent columns, independent rows, etc.

83 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Lemma 87 After Gauss elimination of A to produce the echelon (or reduced echelon) form R:
1. All pivot columns of R are linearly independent; their corresponding columns of A are linearly independent as well.
2. All non-pivot columns of R are linear combinations of preceding columns; the same applies to the matrix A.
3. All pivot rows of R are linearly independent; their corresponding rows of A are linearly independent as well.
4. All non-pivot rows of R (the zero rows) are linear combinations of the pivot rows; the same applies to the matrix A.

Proof. (Arrange the pivot columns first, without loss of generality.)

1. Assume that ∃α, a linear combination of the pivot columns, such that Rα = 0; i.e., α = (α_1, …, α_r, 0, …, 0)′. Then

R α = 0  →  α ∈ N(R)  →  α = x_{r+1} s_{r+1} + x_{r+2} s_{r+2} + ⋯ + x_n s_n,

where each special solution s_i has a 1 in its own free position, zeros in the other free positions, and entries ∗ in the pivot positions. Since α has zeros in all free positions, x_i = 0 for r + 1 ≤ i ≤ n, which means α = 0; a contradiction.

2. Since we have just proven that such an α ∉ N(A), Aα ≠ 0. Therefore, the corresponding columns of A are linearly independent as well.

84 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Remark 4:

• N(A) = N(U) = N(R), of course, since pivot cancellation does not change the 0 vector on the R.H.S.

• C(A) ≠ C(U) ≠ C(R) in general; simply:

A = [1 2; 2 4],  C(A) = a (1, 2)′;   R = [1 2; 0 0],  C(R) = a (1, 0)′.

We will come later to how to find exactly C(A) and R(A).

• Each vector in N(A) describes a linear combination of the columns of A that gives 0.

85 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


3.3 The Complete Solution to Ax = b

86 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


3.4 Independence, Basis and Dimension

87 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


3.5 Dimensions of the Four Subspaces

88 Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.


Bibliography

Golub, G. H., Van Loan, C. F., 1996. Matrix computations, 3rd Edition. Johns Hopkins University Press, Balti-
more.

Schott, J. R., 2005. Matrix analysis for statistics, 2nd Edition. Wiley, Hoboken, N.J.

Searle, S. R., 1982. Matrix algebra useful for statistics. Wiley, New York.

Strang, G., 2016. Introduction to linear algebra, 5th Edition. Wellesley-Cambridge Press.

Copyright © 2016, 2019 Waleed A. Yousef. All Rights Reserved.
