
Lecture Notes to Accompany

Scientific Computing
An Introductory Survey
Second Edition

by Michael T. Heath

Chapter 3

Linear Least Squares

Copyright © 2001. Reproduction permitted only for
noncommercial, educational use in conjunction with the book.
1
Method of Least Squares

• Measurement errors inevitable in observational and experimental sciences

• Errors smoothed out by averaging over many cases, i.e., taking more measurements than strictly necessary to determine parameters of system

• Resulting system overdetermined, so usually no exact solution

• Project higher dimensional data into lower dimensional space to suppress irrelevant detail

• Projection most conveniently accomplished by method of least squares

2
Linear Least Squares

• For linear problems, obtain overdetermined linear system Ax = b, with m × n matrix A, m > n

• Better written Ax ≅ b, since equality usually not exactly satisfiable when m > n

• Least squares solution x minimizes squared Euclidean norm of residual vector r = b − Ax,

      min_x ‖r‖_2^2 = min_x ‖b − Ax‖_2^2

3
Example: Data Fitting

• Given m data points (ti, yi), find n-vector x of parameters that gives “best fit” to model function f(t, x):

      min_x  Σ_{i=1}^{m} (yi − f(ti, x))^2

• Problem linear if function f linear in components of x:

      f(t, x) = x1 φ1(t) + x2 φ2(t) + · · · + xn φn(t),

  where functions φj depend only on t

• Written in matrix form as Ax ≅ b, with aij = φj(ti) and bi = yi
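A minimal NumPy sketch of this setup (illustrative; the basis functions 1, t, t^2 and the five data points anticipate the example on the following slides):

import numpy as np

def design_matrix(phis, t):
    # a_ij = phi_j(t_i)
    return np.column_stack([phi(t) for phi in phis])

phis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2]
t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([ 1.0,  0.5, 0.0, 0.5, 2.0])

A = design_matrix(phis, t)
x, *_ = np.linalg.lstsq(A, y, rcond=None)    # least squares solution of Ax ≅ y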

4
Example: Data Fitting

• Polynomial fitting,

      f(t, x) = x1 + x2 t + x3 t^2 + · · · + xn t^(n−1),

  is linear, since polynomial linear in coefficients, though nonlinear in independent variable t

• Fitting sum of exponentials, with

      f(t, x) = x1 e^(x2 t) + · · · + x_(n−1) e^(xn t),

  is nonlinear problem

• For now, consider only linear least squares problems

5
Example: Data Fitting

• Fitting quadratic polynomial to five data points gives linear least squares problem

            [ 1  t1  t1^2 ]          [ y1 ]
            [ 1  t2  t2^2 ] [ x1 ]   [ y2 ]
      Ax =  [ 1  t3  t3^2 ] [ x2 ] ≅ [ y3 ] = b
            [ 1  t4  t4^2 ] [ x3 ]   [ y4 ]
            [ 1  t5  t5^2 ]          [ y5 ]

• Matrix with columns (or rows) successive powers of independent variable called Vandermonde matrix

6
Example: Data Fitting

• For data

      t   −1.0   −0.5   0.0   0.5   1.0
      y    1.0    0.5   0.0   0.5   2.0

  overdetermined 5 × 3 linear system is

            [ 1  −1.0  1.0  ]          [ 1.0 ]
            [ 1  −0.5  0.25 ] [ x1 ]   [ 0.5 ]
      Ax =  [ 1   0.0  0.0  ] [ x2 ] ≅ [ 0.0 ] = b
            [ 1   0.5  0.25 ] [ x3 ]   [ 0.5 ]
            [ 1   1.0  1.0  ]          [ 2.0 ]

• Solution, which we will see later how to compute, is

      x = [ 0.086  0.40  1.4 ]^T,

  so approximating polynomial is

      p(t) = 0.086 + 0.4 t + 1.4 t^2
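A quick numerical check of this fit (a sketch; np.vander builds the Vandermonde matrix just described):

import numpy as np

t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([ 1.0,  0.5, 0.0, 0.5, 2.0])

A = np.vander(t, 3, increasing=True)         # columns 1, t, t^2
x, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x)                                     # approximately [0.086, 0.400, 1.429]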


7
Example: Data Fitting

• Resulting curve and original data points shown in graph:

  [Figure: the five data points and the fitted parabola p(t) = 0.086 + 0.4 t + 1.4 t^2, plotted for −1 ≤ t ≤ 1]

8
Existence and Uniqueness

• Linear least squares problem Ax ≅ b always has solution

• Solution unique if, and only if, columns of A linearly independent, i.e., rank(A) = n, where A is m × n

• If rank(A) < n, then A is rank-deficient, and solution of linear least squares problem is not unique

• For now, assume A has full column rank n

9
Normal Equations

• Least squares minimizes squared Euclidean norm ‖r‖_2^2 = r^T r of residual vector r = b − Ax

• To minimize

      ‖r‖_2^2 = r^T r = (b − Ax)^T (b − Ax)
              = b^T b − 2 x^T A^T b + x^T A^T A x,

  take derivative with respect to x and set to o,

      2 A^T A x − 2 A^T b = o,

  which reduces to n × n linear system

      A^T A x = A^T b,

  known as system of normal equations

10
Orthogonality

• Vectors v1 and v2 are orthogonal if their inner product is zero, v1^T v2 = 0

• Space spanned by columns of m × n matrix A, span(A) = {Ax : x ∈ R^n}, is of dimension at most n

• If m > n, b generally does not lie in span(A), so no exact solution to Ax = b

• Vector y = Ax in span(A) closest to b in 2-norm occurs when residual r = b − Ax orthogonal to span(A)

• Thus,

      o = A^T r = A^T (b − Ax),

  or

      A^T A x = A^T b
11
Orthogonality, continued

[Figure: vector b outside span(A); its projection y = Ax lies in span(A), the residual r = b − Ax is orthogonal to span(A), and θ is the angle between b and y]

12
Orthogonal Projectors

• Matrix P is orthogonal projector if idempotent (P^2 = P) and symmetric (P^T = P)

• Orthogonal projector onto orthogonal complement span(P)^⊥ given by P⊥ = I − P

• For any vector v,

      v = (P + (I − P)) v = P v + P⊥ v

• For least squares problem Ax ≅ b, if rank(A) = n, then

      P = A (A^T A)^(−1) A^T

  is orthogonal projector onto span(A), and

      b = P b + P⊥ b = Ax + (b − Ax) = y + r

13
Pseudoinverse and Condition Number

• Nonsquare m × n matrix A has no inverse in usual sense

• If rank(A) = n, pseudoinverse defined by

      A^+ = (A^T A)^(−1) A^T

  and

      cond(A) = ‖A‖_2 · ‖A^+‖_2

• By convention, cond(A) = ∞ if rank(A) < n

• Just as condition number of square matrix measures closeness to singularity, condition number of rectangular matrix measures closeness to rank deficiency

• Least squares solution of Ax ≅ b is given by x = A^+ b
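A minimal NumPy check of these definitions (a sketch with arbitrary random test data):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))              # arbitrary full-rank test matrix
b = rng.standard_normal(5)

A_plus = np.linalg.inv(A.T @ A) @ A.T        # (A^T A)^(-1) A^T, valid when rank(A) = n
print(np.allclose(A_plus, np.linalg.pinv(A)))                         # matches library pseudoinverse
print(np.linalg.norm(A, 2) * np.linalg.norm(A_plus, 2), np.linalg.cond(A))   # cond(A), two ways
print(np.allclose(A_plus @ b, np.linalg.lstsq(A, b, rcond=None)[0]))  # x = A^+ b solves Ax ≅ b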
14
Sensitivity and Conditioning

• Sensitivity of least squares solution to Ax ≅ b depends on b as well as A

• Define angle θ between b and y = Ax by

      cos(θ) = ‖y‖_2 / ‖b‖_2 = ‖Ax‖_2 / ‖b‖_2

  (see previous drawing)

• Bound on perturbation ∆x in solution x due to perturbation ∆b in b given by

      ‖∆x‖_2 / ‖x‖_2  ≤  cond(A) · (1 / cos(θ)) · ‖∆b‖_2 / ‖b‖_2
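A small numerical illustration of this bound (a sketch with arbitrary random test data):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 4))
b = A @ rng.standard_normal(4) + 0.1 * rng.standard_normal(30)   # nonzero residual, so theta > 0

x = np.linalg.lstsq(A, b, rcond=None)[0]
cos_theta = np.linalg.norm(A @ x) / np.linalg.norm(b)

db = 1e-8 * rng.standard_normal(30)                              # small perturbation of b
dx = np.linalg.lstsq(A, b + db, rcond=None)[0] - x

lhs = np.linalg.norm(dx) / np.linalg.norm(x)
rhs = np.linalg.cond(A) / cos_theta * np.linalg.norm(db) / np.linalg.norm(b)
print(lhs <= rhs)                                                # the bound holds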

15
Sensitivity and Conditioning, cont.

• Similarly, for perturbation E in matrix A,

      ‖∆x‖_2 / ‖x‖_2  ≲  ( [cond(A)]^2 tan(θ) + cond(A) ) · ‖E‖_2 / ‖A‖_2

• Condition number of least squares solution about cond(A) if residual small, but can be squared or arbitrarily worse for large residual

16
Normal Equations Method

• If m × n matrix A has rank n, then symmetric n × n matrix A^T A is positive definite, so its Cholesky factorization

      A^T A = L L^T

  can be used to obtain solution x to system of normal equations

      A^T A x = A^T b,

  which has same solution as linear least squares problem Ax ≅ b

• Normal equations method involves transformations

      rectangular −→ square −→ triangular
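A sketch of the normal equations method (illustrative; library routines stand in for hand-written Cholesky and triangular solves):

import numpy as np

def lstsq_normal_equations(A, b):
    # Solve Ax ≅ b by forming and solving the normal equations via Cholesky
    M = A.T @ A                              # n x n, symmetric positive definite if rank(A) = n
    c = A.T @ b
    L = np.linalg.cholesky(M)                # M = L L^T, L lower triangular
    z = np.linalg.solve(L, c)                # forward-substitution step: L z = A^T b
    return np.linalg.solve(L.T, z)           # back-substitution step: L^T x = z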

17
Example: Normal Equations Method

• For polynomial data-fitting example given previously, normal equations method gives

              [  1     1     1    1     1   ] [ 1  −1.0  1.0  ]   [ 5.0  0.0  2.5   ]
      A^T A = [ −1.0  −0.5   0.0  0.5   1.0 ] [ 1  −0.5  0.25 ] = [ 0.0  2.5  0.0   ] ,
              [  1.0   0.25  0.0  0.25  1.0 ] [ 1   0.0  0.0  ]   [ 2.5  0.0  2.125 ]
                                              [ 1   0.5  0.25 ]
                                              [ 1   1.0  1.0  ]

              [  1     1     1    1     1   ] [ 1.0 ]   [ 4.0  ]
      A^T b = [ −1.0  −0.5   0.0  0.5   1.0 ] [ 0.5 ] = [ 1.0  ]
              [  1.0   0.25  0.0  0.25  1.0 ] [ 0.0 ]   [ 3.25 ]
                                              [ 0.5 ]
                                              [ 2.0 ]

18
Example Continued

• Cholesky factorization of symmetric positive definite matrix gives

              [ 5.0  0.0  2.5   ]   [ 2.236  0      0     ] [ 2.236  0      1.118 ]
      A^T A = [ 0.0  2.5  0.0   ] = [ 0      1.581  0     ] [ 0      1.581  0     ] = L L^T
              [ 2.5  0.0  2.125 ]   [ 1.118  0      0.935 ] [ 0      0      0.935 ]

19
Example Continued

• Solving lower triangular system Lz = A^T b by forward-substitution gives

      z = [ 1.789  0.632  1.336 ]^T

• Solving upper triangular system L^T x = z by back-substitution gives least squares solution

      x = [ 0.086  0.400  1.429 ]^T
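These numbers can be checked directly (a sketch using the same data):

import numpy as np

t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b = np.array([ 1.0,  0.5, 0.0, 0.5, 2.0])
A = np.vander(t, 3, increasing=True)

L = np.linalg.cholesky(A.T @ A)              # diagonal approximately [2.236, 1.581, 0.935]
z = np.linalg.solve(L, A.T @ b)              # approximately [1.789, 0.632, 1.336]
x = np.linalg.solve(L.T, z)                  # approximately [0.086, 0.400, 1.429]
print(L, z, x, sep="\n")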

20
Shortcomings of Normal Equations

• Information can be lost in forming A^T A and A^T b

• For example, take

          [ 1  1 ]
      A = [ ε  0 ] ,
          [ 0  ε ]

  where ε is positive number smaller than √ε_mach

• Then in floating-point arithmetic

      A^T A = [ 1 + ε^2   1       ]   [ 1  1 ]
              [ 1         1 + ε^2 ] = [ 1  1 ] ,

  which is singular

• Sensitivity of solution also worsened, since

      cond(A^T A) = [cond(A)]^2
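A small demonstration of this loss of information (a sketch; ε is chosen just below √ε_mach for IEEE double precision):

import numpy as np

eps = 1e-8                                   # a bit below sqrt(machine epsilon) ~ 1.5e-8
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])

M = A.T @ A                                  # 1 + eps**2 rounds to exactly 1.0
print(M)                                     # [[1., 1.], [1., 1.]]
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(M))   # 2 vs. 1: information lost
print(np.linalg.cond(A)**2)                  # cond(A^T A) = cond(A)^2, about 2e16 here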

21
Augmented System Method

• Definition of residual and orthogonality requirement give (m + n) × (m + n) augmented system

      [ I    A ] [ r ]   [ b ]
      [ A^T  O ] [ x ] = [ o ]

• System not positive definite, larger than original, and requires storing two copies of A

• But allows greater freedom in choosing pivots in computing LDL^T or LU factorization

22
Augmented System Method, continued

• Introducing scaling parameter α gives system

      [ αI   A ] [ r/α ]   [ b ]
      [ A^T  O ] [ x   ] = [ o ] ,

  which allows control over relative weights of two subsystems in choosing pivots

• Reasonable rule of thumb

      α = max_{i,j} |aij| / 1000

• Augmented system sometimes useful, but far from ideal in work and storage required
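A sketch of the unscaled augmented system (illustrative; a generic dense solver stands in for the LDL^T or LU factorization mentioned above):

import numpy as np

def lstsq_augmented(A, b):
    # Solve [[I, A], [A^T, O]] [r; x] = [b; o] and return x and the residual r
    m, n = A.shape
    K = np.block([[np.eye(m), A],
                  [A.T, np.zeros((n, n))]])
    sol = np.linalg.solve(K, np.concatenate([b, np.zeros(n)]))
    return sol[m:], sol[:m]                  # x, r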

23
Orthogonal Transformations

• Seek alternative method that avoids numerical difficulties of normal equations

• Need numerically robust transformation that produces easier problem

• What kind of transformation leaves least squares solution unchanged?

• Square matrix Q is orthogonal if Q^T Q = I

• Preserves Euclidean norm, since

      ‖Qv‖_2^2 = (Qv)^T Qv = v^T Q^T Q v = v^T v = ‖v‖_2^2

• Multiplying both sides of least squares problem by orthogonal matrix does not change solution

24
Triangular Least Squares Problems

• As with square linear systems, suitable target in simplifying least squares problems is triangular form

• Upper triangular overdetermined (m > n) least squares problem has form

      [ R ]       [ b1 ]
      [ O ] x  ≅  [ b2 ] ,

  with R n × n upper triangular and b partitioned similarly

• Residual is

      ‖r‖_2^2 = ‖b1 − Rx‖_2^2 + ‖b2‖_2^2

25
Triangular Least Squares Problems, cont.

• Have no control over second term, ‖b2‖_2^2, but first term becomes zero if x satisfies triangular system

      Rx = b1,

  which can be solved by back-substitution

• Resulting x is least squares solution, and minimum sum of squares is

      ‖r‖_2^2 = ‖b2‖_2^2

• So strategy is to transform general least squares problem to triangular form using orthogonal transformation
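A sketch of this step (illustrative; R, b1, and b2 are assumed to come from an orthogonal transformation as described next):

import numpy as np
from scipy.linalg import solve_triangular

def solve_triangular_lstsq(R, b1, b2):
    # Least squares solution of [R; O] x ≅ [b1; b2] with R upper triangular
    x = solve_triangular(R, b1, lower=False)     # back-substitution for Rx = b1
    min_residual_norm = np.linalg.norm(b2)       # ||r||_2 = ||b2||_2
    return x, min_residual_norm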

26
QR Factorization

• Given m × n matrix A, with m > n, we seek m × m orthogonal matrix Q such that

      A = Q [ R ]
            [ O ] ,

  with R n × n and upper triangular

• Linear least squares problem Ax ≅ b transformed into triangular least squares problem

      Q^T A x = [ R ] x ≅ [ c1 ] = Q^T b,
                [ O ]     [ c2 ]

  which has same solution, since

      ‖r‖_2^2 = ‖b − Ax‖_2^2 = ‖b − Q [R; O] x‖_2^2
              = ‖Q^T b − [R; O] x‖_2^2
              = ‖c1 − Rx‖_2^2 + ‖c2‖_2^2,

  because orthogonal transformation preserves Euclidean norm
27
Orthogonal Bases

• Partition m × m orthogonal matrix Q = [Q1 Q2], with Q1 m × n

• Then

      A = Q [ R ] = [Q1 Q2] [ R ] = Q1 R
            [ O ]           [ O ]

  is reduced QR factorization of A

• Columns of Q1 are orthonormal basis for span(A), and columns of Q2 are orthonormal basis for span(A)^⊥

• Q1 Q1^T is orthogonal projector onto span(A)

• Solution to least squares problem Ax ≅ b given by solution to square system

      Q1^T A x = R x = c1 = Q1^T b
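A sketch using the library's reduced QR factorization (illustrative):

import numpy as np
from scipy.linalg import solve_triangular

def lstsq_qr(A, b):
    # Solve Ax ≅ b via reduced QR: A = Q1 R, then back-substitute R x = Q1^T b
    Q1, R = np.linalg.qr(A, mode='reduced')      # Q1 is m x n with orthonormal columns
    return solve_triangular(R, Q1.T @ b, lower=False)

The orthogonal projector onto span(A) is then simply Q1 @ Q1.T.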

28
QR Factorization

• To compute QR factorization of m × n matrix A, with m > n, annihilate subdiagonal entries of successive columns of A, eventually reaching upper triangular form

• Similar to LU factorization by Gaussian elimination, but uses orthogonal transformations instead of elementary elimination matrices

• Possible methods include

• Householder transformations

• Givens rotations

• Gram-Schmidt orthogonalization

29
Householder Transformations

• Householder transformation has form

      H = I − 2 (v v^T) / (v^T v)

  for nonzero vector v

• H = H^T = H^(−1), so H is orthogonal and symmetric

• Given vector a, choose v so that

             [ α ]       [ 1 ]
             [ 0 ]       [ 0 ]
      Ha  =  [ . ]  = α  [ . ]  = α e1
             [ . ]       [ . ]
             [ 0 ]       [ 0 ]

• Substituting into formula for H, can take

      v = a − α e1

  and α = ±‖a‖_2, with sign chosen to avoid cancellation
30
Example: Householder Transformation

• Let a = [ 2  1  2 ]^T

• By foregoing recipe,

                     [ 2 ]   [ α ]
      v = a − α e1 = [ 1 ] − [ 0 ] ,
                     [ 2 ]   [ 0 ]

  where α = ±‖a‖_2 = ±3

• Since a1 positive, choosing negative sign for α avoids cancellation

• Thus,

          [ 2 ]   [ −3 ]   [ 5 ]
      v = [ 1 ] − [  0 ] = [ 1 ]
          [ 2 ]   [  0 ]   [ 2 ]

• To confirm that transformation works,

                 v^T a       [ 2 ]      15  [ 5 ]   [ −3 ]
      Ha = a − 2 ----- v  =  [ 1 ] − 2  --  [ 1 ] = [  0 ]
                 v^T v       [ 2 ]      30  [ 2 ]   [  0 ]
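A small numerical check of this example (a sketch; the helper householder_vector is an illustrative name, not from the notes):

import numpy as np

def householder_vector(a):
    # Return v, alpha with (I - 2 v v^T / v^T v) a = alpha e1; assumes a is nonzero
    alpha = -np.sign(a[0]) * np.linalg.norm(a) if a[0] != 0 else -np.linalg.norm(a)
    v = a.copy()
    v[0] -= alpha
    return v, alpha

a = np.array([2.0, 1.0, 2.0])
v, alpha = householder_vector(a)             # v = [5, 1, 2], alpha = -3
print(a - 2 * (v @ a) / (v @ v) * v)         # Ha = approximately [-3, 0, 0]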
31
Householder QR Factorization

• To compute QR factorization of A, use Householder transformations to annihilate subdiagonal entries of each successive column

• Each Householder transformation applied to entire matrix, but does not affect prior columns, so zeros preserved

• In applying Householder transformation H to arbitrary vector u,

      Hu = ( I − 2 (v v^T)/(v^T v) ) u = u − 2 ((v^T u)/(v^T v)) v,

  which is much cheaper than general matrix-vector multiplication and requires only vector v, not full matrix H

32
Householder QR Factorization, cont.

• Process just described produces factorization

      Hn · · · H1 A = [ R ]
                      [ O ] ,

  with R n × n and upper triangular

• If Q = H1 · · · Hn, then

      A = Q [ R ]
            [ O ]

• To preserve solution of linear least squares problem, right-hand-side b transformed by same sequence of Householder transformations

• Then solve triangular least squares problem

      [ R ] x ≅ Q^T b
      [ O ]

  for solution x of original least squares problem
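A compact sketch of this procedure (an illustrative implementation, not the notes' own code; each reflection is applied to the trailing submatrix and to b, then back-substitution gives x):

import numpy as np

def householder_qr_lstsq(A, b):
    # Solve Ax ≅ b by reducing A to [R; O] with Householder reflections applied to A and b
    R = A.astype(float).copy()
    c = np.asarray(b, dtype=float).copy()
    m, n = R.shape
    for k in range(n):
        a = R[k:, k]
        alpha = -np.sign(a[0]) * np.linalg.norm(a) if a[0] != 0 else -np.linalg.norm(a)
        v = a.copy()
        v[0] -= alpha
        vtv = v @ v
        if vtv == 0:                         # column already zero below the diagonal
            continue
        # H_k = I - 2 v v^T / v^T v, applied to the remaining columns and to the right-hand side
        R[k:, k:] -= np.outer(2.0 * v / vtv, v @ R[k:, k:])
        c[k:] -= (2.0 * (v @ c[k:]) / vtv) * v
        R[k+1:, k] = 0.0                     # annihilated entries are exactly zero
    x = np.linalg.solve(R[:n, :n], c[:n])    # back-substitution on the triangular part
    return x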
33
Householder QR Factorization, cont.

• For solving linear least squares problem, product Q of Householder transformations need not be formed explicitly

• R can be stored in upper triangle of array initially containing A

• Householder vectors v can be stored in (now zero) lower triangular portion of A (almost)

• Householder transformations most easily applied in this form anyway

34
Example: Householder QR Factorization

• For polynomial data-fitting example given previously, with

          [ 1  −1.0  1.0  ]        [ 1.0 ]
          [ 1  −0.5  0.25 ]        [ 0.5 ]
      A = [ 1   0.0  0.0  ] ,  b = [ 0.0 ] ,
          [ 1   0.5  0.25 ]        [ 0.5 ]
          [ 1   1.0  1.0  ]        [ 2.0 ]

• Householder vector v1 for annihilating subdiagonal entries of first column of A is

           [ 1 ]   [ −2.236 ]   [ 3.236 ]
           [ 1 ]   [  0     ]   [ 1     ]
      v1 = [ 1 ] − [  0     ] = [ 1     ]
           [ 1 ]   [  0     ]   [ 1     ]
           [ 1 ]   [  0     ]   [ 1     ]

35
Example Continued

• Applying resulting Householder transformation H1 yields transformed matrix and right-hand side

             [ −2.236    0       −1.118 ]
             [  0       −0.191   −0.405 ]
      H1 A = [  0        0.309   −0.655 ] ,
             [  0        0.809   −0.405 ]
             [  0        1.309    0.345 ]

             [ −1.789 ]
             [ −0.362 ]
      H1 b = [ −0.862 ]
             [ −0.362 ]
             [  1.138 ]

36
Example Continued

• Householder vector v2 for annihilating subdiagonal entries of second column of H1 A is

           [  0     ]   [ 0     ]   [  0     ]
           [ −0.191 ]   [ 1.581 ]   [ −1.772 ]
      v2 = [  0.309 ] − [ 0     ] = [  0.309 ]
           [  0.809 ]   [ 0     ]   [  0.809 ]
           [  1.309 ]   [ 0     ]   [  1.309 ]

• Applying resulting Householder transformation H2 yields

                [ −2.236   0      −1.118 ]
                [  0       1.581   0     ]
      H2 H1 A = [  0       0      −0.725 ]
                [  0       0      −0.589 ]
                [  0       0       0.047 ]

37
Example Continued

 
                [ −1.789 ]
                [  0.632 ]
      H2 H1 b = [ −1.035 ]
                [ −0.816 ]
                [  0.404 ]

• Householder vector v3 for annihilating subdiagonal entries of third column of H2 H1 A is

           [  0     ]   [ 0     ]   [  0     ]
           [  0     ]   [ 0     ]   [  0     ]
      v3 = [ −0.725 ] − [ 0.935 ] = [ −1.660 ]
           [ −0.589 ]   [ 0     ]   [ −0.589 ]
           [  0.047 ]   [ 0     ]   [  0.047 ]

38
Example Continued

• Applying resulting Householder transformation H3 yields

                   [ −2.236   0      −1.118 ]
                   [  0       1.581   0     ]
      H3 H2 H1 A = [  0       0       0.935 ] ,
                   [  0       0       0     ]
                   [  0       0       0     ]

                   [ −1.789 ]
                   [  0.632 ]
      H3 H2 H1 b = [  1.336 ]
                   [  0.026 ]
                   [  0.337 ]

• Now solve upper triangular system Rx = c1 by back-substitution to obtain

      x = [ 0.086  0.400  1.429 ]^T
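For reference, the householder_qr_lstsq sketch shown earlier reproduces this result on the same data (assuming that function is in scope):

import numpy as np

A = np.vander(np.array([-1.0, -0.5, 0.0, 0.5, 1.0]), 3, increasing=True)
b = np.array([1.0, 0.5, 0.0, 0.5, 2.0])
print(householder_qr_lstsq(A, b))            # approximately [0.086, 0.400, 1.429]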

39
Givens Rotations

• Givens rotations introduce zeros one at a time

• Given vector [ a1  a2 ]^T, choose scalars c and s so that

      [  c  s ] [ a1 ]   [ α ]
      [ −s  c ] [ a2 ] = [ 0 ] ,

  with c^2 + s^2 = 1, or equivalently, α = √(a1^2 + a2^2)

• Previous equation can be rewritten

      [ a1   a2 ] [ c ]   [ α ]
      [ a2  −a1 ] [ s ] = [ 0 ]

• Gaussian elimination yields triangular system

      [ a1   a2              ] [ c ]   [ α         ]
      [ 0   −a1 − a2^2 / a1  ] [ s ] = [ −α a2 / a1 ]

40
Givens Rotations, continued

• Back-substitution then gives

      s = α a2 / (a1^2 + a2^2) ,   c = α a1 / (a1^2 + a2^2)

• Finally, c^2 + s^2 = 1, or α = √(a1^2 + a2^2), implies

      c = a1 / √(a1^2 + a2^2) ,   s = a2 / √(a1^2 + a2^2)

41
Example: Givens Rotation

• Let a = [ 4  3 ]^T

• Computing cosine and sine,

      c = a1 / √(a1^2 + a2^2) = 4/5 = 0.8

      s = a2 / √(a1^2 + a2^2) = 3/5 = 0.6

• Rotation given by

      G = [  c  s ] = [  0.8  0.6 ]
          [ −s  c ]   [ −0.6  0.8 ]

• To confirm that rotation works,

      Ga = [  0.8  0.6 ] [ 4 ]   [ 5 ]
           [ −0.6  0.8 ] [ 3 ] = [ 0 ]
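The same computation as a sketch (the function name givens is illustrative; it assumes (a1, a2) is not the zero vector):

import numpy as np

def givens(a1, a2):
    # c, s such that [[c, s], [-s, c]] @ [a1, a2] = [alpha, 0]
    alpha = np.hypot(a1, a2)
    return a1 / alpha, a2 / alpha

c, s = givens(4.0, 3.0)                      # c = 0.8, s = 0.6
G = np.array([[c, s], [-s, c]])
print(G @ np.array([4.0, 3.0]))              # [5, 0]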

42
Givens QR Factorization

• To annihilate selected component of vector in n dimensions, rotate target component with another component

      [ 1   0  0  0  0 ] [ a1 ]   [ a1 ]
      [ 0   c  0  s  0 ] [ a2 ]   [ α  ]
      [ 0   0  1  0  0 ] [ a3 ] = [ a3 ]
      [ 0  −s  0  c  0 ] [ a4 ]   [ 0  ]
      [ 0   0  0  0  1 ] [ a5 ]   [ a5 ]

• Reduce matrix to upper triangular form using sequence of Givens rotations

• Each rotation orthogonal, so their product is orthogonal, producing QR factorization

• Straightforward implementation of Givens method requires about 50% more work than Householder method, and also requires more storage, since each rotation requires two numbers, c and s, to define it
43
Gram-Schmidt Orthogonalization

• Given vectors a1 and a2, can determine orthonormal vectors q1 and q2 with same span by orthogonalizing one vector against other:

  [Figure: q1 in the direction of a1; q2 in the direction of a2 − (q1^T a2) q1]

for k = 1 to n
    qk = ak
    for j = 1 to k − 1
        rjk = qj^T ak
        qk = qk − rjk qj
    end
    rkk = ‖qk‖_2
    qk = qk / rkk
end
44
Modified Gram-Schmidt

• Classical Gram-Schmidt procedure often suffers loss of orthogonality in finite-precision

• Also, separate storage is required for A, Q, and R, since original ak needed in inner loop, so qk cannot overwrite columns of A

• Both deficiencies improved by modified Gram-Schmidt procedure, with each vector orthogonalized in turn against all subsequent vectors so qk can overwrite ak:

for k = 1 to n
    rkk = ‖ak‖_2
    qk = ak / rkk
    for j = k + 1 to n
        rkj = qk^T aj
        aj = aj − rkj qk
    end
end
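Both procedures as NumPy sketches (illustrative; the Vandermonde matrix at the end is an arbitrary ill-conditioned test case for comparing loss of orthogonality):

import numpy as np

def classical_gram_schmidt(A):
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        q = A[:, k].astype(float)
        for j in range(k):
            R[j, k] = Q[:, j] @ A[:, k]      # projects against the original column a_k
            q -= R[j, k] * Q[:, j]
        R[k, k] = np.linalg.norm(q)
        Q[:, k] = q / R[k, k]
    return Q, R

def modified_gram_schmidt(A):
    m, n = A.shape
    Q = A.astype(float)                      # columns of A are overwritten by q_k
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ Q[:, j]
            Q[:, j] -= R[k, j] * Q[:, k]
    return Q, R

A = np.vander(np.linspace(0, 1, 12), 8, increasing=True)    # ill-conditioned test matrix
for gs in (classical_gram_schmidt, modified_gram_schmidt):
    Q, _ = gs(A)
    print(np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1])))     # departure from orthogonality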
45
Rank Deficiency

• If rank(A) < n, then QR factorization still exists, but yields singular upper triangular factor R, and multiple vectors x give minimum residual norm

• Common practice selects minimum residual solution x having smallest norm

• Can be computed by QR factorization with column pivoting or by singular value decomposition (SVD)

• Rank of matrix often not clear cut in practice, so relative tolerance used to determine rank

46
Example: Near Rank Deficiency

• Consider 3 × 2 matrix

          [ 0.641  0.242 ]
      A = [ 0.321  0.121 ]
          [ 0.962  0.363 ]

• Computing QR factorization,

      R = [ 1.1997  0.4527 ]
          [ 0       0.0002 ]

• R extremely close to singular (exactly singular to 3-digit accuracy of problem statement)

• If R used to solve linear least squares problem, result is highly sensitive to perturbations in right-hand side

• For practical purposes, rank(A) = 1 rather than 2, because columns nearly linearly dependent
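A quick check of this example (a sketch; the signs in R may differ with the library's conventions):

import numpy as np

A = np.array([[0.641, 0.242],
              [0.321, 0.121],
              [0.962, 0.363]])

_, R = np.linalg.qr(A)
print(R)                                     # |R[1, 1]| is about 2e-4: nearly singular
print(np.linalg.svd(A, compute_uv=False))    # one tiny singular value -> numerical rank 1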
47
QR with Column Pivoting

• Instead of processing columns in natural order, select for reduction at each stage column of remaining unreduced submatrix having maximum Euclidean norm

• If rank(A) = k < n, then after k steps, norms of remaining unreduced columns will be zero (or “negligible” in finite-precision arithmetic) below row k

• Yields orthogonal factorization of form

      Q^T A P = [ R  S ]
                [ O  O ] ,

  with R k × k, upper triangular, and nonsingular, and permutation matrix P performing column interchanges

48
QR with Column Pivoting, cont.

• Basic solution to least squares problem Ax ≅ b can now be computed by solving triangular system Rz = c1, where c1 contains first k components of Q^T b, and then taking

      x = P [ z ]
            [ o ]

• Minimum-norm solution can be computed, if desired, at expense of additional processing to annihilate S

• rank(A) usually unknown, so rank determined by monitoring norms of remaining unreduced columns and terminating factorization when maximum value falls below tolerance (see the sketch below)
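A sketch of the basic-solution computation using SciPy's pivoted QR (illustrative; the rank tolerance is an arbitrary choice):

import numpy as np
from scipy.linalg import qr, solve_triangular

def basic_solution(A, b, tol=1e-10):
    # Basic least squares solution via QR with column pivoting
    Q, R, perm = qr(A, pivoting=True)        # A[:, perm] = Q R
    diag = np.abs(np.diag(R))
    k = int(np.sum(diag > tol * diag[0]))    # numerical rank from the diagonal of R
    c1 = (Q.T @ b)[:k]
    z = solve_triangular(R[:k, :k], c1, lower=False)
    x = np.zeros(A.shape[1])
    x[perm[:k]] = z                          # undo the column permutation
    return x, k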

49
Singular Value Decomposition

• Singular value decomposition (SVD) of m × n matrix A has form

      A = U Σ V^T,

  where U is m × m orthogonal matrix, V is n × n orthogonal matrix, and Σ is m × n diagonal matrix, with

      σij = 0 for i ≠ j,  and  σii = σi ≥ 0

• Diagonal entries σi, called singular values of A, usually ordered so that σ1 ≥ σ2 ≥ · · · ≥ σn

• Columns ui of U and vi of V called left and right singular vectors

50
Example: SVD

• SVD of

          [  1   2   3 ]
      A = [  4   5   6 ]
          [  7   8   9 ]
          [ 10  11  12 ]

  given by U Σ V^T =

      [ .141   .825   −.420  −.351  ] [ 25.5  0     0 ] [  .504   .574  .644 ]
      [ .344   .426    .298   .782  ] [ 0     1.29  0 ] [ −.761  −.057  .646 ]
      [ .547   .0278   .664  −.509  ] [ 0     0     0 ] [  .408  −.816  .408 ]
      [ .750  −.371   −.542   .0790 ] [ 0     0     0 ]

51
Applications of SVD

• Minimum norm solution to Ax ≅ b:

      x = Σ_{σi ≠ 0} ((ui^T b) / σi) vi

  For ill-conditioned or rank deficient problems, “small” singular values can be dropped from summation to stabilize solution

• Euclidean matrix norm:

      ‖A‖_2 = max_{x ≠ o} ‖Ax‖_2 / ‖x‖_2 = σ_max

• Euclidean condition number of matrix:

      cond(A) = σ_max / σ_min

• Rank of matrix: number of nonzero, or non-negligible, singular values
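A sketch collecting these uses of the SVD (illustrative; the truncation tolerance and test data are arbitrary):

import numpy as np

def svd_lstsq(A, b, tol=1e-12):
    # Minimum-norm least squares solution, dropping singular values below tol * sigma_max
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > tol * s[0]
    return Vt[keep].T @ ((U[:, keep].T @ b) / s[keep])    # sum of (u_i^T b / sigma_i) v_i

A = np.random.default_rng(2).standard_normal((6, 4))
b = np.random.default_rng(3).standard_normal(6)
s = np.linalg.svd(A, compute_uv=False)
print(s[0], np.linalg.norm(A, 2))                         # ||A||_2 = sigma_max
print(s[0] / s[-1], np.linalg.cond(A))                    # cond(A) = sigma_max / sigma_min
print(np.allclose(svd_lstsq(A, b), np.linalg.lstsq(A, b, rcond=None)[0]))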
52
Pseudoinverse

• Define pseudoinverse of scalar σ to be 1/σ if σ ≠ 0, zero otherwise

• Define pseudoinverse of (possibly rectangular) diagonal matrix by transposing and taking scalar pseudoinverse of each entry

• Then pseudoinverse of general real m × n matrix A given by

      A^+ = V Σ^+ U^T

• Pseudoinverse always exists whether or not matrix is square or has full rank

• If A is square and nonsingular, then A^+ = A^(−1)

• In all cases, minimum-norm solution to Ax ≅ b is given by A^+ b
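A short sketch of this construction (illustrative; np.linalg.pinv does essentially the same thing with a relative tolerance):

import numpy as np

def pseudoinverse(A, tol=1e-12):
    # A^+ = V Sigma^+ U^T, with small singular values treated as zero
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_plus = np.array([1.0 / sigma if sigma > tol * s[0] else 0.0 for sigma in s])
    return (Vt.T * s_plus) @ U.T

A = np.random.default_rng(4).standard_normal((5, 3))
print(np.allclose(pseudoinverse(A), np.linalg.pinv(A)))   # True for this full-rank example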
53
Orthogonal Bases

• Columns of U corresponding to nonzero singular values form orthonormal basis for span(A)

• Remaining columns of U form orthonormal basis for orthogonal complement span(A)^⊥

• Columns of V corresponding to zero singular values form orthonormal basis for null space of A

• Remaining columns of V form orthonormal basis for orthogonal complement of null space
54
Lower-Rank Matrix Approximation

• Another way to write SVD:

      A = U Σ V^T = σ1 E1 + σ2 E2 + · · · + σn En,

  with Ei = ui vi^T

• Ei has rank 1 and can be stored using only m + n storage locations

• Product Ei x can be formed using only m + n multiplications

• Condensed approximation to A obtained by omitting from summation terms corresponding to small singular values

• Approximation using k largest singular values is closest matrix of rank k to A

• Approximation is useful in image processing, data compression, information retrieval, cryptography, etc.
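A sketch of such a rank-k approximation (illustrative; the test matrix is arbitrary):

import numpy as np

def rank_k_approximation(A, k):
    # Sum of the k leading rank-1 terms sigma_i u_i v_i^T
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

A = np.random.default_rng(5).standard_normal((8, 6))
A2 = rank_k_approximation(A, 2)
print(np.linalg.matrix_rank(A2))                          # 2
print(np.linalg.norm(A - A2, 2))                          # equals sigma_3, the largest omitted singular value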
55
Total Least Squares

• Ordinary least squares applicable when right hand side b subject to random error but matrix A known accurately

• When all data, including A, subject to error, then total least squares more appropriate

• Total least squares minimizes orthogonal distances, rather than vertical distances, between model and data

• Total least squares solution can be computed from SVD of [A, b]
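One standard way to carry this out (a sketch; it takes the right singular vector of [A, b] associated with the smallest singular value and assumes its last component is nonzero):

import numpy as np

def total_least_squares(A, b):
    # TLS solution from the SVD of the augmented matrix [A, b]
    C = np.column_stack([A, b])
    _, _, Vt = np.linalg.svd(C)
    v = Vt[-1]                               # right singular vector for smallest singular value
    return -v[:-1] / v[-1]                   # requires v[-1] != 0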

56
Comparison of Methods

• Forming normal equations matrix A^T A requires about n^2 m / 2 multiplications, and solving resulting symmetric linear system requires about n^3 / 6 multiplications

• Solving least squares problem using Householder QR factorization requires about m n^2 − n^3/3 multiplications

• If m ≈ n, two methods require about same amount of work

• If m ≫ n, Householder QR requires about twice as much work as normal equations

• Cost of SVD proportional to m n^2 + n^3, with proportionality constant ranging from 4 to 10, depending on algorithm used
57
Comparison of Methods, continued

• Normal equations method produces solution whose relative error is proportional to [cond(A)]^2

• Required Cholesky factorization can be expected to break down if cond(A) ≈ 1/√ε_mach or worse

• Householder method produces solution whose relative error is proportional to

      cond(A) + ‖r‖_2 [cond(A)]^2,

  which is best possible, since this is inherent sensitivity of solution to least squares problem

• Householder method can be expected to break down (in back-substitution phase) only if cond(A) ≈ 1/ε_mach or worse
58
Comparison of Methods, continued

• Householder is more accurate and more broadly applicable than normal equations

• These advantages may not be worth additional cost, however, when problem is sufficiently well conditioned that normal equations provide adequate accuracy

• For rank-deficient or nearly rank-deficient problem, Householder with column pivoting can produce useful solution when normal equations method fails outright

• SVD is even more robust and reliable than Householder, but substantially more expensive

59
