
Lecture 11

Cholesky Decomposition, Linear Algebra Libraries and MATLAB Routines

Course Website
https://sites.google.com/view/kporwal/teaching/mtl107
Symmetric, positive definite systems

Definition
A symmetric matrix A is positive definite if the corresponding
quadratic form is positive definite, i.e., if

Q(x) := x⊤ Ax > 0   for all x ≠ 0.

Symmetric, positive definite systems

Theorem
If a symmetric matrix A ∈ R^{n×n} is positive definite, then the
following conditions hold.
1. a_{ii} > 0 for i = 1, ..., n.
2. a_{ik}^2 < a_{ii} a_{kk} for i ≠ k, i, k = 1, ..., n.
3. There is a k with max_{i,k} |a_{ik}| = a_{kk}.

Proof.
1. a_{ii} = e_i⊤ A e_i > 0.
2. (ξ e_i + e_k)⊤ A (ξ e_i + e_k) = a_{ii} ξ^2 + 2 a_{ik} ξ + a_{kk} > 0. This
   quadratic in ξ has no real zero, therefore its discriminant
   4 a_{ik}^2 - 4 a_{ii} a_{kk} must be negative.
3. Clear.
Gaussian elimination works without pivoting. Since a_{11} > 0 we have

A = \begin{pmatrix} a_{11} & a_1^\top \\ a_1 & A_1 \end{pmatrix}
  = \begin{pmatrix} 1 & 0^\top \\ a_1/a_{11} & I \end{pmatrix}
    \begin{pmatrix} a_{11} & a_1^\top \\ 0 & A_1 - a_1 a_1^\top / a_{11} \end{pmatrix}
  = \begin{pmatrix} 1 & 0^\top \\ a_1/a_{11} & I \end{pmatrix}
    \begin{pmatrix} a_{11} & 0^\top \\ 0 & A^{(1)} \end{pmatrix}
    \begin{pmatrix} 1 & a_1^\top / a_{11} \\ 0 & I \end{pmatrix}

with A^{(1)} = A_1 - a_1 a_1⊤ / a_{11}. But for any x ∈ R^{n-1} \ {0} we have

x⊤ A^{(1)} x
= \begin{pmatrix} 0 \\ x \end{pmatrix}^\top
  \begin{pmatrix} a_{11} & 0^\top \\ 0 & A^{(1)} \end{pmatrix}
  \begin{pmatrix} 0 \\ x \end{pmatrix}
= \begin{pmatrix} 0 \\ x \end{pmatrix}^\top
  \begin{pmatrix} 1 & 0^\top \\ -a_1/a_{11} & I \end{pmatrix}
  \begin{pmatrix} a_{11} & a_1^\top \\ a_1 & A_1 \end{pmatrix}
  \begin{pmatrix} 1 & -a_1^\top / a_{11} \\ 0 & I \end{pmatrix}
  \begin{pmatrix} 0 \\ x \end{pmatrix}
= y⊤ A y > 0.
Cholesky decomposition

From the proof we see that

A = LU = LDL⊤ with U = DL⊤.

Note that L is a unit lower triangular matrix and D is a diagonal
matrix with positive diagonal entries.
We can easily compute the positive definite diagonal matrix D^{1/2}
with D^{1/2} D^{1/2} = D.

Definition
Let L_1 = L D^{1/2}. Then

A = L_1 L_1⊤

is called the Cholesky decomposition of A.
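The factor can be computed directly, column by column. Below is a minimal
MATLAB sketch of the unblocked algorithm, assuming A is symmetric positive
definite; the function name mychol is ours, and in practice one uses the
built-in chol.

function L = mychol(A)
% Minimal unblocked Cholesky sketch: returns lower triangular L with A = L*L'.
n = size(A, 1);
L = zeros(n);
for j = 1:n
    % diagonal entry of column j, after removing earlier columns' contributions
    L(j,j) = sqrt(A(j,j) - L(j,1:j-1)*L(j,1:j-1)');
    for i = j+1:n
        % entries below the diagonal of column j
        L(i,j) = (A(i,j) - L(i,1:j-1)*L(j,1:j-1)') / L(j,j);
    end
end
end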


Cholesky decomposition: Solving systems and complexity

Linear system solving with the Cholesky factorization:

A = LL⊤ (Cholesky decomposition)

Lc = b (Forward substitution)
L⊤ x = c (Backward substitution)
The complexity is half that of the LU factorization:

(1/3) n^3 + O(n^2)
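A minimal MATLAB sketch of these three steps (the matrix and right-hand side
are ours; note that chol returns an upper triangular factor R with A = R⊤R,
so L = R⊤):

A = [4 2; 2 3];          % small symmetric positive definite example
b = [6; 5];
R = chol(A);             % upper triangular, A = R'*R
c = R' \ b;              % forward substitution with L = R'
x = R  \ c;              % backward substitution with L' = R
% x agrees with A\b up to rounding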
Efficient implementation

▶ In general, the performance of an algorithm depends not only
on the number of arithmetic operations that are carried out
but also on the frequency of memory accesses.
▶ A first step towards accessing memory efficiently is
vectorization. Matlab supports vector operations in a
convenient way; see the sketch after this list.
▶ However, good performance is only achievable by blocking
algorithms. Only by operating on blocks of data does the ratio of
floating point operations (flops) to memory accesses (in
bytes) rise beyond O(1).
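A small illustration of the vectorization point in MATLAB (timings are
machine dependent; the loop and the vectorized statement compute the same
AXPY operation):

n = 1e6;  alpha = 2.5;
x = rand(n,1);  y = rand(n,1);

tic;
y1 = y;
for i = 1:n
    y1(i) = alpha*x(i) + y1(i);   % one element per loop iteration
end
tLoop = toc;

tic;
y2 = alpha*x + y;                 % one vectorized statement
tVec = toc;

fprintf('loop: %.3f s, vectorized: %.3f s, max diff: %.1e\n', ...
        tLoop, tVec, max(abs(y1 - y2)));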
LAPACK and the BLAS

▶ LAPACK (Linear Algebra PACKage) is a library of Fortran 77
subroutines for solving the most commonly occurring problems
in numerical linear algebra. It has been designed to be
efficient on a wide range of high-performance computers with
a hierarchy of memory levels.
▶ LAPACK supersedes LINPACK and EISPACK.
▶ LAPACK is written in a way that as much as possible of the
computation is performed by calls to the Basic Linear Algebra
Subprograms (BLAS). While LINPACK and EISPACK relied
on the vector operations in BLAS-1, LAPACK calls BLAS-2
(matrix-vector operations) and BLAS-3 (matrix-matrix
operations) to exploit the fast memories (caches) of today's
high-performance computers.
▶ Most of the work has been done at universities (U of TN at
Knoxville, U of CA at Berkeley) and at NAG, Oxford.
▶ The software is freely available at
http://www.netlib.org/lapack
▶ The LINPACK benchmark gives rise to the TOP500 list, the
list of the 500 most powerful computers:
http://www.top500.org
▶ IIT Delhi: HP Apollo 6000 XL230/250, Xeon E5-2680v3 12C
2.5 GHz, InfiniBand FDR, NVIDIA Tesla K40m
(Hewlett-Packard). Rank: 217, 4th in India.
BLAS

See http://www.netlib.org/blas
BLAS-1: vector operations (real, double, complex variants)
▶ Swap two vectors, copy two vectors, scale a vector
▶ AXPY operation: y = αx + y
▶ 2-norm, 1-norm, dot product
▶ I_AMAX, index of the vector element of largest magnitude:
first i such that |x_i| ≥ |x_k| for all k.
▶ O(1) flops per byte of memory accessed.
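For reference, the MATLAB one-liners corresponding to these BLAS-1
operations (a small sketch with made-up data):

x = [3; -7; 2];  y = [1; 1; 1];  alpha = 2;
y = alpha*x + y;            % AXPY
n2 = norm(x, 2);            % 2-norm
n1 = norm(x, 1);            % 1-norm
d  = dot(x, y);             % dot product
[~, i] = max(abs(x));       % I_AMAX: index of the entry of largest magnitude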
BLAS (cont.)

BLAS-2: matrix-vector operations


▶ matrix-vector multiplication (variants for various matrix types:
general, symmetric, banded, triangular)
▶ triangular solves (various variants)
▶ O(1) flops per byte of memory accessed.
BLAS (cont.)

BLAS-3: matrix-matrix operations


▶ matrix-matrix multiplication
▶ triangular solves for multiple right-hand sides (various
variants)
▶ O(b) flops per byte of memory accessed, where b is the block size.
Matlab

▶ Matlab has built-in Gaussian elimination to solve Ax = b:
use x = A\b.
▶ Decompositions can be computed with lu and chol.
▶ Do not implement Gaussian elimination yourself!
▶ Use numerical libraries (LAPACK, NAG, MKL) or MATLAB!
▶ MATLAB operator: \
▶ In fact, all these implementations are based on LAPACK.
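A minimal sketch of these calls (the test matrix is ours; gallery('lehmer', 5)
is a small symmetric positive definite matrix shipped with MATLAB):

A = gallery('lehmer', 5);    % symmetric positive definite test matrix
b = ones(5, 1);
x = A \ b;                   % Gaussian elimination via backslash (LAPACK underneath)
[L, U, P] = lu(A);           % LU decomposition with partial pivoting, P*A = L*U
R = chol(A);                 % Cholesky factor, A = R'*R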
Lecture 12
Error Analysis, Condition Number

Course Website
https://sites.google.com/view/kporwal/teaching/mtl107
Error estimation

Two questions regarding the accuracy of x̃ as an approximation to
the solution x of the linear system of equations Ax = b.
▶ First we investigate what we can derive from the size of the
residual r̃ = b − Ax̃.
Note that r = b − Ax = 0.
▶ Then, what is the effect of errors in the initial data (b, A) on
the solution x? That is, how sensitive is the solution to
perturbations in the initial data?
Error estimation (cont.)

Let

A = \begin{pmatrix} 1.2969 & 0.8648 \\ 0.2161 & 0.1441 \end{pmatrix},
b = \begin{pmatrix} 0.8642 \\ 0.1440 \end{pmatrix}.

Suppose somebody came up with the approximate solution

x̃ = \begin{pmatrix} 0.9911 \\ -0.4870 \end{pmatrix}.

Then r̃ = b − Ax̃ = \begin{pmatrix} 10^{-8} \\ -10^{-8} \end{pmatrix} =⇒ ∥r̃∥∞ = 10^{-8}.

Since x = \begin{pmatrix} 2 \\ -2 \end{pmatrix}, the error z = x̃ − x has ∥z∥∞ = 1.513, which is ≈ 10^8
times larger than the residual.
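A minimal MATLAB check of this example (variable names are ours; the printed
values should match those above up to rounding):

% Reproduce the example: tiny residual, large error.
A  = [1.2969 0.8648; 0.2161 0.1441];
b  = [0.8642; 0.1440];
xt = [0.9911; -0.4870];              % proposed approximate solution
x  = [2; -2];                        % exact solution
r  = b - A*xt;                       % residual, norm(r,inf) is about 1e-8
z  = xt - x;                         % error,    norm(z,inf) is about 1.513
fprintf('||r||_inf = %.2e, ||z||_inf = %.4g\n', norm(r,inf), norm(z,inf));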
Error estimation (cont.)

So, how does the residual r̃ = b − Ax̃ affect the error z = x̃ − x?

We assume that ∥A∥ is any matrix norm and ∥x∥ a compatible
vector norm, i.e., we have ∥Ax∥ ≤ ∥A∥ ∥x∥ for all x.
We have

Az = A(x̃ − x) = Ax̃ − b = −r̃.

Therefore

∥z∥ = ∥A^{-1} r̃∥ ≤ ∥A^{-1}∥ ∥r̃∥,   ∥b∥ = ∥Ax∥ ≤ ∥A∥ ∥x∥.

We get an estimate for the relative error of x̃:

∥z∥/∥x∥ = ∥x̃ − x∥/∥x∥ ≤ ∥A∥ ∥A^{-1}∥ ∥r̃∥/∥b∥ = κ(A) ∥r̃∥/∥b∥.
Error estimation (cont.)

Definition
The quantity
κ(A) = ∥A∥ ∥A^{-1}∥
is called the condition number of A.

The condition number is at least 1:

1 = ∥I∥ = ∥A A^{-1}∥ ≤ ∥A∥ ∥A^{-1}∥ = κ(A).

If A is nonsingular and E is the matrix with smallest norm such
that A + E is singular, then

∥E∥_2 / ∥A∥_2 = 1 / κ_2(A).
Error estimation (cont.)

Previous example continued.

A^{-1} = 10^8 \begin{pmatrix} 0.1441 & -0.8648 \\ -0.2161 & 1.2969 \end{pmatrix}

This yields

∥A^{-1}∥∞ = 1.513 × 10^8 =⇒ κ∞(A) = 2.162 × 1.513 × 10^8 ≈ 3.27 × 10^8.

The numbers 1.513/2 ≤ 3.27/0.8642 confirm the estimate

∥z∥∞/∥x∥∞ ≤ κ∞(A) ∥r̃∥∞/∥b∥∞.
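As a quick sanity check, the condition number can be computed directly in
MATLAB (a small sketch; cond(A, inf) computes the same value):

A = [1.2969 0.8648; 0.2161 0.1441];
kappa = norm(A, inf) * norm(inv(A), inf);   % about 3.27e8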
Error estimation (cont.)

We now make A singular by replacing a_{22} by a_{12} a_{21} / a_{11}.

Then A + E is singular with

E = \begin{pmatrix} 0 & 0 \\ 0 & (0.8648 \cdot 0.2161)/1.2969 - 0.1441 \end{pmatrix}
  ≈ \begin{pmatrix} 0 & 0 \\ 0 & -7.7 × 10^{-9} \end{pmatrix}.

So, indeed,

∥E∥/∥A∥ ≈ 1/κ(A).

(This estimate holds in the 2-norm and in the ∞-norm.)
Sensitivity on matrix coefficients

▶ Input data A and b are often perturbed due, e.g., to rounding.
▶ How big is the change δx if the matrix A is altered by δA and
b is perturbed by δb?
▶ The LU decomposition can be considered as the exact
decomposition of a perturbed matrix Ã = A + δA.
▶ Forward/backward substitution adds (rounding) errors.
▶ Altogether: the computed solution x̃ is the solution of a
perturbed problem

(A + δA)x̃ = b + δb,   x̃ = x + δx.

▶ How do these perturbations affect the solution? This is called
backward error analysis.
Sensitivity on matrix coefficients (cont.)

(A + δA)(x + δx) = (b + δb)


Multiplying out:

Ax + Aδx + δAx + δAδx = b + δb

or, with Ax=b,


Aδx + δAx + δAδx = δb
or, equivalently,
Aδx = δb − δAx − δAδx
Sensitivity on matrix coefficients (cont.)

Thus,

δx = A^{-1} (δb - δA x - δA δx).

For compatible norms we have

∥δx∥ = ∥A^{-1} (δb - δA x - δA δx)∥
     ≤ ∥A^{-1}∥ (∥δb∥ + ∥δA∥ ∥x∥ + ∥δA∥ ∥δx∥).

From this we get

(1 - ∥A^{-1}∥ ∥δA∥) ∥δx∥ ≤ ∥A^{-1}∥ (∥δb∥ + ∥δA∥ ∥x∥).



Sensitivity on matrix coefficients (cont.)

Now assume that the perturbation in A is small: ∥A^{-1}∥ ∥δA∥ < 1.
Then

∥δx∥ ≤ ∥A^{-1}∥ / (1 - ∥A^{-1}∥ ∥δA∥) · (∥δb∥ + ∥δA∥ ∥x∥).

Since we are interested in an estimate of the relative error we use

∥b∥ = ∥Ax∥ ≤ ∥A∥ ∥x∥ =⇒ ∥x∥ ≥ ∥b∥/∥A∥ =⇒ 1/∥x∥ ≤ ∥A∥/∥b∥.

Therefore, we have

∥δx∥/∥x∥ ≤ ∥A^{-1}∥ / (1 - ∥A^{-1}∥ ∥δA∥) · (∥δb∥/∥x∥ + ∥δA∥).
Sensitivity on matrix coefficients (cont.)

With the condition number κ(A) = ∥A∥ ∥A^{-1}∥ we get

∥δx∥/∥x∥ ≤ κ(A) / (1 - κ(A) ∥δA∥/∥A∥) · (∥δb∥/∥b∥ + ∥δA∥/∥A∥).

(A numerical check of this bound is sketched after the list below.)

The condition number κ(A) is the decisive quantity that describes
the sensitivity of the solution x to relative changes both in A and
in b.
▶ If κ(A) ≫ 1, small perturbations in A can lead to large relative
errors in the solution of the linear system of equations.
▶ If κ(A) ≫ 1, even a stable algorithm can produce solutions with
large relative error!
▶ A stable algorithm produces (acceptably) small errors if the
problem is well-conditioned (i.e., κ(A) ‘not too large’).
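A minimal numerical check of this bound in MATLAB (the perturbation sizes
and variable names are ours, chosen so that κ(A)∥δA∥/∥A∥ < 1):

A  = [1.2969 0.8648; 0.2161 0.1441];
b  = [0.8642; 0.1440];
x  = A \ b;
dA = 1e-10*randn(2);                 % small perturbation of A
db = 1e-10*randn(2,1);               % small perturbation of b
dx = (A + dA) \ (b + db) - x;        % resulting change in the solution
kappa = cond(A, inf);
lhs = norm(dx, inf) / norm(x, inf);
rhs = kappa / (1 - kappa*norm(dA, inf)/norm(A, inf)) * ...
      (norm(db, inf)/norm(b, inf) + norm(dA, inf)/norm(A, inf));
fprintf('relative error %.2e <= bound %.2e\n', lhs, rhs);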
Rule of thumb

Let’s assume we compute with d decimal digits such that the
initial errors in the input data are about

∥δb∥/∥b∥ ≤ 5 · 10^{-d},   ∥δA∥/∥A∥ ≤ 5 · 10^{-d}.

Let’s assume that the condition number of A is 10^α.

If 5 · 10^{α-d} ≪ 1 then we get

∥δx∥/∥x∥ ≤ 10^{α-d+1}.
Rule of thumb (cont.)

Rule of thumb: If a linear system Ax = b is solved with d-digit
floating point numbers and κ(A) ≈ 10^α, then, due to unavoidable
errors in the initial data, we have to expect relative errors in x of
the order of 10^{α-d+1}.
Note that this statement is about the largest components of x.
The small components can have larger relative errors!
Note on δA and δb

It can be shown that Gaussian elimination with pivoting yields a
perturbation bounded by

∥δA∥∞ ≤ η ϕ(n) g_n(A),

where ϕ(n) is a low-order polynomial in n (cubic at most), g_n(A) is the
growth factor of the elimination, and η is the rounding unit. The bounds
on the forward and backward substitutions and on δb are significantly
smaller.
Thus, as long as pivoting keeps the growth factor g_n(A) moderate
and n is not too large, the overall perturbations δA and δb are
not larger than a few orders of magnitude times η.
Scaling

▶ It would be nice to have well-conditioned problems.
▶ Can we easily improve the condition of a linear system before
starting the LU factorization?
▶ Scaling is a possibility that often works.
Let D_1 and D_2 be diagonal matrices. The solution of Ax = b can
be found by solving the scaled system

D_1^{-1} A D_2 y = D_1^{-1} b

by Gaussian elimination and then setting

x = D_2 y.

Scalings of A, b, y require only O(n^2) flops. Often D_2 = I, i.e.,
only row scaling is used, unless the matrix is symmetric.
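A minimal MATLAB sketch of row scaling (D_2 = I), using the absolute row
sums as scaling factors; the matrix and variable names are ours:

% Row scaling before elimination: D1 holds the absolute row sums of A,
% D2 = I, so x = y and no back-scaling is needed.
A  = [10 100000; 1 1];
b  = [100000; 2];
D1 = diag(sum(abs(A), 2));     % diagonal matrix of row sums
x  = (D1 \ A) \ (D1 \ b);      % solve the row-scaled system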
Scaling (cont.)

Example
\begin{pmatrix} 10 & 100000 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} =
\begin{pmatrix} 100000 \\ 2 \end{pmatrix}

The row-equivalent scaled problem is

\begin{pmatrix} 0.0001 & 1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} =
\begin{pmatrix} 1 \\ 2 \end{pmatrix}

The solutions computed with 3 decimal digits are x̃ = (0, 1.00)⊤ for the
unscaled system and x̃ = (1.00, 1.00)⊤ for the scaled system.
The correct solution is x = (1.0001, 0.9999)⊤.
See Example 5.11 in Ascher and Greif.
Theorem on scaling

Theorem
Let A ∈ R^{n×n} be nonsingular. Let the diagonal matrix D_z be
defined by

d_{ii} = ( \sum_{j=1}^{n} |a_{ij}| )^{-1}.

Then
κ∞(D_z A) ≤ κ∞(D A)
for all nonsingular diagonal matrices D.
See: Dahmen & Reusken: Numerik für Ingenieure und
Naturwissenschaftler, 2nd ed., Springer, 2008.
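A small MATLAB illustration of the theorem on the earlier 2×2 example (an
informal check, not a proof):

A  = [10 100000; 1 1];
Dz = diag(1 ./ sum(abs(A), 2));        % d_ii = 1 / (absolute row sum)
fprintf('kappa_inf(A) = %.2e, kappa_inf(Dz*A) = %.2e\n', ...
        cond(A, inf), cond(Dz*A, inf));
% Equilibrating the rows reduces the infinity-norm condition number
% from about 1e5 to about 3 for this matrix.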
Remark on determinants

Although det(A) = 0 for singular matrices, small determinants do
not indicate bad condition!

Example (small determinant)
A = diag(0.1, 0.1, ..., 0.1) ∈ R^{n×n} has det(A) = 10^{-n} and κ(A) = 1.

Example (unit determinant)
The condition number of the matrix A below grows exponentially
with n, but det(A) = 1:

A = \begin{pmatrix} 1 & -1 & \cdots & -1 \\ & 1 & \ddots & \vdots \\ & & \ddots & -1 \\ & & & 1 \end{pmatrix}
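A quick MATLAB check of the second example (the size n is chosen by us):

n = 20;
A = eye(n) - triu(ones(n), 1);     % 1 on the diagonal, -1 above it
fprintf('det = %g, kappa_inf = %.2e\n', det(A), cond(A, inf));
% det(A) = 1 for every n, while the condition number grows exponentially with n.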
Condition estimation

▶ After having solved Ax = b via PA = LU we would like to
ascertain the number of correct digits in the computed x̃.
▶ We need an estimate for κ∞(A) = ∥A∥∞ ∥A^{-1}∥∞.
▶ Known: ∥A∥∞ = max_{1≤i≤n} \sum_{j=1}^{n} |a_{ij}|.
▶ How do we get at ∥A^{-1}∥∞?
▶ Idea: Ay = d =⇒ ∥A^{-1}∥∞ ≥ ∥y∥∞ / ∥d∥∞.
▶ Find d with ∥d∥∞ = 1 such that ∥y∥∞ is as big as possible.
▶ Observation: estimating ∥U^{-1}∥∞ gives a good
approximation of ∥A^{-1}∥∞. Solve Uy = d with d_i = ±1.
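In MATLAB, a cheap estimate of this kind is available as condest, which
estimates the 1-norm condition number from the LU factors; a small sketch
(the test matrix is the earlier 2×2 example):

A = [1.2969 0.8648; 0.2161 0.1441];
c_est   = condest(A);      % estimated kappa_1(A), cheap once A = LU is known
c_exact = cond(A, 1);      % exact 1-norm condition number, needs inv(A)
fprintf('condest = %.3e, exact = %.3e\n', c_est, c_exact);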
