
GG313 GEOLOGICAL DATA ANALYSIS


LECTURE NOTES
PAUL WESSEL

SECTION 3
LINEAR (MATRIX) ALGEBRA
OVERVIEW OF MATRIX ALGEBRA
(or All you ever wanted to know about Linear Algebra but were afraid to ask...)

A matrix is simply a rectangular array of "elements" arranged in a series of m rows and n columns. The order of a matrix is the specification of the number of rows by the number of columns. Elements of a matrix are given as $a_{ij}$, where the value of i specifies the row position and the value of j specifies the column position, so that $a_{ij}$ identifies the element at position (i, j). An element can be a number (real or complex), an algebraic expression, or (with some restrictions) a matrix or matrix expression. For example:
$$\mathbf{A} = \begin{bmatrix} 12 & 4 & 10 \\ 8 & 1 & 11 \\ 15 & 3 & 11 \\ 14 & 1 & 11 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{bmatrix}$$

This matrix, $\mathbf{A}$, has order 4 x 3, with elements $a_{23} = 11$, $a_{13} = 10$, etc. The notation for matrices is not always consistent but is usually one of the following schemes:

matrix - designated by a bold letter (most common), a capital letter, or a letter with an underscore, brackets, or hat (^). The order is also sometimes given: $\mathbf{A}_{(4,3)}$ means $\mathbf{A}$ is 4 x 3.

order - always given as row x column, but the letters n, m, p are used differently: n(row) x m(column) or m(row) x n(column).

element - most commonly $a_{ij}$ with i = row, j = column (sometimes k, l, p).

The advantages of matrix algebra lie mainly in the fact that it provides a concise and simple method for manipulating large sets of numbers or computations, making it ideal for computers. Also, (1) the compact form of matrices allows convenient notation for describing large tables of data; (2) the operations allow complex relationships to be seen which would otherwise be obscured by the sheer size of the data (i.e., it aids clarification); and (3) most matrix manipulation involves just a few standard operations for which standard subroutines are readily available.

As a convention with data matrices (i.e., where the elements represent data values), the columns usually represent the different variables (e.g., one column contains temperatures, another salinity, etc.) while the rows contain the samples (e.g., the values of the variables at each depth or time). Since there are usually more samples than variables, such data matrices are usually rectangular, having more rows (m) than columns (n): order m x n with m > n.
A column vector is a matrix containing only a single column of elements:

$$\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}$$

A row vector is a matrix containing only a single row of elements:

$$\mathbf{a}^T = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}$$

The size of a vector is simply the number of elements it contains ($= n$ in both examples above).
The null matrix, written as $\mathbf{0}$ or $\mathbf{0}_{(m,n)}$, has all elements equal to 0; it plays the role of zero in matrix algebra. A square matrix has the same number of rows as columns, so its order is n x n.
A diagonal matrix is a square matrix with zero in all positions except along the principal (or leading) diagonal:

$$\mathbf{D} = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 6 \end{bmatrix}$$

or

$$d_{ij} = \begin{cases} 0 & \text{for } i \neq j \\ \text{non-zero} & \text{for } i = j \end{cases}$$

This type of matrix is important for scaling rows or columns of other matrices. The identity matrix ($\mathbf{I}$) is a diagonal matrix with all of the nonzero elements equal to one. Written as $\mathbf{I}$ or $\mathbf{I}_n$, it plays the role of 1 in matrix algebra ($\mathbf{A}\cdot\mathbf{I} = \mathbf{I}\cdot\mathbf{A} = \mathbf{A}$). A lower triangular matrix ($\mathbf{L}$) is a square matrix with all elements equal to zero above the principal diagonal:

$$\mathbf{L} = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 7 & 0 \\ 8 & 2 & 6 \end{bmatrix}$$

or

$$\ell_{ij} = \begin{cases} 0 & \text{for } i < j \\ \text{non-zero} & \text{for } i \geq j \end{cases}$$

An upper triangular matrix is a square matrix with all elements equal to zero below the principal diagonal:

$$u_{ij} = \begin{cases} 0 & \text{for } i > j \\ \text{non-zero} & \text{for } i \leq j \end{cases}$$

(If one multiplies two triangular matrices of the same form, the result is a third matrix of the same form.)
We also have the fully populated matrix, which is a matrix with all of its elements nonzero; the sparse matrix, which is a matrix with only a small proportion of its elements nonzero; and the scalar, which is simply a number (i.e., a matrix with a single element).

The matrix transpose (or transpose of a matrix) is obtained by interchanging the rows and columns of a matrix: row i becomes column i and column j becomes row j (the order of the matrix is also reversed).

$$\mathbf{A} = \begin{bmatrix} 1 & 14 \\ 6 & 7 \\ 8 & 2 \end{bmatrix}; \quad \mathbf{A}^T = \begin{bmatrix} 1 & 6 & 8 \\ 14 & 7 & 2 \end{bmatrix}$$

A diagonal matrix is its own transpose: $\mathbf{D}^T = \mathbf{D}$. In general, we find

$$a_{ij} \Leftrightarrow a_{ji}$$
A symmetric matrix is a square matrix which is symmetrical about its principal diagonal, so $a_{ij} = a_{ji}$. Therefore a symmetric matrix is equal to its own transpose:

$$\mathbf{A} = \mathbf{A}^T = \begin{bmatrix} 1 & 2 & 5 \\ 2 & 6 & 3 \\ 5 & 3 & 4 \end{bmatrix} = \text{symmetric}$$
A skew symmetric matrix is a matrix in which

$$a_{ij} = -a_{ji}$$

so that

$$\mathbf{A}^T = -\mathbf{A}, \qquad a_{ii} = 0 \text{ (principal diagonal elements are zero)}$$

$$\mathbf{A} = \begin{bmatrix} 0 & 4 & -5 \\ -4 & 0 & 3 \\ 5 & -3 & 0 \end{bmatrix} = \text{skew symmetric}$$

Any square matrix can be split into the sum of a symmetric and a skew symmetric matrix:

$$\mathbf{A} = \tfrac{1}{2}\left(\mathbf{A} + \mathbf{A}^T\right) + \tfrac{1}{2}\left(\mathbf{A} - \mathbf{A}^T\right)$$
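As a quick check of this decomposition, here is a minimal NumPy sketch (the example matrix is arbitrary, chosen only for illustration):

```python
import numpy as np

A = np.array([[1., 4., 2.],
              [0., 3., 5.],
              [6., 1., 2.]])   # arbitrary square matrix

S = 0.5 * (A + A.T)   # symmetric part: S == S.T
K = 0.5 * (A - A.T)   # skew symmetric part: K == -K.T, zero diagonal

assert np.allclose(S, S.T) and np.allclose(K, -K.T)
assert np.allclose(A, S + K)   # the two parts sum back to A
```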
Basic Matrix Operations:

Matrix addition and subtraction require matrices of the same order, since these operations simply involve addition or subtraction of corresponding elements. So, if $\mathbf{A} + \mathbf{B} = \mathbf{C}$,

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}; \quad \mathbf{B} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix}; \quad \mathbf{C} = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} \\ a_{21}+b_{21} & a_{22}+b_{22} \\ a_{31}+b_{31} & a_{32}+b_{32} \end{bmatrix}$$

and

(1) $\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}$

(2) $(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})$

where all matrices are of the same order. Scalar multiplication of a matrix is multiplying a matrix by a constant (scalar):

$$\lambda\mathbf{A} = \lambda\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} = \begin{bmatrix} \lambda a_{11} & \lambda a_{12} \\ \lambda a_{21} & \lambda a_{22} \\ \lambda a_{31} & \lambda a_{32} \end{bmatrix}$$

where $\lambda$ is a scalar: every element is multiplied by the scalar. The scalar product (or dot product or inner product) is the product of two vectors of the same size:

$$\mathbf{a}\cdot\mathbf{b} = \alpha$$

where $\mathbf{a}$ is a row vector (or the transpose of a column vector) of length n, $\mathbf{b}$ is a column vector (or the transpose of a row vector), also of length n, and $\alpha$ is the scalar product of $\mathbf{a}\cdot\mathbf{b}$. Then:

$$\mathbf{a} = \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$

and

$$\alpha = a_1 b_1 + a_2 b_2 + a_3 b_3$$
Some people like to visualize this multiplication in "box form", with the row vector $\mathbf{a}$ laid out against the column vector $\mathbf{b}$ so that corresponding elements pair up and their products are summed:

Fig. 3-1. Dot product of two vectors.

Conceptually, this product can be thought of as multiplying the length of one vector by the component of the other vector which is parallel to the first:

Fig. 3-2. Graphical meaning of the dot product of two vectors.

Think of $\mathbf{b}$ as a force and $|\mathbf{a}|$ as the magnitude of displacement, which is equal to work in the direction of $\mathbf{a}$. Thus:

$$\mathbf{a}\cdot\mathbf{b} = |\mathbf{a}||\mathbf{b}|\cos(\theta)$$

where

$$|\mathbf{x}| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$
The maximum principle says that the unit vector $\mathbf{n}$ making $\mathbf{a}\cdot\mathbf{n}$ a maximum is the unit vector pointing in the same direction as $\mathbf{a}$: if $\mathbf{n} \parallel \mathbf{a}$ then $\cos(\theta) = \cos(0°) = 1$ and $\mathbf{a}\cdot\mathbf{n} = |\mathbf{a}||\mathbf{n}|\cos(\theta) = |\mathbf{a}||\mathbf{n}| = |\mathbf{a}|$. This is equally true where $\mathbf{d}$ is any vector of a given magnitude: the vector $\mathbf{n}$ which parallels $\mathbf{d}$ will give the largest scalar product.

Parallel vectors thus have $\cos(\theta) = 1$, so $\mathbf{a}\cdot\mathbf{b} = |\mathbf{a}||\mathbf{b}|$ and $\mathbf{a} = \lambda\mathbf{b}$ (i.e., two vectors are parallel if one is simply a scalar multiple of the other; this property comes from equating direction cosines), where

$$\lambda = \frac{|\mathbf{a}|}{|\mathbf{b}|}$$
Perpendicular vectors have $\cos(\theta) = \cos 90° = 0$, so $\mathbf{a}\cdot\mathbf{b} = 0$ where $\mathbf{a} \perp \mathbf{b}$. Squaring vectors is simply

$$\mathbf{a}^2 = \mathbf{a}\cdot\mathbf{a}^T \text{ for row vectors}; \qquad \mathbf{a}^2 = \mathbf{a}^T\cdot\mathbf{a} \text{ for column vectors}$$
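A minimal NumPy sketch of the scalar product and the angle it implies (the vector b is read off Fig. 3-1 as best the garbled layout allows, so treat the numbers as illustrative):

```python
import numpy as np

a = np.array([2., 1., 4., 5.])
b = np.array([1., 3., 4., 2.])   # assumed values for illustration

alpha = a @ b                    # scalar product
cos_theta = alpha / (np.linalg.norm(a) * np.linalg.norm(b))
theta_deg = np.degrees(np.arccos(cos_theta))
print(alpha, theta_deg)          # a.b and the angle between a and b
```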
Matrix multiplication requires "conformable" matrices: there must be as many columns in the first as there are rows in the second:

$$\mathbf{C}_{(m,n)} = \mathbf{A}_{(m,p)} \cdot \mathbf{B}_{(p,n)}$$

So, the product matrix $\mathbf{C}$ is of order m x n and has elements $c_{ij}$:

$$c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}$$

This is an extension of the scalar product: each element of $\mathbf{C}$ is the scalar product of a row vector in $\mathbf{A}$ and a column vector in $\mathbf{B}$.

$$\begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix}$$

$$c_{12} = a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32}$$


In "box form":
GG313 GEOLOGICAL D ATA ANALYSIS 3–6

Elements of C are scalar p


products of corresponding B
vectors as shown here.

m A m C

p n
Fig. 3-3. The matrix product of two matrices.

The order of multiplication is important; in general

$$\mathbf{A}\cdot\mathbf{B} \neq \mathbf{B}\cdot\mathbf{A}$$

and unless $\mathbf{A}$ and $\mathbf{B}$ are square matrices, or the order of $\mathbf{A}^T$ is the same as the order of $\mathbf{B}$ (or vice versa), one of the two products cannot even be formed. Order is specified by stating:

$\mathbf{A}$ is pre-multiplied by $\mathbf{B}$ (for $\mathbf{B}\cdot\mathbf{A}$)

$\mathbf{A}$ is post-multiplied by $\mathbf{B}$ (for $\mathbf{A}\cdot\mathbf{B}$)

Multiple products:

$$\mathbf{D} = \mathbf{A}\cdot\mathbf{B}\cdot\mathbf{C} = (\mathbf{A}\cdot\mathbf{B})\cdot\mathbf{C} = \mathbf{A}\cdot(\mathbf{B}\cdot\mathbf{C})$$

(The order in which the pairs are multiplied is not important mathematically.)

Computational considerations:

$$\mathbf{C}_{(m,n)} = \mathbf{A}_{(m,p)} \cdot \mathbf{B}_{(p,n)}$$

involves $m \times n \times p$ multiplications and $m \times n \times (p-1)$ additions, so

$$\mathbf{E}_{(m,n)} = \left[\mathbf{A}_{(m,p)} \cdot \mathbf{B}_{(p,q)}\right] \cdot \mathbf{C}_{(q,n)}$$

gives $m \times p \times q$ multiplications for $\mathbf{D}_{(m,q)} = \mathbf{A}\cdot\mathbf{B}$ and then $m \times q \times n$ multiplications for $\mathbf{D}_{(m,q)}\cdot\mathbf{C}_{(q,n)}$, while

$$\mathbf{E}_{(m,n)} = \mathbf{A}_{(m,p)} \cdot \left[\mathbf{B}_{(p,q)} \cdot \mathbf{C}_{(q,n)}\right]$$

gives $p \times q \times n$ multiplications for $\mathbf{D}_{(p,n)} = \mathbf{B}\cdot\mathbf{C}$ and then $m \times p \times n$ multiplications for $\mathbf{A}_{(m,p)}\cdot\mathbf{D}_{(p,n)}$.

Therefore:

1) $(\mathbf{A}\cdot\mathbf{B})\cdot\mathbf{C} \Rightarrow mq(p+n)$ total multiplications

2) $\mathbf{A}\cdot(\mathbf{B}\cdot\mathbf{C}) \Rightarrow pn(m+q)$ total multiplications

If both $\mathbf{A}$ and $\mathbf{B}$ are 100 x 100 matrices and $\mathbf{C}$ is 100 x 1, then m = 100, p = 100, q = 100, and n = 1. Multiplying using form 1) involves $\sim 1 \times 10^6$ multiplications, whereas form 2) involves $2 \times 10^4$; so computing $\mathbf{B}\cdot\mathbf{C}$ first, then pre-multiplying by $\mathbf{A}$, saves almost a million multiplications and almost an equal number of additions in this example. Order is therefore extremely important computationally, for both speed and accuracy (more operations lead to a greater accumulation of round-off errors).
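The operation-count argument is easy to confirm empirically; a small sketch (timings will vary by machine):

```python
import time
import numpy as np

A = np.random.rand(100, 100)
B = np.random.rand(100, 100)
C = np.random.rand(100, 1)

t0 = time.perf_counter()
x1 = (A @ B) @ C          # ~1e6 multiplications
t1 = time.perf_counter()
x2 = A @ (B @ C)          # ~2e4 multiplications
t2 = time.perf_counter()

print(t1 - t0, t2 - t1)   # same answer, very different cost
assert np.allclose(x1, x2)
```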

The transpose of a matrix product is simply multiplication by the transposes of the individual matrices in reverse order:

$$\mathbf{D} = \mathbf{A}\cdot\mathbf{B}\cdot\mathbf{C}$$
$$\mathbf{D}^T = \mathbf{C}^T\cdot\mathbf{B}^T\cdot\mathbf{A}^T$$

Multiplication by $\mathbf{I}$ leaves the matrix unchanged ($\mathbf{A}\cdot\mathbf{I} = \mathbf{I}\cdot\mathbf{A} = \mathbf{A}$):

$$\begin{bmatrix} 3 & 6 & 9 \\ 2 & 8 & 7 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 6 & 9 \\ 2 & 8 & 7 \end{bmatrix}$$

Pre-multiplication by a diagonal matrix: $\mathbf{C} = \mathbf{D}\cdot\mathbf{A}$, where $\mathbf{D}$ is a diagonal matrix, gives the $\mathbf{A}$ matrix with each row scaled by a diagonal element of $\mathbf{D}$:

$$\mathbf{D} = \begin{bmatrix} d_{11} & & \\ & d_{22} & \\ & & d_{33} \end{bmatrix}; \quad \mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

$$\mathbf{C} = \begin{bmatrix} a_{11}d_{11} & a_{12}d_{11} & a_{13}d_{11} \\ a_{21}d_{22} & a_{22}d_{22} & a_{23}d_{22} \\ a_{31}d_{33} & a_{32}d_{33} & a_{33}d_{33} \end{bmatrix} \quad \begin{matrix} \leftarrow \text{each element} \times d_{11} \\ \leftarrow \text{each element} \times d_{22} \\ \leftarrow \text{each element} \times d_{33} \end{matrix}$$

Post-multiplication by a diagonal matrix, $\mathbf{C} = \mathbf{A}\cdot\mathbf{D}$, produces a matrix in which each column has been scaled by a diagonal element:

$$\mathbf{C} = \begin{bmatrix} a_{11}d_{11} & a_{12}d_{22} & a_{13}d_{33} \\ a_{21}d_{11} & a_{22}d_{22} & a_{23}d_{33} \\ a_{31}d_{11} & a_{32}d_{22} & a_{33}d_{33} \end{bmatrix}$$

where each column in $\mathbf{A}$ has been scaled by the corresponding diagonal matrix element, $d_{ii}$.
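Both scaling rules are one-liners in NumPy; a minimal sketch with an arbitrary 3 x 3 matrix:

```python
import numpy as np

A = np.arange(1., 10.).reshape(3, 3)   # arbitrary 3 x 3 matrix
D = np.diag([2., 5., 10.])

rows_scaled = D @ A    # pre-multiplication: row i of A scaled by d_ii
cols_scaled = A @ D    # post-multiplication: column j of A scaled by d_jj
```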

The determinant of a matrix is a single number representing a property of a square matrix (dependent upon what the matrix represents). The main use here is for finding the inverse of a matrix or solving simultaneous equations. Symbolically, the determinant is usually given as det $\mathbf{A}$, $|\mathbf{A}|$, or $||\mathbf{A}||$ (to differentiate it from a magnitude). Calculation of a 2 x 2 determinant is given by

$$|\mathbf{A}| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}$$

This is the difference of the cross products. The calculation of an n x n determinant is given by

$$|\mathbf{A}| = a_{11}m_{11} - a_{12}m_{12} + a_{13}m_{13} - \cdots + (-1)^{n+1} a_{1n}m_{1n}$$

where $m_{11}$ is the determinant with the first row and column missing; $m_{12}$ is the determinant with the first row and second column missing; etc. (The determinant of a 1 x 1 matrix is just the particular element.) An example of a 3 x 3 determinant follows.
$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

Striking out the first row and, in turn, the first, second, and third columns gives the minors

$$m_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = a_{22}a_{33} - a_{23}a_{32}$$

$$m_{12} = \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} = a_{21}a_{33} - a_{23}a_{31}$$

$$m_{13} = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} = a_{21}a_{32} - a_{22}a_{31}$$

So

$$|\mathbf{A}| = a_{11}m_{11} - a_{12}m_{12} + a_{13}m_{13} = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31})$$
For a 4 x 4 determinant, each $m_{1i}$ would be an entire expansion as given above for the 3 x 3 determinant—one quickly needs a computer.
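In practice one calls a library routine; e.g., NumPy's determinant (computed via LU factorization rather than cofactor expansion). A sketch using the first three rows of the earlier example matrix:

```python
import numpy as np

A = np.array([[12., 4., 10.],
              [ 8., 1., 11.],
              [15., 3., 11.]])
print(np.linalg.det(A))   # a single number; nonzero here, so A is nonsingular
```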

A singular matrix is a square matrix whose determinant is zero. A determinant is zero if:

(1) any row or column is zero;

(2) any row or column is equal to a linear combination of the other rows or columns. For example:

$$\mathbf{A} = \begin{bmatrix} 1 & 6 & 4 \\ 2 & 1 & 0 \\ 5 & -3 & -4 \end{bmatrix}$$

where row 1 = 3(row 2) - row 3.

$$|\mathbf{A}| = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31})$$
$$= 1[1(-4) - 0(-3)] - 6[2(-4) - 0(5)] + 4[2(-3) - 1(5)] = -4 + 48 - 44 = 0$$

The degree of clustering symmetrically about the principal diagonal is another (of many) properties of a determinant: the more the clustering, the higher the value of the determinant.

Matrix division can be thought of as multiplying by the inverse. Consider scalar division:

$$\frac{x}{b} = x\,\frac{1}{b} = xb^{-1}$$

which we can write because

$$bb^{-1} = 1$$

Matrices can be effectively divided by multiplying by the inverse matrix. A nonsingular square matrix may have an inverse symbolized as $\mathbf{A}^{-1}$, with $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$. The calculation of a matrix inverse is usually done using elimination methods on the computer. For a simple 2 x 2 matrix, the inverse is given by

$$\mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|}\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}$$

An example follows.

$$\mathbf{A} = \begin{bmatrix} 7 & 2 \\ 10 & 3 \end{bmatrix}; \quad \mathbf{A}^{-1} = \frac{1}{21 - 20}\begin{bmatrix} 3 & -2 \\ -10 & 7 \end{bmatrix} = \begin{bmatrix} 3 & -2 \\ -10 & 7 \end{bmatrix}$$

$$\mathbf{A}\mathbf{A}^{-1} = \begin{bmatrix} 7 & 2 \\ 10 & 3 \end{bmatrix}\begin{bmatrix} 3 & -2 \\ -10 & 7 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \mathbf{I}$$

Solution of Simultaneous Equations

A system of n simultaneous equations in n unknowns:

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{14}x_4 &= b_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + a_{24}x_4 &= b_2 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + a_{34}x_4 &= b_3 \\
a_{41}x_1 + a_{42}x_2 + a_{43}x_3 + a_{44}x_4 &= b_4
\end{aligned}$$

can be written as

$$\mathbf{A}\mathbf{x} = \mathbf{b}$$

where

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} = \text{coefficient matrix}$$
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}$$

Then

$$\mathbf{A}^{-1}\mathbf{A}\mathbf{x} = \mathbf{A}^{-1}\mathbf{b} \quad \text{(pre-multiplying both sides by } \mathbf{A}^{-1}\text{)}$$

so

$$\mathbf{I}\mathbf{x} = \mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$$

gives the solution for the values of $x_1, x_2, x_3, x_4$ which solve the system. The following example solves 2 simultaneous equations. Consider 2 equations in 2 unknowns (e.g., equations of lines in the x-y plane):

$$\begin{aligned} 5x_1 + 7x_2 &= 19 \\ 3x_1 - 2x_2 &= -1 \end{aligned}$$

In matrix form this translates to

$$\begin{bmatrix} 5 & 7 \\ 3 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 19 \\ -1 \end{bmatrix}$$
$$\mathbf{A}\cdot\mathbf{x} = \mathbf{b}$$
To solve this system, we need the inverse of $\mathbf{A}$:

$$\mathbf{A}^{-1} = \frac{1}{-10 - 21}\begin{bmatrix} -2 & -7 \\ -3 & 5 \end{bmatrix} = \begin{bmatrix} \frac{2}{31} & \frac{7}{31} \\ \frac{3}{31} & -\frac{5}{31} \end{bmatrix}$$

Then $\mathbf{x} = \mathbf{A}^{-1}\cdot\mathbf{b}$:

$$\mathbf{A}^{-1}\mathbf{b} = \begin{bmatrix} \frac{2}{31} & \frac{7}{31} \\ \frac{3}{31} & -\frac{5}{31} \end{bmatrix}\begin{bmatrix} 19 \\ -1 \end{bmatrix} = \begin{bmatrix} \frac{38}{31} - \frac{7}{31} \\ \frac{57}{31} + \frac{5}{31} \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

So the values $x_1 = 1$ and $x_2 = 2$ solve the above system, or

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$
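On a computer one would solve this system with a library call rather than by forming the inverse explicitly; a minimal NumPy sketch:

```python
import numpy as np

A = np.array([[5., 7.],
              [3., -2.]])
b = np.array([19., -1.])

x = np.linalg.solve(A, b)   # solves A x = b without forming A^-1
print(x)                    # [1. 2.]
```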
Computational considerations

While this approach may seem burdensome, it is extremely general and allows simple handling and a straightforward solution of very large systems. However, direct (elimination) methods of solution are in fact quicker for fully populated matrices:

1) The inverse-matrix approach involves $n^3$ multiplications for the inversion and $n^2 m$ more multiplications to finish the solution, where n is the number of equations per set and m is the number of sets of equations (each of the same form but with a different $\mathbf{b}$ matrix). The total number of multiplications is $n^3 + n^2 m$.

2) Directly solving the equations involves $n^3/3 + n^2 m$ multiplications.

So, while the matrix form is easy to handle, one should not necessarily always use it blindly. We will consider many situations for which matrix solutions are ideal. For sparse or symmetrical matrices, the above relationships may not hold.

The rank of a matrix is the number of linearly independent vectors it contains (either row or column vectors):

$$\mathbf{A} = \begin{bmatrix} 1 & 4 & 0 & 2 \\ 1 & 0 & 1 & -1 \\ -3 & -4 & -2 & 0 \end{bmatrix}$$

Since row 3 = -(row 1) - 2(row 2), or col 3 = col 1 - 1/4(col 2) and col 4 = -(col 1) + 3/4(col 2), the matrix $\mathbf{A}$ has rank 2 (i.e., it has only 2 linearly independent vectors, whether viewed by rows or by columns). The rank of a matrix product must be less than or equal to the smallest rank of the matrices being multiplied:

$$\mathbf{A}\,(\text{rank } 2) \cdot \mathbf{B}\,(\text{rank } 1) = \mathbf{C}\,(\text{rank } 1)$$

Therefore (from another angle), if a matrix has rank r, then any matrix factor of it must have rank of at least r. Since the rank cannot be greater than the smaller of m or n in an m x n matrix, this definition also limits the size (order) of factor matrices. (That is, one cannot factor a matrix of rank 2 into 2 matrices of which either is of less than rank 2, so m and n of each factor must also be ≥ 2.)
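The rank of the example above can be confirmed numerically; a minimal sketch:

```python
import numpy as np

A = np.array([[ 1.,  4.,  0.,  2.],
              [ 1.,  0.,  1., -1.],
              [-3., -4., -2.,  0.]])
print(np.linalg.matrix_rank(A))   # 2: only two linearly independent rows/columns
```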

The trace of a square matrix is simply the sum of the elements along the principal diagonal. It is symbolized as tr $\mathbf{A}$. This property is useful in calculating various quantities from matrices.

Submatrices are smaller matrix partitions of a larger supermatrix:

$$\mathbf{F} = \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix} = \text{supermatrix}$$

Such partitioning will frequently be useful. Other useful matrix properties:


1. $\left(\mathbf{A}^T\right)^T = \mathbf{A}$
2. $\left(\mathbf{A}^{-1}\right)^{-1} = \mathbf{A}$
3. $\left(\mathbf{A}^{-1}\right)^T = \left(\mathbf{A}^T\right)^{-1} = \mathbf{A}^{-T}$
4. If $\mathbf{D} = \mathbf{A}\mathbf{B}\mathbf{C}$, then $\mathbf{D}^{-1} = \mathbf{C}^{-1}\mathbf{B}^{-1}\mathbf{A}^{-1}$; recall that for $\mathbf{D} = \mathbf{A}\mathbf{B}\mathbf{C}$, $\mathbf{D}^T = \mathbf{C}^T\mathbf{B}^T\mathbf{A}^T$.

This "reversal rule" for inverse products may be useful for eliminating or minimizing the number of matrix inverses requiring calculation.
We will look at a few examples of matrix manipulations. For the data matrix $\mathbf{A}$:

$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$

and unit row vector $\mathbf{j}$:

$$\mathbf{j} = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}$$

(1) Compute the mean of each column vector in $\mathbf{A}$ (each column has length n = 3):

$$\bar{\mathbf{x}}_c = \frac{1}{3}\,\mathbf{j}\mathbf{A} = \mathbf{j}_n\mathbf{A}, \qquad \mathbf{j}_n = \begin{bmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}$$

Then

$$\bar{\mathbf{x}}_c = \begin{bmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} = \begin{bmatrix} 4 & 5 & 6 \end{bmatrix}$$

(2) Compute the mean of each row vector in $\mathbf{A}$:

$$\bar{\mathbf{x}}_r = \frac{1}{3}\,\mathbf{A}\mathbf{j}^T = \mathbf{A}\mathbf{j}_n^T$$

Then

$$\bar{\mathbf{x}}_r = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}\begin{bmatrix} \frac{1}{3} \\ \frac{1}{3} \\ \frac{1}{3} \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \\ 8 \end{bmatrix}$$
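Both averaging operations translate directly into NumPy; a minimal sketch:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
j = np.ones((1, 3))          # unit row vector

col_means = (j @ A) / 3      # [[4. 5. 6.]]
row_means = (A @ j.T) / 3    # [[2.] [5.] [8.]]
```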

Last time we looked at the matrix equation

$$\mathbf{A}\mathbf{x} = \mathbf{b}$$

and we found that the solution could be written

$$\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$$
For a moment, let us just consider the left-hand side, $\mathbf{A}\cdot\mathbf{x}$. For any $\mathbf{x}$, this product gives a new vector $\mathbf{y}$; we can say that $\mathbf{x}$ is transformed to give $\mathbf{y}$. This is a linear transformation, since there are only linear terms in the matrix multiplication, i.e., the vector $\mathbf{y}$ is

$$\mathbf{y} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 \end{bmatrix}$$

Thus we call the operation $T(\mathbf{x}) = \mathbf{A}\cdot\mathbf{x}$ a linear transformation. If we stick to three or fewer dimensions, it is possible to visualize vectors and operations on them graphically. Figure 3-4 shows an arbitrary vector $\mathbf{x}$ and the result $\mathbf{y}$ of the linear transformation $\mathbf{y} = \mathbf{A}\cdot\mathbf{x}$.

Fig. 3-4. The vector x is transformed into another vector y using a linear transformation.

Obviously, as we pick another $\mathbf{x}$, we get another $\mathbf{y}$. We might want to know if there are certain vectors that, when operated on, return a vector in the same direction, possibly longer or shorter than the original $\mathbf{x}$. In other words, is there an $\mathbf{x}$ that satisfies

$$\mathbf{A}\mathbf{x} = \lambda\mathbf{x} \tag{3.1}$$

We call $\lambda$ the eigenvalue and $\mathbf{x}$ the eigenvector. We can rewrite this as

$$\begin{aligned}
\mathbf{A}\mathbf{x} - \lambda\mathbf{x} &= 0 \\
\mathbf{A}\mathbf{x} - \lambda\mathbf{I}\mathbf{x} &= 0 \\
(\mathbf{A} - \lambda\mathbf{I})\,\mathbf{x} &= 0
\end{aligned}$$

or

$$\mathbf{B}\mathbf{x} = \mathbf{0} = \mathbf{n}$$

In general, the solution to this equation would be $\mathbf{x} = \mathbf{B}^{-1}\mathbf{n}$

which, with $\mathbf{n} = \mathbf{0}$, only returns the trivial solution $\mathbf{x} = [0\;0\;0]^T$ as long as $\mathbf{B}^{-1}$ exists. So, apart from the trivial solution, the answer is given when

$$|\mathbf{B}| = 0$$
We know the determinant of $\mathbf{B}$ is

$$|\mathbf{B}| = \begin{vmatrix} a_{11}-\lambda & a_{12} & a_{13} \\ a_{21} & a_{22}-\lambda & a_{23} \\ a_{31} & a_{32} & a_{33}-\lambda \end{vmatrix} = 0$$

Writing out the determinant and setting it to zero gives a polynomial in $\lambda$ of order n. For n = 3 this will in general give a cubic equation; for n = 2 a quadratic equation must be solved. The solutions $\lambda_1, \lambda_2, \ldots$ are called the eigenvalues of $\mathbf{A}$, and the equation $|\mathbf{B}| = 0$ is called the characteristic equation. For example, given

$$\mathbf{A} = \begin{bmatrix} 17 & -6 \\ 45 & -16 \end{bmatrix}$$

let us find its eigenvalues. We set

$$|\mathbf{A} - \lambda\mathbf{I}| = \begin{vmatrix} 17-\lambda & -6 \\ 45 & -16-\lambda \end{vmatrix} = 0$$

or

$$\lambda^2 - 272 - 17\lambda + 16\lambda + 270 = \lambda^2 - \lambda - 2 = 0$$

We easily solve for $\lambda$:

$$\lambda = \frac{1 \pm \sqrt{1 - 4(-2)}}{2} = 2, -1$$
So the eigenvalues are $\lambda_1 = 2$, $\lambda_2 = -1$. We now know what $\lambda$ must be for (3.1) to be satisfied, but what about the vectors $\mathbf{x}$? We still haven't found what they must be, but we can substitute the values for $\lambda$ into (3.1). Using $\lambda = 2$ first, we find

$$\mathbf{A}\mathbf{x} = 2\mathbf{x}$$
$$(\mathbf{A} - 2\mathbf{I})\,\mathbf{x} = 0$$

or

$$\begin{bmatrix} 15 & -6 \\ 45 & -18 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

That is,

$$\begin{aligned} 15x_1 - 6x_2 &= 0 \\ 45x_1 - 18x_2 &= 0 \end{aligned}$$

which both give

$$x_1 = \frac{2}{5}x_2$$

So

$$\mathbf{x} = t\begin{bmatrix} 2 \\ 5 \end{bmatrix}$$
where t is any scalar. Similarly, for $\lambda = -1$, we find $(\mathbf{A} + \mathbf{I})\cdot\mathbf{x} = 0$:

$$\begin{bmatrix} 18 & -6 \\ 45 & -15 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

which reduces to

$$3x_1 - x_2 = 0$$

which gives

$$\mathbf{x} = t\begin{bmatrix} 1 \\ 3 \end{bmatrix}$$
It may happen that the characteristic equation gives solutions that are imaginary. However, if the matrix is symmetric it will always yield real eigenvalues, and as long as the matrix $\mathbf{A}$ is not singular, all the $\lambda$ will be non-zero and the corresponding eigenvectors will be orthogonal. The technique we've used applies to matrices of any size n x n, but finding the roots of large polynomials is painful. Usually, the $\lambda$ are found by matrix manipulations that involve successive approximations to the $\mathbf{x}$; this is of course only practical on a computer. If we restrict our attention to 2-D geometry, certain properties of eigenvalues and eigenvectors become clearer.
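For instance, NumPy's eig routine reproduces the hand-worked example above; a minimal sketch:

```python
import numpy as np

A = np.array([[17., -6.],
              [45., -16.]])
vals, vecs = np.linalg.eig(A)
print(vals)   # [2. -1.] (possibly in a different order)
# the columns of vecs are unit eigenvectors, parallel to [2, 5] and [1, 3]
```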
Consider the matrix $\mathbf{A}$:

$$\mathbf{A} = \begin{bmatrix} 4 & 8 \\ 8 & 4 \end{bmatrix}$$

Fig. 3-5. Graphical representation of the two vectors (4, 8) and (8, 4) in the x-y plane.

We can regard the matrix as two row vectors [4 8] and [8 4]. Let us find the eigenvalues and eigenvectors of $\mathbf{A}$:

$$\begin{vmatrix} 4-\lambda & 8 \\ 8 & 4-\lambda \end{vmatrix} = 0$$

$$16 - 8\lambda + \lambda^2 - 64 = 0$$
$$\lambda^2 - 8\lambda - 48 = 0$$

$$\lambda = \frac{8 \pm \sqrt{64 + 4\cdot 48}}{2} = 12, -4$$
The eigenvectors are:

$$\begin{bmatrix} 4-12 & 8 \\ 8 & 4-12 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -8 & 8 \\ 8 & -8 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

$$-8x_1 + 8x_2 = 0 \;\Rightarrow\; x_1 = x_2 \;\Rightarrow\; \mathbf{e}_1^T = \begin{bmatrix} 1 & 1 \end{bmatrix}$$

$$\begin{bmatrix} 4+4 & 8 \\ 8 & 4+4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 8 & 8 \\ 8 & 8 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

$$8x_1 + 8x_2 = 0 \;\Rightarrow\; x_1 = -x_2 \;\Rightarrow\; \mathbf{e}_2^T = \begin{bmatrix} -1 & 1 \end{bmatrix}$$

We find that the eigenvectors define the minor and major axes of the ellipse which goes through the two points defined by (4, 8) and (8, 4). The lengths of these axes are given by the absolute values of the eigenvalues, $\lambda_1 = 12$ and $|\lambda_2| = 4$.

Fig. 3-6. The eigenvectors $\mathbf{e}_1$ and $\mathbf{e}_2$, scaled by the eigenvalues ($\lambda_1 = 12$, $|\lambda_2| = 4$), can be seen to represent the major and minor axes of the ellipse that goes through the two data vectors (8, 4) and (4, 8).

It is customary to normalize the eigenvectors so that their length is unity. In our case we find

$$\mathbf{e}_1^T = \begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \end{bmatrix} \quad \text{and} \quad \mathbf{e}_2^T = \begin{bmatrix} -\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \end{bmatrix}$$

The axes of the ellipse are then simply

$$\mathbf{v}_1 = \lambda_1\mathbf{e}_1$$
$$\mathbf{v}_2 = \lambda_2\mathbf{e}_2$$

Since the sign of an eigenvector is indeterminate, we choose to make all eigenvalues positive and thus place the minus sign in $\lambda_2$ inside $\mathbf{e}_2$. You'll notice that $\mathbf{v}_1\cdot\mathbf{v}_2 = 0$, i.e., they are orthogonal. The eigenvectors make up the columns in a new matrix $\mathbf{V}$:

$$\mathbf{V} = \begin{bmatrix} \mathbf{e}_1 & \mathbf{e}_2 \end{bmatrix} = \frac{\sqrt{2}}{2}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$$

Let us expand the eigenvalue equation (3.1), $\mathbf{A}\cdot\mathbf{x} = \lambda\mathbf{x}$, to a full matrix equation. We have

$$\mathbf{A}\mathbf{e}_1 = \lambda_1\mathbf{e}_1$$
$$\mathbf{A}\mathbf{e}_2 = \lambda_2\mathbf{e}_2$$

We can combine these two matrix equations into one:

$$\mathbf{A}\cdot\mathbf{V} = \mathbf{V}\cdot\mathbf{\Lambda}$$

where

$$\mathbf{\Lambda} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$$

From this we may learn two things. 1) Post-multiplying by $\mathbf{V}^{-1}$:

$$\mathbf{A}\mathbf{V}\mathbf{V}^{-1} = \mathbf{A} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1}$$

The eigenvalue-eigenvector operation lets us split a symmetric matrix $\mathbf{A}$ into a diagonal matrix $\mathbf{\Lambda}$ (with the eigenvalues along the diagonal) and the matrix $\mathbf{V}$ (with the eigenvectors as columns) and its inverse $\mathbf{V}^{-1}$.

2) Pre-multiplying by $\mathbf{V}^{-1}$:

$$\mathbf{\Lambda} = \mathbf{V}^{-1}\mathbf{A}\mathbf{V}$$

This operation transforms the $\mathbf{A}$ matrix into a diagonal matrix $\mathbf{\Lambda}$. It corresponds to a rotation of the coordinate axes in which the eigenvectors in $\mathbf{V}$ become the new coordinate axes. Relative to the new coordinates, $\mathbf{\Lambda}$ conveys the same information as $\mathbf{A}$ does in the old coordinates. Because $\mathbf{\Lambda}$ is a simple diagonal matrix, the rotation (transformation) makes the relationships between rows and columns in $\mathbf{A}$ much clearer.
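These two relations are easy to verify numerically for the matrix above; a minimal sketch:

```python
import numpy as np

A = np.array([[4., 8.],
              [8., 4.]])
vals, V = np.linalg.eig(A)       # eigenvalues and eigenvector columns
Lam = np.diag(vals)              # Lambda

assert np.allclose(A @ V, V @ Lam)                  # A V = V Lambda
assert np.allclose(np.linalg.inv(V) @ A @ V, Lam)   # Lambda = V^-1 A V
```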

Simple Regression and Curve Fitting

Whereas an interpolant fits each data point exactly, it is frequently advantageous to produce a smoothed fit to the data: not exactly fitting each point, but producing a "best" fit. A popular (and convenient) method for producing such fits is known as the method of least squares.

The method of least squares fits a specified (usually continuous) basis to a set of data points so as to minimize the sum of the squared mismatch (error) between the fitted curve and the data. The error can be measured as in Figure 3-7. This regression of y on x is the most common method. Less common methods (more work involved) are regression of x on y and orthogonal regression (which we will return to later).
Fig. 3-7. Graphical representation of the regression errors used in least-squares procedures.

Fig. 3-8. Two other regression methods: regressing x on y and orthogonal regression.

Least squares simple example

Consider fitting a single "best" linear slope to n data points. This can be a scatter plot of y(t) versus x(t) plotted at similar values of t, or a simple f(x) relationship. At any rate, y is considered a function of x. We wish to fit a line of the form

$$y = a_1 + a_2(x - x_0) \tag{3.2}$$

and must therefore determine values for $a_1$ and $a_2$ which produce a line that minimizes the sum of the squared errors ($x_0$ is specified beforehand). So we

$$\text{minimize } \sum_{i=1}^{n}\left(y_{\text{computed}} - y_{\text{observed}}\right)^2$$

Ideally, for each observation $y_i$ at $x_i$ we should have

$$\begin{aligned}
a_1 + a_2(x_1 - x_0) &= y_1 \\
a_1 + a_2(x_2 - x_0) &= y_2 \\
a_1 + a_2(x_3 - x_0) &= y_3 \\
&\vdots \\
a_1 + a_2(x_n - x_0) &= y_n
\end{aligned}$$

There are many more equations (n, one for each observed value of y) than unknowns (2: $a_1$ and $a_2$). Such a system is over-determined and there exists no unique solution (unless all the $y_i$'s do lie exactly on a single line, in which case any two equations will uniquely determine $a_1$ and $a_2$). In matrix form (i.e., $\mathbf{A}\mathbf{x} = \mathbf{b}$):

$$\begin{bmatrix} 1 & (x_1 - x_0) \\ 1 & (x_2 - x_0) \\ \vdots & \vdots \\ 1 & (x_n - x_0) \end{bmatrix}\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \tag{3.3}$$

Since $\mathbf{A}$ is a non-square matrix it has no inverse, so the equation cannot be inverted and solved. Consider instead the error in the fit at each point:

$$\begin{aligned}
a_1 + a_2(x_1 - x_0) - y_1 &= e_1 \\
a_1 + a_2(x_2 - x_0) - y_2 &= e_2 \\
&\vdots \\
a_1 + a_2(x_n - x_0) - y_n &= e_n
\end{aligned}$$

We wish to determine the values for $a_1$ and $a_2$ that minimize

$$\sum_{i=1}^{n} e_i^2$$

This will minimize the variance of the residuals about the regression line and give the least-squares fit. Notation:

$$S = E(a_1, a_2) = \sum_{i=1}^{n} e_i^2 \quad (= \mathbf{e}^T\mathbf{e} \text{ in matrix form}) \tag{3.4}$$

So $E(a_1, a_2)$ is a function of the two unknown coefficients, and its minimum can be determined using simple differential calculus, where

$$\frac{\partial E(a_1,a_2)}{\partial a_1} = \frac{\partial E(a_1,a_2)}{\partial a_2} = 0 \quad \text{(at the minimum)}$$

$$\frac{\partial E}{\partial a_1} = \frac{\partial}{\partial a_1}\sum_{i=1}^{n} e_i^2 = \frac{\partial}{\partial a_1}\sum_{i=1}^{n}\left[a_1 + a_2(x_i - x_0) - y_i\right]^2 = 2\sum_{i=1}^{n}\left[a_1 + a_2(x_i - x_0) - y_i\right] = 0$$

$$\frac{\partial E}{\partial a_2} = \frac{\partial}{\partial a_2}\sum_{i=1}^{n} e_i^2 = \frac{\partial}{\partial a_2}\sum_{i=1}^{n}\left[a_1 + a_2(x_i - x_0) - y_i\right]^2 = 2\sum_{i=1}^{n}\left[a_1 + a_2(x_i - x_0) - y_i\right](x_i - x_0) = 0$$

These two equations can now be expanded into their individual terms, forming what are known as the normal equations:

$$\sum_{i=1}^{n} a_1 + \sum_{i=1}^{n} a_2(x_i - x_0) = \sum_{i=1}^{n} y_i$$

$$a_1\sum_{i=1}^{n}(x_i - x_0) + \sum_{i=1}^{n} a_2(x_i - x_0)^2 = \sum_{i=1}^{n} y_i(x_i - x_0)$$

The normal equations thus provide a system of 2 equations in 2 unknowns which can be uniquely solved. Rearranging,

$$na_1 + a_2\sum_{i=1}^{n}(x_i - x_0) = \sum_{i=1}^{n} y_i$$

$$a_1\sum_{i=1}^{n}(x_i - x_0) + a_2\sum_{i=1}^{n}(x_i - x_0)^2 = \sum_{i=1}^{n} y_i(x_i - x_0)$$

Notice that all sums are sums of known values that reduce to simple constants. Solving the first equation for $a_1$:

$$a_1 = \frac{1}{n}\sum_{i=1}^{n} y_i - \frac{a_2}{n}\sum_{i=1}^{n}(x_i - x_0)$$

Substitute this into the second equation:

$$\left[\frac{1}{n}\sum_{i=1}^{n} y_i - \frac{a_2}{n}\sum_{i=1}^{n}(x_i - x_0)\right]\sum_{i=1}^{n}(x_i - x_0) + a_2\sum_{i=1}^{n}(x_i - x_0)^2 = \sum_{i=1}^{n} y_i(x_i - x_0)$$

Now solve for $a_2$:

$$a_2\left[\sum_{i=1}^{n}(x_i - x_0)^2 - \frac{1}{n}\left(\sum_{i=1}^{n}(x_i - x_0)\right)^2\right] = \sum_{i=1}^{n} y_i(x_i - x_0) - \frac{1}{n}\sum_{i=1}^{n} y_i\sum_{i=1}^{n}(x_i - x_0)$$

Finally,

$$a_2 = \frac{\displaystyle\sum_{i=1}^{n} y_i(x_i - x_0) - \frac{1}{n}\sum_{i=1}^{n} y_i\sum_{i=1}^{n}(x_i - x_0)}{\displaystyle\sum_{i=1}^{n}(x_i - x_0)^2 - \frac{1}{n}\left(\sum_{i=1}^{n}(x_i - x_0)\right)^2}$$

Substitute $a_2$ into the first equation to solve for $a_1$. So, we compute the sums

$$\sum_{i=1}^{n} y_i, \quad \sum_{i=1}^{n} y_i(x_i - x_0), \quad \sum_{i=1}^{n}(x_i - x_0)^2, \quad \text{and} \quad \sum_{i=1}^{n}(x_i - x_0)$$

and substitute into the above equations to give the $a_1$ and $a_2$ producing the best fit. In matrix form the normal equations are:

$$\begin{bmatrix} n & \displaystyle\sum_{i=1}^{n}(x_i - x_0) \\ \displaystyle\sum_{i=1}^{n}(x_i - x_0) & \displaystyle\sum_{i=1}^{n}(x_i - x_0)^2 \end{bmatrix}\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \displaystyle\sum_{i=1}^{n} y_i \\ \displaystyle\sum_{i=1}^{n} y_i(x_i - x_0) \end{bmatrix} \tag{3.5}$$
So $\mathbf{N}\mathbf{x} = \mathbf{B}$, i.e., of the form $\mathbf{A}\mathbf{x} = \mathbf{b}$. Since $\mathbf{N}$ is square and of full rank, this equation is solved in the standard manner:

$$\mathbf{N}^{-1}\mathbf{N}\mathbf{x} = \mathbf{N}^{-1}\mathbf{B}$$

or

$$\mathbf{x} = \mathbf{N}^{-1}\mathbf{B}$$

This problem was simple enough (2 x 2) to solve by brute force for $a_1$ and $a_2$. For larger systems this becomes impractical, and a matrix solution to the non-square $\mathbf{A}\cdot\mathbf{x} = \mathbf{b}$ equation must be sought. We will next look at the general linear least-squares problem and find a solution in matrix notation.
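A short sketch of the 2 x 2 normal-equations solution for the straight-line fit (the data are synthetic, generated here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0., 10., 20)                       # synthetic data
y = 3.0 + 0.5 * x + 0.1 * rng.standard_normal(20)  # true line plus noise
x0 = 0.0                                           # specified beforehand

dx = x - x0
N = np.array([[x.size,   dx.sum()],
              [dx.sum(), (dx**2).sum()]])          # eq. (3.5), left side
B = np.array([y.sum(), (y * dx).sum()])            # eq. (3.5), right side

a1, a2 = np.linalg.solve(N, B)                     # intercept and slope
```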

General Linear Least Squares

We have looked at a few special cases where we sought to fit a "model" to "data" in a least-squares sense. Fitting a straight line to the x-y points was a very simple example of this technique. We will now look at the more general problem of finding the coefficients for any linear combination of basis functions that fits some data in a least-squares sense. There are numerous situations where this is needed:

Situation            Model                                         Data
Curve fitting        Coefficients of polynomials, sin/cos, etc.    Points in x-y plane
Gravity modeling     Densities of subsurface polygons              Gravity observations
Hypocenter location  Small perturbations to hypocenter position    Seismic arrival times

While the basis functions in these cases are all vastly different, they are all used in a linear combination to fit the observed data. We will therefore take time to investigate how such a problem is set up, and how it can all be simplified with matrix algebra.

General (linear) least squares.

Consider the least-squares fitting of any continuous basis of the form

$$x_1, x_2, x_3, \ldots, x_m$$

For example:

polynomial basis: $x_1 = x^0,\; x_2 = x^1,\; x_3 = x^2,\; \ldots,\; x_m = x^{m-1}$

Fourier sine basis: $x_1 = \sin(2\pi x/T),\; x_2 = \sin(4\pi x/T),\; x_3 = \sin(6\pi x/T),\; \ldots,\; x_m = \sin(2m\pi x/T)$

We desire to fit an equation of the form

$$y = a_1 x_1 + a_2 x_2 + \cdots + a_m x_m$$

to a data set of n data points, where n > m, by minimizing E:

$$E = \sum_{i=1}^{n}(e_i)^2 = \sum_{i=1}^{n}\left(a_1 x_{i1} + a_2 x_{i2} + \cdots + a_m x_{im} - y_i\right)^2 \tag{3.6}$$

where $y_i$ is the observed value. We can write this explicitly:

$$\begin{aligned}
a_1 x_{11} + a_2 x_{12} + \cdots + a_m x_{1m} - y_1 &= e_1 \\
a_1 x_{21} + a_2 x_{22} + \cdots + a_m x_{2m} - y_2 &= e_2 \\
&\vdots \\
a_1 x_{n1} + a_2 x_{n2} + \cdots + a_m x_{nm} - y_n &= e_n
\end{aligned}$$
where $x_{ij}$ is the j'th x function of the basis, evaluated at the value $x_i$. To minimize E, we set

$$\frac{\partial E(a_j)}{\partial a_j} = 0 \tag{3.7}$$

So

$$\frac{\partial E(a_1)}{\partial a_1} = \frac{\partial}{\partial a_1}\sum_{i=1}^{n}\left(a_1 x_{i1} + a_2 x_{i2} + \cdots + a_m x_{im} - y_i\right)^2 = 2\sum_{i=1}^{n}\left(a_1 x_{i1} + a_2 x_{i2} + \cdots + a_m x_{im} - y_i\right)x_{i1} = 0$$

$$\frac{\partial E(a_2)}{\partial a_2} = \frac{\partial}{\partial a_2}\sum_{i=1}^{n}\left(a_1 x_{i1} + a_2 x_{i2} + \cdots + a_m x_{im} - y_i\right)^2 = 2\sum_{i=1}^{n}\left(a_1 x_{i1} + a_2 x_{i2} + \cdots + a_m x_{im} - y_i\right)x_{i2} = 0$$

$$\vdots$$

$$\frac{\partial E(a_m)}{\partial a_m} = \frac{\partial}{\partial a_m}\sum_{i=1}^{n}\left(a_1 x_{i1} + a_2 x_{i2} + \cdots + a_m x_{im} - y_i\right)^2 = 2\sum_{i=1}^{n}\left(a_1 x_{i1} + a_2 x_{i2} + \cdots + a_m x_{im} - y_i\right)x_{im} = 0$$

or, for each j,

$$\frac{\partial E(a_j)}{\partial a_j} = 2\sum_{i=1}^{n}\left(a_1 x_{i1} + a_2 x_{i2} + \cdots + a_m x_{im} - y_i\right)x_{ij} = 0$$

Rearranging these normal equations gives

$$\begin{aligned}
a_1\sum_{i=1}^{n} x_{i1}^2 + a_2\sum_{i=1}^{n} x_{i2}x_{i1} + \cdots + a_m\sum_{i=1}^{n} x_{im}x_{i1} &= \sum_{i=1}^{n} y_i x_{i1} \\
a_1\sum_{i=1}^{n} x_{i1}x_{i2} + a_2\sum_{i=1}^{n} x_{i2}^2 + \cdots + a_m\sum_{i=1}^{n} x_{im}x_{i2} &= \sum_{i=1}^{n} y_i x_{i2} \\
&\vdots \\
a_1\sum_{i=1}^{n} x_{i1}x_{im} + a_2\sum_{i=1}^{n} x_{i2}x_{im} + \cdots + a_m\sum_{i=1}^{n} x_{im}^2 &= \sum_{i=1}^{n} y_i x_{im}
\end{aligned}$$

or (for each j):

$$a_1\sum_{i=1}^{n} x_{i1}x_{ij} + a_2\sum_{i=1}^{n} x_{i2}x_{ij} + \cdots + a_m\sum_{i=1}^{n} x_{im}x_{ij} = \sum_{i=1}^{n} y_i x_{ij}$$

This provides a closed system of m normal equations, one for each coefficient, i.e.,

$$\frac{\partial E(a_j)}{\partial a_j} = 0 \quad \text{for } j = 1, 2, \ldots, m.$$
In matrix form:

$$\begin{bmatrix}
\sum x_{i1}^2 & \sum x_{i2}x_{i1} & \cdots & \sum x_{im}x_{i1} \\
\sum x_{i1}x_{i2} & \sum x_{i2}^2 & \cdots & \sum x_{im}x_{i2} \\
\vdots & \vdots & & \vdots \\
\sum x_{i1}x_{im} & \sum x_{i2}x_{im} & \cdots & \sum x_{im}^2
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} =
\begin{bmatrix} \sum y_i x_{i1} \\ \sum y_i x_{i2} \\ \vdots \\ \sum y_i x_{im} \end{bmatrix}$$

(all sums running over $i = 1, \ldots, n$), or simply

$$\mathbf{N}\cdot\mathbf{x} = \mathbf{B}$$

where $\mathbf{N}$ is the (known) coefficient matrix, $\mathbf{x}$ the vector with the unknowns ($a_j$), and $\mathbf{B}$ contains weighted sums of known (observed) quantities. Solve for the $a_j$ in the $\mathbf{x}$ vector ($\mathbf{N}$ is square and of full rank):

$$\mathbf{N}^{-1}\cdot\mathbf{N}\cdot\mathbf{x} = \mathbf{N}^{-1}\cdot\mathbf{B}$$
$$\mathbf{x} = \mathbf{N}^{-1}\cdot\mathbf{B}$$

where $\mathbf{x}$ is the solution for the $a_j$. The resulting $a_j$ values are the ones which satisfy (3.7) and are therefore the same ones which, in combination with the chosen basis, produce the "best" fit to the n data points such that (3.6) is minimized.

Last time we found that the solution to the general linear least-squares problem led us to the matrix form of the normal equations shown above, or simply

$$\mathbf{N}\cdot\mathbf{x} = \mathbf{B} \tag{3.8}$$

where $\mathbf{N}$ is the (known) coefficient matrix, $\mathbf{x}$ the vector with the unknowns ($a_j$), and $\mathbf{B}$ contains weighted sums of known (observed) quantities. Solving for the $a_j$ in the $\mathbf{x}$ vector ($\mathbf{N}$ is square and of full rank) gives

$$\mathbf{x} = \mathbf{N}^{-1}\cdot\mathbf{B} \tag{3.9}$$

where $\mathbf{x}$ holds the $a_j$ that, in combination with the chosen basis, produce the "best" fit to the n data points such that (3.6) is minimized.
We will now look at a simpler approach to the same problem using matrix algebra. We have

$$\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1m} \\
x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nm}
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} -
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$$

$$\mathbf{A}\cdot\mathbf{x} - \mathbf{b} = \mathbf{e}$$

Since m < n, the $\mathbf{A}$ matrix is rectangular. This can be written

$$\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1m} & -y_1 \\
x_{21} & x_{22} & \cdots & x_{2m} & -y_2 \\
\vdots & \vdots & & \vdots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nm} & -y_n
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \\ 1 \end{bmatrix} =
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$$

$$\mathbf{C}\cdot\mathbf{X} = \mathbf{e}$$

These matrices describe the system listed earlier. We wish to find the $a_j$ values which minimize $E = \mathbf{e}^T\mathbf{e}$. The matrix form can be partitioned as

$$\begin{bmatrix} \mathbf{A} & -\mathbf{b} \end{bmatrix}\begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix} = \mathbf{A}\cdot\mathbf{x} + (-\mathbf{b}\cdot 1) = \mathbf{A}\cdot\mathbf{x} - \mathbf{b} = \mathbf{e}$$
(recall C is an n x m+l rectangular matrix). The i'th error, ei , is the dot product:
a1 xi1
a2 xi2
C i ⋅ X = e i = x i1 x i 2 x im –yi = a1 a2 am 1
am x im
1 –yi

where Ci is the i'th row vector in C. The squared i'th error, ei 2 is then
e Ti ⋅ e i = C i ⋅ X ⋅ C i ⋅ X = X T⋅ C Ti ⋅ C i ⋅ X
T

or
x i1 a1
x i2 a2
e Ti ⋅ e i = a 1 a 2 am 1 x i1 x i2 x im –yi
x im am
–yi 1

where we have used the reversal rule for transposed products. The sum of the $e_i^2$ over all the i's is thus

$$\mathbf{e}^T\cdot\mathbf{e} = \mathbf{X}^T\cdot\mathbf{C}^T\cdot\mathbf{C}\cdot\mathbf{X} = \sum_{i=1}^{n}\mathbf{X}^T\cdot\mathbf{C}_i^T\cdot\mathbf{C}_i\cdot\mathbf{X}$$

The product $\mathbf{C}^T\mathbf{C}$ can be computed to form a new matrix $\mathbf{R}$. Since $\mathbf{C}^T_{(m+1,n)}\mathbf{C}_{(n,m+1)} = \mathbf{R}_{(m+1,m+1)}$, the resulting $\mathbf{R}$ matrix is square and symmetric. So,

$$E = \mathbf{e}^T\cdot\mathbf{e} = \mathbf{X}^T\cdot\mathbf{C}^T\cdot\mathbf{C}\cdot\mathbf{X} = \mathbf{X}^T\cdot\mathbf{R}\cdot\mathbf{X}$$
To minimize E, we find

$$\frac{\partial E(a_j)}{\partial a_j} = \frac{\partial}{\partial a_j}\left(\mathbf{X}^T\cdot\mathbf{R}\cdot\mathbf{X}\right) = \frac{\partial\mathbf{X}^T}{\partial a_j}\,\mathbf{R}\cdot\mathbf{X} + \mathbf{X}^T\cdot\mathbf{R}\,\frac{\partial\mathbf{X}}{\partial a_j}$$

For the 2nd coefficient (as an example), we get

$$\frac{\partial\mathbf{X}^T}{\partial a_2} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \end{bmatrix} = \left(\frac{\partial\mathbf{X}}{\partial a_2}\right)^T$$

Thus, the partial derivative of the error is

$$\frac{\partial E}{\partial a_2} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \end{bmatrix}\mathbf{R}\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \\ 1 \end{bmatrix} + \begin{bmatrix} a_1 & a_2 & \cdots & a_m & 1 \end{bmatrix}\mathbf{R}\begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \\ 0 \end{bmatrix}$$

For all coefficients we set all the partial derivatives to zero:

$$\frac{\partial E(a_j)}{\partial a_j} = 0 = \frac{\partial\mathbf{X}^T}{\partial a_j}\cdot\mathbf{R}\cdot\mathbf{X} + \mathbf{X}^T\cdot\mathbf{R}\cdot\frac{\partial\mathbf{X}}{\partial a_j}$$

where, stacking the derivatives for all j = 1, ..., m as rows,

$$\frac{\partial\mathbf{X}^T}{\partial a_j} = \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} = \begin{bmatrix} \mathbf{I}_m & \mathbf{O}_m \end{bmatrix}$$

where $\mathbf{I}_m$ is the m x m identity matrix and $\mathbf{O}_m$ the null (column) vector of length m. Consider first $\mathbf{R}$ ($= \mathbf{C}^T\mathbf{C}$):

$$\mathbf{R} = \mathbf{C}^T\mathbf{C} = \begin{bmatrix}
\sum x_{i1}^2 & \sum x_{i2}x_{i1} & \cdots & \sum x_{im}x_{i1} & -\sum y_i x_{i1} \\
\sum x_{i1}x_{i2} & \sum x_{i2}^2 & \cdots & \sum x_{im}x_{i2} & -\sum y_i x_{i2} \\
\vdots & \vdots & & \vdots & \vdots \\
\sum x_{i1}x_{im} & \sum x_{i2}x_{im} & \cdots & \sum x_{im}^2 & -\sum y_i x_{im} \\
-\sum y_i x_{i1} & -\sum y_i x_{i2} & \cdots & -\sum y_i x_{im} & \sum y_i^2
\end{bmatrix}$$
The $\mathbf{R}$ matrix should look familiar. Consider the partitioned matrix multiplication:

$$\mathbf{R} = \mathbf{C}^T\mathbf{C} = \begin{bmatrix} \mathbf{A}^T \\ -\mathbf{b}^T \end{bmatrix}\begin{bmatrix} \mathbf{A} & -\mathbf{b} \end{bmatrix} = \begin{bmatrix} \mathbf{A}^T\mathbf{A} & -\mathbf{A}^T\mathbf{b} \\ -\mathbf{b}^T\mathbf{A} & \mathbf{b}^T\mathbf{b} \end{bmatrix}$$

where $\mathbf{A}^T\mathbf{A}$ is m x m. Notice that $\mathbf{A}^T\mathbf{A}$ is the matrix $\mathbf{N}$ of the normal equations and $\mathbf{A}^T\mathbf{b} = (\mathbf{b}^T\mathbf{A})^T$ equals the matrix $\mathbf{B}$.
Because $\mathbf{R}$ is symmetric (so $\mathbf{R} = \mathbf{R}^T$) we have

$$\mathbf{X}^T\cdot\mathbf{R} = \left(\mathbf{R}\cdot\mathbf{X}\right)^T$$

so the two terms in the derivative above are transposes of each other, and

$$\frac{\partial E(a_j)}{\partial a_j} = 2\,\frac{\partial\mathbf{X}^T}{\partial a_j}\,\mathbf{R}\,\mathbf{X} = 0 \quad\Rightarrow\quad \frac{\partial\mathbf{X}^T}{\partial a_j}\,\mathbf{R}\,\mathbf{X} = 0$$
Consider $\frac{\partial\mathbf{X}^T}{\partial a_j}\cdot\mathbf{R}$: the last column of the derivative matrix $[\mathbf{I}_m\;\mathbf{O}_m]$ always multiplies the bottom row of $\mathbf{R}$ by zero, so

$$\begin{bmatrix} \mathbf{I}_m & \mathbf{O}_m \end{bmatrix}\begin{bmatrix} \mathbf{A}^T\mathbf{A} & -\mathbf{A}^T\mathbf{b} \\ -\mathbf{b}^T\mathbf{A} & \mathbf{b}^T\mathbf{b} \end{bmatrix} = \begin{bmatrix} \mathbf{A}^T\mathbf{A} & -\mathbf{A}^T\mathbf{b} \end{bmatrix} = \begin{bmatrix} \mathbf{N} & -\mathbf{B} \end{bmatrix}$$
And $\frac{\partial\mathbf{X}^T}{\partial a_j}\cdot\mathbf{R}\cdot\mathbf{X}$:

$$\begin{bmatrix} \mathbf{A}^T\mathbf{A} & -\mathbf{A}^T\mathbf{b} \end{bmatrix}\begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix} = \mathbf{A}^T\mathbf{A}\,\mathbf{x} - \mathbf{A}^T\mathbf{b} = \mathbf{N}\mathbf{x} - \mathbf{B}$$
Finally, since

$$\mathbf{A}^T\mathbf{A}\,\mathbf{x} - \mathbf{A}^T\mathbf{b} = 0$$

we have

$$\mathbf{A}^T\mathbf{A}\,\mathbf{x} = \mathbf{A}^T\mathbf{b}$$
$$\left(\mathbf{A}^T\mathbf{A}\right)^{-1}\left(\mathbf{A}^T\mathbf{A}\right)\mathbf{x} = \left(\mathbf{A}^T\mathbf{A}\right)^{-1}\mathbf{A}^T\mathbf{b}$$
$$\mathbf{x} = \left(\mathbf{A}^T\mathbf{A}\right)^{-1}\mathbf{A}^T\mathbf{b} \tag{3.10}$$

or

$$\mathbf{x} = \mathbf{N}^{-1}\mathbf{B}$$
as before. Therefore, the unknown values of the $a_j$ (in the $\mathbf{x}$ vector) can be solved for directly from the system

$$\begin{aligned}
a_1 x_{11} + a_2 x_{12} + \cdots + a_m x_{1m} &= y_1 \\
a_1 x_{21} + a_2 x_{22} + \cdots + a_m x_{2m} &= y_2 \\
&\vdots \\
a_1 x_{n1} + a_2 x_{n2} + \cdots + a_m x_{nm} &= y_n
\end{aligned}$$

or simply

$$\mathbf{A}\cdot\mathbf{x} = \mathbf{b}$$

where $\mathbf{A}$ is of order n x m with n > m. The least-squares solution then becomes

$$\mathbf{x} = \left(\mathbf{A}^T\cdot\mathbf{A}\right)^{-1}\mathbf{A}^T\cdot\mathbf{b}$$

where $\mathbf{A}^T\mathbf{A}$ is a square matrix of full rank (order r = m), and thus invertible.
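As a sketch of (3.10) in action, here is a least-squares fit of a quadratic basis [1, x, x²] to synthetic data (values assumed purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1., 1., 50)
y = 2.0 - x + 0.5 * x**2 + 0.05 * rng.standard_normal(50)

A = np.column_stack([np.ones_like(x), x, x**2])   # n x m design matrix
coef = np.linalg.solve(A.T @ A, A.T @ y)          # x = (A^T A)^-1 A^T b
# np.linalg.lstsq(A, y, rcond=None) solves the same problem more stably
```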

Fitting a straight line, revisited

We will again consider the best-fitting line problem, this time with errors $\sigma_i$ in the y-values. We want to measure how well the model agrees with the data, and for this purpose will use the $\chi^2$ function, i.e.

$$\chi^2(a,b) = \sum_{i=1}^{n}\left(\frac{y_i - a - bx_i}{\sigma_i}\right)^2 \tag{3.11}$$

Minimizing $\chi^2$ will give the best weighted least-squares solution. Again we set the partial derivatives to 0:

$$\frac{\partial\chi^2}{\partial a} = 0 = -2\sum_{i=1}^{n}\frac{y_i - a - bx_i}{\sigma_i^2}, \qquad \frac{\partial\chi^2}{\partial b} = 0 = -2\sum_{i=1}^{n}\frac{\left(y_i - a - bx_i\right)x_i}{\sigma_i^2} \tag{3.12}$$

Let us define the following:

$$S = \sum\frac{1}{\sigma_i^2}, \quad S_x = \sum\frac{x_i}{\sigma_i^2}, \quad S_y = \sum\frac{y_i}{\sigma_i^2}, \quad S_{xx} = \sum\frac{x_i^2}{\sigma_i^2}, \quad S_{xy} = \sum\frac{x_i y_i}{\sigma_i^2}$$

Then (3.12) reduces to

$$\begin{aligned} aS + bS_x &= S_y \\ aS_x + bS_{xx} &= S_{xy} \end{aligned}$$

With

$$\Delta = SS_{xx} - S_x^2$$

we find

$$a = \frac{S_{xx}S_y - S_x S_{xy}}{\Delta}, \qquad b = \frac{SS_{xy} - S_x S_y}{\Delta} \tag{3.13}$$

All this is swell, but we must also estimate the uncertainties in a and b: for the same $\sigma_i$ we may get large differences in the errors in a and b. Although not shown here, consideration of propagation of errors shows that the variance $\sigma_f^2$ in the value of any function f is

$$\sigma_f^2 = \sum\sigma_i^2\left(\frac{\partial f}{\partial y_i}\right)^2 \tag{3.14}$$

For our line we can directly find the derivatives of a and b with respect to $y_i$ from (3.13):

Figure 3-9. The uncertainty in the line fit depends to a large extent on the distribution of the x-positions.

$$\frac{\partial a}{\partial y_i} = \frac{S_{xx} - S_x x_i}{\sigma_i^2\,\Delta}$$

$$\frac{\partial b}{\partial y_i} = \frac{Sx_i - S_x}{\sigma_i^2\,\Delta}$$

Inserting this result into (3.14) then gives

$$\sigma_a^2 = \sum\sigma_i^2\left(\frac{S_{xx} - S_x x_i}{\sigma_i^2\,\Delta}\right)^2 = \frac{1}{\Delta^2}\sum\frac{S_{xx}^2 - 2S_{xx}S_x x_i + S_x^2 x_i^2}{\sigma_i^2}$$
$$= \frac{S_{xx}^2 S - 2S_{xx}S_x S_x + S_x^2 S_{xx}}{\Delta^2} = \frac{S_{xx}\left(SS_{xx} - S_x^2\right)}{\Delta^2} = \frac{S_{xx}}{\Delta} \tag{3.15}$$

and

$$\sigma_b^2 = \sum\sigma_i^2\left(\frac{Sx_i - S_x}{\sigma_i^2\,\Delta}\right)^2 = \frac{1}{\Delta^2}\sum\frac{S^2 x_i^2 - 2SS_x x_i + S_x^2}{\sigma_i^2}$$
$$= \frac{S^2 S_{xx} - 2SS_x S_x + S_x^2 S}{\Delta^2} = \frac{S\left(SS_{xx} - S_x^2\right)}{\Delta^2} = \frac{S}{\Delta} \tag{3.16}$$

Similarly, we can find the covariance $\sigma_{ab}$ from

$$\sigma_{ab} = \sum\sigma_i^2\,\frac{\partial a}{\partial y_i}\,\frac{\partial b}{\partial y_i} = \frac{-S_x}{\Delta}$$

Then the correlation coefficient becomes

$$r = \frac{-S_x}{\sqrt{SS_{xx}}} \tag{3.17}$$

It is therefore useful to shift the origin to $\bar{x}$, where r = 0.


Finally, we must check whether the fit is meaningful. We use the computed $\chi^2$ value and obtain the critical $\chi^2_\alpha$ for n - 2 degrees of freedom, which we use to see if $\chi^2$ exceeds this value. If it does not, we may say the fit is significant at the $\alpha$ level.
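The S-sums, the fitted coefficients (3.13), their variances (3.15) and (3.16), and the χ² statistic all fit in a few lines; a sketch with assumed example data and uncertainties:

```python
import numpy as np

x = np.array([0., 1., 2., 3., 4.])              # assumed example data
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
sig = np.array([0.2, 0.2, 0.3, 0.2, 0.4])       # one sigma_i per point

w = 1.0 / sig**2
S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
Sxx, Sxy = (w * x**2).sum(), (w * x * y).sum()
Delta = S * Sxx - Sx**2

a = (Sxx * Sy - Sx * Sxy) / Delta               # intercept, eq. (3.13)
b = (S * Sxy - Sx * Sy) / Delta                 # slope
var_a, var_b = Sxx / Delta, S / Delta           # eqs. (3.15) and (3.16)
chi2 = (((y - a - b * x) / sig) ** 2).sum()     # compare to chi^2_alpha, n-2 dof
```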

What if some data constraints are more reliable than others? We may give those residuals more weight than the others:

$$\mathbf{e} = \begin{bmatrix} e_1 \\ 2e_2 \\ \vdots \\ e_n \end{bmatrix}$$

In general, we can use weights $w_i$ for each error so that the new error is $e_i' = e_i\,w_i$. We do this by introducing a weight matrix $\mathbf{w}$, which is a diagonal matrix:

$$\mathbf{w} = \begin{bmatrix} w_1 & & & \\ & w_2 & & \\ & & \ddots & \\ & & & w_n \end{bmatrix}$$

Now the sum of the squared weighted errors, E, becomes

$$E = \mathbf{e}'^T\cdot\mathbf{e}' = \mathbf{e}^T\cdot\mathbf{w}^T\cdot\mathbf{w}\cdot\mathbf{e} = \mathbf{e}^T\cdot\mathbf{W}\cdot\mathbf{e}$$

where we have introduced $\mathbf{W} = \mathbf{w}^T\mathbf{w}$. Since we have $\mathbf{w}\,\mathbf{e} = \mathbf{w}(\mathbf{A}\mathbf{x} - \mathbf{b})$, we obtain

$$E = (\mathbf{w}\mathbf{A}\mathbf{x} - \mathbf{w}\mathbf{b})^T(\mathbf{w}\mathbf{A}\mathbf{x} - \mathbf{w}\mathbf{b}) = \left(\mathbf{x}^T\mathbf{A}^T\mathbf{w}^T - \mathbf{b}^T\mathbf{w}^T\right)(\mathbf{w}\mathbf{A}\mathbf{x} - \mathbf{w}\mathbf{b})$$
$$= \mathbf{x}^T\mathbf{A}^T\mathbf{w}^T\mathbf{w}\mathbf{A}\mathbf{x} - \mathbf{x}^T\mathbf{A}^T\mathbf{w}^T\mathbf{w}\mathbf{b} - \mathbf{b}^T\mathbf{w}^T\mathbf{w}\mathbf{A}\mathbf{x} + \mathbf{b}^T\mathbf{w}^T\mathbf{w}\mathbf{b}$$

We substitute $\mathbf{W}$ for $\mathbf{w}^T\mathbf{w}$. Then
$$\frac{\partial E}{\partial a_j} = 0 = \frac{\partial\mathbf{x}^T}{\partial a_j}\,\mathbf{A}^T\mathbf{W}\mathbf{A}\,\mathbf{x} + \mathbf{x}^T\mathbf{A}^T\mathbf{W}\mathbf{A}\,\frac{\partial\mathbf{x}}{\partial a_j} - \frac{\partial\mathbf{x}^T}{\partial a_j}\,\mathbf{A}^T\mathbf{W}\mathbf{b} - \mathbf{b}^T\mathbf{W}\mathbf{A}\,\frac{\partial\mathbf{x}}{\partial a_j}$$

Since $\mathbf{x}$ only contains the $a_j$, we have $\partial\mathbf{x}^T/\partial a_j = (\partial\mathbf{x}/\partial a_j)^T = \mathbf{I}$ (taking all j together). We find

$$\mathbf{A}^T\mathbf{W}\mathbf{A}\,\mathbf{x} + \mathbf{x}^T\mathbf{A}^T\mathbf{W}\mathbf{A} - \mathbf{A}^T\mathbf{W}\mathbf{b} - \mathbf{b}^T\mathbf{W}\mathbf{A} = 0$$

Again, the 2nd and 4th terms are the transposes of the 1st and 3rd. Because each term represents a symmetric matrix, our equation reduces to

$$2\mathbf{A}^T\mathbf{W}\mathbf{A}\,\mathbf{x} - 2\mathbf{A}^T\mathbf{W}\mathbf{b} = 0$$

which gives us the weighted linear least-squares solution

$$\mathbf{x} = \left(\mathbf{A}^T\mathbf{W}\mathbf{A}\right)^{-1}\mathbf{A}^T\mathbf{W}\mathbf{b} \tag{3.18}$$
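The same weighted fit in matrix form, eq. (3.18); a minimal sketch reusing the example data above:

```python
import numpy as np

x = np.array([0., 1., 2., 3., 4.])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
sig = np.array([0.2, 0.2, 0.3, 0.2, 0.4])

A = np.column_stack([np.ones_like(x), x])          # basis [1, x]
W = np.diag(1.0 / sig**2)                          # W = w^T w

coef = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)   # eq. (3.18)
```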
