Mathematics for Seismic Data Processing and Interpretation
Foreword
by
R. L. French
Racal Geophysics Limited
Introduction
by
M. Bacon
Shell
First published in 1984 by
Graham & Trotman
This publication is protected by international copyright law. All rights reserved. No part
of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means, electronic, mechanical, photo-copying, recording or otherwise,
without the prior permission of the publishers.
FOREWORD
PREFACE
INTRODUCTION
1 Functions
2 Polynomials and Step Functions
3 Trigonometric Functions
4 Power and Exponential Functions
5 Inverse Functions
6 New Functions from Old
7 Numbers
1 Introduction
2 Higher Derivatives
3 Maxima and Minima
4 Taylor Series and Approximations
5 Partial Derivatives
6 Higher Order Partial Derivatives
7 Optimisation
Chapter 3 INTEGRATION
4 Double Integration
5 Line Integrals
6 Differential Equations
1 Introduction
2 The Beginning
3 Functions of Complex Variables
4 Differentiation and Integration
Chapter 5 MATRICES
1 Introduction
2 Definitions and Elementary Properties
3 Matrices
4 Multiplication of Matrices
5 Special Types of Matrices
6 Matrices as Functions
7 Linear Equations
8 Eigenvalues and Quadratic Forms
1 Introduction
2 Probability
3 Permutations and Combinations
4 Probability Distributions
5 Joint Distributions
6 Expected Values and Moments
7 Real Data Samples
8 Two Variables
9 Simulation and Monte Carlo Methods
10 Confidence Intervals
11 Stochastic Processes
1 Introduction
2 Fourier Series
3 Some Examples of Fourier Analysis
4 The Phase, Amplitude and Exponential Formulation
5 Fourier Transform
6 The z-Transform
1 Wavelets
2 Predictive Deconvolution
INDEX
FOREWORD
It was also clear that there were no really suitable courses in the UK to
teach the relevant mathematics (and the related physics and computing).
This is not meant as a criticism of the various courses currently available,
it is purely a comment on their usefulness to our specific needs. Most of
the instruction offered in seismics is of two varieties, either of an acquaint-
ance type or of an advanced nature to already practising experts neither of
which include the fundamentals which were required. The length and size
of these courses usually also preclude a high level of direct one-to-one
instructor-student contact. None of this was what we at Racal required and
consequently, in December 1981, in conjunction with the Racal College at
Brixham, we decided to take the bull by the horns and set up a course
tailored to our specific needs. After consultation with colleagues at the
University of East Anglia, University College of Wales, Aberystwyth, Merlin
Geophysical Company Limited and Racal College, we produced a proposed
curriculum for a three-part course.
Part One, to be taught by lecturers of Racal College at Brixham, would
involve a review of electronics, physics, acoustics and the use of a suite of
the analogue equipment employed by Racal Geophysics and would be of
three weeks' duration.
Part Two, to be provided by staff from both Racal Geophysics Limited and
Merlin Geophysical Company Limited, would include multi-channel digital
acquisition processing, by computer, terminating in a period of hands-on
experience at Merlin Geophysical Company Limited at Woking. This would
be a two week course.
Part Three, to be provided by the School of Mathematics of the University
of East Anglia, would be a series of correspondence notes in mathematics
to be supplied to the student over a period of six months. The idea being
that they would be in stages and lead into the other parts of the course.
The text of this book is an expansion of these notes.
The course has been inclined specifically towards digital seismic recording
and processing, although it would obviously provide a good general back-
ground to any time-series, digital analysis. The starting point for this course
has been at the end of high school. The early chapters are a review of the
A-level syllabus and a review of first year university additional mathematics
in areas of specific relevance. The subjects covered are special functions,
trigonometric functions for waveforms and calculus. The content and the
examples, although of an easy high school/first year nature, are all orientated
towards seismics and were designed to assist with Part I of the course. In
that the objective was to prepare a mathematics book for use by geophysicists
all instruction is of a purely mathematical nature and no attempt is made
to apply the text to physical theory. Therefore, in the first three chapters
the mathematics used with alternating current theory, operational amplifiers,
Huygens' principle and the Rayleigh-Willis curve will be explained but no
attempt to apply it will be made. One departure from this practice is the
section on number systems where a description of binary, octal and
hexadecimal bases could be equally at home in a text book on physics or
electronics. However, as this is so crucial to the further development of
digital time series analysis it was decided to include this.
The later chapters are much more complex and ultimately go beyond
what is usually taught in a first degree subsidiary mathematics course.
The middle section covers complex numbers which introduces the idea
of vector transform, a section that is further developed in the chapter on
Fourier analysis; matrices which are the main number system used in
computers and is the direct form for de-multiplexing. The next chapter is
devoted to stochastic processes and probability and a development of the
concepts of mean, median, root mean square and standard deviation. The
final chapters deal with Fourier Analysis and Transforms and Time Series
Analysis. Covered in these chapters are the mathematical concepts involved
in Fourier synthesis and decomposition, predictive deconvolution and
Wiener filtering.
The actual processing concepts and details were covered by the course
of lectures produced by Merlin Geophysical Company. The point of the
mathematics course was that the Merlin lecturers would not have to be
interrupted by definitions of triple integral signs and the convolution and
de-convolution signs. Short sections are also included on the z-transform,
and the wave equation and the frequency-wave number domain.
To expand the last point, this text is not meant to be a mathematical
discussion of predictive deconvolution and the like, neither is it meant to
be a rigorous and complete mathematical text. The idea is that, those who
have not had a heavy University mathematical training, will be able to use
this book as a reader and background to the classic processing papers by
Robinson, Backus, Claerbout, Wiener, Berkhout, etc.
One of the problems encountered in designing this course in general,
which was also highlighted with the mathematics content, was the bringing
together of a number of advanced level ideas from a variety of subjects and
disciplines. This is evident in the number of starting points, almost with
the beginning of each chapter, and the way in which the whole text only
really comes together in the final chapters. Consequently, the text does not
flow from one chapter to the next as would be the case in a conventional text
book. In other words, in order to carry out seismic data processing it is
necessary to have an understanding of binary number systems, discrete
wavelet sampling, digital filtering and Fourier analysis as well as the work-
ings of digital recording and large mainframe computing techniques.
Incidentally, an understanding of the geology, and geophysics of the earth
is quite helpful as it gives one an idea after all the processing as to how
well you have done.
It cannot be emphasised too much that this is a textbook, covering many
of the mathematical techniques employed in modern day seismic data
processing and as such most of the examples and references are linked to
specific problems. Indeed, an attempt has been made to correlate the
examples through to final processing ideas and to only consider that which
is relevant to seismics; consequently, this is not a complete mathematical
course. But like all mathematics text books, it cannot be read from cover
to cover in an evening like a novel. Time has to be taken over each chapter
and the examples worked in order to achieve a reasonable understanding.
Although written essentially for geophysical work, most of the techniques
involved are equally relevant to the field of data transfer, whether it be by
telecommunication, both radio and wire, fibre optics, or acoustics. The only
difference being that remarked on by one of the authors towards the end
of the final drafting, that in seismics we were looking for and at the noise,
rather than at the signal which is usually the case. The problem is that there
is noise and noise. Hopefully this book will remove some of the noise from
the processing.
R. L. French
Racal Geophysics Limited
PREFACE
mentioned above. If, however, the layers are several thousand metres apart,
the delayed arrival will mimic a reflection from a much deeper layer. Clearly,
we would not want to drill an apparent deep structure which is actually
such an artefact of the method.
Great progress has been made over the last twenty years in dealing with
these problems. Perhaps the greatest single advance has been the use of
digital recording of the receiver signals. This has meant that the data can
be fed into a computer, where a wide variety of ingenious signal-processing
techniques can be applied. In this way, the problems outlined above can
be partially solved. Thus, it is possible to estimate the shape of the smeared-
out pulse that actually generated the data, and then process the records to
approximate what they would have been with an ideally sharp pulse. It is
also possible to predict the arrival times of energy that has spent some time
trapped between sub-surface layers, and then subtract this signal from the
record, so that we are no longer fooled into believing that these echoes
come from genuine deep reflectors.
These signal-processing techniques were first applied, of necessity, in the
search for deep oil accumulations, but in recent years they have been
increasingly used in the rather simpler problem of shallow sub-surface
investigation. This is usually carried out for engineering purposes (predic-
tion of drilling hazards, and selection of sites for offshore production
platforms or pipeline routes). In this application, the depth of investigation
required is usually only a few hundred metres, but a relatively high resolution
may be needed.
To apply these signal-processing techniques with confidence, it is important
to understand their nature and limitations. A cookery-book approach
is not enough; what is ideally needed is a thorough understanding of what
happens to the seismic signal as it propagates through the earth, and the
effects of the source and receiver parameters on our picture of the sub-
surface. It is not only the geophysicist directly concerned with processing
the data who needs this appreciation; anyone who interprets the data in
geological terms needs a clear understanding of the distortions introduced
into his picture of the sub-surface by the imperfections of the seismic
reflection technique.
Any useful and detailed account of the seismic reflection method inevi-
tably involves a good deal of mathematics. Many texts cover this ground,
but they all assume that the reader has a good grasp of the mathematical
background, and make no real attempt to explain the mathematics (as
opposed to the physics) of the development of the subject. However, it is
a dubious assumption that the user of such a book will have enough
mathematics to be able to follow the arguments in detail. With the increased
use of the seismic method, a wide variety of people need to learn about it,
and their pre-existing mathematical knowledge will range from a fairly
elementary school level in some cases to University level in others. To fill
gaps in his knowledge, it would be possible for the student to read parts
of various traditional texts, assembling a "course" directly related to his
needs; this type of study, however, requires careful guidance from an
experienced teacher. This book brings together these scattered topics,
to give a coherent account of the background mathematics needed to
understand the seismic method, and thus covers ground which no other single
textbook does. It begins with a section on basic concepts, which will be
useful as a reference source even to those familiar with them, and goes on
to develop the subject to a level at which the student can read for himself
the (sometimes rather abstruse) literature on seismic processing. The final
section of the book provides a bridge to the more advanced material to be
found elsewhere.
An important feature of the text is the provision of numerous examples,
and some illustrative computer programs. By working through these, the
student can acquire the detailed familiarity with mathematical manipulation
which he will need if he is to understand the basis of modern seismic
techniques.
M. Bacon
Shell UK Exploration and Production Ltd
Chapter 1
SPECIAL FUNCTIONS
1 FUNCTIONS
[Fig. 1.1: black-box diagram, A -> f -> B]
[Fig. 1.2: graph with axes labelled distance and time]
[Fig. 1.3: graph with axes labelled distance and time]
be the population and B would just consist of {Yes, No}. In this case the
function takes only one of two values. In general our functions will normally
have as their values real numbers; although some of them may look a little
unusual. We note in passing that we can often replace non-numerical
outcomes by numerical values, i.e.
Yes = 1 and No = 0
making numerical processing possible.
In our discussions special functions play an important role in later
chapters. We continue by looking first at the polynomial functions and then
introduce the trigonometric functions. Section 3 introduces exponential
functions and ends with a discussion of the inverse functions, specifically,
log and the inverse trigonometric functions.
we have
$p(x) + q(x) = (p + q)(x) = (p_0 + q_0) + \dots + (p_n + q_n)x^n$
The formula for multiplication is slightly more complex. Let $r(x) = p(x)q(x)$
where
$r(x) = c_0 + c_1 x + \dots + c_{m+n}x^{m+n}$
then
$c_i = \sum_{j=0}^{i} p_j q_{i-j}, \qquad 0 \le i \le m+n$
$\frac{1}{p(x)} = q(x)$
However there are problems in doing this. Let $p(x) = 1 - x$, then $q(x) =
1 + x + x^2 + \dots$. If $x = 1$, $q(1) = 1 + 1 + 1 + \dots$, which is infinite. So there is
a new complication to consider when we expand the definition of polynomials
in this way.
Polynomials have the nice property of being continuous. This means that
there are no jumps in the graph. If $f(x)$ has no jumps, and we have calculated
$f(x_1)$ and $f(x_2)$, where $x_1 < x_2$, and they have different signs, then there is a
value $x_0$ between $x_1$ and $x_2$ so that $f(x_0) = 0$. This value $x_0$ is called a root
of the equation $f(x) = 0$ and is the point where the curve cuts the x-axis.
This could also be used as a basis for an iterative method of solving such
equations, as follows:
Let $f(x) = x^3 - 2$ and choose $x_1 = 1$ and $x_2 = 2$. Then $f(1) = -1$ and $f(2) = 6$,
so there is a value $a$ such that $f(a) = 0$. That is, $a^3 = 2$ and $1 < a < 2$. Now
choose $x_3 = (x_1 + x_2)/2 = 1.5$. Further $x_3^3 - 2 > 0$ so $1 < a < 1.5$. Put $x_4 =
(x_1 + x_3)/2 = 1.25$ and $(1.25)^3 - 2 < 0$ so $1.25 < a < 1.5$. Put $x_5 = 1.375$ and
$x_5^3 - 2 > 0$ so $1.25 < a < 1.375$. Put $x_6 = 1.3125$, $x_6^3 - 2 > 0$ so $1.25 < a < 1.3125$.
We can repeat this process to get $a$ more and more accurately.
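The interval-halving procedure just described can be sketched in a few lines of Python. This is only an illustration: the names `bisect_root` and `cube_minus_two` are invented here and do not come from the text.

```python
def bisect_root(f, lo, hi, steps):
    """Halve the bracket [lo, hi] `steps` times; f(lo) and f(hi) must differ in sign."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(steps):
        mid = (lo + hi) / 2
        # Keep the half-interval on which the sign change survives.
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return lo, hi


def cube_minus_two(x):
    """The worked example f(x) = x**3 - 2, whose root is the cube root of 2."""
    return x**3 - 2


bracket_after_four = bisect_root(cube_minus_two, 1.0, 2.0, 4)
bracket_after_forty = bisect_root(cube_minus_two, 1.0, 2.0, 40)
print(bracket_after_four, bracket_after_forty)
```

Four halvings reproduce the bracket reached by hand in the example; each further step halves the uncertainty in $a$ again.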
At this stage it will be of use to introduce the idea of a step function. Let
us begin with a picture, Fig. 1.4.
How do we represent this symbolically?
$f(x) = \begin{cases} 0 & \text{if } x < 1 \\ 1 & \text{if } 1 \le x \le 3 \\ 0 & \text{if } x > 3 \end{cases}$
[Note: $a \le b$ means a is less than or equal to b.]
This is a perfectly satisfactory function. Given any value of $x$ we can
calculate the appropriate value of $f(x)$. Note that this function is
discontinuous at $x = 1$ and at $x = 3$. Another example is the Heaviside function
given by:
$f(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } 0 \le x \end{cases}$
This is shown in Fig. 1.5.
Examples
1. Let
$f(x) = \begin{cases} 0 & \text{if } x < -1 \\ 2 & \text{if } -1 \le x \le 1 \\ 0 & \text{if } x > 1 \end{cases}$
Then the graph is given by Fig. 1.6.
[Fig. 1.6: graph of y = f(x) against x, with x-axis tick at -1]
2. Let
if $x < -3$
if $-3 \le x \le 6$
if $x > 6$
Then the graph is given in Fig. 1.7.
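As a rough sketch, step functions of this kind are easy to write as small Python functions. The names `step` and `heaviside` and the `height` parameter are choices made for this illustration only.

```python
def step(x, left, right, height=1.0):
    """Return `height` for left <= x <= right and 0 elsewhere."""
    return height if left <= x <= right else 0.0


def heaviside(x):
    """Heaviside function with the convention used above: 0 for x < 0, 1 for 0 <= x."""
    return 1.0 if x >= 0 else 0.0


# The first step function above (value 1 on 1 <= x <= 3), sampled on both
# sides of its two discontinuities.
for x in (0.9, 1.0, 2.0, 3.0, 3.1):
    print(x, step(x, 1, 3))
```

The value jumps as $x$ crosses 1 and 3, which is exactly the discontinuity noted in the text.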
[Fig. 1.7: step function graph, with y-axis tick at 4 and x-axis ticks at -3 and 6]
[Fig. 1.8]
[Fig. 1.9]
[Fig. 1.10: unit circle with centre O and points A and B on the circumference]
3 TRIGONOMETRIC FUNCTIONS
The central idea is that of angle and how to measure angle in radians.
Consider a circle of radius 1; this is called the unit circle. Let $O$ be the
centre and $A$ and $B$ two points on the circumference as in Fig. 1.10. The
size of the angle made by $AOB$ in radians is the length of the arc $AB$, which
we call $\theta$. Since the circumference of the circle is $2\pi$, we get the formula $2\pi$
radians = 360°. Most calculators have a facility for calculation in radians
or degrees. Notice that 0 radians = 0°, $\pi/2$ radians = 90°, $\pi/3$ radians = 60°
and $\pi$ radians = 180°. Given the unit circle with centre $O$ we let $OA$ be the
x-axis. Let $B$ be a point on the circumference such that $AOB$ is the angle
$\theta$. Let $C$ be the point on $OA$ such that the angle $OCB$ is a right angle, see
Fig. 1.11. We can now define two functions of $\theta$, the cosine and sine function,
normally written as cos and sin.
$\cos\theta = OC$ and $\sin\theta = CB$
(If you are familiar with the definition of $\cos\theta$ as (adjacent)/(hypotenuse)
note that this is the same since the hypotenuse $OB$ equals 1.)
We have to ensure that we measure the lengths with the appropriate
directions. When $\theta$ lies between 0 and $\pi/2$ there are clearly no difficulties
but when $\theta$ lies between $\pi/2$ and $\pi$ we see from Fig. 1.12 that $\cos\theta = OC$
which is negative but that $\sin\theta = CB$ which is positive.
A third important function related to sine and cosine is the tangent which
is written tan and defined by $\tan\theta = \sin\theta/\cos\theta$. We label the quadrants as
2nd 1st
3rd 4th
[Fig. 1.11: unit circle showing O, A, B and C]
[Fig. 1.12: unit circle with $\theta$ between $\pi/2$ and $\pi$]
Note
$0 \le \theta \le \pi/2$ gives the 1st quadrant
$\pi/2 < \theta \le \pi$ gives the 2nd quadrant
$\pi < \theta \le 3\pi/2$ gives the 3rd quadrant
$3\pi/2 < \theta < 2\pi$ gives the 4th quadrant
We can tabulate the values of cos, sin and tan as follows:
1st 2nd 3rd 4th
[Fig. 1.13: diagram labelled $\pi - \theta$]
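$\sin\left(\frac{\pi}{2} - \theta\right) = \sin\frac{\pi}{2}\cos\theta - \cos\frac{\pi}{2}\sin\theta$
But $\sin \pi/2 = 1$ and $\cos \pi/2 = 0$ so
$\sin\left(\frac{\pi}{2} - \theta\right) = \cos\theta$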
Fig. 1.14
This last one requires some comment, but where $a$ is negative we choose
$a \bmod x$ to be positive (usually), so that $-3 \bmod 4 = 1$, since $(-3) - 1 = -4 = -1 \times 4$.
An alternative way of writing this is to say that $a \bmod x$ is the number
such that $a - (a \bmod x) = nx$ for some integer (whole number) $n$, and
$a \bmod x$ is not negative and is less than $x$.
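The convention above can be checked with a short Python sketch. The helper `mod` is a name invented for this illustration; `math.floor` and `math.fmod` are standard library functions. For a positive modulus Python's own `%` operator follows the same convention, whereas `math.fmod` keeps the sign of $a$.

```python
import math


def mod(a, x):
    """a mod x for x > 0: a - (a mod x) is a whole multiple of x, and 0 <= a mod x < x."""
    return a - x * math.floor(a / x)


print(mod(-3, 4))         # the case discussed above
print(-3 % 4)             # Python's % agrees when x is positive
print(math.fmod(-3, 4))   # fmod keeps the sign of a instead

# An angle and its value mod 2*pi have the same cosine.
theta = 7.0
print(math.cos(theta), math.cos(mod(theta, 2 * math.pi)))
```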
Exercise 2
Evaluate the following:
(i) 7 mod 3 (ii) 25 mod 5 (iii) 3 mod 2
(iv) -5 mod 2 (v) 6 mod 2.4 (vi) $5\pi \bmod (2\pi)$
So the value of $\cos\theta$ depends only on $\theta \bmod 2\pi$.
Fig. 1.15
There are three further trigonometric functions which are common and
have names. These are based on the ones we have already defined:
$\operatorname{cosec}\theta = 1/\sin\theta$
$\sec\theta = 1/\cos\theta$
$\cot\theta = 1/\tan\theta$
At this stage it is valuable to remind ourselves how these functions were
defined, see Fig. 1.15. Since $A$ and $B$ lie on the unit circle centre $O$, $OA$
and $OB$ have length 1. Thus $OC$ and $CB$ both have length less than or
equal to 1. So
$-1 \le OC \le +1$ and $-1 \le CB \le +1$
This can be reinterpreted to mean that
$-1 \le \cos\theta \le +1$ and $-1 \le \sin\theta \le +1$
These facts can also be deduced from the formula $\cos^2\theta + \sin^2\theta = 1$.
The next important stage is to sketch the curve of cos and sin. Probably
many of you are familiar with these anyway. Firstly since $\cos(\theta + 2\pi) = \cos\theta$,
if $n$ is any integer (something of the form 0, ±1, ±2, ...) we have
$\cos(\theta + 2\pi n) = \cos\theta$. Therefore, if we sketch the portion for $0 \le \theta \le 2\pi$, the
remainder is just a repetition. We can tabulate some values:
$\theta$        0    $\pi/2$   $\pi$    $3\pi/2$   $2\pi$
$\cos\theta$    1    0         -1       0          1
Also $\cos(-\theta) = \cos\theta$ (i.e. cos is an even function) and so the curve for $\cos\theta$
is as shown in Fig. 1.16. For the sine curve we tabulate:
$\theta$        0    $\pi/2$   $\pi$    $3\pi/2$
$\sin\theta$    0    1         0        -1
Also $\sin\theta = -\sin(-\theta)$. (Functions with this property are called odd.) Figure
1.17 gives the sine curve. If we draw the picture as in Fig. 1.4 and imagine
$\theta$ traversing the circle, we can see that $CB$ grows bigger and smaller and
that if we plotted its height on a graph against $\theta$ we would obtain the graph
shown in Fig. 1.17.
[Fig. 1.18]
Clearly it is now possible to build up more complicated functions. What
happens if we consider $\cos 2\theta$?
Since $2(\theta + \pi) = 2\theta + 2\pi$,
$\cos(2(\theta + \pi)) = \cos 2\theta$.
Tabulating values of $\theta$ from 0 to $5\pi/4$ gives:
$\theta$          0    $\pi/4$   $\pi/2$   $3\pi/4$   $\pi$   $5\pi/4$
$\cos 2\theta$    1    0         -1        0          1       0
In this case the graph repeats after only $\pi$; this is illustrated in Fig. 1.18.
If we wrote $\cos 3\theta$ this would repeat more quickly. Tabulating $\cos\theta/2$,
however, we see that this repeats more slowly (see Fig. 1.19):
$\theta$            0    $\pi/2$   $\pi$   $2\pi$   $3\pi$   $4\pi$
$\cos\theta/2$      1    0.71      0       -1       0        1
Clearly $\cos(\omega\theta)$ is a graph of the same shape with $\omega$ determining how
rapidly it repeats. The constant $\omega$ is called the angular frequency of this
curve. As $\omega$ increases the rate at which the graph repeats increases. Figure
1.20 illustrates this. The wavelength is the length between the repeats (or in
mathematical language the period). Thus for $\cos\theta$ the wavelength is $2\pi$,
for $\cos 2\theta$ the wavelength is $\pi$. In general the wavelength $\lambda$ and the angular
frequency $\omega$ are related by the following relation:
$\lambda = \frac{2\pi}{\omega}$
Note: Frequency in engineering is usually in Hertz, cycles per second, and
so the engineering frequency is the angular frequency divided by $2\pi$. Since one
"cycle" consists of one rotation through an angle $2\pi$ then 1 cycle per second
is $2\pi$ radians per second. Thus the more usual equation is $\lambda = 1/f$, where
$f = \omega/2\pi$ is the frequency in Hertz. Similar arguments apply
to the sine wave.
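The relation between angular frequency, frequency in cycles and wavelength can be checked numerically. The sketch below is illustrative only; the function names `wavelength` and `hertz` are assumptions made here, not notation from the text.

```python
import math


def wavelength(omega):
    """Period of cos(omega * theta) when omega is an angular frequency (radians)."""
    return 2 * math.pi / omega


def hertz(omega):
    """Convert an angular frequency to cycles per unit of theta."""
    return omega / (2 * math.pi)


omega = 2.0
lam = wavelength(omega)
theta = 0.3
# Shifting theta by one wavelength should leave cos(omega * theta) unchanged.
shifted_equal = math.isclose(
    math.cos(omega * (theta + lam)), math.cos(omega * theta), abs_tol=1e-12
)
print(lam, 1 / hertz(omega), shifted_equal)
```

Both ways of computing the wavelength agree, which is the point of the note on Hertz above.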
Another stage is to consider functions of the form $3\cos\theta$ or $\tfrac{1}{2}\cos\theta$. These
two examples are illustrated in Fig. 1.21. If we have a wave $a\cos\omega\theta$ then
$a$ is called the amplitude. The last alteration we can make to the simple cos
or sin wave is to consider a wave of the form $a\cos(\omega\theta + \phi)$. The term $\phi$
moves the wave along the axis and is called the phase. For example, in Fig.
1.22 we sketch $\cos\theta$ and $\cos(\theta + \pi/2)$. Using the identity discussed earlier,
$\cos\left(\theta + \frac{\pi}{2}\right) = \cos\theta\cos\frac{\pi}{2} - \sin\theta\sin\frac{\pi}{2} = -\sin\theta$
Note: $\cos(\theta - \pi/2) = \sin\theta$.
These ideas are basic to the whole of Fourier analysis and the theory of
waves.
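A small numerical sketch can make amplitude and phase concrete. The function `wave` and its parameter names are invented for this illustration; it simply evaluates $a\cos(\omega\theta + \phi)$ and compares the phase-shifted cosine with the sine.

```python
import math


def wave(theta, a=1.0, omega=1.0, phi=0.0):
    """Evaluate a * cos(omega * theta + phi)."""
    return a * math.cos(omega * theta + phi)


# Sample one full period and compare cos(theta + pi/2) with -sin(theta).
samples = [k * math.pi / 6 for k in range(13)]
max_gap = max(abs(wave(t, phi=math.pi / 2) + math.sin(t)) for t in samples)
print(max_gap)          # rounding error only
print(wave(0.0, a=3.0))  # the amplitude is the peak height
```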
Exercise 4
Sketch the following:
(i) $2\sin 3\theta$ (ii) $\cos(\theta + \pi)$
(iii) $\tfrac{1}{2}\sin 2\theta$ (iv) $\sin\theta + \cos\theta$
As a final note to this section, we observe that when the angle $\theta$ is small,
we can find some simple and useful approximations for $\sin\theta$ and $\tan\theta$.
Figure 1.23 shows the unit circle with a small angle $\theta$. From the diagram
we can see $CB <$ arc $AB$. Thus $CB/OB = CB = \sin\theta < \theta$ and so $0 < \sin\theta < \theta$.
We can also make a rather more subtle deduction:
$\tan\theta = \frac{CB}{OC}$
since $OC < OA = 1$
$\tan\theta = \frac{CB}{OC} > CB = \sin\theta$
and so
$\tan\theta > \sin\theta$.
For small angles (in radians) we can assume
$\tan\theta \approx \sin\theta \approx \theta$
thus for $\theta = 10^{-7}$
$\tan\theta \approx 10^{-7}$
$\sin\theta \approx 10^{-7}$
When $\theta$ is small a useful approximation is $\cos\theta = \sin\theta/\tan\theta \approx 1$.
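The quality of these small-angle approximations is easy to inspect numerically. The following is a minimal sketch; the helper name `small_angle_report` is made up for the purpose.

```python
import math


def small_angle_report(theta):
    """Return sin, tan and cos of theta (radians) for comparison with theta and 1."""
    return math.sin(theta), math.tan(theta), math.cos(theta)


for theta in (0.5, 0.1, 0.01, 1e-7):
    s, t, c = small_angle_report(theta)
    print(f"theta={theta:<8g} sin={s:.10g} tan={t:.10g} cos={c:.10g}")
```

As the angle shrinks, the sine and tangent close in on the angle itself from below and above, and the cosine closes in on 1.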
graph of functions like $b = 10^a$ (see Fig. 1.24). This is the basis of logarithms
and log tables.
One favourite number to replace x is the number e = 2.718281828 ...
which seems rather bizarre. In fact as we shall see the number e is very
useful and crops up all over the place; you will notice it appears in all
scientific calculations.
Some calculations show that
$a$       $e^a$
-1        0.36788
-0.5      0.60653
0.0       1.0000
0.5       1.6487
1.0       2.71828
This produces the graph shown in Fig. 1.25, which is of the same shape as
that in Fig. 1.24. You will often see $e^a = f(a)$ written as $b = f(a) = \exp(a)$.
It is such an important and useful function that most computer languages
have it provided.
Power functions can often arise as solutions to equations. Suppose we
have a function $f(t)$ where $f(t)$ is some value at time $t$.
Suppose we notice
$f(t + 1) - f(t) = x f(t)$
for some value $x$, so the increment in our time interval is proportional to
the function value.
If $f(0) = 1$ (for simplicity) then
$f(1) = (1 + x)f(0) = (1 + x)$
$f(2) = (1 + x)f(1) = (1 + x)^2$
$f(t) = (1 + x)^t$
giving a power function!
$f(t) = (1 + \Delta t)^{t/\Delta t}$
Example 3
Compute $f(1)$ for $\Delta t = 0.1, 0.001, 0.0001$. You will notice that your solutions
for $f(1)$ are very close to $e$. In fact as $\Delta t \to 0$ the value for $f(1)$ tends to $e$.
(Try $\Delta t = 0.00000001$!)
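The computation asked for in Example 3 can be sketched as follows. The function name `f_one` is an assumption of this illustration, and the formula used is $f(1) = (1 + \Delta t)^{1/\Delta t}$, read off from the expression above with $t = 1$.

```python
import math


def f_one(dt):
    """f(1) = (1 + dt) ** (1 / dt): growth over unit time in steps of length dt."""
    return (1 + dt) ** (1 / dt)


for dt in (0.1, 0.001, 0.0001):
    value = f_one(dt)
    print(f"dt={dt:<8g} f(1)={value:.8f}  e - f(1)={math.e - value:.2e}")
```

The gap between $f(1)$ and $e$ shrinks roughly in proportion to $\Delta t$. With extremely small steps, floating-point rounding in `1 + dt` starts to matter, so the printed digits should not be trusted indefinitely.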
5 INVERSE FUNCTIONS
[Fig. 1.27]
Exercise 5
(i) If $f(x) = 7x - 4$ find $g(x)$
(ii) If $f(x) = (x - 1)^3$ find $g(x)$
The first and perhaps the most important inverse function is defined for
exp(x). If we look again at the graph of $y = \exp x$, Fig. 1.25, we can see
that for any positive real $y_0$ we can find $x_0$ such that $y_0 = \exp x_0$. Notice that $x_0$ is
unique. This will define a function called $\log_e$ or ln. The graph of $y = \log_e x$
can be found from the graph of exp x and is given in Fig. 1.30.
Some properties of $\log_e x$ follow almost immediately from this definition:
$\log_e(1) = 0$
$\log_e(e) = 1$
$\log_e(xy) = \log_e x + \log_e y$
If instead of $\exp x = e^x$, we chose a different power function, say $10^x$, the
inverse function would be called $\log_{10} x$.
Exercise 6
Evaluate the following expressions using your calculator:
(i) $\log_{10}(10^2)$ (ii) $\log_e 4.4816$ (iii) $e^{1.5}$
(iv) $\exp(\log_e 10)$ (v) $10^{\log_{10} 1.5}$ (vi) $\exp(\log_{10} 3)$
[Fig. 1.30: graph of $y = \log_e x$]
$x$       $\cos^{-1} x$ (degrees)    $\cos^{-1} x$ (radians)
-1        180                        $\pi$
-0.5      120                        $2\pi/3$
0.0       90                         $\pi/2$
0.5       60                         $\pi/3$
1.0       0                          0
[Figs. 1.31, 1.32 and 1.33]
We can define an inverse sine in the same way, see Fig. 1.33.
Again we need to restrict our definition of sin x. In this case we choose
to define sin x only for x between $-\pi/2$ and $\pi/2$. Then we can define the
inverse function
$y = \sin^{-1} x$
or
$y = \arcsin x$
Notice that both $\sin^{-1} x$ and $\cos^{-1} x$ only have a definition for x in the
range -1 to 1.
Exercise 7
Sketch $\sin^{-1} x$
Fig. 1.36
Fig. 1.37
There are a number of ways, given two functions, say $f(x)$ and $g(x)$, of
constructing a new function from them. If we use our black box idea we
get Fig. 1.36.
Here we get $f + g$, which is obtained by adding the results of $f$ to that
of $g$. We can replace + by ×, - or ÷ to obtain $f/g$ or $f \times g$ etc. One class
of function which is useful is that of rational functions. These are those of
the form $f(x)/g(x)$ where $f$ and $g$ are both polynomials. For example
$(x + 1)/(x^2 + 1)$, $1/(x - 1)$, $(3x^4 - 5x^3 + 6)/(2x + 1)$. As we commented earlier
there is no reason why a rational function should be a polynomial.
The obvious other picture is Fig. 1.37, where A represents $f(x)$ and B represents
$g(x)$. Then Fig. 1.37 gives $g(f(x))$. This is sometimes called a function of
a function or the "product" of $g$ and $f$. This is best illustrated by some
examples. Let $f(x) = \sin x$ and $g(x) = x^2 + 1$. Then $g(f(x)) = (\sin x)^2 + 1$.
Notice $f(g(x)) = \sin(x^2 + 1)$. Also if $f(x) = e^x$ and $g(x) = x^2 + x + 1$, $g(f(x)) =
(e^x)^2 + e^x + 1 = e^{2x} + e^x + 1$.
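The "function of a function" idea maps directly onto passing functions as values. The sketch below is illustrative; `compose` is a helper name chosen here, and the two functions are those of the first example above.

```python
import math


def compose(g, f):
    """Return the function x -> g(f(x))."""
    return lambda x: g(f(x))


def f(x):
    return math.sin(x)


def g(x):
    return x**2 + 1


g_of_f = compose(g, f)   # (sin x)^2 + 1
f_of_g = compose(f, g)   # sin(x^2 + 1)

x = 0.7
print(g_of_f(x), f_of_g(x))
```

The two printed values differ, which shows that the order in which the boxes are connected matters.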
Exercise 8
For the following pairs evaluate $g(f(x))$ and $f(g(x))$.
(i) $f(x) = 1/x$, $g(x) = x^2$
(ii) $f(x) = \sin x$, $g(x) = e^x$
(iii) $f(x) = x + 1$, $g(x) = x - 1$
7 NUMBERS
When we write a number as 2345 we automatically realise that this is, in
words, two thousand three hundred and forty five. So in the written language
the number is thought of as 2 × 1000 + 3 × 100 + 4 × 10 + 5, or more neatly as
$2 \times 10^3 + 3 \times 10^2 + 4 \times 10^1 + 5 \times 10^0$; remember $10^0 = 1$.
If we let $d = 10$ then any string $a_n \dots a_2 a_1 a_0$ represents the number
$a_n 10^n + a_{n-1} 10^{n-1} + \dots + a_1 10 + a_0$ or $a_n d^n + a_{n-1} d^{n-1} + \dots + a_0$, where
$0 \le a_j < 10$.
It is now clear that if we let $d$ be any number and $a_j$ be numbers such
that $0 \le a_j < d$, we can use $a_n a_{n-1} \dots a_0$ to represent the number
$a_n d^n + a_{n-1} d^{n-1} + \dots + a_0$. When this is done the number is said to be written in
base d. Our usual representation is in base 10 (decimal). The other very
commonly used systems are base 2 (binary) and base 16 (hexadecimal)
although other bases are used occasionally.
To illustrate let us consider base 3. We write a number as 2101. We
interpret this in decimal as $2 \times 3^3 + 1 \times 3^2 + 0 \times 3 + 1 \times 1 = 54 + 9 + 1 = 64$.
Notice that we have no need of a symbol representing 3, since $3 = 1 \times 3 + 0 =
10$. To take a number written in decimal form and write it to the base three
(ternary) we keep dividing and note the remainders:
3 | 121
3 |  40   r 1
3 |  13   r 1
3 |   4   r 1
      1   r 1
So 121 in decimal becomes 11111 in a ternary representation. It can easily
be checked
$11111 = 1 \times 3^4 + 1 \times 3^3 + 1 \times 3^2 + 1 \times 3 + 1$
$= 81 + 27 + 9 + 3 + 1 = 121$
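The repeated-division procedure translates almost word for word into code. This is a minimal sketch for bases up to 10; the function names `to_base` and `from_base` are invented here.

```python
def to_base(n, d):
    """Digits of the non-negative integer n in base d (2 <= d <= 10), as a string."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, d)   # the quotient carries on, the remainder is the next digit
        digits.append(str(r))
    return "".join(reversed(digits))  # remainders come out last digit first


def from_base(s, d):
    """Evaluate a digit string in base d, working from the leading digit."""
    value = 0
    for ch in s:
        value = value * d + int(ch)
    return value


print(to_base(121, 3), from_base("2101", 3))
```

Python's built-in `int("2101", 3)` performs the same evaluation as `from_base`, so it can serve as an independent check.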
We will now give some examples in binary representation.
Example 5
121 in decimal notation becomes
2 | 121
2 |  60   r 1
2 |  30   r 0
2 |  15   r 0
2 |   7   r 1
2 |   3   r 1
      1   r 1
      0   r 1
So
$121 = 1 \times 2^6 + 1 \times 2^5 + 1 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 0 \times 2 + 1$
$= 64 + 32 + 16 + 8 + 1 = 121$, i.e. 121 = 1111001
Exercise 9
(i) In the following evaluate the binary numbers as a decimal: (a) 10101;
(b) 11011011; (c) 100; (d) 10001.
(ii) In the following evaluate the ternary numbers as a decimal: (a) 12021;
(b) 11022; (c) 202; (d) 2021.
(iii) Rewrite the following decimal numbers as both binary and ternary
numbers: (a) 111; (b) 422; (c) 61; (d) 81.
Just to make the point clear, given any number $d$ we can use this as a
base to write any number $a$ in the form $a = a_n d^n + a_{n-1} d^{n-1} + \dots + a_0$ where
$0 \le a_j < d$. If $d > 10$ we have to use new symbols. In particular the
hexadecimal system (d = 16) is used in many computers. The usual convention
is that A = 10, B = 11, C = 12, D = 13, E = 14, F = 15. Let us write out
the first 32 numbers in the four bases we have discussed so far.
Binary    Ternary   Decimal   Hexadecimal
1         1         1         1
10        2         2         2
11        10        3         3
100       11        4         4
101       12        5         5
110       20        6         6
111       21        7         7
1000      22        8         8
1001      100       9         9
1010      101       10        A
1011      102       11        B
1100      110       12        C
1101      111       13        D
1110      112       14        E
1111      120       15        F
10000     121       16        10
10001     122       17        11
10010     200       18        12
10011     201       19        13
10100     202       20        14
10101     210       21        15
10110     211       22        16
10111     212       23        17
11000     220       24        18
11001     221       25        19
11010     222       26        1A
11011     1000      27        1B
11100     1001      28        1C
11101     1002      29        1D
11110     1010      30        1E
11111     1011      31        1F
100000    1012      32        20
It is just as easy to add and multiply numbers to the base d as ordinary
decimals; all you have to do is to remember that d is the relevant number
for carrying, not 10 as usual. Here is an example in base 3.
    111
  +  10
  + 102
  -----
   1000
Just to check we do it in decimal notation: 13 + 3 + 11 = 27.
Similarly multiplying 121 × 2 = 1012. Again we can check: 121 = 9 + 6 + 1 =
16, so 2 × 16 = 32 and $32 = 1 \times 3^3 + 3 + 2$. Long multiplication works the same
way:
      121
  ×    21
  -------
      121
    10120
  -------
    11011
Here are some examples in binary notation:
    1101          11001
  + 1001        +  1110
  ------        +   101
   10110        -------
                 101100

       1101
  ×    1001
  ---------
       1101
    1101000
  ---------
    1110101
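Sums and products like these can be checked mechanically by converting each digit string to an ordinary integer. The sketch below relies on Python's built-in `int(text, base)`; the helper names `check_sum` and `check_product` are invented for this illustration.

```python
def check_sum(terms, total, base):
    """True if the base-`base` digit strings in `terms` add up to `total`."""
    return sum(int(t, base) for t in terms) == int(total, base)


def check_product(a, b, product, base):
    """True if a * b equals `product`, all given as base-`base` digit strings."""
    return int(a, base) * int(b, base) == int(product, base)


print(check_sum(["111", "10", "102"], "1000", 3))
print(check_product("121", "21", "11011", 3))
print(check_sum(["11001", "1110", "101"], "101100", 2))
print(check_product("1101", "1001", "1110101", 2))
```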
It is very easy to do this arithmetic using polynomials. If $\alpha = a_n \dots a_1 a_0$
and $\beta = b_m \dots b_0$ are two numbers represented to the base $d$, then
$\alpha = \sum_{i=0}^{n} a_i d^i$ and $\beta = \sum_{i=0}^{m} b_i d^i$.
Then
$\alpha\beta = \sum_{i=0}^{m+n} c_i d^i$
where
$c_i = \sum_{j=0}^{i} a_j b_{i-j}$
These are precisely the coefficients worked out for the product of polynomials.
Hence we have reduced the problem of multiplying two numbers
to one of multiplying polynomials. This may not seem to be of much
advantage but recall that $0 \le a_j < d$, so the coefficients of the polynomials
have a limited range of values. If $d = 2$ then each coefficient is either 0 or 1.
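A hedged sketch of this idea in Python follows. The function name `digit_product` and the choice to store digits least significant first are assumptions of this illustration. Note that the raw coefficients $c_i$ can exceed $d - 1$, so a final carrying pass is needed to turn them back into proper base-$d$ digits.

```python
def digit_product(a_digits, b_digits, d):
    """Multiply two base-d numbers given as digit lists, least significant digit first.

    Returns the raw polynomial coefficients c and the carried base-d digits.
    """
    # c[i] = sum over j of a[j] * b[i - j]: the product-polynomial coefficients.
    c = [0] * (len(a_digits) + len(b_digits) - 1)
    for i, a in enumerate(a_digits):
        for j, b in enumerate(b_digits):
            c[i + j] += a * b
    # Carry so that every digit lies between 0 and d - 1.
    digits, carry = [], 0
    for coeff in c:
        carry, digit = divmod(coeff + carry, d)
        digits.append(digit)
    while carry:
        carry, digit = divmod(carry, d)
        digits.append(digit)
    return c, digits


# 1101 x 1001 in binary, with digits listed from the right-hand end.
coeffs, digits = digit_product([1, 0, 1, 1], [1, 0, 0, 1], 2)
print(coeffs, "".join(str(x) for x in reversed(digits)))
```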
In many ways base 2 is a favoured base precisely because it only requires
two distinct symbols. It does have the disadvantage that it involves long
strings (6 digits for example) to specify quite small numbers. Before discussing
the connections with logic however one should mention the representation
of numbers which are not integers (or whole numbers). In practice we
write these in decimal as decimals, i.e.
$3.75 = 3 \times 10^0 + 7 \times 10^{-1} + 5 \times 10^{-2} = 3 + \frac{7}{10} + \frac{5}{10^2}$
We can think of this as extending our polynomial to negative powers of d.
So we write any number $a$ as $a_n a_{n-1} a_{n-2} \dots a_0 . b_1 b_2 \dots$, where
$a = a_n d^n + a_{n-1} d^{n-1} + \dots + a_0 + b_1 d^{-1} + b_2 d^{-2} + \dots$. So if $d = 2$
$1.101 = 1 + \frac{1}{2} + \frac{0}{4} + \frac{1}{8} = 1 + \frac{5}{8} = 1.625$ in decimal
$1.235 \times 10 \times 1.235 \times 10^2 = (1.235)^2 \times 10^3$, but $1.235 \times 10 + 1.235 \times 10^2$ has to be adjusted to, say, $0.1235 \times 10^2 + 1.235 \times 10^2$ before the addition can take place.
One of the advantages of binary is that the hardware is easy to design. This is because if we think of how the addition works we get
\[ 0 + 0 = 0, \quad 1 + 0 = 0 + 1 = 1 \quad\text{and}\quad 1 + 1 = 10 \]
If we only consider the last digit we obtain a new sum
\[ 0 \oplus 0 = 0, \quad 1 \oplus 0 = 0 \oplus 1 = 1 \quad\text{and}\quad 1 \oplus 1 = 0 \]
Multiplying is as before
\[ 0 \times 0 = 1 \times 0 = 0 \times 1 = 0 \quad\text{and}\quad 1 \times 1 = 1 \]
This makes for easy design since all we need is an "on" for 1 and "off" for zero. If we consider the truth and falsity of simple statements we find that we construct a very similar model, which is very useful. Let us use capital letters for statements: P, Q, R, etc. We can write "P is true" or "P is false" and use t and f for these. We can combine two statements by "and" to give P and Q. For example, let P = "mathematicians are mad" and Q = "mad people are useless". Then P and Q is the statement "mathematicians are mad and mad people are useless".
Now we need to consider the truth of the compound statement P and Q
assuming we know the truth, or otherwise, of P and Q. The neatest way to
write this is in a "truth table".
P Q P and Q
t t t
t f f
f t f
f f f
So the statement P and Q is true if and only if both P and Q are true, which is our usual use of the term "and". Let us introduce another method of joining two statements together. This is called, for obvious reasons, eor (or exclusive or). That is, P eor Q is true if exactly one of P or Q is true. This has the truth table
P Q P eor Q
t t f
t f t
f t t
f f f
We notice that the truth value of our statements can only be either true or false. If we denote a true statement by 1 and a false statement by 0, we can rewrite these two tables, using $\times$ instead of "and" and $\oplus$ instead of "eor", to get
\[ 1 \oplus 1 = 0, \quad 1 \oplus 0 = 0 \oplus 1 = 1 \quad\text{and}\quad 0 \oplus 0 = 0 \]
and
\[ 1 \times 1 = 1, \quad 1 \times 0 = 0 \times 1 = 0 \times 0 = 0 \]
Fig. 1.38
So this system is exactly the same as the system derived from binary arithmetic. The final piece in the puzzle comes with the realisation that electric circuits can give rise to the same system. Take a simple system with a set of switches; any given switch is either on or off. So if we label the switches with capital letters, say P, then P is either on (1) or off (0). If we have two switches P and Q in series then the combined switch $P \times Q$ is on if and only if both P and Q are on. So $1 \times 1 = 1$ but $1 \times 0 = 0 \times 1 = 0 \times 0 = 0$. It is easy to construct a circuit which gives "or" in the non-exclusive sense: just take two parallel switches. However, to obtain "exclusive or" we need to be a little more clever. Let P' denote the switch which is on when P is off and vice versa. Then the circuit shown in Fig. 1.38 will give
\[ 1 \oplus 1 = 0 = 0 \oplus 0 \quad\text{and}\quad 1 \oplus 0 = 0 \oplus 1 = 1. \]
Thus we can do both arithmetic and logic by using electrical circuits. This is at the heart of all modern calculators and computers. A nice and very comprehensive account, with a computational bias, is given in Knuth (1977). In fact this approach to logic dates from G. Boole in the 1800s and is often called Boolean algebra.
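To see how the two operations combine into arithmetic, here is a small Python sketch. It is an illustration only: the function names are invented here, and the exclusive-or is built as (P and not Q) or (not P and Q), which is one standard arrangement and may not be the exact circuit of Fig. 1.38. The sum bit of two binary digits is their exclusive or, and the carry bit is their "and".

```python
def xor(p, q):
    """Exclusive or built from 'and', 'or' and 'not' (series, parallel and inverted switches)."""
    return (p and not q) or (not p and q)

def half_adder(p, q):
    """Add two bits: the sum bit is p XOR q, the carry bit is p AND q."""
    return int(xor(p, q)), int(p and q)

def full_adder(p, q, carry_in):
    """Add two bits and an incoming carry using two half adders."""
    s1, c1 = half_adder(p, q)
    s2, c2 = half_adder(s1, carry_in)
    return s2, int(c1 or c2)

def add_binary(a, b):
    """Add two binary strings such as '1101' and '1001'."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry, out = 0, []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        s, carry = full_adder(int(bit_a), int(bit_b), carry)
        out.append(str(s))
    if carry:
        out.append("1")
    return "".join(reversed(out))

print(add_binary("1101", "1001"))
```

Every step uses only "and", "or" and "not", so in principle each line corresponds to a small arrangement of switches.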
Chapter 2
CALCULUS: DIFFERENTIATION
1 INTRODUCTION
Given a function f(x), as in Fig. 2.1, one of its characteristics which may be of interest is its "rate of change" or its "wigglyness". We could get some idea of this by calculating the slope of the curve at any point $x_0$, the natural measure of slope being the tan of the angle $\theta$, $\tan\theta$. Thus $\tan\theta = 0$ means the curve is parallel to the x-axis.
Consider the graph shown in Fig. 2.2. Then at
$a_1$ the slope is 0
$a_2$ the slope is 1
$a_3$ the slope is 0
$a_4$ the slope is -1
$a_5$ the slope is $\sqrt{3} = \tan \pi/3$
This defines a function, for consider the black box shown in Fig. 2.3; on receiving $x_0$ the box produces the slope ($\tan\theta$) at $x_0$.
Given smooth curves without jumps or corners we can imagine that in principle the slope function can be determined. The next question is whether given f(x) we can find the slope. A crude technique might be to lay a ruler along the curve and then measure $\theta$. The curve in Fig. 2.4 is magnified at this point to make things easier to see.
From the figure we see $f(x_0) = y$, say, measures the "height" of the function at $x_0$. If we move a small distance $\delta x$ to $x_0 + \delta x$ then the "height" of the function at $x_0 + \delta x$ is $f(x_0 + \delta x)$. The slope is approximately
\[ \frac{f(x_0 + \delta x) - f(x_0)}{x_0 + \delta x - x_0} = \frac{QS}{PQ} \]
since if $\delta x$ is small we would expect the true slope RQ/PQ to be close to QS/PQ.
Taking small values for $\delta x$ we look at
\[ \frac{QS}{PQ} = \frac{y + \delta y - y}{x_0 + \delta x - x_0} = \frac{\delta y}{\delta x} \]
Fig. 2.1
Fig. 2.2
Fig. 2.3 (black box: the point $x_0$ goes in, the slope $\tan\theta$ at $x_0$ comes out)
Fig. 2.4
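Example 1
To try an example, let f(x) = 3x; then
\[ \frac{y + \delta y - y}{\delta x} = \frac{\delta y}{\delta x} = \frac{f(x_0 + \delta x) - f(x_0)}{\delta x} = \frac{3(x_0 + \delta x) - 3x_0}{\delta x} = 3 \]
Try this on your calculator for $x_0 = 10$, $x_0 = 0$ and a "small" $\delta x$.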
Example 2
A more complex example
\[ f(x) = 3x^2 \]
so
\[ \delta y = f(x_0 + \delta x) - f(x_0) = 3(x_0 + \delta x)^2 - 3x_0^2 = 3(x_0^2 + 2x_0\,\delta x + \delta x^2) - 3x_0^2 \]
Giving
\[ \frac{\delta y}{\delta x} = \frac{f(x_0 + \delta x) - f(x_0)}{\delta x} = 6x_0 + 3\,\delta x \]
When $\delta x$ is small we find the slope at $x_0$ is $6x_0$. That is, we can determine the slope for any value $x_0$, giving the function g(x) = 6x.
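The calculator experiment suggested in these examples can also be run as a short Python sketch (the function name `difference_quotient` is just a label chosen here). It prints the quotient for shrinking steps next to the value $6x_0 + 3\,\delta x$ predicted by the algebra.

```python
def difference_quotient(f, x0, dx):
    """Slope estimate (f(x0 + dx) - f(x0)) / dx for a small step dx."""
    return (f(x0 + dx) - f(x0)) / dx

def f(x):
    return 3 * x**2

x0 = 10.0
for dx in (1.0, 0.1, 0.001, 0.00001):
    # The algebra above predicts 6*x0 + 3*dx for this f.
    print(dx, difference_quotient(f, x0, dx), 6 * x0 + 3 * dx)
```

As the step shrinks the two columns agree and both approach 60, the slope at $x_0 = 10$.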
Example 3
A more complex function is $f(x) = 1/x^2$. Then
\[ \delta y = f(x_0 + \delta x) - f(x_0) = \frac{1}{(x_0 + \delta x)^2} - \frac{1}{x_0^2} \]
This can be rearranged to give
\[ \delta y = \frac{x_0^2 - (x_0 + \delta x)^2}{(x_0 + \delta x)^2 x_0^2} = \frac{-2x_0\,\delta x - \delta x^2}{(x_0 + \delta x)^2 x_0^2} \]
and hence dividing by $\delta x$
\[ \frac{\delta y}{\delta x} = \frac{-2x_0 - \delta x}{(x_0 + \delta x)^2 x_0^2} \]
Our approximation gets better the smaller $\delta x$ is, and letting $\delta x$ be zero
\[ \frac{\delta y}{\delta x} = -\frac{2x_0}{x_0^4} = -\frac{2}{x_0^3} \]
Thus, Fig. 2.5 illustrates the function, and Fig. 2.6 illustrates the slope. We have carefully avoided the point x = 0 where unpleasant things happen.
Example 4
As a final example we take a trigonometric function f(x) = cos x. At any point $x_0$
\[ \delta y = \cos(x_0 + \delta x) - \cos x_0 = \cos x_0 \cos\delta x - \sin x_0 \sin\delta x - \cos x_0 \]
using the formula for the cosine of a sum. As we are interested in small $\delta x$ we can use the approximation (see Chapter 1, section 3)
\[ \cos\delta x = 1, \qquad \sin\delta x = \delta x \]
Then the expression simplifies and we get
\[ \delta y = \cos x_0 - \delta x \sin x_0 - \cos x_0 \]
or
\[ \frac{\delta y}{\delta x} = -\sin x_0 \]
Fig. 2.5
Fig. 2.6
If you try this in degrees it doesn't work! The value of $x_0$ may make a difference as to how small $\delta x$ should be; $x_0 = 2$ requires smaller values for $\delta x$.
Exercise 1
(i) Find $\delta y/\delta x$ for y = f(x) = 1/x, x > 0.
(ii) Sketch sin(1/x) for x near 0, e.g. x values from 0 to 1. Would you expect to encounter difficulties in computing $\delta y/\delta x$ at x = 0? Try it using x = 0 and $\delta x = 0.00001$.
The quantity $\delta y/\delta x$ for small values of $\delta y$ and $\delta x$ tends to a function written dy/dx or f'(x). This quantity dy/dx is called the derivative of y. Remember it is a function. The two different ways of writing the function arise because there were two discoverers of calculus. The f'(x) notation was Newton's while dy/dx was used by Leibnitz.
We shall usually use dy/dx to mean the derivative. Sometimes this is not possible, so if y = f(x) we may use dy/dx or df/dx or f'(x). Similarly, if y = g(x), say,
\[ \frac{dy}{dx} = \frac{dg}{dx} = g'(x) \]
and if z = u(x)
\[ \frac{dz}{dx} = \frac{du}{dx} = u'(x) \]
We do not even have to use x; thus if s = h(t)
\[ \frac{ds}{dt} = \frac{dh}{dt} = h'(t) \]
Unfortunately we cannot always find the derivative; it may not even exist. The earlier example $y = 1/x^2$ had a problematical point at x = 0. As the function is not well defined here it is hardly surprising that dy/dx causes problems. You will remember the Heaviside function where
\[ f(x) = \begin{cases} 0 & x < 0 \\ 1 & x \ge 0 \end{cases} \]
Figure 2.7 illustrates the "jump" at x = 0, where the function moves from 0 to 1. Clearly attempting to find the slope here is not really sensible and the derivative dy/dx does not exist (i.e. there is not a value) at x = 0.
Fig. 2.7
f(x)                  f'(x)
a (constant)          0
$ax^n$ ($n \ne 0$)    $anx^{n-1}$
$(x+a)^n$             $n(x+a)^{n-1}$
log x                 1/x
$e^{ax}$              $ae^{ax}$
sin ax                a cos ax (x in radians)
cos ax                -a sin ax (x in radians)
tan ax                $a\sec^2 ax$ (x in radians)
$a^x$                 $a^x \log a$
Rule 2
If f(x) and g(x) are two functions and y = f(x) + g(x), then
\[ \frac{dy}{dx} = \frac{d}{dx}\bigl(f(x) + g(x)\bigr) = \frac{df}{dx} + \frac{dg}{dx} \]
or we could write this as
\[ \frac{d}{dx}(f + g) = \frac{df}{dx} + \frac{dg}{dx} \]
This is just a symbolic way of expressing the rule that the derivative of a sum is the sum of the derivatives.
Example 5
Let y = sin x + cos x. Then
\[ \frac{dy}{dx} = \frac{d \sin x}{dx} + \frac{d \cos x}{dx} = \cos x - \sin x \]
Rule 3
Suppose u(x) and v(x) are functions and y = f(x) = u(x)v(x). Then
\[ \frac{dy}{dx} = u(x)\frac{dv}{dx} + \frac{du}{dx}v(x) \]
Example 6
\[ y = x \sin x, \qquad \frac{dy}{dx} = x\cos x + 1\cdot\sin x \]
Example 7
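Rule 4
Suppose u(x) and v(x) are functions and that
\[ y = f(x) = u(x)/v(x) \]
Then
\[ \frac{dy}{dx} = \frac{\dfrac{du}{dx}v(x) - u(x)\dfrac{dv}{dx}}{v(x)^2} \]
Example 8
\[ y = f(x) = \frac{\sin x}{\cos x} \]
\[ \frac{dy}{dx} = \frac{(\cos x)\cos x - (\sin x)(-\sin x)}{\cos^2 x} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x} = \frac{1}{\cos^2 x} = \sec^2 x \]
Example 9
\[ y = f(x) = \frac{1}{\log x} \]
\[ \frac{dy}{dx} = \frac{0\cdot(\log x) - 1\cdot(1/x)}{(\log x)^2} = -\frac{1}{x(\log x)^2} \]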
\[ \frac{dy}{dx} = \cos[(x+a)^2]\cdot 2(x+a) \]
Rule 6
Often we would like to differentiate inverse functions. Suppose
\[ y = f^{-1}(x) \]
then
\[ \frac{dy}{dx} = \frac{1}{dx/dy} \]
That is, if $y = f^{-1}(x)$ then
\[ f(y) = x \]
and so
\[ \frac{df}{dy}\cdot\frac{dy}{dx} = 1, \qquad \frac{dy}{dx} = \frac{1}{df/dy} \]
Example 12
\[ y = \sin^{-1} x, \qquad \frac{dy}{dx} = \frac{1}{\dfrac{d}{dy}(\sin y)} = \frac{1}{\cos y} \]
\[ z = \log y = x^2 \log 13 \]
\[ \frac{dz}{dx} = 2x \log 13 \]
\[ \therefore\ \frac{dy}{dx} = (2\log 13)\,x\,13^{x^2} \]
Exercise 2
Differentiate
(i) $ax + b$
(ii) $ae^x + bx^2$
(iii) $\sin x \log x + \cos 14x$
(iv) $e^{-x^2/2}$
(v) $e^{\sin x}$
(vi) $e^x/\sin x$
(vii) $e^{ax + \sin x}$
(viii) $13 \sin x \cos x$
(ix) $2^x$
2 HIGHER DERIVATIVES
Since y = f(x) has a derivative dy/dx and this is a function, dy/dx can itself have a derivative, e.g. if $y = x^2$, then dy/dx = 2x. g(x) = 2x is a perfectly respectable function and
\[ \frac{dg}{dx} = 2 \]
Once again h(x) = dg/dx is a function and has a derivative dh/dx = 0.
The derivative of the derivative, d/dx(dy/dx), is written $d^2y/dx^2$ and is called the second derivative. This second derivative can itself be differentiated to give $d^3y/dx^3$, which in turn gives $d^4y/dx^4$, etc. Figure 2.8 illustrates the original function, $y = x^2$, and its first and second derivatives. Thus one example is
\[ y = \log x, \quad \frac{dy}{dx} = \frac{1}{x}, \quad \frac{d^2y}{dx^2} = -\frac{1}{x^2}, \quad \frac{d^3y}{dx^3} = \frac{2}{x^3}, \quad \frac{d^4y}{dx^4} = -\frac{6}{x^4} \]
Fig. 2.8
\[ \frac{d^3y}{dx^3} = -\cos x, \quad \frac{d^4y}{dx^4} = \sin x, \quad \frac{d^5y}{dx^5} = \cos x \]
Exercise 3
(i) Find dy/dx, $d^2y/dx^2$ and $d^3y/dx^3$ if $y = 6x^4 + 2x^2 + 1$.
(ii) Find $d^2y/dx^2$ if $y = (\cos x)^2$.
(iii) If $y = x \log x$ what is $d^3y/dx^3$?
(iv) If $y = e^x + e^{-x}$ show that $d^2y/dx^2 = y$.
if x > -3, say x = -2, then dy/dx = -4
Fig. 2.9
Fig. 2.10
while if x > 2, say x = 3, then dy/dx = 6.
Thus x = 2 is a minimum. These are shown in Fig. 2.10. Notice these points are the local maxima and minima. The function value at x = 10 is bigger than the value at x = -3.
Example 15
\[ y = \frac{1}{x-2}, \qquad \frac{dy}{dx} = -\frac{1}{(x-2)^2} \]
We cannot find an x for which dy/dx = 0 and so there are no turning points.
If we wish to sketch a curve it is very useful to know the turning points, for example consider
\[ y = x^4 + 2x^3 - 3x^2 - 4x + 4, \qquad \frac{dy}{dx} = 4x^3 + 6x^2 - 6x - 4 \]
Fig. 2.11
Fig. 2.12
Example 16
\[ y = x^2 + \frac{250}{x} \]
\[ \frac{dy}{dx} = 2x - \frac{250}{x^2} = 0 \quad\text{when } x = 5 \]
\[ \frac{d^2y}{dx^2} = 2 + \frac{500}{x^3} = 2 + \frac{500}{125} > 0 \]
So x = 5 is a minimum.
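Example 17
Steel cans are h cm high with radius of base r cm. The volume is $V = \pi r^2 h$. The surface area is $2\pi r h + 2\pi r^2$. Suppose we want a minimum area for a given volume, say 64 cubic cm. Then $h = 64/\pi r^2$ and so
\[ \text{Area} = A = \frac{128}{r} + 2\pi r^2, \]
then
\[ \frac{dA}{dr} = -\frac{128}{r^2} + 4\pi r \]
so dA/dr = 0 when $4\pi r = 128/r^2$, so
\[ r^3 = \frac{32}{\pi} \quad\text{or}\quad r = \sqrt[3]{32/\pi} = 2.17 \text{ cm} \]
and finally,
\[ h = \frac{64}{\pi r^2} = 4.34 \text{ cm} \]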
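The can calculation is easy to check numerically. The following Python sketch (the helper name `can_area` is an assumption made for illustration) computes the radius from $r^3 = 32/\pi$, the matching height, and compares the area at that radius with the area at slightly smaller and slightly larger radii.

```python
import math

def can_area(r, volume=64.0):
    """Surface area 2*pi*r*h + 2*pi*r**2 with h fixed by the volume."""
    h = volume / (math.pi * r**2)
    return 2 * math.pi * r * h + 2 * math.pi * r**2

r_best = (32 / math.pi) ** (1 / 3)        # from dA/dr = 0
h_best = 64 / (math.pi * r_best**2)
print(round(r_best, 2), round(h_best, 2))

# Nearby radii should give a larger area if r_best is a minimum.
print(can_area(r_best) < can_area(r_best - 0.1))
print(can_area(r_best) < can_area(r_best + 0.1))
```

Note that the height comes out as twice the radius, so the best can is as tall as it is wide.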
Fig. 2.13
Suppose we have the function y = f(x) and we know the value of the function at $x_0$ but the value of $f(x_0 + h)$ is not known. One could imagine that f(x) might be sales at time x and that we are trying to extrapolate. If $\theta$ is the angle of slope of the curve at $x_0$, then an approximation to $\delta y$ is $h \tan\theta = h\,dy/dx$, see Fig. 2.13, and so
\[ f(x_0 + h) \approx f(x_0) + h f'(x_0) \]
This is a simple but very useful approximation which is the key to Newton's approximation method.
Suppose we wish to solve an equation f(x) = 0. We might guess a solution $x_0$. If $x_0$ is near the solution then $x_0 + h$ might be the solution. Setting $f(x_0 + h) = 0$ in the approximation $f(x_0 + h) \approx f(x_0) + h f'(x_0)$ gives
\[ f(x_0) + h f'(x_0) = 0 \]
so
\[ h = -\frac{f(x_0)}{f'(x_0)} \]
and a better approximation is
\[ x = x_0 + h = x_0 - \frac{f(x_0)}{f'(x_0)} \]
We can then use this value to go through the process again.
Example 18
$f(x) = x^2 - 9 = 0$, $f'(x) = 2x$. Try $x_0 = 2$:
\[ h = -\frac{f(2)}{f'(2)} = -\frac{4 - 9}{4} = +\frac{5}{4} \]
Thus h = 5/4 and the new approximation for x is 3.25. Using $x_0 = 3.25$, h = -0.24..., to get x = 3.0096...
Attempt number   Value
1                2 (first guess)
2                3.25
3                3.0096
4                3.00001536
5                3.00000000
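The repeated process can be written as a short loop. This Python sketch is illustrative only (the function name `newton` and its arguments are chosen here); it repeats the step $x \to x - f(x)/f'(x)$ and prints each attempt for $f(x) = x^2 - 9$ starting from the guess 2.

```python
def newton(f, fprime, x0, steps):
    """Return the list of Newton iterates x0, x1, ..., starting from the guess x0."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))   # x + h with h = -f(x)/f'(x)
    return xs

iterates = newton(lambda x: x**2 - 9, lambda x: 2 * x, 2.0, 4)
for attempt, value in enumerate(iterates, start=1):
    print(attempt, f"{value:.8f}")
```

The number of correct digits roughly doubles at each attempt, which is why so few steps are needed.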
Example 19
\[ f(x) = \sin x - \frac{x^2}{4}, \qquad f'(x) = \cos x - \frac{x}{2} \]
Attempt number   Value
1                1.5
2                2.14039
3                1.95201
4                1.93393
5                1.93375
This process can be carried further but we will not pursue it. However
in the information sheet provided various extended series of this nature are
shown, especially for the better known functions.
5 PARTIAL DERIVATIVES
For y = f(x) we find the slope by differentiating. Suppose we have z = f(x, y), say $z = x^3y + y^2$. If we draw a contour map we eventually have a map of the function rather like a map of the Lake District. If one were to move in the direction of the y-axis at some fixed value of x, say x = a, then we are clearly going to go upwards or downwards depending on whether we go in the positive or negative directions.
In fact if x = a, then
\[ z = a^3y + y^2 \]
i.e. z is a function of y alone as a is the fixed value of x. At x = 0 we have the graph as shown in Fig. 2.17; for x = 1, $z = y + y^2$ and dz/dy = 1 + 2y. This is shown in Fig. 2.18. Whatever value of a we take we obtain a function which we can differentiate. This slope is the slope of the surface in the direction of a constant x.
Fig. 2.16
Effectively we just differentiate for constant x. To distinguish between dz/dy and dz/dy for a fixed x we write the latter as $\partial z/\partial y$, and call it partial "dz" by partial "dy". Thus
\[ \frac{\partial z}{\partial y} = x^3 + 2y \]
In the same way we can differentiate with respect to x for a fixed y. This gives $\partial z/\partial x = 3x^2y$, i.e. differentiating with respect to x while holding y constant.
This gives the slope of the surface as we travel parallel to the x-axis for a fixed value of y. If y = 0, $\partial z/\partial x = 0$, i.e. the surface is flat along the x-axis. If y = 1,
\[ z = x^3 + 1, \qquad \frac{\partial z}{\partial x} = 3x^2 \]
and we have a valley.
As usual there is a variety of notation: for z = f(x, y) we can have
\[ \frac{\partial z}{\partial x} = \frac{\partial f}{\partial x} = f_x \]
and
\[ \frac{\partial z}{\partial y} = \frac{\partial f}{\partial y} = f_y \]
Fig. 2.17
Fig. 2.18
We illustrated a function of two variables because we could draw the pictures. We can have partial derivatives for functions of several variables. Suppose v = f(w, x, y, z); then $\partial v/\partial x$ or $\partial f/\partial x$ is just the result of differentiating f(w, x, y, z) and regarding everything but x as constant.
Example 20
\[ z = 4x^3 + 2xe^y + y^2 + \log t \]
\[ \frac{\partial z}{\partial x} = 12x^2 + 2e^y \]
\[ \frac{\partial z}{\partial y} = 2xe^y + 2y \]
\[ \frac{\partial z}{\partial t} = \frac{1}{t} \]
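Partial derivatives can be checked numerically by changing one variable a little while holding the others fixed. The sketch below is an illustration, not part of the original text: the helper `partial`, its step size and the test point are all assumptions chosen here. It compares a central-difference estimate with the three formulae of Example 20.

```python
import math

def z(x, y, t):
    return 4 * x**3 + 2 * x * math.exp(y) + y**2 + math.log(t)

def partial(f, args, index, step=1e-6):
    """Central-difference estimate of the partial derivative with respect to args[index]."""
    up, down = list(args), list(args)
    up[index] += step      # nudge only the chosen variable
    down[index] -= step
    return (f(*up) - f(*down)) / (2 * step)

point = (1.0, 0.5, 2.0)    # an arbitrary test point (x, y, t)
x, y, t = point
print(partial(z, point, 0), 12 * x**2 + 2 * math.exp(y))
print(partial(z, point, 1), 2 * x * math.exp(y) + 2 * y)
print(partial(z, point, 2), 1 / t)
```

Each line prints the numerical estimate followed by the value of the formula; the pairs should agree to several decimal places.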
Exercise 6
Find $\partial z/\partial x$, $\partial z/\partial y$ for
(i) $z = x^2 + y^2$
(ii) $z = xy$
(iii) $z = \sin x\, e^y$
Exercise 7
Find all the first order partial derivatives, i.e. $\partial f/\partial x$, $\partial f/\partial y$, ...
(i) $f(x, y, z) = x^2 + y^2 + z^2$
(ii) $f(x, y, z, w) = xy^2z^3w^4$
(iii) $f(x, y, z) = x/(1 + ye^{-2z})$.
As we are dealing with functions of many variables and with a very rich
class of possible derivatives the theory of partial derivatives does get very
complex. We shall not give very much of this theory but it is worth pointing
out some of the ideas.
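We start with an example. Suppose z = f(x, y) and we think we would rather work in polar coordinates
\[ x = r\cos\theta, \qquad y = r\sin\theta \]
Given $\partial z/\partial x$ and $\partial z/\partial y$, can we get $\partial z/\partial r$ and $\partial z/\partial\theta$? If, say, z = x + 3y then we can substitute, getting
\[ z = r\cos\theta + 3r\sin\theta \]
Now
\[ \frac{\partial z}{\partial r} = \cos\theta + 3\sin\theta \]
and
\[ \frac{\partial z}{\partial\theta} = -r\sin\theta + 3r\cos\theta. \]
Sometimes this is difficult and various relations have been found to save work. These are of the form
\[ \frac{\partial z}{\partial r} = \frac{\partial f}{\partial x}\cdot\frac{\partial x}{\partial r} + \frac{\partial f}{\partial y}\cdot\frac{\partial y}{\partial r}, \qquad \frac{\partial z}{\partial\theta} = \frac{\partial f}{\partial x}\cdot\frac{\partial x}{\partial\theta} + \frac{\partial f}{\partial y}\cdot\frac{\partial y}{\partial\theta} \]
Many variants of these formulae exist and have some useful application. We will omit details.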
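As a quick check, added here for illustration, the relations can be applied to the example z = x + 3y. For that function $\partial f/\partial x = 1$ and $\partial f/\partial y = 3$, while the polar substitution gives $\partial x/\partial r = \cos\theta$, $\partial y/\partial r = \sin\theta$, $\partial x/\partial\theta = -r\sin\theta$ and $\partial y/\partial\theta = r\cos\theta$. Hence
\[ \frac{\partial z}{\partial r} = 1\cdot\cos\theta + 3\cdot\sin\theta, \qquad \frac{\partial z}{\partial\theta} = 1\cdot(-r\sin\theta) + 3\cdot(r\cos\theta) \]
which agrees with the result of substituting first and differentiating afterwards.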
Any equation involving differential coefficients is called a differential equation, e.g. dy/dx = y. These equations are very important in physical applications.
Any equation involving partial differential coefficients is called a partial differential equation, p.d.e., for example
\[ 3y^2\frac{\partial u}{\partial x} + \frac{\partial u}{\partial y} = 2u \]
\[ u^2\frac{\partial^2 u}{\partial\theta^2} + \left(\frac{\partial u}{\partial\theta}\right)^2 = 0 \]
If the equation involves only ordinary differential coefficients the equation is called an ordinary differential equation, o.d.e. If
\[ z = f(x, y) = \tan^{-1}\frac{x}{y} \]
then
\[ \frac{\partial z}{\partial x} = \frac{y}{x^2 + y^2} \]
\[ \frac{\partial^2 z}{\partial x^2} = -\frac{2xy}{(x^2 + y^2)^2} \]
\[ \frac{\partial^2 z}{\partial x\,\partial y} = \frac{x^2 - y^2}{(x^2 + y^2)^2} \]
and
\[ \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0 \]
Three important examples of p.d.e.s are
Example 22 The wave equation (in one dimension)
\[ \frac{\partial^2 u}{\partial x^2} - \frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} = 0 \]
Example 23 Laplace's equation (two-dimensional form)
\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \]
Example 24 One-dimensional diffusion equation
\[ \frac{\partial^2 u}{\partial x^2} = \frac{1}{k}\frac{\partial u}{\partial t} \]
We will examine Example 22 and some of its solutions in the next chapter, as well as discussing the solutions to ordinary differential equations.
7 OPTIMISATION
In many situations we need the maximum or minimum of functions of several variables. One might want the minimum value of "Rosenbrock's banana-valley function",
\[ f(x, y) = 100(y - x^2)^2 + (x - 1)^2 \]
This function was used for testing various numerical techniques. It is easier to see how to find these values for functions of only two variables. The technique is much more general, but for the moment we just look at the simplest cases.
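As an illustration of what is being looked for, the Python sketch below evaluates the banana-valley function and its two first partial derivatives. The derivative formulae were worked out by hand for this sketch and are not taken from the text. Both terms of the function are squares, so the value can never be negative, and it is zero only where y = $x^2$ and x = 1, that is at the point (1, 1).

```python
def rosenbrock(x, y):
    return 100 * (y - x**2) ** 2 + (x - 1) ** 2

def gradient(x, y):
    """First partial derivatives of the function above, worked out by hand."""
    df_dx = -400 * x * (y - x**2) + 2 * (x - 1)
    df_dy = 200 * (y - x**2)
    return df_dx, df_dy

print(rosenbrock(1.0, 1.0), gradient(1.0, 1.0))
# Points on the valley floor y = x**2, and points off it, all give larger values.
print(rosenbrock(0.9, 0.81), rosenbrock(1.1, 1.0))
```

The first line shows that both partial derivatives vanish at (1, 1), where the function takes its smallest possible value.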
Suppose we have a function with contours as shown in Fig. 2.19. If we look at the function through the maximum for constant x and constant y (indicated by dotted lines), these have the form shown in Fig. 2.20. Thus at a maximum
\[ \frac{\partial f}{\partial x} = 0 \quad\text{and}\quad \frac{\partial f}{\partial y} = 0 \quad\text{and}\quad \frac{\partial^2 f}{\partial x^2} < 0 \]
Similarly in a hole,
\[ \frac{\partial f}{\partial x} = 0, \qquad \frac{\partial f}{\partial y} = 0 \]
These are illustrated in Fig. 2.21. Unfortunately there is another possibility, the saddle point, shown in Fig. 2.22. At the saddle point the surface is flat but as we move away we find that we go upslope in some directions and
Fig. 2.19
Fig. 2.20
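\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = 0 \]
Let
Then
\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = 0 \]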
~<o ~<o
Exercise 8
Find the maximum and minimum points of
(i) $f(x, y) = x^4 + 4x^2y^2 - 2x^2 + 2y^2 - 1$
(ii) $f(x, y) = x^2y^2 + y^2 - x^2 - 1$
[Note: if a = 0 more subtle tests have to be used to pinpoint the nature of
that point.]
Chapter 3
INTEGRATION
(i is called the dummy variable since $\sum_{i=0} a_i = \sum_{j=0} a_j$.)
We could also have approximated by choosing different rectangles:
\[ \frac{b-a}{n}\sum_{i=1}^{n} f(x_i) = \frac{b-a}{n}f(x_1) + \cdots + \frac{b-a}{n}f(x_n) \]
Fig. 3.1
Fig. 3.2
Fig. 3.3
Now let $n \to \infty$. For most functions we can expect that the sum $\sum_{i=0}^{n-1} \frac{b-a}{n} f(x_i)$ will tend to A.
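The limiting process can be tried out numerically. The Python sketch below is an illustration only (the name `left_sum` is chosen here); it adds up the rectangle areas $\frac{b-a}{n} f(x_i)$ for f(x) = x between 0 and 1, where the area is that of a triangle of base 1 and height 1, namely 1/2.

```python
def left_sum(f, a, b, n):
    """Sum of (b - a)/n * f(x_i) over the points x_0, ..., x_{n-1}."""
    width = (b - a) / n
    return sum(width * f(a + i * width) for i in range(n))

for n in (2, 3, 10, 1000):
    print(n, left_sum(lambda x: x, 0.0, 1.0, n))
```

The printed values creep up towards 0.5 as n grows, but quite slowly.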
Example (simple) 1
Choose y = f(x) = x and a = 0, b = 1, as in Fig. 3.4. If we take n = 2, $x_0 = 0$, $x_1 = \frac{1}{2}$, $x_2 = 1$
If, instead, n = 3,
$x_0 = 0$
the sum becomes
\[ A = \frac{1-0}{3}\times 0 + \frac{1-0}{3}\times\frac{1}{3} + \frac{1-0}{3}\times\frac{2}{3} = \frac{1}{3} \]
In general
\[ \sum_{i=0}^{n-1} \frac{1-0}{n}\cdot\frac{i}{n} = \sum_{i=0}^{n-1}\frac{i}{n^2} = \frac{1}{n^2}(1 + 2 + \cdots + n-1) = \frac{n(n-1)}{2n^2} = \frac{n-1}{2n} \]
(Note: This depends on the result from algebra that $1 + 2 + \cdots + n - 1 = n(n-1)/2$.) Hence, letting $n \to \infty$, we find that the area of A is $\frac{1}{2}$, which is the result we would expect. Even this simple example involves considerable manipulative
Fig. 3.4
Fig. 3.5
Fig. 3.6
(i) Given a function f(x) the area between a and b, as defined above, is called the definite integral of f and is written
\[ \int_a^b f(x)\,dx \]
(iii) Addition. By looking at the appropriate picture, Fig. 3.6, we can see that
\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \]
\[ \int_a^b C f(x)\,dx = C\int_a^b f(x)\,dx \]
Fig. 3.7
Fig. 3.8
(v) Addition of functions. One last simple rule is that if f(x) and g(x) are two functions then
\[ \int_a^b \bigl(f(x) + g(x)\bigr)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx \]
=~+3=4
When we have learnt how to evaluate more integrals these rules become
more and more useful.
Fig. 3.9
function of b with $\int_a^b f(t)\,dt$ acting as a black box, so that for every value of b, $F(b) = \int_a^b f(t)\,dt$ gives a further number.
To return to more familiar notation, we write $F(x) = \int_a^x f(t)\,dt$. Figure 3.9 shows the relationship between these functions, f(x) our original function and F(x) the area under the curve at different points. We will now state (but not prove) the Fundamental Theorem of Calculus:
\[ G(x) = \int_b^x f(t)\,dt \]
Then it is still true that d/dx G(x) = f(x) but G(x) does not necessarily
\[ = \int_b^a f(t)\,dt + F(x) \]
So G(x) = c + F(x) where c is a constant, given by $c = \int_b^a f(t)\,dt$.
There is a language for this. If F(x) is any function such that
\[ \frac{d}{dx}F(x) = f(x) \]
we say that F(x) is a primitive for f. For example log x is a primitive for 1/x, as is log x + 23. Since the derivative of a constant is zero, any two primitives will be related by a constant. If F(x) is a primitive for f(x), $F(x) = \int f(x)\,dx$ (without limits) is called an indefinite integral of f and is defined up to a constant, i.e. $\int f(x)\,dx = F(x) + c$ where c is any constant. If $G(x) = \int_a^x f(t)\,dt$, then
\[ G(b) = \int_a^b f(x)\,dx \quad\text{and}\quad G(a) = 0 \]
\[ \int x^n\,dx = \frac{x^{n+1}}{n+1} + c \]
This enables us to evaluate the area under a parabola (curve of the form $y = ax^2 + bx + c$) between, say, $x_0$ and $x_1$.
\[ \int_{x_0}^{x_1} f(x)\,dx = \int_{x_0}^{x_1} (ax^2 + bx + c)\,dx = \int_{x_0}^{x_1} ax^2\,dx + \int_{x_0}^{x_1} bx\,dx + \int_{x_0}^{x_1} c\,dx \]
Now
\[ \int_{x_0}^{x_1} x^2\,dx = \frac{x_1^3}{3} - \frac{x_0^3}{3} \]
Table 1
f(x)                              F(x) = $\int f(x)\,dx$
$x^n$ ($n \ne -1$)                $x^{n+1}/(n+1)$
$1/x = x^{-1}$                    $\log_e(x)$
$e^{ax}$ ($a \ne 0$)              $e^{ax}/a$
$a^x$ ($a > 0$)                   $a^x/\log_e a$
sin ax ($a \ne 0$)                $-(\cos ax)/a$
cos ax ($a \ne 0$)                $(\sin ax)/a$
tan x                             $\log_e|\sec x|$
cot x                             $\log_e|\sin x|$
$\sec^2 x$                        tan x
$\mathrm{cosec}^2 x$              -cot x
$1/\sqrt{a^2 - x^2}$              $\sin^{-1}(x/a)$ ($-a < x < a$)
$1/(a^2 + x^2)$                   $(1/a)\tan^{-1}(x/a)$
$\log_e x$                        $x\log_e x - x$
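Similarly,
\[ \int_{x_0}^{x_1} x\,dx = \frac{x_1^2}{2} - \frac{x_0^2}{2} \]
and
\[ \int_{x_0}^{x_1} dx = x_1 - x_0 \]
So
\[ \int_{x_0}^{x_1} (ax^2 + bx + c)\,dx = \frac{a}{3}(x_1^3 - x_0^3) + \frac{b}{2}(x_1^2 - x_0^2) + c(x_1 - x_0) \]
\[ = -\cos x + \sin x + C \]
C is some constant.
(ii)
\[ \int_0^{1/2} \frac{dx}{\sqrt{1 - x^2}} = \sin^{-1}\left(\tfrac{1}{2}\right) - \sin^{-1}(0) = \frac{\pi}{6} - 0 = \frac{\pi}{6} \]
(iii)
\[ \int_1^{10} \frac{1}{x}\,dx = \log_e 10 - 0 = 2.30\ldots \]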
Fig. 3.10
Fig. 3.11
Exercise 1
Evaluate the following integrals and check your solutions by differentiating:
0) f x 3 dx (iii) f2/: x2
Evaluate
f" -2--2
o o J4-x 2
2a dx
(vii)
a a +x
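Fig. 3.12
Example 5
\[ \int_1^\infty \frac{1}{x^2}\,dx \]
Consider the picture, Fig. 3.12. The area $A = \int_1^X \frac{1}{x^2}\,dx$. Now a primitive is F(x) = -1/x. So
\[ A = -\frac{1}{X} - \left(-\frac{1}{1}\right) = 1 - \frac{1}{X} \]
Now as $X \to \infty$, $1/X \to 0$ so we have
\[ \int_1^\infty \frac{1}{x^2}\,dx = 1 \]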
If F is a primitive of f we have
\[ \int_a^b f(x)\,dx = [F(x)]_a^b \]
So
\[ \int_0^2 (x^4 + x^3 + 1)\,dx = \left[\frac{x^5}{5} + \frac{x^4}{4} + x\right]_0^2 = 6.4 + 4 + 2 = 12.4 \]
Integrating step functions is very easy. Let
0 if x <-1
Fig. 3.13
Some important integrals which arise in the work on Fourier Series involve the trigonometric functions. We give just one example. Recall two formulae from Chapter 1:
\[ \sin(A + B) = \sin A\cos B + \cos A\sin B \]
\[ \sin(A - B) = \sin A\cos B - \cos A\sin B \]
So
\[ \sin(A + B) + \sin(A - B) = 2\sin A\cos B \]
Hence
=0
Exercise 2
Evaluate the following integrals
(i) J~1T sin 3x cos 4x dx
(ii) J~ e- 2x dx, you can assume e- 2x tends to zero as x tends to infinity
(iii)J; e- 2x dx
(iv) J~f(x) dx where
f(x) = 1, $0 \le x \le 1$
     = 2, $1 \le x \le 2$
     = 3, $2 \le x \le 3$
Fig. 3.14
Our aim with these examples was to demonstrate that the evaluation of integrals involves much mathematical technique and a lot of practice. However, even with all of these techniques there are still functions whose integrals are well behaved but for which there are no easily written down formulae. Fortunately there are books (Gradshteyn and Ryzhik) which tabulate the values of such integrals. This section will discuss some of the methods used in calculating such integrals.
Since the idea of integration is very close to that of summation ($\Sigma$ is the Greek equivalent to "s" and $\int$ is a deformed "s") it is not surprising that the numerical evaluation of integrals is a well developed subject.
Clearly the original definition could be used to evaluate an integral numerically but we can do better. If we look at Fig. 3.14 we can see that rather than just using $\sum_{i=0}^{n-1} \frac{b-a}{n} f(x_i)$, with $a = x_0, x_1, \ldots, x_n = b$ and $x_i - x_{i-1} = (b-a)/n$, we can approximate the area of $A_i$ better by adding the top triangle. So we estimate $A_i$ by
This is called the trapezoidal rule. Just to illustrate that it is better, a simple program was run on a home computer to evaluate $\int_0^1 x^3\,dx$. The results are tabulated in Table 2, showing that even for a small n, the trapezoidal rule gives a good approximation. This is not, however, true for the simple formula.
An alternative way to consider these two approximations is to think of
them as approximations to the graph. The simple version gives a step
function approximation, see Fig. 3.15. The trapezoidal rule gives a sequence
of straight lines as in Fig. 3.16. We can do even better by dividing the whole
curve into 2n divisions and using quadratic curves to join the points in
threes as in Fig. 3.17. Rather than go through the mathematical details, we
Table 2
n      Simple formula   Trapezoidal rule
4      0.1406...        0.2656...
16     0.2197...        0.2509...
64     0.2422...        0.2500...
256    0.2480...        0.2500...
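A comparison of this kind is easy to repeat. The Python sketch below is not the original home-computer program; it is a minimal reconstruction under the assumption that the "simple formula" is the step-function sum over $x_0, \ldots, x_{n-1}$ and that the trapezoidal estimate of each strip is its width times the average of its two end heights. The function names are chosen here.

```python
def simple_sum(f, a, b, n):
    """Step-function estimate: (b - a)/n * f(x_i) summed over x_0 ... x_{n-1}."""
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

def trapezoidal(f, a, b, n):
    """Trapezoidal estimate: each strip uses the average of its two end heights."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))

def cube(x):
    return x**3

for n in (4, 16, 64, 256):
    print(n, round(simple_sum(cube, 0.0, 1.0, n), 4), round(trapezoidal(cube, 0.0, 1.0, n), 4))
```

The exact value of the integral is 0.25, so the second column shows how much faster the trapezoidal estimate settles down.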
Fig. 3.15
Fig. 3.16
Fig. 3.17
\[ \int_a^b f(x)\,dx \approx \frac{b-a}{6n}\bigl(f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 4f(x_{2n-1}) + f(x_{2n})\bigr) \]
where $x_0 = a$, $x_{2n} = b$, and $x_i - x_{i-1} = (b-a)/2n$.
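The quadratic-curve formula (usually known as Simpson's rule) can be sketched in Python as follows. The function name and the choice of test integrals are assumptions made for illustration: the weights are 1 at the two ends, 4 at odd-numbered points and 2 at the even-numbered interior points.

```python
import math

def simpson(f, a, b, n):
    """Quadratic-curve rule with 2n strips of width (b - a)/(2n)."""
    h = (b - a) / (2 * n)
    total = f(a) + f(b)
    for i in range(1, 2 * n):
        weight = 4 if i % 2 == 1 else 2   # 4 at odd points, 2 at even interior points
        total += weight * f(a + i * h)
    return (b - a) / (6 * n) * total

print(simpson(lambda x: x**3, 0.0, 1.0, 1))
print(simpson(lambda x: math.tan(math.pi * x / 4), 0.0, 1.0, 8))
print(4 / math.pi * math.log(1 / math.cos(math.pi / 4)))
```

The last two lines compare the numerical estimate of $\int_0^1 \tan(\pi x/4)\,dx$ with the closed form $(4/\pi)\log_e \sec(\pi/4)$.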
Table 3
\[ \int_0^1 \tan\frac{\pi x}{4}\,dx = \frac{4}{\pi}\log_e \sec\left(\frac{\pi}{4}\right) = 0.4412\ldots \]
4 DOUBLE INTEGRATION
We interpret this expression as follows: first integrate f(x, y) with respect to x, considering y as a constant; then $\int_{c(y)}^{d(y)} f(x, y)\,dx = F(y)$ is a function of y; finally calculate $\int_a^b F(y)\,dy$. Normally it does not matter in what order we do this process as long as we watch the limits carefully. If d(y) and c(y) are constants, d and c respectively, then
Example 6
Let f(x, y) = xy. Then
Example 7
f(x, y) = h.
Then
\[ \int_a^b\int_c^d h\,dx\,dy = \int_a^b [hx]_c^d\,dy = [h(d - c)y]_a^b = h(d - c)(b - a) \]
Example 7 is worth noting since it illustrates the important fact that just as the single integral gives an area, the double integral gives a volume. This comment needs to be explained. The region R in Fig. 3.18 determined by $a \le x \le b$ and by $c \le y \le d$ is just a rectangle on the plane with area (b - a)(d - c). If we now erect a solid of height h on this rectangle we have a solid with volume (b - a)(d - c)h, as in Fig. 3.19.
This gives an alternative definition which is more like our original definition for the single integral (it is also more symmetrical).
Let f(x, y) be a function of two variables x and y. Let R be a plane region in the (x, y)-plane illustrated in Fig. 3.20. We divide the region R up into little rectangles of size $\delta x \cdot \delta y$ where $\delta x$ and $\delta y$ are small. The height will be approximately f(x, y). Thus we have a column of volume $f(x, y)\,\delta x\,\delta y$. Now we sum over all the bits to get the volume
\[ \iint_R f(x, y)\,dx\,dy \]
(iii) $\int_0^1 \int_y^{y^2} x\,dx\,dy$
Fig. 3.18
Fig. 3.19
Fig. 3.20
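Solutions
(i)
\[ = \int_0^1 \left(\tfrac{1}{2} + 2y - 0 - 0\right) dy = \left[\tfrac{1}{2}y + y^2\right]_0^1 = \tfrac{1}{2} + 1 = \tfrac{3}{2} \]
(ii)
\[ \int_1^2\left(\int_{-1}^{1} \frac{x^2}{y}\,dx\right) dy = \int_1^2 \left[\frac{x^3}{3y}\right]_{-1}^{1} dy = \int_1^2 \left(\frac{1}{3y} - \frac{-1}{3y}\right) dy = \int_1^2 \frac{2}{3y}\,dy = \left[\tfrac{2}{3}\log y\right]_1^2 = \tfrac{2}{3}\log 2 \]
(iii)
\[ \int_0^1\left(\int_y^{y^2} x\,dx\right) dy = \int_0^1 \left[\frac{x^2}{2}\right]_y^{y^2} dy = \int_0^1 \frac{y^4 - y^2}{2}\,dy = \left[\frac{y^5}{10} - \frac{y^3}{6}\right]_0^1 = \frac{1}{10} - \frac{1}{6} = \frac{3 - 5}{30} = -\frac{1}{15} \]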
It is sometimes quite useful to be able to sketch the region defined by
the limits of the integration. If we take Example 8(iii) we obtain Fig. 3.21,
R is the shaded region.
The technical difficulties of evaluating double integrals are even greater
than for single integrals. Once again there is a vast body of numerical
method available to do this. However this is not the place to investigate
those methods.
Also as we have extended the single integral to double integrals, there is
no difficulty, theoretically that is, of extending the idea further. We can
similarly define triple integrals
\[ \iiint f(x, y, z)\,dx\,dy\,dz \]
This is relevant to the real world as many real situations lead us to consider
functions of three variables. Once again this drifts beyond the scope of this
book.
Recall that the object of this book is to give you familiarity with these
mathematical concepts, but not to make you feel that you can treat them
with contempt.
5 LINE INTEGRALS
\[ \int_C P(x, y)\,dx + Q(x, y)\,dy \]
the "line integral". The reason for this formulation is that the pair is often thought of as a vector, see Chapter 5. To simplify the description at this stage we will assume that the curve C is given by y = f(x) and C is either increasing or decreasing, see Fig. 3.23. Such a function is called monotonic. Technically it means that either $f(x_1) \le f(x_2)$ for every pair $x_1 < x_2$, or $f(x_1) \ge f(x_2)$ for every pair $x_1 < x_2$. The curve cannot both increase and decrease.
Fig. 3.21
Fig. 3.22
Fig. 3.23
Example 9
(i) y = 3x + 1; then if $x_1 < x_2$, $3x_1 + 1 < 3x_2 + 1$. So 3x + 1 is monotonic.
(ii) y = 1/x, x > 0; then if $x_1 < x_2$, $1/x_1 > 1/x_2$ and so 1/x is monotonic.
(iii) $y = x^2$ is not monotonic, as if $x_1 = \frac{1}{2}$ and $x_2 = 1$, $(\frac{1}{2})^2 < 1$, but if $x_1 = -1$ and $x_2 = -\frac{1}{2}$, $x_1 < x_2$ and $(-1)^2 > (-\frac{1}{2})^2$.
In this situation the line integral is easily calculated by substituting y = f(x) and dy = f'(x) dx or dy = (df/dx) dx into the equations so that
\[ \int_C P(x, y)\,dx + Q(x, y)\,dy = \int_{x_0}^{x_1} \bigl(P(x, f(x)) + Q(x, f(x))f'(x)\bigr)\,dx \]
Fig. 3.24
\[ \int_C P(x, y)\,dx + Q(x, y)\,dy = \int_2^4 \bigl((x + 3x + 1) + (x\cdot(3x + 1))\,3\bigr)\,dx \quad (\text{since } dy/dx = 3) \]
\[ = \int_2^4 (7x + 1 + 9x^2)\,dx = \left[\frac{7x^2}{2} + x + 3x^3\right]_2^4 \]
Similarly if the curve is given in the form x = f(y) we can calculate the integral. Notice that if for the curve we choose y = 0, then dy/dx = 0 and so the integral is just our normal integral, $\int_{x_0}^{x_1} P(x, 0)\,dx$.
One of the major applications of these ideas is in work on gravitational
and electromagnetic theory. If we allow a particle to move it is natural to
integrate along the path travelled by the particle.
The restriction on the nature of the curve is quite limiting but many curves
can be split into various sections each of which satisfy the monotonic
hypothesis. If we take the triangle ABC in Fig. 3.25, we can split it into 3
pieces AB, BC and CA all of which are monotonic, similarly with a finite
chunk of sine wave as in Fig. 3.26.
Exercise 3
Split the circle into a number of monotonic pieces.
Example 11
Now suppose we integrate P(x, y) = x + y and Q(x, y) = 0 around the triangle OAB in Fig. 3.25. This is not monotonic but can be split into three parts $\int_{OA} P(x, y)\,dx$, $\int_{AB} P(x, y)\,dx$ and $\int_{BO} P(x, y)\,dx$. In fact there is a special notation $\oint_C P(x, y)\,dx + Q(x, y)\,dy$ to indicate integration round a closed loop C. Notice that the simplification to an elementary integral only holds
Fig. 3.25
Fig. 3.26
\[ \int_{OA} (x + y)\,dx = \int_0^1 x\,dx = \tfrac{1}{2} \]
\[ \int_{AB} (x + y)\,dx = \int_1^0 (x + 1 - x)\,dx \quad\text{since } y = 1 - x \text{ on } AB \]
\[ \int_{AB} (x + y)\,dx = \int_1^0 1\,dx = -1 \]
\[ \int_{BO} (x + y)\,dx = 0 \quad\text{since } x = dx = 0 \]
therefore
\[ \oint_C (x + y)\,dx = \tfrac{1}{2} - 1 + 0 = -\tfrac{1}{2} \]
One interesting case of a line integral crops up when we look at the length of a curve. Suppose s is the distance along a curve C; then we might be interested in $\int_C P(s)\,ds$ where P(s) is perhaps a density, or a cost. We can see how to evaluate these integrals with the aid of Fig. 3.27. Clearly for small $\delta x$, $\delta y$
\[ \delta s^2 \approx \delta x^2 + \delta y^2 \]
and thus
\[ ds = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx \]
Example 12
Let us work out the length of $x^2 + y^2 = 1$ in the first quadrant, as in Fig. 3.28. The arc length is $\int_C ds = \int_0^1 \sqrt{1 + (dy/dx)^2}\,dx$, and since dy/dx = -x/y we have
\[ \int_C ds = \int_0^1 \sqrt{1 + \frac{x^2}{y^2}}\,dx = \int_0^1 \frac{dx}{\sqrt{1 - x^2}} = \frac{\pi}{2} \]
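The idea that a short piece of curve has length close to $\sqrt{\delta x^2 + \delta y^2}$ can be tested directly. The Python sketch below is an illustration with names chosen here; it adds up the lengths of many short straight chords along the quarter circle. It deliberately avoids the integrand $1/\sqrt{1 - x^2}$, which becomes infinite at x = 1.

```python
import math

def polyline_length(f, a, b, n):
    """Add up sqrt(dx**2 + dy**2) over n short chords of the curve y = f(x)."""
    length = 0.0
    for i in range(n):
        x0 = a + (b - a) * i / n
        x1 = a + (b - a) * (i + 1) / n
        length += math.hypot(x1 - x0, f(x1) - f(x0))
    return length

def quarter_circle(x):
    # max() guards against a tiny negative value from rounding near x = 1.
    return math.sqrt(max(0.0, 1.0 - x * x))

print(polyline_length(quarter_circle, 0.0, 1.0, 10000), math.pi / 2)
```

The two printed numbers should agree closely, the first being very slightly smaller because chords cut the corners of the curve.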
Green's Theorem
Suppose we have a curve C and two continuous functions P(x, y) and Q(x, y) defined on the region R enclosed by C and on the boundary curve C, as in Fig. 3.29. Then
\[ \iint_R \left(\frac{\partial P}{\partial y} - \frac{\partial Q}{\partial x}\right) dx\,dy = -\oint_C \bigl(P(x, y)\,dx + Q(x, y)\,dy\bigr) \]
Integration 71
Fig. 3.29
$$A = \iint_R dx\,dy$$
$$A = \iint_R dx\,dy = -\oint_C y\,dx$$
and similarly,
$$A = \iint_R dx\,dy = +\oint_C x\,dy$$
Example 13
Let C be the ellipse $x^2 + 4y^2 = 9$. Then if $P(x, y) = 3x - y$, $Q(x, y) = x + 2y$
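As a rough numerical check of Green's theorem on this ellipse, the sketch below evaluates the line integral of P dx + Q dy directly. It assumes the curve is traversed anticlockwise and uses the parametrisation x = 3 cos t, y = 1.5 sin t; the function name and the number of steps are illustrative choices, not part of the text. Since $\partial Q/\partial x - \partial P/\partial y = 2$ here, the value should be close to twice the area of the ellipse.

```python
import math

def line_integral_ellipse(n=2000):
    """Approximate the closed line integral of P dx + Q dy round x^2 + 4y^2 = 9."""
    a, b = 3.0, 1.5                      # semi-axes, so x = a cos t, y = b sin t
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt               # midpoint of each parameter step
        x, y = a * math.cos(t), b * math.sin(t)
        dx = -a * math.sin(t) * dt       # dx = (dx/dt) dt
        dy = b * math.cos(t) * dt        # dy = (dy/dt) dt
        P, Q = 3 * x - y, x + 2 * y
        total += P * dx + Q * dy
    return total

loop_value = line_integral_ellipse()
twice_area = 2 * math.pi * 3.0 * 1.5     # 2 x (area of the ellipse)
print(loop_value, twice_area)
```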
Numerical Integration
To evaluate a double integral numerically we proceed in much the same
way as for single integrals. We cover the surface of interest as in Fig. 3.30
with a mesh and evaluate the function $P(x_i, y_j)$ at each mesh point $(x_i, y_j)$.
Fig. 3.30 Fig. 3.31
Where we have $x_1, x_2, \ldots, x_n$ and $y_1, \ldots, y_n$, then the integral
$$\iint_R P(x, y)\,dx\,dy \approx \sum_{\text{all squares}} P(x_i, y_j)\,hk$$
where h and k are the length and breadth of the mesh cells.
Such integrals are often needed, for example in calculating the gravimetric
effects of a mass.
Since we have a two-dimensional problem with say n x-points and n
y-points we need n 2 function evaluations, which for large n can be expensive.
One way around this is to use Green's theorem and, instead of working out
the long problem, to evaluate the line integral. Now this can be written as
a combination of simple integrals which are one-dimensional. This gives
considerable savings in computation.
Suppose we wish to evaluate
$$I = \iint_R x\,dx\,dy$$
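The saving can be illustrated with a small sketch. The region R is not specified here, so the code assumes, purely for illustration, that R is the disc of radius 1 centred at (1, 0). The double integral of x is then compared with the line integral of $(x^2/2)\,dy$ round the boundary, which Green's theorem says is the same quantity. Function names are hypothetical.

```python
import math

def mesh_sum(n):
    """Mesh estimate of the double integral of x over the disc: n * n evaluations."""
    h = k = 2.0 / n                          # cell length and breadth
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h                    # cell centres, 0 < x < 2
        for j in range(n):
            y = -1.0 + (j + 0.5) * k         # cell centres, -1 < y < 1
            if (x - 1.0) ** 2 + y ** 2 <= 1.0:
                total += x * h * k
    return total

def boundary_sum(n):
    """Line integral of (x^2 / 2) dy round the boundary: only n evaluations."""
    dt = 2 * math.pi / n
    total = 0.0
    for m in range(n):
        t = (m + 0.5) * dt
        x = 1.0 + math.cos(t)                # boundary point (1 + cos t, sin t)
        dy = math.cos(t) * dt
        total += 0.5 * x * x * dy
    return total

print(mesh_sum(200), boundary_sum(50), math.pi)
```

For this assumed region both estimates approach π, but the boundary version needs far fewer function evaluations.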
6 DIFFERENTIAL EQUATIONS
Now that we have learnt to integrate and differentiate it is time to consider
differential equations. These are simply equations involving differentials;
dy/dx = x is such an equation. Essentially, in such a situation, we have
determined the slope of the curve at all points x. Can we recover a function
y = f(x) so that dy/dx = x? This example is easy! Consider y = f(x). Then
$$\int f'(x)\,dx = f(x)$$
$$\therefore\ f(x) = \int x\,dx = \frac{x^2}{2} + c$$
where c is some constant. Thus any function $y = (x^2/2) + c$ is a solution.
The reason for the indeterminacy of the solution is clear if we draw a
graph, Fig. 3.32. All the curves have the same slope for a particular value
of x. Hence they will all satisfy the equation. (Solutions are sometimes
called flow lines.)
Clearly any equation of the form
$$\frac{dy}{dx} = f(x)$$
can be solved in a similar manner. That is, we write the solution as
$$y = \int f(x)\,dx$$
$$\int h(y)\,dy = \int g(x)\,dx$$
Crudely one could justify this by multiplying each side of our original
equation by dx:
$$h(y)\,\frac{dy}{dx}\,dx = g(x)\,dx$$
and then,
$$\int h(y)\,dy = \int g(x)\,dx$$
Fig. 3.32
and so
$$\int \frac{dy}{y} = \int k\,dx$$
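(ii) $\dfrac{d^3y}{dx^3} - y\,\dfrac{dy}{dx} = \dfrac{d^2y}{dx^2}$
(iii) $\dfrac{d^2y}{dx^2} - \left(\dfrac{dy}{dx}\right)^3 = x^3 - \sin y$
(iv) $\dfrac{d^2y}{dx^2} - \left(\dfrac{dy}{dx}\right)^2 + y = 0$
(v) $x^4\dfrac{d^4y}{dx^4} + x^3\dfrac{d^3y}{dx^3} + x^2\dfrac{d^2y}{dx^2} + x\dfrac{dy}{dx} + y = 0$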
so $z\,dz/dy = -\omega^2 y$. Hence
$$\int z\,dz = \int -\omega^2 y\,dy$$
so $z^2/2 = -(\omega^2/2)y^2 + c/2$ for some constant c. Thus $z = \pm\sqrt{c - \omega^2 y^2}$.
Let us choose c to be positive and write it as $d^2$ and choose the positive
square root. Then
So
$$\int \frac{1}{\omega\sqrt{(d/\omega)^2 - y^2}}\,dy = \int dx$$
where u(x, t) is the displacement of the string, x is distance along the string
and t is the time. The first method we will use is called "Separation of
the Variables". We assume the solution has the form u(x, t) = V(x)T(t),
where V depends only on x and T depends only on t. We now rewrite the
equation as
$$T\,\frac{d^2V}{dx^2} = \frac{1}{c^2}\,V\,\frac{d^2T}{dt^2}$$
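Hence
$$\frac{1}{V}\frac{d^2V}{dx^2} = \frac{1}{c^2 T}\frac{d^2T}{dt^2}$$
Since the left-hand side depends only on x and the right-hand side depends
only on t they must both be equal to the same constant. We put this constant equal to
$-\omega^2/c^2$ and obtain
$$\frac{d^2V}{dx^2} + \frac{\omega^2}{c^2}V = 0$$
and
$$\frac{d^2T}{dt^2} + \omega^2 T = 0$$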
$$u_r(x, t) = \sin\left(\frac{r\pi x}{l}\right)\left(C_r\cos\left(\frac{r\pi c t}{l}\right) + D_r\sin\left(\frac{r\pi c t}{l}\right)\right)$$
$$u(x, t) = \sum_{r=1}^{\infty} u_r(x, t)$$
If we know the initial starting position we then get a solution for the $C_r$'s.
This discussion is crucial to understanding the importance of Fourier
series which we will return to in a later chapter.
Before leaving this topic, we will show an alternative approach because
this can be important in certain applications. For this we assume that we
can write u as a function of x - ct. Thus u(x, t) = f(x - ct) = f(z), say, where
z = x - ct. Now using the chain rule $\partial f/\partial t = df/dz \cdot \partial z/\partial t = -c\,df/dz$ and
$\partial f/\partial x = df/dz \cdot \partial z/\partial x = df/dz$. Then $\partial^2 f/\partial t^2 = c^2\,d^2f/dz^2$ and $\partial^2 f/\partial x^2 = d^2f/dz^2$,
and so u(x, t) = f(x - ct) satisfies the equation $\partial^2 u/\partial x^2 = (1/c^2)\,\partial^2 u/\partial t^2$.
The fact that there are two methods shows the difficulty of solving Partial
Differential Equations and the crucial importance of making sure that the
method chosen is appropriate to the physical model.
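The travelling-wave form can be checked numerically. The following is a minimal sketch, assuming an arbitrary smooth profile f(z) = exp(-z²) and an arbitrary speed c = 2 (both chosen only for illustration). It estimates the two second derivatives by central differences and forms the difference between the two sides of the wave equation, which should be close to zero.

```python
import math

c = 2.0                       # wave speed (illustrative value)

def f(z):
    """An arbitrary twice-differentiable profile."""
    return math.exp(-z * z)

def u(x, t):
    """Travelling wave u(x, t) = f(x - ct)."""
    return f(x - c * t)

def second_diff(g, s, h=1e-3):
    """Central-difference estimate of the second derivative of g at s."""
    return (g(s + h) - 2 * g(s) + g(s - h)) / (h * h)

def residual(x, t):
    """u_xx - (1/c^2) u_tt, which should vanish for a solution."""
    u_xx = second_diff(lambda s: u(s, t), x)
    u_tt = second_diff(lambda s: u(x, s), t)
    return u_xx - u_tt / c ** 2

print(residual(0.3, 0.1))
```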
Chapter 4
COMPLEX NUMBERS
1 INTRODUCTION
2 THE BEGINNING
We start with the simple observation that there is no real number x such
that $x^2 = -1$. So we invent a number i (engineers often call it j) such that
$$i^2 = -1$$
This doesn't get us too far of itself, but we continue by considering all
"numbers" (pairs) of the form a + ib where a and b are real numbers and
we add and multiply the pairs according to the following rules: If a, b, c
and d are real numbers then
$$(a + ib) + (c + id) = (a + c) + i(b + d)$$
$$(a + ib)(c + id) = ac + iad + ibc + (ib)(id)$$
$$= ac + i(ad + bc) + i^2 bd$$
$$= ac - bd + i(ad + bc) \quad (\text{since } i^2 = -1)$$
The collection of all such "numbers" with this addition and multiplication
is called the complex numbers, and any number of the form a + ib (where
a, b are real numbers) is called a complex number. The real number a is
called the real part of a + ib and b is called the imaginary part. Subtraction
then obeys the rule (for a, b, c, d real) that:
(a + ib) - (c + id) = (a - c) + i ( b - d),
and it is routine to check that the usual rules familiar from real numbers
also apply to complex numbers. We give some examples:
(2 + i) +(3 + i2) = 5 + i3
(~+ i2) - a+ i3) = ~ - i
$(1 + i)(1 + i) = 1 + i^2 + i(1 + 1)$
$= 1 - 1 + 2i = 2i$
$(\tfrac{1}{2} - i)(2 + 2i) = \tfrac{1}{2} \times 2 - (i^2)(2) + i(-2 + 2 \times \tfrac{1}{2})$
$= 1 - (-2) + i(-2 + 1)$
$= 3 - i$
$(a + ib)(a - ib) = a^2 + b^2$
$(a + ib)(a + ib) = a^2 - b^2 + 2iab$
(Note that a + ib is often written a + bi, which is the same thing.)
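The addition and multiplication rules can be tried out directly. The sketch below represents a + ib as the pair (a, b); the function names are illustrative. The last line uses Python's built-in complex type (which writes i as j) as a cross-check.

```python
def add(z, w):
    """(a + ib) + (c + id) = (a + c) + i(b + d), with numbers stored as pairs."""
    (a, b), (c, d) = z, w
    return (a + c, b + d)

def mul(z, w):
    """(a + ib)(c + id) = (ac - bd) + i(ad + bc)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

sum_example = add((2, 1), (3, 2))          # (2 + i) + (3 + 2i)
square_example = mul((1, 1), (1, 1))       # (1 + i)(1 + i)
product_example = mul((0.5, -1), (2, 2))   # (1/2 - i)(2 + 2i)
builtin_check = complex(0.5, -1) * complex(2, 2)
print(sum_example, square_example, product_example, builtin_check)
```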
Exercise 1
Evaluate
(i) (l +~i) +(2 +~i)
(ii) (1 - 10i) + (23.5 + 50i)
(iii) (~- i)(2 + i)
(iv) (1 + i)(10 - 3i)
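Examples
1. If $z = 2 + 3i$ then $\bar{z} = 2 - 3i$, $|z|^2 = 13$ and
$$\frac{1}{2 + 3i} = \frac{2 - 3i}{13} = \frac{2}{13} - \frac{3i}{13}$$
2. If $z = \tfrac{1}{2} - 2i$ then $\bar{z} = \tfrac{1}{2} + 2i$, $|z|^2 = \tfrac{17}{4}$ and
$$\frac{1}{\tfrac{1}{2} - 2i} = \frac{\tfrac{1}{2} + 2i}{17/4} = \frac{2 + 8i}{17}$$
3. If $z = \sqrt{2}\,i$ then $\bar{z} = -\sqrt{2}\,i$, $|z|^2 = 2$ and
$$\frac{1}{z} = \frac{-\sqrt{2}\,i}{2}$$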
Exercise 2
In each case calculate $\bar{z}$, $|z|$ and $1/z$.
(i) 2 + i
(ii) -2 + 3i
(iii) -1
(iv) $\sqrt{2} + i\sqrt{3}$
Finally, note that (as with real numbers) division by a non-zero complex
number is the same as multiplication by its inverse. Thus
$$(a + ib) \div (c + id) = (a + ib) \cdot \frac{1}{c + id} = (a + ib) \cdot \frac{c - id}{c^2 + d^2}$$
For example,
$$(2 + 2i) \div (1 + i) = (2 + 2i) \cdot \frac{1}{1 + i} = (2 + 2i)\,\frac{1 - i}{1 + 1} = (1 + i)(1 - i) = 2$$
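Division by multiplying with the inverse can be sketched in a few lines. Complex numbers are again stored as pairs (a, b) standing for a + ib, and all function names are illustrative.

```python
def conjugate(z):
    """The conjugate of a + ib is a - ib."""
    a, b = z
    return (a, -b)

def modulus_squared(z):
    """|z|^2 = a^2 + b^2."""
    a, b = z
    return a * a + b * b

def inverse(z):
    """1/z = conjugate(z) / |z|^2 for non-zero z."""
    a, b = conjugate(z)
    m = modulus_squared(z)
    return (a / m, b / m)

def divide(z, w):
    """z / w computed as z times the inverse of w."""
    (a, b), (c, d) = z, inverse(w)
    return (a * c - b * d, a * d + b * c)

quotient = divide((2, 2), (1, 1))      # (2 + 2i) / (1 + i)
inverse_example = inverse((2, 3))      # 1 / (2 + 3i)
print(quotient, inverse_example)
```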
With the definitions we have given it is possible to develop all the familiar
things that are done with real numbers. But why does this help with
Fig. 4.1
Fig. 4.2 Fig. 4.3
Fig. 4.4
Exercise 4
(i) If $z = (1 + i)/(3 - 4i)$, find $|z|$ and the argument $\theta$. Plot the point z on
the Argand diagram.
(ii) If $z_1 = 1 + i$ and $z_2 = 3 - 4i$ find $|z_1|$, $|z_2|$ and $|z_1 - z_2|$. Find the argument
$\theta$ for $(z_1 - z_2)$.
The techniques for complex variables are exactly the same as in the real
case. Instead of the black box being fed real numbers, it is fed with complex
numbers (see Fig. 4.4). Polynomial functions are just the same:
$$f(z) = z + 1$$
or
$$f(z) = z^3 + 2z + 1$$
or
$$f(z) = iz^2 - (i + 1)z - 2i$$
Such functions can be evaluated just as before. Note that if the coefficients
of a polynomial are all real as in $z^3 + 2z + 1$, and if z is real, i.e. z = x + i0
for real x, then f(z) = f(x) just as if everything was real. More generally, if
we have a complex variable z = x + iy where x and y are real, then z can
be thought of as depending on the real variables x and y. Hence a function
f(z) can be thought of as involving a pair of functions each of two variables:
$$f(z) = u(x, y) + iv(x, y)$$
where u(x, y) gives the real part and v(x, y) gives the imaginary part. For
example, if $f(z) = z^2 = (x + iy)^2 = x^2 - y^2 + 2ixy$ then $u(x, y) = x^2 - y^2$ and
$v(x, y) = 2xy$.
$$e^{iy} - e^{-iy} = 2i\sin y$$
For example, $e^{i\pi} = \cos(\pi) = -1 = e^{-i\pi}$
$e^{-i\pi/2} = -i$
(5) The last of these examples is a special case of the general result
$$e^{z + 2\pi i} = e^{x + i(y + 2\pi)} = e^x(\cos(y + 2\pi) + i\sin(y + 2\pi))$$
$$= e^x(\cos y + i\sin y) = e^z$$
From the formulae for cos y and sin y given at (4), it is possible to
generalise the concepts of trigonometric functions so that they are defined
for complex numbers z by the rules:
$$\cos z = \frac{e^{iz} + e^{-iz}}{2}$$
$$\sin z = \frac{e^{iz} - e^{-iz}}{2i}$$
The familiar formulae for trigonometric functions hold with these definitions:
for example $\cos^2 z + \sin^2 z = 1$.
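These definitions are easy to test numerically with Python's standard `cmath` module. The sketch below picks an arbitrary complex number z (an illustrative value only), builds cos z and sin z from the exponential formulae, and checks the identity $\cos^2 z + \sin^2 z = 1$ and the periodicity result (5).

```python
import cmath
import math

z = 0.7 + 0.4j                                   # arbitrary test value

# cos and sin built from the exponential definitions
cos_z = (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2
sin_z = (cmath.exp(1j * z) - cmath.exp(-1j * z)) / (2j)

identity = cos_z ** 2 + sin_z ** 2               # should be close to 1
period_gap = cmath.exp(z + 2j * math.pi) - cmath.exp(z)   # should be close to 0
euler = cmath.exp(1j * math.pi)                  # should be close to -1

print(identity, period_gap, euler)
```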
If we return to the idea of polar coordinates as mentioned in the previous
section, then for a non-zero complex number $z = r\cos\theta + ir\sin\theta$ we see
that $z = re^{i\theta}$, where $r = |z|$ and $\theta$ is the argument of z. Since $re^{i\theta} = re^{i\theta + 2\pi i}$
from (5) above, $\theta$ is not uniquely defined by z; if $\theta_0$ is one value of the
argument then so is $\theta_0 + 2n\pi$ for any whole number n. This corresponds
to rotating more than $2\pi$ around the origin in Fig. 4.3. There are various
subtle difficulties following from the non-uniqueness of the argument of a
complex number, but we will usually slide over them.
Polar coordinates can also be used to look at multiplication. Let $z_1 = r_1 e^{i\theta_1}$
and $z_2 = r_2 e^{i\theta_2}$. Then
$$z_1 z_2 = r_1(\cos\theta_1 + i\sin\theta_1)\,r_2(\cos\theta_2 + i\sin\theta_2)$$
$$= r_1 r_2([\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2] + i[\cos\theta_1\sin\theta_2 + \sin\theta_1\cos\theta_2])$$
So if we recall the trigonometric formulae, we have
$$z_1 z_2 = r_1 r_2(\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2))$$
This is a much neater and more useful formula than the previous one.
A consequence is that if $r_1 = r_2 = 1$ then $e^{i\theta_1} \times e^{i\theta_2} = e^{i(\theta_1 + \theta_2)}$. Hence, if
$\theta_1 = \theta_2 = \theta$ (say), we have
$$(\cos\theta + i\sin\theta)^2 = \cos 2\theta + i\sin 2\theta$$
and for any integer n
$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta$$
So for any non-zero complex number $z = re^{i\theta}$, we have $z^n = r^n e^{in\theta} = r^n(\cos n\theta + i\sin n\theta)$.
We can also see how to obtain square roots or nth roots using this polar
form. Suppose $z = re^{i\theta}$ and we want $\sqrt{z}$. Some reflection will convince you
that
$$\sqrt{z} = \sqrt{r}\,e^{i\theta/2} \quad\text{or}\quad -\sqrt{r}\,e^{i\theta/2}$$
This is illustrated in Fig. 4.5. More generally, to find an nth root, we see
that the possible values are
$$r^{1/n}e^{i\theta/n},\ r^{1/n}e^{i(\theta + 2\pi)/n},\ \ldots,\ r^{1/n}e^{i(\theta + 2(n-1)\pi)/n}$$
i.e.
$$z^{1/n} = r^{1/n}e^{i(\theta/n + 2k\pi/n)}, \quad k = 0, 1, \ldots, n - 1$$
Thus there are n distinct nth roots of a non-zero complex number z.
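The nth-root formula translates directly into code. The sketch below uses `cmath.polar` and `cmath.rect` from the standard library to move between z and its polar form; the function name `nth_roots` is an illustrative choice.

```python
import cmath
import math

def nth_roots(z, n):
    """All n distinct nth roots of a non-zero complex number z."""
    r, theta = cmath.polar(z)                    # z = r e^{i theta}
    return [cmath.rect(r ** (1.0 / n), (theta + 2 * math.pi * k) / n)
            for k in range(n)]

cube_roots = nth_roots(8, 3)                     # the three cube roots of 8
for w in cube_roots:
    print(w, w ** 3)                             # each cube is close to 8
```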
Using the exponential function, we can explain mathematically the sometimes
mysterious relation between time and frequency which so often
appears in geophysical interpretation.
Let t be a variable which can have any positive real value. Put $z = e^{it}$.
Then as t goes from 0 to $2\pi$, z moves once round the unit circle. If instead
we put $z = e^{i\omega t}$ then as t goes from 0 to $2\pi$, z moves round the unit circle
$\omega$ times. So if we have a function of t (time), and we substitute $z = e^{i\omega t}$ we
obtain a function of z or of "frequency" $\omega$ and time t (see Fig. 4.6).
There is one further function we need to investigate and that is the
logarithm. Following the example of real numbers we would like to have
$\log(\exp(z)) = z$ and $\exp(\log(z)) = z$. Taking these rules as a guide we find
that the right definition for log is (using polar coordinates)
$$\log(re^{i\theta}) = \log r + i\theta$$
[Figures: diagrams in the complex plane, with real and imaginary axes]
The theory for differentiation is very similar to that for real variables. We
define
$$f'(z) = \frac{df}{dz} = \lim_{h \to 0}\frac{f(z + h) - f(z)}{h}$$
It turns out that all the functions known to be differentiable for real variables
are again differentiable with the same derivative.
Examples
4. $\dfrac{d}{dz}(z^4) = 4z^3$
5. $\dfrac{d}{dz}(\cos z) = -\sin z$
6. $\dfrac{d}{dz}(\exp z) = \exp z$
One interesting result comes from examining the real and imaginary parts
of a differentiable complex function.
If we stop and think for a moment we might suspect that differentiation
might contain some hidden complexities as f(z) gives a curve in the plane.
We can write z = x + iy and hence f(z) = u(x, y) + iv(x, y) for some functions
u(x, y), v(x, y). Since
$$f'(z) = \lim_{\delta z \to 0}\frac{f(z + \delta z) - f(z)}{\delta z}$$
we might let $\delta z = \delta x$, i.e. we just increment the real part and get
$$\frac{f(z + \delta x) - f(z)}{\delta x}$$
or suppose $\delta z = i\,\delta y$ so we also have a derivative in the y direction,
$$\frac{f(z + i\,\delta y) - f(z)}{i\,\delta y}$$
The derivative f'(z) must be the same in each case and so if we write the
function f(z) as u(x, y) + iv(x, y) and do some manipulation we have
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}$$
and
$$\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$$
So for a derivative to exist these "Cauchy-Riemann" equations must hold.
These are of great importance in mathematical physics.
A consequence of these equations is that
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \quad\text{and}\quad \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0$$
These equations are the two-dimensional version of Laplace's equation.
Any function satisfying them is called harmonic. In the case above both u
and v are harmonic and are called conjugate functions.
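The Cauchy-Riemann equations can be checked numerically for a particular function. The sketch below takes f(z) = z² as an illustrative choice, splits it into u and v, and estimates the four partial derivatives by central differences at an arbitrary point. Both residuals should be close to zero.

```python
def f(z):
    """Illustrative differentiable function f(z) = z^2."""
    return z * z

def u(x, y):
    return f(complex(x, y)).real        # real part of f

def v(x, y):
    return f(complex(x, y)).imag        # imaginary part of f

def partial_x(g, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in x."""
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def partial_y(g, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in y."""
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

x0, y0 = 1.3, -0.7                      # arbitrary test point
cr_first = partial_x(u, x0, y0) - partial_y(v, x0, y0)    # du/dx - dv/dy
cr_second = partial_y(u, x0, y0) + partial_x(v, x0, y0)   # du/dy + dv/dx
print(cr_first, cr_second)
```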
Integration of functions of a complex variable is more difficult to cope
with than differentiation. But if we have a function f(z) with a primitive
F(z), i.e. a function F(z) such that dF(z)/dz = f(z), then the theory for f(z)
is similar to that for functions of a real variable. For example, we know
$d/dz(\cos z) = -\sin z$, and $d/dz(e^z) = e^z$, so
$$\int_{z_1}^{z_2} \sin z\,dz = -\cos z_2 + \cos z_1$$
$$\int_0^{-i} e^z\,dz = e^{-i} - e^0 = \cos 1 - i\sin 1 - 1$$
$$= (-1 + \cos 1) - i\sin 1$$
The difficulties with complex integration come from the fact that, when
we think about the meaning of $\int_{z_1}^{z_2} f(z)\,dz$ we have to think about the route
or path from $z_1$ to $z_2$ along which we are integrating as in Fig. 4.7.
On the "real line" there is only one route from $x_1$ to $x_2$, but in the complex
plane there are clearly many possible routes. If f(z) has a primitive then
all routes give the same answer for the integral. In particular $\int_{z_1}^{z_1} f(z)\,dz = 0$,
where we use any path from $z_1$ to itself as in Fig. 4.8.
$$\oint_C \frac{1}{z}\,dz = 2\pi i$$
That is, although Log z looks like a primitive for 1/z, the integral $\oint_C 1/z\,dz$
is not zero. This difficulty over primitives needs to be borne in mind, but
will not be discussed further.
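The contrast between a function with a primitive and 1/z can be seen numerically. The sketch below assumes, for illustration, that the closed path C is the unit circle traversed anticlockwise, and approximates the loop integral by a sum over small steps. The function name `loop_integral` is hypothetical.

```python
import cmath
import math

def loop_integral(f, n=1000):
    """Approximate the integral of f(z) dz once round the unit circle."""
    dt = 2 * math.pi / n
    total = 0j
    for k in range(n):
        t = (k + 0.5) * dt
        z = cmath.exp(1j * t)            # point on the unit circle
        dz = 1j * z * dt                 # dz = i e^{it} dt
        total += f(z) * dz
    return total

one_over_z = loop_integral(lambda z: 1 / z)     # close to 2 pi i
z_squared = loop_integral(lambda z: z * z)      # close to 0: z^2 has a primitive
print(one_over_z, z_squared)
```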
Chapter 5
MATRICES
1 INTRODUCTION
Very often we are faced with the situation where large quantities of data
have to be processed. In the 19th century mathematical techniques
were developed for coping with this situation, especially for "linear" prob-
lems. Although we will not follow an historical approach it is interesting
to note that these ideas were used to cope with large-scale calculations at
a time when they had to be done by hand. Although nowadays with the
development of computers such techniques might be expected to be
irrelevant, the opposite turns out to be the case. Precisely because these
techniques were able to handle computation, they turned out to be very
appropriate for calculating on a very large scale.
The idea is that if we have data, we handle it in arrays. For example if
we consider the population of Britain, we might want to describe it by
$(x_1, \ldots, x_8)$ where for $1 \le j \le 7$, $x_j$ is the number of people whose ages lie
between $(j - 1) \times 10$ and $j \times 10$, and $x_8$ is the rest. Thus $x_1$ is the number of
people aged up to 10, etc. until we get to $x_8$, the number of people who are
over 70. For different purposes we might require alternative ways of splitting
up the population. For example, if there is some population of animals,
the rate of reproduction may depend on the number of females of a certain
age and this would be the most relevant information. Thus in human
population the ranges 15-20, 20-30, 30-40, may be the key age bands.
Another example might be the data from a set of 50 microphones streamed
from a survey vessel. At a particular time t we might have the data
$(x_1(t), x_2(t), \ldots, x_{50}(t))$, and over a period of time we could build up a whole
collection of such "strings". Each one of them is called a vector.
(D
the set as
Fig. 5.1
Fig. 5.2
Exercise 1
Add the following pairs where possible:
(i) (1,2,3) +(2, 1,4)
(ii) (1,2, -1,0)+(-1,2,1,1)
(iii) (-1,4, 1)+(-4, 1,2,3)
(iv) (1,t 1)+(-1, -2, -1)
Exercise 2
Evaluate the following vectors:
(i) 1(1, I, 1)
(ii) (4,2,3)-2(1,-1,0)
(iii) 3(1, -1, 1) + 2(-1, 1, -1)
3 MATRICES
Vectors can handle some data very well but quite often we need to manipu-
late vectors and to facilitate understanding it is more convenient to arrange
things in two-dimensional arrays. An m x n matrix is a rectangular array of
numbers with m rows and n columns. Some examples:
Example 1
(-~
3
2
5I) . 2 x3
IS
12)
(!
3 7
2 8 ~ is 3 x4
9
(~ }4X2
A+B=(1+2,
2 +3,
-1+1)=(3
3 +4 5 7
0)
Example 3
( 1 -1) -1)
-1
A= 2-2 ( 10
and B= ~
-3 4 3
are both 3 x 2 then
A+B= ~ (0 -2)~
Exercise 3
Add (where possible) the following pairs of matrices:
(i) G-~ ~) + ( - ~ ! ~)
(ii)
GD+G ~)
(iii)
G ~) +(-~ -~)
(iv)
( 0.1
1
1+ i
0.2
2
2 -I
C
0.3)~ + 0.1
2
-i
0.01
-i
-~I)
+i
The last example illustrates the fact that the entries in a matrix may be
complex numbers, or, possibly, functions.
There are simple (and familiar-looking) rules that the addition of matrices
satisfies:
$$A + B = B + A; \qquad (A + B) + C = A + (B + C)$$
where A, B and C are all m x n matrices. If $A = (a_{ij})$, then $-A = (-a_{ij})$ and
$A + (-A) = 0$, where 0 is the matrix all of whose entries are zero; also
$0 + A = A$. So we can add matrices in much the same way as we do vectors
(or numbers).
Again we have a scalar multiplication, $\alpha A$, where $\alpha$ is a number and A
is a matrix, defined by $\alpha A = (\alpha a_{ij})$ where $A = (a_{ij})$.
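Example 4
$$-2\begin{pmatrix} 2 & -1 \\ 3 & 1 \end{pmatrix} = \begin{pmatrix} -2 \times 2 & -2 \times -1 \\ -2 \times 3 & -2 \times 1 \end{pmatrix} = \begin{pmatrix} -4 & 2 \\ -6 & -2 \end{pmatrix}$$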
4 MULTIPLICATION OF MATRICES
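But in some ways the most useful thing about matrices is that (in suitable
cases) it is possible to multiply them together. We begin to approach this
technique by considering the simultaneous linear equations
$$\begin{aligned} ax_1 + bx_2 &= e_1 \\ cx_1 + dx_2 &= e_2 \end{aligned} \qquad (*)$$
To find the solutions (if any) we only need to know a, b, c, d, $e_1$ and $e_2$.
We could describe the same information using matrices and vectors:
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} e_1 \\ e_2 \end{pmatrix} \qquad (**)$$
where the matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ tells us the coefficients, the vector $\begin{pmatrix} e_1 \\ e_2 \end{pmatrix}$ tells us
the constants and finally the vector $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ tells us the names of the variables.
Comparison of the equations (*) and (**) will lead us to a definition of
matrix multiplication (see Fig. 5.3). So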
Fig. 5.3
and define
$$b_i = \begin{pmatrix} b_{1i} \\ \vdots \\ b_{ni} \end{pmatrix}$$
then $AB = (Ab_1, Ab_2, \ldots, Ab_p)$. Notice that $Ab_i$ is an m x 1 matrix so that
AB is an m x p matrix. A few examples should make things clearer:
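Examples
5. Let $A = \begin{pmatrix} 1 & 2 \\ 2 & -1 \end{pmatrix}$, $B = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Then
$$AB = \begin{pmatrix} 1 & 2 \\ 2 & -1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \times 1 + 2 \times 1 \\ 2 \times 1 - 1 \times 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \end{pmatrix}$$
6. Let $A = \begin{pmatrix} 1 & 2 \\ 2 & -1 \end{pmatrix}$, $B = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$. Then
$$AB = \begin{pmatrix} 1 \times 1 + 2 \times 1 & 1 \times 1 + 2 \times 2 \\ 2 \times 1 - 1 \times 1 & 2 \times 1 - 1 \times 2 \end{pmatrix} = \begin{pmatrix} 3 & 5 \\ 1 & 0 \end{pmatrix}$$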
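The column-by-column definition of the product can be sketched as follows. Matrices are stored as lists of rows, the function names are illustrative, and the small matrices used are chosen only as test data.

```python
def mat_vec(A, b):
    """Product of an m x n matrix A with a column b of length n."""
    return [sum(a_ij * b_j for a_ij, b_j in zip(row, b)) for row in A]

def mat_mul(A, B):
    """AB built one column at a time: the ith column of AB is A times b_i."""
    p = len(B[0])
    columns = [mat_vec(A, [row[i] for row in B]) for i in range(p)]
    return [list(row) for row in zip(*columns)]   # reassemble columns as rows

A = [[1, 2], [2, -1]]
column_product = mat_mul(A, [[1], [1]])           # 2 x 2 times 2 x 1
square_product = mat_mul(A, [[1, 1], [1, 2]])     # 2 x 2 times 2 x 2
print(column_product, square_product)
```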
A= (12 2 0) and B= (2
1 23 64 4)5
I I
3 156
Also if A is m x n and B is n x p then we have defined the product AB to
be m x p. If m = p so that BA is defined then AB is m x m and BA is n x n.
Thus AB is in general a different size to BA.
A matrix is said to be square
if for some m it has size m x m. If A and B are both square m x m matrices
then AB and BA are both defined, and are both square (of the same size
as A and B) but it does not follow that AB and BA are equal. For example,
if
A=G ~), B= (~ ~)
then
A=(]
$$I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and so on}$$
then if A is m x n, $AI_n = A$ and $I_mA = A$. If we write I we mean $I_n$ where n
is not specified. Such matrices are called identity, or unit, matrices.
This is just a list of matrices with special properties which turn out to be
useful. We begin by defining the transpose of an m x n matrix A. The
transpose, $A^T$, is an n x m matrix whose i,jth entry is the j,ith entry of A.
An important relation is that $(AB)^T = B^TA^T$; the proof is straightforward but
not obvious.
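The relation between transposes and products can be checked on a small example. The matrices below are arbitrary test data and the function names are illustrative.

```python
def transpose(A):
    """The i,jth entry of the transpose is the j,ith entry of A."""
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    """Row-by-column matrix product for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 2, 0], [3, -1, 4]]          # 2 x 3
B = [[2, 1], [0, 1], [5, -2]]        # 3 x 2

lhs = transpose(mat_mul(A, B))               # (AB)^T
rhs = mat_mul(transpose(B), transpose(A))    # B^T A^T
print(lhs, rhs)
```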
Examples
~ !) then AT ~G
8. If A=G
D
9. If A= ( -I
-I
o0) then AT = (-I0 -~)
If A is a square n x n matrix such that $A = A^T$ then A is called symmetric.
This amounts to insisting that for $i \ne j$ we must have $a_{ij} = a_{ji}$. If A is square
(~ ~) H
3 4
3
2 4 1I
4 5
5 6
;)
Exercise 5
Check the following are skew-symmetric:
-D (-; -i)
:i
(-~ ~) (-: 0
-2
0
-l-i
1
As we hinted above, the entries $a_{ii}$, $1 \le i \le n$, in an n x n matrix are called
the elements of the leading diagonal.
A square matrix A is said to be orthogonal if $AA^T = I$. We will see later
that orthogonal matrices have importance in some problems of interpretation.
Clearly I is an orthogonal matrix, since $I^T = I$ and $II = I$.
Exercise 6
Show that
(i) A=(~ ~)
(ii) $A = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}$
I/J3 1//6
(iii) (
A = I/J~ -2/J~ are all orthogonal
I/J3 I/J6
A square matrix is said to be diagonal if the only non-zero entries (if
any) are down the leading diagonal.
Example 10
5 0 0 0 0
(~ ~)
0 0 6 0 0 0
3 0 0 -1 0 0
0 0 0 0 -2 0
0 0 0 0
(2 I)
A= 0 3 and B =
(1~ _1)i
then
and
BA=(~ -D(~ ~)
=G ~~~) =G ~) =12
Exercise 7
Check that if A =
Exercise 8
eD and B = ( _ ~ -2)
3 then AB = 12 =BA.
Check that if
6 MATRICES AS FUNCTIONS
One important way to look at matrices is to view them as functions from
vectors to vectors. Let A be an m x n matrix and let C(n) and C(m) denote
the set of all n x 1 column vectors and the set of all m x 1 column vectors,
respectively. Now A defines a function from C(n) to C(m) by sending x to
Ax where x is in C(n). To confuse things, $A^T$ defines a function from C(m)
to C(n) by: y goes to $A^Ty$, where y is in C(m).
If R(n) is the set of all 1 x n row vectors and R(m) is the set of all 1 x m
row vectors, A also defines a function from R(m) to R(n) by x goes to xA
where x is in R(m).
To say that A is invertible means that the function defined by A has an
inverse map because if AB = I = BA then B(Ax) = Ix = x. To say that A is
symmetric is the same as saying that the function defined by A is the same
as the one defined by AT.
The functions defined by matrices are rather special. They are called
linear because they preserve addition and scalar multiplication: for any
vectors $x_1$, $x_2$ and any number $\lambda$, we have
$$A(x_1 + x_2) = Ax_1 + Ax_2 \quad\text{and}\quad A(\lambda x) = \lambda Ax$$
In fact given any linear function f from C(n) to C(m), i.e. a function f such
that
$$f(x_1 + x_2) = f(x_1) + f(x_2) \quad\text{and}\quad f(\lambda x) = \lambda f(x)$$
we can find an m x n matrix A such that f(x) = Ax. Consequently matrices
are fundamental to the study of linear problems (or, in practice, problems
that can be approximated by linear techniques).
If we take the special case of 3 x 3 matrices then C(3) is just normal 3
dimensional space. If we fix a coordinate frame for C(3) then a linear
function from C(3) to C(3) is a function that preserves straight lines and
fixes the origin. So matrices represent functions of this sort. In real life we
are frequently concerned with functions (or transformations) that preserve
the length of a vector. We can write the length of a vector x rather neatly
in vector form as
$$(\text{length of } x)^2 = x^T \cdot x \quad \text{(where x is a column vector)}$$
or
$$(\text{length of } x)^2 = x \cdot x^T \quad \text{(where x is a row vector)}$$
Note that $x^T$ is a 1 x 3 row vector (or a 3 x 1 column vector respectively),
so the product $x^T \cdot x$ (or the product $x \cdot x^T$, in the other case) is a 1 x 1
matrix, i.e. a scalar. Note also that we have to make sure we are using the
right formula: for example, if
is a 3 x 3 matrix.
(D ~
Examples
If we take two vectors x and y in C(3) (the set of column vectors) then
$x^T \cdot y$ is called the scalar product of x and y. You may have encountered a
special case in the "dot product" of vector mechanics. An exercise in
three-dimensional geometry shows that
$$\frac{x^T \cdot y}{|x||y|} = \cos\theta$$
where $\theta$ is the angle between x and y. (Beware: this only works if neither
x nor y is zero.) In consequence we say two vectors are orthogonal or
perpendicular if $x^T \cdot y = 0$.
Let A be a 3 x 3 matrix such that the function it defines "preserves
distance" and suppose y = Ax. To say that A preserves distance means that
$y^T \cdot y = x^T \cdot x$ for all x. So
$$(Ax)^T \cdot Ax = x^T \cdot x \quad \text{for all } x$$
Thus $x^TA^TAx = x^T \cdot x$ for all x.
From this relation it can be shown that $A^TA = I$. So we can now see the
importance of orthogonal matrices: they are precisely those that preserve
scalar products and hence distance and angles. For if $y_1 = Ax_1$ and $y_2 = Ax_2$
and A is orthogonal,
$$y_1^Ty_2 = x_1^TA^TAx_2 = x_1^TIx_2 = x_1^Tx_2$$
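A quick numerical illustration: the sketch below takes a rotation about the third axis as an example of an orthogonal 3 x 3 matrix (the angle and the two vectors are arbitrary illustrative values) and compares scalar products before and after applying it.

```python
import math

def mat_vec(A, x):
    """Apply the matrix A (list of rows) to the column vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    """Scalar product x^T y."""
    return sum(a * b for a, b in zip(x, y))

theta = 0.6                                   # arbitrary rotation angle
c, s = math.cos(theta), math.sin(theta)
A = [[c, -s, 0.0],
     [s,  c, 0.0],
     [0.0, 0.0, 1.0]]                         # rotation about the third axis

x = [1.0, 2.0, 3.0]
y = [-1.0, 0.5, 2.0]

before = dot(x, y)
after = dot(mat_vec(A, x), mat_vec(A, y))     # should match `before`
length_before = math.sqrt(dot(x, x))
length_after = math.sqrt(dot(mat_vec(A, x), mat_vec(A, x)))
print(before, after, length_before, length_after)
```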
There are two final comments which may be worth making. One is that
the dimension 3 is in no way special in this context and the whole theory
and discussion could be carried out over C(n) for arbitrary n. The same
definitions of length and of scalar products work without any difficulties.
So far this discussion has been on the basis of real matrices and vectors.
There is no reason why we cannot allow complex entries and then we get
complex matrices and vectors. In this situation we introduce the complex
conjugate $\bar{A}$ of a matrix $A = (a_{ij})$ which is obtained by changing the entry
$a_{ij}$ to its complex conjugate $\bar{a}_{ij}$. (Recall that this means, if $a_{ij} = x_{ij} + iy_{ij}$, where
$x_{ij}$, $y_{ij}$ are real, then $\bar{a}_{ij} = x_{ij} - iy_{ij}$.) It then turns out that the scalar product
is now $\bar{x}^T \cdot y$ and that instead of orthogonal matrices we use unitary matrices,
i.e. those matrices A such that $\bar{A}^T \cdot A = I$. For a real vector x, we have $\bar{x} = x$
so $\bar{x}^T = x^T$, and for a real matrix $\bar{A}^T = A^T$. Thus, if a complex vector or matrix
happens to have all real entries these new definitions are the same as the
ones given for real vectors and matrices.
7 LINEAR EQUATIONS
In this section we will use procedures with matrices to solve systems of
linear equations. Let
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\ \,\vdots \end{aligned} \qquad \text{(i)}$$
$A = (a_{ij})$,
There are certain "elementary" ways of changing the system (i) which
will not alter the set of solutions, where the set of solutions of (i) or solution
set of (i) is the set of all x in C(n) such that Ax = b.
Firstly, if we switch two equations in the system the solution set will not
change.
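Let us take a 2 x 3 system to illustrate.
$$\begin{aligned} 2x_1 - x_2 + x_3 &= 2 \\ 4x_1 + x_2 + x_3 &= 1 \end{aligned} \qquad (*)$$
Switching the two equations gives
$$\begin{aligned} 4x_1 + x_2 + x_3 &= 1 \\ 2x_1 - x_2 + x_3 &= 2 \end{aligned} \qquad (**)$$
The matrix of (**) has the same rows as the matrix of (*), but they have
been swapped round.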
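Secondly, we can multiply any equation in (i) by a non-zero constant.
We can multiply the first row in the example by $\tfrac{1}{2}$ to get
$$\begin{aligned} x_1 - \tfrac{1}{2}x_2 + \tfrac{1}{2}x_3 &= 1 \\ 4x_1 + x_2 + x_3 &= 1 \end{aligned} \qquad (***)$$
This is equivalent to multiplying the appropriate row of the matrix by the
same constant.
Thirdly, we can add (or subtract) any non-zero multiple of one row to
any other. In the example (***) above we can take 4 x the 1st row away
from the 2nd row. This gives
$$\begin{aligned} x_1 - \tfrac{1}{2}x_2 + \tfrac{1}{2}x_3 &= 1 \\ 0 \times x_1 + 3x_2 - x_3 &= -3 \end{aligned}$$
Using this example we can complete the process to find the solutions as
follows
$$\begin{aligned} x_1 - \tfrac{1}{2}x_2 + \tfrac{1}{2}x_3 &= 1 \\ x_2 - \tfrac{1}{3}x_3 &= -1 \quad \text{(dividing by 3)} \end{aligned}$$
and
$$x_1 = \tfrac{1}{2} - \tfrac{1}{3}\lambda, \qquad x_2 = -1 + \tfrac{1}{3}\lambda, \qquad x_3 = \lambda$$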
G -1 ~)
G
-1
D
G ;)
I I
-2 2
G
I
-~)
-2 2
3 -1
G
I I
-~)
-2 2
I
3
I
0
(~ -D
3
I
-3
Examples
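13.
$$\begin{pmatrix} 2 & 4 & 6 \\ 3 & 6 & 9 \end{pmatrix} \sim \begin{pmatrix} 1 & 2 & 3 \\ 3 & 6 & 9 \end{pmatrix} \quad \text{divide 1st row by 2}$$
$$\sim \begin{pmatrix} 1 & 2 & 3 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{3 x 1st row away from 2nd row}$$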
14.
G:)-G -D 3 x 1st row away from 2nd row
-(~o -~) -4
5 x 1st row away from 3rd row
-(~o ~) -4
dividing 2nd row by -2
~G D
-1
(-~
15. 2 4
2 -1
-2
3
-~) 2nd + 1st row
+3 +4 2 +3
j
2 3 2 2 x 1st row +2nd row
-(-i
3 -2
3 x 1st row-3rd row
2 -1 3
3 +4 2 2 x 1st row-4th row
-~)
1 2 3 2
-(
0 7 7 2
4th row ~ 2nd row
0 -4 -10 -3
0 -1 -2 -2
1 2 3 2
-i)
0 -1 -2 -2
-(
2nd row x-I
0 -4 -10 -3
0 7 7 2
-:)
2 3 2
-(
0 2 2 4 x 2nd row + 3rd row
0 -4 -10 -3 -2 7 x 2nd row - 4th row
0 7 7 2 4
I 2 3 2
-(
0
0
0
0
0
2
-2
-7
2
5
-12
-i)
II
2 x2nd row-1st row
(-4) x3rd row + 4th row
I 0 -1 -2
- ( 0
0
0
0
0
2
-2
2
5
-32
-f)
3rd row~4th row
-~)
0 -1 -2 3rd row + 1st row
- ( 0
0
0
0
0
2
-I
2
-32
5
2 x 3rd row - 2nd row
-~)
0 0
-(
0 0 66
divide 4th row by - 59
0 0 -32
0 0 0 -59
-34 34 x 4th row + 1st row
(
0 0
0 I 0 66 6) 66 x 4th row - 2nd row
-7
- 0 0 -32
0 0 0 -8/5! 32 x 4th row + 3rd row
0 0 0 82/59)
- (0
0 I 0 0 115/59
0 I 0 -79/59
0 0 0 -8/59
Note: In one or two places various manipulations have been done to simplify
the calculations.
Exercise 9
Find the reduced row echelon form of the following matrices:
(i) (-~ 2 ~)
121
(ij) G: ~)
The significance of this procedure is that given a system of equations in
reduced echelon form it is easy to read off the solutions. Assume we begin
with (A, b) and end with systems which when interpreted as equations will
look like this:
$$\begin{aligned} x_1 + a'_{1,r+1}x_{r+1} + \cdots + a'_{1,n}x_n &= b'_1 \\ x_2 + a'_{2,r+1}x_{r+1} + \cdots + a'_{2,n}x_n &= b'_2 \end{aligned}$$
The various possibilities for solutions are given by the nature of the form.
It is probably best to use a number of examples to illustrate.
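Examples
16. $2x_1 + 4x_2 = 6$
$3x_1 + 6x_2 = 9$
Then we know from earlier calculations that this system has the same
solution set as
$$x_1 + 2x_2 = 3 \quad\text{and}\quad 0x_1 + 0x_2 = 0$$
So we have $x_2 = \lambda$ and $x_1 = 3 - 2\lambda$.
17. $2x_1 + 4x_2 = 6$
$3x_1 + 6x_2 = 10$
Augmented matrix is $\begin{pmatrix} 2 & 4 & 6 \\ 3 & 6 & 10 \end{pmatrix}$. So by process of elementary row
operations we get
$$\begin{pmatrix} 2 & 4 & 6 \\ 3 & 6 & 10 \end{pmatrix} \sim \begin{pmatrix} 1 & 2 & 3 \\ 0 & 0 & 1 \end{pmatrix}$$
The second row is equivalent to the equation
$$0x_1 + 0x_2 = 1$$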
This is clearly impossible. In this situation we say that the system of equations
is inconsistent. This illustrates the rule that, when the augmented matrix has
been transformed to row echelon form, if we get a row (0, 0, ... , 0, 1) then
the system has no solution.
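18. $x_1 + x_2 + x_3 = 1$
$x_2 + x_3 = 2$
$x_3 = 4$
The augmented matrix is
$$\begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 1 & 4 \end{pmatrix}$$
which transforms to
$$\begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 1 & 4 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & 4 \end{pmatrix}$$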
so the solution is XI = -1, X2 = -2, X3 = 4. We have a unique solution. This
corresponds to the transformed version of the augmented matrix being:
$\Rightarrow x = Bb$
In this case the solution is found once B (known as the inverse of A, written
$A^{-1}$) is found. However, in practice the method of finding $A^{-1}$ is to use the
transforming procedure above, together with column operations (which are
defined analogously).
These techniques are generally known under the generic title "the Gauss
Elimination Method". There is a vast literature on techniques for carrying
them out on a computer. It is not too difficult to write such a programme;
the difficulty is to make it efficient and to avoid creating too many rounding
errors.
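A minimal sketch of such a programme is given below. It is not an efficient or production-quality routine: it simply applies the three elementary row operations to bring an augmented matrix to reduced row echelon form, and it sidesteps rounding errors by using exact fractions from Python's standard `fractions` module. The function name `rref` is an illustrative choice.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form of a matrix given as a list of rows."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # find a row at or below pivot_row with a non-zero entry in this column
        pick = next((r for r in range(pivot_row, n_rows) if M[r][col] != 0), None)
        if pick is None:
            continue
        M[pivot_row], M[pick] = M[pick], M[pivot_row]       # switch two rows
        p = M[pivot_row][col]
        M[pivot_row] = [v / p for v in M[pivot_row]]         # multiply a row by a constant
        for r in range(n_rows):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]                           # take a multiple of one row from another
                M[r] = [v - factor * w for v, w in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M

consistent = rref([[2, 4, 6], [3, 6, 9]])       # 2x1 + 4x2 = 6, 3x1 + 6x2 = 9
inconsistent = rref([[2, 4, 6], [3, 6, 10]])    # 2x1 + 4x2 = 6, 3x1 + 6x2 = 10
print(consistent)
print(inconsistent)
```

In the second case the last row comes out as (0, 0, 1), which signals that the system has no solution.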
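Example 19
$$\begin{aligned} x_1 + x_2 + x_3 + x_4 &= 1 \\ 2x_1 - x_2 + 2x_3 - x_4 &= -1 \\ -x_1 + x_2 + x_3 - 2x_4 &= 2 \\ x_1 + 3x_3 - 3x_4 &= 1 \end{aligned}$$
Write down the augmented matrix: then apply elementary operations:
$$\begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 2 & -1 & 2 & -1 & -1 \\ -1 & 1 & 1 & -2 & 2 \\ 1 & 0 & 3 & -3 & 1 \end{pmatrix} \quad \begin{array}{l} \text{2nd row - 2 x 1st row} \\ \text{3rd row + 1st row} \\ \text{4th row - 1st row} \end{array}$$
$$\sim \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & -3 & 0 & -3 & -3 \\ 0 & 2 & 2 & -1 & 3 \\ 0 & -1 & 2 & -4 & 0 \end{pmatrix} \quad \begin{array}{l} \text{switch 2nd row with 4th row} \\ \text{and multiply by -1} \end{array}$$
$$\sim \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & -2 & 4 & 0 \\ 0 & 2 & 2 & -1 & 3 \\ 0 & -3 & 0 & -3 & -3 \end{pmatrix} \quad \begin{array}{l} \text{1st row - 2nd row} \\ \text{3rd row - 2 x 2nd row} \\ \text{4th row + 3 x 2nd row} \end{array}$$
$$\sim \begin{pmatrix} 1 & 0 & 3 & -3 & 1 \\ 0 & 1 & -2 & 4 & 0 \\ 0 & 0 & 6 & -9 & 3 \\ 0 & 0 & -6 & 9 & -3 \end{pmatrix} \quad \begin{array}{l} \text{dividing 3rd and 4th} \\ \text{rows by 6} \end{array}$$
$$\sim \begin{pmatrix} 1 & 0 & 0 & \tfrac{3}{2} & -\tfrac{1}{2} \\ 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & -\tfrac{3}{2} & \tfrac{1}{2} \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$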
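We can write this as
$$\begin{aligned} x_1 + \tfrac{3}{2}x_4 &= -\tfrac{1}{2} \\ x_2 + x_4 &= 1 \\ x_3 - \tfrac{3}{2}x_4 &= \tfrac{1}{2} \end{aligned}$$
We can choose $x_4 = \lambda$ and then
$$\begin{aligned} x_1 &= -\tfrac{1}{2} - \tfrac{3}{2}\lambda \\ x_2 &= 1 - \lambda \\ x_3 &= \tfrac{1}{2} + \tfrac{3}{2}\lambda \end{aligned}$$
So the system is consistent, i.e. has a solution. There is just one parameter
involved, $\lambda$, and the remaining variables are determined by $\lambda$.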
Exercise 10
In the following write down the augmented matrix. Find its reduced row
echelon form. If the system is consistent, find the solutions.
(i) $x_1 + x_2 = 1$
$x_1 - x_2 = 1$
(ii) $x_1 + x_2 - x_3 = 1$
$x_1 - x_2 + 2x_3 = 4$
(iii) $2x_1 + 2x_2 - 2x_3 + x_4 = 5$
$x_1 - x_2 + x_3 - x_4 = 6$
$3x_1 - 4x_2 + 5x_3 = 7$
(iv) $x_1 + x_2 + x_3 = 1$
$x_1 - 2x_2 + 3x_3 = 3$
$2x_1 - x_2 + 4x_3 = 5$
To end this section we remark that equations of the form Ax = 0 (i.e. with
all zeros on the right hand side) are called homogeneous. Given the solutions
of this equation, if there is at least one solution, say $x_1$, of the system
Ax = b then all the solutions are of the form $x_0 + x_1$ where $x_0$ is a solution
of the homogeneous equation. This situation is analogous to the problem
of solving linear differential equations.
These examples illustrate results which can be proved using the test which
we have mentioned: counting repeated eigenvalues the number of times
they occur, every n x n matrix has n eigenvalues (not necessarily distinct),
and for any two distinct eigenvalues we can find distinct corresponding
eigenvectors. There is also an important theorem which states that: if A is
a real symmetric n x n matrix, then all the eigenvalues of A are real and
there exists an orthogonal matrix U such that
$$U^TAU = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}$$
where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of A. This is of great practical value as
in many physical cases A is real symmetric.
Another concept which is related to matrices is that of the quadratic form.
A quadratic form in n variables is an expression $\sum_{i \le j} a_{ij} x_i x_j$ in variables $x_i$,
$x_j$ with coefficients $a_{ij}$.
Examples 22
(a) $x_1^2 + x_2^2 + 2x_1x_2$
(b) $2x_1^2 - 3x_2^2 + 4x_1x_2 - x_3^2 + x_3x_1$
(c) $x_1x_2 + x_1x_3 + x_1x_4 + x_2x_3$
In fact any so-called central conic has an equation in the form $ax^2 + bxy +
cy^2 = r$, where the expression on the left hand side is a quadratic form in
two variables x and y. If $\sum_{i \le j} a_{ij} x_i x_j$ is a quadratic form and we define
$b_{ij} = \frac{1}{2} a_{ij}$ for $i < j$
$b_{ii} = a_{ii}$
$b_{ij} = \frac{1}{2} a_{ji}$ for $i > j$
for appropriate values of i and j, then using the symmetric n x n matrix
$B = (b_{ij})$ and the vector $x = (x_1, \ldots, x_n)^T$
we can write the quadratic form very simply as $x^T B x$. The $\frac{1}{2}$ appears because
of the symmetric nature of the system.
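As a check on the construction of B, here is a small sketch that builds the symmetric matrix from the coefficients $a_{ij}$ and evaluates $x^T B x$. It takes Example 22(a) to be $x_1^2 + x_2^2 + 2x_1x_2$; the helper names and the test vector are hypothetical.

```python
from fractions import Fraction

def symmetric_matrix(coeffs, n):
    """Build B from coefficients a_ij (1-based, i <= j) of sum a_ij x_i x_j."""
    half = Fraction(1, 2)
    b = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), a in coeffs.items():
        if i == j:
            b[i - 1][i - 1] = Fraction(a)                     # b_ii = a_ii
        else:
            b[i - 1][j - 1] = b[j - 1][i - 1] = half * a      # split a_ij in two
    return b

def quadratic_form(b, x):
    """Evaluate x^T B x."""
    n = len(x)
    return sum(x[i] * b[i][j] * x[j] for i in range(n) for j in range(n))

# Example 22(a): x1^2 + x2^2 + 2 x1 x2
coeffs = {(1, 1): 1, (2, 2): 1, (1, 2): 2}
B = symmetric_matrix(coeffs, 2)
x = [3, -5]                                   # arbitrary test vector
direct = x[0] ** 2 + x[1] ** 2 + 2 * x[0] * x[1]
via_matrix = quadratic_form(B, x)
```

Both routes give the same value, which is the point of splitting each cross coefficient equally between $b_{ij}$ and $b_{ji}$.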
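Examples 23
(a) $(x_1\ x_2) \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$
(b) $(x_1\ x_2\ x_3) \begin{pmatrix} 2 & 2 & \frac{1}{2} \\ 2 & -3 & 0 \\ \frac{1}{2} & 0 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$
(c) $(x_1\ x_2\ x_3\ x_4) \begin{pmatrix} 0 & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \frac{1}{2} & \frac{1}{2} & 0 & 0 \\ \frac{1}{2} & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}$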
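$$U^T B U = \begin{pmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_n \end{pmatrix}$$
and the corresponding form is then
$$y^T \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix} y = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2$$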
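For two variables the orthogonal matrix can be taken to be a rotation, and the diagonalisation can be checked numerically. The sketch below is only illustrative: the form $5x^2 + 4xy + 2y^2$ and the function name are hypothetical, and the angle is chosen from the standard condition $\tan 2\theta = 2h/(a - c)$ for $B = \begin{pmatrix} a & h \\ h & c \end{pmatrix}$.

```python
import math

def rotate_to_axes(a, h, c):
    """Diagonalise B = [[a, h], [h, c]] by a rotation U; return theta and D = U^T B U."""
    theta = 0.5 * math.atan2(2 * h, a - c)
    cs, sn = math.cos(theta), math.sin(theta)
    U = [[cs, -sn], [sn, cs]]
    B = [[a, h], [h, c]]
    # D[i][j] = sum over k, l of U[k][i] * B[k][l] * U[l][j], i.e. (U^T B U)[i][j]
    D = [[sum(U[k][i] * B[k][l] * U[l][j] for k in range(2) for l in range(2))
          for j in range(2)] for i in range(2)]
    return theta, D

# Hypothetical central conic 5x^2 + 4xy + 2y^2 = r, so a = 5, h = 2, c = 2
theta, D = rotate_to_axes(5.0, 2.0, 2.0)
```

The off-diagonal entries of D vanish to rounding error and the diagonal entries are the eigenvalues of B (here 6 and 1), so in the rotated coordinates the form is $6y_1^2 + y_2^2$.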
For example, a process like this is used when we say that an ellipse can
be written as $x^2/a^2 + y^2/b^2 = r^2$. In general a central ellipse has the shape
shown in Fig. 5.4, where the dotted lines represent the axes of the ellipse.
Fig. 5.4
$$U = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
it is not too difficult to check that U is orthogonal and that, if
then we obtain the equation of the ellipse (relative to axes along the dotted
lines) in the standard form:
STOCHASTIC PROCESSES,
PROBABILITY AND STATISTICS
1 INTRODUCTION
Often when doing an experiment or monitoring a system we end up with
a sequence of observations $x_1, x_2, x_3, \ldots, x_t, \ldots$. These may be observations
made at discrete time intervals, e.g. monthly sales figures or velocity per second,
or we may prefer to think of them as forming a continuous record as time
passes, for example like a pen recorder. Figures 6.1-6.4 give examples of
records like these. Such sequences or traces are called time series and we
shall use x(t) to denote the observation made at time t. (In almost all
applications t is time but one could let t be distance, say down a railway
line, and x(t) could be the "height" of the rail.) Thus in Fig. 6.1 x(t) denotes
the seismic noise at time t. Sometimes $x_t$ is used to denote the series, usually
when we have discrete time intervals.
Given such a series the obvious questions that arise are
(a) what does it tell us about the system or experiment that gave rise to
the series?
(b) how can we predict future values, or past values that are missing?
In some circumstances x( t) is predictable. Thus if x( t) is the output of
a radio and we know that the input is a signal of constant frequency and
amplitude then x( t) is (pretty well) predictable. In fact with a decent radio
you might argue that the output of a Bach fugue was perfectly predictable.
This isn't always so: if the radio is tuned badly to a distant source then
x(t) may contain "noise". At some point the noise may make x(t) unpredictable
even when the source is known. Clearly, monthly unemployment figures
or aluminium production figures are not perfectly (if at all) predictable.
To take a specific case we might represent the major features of Fig. 6.2
by
$$x(t) = a\cos(2\pi f_0 t + \phi)$$
where $f_0$ is the supply frequency and a the amplitude. A glance at the figure
shows that while this may well be acceptable as a crude description, the
actual value x( t) fluctuates irregularly with time.
Fig. 6.1
Fig. 6.2 (vertical axis: voltage, kV)
Fig. 6.3
Fig. 6.4
Fig. 6.5
depending on the actual outcomes, i.e. each sequence is a time series. The
stochastic process X(t) describes the system, viz. a coin is tossed in the air,
and:
X(t) = "1 if the coin is heads, otherwise 0"
x(t) = the actual value 1 or 0 according as the coin is heads or tails
Thus for the first sequence we have for x(t) the values
1 1 0 0 ...
Fig. 6.6
Example 3
A drunk stands at the point (0, 0) on the plane. At set intervals of time he
takes a step of unit length in either the x direction or the y direction.
Suppose that the x and y steps are equally likely and the length of each
step is equally likely to be +1 or -1. Then Fig. 6.6 describes one outcome
of the "drunkard's walk". The drunkard's walk is the stochastic process,
while each drunk's path is a realisation, or observation.
In Section 6.9 we show how to generate such paths without requiring a
mathematical drunk.
2 PROBABILITY
Suppose we take a stochastic process at some fixed point in time, t. Then
at this moment X(t) will give rise to x(t). Since we are now just looking at
one point in time t we suppress the t suffix and consider the "random
variable" X and the observation x.
The set of all possible values that X can take, say S, is called the sample
space.
Example 4
Suppose X denotes the outcome of rolling a die. Then S consists of the
numbers 1 to 6, viz. S = {1, 2, ..., 6}, and x will be the actual result observed.
Example 5
Suppose X is the diameter of a nominally 1 cm diameter ball bearing. In
this case S might be the set of diameters between 8 mm and 12 mm. If we
select any particular bearing and measure its diameter we get x, one of the
values in S.
Fig. 6.7
so p = 1/n.
Example 6
Suppose that in a family p(baby is a boy) = p(B) = 1/2 = p(baby is a girl) = p(G).
Consider families of two children: this can happen in the following ways
BB BG GB GG
i.e.
S = {BB, BG, GB, GG}
Assuming each is equally likely we have
p(one of each sex) = p(BG or GB)
= p(BG) + p(GB) = 1/4 + 1/4 = 1/2
p(two boys) = p(BB) = 1/4
p(two girls) = p(GG) = 1/4
p(first a girl and then a boy) = p(GB) = 1/4
Notice one can have one child of each sex in two ways, BG and GB. These
are quite distinct events.
The famous statistician R. A. Fisher had 7 daughters; since there are 128
possible combinations of B and G the probability of this event is 1/128.
Example 7
A die is rolled twice, giving the following set of possibilities
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
Notice we have taken the order into account and distinguish between (1,6)
and (6,1); these are separate events.
As there are 36 outcomes the probability of any pair (i, j) is just 1/36.
So
p(1,6) = p(6,1) = 1/36
p(we obtain a 1 and a 6 in any order) = p(1,6) + p(6,1) = 1/18
p(two faces add up to 4) = p(1,3) + p(3,1) + p(2,2) = 1/12
p(two faces have the same value) = p(1,1) + p(2,2) + ... + p(6,6) = 6/36 = 1/6
p(second face shows higher number) = p(1,2) + p(1,3) + ...
+ p(2,3) + ... + p(3,4) + ... + p(5,6) = 15/36 = 5/12
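The probabilities in Example 7 can be confirmed by listing the 36 ordered outcomes and counting. This is a minimal sketch; the helper `prob` and the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (i, j) of rolling a die twice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on the ordered pair (i, j)."""
    favourable = sum(1 for i, j in outcomes if event(i, j))
    return Fraction(favourable, len(outcomes))

p_one_and_six = prob(lambda i, j: {i, j} == {1, 6})   # a 1 and a 6 in any order
p_sum_four = prob(lambda i, j: i + j == 4)            # faces add up to 4
p_same = prob(lambda i, j: i == j)                    # both faces equal
p_second_higher = prob(lambda i, j: j > i)            # second face higher
```

Because each outcome has probability 1/36, every event probability is just a count divided by 36, and `Fraction` keeps the results exact.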
Exercises
1. A pair of dice is thrown twice. What is the probability of getting totals
of 7, of 11?
2. A bag contains 6 discs numbered 1, 2, ..., 6. Two discs are drawn from
the bag. Find the sample space S. What is the probability that the sum of the
numbers on the discs is 12? 7? 11? Suppose now one of the 6 discs is drawn
from the bag, the number noted and the disc replaced. A further disc is now drawn.
What are the probabilities of the events above in this case?
Often we can only discuss the probability of an event A given that another
event B has already occurred. This is the conditional probability of A given
B, written
p(A|B)
We define this probability as
p(A|B) = p(A and B)/p(B) provided p(B) > 0
If p(A|B) = p(A), that is, B does not affect A, then
p(A and B) = p(A)p(B)
and A and B are said to be stochastically independent or more usually just
independent.
Example 8
Suppose we roll a die; then the possible outcome is one of the numbers
{1, 2, 3, 4, 5, 6}. Since we assume that the die is "fair", we deduce
p(outcome i) = 1/6
If we roll two dice then
p(i on first and j on the second) = p(i on first)p(j on second)
= 1/6 x 1/6 = 1/36
This seems reasonable since we assume the dice are independent, i.e. they
do not collaborate. It also agrees with Example 7.
Example 9
Suppose we roll a die and B is the event that the number observed is even.
Thus p(B) = 1/2. Define A to be the event that the number observed is 4. Then
p(A|B) = p(A and B)/p(B) = p(no. is 4)/p(even) = (1/6)/(1/2) = 1/3.
In the same way
p(we observe 5|B) = p(observe 5 and the number is even)/p(number is even) = 0
since 5 is not an even number.
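The definition p(A|B) = p(A and B)/p(B) translates directly into code. The sketch below reworks Example 9 for a fair die; the function names are hypothetical.

```python
from fractions import Fraction

faces = range(1, 7)

def p(event):
    """Probability of an event (a predicate on the face shown) for a fair die."""
    return Fraction(sum(1 for f in faces if event(f)), 6)

def conditional(a, b):
    """p(A|B) = p(A and B) / p(B), defined only when p(B) > 0."""
    pb = p(b)
    if pb == 0:
        raise ValueError("p(B) must be positive")
    return p(lambda f: a(f) and b(f)) / pb

def is_even(f):
    return f % 2 == 0

p_four_given_even = conditional(lambda f: f == 4, is_even)
p_five_given_even = conditional(lambda f: f == 5, is_even)
```

The guard on p(B) mirrors the proviso in the definition: conditioning on an event of probability zero is not defined.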
Exercise 3
There are 37 numbers on a roulette wheel, 18 of which are red. Assuming
a "fair" wheel and independence between spins, find the probability of 26
successive red numbers. This happened at Monte Carlo and made the house
rather rich!
Example 10
Suppose S represents the adults in a county who have completed an Open
University course. We classify them by sex and employment:
ways. This expression is often written $\binom{n}{r}$, and is the number of combinations
of r objects from n. Thus, if we want to pick a committee of three people
from ten possible candidates, it can be done in
Of the possible hands, we could pick a hand all the cards of which were
in the same chosen suit (say spades) in C~) ways. Thus the probability of
a hand all of which are spades is
Furthermore, the probability of a hand all of which are in the same suit
Exercise 5
Find the number of combinations of two letters from CHLOE. Check your
answer by writing out all pairs.
4 PROBABILITY DISTRIBUTIONS
Binomial Distribution
Suppose our "trial" has only two possible outcomes S and F. Let p(S) =
p, p(F) = q = 1 - p. Then suppose we conduct n independent trials; the
probability that X = r (where X is the number of S's observed) is given by
$$p(X = r) = \binom{n}{r} p^r q^{n-r} = \binom{n}{r} p^r (1-p)^{n-r}$$
This simple distribution has so many useful applications that there are
extensive tables to help compute the probabilities. Figure 6.8 shows the
shape of the distribution for some values of p. Further details can be found
in any good set of statistical tables (for example Statistical Tables by H. R.
Neave, George Allen and Unwin, London 1978).
Example 11
Suppose items come off a production line and the probability that one is
defective is 0.01. Then the probability that exactly one of a batch of 10 is
defective is given by
$$p(X = 1) = \binom{10}{1}(0.01)^1(0.99)^9 \approx 0.0914$$
Fig. 6.8
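Where tables are not to hand, the binomial probabilities can be computed directly from the formula for p(X = r). A minimal sketch for Example 11 (n = 10, p = 0.01, r = 1) follows; the function name is hypothetical.

```python
from math import comb

def binomial_pmf(r, n, p):
    """p(X = r) for n independent trials with success probability p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Example 11: exactly one defective item in a batch of 10
p_one_defective = binomial_pmf(1, 10, 0.01)

# The probabilities over r = 0, 1, ..., 10 should sum to 1
total = sum(binomial_pmf(r, 10, 0.01) for r in range(11))
```

The value is roughly 0.091, so a batch with exactly one defective item turns up about once in eleven batches.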
Poisson Distribution
Another useful distribution is the Poisson distribution, usually used when
counting "rare" events. Typically if events occur at "random" at an average
rate of $\lambda$, then X, the number of events occurring in 0 to t, has the distribution
$$p(X = r) = \frac{(\lambda t)^r e^{-\lambda t}}{r!}, \qquad r = 0, 1, 2, \ldots$$
We outline the derivation to give some idea how it arises: this can be
omitted at a first reading. Figure 6.9 gives the shape of the distribution.
Fig. 6.9 (P(X = r) against r)
$= p_n(t)(1 - \lambda\,\delta t)$
$+ p_{n-1}(t)\,\lambda\,\delta t$
$+ p_{n-1}(t)\,\lambda\,(\delta t)^2$
Example 12
Telephone calls are made to an exchange at a rate of 0.2 per second on
average. If the calls arrive at random and X is the number of calls received
in a minute, since $\lambda t = 60 \times 0.2 = 12$ we have
$$p(X = 3) = \frac{(0.2 \times 60)^3 e^{-12}}{3!} = 1.77 \times 10^{-3}$$
$$p(X = 12) = \frac{(0.2 \times 60)^{12} e^{-12}}{12!} = 0.1144$$
$$p(X \le 11) = p(X = 0) + p(X = 1) + \cdots + p(X = 11) = 0.4616$$
$$p(X > 11) = 1 - 0.4616 = 0.5384$$
Thus if each call lasts one minute, it is to be hoped that there are more
than 11 telephone lines!
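The figures in Example 12 can be reproduced from the Poisson formula with mean 0.2 x 60 = 12. This is a short sketch with a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(r, mean):
    """p(X = r) for a Poisson variable with the given mean (lambda * t)."""
    return mean ** r * exp(-mean) / factorial(r)

mean = 0.2 * 60                     # calls per minute
p3 = poisson_pmf(3, mean)
p12 = poisson_pmf(12, mean)
p_at_most_11 = sum(poisson_pmf(r, mean) for r in range(12))
p_more_than_11 = 1 - p_at_most_11
```

Summing the individual terms is adequate for a mean of this size; for much larger means the terms are better computed on a logarithmic scale.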
Exercise 9
Tankers arrive at a dock at a rate of 3 per day. The dock has facilities to
unload 5 tankers at once. If X is the number of tankers arriving at the port
in a day, find: p(X ≤ 5), p(X > 5), p(X = 0), p(2 < X < 5), given that the
arrivals follow a Poisson distribution.
$$p(a < X < b) = F(b) - F(a) = \int_a^b f(x)\,dx$$
Fig. 6.10
Fig. 6.11 (F(x) against x)
Example 13
Suppose we choose a point on the line 0 to 1. Then let x be the observed
distance as in Fig. 6.12 of the point from 0, and let X be the variable giving
the distance of the point from 0. If we assume that the probability of the
point falling in any segment is equal to the length of the segment, then
$$F(a) = p(X \le a) = a$$
and
$$f(x) = \frac{d}{dx}F(x) = 1$$
Fig. 6.12
Fig. 6.13 (f(x) against x)
Fig. 6.14 (f(x) against x)
in this case
$$F(x) = 0 \quad \text{when } x \le 0$$
$$F(x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x} \quad \text{when } x \ge 0$$
When events occur at random as in the Poisson distribution, the time between
events has this distribution.
(2) The normal (or Gaussian) distribution
$$f(x) = (2\pi\sigma^2)^{-1/2} \exp\{-(x - \mu)^2 / 2\sigma^2\}$$
where $\mu$ and $\sigma$ are constants; this is sketched in Fig. 6.14. In this case
and then
$$p(X < b) = p\left(Z < \frac{b - \mu}{\sigma}\right) = \Phi\left(\frac{b - \mu}{\sigma}\right)$$
In general
Example 14
If T is the length of life of a component, and we know
$$f(t) = 2e^{-2t} \quad (t \ge 0), \qquad f(t) = 0 \text{ otherwise}$$
then
$$F(x) = \int_0^x 2e^{-2t}\,dt = 1 - e^{-2x}$$
$$p(T < 1) = \int_0^1 2e^{-2t}\,dt = 1 - e^{-2} = 0.865$$
or $= F(1) = 1 - e^{-2}$
$$p(1 < T < 2) = F(2) - F(1) = e^{-2} - e^{-4} = 0.117$$
$$= \int_1^2 2e^{-2t}\,dt = 0.117$$
since,
5 JOINT DISTRIBUTIONS
              X
Y        0       1       2
0      3/28    9/28    3/28
1      3/14    3/14    0
2      1/28    0       0
f(x, y) = 0 otherwise
Then
$$p(0 < X < 1 \text{ and } \tfrac{1}{4} < Y < \tfrac{1}{2}) = \int_{1/4}^{1/2} \int_0^1 \frac{x(1 + 3y^2)}{4}\,dx\,dy$$
which you should find quite straightforward to evaluate. Clearly this idea
can be extended in principle to the joint distribution of n variables $X_1,
X_2, \ldots, X_n$ using a probability distribution $f(x_1, \ldots, x_n)$ or a density
function.
If the variables are independent then we have a simpler situation that
$$f(x_1, \ldots, x_n) = f_1(x_1) f_2(x_2) \cdots f_n(x_n)$$
for some functions $f_1, f_2, \ldots, f_n$ having the properties of density functions.
This follows from the definition of independence.
or in the continuous case as
$$\mu_r = E[(X - \mu)^r] = \int_{-\infty}^{\infty} (x - \mu)^r f(x)\,dx$$
With these definitions $\mu$ is the "mean" of the distribution, and can be
thought of as its "centre of mass", while $\sigma^2 = \mu_2$ is the variance, which gives
its "spread". You will often see it written as var(X). The next moment $\mu_3$
to some extent measures the amount of asymmetry.
Section 6 gives some practical illustrations of these quantities.
Example 16
Rolling a die we have
p(X = i) = 1/6, for i = 1, 2, ..., 6
so
$$E(X) = \sum_{i=1}^{6} \tfrac{1}{6}\, i = \tfrac{21}{6} = \tfrac{7}{2} = 3.5 = \mu$$
$$E[(X - \mu)^2] = \sum_{i=1}^{6} \tfrac{1}{6}(i - 3.5)^2 = 2.9167$$
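$$E(X) = \int_0^\infty \lambda x e^{-\lambda x}\,dx = \frac{1}{\lambda}$$
$$\mathrm{var}(X) = 1/\lambda^2$$
while for the normal distribution
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$$
$$E(X) = \mu$$
$$\mathrm{var}(X) = \sigma^2$$
$$\mu_3 = 0$$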
since the distribution is symmetric.
Example 17
In general
$$\mathrm{var}(X) = E[(X - \mu)^2] = \int (x - \mu)^2 f(x)\,dx$$
$$= \int x^2 f(x)\,dx + \mu^2 - 2\mu^2 = E(X^2) - \mu^2$$
This result also holds for discrete variables.
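For a discrete case, the identity var(X) = E(X^2) - mu^2 can be checked exactly on the fair die of Example 16. This is a small sketch using exact fractions; the variable names are hypothetical.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                     # p(X = i) for a fair die

mean = sum(p * i for i in faces)                         # E(X)
variance = sum(p * (i - mean) ** 2 for i in faces)       # E[(X - mu)^2]
mean_square = sum(p * i * i for i in faces)              # E(X^2)

# The shortcut formula should agree with the direct definition
shortcut = mean_square - mean ** 2
```

Here the mean is 7/2 and both routes give a variance of 35/12, which is the 2.9167 quoted in Example 16.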
Example 18
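Suppose we have the density function
$$f(x) = \frac{200}{x^3}, \qquad x > 10$$
Then
$$F(t) = \int_{10}^{t} \frac{200}{x^3}\,dx = \left[-\frac{200}{2x^2}\right]_{10}^{t} = 1 - \frac{100}{t^2}$$
and
$$E(X) = \int_{10}^{\infty} x \cdot \frac{200}{x^3}\,dx = \int_{10}^{\infty} \frac{200}{x^2}\,dx = \left[-\frac{200}{x}\right]_{10}^{\infty} = 20$$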
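The following table gives the mean and variances for some common
distributions
Distribution                      Mean           Variance
Binomial (x = 0, 1, ..., n)       $np$           $npq$
Poisson                           $\lambda$      $\lambda$
Exponential (x > 0)               $1/\lambda$    $1/\lambda^2$
Normal                            $\mu$          $\sigma^2$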
$$E(\phi(X)) = \int \phi(x) f(x)\,dx$$
$$p(B = k) = \binom{5}{k}(0.05)^k(0.95)^{5-k}$$
since once the machine breaks the remainder of a day is lost. The cost of
repair if a maintenance contract is not held is £250 per call.
If a maintenance contract costing £100 per week is available which covers
the cost of all repairs, is it worth considering?
$$E(XY) = \iint xy\,f(x, y)\,dx\,dy$$
              X
Y        1       2       3       4       Total
1      1/16    0       0       0       1/16
2      1/16    2/16    0       0       3/16
3      1/16    1/16    3/16    0       5/16
4      1/16    1/16    1/16    4/16    7/16
Total  1/4     1/4     1/4     1/4     1
$$\mathrm{cov}(X, Y) = \frac{135}{16} - \frac{5}{2} \times \frac{25}{8} = \frac{10}{16} = \frac{5}{8}$$
Example 21
Suppose we have 100 measurements of length (in mm)
22.5,20.1,23.3,22.9,23.1,22.0,22.3,23.6,24.7,23.7,
24.0,20.4,21.3,22.0,24.2,21.7,21.0,20.1,21.9,21.9,
21.7,22.6,20.9,21.6,22.2,22.5,22.2,24.3,22.3,22.6,
20.1,22.0,22.8,22.0,22.4,22.3,20.6,22.1,21.9,23.0,
22.0,22.0,21.1,22.0, 19.6,22.8,22.0,23.4,23.8,23.3,
22.5,22.3,21.9,22.0,21.7,23.3,22.2,22.3,22.8,22.9,
23.7,22.0,21.9,22.2,24.4,22.7,23.3,24.0,23.6,22.1,
21.8, 21.1, 23.4, 23.8, 23.3, 24.0, 23.5, 23.2, 24.0, 22.4,
23.9,22.0,23.9,20.9,23.8,25.0,24.0,21.7,23.8,22.8,
23.1,23.1,23.5,23.0,23.0,21.8,23.0,23.3,22.4,22.4.
First we try to give a pictorial idea of the data. The simplest method is the
histogram; to construct one we take a grid of lengths and count the number
of observations in each interval of the grid.
These histograms summarise the data and give an immediate pictorial
representation. They are very important in getting a feel for what is going on.
Interval      Mid-point   Frequency
19.5-19.9     19.7          1
20.0-20.4     20.2          4
20.5-20.9     20.7          3
21.0-21.4     21.2          3
21.5-21.9     21.7         12
22.0-22.4     22.2         27
22.5-22.9     22.7         12
23.0-23.4     23.2         16
23.5-23.9     23.7         12
24.0-24.4     24.2          8
24.5-24.9     24.7          1
25.0-25.4     25.2          1
Total                     100
Then we use these figures to draw bars of heights equal to the various
frequencies in our grid. Technically the area of the bar is equal to the
frequency and this can enable us to cope with grids with unequal intervals
(see Fig. 6.15).
Example 22
Figure 6.16 gives the distribution of the age at death of a sample of infants
born in the UK.
Clearly we have a background population and in each case the sample
and the associated histogram is used to make some inference about the
population. We might feel that the frequencies give an approximation to
the probabilities, thus we might suspect in Example 21
p(measurement lies between 22.0 and 22.4) = 27/100
We can also use numerical values to summarise the data $x_1, x_2, \ldots, x_n$:
(i) the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
In Example 21 $\bar{x} = 22.555$.
(ii) The median M which has the property that half the values are less
than M. To compute M we usually rank the observations in order
$x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ where $x_{(1)}$ is the smallest, $x_{(2)}$ the next smallest and
Fig. 6.15
sample         $\bar{x}$
1 7 8 5 3      4.8
0 9 7 1 8      5.0
5 3 1 4 6      3.8
Fig. 6.16 (horizontal axis: age at death (years))
is used to estimate the population variance. For Example 21 $s^2 =
(1.077)^2$.
(v) Various other quantities are used:
$Q_U$, the upper quartile, i.e. the value that exceeds 75% of the observations;
$Q_L$, the lower quartile, i.e. the value that exceeds 25% of the observations;
the interquartile range $IR = Q_U - Q_L$.
From the sample and histograms etc. we attempt to make inferences
about the unknown population structure. The details are complex
and we shall restrict ourselves to one simple application.
Exercise 13
For the data below construct a histogram. Compute the statistics $\bar{x}$, $s^2$, and
M. Mark $\bar{x}$ on your histogram.
1 3 4 5 1 1 3 4 0 6
7 7 4 2 3 2 2 1 3 3
4 6 1 2 1 6 4 1 2 1
6 4 4 1 3 2 2 4 6 1
0 4 5 3 3 1 0 3 2 4
4 0 2 4 2 0 1 5 2 1
3 0 3 3 2 3 4 6 2 4
2 4 6 0 3 2 2 1 3 2
1 3 4 2 2 2 2 1 3 7
1 1 2 6 2 3 3 0 0 2
4 4 3 3 1 4 1 3 1 1
2 5 2 3 3 1 3 3 3 1
3 1 2 2 5 0 3 1 1 5
2 0 6 3 6 1 2 3 2 1
3 2 4 2 2 1 2 5 2 0
1 3 3 8 5 4 6 0 6 4
5 3 3 2 2 2 5 2 4 8
2 1 2 2 3 2 4 2 2 1
4 3 3 4 6 1 1 5 0 2
1 3 1 3 3 5 2 3 5 3
8 TWO VARIABLES
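For two samples we can also estimate the correlation $\rho$. Suppose we have
n pairs of observations $(x_i, y_i)$ for i = 1, ..., n. If we define
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
and
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$
$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$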
We assume that the errors are independent and have common mean 0, and
common variance $\sigma^2$.
To find a and b we might choose the values which minimise
$$Q = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
Fig. 6.17 (scatter plots of y against x; panels labelled r = 0 and r = 0.9)
Fig. 6.18
Setting $\partial Q/\partial a = 0$ and $\partial Q/\partial b = 0$ gives
$$\sum y_i = \sum a + b \sum x_i$$
and
$$\sum x_i y_i = a \sum x_i + b \sum x_i^2$$
which give the "normal equations"
$$\sum y_i = na + b \sum x_i$$
$$\sum x_i y_i = a \sum x_i + b \sum x_i^2$$
We should check that this does give a minimum but we omit this detail and
hope the reader will believe us.
It is easily checked that
$$a = \bar{y} - b\bar{x}$$
and
$$b = \frac{S_{xy}}{S_{xx}}$$
This is the regression of Y on X.
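The least squares estimates can be computed in a few lines. The sketch below uses five hypothetical data points (they are not from the text) and also checks that the fitted a and b satisfy the two normal equations, in which both terms on the right hand side enter with a plus sign.

```python
def fit_line(xs, ys):
    """Least squares estimates for y = a + b x: b = S_xy / S_xx, a = y_bar - b x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical observations, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
a, b = fit_line(xs, ys)

# Residuals of the two normal equations; both should be zero up to rounding
n = len(xs)
eq1 = sum(ys) - (n * a + b * sum(xs))
eq2 = sum(x * y for x, y in zip(xs, ys)) - (a * sum(xs) + b * sum(x * x for x in xs))
```

For these points the fitted line is close to y = 0.09 + 1.97x.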
Example 24
Patients' kidney function is monitored by measuring the level of a trace
element injected into the blood stream as time goes by.
1.3956 10
1.3734 12
1.3852 18
1.3416 20
1.3498 24
1.3697 26
1.3493 32
1.3175 36
1.2599 42
1.3136 48
1.2737 50
1.1946 100
1.9858 1.0000
4.6843 2.0000
3.8751 3.0000
10.9987 4.0000
9.9644 5.0000
15.2463 6.0000
16.6968 7.0000
15.8197 8.0000
21.1323 9.0000
23.3355 10.0000
26.0283 11.0000
28.5071 12.0000
27.3629 13.0000
25.0314 14.0000
28.6330 15.0000
31.9960 16.0000
32.9099 17.0000
38.2365 18.0000
34.2972 19.0000
41.5566 20.0000
Fig. 6.19 (horizontal axis: t (secs))
9 SIMULATION AND MONTE CARLO METHODS
Nearing the end of this chapter we take a brief look at methods of simulating
experiments. This look should also illustrate the relationship between a
random variable and its realisation. Our methods are based on a sequence
of "random numbers". The definition of random numbers can raise quite
nasty problems; we shall just assume that devices (or equivalently computer
programs) exist which produce sequences of "random" digits. In BASIC
the function is RND(X). By random digits we mean that each digit occurs
equally frequently, but with no discernible pattern. Table 6.8 gives
a set of numbers produced by such a generator. We shall, as an approximation,
use numbers from this table and assume that they are random.
Suppose we have a random variable X with a distribution p(X = 1) = 1/2,
p(X = 0) = 1/2. We look for a mechanism to generate realisations of X, i.e.
sequences of 0s and 1s. Suppose we choose a random digit from the table. If
it is one of 1, 3, 5, 7, 9 we deem X to have the value 1 while if it is 0, 2,
4, 6, 8, then X is deemed to take the value 0. Thus the sequence of digits
30458 gives X values 1, 0, 0, 1, 0. If we had chosen X to take the values
Head and Tail then we would have simulated the tossing of a coin. If
X = red or black then we would have simulated the outcomes of a roulette
wheel.
To handle more complex problems we first consider how to produce
values for U where U is uniformly distributed between 0 and 1, i.e. U is
a random number between 0 and 1. We can work to d decimals as follows:
choose d digits, 60219 say, and place a decimal point before the left hand
one, to get 0.60219 as the required number.
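Both mechanisms are easy to express in code. The following sketch, with hypothetical function names, turns a string of random digits into realisations of the 0/1 variable (odd digit gives 1, even digit gives 0) and into a uniform number with as many decimals as there are digits.

```python
def digits_to_bits(digits):
    """Map each random digit to 1 if it is odd and 0 if it is even."""
    return [int(d) % 2 for d in digits]

def digits_to_uniform(digits):
    """Place a decimal point before the left-hand digit, e.g. '60219' -> 0.60219."""
    return int(digits) / 10 ** len(digits)

tosses = digits_to_bits("30458")
u = digits_to_uniform("60219")
```

In practice the digits would come from a table such as Table 6.8 or from a generator; Python's `random.random()` plays the role that RND(X) plays in BASIC.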
Example 25
For the game "chuck a luck" you are invited to bet £1 on a number from 1,
2, 3, 4, 5, 6. Three dice are thrown and you win £K if the number of times,
K, your number appears is non-zero, otherwise you lose. Thus if you back
4 you win £1 for one 4, £2 for two 4s and so on. If 4 doesn't occur then
you lose your stake.
Random Digits
60219 01405 ',6662- !<lOOO ()46lH <39266 4a.48 64441 16007
86944 6490/ 8(;Ol::': o06{)~ 6IB:C0 OBi57 053;~0
091:"4 95027 4009:) 66664 8/::';26 6-+~)t:2 2359li 5BO't7 13669 33766
4821/ }16<10 /jf;jOJ 006/2 .. .30'>'9 4'1;,60 216)'9 3/862 51031 02;393
942<10 24<167 62636 5/29~: 953<1.3 822/6 01452 02/64 95827 6365~>
51749 37889 '>'6641 130"74 59861 60211 29095 09672 77489 13400
80309 33825 66220 24424 65317 03088 64654 6/504 26771 55108
29320 06216 20788 21712 88886 66767 09120 33219 69719 38069
06684 89301 23299 47598 97659 16735 96393 40863 19069 48330
19647 05272 25832 48938 25174 66654 19643 47573 56068 28029
91627 34990 66789 77256 40213 17982 40322 47825 79699 89296
95272 87270 10276 28031 15651 49008 54462 68420 44737 90'J4~j
55093 68372 81382 26263 66387 11334 40456 78640 20932 551~)6
09798 61378 02517 72648 63039 22456 22820 17868 63496 8254~~
~S6587 61599 30946 66912 53639 65269 87144 95920 83838 73762
84002 57484 19497 28527 52711 66041 93180 13714 17029 18370
7 6~.~ 18 40374 09.300 22761 <16493 86684 35873 33471 55101 ?l86f.l
22867 74847 38427 34953 70/25 04573 14/05 32877 853f.l3
43840 74666 /2489 34264 52871 56411 65459 29192 88637 15307
99590 78662 /'/1:"7 3%32 73964 106Bl 57011 48183
75:5'>'9 10606 0/64.3 804'>'4 67713 12631 /4,,76 91027 67943
376.,0 62921 :"039<1 :;'14/8 21~IBO 61830 :.'.'1394 B7699 08694
33235 26~'78 :j940:::' 24/1:';: 15576 995BB <,15962 /1~68 62385 612~jEl
04(132 6/:"6<1 668,:),:) 49.541 20099 7'1204 ()9..524 94647 0::'6U ;:~',8l0
58563 376,,), 12043 52'778 18828 74't19 0:::1445 17254 10615 04137
82898 31192 07944 3029"7 912T7 81234 51435 4:'.:.295 22966 03944
05070 52557 86600 76672 64175 48824 29124 38816 08937 510B7
31415 31947 68217 25701 20043 36307 71783 99230 88528 4591~~
29077 31027 52213 38563 00430 12550 29660 87182 65095 93923
70489 49458 24852 81311 45193 39468 47522 07207 55866 4246"7
47407 21451 77500 19656 98062 41106 29548 64092 21026 673M,
64196 91133 55018 41382 85029 05463 34210 78635 44976 79710
54521 93753 64520 96323 92639 31233 51428 80460 88712 0137El
l2836 76023 51517 74013 86542 78707 00396 75640 11430 60114
2"7047 74090 68775 26473 66635 96127 10657 90027 40835 90660
;n:360 00710 48466 54551 91516 48198 24992 39476 91497 051/0
51799 16501 54529 18207 41438 66220 55908 54785 47129 06123
8"7319 19686 53825 52234 62992 03034 02617 73932 36097 87399
21049 33319 98784 59544 41446 13,,34 01320 68857 95034 41044
07049 67869 19680 275'11 64365 44709 55913 80179 99003 20214
',841., 913'>'5 6:"036 04382 77641 7:,814 67753 /~J084 44899 261B9
3/126 97848 4~49~ 63/16 <180<11 1 ;,<,I~''t 19560 6B333 47670 67316
B3::>'64 829;j() lOll/ 41790 841()1 8B:"O() :J()l44 64101 Ci9741 9434<.1
036<18 "7097<1 /1287 64161 21891 30146 60406 2J999 88491 86876
67054 30"700 95609 52898 97103 2;jl·?3 2348:) 12514 36341 5721~~
061<10 84691 00363 21523 13857 2B417 21830 2B099 58370 74675
97891 96114 81915 17792 02369 12693 61261 67722 21420 02'J49
51559 50287 60103 17184 10242 78478 18412 03317 97694 51163
02602 05475 57445 16753 60581 74783 68269 90193 87792 319()~.!
90896 61776 89585 57563 63118 19080 84127 11245 13011 44592
Fig. 6.20 (the interval 0 to 1 divided into segments of length 0.5787, 0.3472, 0.0694 and 0.0046)
There are $6^3 = 216$ possible outcomes for the fall of 3 dice and we find
(and you might like to check) that
p(K = 0) = 125/216 = 0.5787
p(K = 1) = 75/216 = 0.3472
p(K = 2) = 15/216 = 0.0694
p(K = 3) = 1/216 = 0.0046
U         X     Gain
0.6022     1     1
0.1794    -1     0
0.2367    -1    -1
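The chuck-a-luck figures can be checked by listing all 216 falls of the dice, and the simulation step is then a lookup of a uniform number U against the cumulative probabilities. This is a sketch under the reading that the gain is -1 when K = 0 and K otherwise; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def chuck_a_luck_distribution(number=4):
    """Return [p(K=0), p(K=1), p(K=2), p(K=3)] for the backed number."""
    counts = [0, 0, 0, 0]
    for dice in product(range(1, 7), repeat=3):
        counts[dice.count(number)] += 1
    return [Fraction(c, 216) for c in counts]

def gain_from_uniform(u, probs):
    """Map a uniform number u in [0, 1) to the gain for one game."""
    cumulative = 0
    for k, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return -1 if k == 0 else k
    return 3

probs = chuck_a_luck_distribution()
gains = [gain_from_uniform(u, probs) for u in (0.6022, 0.1794, 0.2367)]
expected_gain = sum(p * (-1 if k == 0 else k) for k, p in enumerate(probs))
```

The expected gain per game works out at -17/216, so on average the player loses about 8p per £1 staked.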
p(step in ±y direction) = 1/2
p(step in ±x direction) = 1/2
and
p(step is +1) = 1/2
p(step is -1) = 1/2
from the possibilities open to the drunkard. See Fig. 6.6.
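A realisation of the drunkard's walk can be generated from these four probabilities. The sketch below uses Python's `random.Random` in place of a table of random digits; the function name and the seed are hypothetical choices.

```python
import random

def drunkards_walk(n_steps, rng):
    """Return the list of positions visited, starting from (0, 0)."""
    x, y = 0, 0
    path = [(x, y)]
    for _ in range(n_steps):
        step = 1 if rng.random() < 0.5 else -1    # +1 or -1, equally likely
        if rng.random() < 0.5:                    # x or y direction, equally likely
            x += step
        else:
            y += step
        path.append((x, y))
    return path

path = drunkards_walk(20, random.Random(1))       # one realisation of 20 steps
```

Each call with a different seed gives a different path: the process is the same, only the realisation changes.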
Exercise 15
Generate 5 samples from the Binomial distribution
Exercise 16
The number of accidents on a stretch of road per day is known to be a
Poisson variable X with mean $\lambda = 0.15$, i.e.
$$P(X = r) = \frac{(0.15)^r e^{-0.15}}{r!}$$
By using random numbers simulate 2 successive weeks of accidents. How
close is your average number of accidents to the population mean value 0.15?
f
I=i g(y~a») dy
so we can save time and think just of integrals like $I = \int_0^1 g(x)\,dx$. Now
$$I = \int g(x) f(x)\,dx \quad \text{where } f(x) = 1$$
$$= E(g(X))$$
where X has a uniform distribution f(x) = 1 where $0 \le x \le 1$. We can take
a sample of uniform numbers for X and approximate I by
$$\hat{I} = \frac{1}{n} \sum_{i=1}^{n} g(x_i)$$
where the $x_i$ are uniform random numbers. We know $\int_0^1 e^x\,dx = [e^x]_0^1 = e - 1 =
1.71828$. Simulation gives 1.7194 with n = 10.
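The estimate is simply the average of g over a sample of uniform numbers. A minimal sketch for g(x) = e^x follows, with a hypothetical function name, a seeded generator and a larger sample than the n = 10 used above.

```python
import math
import random

def monte_carlo_integral(g, n, rng):
    """Estimate the integral of g over (0, 1) by averaging g at n uniform points."""
    return sum(g(rng.random()) for _ in range(n)) / n

rng = random.Random(0)
estimate = monte_carlo_integral(math.exp, 100_000, rng)
exact = math.e - 1
```

The error of such an estimate shrinks roughly like $1/\sqrt{n}$, so each extra decimal place of accuracy costs about a hundred times as many samples.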
10 CONFIDENCE INTERVALS
You should now appreciate that when we take a sample and estimate a
population parameter there must be some error. Statistics has many techniques
for reducing error, finding "best" estimates and testing hypotheses
about parameters. We will not discuss these as they would require a separate
volume; however, we think that you should encounter one useful idea in
this area based on the "Central Limit Theorem".
The Central Limit Theorem is of critical theoretical importance in Statis-
tics, but also has many useful applications. It enables us to deduce the
probability distribution of a sum or an average even when the distribution
of the individual components is unknown.
As you can imagine we often have sums of observations and so this result
is of crucial importance.
A simple form is as follows:
Suppose we have n observations each with mean $\mu$ and variance $\sigma^2$.
Then if $\bar{x} = (\sum x_i)/n$, the average, we can prove that, for large n,
$z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ is approximately normal with mean 0 and
variance 1; notice that the standard deviation of the average is reduced by a
factor $\sqrt{n}$ and we gain precision. A simple example of its use is in the
analysis of rounding errors. Suppose, because of rounding error, the last digit
of a number is "noise" and is equally likely to be any of 0, 1, 2, ..., 9.
Suppose X is a random variable taking these values; then
E(X) = 4.5    var(X) = 8.25
If we add 1000 numbers then we might ask what is the probability that the
sum of these digits exceeds 4500?
$$p(\text{sum} > 4500) = p(\bar{X} > 4.5) = p(Z > 0) = 0.5$$
from tables of the normal distribution. This is a simple example of a
profound result whose importance is hard to overstate.
Example 27
Suppose we have a Binomial distribution
$$p(T \le r) \approx p\left(z \le \frac{r - np}{\sqrt{npq}}\right)$$
where z is standard normal. Thus $p(T \le 9) = 0.9662$ from tables when n = 15,
p = 0.4. Using the approximation
$$p(T \le 9) = p(z < 1.842) = 0.9673$$
Since $\mu$ does not vary (it is just unknown), this is taken to mean: what
interval (a, b) traps $\mu$ with probability $1 - \alpha$? After some algebra we can
find the interval is
$$p\left(\bar{X} - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
90% confidence is $\bar{X} \pm 1.6449\,\sigma/\sqrt{n}$
95% confidence is $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$
99% confidence is $\bar{X} \pm 2.5758\,\sigma/\sqrt{n}$
You will notice that if you require great confidence then for any sample
size the interval will be wide while as the degree of confidence reduces so
does the interval.
Notice that the width of a 95% confidence interval is $2 \times 1.96\,\sigma/\sqrt{n} \approx
4\sigma/\sqrt{n}$. Thus, whatever $\sigma$, increasing the value of n decreases the width of
the interval. In addition, if we have accurate observations, and hence a small
value of $\sigma$, we need correspondingly small values of n for a given precision.
There is an interplay between the degree of precision required and the
sample size.
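The interplay between confidence level, sample size and interval width can be explored numerically. The sketch below uses `statistics.NormalDist` from the Python standard library to obtain the normal quantile; the function name and the numbers fed to it (mean 22.555, standard deviation 1.077 treated as known) are assumptions made only for illustration.

```python
import math
from statistics import NormalDist

def confidence_interval(x_bar, sigma, n, level=0.95):
    """Interval x_bar +/- z * sigma / sqrt(n) for a known standard deviation sigma."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical inputs: sample mean 22.555, sigma taken as 1.077
low_100, high_100 = confidence_interval(22.555, 1.077, 100)
low_400, high_400 = confidence_interval(22.555, 1.077, 400)
z_values = [NormalDist().inv_cdf(1 - (1 - level) / 2) for level in (0.90, 0.95, 0.99)]
```

Quadrupling n from 100 to 400 halves the width of the interval, while raising the confidence level from 90% to 99% widens it.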
11 STOCHASTIC PROCESSES
With this sketch of the basic concepts we can now turn our attention back
to stochastic processes.
First we consider an example of a simple queue where the probability
structure is known and we can get (with a struggle) some exact results.
Suppose that people arrive at a counter randomly at a rate $\alpha$, i.e.
$p[\text{one arrival in } (t, t + \delta t)] = \alpha\,\delta t$
$p[\text{more than one in } (t, t + \delta t)] = 0$
At the counter the service time has an exponential distribution such that
$p[\text{service finishes in } (t, t + \delta t)] = \beta\,\delta t$
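Now if at time $t + \delta t$ there are n people in the queue, this could be
(a) because there were n at time t and nobody arrived or departed in
time $\delta t$
(b) because there were n + 1 at time t and someone was served in the
time $\delta t$
(c) because there were n - 1 at time t and someone arrived in the time $\delta t$
(d) because there were n people at time t and there was one departure and one
arrival
(e) because of some other events of small probability.
Then as
$p(\text{no arrival}) = 1 - \alpha\,\delta t$
$p(\text{no departure}) = 1 - \beta\,\delta t$
we have
$$p_n(t + \delta t) = p_n(t)(1 - \alpha\,\delta t)(1 - \beta\,\delta t) + p_{n+1}(t)\,\beta\,\delta t\,(1 - \alpha\,\delta t)$$
$$+\ p_{n-1}(t)\,\alpha\,\delta t\,(1 - \beta\,\delta t) + p_n(t)\,\alpha\,\delta t\,\beta\,\delta t$$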
If we ignore terms in $(\delta t)^2$ we have
$$\frac{p_n(t + \delta t) - p_n(t)}{\delta t} = \beta(p_{n+1}(t) - p_n(t)) + \alpha(p_{n-1}(t) - p_n(t))$$
and so, as $\delta t \to 0$,
$$\frac{dp_n(t)}{dt} = \beta\{p_{n+1}(t) - p_n(t)\} + \alpha\{p_{n-1}(t) - p_n(t)\}$$
Solving this system of equations enables us to say quite a lot about queues
with the assumptions above.
If we assume the queue has reached a steady state then $dp_n(t)/dt = 0$ and
we can find (after some manipulation) that
$$p_n = (1 - \rho)\rho^n \quad \text{where } \rho = \alpha/\beta$$
and we can find the expected number of customers in the queue. Also, if
T is the time a customer spends in the queue
$$p(T > t) = \rho\,e^{-\beta(1 - \rho)t}$$
From our "simple" model we obtain some very useful results which
clearly have wide applications. Notice however that we are talking about
a model with known parameters. For an individual queue the numbers and
arrival times cannot be predicted.
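The steady-state solution can be checked numerically: with $p_n = (1 - \rho)\rho^n$ the right hand side of the differential equation should vanish for every n of at least 1. The sketch below uses hypothetical rates (arrival rate 2, service rate 3) and truncates the distribution at a large n.

```python
def steady_state(alpha, beta, n_max):
    """Return [p_0, ..., p_n_max] with p_n = (1 - rho) rho^n, rho = alpha / beta < 1."""
    rho = alpha / beta
    return [(1 - rho) * rho ** n for n in range(n_max + 1)]

alpha, beta = 2.0, 3.0            # hypothetical arrival and service rates
p = steady_state(alpha, beta, 200)

# beta (p_{n+1} - p_n) + alpha (p_{n-1} - p_n) should be 0 in the steady state
residuals = [beta * (p[n + 1] - p[n]) + alpha * (p[n - 1] - p[n]) for n in range(1, 100)]

# Expected value of n under this distribution; the closed form is rho / (1 - rho)
expected_n = sum(n * pn for n, pn in enumerate(p))
```

With these rates $\rho = 2/3$ and the expected value of n is 2; as $\rho$ approaches 1 it grows without bound.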
Usually the boot is on the opposite foot. We know the actual values
observed but do not know the values of the stochastic processes. This is a
very much more difficult problem which we start to approach in later
chapters.
Chapter 7
FOURIER ANALYSIS
1 INTRODUCTION
2 FOURIER SERIES
A function $f$ is said to have period $2l$ if $f(x+2ml) = f(x)$ for all integers $m$,
i.e. $f(x+2l) = f(x)$, and $f(x-2l) = f(x)$. The two obvious such functions are
$\cos x$ and $\sin x$, both of which have period $2\pi$. Notice that $\cos(\pi x/l)$ has
period $2l$.
In practice many functions are not periodic but we are concerned with
functions defined in a particular interval $-l < x \le l$, for some $l$. We then try
to find an analysis of this function over the interval as a sum of sine and
cosine functions. The secret of the technique is to use the relations for
integrating sine and cosine functions as referred to in the chapter on
integration (Chapter 3).
In Chapter 3 Section 2, we proved the relations
\[ \int_0^{2\pi} \sin nx \cos mx \, dx = 0 \quad \text{for all natural numbers } n \text{ and } m \]
\[ \int_a^b f_n(x) f_m(x) \, dx = 0 \quad \text{whenever } n \ne m \]
There are many orthogonal families and some, e.g. Walsh functions, have
been used in image processing, but in Fourier Analysis the trigonometric
or related exponential ones are the usual families to consider. It is not
difficult to extend our original relation to
\[ \int_{-l}^{l} \sin\frac{n\pi x}{l} \cos\frac{m\pi x}{l} \, dx = 0 \]
This is done by using the same expressions as in Section 2 of Chapter 3.
If we assume that our function $f(x)$ is written as a sum of sines and cosines
\[ f(x) = \frac{a_0}{2} + \sum_{r=1}^{\infty} \left( a_r \cos\frac{r\pi x}{l} + b_r \sin\frac{r\pi x}{l} \right) \]
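then by integrating as follows
\[ \int_{-l}^{l} f(x)\cos\frac{m\pi x}{l}\,dx = \frac{a_0}{2}\int_{-l}^{l}\cos\frac{m\pi x}{l}\,dx + \sum_{r=1}^{\infty}\left\{ a_r\int_{-l}^{l}\cos\frac{r\pi x}{l}\cos\frac{m\pi x}{l}\,dx + b_r\int_{-l}^{l}\sin\frac{r\pi x}{l}\cos\frac{m\pi x}{l}\,dx \right\} \]
so, since every sine-cosine integral vanishes,
\[ \int_{-l}^{l} f(x)\cos\frac{m\pi x}{l}\,dx = \frac{a_0}{2}\int_{-l}^{l}\cos\frac{m\pi x}{l}\,dx + \sum_{r=1}^{\infty} a_r\int_{-l}^{l}\cos\frac{r\pi x}{l}\cos\frac{m\pi x}{l}\,dx \qquad (*) \]
The remaining integral is
\[ \int_{-l}^{l}\cos\frac{r\pi x}{l}\cos\frac{m\pi x}{l}\,dx \]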
Using the relations in Chapter 1
\[ \cos(A+B) = \cos A\cos B - \sin A\sin B \]
\[ \cos(A-B) = \cos A\cos B + \sin A\sin B \]
we get
\[ \cos\frac{r\pi x}{l}\cos\frac{m\pi x}{l} = \frac{1}{2}\left( \cos\frac{(r+m)\pi x}{l} + \cos\frac{(r-m)\pi x}{l} \right) \]
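So if
\[ I = \int_{-l}^{l}\cos\frac{r\pi x}{l}\cos\frac{m\pi x}{l}\,dx \]
then
\[ I = \frac{1}{2}\int_{-l}^{l}\cos\frac{(r+m)\pi x}{l}\,dx + \frac{1}{2}\int_{-l}^{l}\cos\frac{(r-m)\pi x}{l}\,dx \]
and if $r \ne m$,
\[ I = \frac{1}{2}\left[ \sin\frac{(r+m)\pi x}{l}\cdot\frac{l}{(r+m)\pi} \right]_{-l}^{l} + \frac{1}{2}\left[ \sin\frac{(r-m)\pi x}{l}\cdot\frac{l}{(r-m)\pi} \right]_{-l}^{l} \]
Thus $I = 0$.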
If $r = m \ne 0$ then $\cos((r-m)\pi x/l) = 1$ and
\[ I = \frac{1}{2}\left[ \sin\frac{(r+m)\pi x}{l}\cdot\frac{l}{(r+m)\pi} \right]_{-l}^{l} + \frac{1}{2}\Big[ x \Big]_{-l}^{l} = l. \]
If $r = m = 0$ then $r+m = 0$ and $r-m = 0$ and so $I = l + l = 2l$.
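Now returning to our earlier equation (*) we see that
\[ \int_{-l}^{l} f(x)\cos\frac{m\pi x}{l}\,dx = l\,a_m, \qquad m = 0, 1, 2, \ldots \]
(Note: the reason why $a_0$ was chosen this way should now be apparent.)
If we now consider a similar analysis for
\[ \int_{-l}^{l} f(x)\sin\frac{m\pi x}{l}\,dx = \frac{a_0}{2}\int_{-l}^{l}\sin\frac{m\pi x}{l}\,dx + \sum_{r=1}^{\infty}\left\{ a_r\int_{-l}^{l}\cos\frac{r\pi x}{l}\sin\frac{m\pi x}{l}\,dx + b_r\int_{-l}^{l}\sin\frac{r\pi x}{l}\sin\frac{m\pi x}{l}\,dx \right\} \qquad (7.2.3) \]
we get
\[ \int_{-l}^{l} f(x)\sin\frac{m\pi x}{l}\,dx = l\,b_m, \qquad m = 1, 2, \ldots \]
To summarize
\[ a_r = \frac{1}{l}\int_{-l}^{l} f(x)\cos\frac{\pi r x}{l}\,dx, \qquad r = 0, 1, 2, \ldots \]
and
\[ b_r = \frac{1}{l}\int_{-l}^{l} f(x)\sin\frac{\pi r x}{l}\,dx, \qquad r = 1, 2, \ldots \]
and
\[ f(x) = \frac{a_0}{2} + \sum_{r=1}^{\infty}\left( a_r\cos\frac{\pi r x}{l} + b_r\sin\frac{\pi r x}{l} \right) \]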
Example 1
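Consider $l = 1$, and $f(x) = x^2$ for $-1 < x \le 1$. Then $b_r = 0$ for all $r$ as $x^2$ is
an even function. We have
\[ a_0 = \int_{-1}^{1} x^2\,dx = \left[ \frac{x^3}{3} \right]_{-1}^{1} = \left( \tfrac{1}{3} + \tfrac{1}{3} \right) = \tfrac{2}{3} \]
while, integrating by parts twice,
\[ a_r = \int_{-1}^{1} x^2\cos r\pi x\,dx = \frac{4(-1)^r}{\pi^2 r^2} \]
Thus
\[ x^2 = \frac{1}{3} + \frac{4}{\pi^2}\sum_{r=1}^{\infty}\frac{(-1)^r}{r^2}\cos(r\pi x) \]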
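The coefficients in Example 1 can also be estimated by numerical integration. The following is a minimal sketch using a midpoint rule; the function name `fourier_coefficients` and the step count are illustrative assumptions, not part of the text.

```python
import math


def fourier_coefficients(f, l, r, steps=20000):
    """Midpoint-rule estimates of a_r and b_r for f on the interval (-l, l)."""
    h = 2 * l / steps
    a = b = 0.0
    for k in range(steps):
        x = -l + (k + 0.5) * h
        a += f(x) * math.cos(r * math.pi * x / l)
        b += f(x) * math.sin(r * math.pi * x / l)
    return a * h / l, b * h / l


# Coefficients of f(x) = x^2 with l = 1, for r = 0, 1, 2, 3.
coeffs = [fourier_coefficients(lambda x: x * x, 1.0, r) for r in range(4)]
```

Each entry of `coeffs` is a pair `(a_r, b_r)`; the `b_r` should vanish to rounding error because $x^2$ is even.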
Fig. 7.1
Example 2
Let us consider a wave given by
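\[ f(x) = \begin{cases} 1 & \text{if } 0 < x < l \\ -1 & \text{if } -l < x < 0 \end{cases} \]
illustrated in Fig. 7.1. This is an odd function so we know that $f(x)$ has an
expression involving only sine waves. So $f(x) = \sum b_r \sin(\pi r x/l)$, and
\[ b_r = \frac{1}{l}\int_{-l}^{l} f(x)\sin\frac{\pi r x}{l}\,dx = \frac{1}{l}\int_{-l}^{0} -\sin\frac{\pi r x}{l}\,dx + \frac{1}{l}\int_{0}^{l}\sin\frac{\pi r x}{l}\,dx = \frac{2}{\pi r}\left( 1 - (-1)^r \right) \]
so that $b_r = 4/(\pi r)$ for odd $r$ and $b_r = 0$ for even $r$, giving
\[ f(x) = \frac{4}{\pi}\sum_{m=0}^{\infty}\frac{1}{2m+1}\sin\frac{(2m+1)\pi x}{l} \]
There is an interesting point to observe about this function: $f(x)$ is clearly
discontinuous at $x = 0$. The function jumps from $-1$ to $+1$.
If we evaluate the Fourier series at $x = 0$ we get 0, since $\sin 0 = 0$. So the series
takes the value 0 at the jump, midway between the values $-1$ and $+1$ on either side.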
Coefficient   Calculated      Theoretical
Example 1
a1            -0.406          -0.406
a2            +0.101          +0.101
a3            -0.046          -0.046
a4            +0.025          +0.025
a5            -0.017          -0.017
a6            +0.011          +0.011
Example 2
b1            1.273           1.273
b2            0               0
b3            0.424           0.424
b4            -1 x 10^-3      0
b5            0.256           0.255
b6            -1 x 10^-3      0
This however was using mathematically precise data and was fairly slow.
It is interesting but not in general a practical technique.
Sometimes we evaluate a function via its Fourier series, truncating the
series after N terms.
The truncated Fourier series is clearly not going to match f(t) exactly.
In fact at discontinuities the approximating series always overshoots the
mark slightly (by about 9% of the size of the jump). This effect is known as
Gibbs phenomenon and is illustrated in Fig. 7.2. The figure shows the
approximating curve to the wave in Example 2 with a fairly large scale.
Locally the errors can be quite large despite the diagram.
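The size of the overshoot can be seen by summing the series directly. This sketch assumes the square wave of Example 2 with its sine coefficients $4/((2m+1)\pi)$; the function name and the choice of 50 terms are illustrative only.

```python
import math


def square_wave_partial_sum(x, l, terms):
    """Sum of the first `terms` non-zero sine terms of the square-wave series."""
    return (4 / math.pi) * sum(
        math.sin((2 * m + 1) * math.pi * x / l) / (2 * m + 1) for m in range(terms)
    )


l, terms = 1.0, 50
# The first maximum to the right of the jump lies at x = l / (2 * terms).
peak = square_wave_partial_sum(l / (2 * terms), l, terms)
overshoot_fraction = (peak - 1.0) / 2.0  # measured relative to the jump from -1 to +1
```

Increasing `terms` moves the peak closer to the discontinuity but does not reduce its height, which is the essence of the Gibbs phenomenon.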
Fig. 7.3
Writing
\[ f(x) = \sum_{r=-\infty}^{\infty} c_r e^{irx} \]
where we assume that we are considering the interval $-\pi$ to $\pi$. The reason
for the doubly infinite sum will appear if we do a little mathematics. We
evaluate
\[ c_r = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) e^{-irx}\,dx \]
In many ways this is a more compact form and allows us to consider negative
frequencies. Using the formulae given earlier it is possible to write the $c_r$'s
in terms of the $a_r$'s and $b_r$'s.
\[ c_r + c_{-r} = a_r \quad \text{and} \quad c_r - c_{-r} = -i b_r \]
Also $c_{-r} = \bar{c}_r$, the complex conjugate of $c_r$, when $f$ is real.
If instead of writing $f(x)$ we had originally written $f(t)$ and thought of $f$
as a function of time, evaluating the Fourier coefficients $c_r$ leads to the ideas of
frequency domain analysis.
Example 3
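Suppose $f(x) = \cos\lambda x$ for $-\pi < x < \pi$ and $\lambda$ is not an integer.
Then
\[ c_r = \frac{1}{4\pi}\int_{-\pi}^{\pi}\left\{ e^{i(\lambda-r)x} + e^{-i(\lambda+r)x} \right\} dx \]
and we eventually obtain
\[ c_r = \frac{(-1)^r\,\lambda\sin\lambda\pi}{\pi(\lambda^2-r^2)}, \qquad r = 0, \pm 1, \pm 2, \ldots \]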
If we have $f(x)$ for $-l < x < l$
then the Fourier series is
\[ f(x) = \sum_{r=-\infty}^{\infty} c_r e^{i\pi r x/l} \]
with
\[ c_r = \frac{1}{2l}\int_{-l}^{l} f(x) e^{-i\pi r x/l}\,dx. \]
Exercise (lengthy)
Suppose that $f(x)$ is defined for $0 < x < l$.
If
\[ f(x) = \sum_{r=-\infty}^{\infty} c_r e^{i2\pi r x/l} \]
show that
\[ c_r = \frac{1}{l}\int_{0}^{l} f(x) e^{-i2\pi r x/l}\,dx \]
Complex Fourier series are easier to manipulate than real ones and are
often preferred for this reason. To show how they can give useful results
we introduce the convolution theorem.
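Suppose we have two functions $f(x)$ and $g(x)$ both defined for $-l < x < l$
and
\[ f(x) = \sum_{r=-\infty}^{\infty} c_r e^{i\pi r x/l} \]
\[ g(x) = \sum_{r=-\infty}^{\infty} d_r e^{i\pi r x/l} \]
Then if
\[ h(t) = \frac{1}{2l}\int_{-l}^{l} f(x) g(t-x)\,dx \]
and
\[ h(t) = \sum_{r=-\infty}^{\infty} a_r e^{i\pi r t/l} \]
then
\[ a_r = c_r d_r, \qquad r = 0, \pm 1, \pm 2, \ldots \]
This latter result, that
\[ h(t) = \sum_{r=-\infty}^{\infty} c_r d_r e^{i\pi t r/l} \]
is the convolution theorem. We can take $g(t-x)$ to be $f(x-t)$ and hence obtain
\[ h(0) = \frac{1}{2l}\int_{-l}^{l} f(x)^2\,dx = \sum_{r=-\infty}^{\infty} |c_r|^2 \]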
Example 4
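Suppose $f(x) = \cos\lambda x$ as in Example 3, then
\[ \frac{1}{2\pi}\int_{-\pi}^{\pi}\cos^2\lambda x\,dx = \sum_{r=-\infty}^{\infty}\frac{\lambda^2\sin^2\lambda\pi}{\pi^2(\lambda^2-r^2)^2} \]
so for $\lambda = 0.5$
\[ \frac{1}{2\pi}\int_{-\pi}^{\pi}\cos^2\frac{x}{2}\,dx = \sum_{r=-\infty}^{\infty}\frac{4}{\pi^2(1-4r^2)^2} \]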
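A quick numerical check of this identity is sketched below. It assumes the coefficients $c_r = (-1)^r\lambda\sin\lambda\pi / (\pi(\lambda^2 - r^2))$ for $f(x) = \cos\lambda x$; the function name, the truncation point `r_max` and the step count are illustrative choices.

```python
import math


def parseval_sides(lam, r_max=2000, steps=20000):
    """Return (mean of cos^2(lam x) over (-pi, pi), truncated sum of c_r^2)."""
    h = 2 * math.pi / steps
    lhs = sum(
        math.cos(lam * (-math.pi + (k + 0.5) * h)) ** 2 for k in range(steps)
    ) * h / (2 * math.pi)
    rhs = sum(
        (lam * math.sin(lam * math.pi)) ** 2 / (math.pi ** 2 * (lam ** 2 - r ** 2) ** 2)
        for r in range(-r_max, r_max + 1)
    )
    return lhs, rhs


lhs_half, rhs_half = parseval_sides(0.5)
lhs_other, rhs_other = parseval_sides(0.3)
```

For $\lambda = 0.5$ both sides come out at one half, since the mean value of $\cos^2(x/2)$ over a full period of $\cos x$ is $\tfrac{1}{2}$.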
5 FOURIER TRANSFORM
\[ F(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-ixt} f(x)\,dx \]
Note the definition varies slightly from one author to another. This is often
referred to as transforming from the time domain to the frequency domain
or vice versa. Mathematically the theories are similar and there is no reason
why we should prefer one to another. The advantage is that we can work
either with F(t) or f(x) whichever is most convenient.
The important point is that this process is reversible; if F( t) is the Fourier
transform of f(x) then
\[ f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{ixt} F(t)\,dt \]
The definition above is used in mathematical physics and has the virtue of being
symmetric. We shall choose a slight modification which does not have a
multiplier; in this we follow E. Robinson.
Given $f(x)$ we define the Fourier transform $F(t)$ as
\[ F(t) = \int_{-\infty}^{\infty} e^{-i2\pi t x} f(x)\,dx \]
It is not too difficult to show that the Fourier transform H(t) of h(t) is
given by
\[ H(t) = F(t)G(t) \]
or
\[ \mathcal{F}(h) = \mathcal{F}(f * g) = \mathcal{F}(f)\,\mathcal{F}(g)\,\sqrt{2\pi}. \]
Often we choose to work with the Fourier transforms of functions because
we can write the convolution as a product and products are easier to handle.
It is both interesting and important to see if we can find a function f
such that f *g = g for any function g.
Consider the function as in Fig. 7.4
\[ f_d(x) = \begin{cases} \dfrac{1}{d} & \text{if } -\dfrac{d}{2} \le x \le \dfrac{d}{2} \\ 0 & \text{otherwise} \end{cases} \]
If $G_d = f_d * g$ we see that
\[ G_d(x) = \int_{-\infty}^{\infty} f_d(y) g(x-y)\,dy = \int_{-d/2}^{d/2}\frac{1}{d}\, g(x-y)\,dy \]
If we substitute $z = x - y$ we get
\[ G_d(x) = \frac{1}{d}\int_{x-d/2}^{x+d/2} g(z)\,dz \]
This is the average value of $g$ over the interval given by $x - d/2 \le z \le x + d/2$.
Fig. 7.4
We can now see that
\[ \int_{-\infty}^{\infty} \delta(y) g(x-y)\,dy = g(x) \]
So
\[ \int_{-\infty}^{\infty} \delta(y) g(-y)\,dy = g(0) \]
Thus the Fourier transform of the constant function is $\delta$ and the Fourier transform
of $\delta(y)$ is 1.
Example 5
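Consider the square wave in Fig. 7.5. So
\[ f(x) = \begin{cases} 1 & \text{if } -a \le x \le a \\ 0 & \text{otherwise} \end{cases} \]
Then
\[ F(t) = \int_{-\infty}^{\infty} e^{-i2\pi t x} f(x)\,dx = \int_{-a}^{a} e^{-i2\pi t x}\,dx = \left[ -\frac{e^{-i2\pi t x}}{2\pi i t} \right]_{-a}^{a} = \frac{1}{\pi t}\left[ \frac{e^{2\pi i a t} - e^{-2\pi i a t}}{2i} \right] = \frac{\sin 2\pi a t}{\pi t} \]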
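The closed form can be compared with a direct numerical evaluation of the integral. This is a sketch only: the values of `a` and `t` are arbitrary, and `box_transform` is a hypothetical helper using a midpoint rule with the $e^{-i2\pi tx}$ convention of this example.

```python
import cmath
import math


def box_transform(t, a, steps=20000):
    """Midpoint-rule estimate of the integral of exp(-2*pi*i*t*x) over (-a, a)."""
    h = 2 * a / steps
    return sum(
        cmath.exp(-2j * math.pi * t * (-a + (k + 0.5) * h)) for k in range(steps)
    ) * h


a, t = 1.5, 0.4  # illustrative half-width and frequency
numeric = box_transform(t, a)
closed_form = math.sin(2 * math.pi * a * t) / (math.pi * t)
```

The imaginary part of `numeric` is zero to rounding error because the square wave is an even function.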
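Function                                        Transform
$f(x)$                                          $F(t)$
$g(x)$                                          $G(t)$
$a f(x)$                                        $a F(t)$
$f(ax)$                                         $\frac{1}{|a|} F\left(\frac{t}{a}\right)$
$a f(x) \pm b g(x)$                             $a F(t) \pm b G(t)$
$F(\pm x)$                                      $f(\mp t)$
$f(x \pm a)$                                    $F(t)\, e^{\pm 2\pi i t a}$
$f'(x)$                                         $2\pi i t\, F(t)$
$\int_{-\infty}^{x} f(s)\,ds$                   $\frac{1}{2\pi i t} F(t)$
$\int_{-\infty}^{\infty} f(s) g(x-s)\,ds$       $F(t) G(t)$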
We can use the duality between the transforms to generate further results.
Example 6
Suppose we want the Fourier transform of $f(t)g(t)$, i.e. $\mathcal{F}(fg)$. From the
table we can deduce
\[ \mathcal{F}(f * g) = F(t)G(t) \]
hence
\[ \mathcal{F}(fg) = F * G = \int_{-\infty}^{\infty} F(s) G(t-s)\,ds \]
Exercise 5
Show that
Exercise 6
Show that
\[ \int_{-\infty}^{\infty} f(x) g(x)\,dx = \int_{-\infty}^{\infty} F(t) G(t)\,dt \]
Exercise 7
Find the transform of $t\,e^{-\pi t^2}$ by differentiating a suitable function.
Exercise 8
Show that
6 THE z-TRANSFORM
The Fourier transform is only one of many possible transforms. One of the
most often used techniques for manipulating sequences is the z-transform.
Suppose we have $f(t)$ defined for $t = 0, \pm 1, \pm 2, \ldots$ then the z-transform of
$f(t)$ is
\[ F(z) = \sum_{t=-\infty}^{\infty} f(t) z^{-t}. \]
Then
\[ F(z) = \sum_{t=0}^{\infty} z^{-t} = \frac{1}{1 - z^{-1}} \]
(Note this is a geometric series, see "useful formulae".)
In practical applications one often works with data that are given as a set
of discrete quantities and in this case the finite discrete Fourier transform
can be very useful. We can think of it as a version of the Fourier transform
discussed earlier that is amenable to machine computation.
Suppose $f(0), f(1), \ldots, f(N-1)$ is a sequence of (possibly complex)
numbers, then the DFT of $f(u)$, $u = 0, \ldots, N-1$ is defined as
\[ F(v) = \sum_{u=0}^{N-1} f(u)\, e^{-2\pi i u v/N} \]
and these form a transform pair, for details see the following example.
Example 7
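Recall that, writing $W_N = e^{-2\pi i/N}$,
\[ \sum_{u=0}^{N-1} W_N^{-vu} W_N^{ku} = \sum_{u=0}^{N-1} e^{i(2\pi/N)(v-k)u} = \begin{cases} N & \text{if } k \equiv v \pmod{N} \\ 0 & \text{otherwise} \end{cases} \]
i.e. the sum is $N$ when $v - k$ is a multiple of $N$ (see formulae, useful results).
Using the definition and substituting for $F(v)$ gives
\[ f(u) = \frac{1}{N}\sum_{v=0}^{N-1} F(v) W_N^{-uv} \]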
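The transform pair can be illustrated with a direct (slow) implementation. This is a minimal sketch assuming the $e^{-2\pi i uv/N}$ forward convention with the factor $1/N$ on the inverse; the function names and sample values are illustrative.

```python
import cmath


def dft(f):
    """Direct O(N^2) evaluation of F(v) = sum_u f(u) exp(-2*pi*i*u*v/N)."""
    n = len(f)
    return [
        sum(f[u] * cmath.exp(-2j * cmath.pi * u * v / n) for u in range(n))
        for v in range(n)
    ]


def inverse_dft(F):
    """Direct evaluation of f(u) = (1/N) sum_v F(v) exp(+2*pi*i*u*v/N)."""
    n = len(F)
    return [
        sum(F[v] * cmath.exp(2j * cmath.pi * u * v / n) for v in range(n)) / n
        for u in range(n)
    ]


samples = [1.0, 2.0, 0.0, -1.0, 0.5]
spectrum = dft(samples)
recovered = inverse_dft(spectrum)
```

Note that `spectrum[0]` is simply the sum of the samples, and `recovered` matches `samples` to rounding error.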
\[ I = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\,dt \]
\[ I = \int_{-X/2}^{X/2} f(t)\, e^{-i\omega t}\,dt \]
where for simplicity we label the function values $f(0), f(\Delta x), \ldots$.
To avoid complications, choose $\Delta x = 1$ whence
\[ I = \sum_{n=0}^{N-1} f(n)\, e^{-i\omega n} \]
converting from radian measure to hertz gives us the DFT in radian measure.
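We continue our study of the properties of DFTs by noticing that if $f(u)$ and
$g(u)$ have DFTs $F(v)$ and $G(v)$ then
\[ c f(u) + d g(u) \]
has a DFT
\[ cF(v) + dG(v) \]
while if
\[ h(u) = \sum_{k=0}^{N-1} f(k) g(u-k), \qquad u = 0, 1, \ldots, N-1 \]
with the index $u - k$ taken modulo $N$, then $h(u)$ has DFT $F(v)G(v)$.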
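The convolution property can be checked on a small case. The sketch below assumes the convolution is circular (indices taken modulo $N$), which is what the finite DFT requires; the sequences and helper names are illustrative.

```python
import cmath


def dft(f):
    """Direct evaluation of F(v) = sum_u f(u) exp(-2*pi*i*u*v/N)."""
    n = len(f)
    return [
        sum(f[u] * cmath.exp(-2j * cmath.pi * u * v / n) for u in range(n))
        for v in range(n)
    ]


def circular_convolution(f, g):
    """h(u) = sum_k f(k) g((u - k) mod N) for two sequences of equal length N."""
    n = len(f)
    return [sum(f[k] * g[(u - k) % n] for k in range(n)) for u in range(n)]


f = [1.0, 2.0, 0.0, -1.0]
g = [0.5, 0.0, 1.0, 3.0]
H = dft(circular_convolution(f, g))
FG = [a * b for a, b in zip(dft(f), dft(g))]
```

Here `H` and `FG` agree term by term, which is the discrete form of the convolution theorem.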
Fig. 7.7 (sequences plotted against k, including arg(F[k]))
Then
\[ F(v) = \sum_{k=0}^{N-1} a^k e^{-(2\pi i/N)kv} = \frac{1 - a^N}{1 - a\,e^{-2\pi i v/N}}, \qquad 0 \le v \le N-1 \quad \text{(geometric series)} \]
The sequences are shown in Fig. 7.7 for a = 0.8.
Fig. 7.8
Example 10
$f(0) = 1$, $f(1) = -1$, $f(j) = 0$ otherwise. Then
\[ F(v) = 1 - e^{-iv} = 2i\, e^{-iv/2}\sin\frac{v}{2} \]
Example 11
Suppose
\[ f(0) = f(1) = \cdots = f(R-1) = \frac{1}{R} \]
and
\[ f(j) = 0 \quad \text{otherwise} \]
Then
\[ F(v) = \frac{1}{R}\sum_{j=0}^{R-1} e^{-ivj} \]
Exercise 9
Let $A$ be the $N \times N$ matrix whose $(p, q)$th element is $W_N^{pq}$. If $x =
(x(0), \ldots, x(N-1))$ show that the DFT of $x$ can be written as a product of
$A$ and $x$.
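Write $f_1(k) = f(2k)$ and
\[ f_2(k) = f(2k+1), \qquad k = 0, 1, \ldots, \frac{N}{2} - 1. \]
Then, with $W_N = e^{-2\pi i/N}$,
\[ F(v) = \sum_{\substack{k=0 \\ k\ \text{even}}}^{N-1} f(k) W_N^{kv} + \sum_{\substack{k=0 \\ k\ \text{odd}}}^{N-1} f(k) W_N^{kv} = \sum_{k=0}^{(N/2)-1} f_1(k) W_N^{2kv} + W_N^{v}\sum_{k=0}^{(N/2)-1} f_2(k) W_N^{2kv} \]
But since
\[ W_N^2 = (e^{-2\pi i/N})^2 = e^{-2\pi i/(N/2)} = W_{N/2} \]
the two sums are the $(N/2)$-point DFTs $F_1$ and $F_2$ of $f_1$ and $f_2$, so that
\[ F(v) = F_1(v) + W_N^{v} F_2(v) \]
while $0 \le v \le N/2 - 1$. Otherwise $F(v) = F_1(v - N/2) + W_N^{v} F_2(v - N/2)$.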
If $N/2$ is also even we can split the two DFTs $F_1(k)$ and $F_2(k)$ and
recombine to make $F(k)$. For $N$ a power of 2, say $2^n$, we can split into
DFTs of length 2, compute each of these and recombine with the appropriate
"twiddle factor".
We can with a struggle follow a similar procedure when $N$ is not of the
form $2^n$ but has other factors. The technique is messy but the computer
algorithms work like a charm.
The details of these algorithms are interesting but have a jargon all of
their own involving "butterflies", "bit shuffling" and so on. We give a simple
programme following the rediscovery of the FFT by Cooley and Tukey.
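The programme itself is not reproduced in this text. As a stand-in, here is a minimal recursive sketch of the even/odd split described above, assuming $N$ is a power of 2 and the $e^{-2\pi i kv/N}$ convention. It is an illustration, not the authors' code, and a practical implementation would work in place rather than recursively.

```python
import cmath


def dft(f):
    """Direct O(N^2) DFT, used here only as a reference."""
    n = len(f)
    return [
        sum(f[k] * cmath.exp(-2j * cmath.pi * k * v / n) for k in range(n))
        for v in range(n)
    ]


def fft(f):
    """Radix-2 decimation-in-time FFT; len(f) must be a power of 2."""
    n = len(f)
    if n == 1:
        return [complex(f[0])]
    if n % 2:
        raise ValueError("length must be a power of 2")
    even = fft(f[0::2])  # F1: DFT of f(2k)
    odd = fft(f[1::2])   # F2: DFT of f(2k + 1)
    out = [0j] * n
    for v in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * v / n) * odd[v]
        out[v] = even[v] + twiddle
        out[v + n // 2] = even[v] - twiddle  # uses W_N^(v + N/2) = -W_N^v
    return out


data = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0, -2.0, 1.5]
fast = fft(data)
slow = dft(data)
```

The two lists `fast` and `slow` agree to rounding error, while `fft` needs of the order of $N\log_2 N$ operations instead of $N^2$.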
The algorithm outlined above is called the "decimation in time algorithm".
Another popular version is the "decimation in frequency algorithm". This
behaves as follows for $N = 2^m$.
Let
\[ f_1(k) = f(k), \qquad f_2(k) = f\left(k + \frac{N}{2}\right), \qquad k = 0, \ldots, \frac{N}{2} - 1 \]
(compare this with the previous algorithm). Then, with $W_N = e^{-2\pi i/N}$, $F(v)$ can be written
\[ F(v) = \sum_{k=0}^{(N/2)-1} f(k) W_N^{kv} + \sum_{k=N/2}^{N-1} f(k) W_N^{vk} \]
\[ = \sum_{k=0}^{(N/2)-1} f_1(k) W_N^{kv} + \sum_{k=0}^{(N/2)-1} f_2(k) W_N^{(k+N/2)v} \]
\[ = \sum_{k=0}^{(N/2)-1} \left[ f_1(k) + e^{-\pi v i} f_2(k) \right] W_N^{vk} \]
By considering the even and odd DFTs $F(2v)$ and $F(2v+1)$ we can, after
some algebra, see that these can be obtained from the $(N/2)$-point DFTs of
\[ f_1(k) + f_2(k), \qquad k = 0, \ldots, \frac{N}{2} - 1 \]
and
\[ \left( f_1(k) - f_2(k) \right) W_N^{k}, \qquad k = 0, \ldots, \frac{N}{2} - 1 \]
Both algorithms are very similar and both take of the order of $N \log_2 N$
operations. The major difference is that one algorithm computes the short
DFTs and then adds the "twiddle" factors while the other performs the
twiddles first.
There may be advantages when computing FFTs in restricted high speed
memory in using decimation in frequency.
Example 13
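Suppose we wish to find the inverse DFT
\[ f(k) = \frac{1}{N}\sum_{v=0}^{N-1} F(v) W_N^{-kv} \]
Taking complex conjugates gives
\[ \bar{f}(k) = \frac{1}{N}\sum_{v=0}^{N-1} \bar{F}(v) W_N^{kv} \]
and the right-hand side is just the DFT of $\bar{F}(v)$, scaled by $1/N$. Thus with scaling a single
FFT algorithm will perform the transform and the inverse transform.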
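The conjugation trick can be sketched as follows. A direct DFT stands in for the FFT here so that the example stays short, and the helper names are hypothetical.

```python
import cmath


def dft(f):
    """Forward transform F(v) = sum_k f(k) exp(-2*pi*i*k*v/N)."""
    n = len(f)
    return [
        sum(f[k] * cmath.exp(-2j * cmath.pi * k * v / n) for k in range(n))
        for v in range(n)
    ]


def inverse_via_forward(F):
    """Inverse DFT computed with the forward routine: conjugate, transform, conjugate, scale."""
    n = len(F)
    return [value.conjugate() / n for value in dft([v.conjugate() for v in F])]


original = [2.0 + 1.0j, -1.0, 0.5j, 3.0]
round_trip = inverse_via_forward(dft(original))
```

Only the forward routine is ever called, so any fast implementation of it serves for both directions.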
Unfortunately we do not have the time to give an insight into the mathe-
matics behind the FFT. This hinges on the factorisation of N and the
representation of the DFT as a function of two variables.
The FFT is quite a remarkable algorithm and if coded sensibly is invalu-
able in that it enables one to tackle problems which would otherwise be
impossible.
Fig. 7.9 (the sinusoid sin λt)
9 FREQUENCY DOMAIN
By introducing two further parameters $A$ and $\phi$, the amplitude and phase,
a large number of time series with the same frequency as that above can
be generated
\[ x(t) = A\sin(\lambda t + \phi), \qquad -\infty < t < \infty \qquad (*) \]
\[ \phantom{x(t)} = A\sin(2\pi f t + 2\pi\theta) \]
where $2\pi\theta = \phi$, the latter form being in cycles per unit.
Because of the periodic repetition of the sinusoid the phase can be
restricted to the range
\[ -\pi < \phi < \pi \]
or
\[ -\tfrac{1}{2} < \theta < \tfrac{1}{2} \]
\[ E = \frac{1}{2T}\int_{-T}^{T} |x(t)|^2\,dt \]
as a measure of the "power" in the series between $-T$ and $T$ rather like
the energy dissipated by an electric current. Then
\[ E = \frac{1}{2T}\sum_{k=-n}^{n}\sum_{r=-n}^{n} c_k \bar{c}_r \int_{-T}^{T} e^{(\pi i/T)(k-r)t}\,dt \]
Now $\frac{1}{2T}\int_{-T}^{T} e^{i\pi(k-r)t/T}\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{is(k-r)}\,ds$ which is zero unless
$k - r = 0$. Thus $E = \sum_{k=-n}^{n} |c_k|^2$ and we see that the component at each frequency
contributes its share to the total power. We can plot the $|c_k|^2$ and obtain a
discrete power spectrum. Clearly we need only do so for non-negative
frequencies since $|c_n| = |c_{-n}|$.
If we suppose x(t) is not periodic but extends to infinity we can run into
problems in the mathematics. A way around this is to say:
Let
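\[ x_d(t) = \begin{cases} x(t) & \text{if } -T < t < T \\ 0 & \text{otherwise} \end{cases} \]
i.e. we make a copy. Now as before let
\[ x_d(t) = \sum_{k=-n}^{n} c_k e^{i\pi k t/T} \]
then
\[ E = \frac{1}{2T}\int_{-T}^{T} x_d(t)^2\,dt = \sum_{k=-n}^{n} |c_k|^2 = \lim_{T\to\infty}\sum T|c_k|^2\,\frac{1}{T} \]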
Here we have a function $f(\omega)$ which gives the contribution from the frequencies
$\omega$ to $\omega + \delta\omega$.
These ideas give the basis for "spectral analysis" a mode of analysis we
discuss in Chapter 8. While the modern approach depends on FFTs we
point out that the ideas here have given useful results which date back to
the time of Fourier.
Chapter 8
TIME SERIES
In this chapter we shall attempt to introduce some of the main ideas and
methods for studying time series. This is a fairly complex subject so we
shall have to give a fairly concise account.
We recall that a time series is a signal, or function of time, x(t), which
exhibits random or fluctuating properties. As outlined in previous units x(t)
can be regarded as one "realisation" from an infinite ensemble of functions
which might have been observed. We shall use X(t) to denote the "stochastic
process" or the random variable and x(t) to indicate the actual outcome.
In most of what follows we shall concentrate on single series for simplicity,
but if the "state" of the process can be represented by a vector of numbers
at the relevant time point say
then it may be necessary to look at the vector time series or multiple time
series $\mathbf{x}(t)$. Thus it may be sensible to consider $\mathbf{x}(t)$ where $X_1(t)$, $X_2(t)$ and
$X_3(t)$ are data representing the vertical component of the earth's motion at
three recording sites. Naturally more complex series raise more difficult
problems.
Given a stochastic process it can have two possible modes of behaviour:
it can be
(a) stationary, that is the mechanism generating the series does not change
significantly in time;
(b) non-stationary or evolutionary.
Stationary series are easiest to handle and we shall look at them in some
detail. Series can be studied in time and in the "frequency domain" via the
power spectrum. We shall examine the spectrum using the ideas of Chapter
7. The concept of a filter also arises in a natural way and we shall look at
filters and their properties. We have no intention of giving a comprehensive
coverage of time series theory and practice and for this the reader should
consult the references.
Fig. 8.1 (X(t) against time)
Fig. 8.2 (ρ(u) against u, continuous process)
Fig. 8.3 (ρ(u) against u)
Examples
1. Purely Random Process: "White noise" in discrete time.
Suppose we make observations at times t, t = ... -1, 0, 1, 2, ... and at
these points
\[ X(t) = a(t) \]
where the $a(t)$ are mutually independent with zero means and common
variance $\sigma^2$. This process is called "purely random" by statisticians and
"band-limited white noise" by engineers. For this case
\[ \gamma_{xx}(0) = \sigma^2 \]
\[ \gamma_{xx}(u) = 0, \qquad u \ne 0 \]
2. Markov Process: A slight modification of the above gives the Markov or
first order autoregressive process. Suppose X(t) satisfies
\[ X(t) - \alpha X(t-1) = a(t) \]
with $a(t)$ as in 1 above.
Then we can show
\[ \gamma_{xx}(u) = \frac{\sigma^2\alpha^{|u|}}{1-\alpha^2} \]
for $|\alpha| < 1$ and large $t$. For $\alpha = 1$ the series is not stationary; it is called
the "random walk".
We easily see the lack of stationarity with a random walk. We have
X(t)=X(t-l)+a(t)
so
X(t) = (X(t - 2) +a(t -1)) + a(t)
= «X(t - 3) +a(t - 2)) + a(t -1)) + a(t)
= X(O) + a(l) +a(2) + ... +a(t)
Thus $X(t)$ can be thought of as a sum of past "innovations". Now if
\[ \operatorname{var}(a(t)) = \sigma^2 \quad \text{for all } t \]
then, taking $X(0)$ as fixed,
\[ \operatorname{var}(X(t)) = t\sigma^2 \]
since
\[ \operatorname{var}(a(1) + a(2) + \cdots + a(t)) = \sum_{i=1}^{t}\operatorname{var}(a(i)) = \sigma^2 t \]
See Appendix A.
Such processes are very common. Suppose, for example, that $X(t)$ is the
change in a stock price. It is often asserted that this is just the previous
change $X(t-1)$ plus an unpredictable component $a(t)$, i.e.
\[ X(t) = X(t-1) + a(t) \]
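The growth of the variance with $t$ is easy to see by simulation. This sketch assumes normally distributed innovations and $X(0) = 0$; the seed, the number of walks and the function name are arbitrary illustrative choices.

```python
import random


def random_walk_variance(steps, sigma, walks=4000, seed=1):
    """Sample variance of X(steps) over many independent walks started at X(0) = 0."""
    rng = random.Random(seed)
    finals = []
    for _ in range(walks):
        x = 0.0
        for _ in range(steps):
            x += rng.gauss(0.0, sigma)  # add one innovation a(t)
        finals.append(x)
    mean = sum(finals) / walks
    return sum((x - mean) ** 2 for x in finals) / (walks - 1)


estimate = random_walk_variance(steps=50, sigma=2.0)
theoretical = 50 * 2.0 ** 2  # t * sigma^2
```

With these settings the estimate should lie within a few per cent of $t\sigma^2 = 200$, and it keeps growing as `steps` increases, which is the lack of stationarity noted above.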
3. Purely Random Process in Continuous Time: One of the nasty points in
the development of time series is that we encounter problems in the definition
of a white noise process in continuous time. The obvious definition is to
consider a series with
\[ \gamma_{xx}(0) = \sigma^2 \]
\[ \gamma_{xx}(u) = 0, \qquad u \ne 0 \]
Sadly we must have a continuous function $\gamma_{xx}(u)$, which in the case above
is not true. We shall avoid the problem by sleight of hand and define
a white noise process as consisting entirely of uncorrelated contiguous
impulses with
\[ \gamma_{xx}(u) = \sigma^2\delta(u) \]
$\delta(u)$ being the delta function, i.e. $\delta(u) = 0$ for $u \ne 0$ and infinite otherwise. As
we do not have to deal with this process in detail it is sufficient to visualise
it as the discrete case with very close sampling values for $t$.
We shall now look at some of a rather wider class of models for time
series to give some idea of the processes one may encounter.
Fig. 8.4 (input Z(t), output X(t))
4. Linear Processes: A linear process X(t) generated by an input Z(t) (Fig.
8.4) of the form
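\[ X(t) = \frac{1}{T}\int_{t-T}^{t} Z(v)\,dv \]
i.e. the input is weighted by
\[ h(v) = \begin{cases} 0 & v < 0 \\ 1/T & 0 \le v \le T \\ 0 & v > T \end{cases} \]
(see Fig. 8.5). Then
\[ \mu = E(X(t)) = \frac{1}{T}\int_{t-T}^{t} E(Z(v))\,dv = 0 \]
\[ \gamma_{xx}(u) = \begin{cases} \dfrac{\sigma^2}{T^2}\,(T - |u|) & \text{if } |u| \le T \\ 0 & \text{otherwise} \end{cases} \]
Fig. 8.5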
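\[ \Delta X(t) = X(t+\Delta t) - X(t) = \int_{t}^{t+\Delta t} Z(v)\,dv \approx \Delta t\, Z(t) \]
or
\[ \frac{\Delta X(t)}{\Delta t} = Z(t). \]
Thus the "derivative" $\dot{X}(t)$ is given by
\[ \dot{X}(t) = Z(t) \]
If $Z(t)$ is normal then $X(t)$ is called a Wiener process. Note $X(t)$ is not
stationary.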
7. Brownian Motion: This is a phenomenon in physics which describes the
random movement of microscopic particles suspended in a liquid or gas.
A simple one dimensional model is as follows.
At time t let X(t) denote the velocity of the particle (moving in a straight
line). Let Z(t) be a random force acting on it at that time.
Then the equation of motion is
\[ \dot{X}(t) = Z(t) - \alpha X(t) \]
or
\[ \dot{X}(t) + \alpha X(t) = Z(t) \]
If $Z(t)$ is a purely random process then the process $X(t)$ is called an
autoregressive process of order one.
Note if there is no resistance, i.e. $\alpha = 0$, then $X(t)$ is a Wiener process.
where some bj may be zero. We could use the B operator to give the form
for a finite case as
\[ X(t) = (1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q)\,a(t) \]
A combination of these forms gives the ARMA (autoregressive moving
average) model,
\[ (1 + \phi_1 B + \phi_2 B^2 + \cdots + \phi_p B^p)\,X(t) = (1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q)\,a(t) \]
If $\Phi(B) = 1 + \phi_1 B + \cdots + \phi_p B^p$ and $\Theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$ then
\[ \Phi(B)X(t) = \Theta(B)a(t) \]
For example
\[ X(t) + 0.1X(t-1) + 0.2X(t-2) = a(t) - a(t-1) \]
may be written
\[ (1 + 0.1B + 0.2B^2)X(t) = (1 - B)a(t) \]
\[ r_{xx}(u) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} (x(t)-\mu)(x(t+u)-\mu)\,dt \]
with
\[ \hat{\mu} = \frac{1}{N}\sum_{t=1}^{N} x(t) \]
These equivalences are called the Ergodic Theorem. You might think of
the following analogy.
Imagine being at a ball, one can move around the whole ball and examine
the merrymakers (i.e. ensemble) or sit in the bar and watch over time.
With suitable mixing one might come to the same conclusions about the
composition of those attending the ball.
Given these results we can compute the autocorrelations for our realisation
and thus start to analyse the underlying model. We aim to deduce
something at least about our given stochastic process by our knowledge of
the autocorrelations. Before looking further at estimation we turn our
attention to the frequency domain.
then recall
We might suppose that there are infinitely many frequencies and that
\[ \int_{-\infty}^{\infty} f(\omega)\,d\omega \]
Our equation * becomes
\[ \gamma_{xx}(0) = \int_{-\infty}^{\infty} f(\omega)\,d\omega \]
with a relation very like a Fourier transform. In fact if we consider
\[ \gamma_{xx}(u) = \int_{-\infty}^{\infty} e^{i\omega u} f(\omega)\,d\omega \]
(see Chapter 7). Note the scaling change in the Fourier transform.
$f(\omega)$ is the spectral density function and it gives the contribution to the
energy in the system from frequencies lying between $\omega$ and $\omega + \delta\omega$.
Equations (**) are called the Wiener-Khintchin equations. The spectrum
gives us some idea of the underlying random process. In particular if x( t)
has any periodic component say
x(t)=x(t+p)
then there will be a peak in the spectrum at a frequency corresponding to
the period p.
Example 8
Continuous but harmonic series
\[ X(t) = \sum_{i=1}^{K} A_i\cos(\omega_i t + \phi_i) \]
where $K, A_1, \ldots, A_K$ are constants, the $\phi_i$ are random, uniformly distributed
over $(-\pi, \pi)$. Then we can show
\[ \rho_{xx}(u) = \frac{\sum_{i=1}^{K}\left(\tfrac{1}{2}A_i^2\right)\cos\omega_i u}{\sum_{i=1}^{K}\left(\tfrac{1}{2}A_i^2\right)} \]
Example 9
Pure noise series: From example 3 we know that
\[ \gamma_{xx}(u) = \sigma^2\delta(u) \]
hence using the transform formula above
\[ f(\omega) = \frac{\sigma^2}{2\pi} \]
(recalling the definition of the Dirac $\delta$ function outlined in Chapter 7).
Thus the spectrum is flat.
Exercise 2
Suppose we are given a series whose autocovariance function is
\[ \gamma_{xx}(u) = e^{-2\alpha|u|} \]
We wish to emphasise at this point that effects with long periods have
low frequencies and hence will appear in the low frequency part of the
spectrum. We defer a discussion of estimation until we have discussed filters
and time domain estimation.
Fig. 8.7 (X(t) over one time interval Δt)
Fig. 8.8 (sampling points at Δt, 2Δt, 3Δt)
at $\Delta t, 2\Delta t, \ldots$ as in Fig. 8.8 then we cannot tell which harmonic was involved.
Both of the sine waves would give the same value.
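A small numerical illustration of this ambiguity: the sketch below assumes a sampling interval of 0.1 s (an illustrative value) and compares a 1 cps sine wave with one whose frequency is higher by exactly the sampling rate.

```python
import math


def sample(freq_hz, dt, count):
    """Values of sin(2*pi*freq*t) at t = 0, dt, 2*dt, ..."""
    return [math.sin(2 * math.pi * freq_hz * k * dt) for k in range(count)]


dt = 0.1                    # sampling interval in seconds (assumed)
nyquist_hz = 1 / (2 * dt)   # 5 cycles per second
low = sample(1.0, dt, 20)
high = sample(1.0 + 1 / dt, dt, 20)  # 11 cps, well above the Nyquist frequency
largest_gap = max(abs(a - b) for a, b in zip(low, high))
```

The two sampled sequences coincide to rounding error, so nothing in the samples can tell the 11 cps wave from the 1 cps wave.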
We can demonstrate the result as follows:
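assuming (for simplicity) a continuous spectrum and $u = s\Delta t$, say, for some
integer $s$, then
\[ \gamma_{xx}(s\Delta t) = \sum_{k=-\infty}^{\infty}\int_{-\pi/\Delta t}^{\pi/\Delta t} e^{is\omega\Delta t - 2ks\pi i} f\left(\omega + \frac{2k\pi}{\Delta t}\right) d\omega \]
but $e^{is\omega\Delta t - 2ks\pi i} = e^{is\omega\Delta t}$ so
\[ \gamma_{xx}(s\Delta t) = \int_{-\pi/\Delta t}^{\pi/\Delta t} e^{is\omega\Delta t}\sum_{k=-\infty}^{\infty} f\left(\omega + \frac{2k\pi}{\Delta t}\right) d\omega \]
Now we have two forms for $\gamma_{xx}(u)$: one is the usual infinite integral,
the second an integral between $-\pi/\Delta t$ and $\pi/\Delta t$, i.e. the sampled series has an effective spectrum
\[ \sum_{k=-\infty}^{\infty} f\left(\omega + \frac{2k\pi}{\Delta t}\right), \qquad |\omega| < \frac{\pi}{\Delta t} \]
and zero otherwise.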
As you can see it is vitally important to choose the Nyquist frequency, or
$\Delta t$, so we do not lose important high frequencies.
Note: In angular measure the Nyquist frequency is $\pi/\Delta t$; in cycles it is
$1/(2\Delta t)$ cycles per unit time. Thus if $\Delta t = 0.1$ seconds, the Nyquist frequency is 5 cps.
The discrete spectrum at frequency 4 cps will be made up of contributions
from $f(\omega)$ at 4 cps
and so on. Figure 8.9 gives examples of digitising on the spectrum.
It shows that one should take care in choosing a suitable sampling rate.
Exercises
3. Why do the wheels of a stage coach appear to rotate the wrong way on
film?
4. Suppose the series under study contains two sinusoid components at
frequencies of 100 cps and 99 cps. Given a record what sampling interval
is required?
Fig. 8.9
its values at the time points $\pi k/l$, $k = 0, \pm 1, \pm 2, \ldots$
and
\[ x(t) = \sum_{k=-\infty}^{\infty}\frac{\sin l(t - \pi k/l)}{l(t - \pi k/l)}\; x\left(\frac{\pi k}{l}\right) \]
So selecting a Nyquist frequency exceeding $l$ ensures we have no problems
over the difference between the sampled series spectrum and the real series
spectrum.
In most of what follows we shall assume $\Delta t = 1$, i.e. sampling once per unit
time. If the spectrum is $f(\lambda)$ and $\Delta t$ is not 1 we can easily get the
new version from
by
\[ \hat{\gamma}_{xx}(u) = \frac{1}{N}\sum_{t=1}^{N-|u|} x(t)\,x(t+|u|) \]
\[ f(\omega) = \frac{1}{2\pi}\sum_{u=-\infty}^{\infty}\gamma_{xx}(u)\cos\omega u \]
but see later.
We catalogue some models for reference.
Examples
10. Purely random process, discrete case: Here
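\[ \rho_{xx}(u) = \begin{cases} 1 & u = 0 \\ 0 & \text{otherwise} \end{cases} \]
i.e. $\gamma_{xx}(0) = \sigma^2$, $\gamma_{xx}(k) = 0$ otherwise. Hence
\[ f(\omega) = \frac{\sigma^2}{2\pi} \]
a flat spectrum over the range $-\pi$ to $\pi$.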
11. (i) Markov Process-continuous time
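\[ \dot{X}(t) + \alpha X(t) = a(t) \]
i.e.
so
\[ f(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp(-\alpha|u|)\cos\omega u\,du \]
Fig. 8.10 (filter L with input X(t) and output Y(t))
i.e.
or perhaps
\[ Y(t) = (D^q + a_1 D^{q-1} + \cdots + a_q)\,X(t) \]
or even a combination
\[ (D^p + a_1 D^{p-1} + \cdots + a_p)\,Y(t) = (D^q + b_1 D^{q-1} + \cdots + b_q)\,X(t) \]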
If X(t) is of a known form we can come to some conclusion about the
properties of Y( t). In fact we would hope to infer the form of L from
knowledge of X(t) and Y(t). First we shall just look at the ways we can
characterise the effect of L.
In terms of the spectrum, say
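\[ Y(t) = \sum_j a_j X(t-j) \]
then
\[ \gamma_{yy}(u) = E(Y(t)Y(t+u)) = \sum_k\sum_j a_k a_j\,\gamma_{xx}(u+k-j) \]
If
\[ \Phi(B)Y(t) = \Theta(B)X(t) \]
where
\[ \Phi(z) = 1 + \sum_{r=1}^{p}\phi_r z^r \]
\[ \Theta(z) = 1 + \sum_{r=1}^{q}\theta_r z^r \]
then
\[ f_y(\omega)\,|\Phi(e^{i\omega})|^2 = f_x(\omega)\,|\Theta(e^{i\omega})|^2 \]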
Let L be a filter, say giving an output Y( t) of the form
\[ Y(t) = \sum_{u=-\infty}^{\infty} g_u X(t-u) \]
This can be put into any reasonable form by the correct choice of $g_u$, e.g.
$g_0 = 1$, $g_1 = 0.7$, $g_u = 0$ otherwise. Then from the discussion above
$r = |\Gamma|$, a function of $\omega$, is called the gain while $\theta$ is the phase or phase shift.
You may also find it useful to note that for a linear filter
\[ L[aY(t)] = aL[Y(t)] \]
\[ L[X(t) + Y(t)] = L[X(t)] + L[Y(t)] \]
If
\[ L[X(t)] = Y(t) \]
then
\[ L[X(t+h)] = Y(t+h) \]
In continuous time the analogue is
\[ Y(t) = L[X(t)] = \int_{-\infty}^{\infty} h(u) X(t-u)\,du \]
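\[ \frac{\sigma^2}{2\pi(1 - 2\alpha\cos\omega + \alpha^2)} \]
13. $Y(t) - \alpha Y(t-1) - \beta Y(t-2) = a(t)$
\[ (1 - \alpha B - \beta B^2)\,Y(t) = a(t) \]
\[ f_Y(\omega) = \frac{\sigma^2}{2\pi\,|1 - \alpha e^{i\omega} - \beta e^{2i\omega}|^2} \]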
L 'Yyy(u)zu= L g(u)zu·L'YxxV-u)
u=-oo u=-oo
00 00 00
L 'YYY(u)ZU = L L g(uhxx(v)zV-U
u=-oo u=-oo v=-oo
Fig. 8.11
Equating coefficients of z
00
In many practical situations we know the input X(t) and the Y(t) but
do not know the convolution or equivalently the filter. Our next aim is thus
to try and estimate the filter or to deconvolute. We may also have to expand
our ideas to multiple inputs and outputs.
We note in passing that if we have an autoregressive model
\[ \Phi(B)X(t) = a(t) \]
it can be written as
\[ X(t) = \Phi^{-1}(B)\,a(t) \]
i.e. as a moving average model only if $\Phi^{-1}(B)$ exists as a convergent power series in $B$. The
condition for this (and in fact for the stationarity of $X(t)$) is that the roots
of $\Phi(z)$ lie outside the unit circle.
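\[ \theta(\lambda) = \begin{cases} (\pi - \lambda)/2 & \lambda > 0 \\ -(\pi + \lambda)/2 & \lambda < 0 \end{cases} \qquad \text{see Fig. 8.11.} \]
This filter is often used to reduce a series to stationarity.
Suppose $X(t) = \alpha + \beta t + Z(t)$ where $Z(t)$ is stationary. Clearly $X(t)$ is
not stationary while
\[ (1 - B)X(t) = \beta + Z(t) - Z(t-1) \]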
will be.
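The effect of differencing on a trend can be checked on a short made-up series. The numbers below are purely illustrative, and `difference` is a hypothetical helper implementing $(1 - B)$.

```python
def difference(series):
    """Apply (1 - B): return the sequence x(t) - x(t-1)."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


alpha, beta = 3.0, 0.5                   # illustrative trend parameters
z = [0.2, -0.1, 0.4, 0.0, -0.3, 0.1]     # stands in for a stationary Z(t)
x = [alpha + beta * t + z[t] for t in range(len(z))]

dx = difference(x)
expected = [beta + z[t] - z[t - 1] for t in range(1, len(z))]

# Differencing twice removes a quadratic trend: the result is the constant 2 * 0.5.
quadratic = [1.0 + 2.0 * t + 0.5 * t * t for t in range(6)]
second = difference(difference(quadratic))
```

The linear trend has disappeared from `dx`, leaving only the constant $\beta$ and differences of $Z$.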
Exercise 4
Show that a series of the form
\[ X(t) = \alpha + \beta t + \gamma t^2 + Z(t) \]
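\[ f(\lambda) = \frac{1}{R}\sum_{j=0}^{R-1} e^{-i\lambda j} = e^{-i\lambda(R-1)/2}\,\frac{\sin(\lambda R/2)}{R\sin(\lambda/2)}, \qquad -\pi < \lambda < \pi \]
\[ r(\omega) = |f(\omega)| = \cos\frac{\omega}{2} \]
\[ \theta(\omega) = \begin{cases} -\omega/2 & \omega > 0 \\ \omega/2 & \omega < 0 \end{cases} \]
since here
\[ r(\lambda) = \frac{\sin(\lambda R/2)}{R\sin(\lambda/2)}, \qquad -\pi < \lambda \le \pi \qquad \text{see Fig. 8.12} \]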
For this reason this filter is widely used to remove periodic disturbances
from the spectrum.
Suppose our series gives the quarterly sales of ice cream in London. One
would expect a periodic peak in the summer quarter and a corresponding
blip in the spectrum. This will occur at frequency $\omega = \pi/2$. By choosing
$R = 4$ for the above filter we remove the oscillation from the series and the
blip from the spectrum. A similar filter was used in an engineering example
when x(t) measured the irregularity at a distance t down a railway track.
Clearly the constant bump at the rail joins could be removed in this way.
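The ice cream example can be mimicked with a synthetic quarterly series. This is a sketch under assumed values (a level of 10 and a seasonal amplitude of 3); the helper names are hypothetical.

```python
import cmath
import math


def moving_average_gain(lam, R):
    """Gain |(1/R) * sum_j exp(-i*lam*j)| of the R-point moving average."""
    return abs(sum(cmath.exp(-1j * lam * j) for j in range(R)) / R)


def moving_average(series, R):
    """Average of the current value and the R - 1 preceding values."""
    return [sum(series[t - j] for j in range(R)) / R for t in range(R - 1, len(series))]


# Synthetic quarterly data: constant level plus a cycle of period four quarters.
quarterly = [10.0 + 3.0 * math.cos(math.pi * t / 2) for t in range(16)]
smoothed = moving_average(quarterly, 4)
gain_at_seasonal = moving_average_gain(math.pi / 2, 4)
```

The gain at $\omega = \pi/2$ is zero, so `smoothed` contains only the level of 10 and the seasonal oscillation has been removed.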
Fig. 8.12
and
\[ a_t = \begin{cases} 1 & \text{if } t = 1, 2, \ldots, N \\ 0 & \text{otherwise} \end{cases} \]
By suitable averaging or by other methods of choosing filters one can
enhance or suppress more or less desirable characteristics in one's record.
This is in fact often done on amplifiers where there is a high frequency
cut off for playing poor source material.
Estimation
This is a wide subject and we just look at two simple cases.
As in practice we will use computational techniques we shall only work
with discrete series. In real life that is all we ever observe! For a discrete
series the obvious analogs of the differential models are difference equations
such as:
(a) $Y(t) + a_1 Y(t-1) + a_2 Y(t-2) + \cdots + a_p Y(t-p) = X(t)$,
viz. $(1 + a_1 B + a_2 B^2 + \cdots + a_p B^p)\,Y(t) = X(t)$, the autoregressive model of
order $p$.
(b) $Y(t) = X(t) + b_1 X(t-1) + \cdots + b_q X(t-q)$,
viz. $Y(t) = (1 + b_1 B + \cdots + b_q B^q)\,X(t)$,
the moving average model of order $q$.
(c) The mixed model
$(1 + a_1 B + \cdots + a_p B^p)\,Y(t) = (1 + b_1 B + \cdots + b_q B^q)\,X(t)$,
the autoregressive moving average model (ARMA) of order $p, q$.
We can find the correlation structure for these models. We consider just
the two simple cases
\[ A(B)Y(t) = X(t) \]
and suppose $X(t)$ is a noise signal which is unrelated to past values of $X(t)$
and $Y(t)$. Then multiplying $Y(t)$ in (1) by $Y(t-s)$, for $s > 0$, we have
\[ Y(t)Y(t-s) + a_1 Y(t-1)Y(t-s) + \cdots + a_p Y(t-p)Y(t-s) = Y(t-s)X(t) \]
Taking expectations gives
\[ \gamma_{yy}(s) + a_1\gamma_{yy}(s-1) + \cdots + a_p\gamma_{yy}(s-p) = 0 \]
i.e.
\[ \gamma_{yy}(1) + a_1\gamma_{yy}(0) + \cdots + a_p\gamma_{yy}(p-1) = 0 \]
\[ \gamma_{yy}(2) + a_1\gamma_{yy}(1) + a_2\gamma_{yy}(0) + \cdots + a_p\gamma_{yy}(p-2) = 0 \]
then
a=R-1r
Solving these equations is often known as Wiener filtering. In practical
problems we can use the rather special form of R (a Toeplitz matrix) to
speed up computation. The algorithm for this has been rediscovered on
several occasions.
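For a second-order model the equations can be solved by hand, which makes a convenient check. The sketch below assumes the form $Y(t) + a_1 Y(t-1) + a_2 Y(t-2) = X(t)$ and works with autocorrelations $\rho(s) = \gamma_{yy}(s)/\gamma_{yy}(0)$. The sign convention (solving $R\,a = -r$ with $r = (\rho(1), \rho(2))$) is an assumption read off from the equations as written with zero on the right-hand side, and the function names are hypothetical.

```python
def ar2_autocorrelations(a1, a2):
    """rho(1), rho(2) implied by Y(t) + a1*Y(t-1) + a2*Y(t-2) = X(t)."""
    rho1 = -a1 / (1 + a2)        # from rho(1) + a1 + a2*rho(1) = 0
    rho2 = -a1 * rho1 - a2       # from rho(2) + a1*rho(1) + a2 = 0
    return rho1, rho2


def solve_yule_walker_2(rho1, rho2):
    """Solve [[1, rho1], [rho1, 1]] a = -[rho1, rho2] by Cramer's rule."""
    det = 1 - rho1 * rho1
    a1 = (-rho1 + rho1 * rho2) / det
    a2 = (-rho2 + rho1 * rho1) / det
    return a1, a2


true_a1, true_a2 = -0.5, 0.2     # illustrative coefficients
rho1, rho2 = ar2_autocorrelations(true_a1, true_a2)
est_a1, est_a2 = solve_yule_walker_2(rho1, rho2)
```

For larger $p$ the same system has Toeplitz structure, which is what the fast algorithms mentioned above exploit.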
These equations are of little help when we have a moving average
component and in this case we must resort to a non-linear optimisation
technique. The best of these maximise the likelihood or formulate the
problem in a Kalman filter context.
To estimate the spectrum the traditional estimate is
\[ \hat{f}(\omega) = \frac{1}{2\pi}\sum_{s=-m}^{m}\lambda_s\,\hat{\gamma}_{xx}(s)\cos s\omega \]
where
and use filter theory to compute $\hat{f}(\omega)$ given the polynomial $\phi(B)$. This form
of estimator, variously called the autoregressive estimate or maximum entropy
estimate does have virtues but also some vices compared to the weighted
integral form.
Rationale
Given a model as in Fig. 8.13 the approach used and needed may well vary
as the point of interest. An engineer may consider a(t) to be a "signal"
while X (t) might be the received signal and noise. The aim here is to detect
the signal in the noise.
An economist given $X(t)$ may try and deduce $L$ with slight knowledge
of $a(t)$. In seismic work we provide $a(t)$ as a sound pulse, perhaps from a
towed airgun array, and record $X(t)$. In this case the question is, given $X(t)$
and $a(t)$, can we find $L$? As $L$ is a convolution the problem came to be
described as "deconvolution". Since we know $a(t)$ and $X(t)$ we can try
and unwind the convolution. Naturally this brings problems in its train
which are really beyond the bounds of this elementary introduction and
we refer the reader to the references in the literature. Beware however there
is no common system of notation and one must always find out what the
author intends a symbol to represent.
[Fig. 8.13: input a(t) passing through the system L to give the output X(t)]
APPENDIX A
Expectation operator E.
Given a pair of random variables X and Y we can show
(i) $E(aX) = aE(X)$
(ii) $E(aX + bY) = aE(X) + bE(Y)$
(iii) $\mathrm{cov}(X, Y) = E(XY) - E(X)E(Y)$
(iv) $\mathrm{var}(aX + bY) = a^2\,\mathrm{var}(X) + b^2\,\mathrm{var}(Y) + 2ab\,\mathrm{cov}(X, Y)$
Chapter 9
APPLICATIONS
1 WAVELETS
Perhaps the best source is
E. Robinson, Physical Applications of Stationary Time Series, Griffin (1980),
(especially Chapters 5 and 7).
Another useful text is
E. Robinson and M. Silvia, Digital Foundations of Time Series Analysis, Vol.
2, Holden-Day (1981), (see Chapter 4).
Suppose we send a "signal" which we will view as a pulse or a wave. If
we are to work with digital processes then we will observe the wave as a
sequence of values at discrete time points. If x(t) is the amplitude of the
wave we observe the sequence $x_k = x(k\Delta t)$, $k = \ldots, -2, -1, 0, 1, \ldots$. The
sampling interval is denoted by $\Delta t$. It will be convenient to write the string
of values as a vector viz.
I bJ
j=O
where
we have
$$X(z) = x_0 + x_1 z + x_2 z^2 + \cdots + x_p z^p$$
which is the z-transform, or at least the mathematical variant. In consequence
the DFT is
and
$$C(z) = X(z)Y(z) = \sum_{j=0}^{2k} c_j z^j$$
Example 1
$x = (1, 2)$ and $y = (4, 3)$
then $c = (4, 11, 6)$ since
$$c_0 = x_0y_0 = 1 \times 4 = 4$$
$$c_1 = x_0y_1 + x_1y_0 = 3 + 8 = 11$$
$$c_2 = x_1y_1 = 2 \times 3 = 6$$
and
$$X(z) = 1 + 2z$$
$$Y(z) = 4 + 3z$$
giving
$$C(z) = (1 + 2z)(4 + 3z) = 4 + 11z + 6z^2$$
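Example 1 can be checked numerically. The following is a minimal sketch in Python (not the book's BASIC); the function name `convolve` is our own choice for illustration. It multiplies the two z-transforms coefficient by coefficient, which is exactly the convolution of the two wavelets.

```python
def convolve(x, y):
    """Return the coefficients of X(z) * Y(z), i.e. the convolution of x and y."""
    c = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            # x_i z^i times y_j z^j contributes to the coefficient of z^(i+j)
            c[i + j] += xi * yj
    return c


print(convolve([1, 2], [4, 3]))  # [4, 11, 6]
```

Convolving with the single-element wavelet `[1]` leaves the input unchanged, which is the spike property discussed next.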
If x is convolved with the spike $\delta$ it is easily seen that
$$x * \delta = x$$
since the transform
$$\delta(z) = 1$$
and thence
$$X(z)\delta(z) = X(z)$$
[Fig. 9.1: black box with input x and output y]
[Fig. 9.2: surface, layer 1, layer 3]
[Fig. 9.3: surface]
[Fig. 9.4: primary and ghost]
We assume that the ghost is the reflection of the shot pulse from a surface
velocity discontinuity and that it has much the same profile as the primary
impulse. Our aim is to disentangle the two signals to give a clear view of
the primary, see Fig. 9.4.
The recorded signal is $y = (1, 0, \ldots, 0, k)$, with the primary at time t = 0 and the ghost at time t = d,
that is a combination of primary and surface reflection with $|k| < 1$ since
the energy must be less than that of the primary. Note if we change our time scale
then this can be written (1, k). To devise a suitable disentangling filter it is
easier to look at the z-transform of y, $Y(z) = 1 + kz^d$.
We would like a filter f whose z-transform F(z) satisfies
$$F(z)Y(z) = X(z)$$
that is, a filter which when applied to the output y gives the input primary and removes
the ghost. Since in this simple case $X(z) = \delta(z)$ we have
$$F(z)(1 + kz^d) = 1$$
or $F(z) = (1 + kz^d)^{-1}$ is the ghost elimination filter. This can be written, since
$|k| < 1$,
$$F(z) = 1 - kz^d + k^2z^{2d} - \cdots$$
giving a corresponding wavelet form
$$(1, 0, \ldots, 0, -k, 0, \ldots, 0, k^2, \ldots)$$
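A truncated version of the ghost elimination filter can be tried out directly. The sketch below is in Python with illustrative names (`ghost_filter`, `convolve`) and assumed values k = 0.5, d = 3. It expands $1/(1 + kz^d)$ as a geometric series in $-kz^d$, keeps the first few terms, and applies the result to the recorded signal (1, 0, 0, k).

```python
def convolve(x, y):
    """Coefficients of the product of the z-transforms of x and y."""
    c = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            c[i + j] += xi * yj
    return c


def ghost_filter(k, d, terms):
    """First `terms` non-zero coefficients of 1 / (1 + k z^d)."""
    f = [0.0] * ((terms - 1) * d + 1)
    for m in range(terms):
        f[m * d] = (-k) ** m   # coefficient of z^(m d) in the geometric series
    return f


k, d = 0.5, 3                          # assumed ghost strength and delay
y = [1.0] + [0.0] * (d - 1) + [k]      # primary at t = 0, ghost at t = d
f = ghost_filter(k, d, terms=6)
out = convolve(f, y)                   # a spike plus one small residual term
```

With six terms the only departure from a perfect spike is a residual of size $k^6$ at time 6d, so the truncation error falls off geometrically when $|k| < 1$.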
We can apply the same technique to the elimination of reflections when
working on water, see Fig. 9.5.
From Fig. 9.5 we can see that the received primary reflection is added to
the successive reflections between the water interfaces. If our measurements
give motion towards the reflecting stratum the signal received will be of the
form
$$y = (1, 0, \ldots, 0, -k, 0, \ldots, 0, k^2, \ldots)$$
where
$-k$ is the second downward pulse at time d
$k^2$ is the third downward pulse at time 2d
and we assume the surface has a reflection coefficient of -1 and d is the
two-way travel time in water.
Again the z-transform is
$$Y(z) = 1 - kz^d + k^2z^{2d} - k^3z^{3d} + \cdots = (1 + kz^d)^{-1}$$
on summing the geometric series.
In this case we have the filter relationship
$$F(z)Y(z) = 1$$
or
$$F(z) = 1 + kz^d$$
and
$$f = (1, 0, \ldots, 0, k)$$
[Fig. 9.5: air, water and rock layers]
[Fig. 9.6]
s=-oo
a bar denoting the complex conjugate since we may well have complex
terms in the wavelet. Recall that the total energy in the wavelet is
$$\gamma_{xx}(0) = \sum_{s=-\infty}^{\infty} |x_s|^2.$$
IS
s=-oo
2 PREDICTIVE DECONVOLUTION
which converges when $|k| < 1$, i.e. for a minimum delay wavelet $(b_0, b_1)$. Thus
we have a solution, albeit as an infinite series, for the minimum delay
case, while in the maximum delay case we must be a little more careful.
[Fig. 9.7: filter b with input x and output y]
1  $(1, -k)$
2  $(1, -k, k^2)$
3  $(1, -k, k^2, -k^3)$
Alternatively we can choose the approximate filter (ao, aI, ... , ac ) which
and on solving
$$a_0 = (1 + k^2)/(1 + k^2 + k^4)$$
$$a_1 = -k/(1 + k^2 + k^4)$$
It is not difficult to check that these are minimum points; indeed in this
case Q can be plotted as a function of $a_0$ and $a_1$.
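The two coefficients can be checked with a short calculation. The sketch below is in Python and rests on an assumption, since the defining equations did not survive in this copy: we take the wavelet to be (1, k) and the desired output to be the spike (1, 0, 0), so that Q is the energy of (a0, a1) * (1, k) - (1, 0, 0). The names `error_energy` and `two_term_inverse` are illustrative.

```python
def error_energy(a0, a1, k):
    """Q = energy of (a0, a1) * (1, k) minus the spike (1, 0, 0)."""
    return (a0 - 1.0) ** 2 + (a0 * k + a1) ** 2 + (a1 * k) ** 2


def two_term_inverse(k):
    """Solve the 2 x 2 normal equations obtained from dQ/da0 = dQ/da1 = 0."""
    r0, r1 = 1.0 + k * k, k              # autocorrelations of (1, k)
    det = r0 * r0 - r1 * r1              # equals 1 + k^2 + k^4
    return r0 / det, -r1 / det


k = 0.5                                  # assumed value for illustration
a0, a1 = two_term_inverse(k)
q_min = error_energy(a0, a1, k)
```

Under this assumption the normal equations reproduce the closed forms quoted in the text, and nudging either coefficient away from the solution increases Q.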
We can extend this least squares energy approach to the more practical
case using correlations of signals. This can be viewed as a prediction problem
and hence is often described as predictive deconvolution.
Suppose the output y is related to x via
$$y = a * x$$
that is, y is the convolution of a and x. We look at the problem of predicting $x_{t+\alpha}$
for some $\alpha > 0$ given $y_0, y_1, \ldots, y_t$; in fact we will take the rather simpler
case and consider predicting $x_{t+\alpha}$ by $\hat{x}_{t+\alpha}$ where $\hat{x}_{t+\alpha} = y_t$ for some t.
The prediction error $\varepsilon_{t+\alpha}$ is just
$$\varepsilon_{t+\alpha} = x_{t+\alpha} - \hat{x}_{t+\alpha}$$
We would like to find the A(z) which gives the "best predictions" or
minimises the (energy) error. Suppose we choose the coefficients $a_j$,
$j = 0, 1, \ldots$, to minimise
$$Q = \sum_t (x_{t+\alpha} - \hat{x}_{t+\alpha})^2$$
as we did in the 2-wavelet case, using $\hat{x}_{t+\alpha} = y_t$. Here
$$Q = \sum_t (x_{t+\alpha} - a_0x_t - a_1x_{t-1} - \cdots - a_nx_{t-n})^2$$
s =2, 3, ...
Now if we recall that the correlations $r_k$ are defined as
$$r_k = \sum_t x_t x_{t+k}$$
these equations become
These can be solved directly for the $a_j$ given the correlations or, rather more
economically, by using the fact that the extreme left-hand matrix is a "Toeplitz
matrix". Because of its rather special structure there are some slick
algorithms. The earliest version (known to us), due to Levinson, for finding
the $a_j$ is based on the following idea.
Suppose $\alpha = 1$ and $a_{pj}$ is the jth coefficient from a filter of length p. Our
equations are for p = 2, p = 3
$$r_2 = a_{21}r_1 + a_{22}r_0$$
$$r_1 = a_{21}r_0 + a_{22}r_1$$
and
Since
then
II f[Jlfl
0
ro
0
which gives a particularly nice and economical set of equations to solve.
Peacock and Treitel report generally good results with this method.
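The idea of building the order p solution from the order p - 1 solution can be sketched as follows. This is the standard Levinson-Durbin form of the recursion written in Python, not a transcription of the book's own equations, and the function name `levinson` is illustrative. It solves $r_s = \sum_j a_{pj} r_{|s-j|}$ for s = 1, ..., p, which for p = 2 is the pair of equations shown earlier. The correlations used are the first four values from the Toeplitz tester listing in the Programs appendix.

```python
def levinson(r, p):
    """Solve r[s] = sum_j a[j] * r[|s - j|], s = 1..p, for a[1..p].

    r holds the correlations r[0], r[1], ..., r[p] with r[0] > 0.
    Returns the list [a_p1, ..., a_pp].
    """
    a = []        # coefficients for the current order
    err = r[0]    # prediction error energy for the current order
    for m in range(1, p + 1):
        # Part of r[m] that the order m-1 predictor does not explain
        acc = r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))
        k = acc / err
        # Update: old coefficients corrected by the reversed old coefficients
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a


r = [1.0, 0.806, 0.428, 0.070]
a = levinson(r, 3)
```

Each order costs a number of operations proportional to its length, so the whole solution takes of the order of p squared operations rather than the p cubed of general elimination.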
We suggest that at this point our reader might like to start reading the
geophysical literature. Do remember that mathematics takes time and effort
to understand.
Appendix 1
REFERENCES TO APPLICATIONS
PART 1
1.1 Functions
The idea is so embedded in the literature that it is difficult to dissect. As
some examples try
Claerbout (1976) Chapter 1, Introduction and Section
1.1
Chapter 2
Robinson (1980) Section 7.3
Kulhanek, O. (1976) Chapter 1
Notice in most cases no explicit reference is made.
1.2 Polynomials
The main application is as the representation of data, perhaps as a wavelet.
This gives a compact data representation.
Claerbout (1976) Chapter 1
Robinson (1980) Sections 6.4, 6.11
Robinson (1983) Section 2.9
Chapter 2: Differentiation
McQuillin (1979) Section 3.6, repeated differentiation
Grant and West (1965) Section 5.1, ordinary derivatives
ordinary differential equations
Claerbout (1976) Section 2.5
Section 6, Minimisation
Grant and West (1965) Sections 2.6, 2.9, partial differential
equations
Section 8.4, Greens theorem
Section 9.9, Interpolation
Robinson (1983) Section 8.7, Minimisation
Chapter 3: Integration
McQuillin (1979) Section 3.5
Appendix 1
Claerbout (1976) Chapter 4
Grant and West (1965) Sections 5.1, 5.5, 8.3. Chapter 8 uses
line integrals and Greens theorem
Robinson (1980) Sections 3.4, 4.4
Bath (1974) Chapter 3
K. Hubbert Line Integrals
Runcorn (1966) pp. 123-204 for triple integrals
PART 2
Chapter 5
Matrices are a key data processing concept.
Claerbout (1976)
Robinson (1980) Chapter 5, section 9.7
Robinson (1983) Section 1.9, sections 4.1 to 4.10
Robinson (1981) Chapter 15
Chapter 6
Clearly any data gathering operation must have some statistical component,
even if just for quality control.
Chapter 7
Rayner (1971) Chapters 2, 3, 5
McQuillin (1979) Appendix 1
Robinson (1971) Chapter 11
Robinson (1980) Whole volume
Parasnis (1972) Section 5.9
Claerbout (1976) Section 1.2 for the Dirac delta
Section 1.3
Robinson (1980) Chapter 4
Runcorn (1966)
Chapter 8
Almost any volume by E. Robinson gives a wealth of illustration.
Claerbout (1976) Chapters 3, 4
McQuillin (1979) Chapters 1,3,4
Rayner (1971) Whole volume
Bath (1974) Whole volume
Robinson (1980)
Robinson (1981)
Robinson (1978)
Kulhanek O. (1976)
Robinson E. A. and Treitel (1973)
Silvia and Robinson (1978)
Webster G. M. (1978)
REFERENCES
Grant, F. and West, G. (1965), Interpretation theory in applied Geophysics,
McGraw Hill.
Bath, M. (1974), Spectral Analysis in Geophysics, Elsevier.
Runcorn, S. K. ed. (1966), Methods and Techniques in Geophysics, J. Wiley.
Kulhanek, O. (1976), Introduction to digital filtering in Geophysics, Elsevier.
Robinson, E. A. and Treitel, S. (1973), The Robinson-Treitel Reader (3rd
edn) Seismograph Service Corporation, Tulsa, Oklahoma.
Silvia, M. T. and Robinson, E. A. (1978), Deconvolution of Geophysical Time
Series in Exploration for Oil and Natural Gas, Elsevier.
Claerbout, J. F. (1976), Fundamentals of Geophysical Data Processing,
McGraw Hill.
Robinson, E. A. (1983), Multichannel Time Series Analysis with Digital
Computer Programs, (2nd edn), Goose Pond Press.
Robinson, E. A. (1980), Physical Applications of Stationary Time Series with
Special Reference to Digital Data Processing of Seismic Signals, C.
Griffin.
TRIGONOMETRIC FORMULAE
LOGS
$$\log_a x = \log_a b \cdot \log_b x$$
$$\log_e x = \ln x = \log x; \qquad \log(e^x) = x;$$
$$a^x = e^{x\log a}; \qquad \log_a xy = \log_a x + \log_a y$$
or
$$\sum_{j=m}^{n} a_j = a_m + a_{m+1} + \cdots + a_n$$
Geometric series
$$a + ax + ax^2 + \cdots + ax^{n-1} = \sum_{j=0}^{n-1} ax^j = a\,\frac{1 - x^n}{1 - x} \qquad (\text{if } x \neq 1).$$
= b + 1- a (if w = 0)
Also $x^n - a^n = (x - a)(x^{n-1} + x^{n-2}a + \cdots + xa^{n-2} + a^{n-1})$ or
$$\frac{x^n - a^n}{x - a} = \sum_{j=0}^{n-1} x^{n-1-j}a^j.$$
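$$(1 + x)^n = 1 + nx + \frac{n(n-1)}{2!}x^2 + \frac{n(n-1)(n-2)}{3!}x^3 + \cdots + x^n$$
We define
$${}^nC_r = \binom{n}{r} = \frac{n!}{(n-r)!\,r!}$$
where $n! = n \cdot (n-1) \cdot (n-2) \cdots 2 \cdot 1$. Then
$$(1 + x)^n = \sum_{r=0}^{n} \binom{n}{r} x^r, \qquad \binom{n}{0} = 1 = \binom{n}{n}.$$
This is known as the binomial theorem.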
Some infinite series
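$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$
$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots \qquad (-1 < x \leq 1)$$
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$$
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots$$
$|x|$ = positive value of x, i.e. $|-2| = 2$, $|3| = 3$ etc.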
CALCULUS
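$$\frac{d}{dx}(\log x) = \frac{1}{x}, \qquad \int x^n\,dx = \frac{x^{n+1}}{n+1} \quad (n \neq -1), \qquad \int \frac{1}{x}\,dx = \log|x|,$$
$$\frac{d}{dx}\sin x = \cos x, \qquad \frac{d}{dx}\cos x = -\sin x,$$
$$f(a + x) = f(a) + xf'(a) + \frac{x^2}{2!}f^{(2)}(a) + \frac{x^3}{3!}f^{(3)}(a) + \cdots$$
where $f^{(r)}(x) = d^rf/dx^r$.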
Appendix 3
PROGRAMS
1 FUNCTIONS
10 DEF FN A(W)=2*W+W
.. n.. X. FN A(23)
Most BASICS also provide some functions which are already defined.
We will assume these are:
From these one can immediately obtain a set of "defined" functions using
DEF FN.
SECANT
SEC(X) = 1/COS(X)
COSECANT
CSC(X) = 1/SIN(X)
COTANGENT
COT(X) = 1/TAN(X)
INVERSE SINE
ASN(X) = ATN(X/SQR(-X * X + 1))
INVERSE COSINE
ACN(X) = -ATN(X/SQR(-X * X + 1)) + 1.5708
INVERSE SECANT
ASE(X) = ATN(SQR(X * X - 1)) + (SGN(X) - 1) * 1.5708
INVERSE COSECANT
ACE(X) = ATN(1/SQR(X * X - 1)) + (SGN(X) - 1) * 1.5708
INVERSE COTANGENT
ARCCOT(X) = -ATN(X) + 1.5708
HYPERBOLIC SINE
SINH(X) = (EXP(X) - EXP(-X))/2
HYPERBOLIC COSINE
COSH(X) = (EXP(X) + EXP(-X))/2
HYPERBOLIC TANGENT
TANH(X) = -EXP(-X)/(EXP(X) + EXP(-X)) * 2 + 1
A MOD B
MOD(A) = INT((A/B - INT(A/B)) * B + 0.05) * SGN(A/B)
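A few of these derived functions can be checked against a modern library. The sketch below uses Python rather than BASIC; the function names mirror the table but are otherwise our own, and 1.5708 is kept as the rounded value of pi/2 used in the listing.

```python
import math

HALF_PI = 1.5708  # rounded constant, as in the table


def asn(x):
    """Inverse sine built from the arctangent, valid for -1 < x < 1."""
    return math.atan(x / math.sqrt(-x * x + 1))


def acn(x):
    """Inverse cosine built from the arctangent, valid for -1 < x < 1."""
    return -math.atan(x / math.sqrt(-x * x + 1)) + HALF_PI


def tanh_basic(x):
    """Hyperbolic tangent in the form given in the table."""
    return -math.exp(-x) / (math.exp(x) + math.exp(-x)) * 2 + 1


print(asn(0.5), math.asin(0.5))
print(acn(0.5), math.acos(0.5))
```

The inverse cosine agrees with the library value only to about five decimal places, because 1.5708 is itself a rounded constant.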
ABS(X) returns the absolute value of X, i.e. |X|, so if X < 0 ABS(X) = -X
while if X >= 0 ABS(X) = X
INT(X) returns the largest integer less than or equal to X, e.g.
INT(1.7) = 1
INT(2.07) = 2
INT(2) = 2
RND(1) returns a random number less than 1 but exceeding zero
SGN(X) returns -1 if X < 0 or +1 if X > 0, 0 otherwise
SQR(X) returns the positive square root of X
Two strange but useful functions are
(a) 100 DEF FN R(X) = INT(X * (10 ^ F0) + 0.5) / (10 ^ F0)
where we assume F0 is set on some previous line to be an integer like 2
or 6. This function takes a number X and keeps only the first F0 digits
after the decimal point: it "rounds" the number. If F0 = 2 and X is
212.123456 then FNR(X) will equal 212.12. Setting F0 = -2 with the same
X will give FNR(X) equal to 200, i.e. the number would be rounded to the
nearest 100.
This is useful to print tidy versions of numbers, thus with F0 = 6
PRINT(FNR(X))
prints X to 6 decimals.
(b) MOD(A) = INT((A/B - INT(A/B)) * B + 0.05) * SGN(A/B)
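For readers without a BASIC interpreter, the two functions translate directly. This is a sketch in Python; the names `fnr` and `basic_mod` are ours, and `math.floor` plays the part of INT.

```python
import math


def fnr(x, f0):
    """Round x to f0 decimal places (negative f0 rounds to tens, hundreds, ...)."""
    scale = 10.0 ** f0
    return math.floor(x * scale + 0.5) / scale


def basic_mod(a, b):
    """Remainder of a on division by b, following the formula in (b)."""
    q = a / b
    sign = (q > 0) - (q < 0)           # plays the part of SGN
    # The 0.05 guards against the fractional part falling just short
    return math.floor((q - math.floor(q)) * b + 0.05) * sign


print(fnr(212.123456, 2))
print(basic_mod(71, 3))
```

The added 0.05 in the remainder formula matters only because the fractional part is computed in floating point; with exact arithmetic it could be dropped.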
Program 1
This program tests the two functions we have just described.
R(x): the rounding function
mod(x): the mod function
As you can see we display the output to assist in the understanding of the
program.
]LIST
100 REM FUNCTION TESTING
110 REM TESTS R(X) AND MOD
120 REM DATA FROM RANDOM
130 F0 = 4:B = 3
140 DEF FN R(X) = INT (X * (10 ^ F0) + 0.5) / (10 ^ F0)
150 DEF FN MOD(A) = INT ((A / B - INT (A / B)) * B + 0.05) * SGN (A / B)
160 FOR I = 1 TO 4
170 X = RND (9)
180 X = 100 * X
190 PRINT
200 PRINT "X= ";X
210 PRINT "ROUNDED X IS "; FN R(X)
220 PRINT
230 X = INT (X)
240 PRINT "X = ";X
250 PRINT "MOD(X) = "; FN MOD(X)
260 NEXT
270 PRINT
280 PRINT "PROGRAM ENDS ": END

]RUN

X= 71.4390695
ROUNDED X IS 71.4391

X = 71
MOD(X) = 2

X= 40.8186764
ROUNDED X IS 40.8187

X = 40
MOD(X) =

X= 21.0867141
ROUNDED X IS 21.0867

X = 21
MOD(X) = 0

X= 96.0521198
ROUNDED X IS 96.0521

X = 96
MOD(X) = 0

PROGRAM ENDS
Program 2
This program is provided for you to evaluate polynomials. Given the order
of the polynomial and its coefficients the program will compute the value
of the polynomial for a given number of points. A range of values must be
given.
Do note this is not an efficient way of computing values: there are very
much better ones, for example Horner's algorithm.
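Horner's algorithm is short enough to show in full. This is a sketch in Python (the program listing itself is not reproduced here); the function name and the convention that `coeffs[j]` multiplies $x^j$ are our assumptions.

```python
def horner(coeffs, x):
    """Evaluate coeffs[0] + coeffs[1]*x + ... + coeffs[p]*x**p."""
    value = 0.0
    # Work from the highest power down: ((c_p x + c_{p-1}) x + ...) x + c_0
    for c in reversed(coeffs):
        value = value * x + c
    return value


print(horner([1, 2, 3], 2))  # 1 + 2*2 + 3*4 = 17.0
```

A polynomial of order p then needs only p multiplications and p additions, with no explicit powers of x.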
Program 3
This program uses the method of bisection to find the solution of f(x) = 0.
You must give a range of values which includes the solution; the program
will warn you if there is no solution in this range. Notice you are asked to
specify the accuracy required. To ensure that the procedure terminates, the
program will ask you for an upper bound to the number of iterations.
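The steps just described can be sketched as follows in Python. The function name `bisect_root`, its parameter names and the test function $\sin x - x^2/2$ (borrowed from Program 4) are our choices for illustration, not the book's listing.

```python
import math


def bisect_root(f, lo, hi, tol=1e-6, max_iter=100):
    """Find a root of f in [lo, hi] by repeated halving of the interval."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0:
        # No sign change, so no guarantee of a root in the range
        raise ValueError("no sign change in the given range")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0 or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                   # root lies in the left half
        else:
            lo, f_lo = mid, f_mid      # root lies in the right half
    return 0.5 * (lo + hi)


root = bisect_root(lambda x: math.sin(x) - x * x / 2, 1.0, 2.0, tol=1e-8)
```

Each pass halves the interval, so the accuracy asked for fixes the number of iterations in advance; the iteration cap is only a safeguard.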
Program 4
This program uses the Newton-Raphson algorithm to solve f(x) = 0. You
should note that the program computes the derivatives of the function itself.
If you supply the derivatives in the subroutine beginning at line 1000 the
program will converge more quickly but this does make the program less
flexible. Notice an initial guess and the accuracy required must be specified.
There is no check to ensure that the program stops if the algorithm does
not converge, so take care.
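A compact version of the same idea is sketched here in Python. It is not a transcription of the listing: the step size `eps` and the iteration cap `max_iter` are our assumptions, and unlike the BASIC program this sketch does stop if convergence fails. The derivative is estimated by a forward difference, as in the subroutine at line 1000.

```python
import math


def newton(f, x0, tol=1e-10, max_iter=50, eps=1e-8):
    """Newton-Raphson with a forward-difference estimate of the derivative."""
    x = x0
    for _ in range(max_iter):
        h = (abs(x) + eps) * eps       # small step scaled to the size of x
        fx = f(x)
        slope = (f(x + h) - fx) / h    # numerical derivative
        x_new = x - fx / slope
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence within max_iter steps")


def f(x):
    return math.sin(x) - x * x / 2


root = newton(f, 1.0)
```

With a small fixed step the derivative estimate stays accurate and only a handful of iterations are needed from the starting value 1.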
110 REM NEED AN INITIAL GUESS
120 REM FN DEFINED AT 140
130 REM PROG FINDS DERIVATIVE
140 DEF FN F(X) = SIN (X) - X * X / 2

170 EP = 1E - 16: REM ACCURACY BOUND FOR MICRO
180 EP = SQR (EP)
190 F0 = FN F(X0)
200 GOSUB 1000
210 X1 = X0 - F0 / FD:I = I + 1
220 PRINT "STEP ";I;" X= ";X1
230 E = X1 - X0:X0 = X1
240 IF ABS (E) > EE GOTO 180
250 PRINT "ACCURACY ACHIEVED": STOP

1000 H = ( ABS (X0) + EP) * EP
1010 F1 = FN F(X0 + H)
1020 FD = (F1 - F0) / H
1030 REM FD IS DERIVATIVE
1040 REM H IS DELTA X
1050 RETURN

]RUN
GIVE INITIAL VAL 1
GIVE ACCURACY NEEDED0.00001
STEP 1 X= 2.62956303
STEP 2 X= 1.78211336
STEP 3 X= 1.47846231
STEP 4 X= 1.4155181
STEP 5 X= 1.40783465
STEP 6 X= 1.40600649
STEP 7 X= 1.40528888
STEP 8 X= 1.40492846
STEP 9 X= 1.40472574
STEP 10 X= 1.4046056
STEP 11 X= 1.40453268
STEP 12 X= 1.40448785
STEP 13 X= 1.40446015
STEP 14 X= 1.40444298
STEP 15 X= 1.40443232
STEP 16 X= 1.4044257
ACCURACY ACHIEVED

BREAK IN 250
Program 5
This program computes the minimum value of a function in the range a to
b. The method is to split the range into n intervals and then find the minimum
function value. The process is then repeated on this interval to give a smaller
interval. This process is repeated until the accuracy required is attained.
The diagnostic print is given to illustrate the method.
One could also find the maximum using the same program, we leave this
as an exercise for the reader. Alternatively the stationary values might be
found by using bisection or Newton-Raphson to find the values for which
the derivative is zero.
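The repeated grid search can be sketched as follows in Python. The function name `grid_minimise`, the search range 0 to 100 and the tolerance are our assumptions for illustration; the test function is the one defined at line 130 of the listing, and like the program the sketch assumes there is a single minimum in the range.

```python
import math


def grid_minimise(f, a, b, n=10, tol=1e-3):
    """Shrink [a, b] around the smallest of n + 1 equally spaced values of f."""
    while b - a > tol:
        h = (b - a) / n
        xs = [a + j * h for j in range(n + 1)]
        k = min(range(n + 1), key=lambda j: f(xs[j]))
        # Keep one grid interval on each side of the best point
        a, b = xs[max(k - 1, 0)], xs[min(k + 1, n)]
    return 0.5 * (a + b)


def f(x):
    return -250 * x + 22500 * 1.00949 ** x


x_min = grid_minimise(f, 0.0, 100.0)
```

Each round keeps at most two of the n grid intervals, so with n = 10 the interval shrinks by a factor of five per round.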
100 REM MINIMUM PROGRAM
110 REM ASSUMES THERE IS ONE MIN
120 REM NEEDS INTERVAL OF INTEREST
130 DEF FN F(X) = - 250 * X + 22500 * (1.00949) ^ X
140 INPUT "NO OF GRID POINTS= ";N
150 INPUT "TOLERANCE= ";EE
160 PRINT "FN EXAMINED BETWEEN A AND B"

180 T = FN F(A):K = 0
190 W = FN F(B)
200 IF W < T THEN T = W:K = N
210 H = (B - A) / N
220 FOR J = 1 TO N - 1
230 P = FN F(A + J * H)
240 IF P > = T GOTO 260
250 T = P:K = J
260 NEXT
280 R = A + J * H
290 PRINT "INTERVAL IS ";(R - H);" TO ";R
Program 6
This program plots contours for a function of two variables f(x, y). The
function is evaluated for x values between xa and xb,y values between ya
and yb. The letters indicate the "height" of the function. We have used this
display as it only requires a text string as opposed to full graphics.
You may wish to set vt, the number of vertical lines, and vh, the number
of horizontal places, to fit the display to your own machine.
Program 7
This is a non-linear least squares program which minimises a nonlinear
sum of squares using a technique called Marquardt's method. We have
included it so you might have an example of the minimisation of a more
complex function, this technique is often used in practical problems. The
program tries to find b values to minimise
$$\sum_{i=1}^{12} \{x_i - b_1/[1 + b_2\exp(ib_3)]\}^2$$
the data being obtained from the data statement. As you will see from the
output we have printed values of IG, INF and LAMBDA. These are
diagnostic parameters.
Program 8
This is another complex program which finds the minimum of a function
of n variables b(l) ... b(n). We think of it as a variable metric method. In
this example the function defined in the subroutine starting at 970 is
$$P(b_1, b_2, b_3, \ldots) = 100(b_2 - b_1^2)^2 + (b_1 - 1)^2$$
a famous test example. Starting values for the b's are needed.
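The test function is easily written down for experiment. The sketch below is in Python; the function name `rosenbrock` is the name by which this test function is commonly known, not one used in the book, and the trial point is taken from the coefficients printed in the sample output.

```python
def rosenbrock(b1, b2):
    """The two-variable test function 100 (b2 - b1^2)^2 + (b1 - 1)^2."""
    return 100.0 * (b2 - b1 ** 2) ** 2 + (b1 - 1.0) ** 2


# The minimum value, zero, is attained at b1 = b2 = 1
print(rosenbrock(1.0, 1.0))
# A pair of coefficients from the printed output, already close to the minimum
print(rosenbrock(0.983692503, 0.966154852))
```

The function is awkward for simple search methods because its minimum lies at the bottom of a narrow curved valley along $b_2 = b_1^2$.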
]LIST
100 REM VARIABLE METRIC NKI
110 INPUT "THE NO OF PARAMS= ";N
120 FOR I = 1 TO N: INPUT "B(I)= ";B(I)
130 NEXT
140 W = 0.2:TL = 0.0001
150 REM SETUP
160 GOSUB 950
170 PO = P
180 INF = INF + 1
190 GOSUB 1000
200 IG = IG + 1
210 FOR I = 1 TO N
220 CB(I,I) = 1
230 NEXT
240 ILAST = IG
250 PRINT "FN EVALUATION NO";INF
260 PRINT "GRADIENT CALCS=";IG
270 PRINT "FN =";PO
280 FOR I = 1 TO N
290 PRINT "COEFF=";B(I): NEXT
300 PRINT
COEFF=.983692503 COEFF=.9979488t2
COEFF=.966154852 COEFF=.995836004
Program 9
This is a simple program that integrates the function defined at line 160
using the trapezoidal rule with n points. Two values of n are illustrated.
See also program 10 for Simpson's rule.
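The rule itself takes only a few lines. This is a sketch in Python rather than the BASIC listing; here n is taken to be the number of intervals, which is an assumption about the program's convention, and the integrand $x^2$ is our own example.

```python
def trapezoid(f, a, b, n):
    """Trapezoidal rule for the integral of f over [a, b] using n intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))        # end points carry half weight
    for j in range(1, n):
        total += f(a + j * h)
    return total * h


coarse = trapezoid(lambda x: x * x, 0.0, 1.0, 10)
fine = trapezoid(lambda x: x * x, 0.0, 1.0, 1000)
```

The exact value is 1/3, and the error falls roughly as $1/n^2$, which is why two values of n make a useful comparison.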
Program 10
This program integrates the function defined at line 150 using Simpson's
rule with n points. Two values of n are used as illustration.
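A sketch of Simpson's rule in Python follows. As before, n is taken to be the number of intervals (an assumption about the program's convention) and must be even; the cubic integrand is our own example.

```python
def simpson(f, a, b, n):
    """Simpson's rule for the integral of f over [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for j in range(1, n):
        # Odd-numbered interior points have weight 4, even-numbered weight 2
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3


area = simpson(lambda x: x ** 3, 0.0, 2.0, 10)
```

Simpson's rule is exact for cubics, so here the result agrees with the true value 4 up to rounding error.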
Program 11
This is a program which can be used to solve systems of equations and to
find inverses. Essentially it uses elementary operations (in a rather clever
way) to find the solution of
Ax=b
and is written to work for several different b's at the same time.
If the b's are chosen to be the columns of the unit matrix then the solutions
are the columns of the inverse matrix. Note the tolerance asked for is used
to ensure values close to zero are avoided in scaling.
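The approach can be sketched in Python as follows. This is a Gauss-Jordan variant with partial pivoting, written for illustration and not a transcription of the listing; the function name `gauss_solve` and the 2 x 2 example are ours.

```python
def gauss_solve(a, b_cols, tol=1e-12):
    """Solve A X = B for several right-hand sides at once.

    a is an n x n list of rows; b_cols is an n x m list of rows, one column
    per right-hand side. Returns X as an n x m list of rows.
    """
    n = len(a)
    aug = [list(map(float, a[i])) + list(map(float, b_cols[i])) for i in range(n)]
    for col in range(n):
        # Choose the largest available pivot to avoid dividing by small values
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            raise ValueError("matrix is singular to the given tolerance")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Right-hand sides equal to the columns of the unit matrix give the inverse
inverse = gauss_solve([[2, 1], [1, 3]], [[1, 0], [0, 1]])
```

Carrying all the right-hand sides along in one augmented array means the elimination work on A is done only once.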
]LIST
100 REM GAUSS ELIMINATION
110 INPUT "THE ORDER OF A";N
120 INPUT " THE NUMBER OF R.H.

380 NEXT
390 D = - D
400 D = D * A(J,J)
410 IF ABS (A(J,J)) < TL THEN STOP
Program 12
This is a program for finding the eigenvalues and vectors of a real symmetric
matrix using what is often called Jacobi's method. The input ensures that
a symmetric matrix is supplied.
Program 13
This program computes Binomial probabilities.
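The probabilities can be generated recursively, each from the one before. The sketch below is in Python and is not a transcription of the listing; the function name `binomial_probs` is ours and it assumes 0 < p < 1. The example with n = 4 and p = 0.4 gives the cumulative values that appear in the bin generator listing later in this appendix.

```python
def binomial_probs(n, p):
    """Return [P(X = 0), ..., P(X = n)] for n trials with success probability p."""
    q = 1.0 - p
    probs = [q ** n]                   # P(X = 0)
    for i in range(1, n + 1):
        # P(X = i) = P(X = i - 1) * (n - i + 1) / i * p / q
        probs.append(probs[-1] * (n - i + 1) / i * p / q)
    return probs


probs = binomial_probs(4, 0.4)
cumulative = [sum(probs[: i + 1]) for i in range(len(probs))]
```

The recursion avoids computing large factorials, which matters on a machine with limited number range.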
]LIST
100 REM PROGRAM
110 REM COMPUTES BINOMIAL PROBS
120 REM THE NUMBER OF TRIALS IS N
130 REM P THE PROBABILITY IF SUCCESS
140 PRINT "BINOMIAL PROBABILITIES"
150 PRINT : PRINT :
160 INPUT " GIVE N ";N
170 INPUT " GIVE P ";P
180 INPUT "NO OF DECIMAL PLACES ";F0
190 Q = 1 - P
200 DIM A(N)
210 DEF FN R(X) = INT (X * (10 ^ F0) + 0.5) / (10 ^ F0)
220 IF P < Q THEN GOSUB 410: GOTO 240
230 GOSUB 330
240 PRINT " I P(X=I) P(X<=I)": PRINT :
250 FOR I = 0 TO N
260 Y = Y + A(I)
Program 14
This program computes Poisson probabilities.
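The recursion used at lines 300 to 330 of the listing carries over directly. This is a sketch in Python with an illustrative function name; the mean 9 and upper limit 18 are the values entered in the sample run.

```python
import math


def poisson_probs(mean, n):
    """Return [P(X = 0), ..., P(X = n)] for a Poisson variable with the given mean."""
    probs = [math.exp(-mean)]          # P(X = 0)
    for i in range(1, n + 1):
        # P(X = i) = P(X = i - 1) * mean / i
        probs.append(probs[-1] * mean / i)
    return probs


probs = poisson_probs(9, 18)
total = sum(probs)                     # P(X <= 18)
```

Rounded to four decimal places these values reproduce the table printed by the sample run.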
]LIST
100 REM COMPUTES POISSON PROBS
110 REM THE NUMBER OF TRIALS IS N
120 REM P THE PROBABILITY IF SUCCESS
130 PRINT "POISSON PROBS."
140 PRINT : PRINT :
150 INPUT " GIVE N ";N
160 INPUT " GIVE MEAN ";P
170 INPUT "NO OF DECIMAL PLACES ";F0
180 DIM A(N)
190 DEF FN R(X) = INT (X * (10 ^ F0) + 0.5) / (10 ^ F0)
200 GOSUB 280
210 PRINT " I P(X=I) P(X<=I)": PRINT :
220 FOR I = 0 TO N
230 Y = Y + A(I)
240 PRINT I;" "; FN R(A(I));" "; FN R(Y)
250 NEXT
260 PRINT "END OF RUN"
270 END
280 REM *******************
290 REM COMPUTES PROBS IF P>Q
300 A(0) = EXP ( - P)
310 FOR I = 1 TO N
320 A(I) = A(I - 1) * P / I
330 NEXT
340 RETURN
350 RETURN

]RUN
POISSON PROBS.

GIVE N 18
GIVE MEAN 9
NO OF DECIMAL PLACES 4
I P(X=I) P(X<=I)

0 1E-04 1E-04
1 1.1E-03 1.2E-03
2 5E-03 6.2E-03
3 .015 .0212
4 .0337 .055
5 .0607 .1157
6 .0911 .2068
7 .1171 .3239
8 .1318 .4557
9 .1318 .5874
10 .1186 .706
11 .097 .803
12 .0728 .8758
13 .0504 .9261
14 .0324 .9585
15 .0194 .978
16 .0109 .9889
17 5.8E-03 .9947
18 2.9E-03 .9976
END OF RUN
Program 15
This program is a simple data analysis program. Given some data, here
supplied by a random number generator, the program computes some
sample statistics, prints the data in order of magnitude and plots a simple
histogram.
Points of note are the recursive mean and variance estimation, lines 170
and 180, and the "Shell sort" algorithm starting at line 320.
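Since the listing for this program is not reproduced here, the recursive mean and variance estimation can be illustrated with a sketch. The update used below is the one usually attributed to Welford; it is our assumption that the program uses something of this kind, and the function name `running_stats` is illustrative.

```python
def running_stats(data):
    """Update the mean and variance one observation at a time."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in data:
        count += 1
        delta = x - mean
        mean += delta / count          # new mean from the old mean
        m2 += delta * (x - mean)       # running sum of squared deviations
    variance = m2 / (count - 1) if count > 1 else 0.0
    return mean, variance


mean, variance = running_stats([1.0, 2.0, 3.0, 4.0, 5.0])
```

The attraction of the recursive form is that the data need be read only once and no large sums of squares are accumulated.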
]LIST
100 REM BIN GENERATOR
110 INPUT "SAMPLE SIZE ";REP
120 DIM A(REP)
130 F(0) = 0.1296
140 F(1) = 0.4752
150 F(2) = 0.8209
160 F(3) = 0.9744
170 F(4) = 1.0000
180 FOR I = 1 TO REP
190 X = RND (9)
200 J = - 1
210 J = J + 1
220 IF X > F(J) GOTO 210
230 C(J) = C(J) + 1
240 A(I) = J
250 NEXT
260 FOR I = 0 TO 4
270 X = C(I) / REP
280 Z = Z + X
290 PRINT I;" ";F(I);" ";C(I);" ";X;" ";Z
300 NEXT
310 PRINT "SAMPLE"
320 FOR I = 1 TO REP
330 PRINT A(I);" ";
340 NEXT
350 END

]RUN
SAMPLE SIZE 20
0 .1296 4 .2 .2
1 .4752 7 .35 .55
2 .8209 8 .4 .95
3 .9744 1 .05 1
4 1 0 0 1
SAMPLE
1 0 2 2 2 2 3 2 0 1 2 0 0 1 1 1 1 1 2 2
Program 17
This is a departure from our original self-imposed brief but we couldn't resist
the temptation to include a drunkard's walk. The lines 150, 340 contain
code to plot using an Apple II micro. F is a scale parameter.
JRUN
GIVE NO OF STEPS400
F4
Program 18
We have chosen to give a fairly standard FFT program and this one follows
the original form due to Cooley and Tukey. There are others which are
slicker but more obscure.
This program is devised for powers of two and expects a data series with
real part RA(I) and imaginary part IA(I) which are supplied on line 160.
The transform is computed and then the transform of the complex conjugate
of the transform. As we expect we retrieve our original series times a constant.
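The round trip described here can be reproduced with a short recursive transform. This is a sketch in Python, not the book's program: the function name `fft`, the recursive decimation-in-time layout and the sign of the exponent are our choices, and the test series 1, 2, ..., 16 is inferred from the printed output.

```python
import cmath


def fft(x):
    """Radix-2 Cooley-Tukey transform; the length of x must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


series = [complex(v, 0.0) for v in range(1, 17)]
forward = fft(series)
# Transforming the conjugate of the transform returns the series times 16
backward = fft([v.conjugate() for v in forward])
```

The constant is the length of the series, here 16, and the small imaginary parts left over are rounding error, as in the printed BACKWARDS table.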
16 -8.00000239 40.2187163
BACKWARDS
0 0 0
1 16 7.4505806E-09
2 32.0000005 3.68739381E-07
3 47.9999999 -1.71712605E-08
4 64.0000001 -1.95979044E-07
5 80.0000005 3.72529149E-08
6 96.0000007 -9.91920048E-08
7 112.000001 8.08504415E-08
8 128 -8.97791664E-08
9 144 7.4505806E-09
10 160 -1.8318566E-07
11 176 1.71712595E-08
12 192 -5.35448974E-08
13 208 -5.21540761E-08
14 223.999999 -1.35746416E-07
15 240 -8.08504405E-08
16 255.999999 3.88687806E-07
Program 19
This program will compute the gain and phase of a given filter. It is very
much better with plotting. You must supply the program with the coefficients
of the filter polynomial, A(l) ... A(LAGS). The frequency range is expected
to be between multiples of pi.
If the filter inverse is wanted this can be provided.
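The calculation at the heart of the program is small. The sketch below is in Python and follows lines 300 to 370 of the listing, so that the "gain" is the squared amplitude $Y^2 + Z^2$ and the phase is the arctangent of Y/Z; the function name `gain_and_phase` is ours, and the four equal coefficients are those of the sample run.

```python
import math


def gain_and_phase(a, w):
    """Gain (squared amplitude) and phase at frequency w for coefficients a[0..]."""
    # a[0] plays the part of A(1) in the listing, so lags start at 1
    y = sum(ak * math.cos(k * w) for k, ak in enumerate(a, start=1))
    z = sum(ak * math.sin(k * w) for k, ak in enumerate(a, start=1))
    gain = y * y + z * z
    phase = math.atan(y / z) if z != 0 else 0.0
    return gain, phase


coeffs = [0.25, 0.25, 0.25, 0.25]
g0, p0 = gain_and_phase(coeffs, 0.0)
g1, p1 = gain_and_phase(coeffs, math.pi / 20)
g_half, _ = gain_and_phase(coeffs, math.pi / 2)
```

For this four-point average the gain is 1 at zero frequency and falls to essentially zero at a frequency of pi/2, in line with the printed table.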
]LIST
100 REM FILTER PROGRAM
110 GX = 20
120 INPUT "NO. OF LAGS = ";N
130 DIM A(N),G(GX),P(GX),PL(GX)
140 DIM D(GX)
150 PRINT "NOW INPUT COEFFS ";
160 PRINT : PRINT
170 FOR I = 1 TO N: INPUT " A(I) ";A(I): NEXT
180 PRINT "ECHO CHECK "
190 FOR I = 1 TO N
200 PRINT " A(";I;")= ";A(I)
210 NEXT
220 PRINT "OK...TO PROCEED..."
230 PRINT "FREQUENCY FROM W1 TO W2 "
240 PRINT "JUST GIVE THE MULTIPLE OF PIE"
250 INPUT " W1= ";W1: INPUT " W2= ";W2
260 W1 = W1 * 3.14159:W2 = W2 * 3.14159
270 EP = (W2 - W1) / GX
280 W = W1
290 FOR J = 0 TO GX
300 Y = 0:Z = 0
310 FOR K = 1 TO N
320 X = A(K)
330 Y = Y + X * COS (K * W)
340 Z = Z + X * SIN (K * W)
350 NEXT
360 G(J) = Y * Y + Z * Z
370 IF Z < > 0 THEN P(J) = ATN (Y / Z)
380 W = W + EP
390 NEXT
400 PRINT "IF INVERSE USE TYPE YES"
410 INPUT "WELL ?..";Q$
420 IF Q$ < > "YES" GOTO 470
430 FOR I = 0 TO GX:G(I) = 1 / G(I)
440 P(I) = - P(I)
450 NEXT
460 PRINT : PRINT "INVERSE REQUESTED..."
470 PRINT "GAIN...."
480 PRINT "FREQUENCY GAIN"
490 W = W1
500 FOR I = 0 TO GX
510 PRINT W;" ";G(I)
520 W = W + EP
530 NEXT
540 GET A$
550 PRINT "PHASE....."
560 W = W1
570 PRINT " FREQUENCY......PHASE"
580 FOR I = 0 TO GX
590 PRINT W;" ";P(I)
600 W = W + EP
610 NEXT
620 GET A$
630 HOME
640 ST$ = "GAIN..."
650 FOR I = 0 TO GX:D(I) = G(I): NEXT
660 ST$ = "PHASE...."
670 FOR I = 0 TO GX:D(I) = P(I): NEXT
680 REM **** PLOTTING HERE ******
690 HOME
700 TEXT
710 PRINT "THAT'S ALL FOLKS..."
720 END
730 MA = D(0):MI = MA
740 FOR I = 1 TO GX
750 IF MA < D(I) THEN MA = D(I)
760 IF MI > D(I) THEN MI = D(I)
770 NEXT
780 R = MA - MI
790 FOR I = 0 TO GX
800 PL(I) = 160 - INT (((D(I) - MI) * 150) / R)
810 NEXT
820 OY = 160 - INT (((0 - MI) * 150) / R)
830 OX = INT (W1 / EP):OX = - OX
840 RETURN

]RUN
NO. OF LAGS = 4
NOW INPUT COEFFS

A(I) 0.25
A(I) 0.25
A(I) 0.25
A(I) 0.25
ECHO CHECK
A(1)= .25
A(2)= .25
A(3)= .25
A(4)= .25
OK...TO PROCEED...
FREQUENCY FROM W1 TO W2
JUST GIVE THE MULTIPLE OF PIE
W1= 0
W2= 1
IF INVERSE USE TYPE YES
WELL ?...
GAIN....
FREQUENCY GAIN
0 1
.1570795 .969523123
.314159 .882373788
.4712385 .750628444
.628318 .592009056
.7853975 .426777378
.942477 .274283868
1.0995565 .149839713
1.256636 .0625004559
1.4137155 .0141502105
1.570795 8.78525569E-13
1.7278745 .0103215865
1.884954 .03299125
2.0420335 .0562680695
2.199113 .0711082419
2.3561925 .0732233648
2.513272 .0625002154
2.6703515 .0432648653
2.827431 .0221351976
2.9845105 6.00537332E-03
3.14159 1.75717995E-12
PHASE....
FREQUENCY......PHASE
0 0
.1570795 1.17809758
.314159 .785398827
.4712385 .392700077
.628318 1.32616363E-06
.7853975 -.392697424
.942477 -.785396174
1.0995565 -1.17809493
1.256636 -1.57079368
1.4137155 1.17810023
1.570795 .7852606
1.7278745 .392702725
1.884954 3.97507172E-06
2.0420335 -.39269477
2.199113 -.785393521
2.3561925 -1.17809227
2.513272 -1.57079102
2.6703515 1.17810288
2.827431 .785404129
2.9845105 .392705375
3.14159 0
THAT'S ALL FOLKS...
Program 20
Often one wishes to manipulate polynomials when using filters. This pro-
gram will untangle discrete time series as follows.
Suppose that
A(z)f(z) = B(z)g(z)
Then the program will find the coefficients of the polynomial
C(z) = A-1(z)B(z)
A(z) is the AR part of a polynomial of order p and B(z) is the MA part
of order q, i.e.
$$A(z) = \sum_{i=0}^{p} a_i z^i$$
$$B(z) = \sum_{i=0}^{q} b_i z^i$$
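The coefficients of C(z) can be generated one at a time by matching powers of z in A(z)C(z) = B(z). The sketch below is in Python; the function name `series_quotient` and the example polynomials are our own, and the sketch assumes that the leading coefficient $a_0$ is non-zero.

```python
def series_quotient(b, a, n_terms):
    """First n_terms coefficients of C(z) = B(z) / A(z), with a[0] non-zero."""
    c = []
    for n in range(n_terms):
        acc = b[n] if n < len(b) else 0.0
        # Matching the coefficient of z^n in A(z) C(z) = B(z)
        for j in range(1, min(n, len(a) - 1) + 1):
            acc -= a[j] * c[n - j]
        c.append(acc / a[0])
    return c


# 1 / (1 + 0.5 z) = 1 - 0.5 z + 0.25 z^2 - 0.125 z^3 + ...
print(series_quotient([1.0], [1.0, 0.5], 4))  # [1.0, -0.5, 0.25, -0.125]
```

The result is in general an infinite series, so the number of terms to keep is a choice left to the user.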
]LIST
100 REM :TOEPLITZ TESTER
110 REM : CORRELATIONS ARE IN R(I)
120 DIM PH(9,9),R(10)
130 R(0) = 1:R(1) = 0.806
140 R(2) = 0.428:R(3) = 0.070
150 R(4) = 0.01:R(5) = - 0.01:R(6) = 0.02
160 R(7) = 0.01:R(8) = - 0.001
170 PRINT "GIVE ORDER OF MATRIX "
180 INPUT "DIM=";NO
190 IF NO > 10 THEN GOTO 270
Convolution 3, 157, 159, 161, 165, 167, 182, Double integration 63-66
190, 191, 195, 198, 199,206 Drunkard's walk 242
Coordinates 90 Duality 79, 161
Correlation 133, 138 Dummy variable 51
Correlation coefficient 175
Cosec 11 Echelon matrix 102
Cosine 7, 29, 84, 149, 150, 152 Eigenvalues 77,108-9, 111,236
Cot 11 Eigenvectors 108-9
Covariance 133 Electric circuits 26
Cross-correlation function 203-4 Electric current 172
Cross-correlation generating function 204 Electromagnetic theory 68
Cumulated process 178 Elementary row operations 102
Cumulative distribution 125 Ellipse 109-11
Ensemble 114
Daniell window 195 Ensemble average 181
Data analysis program 240 Equation of motion 179
Data processing 89, 90 Ergodic theorem 181
Data samples 134-37 Errors 138, 144-46, 205, 206, 208
Decay 175 Estimate 136, 138
Decimal notation 21-23 Estimation 193-95
Decimation Even functions 152
in frequency algorithm 169 Event 116
in time algorithm 169 Expected value 130, 132
Deconvolution 195, 200, 208 Exponential distribution 130, 131
Deconvolution filter 204, 205 Exponential functions 14-16,84,85, 165
Definite integral 53
Degree of differential equation 74 Fast Fourier transform (FFT) 168-70, 195,
Delay 199,204,205 243,250
Delta function 160, 178, 183 Filter coefficients 248, 250
Density function 131, 132, 182 Filters 174, 188-95,202,208,245,247
Derivatives 31, 32 Finite data filter 193
see also Higher derivatives; Partial deriva- Finite discrete Fourier transform 163
tives Floating point representation 24
Diagonal matrix 97 Flow lines 73
Differential coefficients 47 Fourier analysis 13, 149-73
Differential equations 47, 73-78 examples of 152-54
degree of 74 references 213
general solution 75 Fourier coefficients 152, 153, 156
particular solution 75 Fourier series 60, 78, 149-57
Differentiation 27-50, 75, 180 Fourier transform 149, 158-62, 182, 190
complex functions 86-88 Frequency 85, 171-73
introduction 27-35 Frequency domain 171, 174
references 212 Frequency domain analysis 156
relationship with integration 54-60 Function value 15
Diffusion equation 48 Functions 1-26,27,32,35,43,46,51,52,66,
Digital filters 191-93 80,149
Discrete Fourier transform (DFT) 163-68, addition of 54
198 BASIC programs 218-52
Discrete series 193 graphs of 15
Discrete time 180, 181, 191-93 integral of 53
Discrete time series 183, 247 matrices as 98-100
Division 81 new from old 20
Dot product 100 of two variables 83, 225
Double integral 71 random variable as 116
Functions (cont.) Inverse functions 16-19,34, 158
references 211 trigonometric 18-19
special types of 152 Inverse sine 19
Fundamental Theorem of Calculus 55 Inverse tangent 19
Inverse transform 164, 182
Invertibility 106
Gain 189, 245 Invertible matrix 97
Gauss Elimination Method 106
Gaussian distribution 127
Jacobi's method 236
Generalised function 160
Joint density function 129
Geometric series 163, 168,216
Joint distribution 128-29, 133
Ghost elimination filter 202
Gibbs phenomenon 154
Kalman filter 119, 194
Graphs 12
area under 51, 56
Laplace's equation 48, 87
of functions 15 Leading diagonal matrix 97
Gravimetric effects 72
Leading term 2
Gravitational theory 68
Line integrals 66-73
Green's theorem 70-72
Linear equations 74, 100-8, 207
Linear filters in discrete time 191-93
Harmonic functions 87 Linear functions 99
Harmonics 77 Linear problems 89, 99
Heaviside function 4, 31 Linear process 178
Hexadecimal notation 21-22 Local maxima and minima 38
High pass filter 190 Log tables IS
Higher derivatives 35-40 Logarithms 15, 18, 85, 88, 215
Higher order partial derivatives 46-48 Logic 26
Histogram 135, 137 Low pass filter 190
Homogeneous equations 108
Hyperbola 109 Marginal distribution 129
Markov process 177, 187
Identity (or unit) matrices 96 continuous time 187
Image processing 150 discrete time 187
Imaginary part 79, 83, 84 Marquardt's method 226
Inconsistency 106 Mathematical models 43
Indefinite integral 56 Matrices 89-111, 227
Independence 118 addition of 92, 93
Infinite sequences 3 as functions 98-100
Integer 6 definitions and elementary properties 89-91
Integrals 56, 58, 60, 61, 76, 88, 150, 152, 161, examples 91-93
184-85 introduction 89
double 66, 71 multiplication of 93-96
line 66-73 notation of 92, 93
of functions 53 references 212
triple 66 special types of 96-98
Integration 51-78,180 Matrix inversion 208
complex functions 86-88 Maximum delay 199,204,205
double 63-66 Maximum entropy 195
numerical 61-63, 71-73 Mean 130, 131, 145, 146
references 212 Median 136
relationship with differentiation 54-60 Mesh 71-73
repeated 63 Mesh cells 72
Inverse 80, 81, 106 Mesh point 71
Inverse Fourier transform 158 Minimum delay 199,200, 204, 205