Mathematics for Seismic Data

Processing and Interpretation



A. R. Camina and G. J. Janacek


School of Mathematics and Physics
University of East Anglia

Foreword
by
R. L. French
Racal Geophysics Limited

Introduction
by
M. Bacon
Shell

Graham & Trotman
First published in 1984 by

Graham & Trotman Limited


Sterling House
66 Wilton Road
London SW1V 1DE

© A. R. Camina and G. J. Janacek, 1984


Softcover reprint of the hardcover 1st edition 1984

British Library Cataloguing in Publication Data


Camina, A. R.
Mathematics for seismic data.
1. Engineering mathematics
I. Title II. Janacek, G. J.
510'.2462 TA330

ISBN 978-0-86010-576-3 ISBN 978-94-011-7767-2 (eBook)


DOI 10.1007/978-94-011-7767-2

This publication is protected by international copyright law. All rights reserved. No part
of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means, electronic, mechanical, photo-copying, recording or otherwise,
without the prior permission of the publishers.

Typeset in Great Britain by J. W. Arrowsmith Ltd, Bristol


CONTENTS

FOREWORD

PREFACE

INTRODUCTION

Chapter 1 SPECIAL FUNCTIONS

1 Functions
2 Polynomials and Step Functions
3 Trigonometric Functions
4 Power and Exponential Functions
5 Inverse Functions
6 New Functions from Old
7 Numbers

Chapter 2 CALCULUS: DIFFERENTIATION

1 Introduction
2 Higher Derivatives
3 Maxima and Minima
4 Taylor Series and Approximations
5 Partial Derivatives
6 Higher Order Partial Derivatives
7 Optimisation

Chapter 3 INTEGRATION

1 Introduction and Definition
2 The Relationship between Integration and Differentiation
3 Numerical Integration (Quadrature)
4 Double Integration
5 Line Integrals
6 Differential Equations

Chapter 4 COMPLEX NUMBERS

1 Introduction
2 The Beginning
3 Functions of Complex Variables
4 Differentiation and Integration

Chapter 5 MATRICES

1 Introduction
2 Definitions and Elementary Properties
3 Matrices
4 Multiplication of Matrices
5 Special Types of Matrices
6 Matrices as Functions
7 Linear Equations
8 Eigenvalues and Quadratic Forms

Chapter 6 STOCHASTIC PROCESSES, PROBABILITY AND STATISTICS

1 Introduction
2 Probability
3 Permutations and Combinations
4 Probability Distributions
5 Joint Distributions
6 Expected Values and Moments
7 Real Data Samples
8 Two Variables
9 Simulation and Monte Carlo Methods
10 Confidence Intervals
11 Stochastic Processes

Chapter 7 FOURIER ANALYSIS

1 Introduction
2 Fourier Series
3 Some Examples of Fourier Analysis
4 The Phase, Amplitude and Exponential Formulation
5 Fourier Transform
6 The z-Transform
7 The Discrete Fourier Transform
8 Fast Fourier Transform
9 Frequency Domain

Chapter 8 TIME SERIES

1 Stationary and Related Series
2 Aliasing and Sampling
3 Filters and Convolutions

Chapter 9 APPLICATIONS

1 Wavelets
2 Predictive Deconvolution

Appendix 1 REFERENCES TO APPLICATIONS

Appendix 2 SOME USEFUL FORMULAE FOR READY REFERENCE

Appendix 3 PROGRAMS

INDEX
FOREWORD

With the growth of modern computing power it has become possible to
apply far more mathematics to real problems. This has led to the difficulty
that many people who have been working in various jobs suddenly find
themselves not understanding the modern processing which is being applied
to their own professional field. It also means that the people presently being
trained in these subjects need to understand a much wider range of mathe-
matics than in the past. It is to both of these groups that this book is
addressed.
The major objective is to present the reader with the basic mathematical
understanding to follow the new developments in their own field. The
mathematics in this book is based on the need to understand signal process-
ing. The modern work in this area is mathematically very sophisticated and
our purpose is not to train professional mathematicians but to make far
more of the literature accessible. Since this book is based on courses devised
for Racal Geophysics there is clearly going to be a bias towards the
applications in that area, as the title implies. It is also true that the bibliogra-
phy has been chosen in order to aid the reader in that field by pointing
them in the direction of recent applications in geophysics.
Whilst every attempt has been made to make the material comprehensible
to the non-mathematically inclined, it is important to remember that this
is a mathematical textbook. This has the implication that it must be read
very carefully and not like a novel. One of the great advantages of mathe-
matics is its very conciseness and precision. This has the disadvantage of
making it difficult to read. So have patience and take your time. There is a
collection of computer programs (American spelling) which we hope will
be useful, not for commercial use, but to enable the reader to work through
some of these ideas for themselves, especially to foster understanding of
the mathematics.
This work arose out of the realisation at Racal Geophysics that numeracy
had become a problem as we moved from our traditional business under
the Decca Survey flag of single-channel analogue seismics into newer fields
of engineering hazard surveys and multi-channel exploration. It became
clear that, whether our personnel's background was classical geology,
engineering geology or geophysics, each had problems to some degree with
the mathematical concepts involved with current seismic data processing
techniques.
It was also clear that there were no really suitable courses in the UK to
teach the relevant mathematics (and the related physics and computing).
This is not meant as a criticism of the various courses currently available,
it is purely a comment on their usefulness to our specific needs. Most of
the instruction offered in seismics is of one of two kinds: introductory
courses, or advanced courses for already practising experts. Neither
includes the fundamentals we required. The length and size
of these courses usually also preclude a high level of direct one-to-one
instructor-student contact. None of this was what we at Racal required and
consequently, in December 1981, in conjunction with the Racal College at
Brixham, we decided to take the bull by the horns and set up a course
tailored to our specific needs. After consultation with colleagues at the
University of East Anglia, University College of Wales, Aberystwyth, Merlin
Geophysical Company Limited and Racal College, we produced a proposed
curriculum for a three-part course.
Part One, to be taught by lecturers of Racal College at Brixham, would
involve a review of electronics, physics, acoustics and the use of a suite of
the analogue equipment employed by Racal Geophysics and would be of
three weeks' duration.
Part Two, to be provided by staff from both Racal Geophysics Limited and
Merlin Geophysical Company Limited, would include multi-channel digital
acquisition and processing by computer, terminating in a period of hands-on
experience at Merlin Geophysical Company Limited at Woking. This would
be a two-week course.
Part Three, to be provided by the School of Mathematics of the University
of East Anglia, would be a series of correspondence notes in mathematics
to be supplied to the student over a period of six months. The idea was
that they would be issued in stages and lead into the other parts of the course.
The text of this book is an expansion of these notes.
The course has been inclined specifically towards digital seismic recording
and processing, although it would obviously provide a good general
background to any digital time-series analysis. The starting point for this course
has been at the end of high school. The early chapters are a review of the
A-level syllabus and a review of first year university additional mathematics
in areas of specific relevance. The subjects covered are special functions,
trigonometric functions for waveforms and calculus. The content and the
examples, although of an easy high school/first-year nature, are all orientated
towards seismics and were designed to assist with Part One of the course. As
the objective was to prepare a mathematics book for use by geophysicists,
all instruction is of a purely mathematical nature and no attempt is made
to apply the text to physical theory. Therefore, in the first three chapters
the mathematics used with alternating current theory, operational amplifiers,
Huygens' principle and the Rayleigh-Willis curve will be explained but no
attempt to apply it will be made. One departure from this practice is the
section on number systems where a description of binary, octal and
hexadecimal bases could be equally at home in a text book on physics or
electronics. However, as this is so crucial to the further development of
digital time series analysis it was decided to include this.
The later chapters are much more complex and ultimately go beyond
what is usually taught in a first degree subsidiary mathematics course.
The middle section covers complex numbers, which introduce the idea
of a vector transform that is developed further in the chapter on
Fourier analysis, and matrices, which are the main number system used in
computers and provide the direct form for de-multiplexing. The next chapter is
devoted to stochastic processes and probability and a development of the
concepts of mean, median, root mean square and standard deviation. The
final chapters deal with Fourier Analysis and Transforms and Time Series
Analysis. Covered in these chapters are the mathematical concepts involved
in Fourier synthesis and decomposition, predictive deconvolution and
Wiener filtering.
The actual processing concepts and details were covered by the course
of lectures produced by Merlin Geophysical Company. The point of the
mathematics course was that the Merlin lecturers would not have to be
interrupted by definitions of triple integral signs and the convolution and
de-convolution signs. Short sections are also included on the z-transform,
the wave equation and the frequency-wavenumber domain.
To expand the last point, this text is not meant to be a mathematical
discussion of predictive deconvolution and the like, neither is it meant to
be a rigorous and complete mathematical text. The idea is that those who
have not had a heavy university mathematical training will be able to use
this book as a reader and background to the classic processing papers by
Robinson, Backus, Claerbout, Wiener, Berkhout, etc.
One of the problems encountered in designing this course in general,
which was also highlighted with the mathematics content, was the bringing
together of a number of advanced level ideas from a variety of subjects and
disciplines. This is evident in the number of starting points, almost one at
the beginning of each chapter, and in the way the whole text only
really comes together in the final chapters. Consequently, the text does not
flow from one chapter to the next as would be the case in a conventional text
book. In other words, in order to carry out seismic data processing it is
necessary to have an understanding of binary number systems, discrete
wavelet sampling, digital filtering and Fourier analysis as well as the work-
ings of digital recording and large mainframe computing techniques.
Incidentally, an understanding of the geology and geophysics of the earth
is quite helpful, as after all the processing it gives one an idea of how
well one has done.
It cannot be emphasised too much that this is a textbook, covering many
of the mathematical techniques employed in modern day seismic data
processing and as such most of the examples and references are linked to
specific problems. Indeed, an attempt has been made to correlate the
examples through to final processing ideas and to only consider that which
is relevant to seismics; consequently, this is not a complete mathematical
course. But like all mathematics text books, it cannot be read from cover
to cover in an evening like a novel. Time has to be taken over each chapter
and the examples worked in order to achieve a reasonable understanding.
Although written essentially for geophysical work, most of the techniques
involved are equally relevant to the field of data transfer, whether it be by
telecommunication, both radio and wire, fibre optics, or acoustics. The only
difference is the one remarked on by one of the authors towards the end
of the final drafting: in seismics we were looking for, and at, the noise,
rather than at the signal, as is usually the case. The problem is that there
is noise and noise. Hopefully this book will remove some of the noise from
the processing.
R. L. French
Racal Geophysics Limited
PREFACE

In this book we have tried to present the mathematical foundations for
understanding signal processing in seismic analysis. It is not meant to be
a mathematics text book in the traditional sense but rather a mathematics
text to give the reader an understanding of the concepts involved. The object
is not to turn the reader into a mathematician but rather to enable him or
her to be mathematically literate.
The chapters of the book could be divided into three sections, the first
three chapters making up the first section. In this section, most of the
material is essentially "A-level material" (in the English context) or first
calculus course (in the American context). There are various sections
especially relevant to geophysics. It could be said that most of this is essential
to the understanding of continuous processes.
Chapters 4 and 5 are not always covered in such basic courses but
nevertheless underlie a lot of the work done in the last three chapters. It is
difficult to imagine a modern engineer or applied scientist who does not
need to know about the contents of these chapters. It is perhaps in these
two chapters that the reader will first find material which is not necessarily
familiar. It is important at this point to recall that reading mathematics is
a slow procedure. By its nature the writing is condensed, many ideas can
be compressed into a short formula. So read slowly and carefully, but don't
be afraid to go on and then return. Often the way a topic develops leads
to greater understanding of its roots.
The last three chapters are the meat of the book. In one sense the
justification of the first six chapters is to enable the reader to understand
these last three chapters. One could almost say that this is presented as a
precursor to the work of Enders Robinson.
Throughout the main text the reader will occasionally find a "P" in the
margin. This means that there is a program (written in BASIC) which is
relevant to the material in the text at that point. These are not meant to be
the latest, fastest, most efficient programs. Their purpose is to enable the
reader to use them on most micros and to get some feeling for the mathe-
matics involved.
We would like to thank the Chairman and Directors of Racal Electronics
plc and the Directors of their subsidiaries Racal Training Services Limited
and Racal Geophysics Limited, for permission to write and publish this
book.
In particular, thanks to Phillip Holden (Oxon.) C. Eng. M.I.E.E. of the
Racal Training College at Brixham for checking the original manuscript
and for his constant encouragement throughout the project. Thanks are also
due to Robert Whittington BSc., MSc., PhD, F.G.S. of University College
of Wales, Aberystwyth, members of the School of Environmental Studies
at the University of East Anglia, the Directors and staff of Merlin Geophy-
sical Company Limited of Woking and of course colleagues at Racal
Geophysics.
Finally, we would like to thank Carol Haines who did a magnificent job
typing the manuscript, correcting our inadequate English and spelling as
she went along. We would also like to thank Nick Bartlett for the figures
and many colleagues for comment and criticism. However, like all authors,
we must make it quite clear that all errors, mistakes and inaccuracies are
due entirely to us.
A. R. Camina and G. J. Janacek
University of East Anglia
INTRODUCTION

One of the most important practical applications of geophysics is in the
search for accumulations of oil and gas. This is largely a matter of looking
for suitable rock structures that might have trapped the accumulations.
Typically, useful reserves of oil and gas are found at depths of several
thousand metres below the Earth's surface. It is only at such depths that
the conditions of pressure and temperature are right to cook the organic
remains in a source rock, leading to the formation of oil and gas. These
petroleum products are expelled into the surrounding rocks, displacing the
water which is otherwise present in the pore spaces. Being less dense than
the water, the oil and gas tend to rise up through the porous rock until
their path is blocked by a layer of rock with no pore spaces, through which
they cannot pass. Thus, a petroleum accumulation will tend to form at the
top of a "buried hill" (anticline) of porous rock, which is capped by an
impermeable layer. Therefore, petroleum exploration is largely a matter of
looking for such buried structures.
Sometimes, it is fairly easy to guess the structure at a depth of several
thousand metres by extrapolating the structures visible at the surface. In
such a case, the hypothesis that deep structures are geometrically similar
to the shallow ones can be easily tested by drilling a few wells. However,
such easy cases have usually been drilled long ago. If we want to look for
new oil-fields today, we must search for deeply buried structures that have
no expression at all at the surface. In principle, this search could be carried
out by drilling a large number of exploration wells. However, wells are
expensive to drill, and in many cases, especially offshore, trying to find
fields by more-or-less blind drilling would be hopelessly uneconomic.
It is at this point that the geophysicist can offer a great deal of help.
Geophysical methods permit us to build up a picture of the sub-surface
structure to depths of several thousand metres. With this knowledge, a small
number of wells can be precisely situated so as to test the most prospective
structures; furthermore, the geological information gained from the well,
which in itself tells us only about a zone within a few feet of the borehole,
can be extrapolated laterally with some confidence, perhaps for many
kilometres in favourable cases. Various geophysical methods (gravity, mag-
netics, seismic refraction) can be used to delineate sub-surface structure,
but they mostly have very poor vertical and horizontal resolution, giving
us a very generalised picture of the sub-surface. One method, however, can
give us rather precise knowledge of sub-surface structure, with a resolution
down to a few tens of metres in many cases. This is the seismic reflection
technique, which in the last twenty years has become a basic tool of
petroleum exploration.
In essence, the idea behind this technique is an extension of that of the
ship's echo-sounder. Sound waves are generated at the Earth's surface,
using a rather strong source such as a controlled explosion. The sound
travels down into the Earth, and is reflected back from sub-surface layers
where a change in rock physical properties occurs. These echoes from
sub-surface discontinuities can be detected by receivers, analogous to micro-
phones, on the ground surface and recorded on magnetic tape. It is then
possible to measure the time taken for the sound to travel from the surface
down to the reflecting layer and back again; this travel time is a measure
of the depth of the reflecting interface. Because of the great depths to the
reflectors, the echoes are rather faint, and considerable skill is needed in
the design of field equipment that will record them satisfactorily. It is usually
necessary to use an array of receivers and add together their signals, so as
to achieve partial cancellation of random noise, such as might for example
be generated by road traffic or by the wind blowing in the trees. For the
same reason, it is often necessary to repeat the observation several times
at each location, and add together the resulting echo signals from each
separate explosion.
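The benefit of adding signals together can be illustrated numerically. The sketch below is a minimal Python example under two idealising assumptions that the text does not state: every trace contains exactly the same echo, and the noise on each trace is independent Gaussian noise. Under those assumptions, averaging N traces leaves the echo unchanged while the RMS noise falls by roughly a factor of √N. The trace length, the number of traces and the sine-wave "echo" are hypothetical choices made only for illustration.

```python
import math
import random


def rms(values):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


random.seed(1)          # fixed seed so the run is repeatable
n_samples = 2000        # samples per trace (hypothetical)
n_traces = 16           # traces added together (hypothetical)

# Idealised echo, identical on every trace (an assumption for illustration).
echo = [math.sin(2 * math.pi * 5 * t / n_samples) for t in range(n_samples)]

# Each recorded trace = echo + independent random noise of unit standard deviation.
traces = [[e + random.gauss(0.0, 1.0) for e in echo] for _ in range(n_traces)]

# Stack: add the traces sample by sample, then divide by the number of traces.
stack = [sum(column) / n_traces for column in zip(*traces)]

# Noise remaining = recorded signal minus the known echo.
noise_single = rms([x - e for x, e in zip(traces[0], echo)])
noise_stacked = rms([x - e for x, e in zip(stack, echo)])

# Expect roughly 1 for a single trace and roughly 1/sqrt(16) = 0.25 after stacking.
print(f"single trace noise RMS: {noise_single:.2f}")
print(f"stacked noise RMS:      {noise_stacked:.2f}")
```

Real field noise is rarely this well behaved, so the √N improvement should be read as an idealised best case.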
Even when the echoes have been satisfactorily recorded, our problems
are not at an end. It is fairly easy to interpret the echo travel-times from a
ship's echo-sounder in terms of depth to the sea-bed, because the sound
has travelled through a (nearly) homogeneous sea-water layer. In seismic
reflection exploration, the sound has travelled through what may be very
complex structures below the surface, and this gives rise to a number of
complications. By way of illustration, let us single out just two of the
problems that arise.
Firstly, we should ideally like to put a very sharp pulse of sound into
the ground, and record the echoes as a series of sharp pulses also. In this
way, it would be possible to measure the travel-times very accurately.
Unfortunately, this ideal is unattainable, for two reasons. It is not possible
to generate an ideally sharp pulse with any practical source, even an
explosion. Even if this were possible, however, it is a property of the real
Earth that, over a path of several thousand metres, it will smear out an
initially sharp pulse, perhaps turning it into quite a complex waveform.
This will limit the vertical resolution we can attain; if we have a series of
closely-spaced reflectors, the echoes from them will overlap in time, and
the resulting jumble will be difficult to sort out into the echoes from the
individual layers.
Secondly, if we have a number of reflecting layers, sound energy can be
partly trapped, bouncing backwards and forwards between them. Some of
this energy eventually arrives back at the surface, but later than expected
because of the time spent in the part of the path where it is bouncing up
and down. If the reflecting layers involved are close together, the extra time
taken will be small, and the effect contributes to the pulse-broadening
mentioned above. If, however, the layers are several thousand metres apart,
the delayed arrival will mimic a reflection from a much deeper layer. Clearly,
we would not want to drill an apparent deep structure which is actually
such an artefact of the method.
Great progress has been made over the last twenty years in dealing with
these problems. Perhaps the greatest single advance has been the use of
digital recording of the receiver signals. This has meant that the data can
be fed into a computer, where a wide variety of ingenious signal-processing
techniques can be applied. In this way, the problems outlined above can
be partially solved. Thus, it is possible to estimate the shape of the smeared-
out pulse that actually generated the data, and then process the records to
approximate what they would have been with an ideally sharp pulse. It is
also possible to predict the arrival times of energy that has spent some time
trapped between sub-surface layers, and then subtract this signal from the
record, so that we are no longer fooled into believing that these echoes
come from genuine deep reflectors.
These signal-processing techniques were first applied, of necessity, in the
search for deep oil accumulations, but in recent years they have been
increasingly used in the rather simpler problem of shallow sub-surface
investigation. This is usually carried out for engineering purposes (predic-
tion of drilling hazards, and selection of sites for offshore production
platforms or pipeline routes). In this application, the depth of investigation
required is usually only a few hundred metres, but a relatively high resolution
may be needed.
To apply these signal-processing techniques with confidence, it is important
to understand their nature and limitations. A cookery-book approach
is not enough; what is ideally needed is a thorough understanding of what
happens to the seismic signal as it propagates through the earth, and the
effects of the source and receiver parameters on our picture of the sub-
surface. It is not only the geophysicist directly concerned with processing
the data who needs this appreciation; anyone who interprets the data in
geological terms needs a clear understanding of the distortions introduced
into his picture of the sub-surface by the imperfections of the seismic
reflection technique.
Any useful and detailed account of the seismic reflection method inevi-
tably involves a good deal of mathematics. Many texts cover this ground,
but they all assume that the reader has a good grasp of the mathematical
background, and make no real attempt to explain the mathematics (as
opposed to the physics) of the development of the subject. However, it is
a dubious assumption that the user of such a book will have enough
mathematics to be able to follow the arguments in detail. With the increased
use of the seismic method, a wide variety of people need to learn about it,
and their pre-existing mathematical knowledge will range from a fairly
elementary school level in some cases to University level in others. To fill
gaps in his knowledge, it would be possible for the student to read parts
of various traditional texts, assembling a "course" directly related to his
needs; this type of study, however, requires careful guidance from an
experienced teacher. This book brings together these scattered topics,
to give a coherent account of the background mathematics needed to
understand the seismic method, and thus covers ground which no other single
textbook does. It begins with a section on basic concepts, which will be
useful as a reference source even to those familiar with them, and goes on
to develop the subject to a level at which the student can read for himself
the (sometimes rather abstruse) literature on seismic processing. The final
section of the book provides a bridge to the more advanced material to be
found elsewhere.
An important feature of the text is the provision of numerous examples,
and some illustrative computer programs. By working through these, the
student can acquire the detailed familiarity with mathematical manipulation
which he will need if he is to understand the basis of modern seismic
techniques.
M. Bacon
Shell UK Exploration and Production Ltd
Chapter 1

SPECIAL FUNCTIONS

1 FUNCTIONS

Central to the development of mathematics and its applications is the idea
of a function. The idea will probably be a familiar one, especially in the
guise of a graph or as something like $f(x) = x^2 + x^3 + x^4$ or $f(x) = x^3 - 1/x + 2$.
However, there is a more general notion of function which it is important
to understand in order to develop all the ideas we are going to encounter.
We begin with two sets A and B and we think of a function as a black
box connecting them (see Fig. 1.1). The idea is that if we feed an element
a of A into the black box then we get an element b of B. If we call the
black box f, we write this as b = f(a). An important point is that if we feed
in a we always get the same element f(a) (or, if you like, the black box is
completely determined). There is no reason why A and B have to be distinct,
in fact in many cases they are both sets of real numbers.
As an example, consider a car journey from Yarmouth to Brixham. For
A we have the times and for B the distances. Let f(t) be the distance
(element of B) travelled in time t. This example illustrates one of the
important points of a function: given a particular time there is just one
distance travelled (although if the trip was broken up by a stop for lunch
it may well be true that for different times the distance travelled will be the
same).
A useful and standard method for representing functions is to draw a
graph. In the example given the graph might well look like Fig. 1.2. This
is rather idealised; a real journey on real roads would probably be more
irregular, as in Fig. 1.3. In this particular example we see that both A and
B consist of real positive numbers. Since later on we will need to consider
alternative situations let us consider some different cases. If we measure
for each point on the earth's surface the pressure, temperature and height,
we will obtain a function g from A, the points on the earth's surface, to a
"triple" of real numbers (p, t, h) where p is the pressure, t the temperature
and h the height. If we measure points on the earth's surface by latitude
and longitude we can consider A as pairs of real numbers. We can then
say that g is a function of two variables.
A less familiar example comes from opinion polls. If we sample the
population and ask whether or not they would vote for the SDP, A would
be the population and B would just consist of {Yes, No}. In this case the
function takes only one of two values. In general our functions will have
real numbers as their values, although some of them may look a little
unusual. We note in passing that we can often replace non-numerical
outcomes by numerical values, e.g.
Yes = 1 and No = 0
making numerical processing possible.

Fig. 1.1 [diagram: A → f → B]
Fig. 1.2 [graph: distance against time]
Fig. 1.3 [graph: distance against time]
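The coding of Yes and No as numbers can be written directly as a function in the black-box sense: each input always gives the same single output. This Python sketch uses a made-up list of responses purely for illustration; once the answers are coded as 1 and 0, their mean is simply the proportion answering Yes.

```python
def encode(answer: str) -> int:
    """Function from the set {Yes, No} to the numbers {1, 0}."""
    return {"Yes": 1, "No": 0}[answer]


# Hypothetical poll responses (illustrative data, not from the text).
responses = ["Yes", "No", "No", "Yes", "No"]
coded = [encode(r) for r in responses]

# With numerical values, ordinary arithmetic applies:
# the mean of the 1/0 codes is the proportion of "Yes" answers.
proportion_yes = sum(coded) / len(coded)
print(coded, proportion_yes)  # [1, 0, 0, 1, 0] 0.4
```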
Special functions play an important role in later chapters. In this
chapter we look first at the polynomial functions and then introduce the
trigonometric functions. Section 4 introduces exponential functions and
Section 5 discusses the inverse functions, specifically the logarithm and
the inverse trigonometric functions.

2 POLYNOMIALS AND STEP FUNCTIONS


We begin this section by recalling the idea of a polynomial, that is an
expression of the form $a_n x^n + a_{n-1} x^{n-1} + \dots + a_0$. If $a_n \neq 0$, $a_n$ is called the
leading term, $a_0$ is the constant term and $n$ is the degree of the polynomial.
A polynomial of degree 0 is called a constant polynomial. A polynomial
can be thought of as a function from the reals to the reals, $x \mapsto a_n x^n + \dots + a_0$.
If the polynomial is viewed in this light then it makes no difference if it is
written $y \mapsto a_n y^n + \dots + a_0$, so that the particular indeterminate used is
irrelevant. The values of $z$ for which $p(z)$ is zero are called the roots.
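Viewed as a function, a polynomial can be evaluated at any real x. The short Python sketch below stores the coefficients lowest power first, as the list [a_0, a_1, ..., a_n] (a storage convention chosen here, not taken from the text), and evaluates the polynomial by Horner's rule, a standard rearrangement that avoids computing each power separately. The example polynomial x^2 - 3x + 2 = (x - 1)(x - 2) is an illustrative choice whose roots are 1 and 2.

```python
def poly_eval(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n, given coeffs = [a_0, a_1, ..., a_n].

    Horner's rule: a_0 + x(a_1 + x(a_2 + ... + x a_n)).
    """
    result = 0
    for a in reversed(coeffs):  # start from the leading coefficient a_n
        result = result * x + a
    return result


# p(x) = x^2 - 3x + 2, stored lowest power first.
p = [2, -3, 1]
print([poly_eval(p, x) for x in range(4)])  # [2, 0, 0, 2]: zero at the roots 1 and 2
```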
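Polynomials are added term by term, and so if
$$p(x) = p_0 + p_1 x + \dots + p_n x^n$$
and
$$q(x) = q_0 + q_1 x + \dots + q_n x^n$$
(if one polynomial has lower degree, its missing coefficients are taken as zero)
we have
$$p(x) + q(x) = (p + q)(x) = (p_0 + q_0) + \dots + (p_n + q_n)x^n$$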
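Term-by-term addition is easy to express in code. In this minimal Python sketch (again assuming coefficient lists stored lowest power first) the shorter list is padded with zeros, so polynomials of different degree can be added.

```python
from itertools import zip_longest


def poly_add(p, q):
    """Add two polynomials term by term; lists hold coefficients lowest power first.

    Missing coefficients in the shorter list are treated as zero.
    """
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]


# (1 + 2x + 3x^2) + (4 + 5x) = 5 + 7x + 3x^2
print(poly_add([1, 2, 3], [4, 5]))  # [5, 7, 3]
```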
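The formula for multiplication is slightly more complex. Let $r(x) = p(x)q(x)$,
where
$$q(x) = q_0 + q_1 x + \dots + q_m x^m, \qquad r(x) = c_0 + c_1 x + \dots + c_{m+n} x^{m+n};$$
then
$$c_i = \sum_{j=0}^{i} p_j q_{i-j}, \qquad 0 \le i \le m + n$$
We use the assumption that
$$p_j = 0 \text{ if } j > n \quad \text{and} \quad q_j = 0 \text{ if } j > m$$
[P]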
This formula will occur again in the text and is referred to as the
convolution of (Po, ... ,Pn) with (ql, ... , qm). If we consider the collection
of all sequences say (Po, ... , Pn) we can define a function from (Po, ... , Pn)
to the polynomial Po +PI X + ... + Pnxn. If the indeterminate is labelled z so
that given any sequence (Po, ... ,Pn) we construct a polynomial Po + PI Z +
... + Pnzn; this is called the z-transform of (Po, ... ,pJ and plays a major
role in applications.
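The multiplication formula is exactly a convolution, so it is easy to check by machine. The following is a minimal Python sketch, not taken from the book: `poly_multiply` is a hypothetical helper, and we assume polynomials are stored as lists of coefficients with the constant term first, which is also the order of the sequence whose z-transform is the polynomial.

```python
def poly_multiply(p, q):
    """Coefficients of r(x) = p(x) q(x), lists ordered constant term first.

    Implements c_i = sum_{j=0}^{i} p_j q_{i-j}, treating p_j = 0 for j > n
    and q_j = 0 for j > m (the convolution of the two sequences).
    """
    n, m = len(p) - 1, len(q) - 1
    c = [0] * (n + m + 1)
    for i in range(n + m + 1):
        for j in range(i + 1):
            if j <= n and i - j <= m:   # skip the coefficients taken to be zero
                c[i] += p[j] * q[i - j]
    return c


print(poly_multiply([1, 1], [1, -1]))    # (1 + x)(1 - x) = 1 - x^2  -> [1, 0, -1]
print(poly_multiply([1, 2], [3, 1, 1]))  # (1 + 2x)(3 + x + x^2)      -> [3, 7, 3, 2]
```

The same function therefore multiplies z-transforms: convolving two sequences and multiplying their polynomials give identical coefficient lists.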
There are two ways to extend the idea of polynomials, both of which have applications. One is to allow negative indices, that is terms like $z^{-2} = 1/z^2$. Then sequences like $(p_{-2}, p_{-1}, p_0, p_1)$ can be transformed to $p_{-2}z^{-2} + p_{-1}z^{-1} + p_0 + p_1 z$. With some thought this is the same as
$$\frac{1}{z^2}\left(p_{-2} + p_{-1}z + p_0 z^2 + p_1 z^3\right)$$
The second generalisation is to consider infinite sequences $(p_0, p_1, \dots)$. To see how this can be useful consider the product of two polynomials, $p(x)$ of degree $n$ and $q(x)$ of degree $m$. Then $p(x)q(x)$ has degree $m + n$, which is zero if and only if $m = n = 0$. So given a polynomial $p(x)$ of degree greater than 0 we can find no polynomial $q(x)$ so that $p(x)q(x) \equiv 1$ ($\equiv$ means identically, that is as polynomials). However if we allow $q(x)$ to have an infinite number of terms then we can sometimes solve the above problem. Consider the simple example $p(x) = 1 + x$. Let
$$q(x) = q_0 + q_1 x + q_2 x^2 + \dots$$
Then we require
$$(1 + x)(q_0 + q_1 x + q_2 x^2 + \dots) = 1$$
Using the product rule for polynomials and comparing the coefficients of $1, x, x^2, \dots$ we get
$$q_0 = 1, \qquad q_0 + q_1 = 0, \qquad q_1 + q_2 = 0, \qquad \dots$$
So $q_0 = 1$, $q_1 = -1$, $q_2 = 1, \dots$. Thus
$$(1 + x)(1 - x + x^2 - x^3 + \dots) = 1$$
or
$$\frac{1}{1 + x} = 1 - x + x^2 - x^3 + \dots$$

In fact given any polynomial $p(x) = p_0 + p_1 x + \dots + p_n x^n$ where $p_0 \neq 0$ we can always find an infinite polynomial $q(x)$ such that
$$\frac{1}{p(x)} = q(x)$$
However there are problems in doing this. Let $p(x) = 1 - x$; then $q(x) = 1 + x + x^2 + \dots$. If $x = 1$, $q(1) = 1 + 1 + 1 + \dots$, which is infinite. So there is a new complication to consider when we expand the definition of polynomials in this way.
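Comparing coefficients, as was done for $1 + x$, gives a recurrence for the coefficients of $q(x)$: $p_0 q_0 = 1$ and $\sum_{j=0}^{k} p_j q_{k-j} = 0$ for $k \ge 1$. Below is a minimal Python sketch of that recurrence, assuming the coefficient-list convention of constant term first; `series_inverse` is a hypothetical name, and of course only a finite number of terms can ever be produced.

```python
def series_inverse(p, terms):
    """First `terms` coefficients of q(x) with p(x) q(x) = 1 (requires p[0] != 0).

    From the product rule: p_0 q_0 = 1 and, for k >= 1,
    p_0 q_k + sum_{j=1}^{min(k, n)} p_j q_{k-j} = 0, solved for q_k in turn.
    """
    if p[0] == 0:
        raise ValueError("p_0 must be non-zero")
    n = len(p) - 1
    q = []
    for k in range(terms):
        # contribution of the coefficients q_0 .. q_{k-1} already found
        s = sum(p[j] * q[k - j] for j in range(1, min(k, n) + 1))
        rhs = 1 if k == 0 else 0
        q.append((rhs - s) / p[0])
    return q


print(series_inverse([1, 1], 6))    # 1/(1 + x): 1, -1, 1, -1, 1, -1
print(series_inverse([1, -1], 6))   # 1/(1 - x): 1, 1, 1, 1, 1, 1
```

The coefficients of $1/(1 - x)$ are all 1, so at $x = 1$ the partial sums are 1, 2, 3, ..., which grow without bound; this is the complication just described.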
Polynomials have the nice property of being continuous. This means that there are no jumps in the graph. If $f(x)$ has no jumps, and we have calculated $f(x_1)$ and $f(x_2)$, where $x_1 < x_2$, and they have different signs, then there is a value $x_0$ between $x_1$ and $x_2$ so that $f(x_0) = 0$. This value $x_0$ is called a root of the equation $f(x) = 0$ and is a point where the curve cuts the x-axis. This can also be used as the basis of an iterative method of solving such equations (the bisection method), as follows.
Let $f(x) = x^3 - 2$ and choose $x_1 = 1$ and $x_2 = 2$. Then $f(1) = -1$ and $f(2) = 6$, so there is a value $a$ such that $f(a) = 0$. That is $a^3 = 2$ and $1 < a < 2$. Now choose $x_3 = (x_1 + x_2)/2 = 1.5$. Further $x_3^3 - 2 > 0$ so $1 < a < 1.5$. Put $x_4 = (x_1 + x_3)/2 = 1.25$; then $(1.25)^3 - 2 < 0$ so $1.25 < a < 1.5$. Put $x_5 = 1.375$; then $x_5^3 - 2 > 0$ so $1.25 < a < 1.375$. Put $x_6 = 1.3125$; then $x_6^3 - 2 > 0$ so $1.25 < a < 1.3125$. Repeating this process, halving the interval each time, locates $a$ more and more accurately.
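The halving procedure above can be written as a short loop. This is a minimal Python sketch, assuming only that $f(x_1)$ and $f(x_2)$ have different signs; the function name `bisection` and its return values are our own choices.

```python
def bisection(f, lo, hi, steps):
    """Halve the bracket [lo, hi] `steps` times; f(lo) and f(hi) must differ in sign.

    Returns the final bracket and the list of midpoints tried (x_3, x_4, ...).
    """
    if f(lo) * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have different signs")
    midpoints = []
    for _ in range(steps):
        mid = (lo + hi) / 2
        midpoints.append(mid)
        if f(lo) * f(mid) <= 0:
            hi = mid   # sign change, hence a root, lies in [lo, mid]
        else:
            lo = mid   # sign change lies in [mid, hi]
    return lo, hi, midpoints


lo, hi, mids = bisection(lambda x: x**3 - 2, 1.0, 2.0, 4)
print(mids)     # [1.5, 1.25, 1.375, 1.3125], the values x_3 .. x_6 above
print(lo, hi)   # 1.25 1.3125
```

Each step halves the bracket, so about 10 further steps gain 3 more decimal places.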
At this stage it will be useful to introduce the idea of a step function. Let us begin with a picture, Fig. 1.4.
How do we represent this symbolically?
$$f(x) = \begin{cases} 0 & \text{if } x < 1 \\ 1 & \text{if } 1 \le x \le 3 \\ 0 & \text{if } x > 3 \end{cases}$$
[Note: $a \le b$ means $a$ is less than or equal to $b$.]
This is a perfectly satisfactory function. Given any value of $x$ we can calculate the appropriate value of $f(x)$. Note that this function is discontinuous at $x = 1$ and at $x = 3$. Another example is the Heaviside function given by:
$$f(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } 0 \le x \end{cases}$$
This is shown in Fig. 1.5.
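Step functions translate directly into conditional expressions. Here is a minimal Python sketch of the function of Fig. 1.4 and of the Heaviside function, with hypothetical names `step_example` and `heaviside`; note that the value at the jump (here 1 at $x = 0$) is a convention, not something forced by the picture.

```python
def step_example(x):
    """The step function of Fig. 1.4: 1 on the closed interval [1, 3], 0 elsewhere."""
    return 1 if 1 <= x <= 3 else 0


def heaviside(x):
    """Heaviside function with the convention used above: 1 for x >= 0, 0 for x < 0."""
    return 1 if x >= 0 else 0


print([step_example(x) for x in (0, 1, 2, 3, 3.5)])  # [0, 1, 1, 1, 0]
print([heaviside(x) for x in (-2, -0.1, 0, 5)])      # [0, 0, 1, 1]
```

NumPy users can get the same convention from `numpy.heaviside(x, 1.0)`, whose second argument is the value used at $x = 0$.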
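Examples
1. Let
$$f(x) = \begin{cases} 0 & \text{if } x < -1 \\ 2 & \text{if } -1 \le x \le 1 \\ 0 & \text{if } x > 1 \end{cases}$$
Then the graph is given by Fig. 1.6.
2. Let
$$f(x) = \begin{cases} 0 & \text{if } x < -3 \\ 4 & \text{if } -3 \le x \le 6 \\ 0 & \text{if } x > 6 \end{cases}$$
Then the graph is given in Fig. 1.7.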
Fig. 1.7

Fig. 1.8

One interesting function which is very useful is the integer part of $x$. Many computers have this function, normally denoted by Int(x). It is defined as the largest integer not larger than $x$. So
Int(1.1) = 1   Int(3.57) = 3   Int(3) = 3
but note
Int(-2.5) = -3 and Int(-9.7) = -10 but Int(-2) = -2
It is an instructive exercise to draw the graph of Int(x). In many mathematical texts the function is represented by $[x]$ or $\lfloor x \rfloor$.
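Most programming languages provide Int(x) under the name floor. Here is a minimal Python sketch, with a hypothetical helper `int_part`, that reproduces the values above and shows a common trap.

```python
import math


def int_part(x):
    """Int(x): the largest integer not larger than x, i.e. the floor of x."""
    return math.floor(x)


print([int_part(x) for x in (1.1, 3.57, 3, -2.5, -9.7, -2)])  # [1, 3, 3, -3, -10, -2]

# Caution: Python's built-in int() truncates toward zero, so it agrees with
# Int(x) only for x >= 0 (and for negative whole numbers).
print(int(-2.5), int_part(-2.5))  # -2 -3
```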
Exercise 1
In the first two, write down $f(x)$ for the graph drawn.
(i) Fig. 1.8
(ii) Fig. 1.9
(iii) Draw the graph of
$$f(x) = \begin{cases} -1 & \text{if } x < 2 \\ 0 & \text{if } 2 \le x \le 5 \\ +1 & \text{if } x > 5 \end{cases}$$
(iv) Sketch $x - \text{Int}(x)$

Fig. 1.9

Fig. 1.10

3 TRIGONOMETRIC FUNCTIONS
The central idea is that of angle and how to measure angle in radians. Consider a circle of radius 1; this is called the unit circle. Let O be the centre and A and B two points on the circumference as in Fig. 1.10. The size of the angle AOB in radians is the length of the arc AB, which we call $\theta$. Since the circumference of the circle is $2\pi$, we get the formula $2\pi$ radians = 360°. Most calculators have a facility for calculation in radians or degrees. Notice that 0 radians = 0°, $\pi/2$ radians = 90°, $\pi/3$ radians = 60° and $\pi$ radians = 180°. Given the unit circle with centre O we let OA be the x-axis. Let B be a point on the circumference such that AOB is the angle $\theta$. Let C be the point on OA such that the angle OCB is a right angle, see Fig. 1.11. We can now define two functions of $\theta$, the cosine and sine functions, normally written as cos and sin:
$$\cos\theta = OC \quad\text{and}\quad \sin\theta = CB$$
(If you are familiar with the definition of $\cos\theta$ as (adjacent)/(hypotenuse), note that this is the same, since the hypotenuse OB equals 1.)
We have to ensure that we measure the lengths with the appropriate directions. When $\theta$ lies between 0 and $\pi/2$ there are clearly no difficulties, but when $\theta$ lies between $\pi/2$ and $\pi$ we see from Fig. 1.12 that $\cos\theta = OC$, which is negative, but that $\sin\theta = CB$, which is positive.
A third important function related to sine and cosine is the tangent, which is written tan and defined by $\tan\theta = \sin\theta/\cos\theta$. We label the quadrants as
2nd   1st
3rd   4th
Fig. 1.11

Fig. 1.12

Note
$0 \le \theta \le \pi/2$ gives the 1st quadrant
$\pi/2 < \theta \le \pi$ gives the 2nd quadrant
$\pi < \theta \le 3\pi/2$ gives the 3rd quadrant
$3\pi/2 < \theta < 2\pi$ gives the 4th quadrant
We can tabulate the signs of cos, sin and tan as follows:

        1st   2nd   3rd   4th
cos     +ve   -ve   -ve   +ve
sin     +ve   +ve   -ve   -ve
tan     +ve   -ve   +ve   -ve

Using Pythagoras' Theorem it can be shown that
$$\cos^2\theta + \sin^2\theta = 1$$
There are a number of very useful identities relating cosine and sine and the addition of angles. We will state the main two of these relations. Historically in geometry it was traditional to denote angles by capital letters, so the identities are as follows:
$$\cos(A \pm B) = \cos A\cos B \mp \sin A\sin B$$
and
$$\sin(A \pm B) = \sin A\cos B \pm \cos A\sin B$$
For some particular values the results can be seen by looking at the pictures we drew earlier. If $A = \pi$ and $B = \theta$ we get $\cos(\pi - \theta) = \cos\pi\cos\theta + \sin\pi\sin\theta$. Now $\cos\pi = -1$, $\sin\pi = 0$ and so $\cos(\pi - \theta) = -\cos\theta$ (see Fig. 1.13).

Fig. 1.13

Clearly $\cos\theta = OC$ and $\cos(\pi - \theta) = CO$. Also from $A = \pi/2$, $B = \theta$
$$\sin\left(\frac{\pi}{2} - \theta\right) = \sin\frac{\pi}{2}\cos\theta - \cos\frac{\pi}{2}\sin\theta$$
But $\sin(\pi/2) = 1$ and $\cos(\pi/2) = 0$ so
$$\sin\left(\frac{\pi}{2} - \theta\right) = \cos\theta$$

We can think of the angle $\theta$ as being a rotation through an angle $\theta$ in an anticlockwise direction. Now if we wish to consider angles greater than $2\pi$, we rotate through more than one complete revolution. The effect of a rotation by $2\pi + \theta$ will be just the same as by $\theta$, similarly $4\pi + \theta$ and $6\pi + \theta$ etc. Further, to obtain $-\theta$ we rotate in a clockwise direction through $\theta$ as in Fig. 1.14. Thus a rotation by $-2\pi$ has the same effect as a rotation by 0, and the angles $\theta - 2\pi$, $\theta - 4\pi$, etc. all give the same position as $\theta$.
It is useful at this point to introduce a new function. In many situations, as in the one above, we are only interested in the value of something after ignoring multiples of some fixed number. For example, in counting days of the week, multiples of 7 can be ignored. So given $x$ we define a function mod $x$, where $a \bmod x$ means the remainder of $a$ after dividing out multiples of $x$. It is helpful to give some examples: (i) 7 mod 5 = 2, (ii) 8 mod 5 = 3, (iii) 13 mod 5 = 3, (iv) 2 mod 1.5 = 0.5, (v) $3\pi/2 \bmod \pi = \pi/2$, (vi) (-3) mod 4 = 1.

Fig. 1.14

This last one requires some comment: where $a$ is negative we (usually) choose $a \bmod x$ to be positive, so that $-3 + 4 = 1$, or equivalently $(-3) - 1 = -4 = -1 \times 4$. An alternative way of writing this is to say that $a \bmod x$ is the number such that $a - (a \bmod x) = nx$ for some integer (whole number) $n$, and $0 \le a \bmod x < x$.
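Python's `%` operator can be used to check these examples. This sketch assumes a positive divisor $x$; under that assumption `%` matches the convention just described, whereas `math.fmod` does not.

```python
import math

# For a positive divisor x, Python's % follows the convention above:
# 0 <= a % x < x, and a - (a % x) is a whole multiple of x.
print(7 % 5, 8 % 5, 13 % 5, 2 % 1.5, -3 % 4)                   # 2 3 3 0.5 1
print(math.isclose((3 * math.pi / 2) % math.pi, math.pi / 2))  # True

# math.fmod keeps the sign of a instead, so it does NOT follow that convention.
print(math.fmod(-3, 4))                                        # -3.0
```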
Exercise 2
Evaluate the following:
(i) 7 mod 3   (ii) 25 mod 5   (iii) 3 mod 2
(iv) -5 mod 2   (v) 6 mod 2.4   (vi) $5\pi \bmod 2\pi$
So the value of $\cos\theta$ depends only on $\theta \bmod 2\pi$.
Since $2\pi + \theta$ is clearly the same angle as $\theta$, we would expect that $\cos(2\pi + \theta) = \cos\theta$. We use the fact that $\cos 2\pi = 1$ and $\sin 2\pi = 0$ and the now familiar identities:
$$\cos(2\pi + \theta) = \cos 2\pi\cos\theta - \sin 2\pi\sin\theta$$
and
$$\sin(2\pi + \theta) = \sin 2\pi\cos\theta + \cos 2\pi\sin\theta$$
thus
$$\cos(2\pi + \theta) = \cos\theta \quad\text{and}\quad \sin(2\pi + \theta) = \sin\theta$$
Finally using the somewhat tricky looking idea that $-\theta = 0 - \theta$ we have
$$\cos(-\theta) = \cos\theta \quad\text{and}\quad \sin(-\theta) = -\sin\theta$$
Since calculators are readily available, a useful exercise is to evaluate certain trigonometric functions using them.
Exercise 3
Evaluate cos, sin and tan for the following values (all given in radians):
0.3, 0.5, 0.03, 1.56, 7.84, -12.06.
Answers (cos, sin, tan in each case)
0.95, 0.29, 0.30; 0.877, 0.479, 0.54; 0.9995, 0.0300, 0.0300; 0.010, 0.99994, 92.62; 0.014, 0.9999, 71.52; 0.87, 0.48, 0.55.
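A short Python sketch for checking the answers; Python's `math` functions, like the exercise, work in radians. The printed values carry more digits than the answers above, so they may differ in the last quoted place.

```python
import math

for theta in (0.3, 0.5, 0.03, 1.56, 7.84, -12.06):
    c, s, t = math.cos(theta), math.sin(theta), math.tan(theta)
    print(f"{theta:7.2f}  cos={c:.4f}  sin={s:.4f}  tan={t:.4f}")

# cos depends only on theta mod 2*pi, so reducing the angle first changes nothing.
print(math.isclose(math.cos(7.84), math.cos(7.84 % (2 * math.pi))))  # True
```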
Fig. 1.15

There are three further trigonometric functions which are common and have names. These are based on the ones we have already defined:
$$\operatorname{cosec}\theta = 1/\sin\theta$$
$$\sec\theta = 1/\cos\theta$$
$$\cot\theta = 1/\tan\theta$$
At this stage it is valuable to remind ourselves how these functions were defined, see Fig. 1.15. Since A and B lie on the unit circle with centre O, OA and OB have length 1. Thus OC and CB both have length less than or equal to 1. So
$$-1 \le OC \le +1 \quad\text{and}\quad -1 \le CB \le +1$$
This can be reinterpreted to mean that
$$-1 \le \cos\theta \le +1, \qquad -1 \le \sin\theta \le +1$$
These facts can also be deduced from the formula $\cos^2\theta + \sin^2\theta = 1$.
The next important stage is to sketch the curves of cos and sin. Probably many of you are familiar with these anyway. Firstly, since $\cos(\theta + 2\pi) = \cos\theta$, if $n$ is any integer (something of the form $0, \pm 1, \pm 2, \dots$) we have $\cos(\theta + 2\pi n) = \cos\theta$. Therefore, if we sketch the portion for $0 \le \theta \le 2\pi$, the remainder is just a repetition. We can tabulate some values:

θ        0    π/2   π    3π/2   2π
cos θ    1    0     -1   0      1

Also $\cos(-\theta) = \cos\theta$ (i.e. cos is an even function) and so the curve for $\cos\theta$ is as shown in Fig. 1.16. For the sine curve we tabulate:

θ        0    π/2   π    3π/2
sin θ    0    1     0    -1
Also $\sin\theta = -\sin(-\theta)$. (Functions with this property are called odd.) Figure 1.17 gives the sine curve. If we draw the picture as in Fig. 1.11 and imagine $\theta$ traversing the circle, we can see that CB grows bigger and smaller, and that if we plotted its height on a graph against $\theta$ we would obtain the graph shown in Fig. 1.17.

Fig. 1.16   Fig. 1.17

Fig. 1.18
Clearly it is now possible to build up more complicated functions. What happens if we consider $\cos 2\theta$? Since $2(\theta + \pi) = 2\theta + 2\pi$,
$$\cos(2(\theta + \pi)) = \cos 2\theta$$
Tabulating values of $\theta$ from 0 to $5\pi/4$ gives:

θ         0    π/4   π/2   3π/4   π    5π/4
cos 2θ    1    0     -1    0      1    0

In this case the graph repeats after only $\pi$; this is illustrated in Fig. 1.18. If we wrote $\cos 3\theta$ this would repeat more quickly. Tabulating $\cos(\theta/2)$, however, we see that this repeats more slowly (see Fig. 1.19):

θ          0    π/2    π    2π   3π   4π
cos(θ/2)   1    0.71   0    -1   0    1
Fig. 1.19   Fig. 1.20

Fig. 1.21

Clearly $\cos\omega\theta$ is a graph of the same shape, with $\omega$ determining how rapidly it repeats. The constant $\omega$ is called the angular frequency of this curve. As $\omega$ increases, the rate at which the graph repeats increases; Figure 1.20 illustrates this. The wavelength is the length between the repeats (or in mathematical language the period). Thus for $\cos\theta$ the wavelength is $2\pi$, and for $\cos 2\theta$ the wavelength is $\pi$. In general the wavelength $\lambda$ and the angular frequency $\omega$ are related by
$$\lambda = \frac{2\pi}{\omega}$$
Note: frequency in engineering is usually measured in Hertz, cycles per second, so the frequency $f$ is the angular frequency divided by $2\pi$, $f = \omega/2\pi$. Since one cycle consists of one rotation through an angle $2\pi$, 1 cycle per second is $2\pi$ radians per second. Thus the more usual equation is $\lambda = 1/f$. Similar arguments apply to the sine wave.
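A small numerical check of $\lambda = 2\pi/\omega$, written as a Python sketch; `wavelength` is a hypothetical helper, the test angle 0.7 is arbitrary, and the 50 Hz signal is only an illustrative value.

```python
import math


def wavelength(omega):
    """Period (wavelength) lambda = 2*pi/omega of cos(omega*theta)."""
    return 2 * math.pi / omega


theta = 0.7  # any test angle
for omega in (1, 2, 0.5):
    lam = wavelength(omega)
    repeats = math.isclose(math.cos(omega * theta), math.cos(omega * (theta + lam)))
    print(omega, lam, repeats)  # repeats is True for each omega

# With f in Hz (cycles per second), omega = 2*pi*f and the period is 1/f.
f = 50.0  # hypothetical 50 Hz signal
print(1 / f, wavelength(2 * math.pi * f))  # both about 0.02 s
```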
Another stage is to consider functions of the form $3\cos\theta$ or $\frac{1}{2}\cos\theta$. These two examples are illustrated in Fig. 1.21. If we have a wave $a\cos\omega\theta$ then $a$ is called the amplitude. The last alteration we can make to the simple cos or sin wave is to consider a wave of the form $a\cos(\omega\theta + \phi)$. The term $\phi$ moves the wave along the axis and is called the phase. For example, in Fig. 1.22 we sketch $\cos\theta$ and $\cos(\theta + \pi/2)$. Using the identity discussed earlier,
$$\cos\left(\theta + \frac{\pi}{2}\right) = \cos\theta\cos\frac{\pi}{2} - \sin\theta\sin\frac{\pi}{2} = -\sin\theta$$
Note: $\cos(\theta - \pi/2) = \sin\theta$.
These ideas are basic to the whole of Fourier analysis and the theory of waves.

Fig. 1.22 Fig. 1.23

Exercise 4
Sketch the following:
(i) $2\sin 3\theta$   (ii) $\cos(\theta + \pi)$
(iii) $\frac{1}{2}\sin 2\theta$   (iv) $\sin\theta + \cos\theta$

As a final note to this section, we observe that when the angle $\theta$ is small, we can find some simple and useful approximations for $\sin\theta$ and $\tan\theta$. Figure 1.23 shows the unit circle with a small angle $\theta$. From the diagram we can see CB < arc AB. Thus $CB/OB = CB = \sin\theta < \theta$ and so $0 < \sin\theta < \theta$. We can also make a rather more subtle deduction:
$$\tan\theta = \frac{CB}{OC}$$
and since $OC < OA = 1$,
$$\tan\theta = \frac{CB}{OC} > CB = \sin\theta$$
and so $\tan\theta > \sin\theta$. For small angles (in radians) we can assume
$$\tan\theta \approx \sin\theta \approx \theta$$
so that, for example, for $\theta = 10^{-7}$ we have $\tan\theta \approx 10^{-7}$ and $\sin\theta \approx 10^{-7}$. When $\theta$ is small a useful approximation is $\cos\theta = \sin\theta/\tan\theta \approx 1$.
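The quality of these approximations is easy to see numerically. In this Python sketch (angles in radians) the relative error of $\sin\theta \approx \theta$ shrinks roughly like $\theta^2/6$, a standard result quoted here rather than derived; the chosen angles are arbitrary.

```python
import math

# Compare sin(theta), tan(theta) and theta for shrinking angles (radians).
for theta in (0.5, 0.1, 0.01, 1e-7):
    s, t = math.sin(theta), math.tan(theta)
    print(f"theta={theta:g}  sin={s:.10g}  tan={t:.10g}  "
          f"rel. error of sin~theta={(theta - s) / theta:.2e}  cos={math.cos(theta):.10g}")
```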

4 POWER AND EXPONENTIAL FUNCTIONS


We generalise the idea of $x^m$, where $m$ is a whole number, by using the rule that $x^{m+n} = x^m x^n$. This enables us to make coherent sense of $x^{-m} = 1/x^m$, $x^0 = 1$ and $x^{1/2} = \sqrt{x}$, as $x^{1/2}x^{1/2} = x^1$. These are rules that are almost certainly familiar, but this gives a reason for their appearance. We can now draw the graph of functions like $b = 10^a$ (see Fig. 1.24). This is the basis of logarithms and log tables.

Fig. 1.24   Fig. 1.25
One favourite number to use in place of $x$ is the number $e = 2.718281828\ldots$, which seems rather bizarre. In fact, as we shall see, the number $e$ is very useful and crops up all over the place; you will notice it appears on all scientific calculators.
Some calculations show that

a       e^a
-1      0.36788
-0.5    0.60653
0.0     1.0000
0.5     1.6487
1.0     2.71828

This produces the graph shown in Fig. 1.25, which is of the same shape as that in Fig. 1.24. You will often see $b = e^a$ written as $b = \exp(a)$. It is such an important and useful function that most computer languages provide it.
Power functions can often arise as solutions to equations. Suppose we have a function $f(t)$, where $f(t)$ is some value at time $t$. Suppose we notice
$$f(t + 1) - f(t) = x f(t)$$
for some value $x$, so the increment over each unit time interval is proportional to the function value. If $f(0) = 1$ (for simplicity) then
$$f(1) = (1 + x)f(0) = 1 + x$$
$$f(2) = (1 + x)f(1) = (1 + x)^2$$
$$\vdots$$
$$f(t) = (1 + x)^t$$
giving a power function!
A more complex example is as follows. Suppose we measure $f(t)$ at small time increments $\Delta t$ and we notice
$$f(t + \Delta t) - f(t) = \Delta t\, f(t)$$
If we assume $f(0) = 1$ as before then
$$f(t + \Delta t) = (1 + \Delta t) f(t)$$
$$f(\Delta t) = 1 + \Delta t \quad\text{and}\quad f(2\Delta t) = (1 + \Delta t)^2$$
$$\vdots$$
$$f(t) = (1 + \Delta t)^{t/\Delta t}$$
(for $t$ a whole multiple of $\Delta t$).

Example 3
Compute $f(1)$ for $\Delta t = 0.1, 0.001, 0.0001$. You will notice that your solutions for $f(1)$ are very close to $e$. In fact as $\Delta t \to 0$ the value for $f(1)$ tends to $e$. (Try $\Delta t = 0.00000001$!)
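Example 3 can be carried out by computer. This Python sketch (with a hypothetical helper `f_at_1`) applies the rule $f(t + \Delta t) = (1 + \Delta t)f(t)$ step by step and compares the result with the closed form $(1 + \Delta t)^{1/\Delta t}$ and with Python's built-in constant `math.e`.

```python
import math


def f_at_1(dt):
    """f(1) from the rule f(t + dt) = (1 + dt) f(t) with f(0) = 1, stepping 1/dt times."""
    steps = round(1 / dt)
    value = 1.0
    for _ in range(steps):
        value *= 1 + dt
    return value


for dt in (0.1, 0.001, 0.0001):
    print(dt, f_at_1(dt), (1 + dt) ** (1 / dt))  # step by step, and the closed form
print((1 + 1e-8) ** 1e8, math.e)                 # a very small dt is close to e
```

Stepping is used for the larger $\Delta t$ only to mirror the derivation; for $\Delta t = 10^{-8}$ the closed form avoids $10^8$ multiplications.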

5 INVERSE FUNCTIONS

Our definition of a function $f$ from A to B was given in terms of a black box $f$, Fig. 1.26. A natural question to ask is whether we can find a black box, say $g$, which reverses the effect of $f$. Clearly $g$ would have to go from B to A. It must have the property shown in Fig. 1.27: if we put $a$ in at the beginning we get $a$ out at the end. In terms of an equation, we say that $g(f(a)) = a$. Similarly we require that $f(g(b)) = b$. We call such a $g$ the inverse function to $f$ and write it as $f^{-1}$.
Consider the following fairly simple example. Let
$$y = f(x) = 3x + 1$$
so
$$f(0) = 1, \qquad f(1) = 4$$
Then it is possible to show that
$$g(y) = \tfrac{1}{3}(y - 1)$$
so
$$f(g(y)) = 3\left\{\tfrac{1}{3}(y - 1)\right\} + 1 = y$$
and
$$g(f(x)) = \tfrac{1}{3}\{(3x + 1) - 1\} = x$$
From Fig. 1.28 we can read off a value of $x$ for any $y$ by following the dotted line, thus getting $f^{-1}(y)$ from the same sketch as $f(x)$.
Sometimes it is not possible to define such an inverse. Suppose we take $f(x) = x^2$, so $f(1) = 1$ and $f(-1) = 1$, see Fig. 1.29. If we look at the set of values of $x^2$, it is not clear whether $x^2 = 1$ gives $x = +1$ or $x = -1$.

Fig. 1.26

Fig. 1.27

Exercise 5
(i) If $f(x) = 7x - 4$ find the inverse $g(x)$
(ii) If $f(x) = (x - 1)^3$ find the inverse $g(x)$

The first and perhaps the most important inverse function is defined for $\exp(x)$. If we look again at the graph of $y = \exp x$, Fig. 1.25, we can see that for any positive real $y_0$ we can find $x_0$ such that $y_0 = \exp x_0$. Notice that $x_0$ is unique. This defines a function called $\log_e$ or ln. The graph of $y = \log_e x$ can be found from the graph of $\exp x$ and is given in Fig. 1.30.
Some properties of $\log_e x$ follow almost immediately from this definition:
$$\log_e 1 = 0$$
$$\log_e e = 1$$
$$\log_e(xy) = \log_e x + \log_e y$$
If instead of $\exp x = e^x$ we chose a different power function, say $10^x$, the inverse function would be called $\log_{10} x$.
Exercise 6
Evaluate the following expressions using your calculator:
(i) $\log_{10}(10^2)$   (iii) $e^{1.5}$   (v) $10^{\log_{10} 1.5}$
(ii) $\log_e 4.4816$   (iv) $\exp(\log_e 10)$   (vi) $\exp(\log_{10} 3)$

Fig. 1.30

Logarithms were initially developed by Napier in the first half of the seventeenth century to facilitate calculations. They enabled multiplication to be reduced to only 3 operations, viz. (1) take logs, (2) add logs, (3) take exp (in tables normally called antilogs).
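Example 4
To find 13579 × 6234 (using natural logarithms to four decimal places):
ln(13579) = 9.5162
ln(6234) = 8.7378
Sum = 18.2540
$e^{18.2540}$ = 84646981   Answer
The exact product is 84651486; the small discrepancy (about 0.005%) comes from using only four decimal places in the logarithms.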
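The three steps of the example can be imitated in Python. This is a sketch under the assumption that logarithms rounded to four decimals stand in for a printed table; because it rounds rather than reading a table, its last digits may differ slightly from the figures above.

```python
import math

a, b = 13579, 6234
la, lb = round(math.log(a), 4), round(math.log(b), 4)  # (1) take logs, to 4 decimals
log_sum = la + lb                                       # (2) add logs
approx = math.exp(log_sum)                              # (3) take exp ("antilog")

print(la, lb, round(log_sum, 4))
print(round(approx), a * b, abs(approx - a * b) / (a * b))  # approximation, exact, relative error
```

With full-precision logarithms, `math.exp(math.log(a) + math.log(b))` reproduces the exact product to about 12 significant figures, so the error in the example comes entirely from the four-place tables.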

Inverse Trigonometric Functions


We can now see if there are suitable inverse functions for the common trigonometric functions cosine, sine etc., see Fig. 1.15. We must restrict the values of $x$, since $\cos(0) = \cos(2\pi) = \cos(4\pi) = 1$. To get around this problem we only consider values between 0 and $\pi$. Thus we restrict our definition of $\cos x$ to values of $x$ between 0 and $\pi$.
Looking at Fig. 1.31 of the restricted function we now see it is easy to define the inverse function, i.e. given $x$ between -1 and 1, find $y$ between 0 and $\pi$ such that
$$x = \cos y$$
The function $y = g(x)$ is written $y = \cos^{-1} x$ or $y = \arccos x$ and is shown in Fig. 1.32.
Notice we have been working in radians.

x       arccos x (degrees)   arccos x (radians)
-1      180                  π
-0.5    120                  2π/3
0.0     90                   π/2
0.5     60                   π/3
1.0     0                    0
Fig. 1.31   Fig. 1.32

Fig. 1.33

Fig. 1.34   Fig. 1.35

We can define an inverse sine in the same way, see Fig. 1.33. Again we need to restrict our definition of $\sin x$. In this case we choose to define $\sin x$ only for $x$ between $-\pi/2$ and $\pi/2$. Then we can define the inverse function
$$y = \sin^{-1} x \quad\text{or}\quad y = \arcsin x$$
Notice that both $\sin^{-1} x$ and $\cos^{-1} x$ are only defined for $x$ in the range -1 to 1.
Exercise 7
Sketch $\sin^{-1} x$

The remaining trigonometric function we consider is the tangent, see Fig. 1.34. Again, to obtain an inverse we see from the diagram that we need to restrict our values of $x$ to $x$ strictly between $-\pi/2$ and $\pi/2$. In this case the inverse function is $\tan^{-1} x$ or $\arctan x$, which is defined for every real $x$. Fig. 1.35 shows its behaviour.
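Python's `math.acos`, `math.asin` and `math.atan` return values in exactly these restricted ranges, so they can be used to check the table of $\cos^{-1} x$ above. A minimal sketch:

```python
import math

# math.acos returns angles in [0, pi]; math.asin and math.atan return angles
# in [-pi/2, pi/2], matching the restricted ranges chosen above.
for x in (-1, -0.5, 0.0, 0.5, 1):
    y = math.acos(x)
    print(f"{x:5}  {math.degrees(y):8.3f} degrees  {y:.6f} radians")

print(math.asin(1.0), math.atan(1e9))  # both close to pi/2

try:
    math.acos(1.5)  # outside -1 <= x <= 1, where acos is undefined
except ValueError as err:
    print("acos(1.5):", err)
```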

Fig. 1.36

Fig. 1.37

6 NEW FUNCTIONS FROM OLD

There are a number of ways, given two functions, say $f(x)$ and $g(x)$, of constructing a new function from them. If we use our black box idea we get Fig. 1.36.
Here we get $f + g$, which is obtained by adding the results of $f$ to those of $g$. We can replace + by ×, - or ÷ to obtain $f \times g$, $f - g$ or $f/g$. One class of function which is useful is that of rational functions. These are those of the form $f(x)/g(x)$ where $f$ and $g$ are both polynomials, for example $(x + 1)/(x^2 + 1)$, $1/(x - 1)$, $(3x^4 - 5x^3 + 6)/(2x + 1)$. As we commented earlier, there is no reason why a rational function should be a polynomial.
The obvious other picture is Fig. 1.37, where A represents $f(x)$ and B represents $g(x)$. Then Fig. 1.37 gives $g(f(x))$. This is sometimes called a function of a function or the "product" of $g$ and $f$. This is best illustrated by some examples. Let $f(x) = \sin x$ and $g(x) = x^2 + 1$. Then $g(f(x)) = (\sin x)^2 + 1$. Notice $f(g(x)) = \sin(x^2 + 1)$. Also if $f(x) = e^x$ and $g(x) = x^2 + x + 1$, $g(f(x)) = (e^x)^2 + e^x + 1 = e^{2x} + e^x + 1$.
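Composition is easy to express in a language with first-class functions. A minimal Python sketch; `compose` is a hypothetical helper, and the order of its arguments follows $g(f(x))$.

```python
import math


def compose(g, f):
    """Return the function x -> g(f(x)): the output of f is fed into g."""
    return lambda x: g(f(x))


f = math.sin
g = lambda x: x**2 + 1

x = 0.8
print(compose(g, f)(x), math.sin(x) ** 2 + 1)  # g(f(x)) = (sin x)^2 + 1
print(compose(f, g)(x), math.sin(x**2 + 1))    # f(g(x)) = sin(x^2 + 1), a different value

# The second example: g(e^x) with g(u) = u^2 + u + 1 equals e^(2x) + e^x + 1.
h = compose(lambda u: u**2 + u + 1, math.exp)
print(h(x), math.exp(2 * x) + math.exp(x) + 1)
```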
Exercise 8
For the following pairs evaluate $g(f(x))$ and $f(g(x))$.
(i) $f(x) = 1/x$, $g(x) = x^2$
(ii) $f(x) = \sin x$, $g(x) = e^x$
(iii) $f(x) = x + 1$, $g(x) = x - 1$

7 NUMBERS
When we write a number as 2345 we automatically realise that this is, in words, two thousand three hundred and forty five. So in the written language the number is thought of as $2 \times 1000 + 3 \times 100 + 4 \times 10 + 5$, or more neatly as $2 \times 10^3 + 3 \times 10^2 + 4 \times 10^1 + 5 \times 10^0$ (remember $10^0 = 1$).
If we let $d = 10$ then any string $a_n \dots a_2 a_1 a_0$ represents the number $a_n 10^n + a_{n-1} 10^{n-1} + \dots + a_1 10 + a_0$, or $a_n d^n + a_{n-1} d^{n-1} + \dots + a_0$, where $0 \le a_i < 10$.
It is now clear that if we let $d$ be any whole number greater than 1 and $a_i$ be whole numbers such that $0 \le a_i < d$, we can use $a_n a_{n-1} \dots a_0$ to represent the number $a_n d^n + a_{n-1} d^{n-1} + \dots + a_0$. When this is done the number is said to be written in base $d$. Our usual representation is in base 10 (decimal). The other very commonly used systems are base 2 (binary) and base 16 (hexadecimal), although other bases are used occasionally.
To illustrate let us consider base 3. We write a number as 2101. We interpret this in decimal as $2 \times 3^3 + 1 \times 3^2 + 0 \times 3 + 1 \times 1 = 54 + 9 + 1 = 64$. Notice that we have no need of a symbol representing 3, since $3 = 1 \times 3 + 0$ is written 10. To take a number written in decimal form and write it to the base three (ternary) we keep dividing by 3 and note the remainders:
121 ÷ 3 = 40 remainder 1
40 ÷ 3 = 13 remainder 1
13 ÷ 3 = 4 remainder 1
4 ÷ 3 = 1 remainder 1
1 ÷ 3 = 0 remainder 1
Reading the remainders from the last to the first, 121 in decimal becomes 11111 in a ternary representation. It can easily be checked:
$$11111 = 1 \times 3^4 + 1 \times 3^3 + 1 \times 3^2 + 1 \times 3 + 1 = 81 + 27 + 9 + 3 + 1 = 121$$
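We will now give some examples in binary representation.
Example 5
121 in decimal notation becomes
121 ÷ 2 = 60 remainder 1
60 ÷ 2 = 30 remainder 0
30 ÷ 2 = 15 remainder 0
15 ÷ 2 = 7 remainder 1
7 ÷ 2 = 3 remainder 1
3 ÷ 2 = 1 remainder 1
1 ÷ 2 = 0 remainder 1
So
$$121 = 1 \times 2^6 + 1 \times 2^5 + 1 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 0 \times 2 + 1 = 64 + 32 + 16 + 8 + 1$$
i.e. 121 = 1111001 in binary.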

In this situation we only need the numerals 0 and 1. So a binary representation has the form $a_n a_{n-1} \dots a_0$ where $0 \le a_i < 2$, i.e. $a_i = 0$ or 1. So binary numbers have the form 10010, 100010011.
Example 6
$10010 = 2^4 + 2 = 18$ and $100010011 = 2^8 + 2^4 + 2 + 1 = 256 + 16 + 2 + 1 = 275$
A comment here is that in computer language the binary system is so common that the symbol K does not mean "one thousand" or $10^3$ but $2^{10}$, which is 1024. Note in binary $2^{10}$ = 10000000000.
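The repeated-division recipe works for any base. Here is a minimal Python sketch, with a hypothetical helper `to_base` that uses the digit symbols 0-9 and A-F (so it assumes $2 \le d \le 16$); Python's built-in `int(string, base)` performs the reverse conversion.

```python
def to_base(n, d):
    """Digits of the whole number n >= 0 in base d (2 <= d <= 16), by repeated division."""
    symbols = "0123456789ABCDEF"
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, d)       # quotient and remainder
        digits.append(symbols[r])
    return "".join(reversed(digits))  # remainders read from the last to the first


print(to_base(121, 3), to_base(121, 2), to_base(27, 16))     # 11111 1111001 1B
# Python's built-in int(string, base) goes the other way:
print(int("2101", 3), int("10010", 2), int("100010011", 2))  # 64 18 275
```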
Exercise 9
(i) In the following evaluate the binary numbers as decimals: (a) 10101; (b) 11011011; (c) 100; (d) 10001.
(ii) In the following evaluate the ternary numbers as decimals: (a) 12021; (b) 11022; (c) 202; (d) 2021.
(iii) Rewrite the following decimal numbers as both binary and ternary numbers: (a) 111; (b) 422; (c) 61; (d) 81.
Just to make the point clear, given any whole number $d > 1$ we can use it as a base to write any whole number $N$ in the form $N = a_n d^n + a_{n-1} d^{n-1} + \dots + a_0$ where $0 \le a_i < d$. If $d > 10$ we have to use new symbols for the digits from 10 upwards. In particular the hexadecimal system ($d = 16$) is used in many computers. The usual convention is that A = 10, B = 11, C = 12, D = 13, E = 14, F = 15. Let us write out the first 32 numbers in the four bases we have discussed so far.

Binary    Ternary   Decimal   Hexadecimal
1         1         1         1
10        2         2         2
11        10        3         3
100       11        4         4
101       12        5         5
110       20        6         6
111       21        7         7
1000      22        8         8
1001      100       9         9
1010      101       10        A
1011      102       11        B
1100      110       12        C
1101      111       13        D
1110      112       14        E
1111      120       15        F
10000     121       16        10
10001     122       17        11
10010     200       18        12
10011     201       19        13
10100     202       20        14
10101     210       21        15
10110     211       22        16
10111     212       23        17
11000     220       24        18
11001     221       25        19
11010     222       26        1A
11011     1000      27        1B
11100     1001      28        1C
11101     1002      29        1D
11110     1010      30        1E
11111     1011      31        1F
100000    1012      32        20
It is just as easy to add and multiply numbers to the base $d$ as ordinary decimals; all you have to do is remember that $d$ is the relevant number for carrying, not 10 as usual. Here is an example in base 3:

      111
   +   10
   +  102
     ----
     1000

(each column sums to 3 = 10, so we write 0 and carry 1). Just to check, we do it in decimal notation:
111 = 9 + 3 + 1 = 13,   10 = 3   and   102 = 9 + 2 = 11
So
111 + 10 + 102 = 13 + 3 + 11 = 27 = 1000 (base 3)
Similarly, multiplying, 121 × 2 = 1012. Again we can check: 121 = 9 + 6 + 1 = 16, so 2 × 16 = 32 and $32 = 1 \times 3^3 + 1 \times 3 + 2$ = 1012. Long multiplication works the same way:

      121
    ×  21
    -----
      121
    10120
    -----
    11011
Here are some examples in binary notation:

     1101          11001
   + 1001        +  1110
   ------        +   101
    10110        -------
                  101100

      1101
    × 1001
   -------
      1101
   1101000
   -------
   1110101
It is very easy to do this arithmetic using polynomials. If $\alpha = a_n \dots a_1 a_0$ and $\beta = b_m \dots b_0$ are two numbers represented to the base $d$, then
$$\alpha = \sum_{i=0}^{n} a_i d^i \quad\text{and}\quad \beta = \sum_{i=0}^{m} b_i d^i$$
Then
$$\alpha\beta = \sum_{i=0}^{m+n} c_i d^i$$
where
$$c_i = \sum_{j=0}^{i} a_j b_{i-j}$$
These are precisely the coefficients worked out for the product of polynomials (a $c_i$ may be $d$ or larger, in which case we carry, as in long multiplication). Hence we have reduced the problem of multiplying two numbers to one of multiplying polynomials. This may not seem to be of much advantage, but recall that $0 \le a_i < d$, so the coefficients of the polynomials have a limited range of values. If $d = 2$ then each coefficient is either 0 or 1.
In many ways base 2 is a favoured base precisely because it only requires two distinct symbols. It does have the disadvantage that it involves long strings to specify quite small numbers (121, for example, needs 7 binary digits). Before discussing the connections with logic, however, one should mention the representation of numbers which are not integers (whole numbers). In practice we write these in decimal as decimals, i.e.
$$3.75 = 3 \times 10^0 + 7 \times 10^{-1} + 5 \times 10^{-2} = 3 + \frac{7}{10} + \frac{5}{10^2}$$
We can think of this as extending our polynomial to negative powers of $d$. So we write any number $\alpha$ as $a_n a_{n-1} a_{n-2} \dots a_0 . b_1 b_2 \dots$, where $\alpha = a_n d^n + a_{n-1} d^{n-1} + \dots + a_0 + b_1 d^{-1} + b_2 d^{-2} + \dots$. So if $d = 2$,
$$1.101 = 1 + \frac{1}{2} + \frac{0}{4} + \frac{1}{8} = 1 + \frac{5}{8} = 1.625 \text{ in decimal}$$
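Digits after the point can be evaluated exactly with Python's `fractions.Fraction`. A minimal sketch, with a hypothetical helper `digits_value` that assumes the digits are 0-9 (so bases up to 10):

```python
from fractions import Fraction


def digits_value(s, d):
    """Exact value of a base-d digit string such as '1.101' (digits 0-9 only)."""
    whole, _, frac = s.partition(".")
    value = Fraction(0)
    for ch in whole:                      # a_n ... a_0, by repeated multiplication by d
        value = value * d + int(ch)
    for k, ch in enumerate(frac, start=1):
        value += Fraction(int(ch), d**k)  # b_k d^(-k)
    return value


print(digits_value("1.101", 2), float(digits_value("1.101", 2)))  # 13/8 1.625
print(digits_value("3.75", 10))                                   # 15/4
```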

Another useful device is to represent numbers in the so-called scientific notation. Essentially we take the number $a_n a_{n-1} \dots a_0 . b_1 b_2 \dots$ and write it as $d^n \times (a_n . a_{n-1} a_{n-2} \dots a_0 b_1 b_2 \dots)$. Thus 12.35 is written $1.235 \times 10^1$ and 123.5 is written as $1.235 \times 10^2$. On calculators or computers this is normally designated as 1.235E2, where the base is 10. We also do this with negative powers, so that 0.01235 is written as $1.235 \times 10^{-2}$ or 1.235E-2. Within a computer with a fixed maximal length for the representation of numbers, this enables calculations to be carried out over a much larger range of values than would otherwise be possible, but with some loss of accuracy. This is known as the floating point representation.
Suppose we have 16 "bits" in a computer available to represent a number. We need 1 for the sign ± so we now have 15. Then we can have numbers in binary with up to 15 consecutive 1s, the largest being $2^{15} - 1 = 32767$. It may be more sensible to split our locations into two parts, one which holds the exponent (the power of the base in the scientific notation representation) and one the remainder. So if the first 8 locations hold the number and the next 8 the power of 2, we can represent numbers in the range $2^{-127}$ to $2^{127}$, i.e. roughly $10^{-38}$ to $10^{38}$; some improvement.
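Real floating point formats differ in detail from the 16-bit scheme above, but the idea of storing a mantissa and a power of 2 separately can be seen with Python's `math.frexp`. A sketch:

```python
import math
import sys

# math.frexp splits a float into a mantissa m and a power of 2, e: x == m * 2**e
m, e = math.frexp(123.5)
print(m, e, m * 2**e)  # 0.96484375 7 123.5

# An exponent that can range over -127 .. 127 covers roughly 10**-38 to 10**38:
print(2.0**-127, 2.0**127)

# Python floats are (on almost every platform) IEEE 754 double precision,
# with 11 exponent bits, so their range is much wider:
print(sys.float_info.max)
```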
Multiplying two numbers in scientific notation is easy, but adding is slightly trickier as it is necessary to make them of the same form, i.e. $1.235 \times 10 \times 1.235 \times 10^2 = (1.235)^2 \times 10^3$, but $1.235 \times 10 + 1.235 \times 10^2$ has to be adjusted to, say, $0.1235 \times 10^2 + 1.235 \times 10^2$ before the addition can take place.
One of the advantages of binary is that the hardware is easy to design. This is because if we think of how the addition works we get
0 + 0 = 0,   1 + 0 = 0 + 1 = 1   and   1 + 1 = 10
If we only consider the last digit we obtain a new sum, written $\oplus$:
$0 \oplus 0 = 0$,   $1 \oplus 0 = 0 \oplus 1 = 1$   and   $1 \oplus 1 = 0$
Multiplying is as before:
0 × 0 = 1 × 0 = 0 × 1 = 0   and   1 × 1 = 1
This makes for easy design since all we need is an "on" for 1 and an "off" for 0. If we consider the truth and falsity of simple statements we find that we construct a very similar model, which is very useful. Let us use capital letters P, Q, R, etc. for statements. We can say P is true or P is false and use t and f for these. We can combine two statements by "and": P and Q. For example, if P = "mathematicians are mad" and Q = "mad people are useless", then P and Q is the statement "mathematicians are mad and mad people are useless".
Now we need to consider the truth of the compound statement P and Q assuming we know the truth, or otherwise, of P and Q. The neatest way to write this is in a "truth table".

P   Q   P and Q
t   t   t
t   f   f
f   t   f
f   f   f

So the statement P and Q is true if and only if both P and Q are true, which is our usual use of the term "and". Let us introduce another method of joining two statements together. This is, for obvious reasons, eor (exclusive or). That is, P eor Q is true if exactly one of P or Q is true. This has the truth table

P   Q   P eor Q
t   t   f
t   f   t
f   t   t
f   f   f

We notice that the truth value of our statements can only be either true or false. If we denote a true statement by 1 and a false statement by 0, we can rewrite these two tables, using × instead of "and" and $\oplus$ instead of "eor", to get
$$1 \oplus 1 = 0, \qquad 1 \oplus 0 = 0 \oplus 1 = 1 \quad\text{and}\quad 0 \oplus 0 = 0$$
and
$$1 \times 1 = 1, \qquad 1 \times 0 = 0 \times 1 = 0 \times 0 = 0$$

Fig. 1.38
So this system is exactly the same as the system derived from binary arithmetic. The final piece in the puzzle comes with the realisation that electric circuits can give rise to the same system. Take a simple system with a set of switches; any given switch is either on or off. So if we label the switches by capital letters, say P, then P is either on (1) or off (0). If we have two switches P and Q in series then the combined switch P × Q is on if and only if both P and Q are on. So 1 × 1 = 1 but 1 × 0 = 0 × 1 = 0 × 0 = 0. It is easy to construct a circuit which gives "or" in the non-exclusive sense: just take two parallel switches. However, to obtain "exclusive or" we need to be a little more clever. Let P′ denote the switch which is on when P is off and vice versa. Then the circuit shown in Fig. 1.38 will give $1 \oplus 1 = 0 = 0 \oplus 0$ and $1 \oplus 0 = 0 \oplus 1 = 1$.
Thus we can do both arithmetic and logic by using electrical circuits. This is at the heart of all modern calculators and computers. A nice and very comprehensive account, with a computational bias, is given in Knuth (1977). In fact this approach to logic dates from G. Boole in the 1800s and is often called Boolean algebra.
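The correspondence can be checked exhaustively in Python, where `&` and `^` are the bitwise and and exclusive-or operators. The helper `half_add` is our own illustration (the standard name for this circuit is a half adder): it shows that the last digit of a binary sum is $p \oplus q$ and the carry is $p \times q$.

```python
# With true = 1 and false = 0: "and" is multiplication, "eor" is addition mod 2.
for p in (1, 0):
    for q in (1, 0):
        print(p, q, "and:", p & q, "eor:", p ^ q)
        assert p & q == p * q and p ^ q == (p + q) % 2


def half_add(p, q):
    """Add two binary digits: the last digit is p eor q, the carry is p and q."""
    return p & q, p ^ q  # (carry, sum digit)


print(half_add(1, 1))  # (1, 0), i.e. 1 + 1 = 10 in binary
```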
Chapter 2

CALCULUS: DIFFERENTIATION

1 INTRODUCTION

Given a function $f(x)$, as in Fig. 2.1, one of its characteristics which may be of interest is its "rate of change" or its "wigglyness". We could get some idea of this by calculating the slope of the curve at any point $x_0$, the natural measure of slope being the tangent of the angle $\theta$ that the curve makes with the x-axis, $\tan\theta$. Thus $\tan\theta = 0$ means the curve is parallel to the x-axis.
Consider the graph shown in Fig. 2.2. Then
at $a_1$ the slope is 0
at $a_2$ the slope is 1
at $a_3$ the slope is 0
at $a_4$ the slope is -1
at $a_5$ the slope is $\sqrt{3} = \tan(\pi/3)$
This defines a function, for consider the black box shown in Fig. 2.3: on receiving $x_0$ the box produces the slope ($\tan\theta$) at $x_0$.
Given smooth curves without jumps or corners we can imagine that in principle the slope function can be determined. The next question is whether, given $f(x)$, we can find the slope. A crude technique might be to lay a ruler along the curve and then measure $\theta$. The curve in Fig. 2.4 is magnified at this point to make things easier to see.
From the figure we see $f(x_0) = y$, say, measures the "height" of the function at $x_0$. If we move a small distance $\delta x$ to $x_0 + \delta x$ then the "height" of the function at $x_0 + \delta x$ is $f(x_0 + \delta x)$. The slope is approximately
$$\frac{f(x_0 + \delta x) - f(x_0)}{x_0 + \delta x - x_0} = \frac{QS}{PQ}$$
since if $\delta x$ is small we would expect the true slope $RQ/PQ$ to be close to $QS/PQ$.
Taking small values for $\delta x$ we look at
$$\frac{QS}{PQ} = \frac{y + \delta y - y}{x_0 + \delta x - x_0} = \frac{\delta y}{\delta x}$$
28 Mathematics for Seismic Data Processing

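[Fig. 2.1: the curve y = f(x), with the point x_0 marked]
[Fig. 2.2: a curve with the points a_1, a_2, a_3, a_4, a_5 marked]
[Fig. 2.3: black box taking the point x_0 to the slope at x_0, tan θ]
[Fig. 2.4: magnified view near x_0, showing the points P, Q, R, the angle θ and the step δx from x_0 to x_0 + δx]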

Example 1
To try an example, let f(x) = 3x; then
$$\frac{y+\delta y-y}{\delta x} = \frac{\delta y}{\delta x} = \frac{f(x_0+\delta x)-f(x_0)}{\delta x} = \frac{3(x_0+\delta x)-3x_0}{\delta x} = 3$$
Try this on your calculator for $x_0 = 10$, $x_0 = 0$ and a "small" $\delta x$.
Example 2
A more complex example is
$$f(x) = 3x^2$$
Calculus: Differentiation 29
so
$$\delta y = f(x_0+\delta x)-f(x_0) = 3(x_0+\delta x)^2-3x_0^2 = 3(x_0^2+2x_0\,\delta x+\delta x^2)-3x_0^2$$
giving
$$\frac{\delta y}{\delta x} = \frac{f(x_0+\delta x)-f(x_0)}{\delta x} = 6x_0+3\,\delta x$$
When $\delta x$ is small we find the slope at $x_0$ is $6x_0$. That is, we can determine the slope for any value $x_0$, giving the function g(x) = 6x.
Example 3
A more complex function is $f(x) = 1/x^2$. Then
$$\delta y = f(x_0+\delta x)-f(x_0) = \frac{1}{(x_0+\delta x)^2}-\frac{1}{x_0^2}$$
This can be rearranged to give
$$\delta y = \frac{x_0^2-(x_0+\delta x)^2}{(x_0+\delta x)^2x_0^2} = \frac{-2x_0\,\delta x-\delta x^2}{(x_0+\delta x)^2x_0^2}$$
and hence, dividing by $\delta x$,
$$\frac{\delta y}{\delta x} = \frac{-(2x_0+\delta x)}{x_0^4+2x_0^3\,\delta x+(\delta x)^2x_0^2}$$
Our approximation gets better the smaller $\delta x$ is, and letting $\delta x$ tend to zero gives
$$\frac{\delta y}{\delta x} \to \frac{-2x_0}{x_0^4} = -\frac{2}{x_0^3}$$
Fig. 2.5 illustrates the function and Fig. 2.6 illustrates the slope. We have carefully avoided the point x = 0, where unpleasant things happen.
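The calculator experiment suggested in these examples is easy to automate. The sketch below assumes Python; `difference_quotient` is an illustrative name, and it computes exactly the chord slope $(f(x_0+\delta x)-f(x_0))/\delta x$ used above.

```python
def difference_quotient(f, x0, dx):
    """Chord slope (f(x0 + dx) - f(x0)) / dx."""
    return (f(x0 + dx) - f(x0)) / dx


# Example 1: f(x) = 3x gives 3 whatever x0 and dx are.
for x0 in (10.0, 0.0):
    print("3x:", x0, difference_quotient(lambda x: 3 * x, x0, 0.001))

# Example 2: f(x) = 3x^2 gives 6*x0 + 3*dx, which approaches 6*x0.
for dx in (0.1, 0.01, 0.001):
    print("3x^2 at x0 = 2:", dx, difference_quotient(lambda x: 3 * x**2, 2.0, dx))

# Example 3: f(x) = 1/x^2 approaches -2/x0^3 (that is, -0.25 at x0 = 2).
for dx in (0.1, 0.01, 0.001):
    print("1/x^2 at x0 = 2:", dx, difference_quotient(lambda x: 1 / x**2, 2.0, dx))
```

For Example 2 the printed values are about 12.3, 12.03 and 12.003, so the error term $3\,\delta x$ shrinks in step with $\delta x$.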
Example 4
As a final example we take a trigonometric function, f(x) = cos x. At any point $x_0$
$$\delta y = \cos(x_0+\delta x)-\cos x_0 = \cos x_0\cos\delta x-\sin x_0\sin\delta x-\cos x_0$$
using the formula for the cosine of a sum. As we are interested in small $\delta x$ we can use the approximations (see Chapter 1, section 3)
$$\cos\delta x \approx 1, \qquad \sin\delta x \approx \delta x$$
Then the expression simplifies and we get
$$\delta y \approx \cos x_0-\delta x\sin x_0-\cos x_0 = -\delta x\sin x_0$$
or
$$\frac{\delta y}{\delta x} \approx -\sin x_0$$

[Fig. 2.5: the function y = 1/x^2]
[Fig. 2.6: its slope function, -2/x^3]

This is only true when x is measured in radians, because otherwise the approximations $\cos\delta x \approx 1$ and $\sin\delta x \approx \delta x$ do not hold. Thus cos x has the slope function -sin x.
We can check this rather easily. Suppose we look at f(x) = cos x at x = 1 (remember, in radians). Take $x_0 = 1$; then $\delta x = 0.001$ gives
$$\frac{\delta y}{\delta x} = \frac{\cos 1.001-\cos 1}{0.001} = \frac{0.5394606-0.5403023}{0.001} = -0.841741 = -\sin(1.000500)$$
or, with $\delta x = 0.00001$,
$$\frac{\cos 1.00001-\cos 1}{0.00001} = -0.841474 = -\sin(1.000006)$$
If you try this in degrees it doesn't work! The value of $x_0$ may make a difference to how small $\delta x$ should be; $x_0 = 2$ requires smaller values of $\delta x$.
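The same check can be scripted. The sketch below uses Python's standard `math` module (the helper name is illustrative). It repeats the radian calculation and then shows what goes wrong in degrees: the chord slope comes out smaller than $-\sin$ by a factor of about π/180, because $\frac{d}{dx}\cos x = -\sin x$ only holds when x is in radians.

```python
import math


def difference_quotient(f, x0, dx):
    """Chord slope (f(x0 + dx) - f(x0)) / dx."""
    return (f(x0 + dx) - f(x0)) / dx


# Radians: the quotient tends to -sin(1) = -0.841471 as dx shrinks.
for dx in (0.001, 0.00001):
    print(dx, difference_quotient(math.cos, 1.0, dx), -math.sin(1.0))


def cos_degrees(x):
    """cos of an angle x given in degrees."""
    return math.cos(math.radians(x))


# Degrees: the quotient is about (pi/180) times -sin(1 degree), not -sin(1 degree).
slope_deg = difference_quotient(cos_degrees, 1.0, 0.00001)
print(slope_deg, -math.sin(math.radians(1.0)))
```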
Exercise 1
(i) Find $\delta y/\delta x$ for $y = f(x) = 1/x$, $x > 0$.
(ii) Sketch $\sin(1/x)$ for x near 0, e.g. x values from 0 to 1. Would you expect to encounter difficulties in computing $\delta y/\delta x$ at x = 0? Try it using x = 0 and $\delta x = 0.00001$.
For small values of $\delta x$ the quantity $\delta y/\delta x$ tends to a function written dy/dx or f'(x). This quantity dy/dx is called the derivative of y. Remember it is a function. The different ways of writing it reflect the history of calculus, which had two discoverers: dy/dx is the notation of Leibniz, while Newton wrote a dot over the variable; the f'(x) notation was introduced later by Lagrange.
We shall usually use dy/dx to mean the derivative. Sometimes this is not possible, so if y = f(x) we may use dy/dx or df/dx or f'(x). Similarly, if y = g(x), say,
$$\frac{dy}{dx} = \frac{dg}{dx} = g'(x)$$
and if z = u(x)
$$\frac{dz}{dx} = \frac{du}{dx} = u'(x)$$
We do not even have to use x; thus if s = h(t)
$$\frac{ds}{dt} = \frac{dh}{dt} = h'(t)$$
Unfortunately we cannot always find the derivative; it may not even exist. The earlier example $y = 1/x^2$ had a problematical point at x = 0. As the function is not defined there, it is hardly surprising that dy/dx causes problems. You will remember the Heaviside function where
$$f(x) = \begin{cases} 0 & x < 0 \\ 1 & x \ge 0 \end{cases}$$
Figure 2.7 illustrates the "jump" at x = 0, where the function moves from 0 to 1. Clearly attempting to find the slope here is not really sensible, and the derivative dy/dx does not exist (i.e. there is no value) at x = 0.

[Fig. 2.7: the Heaviside function, jumping from 0 to 1 at x = 0]

Table 1. Functions and Their Derivatives

$f(x)$                  $f'(x)$
$a$ (constant)          $0$
$ax^n$ $(n \ne 0)$      $anx^{n-1}$
$(x+a)^n$               $n(x+a)^{n-1}$
$\log x$                $1/x$
$e^{ax}$                $ae^{ax}$
$\sin ax$               $a\cos ax$ (x in radians)
$\cos ax$               $-a\sin ax$ (x in radians)
$\tan ax$               $a\sec^2 ax$ (x in radians)
$a^x$                   $a^x\log a$
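Each row of Table 1 can be spot-checked numerically. The Python sketch below is an illustration only: it picks arbitrary constants a = 1.7, n = 3 and a test point x = 0.4 (all assumptions), and compares the central-difference estimate $(f(x+h)-f(x-h))/2h$ with the tabulated derivative.

```python
import math


def central_difference(f, x, h=1e-5):
    """Symmetric estimate of f'(x); its error shrinks roughly like h**2."""
    return (f(x + h) - f(x - h)) / (2 * h)


a, n = 1.7, 3   # arbitrary illustrative constants
x = 0.4         # arbitrary test point (x > 0 so that log x is defined)

table_1 = [  # (name, f, f') rows from Table 1
    ("a", lambda t: a, lambda t: 0.0),
    ("a x^n", lambda t: a * t**n, lambda t: a * n * t**(n - 1)),
    ("(x + a)^n", lambda t: (t + a)**n, lambda t: n * (t + a)**(n - 1)),
    ("log x", math.log, lambda t: 1 / t),
    ("e^(ax)", lambda t: math.exp(a * t), lambda t: a * math.exp(a * t)),
    ("sin ax", lambda t: math.sin(a * t), lambda t: a * math.cos(a * t)),
    ("cos ax", lambda t: math.cos(a * t), lambda t: -a * math.sin(a * t)),
    ("tan ax", lambda t: math.tan(a * t), lambda t: a / math.cos(a * t)**2),
    ("a^x", lambda t: a**t, lambda t: a**t * math.log(a)),
]

for name, f, f_prime in table_1:
    print(f"{name:10s} numeric {central_difference(f, x):.8f}  table {f_prime(x):.8f}")
```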

The main problem in working out dy/dx is to find a useful way of expressing $\delta y = f(x+\delta x)-f(x)$. Some cases are fairly easy but others are not. Table 1 gives a list of some of the most useful common cases. In practice we try to split a given function into simpler functions which we can differentiate, and then recombine the derivatives using some basic rules.
We will now have a look at these rules and show how they can be used.
If you have not met these ideas before you may find it rather hard going.
However, it is more important at this stage that you convince yourself that
finding the derivative is possible and leave the technical details to be worked
over more slowly on a second reading. The basic rules:
Rule 1
If a is a constant, that is, it does not depend on x, and if
$$y = f(x) = a\,g(x)$$
then
$$\frac{dy}{dx} = a\frac{dg}{dx}$$
Thus if y = 17 cos x, we know $\frac{d}{dx}(\cos x) = -\sin x$ and so $dy/dx = -17\sin x$.

Rule 2
If f(x) and g(x) are two functions and y = f(x) + g(x), then
$$\frac{dy}{dx} = \frac{d}{dx}\big(f(x)+g(x)\big) = \frac{df}{dx}+\frac{dg}{dx}$$
or we could write this as
$$\frac{d}{dx}(f+g) = \frac{df}{dx}+\frac{dg}{dx}$$
This is just a symbolic way of expressing the rule that the derivative of a sum is the sum of the derivatives.
Example 5
Let y = sin x + cos x. Then
$$\frac{dy}{dx} = \frac{d\sin x}{dx}+\frac{d\cos x}{dx} = \cos x-\sin x$$
Rule 3
Suppose u(x) and v(x) are functions and y = f(x) = u(x)v(x). Then
$$\frac{dy}{dx} = u(x)\frac{dv}{dx}+\frac{du}{dx}v(x)$$
Example 6
$$y = x\sin x, \qquad \frac{dy}{dx} = x\cos x+1\cdot\sin x$$

Example 7

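Rule 4
Suppose u(x) and v(x) are functions and that
$$y = f(x) = u(x)/v(x)$$
Then
$$\frac{dy}{dx} = \frac{\dfrac{du}{dx}v(x)-u(x)\dfrac{dv}{dx}}{v(x)^2}$$
Example 8
$$y = f(x) = \frac{\sin x}{\cos x}$$
$$\frac{dy}{dx} = \frac{(\cos x)\cos x-(\sin x)(-\sin x)}{\cos^2 x} = \frac{\cos^2 x+\sin^2 x}{\cos^2 x} = \frac{1}{\cos^2 x} = \sec^2 x$$
Example 9
$$y = f(x) = \frac{1}{\log x}$$
$$\frac{dy}{dx} = \frac{0\cdot(\log x)-1\cdot(1/x)}{(\log x)^2} = -\frac{1}{x(\log x)^2}$$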

Rule 5 Functions of functions (Chain Rule)
Suppose y = f(t) and t = g(x), so that y = f(g(x)). Then
$$\frac{dy}{dx} = \frac{df}{dt}\cdot\frac{dt}{dx}$$
or
$$\frac{dy}{dx} = \frac{df}{dt}\cdot\frac{dg}{dx}$$
This is also called the "chain rule".
Example 10
$$y = \sin(\log x), \qquad \frac{dy}{dx} = \cos(\log x)\cdot\frac{1}{x}$$
Example 11
$$y = \sin[(x+a)^2], \qquad \frac{dy}{dx} = \cos[(x+a)^2]\cdot 2(x+a)$$
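Rule 6
Often we would like to differentiate inverse functions. Suppose
$$y = f^{-1}(x)$$
then
$$\frac{dy}{dx} = \frac{1}{dx/dy}$$
That is, if $y = f^{-1}(x)$ then
$$f(y) = x$$
and so, differentiating both sides with respect to x by the chain rule,
$$\frac{df}{dy}\cdot\frac{dy}{dx} = 1, \qquad \frac{dy}{dx} = \frac{1}{df/dy}$$
Example 12
$$y = \sin^{-1}x, \qquad \frac{dy}{dx} = \frac{1}{\dfrac{d}{dy}(\sin y)} = \frac{1}{\cos y}$$
then since $x = \sin y$, $\sqrt{1-x^2} = \cos y$ (taking the principal value $-\pi/2 \le y \le \pi/2$, where $\cos y \ge 0$) and $dy/dx = 1/\sqrt{1-x^2}$.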


Rule 6A
This isn't really a rule but a useful rule of thumb. Sometimes it is easier to differentiate log f(x). Suppose y = f(x) and z = log y = log f(x); then
$$\frac{dz}{dx} = \frac{1}{y}\frac{dy}{dx}$$
so
$$\frac{dy}{dx} = y\frac{dz}{dx}$$
Example 13
$$y = 13^{x^2}$$
$$z = \log y = x^2\log 13, \qquad \frac{dz}{dx} = 2x\log 13$$
$$\therefore\ \frac{dy}{dx} = (2\log 13)\,x\,13^{x^2}$$
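The worked answers for Rules 3 to 6A can be checked in the same numerical way. The sketch below assumes Python and an arbitrary test point x = 0.7 (chosen inside 0 < x < 1 so that log x and $\sin^{-1}x$ are both defined); each entry pairs a function from Examples 6 to 13 with the derivative obtained above.

```python
import math


def central_difference(f, x, h=1e-5):
    """Symmetric estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 0.7
worked_examples = {
    "Ex 6, product rule: x sin x": (lambda t: t * math.sin(t), x * math.cos(x) + math.sin(x)),
    "Ex 8, quotient rule: sin x / cos x": (lambda t: math.sin(t) / math.cos(t), 1 / math.cos(x)**2),
    "Ex 9, quotient rule: 1 / log x": (lambda t: 1 / math.log(t), -1 / (x * math.log(x)**2)),
    "Ex 10, chain rule: sin(log x)": (lambda t: math.sin(math.log(t)), math.cos(math.log(x)) / x),
    "Ex 12, inverse: sin^-1 x": (math.asin, 1 / math.sqrt(1 - x**2)),
    "Ex 13, log rule: 13^(x^2)": (lambda t: 13 ** (t**2), 2 * math.log(13) * x * 13 ** (x**2)),
}

for name, (f, claimed) in worked_examples.items():
    print(f"{name:36s} numeric {central_difference(f, x):.7f}  claimed {claimed:.7f}")
```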

Exercise 2
Differentiate
(i) $ax+b$
(ii) $ae^x+bx^2$
(iii) $\sin x\log x+\cos 14x$
(iv) $e^{-x^2/2}$
(v) $e^{\sin x}$
(vi) $e^x/\sin x$
(vii) $e^{ax+\sin x}$
(viii) $13\sin x\cos x$
(ix) $2^x$

2 HIGHER DERIVATIVES
Since y = f(x) has a derivative dy/dx and this is a function, dy/dx can itself have a derivative. For example, if $y = x^2$ then $dy/dx = 2x$. $g(x) = 2x$ is a perfectly respectable function and
$$\frac{dg}{dx} = 2$$
Once again h(x) = dg/dx is a function and has a derivative dh/dx = 0.
The derivative of the derivative, $\frac{d}{dx}\left(\frac{dy}{dx}\right)$, is written $d^2y/dx^2$ and is called the second derivative. This second derivative can itself be differentiated to give $d^3y/dx^3$, which in turn gives $d^4y/dx^4$, and so on. Figure 2.8 illustrates the original function, $y = x^2$, and its first and second derivatives. Another example is
$$y = \log x, \qquad \frac{dy}{dx} = \frac{1}{x}, \qquad \frac{d^2y}{dx^2} = -\frac{1}{x^2}$$

[Fig. 2.8: y = x^2 and its first and second derivatives]

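$$\frac{d^3y}{dx^3} = \frac{2}{x^3}, \qquad \frac{d^4y}{dx^4} = -\frac{6}{x^4}$$
Another example would be
$$y = \sin x, \qquad \frac{dy}{dx} = \cos x, \qquad \frac{d^2y}{dx^2} = -\sin x$$
$$\frac{d^3y}{dx^3} = -\cos x, \qquad \frac{d^4y}{dx^4} = \sin x, \qquad \frac{d^5y}{dx^5} = \cos x$$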
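Second derivatives can also be estimated from function values alone. A minimal sketch, assuming Python: the second difference $(f(x+h)-2f(x)+f(x-h))/h^2$ approximates $d^2y/dx^2$, and at an arbitrary test point x = 1.3 it reproduces the second derivatives found above for $x^2$, log x and sin x. Repeating the idea for third and fourth derivatives loses accuracy to rounding quite quickly, which is one reason the exact formulas are worth having.

```python
import math


def second_difference(f, x, h=1e-4):
    """Estimate d^2 f / dx^2 from f at three equally spaced points."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


x = 1.3
cases = [
    ("x^2", lambda t: t**2, 2.0),
    ("log x", math.log, -1 / x**2),
    ("sin x", math.sin, -math.sin(x)),
]
for name, f, exact in cases:
    print(f"{name:6s} estimate {second_difference(f, x):.6f}  exact {exact:.6f}")
```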
Exercise 3
(i) Find dy/dx, $d^2y/dx^2$ and $d^3y/dx^3$ if $y = 6x^4+2x^2+1$.
(ii) Find $d^2y/dx^2$ if $y = (\cos x)^2$.
(iii) If $y = x\log x$, what is $d^3y/dx^3$?
(iv) If $y = e^x+e^{-x}$, show that $d^2y/dx^2 = y$.

3 MAXIMA AND MINIMA


An immediate and very useful application of derivatives is in finding the maximum or minimum of a function.
Suppose we have a function y = f(x) as illustrated in Fig. 2.9, and we are trying to find the lowest point. The low point is the bottom of the valley: if we move along the function we move downhill until we reach the bottom and then move uphill. Thus dy/dx is first negative and then becomes positive. At the minimum it is zero.
If we have a hill then to reach the peak we go uphill and then downhill, so for a maximum dy/dx is first positive, dy/dx = 0 at the peak, and then dy/dx is negative. To find maxima or minima we look for the "turning point" where dy/dx = 0. If dy/dx = 0 for a value of x, say $x_0$, then
(a) if dy/dx > 0 for $x < x_0$ and dy/dx < 0 for $x > x_0$, we are on a hill and $x_0$ is at the maximum;
(b) if dy/dx < 0 for $x < x_0$ and dy/dx > 0 for $x > x_0$, we are in a valley and are at a minimum.
[Figure: at $x = x_0$, dy/dx = 0. Left: dy/dx < 0 for $x < x_0$ and dy/dx > 0 for $x > x_0$, so $x_0$ is a minimum. Right: dy/dx > 0 for $x < x_0$ and dy/dx < 0 for $x > x_0$, so $x_0$ is a maximum.]
Example 14
$$y = f(x) = \frac{x^3}{3}+\frac{x^2}{2}-6x+8$$
$$\frac{dy}{dx} = x^2+x-6 = (x+3)(x-2)$$
The "stationary" values or "turning" points, where dy/dx = 0, are x = -3 and x = 2. Take x = -3:
if x < -3, say -4, then dy/dx = 6;
if x > -3, say -2, then dy/dx = -4;

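[Fig. 2.9: a function y = f(x) with its lowest point at x_0]
[Fig. 2.10: the curve of Example 14, with turning points at x = -3 and x = 2]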

hence x = -3 is a maximum. Now at x = 2:
for x < 2, say x = 1, then dy/dx = -4;
while for x > 2, say x = 3, then dy/dx = 6.
Thus x = 2 is a minimum. These are shown in Fig. 2.10. Notice that these points are only local maxima and minima: the function value at x = 10 is bigger than the value at x = -3.
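The sign test used in Example 14 can be written as a small routine. This is a sketch under stated assumptions: the stationary points are supplied by hand, and the probe distance (1, matching the text's choice of the points -4, -2, 1 and 3) must be smaller than the gap to any other stationary point, otherwise the test can be fooled. The function names are illustrative.

```python
def dy_dx(x):
    """Derivative from Example 14: x^2 + x - 6 = (x + 3)(x - 2)."""
    return x**2 + x - 6


def classify_by_sign_change(derivative, x0, probe=1.0):
    """Look at the sign of dy/dx a distance `probe` either side of x0."""
    left, right = derivative(x0 - probe), derivative(x0 + probe)
    if left > 0 and right < 0:
        return "maximum"
    if left < 0 and right > 0:
        return "minimum"
    return "neither"


for x0 in (-3, 2):
    print(x0, dy_dx(x0 - 1), dy_dx(x0 + 1), classify_by_sign_change(dy_dx, x0))

# y = x^3 - 8: dy/dx = 3x^2 is positive on both sides of x = 0.
print(0, classify_by_sign_change(lambda x: 3 * x**2, 0))
```

The last line reports "neither", which is the point of inflection discussed below.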
Example 15
$$y = \frac{1}{x-2}, \qquad \frac{dy}{dx} = -\frac{1}{(x-2)^2}$$
We cannot find an x for which dy/dx = 0, and so there are no turning points.
If we wish to sketch a curve it is very useful to know the turning points. For example, consider
$$y = x^4+2x^3-3x^2-4x+4, \qquad \frac{dy}{dx} = 4x^3+6x^2-6x-4$$

[Fig. 2.11: the curve y = x^4 + 2x^3 - 3x^2 - 4x + 4]

Turning points occur at x = -2, -1/2 and 1, where
x = -2 is a minimum,
x = -1/2 is a maximum,
x = 1 is a minimum;
if x is large and negative, y > 0;
if x is large and positive, y > 0;
and the function is shown in Fig. 2.11. Suppose $y = x^3-8$; then $dy/dx = 3x^2$ and we have a stationary point at x = 0. For x < 0 and x > 0, dy/dx > 0. So this isn't a maximum or a minimum! It does represent a point where the curve is flat, but it is in fact a small ledge and not a peak or a valley. Figure 2.12 illustrates such a point, which is called a point of inflection.
We can rewrite our criterion for turning points in terms of the second derivative; this is sometimes simpler. Suppose y = f(x) and dy/dx = 0 at $x = x_0$. Then:
Is $d^2y/dx^2 > 0$ at $x_0$? YES: $f(x_0)$ is a minimum.
  NO: is $d^2y/dx^2 < 0$ at $x_0$? YES: $f(x_0)$ is a maximum.
    NO: the TEST FAILS.

[Fig. 2.12: a point of inflection, where the curve is momentarily flat]

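Example 16
$$y = x^2+\frac{250}{x}$$
$$\frac{dy}{dx} = 2x-\frac{250}{x^2} = 0 \quad\text{when } x = 5$$
$$\frac{d^2y}{dx^2} = 2+\frac{500}{x^3} = 2+\frac{500}{125} > 0 \quad\text{at } x = 5$$
so x = 5 is a minimum.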
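The second-derivative test is equally mechanical. A minimal Python sketch for Example 16, with the derivatives typed in by hand (the function names are illustrative):

```python
def dy_dx(x):
    return 2 * x - 250 / x**2


def d2y_dx2(x):
    return 2 + 500 / x**3


def second_derivative_test(second_derivative, x0):
    """Classify a stationary point x0 (where dy/dx = 0) by the sign of d^2y/dx^2."""
    value = second_derivative(x0)
    if value > 0:
        return "minimum"
    if value < 0:
        return "maximum"
    return "test fails"


x0 = 5.0  # dy/dx = 0 when x^3 = 125
print(dy_dx(x0), d2y_dx2(x0), second_derivative_test(d2y_dx2, x0))
```

For $y = x^3-8$ at x = 0 the second derivative 6x is zero, so the same routine returns "test fails", which matches the point of inflection discussed above.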

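Example 17
Steel cans are h cm high with base radius r. The volume is $V = \pi r^2h$ and the surface area is $2\pi rh+2\pi r^2$. Suppose we want a minimum area for a given volume, say 64 cubic cm. Then $h = 64/\pi r^2$ and so
$$\text{Area} = A = \frac{128}{r}+2\pi r^2$$
then
$$\frac{dA}{dr} = -\frac{128}{r^2}+4\pi r$$
so dA/dr = 0 when
$$4\pi r^3 = 128$$
so
$$r^3 = \frac{32}{\pi} \quad\text{or}\quad r = \sqrt[3]{32/\pi} = 2.17\ \text{cm}$$
(this is a minimum, since $d^2A/dr^2 = 256/r^3+4\pi > 0$) and finally,
$$h = \frac{64}{\pi r^2} = 2r = 4.34\ \text{cm}$$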
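The can calculation can be confirmed numerically. This Python sketch writes the volume as a variable V (set to the text's 64 cubic cm) only to show where the numbers come from; it uses $r^3 = V/(2\pi)$ from dA/dr = 0 and checks that slightly smaller and larger radii give a larger area.

```python
import math

V = 64.0  # volume in cubic cm, as in Example 17


def area(r):
    """Surface area 2*pi*r*h + 2*pi*r^2, with h chosen so that pi*r^2*h = V."""
    h = V / (math.pi * r**2)
    return 2 * math.pi * r * h + 2 * math.pi * r**2


r_best = (V / (2 * math.pi)) ** (1 / 3)   # from dA/dr = -2V/r^2 + 4*pi*r = 0
h_best = V / (math.pi * r_best**2)
print(round(r_best, 2), round(h_best, 2))  # 2.17 4.34
print(area(0.95 * r_best) > area(r_best) < area(1.05 * r_best))  # True
```

The optimal can has h = 2r, i.e. its height equals its diameter.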

[Fig. 2.13: the tangent at x_0 used to estimate f(x_0 + h)]

4 TAYLOR SERIES AND APPROXIMATIONS

Suppose we have the function y = f(x) and we know the value of the function at $x_0$, but the value of $f(x_0+h)$ is not known. One could imagine that f(x) might be sales at time x and that we are trying to extrapolate. If θ is the angle the tangent at $x_0$ makes with the x-axis, so that the slope there is $\tan\theta$, then an approximation to $\delta y$ is $h\tan\theta = h\,dy/dx$ (see Fig. 2.13), and so
$$f(x_0+h) \approx f(x_0)+h\left(\frac{dy}{dx}\right)_{x=x_0}$$
This is a simple but very useful approximation which is the key to Newton's approximation method.
Suppose we wish to solve an equation f(x) = 0. We might guess a solution $x_0$. If $x_0$ is near the solution then $x_0+h$ might be the solution. From the above approximation, setting $f(x_0+h) = 0$ gives
$$0 \approx f(x_0)+h f'(x_0)$$
so
$$h = -\frac{f(x_0)}{f'(x_0)}$$
and a better approximation is
$$x = x_0+h = x_0-\frac{f(x_0)}{f'(x_0)}$$
We can then use this value to go through the process again.
Example 18
$f(x) = x^2-9 = 0$, $f'(x) = 2x$. Try $x_0 = 2$:
$$h = \frac{f(2)}{-f'(2)} = \frac{4-9}{-4} = +\frac{5}{4}$$
Thus h = 5/4 and the new approximation for x is 3.25. Using $x_0 = 3.25$ we get h = -0.24... and x = 3.0096...

Finally we obtain the sequence:

Attempt number    Value
1                 2 (first guess)
2                 3.25
3                 3.0096
4                 3.00001536
5                 3.00000000

Example 19
$$f(x) = \sin x-\frac{x^2}{4}, \qquad f'(x) = \cos x-\frac{x}{2}$$

Attempt number    Value
1                 1.5
2                 2.14039
3                 1.95201
4                 1.93393
5                 1.93375

Using the last value, f(x) = 0.000005.
Generally each step roughly doubles the number of correct decimal figures. The method usually works well when the first guess is close enough to a root and f'(x) is not zero there, but we do not go into the details.
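Newton's method takes only a few lines of code. A minimal sketch in Python; the `newton` helper is illustrative and has no safeguards against f'(x) = 0 or divergence. For Example 19 it assumes $f(x) = \sin x - x^2/4$ with $f'(x) = \cos x - x/2$, a reading that reproduces the tabulated iterates.

```python
import math


def newton(f, f_prime, x0, steps):
    """Return [x0, x1, ..., x_steps] with x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / f_prime(x))
    return xs


# Example 18: f(x) = x^2 - 9, first guess 2.
roots_18 = newton(lambda x: x**2 - 9, lambda x: 2 * x, 2.0, 4)
# Example 19 (assumed form): f(x) = sin x - x^2/4, first guess 1.5.
roots_19 = newton(lambda x: math.sin(x) - x**2 / 4, lambda x: math.cos(x) - x / 2, 1.5, 4)

for k, (a, b) in enumerate(zip(roots_18, roots_19), start=1):
    print(k, f"{a:.8f}", f"{b:.5f}")
```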
This approximation can be improved. The approximation we made was to extend the curve by a straight line; we might hope to do better by using an arc. The next step beyond a line is a parabola of the form
$$f(x_0+h) = a_0+a_1h+a_2h^2$$
where $a_0, a_1, a_2$ are constants. If h = 0 then $f(x_0) = a_0$. Now if we differentiate with respect to h,
$$\frac{df}{dx} = a_1+2a_2h$$
and at h = 0, $a_1 = df/dx$. Similarly, taking the next derivative gives $a_2 = \tfrac{1}{2}\,d^2f/dx^2$, and so, with the derivatives evaluated at $x_0$,
$$f(x_0+h) \approx f(x_0)+h\frac{dy}{dx}+\frac{h^2}{2!}\frac{d^2y}{dx^2}$$
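To see how much the $h^2$ term helps, the Python sketch below compares the straight-line and parabola approximations for f(x) = cos x near $x_0 = 0.5$ with h = 0.2. These are illustrative choices; cos x is used rather than log(1 + x) so as not to pre-empt Exercise 4.

```python
import math

x0, h = 0.5, 0.2


def f(x):
    return math.cos(x)


def df(x):
    return -math.sin(x)     # first derivative of cos x


def d2f(x):
    return -math.cos(x)     # second derivative of cos x


exact = f(x0 + h)
linear = f(x0) + h * df(x0)              # straight-line (tangent) estimate
quadratic = linear + h**2 / 2 * d2f(x0)  # add the h^2/2! term
print(f"exact {exact:.6f}  linear {linear:.6f}  quadratic {quadratic:.6f}")
print("errors:", abs(linear - exact), abs(quadratic - exact))
```

Here the straight-line error is about 0.017 while the parabola's is about 0.0007.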
Exercise 4
Find an approximation for log(1 + x).

This process can be carried further but we will not pursue it. However
in the information sheet provided various extended series of this nature are
shown, especially for the better known functions.

5 PARTIAL DERIVATIVES

Most of the functions which we have differentiated involve only a single variable x, e.g. y = sin x.
If we are to construct realistic mathematical models we need to consider functions of two or more variables, for instance
$$z = f(x,y) = x^3y+y^2$$
$$z = f(x,y) = x^2e^y$$
$$w = f(x,y,z) = x^2+y^2+z^2-1$$
Some simple but practical examples might be: $f(x,y) = \sqrt{x^2+y^2}$ is the distance from the point (x, y) to the origin; the weight of a rectangular sheet of metal of sides x and y with density ρ (weight per unit area) is given by ρxy.
If we use such functions then we shall need ideas like that of the derivative, which leads us to the idea of partial derivatives. These ideas are complex, mainly because it is so difficult to visualise the functions involved.
If we know z = f(x, y), then a picture requires the drawing of a 3-dimensional diagram, see Fig. 2.14. One way round this problem is to draw a contour map, Fig. 2.15, rather on the lines of a geophysical map; as another example, Fig. 2.16 shows the contour lines for $e^{-x^2-y^2-bxy}$.
Unfortunately if w = f(x, y, z) we need 4 axes at right angles and we cannot draw diagrams.
Exercise 5
Draw a rough contour map of $z = x^2+y^2-xy$.

Partial Derivatives
For y = f(x) we find the slope by differentiating. Suppose we have z = f(x, y), say $z = x^3y+y^2$. If we draw a contour map we eventually have a map of the function rather like a map of the Lake District. If one were to move in the direction of the y-axis at some fixed value of x, say x = a, then we are clearly going to go upwards or downwards depending on whether we go in the positive or negative direction.
In fact if x = a, then
$$z = a^3y+y^2$$
i.e. z is a function of y alone, as a is the fixed value of x. At x = 0 we have

[Fig. 2.14: a surface z = f(x, y) drawn in three dimensions]
[Fig. 2.15: a contour map of the surface]
[Fig. 2.16: contour lines of e^{-x^2-y^2-bxy}]

the graph shown in Fig. 2.17. For x = 1, $z = y+y^2$ and $dz/dy = 1+2y$; this is shown in Fig. 2.18. Whatever value of a we take, we obtain a function of y which we can differentiate. Its derivative is the slope of the surface in the direction of the y-axis, at constant x.
Effectively we just differentiate with x held constant. To distinguish between dz/dy and the derivative with respect to y for a fixed x, we write the latter as $\partial z/\partial y$ and call it "partial dz by partial dy". Thus
$$\frac{\partial z}{\partial y} = x^3+2y$$
In the same way we can differentiate with respect to x for a fixed y. This gives $\partial z/\partial x = 3x^2y$, i.e. we differentiate with respect to x while holding y constant.
This gives the slope of the surface as we travel parallel to the x-axis for a fixed value of y. If y = 0, $\partial z/\partial x = 0$, i.e. the surface is flat along the x-axis. If y = 1,
$$z = x^3+1, \qquad \frac{\partial z}{\partial x} = 3x^2$$
so the slope is zero at x = 0 and positive elsewhere: along this line the surface rises, levelling off briefly at x = 0.
As usual there is a variety of notation: for z = f(x, y) we can have
$$\frac{\partial z}{\partial x} = \frac{\partial f}{\partial x} = f_x$$

[Fig. 2.17: the section of the surface at x = 0]
[Fig. 2.18: the section at x = 1, z = y + y^2]

and
$$\frac{\partial z}{\partial y} = \frac{\partial f}{\partial y} = f_y$$
We illustrated a function of two variables because we could draw the pictures, but we can have partial derivatives for functions of several variables. Suppose v = f(w, x, y, z); then $\partial v/\partial x$ or $\partial f/\partial x$ is just the result of differentiating f(w, x, y, z) while regarding every variable except x as constant.
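Example 20
$$z = 4x^3+2xe^y+y^2+\log t$$
$$\frac{\partial z}{\partial x} = 12x^2+2e^y, \qquad \frac{\partial z}{\partial y} = 2xe^y+2y, \qquad \frac{\partial z}{\partial t} = \frac{1}{t}$$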
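A partial derivative can be estimated exactly as before, by nudging one variable while holding the others fixed. A minimal sketch in Python for Example 20, at an arbitrary point (x, y, t) = (1, 0.5, 2); the helper `partial` is an illustrative name.

```python
import math


def z(x, y, t):
    """Example 20: z = 4x^3 + 2x e^y + y^2 + log t."""
    return 4 * x**3 + 2 * x * math.exp(y) + y**2 + math.log(t)


def partial(f, point, i, h=1e-6):
    """Central-difference estimate of the partial derivative of f with respect
    to argument number i, all other arguments held fixed."""
    up, down = list(point), list(point)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


point = (1.0, 0.5, 2.0)
x, y, t = point
exact = [12 * x**2 + 2 * math.exp(y), 2 * x * math.exp(y) + 2 * y, 1 / t]
for i, name in enumerate("xyt"):
    print(f"dz/d{name}: numeric {partial(z, point, i):.6f}  exact {exact[i]:.6f}")
```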
Exercise 6
Find $\partial z/\partial x$ and $\partial z/\partial y$ for
(i) $z = x^2+y^2$
(ii) $z = xy$
(iii) $z = \sin x\,e^y$

Exercise 7
Find all the first order partial derivatives, i.e. $\partial f/\partial x$, $\partial f/\partial y$, ...
(i) $f(x,y,z) = x^2+y^2+z^2$
(ii) $f(x,y,z,w) = xy^2z^3w^4$
(iii) $f(x,y,z) = x/(1+ye^{-2z})$

6 HIGHER ORDER PARTIAL DERIVATIVES


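Once we find the derivatives of a function we can, as they are themselves functions, differentiate them in turn. Thus if
$$z = x^2+2xy+y^2, \qquad \frac{\partial z}{\partial x} = 2x+2y = g(x,y)$$
then we can find the partial derivative of g with respect to x or to y:
$$\frac{\partial g}{\partial x} = \frac{\partial^2 z}{\partial x^2} = 2 \quad\text{and}\quad \frac{\partial g}{\partial y} = \frac{\partial^2 z}{\partial y\,\partial x} = 2$$
We can hence calculate
$$\frac{\partial^2 z}{\partial x^2},\quad \frac{\partial^2 z}{\partial y^2},\quad \frac{\partial^2 z}{\partial x\,\partial y},\quad \frac{\partial^2 z}{\partial y\,\partial x}$$
In most cases $\partial^2 z/\partial x\,\partial y = \partial^2 z/\partial y\,\partial x$ (this holds whenever the second partial derivatives are continuous), and we shall always assume this is true.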
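Example 21
$$z = 4x^3+2xe^y+y^2$$
$$\frac{\partial z}{\partial x} = 12x^2+2e^y, \qquad \frac{\partial^2 z}{\partial x^2} = 24x, \qquad \frac{\partial^2 z}{\partial y\,\partial x} = 2e^y$$
$$\frac{\partial z}{\partial y} = 2xe^y+2y$$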
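The claim that the order of differentiation does not matter can be tried out on Example 21. The sketch below (Python; helper names are illustrative) builds the two mixed partial derivatives by nesting central differences in the two possible orders and compares both with $2e^y$ at an arbitrary point.

```python
import math


def z(x, y):
    """Example 21: z = 4x^3 + 2x e^y + y^2."""
    return 4 * x**3 + 2 * x * math.exp(y) + y**2


def d_dx(f, h=1e-4):
    """Return a function estimating the partial derivative of f with respect to x."""
    return lambda x, y: (f(x + h, y) - f(x - h, y)) / (2 * h)


def d_dy(f, h=1e-4):
    """Return a function estimating the partial derivative of f with respect to y."""
    return lambda x, y: (f(x, y + h) - f(x, y - h)) / (2 * h)


z_x_then_y = d_dy(d_dx(z))   # differentiate with respect to x first, then y
z_y_then_x = d_dx(d_dy(z))   # differentiate with respect to y first, then x

x, y = 0.8, -0.3
print(z_x_then_y(x, y), z_y_then_x(x, y), 2 * math.exp(y))
```

Both nestings happen to sample the same four points (x ± h, y ± h), so their agreement here mainly confirms the formula $2e^y$; the general equality of mixed partials for smooth functions is a theorem, which no numerical test can prove.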

As we are dealing with functions of many variables and with a very rich
class of possible derivatives the theory of partial derivatives does get very
complex. We shall not give very much of this theory but it is worth pointing
out some of the ideas.
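We start with an example. Suppose z = f(x, y) and we think we would rather work in polar coordinates
$$x = r\cos\theta, \qquad y = r\sin\theta$$
Given $\partial z/\partial x$ and $\partial z/\partial y$, can we get $\partial z/\partial r$ and $\partial z/\partial\theta$? If, say, z = x + 3y then we can substitute, getting
$$z = r\cos\theta+3r\sin\theta$$
Now
$$\frac{\partial z}{\partial r} = \cos\theta+3\sin\theta$$
and
$$\frac{\partial z}{\partial\theta} = -r\sin\theta+3r\cos\theta$$
Sometimes this is difficult, and various relations have been found to save work. These are of the form
$$\frac{\partial z}{\partial r} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial r}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial r}, \qquad \frac{\partial z}{\partial\theta} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial\theta}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial\theta}$$
Many variants of these formulae exist and have useful applications. We will omit details.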
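The chain-rule formulae can be checked against direct substitution. For a less trivial test than z = x + 3y, the sketch below uses $f(x,y) = x^2y$ (an illustrative choice, not from the text), evaluates $\partial z/\partial r$ and $\partial z/\partial\theta$ from the formulae above, and compares them with numerical derivatives of $z(r,\theta) = f(r\cos\theta, r\sin\theta)$.

```python
import math


def f(x, y):
    return x**2 * y          # illustrative function


def f_x(x, y):
    return 2 * x * y         # partial derivative with respect to x


def f_y(x, y):
    return x**2              # partial derivative with respect to y


def z_polar(r, theta):
    """f expressed in polar coordinates."""
    return f(r * math.cos(theta), r * math.sin(theta))


r, theta, h = 1.5, 0.7, 1e-6
x, y = r * math.cos(theta), r * math.sin(theta)

# Chain rule with dx/dr = cos(theta), dy/dr = sin(theta),
# dx/dtheta = -r sin(theta), dy/dtheta = r cos(theta).
dz_dr = f_x(x, y) * math.cos(theta) + f_y(x, y) * math.sin(theta)
dz_dtheta = f_x(x, y) * (-r * math.sin(theta)) + f_y(x, y) * (r * math.cos(theta))

# Direct numerical differentiation of the substituted function.
num_dr = (z_polar(r + h, theta) - z_polar(r - h, theta)) / (2 * h)
num_dtheta = (z_polar(r, theta + h) - z_polar(r, theta - h)) / (2 * h)
print(dz_dr, num_dr)
print(dz_dtheta, num_dtheta)
```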
Any equation involving differential coefficients (derivatives) is called a differential equation, e.g. dy/dx = y. These equations are very important in physical applications.
Any equation involving partial differential coefficients is called a partial differential equation, p.d.e., for example
$$3y^2\frac{\partial u}{\partial x}+\frac{\partial u}{\partial y} = 2u$$
2 a2u (au)2
u ao+ a8 =0
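If the equation involves only ordinary differential coefficients the equation is called an ordinary differential equation, o.d.e. If
$$z = f(x,y) = \tan^{-1}\frac{x}{y}$$
then
$$\frac{\partial z}{\partial x} = \frac{y}{x^2+y^2}, \qquad \frac{\partial^2 z}{\partial x^2} = -\frac{2xy}{(x^2+y^2)^2}, \qquad \frac{\partial^2 z}{\partial x\,\partial y} = \frac{x^2-y^2}{(x^2+y^2)^2}$$
Similarly
$$\frac{\partial z}{\partial y} = -\frac{x}{x^2+y^2}, \qquad \frac{\partial^2 z}{\partial y^2} = \frac{2xy}{(x^2+y^2)^2}$$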

and
$$\frac{\partial^2 f}{\partial x^2}+\frac{\partial^2 f}{\partial y^2} = 0$$
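The derivatives of $z = \tan^{-1}(x/y)$ are easy to get wrong by a sign, so a numerical check is worthwhile. This Python sketch (the test point and step size are arbitrary choices) estimates $\partial^2 z/\partial x\,\partial y$ with a central difference and $\partial^2 z/\partial x^2+\partial^2 z/\partial y^2$ with the standard five-point formula.

```python
import math


def z(x, y):
    return math.atan(x / y)   # defined for y != 0


x, y, h = 0.8, 1.3, 1e-3
r2 = x * x + y * y

# Mixed derivative: compare with (x^2 - y^2) / (x^2 + y^2)^2.
mixed = (z(x + h, y + h) - z(x + h, y - h) - z(x - h, y + h) + z(x - h, y - h)) / (4 * h * h)
print(mixed, (x * x - y * y) / r2**2)

# Five-point estimate of z_xx + z_yy, which should be close to 0.
laplacian = (z(x + h, y) + z(x - h, y) + z(x, y + h) + z(x, y - h) - 4 * z(x, y)) / h**2
print(laplacian)
```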
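Three important examples of p.d.e.s are:
Example 22 The wave equation (in one dimension)
$$\frac{\partial^2 u}{\partial x^2}-\frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} = 0$$
Example 23 Laplace's equation (two-dimensional form)
$$\frac{\partial^2 u}{\partial x^2}+\frac{\partial^2 u}{\partial y^2} = 0$$
Example 24 One-dimensional diffusion equation
$$\frac{\partial^2 u}{\partial x^2} = \frac{1}{k}\frac{\partial u}{\partial t}$$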
We will examine Example 22 and some of its solutions in the next chapter,
as well as discussing the solutions to ordinary differential equations.

7 OPTIMISATION
In many situations we need the maximum or minimum of functions of several variables. One might want the minimum value of "Rosenbrock's banana-valley function",
$$f(x,y) = 100(y-x^2)^2+(x-1)^2$$
This function has been used for testing various numerical techniques. It is easier to see how to find these values for functions of only two variables; the technique is much more general, but for the moment we just look at the simplest cases.
Suppose we have a function with contours as shown in Fig. 2.19. If we look at the function through the maximum for constant x and constant y (indicated by dotted lines), these sections have the form shown in Fig. 2.20. Thus at a maximum
$$\frac{\partial f}{\partial x} = 0 \quad\text{and}\quad \frac{\partial f}{\partial y} = 0 \quad\text{and}\quad \frac{\partial^2 f}{\partial x^2} < 0$$
Similarly in a hole,
$$\frac{\partial f}{\partial x} = 0 \quad\text{and}\quad \frac{\partial f}{\partial y} = 0 \quad\text{and}\quad \frac{\partial^2 f}{\partial x^2} > 0$$
These are illustrated in Fig. 2.21. Unfortunately there is another possibility, the saddle point, shown in Fig. 2.22. At the saddle point the surface is flat, but as we move away we find that we go upslope in some directions and

[Fig. 2.19: contours of a function with a maximum, with dotted sections at constant x and at constant y = y_0]
[Fig. 2.20: the sections through the maximum]
[Fig. 2.21: a maximum and a minimum of z = f(x, y)]
[Fig. 2.22: a saddle point]

down in others, as in Fig. 2.22. An alternative description of a saddle is to think of it as a pass through the mountains. It is a low point from the view of the mountains but a peak from the view of the walker trying to get over those mountains. The point (0, 0) for $f(x,y) = x^4+4x^2y^2-2x^2+2y^2-1$ is a saddle point. Draw a contour map to convince yourself. See if you can check on (1, 0) and (-1, 0).
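We finish by giving a rule. Suppose we have a function f(x, y) and, at some point,
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = 0$$
Let
$$a = \frac{\partial^2 f}{\partial x^2}\,\frac{\partial^2 f}{\partial y^2}-\left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2$$
evaluated at that point. Then
if a > 0 and $\partial^2 f/\partial x^2 < 0$, the point is a maximum;
if a > 0 and $\partial^2 f/\partial x^2 > 0$, the point is a minimum;
if a < 0, the point is a saddle point.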
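The two-variable test can be packaged as a function of the three second partial derivatives at a stationary point. A minimal sketch in Python of the standard second-derivative test, with $a = f_{xx}f_{yy}-f_{xy}^2$; the second partial derivatives are worked out by hand in the comments, for the saddle at (0, 0) discussed above and for Rosenbrock's function at (1, 1), where both first partial derivatives vanish.

```python
def classify(fxx, fyy, fxy):
    """Classify a stationary point from the second partial derivatives there."""
    a = fxx * fyy - fxy**2
    if a > 0 and fxx < 0:
        return "maximum"
    if a > 0 and fxx > 0:
        return "minimum"
    if a < 0:
        return "saddle point"
    return "test fails"   # a = 0: a more subtle test is needed


# f = x^4 + 4x^2 y^2 - 2x^2 + 2y^2 - 1 at (0, 0):
# f_xx = 12x^2 + 8y^2 - 4 = -4, f_yy = 8x^2 + 4 = 4, f_xy = 16xy = 0
print(classify(-4, 4, 0))        # saddle point

# Rosenbrock f = 100(y - x^2)^2 + (x - 1)^2 at (1, 1):
# f_xx = 1200x^2 - 400y + 2 = 802, f_yy = 200, f_xy = -400x = -400
print(classify(802, 200, -400))  # minimum
```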

Exercise 8
Find the maximum and minimum points of
(i) $f(x,y) = x^4+4x^2y^2-2x^2+2y^2-1$
(ii) $f(x,y) = x^2y^2+y^2-x^2-1$
[Note: if a = 0 more subtle tests have to be used to pinpoint the nature of that point.]
Chapter 3

INTEGRATION

1 INTRODUCTION AND DEFINITION


Historically and practically, one of the major problems discussed by mathematicians has been how to work out the areas of various figures. One way to approach this problem is to consider the area under a given curve. Let y = f(x) be a function with Fig. 3.1 as its graph between x = a and x = b. We wish to find the area of the region A. Clearly if f(x) is constant, say c for some number c, then the graph is as shown in Fig. 3.2, and the area under the graph is the rectangle whose area is (b - a)c.
The way in which we attempt to evaluate the area of A in a more complex case is to approximate, splitting it into small pieces whose area we can calculate. First we divide the interval from a to b into n pieces, for some whole number n (mathematicians call whole numbers natural numbers). Choose $x_0, x_1, \ldots, x_n$ so that $x_0 = a$, $x_1-x_0 = (1/n)(b-a)$, $x_2-x_1 = (1/n)(b-a)$, ..., $x_n-x_{n-1} = (1/n)(b-a)$ and $x_n = b$ (it is more usual to write this as $x_{i+1}-x_i = (1/n)(b-a)$ for i = 0 to n - 1, with $x_0 = a$ and $x_n = b$). We can draw n rectangles, the first of height $f(x_0)$, the second of height $f(x_1)$, ..., as in Fig. 3.3. Then we approximate the area of A as
$$\frac{b-a}{n}f(x_0)+\frac{b-a}{n}f(x_1)+\cdots+\frac{b-a}{n}f(x_{n-1})$$
This is written
$$\frac{b-a}{n}\sum_{i=0}^{n-1}f(x_i)$$
(Note: given expressions $a_0, \ldots, a_r$, i is called the dummy variable since $\sum_{i=0}^{r}a_i = \sum_{j=0}^{r}a_j$.)
We could also have approximated by choosing different rectangles:
$$\left(\frac{b-a}{n}\right)\sum_{i=1}^{n}f(x_i) = \frac{b-a}{n}f(x_1)+\cdots+\frac{b-a}{n}f(x_n)$$

[Fig. 3.1: the region A under y = f(x) between x = a and x = b]
[Fig. 3.2: a constant function f(x) = c between a and b]
[Fig. 3.3: rectangles of heights f(x_0), f(x_1), ... standing on the points a = x_0, x_1, x_2, ...]

Now let $n \to \infty$; for most functions we can expect that the sum $\sum_{i=0}^{n-1}((b-a)/n)f(x_i)$ will tend to A.
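Both rectangle sums are straightforward to compute. The Python sketch below (function names are illustrative) implements the left sum over i = 0, ..., n - 1 and the right sum over i = 1, ..., n. For f(x) = x on [0, 1], worked by hand in Example 1, the left sum equals (n - 1)/(2n), and both sums close in on 1/2 as n grows.

```python
def left_sum(f, a, b, n):
    """(b - a)/n * [f(x_0) + ... + f(x_{n-1})], with x_i = a + i(b - a)/n."""
    dx = (b - a) / n
    return dx * sum(f(a + i * dx) for i in range(n))


def right_sum(f, a, b, n):
    """(b - a)/n * [f(x_1) + ... + f(x_n)]."""
    dx = (b - a) / n
    return dx * sum(f(a + i * dx) for i in range(1, n + 1))


def identity(x):
    return x


for n in (2, 3, 10, 1000):
    print(n, left_sum(identity, 0, 1, n), (n - 1) / (2 * n), right_sum(identity, 0, 1, n))
```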
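Example (simple) 1
Choose y = f(x) = x and a = 0, b = 1, as in Fig. 3.4. If we take n = 2, $x_0 = 0$, $x_1 = \tfrac12$, $x_2 = 1$,
$$A \approx \left(\frac{1-0}{2}\right)f(x_0)+\left(\frac{1-0}{2}\right)f(x_1) = \frac12\cdot\frac12 = \frac14$$
If, instead, n = 3, with $x_0 = 0$, $x_1 = \tfrac13$, $x_2 = \tfrac23$, the sum becomes
$$A \approx \frac{1-0}{3}\times 0+\frac{1-0}{3}\times\frac13+\frac{1-0}{3}\times\frac23 = \frac13$$
In general
$$\sum_{i=0}^{n-1}\frac{1-0}{n}\cdot\frac{i}{n} = \sum_{i=0}^{n-1}\frac{i}{n^2} = \frac{1}{n^2}\big(1+2+\cdots+(n-1)\big) = \frac{n(n-1)}{2n^2} = \frac{n-1}{2n}$$
(Note: this depends on the result from algebra that $1+2+\cdots+(n-1) = n(n-1)/2$.) As $n \to \infty$, $(n-1)/(2n) \to \tfrac12$; hence we find that the area of A is 1/2, which is the result we would expect. Even this simple example involves considerable manipulative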
Integration 53

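[Fig. 3.4: the region A under the line y = x between x = 0 and x = 1]
[Fig. 3.5: a function taking negative values, with areas A_1, A_2, A_3 between a and b]
[Fig. 3.6: the interval from a to b split at an intermediate point]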

skills to evaluate. Before going on to examine more complicated cases, we


need some more notation and some clear rules:

(i) Given a function f(x), the area between a and b, as defined above, is called the definite integral of f and is written
$$\int_a^b f(x)\,dx$$
The notation comes from an enlarged capital S standing for sum.
(ii) The negative case. We need to consider what happens if f(x) is negative, as in Fig. 3.5. The shaded region is made up of three pieces with areas $A_1$, $A_2$ and $A_3$, but the middle piece lies below the x-axis, because f(x) is negative for c < x < d, and so it must count negatively. Thus in calculating the integral from a to b we need to take $A_2$ away, so that $\int_a^b f(x)\,dx = A_1-A_2+A_3$. This enables us to calculate the integral of functions whether or not they take negative values.
(iii) Addition. By looking at the appropriate picture, Fig. 3.6, we can see that, for a point c between a and b,
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx+\int_c^b f(x)\,dx$$
(iv) The effect of a constant. Also if c is some constant we get
$$\int_a^b cf(x)\,dx = c\int_a^b f(x)\,dx$$
The constant c represents a stretching of the graph. The example of c = 2 and f(x) = x is shown in Fig. 3.7.

[Fig. 3.7: the graphs of f(x) = x and f(x) = 2x]
[Fig. 3.8: the graphs of f(x), g(x) and f(x) + g(x) between a and b]

(v) Addition of functions. One last simple rule is that if f(x) and g(x) are two functions then
$$\int_a^b\big(f(x)+g(x)\big)\,dx = \int_a^b f(x)\,dx+\int_a^b g(x)\,dx$$
This is illustrated in Fig. 3.8.


Example 2
Let f(x) = 2x + 3. Then
$$\int_0^1(2x+3)\,dx = \int_0^1 2x\,dx+\int_0^1 3\,dx \quad\text{by Rule (v)}$$
$$= 2\int_0^1 x\,dx+3 \quad\text{by Rule (iv)}$$
$$= 2\cdot\frac12+3 = 4$$
When we have learnt how to evaluate more integrals these rules become
more and more useful.

2 THE RELATIONSHIP BETWEEN INTEGRATION AND DIFFERENTIATION
We now move on from examining $\int_a^b f(x)\,dx$ to considering $F(b) = \int_a^b f(x)\,dx$, where we think of b as being variable but keep a fixed. Thus we have a

[Fig. 3.9: above, the original function f(x); below, F(x), the area under f from a to x]

function of b, with $\int_a^b f(t)\,dt$ acting as a black box, so that for every value of b, $F(b) = \int_a^b f(t)\,dt$ gives a further number.
To return to more familiar notation, we write $F(x) = \int_a^x f(t)\,dt$. Figure 3.9 shows the relationship between these functions: f(x), our original function, and F(x), the area under the curve at different points. We will now state (but not prove) the Fundamental Theorem of Calculus:
If f is continuous and $F(x) = \int_a^x f(t)\,dt$, then $\dfrac{dF}{dx} = f(x)$.
This theorem is of considerable importance, both theoretically and in practical calculations, as it relates integration to differentiation. To relate this to Fig. 3.9, we see that for a given $x_0$, the slope of F(x) at $x = x_0$ is the value of our original function at $x_0$. The practical value of this is that integration (which can be very complicated arithmetically) can in many cases be reduced to reversing a differentiation, that is, to recognising a function whose derivative is the integrand, which is much easier.
Our theorem suggests that integration and differentiation are inverse operations. This is almost true but not absolutely, as will become clear in the next paragraph.
If we chose d instead of a we would have a function
$$G(x) = \int_d^x f(t)\,dt$$
Then it is still true that $\frac{d}{dx}G(x) = f(x)$, but G(x) does not necessarily

equal F(x). In fact $G(x) = \int_d^a f(t)\,dt+F(x)$ by an earlier rule, because
$$G(x) = \int_d^x f(t)\,dt = \int_d^a f(t)\,dt+\int_a^x f(t)\,dt = \int_d^a f(t)\,dt+F(x)$$
So G(x) = c + F(x), where c is the constant $c = \int_d^a f(t)\,dt$.
There is a language for this. If F(x) is any function such that
$$\frac{d}{dx}F(x) = f(x)$$
we say that F(x) is a primitive for f. For example, log x is a primitive for 1/x, as is log x + 23. Since the derivative of a constant is zero, adding a constant to a primitive gives another primitive, and on an interval any two primitives differ only by a constant. If F(x) is a primitive for f(x), $\int f(x)\,dx$ (written without limits) is called an indefinite integral of f; it is defined only up to a constant, i.e. $\int f(x)\,dx = F(x)+c$ where c is any constant.
If $G(x) = \int_a^x f(t)\,dt$, then
$$G(b) = \int_a^b f(x)\,dx \quad\text{and}\quad G(a) = 0$$
If F(x) is some other primitive then G(x) = F(x) + c. So G(a) = F(a) + c = 0, giving c = -F(a), and
$$\int_a^b f(x)\,dx = G(b) = F(b)+c = F(b)-F(a)$$
If f is continuous, so that the area under the curve is defined, then a primitive exists, namely this integral. The real usefulness of the idea comes when a primitive can easily be found from known functions.
Example 3
(i) f(x) = 2x; then $F(x) = x^2+c$, for any constant c, is a primitive.
(ii) $f(x) = x^n$, n a natural number; then $F(x) = x^{n+1}/(n+1)$ is a primitive, or
$$\int x^n\,dx = \frac{x^{n+1}}{n+1}+c$$
This enables us to evaluate the area under a parabola (a curve of the form $y = ax^2+bx+c$) between, say, $x_0$ and $x_1$:
$$\int_{x_0}^{x_1}f(x)\,dx = \int_{x_0}^{x_1}(ax^2+bx+c)\,dx = \int_{x_0}^{x_1}ax^2\,dx+\int_{x_0}^{x_1}bx\,dx+\int_{x_0}^{x_1}c\,dx$$
(by the rules stated earlier)
$$= a\int_{x_0}^{x_1}x^2\,dx+b\int_{x_0}^{x_1}x\,dx+c\int_{x_0}^{x_1}dx$$
Now
$$\int_{x_0}^{x_1}x^2\,dx = \frac{x_1^3}{3}-\frac{x_0^3}{3}$$
Table 1

$f(x)$                                  $F(x) = \int f(x)\,dx$
$x^n$ $(n \ne -1)$                      $x^{n+1}/(n+1)$
$1/x = x^{-1}$                          $\log_e x$
$e^{ax}$ $(a \ne 0)$                    $(1/a)\,e^{ax}$
$a^x$ $(a > 0,\ a \ne 1)$               $a^x/\log_e a$
$\sin ax$ $(a \ne 0)$                   $-(1/a)\cos ax$
$\cos ax$ $(a \ne 0)$                   $(1/a)\sin ax$
$\tan x$                                $\log_e|\sec x|$
$\cot x$                                $\log_e|\sin x|$
$\sec^2 x$                              $\tan x$
$\mathrm{cosec}^2 x$                    $-\cot x$
$1/\sqrt{a^2-x^2}$ $(-a < x < a)$       $\sin^{-1}(x/a)$
$1/(a^2+x^2)$                           $(1/a)\tan^{-1}(x/a)$
$\log_e x$                              $x\log_e x-x$

Similarly
$$\int_{x_0}^{x_1}x\,dx = \frac{x_1^2}{2}-\frac{x_0^2}{2}$$
and
$$\int_{x_0}^{x_1}dx = x_1-x_0$$
So
$$\int_{x_0}^{x_1}(ax^2+bx+c)\,dx = \frac{a}{3}(x_1^3-x_0^3)+\frac{b}{2}(x_1^2-x_0^2)+c(x_1-x_0)$$

Using the results from Chapter 2, we can obtain Table 1.
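Each entry of Table 1 can be checked by differentiating the right-hand side. A numerical version of that check in Python, with arbitrary constants a = 1.5, n = 3 and test point x = 0.6 (assumptions chosen so that every entry is defined there):

```python
import math


def central_difference(F, x, h=1e-5):
    """Symmetric estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


a, n = 1.5, 3   # illustrative constants
x = 0.6         # test point with 0 < x < a

table = [  # (f, F) pairs: F should satisfy F' = f
    (lambda t: t**n, lambda t: t**(n + 1) / (n + 1)),
    (lambda t: 1 / t, math.log),
    (lambda t: math.exp(a * t), lambda t: math.exp(a * t) / a),
    (lambda t: a**t, lambda t: a**t / math.log(a)),
    (lambda t: math.sin(a * t), lambda t: -math.cos(a * t) / a),
    (lambda t: math.cos(a * t), lambda t: math.sin(a * t) / a),
    (math.tan, lambda t: math.log(abs(1 / math.cos(t)))),
    (lambda t: 1 / math.tan(t), lambda t: math.log(abs(math.sin(t)))),
    (lambda t: 1 / math.cos(t)**2, math.tan),
    (lambda t: 1 / math.sin(t)**2, lambda t: -1 / math.tan(t)),
    (lambda t: 1 / math.sqrt(a * a - t * t), lambda t: math.asin(t / a)),
    (lambda t: 1 / (a * a + t * t), lambda t: math.atan(t / a) / a),
    (math.log, lambda t: t * math.log(t) - t),
]

for f, F in table:
    print(f"{central_difference(F, x):.8f}  {f(x):.8f}")
```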


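Example 4
(i)
$$\int(\sin x+\cos x)\,dx = \int\sin x\,dx+\int\cos x\,dx = -\cos x+\sin x+C$$
where C is some constant.
(ii)
$$\int_0^{1/2}\frac{1}{\sqrt{1-x^2}}\,dx = \sin^{-1}\left(\tfrac12\right)-\sin^{-1}(0) = \frac{\pi}{6}-0 = \frac{\pi}{6}$$
(iii)
$$\int_1^{10}\frac{1}{x}\,dx = \log_e 10-0 = 2.30\ldots$$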

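[Fig. 3.10: the area under f(x) extending to the left, from -∞ to b]
[Fig. 3.11: the area under f(x) extending to the right, from a to ∞]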

Exercise 1
Evaluate the following integrals and check your solutions by differentiating:

0) f x 3 dx (iii) f2/: x2

(ii) f :2 dx (iv) f sin 4y dy

Evaluate

(v) fT sin x dx (viii) {710g x dx


(vi) f1T tan x dx (ix) f2 dx

f" -2--2
o o J4-x 2
2a dx
(vii)
a a +x

The entries in Table 1 can be verified by differentiating the right-hand side.
Important applications occur where we need to integrate over an infinite piece of the x-axis; this is often the case in statistics, for example. Where we need to consider the left-hand side, as in Fig. 3.10, we write $\int_{-\infty}^b f(x)\,dx$. This gives an area from -∞ to b. If we wish to consider the right-hand side we write $\int_a^{\infty}f(x)\,dx$, see Fig. 3.11. Finally, we may wish to consider both sides, and then we write $\int_{-\infty}^{\infty}f(x)\,dx$. Such integrals have to be calculated with care, since they are defined as limits.

[Fig. 3.12: the area under y = 1/x^2 from x = 1 to x = X]

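Example 5
$$\int_1^{\infty}\frac{1}{x^2}\,dx$$
Consider the picture, Fig. 3.12. The area is $A = \int_1^X (1/x^2)\,dx$. Now a primitive is F(x) = -1/x, so
$$A = -\frac{1}{X}-\left(-\frac11\right) = 1-\frac{1}{X}$$
Now as $X \to \infty$, $1/X \to 0$, so we have
$$\int_1^{\infty}\frac{1}{x^2}\,dx = 1$$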

A useful notational device is to write $[F(x)]_a^b$ to mean F(b) - F(a). Thus if F is a primitive of f we have
$$\int_a^b f(x)\,dx = [F(x)]_a^b$$
So
$$\int_0^2(x^4+x^3+1)\,dx = \left[\frac{x^5}{5}+\frac{x^4}{4}+x\right]_0^2 = 6.4+4+2 = 12.4$$
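The bracket notation translates directly into code: evaluate a primitive at the two limits and subtract. The Python sketch below (helper names are illustrative) does this for the example just given and for Example 5, pushing the upper limit X outwards to show the area 1 - 1/X approaching 1. A crude midpoint-rule sum is included as an independent cross-check.

```python
def bracket(F, a, b):
    """[F(x)] from a to b, i.e. F(b) - F(a)."""
    return F(b) - F(a)


def midpoint_sum(f, a, b, n=100_000):
    """Numerical cross-check: rectangles sampled at the midpoints of n strips."""
    dx = (b - a) / n
    return dx * sum(f(a + (i + 0.5) * dx) for i in range(n))


def F1(x):
    return x**5 / 5 + x**4 / 4 + x     # primitive of x^4 + x^3 + 1


print(bracket(F1, 0, 2), midpoint_sum(lambda x: x**4 + x**3 + 1, 0, 2))  # both about 12.4


def F2(x):
    return -1 / x                       # primitive of 1/x^2


for X in (10, 100, 1000, 10**6):
    print(X, bracket(F2, 1, X))         # 1 - 1/X, tending to 1
```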
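Integrating step functions is very easy. Let
$$f(x) = \begin{cases} 0 & \text{if } x < -1 \\ 1 & \text{if } -1 \le x < 0 \\ 2 & \text{if } 0 \le x < 2 \\ & \text{if } 2 \le x \end{cases}$$
as sketched in Fig. 3.13. Then
$$\int_{-1}^{1}f(x)\,dx = \int_{-1}^{0}1\,dx+\int_0^1 2\,dx = [x]_{-1}^{0}+[2x]_0^1 = 0-(-1)+2\times 1-0 = 3$$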

[Fig. 3.13: the step function f(x), with steps at x = -1, 0 and 2]

Some important integrals which arise in the work on Fourier series involve the trigonometric functions. We give just one example. Recall two formulae from Chapter 1:
$$\sin(A+B) = \sin A\cos B+\cos A\sin B$$
$$\sin(A-B) = \sin A\cos B-\cos A\sin B$$
So
$$\sin(A+B)+\sin(A-B) = 2\sin A\cos B$$
Hence, for positive whole numbers n and m with $n \ne m$,
$$\int_0^{2\pi}\sin nx\cos mx\,dx = \int_0^{2\pi}\tfrac12\big(\sin(n+m)x+\sin(n-m)x\big)\,dx$$
$$= \frac12\left[-\frac{\cos(n+m)x}{n+m}-\frac{\cos(n-m)x}{n-m}\right]_0^{2\pi}$$
$$= \frac12\left(\frac{\cos 0}{n+m}+\frac{\cos 0}{n-m}-\frac{\cos 2(n+m)\pi}{n+m}-\frac{\cos 2(n-m)\pi}{n-m}\right) = 0$$
since $\cos 2k\pi = 1$ for any whole number k. If n = m,
$$\int_0^{2\pi}\sin nx\cos nx\,dx = \int_0^{2\pi}\tfrac12\sin 2nx\,dx = 0$$
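The orthogonality result can be confirmed numerically. The sketch below (Python) applies the trapezoidal rule, which averages the two ends of each strip, over one full period for a few pairs of positive whole numbers n and m. For periodic integrands like these the rule is very accurate, so the results are zero to within rounding. The pairs and the number of strips are arbitrary choices.

```python
import math


def trapezoid(f, a, b, n=2000):
    """Trapezoidal rule with n equal strips."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


results = {}
for n, m in [(1, 2), (3, 5), (4, 1), (2, 2), (5, 5)]:
    results[(n, m)] = trapezoid(lambda x: math.sin(n * x) * math.cos(m * x), 0.0, 2 * math.pi)
    print(n, m, f"{results[(n, m)]:.2e}")
```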
Exercise 2
Evaluate the following integrals
(i) J~1T sin 3x cos 4x dx
(ii) J~ e- 2x dx, you can assume e- 2x tends to zero as x tends to infinity
(iii)J; e- 2x dx
(iv) J~f(x) dx where
f(x) = 1 for 0 ≤ x ≤ 1
     = 2 for 1 ≤ x ≤ 2
     = 3 for 2 ≤ x ≤ 3
     ...
     = i for i - 1 ≤ x ≤ i, i = 1, 2, 3, ...



[Fig. 3.14: the strip A_i under the curve, approximated by a rectangle with a triangle on top]

3 NUMERICAL INTEGRATION (QUADRATURE)

Our aim with these examples was to demonstrate that the evaluation of integrals involves much mathematical technique and a lot of practice. However, even with all of these techniques there are still functions whose integrals are well behaved but for which there are no easily written-down formulae. Fortunately there are books (Gradshteyn and Ryzhik) which tabulate the values of such integrals. This section will discuss some of the methods used in calculating such integrals.
Since the idea of integration is very close to that of summation (Σ is the Greek equivalent of "S", and ∫ is a deformed "S"), it is not surprising that the numerical evaluation of integrals is a well developed subject.
Clearly the original definition could be used to evaluate an integral numerically, but we can do better. Divide $[a, b]$ into $n$ equal strips with $a = x_0, x_1, \ldots, x_n = b$ and $x_i - x_{i-1} = (b - a)/n$. If we look at Fig. 3.14 we can see that, rather than just using

$$\sum_{i=0}^{n-1} \frac{b-a}{n} f(x_i),$$

we can approximate the area of $A_i$ better by adding the top triangle. So we estimate $A_i$ by

$$f(x_i)\,\frac{b-a}{n} + \frac{b-a}{n} \cdot \frac{f(x_{i+1}) - f(x_i)}{2}.$$

When we add up all these contributions we get the following expression:

$$\text{Area of } A \approx \frac{b-a}{n}\left\{\tfrac{1}{2} f(a) + f(x_1) + \cdots + f(x_{n-1}) + \tfrac{1}{2} f(b)\right\}.$$
This is called the trapezoidal rule. Just to illustrate that it is better, a simple program was run on a home computer to evaluate $\int_0^1 x^3\,dx$ (exact value $0.25$). The results are tabulated in Table 2. They show that even for small $n$ the trapezoidal rule gives a good approximation. This is not, however, true for the simple formula.
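We do not have the original home-computer program. The following is our own minimal Python sketch of the two formulae applied to $\int_0^1 x^3\,dx$, and its output can be compared with Table 2. The function names are illustrative:

```python
def simple_rule(f, a, b, n):
    """'Simple' formula: (b - a)/n * (f(x_0) + f(x_1) + ... + f(x_{n-1}))."""
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

def trapezoidal_rule(f, a, b, n):
    """Trapezoidal rule: (b - a)/n * (f(a)/2 + f(x_1) + ... + f(x_{n-1}) + f(b)/2)."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))

def cube(x):
    return x ** 3

for n in (4, 16, 64, 256):
    print(n, simple_rule(cube, 0, 1, n), trapezoidal_rule(cube, 0, 1, n))
```

For this particular integrand one can show that the trapezoidal estimate equals $1/4 + 1/(4n^2)$, which explains how quickly it settles at $0.25$.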
An alternative way to consider these two approximations is to think of them as approximations to the graph. The simple version gives a step function approximation, see Fig. 3.15. The trapezoidal rule gives a sequence of straight lines, as in Fig. 3.16. We can do even better by dividing the whole interval into $2n$ strips and using quadratic curves to join the points in threes, as in Fig. 3.17. Rather than go through the mathematical details, we will quote the approximation:

$$\int_a^b f(x)\,dx \approx \frac{b-a}{6n}\bigl(f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 2f(x_{2n-2}) + 4f(x_{2n-1}) + f(x_{2n})\bigr),$$

where $x_0 = a$, $x_{2n} = b$ and $x_i - x_{i-1} = (b - a)/2n$.

Table 2 (estimates of $\int_0^1 x^3\,dx$)

n       Simple formula    Trapezoidal formula
4       0.1406...         0.2656...
16      0.2197...         0.2509...
64      0.2422...         0.2500...
256     0.2480...         0.2500...

Fig. 3.15

Fig. 3.16

Fig. 3.17
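Simpson's rule as quoted above can be sketched in a few lines of Python. The sketch takes $2n$ strips of width $h = (b-a)/2n$, so the factor $(b-a)/6n$ equals $h/3$. The weights run $1, 4, 2, 4, \ldots, 2, 4, 1$. The function name is ours:

```python
def simpson_rule(f, a, b, n):
    """Simpson's rule with 2n equal strips of width h = (b - a) / (2n):
    (h/3) * (f(x0) + 4f(x1) + 2f(x2) + ... + 4f(x_{2n-1}) + f(x_{2n}))."""
    h = (b - a) / (2 * n)
    total = f(a) + f(b)
    for i in range(1, 2 * n):
        weight = 4 if i % 2 == 1 else 2   # odd points get 4, even interior points get 2
        total += weight * f(a + i * h)
    return h * total / 3

print(simpson_rule(lambda x: x ** 3, 0, 1, 2))   # 0.25
```

Because Simpson's rule is exact for any polynomial of degree three or less, even $n = 1$ gives $0.25$ for this integrand.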
Table 3 (estimates of $\int_0^1 \tan(\pi x/4)\,dx$)

n       Simple       Trap.        Simpson
4       0.3203...    0.4453...    0.4413...
16      0.4102...    0.4415...    0.4412...
64      0.4334...    0.4412...    0.4412...

This method is called Simpson's rule. To illustrate the power of this technique: when asked to evaluate $\int_0^1 x^3\,dx$ it needed only $n = 2$ to obtain the estimate $0.25$ (in fact Simpson's rule is exact for any cubic). This is considerably better than the other methods. We also tried $\int_0^1 \tan(\pi x/4)\,dx$ by the three approaches. In this case the exact value is given by

$$\int_0^1 \tan\frac{\pi x}{4}\,dx = \frac{4}{\pi}\log_e \sec\frac{\pi}{4} = 0.4412\ldots$$

We tabulate the results in Table 3. Once again the advantage of Simpson's rule is clearly seen.
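A self-contained Python sketch of the comparison for $\int_0^1 \tan(\pi x/4)\,dx$ follows. As an assumption of ours, all three rules here use the same number $n$ of strips, so Simpson's rule needs $n$ even. Under that convention the first row matches Table 3 to the digits shown. The exact value $(4/\pi)\log_e\sec(\pi/4)$ simplifies to $(2/\pi)\log_e 2$:

```python
import math

def simple(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

def trapezoid(f, a, b, n):
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

def simpson(f, a, b, n):
    # n strips (n even) with weights 1, 4, 2, 4, ..., 4, 1
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return h * s / 3

f = lambda x: math.tan(math.pi * x / 4)
exact = (4 / math.pi) * math.log(1 / math.cos(math.pi / 4))   # equals (2/pi) ln 2

for n in (4, 16, 64):
    print(n, simple(f, 0, 1, n), trapezoid(f, 0, 1, n), simpson(f, 0, 1, n), exact)
```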
When dealing with real data we do not always have an explicit formula which could be integrated. However, we do have the values at a finite number of points, and so numerical techniques can be used.
Clearly this subject has only been touched upon here. Because of its importance in computing there is a large library of methods in use, often referred to as methods for quadrature.

4 DOUBLE INTEGRATION

In the same way that we considered differentiation of functions of two variables, we can integrate such functions. We normally do this by a process of repeated integration:

$$\int_a^b \left(\int_{c(y)}^{d(y)} f(x, y)\,dx\right) dy, \qquad \text{where } c(y) \text{ and } d(y) \text{ are functions of } y.$$

We interpret this expression as follows. First integrate $f(x, y)$ considering $y$ as a constant; then $\int_{c(y)}^{d(y)} f(x, y)\,dx = F(y)$ is a function of $y$; finally calculate $\int_a^b F(y)\,dy$. Normally it does not matter in which order we do this process, as long as we watch the limits carefully. If $d(y)$ and $c(y)$ are constants, $d$ and $c$ respectively, then

$$\int_a^b \left(\int_c^d f(x, y)\,dx\right) dy = \int_c^d \left(\int_a^b f(x, y)\,dy\right) dx.$$

We write this as $\int_a^b \int_c^d f(x, y)\,dx\,dy$.



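Example 6
Let $f(x, y) = xy$. Then

$$\int_a^b \int_c^d xy\,dx\,dy = \int_a^b \left[\frac{x^2 y}{2}\right]_c^d dy = \int_a^b \frac{(d^2 - c^2)\,y}{2}\,dy = \frac{(d^2 - c^2)(b^2 - a^2)}{4}.$$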
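A small Python check that the order of integration does not matter for constant limits follows. The helper `inner_then_outer` is our own name. It integrates over the first argument with the second held fixed, then over the second, using the midpoint rule in each direction. We compare the two orders with the closed form $(d^2 - c^2)(b^2 - a^2)/4$ obtained by finishing the $y$-integration. The limit values are arbitrary illustrative choices:

```python
def inner_then_outer(f, x_range, y_range, n=400):
    """Integrate f over x first (y held fixed), then over y, using the
    midpoint rule with n points in each direction."""
    (x0, x1), (y0, y1) = x_range, y_range
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for j in range(n):
        y = y0 + (j + 0.5) * hy
        F_y = hx * sum(f(x0 + (i + 0.5) * hx, y) for i in range(n))  # F(y)
        total += F_y * hy
    return total

a, b, c, d = 1.0, 2.0, 0.0, 3.0
f = lambda x, y: x * y
dx_first = inner_then_outer(f, (c, d), (a, b))                    # x first, then y
dy_first = inner_then_outer(lambda y, x: f(x, y), (a, b), (c, d))  # y first, then x
closed_form = (d**2 - c**2) * (b**2 - a**2) / 4
print(dx_first, dy_first, closed_form)
```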

Example 7
Let $f(x, y) = h$, a constant. Then

$$\int_a^b \int_c^d h\,dx\,dy = \int_a^b [hx]_c^d\,dy = [h(d - c)y]_a^b = h(d - c)(b - a).$$

Example 7 is worth noting since it illustrates an important fact: just as the single integral gives an area, the double integral gives a volume. This comment needs to be explained. The region $R$ in Fig. 3.18 determined by $a \le x \le b$ and $c \le y \le d$ is just a rectangle in the plane with area $(b - a)(d - c)$. If we now erect a solid of height $h$ on this rectangle, we have a solid with volume $(b - a)(d - c)h$, as in Fig. 3.19.
This gives an alternative definition which is more like our original definition for the single integral (it is also more symmetrical).
Let $f(x, y)$ be a function of two variables $x$ and $y$, and let $R$ be a plane region in the $(x, y)$-plane, as illustrated in Fig. 3.20. We divide the region $R$ up into little rectangles of size $\delta x \cdot \delta y$, where $\delta x$ and $\delta y$ are small. The height over each will be approximately $f(x, y)$, so we have a column of volume $f(x, y)\,\delta x\,\delta y$. Now we sum over all the bits to get the volume

$$\iint_R f(x, y)\,dx\,dy.$$
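The "sum of thin columns" description translates directly into code. In this sketch (the function name is ours) the height of each column is taken at the centre of its little rectangle. This is one reasonable choice, not the only one. For a constant height it recovers the volume of Example 7. For a curved top, here $f(x, y) = x^2 + y$ over $0 \le x \le 2$, $0 \le y \le 1$, whose exact value $11/3$ follows by repeated integration, the sum approaches the integral as the rectangles shrink:

```python
def column_volume_sum(f, x0, x1, y0, y1, n):
    """Sum f(x, y) * dx * dy over an n-by-n grid of small rectangles covering
    [x0, x1] x [y0, y1], taking the height at the centre of each rectangle."""
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = x0 + (i + 0.5) * dx
            y = y0 + (j + 0.5) * dy
            total += f(x, y) * dx * dy   # volume of one thin column
    return total

# Constant height 5 over a 2-by-3 rectangle: volume 5 * 2 * 3 = 30.
print(column_volume_sum(lambda x, y: 5.0, 0, 2, 0, 3, 10))
# Curved top: the sums approach 11/3 = 3.666...
for n in (10, 100):
    print(n, column_volume_sum(lambda x, y: x**2 + y, 0, 2, 0, 1, n))
```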

There are technical difficulties in this two-dimensional case because the boundary of the region $R$ can be quite complicated, which makes it harder to choose the limits of the integral. If the region $R$ is rectangular, as in the previous examples, no problems usually occur. We shall avoid complicated cases.
Example 8
(i) $\int_0^1 \int_0^2 (x^2 + y)\,dx\,dy$
(ii) $\int_1^2 \int_{-1}^{1} \frac{x^2}{y}\,dx\,dy$
(iii) $\int_0^1 \int_y^{y^2} x\,dx\,dy$

Fig. 3.18    Fig. 3.19

Fig. 3.20

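Solutions
(i)

$$\int_0^1 \left(\int_0^2 (x^2 + y)\,dx\right) dy = \int_0^1 \left[\frac{x^3}{3} + xy\right]_0^2 dy$$

(remember $y$ is a constant in this first integral)

$$= \int_0^1 \left(\frac{8}{3} + 2y - 0 - 0\right) dy = \left[\frac{8y}{3} + y^2\right]_0^1 = \frac{8}{3} + 1 = \frac{11}{3}.$$

(ii)

$$\int_1^2 \left(\int_{-1}^{1} \frac{x^2}{y}\,dx\right) dy = \int_1^2 \left[\frac{x^3}{3y}\right]_{-1}^{1} dy = \int_1^2 \left(\frac{1}{3y} - \frac{-1}{3y}\right) dy = \int_1^2 \frac{2}{3y}\,dy = \left[\frac{2}{3}\log y\right]_1^2 = \frac{2}{3}\log 2.$$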

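(iii)

$$\int_0^1 \left(\int_y^{y^2} x\,dx\right) dy = \int_0^1 \left[\frac{x^2}{2}\right]_y^{y^2} dy = \int_0^1 \frac{y^4 - y^2}{2}\,dy = \left[\frac{y^5}{10} - \frac{y^3}{6}\right]_0^1 = \frac{1}{10} - \frac{1}{6} = \frac{3 - 5}{30} = -\frac{1}{15}.$$

(The answer is negative because, for $0 < y < 1$, the upper limit $y^2$ is smaller than the lower limit $y$.)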
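Repeated integration with limits that depend on $y$ can be checked numerically in the same spirit. The sketch below uses the midpoint rule in each direction. The name `repeated_integral` and the choice of 400 points are ours. When the inner limits run "backwards", the step size is negative and the sign comes out automatically:

```python
import math

def repeated_integral(f, y0, y1, x_lo, x_hi, n=400):
    """Approximate the integral from y0 to y1 of (integral from x_lo(y) to
    x_hi(y) of f(x, y) dx) dy, using the midpoint rule in each direction.
    The inner limits may depend on y and may run backwards (x_hi < x_lo)."""
    hy = (y1 - y0) / n
    total = 0.0
    for j in range(n):
        y = y0 + (j + 0.5) * hy
        lo, hi = x_lo(y), x_hi(y)
        hx = (hi - lo) / n              # negative if the limits run backwards
        inner = hx * sum(f(lo + (i + 0.5) * hx, y) for i in range(n))
        total += inner * hy
    return total

# Example 8(iii): x from y to y**2, then y from 0 to 1. Exact value -1/15.
part_iii = repeated_integral(lambda x, y: x, 0, 1, lambda y: y, lambda y: y * y)
# Example 8(ii): x**2 / y with x from -1 to 1, y from 1 to 2. Exact value (2/3) ln 2.
part_ii = repeated_integral(lambda x, y: x * x / y, 1, 2, lambda y: -1.0, lambda y: 1.0)
print(part_iii, -1 / 15)
print(part_ii, 2 / 3 * math.log(2))
```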
It is sometimes quite useful to be able to sketch the region defined by the limits of the integration. If we take Example 8(iii) we obtain Fig. 3.21, in which $R$ is the shaded region.
The technical difficulties of evaluating double integrals are even greater than for single integrals. Once again there is a vast body of numerical methods available to do this, but this is not the place to investigate them.
Just as we have extended the single integral to double integrals, there is no difficulty (in theory, that is) in extending the idea further. We can similarly define triple integrals

$$\iiint f(x, y, z)\,dx\,dy\,dz.$$

This is relevant to the real world as many real situations lead us to consider
functions of three variables. Once again this drifts beyond the scope of this
book.
Recall that the object of this book is to give you familiarity with these
mathematical concepts, but not to make you feel that you can treat them
with contempt.

5 LINE INTEGRALS

In some applications it is useful to think of an integral not just along a straight line, like the $x$-axis, but along a curve. Think of $A$ and $B$ in Fig. 3.22 as two towns, and of the plane as a surface of uneven quality. Then the energy required in going from $A$ to $B$ will depend on the route chosen.
In general, given a pair of functions $P(x, y)$ and $Q(x, y)$ and a curve $C$, we define

$$\int_C P(x, y)\,dx + Q(x, y)\,dy,$$

the "line integral". The reason for this formulation is that the pair is often thought of as a vector, see Chapter 5. To simplify the description at this stage we will assume that the curve $C$ is given by $y = f(x)$ and that $f$ is either increasing or decreasing, see Fig. 3.23. Such a function is called monotonic. Technically this means that either $f(x_1) \le f(x_2)$ whenever $x_1 < x_2$, or $f(x_1) \ge f(x_2)$ whenever $x_1 < x_2$: the curve cannot both increase and decrease.

Fig. 3.21

Fig. 3.22

Fig. 3.23

Example 9
(i) $y = 3x + 1$: if $x_1 < x_2$ then $3x_1 + 1 < 3x_2 + 1$. So $3x + 1$ is monotonic.
(ii) $y = 1/x$, $x > 0$: if $x_1 < x_2$ then $1/x_1 > 1/x_2$, and so $1/x$ is monotonic.
(iii) $y = x^2$ is not monotonic. If $x_1 = \frac{1}{2}$ and $x_2 = 1$ then $(\frac{1}{2})^2 < 1^2$. But if $x_1 = -1$ and $x_2 = -\frac{1}{2}$, then $x_1 < x_2$ and $(-1)^2 > (-\frac{1}{2})^2$.
In this situation the line integral is easily calculated by substituting $y = f(x)$ and $dy = f'(x)\,dx$ (that is, $dy = (df/dx)\,dx$) into the integrand, so that

$$\int_C P(x, y)\,dx + Q(x, y)\,dy = \int_{x_b}^{x_e} \bigl(P(x, f(x)) + Q(x, f(x))\,f'(x)\bigr)\,dx,$$

Fig. 3.24

where $x_b$ is the value of $x$ at the beginning of $C$ and $x_e$ its value at the end. We now have an integral with just one variable.
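Example 10
See Fig. 3.24, where $C$ is given by $y = 3x + 1$ with $x_b = 2$ and $x_e = 4$. Let $P(x, y) = x + y$ and $Q(x, y) = xy$. Then, since $dy/dx = 3$,

$$\int_C P(x, y)\,dx + Q(x, y)\,dy = \int_2^4 \bigl((x + 3x + 1) + x(3x + 1) \cdot 3\bigr)\,dx = \int_2^4 (7x + 1 + 9x^2)\,dx = \left[\frac{7x^2}{2} + x + 3x^3\right]_2^4.$$

So the required integral is

$$\frac{7 \times 16}{2} + 4 + 3 \times 64 - \frac{7 \times 4}{2} - 2 - 3 \times 8 = 212.$$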
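The reduction to a single integral can be coded directly. In this sketch (the function name and its arguments are ours) the caller supplies $f$ and $f'$ for the curve. The resulting one-variable integral is evaluated by Simpson's rule, which is exact here because the reduced integrand $9x^2 + 7x + 1$ is quadratic:

```python
def line_integral_graph(P, Q, f, df, xb, xe, n=10):
    """Line integral of P dx + Q dy along y = f(x) from x = xb to x = xe, reduced
    to the integral of P(x, f(x)) + Q(x, f(x)) f'(x) dx and evaluated by
    Simpson's rule with n (even) strips."""
    g = lambda x: P(x, f(x)) + Q(x, f(x)) * df(x)
    h = (xe - xb) / n
    s = g(xb) + g(xe) + sum((4 if i % 2 else 2) * g(xb + i * h) for i in range(1, n))
    return h * s / 3

value = line_integral_graph(P=lambda x, y: x + y, Q=lambda x, y: x * y,
                            f=lambda x: 3 * x + 1, df=lambda x: 3.0,
                            xb=2.0, xe=4.0)
print(value)   # 212 up to rounding
```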

Similarly, if the curve is given in the form $x = f(y)$, we can calculate the integral. Notice that if for the curve we choose $y = 0$, then $dy/dx = 0$ and so the integral is just our normal integral, $\int_{x_b}^{x_e} P(x, 0)\,dx$.
One of the major applications of these ideas is in gravitational and electromagnetic theory. If we allow a particle to move, it is natural to integrate along the path travelled by the particle.
The restriction on the nature of the curve is quite limiting, but many curves can be split into sections each of which satisfies the monotonic hypothesis. If we take the triangle $ABC$ in Fig. 3.25, we can split it into three pieces $AB$, $BC$ and $CA$, all of which are monotonic. Similarly with a finite chunk of sine wave, as in Fig. 3.26.
Exercise 3
Split the circle into a number of monotonic pieces.
Example 11
Now suppose we integrate $P(x, y) = x + y$ and $Q(x, y) = 0$ around the triangle $OAB$ in Fig. 3.25, with $O$ at the origin, $A$ at $(1, 0)$ and $B$ at $(0, 1)$. This curve is not monotonic, but it can be split into three parts: $\int_{OA} P(x, y)\,dx$, $\int_{AB} P(x, y)\,dx$ and $\int_{BO} P(x, y)\,dx$. In fact there is a special notation, $\oint_C P(x, y)\,dx + Q(x, y)\,dy$, to indicate integration round a closed loop $C$. Notice that the simplification to an elementary integral only holds on each monotonic bit. To illustrate:

$$\int_{OA} (x + y)\,dx = \int_0^1 x\,dx = \frac{1}{2} \qquad (y = 0 \text{ on } OA),$$

$$\int_{AB} (x + y)\,dx = \int_1^0 (x + 1 - x)\,dx = \int_1^0 1\,dx = -1 \qquad (y = 1 - x \text{ on } AB),$$

$$\int_{BO} (x + y)\,dx = 0 \qquad (dx = 0 \text{ on } BO),$$

therefore

$$\oint_C (x + y)\,dx = \frac{1}{2} - 1 + 0 = -\frac{1}{2}.$$

Fig. 3.25    Fig. 3.26 (monotonic segments)
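As an independent check, each side of the triangle can be treated as a straight segment traced by a parameter $t$ from $0$ to $1$. This is our own swap-in for the reduction to $x$-integrals used above. The corner coordinates $O = (0, 0)$, $A = (1, 0)$, $B = (0, 1)$ are read from the example. The helper `segment_integral` is hypothetical:

```python
def segment_integral(P, Q, start, end, n=1000):
    """Integral of P dx + Q dy along the straight segment start -> end, traced as
    (x, y) = start + t (end - start) for 0 <= t <= 1, midpoint rule in t."""
    (x0, y0), (x1, y1) = start, end
    dxdt, dydt = x1 - x0, y1 - y0
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y = x0 + t * dxdt, y0 + t * dydt
        total += (P(x, y) * dxdt + Q(x, y) * dydt) * h
    return total

O, A, B = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
P = lambda x, y: x + y
Q = lambda x, y: 0.0
pieces = [segment_integral(P, Q, O, A),   # expected 1/2
          segment_integral(P, Q, A, B),   # expected -1
          segment_integral(P, Q, B, O)]   # expected 0
print(pieces, sum(pieces))
```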

One interesting case of a line integral crops up when we look at the length of a curve. Suppose $s$ is the distance along a curve $C$; then we might be interested in $\int_C P(s)\,ds$, where $P(s)$ is perhaps a density or a cost. We can see how to evaluate these integrals with the aid of Fig. 3.27. Clearly, for small $\delta x$, $\delta y$,

$$(\delta s)^2 \approx (\delta x)^2 + (\delta y)^2,$$

and thus

$$ds = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx.$$

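Example 12
Let us work out the length of the circle $x^2 + y^2 = 1$ in the first quadrant, as in Fig. 3.28. The arc length is $\int_C ds = \int_0^1 \sqrt{1 + (dy/dx)^2}\,dx$. Since $dy/dx = -x/y$ and $x^2 + y^2 = 1$, we have

$$\int_C ds = \int_0^1 \sqrt{1 + \frac{x^2}{y^2}}\,dx = \int_0^1 \frac{dx}{y} = \int_0^1 \frac{dx}{\sqrt{1 - x^2}} = \frac{\pi}{2}.$$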
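The relation $(\delta s)^2 \approx (\delta x)^2 + (\delta y)^2$ suggests a direct numerical check of Example 12: add up the lengths of many short chords of the quarter circle. This sketch is ours. It also sidesteps the fact that the integrand $1/\sqrt{1 - x^2}$ becomes infinite at $x = 1$:

```python
import math

def quarter_circle_length(n):
    """Sum of the chord lengths sqrt(dx**2 + dy**2) between n + 1 points
    of y = sqrt(1 - x**2) taken at x = 0, 1/n, 2/n, ..., 1."""
    xs = [k / n for k in range(n + 1)]
    ys = [math.sqrt(1.0 - x * x) for x in xs]
    return sum(math.hypot(xs[k + 1] - xs[k], ys[k + 1] - ys[k]) for k in range(n))

for n in (10, 100, 10000):
    print(n, quarter_circle_length(n), math.pi / 2)
```

The chord sums always fall slightly short of the arc and approach $\pi/2$ as $n$ grows.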

Fig. 3.27    Fig. 3.28

One last method of calculation which is of value is to represent the curve "parametrically", that is, with a "parameter". In this case we define the curve $C$ by a new variable $t$, so that $C$ is given by two functions $x = f(t)$ and $y = g(t)$. This is an extremely valuable technique for certain curves. The circle of radius $r$, for example, is very neatly described by $x = r\cos t$, $y = r\sin t$. Using the properties of trigonometric functions we have

$$x^2 + y^2 = r^2\cos^2 t + r^2\sin^2 t = r^2.$$

In this situation we can trace the curve by plotting values of $t$. An ellipse is given by

$$x = a\cos t, \qquad y = b\sin t.$$

To evaluate the line integral we get

$$\int_C P(x, y)\,dx + Q(x, y)\,dy = \int_{t_0}^{t_1} \left(P(f(t), g(t))\,\frac{df}{dt} + Q(f(t), g(t))\,\frac{dg}{dt}\right) dt,$$

where $t_0$ is the beginning of the curve and $t_1$ is the end.
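The parametric formula can be sketched as a general Python routine. The caller supplies $f$, $g$ and their derivatives. The function name, the midpoint rule in $t$ and the test case are our own choices. As a test we take the circle of radius $r = 2$ once round with $P = 0$, $Q = x$. The integral is then $\int_0^{2\pi} r^2\cos^2 t\,dt = \pi r^2$, which is easy to confirm by hand:

```python
import math

def parametric_line_integral(P, Q, f, g, df, dg, t0, t1, n=2000):
    """Integral of P dx + Q dy with x = f(t), y = g(t) for t from t0 to t1,
    computed as the integral of (P(f, g) f'(t) + Q(f, g) g'(t)) dt by the midpoint rule."""
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * h
        x, y = f(t), g(t)
        total += (P(x, y) * df(t) + Q(x, y) * dg(t)) * h
    return total

r = 2.0
value = parametric_line_integral(P=lambda x, y: 0.0, Q=lambda x, y: x,
                                 f=lambda t: r * math.cos(t), g=lambda t: r * math.sin(t),
                                 df=lambda t: -r * math.sin(t), dg=lambda t: r * math.cos(t),
                                 t0=0.0, t1=2 * math.pi)
print(value, math.pi * r ** 2)
```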
Exercise 4
(i) Evaluate $\int (x + y)$ along the curve $x^2 + y^2 = 1$ as in example 3.
(ii) Evaluate $\int ds$ around the triangle in example 2. Does the answer agree with your expectation?
(iii) The curve $C$ has the parametric form $x = 2at$, $y = at^2$. Sketch this curve for $a = 1$.

Green's Theorem and Application


While line integrals are interesting in themselves, we shall expand just one facet of their application, which uses the connection between double integrals and line integrals. This connection is summed up in Green's Theorem, named for the famous Nottingham mathematician George Green.

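Green's Theorem
Suppose we have a closed curve $C$, traversed anticlockwise, and two functions $P(x, y)$ and $Q(x, y)$. Suppose that $P$ and $Q$, together with their first partial derivatives, are continuous on the region $R$ enclosed by $C$ and on the boundary curve $C$, as in Fig. 3.29. Then

$$\iint_R \left(\frac{\partial P}{\partial y} - \frac{\partial Q}{\partial x}\right) dx\,dy = -\oint_C P(x, y)\,dx + Q(x, y)\,dy.$$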

Fig. 3.29

This gives us a nice relationship between the surface $R$ and its boundary $C$.
This is a handy theorem for cases where we have a hard double integral which we can calculate via the line integral, or vice versa. The area enclosed in Fig. 3.29 is $A$, where

$$A = \iint_R dx\,dy.$$

From Green's theorem with $P(x, y) = y$, $Q(x, y) = 0$,

$$A = \iint_R dx\,dy = -\oint_C y\,dx,$$

and similarly, with $P(x, y) = 0$, $Q(x, y) = x$,

$$A = \iint_R dx\,dy = \oint_C x\,dy.$$

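Example 13
Let $C$ be the ellipse $x^2 + 4y^2 = 9$. Then if $P(x, y) = 3x - y$ and $Q(x, y) = x + 2y$,

$$\oint_C (3x - y)\,dx + (x + 2y)\,dy = -\iint_R (-1 - 1)\,dx\,dy = 2\iint_R dx\,dy = 2(\text{area of ellipse}) = 9\pi,$$

since the ellipse has semi-axes $3$ and $\frac{3}{2}$, and hence area $\frac{9\pi}{2}$.
(Actually we just looked up the area of the ellipse: the integral is not that simple!)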
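Example 13 can be confirmed numerically by tracing the ellipse anticlockwise as $x = 3\cos t$, $y = \tfrac{3}{2}\sin t$ and evaluating the line integral directly. The helper name and the midpoint rule are our choices. The same routine also checks the area formula $A = \oint_C x\,dy$:

```python
import math

def closed_curve_integral(P, Q, x, y, dx, dy, n=4000):
    """Integral of P dx + Q dy once round a closed curve (x(t), y(t)),
    0 <= t <= 2*pi, by the midpoint rule in t."""
    h = 2 * math.pi / n
    return sum((P(x(t), y(t)) * dx(t) + Q(x(t), y(t)) * dy(t)) * h
               for t in ((i + 0.5) * h for i in range(n)))

# The ellipse x^2 + 4y^2 = 9, traced anticlockwise.
x = lambda t: 3 * math.cos(t)
y = lambda t: 1.5 * math.sin(t)
dx = lambda t: -3 * math.sin(t)
dy = lambda t: 1.5 * math.cos(t)

lhs = closed_curve_integral(lambda X, Y: 3 * X - Y, lambda X, Y: X + 2 * Y, x, y, dx, dy)
area_via_x_dy = closed_curve_integral(lambda X, Y: 0.0, lambda X, Y: X, x, y, dx, dy)
print(lhs, 9 * math.pi)
print(area_via_x_dy, math.pi * 3 * 1.5)
```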

Numerical Integration
To evaluate a double integral numerically we proceed in much the same way as for single integrals. We cover the surface of interest with a mesh, as in Fig. 3.30, and evaluate the function $P(x_i, y_j)$ at each mesh point $(x_i, y_j)$.

Fig. 3.30    Fig. 3.31

With mesh points $x_1, x_2, \ldots, x_n$ and $y_1, \ldots, y_n$, the integral is approximately

$$\iint_R P(x, y)\,dx\,dy \approx \sum_{\text{all cells}} P(x_i, y_j)\,hk,$$

where $h$ and $k$ are the length and breadth of the mesh cells.
Such integrals are often needed, for example in calculating the gravimetric effects of a mass.
Since we have a two-dimensional problem with, say, $n$ $x$-points and $n$ $y$-points, we need $n^2$ function evaluations, which for large $n$ can be expensive. One way around this is to use Green's theorem and evaluate the line integral instead of the double integral. The line integral can be written as a combination of simple one-dimensional integrals, which gives considerable savings in computation.
Suppose we wish to evaluate

$$I = \iint_R x\,dx\,dy$$

over the region bounded by the curve $C$ defined by

$$r = 2a(1 + \cos\theta),$$

with $r^2 = x^2 + y^2$ and $\theta = \tan^{-1}(y/x)$, as in Fig. 3.31.
If we tried this numerically we could either construct a mesh over the surface or note that, with

$$P(x, y) = 1 \quad \text{and} \quad Q(x, y) = -\frac{x^2}{2},$$

Green's theorem gives

$$-\oint_C \left(dx - \frac{x^2}{2}\,dy\right) = \iint_R x\,dx\,dy,$$
converting the problem to a single dimension! A nice practical description of these ideas occurs in the pioneering paper of King Hubbert (1948), and they are still used today, even with cheap and fast computing. Do bear in mind that if we choose a fine enough mesh we can saturate even the largest computing facility.
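To make the comparison concrete, here is a rough Python sketch of our own that evaluates $I$ for the cardioid both ways, taking $a = 1$ for illustration. It is not Hubbert's procedure. The line-integral version uses Green's theorem as stated earlier, $\iint_R (\partial P/\partial y - \partial Q/\partial x)\,dx\,dy = -\oint_C P\,dx + Q\,dy$, with $P = 1$ and $Q = -x^2/2$. It needs a single loop over $\theta$, with $x = r\cos\theta$ and $y = r\sin\theta$. The mesh version needs a double loop. By our own calculation, not given in the text, the exact value is $10\pi a^3$: the cardioid's area $6\pi a^2$ times its centroid $x = 5a/3$.

```python
import math

A = 1.0  # the constant a in r = 2a(1 + cos(theta)), chosen for illustration

def via_line_integral(n=2000):
    """Minus the closed-curve integral of P dx + Q dy, with P = 1 and Q = -x**2/2,
    taken once round the cardioid; one loop over theta."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * h
        r = 2 * A * (1 + math.cos(th))
        dr = -2 * A * math.sin(th)                  # dr/dtheta
        x = r * math.cos(th)
        dx = dr * math.cos(th) - r * math.sin(th)   # dx/dtheta
        dy = dr * math.sin(th) + r * math.cos(th)   # dy/dtheta
        total += -(1.0 * dx - 0.5 * x * x * dy) * h
    return total

def via_mesh(n=600):
    """Sum x * h * k over mesh-cell centres lying inside the cardioid (n*n cells)."""
    x0, x1, y0, y1 = -1.0 * A, 4.5 * A, -3.0 * A, 3.0 * A   # box containing the curve
    hx, ky = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            y = y0 + (j + 0.5) * ky
            if math.hypot(x, y) <= 2 * A * (1 + math.cos(math.atan2(y, x))):
                total += x * hx * ky
    return total

line_value = via_line_integral()   # 2000 evaluations
mesh_value = via_mesh()            # 360000 evaluations
print(line_value, mesh_value, 10 * math.pi * A ** 3)
```

The line integral is essentially exact with a few thousand evaluations. The mesh needs hundreds of thousands of evaluations and is still only accurate to a couple of decimal places, which is the point of the paragraph above.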

6 DIFFERENTIAL EQUATIONS
Now that we have learnt to integrate and differentiate, it is time to consider differential equations. These are simply equations involving derivatives; $dy/dx = x$ is such an equation. Essentially, in such a situation, we have determined the slope of the curve at all points $x$. Can we recover a function $y = f(x)$ so that $dy/dx = x$? This example is easy! Consider $y = f(x)$. Then

$$\int f'(x)\,dx = f(x),$$

$$\therefore\ f(x) = \int x\,dx = \frac{x^2}{2} + c,$$

where $c$ is some constant. Thus any function $y = x^2/2 + c$ is a solution.
The reason for the indeterminacy of the solution is clear if we draw a graph, Fig. 3.32. All the curves have the same slope for a particular value of $x$. Hence they will all satisfy the equation. (Solutions are sometimes called flow lines.)
Clearly any equation of the form

$$\frac{dy}{dx} = f(x)$$

can be solved in a similar manner. That is, we write the solution as

$$y = \int f(x)\,dx.$$

In real problems we frequently have boundary conditions, i.e. in addition to the equation we also know the behaviour of $y$ at some particular point. Take our example $dy/dx = x$, and suppose we also know that $y = 1$ at $x = 0$. From $y = x^2/2 + c$ we see that $y = c$ when $x = 0$. So the only solution satisfying this extra condition is $y = x^2/2 + 1$.
If it is possible to write the equation in the form $h(y)\,dy/dx = g(x)$, then we can solve it in the form

$$\int h(y)\,dy = \int g(x)\,dx.$$

Equations of this form are called separable because we can separate $x$ and $y$. Crudely, one could justify this by multiplying each side of our original equation by $dx$:

$$h(y)\,\frac{dy}{dx}\,dx = g(x)\,dx,$$

and then

$$\int h(y)\,dy = \int g(x)\,dx.$$

Fig. 3.32

An important equation of this type is $dy/dx = ky$, where $k$ is some constant. Here the rate of growth is determined by the current value of the function. Then

$$\frac{1}{y}\,\frac{dy}{dx} = k$$

and so

$$\int \frac{1}{y}\,dy = \int k\,dx.$$

Thus $\log y = kx + c$ and

$$y = \exp(kx + c) = \exp(kx) \cdot \exp c.$$

If $\exp c = A$ we have, as our general solution,

$$y = A\exp(kx) = A e^{kx}.$$
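A numerical check of this solution is sketched below. It uses Euler's method, which is not discussed in the text and serves here only as an independent check. The method steps the equation $dy/dx = ky$ forward from $y(0) = A$ and compares the result with $Ae^{kx}$ at $x = 1$. The values of $k$ and $A$ are arbitrary illustrations:

```python
import math

def euler(k, y0, x_end, steps):
    """Step dy/dx = k*y forward from y(0) = y0 to x = x_end with Euler's method."""
    h = x_end / steps
    y = y0
    for _ in range(steps):
        y += h * k * y        # slope k*y taken at the start of each step
    return y

k, A = 0.5, 2.0               # illustrative values; A = y(0) = exp(c)
exact = A * math.exp(k * 1.0)
for steps in (10, 100, 1000):
    print(steps, euler(k, A, 1.0, steps), exact)
```

As the number of steps grows, the Euler values approach $Ae^{k}$.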
It is useful at this stage to introduce some more technical language. We have been dealing with special sorts of differential equations. These have been rather simple and have not contained terms like $d^2y/dx^2$ or $d^3y/dx^3$, etc. The degree $n$ of a differential equation (more commonly called its order) is the order of the highest derivative which occurs; the equations above have all been of degree 1. Here is an example of an equation of degree 2:

$$\frac{d^2y}{dx^2} + \frac{dy}{dx} = x.$$

If $d^3y/dx^3 = (dy/dx)^2$ then we have an equation of degree 3.
A differential equation is called linear if there are no terms such as $(dy/dx)^2$, $(d^3y/dx^3)^2$ or $y^5$, i.e. if we can write the equation in the form

$$a_n(x)\frac{d^ny}{dx^n} + \cdots + a_1(x)\frac{dy}{dx} + a_0(x)\,y = f(x).$$
Exercise 5
Determine the degrees of the following equations and decide whether or not they are linear:
(i) $\dfrac{d^2y}{dx^2} = w^2$, $w$ a constant
(ii) $\dfrac{d^3y}{dx^3} - y\dfrac{dy}{dx} = \dfrac{d^2y}{dx^2}$
(iii) $\dfrac{d^2y}{dx^2} - \left(\dfrac{dy}{dx}\right)^3 = x^3 - \sin y$
(iv) $\dfrac{d^2y}{dx^2} - \dfrac{dy}{dx} + y^2 = 0$
(v) $x^4\dfrac{d^4y}{dx^4} + x^3\dfrac{d^3y}{dx^3} + x^2\dfrac{d^2y}{dx^2} + x\dfrac{dy}{dx} + y = 0$

A general solution to a differential equation is a form which contains all solutions. If $dy/dx = ky$ then $y = Ae^{kx}$ is a general solution. To determine the constant $A$ we need more information, normally a boundary condition. Thus $y = 1$ if $x = 0$ implies $1 = A \cdot 1$, so $A = 1$. Alternatively we may know that $y = 1$ at $x = 1$, so that $1 = Ae^{k}$ and so $A = e^{-k}$. Then the solution is $y = e^{-k}e^{kx} = e^{k(x-1)}$.
A particular solution is just one individual solution, such as $y = e^{kx}$ for $dy/dx = ky$. A useful guide is that a general solution to a linear equation of degree $n$ will have $n$ arbitrary constants.
Example 14
$d^2y/dx^2 + w^2 y = 0$. This is an important equation which describes many physical phenomena. It is known under the name of Simple Harmonic Motion, because it is an equation of this form which describes the (small) swings of a pendulum. As a consequence we will spend time developing solutions.
We put $z = dy/dx$. Then using the chain rule from differentiation we have

$$\frac{d^2y}{dx^2} = \frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx} = \frac{dz}{dy} \cdot z,$$

so $z\,dz/dy = -w^2 y$. Hence

$$\int z\,dz = \int -w^2 y\,dy,$$

so $z^2/2 = -(w^2/2)y^2 + c/2$ for some constant $c$. Thus $z = \pm\sqrt{c - w^2y^2}$. Let us choose $c$ to be positive, write it as $d^2$, and choose the positive square root. Then

$$\frac{dy}{dx} = \sqrt{d^2 - w^2y^2} = w\sqrt{(d/w)^2 - y^2}.$$

So

$$\int \frac{1}{w\sqrt{(d/w)^2 - y^2}}\,dy = \int dx.$$

Now from our tables of integrals we have

$$\frac{1}{w}\sin^{-1}\left(\frac{yw}{d}\right) = x + f, \quad \text{where } f \text{ is some constant}.$$

Then $y = (d/w)\sin(wx + \phi)$, where $\phi = wf$. Hence we have a solution with two arbitrary constants, $\phi$ and $d$. We could rewrite this as

$$y = A\sin(wx + \phi),$$

and we see that the solution is a sine wave with amplitude $A$, (angular) frequency $w$ and phase $\phi$. An important point to observe is that $w$ is determined by the equation.
An alternative form of the solution is given by using the trigonometric identities

$$A\sin(wx + \phi) = A\sin wx\cos\phi + A\cos wx\sin\phi.$$

Put $A\cos\phi = B$ and $A\sin\phi = C$ and we obtain the solution

$$y = B\sin wx + C\cos wx.$$
This is often the way such solutions are written in the standard texts. As
you will appreciate, solving equations (and this one is simple) requires some
expertise, and experience!
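One way to gain confidence in such a solution is to substitute it back numerically. The sketch below is our own check. It estimates $d^2y/dx^2$ with a central difference and evaluates $y'' + w^2 y$ at many points for $y = B\sin wx + C\cos wx$. The result should be zero up to small numerical errors. It is plainly non-zero when the wrong $w$ is used. The constants $w$, $B$, $C$ are arbitrary illustrative values:

```python
import math

def residual(y, w, x, h=1e-4):
    """Central-difference estimate of y'' + w**2 * y at the point x."""
    second = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)
    return second + w * w * y(x)

w, B, C = 3.0, 1.2, -0.7                 # arbitrary illustrative constants
y = lambda x: B * math.sin(w * x) + C * math.cos(w * x)
print(max(abs(residual(y, w, x / 10)) for x in range(50)))   # tiny
print(residual(lambda x: math.sin(3 * x), 2.0, 0.5))          # wrong w: clearly non-zero
```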
Exercise 6
(i) Solve $dy/dx = x/y$.
(ii) Find a solution of $dy/dx = -y$ where $y = 1$ when $x = 0$.
(iii) Solve the equation $x\,dy/dx = y$.
(iv) Find a solution of the equation $d^2y/dx^2 + (dy/dx)^2 = 0$ such that $y = 0$ when $x = 0$.
[Hint: remember $1/e^a = e^{-a}$.]

Another important class of differential equations is that of partial differential equations. These are of the following form: let $u(x, y)$ be a function of $x$ and $y$ satisfying an equation involving its partial derivatives; find the possibilities for $u$.


This whole area is much more complex, but to illustrate the ideas we will consider one particular equation, "the wave equation". Even then we will examine only the one-dimensional case, which applies, for example, to a vibrating string:

$$\frac{\partial^2 u}{\partial x^2} = \frac{1}{c^2}\frac{\partial^2 u}{\partial t^2},$$

where $u(x, t)$ is the displacement of the string, $x$ is the distance along the string and $t$ is the time. The first method we will use is called "separation of the variables". We assume the solution has the form $u(x, t) = V(x)T(t)$, where $V$ depends only on $x$ and $T$ depends only on $t$. We now rewrite the equation as

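$$\frac{d^2V}{dx^2}\,T(t) = \frac{1}{c^2}\,V(x)\,\frac{d^2T}{dt^2}.$$

Hence

$$\frac{1}{V}\frac{d^2V}{dx^2} = \frac{1}{c^2}\,\frac{1}{T}\frac{d^2T}{dt^2}.$$

Since the left-hand side depends only on $x$ and the right-hand side depends only on $t$, they must both be constant. We put this constant equal to $-w^2/c^2$ and obtain

$$\frac{d^2V}{dx^2} + \frac{w^2}{c^2}V = 0$$

and

$$\frac{d^2T}{dt^2} + w^2 T = 0.$$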

Now from the Simple Harmonic Motion example considered earlier we have

$$V(x) = A\cos\frac{wx}{c} + B\sin\frac{wx}{c}$$

and $T(t) = C\cos wt + D\sin wt$, where $A$, $B$, $C$ and $D$ are constants. So

$$u(x, t) = \left(A\cos\frac{wx}{c} + B\sin\frac{wx}{c}\right)(C\cos wt + D\sin wt).$$

If we start with a string of length $l$ fixed at $0$ and $l$, we would expect $u(0, t) = u(l, t) = 0$ for all $t$. Thus $A(C\cos wt + D\sin wt) = 0$ for all $t$, so $A = 0$ (as $\cos 0 = 1$ and $\sin 0 = 0$). Also $u(l, t) = 0 = B\sin(wl/c)(C\cos wt + D\sin wt)$, so $B\sin(wl/c) = 0$.
If $B = 0$ then $u(x, t) = 0$ for all $x$ and $t$, and this is a trivial solution. So $\sin(wl/c) = 0$. From the graph of sine we know that $\sin(wl/c) = 0$ only when $wl/c$ is a multiple of $\pi$. Hence there is an integer $n$ satisfying $wl/c = \pi n$, and $w = \pi nc/l$. Clearly we have a relation between $w$, $c$ and $l$. This is very important, since it establishes that the waves on a string cannot be arbitrary but depend on $\pi$, $l$ and $c$ (which is related to various real properties of the string). The possible values of $w$ are called the eigenvalues of the equation. These values are also referred to as the harmonics of the string.
Notice that if we put $w_1 = \pi c/l$, $w_2 = 2\pi c/l$, $w_3 = 3\pi c/l$, $\ldots$, we get possible solutions

$$u_r(x, t) = \sin\left(\frac{r\pi x}{l}\right)\left(C_r\cos\left(\frac{r\pi c t}{l}\right) + D_r\sin\left(\frac{r\pi c t}{l}\right)\right).$$
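As a numerical check of our own, the sketch below verifies that each mode $u_r$ vanishes at both ends of the string. It also verifies that $u_r$ satisfies the wave equation: central differences estimate $\partial^2 u/\partial x^2 - (1/c^2)\,\partial^2 u/\partial t^2$, which should be close to zero. The string length, wave speed, coefficients $C_r$, $D_r$ and sample point are arbitrary illustrative values:

```python
import math

L_STRING, C_SPEED = 2.0, 1.5          # illustrative string length l and wave speed c

def mode(r, Cr=1.0, Dr=0.5):
    """u_r(x, t) = sin(r pi x / l) (C_r cos(r pi c t / l) + D_r sin(r pi c t / l))."""
    k = r * math.pi / L_STRING
    return lambda x, t: math.sin(k * x) * (Cr * math.cos(k * C_SPEED * t)
                                           + Dr * math.sin(k * C_SPEED * t))

def wave_residual(u, x, t, h=1e-3):
    """u_xx - u_tt / c**2 by central differences (near zero for a true solution)."""
    uxx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
    utt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h**2
    return uxx - utt / C_SPEED**2

for r in (1, 2, 3):
    u = mode(r)
    print(r, u(0.0, 0.7), u(L_STRING, 0.7), wave_residual(u, 0.6, 0.4))
```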

It is not too difficult to see that we can get solutions

$$u(x, t) = \sum_{r=1}^{\infty} u_r(x, t)$$

(i.e. we go on forever if required). Notice that

$$u(x, 0) = \sum_{r=1}^{\infty} C_r\sin\frac{r\pi x}{l}.$$

If we know the initial starting position, we then get a solution for the $C_r$'s. This discussion is crucial to understanding the importance of Fourier series, which we will return to in a later chapter.
Before leaving this topic, we will show an alternative approach, because this can be important in certain applications. For this we assume that we can write $u$ as a function of $x - ct$. Thus $u(x, t) = f(x - ct) = f(z)$, say, where $z = x - ct$. Now using the chain rule,

$$\frac{\partial f}{\partial t} = \frac{df}{dz}\cdot\frac{\partial z}{\partial t} = -c\,\frac{df}{dz}, \qquad \frac{\partial f}{\partial x} = \frac{df}{dz}\cdot\frac{\partial z}{\partial x} = \frac{df}{dz}.$$

Then $\partial^2 f/\partial t^2 = c^2\,d^2f/dz^2$ and $\partial^2 f/\partial x^2 = d^2f/dz^2$. So $u(x, t) = f(x - ct)$ satisfies the equation $\partial^2 u/\partial x^2 = (1/c^2)\,\partial^2 u/\partial t^2$.
The fact that there are two methods shows the difficulty of solving partial differential equations. It also shows the crucial importance of making sure that the method chosen is appropriate to the physical model.
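The travelling-wave argument can also be checked numerically. In this sketch of our own we pick a smooth profile $f(z) = e^{-z^2}$ and a speed $c = 2$, both arbitrary. Central differences estimate $\partial^2 u/\partial x^2 - (1/c^2)\,\partial^2 u/\partial t^2$ for $u = f(x - ct)$. The residual is tiny, while a profile moving at the wrong speed leaves a large residual:

```python
import math

c = 2.0                                    # illustrative wave speed
f = lambda z: math.exp(-z * z)             # any smooth profile f(z)
u = lambda x, t: f(x - c * t)              # travelling wave moving right at speed c

def wave_residual(u, x, t, h=1e-3):
    """u_xx - u_tt / c**2 by central differences."""
    uxx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
    utt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h**2
    return uxx - utt / c**2

print(max(abs(wave_residual(u, x / 4, t / 4)) for x in range(-8, 9) for t in range(4)))
print(wave_residual(lambda x, t: f(x - 3 * t), 0.3, 0.1))   # wrong speed: not a solution
```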
Chapter 4

COMPLEX NUMBERS

1 INTRODUCTION

One of the great mysteries in geophysical analysis is the so-called duality


between the time domain and the frequency domain. This always looks
very mysterious but in fact is a simple mathematical device which enables
the data to be handled much more easily. The key mathematical idea in
this duality is that of Complex Numbers and the object of this chapter is
to give you a basic grasp of these numbers and to help you to see through
the apparent mystification.

2 THE BEGINNING

We start with the simple observation that there is no real number $x$ such that $x^2 = -1$. So we invent a number $i$ (engineers often call it $j$) such that

$$i^2 = -1.$$

This doesn't get us very far by itself. But we continue by considering all "numbers" (pairs) of the form $a + ib$, where $a$ and $b$ are real numbers, and we add and multiply the pairs according to the following rules. If $a$, $b$, $c$ and $d$ are real numbers then

$$(a + ib) + (c + id) = (a + c) + i(b + d),$$
$$(a + ib)(c + id) = ac + iad + ibc + i^2bd = ac + i(ad + bc) + i^2bd = ac - bd + i(ad + bc) \quad (\text{since } i^2 = -1).$$

The collection of all such "numbers" with this addition and multiplication is called the complex numbers, and any number of the form $a + ib$ (where $a$, $b$ are real numbers) is called a complex number. The real number $a$ is called the real part of $a + ib$ and $b$ is called the imaginary part. Subtraction then obeys the rule (for $a$, $b$, $c$, $d$ real) that:

$$(a + ib) - (c + id) = (a - c) + i(b - d),$$

and it is routine to check that the usual rules familiar from real numbers also apply to complex numbers. We give some examples:

$$(2 + i) + (3 + 2i) = 5 + 3i$$
(~+ i2) - a+ i3) = ~ - i
$$(1 + i)(1 + i) = 1 + i^2 + i(1 + 1) = 1 - 1 + 2i = 2i$$

$$(\tfrac{1}{2} - i)(2 + 2i) = \tfrac{1}{2} \times 2 - (-1)(2) + i(-2 + 2 \times \tfrac{1}{2}) = 1 - (-2) + i(-2 + 1) = 3 - i$$

$$(a + ib)(a - ib) = a^2 + b^2$$

$$(a + ib)(a + ib) = a^2 - b^2 + 2iab$$

(Note that $a + ib$ is often written $a + bi$, which is the same thing.)
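The rules above can be written out for pairs $(a, b)$ exactly as defined, with no special number type. In the minimal Python sketch below, the functions `add` and `mul` are our own. The last line compares the result with Python's built-in `complex` type, which implements the same arithmetic:

```python
def add(p, q):
    """(a + ib) + (c + id) = (a + c) + i(b + d)."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def mul(p, q):
    """(a + ib)(c + id) = (ac - bd) + i(ad + bc), using i*i = -1."""
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

print(add((2, 1), (3, 2)))            # (5, 3), i.e. 5 + 3i
print(mul((1, 1), (1, 1)))            # (0, 2), i.e. 2i
print(mul((0, 1), (0, 1)))            # (-1, 0), i.e. i*i = -1
print(complex(1, 1) * complex(1, 1))  # Python's built-in complex type agrees: 2j
```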
Exercise 1
Evaluate
(i) (l +~i) +(2 +~i)
(ii) $(1 - 10i) + (23.5 + 50i)$
(iii) (~- i)(2 + i)
(iv) $(1 + i)(10 - 3i)$

One important property of complex numbers is that any of them other than $0$ ($= 0 + i0$) has an "inverse":

$$\frac{1}{a + ib} = \frac{a - ib}{a^2 + b^2} = \frac{a}{a^2 + b^2} - i\,\frac{b}{a^2 + b^2},$$

and (as we would expect for the inverse of $a + ib$)

$$\frac{1}{a + ib}\cdot(a + ib) = 1 = (a + ib)\cdot\frac{1}{a + ib}.$$
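The inverse formula can be checked with exact fractions. The Python sketch below uses the standard `fractions` module, and the helper names are ours. It computes $1/(2 + 3i)$ and confirms that multiplying back gives exactly $1 + 0i$:

```python
from fractions import Fraction

def inverse(a, b):
    """1/(a + ib) = a/(a^2 + b^2) - i b/(a^2 + b^2); undefined for 0 + i0."""
    m = a * a + b * b
    if m == 0:
        raise ZeroDivisionError("0 + i0 has no inverse")
    return (a / m, -b / m)

def mul(p, q):
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

z = (Fraction(2), Fraction(3))
print(inverse(*z))            # (Fraction(2, 13), Fraction(-3, 13))
print(mul(z, inverse(*z)))    # (Fraction(1, 1), Fraction(0, 1))
```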

There is a great convenience to be gained from writing a complex number as a single symbol, and we frequently write $a + ib = z$ (using letters such as $u$, $v$, $w$ and $z$ to denote complex numbers). The real part of the number, $\mathrm{Re}(z)$, then satisfies $\mathrm{Re}(z) = a$, and the imaginary part $\mathrm{Im}(z) = b$. Two important functions of a complex number are (for $a$, $b$ real):
the complex conjugate of $z = a + ib$, written as $\bar{z} = a - ib$, and
the modulus of $z$, written as $|z| = \sqrt{a^2 + b^2}$.
Note that the inverse of $z \ne 0$ is $\bar{z}/|z|^2$, since

$$\frac{1}{z} = \frac{a - ib}{a^2 + b^2},$$

and that $|z| \ge 0$ for all $z$.


Complex Numbers 81

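Examples
1. If $z = 2 + 3i$, then $\bar{z} = 2 - 3i$, $|z|^2 = 13$ and
$$\frac{1}{2 + 3i} = \frac{2 - 3i}{13} = \frac{2}{13} - \frac{3i}{13}.$$
2. If $z = \frac{1}{2} - 2i$, then $\bar{z} = \frac{1}{2} + 2i$, $|z|^2 = \frac{17}{4}$ and
$$\frac{1}{\frac{1}{2} - 2i} = \frac{\frac{1}{2} + 2i}{17/4} = \frac{2 + 8i}{17}.$$
3. If $z = \sqrt{2}\,i$, then $\bar{z} = -\sqrt{2}\,i$, $|z|^2 = 2$ and
$$\frac{1}{z} = \frac{-\sqrt{2}\,i}{2}.$$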
Exercise 2
In each case calculate $\bar{z}$, $|z|$ and $1/z$.
(i) $2 + i$
(ii) $-2 + 3i$
(iii) $-1$
(iv) $\sqrt{2} + i\sqrt{3}$
Finally, note that (as with real numbers) division by a non-zero complex number is the same as multiplication by its inverse. Thus

$$(a + ib) \div (c + id) = (a + ib)\cdot\frac{1}{c + id} = (a + ib)\cdot\frac{c - id}{c^2 + d^2}.$$

For example,

$$(2 + 2i) \div (1 + i) = (2 + 2i)\cdot\frac{1}{1 + i} = (2 + 2i)\,\frac{1 - i}{1 + 1} = (1 + i)(1 - i) = 2.$$
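Multiplying out $(a + ib)(c - id)$ gives $(ac + bd) + i(bc - ad)$, so division can be written directly on pairs. The sketch below uses a helper `divide` of our own. It reproduces the example and, as a second check, gives $1 \div i = -i$:

```python
def divide(p, q):
    """(a + ib) / (c + id) = (a + ib)(c - id) / (c^2 + d^2)."""
    (a, b), (c, d) = p, q
    m = c * c + d * d
    return ((a * c + b * d) / m, (b * c - a * d) / m)

print(divide((2, 2), (1, 1)))            # (2.0, 0.0)
print(divide((1, 0), (0, 1)))            # (0.0, -1.0), i.e. 1/i = -i
print(complex(2, 2) / complex(1, 1))     # Python agrees: (2+0j)
```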
With the definitions we have given it is possible to develop all the familiar things that are done with real numbers. But why does this help with geophysical analysis? Largely because there are particularly nice ways of representing complex numbers geometrically. This idea is due to Gauss, but the main representation is always called the Argand diagram. It is the usual $(x, y)$ plane, but with the $x$-axis labelled as the real axis and the $y$-axis labelled as the imaginary axis. A typical complex number $a + ib$ (where $a$, $b$ are real) is then placed at the point $(a, b)$ in the plane. For example, in Fig. 4.1, $1 + i$ is point $A$, $1 - i$ is point $B$, $-2 + 2i$ is point $C$ and $-3 - 4i$ is point $D$. A complex number of the form $ib$ is often called a purely (or pure) imaginary number and lies on the $y$-axis; one of the form $a$ is real and lies on the $x$-axis.

Fig. 4.1
Exercise 3
Draw a picture and plot the points corresponding to the complex numbers $1 + 2i$, $-3 + \frac{1}{2}i$, $6 - i$, $-2 - 3i$.

It is essential to notice that if $z$ is a complex number then $|z|$ is the distance of the point representing $z$ from the origin (which represents $0 = 0 + i0$). This follows from Pythagoras' theorem, as in Fig. 4.2. Similarly, if $z_1$ and $z_2$ are complex numbers, $|z_1 - z_2|$ is the distance between them. (See if you can draw the diagram.) So it is possible to express certain simple geometric ideas very straightforwardly in terms of complex numbers, and vice versa.
Finally in this section it is useful to introduce the "polar form" of a complex number. Any point $A$ in the plane can be specified by giving its distance from the origin $O$ and the angle that $OA$ makes with the (positive) real axis, measured anticlockwise; this is sketched in Fig. 4.3.
We denote the distance by $r$ and the angle by $\theta$. Then any non-zero complex number $a + ib$ is determined by the corresponding pair $(r, \theta)$. The relations between the two ways of identifying it are given by

$$r\cos\theta = a, \quad r\sin\theta = b \quad \text{if } z = a + ib,$$

and

$$r = |z|, \qquad \tan\theta = \frac{b}{a} = \frac{\mathrm{Im}\,z}{\mathrm{Re}\,z}$$

(the quadrant of $\theta$ is fixed by the signs of $a$ and $b$). $\theta$ is called the argument of $z$.
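Python's standard `cmath` module converts between the two descriptions. `cmath.polar` returns $(r, \theta)$ with $\theta$ in $(-\pi, \pi]$, and `cmath.rect` goes back. The sketch below uses point $D = -3 - 4i$ of Fig. 4.1. It also shows why $\tan\theta = b/a$ alone is not enough: `math.atan(b/a)` returns an angle in the wrong quadrant for this point.

```python
import cmath
import math

z = complex(-3, -4)                 # point D in Fig. 4.1
r, theta = cmath.polar(z)           # r = |z|, theta = argument in (-pi, pi]
print(r, theta)                     # r is 5.0; theta lies in the third quadrant
print(math.atan(z.imag / z.real))   # tan(theta) = b/a alone gives a first-quadrant angle
print(cmath.rect(r, theta))         # back to a + ib (up to rounding)
```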
Fig. 4.2    Fig. 4.3

Fig. 4.4

Exercise 4
(i) If $z = (1 + i)/(3 - 4i)$, find $|z|$ and the argument $\theta$. Plot the point $z$ on the Argand diagram.
(ii) If $z_1 = 1 + i$ and $z_2 = 3 - 4i$, find $|z_1|$, $|z_2|$ and $|z_1 - z_2|$. Find the argument $\theta$ of $z_1 - z_2$.

3 FUNCTIONS OF COMPLEX VARIABLES

The techniques for complex variables are exactly the same as in the real case. Instead of the black box being fed real numbers, it is fed with complex numbers (see Fig. 4.4). Polynomial functions are just the same:

$$f(z) = z + 1$$

or

$$f(z) = z^3 + 2z + 1$$

or

$$f(z) = iz^2 - (i + 1)z - 2i.$$

Such functions can be evaluated just as before. Note that if the coefficients of a polynomial are all real, as in $z^3 + 2z + 1$, and if $z$ is real, i.e. $z = x + i0$ for real $x$, then $f(z) = f(x)$ just as if everything were real. More generally, if we have a complex variable $z = x + iy$ where $x$ and $y$ are real, then $z$ can be thought of as depending on the real variables $x$ and $y$. Hence a function $f(z)$ can be thought of as involving a pair of functions, each of two variables:

$$f(z) = u(x, y) + iv(x, y),$$

where $u(x, y)$ gives the real part and $v(x, y)$ gives the imaginary part. For example, if $f(z) = z^2 = (x + iy)^2 = x^2 - y^2 + 2ixy$, then $u(x, y) = x^2 - y^2$ and $v(x, y) = 2xy$.

One of the most important functions we encounter is the exponential function, written $e^z$ or $\exp(z)$. We have to extend the former definition to allow for a complex argument $z$. When $z = x + iy$, so that $x$ is its real part and $y$ its imaginary part, we define the exponential function by the rule

$$\exp(z) = e^z = e^{x + iy} = e^x(e^{iy}) = e^x(\cos y + i\sin y).$$

This is without doubt the most important formula you will need to know. In terms of the functions $u(x, y)$ and $v(x, y)$ discussed above, we have

$$f(x + iy) = e^x\cos y + ie^x\sin y,$$

so

$$u(x, y) = e^x\cos y \quad \text{and} \quad v(x, y) = e^x\sin y.$$

There are several simple deductions we can make from the definition:
(1) If $z = x + 0i$, so that $z$ is real and $y = 0$, then $e^z = e^x(\cos 0 + i\sin 0) = e^x$;
(2) If $z = 0 + iy$, so that $z$ is purely imaginary, then $e^z = e^{iy} = e^0(\cos y + i\sin y) = \cos y + i\sin y$;
(3) If $z = 0 - iy$, so that $z$ is purely imaginary, then $e^z = e^{i(-y)} = \cos(-y) + i\sin(-y) = \cos y - i\sin y$;
(4) From (2) and (3) we get
$$e^{iy} + e^{-iy} = 2\cos y, \qquad e^{iy} - e^{-iy} = 2i\sin y.$$
For example, $e^{i\pi} = \cos\pi = -1 = e^{-i\pi}$ and $e^{-i\pi/2} = -i$;
(5) The last of these examples is a special case of the general result
$$e^{z + 2\pi i} = e^{x + i(y + 2\pi)} = e^x(\cos(y + 2\pi) + i\sin(y + 2\pi)) = e^x(\cos y + i\sin y) = e^z.$$
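The defining formula can be compared with Python's built-in complex exponential `cmath.exp`. The sketch below implements $e^x(\cos y + i\sin y)$ directly under our own function name. It checks the special value $e^{i\pi} = -1$ and the periodicity result (5). The sample point $z = 0.3 + 1.2i$ is arbitrary:

```python
import cmath
import math

def exp_by_definition(z):
    """e^z = e^x (cos y + i sin y) for z = x + iy."""
    x, y = z.real, z.imag
    return complex(math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

z = complex(0.3, 1.2)
print(exp_by_definition(z), cmath.exp(z))             # agree to rounding
print(exp_by_definition(complex(0, math.pi)))         # close to -1
print(abs(exp_by_definition(z + 2j * math.pi) - exp_by_definition(z)))  # ~0: period 2*pi*i
```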
From the formulae for $\cos y$ and $\sin y$ given at (4), it is possible to generalise the concepts of trigonometric functions so that they are defined for complex numbers $z$ by the rules:

$$\cos z = \frac{e^{iz} + e^{-iz}}{2}, \qquad \sin z = \frac{e^{iz} - e^{-iz}}{2i}.$$

The familiar formulae for trigonometric functions hold with these definitions: for example, $\cos^2 z + \sin^2 z = 1$.
If we return to the idea of polar coordinates as mentioned in the previous section, then for a non-zero complex number $z = r\cos\theta + ir\sin\theta$ we see that $z = re^{i\theta}$, where $r = |z|$ and $\theta$ is the argument of $z$. Since $re^{i\theta} = re^{i\theta + 2\pi i}$ from (5) above, $\theta$ is not uniquely defined by $z$. If $\theta_0$ is one value of the argument, then so is $\theta_0 + 2n\pi$ for any whole number $n$. This corresponds to rotating more than $2\pi$ around the origin in Fig. 4.3. There are various subtle difficulties following from the non-uniqueness of the argument of a complex number, but we will usually slide over them.
Polar coordinates can also be used to look at multiplication. Let $z_1 = r_1e^{i\theta_1}$ and $z_2 = r_2e^{i\theta_2}$. Then
$$\begin{aligned} z_1z_2 &= r_1(\cos\theta_1 + i\sin\theta_1)\,r_2(\cos\theta_2 + i\sin\theta_2)\\ &= r_1r_2\big([\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2] + i[\cos\theta_1\sin\theta_2 + \sin\theta_1\cos\theta_2]\big) \end{aligned}$$
So if we recall the trigonometric formulae, we have
$$z_1z_2 = r_1r_2\big(\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)\big)$$
This is a much neater and more useful formula than the previous one.
A consequence is that if $r_1 = r_2 = 1$ then $e^{i\theta_1} \times e^{i\theta_2} = e^{i(\theta_1 + \theta_2)}$. Hence, if $\theta_1 = \theta_2 = \theta$ (say), we have
$$(\cos\theta + i\sin\theta)^2 = \cos 2\theta + i\sin 2\theta$$
and for any integer $n$
$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta$$
So for any non-zero complex number $z = re^{i\theta}$, we have $z^n = r^ne^{in\theta} = r^n(\cos n\theta + i\sin n\theta)$.
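A minimal sketch of the polar rule (moduli multiply, arguments add) and of De Moivre's formula, assuming Python's `cmath.rect(r, theta)`, which builds $re^{i\theta}$; `multiply_polar` is our own illustrative helper, and the particular moduli and angles are arbitrary.

```python
import cmath
import math


def multiply_polar(r1, t1, r2, t2):
    """Product of r1*e^(i t1) and r2*e^(i t2): moduli multiply, arguments add."""
    return r1 * r2, t1 + t2


z1 = cmath.rect(2.0, 0.3)      # 2 e^{0.3 i}
z2 = cmath.rect(1.5, 1.1)      # 1.5 e^{1.1 i}
r, t = multiply_polar(2.0, 0.3, 1.5, 1.1)
product = cmath.rect(r, t)
print(abs(product - z1 * z2))  # of the size of rounding error

# De Moivre's formula for a whole number n
theta, n = 0.4, 7
lhs = complex(math.cos(theta), math.sin(theta)) ** n
rhs = complex(math.cos(n * theta), math.sin(n * theta))
print(abs(lhs - rhs))          # of the size of rounding error
```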
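We can also see how to obtain square roots or $n$th roots using this polar form. Suppose $z = re^{i\theta}$ and we want $\sqrt{z}$. Some reflection will convince you that
$$\sqrt{z} = \sqrt{r}\,e^{i\theta/2} \quad\text{or}\quad -\sqrt{r}\,e^{i\theta/2} = \sqrt{r}\,e^{i(\theta/2 + \pi)}$$
This is illustrated in Fig. 4.5. More generally, to find an $n$th root, we see that each of
$$r^{1/n}e^{i\theta/n},\; r^{1/n}e^{i(\theta+2\pi)/n},\; \ldots,\; r^{1/n}e^{i(\theta+2(n-1)\pi)/n}$$
has $n$th power equal to $z$, i.e.
$$z^{1/n} = r^{1/n}e^{i(\theta/n + 2k\pi/n)}, \qquad k = 0, 1, \ldots, n-1$$
Thus there are $n$ distinct $n$th roots of a non-zero complex number $z$.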
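The root formula translates directly into code. The sketch below (the name `nth_roots` is ours) takes one value of the argument from `cmath.polar` and generates the $n$ roots for $k = 0, 1, \ldots, n-1$; the example finds the four fourth roots of $-16$.

```python
import cmath
import math


def nth_roots(z: complex, n: int) -> list:
    """The n distinct n-th roots r^(1/n) e^{i(theta/n + 2k pi/n)}, k = 0..n-1, of z != 0."""
    if z == 0:
        raise ValueError("z must be non-zero")
    r, theta = cmath.polar(z)          # theta is one value of the argument of z
    return [cmath.rect(r ** (1.0 / n), theta / n + 2 * math.pi * k / n) for k in range(n)]


for w in nth_roots(-16, 4):
    print(w, w ** 4)                   # each w**4 is -16 up to rounding
```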
Using the exponential function, we can explain mathematically the sometimes mysterious relation between time and frequency which so often appears in geophysical interpretation.
Let $t$ be a variable which can have any positive real value. Put $z = e^{it}$. Then as $t$ goes from $0$ to $2\pi$, $z$ moves once round the unit circle. If instead we put $z = e^{i\omega t}$ (with $\omega$ a positive whole number, say) then as $t$ goes from $0$ to $2\pi$, $z$ moves round the unit circle $\omega$ times. So if we have a function of $t$ (time), and we substitute $z = e^{i\omega t}$, we obtain a function of $z$, or of "frequency" $\omega$ and time $t$ (see Fig. 4.6).
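To see the winding numerically, one can sample $z = e^{i\omega t}$ and add up the small angles turned between consecutive samples. This is a sketch under the assumption that the sampling is fine enough (each step must turn through less than $\pi$); `turns_round_circle` is a hypothetical name.

```python
import cmath
import math


def turns_round_circle(omega, samples=1000):
    """Count how many times z = e^{i omega t} goes round 0 as t runs from 0 to 2*pi.

    The angle turned between consecutive samples is phase(z_k / z_{k-1}); summing
    these small angles is valid while each step omega * 2*pi / samples is below pi.
    """
    total_angle = 0.0
    previous = 1 + 0j                      # z at t = 0
    for k in range(1, samples + 1):
        t = 2 * math.pi * k / samples
        z = cmath.exp(1j * omega * t)
        total_angle += cmath.phase(z / previous)
        previous = z
    return total_angle / (2 * math.pi)


print(turns_round_circle(1), turns_round_circle(3))   # approximately 1.0 and 3.0
```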
There is one further function we need to investigate and that is the logarithm. Following the example of real numbers we would like to have $\log(\exp(z)) = z$ and $\exp(\log(z)) = z$. Taking these rules as a guide we find that the right definition for log is (using polar coordinates)
$$\log(re^{i\theta}) = \log r + i\theta$$

[Fig. 4.5 and Fig. 4.6: diagrams in the complex plane, with real and imaginary (im) axes.]

However $\theta$ is not uniquely defined, since $re^{i\theta} = re^{i(\theta + 2\pi)}$. If $\theta$ is chosen suitably this does not give problems, but it is important to bear in mind that, while $\exp(\log z) = z$ is always true, we can only say that $\log(\exp z) = z + 2\pi in$ for some integer $n$.
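Python's `cmath.log` returns one particular choice of $\theta$ (the principal value, whose imaginary part lies between $-\pi$ and $\pi$), so it shows the $2\pi in$ discrepancy directly. A small illustration with an arbitrarily chosen $z$:

```python
import cmath
import math

z = 1.0 + 5.0j                      # imaginary part 5 lies outside (-pi, pi]
w = cmath.log(cmath.exp(z))         # principal value of the logarithm
n = round((w - z).imag / (2 * math.pi))
print(w)                            # about 1 - 1.283j, not 1 + 5j
print(n)                            # -1, so log(exp z) = z + 2*pi*i*n with n = -1
print(cmath.exp(cmath.log(z)))      # exp(log z) gives back z, up to rounding
```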
Exercise 5
(i) Find the values of $e^{3\pi i}$, $e^{5\pi i}$, $e^{7\pi i}$.
(ii) Find the solutions of $z^2 = 2 + i$.
(iii) By considering $z$ as $re^{i\theta}$, find all three roots of $z^3 = 1$.

4 DIFFERENTIATION AND INTEGRATION

The theory for differentiation is very similar to that for real variables. We define
$$f'(z) = \frac{df}{dz} = \lim_{h\to 0}\frac{f(z + h) - f(z)}{h}$$
It turns out that all the functions known to be differentiable for real variables are again differentiable, with the same derivative.
Examples
4. $\dfrac{d}{dz}(z^4) = 4z^3$
5. $\dfrac{d}{dz}(\cos z) = -\sin z$
6. $\dfrac{d}{dz}(\exp z) = \exp z$

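One interesting result comes from examining the real and imaginary parts of a differentiable complex function.
If we stop and think for a moment we might suspect that differentiation could contain some hidden complexities, since the increment can now tend to zero from any direction in the plane. We can write $z = x + iy$ and hence $f(z) = u(x, y) + iv(x, y)$ for some functions $u(x, y)$, $v(x, y)$. Since
$$f'(z) = \lim_{\delta z\to 0}\frac{f(z + \delta z) - f(z)}{\delta z}$$
we might let $\delta z = \delta x$, i.e. we just increment the real part, and get
$$\frac{f(z + \delta x) - f(z)}{\delta x}$$
or suppose $\delta z = i\,\delta y$, so we also have a derivative in the $y$ direction,
$$\frac{f(z + i\,\delta y) - f(z)}{i\,\delta y}$$
The derivative $f'(z)$ must be the limit in each case. If we write the function $f(z)$ as $u(x, y) + iv(x, y)$, the first quotient tends to $\partial u/\partial x + i\,\partial v/\partial x$ and the second to $(\partial u/\partial y + i\,\partial v/\partial y)/i = \partial v/\partial y - i\,\partial u/\partial y$. Comparing real and imaginary parts we have
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}$$
and
$$\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}$$
So for a derivative to exist these "Cauchy-Riemann" equations must hold. These are of great importance in mathematical physics.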
A consequence of these equations is that
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \quad\text{and}\quad \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0$$
These equations are the two-dimensional version of Laplace's equation. Any function satisfying it is called harmonic. In the case above both $u$ and $v$ are harmonic and are called conjugate functions.
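The Cauchy-Riemann equations and Laplace's equation can be checked approximately with finite differences. This sketch assumes central differences with small steps `h` (our choice, not the book's) and uses $f(z) = z^2$, for which $u = x^2 - y^2$ and $v = 2xy$.

```python
def partials(f, x, y, h=1e-5):
    """Central-difference approximations to u_x, u_y, v_x, v_y for f(z) = u + iv."""
    fx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    fy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    return fx.real, fy.real, fx.imag, fy.imag      # u_x, u_y, v_x, v_y


def laplacian(g, x, y, h=1e-4):
    """Five-point finite-difference approximation to g_xx + g_yy."""
    return (g(x + h, y) + g(x - h, y) + g(x, y + h) + g(x, y - h) - 4 * g(x, y)) / h ** 2


def square(z):
    return z * z


def u(x, y):
    return x * x - y * y       # real part of z^2


def v(x, y):
    return 2 * x * y           # imaginary part of z^2


ux, uy, vx, vy = partials(square, 0.7, -1.3)
print(ux - vy, uy + vx)                            # both approximately 0
print(laplacian(u, 0.7, -1.3), laplacian(v, 0.7, -1.3))  # both approximately 0
```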
Integration of functions of a complex variable is more difficult to cope with than differentiation. But if we have a function $f(z)$ with a primitive $F(z)$, i.e. a function $F(z)$ such that $dF(z)/dz = f(z)$, then the theory for $f(z)$ is similar to that for functions of a real variable. For example, we know $d/dz(\cos z) = -\sin z$ and $d/dz(e^z) = e^z$, so
$$\int_{z_1}^{z_2}\sin z\,dz = -\cos z_2 + \cos z_1$$
$$\int_0^{-i}e^z\,dz = e^{-i} - e^0 = \cos 1 - i\sin 1 - 1 = (-1 + \cos 1) - i\sin 1$$
The difficulties with complex integration come from the fact that, when we think about the meaning of $\int_{z_1}^{z_2}f(z)\,dz$, we have to think about the route or path from $z_1$ to $z_2$ along which we are integrating, as in Fig. 4.7. On the "real line" there is only one route from $x_1$ to $x_2$, but in the complex plane there are clearly many possible routes. If $f(z)$ has a primitive then all routes give the same answer for the integral. In particular $\int_{z_1}^{z_1}f(z)\,dz = 0$, where we use any path from $z_1$ to itself, as in Fig. 4.8.
[Fig. 4.7: different paths from $z_1$ to $z_2$. Fig. 4.8: a closed path from $z_1$ back to itself.]
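Path independence for a function with a primitive can be seen numerically. The sketch below approximates $\int f(z)\,dz$ along a parametrised path by a midpoint sum $\sum f(\text{midpoint})\,\Delta z$ (a simple rule chosen for illustration, not taken from the text); the two routes from $0$ to $1 + i$ are our own examples.

```python
import cmath


def path_integral(f, path, steps=2000):
    """Midpoint-rule approximation to the integral of f along path(s), 0 <= s <= 1."""
    total = 0j
    for k in range(steps):
        a, b = path(k / steps), path((k + 1) / steps)
        total += f((a + b) / 2) * (b - a)      # f at the chord midpoint times dz
    return total


z1, z2 = 0j, 1 + 1j


def straight(s):
    return z1 + s * (z2 - z1)


def corner(s):
    # along the real axis from 0 to 1, then straight up from 1 to 1 + i
    return complex(2 * s, 0) if s <= 0.5 else complex(1, 2 * s - 1)


exact = -cmath.cos(z2) + cmath.cos(z1)         # F(z2) - F(z1) with F(z) = -cos z
print(path_integral(cmath.sin, straight))
print(path_integral(cmath.sin, corner))
print(exact)                                   # all three agree closely
```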

We must, however, be careful about whether or not we have a suitable primitive $F(z)$ for a given function $f(z)$. Appearances can sometimes be deceptive. For example, we have already observed that $\log z$ is rather badly behaved. If we define $\operatorname{Log} z$ (with a capital "L") by
$$\operatorname{Log} z = \log r + i\theta$$
where $r = |z|$ and $-\pi < \theta \le \pi$, we would like $\operatorname{Log} z$ to be a primitive for $1/z$. But if we integrate around the unit circle $C = \{z : |z| = 1\}$, say from $(1, 0)$ back to itself, then it can be shown (though we will not do so here) that
$$\oint_C \frac{1}{z}\,dz = 2\pi i$$
That is, although $\operatorname{Log} z$ looks like a primitive for $1/z$, the integral $\oint_C (1/z)\,dz$ is not zero. This difficulty over primitives needs to be borne in mind, but will not be discussed further.
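The same midpoint sum applied to $1/z$ round the unit circle gives approximately $2\pi i$ rather than $0$. The last line illustrates, assuming Python's principal-value `cmath.log` (which behaves like $\operatorname{Log} z$), that $\operatorname{Log} z$ jumps by about $2\pi i$ across the negative real axis, which is why it fails as a primitive on a loop round the origin. The function and path names are ours.

```python
import cmath
import math


def contour_integral(f, path, steps=4000):
    """Midpoint-rule approximation to the integral of f along path(s), 0 <= s <= 1."""
    total = 0j
    for k in range(steps):
        a, b = path(k / steps), path((k + 1) / steps)
        total += f((a + b) / 2) * (b - a)
    return total


def unit_circle(s):
    return cmath.exp(2j * math.pi * s)        # once round, starting and ending at (1, 0)


print(contour_integral(lambda z: 1 / z, unit_circle))   # close to 2*pi*i
print(2j * math.pi)

# Log z just above and just below the negative real axis differ by about 2*pi*i
print(cmath.log(complex(-1, 1e-12)) - cmath.log(complex(-1, -1e-12)))
```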
Chapter 5

MATRICES

1 INTRODUCTION

Very often we are faced with large quantities of data which have to be processed. In the 19th century mathematical techniques were developed for coping with this situation, especially for "linear" problems. Although we will not follow an historical approach, it is interesting to note that these ideas were used to cope with large-scale calculations at a time when they had to be done by hand. Although nowadays, with the development of computers, such techniques might be expected to be irrelevant, the opposite turns out to be the case. Precisely because these techniques organised hand computation so systematically, they turned out to be very appropriate for calculating on a very large scale.
The idea is that if we have data, we handle it in arrays. For example, if we consider the population of Britain, we might want to describe it by $(x_1, \ldots, x_8)$ where, for $1 \le i \le 7$, $x_i$ is the number of people whose ages lie between $(i - 1) \times 10$ and $i \times 10$, and $x_8$ is the rest. Thus $x_1$ is the number of people aged up to 10, and so on until we get to $x_8$, the number of people who are over 70. For different purposes we might require alternative ways of splitting up the population. For example, in a population of animals the rate of reproduction may depend on the number of females of a certain age, and this would be the most relevant information. Thus in a human population the ranges 15-20, 20-30 and 30-40 may be the key age bands.
Another example might be the data from a set of 50 microphones streamed from a survey vessel. At a particular time $t$ we might have the data $(x_1(t), x_2(t), \ldots, x_{50}(t))$, and over a period of time we could build up a whole collection of such "strings". Each one of them is called a vector.

2 DEFINITIONS AND ELEMENTARY PROPERTIES


An ordered set of numbers $(x_1, \ldots, x_n)$ is called a row-vector. If we write the set as
$$\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}$$
then we call this a column-vector. Normally we will just write "vector", hoping that the context makes clear which is meant. The number $n$ is called the size or dimension of the vector. Vectors in books are often denoted by italics or bold face. We will choose to use bold face to denote vectors; for example, we have the vector $x = (x_1, \ldots, x_n)$. Points in three-dimensional space can be represented by vectors, $x = (x_1, x_2, x_3)$, where $x_1, x_2, x_3$ are the coordinates with respect to some set of axes. For example, if we take the usual Cartesian axes $Oxyz$ (where any two axes are at right angles to each other) then a cube of side-length 1 unit has vertices which can be represented by the eight vectors $(0, 0, 0)$, $(1, 0, 0)$, $(0, 1, 0)$, $(0, 0, 1)$, $(1, 1, 0)$, $(1, 0, 1)$, $(0, 1, 1)$, $(1, 1, 1)$ (see Fig. 5.1). Or, in data processing, a sample comprising the measurements of twenty objects out of some population can be thought of as a vector of size 20.
[Fig. 5.1: the unit cube on the axes $x$, $y$, $z$, with vertices labelled by their coordinates, e.g. $(1, 0, 0)$, $(1, 1, 0)$, $(0, 0, 1)$, $(0, 1, 1)$, $(1, 1, 1)$.]
We need to develop a method (an "algebra") to enable us to handle vectors more easily. By just writing $x$ instead of $(x_1, \ldots, x_n)$ we already save space, but unless we have techniques for manipulating these objects we have not gained much.
Let $x$ and $y$ be two $n$-vectors, with $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$. We define addition of the vectors by
$$x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)$$
Note that $x + y$ is another $n$-vector. If we define the zero vector as $0 = (0, 0, \ldots, 0)$ ($n$ entries) then $x + 0 = 0 + x = x$. The usual rules for addition apply to these vectors. Thus $x + y = y + x$, and if $-x = (-x_1, \ldots, -x_n)$ we have $x + (-x) = 0$; finally if $x$, $y$ and $z$ are all $n$-vectors, $(x + y) + z = x + (y + z)$. The same process and rules work (in the obvious ways) for column vectors. It is important to remember that we can only add vectors of the same size.
However we can define scalar multiplication by a number. Given a vector $x = (x_1, \ldots, x_n)$ and a number $a$, we define $ax = (ax_1, \ldots, ax_n)$.
For those familiar with vectors in an applied mathematics or physics setting, adding $x + y$ is just the vector addition related to the parallelogram law (see Fig. 5.2). Scalar multiplication of a vector by a number $a$ is just taking a multiple in the same direction (if $a$ is positive) or the reverse direction (if $a$ is negative).

~+y ,/ --",
;
- 2~
'-- -

-~

Fig. 5.2
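In code, the two vector operations are a few lines each. A sketch using Python tuples (the names `add` and `scale` and the sample vectors are ours), which refuses to add vectors of different sizes:

```python
def add(x, y):
    """x + y, defined only when x and y have the same size."""
    if len(x) != len(y):
        raise ValueError("can only add vectors of the same size")
    return tuple(a + b for a, b in zip(x, y))


def scale(a, x):
    """The scalar multiple a x = (a x_1, ..., a x_n)."""
    return tuple(a * xi for xi in x)


x, y = (1, 2, 3), (4, -1, 0)
print(add(x, y))                 # (5, 1, 3)
print(scale(-2, x))              # (-2, -4, -6)
print(add(x, scale(-1, x)))      # (0, 0, 0), i.e. x + (-x) = 0
```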

Exercise 1
Add the following pairs where possible:
(i) $(1, 2, 3) + (2, 1, 4)$
(ii) $(1, 2, -1, 0) + (-1, 2, 1, 1)$
(iii) $(-1, 4, 1) + (-4, 1, 2, 3)$
(iv) (1,t 1)+(-1, -2, -1)
Exercise 2
Evaluate the following vectors:
(i) 1(1, I, 1)
(ii) $(4, 2, 3) - 2(1, -1, 0)$
(iii) $3(1, -1, 1) + 2(-1, 1, -1)$

3 MATRICES

Vectors can handle some data very well, but quite often we need to manipulate vectors, and to facilitate understanding it is more convenient to arrange things in two-dimensional arrays. An $m \times n$ matrix is a rectangular array of numbers with $m$ rows and $n$ columns. Some examples:
Example 1
[Matrix displays: a $2 \times 3$ matrix, a $3 \times 4$ matrix and a $4 \times 2$ matrix; apart from the entries quoted below, they are not recoverable from the source.]

It is important to remember the convention that a matrix is described as (no. of rows) $\times$ (no. of columns). We will use the standard notation of denoting matrices by capital bold face letters: $A$, $B$, $M$, $N$ etc., and denote the entries as $a_{ij}$, where $a_{ij}$ is the entry in the $i$th row and the $j$th column. So in the first example above $a_{22} = 2$, $a_{23} = 5$. In the second example $a_{13} = 7$, $a_{31} = 4$. Occasionally we write $(a_{ij})$ to denote the matrix with entries $a_{ij}$, $1 \le i \le m$, $1 \le j \le n$, where the matrix has $m$ rows and $n$ columns.
Clearly a vector is a matrix with either one row or one column; the vector $(a_1, \ldots, a_n)$ is a $1 \times n$ matrix. If $A$ and $B$ are matrices of the same size (i.e. both $m \times n$ matrices) we can define addition by:
$$A + B = (a_{ij} + b_{ij}) \quad\text{where } A = (a_{ij}) \text{ and } B = (b_{ij})$$
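Example 2
If
$$A = \begin{pmatrix}1 & -1\\ 2 & 3\end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix}2 & 1\\ 3 & 4\end{pmatrix}$$
are both $2 \times 2$ then
$$A + B = \begin{pmatrix}1 + 2 & -1 + 1\\ 2 + 3 & 3 + 4\end{pmatrix} = \begin{pmatrix}3 & 0\\ 5 & 7\end{pmatrix}$$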
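Example 3
If
$$A = \begin{pmatrix}1 & -1\\ 2 & -2\\ -3 & 4\end{pmatrix}$$
and $B$ [display not recoverable from the source] are both $3 \times 2$, then $A + B$ is again $3 \times 2$, with first row $(0, -2)$.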
Exercise 3
Add (where possible) the following pairs of matrices:
[Pairs (i) to (iv) not recoverable from the source; pair (iv) has complex entries such as $1 + i$, $2 - i$ and $-i$, as well as decimals such as $0.1$ and $0.01$.]
The last example illustrates the fact that the entries in a matrix may be complex numbers, or, possibly, functions.
There are simple (and familiar-looking) rules that the addition of matrices satisfies:
$$A + B = B + A; \qquad (A + B) + C = A + (B + C)$$
where $A$, $B$ and $C$ are all $m \times n$ matrices. If $A = (a_{ij})$ and $-A = (-a_{ij})$, then $A + (-A) = O$, where $O$ is the matrix all of whose entries are zero; also $O + A = A$. So we can add matrices in much the same way as we do vectors (or numbers).
Again we have a scalar multiplication, $aA$, where $a$ is a number and $A$ is a matrix, defined by $aA = (a\,a_{ij})$ where $A = (a_{ij})$.
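Example 4
$$-2\begin{pmatrix}2 & -1\\ 3 & 1\end{pmatrix} = \begin{pmatrix}-2 \times 2 & -2 \times (-1)\\ -2 \times 3 & -2 \times 1\end{pmatrix} = \begin{pmatrix}-4 & 2\\ -6 & -2\end{pmatrix}$$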
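Matrix addition and scalar multiplication work entry by entry, exactly as for vectors. This sketch stores a matrix as a list of rows (an assumption of the example, not a convention of the book), reproduces Example 4 and also performs a $2 \times 2$ sum; the helper names are ours.

```python
def mat_add(A, B):
    """(a_ij + b_ij) for two m x n matrices stored as lists of rows."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("matrices must be the same size")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


def mat_scale(c, A):
    """The scalar multiple (c a_ij)."""
    return [[c * a for a in row] for row in A]


print(mat_add([[1, -1], [2, 3]], [[2, 1], [3, 4]]))   # [[3, 0], [5, 7]]
print(mat_scale(-2, [[2, -1], [3, 1]]))               # [[-4, 2], [-6, -2]] (Example 4)
```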

4 MULTIPLICATION OF MATRICES
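But in some ways the most useful thing about matrices is that (in suitable cases) it is possible to multiply them together. We begin to approach this technique by considering the simultaneous linear equations
$$\begin{aligned} ax_1 + bx_2 &= e_1\\ cx_1 + dx_2 &= e_2 \end{aligned} \qquad (*)$$
To find the solutions (if any) we only need to know $a$, $b$, $c$, $d$, $e_1$ and $e_2$. We could describe the same information using matrices and vectors:
$$\begin{pmatrix}a & b\\ c & d\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}e_1\\ e_2\end{pmatrix} \qquad (**)$$
where the matrix $\begin{pmatrix}a & b\\ c & d\end{pmatrix}$ tells us the coefficients, the vector $\begin{pmatrix}e_1\\ e_2\end{pmatrix}$ tells us the constants and finally the vector $\begin{pmatrix}x_1\\ x_2\end{pmatrix}$ tells us the names of the variables. Comparison of the equations (*) and (**) will lead us to a definition of matrix multiplication (see Fig. 5.3). So
$$\begin{pmatrix}a & b\\ c & d\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}ax_1 + bx_2\\ cx_1 + dx_2\end{pmatrix}$$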
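It is worth noting that the same matrix notation we have used here for two equations in two unknowns could clearly be used for $m$ linear equations in $n$ unknowns: given
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= e_1\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= e_2\\ &\;\;\vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= e_m \end{aligned}$$
[Fig. 5.3]
we can write the equations as $Ax = e$, where $A$ is the $m \times n$ matrix $(a_{ij})$,
$$x = \begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} \quad\text{and}\quad e = \begin{pmatrix}e_1\\ \vdots\\ e_m\end{pmatrix}$$
This looks like multiplication, and we define the product of an $m \times n$ and an $n \times 1$ matrix by the rule above; i.e.
$$Ax = \begin{pmatrix}a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\ \vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n\end{pmatrix}$$
which is an $m \times 1$ matrix. Notice that we are thinking of vectors in this context as column matrices, i.e. $n \times 1$ or $m \times 1$ matrices. Let $A$ and $B$ be two matrices, where $A$ is $m \times n$ and $B$ is $n \times p$, say. We then define the product $AB$ by thinking of $B$ as made up of $p$ columns, each $n \times 1$. Let
$$B = (b_1, b_2, \ldots, b_p), \quad\text{where}\quad b_i = \begin{pmatrix}b_{1i}\\ \vdots\\ b_{ni}\end{pmatrix}$$
and define $AB = (Ab_1, Ab_2, \ldots, Ab_p)$. Notice that $Ab_i$ is an $m \times 1$ matrix, so that $AB$ is an $m \times p$ matrix. A few examples should make things clearer: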
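Examples
5. Let $A = \begin{pmatrix}1 & 2\\ 2 & -1\end{pmatrix}$, $B = \begin{pmatrix}1\\ 1\end{pmatrix}$. Then
$$AB = \begin{pmatrix}1 & 2\\ 2 & -1\end{pmatrix}\begin{pmatrix}1\\ 1\end{pmatrix} = \begin{pmatrix}1 \times 1 + 2 \times 1\\ 2 \times 1 - 1 \times 1\end{pmatrix} = \begin{pmatrix}3\\ 1\end{pmatrix}$$
6. Let $A = \begin{pmatrix}1 & 2\\ 2 & -1\end{pmatrix}$, $B = \begin{pmatrix}1 & 1\\ 1 & 2\end{pmatrix}$. Then
$$AB = \left(A\begin{pmatrix}1\\ 1\end{pmatrix}, A\begin{pmatrix}1\\ 2\end{pmatrix}\right) = \begin{pmatrix}3 & 1 \times 1 + 2 \times 2\\ 1 & 2 \times 1 - 1 \times 2\end{pmatrix} = \begin{pmatrix}3 & 5\\ 1 & 0\end{pmatrix}$$
7. Let $A = \begin{pmatrix}1 & 2\\ 2 & -1\end{pmatrix}$, $B = \begin{pmatrix}1 & 1 & 3\\ 1 & 2 & -2\end{pmatrix}$. Then
$$AB = \left(A\begin{pmatrix}1\\ 1\end{pmatrix}, A\begin{pmatrix}1\\ 2\end{pmatrix}, A\begin{pmatrix}3\\ -2\end{pmatrix}\right) = \begin{pmatrix}3 & 5 & 3 - 4\\ 1 & 0 & 6 + 2\end{pmatrix} = \begin{pmatrix}3 & 5 & -1\\ 1 & 0 & 8\end{pmatrix}$$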
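The column-by-column definition $AB = (Ab_1, \ldots, Ab_p)$ can be coded literally. The sketch below (matrices as lists of rows; function names ours) reproduces the product $AB$ of Example 7, with $A = \begin{pmatrix}1 & 2\\ 2 & -1\end{pmatrix}$ and $B = \begin{pmatrix}1 & 1 & 3\\ 1 & 2 & -2\end{pmatrix}$, and rejects pairs that are not conformable.

```python
def mat_vec(A, x):
    """A x for A given as a list of rows and x given as a list (a column vector)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def mat_mul(A, B):
    """AB = (A b_1, A b_2, ..., A b_p), built column by column."""
    n = len(B)                                   # B is n x p
    if any(len(row) != n for row in A):
        raise ValueError("A and B are not conformable")
    columns = [[B[i][j] for i in range(n)] for j in range(len(B[0]))]   # b_1, ..., b_p
    products = [mat_vec(A, b) for b in columns]                         # each A b_j is m x 1
    return [[col[i] for col in products] for i in range(len(A))]        # reassemble the rows


A = [[1, 2], [2, -1]]
B = [[1, 1, 3], [1, 2, -2]]
print(mat_mul(A, B))   # [[3, 5, -1], [1, 0, 8]]
```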
If $A$ is an $m \times n$ matrix and $B$ is a $q \times p$ matrix, we can only multiply $A$ and $B$ (forming $AB$) if $n = q$. In that case we say that $A$ and $B$ are conformable. Beware: if $A$ and $B$ are conformable, $B$ and $A$ need not be! If $A$ is $2 \times 3$ and $B$ is $3 \times 4$, $BA$ is not defined. [Example matrices $A$ ($2 \times 3$) and $B$ ($3 \times 4$) not recoverable from the source.]
Also if $A$ is $m \times n$ and $B$ is $n \times p$ then we have defined the product $AB$ to be $m \times p$. If $m = p$, so that $BA$ is defined, then $AB$ is $m \times m$ and $BA$ is $n \times n$. Thus $AB$ is in general a different size from $BA$.
A matrix is said to be square if for some $m$ it has size $m \times m$. If $A$ and $B$ are both square $m \times m$ matrices then $AB$ and $BA$ are both defined, and are both square (of the same size as $A$ and $B$), but it does not follow that $AB$ and $BA$ are equal. For example, if
[$2 \times 2$ matrices $A$ and $B$ not recoverable from the source]
then $AB \ne O$ but
$$BA = \begin{pmatrix}0 & 0\\ 0 & 0\end{pmatrix}$$
This example also shows that $BA = O$ does not imply that either $A$ or $B$ is $O$.
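Non-commutativity is easy to demonstrate. The $2 \times 2$ matrices in this sketch are hypothetical ones of our own choosing, not those of the example above, but they show the same two effects: $AB \ne BA$, and $BA = O$ with neither factor zero. Each entry is computed as a row-times-column sum $a_{i1}b_{1j} + \cdots + a_{in}b_{nj}$.

```python
def mat_mul(A, B):
    """Entry (i, j) of AB is a_i1 b_1j + a_i2 b_2j + ... + a_in b_nj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


# hypothetical 2 x 2 matrices chosen for illustration
A = [[0, 0], [1, 0]]
B = [[1, 0], [0, 0]]
print(mat_mul(A, B))   # [[0, 0], [1, 0]]
print(mat_mul(B, A))   # [[0, 0], [0, 0]]: BA = O although neither A nor B is O
```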
From our definitions it is clear that if $x = (x_1, \ldots, x_m)$ is a $1 \times m$ matrix, i.e. a row-vector of size $m$, and if
$$y = \begin{pmatrix}y_1\\ \vdots\\ y_m\end{pmatrix}$$
is an $m \times 1$ matrix, i.e. a column-vector of size $m$, then we have a product $x \cdot y = x_1y_1 + x_2y_2 + \cdots + x_my_m$, which is a $1 \times 1$ matrix; any such $1 \times 1$ matrix can be thought of just as a number, and vice versa.
With this multiplication in mind we can write the product of two matrices in another, very convenient way. If $A$ is $m \times n$ and $B$ is $n \times p$, and we write
$$A = \begin{pmatrix}a_1\\ \vdots\\ a_m\end{pmatrix}$$
where each $a_i$ is a $1 \times n$ matrix, i.e. a row-vector, and $B = (b_1, \ldots, b_p)$ where each $b_j$ is an $n \times 1$ matrix, i.e. a column-vector, then
$$AB = (a_i \cdot b_j) \quad\text{is } m \times p$$
where for any appropriate $i$ and $j$ the entry $c_{ij}$ of $AB$ is given by
$$c_{ij} = a_i \cdot b_j = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}$$
We should notice that this sum of products resembles a convolution of $a_i$ and $b_j$; a similar formula has already been seen in Chapter 1 § 2 with respect to the multiplication of polynomials.
There are some simple rules for multiplication:
$$A(BC) = (AB)C, \qquad A(B + C) = AB + AC, \qquad (A + B)C = AC + BC$$
whenever the products exist.
We let $I_n$ be the $n \times n$ matrix with 1's down the "leading" diagonal (the $ii$th entries) and 0 elsewhere, i.e.
$$I_3 = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix} \quad\text{and so on;}$$
then if $A$ is $m \times n$, $AI_n = A$ and $I_mA = A$. If we write $I$ we mean $I_n$ where $n$ is not specified. Such matrices are called identity, or unit, matrices.

5 SPECIAL TYPES OF MATRICES

This is just a list of matrices with special properties which turn out to be useful. We begin by defining the transpose of an $m \times n$ matrix $A$. The transpose $A^T$ is an $n \times m$ matrix whose $(i, j)$th entry is the $(j, i)$th entry of $A$. An important relation is that $(AB)^T = B^TA^T$; the proof is straightforward but not obvious.
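Examples
8. [A matrix $A$ and its transpose $A^T$; display not recoverable from the source.]
9. [A second matrix $A$, with entries $0$ and $-1$, and its transpose $A^T$; display not recoverable from the source.]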
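The rule $(AB)^T = B^TA^T$ (note the reversed order) can be checked on any conformable pair. The matrices below are arbitrary illustrations, and the helper names are ours.

```python
def transpose(A):
    """(A^T)_ij = A_ji for A stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1, 2, 0], [3, -1, 4]]        # 2 x 3, hypothetical
B = [[2, 1], [0, 5], [-3, 1]]      # 3 x 2, hypothetical
print(transpose(mat_mul(A, B)))                        # [[2, -6], [11, 2]]
print(mat_mul(transpose(B), transpose(A)))             # the same matrix
print(len(mat_mul(transpose(A), transpose(B))))        # 3: A^T B^T is 3 x 3, a different matrix
```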
If $A$ is a square $n \times n$ matrix such that $A = A^T$ then $A$ is called symmetric. This amounts to insisting that for $i \ne j$ we must have $a_{ij} = a_{ji}$. If $A$ is square and $A = -A^T$ then $A$ is called skew-symmetric. This amounts to insisting that $a_{ii} = 0$ for each $i$, and $a_{ij} = -a_{ji}$ for each pair $i, j$ where $i \ne j$.
Exercise 4
Check the following are symmetric:
[Matrices not recoverable from the source.]
Exercise 5
Check the following are skew-symmetric:
[Matrices not recoverable from the source.]

As we hinted above, the entries $a_{ii}$, $1 \le i \le n$, in an $n \times n$ matrix are called the elements of the leading diagonal.
A square matrix $A$ is said to be orthogonal if $AA^T = I$. We will see later that orthogonal matrices have importance in some problems of interpretation. Clearly $I$ is an orthogonal matrix, since $I^T = I$ and $II = I$.
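Exercise 6
Show that
(i) $A = $ [a $2 \times 2$ matrix; display not recoverable from the source]
(ii) $A = \begin{pmatrix}1/\sqrt{2} & 1/\sqrt{2}\\ 1/\sqrt{2} & -1/\sqrt{2}\end{pmatrix}$
(iii) $A = $ [a $3 \times 3$ matrix whose first column is $(1/\sqrt{3}, 1/\sqrt{3}, 1/\sqrt{3})^T$ and second column is $(1/\sqrt{6}, -2/\sqrt{6}, 1/\sqrt{6})^T$; the third column is not recoverable from the source]
are all orthogonal.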
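Orthogonality can be tested by computing $AA^T$ and comparing it with $I$ up to a rounding tolerance. A sketch (the tolerance, the function name and the rotation matrix used as an example are our own choices):

```python
import math


def is_orthogonal(A, tol=1e-12):
    """Check A A^T = I entry by entry, up to the rounding tolerance tol."""
    n = len(A)
    for i in range(n):
        for j in range(n):
            entry = sum(A[i][k] * A[j][k] for k in range(n))   # (A A^T)_ij = row i . row j
            if abs(entry - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


t = 0.8
rotation = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]   # hypothetical example
print(is_orthogonal(rotation))            # True
print(is_orthogonal([[1, 1], [0, 1]]))    # False
```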
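A square matrix is said to be diagonal if the only non-zero entries (if any) are down the leading diagonal.
Example 10
[Two diagonal matrices, a $2 \times 2$ one and a larger one whose leading-diagonal entries include $5$, $6$, $-1$ and $-2$; the displays are not recoverable from the source.]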

A matrix $A$ is said to be invertible or non-singular if there is a matrix $B$ such that $AB = BA = I$. It is clear that for this to happen $A$ and $B$ must be square and of the same size, say $n \times n$. Later we will see how to check whether or not a square matrix is invertible. A square matrix which is not invertible is called singular. Note that these terms are not usually applied to non-square matrices.
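Example 11
If
$$A = \begin{pmatrix}2 & 1\\ 0 & 3\end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix}\tfrac12 & -\tfrac16\\ 0 & \tfrac13\end{pmatrix}$$
then
$$AB = \begin{pmatrix}2 \cdot \tfrac12 + 1 \cdot 0 & 2 \cdot (-\tfrac16) + 1 \cdot \tfrac13\\ 0 \cdot \tfrac12 + 3 \cdot 0 & 0 \cdot (-\tfrac16) + 3 \cdot \tfrac13\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = I_2$$
and
$$BA = \begin{pmatrix}\tfrac12 \cdot 2 + (-\tfrac16) \cdot 0 & \tfrac12 \cdot 1 + (-\tfrac16) \cdot 3\\ 0 \cdot 2 + \tfrac13 \cdot 0 & 0 \cdot 1 + \tfrac13 \cdot 3\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = I_2$$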
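Exact rational arithmetic confirms Example 11 without rounding. This sketch uses Python's `fractions.Fraction`, with $A = \begin{pmatrix}2 & 1\\ 0 & 3\end{pmatrix}$ and $B = \begin{pmatrix}\frac12 & -\frac16\\ 0 & \frac13\end{pmatrix}$; `mat_mul` is our own helper.

```python
from fractions import Fraction as F


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[2, 1], [0, 3]]
B = [[F(1, 2), F(-1, 6)], [0, F(1, 3)]]
I2 = [[1, 0], [0, 1]]
print(mat_mul(A, B) == I2, mat_mul(B, A) == I2)   # True True
```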
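Exercise 7
Check that if $A$ and $B$ are [two $2 \times 2$ matrices; displays not recoverable from the source] then $AB = I_2 = BA$.
Exercise 8
Check that if $A$ and $B$ are [two $3 \times 3$ matrices; displays not recoverable from the source] then $AB = I_3 = BA$.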

6 MATRICES AS FUNCTIONS
One important way to look at matrices is to view them as functions from vectors to vectors. Let $A$ be an $m \times n$ matrix and let $C(n)$ and $C(m)$ denote the set of all $n \times 1$ column vectors and the set of all $m \times 1$ column vectors, respectively. Now $A$ defines a function from $C(n)$ to $C(m)$ by sending $x$ to $Ax$, where $x$ is in $C(n)$. To confuse things, $A^T$ defines a function from $C(m)$ to $C(n)$ by: $y$ goes to $A^Ty$, where $y$ is in $C(m)$.
If $R(n)$ is the set of all $1 \times n$ row vectors and $R(m)$ is the set of all $1 \times m$ row vectors, $A$ also defines a function from $R(m)$ to $R(n)$ by: $x$ goes to $xA$, where $x$ is in $R(m)$.
To say that $A$ is invertible means that the function defined by $A$ has an inverse map, because if $AB = I = BA$ then $B(Ax) = Ix = x$. To say that $A$ is symmetric is the same as saying that the function defined by $A$ is the same as the one defined by $A^T$.
The functions defined by matrices are rather special. They are called linear because they preserve addition and scalar multiplication: for any vectors $x_1$, $x_2$ and any number $\lambda$, we have
$$A(x_1 + x_2) = Ax_1 + Ax_2 \quad\text{and}\quad A(\lambda x) = \lambda Ax$$
In fact, given any linear function $f$ from $C(n)$ to $C(m)$, i.e. a function $f$ such that
$$f(x_1 + x_2) = f(x_1) + f(x_2) \quad\text{and}\quad f(\lambda x) = \lambda f(x)$$
we can find an $m \times n$ matrix $A$ such that $f(x) = Ax$. Consequently matrices are fundamental to the study of linear problems (or, in practice, problems that can be approximated by linear techniques).
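The claim that any linear function comes from a matrix has a constructive reading, which the text does not spell out: the $j$th column of $A$ is $f(e_j)$, where $e_j$ has a 1 in position $j$ and 0 elsewhere. A sketch with a hypothetical linear $f$ from $C(3)$ to $C(2)$ (the function names are ours):

```python
def matrix_of(f, n):
    """The m x n matrix of a linear f on C(n): its j-th column is f(e_j)."""
    columns = [f([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [[col[i] for col in columns] for i in range(len(columns[0]))]


def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def f(x):
    # a hypothetical linear function from C(3) to C(2)
    return [2 * x[0] - x[2], x[0] + x[1] + x[2]]


A = matrix_of(f, 3)
print(A)                    # [[2, 0, -1], [1, 1, 1]]
x = [3, -2, 5]
print(mat_vec(A, x) == f(x))   # True
```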
If we take the special case of $3 \times 3$ matrices then $C(3)$ is just normal three-dimensional space. If we fix a coordinate frame for $C(3)$ then a linear function from $C(3)$ to $C(3)$ is a function that preserves straight lines and fixes the origin. So matrices represent functions of this sort. In real life we are frequently concerned with functions (or transformations) that preserve the length of a vector. We can write the length of a vector $x$ rather neatly in vector form as
$$(\text{length of } x)^2 = x^T \cdot x \quad (\text{where } x \text{ is a column vector})$$
or
$$(\text{length of } x)^2 = x \cdot x^T \quad (\text{where } x \text{ is a row vector})$$
Note that $x^T$ is a $1 \times 3$ row vector (or a $3 \times 1$ column vector, respectively), so the product $x^T \cdot x$ (or the product $x \cdot x^T$, in the other case) is a $1 \times 1$ matrix, i.e. a scalar. Note also that we have to make sure we are using the right formula: for example, if
$$x = \begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix}$$
is a $3 \times 1$ column vector then $x^T = (x_1, x_2, x_3)$ is a $1 \times 3$ row vector, and $x^T \cdot x = x_1^2 + x_2^2 + x_3^2$, but
$$x \cdot x^T = \begin{pmatrix}x_1^2 & x_1x_2 & x_1x_3\\ x_2x_1 & x_2^2 & x_2x_3\\ x_3x_1 & x_3x_2 & x_3^2\end{pmatrix}$$
is a $3 \times 3$ matrix.

Examples
12. If $x = (0, 1, 0)^T$, then $x^T \cdot x = (0, 1, 0)(0, 1, 0)^T = 0 + 1 + 0 = 1$.
13. If $x = (1, 1, 0)^T$, then $x^T \cdot x = (1, 1, 0)(1, 1, 0)^T = 1 + 1 + 0 = 2$.
We frequently write $+\sqrt{x^T \cdot x} = |x|$, the length or norm of $x$.

If we take two vectors $x$ and $y$ in $C(3)$ (the set of column vectors) then $x^T \cdot y$ is called the scalar product of $x$ and $y$. You may have encountered a special case in the "dot product" of vector mechanics. An exercise in three-dimensional geometry shows that
$$\frac{x^T \cdot y}{|x|\,|y|} = \cos\theta$$
where $\theta$ is the angle between $x$ and $y$. (Beware: this only works if neither $x$ nor $y$ is zero.) In consequence we say two vectors are orthogonal or perpendicular if $x^T \cdot y = 0$.
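The scalar product, the norm $|x| = +\sqrt{x^Tx}$ and the angle formula fit in a short sketch (helper names ours). Clamping the cosine into $[-1, 1]$ guards against rounding pushing it just outside the domain of `math.acos`.

```python
import math


def dot(x, y):
    """Scalar product x^T y = x_1 y_1 + ... + x_n y_n."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """|x| = +sqrt(x^T x)."""
    return math.sqrt(dot(x, x))


def angle(x, y):
    """Angle between non-zero x and y, from cos(theta) = x^T y / (|x| |y|)."""
    c = dot(x, y) / (norm(x) * norm(y))
    return math.acos(max(-1.0, min(1.0, c)))


print(norm((0, 1, 0)), norm((1, 1, 0)))              # 1.0 and sqrt(2), as in Examples 12 and 13
print(math.degrees(angle((1, 0, 0), (1, 1, 0))))     # about 45 degrees
print(dot((1, 2, 0), (-2, 1, 5)))                    # 0: these vectors are orthogonal
```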
Let $A$ be a $3 \times 3$ matrix such that the function it defines "preserves distance", and suppose $y = Ax$. To say that $A$ preserves distance means that $y^T \cdot y = x^T \cdot x$ for all $x$. So
$$(Ax)^T \cdot Ax = x^T \cdot x \quad\text{for all } x$$
Thus $x^TA^TAx = x^T \cdot x$ for all $x$.
From this relation it can be shown that $A^TA = I$ (for a square matrix this is equivalent to $AA^T = I$). So we can now see the importance of orthogonal matrices: they are precisely those that preserve scalar products, and hence distances and angles. For if $y_1 = Ax_1$ and $y_2 = Ax_2$ and $A$ is orthogonal,
$$y_1^Ty_2 = x_1^TA^TAx_2 = x_1^TIx_2 = x_1^Tx_2$$
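A numerical illustration that an orthogonal matrix preserves scalar products and lengths. The rotation about the $z$-axis and the two vectors are hypothetical examples, and agreement is up to rounding error.

```python
import math


def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


t = 0.6
# hypothetical orthogonal matrix: rotation through angle t about the z-axis
A = [[math.cos(t), -math.sin(t), 0.0],
     [math.sin(t), math.cos(t), 0.0],
     [0.0, 0.0, 1.0]]
x1, x2 = [1.0, 2.0, -1.0], [0.5, -3.0, 2.0]
y1, y2 = mat_vec(A, x1), mat_vec(A, x2)
print(dot(y1, y2), dot(x1, x2))   # equal up to rounding (both about -7.5)
print(dot(y1, y1), dot(x1, x1))   # lengths are preserved too
```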
There are two final comments which may be worth making. One is that the dimension 3 is in no way special in this context, and the whole theory and discussion could be carried out over $C(n)$ for arbitrary $n$. The same definitions of length and of scalar products work without any difficulties.
The other is that so far this discussion has been on the basis of real matrices and vectors. There is no reason why we cannot allow complex entries, and then we get complex matrices and vectors. In this situation we introduce the complex conjugate $\bar{A}$ of a matrix $A = (a_{ij})$, which is obtained by changing each entry $a_{ij}$ to its complex conjugate $\bar{a}_{ij}$. (Recall that this means, if $a_{ij} = x_{ij} + iy_{ij}$, where $x_{ij}$, $y_{ij}$ are real, then $\bar{a}_{ij} = x_{ij} - iy_{ij}$.) It then turns out that the scalar product is now $\bar{x}^T \cdot y$, and that instead of orthogonal matrices we use unitary matrices, i.e. those matrices $A$ such that $\bar{A}^T \cdot A = I$. For a real vector $x$ we have $\bar{x} = x$, so $\bar{x}^T = x^T$, and for a real matrix $\bar{A}^T = A^T$. Thus, if a complex vector or matrix happens to have all real entries, these new definitions are the same as the ones given for real vectors and matrices.
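For complex matrices the test becomes $\bar{A}^TA = I$. The sketch below checks this for one hypothetical unitary matrix, and shows that $\bar{x}^Tx$ is real and non-negative for a complex vector (helper names ours).

```python
import math


def conj_transpose(A):
    """A-bar transpose: conjugate every entry, then transpose."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


s = 1 / math.sqrt(2)
U = [[s, s * 1j], [s * 1j, s]]          # hypothetical unitary matrix
P = mat_mul(conj_transpose(U), U)
print(P)                                 # close to [[1, 0], [0, 1]]

# for complex vectors the scalar product is x-bar^T y, so x-bar^T x is real and >= 0
x = [1 + 2j, -1j]
print(sum(a.conjugate() * a for a in x))   # (6+0j) = |1 + 2i|^2 + |-i|^2
```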

7 LINEAR EQUATIONS
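In this section we will use procedures with matrices to solve systems of linear equations. Let
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\ &\;\;\vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned} \qquad \text{(i)}$$
be a system of $m$ linear equations in $n$ unknowns $x_1, x_2, \ldots, x_n$. We write this as $Ax = b$, where
$$A = (a_{ij}), \qquad x = \begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}, \qquad b = \begin{pmatrix}b_1\\ \vdots\\ b_m\end{pmatrix}$$
The matrix $(A, b)$, which is $m \times (n + 1)$, is called the augmented matrix of the system (i), and looks like
$$\begin{pmatrix} a_{11} & \cdots & a_{1n} & b_1\\ \vdots & & \vdots & \vdots\\ a_{m1} & \cdots & a_{mn} & b_m \end{pmatrix}$$
There are certain "elementary" ways of changing the system (i) which will not alter the set of solutions, where the set of solutions of (i), or solution set of (i), is the set of all $x \in C(n)$ such that $Ax = b$.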
Firstly, if we switch two equations in the system the solution set will not change. Let us take a $2 \times 3$ system to illustrate:
$$\begin{aligned} 4x_1 + x_2 + x_3 &= 1\\ 2x_1 - x_2 + x_3 &= 2 \end{aligned} \qquad (*)$$
This system is clearly the same as
$$\begin{aligned} 2x_1 - x_2 + x_3 &= 2\\ 4x_1 + x_2 + x_3 &= 1 \end{aligned} \qquad (**)$$
The matrix of (**) has the same rows as the matrix of (*), but they have been swapped round.
Secondly, we can multiply any equation in (i) by a non-zero constant. We can multiply the first row in the example (**) by $\tfrac12$ to get
$$\begin{aligned} x_1 - \tfrac12x_2 + \tfrac12x_3 &= 1\\ 4x_1 + x_2 + x_3 &= 1 \end{aligned} \qquad (***)$$
This is equivalent to multiplying the appropriate row of the matrix by the same constant.
Thirdly, we can add (or subtract) any non-zero multiple of one row to any other. In the example (***) above we can take $4 \times$ the 1st row away from the 2nd row. This gives
$$\begin{aligned} x_1 - \tfrac12x_2 + \tfrac12x_3 &= 1\\ 0 \times x_1 + 3x_2 - x_3 &= -3 \end{aligned}$$
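Using this example we can complete the process to find the solutions as follows:
$$\begin{aligned} x_1 - \tfrac12x_2 + \tfrac12x_3 &= 1\\ x_2 - \tfrac13x_3 &= -1 \qquad \text{(dividing by 3)} \end{aligned}$$
$$\begin{aligned} x_1 + 0 \times x_2 + \tfrac13x_3 &= \tfrac12 \qquad (\tfrac12 \text{ times 2nd row added to 1st row})\\ x_2 - \tfrac13x_3 &= -1 \end{aligned}$$
Now if we assign $x_3$ to be $\lambda$, the solutions are given by
$$x_1 = \tfrac12 - \tfrac13\lambda, \qquad x_2 = -1 + \tfrac13\lambda, \qquad x_3 = \lambda$$
It is instructive to see the augmented matrices at each stage:
$$\begin{pmatrix}4 & 1 & 1 & 1\\ 2 & -1 & 1 & 2\end{pmatrix} \to \begin{pmatrix}2 & -1 & 1 & 2\\ 4 & 1 & 1 & 1\end{pmatrix} \to \begin{pmatrix}1 & -\tfrac12 & \tfrac12 & 1\\ 4 & 1 & 1 & 1\end{pmatrix}$$
$$\to \begin{pmatrix}1 & -\tfrac12 & \tfrac12 & 1\\ 0 & 3 & -1 & -3\end{pmatrix} \to \begin{pmatrix}1 & -\tfrac12 & \tfrac12 & 1\\ 0 & 1 & -\tfrac13 & -1\end{pmatrix} \to \begin{pmatrix}1 & 0 & \tfrac13 & \tfrac12\\ 0 & 1 & -\tfrac13 & -1\end{pmatrix}$$
The final matrix is said to be in row reduced form.
The whole process could have been done in terms of matrices, forgetting the original equations.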
This leads to the concept of elementary row operations on a matrix:
(i) Interchange two rows,
(ii) multiply a row by a non-zero constant, and
(iii) add a non-zero multiple of one row to another.
An m x n matrix is called a (row) echelon matrix if
(i) the first non-zero element in each non-zero row is 1,
(ii) the leading 1 in any non-zero row occurs to the right of the leading 1
in any preceding row,
(iii) the non-zero rows appear before the zero rows.
An echelon matrix is called a reduced echelon matrix if
(iv) the leading 1 in any non-zero row is the only non-zero element in the
column in which that 1 occurs.
Any matrix can be converted to the reduced echelon form by applying
the elementary row operations in sequence.
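The reduction can be automated using only the three elementary row operations. This is a minimal sketch, not an optimised routine: it uses exact `Fraction` arithmetic (so there are no rounding errors in small examples) and works column by column, making each leading entry 1 and clearing the rest of its column. Applied to the augmented matrix of the $2 \times 3$ system above it reaches the same row reduced form, by a slightly different route. The name `rref` is ours.

```python
from fractions import Fraction


def rref(matrix):
    """Reduced row echelon form of a matrix given as a list of rows."""
    A = [[Fraction(a) for a in row] for row in matrix]
    n_rows, n_cols = len(A), len(A[0])
    r = 0                                        # row that will receive the next leading 1
    for c in range(n_cols):
        # look for a row at or below r with a non-zero entry in column c
        p = next((i for i in range(r, n_rows) if A[i][c] != 0), None)
        if p is None:
            continue                             # no leading 1 in this column
        A[r], A[p] = A[p], A[r]                  # (i) interchange two rows
        lead = A[r][c]
        A[r] = [a / lead for a in A[r]]          # (ii) multiply a row by a non-zero constant
        for i in range(n_rows):
            if i != r and A[i][c] != 0:          # (iii) add a multiple of row r to row i
                factor = A[i][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == n_rows:
            break
    return A


# augmented matrix of 4x1 + x2 + x3 = 1, 2x1 - x2 + x3 = 2
for row in rref([[4, 1, 1, 1], [2, -1, 1, 2]]):
    print([str(a) for a in row])
# ['1', '0', '1/3', '1/2']
# ['0', '1', '-1/3', '-1']
```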
Examples
13.
$$\begin{pmatrix}2 & 4 & 6\\ 3 & 6 & 9\end{pmatrix} \to \begin{pmatrix}1 & 2 & 3\\ 3 & 6 & 9\end{pmatrix} \text{ (divide 1st row by 2)} \to \begin{pmatrix}1 & 2 & 3\\ 0 & 0 & 0\end{pmatrix} \text{ (3 $\times$ 1st row away from 2nd row)}$$
14. A three-row matrix is reduced by the following steps: $3 \times$ 1st row away from 2nd row; $5 \times$ 1st row away from 3rd row; dividing 2nd row by $-2$; adding $4 \times$ 2nd row to 3rd row; taking $2 \times$ 2nd row away from 1st row. [Matrix displays not recoverable from the source.]
15. [The reduction of a $4 \times 5$ matrix by a longer sequence of row operations, ending with entries such as $115/59$, $-79/59$ and $-8/59$; the displays are not recoverable from the source.]
Note: In one or two places various manipulations have been done to simplify the calculations.
Exercise 9
Find the reduced row echelon form of the following matrices:
[Matrices (i) and (ii) not recoverable from the source.]
The significance of this procedure is that, given a system of equations in reduced echelon form, it is easy to read off the solutions. Assume we begin with $(A, b)$ and end with systems which, when interpreted as equations, will look like this:
$$\begin{aligned} x_1 + a'_{1,r+1}x_{r+1} + \cdots + a'_{1,n}x_n &= b'_1\\ x_2 + a'_{2,r+1}x_{r+1} + \cdots + a'_{2,n}x_n &= b'_2\\ &\;\;\vdots\\ x_r + a'_{r,r+1}x_{r+1} + \cdots + a'_{r,n}x_n &= b'_r \end{aligned}$$
The various possibilities for solutions are given by the nature of the form. It is probably best to use a number of examples to illustrate.
Examples
16.
$$\begin{aligned} 2x_1 + 4x_2 &= 6\\ 3x_1 + 6x_2 &= 9 \end{aligned}$$
Then we know from earlier calculations that this system has the same solution set as
$$x_1 + 2x_2 = 3 \quad\text{and}\quad 0x_1 + 0x_2 = 0$$
So we have $x_2 = \lambda$ and $x_1 = 3 - 2\lambda$.
17.
$$\begin{aligned} 2x_1 + 4x_2 &= 6\\ 3x_1 + 6x_2 &= 10 \end{aligned}$$
The augmented matrix is $\begin{pmatrix}2 & 4 & 6\\ 3 & 6 & 10\end{pmatrix}$. So by the process of elementary row operations we get
$$\begin{pmatrix}2 & 4 & 6\\ 3 & 6 & 10\end{pmatrix} \to \begin{pmatrix}1 & 2 & 3\\ 0 & 0 & 1\end{pmatrix}$$
This is equivalent to
$$x_1 + 2x_2 = 3, \qquad 0x_1 + 0x_2 = 1$$
This is clearly impossible. In this situation we say that the system of equations is inconsistent. This illustrates the rule that, when the augmented matrix has been transformed to row echelon form, if we get a row $(0, 0, \ldots, 0, 1)$ then the system has no solution.
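Combining a reduction routine with the rule just stated gives a simple classifier. In this sketch (our own helper names) a row $(0, \ldots, 0, c)$ with $c \ne 0$ in the reduced augmented matrix signals inconsistency; otherwise the number of non-zero rows decides between a unique solution and a family with free parameters. It is applied to the augmented matrices of Examples 16 and 17 and of the system $x_1 + x_2 + x_3 = 1$, $x_2 + x_3 = 2$, $x_3 = 4$.

```python
from fractions import Fraction


def rref(matrix):
    """Reduced row echelon form by elementary row operations, in exact arithmetic."""
    A = [[Fraction(a) for a in row] for row in matrix]
    r = 0
    for c in range(len(A[0])):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        lead = A[r][c]
        A[r] = [a / lead for a in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == len(A):
            break
    return A


def classify(augmented):
    """Read off the type of solution set of Ax = b from the reduced form of (A, b)."""
    R = rref(augmented)
    n = len(augmented[0]) - 1                        # number of unknowns
    if any(all(a == 0 for a in row[:n]) and row[n] != 0 for row in R):
        return "inconsistent"                        # a row (0, ..., 0, c), c != 0
    rank = sum(1 for row in R if any(a != 0 for a in row[:n]))
    if rank == n:
        return "unique solution"
    return f"{n - rank} free parameter(s)"


print(classify([[2, 4, 6], [3, 6, 9]]))                       # 1 free parameter(s)
print(classify([[2, 4, 6], [3, 6, 10]]))                      # inconsistent
print(classify([[1, 1, 1, 1], [0, 1, 1, 2], [0, 0, 1, 4]]))   # unique solution
```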

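18.
$$\begin{aligned} x_1 + x_2 + x_3 &= 1\\ x_2 + x_3 &= 2\\ x_3 &= 4 \end{aligned}$$
The augmented matrix is
$$\begin{pmatrix}1 & 1 & 1 & 1\\ 0 & 1 & 1 & 2\\ 0 & 0 & 1 & 4\end{pmatrix}$$
which transforms to
$$\begin{pmatrix}1 & 0 & 0 & -1\\ 0 & 1 & 0 & -2\\ 0 & 0 & 1 & 4\end{pmatrix}$$
so the solution is $x_1 = -1$, $x_2 = -2$, $x_3 = 4$. We have a unique solution. This corresponds to the transformed version of the augmented matrix being
$$\begin{pmatrix}1 & 0 & \cdots & 0 & *\\ 0 & 1 & \cdots & 0 & *\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & *\end{pmatrix}$$
the asterisk being anything.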
This is equivalent to $A$ being invertible. If $A$ is invertible and $BA = AB = I$ then
$$Ax = b \;\Rightarrow\; BAx = Bb \;\Rightarrow\; x = Bb$$
In this case the solution is found once $B$ (known as the inverse of $A$, written $A^{-1}$) is found. However, in practice the method of finding $A^{-1}$ is to use the transforming procedure above, together with column operations (which are defined analogously).
These techniques are generally known under the generic title "the Gauss Elimination Method". There is a vast literature on techniques for carrying them out on a computer. It is not too difficult to write such a program; the difficulty is to make it efficient and to avoid creating too many rounding errors.
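As an illustration of the rounding-error issue, here is a small floating-point Gauss elimination with partial pivoting (choosing the largest available pivot in each column), which is one standard way of keeping rounding errors under control, though not the only one. It is a sketch for square invertible systems only; `solve` is our own name, and the tiny-pivot test case is a hypothetical example.

```python
def solve(A, b):
    """Solve Ax = b (A square and invertible) by Gauss elimination with partial pivoting."""
    n = len(A)
    M = [[float(a) for a in row] + [float(bi)] for row, bi in zip(A, b)]   # augmented (A, b)
    for c in range(n):
        # partial pivoting: bring up the row with the largest entry in column c
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("matrix is singular (or nearly so)")
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            factor = M[i][c] / M[c][c]
            M[i] = [a - factor * m for a, m in zip(M[i], M[c])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):              # back substitution, last unknown first
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


print(solve([[1, 1, 1], [0, 1, 1], [0, 0, 1]], [1, 2, 4]))   # [-1.0, -2.0, 4.0]
print(solve([[1e-20, 1], [1, 1]], [1, 2]))                   # close to [1.0, 1.0]
```

Without the row swap, the tiny pivot $10^{-20}$ in the second example would make the computed $x_1$ badly wrong; pivoting avoids dividing by it.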
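Example 19
$$\begin{aligned} x_1 + x_2 + x_3 + x_4 &= 1\\ 2x_1 - x_2 + 2x_3 - x_4 &= -1\\ -x_1 + x_2 + x_3 - 2x_4 &= 2\\ x_1 + 3x_3 - 3x_4 &= 1 \end{aligned}$$
Write down the augmented matrix, then apply elementary operations:
$$\begin{pmatrix}1 & 1 & 1 & 1 & 1\\ 2 & -1 & 2 & -1 & -1\\ -1 & 1 & 1 & -2 & 2\\ 1 & 0 & 3 & -3 & 1\end{pmatrix}$$
2nd row $-$ $2 \times$ 1st row; 3rd row + 1st row; 4th row $-$ 1st row:
$$\begin{pmatrix}1 & 1 & 1 & 1 & 1\\ 0 & -3 & 0 & -3 & -3\\ 0 & 2 & 2 & -1 & 3\\ 0 & -1 & 2 & -4 & 0\end{pmatrix}$$
switch 2nd row with 4th row and multiply it by $-1$:
$$\begin{pmatrix}1 & 1 & 1 & 1 & 1\\ 0 & 1 & -2 & 4 & 0\\ 0 & 2 & 2 & -1 & 3\\ 0 & -3 & 0 & -3 & -3\end{pmatrix}$$
1st row $-$ 2nd row; 3rd row $-$ $2 \times$ 2nd row; 4th row + $3 \times$ 2nd row:
$$\begin{pmatrix}1 & 0 & 3 & -3 & 1\\ 0 & 1 & -2 & 4 & 0\\ 0 & 0 & 6 & -9 & 3\\ 0 & 0 & -6 & 9 & -3\end{pmatrix}$$
dividing 3rd and 4th rows by 6:
$$\begin{pmatrix}1 & 0 & 3 & -3 & 1\\ 0 & 1 & -2 & 4 & 0\\ 0 & 0 & 1 & -\tfrac32 & \tfrac12\\ 0 & 0 & -1 & \tfrac32 & -\tfrac12\end{pmatrix}$$
1st row $-$ $3 \times$ 3rd row; 2nd row + $2 \times$ 3rd row; 4th row + 3rd row:
$$\begin{pmatrix}1 & 0 & 0 & \tfrac32 & -\tfrac12\\ 0 & 1 & 0 & 1 & 1\\ 0 & 0 & 1 & -\tfrac32 & \tfrac12\\ 0 & 0 & 0 & 0 & 0\end{pmatrix}$$
We can write this as
$$x_1 + \tfrac32x_4 = -\tfrac12, \qquad x_2 + x_4 = 1, \qquad x_3 - \tfrac32x_4 = \tfrac12$$
We can choose $x_4 = \lambda$ and then
$$x_1 = -\tfrac12 - \tfrac32\lambda, \qquad x_2 = 1 - \lambda, \qquad x_3 = \tfrac12 + \tfrac32\lambda$$
So the system is consistent, i.e. has a solution. There is just one parameter involved, $\lambda$, and the remaining variables are determined by $\lambda$.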
Exercise 10
In the following write down the augmented matrix. Find its reduced row echelon form. If the system is consistent, find the solutions.
(i)
$$\begin{aligned} x_1 + x_2 &= 1\\ x_1 - x_2 &= 1 \end{aligned}$$
(ii)
$$\begin{aligned} x_1 + x_2 - x_3 &= 1\\ x_1 - x_2 + 2x_3 &= 4 \end{aligned}$$
(iii)
$$\begin{aligned} 2x_1 + 2x_2 - 2x_3 + x_4 &= 5\\ x_1 - x_2 + x_3 - x_4 &= 6\\ 3x_1 - 4x_2 + 5x_3 &= 7 \end{aligned}$$
(iv)
$$\begin{aligned} x_1 + x_2 + x_3 &= 1\\ x_1 - 2x_2 + 3x_3 &= 3\\ 2x_1 - x_2 + 4x_3 &= 5 \end{aligned}$$

To end this section we remark that equations of the form Ax = 0 (i.e. with all zeros on the right hand side) are called homogeneous. If there is at least one solution, say $x_1$, of the system Ax = b, then all the solutions are of the form $x_0 + x_1$, where $x_0$ is a solution of the homogeneous equation Ax = 0. This situation is analogous to the problem of solving linear differential equations.
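We can check this structure on Example 19, whose general solution was $x_1 = -\tfrac12 - \tfrac32\lambda$, $x_2 = 1 - \lambda$, $x_3 = \tfrac12 + \tfrac32\lambda$, $x_4 = \lambda$. In vector form this is a particular solution (λ = 0) plus λ times a solution of the homogeneous system. The short Python sketch below uses exact fractions and verifies both facts. The helper `matvec` is our own illustrative name.

```python
from fractions import Fraction

# Coefficient matrix and right-hand side of Example 19
A = [[1, 1, 1, 1],
     [2, -1, 2, -1],
     [-1, 1, 1, -2],
     [1, 0, 3, -3]]
b = [1, -1, 2, 1]

def matvec(M, x):
    """Return the product Mx for a matrix stored as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

x1 = [Fraction(-1, 2), Fraction(1), Fraction(1, 2), Fraction(0)]   # particular solution
x0 = [Fraction(-3, 2), Fraction(-1), Fraction(3, 2), Fraction(1)]  # solves Ax = 0

assert matvec(A, x1) == b
assert matvec(A, x0) == [0, 0, 0, 0]
for lam in (Fraction(0), Fraction(1), Fraction(-7, 3)):
    x = [p + lam * h for p, h in zip(x1, x0)]     # x1 + lambda * x0
    assert matvec(A, x) == b
print("every x1 + lambda * x0 solves Ax = b")
```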

8 EIGENVALUES AND QUADRATIC FORMS


We have seen above how matrices may be used for handling linear simul-
taneous equations. But handling matrices is itself sometimes complicated,
and it is often useful to find especially "nice" ways of representing matrices.
It turns out that many important results concerning matrices depend upon
a knowledge of the eigenvalues and the eigenvectors of a matrix. Here we
shall give a brief sketch of these ideas, remarking also that a full development
of them requires the concept of determinants which we are not going to
cover here.
Given a square matrix A, a number λ is called an eigenvalue (or "proper value", "latent value" or "characteristic value") if there exists a non-zero vector x such that Ax = λx. The vector x is called an eigenvector associated with λ.
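Example 20
If $\begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = \lambda\begin{pmatrix}x\\ y\end{pmatrix}$, then $y = \lambda x$ and $-x = \lambda y$. So $y = -\lambda^2 y$ and $x = -\lambda^2 x$. Since x and y are not both zero, $\lambda^2 = -1$ and $\lambda = \pm i$.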

In general, finding an eigenvalue λ is equivalent to solving (for λ) the equation
$$Ax = \lambda x$$
This is the same as solving the homogeneous system of equations
$$(A - \lambda I)x = 0 \qquad (*)$$
Remember we are trying to solve for λ, i.e. we require a non-zero solution (for x) of the system (*) to exist. If A - λI is invertible and B is its inverse, then B(A - λI)x = 0, so Ix = 0, so x = 0. Thus if λ is an eigenvalue then A - λI is not invertible. There is a test for checking whether or not a matrix is invertible by means of which the eigenvalues can be found in a fairly straightforward way.
Examples 21
(a) If $A = \begin{pmatrix}1 & 2\\ -1 & 4\end{pmatrix}$ the eigenvalues are 2, 3 with corresponding eigenvectors (say) (2, 1) and (1, 1) respectively.
(b) If $A = \begin{pmatrix}2 & 0\\ 0 & 2\end{pmatrix}$ the eigenvalues are 2 and 2 with corresponding eigenvectors (say) (1, 0) and (0, 1) respectively.
(c) If $A = \begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}$ the eigenvalues are 1 and 1 with corresponding eigenvector (1, 0). Note that any other eigenvector is $(\alpha, 0)$ for some $\alpha \neq 0$.
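For a 2 × 2 matrix $\begin{pmatrix}a & b\\ c & d\end{pmatrix}$, the test mentioned above comes down to a single condition. A - λI fails to be invertible exactly when $(a-\lambda)(d-\lambda) - bc = 0$, that is, when $\lambda^2 - (a+d)\lambda + (ad - bc) = 0$. This determinant condition goes beyond what the text develops. The Python sketch below uses it to reproduce Examples 20 and 21. It uses `cmath` so that the complex eigenvalues ±i of Example 20 also come out. The function names are our own.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Roots of lambda^2 - (a + d) lambda + (ad - bc) = 0 for [[a, b], [c, d]]."""
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)      # complex square root if needed
    return (tr - root) / 2, (tr + root) / 2

def is_eigenvector(A, v, lam, tol=1e-12):
    """Check that A v = lam v (to within tol) for a 2x2 matrix A."""
    Av = (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])
    return all(abs(Av[k] - lam * v[k]) < tol for k in range(2))

print(eigenvalues_2x2(0, 1, -1, 0))    # Example 20: -i and i
print(eigenvalues_2x2(1, 2, -1, 4))    # Example 21(a): 2 and 3
print(eigenvalues_2x2(1, 1, 0, 1))     # Example 21(c): 1 repeated
print(is_eigenvector([[1, 2], [-1, 4]], (2, 1), 2))
print(is_eigenvector([[1, 1], [0, 1]], (0, 1), 1))   # False: not an eigenvector
```

Example 21(c) shows that a repeated root need not come with two independent eigenvectors: (0, 1) fails the check.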

These examples illustrate results which can be proved using the test which we have mentioned. Counting repeated eigenvalues the number of times they occur, every n × n matrix has n eigenvalues (not necessarily distinct, and possibly complex). Eigenvectors corresponding to distinct eigenvalues are linearly independent. There is also an important theorem which states that if A is a real symmetric n × n matrix, then all the eigenvalues of A are real and there exists an orthogonal matrix U such that
$$U^TAU = \begin{pmatrix}\lambda_1 & & 0\\ & \ddots & \\ 0 & & \lambda_n\end{pmatrix}$$
where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of A. This is of great practical value as in many physical cases A is real symmetric.
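The text does not say how U is found. One classical way is Jacobi's method, which we swap in here purely as an illustration. It repeatedly applies plane rotations $A \to J^TAJ$, each chosen to zero one off-diagonal pair. Because every J is orthogonal, their product U is orthogonal and $U^TAU$ tends to a diagonal matrix. Below is a minimal pure-Python sketch. The test matrix and all function names are our own choices.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def jacobi_eigen(A, tol=1e-20, max_sweeps=50):
    """Diagonalise a real symmetric matrix A by Jacobi plane rotations.

    Returns (eigenvalues, U): U is orthogonal, its columns are eigenvectors,
    and U^T A U is diagonal up to rounding error.
    """
    n = len(A)
    a = [[float(v) for v in row] for row in A]
    U = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:                        # off-diagonal part negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # angle t with tan 2t = 2 a_pq / (a_qq - a_pp) zeroes a_pq
                t = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(t), math.sin(t)
                J = [[float(i == j) for j in range(n)] for i in range(n)]
                J[p][p] = J[q][q] = c
                J[p][q], J[q][p] = s, -s
                a = matmul(transpose(J), matmul(a, J))   # a -> J^T a J
                U = matmul(U, J)                         # accumulate U
    return [a[i][i] for i in range(n)], U

A = [[2, 1, 0],
     [1, 2, 1],
     [0, 1, 2]]
lams, U = jacobi_eigen(A)
print(sorted(round(v, 6) for v in lams))   # eigenvalues 2 - sqrt 2, 2, 2 + sqrt 2
```

Library routines use faster algorithms. Jacobi's method is shown because each step is visibly an orthogonal change of variables.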
Another concept which is related to matrices is that of the quadratic form. A quadratic form in n variables is an expression $\sum_{i \le j} a_{ij}x_ix_j$ in variables $x_i$, $x_j$ with coefficients $a_{ij}$.

Examples 22
(a) $x_1^2 + x_2^2 + 2x_1x_2$
(b) $2x_1^2 - 3x_2^2 + 4x_1x_2 - x_3^2 + x_3x_1$
(c) $x_1x_2 + x_1x_3 + x_1x_4 + x_2x_3$

Quadratic forms have geometric significance. In two dimensions the equation of a circle is
$$x^2 + y^2 = r^2$$
An ellipse has the equation
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = r^2$$
and a hyperbola has the equation
$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = r^2$$

In fact any so-called central conic has an equation of the form $ax^2 + bxy + cy^2 = r$, where the expression on the left hand side is a quadratic form in two variables x and y. If $\sum_{i \le j} a_{ij}x_ix_j$ is a quadratic form and we define
$$b_{ij} = \tfrac{1}{2}a_{ij} \;(i < j), \qquad b_{ii} = a_{ii}, \qquad b_{ij} = \tfrac{1}{2}a_{ji} \;(i > j)$$
then, using the symmetric n × n matrix $B = (b_{ij})$ and the vector $x = (x_1, x_2, \ldots, x_n)^T$, we can write the quadratic form very simply as $x^TBx$. The $\tfrac{1}{2}$ appears because of the symmetric nature of the system: each cross term $a_{ij}x_ix_j$ is shared equally between the entries $b_{ij}$ and $b_{ji}$.
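The halving rule is easy to check mechanically. The Python sketch below builds B from the coefficients $a_{ij}$ ($i \le j$) of Example 22(b) and confirms that $x^TBx$ reproduces the original form at a test point. It uses 0-based indices and exact fractions. The names `form_matrix` and `quad` are our own.

```python
from fractions import Fraction

def form_matrix(coeffs, n):
    """Symmetric matrix B of the form sum_{i <= j} a_ij x_i x_j.

    `coeffs` maps (i, j) with i <= j (0-based) to a_ij; diagonal terms are
    copied, each cross term is split in half between (i, j) and (j, i).
    """
    B = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), a in coeffs.items():
        if i == j:
            B[i][i] = Fraction(a)
        else:
            B[i][j] = B[j][i] = Fraction(a, 2)
    return B

def quad(B, x):
    """Evaluate x^T B x."""
    n = len(x)
    return sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n))

# Example 22(b): 2x1^2 - 3x2^2 + 4x1x2 - x3^2 + x3x1
coeffs = {(0, 0): 2, (1, 1): -3, (0, 1): 4, (2, 2): -1, (0, 2): 1}
B = form_matrix(coeffs, 3)

x = [Fraction(3, 2), Fraction(-2), Fraction(1, 2)]
direct = sum(a * x[i] * x[j] for (i, j), a in coeffs.items())
print(quad(B, x), direct)      # the two values agree
```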
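Examples 23
(a) $x_1^2 + x_2^2 + 2x_1x_2 = (x_1 \; x_2)\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix}$

(b) $2x_1^2 - 3x_2^2 + 4x_1x_2 - x_3^2 + x_3x_1 = (x_1 \; x_2 \; x_3)\begin{pmatrix}2 & 2 & \tfrac12\\ 2 & -3 & 0\\ \tfrac12 & 0 & -1\end{pmatrix}\begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix}$

(c) $x_1x_2 + x_1x_3 + x_1x_4 + x_2x_3 = (x_1 \; x_2 \; x_3 \; x_4)\begin{pmatrix}0 & \tfrac12 & \tfrac12 & \tfrac12\\ \tfrac12 & 0 & \tfrac12 & 0\\ \tfrac12 & \tfrac12 & 0 & 0\\ \tfrac12 & 0 & 0 & 0\end{pmatrix}\begin{pmatrix}x_1\\ x_2\\ x_3\\ x_4\end{pmatrix}$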

If we change x by a unitary matrix U, by putting x = Uy, we get another quadratic form (in $y_1, \ldots, y_n$), written as $y^TU^TBUy$. By the theorem on eigenvalues of real symmetric matrices, we know we can choose a real orthogonal (and hence unitary) U so that
$$U^TBU = \begin{pmatrix}\lambda_1 & & 0\\ & \ddots & \\ 0 & & \lambda_n\end{pmatrix}$$
and the corresponding form is then
$$y^T\begin{pmatrix}\lambda_1 & & 0\\ & \ddots & \\ 0 & & \lambda_n\end{pmatrix}y = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2$$
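For example, a process like this is used when we say that an ellipse can be written as $x^2/a^2 + y^2/b^2 = r^2$. In general a central ellipse has the shape shown in Fig. 5.4, where the dotted lines represent the axes of the ellipse.

Fig. 5.4

If we choose for U the matrix
$$U = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}$$
where θ is the angle between the x-axis and one of the dotted axes, it is not too difficult to check that U is orthogonal and that, if
$$\begin{pmatrix}x\\ y\end{pmatrix} = U\begin{pmatrix}x'\\ y'\end{pmatrix}$$
then we obtain the equation of the ellipse (relative to axes along the dotted lines) in the standard form:
$$\frac{x'^2}{a^2} + \frac{y'^2}{b^2} = r^2$$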
By finding the eigenvalues of the matrix it is easy to read off properties of the quadratic form. For example, if r > 0 and both eigenvalues of B are positive, the central conic $ax^2 + bxy + cy^2 = r$ is an ellipse. There are many areas of mathematics (especially statistics) where the right interpretation will enable us to use this approach. There are many standard computing techniques for finding eigenvalues, which means that the method is capable of being used very widely.
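For the 2 × 2 case the rotation can be written down directly. Substituting $x = X\cos\theta - Y\sin\theta$, $y = X\sin\theta + Y\cos\theta$ (the matrix U of the text) into $ax^2 + bxy + cy^2$ gives an XY coefficient of $(c - a)\sin 2\theta + b\cos 2\theta$. This vanishes when $\tan 2\theta = b/(a - c)$. That closed form is a standard identity we bring in, not something stated in the text. In the Python sketch below, the conic $5x^2 + 4xy + 2y^2 = r$ and the function name are our own illustrative choices.

```python
import math

def principal_axes(a, b, c):
    """Rotate a x^2 + b xy + c y^2 so that the xy term disappears.

    With x = X cos t - Y sin t and y = X sin t + Y cos t, returns (t, A, C)
    such that the form becomes A X^2 + C Y^2; A and C are the eigenvalues
    of B = [[a, b/2], [b/2, c]].
    """
    t = 0.5 * math.atan2(b, a - c)          # tan 2t = b / (a - c)
    co, si = math.cos(t), math.sin(t)
    A = a * co * co + b * co * si + c * si * si
    C = a * si * si - b * co * si + c * co * co
    return t, A, C

t, A, C = principal_axes(5.0, 4.0, 2.0)     # 5x^2 + 4xy + 2y^2 = r
print(math.degrees(t), A, C)
# A, C > 0: an ellipse with semi-axes sqrt(r / A) and sqrt(r / C)
```

Here both new coefficients are positive, so the conic is an ellipse whose axes make an angle of about 26.6 degrees with the coordinate axes.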
Chapter 6

STOCHASTIC PROCESSES,
PROBABILITY AND STATISTICS

1 INTRODUCTION
Often when doing an experiment or monitoring a system we end up with a sequence of observations, $x_1, x_2, x_3, \ldots, x_n, \ldots$. These may be observations made at discrete time intervals, e.g. monthly sales figures or velocity readings taken every second. Alternatively, we may prefer to think of them as forming a continuous record as time passes, for example like the trace of a pen recorder. Figures 6.1-6.4 give examples of records like these. Such sequences or traces are called time series, and we shall use x(t) to denote the observation made at time t. (In almost all applications t is time, but one could let t be distance, say down a railway line, and x(t) could be the "height" of the rail.) Thus in Fig. 6.1 x(t) denotes the seismic noise at time t. Sometimes $x_t$ is used to denote the series, usually when we have discrete time intervals.
Given such a series the obvious questions that arise are
(a) what does it tell us about the system or experiment that gave rise to the series?
(b) how can we predict future values, or past values that are missing?
In some circumstances x(t) is predictable. Thus if x(t) is the output of a radio and we know that the input is a signal of constant frequency and amplitude, then x(t) is (pretty well) predictable. In fact with a decent radio you might argue that the output of a Bach fugue was perfectly predictable. This isn't always so: if the radio is tuned badly to a distant source then x(t) may contain "noise". At some point the noise may make x(t) unpredictable even when the source is known. Clearly, monthly unemployment figures or aluminium production figures are not perfectly (if at all) predictable.
To take a specific case, we might represent the major features of Fig. 6.2 by
$$x(t) = a\cos(2\pi f_0t + \phi)$$
where $f_0$ is the supply frequency, a the amplitude and φ a phase angle. A glance at the figure shows that while this may well be acceptable as a crude description, the actual value x(t) fluctuates irregularly with time.
Fig. 6.1

Fig. 6.2 (voltage in kV)

Fig. 6.3

Fig. 6.4

Fig. 6.5

If we were to repeat the observations on, say, Fig. 6.2, we could build up a family or ensemble of records x(t), all of which were similar in some broad sense but differed in detail (Fig. 6.5).
The set of all possible traces x(t) is called a "stochastic process" X(t), and what we observe is a realisation x(t) of the process; thus x(t) is one member of the set of possible traces. We need this rather artificial construct if we are to examine the statistical structure of the observed realisation x(t). In fact, as we shall see, the possible values x(t) of the time series at any time t are described by the random variable X(t) and its associated probability distribution.
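A small simulation makes the idea of an ensemble concrete. The Python sketch below generates several realisations of the model x(t) = a cos(2π f₀t + φ) plus noise. Each record has the same broad shape and different detail. The Gaussian form of the noise and every parameter value (f₀ = 50 Hz, the noise level, the sampling interval) are our own assumptions and are not taken from Fig. 6.2.

```python
import math
import random

def realisation(f0=50.0, a=1.0, phi=0.0, sigma=0.1, n=200, dt=0.0005, seed=None):
    """One record x(t) = a cos(2 pi f0 t + phi) + noise at n sample times.

    The noise is assumed Gaussian with standard deviation sigma; all
    parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    return [a * math.cos(2 * math.pi * f0 * k * dt + phi) + rng.gauss(0.0, sigma)
            for k in range(n)]

# An ensemble of five records: similar in broad shape, different in detail
ensemble = [realisation(seed=s) for s in range(5)]
print([round(rec[0], 3) for rec in ensemble])   # x(0) varies from record to record
```

Fixing the seed reproduces a given record exactly. In the language of the text, the seed picks which member of the ensemble we observe.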
Consider some examples:
Example 1
A listener to a radio receives y(t) where the original signal was x(t). Allowing for atmospheric and electronic distortion, we might propose
y(t) = x(t) + noise     (*)
Thus each possible record y(t) will differ slightly, but (*) gives an overall description. The stochastic process Y(t) may describe the behaviour, but the listener hears y(t), the realisation.
Example 2
A coin is tossed in the air every minute with outcomes heads H or tails T. We may observe:
HHTT ...
HTHT ...
THT ...
depending on the actual outcomes, i.e. each sequence is a time series. The stochastic process X(t) describes the system, viz. a coin is tossed in the air, and:
X(t) = "1 if the coin is heads, otherwise 0"
x(t) = the actual value, 1 or 0, according as the coin is heads or tails
Thus for the first sequence we have for x(t) the values
1 1 0 0 ...
Fig. 6.6

Example 3
A drunk stands at the point (0, 0) on the plane. At set intervals of time he
takes a step of unit length in either the x direction or the y direction.
Suppose that the x and y steps are equally likely and the length of each
step is equally likely to be +1 or -1. Then Fig. 6.6 describes one outcome
of the "drunkard's walk". The drunkard's walk is the stochastic process,
while each drunk's path is a realisation, or observation.
In Section 6.9 we show how to generate such paths without requiring a
mathematical drunk.
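The rules of Example 3 translate directly into a short simulation. The Python sketch below is our own illustration and is not necessarily the method of Section 6.9. At each step the drunk chooses the x or y direction with probability 1/2, and then a step of +1 or -1 with probability 1/2. Each seed gives one realisation of the walk.

```python
import random

def drunkards_walk(steps, seed=None):
    """Return the path of a drunkard who starts at (0, 0).

    Each step moves one unit along x or along y (equally likely), in the
    positive or negative direction (equally likely).
    """
    rng = random.Random(seed)
    x, y = 0, 0
    path = [(x, y)]
    for _ in range(steps):
        step = rng.choice((1, -1))
        if rng.random() < 0.5:
            x += step
        else:
            y += step
        path.append((x, y))
    return path

print(drunkards_walk(10, seed=1))   # one realisation of the process
```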

As we see from these examples, x(t) denotes the realisation or the observations, while X(t) is the "theoretical process".
To handle time series properly we need some understanding of probability and statistics so that we can construct suitable models for observations. What follows gives a brief survey of these ideas; they are not simple and will take some time to assimilate.

2 PROBABILITY
Suppose we take a stochastic process at some fixed point in time, t. Then
at this moment X(t) will give rise to x(t). Since we are now just looking at
one point in time t we suppress the t suffix and consider the "random
variable" X and the observation x.
The set of all possible values that X can take, say S, is called the sample
space.
Example 4
Suppose X denotes the outcome of rolling a die. Then S consists of the
numbers 1 to 6, viz. S = {1, 2, ..., 6}, and x will be the actual result observed.
Example 5
Suppose X is the diameter of a nominally 1 cm diameter ball bearing. In
this case S might be the set of diameters between 8 mm and 12 mm. If we
select any particular bearing and measure its diameter we get x, one of the
values in S.
Fig. 6.7

An equivalent way of looking at the random variable X is as a function: X acts on elements s of S and gives values x, as in Fig. 6.7.
Strictly we should write x = X(s), but nobody does. The complicated part of this description is the choice of s from S, but this is much too complex to pursue here. We just emphasise that X (or X(t) in the earlier case) is a function, while x (and x(t)) is the output!
The prediction of exactly which of the values in S will be observed is not possible in most cases, but we would still like some idea of the likelihood of any particular x. In much the same way, people who back horses seek some idea of the "likelihood" of their choice winning. This leads us into the ideas of probability. The theory is rich and involved; we shall in consequence just give the principal definitions and attempt to illustrate the key ideas.
While we demonstrate how to compute some probabilities, these calculations are not central to this chapter and can be skimmed over at first reading.
Suppose A is the "event" that we observe a specified set of values from S, and that B is another such "event". For example, if S = {1, 2, 3, 4, 5, 6} is the set of outcomes of rolling a die, then A might be the event {the outcome is a 6}, while B could be the event {the outcome is an even number, i.e. one of 2, 4, 6}. Then to each such event we assign a number p(A), the probability of A, with the following properties:
(i) p(S) = 1
(ii) 0 ≤ p(A) ≤ 1 for every event A
(iii) if A and B have no common elements, p(A or B) = p(A) + p(B)
(iv) if A and B are any events, p(A or B) = p(A) + p(B) - p(A and B)
(v) p(not A) = 1 - p(A).
(Note that in this definition "A or B" denotes the event comprising those elements of S which are in A or B or both; "A and B" denotes the event comprising those elements of S which are in both A and B; "not A" denotes the event comprising those elements of S which are not in A. See Chapter 1.) A probability of 1 means that an event is certain, and a probability of zero means the event is impossible.
Assigning probabilities to events is always difficult and often impossible. In some cases, however, when events are "equally" likely, (i) and (iii) give the following simple setup.
Let $A_1, A_2, \ldots, A_n$ be equally likely events, i.e. $p(A_i) = p(A_j) = p$ say, where no two of them have common elements and together they make up the whole of S. Then
$$\sum_{i=1}^{n} p(A_i) = np = p(S) = 1$$
so p = 1/n.

Example 6
Suppose that in a family p(baby is a boy) = p(B) = 1/2 = p(baby is a girl) = p(G). Consider families of two children: this can happen in the following ways
BB BG GB GG
i.e.
S = {BB, BG, GB, GG}
Assuming each is equally likely we have
p(one of each sex) = p(BG or GB) = p(BG) + p(GB) = 1/4 + 1/4 = 1/2
p(two boys) = p(BB) = 1/4
p(two girls) = p(GG) = 1/4
p(first a girl and then a boy) = p(GB) = 1/4
Notice one can have one child of each sex in two ways, BG and GB. These are quite distinct events.
The famous statistician R. A. Fisher had 7 daughters; since there are 128 possible combinations of B and G, the probability of this event is 1/128.
Example 7
A die is rolled twice, giving the following set of possibilities
(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)
Notice we have taken the order into account and distinguish between (1, 6) and (6, 1); these are separate events.
As there are 36 outcomes, the probability of any pair (i, j) is just 1/36. So
p(1, 6) = p(6, 1) = 1/36
p(we obtain a 1 and a 6 in any order) = p(1, 6) + p(6, 1) = 2/36 = 1/18
p(two faces add up to 4) = p(1, 3) + p(3, 1) + p(2, 2) = 3/36 = 1/12
p(two faces have the same value) = p(1, 1) + p(2, 2) + ... + p(6, 6) = 6/36 = 1/6
p(second face shows higher number) = p(1, 2) + p(1, 3) + ... + p(2, 3) + ... + p(3, 4) + ... + p(5, 6) = 15/36 = 5/12
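With only 36 equally likely outcomes, these probabilities can be checked by brute-force counting. The Python sketch below lists every ordered pair and divides the number of favourable outcomes by 36, exactly as in the text. The helper name `prob` is our own.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # the 36 ordered pairs (i, j)

def prob(event):
    """Probability of an event, given as a predicate on an outcome (i, j)."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

print(prob(lambda o: set(o) == {1, 6}))   # a 1 and a 6 in any order: 1/18
print(prob(lambda o: o[0] + o[1] == 4))   # faces add up to 4: 1/12
print(prob(lambda o: o[0] == o[1]))       # both faces the same: 1/6
print(prob(lambda o: o[1] > o[0]))        # second face higher: 5/12
```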
Exercises
1. A pair of dice is thrown twice. What is the probability of getting totals of 7, of 11?
2. A bag contains 6 discs numbered 1, 2, ..., 6. Two discs are drawn from the bag. Find the sample space S. What is the probability that the sum of the numbers on the discs is 12? 7? 11? Suppose now one of the 6 discs is drawn from the bag, the number noted and the disc replaced. A further disc is now drawn. What are the probabilities of the events above in this case?

Often we can only discuss the probability of an event A given that another event B has already occurred. This is the conditional probability of A given B, written
p(A | B)
We define this probability as
p(A | B) = p(A and B)/p(B)   provided p(B) > 0
If p(A | B) = p(A), that is B does not affect A, then
p(A and B) = p(A)p(B)
and A and B are said to be stochastically independent, or more usually just independent.
Example 8
Suppose we roll a die; then the possible outcome is one of the numbers {1, 2, 3, 4, 5, 6}. Since we assume that the die is "fair", we deduce
p(outcome i) = 1/6
If we roll two dice then
p(i on first and j on the second) = p(i on first)p(j on second) = 1/6 × 1/6 = 1/36
This seems reasonable since we assume the dice are independent, i.e. they do not collaborate; it also agrees with Example 7.
Example 9
Suppose we roll a die and B is the event that the number observed is even. Thus p(B) = 1/2. Define A to be the event that the number observed is 4. Then
p(A | B) = p(A and B)/p(B) = p(no. is 4)/p(even) = (1/6)/(1/2) = 1/3.
In the same way
p(we observe 5 | B) = p(observe 5 and the number is even)/p(number is even) = 0
since 5 is not an even number.
Exercise 3
There are 37 numbers on a roulette wheel, 18 of which are red. Assuming a "fair" wheel and independence between spins, find the probability of 26 successive red numbers. This happened at Monte Carlo and made the house rather rich!
Example 10
Suppose S represents the adults in a county who have completed an Open University course. We classify them by sex and employment:

                  Employed (E)   Unemployed (U)   Row total
Male (M)              460              40             500
Female (F)            140             260             400
Column total          600             300             900

If we select an individual at random, and if M is the event that a man is chosen and E is the event that the chosen individual is employed, then
p(M) = 500/900 = 5/9,   p(M | E) = 460/600,   while p(E) = 600/900 = 2/3
i.e. of the 600 employed, 460 are men:
p(M | E) = p(M and E)/p(E) = (460/900)/(600/900)
Exercise 4
If F is the event that the selected person is female, find:
p(F), p(F | E or U)
p(F | E), p(F | U)
Notice there is no reason why p(M | E) should equal p(E | M). The connection is more subtle and forms the basis of a well known result in probability called Bayes' Theorem. In fact
p(E)p(M | E) = p(E | M)p(M)
We shall not go into detail here, but this result is of importance in Kalman filtering. It is also of major philosophical importance in the study of inference.
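The counts of Example 10 give a concrete check of this relation. The Python sketch below computes p(M | E) and p(E | M) from the table. The two are different (23/30 and 23/25), yet both sides of p(E)p(M | E) = p(E | M)p(M) come to the same value. The dictionary layout and the helper `p` are our own.

```python
from fractions import Fraction

# Counts from the table of Example 10
counts = {("M", "E"): 460, ("M", "U"): 40, ("F", "E"): 140, ("F", "U"): 260}
total = sum(counts.values())                       # 900 people

def p(sex=None, job=None):
    """Probability that a randomly chosen person has the given sex and/or job."""
    n = sum(c for (s, j), c in counts.items()
            if (sex is None or s == sex) and (job is None or j == job))
    return Fraction(n, total)

p_M_given_E = p("M", "E") / p(job="E")    # 460/600
p_E_given_M = p("M", "E") / p(sex="M")    # 460/500
print(p_M_given_E, p_E_given_M)           # 23/30 23/25

# Bayes' Theorem: p(E) p(M | E) = p(E | M) p(M)
assert p(job="E") * p_M_given_E == p_E_given_M * p(sex="M")
```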

3 PERMUTATIONS AND COMBINATIONS

It is obvious that in these finite cases the main problems are those of counting. This means we need the ideas of permutations and combinations, which relate to particular ways of counting arrangements of things.
Suppose we perform two procedures in succession. If the first can be performed in m ways, and for each of these the second can be performed in n ways, then there are mn ways in which the two procedures can be performed successively.
Thus if we roll a die and then toss a coin we have 6 × 2 = 12 possibilities. If we then draw a card from a pack there are 12 × 52 = 624 possibilities.
Suppose we have n distinct objects and we want to arrange them in a sequence. Then we have n choices for the first in the sequence, then (n - 1) choices for the second, (n - 2) for the third, and so on. In this way it can be shown that there are n × (n - 1) × ... × 2 × 1 different possible arrangements. For example, if we have three objects, namely the letters A, B, C, then there are 3 × 2 × 1 = 6 possible arrangements:
ABC ACB BAC BCA CAB CBA
As a shorthand we will write
n × (n - 1) × (n - 2) × ... × 2 × 1 = n!
(pronounced "n factorial"), which is the number of permutations of n objects.
If we are not interested in arranging our n distinct objects in a sequence, but just wish to select some collection of r items from our original n ones, then it can be shown that this can be done in
$$\frac{n!}{(n-r)!\,r!} = \frac{n \times (n-1) \times (n-2) \times \cdots \times (n-r+1)}{r \times (r-1) \times \cdots \times 1}$$
ways. This expression is often written $\binom{n}{r}$, and is the number of combinations of r objects from n. Thus, if we want to pick a committee of three people from ten possible candidates, it can be done in
$$\binom{10}{3} = \frac{10 \times 9 \times 8}{3 \times 2 \times 1} = 120 \text{ ways}$$
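Both counting formulas can be confirmed by listing the arrangements explicitly. The Python sketch below uses `itertools.permutations` and `itertools.combinations` and compares the counts with `math.factorial` and `math.comb`. `math.comb` requires Python 3.8 or later.

```python
import math
from itertools import combinations, permutations

# All arrangements of A, B, C: there should be 3! = 6 of them
arrangements = ["".join(p) for p in permutations("ABC")]
print(arrangements)             # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']

# Committees of 3 from 10 candidates, counted by listing and by formula
committees = list(combinations(range(10), 3))
print(len(committees), math.comb(10, 3))   # 120 120
```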

A poker hand of five cards can be picked from a pack of 52 in $\binom{52}{5} = 2\,598\,960$ ways (which is a very large number)! Assuming each is equally likely, the probability of each hand is
$$\frac{1}{\binom{52}{5}}$$
Of the possible hands, we could pick a hand all the cards of which were in the same chosen suit (say spades) in $\binom{13}{5}$ ways. Thus the probability of a hand all of which are spades is
$$\frac{\binom{13}{5}}{\binom{52}{5}}$$
Furthermore, the probability of a hand all of which are in the same suit (whether spades, hearts, diamonds or clubs) is
$$4 \times \frac{\binom{13}{5}}{\binom{52}{5}}$$
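These poker probabilities are quick to evaluate exactly with `math.comb` (Python 3.8 or later) and `fractions.Fraction`. The Python sketch below prints them both as exact counts and as decimals.

```python
import math
from fractions import Fraction

hands = math.comb(52, 5)                          # all five-card hands
one_hand = Fraction(1, hands)                     # probability of any given hand
all_spades = Fraction(math.comb(13, 5), hands)    # choose 5 of the 13 spades
same_suit = 4 * all_spades                        # any one of the four suits

print(hands, math.comb(13, 5))      # 2598960 1287
print(float(all_spades))            # about 0.000495
print(float(same_suit))             # about 0.00198
```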

Exercise 5
Find the number of combinations of two letters from CHLOE. Check your
answer by writing out all pairs.

4 PROBABILITY DISTRIBUTIONS

A list of outcomes together with associated probabilities is called a probability distribution. Over the years it has become apparent that some distributions are particularly useful, and we give some examples.

Binomial Distribution
Suppose our "trial" has only two possible outcomes S and F. Let p(S) = p, p(F) = q = 1 - p. Then suppose we conduct n independent trials; the probability that X = r (where X is the number of S's observed) is given by
$$p(X = r) = \binom{n}{r}p^rq^{n-r}$$
Thus
$$p(\text{exactly 17 heads in 20 throws of a coin}) = \binom{20}{17}\left(\tfrac12\right)^{17}\left(\tfrac12\right)^{3} = \frac{1140}{2^{20}} \approx 0.0011$$
We show how this formula arises, as it is a nice example of previous ideas. Suppose we have n trials of which there are r "successes". A set of trials can be represented as a sequence of successes and failures, e.g.
SSFSFFS ...
Since the trials are independent, the probability of a particular sequence with r S's and n - r F's is just $p^r(1-p)^{n-r}$. However there are $\binom{n}{r}$ sequences with r S's, so
$$p(X = r) = \binom{n}{r}p^r(1-p)^{n-r}$$
This simple distribution has so many useful applications that there are extensive tables to help compute the probabilities. Figure 6.8 shows the shape of the distribution for some values of p. Further details can be found in any good set of statistical tables (for example Statistical Tables by H. R. Neave, George Allen and Unwin, London 1978).
Example 11
Suppose items come off a production line and the probability that one is defective is 0.01. Then the probability that exactly one of a batch of 10 is defective is given by
$$p(1 \text{ in } 10) = \binom{10}{1}(0.01)^1(0.99)^9 = 0.09$$

Fig. 6.8 (p(X = r) against r for n = 10 and p = 0.1, 0.5, 0.9)
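Tables are the traditional route, but the binomial formula is also simple to evaluate directly. The Python sketch below reproduces the two numbers above. It also checks that the probabilities over r = 0, ..., n add up to 1, as they must. The function name `binomial_pmf` is our own.

```python
import math

def binomial_pmf(r, n, p):
    """p(X = r) = C(n, r) p^r (1 - p)^(n - r) for n independent trials."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

print(binomial_pmf(17, 20, 0.5))    # exactly 17 heads in 20 throws, about 0.00109
print(binomial_pmf(1, 10, 0.01))    # Example 11, about 0.0914
print(sum(binomial_pmf(r, 10, 0.1) for r in range(11)))   # 1, up to rounding
```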
Simple modifications of this approach are quite possible. For example, suppose we have our independent sequence of S and F outcomes with p(S) = p, p(F) = q. Let X be the number of trials up to and including the first S. Then
$$\begin{aligned}
p(X = r) &= p(r-1 \text{ F values and then an S})\\
&= p(r-1 \text{ F values})\,p(S)\\
&= q^{r-1}p
\end{aligned}$$
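A simulation gives an independent check of $p(X = r) = q^{r-1}p$. The Python sketch below repeatedly runs independent trials until the first S. It then compares the observed proportions with the formula. The success probability p = 0.3, the number of runs and the seed are arbitrary choices of ours.

```python
import random

def trials_until_first_success(p, rng):
    """Count trials up to and including the first S (success probability p)."""
    r = 1
    while rng.random() >= p:      # an F, which happens with probability q = 1 - p
        r += 1
    return r

rng = random.Random(2024)
p, n_runs = 0.3, 100_000
runs = [trials_until_first_success(p, rng) for _ in range(n_runs)]

for r in range(1, 5):
    observed = runs.count(r) / n_runs
    exact = (1 - p) ** (r - 1) * p          # q^(r-1) p
    print(r, round(observed, 3), round(exact, 3))
```

With this many runs the observed proportions typically agree with $q^{r-1}p$ to two decimal places.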
Exercises
6. 5 items are taken from a large batch in which the proportion defective
is 0.1. What is the probability that 0 or 1 item in the 5 is defective?
7. The probability that a rocket will successfully launch a satellite is ~. Find
the probability that 2 of the next 4 launches are successful. What is the
probability that at least 3 out of 4 are successful?
8. A coin is tossed until a head appears. If N is the number of throws
required what is the best bet for N?

Poisson Distribution
Another useful distribution is the Poisson distribution, usually used when counting "rare" events. Typically, if events occur at "random" at an average rate of λ, then X, the number of events occurring in 0 to t, has the distribution
$$p(X = r) = \frac{(\lambda t)^r e^{-\lambda t}}{r!}, \qquad r = 0, 1, 2, \ldots$$
We outline the derivation to give some idea how it arises: this can be omitted at a first reading. Figure 6.9 gives the shape of the distribution.
Fig. 6.9 (p(X = r) plotted against r)

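Suppose events happen at random and that the probability of an event occurring in a small time interval (t, t + δt) [that is, between times t and t + δt] is λδt. We assume that the probability of two or more events in (t, t + δt) is negligibly small (of order (δt)²). We also assume that what happens in (t, t + δt) does not depend on what happened in 0 to t. Let $P_n(t)$ = p(n events in the interval 0 to t). There are n events in 0 to t + δt if there are n events in 0 to t and none in (t, t + δt), or n - 1 events in 0 to t and 1 in (t, t + δt), or n - 2 events in 0 to t and 2 in (t, t + δt), and so on. These alternatives have no common elements, so
$$P_n(t+\delta t) = p\big(n \text{ in } 0 \text{ to } t \text{ and none in } (t, t+\delta t)\big) + p\big(n-1 \text{ in } 0 \text{ to } t \text{ and } 1 \text{ in } (t, t+\delta t)\big) + \cdots$$
Using independence, we can now write this as
$$\begin{aligned}
P_n(t+\delta t) &= p(n \text{ in } 0 \text{ to } t)\,p(\text{none in } (t, t+\delta t)) + p(n-1 \text{ in } 0 \text{ to } t)\,p(1 \text{ in } (t, t+\delta t))\\
&\quad + p(n-2 \text{ in } 0 \text{ to } t)\,p(2 \text{ in } (t, t+\delta t)) + \cdots\\
&= P_n(t)(1 - \lambda\,\delta t) + P_{n-1}(t)\,\lambda\,\delta t + P_{n-2}(t)\,O\big((\delta t)^2\big) + \cdots
\end{aligned}$$
Ignoring terms smaller than δt gives
$$\frac{P_n(t+\delta t) - P_n(t)}{\delta t} = \lambda\big(P_{n-1}(t) - P_n(t)\big)$$
Letting δt tend to zero gives
$$\frac{dP_n(t)}{dt} = \lambda\big(P_{n-1}(t) - P_n(t)\big) \quad (n \ge 1), \qquad \frac{dP_0(t)}{dt} = -\lambda P_0(t)$$
The solution of this set of equations, starting from $P_0(0) = 1$ and $P_n(0) = 0$ for n ≥ 1, is, after some work,
$$P_n(t) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}, \qquad n = 0, 1, 2, \ldots$$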
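The derivation ends in the system $dP_n/dt = \lambda(P_{n-1} - P_n)$ for n ≥ 1 and $dP_0/dt = -\lambda P_0$, with $P_0(0) = 1$. We can check numerically that the Poisson probabilities solve it. The Python sketch below steps the equations forward with the simple (forward Euler) method and compares the result with $(\lambda t)^n e^{-\lambda t}/n!$. The rate, end time and step size are arbitrary choices of ours. Keeping only n ≤ N is harmless here, because the equation for $P_n$ involves only $P_n$ and $P_{n-1}$.

```python
import math

lam, T, dt, N = 2.0, 1.5, 1e-4, 12   # rate, end time, step size, largest n kept

P = [1.0] + [0.0] * N                # P_n(0): no events have occurred at t = 0
for _ in range(round(T / dt)):
    # forward Euler step of dP_n/dt = lam * (P_{n-1} - P_n), with P_{-1} = 0
    P = [P[n] + dt * lam * ((P[n - 1] if n > 0 else 0.0) - P[n])
         for n in range(N + 1)]

exact = [(lam * T) ** n * math.exp(-lam * T) / math.factorial(n)
         for n in range(N + 1)]
max_err = max(abs(a - b) for a, b in zip(P, exact))
print(max_err)     # small, and shrinks roughly in proportion to dt
```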

The Poisson distribution can be used to calculate probabilities of "rare" events given that we know how many happen in a "long" period. For example, suppose we know how many customers to expect in a period, how long it takes (on average) to deal with a customer, and how many customers can be dealt with simultaneously. We can then calculate various probabilities concerning the queue (if any) that might build up. The theory of such bottlenecks is sometimes called Queueing Theory, and the Poisson distribution is often used there.

Example 12
Telephone calls are made to an exchange at a rate of 0.2 per second on average. If the calls arrive at random and X is the number of calls received in a minute, then since λt = 60 × 0.2 = 12 we have
$$p(X = 3) = \frac{(0.2 \times 60)^3 e^{-12}}{3!} = 1.77 \times 10^{-3}$$
$$p(X = 12) = \frac{(0.2 \times 60)^{12} e^{-12}}{12!} = 0.1144$$
$$p(X \le 11) = p(X = 0) + p(X = 1) + \cdots + p(X = 11) = 0.4616$$
$$p(X > 11) = 1 - 0.4616 = 0.5384$$
Thus if each call lasts one minute, it is to be hoped that there are more than 11 telephone lines!
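The sums in Example 12 are tedious by hand but immediate by machine. The Python sketch below evaluates the Poisson formula with mean λt = 12 and reproduces the four figures quoted. The function name `poisson_pmf` is our own.

```python
import math

def poisson_pmf(r, mean):
    """p(X = r) = mean^r e^(-mean) / r!, where mean = lambda * t."""
    return mean**r * math.exp(-mean) / math.factorial(r)

mean = 0.2 * 60                                           # 12 calls per minute
p3 = poisson_pmf(3, mean)                                 # about 1.77e-3
p12 = poisson_pmf(12, mean)                               # about 0.1144
p_le_11 = sum(poisson_pmf(r, mean) for r in range(12))    # about 0.4616
print(p3, p12, p_le_11, 1 - p_le_11)
```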
Exercise 9
Tankers arrive at a dock at a rate of 3 per day. The dock has facilities to unload 5 tankers at once. If X is the number of tankers arriving at the port in a day, find p(X ≤ 5), p(X > 5), p(X = 0) and p(2 < X < 5), given that the arrivals follow a Poisson distribution.

In many cases we might think of the random variable X as having a continuum of values. For example, X might be the length of life of a light-bulb, in which case the values which X might take are in the range 0 to 5000 (say) and any value in this range is possible. This complicates the ideas of probability, but we can still define a cumulative distribution, traditionally denoted by F(x), where
F(x) = p(X ≤ x) = p(X does not exceed x)
This is a non-controversial function which describes the probability of not exceeding x. Figure 6.10 shows this function for the Poisson distribution. When X may take a continuous set of values, the typical shape of F(x) is as in Fig. 6.11. In many situations F(x) is the function of interest. For example, if X denotes the length of life of an electronic component, then F(x) is the probability of failure by time x. Another commonly used function is the probability density function f(x) where, approximately,
f(x)δx = p(x < X < x + δx)
the technical definition being $F(x) = \int_{-\infty}^{x} f(t)\,dt$, when both of these functions exist (and f(t) may not). In consequence
$$p(a < X < b) = F(b) - F(a) = \int_a^b f(x)\,dx$$
In a sense, if we have the "discrete" distributions such as the Binomial or the Poisson, we can think of the continuous ones as approximations.
Fig. 6.10

Fig. 6.11 (F(x) plotted against x)

Example 13
Suppose we choose a point on the line 0 to 1. Let x be the observed distance of the point from 0, as in Fig. 6.12, and let X be the variable giving the distance of the point from 0. If we assume that the probability of the point falling in any segment is equal to the length of the segment, then
F(a) = p(X ≤ a) = a,   0 ≤ a ≤ 1
and
$$f(x) = \frac{d}{dx}F(x) = 1, \qquad 0 \le x \le 1$$

Two of the classical continuous distributions are:

(1) The negative exponential distribution
f(x) = λe^{-λx} provided x ≥ 0
f(x) = 0 otherwise, as in Fig. 6.13

Fig. 6.12    Fig. 6.13    Fig. 6.14

In this case
F(x) = 0 when x ≤ 0
$$F(x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x} \quad \text{when } x \ge 0$$
When events occur at random at rate λ, as in the derivation of the Poisson distribution, the time between successive events has this distribution.
(2) The normal (or Gaussian) distribution
$$f(x) = (2\pi\sigma^2)^{-1/2}\exp\{-(x-\mu)^2/2\sigma^2\}$$
where μ and σ are constants; this is sketched in Fig. 6.14. In this case
$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$
and, for the standard case described below, Φ(x) is the usual notation for this function.
There is an added complication here, as we are not able to evaluate F(x) without resorting to numerical methods like Simpson's rule. The distribution is so important that there are extensive tables of Φ(x). They are all for the standard normal distribution, where μ = 0 and σ² = 1, i.e.
$$f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$$
Now if X has a normal distribution with mean μ and variance σ², then
$$Z = \frac{X - \mu}{\sigma}$$
has the standard normal distribution, and then
$$p(X < b) = p\left(Z < \frac{b-\mu}{\sigma}\right) = \Phi\left(\frac{b-\mu}{\sigma}\right)$$
In general
$$p(a < X < b) = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)$$
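Since Φ(x) has no closed form, here is a small Python sketch of the numerical route the text mentions. Φ(x) is computed as 1/2 plus Simpson's rule applied to the standard normal density on [0, x]. The result is compared with the standard identity Φ(x) = (1 + erf(x/√2))/2, using `math.erf` from the standard library. The function names and the choice of 200 Simpson intervals are our own.

```python
import math

def phi_erf(x):
    """Standard normal cdf via the error function: (1 + erf(x / sqrt 2)) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_simpson(x, n=200):
    """Standard normal cdf: 1/2 plus Simpson's rule for the density on [0, x]."""
    def f(t):
        return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
    h = x / n                                   # n must be even
    s = f(0.0) + f(x)
    s += 4 * sum(f(k * h) for k in range(1, n, 2))
    s += 2 * sum(f(k * h) for k in range(2, n, 2))
    return 0.5 + s * h / 3

def normal_prob(a, b, mu, sigma):
    """p(a < X < b) for X normal with mean mu and standard deviation sigma."""
    return phi_erf((b - mu) / sigma) - phi_erf((a - mu) / sigma)

print(phi_erf(1.96), phi_simpson(1.96))    # both about 0.975
print(normal_prob(-1.0, 1.0, 0.0, 1.0))    # within one sigma: about 0.683
```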

Example 14
If T is the length of life of a component, and we know
f(t) = 2e^{-2t},  t ≥ 0
     = 0 otherwise
then
$$F(x) = \int_0^x 2e^{-2t}\,dt = 1 - e^{-2x}, \qquad x \ge 0$$
$$p(T < 1) = \int_0^1 2e^{-2t}\,dt = 1 - e^{-2} = 0.865$$
or, equivalently, p(T < 1) = F(1) = 1 - e^{-2}.
$$p(1 < T < 2) = F(2) - F(1) = e^{-2} - e^{-4} = 0.117 = \int_1^2 2e^{-2t}\,dt$$
since
$$p(a < X < b) = F(b) - F(a) = \int_a^b f(y)\,dy$$
$$p(T > 3) = 1 - p(T \le 3) = 1 - F(3) = e^{-6} = 0.002$$
Notice that as T is continuous, the probability that T takes any particular value is zero, i.e. p(T = 3) = 0; the only non-zero values are for intervals. In consequence
p(T ≤ a) = p(T < a)
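All three answers in Example 14 come from the single function F(x) = 1 - e^{-2x}. The Python sketch below evaluates them. The name `F` and the `rate` parameter are our own.

```python
import math

def F(x, rate=2.0):
    """Distribution function of the lifetime: 1 - e^(-rate x) for x >= 0."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

print(round(F(1), 3))           # p(T < 1): 0.865
print(round(F(2) - F(1), 3))    # p(1 < T < 2): 0.117
print(round(1 - F(3), 3))       # p(T > 3): 0.002
```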

5 JOINT DISTRIBUTIONS

Suppose we have two random variables X and Y; then we might be interested in their joint variability, i.e. their joint probability distribution. For the case of discrete variables we can just list the probability distribution, i.e.
p(X = x, Y = y) = f(x, y) ≥ 0
For example, if X, Y can each take the values 0, 1 or 2, we might get:

            X = 0    X = 1    X = 2
Y = 0       3/28     9/28     3/28
Y = 1       3/14     3/14     0
Y = 2       1/28     0        0

In the case of continuous variables we can define a joint density function f(x, y) such that
$$p[(X, Y) \text{ lies in region } R] = \iint_R f(x, y)\,dx\,dy$$
Example 15
If
f(x, y) = x(1 + 3y²)/4,   0 < x < 2 and 0 < y < 1
        = 0 otherwise
then
$$p\left(0 < X < 1 \text{ and } \tfrac14 < Y < \tfrac12\right) = \int_{1/4}^{1/2}\int_0^1 \frac{x(1 + 3y^2)}{4}\,dx\,dy$$
which you should find quite straightforward to evaluate. Clearly this idea can be extended in principle to the joint distribution of n variables $X_1, X_2, \ldots, X_n$, using a probability distribution $f(x_1, \ldots, x_n)$ or a density function.
If the variables are independent then we have the simpler situation that
$$f(x_1, \ldots, x_n) = f_1(x_1)f_2(x_2)\cdots f_n(x_n)$$
for some functions $f_1, f_2, \ldots, f_n$ having the properties of density functions. This follows from the definition of independence.
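Double integrals like the one in Example 15 can also be checked numerically. The Python sketch below applies a simple midpoint rule on a grid. It first confirms that the joint density integrates to 1 over 0 < x < 2, 0 < y < 1. It then approximates the probability in the example, so you can compare it with your exact answer. The helper names and the grid size are our own choices.

```python
def f(x, y):
    """Joint density of Example 15: x(1 + 3y^2)/4 on 0<x<2, 0<y<1, else 0."""
    if 0 < x < 2 and 0 < y < 1:
        return x * (1 + 3 * y * y) / 4
    return 0.0

def double_integral(g, x0, x1, y0, y1, n=400):
    """Midpoint-rule approximation to the integral of g over a rectangle."""
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    return sum(g(x0 + (i + 0.5) * hx, y0 + (j + 0.5) * hy)
               for i in range(n) for j in range(n)) * hx * hy

total = double_integral(f, 0, 2, 0, 1)           # should be very close to 1
prob = double_integral(f, 0, 1, 0.25, 0.5)       # p(0 < X < 1 and 1/4 < Y < 1/2)
print(total, prob)
```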

With these multiple variable problems there is a very rich family of associated distributions. For example, suppose there are three variables X, Y and Z. We might be interested in the joint distribution of X, Y, Z if they all varied in some interesting way. But alternatively we might only be interested in X and Z, when we would want the distribution of X and Z alone, ignoring Y: this is called the marginal distribution of X and Z. Yet again, we might be interested in X alone, in which case we would require the marginal distribution of X.
In addition there are many possible conditional cases. For example, we might be interested in the distribution of X and Y given Z = z, with density f(x, y | z), or perhaps the density of Z given x and y, f(z | x, y).

6 EXPECTED VALUES AND MOMENTS

In most if not all situations we cannot work directly with the density functions as they are unknown; we need to work with their moments or expected values. These are the theoretical counterparts of "long run" averages.
Suppose X has a probability distribution f(x) = p(X = x) where X is discrete. Then we define the expected value of X to be
$$E(X) = \sum_{\text{all } x} x\,p(X = x) = \mu \text{ say.}$$
In the continuous case we use the analogous definition
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \mu$$
The rth moment of X (about the mean) is then defined as
$$\mu_r = E[(X - \mu)^r] = \sum_{\text{all } x}(x - \mu)^r p(X = x)$$
or in the continuous case as
$$\mu_r = E[(X - \mu)^r] = \int_{-\infty}^{\infty}(x - \mu)^r f(x)\,dx$$
With these definitions μ is the "mean" of the distribution, and can be thought of as its "centre of mass", while σ² = μ₂ is the variance, which gives its "spread". You will often see it written as var(X). The next moment μ₃ to some extent measures the lack of symmetry; it is zero for a symmetric distribution.
Section 6 gives some practical illustrations of these quantities.
Example 16
Rolling a die we have
p(X=i)=t, fori=1,2, ... ,6
so
6
E(X)= L t,i=¥=~=3.5=f.L
6
E[(X-f.L)2] = L t,(i-3.5)2=2.9167
i=1

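For the exponential distribution
$$f(x) = \lambda e^{-\lambda x}, \quad x > 0$$
$$E(X) = \int_0^{\infty} \lambda x e^{-\lambda x}\,dx = \frac{1}{\lambda}$$
$$\mathrm{var}(X) = \frac{1}{\lambda^2}$$
while for the normal distribution
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$$
$$E(X) = \mu, \qquad \mathrm{var}(X) = \sigma^2, \qquad \mu_3 = 0$$
since the distribution is symmetric.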
Example 17
In general
$$\begin{aligned}
\mathrm{var}(X) &= E[(X - \mu)^2] = \int (x - \mu)^2 f(x)\,dx \\
&= \int x^2 f(x)\,dx + \mu^2 \int f(x)\,dx - 2\mu \int x f(x)\,dx \\
&= \int x^2 f(x)\,dx + \mu^2 - 2\mu^2 = E(X^2) - \mu^2
\end{aligned}$$
This result also holds for discrete variables.
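Example 18
Suppose we have the density function
$$f(x) = \frac{200}{x^3}, \quad x > 10$$
Then
$$F(t) = \int_{10}^{t} \frac{200}{x^3}\,dx = \left[-\frac{200}{2x^2}\right]_{10}^{t} = 1 - \frac{100}{t^2}$$
and
$$E(X) = \int_{10}^{\infty} x \cdot \frac{200}{x^3}\,dx = \int_{10}^{\infty} \frac{200}{x^2}\,dx = \left[-\frac{200}{x}\right]_{10}^{\infty}$$
Now we can think of this as
$$\left[-\frac{200}{x}\right]_{10}^{L} \quad \text{as } L \text{ tends to infinity.}$$
Then $E(X) = -(200/L) + 20$. Now $1/L \to 0$ as $L \to \infty$, so $E(X) = 20$.
Notice that $E(X^2)$ does not exist! Indeed $\int_{10}^{L} x^2 \cdot 200/x^3\,dx = 200 \ln(L/10)$, which grows without bound as $L \to \infty$.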
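A numerical check of Example 18 can be sketched in Python. The code below uses composite Simpson's rule (our choice of method; `simpson` and `density` are helpers written for this illustration) to integrate up to a finite upper limit L: the truncated mean tracks $20 - 200/L$, whereas the truncated second moment tracks $200\ln(L/10)$ and keeps growing.

```python
import math


def simpson(g, a, b, n=100_000):
    """Composite Simpson's rule for the integral of g over [a, b] (n even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def density(x):
    return 200 / x ** 3  # f(x) = 200/x^3 for x > 10


for L in (100, 1000):
    mean_to_L = simpson(lambda x: x * density(x), 10, L)
    second_to_L = simpson(lambda x: x * x * density(x), 10, L)
    print(L, round(mean_to_L, 6), 20 - 200 / L,
          round(second_to_L, 3), round(200 * math.log(L / 10), 3))
```

Increasing L moves the first column towards 20, while the second-moment column increases by about 460 each time L is multiplied by 10.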

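The following table gives the means and variances of some common
distributions (here q = 1 - p):

Distribution   f(x)                                                            Mean          Variance
Binomial       $\binom{n}{x} p^x (1-p)^{n-x}$, x = 0, 1, ..., n                  $np$          $npq$
Poisson        $\lambda^x e^{-\lambda}/x!$, x = 0, 1, 2, ...                    $\lambda$     $\lambda$
Exponential    $\lambda e^{-\lambda x}$, x > 0                                   $1/\lambda$   $1/\lambda^2$
Normal         $\frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/(2\sigma^2)}$        $\mu$         $\sigma^2$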

As its name indicates, the "expected" value is a long run average. If we
have a situation modelled by a probability distribution, then the expected
value approach also gives a long run average for other functions:
if f(x) is a density then the expected value of any function $\phi(X)$ is
$$E(\phi(X)) = \int \phi(x) f(x)\,dx$$
while if f(x) is a discrete density, $f(x_r) = p(X = x_r)$,
$$E(\phi(X)) = \sum_r \phi(x_r)\,p(X = x_r)$$
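Example 19
Suppose we play roulette and bet £1 on red. There are 37 possible outcomes,
of which 18 are red and 18 black. In the obvious notation
$$p(R) = \tfrac{18}{37}, \quad p(B) = \tfrac{18}{37}, \quad p(\text{not red}) = \tfrac{19}{37}$$
Let $\phi(x)$ be the gain on a turn of the wheel, i.e.
$$\phi(x) = \begin{cases} 1 & \text{if red} \\ -1 & \text{if not red} \end{cases}$$
$$E(\text{gain}) = 1 \times \tfrac{18}{37} - 1 \times \tfrac{19}{37} = -\tfrac{1}{37} = -0.027$$
On average you would expect to lose 2.7% of your wager per spin.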
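The calculation in Example 19 is $E(\phi) = \sum \phi\,p$ over two outcomes; a minimal Python sketch with exact fractions (the names `P_RED` and `expected_gain` are ours) makes the bookkeeping explicit.

```python
from fractions import Fraction

P_RED = Fraction(18, 37)   # 18 red pockets out of 37
P_NOT_RED = 1 - P_RED      # 18 black pockets plus the zero


def expected_gain(stake=1):
    """E(phi) for a bet on red: win the stake on red, lose it otherwise."""
    return stake * P_RED + (-stake) * P_NOT_RED


print(expected_gain(), round(float(expected_gain()), 3))  # -1/37 -0.027
```

Because the expectation is linear in the stake, a £10 bet simply loses ten times as much on average.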
Exercises
10. The ACME Veeblitzer Co. knows that the distribution of net profit (in
£'000) is given by
Net profit (£'000)   25     15     5      -5
Probability          0.06   0.15   0.06   0.73
Find the expected net profit.
11. Let X be the random variable that denotes the life in hours of a visual
display unit. The density function is
$$f(x) = \begin{cases} \dfrac{20000}{x^3} & x > 100 \\ 0 & \text{otherwise} \end{cases}$$

Show that the expected life of the device is 200 hours.


12. A printing machine has a constant probability of 0.05 of breaking down
on any particular day. We assume that B, the number of breakdowns in a
5-day week, has distribution
$$p(B = k) = \binom{5}{k}(0.05)^k(0.95)^{5-k}, \quad k = 0, 1, \ldots, 5$$
since once the machine breaks the remainder of a day is lost. The cost of
repair if a maintenance contract is not held is £250 per call.
If a maintenance contract costing £100 per week is available, which covers
the cost of all repairs, is it worth considering?

Similar ideas can be devised for jointly distributed random variables X
and Y, but the most usual are the covariance, written cov(X, Y), and the
correlation $\rho$. As before we can define
$$E(X) = \iint x f(x, y)\,dx\,dy = \mu_X$$
$$E(Y) = \iint y f(x, y)\,dx\,dy = \mu_Y$$
$$E(XY) = \iint xy f(x, y)\,dx\,dy$$
We define the covariance as
$$\mathrm{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$
which can be shown to be (after a struggle)
$$\mathrm{cov}(X, Y) = E(XY) - \mu_X \mu_Y$$
The correlation $\rho$ is a scaled covariance,
$$\rho = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}$$
var, the "variance", being defined earlier; $\rho$ measures the strength of the
linear relationship between X and Y.
We summarise the details (which are fairly straightforwardly demonstrated):
(i) $|\rho| \le 1$.
(ii) If X and Y are independent then $\rho = 0$.
(iii) If X = aY + b then
  $\rho = +1$ if a > 0
  $\rho = -1$ if a < 0
If $\rho$ is nearly equal to +1 this indicates that X and Y are (almost) directly
linearly related, and that they increase and decrease together, whilst $\rho$ close
to -1 indicates that X and Y are (roughly) inversely related.
Example 20
The joint distribution of X and Y is given by:

          X = 1   X = 2   X = 3   X = 4   Total
Y = 1     1/16    0       0       0       1/16
Y = 2     1/16    2/16    0       0       3/16
Y = 3     1/16    1/16    3/16    0       5/16
Y = 4     1/16    1/16    1/16    4/16    7/16
Total     1/4     1/4     1/4     1/4     1

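and then we have
$$E(X) = 1 \times \tfrac14 + 2 \times \tfrac14 + 3 \times \tfrac14 + 4 \times \tfrac14 = \tfrac{10}{4} = \tfrac52$$
$$E(Y) = 1 \times \tfrac{1}{16} + \tfrac{2 \times 3}{16} + \tfrac{3 \times 5}{16} + \tfrac{4 \times 7}{16} = \tfrac{1}{16}(1 + 6 + 15 + 28) = \tfrac{50}{16} = \tfrac{25}{8}$$
$$E(X^2) = \tfrac14 \times 1^2 + \tfrac14 \times 2^2 + \tfrac14 \times 3^2 + \tfrac14 \times 4^2 = \tfrac{30}{4} = \tfrac{15}{2}$$
$$E(Y^2) = \tfrac{1}{16}(1 \times 1 + 4 \times 3 + 9 \times 5 + 16 \times 7) = \tfrac{170}{16}$$
$$\begin{aligned}
E(XY) &= 1 \times 1 \times \tfrac{1}{16} + 2 \times 1 \times \tfrac{1}{16} + 3 \times 1 \times \tfrac{1}{16} + 4 \times 1 \times \tfrac{1}{16} + 1 \times 2 \times 0 \\
&\quad + 2 \times 2 \times \tfrac{2}{16} + 3 \times 2 \times \tfrac{1}{16} + 4 \times 2 \times \tfrac{1}{16} \\
&\quad + 3 \times 3 \times \tfrac{3}{16} + 4 \times 3 \times \tfrac{1}{16} + 4 \times 4 \times \tfrac{4}{16} = \tfrac{135}{16}
\end{aligned}$$
$$\mathrm{var}(X) = \tfrac{15}{2} - \tfrac{25}{4} = \tfrac54, \qquad \mathrm{var}(Y) = \tfrac{170}{16} - \left(\tfrac{25}{8}\right)^2 = \tfrac{55}{64}$$
$$\mathrm{cov}(X, Y) = \tfrac{135}{16} - \tfrac52 \times \tfrac{25}{8} = \tfrac{10}{16} = \tfrac58$$
$$\therefore \rho = \frac{5/8}{\sqrt{\tfrac54 \times \tfrac{55}{64}}} = \frac{10}{\sqrt{275}} = 0.603$$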
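The arithmetic of Example 20 is easy to slip on by hand, so here is a short Python check using exact fractions (the names `JOINT` and `expect` are ours). It recomputes every expectation directly from the joint table above, cell by cell.

```python
from fractions import Fraction as F
from math import sqrt

# JOINT[y - 1][x - 1] = p(X = x, Y = y), copied from the table in Example 20
JOINT = [
    [F(1, 16), F(0), F(0), F(0)],
    [F(1, 16), F(2, 16), F(0), F(0)],
    [F(1, 16), F(1, 16), F(3, 16), F(0)],
    [F(1, 16), F(1, 16), F(1, 16), F(4, 16)],
]


def expect(g):
    """E[g(X, Y)], summed over all 16 cells of the joint table."""
    return sum(g(x, y) * JOINT[y - 1][x - 1]
               for y in range(1, 5) for x in range(1, 5))


ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex ** 2
var_y = expect(lambda x, y: y * y) - ey ** 2
cov = expect(lambda x, y: x * y) - ex * ey
rho = float(cov) / sqrt(float(var_x * var_y))
print(ex, ey, var_x, var_y, cov, round(rho, 3))  # 5/2 25/8 5/4 55/64 5/8 0.603
```

A positive value is what the table suggests: all the probability lies on or below the diagonal, so large values of X go with large values of Y.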


7 REAL DATA SAMPLES

Example 21
Suppose we have 100 measurements of length (in mm):
22.5, 20.1, 23.3, 22.9, 23.1, 22.0, 22.3, 23.6, 24.7, 23.7,
24.0, 20.4, 21.3, 22.0, 24.2, 21.7, 21.0, 20.1, 21.9, 21.9,
21.7, 22.6, 20.9, 21.6, 22.2, 22.5, 22.2, 24.3, 22.3, 22.6,
20.1, 22.0, 22.8, 22.0, 22.4, 22.3, 20.6, 22.1, 21.9, 23.0,
22.0, 22.0, 21.1, 22.0, 19.6, 22.8, 22.0, 23.4, 23.8, 23.3,
22.5, 22.3, 21.9, 22.0, 21.7, 23.3, 22.2, 22.3, 22.8, 22.9,
23.7, 22.0, 21.9, 22.2, 24.4, 22.7, 23.3, 24.0, 23.6, 22.1,
21.8, 21.1, 23.4, 23.8, 23.3, 24.0, 23.5, 23.2, 24.0, 22.4,
23.9, 22.0, 23.9, 20.9, 23.8, 25.0, 24.0, 21.7, 23.8, 22.8,
23.1, 23.1, 23.5, 23.0, 23.0, 21.8, 23.0, 23.3, 22.4, 22.4.
First we try to give a pictorial idea of the data. The simplest method is the
histogram: to construct one we take a grid of lengths and count the number
of observations in each interval of the grid.
These histograms summarise the data and give an immediate pictorial
representation. They are very important in getting a feel for what is going on.

Interval of grid   Midpoint   Frequency
19.5-19.9          19.7         1
20.0-20.4          20.2         4
20.5-20.9          20.7         3
21.0-21.4          21.2         4
21.5-21.9          21.7        12
22.0-22.4          22.2        26
22.5-22.9          22.7        12
23.0-23.4          23.2        16
23.5-23.9          23.7        12
24.0-24.4          24.2         8
24.5-24.9          24.7         1
25.0-25.4          25.2         1
Total                         100

Then we use these figures to draw bars of heights equal to the various
frequencies in our grid. Technically the area of each bar is equal to the
frequency, and this enables us to cope with grids with unequal intervals
(see Fig. 6.15).
Example 22
Figure 6.16 gives the distribution of the age at death of a sample of infants
born in the UK.
Clearly we have a background population, and in each case the sample
and the associated histogram are used to make some inference about the
population. We might feel that the frequencies give an approximation to
the probabilities; thus in Example 21 we might suspect
$$p(\text{measurement lies between 22.0 and 22.4}) \approx \tfrac{26}{100}$$
We can also use numerical values to summarise the data $x_1, x_2, \ldots, x_n$:
(i) The sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
In Example 21, $\bar{x} = 22.555$.
Fig. 6.15
(ii) The median M, which has the property that half the values are less
than M. To compute M we usually rank the observations in order
$x_{(1)}, x_{(2)}, \ldots, x_{(n)}$, where $x_{(1)}$ is the smallest, $x_{(2)}$ the next smallest and
so on. Then if n is divisible by 2 we take
$$M = \tfrac12\left[x_{(n/2)} + x_{(n/2+1)}\right]$$
Otherwise we have
$$M = x_{((n+1)/2)}$$
Thus 21, 6, 62, 47 have M = (21 + 47)/2 = 34, while 21, 15, 33, 24, 12
has M = 21.
(iii) The mode m, which is the most commonly occurring value.
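The ranking rule for the median translates directly into code. This Python sketch (our own helper; Python's `statistics.median` behaves the same way) converts the text's 1-based ranks $x_{(n/2)}$, $x_{(n/2+1)}$ and $x_{((n+1)/2)}$ into 0-based list indices.

```python
def median(values):
    """Median as defined in the text: rank the data, then take the middle
    value (n odd) or the mean of the two middle values (n even)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 0:
        # x_(n/2) and x_(n/2 + 1) in the text's 1-based notation
        return (xs[n // 2 - 1] + xs[n // 2]) / 2
    return xs[(n + 1) // 2 - 1]  # x_((n+1)/2)


print(median([21, 6, 62, 47]), median([21, 15, 33, 24, 12]))  # 34.0 21
```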
These values give the "location" of the data. It is important to realise
that they will vary from sample to sample. Thus $\bar{x}$ will not be the
same for two samples; there is a distribution of $\bar{x}$ values over samples.
Example 23
Consider the population of numbers given below.
1 0 3 2 8 4 9 7 2 7 7 1 1 1 8 5 3 2 5 3 1 2 2 0 6
Then possible samples of size 5 are

Sample         $\bar{x}$
1 7 8 5 3      4.8
0 9 7 1 8      5.0
5 3 1 4 6      3.8

We use $\bar{x}$ to estimate the unknown mean $\mu$ of the population. As one
might expect, $\bar{x}$ is variable, and it gives an estimate of $\mu$. Theory tells us that as the
size of the sample increases the variation (in general) decreases.
The median can also be thought of as a location estimate. It is often used
in preference to $\bar{x}$ when it is suspected that the data are contaminated with
large or small observations from another source. In this circumstance it is
less affected than $\bar{x}$.
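The claim that $\bar{x}$ varies less as the sample grows can be illustrated by simulation. The Python sketch below is our own: it repeatedly draws samples, without replacement (an assumption, since the text does not say how its samples were drawn), from the Example 23 population and reports the spread of the sample means for several sample sizes.

```python
import random
import statistics

# The population of Example 23, one digit per member
POPULATION = [int(d) for d in "1032849727711185325312206"]


def sample_means(population, size, reps, rng):
    """Means of `reps` samples of `size` members drawn without replacement."""
    return [statistics.fmean(rng.sample(population, size)) for _ in range(reps)]


rng = random.Random(23)  # fixed seed so the run is repeatable
print("population mean", statistics.fmean(POPULATION))
for size in (5, 10, 15):
    means = sample_means(POPULATION, size, 2000, rng)
    print(size, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

The average of the sample means stays near the population mean of 3.6 for every sample size, while their standard deviation shrinks as the sample size increases.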
Fig. 6.16 (histogram; horizontal axis: age at death in years, 0 to 10)
(iv) The sample variance
$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
is used to estimate the population variance. For Example 21, $S^2 = (1.077)^2$.
(v) Various other quantities are used: $Q_U$, the upper quartile, i.e. the
value that exceeds 75% of the observations;
$Q_L$, the lower quartile, i.e. the value that exceeds 25% of the observations;
and the interquartile range $IR = Q_U - Q_L$.
From the sample and histograms etc. we attempt to make inferences
about the unknown population structure. The details are complex
and we shall restrict ourselves to one simple application.
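For Example 21, Python's standard `statistics` module (Python 3.8 or later, for `fmean` and `quantiles`) reproduces the summary statistics above. The `histogram_counts` helper is our own and uses the same 0.5 mm grid as the frequency table. Note that `statistics.quantiles` follows one particular quartile convention (its default "exclusive" method); other texts and packages define quartiles slightly differently, so the interquartile range printed here is one reasonable choice rather than the only one.

```python
import statistics

# The 100 lengths (mm) of Example 21
LENGTHS = [
    22.5, 20.1, 23.3, 22.9, 23.1, 22.0, 22.3, 23.6, 24.7, 23.7,
    24.0, 20.4, 21.3, 22.0, 24.2, 21.7, 21.0, 20.1, 21.9, 21.9,
    21.7, 22.6, 20.9, 21.6, 22.2, 22.5, 22.2, 24.3, 22.3, 22.6,
    20.1, 22.0, 22.8, 22.0, 22.4, 22.3, 20.6, 22.1, 21.9, 23.0,
    22.0, 22.0, 21.1, 22.0, 19.6, 22.8, 22.0, 23.4, 23.8, 23.3,
    22.5, 22.3, 21.9, 22.0, 21.7, 23.3, 22.2, 22.3, 22.8, 22.9,
    23.7, 22.0, 21.9, 22.2, 24.4, 22.7, 23.3, 24.0, 23.6, 22.1,
    21.8, 21.1, 23.4, 23.8, 23.3, 24.0, 23.5, 23.2, 24.0, 22.4,
    23.9, 22.0, 23.9, 20.9, 23.8, 25.0, 24.0, 21.7, 23.8, 22.8,
    23.1, 23.1, 23.5, 23.0, 23.0, 21.8, 23.0, 23.3, 22.4, 22.4,
]


def histogram_counts(values, start=19.5, width=0.5, bins=12):
    """Count values in [start, start + width), [start + width, ...), ...
    Working in whole tenths of a mm avoids floating-point trouble at bin edges."""
    lo, w = round(start * 10), round(width * 10)
    counts = [0] * bins
    for v in values:
        k = (round(v * 10) - lo) // w
        if not 0 <= k < bins:
            raise ValueError(f"{v} lies outside the grid")
        counts[k] += 1
    return counts


q_lower, q_median, q_upper = statistics.quantiles(LENGTHS, n=4)
print(histogram_counts(LENGTHS))
print("mean", round(statistics.fmean(LENGTHS), 3),
      "median", statistics.median(LENGTHS),
      "mode", statistics.mode(LENGTHS),
      "S", round(statistics.stdev(LENGTHS), 3),
      "IR", round(q_upper - q_lower, 2))
```

`statistics.stdev` uses the $n - 1$ divisor, matching the definition of $S^2$ in (iv).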
Exercise 13
For the data below construct a histogram. Compute the statistics $\bar{x}$, $S^2$ and
M. Mark $\bar{x}$ on your histogram.
1 3 4 5 1 1 3 4 0 6
7 7 4 2 3 2 2 1 3 3
4 6 1 2 1 6 4 1 2 1
6 4 4 1 3 2 2 4 6 1
0 4 5 3 3 1 0 3 2 4
4 0 2 4 2 0 1 5 2 1
3 0 3 3 2 3 4 6 2 4
2 4 6 0 3 2 2 1 3 2
1 3 4 2 2 2 2 1 3 7
1 1 2 6 2 3 3 0 0 2
4 4 3 3 1 4 1 3 1 1
2 5 2 3 3 1 3 3 3 1
3 1 2 2 5 0 3 1 1 5
2 0 6 3 6 1 2 3 2 1
3 2 4 2 2 1 2 5 2 0
1 3 3 8 5 4 6 0 6 4
5 3 3 2 2 2 5 2 4 8
2 1 2 2 3 2 4 2 2 1
4 3 3 4 6 1 1 5 0 2
1 3 1 3 3 5 2 3 5 3

8 TWO VARIABLES
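For two samples we can also estimate the correlation $\rho$. Suppose we have
n pairs of observations $(X_i, Y_i)$ for i = 1, ..., n. If we define
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$
and
$$S_{xx} = \sum_{i=1}^{n}(X_i - \bar{X})^2, \qquad S_{yy} = \sum_{i=1}^{n}(Y_i - \bar{Y})^2, \qquad S_{xy} = \sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$$
then we take as our estimate of $\rho$ the number
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$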
r is an estimate of the correlation $\rho$ (the population value) based
on the sample. In consequence $r \approx 1$ implies X and Y vary together, while
$r \approx -1$ implies the relationship is an inverse one. Values of r near zero
suggest no linear relationship. The sort of (X, Y) plots expected are given in Fig.
6.17, but do not expect anything as clear cut outside a textbook.
Despite textbook assertions and Fig. 6.17 it is very difficult to judge the
value of p from diagrams.
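The estimate r is a direct transcription of the formulas above. In this Python sketch `sample_correlation` is our own name, and the three small data sets in the usage lines are made up purely to show the extreme cases.

```python
from math import sqrt


def sample_correlation(xs, ys):
    """r = S_xy / sqrt(S_xx * S_yy) for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


print(sample_correlation([1, 2, 3, 4], [2, 4, 6, 8]))        # exact linear: r = 1 up to rounding
print(sample_correlation([1, 2, 3, 4], [8, 6, 4, 2]))        # inverse: r = -1 up to rounding
print(sample_correlation([1, 2, 3, 4, 5], [2, 1, 3, 1, 2]))  # no linear trend: r = 0 up to rounding
```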
To give a further illustration of two-variable techniques consider the
following. Suppose we have $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ and that we can assume
$$y_i = a + b x_i + e_i$$
i.e. $y_i$ and $x_i$ are related but there is an "error" $e_i$, perhaps of observation.
We assume that the errors are independent and have common mean 0 and
common variance $\sigma^2$.
To find a and b we might choose the values which minimise
$$Q = \sum_{i=1}^{n}(y_i - a - b x_i)^2$$
giving the diagram in Fig. 6.18.


To find the minimum of Q(a, b), a function of a and b, we differentiate
and set the partial derivatives to zero:
$$\frac{\partial Q}{\partial a} = \sum_{i=1}^{n} 2(y_i - a - b x_i)(-1) = 0$$
and
$$\frac{\partial Q}{\partial b} = \sum_{i=1}^{n} 2(y_i - a - b x_i)(-x_i) = 0$$

Fig. 6.17 (scatter plots of y against x; panels labelled r = 0 and r = 0.9)

Fig. 6.18 (y plotted against x)

Thus
$$\sum y_i = \sum a + b\sum x_i = na + b\sum x_i$$
and
$$\sum x_i y_i = a\sum x_i + b\sum x_i^2$$
which give the "normal equations"
$$\sum y_i = na + b\sum x_i$$
$$\sum x_i y_i = a\sum x_i + b\sum x_i^2$$
We should check that this does give a minimum, but we omit this detail and
hope the reader will believe us.
It is easily checked that
$$a = \bar{y} - b\bar{x}$$
and
$$b = \frac{S_{xy}}{S_{xx}}$$
This is the regression of Y on X.
Example 24
Patients' kidney function is monitored by measuring the level of a trace
element injected into the blood stream as time goes by.

y = log(level) t = time after injection

1.3956 10
1.3734 12
1.3852 18
1.3416 20
1.3498 24
1.3697 26
1.3493 32
1.3175 36
1.2599 42
1.3136 48
1.2737 50
1.1946 100

Then calculation gives
$$b = -0.0023, \qquad a = 1.4055$$
and
$$\log(\text{level}) = 1.4055 - 0.0023\,t$$
with the figure as in Fig. 6.19.
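The regression formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$ are easy to check against Example 24. This Python sketch (the function name `least_squares` is ours) fits the kidney data and then verifies that both normal equations hold at the fitted values.

```python
def least_squares(xs, ys):
    """Fit y = a + b x by least squares: b = S_xy / S_xx, a = ybar - b xbar."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


# Example 24: t = time after injection, y = log(level)
T = [10, 12, 18, 20, 24, 26, 32, 36, 42, 48, 50, 100]
Y = [1.3956, 1.3734, 1.3852, 1.3416, 1.3498, 1.3697,
     1.3493, 1.3175, 1.2599, 1.3136, 1.2737, 1.1946]
a, b = least_squares(T, Y)
print(round(a, 4), round(b, 4))  # 1.4055 -0.0023
```

Working with deviations from the means, rather than with raw sums such as $\sum x_i^2$, is a common choice because it reduces rounding error when the x values are large.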
Exercise 14
Plot the data given below and find the best fitting straight line
y         x

1.9858 1.0000
4.6843 2.0000
3.8751 3.0000
10.9987 4.0000
9.9644 5.0000
15.2463 6.0000
16.6968 7.0000
15.8197 8.0000
21.1323 9.0000
23.3355 10.0000
26.0283 11.0000
28.5071 12.0000
27.3629 13.0000
25.0314 14.0000
28.6330 15.0000
31.9960 16.0000
32.9099 17.0000
38.2365 18.0000
34.2972 19.0000
41.5566 20.0000
Fig. 6.19 (log(level), from 1.1 to 1.5, plotted against t (secs))

9 SIMULATION AND MONTE CARLO METHODS

Nearing the end of this chapter we take a brief look at methods of simulating
experiments. This look should also illustrate the relationship between a
random variable and its realisation. Our methods are based on a sequence
of "random numbers". The definition of random numbers can raise quite
nasty problems; we shall just assume that devices (or equivalently computer
programs) exist which produce sequences of "random" digits. In BASIC
the function is RND(X). By random digits we mean digits that occur equally
frequently but with no discernible pattern. Table 6.8 gives
a set of numbers produced by such a generator. We shall, as an approximation,
use numbers from this table and assume that they are random.
Suppose we have a random variable X with distribution $p(X = 1) = \tfrac12$,
$p(X = 0) = \tfrac12$. We look for a mechanism to generate realisations of X, i.e.
values of 0s and 1s. Suppose we choose a random digit from the table. If
it is one of 1, 3, 5, 7, 9 we deem X to have the value 1, while if it is 0, 2,
4, 6, 8 then X is deemed to take the value 0. Thus the sequence of digits
30458 gives X values 1, 0, 0, 1, 0. If we had chosen X to take the values
Head and Tail then we would have simulated the tossing of a coin. If
X = red or black then we would have simulated the outcomes of a roulette
wheel.
To handle more complex problems we first consider how to produce
values for U where U is uniformly distributed between 0 and 1, i.e. U is
a random number between 0 and 1. We can work to d decimals as follows:
choose d digits, 60219 say, and place a decimal point before the left hand
one, to get 0.60219 as the required number.
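Both recipes above, odd/even digits for a fair 0-1 variable and a decimal point placed in front of d digits for U, fit in a few lines. The Python helpers below are our own illustration; the digit strings are the ones used in the text.

```python
def coin_from_digits(digits):
    """Odd digit -> X = 1, even digit -> X = 0, so p(X = 1) = p(X = 0) = 1/2."""
    return [int(d) % 2 for d in digits]


def uniform_from_digits(digits):
    """Place a decimal point before d random digits: '60219' -> 0.60219."""
    return int(digits) / 10 ** len(digits)


print(coin_from_digits("30458"))     # [1, 0, 0, 1, 0]
print(uniform_from_digits("60219"))  # 0.60219
```

With d digits, U can only take the values $0, 10^{-d}, 2 \times 10^{-d}, \ldots$, so d controls how finely the interval from 0 to 1 is resolved.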
Example 25
For the game "chuck a luck" you are invited to bet £1 on a number from 1,
2, 3, 4, 5, 6. Three dice are thrown and you win £K if K, the number of times
your number appears, is non-zero; otherwise you lose. Thus if you back
4 you win £1 for one 4, £2 for two 4s and so on. If 4 doesn't occur then
you lose your stake.
Table 6.8 Random digits (groups that are illegible in this copy are shown as ?????)
60219 01405 ????? ????? ????? ????? ????? 64441 16007
86944 ????? ????? ????? ????? ????? ?????
????? 95027 ????? 66664 ????? ????? ????? ????? 13669 33766
????? ????? ????? ????? ????? ????? ????? ????? 51031 ?????
????? ????? 62636 ????? ????? ????? 01452 ????? 95827 ?????
51749 37889 ????? ????? 59861 60211 29095 09672 77489 13400
80309 33825 66220 24424 65317 03088 64654 ????? 26771 55108
29320 06216 20788 21712 88886 66767 09120 33219 69719 38069
06684 89301 23299 47598 97659 16735 96393 40863 19069 48330
19647 05272 25832 48938 25174 66654 19643 47573 56068 28029
91627 34990 66789 77256 40213 17982 40322 47825 79699 89296
95272 87270 10276 28031 15651 49008 54462 68420 44737 ?????
55093 68372 81382 26263 66387 11334 40456 78640 20932 ?????
09798 61378 02517 72648 63039 22456 22820 17868 63496 ?????
????? 61599 30946 66912 53639 65269 87144 95920 83838 73762
84002 57484 19497 28527 52711 66041 93180 13714 17029 18370
????? 40374 ????? 22761 ????? 86684 35873 33471 55101 ?????
22867 74847 38427 34953 ????? 04573 ????? 32877 ?????
43840 74666 ????? 34264 52871 56411 65459 29192 88637 15307
99590 78662 ????? ????? 73964 ????? 57011 48183
????? 10606 ????? ????? 67713 12631 ????? 91027 67943
????? 62921 ????? ????? ????? 61830 ????? ????? 08694
33235 ????? ????? ????? 15576 ????? ????? ????? 62385 ?????
????? ????? ????? ????? 20099 ????? ????? 94647 ????? ?????
58563 ????? 12043 ????? 18828 ????? ????? 17254 10615 04137
82898 31192 07944 ????? ????? 81234 51435 ????? 22966 03944
05070 52557 86600 76672 64175 48824 29124 38816 08937 ?????
31415 31947 68217 25701 20043 36307 71783 99230 88528 ?????
29077 31027 52213 38563 00430 12550 29660 87182 65095 93923
70489 49458 24852 81311 45193 39468 47522 07207 55866 ?????
47407 21451 77500 19656 98062 41106 29548 64092 21026 ?????
64196 91133 55018 41382 85029 05463 34210 78635 44976 79710
54521 93753 64520 96323 92639 31233 51428 80460 88712 ?????
????? 76023 51517 74013 86542 78707 00396 75640 11430 60114
????? 74090 68775 26473 66635 96127 10657 90027 40835 90660
????? 00710 48466 54551 91516 48198 24992 39476 91497 ?????
51799 16501 54529 18207 41438 66220 55908 54785 47129 06123
????? 19686 53825 52234 62992 03034 02617 73932 36097 87399
21049 33319 98784 59544 41446 ????? 01320 68857 95034 41044
07049 67869 19680 ????? 64365 44709 55913 80179 99003 20214
????? ????? ????? 04382 77641 ????? 67753 ????? 44899 ?????
????? 97848 ????? ????? ????? ????? 19560 ????? 47670 67316
????? ????? ????? 41790 ????? ????? ????? 64101 ????? ?????
????? ????? ????? 64161 21891 30146 60406 ????? 88491 86876
67054 ????? 95609 52898 97103 ????? ????? 12514 36341 ?????
????? 84691 00363 21523 13857 ????? 21830 ????? 58370 74675
97891 96114 81915 17792 02369 12693 61261 67722 21420 ?????
51559 50287 60103 17184 10242 78478 18412 03317 97694 51163
02602 05475 57445 16753 60581 74783 68269 90193 87792 ?????
90896 61776 89585 57563 63118 19080 84127 11245 13011 44592
Fig. 6.20 (the interval from 0 to 1 divided into consecutive segments of lengths 0.5787, 0.3472, 0.0694 and 0.0046)

There are $6^3 = 216$ possible outcomes for the fall of 3 dice and we find
(and you might like to check) that

p(no luck)   p(win £1)   p(win £2)   p(win £3)
125/216      75/216      15/216      1/216

If X is used to denote the gain,

p(X = -1)   p(X = 1)   p(X = 2)   p(X = 3)
0.5787      0.3472     0.0694     0.0046
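One way to check these counts without listing all 216 outcomes is to note that the number of dice showing your number is binomial with n = 3 and p = 1/6. That shortcut is our observation, not the text's method. The Python sketch below (`chuck_a_luck` is our own name) also gives the expected gain, $-17/216 \approx -0.079$, i.e. a loss of about 7.9p per £1 staked.

```python
from fractions import Fraction
from math import comb


def chuck_a_luck():
    """Gain distribution for a £1 stake: K ~ Binomial(3, 1/6) matches among
    three dice; the gain is -1 if K = 0 and K otherwise."""
    p = Fraction(1, 6)
    dist = {}
    for k in range(4):
        prob = comb(3, k) * p ** k * (1 - p) ** (3 - k)
        dist[-1 if k == 0 else k] = prob
    return dist


dist = chuck_a_luck()
for gain, prob in sorted(dist.items()):
    print(gain, prob * 216, "/ 216", round(float(prob), 4))
print("expected gain", sum(g * p for g, p in dist.items()))  # -17/216
```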
We shall now try to simulate this game using our table of random numbers.
Thus we need some way of generating -1, 1, 2, 3 with the probabilities
given above.
We first pick a random number U between 0 and 1; this could be the
0.60219 described above. Now consider the interval 0 to 1 as in Fig. 6.20.
U will fall in the first segment with probability 0.5787, in the second with
probability 0.3472 and so on. A simple scheme is thus
if $0 \le U \le 0.5787$ choose X = -1
if $0.5787 < U \le 0.9259$ choose X = 1
if $0.9259 < U \le 0.9953$ choose X = 2
if $0.9953 < U \le 1$ choose X = 3
If U values are then chosen we can simulate values of X as below (the last
column is the running total of the gains):

U        X    Total gain
0.6022   1    1
0.1794   -1   0
0.2367   -1   -1

We stop here before it becomes too expensive.
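The interval scheme above is the inverse-transform method for a discrete variable: walk along the cumulative probabilities until the segment containing U is found. In this Python sketch `SEGMENTS` and `gain_from_uniform` are our own names, and the three U values are the ones in the table above.

```python
from itertools import accumulate

# (gain, probability) pairs from the table above, in the order of Fig. 6.20
SEGMENTS = [(-1, 0.5787), (1, 0.3472), (2, 0.0694), (3, 0.0046)]


def gain_from_uniform(u):
    """Return the gain whose segment of [0, 1] contains u."""
    cumulative = 0.0
    for gain, prob in SEGMENTS:
        cumulative += prob
        if u <= cumulative:
            return gain
    # The rounded probabilities sum to 0.9999, so a u in the last sliver
    # (0.9999, 1] falls through; it belongs to the final segment.
    return SEGMENTS[-1][0]


us = [0.6022, 0.1794, 0.2367]
gains = [gain_from_uniform(u) for u in us]
print(gains, list(accumulate(gains)))  # [1, -1, -1] [1, 0, -1]
```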


Exercise 14
A drunk stands at the point (0, 0) on the plane and makes steps of unit
length either horizontally or vertically. Suppose
$$p(\text{step in } \pm y \text{ direction}) = \tfrac12, \qquad p(\text{step in } \pm x \text{ direction}) = \tfrac12$$
and
$$p(\text{step is } +1) = \tfrac12, \qquad p(\text{step is } -1) = \tfrac12$$
from the possibilities open to the drunkard. See Fig. 6.6.
Exercise 15
Generate 5 samples from the Binomial distribution
$$P(X = r) = \binom{4}{r}(0.4)^r(0.6)^{4-r}, \quad r = 0, 1, 2, 3, 4$$

Exercise 16
The number of accidents on a stretch of road per day is known to be a
Poisson variable X with mean $\lambda = 0.15$, i.e.
$$P(X = r) = \frac{(0.15)^r e^{-0.15}}{r!}, \quad r = 0, 1, 2, \ldots$$
By using random numbers simulate 2 successive weeks of accidents. How
close is your average number of accidents to the population mean value 0.15?
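The segment scheme used for chuck-a-luck extends to the Poisson case, where there are infinitely many segments. The sketch below is our own Python illustration: it builds the cumulative probabilities one term at a time using $P(X = r) = P(X = r - 1)\,\lambda/r$, and uses Python's `random` module in place of Table 6.8.

```python
import math
import random


def poisson_from_uniform(u, lam):
    """Inverse-transform sampling: the smallest r with F(r) >= u, where
    F is the Poisson(lam) distribution function."""
    r = 0
    p = math.exp(-lam)       # P(X = 0)
    cumulative = p
    while u > cumulative and p > 0:
        r += 1
        p *= lam / r         # P(X = r) = P(X = r - 1) * lam / r
        cumulative += p
    return r


rng = random.Random(16)  # seeded so the run is repeatable
weeks = [[poisson_from_uniform(rng.random(), 0.15) for _ in range(7)] for _ in range(2)]
days = [x for week in weeks for x in week]
print(weeks, sum(days) / len(days))
```

Over only 14 days the average can be well away from 0.15 (most days show no accident), which is the point of the exercise's final question.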

This technique of simulation gives us a powerful tool for studying problems
which are intractable by other methods. We can evaluate integrals and
invert matrices using these techniques.
Example 26
Suppose we wish to evaluate $I = \int_a^b g(x)\,dx$. First, substituting
x = a + (b - a)y, we write this as
$$I = (b - a)\int_0^1 g(a + (b - a)y)\,dy$$
so we can save time and think just of integrals like $I = \int_0^1 g(x)\,dx$. Now
$$I = \int_0^1 g(x) f(x)\,dx \quad \text{where } f(x) = 1$$
$$= E(g(X))$$
where X has a uniform distribution, f(x) = 1 for $0 \le x \le 1$. We can take
a sample of uniform numbers for X and approximate I by
$$\hat{I} = \frac{1}{n}\sum_{i=1}^{n} g(X_i)$$
where the $X_i$ are uniform random numbers. We know $\int_0^1 e^x\,dx = [e^x]_0^1 = e - 1 =
1.71828$. Simulation gives 1.7194 with n = 10.
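The Monte Carlo recipe of Example 26, including the change of variable to a general interval [a, b], can be sketched in Python as follows. `mc_integral` is our own name, and the pseudo-random generator stands in for the random-number table.

```python
import math
import random


def mc_integral(g, a, b, n, rng):
    """Monte Carlo estimate of the integral of g over [a, b]:
    (b - a) times the average of g(a + (b - a) U) over n uniform U."""
    total = sum(g(a + (b - a) * rng.random()) for _ in range(n))
    return (b - a) * total / n


rng = random.Random(26)
for n in (10, 1000, 100_000):
    print(n, mc_integral(math.exp, 0.0, 1.0, n, rng))
print("exact", math.e - 1)  # 1.718281828459045
```

The error shrinks roughly like $1/\sqrt{n}$, so each extra correct decimal place costs about a hundredfold more samples. This is why the method is most attractive for problems that are intractable by other means.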

10 CONFIDENCE INTERVALS
You should now appreciate that when we take a sample and estimate a
population parameter there must be some error. Statistics has many
techniques for reducing error, finding "best" estimates and testing hypotheses
about parameters. We will not discuss these, as they would require a separate
volume; however, we think that you should encounter one useful idea in
this area, based on the "Central Limit Theorem".
The Central Limit Theorem is of critical theoretical importance in Statistics,
but also has many useful applications. It enables us to deduce the
(approximate) probability distribution of a sum or an average even when the distribution
of the individual components is unknown.
As you can imagine, we often have sums of observations, and so this result
is of crucial importance.
A simple form is as follows.
Suppose we have n independent observations $X_1, \ldots, X_n$, each with mean $\mu$ and variance $\sigma^2$.
Then if $\bar{x} = (\sum X_i)/n$ is the average, we can prove that for large n
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
is approximately normal with mean 0 and variance 1. Notice that the variance of $\bar{x}$ is
$\sigma^2/n$, so its standard deviation is reduced by a factor $\sqrt{n}$ and we
gain precision. A simple example of its use is in the analysis of rounding
errors. Suppose, because of rounding error, the last digit of a number is
"noise" and is equally likely to be any of 0, 1, 2, ..., 9. Suppose X is a
random variable taking these values; then
$$E(X) = 4.5, \qquad \mathrm{var}(X) = 8.25$$
If we add 1000 numbers then we might ask: what is the probability that the
sum of these digits exceeds 4500?
$$p(\text{sum} > 4500) = p(\bar{X} > 4.5) = p(Z > 0) = 0.5$$
from tables of the normal distribution. This is a simple example of a
profound result whose importance is hard to overstate.
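The rounding-error example can be simulated directly. This Python sketch (our own, with an arbitrary seed and 2000 repetitions) first confirms E(X) = 4.5 and var(X) = 8.25 for a digit uniform on 0 to 9. It then sums 1000 such digits many times: the sums should centre on 4500, with a spread near $\sqrt{1000 \times 8.25} \approx 90.8$, and should exceed 4500 about half the time.

```python
import math
import random
import statistics

DIGITS = list(range(10))  # the rounding "noise" digit, equally likely 0..9
print(statistics.fmean(DIGITS), statistics.pvariance(DIGITS))  # 4.5 8.25


def sum_of_noisy_digits(count, rng):
    """Sum of `count` independent digits, each uniform on 0..9."""
    return sum(rng.choices(DIGITS, k=count))


rng = random.Random(27)  # fixed seed for a repeatable run
sums = [sum_of_noisy_digits(1000, rng) for _ in range(2000)]
share_above = sum(s > 4500 for s in sums) / len(sums)
print("mean of sums", statistics.fmean(sums), "(theory 4500)")
print("sd of sums", statistics.stdev(sums), "(theory", math.sqrt(1000 * 8.25), ")")
print("share of sums above 4500", share_above)
```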
Example 27
Suppose we have a Binomial distribution
$$p(T = r) = \binom{n}{r} p^r (1 - p)^{n-r}$$
where n is large. T is the number of successes in n trials. Suppose we take
$X_j$ as a random variable at the jth trial, with
$X_j = 1$ if the jth trial is a success
$X_j = 0$ if the jth trial is a failure
Then
$$P(X_j = 1) = p, \qquad P(X_j = 0) = 1 - p = q$$
and
$$T = \sum_{j=1}^{n} X_j$$
We can now apply the Central Limit Theorem to T and get (after some
work using $\mu = p$, $\sigma^2 = pq$)
$$p(T \le r) \approx p\left(z \le \frac{r - np}{\sqrt{npq}}\right)$$
where z is standard normal. For example, tables give $p(T \le 9) = 0.9662$ when n = 15,
p = 0.4. Using the approximation, with the usual continuity correction of replacing 9 by 9.5,
$$p(T \le 9) \approx p\left(z < \frac{9.5 - 6}{\sqrt{3.6}}\right) = p(z < 1.845) = 0.9675$$
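The comparison in Example 27 can be reproduced without tables. This Python sketch (helper names are ours) computes the exact binomial probability with `math.comb` and the normal approximation with `math.erf`, using the continuity correction r + 1/2.

```python
from math import comb, erf, sqrt


def binomial_cdf(r, n, p):
    """Exact p(T <= r) for T ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r + 1))


def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_approx_cdf(r, n, p):
    """Central Limit Theorem approximation with continuity correction r + 1/2."""
    return normal_cdf((r + 0.5 - n * p) / sqrt(n * p * (1 - p)))


print(round(binomial_cdf(9, 15, 0.4), 4), round(normal_approx_cdf(9, 15, 0.4), 4))  # 0.9662 0.9675
```

Without the correction the approximation gives $\Phi(3/\sqrt{3.6}) \approx 0.943$, noticeably worse for a sample as small as n = 15.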

A further consequence of the central limit theorem is the importance of
the mean and variance in getting some broad ideas about data. Suppose
we have a mean $\bar{X}$ based on a large number n (> 50) of observations with
mean $\mu$ and variance $\sigma^2$.
The central limit theorem tells us that $\bar{X}$ has approximately a normal distribution with
mean $\mu$ and variance $\sigma^2/n$. Notice that $\bar{X}$ has the same mean as the
population value but a smaller variance, $\sigma^2/n$. As the sample size increases
the precision increases, as the variance declines. The accuracy increases as
$\sqrt{n}$.
A connected idea you may encounter is that of a "confidence interval".
This is a widely used and little understood concept, which we can derive as
follows.
Suppose we take a small error $\alpha$ and ask for what values a and b we can
say
$$p(a < \mu < b) = 1 - \alpha$$
Since $\mu$ does not vary (it is just unknown), this is taken to mean: what
interval (a, b) traps $\mu$ with probability $1 - \alpha$? After some algebra we can
find that the interval is given by
$$p\left(\bar{X} - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
where $\Phi(z_{1-\alpha/2}) = 1 - \alpha/2$, $\Phi$ being the standard normal distribution function.
The interval is a $100(1 - \alpha)\%$ confidence interval; explicitly
90% confidence interval: $\bar{X} \pm 1.6449\,\sigma/\sqrt{n}$
95% confidence interval: $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$
99% confidence interval: $\bar{X} \pm 2.5758\,\sigma/\sqrt{n}$
Thus we may conclude that $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ traps the unknown population mean,
$\mu$, 95% of the time.
If n exceeds 50 we can replace $\sigma$ by $s = \sqrt{\sum (X_i - \bar{X})^2/(n - 1)}$ to a reasonable
approximation. Thus if 200 observations give
$$\bar{X} = 50.0175 \quad \text{and} \quad s = 5.1352$$
a 95% confidence interval for $\mu$ would be
$$50.0175 \pm 1.96 \times \frac{5.1352}{\sqrt{200}}$$
i.e. 50.0175 ± 0.7117.


We can assume that the probability that such an interval does not include $\mu$ is
0.05, i.e. we are mistaken on one occasion in 20.
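The interval calculation is a one-liner once $z_{1-\alpha/2}$ is available. This Python sketch (the function name is ours) takes the multiplier from `statistics.NormalDist().inv_cdf` (Python 3.8 or later) rather than from tables, and recomputes the worked example above.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(xbar, s, n, confidence=0.95):
    """Large-sample interval xbar +/- z * s / sqrt(n), where z satisfies
    Phi(z) = 1 - alpha/2 with alpha = 1 - confidence."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


for level in (0.90, 0.95, 0.99):
    print(level, round(NormalDist().inv_cdf(1 - (1 - level) / 2), 4))  # 1.6449, 1.96, 2.5758

lo, hi = mean_confidence_interval(50.0175, 5.1352, 200)
print(round(lo, 4), round(hi, 4), round((hi - lo) / 2, 4))  # 49.3058 50.7292 0.7117
```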
Exercise 17
Find a 90% confidence interval for the mean $\mu$ given the data in Exercise
13.

You will notice that if you require great confidence then, for any sample
size, the interval will be wide, while as the degree of confidence reduces so
does the width of the interval.
Notice that the width of a 95% confidence interval is $2 \times 1.96\,\sigma/\sqrt{n} \approx
4\sigma/\sqrt{n}$. Thus, whatever $\sigma$, increasing the value of n decreases the width of
the interval. In addition, if we have accurate observations (small $\sigma$), then
correspondingly small values of n suffice for a given precision.
There is an interplay between the degree of precision required and the
sample size.

11 STOCHASTIC PROCESSES
With this sketch of the basic concepts we can now turn our attention back
to stochastic processes.
First we consider an example of a simple queue where the probability
structure is known and we can get (with a struggle) some exact results.
Suppose that people arrive at a counter randomly at a rate $\alpha$, i.e.
$$p[\text{one arrival in } (t, t + \delta t)] = \alpha\,\delta t$$
$$p[\text{more than one arrival in } (t, t + \delta t)] = 0$$
At the counter the service time is exponentially distributed (service completions
occur as a Poisson process), such that
$$p[\text{service finishes in } (t, t + \delta t)] = \beta\,\delta t$$
Now at time $t + \delta t$ there are n people in the queue; this could be
(a) because there were n at time t and nobody arrived or departed in
time $\delta t$
(b) because there were n + 1 at time t and someone was served in the
time $\delta t$
(c) because there were n - 1 at time t and someone arrived in the time $\delta t$
(d) because there were n people at time t and there was one departure and one
arrival
(e) because of some other events of small probability.
Then, as
$$p(\text{no arrival}) = 1 - \alpha\,\delta t, \qquad p(\text{no departure}) = 1 - \beta\,\delta t$$
we have, letting $p_n(t) = p[n \text{ persons in the queue at time } t]$,
$$\begin{aligned}
p_n(t + \delta t) &= p[n \text{ people at time } t \text{ and no arrivals or departures in the next } \delta t] \\
&\quad + p[n + 1 \text{ people at time } t \text{ and one departure, no arrivals, in the next } \delta t] \\
&\quad + p[n - 1 \text{ people at time } t \text{ and one arrival, no departures, in the next } \delta t] \\
&\quad + p[n \text{ people at time } t \text{ and one arrival and one departure in the next } \delta t] \\
&= p_n(t)(1 - \alpha\delta t)(1 - \beta\delta t) + p_{n+1}(t)\,\beta\delta t\,(1 - \alpha\delta t) \\
&\quad + p_{n-1}(t)\,\alpha\delta t\,(1 - \beta\delta t) + p_n(t)\,\alpha\delta t\,\beta\delta t
\end{aligned}$$
If we ignore terms in $(\delta t)^2$ we have
$$\frac{p_n(t + \delta t) - p_n(t)}{\delta t} = \beta\{p_{n+1}(t) - p_n(t)\} + \alpha\{p_{n-1}(t) - p_n(t)\}$$
and, letting $\delta t \to 0$,
$$\frac{dp_n(t)}{dt} = \beta\{p_{n+1}(t) - p_n(t)\} + \alpha\{p_{n-1}(t) - p_n(t)\}$$
Solving this system of equations enables us to say quite a lot about queues
with the assumptions above.
If we assume the queue has reached a steady state then $dp_n(t)/dt = 0$ and
we can find (after some manipulation) that
$$p_n = (1 - \rho)\rho^n \quad \text{where } \rho = \alpha/\beta < 1$$
and we can find the expected number of customers in the queue. Also, if
T is the time a customer spends in the queue,
$$p(T > t) = \rho\,e^{-\beta(1 - \rho)t}$$
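A quick numerical sanity check of the steady-state solution is sketched below in Python, using arbitrary illustrative rates $\alpha = 2$ and $\beta = 3$. It confirms that $p_n = (1-\rho)\rho^n$ makes the right-hand side of the differential equation vanish for $n \ge 1$, and that the mean of this distribution is $\rho/(1-\rho)$. The boundary equation for an empty queue, $0 = \beta p_1 - \alpha p_0$, is an assumption we add (no departures can occur when nobody is waiting); the text does not write it out.

```python
def steady_state(alpha, beta, n_max=200):
    """p_n = (1 - rho) rho^n for n = 0..n_max, with rho = alpha / beta < 1."""
    rho = alpha / beta
    if rho >= 1:
        raise ValueError("no steady state unless alpha < beta")
    return [(1 - rho) * rho ** n for n in range(n_max + 1)]


def balance_residuals(p, alpha, beta):
    """Right-hand side beta(p_{n+1} - p_n) + alpha(p_{n-1} - p_n) for n >= 1."""
    return [beta * (p[n + 1] - p[n]) + alpha * (p[n - 1] - p[n])
            for n in range(1, len(p) - 1)]


alpha, beta = 2.0, 3.0          # illustrative rates, rho = 2/3
p = steady_state(alpha, beta)
residuals = balance_residuals(p, alpha, beta)
boundary = beta * p[1] - alpha * p[0]   # assumed n = 0 equation
mean_n = sum(n * pn for n, pn in enumerate(p))
rho = alpha / beta
print(max(abs(x) for x in residuals), boundary, mean_n, rho / (1 - rho))
```

Truncating at n_max = 200 is harmless here because $(2/3)^{200}$ is negligibly small.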

From our "simple" model we obtain some very useful results which
clearly have wide applications. Notice however that we are talking about
a model with known parameters. For an individual queue the numbers and
arrival times cannot be predicted.
Usually the boot is on the opposite foot. We know the actual values
observed but do not know the values of the stochastic processes. This is a
very much more difficult problem which we start to approach in later
chapters.
Chapter 7

FOURIER ANALYSIS

1 INTRODUCTION

It was an inspired observation of Fourier that nearly all functions could be
thought of as sums of sine and cosine waves. In this chapter we intend to
describe the mathematical background needed to understand the methods
which are used to carry out this analysis. In the first section we consider a
nicely behaved situation where the function we are analysing is periodic.
It then turns out that when we analyse the frequencies involved they are
only nice simple ones occurring at discrete intervals. The spectrum and
phase angles are easy to see in this case.
We then have to consider the situation when the function is not so well
behaved and then we use the Fourier Transform. In this case the so-called
spectrum is a continuous function and the interpretation is not so clear. In
this chapter we discuss the mathematical ideas but in Chapter 8 we will
apply these techniques to the problem of time series.
Before we begin the mathematics seriously can we remind you that all
angles are measured in radians.

2 FOURIER SERIES

A function f is said to have period 2l if f(x + 2ml) = f(x) for all integers m,
i.e. f(x + 2l) = f(x) and f(x - 2l) = f(x). The two obvious such functions are
cos x and sin x, both of which have period $2\pi$. Notice that $\cos(\pi x/l)$ has
period 2l.
In practice many functions are not periodic, but we are concerned with
functions defined in a particular interval $-l < x \le l$, for some l. We then try
to find an analysis of this function over the interval as a sum of sine and
cosine functions. The secret of the technique is to use the relations for
integrating sine and cosine functions referred to in the chapter on
integration (Chapter 3).
In Chapter 3, Section 2, we proved relations such as
$$\int_0^{2\pi} \sin nx \cos mx\,dx = 0 \quad \text{for all natural numbers } n \text{ and } m$$

These relations are called orthogonality relations. A family of functions
$f_n(x)$, n = 0, 1, ..., is called orthogonal over some interval (a, b) if
$$\int_a^b f_n(x) f_m(x)\,dx = 0 \quad \text{whenever } n \ne m$$

There are many orthogonal families and some, e.g. Walsh functions, have
been used in image processing, but in Fourier analysis the trigonometric
or related exponential ones are the usual families to consider. It is not
difficult to extend our original relation to
$$\int_{-l}^{l} \sin\frac{n\pi x}{l}\cos\frac{m\pi x}{l}\,dx = 0$$
This is done by using the same expressions as in Section 2 of Chapter 3.
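If we assume that our function f(x) is written as a sum of sines and cosines,
$$f(x) = \frac{a_0}{2} + \sum_{r=1}^{\infty}\left(a_r\cos\frac{r\pi x}{l} + b_r\sin\frac{r\pi x}{l}\right)$$
then by integrating as follows
$$\int_{-l}^{l} f(x)\cos\frac{m\pi x}{l}\,dx = \frac{a_0}{2}\int_{-l}^{l}\cos\frac{m\pi x}{l}\,dx + \sum_{r=1}^{\infty}\left\{a_r\int_{-l}^{l}\cos\frac{r\pi x}{l}\cos\frac{m\pi x}{l}\,dx + b_r\int_{-l}^{l}\sin\frac{r\pi x}{l}\cos\frac{m\pi x}{l}\,dx\right\}$$
so, since the last integrals vanish by the orthogonality relation above,
$$\int_{-l}^{l} f(x)\cos\frac{m\pi x}{l}\,dx = \frac{a_0}{2}\int_{-l}^{l}\cos\frac{m\pi x}{l}\,dx + \sum_{r=1}^{\infty} a_r\int_{-l}^{l}\cos\frac{r\pi x}{l}\cos\frac{m\pi x}{l}\,dx \qquad (*)$$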
We now need to consider the integrals
$$\int_{-l}^{l}\cos\frac{r\pi x}{l}\cos\frac{m\pi x}{l}\,dx$$
Using the relations in Chapter 1,
$$\cos(A + B) = \cos A\cos B - \sin A\sin B$$
$$\cos(A - B) = \cos A\cos B + \sin A\sin B$$
we get
$$\cos\frac{r\pi x}{l}\cos\frac{m\pi x}{l} = \frac12\left(\cos\frac{(r + m)\pi x}{l} + \cos\frac{(r - m)\pi x}{l}\right)$$
So if
$$I = \int_{-l}^{l}\cos\frac{r\pi x}{l}\cos\frac{m\pi x}{l}\,dx$$

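then
$$I = \frac12\int_{-l}^{l}\cos\frac{(r + m)\pi x}{l}\,dx + \frac12\int_{-l}^{l}\cos\frac{(r - m)\pi x}{l}\,dx$$
and if $r \ne m$,
$$I = \frac12\left[\sin\frac{(r + m)\pi x}{l}\cdot\frac{l}{(r + m)\pi}\right]_{-l}^{l} + \frac12\left[\sin\frac{(r - m)\pi x}{l}\cdot\frac{l}{(r - m)\pi}\right]_{-l}^{l}$$
Thus I = 0, since $\sin k\pi = 0$ for every integer k.
If $r = m \ne 0$ then $\cos((r - m)\pi x/l) = 1$ and
$$I = \frac12\left[\sin\frac{(r + m)\pi x}{l}\cdot\frac{l}{(r + m)\pi}\right]_{-l}^{l} + \frac12\left[x\right]_{-l}^{l} = l$$
If r = m = 0 then r + m = 0 and r - m = 0, and so I = l + l = 2l.
Now returning to our earlier equation (*) we see that
$$\int_{-l}^{l} f(x)\cos\frac{m\pi x}{l}\,dx = l\,a_m$$
(Note: the reason why the constant term was written as $a_0/2$ should now be apparent.)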
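These integrals can be checked numerically. The Python sketch below is our own: it uses composite Simpson's rule and an arbitrary half-period l = 2.5, and `trig` and `inner` are illustrative helpers. The integral over [-l, l] of a product of two of the basis functions should come out as approximately 0 for distinct functions, l when r = m ≠ 0, and 2l when r = m = 0.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


L_HALF = 2.5  # an arbitrary half-period l chosen for the check


def trig(kind, k):
    """cos(k pi x / l) or sin(k pi x / l) as a function of x."""
    fn = math.cos if kind == "cos" else math.sin
    return lambda x: fn(k * math.pi * x / L_HALF)


def inner(f, g):
    """Integral of f(x) g(x) over [-l, l]."""
    return simpson(lambda x: f(x) * g(x), -L_HALF, L_HALF)


print(inner(trig("sin", 3), trig("cos", 2)))  # about 0: sin/cos orthogonality
print(inner(trig("cos", 3), trig("cos", 2)))  # about 0: r != m
print(inner(trig("cos", 3), trig("cos", 3)))  # about l = 2.5: r = m != 0
print(inner(trig("cos", 0), trig("cos", 0)))  # 2l = 5.0: r = m = 0
```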
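If we now consider a similar analysis for
$$\int_{-l}^{l} f(x)\sin\frac{m\pi x}{l}\,dx = \frac{a_0}{2}\int_{-l}^{l}\sin\frac{m\pi x}{l}\,dx + \sum_{r=1}^{\infty}\left\{a_r\int_{-l}^{l}\cos\frac{r\pi x}{l}\sin\frac{m\pi x}{l}\,dx + b_r\int_{-l}^{l}\sin\frac{r\pi x}{l}\sin\frac{m\pi x}{l}\,dx\right\} \qquad (7.2.3)$$
we get
$$\int_{-l}^{l} f(x)\sin\frac{m\pi x}{l}\,dx = l\,b_m$$
To summarize,
$$a_r = \frac{1}{l}\int_{-l}^{l} f(x)\cos\frac{r\pi x}{l}\,dx, \quad r = 0, 1, 2, \ldots$$
and
$$b_r = \frac{1}{l}\int_{-l}^{l} f(x)\sin\frac{r\pi x}{l}\,dx, \quad r = 1, 2, \ldots$$
and
$$f(x) = \frac{a_0}{2} + \sum_{r=1}^{\infty}\left(a_r\cos\frac{\pi r x}{l} + b_r\sin\frac{\pi r x}{l}\right)$$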
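The summary formulas can be evaluated numerically. The Python sketch below is our own, not the authors' program. It uses composite Simpson's rule, the method the text mentions later in this section, and `fourier_coefficients` is an illustrative helper name. Applied to $f(x) = x^2$ on $(-1, 1]$, the function of Example 1 in the next section, it should reproduce $a_0 = 2/3$, $a_r = 4(-1)^r/(\pi^2 r^2)$ and $b_r = 0$.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def fourier_coefficients(f, l, r_max):
    """a_r = (1/l) int f cos(r pi x / l), b_r = (1/l) int f sin(r pi x / l) over [-l, l]."""
    a = [simpson(lambda x: f(x) * math.cos(r * math.pi * x / l), -l, l) / l
         for r in range(r_max + 1)]
    b = [0.0] + [simpson(lambda x: f(x) * math.sin(r * math.pi * x / l), -l, l) / l
                 for r in range(1, r_max + 1)]
    return a, b


# f(x) = x^2 on (-1, 1]; theory gives a_r = 4(-1)^r / (pi^2 r^2)
a, b = fourier_coefficients(lambda x: x * x, 1.0, 6)
for r in range(1, 7):
    print(r, round(a[r], 5), round(4 * (-1) ** r / (math.pi * r) ** 2, 5), round(b[r], 5))
```

For an odd or even f, half of the coefficients are zero, and the numerical values come out as zero to rounding error, which gives a useful check on any hand calculation.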

One important mathematical comment. In this analysis, at an early stage,
an expression $\int_{-l}^{l}\left(\sum_{r=1}^{\infty}\cdots\right)$ was written as $\sum_{r=1}^{\infty}\left(\int_{-l}^{l}\cdots\right)$ without comment.
There is need of some serious discussion to justify this step. However, as is
true in real examples, the functions f we deal with are normally
well-behaved, and we will interchange integrals and summations without
any further comment.
What this mathematical analysis has done is to show that the coefficients
$a_r$ and $b_r$ can be calculated in terms of certain, hopefully easily carried
out, integrations.
Before doing some examples it is useful to make some comments that
make the calculation easier for special types of functions. Recall from
Chapter 1 that a function f(x) is said to be even if f(x) = f(-x) and odd
if f(x) = -f(-x). If f is an even function then $b_r = 0$ for all r, i.e. f can be
written as a sum of cosine functions. If f is odd then f can be written as a
sum of sine functions, i.e. $a_r = 0$ for all r. This is a simple (but not trivial)
exercise in integration.

3 SOME EXAMPLES OF FOURIER ANALYSIS

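Example 1
Consider l = 1, and $f(x) = x^2$ for −l < x ≤ l. Then $b_r = 0$ for all r as $x^2$ is an even function. We have

$$a_0 = \int_{-1}^{1} x^2\,dx = \left[\frac{x^3}{3}\right]_{-1}^{1} = \frac{1}{3} + \frac{1}{3} = \frac{2}{3}$$

while

$$a_r = \int_{-1}^{1} x^2\cos(r x\pi)\,dx = \frac{4(-1)^r}{\pi^2 r^2}$$

(after some calculations).

Thus

$$x^2 = \frac{1}{3} + \frac{4}{\pi^2}\sum_{r=1}^{\infty}\frac{(-1)^r}{r^2}\cos(r\pi x)$$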

Sometimes one can obtain curious formulae by looking at special values in the expression. For example if we put x = 0 we obtain

$$0 = \frac{1}{3} + \frac{4}{\pi^2}\sum_{r=1}^{\infty}\frac{(-1)^r}{r^2}$$

Thus

$$\frac{\pi^2}{12} = 1 - \frac{1}{4} + \frac{1}{9} - \frac{1}{16} + \cdots$$
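As a quick numerical illustration of the series for $x^2$ and of the special value at x = 0, the sketch below sums a few thousand terms in Python. `x_squared_series` is a hypothetical helper name, and the number of terms is an arbitrary choice.

```python
import math


def x_squared_series(x, terms):
    """Partial sum of 1/3 + (4/pi^2) * sum_{r=1}^{terms} (-1)^r cos(r pi x) / r^2  (l = 1)."""
    total = 1.0 / 3.0
    for r in range(1, terms + 1):
        total += 4.0 / math.pi**2 * (-1) ** r * math.cos(r * math.pi * x) / r**2
    return total


for x in (0.0, 0.3, 0.8):
    print(x, x * x, x_squared_series(x, 2000))

# Putting x = 0 gives the alternating series 1 - 1/4 + 1/9 - 1/16 + ... = pi^2/12
alternating = sum((-1) ** (r + 1) / r**2 for r in range(1, 10001))
print(alternating, math.pi**2 / 12)
```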

Fig. 7.1

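Example 2
Let us consider a wave given by

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < l \\ -1 & \text{if } -l < x < 0 \end{cases}$$

illustrated in Fig. 7.1. This is an odd function, so we know that f(x) has an expression involving only sine waves. So $f(x) = \sum_{r=1}^{\infty} b_r\sin(\pi r x/l)$, and

$$b_r = \frac{1}{l}\int_{-l}^{l} f(x)\sin\frac{\pi r x}{l}\,dx = \frac{1}{l}\int_{-l}^{0} -\sin\frac{\pi r x}{l}\,dx + \frac{1}{l}\int_{0}^{l}\sin\frac{\pi r x}{l}\,dx$$

$$= \frac{1}{l}\left[\cos\frac{\pi r x}{l}\cdot\frac{l}{\pi r}\right]_{-l}^{0} + \frac{1}{l}\left[-\cos\frac{\pi r x}{l}\cdot\frac{l}{\pi r}\right]_{0}^{l} = \frac{1}{l}\left(\frac{l}{\pi r} - \frac{(-1)^r l}{\pi r}\right) + \frac{1}{l}\left(-\frac{(-1)^r l}{\pi r} + \frac{l}{\pi r}\right)$$

[Note: $\cos\pi r = (-1)^r$.]
So $b_r = (2/\pi r)(1 - (-1)^r)$. So $b_r = 0$ if r is even. Thus

$$f(x) = \frac{4}{\pi}\sum_{m=0}^{\infty}\frac{1}{2m+1}\sin\frac{\pi x(2m+1)}{l}$$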
There is an interesting point to observe about this function. f(x) is clearly discontinuous at x = 0: the function jumps from −1 to +1. If we evaluate the Fourier series $\frac{4}{\pi}\sum_{m=0}^{\infty}\sin\left(\pi x(2m+1)/l\right)/(2m+1)$ at x = 0 we get 0, since sin 0 = 0. This is the average of the values −1 and +1 on either side of the jump. So

$$f(0) \neq \frac{4}{\pi}\sum_{m=0}^{\infty}\frac{\sin 0}{2m+1}$$

[Also note that if we consider x = l/2, we get $1 = \frac{4}{\pi}\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots\right)$, yet another formula for π.] It is possible to use numerical methods to calculate the Fourier coefficients $a_r$ and $b_r$. We have done this on a very simple home computer using the numerical technique Simpson's rule, which was referred to in Chapter 3. For the two examples just calculated we obtained the following results (to 3 decimal places):

                Calculated      Theoretical
Example 1
  a1            -0.406          -0.406
  a2            +0.101           0.101
  a3            -0.046          -0.046
  a4            +0.025           0.025
  a5            -0.017          -0.017
  a6            +0.011          +0.011
Example 2
  b1             1.273           1.273
  b2             0               0
  b3             0.424           0.424
  b4            -1 x 10^-3       0
  b5             0.256           0.255
  b6            -1 x 10^-3       0

This however was using mathematically precise data and was fairly slow.
It is interesting but not in general a practical technique.
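The text mentions computing the coefficients with Simpson's rule. Below is a minimal Python sketch of that idea, not the authors' original program. It assumes composite Simpson's rule with 1000 subintervals on each piece, and it splits the square wave's integral at the jump so that each piece is smooth. The helper names are our own.

```python
import math


def simpson(func, lo, hi, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3


def fourier_coefficient(f, l, r, kind, pieces):
    """a_r (kind='a') or b_r (kind='b') = (1/l) * integral of f(x) cos or sin(r pi x / l).

    `pieces` lists subintervals covering [-l, l]; splitting at a jump keeps
    Simpson's rule accurate for piecewise-smooth f.
    """
    trig = math.cos if kind == "a" else math.sin
    integrand = lambda x: f(x) * trig(r * math.pi * x / l)
    return sum(simpson(integrand, lo, hi) for lo, hi in pieces) / l


def square(x):
    """The wave of Example 2 with l = 1 (value at x = 0 only meets sin 0 = 0)."""
    return 1.0 if x > 0 else -1.0


# Example 1: f(x) = x^2 on (-1, 1]
for r in range(1, 7):
    a_r = fourier_coefficient(lambda x: x * x, 1.0, r, "a", [(-1.0, 1.0)])
    print(f"a{r}: {a_r:+.3f}  formula {4 * (-1) ** r / (math.pi * r) ** 2:+.3f}")

# Example 2: square wave, integral split at the jump x = 0
for r in range(1, 7):
    b_r = fourier_coefficient(square, 1.0, r, "b", [(-1.0, 0.0), (0.0, 1.0)])
    print(f"b{r}: {b_r:+.3f}  formula {2 / (math.pi * r) * (1 - (-1) ** r):+.3f}")
```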
Sometimes we evaluate a function via its Fourier series, truncating the series after N terms. The truncated Fourier series is clearly not going to match f(x) exactly. In fact, near a discontinuity the approximating series always overshoots the mark, by about 9% of the size of the jump. This effect is known as the Gibbs phenomenon and is illustrated in Fig. 7.2. The figure shows the approximating curve to the wave in Example 2 on a fairly large scale. Locally the errors can be quite large, despite what the diagram suggests.
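The overshoot can be seen numerically. The Python sketch below evaluates the truncated series of Example 2 (with l = 1) on a fine grid just to the right of the jump; the helper names are hypothetical. As the number of terms grows, the peak settles near 1.18. That is an overshoot of roughly 9% of the jump of 2, while the peak itself moves closer to x = 0.

```python
import math


def square_wave_partial_sum(x, terms, l=1.0):
    """(4/pi) * sum_{m=0}^{terms-1} sin((2m+1) pi x / l) / (2m+1): Example 2 truncated."""
    return 4.0 / math.pi * sum(
        math.sin((2 * m + 1) * math.pi * x / l) / (2 * m + 1) for m in range(terms)
    )


def peak_near_jump(terms, samples=2000, l=1.0):
    """Largest value of the partial sum on a grid over (0, 2l/terms], just right of x = 0."""
    width = 2.0 * l / terms  # the first maximum lies near x = l / (2 * terms)
    return max(
        square_wave_partial_sum(k * width / samples, terms, l) for k in range(1, samples + 1)
    )


for terms in (10, 50, 200):
    peak = peak_near_jump(terms)
    print(terms, round(peak, 4), "overshoot as a fraction of the jump:", round((peak - 1) / 2, 4))
```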

4 THE PHASE, AMPLITUDE AND EXPONENTIAL FORMULATION

Assume that we have a function expressed as

$$f(x) = \frac{a_0}{2} + \sum_{r=1}^{\infty}(a_r\cos rx + b_r\sin rx)$$

Writing

$$a_r\cos rx + b_r\sin rx = \sqrt{a_r^2 + b_r^2}\left(\frac{a_r}{\sqrt{a_r^2 + b_r^2}}\cos rx + \frac{b_r}{\sqrt{a_r^2 + b_r^2}}\sin rx\right)$$

we can see that

$$a_r\cos rx + b_r\sin rx = A_r\cos(rx + \phi_r),$$

where $A_r = \sqrt{a_r^2 + b_r^2}$, $\cos\phi_r = a_r/A_r$ and $\sin\phi_r = -b_r/A_r$, using formulae from Chapter 1. Then $f(x) = a_0/2 + \sum_{r=1}^{\infty} A_r\cos(rx + \phi_r)$. $A_r$ is called the amplitude and $\phi_r$ the phase angle at r. This has been analysed as if l = π, to simplify the notation. The spectrum of f is the sequence $A_1, A_2, \ldots$. Figure 7.3 shows the spectrum of Example 1.
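Converting a pair $(a_r, b_r)$ into amplitude and phase takes one line each with `math.hypot` and `math.atan2`. The sketch below assumes the convention $a_r\cos rx + b_r\sin rx = A_r\cos(rx + \phi_r)$ used above, so the phase is taken as atan2(−b_r, a_r). The sample coefficients are hypothetical.

```python
import math


def amplitude_phase(a_r, b_r):
    """Return (A_r, phi_r) such that a_r cos(rx) + b_r sin(rx) = A_r cos(rx + phi_r)."""
    amplitude = math.hypot(a_r, b_r)      # sqrt(a_r^2 + b_r^2)
    phase = math.atan2(-b_r, a_r)         # cos(phi) = a_r/A_r, sin(phi) = -b_r/A_r
    return amplitude, phase


a, b, r = 0.6, -0.8, 3                    # hypothetical coefficients
A, phi = amplitude_phase(a, b)
for x in (0.0, 0.4, 1.7):
    print(a * math.cos(r * x) + b * math.sin(r * x), A * math.cos(r * x + phi))

# Spectrum of Example 1: b_r = 0, so A_r = |a_r| = 4 / (pi^2 r^2);
# a negative a_r appears as a phase of +pi or -pi.
spectrum = [amplitude_phase(4 * (-1) ** r / (math.pi * r) ** 2, 0.0)[0] for r in range(1, 6)]
print(spectrum)
```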

Fig. 7.3 [spectrum of Example 1: the amplitudes $A_r$]

In many applications the spectrum is the significant aspect of the analysis. Suppose we analyse the signal from several musical instruments, each playing the same note, i.e. with the same dominant frequency. Each instrument still has its own distinctive timbre, and this is caused by the differences in the spectrum.
An alternative analysis can be based on using the relation $e^{iy} = \cos y + i\sin y$ of Chapter 4. We attempt to write our function f(x) as

$$f(x) = \sum_{r=-\infty}^{\infty} c_r e^{irx}$$

where we assume that we are considering the interval −π to π. The reason for the doubly infinite sum will appear if we do a little mathematics. We evaluate

$$\int_{-\pi}^{\pi} e^{irx}e^{imx}\,dx$$

and see that this integral is zero unless r + m = 0, in which case it has value 2π. So

$$c_r = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-irx}\,dx$$

In many ways this is a more compact form, and it allows us to consider negative frequencies. Using the formulae given earlier it is possible to write the $c_r$'s in terms of the $a_r$'s and $b_r$'s:

$$c_r + c_{-r} = a_r \quad\text{and}\quad c_r - c_{-r} = -ib_r$$

Also, for real f, $c_{-r} = \overline{c_r}$ (the complex conjugate).
If instead of writing f(x) we had originally written f(t) and thought of f as a function of time, evaluating the Fourier coefficients $c_r$ leads to the ideas of frequency domain analysis.
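Example 3
Suppose f(x) = cos λx for −π < x < π and λ is not an integer. Then

$$c_r = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-irx}\cos\lambda x\,dx$$

but from Chapter 4

$$\cos\lambda x = \frac{e^{i\lambda x} + e^{-i\lambda x}}{2}$$

and so

$$c_r = \frac{1}{4\pi}\int_{-\pi}^{\pi}\left\{e^{i(\lambda - r)x} + e^{-i(\lambda + r)x}\right\}dx = \frac{1}{4\pi i(\lambda - r)}\Big[e^{i(\lambda - r)x}\Big]_{-\pi}^{\pi} - \frac{1}{4\pi i(\lambda + r)}\Big[e^{-i(\lambda + r)x}\Big]_{-\pi}^{\pi}$$

Tidying this up and using, again from Chapter 4,

$$\sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2i}$$

we eventually obtain

$$c_r = \frac{(-1)^r\lambda\sin\lambda\pi}{\pi(\lambda^2 - r^2)}, \qquad r = 0, \pm 1, \pm 2, \ldots$$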
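The formula for $c_r$ in Example 3 can be checked against direct numerical integration. This is a minimal Python sketch, assuming composite Simpson's rule applied to the complex-valued integrand. The helper names and the test values λ = 0.5 and 1.7 are our own.

```python
import cmath
import math


def simpson(func, lo, hi, n=2000):
    """Composite Simpson's rule (n even); works for complex-valued func."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3


def c_numeric(r, lam):
    """c_r = (1/2pi) * integral_{-pi}^{pi} exp(-i r x) cos(lam x) dx, by quadrature."""
    integrand = lambda x: cmath.exp(-1j * r * x) * math.cos(lam * x)
    return simpson(integrand, -math.pi, math.pi) / (2 * math.pi)


def c_formula(r, lam):
    """Closed form (-1)^r lam sin(lam pi) / (pi (lam^2 - r^2)); lam must not be an integer."""
    return (-1) ** r * lam * math.sin(lam * math.pi) / (math.pi * (lam**2 - r**2))


for lam in (0.5, 1.7):
    for r in range(-3, 4):
        print(lam, r, c_numeric(r, lam).real, c_formula(r, lam))
```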

Notice, as before, that if we have f(x) defined in an interval we change the argument of the exponential. Thus if we have f(x) for −l < x < l then the Fourier series is

$$f(x) = \sum_{r=-\infty}^{\infty} c_r e^{i\pi r x/l}$$

with

$$c_r = \frac{1}{2l}\int_{-l}^{l} f(x)e^{-i\pi r x/l}\,dx.$$

Exercise (lengthy)
Suppose that f(x) is defined for 0 < x < l. If

$$f(x) = \sum_{r=-\infty}^{\infty} c_r e^{i2\pi r x/l}$$

show that

$$c_r = \frac{1}{l}\int_{0}^{l} f(x)e^{-i2\pi r x/l}\,dx$$

Note $\int_{-\pi}^{\pi} e^{irx}e^{imx}\,dx = 0$ unless m + r = 0.

Complex Fourier series are easier to manipulate than real ones and are
often preferred for this reason. To show how they can give useful results
we introduce the convolution theorem.
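Suppose we have two functions f(x) and g(x), both defined for −l < x < l, and

$$f(x) = \sum_{r=-\infty}^{\infty} c_r e^{i\pi r x/l}, \qquad g(x) = \sum_{r=-\infty}^{\infty} d_r e^{i\pi r x/l}$$

Then

$$h(t) = \frac{1}{2l}\int_{-l}^{l} f(x)g(t - x)\,dx$$

(with g extended periodically where t − x lies outside (−l, l)) is called the convolution of f and g, written f * g. It is not difficult to show that

$$h(t) = \sum_{r=-\infty}^{\infty} c_r d_r e^{i\pi t r/l}$$

and hence if h(t) has a Fourier series, say,

$$h(t) = \sum_{r=-\infty}^{\infty} a_r e^{i\pi r t/l}$$

then

$$a_r = c_r d_r, \qquad r = 0, \pm 1, \pm 2, \ldots$$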
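This latter result, that

$$h(t) = \sum_{r=-\infty}^{\infty} c_r d_r e^{i\pi t r/l},$$

is called Parseval's theorem. In the most common form, for real f, we choose g(t − x) to be f(x − t), so that $d_r = c_{-r} = \overline{c_r}$, and hence obtain

$$h(0) = \frac{1}{2l}\int_{-l}^{l} f(x)^2\,dx = \sum_{r=-\infty}^{\infty}|c_r|^2$$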

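Example 4
Suppose f(x) = cos λx as in Example 3; then

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\cos^2\lambda x\,dx = \sum_{r=-\infty}^{\infty}\frac{\lambda^2\sin^2\lambda\pi}{\pi^2(\lambda^2 - r^2)^2}$$

so for λ = 0.5

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\cos^2\frac{x}{2}\,dx = \sum_{r=-\infty}^{\infty}\frac{4}{\pi^2(1 - 4r^2)^2}$$

Recall we insist λ is not an integer.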
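Parseval's result in Example 4 can be checked by summing the series. In the sketch below the left-hand side is evaluated in the closed form 1/2 + sin(2λπ)/(4λπ), a routine integration we supply that is not stated in the text. The doubly infinite sum is truncated at |r| ≤ 2000, and the helper names are hypothetical.

```python
import math


def parseval_rhs(lam, terms=2000):
    """Sum over r = -terms..terms of |c_r|^2, with c_r from Example 3."""
    coeff = lam * math.sin(lam * math.pi) / math.pi
    return sum((coeff / (lam**2 - r**2)) ** 2 for r in range(-terms, terms + 1))


def parseval_lhs(lam):
    """(1/2pi) * integral_{-pi}^{pi} cos^2(lam x) dx, evaluated in closed form."""
    return 0.5 + math.sin(2 * lam * math.pi) / (4 * lam * math.pi)


for lam in (0.5, 1.3, 2.75):
    print(lam, parseval_lhs(lam), parseval_rhs(lam))
# For lam = 0.5 each term is 4 / (pi^2 (1 - 4 r^2)^2), and both sides equal 1/2.
```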


Exercises
2. Find Fourier series representing
(i) f(x) = x, −π < x < π
(ii) $f(x) = \begin{cases}\cos x & 0 < x < \pi\\ -\cos x & -\pi < x < 0\end{cases}$
3. Express the series obtained in Example 1 in complex form. Use Parseval's result to obtain an expression for $\int_{-1}^{1} x^4\,dx$.

5 FOURIER TRANSFORM

This is an integral version of equation 7.4.1 and enables us to cope with aperiodic functions. Given f(x) we define F(t) by

$$F(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-ixt}f(x)\,dx$$

Note the definition varies slightly from one author to another. This is often referred to as transforming from the time domain to the frequency domain, or vice versa. Mathematically the two theories are similar and there is no reason to prefer one to the other. The advantage is that we can work with either F(t) or f(x), whichever is more convenient.
The important point is that this process is reversible; if F(t) is the Fourier transform of f(x) then

$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{ixt}F(t)\,dt$$

This is referred to as the inverse Fourier transform. If we denote the Fourier transform by $\mathcal{F}(f(x))$, we denote the inverse Fourier transform by $\mathcal{F}^{-1}(F(t))$. Then $\mathcal{F}^{-1}\mathcal{F}(f) = f$.
We can think of $\mathcal{F}$ as a black box which sends a function of x to a function of t, and then $\mathcal{F}^{-1}$ is merely the inverse function, in the sense defined in Chapter 1. There is a slight complication because $\mathcal{F}$ is a function of functions, which is notationally confusing, but the theory as set up in Chapter 1 still applies.
As we mentioned above there are many slight variations in the definition
of the Fourier transform, each with a slightly different multiplier. The

definition above is used in mathematical physics and has the virtue of being symmetric. We shall choose a slight modification which does not have a multiplier; in this we follow E. Robinson.
Given f(x) we define the Fourier transform F(t) as

$$F(t) = \int_{-\infty}^{\infty} f(x)e^{-2\pi i t x}\,dx$$

The inverse transform is given by

$$f(x) = \int_{-\infty}^{\infty} F(t)e^{2\pi i t x}\,dt$$

and f(x) and F(t) are transform pairs, i.e.

$$\mathcal{F}(f) = F \quad\text{and}\quad \mathcal{F}^{-1}(F) = f$$
There is a good deal of theory about Fourier transforms which, while interesting, is rather peripheral to our main interests. As for Fourier series we can have a convolution of two functions f * g; in this case, however,

$$h(t) = f * g(t) = \int_{-\infty}^{\infty} f(x)g(t - x)\,dx$$

It is not too difficult to show that the Fourier transform H(t) of h(t) is given by

$$H(t) = F(t)G(t)$$

or, with the symmetric definition given first,

$$\mathcal{F}(h) = \mathcal{F}(f * g) = \sqrt{2\pi}\,\mathcal{F}(f)\mathcal{F}(g).$$

Often we choose to work with the Fourier transforms of functions because we can write the convolution as a product, and products are easier to handle.
It is both interesting and important to see if we can find a function f
such that f *g = g for any function g.
Consider the function in Fig. 7.4

$$f_d(x) = \begin{cases}\dfrac{1}{d} & \text{if } -\dfrac{d}{2} \le x \le \dfrac{d}{2}\\[4pt] 0 & \text{otherwise}\end{cases}$$

If $G_d = f_d * g$ we see that

$$G_d(x) = \int_{-\infty}^{\infty} f_d(y)g(x - y)\,dy = \int_{-d/2}^{d/2}\frac{1}{d}\,g(x - y)\,dy$$

If we substitute z = x − y we get

$$G_d(x) = \frac{1}{d}\int_{x-d/2}^{x+d/2} g(z)\,dz$$

This is the average value of g over the interval x − d/2 ≤ z ≤ x + d/2.
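The averaging property of $f_d$ can be illustrated numerically. In this Python sketch g = sin is a hypothetical test function and the integral is approximated by the midpoint rule. For this g the average is exactly sin x · sin(d/2)/(d/2), which tends to sin x as d → 0.

```python
import math


def box_average(g, x, d, n=1000):
    """G_d(x) = (1/d) * integral of g over [x - d/2, x + d/2], by the midpoint rule."""
    h = d / n
    return sum(g(x - d / 2 + (k + 0.5) * h) for k in range(n)) * h / d


x = 1.0
for d in (1.0, 0.1, 0.01, 0.001):
    exact = math.sin(x) * math.sin(d / 2) / (d / 2)   # closed form for g = sin
    print(d, box_average(math.sin, x, d), exact, math.sin(x))
```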

Fig. 7.4 [the pulse $f_d(x)$]
Fig. 7.5 [square wave f(x), with marks at −a, 0, a on the x axis]
Fig. 7.6 [its transform F(t)]

Clearly as d gets smaller, $G_d(x)$ gets closer to g(x). So we define δ(x) to be the limit of $f_d(x)$ as d approaches 0.
This function, called the (Dirac) delta function, is the function we are searching for. Strictly speaking we should call it a generalised function, because it has rather peculiar properties:
(1) δ(x) = 0 if x ≠ 0.
(2) δ(0) is infinite (sometimes authors say δ(0) = 1).
(3) $\int_{-\infty}^{\infty}\delta(x)\,dx = 1$.
We can now see that

$$\int_{-\infty}^{\infty}\delta(y)g(x - y)\,dy = g(x)$$

So, putting x = 0,

$$\int_{-\infty}^{\infty}\delta(y)g(-y)\,dy = g(0)$$

Thus δ * g = g.

The Fourier transform of the constant function 1 is δ, and the Fourier transform of δ(y) is 1.
Example 5
Consider the square wave in Fig. 7.5. So

$$f(x) = \begin{cases}1 & \text{if } -a \le x \le a\\ 0 & \text{otherwise}\end{cases}$$

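Then

$$F(t) = \int_{-\infty}^{\infty} e^{-i2\pi t x}f(x)\,dx = \int_{-a}^{a} e^{-i2\pi t x}\,dx = \left[-\frac{e^{-i2\pi t x}}{2\pi i t}\right]_{-a}^{a} = \frac{1}{\pi t}\left[\frac{e^{2\pi i a t} - e^{-2\pi i a t}}{2i}\right] = \frac{\sin 2\pi a t}{\pi t}$$

So the transform has the form given in Fig. 7.6.

Note: if t = 0 we have to be slightly more careful. Directly, $F(0) = \int_{-a}^{a} dx = 2a$, which is also the limit of $\sin 2\pi a t/(\pi t)$ as t → 0.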
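The transform in Example 5 can be compared with a direct numerical integration, including the special value F(0) = 2a. This is a minimal Python sketch; the half-width a = 0.75 and the sample values of t are arbitrary choices.

```python
import cmath
import math


def box_transform_numeric(t, a, n=4000):
    """F(t) = integral_{-a}^{a} exp(-2 pi i t x) dx by composite Simpson's rule."""
    h = 2 * a / n
    f = lambda x: cmath.exp(-2j * math.pi * t * x)
    total = f(-a) + f(a)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(-a + k * h)
    return total * h / 3


def box_transform_exact(t, a):
    """sin(2 pi a t) / (pi t), with the limiting value 2a at t = 0."""
    if t == 0:
        return 2 * a
    return math.sin(2 * math.pi * a * t) / (math.pi * t)


a = 0.75
for t in (0.0, 0.3, 1.25):
    print(t, box_transform_numeric(t, a), box_transform_exact(t, a))
```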

It is of interest to note that the theory of filters is essentially the study of convolutions, hence the central importance of Fourier theory. The choice of the right filter is perhaps an art form, but to help one make the right choice there are again lots of tables of convolutions. Essentially you keep trying until one produces a result that looks right.
The integrals involved in working out Fourier transforms can be rather difficult, and to assist the reader we include a table of transforms.

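Fourier Transforms: General Results

$f(x)$                                              $F(t)$
$a f(x)$                                            $a F(t)$
$f(ax)$                                             $\frac{1}{|a|}F\left(\frac{t}{a}\right)$
$a f(x) \pm b g(x)$                                 $a F(t) \pm b G(t)$
$F(\pm x)$                                          $f(\mp t)$
$f(x \pm a)$                                        $F(t)e^{\pm 2\pi i t a}$
$f'(x)$                                             $2\pi i t F(t)$
$\int_{-\infty}^{x} f(s)\,ds$                       $\frac{1}{2\pi i t}F(t)$
$\int_{-\infty}^{\infty} f(s)g(x - s)\,ds$          $F(t)G(t)$
$g(x)$                                              $G(t)$

We can use the duality between the transforms to generate further results.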

Example 5
Suppose we want the Fourier transform of f(x)g(x), i.e. $\mathcal{F}(fg)$. From the table we can deduce

$$\mathcal{F}(f * g) = F(t)G(t)$$

hence

$$\mathcal{F}(fg) = F * G = \int_{-\infty}^{\infty} F(s)G(t - s)\,ds$$

Exercise 5
Show that

Exercise 6
Show that

$$\int_{-\infty}^{\infty} f(x)\overline{g(x)}\,dx = \int_{-\infty}^{\infty} F(t)\overline{G(t)}\,dt$$

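Fourier Transform pairs

$f(x)$                                                                   $F(t)$
$f(x) = 1$ for $-a \le x \le a$, $= 0$ otherwise                         $\frac{\sin 2\pi a t}{\pi t}$
$\delta(x)$                                                              $1$
$\delta(x - a)$                                                          $e^{-2\pi i a t}$
$\cos 2\pi a x$                                                          $\frac{1}{2}\left[\delta(t + a) + \delta(t - a)\right]$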

Exercise 7
Find the transform of $t e^{-\pi t^2}$ by differentiating a suitable function.
Exercise 8
Show that

6 THE z-TRANSFORM

The Fourier transform is only one of many possible transforms. One of the most often used techniques for manipulating sequences is the z-transform. Suppose we have f(t) defined for t = 0, ±1, ±2, ...; then the z-transform of f(t) is

$$F(z) = \sum_{t=-\infty}^{\infty} f(t)z^{-t}.$$

Notice some authors prefer to have positive powers of z in the definition.


Example 6
Suppose

$$f(t) = \begin{cases}0 & t < 0\\ 1 & t \ge 0\end{cases}$$

Then

$$F(z) = \sum_{t=0}^{\infty} z^{-t} = \frac{1}{1 - z^{-1}} \qquad (|z| > 1)$$

(Note this is a geometric series, see "useful formulae".)
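Here is a short numerical check of Example 6: for |z| > 1 the partial sums of the series approach $1/(1 - z^{-1})$. The sample values of z and the number of terms are arbitrary choices in this Python sketch.

```python
def step_z_partial(z, terms):
    """Partial sum of F(z) = sum_{t>=0} z^(-t) for the unit step of Example 6."""
    return sum(z ** (-t) for t in range(terms))


def step_z_closed(z):
    """Closed form 1 / (1 - z^(-1)), valid for |z| > 1."""
    return 1 / (1 - 1 / z)


for z in (1.5, 2 + 1j, -3.0):
    print(z, step_z_partial(z, 200), step_z_closed(z))
```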

For each f(t) there is a unique z-transform, and further, if we write $z = e^{i\theta}$ we have

$$F(e^{i\theta}) = \sum_{t=-\infty}^{\infty} f(t)e^{-i\theta t}$$

which is the Fourier transform.


The inverse of the z-transform is rather nasty; formally it looks like

$$f(t) = \frac{1}{2\pi i}\oint_C F(z)z^{t-1}\,dz$$

where C is a curve enclosing the origin.

Fortunately we are usually mainly interested in using z-transforms in studying filters, in which case we are satisfied with F(z). The chief attraction of the z-transform is its ability to summarise a, possibly infinite, sequence in one function F(z), and for this representation to be unique.

7 THE DISCRETE FOURIER TRANSFORM (DFT)

In practical applications one often works with data that are given as a set of discrete quantities, and in this case the finite discrete Fourier transform can be very useful. We can think of it as a version of the Fourier transform discussed earlier that is amenable to machine computation.
Suppose f(0), f(1), ..., f(N − 1) is a sequence of (possibly complex) numbers; then the DFT of f(u), u = 0, ..., N − 1, is defined as

$$F(v) = \sum_{u=0}^{N-1} f(u)e^{-2\pi i u v/N}$$

For ease of notation write

$$W_N = e^{-2\pi i/N}$$

and hence

$$F(v) = \sum_{u=0}^{N-1} f(u)W_N^{uv}$$

The inverse transform is defined as

$$f(u) = \frac{1}{N}\sum_{v=0}^{N-1} F(v)W_N^{-uv}, \qquad u = 0, 1, \ldots, N-1$$

and these form a transform pair; for details see the following example.
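Example 7
Recall

$$\sum_{u=0}^{N-1} W_N^{-vu}W_N^{ku} = \sum_{u=0}^{N-1} e^{i(2\pi/N)(v-k)u} = \begin{cases}N & \text{if } k \equiv v \pmod N\\ 0 & \text{otherwise,}\end{cases}$$

i.e. the sum is N when v − k is a multiple of N. (See formulae-useful results.)
Using the definition and substituting for F(v) gives

$$f(u) = \frac{1}{N}\sum_{v=0}^{N-1} F(v)W_N^{-uv} = \frac{1}{N}\sum_{v=0}^{N-1}\left[\sum_{k=0}^{N-1} f(k)W_N^{kv}\right]W_N^{-uv} = \frac{1}{N}\sum_{k=0}^{N-1} f(k)\sum_{v=0}^{N-1} W_N^{(k-u)v} = f(u)$$

(after some thought: by the result above the inner sum is N when k = u and 0 otherwise).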
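The transform pair and the orthogonality relation of Example 7 are easy to check with a direct O(N²) evaluation. The following minimal Python sketch uses our own helper names (`dft`, `idft`, `orthogonality_sum`) and a made-up test sequence. It is meant for checking, not for speed.

```python
import cmath


def dft(f):
    """F(v) = sum_u f(u) W_N^(uv), W_N = exp(-2 pi i / N); direct O(N^2) evaluation."""
    N = len(f)
    return [sum(f[u] * cmath.exp(-2j * cmath.pi * u * v / N) for u in range(N)) for v in range(N)]


def idft(F):
    """f(u) = (1/N) sum_v F(v) W_N^(-uv)."""
    N = len(F)
    return [sum(F[v] * cmath.exp(2j * cmath.pi * u * v / N) for v in range(N)) / N for u in range(N)]


def orthogonality_sum(v, k, N):
    """sum_u W_N^(-vu) W_N^(ku): N when k = v (mod N), 0 otherwise."""
    return sum(cmath.exp(2j * cmath.pi * (v - k) * u / N) for u in range(N))


f = [1.0, 2 - 1j, 0.5, -3.0, 4j]
back = idft(dft(f))
print(max(abs(a - b) for a, b in zip(f, back)))
print(orthogonality_sum(2, 2, 8), orthogonality_sum(2, 10, 8), orthogonality_sum(2, 5, 8))
```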


To see why DFTs are important we sketch the connection between them and integral forms. This will be the subject of a more detailed study in Chapter 8.
Suppose we have

$$I = \int_{-\infty}^{\infty} f(t)e^{-i\omega t}\,dt$$

and we need the numerical value. We might approximate the range by an interval of length X, say, thus

$$I \approx \int_{-X/2}^{X/2} f(t)e^{-i\omega t}\,dt$$

Then if we chop X into N intervals of width Δx we have

$$I \approx \sum_{n=0}^{N-1} f(n\Delta x)e^{-i\omega n\Delta x}\,\Delta x$$

where for simplicity we label the function values f(0), f(Δx), ....
To avoid complications, choose Δx = 1, whence

$$I \approx \sum_{n=0}^{N-1} f(n)e^{-i\omega n}$$

Converting from radian measure to cycles, ω = 2πν, and taking ν = v/N, v = 0, 1, ..., N − 1, gives the DFT.
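The chain of approximations above can be tried on a function whose transform is known. This Python sketch assumes the test function $f(t) = e^{-\pi t^2}$, for which $\int f(t)e^{-i\omega t}\,dt = e^{-\omega^2/(4\pi)}$. It uses X = 10 and N = 200 (so Δx = 0.05) rather than Δx = 1; these choices are ours.

```python
import cmath
import math


def fourier_integral_riemann(f, omega, X=10.0, N=200):
    """Approximate integral f(t) exp(-i omega t) dt by a Riemann sum over [-X/2, X/2)."""
    dx = X / N
    total = 0j
    for n in range(N):
        t = -X / 2 + n * dx          # sample points, relabelled from the left end
        total += f(t) * cmath.exp(-1j * omega * t) * dx
    return total


def gauss(t):
    return math.exp(-math.pi * t * t)


for omega in (0.0, 1.0, 3.0):
    exact = math.exp(-omega**2 / (4 * math.pi))
    print(omega, fourier_integral_riemann(gauss, omega), exact)
```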
We continue our study of the properties of DFTs by noticing that f(u) and

F(v) are periodic, since

$$F(v) = F(kN + v), \qquad k = 0, \pm 1, \pm 2, \ldots$$

and

$$f(u) = f(kN + u), \qquad k = 0, \pm 1, \pm 2, \ldots$$

by definition. This result follows from the periodicity of the exponential function.
Rather than go into details of the derivation we will just state the following results. Given f(u), u = 0, 1, ..., N − 1, and g(u), u = 0, 1, ..., N − 1, and the corresponding DFTs F(v) and G(v), then

c f(u) + d g(u)

has DFT

c F(v) + d G(v),

while if

$$h(u) = \sum_{k=0}^{N-1} f(k)g(u - k), \qquad u = 0, 1, \ldots, N-1$$

then the DFT of h(u), H(v), is given by

$$H(v) = F(v)G(v)$$

Similarly when z(u) = f(u)g(u), then

$$Z(v) = \frac{1}{N}\sum_{k=0}^{N-1} F(k)G(v - k)$$

Note: in these convolution definitions we always have an F, G, f or g value, by using f(u) = f(kN + u), so that f(−1) = f(N − 1), f(−38) = f(N − 38), and similarly for g, F and G.
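The two convolution results quoted above can be verified numerically for short sequences. In this Python sketch the sequences f and g are made-up test data, and indices are reduced mod N as described in the note.

```python
import cmath


def dft(f):
    """Direct DFT: F(v) = sum_u f(u) exp(-2 pi i u v / N)."""
    N = len(f)
    return [sum(f[u] * cmath.exp(-2j * cmath.pi * u * v / N) for u in range(N)) for v in range(N)]


def circular_convolution(f, g):
    """h(u) = sum_k f(k) g(u - k), with indices taken mod N."""
    N = len(f)
    return [sum(f[k] * g[(u - k) % N] for k in range(N)) for u in range(N)]


f = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]   # made-up test data
g = [0.5, -1.0, 2.0, 0.0, 1.0, 1.0]
N = len(f)
F, G = dft(f), dft(g)

# Convolution theorem: DFT of h equals F(v) G(v)
H = dft(circular_convolution(f, g))
conv_err = max(abs(H[v] - F[v] * G[v]) for v in range(N))

# Product theorem: DFT of f(u) g(u) equals (1/N) sum_k F(k) G(v - k)
Z = dft([a * b for a, b in zip(f, g)])
prod_err = max(abs(Z[v] - sum(F[k] * G[(v - k) % N] for k in range(N)) / N) for v in range(N))
print(conv_err, prod_err)
```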
Example 8
Suppose we have

$$f(u) = a^u, \qquad u = 0, 1, \ldots, N-1$$

for some value a.

Fig. 7.7 [two panels plotted against k: |F(k)| and arg(F(k))]

Then

$$F(v) = \sum_{k=0}^{N-1} a^k e^{-(2\pi i/N)kv} = \frac{1 - a^N}{1 - a e^{-2\pi i v/N}}, \qquad 0 \le v \le N-1 \quad\text{(geometric series)}$$

The sequences are shown in Fig. 7.7 for a = 0.8.
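The closed form in Example 8 can be compared with a direct DFT. This Python sketch uses a = 0.8 as in Fig. 7.7 and assumes N = 16 (our choice). It also prints |F(v)| and arg F(v), the two quantities plotted in the figure.

```python
import cmath


def dft(f):
    """Direct DFT: F(v) = sum_u f(u) exp(-2 pi i u v / N)."""
    N = len(f)
    return [sum(f[u] * cmath.exp(-2j * cmath.pi * u * v / N) for u in range(N)) for v in range(N)]


N, a = 16, 0.8
f = [a**u for u in range(N)]
F = dft(f)
closed = [(1 - a**N) / (1 - a * cmath.exp(-2j * cmath.pi * v / N)) for v in range(N)]
err = max(abs(x - y) for x, y in zip(F, closed))
for v in range(N):
    print(v, round(abs(F[v]), 4), round(cmath.phase(F[v]), 4))   # |F(v)| and arg F(v)
print(err)
```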

We now look at series f(u) which are not periodic. Say f(u) is given for u = 0, 1, 2, ..., N − 1, while f(u) = 0 for all other values. We can still calculate the DFT, since we only use values of f(u) defined between 0 and N − 1. However the z-transform of f(u) is

$$F(z) = \sum_{u=0}^{N-1} f(u)z^{-u}$$

and putting $z = e^{2\pi i v/N}$ gives the DFT! An important implication of this result is that the DFT coefficients give a unique representation of f(u), u = 0, ..., N − 1.
Exercise 8
If

$$f(u) = \begin{cases}a^u & 0 \le u \le N-1\\ 0 & \text{other } u \text{ values}\end{cases}$$

show that the z-transform is

$$F(z) = \frac{1 - a^N z^{-N}}{1 - a z^{-1}}$$

Compare this result with the DFT.

Fig. 7.8

We can use these periodic convolutions of DFTs to evaluate ordinary ones. Suppose f(u) is non-zero only for u = 0, 1, ..., $N_1 - 1$ and h(u) is non-zero only for u = 0, ..., $N_2 - 1$, and we require

$$y(v) = \sum_{k=0}^{v} h(k)f(v - k)$$

a convolution! We can write this as a periodic convolution of DFTs by adding enough zeros to the f(u) and h(u) sequences to make them each $N_1 + N_2 - 1$ long, and then multiplying the DFTs of the two new sequences. The inverse transform gives the correct y(v).
Example 9
Fig. 7.8 shows two sequences f and h together with the convolution. Notice $N_1 = 7$, $N_2 = 4$ and $N_1 + N_2 - 1 = 10$.
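The zero-padding recipe can be sketched in Python as follows. The sequences are hypothetical but have the same lengths as Example 9 ($N_1 = 7$, $N_2 = 4$). A direct DFT stands in for an FFT, and the result is compared with the convolution sum computed directly.

```python
import cmath


def dft(f, sign=-1):
    """sum_u f(u) exp(sign * 2 pi i u v / N); sign=-1 is the DFT, sign=+1 the unscaled inverse."""
    N = len(f)
    return [sum(f[u] * cmath.exp(sign * 2j * cmath.pi * u * v / N) for u in range(N)) for v in range(N)]


def linear_convolution_via_dft(f, h):
    """y(v) = sum_k h(k) f(v - k), via zero padding both sequences to length N1 + N2 - 1."""
    L = len(f) + len(h) - 1
    fp = list(f) + [0.0] * (L - len(f))
    hp = list(h) + [0.0] * (L - len(h))
    Y = [a * b for a, b in zip(dft(fp), dft(hp))]
    return [val.real / L for val in dft(Y, sign=+1)]   # inverse DFT; real part for real data


def linear_convolution_direct(f, h):
    """The same sum evaluated directly, treating values outside the sequences as zero."""
    L = len(f) + len(h) - 1
    return [sum(h[k] * f[v - k] for k in range(len(h)) if 0 <= v - k < len(f)) for v in range(L)]


f = [1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0]   # N1 = 7 (made-up values)
h = [0.5, 0.5, -0.25, 1.0]                 # N2 = 4 (made-up values)
y_dft = linear_convolution_via_dft(f, h)
y_direct = linear_convolution_direct(f, h)
print(len(y_dft), max(abs(a - b) for a, b in zip(y_dft, y_direct)))
```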

Example 10
f(0) = 1, f(1) = −1, f(j) = 0 otherwise. Then

$$F(v) = 1 - e^{-iv} = 2i\,e^{-iv/2}\sin\frac{v}{2}$$
Example 11
Suppose

$$f(0) = f(1) = \cdots = f(R-1) = \frac{1}{R}$$

and

f(j) = 0 otherwise.

Then

$$F(v) = \frac{1}{R}\sum_{j=0}^{R-1} e^{-ivj},$$

and

using our tables of formulae, a geometric series,

$$F(v) = e^{-iv(R-1)/2}\,\frac{\sin(vR/2)}{R\sin(v/2)}$$
Example 12
Suppose

$$f(j) = \begin{cases}a^j & j = 0, 1, \ldots, N-1\\ 0 & \text{otherwise}\end{cases} \qquad |a| < 1$$

Then

$$F(v) = \sum_{j=0}^{N-1} a^j e^{-ivj} = \frac{1 - (a e^{-iv})^N}{1 - a e^{-iv}}$$

and

$$F(v) \approx \frac{1}{1 - a e^{-iv}}$$

when N is large.
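The closed forms in Examples 10 to 12 can be checked by summing the finite series directly. In this Python sketch the frequency v is in radians, R = 5 and a = 0.6 are arbitrary choices, and `dtft` is a hypothetical helper name.

```python
import cmath
import math


def dtft(seq, v):
    """F(v) = sum_j f(j) exp(-i v j) for a sequence that is zero outside j = 0..len(seq)-1."""
    return sum(x * cmath.exp(-1j * v * j) for j, x in enumerate(seq))


R, a = 5, 0.6                          # arbitrary choices for Examples 11 and 12
diff_seq = [1.0, -1.0]                 # Example 10
avg_seq = [1.0 / R] * R                # Example 11
geo_seq = [a**j for j in range(200)]   # Example 12 with N = 200 ("N large")

for v in (0.3, 1.0, 2.5):
    ex10 = 2j * cmath.exp(-1j * v / 2) * math.sin(v / 2)
    ex11 = cmath.exp(-1j * v * (R - 1) / 2) * math.sin(v * R / 2) / (R * math.sin(v / 2))
    ex12 = 1 / (1 - a * cmath.exp(-1j * v))
    print(v, abs(dtft(diff_seq, v) - ex10), abs(dtft(avg_seq, v) - ex11), abs(dtft(geo_seq, v) - ex12))
```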

Exercise 9
Let A be the N × N matrix whose (p, q)th element is $W_N^{pq}$, p, q = 0, ..., N − 1. If x = (x(0), ..., x(N − 1)), show that the DFT of x can be written as a product of A and x.

DFTs are important because they can be evaluated very economically


using "Fast Fourier Transform" algorithms. These FFTs have revolutionised
data analysis and we now outline the main ideas.

8 FAST FOURIER TRANSFORM (FFT)

We recall that the DFT is defined as

$$F(v) = \sum_{u=0}^{N-1} f(u)W_N^{uv}, \qquad v = 0, \ldots, N-1$$

and in practice N may be anything from several thousand to several million. For many years these transforms were computed directly, a calculation requiring of the order of $N^2$ multiplications and additions. The FFT algorithm reduces the number of operations to something of the order of N log N. This is a remarkable algorithm, which has made many things possible and has acquired a rich and somewhat confusing literature. Here we shall just touch on the main points.
Consider the DFT where N is a power of 2

$$F(v) = \sum_{k=0}^{N-1} f(k)W_N^{kv}$$
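now define two sequences $f_1(k)$ and $f_2(k)$

$$f_1(k) = f(2k), \qquad k = 0, 1, \ldots, \frac{N}{2} - 1$$

$$f_2(k) = f(2k+1), \qquad k = 0, 1, \ldots, \frac{N}{2} - 1.$$

Then

$$F(v) = \sum_{\substack{k=0\\ k\ \text{even}}}^{N-1} f(k)W_N^{kv} + \sum_{\substack{k=0\\ k\ \text{odd}}}^{N-1} f(k)W_N^{kv} = \sum_{k=0}^{N/2-1} f(2k)W_N^{2kv} + \sum_{k=0}^{N/2-1} f(2k+1)W_N^{(2k+1)v}$$

But since

$$W_N^2 = \left(e^{-2\pi i/N}\right)^2 = e^{-2\pi i/(N/2)} = W_{N/2}$$

we have (after some manipulation)

$$F(v) = \sum_{k=0}^{N/2-1} f_1(k)W_{N/2}^{kv} + W_N^v\sum_{k=0}^{N/2-1} f_2(k)W_{N/2}^{kv}$$

where the two sums are $F_1(v)$ and $F_2(v)$, the (N/2)-point DFTs of $f_1$ and $f_2$.

So we can write the DFT as the sum of two shorter DFTs, with some "twiddle factors" to make the sum work. Using the periodicity we have

$$F(v) = F_1(v) + W_N^vF_2(v)$$

while 0 ≤ v ≤ N/2 − 1. Otherwise $F(v) = F_1(v - N/2) + W_N^vF_2(v - N/2)$; equivalently, since $W_N^{N/2} = -1$, $F(v + N/2) = F_1(v) - W_N^vF_2(v)$ for 0 ≤ v ≤ N/2 − 1.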
If N/2 is also even we can split the two DFTs $F_1(k)$ and $F_2(k)$ in the same way and recombine to make F(k). For N a power of 2, say $2^n$, we can split down to DFTs of length 2, compute each of these and recombine with the appropriate "twiddle factor".
We can, with a struggle, follow a similar procedure when N is not of the form $2^n$ but has other factors. The technique is messy but the computer algorithms work like a charm.
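The splitting step can be applied recursively. The following is a minimal recursive radix-2 decimation-in-time sketch in Python, assuming N is a power of 2. It is written for clarity rather than speed (no in-place butterflies or bit reversal) and is checked against a direct DFT on made-up data.

```python
import cmath


def fft_dit(f):
    """Radix-2 decimation-in-time FFT; len(f) must be a power of 2."""
    N = len(f)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of 2")
    if N == 1:
        return [complex(f[0])]
    F1 = fft_dit(f[0::2])   # DFT of f1(k) = f(2k)
    F2 = fft_dit(f[1::2])   # DFT of f2(k) = f(2k + 1)
    F = [0j] * N
    for v in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * v / N) * F2[v]   # twiddle factor W_N^v times F2(v)
        F[v] = F1[v] + t
        F[v + N // 2] = F1[v] - t                    # uses W_N^(v + N/2) = -W_N^v
    return F


def dft_direct(f):
    """O(N^2) reference DFT for checking."""
    N = len(f)
    return [sum(f[u] * cmath.exp(-2j * cmath.pi * u * v / N) for u in range(N)) for v in range(N)]


signal = [((3 * n) % 7) - 3.0 for n in range(16)]   # arbitrary test data
err = max(abs(a - b) for a, b in zip(fft_dit(signal), dft_direct(signal)))
print(err)
```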
The details of these algorithms are interesting but have a jargon all of their own, involving "butterflies", "bit shuffling" and so on. We give a simple programme following the rediscovery of the FFT by Cooley and Tukey.
The algorithm outlined above is called the "decimation in time" algorithm. Another popular version is the "decimation in frequency" algorithm. This behaves as follows for $N = 2^m$.
Let

$$f_1(k) = f(k), \qquad f_2(k) = f\left(k + \frac{N}{2}\right), \qquad k = 0, \ldots, \frac{N}{2} - 1$$

(compare this with the previous algorithm). Then F(v) can be written

$$F(v) = \sum_{k=0}^{N/2-1} f(k)W_N^{kv} + \sum_{k=N/2}^{N-1} f(k)W_N^{kv} = \sum_{k=0}^{N/2-1} f_1(k)W_N^{kv} + \sum_{k=0}^{N/2-1} f_2(k)W_N^{(k+N/2)v} = \sum_{k=0}^{N/2-1}\left[f_1(k) + e^{-\pi i v}f_2(k)\right]W_N^{vk}$$

By considering the even and odd DFT values F(2v) and F(2v + 1) we can, after some algebra, see that these can be obtained from the (N/2)-point DFTs of

$$g_1(k) = f_1(k) + f_2(k), \qquad k = 0, \ldots, \frac{N}{2} - 1$$

and

$$g_2(k) = \left[f_1(k) - f_2(k)\right]W_N^k$$

Both algorithms are very similar and both take of the order of $N\log_2 N$ operations. The major difference is that one algorithm computes the short DFTs and then applies the "twiddle" factors, while the other applies the twiddles first.
There may be advantages in using decimation in frequency when computing FFTs in restricted high-speed memory.
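Here is a matching recursive sketch of decimation in frequency, again assuming N is a power of 2 and favouring clarity over speed. The output is checked against a direct DFT on made-up data.

```python
import cmath


def fft_dif(f):
    """Radix-2 decimation-in-frequency FFT; len(f) must be a power of 2."""
    N = len(f)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of 2")
    if N == 1:
        return [complex(f[0])]
    half = N // 2
    # f1(k) = f(k), f2(k) = f(k + N/2); g1 = f1 + f2 and g2 = (f1 - f2) W_N^k
    g1 = [f[k] + f[k + half] for k in range(half)]
    g2 = [(f[k] - f[k + half]) * cmath.exp(-2j * cmath.pi * k / N) for k in range(half)]
    F = [0j] * N
    F[0::2] = fft_dif(g1)   # even-indexed outputs F(2v)
    F[1::2] = fft_dif(g2)   # odd-indexed outputs F(2v + 1)
    return F


def dft_direct(f):
    """O(N^2) reference DFT for checking."""
    N = len(f)
    return [sum(f[u] * cmath.exp(-2j * cmath.pi * u * v / N) for u in range(N)) for v in range(N)]


signal = [complex((5 * n) % 11 - 5, n % 3) for n in range(32)]   # arbitrary test data
err = max(abs(a - b) for a, b in zip(fft_dif(signal), dft_direct(signal)))
print(err)
```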
Example 13
Suppose we wish to find the inverse DFT

$$f(k) = \frac{1}{N}\sum_{v=0}^{N-1} F(v)W_N^{-kv}$$

This can be done by using an FFT algorithm with no change, as follows. Take the complex conjugate of f(k) and scale by N; then

$$N\overline{f(k)} = \sum_{v=0}^{N-1}\overline{F(v)}\,W_N^{kv}$$

and the right-hand side is just the DFT of $\overline{F(v)}$. Thus, with conjugation and scaling, a single FFT algorithm will perform both the transform and the inverse transform.
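Example 13 can be expressed as a small wrapper around any forward transform. In this Python sketch a direct DFT stands in for the FFT, and the test sequence is made up.

```python
import cmath


def dft(f):
    """Forward DFT; any FFT routine with the same convention could be substituted."""
    N = len(f)
    return [sum(f[u] * cmath.exp(-2j * cmath.pi * u * v / N) for u in range(N)) for v in range(N)]


def inverse_via_forward(F, forward=dft):
    """Inverse DFT from a forward transform: N * conj(f(k)) is the forward DFT of conj(F(v))."""
    N = len(F)
    g = forward([complex(x).conjugate() for x in F])
    return [val.conjugate() / N for val in g]


f = [0.5, -1.0, 2.0 + 1j, 0.0, 3.0, -0.5j, 1.0, 1.0]
recovered = inverse_via_forward(dft(f))
print(max(abs(a - b) for a, b in zip(f, recovered)))
```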

Unfortunately we do not have the time to give an insight into the mathematics behind the FFT. This hinges on the factorisation of N and the representation of the DFT as a function of two variables.
The FFT is quite a remarkable algorithm and, if coded sensibly, is invaluable in that it enables one to tackle problems which would otherwise be impossible.

Fig. 7.9 [sin λt, with one period T marked]

9 FREQUENCY DOMAIN

We first recall some terminology. If λ is a non-negative number and we consider the time series

$$x(t) = \sin\lambda t$$

then the length of time T required for x(t) to go through one complete cycle is the period; see Fig. 7.9. Since a complete cycle of sin θ requires 2π radians, the period satisfies

$$\lambda T = 2\pi$$

The frequency f in cycles/unit time is the reciprocal of the period

$$f = \frac{1}{T}$$

Example 14
A sinusoidal time series of period 2 seconds has a frequency of 0.5 Hertz.
The angular frequency, measured in radians/unit time, is

$$\lambda = 2\pi f.$$

By introducing two further parameters A and φ, the amplitude and phase, a large number of time series with the same frequency as that above can be generated:

$$x(t) = A\sin(\lambda t + \phi) = A\sin(2\pi f t + 2\pi\theta), \qquad -\infty < t < \infty \qquad (*)$$

where 2πθ = φ, the latter form being in cycles per unit time.
Because of the periodic repetition of the sinusoid the phase can be restricted to the range

$$-\pi < \phi < \pi \quad\text{or}\quad -\tfrac{1}{2} < \theta < \tfrac{1}{2}$$

It is usually rather easier if we write (*) in the complex form

$$x(t) = A\left[\frac{e^{i(\lambda t + \phi)} - e^{-i(\lambda t + \phi)}}{2i}\right] = \frac{Ai}{2}\left[e^{-i(\lambda t + \phi)} - e^{i(\lambda t + \phi)}\right]$$

Thus we can view the contribution as coming from frequencies −λ and λ.

Suppose we have a time series x(t) defined over −T to T and let us assume

$$x(t) = \sum_{k=-n}^{n} c_k e^{i\pi k t/T}$$

with $c_{-k} = \overline{c_k}$, as in the simple case above. We assume that it is possible that the frequencies 0, ±π/T, ±2π/T, ..., ±nπ/T may contribute to x(t). Since we have a period of 2T, the smallest non-zero frequency is f = 1/(2T), or π/T in radian measure.
Suppose we think of

$$E = \frac{1}{2T}\int_{-T}^{T}|x(t)|^2\,dt$$

as a measure of the "power" in the series between −T and T, rather like the energy dissipated by an electric current. Then

$$\frac{1}{2T}\int_{-T}^{T}|x(t)|^2\,dt = \frac{1}{2T}\int_{-T}^{T}\sum_{k}c_k e^{i\pi k t/T}\,\overline{\left(\sum_{r}c_r e^{i\pi r t/T}\right)}\,dt = \frac{1}{2T}\sum_{k=-n}^{n}\sum_{r=-n}^{n}c_k\overline{c_r}\int_{-T}^{T}e^{(i\pi/T)(k-r)t}\,dt$$

Now $\frac{1}{2T}\int_{-T}^{T}e^{i\pi(k-r)t/T}\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi}e^{is(k-r)}\,ds$, which is zero unless k − r = 0. Thus $E = \sum_{k=-n}^{n}|c_k|^2$, and we see that the component at each frequency contributes its share to the total power. We can plot the $|c_k|^2$ and obtain a discrete power spectrum. Clearly we need only do so for non-negative frequencies, since $|c_{-k}| = |c_k|$.
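The identity $E = \sum|c_k|^2$ can be checked for a short trigonometric sum. In this Python sketch the coefficients are hypothetical, chosen with $c_{-k}$ equal to the conjugate of $c_k$ so that x(t) is real. The time average is taken over 64 equally spaced samples of one period. For a sum of this kind that average is exact whenever the number of samples exceeds the largest |k − r|.

```python
import cmath
import math

T = 1.0
# Hypothetical coefficients with c_{-k} = conj(c_k), so x(t) is real
coeffs = {0: 0.5, 1: 0.3 - 0.4j, -1: 0.3 + 0.4j, 3: 0.1j, -3: -0.1j}


def x(t):
    """x(t) = sum_k c_k exp(i pi k t / T)."""
    return sum(c * cmath.exp(1j * math.pi * k * t / T) for k, c in coeffs.items())


M = 64                                               # equally spaced samples over [-T, T)
samples = [x(-T + 2 * T * m / M) for m in range(M)]
power_time = sum(abs(s) ** 2 for s in samples) / M   # stands in for (1/2T) * integral |x|^2
power_freq = sum(abs(c) ** 2 for c in coeffs.values())
max_imag = max(abs(s.imag) for s in samples)
print(power_time, power_freq, max_imag)
```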
If we suppose x(t) is not periodic but extends to infinity we can run into problems in the mathematics. A way around this is to say: let

$$x_d(t) = \begin{cases}x(t) & \text{if } -T < t < T\\ 0 & \text{otherwise}\end{cases}$$

i.e. we make a copy. Now as before let

$$x_d(t) = \sum_{k=-n}^{n} c_k e^{i\pi k t/T}$$

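then

$$E = \frac{1}{2T}\int_{-T}^{T}x_d(t)^2\,dt = \sum_{k=-n}^{n}|c_k|^2$$

Now let T increase to infinity, as does n; then

$$\text{energy in } x(t) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}x_d(t)^2\,dt = \lim_{T\to\infty}\sum_{k}\frac{T|c_k|^2}{\pi}\cdot\frac{\pi}{T}$$

Here π/T is the spacing between neighbouring frequencies, so in the limit we have a function f(ω), where f(ω) δω gives the contribution from the frequencies ω to ω + δω.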
These ideas give the basis for "spectral analysis", a mode of analysis we discuss in Chapter 8. While the modern approach depends on FFTs, we point out that the ideas here have given useful results which date back to the time of Fourier.
Chapter 8

TIME SERIES

In this chapter we shall attempt to introduce some of the main ideas and
methods for studying time series. This is a fairly complex subject so we
shall have to give a fairly concise account.
We recall that a time series is a signal, or function of time, x(t), which
exhibits random or fluctuating properties. As outlined in previous units x(t)
can be regarded as one "realisation" from an infinite ensemble of functions
which might have been observed. We shall use X(t) to denote the "stochastic
process" or the random variable and x(t) to indicate the actual outcome.
In most of what follows we shall concentrate on single series for simplicity. However, if the "state" of the process can be represented by a vector of numbers at the relevant time point, say

$$\mathbf{x}(t) = \left(x_1(t), x_2(t), \ldots, x_k(t)\right)$$

then it may be necessary to look at the vector time series, or multiple time series, x(t). Thus it may be sensible to consider $\mathbf{x}(t) = (x_1(t), x_2(t), x_3(t))$, where $x_1(t)$, $x_2(t)$ and $x_3(t)$ are data representing the vertical component of the earth's motion at three recording sites. Naturally, more complex series raise more difficult problems.
Given a stochastic process it can have two possible modes of behaviour:
it can be
(a) stationary, that is the mechanism generating the series does not change
significantly in time;
(b) non-stationary or evolutionary.
Stationary series are easiest to handle and we shall look at them in some
detail. Series can be studied in time and in the "frequency domain" via the
power spectrum. We shall examine the spectrum using the ideas of Chapter
7. The concept of a filter also arises in a natural way and we shall look at
filters and their properties. We have no intention of giving a comprehensive
coverage of time series theory and practice and for this the reader should
consult the references.
Time Series 175

1 STATIONARY AND RELATED SERIES


By "stationary" we mean that the statistical properties of X(t) do not change
over time, that is, the underlying probability distributions are unchanged
if the origin is changed. As you might expect this is rather difficult to handle
and it is common to use the weaker idea of second order or weak stationarity.
In this case we require that the expected values
$$E(X(t)) = \mu \quad\text{and}\quad E(X(t)X(s)) = f(t - s)$$
do not depend on the choice of origin for t. The second condition requires
the relationship between X(t) and X(s), in a rather special sense, to depend
only on the distance between t and s. The distance between t and s
is usually called the lag. (For details of the rules for Expectation see
Appendix A.)
The simplest sort of indicator of the behaviour of a time series is the
autocovariance function (see also Chapter 6, for an alternative definition)
$$\gamma_{xx}(u) = E\left[(X(t) - \mu)(X(t + u) - \mu)\right]$$

which for a stationary process is just a function of the lag u.
Note $\gamma_{xx}(0)$ is just the variance of the series, while

$$\rho_{xx}(u) = \frac{\gamma_{xx}(u)}{\gamma_{xx}(0)}$$

is a scaled version of the autocovariance function called the autocorrelation. Since

$$|\rho_{xx}(u)| \le 1$$

(not obviously) this may be simpler to work with in some circumstances.
Their properties can be summarised as follows:
(i) $\rho_{xx}(0) = 1$
(ii) $\rho_{xx}(-u) = \rho_{xx}(u)$ for real series
(iii) $\gamma_{xx}(-u) = \gamma_{xx}(u)$ for real series
(iv) If X(t) is continuous then $\gamma_{xx}(u)$ is a continuous function of u.
As we see, the autocorrelation function $\rho_{xx}(u)$ measures the correlation coefficient between pairs of values of {X(t)} separated by an interval u. Intuitively we might interpret $\rho_{xx}(u)$ as a measure of the "similarity" between a realisation of X(t) and the same realisation shifted to the left by u.
As u increases we would expect the correlation to decrease. (For large u we might expect the series to have forgotten its value at time t by the time it gets to t + u.) Consequently we would expect $\gamma_{xx}(u)$ and $\rho_{xx}(u)$ to decay to zero as |u| → ∞. The "typical" form of an autocorrelation function is shown in Figs 8.1, 8.2 and 8.3. The rate of decay can then be interpreted as a measure of the "memory" of the process. Do be warned: this is a generalisation, and there are processes which do not have decaying functions ρ(u).
As we shall see, the $\gamma_{xx}$ or $\rho_{xx}$ functions can be estimated and are of central importance in practical work.
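The functions $\rho_{xx}$ can be estimated from data. A common estimator, assumed here, divides each lag sum by the series length n, which keeps every estimate between −1 and 1. As a usage example, the sketch simulates a first order autoregressive series X(t) = αX(t − 1) + a(t), of the kind in Example 2 below. It uses a hypothetical coefficient α = 0.7 and a fixed random seed, and for this series the theoretical autocorrelation is $\alpha^{|u|}$.

```python
import random


def sample_autocorrelation(x, max_lag):
    """rho_hat(u) = c(u) / c(0), with c(u) = (1/n) sum_t (x_t - mean)(x_{t+u} - mean)."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + u] for t in range(n - u)) / n / c0 for u in range(max_lag + 1)]


random.seed(1)   # fixed seed so the run is reproducible
alpha = 0.7      # hypothetical AR(1) coefficient
x = [0.0]
for _ in range(20000):
    x.append(alpha * x[-1] + random.gauss(0.0, 1.0))
rho = sample_autocorrelation(x[1000:], 5)   # discard a burn-in so the start value is forgotten
for u, r in enumerate(rho):
    print(u, round(r, 3), round(alpha**u, 3))
```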
To illustrate the ideas encountered we shall consider first some simple
models.

Fig. 8.1 [a realisation X(t), with a lag u marked]
Fig. 8.2 [ρ(u) against u: continuous process]
Fig. 8.3 [ρ(u) against u: discrete process]

Examples
1. Purely Random Process: "White noise" in discrete time.
Suppose we make observations at times t, t = ..., −1, 0, 1, 2, ..., and at these points

$$X(t) = a(t)$$

where the a(t) are mutually independent with zero means and common variance $\sigma^2$. This process is called "purely random" by statisticians and "band-limited white noise" by engineers. For this case

$$\gamma_{xx}(0) = \sigma^2, \qquad \gamma_{xx}(u) = 0 \quad (u \neq 0)$$
2. Markov Process: A slight modification of the above gives the Markov or first order autoregressive process. Suppose X(t) satisfies

$$X(t) - \alpha X(t-1) = a(t)$$

with a(t) as in 1 above.
Then we can show

$$\gamma_{xx}(u) = \frac{\sigma^2\alpha^{|u|}}{1 - \alpha^2}$$

for |α| < 1 and large t. For α = 1 the series is not stationary; it is called the "random walk".
We easily see the lack of stationarity with a random walk. We have

$$X(t) = X(t-1) + a(t)$$

so

$$X(t) = \left(X(t-2) + a(t-1)\right) + a(t) = \left(\left(X(t-3) + a(t-2)\right) + a(t-1)\right) + a(t) = X(0) + a(1) + a(2) + \cdots + a(t)$$

Thus X(t) can be thought of as a sum of past "innovations". Now if $\mathrm{var}(a(t)) = \sigma^2$ for all t, then (taking X(0) as fixed)

$$\mathrm{var}(X(t)) = t\sigma^2$$

since

$$\mathrm{var}\left(a(1) + a(2) + \cdots + a(t)\right) = \sum_{i=1}^{t}\mathrm{var}(a(i)) = \sigma^2 t$$

See Appendix A.
Such processes are very common. Suppose for example that X(t) is the change in a stock price. It is often asserted that this is just the previous change X(t − 1) plus an unpredictable component a(t), i.e.

$$X(t) = X(t-1) + a(t)$$
3. Purely Random Process in Continuous Time: One of the nasty points in the development of time series is that we encounter problems in the definition of a white noise process in continuous time. The obvious definition is to consider a series with

$$\gamma_{xx}(0) = \sigma^2, \qquad \gamma_{xx}(u) = 0 \quad (u \neq 0)$$

Sadly, we must have a continuous function $\gamma_{xx}(u)$, which in the case above is not true. We shall avoid the problem by sleight of hand and define a white noise process as consisting entirely of uncorrelated contiguous impulses with

$$\gamma_{xx}(u) = \sigma^2\delta(u),$$

δ(u) being the delta function, i.e. δ(u) = 0 for u ≠ 0 and infinite otherwise. As we do not have to deal with this process in detail, it is sufficient to visualise it as the discrete case with very closely spaced sampling values for t.

Fig. 8.4 [input Z(t), output X(t)]
We shall now look at some of a rather wider class of models for time
series to give some idea of the processes one may encounter.
4. Linear Processes: A process X(t) generated by an input Z(t) (Fig. 8.4) of the form

$$X(t) - \mu = \int_{0}^{\infty} h(v)Z(t - v)\,dv$$

where E(Z(t)) = 0. For Z(t) white noise with variance $\sigma^2$, this is called a linear process. With some work we can show that

$$\gamma_{xx}(u) = \sigma^2\int_{0}^{\infty} h(v)h(v + u)\,dv$$

and the process is stationary if $\int_{0}^{\infty}|h(v)|^2\,dv$ is finite.

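5. Cumulated Process, an example of 4:

$$X(t) = \frac{1}{T}\int_{t-T}^{t} Z(v)\,dv$$

i.e.

$$h(v) = \begin{cases} 0, & v < 0\\ 1/T, & 0 \le v \le T\\ 0, & v > T \end{cases}$$

(see Fig. 8.5). Then

$$\mu = E(X(t)) = \frac{1}{T}\int_{t-T}^{t} E(Z(v))\,dv = 0$$

$$\gamma_{xx}(u) = \frac{\sigma^2}{T^2}\,(T - |u|) \quad \text{if } |u| \le T,$$

and $\gamma_{xx}(u) = 0$ for $|u| > T$.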
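As a check, the general formula $\gamma_{xx}(u) = \sigma^2\int h(v)h(v+u)\,dv$ can be evaluated numerically for $h(v) = 1/T$ on $[0, T]$ and compared with the closed form. The Python sketch below uses a plain midpoint rule. The grid size and the value $T = 2$ are arbitrary choices for illustration.

```python
def acvf_linear_process(h, u, v_max, sigma2=1.0, n=20000):
    """Midpoint-rule value of sigma2 * integral_0^v_max h(v) h(v + |u|) dv.

    v_max must cover the support of h (here h vanishes outside [0, T])."""
    dv = v_max / n
    shift = abs(u)
    return sigma2 * dv * sum(h((j + 0.5) * dv) * h((j + 0.5) * dv + shift) for j in range(n))


def acvf_cumulated_exact(u, T, sigma2=1.0):
    """Closed form: sigma2 (T - |u|) / T^2 for |u| <= T, and 0 otherwise."""
    return sigma2 * (T - abs(u)) / T ** 2 if abs(u) <= T else 0.0


T = 2.0


def h(v):
    """Weight function of the cumulated process: 1/T on [0, T], zero elsewhere."""
    return 1.0 / T if 0.0 <= v <= T else 0.0


for u in (0.0, 0.5, 1.5, 2.5):
    print(u, acvf_linear_process(h, u, T), acvf_cumulated_exact(u, T))
```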

6. The Wiener Process: Suppose $Z(t)$ is white noise and we define

$$X(t) = \int_{-\infty}^{t} Z(v)\,dv$$

or

$$X(t_2) - X(t_1) = \int_{t_1}^{t_2} Z(v)\,dv.$$
Time Series 179

Fig. 8.5 (the weight function h(v), non-zero between 0 and T)

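If $t_2 = t + \Delta t$ and $t_1 = t$ then

$$\Delta X(t) = X(t+\Delta t) - X(t) = \int_t^{t+\Delta t} Z(v)\,dv \approx \Delta t\,Z(t)$$

or

$$\frac{\Delta X(t)}{\Delta t} \approx Z(t).$$

Thus the "derivative" $\dot X(t)$ is given by

$$\dot X(t) = Z(t).$$

If $X(t)$ is normal then $X(t)$ is called a Wiener process. Note that $X(t)$ is not stationary.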
7. Brownian Motion: This is a phenomenon in physics which describes the random movement of microscopic particles suspended in a liquid or gas. A simple one-dimensional model is as follows. At time $t$ let $X(t)$ denote the velocity of the particle (moving in a straight line), and let $Z(t)$ be a random force acting on it at that time. Then the equation of motion is

$$\dot X(t) = Z(t) - aX(t)$$

or

$$\dot X(t) + aX(t) = Z(t).$$

If $Z(t)$ is a purely random process then the process $X(t)$ is called an autoregressive process of order one. Note that if there is no resistance, i.e. $a = 0$, then $X(t)$ is a Wiener process.
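The velocity model can be explored with a crude Euler discretisation. The sketch below makes several assumptions that do not come from the text:

- $Z(t)$ is Gaussian white noise with $\gamma_{zz}(u) = \sigma^2\delta(u)$, so the force integrated over one step $\Delta t$ has variance $\sigma^2\Delta t$.
- The step size and run length are arbitrary choices.

The comparison value is $\sigma^2/(2a)$, the standard stationary variance for this model. That value is not derived in the text. The variance is estimated by a time average over one long realisation, in the spirit of the ergodic argument given later in the chapter. Setting $a = 0$ in the update gives back a random walk.

```python
import math
import random


def simulate_velocity(a, sigma, dt, n_steps, seed=0):
    """Euler scheme for dX/dt + a X = Z(t), Z white noise with intensity sigma^2.

    Over one step the integrated force is approximated by N(0, sigma^2 dt)."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(n_steps):
        x += -a * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


a, sigma, dt = 1.0, 1.0, 0.01
path = simulate_velocity(a, sigma, dt, n_steps=500_000)
tail = path[10_000:]                         # discard the start-up transient
time_avg_var = sum(v * v for v in tail) / len(tail)
print(time_avg_var, sigma ** 2 / (2 * a))    # should be close (about 0.5)
```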

It is not difficult to conceive of higher order models of the form

$$\ddot X(t) + a_1\dot X(t) + a_2X(t) = Z(t) \qquad (1)$$

and it is convenient to write $DX(t)$ to indicate the first derivative, $D^2X(t)$ the second and $D^nX(t)$ the $n$th derivative. Equation (1) thus becomes

$$D^2X(t) + a_1DX(t) + a_2X(t) = Z(t)$$

or

$$(D^2 + a_1D + a_2)\,X(t) = Z(t).$$

The polynomial in $D$ can be factorised, for example

$$D^2 + 2D + 1 = (D+1)(D+1)$$

or

$$D^2 + 5D - 14 = (D-2)(D+7).$$

In fact we can, if we wish, consider $D^{-1}$, the reverse of differentiation, as integration, viz.

$$\frac{1}{D}Z(t) = \int_{-\infty}^{t} Z(u)\,du.$$

This does enable us to solve equations. We omit the details.


In discrete time we can build analogous models

$$X(t) + \alpha_1X(t-1) + \alpha_2X(t-2) = a(t).$$

If we use $B$ (the backward shift) as the operator

$$BX(t) = X(t-1)$$

so that

$$B^2X(t) = X(t-2), \qquad B^kX(t) = X(t-k),$$

then

$$(1 + \alpha_1B + \alpha_2B^2)\,X(t) = a(t).$$

Using the obvious notation $B^{-1}X(t) = X(t+1)$ we can have a similar theory to the above. These models are said to be autoregressive. In the same fashion as for continuous models there are linear or moving average types, i.e.

$$X(t) = a(t) + \sum_{j=1}^{\infty} b_j\,a(t-j)$$

where some $b_j$ may be zero. We could use the $B$ operator to give the form for a finite case as

$$X(t) = (1 + \theta_1B + \theta_2B^2 + \cdots + \theta_qB^q)\,a(t).$$

A combination of these forms gives the ARMA (autoregressive moving average) model,

$$(1 + \phi_1B + \phi_2B^2 + \cdots + \phi_pB^p)\,X(t) = (1 + \theta_1B + \theta_2B^2 + \cdots + \theta_qB^q)\,a(t).$$

If $\Phi(B) = 1 + \phi_1B + \cdots + \phi_pB^p$ and $\Theta(B) = 1 + \theta_1B + \cdots + \theta_qB^q$ then

$$\Phi(B)X(t) = \Theta(B)\,a(t).$$

For example

$$X(t) + 0.1X(t-1) + 0.2X(t-2) = a(t) - a(t-1)$$

may be written

$$(1 + 0.1B + 0.2B^2)\,X(t) = (1 - B)\,a(t).$$
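An ARMA model in the operator form above can be generated by solving the difference equation recursively. The sketch below keeps the text's sign convention, $\Phi(B) = 1 + \phi_1B + \cdots$ and $\Theta(B) = 1 + \theta_1B + \cdots$. It assumes that all values before $t = 0$ are zero, a start-up choice not made in the text. The function name `arma_filter` is hypothetical.

```python
import random


def arma_filter(phi, theta, a):
    """Solve Phi(B)X(t) = Theta(B)a(t) recursively for t = 0, 1, ...

    Phi(B) = 1 + phi[0] B + ... + phi[p-1] B^p and
    Theta(B) = 1 + theta[0] B + ... + theta[q-1] B^q, as in the text.
    Values of X and a before t = 0 are taken to be zero."""
    x = []
    for t in range(len(a)):
        ma = sum(c * a[t - 1 - j] for j, c in enumerate(theta) if t - 1 - j >= 0)
        ar = sum(c * x[t - 1 - i] for i, c in enumerate(phi) if t - 1 - i >= 0)
        x.append(a[t] + ma - ar)
    return x


# The example above: (1 + 0.1B + 0.2B^2) X(t) = (1 - B) a(t)
phi, theta = [0.1, 0.2], [-1.0]
print(arma_filter(phi, theta, [1.0, 0.0, 0.0, 0.0]))  # impulse response, approx. [1.0, -1.1, -0.09, 0.229]
rng = random.Random(42)
innovations = [rng.gauss(0.0, 1.0) for _ in range(1000)]
series = arma_filter(phi, theta, innovations)
```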

Weakly Stationary Series


Suppose we are interested in the analysis of a realisation $x(t)$. How can we get information about the autocovariance $\gamma_{xx}(u)$? The definition above depends on knowledge of the probabilistic structure, i.e. across realisations, while we have only one realisation through time. Fortunately, in most situations, if

$$r_{xx}(u) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}\big(x(t)-\mu\big)\big(x(t+u)-\mu\big)\,dt$$

is the time average and

$$\gamma_{xx}(u) = E\big[(X(t)-\mu)(X(t+u)-\mu)\big]$$

the ensemble average, then $r_{xx}(u)$ equals $\gamma_{xx}(u)$. Similarly $(1/2T)\int_{-T}^{T}x(t)\,dt$ will tend to $E(X(t)) = \mu$. In the discrete time case, given $X(1), \ldots, X(N)$, we use

$$\hat\gamma_{xx}(u) = \frac{1}{N}\sum_{t=1}^{N-|u|}\big(X(t)-\hat\mu\big)\big(X(t+|u|)-\hat\mu\big)$$

with

$$\hat\mu = \frac{1}{N}\sum_{t=1}^{N}X(t).$$
These equivalences are called the Ergodic Theorem. You might think of the following analogy. Imagine being at a ball: one can move around the whole ball and examine the merrymakers (i.e. the ensemble), or sit in the bar and watch over time. With suitable mixing one might come to the same conclusions about the composition of those attending the ball.
Given these results we can compute the autocorrelations for our realisation and thus start to analyse the underlying model. We aim to deduce something at least about our given stochastic process from our knowledge of the autocorrelations. Before looking further at estimation we turn our attention to the frequency domain.
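The estimator $\hat\gamma_{xx}(u)$ above is easy to code. The sketch below applies it to one long realisation of an assumed first-order autoregression $X(t) = aX(t-1) + e(t)$ with $\operatorname{var}(e(t)) = 1$. The innovation is written $e(t)$ only to avoid a clash with the coefficient $a$. For this model the autocovariance is $a^{u}/(1-a^2)$, the formula quoted at the start of this section. The time averages should then approximate the ensemble values, as the ergodic argument suggests. The choices $a = 0.6$ and $N = 200\,000$ are arbitrary.

```python
import random


def sample_autocovariance(x, u):
    """gamma_hat(u) = (1/N) sum_{t=1}^{N-|u|} (x(t) - mean)(x(t+|u|) - mean)."""
    n, u = len(x), abs(u)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + u] - mean) for t in range(n - u)) / n


# One long realisation of X(t) = a X(t-1) + e(t), var(e) = 1 (assumed model).
a, n = 0.6, 200_000
rng = random.Random(7)
x, prev = [], 0.0
for _ in range(n):
    prev = a * prev + rng.gauss(0.0, 1.0)
    x.append(prev)

for u in (0, 1, 2):
    # time average from one realisation vs the ensemble value a^u / (1 - a^2)
    print(u, round(sample_autocovariance(x, u), 3), round(a ** u / (1 - a * a), 3))
```

The divisor $N$ rather than $N - |u|$ follows the formula in the text.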

The Power Spectrum


As we saw in Chapter 7, we could develop an "energy spectrum" $f(\omega)$ which gave some idea of the "average energy" in the series $x(t)$. Thus if

$$x(t) = \sum_{j=-n}^{n} c_j\,e^{i\omega_jt}$$

then recall

$$\gamma_{xx}(0) = \sum_{j=-n}^{n}|c_j|^2. \qquad (*)$$

We might suppose that there are infinitely many frequencies and that it is plausible to imagine a continuous function $f(\omega)$ which describes the distribution of amplitudes $|c_j|^2$. Suppose we have $f(\omega)$ such that $f(\omega)\,d\omega$ is a contribution to the mean energy in the system. From our description of the complex form and negative frequencies, $f(\omega) = f(-\omega)$. We can approximate the sum $\sum_{j=-n}^{n}|c_j|^2$ by

$$\int_{-\infty}^{\infty} f(\omega)\,d\omega.$$

Our equation (*) becomes

$$\gamma_{xx}(0) = \int_{-\infty}^{\infty} f(\omega)\,d\omega$$
with a relation very like a Fourier transform. In fact if we consider

$$\gamma_{xx}(u) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}x(t)\,x(t+u)\,dt$$

we have, using our convolution ideas,

$$\gamma_{xx}(u) = \int_{-\infty}^{\infty} e^{i\omega u}f(\omega)\,d\omega \qquad (**)$$

and the corresponding inverse transform

$$f(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\omega u}\,\gamma_{xx}(u)\,du \qquad (**)$$

(see Chapter 7). Note the scaling change in the Fourier transform. $f(\omega)$ is the spectral density function, and $f(\omega)\,\delta\omega$ gives the contribution to the energy in the system from frequencies lying between $\omega$ and $\omega + \delta\omega$. Equations (**) are called the Wiener-Khintchin equations. The spectrum gives us some idea of the underlying random process. In particular, if $x(t)$ has any periodic component, say

$$x(t) = x(t+p),$$

then there will be a peak in the spectrum at the frequency corresponding to the period $p$, i.e. at $\omega = 2\pi/p$.
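Example 8
Continuous but harmonic series

$$X(t) = \sum_{i=1}^{K} A_i\cos(\omega_it + \phi_i)$$

where $K, A_1, \ldots, A_K$ are constants and the $\phi_i$ are random, independent and uniformly distributed over $(-\pi, \pi)$. Then we can show

$$\rho_{xx}(u) = \frac{\sum_{i=1}^{K}\frac12A_i^2\cos\omega_iu}{\sum_{i=1}^{K}\frac12A_i^2}.$$

Note that $\rho_{xx}(u)$ never dies out! The spectrum is

$$f(\omega) = \frac14\sum_{i=1}^{K}A_i^2\big[\delta(\omega+\omega_i) + \delta(\omega-\omega_i)\big],$$

i.e. a purely discrete spectrum with jumps at $\pm\omega_i$.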

Example 9
Pure noise series: From Example 3 we know that

$$\gamma_{xx}(u) = \sigma^2\delta(u);$$

hence, using the transform formula above,

$$f(\omega) = \frac{\sigma^2}{2\pi}$$

(recalling the definition of the Dirac $\delta$ function outlined in Chapter 7). Thus the spectrum is flat.
Exercise 2
Suppose we are given a series whose autocovariance function is

$$\gamma_{xx}(u) = e^{-2a|u|}.$$

Obtain the spectrum of the series.

We wish to emphasise at this point that effects with long periods have low frequencies and hence will appear in the low frequency part of the spectrum. We defer a discussion of estimation until we have discussed filters and time domain estimation.

2 ALIASING AND SAMPLING


In many situations we naturally deal with discrete time series, e.g. hourly temperature readings, daily stock prices, or yearly sunspot numbers. Quite often, however, we have a sampled record of a series: we may not be capable of recording a continuous trace, or we may have to compute, in which case we will have to digitise and hence have a discrete process. In this section we look at the effect of sampling on the spectrum.
Suppose we have a continuous record and we take observations at intervals of $\Delta t$, called the sampling interval. The number of samples per unit time is the sampling rate, $1/\Delta t$ (see Fig. 8.6). So if $\Delta t = 0.05$ sec then the sampling rate is $1/0.05 = 20$ observations/second. The series $X(t)$ is replaced by $X_d(k) = X(k\Delta t)$, say, where

$$k = 0, \pm1, \pm2, \ldots$$

Clearly we must lose detail by choosing to sample.
It would seem plausible that if events occur on time scales shorter than $\Delta t$ we will not see them. In fact things go further: some harmonics in the spectral decomposition cannot be distinguished. Figure 8.7 has two sine functions which are quite different in period. If we just have the values at $\Delta t, 2\Delta t, \ldots$, as in Fig. 8.8, then we cannot tell which harmonic was involved. Both of the sine waves would give the same values.

Fig. 8.6 (a continuous trace X(t) with one sampling interval Δt marked)

Fig. 8.7

Fig. 8.8 (sample points at Δt, 2Δt, 3Δt)
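We can demonstrate the result as follows. Assume (for simplicity) a continuous spectrum and $u = s\Delta t$, say, for some integer $s$. Splitting the real line into the intervals $\ldots, (-3\pi/\Delta t, -\pi/\Delta t), (-\pi/\Delta t, \pi/\Delta t), (\pi/\Delta t, 3\pi/\Delta t), \ldots$ gives

$$\gamma_{xx}(s\Delta t) = \int_{-\infty}^{\infty} e^{i\lambda s\Delta t}f(\lambda)\,d\lambda = \sum_{k=-\infty}^{\infty}\int_{(2k-1)\pi/\Delta t}^{(2k+1)\pi/\Delta t} e^{i\lambda s\Delta t}f(\lambda)\,d\lambda.$$

Now for any $k$ the integral is

$$\int_{(2k-1)\pi/\Delta t}^{(2k+1)\pi/\Delta t} e^{i\lambda s\Delta t}f(\lambda)\,d\lambda.$$

Let $\omega = \lambda - 2k\pi/\Delta t$; then the integral becomes

$$\int_{-\pi/\Delta t}^{\pi/\Delta t} e^{is\omega\Delta t + 2\pi iks}\,f\!\left(\omega + \frac{2k\pi}{\Delta t}\right)d\omega,$$

but $e^{2\pi iks} = 1$ since $k$ and $s$ are integers, so

$$\gamma_{xx}(s\Delta t) = \int_{-\pi/\Delta t}^{\pi/\Delta t} e^{is\omega\Delta t}\sum_{k=-\infty}^{\infty}f\!\left(\omega + \frac{2k\pi}{\Delta t}\right)d\omega.$$

Now we have two forms for $\gamma_{xx}(u)$ at the sample points: one is the usual infinite integral, the second an integral between $-\pi/\Delta t$ and $\pi/\Delta t$.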

We have no contributions to $\gamma_{xx}(u)$ from frequencies above $\pi/\Delta t$, the Nyquist or folding frequency. If the continuous process has contributions from frequencies above the Nyquist frequency, their effect appears on the sampled trace: the high frequencies are added to lower frequencies and simply interfere with our record.
We can think of the discrete trace as having a "spectrum"

$$f_d(\omega) = \sum_{k=-\infty}^{\infty}f\!\left(\omega + \frac{2k\pi}{\Delta t}\right), \qquad |\omega| < \frac{\pi}{\Delta t},$$

and zero otherwise.
As you can see, it is vitally important to choose the Nyquist frequency, or $\Delta t$, so that we do not lose important high frequencies.
Note: In angular measure the Nyquist frequency is $\pi/\Delta t$; in cycles it is $1/(2\Delta t)$ cycles per unit time. Thus if $\Delta t = 0.1$ seconds, the Nyquist frequency is 5 cps. The discrete spectrum at frequency 4 cps will be made up of contributions from $f$ at 4 cps,

10 + 4 = 14 cps, -10 + 4 = -6 cps,
20 + 4 = 24 cps, -20 + 4 = -16 cps,

and so on. Figure 8.9 gives examples of the effect of digitising on the spectrum. It shows that one should take care in choosing a suitable sampling rate.
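The folding rule in the note above can be written as a one-line function. The sketch below maps any frequency (in cycles per unit time) to the alias it shows after sampling every $\Delta t$. The interval $[-1/(2\Delta t), 1/(2\Delta t))$ is a choice made here: the Nyquist frequency itself is ambiguous in sign. The sketch then checks that samples of a 14 cps and a 4 cps cosine taken at $\Delta t = 0.1$ cannot be told apart.

```python
import math


def aliased_frequency(freq, dt):
    """Frequency in [-1/(2 dt), 1/(2 dt)) (cycles per unit time) at which a
    component of frequency `freq` appears after sampling every dt."""
    fs = 1.0 / dt                      # sampling rate
    return (freq + fs / 2.0) % fs - fs / 2.0


dt = 0.1                                # Nyquist frequency 1/(2 dt) = 5 cps
for f in (4, 14, -6, 24, -16):
    print(f, aliased_frequency(f, dt))  # each of these folds onto 4 cps

# The samples of a 14 cps and a 4 cps cosine cannot be told apart.
same = all(
    math.isclose(math.cos(2 * math.pi * 14 * k * dt),
                 math.cos(2 * math.pi * 4 * k * dt), abs_tol=1e-9)
    for k in range(50)
)
print(same)
```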
Exercises
3. Why do the wheels of a stagecoach appear to rotate the wrong way on film?
4. Suppose the series under study contains two sinusoidal components at frequencies of 100 cps and 99 cps. Given a record, what sampling interval is required?

We finish this section by mentioning the sampling theorem.
If for some number $l$ the spectrum of $X(t)$ is zero outside the frequency interval $-l \le \lambda \le l$, then the time series can be exactly reconstructed from its values at the time points

$$\frac{\pi k}{l}, \qquad k = 0, \pm1, \pm2, \ldots$$

and

$$X(t) = \sum_{k=-\infty}^{\infty}\frac{\sin l(t - \pi k/l)}{l(t - \pi k/l)}\,X\!\left(\frac{\pi k}{l}\right).$$

So selecting a Nyquist frequency exceeding $l$ ensures we have no problems over the difference between the sampled series spectrum and the real series spectrum.

Fig. 8.9 (spectra plotted against frequency, illustrating the effect of digitising)
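The reconstruction formula can be tried numerically, although in practice the infinite sum must be truncated. The sketch below uses the hypothetical test signal $\sin 2\pi t$, whose angular frequency $2\pi$ lies below the assumed band limit $l = 10\pi$. It keeps the samples with $|k| \le 2000$ and compares the series with the true signal between sample points. Because of the truncation the agreement is close but not exact.

```python
import math


def reconstruct(samples, l, t):
    """Truncated sampling-theorem series evaluated at time t.

    samples maps k -> X(pi k / l); l is the angular band limit of X(t).
    Only the k present in `samples` are used, so the result is approximate."""
    total = 0.0
    for k, xk in samples.items():
        arg = l * t - math.pi * k            # equals l (t - pi k / l)
        total += xk * (1.0 if arg == 0.0 else math.sin(arg) / arg)
    return total


def signal(t):
    """Hypothetical band-limited test signal with angular frequency 2 pi."""
    return math.sin(2 * math.pi * t)


l = 10 * math.pi                          # band limit above 2 pi; spacing pi / l = 0.1
samples = {k: signal(math.pi * k / l) for k in range(-2000, 2001)}
for t in (0.05, 0.37):
    print(t, reconstruct(samples, l, t), signal(t))
```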

Examples of Spectra (Fig. 8.9)

In most of what follows we shall assume $\Delta t = 1$, i.e. sampling at one observation per second. If the spectrum is $f(\lambda)$ and $\Delta t$ is not 1, we can easily get the new version from

where $\omega = \lambda/\Delta t$ in radians.


In reality, in most fields and especially in seismology, we are forced to work with a digitised record, and we shall assume that we have a discrete series $X(t)$ and observe $x(0), x(1), x(2), \ldots, x(N)$. Thus our unknown autocovariance $\gamma_{xx}(u)$ (autocorrelation) has discrete values and is estimated by

$$\hat\gamma_{xx}(u) = \frac{1}{N}\sum_{t=1}^{N-|u|}x(t)\,x(t+|u|)$$

while the power spectrum is $f(\omega)$, where

$$f(\omega) = \frac{1}{2\pi}\sum_{u=-\infty}^{\infty}\gamma_{xx}(u)\cos\omega u.$$

$f(\omega)$ is real and symmetric, and can be estimated by

$$\hat f(\omega) = \frac{1}{2\pi}\sum_{|u|<N}\hat\gamma_{xx}(u)\cos\omega u,$$

but see later.
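We catalogue some models for reference.
Examples
10. Purely random process, discrete case: Here

$$\rho_{xx}(u) = \begin{cases}1, & u = 0\\ 0, & \text{otherwise,}\end{cases}$$

i.e. $\gamma_{xx}(0) = \sigma^2$ and $\gamma_{xx}(k) = 0$ otherwise. Hence

$$f(\omega) = \frac{\sigma^2}{2\pi},$$

a flat spectrum over the range $-\pi$ to $\pi$.
11. (i) Markov process, continuous time:

$$\dot X(t) + aX(t) = a(t),$$

i.e. $\rho_{xx}(u) = e^{-a|u|}$, so

$$f(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-a|u|}\cos\omega u\,du.$$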
(ii) Markov process, discrete time:

$$X(t+1) - aX(t) = a(t), \qquad \gamma_{xx}(u) = a^{u}\gamma_{xx}(0),$$

so

$$f(\omega) = \frac{\sigma^2}{2\pi(1 - 2a\cos\omega + a^2)}.$$
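The integral in Example 11(i) has the standard closed form $\int_{-\infty}^{\infty}e^{-a|u|}\cos\omega u\,du = 2a/(a^2+\omega^2)$, so that $f(\omega) = a/\big(\pi(a^2+\omega^2)\big)$. That closed form is quoted here rather than derived in the text. The sketch below confirms it numerically with a midpoint rule on $[0, u_{\max}]$, using the evenness of the integrand. The cut-off `u_max` must be many multiples of $1/a$; the defaults are illustrative.

```python
import math


def markov_spectrum_numeric(omega, a, u_max=40.0, n=200_000):
    """(1/2pi) * integral of exp(-a|u|) cos(omega u) du over the real line,
    using evenness (2 * integral over [0, u_max]) and the midpoint rule."""
    du = u_max / n
    s = sum(math.exp(-a * (j + 0.5) * du) * math.cos(omega * (j + 0.5) * du)
            for j in range(n))
    return 2.0 * s * du / (2.0 * math.pi)


def markov_spectrum_exact(omega, a):
    """Closed form a / (pi (a^2 + omega^2))."""
    return a / (math.pi * (a * a + omega * omega))


for omega in (0.0, 0.5, 2.0):
    print(omega, markov_spectrum_numeric(omega, 1.0), markov_spectrum_exact(omega, 1.0))
```

The spectrum falls off like $1/\omega^2$. Most of the energy of this process therefore lies at frequencies below about $a$.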
Fig. 8.10 (a filter L with input X(t) and output Y(t))

3 FILTERS AND CONVOLUTIONS


Most time series models are linear and can be thought of as a filter applied to an input. One might consider a model with input $X(t)$ and output $Y(t)$. The box in Fig. 8.10 (presumably black) can be thought of as filtering the input $X(t)$, i.e. $Y(t) = L\{X(t)\}$.
For continuous series the sort of model that is simple (and useful) is of the form

$$(D^p + a_1D^{p-1} + \cdots + a_p)\,Y(t) = X(t)$$

or perhaps

$$Y(t) = (D^q + a_1D^{q-1} + \cdots + a_q)\,X(t)$$

or even a combination

$$(D^p + a_1D^{p-1} + \cdots + a_p)\,Y(t) = (D^q + b_1D^{q-1} + \cdots + b_q)\,X(t).$$

If $X(t)$ is of a known form we can come to some conclusions about the properties of $Y(t)$. In fact we would hope to infer the form of $L$ from knowledge of $X(t)$ and $Y(t)$. First we shall just look at the ways we can characterise the effect of $L$.
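In terms of the spectrum, say

$$Y(t) = \sum_j a_jX(t-j);$$

then

$$\begin{aligned}
\gamma_{yy}(u) &= E\big(Y(t)Y(t+u)\big)\\
&= \sum_k\sum_j a_ka_j\,E\big[X(t-k)X(t+u-j)\big]\\
&= \sum_k\sum_j a_ka_j\,\gamma_{xx}(u+k-j).
\end{aligned}$$

Multiplying by $e^{-i\omega u}/2\pi$ and summing over $u$ gives

$$f_y(\omega) = \Big|\sum_j a_je^{-ij\omega}\Big|^2 f_x(\omega).$$

If

$$\Phi(B)Y(t) = \Theta(B)X(t)$$

where

$$\Phi(z) = 1 + \sum_{r=1}^{p}\phi_rz^r, \qquad \Theta(z) = 1 + \sum_{r=1}^{q}\theta_rz^r,$$

then

$$f_y(\omega)\,|\Phi(e^{i\omega})|^2 = f_x(\omega)\,|\Theta(e^{i\omega})|^2.$$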
Let $L$ be a filter, say giving an output $Y(t)$ of the form

$$Y(t) = \sum_{u=-\infty}^{\infty}g_uX(t-u).$$

This can be put into any reasonable form by the correct choice of $g_u$, e.g. $g_0 = 1$, $g_1 = 0.7$, $g_u = 0$ otherwise. Then from the discussion above

$$f_y(\omega) = \Big|\sum_{u=-\infty}^{\infty}g_ue^{-iu\omega}\Big|^2 f_x(\omega), \qquad -\pi < \omega < \pi.$$

$\Gamma(\omega) = \sum_{u=-\infty}^{\infty}g_ue^{-iu\omega}$ is called the transfer function of the filter. Since $\Gamma(\omega)$ is complex valued we can in general write it as

$$\Gamma(\omega) = r(\omega)\,e^{i\theta(\omega)}.$$

$r = |\Gamma|$, a function of $\omega$, is called the gain, while $\theta$ is the phase or phase shift.
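For the example weights $g_0 = 1$, $g_1 = 0.7$ the transfer function, gain and phase can be evaluated directly from the definition. In the minimal sketch below the filter is stored as a mapping from lag $u$ to weight $g_u$, a representation chosen here only for convenience. For this filter $r(\omega)^2 = 1.49 + 1.4\cos\omega$, so the gain falls from 1.7 at $\omega = 0$ to 0.3 at $\omega = \pi$: it is a mild low-pass filter.

```python
import cmath
import math


def transfer_function(g, omega):
    """Gamma(omega) = sum_u g_u exp(-i u omega); g maps lag u -> weight g_u."""
    return sum(gu * cmath.exp(-1j * u * omega) for u, gu in g.items())


g = {0: 1.0, 1: 0.7}                  # g_0 = 1, g_1 = 0.7, all other g_u = 0
for omega in (0.0, math.pi / 2, math.pi):
    gamma = transfer_function(g, omega)
    gain, phase = abs(gamma), cmath.phase(gamma)
    print(f"omega={omega:.3f}  gain={gain:.4f}  phase={phase:.4f}")
```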
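You may also find it useful to note that for a linear filter

$$L[aY(t)] = aL[Y(t)], \qquad L[X(t) + Y(t)] = L[X(t)] + L[Y(t)],$$

and, for a time-invariant filter, if $L[X(t)] = Y(t)$ then $L[X(t+h)] = Y(t+h)$.
In continuous time the analogue is

$$L(X(t)) = \int_{-\infty}^{\infty}h(u)X(t-u)\,du, \qquad f_y(\omega) = \Big|\int_{-\infty}^{\infty}h(u)e^{-i\omega u}\,du\Big|^2 f_x(\omega).$$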

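Special Forms of Filters

(a)
$$|\Gamma(\omega)|^2 = \begin{cases}1, & \omega_1 \le |\omega| \le \omega_2\\ 0, & \text{otherwise}\end{cases}$$
is called a band-pass filter.

(b)
$$|\Gamma(\omega)|^2 = \begin{cases}1, & |\omega| \le \omega_0\\ 0, & |\omega| > \omega_0\end{cases}$$
is a low pass filter.

(c)
$$|\Gamma(\omega)|^2 = \begin{cases}1, & |\omega| > \omega_0\\ 0, & |\omega| \le \omega_0\end{cases}$$
is a high pass filter.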
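Examples
Let $a(t)$ be a white noise process, i.e. $f_a(\omega) = \sigma^2/2\pi$.
12. Suppose

$$Y(t) - aY(t-1) = a(t), \quad\text{i.e.}\quad (1 - aB)\,Y(t) = a(t),$$

so

$$f_Y(\omega) = \frac{\sigma^2}{2\pi(1 - 2a\cos\omega + a^2)}.$$

13.
$$Y(t) - aY(t-1) - \beta Y(t-2) = a(t), \quad\text{i.e.}\quad (1 - aB - \beta B^2)\,Y(t) = a(t),$$

$$f_Y(\omega) = \frac{\sigma^2}{2\pi\,|1 - ae^{-i\omega} - \beta e^{-2i\omega}|^2}.$$

14.
$$Y(t) - aY(t-1) = a(t) - \beta a(t-1),$$

$$f_Y(\omega) = \frac{\sigma^2\,|1 - \beta e^{-i\omega}|^2}{2\pi\,|1 - ae^{-i\omega}|^2}.$$

Any spectrum which can be written as a ratio of polynomials of this kind is said to be rational.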
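All three examples are instances of one formula, $f_Y(\omega) = \sigma^2|\Theta(e^{-i\omega})|^2 / \big(2\pi|\Phi(e^{-i\omega})|^2\big)$. The sketch below evaluates it with the text's convention $\Phi(z) = 1 + \phi_1z + \cdots$, so Example 12 corresponds to `phi = [-a]`. The values $a = 0.5$ and $\beta = 0.3$ are arbitrary.

```python
import cmath
import math


def rational_spectrum(omega, phi, theta, sigma2=1.0):
    """sigma2 |Theta(e^{-i omega})|^2 / (2 pi |Phi(e^{-i omega})|^2), where
    Phi(z) = 1 + phi[0] z + phi[1] z^2 + ... and Theta(z) = 1 + theta[0] z + ..."""
    z = cmath.exp(-1j * omega)
    phi_z = 1 + sum(c * z ** (r + 1) for r, c in enumerate(phi))
    theta_z = 1 + sum(c * z ** (r + 1) for r, c in enumerate(theta))
    return sigma2 * abs(theta_z) ** 2 / (2 * math.pi * abs(phi_z) ** 2)


a, beta = 0.5, 0.3
for omega in (0.0, math.pi / 2, math.pi):
    ex12 = rational_spectrum(omega, [-a], [])          # (1 - aB) Y(t) = a(t)
    ex14 = rational_spectrum(omega, [-a], [-beta])     # ... = (1 - beta B) a(t)
    closed12 = 1 / (2 * math.pi * (1 - 2 * a * math.cos(omega) + a * a))
    print(f"{omega:.3f}  ex12={ex12:.4f} (closed form {closed12:.4f})  ex14={ex14:.4f}")
```

With empty `phi` and `theta` the function returns the flat white-noise spectrum $\sigma^2/2\pi$.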
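Now the output spectrum is $f_y(\omega)$ and

$$f_y(\omega) = G(\omega)f_x(\omega), \qquad G(\omega) = |\Gamma(\omega)|^2.$$

Now we know that $G(\omega)$ and $f_x(\omega)$ are Fourier transforms and so, by the (discrete) convolution results,

$$\gamma_{yy}(u) = \sum_j g(j)\,\gamma_{xx}(u - j),$$

where the $g(j)$ are the Fourier coefficients of $G$, i.e. $G(\omega) = \sum_j g(j)e^{-ij\omega}$ (for the filter above, $g(j) = \sum_k g_kg_{k+j}$). Explicitly, if we write $z = e^{-i\omega}$, so that $2\pi f_x(\omega) = \sum_v\gamma_{xx}(v)z^v$ and similarly for $f_y$, then

$$\sum_{u=-\infty}^{\infty}\gamma_{yy}(u)z^u = \sum_{u=-\infty}^{\infty}g(u)z^u\cdot\sum_{v=-\infty}^{\infty}\gamma_{xx}(v)z^v = \sum_{u=-\infty}^{\infty}\sum_{v=-\infty}^{\infty}g(u)\gamma_{xx}(v)z^{u+v}.$$

Equating coefficients of $z^u$,

$$\gamma_{yy}(u) = \sum_{s=-\infty}^{\infty}g(s)\,\gamma_{xx}(u - s).$$

Thus the filter is just a convolution!

Fig. 8.11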

In many practical situations we know the input $X(t)$ and the output $Y(t)$ but do not know the convolution or, equivalently, the filter. Our next aim is thus to try to estimate the filter, or to deconvolute. We may also have to expand our ideas to multiple inputs and outputs.
We note in passing that if we have an autoregressive model

$$\Phi(B)X(t) = a(t)$$

it can be written as

$$X(t) = \Phi^{-1}(B)\,a(t),$$

i.e. as a moving average model, only if $\Phi^{-1}(B)$ exists as a convergent power series in $B$. The condition for this (and in fact for the stationarity of $X(t)$) is that the roots of $\Phi(z)$ lie outside the unit circle.
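For a second-order autoregression the root condition can be checked with the quadratic formula. The sketch below tests whether both roots of $\Phi(z) = 1 + \phi_1z + \phi_2z^2$ lie outside the unit circle. The second test case, $1 - 1.5B + 0.5B^2 = (1 - B)(1 - 0.5B)$, is chosen here to show a unit root. The function names are illustrative.

```python
import cmath


def ar2_roots(phi1, phi2):
    """Roots of Phi(z) = 1 + phi1 z + phi2 z^2 (requires phi2 != 0)."""
    disc = cmath.sqrt(phi1 * phi1 - 4.0 * phi2)
    return (-phi1 + disc) / (2.0 * phi2), (-phi1 - disc) / (2.0 * phi2)


def is_stationary_ar2(phi1, phi2):
    """True when both roots of Phi(z) lie strictly outside the unit circle."""
    return all(abs(r) > 1.0 for r in ar2_roots(phi1, phi2))


print(is_stationary_ar2(0.1, 0.2))    # the earlier example 1 + 0.1B + 0.2B^2: True
print(is_stationary_ar2(-1.5, 0.5))   # (1 - B)(1 - 0.5B) has a unit root: False
```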

Digital Filters (Linear Filters in Discrete Time)


(i) $Y(t) = X(t) - X(t-1)$ is the differencing filter; here $|\Gamma(\omega)|^2 = |1 - e^{-i\omega}|^2$ (see Fig. 8.11), and the gain is

$$r(\omega) = |1 - e^{-i\omega}| = 2\left|\sin\frac{\omega}{2}\right|$$

while the phase is

$$\theta(\omega) = \begin{cases}(\pi - \omega)/2, & \omega > 0\\ -(\pi + \omega)/2, & \omega < 0\end{cases}$$

(see Fig. 8.11).
This filter is often used to reduce a series to stationarity. Suppose $X(t) = \alpha + \beta t + Z(t)$ where $Z(t)$ is stationary. Clearly $X(t)$ is not stationary, while

$$(1 - B)X(t) = \beta + Z(t) - Z(t-1)$$

will be.
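Both properties of the differencing filter can be checked in a few lines. The sketch below does two things:

- It applies $(1 - B)$ to a noise-free linear trend, which should leave the constant $\beta$.
- It evaluates the gain and phase of $\Gamma(\omega) = 1 - e^{-i\omega}$ at $\omega = \pm 1$ to compare with the formulas above.

The trend coefficients are arbitrary.

```python
import cmath
import math


def difference(x, times=1):
    """Apply the differencing filter (1 - B) `times` times to the list x."""
    for _ in range(times):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x


def differencing_gain_phase(omega):
    """Gain and phase of Gamma(omega) = 1 - exp(-i omega)."""
    gamma = 1 - cmath.exp(-1j * omega)
    return abs(gamma), cmath.phase(gamma)


trend = [3.0 + 0.5 * t for t in range(10)]     # alpha = 3, beta = 0.5, no noise
print(difference(trend))                        # every entry equals beta = 0.5
for omega in (1.0, -1.0):
    print(omega, differencing_gain_phase(omega), 2 * abs(math.sin(omega / 2)))
```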
Exercise 4
Show that a series of the form

$$X(t) = \alpha + \beta t + \gamma t^2 + Z(t)$$

can be made stationary by differencing twice. Hence convince yourself that any polynomial in $t$ can be removed by differencing often enough.

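(ii) Averaging filter

$$Y(t) = \frac{1}{R}\sum_{j=0}^{R-1}X(t-j), \qquad t = 0, \pm1, \ldots$$

$$\Gamma(\lambda) = \frac{1}{R}\sum_{j=0}^{R-1}e^{-i\lambda j} = e^{-i\lambda(R-1)/2}\,\frac{\sin(\lambda R/2)}{R\sin(\lambda/2)}, \qquad -\pi < \lambda < \pi.$$

For example, with $R = 2$ the gain is $r(\lambda) = |\Gamma(\lambda)| = \cos(\lambda/2)$ and the phase is $\theta(\lambda) = -\lambda/2$.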

(iii) We can almost avoid phase shifts with

$$Y(t) = \frac{1}{R}\sum_{j=-(R-1)/2}^{(R-1)/2}X(t-j)$$

since here

$$\Gamma(\lambda) = \frac{\sin(\lambda R/2)}{R\sin(\lambda/2)}, \qquad -\pi < \lambda \le \pi$$

(see Fig. 8.12), so that

$$\theta(\lambda) = \begin{cases}0 & \text{when } \sin(\lambda R/2) > 0\\ \pm\pi & \text{otherwise.}\end{cases}$$

Notice that this filter is zero when

$$\frac{\lambda R}{2} = k\pi,$$

i.e.

$$\lambda = \frac{2k\pi}{R}, \qquad k \neq 0.$$

For this reason this filter is widely used to remove periodic disturbances from the spectrum.
Suppose our series gives the quarterly sales of ice cream in London. One would expect a periodic peak in the summer quarter and a corresponding blip in the spectrum. This will occur at frequency $\omega = \pi/2$. By choosing $R = 4$ for the above filter we remove the oscillation from the series and the blip from the spectrum. A similar filter was used in an engineering example where $x(t)$ measured the irregularity at a distance $t$ down a railway track. Clearly the constant bump at the rail joints could be removed in this way.
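The ice cream example can be imitated with an invented quarterly pattern plus a slow linear trend; the data are hypothetical. The sketch below uses the one-sided averaging filter (ii) with $R = 4$ rather than the centred form (iii), because the centred sum needs $(R-1)/2$ to be an integer. Both have the same zeros at $\lambda = 2k\pi/R$. Averaging over four quarters removes the seasonal swing completely and leaves a straight line. The gain at $\omega = \pi/2$ is zero up to rounding.

```python
import cmath
import math


def moving_average(x, R):
    """Y(t) = (1/R) sum_{j=0}^{R-1} X(t - j) for t = R-1, ..., len(x)-1."""
    return [sum(x[t - j] for j in range(R)) / R for t in range(R - 1, len(x))]


def averaging_gain(lam, R):
    """|(1/R) sum_{j=0}^{R-1} exp(-i lam j)|, the gain of the averaging filter."""
    return abs(sum(cmath.exp(-1j * lam * j) for j in range(R))) / R


seasonal = [10.0, 30.0, 15.0, 5.0] * 6                  # hypothetical quarterly pattern (period 4)
sales = [s + 0.5 * t for t, s in enumerate(seasonal)]   # plus a slow upward trend
print(moving_average(sales, 4))                          # seasonal swing gone: a straight line
print(averaging_gain(math.pi / 2, 4))                    # essentially zero at omega = pi/2
```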

Fig. 8.12

(iv) Finite Data Filter

Usually we only have a finite length of recorded data, say

$$Y(t) = X(t) \quad\text{for } t = 1, \ldots, N.$$

This can be viewed as a filter

$$Y(t) = a_tX(t), \qquad t = 0, \pm1, \pm2, \ldots$$

with

$$a_t = \begin{cases}1 & \text{if } t = 1, 2, \ldots, N\\ 0 & \text{otherwise.}\end{cases}$$

By suitable averaging, or by other methods of choosing filters, one can enhance or suppress more or less desirable characteristics in one's record. This is in fact often done on amplifiers, where there is a high frequency cut-off for playing poor source material.

Estimation
This is a wide subject and we just look at two simple cases.
As in practice we will use computational techniques, we shall only work with discrete series. In real life that is all we ever observe! For a discrete series the obvious analogues of the differential models are difference equations such as:
(a) $Y(t) + a_1Y(t-1) + a_2Y(t-2) + \cdots + a_pY(t-p) = X(t)$, viz. $(1 + a_1B + a_2B^2 + \cdots + a_pB^p)\,Y(t) = X(t)$, the autoregressive model of order $p$.
(b) $Y(t) = X(t) + b_1X(t-1) + \cdots + b_qX(t-q)$, i.e. $Y(t) = (1 + b_1B + \cdots + b_qB^q)\,X(t)$, the moving average model of order $q$.
(c) The mixed model $(1 + a_1B + \cdots + a_pB^p)\,Y(t) = (1 + b_1B + \cdots + b_qB^q)\,X(t)$, the autoregressive moving average model (ARMA) of order $p, q$.
We can find the correlation structure for these models. We consider just the simple autoregressive case

$$A(B)Y(t) = X(t) \qquad (1)$$
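and suppose $X(t)$ is a noise signal, with variance $\sigma^2$, which is unrelated to past values of $X(t)$ and $Y(t)$. Then multiplying $Y(t)$ in (1) by $Y(t-s)$ we have

$$Y(t)Y(t-s) + a_1Y(t-1)Y(t-s) + \cdots + a_pY(t-p)Y(t-s) = Y(t-s)X(t).$$

Taking expectations gives, for $s \ge 1$,

$$\gamma_{yy}(s) + a_1\gamma_{yy}(s-1) + \cdots + a_p\gamma_{yy}(s-p) = 0,$$

i.e.

$$\begin{aligned}
\gamma_{yy}(1) + a_1\gamma_{yy}(0) + \cdots + a_p\gamma_{yy}(p-1) &= 0\\
\gamma_{yy}(2) + a_1\gamma_{yy}(1) + a_2\gamma_{yy}(0) + \cdots + a_p\gamma_{yy}(p-2) &= 0\\
&\;\;\vdots
\end{aligned}$$

These are called the Yule-Walker equations. Together with

$$\gamma_{yy}(0) + a_1\gamma_{yy}(1) + \cdots + a_p\gamma_{yy}(p) = \sigma^2,$$

which we obtain by taking $s = 0$ and noting that $E(Y(t)X(t)) = E(X(t)X(t)) = \sigma^2$, we can find $\gamma_{yy}(u)$. Suppose we write these equations in matrix form, using the sample values $r_{yy}(u)$ and dropping the suffix for clarity. This gives $R\mathbf{a} = -\mathbf{r}$, where $\mathbf{r}^T = (r(1), r(2), \ldots, r(p))$, $\mathbf{a}^T = (a_1, \ldots, a_p)$ and

$$R = \begin{pmatrix} r(0) & r(1) & r(2) & \cdots & r(p-1)\\ r(1) & r(0) & r(1) & \cdots & r(p-2)\\ r(2) & r(1) & r(0) & \cdots & r(p-3)\\ \vdots & & & \ddots & \vdots\\ r(p-1) & r(p-2) & r(p-3) & \cdots & r(0)\end{pmatrix}.$$

Then

$$\mathbf{a} = -R^{-1}\mathbf{r}.$$

Solving these equations is often known as Wiener filtering. In practical problems we can use the rather special form of $R$ (a Toeplitz matrix) to speed up computation. The algorithm for this has been rediscovered on several occasions.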
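One well-known way of exploiting the Toeplitz structure is the Levinson-Durbin recursion. It is probably the kind of algorithm alluded to above, though the text does not name it. The sketch below solves $R\mathbf{a} = -\mathbf{r}$ in the text's sign convention $Y(t) + a_1Y(t-1) + \cdots = X(t)$. It also returns the noise variance $\sigma^2$ given by the extra equation above. It is an illustrative implementation, not the one the authors had in mind. The function name is hypothetical.

```python
def levinson_durbin(r, p):
    """Solve the Yule-Walker equations R a = -r (R Toeplitz) by the
    Levinson-Durbin recursion.

    r: [r(0), r(1), ..., r(p)], autocovariances (or sample values).
    Returns (a, s2): a = [a_1, ..., a_p] for the model
    Y(t) + a_1 Y(t-1) + ... + a_p Y(t-p) = X(t), and s2 = var(X(t))."""
    if r[0] <= 0:
        raise ValueError("r(0) must be positive")
    phi = []          # phi_j = -a_j: coefficients of the one-step predictor
    s2 = r[0]         # prediction-error variance at the current order
    for m in range(1, p + 1):
        acc = r[m] - sum(phi[j] * r[m - 1 - j] for j in range(m - 1))
        kappa = acc / s2                      # reflection coefficient at order m
        phi = [phi[j] - kappa * phi[m - 2 - j] for j in range(m - 1)] + [kappa]
        s2 *= 1.0 - kappa * kappa
    return [-c for c in phi], s2


print(levinson_durbin([1.0, 0.5, 0.25], 2))   # AR(1) autocorrelations: a_1 = -0.5, a_2 = 0
```

The recursion needs of order $p^2$ operations, against order $p^3$ for a general linear solve.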
These equations are of little help when we have a moving average
component and in this case we must resort to a non-linear optimisation
technique. The best of these maximise the likelihood or formulate the
problem in a Kalman filter context.
To estimate the spectrum the traditional estimate is

$$\hat f(\omega) = \frac{1}{2\pi}\sum_{s=-m}^{m}\lambda_s\,\hat\gamma_{xx}(s)\cos s\omega$$

where $m$ is a chosen integer and $\lambda_s$ is a decreasing sequence of weights. A computationally more convenient view is to choose a spectral window

$$W(\theta) = \frac{1}{2\pi}\sum_{s=-\infty}^{\infty}\lambda_se^{is\theta}$$

and then we can show that

$$\hat f(\omega) = \int_{-\pi}^{\pi}I_N(\theta)\,W(\omega - \theta)\,d\theta$$

where

$$I_N(\omega) = \frac{1}{2\pi N}\Big|\sum_{t=1}^{N}X(t)e^{-i\omega t}\Big|^2.$$

Since $I_N(\omega)$, the periodogram, is the squared modulus of a DFT, it is simple to compute via the FFT procedure outlined in Chapter 7. In fact it may be easier to compute $\hat f(\omega)$, and hence $r_{xx}(u)$, using the transform and FFTs than to compute $r_{xx}(u)$ directly. Many suggestions have been made for the form of $\lambda_s$ or $W(\theta)$. For example:

Daniell window

$$\lambda_s = \frac{\sin(\pi s/m)}{\pi s/m}, \qquad W(\theta) = \begin{cases}m/2\pi, & |\theta| \le \pi/m\\ 0, & \text{otherwise}\end{cases}$$

Tukey-Hanning window

$$\lambda_s = \begin{cases}\frac12\{1 + \cos(\pi s/m)\}, & |s| \le m\\ 0, & |s| > m\end{cases}$$

$$W(\theta) = \frac{1}{4\pi}\left\{\frac{\sin(m+\frac12)(\theta - \pi/m)}{2\sin\frac12(\theta - \pi/m)} + \frac{\sin(m+\frac12)\theta}{\sin\frac12\theta} + \frac{\sin(m+\frac12)(\theta + \pi/m)}{2\sin\frac12(\theta + \pi/m)}\right\}$$

The choice of $m$ is based on the minimum "bandwidth" required in the resolution of $\hat f(\omega)$.
An alternative approach is to fit an autoregressive model, say

$$\phi(B)X(t) = a(t),$$

and use filter theory to compute $\hat f(\omega)$ given the polynomial $\phi(B)$. This form of estimator, variously called the autoregressive estimate or maximum entropy estimate, does have virtues but also some vices compared to the weighted integral form.
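Both estimators can be tried on a short hypothetical series, a pure cosine at $\omega = 0.8$. The sketch below makes the following simplifications:

- It computes the periodogram by a direct $O(N^2)$ DFT rather than an FFT.
- It removes the sample mean before transforming, which the formula above does not do.
- It normalises by $1/(2\pi N)$ as written above.
- It applies the lag-window estimate with Tukey-Hanning weights and $m = 20$, an arbitrary choice.

The periodogram should peak at the Fourier frequency nearest 0.8. The smoothed estimate should be larger near 0.8 than far from it.

```python
import cmath
import math


def periodogram(x, omega):
    """I_N(omega) = |sum_{t=1}^N (x(t) - mean) e^{-i omega t}|^2 / (2 pi N)."""
    n = len(x)
    mean = sum(x) / n
    s = sum((xt - mean) * cmath.exp(-1j * omega * t) for t, xt in enumerate(x, start=1))
    return abs(s) ** 2 / (2 * math.pi * n)


def tukey_hanning_estimate(x, omega, m):
    """(1/2pi) sum_{|s|<=m} lambda_s gamma_hat(s) cos(s omega),
    with lambda_s = (1 + cos(pi s / m)) / 2."""
    n = len(x)
    mean = sum(x) / n
    d = [xt - mean for xt in x]
    total = 0.0
    for s in range(-m, m + 1):
        lam = 0.5 * (1.0 + math.cos(math.pi * s / m))
        gamma_hat = sum(d[t] * d[t + abs(s)] for t in range(n - abs(s))) / n
        total += lam * gamma_hat * math.cos(s * omega)
    return total / (2 * math.pi)


x = [math.cos(0.8 * t) for t in range(1, 201)]            # hypothetical test series
freqs = [2 * math.pi * j / len(x) for j in range(1, len(x) // 2)]
peak = max(freqs, key=lambda w: periodogram(x, w))
print(round(peak, 3))                                     # a Fourier frequency next to 0.8
print(tukey_hanning_estimate(x, 0.8, 20) > tukey_hanning_estimate(x, 2.0, 20))
```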

Rationale
Given a model as in Fig. 8.13, the approach used and needed may well vary with the point of interest. An engineer may consider $a(t)$ to be a "signal" while $X(t)$ might be the received signal and noise. The aim here is to detect the signal in the noise.
An economist given $X(t)$ may try to deduce $L$ with slight knowledge of $a(t)$. In seismic work we provide $a(t)$ as a sound pulse, perhaps from a towed airgun array, and record $X(t)$. In this case the question is: given $X(t)$ and $a(t)$, can we find $L$? As $L$ is a convolution, the problem came to be described as "deconvolution". Since we know $a(t)$ and $X(t)$ we can try to unwind the convolution. Naturally this brings problems in its train which are really beyond the bounds of this elementary introduction, and we refer the reader to the literature. Beware, however: there is no common system of notation, and one must always find out what the author intends a symbol to represent.
Fig. 8.13 (a filter L with input a(t) and output X(t))

APPENDIX A
Expectation operator E.
Given a pair of random variables $X$ and $Y$ and constants $a$, $b$ we can show
(i) $E(aX) = aE(X)$
(ii) $E(aX + bY) = aE(X) + bE(Y)$
(iii) $\operatorname{cov}(X, Y) = E(XY) - E(X)E(Y)$
(iv) $\operatorname{var}(aX + bY) = a^2\operatorname{var}(X) + b^2\operatorname{var}(Y) + 2ab\operatorname{cov}(X, Y)$
Chapter 9

APPLICATIONS

To conclude our review of the mathematics behind signal processing we shall look at some rather specific applications. We will discuss
(i) wavelet analysis
(ii) predictive deconvolution.
As you might expect, our survey will not be comprehensive, but we expect that our readers will be able to read the original articles with some understanding.

1 WAVELETS
Perhaps the best source is
E. Robinson, Physical Applications of Stationary Time Series, Griffin (1980) (especially Chapters 5 and 7).
Another useful text is
E. Robinson and M. Silva, Digital Foundations of Time Series Analysis, Vol. 2, Holden-Day (1981) (see Chapter 4).
Suppose we send a "signal" which we will view as a pulse or a wave. If we are to work with digital processes then we will observe the wave as a sequence of values at discrete time points. If $x(t)$ is the amplitude of the wave, we observe the sequence $x_k = x(k\Delta t)$, $k = \ldots, -3, -2, -1, 0, 1, \ldots$. The sampling interval is denoted by $\Delta t$. It will be convenient to write the string of values as a vector, viz.

$$\mathbf{x} = (\ldots, x_{-1}, x_0, x_1, \ldots)$$

If our wave has zero amplitude before time $t = 0$ we write

$$\mathbf{x} = (x_0, x_1, x_2, \ldots)$$

while if the amplitude is also zero for $t > e$ then

$$\mathbf{x} = (x_0, \ldots, x_e).$$

Robinson has called such signals wavelets if they start at time $t = 0$ and have finite energy. You will recall from previous chapters that if

$$\mathbf{x} = (b_0, b_1, \ldots)$$

then the energy of the wave is

$$\sum_{j=0}^{\infty}b_j^2.$$

Thus $(6, -2, 1)$ is a wavelet with amplitude 6 at time $t = 0$, $-2$ at time $t = 1$ and 1 at time $t = 2$, assuming unit time intervals, and its energy is $36 + 4 + 1 = 41$. The signal

$$\left(1, \tfrac13, \tfrac1{3^2}, \ldots, \tfrac1{3^t}, \ldots\right)$$

has amplitude $3^{-t}$ at time $t$. Its energy

$$1 + \frac19 + \frac1{9^2} + \cdots = \frac{1}{1 - \frac19} = \frac98$$

is finite (from the formula for a geometric series), so it is a wavelet.
We can think of $\delta = (1, 0, 0, \ldots, 0, \ldots) = (1)$ as being a "spike", and we will often find a sudden event being modelled as a spike. Also, many authors will suppose that the total energy is unity, that is $\sum b_j^2 = 1$. This is merely a useful scaling and we shall make use of it ourselves; however, it is not, for the moment, necessary.
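Using the ideas of our Chapters 7 and 1 we can define the convolution of two wavelets

$$\mathbf{x} = (x_0, x_1, \ldots, x_k, \ldots), \qquad \mathbf{y} = (y_0, y_1, \ldots, y_k, \ldots)$$

as

$$\mathbf{c} = \mathbf{x} * \mathbf{y}$$

where

$$c_k = \sum_{j=0}^{k}x_jy_{k-j}.$$

Explicitly the above is

$$c_0 = x_0y_0, \qquad c_1 = x_0y_1 + x_1y_0, \qquad c_2 = x_0y_2 + x_1y_1 + x_2y_0$$

and so on.
There is no difficulty in defining polynomials with the wavelet amplitudes as coefficients; thus for

$$\mathbf{x} = (x_0, x_1, \ldots, x_p)$$

we have

$$X(z) = x_0 + x_1z + x_2z^2 + \cdots + x_pz^p,$$

which is the z-transform, or at least the mathematical variant. In consequence the DFT is obtained by putting $z = e^{-i\omega}$:

$$X(e^{-i\omega}) = \sum_{j=0}^{p}x_je^{-ij\omega}.$$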
Applications 199

It is often simpler to look at the transformed version of our signal and recall that the convolution is the product of the transforms. When

$$\mathbf{x} = (x_0, \ldots, x_k) \quad\text{and}\quad \mathbf{y} = (y_0, \ldots, y_k)$$

then

$$X(z) = \sum_{j=0}^{k}x_jz^j, \qquad Y(z) = \sum_{j=0}^{k}y_jz^j$$

and

$$C(z) = X(z)Y(z) = \sum_{j=0}^{2k}c_jz^j.$$

Example 1
$\mathbf{x} = (1, 2)$ and $\mathbf{y} = (4, 3)$; then $\mathbf{c} = (4, 11, 6)$ since

$$c_0 = x_0y_0 = 1 \times 4 = 4, \qquad c_1 = x_0y_1 + x_1y_0 = 3 + 8 = 11, \qquad c_2 = x_1y_1 = 2 \times 3 = 6,$$

and

$$X(z) = 1 + 2z, \qquad Y(z) = 4 + 3z,$$

giving

$$C(z) = (1 + 2z)(4 + 3z) = 4 + 11z + 6z^2.$$

If $\mathbf{x}$ is convolved with the spike $\delta$ it is easily seen that

$$\mathbf{x} * \delta = \mathbf{x}$$

since the transform $\delta(z) = 1$ and thence

$$X(z)\delta(z) = X(z).$$
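The convolution of finite wavelets is a short double loop. The sketch below is a minimal version that stores wavelets as Python lists starting at $t = 0$, a representation chosen here for illustration. It reproduces Example 1, the spike identity $\mathbf{x} * \delta = \mathbf{x}$ and the energy of $(6, -2, 1)$.

```python
def convolve(x, y):
    """c_k = sum_j x_j y_{k-j} for finite wavelets starting at t = 0."""
    c = [0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            c[i + j] += xi * yj
    return c


def energy(x):
    """Total energy sum_j |x_j|^2 of a wavelet."""
    return sum(abs(v) ** 2 for v in x)


print(convolve([1, 2], [4, 3]))     # [4, 11, 6], as in Example 1
print(convolve([6, -2, 1], [1]))    # the spike delta = (1) leaves x unchanged: [6, -2, 1]
print(energy([6, -2, 1]))           # 41
```

The list `c` holds exactly the coefficients of the product polynomial $X(z)Y(z)$, which is the transform statement above.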

A wavelet which has its largest coefficient in magnitude first (on the extreme left) is said to be minimum delay, while if this coefficient is the last one (on the extreme right) it is said to be maximum delay. The expressions are fairly self-explanatory: the dominant energy term in the minimum delay wave arrives immediately!

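Often it is useful to think of a wavelet as being constructed of smaller ones; thus in our example $\mathbf{c} = (4, 11, 6)$ is constructed from the two 2-wavelets $(4, 3)$ and $(1, 2)$. In general, for any wavelet $\mathbf{c}$ we can take the z-transform $C(z)$ and find the solutions of

$$C(z) = \sum_{s=0}^{k}c_sz^s = 0.$$

Suppose they are $\alpha_1, \alpha_2, \ldots, \alpha_k$; then

$$C(z) = c_k(z - \alpha_1)(z - \alpha_2)(z - \alpha_3)\cdots(z - \alpha_k),$$

but $z - \alpha$ is the transform of a wavelet $(-\alpha, 1)$, and thence $C(z)$ is (apart from the constant $c_k$) a product of z-transforms. In consequence $\mathbf{c}$ is made up of the $k$-fold convolution

$$\mathbf{c} = c_k\,(-\alpha_1, 1) * (-\alpha_2, 1) * \cdots * (-\alpha_k, 1).$$

In general the $\alpha$'s can be complex and, for a real wavelet $\mathbf{c}$, will appear in conjugate pairs $(-\alpha, 1)$ and $(-\bar\alpha, 1)$. If we write out $C(z)$ explicitly we find

$$C(z) = c_k(z - \alpha_1)\cdots(z - \alpha_k) = c_k\big[z^k - (\alpha_1 + \cdots + \alpha_k)z^{k-1} + \cdots + (-1)^k\alpha_1\alpha_2\cdots\alpha_k\big].$$

Now if $(-\alpha_1, 1), (-\alpha_2, 1), \ldots$ are all minimum delay then $|\alpha_1| > 1$, $|\alpha_2| > 1$, $\ldots$, and in consequence

$$|c_0| = |c_k|\,|\alpha_1\alpha_2\cdots\alpha_k| > |c_k|,$$

so $\mathbf{c}$ is also minimum delay.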


These ideas are important, as we shall see when we turn to our next idea, that of deconvolution (see Fig. 9.1).
Given the input signal $\mathbf{x}$ and the output $\mathbf{y}$ we may reasonably ask "what is the filter?" We need to unravel $\mathbf{a} * \mathbf{x}$, or "deconvolute", to look inside the black box. In general this is a difficult problem, and especially so if one has no control over the input $\mathbf{x}$. Happily, in many engineering contexts we have a reasonable idea as to the nature of $\mathbf{x}$; thus we know the transmitted signal from a radar set and we receive the output (reflection). What we now wish to do is deconvolute to deduce what we can about the target. In geophysics we may think of the earth as consisting of layered strata (with or without a layer of water). When a transmitted wave strikes the interface between two sedimentary layers, some of the energy is transmitted and some is reflected. The coefficient of reflection is useful information, and the "primary event" is the reception of this reflected energy. There is a problem in that secondary reflections ("ghosts") can appear on the seismic trace and interfere with the primary event (see Fig. 9.2). While it is true that in exceptional cases it is possible for these extraneous signals to combine to enhance our resolution, it would be unwise to rely on this happening, especially if money is involved.
We reduce the effect of the multiples by stacking and deconvolution, which would normally be carried out in one operation. Here we shall just consider deconvolution. What is more, we shall assume that our input wave $\mathbf{x}$ is a spike and hence so is the primary reflection.
We now look at perhaps the simplest case, where we have a ghost reflection of a seismic shot (see Fig. 9.3).
Fig. 9.1 (a black box with input x and output y)

Fig. 9.2 (primary path from the surface through layers 1 to 3, with multiples; not all multiples shown)

Fig. 9.3 (surface and deep layer interface)

Fig. 9.4 (primary and ghost)

We assume that the ghost is the reflection of the shot pulse from a surface
velocity discontinuity and that it has much the same profile as the primary
impulse. Our aim is to disentangle the two signals to give a clear view of
the primary, see Fig. 9.4.
Suppose we regard the received signal as

$$\mathbf{y} = (1, 0, 0, \ldots, 0, k)$$

with the primary at time $t = 0$ and the ghost at time $t = d$. That is, it is a combination of primary and surface reflection with $|k| < 1$, since the ghost's energy must be less than that of the primary. Note that if we change our time scale then this can be written $(1, k)$. To devise a suitable disentangling filter it is easier to look at the z-transform of $\mathbf{y}$, $Y(z) = 1 + kz^d$.
We would like a filter $\mathbf{f}$ whose z-transform $F(z)$ satisfies

$$F(z)Y(z) = X(z),$$

that is, a filter which, when applied to the output $\mathbf{y}$, gives the input primary and removes the ghost. Since in this simple case $X(z) = \delta(z) = 1$ we have

$$F(z)(1 + kz^d) = 1$$

or $F(z) = (1 + kz^d)^{-1}$ is the ghost elimination filter. Since $|k| < 1$ this can be written

$$F(z) = 1 - kz^d + k^2z^{2d} - \cdots$$

giving a corresponding wavelet form

$$(1, 0, \ldots, 0, -k, 0, \ldots, 0, k^2, \ldots).$$
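In practice only finitely many terms of $F(z)$ can be used. The sketch below keeps the first `n_terms` terms of the series and applies them to the ghosted trace. It leaves the primary spike plus one small residual of size $(-k)^{n}$ at time $nd$, where $n$ is `n_terms`. This follows from $(1 + kz^d)\sum_{j<n}(-kz^d)^j = 1 - (-kz^d)^n$. The values $k = 0.5$ and $d = 3$ and the helper names are illustrative assumptions.

```python
def convolve(x, y):
    """Finite convolution c_k = sum_j x_j y_{k-j}."""
    c = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            c[i + j] += xi * yj
    return c


def ghost_trace(k, d):
    """y = (1, 0, ..., 0, k): primary at t = 0, ghost of strength k at t = d."""
    return [1.0] + [0.0] * (d - 1) + [k]


def ghost_filter(k, d, n_terms):
    """First n_terms terms of F(z) = (1 + k z^d)^(-1) = sum_j (-k)^j z^(j d)."""
    f = [0.0] * ((n_terms - 1) * d + 1)
    for j in range(n_terms):
        f[j * d] = (-k) ** j
    return f


k, d, n_terms = 0.5, 3, 5
out = convolve(ghost_trace(k, d), ghost_filter(k, d, n_terms))
print(out)   # spike at t = 0 plus one small residual -(-k)^n_terms at t = n_terms * d
```

Because $|k| < 1$ the residual shrinks geometrically as more terms are kept.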
We can apply the same technique to the elimination of reflections when working on water (see Fig. 9.5). From Fig. 9.5 we can see that the received primary reflection is added to the successive reflections between the water interfaces. If our measurements give motion towards the reflecting stratum, the signal received will be of the form

$$\mathbf{y} = (1, 0, \ldots, 0, -k, 0, \ldots, 0, k^2, \ldots)$$

where $-k$ is the second downward pulse, at time $d$, and $k^2$ is the third downward pulse, at time $2d$. We assume the surface has a reflection coefficient of $-1$, and $d$ is the (two-way) travel time in water.
Again the z-transform is

$$Y(z) = 1 - kz^d + k^2z^{2d} - k^3z^{3d} + \cdots = (1 + kz^d)^{-1}$$

on summing the geometric series.
In this case we have the filter relationship

$$F(z)Y(z) = 1$$

or

$$F(z) = 1 + kz^d$$

and

$$\mathbf{f} = (1, 0, \ldots, 0, k).$$

Fig. 9.5 (air, water and rock layers)

Fig. 9.6

After transforming back we have

$$\mathbf{y} * \mathbf{f} = (1, 0, 0, \ldots),$$

which we leave as an exercise.
As you can imagine, the models can be made ever more complex. For example, in the discussion above we may suppose that the reflections from the upper water interface reach the deep stratum and are reflected. Adding layers of strata makes life even more complex, as can be seen from the possible paths described in Fig. 9.6.
Given wavelets we can calculate autocorrelation and cross-correlation functions. Suppose $\mathbf{x}$ is a wavelet; then the autocovariance function is

$$\gamma_{xx}(k) = \sum_{s=-\infty}^{\infty}x_{s+k}\,\bar x_s,$$

a bar denoting the complex conjugate, since we may well have complex terms in the wavelet. Recall that the total energy in the wavelet is

$$\gamma_{xx}(0) = \sum_{s=-\infty}^{\infty}|x_s|^2.$$

Now from the definitions of Chapter 8 the autocorrelation is

$$r(k) = \gamma_{xx}(k)/\gamma_{xx}(0).$$

For simplicity we will follow the usual geophysical practice and assume that the total energy is unity, viz. $\gamma_{xx}(0) = 1$. In this case the autocorrelation is

$$r(k) = \gamma_{xx}(k).$$

In the same way the cross-correlation between two wavelets $\mathbf{x}$ and $\mathbf{y}$ is

$$r_{xy}(k) = \sum_{s=-\infty}^{\infty}x_{s+k}\,\bar y_s.$$

This scaling gives some advantages in simplicity; thus the cross-correlation generating function $f(z)$ is defined as

$$f(z) = \sum_k r_{xy}(k)z^k$$

and it is not difficult to show that, for real wavelets,

$$f(z) = X(z)Y(1/z).$$

2 PREDICTIVE DECONVOLUTION

We now have the apparatus to look at predictive deconvolution. We shall discuss the basic ideas and the following paper:
K. L. Peacock and S. Treitel, Predictive Deconvolution, Geophysics, Vol. 34, No. 2, 1969.
A more general discussion can be found in
E. Robinson (1980), cited in section 1.
Suppose we observe $y$, a filtered version of $x$, say $y = b * x$ (see Fig. 9.7), and we take the simplest non-trivial case, where $b = (b_0, b_1)$, so that
$$y_t = b_0 x_t + b_1 x_{t-1}.$$
Our objective, given $y$, is to calculate $x$, or to find $b^{-1}$ so that
$$b^{-1} * y = b^{-1} * b * x = x.$$
The easiest approach is via the transforms, and so we look at the z-transform of the filter relation
$$Y(z) = B(z)X(z).$$
If our deconvolution filter is $a$, with transform $A(z)$, we need $A(z)$ to satisfy
$$A(z)Y(z) = A(z)B(z)X(z) = X(z),$$
so $A(z) = B^{-1}(z)$, which in this case gives $A(z) = (b_0 + b_1 z)^{-1}$.
Using formal division,
$$A(z) = \frac{1}{b_0}\left(1 - kz + k^2 z^2 - \cdots\right), \quad \text{where } k = b_1/b_0,$$
which converges when $|k| < 1$, i.e. for a minimum delay wavelet $(b_0, b_1)$. Thus we have a solution, albeit as an infinite series, in the minimum delay case. In the maximum delay case we must be a little more careful.

[Fig. 9.7: the input x passes through the filter b to give the output y]

For maximum delay $|b_1/b_0| > 1$ and we can write
$$A(z) = (b_0 + b_1 z)^{-1} = (b_1 z)^{-1}\left(1 + (kz)^{-1}\right)^{-1}$$
$$= (b_1 z)^{-1}\left(1 - (kz)^{-1} + (kz)^{-2} - \cdots\right)$$
$$= b_1^{-1}\left(z^{-1} - k^{-1} z^{-2} + k^{-2} z^{-3} - \cdots\right).$$
Assuming $b_1 = 1$ for simplicity, the solution is
$$a = (\ldots, k^{-2}, -k^{-1}, 1),$$
with the last term at time $-1$, and it is called an anti-wavelet. Thus the maximum delay case can only have a solution if we allow anti-wavelets, and in consequence the convolution $y * a$ requires future values of $y$.
Even in the minimum delay case we must modify our filter, since an infinite number of terms cannot be used. A naive solution is just to disregard the terms after the $c$th, say, and use
$$(1, -k, k^2, \ldots, (-k)^c)$$
as an approximate deconvolution filter. To choose $c$ we look at the error between the ideal output and the approximation, where the error $e$ is
$$e = (1, 0, \ldots, 0) - (1, -k, \ldots, (-k)^c) * (1, k).$$
In transform terms
$$E(z) = 1 - \left(1 - kz + \cdots + (-k)^c z^c\right)(1 + kz)$$
$$= 1 - \left(1 - kz + \cdots + (-k)^c z^c\right) - \left(kz - k^2 z^2 + \cdots - (-k)^{c+1} z^{c+1}\right),$$
i.e.
$$E(z) = (-k)^{c+1} z^{c+1}.$$
Thus the error is $(0, -k)$ when the filter is just $(1)$, $(0, 0, k^2)$ when it is $(1, -k)$, and so on; in each case we can determine the energy left by the approximation, namely $k^{2(c+1)}$.
Thus

c   Wavelet                     Error energy
1   $(1, -k)$                   $k^4$
2   $(1, -k, k^2)$              $k^6$
3   $(1, -k, k^2, -k^3)$        $k^8$
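The table can be reproduced by convolving each truncated filter with $(1, k)$ and measuring what is left over. This is a sketch under the assumption k = 0.5 (any $|k| < 1$ would do); the function names are ours.

```python
def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def truncated_inverse(k, c):
    """(1, -k, k**2, ..., (-k)**c): the series for (1 + kz)**-1 cut after c."""
    return [(-k) ** j for j in range(c + 1)]


def error_energy(k, c):
    """Energy of e = (1, 0, ..., 0) - truncated_inverse * (1, k)."""
    out = convolve(truncated_inverse(k, c), [1.0, k])
    target = [1.0] + [0.0] * (len(out) - 1)
    return sum((t - o) ** 2 for t, o in zip(target, out))


k = 0.5                                  # illustrative minimum delay value
for c in (1, 2, 3):
    print(c, truncated_inverse(k, c), error_energy(k, c), k ** (2 * (c + 1)))
```

The last two columns agree, since the only surviving error term is $(-k)^{c+1}$ at time $c + 1$.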

Alternatively we can choose the approximate filter $(a_0, a_1, \ldots, a_c)$ which has minimum error energy. Since
$$e = (1, 0, \ldots, 0) - a * (1, k),$$
we can in theory determine the $a$'s. In the 2-wavelet case
$$e = (1, 0, 0) - (a_0, a_0 k + a_1, a_1 k),$$
so the error energy is
$$Q = (1 - a_0)^2 + (a_0 k + a_1)^2 + (a_1 k)^2.$$
To minimise $Q$ we take the partial derivatives and set these to zero, so
$$\frac{\partial Q}{\partial a_0} = -2(1 - a_0) + 2k(a_0 k + a_1) = 0,$$
$$\frac{\partial Q}{\partial a_1} = 2(a_0 k + a_1) + 2k^2 a_1 = 0,$$
and on solving
$$a_0 = (1 + k^2)/(1 + k^2 + k^4),$$
$$a_1 = -k/(1 + k^2 + k^4).$$
It is not difficult to check that these give a minimum; indeed in this case $Q$ can be plotted as a function of $a_0$ and $a_1$.
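A short check of the two-term least squares filter, as a sketch with k = 0.5 assumed: the two linear equations obtained from the partial derivatives are solved by Cramer's rule, and the resulting error energy is compared with that of the truncated filter $(1, -k)$.

```python
def error_energy(a0, a1, k):
    """Q(a0, a1) = (1 - a0)**2 + (a0*k + a1)**2 + (a1*k)**2."""
    return (1 - a0) ** 2 + (a0 * k + a1) ** 2 + (a1 * k) ** 2


def best_two_term_filter(k):
    """Solve dQ/da0 = 0, dQ/da1 = 0, i.e.
         (1 + k**2) a0 + k a1          = 1
         k a0          + (1 + k**2) a1 = 0
    by Cramer's rule."""
    p = 1 + k * k
    det = p * p - k * k                  # = 1 + k**2 + k**4
    return p / det, -k / det


k = 0.5
a0, a1 = best_two_term_filter(k)
print(a0, a1)                            # least squares filter
print(error_energy(a0, a1, k))           # smaller than ...
print(error_energy(1.0, -k, k))          # ... the truncated filter's k**4
```

The minimum energy works out as $1 - a_0 = k^4/(1 + k^2 + k^4)$, a little below the $k^4$ left by simple truncation.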
We can extend this least squares energy approach to the more practical case using correlations of signals. This can be viewed as a prediction problem and hence is often described as predictive deconvolution.
Suppose the output $y$ is related to $x$ via
$$y_t = \sum_j a_j x_{t-j},$$
that is, $y$ is the convolution $a * x$. We look at the problem of predicting $x_{t+\alpha}$ for some $\alpha > 0$ given $y_0, y_1, \ldots, y_t$; in fact we will take the rather simpler case and consider predicting $x_{t+\alpha}$ by $\hat{x}_{t+\alpha}$, where $\hat{x}_{t+\alpha} = y_t$ for some $t$. The prediction error $\varepsilon_{t+\alpha}$ is just
$$\varepsilon_{t+\alpha} = x_{t+\alpha} - \hat{x}_{t+\alpha} = x_{t+\alpha} - y_t$$
for $\alpha = 1, 2, 3, \ldots$.
Taking z-transforms, with $\hat{x}_{t+\alpha} = y_t$, gives
$$z^{-\alpha}E(z) = X(z)z^{-\alpha} - X(z)A(z)$$
or
$$E(z) = \left(1 - z^{\alpha}A(z)\right)X(z).$$
We would like to find the $A(z)$ which gives the "best predictions", i.e. minimises the error energy. Suppose we choose the coefficients $a_j$, $j = 0, 1, \ldots$, to minimise
$$Q = \sum_t (x_{t+\alpha} - \hat{x}_{t+\alpha})^2$$
as we did in the 2-wavelet case, using $\hat{x}_{t+\alpha} = y_t$. Here
$$Q = \sum_t (x_{t+\alpha} - a_0 x_t - a_1 x_{t-1} - \cdots - a_{n-1} x_{t-n+1})^2,$$

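so, taking partial derivatives,
$$\frac{\partial Q}{\partial a_0} = 2\sum_t (x_{t+\alpha} - a_0 x_t - \cdots - a_{n-1} x_{t-n+1})(-x_t) = 0,$$
and similarly for $a_1, \ldots, a_{n-1}$, where we assume $A(z) = a_0 + a_1 z + \cdots + a_{n-1} z^{n-1}$. Tidying up these equations gives, after some algebra,
$$\sum_t x_{t-s} x_{t+\alpha} = a_0 \sum_t x_t x_{t-s} + a_1 \sum_t x_{t-1} x_{t-s} + \cdots + a_{n-1} \sum_t x_{t-n+1} x_{t-s}, \qquad s = 0, 1, \ldots, n-1.$$
Now if we recall that the correlations $r_k$ are defined as
$$r_k = \sum_t x_t x_{t+k},$$
these equations become
$$r_{\alpha+s} = a_0 r_s + a_1 r_{s-1} + \cdots + a_{n-1} r_{s-n+1}, \qquad s = 0, 1, \ldots, n-1.$$
This uses the usual results on correlations of stationary series, i.e.
$$\sum_t x_{t-j} x_{t-s} = r_{s-j} = r_{j-s}.$$
We can write this system of linear equations in matrix form as
$$\begin{pmatrix} r_0 & r_1 & \cdots & r_{n-1} \\ r_1 & r_0 & \cdots & r_{n-2} \\ \vdots & & \ddots & \vdots \\ r_{n-1} & r_{n-2} & \cdots & r_0 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{pmatrix} = \begin{pmatrix} r_\alpha \\ r_{\alpha+1} \\ \vdots \\ r_{\alpha+n-1} \end{pmatrix}.$$
When $\alpha = 1$ we have the Wiener filter or Yule-Walker equations for a fixed wavelet of $n$ terms.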
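The matrix system can be set up and solved directly. In this sketch (names are ours) the correlations are sums over a finite trace and the system is solved by ordinary Gaussian elimination rather than by the faster method discussed next. The test trace $x_t = 0.5^t$ is an assumption chosen because $r_{k+1} \approx 0.5\, r_k$ for it, so the expected answer is $a_0 \approx 0.5^\alpha$ with the other $a_j$ close to zero.

```python
def autocorr(x, max_lag):
    """r_k = sum_t x[t] * x[t+k], k = 0..max_lag, for a finite real trace."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def solve(A, b):
    """Gaussian elimination with partial pivoting (A is a list of rows)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        a[i] = (M[i][n] - sum(M[i][j] * a[j] for j in range(i + 1, n))) / M[i][i]
    return a


def prediction_filter(x, alpha, n):
    """Solve sum_j a_j r_|s-j| = r_(alpha+s), s = 0..n-1, for a_0..a_(n-1)."""
    r = autocorr(x, alpha + n - 1)
    R = [[r[abs(s - j)] for j in range(n)] for s in range(n)]
    rhs = [r[alpha + s] for s in range(n)]
    return solve(R, rhs)


x = [0.5 ** t for t in range(40)]
print(prediction_filter(x, 1, 3))    # close to [0.5, 0, 0]
print(prediction_filter(x, 2, 3))    # close to [0.25, 0, 0]
```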
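These can be solved directly for the $a$'s given the correlations or, rather more economically, using the fact that the left-hand matrix is a "Toeplitz matrix". Because of its rather special structure there are some slick algorithms for finding the $a_j$; the earliest version (known to us), due to Levinson, is based on the following idea.
Suppose $\alpha = 1$ and $a_{pj}$ is the $j$th coefficient of a filter of length $p$. Our equations are, for $p = 2$,
$$r_1 = a_{21} r_0 + a_{22} r_1$$
$$r_2 = a_{21} r_1 + a_{22} r_0$$
and, for $p = 3$,
$$r_1 = a_{31} r_0 + a_{32} r_1 + a_{33} r_2$$
$$r_2 = a_{31} r_1 + a_{32} r_0 + a_{33} r_1$$
$$r_3 = a_{31} r_2 + a_{32} r_1 + a_{33} r_0.$$
Solving the first two equations of the $p = 3$ set gives
$$\begin{pmatrix} a_{31} \\ a_{32} \end{pmatrix} = \begin{pmatrix} r_0 & r_1 \\ r_1 & r_0 \end{pmatrix}^{-1} \begin{pmatrix} r_1 - a_{33} r_2 \\ r_2 - a_{33} r_1 \end{pmatrix}.$$
Since
$$\begin{pmatrix} r_0 & r_1 \\ r_1 & r_0 \end{pmatrix}^{-1} \begin{pmatrix} r_1 \\ r_2 \end{pmatrix} = \begin{pmatrix} a_{21} \\ a_{22} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} r_0 & r_1 \\ r_1 & r_0 \end{pmatrix}^{-1} \begin{pmatrix} r_2 \\ r_1 \end{pmatrix} = \begin{pmatrix} a_{22} \\ a_{21} \end{pmatrix},$$
then
$$a_{31} = a_{21} - a_{33} a_{22}, \qquad a_{32} = a_{22} - a_{33} a_{21},$$
which gives a recursive method of solution provided we know $a_{33}$. Fortunately, substituting these last equations in the third equation of our original $3 \times 3$ set gives, with the unit-energy normalisation $r_0 = 1$,
$$a_{33} = \frac{r_3 - a_{21} r_2 - a_{22} r_1}{1 - a_{21} r_1 - a_{22} r_2}.$$
This recursion can be generalised and, as it saves a matrix inversion, it is very fast. One must take care when the roots of $A(z)$ lie close to the unit circle, since rounding errors can cause problems with this method.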
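The recursion can be sketched in a few lines. The general step used here, $a_{pp} = (r_p - \sum_{j<p} a_{p-1,j} r_{p-j}) / (r_0 - \sum_{j<p} a_{p-1,j} r_j)$ followed by $a_{pj} = a_{p-1,j} - a_{pp} a_{p-1,p-j}$, is our extrapolation of the $p = 3$ case above, with $r_0$ kept explicit instead of set to 1. The five-term wavelet used to produce test correlations is arbitrary, and the result is compared with a direct Gaussian elimination.

```python
def autocorr(x, max_lag):
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def levinson(r, n):
    """Solve r_i = sum_{j=1..n} a_j r_|i-j|, i = 1..n (the alpha = 1 equations)
    by building filters of length 1, 2, ..., n in turn."""
    a = [r[1] / r[0]]                                  # p = 1: a_11
    for p in range(2, n + 1):
        num = r[p] - sum(a[j - 1] * r[p - j] for j in range(1, p))
        den = r[0] - sum(a[j - 1] * r[j] for j in range(1, p))
        a_pp = num / den
        # a_pj = a_(p-1)j - a_pp * a_(p-1)(p-j), j = 1..p-1
        a = [a[j - 1] - a_pp * a[p - j - 1] for j in range(1, p)] + [a_pp]
    return a


def direct(r, n):
    """The same equations by Gaussian elimination, for comparison."""
    M = [[r[abs(i - j)] for j in range(n)] + [r[i + 1]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            M[i] = [mi - f * mc for mi, mc in zip(M[i], M[c])]
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        a[i] = (M[i][n] - sum(M[i][j] * a[j] for j in range(i + 1, n))) / M[i][i]
    return a


r = autocorr([1.0, 0.6, -0.3, 0.2, 0.1], 4)
print(levinson(r, 3))
print(direct(r, 3))
```

Each order costs only a couple of inner products, so a filter of length $n$ takes work proportional to $n^2$ rather than the $n^3$ of elimination.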
Now for $\alpha = 1$ we can choose an $n$ and, by estimating $A(z)$, perform the deconvolution. This is a well known technique both in geophysical processing and in signal processing generally. The problem is that one doesn't know $n$, the filter length. In practice one can, if no a priori information is available, just estimate the sequence $a_0, a_1, a_2, \ldots$ until the coefficients become effectively zero.
Peacock and Treitel took this concept rather further and suggested that one should not only choose $n$ but also consider a filter designed for an $\alpha$ exceeding one.
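Recall that for the simple model with reflections between two water interfaces, as in Fig. 9.5, the received signal was of the form
$$y = (1, 0, \ldots, 0, -k, 0, \ldots, 0, k^2, \ldots)$$
and it would seem intuitively reasonable that, to predict and remove the terms $-k, k^2, \ldots$ occurring in the $(d+1)$th, $(2d+1)$th, ... positions, one should choose $\alpha = d$.
If one computes the correlations
$$r_0 = 1 + k^2 + k^4 + \cdots$$
$$r_j = 0, \quad 0 < |j| < d$$
$$r_d = -k r_0$$
and substitutes them into the prediction filter equations with $\alpha = d$ and a filter of length $n \le d$, one finds the following equations:
$$\begin{pmatrix} r_0 & 0 & \cdots & 0 \\ 0 & r_0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & r_0 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n-1} \end{pmatrix} = \begin{pmatrix} r_d \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$
which gives a particularly nice and economical set of equations to solve: $a_0 = r_d/r_0 = -k$ and $a_j = 0$ for $j \ge 1$.
Peacock and Treitel report generally good results with this method.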
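The water-layer case can be tried numerically. This sketch (our own names; k = 0.5, d = 4 and a train of 12 bounces are assumptions) designs a prediction filter with gap $\alpha = d$ from the trace's own correlations and applies the prediction-error operator $e_t = y_t - \sum_j a_j y_{t-d-j}$. The operator comes out as essentially $(1, 0, \ldots, 0, k)$, the $f$ found earlier, and the reverberations are removed.

```python
def autocorr(x, max_lag):
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            M[i] = [mi - f * mc for mi, mc in zip(M[i], M[c])]
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        a[i] = (M[i][n] - sum(M[i][j] * a[j] for j in range(i + 1, n))) / M[i][i]
    return a


def reverberation(k, d, bounces):
    y = [0.0] * (bounces * d + 1)
    for m in range(bounces + 1):
        y[m * d] = (-k) ** m
    return y


def gapped_filter(y, alpha, n):
    """Return (error operator, a): a solves the prediction equations with gap
    alpha; the operator is (1, 0, ..., 0, -a_0, ..., -a_(n-1)), -a_0 at alpha."""
    r = autocorr(y, alpha + n - 1)
    R = [[r[abs(s - j)] for j in range(n)] for s in range(n)]
    a = solve(R, [r[alpha + s] for s in range(n)])
    return [1.0] + [0.0] * (alpha - 1) + [-aj for aj in a], a


k, d = 0.5, 4
y = reverberation(k, d, bounces=12)
pef, a = gapped_filter(y, alpha=d, n=3)
print(a)                                   # close to [-k, 0, 0]
out = [sum(pef[j] * y[t - j] for j in range(len(pef)) if 0 <= t - j < len(y))
       for t in range(len(y))]
print(max(abs(v) for v in out[1:]))        # reverberations essentially gone
```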
We suggest that at this point our reader might like to start reading the
geophysical literature. Do remember that mathematics takes time and effort
to understand.
Appendix 1

REFERENCES TO APPLICATIONS

PART 1

In this section we give some explicit references to the geophysics literature which contain the ideas and techniques we have developed. The list is not exhaustive but will, we believe, give some insight into the use and value of the mathematics.

1.1 Functions
The idea is so embedded in the literature that it is difficult to dissect. As some examples, try
Claerbout (1976) Chapter 1, Introduction and Section 1.1; Chapter 2
Robinson (1980) Section 7.3
Kulhanek (1976) Chapter 1
Notice that in most cases no explicit reference is made.

1.2 Polynomials
The main application is as the representation of data, perhaps as a wavelet.
This gives a compact data representation.
Claerbout (1976) Chapter 1
Robinson (1980) Sections 6.4, 6.11
Robinson (1983) Section 2.9

1.3 Trigonometric and Other Functions


McQuillin et al. (1979) Section 1.3 log
Section 1.5 exp
Section 1.8 sine
Appendix 1. Sine and cosine
Grant and West (1965) Section 3.9
Section 5.7
Robinson (1980) Chapter 3
Claerbout (1976)

Chapter 2: Differentiation
McQuillin (1979) Section 3.6, repeated differentiation
Grant and West (1965) Section 5.1, ordinary derivatives
ordinary differential equations
Claerbout (1976) Section 2.5
Section 6, Minimisation
Grant and West (1965) Sections 2.6, 2.9, partial differential
equations
Section 8.4, Green's theorem
Section 9.9, Interpolation
Robinson (1983) Section 8.7, Minimisation

Chapter 3: Integration
McQuillin (1979) Section 3.5
Appendix 1
Claerbout (1976) Chapter 4
Grant and West (1965) Sections 5.1, 5.5, 8.3. Chapter 8 uses
line integrals and Green's theorem
Robinson (1980) Sections 3.4, 4.4
Bath (1974) Chapter 3
K. Hubbert Line Integrals
Runcorn (1966) pp. 123-204 for triple integrals

PART 2

Chapter 4: Complex Numbers


The ideas here are essential for Fourier theory and the study of filters. The frequency-time domain duality is crucially dependent on the complex exponential.
McQuillin (1979) Chapter 1, seismic waves
Claerbout (1976) Chapter 1, section 2
Rayner (1971) Section 5.2
Robinson (1980) Section 8.8
Runcorn (1964)

Chapter 5
Matrices are a key data processing concept.
Claerbout (1976)
Robinson (1980) Chapter 5, section 9.7
Robinson (1983) Section 1.9, sections 4.1 to 4.10
Robinson (1981) Chapter 15

Chapter 6
Clearly any data gathering operation must have some statistical component,
even if just for quality control.

Claerbout (1976) Chapter 4


Robinson (1980) Chapter 8
Robinson (1981) Chapters 1, 3, 10

Chapter 7
Rayner (1971) Chapters 2, 3, 5
McQuillin (1979) Appendix 1
Robinson (1971) Chapter 11
Robinson (1980) Whole volume
Parasnis (1972) Section 5.9
Claerbout (1976) Section 1.2 for the Dirac delta
Section 1.3
Robinson (1980) Chapter 4
Runcorn (1966)

Chapter 8
Almost any volume by E. Robinson gives a wealth of illustration.
Claerbout (1976) Chapters 3, 4
McQuillin (1979) Chapters 1,3,4
Rayner (1971) Whole volume
Bath (1974) Whole volume
Robinson (1980)
Robinson (1981)
Robinson (1978)
Kulhanek O. (1976)
Robinson E. A. and Treitel (1973)
Silvia and Robinson (1978)
Webster G. M. (1978)

REFERENCES
Grant, F. and West, G. (1965), Interpretation theory in applied Geophysics,
McGraw Hill.
Bath, M. (1974), Spectral Analysis in Geophysics, Elsevier.
Runcorn, S. K. ed. (1966), Methods and Techniques in Geophysics, J. Wiley.
Kulhanek, O. (1976), Introduction to digital filtering in Geophysics, Elsevier.
Robinson, E. A. and Treitel, S. (1973), The Robinson-Treitel Reader (3rd
edn) Seismograph Service Corporation, Tulsa, Oklahoma.
Silvia, M. T. and Robinson, E. A. (1978), Deconvolution of Geophysical Time
Series in Exploration for Oil and Natural Gas, Elsevier.
Claerbout, J. F. (1976), Fundamentals of Geophysical Data Processing,
McGraw Hill.
Robinson, E. A. (1983), Multichannel Time Series Analysis with Digital
Computer Programs, (2nd edn), Goose Pond Press.
Robinson, E. A. (1980), Physical Applications of Stationary Time Series with
Special Reference to Digital Data Processing of Seismic Signals, C.
Griffin.

Parasnis, D. S. (1972), Principles of Applied Geophysics (2nd edn), Chapman and Hall.
McQuillin, R., Bacon, M. and Barclay, W. (1979), An Introduction to Seismic
Interpretation, Graham and Trotman.
Webster, G. M. (1978), Deconvolutions (2nd Volume), Society of Explor-
ation Geophysicists, Tulsa, Oklahoma.
Robinson, E. A. (1981), Time Series Analysis and its Applications, Goose
Pond.
Rayner, J. N. (1971), An Introduction to Spectral Analysis, Pion.
Gradshteyn, I. S. and Ryzhik, I. M. (1980), Table of Integrals, Series, and Products, Academic Press.
Knuth, D. E. (1977), The Art of Computer Programming, Vol. I, Addison-
Wesley.
Appendix 2

SOME USEFUL FORMULAE FOR READY REFERENCE

TRIGONOMETRIC FORMULAE

$$\sin(A + B) = \sin A \cos B + \cos A \sin B$$
$$\cos(A + B) = \cos A \cos B - \sin A \sin B$$
$$\sin 2A = 2 \sin A \cos A; \qquad \cos 2A = 2\cos^2 A - 1$$
$$\tan(A + B) = \frac{\tan A + \tan B}{1 - \tan A \tan B}$$
$$\tan 2A = \frac{2 \tan A}{1 - \tan^2 A}$$
$$\tan(-A) = -\tan A, \qquad \cos(-A) = \cos A, \qquad \sin(-A) = -\sin A$$
$$1 + \tan^2 A = \sec^2 A$$
$$A \sin x + B \cos x = C \sin(x + \phi),$$
where $C = \sqrt{A^2 + B^2}$ and $\tan \phi = B/A$.

LOGS
$$\log_a x = \log_a b \cdot \log_b x$$
$$\log_e x = \ln x = \log x; \qquad \log(e^x) = x;$$
$$a^x = e^{x \log a}; \qquad \log_a(xy) = \log_a x + \log_a y$$

SUMS AND SERIES

$$\sum_{i=0}^{n} a_i = a_0 + a_1 + \cdots + a_n$$
or
$$\sum_{i=m}^{n} a_i = a_m + a_{m+1} + \cdots + a_n$$

An arithmetic series is one of the form
$$a + (a + x) + \cdots + (a + (n-1)x) = \sum_{i=0}^{n-1} (a + ix) = \tfrac{1}{2} n \left(2a + (n-1)x\right).$$
A particular example is given by $a = 1$, $x = 1$. Then $1 + 2 + 3 + \cdots + n = \sum_{i=1}^{n} i = \tfrac{1}{2} n(n+1)$. Another series is
$$1^2 + 2^2 + 3^2 + \cdots + n^2 = \sum_{r=1}^{n} r^2 = \tfrac{1}{6} n(n+1)(2n+1).$$
r~1

Geometric series
$$a + ax + ax^2 + \cdots + ax^{n-1} = \sum_{j=0}^{n-1} a x^j = a \, \frac{1 - x^n}{1 - x} \qquad (\text{if } x \ne 1).$$

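$$\sum_{j=a}^{b} \exp(i\omega j) = \exp\!\left(i\omega \frac{b + a}{2}\right) \frac{\sin\left(\omega (b + 1 - a)/2\right)}{\sin(\omega/2)} \qquad (\text{if } \sin(\omega/2) \ne 0)$$
$$= b + 1 - a \qquad (\text{if } \omega = 0).$$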
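Because this closed form is easy to mistype, a numerical spot check can be reassuring. The sketch below compares it with the term-by-term sum for a few arbitrarily chosen values of $\omega$, $a$ and $b$; the function names are ours.

```python
import cmath
import math


def direct_sum(w, a, b):
    """sum_{j=a}^{b} exp(i w j), term by term."""
    return sum(cmath.exp(1j * w * j) for j in range(a, b + 1))


def closed_form(w, a, b):
    """exp(i w (a+b)/2) sin(w (b+1-a)/2) / sin(w/2), or b+1-a when w = 0."""
    if w == 0:
        return complex(b + 1 - a)
    return (cmath.exp(1j * w * (a + b) / 2)
            * math.sin(w * (b + 1 - a) / 2) / math.sin(w / 2))


for w, a, b in [(0.3, 0, 9), (1.7, -4, 5), (2.9, 3, 20), (0.0, 2, 7)]:
    print(w, a, b, abs(direct_sum(w, a, b) - closed_form(w, a, b)))
```

The printed differences are at rounding-error level; the formula is simply the geometric series with ratio $e^{i\omega}$, rewritten symmetrically about the midpoint $(a + b)/2$.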
Also $x^n - a^n = (x - a)(x^{n-1} + x^{n-2} a + \cdots + x a^{n-2} + a^{n-1})$, or
$$\frac{x^n - a^n}{x - a} = \sum_{j=0}^{n-1} x^{n-1-j} a^j.$$

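$$(1 + x)^n = 1 + nx + \frac{n(n-1)}{2!} x^2 + \frac{n(n-1)(n-2)}{3!} x^3 + \cdots + x^n.$$
We define
$${}^nC_r = \binom{n}{r} = \frac{n!}{(n-r)!\,r!}, \qquad \text{where } n! = n \cdot (n-1) \cdot (n-2) \cdots 2 \cdot 1.$$
Then
$$(1 + x)^n = \sum_{r=0}^{n} \binom{n}{r} x^r, \qquad \binom{n}{0} = 1 = \binom{n}{n}.$$
This is known as the binomial theorem.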
Some infinite series
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$
$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots \qquad (-1 < x \le 1)$$
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$$
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots$$
$|x|$ = positive value of $x$, i.e. $|-2| = 2$, $|3| = 3$, etc.
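CALCULUS

$$\frac{d}{dx}(\log x) = \frac{1}{x}, \qquad \int x^n \, dx = \frac{x^{n+1}}{n+1} \ (n \ne -1), \qquad \int \frac{1}{x} \, dx = \log|x|,$$
$$\frac{d}{dx}\sin x = \cos x, \qquad \frac{d}{dx}\cos x = -\sin x,$$
$$\int \cos x \, dx = \sin x, \qquad \int \sin x \, dx = -\cos x,$$
$$\int \sec^2 x \, dx = \tan x, \quad \text{also} \quad \int \tan x \, dx = -\log|\cos x|.$$
If $f$ and $g$ are functions of $x$,
$$\frac{d}{dx}\left(f(x) g(x)\right) = \frac{df}{dx} g(x) + \frac{dg}{dx} f(x).$$
If $h(x) = f(g(x))$,
$$\frac{dh}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx}.$$
$$\int u(x) v'(x) \, dx = u(x) v(x) - \int v(x) \, du(x).$$
$$f(a + x) = f(a) + x f'(a) + \frac{x^2}{2!} f^{(2)}(a) + \frac{x^3}{3!} f^{(3)}(a) + \cdots$$
where $f^{(r)}(x) = d^r f/dx^r$.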
Appendix 3

PROGRAMS

We have listed below a selection of programs written in BASIC. These will, we hope, enable you to carry out simple calculations to aid your understanding of the mathematical material in the text. The letter P in the margin of the main text indicates that one of the programs supplied here is relevant.
The programs listed here are not presented as "state of the art", and we do not claim they are good examples of the programmer's art; they will, we hope, assist comprehension. They are written in what we believe to be a fairly universal subset of BASIC. We have thus not provided any graphics, apart from character plots, as there is no common standard. We have also avoided extensions that are only implemented on some machines (thus we use one-line functions and do not use matrix commands). In consequence all these programs should work on a cheap microcomputer, and all have been tested on a micro.

1 FUNCTIONS

Most dialects of BASIC allow the user to define one-line functions in a program.
Typically the function is defined using DEF; later in the program the previously DEFined function may be used, e.g.

10 DEF FN A(W)=2*W+W
...
n X = FN A(23)

where n is some later line number.

Most BASICS also provide some functions which are already defined.
We will assume these are:

SIN ATN SGN EXP


COS INT ABS LOG
TAN RND SQR

From these one can immediately obtain a set of "defined" functions using
DEF FN.
SECANT
SEC(X) = 1/COS(X)
COSECANT
CSC(X) = 1/SIN(X)
COTANGENT
COT(X) = 1/TAN(X)
INVERSE SINE
ASN(X) = ATN(X/SQR(-X * X + 1))
INVERSE COSINE
ACN(X) = -ATN(X/SQR(-X * X + 1)) + 1.5708
INVERSE SECANT
ASE(X) = ATN(SQR(X * X - 1)) + (SGN(X) - 1) * 1.5708
INVERSE COSECANT
ACE(X) = ATN(1/SQR(X * X - 1)) + (SGN(X) - 1) * 1.5708
INVERSE COTANGENT
ARCCOT(X) = -ATN(X) + 1.5708
HYPERBOLIC SINE
SINH(X) = (EXP(X) - EXP(-X))/2
HYPERBOLIC COSINE
COSH(X) = (EXP(X) + EXP(-X))/2
HYPERBOLIC TANGENT
TANH(X) = -EXP(-X)/(EXP(X) + EXP(-X)) * 2 + 1
A MOD B
MOD(A) = INT((A/B - INT(A/B)) * B + 0.05) * SGN(A/B)
ABS(X) returns the absolute value of X, i.e. |X|, so if X < 0 ABS(X) = -X, while if X > 0 ABS(X) = X
INT(X) returns the largest integer less than or equal to X, e.g.
INT(1.7) = 1
INT(2.07) = 2
INT(2) = 2
RND(1) returns a random number less than 1 but exceeding zero
SGN(X) returns -1 if X < 0 or +1 if X > 0, 0 otherwise
SQR(X) returns the positive square root of X
Two strange but useful functions are
(a) 100 DEF FN R(X) = INT(X * (10^F0) + 0.5)/(10^F0)
where we assume F0 is set on some previous line to be an integer like 2 or 6. This function takes a number X and keeps only the first F0 digits after the decimal point; it "rounds" the number. If F0 = 2 and X is 212.123456 then FN R(X) will equal 212.12. Setting F0 = -2 with the same X will give FN R(X) equal to 200, i.e. the number is rounded to the nearest 100.
This is useful to print tidy versions of numbers; thus with F0 = 6
PRINT FN R(X)
prints X to 6 decimals.
(b) MOD(A) = INT((A/B - INT(A/B)) * B + 0.05) * SGN(A/B)
This function, which assumes that B is previously defined, returns the remainder after A is divided by B. For B = 3
MOD(7) = 1    MOD(13) = 1
MOD(3) = 0
MOD(2) = 2
We will from time to time write A mod B to mean MOD(A).
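The two BASIC functions can be mirrored in Python to check their behaviour. This is a sketch: `fn_r` and `fn_mod` are hypothetical names, BASIC's INT is taken to be `math.floor`, B is passed explicitly rather than held in a global, and `fn_mod` is only intended for non-negative A.

```python
import math


def fn_r(x, f0):
    """DEF FN R: keep f0 digits after the decimal point (f0 may be negative)."""
    scale = 10.0 ** f0
    return math.floor(x * scale + 0.5) / scale


def fn_mod(a, b):
    """DEF FN MOD with B passed explicitly; intended for a >= 0.
    The + 0.05 rescues products such as 0.9999999 that should be 1."""
    q = a / b
    sign = (q > 0) - (q < 0)                  # BASIC's SGN
    return math.floor((q - math.floor(q)) * b + 0.05) * sign


print(fn_r(212.123456, 2), fn_r(212.123456, -2))      # 212.12 200.0
print([fn_mod(a, 3) for a in (7, 13, 3, 2, 71, 40)])  # [1, 1, 0, 2, 2, 1]
```

Without the 0.05, `fn_mod(13, 3)` would return 0: in floating point, $(13/3 - 4) \times 3$ comes out just below 1.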
The following program demonstrates the use of these functions.

Program 1
This program tests the two functions we have just described.
R(x): the rounding function
mod(x): the mod function
As you can see we display the output to assist in the understanding of the
program.

]LIST
100 REM FUNCTION TESTING
110 REM TESTS R(X) AND MOD(X)
120 REM DATA FROM RANDOM
130 F0 = 4:B = 3
140 DEF FN R(X) = INT (X * (10 ^ F0) + 0.5) / (10 ^ F0)
150 DEF FN MOD(A) = INT ((A / B - INT (A / B)) * B + 0.05) * SGN (A / B)
160 FOR I = 1 TO 4
170 X = RND (9)
180 X = 100 * X
190 PRINT
200 PRINT "X= ";X
210 PRINT "ROUNDED X IS "; FN R(X)
220 PRINT
230 X = INT (X)
240 PRINT "X = ";X
250 PRINT "MOD(X) = "; FN MOD(X)
260 NEXT
270 PRINT
280 PRINT "PROGRAM ENDS": END
]RUN

X= 71.4390695
ROUNDED X IS 71.4391

X = 71
MOD(X) = 2

X= 40.8186764
ROUNDED X IS 40.8187

X = 40
MOD(X) = 1

X= 21.0867141
ROUNDED X IS 21.0867

X = 21
MOD(X) = 0

X= 96.0521198
ROUNDED X IS 96.0521

X = 96
MOD(X) = 0

PROGRAM ENDS

Program 2
This program is provided for you to evaluate polynomials. Given the order
of the polynomial and its coefficients the program will compute the value
of the polynomial for a given number of points. A range of values must be
given.
Do note this is not an efficient way of computing values: there are very much better ones, for example Horner's algorithm.

]LIST
100 REM PROGRAM 2.1
110 REM OBVIOUS AND SLOW POLYNOMIALS
120 PRINT "GIVE THE ORDER OF THE "
130 INPUT "POLYNOMIAL ";N
140 DIM A(N)
150 PRINT "NOW GIVE THE COEFFICIENTS,STARTING WITH THE LEADING"
160 FOR I = N TO 0 STEP - 1
170 PRINT "A(";I;")= ";: INPUT A(I)
180 NEXT
190 INPUT "GIVE THE LOWER END OF THE RANGE";LB
200 INPUT "GIVE THE UPPER END";UB
210 INPUT "HOW MANY POINTS ";NP
220 Y = (UB - LB) / NP
230 FOR I = 0 TO NP
240 Z = LB + Y * I
250 S = 0
260 FOR J = N TO 0 STEP - 1
270 S = S + A(J) * Z ^ J
280 NEXT
290 PRINT "X = ";Z,"P(X) = ";S
300 NEXT
310 END
]RUN
GIVE THE ORDER OF THE POLYNOMIAL 2
NOW GIVE THE COEFFICIENTS,STARTING WITH THE LEADING
A(2)= ?2
A(1)= ?1
A(0)= ?-1
GIVE THE LOWER END OF THE RANGE0
GIVE THE UPPER END2
HOW MANY POINTS 10
X = 0        P(X) = -1
X = .2       P(X) = -.72
X = .4       P(X) = -.28
X = .6       P(X) = .32
X = .8       P(X) = 1.08
X = 1        P(X) = 2
X = 1.2      P(X) = 3.08
X = 1.4      P(X) = 4.32000001
X = 1.6      P(X) = 5.72000001
X = 1.8      P(X) = 7.28000001
X = 2        P(X) = 9
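Horner's algorithm, mentioned above, rewrites $a_n x^n + \cdots + a_1 x + a_0$ as $(\cdots((a_n x + a_{n-1})x + a_{n-2})x + \cdots)x + a_0$, so each point costs $n$ multiplications and no powers. A minimal Python sketch, reproducing the run above ($2x^2 + x - 1$ at 11 points on $[0, 2]$); the function name is ours.

```python
def horner(coeffs, x):
    """Evaluate a_n x^n + ... + a_0, coeffs = [a_n, ..., a_0] (leading first,
    the order in which Program 2 asks for them)."""
    s = 0.0
    for c in coeffs:
        s = s * x + c
    return s


lb, ub, npts = 0.0, 2.0, 10
for i in range(npts + 1):
    x = lb + (ub - lb) / npts * i
    print(f"X = {x:g}   P(X) = {horner([2, 1, -1], x):g}")
```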

Program 3
This program uses the method of bisection to find the solution of f(x) = 0. You must give a range of values which includes the solution; the program will warn you if there is no solution in this range. Notice you are asked to specify the accuracy required. To ensure that the procedure terminates, the program will ask you for an upper bound to the number of iterations.
]LIST
100 REM METHOD OF BISECTION
110 REM SOLVES EQUATION
120 REM FN IS DEFINED IN 140
130 REM RANGE A<X<B NEEDED
140 DEF FN F(X) = X * X + 3 * X + 1
150 PRINT "YOU MUST GIVE A AND B"
160 PRINT "VALUES WHICH INCLUDE THE SOLN"
170 INPUT "SMALLER VALUE A= ";A
180 INPUT "LARGER VALUE B= ";B
190 INPUT "GIVE ACCURACY ";EE
200 PRINT "MAXIMUM NO. OF ITERATIONS"
210 INPUT "TO STOP PROG";IT
220 FA = FN F(A):FB = FN F(B)
230 IF FA * FB > 0 THEN PRINT "TRY AGAIN": GOTO 160
240 I = I + 1: IF I > IT THEN PRINT "STOPPED":SO = PM: GOTO 310
250 IF ABS (FA) < EE THEN SO = A: GOTO 310
260 IF ABS (FB) < EE THEN SO = B: GOTO 310
270 PM = (A + B) / 2:FM = FN F(PM)
280 IF FA * FM > 0 THEN A = PM:FA = FM: GOTO 230
290 B = PM:FB = FM
300 GOTO 230
310 PRINT "SOLN. IS ";SO
320 PRINT "FN VALUE AT THIS POINT: ";FM
330 END
]RUN
YOU MUST GIVE A AND B
VALUES WHICH INCLUDE THE SOLN
SMALLER VALUE A= -1
LARGER VALUE B= 2
GIVE ACCURACY 0.000001
MAXIMUM NO. OF ITERATIONS
TO STOP PROG100
SOLN. IS -.381966114
FN VALUE AT THIS POINT: -2.30836676E-07

Program 4
This program uses the Newton-Raphson algorithm to solve f(x) = 0. You should note that the program computes the derivative of the function itself, by a finite difference. If you supply the derivative in the subroutine beginning at line 1000 the program will converge more quickly, but this does make the program less flexible. Notice that an initial guess and the accuracy required must be specified. There is no check to ensure that the program stops if the algorithm does not converge, so take care.

]LIST
100 REM NEWTON-RAPHSON ALGORITHM
110 REM NEED AN INITIAL GUESS
120 REM FN DEFINED AT 140
130 REM PROG FINDS DERIVATIVE
140 DEF FN F(X) = SIN (X) - X * X / 2
150 INPUT "GIVE INITIAL VAL ";X0
160 INPUT "GIVE ACCURACY NEEDED";EE
170 EP = 1E - 16: REM ACCURACY BOUND FOR MICRO
180 EP = SQR (EP)
190 F0 = FN F(X0)
200 GOSUB 1000
210 X1 = X0 - F0 / FD:I = I + 1
220 PRINT "STEP ";I;" X= ";X1
230 E = X1 - X0:X0 = X1
240 IF ABS (E) > EE GOTO 180
250 PRINT "ACCURACY ACHIEVED": STOP
1000 H = ( ABS (X0) + EP) * EP
1010 F1 = FN F(X0 + H)
1020 FD = (F1 - F0) / H
1030 REM FD IS DERIVATIVE
1040 REM H IS DELTA X
1050 RETURN
]RUN
GIVE INITIAL VAL 1
GIVE ACCURACY NEEDED0.00001
STEP 1 X= 2.62956303
STEP 2 X= 1.78211336
STEP 3 X= 1.47846231
STEP 4 X= 1.4155181
STEP 5 X= 1.40783465
STEP 6 X= 1.40600649
STEP 7 X= 1.40528888
STEP 8 X= 1.40492846
STEP 9 X= 1.40472574
STEP 10 X= 1.4046056
STEP 11 X= 1.40453268
STEP 12 X= 1.40448785
STEP 13 X= 1.40446015
STEP 14 X= 1.40444298
STEP 15 X= 1.40443232
STEP 16 X= 1.4044257
ACCURACY ACHIEVED

BREAK IN 250
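For comparison, a Python sketch of the same iteration on the same function $\sin x - x^2/2$, starting from $x_0 = 1$. As in subroutine 1000, the derivative is a forward difference with step $h = (|x| + \epsilon)\epsilon$, but here $\epsilon$ is fixed at the square root of double-precision machine epsilon instead of being recomputed on each pass. The names and the iteration cap are assumptions.

```python
import math
import sys


def newton_fd(f, x0, tol=1e-5, max_iter=50):
    """Newton-Raphson, x1 = x0 - f(x0)/f'(x0), with f' replaced by a forward
    difference using h = (|x0| + eps) * eps, eps = sqrt(machine epsilon)."""
    eps = math.sqrt(sys.float_info.epsilon)
    x = x0
    for step in range(1, max_iter + 1):
        fx = f(x)
        h = (abs(x) + eps) * eps
        deriv = (f(x + h) - fx) / h
        x_new = x - fx / deriv
        print(f"STEP {step}  X= {x_new:.9g}")
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence: try another starting value")


root = newton_fd(lambda x: math.sin(x) - x * x / 2, 1.0)
```

With an accurate derivative the steps shrink quadratically once they are small, so the sketch should need only a handful of steps rather than the sixteen shown in the run above.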

Program 5
This program computes the minimum value of a function in the range a to b. The method is to split the range into n intervals and then find the grid point with the smallest function value. The process is then repeated on the interval around this point to give a smaller interval, and so on until the accuracy required is attained. The diagnostic print is given to illustrate the method.
One could also find the maximum using the same program; we leave this as an exercise for the reader. Alternatively, the stationary values might be found by using bisection or Newton-Raphson to find the values for which the derivative is zero.

]LIST
100 REM MINIMUM PROGRAM
110 REM ASSUMES THERE IS ONE MIN
120 REM NEEDS INTERVAL OF INTEREST
130 DEF FN F(X) = - 250 * X + 22500 * (1.00949) ^ X
140 INPUT "NO OF GRID POINTS= ";N
150 INPUT "TOLERENCE= ";EE
160 PRINT "FN EXAMINED BETWEEN A AND B"
170 INPUT "A= ";A: INPUT "B= ";B
180 T = FN F(A):K = 0
190 M = FN F(B)
200 IF M < T THEN T = M:K = N
210 H = (B - A) / N
220 FOR J = 1 TO N - 1
230 P = FN F(A + J * H)
240 IF P > = T GOTO 260
250 T = P:K = J
260 NEXT
280 R = A + J * H
290 PRINT "INTERVAL IS ";(R - H);" TO";R
300 REM THIS GIVES INTERVAL
310 IF ABS (H) < EE / 2 THEN GOTO 340
320 B = A + (K + 1) * H:A = B - 2 * H
330 GOTO 180
340 SO = R - H / 2
350 PRINT "SOLN= ";SO
360 PRINT "FN VALUE = ";P
370 END
]RUN
NO OF GRID POINTS= 10
TOLERENCE= 0.0001
FN EXAMINED BETWEEN A AND B
A= 15
B= 20
INTERVAL IS 19.5 TO20
INTERVAL IS 17.4 TO17.5
INTERVAL IS 17.28 TO17.3
INTERVAL IS 17.216 TO17.22
INTERVAL IS 17.1992 TO17.2
INTERVAL IS 17.19664 TO17.1968
INTERVAL IS 17.196928 TO17.19696
SOLN= 17.196944
FN VALUE = 22168.985
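A Python sketch of the same grid-refinement idea, on the same function and range. It returns the best grid point from the final pass rather than reproducing the listing's printed intervals, and, like Program 5, it assumes a single minimum in $[a, b]$. The function names are ours.

```python
def grid_minimise(f, a, b, n=10, tol=1e-4):
    """Evaluate f at a, a + h, ..., b (h = (b - a)/n), keep the best point k,
    shrink [a, b] to the two grid cells around it, and repeat until h < tol/2."""
    while True:
        h = (b - a) / n
        k = min(range(n + 1), key=lambda j: f(a + j * h))
        if h < tol / 2:
            return a + k * h
        a, b = a + (k - 1) * h, a + (k + 1) * h


def f(x):
    return -250 * x + 22500 * 1.00949 ** x


x_min = grid_minimise(f, 15.0, 20.0)
print(x_min, f(x_min))
```

Each pass shrinks the interval by a factor of $2/n$, so with n = 10 six passes take $h$ from 0.5 below the tolerance. Setting $f'(x) = 0$ gives $x = \log\left(250/(22500 \log 1.00949)\right)/\log 1.00949 \approx 17.197$, in line with the run above.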

Program 6
This program plots contours for a function of two variables f(x, y). The function is evaluated for x values between xa and xb and y values between ya and yb. The characters (the digits 0 to 9) indicate the "height" of the function. We have used this display as it only requires a text string, as opposed to full graphics.
You may wish to set vt, the number of vertical lines, and vh, the number of horizontal character spaces, to fit the display to your own machine.

]LIST
100 REM CONTOUR PLOT
110 REM SET VT=NO VERT LINES
120 REM VH=NO HORIZONTAL CHAR SPACES
130 REM BOUNDS XA-XB,YA,YB
140 REM FN DEFINED AT 250
150 VT = 20:VH = 39
160 DIM S$(40)
170 DIM Z(20,40):MAX = 0:MIN = 0
180 INPUT "XA = ";XA: INPUT "XB = ";XB
190 INPUT "YA= ";YA: INPUT "YB = ";YB
200 IX = (XB - XA) / VH:IY = (YB - YA) / VT
210 FOR I = 1 TO VT
220 Y = YA + I * IY
230 FOR J = 1 TO VH
240 X = XA + J * IX
250 Z(I,J) = 100 * (X * X - Y * Y)
260 P = Z(I,J)
270 IF MAX < P THEN MAX = P
280 IF MIN > P THEN MIN = P
290 NEXT
300 NEXT
310 REM GRID COMPUTED IN Z
320 PRINT "GRID COMPUTED "
330 PRINT "MAX= ";MAX,"MIN= ";MIN
340 REM WAIT HERE
350 GET A$
360 R = (MAX - MIN)
370 HOME
380 FOR I = 1 TO VT
390 FOR J = 1 TO VH
400 K = INT ((Z(I,J) - MIN) * 9 / R) + 48
410 Q$ = Q$ + CHR$ (K)
420 NEXT
430 PRINT Q$
440 Q$ = ""
450 NEXT
460 PRINT "END": END
]RUN
XA = -1
XB= 1
YA= -1
YB= 1
GRID COMPUTED
MAX= 100 MIN= -99.9342538
444333222111111000000001111112223334445
554443332222211111111111122222333444556
655544433332222222222222222333344455566
666555444333333322222233333334445556667
766655544444333333333333334444455566677
777665555444444333333334444445555667778
877666555544444444444444444455556667788
877766655554444444444444444555566677788
887766665555444444444444445555666677888
887766665555444444444444445555666677888
887766665555444444444444445555666677888
877766655554444444444444444555566677788
877666555544444444444444444455556667788
777665555444444333333334444445555667778
766655544444333333333333334444455566677
666555444333333322222233333334445556667
655544433332222222222222222333344455566
554443332222211111111111122222333444556
444333222111111000000001111112223334445
433222111100000000000000000011112223344
END

A second run, evidently with a different function at line 250 (one whose minimum lies at the centre of the grid), gave:

]RUN
XA = -1
XB= 1
YA= -1
YB= 1
GRID COMPUTED
MAX= 200 MIN= 0
776665554444443333333333444444555666778
666555444333333322222233333334445556667
655544433332222222222222222333344455566
554443332222211111111111122222333444556
544333222211111111111111111122223334455
443332222111110000000000111112222333445
443322211111000000000000001111122233444
433322211110000000000000000111122233344
433222111100000000000000000011112223344
433222111100000000000000000011112223344
433222111100000000000000000011112223344
433322211110000000000000000111122233344
443322211111000000000000001111122233444
443332222111110000000000111112222333445
544333222211111111111111111122223334455
554443332222211111111111122222333444556
655544433332222222222222222333344455566
666555444333333322222233333334445556667
776665554444443333333333444444555666778
887766665555544444444444455555666677889
END
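The same character "contour" idea as a short Python sketch: f is evaluated on the grid used by lines 200 to 250, and each value is mapped linearly onto the digits 0 to 9, which is what `CHR$(48 + level)` does in line 400. One assumed difference: the sketch takes the true minimum and maximum of the grid, whereas the listing starts MIN and MAX at 0.

```python
def char_contour(f, xa, xb, ya, yb, rows=20, cols=39):
    """Return `rows` strings of `cols` digits: f on the grid
    x = xa + j*dx (j = 1..cols), y = ya + i*dy (i = 1..rows), scaled so the
    smallest value prints as 0 and the largest as 9."""
    dx = (xb - xa) / cols
    dy = (yb - ya) / rows
    z = [[f(xa + j * dx, ya + i * dy) for j in range(1, cols + 1)]
         for i in range(1, rows + 1)]
    zmin = min(min(row) for row in z)
    zmax = max(max(row) for row in z)
    span = (zmax - zmin) or 1.0          # avoid dividing by zero for a flat f
    return ["".join(str(int((v - zmin) * 9 / span)) for v in row) for row in z]


for line in char_contour(lambda x, y: 100 * (x * x - y * y), -1, 1, -1, 1):
    print(line)
```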

Program 7
This is a non-linear least squares program which minimises a nonlinear sum of squares using a technique called Marquardt's method. We have included it so you might have an example of the minimisation of a more complex function; this technique is often used in practical problems. The program tries to find the b values which minimise
$$\sum_{i=1}^{12} \left\{ x_i - b_1/[1 + b_2 \exp(i b_3)] \right\}^2,$$
the data being obtained from the DATA statements. As you will see from the output, we have printed values of IG, INF and LAMBDA. These are diagnostic parameters.
Notice that there is a Choleski decomposition algorithm starting at line 1270 which transforms a matrix into lower triangular form. The subroutine which follows it solves equations using this decomposition. This is a very good example of the practical use of some matrix results.

]LIST
100 REM THIS IS MARQUART
110 REM DERIVATIVES NOT NEEDED
120 DIM E(12),A(6),C(6),B(3),X(3),V(3),D(3),F(12),Z(3)
130 DATA 5.308,7.24,9.638,12.866
140 DATA 17.069,23.192,31.443,38.558
150 DATA 50.156,62.948,75.995,91.972
160 INPUT "NO OF PARAMETERS=";N
170 INPUT "NO OF RESIDUALS=";M
180 INPUT "TOLERENCE,USUALLY 1E-37";TL
190 FOR I = 1 TO M: READ E(I): NEXT
200 PRINT "INPUT INITIAL PARAM VALS"
210 FOR I = 1 TO N
220 INPUT "B(I)=";B(I)
230 NEXT
240 INC = 10:DEC = 0.4
250 PH = 1:LAM = 0.0001
260 REM FUNCTION EVAL
270 FOR IS = 1 TO M: GOSUB 1100: NEXT
280 GOSUB 1790
290 P0 = P
300 FOR I = 1 TO N:Z(I) = B(I): NEXT
310 IG = IG + 1:INF = INF + 1
320 LAM = LAM * DEC
330 PRINT "FN VAL=";P0
340 PRINT "IG =";IG;" INF=";INF
350 PRINT "LAMBDA=";LAM
360 MM = N * (N + 1) / 2
370 FOR I = 1 TO MM
380 A(I) = 0
390 NEXT
400 FOR I = 1 TO N
410 V(I) = 0
420 NEXT
430 FOR I = 1 TO M
440 IS = I: GOSUB 1160
450 FOR J = 1 TO N
460 V(J) = V(J) + D(J) * F(I)
470 Q = J * (J - 1) / 2
480 FOR K = 1 TO J - 1
490 A(Q + K) = A(Q + K) + D(J) * D(K)
500 NEXT K
510 NEXT J
520 NEXT I
530 FOR J = 1 TO MM
540 C(J) = A(J)
550 NEXT
560 FOR J = 1 TO N
570 Z(J) = B(J)
580 NEXT
590 FOR J = 1 TO N
600 Q = J * (J + 1) / 2
610 A(Q) = C(Q) * (1 + LAM) + PH * LAM
620 X(J) = - V(J)
630 IF J = 1 GOTO 670
640 FOR I = 1 TO J - 1
650 A(Q - I) = C(Q - I)
660 NEXT
670 NEXT
680 GOSUB 1270
690 IF ND > 1 THEN GOTO 880
700 GOSUB 1530
710 CO = 0
720 FOR I = 1 TO N
730 B(I) = Z(I) + X(I)
740 IF B(I) = Z(I) THEN CO = CO + 1
750 NEXT
760 IF CO = N GOTO 950
770 FOR IS = 1 TO M: GOSUB 1100: NEXT
780 GOSUB 1790
790 INF = INF + 1
800 REM IF HANGS HERE TRY 770
810 PRINT "FN VAL=";P
820 FOR I = 1 TO N
830 PRINT "B(";I;")=";B(I)
840 NEXT
850 PRINT "LAMBDA=";LAM
860 IF P < P0 THEN GOTO 290
870 REM INCREASE PARAM
880 LAM = LAM * INC
890 IF LAM < 1E15 THEN GOTO 930
900 PRINT "LAM TOO BIG,TRY RESTART"
910 PRINT "TRY A NEW START"
920 GOTO 950
930 IF LAM = 0 THEN LAM = TL
940 GOTO 590
950 PRINT "END OF RUN"
960 PRINT "FN VALUE=",P
970 PRINT "LAST FN VALUE =";P0
980 PRINT "CORRESPONDING PARAMETERS"
990 FOR I = 1 TO N
1000 PRINT "B(";I;") = ";Z(I): NEXT
1010 PRINT "COEFF VALS=";
1020 FOR I = 1 TO N
1030 PRINT "B(";I;")=";B(I)
1040 NEXT
1050 PRINT "RESIDUALS "
1060 FOR I = 1 TO M
1070 PRINT I;" ";F(I)
1080 NEXT
1090 STOP
1100 REM RESIDUALS EVALUATED
1110 PX = IS * B(3)
1120 IF PX > 8 THEN F(IS) = 3E4: GOTO 1150
1130 PX = B(2) * EXP (PX) + 1:PX = B(1) / PX
1140 F(IS) = PX - E(IS)
1150 RETURN
1160 REM DERIVATIVE
1170 EPS = 0.000001
1180 FOR L = 1 TO N
1190 XX = B(L):H = ABS (XX) * EPS + EPS * EPS
1200 GOSUB 1100
1210 P1 = F(IS)
1220 B(L) = B(L) + H
1230 GOSUB 1100
1240 P2 = F(IS)
1250 D(L) = P2 - P1:D(L) = D(L) / H
1260 RETURN
1270 REM CHOLESKI DECOMP
1280 FOR J = 1 TO N
1290 Q = J * (J + 1) / 2
1300 IF J = 1 THEN GOTO 1390
1310 FOR I = J TO N
1320 MM = I * (I - 1) / 2 + J
1330 S = A(MM)
1340 FOR K = 1 TO J - 1
1350 S = S - A(MM - K) * A(Q - K)
1360 NEXT
1370 A(MM) = S
1380 NEXT
1390 IF A(Q) > 0 THEN ND = 0: GOTO 1430
1400 REM ND=0 IS POS DEF
1410 ND = 10
1420 A(Q) = 0
1430 S = SQR (A(Q))
1440 FOR I = J TO N
1450 MM = I * (I - 1) / 2 + J
1460 IF S = 0 THEN A(MM) = 0: GOTO 1480
1470 A(MM) = A(MM) / S
1480 NEXT
1490 NEXT
1500 REM END OF CHOLESKI
1510 RETURN
1520 REM NOW RECONSTRUCT FROM CHOL
1530 IF A(1) = 0 THEN X(1) = 0: GOTO 1550
1540 X(1) = X(1) / A(1)
1550 REM
1560 IF N = 1 THEN GOTO 1660
1570 Q = 1
1580 FOR I = 2 TO N
1590 FOR J = 1 TO I - 1
1600 Q = Q + 1:X(I) = X(I) - A(Q) * X(J)
1610 NEXT
1620 Q = Q + 1
1630 IF A(Q) = 0 THEN X(I) = 0: GOTO 1650
1640 X(I) = X(I) / A(Q)
1650 NEXT
1660 IF A(N * (N + 1) / 2) = 0 THEN X(N) = 0: GOTO 1680
1670 X(N) = X(N) / A(N * (N + 1) / 2)
1680 IF N = 1 GOTO 1770
1690 FOR I = N TO 2 STEP - 1
1700 Q = I * (I - 1) / 2
1710 FOR J = 1 TO I - 1
1720 X(J) = X(J) - X(I) * A(Q + J)
1730 NEXT
1740 IF A(Q) = 0 THEN X(I - 1) = 0: GOTO 1760
1750 X(I - 1) = X(I - 1) / A(Q)
1760 NEXT
1770 REM END CHOLESKI
1780 RETURN
1790 REM SUM OF SQUARES
1800 CU = 0
1810 FOR IS = 1 TO M
1820 CU = CU + F(IS) * F(IS)
1830 NEXT
1840 P = CU
1850 RETURN
]RUN
NO OF PARAMETERS=3
NO OF RESIDUALS=12
TOLERENCE,USUALLY 1E-370.000001
INPUT INITIAL PARAM VALS
B(I)=196
B(I)=50
B(I)=-0.3
FN VAL=237.859078
IG =1 INF=1
LAMBDA=4E-05
FN VAL=3.88438723
B(1)=217.300262
B(2)=50
B(3)=-.3
LAMBDA=4E-05
FN VAL=3.88438723
IG =2 INF=3
LAMBDA=1.6E-05
FN VAL=9.00000008E+09
B(1)=219.169401
B(2)=50
B(3)=3.51491685
LAMBDA=.456976
FN VAL=7321.77648
B(1)=217.308558
B(2)=50
B(3)=-.199591924
LAMBDA=6.397664
FN VAL=100.039937
B(1)=217.306271
B(2)=49.9246579
B(3)=-.308975884
LAMBDA=1439.4744
FN VAL=3.88412928
B(1)=217.302883
B(2)=49.9953757
B(3)=-.300008395
LAMBDA=23031.5904
FN VAL=3.88412928
IG =3 INF=8
LAMBDA=9212.63617
FN VAL=3.89945702
B(1)=217.305573
B(2)=49.9835004
B(3)=-.300169186
LAMBDA=9212.63617
FN VAL=3.88496789
B(1)=217.305491
B(2)=49.9947509
B(3)=-.300005289
LAMBDA=175040.087
FN VAL=3.88425395
B(1)=217.305491
B(2)=49.9953445
B(3)=-.300008209
LAMBDA=3500801.74
FN VAL=3.88421308
B(1)=217.305491
B(2)=49.9953742
B(3)=-.300008386
LAMBDA=73516836.6
FN VAL=3.88421113
B(1)=217.305491
B(2)=49.9953756
B(3)=-.300008395
LAMBDA=1.61737041E+09
END OF RUN
FN VALUE=3.88421113
LAST FN VALUE =3.88412928
CORRESPONDING PARAMETERS
B(1) = 217.302883
B(2) = 49.9953757
B(3) = -.300008395
COEFF VALS=B(1)=217.305491
B(2)=49.9953757
B(3)=-.300008395
RESIDUALS
1 .404976705
2 .401489398
3 .551654126
4 .666696049
5 .808856748
6 .265581213
7 -.930688426
8 .701020166
9 -.311987355
10 -.663439274
11 .418302804
12 -.123885065

BREAK IN 1090

Program 8
This is another complex program, which finds the minimum of a function of n variables b(1), ..., b(n). It is a variable metric (quasi-Newton) method. In this example the function, defined in the subroutine starting at line 950, is

$$P(b_1, b_2) = 100\,(b_2 - b_1^2)^2 + (b_1 - 1)^2,$$

a famous test example (Rosenbrock's function), whose minimum value 0 is attained at b_1 = b_2 = 1. Starting values for the b's are needed.
]LIST
100 REM VARIABLE METRIC NKI
110 INPUT "THE NO OF PARAMS= ";N
120 FOR I = 1 TO N: INPUT "B(I)= ";B(I)
130 NEXT
140 W = 0.2:TL = 0.0001
150 REM SETUP
160 GOSUB 950
170 PO = P
180 INF = INF + 1
190 GOSUB 1000
200 IG = IG + 1
210 FOR I = 1 TO N
220 CB(I,I) = 1
230 NEXT
240 ILAST = IG
250 PRINT "FN EVALUATION NO";INF
260 PRINT "GRADIENT CALCS=";IG
270 PRINT "FN =";PO
280 FOR I = 1 TO N
290 PRINT "COEFF=";B(I): NEXT
300 PRINT
310 REM ITER LOOP
320 FOR I = 1 TO N
330 X(I) = B(I):C(I) = G(I)
340 NEXT I
350 D1 = 0
360 FOR I = 1 TO N
370 S = 0
380 FOR J = 1 TO N
390 S = S - CB(I,J) * G(J)
400 NEXT
410 T(I) = S:D1 = D1 - S * G(I)
420 NEXT I
430 IF D1 > 0 THEN GOTO 460
440 IF ILAST = IG GOTO 860
450 GOTO 160
460 K = 1
470 COUNT = 0
480 FOR I = 1 TO N
490 B(I) = X(I) + K * T(I)
500 IF B(I) = X(I) THEN COUNT = COUNT + 1
510 NEXT I
520 IF COUNT < N GOTO 550
530 IF ILAST = IG THEN GOTO 860
540 GOTO 160
550 GOSUB 950
560 INF = INF + 1
570 REM IF FN NO COMP THEN
580 IF P < PO - D1 * K * TL THEN GOTO 600
590 K = W * K: GOTO 470
600 PO = P
610 GOSUB 1000
620 IG = IG + 1
630 D1 = 0
640 FOR I = 1 TO N
650 T(I) = K * T(I)
660 C(I) = G(I) - C(I):D1 = D1 + T(I) * C(I)
670 NEXT I
680 IF D1 < 0 THEN GOTO 210
690 IF D1 = 0 GOTO 160
700 D2 = 0
710 FOR I = 1 TO N
720 S = 0
730 FOR J = 1 TO N
740 S = S + C(J) * CB(I,J)
750 NEXT J
760 X(I) = S:D2 = D2 + S * C(I)
770 NEXT I
780 D2 = D2 / D1 + 1
790 FOR I = 1 TO N
800 FOR J = 1 TO N
810 WX = T(I) * X(J) + X(I) * T(J) - D2 * T(I) * T(J)
820 CB(I,J) = CB(I,J) - WX / D1
830 NEXT
840 NEXT
850 GOTO 250
860 PRINT "NO FN EVALUATIONS=";INF
870 PRINT "NO GRADIENT CALCS=";IG
880 PRINT "NUMBER OF FUNCTION CALLS ";SQ
890 PRINT "FUNCTION VALUE=";PO
900 FOR I = 1 TO N
910 PRINT
920 PRINT "I= ";I;" B(I)=",B(I)
930 NEXT I
940 END
950 SX = B(2) - B(1) * B(1)
960 SY = B(1) - 1
970 P = SX * SX * 100 + SY * SY
980 SQ = SQ + 1
990 RETURN
1000 EPS = 0.00001
1010 FOR I = 1 TO N
1020 XX = B(I):H = XX * EPS + EPS * EPS
1030 GOSUB 950
1040 P1 = P
1050 B(I) = B(I) + H
1060 GOSUB 950
1070 P2 = P
1080 B(I) = XX:G(I) = P2 - P1
1090 G(I) = G(I) / H
1100 NEXT
1110 RETURN
]RUN
THE NO OF PARAMS= 2
B(I)= 0.5
B(I)= 0.7
FN EVALUATION NO1
GRADIENT CALCS=1
FN =20.5
COEFF=.5
COEFF=.7

FN EVALUATION NO6
GRADIENT CALCS=2
FN =2.06336416
COEFF=.64559693
COEFF=.555998938

FN EVALUATION NO9
GRADIENT CALCS=3
FN =.438546167
COEFF=.92955768
COEFF=.798230381

FN EVALUATION NO10
GRADIENT CALCS=4
FN =.423564151
COEFF=.733021118
COEFF=.596673681

FN EVALUATION NO11
GRADIENT CALCS=5
FN =.0994769478
COEFF=.788153424
COEFF=.644552029

FN EVALUATION NO12
GRADIENT CALCS=6
FN =.0684814198
COEFF=.859137235
COEFF=.716062515

FN EVALUATION NO13
GRADIENT CALCS=7
FN =.0306907852
COEFF=.825135854
COEFF=.681913677

FN EVALUATION NO14
GRADIENT CALCS=8
FN =.028705741
COEFF=.8317717
COEFF=.689831749

FN EVALUATION NO15
GRADIENT CALCS=9
FN =.0275865741
COEFF=.837636146
COEFF=.698134952

FN EVALUATION NO16
GRADIENT CALCS=10
FN =.0249911004
COEFF=.852374883
COEFF=.720887921

FN EVALUATION NO17
GRADIENT CALCS=11
FN =.0210267683
COEFF=.874676951
COEFF=.757765317

FN EVALUATION NO18
GRADIENT CALCS=12
FN =.0146130444
COEFF=.902775999
COEFF=.807820817

FN EVALUATION NO19
GRADIENT CALCS=13
FN =5.6447563E-03
COEFF=.933557974
COEFF=.868023051

FN EVALUATION NO20
GRADIENT CALCS=14
FN =1.87848705E-03
COEFF=.958881672
COEFF=.918083769

FN EVALUATION NO21
GRADIENT CALCS=15
FN =4.89762509E-04
COEFF=.983692503
COEFF=.966154852

FN EVALUATION NO22
GRADIENT CALCS=16
FN =1.51795138E-04
COEFF=.991676235
COEFF=.984330105

FN EVALUATION NO23
GRADIENT CALCS=17
FN =5.3680833E-06
COEFF=.997989834
COEFF=.9958685

FN EVALUATION NO35
GRADIENT CALCS=18
FN =5.3680833E-06
COEFF=.997989834
COEFF=.9958685

FN EVALUATION NO40
GRADIENT CALCS=19
FN =4.64069246E-06
COEFF=.997948812
COEFF=.995836005

FN EVALUATION NO49
GRADIENT CALCS=20
FN =4.64069229E-06
COEFF=.997948812
COEFF=.995836004

FN EVALUATION NO60
GRADIENT CALCS=21
FN =4.64069229E-06
COEFF=.997948812
COEFF=.995836004

FN EVALUATION NO65
GRADIENT CALCS=22
FN =4.35884451E-06
COEFF=.997913901
COEFF=.995840544

FN EVALUATION NO76
GRADIENT CALCS=23
FN =4.35884451E-06
COEFF=.997913901
COEFF=.995840544

FN EVALUATION NO81
GRADIENT CALCS=24
FN =4.33694836E-06
COEFF=.997917467
COEFF=.995839032

FN EVALUATION NO92
GRADIENT CALCS=25
FN =4.33694836E-06
COEFF=.997917467
COEFF=.995839032

NO FN EVALUATIONS=102
NO GRADIENT CALCS=25
NUMBER OF FUNCTION CALLS 202
FUNCTION VALUE=4.33694836E-06

I= 1 B(I)=.997917467

I= 2 B(I)=.995839032
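Lines 760-820 update the matrix CB by what appears to be the BFGS (Broyden-Fletcher-Goldfarb-Shanno) formula for an inverse-Hessian estimate, and lines 460-590 shrink the step by the factor W = 0.2 until the decrease test with TL = 0.0001 is passed. The Python sketch below mirrors that structure under our own assumptions: the function names are ours, and it uses the analytic gradient of the test function instead of the forward differences of lines 1000-1110, so its evaluation counts will not match the run above.

```python
def rosenbrock(b):
    """P(b) = 100(b2 - b1^2)^2 + (b1 - 1)^2, the test function of lines 950-970."""
    sx = b[1] - b[0] * b[0]
    sy = b[0] - 1.0
    return 100.0 * sx * sx + sy * sy


def rosenbrock_grad(b):
    """Analytic gradient (an assumption; the BASIC program uses forward differences)."""
    sx = b[1] - b[0] * b[0]
    return [-400.0 * b[0] * sx + 2.0 * (b[0] - 1.0), 200.0 * sx]


def dot(u, v):
    return sum(a * c for a, c in zip(u, v))


def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def variable_metric(f, grad, x0, shrink=0.2, tl=1e-4, tol=1e-14, max_iter=500):
    """Minimise f with a BFGS-type variable metric method and a step-shrinking search."""
    n = len(x0)
    x, fx, g = list(x0), f(x0), grad(x0)
    H = identity(n)                                   # inverse-Hessian estimate (CB)
    for _ in range(max_iter):
        t = [-dot(row, g) for row in H]               # search direction t = -H g
        slope = dot(t, g)
        if slope >= 0.0:                              # not downhill: reset the metric
            H = identity(n)
            t = [-gi for gi in g]
            slope = dot(t, g)
        if -slope < tol:                              # gradient negligible in this metric
            break
        k = 1.0
        while True:                                   # shrink k until sufficient decrease
            xn = [xi + k * ti for xi, ti in zip(x, t)]
            fn = f(xn)
            if fn <= fx + tl * k * slope:
                break
            k *= shrink
            if k < 1e-20:                             # no further progress possible
                return x, fx
        gn = grad(xn)
        s = [k * ti for ti in t]                      # step actually taken
        y = [a - c for a, c in zip(gn, g)]            # change in the gradient
        sy = dot(s, y)
        if sy > 0.0:                                  # BFGS update keeps H positive definite
            Hy = [dot(row, y) for row in H]
            d2 = 1.0 + dot(y, Hy) / sy
            for i in range(n):
                for j in range(n):
                    H[i][j] += (d2 * s[i] * s[j] - s[i] * Hy[j] - Hy[i] * s[j]) / sy
        x, fx, g = xn, fn, gn
    return x, fx


b, p = variable_metric(rosenbrock, rosenbrock_grad, [0.5, 0.7])
print(b, p)
```

Starting from the same point (0.5, 0.7), the sketch should approach the minimum at (1, 1).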

Program 9
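This is a simple program that integrates the function defined at line 160 using the trapezoidal rule with n equal strips (n + 1 points). The integrand is the standard normal density, so the integral from -4 to 0 should be close to 0.5. Two values of n are illustrated. See also Program 10 for Simpson's rule.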
]LIST
100 REM INTEGRATION USING TRAPEZOIDAL
110 REM FN DEFINED IN 160
120 REM LIMITS A,B
130 INPUT "GIVE LOWER LIMIT ";A
140 INPUT "GIVE UPPER LIMIT ";B
150 INPUT "NO OF POINTS ";N
160 DEF FN F(X) = EXP ( - X * X / 2) * 0.39894228
170 IC = (B - A) / N
180 S = 0
190 FOR I = 1 TO N - 1
200 X = A + I * IC
210 S = S + FN F(X)
220 NEXT
230 S = S + (FN F(A) + FN F(B)) / 2
240 S = S * IC
250 PRINT "INTEGRAL IS",S
260 END
]RUN
GIVE LOWER LIMIT -4
GIVE UPPER LIMIT 0
NO OF POINTS 25
INTEGRAL IS .499967192
]RUN
GIVE LOWER LIMIT -4
GIVE UPPER LIMIT 0
NO OF POINTS 50
INTEGRAL IS .499968043
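As a check on the listing, here is a minimal Python sketch of the same composite trapezoidal rule; the function names are our own. With n = 25 and n = 50 it should agree with the two printed integrals to about eight decimal places. Its error relative to the exact value 0.5 erf(2√2) ≈ 0.49996833 falls by roughly a factor of four when n is doubled, as expected for a rule whose error is proportional to h².

```python
import math


def normal_density(x):
    """Integrand of line 160: exp(-x^2/2)/sqrt(2 pi); 0.39894228 is 1/sqrt(2 pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule on [a, b] with n equal strips (n + 1 points)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))        # the two end points carry weight 1/2
    for i in range(1, n):
        total += f(a + i * h)          # interior points carry weight 1
    return total * h


for n in (25, 50):
    print(n, trapezoid(normal_density, -4.0, 0.0, n))
```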

Program 10
This program integrates the function defined at line 150 using Simpson's rule on n strips, each strip using its two end points and its midpoint. Two values of n are used as illustration.

]LIST
100 REM SIMPSONS RULE
110 REM INTEGRAND IS DEFINED ON LINE 150
120 REM A,B ARE LIMITS OF INTEGRATION
130 REM N IS THE NUMBER OF STRIPS
140 REM SIMPSONS RULE
150 DEF FN F(X) = EXP ( - X * X / 2) * 0.39894228
160 INPUT "LOWER LIMIT ";A
170 INPUT "UPPER LIMIT ";B
180 INPUT "N= ";N
190 GOSUB 1000
200 PRINT "INTEGRAL IS ";P
210 END
1000 D = (B - A) / N
1010 P = 0:Z0 = FN F(A)
1020 FOR I = 1 TO N
1030 Z1 = FN F(A + 0.5 * D)
1040 A = A + D:Z2 = FN F(A)
1050 P = P + D * (Z0 + Z2 + 4 * Z1) / 6
1060 Z0 = Z2
1070 NEXT
1080 RETURN
]RUN
LOWER LIMIT -4
UPPER LIMIT 0
N= 25
INTEGRAL IS .499968328
]RUN
LOWER LIMIT -4
UPPER LIMIT 0
N= 50
INTEGRAL IS .499968329
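A corresponding Python sketch of the strip-by-strip Simpson's rule of lines 1000-1080 follows; the names are our own. Because Simpson's rule is exact for cubics, n = 25 already agrees with the exact value 0.5 erf(2√2) to about nine decimal places, which is why the two runs above differ only in the last digit.

```python
import math


def normal_density(x):
    """Integrand of line 150: exp(-x^2/2)/sqrt(2 pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def simpson_strips(f, a, b, n):
    """Simpson's rule applied to each of n strips: end points and midpoint
    of every strip are weighted 1, 4, 1 (cf. lines 1000-1080)."""
    d = (b - a) / n
    total = 0.0
    z0 = f(a)
    for i in range(n):
        left = a + i * d
        z1 = f(left + 0.5 * d)
        z2 = f(left + d)
        total += d * (z0 + 4.0 * z1 + z2) / 6.0
        z0 = z2                        # right end of this strip is the left end of the next
    return total


for n in (25, 50):
    print(n, simpson_strips(normal_density, -4.0, 0.0, n))
```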

Program 11
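This is a program which can be used to solve systems of equations and to find inverses. Essentially it uses elementary row operations, with partial pivoting (the largest available element in each column is moved into the pivot position by a row interchange, lines 300-390), to find the solution of

$$Ax = b$$

and is written to work for several different b's at the same time; the determinant of A is computed along the way. If the b's are chosen to be the columns of the unit matrix then the solutions are the columns of the inverse matrix. Note that the tolerance asked for is used to stop the program when a pivot is so close to zero that A is computationally singular (lines 410 and 510).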
]LIST
100 REM GAUSS ELIMINATION
110 INPUT "THE ORDER OF A";N
120 INPUT "THE NUMBER OF R.H.SIDES";P
130 PRINT "NOW INPUT A,COLUMNWISE"
140 FOR I = 1 TO N
150 PRINT "COLUMN ";I
160 FOR J = 1 TO N
170 INPUT A(J,I)
180 NEXT
190 NEXT
200 PRINT "NOW INPUT R.H.SIDES"
210 FOR I = 1 TO P
220 PRINT "COL NO. ";I
230 FOR J = 1 TO N
240 INPUT A(J,N + I)
250 NEXT
260 NEXT
270 REM NOW TO START
280 D = 1: REM DET INITIALISED
290 INPUT "GIVE TOLERANCE ";TL
300 FOR J = 1 TO N - 1
310 S = ABS (A(J,J)):K = J
320 FOR H = J + 1 TO N
330 IF ABS (A(H,J)) > S THEN S = ABS (A(H,J)):K = H
340 NEXT: REM PIVOT SEARCH ENDED
350 IF K = J THEN GOTO 400
360 FOR I = J TO N + P
370 S = A(K,I):A(K,I) = A(J,I):A(J,I) = S
380 NEXT
390 D = - D
400 D = D * A(J,J)
410 IF ABS (A(J,J)) < TL THEN STOP
420 REM STOPS HERE IF COMP. SINGULAR
430 FOR K = J + 1 TO N
440 A(K,J) = A(K,J) / A(J,J)
450 FOR I = J + 1 TO N + P
460 A(K,I) = A(K,I) - A(K,J) * A(J,I)
470 NEXT
480 NEXT
490 NEXT
500 D = D * A(N,N): REM DET COMPUTED
510 IF ABS (A(N,N)) < = TL THEN STOP
520 REM AGAIN COMP.SINGULAR
530 REM ....................
540 REM BACK SUBSTITUTION
550 FOR I = N + 1 TO N + P
560 A(N,I) = A(N,I) / A(N,N)
570 FOR J = N - 1 TO 1 STEP - 1
580 S = A(J,I)
590 FOR K = J + 1 TO N
600 S = S - A(J,K) * A(K,I)
610 NEXT
620 A(J,I) = S / A(J,J)
630 NEXT
640 NEXT
650 REM PRINT SOLNS
660 FOR I = 1 TO P
670 PRINT "NO.";I
680 FOR J = 1 TO N
690 PRINT A(J,I + N)
700 NEXT
710 PRINT
720 NEXT
730 END
]RUN
THE ORDER OF A3
THE NUMBER OF R.H.SIDES1
NOW INPUT A,COLUMNWISE
COLUMN 1
?1
?3
?4
COLUMN 2
?2
?1
?-3
COLUMN 3
?1
?-2
?-1
NOW INPUT R.H.SIDES
COL NO. 1
?2
?1
?3
GIVE TOLERANCE 0.001
NO.1
1
0
1

]RUN
THE ORDER OF A3
THE NUMBER OF R.H.SIDES3
NOW INPUT A,COLUMNWISE
COLUMN 1
?2
?1
?-1
COLUMN 2
?1
?3
?2
COLUMN 3
?-1
?2
?1
NOW INPUT R.H.SIDES
COL NO. 1
?1
?0
?0
COL NO. 2
?0
?1
?0
COL NO. 3
?0
?0
?1
GIVE TOLERANCE 0.001
NO.1
.1
.3
-.5

NO.2
.3
-.1
.5

NO.3
-.5
.5
-.5
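A minimal Python sketch of the same scheme (Gaussian elimination with partial pivoting on an augmented matrix holding several right-hand sides, then back substitution) is given below; the function name and the choice to raise an exception where the BASIC program executes STOP are our own. The tests reproduce both runs above.

```python
def gauss_solve(a, rhs, tol=1e-3):
    """Solve A x = b for each b in rhs by Gaussian elimination with partial
    pivoting. Returns (list of solutions, determinant of A).
    a   : n x n matrix as a list of rows
    rhs : list of right-hand-side vectors, each of length n"""
    n, p = len(a), len(rhs)
    # augmented matrix [A | b_1 ... b_p]
    m = [[float(v) for v in a[i]] + [float(b[i]) for b in rhs] for i in range(n)]
    det = 1.0
    for j in range(n):
        k = max(range(j, n), key=lambda r: abs(m[r][j]))   # pivot search
        if k != j:
            m[j], m[k] = m[k], m[j]                          # row interchange flips det sign
            det = -det
        det *= m[j][j]
        if abs(m[j][j]) < tol:
            raise ValueError("matrix is computationally singular")
        for r in range(j + 1, n):                            # eliminate below the pivot
            factor = m[r][j] / m[j][j]
            for c in range(j, n + p):
                m[r][c] -= factor * m[j][c]
    sols = []
    for c in range(n, n + p):                                # back substitution per RHS
        x = [0.0] * n
        for r in range(n - 1, -1, -1):
            s = m[r][c] - sum(m[r][q] * x[q] for q in range(r + 1, n))
            x[r] = s / m[r][r]
        sols.append(x)
    return sols, det


sols, det = gauss_solve([[1, 2, 1], [3, 1, -2], [4, -3, -1]], [[2, 1, 3]])
print(sols, det)
```

Taking the unit vectors as right-hand sides returns the columns of the inverse, exactly as in the second run.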

Program 12
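This is a program for finding the eigenvalues and eigenvectors of a real symmetric matrix using what is often called Jacobi's method: a sequence of plane rotations, each chosen to make one off-diagonal element zero. The input ensures that a symmetric matrix is supplied, since only the upper triangle is read and the remaining elements are filled in by symmetry (line 190).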
]LIST
100 REM JACOBI ALG FOR EIGENVALUES
110 INPUT "GIVE ORDER OF MATRIX ";N
120 PRINT "THIS IS ONLY FOR SYMMETRIC MATRIX"
130 DIM V(N,N),A(N,N): REM A IS INPUT V HOLDS EVECTORS
140 PRINT "READ UPPER TRIANGLE COLUMNWISE"
150 FOR I = 1 TO N
160 PRINT "COLUMN ";I
170 FOR J = 1 TO I
180 INPUT A(I,J)
190 IF J < I THEN A(J,I) = A(I,J)
200 NEXT
210 NEXT
220 GOSUB 870
230 FOR I = 1 TO N
240 FOR J = 1 TO N:V(I,J) = 0: NEXT
250 V(I,I) = 1
260 NEXT
270 CO = 0
280 CO = CO + 1
290 IF CO < 30 THEN GOTO 340
300 PRINT "STOPPED BY ITERATION COUNT": PRINT
310 GOSUB 710
320 GOSUB 770
330 STOP
340 M = 0
350 FOR I = 1 TO N - 1
360 FOR J = I + 1 TO N
370 P = 0.5 * (A(I,J) + A(J,I))
380 Q = A(I,I) - A(J,J):T = SQR (4 * P * P + Q * Q)
390 IF T = 0 THEN GOTO 500
400 IF Q < 0 THEN GOTO 450
410 IF ABS (A(I,I)) = ABS (A(I,I)) + 100 * ABS (P) THEN GOTO 430
420 GOTO 440
430 IF ABS (A(J,J)) = ABS (A(J,J)) + 100 * ABS (P) THEN GOTO 500
440 C = SQR ((T + Q) / (2 * T)):S = P / (T * C): GOTO 470
450 S = SQR ((T - Q) / (2 * T))
460 IF P < 0 THEN S = - S
470 C = P / (T * S)
480 IF ABS (S) < > 0 THEN GOTO 520
490 GOTO 650
500 M = M + 1
510 GOTO 650
520 FOR K = 1 TO N
530 Q = A(I,K)
540 A(I,K) = C * Q + S * A(J,K)
550 A(J,K) = - S * Q + C * A(J,K)
560 NEXT
570 FOR K = 1 TO N
580 Q = A(K,I)
590 A(K,I) = C * Q + S * A(K,J)
600 A(K,J) = - S * Q + C * A(K,J)
610 QQ = V(K,I)
620 V(K,I) = C * QQ + S * V(K,J)
630 V(K,J) = - S * QQ + C * V(K,J)
640 NEXT
650 NEXT
660 NEXT
670 IF M < N * (N - 1) / 2 THEN GOTO 280
680 GOSUB 710
690 GOSUB 770
700 END
710 REM PRINT EIGENVALUES
720 PRINT "EIGENVALUES ...": PRINT
730 FOR I = 1 TO N
740 PRINT " ";I;" ";A(I,I)
750 NEXT
760 RETURN
770 REM :PRINT EIGENVALUES
780 PRINT "EIGEN VALUES"
790 FOR I = 1 TO N
800 PRINT "VECTOR NO. ";I
810 PRINT
820 FOR J = 1 TO N
830 PRINT V(J,I)
840 NEXT
850 NEXT
860 RETURN
870 REM TRACER
880 FOR I = 1 TO N
890 FOR J = 1 TO N
900 PRINT A(J,I);" ";
910 NEXT
920 PRINT
930 NEXT
940 RETURN
]RUN
GIVE ORDER OF MATRIX 3
THIS IS ONLY FOR SYMMETRIC MATRIX
READ UPPER TRIANGLE COLUMNWISE
COLUMN 1
?1
COLUMN 2
?2
?3
COLUMN 3
?4
?5
?6
1 2 4
2 3 5
4 5 6
EIGENVALUES ...

 1 11.5640289
 2 -.0573962428
 3 -1.50663263
EIGEN VALUES
VECTOR NO. 1

.386153928
.530682311
.754494155
VECTOR NO. 2

-.62028727
.754782185
-.213418736
VECTOR NO. 3

-.682736294
-.385590636
.620637587
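The following is a minimal Python sketch of cyclic Jacobi rotations, offered as a cross-check on the run above. It uses a common textbook formulation of the rotation (via the tangent of the rotation angle) rather than the SQR expressions of lines 380-470, and stops on a small sum of squared off-diagonal elements instead of the count M of line 670; the function name and tolerances are our own choices. Column k of the returned matrix is the eigenvector belonging to the k-th eigenvalue, as V is in the listing.

```python
import math


def jacobi_eigen(a, tol=1e-20, max_sweeps=50):
    """Eigenvalues and eigenvectors of a real symmetric matrix by cyclic Jacobi
    rotations. Returns (eigenvalues, V); column k of V belongs to eigenvalue k."""
    n = len(a)
    a = [[float(v) for v in row] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # choose t = tan(angle) so that the rotated a[p][q] vanishes
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = 1.0 / (abs(theta) + math.sqrt(theta * theta + 1.0))
                if theta < 0.0:
                    t = -t
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):     # columns p and q: A <- A R, V <- V R
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):     # rows p and q: A <- R^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)], v


vals, vecs = jacobi_eigen([[1, 2, 4], [2, 3, 5], [4, 5, 6]])
print(sorted(vals))
```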

Program 13
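This program computes binomial probabilities P(X = i) and cumulative probabilities P(X ≤ i), i = 0, 1, ..., n, using the ratio of successive probabilities.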
]LIST
100 REM PROGRAM
110 REM COMPUTES BINOMIAL PROBS
120 REM THE NUMBER OF TRIALS IS N
130 REM P THE PROBABILITY OF SUCCESS
140 PRINT "BINOMIAL PROBABILITIES"
150 PRINT : PRINT :
160 INPUT " GIVE N ";N
170 INPUT " GIVE P ";P
180 INPUT "NO OF DECIMAL PLACES ";FO
190 Q = 1 - P
200 DIM A(N)
210 DEF FN R(X) = INT (X * (10 ^ FO) + 0.5) / (10 ^ FO)
220 IF P < Q THEN GOSUB 410: GOTO 240
230 GOSUB 330
240 PRINT " I P(X=I) P(X<=I)": PRINT :
250 FOR I = 0 TO N
260 Y = Y + A(I)
270 PRINT I;" "; FN R(A(I));" "; FN R(Y)
280 NEXT
290 PRINT "END OF RUN"
300 END
310 REM ...................
320 REM COMPUTES PROBS IF P>Q
330 XX = P / Q
340 A(0) = Q ^ N
350 FOR I = 1 TO N
360 A(I) = A(I - 1) * (N - I + 1) * XX / I
370 NEXT
380 RETURN
390 REM ...................
400 REM COMPUTES PROBS IF Q>P
410 X = Q / P
420 A(N) = P ^ N
430 FOR I = N TO 1 STEP - 1
440 A(I - 1) = A(I) * X * I / (N - I + 1)
450 NEXT
460 RETURN
]RUN
BINOMIAL PROBABILITIES


 GIVE N 17
 GIVE P 0.37
NO OF DECIMAL PLACES 4
 I P(X=I) P(X<=I)

0 4E-04 4E-04
1 3.9E-03 4.3E-03
2 .0182 .0225
3 .0534 .0759
4 .1099 .1858
5 .1677 .3535
6 .197 .5505
7 .1818 .7324
8 .1335 .8659
9 .0784 .9443
10 .0368 .9811
11 .0138 .9949
12 4E-03 .9989
13 9E-04 .9998
14 2E-04 1
15 0 1
16 0 1
17 0 1
END OF RUN
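The recursion in lines 330-460 uses the ratio P(X = i)/P(X = i - 1) = (n - i + 1)p/(iq), starting from whichever end of the range the listing chooses. A minimal Python sketch of the same idea follows, assuming 0 < p < 1; the function name is ours. The tests compare it with the direct formula C(n, i) p^i q^(n - i) and with the printed value for i = 6.

```python
import math


def binomial_probs(n, p):
    """P(X = i), i = 0..n, for X ~ Binomial(n, p) with 0 < p < 1, by the ratio
    recursion of lines 330-460."""
    q = 1.0 - p
    a = [0.0] * (n + 1)
    if p < q:                              # start at i = n and recurse downwards (lines 410-450)
        a[n] = p ** n
        for i in range(n, 0, -1):
            a[i - 1] = a[i] * (q / p) * i / (n - i + 1)
    else:                                  # start at i = 0 and recurse upwards (lines 330-370)
        a[0] = q ** n
        for i in range(1, n + 1):
            a[i] = a[i - 1] * (n - i + 1) * (p / q) / i
    return a


probs = binomial_probs(17, 0.37)
cumulative = 0.0
for i, pr in enumerate(probs):
    cumulative += pr
    print(i, round(pr, 4), round(cumulative, 4))
```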
BINOHIAL PROBABILITIES

Program 14
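This program computes Poisson probabilities P(X = i) and P(X ≤ i), i = 0, 1, ..., n, for a given mean, using P(X = i) = P(X = i - 1) × mean / i.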
]LIST
100 REM COMPUTES POISSON PROBS
110 REM THE NUMBER OF TRIALS IS N
120 REM P THE PROBABILITY OF SUCCESS
130 PRINT " POISSON PROBS. "
140 PRINT : PRINT :
150 INPUT " GIVE N ";N
160 INPUT " GIVE MEAN ";P
170 INPUT "NO OF DECIMAL PLACES ";FO
180 DIM A(N)
190 DEF FN R(X) = INT (X * (10 ^ FO) + 0.5) / (10 ^ FO)
200 GOSUB 280
210 PRINT " I P(X=I) P(X<=I)": PRINT :
220 FOR I = 0 TO N
230 Y = Y + A(I)
240 PRINT I;" "; FN R(A(I));" "; FN R(Y)
250 NEXT
260 PRINT "END OF RUN"
270 END
280 REM ...................
290 REM COMPUTES PROBS IF P>Q
300 A(0) = EXP ( - P)
310 FOR I = 1 TO N
320 A(I) = A(I - 1) * P / I
330 NEXT
340 RETURN
350 RETURN
]RUN
 POISSON PROBS.


 GIVE N 18
 GIVE MEAN 9
NO OF DECIMAL PLACES 4
 I P(X=I) P(X<=I)

0 1E-04 1E-04
1 1.1E-03 1.2E-03
2 5E-03 6.2E-03
3 .015 .0212
4 .0337 .055
5 .0607 .1157
6 .0911 .2068
7 .1171 .3239
8 .1318 .4557
9 .1318 .5874
10 .1186 .706
11 .097 .803
12 .0728 .8758
13 .0504 .9261
14 .0324 .9585
15 .0194 .978
16 .0109 .9889
17 5.8E-03 .9947
18 2.9E-03 .9976
END OF RUN
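A minimal Python sketch of the recursion in lines 300-330 follows (the function name is ours). With mean 9 it shows the equal probabilities at i = 8 and i = 9 seen in the run, since the ratio P(X = 9)/P(X = 8) is 9/9 = 1.

```python
import math


def poisson_probs(mean, n):
    """P(X = i), i = 0..n, for X ~ Poisson(mean), using
    P(X = 0) = exp(-mean) and P(X = i) = P(X = i - 1) * mean / i."""
    probs = [math.exp(-mean)]
    for i in range(1, n + 1):
        probs.append(probs[-1] * mean / i)
    return probs


for i, pr in enumerate(poisson_probs(9.0, 18)):
    print(i, round(pr, 4))
```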

Program 15
This program is a simple data analysis program. Given some data, here supplied by a random number generator, the program computes the sample mean, variance and median, prints the data in order of magnitude and plots a simple histogram.
Points of note are the recursive mean and variance estimation, lines 170 and 180, and the "Shell sort" algorithm starting at line 320.

]LIST
100 REM DATA INPUT
110 PRINT "GIVE THE NUMBER OF"
120 INPUT "DATA POINTS ";N
130 DIM X(N)
140 ME = 0:VA = 0
150 FOR I = 1 TO N
160 X(I) = RND (2) * 10
170 VA = VA + (I - 1) * (X(I) - ME) ^ 2 / I
180 ME = (I - 1) * ME + X(I):ME = ME / I
190 NEXT
200 VA = VA / (N - 1)
210 PRINT "MEAN = ";ME
220 PRINT "VARIANCE = ";VA
230 REM SORT FOR MEDIAN
240 GOSUB 320
250 G = N / 2 - INT (N / 2)
260 IS = INT (N / 2)
270 IF G > 0 THEN MD = X(IS + 1): GOTO 290
280 MD = (X(IS) + X(IS + 1)) / 2
290 PRINT "MEDIAN = ";MD
300 GOSUB 520
310 PRINT "END OF RUN ": END
320 REM ..... SHELL SORT ........
330 REM G,I,J,J1,W
340 G = N
350 G = INT (G / 2)
360 IF G = 0 THEN GOTO 490
370 FOR I = 1 TO N - G
380 FOR J = I TO 1 STEP - G
390 J1 = J + G
400 IF X(J) > X(J1) THEN 430
410 J = 0
420 GOTO 460
430 W = X(J)
440 X(J) = X(J1)
450 X(J1) = W
460 NEXT
470 NEXT
480 GOTO 350
490 REM ........................
500 FOR I = 1 TO N: PRINT X(I): NEXT
510 RETURN
520 REM HISTOGRAM FREQUENCIES
530 INPUT "LOWER END OF RANGE";AX
540 INPUT "UPPER END OF RANGE";BX
550 INPUT "NO OF INTERVALS";IX
560 IF IX > 20 THEN IX = 20
570 DIM F(IX)
580 RX = BX - AX
590 FOR I = 1 TO N
600 PX = X(I) - AX:PX = PX / RX
610 PX = INT (IX * PX)
620 F(PX) = F(PX) + 1
630 NEXT
640 REM NOW PLOT
650 FOR I = 0 TO IX
660 PRINT "!";
670 IF F(I) = 0 GOTO 710
680 FOR J = 1 TO F(I)
690 PRINT "*";
700 NEXT
710 PRINT
720 NEXT
730 RETURN
]RUN
GIVE THE NUMBER OF
DATA POINTS 20
MEAN = 6.27661226
VARIANCE = 7.5952495
.569998023
.609974674
3.26050515
3.71763383
4.38754545
4.73742943
4.84409365
5.49798409
6.46748403
6.89944903
7.02254804
7.35606948
7.72387781
7.9535178
8.47421985
8.6852196
8.92224367
9.13709861
9.61442052
9.65093233
MEDIAN = 6.96099853
LOWER END OF RANGE0
UPPER END OF RANGE10
NO OF INTERVALS12
!**
!
!
!*
!*
!***
!*
!*
!***
!**
!****
!**
!
END OF RUN
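Lines 170 and 180 update the mean and the sum of squared deviations one point at a time, and lines 320-480 sort by comparing elements a gap G apart, halving G each pass. The Python sketch below reproduces both ideas, plus the median of lines 250-280; the function names are ours, and the gap loop is written in the usual insertion form rather than with the J = 0 exit of line 410.

```python
import random


def running_mean_variance(xs):
    """Mean and sample variance updated one point at a time (cf. lines 170-200)."""
    mean, ss = 0.0, 0.0
    for i, x in enumerate(xs, start=1):
        ss += (i - 1) * (x - mean) ** 2 / i   # uses the mean of the first i - 1 points
        mean += (x - mean) / i                # same as ((i - 1) * mean + x) / i
    return mean, ss / (len(xs) - 1)


def shell_sort(xs):
    """In-place Shell sort with gaps n/2, n/4, ..., 1 (cf. lines 320-480)."""
    n = len(xs)
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):
            j = i
            while j >= gap and xs[j - gap] > xs[j]:
                xs[j - gap], xs[j] = xs[j], xs[j - gap]
                j -= gap
        gap //= 2
    return xs


def median(sorted_xs):
    """Middle value, or the average of the two middle values (lines 250-280)."""
    n = len(sorted_xs)
    mid = n // 2
    return sorted_xs[mid] if n % 2 else 0.5 * (sorted_xs[mid - 1] + sorted_xs[mid])


rng = random.Random(1)
data = [10.0 * rng.random() for _ in range(20)]
print(running_mean_variance(data), median(shell_sort(list(data))))
```

The one-pass update avoids storing the data twice and is less prone to cancellation than accumulating the sum and the sum of squares separately.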
Program 16
This program generates random numbers from the Binomial distribution
with n = 4 and p = 0.4. The sample frequencies are compared with the
theoretical ones.
]LIST
100 REM BIN GENERATOR
110 INPUT "SAMPLE SIZE ";REP
120 DIM A(REP)
130 F(0) = 0.1296
140 F(1) = 0.4752
150 F(2) = 0.8209
160 F(3) = 0.9744
170 F(4) = 1.0000
180 FOR I = 1 TO REP
190 X = RND (9)
200 J = - 1
210 J = J + 1
220 IF X > F(J) GOTO 210
230 C(J) = C(J) + 1
240 A(I) = J
250 NEXT
260 FOR I = 0 TO 4
270 X = C(I) / REP
280 Z = Z + X
290 PRINT I;" ";F(I);" ";C(I);" ";X;" ";Z
300 NEXT
310 PRINT "SAMPLE"
320 FOR I = 1 TO REP
330 PRINT A(I);" ";
340 NEXT
350 END
]RUN
SAMPLE SIZE 20
0 .1296 4 .2 .2
1 .4752 7 .35 .55
2 .8209 8 .4 .95
3 .9744 1 .05 1
4 1 0 0 1
SAMPLE
1 0 2 2 2 2 3 2 0 1 2 0 0 1 1 1 1 1 2 2
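The generator is the inverse-distribution-function method: a uniform number X is compared with the cumulative probabilities F(0), F(1), ... and the first J with X ≤ F(J) is returned (lines 200-220). A minimal Python sketch follows; the names are ours, and here the table is computed from the binomial formula rather than typed in (this gives F(2) = 0.8208).

```python
import random
from math import comb


def binomial_cdf(n, p):
    """Cumulative probabilities F(j) = P(X <= j), j = 0..n."""
    cdf, total = [], 0.0
    for k in range(n + 1):
        total += comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        cdf.append(total)
    cdf[-1] = 1.0                      # guard against rounding so the search always stops
    return cdf


def draw(cdf, u):
    """Smallest j with u <= F(j), the search of lines 200-220."""
    j = 0
    while u > cdf[j]:
        j += 1
    return j


def sample(cdf, size, rng):
    return [draw(cdf, rng.random()) for _ in range(size)]


cdf = binomial_cdf(4, 0.4)
xs = sample(cdf, 20, random.Random(9))
print(cdf)
print(xs)
```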
Program 17
This is a departure from our original self-imposed brief, but we couldn't resist the temptation to include a drunkard's walk. Lines 150 and 340 contain the code that plots on an Apple II micro. F is a scale parameter.

]LIST
100 REM DRUNKARDS WALK
110 INPUT "GIVE NO OF STEPS";N
120 INPUT "F";F
130 PX = 0:PY = 0
140 Y1 = 80:X1 = 140
150 HGR : HCOLOR= 3
160 FOR I = 1 TO N
170 D = RND (9)
180 D1 = RND (9)
190 PX = 0:PY = 0
200 IF D < 0.5 THEN GOTO 240
210 PX = 1
220 IF D1 < = 0.5 THEN PX = - 1
230 GOTO 260
240 PY = 1
250 IF D1 < = 0.5 THEN PY = - 1
260 CX = CX + PX:CY = CY + PY
270 X = CX * F:Y = CY * F
280 Y = 80 - Y
290 X = 140 + X
300 IF X < 0 THEN GOSUB 380
310 IF Y < 0 THEN GOSUB 380
320 IF X > 279 THEN GOSUB 380
330 IF Y > 159 THEN GOSUB 380
340 HPLOT X1,Y1 TO X,Y
350 Y1 = Y:X1 = X
360 NEXT
370 END
380 REM ENDER
390 VTAB (24): PRINT "END OF GRAPH";
400 STOP
410 RETURN
]RUN
GIVE NO OF STEPS400
F4
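Without the Apple II graphics, the walk itself is only a few lines. The Python sketch below follows lines 170-260: one uniform number chooses a vertical or a horizontal move and a second chooses its sign. The function name and the seeded generator are our own; the sketch returns the list of lattice points instead of plotting them.

```python
import random


def drunkards_walk(n_steps, rng):
    """Lattice random walk: each step moves one unit up, down, left or right
    with equal probability, using two uniform numbers per step."""
    x = y = 0
    path = [(x, y)]
    for _ in range(n_steps):
        d, d1 = rng.random(), rng.random()
        step = -1 if d1 <= 0.5 else 1
        if d < 0.5:
            y += step                  # vertical move (lines 240-250)
        else:
            x += step                  # horizontal move (lines 210-220)
        path.append((x, y))
    return path


path = drunkards_walk(400, random.Random(2024))
print("final position:", path[-1])
```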

Program 18
We have chosen to give a fairly standard FFT program, and this one follows the original form due to Cooley and Tukey. There are others which are slicker but more obscure.
This program is devised for series whose length N is a power of two (N = 2^M) and expects a data series with real part RA(I) and imaginary part IA(I), which are supplied on line 160.
The transform is computed and then the transform of the complex conjugate of the transform. As we expect, we retrieve our original series multiplied by a constant, here N = 16.

]LIST
100 REM FFT TEST
110 INPUT "M ";M
120 N = 2 ^ M
130 PRINT "M = ";M;" N= ";N
140 DIM RA(N),IA(N)
150 FOR I = 1 TO N
160 RA(I) = I
170 NEXT
175 GOSUB 1000
180 GOSUB 2000
190 GOSUB 1000
195 FOR I = 1 TO N:IA(I) = - IA(I): NEXT
199 PRINT "BACKWARDS": PRINT :
200 GOSUB 2000
210 GOSUB 1000
220 END
1000 REM PRINTER
1010 FOR I = 0 TO N
1020 PRINT I;" ";RA(I),IA(I)
1030 NEXT
1040 RETURN
2000 REM FFT AFTER COOLEY & TUKEY
2010 NV = N / 2
2020 NM = N - 1
2030 J = 1
2040 FOR I = 1 TO NM
2050 IF I > = J GOTO 2100
2060 IT = IA(J)
2070 RT = RA(J)
2080 RA(J) = RA(I)
2090 IA(J) = IA(I)
2092 RA(I) = RT
2094 IA(I) = IT
2100 K = NV
2110 IF K > = J THEN GOTO 2150
2120 J = J - K
2130 K = K / 2
2140 GOTO 2110
2150 J = J + K
2155 NEXT
2160 PI = 3.1415927
2170 FOR L = 1 TO M
2180 LE = 2 ^ L
2190 L1 = LE / 2
2200 RU = 1.0
2210 IU = 0.0
2220 RN = COS (PI / L1)
2230 IN = SIN (PI / L1)
2240 FOR J = 1 TO L1
2250 FOR I = J TO N STEP LE
2260 IP = I + L1
2270 RT = RA(IP) * RU - IA(IP) * IU
2280 IT = IA(IP) * RU + RA(IP) * IU
2290 RA(IP) = RA(I) - RT
2300 IA(IP) = IA(I) - IT
2302 RA(I) = RA(I) + RT
2304 IA(I) = IA(I) + IT
2310 NEXT
2312 Q1 = IU * IN:Q2 = RU * RN:Q3 = IU * RN:Q4 = RU * IN
2314 RU = Q2 - Q1
2316 IU = Q3 + Q4
2340 NEXT
2350 NEXT
2360 RETURN
]RUN
M 4
M = 4 N= 16
0 0 0
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
7 7 0
8 8 0
9 9 0
10 10 0
11 11 0
12 12 0
13 13 0
14 14 0
15 15 0
16 16 0
0 0 0
1 136 0
2 -7.99999913 -40.2187153
3 -7.99999953 -19.3137084
4 -7.99999914 -11.972846
5 -7.99999981 -8
6 -7.99999965 -5.34542922
7 -7.99999965 -3.31370864
8 -7.9999992 -1.59129925
9 -8 0
10 -8 1.59129879
11 -8.00000009 3.31370836
12 -8.00000007 5.34542864
13 -8.00000019 8
14 -8.00000042 11.972846
15 -8.00000074 19.3137087
16 -8.00000239 40.2187163
BACKWARDS

0 0 0
1 16 7.4505806E-09
2 32.0000005 3.68739381E-07
3 47.9999999 -1.71712605E-08
4 64.0000001 -1.95979044E-07
5 80.0000005 3.72529149E-08
6 96.0000007 -9.91920048E-08
7 112.000001 8.08504415E-08
8 128 -8.97791664E-08
9 144 7.4505806E-09
10 160 -1.8318566E-07
11 176 1.71712595E-08
12 192 -5.35448974E-08
13 208 -5.21540761E-08
14 223.999999 -1.35746416E-07
15 240 -8.08504405E-08
16 255.999999 3.88687806E-07
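The same two stages, a bit-reversal reordering (lines 2010-2155) followed by log2 N passes of butterflies (lines 2170-2350), are sketched below in Python with complex numbers; the function name and the `sign` parameter are our own. With sign = +1 the twiddle factors are exp(+iπ/L1), as at lines 2220-2230, so the first output element is 136 and the second is -8 - 40.2187i as in the run. Transforming the conjugate of the transform then returns 16 times the original real series.

```python
import cmath


def fft(data, sign=+1):
    """In-place radix-2 Cooley-Tukey FFT computing sum_n x_n exp(sign*2*pi*i*n*k/N).
    len(data) must be a power of two."""
    n = len(data)
    j = 0
    for i in range(1, n):                      # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            data[i], data[j] = data[j], data[i]
    le = 2
    while le <= n:                             # butterflies on blocks of length le
        half = le // 2
        w_step = cmath.exp(sign * 1j * cmath.pi / half)
        for start in range(0, n, le):
            w = 1.0 + 0j
            for k in range(half):
                top = data[start + k]
                bot = data[start + k + half] * w
                data[start + k] = top + bot
                data[start + k + half] = top - bot
                w *= w_step
        le *= 2
    return data


x = [complex(i) for i in range(1, 17)]
X = fft(list(x))
back = fft([c.conjugate() for c in X])
print(X[0], X[1], back[0])
```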

Program 19
This program computes the gain and phase of a given filter. It is much more useful when the results are plotted. You must supply the program with the coefficients of the filter polynomial, A(1) ... A(LAGS). The end points W1 and W2 of the frequency range are entered as multiples of π. Note that the quantity printed as the gain is the squared modulus Y² + Z² (line 360).
If the gain and phase of the inverse filter are wanted, these can be provided instead.

]LIST
100 REM FILTER PROGRAM
110 GX = 20
120 INPUT "NO. OF LAGS= ";N
130 DIM A(N),G(GX),P(GX),PL(GX)
140 DIM D(GX)
150 PRINT "NOW INPUT COEFFS ";
160 PRINT : PRINT
170 FOR I = 1 TO N: INPUT " A(I) ";A(I): NEXT
180 PRINT "ECHO CHECK "
190 FOR I = 1 TO N
200 PRINT " A(";I;")= ";A(I)
210 NEXT
220 PRINT "OK ... TO PROCEED ... "
230 PRINT "FREQUENCY FROM W1 TO W2 "
240 PRINT "JUST GIVE THE MULTIPLE OF PIE"
250 INPUT " W1= ";W1: INPUT " W2= ";W2
260 W1 = W1 * 3.14159:W2 = W2 * 3.14159
270 EP = (W2 - W1) / GX
280 W = W1
290 FOR J = 0 TO GX
300 Y = 0:Z = 0
310 FOR K = 1 TO N
320 X = A(K)
330 Y = Y + X * COS (K * W)
340 Z = Z + X * SIN (K * W)
350 NEXT
360 G(J) = Y * Y + Z * Z
370 IF Z < > 0 THEN P(J) = ATN (Y / Z)
380 W = W + EP
390 NEXT
400 PRINT "IF INVERSE USE TYPE YES"
410 INPUT "WELL ? .. ";Q$
420 IF Q$ < > "YES" GOTO 470
430 FOR I = 0 TO GX:G(I) = 1 / G(I)
440 P(I) = - P(I)
450 NEXT
460 PRINT : PRINT "INVERSE REQUESTED ..."
470 PRINT "GAIN .... "
480 PRINT "FREQUENCY GAIN"
490 W = W1
500 FOR I = 0 TO GX
510 PRINT W;" ";G(I)
520 W = W + EP
530 NEXT
540 GET A$
550 PRINT "PHASE ....."
560 W = W1
570 PRINT " FREQUENCY ...... PHASE"
580 FOR I = 0 TO GX
590 PRINT W;" ";P(I)
600 W = W + EP
610 NEXT
620 GET A$
630 HOME
640 ST$ = "GAIN ... "
650 FOR I = 0 TO GX:D(I) = G(I): NEXT
660 ST$ = "PHASE .... "
670 FOR I = 0 TO GX:D(I) = P(I): NEXT
680 REM **** PLOTTING HERE ******
690 HOME
700 TEXT
710 PRINT "THAT'S ALL FOLKS ..."
720 END
730 MA = D(0):MI = MA
740 FOR I = 1 TO GX
750 IF MA < D(I) THEN MA = D(I)
760 IF MI > D(I) THEN MI = D(I)
770 NEXT
780 R = MA - MI
790 FOR I = 0 TO GX
800 PL(I) = 160 - INT (((D(I) - MI) * 150) / R)
810 NEXT
820 OY = 160 - INT (((0 - MI) * 150) / R)
830 OX = INT (W1 / EP):OX = - OX
840 RETURN
]RUN
NO. OF LAGS= 4
NOW INPUT COEFFS

 A(I) 0.25
 A(I) 0.25
 A(I) 0.25
 A(I) 0.25
ECHO CHECK
 A(1)= .25
 A(2)= .25
 A(3)= .25
 A(4)= .25
OK ... TO PROCEED ...
FREQUENCY FROM W1 TO W2
JUST GIVE THE MULTIPLE OF PIE
 W1= 0
 W2= 1
IF INVERSE USE TYPE YES
WELL ? ...
GAIN ....
FREQUENCY GAIN
0 1
.1570795 .969523123
.314159 .882373788
.4712385 .750628444
.628318 .592009056
.7853975 .426777378
.942477 .274283868
1.0995565 .149839713
1.256636 .0625004559
1.4137155 .0141502105
1.570795 8.78525569E-13
1.7278745 .0103215865
1.884954 .03299125
2.0420335 .0562680695
2.199113 .0711082419
2.3561925 .0732233648
2.513272 .0625002154
2.6703515 .0432648653
2.827431 .0221351976
2.9845105 6.00537332E-03
3.14159 1.75717995E-12
PHASE ....
 FREQUENCY ...... PHASE
0 0
.1570795 1.17809758
.314159 .785398827
.4712385 .392700077
.628318 1.32616363E-06
.7853975 -.392697424
.942477 -.785396174
1.0995565 -1.17809493
1.256636 -1.57079368
1.4137155 1.17810023
1.570795 .7852606
1.7278745 .392702725
1.884954 3.97507172E-06
2.0420335 -.39269477
2.199113 -.785393521
2.3561925 -1.17809227
2.513272 -1.57079102
2.6703515 1.17810288
2.827431 .785404129
2.9845105 .392705375
3.14159 0
THAT'S ALL FOLKS ...
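For the filter with coefficients a_1, ..., a_L the program evaluates Y = Σ a_k cos kω and Z = Σ a_k sin kω and prints Y² + Z². A minimal Python sketch of that calculation follows; the function name is ours. Note one deliberate difference: here the phase is taken as the argument of H(ω) = Y + iZ via `math.atan2`, which is not the same quantity as ATN(Y/Z) at line 370. For the four-term average of the run, H(ω) = 0.25 e^{2.5iω} sin 2ω / sin(ω/2), so the squared gain vanishes at ω = π/2 and the phase is 2.5ω for small ω.

```python
import math


def gain_phase(coeffs, w):
    """Squared gain |H(w)|^2 and phase arg H(w) of H(w) = sum_k a_k exp(i k w),
    k = 1..len(coeffs), as in lines 300-370 (phase convention differs, see text)."""
    y = sum(a * math.cos(k * w) for k, a in enumerate(coeffs, start=1))
    z = sum(a * math.sin(k * w) for k, a in enumerate(coeffs, start=1))
    return y * y + z * z, math.atan2(z, y)


coeffs = [0.25, 0.25, 0.25, 0.25]
for j in range(21):
    w = j * math.pi / 20               # 21 frequencies from 0 to pi, as with GX = 20
    g, ph = gain_phase(coeffs, w)
    print(round(w, 7), g, ph)
```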

Program 20
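Often one wishes to manipulate polynomials when using filters. This program will untangle discrete time series as follows.
Suppose that

$$A(z)f(z) = B(z)g(z).$$

Then the program will find the first few coefficients of

$$C(z) = A^{-1}(z)B(z).$$

A(z) is the autoregressive (AR) polynomial, of order p, and B(z) is the moving average (MA) polynomial, of order q, i.e.

$$A(z) = \sum_{i=0}^{p} a_i z^i, \qquad B(z) = \sum_{j=0}^{q} b_j z^j.$$

Equating coefficients of z^k in A(z)C(z) = B(z) gives the recursion used in lines 320-430,

$$c_k = \frac{1}{a_0}\Big(b_k - \sum_{i=1}^{\min(k,p)} a_i c_{k-i}\Big), \qquad b_k = 0 \text{ for } k > q.$$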

]LIST
100 REM FINDS MA
110 PRINT "NEED 3 PLOYNOMIALS"
120 INPUT "GIVE ORDER OF AR";P
130 INPUT "GIVE ORDER OF MA";Q
140 INPUT " NO OF TERMS NEEDED ";NM
150 N = Q
160 IF P > Q THEN N = P
170 DIM A(N),B(N),P(NM)
180 PRINT " AR. POLYNOMIAL COEFFICIENTS"
190 FOR I = 0 TO P
200 INPUT "A(I)= ";A(I)
210 NEXT
220 FOR I = 0 TO P
230 PRINT "A( ";I;")= ";A(I)
240 NEXT
250 PRINT "INPUT MA COEFFS"
260 FOR I = 0 TO Q
270 INPUT " B(I)= ";B(I)
280 NEXT
290 FOR I = 0 TO Q
300 PRINT "B(";I;") = ";B(I)
310 NEXT
320 P(0) = B(0) / A(0)
330 FOR I = 1 TO NM
340 S = 0
350 FOR J = 1 TO I
360 K = I - J
370 IF J > N THEN DU = 0: GOTO 390
380 DU = A(J)
390 S = S + P(K) * DU
400 NEXT
410 IF I > N THEN P(I) = - S / A(0): GOTO 430
420 P(I) = B(I) - S:P(I) = P(I) / A(0)
430 NEXT
440 PRINT "M.A. REP .... "
450 FOR I = 0 TO NM
460 PRINT " P(";I;")= ";P(I)
470 NEXT
480 END
]RUN
NEED 3 PLOYNOMIALS
GIVE ORDER OF AR3
GIVE ORDER OF MA4
 NO OF TERMS NEEDED 12
 AR. POLYNOMIAL COEFFICIENTS
A(I)= 1
A(I)= 0.3
A(I)= 0.7
A(I)= 0.2
A( 0)= 1
A( 1)= .3
A( 2)= .7
A( 3)= .2
INPUT MA COEFFS
 B(I)= 1
 B(I)= 0.2
 B(I)= 0.3
 B(I)= 0.4
 B(I)= 0.9
B(0) = 1
B(1) = .2
B(2) = .3
B(3) = .4
B(4) = .9
M.A. REP ....
 P(0)= 1
 P(1)= -.1
 P(2)= -.37
 P(3)= .381
 P(4)= 1.0647
 P(5)= -.51211
 P(6)= -.667857
 P(7)= .3458941
 P(8)= .46615367
 P(9)= -.248400571
 P(10)= -.320966218
 P(11)= .176939531
 P(12)= .221274607
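The recursion for the coefficients of C(z) = B(z)/A(z) takes only a few lines in Python. The sketch below uses names of our own choosing; its tests reproduce the "M.A. REP" column of the run for A(z) = 1 + 0.3z + 0.7z² + 0.2z³ and B(z) = 1 + 0.2z + 0.3z² + 0.4z³ + 0.9z⁴.

```python
def ma_coefficients(a, b, n_terms):
    """Coefficients c_0..c_{n_terms} of C(z) = B(z)/A(z), found from A(z)C(z) = B(z)
    by equating powers of z (cf. lines 320-430).
    a = [a_0, ..., a_p], b = [b_0, ..., b_q], with a_0 != 0."""
    p = len(a) - 1
    c = []
    for k in range(n_terms + 1):
        s = sum(a[i] * c[k - i] for i in range(1, min(k, p) + 1))
        bk = b[k] if k < len(b) else 0.0       # b_k = 0 beyond the MA order
        c.append((bk - s) / a[0])
    return c


c = ma_coefficients([1.0, 0.3, 0.7, 0.2], [1.0, 0.2, 0.3, 0.4, 0.9], 12)
for k, ck in enumerate(c):
    print(k, ck)
```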
Program 21
This program gives the recursive method ascribed to Levinson for estimating
the filter coefficients for a Wiener filter. This is discussed in Chapter 9. For
this simple case the correlations are supplied in the program.

]LIST
100 REM :TOEPLITZ TESTER
110 REM: CORRELATIONS ARE IN R(J)
120 DIM PH(9,9),R(10)
130 R(0) = 1:R(1) = 0.806
140 R(2) = 0.428:R(3) = 0.070
150 R(4) = 0.01:R(5) = - 0.01:R(6) = 0.02
160 R(7) = 0.01:R(8) = - 0.001
170 PRINT "GIVE ORDER OF MATRIX"
180 INPUT "DIM=";NO
190 IF NO > 10 THEN GOTO 270
200 IF NO = 1 THEN PRINT R(1): GOTO 260
210 IF NO < 1 THEN GOTO 270
220 GOSUB 290
230 FOR I = 1 TO NO
240 PRINT I;" ";PH(NO,I)
250 NEXT
260 GOTO 180
270 PRINT "PROGRAM END"
280 END
290 REM :TOEPLITZ RECURSION
300 PH(1,1) = R(1)
310 FOR I = 2 TO NO
320 DX = 0:DY = 0
330 FOR J = 1 TO I - 1
340 DX = PH(I - 1,J) * R(I - J) + DX
350 DY = PH(I - 1,J) * R(J) + DY
360 NEXT
370 PH(I,I) = (R(I) - DX) / (1 - DY)
380 FOR J = 1 TO I - 1
390 PH(I,J) = PH(I - 1,J) - PH(I,I) * PH(I - 1,I - J)
400 NEXT
410 NEXT
420 RETURN
]RUN
GIVE ORDER OF MATRIX
DIM=4
1 1.23163107
2 -.455699002
3 -2.31438685
4 1.69420738
DIM=5
1 3.01352816
2 -2.53858417
3 -.479285322
4 1.29537851
5 -1.05175855
DIM=9
1 2.32118233
2 -4.37933773
3 -.494141014
4 1.29569952
5 .0119352533
6 -3.17087649E-03
7 8.27713829E-03
8 .0164944491
9 -7.1073405E-03
DIM=0
PROGRAM END
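The recursion of lines 290-420 solves the Toeplitz system Σ_j φ_j r(|k - j|) = r(k), k = 1, ..., order, by building the order-i solution from the order-(i - 1) one. A minimal Python sketch follows, assuming normalised correlations with r[0] = 1 as in the listing; the function name is ours. Rather than matching the printed run, the tests check that the result satisfies the Toeplitz equations and that AR(1)-type correlations r(k) = 0.5^k give the filter (0.5, 0, 0).

```python
def durbin_levinson(r, order):
    """Filter coefficients phi_1..phi_order solving sum_j phi_j r(|k-j|) = r(k),
    k = 1..order, by the recursion of lines 290-420 (r[0] = 1 assumed)."""
    phi = [r[1]]                                    # order 1: PH(1,1) = R(1)
    for i in range(2, order + 1):
        dx = sum(phi[j - 1] * r[i - j] for j in range(1, i))
        dy = sum(phi[j - 1] * r[j] for j in range(1, i))
        kappa = (r[i] - dx) / (1.0 - dy)            # new last coefficient PH(I,I)
        phi = [phi[j - 1] - kappa * phi[i - j - 1] for j in range(1, i)] + [kappa]
    return phi


r = [1.0, 0.806, 0.428, 0.070, 0.01, -0.01, 0.02, 0.01, -0.001]
for m in (2, 3, 4):
    print(m, durbin_levinson(r, m))
```

The recursion needs O(order²) operations instead of the O(order³) of a general elimination such as Program 11.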
Program 22
This program gives the recursive method ascribed to Levinson for estimating the filter coefficients for a Wiener filter. Unlike the simple version in Program 21, this program has some extra wrinkles. Firstly, it simulates a second-order stationary process x(t) using the generating model ascribed to the sunspot series (line 170),

$$x(t) = 1.32\,x(t-1) - 0.63\,x(t-2) + e(t),$$

where e(t) is supplied by RND. Then it computes the correlations. The method used is direct and is quite slow on a micro (an FFT-based routine might be better).
Given the correlations, the filter coefficients are found recursively for a filter length supplied by the user. You will notice this is precisely predictive deconvolution with one-step prediction. The extension to several steps is fairly easy.
]LIST
100 REM :SERIES GENERATOR
110 INPUT "SERIES LENGTH";N
120 INPUT "NO OF CORRELNS";NC
130 NN = N + 12
140 DIM R(NC),X(NN)
150 REM :FIRST SIMULATE SERIES
160 FOR I = 2 TO NN
170 X(I) = 1.32 * X(I - 1) - 0.63 * X(I - 2) + RND (9)
180 NEXT
190 FOR I = 1 TO N:X(I) = X(I + 12): NEXT
200 PRINT : PRINT "SERIES GENERATED": PRINT
210 REM :NOW COMPUTE CORRELATIONS
220 XX = 0
230 R(0) = 1
240 FOR I = 1 TO N
250 XX = XX + X(I) * X(I)
260 NEXT
270 FOR I = 1 TO NC
280 XS = 0
290 FOR J = 1 TO N - I
300 XS = XS + X(J) * X(J + I)
310 NEXT
320 R(I) = XS / XX
330 NEXT
340 REM :PRINT CORRELNS.
350 FOR I = 0 TO NC: PRINT I;" ";R(I): NEXT
360 REM :WIENER FILTER USING
370 REM LEVINSON RECURSION
380 REM: CORRELATIONS ARE IN R(J)
390 PRINT "GIVE ORDER OF MATRIX"
400 INPUT "DIM=";NO
410 IF NO > NC THEN GOTO 490
420 IF NO = 1 THEN PRINT R(1): GOTO 480
430 IF NO < 1 THEN GOTO 490
440 GOSUB 510
450 FOR I = 1 TO NO
460 PRINT I;" ";PH(NO,I)
470 NEXT
480 GOTO 400
490 PRINT "PROGRAM END"
500 END
510 REM :TOEPLITZ RECURSION
520 PH(1,1) = R(1)
530 FOR I = 2 TO NO
540 DX = 0:DY = 0
550 FOR J = 1 TO I - 1
560 DX = PH(I - 1,J) * R(I - J) + DX
570 DY = PH(I - 1,J) * R(J) + DY
580 NEXT
590 PH(I,I) = (R(I) - DX) / (1 - DY)
600 FOR J = 1 TO I - 1
610 PH(I,J) = PH(I - 1,J) - PH(I,I) * PH(I - 1,I - J)
620 NEXT
630 NEXT
640 RETURN
]RUN
SERIES LENGTH64
NO OF CORRELNS12

SERIES GENERATED

0 1
1 .937597326
2 .850968041
3 .770074649
4 .715543398
5 .695582979
6 .690695288
7 .700835561
8 .713462715
9 .729638382
10 .741111904
11 .753310002
12 .140191527
GIVE ORDER OF MATRIX
DIM=3
1 1.16522928
2 -.0415635165
3 .0411571131
DIM=7
1 -4.63736286
2 1.06696384
3 -1.92516782
4 .0298389376
5 -.258639504
6 .22046817
7 .0583621423
DIM=0
PROGRAM END
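The whole chain (simulate the series, estimate correlations directly, then apply the Levinson recursion) can be checked in Python on a long series. In this sketch, which is ours and not the book's, e(t) is zero-mean Gaussian noise, whereas line 170 adds RND(9), a uniform number on [0, 1) with non-zero mean, and the correlations are not mean-corrected in either version. With 20000 points the one-step prediction filter of length 2 should come out close to the generating coefficients (1.32, -0.63).

```python
import random


def simulate_ar2(n, a1=1.32, a2=-0.63, burn=12, seed=None):
    """x(t) = a1 x(t-1) + a2 x(t-2) + e(t), with e(t) ~ N(0, 1) (an assumption)."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n + burn):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2 + burn:]                  # drop start-up values, as line 190 drops 12


def autocorrelations(x, max_lag):
    """r(k) = sum_j x_j x_(j+k) / sum_j x_j^2, computed directly (cf. lines 220-330)."""
    xx = sum(v * v for v in x)
    n = len(x)
    return [1.0] + [sum(x[j] * x[j + k] for j in range(n - k)) / xx
                    for k in range(1, max_lag + 1)]


def order2_predictor(r):
    """One-step prediction filter of length 2: Levinson recursion taken to order 2."""
    phi11 = r[1]
    phi22 = (r[2] - phi11 * r[1]) / (1.0 - phi11 * r[1])
    return [phi11 - phi22 * phi11, phi22]


x = simulate_ar2(20000, seed=12345)
r = autocorrelations(x, 12)
print(r[:4])
print(order2_predictor(r))
```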
INDEX

Addition 53, 79, 168
  in scientific notation 24-25
  of angles 8
  of functions 54
  of matrices 92, 93
  of vectors 90
Aliasing 183-87
Amplitude 13, 154, 171, 197
Angles 7-9, 14, 27, 149
Angular frequency 12-13, 171
Anti-wavelet 205
Aperiodic functions 158
Approximations 41-43
Area under a graph 51
Area under a parabola 56
Argand diagram 82
Argument 82
Arithmetic series 216
ARMA (autoregressive moving average model) 180, 193
Arrays 89, 91
Augmented matrix 101, 102, 105-7
Autocorrelation 175, 181, 186
Autocorrelation function 175, 203
Autocovariance 186
Autocovariance function 175, 183, 203
Autoregressive estimate 195
Autoregressive model 180, 191
Autoregressive moving average model (ARMA) 180, 193

Band-limited white noise 176
Band-pass filter 189
Base numerals 21
BASIC 141, 218
Bayes theorem 119
Binary notation 21-23, 25, 26
Binomial distribution 121-123, 131, 144, 145, 241
Binomial probability 238
Binomial theorem 216
Boolean algebra 26
Boundary conditions 73
Brownian motion 179

Calculus
  differentiation 27-50
  formulae 217
  integration 51-78
Cauchy-Riemann equations 87
Central conic 110
Central Limit Theorem 145-46
Chain Rule 34, 75
Choleski decomposition algorithm 227
Circle 109
Coefficients 24, 93
Column-vector 90
Combinations 119-21
Complex conjugate 80, 100
Complex function 29
  differentiation 86-88
  integration 86-88
Complex numbers 79-88, 93
  references 212
Complex variables 83-86
Compound statement 25
Computer languages 15
Computer programs 218-52
Computer techniques 89, 111
Conditional probability 118
Confidence intervals 144-47
Conformability 95
Conjugate functions 87
Constants 2, 32, 42, 53, 56, 57, 74, 76, 77, 93
Continuous functions 177, 182

Convolution 3, 157, 159, 161, 165, 167, 182, 190, 191, 195, 198, 199, 206
Coordinates 90
Correlation 133, 138
Correlation coefficient 175
Cosec 11
Cosine 7, 29, 84, 149, 150, 152
Cot 11
Covariance 133
Cross-correlation function 203-4
Cross-correlation generating function 204
Cumulated process 178
Cumulative distribution 125

Daniell window 195
Data analysis program 240
Data processing 89, 90
Data samples 134-37
Decay 175
Decimal notation 21-23
Decimation
  in frequency algorithm 169
  in time algorithm 169
Deconvolution 195, 200, 208
Deconvolution filter 204, 205
Definite integral 53
Degree of differential equation 74
Delay 199, 204, 205
Delta function 160, 178, 183
Density function 131, 132, 182
Derivatives 31, 32
  see also Higher derivatives; Partial derivatives
Diagonal matrix 97
Differential coefficients 47
Differential equations 47, 73-78
  degree of 74
  general solution 75
  particular solution 75
Differentiation 27-50, 75, 180
  complex functions 86-88
  introduction 27-35
  references 212
  relationship with integration 54-60
Diffusion equation 48
Digital filters 191-93
Discrete Fourier transform (DFT) 163-68, 198
Discrete series 193
Discrete time 180, 181, 191-93
Discrete time series 183, 247
Division 81
Dot product 100
Double integral 71
Double integration 63-66
Drunkard's walk 242
Duality 79, 161
Dummy variable 51

Echelon matrix 102
Eigenvalues 77, 108-9, 111, 236
Eigenvectors 108-9
Electric circuits 26
Electric current 172
Electromagnetic theory 68
Elementary row operations 102
Ellipse 109-11
Ensemble 114
Ensemble average 181
Equation of motion 179
Ergodic theorem 181
Errors 138, 144-46, 205, 206, 208
Estimate 136, 138
Estimation 193-95
Even functions 152
Event 116
Expected value 130, 132
Exponential distribution 130, 131
Exponential functions 14-16, 84, 85, 165

Fast Fourier transform (FFT) 168-70, 195, 243, 250
Filter coefficients 248, 250
Filters 174, 188-95, 202, 208, 245, 247
Finite data filter 193
Finite discrete Fourier transform 163
Floating point representation 24
Flow lines 73
Fourier analysis 13, 149-73
  examples of 152-54
  references 213
Fourier coefficients 152, 153, 156
Fourier series 60, 78, 149-57
Fourier transform 149, 158-62, 182, 190
Frequency 85, 171-73
Frequency domain 171, 174
Frequency domain analysis 156
Function value 15
Functions 1-26, 27, 32, 35, 43, 46, 51, 52, 66, 80, 149
  addition of 54
  BASIC programs 218-52
  graphs of 15
  integral of 53
  matrices as 98-100
  new from old 20
  of two variables 83, 225
  random variable as 116
Functions (cont.)
  references 211
  special types of 152
Fundamental Theorem of Calculus 55

Gain 189, 245
Gauss Elimination Method 106
Gaussian distribution 127
Generalised function 160
Geometric series 163, 168, 216
Ghost elimination filter 202
Gibbs phenomenon 154
Graphs 12
  area under 51, 56
  of functions 15
Gravimetric effects 72
Gravitational theory 68
Green's theorem 70-72

Harmonic functions 87
Harmonics 77
Heaviside function 4, 31
Hexadecimal notation 21-22
High pass filter 190
Higher derivatives 35-40
Higher order partial derivatives 46-48
Histogram 135, 137
Homogeneous equations 108
Hyperbola 109

Identity (or unit) matrices 96
Image processing 150
Imaginary part 79, 83, 84
Inconsistency 106
Indefinite integral 56
Independence 118
Infinite sequences 3
Integer 6
Integrals 56, 58, 60, 61, 76, 88, 150, 152, 161, 184-85
  double 66, 71
  line 66-73
  of functions 53
  triple 66
Integration 51-78, 180
  complex functions 86-88
  double 63-66
  numerical 61-63, 71-73
  references 212
  relationship with differentiation 54-60
  repeated 63
Inverse 80, 81, 106
Inverse Fourier transform 158
Inverse functions 16-19, 34, 158
  trigonometric 18-19
Inverse sine 19
Inverse tangent 19
Inverse transform 164, 182
Invertibility 106
Invertible matrix 97

Jacobi's method 236
Joint density function 129
Joint distribution 128-29, 133

Kalman filter 119, 194

Laplace's equation 48, 87
Leading diagonal matrix 97
Leading term 2
Line integrals 66-73
Linear equations 74, 100-8, 207
Linear filters in discrete time 191-93
Linear functions 99
Linear problems 89, 99
Linear process 178
Local maxima and minima 38
Log tables 15
Logarithms 15, 18, 85, 88, 215
Logic 26
Low pass filter 190

Marginal distribution 129
Markov process 177, 187
  continuous time 187
  discrete time 187
Marquardt's method 226
Mathematical models 43
Matrices 89-111, 227
  addition of 92, 93
  as functions 98-100
  definitions and elementary properties 89-91
  examples 91-93
  introduction 89
  multiplication of 93-96
  notation of 92, 93
  references 212
  special types of 96-98
Matrix inversion 208
Maximum delay 199, 204, 205
Maximum entropy 195
Mean 130, 131, 145, 146
Median 136
Mesh 71-73
Mesh cells 72
Mesh point 71
Minimum delay 199, 200, 204, 205

Modulus 80
Moments 129-34
Monotonic functions 66
Monte Carlo methods 141-44
Multiple time series 174
Multiplication 18, 23-25, 79, 81, 85, 93-96, 168

Natural numbers 51, 56
Negative case 53
Negative exponential distribution 126
Negative indices 3
Newton-Raphson algorithm 223
Newton's approximation method 41
Non-linear least squares program 226
Non-singular matrix 97
Non-square matrix 98
Normal distribution 127, 130, 131, 145, 146
Normal equations 139
Numbers 20-26, 79
Numerical integration 61-63, 71-73
Numerical processing 2
Nyquist frequency 185, 186

Odd functions 152, 153
Optimisation 48-49
Ordinary differential equations 47
Orthogonal matrix 97, 100, 109
Orthogonal vectors 100
Orthogonality relations 150

Parameter 70
Parametric form 70
Parseval's theorem 157
Partial derivatives 43-46, 206, 207
  higher order 46-48
Partial differential equations 76, 78
Period 171
Periodic convolution 167
Periodicity 165
Periodogram 195
Permutations 119-21
Perpendicular vector 100
Phase 13, 154, 171, 189, 245
Phase shift 189, 192
Point of inflection 39
Poisson distribution 122-28, 131, 147
Poisson probability 239
Polar coordinates 47, 84-85
Polar form 82
Polynomial degree 2
Polynomial functions 83
Polynomials 2-6, 23, 68, 180, 190, 198, 247
  program 221
  references 211
Power functions 15, 17
Power spectrum 181-83
Precision 147
Prediction filter equation 209
Predictive deconvolution 204-9, 250
Primitive 56, 59, 88
Probability 115-19, 143, 145-47
Probability density function 125
Probability distribution 121-28, 130, 132, 175
Product 94, 95
Pure noise series 183
Purely random process 176-79, 187
Pythagoras' theorem 8, 82

Quadratic form 109-11
Quadrature methods 63
Queue 147, 148
Queueing theory 124

Radians 7, 14, 18, 30, 149, 164
Random digits 141
Random numbers 141, 143, 241
Random variable 115, 116, 132, 141, 174
Random walk 177
Rate of change 27
Rational functions 20
Rational spectrum 190
Real numbers 79-81
Real part 79, 83, 84
Realisation 114
Recursion 208
Recursive method 248, 250
Reduced echelon matrix 102, 105
Reduced row echelon matrix 107
Regression 139
Root 4, 85
Rosenbrock's banana-valley function 48
Row reduced form 102
Row-vector 89-91

Saddle point 48
Sample space 115
Sampling 183-87
Sampling interval 183
Sampling rate 183
Sampling theorem 185
Scalar multiplication 90, 93, 100
Scalar product 100
Scientific notation 24
Sec 11
Second order stationary process 175, 250
Separable equations 73
Separation of the variables 77
Set of solutions 101
Shell sort algorithm 240
Simple Harmonic Motion 75, 77
Simpson's rule 63, 127, 153, 234
Simulation 141-44
Simultaneous linear equations 93
Sine 7, 19, 60, 84, 149, 150, 152, 183
Sine curve 12
Sine wave 13, 76, 153
Singular matrix 98
Skew-symmetric matrix 97
Slope function 30
Slope of curve 27
Slope of surface 44
Solution set 101
Spectral analysis 173
Spectral window 194
Spectrum 154-55, 183, 185, 186
Spread 130
Square matrix 95, 97, 98
Square root 85
Square wave 160
Stacking 200
Standard normal distribution 127
Stationarity 191
Stationary series 174-83, 207
Statistical tables 121
Statistics 144
Step function 4, 59
Stochastic independence 118
Stochastic processes 114, 115, 147-48, 174
Subtraction 79
Summation 61, 152
Symmetric matrix 96, 97, 109, 110
Symmetry 130

Tangent 7, 19, 27
Taylor series 41-43
Ternary notation 21-22
Time and frequency relation 85
Time average 181
Time series 112, 114, 174-96
  references 213
Time series models 188
Toeplitz matrix 194, 208
Transfer function 189
Transforming procedure 106
Transpose 96
Trapezoidal rule 61
Trigonometric formulae 85, 215
Trigonometric functions 7-14, 29, 60, 70, 76, 84
  inverse 18-19
  references 211
Truth table 25
Turning points 38-39
"Twiddle factor" 169, 179

Uniform distribution 144
Unitary matrix 100, 110

Variables 43, 45, 46, 66, 93
  separation of 77
  two 83, 138-40, 225
Variance 130, 131, 136, 137, 145, 146
Vector time series 174
Vectors 89, 91, 100, 236
  addition of 90
  dimension of 90
  notation 90
Vibrating string 76

Walsh functions 150
Wave equation 48, 76
Wave theory 13
Wavelength 13
Wavelet analysis 197-204
Weakly stationary series 181
White noise 176-78, 190
Wiener filter 194, 207, 248
Wiener process 178-79
Wiener-Khintchin equations 182

Yule-Walker equations 194, 207

z-transform 3, 162-63, 198, 200, 202, 204, 206
