NOTES ON LEAST SQUARES

RMIT University
School of Mathematical and Geospatial Science
Geospatial Science

This edition printed in 2005

Copies of copyright material in this compilation have been made in accordance with the provisions of Section V(b) of the Copyright Act for the teaching purposes of the University.
CONTENTS

1. Introduction
2. Least Squares Adjustment of Indirect Observations
3. Propagation of Variances
4. Approximate Values
5. Propagation of Variances Applied to Least Squares Adjustment of Indirect Observations
6. Least Squares Adjustment of Observations Only
7. Linearization Using Taylor's Theorem and the Derivation of Some Common Surveying Observation Equations
8. The Standard Error Ellipse
9. Least Squares Resection
10. Least Squares Bearing Intersection

APPENDICES

A Matrix Algebra

REFERENCES
1. INTRODUCTION
The theory of least squares and its application to adjustment of survey measurements is well
known to every geodesist. The invention of the method is generally attributed to Carl Friedrich Gauss (1777-1855) but could equally be credited to Adrien-Marie Legendre (1752-1833).
Gauss used the method of least squares to compute the elements of the orbit of the minor
planet Ceres and predicted its position in October 1801 from a few observations made in the
previous year. He published the technique in 1809 in Theoria Motus Corporum Coelestium in
Sectionibus Conicis Solem Ambientium (Theory of the Motion of the Heavenly Bodies
Moving about the Sun in Conic Sections), mentioning that he had used it since 1795, and also
developed what we now know as the normal law of error, concluding that: "... the most
probable system of values of the quantities ... will be that in which the sum of the squares of
the differences between the actually observed and computed values multiplied by numbers
that measure the degree of precision, is a minimum." (Gauss 1809).
After these initial works, the topic was subjected to rigorous analysis and by the beginning of the
20th century was the universal method for the treatment of observations. Merriman (1905)
compiled a list of 408 titles, including 72 books, written on the topic prior to 1877 and
publication has continued unabated since then. Leahy (1974) has an excellent summary of the
development of least squares and clearly identifies the historical connection with
mathematical statistics, which it pre-dates.
The current literature is extensive; the books Observations and Least Squares (Mikhail 1976)
and Analysis and Adjustment of Survey Measurements (Mikhail and Gracie 1981), and lecture
notes by Cross (1992), Krakiwsky (1975) and Wells and Krakiwsky (1971) stand out as the
simplest modern treatments of the topic.
Following Wells and Krakiwsky (1971, pp.8-9), it is interesting to analyse the following
quotation from Gauss' Theoria Motus (Gauss, 1809, p.249).
"If the astronomical observations and other quantities, on which the computation of
orbits is based, were absolutely correct, the elements also, whether deduced from
three or four observations, would be strictly accurate (so far indeed as the motion is
supposed to take place exactly according to the laws of KEPLER), and, therefore, if
other observations were used, they might be confirmed, but not corrected. But since
all our measurements and observations are nothing more than approximations to the
truth, the same must be true of all calculations resting upon them, and the highest
aim of all computations made concerning concrete phenomena must be to
approximate, as nearly as practicable, to the truth. But this can be accomplished in
no other way than by a suitable combination of more observations than the number
absolutely requisite for the determination of the unknown quantities. This problem
can only be properly undertaken when an approximate knowledge of the orbit has
been already attained, which is afterwards to be corrected so as to satisfy all the
observations in the most accurate manner possible."
This single paragraph, written almost 200 years ago, embodies the following concepts, which
are as relevant today as they were then.
(iii) All that can be expected from computations based on inconsistent measurements are
estimates of the "truth",
(v) Initial approximations to the final estimates should be used, and finally,
These notes contain a development of Least Squares processes applicable to surveying and
geodesy. Examples and exercises of least squares processes are given using MATLAB, an
interactive, matrix-based system for scientific and engineering computation and visualization.
The name MATLAB is derived from MATrix LABoratory and is licensed by The
MathWorks, Inc.
2.1. Introduction
Crandall and Seabloom (1970, pp. 4-5) give a definition of a measurement, distinguishing between direct and indirect measurements.
Direct measurements (or observations) are those that are made directly upon the quantity to be
determined. Measurements of a line by direct chaining, or Electronic Distance Measurement
(EDM), or measurement of an angle by theodolite or Total Station are examples of direct
measurements.
Indirect measurements (or observations) are not made upon the quantity itself but are made on
some other quantity or quantities related to it. For example, the coordinates of a point P are
indirectly determined by measuring bearings and distances to P from other points; the latitude
of P may be determined from altitudes to certain stars; and the height of P may be determined
by measured height differences from a known point.
The word error comes from the Latin errare, which means to wander, not to sin.
Blunders or mistakes are definite mis-readings, booking errors or other like occurrences.
They are usually caused by poor measurement technique and/or a lack of attention to detail by
the person making the measurement. They may be eliminated or minimized by correct and
careful measurement techniques, and a thorough understanding of the operation of the
equipment used for the measurement.
Constant errors are those that do not vary throughout the particular measurement period.
They are always of the same sign. Neglecting to standardize a measuring tape introduces a
constant error; failure to use the correct prism-offset value introduces constant errors in EDM
measurements. A faulty joint between sections of a levelling staff will introduce a constant
error into height differences from spirit levelling. Constant errors can be eliminated from
measurements by a thorough understanding of the measurement process and the equipment
used.
Systematic errors are those errors that follow some fixed law (possibly unknown) dependent
on local conditions and/or the equipment being used. For example, if the temperature and
pressure (which are indicators of atmospheric conditions) are not measured when using EDM
equipment then a systematic error may be introduced, since the modulated electromagnetic
beam of the EDM passes through the atmosphere and its time of travel (indirectly measured
by phase comparison of emitted and reflected beams) is affected by atmospheric conditions.
All EDM measurements must be corrected for atmospheric conditions that depart from
"standard conditions".
Accidental or Random errors are the small errors remaining in a measurement after mistakes,
constant errors and systematic errors have been eliminated. They are due to the imperfection
of the instruments used, the fallibility of the observer and the changing environmental
conditions in which the measurements are made, all of which affect the measurement to a
lesser or greater degree.
Bearing in mind the aforementioned, it could be said that all careful measurements (where
mistakes, constant errors and systematic errors have been eliminated) contain small random
errors and from experience, three axioms relating to random errors can be stated.
These axioms are the basic premises on which the theory of errors (the normal law of error) is
founded.
A measured quantity has a true value and a most probable value. The most probable value is
often called the best estimate and the two terms can be taken as synonymous.
No matter how many times a quantity is measured, its true value will remain unknown and
only a best estimate can be obtained from the measurements. In the case of a single measured
quantity, the best estimate is the arithmetic mean (or average) of the measurements.
If a quantity has been measured a number of times, the difference between the true (but
unknown) value and any measurement is the true error and the difference between the best
estimate and any measurement is the apparent error.
These relationships can be established by defining a correction to have the same magnitude as
an error but the opposite sign. In surveying, the terms correction and residual are regarded as
synonymous, and are universally denoted by the letter v.
Suppose a quantity is measured n times, giving measurements x_1, x_2, \ldots, x_n. The true value (unknown) of the measured quantity is \mu (mu) and is estimated by the arithmetic mean \bar{x}, where

\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{k=1}^{n} x_k    (2.1)
The arithmetic mean is regarded as the best estimate or most probable value. A correction v having the same magnitude as an error but the opposite sign is defined as

v_k = \bar{x} - x_k

Since these corrections relate to the measurements and the arithmetic mean, they could be called apparent corrections, and hence, according to our definition of corrections and errors, apparent errors -v are defined as

-v_k = x_k - \bar{x}

and true errors \varepsilon are defined as

\varepsilon_k = x_k - \mu
True errors are unknown and are approximated by apparent errors. The closer the best
estimate (or most probable value) approaches the true value, the closer the apparent error
approaches the true error. The laws defining the nature and behaviour of true errors were
derived from practical axioms deduced from the nature of apparent errors and hence any
theory of errors may also be regarded as a theory of corrections (or residuals) and the
distinction between true errors and apparent errors is ignored for all practical purposes.
The following sections contain simple examples of least squares processes, the mean, the
weighted mean, line of best fit (linear regression) and polynomial curve fitting. In each case,
Gauss' least squares principle: "... the most probable system of values of the quantities ... will
be that in which the sum of the squares of the differences between the actually observed and
computed values multiplied by numbers that measure the degree of precision, is a minimum."
will be employed to determine equations or systems of equations that may be regarded as least
squares solutions to the problems. Furthermore, it is assumed that all measurements are free
of mistakes, constant errors and systematic errors and "contain" only random errors and that
the precision of the measurements is known a priori (Latin a priori from what is before).
Solutions to some of the examples are provided as MATLAB script files (.m files).
It is well known practice that when a single quantity is measured a number of times the
arithmetic mean is taken as the best estimate of the measured quantity. Few people realise
that when they adopt this practice they are employing Gauss' least squares principle.
Denote n measurements of a quantity as x_1, x_2, \ldots, x_n and the best estimate of the quantity as p. The observation equations and the residuals (corrections) are

x_1 + v_1 = p, \quad x_2 + v_2 = p, \quad x_3 + v_3 = p, \quad \ldots, \quad x_n + v_n = p

v_1 = p - x_1, \quad v_2 = p - x_2, \quad v_3 = p - x_3, \quad \ldots, \quad v_n = p - x_n
Now if all the measurements can be regarded as having equal precision we may state the least
squares principle as
The best estimate p is that value which makes the sum of the squares
of the residuals a minimum.
\varphi = \text{the sum of the squares of the residuals} = \sum_{k=1}^{n} v_k^2    (2.2)

or

\varphi = \sum_{k=1}^{n} v_k^2 = (p - x_1)^2 + (p - x_2)^2 + \cdots + (p - x_n)^2
We say that \varphi is a function of p, the single parameter or variable in this equation. The minimum value of the function (i.e., making the sum of squares of residuals a minimum) can be found by equating the derivative d\varphi/dp to zero, i.e.,

\varphi \text{ is a minimum when } \frac{d\varphi}{dp} = 0

and

\frac{d\varphi}{dp} = 2(p - x_1) + 2(p - x_2) + \cdots + 2(p - x_n) = 0
Cancelling the 2's and rearranging gives the best estimate p as the arithmetic mean.
p = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n}\sum_{k=1}^{n} x_k    (2.3)
Hence, the arithmetic mean of a series of measurements is the best estimate according to
Gauss' least squares principle.
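As a simple illustration of this result, the following MATLAB fragment (not part of the printed notes; the measurement values are hypothetical) computes the arithmetic mean of a set of measurements, the residuals, and the least squares function of equation (2.2).

% hypothetical measurements of a single quantity (metres)
x = [15.012; 15.015; 15.010; 15.013];
p = sum(x)/length(x);      % arithmetic mean, equation (2.3)
v = p - x;                 % residuals v(k) = p - x(k)
phi = sum(v.^2);           % least squares function, equation (2.2)
% evaluating phi at any value other than p, e.g. p + 0.001, gives a larger sum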
Before demonstrating that the weighted mean of a set of observations is the result of a least
squares process, some discussion of the term weight and its connection with precision is
required.
In every least squares process it is assumed that the precision of measurements is known. The
precision is a measure of the dispersion (or spread) of a number of measurements from their
mean (or average) value. A common statistical measure of precision is the variance σ 2 and
the positive square root of the variance is the standard deviation σ . Equations for the
variance and standard deviation differ depending on whether the population of measurements
is finite or infinite and a population is a potential set of quantities that we want to make
inference about based on a sample from that population.
Following Deakin and Kildea (1999), consider a finite population, such as the examination
marks mk of a group of N students in a single subject. Since we have complete information
about the population, i.e., its size is known, the mean μ , the variance σ 2 and the standard
deviation σ of the finite population are
\mu = \frac{1}{N}\sum_{k=1}^{N} m_k    (2.4)

\sigma^2 = \frac{1}{N}\sum_{k=1}^{N} (m_k - \mu)^2    (2.5)

\sigma = \sqrt{\frac{1}{N}\sum_{k=1}^{N} (m_k - \mu)^2}    (2.6)
Note that the variance σ 2 is the average squared difference of a member of the population mk
from the population mean μ . The mean, variance and standard deviation are known as
population parameters.
Consider surveying measurements, drawn from infinite populations with the attendant
difficulties of estimation since population averages can never be known. In such cases we are
usually dealing with small samples of measurements of size n and we can only obtain
estimates of the unknown population parameters \mu, \sigma^2 and \sigma. For a sample of n measurements x_1, x_2, \ldots, x_n these estimates are

\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k    (2.7)

s_x^2 = \frac{1}{n-1}\sum_{k=1}^{n} (x_k - \bar{x})^2    (2.8)

s_x = \sqrt{\frac{1}{n-1}\sum_{k=1}^{n} (x_k - \bar{x})^2}    (2.9)
Note the divisor n − 1 (known as the degrees of freedom) in the equations for the estimates of variance and standard deviation. This ensures that s_x^2 is an unbiased estimate of the population variance \sigma^2, but it does not ensure that s_x is an unbiased estimate of the population standard deviation \sigma. An unbiased estimate requires the divisor c_n in place of n − 1:

s_x^* = \sqrt{\frac{1}{c_n}\sum_{k=1}^{n} (x_k - \bar{x})^2}    (2.10)

Values of c_n for various sample sizes n are:

n      2      3      4      5      10     15     20     30     90
n−1    1      2      3      4      9      14     19     29     89
c_n    0.64   1.57   2.55   3.53   8.51   13.51  18.51  28.50  88.50
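The following MATLAB fragment is a minimal sketch (not from the printed notes, with a hypothetical sample) of equations (2.9) and (2.10), using the tabulated divisor c_n for a sample of size n = 5.

x  = [24.316; 24.312; 24.319; 24.314; 24.317];   % hypothetical sample, n = 5
n  = length(x);
cn = 3.53;                     % divisor c_n for n = 5 from the table above
xbar   = sum(x)/n;             % sample mean, equation (2.7)
ssq    = sum((x - xbar).^2);   % sum of squared differences
s_x    = sqrt(ssq/(n - 1));    % sample standard deviation, equation (2.9)
s_star = sqrt(ssq/cn);         % unbiased estimate, equation (2.10)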
In these notes, it is always assumed that the terms mean, variance and standard deviation refer
to estimates of population values.
Another measure of precision, often used in least squares applications is weight w and the
weight of an observation (or measurement) is defined as being inversely proportional to the
variance
w_k \propto \frac{1}{s_k^2}    (2.11)

or

w_k = \frac{\sigma_0^2}{s_k^2}    (2.12)

where \sigma_0^2 is a constant of proportionality known as the reference variance. Equation (2.12) is the classical definition of weight, and if an observation has unit weight (w_k = 1) its variance equals \sigma_0^2; hence the reference variance is sometimes called the variance of an observation of
unit weight; a term often encountered in older surveying texts. In this definition of weight,
there is an assumption that the measurements are uncorrelated (a statistical term relating to the
dependence between measurements, see section 2.5). In cases where measurements are
correlated, weights are not an adequate means of describing relative precisions.
As an example of the connection between weights and standard deviations, consider three uncorrelated (i.e., independent) observations of a particular distance, where each observation is the mean of several measurements and the standard deviation of each observation has been determined from the measurements; the standard deviations of observations 1, 2 and 3 are 0.010 m, 0.032 m and 0.024 m respectively.
Since the weight is inversely proportional to the variance, the observation with the smallest
weight will have the largest variance (standard deviation squared). For convenience, this
observation is given unit weight i.e., w2 = 1 and the other observations (with smaller
variances) will have higher weight. Hence from (2.12)
w_2 = 1 = \frac{\sigma_0^2}{(0.032)^2} \quad\text{and}\quad \sigma_0^2 = (0.032)^2

w_1 = \frac{(0.032)^2}{(0.010)^2} = 10.24

w_2 = \frac{(0.032)^2}{(0.032)^2} = 1

w_3 = \frac{(0.032)^2}{(0.024)^2} = 1.78
Weights are often assigned to observations using "other information". Say for example, a
distance is measured three times and a mean value determined. If two other determinations of
the distance are from the means of six and four measurements respectively, the weights of the
three observations may simply be assigned the values 3, 6 and 4. This assignment of weights
is a very crude reflection of the (likely) relative precisions of the observations since it is
known that to double the precision of a mean of a set of measurements, we must quadruple
the number of measurements taken (Deakin and Kildea, 1999, p. 76).
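A minimal MATLAB sketch of equation (2.12) (not part of the printed notes) reproduces the weights of the example above, with the reference variance chosen so that the least precise observation has unit weight.

s       = [0.010; 0.032; 0.024];   % standard deviations of the three observations (m)
sigma02 = max(s)^2;                % reference variance = largest variance
w       = sigma02 ./ s.^2;         % weights, equation (2.12): 10.24, 1.00, 1.78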
Consider now a set of n measurements x_1, x_2, x_3, \ldots, x_n of a single quantity having weights w_1, w_2, w_3, \ldots, w_n, and denote the best estimate of this quantity as q. According to our general definition, the observation equations and residuals are

x_1 + v_1 = q, \quad x_2 + v_2 = q, \quad x_3 + v_3 = q, \quad \ldots, \quad x_n + v_n = q

v_1 = q - x_1, \quad v_2 = q - x_2, \quad v_3 = q - x_3, \quad \ldots, \quad v_n = q - x_n
Now each measurement has a weight reflecting relative precision and we may state the least
squares principle as
The best estimate q is that value which makes the sum of the squares
of the residuals, multiplied by their weights, a minimum.
We may define a least squares function ϕ (phi) as
\varphi = \text{the sum of the weighted squared residuals} = \sum_{k=1}^{n} w_k v_k^2    (2.13)

or

\varphi = \sum_{k=1}^{n} w_k v_k^2 = w_1(q - x_1)^2 + w_2(q - x_2)^2 + \cdots + w_n(q - x_n)^2
We say that ϕ is a function of q, the single parameter or variable in this equation. The
minimum value of the function (i.e., making the sum of the weighted squared residuals a
minimum) can be found by equating the derivative d\varphi/dq to zero, i.e.,

\varphi \text{ is a minimum when } \frac{d\varphi}{dq} = 0

and

\frac{d\varphi}{dq} = 2w_1(q - x_1) + 2w_2(q - x_2) + \cdots + 2w_n(q - x_n) = 0

Cancelling the 2's and expanding gives

w_1 q - w_1 x_1 + w_2 q - w_2 x_2 + \cdots + w_n q - w_n x_n = 0

and rearranging gives

q = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{k=1}^{n} w_k x_k}{\sum_{k=1}^{n} w_k}    (2.14)
Hence, the weighted arithmetic mean of a series of measurements xk each having weight wk
is the best estimate according to Gauss' least squares principle.
It should be noted that the equation for the weighted mean (2.14) is valid only for
measurements that are statistically independent. If observations are dependent, then a
measure of the dependence between the measurements, known as covariance, must be taken
into account. A more detailed discussion of weights, variances and covariances is given in
later sections of these notes.
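The weighted mean of equation (2.14) is equally simple to compute. The following sketch (not part of the printed notes; the distances are hypothetical, the weights are those of the example above) illustrates the calculation.

x = [136.204; 136.215; 136.209];   % hypothetical observations of a distance (m)
w = [10.24; 1.00; 1.78];           % weights of the observations
q = sum(w.*x)/sum(w);              % weighted mean, equation (2.14)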
Figure 2.1  Line of best fit y = mx + c through five data points (1 to 5)
The line of best fit shown in Figure 2.1 has the equation y = mx + c, where m is the slope of the line, m = \tan\theta = \dfrac{y_2 - y_1}{x_2 - x_1}, and c is the intercept of the line on the y axis.
m and c are the parameters and the data points are assumed to accord with the mathematical
model y = m x + c . Obviously, only two points are required to define a straight line and so
three of the five points in Figure 2.1 are redundant measurements (or observations). In this
example the x,y coordinate pairs of each data point are considered as indirect measurements of
the parameters m and c of the mathematical model.
To estimate (or compute) values for m and c, pairs of points in all combinations (ten in all)
could be used to obtain average values of the parameters; or perhaps just two points selected
as representative could be used to determine m and c.
A better way is to determine a line such that it passes as close as possible to all the data
points. Such a line is known as a Line of Best Fit and is obtained (visually) by minimising the
differences between the line and the data points. No account is made of the "sign" of these
differences, which can be considered as corrections to the measurements or residuals. The
Line of Best Fit could also be defined as the result of a least squares process that determines
estimates of the parameters m and c such that those values will make the sum of the squares of
the residuals, multiplied by their weights, a minimum. Two examples will be considered, the
first with all measurements considered as having equal precisions, i.e., all weights of equal
value, and the second, measurements having different precisions, i.e., unequal weights.
In Figure 2.1 there are five data points whose x,y coordinates (scaled from the diagram in mm) are

Point      x        y
  1      −40.0    −24.0
  2      −15.0    −24.0
  3       10.0    −12.0
  4       38.0     15.0
  5       67.0     30.0

Table 2.2  Coordinates (mm) of data points shown in Figure 2.1
Assume that the data points accord with the mathematical model y = m x + c and each
measurement has equal precision. Furthermore, assume that the residuals are associated with
the y values only, which leads to an observation equation of the form
y k + v k = m xk + c (2.15)
By adopting this observation equation we are actually saying that the measurements (the x,y
coordinates) don't exactly fit the mathematical model, i.e., there are inconsistencies between
the model and the actual measurements, and these inconsistencies (in both x and y
measurements) are grouped together as residuals vk and simply added to the left-hand-side of
the mathematical model. This is simply a convenience. We could write an observation
equation of the form
y_k + v_{y_k} = m\left(x_k + v_{x_k}\right) + c

where v_{x_k}, v_{y_k} are residuals associated with the x and y coordinates of the k-th point. Observation equations of this form require more complicated least squares solutions and are not considered in this elementary section.
Rearranging (2.15) gives the residual equation

v_k = m x_k + c - y_k    (2.16)
The distinction here between observation equations and residual equations is simply that
residual equations have only residuals on the left of the equals sign. Rearranging observation
equations into residual equations is an interim step to simplify the function ϕ = sum of
squares of residuals.
Since all observations are of equal precision (equal weights), the least squares function to be
minimised is
\varphi = \text{the sum of the squares of the residuals} = \sum_{k=1}^{n} v_k^2

or

\varphi = \sum_{k=1}^{5} v_k^2 = (m x_1 + c - y_1)^2 + (m x_2 + c - y_2)^2 + \cdots + (m x_5 + c - y_5)^2

Equating the partial derivatives of \varphi with respect to m and c to zero gives

\frac{\partial\varphi}{\partial m} = 2(m x_1 + c - y_1)(x_1) + 2(m x_2 + c - y_2)(x_2) + \cdots + 2(m x_5 + c - y_5)(x_5) = 0

\frac{\partial\varphi}{\partial c} = 2(m x_1 + c - y_1)(1) + 2(m x_2 + c - y_2)(1) + \cdots + 2(m x_5 + c - y_5)(1) = 0
Cancelling the 2's, simplifying and re-arranging gives two normal equations of the form
m\sum_{k=1}^{n} x_k^2 + c\sum_{k=1}^{n} x_k = \sum_{k=1}^{n} x_k y_k

m\sum_{k=1}^{n} x_k + c\,n = \sum_{k=1}^{n} y_k    (2.17)

which can be written in matrix form as

\begin{bmatrix} \sum x_k^2 & \sum x_k \\ \sum x_k & n \end{bmatrix} \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} \sum x_k y_k \\ \sum y_k \end{bmatrix}    (2.18)

or

\mathbf{N}\mathbf{x} = \mathbf{t}    (2.19)
Matrix algebra is a powerful mathematical tool that simplifies the theory associated with least
squares. The student should become familiar with the terminology and proficient with the
algebra. Appendix A contains useful information relating to matrix algebra.
In equation (2.19)

\mathbf{N} = \begin{bmatrix} n_{11} & n_{12} \\ n_{21} & n_{22} \end{bmatrix} is the (u,u) normal equation coefficient matrix,

\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} is the (u,1) vector of parameters (or "unknowns"), and

\mathbf{t} = \begin{bmatrix} t_1 \\ t_2 \end{bmatrix} is the (u,1) vector of numeric terms.

The solution of the normal equations is

\mathbf{x} = \mathbf{N}^{-1}\mathbf{t}    (2.20)

In this example (two equations in two unknowns) the matrix inverse \mathbf{N}^{-1} is easily obtained (see Appendix A 4.8) and the solution of (2.20) is given as

\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \frac{1}{n_{11}n_{22} - n_{12}n_{21}} \begin{bmatrix} n_{22} & -n_{12} \\ -n_{21} & n_{11} \end{bmatrix} \begin{bmatrix} t_1 \\ t_2 \end{bmatrix}    (2.21)
From the data given in Table 2.2 (with n = 5), the normal equations are

\begin{bmatrix} 7858 & 60 \\ 60 & 5 \end{bmatrix} \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} 3780 \\ -15 \end{bmatrix}

and the solutions for the best estimates of the parameters m and c are

m = 0.554777, \quad c = -9.657327

Substitution of the best estimates of the parameters m and c into the residual equations v_k = m x_k + c - y_k gives the residuals (mm) as

v_1 = -7.8
v_2 = 6.0
v_3 = 7.9
v_4 = -3.6
v_5 = -2.5
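The normal equations (2.18) can be formed and solved directly in MATLAB; the following fragment (a sketch, not the author's program) uses the Table 2.2 coordinates and reproduces the values of m, c and the residuals given above.

x = [-40; -15; 10; 38; 67];         % x coordinates from Table 2.2 (mm)
y = [-24; -24; -12; 15; 30];        % y coordinates from Table 2.2 (mm)
n = length(x);
N = [sum(x.^2) sum(x); sum(x) n];   % normal equation coefficient matrix (2.18)
t = [sum(x.*y); sum(y)];            % vector of numeric terms
sol = N\t;                          % sol(1) = m, sol(2) = c
v = sol(1)*x + sol(2) - y;          % residuals, equation (2.16)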
Consider again the Line of best Fit shown in Figure 2.1 but this time the x,y coordinate pairs
are weighted, i.e., some of the data points are considered to have more precise coordinates
than others. Table 2.3 shows the x,y coordinates (scaled from the diagram in mm's) with
weights.
Point x y weight w
1 −40.0 −24.0 2
2 −15.0 −24.0 5
3 10.0 −12.0 7
4 38.0 15.0 3
5 67.0 30.0 3
Table 2.3 Coordinates (mm) and weights of data points shown in Figure 2.1
Similarly to before, a residual equation of the form given by (2.16) can be written for each
observation but this time a weight wk is associated with each equation and the least squares
function to be minimised is
\varphi = \text{the sum of the weighted squared residuals} = \sum_{k=1}^{n} w_k v_k^2

or

\varphi = \sum_{k=1}^{5} w_k v_k^2 = w_1(m x_1 + c - y_1)^2 + w_2(m x_2 + c - y_2)^2 + \cdots + w_5(m x_5 + c - y_5)^2

Equating the partial derivatives of \varphi with respect to m and c to zero gives

\frac{\partial\varphi}{\partial m} = 2w_1(m x_1 + c - y_1)(x_1) + 2w_2(m x_2 + c - y_2)(x_2) + \cdots + 2w_5(m x_5 + c - y_5)(x_5) = 0

\frac{\partial\varphi}{\partial c} = 2w_1(m x_1 + c - y_1)(1) + 2w_2(m x_2 + c - y_2)(1) + \cdots + 2w_5(m x_5 + c - y_5)(1) = 0

Cancelling the 2's, simplifying and re-arranging gives two normal equations of the form

m\sum_{k=1}^{n} w_k x_k^2 + c\sum_{k=1}^{n} w_k x_k = \sum_{k=1}^{n} w_k x_k y_k

m\sum_{k=1}^{n} w_k x_k + c\sum_{k=1}^{n} w_k = \sum_{k=1}^{n} w_k y_k

which can be written in matrix form as

\begin{bmatrix} \sum w_k x_k^2 & \sum w_k x_k \\ \sum w_k x_k & \sum w_k \end{bmatrix} \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} \sum w_k x_k y_k \\ \sum w_k y_k \end{bmatrix}
The solution for the best estimates of the parameters m and c is found in exactly the same manner as before (see section 2.4.1), giving

m = 0.592968, \quad c = -12.669131

and substitution into the residual equations gives the residuals (mm) as

v_1 = -12.4
v_2 = 2.4
v_3 = 5.3
v_4 = -5.1
v_5 = -2.9
Comparing these residuals with those from the Line of Best Fit (equal weights) shows that the line has been pulled closer to points 2 and 3, i.e., the points having the largest weights.
Some of the information in this section has been introduced previously in section 2.3 The Weighted Mean and is re-stated here in the context of developing general matrix expressions for variances, covariances, cofactors and weights of sets or arrays of measurements.
The population mean \mu_x, the population variance \sigma_x^2 and the family of Normal probability density functions are given by Kreyszig (1970) as

\mu_x = \int_{-\infty}^{+\infty} x\, f(x)\, dx    (2.22)

\sigma_x^2 = \int_{-\infty}^{+\infty} (x - \mu_x)^2 f(x)\, dx    (2.23)

f(x; \mu_x, \sigma_x) = \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2}    (2.24)
Since the population is infinite, means and variances are never known, but may be estimated
from a sample of size n. The sample mean \bar{x} and sample variance s_x^2 are unbiased estimates of \mu_x and \sigma_x^2 respectively:

\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k    (2.25)

s_x^2 = \frac{1}{n-1}\sum_{k=1}^{n} (x_k - \bar{x})^2    (2.26)
The sample standard deviation s x is the positive square root of the sample variance and is a
measure of the precision (or dispersion) of the measurements about the mean x .
When two or more observations are jointly used in a least squares solution then the
interdependence of these observations must be considered. Two measures of this
interdependence are covariance and correlation. For two random variables x and y with a
joint probability density function f ( x, y ) the covariance σ x y is
\sigma_{xy} = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} (x - \mu_x)(y - \mu_y)\, f(x,y)\, dx\, dy    (2.27)

and the correlation coefficient \rho_{xy} is

\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}    (2.28)
(Kreyszig 1970, pp.335-9). Correlation and statistical dependence are not the same, although the two terms are often used synonymously. It can be shown that the covariance \sigma_{xy} is always zero when the random variables are statistically independent (Kreyszig 1970, pp.137-9).
Unfortunately, the reverse is not true in general. Zero covariance does not necessarily imply
statistical independence. Nevertheless, for multivariate Normal probability density functions,
zero covariance (no correlation) is a sufficient condition for statistical independence (Mikhail
1976, p.19).
The sample covariance s_{xy} is an estimate of \sigma_{xy} computed from n pairs of measurements as

s_{xy} = \frac{1}{n-1}\sum_{k=1}^{n} (x_k - \bar{x})(y_k - \bar{y})    (2.29)
For n random variables the variances and covariances can be arranged in a symmetric variance-covariance matrix

\mathbf{\Sigma} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ \sigma_{n1} & \sigma_{n2} & \sigma_{n3} & \cdots & \sigma_n^2 \end{bmatrix}    (2.30)
In practical applications of least squares, population variances and covariances are unknown
and are replaced by estimates s12 , s22 , … , sn2 and s12 , s13 , … or by other numbers
representing relative variances and covariances. These are known as cofactors and the
cofactor matrix Q, which is symmetric, is defined as
\mathbf{\Sigma} = \sigma_0^2 \mathbf{Q}    (2.32)

where \sigma_0^2 is a scalar quantity known as the variance factor. The variance factor is also known as the
reference variance and the variance of an observation of unit weight (see section 2.3 for
further discussion on this subject).
The weight matrix \mathbf{W} is defined as the inverse of the cofactor matrix

\mathbf{W} = \mathbf{Q}^{-1}    (2.33)
Note that since Q is symmetric, its inverse W is also symmetric. In the case of uncorrelated
observations, the variance-covariance matrix Σ and the cofactor matrix Q are both diagonal
matrices (see Appendix A) and the weight of an observation w is a value that is inversely
proportional to the estimate of the variance i.e.,
For uncorrelated observations, the off-diagonal terms will be zero and the double subscripts may be replaced by single subscripts; equation (2.34) becomes

w_k = \frac{\sigma_0^2}{s_k^2}    (2.35)
Note: The concept of weights has been extensively used in classical least squares theory but
is limited in its definition to the case of independent (or uncorrelated) observations.
(Mikhail 1976, pp.64-65 and Mikhail and Gracie 1981, pp.66-68).
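For uncorrelated observations the relationships between Σ, Q and W in equations (2.32), (2.33) and (2.35) can be illustrated with a short MATLAB sketch (not part of the printed notes; the standard deviations are those of the example in section 2.3).

s       = [0.010; 0.032; 0.024];   % standard deviations (m)
Sigma   = diag(s.^2);              % variance-covariance matrix (diagonal, uncorrelated)
sigma02 = 0.032^2;                 % chosen variance factor (reference variance)
Q       = Sigma/sigma02;           % cofactor matrix, equation (2.32)
W       = inv(Q);                  % weight matrix, equation (2.33)
% the diagonal of W equals sigma02./s.^2, i.e. the weights of equation (2.35)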
Matrix algebra is a powerful mathematical tool that can be employed to develop standard
solutions to least squares problems. The previous examples of the Line of Best Fit will be
used to show the development of standard matrix equations that can be used for any least
squares solution.
In previous developments, we have used a least squares function ϕ as meaning either the sum
of squares of residuals or the sum of squares of residuals multiplied by weights.
In the Line of Best Fit (equal weights), we used the least squares function
n
ϕ = the sum of the squares of the residuals = ∑ vk2
k =1
If the residuals vk are elements of a (column) vector v, the function ϕ can be written as the
matrix product
\varphi = \sum_{k=1}^{n} v_k^2 = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \mathbf{v}^T\mathbf{v}
In the Line of Best Fit (unequal weights), we used the least squares function
n
ϕ = the sum of the weighted squared residuals = ∑ wk vk2
k =1
If the residuals vk are elements of a (column) vector v and the weights are the diagonal
elements of a diagonal weight matrix W, the function ϕ can be written as the matrix product
\varphi = \sum_{k=1}^{n} w_k v_k^2 = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} \begin{bmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & w_n \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \mathbf{v}^T\mathbf{W}\mathbf{v}
Note that in this example the weight matrix W represents a set of uncorrelated measurements.
ϕ = vT Wv (2.36)
Note that replacing W with the identity matrix I gives the function for the case of equal
weights and that for n observations, the order of v is (n,1), the order of W is (n,n) and the
function ϕ = vT Wv is a scalar quantity (a single number).
In both examples of the Line of Best Fit an observation equation y_k + v_k = m x_k + c was used, which can be re-arranged as
v1 − mx1 − c = − y1
v2 − mx2 − c = − y2
v3 − mx3 − c = − y3
v4 − mx4 − c = − y4
v5 − mx5 − c = − y5
Note that these re-arranged observation equations have all the unknown quantities v, m and c
on the left of the equals sign and all the known quantities on the right.
These equations can be written in matrix form as

\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_5 \end{bmatrix} + \begin{bmatrix} -x_1 & -1 \\ -x_2 & -1 \\ \vdots & \vdots \\ -x_5 & -1 \end{bmatrix} \begin{bmatrix} m \\ c \end{bmatrix} = \begin{bmatrix} -y_1 \\ -y_2 \\ \vdots \\ -y_5 \end{bmatrix}

or
v + Bx = f (2.37)
If n is the number of observations (equal to the number of equations) and u is the number of
unknown parameters
v is an (n,1) vector of residuals,
B is an (n,u) matrix of coefficients,
x is the (u,1) vector of unknown parameters,
f is the (n,1) vector of numeric terms derived from the observations,
d is an (n,1) vector of constants and
l is the (n,1) vector of observations.
Note that in many least squares problems the vector d is zero.
By substituting (2.37) into (2.36), we can obtain an expression for the least squares function
\varphi = \mathbf{v}^T\mathbf{W}\mathbf{v}
= (\mathbf{f} - \mathbf{B}\mathbf{x})^T \mathbf{W} (\mathbf{f} - \mathbf{B}\mathbf{x})
= \left(\mathbf{f}^T - (\mathbf{B}\mathbf{x})^T\right) \mathbf{W} (\mathbf{f} - \mathbf{B}\mathbf{x})
= (\mathbf{f}^T - \mathbf{x}^T\mathbf{B}^T) \mathbf{W} (\mathbf{f} - \mathbf{B}\mathbf{x})
= \mathbf{f}^T\mathbf{W}\mathbf{f} - \mathbf{f}^T\mathbf{W}\mathbf{B}\mathbf{x} - \mathbf{x}^T\mathbf{B}^T\mathbf{W}\mathbf{f} + \mathbf{x}^T\mathbf{B}^T\mathbf{W}\mathbf{B}\mathbf{x}    (2.39)
Since \varphi is a scalar (a number), the four terms on the right-hand-side of (2.39) are also scalars. Furthermore, since the transpose of a scalar is equal to itself, the second and third terms are equal, and (2.39) can be written as

\varphi = \mathbf{f}^T\mathbf{W}\mathbf{f} - 2\mathbf{f}^T\mathbf{W}\mathbf{B}\mathbf{x} + \mathbf{x}^T(\mathbf{B}^T\mathbf{W}\mathbf{B})\mathbf{x}    (2.40)
In equation (2.40) all matrices and vectors are numerical constants except x, the vector of
unknown parameters, therefore for the least squares function ϕ to be a minimum, its partial
derivative with respect to each element in vector \mathbf{x} must be equated to zero, i.e., \varphi will be a minimum when \partial\varphi/\partial\mathbf{x} = \mathbf{0}^T. The first term of (2.40) does not contain \mathbf{x} so its derivative is automatically zero, and the second and third terms are bilinear and quadratic forms respectively whose derivatives are given in Appendix A; hence \varphi will be a minimum when

\frac{\partial\varphi}{\partial\mathbf{x}} = -2\mathbf{f}^T\mathbf{W}\mathbf{B} + 2\mathbf{x}^T(\mathbf{B}^T\mathbf{W}\mathbf{B}) = \mathbf{0}^T

Cancelling the 2's, re-arranging and transposing gives a set of normal equations

(\mathbf{B}^T\mathbf{W}\mathbf{B})\,\mathbf{x} = \mathbf{B}^T\mathbf{W}\mathbf{f}    (2.41)

which may be written as

\mathbf{N}\mathbf{x} = \mathbf{t}    (2.42)

where \mathbf{N} = \mathbf{B}^T\mathbf{W}\mathbf{B} and \mathbf{t} = \mathbf{B}^T\mathbf{W}\mathbf{f}. The solution for the vector of parameters is

\mathbf{x} = \mathbf{N}^{-1}\mathbf{t}    (2.43)
After solving for the vector x, the residuals are obtained from
v = f − Bx (2.44)
and the adjusted observations from

\hat{\mathbf{l}} = \mathbf{l} + \mathbf{v}    (2.45)
The "hat" symbol (^) is used to denote quantities that result from a least squares process.
Such quantities are often called adjusted quantities or least squares estimates.
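The solution sequence of equations (2.41) to (2.45) translates directly into a few lines of MATLAB. The following function is a minimal sketch (not the author's program); the caller is assumed to have formed B, W, f and the vector of observations l for the problem at hand.

function [x,v,l_hat] = lsq_indirect(B,W,f,l)
% least squares adjustment of indirect observations, equations (2.41)-(2.45)
N     = B'*W*B;    % normal equation coefficient matrix
t     = B'*W*f;    % vector of numeric terms
x     = N\t;       % vector of parameters, equation (2.43)
v     = f - B*x;   % vector of residuals, equation (2.44)
l_hat = l + v;     % adjusted observations, equation (2.45)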
The name "least squares adjustment of indirect observations", adopted by Mikhail (1976) and
Mikhail & Gracie (1981), recognises the fact that each observation is an indirect measurement
of the unknown parameters. This is the most common technique employed in surveying and
geodesy and is described by various names, such as
parametric least squares
least squares adjustment by observation equations
least squares adjustment by residual equations
The technique of least squares adjustment of indirect observations has the following
characteristics
• The number of observations n exceeds the minimum number n₀ required to determine the unknown parameters, and the redundancy (degrees of freedom) is r = n − n₀.
• An equation can be written for each observation, i.e., there are n observation
equations. These equations can be represented in a standard matrix form; see equation
(2.37), representing n equations in u unknowns and solutions for the unknown
parameters, residuals and adjusted observations obtained from equations (2.41) to
(2.45).
The popularity of this technique of adjustment is due to its easy adaptability to computer-
programmed solutions. As an example, the following MATLAB program best_fit_line.m
reads a text file containing coordinate pairs (measurements) x and y and a weight w (a
measure of precision associated with each coordinate pair) and computes the parameters m
and c of a line of best fit y = mx + c .
function best_fit_line
%
% BEST_FIT_LINE reads an ASCII textfile containing coordinate pairs (x,y)
% and weights (w) associated with each pair and computes the parameters
% m and c of the line of best fit y = mx + c using the least squares
% principle. Results are written to a textfile having the same path and
% name as the data file but with the extension ".out"
%============================================================================
% Function: best_fit_line
%
% Author:
% Rod Deakin,
% Department of Geospatial Science, RMIT University,
% GPO Box 2476V, MELBOURNE VIC 3001
% AUSTRALIA
% email: [email protected]
%
% Date:
% Version 1.0 18 March 2003
%
% Remarks:
% This function reads numeric data from a textfile containing coordinate
% pairs (x,y) and weights (w) associated with each pair and computes the
% parameters m and c of a line of best fit y = mx + c using the least
% squares principle. Results are written to a textfile having the same
% path and name as the data file but with the extension ".out"
%
% Arrays:
% B - (n,u) coeff matrix of observation equation v + Bx = f
% f - (n,1) vector of numeric terms
% N - (u,u) coefficient matrix of Normal equations Nx = t
% Ninv - (u,u) inverse of N
% t - (u,1) vector of numeric terms of Normal equations Nx = t
% v - (n,1) vector of residuals
% W - (n,n) weight matrix
% weight - (n,1) vector of weights
% x - (u,1) vector of solutions
% x_coord - (n,1) vector of x coordinates
% y_coord - (n,1) vector of y coordinates
%
%
% Variables
% n - number of equations
% u - number of unknowns
%
% References:
% Notes on Least Squares (2003), Department of Geospatial Science, RMIT
% University, 2003
%
%============================================================================
%-------------------------------------------------------------------------
% 1. Call the User Interface (UI) to choose the input data file name
% 2. Concatenate strings to give the path and file name of the input file
% 3. Strip off the extension from the file name to give the rootName
% 4. Add extension ".out" to rootName to give the output filename
% 5. Concatenate strings to give the path and file name of the output file
%-------------------------------------------------------------------------
filepath = strcat('c:\temp\','*.dat');
[infilename,inpathname] = uigetfile(filepath);
infilepath = strcat(inpathname,infilename);
rootName = strtok(infilename,'.');
outfilename = strcat(rootName,'.out');
outfilepath = strcat(inpathname,outfilename);
%----------------------------------------------------------
% 1. Load the data into an array whose name is the rootName
% 2. set fileTemp = rootName
% 3. Copy columns of data into individual arrays
%----------------------------------------------------------
load(infilepath);
fileTemp = eval(rootName);
x_coord = fileTemp(:,1);
y_coord = fileTemp(:,2);
weight = fileTemp(:,3);
%----------------------------------------------------------------------
% Form and solve the least squares problem (reconstructed sketch
% following equations (2.37) to (2.43); the original listing omits
% this section)
%----------------------------------------------------------------------
n = length(x_coord);          % number of observations
u = 2;                        % number of unknowns (m and c)
B = [-x_coord -ones(n,1)];    % coefficient matrix of v + Bx = f
f = -y_coord;                 % vector of numeric terms
W = diag(weight);             % diagonal weight matrix
N = B'*W*B;                   % normal equation coefficient matrix
t = B'*W*f;                   % vector of numeric terms of Nx = t
Ninv = inv(N);
x = Ninv*t;                   % vector of solutions: x(1) = m, x(2) = c
% compute residuals
v = f - (B*x);
% open the output file for writing
fidout = fopen(outfilepath,'w');
fprintf(fidout,'\n\nInput Data');
fprintf(fidout,'\n x(k) y(k) weight w(k)');
for k = 1:n
fprintf(fidout,'\n%10.4f %10.4f %10.4f',x_coord(k),y_coord(k),weight(k));
end
fprintf(fidout,'\n\n');
Running the program from the MATLAB command window prompt >> opens up a standard
Microsoft Windows file selection window in the directory c:\Temp. Select the appropriate
data file (in this example: line_data.dat) by double clicking with the mouse and the program
reads the data file, computes the solutions and writes the output data to the file
c:\Temp\line_data.out
Input Data
x(k) y(k) weight w(k)
-40.0000 -24.0000 2.0000
-15.0000 -24.0000 5.0000
10.0000 -12.0000 7.0000
38.0000 15.0000 3.0000
67.0000 30.0000 3.0000
Vector of solutions x
0.5930
-12.6691
Vector of residuals v
-12.3878
2.4363
5.2605
-5.1363
-2.9403
The data in this example is taken from section 2.4.2 Line of Best Fit (unequal weights)
By adding the following lines to the program, the Line of Best Fit is shown on a plot together
with the data points.
%--------------------------------------
% plot data points and line of best fit
%--------------------------------------
% compute points on the line of best fit, then plot the line and the
% data points with a star (*)
% (reconstructed sketch: m and c are taken from the solution vector x)
m = x(1);  c = x(2);
xfit = linspace(min(x_coord),max(x_coord));
plot(xfit,m*xfit+c,'k-');
hold on;
plot(x_coord,y_coord,'k*');
xlabel('X coordinate'); ylabel('Y coordinate');
Figure: Line of Best Fit and data points (X coordinate versus Y coordinate)
The general matrix solutions for least squares adjustment of indirect observations (see
equations (2.37) to (2.45) of section 2.6) can be applied to curve fitting. The following two
examples (parabola and ellipse) demonstrate the technique.
Consider the following: A surveyor working on the re-alignment of a rural road is required to
fit a parabolic vertical curve such that it is a best fit to the series of natural surface Reduced
Levels (RL's) on the proposed new alignment. Figure 2.2 shows a Vertical Section of the
proposed alignment with Chainages (x-values) and RL's (y-values).
Figure 2.2  Vertical section of the proposed alignment (Datum RL 50.00) showing the natural surface, chainages 100 to 350 and Reduced Levels 63.48, 46.20, 36.62, 38.96, 47.42 and 57.72
y = ax^2 + bx + c    (2.46)

This is the mathematical model that we assume our data accords with, and to account for the measurement inconsistencies, due to the irregular natural surface and small measurement errors, we can add residuals to the left-hand-side of (2.46) to give an observation equation

y_k + v_k = a x_k^2 + b x_k + c    (2.47)

which can be re-arranged as the residual equation

v_k - a x_k^2 - b x_k - c = -y_k    (2.48)
For the six data points these equations can be written in the matrix form \mathbf{v} + \mathbf{B}\mathbf{x} = \mathbf{f}

\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_6 \end{bmatrix} + \begin{bmatrix} -x_1^2 & -x_1 & -1 \\ -x_2^2 & -x_2 & -1 \\ \vdots & \vdots & \vdots \\ -x_6^2 & -x_6 & -1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} -y_1 \\ -y_2 \\ \vdots \\ -y_6 \end{bmatrix}

where

\mathbf{v}_{(6,1)} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_6 \end{bmatrix}, \quad \mathbf{B}_{(6,3)} = \begin{bmatrix} -x_1^2 & -x_1 & -1 \\ -x_2^2 & -x_2 & -1 \\ \vdots & \vdots & \vdots \\ -x_6^2 & -x_6 & -1 \end{bmatrix}, \quad \mathbf{x}_{(3,1)} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}, \quad \mathbf{f}_{(6,1)} = \begin{bmatrix} -y_1 \\ -y_2 \\ \vdots \\ -y_6 \end{bmatrix}
Considering all the measurements to be of equal precision, i.e., \mathbf{W} = \mathbf{I}, the least squares solution for the three parameters in the vector \mathbf{x} is given by the following sequence of operations: \mathbf{N} = \mathbf{B}^T\mathbf{W}\mathbf{B}, \mathbf{t} = \mathbf{B}^T\mathbf{W}\mathbf{f}, \mathbf{x} = \mathbf{N}^{-1}\mathbf{t}, followed by \mathbf{v} = \mathbf{f} - \mathbf{B}\mathbf{x}.
This is the identical series of operations to solve for the parameters of the Line of Best Fit,
except in this case u = 3. With minor modifications to the MATLAB program best_fit_line.m
another MATLAB program best_fit_parabola.m can be created to determine the parameters a,
b, c of the best fit parabola. The relevant modifications are shown below.
Making the following changes to the MATLAB program best_fit_line, a new program
best_fit_parabola can be created.
function best_fit_parabola
%
% BEST_FIT_PARABOLA reads an ASCII textfile containing coordinate pairs (x,y)
% and weights (w) associated with each pair and computes the parameters
% a, b and c of a best fit parabola y = a(x*x) + bx + c using the least
% squares principle. Results are written to a textfile having the same
% path and name as the data file but with the extension ".out"
% Remarks:
% This function reads numeric data from a textfile containing coordinate
% pairs (x,y) and weights (w) associated with each pair and computes the
% parameters a, b, and c of a best fit parabola y = a(x*x) + bx + c using
% the least squares principle. Results are written to a textfile having
% the same path and name as the data file but with the extension ".out"
%------------------------------------------
% plot data points and Parabola of best fit
%------------------------------------------
Using the data from Figure 2.2 a data file c:\Temp\parabola_data.dat was created
Running the program from the MATLAB command window generated the following output
file c:\Temp\parabola_data.out and a plot of the Least Squares Parabola of best Fit
Input Data
x(k) y(k) weight w(k)
100.0000 63.4800 1.0000
150.0000 46.2000 1.0000
200.0000 36.6200 1.0000
250.0000 38.9600 1.0000
300.0000 47.4200 1.0000
350.0000 57.7200 1.0000
Vector of solutions x
0.001500
-0.688221
116.350000
Vector of residuals v
-0.948
0.676
2.103
-0.889
-2.498
1.555
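As an independent check (not part of the printed notes), MATLAB's built-in polyfit function gives the same parabola for this equally weighted data.

x = [100; 150; 200; 250; 300; 350];              % chainages
y = [63.48; 46.20; 36.62; 38.96; 47.42; 57.72];  % reduced levels
p = polyfit(x,y,2);      % p = [a b c], approximately [0.0015 -0.6882 116.35]
v = polyval(p,x) - y;    % residuals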
Figure: Least Squares Parabola of Best Fit and data points (X coordinate versus Y coordinate)
In November 1994, a survey was undertaken by staff of the Department of Geospatial Science
at the Melbourne Cricket Ground (MCG) to determine the dimensions of the playing surface.
This survey was to decide which of two sets of dimensions was correct, those of the
Melbourne Cricket Club (MCC) or those of the Australian Football League (AFL). The MCC
curator Tony Ware and the AFL statistician Col Hutchison both measured the length of the
ground (Tony Ware with a 100-metre nylon tape and Col Hutchison with a measuring wheel)
and compared their distances with the "true" distance determined by Electronic Distance
Measurement (EDM) with a Topcon 3B Total Station. Their measurements were both
reasonably close to the EDM distance and it turned out that the "official" AFL dimensions
were incorrect. After this "measure-off", observations (bearings and distances) were made to
seventeen points around the edge of the playing surface to determine the Least Squares
Ellipse of Best Fit and to see if the major axis of this ellipse was the actual line between the
goals at either end. The Total Station was set-up close to the goal-to-goal axis and 20-25
metres from the centre of the ground. An arbitrary X,Y coordinate system was used with the
origin at the Total Station and the positive X-axis in the direction of the Brunton Avenue end
of the Great Southern Stand (approximately west). The table of coordinates is given below;
point numbers 1 to 6 were not points on the edge of the ground.
To develop an observation equation for the Least Squares Ellipse of Best Fit and to determine
the lengths and directions of the axes of the ellipse the following derivation of the general
equation of an ellipse is necessary.
Figure 2.5 shows an ellipse whose axes are aligned with the u-v axes. The semi-axes lengths
are a and b ( a > b) , the centre of the ellipse is at X 0 , Y0 and the ellipse axes are rotated by an
angle β , measured positive anti-clockwise from the x-axis. The x-y axes are parallel to the X-
Y axes and pass through the centre of the ellipse.
Figure 2.5  Ellipse with centre at X₀, Y₀, semi-axes a and b, and u,v axes rotated by the angle β from the x,y axes (which are parallel to the X,Y axes)
In the u,v coordinate system the equation of the ellipse is

\frac{u^2}{a^2} + \frac{v^2}{b^2} = 1    (2.49)

which can be written in matrix form as

\begin{bmatrix} u & v \end{bmatrix} \begin{bmatrix} 1/a^2 & 0 \\ 0 & 1/b^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = 1    (2.50)
The u,v axes are rotated (positive anti-clockwise) by an angle β from the x,y axes and the
relationship between coordinates is shown in Figure 2.6
Figure 2.6  Rotation of the x,y axes by the angle β to give the u,v axes

u = x\cos\beta + y\sin\beta
v = -x\sin\beta + y\cos\beta    (2.51)
Replacing cos β and sin β with the letters c and s the coordinate relationships can be
represented as a matrix equation
\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}    (2.52)

Transposing this equation (remembering the reversal rule with the transpose of matrix products) gives

\begin{bmatrix} u & v \end{bmatrix} = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} c & -s \\ s & c \end{bmatrix}    (2.53)
Substituting (2.52) and (2.53) into (2.50) and multiplying the matrices gives
\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} c & -s \\ s & c \end{bmatrix} \begin{bmatrix} 1/a^2 & 0 \\ 0 & 1/b^2 \end{bmatrix} \begin{bmatrix} c & s \\ -s & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 1

\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} \dfrac{c^2}{a^2} + \dfrac{s^2}{b^2} & \dfrac{cs}{a^2} - \dfrac{cs}{b^2} \\[1ex] \dfrac{cs}{a^2} - \dfrac{cs}{b^2} & \dfrac{s^2}{a^2} + \dfrac{c^2}{b^2} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 1

Replacing the elements of the square matrix with the symbols A, B and H, and noting that the top-right and lower-left elements are the same, this equation may be written in a general form as

\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} A & H \\ H & B \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 1

or

A x^2 + 2Hxy + By^2 = 1    (2.54)
Equation (2.54) is the equation of an ellipse centred at the coordinate origin but with axes
rotated from the x,y axes. The semi axes lengths a and b, and the rotation angle β can be
determined from (2.54) by the following method.
Letting x = r cos θ and y = r sin θ in equation (2.54) gives the polar equation of the ellipse
A\cos^2\theta + 2H\cos\theta\sin\theta + B\sin^2\theta = \frac{1}{r^2}    (2.55)
r is the radial distance from the centre of the ellipse and θ is the angle measured positive
anti-clockwise from the x-axis. Equation (2.55) has maximum and minimum values defining
the lengths and directions of the axes of the ellipse. To determine these values from (2.55),
consider the following
Let

\frac{1}{r^2} = f = A\cos^2\theta + 2H\cos\theta\sin\theta + B\sin^2\theta = A\cos^2\theta + H\sin 2\theta + B\sin^2\theta
and aim to find the optimal (maximum and minimum) values of f and the values of θ when
these occur by investigating the first and second derivatives f ′ and f ′′ respectively, i.e.,
f' = (B - A)\sin 2\theta + 2H\cos 2\theta
f'' = 2(B - A)\cos 2\theta - 4H\sin 2\theta    (2.56)

Now the maximum or minimum value of f occurs when f' = 0 and from the first member of (2.56) the value of \theta is given by

\tan 2\theta = \frac{2H}{A - B}    (2.57)
But this value of θ could relate to either a maximum or a minimum value of f. So from the
second member of equations (2.56) with a value of 2θ from equation (2.57) this ambiguity
can be resolved by determining the sign of the second derivative f ′′ giving
f = f_{\max} \text{ when } f'' < 0, \qquad f = f_{\min} \text{ when } f'' > 0
In the polar equation of the ellipse given by equation (2.55) f max coincides with rmin and f min
coincides with rmax so the angle β (measured positive anti-clockwise) from the x-axis to the
major axis of the ellipse (see Figure 2.5) is found from
\beta = \theta \text{ when } f'' > 0 \;\; (r = r_{\max} \text{ in the direction } \theta), \qquad \beta = \theta - \tfrac{1}{2}\pi \text{ when } f'' < 0 \;\; (r = r_{\min} \text{ in the direction } \theta)    (2.58)
These results can be verified by considering the definitions of A, B and H used in the
derivation of the polar equation of the ellipse, i.e.,
A - B = \left(\frac{1}{a^2} - \frac{1}{b^2}\right)\cos 2\beta \quad\text{and}\quad 2H = \left(\frac{1}{a^2} - \frac{1}{b^2}\right)\sin 2\beta

giving

\tan 2\beta = \frac{2H}{A - B}
Noting that the values of \theta coinciding with the maximum or minimum values of the function f are found from equation (2.57), then

\tan 2\beta = \frac{2H}{A - B} = \tan 2\theta

whereupon

2\theta = 2\beta + n\pi \quad\text{or}\quad \theta = \beta + \tfrac{1}{2}n\pi \quad\text{where } n \text{ is an integer}
Substituting these values of \theta into the second derivative

f'' = 2(B - A)\cos 2\theta - 4H\sin 2\theta

Now, for n = 0

\theta = \beta, \quad f''\big|_{\theta=\beta} = -2\left(\frac{1}{a^2} - \frac{1}{b^2}\right) \quad\text{and since } a > b, \quad f''\big|_{\theta=\beta} > 0

so f is a minimum and

f_{\min} = \frac{\left(\cos^2\beta + \sin^2\beta\right)^2}{a^2} = \frac{1}{a^2}

So r_{\max} = a.
When n = 1

\theta = \beta + \tfrac{1}{2}\pi, \quad \sin 2\theta = -\sin 2\beta, \quad \cos 2\theta = -\cos 2\beta \quad\text{and so}

f''\big|_{\theta=\beta+\frac{1}{2}\pi} = 2\left(\frac{1}{a^2} - \frac{1}{b^2}\right) \quad\text{and since } a > b, \quad f''\big|_{\theta=\beta+\frac{1}{2}\pi} < 0

so f is a maximum and

f_{\max} = \frac{\left(\sin^2\beta + \cos^2\beta\right)^2}{b^2} = \frac{1}{b^2}

So r_{\min} = b.
When n = 2

\theta = \beta + \pi, \quad \sin 2\theta = \sin 2\beta, \quad \cos 2\theta = \cos 2\beta \quad\text{and}\quad f''\big|_{\theta=\beta+\pi} > 0

So \theta = \beta + \pi makes f_{\min} = \dfrac{1}{a^2} and r_{\max} = a.

When n = 3

\theta = \beta + \tfrac{3}{2}\pi, \quad \sin 2\theta = -\sin 2\beta, \quad \cos 2\theta = -\cos 2\beta \quad\text{and}\quad f''\big|_{\theta=\beta+\frac{3}{2}\pi} < 0

So \theta = \beta + \tfrac{3}{2}\pi makes f_{\max} = \dfrac{1}{b^2} and r_{\min} = b.
All other even values of n give the same result as n = 2 and all other odd values of n give the
same result as n = 1
Now consider Figure 2.5 and the general Cartesian equation of the ellipse, re-stated as

aX^2 + 2hXY + bY^2 + dX + eY = 1    (2.59)

where the translated x,y coordinate system is related to the X,Y system by X = x + X_0 and Y = y + Y_0. Substituting into (2.59) gives

a(x + X_0)^2 + 2h(x + X_0)(y + Y_0) + b(y + Y_0)^2 + d(x + X_0) + e(y + Y_0) = 1
Now when the coefficients of x and y are zero the ellipse will be centred at the origin of the x,y axes with an equation of the form

ax^2 + 2hxy + by^2 = c    (2.60)

where

c = 1 - \left(aX_0^2 + 2hX_0Y_0 + bY_0^2 + dX_0 + eY_0\right)    (2.61)

The coefficients of x and y are zero when

2aX_0 + 2hY_0 + d = 0
2hX_0 + 2bY_0 + e = 0    (2.62)

Equations (2.62) can be written in matrix form and solved (using the inverse of a 2,2 matrix) to give X_0 and Y_0

\begin{bmatrix} d \\ e \end{bmatrix} = \begin{bmatrix} -2a & -2h \\ -2h & -2b \end{bmatrix} \begin{bmatrix} X_0 \\ Y_0 \end{bmatrix}

\begin{bmatrix} X_0 \\ Y_0 \end{bmatrix} = \frac{1}{2ab - 2h^2} \begin{bmatrix} -b & h \\ h & -a \end{bmatrix} \begin{bmatrix} d \\ e \end{bmatrix}

giving

X_0 = \frac{eh - bd}{2(ab - h^2)} \quad\text{and}\quad Y_0 = \frac{dh - ae}{2(ab - h^2)}    (2.63)
Dividing both sides of (2.60) by the constant c gives

Ax^2 + 2Hxy + By^2 = 1    (2.64)

where

A = \frac{a}{c}, \quad H = \frac{h}{c}, \quad B = \frac{b}{c}
Equation (2.64), identical to equation (2.54), is the equation of an ellipse centred at the x,y
coordinate origin whose axes are rotated from the x,y axes by an angle β . The rotation angle
β and semi-axes lengths a and b of the ellipse can be determined using the method set out
above and equations (2.58), (2.57), (2.56) and (2.55). Thus, we can see from the development
that the general Cartesian equation of an ellipse is given by
aX^2 + 2hXY + bY^2 + dX + eY = 1    (2.65)
Note that the coefficients a and b in this equation are not the semi-axes lengths of the ellipse.
Returning to the problem of the Least Squares Ellipse of Best Fit for the MCG, the size, shape, location and orientation of this ellipse can be determined from a set of observation equations of the form

v_k + aX_k^2 + 2hX_kY_k + bY_k^2 + dX_k + eY_k = 1    (2.66)

This observation equation is the general Cartesian equation of an ellipse with the addition of the residual v_k. The addition of v_k to the left-hand-side of (2.65) is simply a convenience and reflects the fact that the measured coordinates X_k, Y_k are inconsistent with the mathematical model. For each of the 17 measured points around the perimeter of the MCG an equation can be written and arranged in the matrix form \mathbf{v} + \mathbf{B}\mathbf{x} = \mathbf{f}
\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_{17} \end{bmatrix} + \begin{bmatrix} X_1^2 & X_1Y_1 & Y_1^2 & X_1 & Y_1 \\ X_2^2 & X_2Y_2 & Y_2^2 & X_2 & Y_2 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ X_{17}^2 & X_{17}Y_{17} & Y_{17}^2 & X_{17} & Y_{17} \end{bmatrix} \begin{bmatrix} a \\ 2h \\ b \\ d \\ e \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
The vector \mathbf{x} contains the parameters a, h, b, d and e of the general equation of the ellipse and, with a weight matrix \mathbf{W} = \mathbf{I} (i.e., all observations of equal precision), the solution for \mathbf{x} is given by the following sequence of operations: \mathbf{N} = \mathbf{B}^T\mathbf{W}\mathbf{B}, \mathbf{t} = \mathbf{B}^T\mathbf{W}\mathbf{f}, \mathbf{x} = \mathbf{N}^{-1}\mathbf{t}.
This is the identical series of operations to solve for the parameters of the Line of Best Fit,
and for the Parabola of Best Fit except in this case the residuals v have little practical meaning
because they are not connected to quantities such as distances or coordinates. In the case of
the Least Squares Ellipse of Best Fit, it is better to compute the offsets h (perpendicular
distances) from the ellipse to the data points rather than the residuals v.
(i) compute the parameters a, h, b, d and e using the Least Squares process set out
above.
(ii) compute the coordinates of the origin X 0 , Y0 and the constant c using equations
(2.63) and (2.61).
(iii) compute the coefficients A, H and B of the ellipse given by (2.64), which can then be used to compute the rotation angle β and the semi-axes lengths a and b from equations (2.55) to (2.58) (a minimal sketch of this step is given after this list), and
(iv) compute the u,v coordinates of the data points using equations (2.51).
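Step (iii) can be coded compactly. The following MATLAB function is a minimal sketch under the assumption that A, H and B of equation (2.64) have already been computed; it is not the author's program.

function [a,b,beta] = ellipse_axes(A,H,B)
% semi-axes lengths and rotation angle from equations (2.55) to (2.58)
theta = 0.5*atan2(2*H, A - B);                          % equation (2.57)
f = @(t) A*cos(t)^2 + 2*H*cos(t)*sin(t) + B*sin(t)^2;   % polar equation (2.55)
f1 = f(theta);
f2 = f(theta - pi/2);
a  = 1/sqrt(min(f1,f2));       % r_max = a occurs where f is a minimum
b  = 1/sqrt(max(f1,f2));       % r_min = b occurs where f is a maximum
if f1 < f2                     % theta points along the major axis
    beta = theta;
else                           % otherwise the major axis is perpendicular
    beta = theta - pi/2;
end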
Now, having the u,v coordinates, the offsets h can be computed. Consider the sectional view
of a quadrant of an ellipse in Figure 2.7. The u,v axes are in the direction of the major and
minor axes respectively (a and b are the semi-axes lengths) and P is a point related to the
ellipse by the normal, which makes an angle φ with the major axis, and the distance h = QP
along the normal. The u,v coordinates of P are the distances LP and MP respectively. From
the geometry of an ellipse, the normal intersects the minor axis at H and the distance QH = ν
(where ν is the Greek symbol nu) and the distances DH and OH are ν e 2 and ν e 2 sin φ
respectively. e is the eccentricity of the ellipse and the eccentricity and flattening f of an
ellipse are related to the semi-axes a and b.
Figure 2.7  Quadrant of an ellipse (semi-axes a and b) showing the point P with u,v coordinates, the normal through P making the angle φ with the major axis, and the offset h = QP along the normal
f = \frac{a - b}{a}

e^2 = f(2 - f)

\nu = \frac{a}{\sqrt{1 - e^2\sin^2\phi}}
Using these relationships, the angle φ and perpendicular distance h are given by
\tan\phi = \frac{v + \nu e^2 \sin\phi}{u}    (2.67)

h = \frac{u}{\cos\phi} - \nu    (2.68)
Inspecting these equations; if the semi-axes a,b and the u,v coordinates of P are known, the
perpendicular offset h can be determined from (2.68) and (2.67). It should be noted that
functions of φ appear on both sides of the equals sign of equation (2.67) and φ must be
solved by iteration.
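The iteration of equations (2.67) and (2.68) is straightforward; the following MATLAB function is a minimal sketch (not the author's code) of how the offset h could be computed from the u,v coordinates of a point and the semi-axes a and b.

function h = ellipse_offset(u,v,a,b)
% perpendicular offset from the ellipse to the point (u,v), equations (2.67), (2.68)
f   = (a - b)/a;                          % flattening
e2  = f*(2 - f);                          % eccentricity squared
phi = atan2(v,u);                         % starting value for phi
for k = 1:20                              % simple fixed-point iteration
    nu  = a/sqrt(1 - e2*sin(phi)^2);
    phi = atan2(v + nu*e2*sin(phi), u);   % equation (2.67)
end
h = u/cos(phi) - nu;                      % equation (2.68)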
To determine the Least Squares Ellipse of Best Fit for the playing surface of the MCG, a MATLAB program best_fit_ellipse.m, operating in the same way as the MATLAB programs best_fit_line and best_fit_parabola, was run with a data file (in this example: MCG_ellipse_data.dat), giving the following results (contained in an output file having the same name and path as the data file but with the extension .out) and a plot on the screen.
Input Data
point x(k) y(k) weight w(k)
7 -54.5800 17.1100 1.0000
8 -45.4700 36.5600 1.0000
9 -28.4000 53.2200 1.0000
10 -2.0200 63.7200 1.0000
11 28.1200 63.4400 1.0000
12 57.4900 52.5500 1.0000
13 80.8500 34.2000 1.0000
14 98.0800 9.1400 1.0000
15 105.6900 -17.3000 1.0000
16 103.8300 -46.9600 1.0000
17 88.4200 -71.5000 1.0000
18 61.2600 -86.8400 1.0000
19 26.4700 -91.0700 1.0000
20 -6.5900 -81.3700 1.0000
21 -34.5500 -59.2400 1.0000
22 -51.5100 -29.2800 1.0000
23 -56.3000 -2.3100 1.0000
Ellipse parameters
semi-major axis a = 86.017
semi-minor axis b = 73.544
Figure 2.8 Plot of Least Squares Best Fit Ellipse and data points for the MCG
The MATLAB program best_fit_ellipse.m calls two other MATLAB functions ellipse.m (a
function to compute the coordinates of points on an ellipse) and DMS.m (a function to convert
decimal degree to degrees, minutes and seconds). A copy of these programs is shown below.
function best_fit_ellipse
%
% BEST_FIT_ELLIPSE reads an ASCII textfile containing point numbers of
% coordinate pairs (X,Y) and weights (W) associated with each pair and
% computes the coordinates of the origin, the lengths of the axes and
% the rotation angle of the Best Fit Ellipse using the least squares
% principle. Results are written to a textfile having the same path
% and name as the data file but with the extension ".out"
%============================================================================
% Function: best_fit_ellipse
%
% Author:
% Rod Deakin,
% School of Mathematical and Geospatial Sciences, RMIT University,
% GPO Box 2476V, MELBOURNE VIC 3001
% AUSTRALIA
% email: [email protected]
%
% Date:
% Version 1.0 22 March 2003
% Version 1.1 10 May 2003
% Version 1.2 9 November 2005
%
% Functions Required:
% [X,Y] = ellipse(a,b,theta)
% [D,M,S] = DMS(DecDeg)
%
% Remarks:
% The general equation of an ellipse is
% aXX + 2hXY + bYY + dX + eY = 1
% This function computes the parameters a,h,b,d,e of a Least Squares Best Fit
% Ellipse given a set of X,Y coordinate pairs and weights (w) associated with
% each pair. The centre of the best fit ellipse is at Xo = (eh-db)/(2ab-2hh)
% and Yo = (dh-ea)/(2ab-2hh).
% A constant c = 1 - (aXoXo + 2hXoYo + bYoYo + dXo + eYo) is divided into a,
% h and b giving A = a/c, H = h/c and B = b/c which are the parameters of an
% ellipse Axx +2Hxy + Byy = 1. The major axis of this ellipse is rotated
% from the coordinate axes by an angle beta which can be determined from the
% polar equation of an ellipse
% A*cos_squared(theta) + 2H*cos(theta)*sin(theta) + B*sin_squared(theta) = 1/r_squared
% The maximum and minimum values of this function occur for theta given by
% tan(2*theta) = 2H/(A-B) and the angle beta is determined by evaluating
% the sign of the second derivative of the polar equation. This angle is
% substituted into the polar equation to determine the length of the semi-major
% axis length a. Beta - 90 degrees will give the length of the semi-minor
% axis length b
% Note that the semi-axes lengths a,b are not the same as the parameters
% a and b in the general equation of the ellipse.
% Results are written to a textfile having the same path and name as the
% data file but with the extension ".out"
%
% References:
% Notes on Least Squares (2005), Geospatial Science, RMIT
% University, 2005
%
% Arrays:
% B - coeff matrix of observation equation v + Bx = f
% f - vector of numeric terms
% N - coefficient matrix of Normal equations Nx = t
% Ninv - inverse of N
% p - vector of perpendicular distances from ellipse to points
% point - vector of point numbers
% t - vector of numeric terms of Normal equations Nx = t
% u,v - vectors of u,v coords of ellipse
% W - weight matrix
% weight - vector of weights
% x - vector of solutions
% x,y - vectors of x,y coords of ellipse
% x_coord - vector of X coordinates
% y_coord - vector of Y coordinates
% xpt,ypt - vectors of coords for point number locations on plot
% Xpt,Ypt - vectors Xpt = xpt + Xc, Ypt = ypt + Yc
%
%
% Variables:
% A,B,H - parameters of ellipse Axx + 2Hxy + Byy = 1
% a,h,b, - parameters of ellipse aXX + 2hXY + bYY + dX + eY = 1
% d,e
% a1,b1 - semi-major and semi-minor axes of ellipse
% beta - angle between x-axis and major axis of ellipse (degrees)
% brg - bearing of major axis (u-axis) of ellipse (degrees)
% c - constant of translated ellipse or cos(x)
% d2r - degree to radian conversion factor = 180/pi = 57.29577951...
% e2 - eccentricity squared
% flat - flattening of ellipse
% f_dd - second derivative of the function "f" where f is the polar
% equation of an ellipse
% lat - latitude (radians) of point related to an ellipse
% n - number of equations
% new_lat - new latitude in iteration
% nu - radius of curvature in prime meridian
% pion2 - 90 degrees or pi/2
% u - number of unknowns
% s - sin(x)
% s1,s2 - sin(lat) and sin_squared(lat)
% scale - scale factor to reduce size of numbers in normal equations
% theta - angle for which polar equation of ellipse gives max/min
% values
% two_theta - 2*theta
% Xc,Yc - coords of centre of ellipse X = x + Xc, Y = y + Yc
% X0,Y0 - scaled coords of centre of ellipse
%
%============================================================================
%
% Set program constants
d2r = 180/pi;
pion2 = pi/2;
scale = 100;
%-------------------------------------------------------------------------
% 1. Call the User Interface (UI) to choose the input data file name
% 2. Concatenate strings to give the path and file name of the input file
% 3. Strip off the extension from the file name to give the rootName
% 4. Add extension ".out" to rootName to give the output filename
% 5. Concatenate strings to give the path and file name of the output file
%-------------------------------------------------------------------------
filepath = strcat('c:\temp\','*.dat');
[infilename,inpathname] = uigetfile(filepath);
infilepath = strcat(inpathname,infilename);
rootName = strtok(infilename,'.');
outfilename = strcat(rootName,'.out');
outfilepath = strcat(inpathname,outfilename);
%----------------------------------------------------------
% 1. Load the data into an array whose name is the rootName
% 2. set fileTemp = rootName
% 3. Copy columns of data into individual arrays
%----------------------------------------------------------
load(infilepath);
fileTemp = eval(rootName);
point = fileTemp(:,1);
x_coord = fileTemp(:,2);
y_coord = fileTemp(:,3);
weight = fileTemp(:,4);
%-----------------------------------------------------------------------
% Compute perpendicular distances from points to the ellipse of best fit
%-----------------------------------------------------------------------
% Create a set of u,v coordinates by first reducing the X,Y coords
% to x,y coordinates and then rotating these coordinates by the
% rotation angle beta. The u-axis is the major axis of the ellipse.
x = x_coord-Xc;
y = y_coord-Yc;
for k=1:n
u(k,1) = x(k)*cos(beta/d2r) + y(k)*sin(beta/d2r);
v(k,1) = -x(k)*sin(beta/d2r) + y(k)*cos(beta/d2r);
end
%----------------------------------------------------
% Compute the coordinate locations for a point number
% to be shown on the plot. These locations used in
% in the plot routines below.
%----------------------------------------------------
for k=1:n
theta = atan2(x(k),y(k));
if theta<0
theta = theta + 2*pi;
end
r = sqrt(x(k)^2 + y(k)^2)-10;
xpt(k) = r*sin(theta);
ypt(k) = r*cos(theta);
end
Xpt = xpt + Xc;
Ypt = ypt + Yc;
%-----------------------------
% print the data to the screen
%-----------------------------
fprintf('\n Ellipse of Best Fit\n');
fprintf('\n General Equation of Ellipse with X,Y origin not at centre of ellipse');
fprintf('\n aXX + 2hXY + bYY + dX + eY = 1');
fprintf('\n a = %14.6e',a/scale^2);
fprintf('\n h = %14.6e',h/scale^2);
fprintf('\n b = %14.6e',b/scale^2);
fprintf('\n d = %14.6e',d/scale^2);
fprintf('\n e = %14.6e\n',e/scale^2);
fprintf('\n Equation of Ellipse with x,y origin at centre of ellipse');
fprintf('\n Axx + 2Hxy + Byy = 1');
fprintf('\n A = %14.6e',A/scale^2);
fprintf('\n H = %14.6e',H/scale^2);
fprintf('\n B = %14.6e\n',B/scale^2);
fprintf('\n Ellipse parameters');
fprintf('\n semi-major axis a = %8.3f',a1);
fprintf('\n semi-minor axis b = %8.3f\n',b1);
fprintf('\n Bearing of major axis');
fprintf('\n beta(degrees) = %12.6f',beta);
[D,M,S] = DMS(beta);
fprintf('\n beta(DMS) = %4d %2d %5.2f',D,M,S);
fprintf('\n Brg(degrees) = %12.6f',brg);
[D,M,S] = DMS(brg);
fprintf('\n Brg(DMS) = %4d %2d %5.2f\n',D,M,S);
fprintf('\n Coordinates of centre of ellipse');
fprintf('\n X(centre) = %12.3f',Xc);
fprintf('\n Y(centre) = %12.3f\n',Yc);
fprintf('\n Data and offsets to ellipse of best fit');
fprintf('\n  pt   offset        X          Y          x          y          u          v');
for k=1:n
fprintf('\n %3d %10.3f %10.3f %10.3f %10.3f %10.3f %10.3f %10.3f',point(k),p(k,1),x_coord(k),y_coord(k),x(k),y(k),u(k,1),v(k,1));
end
fprintf('\n\n');
%----------------------------------
% print the data to the output file
%----------------------------------
fprintf(fidout,'\n\nInput Data');
fprintf(fidout,'\n point x(k) y(k) weight w(k)');
for k = 1:n
fprintf(fidout,'\n%3d %12.4f %12.4f %12.4f',point(k),x_coord(k),y_coord(k),weight(k));
end
fprintf(fidout,'\n\n');
%-------------------------------------------------------------------
% Call function 'ellipse' with parameters a,b,theta and receive back
% X,Y coordinates whose origin is at the centre of the ellipse
%-------------------------------------------------------------------
[X,Y] = ellipse(a1,b1,beta);
X = X + Xc;
Y = Y + Yc;
%-------------------------------------------------------------------
% Set the X,Y coordinates of the major and minor axes of the ellipse
%-------------------------------------------------------------------
aX = [X(180) X(360)];
aY = [Y(180) Y(360)];
bX = [X(90) X(270)];
bY = [Y(90) Y(270)];
%-------------------------------------------------
% plot the ellipse of Best Fit and the data points
%-------------------------------------------------
figure(1);
clf(1);
plot(X,Y,'r-',aX,aY,'b-',bX,bY,'b-');
hold on;
plot(x_coord,y_coord,'k.');
axis equal;
box off;
function [X,Y] = ellipse(a,b,theta)
% Function ELLIPSE computes the X,Y coordinates of 360 points on an ellipse
% with semi-major axis a, semi-minor axis b and rotation angle theta (degrees).
d2r = pi/180;   % degrees to radians
for k=1:360
u = a*cos(k*d2r);
v = b*sin(k*d2r);
X(k) = u*cos(theta*d2r) - v*sin(theta*d2r);
Y(k) = u*sin(theta*d2r) + v*cos(theta*d2r);
end
return
function [D,M,S] = DMS(DecDeg)
% Function DMS converts decimal degrees to degrees, minutes and seconds.
val = abs(DecDeg);
D = fix(val);
M = fix((val-D)*60);
S = (val-D-M/60)*3600;
if(DecDeg<0)
D = -D;
end
return
3. PROPAGATION OF VARIANCES
In least squares problems, where measurements (with associated estimates of variances and
covariances) are used to determine the best estimates of unknown quantities, it is important to be
able to determine the precisions of these estimated (or calculated) quantities. To do this
requires an understanding of propagation of variances so that certain rules and techniques can
be developed.
Students studying Least Squares must become familiar with statistical definitions,
terminology and rules. Some of these rules and definitions have been introduced in earlier
sections of these notes, e.g., in Chapter 2 the definition and classification of measurements
and measurement errors was discussed as well as the rules for computing means and variances
for finite and infinite populations. In addition, Chapter 2 contains sections explaining matrix
representations of variances and covariances, known as variance-covariance matrices Σ and
the related cofactor matrices Q and weight matrices W. The following sections in this chapter
repeat some of the rules and definitions already introduced as well as expanding on some
concepts previously mentioned.
The term statistical experiment can be used to describe any process by which several chance
observations are obtained. All possible outcomes of an experiment comprise a set called the
sample space and a set or sample space contains N elements or members. An event is a subset
of the sample space containing n elements. Experiments, sets, sample spaces and events are
the fundamental "tools" used to determine the probability of certain events where probability
is defined as
    P(Event) = n/N                                                        (3.1)
For example, if a card is drawn from a deck of playing cards, what is the probability that it is
a heart? In this case, the experiment is the drawing of the card and the possible outcomes of
the experiment could be one of 52 different cards, i.e., the sample space is the set of N = 52
possible outcomes and the event is the subset containing n = 13 hearts. The probability of
drawing a heart is
    P(Heart) = n/N = 13/52 = 0.25
Suppose observations are made on a series of occasions (often termed trials) and
during these trials it is noted whether or not a certain event occurs. The event can
be almost any observable phenomenon, for example, that the height of a person
walking through a doorway is greater than 1.8 metres, that a family leaving a
cinema contains three children, that a defective item is selected from an assembly
line, and so on. These trials could be conducted twice a week for a month, three
times a day for six months or every hour for every day for 10 years. In the
theoretical limit, the number of trials N would approach infinity and we could
assume, at this point, that we had noted every possible outcome. Therefore, as
N → ∞ then N becomes the number of elements in the sample space containing
all possible outcomes of the trials. Now for each trial we note whether or not a
certain event occurs, so that at the end of N trials we have noted nN events. The
probability of the event (if it in fact occurs) can then be defined as
    P(Event) = lim (n_N / N)   as   N → ∞

Since n_N and N are both non-negative numbers and n_N is not greater than N then

    0 ≤ n_N / N ≤ 1

Hence

    0 ≤ P(Event) ≤ 1
If the event occurs at every trial then n_N = N and n_N/N = 1 for all N and so P(Event) = 1; if the event never occurs then n_N = 0 and P(Event) = 0.
The converse of these two relationships need not hold, i.e., a probability of one need not imply certainty since it is possible that lim n_N/N = 1 as N → ∞ without n_N/N = 1 for all values of N, and a probability of zero need not imply impossibility since it is possible that lim n_N/N = 0 even though n_N > 0. Despite these qualifications, it is usual to regard a probability of one as denoting certainty and a probability of zero as denoting impossibility.
A random variable X is a rule or a function, which associates a real number with each point in
a sample space. As an example, consider the following experiment where two identical coins
are tossed; h denotes a head and t denotes a tail.
X ( hh ) = 2
X ( ht ) = 1
X ( th ) = 1
X ( tt ) = 0
In this example X is the random variable defined by the rule "the number of heads obtained".
The possible values (or real numbers) that X may take are 0, 1, 2. These possible values are
usually denoted by x and the notation X = x denotes x as a possible real value of the random
variable X.
Random variables may be discrete or continuous. A discrete random variable assumes each
of its possible values with a certain probability. For example, in the experiment above; the
tossing of two coins, the sample space S = {hh, ht , th, tt} has N = 4 elements and the
probability the random variable X (the number of heads) assumes the possible values 0, 1 and
2 is given by
    x            0      1      2
    P(X = x)    1/4    2/4    1/4
Note that the values of x exhaust all possible cases and hence the probabilities add to 1
A continuous random variable has a probability of zero of assuming any of its values and
consequently, its probability distribution cannot be given in tabular form. The concept of the
probability of a continuous random variable assuming a particular value equals zero may
seem strange, but the following example illustrates the point. Consider a random variable
whose values are the heights of all people over 21 years of age. Between any two values, say
1.75 metres and 1.85 metres, there are an infinite number of heights, one of which is 1.80
metres. The probability of selecting a person at random exactly 1.80 metres tall and not one
of the infinitely large set of heights so close to 1.80 metres that you cannot humanly measure
the difference is extremely remote, and thus we assign a probability of zero to the event. It
follows that probabilities of continuous random variables are defined by specifying an interval
within which the random variable lies and it does not matter whether an end-point is included
in the interval or not.
Probabilities of random variables are described by functions of the random variable X, which takes the numerical values x within the function. Such functions are known as probability distribution functions and they are paired; i.e., f_X(x) pairs with F_X(x), g_X(x) pairs with G_X(x), etc. The functions with the lowercase letters are probability density functions and those with uppercase letters are cumulative distribution functions.
For discrete random variables, the probability density function has the properties

    1.  f_X(x_k) = P(X = x_k)
    2.  Σ_{k=1}^{∞} f_X(x_k) = 1

and the cumulative distribution function has the properties

    1.  F_X(x_k) = P(X ≤ x_k)
    2.  F_X(x) = Σ_{x_k ≤ x} f_X(x_k)
As a further example, consider the experiment of tossing a pair of dice and noting the two numbers shown.

Sample space:  S = { (1,1), (1,2), … , (1,6), (2,1), … , (2,6), … , (6,1), … , (6,6) },  a set of N = 36 ordered pairs
Random Variable: X, the total of the two numbers
The probability the random variable X assumes the possible values x = 2, 3, 4, …, 12 is given
in Table 3.1
    x           2     3     4     5     6     7     8     9    10    11    12
    P(X = x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Note that the values of x exhaust all possible cases and hence the probabilities add to 1
This distribution can also be expressed by the probability density function

    f_X(x) = (6 − |x − 7|)/36,   x = 2, 3, 4, … , 12
Probability distributions are often shown in graphical form. For discrete random variables,
probability distributions are generally shown in the form of histograms consisting of series of
rectangles associated with values of the random variable. The width of each rectangle is one
unit and the height is the probability given by the function f X ( x ) and the sum of the areas of
all the rectangles is 1. Figure 3.1 shows the Probability histogram for the random variable X,
the sum of the numbers when a pair of dice is tossed.
Figure 3.1  Probability histogram for the random variable X
The MATLAB function dice_pdf.m was used to create the Probability histogram of Figure 3.1
function dice_pdf
% Function DICE_PDF calculates the probability of a random variable
% taking the sum of the values when two dice are tossed and
% plots the probability density function as a histogram.
x  = 2:12;                  % possible values of the sum of two dice
fx = (6-abs(x-7))/36;       % probabilities f(x) = (6-|x-7|)/36
bar(x,fx,1);                % histogram of unit-width rectangles
xlabel('x'); ylabel('Probability f(x)');
Figure 3.2  Cumulative distribution function. [The dots at the left ends of the line segments indicate the value of F_X(x) at those values of x.]
For continuous random variables, probability distributions are curves, which may take various forms depending on the nature of the random variable. Probability density functions f_X(x) that are used in practice to model the behaviour of continuous random variables are always positive and the total area under the curve, bounded by the x-axis, is equal to one. These density functions have the following properties

    1.  f_X(x) ≥ 0 for all x
    2.  ∫_{-∞}^{+∞} f_X(x) dx = 1
The probability that a random variable X lies between any two values x = a and x = b is the
area under the density curve between those two values and is found by methods of integral
calculus
    P(a < X < b) = ∫_{a}^{b} f_X(x) dx                                    (3.2)
The equations of the density functions f X ( x ) are usually complicated and areas under their
curves are found from tables. In surveying, the Normal probability density function is the
usual model for the behaviour of measurements (regarded as random variables) and the
probability density function is (Kreyszig, 1970, p. 107)
    f_X(x) = 1/(σ√(2π)) · exp[ −(1/2)((x − μ)/σ)² ]                       (3.3)
μ and σ are the mean and standard deviation respectively of the infinite population of x and
Figure 3.3 shows a plot of the Normal probability density curve for μ = 2.0 and σ = 2.5 .
Figure 3.3 Normal probability density function for μ = 2.0 and σ = 2.5
The MATLAB function normal_pdf.m was used to create the Normal probability density
curve of Figure 3.3
function normal_pdf(mx,sx)
% Function NORMAL_PDF(MX,SX) calculates the probability of a random variable X
% having a NORMAL distribution with mean MX and standard deviation
% SX and plots the probability density function
x  = mx-4*sx:sx/50:mx+4*sx;            % range of values of x
a  = (x-mx)./sx;
fx = exp(-0.5*a.^2)./(sx*sqrt(2*pi));  % Normal probability density, equation (3.3)
plot(x,fx);
xlabel('x'); ylabel('f(x)');
For continuous random variables X, the cumulative distribution function FX ( x ) has the
following properties
    1.  F_X(x) = P(X ≤ x) = ∫_{-∞}^{x} f_X(x) dx
    2.  (d/dx) F_X(x) = f_X(x)
In surveying, the Normal distribution is the usual model for the behaviour of measurements
and the cumulative distribution function is (Kreyszig, 1970, p. 108)
    F_X(x) = 1/(σ√(2π)) ∫_{-∞}^{x} exp[ −(1/2)((x − μ)/σ)² ] dx           (3.4)

and the probability that X lies between any two values a and b is

    P(a < X < b) = F_X(b) − F_X(a) = 1/(σ√(2π)) ∫_{a}^{b} exp[ −(1/2)((x − μ)/σ)² ] dx     (3.5)
Figure 3.4 shows a plot of the Normal cumulative distribution curve for μ = 2.0 and σ = 2.5 .
Figure 3.4 Normal cumulative distribution function for μ = 2.0 and σ = 2.5
For several continuous random variables X_1, X_2, X_3 a joint probability density function f_X(x) is used to define a function whose integral gives the probability of X_1 lying in the range a_1 < X_1 < b_1, X_2 lying in the range a_2 < X_2 < b_2 and X_3 lying in the range a_3 < X_3 < b_3

    P(a_1 < X_1 < b_1, a_2 < X_2 < b_2, a_3 < X_3 < b_3) = ∫_{a_1}^{b_1} ∫_{a_2}^{b_2} ∫_{a_3}^{b_3} f_X(x_1, x_2, x_3) dx_3 dx_2 dx_1
Although least squares adjustment theory does not require the random variables
(measurements) to have particular probability distributions, the Normal distribution is the
usual model assumed to represent measurements and associated errors and corrections
(residuals). The Multivariate Normal distribution of random variables has a density function
of the following form (Mikhail, 1976, p. 27)
    f_X(x_1, x_2, … , x_n) = f_X(x) = 1/( (2π)^{n/2} |Σ|^{1/2} ) · exp{ −(1/2)(x − μ_x)^T Σ⁻¹ (x − μ_x) }     (3.7)
with mean vector μ x and variance-covariance matrix Σ . For the case of two random
variables X and Y the Bivariate Normal probability density function has the following form
    f_XY(x,y) = 1/( 2π √(σ_x²σ_y² − σ_xy²) ) × exp{ −[ σ_x²σ_y² / (2(σ_x²σ_y² − σ_xy²)) ] ×
                [ (x − μ_x)²/σ_x² − 2σ_xy (x − μ_x)(y − μ_y)/(σ_x²σ_y²) + (y − μ_y)²/σ_y² ] }     (3.8)
where μ_x, μ_y are the means, σ_x², σ_y² are the variances of the random variables X and Y, and σ_xy is the covariance.
Figure 3.5 shows a 3-dimensional plot of a Bivariate Normal probability density function with
μ x = 0.8, μ y = −0.2 , σ x = 1.5, σ y = 1.2 and σ xy = −0.5 over a range of possible values x, y of
the random variables X and Y.
Figure 3.5  Bivariate Normal probability density surface
The MATLAB function bivariate_normal.m was used to create the Bivariate Normal
probability density surface of Figure 3.5. The equation of the Bivariate Normal probability
density function f_XY(x,y) used in the function is a modified form of (3.8) where the correlation coefficient

    ρ_xy = σ_xy / (σ_x σ_y)                                               (3.9)

is used, giving

    f_XY(x,y) = 1/( 2π σ_x σ_y √(1 − ρ_xy²) ) × exp{ −[ 1 / (2(1 − ρ_xy²)) ] ×
                [ (x − μ_x)²/σ_x² − 2ρ_xy (x − μ_x)(y − μ_y)/(σ_x σ_y) + (y − μ_y)²/σ_y² ] }     (3.10)
function bivariate_normal(mx,my,sx,sy,sxy)
% Function BIVARIATE_NORMAL(MX,MY,SX,SY,SXY) calculates the bivariate
% normal density function f(x,y) of two random variables X and Y having
% NORMAL distributions with means MX, MY, standard deviations SX, SY and
% covariance SXY and plots the probability density surface.
r = sxy/(sx*sy);                                     % correlation coefficient, equation (3.9)
[x,y] = meshgrid(mx-4*sx:sx/10:mx+4*sx, my-4*sy:sy/10:my+4*sy);
a = (x-mx)./sx;
b = (y-my)./sy;
c = a.^2 - (a.*b).*(2*r) + b.^2;
f = exp(-c./(2*(1-r^2)))./(2*pi*sx*sy*sqrt(1-r^2));  % density function, equation (3.10)
surf(x,y,f);
xlabel('x'); ylabel('y'); zlabel('f(x,y)');
3.1.4. Expectations
The expectation (or expected value) of a random variable X, denoted E{X}, is the mean value of the random variable over all possible values. It is computed by taking the sum of all possible values of X = x multiplied by its corresponding probability. In the case of a discrete random variable the expectation is given by

    E{X} = μ_X = Σ_{k=1}^{N} x_k P(x_k)                                   (3.11)

Equation (3.11) is a general expression from which we can obtain the usual expression for the arithmetic mean

    μ = (1/N) Σ_{k=1}^{N} x_k                                             (3.12)
If there are N possible values x_k of the random variable X, each having equal probability P(x_k) = 1/N, then (3.11) reduces to (3.12).
This relationship may be extended to a more general form if we consider the expectation of a
function g ( X ) of a random variable X whose probability density function is f X ( x ) . In this
case
    E{g(X)} = ∫_{-∞}^{+∞} g(x) f_X(x) dx                                  (3.14)
Expressing (3.15) in matrix notation gives a general form of the expected value of a
multivariate function g ( X ) as
    E{g(X)} = ∫_{-∞}^{+∞} ∫_{-∞}^{+∞} ⋯ ∫_{-∞}^{+∞} g(x) f_X(x) dx        (3.16)
There are some rules that are useful in calculating expectations. They are given here without
proof but can be found in many statistical texts, e.g., Walpole, 1974. With a and b as
constants and X and Y as random variables
E {a} = a
E {aX } = a E { X }
E {aX + b} = a E { X } + b
E {g ( X ) ± h ( X )} = E {g ( X )} ± E {h ( X )}
E {g ( X , Y ) ± h ( X , Y )} = E {g ( X , Y )} ± E {h ( X , Y )}
For a vector of random variables X = [X_1  X_2  X_3  ⋯]^T the expectation is the vector of means

    m_X = [μ_X1  μ_X2  μ_X3  ⋯]^T = [E(X_1)  E(X_2)  E(X_3)  ⋯]^T = E{X}  (3.18)
The variance of a random variable X is defined as

    σ_X² = E{(X − μ_X)²} = ∫_{-∞}^{+∞} (x − μ_X)² f_X(x) dx               (3.19)

and the covariance of two random variables X and Y is
σ XY = E {( X − μ X )(Y − μY )}
= E { XY − X μY − Y μ X + μ X μY }
= E { XY } − E { X μY } − E {Y μ X } + E {μ X μY }
= E { XY } − μY E { X } − μ X E {Y } + μ X μY
= E { XY } − μY μ X − μ X μY + μ X μY
= E { XY } − μ X μY
If the random variables X and Y are independent, the expectation of the product is equal to the
product of the expectations, i.e., E { XY } = E { X } E {Y } . Since the expected values of X and Y
are the means μ X and μY then E { XY } = μ X μY if X and Y are independent. Substituting this
result into the expansion above shows that the covariance σ XY is zero if X and Y are
independent.
For a multivariate function, the variances and covariances of the vector of random variables X are given by the matrix equation

    Σ_XX = E{ [X − m_X][X − m_X]^T }                                      (3.21)
Σ XX is a symmetric matrix known as the variance-covariance matrix and its general form can
be seen when (3.21) is expanded
    Σ_XX = E{ [X_1 − μ_X1,  X_2 − μ_X2,  … , X_n − μ_Xn]^T [X_1 − μ_X1,  X_2 − μ_X2,  … , X_n − μ_Xn] }

giving

    Σ_XX = | σ²_X1     σ_X1X2   ⋯   σ_X1Xn |
           | σ_X2X1    σ²_X2    ⋯   σ_X2Xn |                              (3.22)
           |   ⋮         ⋮             ⋮   |
           | σ_XnX1    σ_XnX2   ⋯   σ²_Xn  |
y = Ax + b (3.23)
where A is a coefficient matrix and b is a vector of constants. Then, using the rules for
expectations developed above we may write an expression for the mean mY using (3.18)
mY = E {y}
= E {Ax + b}
= E {Ax} + E {b}
= A E {x} + b
= Am X + b
Similarly, the variance-covariance matrix of y is

    Σ_yy = E{ (y − m_y)(y − m_y)^T }
         = E{ (Ax + b − Am_x − b)(Ax + b − Am_x − b)^T }
         = E{ (Ax − Am_x)(Ax − Am_x)^T }
         = E{ A(x − m_x)(A(x − m_x))^T }
         = E{ A(x − m_x)(x − m_x)^T A^T }
         = A E{ (x − m_x)(x − m_x)^T } A^T
         = A Σ_xx A^T

or

    Σ_yy = A Σ_xx A^T                                                     (3.24)

This is the Law of Propagation of Variances for linear functions. Since cofactor matrices are simply scaled variance-covariance matrices, the same relationship holds for cofactors

    Q_yy = A Q_xx A^T                                                     (3.25)
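The matrix operations of (3.24) and (3.25) are easily carried out in MATLAB. The following short sketch is an illustration only (it is not one of the program listings of these notes) and the numeric values of A and Σ_xx are assumed for the purpose of the example.

% Law of Propagation of Variances for the linear function y = Ax + b (3.24).
% The values of A and Sxx below are assumed for illustration only.
A   = [1  2;
       3 -1];              % coefficient matrix
Sxx = [0.04  0.01;
       0.01  0.09];        % variance-covariance matrix of x
Syy = A*Sxx*A';            % variance-covariance matrix of y, equation (3.24)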
In many practical applications of variance propagation the random variables in x and y are
nonlinearly related, i.e.,
y = f (x) (3.26)
In such cases, we can expand the function on the right-hand-side of (3.26) using Taylor's
theorem.
For a non-linear function of a single variable Taylor's theorem may be expressed in the
following form
    f(x) = f(a) + (df/dx)|_a (x − a) + (d²f/dx²)|_a (x − a)²/2! + (d³f/dx³)|_a (x − a)³/3! + ⋯
           + (d^{n-1}f/dx^{n-1})|_a (x − a)^{n-1}/(n − 1)! + R_n          (3.27)

where R_n is the remainder after n terms and lim R_n = 0 as n → ∞ for f(x) about x = a, and (df/dx)|_a, (d²f/dx²)|_a, etc. are derivatives of the function f(x) evaluated at x = a.
For a non-linear function of two random variables, say φ = f(x, y), the Taylor series is

    φ = f(a, b) + (∂f/∂x)|_{a,b} (x − a) + (∂f/∂y)|_{a,b} (y − b)
        + (1/2!){ (∂²f/∂x²)|_{a,b} (x − a)² + (∂²f/∂y²)|_{a,b} (y − b)² + 2(∂²f/∂x∂y)|_{a,b} (x − a)(y − b) } + ⋯     (3.28)

where f(a, b) is the function φ evaluated at x = a and y = b, and (∂f/∂x)|_{a,b}, (∂f/∂y)|_{a,b}, (∂²f/∂x²)|_{a,b}, etc. are partial derivatives evaluated at x = a, y = b.
Extending to n random variables, we may write a Taylor series approximation of the function f(x) as a matrix equation

    f(x) = f(x₀) + (∂f/∂x)|_{x₀} (x − x₀) + higher order terms            (3.29)

where f(x₀) is the function evaluated at the approximate values x₀ and (∂f/∂x)|_{x₀} are the partial derivatives evaluated at x₀. Replacing f(x) in (3.26) by its Taylor series approximation, ignoring higher order terms, gives

    y = f(x) = f(x₀) + (∂f/∂x)|_{x₀} (x − x₀)                             (3.30)
    m_y = E{y}
        = E{ f(x₀) + (∂f/∂x)|_{x₀} (x − x₀) }
        = E{ f(x₀) } + E{ (∂f/∂x)|_{x₀} (x − x₀) }
        = f(x₀) + (∂f/∂x)|_{x₀} E{ x − x₀ }
        = f(x₀) + (∂f/∂x)|_{x₀} ( E{x} − E{x₀} )
        = f(x₀) + (∂f/∂x)|_{x₀} ( m_x − x₀ )

and

    y − m_y = [ f(x₀) + (∂f/∂x)|_{x₀} (x − x₀) ] − [ f(x₀) + (∂f/∂x)|_{x₀} (m_x − x₀) ]
            = (∂f/∂x)|_{x₀} (x − m_x)                                     (3.31)
            = J_yx (x − m_x)

where J_yx is the (m,n) Jacobian matrix of partial derivatives, noting that y and x are (m,1) and (n,1) vectors respectively.
    Σ_yy = E{ (y − m_y)(y − m_y)^T }
         = E{ (J_yx(x − m_x))(J_yx(x − m_x))^T }
         = E{ J_yx (x − m_x)(x − m_x)^T J_yx^T }
         = J_yx E{ (x − m_x)(x − m_x)^T } J_yx^T
         = J_yx Σ_xx J_yx^T

Thus, in a similar manner to above, we may express the Law of Propagation of Variances for non-linear functions of random variables as

    Σ_yy = J_yx Σ_xx J_yx^T                                               (3.33)

    Q_yy = J_yx Q_xx J_yx^T                                               (3.34)
For a function z = f(x, y) of two random variables x and y, the variance of z is

    σ_z² = (∂z/∂x)² σ_x² + (∂z/∂y)² σ_y² + 2(∂z/∂x)(∂z/∂y) σ_xy           (3.35)
Equation (3.35) can be derived from the general matrix equation (3.33) in the following manner. Let z = f(x, y) be written as y = f(x) where y = [z], a (1,1) matrix, and x = [x  y]^T is a (2,1) vector. The variance-covariance matrix of the random vector x is

    Σ_xx = | σ_x²   σ_xy |
           | σ_xy   σ_y² |

the Jacobian is J_yx = [∂z/∂x  ∂z/∂y], and the variance-covariance matrix Σ_yy, which contains the single element σ_z², is

    Σ_yy = [σ_z²] = [∂z/∂x  ∂z/∂y] · Σ_xx · [∂z/∂x  ∂z/∂y]^T

Expanding this matrix product gives (3.35).
In the case where the random variables in x are independent, i.e., their covariances are zero;
we have the Special Law of Propagation of Variances. For the case of z = f ( x, y ) where the
random variables x and y are independent, the Special Law of Propagation of Variances is
written as
    σ_z² = (∂z/∂x)² σ_x² + (∂z/∂y)² σ_y²                                  (3.36)
Propagation of variances, using either the Law of Propagation of Variances for linear functions, equations (3.24) or (3.25), and non-linear functions, equations (3.33) or (3.34), or the Special Law of Propagation of Variances, equation (3.36), where the variables are independent, is an important "tool" for assessing the precision of computed quantities arising from measurements. Implicit in every application of variance propagation is an a priori knowledge of the precision of the measurements. For example, if quantities are derived from Total Station EDM distances, then some knowledge of the precision of those distances is assumed; if height differences are computed from a combination of Total Station EDM distances and zenith angles, then the precisions of distances and zenith angles are assumed.
These a priori precision estimates may come from information supplied by the equipment
manufacturer, statistical analysis of observations, prior knowledge or simply educated
guesses. Whatever the source of knowledge, it is assumed that these precisions are known
before any variance propagation is made. The following sections set out some useful
techniques that are applicable to many surveying operations.
Total Stations are modern surveying instruments combining an electronic theodolite (for
angle measurement) and an EDM (for distance measurement to a reflecting prism). EDM is
an abbreviation of Electronic Distance Measurement. The primary Total Station
measurements are horizontal and vertical circle readings α and β respectively (from which
angles may be obtained) and slope distances D. Total Stations have "on board" computers
and may display (at the push of a button) computed quantities such as vertical height
differences ±V and horizontal distances H, between the instrument and the prism (the
sighting target).
A vertical circle reading β made with a Total Station is a clockwise angle from the zenith
(defined by the vertical axis of the Total Station) measured in a vertical plane. This vertical
plane is swept out by the telescope rotating about the Total Station's horizontal axis (the
trunnion axis). The horizontal distance H and the vertical component ±V of a measured slope distance D are

    H = D sin β
    V = D cos β
Treating H, V, D and β as random variables and assuming that D and β are independent the
Special Law of Propagation of Variances (3.36) can be used to compute the variances of H
and V
    σ_H² = (∂H/∂D)² σ_D² + (∂H/∂β)² σ_β² = sin²β (σ_D²) + D² cos²β (σ_β²)

    σ_V² = (∂V/∂D)² σ_D² + (∂V/∂β)² σ_β² = cos²β (σ_D²) + D² sin²β (σ_β²)
where σ_D², σ_β² are the variances of the slope distance and zenith angle respectively. For any properly calibrated Total Station EDM and prism combination, the standard deviation of a distance can be expressed in the form σ_D = x + y ppm (ppm is parts per million of the measured distance), and the standard deviation of a zenith angle σ_β can be estimated from the manufacturer's specifications. For the observation values used in this example, the expressions above give

    σ_H² = 6.85 × 10⁻⁵ m²
    σ_V² = 8.82 × 10⁻⁶ m²
It is interesting to note that the computed quantities H and V are not independent, even though
they have been computed from independent quantities. This can be seen by computing the
variances in the following manner. We may write the computation of the components H and
V from the observations D and β as the vector equation
y = f (x)
⎡H ⎤ ⎡D⎤
y = ⎢ ⎥ , x = ⎢ ⎥ and y and x are non-linearly related. Using (3.33) we may write
⎣V ⎦ ⎣β ⎦
    Σ_yy = | σ_H²    σ_HV |  =  J_yx Σ_xx J_yx^T
           | σ_HV    σ_V² |

         = | ∂H/∂D   ∂H/∂β | | σ_D²    0   | | ∂H/∂D   ∂V/∂D |
           | ∂V/∂D   ∂V/∂β | |  0     σ_β² | | ∂H/∂β   ∂V/∂β |

         = | sin β    D cos β | | σ_D²    0   | | sin β      cos β   |
           | cos β   −D sin β | |  0     σ_β² | | D cos β   −D sin β |
The diagonal elements of Σ yy are σ H2 = 6.85 × 10−5 m 2 and σ V2 = 8.82 × 10−6 m 2 which are
the same values as computed above and the off-diagonal elements are the covariance
σ HV = 4.62 × 10−6 m 2 . These elements, which are non-zero, indicate that computed quantities
are correlated.
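The propagation of (3.33) for this case can be sketched in MATLAB as follows. The numeric values of D, β, σ_D and σ_β below are assumed for illustration only (they are not the values used in the example above), so the resulting variances will differ from those quoted.

% Propagation of variances for H = D*sin(beta), V = D*cos(beta) using (3.33).
% The observation values and standard deviations are assumed for illustration.
D     = 1000.000;                    % slope distance (m)
beta  = 95*pi/180;                   % zenith angle (radians)
sD    = 0.003;                       % standard deviation of D (m)
sBeta = (5/3600)*pi/180;             % standard deviation of beta (radians)
J     = [sin(beta)   D*cos(beta);    % Jacobian of [H;V] with respect to [D;beta]
         cos(beta)  -D*sin(beta)];
Sxx   = diag([sD^2  sBeta^2]);       % D and beta independent, so Sxx is diagonal
Syy   = J*Sxx*J';                    % variances of H and V and their covariance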
Figure 3.1  Spirit levelling: backsight and foresight staff readings r_P and r_Q to staves at P and Q, with equal sight lengths d
Figure 3.1 shows a schematic diagram of a spirit level and a single height difference ΔhPQ
between positions of the levelling staves at P and Q. The backsight and foresight staff
readings are r_P and r_Q and the lengths of the sights to the levelling staff are the same and equal to d. The height difference is

    Δh_PQ = r_P − r_Q
Considering the backsight and foresight staff readings to be independent and of equal
precision, the Special Law of Propagation of Variances (3.36) gives the variance of a single
height difference from spirit levelling as
    σ²_Δh = (∂Δh/∂r_P)² σ²_rP + (∂Δh/∂r_Q)² σ²_rQ = σ²_rP + σ²_rQ = 2σ_r²   (3.37)
Suppose a flight of levels is run between two points A and B that are a distance D apart and that at every set up of the level, the backsight and foresight distances are the same and equal to d and the staff readings are all of equal precision σ_r. There will be n = D/(2d) set ups and the height difference ΔH_AB will be the sum of the individual height differences of the level run

    ΔH_AB = Δh_1 + Δh_2 + ⋯ + Δh_n

Considering each Δh to be independent and of equal precision, the variance of the total height difference ΔH_AB is given by the Special Law of Propagation of Variances as

    σ²_ΔH_AB = n σ²_Δh = 2n σ_r²                                          (3.38)
It is usual practice in spirit levelling to "close the level run" by returning to the start, therefore
the mean height difference of a closed level run (or a levelling loop) is
    ΔH_MEAN = (ΔH_AB + ΔH_BA) / 2

Application of the Special Law of Propagation of Variances gives, bearing in mind (3.37) and (3.38), the variance of the mean height difference of a closed level run as

    σ²_ΔH_MEAN = (1/4)σ²_ΔH_AB + (1/4)σ²_ΔH_BA = (1/2)σ²_ΔH = (n/2)σ²_Δh = n σ_r²     (3.39)
where n = D/(2d) is the number of set ups in the level run between A and B, which are a distance D apart (d is the length of the backsight/foresight distance); hence the variance of the mean height difference is proportional to D. Since weights are inversely proportional to variances, it is common to express the precisions of spirit levelled height differences as weights that are defined as being inversely proportional to distances

    w_ΔH ∝ 1/distance                                                     (3.40)
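As a numerical illustration of (3.39), suppose (assumed values only) a level run of D = 2000 m with sight lengths d = 50 m and staff readings of standard deviation σ_r = 1 mm; the short MATLAB sketch below gives the standard deviation of the mean height difference of the closed run.

% Standard deviation of the mean height difference of a closed level run (3.39).
% The values of D, d and sr are assumed for illustration only.
D  = 2000;              % length of level run A to B (m)
d  = 50;                % backsight/foresight distance (m)
sr = 0.001;             % standard deviation of a single staff reading (m)
n  = D/(2*d);           % number of set ups (n = 20)
sMean = sqrt(n*sr^2);   % equation (3.39): approximately 0.0045 m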
Figure 3.2  Triangle with vertices A, B and C: angles α and β measured at A and B, baseline distance c, computed distances a and b, and vertical angles γ and δ to C
If the observations, angles α , β , γ ,δ and distance c, have particular precisions then the
computed height of C will also have a precision that can be determined by propagation of
variances. The method of propagation will be demonstrated by the following example.
Example. Referring to Figure 3.2, the following observations and standard deviations are
known together with the Reduced Levels of A and B
Method of Computation of RL of C
    a / sin α = b / sin β = c / sin(180° − (α + β)) = c / sin(α + β)

giving

    a = c sin α / sin(α + β) = 1695.816217 m                              (3.41)

    b = c sin β / sin(α + β) = 1749.707936 m                              (3.42)
(i) Inspection of equations (3.41) and (3.42) shows that a and b are non-linear functions of
the variables α , β and c which we may write as
y = f (x)
we may write
Σ yy = J yx Σ xx J Tyx (3.44)
where

    Σ_yy = | σ_a²   σ_ab |     J_yx = | ∂a/∂α   ∂a/∂β   ∂a/∂c |     Σ_xx = | σ_α²    0      0   |
           | σ_ba   σ_b² |            | ∂b/∂α   ∂b/∂β   ∂b/∂c |            |  0     σ_β²    0   |
                                                                           |  0      0     σ_c² |
noting that the observations α , β and c are considered as independent random
variables, hence Σ xx is diagonal. The elements of the Jacobian J yx are the partial
derivatives of a and b with respect to α, β and c.

Using the rule for derivatives  d/dx (u/v) = [ v(du/dx) − u(dv/dx) ] / v²  and the relation d/dα sin(α + β) = cos(α + β) we may write

    ∂a/∂α = [ sin(α + β) · c cos α − c sin α cos(α + β) ] / sin²(α + β)

Using the trigonometric identity sin(A − B) = sin A cos B − cos A sin B with A = α + β and B = α gives

    ∂a/∂α = c sin β / sin²(α + β)

Noting that b = c sin β / sin(α + β), the partial derivative is

    ∂a/∂α = b / sin(α + β)                                                (3.45)
Noting that a = c sin α / sin(α + β) and that cos(α + β) / sin(α + β) = 1 / tan(α + β), the partial derivative is

    ∂a/∂β = −a / tan(α + β)                                               (3.46)

    ∂a/∂c = sin α / sin(α + β) = a/c                                      (3.47)

In a similar manner the partial derivatives of b are

    ∂b/∂α = −b / tan(α + β)                                               (3.48)

    ∂b/∂β = a / sin(α + β)                                                (3.49)

    ∂b/∂c = sin β / sin(α + β) = b/c                                      (3.50)
(v) Substitute numeric values into the equations for the partial derivatives and set the
elements of the Jacobian J yx
    J_yx = |  b/sin(α+β)    −a/tan(α+β)    a/c |  =  | 6061.961236   5625.191025   3.355062 |
           | −b/tan(α+β)     a/sin(α+β)    b/c |     | 5803.955218   5875.250353   3.461684 |
(vi) Substitute numeric values into the elements of the variance matrix Σ xx
    Σ_xx = | σ_α²    0      0   |  =  | (4.8481 × 10⁻⁵)²         0                 0      |
           |  0     σ_β²    0   |     |        0          (9.6963 × 10⁻⁵)²         0      |
           |  0      0     σ_c² |     |        0                 0             (0.050)²   |
(vii) Perform the matrix multiplications in (3.44) to give the variance-covariance matrix of
the computed distances a and b
    Σ_yy = | σ_a²   σ_ab |  =  | 0.412012   0.422455 |                    (3.51)
           | σ_ba   σ_b² |     | 0.422455   0.433671 |
Note that the off-diagonal terms are not zero, indicating that the computed quantities
are correlated. The leading-diagonal elements are the variances of a and b, hence the
standard deviations are
    σ_a = √0.412012 = 0.642 m
    σ_b = √0.433671 = 0.659 m
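The matrix multiplication of step (vii) can be checked with a few lines of MATLAB using the numeric Jacobian of step (v) and the variance matrix of step (vi); the result should reproduce (3.51). This is an illustrative check only, not one of the program listings of these notes.

% Check of step (vii): Syy = J*Sxx*J' using the numeric values of steps (v) and (vi).
J   = [6061.961236  5625.191025  3.355062;
       5803.955218  5875.250353  3.461684];
Sxx = diag([(4.8481e-5)^2  (9.6963e-5)^2  0.050^2]);
Syy = J*Sxx*J';          % should reproduce the matrix of equation (3.51)
sa  = sqrt(Syy(1,1));    % standard deviation of a (approximately 0.642 m)
sb  = sqrt(Syy(2,2));    % standard deviation of b (approximately 0.659 m)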
(viii) Inspection of equation (3.43) shows that the mean RL of C is a non-linear function of the correlated variables a, b and the independent variables γ, δ that we may write as

    y = f(x)
and we may write

    Σ_yy = J_yx Σ_xx J_yx^T                                               (3.52)

where Σ_yy = [σ²_RLC],  J_yx = [ ∂RL_C/∂a   ∂RL_C/∂b   ∂RL_C/∂γ   ∂RL_C/∂δ ]  and

    Σ_xx = | σ_a²    σ_ab    0      0   |
           | σ_ba    σ_b²    0      0   |
           |  0       0     σ_γ²    0   |
           |  0       0      0     σ_δ² |
Note that Σ_yy is a matrix containing a single element only, the variance of the mean RL of C.

(ix)  Noting that d/dx (tan x) = sec²x = 1/cos²x, the partial derivatives ∂RL_C/∂a, ∂RL_C/∂b, ∂RL_C/∂γ and ∂RL_C/∂δ can be evaluated; they appear as the elements of the Jacobian below.
(x) Substitute numeric values into the equations for the partial derivatives and set the
elements of the Jacobian J yx
    J_yx = [ tan δ / 2     tan γ / 2     b/(2 cos²γ)     a/(2 cos²δ) ]
(xi) Substitute numeric values into the elements of the variance matrix Σ xx
    Σ_xx = | σ_a²   σ_ab    0      0   |  =  | 0.412012   0.422455         0                    0          |
           | σ_ba   σ_b²    0      0   |     | 0.422455   0.433671         0                    0          |
           |  0      0     σ_γ²    0   |     |    0          0      (7.2722 × 10⁻⁵)²            0          |
           |  0      0      0     σ_δ² |     |    0          0             0           (1.4544 × 10⁻⁴)²    |
(xii) Perform the matrix multiplications in (3.52) to give the variance-covariance matrix of
the computed mean RL of C.
    Σ_yy = [ 0.037040 ]

    σ²_RLC = 0.037040 m²

    σ_RLC = √0.037040 = 0.192 m
In some applications of surveying and least squares, we must deal with multiple functions of
random variables that may be correlated. To handle these cases, Mikhail (1976, pp.83-87)
develops general rules and techniques that are repeated in the following sections. Note that
cofactor matrices Q replace variance-covariance matrices Σ in the following developments.
Consider x ( n ,1) and t ( m ,1) to be two correlated vectors with cofactor matrices Q xx , Q xt and Qtt .
The crosscofactor matrix Q_xt contains the cofactors (covariances) between the elements of the x and t vectors. Two other vectors y_(q,1) and z_(p,1) are functions of x and t (y is a function of x only and z is a function of t only)

    y = y(x)
    z = z(t)                                                              (3.53)

with Jacobian matrices of partial derivatives

    J_yx = ∂y/∂x
    J_zt = ∂z/∂t                                                          (3.54)
Letting  r = [y  z]^T  and  s = [x  t]^T,  equation (3.53) can be written as

    r = f(s)                                                              (3.55)

and applying the Law of Propagation of Variances gives

    Q_rr = J_rs Q_ss J_rs^T                                               (3.56)

where

    J_rs = | J_yx   J_yt |  =  | J_yx    0   |                            (3.57)
           | J_zx   J_zt |     |  0     J_zt |

and the cofactor matrix of s is

    Q_ss = | Q_xx   Q_xt |
           | Q_tx   Q_tt |

Substituting these into (3.56) and expanding gives the cofactor matrix Q_rr partitioned into Q_yy, Q_yz, Q_zy and Q_zz.
From this expanded equation, we may write the following symbolic equation and four general
relationships
    If   | y |  =  | y(x) |   and x and t are correlated, then
         | z |     | z(t) |

    Q_yy = J_yx Q_xx J_yx^T
    Q_yz = J_yx Q_xt J_zt^T
    Q_zy = J_zt Q_tx J_yx^T                                               (3.58)
    Q_zz = J_zt Q_tt J_zt^T
The cofactor matrices where the subscripts are different letters, i.e., Q_xt, Q_tx, Q_yz and Q_zy, are crosscofactor matrices and their form is easily constructed; for example, Q_xt is the (n,m) matrix whose element in row i and column j is the cofactor (covariance) between x_i and t_j. This directly leads to the fact that crosscofactor matrices are not necessarily square and symmetric, as cofactor matrices are, and that

    Q_xt = Q_tx^T
To assist the practitioner in variance propagation (or cofactor propagation) of linear and non-
linear functions of random variables a technique known as symbolic multiplication can be
used. This mnemo-technical rule was originally devised by Tienstra (1966) to obtain
covariances of random variables related by systems of linear (or linearized) equations. His
rule, developed before the extensive use of matrix algebra, can be employed with matrix
equations in the following manner. For example, for linear functions
y = Ax + a (a)
z = Bt + b (b)
The constant vectors a and b play no part in variance propagation and can be ignored, and the
crosscofactor matrix Q yz is
Q yz = AQ xt BT (c)
Equation (c) is formed by:
(i)   writing the cofactor matrix Q on the left-hand-side of the equals sign with subscripts y and z representing the vectors y and z on the left-hand-sides of equations (a) and (b) in the order of (a) first and (b) second, then
(ii) writing the coefficient matrix of the random variable in equation (a) on the right-
hand-side of the equals sign, then
(iii) multiplying by the cofactor matrix Q with the subscripts x and t representing the
random vectors x and t on the right-hand-sides of equations (a) and (b) in the
order of (a) first and (b) second, then
(iv) multiplying by the transpose of the coefficient matrix of the random variable on
the right-hand-side of equation (b).
Note that in the case of non-linear functions, the coefficient matrices J_yx and J_zt replace A and B respectively, and equation (c) becomes identical to the 2nd equation of (3.58).
The rules and techniques of propagation of variances (and cofactors) given in the preceding
sections allow for propagation of variances through several transformations. These
propagations can be carried out in two ways, (i) substitution and (ii) stepwise. To
demonstrate these we consider the following three relations
y = Ax + a
z = By + b (3.60)
r = Cz + c
Let the random vector x be known, with its cofactor matrix Q xx and it is desired to obtain the
cofactor matrices of z and r. The vectors a, b and c contain constants and A, B and C are
coefficient matrices.
y = Ax + a
z = By + b = B ( Ax + a ) + b
= ( BA ) x + ( Ba + b )
r = Cz + c = C ( By + b ) + c
= ( CB ) y + ( Cb + c )
= ( CB )( Ax + a ) + ( Cb + c )
= ( CBA ) x + ( CBa + Cb + c )
Noting that the last terms in the equations for z and r evaluate to vectors, say d and e, we may rewrite the equations and apply propagation using symbolic multiplication to give

    y = Ax + a          Q_yy = A Q_xx A^T
    z = (BA)x + d       Q_zz = (BA) Q_xx (BA)^T = B A Q_xx A^T B^T
    r = (CBA)x + e      Q_rr = (CBA) Q_xx (CBA)^T = C B A Q_xx A^T B^T C^T
                        Q_yz = A Q_xx (BA)^T = A Q_xx A^T B^T             (3.61)
                        Q_yr = A Q_xx (CBA)^T = A Q_xx A^T B^T C^T
                        Q_zr = (BA) Q_xx (CBA)^T = B A Q_xx A^T B^T C^T
Stepwise Propagation
The same result in equations (3.61) can be obtained by applying propagation in steps as
follows:
    y = Ax + a          Q_yy = A Q_xx A^T
    z = By + b          Q_zz = B Q_yy B^T = B A Q_xx A^T B^T
    r = Cz + c          Q_rr = C Q_zz C^T = C B A Q_xx A^T B^T C^T
                        Q_yz = A Q_xy B^T                                 (3.62)
                        Q_yr = A Q_xz C^T
                        Q_zr = B Q_yz C^T
The last three (crosscofactor) relations of equations (3.62) do not correspond to those in
equations (3.61), particularly because of the absence of the matrices Q xy , Q xz and Q yz .
However, these matrices can be derived if equations (3.60) are supplemented by simple
identities x = Ix and y = Iy giving the following three pairs of equations from which the
matrix relationships can be obtained by symbolic multiplication.
    x = Ix              Q_xy = I Q_xx A^T = Q_xx A^T
    y = Ax + a

    x = Ix              Q_xz = I Q_xy B^T = Q_xx A^T B^T                  (3.63)
    z = By + b

    y = Iy              Q_yz = I Q_yy B^T = A Q_xx A^T B^T
    z = By + b
Substituting these relations into the last three relations in (3.62) leads directly to equations
(3.61). This demonstrates that propagation through substitution is equivalent to stepwise
propagation.
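The equivalence of the two approaches is easily confirmed numerically. The following MATLAB sketch (an illustration only; the matrices are assumed for the purpose of the example) computes Q_rr by substitution and by stepwise propagation and compares the results.

% Numerical check that substitution (3.61) and stepwise propagation (3.62)
% give the same cofactor matrix Qrr. Matrix values are assumed for illustration.
A = [1  2; 0  1];
B = [2 -1; 1  1];
C = [1  1; -1 2];
Qxx = [0.04  0.01; 0.01  0.09];
Qrr1 = (C*B*A)*Qxx*(C*B*A)';      % propagation through substitution
Qyy  = A*Qxx*A';                  % stepwise propagation
Qzz  = B*Qyy*B';
Qrr2 = C*Qzz*C';
disp(max(max(abs(Qrr1-Qrr2))));   % zero (apart from rounding error)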
4. APPROXIMATE VALUES
In many least squares problems, the unknown quantities being sought may be quite large numbers and/or the coefficients of these quantities may be large numbers. This can lead to numerical problems in the formation of normal equations where large numbers are multiplied and summed. To overcome this problem, approximate values of the unknown quantities may be used and small, unknown corrections to the approximate values become the quantities being sought. An unknown quantity x is written as the sum of an approximate value x⁰ and a small correction δx
    x = x⁰ + δx                                                           (4.1)

and, writing the same relationship for the vector of all the unknowns,

    x = x⁰ + δx                                                           (4.2)
The use of approximate values can best be explained by example and the following sections
contain worked examples of some simple least squares problems that demonstrate the use of
approximate values.
The diagram below shows a level network of height differences observed between the fixed
stations A (RL 102.440 m) and B (RL 104.565 m) and "floating" stations X, Y and Z whose
Reduced Levels (RL's) are unknown. The arrows on the diagram indicate the direction of
rise. The Table of Height differences shows the height difference for each line of the network
and the distance (in kilometers) of each level run.
    Line    Height Diff (m)    Dist (km)
      1         6.345             1.7
      2         4.235             2.5
      3         3.060             1.0
      4         0.920             3.8
      5         3.895             1.7
      6         2.410             1.2
      7         4.820             1.5

[Diagram: level network with lines 1–7 connecting the fixed stations A and B and the floating stations X, Y and Z]
The method of Least Squares can be used to determine the best estimates of the RL's of X, Y
and Z bearing in mind that the precision of the observed height differences is inversely
proportional to the distance of the level run.
The observation equation for the RL's of two points P and Q connected by an observed spirit
levelled height difference ΔH PQ can be written as
P + ΔH PQ + vPQ = Q (4.3)
where P and Q are the RL's of points P and Q and vPQ is the residual, a small unknown
correction to the observed height difference. If the RL's of P and Q are unknown but have
approximate values, say P = P 0 + δ P and Q = Q 0 + δ Q we may write a general observation
equation for an observed height difference as
P 0 + δ P + ΔH PQ + vPQ = Q 0 + δ Q (4.4)
Using this general observation equation we may write an equation for each observed height
difference
A +ΔH1 + v1 = X 0 + δ X
B +ΔH 2 + v2 = X 0 + δ X
Z 0 + δ Z +ΔH 3 + v3 = B
Z 0 + δ Z +ΔH 4 + v4 = A
A +ΔH 5 + v5 = Y 0 + δ Y
Y 0 + δ Y +ΔH 6 + v6 = X 0 + δ X
Z 0 + δ Z + ΔH 7 + v7 = Y 0 + δ Y
Rearranging these equations so that all the unknown quantities are on the left-hand-side of the
equals sign and all the known quantities are on the right-hand-side gives
v1 − δ X = X 0 − A −ΔH1
v2 − δ X = X 0 − B −ΔH 2
v3 +δ Z = B − Z 0 −ΔH 3
v4 +δ Z = A − Z 0 −ΔH 4
v5 −δ Y = Y 0 − A −ΔH 5
v6 − δ X +δ Y = X 0 − Y 0 −ΔH 6
v7 −δ Y +δ Z = Y 0 − Z 0 −ΔH 7
The approximate RL's of the unknown points X, Y and Z can be determined from the RL's of
A and B and appropriate height differences
X 0 = A + ΔH1 = 108.785 m
Y 0 = A + ΔH 5 = 106.335 m
Z 0 = A − ΔH 4 = 101.520 m
In matrix form, the observation equations v + Bx = f are

    | v1 |   | −1   0   0 |             | (X⁰ − A) − ΔH1  |   |  0.000 |
    | v2 |   | −1   0   0 |             | (X⁰ − B) − ΔH2  |   | −0.015 |
    | v3 |   |  0   0   1 |   | δX |    | (B − Z⁰) − ΔH3  |   | −0.015 |
    | v4 | + |  0   0   1 | · | δY |  = | (A − Z⁰) − ΔH4  | = |  0.000 |
    | v5 |   |  0  −1   0 |   | δZ |    | (Y⁰ − A) − ΔH5  |   |  0.000 |
    | v6 |   | −1   1   0 |             | (X⁰ − Y⁰) − ΔH6 |   |  0.040 |
    | v7 |   |  0  −1   1 |             | (Y⁰ − Z⁰) − ΔH7 |   | −0.005 |
and the weight matrix, with weights inversely proportional to the distances, is

    W = diag[ 1/1.7   1/2.5   1/1.0   1/3.8   1/1.7   1/1.2   1/1.5 ]
      = diag[ 0.5882  0.4000  1.0000  0.2632  0.5882  0.8333  0.6667 ]
The least squares solution for the vector of corrections x can be obtained from the MATLAB
function least_squares.m with the following data file c:\Temp\Level_Net_Data.dat
Running the program from the MATLAB command window created the following output file c:\Temp\Level_Net_Data.out
Input Data
Vector of solutions x
-0.0095
0.0121
-0.0053
Vector of residuals v
-0.0095
-0.0245
-0.0097
0.0053
0.0121
0.0184
0.0124
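The essential computations performed for this network can be sketched in a few lines of MATLAB using the matrices B, f and W formed above (this sketch is an illustration only and is not the least_squares.m program itself).

% Least squares adjustment of indirect observations for the level network.
% B, f and W are the matrices formed above.
B = [-1  0  0; -1  0  0;  0  0  1;  0  0  1;  0 -1  0; -1  1  0;  0 -1  1];
f = [0.000; -0.015; -0.015; 0.000; 0.000; 0.040; -0.005];
W = diag(1./[1.7 2.5 1.0 3.8 1.7 1.2 1.5]);
N = B'*W*B;                 % normal equation coefficient matrix
t = B'*W*f;                 % vector of numeric terms
x = N\t;                    % corrections dX, dY, dZ to the approximate RL's
v = f - B*x;                % residuals
RL = [108.785; 106.335; 101.520] + x;   % adjusted RL's of X, Y and Z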
A most important outcome of a least squares adjustment is that estimates of the precisions of
the quantities sought, the elements of x, the unknowns or the parameters, are easily obtained
from the matrix equations of the solution. Application of the Law of Propagation of
Variances demonstrates that N −1 , the inverse of the normal equation coefficient matrix is
equal to the cofactor matrix Q xx that contains estimates of the variances and covariances of
the elements of x. In addition, estimates of the precisions of the residuals and adjusted
observations may be obtained. This most useful outcome enables a statistical analysis of the
results of a least squares adjustment and provides the practitioner with a degree of confidence
in the results.
v + Bx = f (5.1)
f is an (n,1) vector of numeric terms derived from the (n,1) vector of observations l and the
(n,1) vector of constants d as
f = d−l (5.2)
The observations l have an associated a priori cofactor matrix Q_ll and a weight matrix W_ll = Q_ll⁻¹. Remember that in most practical applications of least squares, the variance-covariance matrix Σ_ll is unknown, but estimated a priori by Q_ll that contains estimates of the variances and covariances, and Σ_ll = σ₀² Q_ll where σ₀² is the variance factor.
Note: In the derivations that follow, the subscript "ll" is dropped from Q_ll and W_ll.
If (5.2) is written as

    f = (−I) l + d                                                        (5.3)

then (5.3) is in a form suitable for employing the Law of Propagation of Variances developed in Chapter 3; i.e., if y = Ax + b and y and x are random variables linearly related and b is a vector of constants then Q_yy = A Q_xx A^T. Hence, the cofactor matrix of the numeric terms f is

    Q_ff = (−I) Q (−I)^T = Q

Thus the cofactor matrix of f is also the a priori cofactor matrix of the observations l.
The solution "steps" in the least squares adjustment of indirect observations are set out
Chapter 2 and restated as
N = BT W B
t = BT W f
x = N −1 t
v = f − Bx
ˆl = l + v
To apply the Law of Propagation of Variances, these equations may be re-arranged in the
form y = Ax + b where the terms in parenthesis ( ) constitute the A matrix.
    t = (B^T W) f                                                         (5.4)

    x = (N⁻¹) t                                                           (5.5)

    v = f − Bx
      = f − B N⁻¹ t
      = f − B N⁻¹ B^T W f
      = (I − B N⁻¹ B^T W) f                                               (5.6)

    l̂ = l + v
      = l + f − Bx
      = d − Bx
      = (−B) x + d                                                        (5.7)
Applying the Law of Propagation of Variances to equations (5.4) to (5.7) gives the following
cofactor matrices
    Q_tt = (B^T W) Q_ff (B^T W)^T = N                                     (5.8)

    Q_xx = (N⁻¹) Q_tt (N⁻¹)^T = N⁻¹                                       (5.9)

    Q_vv = (I − B N⁻¹ B^T W) Q_ff (I − B N⁻¹ B^T W)^T = Q − B N⁻¹ B^T      (5.10)

    Q_l̂l̂ = (−B) Q_xx (−B)^T = B N⁻¹ B^T = Q − Q_vv                        (5.11)
Variance-covariance matrices for t, x, v and l̂ are obtained by multiplying the cofactor matrix
by the variance factor σ 02 .
An unbiased estimate of the variance factor σ₀² is

    σ̂₀² = v^T W v / r                                                     (5.12)
A derivation of equation (5.12) is given below. The quadratic form v T Wv may be computed
in the following manner.
Remembering, for the method of indirect observations, the following matrix equations
N = BT WB
t = BT Wf
x = N −1t
v = f − Bx
then
v T Wv = (f − Bx )T W(f − Bx )
= (f T − x T BT ) W (f − Bx )
= (f T W − x T BT W )(f − Bx )
= f T Wf − f T WBx − xT BT Wf + x T BT WBx
= f T Wf − 2f T WBx + x T BT WBx
= f T Wf − 2t T x + x T Nx
= f T Wf − 2x T t + x T t
and
v T Wv = f T Wf − xT t (5.13)
    Σ = σ₀² Q                                                             (5.14)
Cofactor matrices Q_vv, Q_xx and Q_l̂l̂ are computed from equations (5.9) to (5.11), and so it remains to obtain an estimate σ̂₀² of the variance factor in order to compute variance-covariance matrices from (5.14).
The development of a matrix expression for computing σˆ 02 is set out below and follows
Mikhail (1976, pp.285-288). Some preliminary relationships will be useful.
1.  If A is an (n,n) square matrix, the sum of its diagonal elements is a scalar quantity called the trace of A and denoted by tr(A). The following relationships are useful

    tr(A^T) = tr(A)                                                       (5.16)

    x^T A x = tr(x x^T A)                                                 (5.17)
2.  From the definition of the variance-covariance matrix

    Σ_xx = E{ (x − m_x)(x − m_x)^T }
         = E{ (x − m_x)(x^T − m_x^T) }
         = E{ xx^T − x m_x^T − m_x x^T + m_x m_x^T }
         = E{xx^T} − E{x} m_x^T − m_x E{x^T} + m_x m_x^T
         = E{xx^T} − m_x m_x^T

    so that E{xx^T} = Σ_xx + m_x m_x^T (and, similarly, E{ff^T} = Σ_ff + m_f m_f^T).

3.  The expectation of the vector of residuals is

    E{v} = m_v = 0                                                        (5.20)
4.  By definition (see Chapter 2) the weight matrix W, the cofactor matrix Q and the variance-covariance matrix Σ are related by

    Σ = σ₀² Q   and   W = Q⁻¹

Now, for the least squares adjustment of indirect observations the following relationships are recalled

    v + Bx = f,      N = B^T W B,      t = B^T W f
    Q_ff = Q,        Q_tt = N,         Q_xx = N⁻¹
    Σ⁻¹ = (1/σ₀²) W,     M = B^T Σ⁻¹ B
    Σ_ff = Σ,        Σ_tt = σ₀⁴ M,     Σ_xx = M⁻¹
In addition, the expectation of the vector f is the mean m f and so we may write
m f = Bm x (5.22)
    v^T W v = σ₀² (v^T Σ⁻¹ v)                                             (5.23)

and from (5.13)

    v^T W v = f^T W f − x^T t = f^T W f − x^T N x

so that, dividing by σ₀²,

    v^T Σ⁻¹ v = f^T Σ⁻¹ f − x^T M x
Recognising that the terms on the right-hand-side are both quadratic forms, equation (5.17)
can be used to give
    E{v^T Σ⁻¹ v} = E{ tr(f f^T Σ⁻¹) } − E{ tr(x x^T M) }
                 = tr( E{f f^T} Σ⁻¹ ) − tr( E{x x^T} M )

and using the results above for E{f f^T} and E{x x^T}

    E{v^T Σ⁻¹ v} = tr( [Σ_ff + m_f m_f^T] Σ⁻¹ ) − tr( [Σ_xx + m_x m_x^T] M )
                 = tr( I_nn + m_f m_f^T Σ⁻¹ ) − tr( I_uu + m_x m_x^T M )
                 = tr(I_nn) − tr(I_uu) + m_f^T Σ⁻¹ m_f − m_x^T M m_x
                 = (n − u) + m_f^T Σ⁻¹ m_f − m_x^T M m_x
From equation (5.22) m_f = B m_x, hence m_f^T Σ⁻¹ m_f = m_x^T B^T Σ⁻¹ B m_x = m_x^T M m_x and the last two terms cancel, giving E{v^T Σ⁻¹ v} = n − u. Using (5.23), E{v^T W v} = σ₀² (n − u), or

    σ₀² = E{v^T W v} / (n − u)

Replacing the expectation with the computed quadratic form gives an unbiased estimate of the variance factor

    σ̂₀² = v^T W v / (n − u) = v^T W v / r                                 (5.24)
Using equation (5.13) an unbiased estimate of the variance factor σ̂₀² can be computed from

    σ̂₀² = ( f^T W f − x^T t ) / r                                         (5.25)
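Continuing the MATLAB sketch of the level network of Chapter 4 (B, W, f, N, t and x as formed there), the variance factor estimate and the precisions of the adjusted quantities follow directly from (5.25), (5.9), (5.10) and (5.14). This is an illustrative sketch only.

% Variance factor and precision estimates for the level network of Chapter 4.
% B, W, f, N, t and x are taken from the earlier sketch.
r    = size(B,1) - size(B,2);      % redundancy r = n - u = 7 - 3 = 4
s02  = (f'*W*f - x'*t)/r;          % estimated variance factor, equation (5.25)
Qxx  = inv(N);                     % cofactor matrix of the parameters (5.9)
Qvv  = inv(W) - B*Qxx*B';          % cofactor matrix of the residuals (5.10)
Sxx  = s02*Qxx;                    % variance-covariance matrix of x (5.14)
sRL  = sqrt(diag(Sxx));            % standard deviations of dX, dY, dZ (and of the adjusted RL's)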
In Chapter 2 the least squares technique of adjustment of indirect observations was introduced
using the example of fitting a straight line through a series of data points. The "observations"
in this example were the x,y coordinates that were indirect measurements of the unknown
parameters m and c, the slope and intercept of the line on the y-axis respectively. Subsequent
examples of curve fitting (parabola and ellipse) demonstrated this technique and in Chapter 4
adjustment of indirect observations was applied to a level network. An alternative to this
technique, known as least squares adjustment of observations only, will be introduced in this
chapter using the level network example of Chapter 4.
Figure 6.1 shows a diagram of a level network of height differences observed between the
fixed stations A (RL 102.440 m) and B (RL 104.565 m) and "floating" stations X, Y and Z
whose Reduced Levels (RL's) are unknown. The arrows on the diagram indicate the direction
of rise. The Table of Height differences shows the height difference for each line of the
network and the distance (in kilometers) of each level run. The height differences can be
considered as independent (uncorrelated) and of unequal precision, where the weights of the
height differences are defined as being inversely proportional to the distances in kilometres
(see Chapter 3, Section 3.5.2)
    Line    Height Diff (m)    Dist (km)
      1         6.345             1.7
      2         4.235             2.5
      3         3.060             1.0
      4         0.920             3.8
      5         3.895             1.7
      6         2.410             1.2
      7         4.820             1.5

Figure 6.1  Level network: lines 1–7 connecting the fixed stations A and B and the floating stations X, Y and Z
The measured height differences do not accord with the simple principle that they should sum
to zero around a "closed loop", i.e., there are misclosures. For example:
in the loop AXYA ΔH1 − ΔH 6 − ΔH 5 = +0.040 m
Hence it is required to determine the adjusted height differences (that will sum to zero) and
the RL's of X, Y and Z.
There are n = 7 observations and a minimum of n₀ = 3 observations are required to fix the RL's of X, Y and Z. Hence there are r = n − n₀ = 4 redundant measurements, which equals the number of independent condition equations.
A and B as A and B, these condition equations are
( l1 + v1 ) − ( l6 + v6 ) − ( l5 + v5 ) = 0
− ( l2 + v2 ) − ( l3 + v3 ) + ( l7 + v7 ) + ( l6 + v6 ) = 0
(6.1)
( l5 + v5 ) − ( l7 + v7 ) + ( l4 + v4 ) = 0
( l1 + v1 ) − ( l2 + v2 ) = B − A
The first 3 equations of (6.1) are the loop closure conditions and the last equation is a
condition linking the RL's of A and B.
Since the measurements are of unequal precision, there is an associated weight wk with each
observation and the application of the least squares principle calls for the minimization of the
least squares function ϕ as
    φ = the sum of the weighted squared residuals = Σ_{k=1}^{n} w_k v_k²   (6.2)
Considering equation (6.1) it is clear that separate expressions for residuals cannot be derived and substituted into φ, as was possible in the technique for adjustment of indirect observations (see Chapter 2). Therefore another approach is needed to ensure that φ is a minimum while the condition equations are satisfied. Re-writing (6.1) with the residuals on the left-hand-side gives

    v1 − v6 − v5 = 0 − (l1 − l6 − l5) = f1
−v2 − v3 + v7 + v6 = 0 − ( −l2 − l3 + l7 + l6 ) = f 2
(6.3)
v5 − v7 + v4 = 0 − ( l5 − l7 + l4 ) = f3
v1 − v2 = ( B − A) − ( l1 − l2 ) = f4
v1 − v6 − v5 − f1 = 0
− v2 − v3 + v7 + v6 − f 2 = 0
(6.4)
v5 − v7 + v4 − f 3 = 0
v1 − v2 − f 4 = 0
A new function φ′ is formed by adding the condition equations (6.4), each multiplied by −2k, to φ

    φ′ = Σ_{k=1}^{7} w_k v_k² − 2k1(v1 − v6 − v5 − f1) − 2k2(−v2 − v3 + v7 + v6 − f2)
         − 2k3(v5 − v7 + v4 − f3) − 2k4(v1 − v2 − f4)                     (6.5)

where k1, k2, k3 and k4 are Lagrange multipliers and there are as many multipliers as there are conditions. The introduction of −2 preceding each multiplier is for convenience only. Inspection of equations (6.5), (6.4) and (6.2) shows that φ and φ′ are equal since the additional terms in φ′ equate to zero.
(iv) The unknowns in equation (6.5) are the residuals v1, v2, … , v7 and the Lagrange multipliers¹ k1, k2, k3 and k4. For φ′ to be a minimum, the partial derivatives of φ′ with respect to each of the unknowns must be zero. Setting the partial derivatives with respect to the residuals to zero gives
¹ Joseph Louis LAGRANGE (1736-1813), a great French mathematician whose major work was in the calculus of variations, celestial and general mechanics, differential equations and algebra. Lagrange spent 20 years of his life in Prussia and then returned to Paris where his masterpiece, Mécanique analytique, published in 1788, formalized much of the mechanics founded on Newton's work.
    ∂φ′/∂v1 = 2w1v1 − 2k1 − 2k4 = 0    or    v1 = (1/w1)(k1 + k4)
    ∂φ′/∂v2 = 2w2v2 + 2k2 + 2k4 = 0    or    v2 = (1/w2)(−k2 − k4)
    ∂φ′/∂v3 = 2w3v3 + 2k2       = 0    or    v3 = (1/w3)(−k2)
    ∂φ′/∂v4 = 2w4v4 − 2k3       = 0    or    v4 = (1/w4)(k3)
    ∂φ′/∂v5 = 2w5v5 + 2k1 − 2k3 = 0    or    v5 = (1/w5)(−k1 + k3)
    ∂φ′/∂v6 = 2w6v6 + 2k1 − 2k2 = 0    or    v6 = (1/w6)(−k1 + k2)
    ∂φ′/∂v7 = 2w7v7 − 2k2 + 2k3 = 0    or    v7 = (1/w7)(k2 − k3)         (6.6)
and when ϕ ′ is differentiated with respect to the Lagrange multipliers and equated
to zero
    ∂φ′/∂k1 = −2(v1 − v6 − v5 − f1)       = 0    or    v1 − v6 − v5 = f1
    ∂φ′/∂k2 = −2(−v2 − v3 + v7 + v6 − f2) = 0    or    −v2 − v3 + v7 + v6 = f2
    ∂φ′/∂k3 = −2(v5 − v7 + v4 − f3)       = 0    or    v5 − v7 + v4 = f3
    ∂φ′/∂k4 = −2(v1 − v2 − f4)            = 0    or    v1 − v2 = f4       (6.7)
the original condition equations (6.4) result. This demonstrates that the
introduction of Lagrange multipliers ensures that the conditions will be satisfied
when ϕ ′ is minimized.
(v) Now, substituting equations (6.6) into (6.7) gives four normal equations
    (1/w1 + 1/w6 + 1/w5) k1 − (1/w6) k2 − (1/w5) k3 + (1/w1) k4 = f1
    −(1/w6) k1 + (1/w2 + 1/w3 + 1/w6 + 1/w7) k2 − (1/w7) k3 + (1/w2) k4 = f2
    −(1/w5) k1 − (1/w7) k2 + (1/w4 + 1/w5 + 1/w7) k3 = f3
    (1/w1) k1 + (1/w2) k2 + (1/w1 + 1/w2) k4 = f4                         (6.8)
Using the data from Figure 6.1 the weight reciprocals are the distances (in kilometres)
    1/w_k = { 1.7   2.5   1.0   3.8   1.7   1.2   1.5 }

and the numeric terms of the condition equations are
f1 = − ( l1 − l6 − l5 ) = −0.040 m
f 2 = − ( −l2 − l3 + l7 + l6 ) = 0.065 m
f3 = − ( l5 − l7 + l4 ) = 0.005 m
f4 = ( B − A) − ( l1 − l2 ) = 0.015 m
Solving the normal equations (6.8) gives the Lagrange multipliers k1, k2, k3 and k4. Substituting these values together with the weight reciprocals 1/w_k into
equations (6.6) gives the residuals v1 , v2 , … , v7 . The height differences, residuals and the
adjusted height differences (observed value + residual) of the level network are shown below.
These are identical results to those obtained by least squares adjustment of indirect
observations set out in Chapter 4.
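In matrix terms the computation above amounts to solving the condition equations A v = f for the Lagrange multipliers k and then for the residuals v. A minimal MATLAB sketch, using the condition coefficients of (6.4) and the weight reciprocals above, is given below; the matrix development of this technique is set out formally later in this chapter.

% Least squares adjustment of observations only for the level network.
% Condition equations (6.4) written as A*v = f; then k = inv(A*Q*A')*f and
% v = Q*A'*k, where Q = diag(1/w) contains the weight reciprocals (distances).
A = [ 1  0  0  0 -1 -1  0;      %  v1 - v6 - v5 = f1
      0 -1 -1  0  0  1  1;      % -v2 - v3 + v7 + v6 = f2
      0  0  0  1  1  0 -1;      %  v5 - v7 + v4 = f3
      1 -1  0  0  0  0  0];     %  v1 - v2 = f4
f = [-0.040; 0.065; 0.005; 0.015];
Q = diag([1.7 2.5 1.0 3.8 1.7 1.2 1.5]);
k = (A*Q*A')\f;                 % Lagrange multipliers (normal equations (6.8))
v = Q*A'*k;                     % residuals; should agree with those of Chapter 4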
6.2. Some Comments on the Two Applications of the Method of Least Squares
The method of least squares has now been applied in two ways:
(a) determining the parameters of a "line of best fit" through a number of data points (see Chapter 2), and
(b) adjusting observations that must satisfy certain conditions, as in the level networks above.
In the first technique each measurement is expressed as a function of unknown parameters. This technique of least squares "adjustment" is known by various names; the most explicit of these is least squares adjustment of indirect observations, since each observation is in fact an indirect measurement of the unknown parameters. Least squares adjustment of indirect observations is the name adopted for this technique by Mikhail (1976) and Mikhail & Gracie (1981) and will be used in these notes.
In the second technique, applied to the level network of Figure 6.1, the solution followed these steps:
• A relationship or condition that the observations (and residuals) must satisfy was
established. In this case, the condition to be satisfied was that observed height
differences (plus some unknown corrections or residuals) should sum to zero
around a closed level loop.
• The minimum number of observations n0 required to fix the heights of X, Y and Z
and satisfy the condition between the fixed points A and B was determined giving
the number of independent condition equations equal to the number of redundant
observations r = n − n0 .
• There were r equations in n unknown residuals, and since r = n − n0 was less than
n, there was no unique solution for the residuals. The least squares principle was
used to determine a set of r normal equations, which were solved for r Lagrange
multipliers which in turn, were used to obtain the n residuals.
• The residuals were added to the observations to obtain the adjusted observations
which were then used to determine the heights of points X, Y and Z.
This second technique of least squares "adjustment" is also known by various names; the most explicit of these is least squares adjustment of observations only, since the equations involve only observations and no parameters are used. Least squares adjustment of observations only is the name adopted for this technique by Mikhail (1976) and Mikhail & Gracie (1981) and will be used in these notes.
It should be noted that in practice, the method of adjustment of observations only is seldom
employed, owing to the difficulty of determining the independent condition equations
required as a starting point. This contrasts with the relative ease of the technique of
adjustment of indirect observations, where every observation yields an equation of fixed form.
Computer solutions of least squares problems almost invariably use the technique of
adjustment of indirect observations.
Consider the level network shown in Figure 6.2. The RL of A is known and the RL's of B, C and D are to be determined from the six observed height differences, numbered 1 to 6. The arrows on the diagram indicate the direction of rise.

Figure 6.2 Level network (fixed station A, floating stations B, C and D, observed height differences 1 to 6)
There are n = 6 observations with a minimum of n0 = 3 required to fix the RL's of B, C and
D with respect to A. Hence there are r = n − n0 = 3 redundant measurements, which equal
the number of independent condition equations. Omitting the residuals, these equations are
l1 + l3 − l2 = 0
l4 − l5 − l3 = 0     (6.10)
l1 + l4 − l6 = 0

Another possible set of independent condition equations is

l1 + l3 + l5 − l6 = 0
l1 + l4 − l6 = 0     (6.11)
l1 + l4 − l5 − l2 = 0

But here is a further set of condition equations, which are not independent

l1 + l3 − l2 = 0
l4 − l5 − l3 = 0     (6.12)
l1 + l4 − l5 − l2 = 0

where the third equation of (6.12) is obtained by adding the first two.
Care needs to be taken in determining independent equations and it is easy to see that this
could become quite difficult as the complexity of the adjustment problem increases.
Matrix methods may be used to develop standard equations and solutions for this technique of
least squares adjustment.
Consider again the example of the level net shown in Figure 6.1. The independent condition
equations, (reflecting the fact that height differences around closed level loops should sum to
zero and the condition between the known RL's of A and B), are
( l1 + v1 ) − ( l6 + v6 ) − ( l5 + v5 ) = 0
− ( l2 + v2 ) − ( l3 + v3 ) + ( l7 + v7 ) + ( l6 + v6 ) = 0
(6.13)
( l5 + v5 ) − ( l7 + v7 ) + ( l4 + v4 ) = 0
( l1 + v1 ) − ( l2 + v2 ) = B − A
In matrix form these equations are

⎡ 1   0   0   0  −1  −1   0 ⎤
⎢ 0  −1  −1   0   0   1   1 ⎥  [ l1+v1  l2+v2  l3+v3  l4+v4  l5+v5  l6+v6  l7+v7 ]ᵀ = [ 0   0   0   B−A ]ᵀ     (6.14)
⎢ 0   0   0   1   1   0  −1 ⎥
⎣ 1  −1   0   0   0   0   0 ⎦
or Al + Av = d (6.15)
Av = f (6.16)
where f = d − Al (6.17)
and
n is the number of measurements or observations,
n0 is the minimum number of observations required,
r = n − n0 is the number of redundant observations (equal to the number of condition equations),
v is an (n,1) vector of residuals,
l is the (n,1) vector of observations,
A is an (r,n) matrix of coefficients,
f is an (r,1) vector of numeric terms derived from the observations,
d is an (r,1) vector of constants. Note that in many least squares
problems the vector d is zero.
Now if each observation has an a priori estimate of its variance then the (n,n) weight matrix
of the observations W is known and the least squares function ϕ is
ϕ = the sum of the weighted squared residuals = Σ(k=1..n) wk vk², or in matrix form

ϕ = vᵀWv     (6.18)
Now ϕ is the function to be minimised but with the constraints imposed by the condition
equations (6.16). This is achieved by adding an (r,1) vector of Lagrange multipliers k and
forming a new function ϕ ′ .
ϕ ′ = v T Wv − 2k T ( Av − f ) (6.19)
ϕ′ is a minimum when its partial derivatives with respect to the unknowns v and k are equated to zero, i.e.,

∂ϕ′/∂k = −2vᵀAᵀ + 2fᵀ = 0ᵀ     (6.20)

∂ϕ′/∂v = 2vᵀW − 2kᵀA = 0ᵀ     (6.21)
Dividing by two, re-arranging and transposing equations (6.20) and (6.21) gives
Av = f (6.22)
Wv − A T k = 0 (6.23)
Note that equations (6.22) are the original condition equations and also that W = W T due to
symmetry.
Rearranging equations (6.23) gives an expression for the residuals

v = W⁻¹Aᵀk = QAᵀk     (6.24)

and substituting (6.24) into equations (6.22) gives

A(QAᵀk) = (AQAᵀ)k = f     (6.25)
The matrix AQA T is symmetric and of order (r,r) and equations (6.25) are often termed the
normal equations. The solution of the (r,1) vector of Lagrange multipliers k is
k = (AQAᵀ)⁻¹ f     (6.26)
Now the term AQAᵀ in equations (6.25) and (6.26) can be "simplified" if an equivalent set of observations le is considered, i.e.,

le = Al     (6.27)

Applying the law of propagation of variances to (6.27) gives the cofactor matrix of the equivalent observations

Qe = AQAᵀ     (6.28)

and

We = Qe⁻¹ = (AQAᵀ)⁻¹     (6.29)

so that the Lagrange multipliers may be written as

k = We f     (6.30)

After computing k from either (6.26) or (6.30) the residuals v are computed from (6.24) and the vector of adjusted observations l̂ is given by

l̂ = l + v     (6.31)
This is the standard matrix solution for least squares adjustment of observations only.
In this technique of least squares adjustment, the condition equations in matrix form are
Av = f (6.32)
with f = d − Al (6.33)
Similarly to Chapter 5, equation (6.33) can be expressed in a form similar to equation (3.23) as

f = −Al + d

and the general law of propagation of variances applied to give the cofactor matrix of the numeric terms f

Qff = AQAᵀ = Qe     (6.34)

Thus the cofactor matrix of f is also the cofactor matrix of the equivalent set of observations.
The solution "steps" in the least squares adjustment of observations only are set out above and
restated as
Qe = A Q AT
We = Q e−1
k = We f
v = Q AT k
l = l+v
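The solution steps above translate directly into a few lines of matrix code; the sketch below (Python with NumPy assumed, function name hypothetical) is one way of packaging them.

# Sketch of the standard matrix solution for adjustment of observations only,
# given the condition equations Av = f and the cofactor matrix Q = W^-1.
import numpy as np

def adjust_observations_only(A, Q, f, l):
    """Return adjusted observations, residuals and Lagrange multipliers."""
    Qe = A @ Q @ A.T            # cofactor matrix of equivalent observations
    We = np.linalg.inv(Qe)      # We = Qe^-1
    k = We @ f                  # Lagrange multipliers
    v = Q @ A.T @ k             # residuals
    l_hat = l + v               # adjusted observations
    return l_hat, v, k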
Applying the law of propagation of variances (remembering that cofactor and weight matrices are symmetric) gives the following cofactor matrices

Qkk = (We)Qff(We)ᵀ = We     (6.35)

Qvv = (QAᵀWe)Qff(QAᵀWe)ᵀ = QAᵀWeAQ     (6.36)

and, for the adjusted observations,

l̂ = l + v
  = l + QAᵀk
  = l + QAᵀWe f
  = l + QAᵀWe(d − Al)

l̂ = (I − QAᵀWeA)l + QAᵀWe d     (6.37)

Ql̂l̂ = (I − QAᵀWeA) Q (I − QAᵀWeA)ᵀ

which reduces to

Ql̂l̂ = Q − QAᵀWeAQ = Q − Qvv     (6.38)
Variance-covariance matrices for k, v and l̂ are obtained by multiplying the cofactor matrix by the variance factor σ0² – see equation (2.32).

σ0² = vᵀWv / r     (6.39)
where r is the number of redundant observations (the degrees of freedom). Remembering, for the method of observations only, the following matrix equations

Qe = AQAᵀ
We = Qe⁻¹
k = We f
v = QAᵀk

then

vᵀWv = (QAᵀk)ᵀ W (QAᵀk)
     = kᵀAQWQAᵀk
     = kᵀAQAᵀk
     = kᵀQe k
     = kᵀWe⁻¹k
     = fᵀWeWe⁻¹k

and

vᵀWv = fᵀk     (6.40)
The basic component of many surveys is a traverse whose bearings have been determined by
theodolite or total station observations and distances measured by EDM. If careful
observations are made with well maintained equipment, the measurements are usually free of
systematic errors and mistakes and the surveyor is left with small random errors which, in the
case of a closed traverse, reveal themselves as angular and linear misclosures. If the
misclosures are within acceptable limits, it is standard practice to remove the misclosures by
adjusting the original observations to make the traverse a mathematically correct figure. In
this section, only single closed traverses are considered and such traverses may begin and end
at different fixed points or close back on the starting point. Traverse networks, consisting of
two or more single traverses with common junction points, are not considered here; such
networks are usually adjusted by a method commonly known as Variation of Coordinates,
based on Least Squares Adjustment of Indirect Observations.
Bowditch's Rule and the Transit Rule, both of which adjust the lengths and bearings of traverse lines, and Crandall's method, which adjusts only the lengths of the traverse lines, are three popular adjustment methods that fail to meet the general guidelines above. Crandall's method, which is explained in detail in later sections, does however have mathematical rigour if it is assumed that the bearings of the traverse close and require no further adjustment.
Bowditch's Rule and the Transit Rule for adjusting single traverses are explained below by
applying the rules to adjust a four-sided polygon having an unusually large misclose. The
polygon, shown in Figure 6.3, does not reflect the usual misclosures associated with traverses
using modern surveying equipment.
Bowditch's Rule
Nathaniel Bowditch (1773-1838) was an American mathematician and astronomer (see
citation below). In 1808, in response to a prize offered by a correspondent in The Analyst 2 ,
Bowditch put forward a method of adjusting the misclose in a chain and compass survey
(bearings measured by magnetic compass and distances measured by surveyor's chain). His
method of adjustment was simple and became widely used. It is still used today for the
adjustment of a figure prior to the computation of the area, where the area-formula assumes a
closed mathematical figure.
Prior to the advent of programmable calculators and computers, Bowditch's Rule was often
used to adjust traverses that did not close due to the effects of random errors in the
measurement of bearings and distances. Its use was justified entirely by its simplicity and
whilst it had theoretical rigour – if the bearings of traverse lines were independent of each
other, as they are in compass surveys – it is incompatible with modern traversing techniques.
Bowditch's rule cannot take into account different measurement precisions of individual
traverse lines nor can it accommodate complicated networks of connecting traverses.
Nevertheless, due to its long history of use in the surveying profession, its simplicity and its
practical use in the computation of areas of figures that misclose, Bowditch's Rule is still
prominent in surveying textbooks and is a useful adjustment technique.
Bowditch, Nathaniel (b. March 26, 1773, Salem, Mass., U.S. – d. March 16, 1838, Boston, Mass., U.S.), self-
educated American mathematician and astronomer, author of the best book on navigation of his time, and
discoverer of the Bowditch curves, which have important applications in astronomy and physics. Between
1795 and 1799 Bowditch made four lengthy sea voyages, and in 1802 he was put in command of a merchant
vessel. Throughout that period he pursued his interest in mathematics. After investigating the accuracy of The
Practical Navigator, a work by the Englishman J.H. Moore, he produced a revised edition in 1799. His
additions became so numerous that in 1802 he published The New American Practical Navigator, based on
Moore's book, which was adopted by the U.S. Department of the Navy and went through some 60 editions.
Bowditch also wrote many scientific papers, one of which, on the motion of a pendulum swinging
simultaneously about two axes at right angles, described the so-called Bowditch curves (better known as the
Lissajous figures, after the man who later studied them in detail). Bowditch translated from the French and
updated the first four volumes of Pierre-Simon Laplace's monumental work on the gravitation of heavenly
bodies, Traité de mécanique céleste, more than doubling its size with his own commentaries. The resulting
work, Celestial Mechanics, was published in four volumes in 1829-39. Bowditch refused professorships at
several universities. He was president (1804-23) of the Essex Fire and Marine Insurance Company of Salem
and worked as an actuary (1823-38) for the Massachusetts Hospital Life Insurance Company of Boston. From
1829 until his death, he was president of the American Academy of Arts and Sciences. Copyright 1994-1999
Encyclopædia Britannica
2
The Analyst or Mathematical Museum was a journal of theoretical and applied mathematics. In Vol. I, No. II,
1808, Robert Patterson of Philadelphia posed a question on the adjustment of a traverse and offered a prize of
$10 for a solution; the editor, Dr Adrian, was appointed as the judge of submissions. Bowditch's solution was
published in Vol. I, No. IV, 1808, pp. 88-93 (Stoughton, H.W., 1974. 'The first method to adjust a traverse
based on statistical considerations', Surveying and Mapping, June 1974, pp. 145-49).
Bowditch's adjustment can best be explained by considering the case of plotting a figure
(using a protractor and scale ruler) given the bearing and distances of the sides.
Consider Figure 6.3, a plot that does not close, of a four-sided figure ABCD. The solid lines
AB, BC, CD and DE are the result of marking point A, plotting the bearing AB and then
scaling the distance AB to fix B. Then, from point B, plotting the bearing and distance BC to
fix C, then from C, plotting the bearing and distance CD to fix D and finally from D, plotting
the bearing and distance DA. However, due to plotting errors, the final line does not meet the
starting point, but instead finishes at E. The distance EA is the linear misclose d, due to
plotting errors, i.e., errors in protracting bearings and scaling distances.
Figure 6.3 Graphical (Bowditch) adjustment of the figure ABCD: the plotted traverse ABCDE fails to close by the misclose d (= EA); lines parallel to the misclose direction xx′ are drawn through B, C and D, and a right-angled triangle of base L (the total plotted length A–B–C–D–E) and height d is used to proportion the misclose, giving the adjusted figure AB′C′D′A.
To adjust the figure ABCDE to remove the misclose d the following procedure can be used.
1. Draw lines parallel to the line xx' (the misclose bearing) through points B, C and D.
2. Draw a right-angled triangle AEA'. The base of the triangle is L, equal to the sum of the
lengths of the sides and the height is the linear misclose d.
3. Along the base of the triangle, mark in proportion to the total length L, the distances AB,
BC and CD. These will be the points B, C and D.
4. Draw vertical lines from B, C and D intersecting the hypotenuse of the triangle at B', C'
and D'. These distances are then marked off along the parallel lines of the main figure.
5. The adjusted figure is AB'C'D'A.
This adjustment is a graphical demonstration of Bowditch's Rule; i.e., the linear misclose d is
apportioned to individual sides in the ratio of the length of the side to the total length of all the
sides in the direction of the misclose bearing.
Suppose the errors in the individual lines each have easting and northing components, say dEB, dNB, dEC, dNC and dED, dND; then the east misclose dEm = dEB + dEC + dED and the north misclose dNm = dNB + dNC + dND. Thus, we may express Bowditch's Rule for calculating adjustments dEk, dNk to the individual easting and northing components ΔEk, ΔNk of line k of a traverse whose total length is L as

dEk = distk (dEm / L)
dNk = distk (dNm / L)     (6.41)
As an example of a Bowditch adjustment, Table 6.1 shows the bearings and distances of the
polygon in Figure 6.3.
The east and north misclosures are dEm = 3.173 and dNm = 8.181, and the total length L, equal to the sum of the four sides, is L = 51.53 + 53.86 + 36.31 + 54.71 = 196.41.
The corrections to the easting and northing components of the line CD are

dE = 36.31 × 3.173 / 196.41 = 0.587
dN = 36.31 × 8.181 / 196.41 = 1.512
Note: (i) Easting and northing misclosures dEm and dN m used in equations (6.41) have
opposite signs to the misclosures in the tabulation,
(ii) The sums of the corrections are equal and of opposite sign to the misclosures
and
(iii) The sums of the adjusted easting and northing components are zero.
Table 6.1 Bowditch's Rule adjustment

Line      Bearing     Dist      ΔE        ΔN        dE       dN       adj ΔE     adj ΔN
AB        52° 31′     51.53     40.891    31.358    0.832    2.146    41.723     33.504
BC        152° 21′    53.86     24.995   −47.709    0.870    2.243    25.865    −45.466
CD        225° 30′    36.31    −25.898   −25.450    0.587    1.512   −25.311    −23.938
DA        307° 55′    54.71    −43.161    33.620    0.884    2.280   −42.277     35.900
misclose                        −3.173    −8.181    3.173    8.181     0.000      0.000
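A short computational sketch of Bowditch's Rule (6.41) is given below (Python assumed); applied to the bearings and distances of Table 6.1 it should reproduce the tabulated corrections and adjusted components.

# Sketch: Bowditch's Rule applied to the polygon of Figure 6.3 / Table 6.1.
import math

legs = [("AB", 52 + 31/60, 51.53), ("BC", 152 + 21/60, 53.86),
        ("CD", 225 + 30/60, 36.31), ("DA", 307 + 55/60, 54.71)]

dE = [d * math.sin(math.radians(b)) for _, b, d in legs]   # east components (ΔE)
dN = [d * math.cos(math.radians(b)) for _, b, d in legs]   # north components (ΔN)
L = sum(d for _, _, d in legs)                             # total length of the figure
dEm, dNm = -sum(dE), -sum(dN)                              # misclosures (opposite sign to sums)

for (name, _, dist), e, n in zip(legs, dE, dN):
    cE = dist * dEm / L                                    # corrections, equation (6.41)
    cN = dist * dNm / L
    print(f"{name}  dE={cE:+.3f}  dN={cN:+.3f}  adj_dE={e + cE:+.3f}  adj_dN={n + cN:+.3f}")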
Transit Rule
The Transit Rule has no theoretical basis related to surveying instruments or measuring
techniques. Its only justification is its mathematical simplicity, which is no longer a valid
argument for the method in this day of pocket computers. The Transit Rule for calculating
adjustments dEk, dNk to the individual easting and northing components ΔEk, ΔNk of line k of a traverse is

dEk = |ΔEk| (dEm / Σ(j=1..n)|ΔEj|)
dNk = |ΔNk| (dNm / Σ(j=1..n)|ΔNj|)     (6.42)

where |ΔEk| is the absolute value of the east component of the kth traverse leg and Σ(j=1..n)|ΔEj| is the sum of the absolute values of the east components of the traverse legs, and similarly for |ΔNk| and Σ(j=1..n)|ΔNj|.
As an example of a Transit Rule adjustment, Table 6.2 shows the bearings and distances of
the polygon in Figure 6.3. The east and north misclosures are dEm = 3.173 and dN m = 8.181 ,
and the sums of the absolute values of the east and north components of the traverse legs are
Σ(j=1..n)|ΔEj| = 134.945   and   Σ(j=1..n)|ΔNj| = 138.137

The corrections to the easting and northing components of the line CD are

dE = 25.898 × 3.173 / 134.945 = 0.609
dN = 25.450 × 8.181 / 138.137 = 1.507
Note: (i) Easting and northing misclosures dEm and dN m used in equations (6.42) have
opposite signs to the misclosures in the tabulation,
(ii) The sums of the corrections are equal and of opposite sign to the misclosures
and
(iii) The sums of the adjusted easting and northing components are zero.
Table 6.2 Transit Rule adjustment

Line      Bearing     Dist      ΔE        ΔN        dE       dN       adj ΔE     adj ΔN
AB        52° 31′     51.53     40.891    31.358    0.961    1.857    41.852     33.215
BC        152° 21′    53.86     24.995   −47.709    0.588    2.826    25.583    −44.883
CD        225° 30′    36.31    −25.898   −25.450    0.609    1.507   −25.289    −23.943
DA        307° 55′    54.71    −43.161    33.620    1.015    1.991   −42.146     35.611
misclose                        −3.173    −8.181    3.173    8.181     0.000      0.000
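A corresponding sketch of the Transit Rule (6.42) is given below (Python assumed, same polygon data); it should reproduce the corrections of Table 6.2.

# Sketch: Transit Rule applied to the polygon of Figure 6.3 / Table 6.2.
import math

legs = [("AB", 52 + 31/60, 51.53), ("BC", 152 + 21/60, 53.86),
        ("CD", 225 + 30/60, 36.31), ("DA", 307 + 55/60, 54.71)]
dE = [d * math.sin(math.radians(b)) for _, b, d in legs]   # east components (ΔE)
dN = [d * math.cos(math.radians(b)) for _, b, d in legs]   # north components (ΔN)
dEm, dNm = -sum(dE), -sum(dN)                              # misclosures
sumE, sumN = sum(map(abs, dE)), sum(map(abs, dN))          # sums of absolute values

for (name, _, _), e, n in zip(legs, dE, dN):
    cE = abs(e) * dEm / sumE                               # corrections, equation (6.42)
    cN = abs(n) * dNm / sumN
    print(f"{name}  dE={cE:+.3f}  dN={cN:+.3f}")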
Suppose that the angles of a traverse – either beginning and ending at the same point or
between two known points with starting and closing known bearings – have been adjusted so
that the traverse has a perfect angular closure and the resulting bearings are considered as
correct, or adjusted. We call this a closed traverse. A mathematical closure, using the
adjusted bearings and measured distances, will in all probability, reveal a linear misclose, i.e.,
the sums of the east and north components of the traverse legs will differ from zero (in the
case of a traverse beginning and ending at the same point) or certain known values (in the
case of a traverse between known points). Crandall's method, which employs the least
squares principle, can be used to compute corrections to the measured distances to make the
traverse close mathematically. The method was first set out in the textbook Geodesy and
Least Squares by Charles L. Crandall, Professor of Railroad Engineering and Geodesy,
Cornell University, Ithaca, New York, U.S.A. and published by John Wiley & Sons, New
York, 1906.
Figure 6.4 Schematic diagram of a traverse with legs 1, 2, 3, …, n, bearings φ1, φ2, φ3, …, distances s1, s2, s3, … and east components ΔE1, ΔE2, …
Figure 6.4 shows a schematic diagram of a traverse of k = 1, 2, … , n legs where φk , sk are the
adjusted bearing and measured distance respectively of the kth leg. The east and north
components of each traverse leg are ΔEk = sk sin φk and ΔN k = sk cos φk respectively. If the
adjusted distance of the kth traverse leg is ( sk + vk ) where vk is the residual (a small unknown
correction) then the two conditions that must be fulfilled by the adjusted bearings and
adjusted distances in a closed traverse are

Σ(k=1..n)(sk + vk) sin φk = DE     (6.43)

Σ(k=1..n)(sk + vk) cos φk = DN     (6.44)

where DE = E_END − E_START and DN = N_END − N_START are the east and north coordinate differences respectively between the terminal points of the traverse. Note that in a traverse beginning and ending at the same point DE and DN will both be zero.
SE = s1 sin φ1 + s2 sin φ2 + ⋯ + sn sin φn = Σ(k=1..n) ΔEk
SN = s1 cos φ1 + s2 cos φ2 + ⋯ + sn cos φn = Σ(k=1..n) ΔNk     (6.45)

where SE, SN are the sums of the east and north components, ΔEk, ΔNk respectively, of the k = 1, 2, …, n traverse legs.
In matrix form the conditions on the residuals are

⎡ sin φ1   sin φ2   sin φ3   ⋯   sin φn ⎤  [ v1  v2  v3  ⋯  vn ]ᵀ = ⎡ DE − SE ⎤     (6.46)
⎣ cos φ1   cos φ2   cos φ3   ⋯   cos φn ⎦                           ⎣ DN − SN ⎦

or Av = f
The solution for the vector of residuals v is given by equations (6.24) and (6.26) re-stated
again as
v = W⁻¹Aᵀk = QAᵀk
k = (AQAᵀ)⁻¹ f     (6.47)
where k is the vector of Lagrange multipliers, Q = W −1 is the cofactor matrix and W is the
weight matrix, A is a coefficient matrix containing sines and cosines of traverse bearings and
f is a vector containing the negative sums of the east and north components of the traverse
legs.
In Crandall's method the weight of each measured distance is taken as the reciprocal of its length, so the weight and cofactor matrices are

W = diag(w1, w2, w3, …, wn) = diag(1/s1, 1/s2, 1/s3, …, 1/sn)

Q = W⁻¹ = diag(s1, s2, s3, …, sn)
With these matrices

AQAᵀ = ⎡ Σ sin φk ΔEk    Σ sin φk ΔNk ⎤
       ⎣ Σ cos φk ΔEk    Σ cos φk ΔNk ⎦

Now, since sin φk = ΔEk/sk and cos φk = ΔNk/sk, then AQAᵀ can be written as

AQAᵀ = ⎡ Σ (ΔEk)²/sk     Σ ΔEk ΔNk/sk ⎤  =  ⎡ a   c ⎤     (6.49)
       ⎣ Σ ΔEk ΔNk/sk    Σ (ΔNk)²/sk  ⎦     ⎣ c   b ⎦

(all summations being for k = 1 to n)
and

(AQAᵀ)⁻¹ = 1/(ab − c²)  ⎡  b   −c ⎤
                         ⎣ −c    a ⎦

so that, from k = (AQAᵀ)⁻¹f, the Lagrange multipliers are

k1 = [b(DE − SE) − c(DN − SN)] / (ab − c²)
k2 = [a(DN − SN) − c(DE − SE)] / (ab − c²)     (6.50)

and, from v = QAᵀk, the residual (correction) to each measured distance is

vk = sk(k1 sin φk + k2 cos φk) = k1 ΔEk + k2 ΔNk     (6.51)
Figure 6.5 shows a closed traverse between stations A, B, C, D and E. The linear misclose
(bearing and distance) of the traverse is 222º 57' 31" 0.2340 and the components of the
misclose are −0.1594 m east and −0.1712 m north. It is required to adjust the distances
using Crandall's method.
Figure 6.5 Closed traverse between stations ABCDE
The adjusted bearings and measured distances and the traverse leg components are shown in
Table 6.3 below. S E and S N are the summations of east and north components and since this
traverse begins and ends at the same point then DE and DN will both be zero.
Table 6.4 shows the functions of the components for each line and their summations.
Line    (ΔEk)²/sk        (ΔNk)²/sk        ΔEk ΔNk/sk
1        58.069322        69.400678        63.482677
2        85.212376         1.217624       −10.186101
3        74.768213        87.601787       −80.931014
4        57.049491        41.370509        48.581545
5       186.525721        43.074279       −89.635154
sum     a = 461.625124    b = 242.664876   c = −68.688048
The Lagrange multipliers k1 and k2 are computed from equations (6.50) using a, b, c from Table 6.4 and SE, SN from Table 6.3; since this traverse begins and ends at the same point, DE and DN are both zero.

a = Σ (ΔEk)²/sk = 461.625124       DE − SE = −0.159438       k1 = −4.7018E−04
b = Σ (ΔNk)²/sk = 242.664876       DN − SN = −0.171224       k2 = −8.3868E−04
c = Σ ΔEk ΔNk/sk = −68.688048
Table 6.5 shows the original traverse data, the residuals and adjusted traverse distances.
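The computations of Crandall's method are easily programmed; the sketch below (Python with NumPy assumed, function name hypothetical) forms a, b and c of equation (6.49), the multipliers of equation (6.50) and the distance residuals from v = QAᵀk. With the bearings and distances of Table 6.3 it should reproduce the multipliers k1 ≈ −4.70E−04 and k2 ≈ −8.39E−04 computed above.

# Sketch of Crandall's adjustment of traverse distances.
import numpy as np

def crandall(phi, s, dE_mis, dN_mis):
    """phi: adjusted bearings (radians); s: measured distances;
    dE_mis = DE - SE and dN_mis = DN - SN are the coordinate misclosures."""
    phi, s = np.asarray(phi, float), np.asarray(s, float)
    dE, dN = s * np.sin(phi), s * np.cos(phi)      # leg components
    a = np.sum(dE**2 / s)                          # elements of AQA^T, equation (6.49)
    b = np.sum(dN**2 / s)
    c = np.sum(dE * dN / s)
    det = a * b - c * c
    k1 = (b * dE_mis - c * dN_mis) / det           # Lagrange multipliers, equation (6.50)
    k2 = (a * dN_mis - c * dE_mis) / det
    v = k1 * dE + k2 * dN                          # residuals v_k = k1*dE_k + k2*dN_k
    return v, k1, k2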
"fixed stations" whose east and north coordinates are known and P2 , P3 , P4 ,… , Pn −1 are
"floating stations" whose coordinates are to be determined from the traverse angles α and
distances s. The starting bearing φ0 and the finishing bearing φn are known.
Figure 6.6 Traverse between fixed stations P1 and Pn with fixed orienting bearings at the terminals, measured angles α1, α2, …, αn and measured distances s1, s2, …, sn−1

For such a traverse to be mathematically consistent:
(i) The starting bearing φ0 plus all the measured angles should equal the known
finishing bearing φn ,
(ii) The starting east coordinate plus all the east components of the traverse legs
should equal the known east coordinate at the end point and
(iii) The starting north coordinate plus all the north components of the traverse legs
should equal the known north coordinate at the end point.
These conditions apply to all single traverses whether they start and end at different fixed
points or close back on the starting point and can be expressed mathematically as
φ0 + α1 + α2 + α3 + ⋯ + αn = φn
E1 + ΔE1 + ΔE2 + ΔE3 + ⋯ + ΔEn−1 = En     (6.52)
N1 + ΔN1 + ΔN2 + ΔN3 + ⋯ + ΔNn−1 = Nn
Traverses will generally misclose due to the small random errors in the angles (derived from
the measured directions) and the measured distances. To make the traverse mathematically
correct, small corrections must be applied to the measurements to give adjusted quantities.
These adjusted quantities are:
s = s′ + vs
α = α′ + vα

where s and α are the adjusted distance and angle respectively, s′ and α′ are the measured distance and angle, and vs and vα are small corrections. Replacing the adjusted quantities with
measurements and corrections allows the first member of equations (6.52) to be expressed as
φ0 + (α1 + vα1) + (α2 + vα2) + (α3 + vα3) + ⋯ + (αn + vαn) = φn
and summing the measured angles and rearranging gives a simple expression for the summation of the corrections to the measured angles as

vα1 + vα2 + vα3 + ⋯ + vαn = f1     (6.53)

where

f1 = φn − (φ0 + Σ(k=1..n) α′k) = φn − φ′n     (6.54)

f1 is the angular misclose in the traverse, and equations (6.53) and (6.54) simply state that the sum of the corrections to the measured angles is equal to the angular misclose.
The second and third members of equations (6.52) can also be expressed as a summation of
corrections by considering the following
s = s′ + vs and φ = φ ′ + vφ
where φ′ and vφ are the "measured" bearing and its correction respectively; hence the adjusted components of a traverse leg are

ΔE = (s′ + vs) sin(φ′ + vφ)
ΔN = (s′ + vs) cos(φ′ + vφ)

Using the trigonometric expansions for sin(A + B) and cos(A + B), the approximations sin vφ ≈ vφ and cos vφ ≈ 1 (vφ being a small angle in radians), and noting that since vs and vφ are both small their product vs vφ ≈ 0, we obtain ΔE ≈ s′ sin φ′ + vφ s′ cos φ′ + vs sin φ′ and ΔN ≈ s′ cos φ′ − vφ s′ sin φ′ + vs cos φ′.
Finally, the east and north components of a traverse leg computed using the measured
quantities are ΔE ′ = s′ sin φ ′ and ΔN ′ = s′ cos φ ′ , and we may write
ΔE = ΔE ′ + vφ ΔN ′ + vs sin φ ′
(6.55)
ΔN = ΔN ′ − vφ ΔE ′ + vs cos φ ′
Substituting equations (6.55) into the second and third members of equations (6.52) gives
E1 + (ΔE′1 + vφ1 ΔN′1 + vs1 sin φ′1) + (ΔE′2 + vφ2 ΔN′2 + vs2 sin φ′2) + ⋯ + (ΔE′n−1 + vφn−1 ΔN′n−1 + vsn−1 sin φ′n−1) = En

N1 + (ΔN′1 − vφ1 ΔE′1 + vs1 cos φ′1) + (ΔN′2 − vφ2 ΔE′2 + vs2 cos φ′2) + ⋯ + (ΔN′n−1 − vφn−1 ΔE′n−1 + vsn−1 cos φ′n−1) = Nn
Letting the misclose in the east and north coordinates be
f2 = En − {E1 + Σ(k=1..n−1) ΔE′k} = En − E′n
f3 = Nn − {N1 + Σ(k=1..n−1) ΔN′k} = Nn − N′n     (6.56)
and recognising that vφ1 = vα1, vφ2 = vα1 + vα2, vφ3 = vα1 + vα2 + vα3, etc., and vφn−1 = Σ(k=1..n−1) vαk, then we may write

vα1 ΔN′1 + vs1 sin φ′1 + (vα1 + vα2)ΔN′2 + vs2 sin φ′2 + (vα1 + vα2 + vα3)ΔN′3 + vs3 sin φ′3 + ⋯ + (Σ(k=1..n−1) vαk)ΔN′n−1 + vsn−1 sin φ′n−1 = f2

−vα1 ΔE′1 + vs1 cos φ′1 − (vα1 + vα2)ΔE′2 + vs2 cos φ′2 − (vα1 + vα2 + vα3)ΔE′3 + vs3 cos φ′3 − ⋯ − (Σ(k=1..n−1) vαk)ΔE′n−1 + vsn−1 cos φ′n−1 = f3
Gathering together the coefficients of vα1, vα2, vα3, etc. and rearranging gives

sin φ′1 vs1 + sin φ′2 vs2 + ⋯ + sin φ′n−1 vsn−1 + (N′n − N1)vα1 + (N′n − N′2)vα2 + (N′n − N′3)vα3 + ⋯ + (N′n − N′n−1)vαn−1 = f2     (6.57)

cos φ′1 vs1 + cos φ′2 vs2 + ⋯ + cos φ′n−1 vsn−1 − (E′n − E1)vα1 − (E′n − E′2)vα2 − (E′n − E′3)vα3 − ⋯ − (E′n − E′n−1)vαn−1 = f3     (6.58)
Equations (6.53), (6.57) and (6.58) are the three equations that relate corrections to angles and
distances, vα and vs respectively to angular and coordinate misclosures f1 , f 2 and f 3 given by
equations (6.54) and (6.56). In equation (6.53) the coefficients of corrections to angles are all
unity, whilst in equations (6.57) and (6.58) the coefficients of the corrections are sines and
cosines of bearings and coordinate differences derived from the measurements. Equations
(6.53), (6.57) and (6.58) are applicable to any single closed traverse.
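As a computational sketch (Python with NumPy assumed, function name hypothetical), the three condition equations can be assembled and solved as follows; for simplicity all quantities are kept in metres and radians rather than the centimetres and seconds of arc used in the worked example that follows.

# Sketch: traverse adjustment of observations only via (6.53), (6.57), (6.58).
import numpy as np

def traverse_adjustment(phi, E, N, f, var_ang, var_dist):
    """phi: bearings of the legs derived from the measurements (radians);
    E, N: coordinates of the traverse points computed from the measurements,
    E[0], N[0] the fixed start point, E[-1], N[-1] the computed closing point;
    f = [f1, f2, f3]: angular (radians) and coordinate (metres) misclosures;
    var_ang, var_dist: a priori variances of angles (rad^2) and distances (m^2)."""
    n_ang, n_dist = len(var_ang), len(var_dist)        # normally n_ang = n_dist + 1
    A = np.zeros((3, n_ang + n_dist))
    A[0, :n_ang] = 1.0                                  # equation (6.53): unit coefficients
    for k in range(n_dist):                             # one leg per measured distance
        A[1, n_ang + k] = np.sin(phi[k])                # (6.57): distance residual terms
        A[2, n_ang + k] = np.cos(phi[k])                # (6.58): distance residual terms
        A[1, k] = N[-1] - N[k]                          # (6.57): angle residual terms
        A[2, k] = -(E[-1] - E[k])                       # (6.58): angle residual terms
    Q = np.diag(np.concatenate([var_ang, var_dist]))    # cofactor matrix (variances)
    k_vec = np.linalg.solve(A @ Q @ A.T, np.asarray(f, float))
    v = Q @ A.T @ k_vec                                 # residuals v = Q A^T k
    return v[:n_ang], v[n_ang:]                         # angle residuals, distance residuals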
Three types of single closed traverses may be distinguished:

Type I    Traverses that begin and end at different fixed points with fixed orienting bearings at the terminal points. Figure 6.7(a).
Type II   Traverses that begin and end at the same point with a single fixed orienting bearing. Figure 6.7(b).
Type III  Traverses that begin and end at the same point with a fixed datum bearing. Figure 6.7(c).
Figure 6.7(a) Type I traverse     Figure 6.7(b) Type II traverse     Figure 6.7(c) Type III traverse
Figures 6.7(a), 6.7(b) and 6.7(c) show three types of closed traverses. In each case, the traverse consists of four (4) distances s1 to s4 and five (5) angles α1 to α5 between the traverse points P1 to P5.

In Figures 6.7(a) and 6.7(b) the bearing of the traverse line P1 → P2 is found by adding the observed angle α1 to the fixed bearing φ0. In both of these traverses five angles must be observed to "close" the traverse.

In Figure 6.7(c) the bearing of the traverse line P1 → P2 is fixed and only four angles need be observed to close the traverse. The angle α1 at P1, clockwise from north to P2, is the bearing of the line P1 → P2 and is not an observed quantity. The steps in the adjustment are:
(i) Calculate the coordinates of the traverse points by using the observed bearings and
distances beginning at point P1 .
(ii) Calculate the angular and coordinate misclosures. In each case, the misclose is the
fixed value minus the observed or calculated value. These three values are the
elements f1 , f 2 and f 3 in the vector of numeric terms f
(iii) Calculate the coefficients of the correction (or residuals) in equations (6.53), (6.57) and
(6.58). These coefficients are either zero or unity for equation (6.53), or sines and
cosines of observed bearings together with coordinate differences in equations (6.57)
and (6.58). These values are the elements of the coefficient matrix A
(iv) Assign precisions (estimated standard deviations squared) of the observations. These
will be the diagonal elements of the cofactor matrix Q
Note: In Type III traverses where the bearing P1 → P2 is fixed, the angle α1 (which is
not observed) is assigned a variance (standard deviation squared) of zero.
(v) Form a set of three(3) normal equations ( AQA T ) k = f
(vi) Solve the normal equations for the three (3) Lagrange multipliers k1, k2 and k3, which are the elements of the vector k, from k = (AQAᵀ)⁻¹ f, and then compute the vector of residuals from v = QAᵀk. The residuals are the corrections to the measured angles and distances.
Figure 6.8 is a schematic diagram of a traverse run between two fixed stations A and B and
oriented at both ends by angular observations to a third fixed station C.
The bearings of traverse lines shown on the diagram, unless otherwise indicated, are called
"observed" bearings and have been derived from the measured angles (which have been
derived from observed theodolite directions) and the fixed bearing AC. The difference
between the observed and fixed bearings of the line BC represents the angular misclose. The
coordinates of the traverse points D, E and F have been calculated using the observed
bearings and distances and the fixed coordinates of A. The difference between the observed
and fixed coordinates at B represents the coordinate misclosures.
In this example estimated standard deviations of measured angles α are sα = 5′′ and for
measured distances s are ss = 10mm + 15ppm where ppm is parts per million.
Figure 6.8 Traverse diagram showing field measurements, derived values and fixed values.
From equations (6.54) and (6.56) the angular and coordinate misclosures are the elements
f1 , f 2 and f 3 of the vector of numeric terms f. These misclosures may be characterised as
angular misclose:  f1 = φn − φ′n = −15″

east misclose:  f2 = En − E′n = 6843.085 − 6843.030 = 0.055 m = 5.5 cm

north misclose:  f3 = Nn − N′n = 7154.700 − 7154.779 = −0.079 m = −7.9 cm

vector of numeric terms:  f = [ −15   5.50   −7.9 ]ᵀ   (sec, cm, cm)
Note that the units of the numeric terms are seconds of arc (sec) and centimetres (cm)
The first row of A contains coefficients of zero or unity from equation (6.53)
The second row of A contains the coefficients sin φ′k and (N′n − N′k)(100/ρ″) from equation (6.57), and the third row contains the coefficients cos φ′k and −(E′n − E′k)(100/ρ″) from equation (6.58). Note that the coefficients of the distance residuals are dimensionless quantities and the coefficients of the angle residuals have the dimensions cm/sec, where ρ″ = (180/π) × 3600 is the number of seconds in one radian and the factor 100 converts metres to centimetres. The equation Av = f is
⎡ 0        0        0        0        1        1        1        1        1 ⎤ ⎡ vs1 ⎤   ⎡ −15  ⎤
⎢ 0.9382   0.9309   0.2914   0.9147  −0.7860  −0.3829  −0.5658  −0.3065   0 ⎥ ⎢  ⋮  ⎥ = ⎢ 5.50 ⎥
⎣ −0.3462  0.3653  −0.9566  −0.4040  −2.3311  −1.2388  −0.7729  −0.6939   0 ⎦ ⎢ vs4 ⎥   ⎣ −7.90⎦
                                                                             ⎢ vα1 ⎥
                                                                             ⎢  ⋮  ⎥
                                                                             ⎣ vα5 ⎦

The first four elements of the residual vector are the distance residuals (in cm) and the last five are the angle residuals (in seconds). The estimates of the standard deviations of the observations associated with the columns of A are 4.6 cm, 2.5 cm, 1.8 cm and 3.3 cm for the four distances and 5″ for each of the five angles.
The observations are assumed independent, so the weight matrix W is diagonal with elements equal to the reciprocals of the variances, and the cofactor matrix Q = W⁻¹ is diagonal with elements equal to the variances (estimated standard deviations squared) of the observations, in the same order as the columns of A (the four distance variances followed by the five angle variances).

The normal equations (AQAᵀ)k = f may be conveniently formed by considering a diagonal matrix whose elements are the standard deviations (the "square root" of Q) and a scaled coefficient matrix Ā; each element of Ā is the original element of A multiplied by the estimate of the standard deviation associated with that element, and the normal equations are then given by (ĀĀᵀ)k = f where

Ā = ⎡ 0        0        0        0        5        5        5        5       5 ⎤
    ⎢ 4.3155   2.3272   0.5245   3.0186  −3.9300  −1.9145  −2.8288  −1.5324  0 ⎥
    ⎣ −1.5926  0.9134  −1.7219  −1.3333 −11.6555  −6.1939  −3.8644  −3.4696  0 ⎦
Step 4: Solve the normal equations (ĀĀᵀ)k = f for the vector of Lagrange multipliers k, giving

k = [ −0.3825   0.0738   −0.2906 ]ᵀ
Since the cofactor matrix Q is diagonal, the individual residuals can be calculated from

vj = s²j (a1j k1 + a2j k2 + a3j k3)     (6.59)

where aij are the elements of A and s²j is the variance of the jth observation. Exactly the same result can be obtained by using the estimates of the standard deviations sj together with the elements āij of the scaled matrix Ā

vj = sj (ā1j k1 + ā2j k2 + ā3j k3)     (6.60)
v = [ 3.59 cm   −0.23   0.97   2.01 cm  |  5.92″   −1.27   −4.99   −5.09   −9.56″ ]ᵀ

where the first four elements are the distance residuals (cm) and the last five are the angle residuals (seconds).
The residuals for the bearings are the cumulative residuals for the angles up to the particular
traverse line. They are
vφ = [ 5.92″   4.65″   −0.34″   −5.43″   −14.99″ ]ᵀ
Applying these residuals (or corrections) to the measured quantities gives the adjusted
traverse dimensions as
(Diagram: the adjusted traverse A–D–E–F–B showing the adjusted bearings and distances of each leg and the adjusted coordinates of the floating points D, E and F; the traverse now closes exactly on the fixed coordinates of B, 6843.085 E, 7154.700 N.)
In many surveying "problems" the solution depends upon the selection of a mathematical model suitable to the problem and the use of this model, together with the observations (or measurements), to obtain a solution.
For example, a surveyor is required to determine the location (the coordinates) of a point.
From this "unknown" point, they can see three known points (i.e., points of known
coordinates). Understanding geometric principles, the surveyor measures the directions to
these three known points with a theodolite, determines the two angles α and β between the
three lines and "solves the problem". In surveying parlance, this technique of solution of
position is known as a resection; the mathematical model is based on geometric principles and
the observations are the directions, from which the necessary angles are obtained for a
solution.
In many surveying problems, the observations exceed the necessary number required for a
unique solution. Again, using a resection as an example, consider the case where the surveyor
(at an unknown point) measures the directions to four known points. There are now multiple
solutions for the resection point, since the four directions give rise to three angles, exceeding
the minimum geometric requirements for a unique solution. That is, there is a redundancy in
the mathematical model. In this case of the resection, and other surveying problems where
there are redundant measurements, the method of least squares can be employed to obtain the
best estimate of the "unknowns".
Least squares (as a method of determining best estimates), depends upon the formation of sets
of observation equations and their solution. The normal techniques of solution of systems of
equations require that the sets of observation equations must be linear, i.e., "unknowns"
linearly related to measurements. This is not always the case. For example, in a resection,
the measurements, directions α ik from the unknown point Pi to known points Pk , are non-
linear functions of the coordinate differences (the unknowns).
αik + vik + zi = tan⁻¹[(Ek − Ei)/(Nk − Ni)]     (7.1)

where

αik are the observed directions from the resection point Pi to the known points Pk,
vik are the residuals (small corrections) associated with the observed directions,
zi is an orientation "constant", the bearing of the Reference Object (RO) for the set of observed directions, and
Ei, Ni and Ek, Nk are the east and north coordinates of Pi and Pk respectively.
and any system of equations in the form of (7.1) would be non-linear and could not be solved
by normal means. Consequently, whenever the equations in a mathematical model are non-
linear functions linking the measurements with the unknowns, some method of linearization
must be employed to obtain sets of linear equations.
The most common method of linearization is by using Taylor's theorem to represent the
function as a power series consisting of zero order terms, 1st order terms, 2nd order terms and
higher order terms. By choosing suitable approximations, second and higher-order terms can
be neglected, yielding a linear approximation to the function. This linear approximation of
the mathematical model can be used to form sets of linear equations, which can be solved by
normal means.
This theorem, due to the English mathematician Brook Taylor (1685–1731) enables the value
of a real function f ( x ) near a point x = a to be estimated from the values f ( a ) and the
derivatives of f ( x ) evaluated at x = a . Taylor's theorem also provides an estimate of the
error made in a polynomial approximation to a function. The Scottish mathematician Colin
Maclaurin (1698–1746) developed a special case of Taylor's theorem, which was named in his
honour, where the function f ( x ) is expanded about the origin x = a = 0 . The citations
below, from the Encyclopaedia Britannica give some historical information about Taylor and
Maclaurin.
Taylor, Brook (b. Aug. 18, 1685, Edmonton, Middlesex, Eng.– d. Dec. 29, 1731,
London), British mathematician noted for his contributions to the development of
calculus.
In 1708 Taylor produced a solution to the problem of the centre of oscillation. The
solution went unpublished until 1714, when his claim to priority was disputed by the
noted Swiss mathematician Johann Bernoulli. Taylor's Methodus incrementorum directa
et inversa (1715; "Direct and Indirect Methods of Incrementation") added to higher
mathematics a new branch now called the calculus of finite differences. Using this new
development, he was the first to express mathematically the movement of a vibrating
string on the basis of mechanical principles. Methodus also contained the celebrated
formula known as Taylor's theorem, the importance of which remained unrecognized
until 1772. At that time the French mathematician Joseph-Louis Lagrange realized its
importance and proclaimed it the basic principle of differential calculus.
A gifted artist, Taylor set forth in Linear Perspective (1715) the basic principles of
perspective. This work and his New Principles of Linear Perspective contained the first
general treatment of the principle of vanishing points. Taylor was elected a fellow of the
Royal Society of London in 1712 and in that same year sat on the committee for
adjudicating Sir Isaac Newton's and Gottfried Wilhelm Leibniz's conflicting claims of
priority in the invention of calculus.
Maclaurin, Colin (b. February 1698, Kilmodan, Argyllshire, Scot.–d. June 14, 1746,
Edinburgh), Scottish mathematician who developed and extended Sir Isaac Newton's
work in calculus, geometry, and gravitation. A child prodigy, he entered the University
of Glasgow at age 11. At the age of 19, he was elected professor of mathematics at
Marischal College, Aberdeen, and two years later he became a fellow of the Royal
Society of London. At this time he became acquainted with Newton. In his most
important work, Geometrica Organica; Sive Descriptio Linearum Curvarum Universalis
(1720; "Organic Geometry, with the Description of the Universal Linear Curves"),
Maclaurin developed several theorems similar to some in Newton's Principia,
introduced the method of generating conics (the circle, ellipse, hyperbola, and parabola)
that bears his name, and showed that certain types of curves (of the third and fourth
degree) can be described by the intersection of two movable angles. On the
recommendation of Newton, he was made professor of mathematics at the University of
Edinburgh in 1725. In 1740 he shared, with the mathematicians Leonhard Euler and
Daniel Bernoulli, the prize offered by the Académie des Sciences for an essay on tides.
His Treatise of Fluxions (1742) was written in reply to criticisms by George Berkeley of
England that Newton's calculus was based on faulty reasoning. In this essay he showed
that stable figures for a homogeneous rotating fluid mass are the ellipsoids of revolution,
later known as Maclaurin's ellipsoids. He also gave in his Fluxions, for the first time,
the correct theory for distinguishing between maxima and minima in general and pointed
out the importance of the distinction in the theory of the multiple points of curves. The
Maclaurin series, a special case of the Taylor series, was named in his honour. In 1745,
when Jacobites (supporters of the Stuart king James II and his descendants) were
marching on Edinburgh, Maclaurin took a prominent part in preparing trenches and
barricades for the city's defense. As soon as the rebel army captured Edinburgh,
Maclaurin fled to England until it was safe to return. The ordeal of his escape ruined his
health, and he died at age 48. Maclaurin's Account of Sir Isaac Newton's Philosophical
Discoveries was published posthumously, as was his Treatise of Algebra (1748). "De
Linearum Geometricarum Proprietatibus Generalibus tractatus" ("A Tract on the General
Properties of Geometrical Lines"), noted for its elegant geometric demonstrations, was
appended to his Algebra. Copyright 1994-1999 Encyclopædia Britannica
Taylor's theorem may be stated as follows: if f(x) and its derivatives are continuous near x = a, then

f(x) = f(a) + (x − a) f′(a) + (x − a)²/2! f″(a) + (x − a)³/3! f‴(a) + ⋯ + (x − a)ⁿ⁻¹/(n − 1)! f⁽ⁿ⁻¹⁾(a) + Rn     (7.2)

or, ignoring the remainder term Rn, in summation form

f(x) = Σ(k=0..n) (x − a)ᵏ/k! f⁽ᵏ⁾(a)     (7.3)
Other forms of Taylor's theorem may be obtained by a change of notation, for example: let
x = a + h , then f ( x ) = f ( a + h ) and x − a = h . Substitution into equation (7.2) gives
f(x) = f(a + h) = f(a) + h f′(a) + h²/2! f″(a) + h³/3! f‴(a) + ⋯ + hⁿ⁻¹/(n − 1)! f⁽ⁿ⁻¹⁾(a) + Rn     (7.4)
This may be a more convenient form of Taylor's theorem for a particular application.
Inspection of equations (7.2), (7.3) and (7.4) show that Taylor's theorem can be used to
expand a non-linear function (about a point) into a linear series. Expansions of this form, also
called Taylor's series, are a convergent power series approximating f ( x ) .
Say φ = f(x, y); then the Taylor series expansion of the function φ about x = a and y = b is

φ = f(a, b) + (x − a) ∂f/∂x + (y − b) ∂f/∂y
    + 1/2! {(x − a)² ∂²f/∂x² + (y − b)² ∂²f/∂y² + 2(x − a)(y − b) ∂²f/∂x∂y} + ⋯     (7.5)

where ∂f/∂x, ∂f/∂y, ∂²f/∂x², etc. are partial derivatives of the function φ evaluated at x = a and y = b.
Say φ = f(x, y, z); then the Taylor series expansion of the function φ about x = a, y = b and z = c is

φ = f(a, b, c) + (x − a) ∂f/∂x + (y − b) ∂f/∂y + (z − c) ∂f/∂z
    + 1/2! {(x − a)² ∂²f/∂x² + (y − b)² ∂²f/∂y² + (z − c)² ∂²f/∂z²
    + 2(x − a)(y − b) ∂²f/∂x∂y + 2(x − a)(z − c) ∂²f/∂x∂z + 2(y − b)(z − c) ∂²f/∂y∂z} + ⋯     (7.6)

where ∂f/∂x, ∂f/∂y, ∂f/∂z, ∂²f/∂x², etc. are partial derivatives evaluated at x = a, y = b and z = c.
Extensions to four or more variables follow a similar pattern. Equations (7.5) and (7.6) show
only terms up to the 2nd order; no remainder terms are shown.
In the Taylor expansions of the functions shown above, suppose that the variables x, y, z, … are each expressed as an approximate value plus a small correction, e.g. x = x⁰ + Δx, and the function is expanded about the approximate values. For a single variable

f(x) = f(x⁰ + Δx) = f(x⁰) + Δx df/dx + (Δx)²/2! d²f/dx² + (Δx)³/3! d³f/dx³ + ⋯

where the derivatives df/dx, d²f/dx², d³f/dx³, … are evaluated at the approximation x⁰. If the correction Δx is small, then (Δx)², (Δx)³, … will be exceedingly small and the higher order terms may be neglected, giving a linear approximation

For φ = f(x):    φ = f(x) ≈ f(x⁰) + Δx df/dx     (7.7)
Using similar reasoning, linear approximations can be written for functions of two and three
variables.
For φ = f(x, y):    φ = f(x, y) ≈ f(x⁰, y⁰) + Δx ∂f/∂x + Δy ∂f/∂y     (7.8)

For φ = f(x, y, z):    φ = f(x, y, z) ≈ f(x⁰, y⁰, z⁰) + Δx ∂f/∂x + Δy ∂f/∂y + Δz ∂f/∂z     (7.9)
Similar linear approximations can be written for functions of four or more variables. In
equations (7.7), (7.8) and (7.9) the derivatives are evaluated at the approximations x 0 , y 0 , z 0 .
In general, for a function of n variables

φ = f(x1, x2, x3, …, xn) ≈ f(x1⁰, x2⁰, x3⁰, …, xn⁰) + Δx1 ∂f/∂x1 + Δx2 ∂f/∂x2 + Δx3 ∂f/∂x3 + ⋯ + Δxn ∂f/∂xn     (7.10)

which may be written in matrix form, with j a (1,n) row vector of partial derivatives and Δx an (n,1) vector of corrections, as

φ = f(x) ≈ f(x⁰) + j Δx     (7.11)
Suppose this generalized form, equation (7.11), is extended to the general case of m variables
y1 , y2 , y3 , " ym and each variable yk is a function of a set of variables x1 , x2 , x3 , " xn i.e.,
y1 = f1(x1, x2, x3, …, xn)
y2 = f2(x1, x2, x3, …, xn)
⋮
ym = fm(x1, x2, x3, …, xn)

then the linear approximations are

y1 = y1⁰ + ∂y1/∂x1 Δx1 + ∂y1/∂x2 Δx2 + ⋯ + ∂y1/∂xn Δxn
y2 = y2⁰ + ∂y2/∂x1 Δx1 + ∂y2/∂x2 Δx2 + ⋯ + ∂y2/∂xn Δxn
⋮
ym = ym⁰ + ∂ym/∂x1 Δx1 + ∂ym/∂x2 Δx2 + ⋯ + ∂ym/∂xn Δxn     (7.12)
or, in matrix form,

y = y⁰ + J Δx     (7.13)

where
y⁰ is an (m,1) vector of approximate values of the functions, y⁰ = [y1⁰  y2⁰  ⋯  ym⁰]ᵀ,
Δx is an (n,1) vector of corrections to the approximate values, Δx = [Δx1  Δx2  ⋯  Δxn]ᵀ, and
J is the (m,n) matrix of partial derivatives ∂yi/∂xj (the Jacobian), evaluated at the approximate values.
Consider Figure 7.1. Pi is the instrument point and directions αik and distances sik have been
observed to stations P1, P2, P3, …, Pk. P1 is the Reference Object (RO) and the direction αi1 = 0° 00′ 00″. A bearing is assigned to the RO and bearings to all other stations may be
obtained by adding the observed directions to the bearing of the RO.
Figure 7.1 Directions αi2, αi3, …, αik observed at Pi to stations P2, P3, …, Pk with P1 the RO (αi1 = 0°); zi is the orientation constant (the bearing of the RO)

In Figure 7.1:
φik    bearing Pi to Pk
sik    distance Pi to Pk
Ei, Ni    coordinates of Pi
Ek, Nk    coordinates of Pk
From Figure 7.1 the bearings φik and distances sik are non-linear functions of the coordinates
of points Pi and Pk
φik = tan⁻¹[(Ek − Ei)/(Nk − Ni)]     (7.14)

sik = [(Ek − Ei)² + (Nk − Ni)²]^½     (7.15)

Linearizing with Taylor's theorem gives

φik = φik⁰ + ΔEk ∂φik/∂Ek + ΔNk ∂φik/∂Nk + ΔEi ∂φik/∂Ei + ΔNi ∂φik/∂Ni     (7.16)

sik = sik⁰ + ΔEk ∂sik/∂Ek + ΔNk ∂sik/∂Nk + ΔEi ∂sik/∂Ei + ΔNi ∂sik/∂Ni     (7.17)
where φik0 and sik0 are approximate bearings and distances respectively, obtained by
substituting the approximate coordinates Ek0 , N k0 , Ei0 , N i0 into equations (7.14) and (7.15).
The partial derivatives in equations (7.16) are evaluated in the following manner.

Using the relationships d(tan⁻¹u)/dx = [1/(1 + u²)] du/dx and d(u/v)/dx = (v du/dx − u dv/dx)/v²,

the partial derivative ∂φik/∂Ek is

∂φik/∂Ek = 1/[1 + ((Ek − Ei)/(Nk − Ni))²] × ∂/∂Ek[(Ek − Ei)/(Nk − Ni)]
         = (Nk − Ni)²/[(Nk − Ni)² + (Ek − Ei)²] × 1/(Nk − Ni)

giving

∂φik/∂Ek = (Nk − Ni)/sik² = cos φik / sik = bik     (7.18)

Similarly

∂φik/∂Nk = −(Ek − Ei)/sik² = −sin φik / sik = aik     (7.19)

∂φik/∂Ei = −(Nk − Ni)/sik² = −cos φik / sik = −bik     (7.20)

∂φik/∂Ni = (Ek − Ei)/sik² = sin φik / sik = −aik     (7.21)
The partial derivatives of equation (7.17) are evaluated in the following manner
The partial derivative ∂sik/∂Ek is

∂sik/∂Ek = ½[(Ek − Ei)² + (Nk − Ni)²]^(−½) × 2(Ek − Ei) = (Ek − Ei)/sik = sin φik = dik     (7.22)

Similarly

∂sik/∂Nk = (Nk − Ni)/sik = cos φik = cik     (7.23)

∂sik/∂Ei = −(Ek − Ei)/sik = −sin φik = −dik     (7.24)

∂sik/∂Ni = −(Nk − Ni)/sik = −cos φik = −cik     (7.25)
The observation equation for an observed direction is

αik + vik + zi = φik = tan⁻¹[(Ek − Ei)/(Nk − Ni)]     (7.26)

where vik are the residuals (small corrections) associated with observed directions. Using equation (7.16) together with the partial derivatives given in equations (7.18) to (7.21) gives a linear approximation of the observation equation for an observed direction

αik + vik + zi = aik ΔNk + bik ΔEk − aik ΔNi − bik ΔEi + φik⁰     (7.27)
The observation equation for an observed distance is

sik + vik = [(Ek − Ei)² + (Nk − Ni)²]^½     (7.28)

where vik are the residuals (small corrections) associated with observed distances. Using (7.17) together with the partial derivatives given in equations (7.22) to (7.25) gives a linear approximation of the observation equation for an observed distance

sik + vik = cik ΔNk + dik ΔEk − cik ΔNi − dik ΔEi + sik⁰     (7.29)

where cik = (Nk − Ni)/sik = cos φik and dik = (Ek − Ei)/sik = sin φik are the distance coefficients.
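The direction and distance coefficients are conveniently computed together from approximate coordinates; the sketch below (Python assumed, function name hypothetical) follows the sign convention of equations (7.18) to (7.25).

# Sketch: direction (a, b) and distance (c, d) coefficients from approximate
# coordinates of instrument point i and target point k.
import math

def coefficients(Ei, Ni, Ek, Nk):
    dE, dN = Ek - Ei, Nk - Ni
    s = math.hypot(dE, dN)                      # approximate distance s0_ik
    phi = math.atan2(dE, dN) % (2.0 * math.pi)  # approximate bearing phi0_ik
    a = -math.sin(phi) / s                      # a_ik = -sin(phi)/s, radians per metre
    b = math.cos(phi) / s                       # b_ik =  cos(phi)/s, radians per metre
    c = math.cos(phi)                           # c_ik, distance coefficient
    d = math.sin(phi)                           # d_ik, distance coefficient
    # to obtain a and b in sec/cm (s in cm), multiply by 3600*180/pi and use s in cm
    return phi, s, a, b, c, d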
Figure 7.2 shows a point P, whose coordinates are unknown, intersected by bearings from
stations A and B whose coordinates are known.
Coordinates:    A: 12273.910 E, 29612.310 N        B: 12875.270 E, 28679.600 N

Bearings:    φA = 81° 01′ 23″        φB = 34° 47′ 52″

Approximate coordinates of P:    13677 E, 29834 N

Figure 7.2 Bearing intersection of P from known stations A and B
The information given above can be used to compute the coordinates of P by using an
iterative technique employing linearized observation equations approximating the bearings
φ A and φ B . These observation equations [see equation (7.27)] have been derived using
Taylor's theorem.
In general, a bearing is a function of the coordinates of the ends of the line, i.e.,
φik = tan⁻¹[(Ek − Ei)/(Nk − Ni)] = f(Ek, Nk, Ei, Ni)     (7.30)
where subscripts i and k represent instrument and target respectively. In this example
(intersection) A and B are instrument points and are known and P is a target point and is
unknown hence
φik = f ( Ek , N k )
is a non-linear function of the variables Ek and N k only (the coordinates of P). Using
equations (7.26) and (7.27) with the modification ΔEi = ΔNi = 0, since the coordinates of the instrument points are known, gives

φik = aik ΔNk + bik ΔEk + φik⁰     (7.31)

where aik and bik are the direction coefficients and φik⁰ is an approximate bearing. Note that φik⁰ and the direction coefficients aik and bik are computed using the approximate coordinates of P.
Using equation (7.31), two equations for bearings φ A and φ B may be written as
φA = aA ΔNP + bA ΔEP + φA⁰
φB = aB ΔNP + bB ΔEP + φB⁰

or, in matrix form,

⎡ aA   bA ⎤ ⎡ ΔNP ⎤   ⎡ φA − φA⁰ ⎤
⎣ aB   bB ⎦ ⎣ ΔEP ⎦ = ⎣ φB − φB⁰ ⎦

or Cx = u, with solution

x = C⁻¹u
From the information given with Figure 7.2, the bearings φik⁰ and distances sik⁰ computed from the approximate coordinates of P give the numeric terms

uA = φA − φA⁰ = 81° 01′ 23″ − 81° 01′ 17.1″ = 5.9″
uB = φB − φB⁰ = 34° 47′ 52″ − 34° 46′ 47.8″ = 64.2″

In the equation Cx = u the corrections x = [ΔNP  ΔEP]ᵀ are required in centimetres and the numeric terms u are in seconds, so the elements of the coefficient matrix C – the direction coefficients, computed using the approximate coordinates of P and the distances sik⁰ in centimetres – will be in sec/cm to maintain consistency of units. The units of the elements of C⁻¹ are then cm/sec and the solution x = C⁻¹u is obtained in centimetres.

Adding the corrections ΔEP, ΔNP to the approximate coordinates gives the "new" approximate coordinates for P. A further iteration will show that the corrections to these values are less than 0.5 mm, hence the values could be regarded as exact.
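A sketch of the iterative solution is given below (Python with NumPy assumed). It works in radians and metres rather than the seconds and centimetres used above – the two are equivalent provided the units of C and u are consistent – and repeats the solution of Cx = u until the corrections become negligible.

# Sketch: iterative bearing intersection of P (Figure 7.2).
import numpy as np

A_stn = (12273.910, 29612.310)
B_stn = (12875.270, 28679.600)
obs = {A_stn: np.radians(81 + 1/60 + 23/3600),    # observed bearing from A
       B_stn: np.radians(34 + 47/60 + 52/3600)}   # observed bearing from B
Ep, Np = 13677.0, 29834.0                         # approximate coordinates of P

for _ in range(5):
    C, u = [], []
    for (Ei, Ni), phi_obs in obs.items():
        dE, dN = Ep - Ei, Np - Ni
        s2 = dE**2 + dN**2
        phi0 = np.arctan2(dE, dN) % (2 * np.pi)   # computed (approximate) bearing
        C.append([-dE / s2, dN / s2])             # direction coefficients [a, b], rad/m
        u.append(phi_obs - phi0)                  # numeric term (radians)
    dNp, dEp = np.linalg.solve(np.array(C), np.array(u))
    Np, Ep = Np + dNp, Ep + dEp
    if max(abs(dNp), abs(dEp)) < 0.0005:          # corrections below 0.5 mm
        break

print(f"P: {Ep:.3f} E  {Np:.3f} N")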
A by-product of a least squares adjustment is estimates of the variances and covariances of the adjusted quantities. These precision estimates, variances sE², sN² and covariance sEN, can be used to define a geometric figure
known as the Standard Error Ellipse, which is a useful graphical representation of the
precision of a position fix. Poor or "weak" fixes are indicated by narrow elongated ellipses
and good or "strong" position fixes are indicated by near circular ellipses.
Error ellipses may be computed for points before any observations are made provided the
approximate locations of points (fixed and floating) are known. Observations (directions,
bearings and distances) may be scaled from maps and diagrams and an approximate set of
normal equations formed. The inverse of the coefficient matrix N yields all the information
required for the computation of the parameters of the error ellipses. In such cases, error
ellipses are an important analysis tool for the surveyor in planning survey operations
Consider a point whose precision estimates, variances sE2 , sN2 and covariance sEN are known.
The variance in any other direction u may be calculated by considering the projection of E
and N onto the u-axis, which is rotated anti-clockwise from the E-axis by an angle φ
u = E cos φ + N sin φ = f(E, N)     (8.1)

Figure 8.1 Projection of E and N onto the u-axis, which makes an angle φ with the E-axis
Applying the law of propagation of variances to equation (8.1) gives an expression for
variance in the u-direction
su² = sE² (∂f/∂E)² + sN² (∂f/∂N)² + 2 sEN (∂f/∂E)(∂f/∂N)     (8.2)

The partial derivatives ∂f/∂E = cos φ, ∂f/∂N = sin φ are obtained from (8.1) to give an equation for the variance su² in a direction φ (positive anti-clockwise) from the E-axis

su² = sE² cos²φ + sN² sin²φ + 2 sEN sin φ cos φ     (8.3)
Equation (8.3) defines the pedal curve of the Standard Error Ellipse
Figure 8.2 The Standard Error Ellipse (semi-major axis a, semi-minor axis b, θ the angle from the E-axis to the major axis) and its pedal curve; A is a point on the ellipse, the tangent at A meets the normal through O at P, and OP = su² in the direction φ
In Figure 8.2, A is a point on an ellipse. The tangent to the ellipse at A intersects a normal to
the tangent passing through O at P. As A moves around the ellipse, the locus of all points P is
the pedal curve of the ellipse. The distance OP = su2 for the angle φ . The maximum and
minimum values of su2 define the directions and lengths of the axes of the ellipse and the
following section details the equations linking the variances sE², sN² and covariance sEN with the parameters of the Standard Error Ellipse.
Equation (8.3) has maximum and minimum values defining the lengths and directions of the
axes of the error ellipse. To determine these values from (8.3) the trigonometric identities
1 − cos 2φ = 2sin 2 φ , 1 + cos 2φ = 2 cos2 φ , sin 2φ = 2sin φ cos φ can be used to give
su² = ½ sE²(1 + cos 2φ) + ½ sN²(1 − cos 2φ) + sEN sin 2φ
    = ½(sE² + sN²) + ½(sE² − sN²) cos 2φ + sEN sin 2φ

Letting A = ½(sE² − sN²) and B = sEN, this expression has the general form

su² = ½(sE² + sN²) + A cos 2φ + B sin 2φ     (8.4)

which may be written as

su² = ½(sE² + sN²) + R cos(2φ − α)     (8.5)
Equating the coefficients of cos 2φ and sin 2φ in equations (8.4) and (8.5) gives R cos α = A
and R sin α = B from which we obtain
R = √(A² + B²)
  = √[¼(sE² − sN²)² + (sEN)²]
  = ½ √[(sE² − sN²)² + 4(sEN)²]
  = ½ W     (8.6)

where

W = √[(sE² − sN²)² + 4(sEN)²]     (8.7)

tan α = B/A = 2 sEN / (sE² − sN²)     (8.8)
su² has a maximum value when cos(2φ − α) = 1, i.e., cos(0) = 1, and a minimum value when cos(2φ − α) = −1, i.e., cos(π) = −1, or

su²(max) = ½(sE² + sN²) + R = ½(sE² + sN² + W)
su²(min) = ½(sE² + sN²) − R = ½(sE² + sN² − W)     (8.9)
Inspection of Figure 8.2 shows that the maximum and minimum values of su2 are in the
directions of the major and minor axes of the Standard Error Ellipse and the semi-axes lengths
are
a = √[½(sE² + sN² + W)]
b = √[½(sE² + sN² − W)]     (8.10)
The value of φ when su2 is a maximum is when ( 2φ − α ) = 0 , i.e., when α = 2φ thus from
equation (8.8), letting θ = φ when su2 is a maximum, the angle θ , measured anti-clockwise
from the E-axis to the major axis of the Standard Error Ellipse, is given by
tan 2θ = 2 sEN / (sE² − sN²)     (8.11)

Note that su² is a minimum when 2φ − α = π, i.e., when α = 2φ − π, and since tan(2φ − π) = tan 2φ, equation (8.8) gives exactly the same equation for the angle to the minor axis. Hence it is not possible to distinguish between the angles to the major or minor axes from (8.11) alone, and the ambiguity must be resolved by using equation (8.3).
Alternatively, the parameters of the Standard Error Ellipse can be determined from equation (8.3) by the methods outlined in Chapter 2 (Section 2.7.2 Least Squares Best Fit Ellipse). Consider equation (8.3) expressed as

f = f(φ) = sE² cos²φ + sN² sin²φ + 2 sEN sin φ cos φ     (8.12)

and the aim is to find the maximum and minimum values of f (the maximum and minimum variances) and the values of φ when these occur by investigating the first and second derivatives f′ and f″ respectively, i.e.,

f′ = (sN² − sE²) sin 2φ + 2 sEN cos 2φ
f″ = 2(sN² − sE²) cos 2φ − 4 sEN sin 2φ     (8.13)

Now the maximum or minimum value of f occurs when f′ = 0 and from the first member of (8.13) the value of φ is given by

tan 2φ = 2 sEN / (sE² − sN²)     (8.14)
But this value of φ could relate to either a maximum or a minimum value of f. So, from the second member of equations (8.13) with the value of 2φ from equation (8.14), this ambiguity can be resolved by determining the sign of the second derivative f″, since it is known that f is a maximum when f″ < 0 and a minimum when f″ > 0.

In the equation of the pedal curve of the Standard Error Ellipse given by equation (8.12), f(max) coincides with s²(max) and f(min) coincides with s²(min), so the angle θ (measured positive anti-clockwise) from the E-axis to the major axis of the ellipse (see Figure 8.2) is found from: if f″ < 0 then θ = φ, and if f″ > 0 then θ = φ − ½π.
Substituting φ = θ and φ = θ + ½π into equation (8.3) will give the maximum and minimum values
of the variance, the square roots of which are the lengths of the semi-axes a and b of the
Standard Error Ellipse.
For example, with sE² = 6, sN² = 2 and sEN = 1.2:

\[
W = \sqrt{\left(s_E^2 - s_N^2\right)^2 + 4\,s_{EN}^2} = \sqrt{4^2 + 4(1.2)^2} = 4.6648
\]
\[
a = \sqrt{\tfrac{1}{2}\left(s_E^2 + s_N^2 + W\right)} = 2.5164, \qquad
b = \sqrt{\tfrac{1}{2}\left(s_E^2 + s_N^2 - W\right)} = 1.2914
\]
The angle between the E-axis and the major axis (positive anti-clockwise), noting the quadrant
signs to determine the proper quadrant of 2θ:

\[
\tan 2\theta = \frac{2\,s_{EN}}{s_E^2 - s_N^2} = \frac{2(1.2)}{6 - 2} = \left(\frac{+}{+}\right)
\]

2θ = 30° 57′ 50″
θ = 15° 28′ 55″
Substituting the values φ = θ = 15° 28′ 55″ and φ = 90° + θ = 105° 28′ 55″ into equation (8.3)
gives su = 2.5164 and su = 1.2914 respectively, so θ = 15° 28′ 55″ is the angle (positive
anti-clockwise) between the E-axis and the major axis. Hence the bearing of the major axis is
90° − θ = 74° 31′ 05″.
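As a check of equations (8.7), (8.10) and (8.11), the computation above can be written as a short
function. The following is a minimal sketch in Python (not part of the original notes); the input
values are those of the example, sE² = 6, sN² = 2 and sEN = 1.2.

```python
import math

def error_ellipse(sE2, sN2, sEN):
    """Standard Error Ellipse parameters from the variances sE2, sN2 and
    covariance sEN, following equations (8.7), (8.10) and (8.11).
    Returns the semi-axes a, b and the angle theta (radians, positive
    anti-clockwise from the E-axis to the major axis)."""
    W = math.sqrt((sE2 - sN2) ** 2 + 4.0 * sEN ** 2)      # equation (8.7)
    a = math.sqrt(0.5 * (sE2 + sN2 + W))                  # equation (8.10)
    b = math.sqrt(0.5 * (sE2 + sN2 - W))
    theta = 0.5 * math.atan2(2.0 * sEN, sE2 - sN2)        # equation (8.11)
    return a, b, theta

# Values from the example above
a, b, theta = error_ellipse(6.0, 2.0, 1.2)
print(round(a, 4), round(b, 4))     # 2.5164  1.2914
print(math.degrees(theta))          # about 15.482 degrees (15° 28' 55")
```

Using atan2 for 2θ places the angle in the correct quadrant automatically, which resolves the
major/minor axis ambiguity discussed above.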
Alternatively, using the method of evaluating the second derivative we have from equation
(8.14)
\[
\tan 2\phi = \frac{2\,s_{EN}}{s_E^2 - s_N^2} = \frac{2(1.2)}{6 - 2} = \left(\frac{+}{+}\right)
\]

2φ = 30° 57′ 50″
φ = 15° 28′ 55″
Now, since f″ &lt; 0, θ = φ = 15° 28′ 55″ is the angle (positive anti-clockwise) from the E-axis to
the major axis of the ellipse. The bearing of the major axis is 90° − θ = 74° 31′ 05″.
The Standard Error Ellipse semi-axes lengths a and b are obtained from equation (8.3) with
φ = θ = 15° 28′ 55″ and φ = θ + ½π = 105° 28′ 55″ respectively, giving a = 2.5164 and b = 1.2914
as before.
[Figures 8.3 and 8.4: diagrams of two resections, showing the error ellipse at the resected point
and the observed stations A, B, C and D.]
In Resection 1 (Figure 8.3) the error ellipse indicates a strong position fix and the observed
stations are spread through an arc of approximately 200°.
In Resection 2 (Figure 8.4) the error ellipse indicates a poor position fix. The observed
stations lay in a small arc of approximately 35°.
[Figure: eight example error ellipses drawn with respect to E–N axes, illustrating different
combinations of sE², sN² and sEN and the orientation angle θ.]
Note: ρ_EN = s_EN / (s_E s_N) is the correlation coefficient, with −1 ≤ ρ_EN ≤ 1.
8.6. References
Mikhail, E.M., 1976. Observations and Least Squares, IEP–A Dun-Donnelley, New York.
Mikhail, E.M. and Gracie, G., 1981. Analysis and Adjustment of Survey Measurements,
Van Nostrand Reinhold Company, New York.
In the case of four or more observed directions to known points, the method of least squares
(least squares adjustment of indirect observations) may be employed to obtain the best
estimates of the coordinates of the resected point. This technique requires the formation of a
set of observation equations that yield "normal equations" that are solved for the best
estimates of the coordinates of the resected point. Owing to the nature of the observation
equation, which is a linearized approximation, the least squares solution process is iterative.
That is, approximate values are assumed, corrections are computed and the approximate values
updated, with the process repeated until the corrections to the approximate values become negligible. This
least squares technique is often called Variation of Coordinates.
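The iterative process just described can be sketched as a simple loop. This is only an outline,
in Python; form_observation_equations is a hypothetical user-supplied function that assembles the
coefficient matrix B and the vector of numeric terms f of the observation equations from the
current approximate values, and unit weights are assumed.

```python
import numpy as np

def variation_of_coordinates(x0, form_observation_equations, tol=0.05, max_iter=10):
    """Generic iterative least squares loop (Variation of Coordinates).
    x0                        : approximate parameter values, e.g. [N, E, z]
    form_observation_equations: hypothetical function returning (B, f) for the
                                current approximations, as in v + Bx = f
    tol                       : stop when all corrections are smaller than tol
                                (illustrative only; in practice the corrections
                                are in mixed units such as cm and seconds)"""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        B, f = form_observation_equations(x)
        dx = np.linalg.solve(B.T @ B, B.T @ f)   # corrections, unit weights assumed
        x = x + dx                               # update the approximate values
        if np.max(np.abs(dx)) < tol:             # corrections negligible: finished
            break
    return x
```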
In the case of four or more observed directions to known points, the method of least squares
may be employed to obtain the best estimates of the coordinates of the resected point. This
technique requires the formation of a set of observation equations; each equation based on a
linearized form of the following equation whose elements are shown in Figure 9.1. (see
Chapter 7, Section 7.3 for details regarding the linearization process using Taylor's theorem).
\[
\alpha_k + v_k + z = \tan^{-1}\left(\frac{E_k - E_P}{N_k - N_P}\right) \qquad (9.1)
\]
[Figure 9.1: resection geometry, showing the resected point P, the observed directions αk from P
to the known points Pk (with P1 the RO), the bearings φk, the distances sk and the orientation
constant z.]
where
αk are the observed directions from the resection point P to the known points Pk,
vk are the residuals (small corrections to the observed directions),
z is an orientation "constant"; the bearing of the Reference Object (RO) for the
set of observed directions,
Ek, Nk are the east and north coordinates of the known points, and
EP, NP are the east and north coordinates of the resected point P.
Equation (9.1) is a non-linear equation, which has a linear approximation of the form
EP = EP0 + ΔE
N P = N P0 + ΔN (9.4)
z = z 0 + Δz
φk⁰, sk⁰ are approximate bearings and distances obtained by substituting the approximate
coordinates EP⁰, NP⁰ into

\[
\phi_k^0 = \tan^{-1}\left(\frac{E_k - E_P^0}{N_k - N_P^0}\right) \qquad (9.5)
\]
\[
s_k^0 = \sqrt{\left(E_k - E_P^0\right)^2 + \left(N_k - N_P^0\right)^2} \qquad (9.6)
\]
vk + ak ΔN + bk ΔE + Δz = φk0 − (α k + z 0 ) (9.7)
or, for n observed directions, in matrix form

\[
\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}
+
\begin{bmatrix} a_1 & b_1 & 1 \\ a_2 & b_2 & 1 \\ \vdots & \vdots & \vdots \\ a_n & b_n & 1 \end{bmatrix}
\begin{bmatrix} \Delta N \\ \Delta E \\ \Delta z \end{bmatrix}
=
\begin{bmatrix} \phi_1^0 - (\alpha_1 + z^0) \\ \phi_2^0 - (\alpha_2 + z^0) \\ \vdots \\ \phi_n^0 - (\alpha_n + z^0) \end{bmatrix}
\]
or v + Bx = f (9.8)
In any least squares adjustment, every measurement (or observation) has an associated
precision (a variance) and a measure of connection with every other measurement
(covariances). These statistics are contained in a covariance matrix Σ . The elements of a
covariance matrix are population statistics and in practice, the covariance matrix is estimated
a priori by a cofactor matrix Q. Covariance matrices and cofactor matrices are related by
Σ = σ 02Q where σ 02 is the variance factor. An estimate of the variance factor σˆ 02 may be
computed after the adjustment. In least squares theory it is often useful to express the relative
precision of observations in terms of weights, where a weight is defined as being inversely
proportional to a variance; this leads to the definition of a weight matrix as the inverse of a
cofactor matrix, i.e. W = Q⁻¹. Weight matrices, covariance matrices and cofactor matrices
are square and symmetric (see Chapter 2, Section 2.5).
Applying the least squares principle to equation (9.8) with the precisions of the observations
estimated by a weight matrix leads to a set of normal equations of the form
\[
\left(\mathbf{B}^T \mathbf{W} \mathbf{B}\right)\mathbf{x} = \mathbf{B}^T \mathbf{W} \mathbf{f}
\qquad \text{or} \qquad \mathbf{N}\mathbf{x} = \mathbf{t} \qquad (9.9)
\]

with solution

\[
\mathbf{x} = \begin{bmatrix} \Delta N \\ \Delta E \\ \Delta z \end{bmatrix} = \mathbf{N}^{-1}\mathbf{t} \qquad (9.10)
\]
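Equations (9.9) and (9.10), together with the residual equation v = f − Bx from (9.8), translate
directly into a few lines of linear algebra. A minimal sketch in Python with numpy, assuming B, W
and f have already been formed:

```python
import numpy as np

def adjust_indirect(B, W, f):
    """Least squares adjustment of indirect observations for v + Bx = f.
    Returns the corrections x (equation (9.10)) and the residuals v."""
    N = B.T @ W @ B              # normal equation coefficient matrix, (9.9)
    t = B.T @ W @ f              # vector of numeric terms
    x = np.linalg.solve(N, t)    # x = N^-1 t, equation (9.10)
    v = f - B @ x                # residuals, from v + B x = f
    return x, v
```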
9.4. The form of the Coefficient Matrix N and the Vector of Numeric Terms t
For n observations, the coefficient matrix B and the vector of numeric terms f have the
following form
\[
\mathbf{B} = \begin{bmatrix} a_1 & b_1 & 1 \\ a_2 & b_2 & 1 \\ a_3 & b_3 & 1 \\ \vdots & \vdots & \vdots \\ a_n & b_n & 1 \end{bmatrix}, \qquad
\mathbf{f} = \begin{bmatrix} f_1 \\ f_2 \\ f_3 \\ \vdots \\ f_n \end{bmatrix}
= \begin{bmatrix} \phi_1^0 - (\alpha_1 + z^0) \\ \phi_2^0 - (\alpha_2 + z^0) \\ \phi_3^0 - (\alpha_3 + z^0) \\ \vdots \\ \phi_n^0 - (\alpha_n + z^0) \end{bmatrix}
\]
Assuming the observed directions are uncorrelated, the cofactor matrix Q contains estimates sk² of
the variances of the observations, and the weight matrix W is its inverse:

\[
\mathbf{Q} = \begin{bmatrix} s_1^2 & 0 & 0 & \cdots & 0 \\ 0 & s_2^2 & 0 & \cdots & 0 \\ 0 & 0 & s_3^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & s_n^2 \end{bmatrix}, \qquad
\mathbf{W} = \mathbf{Q}^{-1} = \begin{bmatrix} 1/s_1^2 & 0 & 0 & \cdots & 0 \\ 0 & 1/s_2^2 & 0 & \cdots & 0 \\ 0 & 0 & 1/s_3^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1/s_n^2 \end{bmatrix}
= \begin{bmatrix} w_1 & 0 & 0 & \cdots & 0 \\ 0 & w_2 & 0 & \cdots & 0 \\ 0 & 0 & w_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & w_n \end{bmatrix}
\qquad (9.11)
\]
The normal equation coefficient matrix N and vector of numeric terms t have the following
form
\[
\mathbf{N} = \mathbf{B}^T \mathbf{W} \mathbf{B} =
\begin{bmatrix}
\sum_{k=1}^{n} w_k a_k^2 & \sum_{k=1}^{n} w_k a_k b_k & \sum_{k=1}^{n} w_k a_k \\
\sum_{k=1}^{n} w_k a_k b_k & \sum_{k=1}^{n} w_k b_k^2 & \sum_{k=1}^{n} w_k b_k \\
\sum_{k=1}^{n} w_k a_k & \sum_{k=1}^{n} w_k b_k & \sum_{k=1}^{n} w_k
\end{bmatrix}, \qquad
\mathbf{t} = \mathbf{B}^T \mathbf{W} \mathbf{f} =
\begin{bmatrix}
\sum_{k=1}^{n} w_k a_k f_k \\
\sum_{k=1}^{n} w_k b_k f_k \\
\sum_{k=1}^{n} w_k f_k
\end{bmatrix}
\qquad (9.12)
\]
Note that if each element of the coefficient matrix B and the vector of numeric terms f is divided
by the appropriate estimate of the standard deviation sk, augmented matrices B̄ and f̄ can be
formed

\[
\bar{\mathbf{B}} = \begin{bmatrix} a_1/s_1 & b_1/s_1 & 1/s_1 \\ a_2/s_2 & b_2/s_2 & 1/s_2 \\ a_3/s_3 & b_3/s_3 & 1/s_3 \\ \vdots & \vdots & \vdots \\ a_n/s_n & b_n/s_n & 1/s_n \end{bmatrix}, \qquad
\bar{\mathbf{f}} = \begin{bmatrix} f_1/s_1 \\ f_2/s_2 \\ f_3/s_3 \\ \vdots \\ f_n/s_n \end{bmatrix}
= \begin{bmatrix} \left(\phi_1^0 - (\alpha_1 + z^0)\right)/s_1 \\ \left(\phi_2^0 - (\alpha_2 + z^0)\right)/s_2 \\ \left(\phi_3^0 - (\alpha_3 + z^0)\right)/s_3 \\ \vdots \\ \left(\phi_n^0 - (\alpha_n + z^0)\right)/s_n \end{bmatrix}
\qquad (9.13)
\]

and the normal equation coefficient matrix N and vector of numeric terms t are given by

\[
\mathbf{N} = \mathbf{B}^T \mathbf{W} \mathbf{B} = \bar{\mathbf{B}}^T \bar{\mathbf{B}}, \qquad
\mathbf{t} = \mathbf{B}^T \mathbf{W} \mathbf{f} = \bar{\mathbf{B}}^T \bar{\mathbf{f}}
\qquad (9.14)
\]
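For a diagonal weight matrix this equivalence is easily verified numerically: dividing each row of
B and f by the corresponding standard deviation sk reproduces BᵀWB and BᵀWf exactly. A small
check in Python, using arbitrary illustrative numbers (not taken from the example that follows):

```python
import numpy as np

# Arbitrary illustrative values
B = np.array([[0.3, -0.4, 1.0],
              [1.1,  0.7, 1.0],
              [-1.0, 0.6, 1.0]])
f = np.array([0.5, -1.7, -9.8])
s = np.array([2.0, 1.5, 3.0])          # standard deviations of the observations

W = np.diag(1.0 / s**2)                # weight matrix W = Q^-1
B_bar = B / s[:, None]                 # each row of B divided by its s_k
f_bar = f / s                          # each element of f divided by its s_k

print(np.allclose(B.T @ W @ B, B_bar.T @ B_bar))   # True: N is the same
print(np.allclose(B.T @ W @ f, B_bar.T @ f_bar))   # True: t is the same
```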
The solution x = N⁻¹t yields corrections ΔN, ΔE and Δz to the approximate
coordinates EP⁰, NP⁰ and the approximate orientation constant z⁰. These corrections are
added to the approximate values to obtain updated values of the approximations and another
iteration performed.
When the corrections for the kth iteration reach some desired value, say less than 0.5 mm, then
the current "approximate" values may be regarded as exact and the solution is complete.
At the end of the iterative process, the residuals v are computed from equation (9.8) and an
assessment of the "quality" of the observations can be made. Large residuals may indicate
poor observations.
An estimate of the variance factor σˆ 02 can be computed from the residuals after the
adjustment process by using the following equation (see Chapter 5, equation (5.25))
\[
\hat{\sigma}_0^2 = \frac{\mathbf{v}^T \mathbf{W} \mathbf{v}}{n - u} = \frac{\mathbf{f}^T \mathbf{W} \mathbf{f} - \mathbf{x}^T \mathbf{t}}{n - u} \qquad (9.15)
\]
Note that in (9.15) the variance factor can be computed without calculating the residuals.
The following is an example resection: a diagram of the resection observations to four known
stations, a geometric solution for the coordinates of P using the Collins Point technique, and
the calculations required for a least squares solution using all four observations.
[Diagram: the example resection, showing the directions observed at P to the known stations
Government House (the RO), St Johns, Epiphany and Studley Park, together with the station
coordinates and the observed angles.]

[Diagram: the geometric (Collins Point) solution for the coordinates of P, using the auxiliary
Collins Point.]
The data for this resection are shown in the diagrams above. The coordinates of P are
computed from the Collins Point resection and are rounded to the nearest 0.1 metres.
Computation of Direction coefficients and numeric terms for the observation equation
vk + ak ΔN + bk ΔE + Δz = φk0 − (α k + z 0 ) (9.16)
Note 1: In equation (9.16) the numeric terms on the right-hand side are computed bearing minus
"observed" bearing. The observed bearings are obtained by adding the observed
directions to the computed bearing of the RO (Government House).
Note 2: The dimensions (or units) of the numeric terms on the right-hand-side of (9.16) are
seconds of arc. This means that the elements on the left-hand-side must have
consistent dimensions, i.e. vk (seconds), ak , bk (seconds/length), ΔE , ΔN (length)
and Δz (seconds). If the corrections to approximate coordinates are expressed in
centimetres (cm), then the direction coefficients have dimensions of sec/cm and
equations (9.3) are

\[
a_k = \frac{-\left(E_k - E_P^0\right)}{\left(s_k^0\right)^2}\,\rho'' = \frac{-\sin\phi_k^0}{s_k^0}\,\rho'', \qquad
b_k = \frac{N_k - N_P^0}{\left(s_k^0\right)^2}\,\rho'' = \frac{\cos\phi_k^0}{s_k^0}\,\rho''
\]

where distances and coordinate differences are in centimetres and ρ″ = (180/π) × 3600
(the number of seconds in one radian).
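A sketch of this computation in Python, using the coefficient expressions given above for
equations (9.3), (9.5) and (9.6); station coordinates are assumed to be supplied in metres and are
converted to centimetres so that the coefficients come out in seconds per centimetre:

```python
import math

RHO = 180.0 / math.pi * 3600.0        # seconds of arc in one radian

def direction_coefficients(E0, N0, Ek, Nk):
    """Approximate bearing phi0 (radians), distance s0 (cm) and the direction
    coefficients a_k, b_k (sec/cm) for a direction observed from the
    approximate point (E0, N0) to a known point (Ek, Nk); inputs in metres."""
    dE = (Ek - E0) * 100.0            # coordinate differences in centimetres
    dN = (Nk - N0) * 100.0
    s0 = math.hypot(dE, dN)           # approximate distance, equation (9.6)
    phi0 = math.atan2(dE, dN) % (2.0 * math.pi)   # approximate bearing, (9.5)
    a_k = -dE / s0**2 * RHO           # = -sin(phi0)/s0 * rho''
    b_k = dN / s0**2 * RHO            # =  cos(phi0)/s0 * rho''
    return phi0, s0, a_k, b_k
```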
Coefficient matrix B, vector of numeric terms f and normal equation coefficient matrix
N = BT WB
(the columns of B correspond to ΔN, ΔE and Δz)

\[
\mathbf{B} = \begin{bmatrix} 0.281538 & -0.425294 & 1 \\ 1.119473 & 0.663531 & 1 \\ 0.171384 & 0.818873 & 1 \\ -0.974383 & 0.674301 & 1 \end{bmatrix}, \qquad
\mathbf{f} = \begin{bmatrix} 0.00 \\ -1.74 \\ -5.56 \\ -9.78 \end{bmatrix}
\]
In this example, the observations are assumed to be of equal precision. In such cases the
weight matrix W can be replaced by the Identity matrix I, and the normal equation coefficient
matrix N = BT WB = BT IB = BT B and the vector of numeric terms t = BT Wf = BT If = BT f
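With W = I the whole adjustment reduces to a few matrix operations. A minimal check in Python
using the B and f listed above; because B and f are printed here in rounded form, the computed
results agree only approximately with the residuals and variance factor quoted below.

```python
import numpy as np

B = np.array([[ 0.281538, -0.425294, 1.0],
              [ 1.119473,  0.663531, 1.0],
              [ 0.171384,  0.818873, 1.0],
              [-0.974383,  0.674301, 1.0]])
f = np.array([0.00, -1.74, -5.56, -9.78])

N = B.T @ B                        # normal equations with W = I
t = B.T @ f
x = np.linalg.solve(N, t)          # corrections dN, dE (cm) and dz (sec)
v = f - B @ x                      # residuals (sec)

s0_sq = (f @ f - x @ t) / (4 - 3)  # variance factor estimate, equation (9.15)
print(v, s0_sq)
```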
Station Residual
Government House 0.04 (sec)
St. Johns -0.19
Epiphany 0.30
Studley Park -0.15
An estimate of the variance factor σˆ 02 can be computed from equation (9.15) with W = I,
n = 4, u = 3 and
\[
\hat{\sigma}_0^2 = \frac{\mathbf{v}^T \mathbf{v}}{n - u} = \frac{\mathbf{f}^T \mathbf{f} - \mathbf{x}^T \mathbf{t}}{n - u} = \frac{129.620678 - 129.465415}{1} = 0.155263 \ \text{sec}^2
\]
Assuming all the observations are of equal precision and letting W = I is equivalent to
assigning an estimated standard deviation of 1 second to each observation. Inspection of the
variance factor shows that if a standard deviation of 0.39 sec (0.39 ≈ √0.155263) was used as
an estimate of the standard deviation of the observed directions, the estimate of the variance
factor computed from the adjustment would have been approximately unity (a variance factor
of unity indicates that the estimates of variances are close to the population statistics).
From this adjustment, we may conclude that the standard deviation of the observed directions
was approximately 0.4 sec.
A most important "by-product" of a least squares adjustment is the ability to estimate the
precision of the computed quantities. Theory shows that this information is contained in the
inverse of the normal equations and the covariance matrix of the computed quantities is given by
Σxx = σ̂₀² N⁻¹. For this adjustment the elements of Σxx relating to the coordinates are
σE² = 0.157, σN² = 0.070 and σEN = 0.011 (cm²), so that
σE = √0.157 = 0.396 cm ≈ 0.004 m
σN = √0.070 = 0.265 cm ≈ 0.003 m
Using the formulae from Chapter 8, Section 8.3 (replacing s with σ ), the lengths of the semi-
axes of the Standard Error Ellipse are
\[
W = \sqrt{\left(\sigma_E^2 - \sigma_N^2\right)^2 + 4\,\sigma_{EN}^2} = 0.0897
\]
\[
a = \sqrt{\tfrac{1}{2}\left(\sigma_E^2 + \sigma_N^2 + W\right)} = 0.3980 \ \text{cm}, \qquad
b = \sqrt{\tfrac{1}{2}\left(\sigma_E^2 + \sigma_N^2 - W\right)} = 0.2620 \ \text{cm}
\]
The angle between the E-axis and the major axis (positive anti-clockwise), noting the
quadrant signs to determine the proper quadrant of 2θ
\[
\tan 2\theta = \frac{2\,\sigma_{EN}}{\sigma_E^2 - \sigma_N^2} = \frac{2(0.011)}{0.157 - 0.070} = \left(\frac{+}{+}\right)
\]

giving 2θ = 14° 11′ 28″ and θ = 7° 05′ 44″.
Substituting the values φ = θ = 7° 05′ 44″ and φ = 90° + θ = 97° 05′ 44″ into equation (8.3)
gives su = 0.3980 cm and su = 0.2620 cm respectively, so θ = 7° 05′ 44″ is the angle (positive
anti-clockwise) between the E-axis and the major axis. Hence the bearing of the major axis is
90° − θ = 82° 54′ 16″.
Figure 9.2 shows a schematic diagram of the example resection and the Standard Error Ellipse
In a bearing intersection the position of an unknown point P is determined from bearings
observed from known stations A, B, C and D (Figure 10.1). Bearings from two known points are the
minimum requirement for a solution, which may be obtained from geometric principles set
out below. In the case of three or more observed bearings from known points, the method of
least squares may be employed to obtain the best estimates of the coordinates of the
intersected point P. This technique requires the formation of a set of observation equations
that yield "normal equations" that are solved for the best estimates of the coordinates of P.
Owing to the nature of the observation equation, which is a linearized approximation, the least
squares solution process is iterative. That is, approximate values are assumed, corrections are
computed and the approximate values updated, with the process repeated until the corrections to
the approximate values become negligible.
Figure 10.1. Unknown point P intersected by bearings from known points A,B,C and D
From Figure 10.1, using the bearings φA, φB to the unknown point P from known points A and B,

\[
\tan\phi_A = \frac{E_P - E_A}{N_P - N_A}, \qquad
\tan\phi_B = \frac{E_P - E_B}{N_P - N_B} \qquad (10.1)
\]
Rearranging gives

\[
E_P = N_P\tan\phi_A - N_A\tan\phi_A + E_A, \qquad
E_P = N_P\tan\phi_B - N_B\tan\phi_B + E_B \qquad (10.2)
\]

and equating the two expressions for EP,

\[
N_P = \frac{N_A\tan\phi_A - N_B\tan\phi_B + E_B - E_A}{\tan\phi_A - \tan\phi_B} \qquad (10.3)
\]
Having obtained a solution for N P from (10.3) then EP can be obtained from either of
equations (10.2).
It should be noted that if P lies on the line through A and B then its position is indeterminate.
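Equations (10.3) and (10.2) give the geometric (two-ray) solution directly. A minimal sketch in
Python; note that this tangent formulation also becomes numerically unstable for bearings near
90° or 270°, where the tangent grows without bound.

```python
import math

def intersect_two_bearings(EA, NA, phiA, EB, NB, phiB):
    """Intersection of bearings phiA (from A) and phiB (from B), in radians,
    using equations (10.3) and (10.2). Fails if P lies on the line through
    A and B, since then tan(phiA) = tan(phiB) and the denominator is zero."""
    tA, tB = math.tan(phiA), math.tan(phiB)
    NP = (NA * tA - NB * tB + EB - EA) / (tA - tB)   # equation (10.3)
    EP = NP * tA - NA * tA + EA                      # equation (10.2)
    return EP, NP
```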
In the case of three or more observed bearings from known points, the method of least squares
may be employed to obtain the best estimates of the coordinates of the intersected point P. This
technique requires the formation of a set of observation equations; each equation based on a
linearized form of the following equation whose elements are shown in Figure 10.1.
\[
\phi_k + v_k = \tan^{-1}\left(\frac{E_P - E_k}{N_P - N_k}\right) \qquad (10.4)
\]
where φk are the observed bearings from the known points Pk to the unknown point P,
vk are the residuals,
Ek, Nk are the east and north coordinates of the known points, and
EP, NP are the east and north coordinates of the unknown point P.
Equation (10.4) is a non-linear equation, which has a linear approximation of the form
EP = EP0 + ΔE
(10.7)
N P = N P0 + ΔN
φk⁰, sk⁰ are approximate bearings and distances obtained by substituting the approximate
coordinates EP⁰, NP⁰ into

\[
\phi_k^0 = \tan^{-1}\left(\frac{E_P^0 - E_k}{N_P^0 - N_k}\right) \qquad (10.8)
\]
\[
s_k^0 = \sqrt{\left(E_P^0 - E_k\right)^2 + \left(N_P^0 - N_k\right)^2} \qquad (10.9)
\]
vk − ak ΔN − bk ΔE = φk0 − φk (10.10)
or v + Bx = f (10.11)
In any least squares adjustment, every measurement (or observation) has an associated
precision (a variance) and a measure of connection with every other measurement
(covariances). These statistics are contained in a covariance matrix Σ . The elements of a
covariance matrix are population statistics and in practice, the covariance matrix is estimated
a priori by a cofactor matrix Q. Covariance matrices and cofactor matrices are related by
Σ = σ 02 Q where σ 02 is the variance factor. An estimate of the variance factor σˆ 02 may be
computed after the adjustment. In least squares theory it is often useful to express the relative
precision of observations in terms of weights, where a weight is defined as being inversely
proportional to a variance; this leads to the definition of a weight matrix as the inverse of a
cofactor matrix, i.e. W = Q⁻¹. Weight matrices, covariance matrices and cofactor matrices
are square and symmetric (see Chapter 2, Section 2.5).
Applying the least squares principle to equation (10.11) with the precisions of the observations
estimated by a weight matrix leads to a set of normal equations of the form
\[
\left(\mathbf{B}^T \mathbf{W} \mathbf{B}\right)\mathbf{x} = \mathbf{B}^T \mathbf{W} \mathbf{f}
\qquad \text{or} \qquad \mathbf{N}\mathbf{x} = \mathbf{t} \qquad (10.12)
\]

with solution

\[
\mathbf{x} = \begin{bmatrix} \Delta N \\ \Delta E \end{bmatrix} = \mathbf{N}^{-1}\mathbf{t} \qquad (10.13)
\]
10.5. The form of the Coefficient Matrix N and the Vector of Numeric Terms t
For n observations, the coefficient matrix B and the vector of numeric terms f have the
following form
\[
\mathbf{B} = \begin{bmatrix} -a_1 & -b_1 \\ -a_2 & -b_2 \\ -a_3 & -b_3 \\ \vdots & \vdots \\ -a_n & -b_n \end{bmatrix}, \qquad
\mathbf{f} = \begin{bmatrix} f_1 \\ f_2 \\ f_3 \\ \vdots \\ f_n \end{bmatrix}
= \begin{bmatrix} \phi_1^0 - \phi_1 \\ \phi_2^0 - \phi_2 \\ \phi_3^0 - \phi_3 \\ \vdots \\ \phi_n^0 - \phi_n \end{bmatrix}
\qquad (10.14)
\]

and, assuming the observed bearings are uncorrelated, the cofactor matrix Q (with W = Q⁻¹) is

\[
\mathbf{Q} = \begin{bmatrix} s_1^2 & 0 & 0 & \cdots & 0 \\ 0 & s_2^2 & 0 & \cdots & 0 \\ 0 & 0 & s_3^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & s_n^2 \end{bmatrix}
\]
The normal equation coefficient matrix N and vector of numeric terms t have the following
form
\[
\mathbf{N} = \mathbf{B}^T \mathbf{W} \mathbf{B} =
\begin{bmatrix}
\sum_{k=1}^{n} w_k a_k^2 & \sum_{k=1}^{n} w_k a_k b_k \\
\sum_{k=1}^{n} w_k a_k b_k & \sum_{k=1}^{n} w_k b_k^2
\end{bmatrix}, \qquad
\mathbf{t} = \mathbf{B}^T \mathbf{W} \mathbf{f} =
\begin{bmatrix}
-\sum_{k=1}^{n} w_k a_k f_k \\
-\sum_{k=1}^{n} w_k b_k f_k
\end{bmatrix}
\qquad (10.15)
\]
Note that if each element of the coefficient matrix B and the vector of numeric terms f is divided
by the appropriate estimate of the standard deviation sk, augmented matrices B̄ and f̄ can be
formed

\[
\bar{\mathbf{B}} = \begin{bmatrix} -a_1/s_1 & -b_1/s_1 \\ -a_2/s_2 & -b_2/s_2 \\ -a_3/s_3 & -b_3/s_3 \\ \vdots & \vdots \\ -a_n/s_n & -b_n/s_n \end{bmatrix}, \qquad
\bar{\mathbf{f}} = \begin{bmatrix} f_1/s_1 \\ f_2/s_2 \\ f_3/s_3 \\ \vdots \\ f_n/s_n \end{bmatrix}
= \begin{bmatrix} \left(\phi_1^0 - \phi_1\right)/s_1 \\ \left(\phi_2^0 - \phi_2\right)/s_2 \\ \left(\phi_3^0 - \phi_3\right)/s_3 \\ \vdots \\ \left(\phi_n^0 - \phi_n\right)/s_n \end{bmatrix}
\qquad (10.16)
\]

and the normal equation coefficient matrix N and vector of numeric terms t are given by

\[
\mathbf{N} = \mathbf{B}^T \mathbf{W} \mathbf{B} = \bar{\mathbf{B}}^T \bar{\mathbf{B}}, \qquad
\mathbf{t} = \mathbf{B}^T \mathbf{W} \mathbf{f} = \bar{\mathbf{B}}^T \bar{\mathbf{f}}
\qquad (10.17)
\]
The solution x = N⁻¹t yields corrections ΔN and ΔE to the approximate coordinates EP⁰, NP⁰.
These corrections are added to the approximate values to obtain updated values of the
approximations and another iteration performed.
When the corrections for the kth iteration fall below some desired value, say less than 0.5 mm,
then the current "approximate" values may be regarded as exact and the solution is complete.
At the end of the iterative process, residuals v are computed from equation (10.11) and an
assessment of the "quality" of the observations can be made. Large residuals may indicate
poor observations.
An estimate of the variance factor σ̂₀² can be computed from the residuals after the adjustment
process by using the following

\[
\hat{\sigma}_0^2 = \frac{\mathbf{v}^T \mathbf{W} \mathbf{v}}{n - u} = \frac{\mathbf{f}^T \mathbf{W} \mathbf{f} - \mathbf{x}^T \mathbf{t}}{n - u} \qquad (10.18)
\]
Note that in (10.18) the variance factor can be computed without calculating the residuals.
[Diagram: the example intersection, showing the bearings φA, φB, φC and φD from the known
stations A, B, C and D to the unknown point P.]
The data for this intersection are shown in the diagram above. The coordinates of P are
computed from equations (10.3) and (10.2) then rounded to the nearest 0.1 metres.
Computation of Direction coefficients and numeric terms for the observation equation
vk − ak ΔN − bk ΔE = φk0 − φk (10.19)
Note 1 In equation (10.19) the numeric terms on the right-hand-side are computed bearing
minus observed bearing.
Note 2 The dimensions (or units) of the numeric terms on the right-hand-side of (10.19) are
seconds of arc. This means that the elements on the left-hand-side must have
consistent dimensions, i.e., vk (seconds), ak , bk (seconds/length) and ΔE , ΔN
(length). If the corrections to approximate coordinates are expressed in centimetres
(cm), then the direction coefficients have dimensions of sec/cm and equations (10.6)
are
\[
a_k = \frac{-\left(E_P^0 - E_k\right)}{\left(s_k^0\right)^2}\,\rho'' = \frac{-\sin\phi_k^0}{s_k^0}\,\rho'', \qquad
b_k = \frac{N_P^0 - N_k}{\left(s_k^0\right)^2}\,\rho'' = \frac{\cos\phi_k^0}{s_k^0}\,\rho''
\]

where distances and coordinate differences are in centimetres and ρ″ = (180/π) × 3600
(the number of seconds in one radian).
Coefficient matrix B and vector of numeric terms f (the columns of B correspond to ΔN and ΔE)

\[
\mathbf{B} = \begin{bmatrix} 0.837318 & -1.204891 \\ 1.433784 & -0.226459 \\ -0.584244 & 1.548607 \\ -1.796911 & 0.578189 \end{bmatrix}, \qquad
\mathbf{f} = \begin{bmatrix} -3.92 \\ 5.43 \\ -5.94 \\ 13.52 \end{bmatrix}
\]
In this example, the observations are assumed to be of equal precision. In such cases the
weight matrix W can be replaced by the Identity matrix I, and the normal equation coefficient
matrix N = BT WB = BT IB = BT B and the vector of numeric terms t = BT Wf = BT If = BT f
Station Residual
A -3.68 (sec)
B 10.42
C -4.33
D 8.01
An estimate of the variance factor σˆ 02 can be computed from equation (10.18) with W = I,
n = 4, u = 2 and
\[
\hat{\sigma}_0^2 = \frac{\mathbf{v}^T \mathbf{v}}{n - u} = \frac{\mathbf{f}^T \mathbf{f} - \mathbf{x}^T \mathbf{t}}{n - u} = \frac{262.925300 - 57.957192}{4 - 2} = 102.484054 \ \text{sec}^2
\]
Assuming all the observations are of equal precision and letting W = I is equivalent to
assigning an estimated standard deviation of 1 second to each observation. Inspection of the
variance factor shows that if a standard deviation of 10.12 sec (10.12 ≈ √102.484054) was
used as an estimate of the standard deviation of the observed directions, the estimate of the
variance factor computed from the adjustment would have been approximately unity (a
variance factor of unity indicates that the estimates of variances are close to the population
statistics).
From this adjustment, we may conclude that the standard deviation of the observed directions
was approximately 10.1 sec.
A most important "by-product" of a least squares adjustment is the ability to estimate the
precision of the computed quantities. Theory shows that this information is contained in the
inverse of the normal equations and the covariance matrix of the computed quantities Σ xx is
given by Σxx = σ̂₀² N⁻¹. For this adjustment the elements of Σxx are σE² = 40.3804,
σN² = 27.0319 and σEN = 20.9162 (cm²), so that
σE = √40.3804 = 6.355 cm ≈ 0.064 m
σN = √27.0319 = 5.199 cm ≈ 0.052 m
Using the formulae from Chapter 8, Section 8.3 (replacing s with σ ), the lengths of the semi-
axes of the Standard Error Ellipse are
\[
W = \sqrt{\left(\sigma_E^2 - \sigma_N^2\right)^2 + 4\,\sigma_{EN}^2} = 43.9105
\]
\[
a = \sqrt{\tfrac{1}{2}\left(\sigma_E^2 + \sigma_N^2 + W\right)} = 7.4607 \ \text{cm}, \qquad
b = \sqrt{\tfrac{1}{2}\left(\sigma_E^2 + \sigma_N^2 - W\right)} = 3.4280 \ \text{cm}
\]
The angle between the E-axis and the major axis (positive anti-clockwise), noting the
quadrant signs to determine the proper quadrant of 2θ
\[
\tan 2\theta = \frac{2\,\sigma_{EN}}{\sigma_E^2 - \sigma_N^2} = \frac{2(20.9162)}{40.3804 - 27.0319} = \left(\frac{+}{+}\right)
\]

giving 2θ = 72° 18′ 08″ and θ = 36° 09′ 04″.
Substituting the values φ = θ = 36° 09′ 04″ and φ = 90° + θ = 126° 09′ 04″ into equation (8.3)
gives su = 7.4607 cm and su = 3.4280 cm respectively, so θ = 36° 09′ 04″ is the angle
(positive anti-clockwise) between the E-axis and the major axis. Hence the bearing of the
major axis is 90° − θ = 53° 50′ 56″.
Figure 10.2 shows a schematic diagram of the example intersection and the Standard Error
Ellipse.
REFERENCES
Cross, P.A. 1992, Advanced Least Squares Applied to Position Fixing, Working Paper No. 6,
Department of Land Information, University of East London.
Deakin, R.E. and Kildea, D.G., 1999, 'A note on standard deviation and RMS', The
Australian Surveyor, Vol. 44, No. 1, pp. 74-79, June 1999.
Gauss, K.F. 1809, Theory of the Motion of the Heavenly Bodies Moving about the Sun in
Conic Sections, a translation of Theoria Motus Corporum Coelestium in sectionibus
conicis solem ambientium by C.H. Davis, Dover, New York, 1963.
Johnson, N.L. and Leone, F.C., 1964, Statistics and Experimental Design In Engineering and
the Physical Sciences, Vol. I, John Wiley & Sons, Inc., New York
Krakiwsky, E.J. 1975, A Synthesis of Recent Advances in the Method of Least Squares,
Lecture Notes No. 42, 1992 reprint, Department of Surveying Engineering, University
of New Brunswick, Fredericton, Canada
Kreyszig, Erwin, 1970, Introductory Mathematical Statistics, John Wiley & Sons, New York.
Leahy, F.J. 1974, 'Two hundred years of adjustment of survey measurements', Two Centuries
of Surveying: Proceedings of the 17th Australian Survey Congress, Melbourne, 23 Feb.
− 4 Mar. 1974, Institution of Surveyors, Australia, pp.19-29.
Merriman, M. 1905, Method of Least Squares, 8th edn, John Wiley & Sons, New York.
Mikhail, E.M. 1976, Observations and Least Squares, IEP−A Dun-Donnelley, New York.
Mikhail, E.M. and Gracie, G. 1981. Analysis and Adjustment of Survey Measurements. Van
Nostrand Reinhold, New York, 340 pages.
Rainsford, H.F., 1968, Survey Adjustments and Least Squares, Constable, London.
Walpole, R.E., 1974, Introduction to Statistics, 2nd edn, Macmillan Publishing Co., Inc.
New York.
Wells, D.E. and Krakiwsky, E.J. 1971. The Method of Least Squares. Lecture Notes No. 18,
Department of Surveying Engineering, University of New Brunswick, May 1971,
reprinted September 1992, 180 pages.
A 1 INTRODUCTION
A 2 DEFINITIONS
A 2.1 Matrix
A matrix is a rectangular array of numbers (the elements of the matrix) arranged in rows and
columns; a single letter or symbol refers to the whole matrix. In many texts and references,
matrices are denoted by boldface type, ie,
A X P Q W
Matrices may also be indicated by placing a tilde (~) under a symbol, ie,
A X P Q W
~ ~ ~ ~ ~
Example A 1
\[
\mathbf{A} = \begin{bmatrix} 2 & 2 & -6 \\ 5 & 1 & 9 \\ 2 & 2 & -8 \end{bmatrix}, \qquad
\mathbf{C} = \begin{bmatrix} 2 & 5 \\ 3 & 5 \\ -4 & 1 \end{bmatrix}, \qquad
\mathbf{x} = \begin{bmatrix} -14 & 22 & 3 & -8 \end{bmatrix}
\]
A, C and x are all matrices. Note that the matrix x is a row matrix or row vector. Row
matrices or row vectors are usually denoted by lowercase letters.
Individual elements of a matrix are shown by lowercase letters ai j where the subscripts i and j
indicate the element lies at the intersection of the i th row and the j th column. The first
subscript always refers to the row number and the second to the column number.
\[
\mathbf{A}_{m,n} = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1j} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2j} & \cdots & a_{2n} \\
\vdots & & & & & & \vdots \\
a_{i1} & a_{i2} & a_{i3} & \cdots & (a_{ij}) & \cdots & a_{in} \\
\vdots & & & & & & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mj} & \cdots & a_{mn}
\end{bmatrix} \leftarrow \text{row } i
\]

(the element aij lies at the intersection of row i and column j)
Another way of representing a matrix is by typical element, for example

\[
\mathbf{A} = \left\{ a_{ij} \right\}, \qquad i = 1, 2, \ldots, m; \quad j = 1, 2, \ldots, n
\]
A matrix is said to be of order m by n (or m, n) where m is the number of rows and n is the
number of columns. The order of a matrix may be expressed in various ways, ie,

A_{m,n}, A(m, n), A_{m×n}, A(m × n) or _{m}A_{n}
Example A 2
\[
\mathbf{A}_{3,3} = \begin{bmatrix} 2 & 2 & -6 \\ 5 & 1 & 9 \\ 2 & 2 & -8 \end{bmatrix}, \qquad
\mathbf{C}_{3,2} = \begin{bmatrix} 2 & 5 \\ 3 & 5 \\ -4 & 1 \end{bmatrix}, \qquad
\mathbf{x}_{1,4} = \begin{bmatrix} -14 & 22 & 3 & -8 \end{bmatrix}
\]
Matrix A is of order (3,3), matrix C is (3,2) and x is (1,4). If a matrix is of order (1,1), it is
called a scalar.
A 3 TYPES OF MATRICES
A square matrix is a matrix with an equal number of rows and columns. A square matrix
would be indicated by A m,m and said to be of order m. Square matrices have a principal or
leading diagonal whose elements are aij for i = j . In matrix A below, order (5,5), elements a,
g, m, s and y lie on the leading diagonal.
\[
\mathbf{A}_{5,5} = \begin{bmatrix}
a & b & c & d & e \\
f & g & h & i & j \\
k & l & m & n & o \\
p & q & r & s & t \\
u & v & w & x & y
\end{bmatrix}
\]
Special cases of square matrices are symmetric and skew-symmetric which are described
below.
A column matrix or column vector is a matrix composed of only one column. Column
vectors are usually designated by lowercase letters, for example
\[
\mathbf{b}_{m,1} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_m \end{bmatrix}
\]
A row matrix or row vector is a matrix composed of only one row. Row vectors are usually
designated by lowercase letters, for example
\[
\mathbf{b}_{1,n} = \begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_n \end{bmatrix}
\]
A diagonal matrix is a square matrix with all "off-diagonal" elements equal to zero
\[
\mathbf{D}_{m,m} = \begin{bmatrix}
d_{11} & 0 & 0 & \cdots & 0 \\
0 & d_{22} & 0 & \cdots & 0 \\
0 & 0 & d_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & d_{mm}
\end{bmatrix} \quad \text{where } d_{ij} = 0 \text{ for } i \neq j
\]

A diagonal matrix may have some diagonal elements equal to zero. A diagonal matrix is
often shown in the form

\[
\mathbf{D} = \mathrm{diag}\left\{ d_1, d_2, d_3, \ldots, d_m \right\}
\]
A 3.5 Scalar Matrix
A scalar matrix is a diagonal matrix whose elements are all equal to the same scalar quantity
\[
\mathbf{A} = \begin{bmatrix}
a & 0 & 0 & \cdots & 0 \\
0 & a & 0 & \cdots & 0 \\
0 & 0 & a & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a
\end{bmatrix} \quad \text{where } a_{ij} = 0 \text{ for } i \neq j \text{ and } a_{ij} = a \text{ for } i = j
\]

Example A 3

\[
\mathbf{W} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \quad \text{is a (3,3) scalar matrix}
\]
An identity or unit matrix is a diagonal matrix whose elements are all equal to 1 (unity). It is
always referred to as I where
\[
\mathbf{I} = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\]
Note that all the "off-diagonal" elements are zero and all the elements of the leading diagonal
are unity.
A null or zero matrix is a matrix whose elements are all zero. It is denoted by boldface 0.
A triangular matrix is a square matrix whose elements above, or below, but not including the
leading diagonal, are all zero. Square matrices whose elements above the leading diagonal
are zero are known as lower triangular matrices.
\[
\mathbf{L}_{m,m} = \begin{bmatrix}
l_{11} & 0 & 0 & \cdots & 0 \\
l_{21} & l_{22} & 0 & \cdots & 0 \\
l_{31} & l_{32} & l_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
l_{m1} & l_{m2} & l_{m3} & \cdots & l_{mm}
\end{bmatrix} \quad \text{where } l_{ij} = 0 \text{ for } i < j
\]
Square matrices whose elements below the leading diagonal are zero are known as upper
triangular matrices.
Example A 4
\[
\mathbf{G} = \begin{bmatrix}
2 & 1 & -2 & 4 & 3 \\
0 & 5 & 4 & -3 & 2 \\
0 & 0 & 3 & 2 & 1 \\
0 & 0 & 0 & 5 & 6 \\
0 & 0 & 0 & 0 & 2
\end{bmatrix}, \qquad
\mathbf{H} = \begin{bmatrix}
8 & 0 & 0 \\
3 & 4 & 0 \\
5 & -2 & 6
\end{bmatrix}
\]

G and H are both triangular matrices. G is an upper triangular matrix and H is a lower
triangular matrix. Triangular matrices of order n have (n² + n)/2 non-zero elements.
A 3.9 Unit Lower Triangular Matrix
This is a special case of a lower triangular matrix, in which all the elements of the leading
diagonal are equal to unity.
\[
\mathbf{L}_{m,m} = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
l_{21} & 1 & 0 & \cdots & 0 \\
l_{31} & l_{32} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
l_{m1} & l_{m2} & l_{m3} & \cdots & 1
\end{bmatrix} \quad \text{where } l_{ij} = 0 \text{ for } i < j \text{ and } l_{ij} = 1 \text{ for } i = j
\]
This is a special case of an upper triangular matrix, in which all the elements of the leading
diagonal are equal to unity.
\[
\mathbf{U}_{m,m} = \begin{bmatrix}
1 & u_{12} & u_{13} & \cdots & u_{1m} \\
0 & 1 & u_{23} & \cdots & u_{2m} \\
0 & 0 & 1 & \cdots & u_{3m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\]
A banded matrix is any square matrix in which the only non-zero elements occur in a band
about the leading diagonal. Thus, if A is to be a banded matrix (of bandwidth one)

\[
\mathbf{A} = \begin{bmatrix}
a_{11} & a_{12} & 0 & 0 & \cdots \\
a_{21} & a_{22} & a_{23} & 0 & \cdots \\
0 & a_{32} & a_{33} & a_{34} & \cdots \\
\vdots & & \ddots & \ddots & \ddots
\end{bmatrix} \quad \text{where } a_{ij} = 0 \text{ for } \left|\,i - j\,\right| > 1
\]
A 4 MATRIX OPERATIONS
A 4.1 Equality
Two matrices A and B are equal if and only if they are the same order and aij = bij for all i
and j. Matrices of different order cannot be equated.
A 4.2 Addition
The sum of two matrices A and B, of the same order, is a matrix C of that order whose
elements are cij = aij + bij for all i and j. Matrices of different order cannot be added. The
following laws of addition hold true for matrix algebra:
associative law A + (B + C) = (A + B) + C = A + B + C
Example A 5

Example A 6

The following laws of multiplication by a scalar hold true for matrix algebra:

k(A + B) = kA + kB
(k + q)A = kA + qA
k(AB) = (kA)B = A(kB)
k(qA) = (kq)A

The scalar product of a row vector and a column vector is

\[
\begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}
= a_1 b_1 + a_2 b_2 + a_3 b_3
\]
For three matrices A, B and C with their respective elements aij, bij and cij, the product
C = AB is defined by

\[
c_{ij} = \sum_{k} a_{ik}\, b_{kj}
\]

which states that the element in row i and column j of C is equal to the scalar product of row i
of A and column j of B.
It is important to note that for matrix multiplication to be defined the number of columns of
the first matrix must be equal to the number of rows of the second matrix.
As a quick method of assessing whether a matrix multiplication is defined, write down the
matrices to be multiplied with their associated orders (rows and columns), ie, A(4,2) B(2,6), and
compare the "inner numbers" (the number of columns of the first matrix and the number of rows of
the second). If they are the same, then the multiplication is defined and the product matrix has
an order equal to the "outer numbers":

A(4,2) B(2,6) = C(4,6)

Remember that, in all cases, the first number of the matrix order refers to the number of rows
and the second number refers to the number of columns.
Example A 7
\[
\mathbf{A}_{3,2} = \begin{bmatrix} 1 & 2 \\ 3 & 0 \\ 6 & 4 \end{bmatrix} \quad \text{and} \quad
\mathbf{B}_{2,4} = \begin{bmatrix} 5 & 1 & 1 & 3 \\ 2 & 3 & 1 & 2 \end{bmatrix}
\]
In these relationships above, the sequence of the matrices is strictly preserved. Note that in
general, the commutative law of algebra does not hold for matrix multiplication even if
multiplication is defined in both orders, ie,
AB ≠ BA in general
Example A 8
A is of order (2,3), B is of order (3,2), AB is of order (2,2) and BA is of order (3,3). Even if
both matrices are square and of the same order, the results will in general not be the same
when the order of multiplication is reversed.
Example A 9
\[
\mathbf{A} = \begin{bmatrix} -5 & 4 & 3 \\ 3 & 6 & 8 \\ 5 & 2 & 4 \end{bmatrix}; \qquad
\mathbf{B} = \begin{bmatrix} 2 & -6 \\ 7 & -21 \\ -6 & 18 \end{bmatrix}; \qquad
\mathbf{AB} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}
\]
This differs from ordinary algebra where if, for example, a × b = 0 then either a or b or both a
and b are zero.
Some particular results involving diagonal matrices are useful. If A is a square matrix and D
is a diagonal matrix of the same order, then DA has the effect of multiplying each row i of A by
the diagonal element dii of D, and AD has the effect of multiplying each column j of A by the
diagonal element djj of D.

Example A 10

\[
\mathbf{D}^{p}_{n,n} = \begin{bmatrix}
d_{11}^{\,p} & 0 & \cdots & 0 \\
0 & d_{22}^{\,p} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_{nn}^{\,p}
\end{bmatrix}
\]

and for p > 0 and q > 0

\[
\mathbf{D}^{p}\mathbf{D}^{q} = \mathbf{D}^{p+q}, \qquad \text{and in particular} \qquad
\mathbf{D}^{\frac{1}{2}}\mathbf{D}^{\frac{1}{2}} = \mathbf{D}
\]
The transpose of a matrix A of order (m,k) is a (k,m) matrix formed from A by interchanging
rows and columns such that row i of A becomes column i of the transposed matrix. The
transpose of A is denoted by A T . There are various other notations used to indicate the
transpose of a matrix, such as: At , A′ , A* and A .
Example A 11
\[
\mathbf{A} = \begin{bmatrix} 3 & 2 & 3 \\ 1 & -2 & 3 \\ 4 & 0 & 6 \end{bmatrix}, \qquad
\mathbf{A}^T = \begin{bmatrix} 3 & 1 & 4 \\ 2 & -2 & 0 \\ 3 & 3 & 6 \end{bmatrix}
\]

\[
\mathbf{B} = \begin{bmatrix} 4 & 5 \\ 1 & -2 \\ 7 & 0 \end{bmatrix}, \qquad
\mathbf{B}^T = \begin{bmatrix} 4 & 1 & 7 \\ 5 & -2 & 0 \end{bmatrix}
\]
\[
\left(\mathbf{A}^T\right)^T = \mathbf{A}
\]
If x is an (m,1) vector of variables, y an (n,1) vector of variables and A an (m,n) matrix of
constants, the scalar function u = xᵀAy is known as a bilinear form. If x is an (m,1) vector of
variables and B an (m,m) square matrix of constants, the scalar function
q = xᵀBx
is known as a quadratic form. An example of a quadratic form is the sum of the squares of
the weighted residuals; the function ϕ = vT Wv , which is minimised in least squares.
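A one-line illustration of evaluating such a quadratic form in Python, using illustrative residual
and weight values:

```python
import numpy as np

v = np.array([0.04, -0.19, 0.30, -0.15])   # residuals (illustrative values)
W = np.eye(4)                              # weight matrix (here the identity)
phi = v @ W @ v                            # quadratic form v^T W v
print(phi)
```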
Division is not defined in matrix algebra. In place of division, the inverse A −1 of a square
matrix A is introduced. This inverse, if it exists, has the property
AA −1 = A −1A = I
This relationship defines the Cayley Inverse for square matrices only. A square matrix whose
determinant is zero is singular and a singular matrix does not have an inverse. A square
matrix whose determinant is non-zero is non-singular and does have an inverse. Furthermore,
if the inverse exists it is unique. Rectangular matrices have no determinants and so they are
taken to be singular, but they may have a generalised inverse (such as the Moore-Penrose inverse)
defined using generalised matrix algebra. These "generalised inverses" are not used in these notes.
Consider the matrix equation Ax = b . If A and b are known, then x may be determined from
x = A −1b . x is found in a sense, by "dividing" b by A, but in actual fact x is determined by
pre-multiplying both sides of the original equation by the inverse A −1 . For example
\[
\mathbf{A}\mathbf{x} = \mathbf{b}, \qquad
\mathbf{A}^{-1}\mathbf{A}\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}, \qquad
\mathbf{I}\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}
\]

giving x = A⁻¹b.
Matrix inversion plays an important part in least squares, primarily in the solution of systems
of linear equations. If the order of A is small, say (2,2) or (3,3), then manual calculation of
the inverse is relatively simple. But as the order of A increases, computer programs or
software products such as Microsoft's Excel or The MathWorks MATLAB are the appropriate
tools to calculate inverses and solve systems of equations.
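As one possibility (not the only one), a sketch in Python with numpy; np.linalg.solve is
generally preferred when only the solution x of Ax = b is required, while np.linalg.inv returns
the explicit inverse when it is needed in its own right (for example for the covariance matrix of
the parameters):

```python
import numpy as np

A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])

x = np.linalg.solve(A, b)     # solves A x = b without forming A^-1 explicitly
A_inv = np.linalg.inv(A)      # explicit inverse, A A^-1 = I
print(np.allclose(A @ x, b), np.allclose(A @ A_inv, np.eye(3)))
```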
For a (2,2) matrix the inverse is simple and may be computed from the following relationship

\[
\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad
\mathbf{A}^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}}
\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}
\]
1. If the elements of a matrix A(m,n) are differentiable functions of a scalar variable u, the
derivative dA/du is the (m,n) matrix of the derivatives of the individual elements

\[
\frac{d\mathbf{A}}{du} = \begin{bmatrix}
\dfrac{da_{11}}{du} & \dfrac{da_{12}}{du} & \cdots & \dfrac{da_{1n}}{du} \\
\dfrac{da_{21}}{du} & \dfrac{da_{22}}{du} & \cdots & \dfrac{da_{2n}}{du} \\
\vdots & \vdots & & \vdots \\
\dfrac{da_{m1}}{du} & \dfrac{da_{m2}}{du} & \cdots & \dfrac{da_{mn}}{du}
\end{bmatrix}
\]
Example A 12
\[
\mathbf{A} = \begin{bmatrix} 3u^2 & 2u^3 \\ u^2 & 4u^4 \end{bmatrix}
\quad \text{then} \quad
\frac{d\mathbf{A}}{du} = \begin{bmatrix} 6u & 6u^2 \\ 2u & 16u^3 \end{bmatrix}
\]

\[
\mathbf{x} = \begin{bmatrix} 2u^2 \\ u^3 \\ 3u^4 \end{bmatrix}
\quad \text{then} \quad
\frac{d\mathbf{x}}{du} = \begin{bmatrix} 4u \\ 3u^2 \\ 12u^3 \end{bmatrix}
\]
2. For the matrix product C = AB where the elements of the matrices A and B are
differentiable functions of the (scalar) variable u then dC du is given by
dC d dA dB
= (AB) = B+A
du du du du
Note that the sequence adopted in the product terms must be followed exactly, since for
example, the derivative of AB is in general not the same as the derivative of BA.
3. If a vector y m,1 represents m functions of the n elements of a variable vector x n,1 then the
total differential dy is given by
\[
d\mathbf{y} = \frac{\partial \mathbf{y}}{\partial \mathbf{x}}\, d\mathbf{x}
\]

where

\[
d\mathbf{y}_{m,1} = \begin{bmatrix} dy_1 \\ dy_2 \\ \vdots \\ dy_m \end{bmatrix}, \qquad
d\mathbf{x}_{n,1} = \begin{bmatrix} dx_1 \\ dx_2 \\ \vdots \\ dx_n \end{bmatrix}
\]

and the partial derivative ∂y/∂x is an (m,n) matrix known as the Jacobian matrix
\[
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial y_m}{\partial x_1} & \dfrac{\partial y_m}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}
\]
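As an illustration of a Jacobian matrix, the following Python sketch (not part of the original
notes) takes the two functions y1 = bearing and y2 = distance of a point (E, N) from the origin,
writes their analytic Jacobian, and checks it against central finite differences:

```python
import numpy as np

def y(x):
    """Two functions of x = [E, N]: bearing (radians) and distance."""
    E, N = x
    return np.array([np.arctan2(E, N), np.hypot(E, N)])

def jacobian(x):
    """Analytic Jacobian dy/dx, a (2,2) matrix of partial derivatives."""
    E, N = x
    s2 = E**2 + N**2
    s = np.sqrt(s2)
    return np.array([[N / s2, -E / s2],
                     [E / s,   N / s ]])

x0 = np.array([300.0, 400.0])
h = 1e-6
# numerical Jacobian: column j holds the derivatives with respect to x_j
J_num = np.column_stack([(y(x0 + h * e) - y(x0 - h * e)) / (2 * h)
                         for e in np.eye(2)])
print(np.allclose(jacobian(x0), J_num))   # True
```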
4. The derivative of the inverse A⁻¹ of a matrix A whose elements are differentiable functions of
a scalar variable x follows from AA⁻¹ = I:

\[
\frac{d}{dx}\left(\mathbf{A}\mathbf{A}^{-1}\right) = \frac{d\mathbf{I}}{dx} = \mathbf{0}
\]

\[
\frac{d\mathbf{A}}{dx}\,\mathbf{A}^{-1} + \mathbf{A}\,\frac{d\mathbf{A}^{-1}}{dx} = \mathbf{0}
\]

hence

\[
\frac{d\mathbf{A}^{-1}}{dx} = -\,\mathbf{A}^{-1}\,\frac{d\mathbf{A}}{dx}\,\mathbf{A}^{-1}
\]
For the bilinear form u = xᵀAy defined above, and for the quadratic form q = xᵀAx with A
symmetric, the partial derivatives are

\[
\frac{\partial u}{\partial \mathbf{x}} = \mathbf{y}^T\mathbf{A}^T, \qquad
\frac{\partial u}{\partial \mathbf{y}} = \mathbf{x}^T\mathbf{A}, \qquad
\frac{\partial q}{\partial \mathbf{x}} = 2\,\mathbf{x}^T\mathbf{A}
\]
These differentials are given without proof, but can be verified in the following manner
let

\[
\mathbf{x}_{3,1} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad
\mathbf{y}_{2,1} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \qquad
\mathbf{A}_{3,2} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}
\]

then

\[
u = \mathbf{x}^T\mathbf{A}\mathbf{y}
= \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
= x_1\left(a_{11}y_1 + a_{12}y_2\right) + x_2\left(a_{21}y_1 + a_{22}y_2\right) + x_3\left(a_{31}y_1 + a_{32}y_2\right)
\]

and

\[
\frac{\partial u}{\partial \mathbf{x}}
= \begin{bmatrix} y_1 a_{11} + y_2 a_{12} & y_1 a_{21} + y_2 a_{22} & y_1 a_{31} + y_2 a_{32} \end{bmatrix}
= \left(\mathbf{A}\mathbf{y}\right)^T = \mathbf{y}^T\mathbf{A}^T
\]

and

\[
\frac{\partial u}{\partial \mathbf{y}}
= \begin{bmatrix} x_1 a_{11} + x_2 a_{21} + x_3 a_{31} & x_1 a_{12} + x_2 a_{22} + x_3 a_{32} \end{bmatrix}
= \mathbf{x}^T\mathbf{A}
\]
Using similar methods, the partial differential for the quadratic form q can also be verified.
More explicit proofs can be found in Mikhail (1976, pp.457-460) and Mikhail & Gracie
(1981, pp.322-324).
A subset of elements from a given matrix A is called a sub-matrix and matrix partitioning
allows the matrix to be written in terms of sub-matrices rather than individual elements.
Thus, the matrix A can be partitioned into sub-matrices as follows
\[
\mathbf{A} = \begin{bmatrix} \mathbf{A}_1 & \mathbf{A}_2 \end{bmatrix}, \qquad
\mathbf{A} = \begin{bmatrix} \mathbf{A}_1 \\ \mathbf{A}_2 \end{bmatrix}, \qquad
\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}
\]

where, for an (m,n) matrix A partitioned after the second row and the third column, A11 is a
(2,3) sub-matrix, A12 is a (2, n−3) sub-matrix, A21 is an (m−2, 3) sub-matrix and A22 is an
(m−2, n−3) sub-matrix.
All matrix operations outlined in the previous sections can be performed on the sub-matrices
as if they are normal matrix elements providing necessary precautions are exercised regarding
dimensions.
Example A 13
Transposing partitioned matrices
\[
\mathbf{A} = \begin{bmatrix}
1 & 2 & 3 & 4 \\
5 & 6 & 7 & 8 \\
9 & 10 & 11 & 12
\end{bmatrix}
= \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}
\]

and

\[
\mathbf{A}^T = \begin{bmatrix} \mathbf{A}_{11}^T & \mathbf{A}_{21}^T \\ \mathbf{A}_{12}^T & \mathbf{A}_{22}^T \end{bmatrix}
= \begin{bmatrix}
1 & 5 & 9 \\
2 & 6 & 10 \\
3 & 7 & 11 \\
4 & 8 & 12
\end{bmatrix}
\]
Example A 14
Multiplying partitioned matrices
\[
\mathbf{A}_{3,4} = \begin{bmatrix}
1 & 2 & 3 & 4 \\
2 & -1 & 5 & 2 \\
3 & 2 & 1 & -2
\end{bmatrix}
= \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}
\quad \text{and} \quad
\mathbf{B}_{4,2} = \begin{bmatrix}
1 & 7 \\
2 & 4 \\
3 & 6 \\
-2 & 5
\end{bmatrix}
= \begin{bmatrix} \mathbf{B}_{11} \\ \mathbf{B}_{21} \end{bmatrix}
\]

the product is

\[
\mathbf{A}\mathbf{B} = \mathbf{C} = \begin{bmatrix}
\mathbf{A}_{11}\mathbf{B}_{11} + \mathbf{A}_{12}\mathbf{B}_{21} \\
\mathbf{A}_{21}\mathbf{B}_{11} + \mathbf{A}_{22}\mathbf{B}_{21}
\end{bmatrix}
\]

where

\[
\mathbf{A}_{11}\mathbf{B}_{11} = \begin{bmatrix} 1 & 2 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} 1 & 7 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 5 & 15 \\ 0 & 10 \end{bmatrix}; \qquad
\mathbf{A}_{12}\mathbf{B}_{21} = \begin{bmatrix} 3 & 4 \\ 5 & 2 \end{bmatrix}\begin{bmatrix} 3 & 6 \\ -2 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 38 \\ 11 & 40 \end{bmatrix}
\]

\[
\mathbf{A}_{21}\mathbf{B}_{11} = \begin{bmatrix} 3 & 2 \end{bmatrix}\begin{bmatrix} 1 & 7 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 7 & 29 \end{bmatrix}; \qquad
\mathbf{A}_{22}\mathbf{B}_{21} = \begin{bmatrix} 1 & -2 \end{bmatrix}\begin{bmatrix} 3 & 6 \\ -2 & 5 \end{bmatrix} = \begin{bmatrix} 7 & -4 \end{bmatrix}
\]

noting that the number of columns of each A sub-matrix must equal the number of rows of the
corresponding B sub-matrix. The product is

\[
\mathbf{A}\mathbf{B} = \mathbf{C} = \begin{bmatrix}
6 & 53 \\
11 & 50 \\
14 & 25
\end{bmatrix}
\]
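A quick numerical confirmation of Example A 14, using the matrices as reconstructed above; a
Python sketch showing that multiplying the sub-matrices block by block gives the same product as
multiplying the full matrices:

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [2, -1, 5, 2],
              [3, 2, 1, -2]])
B = np.array([[1, 7],
              [2, 4],
              [3, 6],
              [-2, 5]])

A11, A12 = A[:2, :2], A[:2, 2:]     # partition A after row 2 and column 2
A21, A22 = A[2:, :2], A[2:, 2:]
B11, B21 = B[:2, :], B[2:, :]       # partition B after row 2

C_blocks = np.vstack([A11 @ B11 + A12 @ B21,
                      A21 @ B11 + A22 @ B21])
print(np.array_equal(C_blocks, A @ B))   # True
print(A @ B)                             # [[ 6 53] [11 50] [14 25]]
```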
A symmetric matrix is defined to be a matrix that remains invariant when transposed, ie,
AT = A
Symmetric matrices are always square matrices. For any symmetric matrix A, the elements
conform to the following
aij = a j i
Example A 15
\[
\mathbf{A} = \begin{bmatrix}
a_{11} & 3 & 5 & 7 \\
3 & a_{22} & 9 & 11 \\
5 & 9 & a_{33} & 13 \\
7 & 11 & 13 & a_{44}
\end{bmatrix}
\]
For any matrix A and for any symmetric matrix B, the matrices AAᵀ, AᵀA, ABAᵀ and AᵀBA are all
symmetric.
In least squares, we are often dealing with symmetric matrices. For example, the matrix
equation N(u,u) = Bᵀ(u,n) W(n,n) B(n,u) often appears. B is an (n,u) matrix of coefficients of the u
unknowns in n equations, W is an (n,n) weight matrix (always symmetric) and N is the (u,u)
symmetric coefficient matrix of the set of u normal equations.
A skew-symmetric matrix is a square matrix that satisfies

Aᵀ = −A,    ie aij = −aji
Note that this definition means that the elements of the leading diagonal can only be zero. An
example of a skew-symmetric matrix of order 3 is
\[
\mathbf{A} = \begin{bmatrix}
0 & b & c \\
-b & 0 & d \\
-c & -d & 0
\end{bmatrix}
\]
Example A 16
Skew-symmetric matrices are found in some surveying and geodesy applications. For
instance, a 3D conformal transformation from one orthogonal coordinate system (x,y,z) to
another (X,Y,Z) is defined by the matrix equation
\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
= \lambda\, \mathbf{R}_{\kappa\phi\omega} \begin{bmatrix} x \\ y \\ z \end{bmatrix}
+ \begin{bmatrix} T_X \\ T_Y \\ T_Z \end{bmatrix}
\]
where λ is a scale factor, TX , TY and TZ translations between the coordinate origins and Rκ φ ω
is a rotation matrix derived by considering successive rotations ω , φ and κ about the x, y and
z axes respectively
\[
\mathbf{R}_{\kappa\phi\omega} = \begin{bmatrix}
c_\phi c_\kappa & c_\omega s_\kappa + s_\omega s_\phi c_\kappa & s_\omega s_\kappa - c_\omega s_\phi c_\kappa \\
-c_\phi s_\kappa & c_\omega c_\kappa - s_\omega s_\phi s_\kappa & s_\omega c_\kappa + c_\omega s_\phi s_\kappa \\
s_\phi & -s_\omega c_\phi & c_\omega c_\phi
\end{bmatrix}
\]
Note that the shorthand c and s denotes cosine and sine, eg cκ sφ sω = cosκ sinφ sinω, and x, y, z
and X, Y, Z refer to the axes of right-handed orthogonal coordinate systems.
In many applications the rotation matrix Rκ φ ω can be simplified because ω , φ and κ are
small (often less than 3°). In such cases, the sines of angles are approximately equal to their
radian measures, the cosines are approximately 1 and products of sines are approximately
zero. This allows the rotation matrix Rκ φ ω to be approximated by R S
\[
\mathbf{R}_S = \begin{bmatrix}
1 & \kappa & -\phi \\
-\kappa & 1 & \omega \\
\phi & -\omega & 1
\end{bmatrix}
\]
Note that R S can be expressed as the sum of the identity matrix I and a skew-symmetric
matrix S
\[
\mathbf{R}_S = \begin{bmatrix}
1 & \kappa & -\phi \\
-\kappa & 1 & \omega \\
\phi & -\omega & 1
\end{bmatrix}
= \begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
+ \begin{bmatrix}
0 & \kappa & -\phi \\
-\kappa & 0 & \omega \\
\phi & -\omega & 0
\end{bmatrix}
= \mathbf{I} + \mathbf{S}
\]
Every square matrix can be uniquely decomposed into the sum of a symmetric and skew-
symmetric matrix. Consider the following
\[
\mathbf{A} = \mathbf{A} + \tfrac{1}{2}\mathbf{A}^T - \tfrac{1}{2}\mathbf{A}^T
= \tfrac{1}{2}\left(\mathbf{A} + \mathbf{A}^T\right) + \tfrac{1}{2}\left(\mathbf{A} - \mathbf{A}^T\right)
= \mathbf{A}_{Sym} + \mathbf{A}_{Skew}
\]

where A_Sym = ½(A + Aᵀ) is symmetric and A_Skew = ½(A − Aᵀ) is skew-symmetric.
An orthogonal matrix is a square matrix A satisfying AᵀA = AAᵀ = I, and hence an orthogonal
matrix has the very useful property that its inverse matrix is the same as its transpose matrix, or
A⁻¹ = Aᵀ (if A is orthogonal)
The terms norm and orthogonal are applicable to vector algebra. The norm of a vector is the
magnitude of the vector and is the square root of the product of the vector and its transpose.
Any row (or column) of a matrix has all the characteristics of a vector, and hence the norm of
any row (or column) of a matrix is the square root of the product of the row (or column) by its
transpose. Two vectors are orthogonal if, and only if, their scalar product is zero.
Considering rows and columns of the matrix as vectors, then any two matrix rows (or
columns) are orthogonal if their scalar product is zero.
Example A 17
Rotation matrices are examples of orthogonal matrices. For example, consider a point P with
a f
coordinates P e, n in the east-north coordinate system. If the axes are rotated about the
origin by an angle θ (measured clockwise from north), P will have coordinates e′, n′ in the
rotated system equal to
e′ = e cos θ − n sin θ
n′ = e sin θ + n cos θ
R =M
Lcosθ − sin θ OP
where θ
N sin θ cosθ Q is known as the rotation matrix.
The norms of the columns and rows of Rθ are unity since sin 2 θ + cos2 θ = 1 and the columns
and rows are orthogonal since sin θ cos θ − sin θ cos θ = 0 . Hence Rθ is orthogonal and its
inverse is equal to its transpose. This is useful in defining the transformation from e′, n′ to
e, n coordinates. Pre-multiplying both sides of the original transformation by the inverse Rθ−1
gives
Rθ−1
LMe′ OP = R R LMeOP
−1 LMe OP = R LMe′ OP
−1
since Rθ−1Rθ = I
Nn ′ Q θ θ
Nn Q or
Nn Q Nn ′ Q
θ
and
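A short numerical check of this property in Python:

```python
import numpy as np

theta = np.radians(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(R.T @ R, np.eye(2)))      # True: R^T R = I
print(np.allclose(np.linalg.inv(R), R.T))   # True: the inverse equals the transpose
```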
APPENDIX A REFERENCES
Mikhail, E.M. 1976, Observations and Least Squares, IEP−A Dun-Donnelley, New York.
Mikhail, E.M. and Gracie, G. 1981, Analysis and Adjustment of Survey Measurements, Van
Nostrand Reinhold, New York.
Williams, I.P. 1972, Matrices for Scientists, Hutchinson & Co Ltd, London.