The Method of Least Squares
Steven J. Miller∗
Abstract
The Method of Least Squares is a procedure to determine the best fit line to data; the
proof uses calculus and linear algebra. The basic problem is to find the best fit straight
line 𝑦 = 𝑎𝑥 + 𝑏 given that, for 𝑛 ∈ {1, . . . , 𝑁 }, the pairs (𝑥𝑛 , 𝑦𝑛 ) are observed. The
method easily generalizes to finding the best fit of the form $y = a_1 f_1(x) + \cdots + a_K f_K(x)$;
it is not necessary for the functions $f_k$ to be linear in $x$ – all that is needed is that $y$ is
assumed to be a linear combination of these functions.
Contents
1 Description of the Problem
2 Probability and Statistics Review
3 The Method of Least Squares
1 Description of the Problem

[Figure 1: a plot of the observed data $(x_n, y_n)$; see Remark 3.2 for how the data was generated.]
2 Probability and Statistics Review
We give a quick introduction to the basic elements of probability and statistics which we need
for the Method of Least Squares; for more details see [BD, CaBe, Du, Fe, Kel, LF, MoMc].
Given a sequence of data 𝑥1 , . . . , 𝑥𝑁 , we define the mean (or the expected value) to be
(𝑥1 + ⋅ ⋅ ⋅ + 𝑥𝑁 )/𝑁. We denote this by writing a line above 𝑥: thus
\[ \overline{x} = \frac{1}{N} \sum_{n=1}^{N} x_n. \tag{2.2} \]
The mean is the average value of the data.
Consider the following two sequences of data: {10, 20, 30, 40, 50} and {30, 30, 30, 30, 30}.
Both sets have the same mean; however, the first data set has greater variation about the mean.
This leads to the concept of variance, which is a useful tool to quantify how much a set of data
fluctuates about its mean. The variance$^1$ of $\{x_1, \dots, x_N\}$, denoted by $\sigma_x^2$, is
\[ \sigma_x^2 = \frac{1}{N} \sum_{n=1}^{N} (x_n - \overline{x})^2; \tag{2.3} \]
the standard deviation $\sigma_x$ is the square root of the variance,
\[ \sigma_x = \sqrt{\frac{1}{N} \sum_{n=1}^{N} (x_n - \overline{x})^2}. \tag{2.4} \]
Note that if the $x$'s have units of meters then the variance $\sigma_x^2$ has units of meters$^2$, and the
standard deviation $\sigma_x$ and the mean $\overline{x}$ have units of meters. Thus it is the standard deviation
that gives a good measure of the deviations of the $x$'s around their mean, as it has the same
units as our quantity of interest.
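To make these definitions concrete, here is a minimal Python sketch (the helper function names are ours, not from the text) computing the mean, variance, and standard deviation of the two data sets above, dividing by $N$ as in (2.3):

```python
# A small sketch computing the mean, variance (dividing by N, as in (2.3)),
# and standard deviation of a data set.
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

data1 = [10, 20, 30, 40, 50]
data2 = [30, 30, 30, 30, 30]

for data in (data1, data2):
    print(mean(data), variance(data), sqrt(variance(data)))
# Both sets have mean 30, but the variances are 200 and 0;
# the standard deviations (about 14.1 versus 0) quantify the spread.
```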
There are, of course, alternate measures one can use. For example, one could consider
\[ \frac{1}{N} \sum_{n=1}^{N} (x_n - \overline{x}). \tag{2.5} \]
Unfortunately this is a signed quantity, and large positive deviations can cancel with large
negative ones. In fact, the definition of the mean immediately implies the above is zero! This,
then, would be a terrible measure of the variability in data, as it is zero regardless of what the
values of the data are.
We can rectify this problem by using absolute values. This leads us to consider
\[ \frac{1}{N} \sum_{n=1}^{N} |x_n - \overline{x}|. \tag{2.6} \]
$^1$For those who know more advanced statistics: for technical reasons, the correct definition of the sample
variance divides by $N - 1$ rather than $N$.
While this has the advantage of avoiding cancellation of errors (as well as having the same
units as the $x$'s), the absolute value function is not well behaved analytically: it is not
differentiable. This is primarily why we consider the standard deviation (the square root of
the variance) – this will allow us to use the tools of calculus.
We can now quantify what we mean by “best fit”. If we believe 𝑦 = 𝑎𝑥+𝑏, then 𝑦−(𝑎𝑥+𝑏)
should be zero. Thus given observations
\[ \{(x_1, y_1), \dots, (x_N, y_N)\}, \tag{2.7} \]
we look at
\[ \{y_1 - (a x_1 + b), \dots, y_N - (a x_N + b)\}. \tag{2.8} \]
The mean of these terms should be small (if the fit is good), and the sum of the squares of the
terms will measure how good a fit we have.
We define
\[ E(a, b) := \sum_{n=1}^{N} (y_n - (a x_n + b))^2. \tag{2.9} \]
Large errors are given a higher weight than smaller errors (due to the squaring). Thus our pro-
cedure favors many medium sized errors over a few large errors. If we used absolute values to
measure the error (see equation (2.6)), then all errors are weighted equally; however, the ab-
solute value function is not differentiable, and thus the tools of calculus become inaccessible.
Remark 2.1 (Choice of how to measure errors). As the point is so important, it is worth
looking at one more time. There are three natural candidates to use in measuring the error
between theory and observation:
\[ E_1(a, b) = \sum_{n=1}^{N} (y_n - (a x_n + b)), \tag{2.10} \]
\[ E_2(a, b) = \sum_{n=1}^{N} |y_n - (a x_n + b)| \tag{2.11} \]
and
\[ E_3(a, b) = \sum_{n=1}^{N} (y_n - (a x_n + b))^2. \tag{2.12} \]
The problem with (2.10) is that the errors are signed quantities, and positive errors can cancel
with negative errors. The problem with (2.11) is that the absolute value function is not differ-
entiable, and thus the tools and results of calculus are unavailable. The problem with (2.12)
is that errors are not weighted equally: large errors are given significantly more weight than
smaller errors. There are thus problems with all three. That said, the problems with (2.12)
are not so bad when compared to its advantages, namely that errors cannot cancel and that
calculus is available. Thus, most people typically use (2.12) and measure errors by sums of
squares.
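To see the cancellation problem concretely, the following Python sketch (our own illustration, using the data from Exercise 3.3 below) evaluates the three error measures for the clearly poor candidate line $y = 1$:

```python
# Compare the three error measures (2.10)-(2.12) for a candidate line y = a*x + b.
def E1(a, b, pts):
    return sum(y - (a * x + b) for x, y in pts)

def E2(a, b, pts):
    return sum(abs(y - (a * x + b)) for x, y in pts)

def E3(a, b, pts):
    return sum((y - (a * x + b)) ** 2 for x, y in pts)

pts = [(0, 0), (1, 1), (2, 2)]        # data from Exercise 3.3
print(E1(0, 1, pts), E2(0, 1, pts), E3(0, 1, pts))
# For the (bad) line y = 1: E1 = 0 (the errors -1, 0, 1 cancel),
# while E2 = 2 and E3 = 2 correctly report a nonzero error.
```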
3 The Method of Least Squares
Given data {(𝑥1 , 𝑦1 ), . . . , (𝑥𝑁 , 𝑦𝑁 )}, we defined the error associated to saying 𝑦 = 𝑎𝑥 + 𝑏 by
\[ E(a, b) := \sum_{n=1}^{N} (y_n - (a x_n + b))^2. \tag{3.13} \]
Note that the error is a function of two variables, the unknown parameters 𝑎 and 𝑏.
The goal is to find values of 𝑎 and 𝑏 that minimize the error. In multivariable calculus we
learn that this requires us to find the values of (𝑎, 𝑏) such that the gradient of 𝐸 with respect
to our variables (which are 𝑎 and 𝑏) vanishes; thus we require
\[ \nabla E = \left( \frac{\partial E}{\partial a}, \frac{\partial E}{\partial b} \right) = (0, 0), \tag{3.14} \]
or
\[ \frac{\partial E}{\partial a} = 0, \qquad \frac{\partial E}{\partial b} = 0. \tag{3.15} \]
Note we do not have to worry about boundary points: as ∣𝑎∣ and ∣𝑏∣ become large, the fit will
clearly get worse and worse. Thus we do not need to check on the boundary.
Differentiating 𝐸(𝑎, 𝑏) yields
\[ \frac{\partial E}{\partial a} = \sum_{n=1}^{N} 2 (y_n - (a x_n + b)) \cdot (-x_n) \]
\[ \frac{\partial E}{\partial b} = \sum_{n=1}^{N} 2 (y_n - (a x_n + b)) \cdot (-1). \tag{3.16} \]
Note we can divide both sides by $-2$, as it is just a constant; we cannot divide by $x_n$, as that
varies with $n$.
We may rewrite these equations as
\[ \left( \sum_{n=1}^{N} x_n^2 \right) a + \left( \sum_{n=1}^{N} x_n \right) b = \sum_{n=1}^{N} x_n y_n \]
\[ \left( \sum_{n=1}^{N} x_n \right) a + \left( \sum_{n=1}^{N} 1 \right) b = \sum_{n=1}^{N} y_n. \tag{3.18} \]
We have obtained that the values of 𝑎 and 𝑏 which minimize the error (defined in (3.13))
satisfy the following matrix equation:
\[ \begin{pmatrix} \sum_{n=1}^{N} x_n^2 & \sum_{n=1}^{N} x_n \\ \sum_{n=1}^{N} x_n & \sum_{n=1}^{N} 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_{n=1}^{N} x_n y_n \\ \sum_{n=1}^{N} y_n \end{pmatrix}. \tag{3.19} \]
We need a fact from linear algebra. Recall the inverse of a matrix $A$ is a matrix $B$ such
that $AB = BA = I$, where $I$ is the identity matrix. If $A = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$ is a $2 \times 2$ matrix with
$\det A = \alpha\delta - \beta\gamma \neq 0$, then $A$ is invertible and
\[ A^{-1} = \frac{1}{\alpha\delta - \beta\gamma} \begin{pmatrix} \delta & -\beta \\ -\gamma & \alpha \end{pmatrix}. \tag{3.20} \]
In other words, $A A^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ here. For example, if $A = \begin{pmatrix} 1 & 2 \\ 3 & 7 \end{pmatrix}$ then $\det A = 1$ and
$A^{-1} = \begin{pmatrix} 7 & -2 \\ -3 & 1 \end{pmatrix}$; we can check this by noting (through matrix multiplication) that
\[ \begin{pmatrix} 1 & 2 \\ 3 & 7 \end{pmatrix} \begin{pmatrix} 7 & -2 \\ -3 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{3.21} \]
We can show the matrix in (3.19) is invertible (so long as at least two of the 𝑥𝑛 ’s are
distinct), which implies
\[ \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_{n=1}^{N} x_n^2 & \sum_{n=1}^{N} x_n \\ \sum_{n=1}^{N} x_n & \sum_{n=1}^{N} 1 \end{pmatrix}^{-1} \begin{pmatrix} \sum_{n=1}^{N} x_n y_n \\ \sum_{n=1}^{N} y_n \end{pmatrix}. \tag{3.22} \]
Denote the matrix from (3.19) by 𝑀. The determinant of 𝑀 is
\[ \det M = \sum_{n=1}^{N} x_n^2 \cdot \sum_{n=1}^{N} 1 \;-\; \sum_{n=1}^{N} x_n \cdot \sum_{n=1}^{N} x_n. \tag{3.23} \]
As
\[ \overline{x} = \frac{1}{N} \sum_{n=1}^{N} x_n, \tag{3.24} \]
we find that
\[ \det M = N \sum_{n=1}^{N} x_n^2 - (N\overline{x})^2 = N^2 \left( \frac{1}{N} \sum_{n=1}^{N} x_n^2 - \overline{x}^2 \right) = N^2 \cdot \frac{1}{N} \sum_{n=1}^{N} (x_n - \overline{x})^2, \tag{3.25} \]
where the last equality follows from simple algebra. Thus, as long as the $x_n$ are not all equal,
$\det M$ will be non-zero and $M$ will be invertible. Using the definition of the variance, we notice
the above could also be written as
\[ \det M = N^2 \sigma_x^2. \tag{3.26} \]
Thus we find that, so long as the 𝑥’s are not all equal, the best fit values of 𝑎 and 𝑏 are
obtained by solving a linear system of equations; the solution is given in (3.22).
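In practice one solves this linear system numerically rather than by hand; here is a minimal sketch, assuming numpy is available and using illustrative data of our own choosing:

```python
import numpy as np

# Illustrative data of our own choosing (not from the text).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Build the matrix and right-hand side of (3.19), then solve for (a, b).
M = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    float(len(x))]])
v = np.array([np.sum(x * y), np.sum(y)])

a, b = np.linalg.solve(M, v)   # solves M @ (a, b) = v
print(a, b)                     # slope and intercept of the best fit line
```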
We rewrite (3.22) in a simpler form. Using the inverse of the matrix and the definition of
the mean and variance, we find
\[ \begin{pmatrix} a \\ b \end{pmatrix} = \frac{1}{N^2 \sigma_x^2} \begin{pmatrix} N & -N\overline{x} \\ -N\overline{x} & \sum_{n=1}^{N} x_n^2 \end{pmatrix} \begin{pmatrix} \sum_{n=1}^{N} x_n y_n \\ \sum_{n=1}^{N} y_n \end{pmatrix}. \tag{3.27} \]
Expanding gives
\[ a = \frac{N \sum_{n=1}^{N} x_n y_n - N\overline{x} \sum_{n=1}^{N} y_n}{N^2 \sigma_x^2}, \qquad b = \frac{-N\overline{x} \sum_{n=1}^{N} x_n y_n + \sum_{n=1}^{N} x_n^2 \sum_{n=1}^{N} y_n}{N^2 \sigma_x^2}, \]
where
\[ \overline{x} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad \sigma_x^2 = \frac{1}{N} \sum_{n=1}^{N} (x_n - \overline{x})^2. \tag{3.28} \]
As the formulas for $a$ and $b$ are so important, it is worth giving another expression for
them. We also have
\[ a = \frac{\sum_{n=1}^{N} 1 \sum_{n=1}^{N} x_n y_n - \sum_{n=1}^{N} x_n \sum_{n=1}^{N} y_n}{\sum_{n=1}^{N} 1 \sum_{n=1}^{N} x_n^2 - \sum_{n=1}^{N} x_n \sum_{n=1}^{N} x_n}, \qquad b = \frac{\sum_{n=1}^{N} x_n^2 \sum_{n=1}^{N} y_n - \sum_{n=1}^{N} x_n \sum_{n=1}^{N} x_n y_n}{\sum_{n=1}^{N} 1 \sum_{n=1}^{N} x_n^2 - \sum_{n=1}^{N} x_n \sum_{n=1}^{N} x_n}. \tag{3.29} \]
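Equation (3.29) translates directly into code. The following sketch (the function name best_fit_line is ours) computes $a$ and $b$ from the raw sums and, as a sanity check, compares against numpy's built-in degree-one fit:

```python
import numpy as np

def best_fit_line(x, y):
    """Best fit slope and intercept via the closed form (3.29)."""
    N = len(x)
    Sx, Sy = np.sum(x), np.sum(y)
    Sxx, Sxy = np.sum(x * x), np.sum(x * y)
    det = N * Sxx - Sx * Sx          # N^2 times the variance of the x's; see (3.25)
    a = (N * Sxy - Sx * Sy) / det
    b = (Sxx * Sy - Sx * Sxy) / det
    return a, b

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
print(best_fit_line(x, y))
print(np.polyfit(x, y, 1))  # numpy's degree-1 fit agrees (slope, intercept)
```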
Remark 3.1. The formulas above for 𝑎 and 𝑏 are reasonable, as can be seen by a unit analysis.
For example, imagine 𝑥 is in meters and 𝑦 is in seconds. Then if 𝑦 = 𝑎𝑥 + 𝑏 we would need 𝑏
and 𝑦 to have the same units (namely seconds), and 𝑎 to have units seconds per meter. If we
substitute in the units for the various quantities on the right hand side of (3.28), we do see 𝑎
and 𝑏 have the correct units. While this is not a proof that we have not made a mistake, it is a
great reassurance. No matter what you are studying, you should always try unit calculations
such as this.
There are other, equivalent formulas for $a$ and $b$; these give the same answer, but arrange
the algebra in a slightly different sequence of steps. Essentially what we are doing is the
following: imagine we are given
\[ 4 = 3a + 2b \]
\[ 5 = 2a + 5b. \]
If we want to solve, we can proceed in two ways. We can use the first equation to solve for
𝑏 in terms of 𝑎 and substitute in, or we can multiply the first equation by 5 and the second
equation by 2 and subtract; the 𝑏 terms cancel and we obtain the value of 𝑎. Explicitly,
\[ 20 = 15a + 10b \]
\[ 10 = 4a + 10b, \]
which yields $10 = 11a$, or $a = 10/11$.
Remark 3.2. The data plotted in Figure 1 was obtained by letting 𝑥𝑛 = 5 + .2𝑛 and then
letting 𝑦𝑛 = 5𝑥𝑛 plus an error randomly drawn from a normal distribution with mean zero
and standard deviation 4 (𝑛 ∈ {1, . . . , 100}). Using these values, we find a best fit line of
\[ y = 4.99x + .48; \tag{3.30} \]
thus $a = 4.99$ and $b = .48$. As the expected relation is $y = 5x$, we expected a best fit value of
5 for $a$ and 0 for $b$.
While our value for 𝑎 is very close to the true value, our value of 𝑏 is significantly off. We
deliberately chose data of this nature to indicate the dangers in using the Method of Least
Squares. Just because we know 4.99 is the best value for the slope and .48 is the best value
for the 𝑦-intercept does not mean that these are good estimates of the true values. The theory
needs to be supplemented with techniques which provide error estimates. Thus we want to
know something like, given this data, there is a 99% chance that the true value of 𝑎 is in
(4.96, 5.02) and the true value of 𝑏 is in (−.22, 1.18); this is far more useful than just knowing
the best fit values.
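For readers who wish to repeat an experiment of this flavor, here is a sketch; the random seed, and hence the fitted values, are our own choice and will not reproduce the exact numbers 4.99 and .48 above:

```python
import numpy as np

rng = np.random.default_rng(0)            # seed chosen for reproducibility
n = np.arange(1, 101)
x = 5 + 0.2 * n                           # x_n = 5 + .2 n, n = 1, ..., 100
y = 5 * x + rng.normal(0, 4, size=100)    # y_n = 5 x_n plus N(0, 4^2) noise

a, b = np.polyfit(x, y, 1)
print(a, b)  # the slope should be near 5; the intercept can wander from 0
```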
If instead we used
\[ E_{\mathrm{abs}}(a, b) = \sum_{n=1}^{N} |y_n - (a x_n + b)|, \tag{3.31} \]
then numerical techniques yield that the best fit value of $a$ is 5.03 and the best fit value of $b$
is less than $10^{-10}$ in absolute value. The difference between these values and those from the
Method of Least Squares is in the best fit value of $b$ (the less important of the two parameters),
and is due to the different ways of weighting the errors.
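The numerical techniques alluded to here can be as simple as handing (3.31) to a general-purpose minimizer; here is a sketch assuming scipy is available (Nelder-Mead is one reasonable choice, as the objective is not differentiable):

```python
import numpy as np
from scipy.optimize import minimize

def E_abs(params, x, y):
    a, b = params
    return np.sum(np.abs(y - (a * x + b)))   # the objective (3.31)

# Illustrative data in the spirit of Remark 3.2 (our own random draw).
x = 5 + 0.2 * np.arange(1, 101)
y = 5 * x + np.random.default_rng(0).normal(0, 4, size=100)

# Start from the least squares answer and refine for the absolute error.
start = np.polyfit(x, y, 1)
res = minimize(E_abs, start, args=(x, y), method="Nelder-Mead")
print(res.x)  # best fit (a, b) under the absolute error measure
```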
Exercise 3.3. Consider the observed data $(0, 0)$, $(1, 1)$, $(2, 2)$. It should be clear that the best
fit line is $y = x$; this leads to zero error in all three ways of measuring error, namely (2.10),
(2.11) and (2.12). However, show that if we use (2.10) to measure the error, then the line $y = 1$
also yields zero error, and clearly this should not be the best fit line!
Exercise 3.4. Generalize the method of least squares to find the best fit quadratic to 𝑦 = 𝑎𝑥2 +
𝑏𝑥+𝑐 (or more generally the best fit degree 𝑚 polynomial to 𝑦 = 𝑎𝑚 𝑥𝑚 +𝑎𝑚−1 𝑥𝑚−1 +⋅ ⋅ ⋅+𝑎0 ).
While for any real world problem direct computation determines whether or not the resulting
matrix is invertible, it is nice to be able to prove that the determinant is always non-zero for
the best fit line (provided the $x$'s are not all equal).
Exercise 3.5. If the 𝑥’s are not all equal, must the determinant be non-zero for the best fit
quadratic or the best fit cubic?
Looking at our proof of the Method of Least Squares, we note that it was not essential that
we have 𝑦 = 𝑎𝑥 + 𝑏; we could have had 𝑦 = 𝑎𝑓 (𝑥) + 𝑏𝑔(𝑥), and the arguments would have
proceeded similarly. The difference would be that we would now obtain
\[ \begin{pmatrix} \sum_{n=1}^{N} f(x_n)^2 & \sum_{n=1}^{N} f(x_n) g(x_n) \\ \sum_{n=1}^{N} f(x_n) g(x_n) & \sum_{n=1}^{N} g(x_n)^2 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum_{n=1}^{N} f(x_n) y_n \\ \sum_{n=1}^{N} g(x_n) y_n \end{pmatrix}. \tag{3.32} \]
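A sketch of this generalization (the function name and the choice $f = \sin$, $g = \cos$ are ours) builds and solves the system (3.32) for arbitrary supplied functions:

```python
import numpy as np

def best_fit_two_functions(f, g, x, y):
    """Solve the 2x2 system (3.32) for the fit y ~ a*f(x) + b*g(x)."""
    F, G = f(x), g(x)
    M = np.array([[np.sum(F * F), np.sum(F * G)],
                  [np.sum(F * G), np.sum(G * G)]])
    v = np.array([np.sum(F * y), np.sum(G * y)])
    return np.linalg.solve(M, v)

# Example: fit y = a*sin(x) + b*cos(x) to noisy data (illustrative).
x = np.linspace(0, 6, 50)
y = 2 * np.sin(x) - np.cos(x) + 0.1 * np.random.default_rng(1).normal(size=50)
print(best_fit_two_functions(np.sin, np.cos, x, y))  # roughly (2, -1)
```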
Finally, we comment briefly on a very important change of variable that allows us to use
the Method of Least Squares in many more situations than one might expect. Consider the
case of a researcher trying to prove Newton's Law of Universal Gravity, which says the force
felt by two masses $m_1$ and $m_2$ has magnitude $G m_1 m_2 / r^2$, where $r$ is the distance between the
objects. If we fix the masses, then we expect the magnitude of the force to be inversely pro-
portional to the square of the distance. We may write this as $F = k/r^n$, where we believe $n = 2$ (the value
for 𝑘 depends on 𝐺 and the product of the masses). Clearly it is 𝑛 that is the more important
parameter here. Unfortunately, as written, we cannot use the Method of Least Squares, as one
of the unknown parameters arises non-linearly (as the exponent of the separation).
We can surmount this problem by taking a logarithmic transform of the data. Setting
$\mathcal{K} = \log k$, $\mathcal{F} = \log F$ and $\mathcal{R} = \log r$, the relation $F = k/r^n$ becomes $\mathcal{F} = \mathcal{K} - n\mathcal{R}$, which is linear in $\mathcal{R}$. We are
now in a situation where we can apply the Method of Least Squares. The only difference from
the original problem is how we collect and process the data; now our data is not the separation
between the two masses, but rather the logarithm of the separation. Arguing along these lines,
many power relations can be converted to instances where we can use the Method of Least
Squares. We thus (finally) fulfill a promise made by many high school math teachers years
ago: logarithms can be useful!
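Here is a small sketch of this change of variables, with synthetic force/distance data generated from a known power law (our choice of $k = 10$, $n = 2$):

```python
import numpy as np

# Synthetic measurements from F = k / r^n with k = 10, n = 2 (illustrative).
r = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
F = 10.0 / r**2

# Take logs: log F = log k - n log r, a straight line in (log r, log F).
slope, intercept = np.polyfit(np.log(r), np.log(F), 1)
print(-slope, np.exp(intercept))  # recovers n = 2 and k = 10
```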
Exercise 3.6. Consider the generalization of the Method of Least Squares given in (3.32).
Under what conditions is the matrix invertible?
Exercise 3.7. The method of proof generalizes further to the case when one expects 𝑦 is a
linear combination of 𝐾 fixed functions. The functions need not be linear; all that is required
is that we have a linear combination, say 𝑎1 𝑓1 (𝑥) + ⋅ ⋅ ⋅ + 𝑎𝐾 𝑓𝐾 (𝑥). One then determines
the 𝑎1 , . . . , 𝑎𝐾 that minimize the variance (the sum of squares of the errors) by calculus and
linear algebra. Find the matrix equation that the best fit coefficients (𝑎1 , . . . , 𝑎𝐾 ) must satisfy.
Exercise 3.8. Consider the best fit line from the Method of Least Squares, so the best fit values
are given by (3.22). Is the point $(\overline{x}, \overline{y})$, where $\overline{x} = \frac{1}{N} \sum_{n=1}^{N} x_n$ and $\overline{y} = \frac{1}{N} \sum_{n=1}^{N} y_n$, on the best
fit line? In other words, does the best fit line go through the “average” point?
Exercise 3.9 (Kepler’s Third Law). Kepler’s third law states that if 𝑇 is the orbital period
of a planet traveling in an elliptical orbit about the sun (and no other objects exist), then
𝑇 2 = 𝐶𝐿3 , where 𝐿 is the length of the semi-major axis. I always found this the hardest of the
three laws; how would one be led to the right values of the exponents from observational data?
One way is through the Method of Least Squares. Set $\mathcal{T} = \log T$, $\mathcal{L} = \log L$ and $\mathcal{C} = \log C$.
Then a relationship of the form $T^a = C L^b$ becomes $a\mathcal{T} = b\mathcal{L} + \mathcal{C}$, which is amenable to the
Method of Least Squares. The semi-major axes of the 8 planets (sadly, Pluto is no longer
considered a planet) are Mercury 0.387, Venus 0.723, Earth 1.000, Mars 1.524, Jupiter 5.203,
Saturn 9.539, Uranus 19.182, Neptune 30.06 (the units are astronomical units, where one
astronomical unit is 1.496 ⋅108 km); the orbital periods (in years) are 0.2408467, 0.61519726,
1.0000174, 1.8808476, 11.862615, 29.447498, 84.016846 and 164.79132. Using this data,
apply the Method of Least Squares to find the best fit values of 𝑎 and 𝑏 in 𝑇 𝑎 = 𝐶𝐿𝑏 (note, of
course, you need to use the equation 𝑎𝒯 = 𝑏ℒ + 𝒞).
Actually, as phrased above, the problem is a little indeterminate, for the following reason.
Imagine we have $T^2 = 5L^3$ or $T^4 = 25L^6$ or $T = \sqrt{5}\, L^{1.5}$ or even $T^8 = 625L^{12}$. All of
these are the same equation! In other words, we might as well make our lives easy by taking
𝑎 = 1; there really is no loss in generality in doing this. This is yet another example of how
changing our point of view can really help us. At first it looks like this is a problem involving
three unknown parameters, 𝑎, 𝑏 and 𝐶; however, there is absolutely no loss in generality in
taking 𝑎 = 1; thus let us make our lives easier and just look at this special case.
(For your convenience: the computation requires the natural logarithms of the semi-major
axes and of the orbital periods listed above.)
The problem asks you to find the best fit values of $a$ and $b$. In some sense this is a bit
misleading, as there are infinitely many possible values for the pair $(a, b)$; however, all of
these pairs will have the same ratio $b/a$ (which Kepler says should be close to $3/2$, or 1.50). It
is this ratio that is truly important. The content of Kepler’s Third Law is that the square of the
period is proportional to the cube of the semi-major axis. The key numbers are the powers of
the period and the length (the 𝑎 and the 𝑏), not the proportionality constant. This is why I only
ask you to find the best fit values of 𝑎 and 𝑏 and not 𝐶 (or 𝒞), as 𝐶 (or 𝒞) is not as important.
If we take 𝑎 = 1 then the best fit value of 𝒞 is 0.000148796, and the best fit value of 𝑏 is almost
1.50.
Our notes above have many different formulas to find the best fit values $a$ and $b$ for a
relation $y = ax + b$. For us, we have $\mathcal{T} = \frac{b}{a} \mathcal{L} + \frac{\mathcal{C}}{a}$. Thus, for this problem, the role of $a$ from
before is being played by $\frac{b}{a}$ and the role of $b$ from before is being played by $\frac{\mathcal{C}}{a}$. Therefore if we
want to find the best fit value for the ratio $\frac{b}{a}$ for this problem, we just use the first of the two
formulas from (3.29).
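For the curious, here is a sketch carrying out the computation with the planetary data listed in the exercise, taking $a = 1$ and fitting $\mathcal{T} = b\mathcal{L} + \mathcal{C}$:

```python
import numpy as np

# Semi-major axes (AU) and orbital periods (years) of the 8 planets, from the text.
L = np.array([0.387, 0.723, 1.000, 1.524, 5.203, 9.539, 19.182, 30.06])
T = np.array([0.2408467, 0.61519726, 1.0000174, 1.8808476,
              11.862615, 29.447498, 84.016846, 164.79132])

# Fit log T = b * log L + log C, i.e. take a = 1.
b, logC = np.polyfit(np.log(L), np.log(T), 1)
print(b)  # comes out extremely close to 1.5, confirming Kepler's Third Law
```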
References
[BD] P. Bickel and K. Doksum, Mathematical Statistics: Basic Ideas and Selected Top-
ics, Holden-Day, San Francisco, 1977.
[CaBe] G. Casella and R. Berger, Statistical Inference, 2nd edition, Duxbury Advanced
Series, Pacific Grove, CA, 2002.
[Du] R. Durrett, Probability: Theory and Examples, 2nd edition, Duxbury Press, 1996.
[Fe] W. Feller, An Introduction to Probability Theory and Its Applications, 2nd edition,
Vol. II, John Wiley & Sons, New York, 1971.
[Kel] D. Kelley, Introduction to Probability, Macmillan Publishing Company, London,
1994.
[LF] R. Larson and B. Farber, Elementary Statistics: Picturing the World, Prentice-Hall,
Englewood Cliffs, NJ, 2003.

[MoMc] D. Moore and G. McCabe, Introduction to the Practice of Statistics, W. H. Freeman
and Co., London, 2003.