Lecture 13
Gauss-Newton method
Michael S. Floater
November 12, 2018
This problem can be formulated as (1) with n = 2, where the residuals are
\[
r_i(x) = r_i(x_1, x_2) = y_i - (x_1 + x_2 t_i), \qquad i = 1, \dots, m.
\]
More generally, we could fit a polynomial
\[
p(t) = \sum_{j=1}^{n} x_j t^{j-1},
\]
In all these cases, the problem is linear in the sense that the solution is
found by solving a linear system of equations. This is because f is quadratic
in x. We can express f as
\[
f(x) = \frac{1}{2} \|Ax - b\|^2,
\]
where $A \in \mathbb{R}^{m,n}$ is the Vandermonde matrix
\[
A = \begin{bmatrix}
\varphi_1(t_1) & \varphi_2(t_1) & \cdots & \varphi_n(t_1) \\
\varphi_1(t_2) & \varphi_2(t_2) & \cdots & \varphi_n(t_2) \\
\vdots & \vdots & & \vdots \\
\varphi_1(t_m) & \varphi_2(t_m) & \cdots & \varphi_n(t_m)
\end{bmatrix},
\]
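To make the linear case concrete, here is a minimal Python sketch (the language and the sample data are illustrative, not part of the lecture) that fits a polynomial by solving the linear least squares problem:

```python
import numpy as np

# Illustrative data (t_i, y_i); any data set would do.
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.1, 1.6, 2.4, 3.7, 5.1])

n = 3  # fit p(t) = x_1 + x_2 t + x_3 t^2

# Vandermonde matrix: A[i, j] = t_i^j for j = 0, ..., n-1,
# i.e. phi_j(t) = t^(j-1) in the notation above.
A = np.vander(t, n, increasing=True)

# Minimize f(x) = (1/2)||Ax - y||^2; lstsq computes the least
# squares solution (equivalent to solving A^T A x = A^T y).
x, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x)  # coefficients x_1, ..., x_n
```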
Another is the exponential function
\[
p(t) = x_1 e^{x_2 t}.
\]
As in the linear case, we can reformulate this as the minimization of f in (1) with the residuals
\[
r_i(x) = r_i(x_1, x_2) = y_i - p(t_i).
\]
In these cases the problem is non-linear, since f is no longer a quadratic function (the residuals are no longer linear in the parameters $x_1, \dots, x_n$). One approach to minimizing such an f is to try Newton's method. Recall that Newton's method for minimizing f is simply Newton's method for solving the system of n equations $\nabla f(x) = 0$, which is the iteration
\[
x^{(k+1)} = x^{(k)} - \left( \nabla^2 f(x^{(k)}) \right)^{-1} \nabla f(x^{(k)}).
\]
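As a concrete sketch of this iteration in Python (the callables grad and hess for $\nabla f$ and $\nabla^2 f$ are assumptions of this example; no line search or safeguards are included):

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=100):
    """Newton's method for minimizing f by solving grad f(x) = 0.

    Bare-bones sketch: grad and hess return the gradient and
    Hessian of f at a point x.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # Newton step d solves  nabla^2 f(x) d = -nabla f(x).
        d = np.linalg.solve(hess(x), -grad(x))
        x = x + d
        if np.linalg.norm(grad(x)) <= tol:
            break
    return x
```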
2 Gauss-Newton method
The Gauss-Newton method is a simplification or approximation of the Newton method that applies to functions f of the form (1). Differentiating (1) with respect to $x_j$ gives
\[
\frac{\partial f}{\partial x_j} = \sum_{i=1}^{m} r_i \frac{\partial r_i}{\partial x_j},
\]
and so the gradient of f is
\[
\nabla f = J_r^T r,
\]
where $r = [r_1, \dots, r_m]^T$ and $J_r \in \mathbb{R}^{m,n}$ is the Jacobian of r,
\[
J_r = \left[ \frac{\partial r_i}{\partial x_j} \right]_{i=1,\dots,m,\; j=1,\dots,n}.
\]
Differentiating once more, the Hessian of f is
\[
\nabla^2 f = J_r^T J_r + \sum_{i=1}^{m} r_i \nabla^2 r_i.
\]
The Gauss-Newton method is obtained by dropping the second term, i.e., by approximating $\nabla^2 f$ by $J_r^T J_r$ in Newton's iteration, giving
\[
x^{(k+1)} = x^{(k)} - \left( J_r(x^{(k)})^T J_r(x^{(k)}) \right)^{-1} \nabla f(x^{(k)}).
\]
This approximation is justified by the fact that the residuals $r_i$ are small as we approach a minimum.
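In code, a Gauss-Newton iteration needs only the residual vector and its Jacobian. Here is a minimal Python sketch (the names res and jac are my own, not from the lecture):

```python
import numpy as np

def gauss_newton(res, jac, x0, tol=1e-10, max_iter=100):
    """Gauss-Newton iteration for minimizing f(x) = (1/2)||r(x)||^2."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r, J = res(x), jac(x)
        g = J.T @ r                       # gradient of f: J_r^T r
        if np.linalg.norm(g) <= tol:
            break
        # Gauss-Newton step d solves (J^T J) d = -g.
        d = np.linalg.solve(J.T @ J, -g)
        x = x + d
    return x
```

Numerically it is often preferable to obtain the step as the least squares solution of $J_r d = -r$ (e.g. via np.linalg.lstsq) instead of forming $J_r^T J_r$ explicitly, but the version above matches the formulas in the text.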
An advantage of this method is that it does not require computing the second order partial derivatives of the functions $r_i$. Another is that the search direction, i.e.,
\[
d^{(k)} = -\left( J_r(x^{(k)})^T J_r(x^{(k)}) \right)^{-1} \nabla f(x^{(k)}),
\]
is always a descent direction (as long as $J_r(x^{(k)})$ has full rank). This is because $J_r^T J_r$ is positive semi-definite, and hence so is its inverse whenever it exists, which means that
\[
\nabla f(x^{(k)})^T d^{(k)} = -\nabla f(x^{(k)})^T \left( J_r(x^{(k)})^T J_r(x^{(k)}) \right)^{-1} \nabla f(x^{(k)}) \le 0.
\]
If $J_r(x^{(k)})$ has full rank and $\nabla f(x^{(k)}) \neq 0$, this inequality is strict. This suggests that the Gauss-Newton method will typically be more robust than Newton's method.
There is still no guarantee, however, that the Gauss-Newton method will converge in general. In practice, one would want to incorporate a step length $\alpha^{(k)}$ into the iteration:
\[
x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)},
\]
choosing $\alpha^{(k)}$ by some rule like the Armijo rule, in order to ensure descent at each iteration.
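A damped variant along these lines, with a simple backtracking line search enforcing an Armijo-type sufficient decrease condition, might look like the following sketch (the constants 1e-4 and 0.5 are conventional choices, not from the lecture):

```python
import numpy as np

def damped_gauss_newton(res, jac, x0, tol=1e-10, max_iter=100):
    """Gauss-Newton with a backtracking (Armijo-type) step length."""
    x = np.asarray(x0, dtype=float)
    f = lambda z: 0.5 * res(z) @ res(z)   # f(x) = (1/2)||r(x)||^2
    for _ in range(max_iter):
        r, J = res(x), jac(x)
        g = J.T @ r
        if np.linalg.norm(g) <= tol:
            break
        d = np.linalg.solve(J.T @ J, -g)  # search direction d^(k)
        # Backtrack from alpha = 1 until the sufficient-decrease
        # condition f(x + alpha d) <= f(x) + c alpha g^T d holds
        # (this sketch puts no cap on the number of backtracks).
        alpha, c = 1.0, 1e-4
        while f(x + alpha * d) > f(x) + c * alpha * (g @ d):
            alpha *= 0.5
        x = x + alpha * d
    return x
```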
3 Example
In a biology experiment studying the relation between substrate concentration [S] and reaction rate in an enzyme-mediated reaction, the data in the following table were obtained.
  i      1       2       3       4       5       6       7
  [S]    0.038   0.194   0.425   0.626   1.253   2.500   3.740
  rate   0.050   0.127   0.094   0.2122  0.2729  0.2665  0.3317
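The model fitted to this data is not stated in this excerpt; a standard assumption for such enzyme kinetics data, and the one made in the sketch below, is the Michaelis-Menten form $p(t) = x_1 t/(x_2 + t)$. Using the gauss_newton sketch from Section 2, with the initial estimates $x_1 = 0.9$, $x_2 = 0.2$ mentioned below:

```python
import numpy as np

# Data from the table above.
S    = np.array([0.038, 0.194, 0.425, 0.626, 1.253, 2.500, 3.740])
rate = np.array([0.050, 0.127, 0.094, 0.2122, 0.2729, 0.2665, 0.3317])

# Assumed model (not stated in this excerpt): Michaelis-Menten,
# p(t) = x1 t / (x2 + t).
def res(x):
    return rate - x[0] * S / (x[1] + S)          # r_i = y_i - p(t_i)

def jac(x):
    # dr_i/dx1 = -S_i/(x2 + S_i),  dr_i/dx2 = x1 S_i/(x2 + S_i)^2
    return np.column_stack([-S / (x[1] + S),
                            x[0] * S / (x[1] + S) ** 2])

# gauss_newton is the sketch given in Section 2 above.
x = gauss_newton(res, jac, x0=np.array([0.9, 0.2]))
print(x)  # fitted parameters x_1, x_2
```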
[Figure: the measured reaction rate plotted against the substrate concentration [S].]
Starting with the same initial estimates of $x_1 = 0.9$ and $x_2 = 0.2$, Newton's method does not converge. However, if we change the initial estimates to $x_1 = 0.4$ and $x_2 = 0.6$, we find that both the Gauss-Newton and Newton methods converge. Moreover, using again the stopping criterion of (4), the Gauss-Newton method needs 11 iterations while Newton's method needs only 5.