Lecture 13
Gauss-Newton method
Michael S. Floater
November 12, 2018
This problem can be formulated as (1) with n = 2, where the residuals are
\[
r_i(x) = r_i(x_1, x_2) = y_i - (x_1 + x_2 t_i), \qquad i = 1, \dots, m.
\]
More generally, we could fit a polynomial
\[
p(t) = \sum_{j=1}^{n} x_j t^{j-1},
\]
In all these cases, the problem is linear in the sense that the solution is
found by solving a linear system of equations. This is because f is quadratic
in x. We can express f as
\[
f(x) = \frac{1}{2} \|Ax - b\|^2,
\]
where $A \in \mathbb{R}^{m,n}$ is the Vandermonde matrix
\[
A = \begin{bmatrix}
\varphi_1(t_1) & \varphi_2(t_1) & \cdots & \varphi_n(t_1) \\
\varphi_1(t_2) & \varphi_2(t_2) & \cdots & \varphi_n(t_2) \\
\vdots & \vdots & & \vdots \\
\varphi_1(t_m) & \varphi_2(t_m) & \cdots & \varphi_n(t_m)
\end{bmatrix},
\]
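To make the linear case concrete, here is a minimal Python sketch (the language and the sample data are illustrative, not part of the lecture) that fits a polynomial by solving the linear least squares problem:

```python
import numpy as np

# Illustrative data (t_i, y_i); any data set would do.
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.1, 1.6, 2.4, 3.7, 5.1])

n = 3  # fit p(t) = x_1 + x_2 t + x_3 t^2

# Vandermonde matrix: A[i, j] = t_i^j for j = 0, ..., n-1,
# i.e. phi_j(t) = t^(j-1) in the notation above.
A = np.vander(t, n, increasing=True)

# Minimize f(x) = (1/2)||Ax - y||^2; lstsq computes the least
# squares solution (equivalent to solving A^T A x = A^T y).
x, *_ = np.linalg.lstsq(A, y, rcond=None)
print(x)  # coefficients x_1, ..., x_n
```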
Another is the exponential function
\[
p(t) = x_1 e^{x_2 t}.
\]
As in the linear case, we can reformulate this as the minimization of f in (1) with the residuals
\[
r_i(x) = r_i(x_1, x_2) = y_i - p(t_i).
\]
In these cases the problem is non-linear, since f is no longer a quadratic function (the residuals are no longer linear in the parameters $x_1, \dots, x_n$). One approach to minimizing such an f is to try Newton's method. Recall that Newton's method for minimizing f is simply Newton's method for solving the system of n equations $\nabla f(x) = 0$, which is the iteration
\[
x^{(k+1)} = x^{(k)} - \left( \nabla^2 f(x^{(k)}) \right)^{-1} \nabla f(x^{(k)}).
\]
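As a concrete sketch of this iteration in Python (the callables grad and hess for $\nabla f$ and $\nabla^2 f$ are assumptions of this example; no line search or safeguards are included):

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=100):
    """Newton's method for minimizing f by solving grad f(x) = 0.

    Bare-bones sketch: grad and hess return the gradient and
    Hessian of f at a point x.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # Newton step d solves  nabla^2 f(x) d = -nabla f(x).
        d = np.linalg.solve(hess(x), -grad(x))
        x = x + d
        if np.linalg.norm(grad(x)) <= tol:
            break
    return x
```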
2 Gauss-Newton method
The Gauss-Newton method is a simplification or approximation of the Newton method that applies to functions f of the form (1). Differentiating (1) with respect to $x_j$ gives
\[
\frac{\partial f}{\partial x_j} = \sum_{i=1}^{m} r_i \frac{\partial r_i}{\partial x_j},
\]
and so the gradient of f is
\[
\nabla f = J_r^T r,
\]
where $r = [r_1, \dots, r_m]^T$ and $J_r \in \mathbb{R}^{m,n}$ is the Jacobian of r,
\[
J_r = \left[ \frac{\partial r_i}{\partial x_j} \right]_{i=1,\dots,m,\; j=1,\dots,n}.
\]
Differentiating once more, the Hessian of f is
\[
\nabla^2 f = J_r^T J_r + \sum_{i=1}^{m} r_i \nabla^2 r_i.
\]
The Gauss-Newton method is obtained by dropping the second term, i.e., by approximating $\nabla^2 f$ by $J_r^T J_r$ in Newton's iteration, giving
\[
x^{(k+1)} = x^{(k)} - \left( J_r(x^{(k)})^T J_r(x^{(k)}) \right)^{-1} \nabla f(x^{(k)}).
\]
This approximation is justified by the fact that the residuals $r_i$ are small as we approach a minimum.
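In code, a Gauss-Newton iteration needs only the residual vector and its Jacobian. Here is a minimal Python sketch (the names res and jac are my own, not from the lecture):

```python
import numpy as np

def gauss_newton(res, jac, x0, tol=1e-10, max_iter=100):
    """Gauss-Newton iteration for minimizing f(x) = (1/2)||r(x)||^2."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r, J = res(x), jac(x)
        g = J.T @ r                       # gradient of f: J_r^T r
        if np.linalg.norm(g) <= tol:
            break
        # Gauss-Newton step d solves (J^T J) d = -g.
        d = np.linalg.solve(J.T @ J, -g)
        x = x + d
    return x
```

Numerically it is often preferable to obtain the step as the least squares solution of $J_r d = -r$ (e.g. via np.linalg.lstsq) instead of forming $J_r^T J_r$ explicitly, but the version above matches the formulas in the text.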
An advantage of this method is that it does not require computing the second order partial derivatives of the functions $r_i$. Another is that the search direction, i.e.,
\[
d^{(k)} = -\left( J_r(x^{(k)})^T J_r(x^{(k)}) \right)^{-1} \nabla f(x^{(k)}),
\]
is always a descent direction (as long as $J_r(x^{(k)})$ has full rank). This is because $J_r^T J_r$ is positive semi-definite, and hence so is its inverse whenever it exists, which means that
\[
\nabla f(x^{(k)})^T d^{(k)} = -\nabla f(x^{(k)})^T \left( J_r(x^{(k)})^T J_r(x^{(k)}) \right)^{-1} \nabla f(x^{(k)}) \le 0.
\]
If $J_r(x^{(k)})$ has full rank and $\nabla f(x^{(k)}) \neq 0$, this inequality is strict. This suggests that the Gauss-Newton method will typically be more robust than Newton's method.
There is still no guarantee, however, that the Gauss-Newton method will converge in general. In practice, one would want to incorporate a step length $\alpha^{(k)}$ into the iteration:
\[
x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)},
\]
choosing $\alpha^{(k)}$ by some rule like the Armijo rule, in order to ensure descent at each iteration.
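A damped variant along these lines, with a simple backtracking line search enforcing an Armijo-type sufficient decrease condition, might look like the following sketch (the constants 1e-4 and 0.5 are conventional choices, not from the lecture):

```python
import numpy as np

def damped_gauss_newton(res, jac, x0, tol=1e-10, max_iter=100):
    """Gauss-Newton with a backtracking (Armijo-type) step length."""
    x = np.asarray(x0, dtype=float)
    f = lambda z: 0.5 * res(z) @ res(z)   # f(x) = (1/2)||r(x)||^2
    for _ in range(max_iter):
        r, J = res(x), jac(x)
        g = J.T @ r
        if np.linalg.norm(g) <= tol:
            break
        d = np.linalg.solve(J.T @ J, -g)  # search direction d^(k)
        # Backtrack from alpha = 1 until the sufficient-decrease
        # condition f(x + alpha d) <= f(x) + c alpha g^T d holds
        # (this sketch puts no cap on the number of backtracks).
        alpha, c = 1.0, 1e-4
        while f(x + alpha * d) > f(x) + c * alpha * (g @ d):
            alpha *= 0.5
        x = x + alpha * d
    return x
```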
3 Example
In a biology experiment studying the relation between substrate concentration [S] and reaction rate in an enzyme-mediated reaction, the data in the following table were obtained.
  i      1       2       3       4       5       6       7
  [S]    0.038   0.194   0.425   0.626   1.253   2.500   3.740
  rate   0.050   0.127   0.094   0.2122  0.2729  0.2665  0.3317
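The model fitted to this data is not stated in this excerpt; a standard assumption for such enzyme kinetics data, and the one made in the sketch below, is the Michaelis-Menten form $p(t) = x_1 t/(x_2 + t)$. Using the gauss_newton sketch from Section 2, with the initial estimates $x_1 = 0.9$, $x_2 = 0.2$ mentioned below:

```python
import numpy as np

# Data from the table above.
S    = np.array([0.038, 0.194, 0.425, 0.626, 1.253, 2.500, 3.740])
rate = np.array([0.050, 0.127, 0.094, 0.2122, 0.2729, 0.2665, 0.3317])

# Assumed model (not stated in this excerpt): Michaelis-Menten,
# p(t) = x1 t / (x2 + t).
def res(x):
    return rate - x[0] * S / (x[1] + S)          # r_i = y_i - p(t_i)

def jac(x):
    # dr_i/dx1 = -S_i/(x2 + S_i),  dr_i/dx2 = x1 S_i/(x2 + S_i)^2
    return np.column_stack([-S / (x[1] + S),
                            x[0] * S / (x[1] + S) ** 2])

# gauss_newton is the sketch given in Section 2 above.
x = gauss_newton(res, jac, x0=np.array([0.9, 0.2]))
print(x)  # fitted parameters x_1, x_2
```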
[Figure: the measured reaction rate plotted against the substrate concentration [S].]
Starting with the same initial estimates of $x_1 = 0.9$ and $x_2 = 0.2$, Newton's method does not converge. However, if we change the initial estimates to $x_1 = 0.4$ and $x_2 = 0.6$, we find that both the Gauss-Newton and Newton methods converge. Moreover, using again the stopping criterion of (4), the Gauss-Newton method needs 11 iterations while Newton's method needs only 5.