The Levenberg-Marquardt Method For Nonlinear Least Squares Curve-Fitting Problems
    \chi^2(p) = \sum_{i=1}^{m} \left[ \frac{y(t_i) - \hat{y}(t_i; p)}{w_i} \right]^2        (1)
              = (y - \hat{y}(p))^T W (y - \hat{y}(p))                                       (2)
              = y^T W y - 2 y^T W \hat{y} + \hat{y}^T W \hat{y}                             (3)
The value w_i is a measure of the error in measurement y(t_i). The weighting matrix W is
diagonal, with W_{ii} = 1/w_i^2. If the function \hat{y} is nonlinear in the model parameters p, then
the minimization of \chi^2 with respect to the parameters must be carried out iteratively. The
goal of each iteration is to find a perturbation h to the parameters p that reduces \chi^2.
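For reference, the criterion of equations (1)-(3) can be evaluated directly in Matlab. The short sketch below is illustrative only: it assumes a user-supplied model function func(t,p), a data vector y_dat, and measurement errors w; these variable names are not part of lm.m.

    w     = 0.2 * ones(size(y_dat));                  % assumed measurement errors w_i (here: constant)
    W     = diag(1./w.^2);                            % weighting matrix, W_ii = 1/w_i^2
    y_hat = func(t,p);                                % model evaluated at the current parameters p
    X2    = (y_dat - y_hat)' * W * (y_dat - y_hat);   % chi-squared error criterion, equation (2)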
2 The Gradient Descent Method
The steepest descent method is a general minimization method which updates parameter
values in the direction opposite to the gradient of the objective function. It is recognized
as a highly convergent algorithm for finding the minimum of simple objective functions [2, 3].
For problems with thousands of parameters, gradient descent methods are sometimes the
only viable option.
The gradient of the chi-squared objective function with respect to the parameters is
    \frac{\partial}{\partial p} \chi^2 = 2 (y - \hat{y}(p))^T W \frac{\partial}{\partial p} (y - \hat{y}(p))        (4)
                                       = -2 (y - \hat{y}(p))^T W \left[ \frac{\partial \hat{y}(p)}{\partial p} \right]   (5)
                                       = -2 (y - \hat{y})^T W J                                                          (6)
where the m x n Jacobian matrix [\partial \hat{y} / \partial p] represents the local sensitivity of the function \hat{y}
to variation in the parameters p. For notational simplicity, J will be used for [\partial \hat{y} / \partial p]. The
perturbation h that moves the parameters in the direction of steepest descent is given by

    h_{gd} = \alpha J^T W (y - \hat{y}) ,        (7)

where the positive scalar \alpha determines the length of the step in the steepest-descent direction.
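In code, a single steepest-descent update of equation (7) is a matrix-vector product. A minimal sketch, assuming J, W, y_dat, and y_hat have already been formed; the step length alpha is an illustrative value, not a parameter of lm.m:

    alpha = 0.001;                                % assumed step-length parameter
    h_gd  = alpha * J' * W * (y_dat - y_hat);     % steepest-descent perturbation, equation (7)
    p     = p + h_gd;                             % update the parameters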
3 The Gauss-Newton Method
The Gauss-Newton method is a method for minimizing a sum-of-squares objective function.
It presumes that the objective function is approximately quadratic in the parameters
near the optimal solution [?]. For moderately-sized problems the Gauss-Newton method
typically converges much faster than gradient-descent methods [4].
The function evaluated with perturbed model parameters may be locally approximated
through a first-order Taylor series expansion,

    \hat{y}(p + h) \approx \hat{y}(p) + \left[ \frac{\partial \hat{y}}{\partial p} \right] h = \hat{y} + J h .        (8)
Substituting the approximation for the perturbed function, \hat{y} + J h, for \hat{y} in equation (3),

    \chi^2(p + h) \approx y^T W y + \hat{y}^T W \hat{y} - 2 y^T W \hat{y} - 2 (y - \hat{y})^T W J h + h^T J^T W J h .        (9)
This shows that \chi^2 is approximately quadratic in the perturbation h, and that the Hessian
of the chi-squared fit criterion is approximately J^T W J.
The perturbation h that minimizes \chi^2 is found from \partial \chi^2 / \partial h = 0,

    \frac{\partial}{\partial h} \chi^2(p + h) \approx -2 (y - \hat{y})^T W J + 2 h^T J^T W J ,        (10)
and the resulting normal equations for the Gauss-Newton perturbation are

    \left[ J^T W J \right] h_{gn} = J^T W (y - \hat{y}) .        (11)
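Numerically, the Gauss-Newton perturbation is obtained by solving the normal equations (11) as a linear system. A minimal sketch, again assuming J, W, y_dat, and y_hat are available:

    h_gn = ( J' * W * J ) \ ( J' * W * (y_dat - y_hat) );   % solve the normal equations (11)
    p    = p + h_gn;                                        % Gauss-Newton parameter update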
4 The Levenberg-Marquardt Method
The Levenberg-Marquardt algorithm adaptively varies the parameter updates between
the gradient descent update and the Gauss-Newton update,

    \left[ J^T W J + \lambda I \right] h_{lm} = J^T W (y - \hat{y}) ,        (12)

where small values of the algorithmic parameter \lambda result in a Gauss-Newton update and large
values of \lambda result in a gradient descent update. The parameter \lambda is initialized to be large.
If an iteration happens to result in a worse approximation, \lambda is increased. As the solution
approaches the minimum, \lambda is decreased, the Levenberg-Marquardt method approaches the
Gauss-Newton method, and the solution typically converges rapidly to the local minimum
[2, 3, 4].
Marquardt's suggested update relationship [4],

    \left[ J^T W J + \lambda \, \mathrm{diag}(J^T W J) \right] h_{lm} = J^T W (y - \hat{y}) ,        (13)

is used in the Levenberg-Marquardt algorithm implemented in the Matlab function lm.m.
4.1 Numerical Implementation
Many variations of the Levenberg-Marquardt method have been published in papers and in
code. This document borrows from some of these, including the enhancement of a rank-1
Jacobian update. In iteration i, the step h is evaluated by comparing \chi^2(p) to \chi^2(p + h).
The step is accepted if the metric \rho_i [5] is greater than a user-specified value, \epsilon_4,

    \rho_i(h) = \left[ \chi^2(p) - \chi^2(p + h) \right] / \left[ 2 h^T \left( \lambda_i h + J^T W (y - \hat{y}(p)) \right) \right] .
If in an iteration \rho_i(h) > \epsilon_4, then p + h is sufficiently better than p, p is replaced by p + h,
and \lambda is reduced by a factor. Otherwise \lambda is increased by a factor, and the algorithm proceeds
to the next iteration.
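The logic of a single iteration, using Marquardt's scaling of equation (13) and the acceptance metric \rho_i, can be sketched as follows. The helper chi2(), the factors L_up and L_down, and the tolerance epsilon_4 are illustrative stand-ins for the corresponding quantities in lm.m and are not part of its interface:

    JtWJ   = J' * W * J;                                         % approximate Hessian
    JtWdy  = J' * W * (y_dat - y_hat);                           % gradient-related vector
    h      = ( JtWJ + lambda * diag(diag(JtWJ)) ) \ JtWdy;       % L-M step, equation (13)
    X2_try = chi2(p + h);                                        % chi2() = hypothetical helper returning chi-squared
    rho    = ( X2 - X2_try ) / ( 2 * h' * (lambda*h + JtWdy) );  % step-acceptance metric rho_i
    if rho > epsilon_4                                           % the step is sufficiently better
        p  = p + h;   X2 = X2_try;
        lambda = lambda / L_down;                                % accept: move toward Gauss-Newton
    else
        lambda = lambda * L_up;                                  % reject: move toward gradient descent
    end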
4.1.1 Initialization and update of the L-M parameter, \lambda, and the parameters p
In lm.m users may select one of three methods for initializing and updating \lambda and p.

1. \lambda_0 = \lambda_o; \lambda_o is user-specified [4].

       \left[ J^T W J + \lambda_i \, \mathrm{diag}[J^T W J] \right] h = J^T W (y - \hat{y}(p))

   if \rho_i(h) > \epsilon_4:  p \leftarrow p + h;  \lambda_{i+1} = \max[ \lambda_i / L_\downarrow , 10^{-7} ];
   otherwise:  \lambda_{i+1} = \min[ \lambda_i L_\uparrow , 10^{7} ];

2. \lambda_0 = \lambda_o \max\left[ \mathrm{diag}[J^T W J] \right]; \lambda_o is user-specified.

       \left[ J^T W J + \lambda_i I \right] h = J^T W (y - \hat{y}(p)) ,

       \alpha = \left[ \left( J^T W (y - \hat{y}(p)) \right)^T h \right] / \left[ \left( \chi^2(p + h) - \chi^2(p) \right)/2 + 2 \left( J^T W (y - \hat{y}(p)) \right)^T h \right] ;

   if \rho_i(h) > \epsilon_4:  p \leftarrow p + \alpha h;  \lambda_{i+1} = \max[ \lambda_i / (1 + \alpha) , 10^{-7} ];
   otherwise:  \lambda_{i+1} = \lambda_i + | \chi^2(p + \alpha h) - \chi^2(p) | / (2 \alpha);

3. \lambda_0 = \lambda_o \max\left[ \mathrm{diag}[J^T W J] \right]; \lambda_o is user-specified [5].

       \left[ J^T W J + \lambda_i I \right] h = J^T W (y - \hat{y}(p)) ,

   if \rho_i(h) > \epsilon_4:  p \leftarrow p + h;  \lambda_{i+1} = \lambda_i \max\left[ 1/3 , 1 - (2\rho_i - 1)^3 \right];  \nu_i = 2;
   otherwise:  \lambda_{i+1} = \lambda_i \nu_i;  \nu_{i+1} = 2 \nu_i;
For the examples in section 4.4, method 1 [4] was used, with L_\uparrow = 11 and L_\downarrow = 9. Iterations
are terminated when any one of the following convergence criteria is satisfied:

    Convergence in the gradient:  \max | J^T W (y - \hat{y}) | < \epsilon_1 ;
    Convergence in parameters:  \max | h_i / p_i | < \epsilon_2 ; or
    Convergence in \chi^2:  \chi^2 / (m - n + 1) < \epsilon_3 .

Otherwise, iterations terminate when the iteration count exceeds a pre-specified limit.
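These three tests can be expressed directly in terms of quantities already computed at each iteration. A brief sketch, with epsilon_1, epsilon_2, and epsilon_3 corresponding to opts(3), opts(4), and opts(5) of lm.m:

    converged = max(abs( J' * W * (y_dat - y_hat) )) < epsilon_1 ...   % convergence in the gradient
             || max(abs( h ./ p ))                   < epsilon_2 ...   % convergence in the parameters
             || X2 / (m - n + 1)                     < epsilon_3;      % convergence in chi-squared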
4.2 Error Analysis
Once the optimal curve-fit parameters p_{fit} are determined, parameter statistics are
computed for the converged solution using weight values, w_i^2, equal to the mean square
measurement error, \sigma_y^2,

    w_i^2 = \sigma_y^2 = \frac{1}{m - n + 1} (y - \hat{y}(p_{fit}))^T (y - \hat{y}(p_{fit}))  \quad \forall \, i .        (17)

The parameter covariance matrix is then computed from

    V_p = [J^T W J]^{-1} ,        (18)

and the asymptotic standard parameter errors are given by

    \sigma_p = \sqrt{ \mathrm{diag}\left( [J^T W J]^{-1} \right) } .        (19)

The asymptotic standard parameter error is a measure of how unexplained variability in the
data propagates to variability in the parameters, and is essentially an error measure for the
parameters. The standard error of the fit is given by

    \sigma_{\hat{y}} = \sqrt{ \mathrm{diag}\left( J [J^T W J]^{-1} J^T \right) } .        (20)

The standard error of the fit indicates how variability in the parameters affects the variability
in the curve-fit. The asymptotic standard prediction error reflects the standard error of the
fit as well as the mean square measurement error,

    \sigma_{\hat{y}_p} = \sqrt{ \sigma_y^2 + \mathrm{diag}\left( J [J^T W J]^{-1} J^T \right) } .        (21)
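Given the Jacobian J and the weight matrix W at the converged solution, equations (17)-(21) reduce to a few lines of Matlab. This sketch assumes the converged model values y_hat and the dimensions m and n are available; variable names are illustrative:

    r        = y_dat - y_hat;                      % residuals at the converged parameters p_fit
    sigma2_y = r' * r / (m - n + 1);               % mean square measurement error,        equation (17)
    Vp       = inv(J' * W * J);                    % parameter covariance matrix,          equation (18)
    sigma_p  = sqrt(diag(Vp));                     % asymptotic standard parameter errors, equation (19)
    sigma_y  = sqrt(diag(J * Vp * J'));            % standard error of the fit,            equation (20)
    sigma_yp = sqrt(sigma2_y + diag(J * Vp * J')); % asymptotic standard prediction error, equation (21)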
4.3 Matlab code: lm.m
The Matlab function lm.m implements the Levenberg-Marquardt method for curve-fitting
problems. The code, with examples, is available here:

    http://www.duke.edu/~hpgavin/lm.m
    http://www.duke.edu/~hpgavin/lm_examp.m
    http://www.duke.edu/~hpgavin/lm_func.m
    http://www.duke.edu/~hpgavin/lm_plots.m
function [p,X2,sigma_p,sigma_y,corr,R_sq,cvg_hst] = lm(func,p,t,y_dat,weight,dp,p_min,p_max,c,opts)
% [p, X2, sigma_p, sigma_y, corr, R_sq, cvg_hst] = lm(func, p, t, y_dat, weight, dp, p_min, p_max, c, opts)
%
% Levenberg-Marquardt curve-fitting: minimize sum of weighted squared residuals
% INPUT VARIABLES
%   func   = function of n independent variables, t, and m parameters, p,
%            returning the simulated model: y_hat = func(t,p,c)
%   p      = n-vector of initial guess of parameter values
%   t      = m-vectors or matrix of independent variables (used as arg to func)
%   y_dat  = m-vectors or matrix of data to be fit by func(t,p)
%   weight = weighting vector for least squares fit ( weight >= 0 ) ...
%            inverse of the standard measurement errors
%            Default:  sqrt( d.o.f. / (y_dat'*y_dat) )
%   dp     = fractional increment of p for numerical derivatives
%            dp(j) > 0  central differences calculated
%            dp(j) < 0  one-sided backwards differences calculated
%            dp(j) = 0  sets corresponding partials to zero; i.e. holds p(j) fixed
%            Default: 0.001;
%   p_min  = n-vector of lower bounds for parameter values
%   p_max  = n-vector of upper bounds for parameter values
%   c      = an optional matrix of values passed to func(t,p,c)
%   opts   = vector of algorithmic parameters
%               parameter       defaults    meaning
%   opts(1)  =  prnt               3        >1 intermediate results; >2 plots
%   opts(2)  =  MaxIter         10*Npar     maximum number of iterations
%   opts(3)  =  epsilon_1         1e-3      convergence tolerance for gradient
%   opts(4)  =  epsilon_2         1e-3      convergence tolerance for parameters
%   opts(5)  =  epsilon_3         1e-3      convergence tolerance for Chi-square
%   opts(6)  =  epsilon_4         1e-2      determines acceptance of a L-M step
%   opts(7)  =  lambda_0          1e-2      initial value of L-M parameter
%   opts(8)  =  lambda_UP_fac      11       factor for increasing lambda
%   opts(9)  =  lambda_DN_fac       9       factor for decreasing lambda
%   opts(10) =  Update_Type         1       1: Levenberg-Marquardt lambda update
%                                           2: Quadratic update
%                                           3: Nielsen's lambda update equations
%
% OUTPUT VARIABLES
%   p       = least-squares optimal estimate of the parameter values
%   X2      = Chi squared criteria
%   sigma_p = asymptotic standard error of the parameters
%   sigma_y = asymptotic standard error of the curve-fit
%   corr    = correlation matrix of the parameters
%   R_sq    = R-squared coefficient of multiple determination
%   cvg_hst = convergence history
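The dp argument controls how lm.m approximates the Jacobian [\partial \hat{y} / \partial p] by finite differences. A minimal sketch of the central-difference case (dp(j) > 0) is shown below; the perturbation logic is simplified relative to the actual implementation and the variable names are illustrative:

    n = length(p);   m = length(t);
    J = zeros(m,n);                              % Jacobian,  J(i,j) = d y_hat(t_i) / d p(j)
    for j = 1:n
        del    = dp(j) * (1 + abs(p(j)));        % perturbation of the j-th parameter
        p_f    = p;   p_f(j) = p(j) + del;
        p_b    = p;   p_b(j) = p(j) - del;
        J(:,j) = ( func(t,p_f,c) - func(t,p_b,c) ) / (2*del);   % central difference
    end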
The m-file to solve a least-squares curve-fit problem with lm.m can be as simple as:

my_data = load('my_data_file');    % load the data
t     = my_data(:,1);              % if the independent variable is in column 1
y_dat = my_data(:,2);              % if the dependent variable is in column 2

p_min  = [ -10  0.1   5  0.1 ];    % minimum expected parameter values
p_max  = [  10  5.0  15  0.5 ];    % maximum expected parameter values
p_init = [   3  2.0  10  0.2 ];    % initial guess for parameter values

[ p_fit, X2, sigma_p, sigma_y, corr, R_sq, cvg_hst ] = ...
      lm('lm_func', p_init, t, y_dat, 1, -0.01, p_min, p_max)
where the user-supplied function lm_func.m could be, for example,

function y_hat = lm_func(t,p,c)
y_hat = p(1) * t .* exp(-t/p(2)) .* cos(2*pi*( p(3)*t - p(4) ));
It is common and desirable to repeat the same experiment two or more times and to estimate
a single set of curve-fit parameters from all the experiments. In such cases the data file may
be arranged as follows:

% t-variable    y (1st experiment)    y (2nd experiment)    y (3rd experiment)
    0.50000         3.5986                3.60192               3.58293
    0.80000         8.1233                8.01231               8.16234
    0.90000        12.2342               12.29523              12.01823
      :               :                     :                     :
     etc.            etc.                  etc.                  etc.
If your data is arranged as above you may prepare the data for lm.m using the following lines.

my_data = load('my_data_file');        % load the data
t_column  = 1;                         % column of the independent variable
y_columns = [ 2 3 4 ];                 % columns of the measured dependent variables

y_dat = my_data(:,y_columns);          % the measured data
y_dat = y_dat(:);                      % a single column vector

t = my_data(:,t_column);               % the independent variable
t = t*ones(1,length(y_columns));       % a column of t for each column of y
t = t(:);                              % a single column vector
Note that the arguments t and y_dat to lm.m may be matrices as long as the dimensions of
t match the dimensions of y_dat. The columns of t need not be identical. Results may be
plotted with lm_plots.m:
function lm_plots ( t, y_dat, y_fit, sigma_y, cvg_hst, filename, epsPlots )
% lm_plots( t, y_dat, y_fit, sigma_y, cvg_hst, filename, epsPlots )
% Plot statistics of the results of a Levenberg-Marquardt least squares
% analysis with lm.m

formatPlot(epsPlots);

y_dat = y_dat(:);
y_fit = y_fit(:);

n = size(cvg_hst,2) - 3;

figure(101);   % plot convergence history of parameters, chi^2, lambda
clf
subplot(211)
plot( cvg_hst(:,1), cvg_hst(:,2:n+1), '-o', 'linewidth',4 );
ylabel('parameter values')
subplot(212)
semilogy( cvg_hst(:,1), [ 10.^cvg_hst(:,n+2) cvg_hst(:,n+3) ], '-o', 'linewidth',4 )
legend('10^{\chi^2/(m-n+1)}','\lambda');
ylabel('10^{\chi^2/(m-n+1)} and \lambda')
xlabel('function calls')
if epsPlots, print( sprintf('%s_1.eps',filename), '-color','-solid','-F:28'); end


figure(102);   % plot data, fit, and confidence interval of fit
clf
plot( t,y_dat,'og', t,y_fit,'-b', ...
      t,y_fit+1.96*sigma_y,'.k', t,y_fit-1.96*sigma_y,'.k');
legend('y_{data}','y_{fit}','95% c.i.','');
ylabel('y(t)')
xlabel('t')
% subplot(212)
%  semilogy( t, sigma_y, '-r', 'linewidth',4 );
%  ylabel('\sigma_y(t)')
if epsPlots, print( sprintf('%s_2.eps',filename), '-color','-solid','-F:28'); end

figure(103);   % plot histogram of residuals, are they Gaussian?
clf
hist( real(y_dat - y_fit) )
title('histogram of residuals')
axis('tight')
xlabel('y_{data} - y_{fit}')
ylabel('count')
if epsPlots, print( sprintf('%s_3.eps',filename), '-color','-solid','-F:28'); end
4.4 Numerical Examples
The robustness of lm.m is tested in three numerical examples by curve-fitting simulated
experimental measurements. Noisy experimental measurements y are simulated by adding
random measurement noise to the curve-fit function evaluated with a set of true parameter
values, \hat{y}(t; p_{true}). The random measurement noise is normally distributed with a mean of
zero and a standard deviation of 0.20,

    y_i = \hat{y}(t_i; p_{true}) + N(0, 0.20) .        (22)

The convergence of the parameters from an erroneous initial guess p_{initial} to values closer to
p_{true} is then examined.
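In Matlab, measurements of this kind can be simulated with randn. A short sketch, assuming the model function lm_func and a true parameter vector p_true have already been defined:

    t     = (1:100)';                                       % m = 100 measurement points
    y_dat = lm_func(t, p_true, []) + 0.20*randn(100,1);     % true curve plus N(0, 0.20) noise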
Each numerical example below has four parameters (n = 4) and one hundred measurements
(m = 100). Each numerical example has a different curve-fit function \hat{y}(t; p), a
different true parameter vector p_{true}, and a different vector of initial parameters p_{initial}.
For several values of p_2 and p_4, the \chi^2 error criterion is calculated and is plotted as a
surface over the p_2 - p_4 plane. The bowl-shaped nature of the objective function is clearly
evident in each example. The objective function may not appear quadratic in the parameters,
and the objective function may have multiple minima. The presence of measurement noise
does not affect the smoothness of the objective function.
The gradient descent method endeavors to move parameter values in a down-hill direction
to minimize \chi^2(p). This often requires small step sizes but is required when the objective
function is not quadratic. The Gauss-Newton method approximates the bowl shape as
a quadratic and endeavors to move parameter values to the minimum in a small number of
steps. This method works well when the parameters are close to their optimal values. The
Levenberg-Marquardt method retains the best features of both the gradient-descent method
and the Gauss-Newton method.
The evolution of the parameter values, the evolution of \chi^2, and the evolution of \lambda from
iteration to iteration are plotted for each example.
The simulated experimental data, the curve fit, and the 95-percent confidence interval
of the fit are plotted; the standard error of the fit and a histogram of the fit errors are also
plotted.
The initial parameter values p_{initial}, the true parameter values p_{true}, the fit parameter
values p_{fit}, the standard error of the fit parameters \sigma_p, and the correlation matrix of the
fit parameters are tabulated. The true parameter values lie within the confidence interval
p_{fit} - 1.96 \sigma_p < p_{true} < p_{fit} + 1.96 \sigma_p with a confidence level of 95 percent.
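This claim can be checked directly from the outputs of lm.m. A one-line sketch, assuming p_true is known (as it is in these simulated examples):

    in_ci = ( p_fit - 1.96*sigma_p < p_true ) & ( p_true < p_fit + 1.96*sigma_p )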
4.4.1 Example 1
Consider fitting the following function to a set of measured data.

    \hat{y}(t; p) = p_1 \exp(-t/p_2) + p_3 t \exp(-t/p_4)        (23)

The m-function to be used with lm.m is simply:

function y_hat = lm_func(t,p,c)
y_hat = p(1)*exp(-t/p(2)) + p(3)*t.*exp(-t/p(4));
The true parameter values p_{true}, the initial parameter values p_{initial}, the resulting curve-fit
parameter values p_{fit}, and the standard errors of the fit parameters \sigma_p are shown in Table 1.
The R^2 fit criterion is 98 percent. The standard parameter errors are less than one percent
of the parameter values, except for the standard error for p_2, which is 1.5 percent of p_2.
The parameter correlation matrix is given in Table 2. Parameters p_3 and p_4 are the most
correlated, at -96 percent. Parameters p_1 and p_4 are the least correlated, at -35 percent.
The bowl-shaped nature of the \chi^2 objective function is shown in Figure 1(a). This
shape is nearly quadratic and has a single minimum.
The convergence of the parameters and the evolution of \chi^2 and \lambda are shown in Figure 1(b).
The data points, the curve fit, and the curve-fit confidence band are plotted in Figure 1(c).
Note that the standard error of the fit is smaller near the center of the fit domain
and is larger at the edges of the domain.
A histogram of the difference between the data values and the curve-fit is shown in
Figure 1(d). Ideally these curve-fit errors should be normally distributed.
Table 1. Parameter values and standard errors.

  p_initial   p_true    p_fit     \sigma_p   \sigma_p/p_fit (%)
      5.0      20.0    19.918      0.150          0.75
      2.0      10.0    10.159      0.152          1.50
      0.2       1.0     0.9958     0.005          0.57
     10.0      50.0    50.136      0.209          0.41

Table 2. Parameter correlation matrix.

          p_1      p_2      p_3      p_4
  p_1     1.00    -0.74     0.40    -0.35
  p_2    -0.74     1.00    -0.77     0.71
  p_3     0.40    -0.77     1.00    -0.96
  p_4    -0.35     0.71    -0.96     1.00
Figure 1. (a) The sum of the squared errors as a function of p_2 and p_4. (b) Top: the convergence
of the parameters with each iteration; Bottom: values of \chi^2 and \lambda at each iteration. (c) Top:
data y, curve-fit \hat{y}(t; p_{fit}), curve-fit + error, and curve-fit - error; Bottom: standard error of
the fit, \sigma_{\hat{y}}(t). (d) Histogram of the errors between the data and the fit.
4.4.2 Example 2
Consider fitting the following function to a set of measured data.

    \hat{y}(t; p) = p_1 (t/\max(t)) + p_2 (t/\max(t))^2 + p_3 (t/\max(t))^3 + p_4 (t/\max(t))^4        (24)

This function is linear in the parameters and may be fit using methods of linear least squares.
The m-function to be used with lm.m is simply:

function y_hat = lm_func(t,p,c)
mt = max(t);
y_hat = p(1)*(t/mt) + p(2)*(t/mt).^2 + p(3)*(t/mt).^3 + p(4)*(t/mt).^4;
The true parameter values p_{true}, the initial parameter values p_{initial}, the resulting curve-fit
parameter values p_{fit}, and the standard errors of the fit parameters \sigma_p are shown in Table 3. The
R^2 fit criterion is 99.9 percent. In this example, the standard parameter errors are larger
than in example 1. The standard error for p_2 is 12 percent and the standard error of p_3
is 17 percent. Note that a very high value of the R^2 coefficient of determination does not
necessarily mean that parameter values have been found with great accuracy. The parameter
correlation matrix is given in Table 4. These parameters are highly correlated with one
another, meaning that a change in one parameter will almost certainly result in changes in
the other parameters.
The bowl-shaped nature of the \chi^2 objective function is shown in Figure 2(a). This
shape is nearly quadratic and has a single minimum. The correlation of parameters p_2 and
p_4, for example, is easily seen from this figure.
The convergence of the parameters and the evolution of \chi^2 and \lambda are shown in Figure 2(b).
The parameters converge monotonically to their final values.
The data points, the curve fit, and the curve-fit confidence band are plotted in Figure 2(c).
Note that the standard error of the fit approaches zero at t = 0 and is largest at
t = 100. This is because \hat{y}(0; p) = 0, regardless of the values in p.
A histogram of the difference between the data values and the curve-fit is shown in
Figure 2(d). Ideally these curve-fit errors should be normally distributed, and they appear
to be so in this example.
Table 3. Parameter values and standard errors.

  p_initial   p_true     p_fit     \sigma_p   \sigma_p/p_fit (%)
      4.0      20.0     19.934      0.506          2.54
     -5.0     -24.0    -22.959      2.733         11.90
      6.0      30.0     27.358      4.597         16.80
     10.0     -40.0    -38.237      2.420          6.33

Table 4. Parameter correlation matrix.

          p_1      p_2      p_3      p_4
  p_1     1.00    -0.97     0.92    -0.87
  p_2    -0.97     1.00    -0.99     0.95
  p_3     0.92    -0.99     1.00    -0.99
  p_4    -0.87     0.95    -0.99     1.00
Figure 2. (a) The sum of the squared errors as a function of p_2 and p_4. (b) Top: the convergence
of the parameters with each iteration; Bottom: values of \chi^2 and \lambda at each iteration. (c) Top:
data y, curve-fit \hat{y}(t; p_{fit}), curve-fit + error, and curve-fit - error; Bottom: standard error of
the fit, \sigma_{\hat{y}}(t). (d) Histogram of the errors between the data and the fit.
4.4.3 Example 3
Consider fitting the following function to a set of measured data.

    \hat{y}(t; p) = p_1 \exp(-t/p_2) + p_3 \sin(t/p_4)        (25)

This function is nonlinear in the parameters p_2 and p_4, and therefore can not be fit using
methods of linear least squares. The m-function to be used with lm.m is simply:

function y_hat = lm_func(t,p,c)
y_hat = p(1)*exp(-t/p(2)) + p(3)*sin(t/p(4));
The true parameter values p_{true}, the initial parameter values p_{initial}, the resulting curve-fit
parameter values p_{fit}, and the standard errors of the fit parameters \sigma_p are shown in Table 5. The
R^2 fit criterion is 99.8 percent. In this example, the standard parameter errors are all less
than one percent. The parameter correlation matrix is given in Table 6. Parameter p_4 is
not correlated with the other parameters. Parameters p_1 and p_2 are the most correlated, at 73
percent.
The bowl-shaped nature of the \chi^2 objective function is shown in Figure 3(a). This
shape is clearly not quadratic and has multiple minima. In this example, the initial guess
for parameter p_4, the period of the oscillatory component, has to be within ten percent of
the true value, otherwise the algorithm in lm.m will converge to a very small value of the
amplitude of oscillation p_3 and an erroneous value for p_4. When such an occurrence arises,
the standard errors \sigma_p of the fit parameters p_3 and p_4 are quite large and the histogram of
curve-fit errors (Figure 3(d)) is not normally distributed.
The convergence of the parameters and the evolution of \chi^2 and \lambda are shown in Figure 3(b).
The parameters converge monotonically to their final values.
The data points, the curve fit, and the curve-fit confidence band are plotted in Figure 3(c).
A histogram of the difference between the data values and the curve-fit is shown in
Figure 3(d). Ideally these curve-fit errors should be normally distributed, and they appear
to be so in this example.
Table 5. Parameter values and standard errors.

  p_initial   p_true    p_fit     \sigma_p   \sigma_p/p_fit (%)
     10.0       6.0      5.987     0.032          0.53
     50.0      20.0     20.100     0.144          0.72
      6.0       1.0      0.978     0.010          0.98
      5.6       5.0      4.999     0.004          0.08

Table 6. Parameter correlation matrix.

          p_1      p_2      p_3      p_4
  p_1     1.00    -0.74    -0.28    -0.02
  p_2    -0.74     1.00     0.18     0.02
  p_3    -0.28     0.18     1.00    -0.02
  p_4    -0.02     0.02    -0.02     1.00
Figure 3. (a) The sum of the squared errors as a function of p_2 and p_4. (b) Top: the convergence
of the parameters with each iteration; Bottom: values of \chi^2 and \lambda at each iteration. (c) Top:
data y, curve-fit \hat{y}(t; p_{fit}), curve-fit + error, and curve-fit - error; Bottom: standard error of
the fit, \sigma_{\hat{y}}(t). (d) Histogram of the errors between the data and the fit.
4.5 Fitting in Multiple Dimensions
The code lm.m can carry out fitting in multiple dimensions. For example, the function

    z(x, y) = \left( p_1 x^{p_2} + (1 - p_1) y^{p_2} \right)^{1/p_2}

may be fit to data points z_i(x_i, y_i), (i = 1, ..., m), using lm.m with an m-file such as

my_data = load('my_data_file');    % load the data
x_dat = my_data(:,1);              % if the independent variable x is in column 1
y_dat = my_data(:,2);              % if the independent variable y is in column 2
z_dat = my_data(:,3);              % if the dependent variable z is in column 3

p_min  = [ 0.1 0.1 ];              % minimum expected parameter values
p_max  = [ 0.9 2.0 ];              % maximum expected parameter values
p_init = [ 0.5 1.0 ];              % initial guess for parameter values

t = [ x_dat y_dat ];               % x and y are column vectors of independent variables

[p_fit,Chi_sq,sigma_p,sigma_y,corr,R2,cvg_hst] = ...
      lm('lm_func2d',p_init,t,z_dat,weight,0.01,p_min,p_max);

with the m-function lm_func2d.m

function z_hat = lm_func2d(t,p)
% example function used for nonlinear least squares curve-fitting
% to demonstrate the Levenberg-Marquardt function, lm.m,
% in two fitting dimensions

x_dat = t(:,1);
y_dat = t(:,2);
z_hat = ( p(1)*x_dat.^p(2) + (1-p(1))*y_dat.^p(2) ).^(1/p(2));
5 Remarks
This text and lm.m were written in an attempt to understand and explain methods of
nonlinear least squares for curve-fitting applications.
It is important to remember that the purpose of applying statistical methods for data
analysis is primarily to estimate parameter statistics ... not the parameter values themselves.
Reasonable parameter values can often be found using non-statistical minimization
algorithms, such as random search methods, the Nelder-Mead simplex method, or simply
gridding the parameter space and finding the best combination of parameter values.
Nonlinear least squares problems can have objective functions with multiple local minima.
Fitting algorithms will converge to different local minima depending upon values of the
initial guess, the measurement noise, and algorithmic parameters. It is perfectly appropriate
and good to use the best available estimate of the desired parameters as the initial guess. In
the absence of physical insight into a curve-fitting problem, a reasonable initial guess may be
found by coarsely gridding the parameter space and finding the best combination of parameter
values. There is no sense in forcing any statistical curve-fitting algorithm to work too hard
by starting it with a poor initial guess. In most applications, parameters identified from
neighboring initial guesses (±5%) should converge to similar parameter estimates (±0.1%).
The fit statistics of these converged parameters should be the same, within two or three
significant figures.
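A coarse grid search of the kind described above can be written in a few lines. This sketch assumes, for illustration, a two-parameter model evaluated by lm_func and an objective built from the weighted residuals of equation (2); the grid ranges are arbitrary:

    p1_grid = linspace(0.1, 0.9,  9);             % assumed range for the first parameter
    p2_grid = linspace(0.1, 2.0, 20);             % assumed range for the second parameter
    X2_best = Inf;
    for p1 = p1_grid
        for p2 = p2_grid
            r  = y_dat - lm_func(t, [p1;p2], []); % residuals at this grid point
            X2 = r' * W * r;                      % weighted sum of squared residuals, equation (2)
            if X2 < X2_best,  X2_best = X2;  p_init = [p1;p2];  end
        end
    end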
References
[1] K. Levenberg. A Method for the Solution of Certain Non-Linear Problems in Least Squares.
The Quarterly of Applied Mathematics, 2: 164-168, 1944.
[2] M.I.A. Lourakis. A brief description of the Levenberg-Marquardt algorithm implemented by
levmar, Technical Report, Institute of Computer Science, Foundation for Research and Technology
- Hellas, 2005.
[3] K. Madsen, N.B. Nielsen, and O. Tingleff. Methods for nonlinear least squares problems. Technical
Report, Informatics and Mathematical Modeling, Technical University of Denmark, 2004.
[4] D.W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters, Journal
of the Society for Industrial and Applied Mathematics, 11(2): 431-441, 1963.
[5] H.B. Nielsen. Damping Parameter in Marquardt's Method, Technical Report IMM-REP-1999-
05, Dept. of Mathematical Modeling, Technical University of Denmark.
[6] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C,
Cambridge University Press, second edition, 1992.
[7] M.K. Transtrum, B.B. Machta, and J.P. Sethna. Why are nonlinear fits to
data so challenging?, Phys. Rev. Lett. 104, 060201, 2010.
[8] M.K. Transtrum and J.P. Sethna. Improvements to the Levenberg-Marquardt algorithm
for nonlinear least-squares minimization, Preprint submitted to Journal of Computational
Physics, January 30, 2012.