Least Squares Methods
Abstract
In this report we use optimization methods, such as line-search methods and the Levenberg-Marquardt
method, for nonlinear least squares regression. We use three ways of computing the descent direction
in the line searches, namely steepest descent, Hybrid I (Steepest-Newton), and Gauss-Newton, while the
step size must satisfy the strong Wolfe conditions. We also use the Dogleg, Levenberg-Marquardt, and
Hybrid II (LM-QN) methods. Finally, we compare these methods based on their computational speed
(number of iterations) and accuracy.
Keywords: Nonlinear least squares, line-search methods, Levenberg-Marquardt.
I. Introduction
1.1 The Objective Function (SSE)
Before we introduce the objective function, we begin with the data. We generate the data in
MATLAB (see Appendix B) by creating an array of independent variables t_i and the corresponding
dependent variables y_i, where

y_i = 10 e^{t_i} − 5 e^{2 t_i} + ε_i,    i = 1, …, m,

and the ε_i are independent Gaussian noise terms with mean 0 and standard deviation 0.03.
The objective function is

F(x) = (1/2) ∑_{j=1}^{m} f_j(x)^2,    f_j(x) = y_j − M(x, t_j),

where M is the model function, chosen in this case as M(x, t) = x_3 e^{x_1 t} + x_4 e^{x_2 t}.
Of course, this is a nonlinear model, since the parameters x_1, x_2 appear in the exponents.
Moreover, F is half of the sum of squared errors (SSE), with the f_j being the errors. Note that
the term "error" for the f_j is completely different from the noise terms ε_i. We seek the best
parameters x_1, …, x_4 that give a minimum value of F, and therefore we call this a nonlinear
least squares problem.
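As a concrete illustration, the model and the objective can be written in MATLAB as follows (a minimal sketch; the anonymous-function names M and F are used only here, while the report's actual code in Appendix A uses ordinary functions):

M = @(x,t) x(3)*exp(x(1)*t) + x(4)*exp(x(2)*t);   % model function M(x,t)
F = @(x,t,y) 0.5*sum((y - M(x,t)).^2);            % objective: half of the SSE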
1.2 Derivatives of 𝑭
Before we discuss each numerical method used to find the minimizer, we first compute the
gradient ∇F, the Jacobian J, and the Hessian ∇^2 F for later use in some of the numerical
methods. Denoting x = (x_1, …, x_4)^T and f̄ = (f_1(x), …, f_m(x))^T, the gradient can be
written as ∇F = J^T f̄, where J is the Jacobian of f̄. Among the second derivatives needed
for the Hessian are
F_{x_3 x_3} = ∑_{j=1}^{m} e^{2 x_1 t_j},    F_{x_4 x_4} = ∑_{j=1}^{m} e^{2 x_2 t_j},    F_{x_3 x_4} = ∑_{j=1}^{m} e^{(x_1 + x_2) t_j}.
With only four variables, we already have to compute 10 distinct second derivatives just to
write down the (symmetric) Hessian matrix. This is why the Newton method, which uses the
Hessian in its computation, is not really preferable here; the cost of forming the Hessian may
be counted as a disadvantage.
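In the code, these quantities are assembled from the Jacobian of the residual vector. A minimal sketch, using the helper functions J and fvec defined in Appendix A (A.2):

f = fvec(x,t,y);      % residual vector (f_1(x), ..., f_m(x))'
g = J(x,t)'*f;        % gradient of F
H = J(x,t)'*J(x,t);   % Gauss-Newton approximation of the Hessian (omits the sum of f_j * hess(f_j))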
1.3 General Computation in MATLAB
Once again, our goal is to fit the model to the saved data (see Appendix B). We use exactly the
same data for all methods, so that we can fairly compare the effectiveness of each method. We
divide the methods into two categories. The first category contains the line-search methods
(steepest descent, Hybrid I (Steepest-Newton), Gauss-Newton), whose complete code is written
in Appendix A (A.1). The remaining methods (Dogleg, Levenberg-Marquardt, Hybrid II (LM-QN))
form the second category, and their code is given in Appendix A (A.2).
We choose the same starting (initial) point for all methods, namely x_0 = [2, 3, 15, −7]^T. We
choose this starting point so that all of the chosen methods succeed in converging to a minimum;
note that some starting points may cause certain methods to fail, which will be discussed later.
Furthermore, for each method we record the convergence point x* as the model's parameters, along
with the value of F(x*) and the number of iterations.
We then plot the fitted curves from these three methods in the same figure (see Figure 2).
The termination condition for all methods is ‖∇F‖_2 < 10^−3, i.e. the 2-norm of the gradient is
close to zero. The step direction at iteration k is denoted p_k (pk in the code), with step size
α_k (alpha in the code). The number of iterations is saved in the variable iter, the parameters
x* are saved as Xmin, and F(x*) is saved as Fmin. These variables are then collected into the
arrays param1, param2, param3.
The step size α_k is computed using Algorithms 3.2 and 3.3 in [1], pp. 59-60, which enforce the
strong Wolfe conditions with c_1 = 10^−4, c_2 = 0.9. The interpolation step in the zoom function
uses successive quadratic interpolation, coded as the MATLAB function quadmin (see Appendix A).
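For reference, the two strong Wolfe checks and the interpolation step can be sketched as follows (a sketch only; phi(a) = F(x_k + a p_k) and dphi is its derivative, as in the code of Appendix A, and the three trial step lengths a with values P are illustrative):

% strong Wolfe conditions for a trial step length a1
suffDecrease = phi(a1,xk,pk,t,y) <= phi(0,xk,pk,t,y) + c1*a1*dphi(0,xk,pk,t,y);
curvature    = abs(dphi(a1,xk,pk,t,y)) <= -c2*dphi(0,xk,pk,t,y);
% one step of the successive quadratic interpolation used in quadmin
p    = polyfit(a,P,2);    % quadratic p(1)*s^2 + p(2)*s + p(3) through the three trial points
anew = -p(2)/(2*p(1));    % its vertex is the next trial step length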
Figure 2. The resulting fitted curves along with the data using steepest descent method
(blue), Hybrid I method (orange), and Gauss-Newton method (yellow).
The fitted curve of the steepest descent method can be seen in Figure 2 as the thick blue line. A
small value of the SSE is a good indication that the process converges to a minimum, yet it is too
early to say whether this SSE is good enough compared to the other methods. It seems that this
simple, practical method sacrifices computational speed, since it needs a large number of iterations.
Method 1 := Dogleg
Method 2 := Levenberg-Marquardt
Method 3 := Hybrid II (LM-QN)
We then plot the fitted curves from these three methods in the same figure (see Figure 3). The
number of iterations is saved in the variable iter, the parameters x* are saved as Xmin, and
F(x*) is saved as Fmin. These variables are then collected into the arrays param1, param2, param3.
Figure 3. The resulting fitted curves along with the data, using the Dogleg method (blue), the
Levenberg-Marquardt method (orange), and the Hybrid II method (yellow).
The fitted curve of the Dogleg method can be seen in Figure 3 as the thick blue line. Compared to
steepest descent and the Hybrid I method, this method is clearly superior in terms of computational
speed, while its SSE differs only slightly from 0.0111, an insignificant difference. However,
Gauss-Newton needs fewer iterations, so we would still prefer the Gauss-Newton method in this case.
Note that this is only a rough comparison, and it is hard to say which method is better (GN or
Dogleg), since choosing different parameters for this method may give a different number of
iterations, and perhaps a different SSE. Therefore, we cannot really conclude which method is
better (GN or Dogleg), but we may say that this method is quite good, since the number of
iterations is small and the computations are not difficult. The disadvantage of this method,
however, lies in the choice of its parameters: a bad choice of parameters can give a bad result.
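For completeness, the Dogleg step combines the steepest-descent and Gauss-Newton steps inside a trust region of radius D. A minimal sketch, following [2] and the code in Appendix A (A.2) (the built-in norm is used here instead of the report's Norm helper, and the beta formula is written in a single branch for brevity):

g   = J(x,t)'*fvec(x,t,y);              % gradient
hgn = -(J(x,t)'*J(x,t))\g;              % Gauss-Newton step
hsd = -g;                               % steepest-descent direction
alp = (g'*g)/norm(J(x,t)*g)^2;          % optimal step length along hsd
if norm(hgn) <= D
    hdl = hgn;                          % the GN step already lies inside the trust region
elseif norm(alp*hsd) >= D
    hdl = (D/norm(hsd))*hsd;            % even the steepest-descent step is too long: scale it to the boundary
else
    a = alp*hsd; b = hgn; c = a'*(b-a); % move from a toward b until the boundary is reached
    beta = (D^2 - a'*a)/(c + sqrt(c^2 + norm(b-a)^2*(D^2 - a'*a)));
    hdl  = a + beta*(b - a);            % Appendix A uses a two-branch, numerically safer formula for beta
end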
3.3 Levenberg-Marquardt (LM) Method
One of the damped methods is the Levenberg-Marquardt (LM) method. Its step h_lm is determined by

(J^T J + μI) h_lm = −J^T f̄,    f̄ = (f_1(x), …, f_m(x))^T.
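To make the damped step concrete, here is a minimal MATLAB sketch of one LM step (using the helper functions J and fvec of Appendix A (A.2); mu is the damping parameter):

A   = J(x,t)'*J(x,t);        % J'J, the approximate Hessian
g   = J(x,t)'*fvec(x,t,y);   % J'f, the gradient of F
hlm = -(A + mu*eye(4))\g;    % solve (J'J + mu*I) hlm = -J'f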
The term J^T J + μI approximates the Hessian, while J^T f̄ is the gradient ∇F. When μ is large,
the LM step is close to a steepest-descent step, while a small value of μ gives a step close to
the Gauss-Newton step; we may therefore view this method as a combination of steepest descent
and Gauss-Newton. Implementing Algorithm 3.16 in [2], p. 27, with parameters ε_1 = ε_2 = 0.01,
τ = 10^−6 yields the following results:
IV. Conclusions
We finish this report with the following conclusions:
- All of the methods discussed (steepest descent, Hybrid I, Gauss-Newton, Dogleg, LM, Hybrid II) can be used to solve the nonlinear least squares problem.
- Among all methods, the smallest value of F is 0.0111, and the minimizing parameters x_1, …, x_4 of the model M are close to the parameters used to generate the data.
- The implementation of all methods is displayed in Appendix A.
- Based on the above discussion, the best methods are LM and Hybrid II.
References
[1] J. Nocedal and S. J. Wright, Numerical Optimization, Springer, 1999.
[2] K. Madsen and H. B. Nielsen, Methods for Non-Linear Least Squares Problems, IMM, DTU, 2004.
Appendix A
A.1 Complete MATLAB code for the line-search methods:
% Nonlinear Least Square (fitting) with Line Search
% Set model to M(x1...x4,t)=x3*exp(x1*t)+x4*exp(x2*t)
% Created by : Nicholas Malvin 20119020
clear all; clc; close all;
% t,y : the saved data of Appendix B are assumed to be loaded at this point
% c1,c2 follow Section 1.3; maxiter and amax are safeguard values (assumed)
maxiter=1000; amax=2; c1=10^-4; c2=0.9;
for method=1:3 % 1: Steepest descent, 2: Hybrid I (Steepest-Newton), 3: Gauss-Newton
x0=[2;3;15;-7]; % starting point (Section 1.3)
for k=1:maxiter
xk=x0;
X(1,k)=xk(1);X(2,k)=xk(2);X(3,k)=xk(3);X(4,k)=xk(4);
if Norm(gf(xk,t,y))<10^-3
break
end
%Descent direction pk
if method==1 % Steepest
pk=-gf(xk,t,y);
end
if method==2 % Hybrid(Stee-Newt)
H=hess(xk,t,y);E=eig(H); % check whether the Hessian is positive definite
if (E(1)<=0 || E(2)<=0) || (E(3)<=0 || E(4)<=0) % not positive definite: fall back to steepest descent
pk=-gf(xk,t,y);
else
pk=-H^(-1)*gf(xk,t,y);
end
end
if method==3 % Gauss-Newton
H=J(xk,t)'*J(xk,t);
pk=-H^(-1)*gf(xk,t,y);
end
%Step size alpha: bracketing phase of Algorithm 3.2 in [1]; the cap of 50 iterations is an arbitrary safeguard
a0=0; a1=1; alpha=1;
for i=1:50
if phi(a1,xk,pk,t,y)>phi(0,xk,pk,t,y)+c1*a1*dphi(0,xk,pk,t,y) || (i>1 && phi(a1,xk,pk,t,y)>=phi(a0,xk,pk,t,y))
alpha=zoom(a0,a1,xk,pk,c1,c2,t,y); break
end
if abs(dphi(a1,xk,pk,t,y))<=-c2*dphi(0,xk,pk,t,y)
alpha=a1; break
end
if dphi(a1,xk,pk,t,y)>=0
alpha=zoom(a1,a0,xk,pk,c1,c2,t,y); break
end
a0=a1;
a1=(a1+amax)/2;
end
x0=x0+alpha*pk;
end
Xmin=x0; % minimizer
Fmin=F(Xmin,t,y); % value of f at minimizer
iter=k; % Number of iterations
%Plotting regression
x1=Xmin(1);x2=Xmin(2);x3=Xmin(3);x4=Xmin(4);
figure(2)
hold on
tplot=-2:2.5/100:0.5;
if method==1
plot(tplot,x3*exp(x1*tplot)+x4*exp(x2*tplot),'color',[0, 0.447, 0.741],'linewidth',9)
param1=[x1 x2 x3 x4 Fmin iter];
end
if method==2
plot(tplot,x3*exp(x1*tplot)+x4*exp(x2*tplot),'color',[0.8500, 0.3250, 0.0980],'linewidth',5)
param2=[x1 x2 x3 x4 Fmin iter];
end
if method==3
plot(tplot,x3*exp(x1*tplot)+x4*exp(x2*tplot),'-.','color',[0.9290, 0.6940, 0.1250],'linewidth',2)
plot(t,y,'o','MarkerFaceColor','w','MarkerEdgeColor','k','MarkerSize',8,'linewidth',1);
param3=[x1 x2 x3 x4 Fmin iter];
end
axis([-3 1 0.9*min(y) 1.1*max(y)]);
Leg=legend('Steepest','Hybrid(Stee-Newt)','Gauss-newton','Data');
set(Leg,'location','NorthWest');
end
%% Functions
function z=f(j,x,t,y)
z=y(j)-x(3)*exp(x(1)*t(j))-x(4)*exp(x(2)*t(j));
end
function z=F(x,t,y) % objective function: half of the sum of squared errors
z=0;
for j=1:length(t)
z=z+(f(j,x,t,y))^2;
end
z=0.5*z;
end
function N=Norm(x)
N=sqrt(x(1)^2+x(2)^2+x(3)^2+x(4)^2);
end
P(j)=F(xk+x(j)*pk,t,y);
end
tol=0.001; %tolerance
for i=1:maxiter
p=polyfit(x,P,2); % quadratic through the three trial step lengths
xmin=-p(2)/2/p(1); % vertex of the quadratic
x=[x(2:end) xmin];
e=abs(x(3)-x(2))/abs(x(3));
if e<tol
break
end
for j=1:3
P(j)=F(xk+x(j)*pk,t,y);
end
end
y=xmin;
end
%ZOOM FUNCTION
function z = zoom(a0,a1,xk,pk,c1,c2,t,y)
for k=1:1000
if k==1000 %if zoom does not converge, we safeguard the step length as 1
z=1;
end
a=quadmin(a0,a1,xk,pk,t,y);
if phi(a,xk,pk,t,y)>phi(0,xk,pk,t,y)+c1*a*dphi(0,xk,pk,t,y) || phi(a,xk,pk,t,y)>=phi(a0,xk,pk,t,y)
a1=a;
else
if abs(dphi(a,xk,pk,t,y))<=-c2*dphi(0,xk,pk,t,y)
z=a; return
end
if dphi(a,xk,pk,t,y)*(a1-a0)>=0
a1=a0;
end
a0=a;
end
end
end
if Norm(hgn)<=D
hdl=hgn;
else if Norm(alp*hsd)>=D
hdl=(D/Norm(hsd))*hsd;
else
a=alp*hsd;b=hgn;c=a'*(b-a);
if c<=0
beta=(-c+sqrt(c^2+(Norm(b-a))^2*(D^2-a'*a)))/((b-a)'*(b-a));
else
beta=(D^2-a'*a)/(c+sqrt(c^2+(Norm(b-a))^2*(D^2-a'*a)));
end
hdl=alp*hsd+beta*(hgn-alp*hsd);
end
end
if Norm(hdl)<=0.01*(Norm(x)+0.01)
found=1;
else
xnew=x+hdl;
r=(F(x,t,y)-F(xnew,t,y))/(L([0;0;0;0],x,t,y)-L(hdl,x,t,y));
if r>0
x=xnew; g=J(x,t)'*fvec(x,t,y);
found=(max(abs(fvec(x,t,y)))<=0.001)||(max(abs(g))<=0.01);
end
if r>0.75
D=max(D,3*Norm(hdl));
else if r<0.25
D=D/2; found=(D<=0.01*(Norm(x)+0.01));
end
end
end
end
end
if method==2 %Levenberg-Marquardt
k=0; v=2; x=x0; A=J(x,t)'*J(x,t);
g=J(x,t)'*fvec(x,t,y); tau=10^-6;
found=(max(abs(g))<=0.01); mu=tau*max(diag(A));
I=diag(ones(1,4));
while found==0 && k<maxiter
k=k+1; hlm=-((A+mu*I)^(-1))*g; % LM step: solve (J'J+mu*I)*hlm=-g
if Norm(hlm)<=0.001*(Norm(x)+0.01)
found=1;
else
xnew=x+hlm;
r=(F(x,t,y)-F(xnew,t,y))/(0.5*hlm'*(mu*hlm-g)); % gain ratio: actual vs. predicted reduction
if r>0 % step accepted: update and relax the damping
x=xnew; A=J(x,t)'*J(x,t); g=J(x,t)'*fvec(x,t,y);
found=(max(abs(g))<=0.01);
mu=mu*max(1/3,1-(2*r-1)^3); v=2;
else % step rejected: increase the damping
mu=mu*v; v=2*v;
end
end
end
end
else
count=0; better=0; mu=mu*v; v=2*v;
end
end
else
xnew=x; met='QN'; better=0; hqn=-B*gf(x,t,y);
if Norm(hqn)<=0.01*(Norm(x)+0.01)
found=1;
else
better=(F(xnew,t,y)<F(x,t,y))||(F(xnew,t,y)<=(1+10^-6)*F(x,t,y) && max(abs(gf(xnew,t,y)))<max(abs(gf(x,t,y))));
if max(abs(gf(xnew,t,y)))>=max(abs(gf(x,t,y)))
met='LM';
end
end
end
h=xnew-x;Y=J(xnew,t)'*J(xnew,t)*h+(J(xnew,t)-J(x,t))'*fvec(xnew,t,y);
if h'*Y>0
V=B*h; B=B+(1/(h'*Y))*Y*Y'-(1/(h'*V)*V)*V'; % BFGS-type update of the approximate Hessian B
end
if better==1
x=xnew;
end
end
end
Xmin=x; % minimizer
Fmin=F(Xmin,t,y); % value of f at minimizer
iter=k; % Number of iterations
%Plotting regression
x1=Xmin(1);x2=Xmin(2);x3=Xmin(3);x4=Xmin(4);
figure(2)
hold on
tplot=-2:2.5/100:0.5;
if method==1
plot(tplot,x3*exp(x1*tplot)+x4*exp(x2*tplot),'color',[0, 0.4470, 0.7410],'linewidth',9)
param1=[x1 x2 x3 x4 Fmin iter];
end
if method==2
plot(tplot,x3*exp(x1*tplot)+x4*exp(x2*tplot),'color',[0.8500, 0.3250, 0.0980],'linewidth',5)
param2=[x1 x2 x3 x4 Fmin iter];
end
if method==3
plot(tplot,x3*exp(x1*tplot)+x4*exp(x2*tplot),'-.','color',[0.9290, 0.6940, 0.1250],'linewidth',2)
plot(t,y,'o','MarkerFaceColor','w','MarkerEdgeColor','k','MarkerSize',8,'linewidth',1);
param3=[x1 x2 x3 x4 Fmin iter];
end
axis([-3 1 0.9*min(y) 1.1*max(y)]);
Leg=legend('Dogleg','Lev-Mar','Hybrid(LM-QN)','Data');
set(Leg,'location','NorthWest');
end
%% Functions
function z=f(j,x,t,y)
z=y(j)-x(3)*exp(x(1)*t(j))-x(4)*exp(x(2)*t(j));
end
function z=fvec(x,t,y)
z=f(1,x,t,y);
for j=2:length(t)
z=[z;f(j,x,t,y)];
end
end
function Ja=J(x,t)
Ja=[-x(3)*t(1)*exp(x(1)*t(1)) -x(4)*t(1)*exp(x(2)*t(1)) -exp(x(1)*t(1)) -exp(x(2)*t(1))];
for k=2:length(t)
j=[-x(3)*t(k)*exp(x(1)*t(k)) -x(4)*t(k)*exp(x(2)*t(k)) -exp(x(1)*t(k)) -exp(x(2)*t(k))];
Ja=[Ja;j];
end
end
function N=Norm(x)
N=sqrt(x(1)^2+x(2)^2+x(3)^2+x(4)^2);
end
function Lin=L(h,x,t,y)
Lin=0.5*(Norm(fvec(x,t,y)+J(x,t)*h))^2;
end
Appendix B
Generating data by MATLAB :
%% Generate data (t,y)
t=-2:2.5/30:0.5-2.5/30;m=length(t);
y=10*exp(t)-5*exp(2*t)+normrnd(0,0.03,[1 m]); % model with true parameters x=(1,2,10,-5) plus N(0,0.03^2) noise
figure(1)
plot(t,y,'o','MarkerFaceColor','w','MarkerEdgeColor','k','MarkerSize',8,'linewidth',1);
axis([-3 1 0.9*min(y) 1.1*max(y)]);
Leg=legend('Data (t_i,y_i)')
set(Leg,'location','NorthWest')
i    t_i                    y_i
1    -2                     1.29241004146288
2    -1.91666666666667      1.33655843638144
3    -1.83333333333333      1.48343080355583
4    -1.75000000000000      1.59720575339143
5    -1.66666666666667      1.72086369413927
6    -1.58333333333333      1.82029994022383
7    -1.50000000000000      1.99217146710787
8    -1.41666666666667      2.11568193915654
9    -1.33333333333333      2.26166074052819
10   -1.25000000000000      2.41852492988996
11   -1.16666666666667      2.66030686900851
12   -1.08333333333333      2.78648170473390
13   -1                     2.99693058020358
14   -0.916666666666667     3.16283825140780
15   -0.833333333333334     3.39269026688311
16   -0.750000000000000     3.51105359278980
17   -0.666666666666667     3.78357672286345
18   -0.583333333333333     3.98054225334309
19   -0.500000000000000     4.19547586823796
20   -0.416666666666667     4.41301724381983
21   -0.333333333333333     4.58846707716412
22   -0.250000000000000     4.81368646590987
23   -0.166666666666667     4.86500749948671
24   -0.0833333333333333    4.96053455349911
25   5.55111512312578e-17   4.95292053588978
26   0.0833333333333334     4.94791695097324
27   0.166666666666667      4.79540270361915
28   0.250000000000000      4.59755678409042
29   0.333333333333333      4.24304664861307
30   0.416666666666667      3.67621611337129