
ECON 5350 Class Notes

Nonlinear Regression Models

1 Introduction

In this section, we examine regression models that are nonlinear in the parameters and give a brief overview of methods to estimate such models.

2 Nonlinear Regression Models

The general form of the nonlinear regression model is

y_i = h(x_i, β, ε_i),                                                  (1)

which is more commonly written in a form with an additive error term

y_i = h(x_i, β) + ε_i.                                                 (2)

Below are two examples.

1. h(x_i, β, ε_i) = β_0 x_{1i}^{β_1} x_{2i}^{β_2} exp(ε_i). This is an intrinsically linear model because taking natural logarithms gives a model that is linear in the parameters, ln(y_i) = ln(β_0) + β_1 ln(x_{1i}) + β_2 ln(x_{2i}) + ε_i. This can be estimated with standard linear procedures such as OLS.

2. h(x_i, β) = β_0 x_{1i}^{β_1} x_{2i}^{β_2}. Since the error term in (2) is additive, there is no transformation that will produce a linear model. This is an intrinsically nonlinear model (i.e., the relevant first-order conditions are nonlinear in the parameters). Below we consider two methods for estimating such a model: linearizing the underlying regression model and nonlinear optimization of the objective function.
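As a quick illustration of the first example, here is a minimal MATLAB sketch of the log-transformation approach. The data vectors y, x1 and x2 and their names are hypothetical and assumed to be in memory; this is not code from the course files.

   % Intrinsically linear model: estimate by OLS on the logged equation.
   n  = length(y);
   X  = [ones(n,1) log(x1) log(x2)];   % regressors: constant, ln(x1), ln(x2)
   b  = X \ log(y);                    % OLS estimates of [ln(beta0); beta1; beta2]
   b0 = exp(b(1));                     % recover beta0 from the estimated intercept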

2.1 Linearized Regression Model and the Gauss-Newton Algorithm

Consider a first-order Taylor series approximation of the regression model around β^0:

y_i = h(x_i, β) + ε_i ≈ h(x_i, β^0) + g(x_i, β^0)(β - β^0) + ε_i

where

g(x_i, β^0) = (∂h/∂β_1 |_{β=β^0}, ..., ∂h/∂β_k |_{β=β^0}).

Collecting terms and rearranging gives

Y^0 = X^0 β + ε^0

where

Y^0 ≡ Y - h(X, β^0) + g(X, β^0)β^0
X^0 ≡ g(X, β^0).

The matrix X^0 is called the pseudoregressor matrix. Note also that ε^0 will include higher-order approximation errors.

2.1.1 Gauss-Newton Algorithm

Given an initial value b_0 for β, we can estimate β with the following iterative LS procedure

b_{t+1} = [X^0(b_t)'X^0(b_t)]^(-1) [X^0(b_t)'Y^0(b_t)]
        = [X^0(b_t)'X^0(b_t)]^(-1) [X^0(b_t)'(X^0(b_t)b_t + e_t^0)]
        = b_t + [X^0(b_t)'X^0(b_t)]^(-1) X^0(b_t)'e_t^0
        = b_t + W_t λ_t g_t

where W_t = [2X^0(b_t)'X^0(b_t)]^(-1), λ_t = 1 and g_t = 2X^0(b_t)'e_t^0. The iterations continue until the difference between b_{t+1} and b_t is sufficiently small. This is called the Gauss-Newton algorithm. Interpretations for W_t, λ_t and g_t will be given below. A consistent estimator of σ² is

s² = [1/(n - k)] Σ_{i=1}^n (y_i - h(x_i, b))².
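To make the iteration concrete, here is a minimal MATLAB sketch of the Gauss-Newton loop, written as a function file (e.g., gauss_newton.m). The function handles h_fun and g_fun (returning the fitted values h(X, b) and the pseudoregressor matrix g(X, b)), the data y and X, the starting value b0, and the tolerance and iteration cap are all user-supplied assumptions; this is an illustrative sketch, not the course's MATLAB code.

   % Gauss-Newton sketch: h_fun(X,b) returns the n x 1 fitted values,
   % g_fun(X,b) returns the n x k pseudoregressor matrix X0 = g(X,b).
   function [b, s2] = gauss_newton(y, X, h_fun, g_fun, b0, tol, maxit)
       b = b0;
       for t = 1:maxit
           e0   = y - h_fun(X, b);          % current residuals
           X0   = g_fun(X, b);              % pseudoregressors evaluated at b_t
           step = (X0' * X0) \ (X0' * e0);  % [X0'X0]^(-1) X0'e0
           b    = b + step;                 % b_{t+1} = b_t + step
           if norm(step) < tol              % stop when the update is small
               break
           end
       end
       [n, k] = size(X0);
       s2 = sum((y - h_fun(X, b)).^2) / (n - k);  % consistent estimate of sigma^2
   end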

2.1.2 Properties of the NLS Estimator

Only asymptotic results are available for this estimator. Assuming that the pseudoregressors are well behaved (i.e., plim (1/n) X^0'X^0 = Q^0, a finite positive definite matrix), we can apply the CLT to show that

b ∼ N[β, (σ²/n)(Q^0)^(-1)]  (asymptotically),

where the estimate of (σ²/n)(Q^0)^(-1) is s²(X^0'X^0)^(-1).
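Continuing the Gauss-Newton sketch above, the estimated asymptotic covariance matrix and standard errors could be computed as follows; b, s2, g_fun and X are assumed to be those from the earlier sketch.

   X0 = g_fun(X, b);          % pseudoregressors at the final estimate
   V  = s2 * inv(X0' * X0);   % estimate of (sigma^2/n)(Q0)^(-1)
   se = sqrt(diag(V));        % asymptotic standard errors of b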

2.1.3 Notes

1. Depending on the initial value, b_0, the Gauss-Newton algorithm can lead to a local (as opposed to global) minimum or head off on a divergent path.

2. The standard R² formula may produce a goodness-of-fit value outside the interval [0, 1].

3. Extensions of the J test are available that allow one to test nonlinear versus linear models.

4. Hypothesis testing is only valid asymptotically.

2.2 Hypothesis Testing

Consider testing the hypothesis H_0: R(β) = q. Below are four tests that are asymptotically equivalent.

2.2.1 Asymptotic F test.

Begin by letting S(b) = (Y - h(X, b))'(Y - h(X, b)) be the sum of squared residuals evaluated at the unrestricted NLS estimate. Also, let S(b*) be the corresponding measure evaluated at the restricted estimate. The standard F formula gives

F = ([S(b*) - S(b)]/J) / (S(b)/(n - k)).

Under the null hypothesis, JF ∼ χ²(J) asymptotically.
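A minimal MATLAB sketch of this test; S_u and S_r denote the unrestricted and restricted sums of squared residuals, and J, n and k are assumed to be defined already.

   % Asymptotic F test from restricted and unrestricted residual sums of squares.
   F    = ((S_r - S_u) / J) / (S_u / (n - k));
   JF   = J * F;                        % asymptotically chi-square with J df
   pval = 1 - gammainc(JF/2, J/2);      % chi-square(J) upper-tail p-value (base MATLAB)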

2.2.2 Wald Test.

The nonlinear counterpart to the Wald statistic is

W = [R(b) - q]' [C V̂ C']^(-1) [R(b) - q] ∼ χ²(J)  (asymptotically)

where V̂ = σ̂²(X^0'X^0)^(-1), C = ∂R(b)/∂b and σ̂² = S(b)/n.
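For a single restriction the pieces are easy to assemble. Here is a hedged MATLAB sketch using the constant-returns-to-scale restriction β_2 + β_3 = 1 from the Cobb-Douglas example of Section 4 as an illustration; b is the unrestricted NLS estimate and V is the estimated covariance matrix V̂ defined above, both assumed to be in memory.

   % Wald test of the single restriction R(b) = b2 + b3 = 1.
   R = @(b) b(2) + b(3);                       % restriction function
   q = 1;
   C = [0 1 1];                                % Jacobian dR/db for this restriction
   W = (R(b) - q)' * inv(C * V * C') * (R(b) - q);   % asymptotically chi-square(1)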

2.2.3 Likelihood Ratio Test.

Assume ε ∼ N(0, σ²I). The likelihood ratio statistic is

LR = -2[ln(L*) - ln(L)] ∼ χ²(J)  (asymptotically)

where ln(L) and ln(L*) are the unrestricted and restricted log-likelihood values, respectively.
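Under this normality assumption the concentrated log-likelihood depends on the data only through the sum of squared residuals, so the LR statistic can be computed directly from S(b) and S(b*). The standard intermediate step, not spelled out above, is

ln L(b) = -(n/2)[ln(2π) + ln(S(b)/n) + 1],

so that

LR = -2[ln L(b*) - ln L(b)] = n ln[S(b*)/S(b)] ∼ χ²(J)  (asymptotically).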

2.2.4 Lagrange Multiplier Test.

The LM statistic is based solely on the restricted model. Occasionally, imposing the restriction R(β) = q changes an intrinsically nonlinear model into an intrinsically linear one. The LM statistic is

LM = e*'X^0*[X^0*'X^0*]^(-1)X^0*'e* / [S(b*)/n]
   = e*'X^0*b̃ / [S(b*)/n]
   = n (ESS_e / TSS_e)
   = n R_e² ∼ χ²(J)  (asymptotically)

where e* = Y - h(X, b*), X^0* = g(X, b*), b̃ is the estimated coefficient vector from a regression of e* on X^0*, and R_e² is the coefficient of determination from that regression.
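A minimal MATLAB sketch of the nR² version of the LM statistic; b_r denotes the restricted estimate, and h_fun and g_fun are the same user-supplied functions as in the Gauss-Newton sketch of Section 2.1.1.

   % LM test via the auxiliary regression of restricted residuals on
   % restricted pseudoregressors.
   e_r  = y - h_fun(X, b_r);            % restricted residuals e*
   X0r  = g_fun(X, b_r);                % pseudoregressors evaluated at b_r
   btil = (X0r' * X0r) \ (X0r' * e_r);  % auxiliary regression coefficients
   ESS  = e_r' * X0r * btil;            % explained sum of squares
   TSS  = e_r' * e_r;                   % total sum of squares = S(b_r)
   LM   = length(y) * ESS / TSS;        % n * R^2, asymptotically chi-square(J)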

3 Brief Overview of Nonlinear Optimization Techniques

An alternative method for estimating the parameters of equation (2) is to apply nonlinear optimization techniques directly to the first-order conditions. Consider the NLS problem of minimizing

S(b) = Σ_{i=1}^n e_i² = Σ_{i=1}^n (y_i - h(x_i, b))².

The first-order conditions produce

∂S(b)/∂b = -2 Σ_{i=1}^n (y_i - h(x_i, b)) ∂h(x_i, b)/∂b = 0,           (3)

which is generally nonlinear in the parameters and does not have a nice closed-form, analytical solution. The methods outlined below can be used to solve the set of equations (3).

3.1 Introduction

Consider the function f(θ) = a + bθ + cθ². The first-order condition for minimization is

df(θ)/dθ = b + 2cθ = 0  ⟹  θ = -b/(2c).

This is considered a linear optimization problem (the first-order condition is linear in θ) even though the objective function is nonlinear in the parameter. Alternatively, consider the objective function f(θ) = a + bθ² + c ln(θ). The first-order condition for minimization is

df(θ)/dθ = 2bθ + c/θ = 0.

This is considered a nonlinear optimization problem.

Here is a general outline of how to solve the nonlinear optimization problem. Let θ be the parameter vector, Δ the directional vector and λ the step length.

Procedure.

1. Specify θ_0 and Δ_0.

2. Determine λ_t.

3. Compute θ_{t+1} = θ_t + λ_t Δ_t.

4. Convergence criterion satisfied?

   Yes → Exit.

   No → Update t = t + 1, compute Δ_t and return to #2.

(A minimal code sketch of this generic loop appears below.)
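Here is that sketch in MATLAB. The functions lambda_fun and delta_fun, the starting values theta0 and delta0, and the tolerance tol and iteration cap maxit are placeholders for whatever a specific algorithm supplies.

   % Generic iterative loop: theta_{t+1} = theta_t + lambda_t * delta_t.
   theta = theta0;
   delta = delta0;
   for t = 1:maxit
       lambda    = lambda_fun(theta, delta);     % step length
       theta_new = theta + lambda * delta;       % update
       if norm(theta_new - theta) < tol          % convergence criterion
           theta = theta_new;
           break
       end
       theta = theta_new;
       delta = delta_fun(theta);                 % new direction, return to step 2
   end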

There are two general types of nonlinear optimization algorithms: those that do not involve derivatives and those that do.

3.2 Derivative-Free Methods

Derivative-free algorithms are used when the number of parameters is small, when analytical derivatives are difficult to calculate, or when seed values are needed for other algorithms.

1. Grid search. This is a trial-and-error method that is typically not feasible for more than two parameters. It can be a useful means to find starting values for other algorithms. (A small sketch appears after this list.)

2. Direct search methods. Using the iterative algorithm θ_{t+1} = θ_t + λ_t Δ_t, a search is performed in m directions: Δ_1, ..., Δ_m. λ_t is chosen to ensure that G(θ_{t+1}) > G(θ_t).

3. Other methods. The simplex algorithm and simulated annealing are examples of other derivative-free methods.
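A minimal MATLAB sketch of a two-parameter grid search used to generate starting values; the objective G_fun and the grid ranges are illustrative assumptions.

   % Evaluate the objective on a grid and keep the best point as a seed value.
   b1_grid = linspace(0.1, 2.0, 50);
   b2_grid = linspace(0.1, 2.0, 50);
   best = Inf;
   for i = 1:length(b1_grid)
       for j = 1:length(b2_grid)
           val = G_fun([b1_grid(i); b2_grid(j)]);   % objective at this grid point
           if val < best
               best    = val;
               b_start = [b1_grid(i); b2_grid(j)];  % candidate starting value
           end
       end
   end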

3.3 Gradient Methods

The goal is to choose a directional vector Δ_t to go uphill (for a max) and an appropriate step length λ_t. Too big a step may overshoot a max and too small a step may be inefficient. (See Figures 5.3 and 5.4 attached.) With this in mind, consider choosing Δ_t such that the objective function increases (i.e., G(θ_{t+1}) > G(θ_t)). The relevant derivative is

dG(θ_t + λ_t Δ_t)/dλ_t = g_t'Δ_t

where g_t = dG(θ_{t+1})/dθ_{t+1}. If we let Δ_t = W_t g_t, where W_t is a positive definite matrix, then we know that

dG(θ_t + λ_t Δ_t)/dλ_t = g_t'W_t g_t ≥ 0.

As a result, almost all algorithms take the general form

θ_{t+1} = θ_t + λ_t W_t g_t

where λ_t is the step length, W_t is a weighting matrix, and g_t is the gradient. The Gauss-Newton algorithm above could be written in this general form. Here are examples of some other algorithms.

1. Steepest Ascent.

   W_t = I so Δ_t = g_t.

   An optimal line search produces λ_t = -g'g/(g'Hg).

   Therefore, the algorithm is θ_{t+1} = θ_t - [g_t'g_t/(g_t'H_t g_t)]g_t.

   This method has the drawbacks that (a) it can be slow to converge, especially on long narrow ridges, and (b) H can be difficult to calculate.

2. Newton's Method (aka Newton-Raphson).

   Newton's method can be motivated by taking a Taylor series approximation (around θ_0) of the gradient and setting it equal to zero. This gives g(θ_t) ≈ g(θ_0) + H(θ_0)[θ_t - θ_0] = 0. Rearranging produces θ_t = θ_0 - H^(-1)(θ_0)g(θ_0).

   Therefore, W_t = -H^(-1) and λ_t = 1.

   Very popular and works well in many settings.

   The Hessian can be difficult to calculate, and W_t = -H^(-1) may fail to be positive definite far from the optimum.

   Newton's method will reach the optimum in one step if G(θ) is quadratic.

3. Quadratic Hill Climbing.

   W_t = -[H(θ_t) - αI]^(-1), where α > 0 is chosen to ensure that W_t is positive definite.

4. Davidon-Fletcher-Powell (DFP).

   W_{t+1} = W_t + E_t, where E_t is a positive definite matrix and

   E_t = (θ_t - θ_{t-1})(θ_t - θ_{t-1})' / [(θ_t - θ_{t-1})'(g_t - g_{t-1})]
         + W_t(g_t - g_{t-1})(g_t - g_{t-1})'W_t / [(g_t - g_{t-1})'W_t(g_t - g_{t-1})].

   Notice that no second derivatives (i.e., H(θ_t)) are required.

   Choose W_0 = I.

5. Method of Scoring.

   W_t = [-E(∂²ln(L)/∂θ∂θ')]^(-1).

6. BHHH or Outer Product of the Gradients.

   W_t = [g(θ_t)g(θ_t)']^(-1) is an estimate of -H^(-1)(θ_t) = -[∂²ln(L)/∂θ∂θ']^(-1).

   W_t is always positive definite.

   Only requires first derivatives.

Notes.

1. Nonlinear optimization with constraints. There are several options, such as forming a Lagrangian function, substituting the constraint directly into the objective, or imposing penalty terms in the objective function.

2. Assessing convergence.

   The usual choice of convergence criterion is the change in G or the change in θ.

   Sometimes these methods can be sensitive to the scaling of the function.

   Belsley suggests using g'H^(-1)g as the criterion, which removes the units of measurement.

3. The biggest problem in nonlinear optimization is making sure the solution is a global, as opposed to local, optimum. The above methods work well for globally concave (convex) functions.

3.4 Examples of Newton’s Method

Here are two numerical examples of Newton’s method.

1. Example #1. A sample of data (n = 20) was generated from the intrinsically nonlinear regression model

   y_t = β_1 + β_2 x_{2t} + β_2² x_{3t} + ε_t,

   where β_1 = β_2 = 1. The objective is to minimize the function

   G(β_1, β_2) = (y - h(β_1, β_2))'(y - h(β_1, β_2))

   where h(β_1, β_2) = β_1 + β_2 x_{2t} + β_2² x_{3t}. See Figure B.2 and Table B.3 (attached) to see how Newton's method performs for three different initial values.
2. Example #2. The objective is to minimize

   G(θ) = θ³ - 3θ² + 5.

   The gradient and Hessian are given by

   g(θ) = 3θ² - 6θ = 3θ(θ - 2)
   H(θ) = 6θ - 6 = 6(θ - 1).

   Substituting these into Newton's algorithm gives

   θ_{t+1} = θ_t - 3θ_t(θ_t - 2)/[6(θ_t - 1)] = θ_t - θ_t(θ_t - 2)/[2(θ_t - 1)].

   Now consider two different starting values, θ_0 = 1.5 and θ_0 = 0.5.

   Starting Value θ_0 = 1.5                              Starting Value θ_0 = 0.5

   θ_1 = 1.5 - (1.5)(-0.5)/[2(0.5)] = 2.25               θ_1 = 0.5 - (0.5)(-1.5)/[2(-0.5)] = -0.25
   θ_2 = 2.25 - (2.25)(0.25)/[2(1.25)] = 2.025           θ_2 = -0.25 - (-0.25)(-2.25)/[2(-1.25)] = -0.025
   θ_3 = 2.025 - (2.025)(0.025)/[2(1.025)] = 2.0003      θ_3 = -0.025 - (-0.025)(-2.025)/[2(-1.025)] = -0.0003.

   This example highlights the fact that, at least for objective functions that are not globally concave (or convex), the choice of starting values is an important aspect of nonlinear optimization.
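The iterations in Example #2 are easy to verify. Here is a short MATLAB sketch that reproduces both columns of the table above.

   % Newton's method for G(theta) = theta^3 - 3*theta^2 + 5.
   g = @(theta) 3*theta.^2 - 6*theta;    % gradient
   H = @(theta) 6*theta - 6;             % Hessian (second derivative)
   for theta0 = [1.5, 0.5]               % the two starting values in the table
       theta = theta0;
       for t = 1:3
           theta = theta - g(theta) / H(theta);   % Newton update
           fprintf('theta0 = %4.1f, iteration %d: theta = %8.4f\n', theta0, t, theta);
       end
   end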

4 MATLAB Example

In this example, we are going to estimate the parameters of an intrinsically nonlinear Cobb-Douglas production function

Q_t = β_1 L_t^{β_2} K_t^{β_3} + ε_t

using Gauss-Newton and Newton's method, as well as test for constant returns to scale (see MATLAB example 14).

For Gauss-Newton, the relevant gradient (pseudoregressor) vector is

g(X, β^0) = { L_t^{β_2^0} K_t^{β_3^0},  ln(L_t) β_1^0 L_t^{β_2^0} K_t^{β_3^0},  ln(K_t) β_1^0 L_t^{β_2^0} K_t^{β_3^0} }.

For Newton's method, the relevant gradient and Hessian matrices are

g(b) = -2 Σ_{i=1}^n e_i (∂h(x_i, b)/∂b)
H(b) = ∂g(b)/∂b'.
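MATLAB example 14 itself is not reproduced in these notes, but here is a hedged sketch of how the Cobb-Douglas regression function and its pseudoregressor matrix could be coded and passed to the Gauss-Newton routine sketched in Section 2.1.1. The data vectors Q, L and K and the starting values are assumptions for illustration.

   % Cobb-Douglas regression function and pseudoregressor matrix for Gauss-Newton.
   % b = [beta1; beta2; beta3]; column 1 of X is labor, column 2 is capital.
   h_fun = @(X, b) b(1) .* X(:,1).^b(2) .* X(:,2).^b(3);
   g_fun = @(X, b) [ X(:,1).^b(2) .* X(:,2).^b(3), ...
                     log(X(:,1)) .* b(1) .* X(:,1).^b(2) .* X(:,2).^b(3), ...
                     log(X(:,2)) .* b(1) .* X(:,1).^b(2) .* X(:,2).^b(3) ];
   X  = [L K];                 % collect the inputs
   b0 = [1; 0.5; 0.5];         % illustrative starting values
   [b, s2] = gauss_newton(Q, X, h_fun, g_fun, b0, 1e-8, 100);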
