Newton's method uses both first and second derivatives to find the minimum of a function and generally performs better than steepest descent, which uses only first derivatives. It works by constructing a quadratic approximation of the function around the current point and finding the minimizer of that approximation; this minimizer becomes the next iterate. The method converges rapidly if started close to the solution, but it may not be a descent method and can fail to converge if the Hessian is not positive definite. The Levenberg-Marquardt modification adds a damping parameter so that the search direction always points in a descent direction. Newton's method can also be applied to nonlinear least-squares problems.


Chapter 9 Newton’s Method

An Introduction to Optimization
Spring, 2014

Wei-Ta Chu

1
Introduction
The steepest descent method uses only first derivatives in
selecting a suitable search direction.
Newton’s method (sometimes called Newton-Raphson method)
uses first and second derivatives and indeed performs better.
Given a starting point, construct a quadratic approximation to
the objective function that matches the first and second
derivative values at that point. We then minimize the
approximate (quadratic function) instead of the original
objective function. The minimizer of the approximate function
is used as the starting point in the next step and repeat the
procedure iteratively.

2
Introduction
We can obtain a quadratic approximation to the twice continuously
differentiable objective function $f:\mathbb{R}^n \to \mathbb{R}$ using the
Taylor series expansion of $f$ about the current point $x^{(k)}$,
neglecting terms of order three and higher:

$$q(x) = f(x^{(k)}) + (x - x^{(k)})^T g^{(k)} + \frac{1}{2}(x - x^{(k)})^T F(x^{(k)})(x - x^{(k)}),$$

where, for simplicity, we use the notation $g^{(k)} = \nabla f(x^{(k)})$, and $F(x^{(k)})$ denotes the Hessian of $f$ at $x^{(k)}$.

Applying the FONC to $q$ yields

$$\nabla q(x) = g^{(k)} + F(x^{(k)})(x - x^{(k)}) = 0.$$

If $F(x^{(k)}) > 0$, then $q$ achieves a minimum at

$$x^{(k+1)} = x^{(k)} - F(x^{(k)})^{-1} g^{(k)}.$$

3
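To make the update rule above concrete, below is a minimal Python sketch of the pure Newton iteration (illustrative, not from the text); it assumes user-supplied callables `grad` and `hess` returning $g^{(k)}$ and $F(x^{(k)})$, and solves a linear system rather than forming the matrix inverse.

```python
import numpy as np

def newton_method(grad, hess, x0, tol=1e-8, max_iter=20):
    """Pure Newton iteration: x_{k+1} = x_k - F(x_k)^{-1} g(x_k).

    grad(x) returns the gradient g(x); hess(x) returns the Hessian F(x).
    Assumes F(x_k) is invertible at every iterate visited.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # stop when the gradient is small
            break
        d = np.linalg.solve(hess(x), -g)   # Newton direction: solve F d = -g
        x = x + d
    return x
```

Solving $F(x^{(k)}) d^{(k)} = -g^{(k)}$ instead of computing $F(x^{(k)})^{-1}$ explicitly anticipates the two-step formulation given later in this chapter.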
Example
Use Newton's method to minimize the Powell function:

$$f(x_1, x_2, x_3, x_4) = (x_1 + 10x_2)^2 + 5(x_3 - x_4)^2 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4.$$

Use $x^{(0)} = [3, -1, 0, 1]^T$ as the starting point. Perform three iterations.

Note that $f(x^{(0)}) = 215$. We have

$$\nabla f(x) = \begin{bmatrix} 2(x_1 + 10x_2) + 40(x_1 - x_4)^3 \\ 20(x_1 + 10x_2) + 4(x_2 - 2x_3)^3 \\ 10(x_3 - x_4) - 8(x_2 - 2x_3)^3 \\ -10(x_3 - x_4) - 40(x_1 - x_4)^3 \end{bmatrix}$$

and the Hessian

$$F(x) = \begin{bmatrix}
2 + 120(x_1 - x_4)^2 & 20 & 0 & -120(x_1 - x_4)^2 \\
20 & 200 + 12(x_2 - 2x_3)^2 & -24(x_2 - 2x_3)^2 & 0 \\
0 & -24(x_2 - 2x_3)^2 & 10 + 48(x_2 - 2x_3)^2 & -10 \\
-120(x_1 - x_4)^2 & 0 & -10 & 10 + 120(x_1 - x_4)^2
\end{bmatrix}.$$

4
Example
Iteration 1.

5
Example
Iteration 2.

6
Example
Iteration 3.

7
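The numerical values of the three iterates are not reproduced in the slides above. The following sketch (illustrative; the function names are our own) computes them from the gradient and Hessian of the Powell function stated earlier.

```python
import numpy as np

def powell_grad(x):
    x1, x2, x3, x4 = x
    return np.array([
        2*(x1 + 10*x2) + 40*(x1 - x4)**3,
        20*(x1 + 10*x2) + 4*(x2 - 2*x3)**3,
        10*(x3 - x4) - 8*(x2 - 2*x3)**3,
        -10*(x3 - x4) - 40*(x1 - x4)**3,
    ])

def powell_hess(x):
    x1, x2, x3, x4 = x
    a = 120*(x1 - x4)**2
    b = 12*(x2 - 2*x3)**2
    return np.array([
        [2 + a,  20,       0,        -a],
        [20,     200 + b,  -2*b,      0],
        [0,      -2*b,     10 + 4*b, -10],
        [-a,     0,        -10,      10 + a],
    ])

x = np.array([3.0, -1.0, 0.0, 1.0])   # starting point x^(0)
for k in range(3):                    # three Newton iterations
    x = x + np.linalg.solve(powell_hess(x), -powell_grad(x))
    print(f"x^({k+1}) =", x)
```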
Introduction
Observe that the $k$th iteration of Newton's method can be written in two steps as
1. Solve $F(x^{(k)}) d^{(k)} = -g^{(k)}$ for $d^{(k)}$.
2. Set $x^{(k+1)} = x^{(k)} + d^{(k)}$.
Step 1 requires the solution of an $n \times n$ system of linear equations. Thus, an efficient method for solving systems of linear equations is essential when using Newton's method.
As in the one-variable case, Newton's method can be viewed as a technique for iteratively solving the equation
$$g(x) = 0,$$
where $g:\mathbb{R}^n \to \mathbb{R}^n$ and $g(x) = \nabla f(x)$. In this case $F(x)$ is the Jacobian matrix of $g$ at $x$; that is, $F(x)$ is the $n \times n$ matrix whose $(i, j)$th entry is $(\partial g_i / \partial x_j)(x)$, $i, j = 1, \dots, n$.
8
Analysis of Newton’s Method
As in the one-variable case, there is no guarantee that Newton's algorithm heads in the direction of decreasing values of the objective function if $F(x^{(k)})$ is not positive definite (recall Figure 7.7).
Even if $F(x^{(k)}) > 0$, Newton's method may not be a descent method; that is, it is possible that $f(x^{(k+1)}) \geq f(x^{(k)})$. This may occur if our starting point $x^{(0)}$ is far away from the solution.
Despite these drawbacks, Newton's method has superior convergence properties when the starting point is near the solution.
Newton's method works well if $F(x) > 0$ everywhere. However, if $F(x)$ is not positive definite for some $x$, Newton's method may fail to converge to the minimizer.

9
Analysis of Newton’s Method
The convergence analysis of Newton's method when $f$ is a quadratic function is straightforward. Newton's method reaches the point $x^*$ such that $\nabla f(x^*) = 0$ in just one step starting from any initial point $x^{(0)}$.
Suppose that $Q = Q^T$ is invertible and
$$f(x) = \frac{1}{2} x^T Q x - x^T b.$$
Then, $g(x) = \nabla f(x) = Qx - b$ and $F(x) = Q$.
Hence, given any initial point $x^{(0)}$, by Newton's algorithm
$$x^{(1)} = x^{(0)} - F(x^{(0)})^{-1} g^{(0)} = x^{(0)} - Q^{-1}(Qx^{(0)} - b) = Q^{-1}b = x^*.$$
Therefore, for the quadratic case the order of convergence of Newton's algorithm is $\infty$ for any initial point $x^{(0)}$.
10
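A quick numerical check of this one-step property (illustrative sketch; the particular $Q$ and $b$ below are arbitrary choices):

```python
import numpy as np

# Quadratic objective f(x) = 0.5 x^T Q x - x^T b with Q symmetric positive definite
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x0 = np.array([10.0, -7.0])            # arbitrary starting point
g0 = Q @ x0 - b                        # gradient at x0; the Hessian is Q
x1 = x0 - np.linalg.solve(Q, g0)       # one Newton step

x_star = np.linalg.solve(Q, b)         # minimizer Q^{-1} b
print(np.allclose(x1, x_star))         # True: one step suffices
```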
Analysis of Newton’s Method
Theorem 9.1: Suppose that $f \in \mathcal{C}^3$ and $x^* \in \mathbb{R}^n$ is a point such that $\nabla f(x^*) = 0$ and $F(x^*)$ is invertible. Then, for all $x^{(0)}$ sufficiently close to $x^*$, Newton's method is well defined for all $k$ and converges to $x^*$ with an order of convergence at least 2.
Proof: The Taylor series expansion of $\nabla f$ about $x^{(0)}$ yields
$$\nabla f(x) - \nabla f(x^{(0)}) - F(x^{(0)})(x - x^{(0)}) = O(\|x - x^{(0)}\|^2).$$
Because by assumption $f \in \mathcal{C}^3$ and $F(x^*)$ is invertible, there exist constants $\varepsilon > 0$, $c_1 > 0$, and $c_2 > 0$ such that if $x^{(0)}, x \in \{x : \|x - x^*\| \leq \varepsilon\}$, we have
$$\|\nabla f(x) - \nabla f(x^{(0)}) - F(x^{(0)})(x - x^{(0)})\| \leq c_1 \|x - x^{(0)}\|^2,$$
and by Lemma 5.3, $F(x)^{-1}$ exists and satisfies
$$\|F(x)^{-1}\| \leq c_2.$$

11
Analysis of Newton’s Method

The first inequality holds because the remainder term in the Taylor series expansion contains third derivatives of $f$ that are continuous and hence bounded on $\{x : \|x - x^*\| \leq \varepsilon\}$.
Suppose that $x^{(k)} \in \{x : \|x - x^*\| \leq \varepsilon\}$. Then, substituting $x = x^*$ and $x^{(0)} = x^{(k)}$ in the inequality above and using the assumption that $\nabla f(x^*) = 0$, we get
$$\|\nabla f(x^{(k)}) + F(x^{(k)})(x^* - x^{(k)})\| \leq c_1 \|x^{(k)} - x^*\|^2.$$

12
Analysis of Newton’s Method
Subtracting $x^*$ from both sides of Newton's algorithm and taking norms yields
$$\|x^{(k+1)} - x^*\| = \|x^{(k)} - x^* - F(x^{(k)})^{-1}\nabla f(x^{(k)})\| = \big\|F(x^{(k)})^{-1}\big(\nabla f(x^{(k)}) + F(x^{(k)})(x^* - x^{(k)})\big)\big\|.$$
Applying the inequalities above involving the constants $c_1$ and $c_2$,
$$\|x^{(k+1)} - x^*\| \leq c_1 c_2 \|x^{(k)} - x^*\|^2.$$
Suppose that $x^{(0)}$ is such that
$$\|x^{(0)} - x^*\| \leq \min\left\{\varepsilon, \frac{1}{2 c_1 c_2}\right\}.$$
Then
$$\|x^{(1)} - x^*\| \leq c_1 c_2 \|x^{(0)} - x^*\|^2 \leq \frac{1}{2}\|x^{(0)} - x^*\| \leq \min\left\{\varepsilon, \frac{1}{2 c_1 c_2}\right\},$$
so the same bound can be applied at the next iteration.

13
Analysis of Newton’s Method
By induction, we obtain
$$\|x^{(k)} - x^*\| \leq \left(\frac{1}{2}\right)^k \|x^{(0)} - x^*\|.$$
Hence, $\lim_{k \to \infty} \|x^{(k)} - x^*\| = 0$, and therefore the sequence $\{x^{(k)}\}$ converges to $x^*$. The order of convergence is at least 2 because $\|x^{(k+1)} - x^*\| \leq c_1 c_2 \|x^{(k)} - x^*\|^2$. That is,
$$\frac{\|x^{(k+1)} - x^*\|}{\|x^{(k)} - x^*\|^2} \leq c_1 c_2.$$

14
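As an illustration of the order-2 convergence (not from the text), the following one-dimensional sketch minimizes $f(x) = e^x - 2x$, whose minimizer is $x^* = \ln 2$, and prints the error at each Newton step; the error is roughly squared from one iteration to the next.

```python
import math

# f(x) = exp(x) - 2x, with f'(x) = exp(x) - 2, f''(x) = exp(x), minimizer x* = ln 2
x = 2.0                                 # starting point reasonably near x*
x_star = math.log(2.0)
for k in range(6):
    x = x - (math.exp(x) - 2.0) / math.exp(x)   # Newton step
    print(f"k={k+1}  error = {abs(x - x_star):.3e}")
# The printed error is roughly squared at each step, i.e. order of convergence >= 2.
```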
Analysis of Newton’s Method
Theorem 9.2: Let $\{x^{(k)}\}$ be the sequence generated by Newton's method for minimizing a given objective function $f(x)$. If the Hessian $F(x^{(k)}) > 0$ and $g^{(k)} = \nabla f(x^{(k)}) \neq 0$, then the search direction
$$d^{(k)} = -F(x^{(k)})^{-1} g^{(k)}$$
from $x^{(k)}$ to $x^{(k+1)} = x^{(k)} + d^{(k)}$ is a descent direction for $f$ in the sense that there exists an $\bar{\alpha} > 0$ such that for all $\alpha \in (0, \bar{\alpha})$, $f(x^{(k)} + \alpha d^{(k)}) < f(x^{(k)})$.

15
Analysis of Newton’s Method
Proof: Let $\phi(\alpha) = f(x^{(k)} + \alpha d^{(k)})$. Then, using the chain rule, we obtain
$$\phi'(\alpha) = \nabla f(x^{(k)} + \alpha d^{(k)})^T d^{(k)}.$$
Hence,
$$\phi'(0) = \nabla f(x^{(k)})^T d^{(k)} = -g^{(k)T} F(x^{(k)})^{-1} g^{(k)} < 0,$$
because $F(x^{(k)})^{-1} > 0$ and $g^{(k)} \neq 0$.
Thus, there exists an $\bar{\alpha} > 0$ so that for all $\alpha \in (0, \bar{\alpha})$, $\phi(\alpha) < \phi(0)$. This implies that for all $\alpha \in (0, \bar{\alpha})$,
$$f(x^{(k)} + \alpha d^{(k)}) < f(x^{(k)}).$$

16
Analysis of Newton’s Method
Theorem 9.2 motivates the following modification of Newton's method:
$$x^{(k+1)} = x^{(k)} - \alpha_k F(x^{(k)})^{-1} g^{(k)},$$
where
$$\alpha_k = \arg\min_{\alpha \geq 0} f\big(x^{(k)} - \alpha F(x^{(k)})^{-1} g^{(k)}\big);$$
that is, at each iteration, we perform a line search in the direction $d^{(k)} = -F(x^{(k)})^{-1} g^{(k)}$.
A drawback of Newton's method is that evaluation of $F(x^{(k)})$ for large $n$ can be computationally expensive. Furthermore, we have to solve the set of linear equations $F(x^{(k)}) d^{(k)} = -g^{(k)}$. In Chapters 10 and 11 we discuss methods that address this issue.
The Hessian matrix may not be positive definite. In the next section we describe a simple modification to overcome this problem.

17
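Below is a minimal sketch of this modified (damped) Newton method, using a simple backtracking rule in place of an exact line search. The backtracking constants are illustrative choices, not from the text, and the sketch assumes $F(x^{(k)}) > 0$ so that the Newton direction is a descent direction (Theorem 9.2).

```python
import numpy as np

def damped_newton(f, grad, hess, x0, tol=1e-8, max_iter=50):
    """Newton's method with a backtracking line search along d = -F(x)^{-1} g(x).

    Assumes hess(x) is positive definite at the iterates, so d is a descent direction.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), -g)        # Newton direction
        alpha = 1.0
        # Backtrack until a sufficient decrease (Armijo-type test) is obtained.
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d) and alpha > 1e-12:
            alpha *= 0.5
        x = x + alpha * d
    return x
```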
Levenberg-Marquardt Modification
If the Hessian matrix $F(x^{(k)})$ is not positive definite, then the search direction $d^{(k)} = -F(x^{(k)})^{-1} g^{(k)}$ may not point in a descent direction.
Levenberg-Marquardt modification:
$$x^{(k+1)} = x^{(k)} - \big(F(x^{(k)}) + \mu_k I\big)^{-1} g^{(k)}, \qquad \mu_k \geq 0.$$
Consider a symmetric matrix $F$, which may not be positive definite. Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $F$ with corresponding eigenvectors $v_1, \dots, v_n$. The eigenvalues are real, but may not all be positive.
Consider the matrix $G = F + \mu I$, where $\mu \geq 0$. Note that the eigenvalues of $G$ are $\lambda_1 + \mu, \dots, \lambda_n + \mu$.

18
Levenberg-Marquardt Modification
Indeed,
$$G v_i = (F + \mu I) v_i = F v_i + \mu v_i = (\lambda_i + \mu) v_i,$$
which shows that for all $i = 1, \dots, n$, $v_i$ is also an eigenvector of $G$ with eigenvalue $\lambda_i + \mu$.
If $\mu$ is sufficiently large, then all the eigenvalues of $G$ are positive and $G$ is positive definite.
Accordingly, if the parameter $\mu_k$ in the Levenberg-Marquardt modification of Newton's algorithm is sufficiently large, then the search direction $d^{(k)} = -\big(F(x^{(k)}) + \mu_k I\big)^{-1} g^{(k)}$ always points in a descent direction.

19
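A short numerical illustration of this eigenvalue-shift argument (the symmetric indefinite matrix below is an arbitrary example, not from the text):

```python
import numpy as np

F = np.array([[1.0,  3.0],
              [3.0, -2.0]])             # symmetric but indefinite
mu = 5.0
G = F + mu * np.eye(2)

print(np.linalg.eigvalsh(F))            # one negative eigenvalue
print(np.linalg.eigvalsh(G))            # every eigenvalue shifted up by mu
print(np.allclose(np.linalg.eigvalsh(G), np.linalg.eigvalsh(F) + mu))   # True
```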
Levenberg-Marquardt Modification
If we further introduce a step size $\alpha_k$,
$$x^{(k+1)} = x^{(k)} - \alpha_k \big(F(x^{(k)}) + \mu_k I\big)^{-1} g^{(k)},$$
then we are guaranteed that the descent property holds.
By letting $\mu_k \to 0$, the Levenberg-Marquardt modification approaches the behavior of the pure Newton's method.
By letting $\mu_k \to \infty$, this algorithm approaches a pure gradient method with small step size.
In practice, we may start with a small value of $\mu_k$ and increase it slowly until we find that the iteration is a descent iteration: $f(x^{(k+1)}) < f(x^{(k)})$.
20
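A minimal sketch of this practical scheme: start with a small $\mu_k$ and increase it until the resulting step decreases $f$. The starting value and the factor of 10 are illustrative choices, not from the text.

```python
import numpy as np

def levenberg_marquardt_step(f, grad, hess, x, mu0=1e-3, max_tries=20):
    """One Levenberg-Marquardt-modified Newton step, increasing mu until descent."""
    g, F = grad(x), hess(x)
    n = x.size
    mu = mu0
    for _ in range(max_tries):
        try:
            d = np.linalg.solve(F + mu * np.eye(n), -g)
        except np.linalg.LinAlgError:      # F + mu*I (near-)singular: increase mu
            mu *= 10.0
            continue
        x_new = x + d
        if f(x_new) < f(x):                # accept as soon as the step is descent
            return x_new, mu
        mu *= 10.0                         # otherwise damp more and retry
    return x, mu                           # fall back to the current point
```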
Newton’s Method for Nonlinear Least Squares
Consider the objective function
$$f(x) = \sum_{i=1}^{m} \big(r_i(x)\big)^2,$$
where $r_i:\mathbb{R}^n \to \mathbb{R}$, $i = 1, \dots, m$, are given functions. This particular problem is called a nonlinear least-squares problem.
Example: Suppose that we are given $m$ measurements of a process at $m$ points in time. Let $t_1, \dots, t_m$ denote the measurement times and $y_1, \dots, y_m$ the measurement values. We wish to fit a sinusoid to the measurement data.

21
Newton’s Method for Nonlinear Least Squares
The equation of the sinusoid is
$$y = A \sin(\omega t + \phi),$$
with appropriate choices of the parameters $A$, $\omega$, and $\phi$.
To formulate the data-fitting problem, we construct the objective function
$$\sum_{i=1}^{m} \big(y_i - A \sin(\omega t_i + \phi)\big)^2,$$
representing the sum of the squared errors between the measurement values and the function values at the corresponding points in time.
Let $x = [A, \omega, \phi]^T$ represent the vector of decision variables. We obtain the least-squares problem with
$$r_i(x) = y_i - A \sin(\omega t_i + \phi), \qquad i = 1, \dots, m.$$

22
Newton’s Method for Nonlinear Least Squares
Defining $r = [r_1, \dots, r_m]^T$, we write the objective function as $f(x) = r(x)^T r(x)$. To apply Newton's method, we need to compute the gradient and the Hessian of $f$.
The $j$th component of $\nabla f(x)$ is
$$(\nabla f(x))_j = \frac{\partial f}{\partial x_j}(x) = 2 \sum_{i=1}^{m} r_i(x) \frac{\partial r_i}{\partial x_j}(x).$$
Denote the Jacobian matrix of $r$ by
$$J(x) = \begin{bmatrix} \frac{\partial r_1}{\partial x_1}(x) & \cdots & \frac{\partial r_1}{\partial x_n}(x) \\ \vdots & & \vdots \\ \frac{\partial r_m}{\partial x_1}(x) & \cdots & \frac{\partial r_m}{\partial x_n}(x) \end{bmatrix}.$$
Thus, the gradient of $f$ can be represented as
$$\nabla f(x) = 2 J(x)^T r(x).$$

23
Newton’s Method for Nonlinear Least Squares
We compute the Hessian matrix of $f$. The $(k, j)$th component of the Hessian is given by
$$\frac{\partial^2 f}{\partial x_k \partial x_j}(x) = 2 \sum_{i=1}^{m} \left( \frac{\partial r_i}{\partial x_k}(x) \frac{\partial r_i}{\partial x_j}(x) + r_i(x) \frac{\partial^2 r_i}{\partial x_k \partial x_j}(x) \right).$$
Letting $S(x)$ be the matrix whose $(k, j)$th component is
$$\sum_{i=1}^{m} r_i(x) \frac{\partial^2 r_i}{\partial x_k \partial x_j}(x),$$
we write the Hessian matrix as
$$F(x) = 2 \big(J(x)^T J(x) + S(x)\big).$$

24
Newton’s Method for Nonlinear Least Squares
Therefore, Newton’s method applied to the nonlinear least-
squares problem is given by

In some applications, the matrix involving the second


derivatives of the function can be ignored because its
components are negligibly small.
In this case Newton’s algorithm reduces to what is commonly
called the Gauss-Newton method:

Note that the Gauss-Newton method does not require


calculation of the second derivatives of

25
Example

For the sinusoid-fitting example, the Jacobian matrix $J(x)$ is an $m \times 3$ matrix with elements given by
$$\frac{\partial r_i}{\partial A} = -\sin(\omega t_i + \phi), \qquad \frac{\partial r_i}{\partial \omega} = -A t_i \cos(\omega t_i + \phi), \qquad \frac{\partial r_i}{\partial \phi} = -A \cos(\omega t_i + \phi), \qquad i = 1, \dots, m.$$
We apply the Gauss-Newton algorithm to find the sinusoid of best fit, that is, the parameters $A$, $\omega$, and $\phi$ that minimize the sum of squared errors.

26
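Below is a sketch of the Gauss-Newton iteration for the sinusoid-fitting example, using the residuals and Jacobian given above. The measurement data from the text are not reproduced here, so the sketch generates synthetic data; the data, initial guess, and iteration count are illustrative, and a reasonably good initial guess is assumed.

```python
import numpy as np

def residuals(x, t, y):
    A, w, phi = x
    return y - A * np.sin(w * t + phi)        # r_i(x) = y_i - A sin(w t_i + phi)

def jacobian(x, t):
    A, w, phi = x
    s, c = np.sin(w * t + phi), np.cos(w * t + phi)
    # Columns: dr_i/dA, dr_i/dw, dr_i/dphi
    return np.column_stack([-s, -A * t * c, -A * c])

# Synthetic measurements standing in for the data used in the text
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 21)
y = 2.0 * np.sin(1.5 * t + 0.5) + 0.05 * rng.standard_normal(t.size)

x = np.array([1.5, 1.45, 0.3])                # initial guess for [A, w, phi]
for _ in range(10):                           # Gauss-Newton iterations
    r, J = residuals(x, t, y), jacobian(x, t)
    x = x - np.linalg.solve(J.T @ J, J.T @ r)
print("fitted [A, w, phi] =", x)
```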
Newton’s Method for Nonlinear Least Squares
A potential problem with the Gauss-Newton method is that the matrix $J(x^{(k)})^T J(x^{(k)})$ may not be positive definite.
This problem can be overcome using a Levenberg-Marquardt modification:
$$x^{(k+1)} = x^{(k)} - \big(J(x^{(k)})^T J(x^{(k)}) + \mu_k I\big)^{-1} J(x^{(k)})^T r(x^{(k)}).$$
This is referred to in the literature as the Levenberg-Marquardt algorithm, because the original Levenberg-Marquardt modification was developed specifically for the nonlinear least-squares problem.
An alternative interpretation of the Levenberg-Marquardt algorithm is to view the term $\mu_k I$ as an approximation to $S(x^{(k)})$ in Newton's algorithm.

27
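The Levenberg-Marquardt modification changes only the linear system solved at each step. A minimal sketch of the modified update, written as a self-contained function taking the current residual vector $r(x^{(k)})$ and Jacobian $J(x^{(k)})$ (names are illustrative):

```python
import numpy as np

def lm_least_squares_step(x, r, J, mu):
    """One Levenberg-Marquardt step for nonlinear least squares:

    x_next = x - (J^T J + mu I)^{-1} J^T r,

    where r = r(x) is the residual vector and J = J(x) its Jacobian at x.
    """
    n = x.size
    return x - np.linalg.solve(J.T @ J + mu * np.eye(n), J.T @ r)
```

With $\mu_k = 0$ this reduces to the Gauss-Newton step above; larger $\mu_k$ gives a shorter step closer to the negative gradient direction $-J^T r$.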
