Chapter 3, Lecture 6: Broyden's Method

(This document comes from the Math 484 course webpage: https://faculty.math.illinois.edu/~mlavrov/courses/484-spring-2019.html)
1 Motivation
Previously, we assumed that computing the gradient ∇f (x) or the Hessian matrix Hf (x) was easy
and costless. But in practice, there are many situations where that’s not true:
• If f : Rn → R with n large, then computing Hf (x) requires finding many derivatives, which
is expensive even when we can take partial derivatives of f easily.
• If f is not given to us explicitly, then we cannot compute derivatives of f directly (unless we
use some approximation algorithm for derivatives).
In one dimension, we discussed the secant method as a solution to the second of these problems.
Here, we used the values at the last two points to estimate the derivative.
Now let’s pass to the n-dimensional case: finding the solution to a system of equations g(x) = 0 for
a function g : Rn → Rn . To approximate Newton’s method, we need an estimate of the Jacobian
∇g; in other words, a linear approximation of g.
We encounter a problem: the values of g at the last two points alone are not enough to build a complete linear approximation of g. (In general, defining such an approximation requires the values of g at n + 1 points. For example, when n = 2, a linear approximation x ↦ Ax + c has six unknown entries, and each point where we know g supplies only two equations.)
2 Rank-one updates

Now suppose that instead of computing the Jacobian ∇g(x^(k)), we somehow found an approximation D_k: an n × n matrix that's our best guess at the partial derivatives of g. Just as with Newton's
method, we make a linear approximation

\[ g(x) \approx g(x^{(k)}) + D_k(x - x^{(k)}). \]

Then we let x^(k+1) be the point where the linear approximation equals 0; that is, x^(k+1) solves

\[ g(x^{(k)}) + D_k(x^{(k+1)} - x^{(k)}) = 0. \tag{1} \]

Once we know x^(k+1) and the value g(x^(k+1)), we would like to improve our guess D_k to a new matrix D_{k+1}. We ask for two things. First, D_{k+1} should be consistent with the value of g we just observed; this is the secant condition

\[ D_{k+1}(x^{(k+1)} - x^{(k)}) = g(x^{(k+1)}) - g(x^{(k)}). \tag{2} \]

Second, the step tells us nothing about how g changes in directions orthogonal to it, so in those directions D_{k+1} should agree with D_k:

\[ D_{k+1} y = D_k y \quad \text{whenever } y \cdot (x^{(k+1)} - x^{(k)}) = 0. \tag{3} \]

Together, equations (1), (2), and (3) characterize the change from D_k to D_{k+1}.
It is easier to reason in terms of the "update" from D_k to D_{k+1}: the difference between D_{k+1} and D_k. If we denote this difference by U_k = D_{k+1} − D_k, and define b^(k) = x^(k+1) − x^(k), then taking the difference of (1) and (2) gives

\[ U_k b^{(k)} = g(x^{(k+1)}), \]

while (3) says directly that

\[ U_k y = 0 \quad \text{whenever } y \cdot b^{(k)} = 0. \]
If this matrix exists, it must be unique, because any input y can be written as y = y_∥ + y_⊥, where y_∥ is a multiple of b^(k) and y_⊥ is orthogonal to b^(k); the equations above define what U_k does to y_∥ and to y_⊥, and therefore they determine what U_k does to y.
One way to get a function f(y) that has this property is to scale g(x^(k+1)) (the desired nonzero output) in proportion to the dot product y · b^(k). That is, to set

\[ f(y) = \frac{y \cdot b^{(k)}}{b^{(k)} \cdot b^{(k)}} \, g(x^{(k+1)}). \]

This turns out to be a linear function of y, so there actually is some matrix U_k such that f(y) = U_k y. We can see this by rewriting y · b^(k) as (b^{(k)})^T y, so that

\[ f(y) = \frac{g(x^{(k+1)}) (b^{(k)})^T y}{b^{(k)} \cdot b^{(k)}}. \]
This tells us that the unique update matrix U_k that does the job is

\[ U_k = \frac{g(x^{(k+1)}) (b^{(k)})^T}{b^{(k)} \cdot b^{(k)}}. \]
Adding this matrix to D_k to get D_{k+1} is called a rank-one update, because the rank of the matrix U_k is 1: its columns are all multiples of g(x^(k+1)). (And its rows are all multiples of (b^{(k)})^T.) This makes it a "minimal" change from D_k to D_{k+1} in some sense, and using a rank-one matrix also has some theoretical benefits.
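As a quick sanity check, the defining properties of U_k are easy to verify numerically. Here is a minimal numpy sketch (the function name and the test vectors are our own choices, not from the lecture):

```python
import numpy as np

def rank_one_update(g_new, b):
    """U_k = g(x^(k+1)) (b^(k))^T / (b^(k) . b^(k)), as an n x n matrix."""
    return np.outer(g_new, b) / np.dot(b, b)

b = np.array([2.0, 0.0])       # a step b^(k)
g_new = np.array([0.0, 2.0])   # the new residual g(x^(k+1))
U = rank_one_update(g_new, b)

# U sends the step to the new residual...
assert np.allclose(U @ b, g_new)
# ...and sends every direction orthogonal to the step to zero.
assert np.allclose(U @ np.array([0.0, 1.0]), 0.0)
# Its rank is 1, as promised.
assert np.linalg.matrix_rank(U) == 1
```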
3 Broyden’s method
3.1 The method
We begin, as usual, with some starting point x^(0); we also need a matrix D_0 to start as our approximation to ∇g(x^(0)). (If computing derivatives of g is expensive but not impossible, we could set D_0 equal to the Jacobian, since we only need to do this once. Otherwise, we could set D_0 to the identity matrix, because that's simple to work with.)
To compute x^(k+1) and D_{k+1} from x^(k) and D_k, we:

1. Solve g(x^(k)) + D_k(x^(k+1) − x^(k)) = 0 for x^(k+1); equivalently, set x^(k+1) = x^(k) − D_k^{-1} g(x^(k)).

2. Set
\[ D_{k+1} = D_k + \frac{g(x^{(k+1)}) (b^{(k)})^T}{b^{(k)} \cdot b^{(k)}}, \qquad \text{where } b^{(k)} = x^{(k+1)} - x^{(k)}. \]
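In code, one pass through these two steps is only a few lines. Here is a minimal Python/numpy sketch (the function name, the stopping rule, and the iteration cap are our own choices, not part of the lecture):

```python
import numpy as np

def broyden(g, x0, D0, tol=1e-10, max_iter=100):
    """Solve g(x) = 0 by Broyden's method from the point x0,
    starting with D0 as the guess for the Jacobian of g."""
    x = np.asarray(x0, dtype=float)
    D = np.asarray(D0, dtype=float)
    for _ in range(max_iter):
        gx = g(x)
        if np.linalg.norm(gx) < tol:
            break
        # Step 1: solve D_k b = -g(x^(k)) rather than forming D_k^{-1}.
        b = np.linalg.solve(D, -gx)
        x = x + b
        # Step 2: rank-one update D_{k+1} = D_k + g(x^(k+1)) b^T / (b . b).
        D = D + np.outer(g(x), b) / np.dot(b, b)
    return x
```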
3.2 An example
Here’s an example that illustrates how the method works, and also demonstrates the ability of Broyden’s method to recover from a bad initial guess for D_0.
Suppose we are solving the system of equations

\[ \begin{cases} x + y = 2, \\ x - y = 0, \end{cases} \]

that is, g(x, y) = (x + y − 2, x − y). Say we start at the point (x_0, y_0) = (0, 0) with the (bad) initial guess D_0 = I. Then g(x_0, y_0) = (−2, 0), and our first step sets

\[ \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^{-1} \begin{bmatrix} -2 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}, \]

so the step we took is b^(0) = (2, 0), and g(x_1, y_1) = (0, 2). Using the update formula

\[ D_1 = D_0 + \frac{g(x_1, y_1)(b^{(0)})^T}{b^{(0)} \cdot b^{(0)}} \]
we set

\[ D_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \frac{1}{4} \begin{bmatrix} 0 \\ 2 \end{bmatrix} \begin{bmatrix} 2 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}. \]
(Newton’s method, or even Broyden’s method with an accurate D_0, would have gotten the answer in this first step, but we’ll need a bit more work.)
Our second step sets

\[ \begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ -2 \end{bmatrix}, \]
so the step we took is b^(1) = (0, −2) and g(x_2, y_2) = (−2, 4). We do another rank-one update to get

\[ D_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} + \frac{1}{4} \begin{bmatrix} -2 \\ 4 \end{bmatrix} \begin{bmatrix} 0 & -2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \]
Now, with information from more points, D_2 has become the correct Jacobian matrix of the linear function g. In the next step, our linear approximation to g will actually be g itself, and so (x_3, y_3) will be the correct answer (1, 1).
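As a check, feeding this system to the broyden sketch from section 3.1 (reusing that function), with the same bad guess D_0 = I, reproduces these iterates:

```python
import numpy as np

# g(x, y) = (x + y - 2, x - y); the bad initial guess is D0 = I.
g = lambda v: np.array([v[0] + v[1] - 2.0, v[0] - v[1]])
x = broyden(g, x0=[0.0, 0.0], D0=np.eye(2))
print(x)  # [1. 1.], reached on the third step, just as computed above
```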
In general, after a bad guess of D_0, it is not guaranteed that the matrix D_k will eventually approximate ∇g(x^(k)), no matter how many steps we take. What does often happen is that, after enough steps, D_k accurately tells us what ∇g(x^(k)) does in the relevant directions: the ones we actually need to use.
4 The Sherman–Morrison formula

One expensive part of Broyden's method is that each step needs D_k^{-1} (or, at least, the solution of a linear system with the matrix D_k). Fortunately, when a matrix changes by a rank-one update, its inverse can be updated cheaply. This is the Sherman–Morrison formula: if A is an invertible n × n matrix and u, v ∈ R^n with 1 + v^T A^{-1} u ≠ 0, then

\[ (A + uv^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}. \]

Applying this with A = D_k, u = g(x^(k+1))/(b^(k) · b^(k)), and v = b^(k), we can update D_k^{-1} directly. In this way, we avoid ever having to compute D_k explicitly: only D_k^{-1} is needed. After simplifying the Sherman–Morrison formula to our specific needs, we can re-summarize Broyden's method as:
1. Set x^(k+1) = x^(k) − D_k^{-1} g(x^(k)).

2. In terms of the vectors b^(k) = x^(k+1) − x^(k) = −D_k^{-1} g(x^(k)) and c^(k) = D_k^{-1} g(x^(k+1)), set

\[ D_{k+1}^{-1} = D_k^{-1} - \frac{c^{(k)} (b^{(k)})^T D_k^{-1}}{b^{(k)} \cdot (b^{(k)} + c^{(k)})}. \]
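Here is a sketch of this inverse-only variant, maintaining B_k = D_k^{-1} instead of D_k (again, the names and stopping rule are our own choices):

```python
import numpy as np

def broyden_inverse(g, x0, B0, tol=1e-10, max_iter=100):
    """Broyden's method tracking B_k = D_k^{-1} directly,
    so that no linear system is ever solved."""
    x = np.asarray(x0, dtype=float)
    B = np.asarray(B0, dtype=float)
    for _ in range(max_iter):
        gx = g(x)
        if np.linalg.norm(gx) < tol:
            break
        b = -B @ gx            # b^(k) = -D_k^{-1} g(x^(k))
        x = x + b              # x^(k+1) = x^(k) + b^(k)
        c = B @ g(x)           # c^(k) = D_k^{-1} g(x^(k+1))
        # Rank-one update of the inverse: only O(n^2) work per step.
        B = B - np.outer(c, b @ B) / np.dot(b, b + c)
    return x
```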
Even if the evaluation of ∇g(x^(k)) were costless, this method would be faster than Newton's method when the number of dimensions n is large: Newton's method must solve an n × n linear system at every step, which in general takes O(n^3) time, while a step here needs only matrix-vector products and a rank-one update, which take O(n^2).