rank web pages. Even the letters you are reading, whose shapes are
specified by polynomial curves, would suffer. (Several important ex-
ceptions involve discrete, not continuous, mathematics: combinatorial
optimization, cryptography and gene sequencing.)
On one hand, we are interested in complexity: we want algorithms
that minimize the number of calculations required to compute a solu-
tion. But we are also interested in the quality of approximation: since
we do not obtain exact solutions, we must understand the accuracy
of our answers. Discrepancies arise from approximating a compli-
cated function by a polynomial, a continuum by a discrete grid of
points, or the real numbers by a finite set of floating point numbers.
Different algorithms for the same problem will differ in the quality of
their answers and the labor required to obtain those answers; we will
\[ p_n(x_j) = f(x_j) \qquad \text{for } j = 0, \ldots, n. \]
• If so, is it unique?
\[ p_n(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n \]
Imposing the interpolation conditions at each point gives the linear system
\[
\begin{aligned}
c_0 + c_1 x_0 + c_2 x_0^2 + \cdots + c_n x_0^n &= f(x_0)\\
c_0 + c_1 x_1 + c_2 x_1^2 + \cdots + c_n x_1^n &= f(x_1)\\
&\;\;\vdots\\
c_0 + c_1 x_n + c_2 x_n^2 + \cdots + c_n x_n^n &= f(x_n).
\end{aligned}
\]
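To make the system concrete, here is a minimal MATLAB sketch (not taken from the notes; the function f and the choice n = 20 are illustrative) that builds the Vandermonde matrix for uniformly spaced points and solves for the monomial coefficients. The reported condition number hints at the trouble discussed below.

f = @(x) 2*x + x.*sin(40*x);        % example function used later in this section
n = 20;
xj = (0:n)'/n;                      % x_j = j/n on [0,1]
V = xj.^(0:n);                      % Vandermonde matrix, V(j+1,k+1) = x_j^k
c = V \ f(xj);                      % coefficients c_0, ..., c_n of p_n
fprintf('condition number of V: %.2e\n', cond(V))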
[Figure 1.1: the monomial basis functions (including x^2, x^3, x^4, x^5) on [0, 1].]
form a basis for IR2 . However, both vectors point in nearly the same
direction, though of course they are linearly independent. We can write
the vector [1, 1] T as a unique linear combination of these basis vec-
tors:
" # " # " #
1 1 1
(1.2) = 10, 000, 000, 000 − 9, 999, 999, 999 .
1 10−10 0
Although the vector we are expanding and the basis vectors them-
selves are all have modest size (norm), the expansion coefficients are
enormous. Furthermore, small changes to the vector we are expand-
ing will lead to huge changes in the expansion coefficients. This is a
recipe for disaster when computing with finite-precision arithmetic.
This same phenomenon can occur when we express polynomials
in the monomial basis. As a simple example, consider interpolating
f ( x ) = 2x + x sin(40x ) at uniformly spaced points (x j = j/n, j =
0, . . . , n) in the interval [0, 1]. Note that f ∈ C ∞ [0, 1]: this f is a ‘nice’
function with infinitely many continuous derivatives. As seen in
Figures 1.2–1.3, f oscillates modestly on the interval [0, 1], but it
[Figures 1.2–1.3: f(x) = 2x + x sin(40x) and its polynomial interpolants at uniformly spaced points on [0, 1].]
f ( x j ) − pn ( x j ) = 0, j = 0, . . . , n.
[Figure 1.4: max_j |f(x_j) − p_n(x_j)| for the computed interpolating polynomials, n = 0, . . . , 40.]
\[ \max_{0 \le j \le n} | f(x_j) - p_n(x_j) |. \]
Rather than being nearly zero, this quantity grows with n, until the
computed ‘interpolating’ polynomial differs from f at some interpo-
lation point by roughly 1/10 for the larger values of n: we must have
higher standards!
This is an example where a simple problem formulation quickly
yields an algorithm, but that algorithm gives unacceptable numerical
results.
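The following MATLAB sketch reproduces an experiment in this spirit (the setup is an assumption, not the notes' own script): for each n it solves the Vandermonde system and then measures how badly the computed interpolant misses f at the interpolation points themselves.

f = @(x) 2*x + x.*sin(40*x);
err = zeros(40,1);
for n = 1:40
    xj = (0:n)'/n;                                      % uniformly spaced points on [0,1]
    c  = (xj.^(0:n)) \ f(xj);                           % solve the Vandermonde system
    err(n) = max(abs(f(xj) - polyval(flipud(c), xj)));  % error at the nodes
end
semilogy(1:40, err, 'o-'), xlabel('n'), ylabel('max_j |f(x_j) - p_n(x_j)|')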
Perhaps you are now troubled by this entirely reasonable question:
If the computations of pn are as unstable as Figure 1.4 suggests, why
should we put any faith in the plots of interpolants for n = 10 and,
especially, n = 30 in Figures 1.2–1.3?
You should trust those plots because I computed them using a
much better approach, about which we shall next learn.
The monomial basis may seem like the most natural way to write
down the interpolating polynomial, but it can lead to numerical
problems, as seen in the previous lecture. To arrive at more stable
expressions for the interpolating polynomial, we will derive several
different bases for Pn that give superior computational properties:
the expansion coefficients {c j } will typically be smaller, and it will
be simpler to determine those coefficients. This is an instance of
a general principle of applied mathematics: to promote stability,
express your problem in a well-conditioned basis.
Suppose we have some basis {b_j}_{j=0}^{n} for P_n. (Recall that {b_j}_{j=0}^{n} is a basis if the functions span P_n and are linearly independent. The first requirement means that for any polynomial p ∈ P_n we can find constants c_0, . . . , c_n such that p = c_0 b_0 + · · · + c_n b_n, while the second requirement means that if 0 = c_0 b_0 + · · · + c_n b_n then we must have c_0 = · · · = c_n = 0.) We seek the polynomial p ∈ P_n that interpolates f at x_0, . . . , x_n. Write p in the basis as
\[ p(x) = c_0 b_0(x) + c_1 b_1(x) + \cdots + c_n b_n(x). \]
We seek the coefficients c_0, . . . , c_n that express the interpolant p in this basis. The interpolation conditions are
\[
\begin{aligned}
p(x_0) &= c_0 b_0(x_0) + c_1 b_1(x_0) + \cdots + c_n b_n(x_0) = f(x_0)\\
p(x_1) &= c_0 b_0(x_1) + c_1 b_1(x_1) + \cdots + c_n b_n(x_1) = f(x_1)\\
&\;\;\vdots\\
p(x_n) &= c_0 b_0(x_n) + c_1 b_1(x_n) + \cdots + c_n b_n(x_n) = f(x_n).
\end{aligned}
\]
p0 ( x ) = c0 .
p1 ( x ) = p0 ( x ) + c1 q1 ( x )
p1 ( x0 ) = p0 ( x0 ) + c1 q1 ( x0 )
= f ( x0 ) + c1 q1 ( x0 ).
p1 ( x ) = c0 + c1 ( x − x0 ),
Solving for c_1,
\[ c_1 = \frac{f(x_1) - c_0}{x_1 - x_0}. \]
Next, find the p2 ∈ P2 that interpolates f at x0 , x1 , and x2 , where
p2 has the form
p2 ( x ) = p1 ( x ) + c2 q2 ( x ).
Similar to before, the first term, now p1 ( x ), ‘does the right thing’ at
the first two interpolation points, p1 ( x0 ) = f ( x0 ) and p1 ( x1 ) = f ( x1 ).
q2 ( x ) = ( x − x0 )( x − x1 ).
f ( x2 ) = p2 ( x2 ) = p1 ( x2 ) + c2 q2 ( x2 ),
where
\[ q_n(x) = \prod_{j=0}^{n-1} (x - x_j), \qquad c_n = \frac{f(x_n) - \sum_{j=0}^{n-1} c_j q_j(x_n)}{q_n(x_n)}. \]
[Figure 1.5: the Newton basis polynomials q_2(x), . . . , q_5(x) on [0, 1].]
\[ c_1 = \frac{f(x_1) - c_0}{x_1 - x_0}. \]
With c0 and c1 , we can solve for c2 , and so on. This procedure, for-
ward substitution, requires roughly n2 floating point operations once
the entries are formed.
With this Newton form of the interpolant, one can easily update
pn to pn+1 in order to incorporate a new data point ( xn+1 , f ( xn+1 )),
as such a change affects neither the previous values of c j nor q j . The
new data ( xn+1 , f ( xn+1 )) simply adds a new row to the bottom of
the matrix in (1.4), which preserves the triangular structure of the
matrix and the values of {c0 , . . . , cn }. If we have already found these
coefficients, we easily obtain cn+1 through one more step of forward
substitution.
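Here is a sketch of this procedure in MATLAB (illustrative choices of f and nodes; not the notes' own code): compute the Newton coefficients by forward substitution, then evaluate the interpolant by nested multiplication.

f  = @(x) sin(10*x) + cos(10*x);     % example function
xj = (0:5)'/5;                       % interpolation points
n  = length(xj) - 1;
c  = zeros(n+1,1);
c(1) = f(xj(1));                     % c_0 = f(x_0)
for k = 1:n                          % forward substitution: one new c_k per step
    q = prod(xj(k+1) - xj(1:k));     % q_k(x_k) = (x_k - x_0)...(x_k - x_{k-1})
    s = c(1);
    for j = 1:k-1
        s = s + c(j+1)*prod(xj(k+1) - xj(1:j));   % accumulate c_j q_j(x_k)
    end
    c(k+1) = (f(xj(k+1)) - s) / q;
end
x = 0.3;  p = c(n+1);                % evaluate p_n(x) in nested Newton form
for k = n:-1:1
    p = c(k) + (x - xj(k))*p;
end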
This form makes it clear that ℓ_j(x_j) = 1. With these new basis functions, the constants {c_j} can be written down immediately. The interpolating polynomial has the form
\[ p_n(x) = \sum_{k=0}^{n} c_k \ell_k(x). \]
When x = x_j, all terms in this sum will be zero except for one, the k = j term (since ℓ_k(x_j) = 0 except when j = k). Thus,
\[ p_n(x_j) = c_j \ell_j(x_j) = c_j, \]
[Figure: the Lagrange basis polynomials for n = 5 on the interval [a, b] = [0, 1] with x_j = j/5 (black dots). Note that each Lagrange polynomial has roots at n of the interpolation points. Compare these polynomials to the monomial and Newton basis polynomials in Figures 1.1 and 1.5 (but note the different vertical scale): these basis vectors look most independent of all.]
The fact that these basis functions are not as closely aligned as the
previous ones has interesting consequences on the size of the coeffi-
cients {c j }. For example, if we have n + 1 = 6 interpolation points
for f ( x ) = sin(10x ) + cos(10x ) on [0, 1], we obtain the following
coefficients:
monomial Newton Lagrange
c0 1.0000000e+00 1.0000000e+00 1.0000000e+00
c1 4.0861958e+01 -2.5342470e+00 4.9315059e-01
c2 -3.8924180e+02 -1.7459341e+01 -1.4104461e+00
c3 1.0775024e+03 1.1232385e+02 6.8075479e-01
c4 -1.1683645e+03 -2.9464687e+02 8.4385821e-01
c5 4.3685881e+02 4.3685881e+02 -1.3830926e+00
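A short MATLAB sketch (with the same illustrative f and nodes; the variable names are mine) that reproduces two of these coefficient sets: the monomial coefficients come from the Vandermonde system, while the Lagrange coefficients are simply the function values c_j = f(x_j).

f  = @(x) sin(10*x) + cos(10*x);
xj = (0:5)'/5;                       % n + 1 = 6 uniformly spaced points on [0,1]
c_monomial = (xj.^(0:5)) \ f(xj);    % coefficients in the basis 1, x, ..., x^5
c_lagrange = f(xj);                  % coefficients in the Lagrange basis
disp([c_monomial c_lagrange])        % compare the magnitudes of the two sets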
\[ \max_{x\in[a,b]} | f(x) - p_n(x) | \]
implies
From this formula follows a bound for the worst error over [ a, b]:
\[
(1.7)\qquad \max_{x\in[a,b]} | f(x) - p_n(x) | \;\le\;
\Bigl( \max_{\xi\in[a,b]} \frac{|f^{(n+1)}(\xi)|}{(n+1)!} \Bigr)
\Bigl( \max_{x\in[a,b]} \prod_{j=0}^{n} |x - x_j| \Bigr).
\]
We shall carefully prove this essential result; it will repay the ef-
fort, for this theorem becomes the foundation upon which we shall
build the convergence theory for piecewise polynomial approxima-
tion and interpolatory quadrature rules for definite integrals.
for some constant λ chosen to make the interpolant exact at x̂. For convenience, we write
\[ w(x) \equiv \prod_{j=0}^{n} (x - x_j), \qquad \lambda = \frac{f(\hat x) - p_n(\hat x)}{w(\hat x)}. \]
and we do not need all of them. Simply observe that we can write
w( x ) = x n+1 + q( x ), for some q ∈ Pn , and this polynomial q will
vanish when we take n + 1 derivatives:
\[ w^{(n+1)}(x) = \frac{d^{n+1}}{dx^{n+1}} x^{n+1} + q^{(n+1)}(x) = (n+1)! + 0. \]
\[ \lambda = \frac{f^{(n+1)}(\xi)}{(n+1)!}. \]
\[ f(x) = f(x_0) + (x - x_0) f'(x_0) + \cdots + \frac{(x - x_0)^k}{k!} f^{(k)}(x_0) + \text{remainder}. \]
Ignoring the remainder term at the end, note that the Taylor expan-
sion gives a polynomial model of f , but one based on local infor-
mation about f and its derivatives, as opposed to the polynomial
interpolant, which is based on global information, but only about f ,
not its derivatives.
An interesting feature of the interpolation bound is the polynomial
w( x ) = ∏nj=0 ( x − x j ). This quantity plays an essential role in ap-
proximation theory, and also a closely allied subdiscipline of complex
analysis called potential theory. Naturally, one might wonder what
choice of points { x j } minimizes |w( x )|: We will revisit this question
when we study approximation theory in the near future. For now, we
simply note that the points that minimize |w( x )| over [ a, b] are called
Chebyshev points, which are clustered more densely at the ends of the
interval [ a, b].
Example 1.1 ( f ( x) = sin( x)). We shall apply the interpolation
bound to f ( x ) = sin( x ) on x ∈ [−5, 5]. Since f (n+1) ( x ) = ± sin( x ) or
± cos( x ), we have maxx∈[−5,5] | f (n+1) ( x )| = 1 for all n. The interpo-
lation result we just proved then implies that for any choice of distinct
interpolation points in [−5, 5],
\[ \prod_{j=0}^{n} |x - x_j| < 10^{n+1}, \]
the worst case coming if all the interpolation points are clustered at
an end of the interval [−5, 5]. Now our theorem ensures that
\[ \max_{x\in[-5,5]} | \sin(x) - p_n(x) | \le \frac{10^{n+1}}{(n+1)!}. \]
For small values of n, this bound will be very large, but eventually
(n + 1)! grows much faster than 10n+1 , so we conclude that our error
must go to zero as n → ∞ regardless of where in [−5, 5] we place our
interpolation points! The error bound is shown in the first plot below.
Consider the following specific example: Interpolate sin( x ) at
points uniformly selected in [−1, 1]. At first glance, you might think
there is no reason that we should expect our interpolants pn to con-
verge to sin( x ) for all x ∈ [−5, 5], since we are only using data from
the subinterval [−1, 1], which is only 20% of the total interval and
does not even include one entire period of the sine function. (In fact,
sin( x ) attains neither its maximum nor minimum on [−1, 1].) Yet
the error bound we proved above ensures that the polynomial in-
terpolant must converge throughout [−5, 5]. This is illustrated in
the first plot below. The next plots show the interpolants p4 ( x ) and
p10 ( x ) generated from these interpolation points. Not surprisingly,
these interpolants are most accurate near [−1, 1], the location of the
interpolation points (shown as circles), but we still see convergence
well beyond [−1, 1], in the same way that the Taylor expansion for
sin( x ) at x = 0 will converge everywhere.
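A sketch of this experiment in MATLAB (grid sizes and evaluation points are illustrative assumptions); polyfit is used here for brevity even though, as discussed earlier, the monomial basis is not the most stable choice for large n.

xx = linspace(-5, 5, 2000);
for n = [4 10 20]
    xj = linspace(-1, 1, n+1);                   % data only from [-1,1]
    pn = polyfit(xj, sin(xj), n);                % interpolating polynomial
    fprintf('n = %2d,  max error on [-5,5] = %.2e\n', ...
            n, max(abs(sin(xx) - polyval(pn, xx))));
end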
Example 1.2 (Runge’s Example). The error bound (1.7) suggests those
functions for which interpolants might fail to converge as n → ∞:
beware if higher derivatives of f are large in magnitude over the
interpolation interval. The most famous example of such behavior
is due to Carl Runge, who studied convergence of interpolants for
f ( x ) = 1/(1 + x2 ) on the interval [−5, 5]. This function looks beau-
tiful: it resembles a bell curve, with no singularities in sight on IR, as
Figure 1.8 shows. However, the interpolants to f at uniformly spaced
points over [−5, 5] do not seem to converge even for x ∈ [−5, 5].
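A sketch of Runge's example (the parameters are illustrative; polyfit may warn about conditioning, but the divergence observed here is a property of the interpolant itself, not merely rounding error):

f  = @(x) 1./(1 + x.^2);
xx = linspace(-5, 5, 2000);
for n = [4 8 16 24]
    xj = linspace(-5, 5, n+1);                   % uniformly spaced points
    pn = polyfit(xj, f(xj), n);
    fprintf('n = %2d,  max |f - p_n| on [-5,5] = %.2e\n', ...
            n, max(abs(f(xx) - polyval(pn, xx))));
end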
[Figure 1.7: Interpolation of sin(x) at points x_0, . . . , x_n uniformly distributed on [−1, 1]. We develop an error bound from Theorem 1.3 for the interval [a, b] = [−5, 5]. The bound proves that even though the interpolation points only fall in [−1, 1], the interpolant will still converge throughout [−5, 5]. The top plot shows the error bound 10^{n+1}/(n+1)! and the error max_{x∈[−5,5]} |sin(x) − p_n(x)| for n = 0, . . . , 40; the bottom plots show the polynomials p_4 and p_10, along with the interpolation points that determine these polynomials (black circles).]
\[
\begin{aligned}
f''(x) &= \frac{8x^2}{(1+x^2)^3} - \frac{2}{(1+x^2)^2}\\
f'''(x) &= -\frac{48x^3}{(1+x^2)^4} + \frac{24x}{(1+x^2)^3}\\
f^{(iv)}(x) &= \frac{384x^4}{(1+x^2)^5} - \frac{288x^2}{(1+x^2)^4} + \frac{24}{(1+x^2)^3}
\end{aligned}
\]
[Figure 1.8: Interpolation of Runge's function 1/(x^2 + 1) at points x_0, . . . , x_n uniformly distributed on [−5, 5]. The top plot shows max_{x∈[−5,5]} |1/(x^2+1) − p_n(x)| for n = 0, . . . , 25; the bottom plots show the interpolating polynomials p_4, p_8, p_16, and p_24, along with the interpolation points that determine these polynomials (black circles). These interpolants do not converge to f as n → ∞. This is not a numerical instability, but a fatal flaw that arises when interpolating with large degree polynomials at uniformly spaced points.]
[Figure: Runge's function f(x) = 1/(1 + x^2) and its derivatives f', f^(4), f^(10), f^(20), f^(25) on [−5, 5]; the magnitude of the derivatives grows dramatically with the order of the derivative.]
The good news is that there always exists a suitable set of interpo-
lation points for any given f ∈ C [ a, b].
These results are both quite abstract; for example, the construction
of the offensive example in Faber’s Theorem is not nearly as con-
crete as Runge’s nice example for uniformly spaced points discussed
above. We will revisit the question of the convergence of interpolants
in a few weeks when we discuss Chebyshev polynomials. Then we
will be able to say something much more positive: there exists a nice
set of points that works for all but the ugliest functions in C [ a, b].
again with
\[ w(x) := \prod_{j=0}^{n} (x - x_j). \]
\[ (1.9)\qquad f(x_j) = p(x_j) + \lambda w(x_j), \qquad j = 0, \ldots, n. \]
We must analyze
\[ \lambda = \frac{f^{(n+1)}(\xi)}{(n+1)!}. \]
\[ f'(\hat x) - p'(\hat x) = \lambda w'(\hat x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\, w'(\hat x). \]
To arrive at a concrete estimate, perhaps we should say something more specific about w'(x̂). Expanding w and computing w' explicitly will take us far into the weeds; it suffices to invoke an interesting result from 1889.

Lemma 1.1 (Markov brothers' inequality for first derivatives). For any polynomial q ∈ P_n,
\[ \max_{x\in[a,b]} |q'(x)| \le \frac{2n^2}{b-a} \max_{x\in[a,b]} |q(x)|. \]

(Lemma 1.1 was proved by Andrey Markov in 1889, generalizing a result for n = 2 that was obtained by the famous chemist Mendeleev in his research on specific gravity. Markov's younger brother Vladimir extended it to higher derivatives (with a more complicated right-hand side) in 1892. The interesting history of this inequality (and extensions into the complex plane) is recounted in a paper by Ralph Boas, Jr. on 'Inequalities for the derivatives of polynomials,' Math. Magazine 42 (4) 1969, 165–174. The result is called the 'Markov brothers' inequality' to distinguish it from the more famous 'Markov's inequality' in probability theory (named, like 'Markov chains,' for Andrey; Vladimir died of tuberculosis at the age of 25 in 1897).)

We can thus summarize our discussion as the following theorem, an analogue of Theorem 1.3.
\[ f'(x_k) - p_n'(x_k) = \frac{f^{(n+1)}(\xi)}{(n+1)!}\, w'(x_k), \]
where w(x) = ∏_{j=0}^{n} (x − x_j). From this formula follows the bound
\[
(1.11)\qquad |f'(x_k) - p_n'(x_k)| \;\le\; \frac{2n^2}{b-a}
\Bigl( \max_{\xi\in[a,b]} \frac{|f^{(n+1)}(\xi)|}{(n+1)!} \Bigr)
\Bigl( \max_{x\in[a,b]} \prod_{j=0}^{n} |x - x_j| \Bigr).
\]
(Why don't we simply 'take a derivative of Theorem 1.3'? The subtlety is the f^{(n+1)}(ξ) term in Theorem 1.3. Since ξ depends on x, taking the derivative of f^{(n+1)}(ξ(x)) via the chain rule would require explicit knowledge of ξ(x). We don't want to work out a formula for ξ(x) for each f and interval [a, b].)
Contrast the bound (1.11) with (1.7) from Theorem 1.3: the bounds are the same, aside from the leading constant 2n^2/(b − a) inherited from Lemma 1.1.
For our later discussion it will help to get a rough bound for the case where the interpolation points are uniformly distributed, i.e.,
\[ x_j = a + jh, \qquad j = 0, \ldots, n \]
i.e., maximize the product of the distances of x from each of the interpolation points. Consider the sketch in the margin (for n = 5 uniformly spaced points on [0, 1], w(x) takes its maximum magnitude between the two interpolation points on each end of the domain). Think about how you would place x ∈ [x_0, x_n] so as to make ∏_{j=0}^{n} |x − x_j| as large as possible. Putting x somewhere toward the ends, but not too near one of the interpolation points, will maximize the product. Convince yourself that, regardless of where x is placed within [x_0, x_n]:
• at least one interpolation point is no more than h/2 away from x;
• a different interpolation point is no more than h away from x;
• a different interpolation point is no more than 2h away from x;
  ...
• the last remaining (farthest) interpolation point is no more than nh = b − a away from x.
This reasoning gives the bound
\[ (1.12)\qquad \max_{x\in[a,b]} \prod_{j=0}^{n} |x - x_j| \le \frac{h}{2} \cdot h \cdot 2h \cdots nh = \frac{h^{n+1}\, n!}{2}. \]
\[ (1.13)\qquad |f'(x_k) - p_n'(x_k)| \le \frac{n h^n}{n+1} \max_{\xi\in[a,b]} |f^{(n+1)}(\xi)|. \]
\[ p_1(x) = f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0}(x - x_0) = f(x_0) + \frac{f(x_1) - f(x_0)}{h}(x - x_0). \]
Take a derivative of the interpolant:
\[ (1.14)\qquad p_1'(x) = \frac{f(x_1) - f(x_0)}{h}, \]
which is precisely the conventional definition of the derivative, if
we take the limit h → 0. But how accurate an approximation is
it? Appealing to Corollary 1.1 with n = 1 and [ a, b] = [ x0 , x1 ] =
x0 + [0, h], we have
\[ (1.15)\qquad |f'(x_k) - p_1'(x_k)| \le \frac{1}{2} \max_{\xi\in[x_0,x_1]} |f''(\xi)|\, h \]
\[ (1.16)\qquad p_2(x) = f(x_0) + \frac{f(x_1) - f(x_0)}{h}(x - x_0) + \frac{f(x_0) - 2f(x_1) + f(x_2)}{2h^2}(x - x_0)(x - x_1). \]
\[ (1.18)\qquad p_2'(x_1) = \frac{f(x_2) - f(x_0)}{2h} \]
\[ (1.19)\qquad p_2'(x_2) = \frac{f(x_0) - 4f(x_1) + 3f(x_2)}{2h}. \]
These beautiful formulas are right-looking, central, and left-looking approximations to f'. Though we used an interpolating polynomial to derive these formulas, those polynomials are now nowhere in sight: they are merely the scaffolding that leads to these formulas. (These formulas can also be derived by strategically combining Taylor expansions for f(x + h) and f(x − h). That is an easier route to simple formulas like (1.18), but is less appealing when more sophisticated approximations like (1.17) and (1.19) (and beyond) are needed.) How accurate are these formulas? Corollary 1.1 with n = 2 and [a, b] = [x_0, x_2] = x_0 + [0, 2h] gives
\[ (1.20)\qquad |f'(x_k) - p_2'(x_k)| \le \frac{2}{3} \max_{\xi\in[x_0,x_2]} |f'''(\xi)|\, h^2. \]
Notice that these approximations indeed scale with h2 , rather than h,
and so the quadratic interpolant leads to a much better approxima-
tion to f 0 , at the cost of evaluating f at three points (for f 0 ( x0 ) and
f 0 ( x2 )), rather than two.
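A quick MATLAB check of these rates on an illustrative test function (the function, point, and step sizes are assumptions, not from the notes):

f = @(x) exp(sin(x));  fp = @(x) cos(x).*exp(sin(x));   % f and its exact derivative
x0 = 0.7;
for h = 10.^(-(1:6))
    fwd = (f(x0+h) - f(x0))/h;                  % from p_1': O(h) error
    ctr = (f(x0+2*h) - f(x0))/(2*h);            % (1.18) applied at x_1 = x_0 + h
    fprintf('h = %.0e:  fwd err %.2e,  ctr err %.2e\n', ...
            h, abs(fwd - fp(x0)), abs(ctr - fp(x0+h)));
end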
Example 1.4 (Second derivative). While we have only proved a
bound for the error in the first derivative, f 0 ( x ) − p0 ( x ), you can
see that similar bounds should hold when higher derivatives of p
are used to approximate corresponding derivatives of f . Here we
illustrate with the second derivative.
Since p1 is linear, p100 ( x ) = 0 for all x, and the linear interpolant
will not lead to any meaningful bound on f 00 ( x ). Thus, we focus on
the quadratic interpolant to f at the three uniformly spaced points
x0 , x1 , and x2 . Take two derivatives of the formula (1.16) for p2 ( x ) to
obtain
\[ (1.21)\qquad p_2''(x) = \frac{f(x_0) - 2f(x_1) + f(x_2)}{h^2}, \]
which is a famous approximation to the second derivative that is
often used in the finite difference discretization of differential equa-
tions. One can show that, like the approximations p20 ( xk ), this for-
mula is accurate to order h2 .
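A similar sketch for the second-derivative formula (1.21), again with an illustrative test function: the error should shrink like h^2 until rounding error takes over.

f = @(x) exp(sin(x));  fpp = @(x) (cos(x).^2 - sin(x)).*exp(sin(x));
x1 = 0.7;
for h = 10.^(-(1:5))
    d2 = (f(x1-h) - 2*f(x1) + f(x1+h))/h^2;     % (1.21) centered at x_1
    fprintf('h = %.0e:  error %.2e\n', h, abs(d2 - fpp(x1)));
end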
−u00 ( x ) = g( x ), x ∈ [0, 1]
with x j = j/N.
We seek to approximate the solution u( x ) at each of the grid
points x0 , . . . , xn . The Dirichlet boundary conditions give the end
values immediately:
u( x0 ) = 0, u( xn ) = 0.
do not know the values of u( x j−1 ), u( x j ), and u( x j+1 ): finding those val-
ues is the point of our entire endeavor. Thus we define approximate
values
u j ≈ u ( x j ), j = 1, . . . , n − 1.
and will instead use the polynomial p2,j that interpolates u j−1 , u j ,
and u j+1 , giving
\[ (1.23)\qquad p_{2,j}''(x) = \frac{u_{j-1} - 2u_j + u_{j+1}}{h^2}. \]
\[
\begin{aligned}
u_0 &= 0\\
u_0 - 2u_1 + u_2 &= -h^2 g(x_1)\\
u_1 - 2u_2 + u_3 &= -h^2 g(x_2)\\
&\;\;\vdots\\
u_n &= 0.
\end{aligned}
\]
where the blank entries are zero. Notice that the first and last entries
are trivial: u0 = un = 0, and so we can trim them off to yield the
slightly simpler matrix
\[
(1.25)\qquad
\begin{bmatrix}
-2 & 1 & & & \\
1 & -2 & 1 & & \\
& \ddots & \ddots & \ddots & \\
& & 1 & -2 & 1\\
& & & 1 & -2
\end{bmatrix}
\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_{n-2} \\ u_{n-1} \end{bmatrix}
=
\begin{bmatrix} -h^2 g(x_1) \\ -h^2 g(x_2) \\ \vdots \\ -h^2 g(x_{n-2}) \\ -h^2 g(x_{n-1}) \end{bmatrix}
\]
Solve this (n − 1) × (n − 1) linear system of equations using Gaussian elimination. (Ideally, use an efficient version of Gaussian elimination that exploits the banded structure of this matrix to give a solution in O(n) operations.) One can show that the solution to the differential equation inherits the accuracy of the interpolant: the error |u(x_j) − u_j| behaves like O(h^2) as h → 0.
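A sketch of this solver in MATLAB. The right-hand side g(x) = cos(πx/2) is borrowed from the next example but used here with Dirichlet conditions u(0) = u(1) = 0, for which the exact solution written in the code can be verified by differentiation; all other choices are illustrative.

g = @(x) cos(pi*x/2);
u = @(x) (4/pi^2)*(cos(pi*x/2) + x - 1);    % exact solution with u(0) = u(1) = 0
n = 64;  h = 1/n;  xj = (0:n)'*h;
A = toeplitz([-2 1 zeros(1,n-3)]);          % (n-1) x (n-1) tridiagonal matrix from (1.25)
rhs = -h^2*g(xj(2:n));
uj = [0; A\rhs; 0];                         % append the known boundary values
fprintf('max error = %.2e (expect O(h^2))\n', max(abs(u(xj) - uj)))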
\[ -u''(x) = g(x), \qquad x \in [0,1] \]
\[ u'(0) = u(1) = 0. \]
\[ (1.26)\qquad \frac{u_1 - u_0}{h} = 0. \]
\[ (1.28)\qquad u'(0) \approx \frac{-3u_0 + 4u_1 - u_2}{2h}, \]
\[ \frac{-3u_0 + 4u_1 - u_2}{2h} = 0, \]
\[ g(x) = \cos(\pi x/2), \]
Figure 1.10 compares the solutions obtained by solving (1.27) and (1.29) with n = 4. (Indeed, we used this small value of n because it is difficult to see the difference between the exact solution and the approximation from (1.29) for larger n.) Clearly, the simple adjustment that gave the O(h^2) approximation to u'(0) = 0 makes quite a difference! This figure shows that the solutions from (1.27) and (1.29) differ, but plots like this are not the best way to understand how the approximations compare as n → ∞. Instead, compute the maximum error at the interpolation points,
\[ \max_{0 \le j \le n} | u(x_j) - u_j | \]
[Figure 1.10: the exact solution u and the approximations obtained from (1.27) and (1.29) for n = 4.]
[Figure 1.11: Convergence of approximate solutions to −u''(x) = cos(πx/2) with u'(0) = u(1) = 0, measured by max_{0≤j≤n} |u(x_j) − u_j|. The red line shows the approximation from (1.27) (boundary condition u_1 − u_0 = 0); it converges like O(h) as h → 0. The blue line shows the approximation from (1.29) (boundary condition −3u_0 + 4u_1 − u_2 = 0), which converges like O(h^2).]
for various values of n. Figure 1.11 shows the results of such experiments for n = 2^2, 2^3, . . . , 2^12. Notice that this figure is a 'log-log' plot; on such a scale, the errors fall on straight lines, and from the slope of these lines one can determine the order of convergence. The slope of the red curve is −1, so the approximations from (1.27) are O(n^{−1}) = O(h) accurate. The slope of the blue curve is −2, so (1.29) gives an O(n^{−2}) = O(h^2) accurate approximation. (How large would n need to be to get the same accuracy from the O(h) approximation that was produced by the O(h^2) approximation with n = 2^12 = 4096? Extrapolation of the red curve suggests we would need roughly n = 10^8.)
This example illustrates a general lesson: when constructing finite difference approximations to differential equations, one must ensure that the approximations to the boundary conditions have the same order of accuracy as the approximation of the differential equation itself. These formulas can be nicely constructed from derivatives of polynomial interpolants of appropriate degree.
\[ A_k(x) := \bigl(1 - 2(x - x_k)\,\ell_k'(x_k)\bigr)\,\ell_k^2(x), \qquad B_k(x) := (x - x_k)\,\ell_k^2(x). \]
[Figure: the Hermite basis polynomials for n = 5 on the interval [a, b] = [0, 1] with x_j = j/5 (black dots). The panels show A_0, . . . , A_5 (A_j(x_j) = 1, black circles), B_0, . . . , B_5 (B_j(x_k) = 0 for all j, k), A_0', . . . , A_5' (A_j'(x_k) = 0 for all j, k), and B_0', . . . , B_5' (B_j'(x_j) = 1, black circles).]
Here are a couple of basic results whose proofs follow the same
techniques as the analogous proofs for the standard interpolation
problem.
\[ f(x) - h_n(x) = \frac{f^{(2n+2)}(\xi)}{(2n+2)!} \prod_{j=0}^{n} (x - x_j)^2. \]
[Figure 1.13: Interpolation of f(x) = sin(20x) + e^{5x/2} at uniformly spaced points for x ∈ [0, 1]. Top plot: the standard polynomial interpolant p_5 ∈ P_5. Middle plot: the Hermite interpolant h_5 ∈ P_11. Bottom plot: the standard interpolant p_11 ∈ P_11. Though the last two plots show polynomials of the same degree, notice how the interpolants differ. (At first glance it appears the Hermite interpolation condition fails at the rightmost point in the middle plot; zoom in to see that the slope of the interpolant indeed matches f'(1).)]
\[ h_n(x_j) = f(x_j), \qquad h_n'(x_j) = 0 \quad \text{for } j = 0, \ldots, n. \]
Theorem 1.9. For each n ≥ 1, let h_n be the Hermite–Fejér interpolant of f ∈ C[a, b] at the Chebyshev interpolation points
\[ x_j = \frac{a+b}{2} + \frac{b-a}{2}\cos\Bigl(\frac{(2j+1)\pi}{2n+2}\Bigr), \qquad j = 0, \ldots, n. \]
Then h_n(x) converges uniformly to f on [a, b]. (For a proof of Theorem 1.9, see page 57 of I. P. Natanson, Constructive Function Theory, vol. 3 (Ungar, 1965).)
Thus far all our interpolation schemes have been based on polynomials. However, if the function f is periodic, one might naturally prefer to interpolate f with some set of periodic functions. ('2π-periodic' means that f is continuous throughout IR and f(x) = f(x + 2π) for all x ∈ IR. The choice of period 2π makes the notation a bit simpler, but the idea can be easily adapted for any period.) To be concrete, suppose we have a continuous 2π-periodic function f that we wish to interpolate at the uniformly spaced points x_k = 2πk/n for k = 0, . . . , n − 1 with n = 5. We shall build an interpolant as a linear combination of the 2π-periodic functions
such that
t5 ( x j ) = f ( x j ), j = 0, . . . , 4.
To compute the unknown coefficients c0 , . . . , c4 , set up a linear system
as usual,
\[
\begin{bmatrix}
b_0(x_0) & b_1(x_0) & b_2(x_0) & b_3(x_0) & b_4(x_0)\\
b_0(x_1) & b_1(x_1) & b_2(x_1) & b_3(x_1) & b_4(x_1)\\
b_0(x_2) & b_1(x_2) & b_2(x_2) & b_3(x_2) & b_4(x_2)\\
b_0(x_3) & b_1(x_3) & b_2(x_3) & b_3(x_3) & b_4(x_3)\\
b_0(x_4) & b_1(x_4) & b_2(x_4) & b_3(x_4) & b_4(x_4)
\end{bmatrix}
\begin{bmatrix} c_0\\ c_1\\ c_2\\ c_3\\ c_4 \end{bmatrix}
=
\begin{bmatrix} f(x_0)\\ f(x_1)\\ f(x_2)\\ f(x_3)\\ f(x_4) \end{bmatrix},
\]
span{1, sin( x ), cos( x ), sin(2x ), cos(2x )} = span{e−2ix , e−ix , e0ix , eix , e2ix }.
where ω = e^{2πi/n} is an nth root of unity. (This name comes from the fact that ω^n = 1.) In the n = 5 case, the linear system can thus be written as
\[
(1.31)\qquad
\begin{bmatrix}
\omega^{0} & \omega^{0} & \omega^{0} & \omega^{0} & \omega^{0}\\
\omega^{-2} & \omega^{-1} & \omega^{0} & \omega^{1} & \omega^{2}\\
\omega^{-4} & \omega^{-2} & \omega^{0} & \omega^{2} & \omega^{4}\\
\omega^{-6} & \omega^{-3} & \omega^{0} & \omega^{3} & \omega^{6}\\
\omega^{-8} & \omega^{-4} & \omega^{0} & \omega^{4} & \omega^{8}
\end{bmatrix}
\begin{bmatrix} \gamma_{-2}\\ \gamma_{-1}\\ \gamma_{0}\\ \gamma_{1}\\ \gamma_{2} \end{bmatrix}
=
\begin{bmatrix} f(x_0)\\ f(x_1)\\ f(x_2)\\ f(x_3)\\ f(x_4) \end{bmatrix}.
\]
\[ \begin{bmatrix} \omega^{0}\\ \omega^{1}\\ \omega^{2}\\ \omega^{3}\\ \omega^{4} \end{bmatrix}. \]
In other words, the matrix F has Vandermonde structure. From our past experience with polynomial fitting addressed in Section 1.2.1, we might fear that this formulation is ill-suited to numerical computations, i.e., solutions γ to the system Fγ = f could be polluted by large numerical errors. (In the language of numerical linear algebra, we might fear that the matrix F is ill-conditioned, i.e., the condition number ‖F‖‖F^{−1}‖ is large.)
\[ (F^*F)_{k,k} = \omega^0 + \omega^0 + \omega^0 + \omega^0 + \omega^0 = n. \]
On the off-diagonal, use ω^n = 1 to see that all the off-diagonal entries simplify to
\[ (F^*F)_{\ell,k} = \omega^0 + \omega^1 + \omega^2 + \omega^3 + \omega^4, \qquad \ell \ne k. \]
\[ F^*F = nI, \qquad F^{-1} = \frac{1}{n} F^*. \]
The system Fγ = f can be immediately solved without the need for any factorization of F:
\[ \gamma = \frac{1}{n} F^* \mathbf{f}. \]
The ready formula for F^{−1} is reminiscent of a unitary matrix. (Q ∈ C^{n×n} is unitary if and only if Q^{−1} = Q^*, or, equivalently, Q^*Q = I.) In fact, the matrices
\[ \frac{1}{\sqrt{n}} F \quad\text{and}\quad \frac{1}{\sqrt{n}} F^* \]
are indeed unitary, and hence ‖n^{−1/2}F‖_2 = ‖n^{−1/2}F^*‖_2 = 1. (The matrix 2-norm is defined as ‖F‖_2 = max_{x≠0} ‖Fx‖_2/‖x‖_2, where the vector norm on the right-hand side is the Euclidean norm ‖y‖_2 = (∑_k |y_k|^2)^{1/2} = (y^*y)^{1/2}. The 2-norm of a unitary matrix is one: if Q^*Q = I, then ‖Qx‖_2^2 = x^*Q^*Qx = x^*x = ‖x‖_2^2, so ‖Q‖_2 = 1.) From this we can compute the condition number of F:
\[ \|F\|_2 \|F^{-1}\|_2 = \frac{1}{n}\|F\|_2\|F^*\|_2 = \|n^{-1/2}F\|_2\,\|n^{-1/2}F^*\|_2 = 1. \]
This special Vandermonde matrix is perfectly conditioned! One can easily solve the system Fγ = f to high precision. The key distinction between this case and standard polynomial interpolation is that now we have a Vandermonde matrix based on points e^{i x_k} that are equally spaced about the unit circle in the complex plane, whereas before our points were distributed over an interval on the real line. This distinction makes all the difference between an unstable matrix equation and one that is not only perfectly stable, but also forms the cornerstone of modern signal processing.
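A small MATLAB check of these claims for n = 5 (the construction of F follows (1.31); the code itself is mine, not the notes'):

n = 5;  w = exp(2i*pi/n);
[J, K] = ndgrid(0:n-1, -(n-1)/2:(n-1)/2);
F = w.^(J.*K);                              % row j holds w^(-2j), ..., w^(2j)
disp(norm(F'*F - n*eye(n)))                 % should be near machine precision
disp(cond(F))                               % should be 1 (up to rounding)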
In fact, we have just computed the 'Discrete Fourier Transform' (DFT) of the data vector
\[ \mathbf{f} = \begin{bmatrix} f(x_0)\\ f(x_1)\\ \vdots\\ f(x_{n-1}) \end{bmatrix}. \]
The coefficients γ_{−(n−1)/2}, . . . , γ_{(n−1)/2} that make up the vector
\[ \gamma = \frac{1}{n} F^* \mathbf{f} \]
are the discrete Fourier coefficients of the data in f. From where does this name derive?
Now use the fact that f(x_0)e^{−iℓx_0} = f(x_n)e^{−iℓx_n} to view the last sum as a composite trapezoid rule approximation of an integral (the composite trapezoid rule will be discussed in Chapter 3):
\[
2\pi\gamma_\ell = \frac{2\pi}{n}\Bigl( \tfrac{1}{2} f(x_0)e^{-i\ell x_0} + \sum_{k=1}^{n-1} f(x_k)e^{-i\ell x_k} + \tfrac{1}{2} f(x_n)e^{-i\ell x_n} \Bigr)
\approx \int_0^{2\pi} f(x)e^{-i\ell x}\,dx = 2\pi c_\ell.
\]
\[ t_n(x) = \sum_{k=-(n-1)/2}^{(n-1)/2} \gamma_k e^{ikx} \]
\[
(1.32)\qquad
\begin{bmatrix}
\omega^{0} & \omega^{0} & \omega^{0} & \omega^{0} & \omega^{0}\\
\omega^{0} & \omega^{1} & \omega^{2} & \omega^{-2} & \omega^{-1}\\
\omega^{0} & \omega^{2} & \omega^{4} & \omega^{-4} & \omega^{-2}\\
\omega^{0} & \omega^{3} & \omega^{6} & \omega^{-6} & \omega^{-3}\\
\omega^{0} & \omega^{4} & \omega^{8} & \omega^{-8} & \omega^{-4}
\end{bmatrix}
\begin{bmatrix} \gamma_{0}\\ \gamma_{1}\\ \gamma_{2}\\ \gamma_{-2}\\ \gamma_{-1} \end{bmatrix}
=
\begin{bmatrix} f(x_0)\\ f(x_1)\\ f(x_2)\\ f(x_3)\\ f(x_4) \end{bmatrix},
\]
[Figure 1.14: Trigonometric interpolants t_5, t_7, t_9, and t_11 to the 2π-periodic function f(x) = e^{cos(x)+sin(2x)}, using n = 5, 7, 9 and 11 points x_k = 2πk/n uniformly spaced over [0, 2π). Since both f and the interpolant are periodic, the function fits well throughout IR, not just on the interval for which the interpolant was designed.]
[Figure 1.15: Polynomial fit of degree n = 7 through uniformly spaced grid points x_0, . . . , x_n for x_j = 2πj/n, for the same function f(x) = e^{cos(x)+sin(2x)} used in Figure 1.14. In contrast to the trigonometric fits in the earlier figure, the polynomial grows very rapidly outside the interval [0, 2π]. Moral: if your function is periodic, fit it with a trigonometric polynomial.]
[Figure 1.16: Trigonometric polynomial fit of degree n = 11 through uniformly spaced grid points x_0, . . . , x_n for x_j = 2πj/n, for the non-periodic function f(x) = x (top) and for f(x) = (x − π)^2 (bottom). By restricting the latter function to the domain [0, 2π], one can view it as a continuous periodic function with a jump discontinuity in the first derivative. The interpolant t_11 seems to give a good approximation to f, but the discontinuity in the derivative slows the convergence of t_n to f as n → ∞.]
\[
5 * \texttt{ifft(eye(5))} =
\begin{bmatrix}
\omega^{0} & \omega^{0} & \omega^{0} & \omega^{0} & \omega^{0}\\
\omega^{0} & \omega^{1} & \omega^{2} & \omega^{-2} & \omega^{-1}\\
\omega^{0} & \omega^{2} & \omega^{4} & \omega^{-4} & \omega^{-2}\\
\omega^{0} & \omega^{3} & \omega^{6} & \omega^{-6} & \omega^{-3}\\
\omega^{0} & \omega^{4} & \omega^{8} & \omega^{-8} & \omega^{-4}
\end{bmatrix}.
\]
[Figure: convergence of the trigonometric interpolant t_n, measured by max_{x∈[0,2π]} |f(x) − t_n(x)|, for n up to 50. For the smooth periodic function f(x) = e^{cos(x)+sin(2x)} the error decays very rapidly. By contrast, f(x) = (x − π)^2, restricted to [0, 2π], can be viewed as a continuous but not continuously differentiable function: though the approximation in Figure 1.16 looks good over [0, 2π], the convergence of t_n to f is slow as n → ∞.]
ω0 ω1 ω2 ω3 ω4
γ = fft(f)/n.
valued) 2π-periodic f . Take special note of the simple one line com-
mand to find the coefficient vector γ.
% Setup assumed for completeness; the notes define f, n, xk, gamma, xx earlier.
f = @(x) exp(cos(x) + sin(2*x));            % 2*pi-periodic test function
n = 11;                                     % odd number of interpolation points
xk = 2*pi*(0:n-1)'/n;                       % uniformly spaced points on [0, 2*pi)
gamma = fft(f(xk))/n;                       % discrete Fourier coefficients
xx = linspace(-pi, 3*pi, 1000);             % fine grid for plotting
tn = zeros(size(xx));
for k=1:(n+1)/2
  tn = tn + gamma(k)*exp(1i*(k-1)*xx);      % gamma_0, gamma_1, ..., gamma_{(n-1)/2} terms
end
for k=(n+1)/2+1:n
  tn = tn + gamma(k)*exp(1i*(-n+k-1)*xx);   % gamma_{-(n-1)/2}, ..., gamma_{-1} terms
end
plot(xx,f(xx),'b-'), hold on                % plot f
plot(xx,real(tn),'r-')                      % plot t_n (imaginary part is rounding error)
In the case that f is real-valued (as with all the examples shown in this section), one can further show that
\[ \gamma_{-k} = \overline{\gamma_k}, \]
indicating that the imaginary terms will not make any contribution to t_n. Since for k = 1, . . . , (n − 1)/2,
\[ \gamma_{-k} e^{-ikx} + \gamma_k e^{ikx} = 2\bigl( \operatorname{Re}(\gamma_k)\cos(kx) - \operatorname{Im}(\gamma_k)\sin(kx) \bigr), \]
\[ s_j(x_{j-1}) = f_{j-1}, \qquad \text{and} \qquad s_j(x_j) = f_j \]
for each j = 1, . . . , n. (Note that all the s_j's are linear polynomials. Unlike our earlier notation, the subscript j does not reflect the polynomial degree.) It is simple to write down a formula for these polynomials,
\[ s_j(x) = f_j - \frac{x_j - x}{x_j - x_{j-1}}\,( f_j - f_{j-1}). \]
Each s_j is valid on x ∈ [x_{j−1}, x_j], and the interpolant S(x) is defined as S(x) = s_j(x) for x ∈ [x_{j−1}, x_j].
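A sketch of piecewise linear interpolation in MATLAB (the test function and number of knots are illustrative); the built-in interp1 implements exactly this construction.

f  = @(x) sin(20*x) + exp(5*x/2);
xj = linspace(0, 1, 11);                    % knots x_0, ..., x_n
xx = linspace(0, 1, 2001);
S  = interp1(xj, f(xj), xx, 'linear');      % evaluate the linear spline S(x)
fprintf('max |f - S| = %.2e\n', max(abs(f(xx) - S)))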
To analyze the error, we can apply the interpolation bound devel-
oped in the last lecture. If we let ∆ denote the largest space between
interpolation points,
\[ \Delta := \max_{j=1,\ldots,n} |x_j - x_{j-1}|, \]
[Figure: a piecewise linear interpolant S to f (top), and the derivatives f' and S' (bottom). Notice that the interpolant is continuous, but its derivative has jump discontinuities.]
\[
\begin{aligned}
s_j(x_{j-1}) &= f(x_{j-1}), & j = 1, \ldots, n;\\
s_j(x_j) &= f(x_j), & j = 1, \ldots, n;\\
s_j'(x_{j-1}) &= f'(x_{j-1}), & j = 1, \ldots, n;\\
s_j'(x_j) &= f'(x_j), & j = 1, \ldots, n.
\end{aligned}
\]
[Figure: a piecewise cubic Hermite interpolant S to f (top), with f' and S' (middle) and f'' and S'' (bottom). Now both the interpolant and its derivative are continuous, and the derivative interpolates f'. However, the second derivative of the interpolant has jump discontinuities.]
1.11 Splines
n + n + (n − 1) + (n − 1) = 4n − 2 constraints
[Figure 1.20: the not-a-knot cubic spline S (top), with its first (middle) and second (bottom) derivative, compared with f, f', and f''. Note that S, S', and S'' are all continuous. Look closely at the plot of S'': clearly this function will have jump discontinuities at the interior nodes x_2 and x_3, but the not-a-knot condition forces S''' to be continuous at the knots x_1 and x_4 = x_{n−1}.]
[Figure 1.21: the complete cubic spline S (top), with its first (middle) and second (bottom) derivative, compared with f, f', and f''. Note that S, S', and S'' are all continuous. For a complete cubic spline, one specifies the value of S'(x_0) and S'(x_n); in this case we have set S'(x_0) = S'(x_n) = 0, as you can confirm in the middle plot. In the bottom plot, see that S'''(x) will have jump discontinuities at all the interior knots x_1, . . . , x_{n−1}, in contrast to the not-a-knot spline shown in Figure 1.20.]
Natural cubic splines are a popular choice for they can be shown,
in a precise sense, to minimize curvature over all the other possible
splines. They also model the physical origin of splines, where beams
of wood extend straight (i.e., zero second derivative) beyond the first
and final ‘ducks.’
Continuing with the example from the last section, Figure 1.20
shows a not-a-knot spline, where S000 is continuous at x1 and xn−1 .
The cubic polynomials s1 for [ x0 , x1 ] and s2 for [ x1 , x2 ] must satisfy
\[ s_1(x_1) = s_2(x_1), \quad s_1'(x_1) = s_2'(x_1), \quad s_1''(x_1) = s_2''(x_1), \quad s_1'''(x_1) = s_2'''(x_1). \]
Two cubics that match these four conditions must be the same:
s1 ( x ) = s2 ( x ), and hence x1 is ‘not a knot.’ (The same goes for xn−1 .)
Notice this behavior in Figure 1.20. In contrast, Figure 1.21 shows the
complete cubic spline, where S0 ( x0 ) = S0 ( xn ) = 0.
However we assign the two additional conditions, we get a system of 4n equations (the various constraints) in 4n unknowns (the cubic polynomial coefficients). These equations can be set up as a system involving a banded coefficient matrix (zero everywhere except for a limited number of diagonals on either side of the main diagonal). (One can arrange Gaussian elimination to solve an n × n tridiagonal system in O(n) operations.) We could derive this linear system by directly enforcing the continuity conditions on the cubic polynomials that we have just described. (Try constructing this matrix!) Instead, we will develop a more general approach that expresses the spline function S(x) as the linear combination of special basis functions, which themselves are splines.
The following plot shows the basis function B0,0 for the knots x j = j.
Note, in particular, that Bj,0 ( x j+1 ) = 0. The line drawn beneath
the spline marks the support of the spline, that is, the values of x for
which B0,0 ( x ) 6= 0.
[Plots of the B-splines B_{0,0}(x), B_{0,1}(x), B_{0,2}(x), and B_{0,3}(x) for the knots x_j = j, each drawn with a line beneath marking its support.]
From these plots and the recurrence defining Bj,k , one can deduce
several important properties:
• Sk ( x j ) = f j for j = 0, . . . , n;
• Sk ∈ C k−1 [ x0 , xn ] for k ≥ 1.
What limits should j have in this sum? For the greatest flexibility, let
j range over all values for which
\[ (1.35)\qquad S_k(x) = \sum_{j=-k}^{n-1} c_{j,k} B_{j,k}(x), \qquad k \ge 1. \]
[Plots of the full sets of B-splines on the knots x_j = j, including the cubic B-splines B_{−3,3}, B_{−2,3}, B_{−1,3}, B_{0,3}, B_{1,3}, B_{2,3}, B_{3,3}.]
\[ f_\ell = S_k(x_\ell) = \sum_{j=-k}^{n-1} c_{j,k} B_{j,k}(x_\ell), \qquad \ell = 0, \ldots, n. \]
\[ S_k(x) = \sum_{j=-k}^{n-1} c_{j,k} B_{j,k}(x), \]
\[ \sum_{j=-k}^{n-1} c_{j,k} B_{j,k}(x_\ell) = f_\ell, \qquad \ell = 0, \ldots, n. \]
Let us consider the matrix in this equation. The matrix will have n + 1 rows and n + k columns, so when k > 1 the system of equations will be underdetermined. (One could obtain an (n + 1) × (n + 1) matrix by arbitrarily setting k − 1 of the values of c_{j,k} to zero, but this would miss a great opportunity: we can constructively include all n + k B-splines and impose k − 1 extra properties on S_k to pick out a unique spline interpolant from the infinitely many options that satisfy the interpolation conditions.) Since B-splines have 'small support' (i.e., B_{j,k}(x) = 0 for most x ∈ [x_0, x_n]), this matrix will be sparse: most entries will be zero.
The following subsections will describe the particular form the system (1.36) takes for k = 1, 2, 3. In each case we will illustrate the resulting spline interpolant through the following data set.
j 0 1 2 3 4
(1.37) xj 0 1 2 3 4
fj 1 3 2 −1 1
[Figure 1.23: the linear spline S_1(x) interpolating the 5 data points {(x_j, f_j)}_{j=0}^{4} in (1.37).]
\[ S_1(x) = \sum_{j=-1}^{n-1} f_{j+1} B_{j,1}(x). \]
(The above discussion is a pedantic way to arrive at an obvious solution: since the jth 'hat function' B-spline equals one at x_{j+1} and zero at all other knots, just write down the unique formula for the interpolant immediately.)
Figure 1.23 shows the unique piecewise linear spline interpolant to the data in (1.37), which is a linear combination of the five linear B-splines.
\[ S_2(x) = \sum_{j=-2}^{n-1} c_{j,2} B_{j,2}(x), \]
\[
(1.38)\qquad
\begin{bmatrix}
B_{-2,2}(x_0) & B_{-1,2}(x_0) & \cdots & B_{n-1,2}(x_0)\\
B_{-2,2}(x_1) & B_{-1,2}(x_1) & \cdots & B_{n-1,2}(x_1)\\
\vdots & \vdots & \ddots & \vdots\\
B_{-2,2}(x_n) & B_{-1,2}(x_n) & \cdots & B_{n-1,2}(x_n)
\end{bmatrix}
\begin{bmatrix} c_{-2,2}\\ c_{-1,2}\\ \vdots\\ c_{n-1,2} \end{bmatrix}
=
\begin{bmatrix} f_0\\ f_1\\ \vdots\\ f_n \end{bmatrix}.
\]
\[ B_{j,2}(x_\ell) = 0, \qquad \ell \notin \{j+1,\ j+2\}, \]
so the matrix is zero in all entries except the main diagonal (B_{j,2}(x_{j+2})) and the first superdiagonal (B_{j,2}(x_{j+1})). To evaluate these nonzero entries, recall that the recursion (1.33) for B-splines gives
\[ B_{j,2}(x) = \frac{x - x_j}{x_{j+2} - x_j}\,B_{j,1}(x) + \frac{x_{j+3} - x}{x_{j+3} - x_{j+1}}\,B_{j+1,1}(x). \]
Evaluate this function at x_{j+1} and x_{j+2}, using our knowledge of the values of the linear B-splines at the knots:
\[
B_{j,2}(x_{j+1}) = \frac{x_{j+1}-x_j}{x_{j+2}-x_j}\cdot 1 + \frac{x_{j+3}-x_{j+1}}{x_{j+3}-x_{j+1}}\cdot 0 = \frac{x_{j+1}-x_j}{x_{j+2}-x_j};
\]
\[
B_{j,2}(x_{j+2}) = \frac{x_{j+2}-x_j}{x_{j+2}-x_j}\,B_{j,1}(x_{j+2}) + \frac{x_{j+3}-x_{j+2}}{x_{j+3}-x_{j+1}}\,B_{j+1,1}(x_{j+2})
= \frac{x_{j+2}-x_j}{x_{j+2}-x_j}\cdot 0 + \frac{x_{j+3}-x_{j+2}}{x_{j+3}-x_{j+1}}\cdot 1 = \frac{x_{j+3}-x_{j+2}}{x_{j+3}-x_{j+1}}.
\]
For the uniformly spaced knots used here these values are
\[ B_{j,2}(x_{j+1}) = B_{j,2}(x_{j+2}) = \frac{1}{2}, \]
hence the system (1.38) becomes
\[
\begin{bmatrix}
1/2 & 1/2 & & & \\
& 1/2 & 1/2 & & \\
& & \ddots & \ddots & \\
& & & 1/2 & 1/2
\end{bmatrix}
\begin{bmatrix} c_{-2,2}\\ c_{-1,2}\\ c_{0,2}\\ \vdots\\ c_{n-1,2} \end{bmatrix}
=
\begin{bmatrix} f_0\\ f_1\\ \vdots\\ f_n \end{bmatrix},
\]
Try to evaluate this expression at x_j, x_{j+1}, or x_{j+2}: you must face the fact that the linear B-splines B_{j,1} and B_{j+1,1} are not differentiable at the knots! You must instead check that the one-sided derivatives match, e.g.,
\[ \lim_{\substack{h\to 0\\ h<0}} \frac{B_{j,2}(x_{j+1}+h) - B_{j,2}(x_{j+1})}{h} = \lim_{\substack{h\to 0\\ h>0}} \frac{B_{j,2}(x_{j+1}+h) - B_{j,2}(x_{j+1})}{h}. \]
[Figure 1.24: Two choices for the quadratic spline S_2 that interpolates the 5 data points {(x_j, f_j)}_{j=0}^{4} in (1.37). The blue spline satisfies the extra condition that S_2'(x_0) = 0, while the red spline satisfies S_2'(x_n) = 0. Check to see that these conditions are consistent with the splines in the plot.]
\[
(1.41)\qquad
\begin{bmatrix}
B_{-3,3}(x_0) & B_{-2,3}(x_0) & \cdots & B_{n-1,3}(x_0)\\
B_{-3,3}(x_1) & B_{-2,3}(x_1) & \cdots & B_{n-1,3}(x_1)\\
\vdots & \vdots & \ddots & \vdots\\
B_{-3,3}(x_n) & B_{-2,3}(x_n) & \cdots & B_{n-1,3}(x_n)
\end{bmatrix}
\begin{bmatrix} c_{-3,3}\\ c_{-2,3}\\ \vdots\\ c_{n-1,3} \end{bmatrix}
=
\begin{bmatrix} f_0\\ f_1\\ \vdots\\ f_n \end{bmatrix}.
\]
\[ B_{j,3}(x_\ell) = 0, \qquad \ell \notin \{j+1,\ j+2,\ j+3\}, \]
which implies that only three diagonals of the matrix in (1.41) will be nonzero. We shall only work out the nonzero entries in the case of uniformly spaced knots, x_j = x_0 + jh for fixed h > 0. In this case,
\[
\begin{aligned}
B_{j,3}(x_{j+1}) &= \frac{x_{j+1}-x_j}{x_{j+3}-x_j}\,B_{j,2}(x_{j+1}) + \frac{x_{j+4}-x_{j+1}}{x_{j+4}-x_{j+1}}\,B_{j+1,2}(x_{j+1}) = \frac{h}{3h}\cdot\frac{1}{2} + \frac{3h}{3h}\cdot 0 = \frac{1}{6},\\
B_{j,3}(x_{j+2}) &= \frac{x_{j+2}-x_j}{x_{j+3}-x_j}\,B_{j,2}(x_{j+2}) + \frac{x_{j+4}-x_{j+2}}{x_{j+4}-x_{j+1}}\,B_{j+1,2}(x_{j+2}) = \frac{2h}{3h}\cdot\frac{1}{2} + \frac{2h}{3h}\cdot\frac{1}{2} = \frac{2}{3},\\
B_{j,3}(x_{j+3}) &= \frac{x_{j+3}-x_j}{x_{j+3}-x_j}\,B_{j,2}(x_{j+3}) + \frac{x_{j+4}-x_{j+3}}{x_{j+4}-x_{j+1}}\,B_{j+1,2}(x_{j+3}) = \frac{3h}{3h}\cdot 0 + \frac{h}{3h}\cdot\frac{1}{2} = \frac{1}{6},
\end{aligned}
\]
where we have used the fact that B_{j,2}(x_{j+1}) = B_{j,2}(x_{j+2}) = 1/2 and B_{j,2}(x_\ell) = 0 at the other knots.
( xn+2 − xn−1 )cn−3,3 − ( xn+2 + xn+1 − xn−1 − xn−2 )cn−2,3 + ( xn+1 − xn−2 )cn−1,3 = 0,
[Figure 1.25: Cubic spline S_3 interpolant to 5 data points {(x_j, f_j)}_{j=0}^{4}, imposing the two extra natural spline conditions S_3''(x_0) = S_3''(x_n) = 0 to give a unique spline.]
To give a flavor for such results, we present one example. (For a similar result involving complete cubic splines, see Theorem 2.3.1 of Gautschi's Numerical Analysis (2nd ed., Birkhäuser, 2012). The proof here is an easy adaptation of Gautschi's.)

Theorem 1.10 (Natural cubic splines minimize energy). Suppose S_3 is the natural cubic spline interpolant to {(x_j, f_j)}_{j=0}^{n}, and g is any C^2[x_0, x_n] function that also interpolates the same data. Then
\[ \int_{x_0}^{x_n} S_3''(x)^2\,dx \le \int_{x_0}^{x_n} g''(x)^2\,dx. \]
Proof. The proof will actually quantify how much larger g'' is than S_3'' by showing that
\[ (1.44)\qquad \int_{x_0}^{x_n} g''(x)^2\,dx = \int_{x_0}^{x_n} S_3''(x)^2\,dx + \int_{x_0}^{x_n} \bigl(g''(x) - S_3''(x)\bigr)^2\,dx. \]
To prove this claim, break the integral on the left into segments [x_j, x_{j+1}] between the knots. (This decomposition of [x_0, x_n] will allow us to exploit the fact that S_3 is a standard cubic polynomial, and hence infinitely differentiable, on these subintervals.) Write
\[ \int_{x_0}^{x_n} \bigl(g''(x) - S_3''(x)\bigr)\,S_3''(x)\,dx = \sum_{j=1}^{n} \int_{x_{j-1}}^{x_j} \bigl(g''(x) - S_3''(x)\bigr)\,S_3''(x)\,dx. \]
Most of the boundary terms on the right cancel one another out, leaving only
\[ \int_{x_0}^{x_n} \bigl(g''(x) - S_3''(x)\bigr)\,S_3''(x)\,dx = \bigl(g'(x_n) - S_3'(x_n)\bigr)S_3''(x_n) - \bigl(g'(x_0) - S_3'(x_0)\bigr)S_3''(x_0). \]
Each of the terms on the right is zero by virtue of the natural cubic spline condition S_3''(x_0) = S_3''(x_n) = 0. This confirms the formula (1.45), and hence the equivalent (1.44) that quantifies how much larger g'' can be than S_3''.
[ − 2 5 0 3 7]
In this last example note the 0 corresponding to the x^2 term: all lower powers of x must be accounted for in the coefficient vector.
Given a polynomial in a vector, say p = [ − 2 5 0 3 7], one can
evaluate p( x ) using the command polyval, e.g.
>> polyval(p,x)
One can also compute the roots of polynomials very easily with the command
>> roots(p) % compute roots of p(x)=0
though one should be cautious of numerical errors when the degree of the polynomial is large. (Type type roots to see matlab's code for the roots command. Scan to the bottom to see the crucial lines. From the coefficients matlab constructs a companion matrix, then computes its eigenvalues using the eig command. For some (larger degree) polynomials, these eigenvalues are very sensitive to perturbations, and the roots can be very inaccurate. For a famous example due to Wilkinson, try roots(poly(1:24)), which should return the roots 1, . . . , 24.) One can construct a polynomial directly from its roots, using the poly command. For example,
>> poly([1:4])
ans =
1 -10 35 -50 24
Note that the indices of xx account for the fact that x_j = xx(j + 1). For example, S.breaks contains the list of knots. One can also pass arguments to spline to specify complete boundary conditions. However, there is no easy way to impose natural boundary conditions. (Another option to interp1 has a misleading name: 'pchip' constructs a particular spline-like interpolant designed to be quite smooth; it cannot match any derivative information about f, as no derivative information is even passed to the function.) For more sophisticated data fitting operations, matlab offers a Curve Fitting Toolbox (which fits both curves and surfaces).
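For concreteness, a sketch of these commands applied to the data set (1.37) (the evaluation grid is an illustrative choice); spline uses the not-a-knot conditions by default.

xj = [0 1 2 3 4];  fj = [1 3 2 -1 1];
xx = linspace(0, 4, 401);
S  = spline(xj, fj);                        % piecewise polynomial structure
plot(xx, ppval(S, xx), xj, fj, 'ko')        % evaluate and plot the cubic spline
yy = interp1(xj, fj, xx, 'pchip');          % the 'pchip' alternative mentioned above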
1.12.4 Chebfun
Chebfun is a free package of matlab routines developed by Nick
Trefethen and colleagues at Oxford University. Using sophisticated
techniques from polynomial approximation theory, Chebfun auto-
matically represents an arbitrary (piecewise smooth) function f ( x ) to
machine precision, and allows all manner of operations on this func-
tion, overloading every conceivable matlab matrix/vector operation.
There is no way to do this beautiful and powerful system justice in a few lines of text here. Go to chebfun.org, download the software, and start exploring. Suffice it to say, Chebfun significantly enriches one's study and practice of numerical analysis. (In fact, it was used to generate a number of the plots in these notes.)
2 Approximation Theory

the 'infinity norm of g'. One can show that ‖ · ‖_∞ satisfies the basic norm axioms on the vector space C[a, b] of continuous functions:
• ‖g‖_∞ ≥ 0 for all g ∈ C[a, b], and ‖g‖_∞ = 0 if and only if g(x) = 0 for all x ∈ [a, b];
• ‖αg‖_∞ = |α| ‖g‖_∞ for all α ∈ C and g ∈ C[a, b];
• ‖g + h‖_∞ ≤ ‖g‖_∞ + ‖h‖_∞ for all g, h ∈ C[a, b].
Thus the minimax approximation problem seeks p_* ∈ P_n such that
\[ \|f - p_*\|_\infty = \min_{p\in P_n} \|f - p\|_\infty. \]
\[ \|f - p_*\|_2 = \min_{p\in P_n} \|f - p\|_2. \]
\[ \|f - p_*\|_1 = \min_{p\in P_n} \|f - p\|_1. \]
\[ \|f - p_*\|_\infty = \min_{p\in P_n} \|f - p\|_\infty. \]
\[ (2.1)\qquad \|f - \Pi_n f\|_\infty \le \bigl(1 + \|\Pi_n\|_\infty\bigr)\,\|f - p_*\|_\infty, \]
where Π_n is the linear interpolation operator for x_0 < x_1 < · · · < x_n (that is, p = Π_n f ∈ P_n is the polynomial that interpolates f at x_0, . . . , x_n) and
\[ \|\Pi_n\|_\infty = \max_{f\in C[a,b]} \frac{\|\Pi_n f\|_\infty}{\|f\|_\infty} \]
The picture to the right shows |e^0 − c_0| (blue) and |e^1 − c_0| (red) for c_0 ∈ [1, e]. The optimal value for c_0 will be the point at which the larger of these two lines is minimal. The figure clearly reveals that this happens when the errors are equal, at c_0 = (1 + e)/2. We conclude that the optimal minimax constant polynomial approximation to e^x on x ∈ [0, 1] is p_*(x) = c_0 = (1 + e)/2.
\[ e^0 - c_0 = -(e^1 - c_0). \]
[Figure 2.1: Minimax approximation of degree k = 0 to f(x) = e^x on x ∈ [0, 1]. The top plot compares f to p_*; the bottom plot shows the error f − p_*, whose extreme magnitude is attained, with opposite sign, at two values of x ∈ [0, 1].]
The previous example hints that the points at which the error f − p_* attains its maximum magnitude play a central role in the theory of minimax approximation. The Theorem of de la Vallée Poussin is a first step toward such a result. We include its proof to give a flavor of how such results are established. (The proof is adapted from Section 8.3 of Süli and Mayers, An Introduction to Numerical Analysis (Cambridge, 2003).)
for j = 0, . . . , n. Then
Before proving this result, look at Figure 2.2 for an illustration of the theorem. Suppose we wish to approximate f(x) = e^x with some quintic polynomial, r ∈ P_5 (i.e., n = 5). This polynomial is not necessarily the minimax approximation to f over the interval [0, 1]. However, in the figure it is clear that for this r, we can find n + 2 = 7 points at which the sign of the error f(x) − r(x) oscillates. (These n + 2 points are by no means unique: we have a continuum of choices available. However, taking the extrema of f − r will give the best bounds in the theorem.) The red curve shows the error for the optimal minimax polynomial p_* (whose computation is discussed below). This is the point of de la Vallée Poussin's theorem: since the error f(x) − r(x) oscillates sign n + 2 times, the minimax error ±‖f − p_*‖_∞ exceeds |f(x_j) − r(x_j)| at one of the points x_j that give the oscillating sign. In other words, de la Vallée Poussin's theorem gives a nice mechanism for developing lower bounds on ‖f − p_*‖_∞.
\[ \|f - p_*\|_\infty = \min_{p\in P_n} \|f - p\|_\infty. \]
[Figure 2.2: Illustration of de la Vallée Poussin's theorem for f(x) = e^x and n = 5. Some polynomial r ∈ P_5 gives an error f − r for which we can identify n + 2 = 7 points x_j, j = 0, . . . , n + 1 (black dots) at which f(x_j) − r(x_j) oscillates sign. The minimum value of |f(x_j) − r(x_j)| gives a lower bound on the maximum error ‖f − p_*‖_∞ of the optimal approximation p_* ∈ P_5.]
Now consider
\[ p_*(x) - r(x) = \bigl(f(x) - r(x)\bigr) - \bigl(f(x) - p_*(x)\bigr), \]
\[ \|f - p_*\|_\infty \ge |f(x_j) - r(x_j)|, \]
the sign of f − r̃ oscillate, but the error takes its extremal values at these points. That is,
\[ |f(x_j) - \tilde{r}(x_j)| = \|f - \tilde{r}\|_\infty, \qquad j = 0, \ldots, n+1, \]
\[ \min_{p\in P_n} \|f - p\| \ge \min_{0\le j\le n+1} |f(x_j) - \tilde{r}(x_j)|. \]
\[ |f(x_j) - \tilde{r}(x_j)| = \|f - \tilde{r}\|_\infty \]
\[ \min_{p\in P_n} \|f - p\| \ge \min_{0\le j\le n+1} |f(x_j) - \tilde{r}(x_j)| = \|f - \tilde{r}\|_\infty. \]
Since r̃ ∈ P_n, it follows that
\[ \min_{p\in P_n} \|f - p\| = \|f - \tilde{r}\|_\infty, \]
\[ |f(x_j) - p_*(x_j)| = \|f - p_*\|_\infty, \qquad j = 0, \ldots, n+1. \]
Note that this result is if and only if : the oscillation property exactly
characterizes the minimax approximation. We have proved one direc-
tion already by appeal to de la Vallée Poussin’s theorem. The proof of
the other direction is rather more involved.
Lemma 2.1. Let p_* ∈ P_n be a minimax approximation of f ∈ C[a, b],
\[ \|f - p_*\|_\infty = \min_{p\in P_n} \|f - p\|_\infty, \]
and let X denote the set of all points x ∈ [a, b] for which
\[ |f(x) - p_*(x)| = \|f - p_*\|_\infty. \]
Then for all q ∈ P_n,
\[ (2.5)\qquad \max_{x\in X} \bigl(f(x) - p_*(x)\bigr)\,q(x) \ge 0. \]
(This 'lemma' is a diluted version of Kolmogorov's Theorem, which is (a) an 'if and only if' version of this lemma that (b) appeals to approximation with much more general classes of functions, not just polynomials, and (c) handles complex-valued functions. The proof here is adapted from that more general setting given in Theorem 2.1 of DeVore and Lorentz, Constructive Approximation (Springer, 1993).)
where
\[ \widetilde{X} := \{\xi \in [a,b] : \min_{x\in X} |\xi - x| < \delta\}. \]
\[ = E^2 + 2\lambda\bigl(f(x) - p_*(x)\bigr)\tilde{q}(x) + \lambda^2\tilde{q}(x)^2 \]
\[ (2.8)\qquad |f(x) - \tilde{p}(x)|^2 < E^2 - 2\lambda\varepsilon + \lambda^2 M^2 < E^2 - \frac{4\varepsilon^2}{M^2} + \frac{4\varepsilon^2}{M^2} = E^2 = \|f - p_*\|_\infty^2 \]
for all x ∈ X̃. Thus p̃ beats p_* on X̃. Now since X comprises the points where |f(x) − p_*(x)| attains its maximum, away from X̃ this error must be bounded away from its maximum, i.e., there exists some η > 0 such that
\[ \max_{\substack{x\in[a,b]\\ x\notin \widetilde{X}}} |f(x) - p_*(x)| \le E - \eta. \]
\[
\begin{aligned}
|f(x) - \tilde{p}(x)| &= |f(x) - p_*(x) + \lambda\tilde{q}(x)|\\
&\le |f(x) - p_*(x)| + \lambda|\tilde{q}(x)|\\
&\le E - \eta + \lambda\|\tilde{q}\|_\infty,
\end{aligned}
\]
In conclusion, if
\[ \lambda \in \bigl(0,\ \min(2\varepsilon/M^2,\ \eta/\|\tilde{q}\|_\infty)\bigr), \]
With this lemma, we can readily complete the proof of the Oscilla-
tion Theorem.
and the sign of q flips between each of these intervals. Thus the sign
of f ( x ) − p∗ ( x ) q( x ) is the same for all x ∈ X. Pick the ± sign in the
\[ \bigl(f(x) - p_*(x)\bigr)\,q(x) < 0 \qquad \text{for all } x \in X, \]
This oscillation property forms the basis of algorithms that find the
minimax approximation: iteratively adjust an approximating poly-
nomial until it satisfies the oscillation property. The most famous
algorithm for computing the minimax approximation is called the
Remez exchange algorithm, essentially a specialized linear programming procedure. In exact arithmetic, this algorithm is guaranteed to terminate with the correct answer in finitely many operations.
[Figure 2.3: Illustration of the equioscillating minimax error f − p_* for approximations of degree n = 2, 3, 4, and 5 with f(x) = e^x for x ∈ [a, b]. In each case, the error attains its maximum with alternating sign at n + 2 points.]
The oscillation property is demonstrated in Example 2.1, where we approximated f(x) = e^x with a constant. Indeed, the maximum error is attained at two points (that is, n + 2, since n = 0), and the error differs in sign at those points. Figure 2.3 shows the errors f(x) − p_*(x) for minimax approximations p_* of increasing degree. (These examples were computed in matlab using the Chebfun package's remez algorithm. For details, see www.chebfun.org.) The oscillation property becomes increasingly apparent as the polynomial degree increases. In each case, there are n + 2 extreme points of the error, where n is the degree of the approximating polynomial.
Example 2.2 (ex revisited). Now we shall use the Oscillation Theorem
to compute the optimal linear minimax approximation to f ( x ) = ex
on [0, 1]. Assume that the minimax polynomial p∗ ∈ P1 has the form
p∗ ( x ) = α + βx. Since f is convex, a quick sketch of the situation
suggests the maximal error will be attained at the end points of the
interval, x0 = 0 and x2 = 1. We assume this to be true, and seek some
third point x1 ∈ (0, 1) that attains the same maximal error, δ, but with
opposite sign. If we can find such a point, then the Oscillation Theo-
rem guarantees that the resulting polynomial is optimal, confirming
our assumption that the maximal error was attained at the ends of
the interval.
\[
\begin{aligned}
f(x_0) - p_*(x_0) &= \delta\\
f(x_1) - p_*(x_1) &= -\delta\\
f(x_2) - p_*(x_2) &= \delta.
\end{aligned}
\]
\[
\begin{aligned}
1 - \alpha &= \delta\\
e^{x_1} - \alpha - \beta x_1 &= -\delta\\
e - \alpha - \beta &= \delta.
\end{aligned}
\]
To make this happen, require that the derivative of the error be zero at x_1, reflecting that the error f − p_* attains a local minimum/maximum at x_1. (The plots in Figure 2.3 confirm that this is reasonable. This requirement need not hold at the points x_0 and x_2, since these points are on the ends of the interval [a, b]; it is only required at the interior points where the extreme error is attained, x_j ∈ (a, b).) Imposing the condition that f'(x_1) − p_*'(x_1) = 0 yields
\[ e^{x_1} - \beta = 0. \]
\[ \delta = \tfrac{1}{2}\bigl(2 - e + (e-1)\log(e-1)\bigr) = 0.10593\ldots. \]
Figure 2.4 shows the optimal approximation, along with the error
f ( x ) − p∗ ( x ) = ex − (α + βx ). In particular, notice the size of the
maximum error (δ = 0.10593 . . .) and the point x1 = 0.54132 . . . at
which this error is attained.
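A MATLAB sketch that evaluates the quantities derived in this example (β = e − 1 and α = 1 − δ follow from the three equations above; nothing beyond the example itself is assumed). The printed errors should equal +δ, −δ, +δ.

beta  = exp(1) - 1;                 % from e^{x1} - beta = 0 and the end conditions
x1    = log(exp(1) - 1);            % interior point where the error is extreme
delta = (2 - exp(1) + (exp(1)-1)*log(exp(1)-1))/2;
alpha = 1 - delta;                  % from 1 - alpha = delta
err   = @(x) exp(x) - (alpha + beta*x);
fprintf('errors at 0, x1, 1:  %+.5f  %+.5f  %+.5f\n', err(0), err(x1), err(1))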
[Figure 2.4: The top plot shows the minimax approximation p_* of degree n = 1 (red) to f(x) = e^x (blue); the bottom plot shows the error f(x) − p_*(x), equioscillating at n + 2 = 3 points.]
    ‖f − pn‖∞,
Notice that

    ∏_{j=0}^{n} (x − xj) = x^{n+1} − x^n ∑_{j=0}^{n} xj + x^{n−1} ∑_{j<k} xj xk − · · · + (−1)^{n+1} ∏_{j=0}^{n} xj
                         = x^{n+1} − r(x),
Tn ( x ) = cos(n cos−1 x ).
Tn ( x ) = cos(n cos−1 x )
Tn (η j ) = (−1) j .
Proof. These results follow from direct calculations. For x ∈ [−1, 1],
Tn(x) = cos(n cos⁻¹(x)) cannot exceed one in magnitude because
cosine cannot exceed one in magnitude. To verify the formula for the
roots, compute

    Tn(ξj) = cos( n cos⁻¹( cos( (2j − 1)π/(2n) ) ) ) = cos( (2j − 1)π/2 ) = 0,

since cosine is zero at half-integer multiples of π. Similarly,

    Tn(ηj) = cos( n cos⁻¹( cos( jπ/n ) ) ) = cos(jπ) = (−1)^j.

Since Tn is a nonzero degree-n polynomial, it cannot attain more
than n + 1 extrema on [−1, 1], including the endpoints: we have thus
characterized all the maxima of |Tn| on [−1, 1].
[Figure: the Chebyshev polynomials Tn(x) on [−1, 1] for
n = 1, 2, 3, 5, 7, 10, 15, 20, and 30.]
    T̂_{n+1}(x) = x^{n+1} − rn(x)
[Figure: interpolants p4(x), p8(x), p16(x), and p24(x) to f(x) at uniformly
spaced points over x ∈ [−5, 5].]
The results are stated for [ a, b] = [−1, 1] but can be adapted to any
real interval.
    ‖f − pn‖∞ ≤ 4V / ( π ν (n − ν)^ν ).
• Suppose f is analytic on [−1, 1] and can be analytically continued
  (into the complex plane) onto the region bounded by the ellipse

      Eρ := { (ρ e^{iθ} + e^{−iθ}/ρ)/2 : θ ∈ [0, 2π) }.

  Suppose further that |f(z)| ≤ M on and inside Eρ. Then

      ‖f − pn‖∞ ≤ 2Mρ^{−n} / (ρ − 1).

[Margin figure: the interval [−1, 1] (blue), with two ellipses Eρ for
ρ = 1.25 and ρ = 1.75.]
For example, the first part of this theorem implies that if f 0 exists
and is bounded, then k f − pn k∞ must converge at least as fast as
1/n as n → ∞. While that is not such a fast rate, it does indeed
show convergence of the interpolant. The second part of the theorem
ensures that if f is well behaved in the region of the complex plane
around [−1, 1], the convergence will be extremely fast: the larger the
area of C in which f is well behaved, the faster the convergence.
    ‖T̂n‖∞ = min_{p ∈ Pn, p monic} ‖p‖∞,
over the interval [−1, 1], where a polynomial is monic if it has the
form x^n + q(x) for q ∈ P_{n−1}.
used previously? Trivially one can see that the new definition also
gives T0 ( x ) = 1 and T1 ( x ) = x. Like standard trigonometric func-
tions, the hyperbolic functions also satisfy the addition formula
    cosh α + cosh β = 2 cosh( (α + β)/2 ) cosh( (α − β)/2 ),
and so
    cosh( (n + 1)θ ) = 2 cosh θ cosh nθ − cosh( (n − 1)θ ),
    x = (w + w⁻¹)/2,
which allows us to write
    x = (e^{log w} + e^{−log w})/2 = cosh(log w).
Thus work from the definition to obtain
    Tn(x) = cosh(n cosh⁻¹(x)) = cosh(n log w)
          = cosh(log w^n) = (e^{log(w^n)} + e^{−log(w^n)})/2 = (w^n + w^{−n})/2.
(2.11)    Tn(x) = (w^n + w^{−n})/2,    x = (w + w⁻¹)/2 ∉ (−1, 1).
We have thus shown that | Tn ( x )| will grow exponentially in n for any
x 6∈ (−1, 1) for which |w| 6= 1. When does |w| = 1? Only when
x = ±1. Hence,
w2 − 4w + 1 = 0.
The inner product satisfies the following basic axioms:
• ⟨αf + g, h⟩ = α⟨f, h⟩ + ⟨g, h⟩ for all f, g, h ∈ C[a, b] and all α ∈ IR;
• ⟨f, g⟩ = ⟨g, f⟩ for all f, g ∈ C[a, b];
• ⟨f, f⟩ ≥ 0 for all f ∈ C[a, b].
(For simplicity we are assuming that f and g are real-valued. To handle
complex-valued functions, one generalizes the inner product to
⟨f, g⟩ = ∫_a^b f(x) conj(g(x)) dx, which then gives ⟨f, g⟩ = conj(⟨g, f⟩).)
This is often called the 'L² norm,' where the superscript '2' in L²
refers to the fact that the integrand involves the square of the function
f; the L stands for Lebesgue, coming from the fact that this inner
product can be generalized from C[a, b] to the set of all functions that
are square-integrable, in the sense of Lebesgue integration. By restricting
our attention to continuous functions, we dodge the measure-theoretic
complexities. (The Lebesgue theory gives a more robust definition of the
integral than the conventional Riemann approach. With such notions one
can extend least squares approximation beyond C[a, b], to more exotic
function spaces.)
    ‖f − P∗‖₂ = min_{p ∈ Pn} ‖f − p‖₂.
    c0 = 4e − 10,    c1 = 18 − 6e.
[Figure 2.7: Top: Approximation of f(x) = e^x (blue) over x ∈ [0, 1] via
least squares (P∗, shown in red) and minimax (p∗, shown as a gray line).
Bottom: Error curves for least squares, f − P∗ (red), and minimax, f − p∗
(gray) approximation. While the curves have similar shape, note that the
red curve does not attain its maximum deviation from f at n + 2 = 3
points, while the gray one does.]
We can see from the plots in Figure 2.7 that the approximation
looks decent to the eye, but the error is not terribly small. (In fact,
‖f − P∗‖₂ = 0.06277 . . . . This is indeed smaller than the 2-norm error
of the minimax approximation p∗: ‖f − p∗‖₂ = 0.07228 . . . .) We can
decrease that error by increasing the degree of the approximating
polynomial. Just as we used a 2-by-2 linear system to find the best
linear approximation, a general (n + 1)-by-(n + 1) linear system can
be constructed to yield the degree-n least squares approximation.
    p(x) = ∑_{k=0}^{n} ck φk(x).

    = ⟨f, f⟩ − 2 ∑_{k=0}^{n} ck ⟨f, φk⟩ + ∑_{k=0}^{n} ∑_{ℓ=0}^{n} ck cℓ ⟨φk, φℓ⟩.
    ∂E/∂cj = ∂/∂cj ⟨f, f⟩ − 2 ∂/∂cj ∑_{k=0}^{n} ck ⟨f, φk⟩ + ∂/∂cj ∑_{k=0}^{n} ∑_{ℓ=0}^{n} ck cℓ ⟨φk, φℓ⟩
           = 0 − 2⟨f, φj⟩ + ∂/∂cj [ cj² ⟨φj, φj⟩ + ∑_{k≠j} ck cj ⟨φk, φj⟩ + ∑_{ℓ≠j} cj cℓ ⟨φj, φℓ⟩ + ∑_{k≠j} ∑_{ℓ≠j} ck cℓ ⟨φk, φℓ⟩ ].
In this last line, we have broken the double sum on the previous line
into four parts: one that contains c2j , two that contain c j (ck c j for k 6= j;
c j c` for ` 6= j), and one (the double sum) that does not involve c j at
all. This decomposition makes it easier to compute the derivative:
    ∂/∂cj [ cj² ⟨φj, φj⟩ + ∑_{k≠j} ck cj ⟨φk, φj⟩ + ∑_{ℓ≠j} cj cℓ ⟨φj, φℓ⟩ + ∑_{k≠j} ∑_{ℓ≠j} ck cℓ ⟨φk, φℓ⟩ ]
      = 2cj ⟨φj, φj⟩ + ∑_{k≠j} ck ⟨φk, φj⟩ + ∑_{ℓ≠j} cℓ ⟨φj, φℓ⟩ + 0
      = 2cj ⟨φj, φj⟩ + 2 ∑_{k≠j} ck ⟨φk, φj⟩.
hence G is symmetric. (In this case, symmetry also follows from the
equivalence of mixed partial derivatives.) The following theorem
confirms that G is indeed positive definite.
Theorem 2.7. If φ0, . . . , φn are linearly independent, the Gram matrix
G is positive definite.
(This proof is very general: we are thinking of φ0, . . . , φn as being a basis
for Pn, and hence linearly independent, but the same proof applies to any
linearly independent set of vectors in a general inner product space.)
Proof. For a generic z ∈ IR^{n+1}, consider the product z*Gz. Since the
(j, k) entry of G is ⟨φj, φk⟩, the jth entry of Gz is ∑_{k=0}^{n} zk ⟨φj, φk⟩, so

    z*Gz = ∑_{j=0}^{n} ∑_{k=0}^{n} zj zk ⟨φj, φk⟩.

Now use linearity of the inner product to write

    z*Gz = ∑_{j=0}^{n} ∑_{k=0}^{n} zj zk ⟨φj, φk⟩ = ⟨ ∑_{j=0}^{n} zj φj , ∑_{k=0}^{n} zk φk ⟩ = ‖ ∑_{j=0}^{n} zj φj ‖₂² ≥ 0.

This quantity is zero only when ∑_{j=0}^{n} zj φj = 0; since the φj are
linearly independent, that happens only when z0 = · · · = zn = 0,
i.e., if and only if z = 0. Thus, if z ≠ 0, z*Gz > 0.

This answers the second question posed above, and also makes the
answer to the first trivial.

Corollary 2.1. If φ0, . . . , φn are linearly independent, the Gram matrix
G is invertible.

Proof. Since G is positive definite, it has a trivial null space, and is thus
invertible.

We can summarize our findings as follows.

[Margin figures: surfaces visualizing E(c0, c1) for best approximation of
f(x) = e^x from P1 over x ∈ [−1, 1] (top) and x ∈ [0, 1] (bottom).
Eigenvalues illuminate: for [−1, 1], the eigenvalues of G are relatively
large (λ1 = 2, λ2 = 2/3), and the error surface looks very bowl-like.
For [0, 1], G has a small eigenvalue (λ1 = (4 + √13)/6 = 1.26759 . . . ,
λ2 = (4 − √13)/6 = 0.06574 . . .): the error surface is much more 'shallow'
in one direction. (The orientation of the trough can be found from the
corresponding eigenvector of G.)]
    ⟨f − P∗, q⟩ = 0,    for all q ∈ Pn.

    ⟨f − P∗, φj⟩ = ⟨f, φj⟩ − ⟨P∗, φj⟩ = ⟨f, φj⟩ − ⟨ ∑_{k=0}^{n} ck φk , φj ⟩ = ⟨f, φj⟩ − ∑_{k=0}^{n} ck ⟨φk, φj⟩.

Recall that the jth row of the equation Gc = b (see (2.14)) is precisely

    ∑_{k=0}^{n} ck ⟨φk, φj⟩ = ⟨f, φj⟩,

    ⟨f − P∗, q⟩ = ⟨ f − P∗ , ∑_{j=0}^{n} dj φj ⟩ = ∑_{j=0}^{n} dj ⟨f − P∗, φj⟩ = ∑_{j=0}^{n} dj · 0 = 0.
system are small). Unfortunately, this is not the case – the condition
number of G grows exponentially in the dimension n, and the accuracy
of the computed solution to the linear system quickly degrades
as n increases.

    n     ‖G‖ ‖G⁻¹‖        ‖c − ĉ‖
    5     1.495 × 10⁷      7.548 × 10⁻¹¹
    10    1.603 × 10¹⁴     0.01288
    15    4.380 × 10¹⁷     12.61
    20    1.251 × 10¹⁸     46.9

(The last few condition numbers ‖G‖‖G⁻¹‖ are in fact smaller than they
ought to be: matlab computes the condition number as the ratio of the
largest to smallest singular values of G; the smallest singular value can
only be determined accurately if it is larger than about ‖G‖ ε_mach, where
ε_mach ≈ 2.2 × 10⁻¹⁶. Thus, if the true condition number is larger than
about 1/ε_mach, we should not expect matlab to compute it accurately.)

Clearly these errors are not acceptable!
In summary: The monomial basis forms an ill-conditioned basis for Pn
then G only has nonzeros on the main diagonal, giving the system

    diag( ⟨φ0, φ0⟩, ⟨φ1, φ1⟩, . . . , ⟨φn, φn⟩ ) [c0, c1, . . . , cn]^T = [⟨f, φ0⟩, ⟨f, φ1⟩, . . . , ⟨f, φn⟩]^T,

whose solution is simply

    cj = ⟨f, φj⟩ / ⟨φj, φj⟩,    j = 0, . . . , n.
Thus, with respect to the orthogonal basis the least squares approximation
to f is given by

(2.17)    P∗(x) = ∑_{j=0}^{n} cj φj(x) = ∑_{j=0}^{n} ( ⟨f, φj⟩ / ⟨φj, φj⟩ ) φj(x).

add in one more term. If we momentarily use the notation P∗,k for
the least squares approximation from Pk, then

    P∗,n+1(x) = P∗,n(x) + ( ⟨f, φ_{n+1}⟩ / ⟨φ_{n+1}, φ_{n+1}⟩ ) φ_{n+1}(x).

In contrast, to increase the degree of the least squares approximation
in the monomial basis, one would need to extend the G matrix by
one row and column, and re-solve Gc = b: increasing the degree
changes all the old coefficients in the monomial basis.
An orthogonal basis also permits a beautifully simple formula for
the norm of the error, ‖f − P∗‖₂. (This result is closely related to
Parseval's identity, which essentially says that if φ0, φ1, . . . forms an
orthogonal basis for the (possibly infinite dimensional) vector space V,
then for any f ∈ V, ‖f‖² = ∑_j ⟨f, φj⟩² / ⟨φj, φj⟩.)

Theorem 2.9. Let φ0, . . . , φn denote an orthogonal basis for Pn. Then
for any f ∈ C[a, b], the norm of the error f − P∗ of the least squares
approximation P∗ ∈ Pn is

(2.18)    ‖f − P∗‖₂ = sqrt( ‖f‖₂² − ∑_{j=0}^{n} ⟨f, φj⟩² / ⟨φj, φj⟩ ).

(To put the utility of the formula (2.18) in context, think about minimax
approximation. We have various bounds, like de la Vallée Poussin's theorem,
on the minimax error, but no easy formula exists to give you that error
directly.)

Proof. First, use the formula (2.17) for P∗ to compute

    ‖P∗‖₂² = ⟨ ∑_{j=0}^{n} (⟨f, φj⟩/⟨φj, φj⟩) φj , ∑_{k=0}^{n} (⟨f, φk⟩/⟨φk, φk⟩) φk ⟩
           = ∑_{j=0}^{n} ∑_{k=0}^{n} ( ⟨f, φj⟩ ⟨f, φk⟩ / (⟨φj, φj⟩ ⟨φk, φk⟩) ) ⟨φj, φk⟩,

using linearity of the inner product. Since the basis polynomials are
orthogonal, ⟨φj, φk⟩ = 0 for j ≠ k, which reduces the double sum to

    ‖P∗‖₂² = ∑_{j=0}^{n} ( ⟨f, φj⟩ ⟨f, φj⟩ / (⟨φj, φj⟩ ⟨φj, φj⟩) ) ⟨φj, φj⟩ = ∑_{j=0}^{n} ⟨f, φj⟩² / ⟨φj, φj⟩.

Now expand

    ‖f − P∗‖₂² = ⟨f − P∗, f − P∗⟩ = ⟨f, f⟩ − ⟨f, P∗⟩ − ⟨P∗, f⟩ + ⟨P∗, P∗⟩
               = ⟨f, f⟩ − 2⟨f, P∗⟩ + ⟨P∗, P∗⟩
               = ⟨f, f⟩ − 2 ⟨ f , ∑_{j=0}^{n} (⟨f, φj⟩/⟨φj, φj⟩) φj ⟩ + ⟨P∗, P∗⟩
               = ⟨f, f⟩ − 2 ∑_{j=0}^{n} ⟨f, φj⟩² / ⟨φj, φj⟩ + ∑_{j=0}^{n} ⟨f, φj⟩² / ⟨φj, φj⟩
               = ‖f‖₂² − ∑_{j=0}^{n} ⟨f, φj⟩² / ⟨φj, φj⟩,

as required.
(2.20)    A*Ax = A*b,
    A = QR,
(2.22)    x = R⁻¹ Q*b.
i.e., the discrete least squares problem seeks to approximate b with
some vector v = Ax from the subspace Ran(A) ⊂ IR^m. (Here
Ran(A) = {Ax : x ∈ IR^n} is the range (column space) of A.) Writing
    A = [a1 · · · an],
we seek the vector
    v = Ax = x1 a1 + · · · + xn an ∈ IR^m
to approximate b ∈ IRm .
Viewing a1 , . . . , an as a basis for the approximating subspace
Ran(A), one can develop the least squares theory precisely as we
have earlier in this section, using the inner product
ha j , ak i = a∗k a j .
E( x1 , . . . , xn ) = kb − ( x1 a1 + · · · + xn an )k22
v = c1 q1 + · · · + cn qn = Qc = QQ∗ b.
    xk = a + k hm.
This least squares error, when scaled by hm, takes the form of a Riemann
sum that, in the m → ∞ limit, approximates an integral:

    lim_{m→∞} hm ∑_{k=0}^{m} ( f(xk) − p(xk) )² = ∫_a^b ( f(x) − p(x) )² dx.
That is, as we take more and more approximation points, the er-
ror (2.26) that we are minimizing better and better approximates the
integral error formulation (2.12).
To solve (2.26), represent p ∈ Pn using the monomial basis,
p ( x ) = c0 + c1 x + · · · + c n x n .
where

    A = [ 1  x0  x0²  · · ·  x0^n ]      c = [ c0 ]      f = [ f(x0) ]
        [ 1  x1  x1²  · · ·  x1^n ]          [ c1 ]          [ f(x1) ]
        [ 1  x2  x2²  · · ·  x2^n ]          [ c2 ]          [ f(x2) ]
        [ ⋮    ⋮    ⋮          ⋮  ]          [ ⋮  ]          [   ⋮   ]
        [ 1  xm  xm²  · · ·  xm^n ],         [ cn ],         [ f(xm) ].
This discrete problem can be solved via the normal equations, i.e., find
c ∈ IRn+1 to solve the matrix equation
A∗ Ac = A∗ f.
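As a minimal MATLAB sketch of this discrete problem (the function f and the degrees below are placeholders, not taken from the notes), one can form A and solve either the normal equations or, more stably, the QR-based least squares problem via backslash:

    % minimal sketch: discrete polynomial least squares fit
    f = @(x) exp(x);  a = 0;  b = 1;  m = 100;  n = 3;     % assumed example data
    x = a + (b-a)*(0:m)'/m;              % sample points x_0, ..., x_m
    A = x.^(0:n);                        % A(:,k+1) = x.^k (implicit expansion, R2016b+)
    c_normal = (A'*A) \ (A'*f(x));       % normal equations A'A c = A'f
    c_qr     = A \ f(x);                 % QR-based solve; usually better conditioned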
which is precisely the right hand side vector b ∈ IR^{n+1} obtained for
the original least squares problem at the beginning of this section
in (2.15). Similarly, the (j + 1, k + 1) entry of A*A ∈ IR^{(n+1)×(n+1)} for
the discrete problem can be formed as

    (A*A)_{j+1,k+1} = ∑_{ℓ=0}^{m} xℓ^j xℓ^k = ∑_{ℓ=0}^{m} xℓ^{j+k},

    lim_{m→∞} hm A*A = G,

    cj = ⟨f, φj⟩ / ⟨φj, φj⟩.
Definition 2.3. Given a function w ∈ C[a, b] with w(x) > 0, the inner
product of f, g ∈ C[a, b] with respect to the weight w is

    ⟨f, g⟩ = ∫_a^b f(x) g(x) w(x) dx.

(Generalizations are possible: for example, we can allow w(x) = 0 on a set
of measure zero (e.g., finitely many points on [a, b]), and we can take [a, b]
to be the unbounded interval [0, ∞) or (−∞, ∞), provided we are willing to
restrict C[a, b] to functions that have finite norm on these intervals.)
One can confirm that this definition is consistent with the axioms
required of an inner product that were described on page 97. For any
such inner product, we then have the following definitions.
Definition 2.4. The functions f , g ∈ C [ a, b] are orthogonal if h f , gi = 0.
Definition 2.5. A set of functions {φk }nk=0 is a system of orthogonal
polynomials provided:
• φk is a polynomial of exact degree k (with φ0 6= 0);
• hφj , φk i = 0 when j 6= k.
Be sure not to overlook the first property, that φk has exact degree k;
it ensures the following result.
The proof follows by observing that the exact degree property ensures
that φ0, . . . , φℓ are ℓ + 1 linearly independent vectors in the
(ℓ + 1)-dimensional subspace Pℓ. We can apply it to derive the next
lemma, one we will use repeatedly.
Proof. Lemma 2.2 ensures that {φk}_{k=0}^{n−1} is a basis for P_{n−1}. Thus for
any p ∈ P_{n−1}, one can determine constants c0, . . . , c_{n−1} such that

    p = ∑_{j=0}^{n−1} cj φj.

The linearity of the inner product and orthogonality of {φj}_{j=0}^{n} imply

    ⟨p, φn⟩ = ⟨ ∑_{j=0}^{n−1} cj φj , φn ⟩ = ∑_{j=0}^{n−1} cj ⟨φj, φn⟩ = ∑_{j=0}^{n−1} 0 = 0,

as required.
Pk = span{φ0 , . . . , φk }.
The trick is that the x in xφ_{k−1}(x) can be flipped to the other side of
the inner product,

    ⟨xφ_{k−1}(x), φj(x)⟩ = ∫_a^b ( xφ_{k−1}(x) ) φj(x) w(x) dx
                         = ∫_a^b φ_{k−1}(x) ( xφj(x) ) w(x) dx.

Now, recall from Lemma 2.3 that φ_{k−1} is orthogonal to all polynomials
of degree less than k − 1. Thus since xφj(x) ∈ P_{j+1}, the inner product
vanishes whenever j + 1 < k − 1, and

    ∑_{j=0}^{k−1} ( ⟨xφ_{k−1}, φj⟩ / ⟨φj, φj⟩ ) φj = ∑_{j=k−2}^{k−1} ( ⟨xφ_{k−1}, φj⟩ / ⟨φj, φj⟩ ) φj.
    φ0(x) = 1,
    φ1(x) = x − ⟨x, 1⟩/⟨1, 1⟩,
    φk(x) = xφ_{k−1}(x) − ( ⟨xφ_{k−1}, φ_{k−1}⟩ / ⟨φ_{k−1}, φ_{k−1}⟩ ) φ_{k−1}(x)
                        − ( ⟨xφ_{k−1}, φ_{k−2}⟩ / ⟨φ_{k−2}, φ_{k−2}⟩ ) φ_{k−2}(x),    for k ≥ 2.
even and φ1 is odd over x ∈ [−1, 1], ⟨x, 1⟩ = 0, giving φ1 = x. This
begins an inductive cascade: in the Gram–Schmidt process, for all k,

    ⟨xφ_{k−1}(x), φ_{k−1}⟩ = 0,

since if φ_{k−1} is even, xφ_{k−1} will be odd (or vice versa), and the inner
product of even and odd functions with w(x) = 1 over x ∈ [−1, 1] is
always zero. Thus for Legendre polynomials the conventional three-term
recurrence in Theorem 2.10 reduces to

    φk(x) = xφ_{k−1}(x) − ( ⟨xφ_{k−1}, φ_{k−2}⟩ / ⟨φ_{k−2}, φ_{k−2}⟩ ) φ_{k−2}(x).

(The recurrence for the monic Legendre polynomial is given, e.g., by
Dahlquist and Björck, Numerical Methods in Scientific Computing, vol. 1,
p. 571. In contrast to these monic polynomials, Legendre polynomials are,
by longstanding tradition, usually normalized so that φk(1) = 1.)
Legendre polynomials enjoy many nice properties and identities;
with some extra work, one can simplify the coefficient multiplying
φ_{k−2} to

    φk(x) = xφ_{k−1}(x) − ( (k − 1)² / (4(k − 1)² − 1) ) φ_{k−2}(x).

The first few Legendre polynomials φ0, . . . , φ6 are presented below
(and plotted in the margin):

    φ0(x) = 1
    φ1(x) = x
    φ2(x) = x² − 1/3
    φ3(x) = x³ − (3/5)x
    φ4(x) = x⁴ − (6/7)x² + 3/35
    φ5(x) = x⁵ − (10/9)x³ + (5/21)x
    φ6(x) = x⁶ − (15/11)x⁴ + (5/11)x² − 5/231.

[Margin figures: plots of φ0, φ1, φ2, and φ3 on [−1, 1].]
and

    ⟨φ0, φ0⟩ = ∫_0^1 1² dx = 1,
    ⟨φ1, φ1⟩ = ∫_0^1 (x − 1/2)² dx = 1/12.

Hence

    P∗(x) = ( ⟨e^x, φ0⟩ / ⟨φ0, φ0⟩ ) φ0(x) + ( ⟨e^x, φ1⟩ / ⟨φ1, φ1⟩ ) φ1(x)
          = (e − 1)/1 + ( ((3 − e)/2) / (1/12) ) (x − 1/2)
          = (e − 1) + (18 − 6e)(x − 1/2)
          = 4e − 10 + x(18 − 6e).
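These coefficients are easy to verify numerically; the following sketch uses MATLAB's integral routine to compute the required inner products on [0, 1]:

    % minimal sketch: least squares coefficients via the orthogonal basis
    ip   = @(f,g) integral(@(x) f(x).*g(x), 0, 1);   % inner product, w(x) = 1
    phi0 = @(x) ones(size(x));   phi1 = @(x) x - 1/2;
    f    = @(x) exp(x);
    c0 = ip(f,phi0)/ip(phi0,phi0);      % = e - 1
    c1 = ip(f,phi1)/ip(phi1,phi1);      % = 6*(3 - e) = 18 - 6e
    Pstar = @(x) c0*phi0(x) + c1*phi1(x);   % equals (4e - 10) + (18 - 6e)*x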
x0 , . . . , xn ∈ [ a, b],
    ∫_a^b pn(x) dx = ∫_a^b ∑_{j=0}^{n} f(xj) ℓj(x) dx = ∑_{j=0}^{n} f(xj) ∫_a^b ℓj(x) dx.

(Why is the Lagrange basis special? Could you not do the same kind of
expansion with the monomial or Newton bases? Yes indeed: but then you
would need to compute the coefficients cj that multiply these basis functions
in the expansion pn(x) = ∑ cj φj(x), which requires the solution of a
(nontrivial) linear system. The beauty of the Lagrange approach is that these
coefficients are instantly available by evaluating f at the quadrature nodes:
cj = f(xj).)

In the nomenclature of quadrature rules, the integrals of the basis
functions are called weights, denoted

    wj := ∫_a^b ℓj(x) dx.

The degree-n interpolatory quadrature rule at distinct nodes
x0, . . . , xn takes the form

    ∫_a^b f(x) dx ≈ ∑_{j=0}^{n} wj f(xj),

    ∫_a^b f(x) dx = ∑_{j=0}^{n} wj f(xj).
    xj = j (b − a)/n.

    p1(x) = f(a) (x − b)/(a − b) + f(b) (x − a)/(b − a),

whose integral is

    ∫_a^b p1(x) dx = f(a) (b − a)/2 + f(b) (b − a)/2.
In summary,

    Trapezoid rule:    ∫_a^b f(x) dx ≈ (b − a)/2 ( f(a) + f(b) ).

The procedure behind the trapezoid rule is illustrated in Figure 3.2,
where the area approximating the integral is colored gray.
[Figure 3.2: trapezoid rule estimate, 63.5714198. . . , versus the exact
value 73.4543644. . . .]
    = (1/2) f″(η) ( (1/6)a³ − (1/2)a²b + (1/2)ab² − (1/6)b³ )
    = −(1/12) f″(η) (b − a)³

for some η ∈ [a, b]. The second step follows from the mean value
theorem for integrals. (The mean value theorem for integrals states that
if h, g ∈ C[a, b] and h does not change sign on [a, b], then there exists some
η ∈ [a, b] such that ∫_a^b g(t)h(t) dt = g(η) ∫_a^b h(t) dt. The requirement
that h not change sign is essential. For example, if g(t) = h(t) = t then
∫_{−1}^{1} g(t)h(t) dt = ∫_{−1}^{1} t² dt = 2/3, yet ∫_{−1}^{1} h(t) dt = ∫_{−1}^{1} t dt = 0, so for
all η ∈ [−1, 1], g(η) ∫_{−1}^{1} h(t) dt = 0 ≠ ∫_{−1}^{1} g(t)h(t) dt = 2/3.)
In a forthcoming lecture we shall develop a much more general
theory, based on the Peano kernel, from which we can derive this error
bound, plus bounds for more complicated schemes, too. For now, we
summarize the bound in the following theorem.
[Figure: error of the trapezoid rule versus h, decaying like O(h³).]
x0 = a, x1 = ( a + b)/2, x2 = b.
where

    w0 = ∫_a^b ( (x − x1)(x − x2) ) / ( (x0 − x1)(x0 − x2) ) dx = (b − a)/6,
    w1 = ∫_a^b ( (x − x0)(x − x2) ) / ( (x1 − x0)(x1 − x2) ) dx = 2(b − a)/3,
    w2 = ∫_a^b ( (x − x0)(x − x1) ) / ( (x2 − x0)(x2 − x1) ) dx = (b − a)/6.
In summary:

    Simpson's rule:    ∫_a^b f(x) dx ≈ (b − a)/6 ( f(a) + 4 f( (a + b)/2 ) + f(b) ).

[Figure 3.4: Simpson's rule estimate of ∫_0^{10} f(x) dx, shown in gray;
the rule gives 76.9618331. . . , versus the exact value 73.4543644. . . .]

Simpson's rule enjoys a remarkable feature: though it only approximates
f by a quadratic, it integrates any cubic polynomial exactly! One
can verify this by directly applying Simpson's rule to a generic cubic
polynomial. Write f(x) = αx³ + q(x), where q ∈ P2. Let
I(f) = ∫_a^b f(x) dx and let I2(f) denote the Simpson's rule approximation.
Then, by linearity of the integral,

    I(f) = αI(x³) + I(q),
    I2(f) = αI2(x³) + I2(q).
Since Simpson's rule integrates the quadratic q exactly, I2(q) = I(q), so it
suffices to check the x³ term:

    I2(x³) = (b − a)/6 ( a³ + 4 ((a + b)/2)³ + b³ )
           = (b − a)/12 ( 3a³ + 3a²b + 3ab² + 3b³ ) = (b⁴ − a⁴)/4 = I(x³),

confirming that Simpson's rule is exact for x³, and hence for all cubics.
For now we simply state an error bound for Simpson's rule,
which we will prove in a future lecture. (In fact, Newton–Cotes formulas
based on approximating f by an even-degree polynomial always exactly
integrate polynomials one degree higher.)
This error formula captures the fact that Simpson’s rule is exact
for cubics, since it features the fourth derivative f (4) (η ), two deriva-
tives greater than f 00 (η ) in the trapezoid rule bound, even though
the degree of the interpolant has only increased by one. Perhaps it is
helpful to visualize the exactness of Simpson’s rule for cubics. Fig-
ure 3.5 shows f ( x ) = x3 (blue) and its quadratic interpolant (red).
On the left, the area under f is colored gray: its area is the integral
we seek. On the right, the area under the interpolant is colored gray.
Counting area below the x axis as negative, both integrals give an
identical value even though the functions are quite different. It is
remarkable that this is the case for all cubics.
Typically one does not see Newton–Cotes rules based on polynomials
of degree higher than two (i.e., Simpson's rule). (Integrating the cubic
interpolant at four uniformly spaced points is called Simpson's
three-eighths rule.) Because it can be fun to see numerical mayhem,
we give an example to emphasize why high-degree Newton–Cotes
rules can be a bad idea. Recall that Runge's function f(x) = 1/(1 + x²)
gave a nice example for which the polynomial interpolant at uniformly
spaced points over [−5, 5] fails to converge uniformly to f. This fact
suggests that Newton–Cotes quadrature will also fail to converge as
the degree of the interpolant grows. The exact value of the integral we
seek is

    ∫_{−5}^{5} 1/(1 + x²) dx = 2 tan⁻¹(5) = 2.75680153 . . . .

Just as the interpolant at uniformly spaced points diverges, so too
does the Newton–Cotes integral. Figure 3.6 illustrates this divergence,
and shows that integrating the interpolant at Chebyshev
[Figure 3.5: f(x) = x³ (blue) and its quadratic interpolant (red); the gray
area under f (left) equals the gray area under the interpolant (right).]
[Figure 3.6: error | ∫_{−5}^{5} f(x) dx − ∫_{−5}^{5} pn(x) dx | versus n for
interpolatory quadrature at uniformly spaced points and at Chebyshev
points, for Runge's example over x ∈ [−5, 5].]
This error analysis has an important consequence: the error for the
composite trapezoid rule is only O(h2 ), not the O(h3 ) we saw for the
usual trapezoid rule (in which case b − a = h since n = 1).
Now use Theorem 3.3 to derive an error formula for the composite
Simpson’s rule, using the same approach as for the composite trape-
zoid rule.
    = −( h⁴/180 ) (b − a) f⁽⁴⁾(η)
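Both composite rules are a few lines of MATLAB. The sketch below uses a placeholder integrand (the notes do not specify the function behind Figure 3.7), with n even so that Simpson's rule applies:

    % minimal sketch: composite trapezoid and Simpson rules on n subintervals
    f = @(x) 1./(1 + x.^2);            % placeholder integrand (an assumption)
    a = 0;  b = 10;  n = 20;           % n must be even for Simpson
    x = linspace(a, b, n+1);  h = (b-a)/n;  fx = f(x);
    Itrap = h*( fx(1)/2 + sum(fx(2:end-1)) + fx(end)/2 );
    Isimp = (h/3)*( fx(1) + 4*sum(fx(2:2:end-1)) + 2*sum(fx(3:2:end-2)) + fx(end) );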
[Figure 3.7: Composite trapezoid rule (left) and composite Simpson's rule
(right) applied to ∫_0^{10} f(x) dx: composite trapezoid gives 73.2181469. . . ,
composite Simpson gives 73.4610862. . . , versus the exact value 73.4543644. . . .]

3.2.4 Adaptive Quadrature
where

    ℓj(x) = ∏_{k=0, k≠j}^{n} (x − xk)/(xj − xk)
    ∫_{−1}^{1} f(x) dx ≈ ∫_{−1}^{1} pn(x) dx = ∑_{j=0}^{n} f(xj) ∫_{−1}^{1} ℓj(x) dx.

    Clenshaw–Curtis rule:    ∫_{−1}^{1} f(x) dx ≈ ∑_{j=0}^{n} wj f(xj),
    where xj = cos(jπ/n) and wj = ∫_{−1}^{1} ℓj(x) dx.
    (b − a)/2 ( f(a) + f(b) ),

exactly integrates linear polynomials, but not all quadratics. In fact,
one can show that no quadrature rule of the form

    wa f(a) + wb f(b)

will exactly integrate all quadratics over [a, b], regardless of the choice
of constants wa and wb. However, notice that a general quadrature
rule with two points,

    w0 f(x0) + w1 f(x1),

    w0 = w1 = (1/2)(b − a),
    x0 = (1/2)(b + a) − (√3/6)(b − a),    x1 = (1/2)(b + a) + (√3/6)(b − a).

Notice that x0, x1 ∈ [a, b]: If this were not the case, we could not
use these points as quadrature nodes, since f might not be defined
outside [a, b]. When [a, b] = [−1, 1], the interpolation points are
±1/√3, giving the quadrature rule

    I(f) = f(−1/√3) + f(1/√3).
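One can confirm the claimed exactness for cubics with a one-off check in MATLAB (the particular cubic below is an arbitrary choice, not from the notes):

    % minimal sketch: the two-point rule is exact for cubics on [-1,1]
    p = @(x) 3*x.^3 - x.^2 + 5*x - 2;             % any cubic
    I_rule  = p(-1/sqrt(3)) + p(1/sqrt(3));       % two-point rule
    I_exact = integral(p, -1, 1);                 % both equal -14/3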
for some weight function w ∈ C(a, b) that is non-negative over (a, b) and
takes the value of zero only on a set of measure zero. (This weight function
plays an essential role in the discussion: it defines the inner product, and
so it dictates what it means for two functions to be orthogonal. Change
the weight function, and you will change the orthogonal polynomials.)
Now we wish to construct an interpolatory quadrature rule for an
integral that incorporates the weight function w(x) in the integrand:

    In(f) = ∑_{j=0}^{n} wj f(xj) ≈ ∫_a^b f(x) w(x) dx.

It is our aim to make In(p) exact for all p ∈ P_{2n+1}. (In Section 3.4.4 we
shall see some useful examples of weight functions.) First, we will show
that any interpolatory quadrature rule In will at least be exact for the
weighted integral of degree-n polynomials. Showing this is a simple
modification of the argument made in Section 3.1 for unweighted
integrals.
Given a set of distinct nodes x0 , . . . , xn , construct the polynomial
interpolant to f at those nodes:
n
pn ( x ) = ∑ f (x j )` j (x),
j =0
    ∫_a^b f(x) w(x) dx ≈ ∫_a^b pn(x) w(x) dx = ∫_a^b ( ∑_{j=0}^{n} f(xj) ℓj(x) ) w(x) dx
                       = ∑_{j=0}^{n} f(xj) ∫_a^b ℓj(x) w(x) dx.

    ∫_a^b p(x) w(x) dx = ∑_{j=0}^{n} wj p(xj) = In(p),

with weights w0, . . . , wn.
p( x ) = φn+1 ( x )q( x ) + r ( x )
This last statement is a consequence of the fact that In (·) will exactly
integrate all r ∈ Pn . This will be true regardless of our choice for
the distinct nodes { x j } ⊂ [ a, b]. (Recall that the quadrature rule
is constructed so that it exactly integrates a degree-n polynomial
interpolant to the integrand, and in this case the integrand, r, is a
degree n polynomial. Hence In (r ) will be exact.)
Notice that we can force agreement between In(p) and ∫_a^b p(x) w(x) dx
provided

    ∑_{j=0}^{n} wj φ_{n+1}(xj) q(xj) = 0.
where f may not even be defined. Since we are integrating f over the
interval [ a, b], it is crucial that φn+1 has n + 1 distinct roots in [ a, b].
Fortunately, this is one of the many beautiful properties enjoyed by
orthogonal polynomials.
Proof. The result is trivial for φ0. Fix any k ∈ {1, . . . , n + 1}. Suppose
that φk, a polynomial of exact degree k, changes sign at j < k distinct
roots {xℓ^{(k)}}_{ℓ=1}^{j} in the interval [a, b]. Then define

    q(x) = ∏_{ℓ=1}^{j} ( x − xℓ^{(k)} ),

so that φk q does not change sign on [a, b].
[Figure 3.8: The functions φk, q, and φk q from the proof of Theorem 3.9.]
As the weight function w(x) is nonnegative on [a, b], it must also be
that φk q w does not change sign on [a, b]. However, the fact that q ∈ Pj
for j < k implies that

    ∫_a^b φk(x) q(x) w(x) dx = ⟨φk, q⟩ = 0,
    ∫_a^b (ℓk(x))² w(x) dx = ∑_{j=0}^{n} wj (ℓk(xj))² = wk (ℓk(xk))² = wk,

    ∫_a^b f(x) w(x) dx − In(f) = ( f^{(2n+2)}(ξ) / (2n + 2)! ) ∫_a^b ψ²(x) w(x) dx
    v0 = v(λ0),  v1 = v(λ1),  . . . ,  vn = v(λn).
Then, with a bit of work, one can show that the weights for (n + 1)-point
Gaussian quadrature can be computed as

(3.7)    wj = β0 / ‖vj‖₂²        (assuming the normalization (vj)₁ = φ0(λj) = 1),
(3.8)    wj = β0 (vj)₁² / ‖vj‖₂².
    Name               Interval      Weight function w(x)
    Gauss–Legendre     (−1, 1)       1
    Gauss–Chebyshev    (−1, 1)       1/√(1 − x²)
    Gauss–Laguerre     (0, ∞)        e^{−x}
    Gauss–Hermite      (−∞, ∞)       e^{−x²}
    ∫_{−1}^{1} x⁹ / √(1 − x²) dx,
This is a subtle point that many students overlook when first learning
about Gaussian quadrature.
φ2 ( x ) = x2 − 1/3,
and from this polynomial one can derive the 2-point quadrature rule
√
that is exact for cubic polynomials, with roots ±1/ 3. This agrees
with the special 2-point rule derived in Section 3.4.1. The values for
the weights follow simply, w0 = w1 = 1, giving the 2-point Gauss–
Legendre rule
√ √
In ( f ) = f (−1/ 3) + f (1/ 3)
[Figure 3.9: Gauss–Legendre nodes xj and weights wj for n = 4, 8, 16, 32,
64, and 128.]
    β0 = 2;    βk = k² / (4k² − 1),    k = 1, 2, 3, . . . .
Figure 3.9 shows the nodes and weights for six values of n, as com-
puted via the eigenvalue problem. Notice that the points are not uni-
formly spaced, but are slightly more dense at the ends of the interval.
Moreover, the weights are smaller at these ends of the interval.
The table below shows nodes and weights for n = 4, as computed
in MATLAB.
j nodes, x j weights, w j
0 −0.906179845938664 0.236926885056189
1 −0.538469310105683 0.478628670499366
2 0.000000000000000 0.568888888888889
3 0.538469310105683 0.478628670499367
4 0.906179845938664 0.236926885056189
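The table above is straightforward to reproduce with the eigenvalue approach described earlier (the Golub–Welsch idea), using the Legendre recurrence coefficients β0 = 2 and βk = k²/(4k² − 1); a minimal MATLAB sketch:

    % minimal sketch: Gauss-Legendre nodes/weights via the Jacobi matrix
    n = 4;  k = 1:n;
    beta = k.^2 ./ (4*k.^2 - 1);                      % recurrence coefficients
    J = diag(sqrt(beta), 1) + diag(sqrt(beta), -1);   % symmetric tridiagonal Jacobi matrix
    [V, D] = eig(J);
    [x, idx] = sort(diag(D));                         % nodes = eigenvalues of J
    w = 2 * V(1, idx)'.^2;                            % weights = beta_0 * (first component)^2
    [x w]                                             % reproduces the n = 4 table above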
    Tk(x) = cos(k cos⁻¹ x),
    w(x) = 1/√(1 − x²).
The degree-(n + 1) Chebyshev polynomial has the roots

    xj = cos( (j + 1/2)π / (n + 1) ),    j = 0, . . . , n.

In this case all the weights work out to be identical; one can show

    wj = π/(n + 1)
for all j = 0, . . . , n. Figure 3.10 shows these nodes and weights. (See
Süli and Mayers, Problem 10.4 for a sketch of a proof.) One can also
define the monic Chebyshev polynomials according to the recurrence
coefficients

    β0 = π;    β1 = 1/2;    βk = 1/4,    k = 2, 3, 4, . . . .

The resulting polynomials are scaled versions of the usual Chebyshev
polynomials T_{k+1}(x), and thus have the same roots.
Again, we emphasize that the weight function plays a crucial role:
the Gauss–Chebyshev rule based on n + 1 interpolation nodes will
exactly compute integrals of the form

    ∫_{−1}^{1} p(x)/√(1 − x²) dx

for all p ∈ P_{2n+1}. For a general integral

    ∫_{−1}^{1} f(x)/√(1 − x²) dx,

the quadrature rule should be implemented as

    In(f) = ∑_{j=0}^{n} wj f(xj).

(Note that the 1/√(1 − x²) component of the integrand is not evaluated
here; its influence has already been incorporated into the weights {wj}.)
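For instance, a sketch of the five-node rule (n = 4, exact for p ∈ P9) applied to p(x) = x⁸, a test case chosen here for illustration:

    % minimal sketch: 5-node Gauss-Chebyshev rule
    n = 4;  j = 0:n;
    x = cos((j + 1/2)*pi/(n+1));       % Chebyshev roots
    w = pi/(n+1) * ones(size(x));      % equal weights
    p = @(x) x.^8;                     % note: the 1/sqrt(1-x^2) factor is NOT evaluated
    I = sum(w .* p(x));                % = 35*pi/128, the exact weighted integral of x^8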
[Figure 3.10: Gauss–Chebyshev nodes xj and weights wj for n = 4, 8, 16,
32, 64, and 128.]
    ∫_0^∞ f(x) e^{−x} dx.
[Figure 3.11: Gauss–Laguerre nodes xj and weights wj (logarithmic scales)
for n = 4, 8, and 16.]
    αk = 0,    k = 0, 1, . . . ;
    β0 = √π;
    βk = k/2,    k = 1, 2, 3, . . . .
Figure 3.12 shows nodes and weights for various values of n. Though
the interval of integration is infinite, the nodes do not grow as
rapidly as for Gauss–Laguerre quadrature, since the Hermite weight
2
w( x ) = e− x decays more rapidly than the Laguerre weight w( x ) =
e− x . (Again, the nodes and weights in the figure were computed with
Chebfun’s implementation of the Glaser, Liu, and Rokhlin algorithm.)
[Figure 3.12: Gauss–Hermite nodes xj and weights wj for n = 4, 8, and 16.]
    τ(x) = a + ( (b − a)/(d − c) ) (x − c)
for
    ŵj = ( (b − a)/(d − c) ) wj,    x̂j = τ⁻¹(xj),
where {xj}_{j=0}^{n} and {wj}_{j=0}^{n} are the nodes and weights for the quadrature
rule on [c, d].
Be sure to note how this change of variables alters the weight
function. The transformed rule will now have a weight function on
[−1, 1]. If one wishes to integrate, for example, ∫_0^1 x (1 − x²)^{−1/2} dx, it
is not sufficient just to use the change of variables formula described
here. To compute the desired integral, one would have to adjust the
nodes and weights to accommodate w( x ) = (1 − x2 )−1/2 on [0, 1].
• Often Φ becomes increasingly expensive to evaluate as h shrinks.
  (For example, computing Φ(h/2) often requires at least twice as much
  work as Φ(h). In some cases, Φ(h/2) could require 4, or even 8, times as
  much work as Φ(h), i.e., the expense of Φ could grow like 1/h or 1/h² or
  1/h³, etc.)
• The numerical accuracy with which we can evaluate Φ may deteriorate
  as h gets small, due to rounding errors in floating point arithmetic.
  (For an example of the latter, try computing estimates of f′(α) using
  the formula f′(α) ≈ ( f(α + h) − f(α) )/h as h → 0.)
The derivatives here may seem to complicate matters (e.g., what are
the derivatives of a quadrature rule with respect to h?), but we shall
not need to compute them: the key is that the function Φ behaves
smoothly in h. Recalling that Φ(0) = ξ, we can rewrite the Taylor
series for Φ(h) as

    Φ(h) = ξ + c1 h + c2 h² + c3 h³ + · · ·

for some constants {cj}_{j=1}^{∞}. (For example, c1 = Φ′(0).)

(For the sake of clarity let us discuss a concrete case, elaborated upon in
Example 3.6 below. Suppose we wish to compute ξ = f′(α) using the finite
difference formula
    Φ(h) = ( f(α + h) − f(α) ) / h.
The quotient rule gives
    Φ′(h) = ( f(α) − f(α + h) ) / h² + f′(α + h) / h,
which will depend smoothly on h provided f is smooth near α. In particular,
a Taylor expansion for f gives
    f(α + h) = f(α) + h f′(α) + (1/2) h² f″(α) + (1/6) h³ f‴(η)
for some η ∈ [α, α + h]. Substitute this formula into the equation for Φ′(h)
and simplify to get
    Φ′(h) = ( f′(α + h) − f′(α) ) / h − (1/2) f″(α) − (1/6) h f‴(η).
Now this expression leads to a clean formula for the first coefficient of the
Taylor series for Φ(h).)

This expansion implies that taking Φ(h) as an approximation for ξ
incurs an O(h) error. Halving the parameter h should roughly halve
the error, according to the expansion

    Φ(h/2) = ξ + c1 (h/2) + c2 (h²/4) + c3 (h³/8) + · · · .

Here comes the trick that is key to the whole lecture: Combine the
expansions for Φ(h) and Φ(h/2) in such a way that eliminates the
O(h) term. In particular, define

    Ψ(h) := 2Φ(h/2) − Φ(h)
          = 2( ξ + c1 h/2 + c2 h²/4 + c3 h³/8 + · · · ) − ( ξ + c1 h + c2 h² + c3 h³ + · · · )
          = ξ − (1/2) c2 h² − (3/4) c3 h³ − · · · .

Since

    Ψ(h/2) = ξ − (1/8) c2 h² − (3/32) c3 h³ − · · · ,

we have

    Θ(h) := ( 4Ψ(h/2) − Ψ(h) ) / 3 = ξ + (1/8) c3 h³ + · · · .
To compute Θ(h), we must have access to both Ψ(h) and Ψ(h/2).
These, in turn, require Φ(h), Φ(h/2), and Φ(h/4). In many cases,
Φ becomes increasingly expensive to compute as the parameter h is
reduced. Thus there is some practical limit to how small we can take
h when evaluating Φ(h).
One could continue this procedure repeatedly, each time improv-
ing the accuracy by one order, at the cost of one additional Φ com-
putation with a smaller h. To facilitate generalization and to avoid a
further tangle of Greek characters, we adopt a new notation: Define
    R(j, 0) := Φ(h/2^j),    j ≥ 0;
    R(j, k) := ( 2^k R(j, k − 1) − R(j − 1, k − 1) ) / (2^k − 1),    j ≥ k > 0.
h Φ(h) error
1 4.670774270 1.95249 × 100
1/2 3.526814484 8.08533 × 10−1
1/4 3.088244516 3.69963 × 10−1
1/8 2.895480164 1.77198 × 10−1
1/16 2.805025851 8.67440 × 10−2
1/32 2.761200889 4.29191 × 10−2
1/64 2.739629446 2.13476 × 10−2
1/128 2.728927823 1.06460 × 10−2
1/256 2.723597892 5.31606 × 10−3
1/512 2.720938130 2.65630 × 10−3
[Figure: error of Φ(h) versus h, decaying like O(h).]
j R( j, 0) R( j, 1) R( j, 2) R( j, 3) R( j, 4)
0 4.67077427047160
1 3.52681448375804 2.38285469704447
2 3.08824451601118 2.64967454826433 2.73861449867095
3 2.89548016367188 2.70271581133258 2.72039623235534 2.71779362288168
4 2.80502585140344 2.71457153913500 2.71852344840247 2.71825590783778 2.71828672683485
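The numbers in this table are consistent with the finite-difference example Φ(h) = ( f(α + h) − f(α) )/h with f(x) = e^x and α = 1, so that ξ = e; assuming that, the following MATLAB sketch rebuilds the extrapolation table:

    % minimal sketch: Richardson extrapolation (assumes f = exp, alpha = 1)
    f = @(x) exp(x);  a = 1;
    Phi = @(h) (f(a+h) - f(a)) ./ h;
    J = 4;  R = zeros(J+1);
    for j = 0:J, R(j+1,1) = Phi(2^(-j)); end          % R(j,0) = Phi(h/2^j), h = 1
    for k = 1:J
        for j = k:J
            R(j+1,k+1) = (2^k*R(j+1,k) - R(j,k)) / (2^k - 1);
        end
    end
    R          % lower triangle matches the table; R(5,5) is accurate to ~8 digits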
smooth, then Φ(h) will not have smooth derivatives, and the accuracy
breaks down. The accuracy also eventually degrades because
of rounding errors that subtly pollute the initial column of data, as
shown in Figure 3.14.

(3.9)    R(j, k) := ( 2^{r_k} R(j, k − 1) − R(j − 1, k − 1) ) / (2^{r_k} − 1)    for j ≥ k > 0.
Notice that T(h) only makes sense (as the composite trapezoid rule)
when h = (b − a)/n for some integer n. (If you find this restriction on h
distracting, just define T(h) to be a sufficiently smooth interpolation
between the values of T((b − a)/n) for n = 1, 2, . . . .) Notice that T((b − a)/n)
    R(j, 0) = T(h/2^j)    for j ≥ 0,
    R(j, k) = ( 4^k R(j, k − 1) − R(j − 1, k − 1) ) / (4^k − 1)    for j ≥ k > 0.
This procedure is called Romberg integration.
In cases where f ∈ C ∞ [ a, b] (or if f has many continuous deriva-
tives), the Romberg table will converge to high accuracy, though it
may be necessary to take h to be relatively small before this is ob-
served. When f does not have many continuous derivatives, each
column of the Romberg table will still converge to the true integral,
but not at the ever-improving clip we expect for smoother functions.
This procedure’s utility is best appreciated through an example.
Example 3.7. For purposes of demonstration, we should use an
integral we know exactly, say
    ∫_0^π sin(x) dx = 2.
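A minimal MATLAB sketch of Romberg integration for this test integral (a Romberg table with five rows, built from the composite trapezoid rule):

    % minimal sketch: Romberg integration of sin(x) over [0, pi]
    f = @(x) sin(x);  a = 0;  b = pi;
    T = @(n) (b-a)/n * ( f(a)/2 + sum(f(a + (1:n-1)*(b-a)/n)) + f(b)/2 );
    J = 4;  R = zeros(J+1);
    for j = 0:J, R(j+1,1) = T(2^j); end               % R(j,0) = T(h/2^j), h = b - a
    for k = 1:J
        for j = k:J
            R(j+1,k+1) = (4^k*R(j+1,k) - R(j,k)) / (4^k - 1);
        end
    end
    R(J+1,J+1)                                        % rapidly approaches 2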
f ( x ) = M − x + e sin( x ).
This is a simple equation in one variable, x ∈ IR, and it turns out that
it is not particularly difficult to solve. Other examples are compli-
cated by nastier nonlinearities, multiple solutions (in which case one
might like to find them all), ill-conditioned zeros (where f ( x ) ≈ 0 for
x far from the true zeros of f ), solutions in the complex plane, and
expensive f evaluations.
Optimization is closely allied to the solution of nonlinear equa-
tions, since one finds extrema of F : IR → IR by solving
F 0 ( x ) = 0.
∇ F (x) = 0,
min F (x),
x ∈S
4.1.1 Bisection
Given a bracket [ a, b] for which f takes opposite sign at a and b, the
simplest technique for finding x∗ is the bisection algorithm:
For k = 0, 1, 2, . . .
1. Compute f (ck ) for ck = 21 ( ak + bk ).
2. If f(ck) = 0, exit; otherwise, repeat with

       [a_{k+1}, b_{k+1}] := [ak, ck]   if f(ak) f(ck) < 0;
                             [ck, bk]   if f(ck) f(bk) < 0.

3. Stop when the interval width b_{k+1} − a_{k+1} is sufficiently small,
   or if f(ck) = 0.
How does this method converge? Not bad for such a simple idea. At
We say this iteration converges linearly (the log of the error is bounded
by a straight line when plotted against iteration count – see the next
example) with rate ρ = 1/2. Practically, this means that the error is
cut in half at each iteration, independent of the behavior of f . Reduction
of the initial bracket width by ten orders of magnitude would require
roughly log2 1010 ≈ 33 iterations. If f is fast to evaluate, this conver-
gence will be pretty quick; moreover, since the algorithm only relies
on our ability to compute the sign of f ( x ) accurately, the algorithm is
robust to strange behavior in f (such as local minima).
    M = E − e sin E.

Cast this as the root-finding problem f(E) = 0, where

    f(E) = M − E + e sin E.

In this example we set e = 0.8 and M = 3π/4, yielding the function
shown in the margin. Judging from this plot, the desired root E∗
falls in the interval [2, 3]. Using the initial bracket [a, b] = [2, 3], the
bisection method converges as steadily as expected, cutting the error
in half at every step. Figure 4.2 shows the convergence to the exact
root E∗ = 2.69889638445749738544 . . . .
[Margin figure: plot of f(E) for 0 ≤ E ≤ 6.]
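A minimal MATLAB sketch of bisection for this Kepler example:

    % minimal sketch: bisection for Kepler's equation, e = 0.8, M = 3*pi/4
    f = @(E) 3*pi/4 - E + 0.8*sin(E);
    a = 2;  b = 3;                       % initial bracket with f(a)*f(b) < 0
    for k = 1:50
        c = (a + b)/2;                   % midpoint
        if f(a)*f(c) < 0, b = c; else, a = c; end
    end
    c                                    % approx 2.698896384457...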
[Figure 4.2: bisection error versus iteration count k.]
    pk(x) = f(ak) + ( ( f(bk) − f(ak) ) / (bk − ak) ) (x − ak).
Note that only Step 3 differs significantly from the bisection method.
The former algorithm forces the bracket width bk − ak to zero as it
homes in on the root. In contrast, there is no mechanism in the regula
falsi algorithm to drive the bracket width to zero: it will still always
converge (in exact arithmetic) even though the bracket length does
(4.1)        ≤ f(x) + f′(z)(z − x).

    f(z) = λ f(z) + (1 − λ) f(z)

Notice that

    p(z) = f(a) + ( ( f(b) − f(a) ) / (b − a) ) ( λa + (1 − λ)b − a )
         = f(a) + ( ( f(b) − f(a) ) / (b − a) ) (1 − λ)(b − a)
         = λ f(a) + (1 − λ) f(b)
         ≥ f( λa + (1 − λ)b )
         = f(z),
    [a_{k+1}, b_{k+1}] = [ck, bk].

Notice that ck > ak = c_{k−1}. If the algorithm never finds an exact root,
it forms a sequence of estimates {ck} that is monotonically increasing
and bounded above by b0. The monotone convergence theorem in
real analysis ensures that any bounded monotone sequence of real
numbers converges. (For a proof, see Rudin, Principles of Mathematical
Analysis, Theorem 3.14.)

    γ = ( γ f(b0) − b0 f(γ) ) / ( f(b0) − f(γ) ).
Example 4.3. Lest Example 4.2 suggest that regula falsi is always superior
to bisection, we now consider a function for which regula falsi
converges very slowly. Sketch out a few sample functions. You will
soon see how to design an f such that the root ck of the linear approximation
converges slowly toward x∗ as k increases. The function
should be relatively flat and small in magnitude in a large region
near the root. One such example is

    f(x) = sign(tan⁻¹(x)) | (2/π) tan⁻¹(x) |^{1/20} + 19/20,

which has a single root at x∗ ≈ −0.6312881 . . . . This function is illustrated
in the margin. Figure 4.3 compares convergence of bisection
and regula falsi for this f with the initial bracket [−10, 10]. The small
value of f at the left end of the bracket ensures that [a1, b1] = [c0, b]
4.1.3 Accuracy
Here we have assumed that we calculate f(x) to perfect accuracy,
an unrealistic expectation on a computer. If we attempt to compute
x∗ to very high accuracy, we will eventually experience errors due
to inaccuracies in our function f(x). For example, f(x) may come
from approximating the solution to a differential equation, where there
[Figure: error versus iteration k for bisection and regula falsi.]
4.1.4 Conditioning
When |f′(x0)| ≫ 0, the desired root is easy to pick out. In cases
where f′(x0) ≈ 0, the root will be ill-conditioned, and it can be difficult
to locate. This is the case, for example, when x0 is a multiple root of
f. (You may find it strange that the more copies of a root you have,
the more difficult it can be to compute it!) In such cases bisection has
the advantage that it only depends on the sign of f.
[Margin figures: a well-conditioned root; an ill-conditioned root.]

4.1.5 Deflation
What is one to do if multiple distinct roots are required? One approach
is to choose a new initial bracket that omits all known roots.
Another technique, though numerically fragile, is to work with
f̂(x) := f(x)/(x − x0), where x0 is the previously computed root.
(4.4)    f(x∗) = f(xk) + f′(xk)(x∗ − xk) + (1/2) f″(ξ)(x∗ − xk)²,

which implies

    x∗ ≈ xk − f(xk)/f′(xk).

We get an iterative method by replacing x∗ in this formula with x_{k+1}.

(4.5)    x_{k+1} := xk − f(xk)/f′(xk).

(4.6)    e_{k+1} = ek − f(xk)/f′(xk).
    0 = f(xk) − f′(xk) ek + (1/2) f″(ξ) ek².

Solving this equation for f(xk) and substituting that formula into the
expression (4.6) for e_{k+1} gives

    e_{k+1} = ek − ( f′(xk) ek − (1/2) f″(ξ) ek² ) / f′(xk)
            = ( f″(ξ) / (2 f′(xk)) ) ek².

When xk converges to x∗, ξ ∈ [x∗, xk] also converges to x∗. Supposing
that x∗ is a simple root, so that f′(x∗) ≠ 0, the above analysis
suggests that when xk is near x∗,

    |e_{k+1}| ≤ C |ek|²

for constant

    C = |f″(x∗)| / ( 2 |f′(x∗)| )

independent of k. Thus we say that if f′(x∗) ≠ 0, then Newton's
method converges quadratically, roughly meaning that each iteration
of Newton's method doubles the number of correct digits.
Compare this to bisection, where

    |e_{k+1}| ≤ (1/2) |ek|,

meaning that the error was halved at each step. Significantly, Newton's
method will often exhibit a transient period of linear convergence
as xk is gradually improved; once xk gets close enough to x∗,
the behavior transitions to quadratic convergence and full machine
precision is attained in just a couple more iterations.
(Margin figure: Newton's method for finding zeros of f(x) = x⁷ − 1 in the
complex plane. This function has seven zeros in the complex plane, the
seventh roots of unity (shown as white dots in the plot). The color encodes
the convergence of Newton's method: for each point x0 ∈ C, iterate Newton's
method until convergence; the color corresponds to the root to which x0
converged. In some regions, small changes to x0 get attracted to vastly
different roots.)
√
Example 4.4. One way to compute 2 is to find the zero of
f ( x ) = x2 − 2.
Figure 4.5 shows convergence in just a few steps for the realistic start-
ing guess x0 = 1.25. The plot also shows the convergence behavior
for the (entirely ridiculous) starting guess x0 = 1000, to illustrate a
linear phase of convergence as the iterate gradually approaches the
region of quadratic convergence. Once xk is sufficiently close to x∗ ,
convergence proceeds very quickly.
Table 4.1 shows the iterates for x0 = 1000, computed in exact arithmetic
in Mathematica, and displayed here to more than eighty digits.
This is a bit excessive: in the floating point arithmetic we have used
k xk
0 1000.00000000000000000000000000000000000000000000000000000000000000000000000000000000000
1 500.00100000000000000000000000000000000000000000000000000000000000000000000000000000000
2 250.00249999600000799998400003199993600012799974400051199897600204799590400819198361603
3 125.00524995800046799458406305526512856598014823595622393441695800477446685799463896484
4 62.51062464301703314888691358403320464529759944325744566631164600631017391478309761341
5 31.27130960206219455596422358771700548374565801842332086536365236578278080406153827364
6 15.66763299486836640030755527100281652065100159710324459452581543767403479921834012248
7 7.89764234785635806719051360934236238116968365174167025116461034160777628217364960111
8 4.07544124051949892088798573387067133352991149961309267159333980191548308075360961862
9 2.28309282439255383986306690358177946144339233634377781606055538481637200759555376236
10 1.57954875240601536527547001727498935127463981776389016188975791363939586265860323251
11 1.42286657957866825091209683856309818309310929428763928162890934673847036238184992693
12 1.41423987359153062319364616441120035182529489347860126716395746896392690040774558375
13 1.41421356261784851265589000359174396632207628548968908242398944391615436335625360056
14 1.41421356237309504882286807775717118221418114729423116637254804377031332440406155716
15 1.41421356237309504880168872420969807856983046705949994860439640079460765093858305190
16 1.41421356237309504880168872420969807856967187537694807317667973799073247846210704774
exact 1.41421356237309504880168872420969807856967187537694807317667973799073247846210703885038753 . . .
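In ordinary double precision, the same iteration is just a few lines of MATLAB; the printed errors show the long linear phase followed by the quadratic burst:

    % minimal sketch: Newton's method for f(x) = x^2 - 2
    f = @(x) x.^2 - 2;   fp = @(x) 2*x;
    x = 1000;                            % deliberately poor starting guess
    for k = 1:20
        x = x - f(x)/fp(x);
        fprintf('%2d  %.16f   |e_k| = %.3e\n', k, x, abs(x - sqrt(2)))
    end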
    x_{k+1} = xk − α f(xk)
    |e_{k+1}| ≤ C |ek|^p
    C = |f″(x∗)| / ( 2 |f′(x∗)| ).
    Φ(x) = x − f(x)/f′(x).
    x∗ = Φ(x∗).
    Φ(x) = x − f(x)/f′(x),

    Φ′(x) = 1 − ( f′(x)² − f(x) f″(x) ) / f′(x)² = f(x) f″(x) / f′(x)².

    C = |Φ″(x∗)| / 2 = |f″(x∗)| / ( 2 |f′(x∗)| ),

as expected.
What happens when f 0 ( x∗ ) = 0? The direct iteration framework
makes it straightforward to analyze this situation. If x∗ is a multiple
root, we might worry that Newton’s method might have trouble con-
verging, since we are dividing f ( xk ) by f 0 ( xk ), and both quantities
are nearing zero as xk → x∗ . To study convergence, we investigate
    lim_{x→x∗} Φ′(x) = lim_{x→x∗} f(x) f″(x) / f′(x)².

This limit has the indeterminate form 0/0. Assuming sufficient differentiability,
we can invoke l'Hôpital's rule:

    lim_{x→x∗} f(x) f″(x) / f′(x)² = lim_{x→x∗} ( f′(x) f″(x) + f(x) f‴(x) ) / ( 2 f′(x) f″(x) ),

which is still of the form 0/0; applying l'Hôpital's rule once more,

    lim_{x→x∗} f(x) f″(x) / f′(x)² = lim_{x→x∗} ( f″(x)² + 2 f′(x) f‴(x) + f(x) f⁽⁴⁾(x) ) / ( 2( f′(x) f‴(x) + f″(x)² ) )
                                   = lim_{x→x∗} f″(x)² / ( 2 f″(x)² ) = 1/2.
Newton’s method is fast if one has a good initial guess x0 . Even then,
it can be inconvenient and expensive to compute the derivatives
f 0 ( xk ) at each iteration. The final root finding algorithm we consider
is the secant method, a kind of quasi-Newton method based on an ap-
proximation of f 0 . It can be thought of as a hybrid between Newton’s
method and regula falsi.
    f′(x) ≈ ( f(x + h) − f(x) ) / h
for some small h. (Recall that too-small h will give a bogus answer
due to rounding errors, so some caution is needed; see Section 3.5.)
What if we replace f 0 ( xk ) in Newton’s method with this sort of ap-
proximation? The natural algorithm that emerges is the secant method,
    x_{k+1} = xk − f(xk) (xk − x_{k−1}) / ( f(xk) − f(x_{k−1}) )
            = ( x_{k−1} f(xk) − xk f(x_{k−1}) ) / ( f(xk) − f(x_{k−1}) ).

Note the similarity between this formula and the regula falsi iteration:

    ck = ( ak f(bk) − bk f(ak) ) / ( f(bk) − f(ak) ).

(4.7)    p(x) = f(xk) + ( (x − xk) / (x_{k−1} − xk) ) ( f(x_{k−1}) − f(xk) ).
Once again we can put the interpolation error formula (Theorem 1.3)
to good use. Assuming that f ∈ C²(IR), for any x ∈ IR one can write

    f(x) − p(x) = ( f″(ξ) / 2 ) (x − xk)(x − x_{k−1}),

where ξ falls within the extremes of x, xk, and x_{k−1}. Since f(x∗) = 0,
we can thus write

    0 = p(x∗) + ( f″(ξ) / 2 ) (x∗ − xk)(x∗ − x_{k−1}).

Defining ej := xj − x∗ as usual, this last equation is

    0 = p(x∗) + ( f″(ξ) / 2 ) ek e_{k−1}.

Substituting formula (4.7) for p gives

(4.8)    0 = f(xk) + ( (x∗ − xk) / (x_{k−1} − xk) ) ( f(x_{k−1}) − f(xk) ) + ( f″(ξ) / 2 ) ek e_{k−1}.

Now recall that, by design, the secant method picks x_{k+1} as the zero
of p, i.e.,

(4.9)    0 = p(x_{k+1}) = f(xk) + ( (x_{k+1} − xk) / (x_{k−1} − xk) ) ( f(x_{k−1}) − f(xk) ).

Combining (4.8) and (4.9),

         0 = ( (x_{k+1} − x∗) / (x_{k−1} − xk) ) ( f(x_{k−1}) − f(xk) ) − ( f″(ξ) / 2 ) ek e_{k−1}
(4.10)     = ( ( f(x_{k−1}) − f(xk) ) / (x_{k−1} − xk) ) e_{k+1} − ( f″(ξ) / 2 ) ek e_{k−1}.

By the mean value theorem there is some η between x_{k−1} and xk with

    f′(η) = ( f(x_{k−1}) − f(xk) ) / (x_{k−1} − xk),

so (4.10) becomes

    0 = f′(η) e_{k+1} − ( f″(ξ) / 2 ) ek e_{k−1},
    C = f″(x∗) / ( 2 f′(x∗) ),

(4.13)    e_{j+1} ≈ M e_j^r

    ek ≈ M e_{k−1}^r

implies that

    e_{k−1} ≈ M^{−1/r} e_k^{1/r}.

This equation must agree with the error form (4.13) with j = k:

    M = C^{r/(r+1)},
    r² − r − 1 = 0.
    |e_{k+1}| ≤ M |ek|^φ

for a constant M > 0. Note that φ < 2, so, in the region of asymptotic
convergence (xk close to x∗), one step of the secant method will make
a bit less progress to the root than one step of Newton's method.
Though you may regret that the secant method does not recover
the quadratic convergence of Newton's method, take solace in the
fact that the secant method requires only one function evaluation
f(xk) at each iteration, as opposed to Newton's method, which requires
f(xk) and f′(xk). (Of course, for the secant method one stores the
f(x_{k−1}) value computed during the previous iteration.) Typically the
derivative is more expensive to compute than the function itself.
Assuming that evaluating f(xk) and f′(xk) requires the same amount
of effort, then we can compute two steps of the secant method for
roughly the same cost as one step of Newton's method. These two
steps of the secant method combine to give an improved convergence
rate:
    |e_{k+2}| ≤ M |e_{k+1}|^φ ≤ M ( M |ek|^φ )^φ ≤ M^{1+φ} |ek|^{φ²},

where φ² = (1/2)(3 + √5) ≈ 2.62 > 2. Hence, in terms of computing
time, the secant method can actually be more efficient than Newton's
method. (This discussion is drawn from §3.3 of Kincaid and Cheney,
Numerical Analysis, 3rd ed.)
Figure 4.5 compares the convergence of the secant method to Newton's
method for the function f(x) = x² − 2, which we can use to
compute x∗ = √2 as in Figure 4.5. This example starts with the (bad)
initial guess x0 = 10. To ensure that the secant method is not hampered
by a bad value of x1, this experiment uses the same x1 value
computed using Newton's method. After these two initial steps,
both methods steadily converge, but Newton's method takes fewer
iterations, in agreement with the theory derived in this and the last
lecture. Table 4.2 shows the iterates xk and magnitude of the errors
|ek| for both methods.
[Figure 4.5: error |ek| versus iteration k for Newton's method and the
secant method applied to f(x) = x² − 2.]
x ( t0 ) = x0 .
That is, we are given a formula for the derivative of some unknown
function x (t), together with a single value of the function at some
initial time, t0 . The goal is to use this information to determine x (t) at
all points t beyond the initial time.
    x′(t) = λx(t),
    x0 = x(0) = e^{λ·0} c = c,

provides a good test case for numerical algorithms. Moreover, it is
the prototypical linear ODE; from it, we gain insight into the local
behavior of nonlinear ODEs. (Writing λ = α + iβ: since itβ is purely
imaginary, |e^{itβ}| = 1, so |e^{tλ}| = e^{tα}. Thus |e^{tλ}| → 0 as t → ∞ if
Re λ < 0, while |e^{tλ}| → ∞ as t → ∞ if Re λ > 0.)
Applications typically give equations whose solutions
cannot be expressed as simply as the solution of this linear model
problem. Among the tools that improve our understanding of more
difficult problems is the direction field of the function f(t, x), a key
[Figure 5.1: Force field for the equation x′(t) = x(t), for which f(t, x) = x.]
technique from the sub-discipline of the qualitative analysis of ODEs.
(For an elementary introduction to the qualitative analysis of ODEs, see
Hubbard and West, Differential Equations: A Dynamical Systems Approach,
Part I, Springer-Verlag, 1991.)
Here is the critical idea: The function f(t, x(t)) reveals the slope of the
solution x(t) going through any point in the (t, x(t)) plane. Hence
one can get a good impression about the behavior of a differential
equation by plotting these slopes throughout some interesting region
of the (t, x(t)) plane.
To plot the direction field, let the horizontal axis represent t, and
the vertical axis represent x. Then divide the (t, x) plane with regular
grid points, {(tj, xk)}. Centered at each grid point, draw a line
segment whose slope is f(tj, xk). To get a rough impression of the
solution of the differential equation x′(t) = f(t, x) with x(t0) = x0,
begin at the point (t0, x0), and follow the direction of the slope lines.
Figure 5.1 shows the direction field for x′(t) = x(t), giving
f(t, x) = x. Since f does not depend directly on t, the differential
equation is autonomous. In the plot of the direction field, for a fixed
value of x, the arrows point in the same direction and have the same
magnitude for all t.
One only needs a few simple MATLAB commands to produce a
direction field like the one seen in Figure 5.1, thanks to the built-in
quiver routine.
f = @(t,x) x;                                  % x' = f(t,x) = x
x = linspace(-3,3,15); t = linspace(0,6,15);   % grid of points at which to plot the slope
[T,X] = meshgrid(t,x);                         % turn grid vectors into matrices
figure(1), clf
quiver(T,X,ones(size(T)),f(T,X)), hold on      % produce a "quiver" plot
axis([min(t) max(t) min(x) max(x)])            % adjust the axes
Figure 5.2 repeats Figure 5.1, but now showing solution trajectories
for x (0) = 0.1 and x (0) = −0.01. Notice how these solutions follow
[Figure 5.2: Force field for the equation x′(t) = x(t), now showing
solutions for x(0) = 0.1 (in blue) and x(0) = −0.01 (in red).]
Example 5.2. Next consider an equation that, for most x (0), lacks an
elementary solution that can be expressed in closed form,
The direction field for sin( xt) is shown below. Though we don’t have
access to the exact solution, it is a simple matter to compute accu-
rate approximations. Several solutions (for x (0) = 3, x (0) = 0, and
x (0) = −2) are superimposed on the direction field. These were
computed using a one-step method of the kind we will discuss mo-
mentarily. (Those areas where up and down arrows appear to cross
[Figure 5.3: Force field for the equation x′(t) = sin(t x(t)), showing
solutions for x(0) = 3 (blue), x(0) = 0 (black), and x(0) = −2 (red).]
All the techniques for solving scalar initial value problems described
in this course can be applied to systems of this type.
Note, in particular, that the initial conditions x(t0) and x′(t0) must both be supplied. (In some cases, one instead knows x(t) at two distinct points, x(t0) = x0 and x(tfinal) = xfinal, leading to an ODE boundary value problem.) This second order equation (and higher-order ODEs as well) can always be written as a first order system of equations. Define x1(t) = x(t), and let x2(t) = x′(t). Then
    x1′(t) = x2(t),
    x2′(t) = f(t, x1(t), x2(t)).
Writing this in vector form, x(t) = [x1(t), x2(t)]^T, and the differential equation becomes (fonts matter: x(t) denotes a scalar quantity, while x(t) is a vector)
    x′(t) = [ x1′(t) ; x2′(t) ] = [ x2(t) ; f(t, x1(t), x2(t)) ] = f(t, x(t)).
x 00 (t) = − x (t),
    x″(t) = −x(t) / ‖x(t)‖2³,
Since x(t) ∈ IR3 , this second order equation reduces to a system of six
first order equations.
Theorem 5.1 (Picard's Theorem). (For a proof, see Süli and Mayers, Section 12.1.) Let f(t, x) be a continuous function on the rectangle
    x0 = x(t0).
From elementary calculus we know that
    x′(t) = lim_{h→0} (x(t + h) − x(t))/h.
This definition of the derivative inspires our first method. Apply it at
time t0 with a small but finite time step h > 0 to obtain
    x′(t0) ≈ (x(t0 + h) − x(t0))/h.
Euler’s Method: x k +1 = x k + h f ( t k , x k ).
[Sketch: one step of Euler's method. From the point (t0, x0), follow the line of slope f(t0, x0) until t1; the resulting value x1 approximates the exact solution x(t) at t1.]
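As a concrete illustration, here is a minimal MATLAB sketch of Euler's method; the function name euler (saved as euler.m) and its calling sequence are illustrative choices, not part of the original notes.

function [t,x] = euler(f, tspan, x0, n)
% Forward Euler for the scalar IVP x'(t) = f(t,x), x(tspan(1)) = x0, using n steps.
h = (tspan(2) - tspan(1))/n;          % constant step size
t = tspan(1) + h*(0:n)';              % time grid t_0, ..., t_n
x = zeros(n+1,1);  x(1) = x0;         % storage for the iterates x_k
for k = 1:n
    x(k+1) = x(k) + h*f(t(k), x(k));  % x_{k+1} = x_k + h f(t_k, x_k)
end

For instance, [t,x] = euler(@(t,x) x, [0 6], 0.1, 60) approximates the blue trajectory of Figure 5.2 with step size h = 0.1.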
Example 5.4. Next, consider the second example, x 0 (t) = sin(tx (t)),
this time with x (0) = 5. Since we do not know the exact solution, we
can only compare approximate answers, here obtained with h = 0.5
and h = 0.1. For t > 4, the solutions differ completely from one another! Again, the smaller step size gives the more accurate solution. In
the plot below, the direction field is shown together with the approx-
imate solutions. Note that f (t, x ) = sin(tx ) varies with x, so when
the h = 0.5 solution diverges from the h = 0.1 solution, very different
values of f are used to generate iterates. The h = 0.5 solution ‘jumps’
over the correct asymptote, and provides a very misleading answer.
Next consider the equation
    x′(t) = 1 + x(t)²
with x(0) = 0. This equation looks innocuous enough; indeed, you might notice that the exact solution is x(t) = tan(t). (This example is given in Kincaid and Cheney, page 525.) The true solution thus blows up in finite time, at t = π/2. In contrast, Euler's method,
    xk+1 = xk + h f(tk, xk) = xk + h(1 + xk²) = h + xk(1 + h xk),
will always produce some finite quantity; it will never give the infi-
nite answer at t = π/2. Still, as we see in the plots below, Euler’s
method captures the qualitative behavior well, with the iterates grow-
ing very large soon after t = π/2. (Notice that the vertical axis is
logarithmic, so by t = 2, the approximation with time step h = 0.05
exceeds 1010 .)
Recall Euler's method, xk+1 = xk + h f(tk, xk).
First consider a modification that might not look like such a big improvement: simply replace f(tk, xk) by f(tk+1, xk+1) to obtain
    xk+1 = xk + h f(tk+1, xk+1),
called the backward Euler method. Because xk+1 depends on the value f(tk+1, xk+1), this scheme is called an implicit method; to compute xk+1, one needs to solve a (generally nonlinear) system of equations, rather more involved than the simple update required for the forward Euler method. (At each step, one must find a zero of the function G(xk+1) = xk+1 − xk − h f(tk+1, xk+1) using, for example, Newton's method or the secant method. If h is small and f is not too wild, we might hope that we could get an initial guess xk+1 ≈ xk, or xk+1 ≈ xk + h f(tk, xk). Note that this nonlinear iteration could require multiple evaluations of f to advance the backward Euler method by one time step.)
One can improve on both Euler methods by averaging the updates they make to xk:
    (5.1)    xk+1 = xk + (h/2)( f(tk, xk) + f(tk+1, xk+1) ).
This method is the trapezoid method, for it can be derived by integrating the equation x′(t) = f(t, x(t)),
    ∫_{tk}^{tk+1} x′(t) dt = ∫_{tk}^{tk+1} f(t, x(t)) dt,
and approximating the integral on the right using the trapezoid rule. The fundamental theorem of calculus gives the exact formula for the integral on the left, x(tk+1) − x(tk). Together, this gives
    (5.2)    x(tk+1) − x(tk) ≈ ((tk+1 − tk)/2) ( f(tk, x(tk)) + f(tk+1, x(tk+1)) ).
Replacing the inaccessible exact values x (tk ) and x (tk+1 ) with their
approximations xk and xk+1 , and using the time-step h = tk+1 − tk ,
equation (5.2) suggests
    xk+1 − xk = (h/2)( f(tk, xk) + f(tk+1, xk+1) ).
Rearranging this equation gives the trapezoid method (5.1) for xk+1 .
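To make the implicit step concrete, here is one way a single backward Euler step could be carried out in MATLAB, using fzero to find a zero of G and the forward Euler value as the initial guess, as suggested in the margin note above. This is only a sketch: the variables f, tk, xk, and h are assumed to be defined, and x is assumed scalar.

G     = @(z) z - xk - h*f(tk + h, z);   % backward Euler: x_{k+1} is a zero of G
guess = xk + h*f(tk, xk);               % forward Euler value as an initial guess
xkp1  = fzero(G, guess);                % one implicit step

The same idea applies to the trapezoid method (5.1), with G(z) = z − xk − (h/2)( f(tk, xk) + f(tk + h, z) ).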
Like the backward Euler method, the trapezoid rule is implicit,
due to the f (tk+1 , xk+1 ) term. To obtain a similar explicit method,
replace xk+1 by its approximation from the explicit Euler method. This yields Heun's method:
    xk+1 = xk + (h/2)( f(tk, xk) + f(tk + h, xk + h f(tk, xk)) ).
Note that this method can be implemented using only two evaluations of the function f(t, x).
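In MATLAB, one Heun step might look like the following sketch (tk, xk, h, and a function handle f are assumed to be defined):

s1 = f(tk, xk);               % slope at the left endpoint
s2 = f(tk + h, xk + h*s1);    % slope at the Euler prediction of x_{k+1}
xkp1 = xk + (h/2)*(s1 + s2);  % average the two slopes

Exactly two evaluations of f are used, as noted above.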
The modified Euler method takes a similar approach to Heun’s
method:
    xk+1 = xk + h f( tk + (1/2)h, xk + (1/2)h f(tk, xk) ).
The most famous Runge–Kutta method uses four stages per step,
    k1 = f(tk, xk)
    k2 = f(tk + (1/2)h, xk + (1/2)h k1)
    k3 = f(tk + (1/2)h, xk + (1/2)h k2)
    k4 = f(tk + h, xk + h k3),
combined as xk+1 = xk + (h/6)(k1 + 2k2 + 2k3 + k4).
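A minimal sketch of the resulting integrator follows; the name rk4 and its interface are illustrative, chosen to mirror the euler sketch above.

function [t,x] = rk4(f, tspan, x0, n)
% Classical four-stage Runge-Kutta method for x'(t) = f(t,x), x(tspan(1)) = x0.
h = (tspan(2) - tspan(1))/n;
t = tspan(1) + h*(0:n)';
x = zeros(n+1,1);  x(1) = x0;
for j = 1:n
    k1 = f(t(j),       x(j));
    k2 = f(t(j) + h/2, x(j) + h*k1/2);
    k3 = f(t(j) + h/2, x(j) + h*k2/2);
    k4 = f(t(j) + h,   x(j) + h*k3);
    x(j+1) = x(j) + (h/6)*(k1 + 2*k2 + 2*k3 + k4);   % combine the four stages
end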
1. The error due to the fact that, even if the approximation were exact at tk, the updated value xk+1 at tk+1 will not be exact. This is called truncation error, or local error.
ek : = x ( tk ) − xk .
    (x(tk+1) − x(tk))/h ≈ x′(tk) = f(tk, x(tk)).
This type of error is made at every step. To generalize it to all explicit one-step methods, which take the form xk+1 = xk + h Φ(tk, xk; h), define the truncation error
    Tk = (x(tk+1) − x(tk))/h − Φ(tk, x(tk); h).
If Tk → 0 as h → 0, the method is consistent. If Tk = O(h p ), the
method has order-p truncation error.
For Euler's method, Φ(tk, xk; h) = f(tk, xk), so
    Tk = (x(tk+1) − x(tk))/h − Φ(tk, x(tk); h)
       = (x(tk+1) − x(tk))/h − f(tk, x(tk))
       = (x(tk+1) − x(tk))/h − x′(tk).
This last substitution, f (tk , x (tk )) = x 0 (tk ), is valid because f is eval-
uated at the exact solution x (tk ). (Recall that in general, f (tk , xk ) 6=
x 0 (tk ).) Assuming that x (t) ∈ C2 [tk , tk+1 ], we can expand x (t) in a
Taylor series about t = tk to obtain
    x(tk+1) = x(tk) + h x′(tk) + (h²/2) x″(ξ)
for some ξ ∈ [tk, tk+1]. Rearrange this to obtain a formula for x′(tk),
and substitute it into the formula for Tk , yielding
    Tk = (x(tk+1) − x(tk))/h − x′(tk)
       = (x(tk+1) − x(tk))/h − (x(tk+1) − x(tk))/h + (1/2) h x″(ξ)
       = (1/2) h x″(ξ).
Similarly, one can find that Heun's method and the modified Euler method both have O(h²) truncation error, while the error for the four-stage Runge–Kutta method is O(h⁴). Extrapolating from this data, one might expect that a method requiring m evaluations of f can deliver O(h^m) truncation error. Unfortunately, this is not true beyond m = 4, hence the fame of the four-stage Runge–Kutta method. All Runge–Kutta methods with O(h⁵) truncation error require at least six evaluations of f. (As we will discuss later, the same function evaluations for higher order methods can be strategically combined to give two methods with different orders of accuracy. Comparing the estimates from two methods of different orders, one can estimate the error in the integration. Such estimates then allow one to adjust the time-step h on the fly during an integration to control the error.)
Next we must address a fundamental question: does Tk → 0 as h → 0 ensure global convergence, ek → 0, for each k = 1, 2, . . . ?
Recall the definition of the global error, ek = x(tk) − xk, the general form of an explicit one-step method, xk+1 = xk + hΦ(tk, xk; h), and the truncation error,
    Tk = (x(tk+1) − x(tk))/h − Φ(tk, x(tk); h).
Rearrange the last formula to obtain an expression for x(tk+1),
    x(tk+1) = x(tk) + hΦ(tk, x(tk); h) + hTk.
Then
    ek+1 = x(tk+1) − xk+1
         = x(tk) − xk + h( Φ(tk, x(tk); h) − Φ(tk, xk; h) ) + hTk
         = ek + h( Φ(tk, x(tk); h) − Φ(tk, xk; h) ) + hTk.
Now suppose that Φ satisfies the Lipschitz condition
    |Φ(t, u; h) − Φ(t, v; h)| ≤ LΦ |u − v|
for all t ∈ [t0, tfinal] and all u, v ∈ IR. This assumption is closely related to the Lipschitz condition that plays an essential role in the theorem of existence of solutions given in Section 5.1. For 'nice' ODEs and reasonable methods Φ, this condition is not difficult to satisfy. (For example, for the forward Euler method, LΦ = L, where L is the usual Lipschitz constant for the ordinary differential equation.) This assumption is precisely what we need to bound the difference between Φ(tk, x(tk); h) and Φ(tk, xk; h) that appears in the formula for ek+1. In particular, we now have
    |ek+1| = | ek + h( Φ(tk, x(tk); h) − Φ(tk, xk; h) ) + hTk |
           ≤ |ek| + hLΦ|ek| + h|Tk|
           = (1 + hLΦ)|ek| + h|Tk|.
Writing T := max_k |Tk| and using the exact initial value x0 = x(t0),
    |e0| = |x(t0) − x0| = 0
    |e1| ≤ (1 + hLΦ)|e0| + h|T0| ≤ hT
    |e2| ≤ (1 + hLΦ)|e1| + h|T1| ≤ (1 + hLΦ)hT + hT
      ⋮
    |en| ≤ hT Σ_{k=0}^{n−1} (1 + hLΦ)^k.
Notice that this bound for |en | is a finite geometric series, and thus
we have the convenient formula
    |en| ≤ hT ((1 + hLΦ)^n − 1)/((1 + hLΦ) − 1)
         = (T/LΦ)((1 + hLΦ)^n − 1)
(5.3)    < (T/LΦ)(e^{nhLΦ} − 1).
(Recall that e^{αx} = 1 + αx + (1/2)(αx)² + (1/3!)(αx)³ + ···, and so, since hLΦ > 0, 1 + hLΦ < e^{hLΦ}.) (This result and proof are given as Theorem 12.2 in the text by Süli and Mayers.)
There are two key lessons to be learned from this bound on |en |.
• Focus attention on some fixed target time tfinal, and consider time steps
    h := (tfinal − t0)/n,
so that xn ≈ x(tfinal). As n → ∞, note that h → 0, and in this case nh = tfinal − t0 is fixed. Thus in the bound
    |en| < (T/LΦ)(e^{nhLΦ} − 1),
the factor e^{nhLΦ} − 1 = e^{(tfinal − t0)LΦ} − 1 does not change as n grows, so the error at tfinal is controlled by T: if the truncation error satisfies Tk = O(h^p), then the error at the fixed time tfinal is also O(h^p).
The plots in Figure 5.4 confirm these observations. Again for the
model problem x 0 (t) = x (t) with (t0 , x0 ) = (0, 1), the figure shows
the error for Euler’s method (Tk = O(h)), Heun’s method ( Tk =
O(h2 )), and the four-stage Runge–Kutta method ( Tk = O(h4 )) for
t ∈ [0, 10]. Note the logarithmic scale of the vertical axes in these
plots. As tk increases, the error grows exponentially in all these cases.
However, as h is reduced, the error gets smaller at all fixed times. The
extent of this error reduction is what one would expect from the local
truncation errors. All of the plots start with h = 0.1 (black dots). The
other curves show the result of repeatedly cutting h in half, giving
h = 0.05 (blue), h = 0.025 (red), h = 0.0125 (cyan), and h = 0.00625
(magenta).
[Figure 5.4: the error |ek| plotted against tk for the forward Euler method, Heun's method, and the four-stage Runge–Kutta method, with h = 0.1 (black), 0.05 (blue), 0.025 (red), 0.0125 (cyan), and 0.00625 (magenta).]
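An experiment in the spirit of Figure 5.4 is easy to set up. The sketch below measures the forward Euler error at t = 10 for the model problem x′(t) = x(t), x(0) = 1, whose exact solution is e^t; the step sizes match those used in the figure.

f = @(t,x) x;  tfinal = 10;
for h = [0.1 0.05 0.025 0.0125 0.00625]
    n = round(tfinal/h);  x = 1;
    for k = 1:n, x = x + h*f((k-1)*h, x); end         % forward Euler to t = 10
    fprintf('h = %7.5f   error at t = 10: %9.3e\n', h, abs(exp(tfinal) - x))
end

Halving h should roughly halve the error, consistent with the O(h) truncation error; repeating the experiment with the rk4 sketch above should shrink the error by roughly a factor of sixteen per halving.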
    k1 = f(tk, xk)
    k2 = f(tk + (1/4)h, xk + (1/4)h k1)
    k3 = f(tk + (3/8)h, xk + (3/32)h k1 + (9/32)h k2)
    k4 = f(tk + (12/13)h, xk + (1932/2197)h k1 − (7200/2197)h k2 + (7296/2197)h k3)
    k5 = f(tk + h, xk + (439/216)h k1 − 8h k2 + (3680/513)h k3 − (845/4104)h k4)
    k6 = f(tk + (1/2)h, xk − (8/27)h k1 + 2h k2 − (3544/2565)h k3 + (1859/4104)h k4 − (11/40)h k5).
    xk+1 = xk + h( (3/2) f(tk, xk) − (1/2) f(tk−1, xk−1) ),
    xk+3 = xk+2 + (h/24)( 9 fk+3 + 19 fk+2 − 5 fk+1 + fk ),
giving
    α0 = 0, α1 = 0, α2 = −1, α3 = 1;
    β0 = 1/24, β1 = −5/24, β2 = 19/24, β3 = 9/24.
Recall that for one-step methods, the truncation error took the form
    Tk = (x(tk+1) − x(tk))/h − Φ(tk, xk; h).
Definition 5.3. The truncation error for the linear multistep method
    Σ_{j=0}^m αj xk+j = h Σ_{j=0}^m βj f(tk+j, xk+j)
is
    Tk = (1/h)( Σ_{j=0}^m αj x(tk+j) − h Σ_{j=0}^m βj f(tk+j, x(tk+j)) ),
where x denotes the exact solution.
To analyze Tk, expand the exact solution in Taylor series about tk,
    x(tk+j) = x(tk) + jh x′(tk) + ((jh)²/2!) x″(tk) + ((jh)³/3!) x‴(tk) + ···,
and also
    f(tk+1, x(tk+1)) = x′(tk + h)  = x′(tk) + h x″(tk) + (h²/2!) x‴(tk) + (h³/3!) x^(4)(tk) + ···
    f(tk+2, x(tk+2)) = x′(tk + 2h) = x′(tk) + 2h x″(tk) + (2²h²/2!) x‴(tk) + (2³h³/3!) x^(4)(tk) + ···
    f(tk+3, x(tk+3)) = x′(tk + 3h) = x′(tk) + 3h x″(tk) + (3²h²/2!) x‴(tk) + (3³h³/3!) x^(4)(tk) + ···
      ⋮
    f(tk+m, x(tk+m)) = x′(tk + mh) = x′(tk) + mh x″(tk) + (m²h²/2!) x‴(tk) + (m³h³/3!) x^(4)(tk) + ···.
Substituting these expansions into the definition of Tk gives
    Tk = (1/h)[ Σ_{j=0}^m αj ] x(tk) + Σ_{ℓ=0}^∞ h^ℓ Σ_{j=0}^m [ (j^{ℓ+1}/(ℓ+1)!) αj − (j^ℓ/ℓ!) βj ] x^(ℓ+1)(tk)
       = (1/h)[ Σ_{j=0}^m αj ] x(tk) + [ Σ_{j=0}^m j αj − Σ_{j=0}^m βj ] x′(tk)
         + h [ Σ_{j=0}^m (j²/2) αj − Σ_{j=0}^m j βj ] x″(tk)
         + h² [ Σ_{j=0}^m (j³/6) αj − Σ_{j=0}^m (j²/2) βj ] x‴(tk)
         + h³ [ Σ_{j=0}^m (j⁴/24) αj − Σ_{j=0}^m (j³/6) βj ] x^(4)(tk) + ···.
If one of the conditions in Theorem 5.2 is violated, then the formula for the truncation error contains a term that either grows like 1/h or remains constant as h → 0. The Taylor analysis of the truncation error yields even more information, though: inspecting the coefficients multiplying h, h², etc. reveals easy conditions for determining the overall truncation error of a linear multistep method.
for all ` = 1, . . . , p − 1.
α0 = −1, α1 = 1; β 0 = 1, β 1 = 0.
Thus, Tk = O(h).
For the trapezoid method (5.1),
    α0 = −1, α1 = 1;  β0 = 1/2, β1 = 1/2.
Again, consistency is easy to verify: α0 + α1 = −1 + 1 = 0 and (0α0 + 1α1) − (β0 + β1) = 1 − 1 = 0. Furthermore,
    (1/2)0²α0 + (1/2)1²α1 − (0β0 + 1β1) = 1/2 − 1/2 = 0,
so Tk = O(h²), but
    (1/6)0³α0 + (1/6)1³α1 − ((1/2)0²β0 + (1/2)1²β1) = 1/6 − 1/4 ≠ 0,
so the trapezoid method is exactly second order.
Now consider the explicit two-step Adams–Bashforth method given above, for which
    α0 = 0, α1 = −1, α2 = 1;  β0 = −1/2, β1 = 3/2, β2 = 0.
Does this explicit 2-step method deliver O(h2 ) accuracy, like the (im-
plicit) Trapezoid method? Consistency follows easily:
α0 + α1 + α2 = 0 − 1 + 1 = 0
and
(0α0 + 1α1 + 2α2 ) − ( β 0 + β 1 ) = 1 − 1 = 0.
The second order condition is also satisfied,
    (1/2)0²α0 + (1/2)1²α1 + (1/2)2²α2 − (0β0 + 1β1 + 2β2) = 3/2 − 3/2 = 0,
but the third order condition fails:
    (1/6)0³α0 + (1/6)1³α1 + (1/6)2³α2 − ((1/2)0²β0 + (1/2)1²β1 + (1/2)2²β2) = 7/6 − 3/4 ≠ 0,
so this explicit method, like the implicit trapezoid method, has Tk = O(h²).
As a final example, consider the four-step Adams–Bashforth method,
    xk+4 = xk+3 + (h/24)( 55 fk+3 − 59 fk+2 + 37 fk+1 − 9 fk ),
for which
    α0 = 0, α1 = 0, α2 = 0, α3 = −1, α4 = 1;
    β0 = −9/24, β1 = 37/24, β2 = −59/24, β3 = 55/24, β4 = 0.
The order conditions can be checked directly:
    Σ_{j=0}^4 (j²/2) αj − Σ_{j=0}^4 j βj = (3²/2)(−1) + (4²/2)(1) − [ 0(−9/24) + 1(37/24) + 2(−59/24) + 3(55/24) ] = 7/2 − 84/24 = 0;
    Σ_{j=0}^4 (j³/6) αj − Σ_{j=0}^4 (j²/2) βj = (3³/6)(−1) + (4³/6)(1) − [ (1²/2)(37/24) + (2²/2)(−59/24) + (3²/2)(55/24) ] = 37/6 − 148/24 = 0;
    Σ_{j=0}^4 (j⁴/24) αj − Σ_{j=0}^4 (j³/6) βj = (3⁴/24)(−1) + (4⁴/24)(1) − [ (1³/6)(37/24) + (2³/6)(−59/24) + (3³/6)(55/24) ] = 175/24 − 1050/144 = 0;
but
    Σ_{j=0}^4 (j⁵/120) αj − Σ_{j=0}^4 (j⁴/24) βj = (3⁵/120)(−1) + (4⁵/120)(1) − [ (1⁴/24)(37/24) + (2⁴/24)(−59/24) + (3⁴/24)(55/24) ] = 781/120 − 887/144 = 251/720 ≠ 0,
so Tk = O(h⁴): this method is fourth-order accurate.
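These order conditions are easy to check by computer. The sketch below evaluates the coefficients C_ℓ = Σ_j (j^ℓ/ℓ!) αj − Σ_j (j^{ℓ−1}/(ℓ−1)!) βj and reports the largest order for which they all vanish; the coefficient ordering (α0, . . . , αm and β0, . . . , βm) is an assumption of this sketch, not notation from the original notes.

alpha = [0 0 0 -1 1];            % e.g., the four-step Adams-Bashforth method
beta  = [-9 37 -59 55 0]/24;
j = 0:length(alpha)-1;
C = @(ell) sum(alpha.*j.^ell/factorial(ell)) - sum(beta.*j.^(ell-1)/factorial(ell-1));
if abs(sum(alpha)) > 1e-12, error('inconsistent: the alpha_j do not sum to zero'), end
p = 0;
while abs(C(p+1)) < 1e-12
    p = p + 1;                   % C_1 = ... = C_p = 0 so far
end
fprintf('T_k = O(h^%d)\n', p)    % prints 4 for the coefficients above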
Applied to the trivial equation x′(t) = 0 (for which f ≡ 0), a linear multistep method computes xm from
    xm = −( α0 x0 + α1 x1 + ··· + αm−1 xm−1 ) / αm.
If the method is given the exact starting values
    x0 = x1 = ··· = xm−1 = 0,
then
    xm = −( α0·0 + α1·0 + ··· + αm−1·0 ) / αm = 0,
and this pattern will continue: xm+1 = 0, xm+2 = 0, . . . . Any lin-
ear multistep method with exact starting data produces the exact
solution for this special problem, regardless of the time-step.
Of course, for more complicated problems it is unusual to have
exact starting values x1 , x2 , . . . xm−1 ; typically, these values are only
approximate, obtained from some high-order one-step ODE solver
or from an asymptotic expansion of the solution that is accurate in
a neighborhood of t0 . To discover how multistep methods behave,
we must first understand how these errors in the initial data pollute
future iterations of the linear multistep method.
that generate the approximate solutions {xj}_{j=0}^n and {x̂j}_{j=0}^n, where tn = tfinal. The multistep method is zero-stable for this initial value problem if for sufficiently small h there exists some constant M (independent of h) such that
    |xk − x̂k| ≤ M max_{0≤j≤m−1} |xj − x̂j|    for all k = 0, . . . , n.
As seen above, this method produces the exact solution if given exact
initial data, x0 = x1 = 0. But what if x0 = 0 but x1 = ε for some small
ε > 0? This method produces the iterates
x2 = 2x0 − x1 = 2 · 0 − ε = −ε
x3 = 2x1 − x2 = 2(ε) − (−ε) = 3ε
x4 = 2x2 − x3 = 2(−ε) − 3ε = −5ε
x5 = 2x3 − x4 = 2(3ε) − (−5ε) = 11ε
x6 = 2x4 − x5 = 2(−5ε) − (11ε) = −21ε
x7 = 2x5 − x6 = 2(11ε) − (−21ε) = 43ε
x8 = 2x6 − x7 = 2(−21ε) − (43ε) = −85ε.
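This growth is easy to reproduce numerically; the following sketch iterates the recurrence with a perturbation of size 10^−16 in x1.

pert = 1e-16;  x = [0 pert];          % x_0 = 0, x_1 slightly perturbed
for k = 1:60
    x(k+2) = 2*x(k) - x(k+1);         % the recurrence x_{k+2} = 2x_k - x_{k+1}
end
disp(x(end))                          % roughly (2^60)/3 times the perturbation

Even a perturbation on the order of machine precision is amplified past O(1) within about sixty steps.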
[Figure 5.5: the iterates xk ≈ x(tk) produced by method (5.4) for x′(t) = 0 with x0 = 0 and x1 = ε, shown separately for h = 0.2, h = 0.1, h = 0.05, and h = 0.025.]
Figure 5.6: All four solutions from Figure 5.5 plotted together, to illustrate how the approximate solutions from the method (5.4) to x′(t) = 0 with x0 = 0 and x1 = ε degrade as h → 0.
To understand this growth, substitute xk = γ^k into the recurrence xk+2 = 2xk − xk+1 to obtain
    γ² = 2 − γ.
Rearranging and factoring,
    γ² + γ − 2 = (γ + 2)(γ − 1) = 0,
so the roots are γ1 = −2 and γ2 = 1. Any sequence of the form xk = Aγ1^k + Bγ2^k satisfies the recurrence, since
    γ1² + γ1 − 2 = γ2² + γ2 − 2 = 0,
and so
    Aγ1^k (γ1² + γ1 − 2) = Bγ2^k (γ2² + γ2 − 2) = 0.
The starting values x0 = 0 and x1 = ε determine A and B:
    A + B = 0
    −2A + B = ε,
which implies
    A = −ε/3,  B = ε/3.
Indeed, the solution
    (5.5)    xk = ε/3 − (ε/3)(−2)^k
generates the iterates x0 = 0, x1 = ε, x2 = −ε, x3 = 3ε, x4 = −5ε, . . . computed previously.
−5ε, . . . computed previously. Notice that (5.5) reveals exponential
growth with k: this growth overwhelms algebraic improvements
in the estimate x1 that might occur as we reduce h. For example, if
ε = x1 − x (t0 + h) = ch p for some constant c and p ≥ 1, then
xk = ch p (1 − (−2)k )/3 still grows exponentially in k.
The same analysis applies to a general m-step method: applied to x′(t) = 0, the method reduces to the recurrence Σ_{j=0}^m αj xk+j = 0. Substituting xk = γ^k yields
    Σ_{j=0}^m αj γ^{k+j} = 0.
Canceling γ^k,
    Σ_{j=0}^m αj γ^j = 0.
If the m roots γ1, . . . , γm of this characteristic polynomial are distinct, then the general solution of the recurrence is
    xk = c1 γ1^k + c2 γ2^k + ··· + cm γm^k.
One can see now where the term zero-stability comes from: it is
necessary and sufficient for the stability definition to hold for the
differential equation x 0 (t) = 0. In recognition of the discoverer of
this key result, zero-stability is sometimes called Dahlquist stability.
(Another synonymous term is root stability.) In addition to making
this beautiful characterization, Dahlquist also answered the question
about the conditions necessary for a multistep method to be conver-
gent.
x ( t k ) − x k = O( h p )
[Figure 5.7: computed iterates xk ≈ x(tk) for h = 0.05 (left) and h = 0.01 (right).]
γ2 − 2λhγ − 1 = 0.
[Figure 5.8: computed iterates xk ≈ x(tk) for h = 0.05 (left) and h = 0.01 (right), now using the second-order Adams–Bashforth method.]
    xk+2 − xk+1 = (h/2)( 3 fk+1 − fk ).
This method is zero stable, as ρ(z) = z2 − z = z(z − 1). Figure 5.8
repeats the exercise of Figure 5.7, with the same errors in x1 , but
with the second-order Adams–Bashforth method. Though the initial
value error will throw off the solution slightly, we recover the correct
qualitative behavior.
Judging from the different manner in which our two second-order
methods handle this simple problem, it appears that there is still
more to understand about linear multistep methods. This is the sub-
ject of the next lecture.
At this point, it may well seem that we have a complete theory for
linear multistep methods. With an understanding of truncation er-
ror and zero stability, the convergence of any method can be easily
understood. However, one further wrinkle remains. (Perhaps you
expected this: thus far the β j coefficients have played no role in our
stability analysis!) Up to this point, our convergence theory addresses
the case where h → 0. Methods differ significantly in how small
h must be before one observes this convergent regime. For h too
large, exponential errors that resemble those seen for zero-unstable
methods can emerge for rather benign-looking problems—and for
some ODEs and methods, the restriction imposed on h to avoid such
behavior can be severe. To understand this problem, we need to con-
sider how the numerical method behaves on a less trivial canonical
model problem. (For an elaboration of many details described here, see Chapter 12 of Süli and Mayers.)
Now consider the model problem x′(t) = λx(t), x(0) = x0, for some fixed λ ∈ C, which has the exact solution x(t) = e^{tλ} x0. In those
cases where the real part of λ is negative (i.e., λ is in the open left
half of the complex plane), we have | x (t)| → 0 as t → ∞. For a fixed
step size h > 0, will a linear multistep method mimic this behavior?
The explicit Euler method applied to this equation takes the form
x k +1 = x k + h f k
= xk + hλxk
= (1 + hλ) xk .
Hence, this recursion has the general solution
xk = (1 + hλ)k x0 .
Under what conditions will xk → 0? Clearly we need |1 + hλ| < 1;
this condition is more easily interpreted by writing |1 + hλ| = | −
1 − hλ|, where that latter expression is simply the distance of hλ
from −1 in the complex plane. Hence |1 + hλ| < 1 provided hλ is
located strictly in the interior of the disk of radius 1 in the complex
plane, centered at −1. This is the stability region for the explicit Euler
method, shown in the plot on the next page.
Now consider the backward (implicit) Euler method for this same
model problem:
x k +1 = x k + h f k +1
= xk + hλxk+1 .
[Stability regions in the complex hλ-plane for the forward Euler method, xk+1 = xk + h fk (left), and the backward Euler method, xk+1 = xk + h fk+1 (right).]
Solving for xk+1 gives xk+1 = (1 − hλ)^{−1} xk, and hence
    xk = (1 − hλ)^{−k} x0.
Thus xk → 0 for all x0 provided |1 − hλ| > 1, i.e., provided hλ lies outside the closed disk of radius 1 centered at +1 in the complex plane; in particular, the backward Euler stability region contains the entire left half of the hλ-plane.
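The boundaries of these two stability regions are easy to draw; the following sketch plots the level curves |1 + hλ| = 1 (forward Euler) and |1 − hλ| = 1 (backward Euler) in the complex hλ-plane.

[X,Y] = meshgrid(linspace(-3,3,400));          % grid covering part of the h*lambda plane
Z = X + 1i*Y;
figure(1), clf
contour(X, Y, abs(1+Z), [1 1], 'b'), hold on   % forward Euler boundary: |1 + h*lambda| = 1
contour(X, Y, abs(1-Z), [1 1], 'r')            % backward Euler boundary: |1 - h*lambda| = 1
axis equal, xlabel('Re h\lambda'), ylabel('Im h\lambda')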
More generally, apply the linear multistep method
    Σ_{j=0}^m αj xk+j = h Σ_{j=0}^m βj fk+j
to the model problem, for which fk+j = λxk+j. Seeking solutions of the form xk = γ^k leads to the stability polynomial
    ρ(z) − hλ σ(z) = 0,
where
    ρ(z) = Σ_{j=0}^m αj z^j    and    σ(z) = Σ_{j=0}^m βj z^j.
Thus for a fixed hλ, there will be m solutions of the form γkj for the
m roots γ1 , . . . , γm of the stability polynomial. If these roots are all
distinct, then for any initial data x0 , . . . , xm−1 we can find constants
c1 , . . . , cm such that
    xk = Σ_{j=1}^m cj γj^k.
For a given value hλ, we have xk → 0 provided that |γ j | < 1 for all
j = 1, . . . , m. If that condition is met, we say that the linear multistep
method is absolutely stable for that value of hλ.
In the next section, we will describe how linear systems of differ-
ential equations, x0 (t) = Ax(t), can give rise, through an eigenvalue
decomposition of A, to the scalar problem x 0 (t) = λx (t) with com-
plex values of the eigenvalue λ (even if A is real). This explains our
interest in values of hλ ∈ C.
• zero stability;
z2 − (1 + 32 λh)z + 12 λh = 0.
[Stability regions in the complex hλ-plane for several linear multistep methods: the two-step Adams–Bashforth method, xk+2 − xk+1 = h((3/2) fk+1 − (1/2) fk), and the trapezoid method, xk+1 − xk = (h/2)( fk + fk+1); the four-step Adams–Bashforth method, xk+4 − xk+3 = (h/24)(55 fk+3 − 59 fk+2 + 37 fk+1 − 9 fk), and the implicit method xk+3 − xk+2 = (h/24)(9 fk+3 + 19 fk+2 − 5 fk+1 + fk); and the implicit methods 3xk+2 − 4xk+1 + xk = 2h fk+2, 11xk+3 − 18xk+2 + 9xk+1 − 2xk = 6h fk+3, and 25xk+4 − 48xk+3 + 36xk+2 − 16xk+1 + 3xk = 12h fk+4.]
Consider now a linear system of differential equations,
    x′(t) = Ax(t),
for A ∈ C^{n×n} and x(t) ∈ C^n. We wish to see how the scalar linear
stability theory discussed in the last lecture applies to such systems.
Suppose that the matrix A is diagonalizable, so that it can be written
A = VΛV−1 for the diagonal matrix Λ = diag(λ1 , . . . , λn ). Premulti-
plying the differential equation by V^{−1} yields V^{−1}x′(t) = ΛV^{−1}x(t). Writing y(t) := V^{−1}x(t), the system decouples into n scalar equations yj′(t) = λj yj(t), each with solution
    yj(t) = e^{λj t} yj(0).
Alternatively, one can express the solution with the matrix exponential:
    (5.10)    x(t) = e^{At} x0,
since differentiating gives x′(t) = A e^{At} x0 = Ax(t). Hence x(t) = e^{tA} x0 solves the equation x′(t) = Ax(t), and satisfies the initial condition x(0) = x0.
xk+1 = xk + hAxk
= (I + hA)xk .
x1 = (I + hA)x0
x2 = (I + hA)x1 = (I + hA)2 x0
x3 = (I + hA)x2 = (I + hA)3 x0
..
.
and, in general,
xk = (I + hA)k x0 .
[Figure 5.13: the values hλj for the eigenvalues λj of A, plotted against the method's stability region in the complex plane, for h = 1/4, h = 1/8, h = 1/10, and h = 1/16.]
    xk = (I + hA)^k x0
       = (I + hVΛV^{−1})^k x0
       = (VV^{−1} + hVΛV^{−1})^k x0
       = (V(I + hΛ)V^{−1})^k x0
       = V(I + hΛ)^k V^{−1} x0.
Compare this last expression to the formula (5.10) for the true solution x(t) in terms of the matrix exponential. As we did in that case, we can bound ‖xk‖2:
    ‖xk‖2 = ‖V(I + hΛ)^k V^{−1} x0‖2 ≤ ‖V‖2 ‖(I + hΛ)^k‖2 ‖V^{−1}‖2 ‖x0‖2.
Since Λ is diagonal,
    ‖(I + hΛ)^k‖2 = max_{1≤j≤n} |1 + hλj|^k,
giving
    ‖xk‖2 / ‖x0‖2 ≤ ‖V‖2 ‖V^{−1}‖2 max_{1≤j≤n} |1 + hλj|^k.
[Figure 5.14: The second-order Adams–Bashforth method applied to x′(t) = Ax(t) for the same matrix A used for Figure 5.13. As seen in that figure, for step-sizes h = 1/4 and h = 1/8 the method is unstable, and ‖xk‖2 → ∞ as k → ∞. When h = 1/10, hλj is in the stability region for all eigenvalues λj of A, and hence ‖xk‖2 → 0 as k → ∞.]
Since
    (I + hΛ)^k [ c1 ; c2 ; . . . ; cn ] = [ (1 + hλ1)^k c1 ; (1 + hλ2)^k c2 ; . . . ; (1 + hλn)^k cn ],
we have, writing c = V^{−1}x0 and letting v1, . . . , vn denote the columns of V,
    (5.12)    xk = [ v1 v2 ··· vn ] [ (1 + hλ1)^k c1 ; . . . ; (1 + hλn)^k cn ] = Σ_{j=1}^n cj (1 + hλj)^k vj.
For large k, this sum is dominated by the term with the largest growth factor, i.e., the index ℓ for which
    |1 + hλℓ| = max_{1≤j≤n} |1 + hλj|.
For the 2 × 2 example with eigenvalues λ1 = −1 and λ2 = −100 and eigenvectors v1 = [−1; 1] and v2 = [2; −1], the solution is
    x(t) = V [ e^{−t}  0 ; 0  e^{−100t} ] V^{−1} x0
         = V [ e^{−t}  0 ; 0  e^{−100t} ] [ c1 ; c2 ]
         = c1 e^{−t} [ −1 ; 1 ] + c2 e^{−100t} [ 2 ; −1 ],
and so x(t) → 0 as t → ∞. The eigenvalue λ2 = −100 corresponds to a fast transient, a component of the solution that decays very rapidly; the eigenvalue λ1 = −1 corresponds to a slow transient, a component of the solution that decays much more slowly. Using this insight we can describe the behavior of the system as t → ∞ more precisely than merely saying x(t) → 0. Since e^{−100t} decays much more quickly than e^{−t}, the solution will be dominated by the λ1 term:
    x(t) ∼ c1 e^{−t} [ −1 ; 1 ],    t → ∞,
provided c1 ≠ 0. This means that the solution vector x(t) will quickly align in the v1 direction as it converges toward zero.
[Margin figure: some snapshots of the exact solution x(t) for the example with λ1 = −1 and λ2 = −100, using initial condition x(0) = [1, 1]^T. The solution decays, ‖x(t)‖2 → 0 as t → ∞, and as it does so, it aligns in the direction of the eigenvector associated with λ1 = −1, v1 = [−1, 1]^T.]
Now apply the forward Euler method to this problem. From the general expression (5.12), the iterate xk can be written in the basis of eigenvectors as
    xk = c1 (1 + hλ1)^k [ −1 ; 1 ] + c2 (1 + hλ2)^k [ 2 ; −1 ]
       = c1 (1 − h)^k [ −1 ; 1 ] + c2 (1 − 100h)^k [ 2 ; −1 ].
To obtain a numerical solution {xk} that mimics the asymptotic behavior of the true solution, x(t) → 0, one must choose h sufficiently small that |1 + hλ1| = |1 − h| < 1 and |1 + hλ2| = |1 − 100h| < 1. The first condition requires h ∈ (0, 2), while the second condition is far more restrictive: h ∈ (0, 1/50). The more restrictive condition describes the values of h that will give xk → 0 for all x0.
Take note of this phenomenon: the faster a component of the true solution decays (like e^{−100t} in our example), the smaller the time step must be for the forward Euler method (and other explicit schemes).
[Margin figure: some iterates xk of the forward Euler method for the example with λ1 = −1 and λ2 = −100, using time-step h = 0.021. This time-step is slightly larger than the stability limit h < 0.02, so ‖xk‖2 → ∞ as k increases. Moreover, the solution aligns in the direction of the eigenvector associated with the most unstable eigenvalue, v2 = [2, −1]^T.]
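A small experiment makes the contrast with an implicit method vivid. The sketch below builds a matrix with the eigenvalues and eigenvectors used above (the matrix itself is constructed here purely for illustration) and takes 200 steps of forward and backward Euler with h = 0.021, just beyond the forward Euler stability limit.

V = [-1 2; 1 -1];                      % eigenvectors v1 = [-1;1], v2 = [2;-1]
A = V*diag([-1 -100])/V;               % eigenvalues -1 and -100
h = 0.021;  x = [1; 1];  y = [1; 1];   % initial condition x(0) = [1,1]^T
for k = 1:200
    x = x + h*A*x;                     % forward Euler: |1 - 100h| > 1, so this grows
    y = (eye(2) - h*A)\y;              % backward Euler: decays for any h > 0 here
end
fprintf('forward Euler: %9.2e    backward Euler: %9.2e\n', norm(x), norm(y))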
u( x, 0) = U0 ( x ), x ∈ [0, 1].
Quenching both ends of this bar in an ice bath equates to the homoge-
neous Dirichlet boundary conditions
u(0, t) = u(1, t) = 0.
    u(0, t) = u(1, t) = 0,    t ≥ 0,
directly imply that u0(t) = un+1(t) = 0 for all t ≥ 0. Now our goal is to find uj(t) for j = 1, . . . , n. (This explains why this approach is called the 'method of lines': it will develop approximations uj(t) to the solution u(x, t) along 'lines' of constant x = xj in the (x, t) plane.)
2. Now recall the partial differential equation ut(x, t) = uxx(x, t). Replacing uxx(x, t) with the finite difference approximation, the differential equation suggests that we find uj(t) so that
    uj′(t) = ( uj−1(t) − 2uj(t) + uj+1(t) ) / (∆x)²,    j = 1, . . . , n.
In matrix form, this is u′(t) = Au(t), where A is the n × n tridiagonal matrix with −2/(∆x)² on the diagonal and 1/(∆x)² on the sub- and superdiagonals.
However, for large A the computation of e^{tA} (e.g., using MATLAB's expm command) is quite expensive, and so we wish to approximate the solution of the ordinary differential equation using one of the techniques studied in this chapter. (Such large A arise when we have partial differential equations in two and three physical dimensions. The one-dimensional example here is easy by comparison.)
For example, one could fix a time-step ∆t > 0 and seek an approximation
    uk ≈ u(tk).
The forward Euler method gives
    uk+1 = uk + (∆t)Auk,
while the backward Euler method gives
    uk+1 = uk + (∆t)Auk+1,
that is, a linear system
    (I − (∆t)A)uk+1 = uk
that must be solved (e.g., via Gaussian elimination) at each step to find uk+1. (If ∆t is fixed, then one would compute a Cholesky or LU factorization of I − ∆tA, thus expediting the solution of this system at each step. If A is banded, as it is in this example, such factorizations are very fast.)
The main question, then, is: given a choice of numerical integrator (forward Euler, backward Euler, etc.), how large can the time step ∆t be to maintain stability?
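As a sketch of the whole procedure, the following MATLAB lines build the standard tridiagonal second-difference matrix for n interior grid points and advance the semidiscrete system with backward Euler; the initial data is chosen here purely for illustration.

n  = 16;  dx = 1/(n+1);                       % interior grid points x_j = j*dx
e  = ones(n,1);
A  = spdiags([e -2*e e], -1:1, n, n)/dx^2;    % second-difference matrix (sparse)
u  = sin(pi*dx*(1:n)');                       % illustrative initial data U_0(x_j)
dt = 0.01;  M = speye(n) - dt*A;              % backward Euler matrix I - dt*A
for k = 1:50
    u = M\u;                                  % solve (I - dt*A) u_{k+1} = u_k
end

In practice one would factor M once (e.g., with lu or chol) and reuse the factors at every step, as the margin note above suggests.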
    λn = (2 cos(nπ/(n+1)) − 2)/(∆x)² ≈ −4/(∆x)² = −4(n+1)².
Notice that the eigenvalues of A are all negative, so every mode of the semidiscrete system decays as t increases. The extreme eigenvalues for several values of n are:
    n      λ1                 λn
    16     −9.841548 . . .    −1146.158 . . .
    32     −9.862152 . . .    −4346.137 . . .
    64     −9.867683 . . .    −16890.132 . . .
Since
    λn = (2 cos(nπ/(n+1)) − 2)/(∆x)² > −4(n+1)²,
the eigenvalues of A are contained in the interval (−4(n+1)², 0).
[Figure: the computed solution uj(tk), j = 1, . . . , 16, plotted at a sequence of times tk.]
Now solve this same problem using the forward Euler method.
In the eigenvector basis, the forward Euler approximation is
    (5.16)    uk = (I + (∆t)A)^k u0 = Σ_{j=1}^n cj (1 + (∆t)λj)^k vj.
Every term in this sum decays only if |1 + (∆t)λj| < 1 for all j; the most negative eigenvalue λn imposes the binding constraint ∆t < 2/|λn|, which for n = 16 requires
    ∆t < 0.001744959 . . . .
Figure 5.18 shows the result of running forward Euler with a time-
step ∆t = 0.002 that is slightly too large. The computation proceeds
reasonably for the first ten time steps or so, but by k = 20 the
233
cn (1 + (∆t)λn )k vn ,
Thus the (1 + (∆t)λn )k term decays most slowly in the sum (5.16).
Will it not eventually dominate, causing the solution to again
[Figure 5.18: snapshots of the forward Euler approximation uj(tk), computed with ∆t = 0.002, at a sequence of times; the later snapshots show the growing instability.]
[Figure: the computed solution uj(tk), j = 1, . . . , 16, shown at a sequence of times t.]
I hope our investigations this semester have given you a taste of the beautiful
mathematics that empower numerical computations, the discrimination to
pick the right algorithm to suit your given problem, the insight to identify
those problems that are inherently ill-conditioned, and the tenacity to always
seek clever, efficient solutions.