
Scientific Computing: An Introductory Survey
Chapter 1: Scientific Computing
Prof. Michael T. Heath
Department of Computer Science
University of Illinois at Urbana-Champaign
Copyright © 2002. Reproduction permitted for noncommercial, educational use only.
Outline
1. Scientific Computing
2. Approximations
3. Computer Arithmetic
Scientific Computing
What is scientific computing?
Design and analysis of algorithms for numerically solving mathematical problems in science and engineering
Traditionally called numerical analysis
Distinguishing features of scientific computing:
Deals with continuous quantities
Considers effects of approximations
Why scientific computing?
Simulation of natural phenomena
Virtual prototyping of engineering designs
Well-Posed Problems
Problem is well-posed if solution
exists
is unique
depends continuously on problem data
Otherwise, problem is ill-posed
Even if problem is well posed, solution may still be
sensitive to input data
Computational algorithm should not make sensitivity worse
General Strategy
Replace difficult problem by easier one having same or closely related solution:
infinite → finite
differential → algebraic
nonlinear → linear
complicated → simple
Solution obtained may only approximate that of original problem
Sources of Approximation
Before computation
modeling
empirical measurements
previous computations
During computation
truncation or discretization
rounding
Accuracy of final result reflects all these
Uncertainty in input may be amplified by problem
Perturbations during computation may be amplified by algorithm
Example: Approximations
Computing surface area of Earth using formula A = 4πr² involves several approximations:
Earth is modeled as sphere, idealizing its true shape
Value for radius is based on empirical measurements and previous computations
Value for π requires truncating infinite process
Values for input data and results of arithmetic operations are rounded in computer
Absolute Error and Relative Error
Absolute error: approximate value - true value
Relative error: absolute error / true value
Equivalently, approx value = (true value) × (1 + rel error)
True value usually unknown, so we estimate or bound
error rather than compute it exactly
Relative error often taken relative to approximate value,
rather than (unknown) true value
Data Error and Computational Error
Typical problem: compute value of function f : R → R for given argument
x = true value of input
f(x) = desired result
x̂ = approximate (inexact) input
f̂ = approximate function actually computed
Total error:
f̂(x̂) - f(x) = [f̂(x̂) - f(x̂)] + [f(x̂) - f(x)]
            = computational error + propagated data error
Algorithm has no effect on propagated data error
Truncation Error and Rounding Error
Truncation error : difference between true result (for actual
input) and result produced by given algorithm using exact
arithmetic
Due to approximations such as truncating infinite series or
terminating iterative sequence before convergence
Rounding error : difference between result produced by
given algorithm using exact arithmetic and result produced
by same algorithm using limited precision arithmetic
Due to inexact representation of real numbers and
arithmetic operations upon them
Computational error is sum of truncation error and
rounding error, but one of these usually dominates
< interactive example >
Example: Finite Difference Approximation
Error in finite difference approximation
f′(x) ≈ [f(x + h) - f(x)] / h
exhibits tradeoff between rounding error and truncation error
Truncation error bounded by Mh/2, where M bounds |f″(t)| for t near x
Rounding error bounded by 2ε/h, where error in function values is bounded by ε
Total error minimized when h ≈ 2 √(ε/M)
Error increases for smaller h because of rounding error and increases for larger h because of truncation error
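This tradeoff is easy to see numerically. Below is a minimal Python sketch (not from the slides; the test function sin, the evaluation point x = 1, and the range of step sizes are arbitrary choices) that prints the forward-difference error as h decreases.

import math

def forward_difference_error(x=1.0):
    """Forward-difference error for f = sin: truncation error dominates for
    large h, rounding error for small h, so the total is smallest in between."""
    true_deriv = math.cos(x)
    for k in range(1, 17):
        h = 10.0 ** (-k)
        approx = (math.sin(x + h) - math.sin(x)) / h
        print(f"h = 1e-{k:02d}   error = {abs(approx - true_deriv):.3e}")

forward_difference_error()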
Example: Finite Difference Approximation
(Figure: log-log plot of error versus step size, with curves for truncation error, rounding error, and total error; the total error is smallest at an intermediate step size.)
Forward and Backward Error
Suppose we want to compute y = f(x), where f : R → R, but obtain approximate value ŷ
Forward error: Δy = ŷ - y
Backward error: Δx = x̂ - x, where f(x̂) = ŷ
Example: Forward and Backward Error
As approximation to y = √2, ŷ = 1.4 has absolute forward error
|Δy| = |ŷ - y| = |1.4 - 1.41421...| ≈ 0.0142
or relative forward error of about 1 percent
Since √1.96 = 1.4, absolute backward error is
|Δx| = |x̂ - x| = |1.96 - 2| = 0.04
or relative backward error of 2 percent
Backward Error Analysis
Idea: approximate solution is exact solution to modied
problem
How much must original problem change to give result
actually obtained?
How much data error in input would explain all error in
computed result?
Approximate solution is good if it is exact solution to nearby
problem
Backward error is often easier to estimate than forward
error
Example: Backward Error Analysis
Approximating cosine function f(x) = cos(x) by truncating Taylor series after two terms gives
ŷ = f̂(x) = 1 - x²/2
Forward error is given by
Δy = ŷ - y = f̂(x) - f(x) = 1 - x²/2 - cos(x)
To determine backward error, need value x̂ such that f(x̂) = f̂(x)
For cosine function, x̂ = arccos(f̂(x)) = arccos(ŷ)
Example, continued
For x = 1,
y = f(1) = cos(1) ≈ 0.5403
ŷ = f̂(1) = 1 - 1²/2 = 0.5
x̂ = arccos(ŷ) = arccos(0.5) ≈ 1.0472
Forward error: Δy = ŷ - y ≈ 0.5 - 0.5403 = -0.0403
Backward error: Δx = x̂ - x ≈ 1.0472 - 1 = 0.0472
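A quick Python check of these numbers (a sketch; it simply recomputes the quantities defined above rather than quoting the slide):

import math

x = 1.0
y = math.cos(x)            # true value f(x)
y_hat = 1 - x**2 / 2       # truncated Taylor series f_hat(x)
x_hat = math.acos(y_hat)   # input for which the exact cosine returns y_hat

forward_error = y_hat - y      # approximately -0.0403
backward_error = x_hat - x     # approximately  0.0472
print(forward_error, backward_error)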
Sensitivity and Conditioning
Problem is insensitive, or well-conditioned, if relative
change in input causes similar relative change in solution
Problem is sensitive, or ill-conditioned, if relative change in
solution can be much larger than that in input data
Condition number:
cond = |relative change in solution| / |relative change in input data|
     = |[f(x̂) - f(x)] / f(x)| / |(x̂ - x) / x|
     = |Δy/y| / |Δx/x|
Problem is sensitive, or ill-conditioned, if cond ≫ 1
Condition Number
Condition number is amplification factor relating relative forward error to relative backward error:
|relative forward error| = cond × |relative backward error|
Condition number usually is not known exactly and may vary with input, so rough estimate or upper bound is used for cond, yielding
|relative forward error| ≲ cond × |relative backward error|
Example: Evaluating Function
Evaluating function f for approximate input x̂ = x + Δx instead of true input x gives
Absolute forward error: f(x + Δx) - f(x) ≈ f′(x) Δx
Relative forward error: [f(x + Δx) - f(x)] / f(x) ≈ f′(x) Δx / f(x)
Condition number: cond ≈ |f′(x) Δx / f(x)| / |Δx / x| = |x f′(x) / f(x)|
Relative error in function value can be much larger or smaller than that in input, depending on particular f and x
Example: Sensitivity
Tangent function is sensitive for arguments near π/2
tan(1.57079) ≈ 1.58058 × 10⁵
tan(1.57078) ≈ 6.12490 × 10⁴
Relative change in output is quarter million times greater than relative change in input
For x = 1.57079, cond ≈ 2.48275 × 10⁵
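The condition number quoted here can be estimated from the formula cond ≈ |x f′(x)/f(x)| derived two slides earlier. A small sketch (it uses f′(x) = 1/cos²(x) for the tangent function, which is standard calculus rather than anything stated on the slide):

import math

x = 1.57079
fx = math.tan(x)
dfx = 1.0 / math.cos(x) ** 2            # derivative of tan is sec^2
cond = abs(x * dfx / fx)
print(f"tan({x}) = {fx:.5e}, cond = {cond:.5e}")   # cond is about 2.48e5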
Stability
Algorithm is stable if result produced is relatively
insensitive to perturbations during computation
Stability of algorithms is analogous to conditioning of
problems
From point of view of backward error analysis, algorithm is
stable if result produced is exact solution to nearby
problem
For stable algorithm, effect of computational error is no
worse than effect of small data error in input
Accuracy
Accuracy : closeness of computed solution to true solution
of problem
Stability alone does not guarantee accurate results
Accuracy depends on conditioning of problem as well as
stability of algorithm
Inaccuracy can result from applying stable algorithm to
ill-conditioned problem or unstable algorithm to
well-conditioned problem
Applying stable algorithm to well-conditioned problem
yields accurate solution
Floating-Point Numbers
Floating-point number system is characterized by four integers:
β        base or radix
p        precision
[L, U]   exponent range
Number x is represented as
x = ± ( d₀ + d₁/β + d₂/β² + ... + d_{p-1}/β^{p-1} ) β^E
where 0 ≤ dᵢ ≤ β - 1, i = 0, ..., p - 1, and L ≤ E ≤ U
Floating-Point Numbers, continued
Portions of floating-point number designated as follows:
exponent: E
mantissa: d₀ d₁ ... d_{p-1}
fraction: d₁ d₂ ... d_{p-1}
Sign, exponent, and mantissa are stored in separate fixed-width fields of each floating-point word
Typical Floating-Point Systems
Parameters for typical floating-point systems:
system          β    p    L       U
IEEE SP         2    24   -126    127
IEEE DP         2    53   -1022   1023
Cray            2    48   -16383  16384
HP calculator   10   12   -499    499
IBM mainframe   16   6    -64     63
Most modern computers use binary (β = 2) arithmetic
IEEE floating-point systems are now almost universal in digital computers
Normalization
Floating-point system is normalized if leading digit d₀ is always nonzero unless number represented is zero
In normalized systems, mantissa m of nonzero floating-point number always satisfies 1 ≤ m < β
Reasons for normalization
representation of each number unique
no digits wasted on leading zeros
leading bit need not be stored (in binary system)
Properties of Floating-Point Systems
Floating-point number system is finite and discrete
Total number of normalized floating-point numbers is
2 (β - 1) β^{p-1} (U - L + 1) + 1
Smallest positive normalized number: UFL = β^L
Largest floating-point number: OFL = β^{U+1} (1 - β^{-p})
Floating-point numbers equally spaced only between successive powers of β
Not all real numbers exactly representable; those that are are called machine numbers
Example: Floating-Point System
Tick marks indicate all 25 numbers in floating-point system having β = 2, p = 3, L = -1, and U = 1
OFL = (1.11)₂ × 2¹ = (3.5)₁₀
UFL = (1.00)₂ × 2⁻¹ = (0.5)₁₀
At sufficiently high magnification, all normalized floating-point systems look grainy and unequally spaced
< interactive example >
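A brute-force sketch that enumerates this toy system (the enumeration code is mine; the parameters β = 2, p = 3, L = -1, U = 1 are the ones above):

beta, p, L, U = 2, 3, -1, 1

values = {0.0}
for E in range(L, U + 1):
    for mantissa_bits in range(beta ** (p - 1), beta ** p):   # normalized: leading digit nonzero
        m = mantissa_bits / beta ** (p - 1)                   # mantissa in [1, beta)
        for sign in (+1, -1):
            values.add(sign * m * beta ** E)

print(len(values))            # 25 numbers, including 0
print(sorted(values))         # UFL = 0.5, OFL = 3.5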
Rounding Rules
If real number x is not exactly representable, then it is approximated by nearby floating-point number fl(x)
This process is called rounding, and error introduced is called rounding error
Two commonly used rounding rules:
chop: truncate base-β expansion of x after (p - 1)st digit; also called round toward zero
round to nearest: fl(x) is nearest floating-point number to x, using floating-point number whose last stored digit is even in case of tie; also called round to even
Round to nearest is most accurate, and is default rounding
rule in IEEE systems
< interactive example >
Machine Precision
Accuracy of floating-point system characterized by unit roundoff (or machine precision or machine epsilon) denoted by ε_mach
With rounding by chopping, ε_mach = β^{1-p}
With rounding to nearest, ε_mach = ½ β^{1-p}
Alternative definition is smallest number ε such that fl(1 + ε) > 1
Maximum relative error in representing real number x within range of floating-point system is given by
| (fl(x) - x) / x | ≤ ε_mach
Machine Precision, continued
For toy system illustrated earlier:
ε_mach = (0.01)₂ = (0.25)₁₀ with rounding by chopping
ε_mach = (0.001)₂ = (0.125)₁₀ with rounding to nearest
For IEEE floating-point systems:
ε_mach = 2⁻²⁴ ≈ 10⁻⁷ in single precision
ε_mach = 2⁻⁵³ ≈ 10⁻¹⁶ in double precision
So IEEE single and double precision systems have about 7
and 16 decimal digits of precision, respectively
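A quick double-precision check (a sketch, not from the slides). Note that Python's sys.float_info.epsilon reports the spacing between 1 and the next larger floating-point number, 2⁻⁵², which is twice the unit roundoff ε_mach = 2⁻⁵³ quoted above for rounding to nearest.

import sys

eps = 1.0
while 1.0 + eps / 2 > 1.0:      # stop once adding eps/2 no longer changes 1.0
    eps /= 2

print(eps)                      # 2**-52 ~ 2.22e-16: spacing of doubles near 1
print(eps / 2)                  # 2**-53 ~ 1.11e-16: unit roundoff for round to nearest
print(sys.float_info.epsilon)   # same as the first value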
Machine Precision, continued
Though both are small, unit roundoff ε_mach should not be confused with underflow level UFL
Unit roundoff ε_mach is determined by number of digits in mantissa of floating-point system, whereas underflow level UFL is determined by number of digits in exponent field
In all practical floating-point systems,
0 < UFL < ε_mach < OFL
Subnormals and Gradual Underflow
Normalization causes gap around zero in floating-point system
If leading digits are allowed to be zero, but only when exponent is at its minimum value, then gap is filled in by additional subnormal or denormalized floating-point numbers
Subnormals extend range of magnitudes representable, but have less precision than normalized numbers, and unit roundoff is no smaller
Augmented system exhibits gradual underflow
Exceptional Values
IEEE floating-point standard provides special values to indicate two exceptional situations:
Inf, which stands for infinity, results from dividing a finite number by zero, such as 1/0
NaN, which stands for not a number, results from undefined or indeterminate operations such as 0/0, 0 × Inf, or Inf/Inf
Inf and NaN are implemented in IEEE arithmetic through special reserved values of exponent field
Floating-Point Arithmetic
Addition or subtraction: Shifting of mantissa to make
exponents match may cause loss of some digits of smaller
number, possibly all of them
Multiplication: Product of two p-digit mantissas contains up
to 2p digits, so result may not be representable
Division: Quotient of two p-digit mantissas may contain
more than p digits, such as nonterminating binary
expansion of 1/10
Result of oating-point arithmetic operation may differ from
result of corresponding real arithmetic operation on same
operands
Example: Floating-Point Arithmetic
Assume β = 10, p = 6
Let x = 1.92403 × 10², y = 6.35782 × 10⁻¹
Floating-point addition gives x + y = 1.93039 × 10², assuming rounding to nearest
Last two digits of y do not affect result, and with even smaller exponent, y could have had no effect on result
Floating-point multiplication gives x × y = 1.22326 × 10², which discards half of digits of true product
Floating-Point Arithmetic, continued
Real result may also fail to be representable because its
exponent is beyond available range
Overflow is usually more serious than underflow because there is no good approximation to arbitrarily large magnitudes in floating-point system, whereas zero is often reasonable approximation for arbitrarily small magnitudes
On many computer systems overflow is fatal, but an underflow may be silently set to zero
Example: Summing Series
Infinite series
∑_{n=1}^{∞} 1/n
has finite sum in floating-point arithmetic even though real series is divergent
Possible explanations:
Partial sum eventually overflows
1/n eventually underflows
Partial sum ceases to change once 1/n becomes negligible relative to partial sum:
1/n < ε_mach ∑_{k=1}^{n-1} 1/k
< interactive example >
Floating-Point Arithmetic, continued
Ideally, x flop y = fl(x op y), i.e., floating-point arithmetic operations produce correctly rounded results
Computers satisfying IEEE floating-point standard achieve this ideal as long as x op y is within range of floating-point system
But some familiar laws of real arithmetic are not necessarily valid in floating-point system
Floating-point addition and multiplication are commutative but not associative
Example: if ε is positive floating-point number slightly smaller than ε_mach, then (1 + ε) + ε = 1, but 1 + (ε + ε) > 1
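This failure of associativity is easy to reproduce in IEEE double precision (a sketch; the particular ε below is chosen slightly smaller than ε_mach = 2⁻⁵³):

eps = 2.0 ** -53 * 0.9             # positive value slightly smaller than eps_mach

print((1.0 + eps) + eps == 1.0)    # True: each addition to 1.0 rounds back to 1.0
print(1.0 + (eps + eps) > 1.0)     # True: eps + eps is exact and large enough to matter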
Cancellation
Subtraction between two p-digit numbers having same sign
and similar magnitudes yields result with fewer than p
digits, so it is usually exactly representable
Reason is that leading digits of two numbers cancel (i.e.,
their difference is zero)
For example,
1.92403 × 10² - 1.92275 × 10² = 1.28000 × 10⁻¹
which is correct, and exactly representable, but has only three significant digits
Cancellation, continued
Despite exactness of result, cancellation often implies
serious loss of information
Operands are often uncertain due to rounding or other
previous errors, so relative uncertainty in difference may be
large
Example: if ε is positive floating-point number slightly smaller than ε_mach, then (1 + ε) - (1 - ε) = 1 - 1 = 0 in floating-point arithmetic, which is correct for actual operands of final subtraction, but true result of overall computation, 2ε, has been completely lost
Subtraction itself is not at fault: it merely signals loss of
information that had already occurred
Cancellation, continued
Digits lost to cancellation are most significant, leading digits, whereas digits lost in rounding are least significant, trailing digits
Because of this effect, it is generally bad idea to compute any small quantity as difference of large quantities, since rounding error is likely to dominate result
For example, summing alternating series, such as
e^x = 1 + x + x²/2! + x³/3! + ...
for x < 0, may give disastrous results due to catastrophic cancellation
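A sketch of this effect (the choice x = -20 and the comparison against math.exp are mine): summing the series directly for a negative argument loses nearly all accuracy, while summing it for |x| and taking the reciprocal behaves well.

import math

def exp_taylor(x, terms=200):
    """Sum the Taylor series 1 + x + x^2/2! + ... term by term."""
    total, term = 1.0, 1.0
    for n in range(1, terms):
        term *= x / n
        total += term
    return total

x = -20.0
print(exp_taylor(x))          # badly wrong: cancellation among huge alternating terms
print(1.0 / exp_taylor(-x))   # accurate: all terms positive, then take the reciprocal
print(math.exp(x))            # reference value, about 2.061e-9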
Example: Cancellation
Total energy of helium atom is sum of kinetic and potential
energies, which are computed separately and have opposite
signs, so suffer cancellation
Year   Kinetic   Potential   Total
1971   13.0      -14.0       -1.0
1977   12.76     -14.02      -1.26
1980   12.22     -14.35      -2.13
1985   12.28     -14.65      -2.37
1988   12.40     -14.84      -2.44
Although computed values for kinetic and potential energies
changed by only 6% or less, resulting estimate for total energy
changed by 144%
Example: Quadratic Formula
Two solutions of quadratic equation ax² + bx + c = 0 are given by
x = ( -b ± √(b² - 4ac) ) / (2a)
Naive use of formula can suffer overflow, or underflow, or severe cancellation
Rescaling coefficients avoids overflow or harmful underflow
Cancellation between -b and square root can be avoided by computing one root using alternative formula
x = 2c / ( -b ∓ √(b² - 4ac) )
Cancellation inside square root cannot be easily avoided without using higher precision
< interactive example >
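A sketch of the cancellation-avoiding strategy (the function name and test coefficients are mine; rescaling against overflow is not shown): compute the root whose numerator does not cancel with the standard formula, then obtain the other root from the product of the roots, c/a, which is equivalent to the alternative formula above.

import math

def quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0, avoiding cancellation between -b and the square root."""
    d = math.sqrt(b * b - 4 * a * c)      # assumes real roots; rescaling not shown
    if b >= 0:
        x1 = (-b - d) / (2 * a)           # -b and -d have the same sign: no cancellation
    else:
        x1 = (-b + d) / (2 * a)
    x2 = c / (a * x1)                     # second root from the product of roots c/a
    return x1, x2

print(quadratic_roots(1.0, -1e8, 1.0))    # roots ~ 1e8 and 1e-8, both accurate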
Example: Standard Deviation
Mean and standard deviation of sequence x_i, i = 1, ..., n, are given by
x̄ = (1/n) ∑_{i=1}^{n} x_i
and
σ = [ (1/(n-1)) ∑_{i=1}^{n} (x_i - x̄)² ]^{1/2}
Mathematically equivalent formula
σ = [ (1/(n-1)) ( ∑_{i=1}^{n} x_i² - n x̄² ) ]^{1/2}
avoids making two passes through data
Single cancellation at end of one-pass formula is more
damaging numerically than all cancellations in two-pass
formula combined
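A sketch comparing the two formulas on data with a large mean and a tiny spread (the data values are mine). The quantity under the square root is printed for the one-pass formula, since cancellation can make it wrong or even negative.

import math

def variance_two_pass(xs):
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_one_pass(xs):
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)   # single cancellation at the end

data = [1e8, 1e8 + 0.1, 1e8 + 0.2]            # true variance 0.01, standard deviation 0.1
print(math.sqrt(variance_two_pass(data)))      # ~ 0.1
print(variance_one_pass(data))                 # ~ 0, or even negative: the information is lost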
Scientific Computing: An Introductory Survey
Chapter 5: Nonlinear Equations
Prof. Michael T. Heath
Department of Computer Science
University of Illinois at Urbana-Champaign
Copyright © 2002. Reproduction permitted for noncommercial, educational use only.
Outline
1. Nonlinear Equations
2. Numerical Methods in One Dimension
3. Methods for Systems of Nonlinear Equations
Nonlinear Equations
Given function f, we seek value x for which
f(x) = 0
Solution x is root of equation, or zero of function f
So problem is known as root finding or zero finding
Nonlinear Equations
Two important cases
Single nonlinear equation in one unknown, where
f : R → R
Solution is scalar x for which f(x) = 0
System of n coupled nonlinear equations in n unknowns, where
f : Rⁿ → Rⁿ
Solution is vector x for which all components of f are zero simultaneously, f(x) = 0
Examples: Nonlinear Equations
Example of nonlinear equation in one dimension:
x² - 4 sin(x) = 0
for which x = 1.9 is one approximate solution
Example of system of nonlinear equations in two dimensions:
x₁² - x₂ + 0.25 = 0
-x₁ + x₂² + 0.25 = 0
for which x = [0.5, 0.5]ᵀ is solution vector
Existence and Uniqueness
Existence and uniqueness of solutions are more
complicated for nonlinear equations than for linear
equations
For function f : R → R, bracket is interval [a, b] for which sign of f differs at endpoints
If f is continuous and sign(f(a)) ≠ sign(f(b)), then Intermediate Value Theorem implies there is x* ∈ [a, b] such that f(x*) = 0
There is no simple analog for n dimensions
Examples: One Dimension
Nonlinear equations can have any number of solutions
exp(x) + 1 = 0 has no solution
exp(-x) - x = 0 has one solution
x² - 4 sin(x) = 0 has two solutions
x³ + 6x² + 11x + 6 = 0 has three solutions
sin(x) = 0 has infinitely many solutions
Example: Systems in Two Dimensions
x₁² - x₂ + γ = 0
-x₁ + x₂² + γ = 0
(Figure: plots of these two curves for several values of the constant γ.)
Multiplicity
If f(x*) = f′(x*) = f″(x*) = ... = f^{(m-1)}(x*) = 0 but f^{(m)}(x*) ≠ 0 (i.e., mth derivative is lowest derivative of f that does not vanish at x*), then root x* has multiplicity m
If m = 1 (f(x*) = 0 and f′(x*) ≠ 0), then x* is simple root
Sensitivity and Conditioning
Conditioning of root finding problem is opposite to that for evaluating function
Absolute condition number of root finding problem for root x* of f : R → R is 1 / |f′(x*)|
Root is ill-conditioned if tangent line is nearly horizontal
In particular, multiple root (m > 1) is ill-conditioned
Absolute condition number of root finding problem for root x* of f : Rⁿ → Rⁿ is ‖J_f⁻¹(x*)‖, where J_f is Jacobian matrix of f,
{J_f(x)}_ij = ∂f_i(x) / ∂x_j
Root is ill-conditioned if Jacobian matrix is nearly singular
Sensitivity and Conditioning
What do we mean by approximate solution x̂ to nonlinear system,
‖f(x̂)‖ ≈ 0   or   ‖x̂ - x*‖ ≈ 0 ?
First corresponds to small residual, second measures closeness to (usually unknown) true solution x*
Solution criteria are not necessarily small simultaneously
Small residual implies accurate solution only if problem is well-conditioned
Convergence Rate
For general iterative methods, define error at iteration k by
e_k = x_k - x*
where x_k is approximate solution and x* is true solution
For methods that maintain interval known to contain solution, rather than specific approximate value for solution, take error to be length of interval containing solution
Sequence converges with rate r if
lim_{k→∞} ‖e_{k+1}‖ / ‖e_k‖^r = C
for some finite nonzero constant C
Convergence Rate, continued
Some particular cases of interest
r = 1: linear (C < 1)
r > 1: superlinear
r = 2: quadratic
Convergence rate   Digits gained per iteration
linear             constant
superlinear        increasing
quadratic          double
Interval Bisection Method
Bisection method begins with initial bracket and repeatedly
halves its length until solution has been isolated as accurately
as desired
while ((b - a) > tol) do
    m = a + (b - a)/2
    if sign(f(a)) = sign(f(m)) then
        a = m
    else
        b = m
    end
end
< interactive example >
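A direct Python transcription of the loop above (the sign test via math.copysign and returning the final midpoint are implementation details I have added); it is exercised on the running example f(x) = x² - 4 sin(x).

import math

def bisect(f, a, b, tol=1e-6):
    """Bisection: assumes f(a) and f(b) have opposite signs."""
    while (b - a) > tol:
        m = a + (b - a) / 2
        if math.copysign(1.0, f(a)) == math.copysign(1.0, f(m)):
            a = m
        else:
            b = m
    return a + (b - a) / 2

f = lambda x: x**2 - 4 * math.sin(x)
print(bisect(f, 1.0, 3.0))     # ~ 1.933754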
Example: Bisection Method
f(x) = x² - 4 sin(x) = 0
a          f(a)        b          f(b)
1.000000   -2.365884   3.000000   8.435520
1.000000   -2.365884   2.000000   0.362810
1.500000   -1.739980   2.000000   0.362810
1.750000   -0.873444   2.000000   0.362810
1.875000   -0.300718   2.000000   0.362810
1.875000   -0.300718   1.937500   0.019849
1.906250   -0.143255   1.937500   0.019849
1.921875   -0.062406   1.937500   0.019849
1.929688   -0.021454   1.937500   0.019849
1.933594   -0.000846   1.937500   0.019849
1.933594   -0.000846   1.935547   0.009491
1.933594   -0.000846   1.934570   0.004320
1.933594   -0.000846   1.934082   0.001736
1.933594   -0.000846   1.933838   0.000445
Bisection Method, continued
Bisection method makes no use of magnitudes of function
values, only their signs
Bisection is certain to converge, but does so slowly
At each iteration, length of interval containing solution
reduced by half, convergence rate is linear, with r = 1 and
C = 0.5
One bit of accuracy is gained in approximate solution for
each iteration of bisection
Given starting interval [a, b], length of interval after k iterations is (b - a)/2^k, so achieving error tolerance of tol requires
⌈ log₂( (b - a) / tol ) ⌉
iterations, regardless of function f involved
Fixed-Point Problems
Fixed point of given function g : R → R is value x such that
x = g(x)
Many iterative methods for solving nonlinear equations use fixed-point iteration scheme of form
x_{k+1} = g(x_k)
where fixed points for g are solutions for f(x) = 0
Also called functional iteration, since function g is applied repeatedly to initial starting value x₀
For given equation f(x) = 0, there may be many equivalent fixed-point problems x = g(x) with different choices for g
Example: Fixed-Point Problems
If f(x) = x² - x - 2, then fixed points of each of functions
g(x) = x² - 2
g(x) = √(x + 2)
g(x) = 1 + 2/x
g(x) = (x² + 2) / (2x - 1)
are solutions to equation f(x) = 0
(Figures: plots of these four fixed-point problems and of the behavior of the corresponding fixed-point iterations.)
Convergence of Fixed-Point Iteration
If x* = g(x*) and |g′(x*)| < 1, then there is interval containing x* such that iteration
x_{k+1} = g(x_k)
converges to x* if started within that interval
If |g′(x*)| > 1, then iterative scheme diverges
Asymptotic convergence rate of fixed-point iteration is usually linear, with constant C = |g′(x*)|
But if g′(x*) = 0, then convergence rate is at least quadratic
< interactive example >
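A sketch of fixed-point iteration applied to the earlier example f(x) = x² - x - 2, using two of the g choices listed there (the starting value 2.5 and the iteration counts are mine): g(x) = √(x + 2) has |g′(2)| = 1/4 < 1 and converges, while g(x) = x² - 2 has |g′(2)| = 4 > 1 and diverges.

import math

def fixed_point(g, x0, iterations=20):
    x = x0
    for _ in range(iterations):
        x = g(x)
    return x

print(fixed_point(lambda x: math.sqrt(x + 2), 2.5))    # converges to the root x* = 2
print(fixed_point(lambda x: x * x - 2, 2.5, 6))        # diverges rapidly away from 2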
Newton's Method
Truncated Taylor series
f(x + h) ≈ f(x) + f′(x) h
is linear function of h approximating f near x
Replace nonlinear function f by this linear function, whose zero is h = -f(x)/f′(x)
Zeros of original function and linear approximation are not identical, so repeat process, giving Newton's method
x_{k+1} = x_k - f(x_k) / f′(x_k)
Newton's Method, continued
Newton's method approximates nonlinear function f near x_k by tangent line at f(x_k)
Example: Newton's Method
Use Newton's method to find root of
f(x) = x² - 4 sin(x) = 0
Derivative is
f′(x) = 2x - 4 cos(x)
so iteration scheme is
x_{k+1} = x_k - ( x_k² - 4 sin(x_k) ) / ( 2 x_k - 4 cos(x_k) )
Taking x₀ = 3 as starting value, we obtain
x          f(x)       f′(x)      h
3.000000   8.435520   9.959970   -0.846942
2.153058   1.294772   6.505771   -0.199019
1.954039   0.108438   5.403795   -0.020067
1.933972   0.001152   5.288919   -0.000218
1.933754   0.000000   5.287670   0.000000
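A sketch of this iteration in Python; it reproduces the iterates in the table above (the printing format is mine).

import math

def newton(f, df, x, steps=5):
    for _ in range(steps):
        h = -f(x) / df(x)              # Newton step
        x = x + h
        print(f"{x:.6f}  {f(x): .6f}")
    return x

f  = lambda x: x**2 - 4 * math.sin(x)
df = lambda x: 2 * x - 4 * math.cos(x)
newton(f, df, 3.0)                     # converges quadratically to x* ~ 1.933754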
Convergence of Newton's Method
Newton's method transforms nonlinear equation f(x) = 0 into fixed-point problem x = g(x), where
g(x) = x - f(x)/f′(x)
and hence
g′(x) = f(x) f″(x) / (f′(x))²
If x* is simple root (i.e., f(x*) = 0 and f′(x*) ≠ 0), then g′(x*) = 0
Convergence rate of Newton's method for simple root is therefore quadratic (r = 2)
But iterations must start close enough to root to converge
< interactive example >
Newton's Method, continued
For multiple root, convergence rate of Newton's method is only linear, with constant C = 1 - (1/m), where m is multiplicity
k   f(x) = x² - 1   f(x) = x² - 2x + 1
0   2.0             2.0
1   1.25            1.5
2   1.025           1.25
3   1.0003          1.125
4   1.00000005      1.0625
5   1.0             1.03125
Secant Method
For each iteration, Newton's method requires evaluation of both function and its derivative, which may be inconvenient or expensive
In secant method, derivative is approximated by finite difference using two successive iterates, so iteration becomes
x_{k+1} = x_k - f(x_k) (x_k - x_{k-1}) / ( f(x_k) - f(x_{k-1}) )
Convergence rate of secant method is normally superlinear, with r ≈ 1.618
Secant Method, continued
Secant method approximates nonlinear function f by secant
line through previous two iterates
< interactive example >
Example: Secant Method
Use secant method to find root of
f(x) = x² - 4 sin(x) = 0
Taking x₀ = 1 and x₁ = 3 as starting guesses, we obtain
x          f(x)        h
1.000000   -2.365884
3.000000   8.435520    -1.561930
1.438070   -1.896774   0.286735
1.724805   -0.977706   0.305029
2.029833   0.534305    -0.107789
1.922044   -0.061523   0.011130
1.933174   -0.003064   0.000583
1.933757   0.000019    -0.000004
1.933754   0.000000    0.000000
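A sketch of the secant iteration on the same function with the same starting guesses (the fixed step count is mine).

import math

def secant(f, x0, x1, steps=8):
    for _ in range(steps):
        x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))   # finite-difference Newton step
        x0, x1 = x1, x2
    return x1

f = lambda x: x**2 - 4 * math.sin(x)
print(secant(f, 1.0, 3.0))        # ~ 1.933754, superlinear convergence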
Higher-Degree Interpolation
Secant method uses linear interpolation to approximate
function whose zero is sought
Higher convergence rate can be obtained by using
higher-degree polynomial interpolation
For example, quadratic interpolation (Muller's method) has superlinear convergence rate with r ≈ 1.839
Unfortunately, using higher degree polynomial also has
disadvantages
interpolating polynomial may not have real roots
roots may not be easy to compute
choice of root to use as next iterate may not be obvious
Inverse Interpolation
Good alternative is inverse interpolation, where x_k are interpolated as function of y_k = f(x_k) by polynomial p(y), so next approximate solution is p(0)
Most commonly used for root finding is inverse quadratic interpolation
Inverse Quadratic Interpolation
Given approximate solution values a, b, c, with function values f_a, f_b, f_c, next approximate solution found by fitting quadratic polynomial to a, b, c as function of f_a, f_b, f_c, then evaluating polynomial at 0
Based on nontrivial derivation using Lagrange interpolation, we compute
u = f_b / f_c,   v = f_b / f_a,   w = f_a / f_c
p = v ( w (u - w)(c - b) - (1 - u)(b - a) )
q = (w - 1)(u - 1)(v - 1)
then new approximate solution is b + p/q
Convergence rate is normally r ≈ 1.839
< interactive example >
Example: Inverse Quadratic Interpolation
Use inverse quadratic interpolation to find root of
f(x) = x² - 4 sin(x) = 0
Taking x = 1, 2, and 3 as starting values, we obtain
x          f(x)        h
1.000000   -2.365884
2.000000   0.362810
3.000000   8.435520
1.886318   -0.244343   -0.113682
1.939558   0.030786    0.053240
1.933742   -0.000060   -0.005815
1.933754   0.000000    0.000011
1.933754   0.000000    0.000000
Linear Fractional Interpolation
Interpolation using rational fraction of form
φ(x) = (x - u) / (v x - w)
is especially useful for finding zeros of functions having horizontal or vertical asymptotes
φ has zero at x = u, vertical asymptote at x = w/v, and horizontal asymptote at y = 1/v
Given approximate solution values a, b, c, with function values f_a, f_b, f_c, next approximate solution is c + h, where
h = (a - c)(b - c)(f_a - f_b) f_c / [ (a - c)(f_c - f_b) f_a - (b - c)(f_c - f_a) f_b ]
Convergence rate is normally r ≈ 1.839, same as for quadratic interpolation (inverse or regular)
Example: Linear Fractional Interpolation
Use linear fractional interpolation to find root of
f(x) = x² - 4 sin(x) = 0
Taking x = 1, 2, and 3 as starting values, we obtain
x          f(x)        h
1.000000   -2.365884
2.000000   0.362810
3.000000   8.435520
1.906953   -0.139647   -1.093047
1.933351   -0.002131   0.026398
1.933756   0.000013    0.000406
1.933754   0.000000    -0.000003
< interactive example >
Safeguarded Methods
Rapidly convergent methods for solving nonlinear
equations may not converge unless started close to
solution, but safe methods are slow
Hybrid methods combine features of both types of
methods to achieve both speed and reliability
Use rapidly convergent method, but maintain bracket
around solution
If next approximate solution given by fast method falls
outside bracketing interval, perform one iteration of safe
method, such as bisection
Safeguarded Methods, continued
Fast method can then be tried again on smaller interval
with greater chance of success
Ultimately, convergence rate of fast method should prevail
Hybrid approach seldom does worse than safe method,
and usually does much better
Popular combination is bisection and inverse quadratic
interpolation, for which no derivatives required
Zeros of Polynomials
For polynomial p(x) of degree n, one may want to find all n of its zeros, which may be complex even if coefficients are real
Several approaches are available:
Use root-finding method such as Newton's or Muller's method to find one root, deflate it out, and repeat
Form companion matrix of polynomial and use eigenvalue routine to compute all its eigenvalues
Use method designed specifically for finding all roots of polynomial, such as Jenkins-Traub
Systems of Nonlinear Equations
Solving systems of nonlinear equations is much more difficult than scalar case because
Wider variety of behavior is possible, so determining
existence and number of solutions or good starting guess
is much more complex
There is no simple way, in general, to guarantee
convergence to desired solution or to bracket solution to
produce absolutely safe method
Computational overhead increases rapidly with dimension
of problem
Fixed-Point Iteration
Fixed-point problem for g : Rⁿ → Rⁿ is to find vector x such that
x = g(x)
Corresponding fixed-point iteration is
x_{k+1} = g(x_k)
If ρ(G(x*)) < 1, where ρ is spectral radius and G(x) is Jacobian matrix of g evaluated at x, then fixed-point iteration converges if started close enough to solution
Convergence rate is normally linear, with constant C given by spectral radius ρ(G(x*))
If G(x*) = O, then convergence rate is at least quadratic
Newton's Method
In n dimensions, Newton's method has form
x_{k+1} = x_k - J(x_k)⁻¹ f(x_k)
where J(x) is Jacobian matrix of f,
{J(x)}_ij = ∂f_i(x) / ∂x_j
In practice, we do not explicitly invert J(x_k), but instead solve linear system
J(x_k) s_k = -f(x_k)
for Newton step s_k, then take as next iterate
x_{k+1} = x_k + s_k
Example: Newton's Method
Use Newton's method to solve nonlinear system
f(x) = [ x₁ + 2x₂ - 2,  x₁² + 4x₂² - 4 ]ᵀ = 0
Jacobian matrix is
J_f(x) = [ 1, 2 ; 2x₁, 8x₂ ]
If we take x₀ = [1, 2]ᵀ, then
f(x₀) = [3, 13]ᵀ,   J_f(x₀) = [ 1, 2 ; 2, 16 ]
Solving system [ 1, 2 ; 2, 16 ] s₀ = [ -3, -13 ]ᵀ gives s₀ = [ -1.83, -0.58 ]ᵀ,
so x₁ = x₀ + s₀ = [ -0.83, 1.42 ]ᵀ
Example, continued
Evaluating at new point,
f(x₁) = [ 0, 4.72 ]ᵀ,   J_f(x₁) = [ 1, 2 ; -1.67, 11.3 ]
Solving system [ 1, 2 ; -1.67, 11.3 ] s₁ = [ 0, -4.72 ]ᵀ gives s₁ = [ 0.64, -0.32 ]ᵀ,
so x₂ = x₁ + s₁ = [ -0.19, 1.10 ]ᵀ
Evaluating at new point,
f(x₂) = [ 0, 0.83 ]ᵀ,   J_f(x₂) = [ 1, 2 ; -0.38, 8.76 ]
Iterations eventually converge to solution x* = [ 0, 1 ]ᵀ
< interactive example >
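A sketch of the same computation with NumPy (numpy.linalg.solve performs the linear solve; the iteration count and printing are mine).

import numpy as np

def f(x):
    return np.array([x[0] + 2 * x[1] - 2,
                     x[0] ** 2 + 4 * x[1] ** 2 - 4])

def J(x):
    return np.array([[1.0, 2.0],
                     [2 * x[0], 8 * x[1]]])

x = np.array([1.0, 2.0])
for k in range(6):
    s = np.linalg.solve(J(x), -f(x))   # Newton step: J(x_k) s_k = -f(x_k)
    x = x + s
    print(k + 1, x)                    # converges to the solution [0, 1]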
Convergence of Newton's Method
Differentiating corresponding fixed-point operator
g(x) = x - J(x)⁻¹ f(x)
and evaluating at solution x* gives
G(x*) = I - ( J(x*)⁻¹ J(x*) + Σ_{i=1}^{n} f_i(x*) H_i(x*) ) = O
where H_i(x) is component matrix of derivative of J(x)⁻¹
Convergence rate of Newton's method for nonlinear systems is normally quadratic, provided Jacobian matrix J(x*) is nonsingular
But it must be started close enough to solution to converge
Cost of Newton's Method
Cost per iteration of Newton's method for dense problem in n dimensions is substantial:
Computing Jacobian matrix costs n² scalar function evaluations
Solving linear system costs O(n³) operations
Secant Updating Methods
Secant updating methods reduce cost by
Using function values at successive iterates to build
approximate Jacobian and avoiding explicit evaluation of
derivatives
Updating factorization of approximate Jacobian rather than
refactoring it each iteration
Most secant updating methods have superlinear but not
quadratic convergence rate
Secant updating methods often cost less overall than Newton's method because of lower cost per iteration
Broyden's Method
Broyden's method is typical secant updating method
Beginning with initial guess x₀ for solution and initial approximate Jacobian B₀, following steps are repeated until convergence:
x₀ = initial guess
B₀ = initial Jacobian approximation
for k = 0, 1, 2, . . .
    Solve B_k s_k = -f(x_k) for s_k
    x_{k+1} = x_k + s_k
    y_k = f(x_{k+1}) - f(x_k)
    B_{k+1} = B_k + ( (y_k - B_k s_k) s_kᵀ ) / ( s_kᵀ s_k )
end
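A sketch of this loop in Python with NumPy, applied to the same 2-by-2 system used in the Broyden example that follows (the stopping test and iteration limit are mine; B is updated directly rather than through an updated factorization).

import numpy as np

def broyden(f, x, B, tol=1e-10, max_steps=25):
    """Broyden's method following the loop above (a sketch: B updated directly)."""
    fx = f(x)
    for _ in range(max_steps):
        if np.linalg.norm(fx) < tol:
            break
        s = np.linalg.solve(B, -fx)                  # solve B_k s_k = -f(x_k)
        x = x + s
        fx_new = f(x)
        y = fx_new - fx                              # y_k = f(x_{k+1}) - f(x_k)
        B = B + np.outer(y - B @ s, s) / (s @ s)     # rank-one secant update of B
        fx = fx_new
    return x

f = lambda x: np.array([x[0] + 2 * x[1] - 2,
                        x[0] ** 2 + 4 * x[1] ** 2 - 4])
x0 = np.array([1.0, 2.0])
B0 = np.array([[1.0, 2.0], [2.0, 16.0]])             # initial Jacobian approximation J_f(x0)
print(broyden(f, x0, B0))                            # approaches the solution [0, 1]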
Broyden's Method, continued
Motivation for formula for B_{k+1} is to make least change to B_k subject to satisfying secant equation
B_{k+1} ( x_{k+1} - x_k ) = f(x_{k+1}) - f(x_k)
In practice, factorization of B_k is updated instead of updating B_k directly, so total cost per iteration is only O(n²)
Example: Broyden's Method
Use Broyden's method to solve nonlinear system
f(x) = [ x₁ + 2x₂ - 2,  x₁² + 4x₂² - 4 ]ᵀ = 0
If x₀ = [1, 2]ᵀ, then f(x₀) = [3, 13]ᵀ, and we choose
B₀ = J_f(x₀) = [ 1, 2 ; 2, 16 ]
Solving system [ 1, 2 ; 2, 16 ] s₀ = [ -3, -13 ]ᵀ gives s₀ = [ -1.83, -0.58 ]ᵀ,
so x₁ = x₀ + s₀ = [ -0.83, 1.42 ]ᵀ
Example, continued
Evaluating at new point x₁ gives f(x₁) = [ 0, 4.72 ]ᵀ, so
y₀ = f(x₁) - f(x₀) = [ -3, -8.28 ]ᵀ
From updating formula, we obtain
B₁ = [ 1, 2 ; 2, 16 ] + [ 0, 0 ; -2.34, -0.74 ] = [ 1, 2 ; -0.34, 15.3 ]
Solving system [ 1, 2 ; -0.34, 15.3 ] s₁ = [ 0, -4.72 ]ᵀ gives s₁ = [ 0.59, -0.30 ]ᵀ,
so x₂ = x₁ + s₁ = [ -0.24, 1.120 ]ᵀ
Example, continued
Evaluating at new point x₂ gives f(x₂) = [ 0, 1.08 ]ᵀ, so
y₁ = f(x₂) - f(x₁) = [ 0, -3.64 ]ᵀ
From updating formula, we obtain
B₂ = [ 1, 2 ; -0.34, 15.3 ] + [ 0, 0 ; 1.46, -0.73 ] = [ 1, 2 ; 1.12, 14.5 ]
Iterations continue until convergence to solution x* = [ 0, 1 ]ᵀ
< interactive example >
Robust Newton-Like Methods
Newton's method and its variants may fail to converge when started far from solution
Safeguards can enlarge region of convergence of Newton-like methods
Simplest precaution is damped Newton method, in which new iterate is
x_{k+1} = x_k + α_k s_k
where s_k is Newton (or Newton-like) step and α_k is scalar parameter chosen to ensure progress toward solution
Parameter α_k reduces Newton step when it is too large, but α_k = 1 suffices near solution and still yields fast asymptotic convergence rate
Trust-Region Methods
Another approach is to maintain estimate of trust region where Taylor series approximation, upon which Newton's method is based, is sufficiently accurate for resulting computed step to be reliable
Adjusting size of trust region to constrain step size when necessary usually enables progress toward solution even starting far away, yet still permits rapid convergence once near solution
Unlike damped Newton method, trust region method may modify direction as well as length of Newton step
More details on this approach will be given in Chapter 6
Optimization
Given function f : Rⁿ → R, and set S ⊆ Rⁿ, find x* ∈ S such that f(x*) ≤ f(x) for all x ∈ S
x* is called minimizer or minimum of f
It suffices to consider only minimization, since maximum of f is minimum of -f
Objective function f is usually differentiable, and may be linear or nonlinear
Constraint set S is defined by system of equations and inequalities, which may be linear or nonlinear
Points x ∈ S are called feasible points
If S = Rⁿ, problem is unconstrained
Optimization Problems
General continuous optimization problem:
min f(x) subject to g(x) = 0 and h(x) ≤ 0
where f : Rⁿ → R, g : Rⁿ → Rᵐ, h : Rⁿ → Rᵖ
Linear programming: f, g, and h are all linear
Nonlinear programming: at least one of f, g, and h is
nonlinear
Examples: Optimization Problems
Minimize weight of structure subject to constraint on its
strength, or maximize its strength subject to constraint on
its weight
Minimize cost of diet subject to nutritional constraints
Minimize surface area of cylinder subject to constraint on
its volume:
min_{x₁, x₂} f(x₁, x₂) = 2π x₁ (x₁ + x₂)
subject to g(x₁, x₂) = π x₁² x₂ - V = 0
where x₁ and x₂ are radius and height of cylinder, and V is required volume
Local vs Global Optimization
x* ∈ S is global minimum if f(x*) ≤ f(x) for all x ∈ S
x* ∈ S is local minimum if f(x*) ≤ f(x) for all feasible x in some neighborhood of x*
Global Optimization
Finding, or even verifying, global minimum is difficult, in general
Most optimization methods are designed to nd local
minimum, which may or may not be global minimum
If global minimum is desired, one can try several widely
separated starting points and see if all produce same
result
For some problems, such as linear programming, global
optimization is more tractable
Michael T. Heath Scientic Computing 7 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Denitions
Existence and Uniqueness
Optimality Conditions
Existence of Minimum
If $f$ is continuous on closed and bounded set $S \subseteq \mathbb{R}^n$, then $f$ has global minimum on $S$
If $S$ is not closed or is unbounded, then $f$ may have no local or global minimum on $S$
Continuous function $f$ on unbounded set $S \subseteq \mathbb{R}^n$ is coercive if
$$\lim_{\|x\| \to \infty} f(x) = +\infty$$
i.e., $f(x)$ must be large whenever $\|x\|$ is large
If $f$ is coercive on closed, unbounded set $S \subseteq \mathbb{R}^n$, then $f$ has global minimum on $S$
Michael T. Heath Scientic Computing 8 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Denitions
Existence and Uniqueness
Optimality Conditions
Level Sets
Level set for function $f\colon S \subseteq \mathbb{R}^n \to \mathbb{R}$ is set of all points in $S$ for which $f$ has some given constant value
For given $\gamma \in \mathbb{R}$, sublevel set is
$$L_\gamma = \{x \in S : f(x) \le \gamma\}$$
If continuous function $f$ on $S \subseteq \mathbb{R}^n$ has nonempty sublevel set that is closed and bounded, then $f$ has global minimum on $S$
If $S$ is unbounded, then $f$ is coercive on $S$ if, and only if, all of its sublevel sets are bounded
Michael T. Heath Scientic Computing 9 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Denitions
Existence and Uniqueness
Optimality Conditions
Uniqueness of Minimum
Set $S \subseteq \mathbb{R}^n$ is convex if it contains line segment between any two of its points
Function $f\colon S \subseteq \mathbb{R}^n \to \mathbb{R}$ is convex on convex set $S$ if its graph along any line segment in $S$ lies on or below chord connecting function values at endpoints of segment
Any local minimum of convex function $f$ on convex set $S \subseteq \mathbb{R}^n$ is global minimum of $f$ on $S$
Any local minimum of strictly convex function $f$ on convex set $S \subseteq \mathbb{R}^n$ is unique global minimum of $f$ on $S$
Michael T. Heath Scientic Computing 10 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Denitions
Existence and Uniqueness
Optimality Conditions
First-Order Optimality Condition
For function of one variable, one can find extremum by differentiating function and setting derivative to zero
Generalization to function of $n$ variables is to find critical point, i.e., solution of nonlinear system
$$\nabla f(x) = 0$$
where $\nabla f(x)$ is gradient vector of $f$, whose $i$th component is $\partial f(x)/\partial x_i$
For continuously differentiable $f\colon S \subseteq \mathbb{R}^n \to \mathbb{R}$, any interior point $x^*$ of $S$ at which $f$ has local minimum must be critical point of $f$
But not all critical points are minima: they can also be maxima or saddle points
Michael T. Heath Scientic Computing 11 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Denitions
Existence and Uniqueness
Optimality Conditions
Second-Order Optimality Condition
For twice continuously differentiable $f\colon S \subseteq \mathbb{R}^n \to \mathbb{R}$, we can distinguish among critical points by considering Hessian matrix $H_f(x)$ defined by
$$\{H_f(x)\}_{ij} = \frac{\partial^2 f(x)}{\partial x_i \, \partial x_j}$$
which is symmetric
At critical point $x^*$, if $H_f(x^*)$ is
  positive definite, then $x^*$ is minimum of $f$
  negative definite, then $x^*$ is maximum of $f$
  indefinite, then $x^*$ is saddle point of $f$
  singular, then various pathological situations are possible
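In practice this test is often applied by inspecting the eigenvalues of the Hessian: all positive means positive definite, all negative means negative definite, mixed signs means indefinite. The snippet below is my own illustration (not from the slides) and assumes NumPy; it classifies the critical point of the quadratic $f(x) = 0.5x_1^2 + 2.5x_2^2$ used in later examples.

```python
import numpy as np

def classify_critical_point(H, tol=1e-12):
    """Classify a critical point from its symmetric Hessian via eigenvalues."""
    evals = np.linalg.eigvalsh(H)              # eigenvalues of symmetric matrix
    if np.any(np.abs(evals) <= tol):
        return "singular (test inconclusive)"
    if np.all(evals > 0):
        return "minimum (positive definite)"
    if np.all(evals < 0):
        return "maximum (negative definite)"
    return "saddle point (indefinite)"

# Hessian of f(x) = 0.5*x1^2 + 2.5*x2^2 is constant
H = np.array([[1.0, 0.0],
              [0.0, 5.0]])
print(classify_critical_point(H))              # minimum (positive definite)
```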
Michael T. Heath Scientic Computing 12 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Denitions
Existence and Uniqueness
Optimality Conditions
Constrained Optimality
If problem is constrained, only feasible directions are relevant
For equality-constrained problem
$$\min f(x) \quad \text{subject to} \quad g(x) = 0$$
where $f\colon \mathbb{R}^n \to \mathbb{R}$ and $g\colon \mathbb{R}^n \to \mathbb{R}^m$, with $m \le n$, necessary condition for feasible point $x^*$ to be solution is that negative gradient of $f$ lie in space spanned by constraint normals,
$$-\nabla f(x^*) = J_g^T(x^*)\, \lambda$$
where $J_g$ is Jacobian matrix of $g$, and $\lambda$ is vector of Lagrange multipliers
This condition says we cannot reduce objective function without violating constraints
Michael T. Heath Scientic Computing 13 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Denitions
Existence and Uniqueness
Optimality Conditions
Constrained Optimality, continued
Lagrangian function $L\colon \mathbb{R}^{n+m} \to \mathbb{R}$ is defined by
$$L(x, \lambda) = f(x) + \lambda^T g(x)$$
Its gradient is given by
$$\nabla L(x, \lambda) = \begin{bmatrix} \nabla f(x) + J_g^T(x)\,\lambda \\ g(x) \end{bmatrix}$$
Its Hessian is given by
$$H_L(x, \lambda) = \begin{bmatrix} B(x, \lambda) & J_g^T(x) \\ J_g(x) & O \end{bmatrix}$$
where
$$B(x, \lambda) = H_f(x) + \sum_{i=1}^{m} \lambda_i \, H_{g_i}(x)$$
Michael T. Heath Scientic Computing 14 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Denitions
Existence and Uniqueness
Optimality Conditions
Constrained Optimality, continued
Together, necessary condition and feasibility imply critical point of Lagrangian function,
$$\nabla L(x, \lambda) = \begin{bmatrix} \nabla f(x) + J_g^T(x)\,\lambda \\ g(x) \end{bmatrix} = 0$$
Hessian of Lagrangian is symmetric, but not positive definite, so critical point of $L$ is saddle point rather than minimum or maximum
Critical point $(x^*, \lambda^*)$ of $L$ is constrained minimum of $f$ if $B(x^*, \lambda^*)$ is positive definite on null space of $J_g(x^*)$
If columns of $Z$ form basis for null space, then test projected Hessian $Z^T B Z$ for positive definiteness
Michael T. Heath Scientic Computing 15 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Denitions
Existence and Uniqueness
Optimality Conditions
Constrained Optimality, continued
If inequalities are present, then KKT optimality conditions
also require nonnegativity of Lagrange multipliers
corresponding to inequalities, and complementarity
condition
Michael T. Heath Scientic Computing 16 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Denitions
Existence and Uniqueness
Optimality Conditions
Sensitivity and Conditioning
Function minimization and equation solving are closely
related problems, but their sensitivities differ
In one dimension, absolute condition number of root $x^*$ of equation $f(x) = 0$ is $1/|f'(x^*)|$, so if $|f(\hat{x})| \le \epsilon$, then $|\hat{x} - x^*|$ may be as large as $\epsilon / |f'(x^*)|$
For minimizing $f$, Taylor series expansion
$$f(\hat{x}) = f(x^* + h) = f(x^*) + f'(x^*)\,h + \tfrac{1}{2} f''(x^*)\,h^2 + \mathcal{O}(h^3)$$
shows that, since $f'(x^*) = 0$, if $|f(\hat{x}) - f(x^*)| \le \epsilon$, then $|\hat{x} - x^*|$ may be as large as $\sqrt{2\epsilon / |f''(x^*)|}$
Thus, based on function values alone, minima can be computed to only about half precision
Michael T. Heath Scientic Computing 17 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Golden Section Search
Successive Parabolic Interpolation
Newtons Method
Unimodality
For minimizing function of one variable, we need bracket
for solution analogous to sign change for nonlinear
equation
Real-valued function $f$ is unimodal on interval $[a, b]$ if there is unique $x^* \in [a, b]$ such that $f(x^*)$ is minimum of $f$ on $[a, b]$, and $f$ is strictly decreasing for $x \le x^*$, strictly increasing for $x^* \le x$
Unimodality enables discarding portions of interval based
on sample function values, analogous to interval bisection
Michael T. Heath Scientic Computing 18 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Golden Section Search
Successive Parabolic Interpolation
Newtons Method
Golden Section Search
Suppose $f$ is unimodal on $[a, b]$, and let $x_1$ and $x_2$ be two points within $[a, b]$, with $x_1 < x_2$
Evaluating and comparing $f(x_1)$ and $f(x_2)$, we can discard either $(x_2, b]$ or $[a, x_1)$, with minimum known to lie in remaining subinterval
To repeat process, we need to compute only one new function evaluation
To reduce length of interval by fixed fraction at each iteration, each new pair of points must have same relationship with respect to new interval that previous pair had with respect to previous interval
Michael T. Heath Scientic Computing 19 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Golden Section Search
Successive Parabolic Interpolation
Newtons Method
Golden Section Search, continued
To accomplish this, we choose relative positions of two points as $\tau$ and $1 - \tau$, where $\tau^2 = 1 - \tau$, so
$$\tau = (\sqrt{5} - 1)/2 \approx 0.618 \quad \text{and} \quad 1 - \tau \approx 0.382$$
Whichever subinterval is retained, its length will be $\tau$ relative to previous interval, and interior point retained will be at position either $\tau$ or $1 - \tau$ relative to new interval
To continue iteration, we need to compute only one new function value, at complementary point
This choice of sample points is called golden section search
Golden section search is safe but convergence rate is only linear, with constant $C \approx 0.618$
Michael T. Heath Scientic Computing 20 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Golden Section Search
Successive Parabolic Interpolation
Newtons Method
Golden Section Search, continued
tau = (sqrt(5) - 1)/2
x1 = a + (1 - tau)*(b - a);  f1 = f(x1)
x2 = a + tau*(b - a);        f2 = f(x2)
while ((b - a) > tol) do
    if (f1 > f2) then
        a  = x1
        x1 = x2
        f1 = f2
        x2 = a + tau*(b - a)
        f2 = f(x2)
    else
        b  = x2
        x2 = x1
        f2 = f1
        x1 = a + (1 - tau)*(b - a)
        f1 = f(x1)
    end
end
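A direct Python transcription of this pseudocode is sketched below; it is my own illustration rather than code from the slides. Running it on the example function $f(x) = 0.5 - x\exp(-x^2)$ from the next slide drives the bracket toward $x^* \approx 0.707$.

```python
import math

def golden_section_search(f, a, b, tol=1e-3):
    """Minimize a unimodal function f on [a, b] by golden section search."""
    tau = (math.sqrt(5.0) - 1.0) / 2.0             # ~0.618
    x1 = a + (1 - tau) * (b - a); f1 = f(x1)
    x2 = a + tau * (b - a);       f2 = f(x2)
    while (b - a) > tol:
        if f1 > f2:                                # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + tau * (b - a); f2 = f(x2)
        else:                                      # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + (1 - tau) * (b - a); f1 = f(x1)
    return 0.5 * (a + b)

f = lambda x: 0.5 - x * math.exp(-x * x)
print(golden_section_search(f, 0.0, 2.0))          # approx 0.707
```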
Michael T. Heath Scientic Computing 21 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Golden Section Search
Successive Parabolic Interpolation
Newtons Method
Example: Golden Section Search
Use golden section search to minimize
$$f(x) = 0.5 - x \exp(-x^2)$$
Michael T. Heath Scientic Computing 22 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Golden Section Search
Successive Parabolic Interpolation
Newtons Method
Example, continued
    x_1      f_1       x_2      f_2
  0.764    0.074     1.236    0.232
  0.472    0.122     0.764    0.074
  0.764    0.074     0.944    0.113
  0.652    0.074     0.764    0.074
  0.584    0.085     0.652    0.074
  0.652    0.074     0.695    0.071
  0.695    0.071     0.721    0.071
  0.679    0.072     0.695    0.071
  0.695    0.071     0.705    0.071
  0.705    0.071     0.711    0.071
< interactive example >
Michael T. Heath Scientic Computing 23 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Golden Section Search
Successive Parabolic Interpolation
Newtons Method
Successive Parabolic Interpolation
Fit quadratic polynomial to three function values
Take minimum of quadratic to be new approximation to
minimum of function
New point replaces oldest of three previous points and
process is repeated until convergence
Convergence rate of successive parabolic interpolation is
superlinear, with r 1.324
Michael T. Heath Scientic Computing 24 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Golden Section Search
Successive Parabolic Interpolation
Newtons Method
Example: Successive Parabolic Interpolation
Use successive parabolic interpolation to minimize
$$f(x) = 0.5 - x \exp(-x^2)$$
Michael T. Heath Scientic Computing 25 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Golden Section Search
Successive Parabolic Interpolation
Newtons Method
Example, continued
    x_k      f(x_k)
  0.000     0.500
  0.600     0.081
  1.200     0.216
  0.754     0.073
  0.721     0.071
  0.692     0.071
  0.707     0.071
< interactive example >
Michael T. Heath Scientic Computing 26 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Golden Section Search
Successive Parabolic Interpolation
Newtons Method
Newtons Method
Another local quadratic approximation is truncated Taylor series
$$f(x + h) \approx f(x) + f'(x)\,h + \frac{f''(x)}{2} h^2$$
By differentiation, minimum of this quadratic function of $h$ is given by $h = -f'(x)/f''(x)$
Suggests iteration scheme
$$x_{k+1} = x_k - f'(x_k)/f''(x_k)$$
which is Newton's method for solving nonlinear equation $f'(x) = 0$
Newton's method for finding minimum normally has quadratic convergence rate, but must be started close enough to solution to converge
< interactive example >
Michael T. Heath Scientic Computing 27 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Golden Section Search
Successive Parabolic Interpolation
Newtons Method
Example: Newtons Method
Use Newton's method to minimize $f(x) = 0.5 - x \exp(-x^2)$
First and second derivatives of $f$ are given by
$$f'(x) = (2x^2 - 1) \exp(-x^2)$$
and
$$f''(x) = 2x (3 - 2x^2) \exp(-x^2)$$
Newton iteration for zero of $f'$ is given by
$$x_{k+1} = x_k - (2x_k^2 - 1) / (2x_k(3 - 2x_k^2))$$
Using starting guess $x_0 = 1$, we obtain

    x_k      f(x_k)
  1.000     0.132
  0.500     0.111
  0.700     0.071
  0.707     0.071
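The iteration is easy to check numerically; the following few lines are my own sketch (not from the slides) and reproduce the table above.

```python
import math

f = lambda x: 0.5 - x * math.exp(-x * x)

x = 1.0                                            # starting guess x0 = 1
for k in range(4):
    print(f"{x:.3f}  {f(x):.3f}")
    x = x - (2*x*x - 1) / (2*x * (3 - 2*x*x))      # Newton step for f'(x) = 0
```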
Michael T. Heath Scientic Computing 28 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Golden Section Search
Successive Parabolic Interpolation
Newtons Method
Safeguarded Methods
As with nonlinear equations in one dimension,
slow-but-sure and fast-but-risky optimization methods can
be combined to provide both safety and efficiency
Most library routines for one-dimensional optimization are
based on this hybrid approach
Popular combination is golden section search and
successive parabolic interpolation, for which no derivatives
are required
Michael T. Heath Scientic Computing 29 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Direct Search Methods
Direct search methods for multidimensional optimization
make no use of function values other than comparing them
For minimizing function $f$ of $n$ variables, Nelder-Mead method begins with $n + 1$ starting points, forming simplex in $\mathbb{R}^n$
Then move to new point along straight line from current
point having highest function value through centroid of
other points
New point replaces worst point, and process is repeated
Direct search methods are useful for nonsmooth functions
or for small n, but expensive for larger n
< interactive example >
Michael T. Heath Scientic Computing 30 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Steepest Descent Method
Let $f\colon \mathbb{R}^n \to \mathbb{R}$ be real-valued function of $n$ real variables
At any point $x$ where gradient vector is nonzero, negative gradient, $-\nabla f(x)$, points downhill toward lower values of $f$
In fact, $-\nabla f(x)$ is locally direction of steepest descent: $f$ decreases more rapidly along direction of negative gradient than along any other
Steepest descent method: starting from initial guess $x_0$, successive approximate solutions given by
$$x_{k+1} = x_k - \alpha_k \nabla f(x_k)$$
where $\alpha_k$ is line search parameter that determines how far to go in given direction
Michael T. Heath Scientic Computing 31 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Steepest Descent, continued
Given descent direction, such as negative gradient, determining appropriate value for $\alpha_k$ at each iteration is one-dimensional minimization problem
$$\min_{\alpha_k} f(x_k - \alpha_k \nabla f(x_k))$$
that can be solved by methods already discussed
Steepest descent method is very reliable: it can always
make progress provided gradient is nonzero
But method is myopic in its view of function's behavior, and
resulting iterates can zigzag back and forth, making very
slow progress toward solution
In general, convergence rate of steepest descent is only
linear, with constant factor that can be arbitrarily close to 1
Michael T. Heath Scientic Computing 32 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example: Steepest Descent
Use steepest descent method to minimize
$$f(x) = 0.5 x_1^2 + 2.5 x_2^2$$
Gradient is given by
$$\nabla f(x) = \begin{bmatrix} x_1 \\ 5 x_2 \end{bmatrix}$$
Taking $x_0 = \begin{bmatrix} 5 \\ 1 \end{bmatrix}$, we have $\nabla f(x_0) = \begin{bmatrix} 5 \\ 5 \end{bmatrix}$
Performing line search along negative gradient direction,
$$\min_{\alpha_0} f(x_0 - \alpha_0 \nabla f(x_0))$$
exact minimum along line is given by $\alpha_0 = 1/3$, so next approximation is
$$x_1 = \begin{bmatrix} 3.333 \\ -0.667 \end{bmatrix}$$
Michael T. Heath Scientic Computing 33 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example, continued
      x_k               f(x_k)      grad f(x_k)
   5.000    1.000      15.000      5.000    5.000
   3.333   -0.667       6.667      3.333   -3.333
   2.222    0.444       2.963      2.222    2.222
   1.481   -0.296       1.317      1.481   -1.481
   0.988    0.198       0.585      0.988    0.988
   0.658   -0.132       0.260      0.658   -0.658
   0.439    0.088       0.116      0.439    0.439
   0.293   -0.059       0.051      0.293   -0.293
   0.195    0.039       0.023      0.195    0.195
   0.130   -0.026       0.010      0.130   -0.130
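For this quadratic objective the exact line search has the closed form $\alpha_k = (g_k^T g_k)/(g_k^T A g_k)$ with $A = \mathrm{diag}(1, 5)$, so the table is easy to reproduce. The following sketch is my own illustration, not code from the slides.

```python
import numpy as np

A = np.diag([1.0, 5.0])                  # Hessian of f(x) = 0.5*x1^2 + 2.5*x2^2
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x = np.array([5.0, 1.0])
for k in range(10):
    g = grad(x)
    alpha = (g @ g) / (g @ A @ g)        # exact line search for a quadratic
    print(f"{x[0]:7.3f} {x[1]:7.3f} {f(x):8.3f}")
    x = x - alpha * g
```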
Michael T. Heath Scientic Computing 34 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example, continued
< interactive example >
Michael T. Heath Scientic Computing 35 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Newtons Method
Broader view can be obtained by local quadratic approximation, which is equivalent to Newton's method
In multidimensional optimization, we seek zero of gradient, so Newton iteration has form
$$x_{k+1} = x_k - H_f^{-1}(x_k)\, \nabla f(x_k)$$
where $H_f(x)$ is Hessian matrix of second partial derivatives of $f$,
$$\{H_f(x)\}_{ij} = \frac{\partial^2 f(x)}{\partial x_i \, \partial x_j}$$
Michael T. Heath Scientic Computing 36 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Newtons Method, continued
Do not explicitly invert Hessian matrix, but instead solve linear system
$$H_f(x_k)\, s_k = -\nabla f(x_k)$$
for Newton step $s_k$, then take as next iterate
$$x_{k+1} = x_k + s_k$$
Convergence rate of Newtons method for minimization is
normally quadratic
As usual, Newtons method is unreliable unless started
close enough to solution to converge
< interactive example >
Michael T. Heath Scientic Computing 37 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example: Newtons Method
Use Newton's method to minimize
$$f(x) = 0.5 x_1^2 + 2.5 x_2^2$$
Gradient and Hessian are given by
$$\nabla f(x) = \begin{bmatrix} x_1 \\ 5 x_2 \end{bmatrix} \quad \text{and} \quad H_f(x) = \begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix}$$
Taking $x_0 = \begin{bmatrix} 5 \\ 1 \end{bmatrix}$, we have $\nabla f(x_0) = \begin{bmatrix} 5 \\ 5 \end{bmatrix}$
Linear system for Newton step is
$$\begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix} s_0 = \begin{bmatrix} -5 \\ -5 \end{bmatrix}$$
so
$$x_1 = x_0 + s_0 = \begin{bmatrix} 5 \\ 1 \end{bmatrix} + \begin{bmatrix} -5 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
which is exact solution for this problem, as expected for quadratic function
Michael T. Heath Scientic Computing 38 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Newtons Method, continued
In principle, line search parameter is unnecessary with
Newtons method, since quadratic model determines
length, as well as direction, of step to next approximate
solution
When started far from solution, however, it may still be
advisable to perform line search along direction of Newton step $s_k$ to make method more robust (damped Newton)
Once iterates are near solution, then $\alpha_k = 1$ should suffice for subsequent iterations
Michael T. Heath Scientic Computing 39 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Newtons Method, continued
If objective function $f$ has continuous second partial derivatives, then Hessian matrix $H_f$ is symmetric, and near minimum it is positive definite
Thus, linear system for step to next iterate can be solved in only about half of work required for LU factorization
Far from minimum, $H_f(x_k)$ may not be positive definite, so Newton step $s_k$ may not be descent direction for function, i.e., we may not have
$$\nabla f(x_k)^T s_k < 0$$
In this case, alternative descent direction can be computed, such as negative gradient or direction of negative curvature, and then perform line search
Michael T. Heath Scientic Computing 40 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Trust Region Methods
Alternative to line search is trust region method, in which
approximate solution is constrained to lie within region
where quadratic model is sufciently accurate
If current trust radius is binding, minimizing quadratic
model function subject to this constraint may modify
direction as well as length of Newton step
Accuracy of quadratic model is assessed by comparing
actual decrease in objective function with that predicted by
quadratic model, and trust radius is increased or
decreased accordingly
Michael T. Heath Scientic Computing 41 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Trust Region Methods, continued
Michael T. Heath Scientic Computing 42 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Quasi-Newton Methods
Newton's method costs $\mathcal{O}(n^3)$ arithmetic and $\mathcal{O}(n^2)$ scalar function evaluations per iteration for dense problem
Many variants of Newton's method improve reliability and reduce overhead
Quasi-Newton methods have form
$$x_{k+1} = x_k - \alpha_k B_k^{-1} \nabla f(x_k)$$
where $\alpha_k$ is line search parameter and $B_k$ is approximation to Hessian matrix
Many quasi-Newton methods are more robust than Newton's method, are superlinearly convergent, and have lower overhead per iteration, which often more than offsets their slower convergence rate
Michael T. Heath Scientic Computing 43 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Secant Updating Methods
Could use Broydens method to seek zero of gradient, but
this would not preserve symmetry of Hessian matrix
Several secant updating formulas have been developed for
minimization that not only preserve symmetry in
approximate Hessian matrix, but also preserve positive
definiteness
Symmetry reduces amount of work required by about half,
while positive definiteness guarantees that quasi-Newton
step will be descent direction
Michael T. Heath Scientic Computing 44 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
BFGS Method
One of most effective secant updating methods for minimization
is BFGS
$x_0$ = initial guess
$B_0$ = initial Hessian approximation
for $k = 0, 1, 2, \ldots$
    Solve $B_k s_k = -\nabla f(x_k)$ for $s_k$
    $x_{k+1} = x_k + s_k$
    $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$
    $B_{k+1} = B_k + (y_k y_k^T)/(y_k^T s_k) - (B_k s_k s_k^T B_k)/(s_k^T B_k s_k)$
end
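A compact NumPy transcription of this loop is sketched below; it is my own illustration, not code from the slides, and it takes full steps ($\alpha_k = 1$, no line search) exactly as the algorithm above does.

```python
import numpy as np

def bfgs(grad, x0, tol=1e-8, maxit=50):
    """Minimize f via BFGS secant updates, using full steps (no line search)."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))                         # initial Hessian approximation
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        s = np.linalg.solve(B, -g)             # B_k s_k = -grad f(x_k)
        x_new = x + s
        y = grad(x_new) - g
        B = B + np.outer(y, y) / (y @ s) \
              - np.outer(B @ s, B @ s) / (s @ B @ s)   # BFGS update
        x = x_new
    return x

# Example from the slides: f(x) = 0.5*x1^2 + 2.5*x2^2
grad = lambda x: np.array([x[0], 5.0 * x[1]])
print(bfgs(grad, [5.0, 1.0]))
```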
Michael T. Heath Scientic Computing 45 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
BFGS Method, continued
In practice, factorization of $B_k$ is updated rather than $B_k$ itself, so linear system for $s_k$ can be solved at cost of $\mathcal{O}(n^2)$ rather than $\mathcal{O}(n^3)$ work
Unlike Newtons method for minimization, no second
derivatives are required
Can start with $B_0 = I$, so initial step is along negative gradient, and then second derivative information is
gradient, and then second derivative information is
gradually built up in approximate Hessian matrix over
successive iterations
BFGS normally has superlinear convergence rate, even
though approximate Hessian does not necessarily
converge to true Hessian
Line search can be used to enhance effectiveness
Michael T. Heath Scientic Computing 46 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example: BFGS Method
Use BFGS to minimize
$$f(x) = 0.5 x_1^2 + 2.5 x_2^2$$
Gradient is given by
$$\nabla f(x) = \begin{bmatrix} x_1 \\ 5 x_2 \end{bmatrix}$$
Taking $x_0 = \begin{bmatrix} 5 & 1 \end{bmatrix}^T$ and $B_0 = I$, initial step is negative gradient, so
$$x_1 = x_0 + s_0 = \begin{bmatrix} 5 \\ 1 \end{bmatrix} + \begin{bmatrix} -5 \\ -5 \end{bmatrix} = \begin{bmatrix} 0 \\ -4 \end{bmatrix}$$
Updating approximate Hessian using BFGS formula, we obtain
$$B_1 = \begin{bmatrix} 0.667 & 0.333 \\ 0.333 & 4.667 \end{bmatrix}$$
Then new step is computed and process is repeated
Michael T. Heath Scientic Computing 47 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example: BFGS Method
      x_k               f(x_k)      grad f(x_k)
   5.000    1.000      15.000      5.000     5.000
   0.000   -4.000      40.000      0.000   -20.000
  -2.222    0.444       2.963     -2.222     2.222
   0.816    0.082       0.350      0.816     0.408
  -0.009   -0.015       0.001     -0.009    -0.077
  -0.001    0.001       0.000     -0.001     0.005
Increase in function value can be avoided by using line
search, which generally enhances convergence
For quadratic objective function, BFGS with exact line
search finds exact solution in at most n iterations, where n
is dimension of problem < interactive example >
Michael T. Heath Scientic Computing 48 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Conjugate Gradient Method
Another method that does not require explicit second
derivatives, and does not even store approximation to
Hessian matrix, is conjugate gradient (CG) method
CG generates sequence of conjugate search directions,
implicitly accumulating information about Hessian matrix
For quadratic objective function, CG is theoretically exact
after at most n iterations, where n is dimension of problem
CG is effective for general unconstrained minimization as
well
Michael T. Heath Scientic Computing 49 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Conjugate Gradient Method, continued
$x_0$ = initial guess
$g_0 = \nabla f(x_0)$
$s_0 = -g_0$
for $k = 0, 1, 2, \ldots$
    Choose $\alpha_k$ to minimize $f(x_k + \alpha_k s_k)$
    $x_{k+1} = x_k + \alpha_k s_k$
    $g_{k+1} = \nabla f(x_{k+1})$
    $\beta_{k+1} = (g_{k+1}^T g_{k+1}) / (g_k^T g_k)$
    $s_{k+1} = -g_{k+1} + \beta_{k+1} s_k$
end
Alternative formula for $\beta_{k+1}$ is
$$\beta_{k+1} = ((g_{k+1} - g_k)^T g_{k+1}) / (g_k^T g_k)$$
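The loop translates directly into NumPy. The sketch below is my own illustration (not from the slides); it uses the Fletcher-Reeves choice of $\beta$ and an exact line search that is valid only because the test function is quadratic with Hessian $A$.

```python
import numpy as np

A = np.diag([1.0, 5.0])                  # f(x) = 0.5 x^T A x, so grad f(x) = A x
grad = lambda x: A @ x

x = np.array([5.0, 1.0])
g = grad(x)
s = -g
for k in range(2):                       # exact after n = 2 steps for a quadratic
    alpha = -(g @ s) / (s @ A @ s)       # exact line search along s (quadratic only)
    x = x + alpha * s
    g_new = grad(x)
    beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves formula
    s = -g_new + beta * s
    g = g_new
    print(x)                             # [3.333, -0.667], then [0, 0]
```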
Michael T. Heath Scientic Computing 50 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example: Conjugate Gradient Method
Use CG method to minimize
$$f(x) = 0.5 x_1^2 + 2.5 x_2^2$$
Gradient is given by
$$\nabla f(x) = \begin{bmatrix} x_1 \\ 5 x_2 \end{bmatrix}$$
Taking $x_0 = \begin{bmatrix} 5 & 1 \end{bmatrix}^T$, initial search direction is negative gradient,
$$s_0 = -g_0 = -\nabla f(x_0) = \begin{bmatrix} -5 \\ -5 \end{bmatrix}$$
Exact minimum along line is given by $\alpha_0 = 1/3$, so next approximation is $x_1 = \begin{bmatrix} 3.333 & -0.667 \end{bmatrix}^T$, and we compute new gradient,
$$g_1 = \nabla f(x_1) = \begin{bmatrix} 3.333 \\ -3.333 \end{bmatrix}$$
Michael T. Heath Scientic Computing 51 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example, continued
So far there is no difference from steepest descent method
At this point, however, rather than search along new
negative gradient, we compute instead

1
= (g
T
1
g
1
)/(g
T
0
g
0
) = 0.444
which gives as next search direction
s
1
= g
1
+
1
s
0
=

3.333
3.333

+ 0.444

5
5

5.556
1.111

Minimum along this direction is given by


1
= 0.6, which
gives exact solution at origin, as expected for quadratic
function
< interactive example >
Michael T. Heath Scientic Computing 52 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Truncated Newton Methods
Another way to reduce work in Newton-like methods is to
solve linear system for Newton step by iterative method
Small number of iterations may suffice to produce step as
useful as true Newton step, especially far from overall
solution, where true Newton step may be unreliable
anyway
Good choice for linear iterative solver is CG method, which
gives step intermediate between steepest descent and
Newton-like step
Since only matrix-vector products are required, explicit
formation of Hessian matrix can be avoided by using finite
difference of gradient along given vector
Michael T. Heath Scientic Computing 53 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Nonlinear Least Squares
Given data $(t_i, y_i)$, find vector $x$ of parameters that gives best fit in least squares sense to model function $f(t, x)$, where $f$ is nonlinear function of $x$
Define components of residual function
$$r_i(x) = y_i - f(t_i, x), \quad i = 1, \ldots, m$$
so we want to minimize $\phi(x) = \tfrac{1}{2} r^T(x)\, r(x)$
Gradient vector is $\nabla \phi(x) = J^T(x)\, r(x)$ and Hessian matrix is
$$H_\phi(x) = J^T(x) J(x) + \sum_{i=1}^{m} r_i(x)\, H_i(x)$$
where $J(x)$ is Jacobian of $r(x)$, and $H_i(x)$ is Hessian of $r_i(x)$
Michael T. Heath Scientic Computing 54 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Nonlinear Least Squares, continued
Linear system for Newton step is
$$\Bigl[\, J^T(x_k) J(x_k) + \sum_{i=1}^{m} r_i(x_k)\, H_i(x_k) \Bigr]\, s_k = -J^T(x_k)\, r(x_k)$$
$m$ Hessian matrices $H_i$ are usually inconvenient and expensive to compute
Moreover, in $H_\phi$ each $H_i$ is multiplied by residual component $r_i$, which is small at solution if fit of model function to data is good
Michael T. Heath Scientic Computing 55 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Gauss-Newton Method
This motivates Gauss-Newton method for nonlinear least squares, in which second-order term is dropped and linear system
$$J^T(x_k) J(x_k)\, s_k = -J^T(x_k)\, r(x_k)$$
is solved for approximate Newton step $s_k$ at each iteration
This is system of normal equations for linear least squares problem
$$J(x_k)\, s_k \cong -r(x_k)$$
which can be solved better by QR factorization
Next approximate solution is then given by
$$x_{k+1} = x_k + s_k$$
and process is repeated until convergence
Michael T. Heath Scientic Computing 56 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example: Gauss-Newton Method
Use Gauss-Newton method to fit nonlinear model function
$$f(t, x) = x_1 \exp(x_2 t)$$
to data

    t    0.0   1.0   2.0   3.0
    y    2.0   0.7   0.3   0.1

For this model function, entries of Jacobian matrix of residual function $r$ are given by
$$\{J(x)\}_{i,1} = \frac{\partial r_i(x)}{\partial x_1} = -\exp(x_2 t_i)$$
$$\{J(x)\}_{i,2} = \frac{\partial r_i(x)}{\partial x_2} = -x_1 t_i \exp(x_2 t_i)$$
Michael T. Heath Scientic Computing 57 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example, continued
If we take $x_0 = \begin{bmatrix} 1 & 0 \end{bmatrix}^T$, then Gauss-Newton step $s_0$ is given by linear least squares problem
$$\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix} s_0 \cong \begin{bmatrix} 1 \\ -0.3 \\ -0.7 \\ -0.9 \end{bmatrix}$$
whose solution is $s_0 = \begin{bmatrix} 0.69 \\ -0.61 \end{bmatrix}$
Then next approximate solution is given by $x_1 = x_0 + s_0$, and process is repeated until convergence
Michael T. Heath Scientic Computing 58 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example, continued
      x_k               ||r(x_k)||_2^2
   1.000    0.000       2.390
   1.690   -0.610       0.212
   1.975   -0.930       0.007
   1.994   -1.004       0.002
   1.995   -1.009       0.002
   1.995   -1.010       0.002
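The whole example fits in a few lines of NumPy. This sketch is my own illustration (not code from the slides); it solves each linear least squares subproblem with `numpy.linalg.lstsq` rather than forming the normal equations.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 0.7, 0.3, 0.1])

def residual(x):                        # r_i(x) = y_i - x1*exp(x2*t_i)
    return y - x[0] * np.exp(x[1] * t)

def jacobian(x):                        # Jacobian of the residual vector
    e = np.exp(x[1] * t)
    return np.column_stack((-e, -x[0] * t * e))

x = np.array([1.0, 0.0])                # starting guess
for k in range(6):
    print(f"{x[0]:6.3f} {x[1]:7.3f}   {residual(x) @ residual(x):.3f}")
    s, *_ = np.linalg.lstsq(jacobian(x), -residual(x), rcond=None)
    x = x + s                           # Gauss-Newton update
```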
< interactive example >
Michael T. Heath Scientic Computing 59 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Gauss-Newton Method, continued
Gauss-Newton method replaces nonlinear least squares
problem by sequence of linear least squares problems
whose solutions converge to solution of original nonlinear
problem
If residual at solution is large, then second-order term
omitted from Hessian is not negligible, and Gauss-Newton
method may converge slowly or fail to converge
In such large-residual cases, it may be best to use
general nonlinear minimization method that takes into
account true full Hessian matrix
Michael T. Heath Scientic Computing 60 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Levenberg-Marquardt Method
Levenberg-Marquardt method is another useful alternative when Gauss-Newton approximation is inadequate or yields rank deficient linear least squares subproblem
In this method, linear system at each iteration is of form
$$\bigl( J^T(x_k) J(x_k) + \mu_k I \bigr)\, s_k = -J^T(x_k)\, r(x_k)$$
where $\mu_k$ is scalar parameter chosen by some strategy
Corresponding linear least squares problem is
$$\begin{bmatrix} J(x_k) \\ \sqrt{\mu_k}\, I \end{bmatrix} s_k \cong \begin{bmatrix} -r(x_k) \\ 0 \end{bmatrix}$$
With suitable strategy for choosing $\mu_k$, this method can be very robust in practice, and it forms basis for several effective software packages
< interactive example >
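The augmented least squares form is convenient to implement: stack $\sqrt{\mu}\,I$ under the Jacobian and zeros under $-r$, then reuse any linear least squares solver. The minimal sketch below is my own illustration (not from the slides) and reuses the `residual` and `jacobian` helpers defined in the Gauss-Newton example above.

```python
import numpy as np

def lm_step(x, mu, residual, jacobian):
    """One Levenberg-Marquardt step via the augmented least squares problem."""
    J = jacobian(x)
    r = residual(x)
    n = x.size
    A = np.vstack((J, np.sqrt(mu) * np.eye(n)))     # [ J ; sqrt(mu) I ]
    b = np.concatenate((-r, np.zeros(n)))           # [ -r ; 0 ]
    s, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x + s
```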
Michael T. Heath Scientic Computing 61 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Equality-Constrained Optimization
For equality-constrained minimization problem
$$\min f(x) \quad \text{subject to} \quad g(x) = 0$$
where $f\colon \mathbb{R}^n \to \mathbb{R}$ and $g\colon \mathbb{R}^n \to \mathbb{R}^m$, with $m \le n$, we seek critical point of Lagrangian $L(x, \lambda) = f(x) + \lambda^T g(x)$
Applying Newton's method to nonlinear system
$$\nabla L(x, \lambda) = \begin{bmatrix} \nabla f(x) + J_g^T(x)\,\lambda \\ g(x) \end{bmatrix} = 0$$
we obtain linear system
$$\begin{bmatrix} B(x, \lambda) & J_g^T(x) \\ J_g(x) & O \end{bmatrix} \begin{bmatrix} s \\ \delta \end{bmatrix} = -\begin{bmatrix} \nabla f(x) + J_g^T(x)\,\lambda \\ g(x) \end{bmatrix}$$
for Newton step $(s, \delta)$ in $(x, \lambda)$ at each iteration
Michael T. Heath Scientic Computing 62 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Sequential Quadratic Programming
Foregoing block $2 \times 2$ linear system is equivalent to quadratic programming problem, so this approach is known as sequential quadratic programming
Types of solution methods include
  Direct solution methods, in which entire block $2 \times 2$ system is solved directly
  Range space methods, based on block elimination in block $2 \times 2$ linear system
  Null space methods, based on orthogonal factorization of matrix of constraint normals, $J_g^T(x)$
< interactive example >
Michael T. Heath Scientic Computing 63 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Merit Function
Once Newton step $(s, \delta)$ determined, we need merit function to measure progress toward overall solution for use in line search or trust region
Popular choices include penalty function
$$\phi_\rho(x) = f(x) + \tfrac{1}{2}\rho\, g(x)^T g(x)$$
and augmented Lagrangian function
$$L_\rho(x, \lambda) = f(x) + \lambda^T g(x) + \tfrac{1}{2}\rho\, g(x)^T g(x)$$
where parameter $\rho > 0$ determines relative weighting of optimality vs feasibility
Given starting guess $x_0$, good starting guess for $\lambda_0$ can be obtained from least squares problem
$$J_g^T(x_0)\, \lambda_0 \cong -\nabla f(x_0)$$
Michael T. Heath Scientic Computing 64 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Inequality-Constrained Optimization
Methods just outlined for equality constraints can be
extended to handle inequality constraints by using active
set strategy
Inequality constraints are provisionally divided into those that are satisfied already (and can therefore be temporarily disregarded) and those that are violated (and are therefore temporarily treated as equality constraints)
This division of constraints is revised as iterations proceed until eventually correct constraints are identified that are binding at solution
Michael T. Heath Scientic Computing 65 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Penalty Methods
Merit function can also be used to convert
equality-constrained problem into sequence of
unconstrained problems
If $x_\rho^*$ is solution to
$$\min_x \phi_\rho(x) = f(x) + \tfrac{1}{2}\rho\, g(x)^T g(x)$$
then under appropriate conditions
$$\lim_{\rho \to \infty} x_\rho^* = x^*$$
This enables use of unconstrained optimization methods, but problem becomes ill-conditioned for large $\rho$, so we solve sequence of problems with gradually increasing values of $\rho$, with minimum for each problem used as starting point for next problem
< interactive example >
Michael T. Heath Scientic Computing 66 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Barrier Methods
For inequality-constrained problems, another alternative is barrier function, such as
$$\phi_\mu(x) = f(x) - \mu \sum_{i=1}^{p} \frac{1}{h_i(x)}$$
or
$$\phi_\mu(x) = f(x) - \mu \sum_{i=1}^{p} \log(-h_i(x))$$
which increasingly penalize feasible points as they approach boundary of feasible region
Again, solutions of unconstrained problem approach $x^*$ as $\mu \to 0$, but problems are increasingly ill-conditioned, so solve sequence of problems with decreasing values of $\mu$
Barrier functions are basis for interior point methods for linear programming
Michael T. Heath Scientic Computing 67 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example: Constrained Optimization
Consider quadratic programming problem
$$\min_x f(x) = 0.5 x_1^2 + 2.5 x_2^2$$
subject to
$$g(x) = x_1 - x_2 - 1 = 0$$
Lagrangian function is given by
$$L(x, \lambda) = f(x) + \lambda\, g(x) = 0.5 x_1^2 + 2.5 x_2^2 + \lambda (x_1 - x_2 - 1)$$
Since
$$\nabla f(x) = \begin{bmatrix} x_1 \\ 5 x_2 \end{bmatrix} \quad \text{and} \quad J_g(x) = \begin{bmatrix} 1 & -1 \end{bmatrix}$$
we have
$$\nabla_x L(x, \lambda) = \nabla f(x) + J_g^T(x)\,\lambda = \begin{bmatrix} x_1 \\ 5 x_2 \end{bmatrix} + \lambda \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
Michael T. Heath Scientic Computing 68 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example, continued
So system to be solved for critical point of Lagrangian is
$$x_1 + \lambda = 0$$
$$5 x_2 - \lambda = 0$$
$$x_1 - x_2 = 1$$
which in this case is linear system
$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 5 & -1 \\ 1 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \lambda \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
Solving this system, we obtain solution
$$x_1 = 0.833, \quad x_2 = -0.167, \quad \lambda = -0.833$$
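This $3 \times 3$ system can be checked directly; the snippet below is my own verification sketch, not part of the slides.

```python
import numpy as np

# KKT system for min 0.5*x1^2 + 2.5*x2^2 subject to x1 - x2 = 1
K = np.array([[1.0,  0.0,  1.0],
              [0.0,  5.0, -1.0],
              [1.0, -1.0,  0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(K, rhs)
print(x1, x2, lam)        # 0.833..., -0.166..., -0.833...
```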
Michael T. Heath Scientic Computing 69 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example, continued
Michael T. Heath Scientic Computing 70 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Linear Programming
One of most important and common constrained
optimization problems is linear programming
One standard form for such problems is
$$\min f(x) = c^T x \quad \text{subject to} \quad Ax = b \ \text{ and } \ x \ge 0$$
where $m < n$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $c, x \in \mathbb{R}^n$
Feasible region is convex polyhedron in $\mathbb{R}^n$, and minimum must occur at one of its vertices
Simplex method moves systematically from vertex to vertex until minimum point is found
Michael T. Heath Scientic Computing 71 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Linear Programming, continued
Simplex method is reliable and normally efficient, able to solve problems with thousands of variables, but can require time exponential in size of problem in worst case
Interior point methods for linear programming developed in recent years have polynomial worst case solution time
These methods move through interior of feasible region, not restricting themselves to investigating only its vertices
Although interior point methods have significant practical impact, simplex method is still predominant method in standard packages for linear programming, and its effectiveness in practice is excellent
Michael T. Heath Scientic Computing 72 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example: Linear Programming
To illustrate linear programming, consider
$$\min_x f(x) = c^T x = -8 x_1 - 11 x_2$$
subject to linear inequality constraints
$$5 x_1 + 4 x_2 \le 40, \quad -x_1 + 3 x_2 \le 12, \quad x_1 \ge 0, \quad x_2 \ge 0$$
Minimum value must occur at vertex of feasible region, in this case at $x_1 = 3.79$, $x_2 = 5.26$, where objective function has value $-88.2$
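If SciPy is available, this small problem can be checked with `scipy.optimize.linprog`, which expects inequality constraints in the form $A_{ub} x \le b_{ub}$; the snippet below is my own illustration, not part of the slides.

```python
from scipy.optimize import linprog

c = [-8.0, -11.0]                       # minimize -8*x1 - 11*x2
A_ub = [[5.0, 4.0],                     #  5*x1 + 4*x2 <= 40
        [-1.0, 3.0]]                    # -x1  + 3*x2 <= 12
b_ub = [40.0, 12.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)                   # approx [3.79, 5.26] and -88.2
```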
Michael T. Heath Scientic Computing 73 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example, continued
Michael T. Heath Scientic Computing 74 / 74
AMS527: Numerical Analysis II
Supplementary Material on
Numerical Optimization
Xiangmin Jiao
SUNY Stony Brook
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 1 / 21
Outline
1
BFGS Method
2
Conjugate Gradient Methods
3
Constrained Optimization
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 2 / 21
BFGS Method
BFGS Method is one of most effective secant updating methods for minimization
Named after Broyden, Fletcher, Goldfarb, and Shanno
Unlike Broyden's method, BFGS preserves the symmetry of approximate Hessian matrix
In addition, BFGS preserves the positive definiteness of the approximate Hessian matrix
Reference: J. Nocedal, S. J. Wright, Numerical Optimization, 2nd
edition, Springer, 2006. Section 6.1.
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 3 / 21
Algorithm
$x_0$ = initial guess
$B_0$ = initial Hessian approximation
for $k = 0, 1, 2, \ldots$
    Solve $B_k s_k = -\nabla f(x_k)$ for $s_k$
    $x_{k+1} = x_k + s_k$
    $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$
    $B_{k+1} = B_k + (y_k y_k^T)/(y_k^T s_k) - (B_k s_k s_k^T B_k)/(s_k^T B_k s_k)$
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 4 / 21
Motivation of BFGS
Let $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$
Matrix $B_{k+1}$ should satisfy secant equation
$$B_{k+1} s_k = y_k$$
In addition, $B_{k+1}$ should be positive definite, which requires $s_k^T y_k > 0$
There are infinite number of $B_{k+1}$ that satisfy secant equation
Davidon (1950s) proposed to choose $B_{k+1}$ to be closest to $B_k$, i.e.,
$$\min_B \|B - B_k\| \quad \text{subject to } B = B^T, \ B s_k = y_k$$
BFGS proposed to choose $B_{k+1}$ so that $B_{k+1}^{-1}$ is closest to $B_k^{-1}$, i.e.,
$$\min_B \|B^{-1} - B_k^{-1}\| \quad \text{subject to } B = B^T, \ B s_k = y_k$$
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 5 / 21
Properties of BFGS
BFGS normally has superlinear convergence rate, even though approximate Hessian does not necessarily converge to true Hessian
Approximate Hessian preserves positive definiteness
  Key idea of proof: Let $H_k$ denote $B_k^{-1}$. For any vector $z \ne 0$, let $w = z - \rho_k y_k (s_k^T z)$, where $\rho_k > 0$. Then it can be shown that
  $$z^T H_{k+1} z = w^T H_k w + \rho_k (s_k^T z)^2 \ge 0.$$
  If $s_k^T z = 0$, then $w = z \ne 0$, so $z^T H_{k+1} z > 0$.
Line search can be used to enhance effectiveness of BFGS. If exact line search is performed at each iteration, BFGS terminates at exact solution in at most $n$ iterations for a quadratic objective function
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 6 / 21
Outline
1
BFGS Method
2
Conjugate Gradient Methods
3
Constrained Optimization
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 7 / 21
Motivation of Conjugate Gradients
Conjugate gradient can be used to solve a linear system $Ax = b$, where $A$ is symmetric positive definite (SPD)
If $A$ is $m \times m$ SPD, then quadratic function
$$\varphi(x) = \tfrac{1}{2} x^T A x - x^T b$$
has unique minimum
Negative gradient of this function is residual vector
$$-\nabla \varphi(x) = b - Ax = r$$
so minimum is obtained precisely when $Ax = b$
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 8 / 21
Search Direction in Conjugate Gradients
Optimization methods have form
$$x_{n+1} = x_n + \alpha_n p_n$$
where $p_n$ is search direction and $\alpha_n$ is step length chosen to minimize $\varphi(x_n + \alpha_n p_n)$
Line search parameter can be determined analytically as
$$\alpha_n = r_n^T p_n / (p_n^T A p_n)$$
In CG, $p_n$ is chosen to be $A$-conjugate (or $A$-orthogonal) to previous search directions, i.e., $p_n^T A p_j = 0$ for $j < n$
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 9 / 21
Optimality of Step Length
Select step length $\alpha_n$ along vector $p_{n-1}$ to minimize
$$\varphi(x) = \tfrac{1}{2} x^T A x - x^T b$$
Let $x_n = x_{n-1} + \alpha_n p_{n-1}$, then
$$\varphi(x_n) = \tfrac{1}{2}(x_{n-1} + \alpha_n p_{n-1})^T A (x_{n-1} + \alpha_n p_{n-1}) - (x_{n-1} + \alpha_n p_{n-1})^T b$$
$$= \tfrac{1}{2}\alpha_n^2\, p_{n-1}^T A p_{n-1} + \alpha_n\, p_{n-1}^T A x_{n-1} - \alpha_n\, p_{n-1}^T b + \text{constant}$$
$$= \tfrac{1}{2}\alpha_n^2\, p_{n-1}^T A p_{n-1} - \alpha_n\, p_{n-1}^T r_{n-1} + \text{constant}$$
Therefore,
$$\frac{d\varphi}{d\alpha_n} = 0 \;\Longrightarrow\; \alpha_n\, p_{n-1}^T A p_{n-1} - p_{n-1}^T r_{n-1} = 0 \;\Longrightarrow\; \alpha_n = \frac{p_{n-1}^T r_{n-1}}{p_{n-1}^T A p_{n-1}}.$$
In addition, $p_{n-1}^T r_{n-1} = r_{n-1}^T r_{n-1}$ because $p_{n-1} = r_{n-1} + \beta_{n-1} p_{n-2}$ and $r_{n-1}^T p_{n-2} = 0$ due to the following theorem.
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 10 / 21
Conjugate Gradient Method
Algorithm: Conjugate Gradient Method
    $x_0 = 0$, $r_0 = b$, $p_0 = r_0$
    for $n = 1, 2, 3, \ldots$
        $\alpha_n = (r_{n-1}^T r_{n-1}) / (p_{n-1}^T A p_{n-1})$    {step length}
        $x_n = x_{n-1} + \alpha_n p_{n-1}$    {approximate solution}
        $r_n = r_{n-1} - \alpha_n A p_{n-1}$    {residual}
        $\beta_n = (r_n^T r_n) / (r_{n-1}^T r_{n-1})$    {improvement this step}
        $p_n = r_n + \beta_n p_{n-1}$    {search direction}
Only one matrix-vector product $A p_{n-1}$ per iteration
Apart from matrix-vector product, number of operations per iteration is $O(m)$
CG can be viewed as minimization of quadratic function $\varphi(x) = \tfrac{1}{2} x^T A x - x^T b$ by modifying steepest descent
First proposed by Hestenes and Stiefel in 1950s
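A direct NumPy transcription of this algorithm is given below; it is my own sketch, not code from the notes, and it assumes $A$ is symmetric positive definite and starts from $x_0 = 0$ as in the algorithm.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, maxit=None):
    """Solve A x = b for SPD A by the (linear) conjugate gradient method."""
    m = len(b)
    maxit = m if maxit is None else maxit
    x = np.zeros(m)
    r = b.copy()                 # residual for x0 = 0
    p = r.copy()                 # initial search direction
    rr = r @ r
    for _ in range(maxit):
        Ap = A @ p               # the only matrix-vector product per iteration
        alpha = rr / (p @ Ap)    # step length
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        beta = rr_new / rr       # improvement this step
        p = r + beta * p
        rr = rr_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))  # approx [0.0909, 0.6364]
```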
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 11 / 21
An Alternative Interpretation of CG
Algorithm: CG
    $x_0 = 0$, $r_0 = b$, $p_0 = r_0$
    for $n = 1, 2, 3, \ldots$
        $\alpha_n = r_{n-1}^T r_{n-1} / (p_{n-1}^T A p_{n-1})$
        $x_n = x_{n-1} + \alpha_n p_{n-1}$
        $r_n = r_{n-1} - \alpha_n A p_{n-1}$
        $\beta_n = r_n^T r_n / (r_{n-1}^T r_{n-1})$
        $p_n = r_n + \beta_n p_{n-1}$

Algorithm: A non-standard CG
    $x_0 = 0$, $r_0 = b$, $p_0 = r_0$
    for $n = 1, 2, 3, \ldots$
        $\alpha_n = r_{n-1}^T p_{n-1} / (p_{n-1}^T A p_{n-1})$
        $x_n = x_{n-1} + \alpha_n p_{n-1}$
        $r_n = b - A x_n$
        $\beta_n = -r_n^T A p_{n-1} / (p_{n-1}^T A p_{n-1})$
        $p_n = r_n + \beta_n p_{n-1}$

The non-standard one is less efficient but easier to understand
It is easy to see $r_n = r_{n-1} - \alpha_n A p_{n-1} = b - A x_n$
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 12 / 21
Comparison of Linear and Nonlinear CG
Algorithm: Linear CG
    $x_0 = 0$, $r_0 = b$, $p_0 = r_0$
    for $n = 1, 2, 3, \ldots$
        $\alpha_n = r_{n-1}^T r_{n-1} / (p_{n-1}^T A p_{n-1})$
        $x_n = x_{n-1} + \alpha_n p_{n-1}$
        $r_n = r_{n-1} - \alpha_n A p_{n-1}$
        $\beta_n = r_n^T r_n / (r_{n-1}^T r_{n-1})$
        $p_n = r_n + \beta_n p_{n-1}$

Algorithm: Non-linear CG
    $x_0$ = initial guess, $g_0 = \nabla f(x_0)$, $s_0 = -g_0$
    for $k = 0, 1, 2, \ldots$
        Choose $\alpha_k$ to minimize $f(x_k + \alpha_k s_k)$
        $x_{k+1} = x_k + \alpha_k s_k$
        $g_{k+1} = \nabla f(x_{k+1})$
        $\beta_{k+1} = (g_{k+1}^T g_{k+1}) / (g_k^T g_k)$
        $s_{k+1} = -g_{k+1} + \beta_{k+1} s_k$

$\beta_{k+1} = (g_{k+1}^T g_{k+1}) / (g_k^T g_k)$ was due to Fletcher and Reeves (1964)
An alternative formula $\beta_{k+1} = (g_{k+1} - g_k)^T g_{k+1} / (g_k^T g_k)$ was due to Polak and Ribière (1969)
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 13 / 21
Properties of Conjugate Gradients
Krylov subspaces for $Ax = b$ are $\mathcal{K}_n = \langle b, Ab, \ldots, A^{n-1} b \rangle$.
Theorem
If $r_{n-1} \ne 0$, spaces spanned by approximate solutions $x_n$, search directions $p_n$, and residuals $r_n$ are all equal to Krylov subspaces
$$\mathcal{K}_n = \langle x_1, x_2, \ldots, x_n \rangle = \langle p_0, p_1, \ldots, p_{n-1} \rangle = \langle r_0, r_1, \ldots, r_{n-1} \rangle = \langle b, Ab, \ldots, A^{n-1} b \rangle$$
The residuals are orthogonal (i.e., $r_n^T r_j = 0$ for $j < n$) and search directions are $A$-conjugate (i.e., $p_n^T A p_j = 0$ for $j < n$).
Theorem
If $r_{n-1} \ne 0$, then error $e_n = x^* - x_n$ is minimized in $A$-norm over $\mathcal{K}_n$.
Because $\mathcal{K}_n$ grows monotonically, error decreases monotonically.
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 14 / 21
Rate of Convergence
Some important convergence results
  If $A$ has $n$ distinct eigenvalues, CG converges in at most $n$ steps
  If $A$ has 2-norm condition number $\kappa$, the errors satisfy
  $$\frac{\|e_n\|_A}{\|e_0\|_A} \le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^n$$
  which behaves like $2 \left( 1 - \tfrac{2}{\sqrt{\kappa}} \right)^n$ as $\kappa \to \infty$. So convergence is expected in $O(\sqrt{\kappa})$ iterations.
In general, CG performs well with clustered eigenvalues
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 15 / 21
Outline
1
BFGS Method
2
Conjugate Gradient Methods
3
Constrained Optimization
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 16 / 21
Equality-Constrained Minimization
Equality-constrained problem has form
$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad g(x) = 0$$
where objective function $f\colon \mathbb{R}^n \to \mathbb{R}$ and constraints $g\colon \mathbb{R}^n \to \mathbb{R}^m$, where $m \le n$
Necessary condition for feasible point $x^*$ to be solution is that negative gradient of $f$ lie in space spanned by constraint normals, i.e.,
$$-\nabla f(x^*) = J_g^T(x^*)\,\lambda,$$
where $J_g$ is Jacobian matrix of $g$, and $\lambda$ is vector of Lagrange multipliers
Therefore, constrained local minimum must be critical point of Lagrangian function
$$\mathcal{L}(x, \lambda) = f(x) + \lambda^T g(x)$$
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 17 / 21
First-Order and Second-Order Optimality Conditions
Equality-constrained minimization can be reduced to solving
$$\nabla \mathcal{L}(x, \lambda) = \begin{bmatrix} \nabla f(x) + J_g^T(x)\,\lambda \\ g(x) \end{bmatrix} = 0,$$
which is known as Karush-Kuhn-Tucker (or KKT) condition for constrained local minimum.
Hessian of Lagrangian function is
$$H_{\mathcal{L}}(x, \lambda) = \begin{bmatrix} B(x, \lambda) & J_g^T(x) \\ J_g(x) & 0 \end{bmatrix}$$
where $B(x, \lambda) = H_f(x) + \sum_{i=1}^{m} \lambda_i H_{g_i}(x)$. $H_{\mathcal{L}}$ is sometimes called KKT (Karush-Kuhn-Tucker) matrix. $H_{\mathcal{L}}$ is symmetric, but not in general positive definite
Critical point $(x^*, \lambda^*)$ of $\mathcal{L}$ is constrained minimum if $B(x^*, \lambda^*)$ is positive definite on null space of $J_g(x^*)$.
Let $Z$ form basis of $\operatorname{null}(J_g(x^*))$; then projected Hessian $Z^T B Z$ should be positive definite
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 18 / 21
Sequential Quadratic Programming
$\nabla \mathcal{L}(x, \lambda) = 0$ can be solved using Newton's method. $k$th iteration of Newton's step is
$$\begin{bmatrix} B(x_k, \lambda_k) & J_g^T(x_k) \\ J_g(x_k) & 0 \end{bmatrix} \begin{bmatrix} s_k \\ \delta_k \end{bmatrix} = -\begin{bmatrix} \nabla f(x_k) + J_g^T(x_k)\,\lambda_k \\ g(x_k) \end{bmatrix},$$
and then $x_{k+1} = x_k + s_k$ and $\lambda_{k+1} = \lambda_k + \delta_k$
Above system of equations is first-order optimality condition for constrained optimization problem
$$\min_s \ \tfrac{1}{2} s^T B(x_k, \lambda_k)\, s + s^T \bigl( \nabla f(x_k) + J_g^T(x_k)\,\lambda_k \bigr) \quad \text{subject to} \quad J_g(x_k)\, s + g(x_k) = 0.$$
This problem is quadratic programming problem, so approach using Newton's method is known as sequential quadratic programming
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 19 / 21
Solving KKT System
KKT system
$$\begin{bmatrix} B & J^T \\ J & 0 \end{bmatrix} \begin{bmatrix} s \\ \delta \end{bmatrix} = -\begin{bmatrix} w \\ g \end{bmatrix}$$
can be solved in several ways
Direct solution
  Solve system using method for symmetric indefinite factorization, such as $LDL^T$ with pivoting, or
  Use iterative method such as GMRES, MINRES
Range-space method
  Use block elimination and obtain symmetric system
  $$\bigl( J B^{-1} J^T \bigr)\, \delta = g - J B^{-1} w$$
  and then
  $$B s = -w - J^T \delta$$
  First equation finds $\delta$ in range space of $J$
  It is attractive when number of constraints $m$ is relatively small, because $J B^{-1} J^T$ is $m \times m$
  However, it requires $B$ to be nonsingular and $J$ to have full rank. Also, condition number of $J B^{-1} J^T$ may be large
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 20 / 21
Solving KKT System Contd
Null space method
  Let $Z$ be composed of null space of $J$; it can be obtained by QR factorization of $J^T$. Then $JZ = 0$
  Let $JY = R^T$, and write $s = Yu + Zv$. Second block row yields
  $$Js = J(Yu + Zv) = R^T u = -g$$
  and premultiplying first block row by $Z^T$ yields
  $$\bigl( Z^T B Z \bigr)\, v = -Z^T (w + B Y u)$$
  Finally,
  $$Y^T J^T \delta = R\,\delta = -Y^T (w + B s)$$
  This method is advantageous when $n - m$ is small
  It is more stable than range-space method. Also, $B$ does not need to be nonsingular
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 21 / 21
AMS527: Numerical Analysis II
Linear Programming
Xiangmin Jiao
SUNY Stony Brook
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 1 / 8
Linear Programming
Linear programming has linear objective function and linear equality and inequality constraints
Example: Maximize profit of combination of wheat and barley, but with limited budget of land, fertilizer, and insecticide. Let $x_1$ and $x_2$ be areas planted for wheat and barley; we have linear programming problem
$$\begin{aligned}
\text{maximize} \quad & c_1 x_1 + c_2 x_2 && \{\text{maximize revenue}\} \\
& 0 \le x_1 + x_2 \le L && \{\text{limit on area}\} \\
& F_1 x_1 + F_2 x_2 \le F && \{\text{limit on fertilizer}\} \\
& P_1 x_1 + P_2 x_2 \le P && \{\text{limit on insecticide}\} \\
& x_1 \ge 0, \ x_2 \ge 0 && \{\text{nonnegative land}\}
\end{aligned}$$
Linear programming is typically solved by simplex methods or interior point methods
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 2 / 8
Standard Form of Linear Programming
Linear programming has many forms. A standard form (called slack form) is
$$\min \ c^T x \quad \text{subject to} \quad Ax = b \ \text{ and } \ x \ge 0$$
Simplex method and interior-point method require slack form
Previous example can be converted into standard form
$$\begin{aligned}
\text{minimize} \quad & (-c_1) x_1 + (-c_2) x_2 && \{\text{maximize revenue}\} \\
& x_1 + x_2 + x_3 = L && \{\text{limit on area}\} \\
& F_1 x_1 + F_2 x_2 + x_4 = F && \{\text{limit on fertilizer}\} \\
& P_1 x_1 + P_2 x_2 + x_5 = P && \{\text{limit on insecticide}\} \\
& x_1, x_2, x_3, x_4, x_5 \ge 0 && \{\text{nonnegativity}\}
\end{aligned}$$
Here, $x_3$, $x_4$, and $x_5$ are called slack variables
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 3 / 8
Duality
$m$ equations $Ax = b$ have $m$ corresponding Lagrange multipliers in $y$
Primal problem: Minimize $c^T x$ subject to $Ax = b$ and $x \ge 0$
Dual problem: Maximize $b^T y$ subject to $A^T y \le c$
Weak duality: $b^T y \le c^T x$ for any feasible $x$ and $y$, because
$$b^T y = (Ax)^T y = x^T \bigl( A^T y \bigr) \le x^T c = c^T x$$
Strong duality: If both feasible sets of primal and dual problems are nonempty, then $c^T x^* = b^T y^*$ at optimal $x^*$ and $y^*$
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 4 / 8
Simplex Methods
Developed by George Dantzig in 1947
Key observation: Feasible region is convex polytope in $\mathbb{R}^n$, and
minimum must occur at one of its vertices
Basic idea: Construct a feasible solution at a vertex of the polytope,
walk along a path on the edges of the polytope to vertices with
non-decreasing values of the objective function, until an optimum is
reached
Simplex method in the worst case can be slow, because number of
corners is exponential with m and n
However, its average-case complexity is polynomial time, and in
practice, best corner is often found in 2m steps
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 5 / 8
Interior Point Methods
First proposed by Narendra Karmarkar in 1984
In contrast to simplex methods, interior point methods move through
the interior of the feasible region
Barrier problem
$$\text{minimize} \quad c^T x - \mu (\log x_1 + \cdots + \log x_n) \quad \text{with} \quad Ax = b$$
When any $x_i$ touches zero, extra cost $-\mu \log x_i$ blows up
Barrier problem gives approximate problem for each $\mu$. Its Lagrangian is
$$L(x, y, \mu) = c^T x - \mu \sum \log x_i - y^T (Ax - b)$$
The derivatives $\partial L / \partial x_j = c_j - \frac{\mu}{x_j} - (A^T y)_j = 0$, or $x_j s_j = \mu$, where $s = c - A^T y$
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 6 / 8
Newton Step
$n$ optimality equations $x_j s_j = \mu$ are nonlinear, and are solved iteratively using Newton's method
To determine increments $\Delta x$, $\Delta y$, and $\Delta s$, we need to solve $(x_i + \Delta x_i)(s_i + \Delta s_i) = \mu$. It is typical to ignore second order term $\Delta x_i \, \Delta s_i$. Then linear equations become
$$A\, \Delta x = 0$$
$$A^T \Delta y + \Delta s = 0$$
$$s_j\, \Delta x_j + x_j\, \Delta s_j = \mu - x_j s_j.$$
The iteration has quadratic convergence for each $\mu$, and $\mu$ approaches zero
Gilbert Strang, Computational Science and Engineering, Wellesley
Cambridge, 2007. Section 8.6, Linear Programming and Duality.
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 7 / 8
Example
Minimize $c^T x = 5x_1 + 3x_2 + 8x_3$ with $x_i \ge 0$ and $Ax = x_1 + x_2 + 2x_3 = 4$.
Barrier Lagrangian is
$$L = (5x_1 + 3x_2 + 8x_3) - \mu(\log x_1 + \log x_2 + \log x_3) - y(x_1 + x_2 + 2x_3 - 4)$$
Optimality equations give us:
$$s = c - A^T y \;\Longrightarrow\; s_1 = 5 - y, \quad s_2 = 3 - y, \quad s_3 = 8 - 2y$$
$$\partial L / \partial x_i = 0 \;\Longrightarrow\; x_1 s_1 = x_2 s_2 = x_3 s_3 = \mu$$
$$\partial L / \partial y = 0 \;\Longrightarrow\; x_1 + x_2 + 2x_3 = 4.$$
Start from an interior point $x_1 = x_2 = x_3 = 1$, $y = 2$, and $s = (3, 1, 4)$.
From $A\,\Delta x = 0$ and $s_j\, \Delta x_j + x_j\, \Delta s_j = \mu - x_j s_j$, we obtain equations
$$3\Delta x_1 - \Delta y = \mu - 3$$
$$1\Delta x_2 - \Delta y = \mu - 1$$
$$4\Delta x_3 - 2\Delta y = \mu - 4$$
$$\Delta x_1 + \Delta x_2 + 2\Delta x_3 = 0.$$
Given $\mu = 4/3$, we then obtain $x_{\text{new}} = (2/3, 2, 2/3)$ and $y_{\text{new}} = 8/3$, whereas $x^* = (0, 4, 0)$ and $y^* = 3$
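The four Newton equations are linear in $(\Delta x_1, \Delta x_2, \Delta x_3, \Delta y)$ and can be verified directly; the snippet below is my own check (not part of the notes), using $\mu = 4/3$ and the starting point above.

```python
import numpy as np

mu = 4.0 / 3.0
x = np.array([1.0, 1.0, 1.0])            # interior starting point
s = np.array([3.0, 1.0, 4.0])            # s = c - A^T y with y = 2
a = np.array([1.0, 1.0, 2.0])            # constraint row A

# Unknowns ordered as (dx1, dx2, dx3, dy); delta s_j = -a_j * dy was substituted
M = np.array([[s[0], 0.0,  0.0,  -x[0] * a[0]],
              [0.0,  s[1], 0.0,  -x[1] * a[1]],
              [0.0,  0.0,  s[2], -x[2] * a[2]],
              [a[0], a[1], a[2],  0.0]])
rhs = np.array([mu - x[0]*s[0], mu - x[1]*s[1], mu - x[2]*s[2], 0.0])
dx1, dx2, dx3, dy = np.linalg.solve(M, rhs)
print(x + np.array([dx1, dx2, dx3]), 2.0 + dy)   # [0.667, 2.0, 0.667] and 2.667
```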
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 8 / 8
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Scientic Computing: An Introductory Survey
Chapter 7 Interpolation
Prof. Michael T. Heath
Department of Computer Science
University of Illinois at Urbana-Champaign
Copyright © 2002. Reproduction permitted
for noncommercial, educational use only.
Michael T. Heath Scientic Computing 1 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Outline
1
Interpolation
2
Polynomial Interpolation
3
Piecewise Polynomial Interpolation
Michael T. Heath Scientic Computing 2 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Motivation
Choosing Interpolant
Existence and Uniqueness
Interpolation
Basic interpolation problem: for given data
$$(t_1, y_1), (t_2, y_2), \ldots, (t_m, y_m) \quad \text{with} \quad t_1 < t_2 < \cdots < t_m$$
determine function $f\colon \mathbb{R} \to \mathbb{R}$ such that
$$f(t_i) = y_i, \quad i = 1, \ldots, m$$
$f$ is interpolating function, or interpolant, for given data
Additional data might be prescribed, such as slope of
interpolant at given points
Additional constraints might be imposed, such as
smoothness, monotonicity, or convexity of interpolant
f could be function of more than one variable, but we will
consider only one-dimensional case
Michael T. Heath Scientic Computing 3 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Motivation
Choosing Interpolant
Existence and Uniqueness
Purposes for Interpolation
Plotting smooth curve through discrete data points
Reading between lines of table
Differentiating or integrating tabular data
Quick and easy evaluation of mathematical function
Replacing complicated function by simple one
Michael T. Heath Scientic Computing 4 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Motivation
Choosing Interpolant
Existence and Uniqueness
Interpolation vs Approximation
By definition, interpolating function fits given data points exactly
Interpolation is inappropriate if data points subject to significant errors
It is usually preferable to smooth noisy data, for example
by least squares approximation
Approximation is also more appropriate for special function
libraries
Issues in Interpolation
Arbitrarily many functions interpolate given set of data points
What form should interpolating function have?
How should interpolant behave between data points?
Should interpolant inherit properties of data, such as
monotonicity, convexity, or periodicity?
Are parameters that define interpolating function meaningful?
If function and data are plotted, should results be visually
pleasing?
Choosing Interpolant
Choice of function for interpolation based on
How easy interpolating function is to work with
determining its parameters
evaluating interpolant
differentiating or integrating interpolant
How well properties of interpolant match properties of data
to be fit (smoothness, monotonicity, convexity, periodicity,
etc.)
Functions for Interpolation
Families of functions commonly used for interpolation
include
Polynomials
Piecewise polynomials
Trigonometric functions
Exponential functions
Rational functions
For now we will focus on interpolation by polynomials and
piecewise polynomials
We will consider trigonometric interpolation (DFT) later
Basis Functions
Family of functions for interpolating given data points is spanned by set of basis functions \phi_1(t), \ldots, \phi_n(t)
Interpolating function f is chosen as linear combination of basis functions,
f(t) = \sum_{j=1}^{n} x_j \phi_j(t)
Requiring f to interpolate data (t_i, y_i) means
f(t_i) = \sum_{j=1}^{n} x_j \phi_j(t_i) = y_i,  i = 1, \ldots, m
which is system of linear equations Ax = y for n-vector x of parameters x_j, where entries of m \times n matrix A are given by a_{ij} = \phi_j(t_i)
Existence, Uniqueness, and Conditioning
Existence and uniqueness of interpolant depend on
number of data points m and number of basis functions n
If m > n, interpolant usually doesn't exist
If m < n, interpolant is not unique
If m = n, then basis matrix A is nonsingular provided data points t_i are distinct, so data can be fit exactly
Sensitivity of parameters x to perturbations in data
depends on cond(A), which depends in turn on choice of
basis functions
Polynomial Interpolation
Simplest and most common type of interpolation uses
polynomials
Unique polynomial of degree at most n - 1 passes through n data points (t_i, y_i), i = 1, \ldots, n, where t_i are distinct
There are many ways to represent or compute interpolating
polynomial, but in theory all must give same result
< interactive example >
Monomial Basis
Monomial basis functions
\phi_j(t) = t^{j-1},  j = 1, \ldots, n
give interpolating polynomial of form
p_{n-1}(t) = x_1 + x_2 t + \cdots + x_n t^{n-1}
with coefficients x given by n \times n linear system
Ax = \begin{bmatrix} 1 & t_1 & \cdots & t_1^{n-1} \\ 1 & t_2 & \cdots & t_2^{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & t_n & \cdots & t_n^{n-1} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = y
Matrix of this form is called Vandermonde matrix
Example: Monomial Basis
Determine polynomial of degree two interpolating three data points (-2, -27), (0, -1), (1, 0)
Using monomial basis, linear system is
Ax = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ 1 & t_3 & t_3^2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = y
For these particular data, system is
\begin{bmatrix} 1 & -2 & 4 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -27 \\ -1 \\ 0 \end{bmatrix}
whose solution is x = \begin{bmatrix} -1 & 5 & -4 \end{bmatrix}^T, so interpolating polynomial is
p_2(t) = -1 + 5t - 4t^2
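As a quick numerical check (a minimal NumPy sketch, not part of the original slides), the same coefficients follow from forming the Vandermonde matrix and calling a standard linear solver:

    import numpy as np

    t = np.array([-2.0, 0.0, 1.0])
    y = np.array([-27.0, -1.0, 0.0])

    # Columns 1, t, t^2 (increasing powers), matching the monomial basis above
    A = np.vander(t, N=3, increasing=True)
    x = np.linalg.solve(A, y)
    print(x)                       # [-1.  5. -4.]
    print(np.polyval(x[::-1], t))  # reproduces y at the data points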
Monomial Basis, continued
< interactive example >
Solving system Ax = y using standard linear equation solver to determine coefficients x of interpolating polynomial requires O(n^3) work
Monomial Basis, continued
For monomial basis, matrix A is increasingly ill-conditioned
as degree increases
Ill-conditioning does not prevent fitting data points well,
since residual for linear system solution will be small
But it does mean that values of coefficients are poorly
determined
Both conditioning of linear system and amount of
computational work required to solve it can be improved by
using different basis
Change of basis still gives same interpolating polynomial
for given data, but representation of polynomial will be
different
Monomial Basis, continued
Conditioning with monomial basis can be improved by shifting and scaling independent variable t:
\phi_j(t) = \left( \frac{t - c}{d} \right)^{j-1}
where c = (t_1 + t_n)/2 is midpoint and d = (t_n - t_1)/2 is half of range of data
New independent variable lies in interval [-1, 1], which also helps avoid overflow or harmful underflow
Even with optimal shifting and scaling, monomial basis
usually is still poorly conditioned, and we must seek better
alternatives
< interactive example >
Evaluating Polynomials
When represented in monomial basis, polynomial
p_{n-1}(t) = x_1 + x_2 t + \cdots + x_n t^{n-1}
can be evaluated efficiently using Horner's nested evaluation scheme
p_{n-1}(t) = x_1 + t(x_2 + t(x_3 + t(\cdots (x_{n-1} + t x_n) \cdots)))
which requires only n - 1 additions and n - 1 multiplications
For example,
1 - 4t + 5t^2 - 2t^3 + 3t^4 = 1 + t(-4 + t(5 + t(-2 + 3t)))
Other manipulations of interpolating polynomial, such as differentiation or integration, are also relatively easy with monomial basis representation
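A minimal Python sketch of Horner's scheme (assuming coefficients are stored in increasing powers of t, as in the coefficient vector x above):

    def horner(coeffs, t):
        # Evaluate x_1 + x_2*t + ... + x_n*t^(n-1) with nested multiplications
        result = 0.0
        for c in reversed(coeffs):     # work outward from the innermost parenthesis
            result = result * t + c
        return result

    # 1 - 4t + 5t^2 - 2t^3 + 3t^4 evaluated at t = 2 gives 45
    print(horner([1.0, -4.0, 5.0, -2.0, 3.0], 2.0))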
Lagrange Interpolation
For given set of data points (t_i, y_i), i = 1, \ldots, n, Lagrange basis functions are defined by
\ell_j(t) = \prod_{k=1, k \ne j}^{n} (t - t_k) \Big/ \prod_{k=1, k \ne j}^{n} (t_j - t_k),  j = 1, \ldots, n
For Lagrange basis,
\ell_j(t_i) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases},  i, j = 1, \ldots, n
so matrix of linear system Ax = y is identity matrix
Thus, Lagrange polynomial interpolating data points (t_i, y_i) is given by
p_{n-1}(t) = y_1 \ell_1(t) + y_2 \ell_2(t) + \cdots + y_n \ell_n(t)
Lagrange Basis Functions
< interactive example >
Lagrange interpolant is easy to determine but more
expensive to evaluate for given argument, compared with
monomial basis representation
Lagrangian form is also more difficult to differentiate,
integrate, etc.
Example: Lagrange Interpolation
Use Lagrange interpolation to determine interpolating polynomial for three data points (-2, -27), (0, -1), (1, 0)
Lagrange polynomial of degree two interpolating three points (t_1, y_1), (t_2, y_2), (t_3, y_3) is given by
p_2(t) = y_1 \frac{(t - t_2)(t - t_3)}{(t_1 - t_2)(t_1 - t_3)} + y_2 \frac{(t - t_1)(t - t_3)}{(t_2 - t_1)(t_2 - t_3)} + y_3 \frac{(t - t_1)(t - t_2)}{(t_3 - t_1)(t_3 - t_2)}
For these particular data, this becomes
p_2(t) = -27 \frac{t(t - 1)}{(-2)(-2 - 1)} + (-1) \frac{(t + 2)(t - 1)}{(2)(-1)}
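A small Python sketch (not from the slides) that evaluates the Lagrange form directly from the definition, using the same data:

    import numpy as np

    def lagrange_eval(t_nodes, y, t):
        # p_{n-1}(t) = sum_j y_j * l_j(t), with l_j built from the product formula
        p = 0.0
        n = len(t_nodes)
        for j in range(n):
            lj = 1.0
            for k in range(n):
                if k != j:
                    lj *= (t - t_nodes[k]) / (t_nodes[j] - t_nodes[k])
            p += y[j] * lj
        return p

    t_nodes = np.array([-2.0, 0.0, 1.0])
    y = np.array([-27.0, -1.0, 0.0])
    # Same value as -1 + 5t - 4t^2 at t = 0.5, namely 0.5
    print(lagrange_eval(t_nodes, y, 0.5))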
Newton Interpolation
For given set of data points (t_i, y_i), i = 1, \ldots, n, Newton basis functions are defined by
\pi_j(t) = \prod_{k=1}^{j-1} (t - t_k),  j = 1, \ldots, n
where value of product is taken to be 1 when limits make it vacuous
Newton interpolating polynomial has form
p_{n-1}(t) = x_1 + x_2 (t - t_1) + x_3 (t - t_1)(t - t_2) + \cdots + x_n (t - t_1)(t - t_2) \cdots (t - t_{n-1})
For i < j, \pi_j(t_i) = 0, so basis matrix A is lower triangular, where a_{ij} = \pi_j(t_i)
Newton Basis Functions
< interactive example >
Newton Interpolation, continued
Solution x to system Ax = y can be computed by forward-substitution in O(n^2) arithmetic operations
Moreover, resulting interpolant can be evaluated efficiently for any argument by nested evaluation scheme similar to Horner's method
Newton interpolation has better balance between cost of
computing interpolant and cost of evaluating it
Example: Newton Interpolation
Use Newton interpolation to determine interpolating polynomial for three data points (-2, -27), (0, -1), (1, 0)
Using Newton basis, linear system is
\begin{bmatrix} 1 & 0 & 0 \\ 1 & t_2 - t_1 & 0 \\ 1 & t_3 - t_1 & (t_3 - t_1)(t_3 - t_2) \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
For these particular data, system is
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 3 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -27 \\ -1 \\ 0 \end{bmatrix}
whose solution by forward substitution is x = \begin{bmatrix} -27 & 13 & -4 \end{bmatrix}^T, so interpolating polynomial is
p(t) = -27 + 13(t + 2) - 4(t + 2)t
Newton Interpolation, continued
If p_j(t) is polynomial of degree j - 1 interpolating j given points, then for any constant x_{j+1},
p_{j+1}(t) = p_j(t) + x_{j+1} \pi_{j+1}(t)
is polynomial of degree j that also interpolates same j points
Free parameter x_{j+1} can then be chosen so that p_{j+1}(t) interpolates y_{j+1},
x_{j+1} = \frac{y_{j+1} - p_j(t_{j+1})}{\pi_{j+1}(t_{j+1})}
Newton interpolation begins with constant polynomial p_1(t) = y_1 interpolating first data point and then successively incorporates each remaining data point into interpolant
< interactive example >
Divided Differences
Given data points (t_i, y_i), i = 1, \ldots, n, divided differences, denoted by f[ ], are defined recursively by
f[t_1, t_2, \ldots, t_k] = \frac{f[t_2, t_3, \ldots, t_k] - f[t_1, t_2, \ldots, t_{k-1}]}{t_k - t_1}
where recursion begins with f[t_k] = y_k, k = 1, \ldots, n
Coefficient of jth basis function in Newton interpolant is given by
x_j = f[t_1, t_2, \ldots, t_j]
Recursion requires O(n^2) arithmetic operations to compute coefficients of Newton interpolant, but is less prone to overflow or underflow than direct formation of triangular Newton basis matrix
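A minimal NumPy sketch of the divided-difference recursion and the nested evaluation of the resulting Newton form (function names are my own, not library routines):

    import numpy as np

    def newton_coeffs(t, y):
        # Divided-difference coefficients x_j = f[t_1, ..., t_j] in O(n^2) operations
        x = np.array(y, dtype=float)
        n = len(t)
        for k in range(1, n):
            # k-th column of the divided-difference table, computed in place
            x[k:] = (x[k:] - x[k-1:-1]) / (t[k:] - t[:-k])
        return x

    def newton_eval(t, x, s):
        # Nested evaluation of x_1 + x_2(s - t_1) + x_3(s - t_1)(s - t_2) + ...
        p = x[-1]
        for j in range(len(x) - 2, -1, -1):
            p = p * (s - t[j]) + x[j]
        return p

    t = np.array([-2.0, 0.0, 1.0])
    y = np.array([-27.0, -1.0, 0.0])
    coef = newton_coeffs(t, y)
    print(coef)                       # [-27.  13.  -4.], as in the example above
    print(newton_eval(t, coef, 1.0))  # 0.0, reproducing the last data point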
Orthogonal Polynomials
Inner product can be defined on space of polynomials on interval [a, b] by taking
\langle p, q \rangle = \int_a^b p(t) q(t) w(t) \, dt
where w(t) is nonnegative weight function
Two polynomials p and q are orthogonal if \langle p, q \rangle = 0
Set of polynomials {p_i} is orthonormal if
\langle p_i, p_j \rangle = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}
Given set of polynomials, Gram-Schmidt orthogonalization can be used to generate orthonormal set spanning same space
Orthogonal Polynomials, continued
For example, with inner product given by weight function w(t) \equiv 1 on interval [-1, 1], applying Gram-Schmidt process to set of monomials 1, t, t^2, t^3, \ldots yields Legendre polynomials
1,  t,  (3t^2 - 1)/2,  (5t^3 - 3t)/2,  (35t^4 - 30t^2 + 3)/8,  (63t^5 - 70t^3 + 15t)/8,  \ldots
first n of which form an orthogonal basis for space of polynomials of degree at most n - 1
Other choices of weight functions and intervals yield other
orthogonal polynomials, such as Chebyshev, Jacobi,
Laguerre, and Hermite
Orthogonal Polynomials, continued
Orthogonal polynomials have many useful properties
They satisfy three-term recurrence relation of form
p_{k+1}(t) = (\alpha_k t + \beta_k) p_k(t) - \gamma_k p_{k-1}(t)
which makes them very efficient to generate and evaluate
Orthogonality makes them very natural for least squares
approximation, and they are also useful for generating
Gaussian quadrature rules, which we will see later
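As an illustration, a short Python sketch (not from the slides) that generates Legendre polynomial values with the standard recurrence (k+1) P_{k+1}(t) = (2k+1) t P_k(t) - k P_{k-1}(t), one instance of the three-term form above:

    import numpy as np

    def legendre_values(t, n):
        # Evaluate P_0, ..., P_{n-1} at points t using the three-term recurrence
        t = np.asarray(t, dtype=float)
        P = np.zeros((n,) + t.shape)
        P[0] = 1.0
        if n > 1:
            P[1] = t
        for k in range(1, n - 1):
            P[k + 1] = ((2*k + 1) * t * P[k] - k * P[k - 1]) / (k + 1)
        return P

    # P_2(0.5) = -0.125 and P_3(0.5) = -0.4375, matching (3t^2-1)/2 and (5t^3-3t)/2
    print(legendre_values([0.5], 4))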
Chebyshev Polynomials
Chebyshev polynomials of first kind, defined on interval [-1, 1] by
T_k(t) = \cos(k \arccos t)
are orthogonal with respect to weight function (1 - t^2)^{-1/2}
First few Chebyshev polynomials are given by
1,  t,  2t^2 - 1,  4t^3 - 3t,  8t^4 - 8t^2 + 1,  16t^5 - 20t^3 + 5t,  \ldots
Equi-oscillation property: successive extrema of T_k are equal in magnitude and alternate in sign, which distributes error uniformly when approximating arbitrary continuous function
Chebyshev Basis Functions
< interactive example >
Chebyshev Points
Chebyshev points are zeros of T_k, given by
t_i = \cos\left( \frac{(2i - 1)\pi}{2k} \right),  i = 1, \ldots, k
or extrema of T_k, given by
t_i = \cos\left( \frac{i\pi}{k} \right),  i = 0, 1, \ldots, k
Chebyshev points are abscissas of points equally spaced around unit circle in \mathbb{R}^2
Chebyshev points have attractive properties for
interpolation and other problems
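A minimal NumPy sketch generating both families of Chebyshev points from the formulas above:

    import numpy as np

    def chebyshev_zeros(k):
        # Zeros of T_k: t_i = cos((2i - 1) * pi / (2k)), i = 1, ..., k
        i = np.arange(1, k + 1)
        return np.cos((2*i - 1) * np.pi / (2*k))

    def chebyshev_extrema(k):
        # Extrema of T_k: t_i = cos(i * pi / k), i = 0, 1, ..., k
        i = np.arange(0, k + 1)
        return np.cos(i * np.pi / k)

    print(chebyshev_zeros(5))     # 5 points in (-1, 1), clustered toward the ends
    print(chebyshev_extrema(5))   # 6 points including the endpoints -1 and 1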
Interpolating Continuous Functions
If data points are discrete sample of continuous function,
how well does interpolant approximate that function
between sample points?
If f is smooth function, and p_{n-1} is polynomial of degree at most n - 1 interpolating f at n points t_1, \ldots, t_n, then
f(t) - p_{n-1}(t) = \frac{f^{(n)}(\theta)}{n!} (t - t_1)(t - t_2) \cdots (t - t_n)
where \theta is some (unknown) point in interval [t_1, t_n]
Since point \theta is unknown, this result is not particularly useful unless bound on appropriate derivative of f is known
Interpolating Continuous Functions, continued
If |f^{(n)}(t)| \le M for all t \in [t_1, t_n], and h = \max\{ t_{i+1} - t_i : i = 1, \ldots, n-1 \}, then
\max_{t \in [t_1, t_n]} |f(t) - p_{n-1}(t)| \le \frac{M h^n}{4n}
Error diminishes with increasing n and decreasing h, but only if |f^{(n)}(t)| does not grow too rapidly with n
< interactive example >
High-Degree Polynomial Interpolation
Interpolating polynomials of high degree are expensive to
determine and evaluate
In some bases, coefficients of polynomial may be poorly determined due to ill-conditioning of linear system to be solved
High-degree polynomial necessarily has lots of wiggles, which may bear no relation to data to be fit
Polynomial passes through required data points, but it may
oscillate wildly between data points
Convergence
Polynomial interpolating continuous function may not
converge to function as number of data points and
polynomial degree increases
Equally spaced interpolation points often yield
unsatisfactory results near ends of interval
If points are bunched near ends of interval, more
satisfactory results are likely to be obtained with
polynomial interpolation
Use of Chebyshev points distributes error evenly and
yields convergence throughout interval for any sufficiently
smooth function
Example: Runge's Function
Polynomial interpolants of Runge's function, f(t) = 1/(1 + 25t^2), at equally spaced points do not converge
< interactive example >
Example: Runge's Function
Polynomial interpolants of Runge's function at Chebyshev points do converge
< interactive example >
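A hedged sketch of this comparison (assumes NumPy; np.polyfit of degree n - 1 through n points is used here only as a convenient way to build the interpolant, not as a recommended method for high degree):

    import numpy as np

    def runge(t):
        return 1.0 / (1.0 + 25.0 * t**2)

    tt = np.linspace(-1.0, 1.0, 1001)        # fine grid for measuring the error

    for n in (9, 17):                        # number of interpolation points
        t_eq = np.linspace(-1.0, 1.0, n)
        t_ch = np.cos((2*np.arange(1, n + 1) - 1) * np.pi / (2*n))
        for name, pts in (("equispaced", t_eq), ("Chebyshev", t_ch)):
            coef = np.polyfit(pts, runge(pts), n - 1)   # degree n-1 interpolant
            err = np.max(np.abs(np.polyval(coef, tt) - runge(tt)))
            print(n, name, err)
    # The maximum error grows with n for equispaced points
    # but shrinks for Chebyshev points.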
Taylor Polynomial
Another useful form of polynomial interpolation for smooth function f is polynomial given by truncated Taylor series
p_n(t) = f(a) + f'(a)(t - a) + \frac{f''(a)}{2}(t - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(t - a)^n
Polynomial interpolates f in that values of p_n and its first n derivatives match those of f and its first n derivatives evaluated at t = a, so p_n(t) is good approximation to f(t) for t near a
We have already seen examples in Newton's method for nonlinear equations and optimization
< interactive example >
Piecewise Polynomial Interpolation
Fitting single polynomial to large number of data points is
likely to yield unsatisfactory oscillating behavior in
interpolant
Piecewise polynomials provide alternative to practical and
theoretical difficulties with high-degree polynomial interpolation
Main advantage of piecewise polynomial interpolation is that large number of data points can be fit with low-degree polynomials
In piecewise interpolation of given data points (t_i, y_i), different function is used in each subinterval [t_i, t_{i+1}]
Abscissas t_i are called knots or breakpoints, at which interpolant changes from one function to another
Piecewise Interpolation, continued
Simplest example is piecewise linear interpolation, in
which successive pairs of data points are connected by
straight lines
Although piecewise interpolation eliminates excessive
oscillation and nonconvergence, it appears to sacrifice
smoothness of interpolating function
We have many degrees of freedom in choosing piecewise
polynomial interpolant, however, which can be exploited to
obtain smooth interpolating function despite its piecewise
nature
< interactive example >
Hermite Interpolation
In Hermite interpolation, derivatives as well as values of
interpolating function are taken into account
Including derivative values adds more equations to linear
system that determines parameters of interpolating
function
To have unique solution, number of equations must equal
number of parameters to be determined
Piecewise cubic polynomials are typical choice for Hermite
interpolation, providing flexibility, simplicity, and efficiency
Hermite Cubic Interpolation
Hermite cubic interpolant is piecewise cubic polynomial interpolant with continuous first derivative
Piecewise cubic polynomial with n knots has 4(n - 1) parameters to be determined
Requiring that it interpolate given data gives 2(n - 1) equations
Requiring that it have one continuous derivative gives n - 2 additional equations, or total of 3n - 4, which still leaves n free parameters
Thus, Hermite cubic interpolant is not unique, and remaining free parameters can be chosen so that result satisfies additional constraints
Cubic Spline Interpolation
Spline is piecewise polynomial of degree k that is k - 1 times continuously differentiable
For example, linear spline is of degree 1 and has 0 continuous derivatives, i.e., it is continuous, but not smooth, and could be described as broken line
Cubic spline is piecewise cubic polynomial that is twice continuously differentiable
As with Hermite cubic, interpolating given data and requiring one continuous derivative imposes 3n - 4 constraints on cubic spline
Requiring continuous second derivative imposes n - 2 additional constraints, leaving 2 remaining free parameters
Cubic Splines, continued
Final two parameters can be fixed in various ways
Specify first derivative at endpoints t_1 and t_n
Force second derivative to be zero at endpoints, which gives natural spline
Enforce not-a-knot condition, which forces two consecutive cubic pieces to be same
Force first derivatives, as well as second derivatives, to match at endpoints t_1 and t_n (if spline is to be periodic)
Example: Cubic Spline Interpolation
Determine natural cubic spline interpolating three data points (t_i, y_i), i = 1, 2, 3
Required interpolant is piecewise cubic function defined by separate cubic polynomials in each of two intervals [t_1, t_2] and [t_2, t_3]
Denote these two polynomials by
p_1(t) = \alpha_1 + \alpha_2 t + \alpha_3 t^2 + \alpha_4 t^3
p_2(t) = \beta_1 + \beta_2 t + \beta_3 t^2 + \beta_4 t^3
Eight parameters are to be determined, so we need eight equations
Example, continued
Requiring first cubic to interpolate data at end points of first interval [t_1, t_2] gives two equations
\alpha_1 + \alpha_2 t_1 + \alpha_3 t_1^2 + \alpha_4 t_1^3 = y_1
\alpha_1 + \alpha_2 t_2 + \alpha_3 t_2^2 + \alpha_4 t_2^3 = y_2
Requiring second cubic to interpolate data at end points of second interval [t_2, t_3] gives two equations
\beta_1 + \beta_2 t_2 + \beta_3 t_2^2 + \beta_4 t_2^3 = y_2
\beta_1 + \beta_2 t_3 + \beta_3 t_3^2 + \beta_4 t_3^3 = y_3
Requiring first derivative of interpolant to be continuous at t_2 gives equation
\alpha_2 + 2\alpha_3 t_2 + 3\alpha_4 t_2^2 = \beta_2 + 2\beta_3 t_2 + 3\beta_4 t_2^2
Example, continued
Requiring second derivative of interpolant function to be continuous at t_2 gives equation
2\alpha_3 + 6\alpha_4 t_2 = 2\beta_3 + 6\beta_4 t_2
Finally, by definition natural spline has second derivative equal to zero at endpoints, which gives two equations
2\alpha_3 + 6\alpha_4 t_1 = 0
2\beta_3 + 6\beta_4 t_3 = 0
When particular data values are substituted for t_i and y_i, system of eight linear equations can be solved for eight unknown parameters \alpha_i and \beta_i
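In practice the eight equations above are rarely assembled by hand; a minimal sketch using SciPy (assuming scipy is available; bc_type='natural' imposes exactly the zero second-derivative end conditions above, and the data values are made up for illustration):

    import numpy as np
    from scipy.interpolate import CubicSpline

    t = np.array([0.0, 1.0, 2.0])            # three knots, hence two cubic pieces
    y = np.array([1.0, 3.0, 2.0])

    spline = CubicSpline(t, y, bc_type='natural')
    print(spline(t))       # reproduces y at the knots
    print(spline(t, 2))    # second derivative: zero at both endpoints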
Hermite Cubic vs Spline Interpolation
Choice between Hermite cubic and spline interpolation
depends on data to be fit and on purpose for doing
interpolation
If smoothness is of paramount importance, then spline
interpolation may be most appropriate
But Hermite cubic interpolant may have more pleasing
visual appearance and allows flexibility to preserve
monotonicity if original data are monotonic
In any case, it is advisable to plot interpolant and data to
help assess how well interpolating function captures
behavior of original data
< interactive example >
B-splines
B-splines form basis for family of spline functions of given degree
B-splines can be defined in various ways, including recursion (which we will use), convolution, and divided differences
Although in practice we use only finite set of knots t_1, \ldots, t_n, for notational convenience we will assume infinite set of knots
\cdots < t_{-2} < t_{-1} < t_0 < t_1 < t_2 < \cdots
Additional knots can be taken as arbitrarily defined points outside interval [t_1, t_n]
We will also use linear functions
v_i^k(t) = (t - t_i) / (t_{i+k} - t_i)
B-splines, continued
To start recursion, define B-splines of degree 0 by
B_i^0(t) = \begin{cases} 1 & \text{if } t_i \le t < t_{i+1} \\ 0 & \text{otherwise} \end{cases}
and then for k > 0 define B-splines of degree k by
B_i^k(t) = v_i^k(t) B_i^{k-1}(t) + (1 - v_{i+1}^k(t)) B_{i+1}^{k-1}(t)
Since B_i^0 is piecewise constant and v_i^k is linear, B_i^1 is piecewise linear
Similarly, B_i^2 is in turn piecewise quadratic, and in general, B_i^k is piecewise polynomial of degree k
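A direct, unoptimized Python transcription of this recursion (a sketch; the finite knot list and the guard that treats empty knot intervals as contributing 0 are my own conventions):

    def bspline(i, k, t, knots):
        # B_i^k(t) by the recursion above, for an increasing list of knots
        if k == 0:
            return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
        # v_i^k(t) = (t - t_i) / (t_{i+k} - t_i); empty intervals contribute 0
        d1 = knots[i + k] - knots[i]
        d2 = knots[i + k + 1] - knots[i + 1]
        v1 = (t - knots[i]) / d1 if d1 != 0 else 0.0
        v2 = (t - knots[i + 1]) / d2 if d2 != 0 else 0.0
        return v1 * bspline(i, k - 1, t, knots) + (1.0 - v2) * bspline(i + 1, k - 1, t, knots)

    knots = list(range(10))   # t_0, ..., t_9 = 0, ..., 9
    # Partition-of-unity check (property 3 on the next slide): sum_i B_i^3(4.5) = 1
    print(sum(bspline(i, 3, 4.5, knots) for i in range(0, 6)))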
B-splines, continued
< interactive example >
B-splines, continued
Important properties of B-spline functions B_i^k:
1. For t < t_i or t > t_{i+k+1}, B_i^k(t) = 0
2. For t_i < t < t_{i+k+1}, B_i^k(t) > 0
3. For all t, \sum_{i=-\infty}^{\infty} B_i^k(t) = 1
4. For k \ge 1, B_i^k has k - 1 continuous derivatives
5. Set of functions B_{1-k}^k, \ldots, B_{n-1}^k is linearly independent on interval [t_1, t_n] and spans space of all splines of degree k having knots t_i
B-splines, continued
Properties 1 and 2 together say that B-spline functions
have local support
Property 3 gives normalization
Property 4 says that they are indeed splines
Property 5 says that for given k, these functions form basis
for set of all splines of degree k
B-splines, continued
If we use B-spline basis, linear system to be solved for
spline coefficients will be nonsingular and banded
Use of B-spline basis yields efficient and stable methods
for determining and evaluating spline interpolants, and
many library routines for spline interpolation are based on
this approach
B-splines are also useful in many other contexts, such as
numerical solution of differential equations, as we will see
later
AMS527: Numerical Analysis II
Review for Test 1
Xiangmin Jiao
SUNY Stony Brook
Approximations in Scientific Computations
Concepts
Absolute error, relative error
Computational error, propagated data error
Truncation error, rounding error
Forward error, backward error
Condition number, stability
Cancellation
Solutions of Nonlinear Equations
Concepts
Multiplicity
Sensitivity
Convergence rate
Basic methods
Interval bisection method
Fixed-point iteration
Newton's method
Secant method, Broyden's method
Other Newton-like methods
Numerical Optimization
Concepts
Unconstrained optimization, constrained optimization (linear
vs. nonlinear programming)
Global vs. local minimum
First- and second-order optimality conditions
Coercivity, convexity, unimodality
Methods for unconstrained optimization
Golden section search
Newton's method, quasi-Newton methods (basic ideas)
Steepest descent, conjugate gradient (basic ideas)
Methods for constrained optimization (especially
equality-constrained optimization)
Lagrange multiplier for constrained optimization
Lagrange function and its solution
Linear programming
Polynomial interpolation
Concepts
Existence and uniqueness
Interpolation vs. approximation
Accuracy; Runge's phenomenon
Methods
Monomial basis
Lagrange interpolant
Newton interpolation and divided differences
Orthogonal polynomials