Big Review
Scientific Computing: An Introductory Survey
Chapter 1 – Scientific Computing
Prof. Michael T. Heath
Department of Computer Science
University of Illinois at Urbana-Champaign
Copyright © 2002. Reproduction permitted for noncommercial, educational use only.
Outline
1. Scientific Computing
2. Approximations
3. Computer Arithmetic
Scientific Computing

What is scientific computing?
  Design and analysis of algorithms for numerically solving mathematical problems in science and engineering
  Traditionally called numerical analysis

Distinguishing features of scientific computing
  Deals with continuous quantities
  Considers effects of approximations

Why scientific computing?
  Simulation of natural phenomena
  Virtual prototyping of engineering designs
Well-Posed Problems

Problem is well-posed if solution
  exists
  is unique
  depends continuously on problem data

Otherwise, problem is ill-posed

Even if problem is well-posed, solution may still be sensitive to input data

Computational algorithm should not make sensitivity worse
General Strategy

Replace difficult problem by easier one having same or closely related solution
  infinite → finite
  differential → algebraic
  nonlinear → linear
  complicated → simple

Solution obtained may only approximate that of original problem
Sources of Approximation

Before computation
  modeling
  empirical measurements
  previous computations

During computation
  truncation or discretization
  rounding

Accuracy of final result reflects all these

Uncertainty in input may be amplified by problem

Perturbations during computation may be amplified by algorithm
Example: Approximations

Computing surface area of Earth using formula A = 4πr² involves several approximations
  Earth is modeled as sphere, idealizing its true shape
  Value for radius is based on empirical measurements and previous computations
  Value for π requires truncating infinite process
  Values for input data and results of arithmetic operations are rounded in computer
Absolute Error and Relative Error

Absolute error: approximate value − true value

Relative error: absolute error / true value

Equivalently, approx value = (true value) × (1 + rel error)

True value usually unknown, so we estimate or bound error rather than compute it exactly

Relative error often taken relative to approximate value, rather than (unknown) true value
Data Error and Computational Error

Typical problem: compute value of function f: R → R for given argument
  x = true value of input
  f(x) = desired result
  x̂ = approximate (inexact) input
  f̂ = approximate function actually computed

Total error:
  f̂(x̂) − f(x) = [f̂(x̂) − f(x̂)] + [f(x̂) − f(x)] = computational error + propagated data error

Algorithm has no effect on propagated data error
Truncation Error and Rounding Error

Truncation error: difference between true result (for actual input) and result produced by given algorithm using exact arithmetic
  Due to approximations such as truncating infinite series or terminating iterative sequence before convergence

Rounding error: difference between result produced by given algorithm using exact arithmetic and result produced by same algorithm using limited precision arithmetic
  Due to inexact representation of real numbers and arithmetic operations upon them

Computational error is sum of truncation error and rounding error, but one of these usually dominates

< interactive example >
Example: Finite Difference Approximation

Error in finite difference approximation
  f'(x) ≈ (f(x + h) − f(x)) / h
exhibits tradeoff between rounding error and truncation error

Truncation error bounded by M h / 2, where M bounds |f''(t)| for t near x

Rounding error bounded by 2 ε / h, where error in function values is bounded by ε, so total error is minimized when h ≈ 2 √(ε / M)
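A small experiment (added for this review, not part of the original slides) that exhibits the tradeoff for f = sin, whose derivative at x = 1 is cos(1); the error is smallest near h ≈ √ε_mach:

  # Truncation/rounding tradeoff in the forward difference f'(x) ~ (f(x+h)-f(x))/h
  import math

  x = 1.0
  exact = math.cos(x)                      # true derivative of sin at x
  for k in range(1, 17):
      h = 10.0 ** (-k)
      approx = (math.sin(x + h) - math.sin(x)) / h
      print(f"h = 1e-{k:02d}   error = {abs(approx - exact):.3e}")
  # Error first shrinks (truncation error ~ M*h/2), then grows again once the
  # rounding error ~ 2*eps/h dominates; the minimum occurs near h ~ 1e-8.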
Sensitivity and Conditioning

  relative forward error ≈ cond × relative backward error

Relative forward error:
  (f(x + Δx) − f(x)) / f(x) ≈ f'(x) Δx / f(x)

Condition number:
  cond ≈ |f'(x) Δx / f(x)| / |Δx / x| = |x f'(x) / f(x)|

Floating-Point Numbers

Floating-point number system is characterized by four integers: base β, precision p, and exponent range [L, U]

Floating-point number x is represented as
  x = ±(d_0 + d_1/β + d_2/β² + ··· + d_{p−1}/β^{p−1}) β^E
where 0 ≤ d_i ≤ β − 1, i = 0, ..., p − 1, and L ≤ E ≤ U
Floating-Point Numbers, continued

Portions of floating-point number designated as follows
  exponent: E
  mantissa: d_0 d_1 ... d_{p−1}
  fraction: d_1 d_2 ... d_{p−1}

Sign, exponent, and mantissa are stored in separate fixed-width fields of each floating-point word
Typical Floating-Point Systems

Parameters for typical floating-point systems

  system          β    p    L        U
  IEEE SP         2    24   −126     127
  IEEE DP         2    53   −1022    1023
  Cray            2    48   −16383   16384
  HP calculator   10   12   −499     499
  IBM mainframe   16   6    −64      63

Most modern computers use binary (β = 2) arithmetic

IEEE floating-point systems are now almost universal in digital computers
Normalization

Floating-point system is normalized if leading digit d_0 is always nonzero unless number represented is zero

In normalized systems, mantissa m of nonzero floating-point number always satisfies 1 ≤ m < β

Reasons for normalization
  representation of each number unique
  no digits wasted on leading zeros
  leading bit need not be stored (in binary system)
Properties of Floating-Point Systems

Floating-point number system is finite and discrete

Total number of normalized floating-point numbers is
  2 (β − 1) β^(p−1) (U − L + 1) + 1

Smallest positive normalized number: UFL = β^L

Largest floating-point number: OFL = β^(U+1) (1 − β^(−p))

Floating-point numbers equally spaced only between successive powers of β

Not all real numbers exactly representable; those that are are called machine numbers
Example: Floating-Point System

Tick marks indicate all 25 numbers in floating-point system having β = 2, p = 3, L = −1, and U = 1

  OFL = (1.11)_2 × 2^1 = (3.5)_10
  UFL = (1.00)_2 × 2^(−1) = (0.5)_10

At sufficiently high magnification, all normalized floating-point systems look grainy and unequally spaced

< interactive example >
Rounding Rules

If real number x is not exactly representable, then it is approximated by nearby floating-point number fl(x)

This process is called rounding, and error introduced is called rounding error

Two commonly used rounding rules
  chop: truncate base-β expansion of x after (p − 1)st digit; also called round toward zero
  round to nearest: fl(x) is nearest floating-point number to x, using floating-point number whose last stored digit is even in case of tie; also called round to even

Round to nearest is most accurate, and is default rounding rule in IEEE systems

< interactive example >
Machine Precision

Accuracy of floating-point system characterized by unit roundoff (or machine precision or machine epsilon), denoted by ε_mach

With rounding by chopping, ε_mach = β^(1−p)

With rounding to nearest, ε_mach = (1/2) β^(1−p)

Alternative definition is smallest number ε such that fl(1 + ε) > 1

Maximum relative error in representing real number x within range of floating-point system is given by
  |fl(x) − x| / |x| ≤ ε_mach
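A quick sanity check of these definitions in Python double precision (a sketch added for this review, not from the slides):

  import sys

  eps = sys.float_info.epsilon     # 2**-52, the spacing of floats near 1.0
  print(eps)                       # 2.220446049250313e-16
  # Note: the slides' eps_mach with round-to-nearest is half of this, 2**-53.
  print(1.0 + 2.0**-53 == 1.0)     # True: the halfway case rounds back to 1.0
  print(1.0 + 2.0**-52 == 1.0)     # False: adding 2**-52 changes the result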
Machine Precision, continued

For toy system illustrated earlier
  ε_mach = (0.01)_2 = (0.25)_10 with rounding by chopping
  ε_mach = (0.001)_2 = (0.125)_10 with rounding to nearest

For IEEE floating-point systems
  ε_mach = 2^(−24) ≈ 10^(−7) in single precision
  ε_mach = 2^(−53) ≈ 10^(−16) in double precision

So IEEE single and double precision systems have about 7 and 16 decimal digits of precision, respectively
Machine Precision, continued

Though both are small, unit roundoff ε_mach should not be confused with underflow level UFL

Unit roundoff ε_mach is determined by number of digits in mantissa of floating-point system, whereas underflow level UFL is determined by number of digits in exponent field

In all practical floating-point systems,
  0 < UFL < ε_mach < OFL
Subnormals and Gradual Underflow

Normalization causes gap around zero in floating-point system

If leading digits are allowed to be zero, but only when exponent is at its minimum value, then gap is filled in by additional subnormal or denormalized floating-point numbers

Subnormals extend range of magnitudes representable, but have less precision than normalized numbers, and unit roundoff is no smaller

Augmented system exhibits gradual underflow
Exceptional Values

IEEE floating-point standard provides special values to indicate two exceptional situations
  Inf, which stands for infinity, results from dividing a finite number by zero, such as 1/0
  NaN, which stands for not a number, results from undefined or indeterminate operations such as 0/0, 0 × Inf, or Inf/Inf

Inf and NaN are implemented in IEEE arithmetic through special reserved values of exponent field
Floating-Point Arithmetic

Addition or subtraction: Shifting of mantissa to make exponents match may cause loss of some digits of smaller number, possibly all of them

Multiplication: Product of two p-digit mantissas contains up to 2p digits, so result may not be representable

Division: Quotient of two p-digit mantissas may contain more than p digits, such as nonterminating binary expansion of 1/10

Result of floating-point arithmetic operation may differ from result of corresponding real arithmetic operation on same operands
Example: Floating-Point Arithmetic

Assume β = 10, p = 6

Let x = 1.92403 × 10^2, y = 6.35782 × 10^(−1)

Floating-point addition gives x + y = 1.93039 × 10^2, assuming rounding to nearest

Last two digits of y do not affect result, and with even smaller exponent, y could have had no effect on result

Floating-point multiplication gives x × y = 1.22326 × 10^2, which discards half of digits of true product
Floating-Point Arithmetic, continued

Real result may also fail to be representable because its exponent is beyond available range

Overflow is usually more serious than underflow because there is no good approximation to arbitrarily large magnitudes in floating-point system, whereas zero is often reasonable approximation for arbitrarily small magnitudes

On many computer systems overflow is fatal, but an underflow may be silently set to zero
Example: Summing Series

Infinite series
  Σ_{n=1}^{∞} 1/n
has finite sum in floating-point arithmetic even though real series is divergent

Possible explanations
  Partial sum eventually overflows
  1/n eventually underflows
  Partial sum ceases to change once 1/n becomes negligible relative to partial sum, i.e.,
    1/n < ε_mach × Σ_{k=1}^{n−1} 1/k

< interactive example >
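A small sketch (added for this review, not from the slides) that reproduces the third explanation, using NumPy's float32 as a stand-in for IEEE single precision:

  import numpy as np

  s = np.float32(0.0)
  n = 0
  while True:
      n += 1
      new = s + np.float32(1.0) / np.float32(n)
      if new == s:       # 1/n is now negligible relative to the partial sum
          break
      s = new
  print(n, s)            # the "finite sum" of the divergent harmonic series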
Floating-Point Arithmetic, continued

Ideally, x flop y = fl(x op y), i.e., floating-point arithmetic operations produce correctly rounded results

Computers satisfying IEEE floating-point standard achieve this ideal as long as x op y is within range of floating-point system

But some familiar laws of real arithmetic are not necessarily valid in floating-point system

Floating-point addition and multiplication are commutative but not associative

Example: if ε is positive floating-point number slightly smaller than ε_mach, then (1 + ε) + ε = 1, but 1 + (ε + ε) > 1
Cancellation

Subtraction between two p-digit numbers having same sign and similar magnitudes yields result with fewer than p digits, so it is usually exactly representable

Reason is that leading digits of two numbers cancel (i.e., their difference is zero)

For example,
  1.92403 × 10^2 − 1.92275 × 10^2 = 1.28000 × 10^(−1)
which is correct, and exactly representable, but has only three significant digits
Cancellation, continued

Despite exactness of result, cancellation often implies serious loss of information

Operands are often uncertain due to rounding or other previous errors, so relative uncertainty in difference may be large

Example: if ε is positive floating-point number slightly smaller than ε_mach, then (1 + ε) − (1 − ε) = 1 − 1 = 0 in floating-point arithmetic, which is correct for actual operands of final subtraction, but true result of overall computation, 2ε, has been completely lost

Subtraction itself is not at fault: it merely signals loss of information that had already occurred
Cancellation, continued

Digits lost to cancellation are most significant, leading digits, whereas digits lost in rounding are least significant, trailing digits

Because of this effect, it is generally bad idea to compute any small quantity as difference of large quantities, since rounding error is likely to dominate result

For example, summing alternating series, such as
  e^x = 1 + x + x²/2! + x³/3! + ···
for x < 0, may give disastrous results due to catastrophic cancellation
Example: Cancellation

Total energy of helium atom is sum of kinetic and potential energies, which are computed separately and have opposite signs, so suffer cancellation

  Year    Kinetic    Potential    Total
  1971    13.0       −14.0        −1.0
  1977    12.76      −14.02       −1.26
  1980    12.22      −14.35       −2.13
  1985    12.28      −14.65       −2.37
  1988    12.40      −14.84       −2.44

Although computed values for kinetic and potential energies changed by only 6% or less, resulting estimate for total energy changed by 144%
Example: Quadratic Formula

Two solutions of quadratic equation ax² + bx + c = 0 are given by
  x = (−b ± √(b² − 4ac)) / (2a)

Naive use of formula can suffer overflow, or underflow, or severe cancellation

Rescaling coefficients avoids overflow or harmful underflow

Cancellation between −b and square root can be avoided by computing one root using alternative formula
  x = 2c / (−b ∓ √(b² − 4ac))

Cancellation inside square root cannot be easily avoided without using higher precision

< interactive example >
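A sketch (added for this review, not from the slides) contrasting the two formulas for the smaller-magnitude root when b² ≫ 4ac; the coefficients are illustrative:

  import math

  a, b, c = 1.0, 1.0e8, 1.0            # roots are approximately -1e8 and -1e-8
  d = math.sqrt(b * b - 4.0 * a * c)

  x_naive = (-b + d) / (2.0 * a)       # subtracts two nearly equal numbers
  x_alt = (2.0 * c) / (-b - d)         # avoids the cancellation

  print(x_naive)    # about -7.45e-09: only the order of magnitude is right
  print(x_alt)      # about -1.0e-08: nearly full accuracy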
Example: Standard Deviation

Mean and standard deviation of sequence x_i, i = 1, ..., n, are given by
  x̄ = (1/n) Σ_{i=1}^{n} x_i    and    σ = [ (1/(n−1)) Σ_{i=1}^{n} (x_i − x̄)² ]^(1/2)

Mathematically equivalent formula
  σ = [ (1/(n−1)) ( Σ_{i=1}^{n} x_i² − n x̄² ) ]^(1/2)
avoids making two passes through data

Single cancellation at end of one-pass formula is more damaging numerically than all cancellations in two-pass formula combined
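A sketch (added for this review) comparing the two formulas on data with a large mean and tiny spread; the values are illustrative, not from the slides:

  import math

  x = [1.0e8 + d for d in (1.0, 2.0, 3.0, 4.0)]    # true sigma = sqrt(5/3) ~ 1.291
  n = len(x)
  mean = sum(x) / n

  two_pass = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
  one_pass_arg = (sum(xi * xi for xi in x) - n * mean * mean) / (n - 1)
  one_pass = math.sqrt(max(one_pass_arg, 0.0))     # argument can even go negative

  print(two_pass)    # ~1.29099, essentially exact
  print(one_pass)    # badly wrong: the final subtraction loses nearly all digits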
Scientific Computing: An Introductory Survey
Chapter 5 – Nonlinear Equations
Prof. Michael T. Heath
Department of Computer Science
University of Illinois at Urbana-Champaign
Copyright © 2002. Reproduction permitted for noncommercial, educational use only.
Outline
1. Nonlinear Equations
2. Numerical Methods in One Dimension
3. Methods for Systems of Nonlinear Equations
Nonlinear Equations

Given function f, we seek value x for which
  f(x) = 0

Solution x is root of equation, or zero of function f, so problem is known as root finding or zero finding
Nonlinear Equations

Two important cases

Single nonlinear equation in one unknown, where
  f: R → R
Solution is scalar x for which f(x) = 0

System of n coupled nonlinear equations in n unknowns, where
  f: R^n → R^n
Solution is vector x for which all components of f are zero simultaneously, f(x) = 0
Examples: Nonlinear Equations

Example of nonlinear equation in one dimension
  x² − 4 sin(x) = 0
for which x = 1.9 is one approximate solution

Example of system of nonlinear equations in two dimensions
  x_1² − x_2 + 0.25 = 0
  −x_1 + x_2² + 0.25 = 0
for which x = [0.5, 0.5]^T is solution vector
Existence and Uniqueness

Existence and uniqueness of solutions are more complicated for nonlinear equations than for linear equations

For function f: R → R, bracket is interval [a, b] for which sign of f differs at endpoints

If f is continuous and sign(f(a)) ≠ sign(f(b)), then Intermediate Value Theorem implies there is x* ∈ [a, b] such that f(x*) = 0

There is no simple analog for n dimensions
Examples: One Dimension

Nonlinear equations can have any number of solutions
  exp(x) + 1 = 0 has no solution
  exp(−x) − x = 0 has one solution
  x² − 4 sin(x) = 0 has two solutions
  x³ − 6x² + 11x − 6 = 0 has three solutions
  sin(x) = 0 has infinitely many solutions
Example: Systems in Two Dimensions

  x_1² − x_2 + γ = 0
  −x_1 + x_2² + γ = 0

(figure omitted: curves plotted for several values of constant γ, showing that number of solutions depends on γ)
Multiplicity

If f(x*) = f'(x*) = f''(x*) = ··· = f^(m−1)(x*) = 0 but f^(m)(x*) ≠ 0, then root x* has multiplicity m

If m = 1 (f(x*) = 0 and f'(x*) ≠ 0), then x* is simple root
Sensitivity and Conditioning

Conditioning of root finding problem is opposite to that for evaluating function

Absolute condition number of root finding problem for root x* of f: R → R is 1 / |f'(x*)|

Root is ill-conditioned if tangent line is nearly horizontal

In particular, multiple root (m > 1) is ill-conditioned

Absolute condition number of root finding problem for root x* of f: R^n → R^n is ‖J_f^(−1)(x*)‖, where J_f is Jacobian matrix of f,
  {J_f(x)}_ij = ∂f_i(x) / ∂x_j

Root is ill-conditioned if Jacobian matrix is nearly singular
Sensitivity and Conditioning, continued

What do we mean by approximate solution x̂ to nonlinear system:
  ‖f(x̂)‖ ≈ 0    or    ‖x̂ − x*‖ ≈ 0 ?

First corresponds to small residual, second measures closeness to (usually unknown) true solution x*

Convergence Rate

Error at iteration k is e_k = x_k − x*, where x_k is approximate solution and x* is true solution

For methods that maintain interval known to contain solution, rather than specific approximate value for solution, take error to be length of interval containing solution

Sequence converges with rate r if
  lim_{k→∞} ‖e_{k+1}‖ / ‖e_k‖^r = C
for some finite nonzero constant C
Convergence Rate, continued

Some particular cases of interest
  r = 1: linear (C < 1)
  r > 1: superlinear
  r = 2: quadratic

  Convergence rate    Digits gained per iteration
  linear              constant
  superlinear         increasing
  quadratic           double
Interval Bisection Method

Bisection method begins with initial bracket and repeatedly halves its length until solution has been isolated as accurately as desired

  while ((b − a) > tol) do
      m = a + (b − a)/2
      if sign(f(a)) = sign(f(m)) then
          a = m
      else
          b = m
      end
  end

< interactive example >
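A direct Python transcription of the pseudocode above (added as a sketch), applied to the running example f(x) = x² − 4 sin(x) on the bracket [1, 3]; the tolerance is an illustrative choice:

  import math

  def bisect(f, a, b, tol=1e-6):
      fa = f(a)
      while (b - a) > tol:
          m = a + (b - a) / 2
          if math.copysign(1.0, fa) == math.copysign(1.0, f(m)):
              a, fa = m, f(m)
          else:
              b = m
      return a, b

  f = lambda x: x**2 - 4*math.sin(x)
  print(bisect(f, 1.0, 3.0))   # both endpoints approach 1.933754...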
Example: Bisection Method

f(x) = x² − 4 sin(x) = 0

  a          f(a)         b          f(b)
  1.000000   −2.365884    3.000000   8.435520
  1.000000   −2.365884    2.000000   0.362810
  1.500000   −1.739980    2.000000   0.362810
  1.750000   −0.873444    2.000000   0.362810
  1.875000   −0.300718    2.000000   0.362810
  1.875000   −0.300718    1.937500   0.019849
  1.906250   −0.143255    1.937500   0.019849
  1.921875   −0.062406    1.937500   0.019849
  1.929688   −0.021454    1.937500   0.019849
  1.933594   −0.000846    1.937500   0.019849
  1.933594   −0.000846    1.935547   0.009491
  1.933594   −0.000846    1.934570   0.004320
  1.933594   −0.000846    1.934082   0.001736
  1.933594   −0.000846    1.933838   0.000445
Bisection Method, continued

Bisection method makes no use of magnitudes of function values, only their signs

Bisection is certain to converge, but does so slowly

At each iteration, length of interval containing solution is reduced by half, so convergence rate is linear, with r = 1 and C = 0.5

One bit of accuracy is gained in approximate solution for each iteration of bisection

Given starting interval [a, b], length of interval after k iterations is (b − a)/2^k, so achieving error tolerance of tol requires
  ⌈ log₂( (b − a) / tol ) ⌉
iterations, regardless of function f involved
Fixed-Point Problems

Fixed point of given function g: R → R is value x such that
  x = g(x)

Many iterative methods for solving nonlinear equations use fixed-point iteration scheme of form
  x_{k+1} = g(x_k)
where fixed points for g are solutions for f(x) = 0

Also called functional iteration, since function g is applied repeatedly to initial starting value x_0

For given equation f(x) = 0, there may be many equivalent fixed-point problems x = g(x) with different choices for g
Example: Fixed-Point Problems

If f(x) = x² − x − 2, then fixed points of each of functions
  g(x) = x² − 2
  g(x) = √(x + 2)
  g(x) = 1 + 2/x
  g(x) = (x² + 2) / (2x − 1)
are solutions to equation f(x) = 0
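A short experiment (added for this review, not from the slides) iterating each of these choices of g from the same starting point; whether the iterates converge to the root x* = 2 depends on |g'(x*)|:

  import math

  gs = {
      "x**2 - 2":            lambda x: x**2 - 2,
      "sqrt(x + 2)":         lambda x: math.sqrt(x + 2),
      "1 + 2/x":             lambda x: 1 + 2/x,
      "(x**2 + 2)/(2x - 1)": lambda x: (x**2 + 2) / (2*x - 1),
  }
  for name, g in gs.items():
      x = 1.5
      for _ in range(10):
          x = g(x)
          if abs(x) > 1e6:      # wandering/diverging; stop early
              break
      print(f"{name:22s} -> {x}")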
Convergence of Fixed-Point Iteration

If x* = g(x*) and |g'(x*)| < 1, then fixed-point iteration converges if started close enough to x*; if |g'(x*)| > 1, it diverges

Convergence rate is normally linear, with constant C = |g'(x*)|, but if g'(x*) = 0, convergence rate is at least quadratic

Newton's Method

Truncated Taylor series
  f(x + h) ≈ f(x) + f'(x) h
is linear function of h approximating f near x

Replace nonlinear function f by this linear function, whose zero is h = −f(x)/f'(x)

Zeros of original function and linear approximation are not identical, so repeat process, giving Newton's method
  x_{k+1} = x_k − f(x_k) / f'(x_k)
Newton's Method, continued

Newton's method approximates nonlinear function f near x_k by tangent line at f(x_k)
Example: Newton's Method

Use Newton's method to find root of
  f(x) = x² − 4 sin(x) = 0

Derivative is
  f'(x) = 2x − 4 cos(x)
so iteration scheme is
  x_{k+1} = x_k − (x_k² − 4 sin(x_k)) / (2x_k − 4 cos(x_k))

Taking x_0 = 3 as starting value, we obtain

  x          f(x)       f'(x)      h
  3.000000   8.435520   9.959970   −0.846942
  2.153058   1.294772   6.505771   −0.199019
  1.954039   0.108438   5.403795   −0.020067
  1.933972   0.001152   5.288919   −0.000218
  1.933754   0.000000   5.287670    0.000000
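A sketch (added for this review) that reproduces the iteration in this table:

  import math

  def f(x):  return x**2 - 4*math.sin(x)
  def fp(x): return 2*x - 4*math.cos(x)

  x = 3.0
  for _ in range(5):
      h = -f(x) / fp(x)                 # Newton step
      print(f"{x:.6f}  {f(x):10.6f}  {fp(x):.6f}  {h:10.6f}")
      x = x + h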
Convergence of Newton's Method

Newton's method transforms nonlinear equation f(x) = 0 into fixed-point problem x = g(x), where
  g(x) = x − f(x)/f'(x)
and hence
  g'(x) = f(x) f''(x) / (f'(x))²

If x* is simple root (f(x*) = 0 and f'(x*) ≠ 0), then g'(x*) = 0

Convergence rate of Newton's method for simple root is therefore quadratic (r = 2)

But iterations must start close enough to root to converge

< interactive example >
Newton's Method, continued

For multiple root, convergence rate of Newton's method is only linear, with constant C = 1 − (1/m), where m is multiplicity

  k    f(x) = x² − 1    f(x) = x² − 2x + 1
  0    2.0              2.0
  1    1.25             1.5
  2    1.025            1.25
  3    1.0003           1.125
  4    1.00000005       1.0625
  5    1.0              1.03125
Secant Method

For each iteration, Newton's method requires evaluation of both function and its derivative, which may be inconvenient or expensive

In secant method, derivative is approximated by finite difference using two successive iterates, so iteration becomes
  x_{k+1} = x_k − f(x_k) (x_k − x_{k−1}) / (f(x_k) − f(x_{k−1}))

Convergence rate of secant method is normally superlinear, with r ≈ 1.618
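A sketch of the secant iteration (added for this review, not from the slides), using the same starting guesses as the example that follows; the tolerance and iteration cap are illustrative choices:

  import math

  def secant(f, x0, x1, tol=1e-10, maxit=20):
      for _ in range(maxit):
          f0, f1 = f(x0), f(x1)
          x2 = x1 - f1 * (x1 - x0) / (f1 - f0)   # secant step
          if abs(x2 - x1) < tol:
              return x2
          x0, x1 = x1, x2
      return x1

  f = lambda x: x**2 - 4*math.sin(x)
  print(secant(f, 1.0, 3.0))   # approaches 1.9337537...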
Secant Method, continued

Secant method approximates nonlinear function f by secant line through previous two iterates

< interactive example >
Example: Secant Method

Use secant method to find root of
  f(x) = x² − 4 sin(x) = 0

Taking x_0 = 1 and x_1 = 3 as starting guesses, we obtain

  x          f(x)         h
  1.000000   −2.365884
  3.000000    8.435520    −1.561930
  1.438070   −1.896774     0.286735
  1.724805   −0.977706     0.305029
  2.029833    0.534305    −0.107789
  1.922044   −0.061523     0.011130
  1.933174   −0.003064     0.000583
  1.933757    0.000019    −0.000004
  1.933754    0.000000     0.000000
Higher-Degree Interpolation

Secant method uses linear interpolation to approximate function whose zero is sought

Higher convergence rate can be obtained by using higher-degree polynomial interpolation

For example, quadratic interpolation (Muller's method) has superlinear convergence rate with r ≈ 1.839

Unfortunately, using higher-degree polynomial also has disadvantages
  interpolating polynomial may not have real roots
  roots may not be easy to compute
  choice of root to use as next iterate may not be obvious
Inverse Interpolation

Good alternative is inverse interpolation, where x_k are interpolated as function of y_k = f(x_k) by polynomial p(y), so next approximate solution is p(0)

Most commonly used for root finding is inverse quadratic interpolation
Inverse Quadratic Interpolation

Given approximate solution values a, b, c, with function values f_a, f_b, f_c, next approximate solution found by fitting quadratic polynomial to a, b, c as function of f_a, f_b, f_c, then evaluating polynomial at 0

Based on nontrivial derivation using Lagrange interpolation, we compute
  u = f_b / f_c,   v = f_b / f_a,   w = f_a / f_c
  p = v (w (u − w)(c − b) − (1 − u)(b − a))
  q = (w − 1)(u − 1)(v − 1)
then new approximate solution is b + p/q

Convergence rate is normally r ≈ 1.839

< interactive example >
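A sketch (added for this review) of a single step of these formulas, using the starting values of the example that follows; it reproduces the first new iterate, 1.886318:

  import math

  f = lambda x: x**2 - 4*math.sin(x)

  a, b, c = 1.0, 2.0, 3.0
  fa, fb, fc = f(a), f(b), f(c)
  u, v, w = fb/fc, fb/fa, fa/fc
  p = v * (w*(u - w)*(c - b) - (1 - u)*(b - a))
  q = (w - 1)*(u - 1)*(v - 1)
  print(b + p/q)        # 1.886318..., the first new iterate in the table below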
Example: Inverse Quadratic Interpolation

Use inverse quadratic interpolation to find root of
  f(x) = x² − 4 sin(x) = 0

Taking x = 1, 2, and 3 as starting values, we obtain

  x          f(x)         h
  1.000000   −2.365884
  2.000000    0.362810
  3.000000    8.435520
  1.886318   −0.244343    −0.113682
  1.939558    0.030786     0.053240
  1.933742   −0.000060    −0.005815
  1.933754    0.000000     0.000011
  1.933754    0.000000     0.000000
Linear Fractional Interpolation

Interpolation using rational fraction of form
  φ(x) = (x − u) / (v x − w)
is especially useful for finding zeros of functions having horizontal or vertical asymptotes

φ has zero at x = u, vertical asymptote at x = w/v, and horizontal asymptote at y = 1/v

Given approximate solution values a, b, c, with function values f_a, f_b, f_c, next approximate solution is c + h, where
  h = (a − c)(b − c)(f_a − f_b) f_c / [ (a − c)(f_c − f_b) f_a − (b − c)(f_c − f_a) f_b ]

Convergence rate is normally r ≈ 1.839, same as for quadratic interpolation (inverse or regular)
Example: Linear Fractional Interpolation

Use linear fractional interpolation to find root of
  f(x) = x² − 4 sin(x) = 0

Taking x = 1, 2, and 3 as starting values, we obtain

  x          f(x)         h
  1.000000   −2.365884
  2.000000    0.362810
  3.000000    8.435520
  1.906953   −0.139647    −1.093047
  1.933351   −0.002131     0.026398
  1.933756    0.000013     0.000406
  1.933754    0.000000    −0.000003

< interactive example >
Safeguarded Methods

Rapidly convergent methods for solving nonlinear equations may not converge unless started close to solution, but safe methods are slow

Hybrid methods combine features of both types of methods to achieve both speed and reliability

Use rapidly convergent method, but maintain bracket around solution

If next approximate solution given by fast method falls outside bracketing interval, perform one iteration of safe method, such as bisection
Safeguarded Methods, continued

Fast method can then be tried again on smaller interval with greater chance of success

Ultimately, convergence rate of fast method should prevail

Hybrid approach seldom does worse than safe method, and usually does much better

Popular combination is bisection and inverse quadratic interpolation, for which no derivatives are required
Zeros of Polynomials

For polynomial p(x) of degree n, one may want to find all n of its zeros, which may be complex even if coefficients are real

Several approaches are available
  Use root-finding method such as Newton's or Muller's method to find one root, deflate it out, and repeat
  Form companion matrix of polynomial and use eigenvalue routine to compute all its eigenvalues
  Use method designed specifically for finding all roots of polynomial, such as Jenkins-Traub
Systems of Nonlinear Equations

Solving systems of nonlinear equations is much more difficult than scalar case because
  Wider variety of behavior is possible, so determining existence and number of solutions or good starting guess is much more complex
  There is no simple way, in general, to guarantee convergence to desired solution or to bracket solution to produce absolutely safe method
  Computational overhead increases rapidly with dimension of problem
Fixed-Point Iteration

Fixed-point problem for g: R^n → R^n is to find vector x such that
  x = g(x)

Corresponding fixed-point iteration is
  x_{k+1} = g(x_k)

If ρ(G(x*)) < 1, where ρ is spectral radius and G(x) is Jacobian matrix of g evaluated at x, then fixed-point iteration converges if started close enough to solution; if G(x*) = O, then convergence rate is quadratic

Newton's Method

In n dimensions, Newton's method takes step s_k satisfying linear system
  J_f(x_k) s_k = −f(x_k)
and sets x_{k+1} = x_k + s_k, where J_f(x) is Jacobian matrix of f

Example: Newton's Method

For nonlinear system
  f(x) = [ x_1 + 2x_2 − 2, x_1² + 4x_2² − 4 ]^T = 0

If x_0 = [1, 2]^T, then
  f(x_0) = [3, 13]^T,   J_f(x_0) = [1 2; 2 16]

Solving system [1 2; 2 16] s_0 = [−3, −13]^T gives s_0 = [−1.83, −0.58]^T, so x_1 = x_0 + s_0 = [−0.83, 1.42]^T
Example, continued

Evaluating at new point,
  f(x_1) = [0, 4.72]^T,   J_f(x_1) = [1 2; −1.67 11.3]

Solving system [1 2; −1.67 11.3] s_1 = [0, −4.72]^T gives s_1 = [0.64, −0.32]^T, so x_2 = x_1 + s_1 = [−0.19, 1.10]^T

Evaluating at new point,
  f(x_2) = [0, 0.83]^T,   J_f(x_2) = [1 2; −0.38 8.76]

Iterations eventually converge to solution x* = [0, 1]^T

< interactive example >
Convergence of Newton's Method

Differentiating corresponding fixed-point operator
  g(x) = x − J(x)^(−1) f(x)
and evaluating at solution x* gives
  G(x*) = I − ( J(x*)^(−1) J(x*) + Σ_{i=1}^{n} f_i(x*) H_i(x*) ) = O
where H_i(x) is component matrix of derivative of J(x)^(−1)

Convergence rate of Newton's method for nonlinear systems is normally quadratic, provided Jacobian matrix J(x*) is nonsingular

But it must be started close enough to solution to converge
Cost of Newton's Method

Cost per iteration of Newton's method for dense problem in n dimensions is substantial
  Computing Jacobian matrix costs n² scalar function evaluations
  Solving linear system costs O(n³) operations
Secant Updating Methods

Secant updating methods reduce cost by
  Using function values at successive iterates to build approximate Jacobian and avoiding explicit evaluation of derivatives
  Updating factorization of approximate Jacobian rather than refactoring it each iteration

Most secant updating methods have superlinear but not quadratic convergence rate

Secant updating methods often cost less overall than Newton's method because of lower cost per iteration
Broyden's Method

Broyden's method is typical secant updating method

Beginning with initial guess x_0 for solution and initial approximate Jacobian B_0, following steps are repeated until convergence

  x_0 = initial guess
  B_0 = initial Jacobian approximation
  for k = 0, 1, 2, ...
      Solve B_k s_k = −f(x_k) for s_k
      x_{k+1} = x_k + s_k
      y_k = f(x_{k+1}) − f(x_k)
      B_{k+1} = B_k + ((y_k − B_k s_k) s_k^T) / (s_k^T s_k)
  end
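A NumPy sketch of this loop (added for this review, not from the slides), applied to the 2×2 system used in the worked example below; the fixed iteration count is an illustrative choice:

  import numpy as np

  def f(x):
      return np.array([x[0] + 2*x[1] - 2, x[0]**2 + 4*x[1]**2 - 4])

  x = np.array([1.0, 2.0])                       # initial guess x0
  B = np.array([[1.0, 2.0], [2.0, 16.0]])        # initial Jacobian approximation B0
  for k in range(10):
      s = np.linalg.solve(B, -f(x))              # solve B_k s_k = -f(x_k)
      x_new = x + s
      y = f(x_new) - f(x)
      B = B + np.outer(y - B @ s, s) / (s @ s)   # rank-one Broyden update
      x = x_new
  print(x)                                       # approaches the solution [0, 1]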
Broyden's Method, continued

Motivation for formula for B_{k+1} is to make least change to B_k subject to satisfying secant equation
  B_{k+1} (x_{k+1} − x_k) = f(x_{k+1}) − f(x_k)

In practice, factorization of B_k is updated instead of updating B_k directly, so total cost per iteration is only O(n²)
Example: Broyden's Method

Use Broyden's method to solve nonlinear system
  f(x) = [ x_1 + 2x_2 − 2, x_1² + 4x_2² − 4 ]^T = 0

If x_0 = [1, 2]^T, then f(x_0) = [3, 13]^T, and we choose
  B_0 = J_f(x_0) = [1 2; 2 16]

Solving system [1 2; 2 16] s_0 = [−3, −13]^T gives s_0 = [−1.83, −0.58]^T, so x_1 = x_0 + s_0 = [−0.83, 1.42]^T
Example, continued

Evaluating at new point x_1 gives f(x_1) = [0, 4.72]^T, so
  y_0 = f(x_1) − f(x_0) = [−3, −8.28]^T

From updating formula, we obtain
  B_1 = [1 2; 2 16] + [0 0; −2.34 −0.74] = [1 2; −0.34 15.3]

Solving system [1 2; −0.34 15.3] s_1 = [0, −4.72]^T gives s_1 = [0.59, −0.30]^T, so x_2 = x_1 + s_1 = [−0.24, 1.120]^T
Example, continued

Evaluating at new point x_2 gives f(x_2) = [0, 1.08]^T, so
  y_1 = f(x_2) − f(x_1) = [0, −3.64]^T

From updating formula, we obtain
  B_2 = [1 2; −0.34 15.3] + [0 0; 1.46 −0.73] = [1 2; 1.12 14.5]

Iterations continue until convergence to solution x* = [0, 1]^T

< interactive example >
Robust Newton-Like Methods

Newton's method and its variants may fail to converge when started far from solution

Safeguards can enlarge region of convergence of Newton-like methods

Simplest precaution is damped Newton method, in which new iterate is
  x_{k+1} = x_k + α_k s_k
where s_k is Newton (or Newton-like) step and α_k is scalar parameter chosen to ensure progress toward solution

Parameter α_k reduces Newton step when it is too large, but α_k = 1 suffices near solution and still yields fast asymptotic convergence rate
Trust-Region Methods

Another approach is to maintain estimate of trust region where Taylor series approximation, upon which Newton's method is based, is sufficiently accurate for resulting computed step to be reliable

Adjusting size of trust region to constrain step size when necessary usually enables progress toward solution even starting far away, yet still permits rapid convergence once near solution

Unlike damped Newton method, trust-region method may modify direction as well as length of Newton step

More details on this approach will be given in Chapter 6
Optimization

Given function f: R^n → R, and set S ⊆ R^n, find x* ∈ S such that f(x*) ≤ f(x) for all x ∈ S

Sublevel set for given value γ is {x ∈ S : f(x) ≤ γ}

If continuous function f on S ⊆ R^n has nonempty sublevel set that is closed and bounded, then f has global minimum on S

If S is unbounded, then f is coercive on S if, and only if, all of its sublevel sets are bounded
Uniqueness of Minimum

Set S ⊆ R^n is convex if it contains line segment between any two of its points

Function f: S ⊆ R^n → R is convex on convex set S if its graph along any line segment in S lies on or below chord connecting function values at endpoints of segment

Any local minimum of convex function f on convex set S ⊆ R^n is global minimum of f on S

Any local minimum of strictly convex function f on convex set S ⊆ R^n is unique global minimum of f on S
First-Order Optimality Condition

For function of one variable, one can find extremum by differentiating function and setting derivative to zero

Generalization to function of n variables is to find critical point, i.e., solution of nonlinear system
  ∇f(x) = 0
where ∇f(x) is gradient vector of f, whose ith component is ∂f(x)/∂x_i

For continuously differentiable f: S ⊆ R^n → R, any interior point x* of S at which f has local minimum must be critical point of f

Second-Order Optimality Condition

For twice continuously differentiable f, at critical point x*, if Hessian matrix H_f(x*) is
  positive definite, then x* is minimum of f
  negative definite, then x* is maximum of f
  indefinite, then x* is saddle point of f
  singular, then various pathological situations are possible
Constrained Optimality

If problem is constrained, only feasible directions are relevant

For equality-constrained problem
  min f(x) subject to g(x) = 0
where f: R^n → R and g: R^n → R^m, with m ≤ n, necessary condition for feasible point x* to be solution is
  −∇f(x*) = J_g^T(x*) λ
where J_g is Jacobian matrix of g, and λ is vector of Lagrange multipliers

This condition says we cannot reduce objective function without violating constraints
Constrained Optimality, continued

Lagrangian function L: R^(n+m) → R is defined by
  L(x, λ) = f(x) + λ^T g(x)

Its gradient is given by
  ∇L(x, λ) = [ ∇f(x) + J_g^T(x) λ ;  g(x) ]

Its Hessian is given by
  H_L(x, λ) = [ B(x, λ)   J_g^T(x) ;  J_g(x)   O ]
where
  B(x, λ) = H_f(x) + Σ_{i=1}^{m} λ_i H_{g_i}(x)
Constrained Optimality, continued

Together, necessary condition and feasibility imply critical point of Lagrangian function,
  ∇L(x, λ) = [ ∇f(x) + J_g^T(x) λ ;  g(x) ] = 0

Hessian of Lagrangian is symmetric, but not positive definite, so critical point of L is saddle point rather than minimum or maximum

Critical point (x*, λ*) of L is constrained minimum of f if B(x*, λ*) is positive definite on null space of J_g(x*)

If columns of Z form basis for null space, then test projected Hessian Z^T B Z for positive definiteness
Constrained Optimality, continued
If inequalities are present, then KKT optimality conditions
also require nonnegativity of Lagrange multipliers
corresponding to inequalities, and complementarity
condition
Sensitivity and Conditioning

Function minimization and equation solving are closely related problems, but their sensitivities differ

In one dimension, absolute condition number of root x* of equation f(x) = 0 is 1/|f'(x*)|, so if |f(x̂)| ≤ ε, then |x̂ − x*| may be as large as ε/|f'(x*)|

For minimizing f, Taylor series expansion
  f(x̂) = f(x* + h) = f(x*) + f'(x*) h + (1/2) f''(x*) h² + O(h³)
shows that, since f'(x*) = 0, if |f(x̂) − f(x*)| ≤ ε, then |x̂ − x*| may be as large as √(2ε / |f''(x*)|)

Thus, based on function values alone, minima can be computed to only about half precision
Unimodality

For minimizing function of one variable, we need bracket for solution analogous to sign change for nonlinear equation

Real-valued function f is unimodal on interval [a, b] if there is unique x* ∈ [a, b] such that f(x*) is minimum of f on [a, b], and f is strictly decreasing for x ≤ x*, strictly increasing for x* ≤ x

Unimodality enables discarding portions of interval based on sample function values, analogous to interval bisection
Golden Section Search

Suppose f is unimodal on [a, b], and let x_1 and x_2 be two points within [a, b], with x_1 < x_2

Evaluating and comparing f(x_1) and f(x_2), we can discard either (x_2, b] or [a, x_1), with minimum known to lie in remaining subinterval

To repeat process, we need compute only one new function evaluation

To reduce length of interval by fixed fraction at each iteration, each new pair of points must have same relationship with respect to new interval that previous pair had with respect to previous interval
Golden Section Search, continued

To accomplish this, we choose relative positions of two points as τ and 1 − τ, where τ² = 1 − τ, so
  τ = (√5 − 1)/2 ≈ 0.618

  x_1 = a + (1 − τ)(b − a);  f_1 = f(x_1)
  x_2 = a + τ(b − a);        f_2 = f(x_2)
  while ((b − a) > tol) do
      if (f_1 > f_2) then
          a = x_1
          x_1 = x_2
          f_1 = f_2
          x_2 = a + τ(b − a)
          f_2 = f(x_2)
      else
          b = x_2
          x_2 = x_1
          f_2 = f_1
          x_1 = a + (1 − τ)(b − a)
          f_1 = f(x_1)
      end
  end
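A Python transcription of the pseudocode above (added as a sketch), applied to the example that follows; the interval [0, 2] and the tolerance are illustrative choices:

  import math

  def golden_section(f, a, b, tol=1e-5):
      tau = (math.sqrt(5.0) - 1.0) / 2.0          # ~0.618
      x1 = a + (1 - tau) * (b - a); f1 = f(x1)
      x2 = a + tau * (b - a);       f2 = f(x2)
      while (b - a) > tol:
          if f1 > f2:
              a, x1, f1 = x1, x2, f2
              x2 = a + tau * (b - a); f2 = f(x2)
          else:
              b, x2, f2 = x2, x1, f1
              x1 = a + (1 - tau) * (b - a); f1 = f(x1)
      return (a + b) / 2

  f = lambda x: 0.5 - x * math.exp(-x**2)
  print(golden_section(f, 0.0, 2.0))   # ~0.7071, the minimum at 1/sqrt(2)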
Example: Golden Section Search

Use golden section search to minimize
  f(x) = 0.5 − x exp(−x²)
Example, continued

  x_1      f_1      x_2      f_2
  0.764    0.074    1.236    0.232
  0.472    0.122    0.764    0.074
  0.764    0.074    0.944    0.113
  0.652    0.074    0.764    0.074
  0.584    0.085    0.652    0.074
  0.652    0.074    0.695    0.071
  0.695    0.071    0.721    0.071
  0.679    0.072    0.695    0.071
  0.695    0.071    0.705    0.071
  0.705    0.071    0.711    0.071

< interactive example >
Successive Parabolic Interpolation

Fit quadratic polynomial to three function values

Take minimum of quadratic to be new approximation to minimum of function

New point replaces oldest of three previous points and process is repeated until convergence

Convergence rate of successive parabolic interpolation is superlinear, with r ≈ 1.324
Example: Successive Parabolic Interpolation

Use successive parabolic interpolation to minimize
  f(x) = 0.5 − x exp(−x²)
Example, continued

  x_k      f(x_k)
  0.000    0.500
  0.600    0.081
  1.200    0.216
  0.754    0.073
  0.721    0.071
  0.692    0.071
  0.707    0.071

< interactive example >
Newton's Method

Another local quadratic approximation is truncated Taylor series
  f(x + h) ≈ f(x) + f'(x) h + (f''(x)/2) h²

By differentiation, minimum of this quadratic function of h is given by h = −f'(x)/f''(x)

Suggests iteration scheme
  x_{k+1} = x_k − f'(x_k) / f''(x_k)
which is Newton's method for solving nonlinear equation f'(x) = 0

Newton's method for finding minimum normally has quadratic convergence rate, but must be started close enough to solution to converge

< interactive example >
Example: Newton's Method

Use Newton's method to minimize f(x) = 0.5 − x exp(−x²)

First and second derivatives of f are given by
  f'(x) = (2x² − 1) exp(−x²)
and
  f''(x) = 2x(3 − 2x²) exp(−x²)

Newton iteration for zero of f' is given by
  x_{k+1} = x_k − (2x_k² − 1) / (2x_k(3 − 2x_k²))

Using starting guess x_0 = 1, we obtain

  x_k      f(x_k)
  1.000    0.132
  0.500    0.111
  0.700    0.071
  0.707    0.071
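A sketch (added for this review) reproducing this iteration:

  import math

  def f(x):   return 0.5 - x * math.exp(-x**2)
  def fp(x):  return (2*x**2 - 1) * math.exp(-x**2)        # f'
  def fpp(x): return 2*x * (3 - 2*x**2) * math.exp(-x**2)  # f''

  x = 1.0
  for _ in range(4):
      print(f"{x:.3f}  {f(x):.3f}")
      x = x - fp(x) / fpp(x)           # Newton step for f'(x) = 0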
Safeguarded Methods

As with nonlinear equations in one dimension, slow-but-sure and fast-but-risky optimization methods can be combined to provide both safety and efficiency

Most library routines for one-dimensional optimization are based on this hybrid approach

Popular combination is golden section search and successive parabolic interpolation, for which no derivatives are required
Direct Search Methods

Direct search methods for multidimensional optimization make no use of function values other than comparing them

For minimizing function f of n variables, Nelder-Mead method begins with n + 1 starting points, forming simplex in R^n

Then move to new point along straight line from current point having highest function value through centroid of other points

New point replaces worst point, and process is repeated

Direct search methods are useful for nonsmooth functions or for small n, but expensive for larger n

< interactive example >
Steepest Descent Method

Let f: R^n → R be real-valued function of n real variables

At any point x where gradient vector is nonzero, negative gradient, −∇f(x), points downhill toward lower values of f

In fact, −∇f(x) is locally direction of steepest descent: f decreases more rapidly along direction of negative gradient than along any other

Steepest descent method: starting from initial guess x_0, successive approximate solutions given by
  x_{k+1} = x_k − α_k ∇f(x_k)
where α_k is line search parameter that determines how far to go in given direction
Michael T. Heath Scientic Computing 31 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Steepest Descent, continued
Given descent direction, such as negative gradient,
determining appropriate value for α_k at each iteration is
one-dimensional minimization problem

    min_{α_k} f(x_k - α_k ∇f(x_k))

that can be solved by methods already discussed

Steepest descent method is very reliable: it can always
make progress provided gradient is nonzero

But method is myopic in its view of function's behavior, and
resulting iterates can zigzag back and forth, making very
slow progress toward solution

In general, convergence rate of steepest descent is only
linear, with constant factor that can be arbitrarily close to 1
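A minimal Python sketch of steepest descent with a simple backtracking line search (the backtracking constants are arbitrary illustrative choices, not specified in the slides):

import numpy as np

def steepest_descent(f, grad, x0, tol=1e-8, maxit=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        alpha = 1.0
        # crude backtracking until sufficient decrease holds
        while f(x - alpha * g) > f(x) - 1e-4 * alpha * (g @ g):
            alpha *= 0.5
        x = x - alpha * g
    return x

# quadratic example from the next slide: f(x) = 0.5*x1^2 + 2.5*x2^2
f    = lambda x: 0.5 * x[0]**2 + 2.5 * x[1]**2
grad = lambda x: np.array([x[0], 5.0 * x[1]])
print(steepest_descent(f, grad, [5.0, 1.0]))   # slowly approaches [0, 0]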
Michael T. Heath Scientic Computing 32 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example: Steepest Descent
Use steepest descent method to minimize
    f(x) = 0.5 x_1^2 + 2.5 x_2^2

Gradient is given by ∇f(x) = [x_1, 5x_2]^T

Taking x_0 = [5, 1]^T, we have ∇f(x_0) = [5, 5]^T

Performing line search along negative gradient direction,
i.e., minimizing f(x_0 - α_0 ∇f(x_0)) over α_0,
exact minimum along line is given by α_0 = 1/3, so next
approximation is x_1 = [3.333, -0.667]^T
For Newton's method applied to same function, gradient and Hessian are

    ∇f(x) = [x_1, 5x_2]^T   and   H_f(x) = [ 1  0 ]
                                           [ 0  5 ]

Taking x_0 = [5, 1]^T, we have ∇f(x_0) = [5, 5]^T, and linear system
for Newton step is

    [ 1  0 ] s_0 = [ -5 ]
    [ 0  5 ]       [ -5 ]

so

    x_1 = x_0 + s_0 = [5, 1]^T + [-5, -1]^T = [0, 0]^T

which is exact minimum of this quadratic function
Quasi-Newton methods have form

    x_{k+1} = x_k - α_k B_k^{-1} ∇f(x_k)

where α_k is line search parameter and B_k is approximation
to Hessian matrix

Many quasi-Newton methods are more robust than
Newton's method, are superlinearly convergent, and have
lower overhead per iteration, which often more than offsets
their slower convergence rate
Michael T. Heath Scientic Computing 43 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Secant Updating Methods
Could use Broyden's method to seek zero of gradient, but
this would not preserve symmetry of Hessian matrix

Several secant updating formulas have been developed for
minimization that not only preserve symmetry in
approximate Hessian matrix, but also preserve positive
definiteness

Symmetry reduces amount of work required by about half,
while positive definiteness guarantees that quasi-Newton
step will be descent direction
Michael T. Heath Scientic Computing 44 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
BFGS Method
One of most effective secant updating methods for minimization
is BFGS
x_0 = initial guess
B_0 = initial Hessian approximation
for k = 0, 1, 2, . . .
    Solve B_k s_k = -∇f(x_k) for s_k
    x_{k+1} = x_k + s_k
    y_k = ∇f(x_{k+1}) - ∇f(x_k)
    B_{k+1} = B_k + (y_k y_k^T)/(y_k^T s_k) - (B_k s_k s_k^T B_k)/(s_k^T B_k s_k)
end
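A minimal Python sketch transcribing this recurrence, applied to the quadratic example used later (stopping test and iteration count are illustrative choices, not from the slides):

import numpy as np

def bfgs(grad, x0, n_iter=20):
    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))                      # B_0 = I
    for _ in range(n_iter):
        g = grad(x)
        if np.linalg.norm(g) < 1e-10:
            break
        s = np.linalg.solve(B, -g)          # solve B_k s_k = -grad f(x_k)
        x_new = x + s
        y = grad(x_new) - g                 # y_k = grad f(x_{k+1}) - grad f(x_k)
        B = B + np.outer(y, y) / (y @ s) - \
            np.outer(B @ s, B @ s) / (s @ B @ s)   # BFGS update
        x = x_new
    return x

grad = lambda x: np.array([x[0], 5.0 * x[1]])   # gradient of 0.5*x1^2 + 2.5*x2^2
print(bfgs(grad, [5.0, 1.0]))                   # approaches minimum at [0, 0]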
Michael T. Heath Scientic Computing 45 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
BFGS Method, continued
In practice, factorization of B_k is updated rather than B_k
itself, so linear system for s_k can be solved at cost of
O(n^2) rather than O(n^3) work

Unlike Newton's method for minimization, no second
derivatives are required
Can start with B_0 = I, so initial step is along negative
gradient, and then second derivative information is
gradually built up in approximate Hessian matrix over
successive iterations
BFGS normally has superlinear convergence rate, even
though approximate Hessian does not necessarily
converge to true Hessian
Line search can be used to enhance effectiveness
Michael T. Heath Scientic Computing 46 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example: BFGS Method
Use BFGS to minimize f(x) = 0.5 x_1^2 + 2.5 x_2^2

Gradient is given by ∇f(x) = [x_1, 5x_2]^T

Taking x_0 = [5, 1]^T and B_0 = I, initial step is negative
gradient, so

    x_1 = x_0 + s_0 = [5, 1]^T + [-5, -5]^T = [0, -4]^T

    0.667  0.333
    0.333  0.667
Conjugate gradient method for unconstrained minimization:

x_0 = initial guess
g_0 = ∇f(x_0)
s_0 = -g_0
for k = 0, 1, 2, . . .
    Choose α_k to minimize f(x_k + α_k s_k)
    x_{k+1} = x_k + α_k s_k
    g_{k+1} = ∇f(x_{k+1})
    β_{k+1} = (g_{k+1}^T g_{k+1}) / (g_k^T g_k)
    s_{k+1} = -g_{k+1} + β_{k+1} s_k
end

Alternative formula for β_{k+1} is

    β_{k+1} = ((g_{k+1} - g_k)^T g_{k+1}) / (g_k^T g_k)
Michael T. Heath Scientic Computing 50 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example: Conjugate Gradient Method
Use CG method to minimize f(x) = 0.5 x_1^2 + 2.5 x_2^2

Gradient is given by ∇f(x) = [x_1, 5x_2]^T

Taking x_0 = [5, 1]^T, initial search direction is negative
gradient,

    s_0 = -g_0 = -∇f(x_0) = [-5, -5]^T

Exact minimum along this line is given by α_0 = 1/3, so
x_1 = [3.333, -0.667]^T, and we compute new gradient,

    g_1 = ∇f(x_1) = [3.333, -3.333]^T

Then

    β_1 = (g_1^T g_1)/(g_0^T g_0) = 0.444

which gives as next search direction

    s_1 = -g_1 + β_1 s_0 = [-3.333, 3.333]^T + 0.444 [-5, -5]^T = [-5.556, 1.111]^T
For nonlinear least squares, Hessian of objective function
φ(x) = (1/2) r(x)^T r(x) is

    H_φ(x) = J^T(x) J(x) + Σ_{i=1}^m r_i(x) H_i(x)

where J(x) is Jacobian of r(x), and H_i(x) is Hessian of r_i(x)
Michael T. Heath Scientic Computing 54 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Nonlinear Least Squares, continued
Linear system for Newton step is

    [ J^T(x_k) J(x_k) + Σ_{i=1}^m r_i(x_k) H_i(x_k) ] s_k = -J^T(x_k) r(x_k)

m Hessian matrices H_i are usually inconvenient and
expensive to compute

Moreover, in H_φ, each H_i is multiplied by residual
component r_i, which is small at solution if fit of model
function to data is good
Michael T. Heath Scientic Computing 55 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Gauss-Newton Method
This motivates Gauss-Newton method for nonlinear least
squares, in which second-order term is dropped and linear
system

    J^T(x_k) J(x_k) s_k = -J^T(x_k) r(x_k)

is solved for approximate Newton step s_k at each iteration

This is system of normal equations for linear least squares
problem

    J(x_k) s_k ≅ -r(x_k)

which can be solved better by QR factorization

Next approximate solution is then given by

    x_{k+1} = x_k + s_k

and process is repeated until convergence
Michael T. Heath Scientic Computing 56 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example: Gauss-Newton Method
Use Gauss-Newton method to fit nonlinear model function

    f(t, x) = x_1 exp(x_2 t)

to data

    t    0.0   1.0   2.0   3.0
    y    2.0   0.7   0.3   0.1

For this model function, entries of Jacobian matrix of
residual function r are given by

    {J(x)}_{i,1} = ∂r_i(x)/∂x_1 = -exp(x_2 t_i)

    {J(x)}_{i,2} = ∂r_i(x)/∂x_2 = -x_1 t_i exp(x_2 t_i)
Michael T. Heath Scientic Computing 57 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example, continued
If we take x_0 = [1, 0]^T, then Gauss-Newton step s_0 is
given by linear least squares problem

    [ -1   0 ]           [ -1.0 ]
    [ -1  -1 ]  s_0  ≅   [  0.3 ]
    [ -1  -2 ]           [  0.7 ]
    [ -1  -3 ]           [  0.9 ]

whose solution is s_0 = [0.69, -0.61]^T
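A minimal Python sketch of this Gauss-Newton iteration for the example above, solving each linear least squares subproblem with numpy.linalg.lstsq (iteration count and the converged values quoted in the comments are approximate):

import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 0.7, 0.3, 0.1])

def residual(x):                       # r_i(x) = y_i - x1*exp(x2*t_i)
    return y - x[0] * np.exp(x[1] * t)

def jacobian(x):                       # Jacobian of residual function r
    return np.column_stack([-np.exp(x[1] * t),
                            -x[0] * t * np.exp(x[1] * t)])

x = np.array([1.0, 0.0])               # starting guess x0 = [1, 0]^T
for _ in range(10):
    s, *_ = np.linalg.lstsq(jacobian(x), -residual(x), rcond=None)
    x = x + s
print(x)   # first step is s0 ~ [0.69, -0.61]; iterates approach roughly [1.98, -1.01]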
In Levenberg-Marquardt method, which adds nonnegative parameter
μ_k to regularize Gauss-Newton step, s_k is instead given by
equivalent linear least squares problem

    [   J(x_k)   ]          [ -r(x_k) ]
    [ √(μ_k) I   ]  s_k  ≅  [    0    ]
For equality-constrained optimization, applying Newton's method to
nonlinear system

    ∇L(x, λ) = [ ∇f(x) + J_g^T(x) λ ] = 0
               [        g(x)        ]

we obtain linear system

    [ B(x, λ)   J_g^T(x) ] [ s ]      [ ∇f(x) + J_g^T(x) λ ]
    [ J_g(x)        O    ] [ δ ]  = - [        g(x)        ]
Example of merit function is penalty function

    φ_ρ(x) = f(x) + (1/2) ρ g(x)^T g(x)

and augmented Lagrangian function

    L_ρ(x, λ) = f(x) + λ^T g(x) + (1/2) ρ g(x)^T g(x)

where parameter ρ > 0 determines relative weighting of
optimality vs feasibility

Given starting guess x_0, good starting guess for λ_0 can be
obtained from least squares problem

    J_g^T(x_0) λ_0 ≅ -∇f(x_0)
Michael T. Heath Scientic Computing 64 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Inequality-Constrained Optimization
Methods just outlined for equality constraints can be
extended to handle inequality constraints by using active
set strategy
Inequality constraints are provisionally divided into those
that are satisfied already (and can therefore be temporarily
disregarded) and those that are violated (and are therefore
temporarily treated as equality constraints)

This division of constraints is revised as iterations proceed
until eventually correct constraints are identified that are
binding at solution
Michael T. Heath Scientic Computing 65 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Penalty Methods
Merit function can also be used to convert
equality-constrained problem into sequence of
unconstrained problems
If x*_ρ is solution to

    min_x φ_ρ(x) = f(x) + (1/2) ρ g(x)^T g(x)

then, under appropriate conditions,

    lim_{ρ→∞} x*_ρ = x*

For inequality-constrained problems, barrier functions, such as

    φ_μ(x) = f(x) - μ Σ_{i=1}^p 1/h_i(x)

or

    φ_μ(x) = f(x) - μ Σ_{i=1}^p log(-h_i(x))

increasingly penalize feasible points as they
approach boundary of feasible region

Again, solutions of unconstrained problem approach x* as
μ → 0, but problems are increasingly ill-conditioned, so
solve sequence of problems with decreasing values of μ

Barrier functions are basis for interior point methods for
linear programming
Michael T. Heath Scientic Computing 67 / 74
Optimization Problems
One-Dimensional Optimization
Multi-Dimensional Optimization
Unconstrained Optimization
Nonlinear Least Squares
Constrained Optimization
Example: Constrained Optimization
Consider quadratic programming problem

    min_x f(x) = 0.5 x_1^2 + 2.5 x_2^2

subject to

    g(x) = x_1 - x_2 - 1 = 0

Lagrangian function is given by

    L(x, λ) = f(x) + λ g(x) = 0.5 x_1^2 + 2.5 x_2^2 + λ (x_1 - x_2 - 1)

Since

    ∇f(x) = [x_1, 5x_2]^T   and   J_g(x) = [ 1  -1 ]

we have

    ∇_x L(x, λ) = ∇f(x) + J_g^T(x) λ = [x_1, 5x_2]^T + λ [1, -1]^T

So system to be solved for critical point of Lagrangian is linear system

    [ 1   0   1 ] [ x_1 ]     [ 0 ]
    [ 0   5  -1 ] [ x_2 ]  =  [ 0 ]
    [ 1  -1   0 ] [  λ  ]     [ 1 ]

whose solution is x_1 = 0.833, x_2 = -0.167, λ = -0.833
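A quick numerical check of this small system in Python:

import numpy as np

# 3x3 system for the critical point of the Lagrangian above
K = np.array([[1.0,  0.0,  1.0],
              [0.0,  5.0, -1.0],
              [1.0, -1.0,  0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(K, rhs)
print(x1, x2, lam)   # 0.833..., -0.166..., -0.833...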
    α_n = (r_n^T p_n) / (p_n^T A p_n)

In CG, p_n is chosen to be A-conjugate (or A-orthogonal) to previous
search directions, i.e., p_n^T A p_j = 0 for j < n
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 9 / 21
Optimality of Step Length
Select step length α_n over vector p_{n-1} to minimize

    φ(x) = (1/2) x^T A x - x^T b

Let x_n = x_{n-1} + α_n p_{n-1}; then

    φ(x_n) = (1/2) (x_{n-1} + α_n p_{n-1})^T A (x_{n-1} + α_n p_{n-1}) - (x_{n-1} + α_n p_{n-1})^T b
           = (1/2) α_n^2 p_{n-1}^T A p_{n-1} + α_n p_{n-1}^T A x_{n-1} - α_n p_{n-1}^T b + constant
           = (1/2) α_n^2 p_{n-1}^T A p_{n-1} - α_n p_{n-1}^T r_{n-1} + constant

Therefore,

    dφ/dα_n = 0  ⟹  α_n p_{n-1}^T A p_{n-1} - p_{n-1}^T r_{n-1} = 0
             ⟹  α_n = (p_{n-1}^T r_{n-1}) / (p_{n-1}^T A p_{n-1}).

In addition, p_{n-1}^T r_{n-1} = r_{n-1}^T r_{n-1} because p_{n-1} = r_{n-1} + β_{n-1} p_{n-2}
and r_{n-1}^T p_{n-2} = 0 due to the following theorem.
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 10 / 21
Conjugate Gradient Method
Algorithm: Conjugate Gradient Method
x_0 = 0, r_0 = b, p_0 = r_0
for n = 1, 2, 3, . . .
    α_n = (r_{n-1}^T r_{n-1}) / (p_{n-1}^T A p_{n-1})     step length
    x_n = x_{n-1} + α_n p_{n-1}                            approximate solution
    r_n = r_{n-1} - α_n A p_{n-1}                          residual
    β_n = (r_n^T r_n) / (r_{n-1}^T r_{n-1})                improvement this step
    p_n = r_n + β_n p_{n-1}                                search direction

Only one matrix-vector product A p_{n-1} per iteration
Apart from matrix-vector product, #operations per iteration is O(m)
CG can be viewed as minimization of quadratic function
φ(x) = (1/2) x^T A x - x^T b by modifying steepest descent
First proposed by Hestenes and Stiefel in 1950s
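A minimal Python transcription of this algorithm (no preconditioning; the small SPD test matrix is an arbitrary illustrative choice):

import numpy as np

def conjugate_gradient(A, b, tol=1e-10, maxit=None):
    n = len(b)
    maxit = maxit or n
    x = np.zeros(n)
    r = b.copy()                     # r_0 = b (since x_0 = 0)
    p = r.copy()                     # p_0 = r_0
    rs = r @ r
    for _ in range(maxit):
        Ap = A @ p                   # one matrix-vector product per iteration
        alpha = rs / (p @ Ap)        # step length
        x += alpha * p               # approximate solution
        r -= alpha * Ap              # residual
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs           # improvement this step
        p = r + beta * p             # new search direction
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b), np.linalg.solve(A, b))   # both give same solution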
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 11 / 21
An Alternative Interpretation of CG
Algorithm: CG
x_0 = 0, r_0 = b, p_0 = r_0
for n = 1, 2, 3, . . .
    α_n = r_{n-1}^T r_{n-1} / (p_{n-1}^T A p_{n-1})
    x_n = x_{n-1} + α_n p_{n-1}
    r_n = r_{n-1} - α_n A p_{n-1}
    β_n = r_n^T r_n / (r_{n-1}^T r_{n-1})
    p_n = r_n + β_n p_{n-1}

Algorithm: A non-standard CG
x_0 = 0, r_0 = b, p_0 = r_0
for n = 1, 2, 3, . . .
    α_n = r_{n-1}^T p_{n-1} / (p_{n-1}^T A p_{n-1})
    x_n = x_{n-1} + α_n p_{n-1}
    r_n = b - A x_n
    β_n = -r_n^T A p_{n-1} / (p_{n-1}^T A p_{n-1})
    p_n = r_n + β_n p_{n-1}

The non-standard one is less efficient but easier to understand
It is easy to see r_n = r_{n-1} - α_n A p_{n-1} = b - A x_n
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 12 / 21
Comparison of Linear and Nonlinear CG
Algorithm: Linear CG
x_0 = 0, r_0 = b, p_0 = r_0
for n = 1, 2, 3, . . .
    α_n = r_{n-1}^T r_{n-1} / (p_{n-1}^T A p_{n-1})
    x_n = x_{n-1} + α_n p_{n-1}
    r_n = r_{n-1} - α_n A p_{n-1}
    β_n = r_n^T r_n / (r_{n-1}^T r_{n-1})
    p_n = r_n + β_n p_{n-1}

Algorithm: Non-linear CG
x_0 = initial guess, g_0 = ∇f(x_0), s_0 = -g_0
for k = 0, 1, 2, . . .
    Choose α_k to minimize f(x_k + α_k s_k)
    x_{k+1} = x_k + α_k s_k
    g_{k+1} = ∇f(x_{k+1})
    β_{k+1} = (g_{k+1}^T g_{k+1}) / (g_k^T g_k)
    s_{k+1} = -g_{k+1} + β_{k+1} s_k

β_{k+1} = (g_{k+1}^T g_{k+1}) / (g_k^T g_k) was due to Fletcher and Reeves (1964)
An alternative formula β_{k+1} = (g_{k+1} - g_k)^T g_{k+1} / (g_k^T g_k) was due
to Polak and Ribière (1969)
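A minimal Python sketch of nonlinear CG with the Fletcher-Reeves formula, applied to the quadratic test function f(x) = 0.5 x_1^2 + 2.5 x_2^2 (an assumed test problem), for which the exact line-search step along s has the closed form alpha = -g.s / s.As:

import numpy as np

A = np.diag([1.0, 5.0])          # Hessian of the test quadratic
grad = lambda x: A @ x

x = np.array([5.0, 1.0])
g = grad(x)
s = -g
for k in range(10):
    alpha = -(g @ s) / (s @ (A @ s))     # exact minimizer of f(x + alpha*s)
    x = x + alpha * s
    g_new = grad(x)
    if np.linalg.norm(g_new) < 1e-12:
        break
    beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves beta
    s = -g_new + beta * s
    g = g_new
print(x)   # exact for this 2x2 quadratic after two CG steps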
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 13 / 21
Properties of Conjugate Gradients
Krylov subspace for Ax = b is K_n = ⟨b, Ab, . . . , A^{n-1} b⟩.

Theorem
If r_{n-1} ≠ 0, spaces spanned by approximate solutions x_n, search directions
p_n, and residuals r_n are all equal to Krylov subspace

    K_n = ⟨x_1, x_2, . . . , x_n⟩ = ⟨p_0, p_1, . . . , p_{n-1}⟩
        = ⟨r_0, r_1, . . . , r_{n-1}⟩ = ⟨b, Ab, . . . , A^{n-1} b⟩

The residuals are orthogonal (i.e., r_n^T r_j = 0 for j < n) and search directions
are A-conjugate (i.e., p_n^T A p_j = 0 for j < n).

Theorem
If r_{n-1} ≠ 0, then error e_n = x* - x_n is minimized in A-norm over K_n.

Because K_n grows monotonically, error decreases monotonically.
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 14 / 21
Rate of Convergence
Some important convergence results:

    ||e_n||_A / ||e_0||_A ≤ 2 ( (√κ - 1) / (√κ + 1) )^n

which is ≈ 2 (1 - 2/√κ)^n as κ → ∞. So convergence is expected in
O(√κ) iterations, where κ is condition number of A.

In general, CG performs well with clustered eigenvalues
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 15 / 21
Outline
1
BFGS Method
2
Conjugate Gradient Methods
3
Constrained Optimization
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 16 / 21
Equality-Constrained Minimization
Equality-constrained problem has form

    min_{x∈R^n} f(x)  subject to  g(x) = 0

where objective function f : R^n → R and constraints g : R^n → R^m,
where m ≤ n

Necessary condition for feasible point x* to be solution is that negative
gradient of f lie in space spanned by constraint normals, i.e.,

    -∇f(x*) = J_g^T(x*) λ,

where J_g is Jacobian matrix of g, and λ is vector of Lagrange
multipliers

Therefore, constrained local minimum must be critical point of
Lagrangian function

    L(x, λ) = f(x) + λ^T g(x)
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 17 / 21
First-Order and Second-Order Optimality Conditions
Equality-constrained minimization can be reduced to solving

    ∇L(x, λ) = [ ∇f(x) + J_g^T(x) λ ] = 0,
               [        g(x)        ]

which is known as Karush-Kuhn-Tucker (or KKT) condition for
constrained local minimum.

Hessian of Lagrangian function is

    H_L(x, λ) = [ B(x, λ)   J_g^T(x) ]
                [ J_g(x)        0    ]

where B(x, λ) = H_f(x) + Σ_{i=1}^m λ_i H_{g_i}(x). H_L is sometimes called
KKT (Karush-Kuhn-Tucker) matrix. H_L is symmetric, but not in
general positive definite.

Critical point (x*, λ*) is constrained minimum if B(x*, λ*) is
positive definite on null space of J_g(x*).

Let Z form basis of null(J_g(x*)); then this condition means projected
Hessian Z^T B(x*, λ*) Z is positive definite.
Applying Newton's method to the KKT system, at each iteration we
solve linear system

    [ B(x_k, λ_k)   J_g^T(x_k) ] [ s_k ]      [ ∇f(x_k) + J_g^T(x_k) λ_k ]
    [ J_g(x_k)           0     ] [ δ_k ]  = - [          g(x_k)          ],

and then x_{k+1} = x_k + s_k and λ_{k+1} = λ_k + δ_k
Above system of equations is first-order optimality condition for
constrained optimization problem

    min_s  (1/2) s^T B(x_k, λ_k) s + s^T (∇f(x_k) + J_g^T(x_k) λ_k)

    subject to  J_g(x_k) s + g(x_k) = 0.

This problem is quadratic programming problem, so approach using
Newton's method is known as sequential quadratic programming
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 19 / 21
Solving KKT System
KKT system

    [ B   J^T ] [ s ]      [ w ]
    [ J    0  ] [ δ ]  = - [ g ]

can be solved by range-space method: eliminate s to obtain

    (J B^{-1} J^T) δ = g - J B^{-1} w

and then

    B s = -w - J^T δ

Alternatively, null-space method writes s = Y u + Z v, where columns of Z
span null(J); u is determined from constraint block, and v from

    (Z^T B Z) v = -Z^T (w + B Y u)

Finally,

    Y^T J^T δ = -Y^T (w + B s)

Null-space method is advantageous when n - m is small
It is more stable than range-space method. Also, B does not need to
be nonsingular
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 21 / 21
AMS527: Numerical Analysis II
Linear Programming
Xiangmin Jiao
SUNY Stony Brook
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 1 / 8
Linear Programming
Linear programming has linear objective function and linear equality
and inequality constraints
Example: Maximize profit of combination of wheat and barley, but
with limited budget of land, fertilizer, and insecticide. Letting x_1 and x_2
be areas planted for wheat and barley, we have linear programming
problem

    maximize  c_1 x_1 + c_2 x_2          {maximize revenue}
    0 ≤ x_1 + x_2 ≤ L                    {limit on area}
    F_1 x_1 + F_2 x_2 ≤ F                {limit on fertilizer}
    P_1 x_1 + P_2 x_2 ≤ P                {limit on insecticide}
    x_1 ≥ 0, x_2 ≥ 0                     {nonnegative land}

Linear programming is typically solved by simplex methods or interior
point methods
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 2 / 8
Standard Form of Linear Programming
Linear programming has many forms. A standard form (called slack
form) is

    min c^T x  subject to  Ax = b and x ≥ 0

Simplex method and interior-point method require slack form

Previous example can be converted into standard form

    minimize  (-c_1) x_1 + (-c_2) x_2    {maximize revenue}
    x_1 + x_2 + x_3 = L                  {limit on area}
    F_1 x_1 + F_2 x_2 + x_4 = F          {limit on fertilizer}
    P_1 x_1 + P_2 x_2 + x_5 = P          {limit on insecticide}
    x_1, x_2, x_3, x_4, x_5 ≥ 0          {nonnegativity}

Here, x_3, x_4, and x_5 are called slack variables
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 3 / 8
Duality
m equations Ax = b have m corresponding Lagrange multipliers in y

Primal problem
    Minimize c^T x subject to Ax = b and x ≥ 0
Dual problem
    Maximize b^T y subject to A^T y ≤ c

Weak duality: b^T y ≤ c^T x for any feasible x and y,
because b^T y = (Ax)^T y = x^T A^T y ≤ x^T c = c^T x

Strong duality: If both feasible sets of primal and dual problems are
nonempty, then c^T x* = b^T y* at optimal x* and y*

Interior point methods handle constraint x ≥ 0 with logarithmic barrier;
barrier Lagrangian is

    L = c^T x - μ Σ_i log x_i - y^T (Ax - b)

The derivatives ∂L/∂x_j = c_j - μ/x_j - (A^T y)_j = 0, or x_j s_j = μ, where
s = c - A^T y
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 6 / 8
Newton Step
n optimality equations x_j s_j = μ are nonlinear, and are solved iteratively
using Newton's method

To determine increments Δx, Δy, and Δs, we need to solve
(x_i + Δx_i)(s_i + Δs_i) = μ. It is typical to ignore second-order term
Δx_i Δs_i. Then linear equations become

    A Δx = 0
    A^T Δy + Δs = 0
    s_j Δx_j + x_j Δs_j = μ - x_j s_j.

The iteration has quadratic convergence for each μ, and μ approaches
zero

Gilbert Strang, Computational Science and Engineering, Wellesley-
Cambridge, 2007. Section 8.6, Linear Programming and Duality.
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 7 / 8
Example
Minimize c^T x = 5x_1 + 3x_2 + 8x_3 with x_i ≥ 0 and Ax = x_1 + x_2 + 2x_3 = 4.

Barrier Lagrangian is

    L = (5x_1 + 3x_2 + 8x_3) - μ (log x_1 + log x_2 + log x_3) - y (x_1 + x_2 + 2x_3 - 4)

Optimality equations give us:

    s = c - A^T y   ⟹  s_1 = 5 - y,  s_2 = 3 - y,  s_3 = 8 - 2y
    ∂L/∂x_i = 0     ⟹  x_1 s_1 = x_2 s_2 = x_3 s_3 = μ
    ∂L/∂y = 0       ⟹  x_1 + x_2 + 2x_3 = 4.

Start from an interior point x_1 = x_2 = x_3 = 1, y = 2, and s = (3, 1, 4).
From A Δx = 0 and x_j s_j + s_j Δx_j + x_j Δs_j = μ, we obtain equations

    3 Δx_1 - Δy = μ - 3
    1 Δx_2 - Δy = μ - 1
    4 Δx_3 - 2 Δy = μ - 4
    Δx_1 + Δx_2 + 2 Δx_3 = 0.

Given μ = 4/3, we then obtain x_new = (2/3, 2, 2/3) and y_new = 8/3,
whereas x* = (0, 4, 0) and y* = 3
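A numerical check of this Newton step in Python (the 4x4 system below is assembled directly from the equations above):

import numpy as np

mu = 4.0 / 3.0
x = np.array([1.0, 1.0, 1.0])
s = np.array([3.0, 1.0, 4.0])          # s = c - A^T y with y = 2
M = np.array([[3.0, 0.0, 0.0, -1.0],   # s1*dx1 - 1*dy = mu - x1*s1
              [0.0, 1.0, 0.0, -1.0],   # s2*dx2 - 1*dy = mu - x2*s2
              [0.0, 0.0, 4.0, -2.0],   # s3*dx3 - 2*dy = mu - x3*s3
              [1.0, 1.0, 2.0,  0.0]])  # dx1 + dx2 + 2*dx3 = 0
rhs = np.concatenate([mu - x * s, [0.0]])
dx1, dx2, dx3, dy = np.linalg.solve(M, rhs)
print(x + [dx1, dx2, dx3], 2.0 + dy)   # x_new = (2/3, 2, 2/3), y_new = 8/3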
Xiangmin Jiao (SUNY Stony Brook) AMS527: Numerical Analysis II 8 / 8
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Scientic Computing: An Introductory Survey
Chapter 7 Interpolation
Prof. Michael T. Heath
Department of Computer Science
University of Illinois at Urbana-Champaign
Copyright © 2002. Reproduction permitted
for noncommercial, educational use only.
Michael T. Heath Scientic Computing 1 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Outline
1
Interpolation
2
Polynomial Interpolation
3
Piecewise Polynomial Interpolation
Michael T. Heath Scientic Computing 2 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Motivation
Choosing Interpolant
Existence and Uniqueness
Interpolation
Basic interpolation problem: for given data

    (t_1, y_1), (t_2, y_2), . . . , (t_m, y_m)  with  t_1 < t_2 < · · · < t_m

determine function f : R → R such that

    f(t_i) = y_i,  i = 1, . . . , m
f is interpolating function, or interpolant, for given data
Additional data might be prescribed, such as slope of
interpolant at given points
Additional constraints might be imposed, such as
smoothness, monotonicity, or convexity of interpolant
f could be function of more than one variable, but we will
consider only one-dimensional case
Michael T. Heath Scientic Computing 3 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Motivation
Choosing Interpolant
Existence and Uniqueness
Purposes for Interpolation
Plotting smooth curve through discrete data points
Reading between lines of table
Differentiating or integrating tabular data
Quick and easy evaluation of mathematical function
Replacing complicated function by simple one
Michael T. Heath Scientic Computing 4 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Motivation
Choosing Interpolant
Existence and Uniqueness
Interpolation vs Approximation
By definition, interpolating function fits given data points
exactly
Interpolation is inappropriate if data points subject to
significant errors
It is usually preferable to smooth noisy data, for example
by least squares approximation
Approximation is also more appropriate for special function
libraries
Michael T. Heath Scientic Computing 5 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Motivation
Choosing Interpolant
Existence and Uniqueness
Issues in Interpolation
Arbitrarily many functions interpolate given set of data points
What form should interpolating function have?
How should interpolant behave between data points?
Should interpolant inherit properties of data, such as
monotonicity, convexity, or periodicity?
Are parameters that dene interpolating function
meaningful?
If function and data are plotted, should results be visually
pleasing?
Michael T. Heath Scientic Computing 6 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Motivation
Choosing Interpolant
Existence and Uniqueness
Choosing Interpolant
Choice of function for interpolation based on
How easy interpolating function is to work with
determining its parameters
evaluating interpolant
differentiating or integrating interpolant
How well properties of interpolant match properties of data
to be t (smoothness, monotonicity, convexity, periodicity,
etc.)
Michael T. Heath Scientic Computing 7 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Motivation
Choosing Interpolant
Existence and Uniqueness
Functions for Interpolation
Families of functions commonly used for interpolation
include
Polynomials
Piecewise polynomials
Trigonometric functions
Exponential functions
Rational functions
For now we will focus on interpolation by polynomials and
piecewise polynomials
We will consider trigonometric interpolation (DFT) later
Michael T. Heath Scientic Computing 8 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Motivation
Choosing Interpolant
Existence and Uniqueness
Basis Functions
Family of functions for interpolating given data points is
spanned by set of basis functions φ_1(t), . . . , φ_n(t)

Interpolating function f is chosen as linear combination of
basis functions,

    f(t) = Σ_{j=1}^n x_j φ_j(t)

Requiring f to interpolate data (t_i, y_i) means

    f(t_i) = Σ_{j=1}^n x_j φ_j(t_i) = y_i,  i = 1, . . . , m

which is system of linear equations Ax = y for n-vector x
of parameters x_j, where entries of m × n matrix A are
given by a_{ij} = φ_j(t_i)
Michael T. Heath Scientic Computing 9 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Motivation
Choosing Interpolant
Existence and Uniqueness
Existence, Uniqueness, and Conditioning
Existence and uniqueness of interpolant depend on
number of data points m and number of basis functions n
If m > n, interpolant usually doesn't exist
If m < n, interpolant is not unique
If m = n, then basis matrix A is nonsingular provided data
points t_i are distinct, so data can be fit exactly
Sensitivity of parameters x to perturbations in data
depends on cond(A), which depends in turn on choice of
basis functions
Michael T. Heath Scientic Computing 10 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Polynomial Interpolation
Simplest and most common type of interpolation uses
polynomials
Unique polynomial of degree at most n - 1 passes through
n data points (t_i, y_i), i = 1, . . . , n, where t_i are distinct
There are many ways to represent or compute interpolating
polynomial, but in theory all must give same result
< interactive example >
Michael T. Heath Scientic Computing 11 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Monomial Basis
Monomial basis functions

    φ_j(t) = t^{j-1},  j = 1, . . . , n

give interpolating polynomial of form

    p_{n-1}(t) = x_1 + x_2 t + · · · + x_n t^{n-1}

with coefficients x given by n × n linear system

    Ax = [ 1   t_1   · · ·   t_1^{n-1} ] [ x_1 ]     [ y_1 ]
         [ 1   t_2   · · ·   t_2^{n-1} ] [ x_2 ]  =  [ y_2 ]  = y
         [ ·    ·               ·      ] [  ·  ]     [  ·  ]
         [ 1   t_n   · · ·   t_n^{n-1} ] [ x_n ]     [ y_n ]

Matrix of this form is called Vandermonde matrix
Michael T. Heath Scientic Computing 12 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Example: Monomial Basis
Determine polynomial of degree two interpolating three
data points (-2, -27), (0, -1), (1, 0)

Using monomial basis, linear system is

    Ax = [ 1   t_1   t_1^2 ] [ x_1 ]     [ y_1 ]
         [ 1   t_2   t_2^2 ] [ x_2 ]  =  [ y_2 ]  = y
         [ 1   t_3   t_3^2 ] [ x_3 ]     [ y_3 ]

For these particular data, system is

    [ 1  -2   4 ] [ x_1 ]     [ -27 ]
    [ 1   0   0 ] [ x_2 ]  =  [  -1 ]
    [ 1   1   1 ] [ x_3 ]     [   0 ]

whose solution is x = [-1, 5, -4]^T, so interpolating
polynomial is

    p_2(t) = -1 + 5t - 4t^2
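A minimal Python sketch of monomial-basis interpolation for this example, using numpy's Vandermonde helper:

import numpy as np

t = np.array([-2.0, 0.0, 1.0])
y = np.array([-27.0, -1.0, 0.0])
A = np.vander(t, increasing=True)      # rows [1, t_i, t_i^2]
x = np.linalg.solve(A, y)
print(x)                               # [-1, 5, -4]  ->  p_2(t) = -1 + 5t - 4t^2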
Michael T. Heath Scientic Computing 13 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Monomial Basis, continued
< interactive example >
Solving system Ax = y using standard linear equation
solver to determine coefficients x of interpolating
polynomial requires O(n^3) work
Michael T. Heath Scientic Computing 14 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Monomial Basis, continued
For monomial basis, matrix A is increasingly ill-conditioned
as degree increases
Ill-conditioning does not prevent fitting data points well,
since residual for linear system solution will be small
But it does mean that values of coefficients are poorly
determined
Both conditioning of linear system and amount of
computational work required to solve it can be improved by
using different basis
Change of basis still gives same interpolating polynomial
for given data, but representation of polynomial will be
different
Michael T. Heath Scientic Computing 15 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Monomial Basis, continued
Conditioning with monomial basis can be improved by
shifting and scaling independent variable t

    φ_j(t) = ( (t - c) / d )^{j-1}

where c = (t_1 + t_n)/2 is midpoint and d = (t_n - t_1)/2 is
half of range of data

New independent variable lies in interval [-1, 1], which also
helps avoid overflow or harmful underflow

Even with optimal shifting and scaling, monomial basis
usually is still poorly conditioned, and we must seek better
alternatives
< interactive example >
Michael T. Heath Scientic Computing 16 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Evaluating Polynomials
When represented in monomial basis, polynomial

    p_{n-1}(t) = x_1 + x_2 t + · · · + x_n t^{n-1}

can be evaluated efficiently using Horner's nested
evaluation scheme

    p_{n-1}(t) = x_1 + t(x_2 + t(x_3 + t(· · · (x_{n-1} + t x_n) · · · )))

which requires only n - 1 additions and n - 1 multiplications

For example,

    1 - 4t + 5t^2 - 2t^3 + 3t^4 = 1 + t(-4 + t(5 + t(-2 + 3t)))
Other manipulations of interpolating polynomial, such as
differentiation or integration, are also relatively easy with
monomial basis representation
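A minimal Python sketch of Horner's scheme (the function name horner is illustrative):

def horner(coeffs, t):
    # coeffs holds x_1, ..., x_n in monomial-basis order
    result = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        result = c + t * result
    return result

# 1 - 4t + 5t^2 - 2t^3 + 3t^4 evaluated at t = 2
print(horner([1.0, -4.0, 5.0, -2.0, 3.0], 2.0))   # 45.0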
Michael T. Heath Scientic Computing 17 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Lagrange Interpolation
For given set of data points (t_i, y_i), i = 1, . . . , n, Lagrange
basis functions are defined by

    ℓ_j(t) = Π_{k=1, k≠j}^n (t - t_k)  /  Π_{k=1, k≠j}^n (t_j - t_k),   j = 1, . . . , n

For Lagrange basis,

    ℓ_j(t_i) = 1 if i = j,  0 if i ≠ j,    i, j = 1, . . . , n

so matrix of linear system Ax = y is identity matrix

Thus, Lagrange polynomial interpolating data points (t_i, y_i)
is given by

    p_{n-1}(t) = y_1 ℓ_1(t) + y_2 ℓ_2(t) + · · · + y_n ℓ_n(t)
Michael T. Heath Scientic Computing 18 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Lagrange Basis Functions
< interactive example >
Lagrange interpolant is easy to determine but more
expensive to evaluate for given argument, compared with
monomial basis representation
Lagrangian form is also more difficult to differentiate,
integrate, etc.
Michael T. Heath Scientic Computing 19 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Example: Lagrange Interpolation
Use Lagrange interpolation to determine interpolating
polynomial for three data points (-2, -27), (0, -1), (1, 0)

Lagrange polynomial of degree two interpolating three
points (t_1, y_1), (t_2, y_2), (t_3, y_3) is given by

    p_2(t) = y_1 (t - t_2)(t - t_3) / ((t_1 - t_2)(t_1 - t_3))
           + y_2 (t - t_1)(t - t_3) / ((t_2 - t_1)(t_2 - t_3))
           + y_3 (t - t_1)(t - t_2) / ((t_3 - t_1)(t_3 - t_2))

For these particular data, this becomes

    p_2(t) = -27 · t(t - 1) / ((-2)(-2 - 1)) + (-1) · (t + 2)(t - 1) / ((2)(-1))
Michael T. Heath Scientic Computing 20 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Newton Interpolation
For given set of data points (t_i, y_i), i = 1, . . . , n, Newton
basis functions are defined by

    π_j(t) = Π_{k=1}^{j-1} (t - t_k),   j = 1, . . . , n

where value of product is taken to be 1 when limits make it
vacuous

Newton interpolating polynomial has form

    p_{n-1}(t) = x_1 + x_2 (t - t_1) + x_3 (t - t_1)(t - t_2) + · · ·
               + x_n (t - t_1)(t - t_2) · · · (t - t_{n-1})

For i < j, π_j(t_i) = 0, so basis matrix A is lower triangular,
where a_{ij} = π_j(t_i)
Michael T. Heath Scientic Computing 21 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Newton Basis Functions
< interactive example >
Michael T. Heath Scientic Computing 22 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Newton Interpolation, continued
Solution x to system Ax = y can be computed by
forward-substitution in O(n^2) arithmetic operations

Moreover, resulting interpolant can be evaluated efficiently
for any argument by nested evaluation scheme similar to
Horner's method
Newton interpolation has better balance between cost of
computing interpolant and cost of evaluating it
Michael T. Heath Scientic Computing 23 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Example: Newton Interpolation
Use Newton interpolation to determine interpolating
polynomial for three data points (-2, -27), (0, -1), (1, 0)

Using Newton basis, linear system is

    [ 1        0                   0          ] [ x_1 ]     [ y_1 ]
    [ 1   t_2 - t_1                0          ] [ x_2 ]  =  [ y_2 ]
    [ 1   t_3 - t_1   (t_3 - t_1)(t_3 - t_2)  ] [ x_3 ]     [ y_3 ]

For these particular data, system is

    [ 1   0   0 ] [ x_1 ]     [ -27 ]
    [ 1   2   0 ] [ x_2 ]  =  [  -1 ]
    [ 1   3   3 ] [ x_3 ]     [   0 ]

whose solution by forward substitution is
x = [-27, 13, -4]^T, so interpolating polynomial is

    p(t) = -27 + 13(t + 2) - 4(t + 2) t
Michael T. Heath Scientic Computing 24 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Newton Interpolation, continued
If p_j(t) is polynomial of degree j - 1 interpolating j given
points, then for any constant x_{j+1},

    p_{j+1}(t) = p_j(t) + x_{j+1} π_{j+1}(t)

is polynomial of degree j that also interpolates same j
points

Free parameter x_{j+1} can then be chosen so that p_{j+1}(t)
interpolates y_{j+1},

    x_{j+1} = ( y_{j+1} - p_j(t_{j+1}) ) / π_{j+1}(t_{j+1})

Newton interpolation begins with constant polynomial
p_1(t) = y_1 interpolating first data point and then
successively incorporates each remaining data point into
interpolant < interactive example >
Michael T. Heath Scientic Computing 25 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Divided Differences
Given data points (t_i, y_i), i = 1, . . . , n, divided differences,
denoted by f[ ], are defined recursively by

    f[t_1, t_2, . . . , t_k] = ( f[t_2, t_3, . . . , t_k] - f[t_1, t_2, . . . , t_{k-1}] ) / ( t_k - t_1 )

where recursion begins with f[t_k] = y_k, k = 1, . . . , n

Coefficient of jth basis function in Newton interpolant is
given by

    x_j = f[t_1, t_2, . . . , t_j]

Recursion requires O(n^2) arithmetic operations to compute
coefficients of Newton interpolant, but is less prone to
overflow or underflow than direct formation of triangular
Newton basis matrix
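A minimal Python sketch computing Newton interpolant coefficients by divided differences and evaluating by nested multiplication, checked on the three data points used earlier:

import numpy as np

def divided_differences(t, y):
    coef = np.array(y, dtype=float)
    n = len(t)
    for k in range(1, n):                     # build divided-difference columns in place
        coef[k:] = (coef[k:] - coef[k-1:-1]) / (t[k:] - t[:n-k])
    return coef                               # coef[j] = f[t_0, ..., t_j]

def newton_eval(coef, t_nodes, t):
    result = coef[-1]
    for c, tk in zip(coef[-2::-1], t_nodes[-2::-1]):
        result = c + (t - tk) * result        # nested (Horner-like) evaluation
    return result

t = np.array([-2.0, 0.0, 1.0])
y = np.array([-27.0, -1.0, 0.0])
c = divided_differences(t, y)
print(c)                                      # [-27, 13, -4]
print([newton_eval(c, t, ti) for ti in t])    # reproduces y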
Michael T. Heath Scientic Computing 26 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Orthogonal Polynomials
Inner product can be defined on space of polynomials on
interval [a, b] by taking

    ⟨p, q⟩ = ∫_a^b p(t) q(t) w(t) dt

where w(t) is nonnegative weight function

Two polynomials p and q are orthogonal if ⟨p, q⟩ = 0

Set of polynomials {p_i} is orthonormal if

    ⟨p_i, p_j⟩ = 1 if i = j, 0 otherwise

Given set of polynomials, Gram-Schmidt orthogonalization
can be used to generate orthonormal set spanning same
space
Michael T. Heath Scientic Computing 27 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Orthogonal Polynomials, continued
For example, with inner product given by weight function
w(t) ≡ 1 on interval [-1, 1], applying Gram-Schmidt
process to set of monomials 1, t, t^2, t^3, . . . yields Legendre
polynomials

    1, t, (3t^2 - 1)/2, (5t^3 - 3t)/2, (35t^4 - 30t^2 + 3)/8,
    (63t^5 - 70t^3 + 15t)/8, . . .

first n of which form an orthogonal basis for space of
polynomials of degree at most n - 1
Other choices of weight functions and intervals yield other
orthogonal polynomials, such as Chebyshev, Jacobi,
Laguerre, and Hermite
Michael T. Heath Scientic Computing 28 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Orthogonal Polynomials, continued
Orthogonal polynomials have many useful properties
They satisfy three-term recurrence relation of form

    p_{k+1}(t) = (α_k t + β_k) p_k(t) - γ_k p_{k-1}(t)

which makes them very efficient to generate and evaluate
Orthogonality makes them very natural for least squares
approximation, and they are also useful for generating
Gaussian quadrature rules, which we will see later
Michael T. Heath Scientic Computing 29 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Chebyshev Polynomials
kth Chebyshev polynomial of first kind, defined on interval
[-1, 1] by

    T_k(t) = cos(k arccos(t))

are orthogonal with respect to weight function (1 - t^2)^{-1/2}

First few Chebyshev polynomials are given by

    1, t, 2t^2 - 1, 4t^3 - 3t, 8t^4 - 8t^2 + 1, 16t^5 - 20t^3 + 5t, . . .

Equi-oscillation property: successive extrema of T_k are
equal in magnitude and alternate in sign, which distributes
error uniformly when approximating arbitrary continuous
function
Michael T. Heath Scientic Computing 30 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Chebyshev Basis Functions
< interactive example >
Michael T. Heath Scientic Computing 31 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Chebyshev Points
Chebyshev points are zeros of T_k, given by

    t_i = cos( (2i - 1) π / (2k) ),   i = 1, . . . , k

or extrema of T_k, given by

    t_i = cos( i π / k ),   i = 0, 1, . . . , k

Chebyshev points are abscissas of points equally spaced
around unit circle in R^2
Chebyshev points have attractive properties for
interpolation and other problems
Michael T. Heath Scientic Computing 32 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Interpolating Continuous Functions
If data points are discrete sample of continuous function,
how well does interpolant approximate that function
between sample points?
If f is smooth function, and p_{n-1} is polynomial of degree at
most n - 1 interpolating f at n points t_1, . . . , t_n, then

    f(t) - p_{n-1}(t) = ( f^{(n)}(θ) / n! ) (t - t_1)(t - t_2) · · · (t - t_n)

where θ is some (unknown) point in interval [t_1, t_n]

Since point θ is unknown, this result is not particularly
useful unless bound on appropriate derivative of f is
known
Michael T. Heath Scientic Computing 33 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Interpolating Continuous Functions, continued
If |f^{(n)}(t)| ≤ M for all t ∈ [t_1, t_n], and
h = max{ t_{i+1} - t_i : i = 1, . . . , n - 1 }, then

    max_{t ∈ [t_1, t_n]} |f(t) - p_{n-1}(t)| ≤ M h^n / (4n)

Error diminishes with increasing n and decreasing h, but
only if |f^{(n)}(t)| does not grow too rapidly with n
< interactive example >
Michael T. Heath Scientic Computing 34 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
High-Degree Polynomial Interpolation
Interpolating polynomials of high degree are expensive to
determine and evaluate
In some bases, coefficients of polynomial may be poorly
determined due to ill-conditioning of linear system to be
solved
High-degree polynomial necessarily has lots of wiggles,
which may bear no relation to data to be t
Polynomial passes through required data points, but it may
oscillate wildly between data points
Michael T. Heath Scientic Computing 35 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Convergence
Polynomial interpolating continuous function may not
converge to function as number of data points and
polynomial degree increases
Equally spaced interpolation points often yield
unsatisfactory results near ends of interval
If points are bunched near ends of interval, more
satisfactory results are likely to be obtained with
polynomial interpolation
Use of Chebyshev points distributes error evenly and
yields convergence throughout interval for any sufciently
smooth function
Michael T. Heath Scientic Computing 36 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Example: Runge's Function
Polynomial interpolants of Runge's function at equally
spaced points do not converge
< interactive example >
Michael T. Heath Scientic Computing 37 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Example: Runge's Function
Polynomial interpolants of Runge's function at Chebyshev
points do converge
< interactive example >
Michael T. Heath Scientic Computing 38 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Monomial, Lagrange, and Newton Interpolation
Orthogonal Polynomials
Accuracy and Convergence
Taylor Polynomial
Another useful form of polynomial interpolation for smooth
function f is polynomial given by truncated Taylor series

    p_n(t) = f(a) + f'(a)(t - a) + (f''(a)/2)(t - a)^2 + · · · + (f^{(n)}(a)/n!)(t - a)^n

Polynomial interpolates f in that values of p_n and its first n
derivatives match those of f and its first n derivatives
evaluated at t = a, so p_n(t) is good approximation to f(t)
for t near a

We have already seen examples in Newton's method for
nonlinear equations and optimization
< interactive example >
Michael T. Heath Scientic Computing 39 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
Piecewise Polynomial Interpolation
Fitting single polynomial to large number of data points is
likely to yield unsatisfactory oscillating behavior in
interpolant
Piecewise polynomials provide alternative to practical and
theoretical difculties with high-degree polynomial
interpolation
Main advantage of piecewise polynomial interpolation is
that large number of data points can be t with low-degree
polynomials
In piecewise interpolation of given data points (t_i, y_i),
different function is used in each subinterval [t_i, t_{i+1}]

Abscissas t_i are called knots or breakpoints, at which
interpolant changes from one function to another
Michael T. Heath Scientic Computing 40 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
Piecewise Interpolation, continued
Simplest example is piecewise linear interpolation, in
which successive pairs of data points are connected by
straight lines
Although piecewise interpolation eliminates excessive
oscillation and nonconvergence, it appears to sacrifice
smoothness of interpolating function
We have many degrees of freedom in choosing piecewise
polynomial interpolant, however, which can be exploited to
obtain smooth interpolating function despite its piecewise
nature
< interactive example >
Michael T. Heath Scientic Computing 41 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
Hermite Interpolation
In Hermite interpolation, derivatives as well as values of
interpolating function are taken into account
Including derivative values adds more equations to linear
system that determines parameters of interpolating
function
To have unique solution, number of equations must equal
number of parameters to be determined
Piecewise cubic polynomials are typical choice for Hermite
interpolation, providing flexibility, simplicity, and efficiency
Michael T. Heath Scientic Computing 42 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
Hermite Cubic Interpolation
Hermite cubic interpolant is piecewise cubic polynomial
interpolant with continuous first derivative
Piecewise cubic polynomial with n knots has 4(n - 1)
parameters to be determined
Requiring that it interpolate given data gives 2(n - 1)
equations
Requiring that it have one continuous derivative gives n - 2
additional equations, or total of 3n - 4, which still leaves n
free parameters
Thus, Hermite cubic interpolant is not unique, and
remaining free parameters can be chosen so that result
satises additional constraints
Michael T. Heath Scientic Computing 43 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
Cubic Spline Interpolation
Spline is piecewise polynomial of degree k that is k - 1
times continuously differentiable
For example, linear spline is of degree 1 and has 0
continuous derivatives, i.e., it is continuous, but not
smooth, and could be described as broken line
Cubic spline is piecewise cubic polynomial that is twice
continuously differentiable
As with Hermite cubic, interpolating given data and
requiring one continuous derivative imposes 3n - 4
constraints on cubic spline
Requiring continuous second derivative imposes n - 2
additional constraints, leaving 2 remaining free parameters
Michael T. Heath Scientic Computing 44 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
Cubic Splines, continued
Final two parameters can be fixed in various ways
Specify first derivative at endpoints t_1 and t_n
Force second derivative to be zero at endpoints, which
gives natural spline
Enforce not-a-knot condition, which forces two
consecutive cubic pieces to be same
Force first derivatives, as well as second derivatives, to
match at endpoints t_1 and t_n (if spline is to be periodic)
Michael T. Heath Scientic Computing 45 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
Example: Cubic Spline Interpolation
Determine natural cubic spline interpolating three data
points (t_i, y_i), i = 1, 2, 3

Required interpolant is piecewise cubic function defined by
separate cubic polynomials in each of two intervals [t_1, t_2]
and [t_2, t_3]

Denote these two polynomials by

    p_1(t) = α_1 + α_2 t + α_3 t^2 + α_4 t^3
    p_2(t) = β_1 + β_2 t + β_3 t^2 + β_4 t^3
Eight parameters are to be determined, so we need eight
equations
Michael T. Heath Scientic Computing 46 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
Example, continued
Requiring first cubic to interpolate data at end points of first
interval [t_1, t_2] gives two equations

    α_1 + α_2 t_1 + α_3 t_1^2 + α_4 t_1^3 = y_1
    α_1 + α_2 t_2 + α_3 t_2^2 + α_4 t_2^3 = y_2

Requiring second cubic to interpolate data at end points of
second interval [t_2, t_3] gives two equations

    β_1 + β_2 t_2 + β_3 t_2^2 + β_4 t_2^3 = y_2
    β_1 + β_2 t_3 + β_3 t_3^2 + β_4 t_3^3 = y_3

Requiring first derivative of interpolant to be continuous at
t_2 gives equation

    α_2 + 2α_3 t_2 + 3α_4 t_2^2 = β_2 + 2β_3 t_2 + 3β_4 t_2^2
Michael T. Heath Scientic Computing 47 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
Example, continued
Requiring second derivative of interpolant function to be
continuous at t_2 gives equation

    2α_3 + 6α_4 t_2 = 2β_3 + 6β_4 t_2

Finally, by definition natural spline has second derivative
equal to zero at endpoints, which gives two equations

    2α_3 + 6α_4 t_1 = 0
    2β_3 + 6β_4 t_3 = 0

When particular data values are substituted for t_i and y_i,
system of eight linear equations can be solved for eight
unknown parameters α_i and β_i
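A minimal Python sketch assembling and solving these eight equations for one concrete data set; the data values (0, 0), (1, 1), (2, 0) are an assumption chosen only to make the example runnable:

import numpy as np

t1, t2, t3 = 0.0, 1.0, 2.0
y1, y2, y3 = 0.0, 1.0, 0.0

def row(t, which):                 # value of alpha- or beta-cubic at t
    r = np.zeros(8)
    base = 0 if which == 'alpha' else 4
    r[base:base+4] = [1.0, t, t**2, t**3]
    return r

A = np.zeros((8, 8))
b = np.zeros(8)
A[0], b[0] = row(t1, 'alpha'), y1                  # p1(t1) = y1
A[1], b[1] = row(t2, 'alpha'), y2                  # p1(t2) = y2
A[2], b[2] = row(t2, 'beta'),  y2                  # p2(t2) = y2
A[3], b[3] = row(t3, 'beta'),  y3                  # p2(t3) = y3
A[4, 1:4] = [1, 2*t2, 3*t2**2]; A[4, 5:8] = [-1, -2*t2, -3*t2**2]   # p1' = p2' at t2
A[5, 2:4] = [2, 6*t2];          A[5, 6:8] = [-2, -6*t2]             # p1'' = p2'' at t2
A[6, 2:4] = [2, 6*t1]                                               # p1''(t1) = 0
A[7, 6:8] = [2, 6*t3]                                               # p2''(t3) = 0
coeffs = np.linalg.solve(A, b)
print(coeffs[:4], coeffs[4:])      # alpha and beta coefficients of the two cubics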
Michael T. Heath Scientic Computing 48 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
Hermite Cubic vs Spline Interpolation
Choice between Hermite cubic and spline interpolation
depends on data to be fit and on purpose for doing
interpolation
If smoothness is of paramount importance, then spline
interpolation may be most appropriate
But Hermite cubic interpolant may have more pleasing
visual appearance and allows flexibility to preserve
monotonicity if original data are monotonic
In any case, it is advisable to plot interpolant and data to
help assess how well interpolating function captures
behavior of original data
< interactive example >
Michael T. Heath Scientic Computing 49 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
Hermite Cubic vs Spline Interpolation
Michael T. Heath Scientic Computing 50 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
B-splines
B-splines form basis for family of spline functions of given
degree

B-splines can be defined in various ways, including
recursion (which we will use), convolution, and divided
differences

Although in practice we use only finite set of knots
t_1, . . . , t_n, for notational convenience we will assume
infinite set of knots

    · · · < t_{-2} < t_{-1} < t_0 < t_1 < t_2 < · · ·

Additional knots can be taken as arbitrarily defined points
outside interval [t_1, t_n]

We will also use linear functions

    v_i^k(t) = (t - t_i) / (t_{i+k} - t_i)
Michael T. Heath Scientic Computing 51 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
B-splines, continued
To start recursion, define B-splines of degree 0 by

    B_i^0(t) = 1 if t_i ≤ t < t_{i+1},  0 otherwise

and then for k > 0 define B-splines of degree k by

    B_i^k(t) = v_i^k(t) B_i^{k-1}(t) + (1 - v_{i+1}^k(t)) B_{i+1}^{k-1}(t)

Since B_i^0 is piecewise constant and v_i^k is linear, B_i^1 is
piecewise linear

Similarly, B_i^2 is in turn piecewise quadratic, and in general,
B_i^k is piecewise polynomial of degree k
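A minimal Python transcription of this recursion (the knot sequence below is an arbitrary illustrative choice):

def bspline(i, k, t, knots):
    """Value of B-spline B_i^k at t, for a sufficiently long knot list."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    def v(j):                                    # v_j^k(t) = (t - t_j)/(t_{j+k} - t_j)
        return (t - knots[j]) / (knots[j + k] - knots[j])
    return v(i) * bspline(i, k - 1, t, knots) + \
           (1.0 - v(i + 1)) * bspline(i + 1, k - 1, t, knots)

knots = list(range(-3, 8))                       # illustrative equally spaced knots
# Partition of unity: sum over i of B_i^3(t) is 1 inside the knot range
print(sum(bspline(i, 3, 2.5, knots) for i in range(len(knots) - 4)))   # ~1.0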
Michael T. Heath Scientic Computing 52 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
B-splines, continued
< interactive example >
Michael T. Heath Scientic Computing 53 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
B-splines, continued
Important properties of B-spline functions B_i^k

1. For t < t_i or t > t_{i+k+1}, B_i^k(t) = 0

2. For t_i < t < t_{i+k+1}, B_i^k(t) > 0

3. For all t, Σ_{i=-∞}^{∞} B_i^k(t) = 1

4. For k ≥ 1, B_i^k has k - 1 continuous derivatives

5. Set of functions {B_{1-k}^k, . . . , B_{n-1}^k} is linearly independent
   on interval [t_1, t_n] and spans space of all splines of degree
   k having knots t_i
Michael T. Heath Scientic Computing 54 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
B-splines, continued
Properties 1 and 2 together say that B-spline functions
have local support
Property 3 gives normalization
Property 4 says that they are indeed splines
Property 5 says that for given k, these functions form basis
for set of all splines of degree k
Michael T. Heath Scientic Computing 55 / 56
Interpolation
Polynomial Interpolation
Piecewise Polynomial Interpolation
Piecewise Polynomial Interpolation
Hermite Cubic Interpolation
Cubic Spline Interpolation
B-splines, continued
If we use B-spline basis, linear system to be solved for
spline coefficients will be nonsingular and banded
Use of B-spline basis yields efficient and stable methods
for determining and evaluating spline interpolants, and
many library routines for spline interpolation are based on
this approach
B-splines are also useful in many other contexts, such as
numerical solution of differential equations, as we will see
later
Michael T. Heath Scientic Computing 56 / 56
AMS527: Numerical Analysis II
Review for Test 1
Xiangmin Jiao
SUNY Stony Brook
Xiangmin Jiao SUNY Stony Brook AMS527: Numerical Analysis II 1 / 5
Approximations in Scientic Computations
Concepts
Absolute error, relative error
Computational error, propagated data error
Truncation error, rounding error
Forward error, backward error
Condition number, stability
Cancellation
Xiangmin Jiao SUNY Stony Brook AMS527: Numerical Analysis II 2 / 5
Solutions of Nonlinear Equations
Concepts
Multiplicity
Sensitivity
Convergence rate
Basic methods
Interval bisection method
Fixed-point iteration
Newton's method
Secant method, Broyden's method
Other Newton-like methods
Xiangmin Jiao SUNY Stony Brook AMS527: Numerical Analysis II 3 / 5
Numerical Optimization
Concepts
Unconstrained optimization, constrained optimization (linear
vs. nonlinear programming)
Global vs. local minimum
First- and second-order optimality condition
Coercive, convex, unimodality
Methods for unconstrained optimization
Golden section search
Newton's method, quasi-Newton methods (basic ideas)
Steepest descent, conjugate gradient (basic ideas)
Methods for constrained optimization (especially
equality-constrained optimization)
Lagrange multiplier for constrained optimization
Lagrange function and its solution
Linear programming
Xiangmin Jiao SUNY Stony Brook AMS527: Numerical Analysis II 4 / 5
Polynomial interpolation
Concepts
Existence and uniqueness
Interpolation vs. approximation
Accuracy; Runge's phenomenon
Methods
Monomial basis
Lagrange interpolant
Newton interpolation and divided differences
Orthogonal polynomials
Xiangmin Jiao SUNY Stony Brook AMS527: Numerical Analysis II 5 / 5