Approximation Theory
Introduction
In 1853, the great Russian mathematician P. L. Chebyshev [Čebyšev], while working on a problem of linkages, devices which translate the linear motion of a steam engine into the circular motion of a wheel, considered the following problem:

Given a continuous function $f$ defined on a closed interval $[a,b]$ and a positive integer $n$, can we "represent" $f$ by a polynomial $p(x) = \sum_{k=0}^{n} a_k x^k$, of degree at most $n$, in such a way that the maximum error at any point $x$ in $[a,b]$ is controlled? In particular, is it possible to construct $p$ in such a way that the error $\max_{a \le x \le b} |f(x) - p(x)|$ is minimized?
This problem raises several questions, the first of which Chebyshev himself ignored:

– Why should such a polynomial even exist?
– If it does, can we hope to construct it?
– If it exists, is it also unique?
– What happens if we change the measure of the error to, say, $\int_a^b |f(x) - p(x)|^2\,dx$?
Chebyshev's problem is perhaps best understood by rephrasing it in modern terms. What we have here is a problem of linear approximation in a normed linear space. Recall that a norm on a (real) vector space $X$ is a nonnegative function on $X$ satisfying
\[ \|x\| \ge 0, \quad\text{and}\quad \|x\| = 0 \iff x = 0, \]
\[ \|\lambda x\| = |\lambda|\,\|x\| \quad\text{for}\quad \lambda \in \mathbb{R}, \]
\[ \|x + y\| \le \|x\| + \|y\| \quad\text{for any}\quad x, y \in X. \]
Any norm on $X$ induces a metric or distance function by setting $\mathrm{dist}(x,y) = \|x - y\|$. The abstract version of our problem(s) can now be restated:

– Given a subset (or even a subspace) $Y$ of $X$ and a point $x \in X$, is there an element $y \in Y$ which is "nearest" to $x$; that is, can we find a vector $y \in Y$ such that $\|x - y\| = \inf_{z \in Y} \|x - z\|$? If there is such a "best approximation" to $x$ from elements of $Y$, is it unique?
Examples
1. In $X = \mathbb{R}^n$ with its usual norm $\|(x_k)_{k=1}^n\|_2 = \bigl( \sum_{k=1}^n |x_k|^2 \bigr)^{1/2}$, the problem has a complete solution for any subspace (or, indeed, any closed convex set) $Y$. This problem is often considered in Calculus or Linear Algebra, where it is called "least-squares approximation." A large part of the current course will be taken up with least-squares approximations, too. For now let's simply note that the problem changes character dramatically if we consider a different norm on $\mathbb{R}^n$.

Consider $X = \mathbb{R}^2$ under the norm $\|(x,y)\| = \max\{|x|,|y|\}$, and consider the subspace $Y = \{(0,y) : y \in \mathbb{R}\}$ (i.e., the $y$-axis). It's not hard to see that the point $x = (1,0) \in \mathbb{R}^2$ has infinitely many nearest points in $Y$; indeed, every point $(0,y)$, $-1 \le y \le 1$, is nearest to $x$.
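This is easy to see numerically as well. The short Python sketch below (the sample values of $y$ are an arbitrary choice of ours, not part of the problem) evaluates the max-norm distance from $(1,0)$ to points on the $y$-axis:

```python
# Distance from x = (1, 0) to the point (0, y) on the y-axis,
# measured in the max norm ||(u, v)|| = max(|u|, |v|).
def dist_max(y):
    return max(abs(1 - 0), abs(0 - y))  # = max(1, |y|)

for y in [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]:
    print(f"y = {y:5.1f}   distance = {dist_max(y)}")
# Every y with |y| <= 1 achieves the minimum distance 1,
# so the nearest point is far from unique.
```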
2. There are many norms we might consider on $\mathbb{R}^n$. Of particular interest are the $\ell_p$-norms; that is, the scale of norms
\[ \|(x_i)_{i=1}^n\|_p = \left( \sum_{k=1}^n |x_k|^p \right)^{1/p}, \quad 1 \le p < \infty, \]
and
\[ \|(x_i)_{i=1}^n\|_\infty = \max_{1 \le i \le n} |x_i|. \]
It's easy to see that $\|\cdot\|_1$ and $\|\cdot\|_\infty$ define norms. The other cases take a bit more work; we'll supply full details later.
3. Our original problem concerns $X = C[a,b]$, the space of all continuous functions $f : [a,b] \to \mathbb{R}$ under the uniform norm $\|f\| = \max_{a \le x \le b} |f(x)|$. The word "uniform" is used because convergence in this norm is the same as uniform convergence on $[a,b]$:
\[ \|f_n - f\| \to 0 \iff f_n \rightrightarrows f \ \text{on}\ [a,b]. \]
In this case we're interested in approximations by elements of $Y = \mathcal{P}_n$, the subspace of all polynomials of degree at most $n$ in $C[a,b]$. It's not hard to see that $\mathcal{P}_n$ is a finite-dimensional subspace of $C[a,b]$ of dimension exactly $n+1$. (Why?)
If we consider the subspace $Y = \mathcal{P}$ consisting of all polynomials in $X = C[a,b]$, we readily see that the existence of best approximations can be problematic. It follows from the Weierstrass theorem, for example, that each $f \in C[a,b]$ has distance 0 from $\mathcal{P}$; but, since not every $f \in C[a,b]$ is a polynomial (why?), we can't hope for a best approximating polynomial to exist in every case. For example, the function $f(x) = x \sin(1/x)$ is continuous on $[0,1]$ but can't possibly agree with any polynomial on $[0,1]$. (Why?)

The key to the problem of polynomial approximation is the fact that each $\mathcal{P}_n$ is finite-dimensional. To see this, it will be most efficient to consider the abstract setting of finite-dimensional subspaces of arbitrary normed spaces.
"Soft" Approximation
Lemma. Let $V$ be a finite-dimensional vector space. Then all norms on $V$ are equivalent. That is, if $\|\cdot\|$ and $|||\cdot|||$ are norms on $V$, then there exist constants $0 < A \le B < \infty$ such that
\[ A\,\|x\| \le |||x||| \le B\,\|x\| \]
for all vectors $x \in V$.
Proof. Suppose that $V$ is $n$-dimensional and that $\|\cdot\|$ is a norm on $V$. Fix a basis $e_1, \ldots, e_n$ for $V$ and consider the norm
\[ \Bigl\| \sum_{i=1}^n a_i e_i \Bigr\|_1 = \sum_{i=1}^n |a_i| = \|(a_i)_{i=1}^n\|_1 \]
for $x = \sum_{i=1}^n a_i e_i \in V$. Since $e_1, \ldots, e_n$ is a basis for $V$, it's not hard to see that $\|\cdot\|_1$ is, indeed, a norm on $V$. It now suffices to show that $\|\cdot\|$ and $\|\cdot\|_1$ are equivalent. (Why?)
One inequality is easy to show; indeed, notice that
\[ \Bigl\| \sum_{i=1}^n a_i e_i \Bigr\| \le \sum_{i=1}^n |a_i|\,\|e_i\| \le \Bigl( \max_{1 \le i \le n} \|e_i\| \Bigr) \sum_{i=1}^n |a_i| = B\,\Bigl\| \sum_{i=1}^n a_i e_i \Bigr\|_1. \]
Theorem. Let $Y$ be a finite-dimensional subspace of a normed linear space $X$, and let $x \in X$. Then there is a point $y^* \in Y$ satisfying
\[ \|x - y^*\| = \min_{y \in Y} \|x - y\|. \]
That is, there is a best approximation to $x$ by elements of $Y$.
Proof. First notice that since $0 \in Y$, we know that a nearest point $y^*$ will satisfy $\|x - y^*\| \le \|x\| = \|x - 0\|$. Thus, it suffices to look for $y^*$ among the vectors $y \in Y$ satisfying $\|x - y\| \le \|x\|$. It will be convenient to use a slightly larger set of vectors, though. By the triangle inequality, any such $y$ satisfies $\|y\| \le \|x - y\| + \|x\| \le 2\|x\|$, so it suffices to search in the set
\[ K = \{\, y \in Y : \|y\| \le 2\|x\| \,\}. \]
Now $K$ is a closed, bounded subset of the finite-dimensional space $Y$, hence compact. To finish the proof, we need only notice that the function $f(y) = \|x - y\|$ is continuous:
\[ |f(y) - f(z)| = \bigl|\, \|x - y\| - \|x - z\| \,\bigr| \le \|y - z\|, \]
hence attains a minimum value at some point $y^* \in K$.
Corollary. For each $f \in C[a,b]$ and each positive integer $n$, there is a (not necessarily unique) polynomial $p_n^* \in \mathcal{P}_n$ such that
\[ \|f - p_n^*\| = \min_{p \in \mathcal{P}_n} \|f - p\|. \]
Corollary. Given $f \in C[a,b]$ and a (fixed) positive integer $n$, there exists a constant $R < \infty$ such that if
\[ \Bigl\| f - \sum_{k=0}^n a_k x^k \Bigr\| \le \|f\|, \]
then $\max_{0 \le k \le n} |a_k| \le R$.
Examples
Nothing in our Corollary says that $p_n^*$ will be a polynomial of degree exactly $n$; rather, a polynomial of degree at most $n$. For example, the best approximation to $f(x) = x$ by a polynomial of degree at most 3 is, of course, $p(x) = x$. Even examples involving non-polynomial functions are easy to come by; for instance, the best linear approximation to $f(x) = |x|$ on $[-1,1]$ is actually the constant function $p(x) = 1/2$, and this makes for an entertaining exercise.
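A numerical spot-check is no substitute for the exercise, but it is reassuring; the Python sketch below (the grid size and the competing candidates are our own choices) compares the uniform error of $p(x) = 1/2$ with a few other degree-one polynomials:

```python
# Sup-norm error on [-1, 1] for approximating f(x) = |x| by p(x) = c + m*x,
# estimated by sampling on a fine grid.
def sup_error(c, m, pts=10001):
    return max(abs(abs(-1 + 2*k/(pts-1)) - (c + m*(-1 + 2*k/(pts-1))))
               for k in range(pts))

print(sup_error(0.5, 0.0))   # 0.5  (the best approximation)
print(sup_error(0.4, 0.0))   # 0.6
print(sup_error(0.5, 0.1))   # 0.6
print(sup_error(0.0, 1.0))   # 2.0
```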
Before we leave these "soft" arguments behind, let's discuss the problem of uniqueness of best approximations. First, let's see why we want best approximations to be unique:

Lemma. Let $Y$ be a finite-dimensional subspace of a normed linear space $X$, and suppose that each $x \in X$ has a unique nearest point $y_x \in Y$. Then the nearest point map $x \mapsto y_x$ is continuous.

Proof. Let's write $P(x) = y_x$ for the nearest point map, and let's suppose that $x_n \to x$ in $X$. We want to show that $P(x_n) \to P(x)$, and for this it's enough to show that there is a subsequence of $(P(x_n))$ which converges to $P(x)$. (Why?)
Since the sequence $(x_n)$ is bounded in $X$, say $\|x_n\| \le M$ for all $n$, we have
\[ \|P(x_n)\| \le \|P(x_n) - x_n\| + \|x_n\| \le 2\|x_n\| \le 2M. \]
Thus, $(P(x_n))$ is a bounded sequence in $Y$, a finite-dimensional space. As such, by passing to a subsequence, we may suppose that $(P(x_n))$ converges to some element $P_0 \in Y$. (How?) Now we need to show that $P_0 = P(x)$. But
\[ \|P(x_n) - x_n\| \le \|P(x) - x_n\| \quad\text{(why?)} \]
for any $n$, and hence, letting $n \to \infty$,
\[ \|P_0 - x\| \le \|P(x) - x\|. \]
Since nearest points in $Y$ are unique, we must have $P_0 = P(x)$.
Exercise
Let $X$ be a normed linear space and let $P : X \to X$. Show that $P$ is continuous at $x \in X$ if and only if, whenever $x_n \to x$ in $X$, some subsequence of $(P(x_n))$ converges to $P(x)$. [Hint: The forward direction is easy; for the backward implication, suppose that $(P(x_n))$ fails to converge to $P(x)$ and work toward a contradiction.]
It should be pointed out that the nearest point map is, in general, nonlinear and, as such, can be very difficult to work with. Later we'll see at least one case in which nearest point maps always turn out to be linear.
We next observe that the set of best approximations is always pretty reasonable:

Theorem. Let $Y$ be a subspace of a normed linear space $X$, and let $x \in X$. The set $Y_x$, consisting of all best approximations to $x$ out of $Y$, is a bounded, convex set.

Proof. As we've seen, the set $Y_x$ is a subset of $\{ y \in X : \|y\| \le 2\|x\| \}$ and, hence, is bounded.

Now recall that a subset $K$ of a vector space $V$ is said to be convex if $K$ contains the line segment joining any pair of its points. Specifically, $K$ is convex if
\[ x, y \in K,\ 0 \le \lambda \le 1 \implies \lambda x + (1-\lambda) y \in K. \]
Now, $y_1, y_2 \in Y_x$ means that
\[ \|x - y_1\| = \|x - y_2\| = \min_{y \in Y} \|x - y\|. \]
Next, given $0 \le \lambda \le 1$, set $y^* = \lambda y_1 + (1-\lambda) y_2$. We want to show that $y^* \in Y_x$, but notice that we at least have $y^* \in Y$. Finally, we estimate:
\begin{align*}
\|x - y^*\| &= \|x - (\lambda y_1 + (1-\lambda) y_2)\| \\
&= \|\lambda (x - y_1) + (1-\lambda)(x - y_2)\| \\
&\le \lambda \|x - y_1\| + (1-\lambda)\|x - y_2\| \\
&= \min_{y \in Y} \|x - y\|.
\end{align*}
Hence, $\|x - y^*\| = \min_{y \in Y} \|x - y\|$; that is, $y^* \in Y_x$.
Exercise
If, in addition, $Y$ is finite-dimensional, show that $Y_x$ is closed (hence compact).
If $Y_x$ contains more than one point then, in fact, it contains an entire line segment. Thus, $Y_x$ is either empty, contains exactly one point, or contains infinitely many points. This observation gives us a sufficient condition for uniqueness of nearest points: If our normed space $X$ contains no line segments on any sphere $\{ x \in X : \|x\| = r \}$, then any best approximation (out of any convex set) will be unique.

A norm $\|\cdot\|$ on a vector space $X$ is said to be strictly convex if, for any $x \ne y \in X$ with $\|x\| = r = \|y\|$, we always have $\|\lambda x + (1-\lambda) y\| < r$ for any $0 < \lambda < 1$. That is, the open line segment between any pair of points on the surface of the ball of radius $r$ in $X$ lies entirely inside the ball. We often simply say that the space $X$ is strictly convex, with the understanding that a property of the norm on $X$ is implied. Here's an immediate corollary to our last result:

Corollary. If $X$ has a strictly convex norm then, for any subspace $Y$ of $X$ and any point $x \in X$, there can be at most one best approximation to $x$ out of $Y$. That is, $Y_x$ is either empty or consists of a single point.
In order to arrive at a condition that's somewhat easier to check, let's translate our original definition into a statement about the triangle inequality in $X$.

Lemma. $X$ has a strictly convex norm if and only if the triangle inequality is strict on non-parallel vectors; that is, if and only if
\[ x \ne \lambda y,\ y \ne \lambda x \ \text{ for all } \lambda \in \mathbb{R} \implies \|x + y\| < \|x\| + \|y\|. \]

Proof. First suppose that $X$ is strictly convex, and let $x$ and $y$ be non-parallel vectors in $X$. Then, in particular, the vectors $x/\|x\|$ and $y/\|y\|$ must be different. (Why?) Hence,
\[ \Bigl\| \frac{\|x\|}{\|x\| + \|y\|} \cdot \frac{x}{\|x\|} + \frac{\|y\|}{\|x\| + \|y\|} \cdot \frac{y}{\|y\|} \Bigr\| < 1. \]
That is, $\|x + y\| < \|x\| + \|y\|$.
Next suppose that the triangle inequality is strict on non-parallel vectors, and let $x \ne y \in X$ with $\|x\| = r = \|y\|$. If $x$ and $y$ are parallel, then we must have $y = -x$. (Why?) In this case,
\[ \|\lambda x + (1-\lambda) y\| = |2\lambda - 1|\,\|x\| < r, \]
since $|2\lambda - 1| < 1$ whenever $0 < \lambda < 1$. Otherwise, $x$ and $y$ are non-parallel. In this case, for any $0 < \lambda < 1$, the vectors $\lambda x$ and $(1-\lambda) y$ are likewise non-parallel. Thus,
\[ \|\lambda x + (1-\lambda) y\| < \lambda \|x\| + (1-\lambda)\|y\| = r. \]
Examples
1. The usual norm on $C[a,b]$ is not strictly convex (and so the problem of uniqueness of best approximations is all the more interesting to tackle). For example, if $f(x) = x$ and $g(x) = x^2$ in $C[0,1]$, then $\|f\| = 1 = \|g\|$ and $f \ne g$, while $\|f + g\| = 2$. (Why?)

2. The usual norm on $\mathbb{R}^n$ is strictly convex, as is any one of the norms $\|\cdot\|_p$, $1 < p < \infty$. (We'll prove these facts shortly.) The norms $\|\cdot\|_1$ and $\|\cdot\|_\infty$, on the other hand, are not strictly convex. (Why?)
Appendix A
For completeness, we supply a few of the missing details concerning the $\ell_p$-norms. We begin with a handful of classical inequalities of independent interest. First recall that we have defined a scale of "norms" on $\mathbb{R}^n$ by setting
\[ \|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}, \quad 1 \le p < \infty, \]
and
\[ \|x\|_\infty = \max_{1 \le i \le n} |x_i|, \]
where $x = (x_i)_{i=1}^n \in \mathbb{R}^n$. Please note that the case $p = 2$ gives the usual Euclidean norm on $\mathbb{R}^n$ and that the cases $p = 1$ and $p = \infty$ clearly give rise to legitimate norms on $\mathbb{R}^n$. Common parlance is to refer to these expressions as $\ell_p$-norms and to refer to the space $(\mathbb{R}^n, \|\cdot\|_p)$ as $\ell_p^n$. The space of all infinite sequences $x = (x_n)_{n=1}^\infty$ for which the analogous infinite sum (or supremum) $\|x\|_p$ is finite is referred to as $\ell_p$. What's more, there is a "continuous" analogue of this scale: We might also consider the norms
\[ \|f\|_p = \left( \int_a^b |f(x)|^p\,dx \right)^{1/p}, \quad 1 \le p < \infty, \]
and
\[ \|f\|_\infty = \sup_{a \le x \le b} |f(x)|, \]
where $f$ is in $C[a,b]$ (or is simply Lebesgue integrable). The subsequent discussion actually covers all of these cases, but we will settle for writing our proofs in the $\mathbb{R}^n$ setting only.
Lemma (Young's inequality). Let $1 < p < \infty$, and let $1 < q < \infty$ be defined by $\frac{1}{p} + \frac{1}{q} = 1$; that is, $q = \frac{p}{p-1}$. Then, for any $a, b \ge 0$, we have
\[ ab \le \frac{1}{p}\,a^p + \frac{1}{q}\,b^q. \]
Moreover, equality can only occur if $a^p = b^q$. (We refer to $p$ and $q$ as conjugate exponents; note that $p$ satisfies $p = \frac{q}{q-1}$. Please note that the case $p = q = 2$ yields the familiar arithmetic-geometric mean inequality.)

Proof. A quick calculation before we begin:
\[ q - 1 = \frac{p}{p-1} - 1 = \frac{p - (p-1)}{p-1} = \frac{1}{p-1}. \]
Now we just estimate areas; for this you might find it helpful to draw the graph of $y = x^{p-1}$ (or, equivalently, the graph of $x = y^{q-1}$). Comparing areas we get:
\[ ab \le \int_0^a x^{p-1}\,dx + \int_0^b y^{q-1}\,dy = \frac{1}{p}\,a^p + \frac{1}{q}\,b^q. \]
The case for equality also follows easily from the graph of $y = x^{p-1}$ (or $x = y^{q-1}$), since $b = a^{p-1} = a^{p/q}$ means that $a^p = b^q$.
Corollary (Hölder's inequality). Let $1 < p < \infty$, and let $1 < q < \infty$ be defined by $\frac{1}{p} + \frac{1}{q} = 1$. Then, for any $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ in $\mathbb{R}$ we have:
\[ \sum_{i=1}^n |a_i b_i| \le \left( \sum_{i=1}^n |a_i|^p \right)^{1/p} \left( \sum_{i=1}^n |b_i|^q \right)^{1/q}. \]
(Please note that the case $p = q = 2$ yields the familiar Cauchy-Schwarz inequality.)

Moreover, equality in Hölder's inequality can only occur if there exist nonnegative scalars $\alpha$ and $\beta$ such that $\alpha\,|a_i|^p = \beta\,|b_i|^q$ for all $i = 1, \ldots, n$.
Proof. Let $A = \left( \sum_{i=1}^n |a_i|^p \right)^{1/p}$ and let $B = \left( \sum_{i=1}^n |b_i|^q \right)^{1/q}$. We may clearly assume that $A, B \ne 0$ (why?), and hence we may divide (and appeal to Young's inequality):
\[ \frac{|a_i b_i|}{AB} \le \frac{|a_i|^p}{pA^p} + \frac{|b_i|^q}{qB^q}. \]
Adding, we get:
\[ \frac{1}{AB} \sum_{i=1}^n |a_i b_i| \le \frac{1}{pA^p} \sum_{i=1}^n |a_i|^p + \frac{1}{qB^q} \sum_{i=1}^n |b_i|^q = \frac{1}{p} + \frac{1}{q} = 1. \]
That is, $\sum_{i=1}^n |a_i b_i| \le AB$.

The case for equality in Hölder's inequality follows from what we know about Young's inequality: Equality in Hölder's inequality means that either $A = 0$, or $B = 0$, or else $|a_i|^p / A^p = |b_i|^q / B^q$ for all $i = 1, \ldots, n$. In short, there must exist nonnegative scalars $\alpha$ and $\beta$ such that $\alpha\,|a_i|^p = \beta\,|b_i|^q$ for all $i = 1, \ldots, n$.
Notice, too, that the case $p = 1$ ($q = \infty$) works, and is easy:
\[ \sum_{i=1}^n |a_i b_i| \le \left( \sum_{i=1}^n |a_i| \right) \max_{1 \le i \le n} |b_i|. \]
Exercise
When does equality occur in the case $p = 1$ ($q = \infty$)?
Finally, an application of Hölder's inequality leads to an easy proof that $\|\cdot\|_p$ is actually a norm. It will help matters here if we first make a simple observation: If $1 < p < \infty$ and if $q = \frac{p}{p-1}$, notice that
\[ \bigl\| \bigl( |a_i|^{p-1} \bigr)_{i=1}^n \bigr\|_q = \left( \sum_{i=1}^n |a_i|^p \right)^{(p-1)/p} = \|a\|_p^{p-1}. \]
Lemma (Minkowski's inequality). Let $1 < p < \infty$ and let $a = (a_i)_{i=1}^n$, $b = (b_i)_{i=1}^n \in \mathbb{R}^n$. Then, $\|a + b\|_p \le \|a\|_p + \|b\|_p$.

Proof. In order to prove the triangle inequality, we once again let $q$ be defined by $\frac{1}{p} + \frac{1}{q} = 1$, and now we use Hölder's inequality to estimate:
\begin{align*}
\sum_{i=1}^n |a_i + b_i|^p &= \sum_{i=1}^n |a_i + b_i|\,|a_i + b_i|^{p-1} \\
&\le \sum_{i=1}^n |a_i|\,|a_i + b_i|^{p-1} + \sum_{i=1}^n |b_i|\,|a_i + b_i|^{p-1} \\
&\le \|a\|_p\,\bigl\| \bigl( |a_i + b_i|^{p-1} \bigr)_{i=1}^n \bigr\|_q + \|b\|_p\,\bigl\| \bigl( |a_i + b_i|^{p-1} \bigr)_{i=1}^n \bigr\|_q \\
&= \|a + b\|_p^{p-1} \bigl( \|a\|_p + \|b\|_p \bigr).
\end{align*}
That is, $\|a + b\|_p^p \le \|a + b\|_p^{p-1} \bigl( \|a\|_p + \|b\|_p \bigr)$, and the triangle inequality follows.
If $1 < p < \infty$, then equality in Minkowski's inequality can only occur if $a$ and $b$ are parallel; that is, the $\ell_p$-norm is strictly convex for $1 < p < \infty$. Indeed, if $\|a + b\|_p = \|a\|_p + \|b\|_p$, then either $a = 0$, or $b = 0$, or else $a, b \ne 0$ and we have equality at each stage of our proof. Now equality in the first inequality means that $|a_i + b_i| = |a_i| + |b_i|$, which easily implies that $a_i$ and $b_i$ have the same sign. Next, equality in our application of Hölder's inequality implies that there are nonnegative scalars $C$ and $D$ such that $|a_i|^p = C\,|a_i + b_i|^p$ and $|b_i|^p = D\,|a_i + b_i|^p$ for all $i = 1, \ldots, n$. Thus, $a_i = E\,b_i$ for some scalar $E$ and all $i = 1, \ldots, n$.

Of course, the triangle inequality also holds in either of the cases $p = 1$ or $p = \infty$ (with much simpler proofs).
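As a numerical sanity check (no substitute for the proofs above), here is a small Python sketch; the random sample vectors and the exponent $p = 3$ are arbitrary choices of ours:

```python
import random

# Numerically check Holder's and Minkowski's inequalities on random data.
def lp_norm(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(10)]
b = [random.uniform(-1, 1) for _ in range(10)]
p = 3.0
q = p / (p - 1)  # conjugate exponent: 1/p + 1/q = 1

lhs_holder = sum(abs(x * y) for x, y in zip(a, b))
print(lhs_holder <= lp_norm(a, p) * lp_norm(b, q))    # True

lhs_mink = lp_norm([x + y for x, y in zip(a, b)], p)
print(lhs_mink <= lp_norm(a, p) + lp_norm(b, p))      # True
```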
Exercises
When does equality occur in the triangle inequality in the cases $p = 1$ or $p = \infty$? In particular, show that neither of the norms $\|\cdot\|_1$ nor $\|\cdot\|_\infty$ is strictly convex.
Appendix B
Next, we provide a brief review of completeness and compactness. Such a review is doomed to inadequacy; the reader unfamiliar with these concepts would be well served to consult a text on advanced calculus, such as Analysis in Euclidean Spaces by K. Hoffman, or Principles of Mathematical Analysis by W. Rudin.

To begin, we recall that a subset $A$ of a normed space $X$ (such as $\mathbb{R}$ or $\mathbb{R}^n$) is said to be closed if $A$ is closed under the taking of sequential limits. That is, $A$ is closed if, whenever $(a_n)$ is a sequence from $A$ converging to some point $x \in X$, we always have $x \in A$. It's not hard to see that any closed interval, such as $[a,b]$ or $[a,\infty)$, is indeed a closed subset of $\mathbb{R}$ in this sense. There are, however, much more complicated examples of closed sets in $\mathbb{R}$.
A normed space $X$ is said to be complete if every Cauchy sequence from $X$ converges (to a point in $X$). It is a familiar fact from Calculus that $\mathbb{R}$ is complete, as is $\mathbb{R}^n$. In fact, the completeness of $\mathbb{R}$ is often assumed as an axiom (in the form of the least upper bound axiom). There are, however, many examples of normed spaces which are not complete; that is, there are examples of normed spaces in which Cauchy sequences need not converge.

We say that a subset $A$ of a normed space $X$ is complete if every Cauchy sequence from $A$ converges to a point in $A$. Please note here that we require not only that Cauchy sequences from $A$ converge, but also that the limit be back in $A$. As you might imagine, the completeness of $A$ depends on properties of both $A$ and the containing space $X$.

First note that a complete subset is necessarily also closed: since every convergent sequence is also Cauchy, the limit of a convergent sequence from $A$ must already lie back in $A$.
Exercise
If $A$ is a complete subset of a normed space $X$, show that $A$ is also closed.
If the containing space $X$ is itself complete, then it's easy to tell which of its subsets are complete. Indeed, since every Cauchy sequence in $X$ converges (somewhere), all we need to know is whether the subset is closed.
Exercise
Let $A$ be a subset of a complete normed space $X$. Show that $A$ is complete if and only if $A$ is a closed subset of $X$. In particular, please note that every closed subset of $\mathbb{R}$ (or $\mathbb{R}^n$) is complete.
Finally, we recall that a subset $A$ of a normed space $X$ is said to be compact if every sequence from $A$ has a subsequence which converges to a point in $A$. Again, since we have insisted that certain limits remain in $A$, it's not hard to see that compact sets are necessarily also closed.
Exercise
If $A$ is a compact subset of a normed space $X$, show that $A$ is also closed.
Moreover, since a Cauchy sequence with a convergent subsequence must itself converge (why?), we actually have that every compact set is necessarily complete.
Exercise
If $A$ is a compact subset of a normed space $X$, show that $A$ is also complete.
Since the compactness of a subset $A$ has something to do with every sequence in $A$, it's not hard to believe that it is a more stringent property than the others we've considered so far. In particular, it's not hard to see that a compact set must be bounded.
Exercise
If $A$ is a compact subset of a normed space $X$, show that $A$ is also bounded. [Hint: If not, then $A$ would contain a sequence $(a_n)$ with $\|a_n\| \to \infty$.]
Now it is generally not so easy to describe the compact subsets of a particular normed space $X$; however, it is quite easy to describe the compact subsets of $\mathbb{R}$ (or $\mathbb{R}^n$). This well-known result goes by many names; we will refer to it as the Heine-Borel theorem.

Theorem. A subset $A$ of $\mathbb{R}$ (or $\mathbb{R}^n$) is compact if and only if $A$ is both closed and bounded.

Proof. One direction of the proof is easy: As we've already seen, compact sets in $\mathbb{R}$ are necessarily closed and bounded. For the other direction, notice that if $A$ is a bounded subset of $\mathbb{R}$, then it follows from the Bolzano-Weierstrass theorem that every sequence from $A$ has a subsequence which converges in $\mathbb{R}$. If $A$ is also a closed set, then this limit must, in fact, be back in $A$. Thus, every sequence in $A$ has a subsequence converging to a point in $A$.
Appendix C
We next offer a brief review of pointwise and uniform convergence. We begin with an elementary example:

Example

(a) For each $n = 1, 2, 3, \ldots$, consider the function $f_n(x) = e^x + \frac{x}{n}$ for $x \in \mathbb{R}$. Note that for each (fixed) $x$ the sequence $(f_n(x))_{n=1}^\infty$ converges to $f(x) = e^x$ because
\[ |f_n(x) - f(x)| = \frac{|x|}{n} \to 0 \quad\text{as}\quad n \to \infty. \]
In this case we say that the sequence of functions $(f_n)$ converges pointwise to the function $f$ on $\mathbb{R}$. But notice, too, that the rate of convergence depends on $x$. In particular, in order to get $|f_n(x) - f(x)| < 1/2$ we would need to take $n > 2|x|$. Thus, at $x = 2$, the inequality is satisfied for all $n > 4$, while at $x = 1000$, the inequality is satisfied only for $n > 2000$. In short, the rate of convergence is not uniform in $x$.

(b) Consider the same sequence of functions as above, but now let's suppose that we restrict the values of $x$ to the interval $[-5,5]$. Of course, we still have that $f_n(x) \to f(x)$ for each (fixed) $x$ in $[-5,5]$; in other words, we still have that $(f_n)$ converges pointwise to $f$ on $[-5,5]$. But notice that the rate of convergence is now uniform over $x$ in $[-5,5]$. To see this, just rewrite the initial calculation:
\[ |f_n(x) - f(x)| = \frac{|x|}{n} \le \frac{5}{n} \quad\text{for}\quad x \in [-5,5], \]
and notice that the upper bound $5/n$ tends to 0, as $n \to \infty$, independent of the choice of $x$. In this case, we say that $(f_n)$ converges uniformly to $f$ on $[-5,5]$. The point here is that the notion of uniform convergence depends on the underlying domain as well as on the sequence of functions at hand.
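The distinction is easy to see numerically. In the Python sketch below, the window $[-100,100]$ standing in for all of $\mathbb{R}$ and the grid resolution are our own illustrative choices:

```python
import math

# f_n(x) = e^x + x/n approximates f(x) = e^x; the error is |x|/n.
def worst_error(n, lo, hi, pts=2001):
    grid = [lo + (hi - lo) * k / (pts - 1) for k in range(pts)]
    return max(abs((math.exp(x) + x / n) - math.exp(x)) for x in grid)

for n in (10, 100, 1000):
    # On [-5, 5] the worst error 5/n shrinks with n (uniform convergence);
    # on a large window like [-100, 100] it stays large for small n.
    print(n, worst_error(n, -5, 5), worst_error(n, -100, 100))
```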
With this example in mind, we now offer formal definitions of pointwise and uniform convergence. In both cases we consider a sequence of functions $f_n : X \to \mathbb{R}$, $n = 1, 2, 3, \ldots$, each defined on the same underlying set $X$, and another function $f : X \to \mathbb{R}$ (the candidate for the limit).

We say that $(f_n)$ converges pointwise to $f$ on $X$ if, for each $x \in X$, we have $f_n(x) \to f(x)$ as $n \to \infty$; thus, for each $x \in X$ and each $\varepsilon > 0$, we can find an integer $N$ (which depends on $\varepsilon$ and which may also depend on $x$) such that $|f_n(x) - f(x)| < \varepsilon$ whenever $n > N$. A convenient shorthand for pointwise convergence is: $f_n \to f$ on $X$ or, if $X$ is understood, simply $f_n \to f$.

We say that $(f_n)$ converges uniformly to $f$ on $X$ if, for each $\varepsilon > 0$, we can find an integer $N$ (which depends on $\varepsilon$ but not on $x$) such that $|f_n(x) - f(x)| < \varepsilon$ for each $x \in X$, provided that $n > N$. Please notice that the phrase "for each $x \in X$" now occurs well after the phrase "for each $\varepsilon > 0$" and, in particular, that the rate of convergence $N$ does not depend on $x$. It should be reasonably clear that uniform convergence implies
pointwise convergence; in other words, uniform convergence is "stronger" than pointwise convergence. For this reason, we sometimes use the shorthand: $f_n \rightrightarrows f$ on $X$ or, if $X$ is understood, simply $f_n \rightrightarrows f$.

The definition of uniform convergence can be simplified by "hiding" one of the quantifiers under different notation; indeed, note that the phrase "$|f_n(x) - f(x)| < \varepsilon$ for any $x \in X$" is (essentially) equivalent to the phrase "$\sup_{x \in X} |f_n(x) - f(x)| < \varepsilon$." Thus, our definition may be reworded as follows: $(f_n)$ converges uniformly to $f$ on $X$ if, given $\varepsilon > 0$, there is an integer $N$ such that $\sup_{x \in X} |f_n(x) - f(x)| < \varepsilon$ for all $n > N$.
The notion of uniform convergence exists for one very good reason: Continuity is preserved under uniform limits. This fact is well worth stating.
Exercise
Let $X$ be a subset of $\mathbb{R}$, let $f, f_n : X \to \mathbb{R}$ for $n = 1, 2, 3, \ldots$, and let $x_0 \in X$. If each $f_n$ is continuous at $x_0$, and if $f_n \rightrightarrows f$ on $X$, then $f$ is continuous at $x_0$. In particular, if each $f_n$ is continuous on all of $X$, then so is $f$. Give an example showing that this result may fail if we only assume that $f_n \to f$ on $X$.
Appendix D
Lastly, we discuss continuity for linear transformations between normed vector spaces. Throughout this section, we consider a linear map $T : V \to W$ between vector spaces $V$ and $W$; that is, we suppose that $T$ satisfies $T(\alpha x + \beta y) = \alpha\,T(x) + \beta\,T(y)$ for all $x, y \in V$ and all scalars $\alpha, \beta$. Please note that every linear map $T$ satisfies $T(0) = 0$. If we further suppose that $V$ is endowed with the norm $\|\cdot\|$, and that $W$ is endowed with the norm $|||\cdot|||$, then we may consider the issue of continuity of the map $T$.

The key result for our purposes is that, for linear maps, continuity (even at a single point) is equivalent to uniform continuity (and then some!).
Theorem. Let $(V, \|\cdot\|)$ and $(W, |||\cdot|||)$ be normed vector spaces, and let $T : V \to W$ be a linear map. Then the following are equivalent:

(i) $T$ is Lipschitz;
(ii) $T$ is uniformly continuous;
(iii) $T$ is continuous (everywhere);
(iv) $T$ is continuous at $0 \in V$;
(v) there is a constant $C < \infty$ such that $|||T(x)||| \le C\,\|x\|$ for all $x \in V$.
Proof. Clearly, (i) $\implies$ (ii) $\implies$ (iii) $\implies$ (iv). We need to show that (iv) $\implies$ (v), and that (v) $\implies$ (i) (for example). The second of these is easier, so let's start there.

(v) $\implies$ (i): If condition (v) holds for a linear map $T$, then $T$ is Lipschitz (with constant $C$) since $|||T(x) - T(y)||| = |||T(x - y)||| \le C\,\|x - y\|$ for any $x, y \in V$.

(iv) $\implies$ (v): Suppose that $T$ is continuous at 0. Then we may choose a $\delta > 0$ so that $|||T(x)||| = |||T(x) - T(0)||| \le 1$ whenever $\|x\| = \|x - 0\| \le \delta$. (How?) Given $0 \ne x \in V$, we may scale by the factor $\delta/\|x\|$ to get $\bigl\| \delta x / \|x\| \bigr\| = \delta$. Hence, $|||\,T(\delta x / \|x\|)\,||| \le 1$. But $T(\delta x / \|x\|) = (\delta/\|x\|)\,T(x)$, since $T$ is linear, and so we get $|||T(x)||| \le (1/\delta)\,\|x\|$. That is, $C = 1/\delta$ works in condition (v). (Note that since condition (v) is trivial for $x = 0$, we only care about the case $x \ne 0$.)
A linear map satisfying condition (v) of the Theorem (i.e., a continuous linear map) is often said to be bounded. The meaning in this context is slightly different than usual. Here it means that $T$ maps bounded sets to bounded sets. This follows from the fact that $T$ is Lipschitz. Indeed, if $|||T(x)||| \le C\,\|x\|$ for all $x \in V$, then (as we've seen) $|||T(x) - T(y)||| \le C\,\|x - y\|$ for any $x, y \in V$, and hence $T$ maps the ball about $x$ of radius $r$ into the ball about $T(x)$ of radius $Cr$. In symbols, $T(B_r(x)) \subset B_{Cr}(T(x))$. More generally, $T$ maps a set of diameter $d$ into a set of diameter at most $Cd$. There's no danger of confusion in our using the word bounded to mean something new here; the ordinary usage of the word (as applied to functions) is uninteresting for linear maps. A nonzero linear map always has an unbounded range. (Why?)

The smallest constant that works in (v) is called the norm of the operator $T$ and is usually written $\|T\|$. In symbols,
\[ \|T\| = \sup_{x \ne 0} \frac{|||T(x)|||}{\|x\|} = \sup_{\|x\| \le 1} |||T(x)|||. \]
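Condition (v) and the operator norm are easy to explore numerically. The Python sketch below is a minimal illustration of ours, not part of the notes' development: it estimates $\|T\|$ for a sample matrix map on $(\mathbb{R}^2, \|\cdot\|_\infty)$ by random sampling (for the max norm the exact value happens to be the largest absolute row sum):

```python
import random

# T(x) = A x on R^2, with the max norm on both domain and range.
A = [[1.0, -2.0],
     [0.5,  3.0]]

def T(x):
    return [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

def max_norm(x):
    return max(abs(t) for t in x)

random.seed(1)
est = max(max_norm(T([random.uniform(-1, 1) for _ in range(2)]))
          for _ in range(20000))
print(est)   # approaches ||T|| = 3.5, the largest row sum |0.5| + |3.0|
```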
Math 682 Problem Set: Function Spaces

The most important collection of functions for our purposes is the space $C[a,b]$, consisting of all continuous functions $f : [a,b] \to \mathbb{R}$. It's easy to see that $C[a,b]$ is a vector space under the usual pointwise operations on functions: $(f+g)(x) = f(x) + g(x)$ and $(\alpha f)(x) = \alpha f(x)$ for $\alpha \in \mathbb{R}$. Actually, we will be most interested in the finite-dimensional subspaces $\mathcal{P}_n$ of $C[a,b]$, consisting of all algebraic polynomials of degree at most $n$.
1. The subspace $\mathcal{P}_n$ has dimension exactly $n+1$. Why?
Another useful subset of $C[a,b]$ is the collection $\mathrm{lip}_K\,\alpha$, consisting of all those $f$'s which satisfy a Lipschitz condition of order $\alpha > 0$ with constant $0 < K < \infty$; i.e., those $f$'s for which $|f(x) - f(y)| \le K\,|x - y|^\alpha$ for all $x, y$ in $[a,b]$. [Some authors would say that $f$ is Hölder continuous with exponent $\alpha$.]

2. (a) Show that $\mathrm{lip}_K\,\alpha$ is, indeed, a subset of $C[a,b]$.
(b) If $\alpha > 1$, show that $\mathrm{lip}_K\,\alpha$ contains only the constant functions.
(c) Show that $\sqrt{x}$ is in $\mathrm{lip}_1\,(1/2)$ and that $\sin x$ is in $\mathrm{lip}_1\,1$ on $[0,1]$.
(d) Show that the collection $\mathrm{lip}\,\alpha$, consisting of all those $f$'s which are in $\mathrm{lip}_K\,\alpha$ for some $K$, is a subspace of $C[a,b]$.
(e) Show that $\mathrm{lip}\,1$ contains all the polynomials.
(f) If $f \in \mathrm{lip}\,\alpha$ for some $\alpha > 0$, show that $f \in \mathrm{lip}\,\beta$ for all $0 < \beta < \alpha$.
(g) Given $0 < \alpha < 1$, show that $x^\alpha$ is in $\mathrm{lip}_1\,\alpha$ on $[0,1]$ but not in $\mathrm{lip}\,\beta$ for any $\beta > \alpha$.
We will also want to consider a norm on the vector space $C[a,b]$; we typically use the uniform or sup norm (Rivlin calls this the Chebyshev norm) defined by $\|f\| = \max_{a \le x \le b} |f(x)|$. [Some authors write $\|f\|_u$ or $\|f\|_\infty$.]

3. Show that $\mathcal{P}_n$ and $\mathrm{lip}_K\,\alpha$ are closed subsets of $C[a,b]$ (under the sup norm). Is $\mathrm{lip}\,\alpha$ closed? A bit harder: Show that $\mathrm{lip}\,1$ is both first category and dense in $C[a,b]$.
4. Fix $n$ and consider the norm $\|p\|_1 = \sum_{k=0}^n |a_k|$ for $p(x) = a_0 + a_1 x + \cdots + a_n x^n \in \mathcal{P}_n$. Show that there are constants $0 < A_n \le B_n < \infty$ such that $A_n \|p\|_1 \le \|p\| \le B_n \|p\|_1$, where $\|p\| = \max_{a \le x \le b} |p(x)|$. Do $A_n$ and $B_n$ really depend on $n$?
We will occasionally consider spaces of real-valued functions defined on finite sets; that is, we will consider $\mathbb{R}^n$ under various norms. (Why is this the same?) We define a scale of norms on $\mathbb{R}^n$ by $\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$, where $x = (x_1, \ldots, x_n)$ and $1 \le p < \infty$ (we need $p \ge 1$ in order for this expression to be a legitimate norm, but the expression makes perfect sense for any $p > 0$, and even for $p < 0$ provided no $x_i$ is 0). Notice, please, that the usual norm on $\mathbb{R}^n$ is given by $\|x\|_2$.

5. Show that $\lim_{p \to \infty} \|x\|_p = \max_{1 \le i \le n} |x_i|$. For this reason we define $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$. Thus, $\mathbb{R}^n$ under the norm $\|\cdot\|_\infty$ is the same as $C(\{1, 2, \ldots, n\})$ with its usual norm.
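A quick numerical illustration of problem 5 (the test vector is an arbitrary choice of ours):

```python
# ||x||_p approaches max|x_i| as p grows.
x = [3.0, -4.0, 1.0]
for p in (1, 2, 4, 16, 64, 256):
    print(p, sum(abs(t) ** p for t in x) ** (1.0 / p))
# The printed values decrease toward max|x_i| = 4.
```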
Introduction
Let's begin with some notation. Throughout, we're concerned with the problem of best (uniform) approximation of a given function $f \in C[a,b]$ by elements from $\mathcal{P}_n$, the subspace of algebraic polynomials of degree at most $n$ in $C[a,b]$. We know that the problem has a solution (possibly more than one), which we've chosen to write as $p_n^*$. We set
\[ E_n(f) = \min_{p \in \mathcal{P}_n} \|f - p\| = \|f - p_n^*\|. \]
Since $\mathcal{P}_n \subset \mathcal{P}_{n+1}$ for each $n$, it's clear that $E_n(f) \ge E_{n+1}(f)$ for each $n$. Our goal in this chapter is to prove that $E_n(f) \to 0$. We'll accomplish this by proving:

Theorem (The Weierstrass Approximation Theorem, 1885). Let $f \in C[a,b]$. Then, for every $\varepsilon > 0$, there is a polynomial $p$ such that $\|f - p\| < \varepsilon$.

It follows from the Weierstrass theorem that $p_n^* \rightrightarrows f$ for each $f \in C[a,b]$. (Why?) This is an important first step in determining the exact nature of $E_n(f)$ as a function of $f$ and $n$. We'll look for much more precise information in later sections.
Now there are many proofs of the Weierstrass theorem (a mere three are outlined in the exercises, but there are hundreds!), and all of them start with one simplification: The underlying interval $[a,b]$ is of no consequence.

Lemma. If the Weierstrass theorem holds for $C[0,1]$, then it also holds for $C[a,b]$, and conversely. In fact, $C[0,1]$ and $C[a,b]$ are, for all practical purposes, identical: They are linearly isometric as normed spaces, order isomorphic as lattices, and isomorphic as algebras (rings).

Proof. We'll settle for proving only the first assertion; the second is outlined in the exercises (and uses a similar argument).

Given $f \in C[a,b]$, notice that the function
\[ g(x) = f(a + (b-a)x), \quad 0 \le x \le 1, \]
defines an element of $C[0,1]$. Now, given $\varepsilon > 0$, suppose that we can find a polynomial $p$ such that $\|g - p\| < \varepsilon$; in other words, suppose that
\[ \max_{0 \le x \le 1} \bigl| f(a + (b-a)x) - p(x) \bigr| < \varepsilon. \]
Then,
\[ \max_{a \le t \le b} \Bigl| f(t) - p\Bigl( \frac{t-a}{b-a} \Bigr) \Bigr| < \varepsilon. \]
(Why?) But if $p(x)$ is a polynomial in $x$, then $q(t) = p\bigl( \frac{t-a}{b-a} \bigr)$ is a polynomial in $t$ (again, why?) satisfying $\|f - q\| < \varepsilon$.

The proof of the converse is entirely similar: If $g(x)$ is an element of $C[0,1]$, then $f(t) = g\bigl( \frac{t-a}{b-a} \bigr)$, $a \le t \le b$, defines an element of $C[a,b]$. Moreover, if $q(t)$ is a polynomial in $t$ approximating $f(t)$, then $p(x) = q(a + (b-a)x)$ is a polynomial in $x$ approximating $g(x)$. The remaining details are left as an exercise.

The point to our first result is that it suffices to prove the Weierstrass theorem for any interval we like; $[0,1]$ and $[-1,1]$ are popular choices, but it hardly matters which interval we use.
Bernstein's Proof
The proof of the Weierstrass theorem we present here is due to the great Russian mathematician S. N. Bernstein, in 1912. Bernstein's proof is of interest to us for a variety of reasons; perhaps most important is that Bernstein actually displays a sequence of polynomials that approximate a given $f \in C[0,1]$. Moreover, as we'll see later, Bernstein's proof generalizes to yield a powerful, unifying theorem, called the Bohman-Korovkin theorem.

If $f$ is any bounded function on $[0,1]$, we define the sequence of Bernstein polynomials for $f$ by
\[ \bigl( B_n(f) \bigr)(x) = \sum_{k=0}^n f\Bigl( \frac{k}{n} \Bigr) \binom{n}{k} x^k (1-x)^{n-k}, \quad 0 \le x \le 1. \]
Please note that $B_n(f)$ is a polynomial of degree at most $n$. Also, it's easy to see that $\bigl( B_n(f) \bigr)(0) = f(0)$ and $\bigl( B_n(f) \bigr)(1) = f(1)$. In general, $\bigl( B_n(f) \bigr)(x)$ is a weighted average of the numbers $f(k/n)$, $k = 0, \ldots, n$. Bernstein's theorem states that $B_n(f) \rightrightarrows f$ for each $f \in C[0,1]$. Surprisingly, the proof actually only requires that we check three easy cases:
\[ f_0(x) = 1, \quad f_1(x) = x, \quad\text{and}\quad f_2(x) = x^2. \]
This, and more, is the content of the following lemma.

Lemma.
(i) $\displaystyle\sum_{k=0}^n \binom{n}{k} x^k (1-x)^{n-k} = [x + (1-x)]^n = 1$.
(ii) $\displaystyle\sum_{k=0}^n \frac{k}{n} \binom{n}{k} x^k (1-x)^{n-k} = x$, and $\displaystyle\sum_{k=0}^n \frac{k^2}{n^2} \binom{n}{k} x^k (1-x)^{n-k} = \Bigl( 1 - \frac{1}{n} \Bigr) x^2 + \frac{1}{n}\,x$.
(iii) $\displaystyle\sum_{k=0}^n \Bigl( \frac{k}{n} - x \Bigr)^2 \binom{n}{k} x^k (1-x)^{n-k} = \frac{1}{n}\,x(1-x) \le \frac{1}{4n}$ for $0 \le x \le 1$.

Proof. (i) is just the binomial formula. For (ii), note that
\[ \frac{k}{n} \binom{n}{k} = \frac{(n-1)\,!}{(k-1)\,!\,(n-k)\,!} = \binom{n-1}{k-1}. \]
Consequently,
\[ \sum_{k=0}^n \frac{k}{n} \binom{n}{k} x^k (1-x)^{n-k} = x \sum_{k=1}^n \binom{n-1}{k-1} x^{k-1} (1-x)^{n-k} = x \sum_{j=0}^{n-1} \binom{n-1}{j} x^j (1-x)^{(n-1)-j} = x, \]
and a similar calculation handles the second sum. To prove (iii) we combine the results in (i) and (ii) and simplify. Since $\bigl( \frac{k}{n} - x \bigr)^2 = \frac{k^2}{n^2} - 2x\,\frac{k}{n} + x^2$, we get
\[ \sum_{k=0}^n \Bigl( \frac{k}{n} - x \Bigr)^2 \binom{n}{k} x^k (1-x)^{n-k} = \Bigl( 1 - \frac{1}{n} \Bigr) x^2 + \frac{1}{n}\,x - 2x^2 + x^2 = \frac{1}{n}\,x(1-x) \le \frac{1}{4n} \]
for $0 \le x \le 1$.
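Since Bernstein's polynomials are completely explicit, they are easy to compute. Here is a short Python sketch of ours (the test function, degrees, and grid are arbitrary choices) that evaluates $B_n(f)$ and estimates its uniform error:

```python
from math import comb

# Bernstein polynomial B_n(f) evaluated at x.
def bernstein(f, n, x):
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)          # a continuous, non-smooth test function
for n in (10, 50, 200):
    err = max(abs(f(x) - bernstein(f, n, x))
              for x in [k / 500 for k in range(501)])
    print(n, err)                    # the sup-norm error decreases with n
```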
Improved Estimates
To begin, we will need a bit more notation. The modulus of continuity of a bounded function $f$ on the interval $[a,b]$ is defined by
\[ \omega_f(\delta) = \omega_f([a,b];\delta) = \sup\bigl\{\, |f(x) - f(y)| : x, y \in [a,b],\ |x - y| \le \delta \,\bigr\} \]
for any $\delta > 0$. Note that $\omega_f(\delta)$ is a measure of the "$\varepsilon$" that goes along with $\delta$ (in the definition of uniform continuity); literally, we have written $\varepsilon = \omega_f(\delta)$ as a function of $\delta$.

Here are a few easy facts about the modulus of continuity:

Exercises

1. We always have $|f(x) - f(y)| \le \omega_f(|x - y|)$ for any $x \ne y \in [a,b]$.
2. If $0 < \delta' \le \delta$, then $\omega_f(\delta') \le \omega_f(\delta)$.
3. $f$ is uniformly continuous if and only if $\omega_f(\delta) \to 0$ as $\delta \to 0^+$.
4. If $f'$ exists and is bounded on $[a,b]$, then $\omega_f(\delta) \le K\delta$ for some constant $K$.
5. More generally, we say that $f$ satisfies a Lipschitz condition of order $\alpha$ with constant $K$, where $0 < \alpha \le 1$ and $0 \le K < \infty$, if $|f(x) - f(y)| \le K\,|x - y|^\alpha$ for all $x, y$. We abbreviate this statement by the symbols: $f \in \mathrm{lip}_K\,\alpha$. Check that if $f \in \mathrm{lip}_K\,\alpha$, then $\omega_f(\delta) \le K\,\delta^\alpha$ for all $\delta > 0$.

For the time being, we actually only need one simple fact about $\omega_f(\delta)$:

Lemma. Let $f$ be a bounded function on $[a,b]$ and let $\delta > 0$. Then, for any $\lambda > 0$,
\[ \omega_f(\lambda\delta) \le (1 + \lambda)\,\omega_f(\delta). \]
We next repeat the proof of Bernstein's theorem, making a few minor adjustments here and there.

Theorem. For any bounded function $f$ on $[0,1]$ we have
\[ \|f - B_n(f)\| \le \frac{3}{2}\,\omega_f\Bigl( \frac{1}{\sqrt{n}} \Bigr). \]
In particular, if $f \in C[0,1]$, then $E_n(f) \le \frac{3}{2}\,\omega_f\bigl( \frac{1}{\sqrt{n}} \bigr) \to 0$ as $n \to \infty$.
Proof. We first do some term juggling:
\begin{align*}
|f(x) - B_n(f)(x)| &= \Bigl| \sum_{k=0}^n \Bigl[ f(x) - f\Bigl( \frac{k}{n} \Bigr) \Bigr] \binom{n}{k} x^k (1-x)^{n-k} \Bigr| \\
&\le \sum_{k=0}^n \Bigl| f(x) - f\Bigl( \frac{k}{n} \Bigr) \Bigr| \binom{n}{k} x^k (1-x)^{n-k} \\
&\le \sum_{k=0}^n \omega_f\Bigl( \Bigl| x - \frac{k}{n} \Bigr| \Bigr) \binom{n}{k} x^k (1-x)^{n-k} \\
&\le \omega_f\Bigl( \frac{1}{\sqrt{n}} \Bigr) \sum_{k=0}^n \Bigl( 1 + \sqrt{n}\,\Bigl| x - \frac{k}{n} \Bigr| \Bigr) \binom{n}{k} x^k (1-x)^{n-k} \\
&= \omega_f\Bigl( \frac{1}{\sqrt{n}} \Bigr) \Bigl[ 1 + \sqrt{n} \sum_{k=0}^n \Bigl| x - \frac{k}{n} \Bigr| \binom{n}{k} x^k (1-x)^{n-k} \Bigr],
\end{align*}
where the third inequality follows from our previous Lemma (by taking $\lambda = \sqrt{n}\,|x - \frac{k}{n}|$ and $\delta = \frac{1}{\sqrt{n}}$). All that remains is to estimate the sum, and for this we'll use Cauchy-Schwarz (and our earlier observations about Bernstein polynomials). Since each of the terms $\binom{n}{k} x^k (1-x)^{n-k}$ is nonnegative, we have
\begin{align*}
\sum_{k=0}^n \Bigl| x - \frac{k}{n} \Bigr| \binom{n}{k} x^k (1-x)^{n-k}
&\le \Bigl[ \sum_{k=0}^n \Bigl( x - \frac{k}{n} \Bigr)^2 \binom{n}{k} x^k (1-x)^{n-k} \Bigr]^{1/2} \Bigl[ \sum_{k=0}^n \binom{n}{k} x^k (1-x)^{n-k} \Bigr]^{1/2} \\
&\le \Bigl( \frac{1}{4n} \Bigr)^{1/2} = \frac{1}{2\sqrt{n}}.
\end{align*}
Finally,
\[ |f(x) - B_n(f)(x)| \le \omega_f\Bigl( \frac{1}{\sqrt{n}} \Bigr) \Bigl( 1 + \sqrt{n} \cdot \frac{1}{2\sqrt{n}} \Bigr) = \frac{3}{2}\,\omega_f\Bigl( \frac{1}{\sqrt{n}} \Bigr). \]
Examples
1. If $f \in \mathrm{lip}_K\,\alpha$, it follows that $\|f - B_n(f)\| \le \frac{3}{2} K n^{-\alpha/2}$, and hence $E_n(f) \le \frac{3}{2} K n^{-\alpha/2}$.

2. As a particular case of the first example, consider $f(x) = |x - \frac{1}{2}|$ on $[0,1]$. Then $f \in \mathrm{lip}_1\,1$, and so $\|f - B_n(f)\| \le \frac{3}{2} n^{-1/2}$. But, as Rivlin points out (see Remark 3 on p. 16 of his book), $\|f - B_n(f)\| > \frac{1}{2} n^{-1/2}$. Thus, we can't hope to improve on the power of $n$ in this estimate. Nevertheless, we will see an improvement in our estimate of $E_n(f)$.
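These bounds are easy to test numerically. The Python sketch below (the grid resolution is an arbitrary choice of ours) computes $\|f - B_n(f)\|$ for $f(x) = |x - \frac{1}{2}|$ and brackets it by the two bounds just quoted:

```python
from math import comb, sqrt

def bernstein(f, n, x):
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)
for n in (4, 16, 64, 256):
    err = max(abs(f(x) - bernstein(f, n, x))
              for x in [k / 1000 for k in range(1001)])
    # bounds from the text: (1/2)n^{-1/2} < err <= (3/2)n^{-1/2}
    print(n, 0.5 / sqrt(n), err, 1.5 / sqrt(n))
```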
The Bohman-Korovkin Theorem
The real value to us in Bernstein's approach is that the map $f \mapsto B_n(f)$, while providing a simple formula for an approximating polynomial, is also linear and positive. In other words,
\[ B_n(f + g) = B_n(f) + B_n(g), \qquad B_n(\alpha f) = \alpha\,B_n(f), \quad \alpha \in \mathbb{R}, \]
and
\[ B_n(f) \ge 0 \quad\text{whenever}\quad f \ge 0. \]
As it happens, any positive, linear map $T : C[0,1] \to C[0,1]$ is necessarily also continuous!
Lemma. If $T : C[a,b] \to C[a,b]$ is both positive and linear, then $T$ is continuous.

Proof. First note that a positive, linear map is also monotone. That is, $T$ satisfies $T(f) \le T(g)$ whenever $f \le g$. (Why?) Thus, for any $f \in C[a,b]$, we have
\[ -|f| \le f \le |f| \implies -T(|f|) \le T(f) \le T(|f|); \]
that is, $|T(f)| \le T(|f|)$. But now $|f| \le \|f\| \cdot 1$, where $1$ denotes the constant 1 function, and so we get
\[ |T(f)| \le T(|f|) \le \|f\|\,T(1). \]
Thus,
\[ \|T(f)\| \le \|f\|\,\|T(1)\| \]
for any $f \in C[a,b]$. Finally, since $T$ is linear, it follows that $T$ is Lipschitz with constant $\|T(1)\|$:
\[ \|T(f) - T(g)\| = \|T(f - g)\| \le \|T(1)\|\,\|f - g\|. \]
Consequently, $T$ is continuous.
Now positive, linear maps abound in analysis, so this is a fortunate turn of events. What's more, Bernstein's theorem generalizes very nicely when placed in this new setting. The following elegant theorem was proved (independently) by Bohman and Korovkin in, roughly, 1952.

Theorem. Let $T_n : C[0,1] \to C[0,1]$ be a sequence of positive, linear maps, and suppose that $T_n(f) \to f$ uniformly in each of the three cases $f_0(x) = 1$, $f_1(x) = x$, and $f_2(x) = x^2$. Then, $T_n(f) \to f$ uniformly for every $f \in C[0,1]$.
One of our first tasks will be to give a constructive proof of Weierstrass's Theorem, stating that each $f \in C[a,b]$ is the uniform limit of a sequence of polynomials. As it happens, the choice of interval $[a,b]$ is inconsequential: If Weierstrass's theorem is true for one, then it's true for all.
18. Define $\sigma : [0,1] \to [a,b]$ by $\sigma(t) = a + t(b-a)$ for $0 \le t \le 1$, and define a transformation $T_\sigma : C[a,b] \to C[0,1]$ by $(T_\sigma(f))(t) = f(\sigma(t))$. Prove that $T_\sigma$ satisfies:
(a) $T_\sigma(f + g) = T_\sigma(f) + T_\sigma(g)$ and $T_\sigma(cf) = c\,T_\sigma(f)$ for $c \in \mathbb{R}$.
(b) $T_\sigma(fg) = T_\sigma(f)\,T_\sigma(g)$. In particular, $T_\sigma$ maps polynomials to polynomials.
(c) $T_\sigma(f) \le T_\sigma(g)$ if and only if $f \le g$.
(d) $\|T_\sigma(f)\| = \|f\|$.
(e) $T_\sigma$ is both one-to-one and onto. Moreover, $(T_\sigma)^{-1} = T_{\sigma^{-1}}$.
The point to exercise 18 is that $C[a,b]$ and $C[0,1]$ are identical as vector spaces, metric spaces, algebras, and lattices. For all practical purposes, they are one and the same space. While Bernstein's proof of the Weierstrass theorem (below) will prove most useful for our purposes, there are many others; two of these (in the case of $C[0,1]$) are sketched below.
19. (Landau's proof): For each $n = 1, 2, \ldots$ and $0 \le \delta \le 1$, define $I_n(\delta) = \int_\delta^1 (1 - x^2)^n\,dx$. Show that $I_n(\delta)/I_n(0) \to 0$ as $n \to \infty$ for any $\delta > 0$. Now, given $f \in C[0,1]$ with $f(0) = f(1) = 0$, show that the polynomial
\[ L_n(x) = \bigl( 2 I_n(0) \bigr)^{-1} \int_0^1 f(t) \bigl( 1 - (t-x)^2 \bigr)^n\,dt \]
converges uniformly to $f(x)$ on $[0,1]$ as $n \to \infty$. [Hint: You may assume that $f \equiv 0$ outside of $[0,1]$.] To get the result for general $f \in C[0,1]$, we simply need to subtract the linear function $f(0) + x(f(1) - f(0))$.
20. (Lebesgue's proof): Given $f \in C[0,1]$, first show that $f$ can be uniformly approximated by a polygonal function. Specifically, given a positive integer $N$, define $L(x)$ by the conditions $L(k/N) = f(k/N)$ for $k = 0, 1, \ldots, N$, and $L(x)$ is linear for $k/N \le x \le (k+1)/N$; show that $\|f - L\|$ is small provided that $N$ is sufficiently large. The function $L(x)$ can be written (uniquely) as a linear combination of the "angles" $\varphi_k(x) = |x - k/N| + x - k/N$, $k = 0, \ldots, N-1$, and $\varphi_N(x) = 1$; the equation $L(x) = \sum_{k=0}^N c_k \varphi_k(x)$ can be solved since the system of equations $L(k/N) = \sum_{j=0}^N c_j \varphi_j(k/N)$, $k = 0, \ldots, N$, can be solved (uniquely) for $c_0, \ldots, c_N$. (How?) To finish the proof, we need to show that $|x|$ can be approximated by polynomials on any interval $[a,b]$. (Why?)
21. Here's an elementary proof that there is a sequence of polynomials $(P_n)$ converging uniformly to $|x|$ on $[-1,1]$.
(a) Define $(P_n)$ recursively by $P_{n+1}(x) = P_n(x) + \bigl[ x - P_n(x)^2 \bigr]/2$, where $P_0(x) = 0$. Clearly, each $P_n$ is a polynomial.
(b) Check that $0 \le P_n(x) \le P_{n+1}(x) \le \sqrt{x}$ for $0 \le x \le 1$. Use Dini's theorem to conclude that $P_n(x) \rightrightarrows \sqrt{x}$ on $[0,1]$.
(c) $P_n(x^2)$ is also a polynomial, and $P_n(x^2) \rightrightarrows |x|$ on $[-1,1]$.
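The recursion in problem 21 converges slowly but steadily, and is easy to watch in a few lines of Python (the iteration counts and sample grid below are our own choices):

```python
# P_{n+1}(x) = P_n(x) + (x - P_n(x)^2)/2 converges to sqrt(x) on [0, 1],
# so P_n(x^2) converges to |x| on [-1, 1].
def P(n, x):
    p = 0.0
    for _ in range(n):
        p = p + (x - p * p) / 2
    return p

for n in (5, 20, 80):
    err = max(abs(abs(x) - P(n, x * x))
              for x in [-1 + k / 500 for k in range(1001)])
    print(n, err)   # the uniform error on [-1, 1] decreases as n grows
```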
22. The result in problem 19 (or 20) shows that the polynomials are dense in $C[0,1]$. Using the results in 18, conclude that the polynomials are also dense in $C[a,b]$.

23. How do we know that there are non-polynomial elements in $C[0,1]$? In other words, is it possible that every element of $C[0,1]$ agrees with some polynomial on $[0,1]$?

24. Let $(Q_n)$ be a sequence of polynomials of degree $m_n$, and suppose that $(Q_n)$ converges uniformly to $f$ on $[a,b]$, where $f$ is not a polynomial. Show that $m_n \to \infty$.

25. If $f \in C[-1,1]$ (or $C_{2\pi}$) is an even function, show that $f$ may be uniformly approximated by even polynomials (or even trig polynomials).
26. If $f \in C[0,1]$ and if $f(0) = f(1) = 0$, show that the sequence of polynomials
\[ \sum_{k=0}^n \Bigl[ \binom{n}{k} f\Bigl( \frac{k}{n} \Bigr) \Bigr] x^k (1-x)^{n-k} \]
with integer coefficients converges uniformly to $f$ (where $[x]$ denotes the greatest integer in $x$). The same trick works for any $f \in C[a,b]$ provided that $0 < a < b < 1$.

27. If $p$ is a polynomial and $\varepsilon > 0$, prove that there is a polynomial $q$ with rational coefficients such that $\|p - q\| < \varepsilon$ on $[0,1]$. Conclude that $C[0,1]$ is separable.

28. Let $(x_i)$ be a sequence of numbers in $(0,1)$ such that $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n x_i^k$ exists for every $k = 0, 1, 2, \ldots$. Show that $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n f(x_i)$ exists for every $f \in C[0,1]$.
Introduction
A (real) trigonometric polynomial, or trig polynomial for short, is a function of the form
\[ a_0 + \sum_{k=1}^n \bigl( a_k \cos kx + b_k \sin kx \bigr), \tag{$*$} \]
where $a_0, \ldots, a_n$ and $b_1, \ldots, b_n$ are real numbers. The degree of a trig polynomial is the highest frequency occurring in any representation of the form ($*$); thus, ($*$) has degree $n$ provided that one of $a_n$ or $b_n$ is nonzero. We will use $\mathcal{T}_n$ to denote the collection of trig polynomials of degree at most $n$, and $\mathcal{T}$ to denote the collection of all trig polynomials (i.e., the union of the $\mathcal{T}_n$'s).

It is convenient to take the space of all continuous $2\pi$-periodic functions on $\mathbb{R}$ as the containing space for $\mathcal{T}_n$; a space we denote by $C_{2\pi}$. The space $C_{2\pi}$ has several equivalent descriptions. For one, it's obvious that $C_{2\pi}$ is a subspace of $C(\mathbb{R})$, the space of all continuous functions on $\mathbb{R}$. But we might also consider $C_{2\pi}$ as a subspace of $C[0,2\pi]$ in the following way: The $2\pi$-periodic continuous functions on $\mathbb{R}$ may be identified with the set of functions $f \in C[0,2\pi]$ satisfying $f(0) = f(2\pi)$. Each such $f$ extends to a $2\pi$-periodic element of $C(\mathbb{R})$ in an obvious way, and it's not hard to see that the condition $f(0) = f(2\pi)$ defines a subspace of $C[0,2\pi]$. As a third description, it is often convenient to identify $C_{2\pi}$ with the collection $C(\mathbb{T})$, consisting of all the continuous real-valued functions on $\mathbb{T}$, where $\mathbb{T}$ is the unit circle in the complex plane $\mathbb{C}$. That is, we simply make the identification $f(x) = g(e^{ix})$, where $g \in C(\mathbb{T})$.

Our goal in this chapter is to prove what is sometimes called Weierstrass's second theorem (also from 1885).
Theorem (Weierstrass's Second Theorem, 1885). Let $f \in C_{2\pi}$. Then, for every $\varepsilon > 0$, there exists a trig polynomial $T$ such that $\|f - T\| < \varepsilon$.

Ultimately, we will give several different proofs of this theorem. Weierstrass gave a separate proof of this result in the same paper containing his theorem on approximation by algebraic polynomials, but it was later pointed out by Lebesgue (1898) that the two theorems are, in fact, equivalent. Lebesgue's proof is based on several elementary observations. We will outline these elementary facts as "exercises with hints," supplying a few proofs here and there, but leaving full details to the reader.
We first justify the use of the word "polynomial" in describing ($*$).

Lemma. $\cos nx$ and $\sin(n+1)x / \sin x$ can be written as polynomials of degree exactly $n$ in $\cos x$ for any integer $n \ge 0$.

Proof. Using the recurrence formula $\cos kx + \cos(k-2)x = 2 \cos(k-1)x \cos x$, it's not hard to see that $\cos 2x = 2\cos^2 x - 1$, $\cos 3x = 4\cos^3 x - 3\cos x$, and $\cos 4x = 8\cos^4 x - 8\cos^2 x + 1$. More generally, by induction, $\cos nx$ is a polynomial of degree $n$ in $\cos x$ with leading coefficient $2^{n-1}$. Using this fact and the identity $\sin(k+1)x - \sin(k-1)x = 2 \cos kx \sin x$ (along with another easy induction argument), it follows that $\sin(n+1)x$ can be written as $\sin x$ times a polynomial of degree $n$ in $\cos x$ with leading coefficient $2^n$.

Alternatively, notice that by writing $(i \sin x)^{2k} = (\cos^2 x - 1)^k$ we have
\[ \cos nx = \mathrm{Re}\bigl[ (\cos x + i \sin x)^n \bigr] = \mathrm{Re}\Bigl[ \sum_{k=0}^n \binom{n}{k} (i \sin x)^k \cos^{n-k} x \Bigr] = \sum_{k=0}^{[n/2]} \binom{n}{2k} (\cos^2 x - 1)^k \cos^{n-2k} x. \]
The coefficient of $\cos^n x$ in this expansion is then
\[ \sum_{k=0}^{[n/2]} \binom{n}{2k} = \frac{1}{2} \sum_{k=0}^{n} \binom{n}{k} = 2^{n-1}. \]
(All the binomial coefficients together sum to $(1+1)^n = 2^n$, but the even or odd terms taken separately sum to exactly half this amount, since $(1 + (-1))^n = 0$.)
Similarly,
\[ \sin(n+1)x = \mathrm{Im}\bigl[ (\cos x + i \sin x)^{n+1} \bigr] = \mathrm{Im}\Bigl[ \sum_{k=0}^{n+1} \binom{n+1}{k} (i \sin x)^k \cos^{n+1-k} x \Bigr] = \sum_{k=0}^{[n/2]} \binom{n+1}{2k+1} (\cos^2 x - 1)^k \cos^{n-2k} x \,\sin x, \]
where we've written $(i \sin x)^{2k+1} = i\,(\cos^2 x - 1)^k \sin x$. The coefficient of $\cos^n x \sin x$ is
\[ \sum_{k=0}^{[n/2]} \binom{n+1}{2k+1} = \frac{1}{2} \sum_{k=0}^{n+1} \binom{n+1}{k} = 2^n. \]
Corollary. Any trig polynomial ($*$) may be written as $P(\cos x) + Q(\cos x) \sin x$, where $P$ and $Q$ are algebraic polynomials of degree at most $n$ and $n-1$, respectively. If ($*$) represents an even function, then it can be written using only cosines.

Corollary. The collection $\mathcal{T}$, consisting of all trig polynomials, is both a subspace and a subring of $C_{2\pi}$ (that is, $\mathcal{T}$ is closed under both linear combinations and products). In other words, $\mathcal{T}$ is a subalgebra of $C_{2\pi}$.

It's not hard to see that the procedure we've described above can be reversed; that is, each algebraic polynomial in $\cos x$ and $\sin x$ can be written in the form ($*$). For example, $4 \cos^3 x = 3 \cos x + \cos 3x$. But, rather than duplicate our efforts, let's use a bit of linear algebra. First, the $2n+1$ functions
\[ A = \{\, 1,\ \cos x,\ \cos 2x,\ \ldots,\ \cos nx,\ \sin x,\ \sin 2x,\ \ldots,\ \sin nx \,\} \]
are linearly independent; the easiest way to see this is to notice that we may define an inner product on $C_{2\pi}$ under which these functions are orthogonal. Specifically,
\[ \langle f, g \rangle = \int_0^{2\pi} f(x)\,g(x)\,dx = 0, \qquad \langle f, f \rangle = \int_0^{2\pi} f(x)^2\,dx \ne 0, \]
for any pair of functions $f \ne g \in A$. (We'll pursue this direction in greater detail later in the course.) Second, we've shown that each element of $A$ lives in the space spanned by the $2n+1$ functions
\[ B = \{\, 1,\ \cos x,\ \cos^2 x,\ \ldots,\ \cos^n x,\ \sin x,\ \cos x \sin x,\ \ldots,\ \cos^{n-1} x \sin x \,\}. \]
That is,
\[ \mathcal{T}_n = \mathrm{span}\,A \subset \mathrm{span}\,B. \]
By comparing dimensions, we have
\[ 2n + 1 = \dim \mathcal{T}_n = \dim(\mathrm{span}\,A) \le \dim(\mathrm{span}\,B) \le 2n + 1, \]
and hence we must have $\mathrm{span}\,A = \mathrm{span}\,B$. The point here is that $\mathcal{T}_n$ is a finite-dimensional subspace of $C_{2\pi}$ of dimension $2n+1$, and we may use either one of these sets of functions as a basis for $\mathcal{T}_n$.
Before we leave these issues behind, let's summarize the situation for complex trig polynomials; i.e., the case where we allow complex coefficients in ($*$). Now it's clear that every trig polynomial ($*$), whether real or complex, can be written as
\[ \sum_{k=-n}^{n} c_k e^{ikx}, \tag{$**$} \]
where the $c_k$'s are complex; that is, a trig polynomial is actually a polynomial (over $\mathbb{C}$) in $z = e^{ix}$ and $\bar{z} = e^{-ix}$. Conversely, every polynomial ($**$) can be written in the form ($*$), using complex $a_k$'s and $b_k$'s. Thus, the complex trig polynomials of degree $n$ form a vector space of dimension $2n+1$ over $\mathbb{C}$ (hence of dimension $2(2n+1)$ when considered as a vector space over $\mathbb{R}$). But not every polynomial in $z$ and $\bar{z}$ represents a real trig polynomial. Rather, the real trig polynomials are the real parts of the complex trig polynomials. To see this, notice that ($**$) represents a real-valued function if and only if
\[ \sum_{k=-n}^{n} c_k e^{ikx} = \overline{\sum_{k=-n}^{n} c_k e^{ikx}} = \sum_{k=-n}^{n} \bar{c}_{-k}\,e^{ikx}; \]
that is, $c_k = \bar{c}_{-k}$ for each $k$. In particular, $c_0$ must be real, and hence
\begin{align*}
\sum_{k=-n}^{n} c_k e^{ikx} &= c_0 + \sum_{k=1}^{n} \bigl( c_k e^{ikx} + c_{-k} e^{-ikx} \bigr) \\
&= c_0 + \sum_{k=1}^{n} \bigl( c_k e^{ikx} + \bar{c}_k e^{-ikx} \bigr) \\
&= c_0 + \sum_{k=1}^{n} \bigl[ (c_k + \bar{c}_k) \cos kx + i (c_k - \bar{c}_k) \sin kx \bigr] \\
&= c_0 + \sum_{k=1}^{n} \bigl[ 2\,\mathrm{Re}(c_k) \cos kx - 2\,\mathrm{Im}(c_k) \sin kx \bigr],
\end{align*}
which is of the form ($*$) with $a_k$ and $b_k$ real.

Conversely, given any real trig polynomial ($*$), we have
\[ a_0 + \sum_{k=1}^{n} \bigl( a_k \cos kx + b_k \sin kx \bigr) = a_0 + \sum_{k=1}^{n} \Bigl( \frac{a_k - i b_k}{2}\,e^{ikx} + \frac{a_k + i b_k}{2}\,e^{-ikx} \Bigr), \]
which is of the form ($**$) with $c_k = \bar{c}_{-k}$ for each $k$.
It's time we returned to approximation theory! Since we've been able to identify $C_{2\pi}$ with a subspace of $C[0,2\pi]$, and since $\mathcal{T}_n$ is a finite-dimensional subspace of $C_{2\pi}$, we have

Corollary. Each $f \in C_{2\pi}$ has a best approximation (on all of $\mathbb{R}$) out of $\mathcal{T}_n$. If $f$ is an even function, then it has a best approximation which is also even.

Proof. We only need to prove the second claim, so suppose that $f \in C_{2\pi}$ is even and that $T^* \in \mathcal{T}_n$ satisfies
\[ \|f - T^*\| = \min_{T \in \mathcal{T}_n} \|f - T\|. \]
Then, since $f$ is even, $\widetilde{T}(x) = T^*(-x)$ is also a best approximation to $f$ out of $\mathcal{T}_n$; indeed,
\begin{align*}
\|f - \widetilde{T}\| &= \max_{x \in \mathbb{R}} |f(x) - T^*(-x)| \\
&= \max_{x \in \mathbb{R}} |f(-x) - T^*(x)| \\
&= \max_{x \in \mathbb{R}} |f(x) - T^*(x)| = \|f - T^*\|.
\end{align*}
Finally, $\frac{1}{2}\bigl( T^* + \widetilde{T} \bigr)$ is an even element of $\mathcal{T}_n$ and is again a best approximation to $f$. (Why?)
We next give (de la Vallée Poussin's version of) Lebesgue's proof of Weierstrass's second theorem; that is, we will deduce the second theorem from the first.
Theorem. Let $f \in C_{2\pi}$ and let $\varepsilon > 0$. Then, there is a trig polynomial $T$ such that $\|f - T\| = \max_{x \in \mathbb{R}} |f(x) - T(x)| < \varepsilon$.

Proof. We will prove that Weierstrass's first theorem for $C[-1,1]$ implies his second theorem for $C_{2\pi}$.

Step 1. If $f$ is even, then $f$ may be uniformly approximated by even trig polynomials.

If $f$ is even, then it's enough to approximate $f$ on the interval $[0,\pi]$. In this case, we may consider the function $g(y) = f(\arccos y)$, $-1 \le y \le 1$, in $C[-1,1]$. By Weierstrass's first theorem, there is an algebraic polynomial $p(y)$ such that
\[ \max_{-1 \le y \le 1} |f(\arccos y) - p(y)| = \max_{0 \le x \le \pi} |f(x) - p(\cos x)| < \varepsilon. \]
But $T(x) = p(\cos x)$ is an even trig polynomial! Hence,
\[ \|f - T\| = \max_{x \in \mathbb{R}} |f(x) - T(x)| < \varepsilon. \]
Let's agree to abbreviate $\|f - T\| < \varepsilon$ as $f \approx T$.

Step 2. Given $f \in C_{2\pi}$, there is a trig polynomial $T$ such that $2 f(x) \sin^2 x \approx T(x)$.

Each of the functions $f(x) + f(-x)$ and $[f(x) - f(-x)] \sin x$ is even. Thus, we may choose even trig polynomials $T_1$ and $T_2$ such that
\[ f(x) + f(-x) \approx T_1(x) \quad\text{and}\quad [f(x) - f(-x)] \sin x \approx T_2(x). \]
Multiplying the first expression by $\sin^2 x$, the second by $\sin x$, and adding, we get
\[ 2 f(x) \sin^2 x \approx T_1(x) \sin^2 x + T_2(x) \sin x \equiv T_3(x), \]
where $T_3(x)$ is still a trig polynomial, and where "$\approx$" now means "within $2\varepsilon$" (since $|\sin x| \le 1$).

Step 3. Given $f \in C_{2\pi}$, there is a trig polynomial $T$ such that $2 f(x) \cos^2 x \approx T(x)$, where "$\approx$" means "within $2\varepsilon$."

Repeat Step 2 for $f(x - \pi/2)$ and translate: We first choose a trig polynomial $T_4(x)$ such that
\[ 2 f\Bigl( x - \frac{\pi}{2} \Bigr) \sin^2 x \approx T_4(x). \]
That is, replacing $x$ by $x + \pi/2$,
\[ 2 f(x) \cos^2 x \approx T_5(x), \]
where $T_5(x) = T_4(x + \pi/2)$ is a trig polynomial.

Finally, by combining the conclusions of Steps 2 and 3, we find that there is a trig polynomial $T_6(x)$ such that $f \approx T_6(x)$, where, again, "$\approx$" means "within $2\varepsilon$."
Just for fun, let's complete the circle and show that Weierstrass's second theorem for $C_{2\pi}$ implies his first theorem for $C[-1,1]$. Since, as we'll see, it's possible to give an independent proof of the second theorem, this is a meaningful exercise.

Theorem. Given $f \in C[-1,1]$ and $\varepsilon > 0$, there exists an algebraic polynomial $p$ such that $\|f - p\| < \varepsilon$.

Proof. Given $f \in C[-1,1]$, the function $f(\cos x)$ is an even function in $C_{2\pi}$. By our Corollary to Weierstrass's second theorem, we may approximate $f(\cos x)$ by an even trig polynomial:
\[ f(\cos x) \approx a_0 + a_1 \cos x + a_2 \cos 2x + \cdots + a_n \cos nx. \]
But, as we've seen, $\cos kx$ can be written as an algebraic polynomial in $\cos x$. Hence, there is some algebraic polynomial $p$ such that $f(\cos x) \approx p(\cos x)$. That is,
\[ \max_{0 \le x \le \pi} |f(\cos x) - p(\cos x)| = \max_{-1 \le t \le 1} |f(t) - p(t)| < \varepsilon. \]
Written in terms of $x = \cos\theta$, the recurrence $\cos n\theta + \cos(n-2)\theta = 2 \cos(n-1)\theta \cos\theta$ now becomes
\[ T_n(x) = 2x\,T_{n-1}(x) - T_{n-2}(x), \quad n \ge 2, \]
where $T_0(x) = 1$ and $T_1(x) = x$. This recurrence relation (along with the initial cases $T_0$ and $T_1$) may be taken as a definition for the Chebyshev polynomials of the first kind. At any rate, it's now easy to list any number of the Chebyshev polynomials $T_n$; for example, the next few are $T_2(x) = 2x^2 - 1$, $T_3(x) = 4x^3 - 3x$, $T_4(x) = 8x^4 - 8x^2 + 1$, and $T_5(x) = 16x^5 - 20x^3 + 5x$.
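The recurrence makes it painless to generate these by machine. The Python sketch below (storing coefficients lowest degree first is a representation of our own choosing) reproduces the polynomials just listed and confirms the leading coefficient $2^{n-1}$:

```python
# Chebyshev polynomials via T_n = 2x T_{n-1} - T_{n-2};
# a polynomial is a list of coefficients [c0, c1, ...], lowest degree first.
def cheb(n):
    t_prev, t_cur = [1], [0, 1]          # T_0 = 1, T_1 = x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        shifted = [0] + [2 * c for c in t_cur]   # multiply T_cur by 2x
        t_next = [s - p for s, p in
                  zip(shifted, t_prev + [0] * (len(shifted) - len(t_prev)))]
        t_prev, t_cur = t_cur, t_next
    return t_cur

for n in range(6):
    print(n, cheb(n))   # the leading coefficient is 2**(n-1) for n >= 1
```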
Math 682 Problem Set: Trigonometric Polynomials 5/26/98

A (real) trigonometric polynomial, or trig polynomial for short, is a function of the form
\[ a_0 + \sum_{k=1}^n \bigl( a_k \cos kx + b_k \sin kx \bigr), \tag{$*$} \]
where $a_0, \ldots, a_n$ and $b_1, \ldots, b_n$ are real numbers. We will use $\mathcal{T}_n$ to denote the collection of trig polynomials of degree at most $n$, considered as a subspace of $C_{2\pi}$, the space of all continuous $2\pi$-periodic functions on $\mathbb{R}$. The space $C_{2\pi}$ may, in turn, be considered as a subspace of $C[0,2\pi]$. Indeed, the $2\pi$-periodic continuous functions on $\mathbb{R}$ may be identified with the subspace of $C[0,2\pi]$ consisting of those $f$'s which satisfy $f(0) = f(2\pi)$. As an alternate description, it is often convenient to instead identify $C_{2\pi}$ with the collection $C(\mathbb{T})$, consisting of all continuous real-valued functions on $\mathbb{T}$, where $\mathbb{T}$ is the unit circle in the complex plane $\mathbb{C}$. In this case, we simply make the identification $f(x) = g(e^{ix})$, where $g \in C(\mathbb{T})$.
41. (a) By using the recurrence formulas $\cos kx + \cos(k-2)x = 2 \cos(k-1)x \cos x$ and $\sin(k+1)x - \sin(k-1)x = 2 \cos kx \sin x$, show that each of the functions $\cos kx$ and $\sin(k+1)x / \sin x$ may be written as an algebraic polynomial of degree exactly $k$ in $\cos x$. In each case, what is the coefficient of $\cos^k x$?
(b) Equivalently, use the binomial formula to write the real and imaginary parts of $(\cos x + i \sin x)^n = \cos nx + i \sin nx$ as algebraic polynomials in $\cos x$ and $\sin x$. Again, what are the leading coefficients of these polynomials?
(c) If $P(x,y)$ is an algebraic polynomial (in two variables) of degree at most $n$, show that $P(\cos x, \sin x)$ may be written as $Q(\cos x) + R(\cos x) \sin x$, where $Q$ and $R$ are algebraic polynomials (in one variable) of degrees at most $n$ and $n-1$, respectively.
(d) Show that $\cos^n x$ can be written as a linear combination of the functions $\cos kx$, $k = 0, 1, \ldots, n$, and that $\cos^{n-1} x \sin x$ can be written as a linear combination of the functions $\sin kx$, $k = 1, \ldots, n$. Thus, each polynomial $P(\cos x, \sin x)$ in $\cos x$ and $\sin x$ can be written in the form ($*$).
(e) If ($*$) represents an even function, show that it can be written using only cosines. Conversely, if $P(x,y)$ is an even polynomial, show that $P(\cos x, \sin x)$ can be written using only cosines.
42. Show that $\mathcal{T}_n$ has dimension exactly $2n+1$ (as a vector space over $\mathbb{R}$).

43. We might also consider complex trig polynomials; that is, functions of the form ($*$) in which we now allow the $a_k$'s and $b_k$'s to be complex numbers.
(a) Show that every trig polynomial, whether real or complex, may be written as
\[ \sum_{k=-n}^{n} c_k e^{ikx}, \tag{$**$} \]
where the $c_k$'s are complex. Thus, complex trig polynomials are just algebraic polynomials in $z$ and $\bar{z}$, where $z = e^{ix} \in \mathbb{T}$.
(b) Show that ($**$) is real-valued if and only if $\bar{c}_k = c_{-k}$ for any $k$.
(c) If ($**$) is a real-valued function, show that it may be written as a real trig polynomial; that is, it may be written in the form ($*$) using only real coefficients.
Math 682 Characterization of Best Approximation 5/27/98

Lemma. The best approximation to $f \in C[a,b]$ by a constant is $p_0^* = \frac{1}{2} \bigl[ \max_{a \le x \le b} f(x) + \min_{a \le x \le b} f(x) \bigr]$, and
\[ E_0(f) = \frac{1}{2} \Bigl[ \max_{a \le x \le b} f(x) - \min_{a \le x \le b} f(x) \Bigr]. \]

Proof. Exercise.

Now all of this is meant as motivation for the general case, which essentially repeats the observation of our first Lemma inductively. A little experimentation will convince you that a best linear approximation, for example, would imply the existence of three points (at least) at which $f - p_1^*$ alternates between $\pm\|f - p_1^*\|$.
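The constant case is trivial to compute in practice. The Python lines below (the test function $e^x$ and the grid are our own choices) illustrate the formulas in the Lemma:

```python
import math

# Best constant approximation to f on [a, b]:
# p0 = (max f + min f)/2, with error E0 = (max f - min f)/2.
f = math.exp
a, b = 0.0, 1.0
grid = [a + (b - a) * k / 1000 for k in range(1001)]
hi, lo = max(map(f, grid)), min(map(f, grid))
p0, E0 = (hi + lo) / 2, (hi - lo) / 2
print(p0, E0)                              # (e+1)/2 and (e-1)/2
print(max(abs(f(x) - p0) for x in grid))   # equals E0
```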
A bit of notation will help us set up the argument for the general case: Given $g$ in $C[a,b]$, we'll say that $x \in [a,b]$ is a $(+)$ point for $g$ (respectively, a $(-)$ point for $g$) if $g(x) = \|g\|$ (respectively, $g(x) = -\|g\|$). A set of distinct points $a \le x_0 < x_1 < \cdots < x_n \le b$ will be called an alternating set for $g$ if the $x_i$'s are alternately $(+)$ points and $(-)$ points; that is, if
\[ |g(x_i)| = \|g\|, \quad i = 0, 1, \ldots, n, \]
and
\[ g(x_i) = -g(x_{i-1}), \quad i = 1, 2, \ldots, n. \]
Using this notation, we will be able to characterize the polynomial of best approximation. Since the following three theorems are particularly important, we will number them for future reference. Our first result is where all the fighting takes place:

Theorem 1. Let $f \in C[a,b]$, and suppose that $p = p_n^*$ is a best approximation to $f$ out of $\mathcal{P}_n$. Then, there is an alternating set for $f - p$ consisting of at least $n+2$ points.

Proof. If $f \in \mathcal{P}_n$, there's nothing to show. (Why?) Thus, we may suppose that $f \notin \mathcal{P}_n$ and, hence, that $E = E_n(f) = \|f - p\| > 0$.
Now consider the (uniformly) continuous function $\varphi = f - p$. We may partition $[a,b]$ by way of $a = t_0 < t_1 < \cdots < t_n = b$ into sufficiently small intervals so that
\[ |\varphi(s) - \varphi(t)| < E/2 \quad\text{whenever}\quad s, t \in [t_i, t_{i+1}]. \]
Here's why we'd want to do such a thing: If $[t_i, t_{i+1}]$ contains a $(+)$ point for $\varphi = f - p$, then $\varphi$ is positive on all of $[t_i, t_{i+1}]$. Indeed, if $\varphi(x_0) = E$ for some $x_0 \in [t_i, t_{i+1}]$, then any $x \in [t_i, t_{i+1}]$ satisfies $\varphi(x) > \varphi(x_0) - E/2 = E/2 > 0$. Similarly, if $[t_i, t_{i+1}]$ contains a $(-)$ point for $\varphi$, then $\varphi$ is negative on all of $[t_i, t_{i+1}]$. Consequently, no interval $[t_i, t_{i+1}]$ can contain both $(+)$ points and $(-)$ points.
Call $[t_i, t_{i+1}]$ a $(+)$ interval (respectively, a $(-)$ interval) if it contains a $(+)$ point (respectively, a $(-)$ point) for $\varphi = f - p$. Notice that no $(+)$ interval can even touch a $(-)$ interval. In other words, a $(+)$ interval and a $(-)$ interval must be strictly separated (by some interval containing a zero for $\varphi$).

We now relabel the $(+)$ and $(-)$ intervals from left to right, ignoring the "neither" intervals. There's no harm in supposing that the first "signed" interval is a $(+)$ interval. Thus, we suppose that our relabeled intervals are written
\[ I_1, \ldots, I_{k_1}\ (+); \quad I_{k_1+1}, \ldots, I_{k_2}\ (-); \quad \ldots; \quad I_{k_{m-1}+1}, \ldots, I_{k_m}\ \bigl( (-1)^{m+1} \bigr), \]
where $I_{k_1}$ is the last $(+)$ interval before we reach the first $(-)$ interval, $I_{k_1+1}$. And so on.

For later reference, we let $S$ denote the union of all the "signed" intervals $[t_i, t_{i+1}]$; that is, $S = \bigcup_{j=1}^{k_m} I_j$; and we let $N$ denote the union of all the "neither" intervals $[t_i, t_{i+1}]$. Thus, $S$ and $N$ are compact sets with $S \cup N = [a,b]$ (note that while $S$ and $N$ aren't quite disjoint, they are at least "non-overlapping": their interiors are disjoint).

Our goal here is to show that $m \ge n+2$. (So far we only know that $m \ge 2$!) Let's suppose that $m < n+2$ and see what goes wrong.
Since any (+) interval is strictly separated from any (−) interval, we can find points
z_1 < z_2 < ⋯ < z_{m−1} in N such that
max I_{k_1} < z_1 < min I_{k_1+1},
max I_{k_2} < z_2 < min I_{k_2+1},
⋮
max I_{k_{m−1}} < z_{m−1} < min I_{k_{m−1}+1}.
Now consider the polynomial q(x) = (z_1 − x)(z_2 − x) ⋯ (z_{m−1} − x).
Notice that q ∈ P_n since m − 1 ≤ n. (Here is the only use we'll make of the assumption
m < n + 2!) We're going to show that p + λq ∈ P_n is a better approximation to f than p,
for some suitable scalar λ > 0.
We first claim that q and f − p have the same sign. Indeed, q has no zeros in any
of the (±) intervals, hence is of constant sign on any such interval. Thus, q > 0 on
I_1, …, I_{k_1} because each (z_j − x) > 0 on these intervals; q < 0 on I_{k_1+1}, …, I_{k_2} because
here (z_1 − x) < 0, while (z_j − x) > 0 for j > 1; and so on.
We next find λ. Let e = max_{x∈N} |f(x) − p(x)|, where N is the union of all the subin-
tervals [t_i, t_{i+1}] which are neither (+) intervals nor (−) intervals. Then, e < E. (Why?)
Now choose λ > 0 so that λ‖q‖ < min{E − e, E/2}. We claim that p + λq is a better
approximation to f than p. One case is easy: If x ∈ N, then
while
−E ≤ (f − p)(x_i), (f − q)(x_i) ≤ E.
But this means that (f − p)(x_i) = (f − q)(x_i) = ±E
for each i. (Why?) That is, x_0, x_1, …, x_{n+1} is an alternating set for both f − p and f − q.
In particular, the polynomial q − p = (f − p) − (f − q) has n + 2 zeros! Since q − p ∈ P_n,
we must have p = q.
Finally, we come full circle:
Theorem 3. Let f ∈ C[a, b], and let p ∈ P_n. If f − p has an alternating set containing
n + 2 (or more) points, then p is the best approximation to f out of P_n.
Proof. Let x_0, x_1, …, x_{n+1} be an alternating set for f − p, and suppose that some q ∈ P_n
is a better approximation to f than p; that is, ‖f − q‖ < ‖f − p‖. In particular, then, we
must have
|f(x_i) − p(x_i)| = ‖f − p‖ > ‖f − q‖ ≥ |f(x_i) − q(x_i)|
for each i = 0, 1, …, n + 1. Now the inequality |a| > |b| implies that a and a − b have the
same sign (why?), hence q − p = (f − p) − (f − q) alternates in sign n + 2 times (because
f − p does). But then, q − p would have at least n + 1 zeros. Since q − p ∈ P_n, we must have
q = p, which is a contradiction. Thus, p is the best approximation to f out of P_n.
Example (taken from Rivlin)
While an alternating set for f − p_n is supposed to have at least n + 2 points, it may well
have more than n + 2 points; thus, alternating sets need not be unique. For example,
consider the function f(x) = sin 4x on [−π, π]. Since there are 8 points where f alternates
between ±1, it follows that p_0 = 0 and that there are 4 · 4 = 16 different alternating
sets consisting of exactly 2 points (not to mention all those with more than 2 points). In
addition, notice that we actually have p_1 = ⋯ = p_6 = 0, but that p_7 ≠ 0. (Why?)
Exercise
Show that y = x − 1/8 is the best linear approximation to y = x² on [0, 1].
Essentially repeating the proof given for Theorem 3 yields a lower bound for E_n(f).
Theorem. Let f ∈ C[a, b], and suppose that q ∈ P_n is such that f(x_i) − q(x_i) alternates
in sign at n + 2 points a ≤ x_0 < x_1 < ⋯ < x_{n+1} ≤ b. Then,
E_n(f) ≥ min_{0≤i≤n+1} |f(x_i) − q(x_i)|.
Proof. If the inequality fails, then the best approximation p = p_n would satisfy
|f(x_i) − p(x_i)| ≤ E_n(f) < |f(x_i) − q(x_i)| for each i.
Now we could repeat (essentially) the same argument used in the proof of Theorem 3 to
arrive at a contradiction. The details are left as an exercise.
Even for relatively simple functions, the problem of actually finding the polynomial
of best approximation is genuinely difficult (even computationally). We end this section
by stating two important problems that Chebyshev was able to solve.
Problem
Find the polynomial p_{n−1} ∈ P_{n−1}, of degree at most n − 1, that best approximates
f(x) = xⁿ on the interval [−1, 1]. (This particular choice of interval makes for a tidy
solution; we'll discuss the general situation later.)
Since p_{n−1} is to minimize max_{|x|≤1} |xⁿ − p_{n−1}(x)|, our first problem is equivalent to:
Problem
Find the monic polynomial of degree n which deviates least from 0 on [−1, 1]. In other
words, find the monic polynomial of degree n which has smallest norm in C[−1, 1].
We'll give two solutions to this problem (which we know has a unique solution, of
course). First, let's simplify our notation. We write
p(x) = xⁿ − p_{n−1}(x)  (the solution)
and
M = ‖p‖ = E_{n−1}(xⁿ; [−1, 1]).
All we know about p is that it has an alternating set −1 ≤ x_0 < x_1 < ⋯ < x_n ≤ 1
containing (n − 1) + 2 = n + 1 points; that is, |p(x_i)| = M and p(x_{i+1}) = −p(x_i) for all i.
Using this tiny bit of information, Chebyshev was led to compare the polynomials p² and
p′. Watch closely!
Step 1. At any x_i in (−1, 1), we must have p′(x_i) = 0 (because p(x_i) is a relative extreme
value for p). But, p′ is a polynomial of degree n − 1 and so can have at most n − 1 zeros.
Thus, we must have
x_i ∈ (−1, 1) and p′(x_i) = 0 for i = 1, …, n − 1
(in fact, x_1, …, x_{n−1} are all the zeros of p′) and
x_0 = −1, p′(x_0) ≠ 0;  x_n = 1, p′(x_n) ≠ 0.
Step 2. Now consider the polynomial M² − p² ∈ P_{2n}. We know that M² − (p(x_i))² = 0
for i = 0, 1, …, n, and that M² − p² ≥ 0 on [−1, 1]. Thus, x_1, …, x_{n−1} must be double
roots (at least) of M² − p². But this makes for 2(n − 1) + 2 = 2n roots already, so we
must have them all. Hence, x_1, …, x_{n−1} are double roots, x_0 and x_n are simple roots, and
these are all the roots of M² − p².
Step 3. Next consider (p′)² ∈ P_{2(n−1)}. We know that (p′)² has a double root at each
of x_1, …, x_{n−1} (and no other roots), hence (1 − x²)(p′(x))² has a double root at each of
x_1, …, x_{n−1}, and simple roots at x_0 and x_n. Since (1 − x²)(p′(x))² ∈ P_{2n}, we've found all
of its roots.
Here's the point to all this "rooting":
Step 4. Since M² − (p(x))² and (1 − x²)(p′(x))² are polynomials of the same degree with
the same roots, they are, up to a constant multiple, the same polynomial! It's easy to see
what constant, too: The leading coefficient of p is 1 while the leading coefficient of p′ is
n; thus,
M² − (p(x))² = (1 − x²)(p′(x))² / n².
After tidying up,
p′(x) / √(M² − (p(x))²) = n / √(1 − x²).
We really should have an extra ± here, but we know that p′ is positive on some interval;
we'll simply assume that it's positive on [−1, x_1]. Now, upon integrating,
arccos(p(x)/M) = n arccos x + C
or
p(x) = M cos(n arccos x + C).
But p(−1) = −M (because p′(−1) ≥ 0), so
cos(nπ + C) = −1 ⟹ C = mπ (with n + m odd)
⟹ p(x) = ±M cos(n arccos x)
⟹ p(cos θ) = ±M cos nθ.
Look familiar? Since we know that cos nθ is a polynomial of degree n in cos θ with leading
coefficient 2^{n−1} (the n-th Chebyshev polynomial T_n), and since p is monic (which fixes
the sign), the solution to our problem must be
p(x) = 2^{−n+1} T_n(x).
Since |T_n(x)| ≤ 1 for |x| ≤ 1 (why?), the minimum norm is M = 2^{−n+1}.
Next we give a "fancy" solution, based on our characterization of best approximations
(Theorem 3) and a few simple properties of the Chebyshev polynomials.
Theorem. For any n ≥ 1, the formula p(x) = xⁿ − 2^{−n+1} T_n(x) defines a polynomial
p ∈ P_{n−1} satisfying
2^{−n+1} = max_{|x|≤1} |xⁿ − p(x)| < max_{|x|≤1} |xⁿ − q(x)|
for any other q ∈ P_{n−1}.
Proof. We know that 2^{−n+1} T_n(x) has leading coefficient 1, and so p ∈ P_{n−1}. Now set
x_k = cos((n − k)π/n) for k = 0, 1, …, n. Then, −1 = x_0 < x_1 < ⋯ < x_n = 1 and
T_n(x_k) = T_n(cos((n − k)π/n)) = cos((n − k)π) = (−1)^{n−k}.
Since |T_n(x)| = |T_n(cos θ)| = |cos nθ| ≤ 1 for −1 ≤ x ≤ 1, we've found an alternating set
for T_n containing n + 1 points.
In other words, xⁿ − p(x) = 2^{−n+1} T_n(x) satisfies |xⁿ − p(x)| ≤ 2^{−n+1} and, for each
k = 0, 1, …, n, has x_kⁿ − p(x_k) = 2^{−n+1} T_n(x_k) = (−1)^{n−k} 2^{−n+1}. By our characterization
of best approximations (Theorem 3), p must be the best approximation to xⁿ out of
P_{n−1}.
Corollary. The monic polynomial of degree exactly n having smallest norm in C[a, b] is
((b − a)ⁿ / 2^{2n−1}) T_n((2x − b − a)/(b − a)).
Proof. Exercise. [Hint: If p(x) is a polynomial of degree n with leading coefficient 1,
then p̃(x) = p((2x − b − a)/(b − a)) is a polynomial of degree n with leading coefficient
2ⁿ/(b − a)ⁿ. Moreover, max_{a≤x≤b} |p̃(x)| = max_{−1≤x≤1} |p(x)|.]
Properties of the Chebyshev Polynomials
As we've seen, the Chebyshev polynomial T_n(x) is the (unique, real) polynomial of degree
n (having leading coefficient 1 if n = 0, and 2^{n−1} if n ≥ 1) such that T_n(cos θ) = cos nθ
for all θ. The Chebyshev polynomials have dozens of interesting properties and satisfy all
sorts of curious equations. We'll catalogue just a few.
C1. T_n(x) = 2x T_{n−1}(x) − T_{n−2}(x) for n ≥ 2.
Proof. It follows from the trig identity cos nθ = 2 cos θ cos(n − 1)θ − cos(n − 2)θ that
T_n(cos θ) = 2 cos θ T_{n−1}(cos θ) − T_{n−2}(cos θ) for all θ; that is, the equation T_n(x) =
2x T_{n−1}(x) − T_{n−2}(x) holds for all −1 ≤ x ≤ 1. But since both sides are polynomials,
equality must hold for all x.
The next two properties are proved in essentially the same way:
C2. T_m(x) T_n(x) = (1/2) [T_{m+n}(x) + T_{m−n}(x)] for m > n.
C3. T_m(T_n(x)) = T_{mn}(x).
C4. T_n(x) = (1/2) [(x + √(x² − 1))ⁿ + (x − √(x² − 1))ⁿ].
Proof. First notice that the expression on the right-hand side is actually a polynomial
since, on combining the binomial expansions of (x + √(x² − 1))ⁿ and (x − √(x² − 1))ⁿ, the
odd powers of √(x² − 1) cancel. Next, for x = cos θ,
T_n(x) = T_n(cos θ) = cos nθ = (1/2)(e^{inθ} + e^{−inθ})
= (1/2) [(cos θ + i sin θ)ⁿ + (cos θ − i sin θ)ⁿ]
= (1/2) [(x + i√(1 − x²))ⁿ + (x − i√(1 − x²))ⁿ]
= (1/2) [(x + √(x² − 1))ⁿ + (x − √(x² − 1))ⁿ].
We've shown that these two polynomials agree for |x| ≤ 1, hence they must agree for all x
(real or complex, for that matter).
For real x with |x| ≥ 1, the expression (1/2)[(x + √(x² − 1))ⁿ + (x − √(x² − 1))ⁿ] equals
cosh(n cosh^{−1} x). In other words,
C5. T_n(cosh x) = cosh nx for all real x.
The next property also follows from property C4.
C6. |T_n(x)| ≤ (|x| + √(x² − 1))ⁿ for |x| ≥ 1.
An approach similar to the proof of property C4 allows us to write xⁿ in terms of the
Chebyshev polynomials T_0, T_1, …, T_n.
C7. 2ⁿxⁿ = 2 Σ_{k=0}^{[n/2]} (n choose k) T_{n−2k}(x) for n odd; for n even, the final term
2 (n choose n/2) T_0 should be replaced by (n choose n/2) T_0.
Proof. For −1 ≤ x ≤ 1,

Proof. We may write f(x) = R cos m(x − x_0) for some R and x_0. (How?) Now we need
only display a sufficiently large alternating set for f (in some interval of length 2π).
Setting x_k = x_0 + kπ/m, k = 1, 2, …, 2m, we get f(x_k) = R cos kπ = R(−1)^k and
x_k ∈ (x_0, x_0 + 2π]. Since m > n, it follows that 2m ≥ 2n + 2.
Example
The best approximation to
f(x) = a_0 + Σ_{k=1}^{n+1} (a_k cos kx + b_k sin kx)
out of T_n is
T(x) = a_0 + Σ_{k=1}^{n} (a_k cos kx + b_k sin kx),
and ‖f − T‖ = √(a_{n+1}² + b_{n+1}²) in C^{2π}.
Finally, let's make a simple connection between the two types of polynomial approxi-
mation:
Theorem. Let f ∈ C[−1, 1] and define φ ∈ C^{2π} by φ(θ) = f(cos θ). Then,
E_n(f) = min_{p∈P_n} ‖f − p‖ = min_{T∈T_n} ‖φ − T‖ ≡ E_n^T(φ).
Proof. Suppose that p*(x) = Σ_{k=0}^{n} a_k x^k is the best approximation to f out of P_n.
Then, T̂(θ) = p*(cos θ) is in T_n and, clearly,

We've shown that cos nθ and sin(n + 1)θ / sin θ can be written as algebraic polynomials
of degree n in cos θ; we use this observation to define the Chebyshev polynomials. The
Chebyshev polynomials of the first kind (T_n(x)) are defined by T_n(cos θ) = cos nθ, for
n = 0, 1, 2, …, while the Chebyshev polynomials of the second kind (U_n(x)) are defined
by U_n(cos θ) = sin(n + 1)θ / sin θ for n = 0, 1, 2, ….
44. Establish the following properties of T_n(x).
(i) T_0(x) = 1, T_1(x) = x, and T_n(x) = 2x T_{n−1}(x) − T_{n−2}(x) for n ≥ 2.
(ii) T_n(x) is a polynomial of degree n having leading coefficient 2^{n−1} for n ≥ 1, and
containing only even (resp., odd) powers of x if n is even (resp., odd).
(iii) |T_n(x)| ≤ 1 for −1 ≤ x ≤ 1; when does equality occur? Where are the zeros of
T_n(x)? Show that between two consecutive zeros of T_n(x) there is exactly one
zero of T_{n−1}(x). Can T_n(x) and T_{n−1}(x) have a common zero?
(iv) |T_n′(x)| ≤ n² for −1 ≤ x ≤ 1, and |T_n′(±1)| = n².
(v) T_m(x) T_n(x) = (1/2) [T_{m+n}(x) + T_{m−n}(x)] for m > n.
(vi) T_m(T_n(x)) = T_{mn}(x).
(vii) Evaluate ∫_{−1}^{1} T_n(x) T_m(x) dx / √(1 − x²).
(viii) Show that T_n is a solution to (1 − x²)y″ − xy′ + n²y = 0.
(ix) T_n(x) = (1/2)[(x + √(x² − 1))ⁿ + (x − √(x² − 1))ⁿ] for any x, real or complex.
(x) Re Σ_{n=0}^{∞} tⁿ e^{inθ} = Σ_{n=0}^{∞} tⁿ cos nθ = (1 − t cos θ)/(1 − 2t cos θ + t²) for −1 < t < 1;
that is, Σ_{n=0}^{∞} tⁿ T_n(x) = (1 − tx)/(1 − 2tx + t²) (this is a generating function
for the T_n; it's closely related to the Poisson kernel).
(xi) Find analogues of (i)–(x) (if possible) for U_n(x).
45. Show that every p ∈ P_n has a unique representation as p = a_0 + a_1T_1 + ⋯ + a_nT_n.
Find this representation in the case p(x) = xⁿ.
46. The polynomial of degree n having leading coefficient 1 and deviating least from 0 on
[−1, 1] is given by T_n(x)/2^{n−1}. On an arbitrary interval [a, b] we would instead take
((b − a)ⁿ / 2^{2n−1}) T_n((2x − b − a)/(b − a)).
Is this solution unique? Explain.
47. If p is a polynomial on [a, b] of degree n having leading coefficient a_n > 0, then
‖p‖ ≥ a_n(b − a)ⁿ / 2^{2n−1}. If b − a ≥ 4, then no polynomial of degree exactly n with
integer coefficients can satisfy ‖p‖ < 2 (compare this with problem 26 on the "Uniform
Approximation by Polynomials" problem set).
48. Given p ∈ P_n, show that |p(x)| ≤ ‖p‖ |T_n(x)| for |x| > 1.
49. If p ∈ P_n with ‖p‖ = 1 on [−1, 1], and if |p(x_i)| = 1 at n + 1 distinct points x_0, …, x_n
in [−1, 1], show that either p ≡ ±1, or else p = ±T_n. [Hint: One approach is to
compare the polynomials 1 − p² and (1 − x²)(p′)².]
50. Compute T_n^{(k)}(1) for k = 0, 1, …, n, where T_n^{(k)} is the k-th derivative of T_n. For x ≥ 1
and k = 0, 1, …, n, show that T_n^{(k)}(x) > 0.
Math 682 Examples: Chebyshev Polynomials in Practice 5/28/98
The following examples are cribbed from the book Chebyshev Polynomials, by L. Fox and
I. B. Parker (Oxford University Press, 1968).
As we've seen, the Chebyshev polynomials can be generated by a recurrence relation. By
reversing the procedure, we could solve for xⁿ in terms of T_0, T_1, …, T_n (we'll do this
calculation in class). Here are the first few terms in each of these relations:
T_0(x) = 1                      1 = T_0(x)
T_1(x) = x                      x = T_1(x)
T_2(x) = 2x² − 1                x² = (T_0(x) + T_2(x))/2
T_3(x) = 4x³ − 3x               x³ = (3T_1(x) + T_3(x))/4
T_4(x) = 8x⁴ − 8x² + 1          x⁴ = (3T_0(x) + 4T_2(x) + T_4(x))/8
T_5(x) = 16x⁵ − 20x³ + 5x       x⁵ = (10T_1(x) + 5T_3(x) + T_5(x))/16
Note the separation of even and odd terms in each case. Writing ordinary, garden-variety
polynomials in their equivalent Chebyshev form has some distinct advantages for numerical
computations. Here's why:
1 − x + x² − x³ + x⁴ = (15/8) T_0(x) − (7/4) T_1(x) + T_2(x) − (1/4) T_3(x) + (1/8) T_4(x)
(after some simplification). Now we see at once that we can get a cubic approximation to
1 − x + x² − x³ + x⁴ on [−1, 1] with error at most 1/8 by simply dropping the T_4 term on
the right-hand side (since |T_4(x)| ≤ 1), whereas simply using 1 − x + x² − x³ as our cubic
approximation could cause an error as big as 1. Pretty slick! This gimmick of truncating
the equivalent Chebyshev form is called economization.
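Here is a quick numerical check of this example; the use of numpy.polynomial below is our own convenience, not part of the notes:

```python
# Sketch: economization of 1 - x + x^2 - x^3 + x^4 on [-1, 1].
import numpy as np
from numpy.polynomial import polynomial as P, chebyshev as C

p = [1.0, -1.0, 1.0, -1.0, 1.0]       # 1 - x + x^2 - x^3 + x^4 (low order first)
c = C.poly2cheb(p)                     # [15/8, -7/4, 1, -1/4, 1/8]
cubic = C.cheb2poly(c[:4])             # drop the T_4 term: error at most 1/8

xs = np.linspace(-1.0, 1.0, 1001)
print(np.max(np.abs(P.polyval(xs, np.array(p)) - P.polyval(xs, cubic))))
# about 0.125, versus an error of 1 if we had merely dropped the x^4 term
```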
As a second example we note that a polynomial with small norm on [−1, 1] may have
annoyingly large coefficients:
Our goal in this section is to prove the following result (as well as discuss its ramifications).
In fact, this result is so fundamental that we will present three proofs!
Theorem. Let x_0, x_1, …, x_n be distinct points, and let y_0, y_1, …, y_n be arbitrary points
in R. Then, there exists a unique polynomial p ∈ P_n satisfying p(x_i) = y_i, i = 0, 1, …, n.
First notice that uniqueness is obvious. Indeed, if two polynomials p, q ∈ P_n agree at
n + 1 points, then p ≡ q. (Why?) The real work comes in proving existence.
First Proof. (Vandermonde's determinant.) We seek c_0, c_1, …, c_n so that p(x) =
Σ_{k=0}^{n} c_k x^k satisfies
p(x_i) = Σ_{k=0}^{n} c_k x_i^k = y_i, i = 0, 1, …, n.
That is, we need to solve a system of n + 1 linear equations for the c_i's. In matrix form:
[ 1   x_0   x_0²   ⋯   x_0ⁿ ] [ c_0 ]   [ y_0 ]
[ 1   x_1   x_1²   ⋯   x_1ⁿ ] [ c_1 ] = [ y_1 ]
[ ⋮    ⋮     ⋮     ⋱    ⋮   ] [  ⋮  ]   [  ⋮  ]
[ 1   x_n   x_n²   ⋯   x_nⁿ ] [ c_n ]   [ y_n ]
This equation always has a unique solution because the coefficient matrix has determinant
D = Π_{0≤j<i≤n} (x_i − x_j) ≠ 0.
(D is called Vandermonde's determinant; note that D > 0 if x_0 < x_1 < ⋯ < x_n.) Since
this fact is of independent interest, we'll sketch a short proof below.
Lemma. D = Π_{0≤j<i≤n} (x_i − x_j).
Proof. Consider the determinant
V(x_0, x_1, …, x_{n−1}, x) = det
[ 1   x_0       x_0²       ⋯   x_0ⁿ     ]
[ 1   x_1       x_1²       ⋯   x_1ⁿ     ]
[ ⋮    ⋮         ⋮              ⋮       ]
[ 1   x_{n−1}   x_{n−1}²   ⋯   x_{n−1}ⁿ ]
[ 1   x         x²         ⋯   xⁿ       ].
V(x_0, x_1, …, x_{n−1}, x) is a polynomial of degree n in x, and it's 0 whenever x = x_i, i =
0, 1, …, n − 1. Thus, V(x_0, …, x) = c Π_{i=0}^{n−1} (x − x_i), by comparing roots and degree.
However, it's easy to see that the coefficient of xⁿ in V(x_0, …, x) is V(x_0, …, x_{n−1}).
Thus, V(x_0, …, x) = V(x_0, …, x_{n−1}) Π_{i=0}^{n−1} (x − x_i). The result now follows by induction
and the obvious case det [ 1 x_0 ; 1 x_1 ] = x_1 − x_0.
Second Proof. (Lagrange interpolation.) We could define p immediately if we had
polynomials ℓ_i(x) ∈ P_n, i = 0, …, n, such that ℓ_i(x_j) = δ_{ij} (where δ_{ij} is Kronecker's
delta; that is, δ_{ij} = 0 for i ≠ j, and δ_{ij} = 1 for i = j). Indeed, p(x) = Σ_{i=0}^{n} y_i ℓ_i(x)
would then work as our interpolating polynomial. In short, notice that the polynomials
{ℓ_0, ℓ_1, …, ℓ_n} would form a (particularly convenient) basis for P_n.
We'll give two formulas for ℓ_i(x):
(a). Clearly, ℓ_i(x) = Π_{j≠i} (x − x_j)/(x_i − x_j) works.
(b). Start with W(x) = (x − x_0)(x − x_1) ⋯ (x − x_n), and notice that the polynomial we
need satisfies
ℓ_i(x) = a_i W(x)/(x − x_i)
for some a_i ∈ R. (Why?) But then, 1 = ℓ_i(x_i) = a_i W′(x_i) (again, why?); that is,
ℓ_i(x) = W(x) / [(x − x_i) W′(x_i)].
Please note that ℓ_i(x) is a multiple of the polynomial Π_{j≠i} (x − x_j), for i = 0, …, n, and
that p(x) is then a suitable linear combination of such polynomials.
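Formula (a) translates directly into code. The following sketch (ours, with hypothetical helper names) evaluates p(x) = Σ_i y_i ℓ_i(x):

```python
# Sketch: Lagrange interpolation via formula (a).

def lagrange_interpolate(xs, ys):
    """Return the function p with p(x_j) = y_j, built from the basis l_i."""
    def p(t):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            li = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    li *= (t - xj) / (xi - xj)  # l_i(t) = prod_{j != i} (t - x_j)/(x_i - x_j)
            total += yi * li
        return total
    return p

p = lagrange_interpolate([0.0, 1.0, 2.0], [1.0, 3.0, 2.0])
print(p(0.0), p(1.0), p(2.0))   # 1.0 3.0 2.0
```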
Third Proof. (Newton's formula.) We seek p(x) of the form
p(x) = a_0 + a_1(x − x_0) + a_2(x − x_0)(x − x_1) + ⋯ + a_n(x − x_0)(x − x_1) ⋯ (x − x_{n−1}).
(Please note that x_n does not appear on the right-hand side.) This form makes it almost
effortless to solve for the a_i's by plugging in the x_i's, i = 0, …, n:
y_0 = p(x_0) = a_0
y_1 = p(x_1) = a_0 + a_1(x_1 − x_0) ⟹ a_1 = (y_1 − a_0)/(x_1 − x_0).
Continuing, we find
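The computation begun here is, in modern language, the table of divided differences. A small sketch (ours; the notes stop at a_1) that computes and evaluates the Newton form:

```python
# Sketch: Newton's form via divided differences, plus nested (Horner-like) evaluation.

def newton_coeffs(xs, ys):
    """Return a_0, ..., a_n with p(x) = a_0 + a_1(x - x_0) + ... + a_n prod_{i<n}(x - x_i)."""
    a = list(ys)
    for level in range(1, len(xs)):
        for i in range(len(xs) - 1, level - 1, -1):
            a[i] = (a[i] - a[i - 1]) / (xs[i] - xs[i - level])
    return a

def newton_eval(a, xs, t):
    result = a[-1]
    for i in range(len(a) - 2, -1, -1):
        result = result * (t - xs[i]) + a[i]
    return result

xs, ys = [0.0, 1.0, 2.0], [1.0, 3.0, 2.0]
a = newton_coeffs(xs, ys)
print([newton_eval(a, xs, t) for t in xs])   # [1.0, 3.0, 2.0]
```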
L_i(f) = y_i, i = 1, …, n     (∗)
always has a (unique) solution f ∈ X for any choice of scalars y_1, …, y_n if and only if
L_1, …, L_n are linearly independent.
Proof. Let f_1, …, f_n be a basis for X. Then, (∗) is equivalent to the system of equations
a_1L_1(f_1) + ⋯ + a_nL_1(f_n) = y_1
a_1L_2(f_1) + ⋯ + a_nL_2(f_n) = y_2
⋮                                (∗∗)
a_1L_n(f_1) + ⋯ + a_nL_n(f_n) = y_n
by taking f = a_1f_1 + ⋯ + a_nf_n. Thus, (∗) always has a solution if and only if (∗∗) always
has a solution if and only if [L_i(f_j)] is nonsingular if and only if L_1, …, L_n are linearly
independent. In any of these cases, note that the solution must be unique.
In the case of Lagrange interpolation, X = P_n and L_i is evaluation at x_i; i.e., L_i(f) =
f(x_i), which is easily seen to be linear in f. Moreover, L_0, …, L_n are linearly independent
provided that x_0, …, x_n are distinct. (Why?)
In the case of Hermite interpolation, the linear functionals are of the form L_{x,k}(f) =
f^{(k)}(x), differentiation composed with a point evaluation. If x ≠ y, then L_{x,k} and L_{y,m}
are independent for any k and m; if k ≠ m, then L_{x,k} and L_{x,m} are independent. (How
would you check this?)
Math 682 Problem Set: Lagrange Interpolation 6/2/98
(2) Argue that this process converges (in some sense) to the best approximation on all
of [a, b] provided that X_m "gets big" as m → ∞. In actual practice, there's no need
to worry about p_n(X_m) converging to p_n (the best approximation on all of [a, b]);
rather, we will argue that E_n(f; X_m) → E_n(f) and appeal to "abstract nonsense."
(3) Find an efficient strategy for carrying out items (1) and (2).
Observations
1. If m ≤ n + 1, then E_n(f; X_m) = 0. That is, we can always find a polynomial p ∈ P_n
that agrees with f at n + 1 (or fewer) points. (How?) Of course, p won't be unique if
m < n + 1. (Why?) In any case, we might as well assume that m ≥ n + 2. In fact, as
we'll see, the case m = n + 2 is all that we really need to worry about.
2. If X ⊂ Y ⊂ [a, b], then E_n(f; X) ≤ E_n(f; Y) ≤ E_n(f). Indeed, if p ∈ P_n is the best
approximation on Y, then
Next let's see how this reduces our study to the case m = n + 2.
So, by uniqueness of best approximations on X_{n+2}, we must have p_n = p_n(X_{n+2}) and
E_n(f) = E_n(f; X_{n+2}). The second assertion follows from a similar argument using the
uniqueness of p_n on [a, b].
(ii): This is just (i) with [a, b] replaced everywhere by X_m.
Here's the point: Through some as yet undisclosed method, we choose X_m with
m ≥ n + 2 (in fact, m >> n + 2) such that E_n(f; X_m) ≤ E_n(f) ≤ E_n(f; X_m) + ε, and
then we search for the "best" X_{n+2} ⊂ X_m, meaning the largest value of E_n(f; X_{n+2}). We
then take p_n(X_{n+2}) as an approximation for p_n. As we'll see momentarily, p_n(X_{n+2}) can
be computed directly and explicitly.
Now suppose that the elements of X_{n+2} are a ≤ x_0 < x_1 < ⋯ < x_{n+1} ≤ b, let
p = p_n(X_{n+2}) be p(x) = a_0 + a_1x + ⋯ + a_nxⁿ, and let
E = E_n(f; X_{n+2}) = max_{0≤i≤n+1} |f(x_i) − p(x_i)|.
In order to compute p and E, we use the fact that f(x_i) − p(x_i) = ±E, alternately, and
write (for instance)
f(x_0) = E + p(x_0)
f(x_1) = −E + p(x_1)
⋮
f(x_{n+1}) = (−1)^{n+1} E + p(x_{n+1})
(where the "E column" might, instead, read −E, E, …, (−1)ⁿ E). That is, in order to
find p and E, we need to solve a system of n + 2 linear equations in the n + 2 unknowns
E, a_0, …, a_n. The determinant of this system is (up to sign)
det
[  1           1   x_0       ⋯   x_0ⁿ     ]
[ −1           1   x_1       ⋯   x_1ⁿ     ]
[  ⋮           ⋮    ⋮             ⋮       ]
[ (−1)^{n+1}   1   x_{n+1}   ⋯   x_{n+1}ⁿ ]
= A_0 + A_1 + ⋯ + A_{n+1} > 0,
where we have expanded by cofactors along the first column and have used the fact that
each minor A_k is a Vandermonde determinant (and hence each A_k > 0). If we apply
Cramer's rule to find E we get
X_4^(1) = {0, 1/3, 2/3},  X_4^(2) = {0, 1/3, 1},  X_4^(3) = {0, 2/3, 1},  X_4^(4) = {1/3, 2/3, 1}.
In each case we find a p and a λ (= ±E in our earlier setup). For instance, in the case of
X_4^(2) we would solve the system of equations f(x) = ±λ + p(x) for x = 0, 1/3, 1:
0 = λ^(2) + a_0
1/9 = −λ^(2) + a_0 + (1/3)a_1        ⟹ λ^(2) = 1/9, a_0 = −1/9, a_1 = 1.
1 = λ^(2) + a_0 + a_1
In the other three cases you would find that λ^(1) = 1/18, λ^(3) = 1/9, and λ^(4) = 1/18.
Since we need the largest λ, we're done: X_4^(2) (or X_4^(3)) works, and p_1(X_4)(x) = x − 1/9.
(Recall that the best approximation on all of [0, 1] is p_1(x) = x − 1/8.)
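The little system above is easy to solve mechanically. Here is a sketch (ours; numpy is a convenience) that reproduces λ^(2) = 1/9:

```python
# Sketch: best approximation on a reference of n + 2 points, by solving
# f(x_i) = (-1)^i * lam + p(x_i) for lam, a_0, ..., a_n.
import numpy as np

def best_on_reference(f, X, n):
    X = np.asarray(X, dtype=float)
    M = np.zeros((n + 2, n + 2))
    M[:, 0] = [(-1.0) ** i for i in range(n + 2)]   # the "lambda column"
    for k in range(n + 1):
        M[:, k + 1] = X ** k                         # 1, x, ..., x^n
    sol = np.linalg.solve(M, f(X))
    return sol[1:], sol[0]                           # coefficients a_k, and lam

coeffs, lam = best_on_reference(lambda x: x ** 2, [0.0, 1.0 / 3.0, 1.0], 1)
print(coeffs, lam)    # approximately a_0 = -1/9, a_1 = 1, lam = 1/9
```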
Where does this leave us? We still need to know that there is some hope of finding an
initial set X_m with E_n(f) − ε ≤ E_n(f; X_m) ≤ E_n(f), and we need a more efficient means
of searching through the (m choose n+2) subsets X_{n+2} ⊂ X_m. In order to attack the
problem of finding an initial X_m, we'll need a few classical inequalities. We won't directly
attack the second problem; instead, we'll outline an algorithm that begins with an initial set
X^0_{n+2}, containing exactly n + 2 points, which is then "improved" to some X^1_{n+2} by
changing only a single point.
Proof. We know that the Lagrange interpolation formula is exact for polynomials of
degree < n, and we know that, up to a constant multiple, T_n(x) is the product W(x) =
(x − x_1) ⋯ (x − x_n). All that remains is to compute T_n′(x_i). But recall that for x = cos θ
we have
T_n′(x) = n sin nθ / sin θ = n sin nθ / √(1 − cos²θ) = n sin nθ / √(1 − x²).
But for x_i = cos((2i − 1)π/2n), i.e., for θ_i = (2i − 1)π/2n, it follows that sin nθ_i =
sin((2i − 1)π/2) = (−1)^{i−1}; that is,
1 / T_n′(x_i) = (−1)^{i−1} √(1 − x_i²) / n.
Lemma 2. For any polynomial p ∈ P_{n−1}, we have
max_{−1≤x≤1} |p(x)| ≤ n · max_{−1≤x≤1} √(1 − x²) |p(x)|.

because sin θ ≥ 2θ/π for 0 ≤ θ ≤ π/2 (from the mean value theorem). Hence, writing
M = n max_{−1≤x≤1} √(1 − x²)|p(x)|, we get, for |x| ≤ cos(π/2n),
|p(x)| ≤ n√(1 − x²) |p(x)| ≤ M.
Now, for x's outside the interval [x_n, x_1], we apply our interpolation formula. In this
case, each of the factors x − x_i is of the same sign. Thus,
|p(x)| = | (1/n) Σ_{i=1}^{n} p(x_i) (−1)^{i−1} √(1 − x_i²) T_n(x)/(x − x_i) |
≤ (M/n²) Σ_{i=1}^{n} | T_n(x)/(x − x_i) | = (M/n²) | Σ_{i=1}^{n} T_n(x)/(x − x_i) |.
But,
Σ_{i=1}^{n} T_n(x)/(x − x_i) = T_n′(x)  (why?)
and we know that |T_n′(x)| ≤ n². Thus, |p(x)| ≤ M.
We next turn our attention to trig polynomials. As usual, given an algebraic poly-
nomial p ∈ P_n, we will sooner or later consider S(θ) = p(cos θ). In this case, S′(θ) =
−p′(cos θ) sin θ is an odd trig polynomial of degree at most n and |S′(θ)| = |p′(cos θ) sin θ| =
|p′(x) √(1 − x²)|. Conversely, if S ∈ T_n is an odd trig polynomial, then S(θ)/sin θ is even,
and so may be written S(θ)/sin θ = p(cos θ) for some algebraic polynomial p of degree at
most n − 1. From Lemma 2,
max_{0≤θ≤2π} |S(θ)/sin θ| = max_{0≤θ≤2π} |p(cos θ)| ≤ n max_{0≤θ≤2π} |p(cos θ) sin θ| = n max_{0≤θ≤2π} |S(θ)|.
This proves
Lemma. If S ∈ T_n is an odd trig polynomial, then max_θ |S(θ)/sin θ| ≤ n max_θ |S(θ)|.
This, in turn, yields Bernstein's inequality:
Theorem. If S ∈ T_n, then max_θ |S′(θ)| ≤ n max_θ |S(θ)|.
Proof. We first fix θ and define an auxiliary function f(t) = [S(θ + t) − S(θ − t)]/2. For
θ fixed, f(t) is an odd trig polynomial in t of degree at most n. Consequently,
max_t |f(t)/sin t| ≤ n max_t |f(t)| ≤ n max_{0≤θ≤2π} |S(θ)|.
But
S′(θ) = lim_{t→0} [S(θ + t) − S(θ − t)]/(2t) = lim_{t→0} f(t)/sin t,
and hence |S′(θ)| ≤ n max_{0≤θ≤2π} |S(θ)|.
Since S′(θ) = −p′(cos θ) sin θ is also a trig polynomial of degree at most n, Bernstein's in-
equality yields
max_{0≤θ≤2π} |p′(cos θ) sin θ| ≤ n max_{0≤θ≤2π} |p(cos θ)|.
In other words,
max_{−1≤x≤1} |p′(x)| √(1 − x²) ≤ n max_{−1≤x≤1} |p(x)|.
Since p′ ∈ P_{n−1}, the desired inequality now follows easily from Lemma 2:
max_{−1≤x≤1} |p′(x)| ≤ n max_{−1≤x≤1} |p′(x)| √(1 − x²) ≤ n² max_{−1≤x≤1} |p(x)|.
δ_m = max_{x∈I} min_{1≤i≤m} |x − x_i| > 0
Now we're ready to compare E_n(f; X_m) to E_n(f). Our result won't be as good as
Rivlin's (he uses a fancier version of Markov's inequality), but it will be a bit easier to
prove. As in the Lemma, we'll suppose that
γ_m ≡ 2δ_m n² < 1,
and we'll set
Θ_m = γ_m / (1 − γ_m).
[Note that as δ_m → 0 we also have γ_m → 0 and Θ_m → 0.]
Theorem. For f ∈ C[−1, 1],
E_n(f; X_m) ≤ E_n(f) ≤ (1 + Θ_m) E_n(f; X_m) + ω_f([−1, 1]; δ_m) + Θ_m ‖f‖.
Consequently, if δ_m → 0, then E_n(f; X_m) → E_n(f) (as m → ∞).
Proof. Let p = p_n(X_m) ∈ P_n be the best approximation to f on X_m. Recall that

where we've used (2) from the previous Lemma to estimate ω_p(δ_m). All that remains is to
revise this last estimate, eliminating reference to p. For this we use the triangle inequality
again:
max_{1≤i≤m} |p(x_i)| ≤ max_{1≤i≤m} |f(x_i) − p(x_i)| + max_{1≤i≤m} |f(x_i)| ≤ E_n(f; X_m) + ‖f‖.
Putting all the pieces together gives us our result:
E_n(f) ≤ ω_f(δ_m) + E_n(f; X_m) + Θ_m [ E_n(f; X_m) + ‖f‖ ].
As Rivlin points out, it is quite possible to give a lower bound on m in the case of, say,
equally spaced points, which will give E_n(f; X_m) ≤ E_n(f) ≤ E_n(f; X_m) + ε, but this is
surely an inefficient approach to the problem. Instead, we'll discuss the one point exchange
algorithm.
The One Point Exchange Algorithm
We're given f ∈ C[−1, 1], n, and ε > 0.
1. Pick a starting "reference" X_{n+2}. A convenient choice is the set x_i = cos((n+1−i)π/(n+1)),
i = 0, 1, …, n + 1. These are the "peak points" of T_{n+1}; that is, T_{n+1}(x_i) = (−1)^{n+1−i}
(and so T_{n+1} is the polynomial e from our "Conclusion").
2. Find p = p_n(X_{n+2}) and λ (by solving a system of linear equations). Recall that
|λ| = |f(x_i) − p(x_i)| ≤ ‖f − p*‖ ≤ ‖f − p‖,
where p* is the best approximation to f on all of [−1, 1].
3. Find (approximately, if necessary) the "error function" e(x) = f(x) − p(x) and any
point η where |f(η) − p(η)| = ‖f − p‖. (According to Powell, this can be accomplished
using "local quadratic fits.")
4. Replace an appropriate x_i by η so that the new reference X′_{n+2} = {x′_0, …, x′_{n+1}} has
the properties that f(x′_i) − p(x′_i) alternates in sign, and that |f(x′_i) − p(x′_i)| ≥ |λ| for all
i. The new polynomial p′ = p_n(X′_{n+2}) and new λ′ must then satisfy |λ′| ≥ |λ|.
for m ≠ 0, while
(1/π) ∫_{−π}^{π} T(x) dx = (α_0/2π) ∫_{−π}^{π} dx = α_0.
That is, if T ∈ T_n, then T is actually equal to its own Fourier series.
3. The partial sum operator s_n(f) is a linear projection from C^{2π} onto T_n.
4. If T(x) = α_0/2 + Σ_{k=1}^{n} (α_k cos kx + β_k sin kx) is a trig polynomial, then
(1/π) ∫_{−π}^{π} f(x) T(x) dx = (α_0/2π) ∫_{−π}^{π} f(x) dx + Σ_{k=1}^{n} (α_k/π) ∫_{−π}^{π} f(x) cos kx dx
+ Σ_{k=1}^{n} (β_k/π) ∫_{−π}^{π} f(x) sin kx dx
= α_0 a_0/2 + Σ_{k=1}^{n} (α_k a_k + β_k b_k),
where (a_k) and (b_k) are the Fourier coefficients for f. [This should remind you of the
dot product of the coefficients.]
5. Motivated by 1, 2, and 4, we define the inner product of two elements f, g ∈ C^{2π} by
⟨f, g⟩ = (1/π) ∫_{−π}^{π} f(x) g(x) dx.
Note that from 4 we have ⟨f, s_n(f)⟩ = ⟨s_n(f), s_n(f)⟩ for any n. (Why?)
6. If some f ∈ C^{2π} has a_k = b_k = 0 for all k, then f ≡ 0.
Indeed, by 4 (or linearity of the integral), this means that
∫_{−π}^{π} f(x) T(x) dx = 0
for any trig polynomial T. But from Weierstrass's second theorem we know that f is
the uniform limit of some sequence of trig polynomials (T_n). Thus,
∫_{−π}^{π} f(x)² dx = lim_{n→∞} ∫_{−π}^{π} f(x) T_n(x) dx = 0.
Since f is continuous, this easily implies that f ≡ 0.
7. If f, g ∈ C^{2π} have the same Fourier series, then f ≡ g. Hence, the Fourier series for
an f ∈ C^{2π} provides a representation for f (even if the series fails to converge to f).
8. The coefficients a_0, a_1, …, a_n and b_1, b_2, …, b_n minimize the expression
φ(a_0, a_1, …, b_n) = ∫_{−π}^{π} [f(x) − s_n(f)(x)]² dx.
It's not hard to see, for example, that
∂φ/∂a_k = −2 ∫_{−π}^{π} [f(x) − s_n(f)(x)] cos kx dx = 0
≥ (2/π²) Σ_{k=1}^{n} (1/k) ∫_{(k−1)π}^{kπ} |sin x| dx = (4/π²) Σ_{k=1}^{n} 1/k ≥ (4/π²) log n,
because Σ_{k=1}^{n} 1/k ≥ log n.
The numbers λ_n = ‖D_n‖_1 = (1/π) ∫_{−π}^{π} |D_n(t)| dt are called the Lebesgue numbers asso-
ciated to this process (compare this to the terminology we used for interpolation). The
point here is that λ_n gives the norm of the partial sum operator (projection) on C^{2π} and
(just as with interpolation) λ_n → ∞ as n → ∞. As a matter of no small curiosity, notice
that, from Observation 10, the norm of s_n as an operator on L_2 is 1.
Corollary. If f ∈ C^{2π}, then
|s_n(f)(x)| ≤ (1/π) ∫_{−π}^{π} |f(x + t)| |D_n(t)| dt ≤ λ_n ‖f‖.     (∗)
In particular, ‖s_n(f)‖ ≤ λ_n ‖f‖ ≤ (3 + log n) ‖f‖.
If we approximate the function sgn D_n by a continuous function f of norm one, then
s_n(f)(0) ≈ (1/π) ∫_{−π}^{π} |D_n(t)| dt = λ_n.
Thus, λ_n is the smallest constant that works in (∗). The fact that the partial sum operators
are not uniformly bounded on C^{2π}, along with the Baire category theorem, tells us that
there must be some f ∈ C^{2π} for which ‖s_n(f)‖ is unbounded. But, as we've seen, this has
more to do with projections than it does with Fourier series:
Theorem. (Kharshiladze, Lozinski) For each n, let L_n be a continuous, linear projection
from C^{2π} onto T_n. Then, there is some f ∈ C^{2π} for which ‖L_n(f) − f‖ is unbounded.
Although our last Corollary may not look very useful, it does give us some information
about the effectiveness of s_n(f) as a uniform approximation to f. Specifically, we have
Lebesgue's theorem:
Theorem. If f ∈ C^{2π}, and if we set E_n^T(f) = min_{T∈T_n} ‖f − T‖, then
‖f − s_n(f)‖ ≤ (1 + λ_n) E_n^T(f) ≤ (4 + log n) E_n^T(f).
Lebesgue's theorem should remind you of our "fancy" version of Bernstein's theorem;
if we knew that E_n^T(f) log n → 0 as n → ∞, then we'd know that s_n(f) converged uni-
formly to f. Our goal, then, is to improve our estimates on E_n^T(f), and the idea behind
these improvements is to replace D_n by a better kernel (with regard to uniform approx-
imation). Before we pursue anything quite so delicate as an estimate on E_n^T(f), though,
let's investigate a simple (and useful) replacement for D_n.
Since the sequence of partial sums (s_n) need not converge to f, we might try looking
at their arithmetic means (or Cesàro sums):
σ_n = (s_0 + s_1 + ⋯ + s_{n−1}) / n.
(These averages typically have better convergence properties than the partial sums them-
selves. Consider σ_n in the (scalar) case s_n = (−1)ⁿ, for example.) Specifically, we set
σ_n(f)(x) = (1/n) [ s_0(f)(x) + ⋯ + s_{n−1}(f)(x) ]
= (1/π) ∫_{−π}^{π} f(x + t) [ (1/n) Σ_{k=0}^{n−1} D_k(t) ] dt = (1/π) ∫_{−π}^{π} f(x + t) K_n(t) dt,
where K_n = (D_0 + D_1 + ⋯ + D_{n−1})/n is called Fejér's kernel. The same techniques we
used earlier can be applied to find a closed form for σ_n(f) which, of course, reduces to
simplifying (D_0 + D_1 + ⋯ + D_{n−1})/n. As before, we begin with a trig identity:
Σ_{k=0}^{n−1} 2 sin θ sin(2k + 1)θ = Σ_{k=0}^{n−1} [ cos 2kθ − cos(2k + 2)θ ] = 1 − cos 2nθ = 2 sin² nθ.
Thus,
K_n(t) = (1/n) Σ_{k=0}^{n−1} sin((2k + 1)t/2) / (2 sin(t/2)) = sin²(nt/2) / (2n sin²(t/2)).
Please note that K_n is even, nonnegative, and (1/π) ∫_{−π}^{π} K_n(t) dt = 1. Thus, σ_n(f) is
a positive, linear map from C^{2π} into T_n (but it's not a projection—why?), satisfying
‖σ_n(f)‖_2 ≤ ‖f‖_2 (why?).
Now the arithmetic mean σ_n(f) is still a good approximation to f in L_2 norm.
Indeed,
‖f − σ_n(f)‖_2 = ‖(1/n) Σ_{k=0}^{n−1} (f − s_k(f))‖_2 ≤ (1/n) Σ_{k=0}^{n−1} ‖f − s_k(f)‖_2 → 0
as n → ∞ (since ‖f − s_k(f)‖_2 → 0). But, more to the point, σ_n(f) is actually a good
uniform approximation to f, a fact that we'll call Fejér's theorem:
Theorem. If f ∈ C^{2π}, then σ_n(f) converges uniformly to f as n → ∞.
Note that, since σ_n(f) ∈ T_n, Fejér's theorem implies Weierstrass's second theorem.
Curiously, Fejér was only 19 years old when he proved this result (about 1900) while
Weierstrass was 75 at the time he proved his approximation theorems.
We'll give two proofs of Fejér's theorem; one with details, one without. But both
follow from quite general considerations. First:

Kernel operators abound in analysis; for example, Landau's proof of the Weierstrass
theorem uses the kernel L_n(x) = c_n(1 − x²)ⁿ. And, in the next section, we'll encounter
Jackson's kernel J_n(t) = c_n sin⁴ nt / (n³ sin⁴ t), which is essentially the square of Fejér's
kernel. While we will have no need for a general theory of such operators, please note that
the key to their utility is the fact that they're nonnegative!
Lastly, a word or two about Fourier series involving complex coefficients. Most modern
textbooks consider the case of a 2π-periodic, integrable function f : R → C and define the
Fourier series of f by
Σ_{k=−∞}^{∞} c_k e^{ikt},
where now we have only one formula for the c_k's:
c_k = (1/2π) ∫_{−π}^{π} f(t) e^{−ikt} dt,
but, of course, the c_k's may well be complex. This somewhat simpler approach has other
advantages; for one, the exponentials e^{ikt} are now an orthonormal set (relative to the
normalizing constant 1/2π). And, if we remain consistent with this choice and define the
L_2 norm by
‖f‖_2 = [ (1/2π) ∫_{−π}^{π} |f(t)|² dt ]^{1/2},
then we have the simpler estimate ‖f‖_2 ≤ ‖f‖ for f ∈ C^{2π}.
The Dirichlet and Fejér kernels are essentially the same in this case, too, except that
we would now write s_n(f)(x) = Σ_{k=−n}^{n} c_k e^{ikx}. Given this, the Dirichlet and Fejér
kernels can be written
D_n(x) = Σ_{k=−n}^{n} e^{ikx} = 1 + Σ_{k=1}^{n} (e^{ikx} + e^{−ikx})
= 1 + 2 Σ_{k=1}^{n} cos kx
= sin((n + 1/2)x) / sin(x/2)
and
K_n(x) = (1/n) Σ_{m=0}^{n−1} Σ_{k=−m}^{m} e^{ikx} = Σ_{k=−n}^{n} (1 − |k|/n) e^{ikx}
= (1/n) Σ_{m=0}^{n−1} sin((m + 1/2)x) / sin(x/2)
= sin²(nx/2) / (n sin²(x/2)).
In other words, each is twice its real-coefficient counterpart. Since the choice of normalizing
constant (1/π versus 1/2π, and sometimes even 1/√π or 1/√(2π)) has a (small) effect on
these formulas, you may find some variation in other textbooks.
Math 682 Problem Set: Fourier Series 6/8/98
57. Define f(x) = (π − x)² for 0 ≤ x ≤ 2π, and extend f to a 2π-periodic continuous
function on R in the obvious way. Check that the Fourier series for f is π²/3 +
4 Σ_{n=1}^{∞} cos nx / n². Since this series is uniformly convergent, it actually converges to f.
In particular, note that setting x = 0 yields the familiar formula Σ_{n=1}^{∞} 1/n² = π²/6.
58. (a) Given n ≥ 1 and ε > 0, show that there is a continuous function f ∈ C^{2π}
satisfying ‖f‖ = 1 and (1/π) ∫_{−π}^{π} |f(t) − sgn D_n(t)| dt < ε/(n + 1).
(b) Show that s_n(f)(0) ≥ λ_n − ε and, hence, that ‖s_n(f)‖ ≥ λ_n − ε.
59. (a) If f, k ∈ C^{2π}, prove that g(x) = (1/π) ∫_{−π}^{π} f(x + t) k(t) dt is also in C^{2π}.
(b) If we only assume that f is 2π-periodic and Riemann integrable on [−π, π] (but
still k ∈ C^{2π}), is g still continuous?
(c) If we simply assume that f and k are 2π-periodic and Riemann integrable on
[−π, π], is g still continuous?
60. Suppose that k_n ∈ C^{2π} satisfies
k_n ≥ 0, (1/π) ∫_{−π}^{π} k_n(t) dt = 1, and ∫_{δ≤|t|≤π} k_n(t) dt → 0 (n → ∞)
for every δ > 0. If f is Riemann integrable, show that (1/π) ∫_{−π}^{π} f(x + t) k_n(t) dt → f(x)
pointwise, as n → ∞, at each point of continuity of f. In particular, σ_n(f)(x) → f(x)
at each point of continuity of f.
61. Given f, g ∈ C^{2π}, we define the convolution of f and g, written f ∗ g, by
(f ∗ g)(x) = (1/π) ∫_{−π}^{π} f(t) g(x − t) dt.
(Compare this integral with that used in problem 59.)
(a) Show that f ∗ g = g ∗ f and that f ∗ g ∈ C^{2π}.
(b) If one of f or g is a trig polynomial, show that f ∗ g is again a trig polynomial
(of the same degree).
(c) If one of f or g is continuously differentiable, show that f ∗ g is likewise continu-
ously differentiable and find an integral formula for (f ∗ g)′(x).
Math 682 Jackson's Theorems 6/16/98
We continue our investigations of the "middle ground" between algebraic and trigonometric
approximation by presenting several results due to the great American mathematician
Dunham Jackson (from roughly 1911–1912). The first of these results will give us the best
possible estimate of E_n(f) in terms of ω_f and n.
Jackson's theorems are what we might call direct theorems. If we know something
about f, then we can say something about E_n(f). There is also the notion of an inverse
theorem, meaning that if we know something about E_n(f), we should be able to say
something about f. In other words, we would expect an inverse theorem to be, more or
less, the converse of some direct theorem. Now inverse theorems are typically much harder
to prove than direct theorems, but in order to have some idea of what such theorems might
tell us (and to see some of the techniques used in their proofs), we present one of the easier
inverse theorems, due to Bernstein. This result gives the converse to one of our corollaries
to Jackson's theorem (see the top of page 105).
Theorem. If f ∈ C^{2π} satisfies E_n^T(f) ≤ A n^{−α}, for some constants A and 0 < α < 1,
then f ∈ lip_K α for some constant K.
Proof. For each n, choose U_n ∈ T_n so that ‖f − U_n‖ ≤ A n^{−α}. Then, in particular,
(U_n) converges uniformly to f. Now if we set V_0 = U_1 and V_n = U_{2ⁿ} − U_{2^{n−1}} for n ≥ 1,
then V_n ∈ T_{2ⁿ} and f = Σ_{n=0}^{∞} V_n. Indeed,
‖V_n‖ ≤ ‖U_{2ⁿ} − f‖ + ‖U_{2^{n−1}} − f‖ ≤ A(2ⁿ)^{−α} + A(2^{n−1})^{−α} = B 2^{−nα},
which is summable; thus, the (telescoping) series Σ_{n=0}^{∞} V_n converges uniformly to f.
(Why?)
Next we estimate |f(x) − f(y)| using finitely many of the V_n's, the precise number to
be specified later. Using the mean value theorem and Bernstein's inequality we get
|f(x) − f(y)| ≤ Σ_{n=0}^{∞} |V_n(x) − V_n(y)|
≤ Σ_{n=0}^{m−1} |V_n(x) − V_n(y)| + 2 Σ_{n=m}^{∞} ‖V_n‖
= Σ_{n=0}^{m−1} |V_n′(ξ_n)| |x − y| + 2 Σ_{n=m}^{∞} ‖V_n‖
≤ |x − y| Σ_{n=0}^{m−1} 2ⁿ ‖V_n‖ + 2 Σ_{n=m}^{∞} ‖V_n‖
≤ |x − y| B Σ_{n=0}^{m−1} 2^{n(1−α)} + 2B Σ_{n=m}^{∞} 2^{−nα}
≤ C [ |x − y| 2^{m(1−α)} + 2^{−mα} ]     (∗)
where we've used, in the fourth line, the fact that V_n ∈ T_{2ⁿ} and, in the last line, standard
estimates for geometric series. Now we want the right-hand side to be dominated by a
constant times |x − y|^α. In other words, if we set |x − y| = δ, then we want
δ 2^{m(1−α)} + 2^{−mα} ≤ D δ^α
or, equivalently,
(2^m δ)^{1−α} + (2^m δ)^{−α} ≤ D.
Thus, we should choose m so that 2^m δ is both bounded above and bounded away from
zero. For example, if 0 < δ < 1, we could choose m so that 1 ≤ 2^m δ < 2.
In order to better explain the phrase "more or less the converse of some direct theo-
rem," let's see how the previous result falls apart when α = 1. Although we might hope
that E_n^T(f) ≤ A/n would imply that f ∈ lip_K 1, it happens not to be true. The best result
in this regard is due to Zygmund, who gave necessary and sufficient conditions on f so
that E_n^T(f) ≤ A/n (and these conditions do not characterize lip_K 1 functions). Instead of
pursuing Zygmund's result, we'll settle for simple "surgery" on our previous result, keeping
an eye out for what goes wrong. This result is again due to Bernstein.
Theorem. If f ∈ C^{2π} satisfies E_n^T(f) ≤ A/n, then ω_f(δ) ≤ K δ |log δ| for some constant
K and all δ sufficiently small.
Proof. If we repeat the previous proof, setting α = 1, only a few lines change. In
particular, the conclusion of that long string of inequalities (∗) would now read
|f(x) − f(y)| ≤ C [ |x − y| m + 2^{−m} ] = C [ mδ + 2^{−m} ].
Clearly, the right-hand side cannot be dominated by a constant times δ, as we might have
hoped, for this would force m to be bounded (independent of δ). Instead, we settle for
mδ + 2^{−m} ≤ D δ |log δ|
or, equivalently,
m(2^m δ) + 1 ≤ D (2^m δ) |log δ|.
As before, we choose m so that 1 ≤ 2^m δ < 2. Then
m log 2 + log δ < log 2 ⟹ m < (log 2 − log δ)/log 2 ≤ (2/log 2) |log δ|
and, finally,
m(2^m δ) + 1 ≤ 2m + 1 ≤ 3m ≤ (6/log 2) |log δ| ≤ (6/log 2) (2^m δ) |log δ|.
Math 682 Orthogonal Polynomials 6/17/98
Given a positive (except possibly at finitely many points), Riemann integrable weight
function w(x) on [a, b], the expression
⟨f, g⟩ = ∫_a^b f(x) g(x) w(x) dx
defines an inner product on C[a, b], and
‖f‖_2 = ( ∫_a^b f(x)² w(x) dx )^{1/2} = √⟨f, f⟩
the associated norm. The corresponding sequence of monic orthogonal polynomials (Q_n)
is given by Q_0(x) = 1, Q_1(x) = x − a_0, and
Q_{n+1}(x) = (x − a_n) Q_n(x) − b_n Q_{n−1}(x)
for n ≥ 1, where
a_n = ⟨x Q_n, Q_n⟩ / ⟨Q_n, Q_n⟩  and  b_n = ⟨x Q_n, Q_{n−1}⟩ / ⟨Q_{n−1}, Q_{n−1}⟩.
The sheer volume of literature on orthogonal polynomials and other "special func-
tions" is truly staggering. We'll content ourselves with the Legendre and the Chebyshev
polynomials. In particular, let's return to the problem of finding an explicit formula for
the Legendre polynomials. We could, as Rivlin does, use induction and a few observations
that simplify the basic recurrence formula (you're encouraged to read this; see pp. 53–54).
Instead we'll give a simple (but at first sight intimidating) formula that is of use in more
general settings than ours.
Lemma 2 (with w ≡ 1 and [a, b] = [−1, 1]) says that if we want to find a polynomial
f of degree n that is orthogonal to P_{n−1}, then we'll need to take a polynomial for u,
and this u will have to be divisible by (x − 1)ⁿ(x + 1)ⁿ. (Why?) That is, we must have
P_n(x) = c_n Dⁿ[(x² − 1)ⁿ], where D denotes differentiation, and where we find c_n by
evaluating the right-hand side at x = 1.
Lemma 5. (Leibniz's formula) Dⁿ(fg) = Σ_{k=0}^{n} (n choose k) D^k(f) D^{n−k}(g).
Proof. Induction, and the fact that (n−1 choose k−1) + (n−1 choose k) = (n choose k).
Consequently, Q(x) = Dⁿ[(x − 1)ⁿ(x + 1)ⁿ] = Σ_{k=0}^{n} (n choose k) D^k[(x − 1)ⁿ] D^{n−k}[(x + 1)ⁿ],
and it follows that Q(1) = 2ⁿn! and Q(−1) = (−1)ⁿ2ⁿn!. This, finally, gives us the formula
discovered by Rodrigues in 1814:
P_n(x) = (1/(2ⁿ n!)) Dⁿ[(x² − 1)ⁿ].
The Rodrigues formula is quite useful (and easily generalizes to the Jacobi polynomials).
Observations
6. By Lemma 3, the roots of P_n are real, distinct, and lie in (−1, 1).
7. (x² − 1)ⁿ = Σ_{k=0}^{n} (−1)^k (n choose k) x^{2n−2k}. If we apply (1/(2ⁿ n!)) Dⁿ and simplify,
we get another formula for the Legendre polynomials:
P_n(x) = (1/2ⁿ) Σ_{k=0}^{[n/2]} (−1)^k (n choose k) (2n − 2k choose n) x^{n−2k}.
In particular, if n is even (odd), then P_n is even (odd). Notice, too, that if we let
P̃_n denote the polynomial given by the standard construction, then we must have
P_n = 2^{−n} (2n choose n) P̃_n.
8. In terms of our standard recurrence formula, it follows that a_n = 0 (because xP_n(x)²
is always odd). It remains to compute b_n. First, integrating by parts,
∫_{−1}^{1} P_n(x)² dx = [x P_n(x)²]_{−1}^{1} − ∫_{−1}^{1} x · 2P_n(x) P_n′(x) dx,
or ⟨P_n, P_n⟩ = 2 − 2⟨P_n, xP_n′⟩. But xP_n′ = nP_n + lower degree terms; hence,
⟨P_n, xP_n′⟩ = n⟨P_n, P_n⟩. Thus, ⟨P_n, P_n⟩ = 2/(2n + 1). Using this and the fact
that P_n = 2^{−n} (2n choose n) P̃_n, we'd find that b_n = n²/(4n² − 1). Thus,
P_{n+1} = 2^{−n−1} (2n+2 choose n+1) P̃_{n+1} = 2^{−n−1} (2n+2 choose n+1) [ x P̃_n − (n²/(4n² − 1)) P̃_{n−1} ]
= ((2n + 1)/(n + 1)) x P_n − (n/(n + 1)) P_{n−1}.
That is, the Legendre polynomials satisfy the recurrence formula
(n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) − n P_{n−1}(x).
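The recurrence makes the Legendre polynomials easy to generate. A minimal sketch (ours), with a spot check of orthogonality and of ⟨P_n, P_n⟩ = 2/(2n + 1):

```python
# Sketch: Legendre coefficients via (k+1)P_{k+1} = (2k+1)xP_k - kP_{k-1}.
import numpy as np

def legendre_polys(n):
    P = [np.array([1.0]), np.array([0.0, 1.0])]       # P_0 = 1, P_1 = x
    for k in range(1, n):
        xPk = np.concatenate(([0.0], P[k]))           # x * P_k
        prev = np.concatenate((P[k - 1], [0.0, 0.0]))
        P.append(((2 * k + 1) * xPk - k * prev) / (k + 1))
    return P[: n + 1]

P = legendre_polys(5)
xs = np.linspace(-1.0, 1.0, 20001)
p2, p3 = np.polyval(P[2][::-1], xs), np.polyval(P[3][::-1], xs)
print(np.trapz(p2 * p3, xs))       # ~0: P_2 and P_3 are orthogonal
print(np.trapz(p3 * p3, xs))       # ~2/7 = 2/(2*3 + 1)
```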
9. It follows from 8 that the sequence P̂_n = √((2n + 1)/2) P_n is orthonormal on [−1, 1].
10. The Legendre polynomials satisfy (1 − x²) P_n″(x) − 2x P_n′(x) + n(n + 1) P_n(x) = 0. If
we set u = (x² − 1)ⁿ, that is, if u^{(n)} = 2ⁿn!P_n, note that u′(x² − 1) = 2nxu. Now we
apply D^{n+1} to both sides of this last equation (using Leibniz's formula) and simplify:
u^{(n+2)}(x² − 1) + (n + 1) u^{(n+1)} · 2x + ((n + 1)n/2) u^{(n)} · 2 = 2n [ u^{(n+1)} x + (n + 1) u^{(n)} ]
⟹ (1 − x²) u^{(n+2)} − 2x u^{(n+1)} + n(n + 1) u^{(n)} = 0.
11. Through a series of exercises, similar in spirit to 10, Rivlin shows that |P_n(x)| ≤ 1 on
[−1, 1]. See pp. 63–64 of Rivlin for details.
Given an orthogonal sequence, it makes sense to consider "generalized Fourier series"
relative to the sequence and to find analogues of the Dirichlet kernel, Lebesgue's theorem,
and so on. In the case of the Legendre polynomials we have the following:
Example. The "Fourier–Legendre" series for f ∈ C[−1, 1] is given by Σ_k ⟨f, P̂_k⟩ P̂_k,
where
P̂_k = √((2k + 1)/2) P_k and ⟨f, P̂_k⟩ = ∫_{−1}^{1} f(x) P̂_k(x) dx.
The partial sum operator S_n(f) = Σ_{k=0}^{n} ⟨f, P̂_k⟩ P̂_k is a linear projection onto P_n and may
be written as
S_n(f)(x) = ∫_{−1}^{1} f(t) K_n(t, x) dt,
where K_n(t, x) = Σ_{k=0}^{n} P̂_k(t) P̂_k(x). (Why?)
Since the P̂_k's are orthonormal, we have
Σ_{k=0}^{n} |⟨f, P̂_k⟩|² = ‖S_n(f)‖_2² ≤ ‖f‖_2²,
and so the generalized Fourier coefficients ⟨f, P̂_k⟩ are square summable; in particular,
⟨f, P̂_k⟩ → 0 as k → ∞. As in the case of Fourier series, the fact that the polynomials
(i.e., the span of the P̂_k's) are dense in C[a, b] implies that S_n(f) actually converges to f
in the ‖·‖_2 norm. These same observations remain valid for any sequence of orthogonal
polynomials. The real question remains, just as with Fourier series, whether S_n(f) is a
good uniform (or even pointwise) approximation to f.
If you're willing to swallow the fact that |P_n(x)| ≤ 1, then
|K_n(t, x)| ≤ Σ_{k=0}^{n} √((2k + 1)/2) √((2k + 1)/2) = (1/2) Σ_{k=0}^{n} (2k + 1) = (n + 1)²/2.
Hence, ‖S_n(f)‖ ≤ (n + 1)² ‖f‖. That is, the "Lebesgue numbers" for this process are at
most (n + 1)². The analogue of Lebesgue's theorem in this case would then read:
‖f − S_n(f)‖ ≤ C n² E_n(f).
Thus, S_n(f) ⇉ f whenever n² E_n(f) → 0, and Jackson's theorem tells us when this will
happen: If f is twice continuously differentiable, then the Fourier–Legendre series for f
converges uniformly to f on [−1, 1].
The Christoffel–Darboux Identity
It would also be of interest to have a closed form for K_n(t, x). That this is indeed always
possible, for any sequence of orthogonal polynomials, is a very important fact.
Using our original notation, let (Q_n) be the sequence of monic orthogonal polynomials
corresponding to a given weight w, and let (Q̂_n) be the orthonormal counterpart of (Q_n);
in other words, Q_n = λ_n Q̂_n, where λ_n = √⟨Q_n, Q_n⟩. It will help things here if you recall
(from Observation 1 on page 112) that λ_n² = b_n λ_{n−1}².
As with the Legendre polynomials, each f ∈ C[a, b] is represented by the generalized
Fourier series Σ_k ⟨f, Q̂_k⟩ Q̂_k, with partial sum operator
S_n(f)(x) = ∫_a^b f(t) K_n(t, x) w(t) dt,
where K_n(t, x) = Σ_{k=0}^{n} Q̂_k(t) Q̂_k(x). As before, S_n is a projection onto P_n; in particular,
S_n(1) = 1 for every n.
Theorem. (Christoffel–Darboux) The kernel K_n(t, x) can be written
Σ_{k=0}^{n} Q̂_k(t) Q̂_k(x) = (λ_{n+1}/λ_n) [ Q̂_{n+1}(t) Q̂_n(x) − Q̂_n(t) Q̂_{n+1}(x) ] / (t − x).
Proof. We begin with the standard recurrence formulas
where h(t) = (f(t) − f(x_0))/(t − x_0). But h is bounded (and continuous everywhere
except, possibly, at x_0) by hypothesis (i), λ_{n+1}/λ_n is bounded, and Q̂_n(x_0) is bounded by
hypothesis (ii). All that remains is to notice that the numbers ⟨h, Q̂_n⟩ are the generalized
Fourier coefficients of the bounded, Riemann integrable function h, and so must tend to
zero (since, in fact, they're even square summable).
We end this section with a negative result, due to Nikolaev:
Theorem. There is no weight w such that every f ∈ C[a, b] has a uniformly convergent
expansion in terms of orthogonal polynomials. In fact, given any w, there is always some
f for which ‖f − S_n(f)‖ is unbounded.
Math 682 Problem Set: Orthogonal Polynomials 6/17/98
Throughout, w denotes a fixed, positive (except possibly at finitely many points), Riemann
integrable weight function on [a, b], and we consider the inner product on C[a, b] defined
by
⟨f, g⟩ = ∫_a^b f(x) g(x) w(x) dx
and the associated (strictly convex) norm
‖f‖_2 = √⟨f, f⟩ = ( ∫_a^b |f(x)|² w(x) dx )^{1/2}.
62. Prove that every inner product norm is strictly convex. Specifically, let ⟨·, ·⟩ be an
inner product on a vector space X, and let ‖x‖ = √⟨x, x⟩ be the associated norm.
Show that:
(a) ‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²) for all x, y ∈ X (the parallelogram identity).
(b) If ‖x‖ = r = ‖y‖ and if ‖x − y‖ = δ, then ‖(x + y)/2‖² = r² − (δ/2)². In particular,
‖(x + y)/2‖ < r whenever x ≠ y.
We define a sequence of polynomials (Q_n) which are mutually orthogonal, relative to w,
by setting Q_0(x) = 1, Q_1(x) = x − a_0, and
I_n(f) = (1/n) Σ_{k=−n}^{n−1} ((2k + 1)/2n)² = (1/2n³) Σ_{k=0}^{n−1} (2k + 1)² = 2/3 − 1/(6n²).
That is, |I_n(f) − I(f)| = 1/(6n²). In particular, we would need to take n ≥ 130 to get
1/(6n²) ≤ 10^{−5}, for example, and this would require that we perform over 250 evaluations
of f. We'd like a method that converges a bit faster! In other words, there's no shortage
of quadrature formulas—we just want faster ones.
One reasonable requirement for our proposed quadrature formula is that it be exact
for polynomials of low degree. As it happens, this is easy to come by.
Lemma 1. Given w(x) on [a, b] and nodes a ≤ x_1 < ⋯ < x_n ≤ b, there exist unique
weights A_1, …, A_n such that
∫_a^b p(x) w(x) dx = Σ_{i=1}^{n} A_i p(x_i)
for all polynomials p ∈ P_{n−1}.
Proof. Let ℓ_1, …, ℓ_n be the Lagrange interpolating polynomials of degree n − 1 associ-
ated to the nodes x_1, …, x_n, and recall that we have p = Σ_{i=1}^{n} p(x_i) ℓ_i for all p ∈ P_{n−1}.
Hence,
∫_a^b p(x) w(x) dx = Σ_{i=1}^{n} p(x_i) ∫_a^b ℓ_i(x) w(x) dx.
That is, A_i = ∫_a^b ℓ_i(x) w(x) dx works. To see that this is the only choice, suppose that
∫_a^b p(x) w(x) dx = Σ_{i=1}^{n} B_i p(x_i)
is exact for all p ∈ P_{n−1}, and set p = ℓ_j:
A_j = ∫_a^b ℓ_j(x) w(x) dx = Σ_{i=1}^{n} B_i ℓ_j(x_i) = B_j.
The point here is that ℓ_1, …, ℓ_n form a basis for P_{n−1} and integration is linear; thus,
integration is completely determined by its action on the basis—that is, by the n values
A_i = I(ℓ_i), i = 1, …, n.
Said another way, the n point evaluations δ_i(p) = p(x_i) satisfy P_{n−1} ∩ (∩_{i=1}^{n} ker δ_i) =
{0}, and it follows that every linear, real-valued function on P_{n−1} must be a linear com-
bination of the δ_i's. Here's why: Since the x_i's are distinct, P_{n−1} may be identified with
Rⁿ by way of the isomorphism p ↦ (p(x_1), …, p(x_n)). A linear, real-valued function on
P_{n−1} must, then, correspond to some linear, real-valued function on Rⁿ. In other words,
it's given by the inner product against some fixed vector (A_1, …, A_n); in particular, we must
have I(p) = Σ_{i=1}^{n} A_i p(x_i).
In any case, we now have our quadrature formula: For f ∈ C[a, b] we define I_n(f) =
Σ_{i=1}^{n} A_i f(x_i), where A_i = ∫_a^b ℓ_i(x) w(x) dx. But notice that the proof of our last result
suggests an alternate way of writing our quadrature formula. Indeed, if L_{n−1}(f)(x) =
Σ_{i=1}^{n} f(x_i) ℓ_i(x) is the Lagrange interpolating polynomial for f of degree n − 1 based on
the nodes x_1, …, x_n, then
∫_a^b (L_{n−1}(f))(x) w(x) dx = Σ_{i=1}^{n} f(x_i) ∫_a^b ℓ_i(x) w(x) dx = Σ_{i=1}^{n} A_i f(x_i).
In summary, I_n(f) = I(L_{n−1}(f)) ≈ I(f); that is,
I_n(f) = Σ_{i=1}^{n} A_i f(x_i) = ∫_a^b (L_{n−1}(f))(x) w(x) dx ≈ ∫_a^b f(x) w(x) dx = I(f),
where L_{n−1}(f) is the Lagrange interpolating polynomial of degree n − 1 based on the nodes
x_1, …, x_n. This formula is obviously exact for f ∈ P_{n−1}.
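For the simplest weight, the A_i's can be computed by integrating each ℓ_i exactly. A sketch (ours, with w ≡ 1; the notes don't prescribe an implementation):

```python
# Sketch: quadrature weights A_i = \int_a^b l_i(x) dx for the weight w = 1.
import numpy as np

def quad_weights(nodes, a, b):
    n = len(nodes)
    A = []
    for i in range(n):
        li = np.poly1d([1.0])
        for j in range(n):
            if j != i:
                li *= np.poly1d([1.0, -nodes[j]]) / (nodes[i] - nodes[j])
        antider = np.polyint(li)
        A.append(antider(b) - antider(a))
    return np.array(A)

print(quad_weights([0.0, 0.5, 1.0], 0.0, 1.0))
# [1/6, 2/3, 1/6]: Simpson's rule, exact on P_2 (in fact on P_3)
```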
It's easy to give a bound on |I_n(f)| in terms of ‖f‖; indeed,
|I_n(f)| ≤ Σ_{i=1}^{n} |A_i| |f(x_i)| ≤ ‖f‖ ( Σ_{i=1}^{n} |A_i| ).
By considering a norm-one continuous function f satisfying f(x_i) = sgn A_i for each i =
1, …, n, it's easy to see that Σ_{i=1}^{n} |A_i| is the smallest constant that works in this inequality.
In other words, λ_n = Σ_{i=1}^{n} |A_i|, n = 1, 2, …, are the "Lebesgue numbers" for this process.
As with all previous settings, we want these numbers to be uniformly bounded.
If w(x) ≡ 1 and if f is n-times continuously differentiable, we even have an error
estimate for our quadrature formula:
| ∫_a^b f − ∫_a^b L_{n−1}(f) | ≤ ∫_a^b |f − L_{n−1}(f)| ≤ (1/n!) ‖f^{(n)}‖ ∫_a^b Π_{i=1}^{n} |x − x_i| dx
(recall the Theorem on page 72 of "A Brief Introduction to Interpolation"). As it happens,
the integral on the right is minimized when the x_i's are taken to be the zeros of the
Chebyshev polynomial U_n (see Rivlin, page 72).
The fact that a quadrature formula is exact for polynomials of low degree does not by
itself guarantee that the formula is highly accurate. The problem is that Σ_{i=1}^{n} A_i f(x_i) may
be estimating a very small quantity through the cancellation of very large quantities. So,
for example, a positive function may yield a negative result in this approximate integral.
This wouldn't happen if the A_i's were all positive—and we've already seen how useful
positivity can be. Our goal here is to further improve our quadrature formula to have this
property. But we have yet to take advantage of the fact that the x_i's are at our disposal.
We'll let Gauss show us the way!
Theorem. (Gauss) Fix a weight w(x) on [a, b], and let (Q_n) be the canonical sequence of
orthogonal polynomials relative to w. Given n, let x_1, …, x_n be the zeros of Q_n (these all
lie in (a, b)), and choose A_1, …, A_n so that the formula Σ_{i=1}^{n} A_i f(x_i) ≈ ∫_a^b f(x) w(x) dx
is exact for polynomials of degree less than n. Then, in fact, the formula is exact for all
polynomials of degree less than 2n.
Proof. Given a polynomial P of degree less than 2n, we may divide: P = Q_n R + S,
where R and S are polynomials of degree less than n. Then,
∫_a^b P(x) w(x) dx = ∫_a^b Q_n(x) R(x) w(x) dx + ∫_a^b S(x) w(x) dx
= ∫_a^b S(x) w(x) dx     (since deg R < n)
= Σ_{i=1}^{n} A_i S(x_i)     (since deg S < n).
But P(x_i) = Q_n(x_i) R(x_i) + S(x_i) = S(x_i), since Q_n(x_i) = 0. Hence, ∫_a^b P(x) w(x) dx =
Σ_{i=1}^{n} A_i P(x_i) for all polynomials P of degree less than 2n.
Amazing! But, well, not really: P_{2n−1} is of dimension 2n, and we had 2n numbers
x_1, …, x_n and A_1, …, A_n to choose as we saw fit. Said another way, the division algorithm
tells us that P_{2n−1} = Q_nP_{n−1} ⊕ P_{n−1}. Since Q_nP_{n−1} ⊂ ker(I_n), the action of I_n on P_{2n−1}
is the same as its action on a "copy" of P_{n−1}.
In still other words, since any polynomial that vanishes at all the x_i's must be divisible
by Q_n (and conversely), we have Q_nP_{n−1} = P_{2n−1} ∩ (∩_{i=1}^{n} ker δ_i) = ker(I_n|_{P_{2n−1}}). Thus,
I_n "factors through" the quotient space P_{2n−1}/Q_nP_{n−1} ≅ P_{n−1}.
Also not surprising is that this particular choice of x_i's is unique.
Lemma 2. Suppose that a ≤ x_1 < ⋯ < x_n ≤ b and A_1, …, A_n are given so that the
equation ∫_a^b P(x) w(x) dx = Σ_{i=1}^{n} A_i P(x_i) is satisfied for all polynomials P of degree less
than 2n. Then, x_1, …, x_n are the zeros of Q_n.
Proof. Let Q(x) = Π_{i=1}^{n} (x − x_i). Then, for k < n, the polynomial Q Q_k has degree
n + k < 2n. Hence,
∫_a^b Q(x) Q_k(x) w(x) dx = Σ_{i=1}^{n} A_i Q(x_i) Q_k(x_i) = 0.
Since Q is a monic polynomial of degree n which is orthogonal to each Q_k, k < n, we must
have Q = Q_n. Thus, the x_i's are actually the zeros of Q_n.
According to Rivlin, the phrase Gaussian quadrature is usually reserved for the specific
quadrature formula whereby ∫_{−1}^{1} f(x) dx is approximated by ∫_{−1}^{1} (L_{n−1}(f))(x) dx, where
L_{n−1}(f) is the Lagrange interpolating polynomial to f using the zeros of the n-th Legendre
polynomial as nodes. (What a mouthful!) What is actually being described in our version
of Gauss's theorem is Gaussian-type quadrature.
Before computers, Gaussian quadrature was little more than a curiosity; the roots
of Q_n are typically irrational, and certainly not easy to come by. By now, though, it's
considered a standard quadrature technique. In any case, we still can't judge the quality
of Gauss's method without a bit more information.
Gaussian-type Quadrature
First, let's summarize our rather cumbersome notation.

orthogonal polynomial   zeros                        weights                      approximate integral
Q_1                     x_1^(1)                      A_1^(1)                      I_1
Q_2                     x_1^(2), x_2^(2)             A_1^(2), A_2^(2)             I_2
Q_3                     x_1^(3), x_2^(3), x_3^(3)    A_1^(3), A_2^(3), A_3^(3)    I_3
⋮                       ⋮                            ⋮                            ⋮

Hidden here is the Lagrange interpolation formula L_{n−1}(f) = Σ_{i=1}^{n} f(x_i^(n)) ℓ_i^(n−1), where
the ℓ_i^(n−1) denote the Lagrange polynomials of degree n − 1 based on x_1^(n), …, x_n^(n). The
n-th quadrature formula is then
I_n(f) = ∫_a^b L_{n−1}(f)(x) w(x) dx = Σ_{i=1}^{n} A_i^(n) f(x_i^(n)) ≈ ∫_a^b f(x) w(x) dx.
Computational Considerations
You've probably been asking yourself: "How do I find the A_i's without integrating?" Well,
first let's recall the definition: In the case of Gaussian-type quadrature we have
A_i^(n) = ∫_a^b ℓ_i^(n−1)(x) w(x) dx = ∫_a^b [ Q_n(x) / ((x − x_i^(n)) Q_n′(x_i^(n))) ] w(x) dx
(because "W" is the same as Q_n here—the x_i's are the zeros of Q_n). Next, consider the
function
φ_n(x) = ∫_a^b [ (Q_n(t) − Q_n(x)) / (t − x) ] w(t) dt.
Since t − x divides Q_n(t) − Q_n(x), note that φ_n is actually a polynomial (of degree at most
n − 1) and that
φ_n(x_i^(n)) = ∫_a^b [ Q_n(t) / (t − x_i^(n)) ] w(t) dt = A_i^(n) Q_n′(x_i^(n)).
Now Q_n′(x_i^(n)) is readily available; we just need to compute φ_n(x_i^(n)).
Claim. The 'n's satisfy the same recurrence formula as the Qn's
'n+1(x) = (x ; an )'n(x) ; bn'n;1(x) n 1
but with di erent starting values
Zb
'0(x)
0 and '1(x)
w(x) dx:
a
Gaussian Quadrature 131
Proof. The formulas for $\varphi_0$ and $\varphi_1$ are obviously correct, since $Q_0(x) \equiv 1$ and $Q_1(x) = x - a_0$. We only need to check the recurrence formula itself:
$$\varphi_{n+1}(x) = \int_a^b \frac{Q_{n+1}(t) - Q_{n+1}(x)}{t - x}\,w(t)\,dt$$
$$= \int_a^b \frac{(t - a_n)\,Q_n(t) - b_n Q_{n-1}(t) - (x - a_n)\,Q_n(x) + b_n Q_{n-1}(x)}{t - x}\,w(t)\,dt$$
$$= \int_a^b Q_n(t)\,w(t)\,dt + (x - a_n)\int_a^b \frac{Q_n(t) - Q_n(x)}{t - x}\,w(t)\,dt - b_n\int_a^b \frac{Q_{n-1}(t) - Q_{n-1}(x)}{t - x}\,w(t)\,dt$$
$$= (x - a_n)\,\varphi_n(x) - b_n\,\varphi_{n-1}(x),$$
where we've used $(t - a_n)Q_n(t) - (x - a_n)Q_n(x) = (t - x)Q_n(t) + (x - a_n)\bigl(Q_n(t) - Q_n(x)\bigr)$ and the fact that $\int_a^b Q_n(t)\,w(t)\,dt = 0$.
Of course, the derivatives $Q_n'$ satisfy a recurrence relation of sorts, too:
$$Q_{n+1}'(x) = Q_n(x) + (x - a_n)\,Q_n'(x) - b_n\,Q_{n-1}'(x).$$
But $Q_n'(x_i^{(n)})$ can be computed without knowing $Q_n'(x)$. Indeed, $Q_n(x) = \prod_{i=1}^n (x - x_i^{(n)})$, so we have $Q_n'(x_i^{(n)}) = \prod_{j \ne i} (x_i^{(n)} - x_j^{(n)})$.
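To make this concrete, here is a sketch of mine (not from the notes) computing the $A_i^{(n)}$ for the Chebyshev weight $w(x) = (1 - x^2)^{-1/2}$ on $[-1,1]$ by exactly this scheme; the recurrence data $a_n = 0$, $b_1 = 1/2$, $b_n = 1/4$ ($n \ge 2$), and $\int w = \pi$ are quoted later in this chapter. For this weight the Christoffel numbers are known to all equal $\pi/n$, which gives an easy check.

```python
import math

def gauss_chebyshev_weights(n):
    """Christoffel numbers A_i^(n) for w(x) = 1/sqrt(1 - x^2) on [-1, 1],
    via phi_n (computed by the recurrence) and Q_n'(x_i) (computed as a product)."""
    # zeros of the Chebyshev polynomial of degree n, known in closed form
    x = [math.cos((2 * i - 1) * math.pi / (2 * n)) for i in range(1, n + 1)]
    A = []
    for i, xi in enumerate(x):
        phi_prev, phi = 0.0, math.pi          # phi_0 = 0, phi_1 = b_0 = pi
        for k in range(1, n):                 # phi_{k+1} = (x - a_k) phi_k - b_k phi_{k-1}
            b_k = 0.5 if k == 1 else 0.25     # a_k = 0 for the Chebyshev weight
            phi_prev, phi = phi, xi * phi - b_k * phi_prev
        q_prime = math.prod(xi - xj for j, xj in enumerate(x) if j != i)
        A.append(phi / q_prime)               # A_i = phi_n(x_i) / Q_n'(x_i)
    return x, A

x, A = gauss_chebyshev_weights(4)
print(A)                                      # each weight ~ pi/4 = 0.785398...
print(sum(Ai * xi ** 2 for Ai, xi in zip(A, x)))  # = pi/2, exact since deg 2 < 8
```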
The weights $A_i^{(n)}$, or Christoffel numbers, together with the zeros of $Q_n$ are tabulated in a variety of standard cases. See, for example, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, by Abramowitz and Stegun, eds. In practice, of course, it's enough to tabulate data for the case $[a,b] = [-1,1]$.
Applications to Interpolation
Although $L_n(f)$ isn't typically a good uniform approximation to $f$, if we interpolate at the zeros of an orthogonal polynomial $Q_{n+1}$, then $L_n(f)$ will be a good approximation in the $\|\cdot\|_1$ or $\|\cdot\|_2$ norm generated by the corresponding weight $w$. Specifically, by rewording our earlier results, it's easy to get estimates for each of the errors $\int_a^b |f - L_n(f)|\,w$ and $\int_a^b |f - L_n(f)|^2\,w$. We use essentially the same notation as before, except now we take
$$L_n(f) = \sum_{i=1}^{n+1} f\bigl(x_i^{(n+1)}\bigr)\,\ell_i^{(n)},$$
where $x_1^{(n+1)},\dots,x_{n+1}^{(n+1)}$ are the roots of $Q_{n+1}$ and $\ell_i^{(n)}$ is of degree $n$. This leads to a quadrature formula that's exact on polynomials of degree less than $2(n+1)$.
As we've already seen, $\ell_1^{(n)},\dots,\ell_{n+1}^{(n)}$ are orthogonal, and so $\|L_n(f)\|_2$ may be computed exactly.

Lemma. $\|L_n(f)\|_2 \le \|f\|\,\bigl(\int_a^b w(x)\,dx\bigr)^{1/2}$.

Proof. Since $L_n(f)^2$ is a polynomial of degree $2n < 2(n+1)$, we have
$$\|L_n(f)\|_2^2 = \int_a^b [L_n(f)]^2\,w(x)\,dx = \sum_{j=1}^{n+1} A_j^{(n+1)} \left[\sum_{i=1}^{n+1} f\bigl(x_i^{(n+1)}\bigr)\,\ell_i^{(n)}\bigl(x_j^{(n+1)}\bigr)\right]^2$$
$$= \sum_{j=1}^{n+1} A_j^{(n+1)}\,\bigl[f\bigl(x_j^{(n+1)}\bigr)\bigr]^2 \qquad\text{(since } \ell_i^{(n)}(x_j^{(n+1)}) = \delta_{ij}\text{)}$$
$$\le \|f\|^2 \sum_{j=1}^{n+1} A_j^{(n+1)} = \|f\|^2 \int_a^b w(x)\,dx.$$

Please note that we also have $\|f\|_2 \le \|f\|\,\bigl(\int_a^b w(x)\,dx\bigr)^{1/2}$; that is, this same estimate holds for $\|f\|_2$ itself.
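For the Chebyshev weight, every Christoffel number equals $\pi/(n+1)$ (a standard fact, assumed here), so the middle line of the proof reads $\|L_n(f)\|_2^2 = \frac{\pi}{n+1}\sum_j f(x_j^{(n+1)})^2$ exactly. A small numerical sketch of mine checking the lemma's bound under that assumption:

```python
import math

n = 10
# zeros of the Chebyshev polynomial of degree n + 1
nodes = [math.cos((2 * j - 1) * math.pi / (2 * (n + 1))) for j in range(1, n + 2)]
f = math.exp                                  # test function on [-1, 1]
norm_sq = math.pi / (n + 1) * sum(f(x) ** 2 for x in nodes)  # ||L_n(f)||_2^2
bound_sq = math.e ** 2 * math.pi              # ||f||^2 * integral of w, ||f|| = e
print(norm_sq, "<=", bound_sq)                # roughly 7.16 <= 23.22
```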
As usual, once we have an estimate for the norm of an operator, we also have an
analogue of Lebesgue's theorem.
Theorem. $\|f - L_n(f)\|_2 \le 2\,\bigl(\int_a^b w(x)\,dx\bigr)^{1/2} E_n(f)$.

Proof. Here we go again! Let $p^*$ be the best uniform approximation to $f$ out of $\mathcal{P}_n$ and use the fact that $L_n(p^*) = p^*$ to see that
$$\|f - L_n(f)\|_2 \le \|f - p^*\|_2 + \|L_n(f - p^*)\|_2$$
$$\le \|f - p^*\| \left(\int_a^b w(x)\,dx\right)^{1/2} + \|f - p^*\| \left(\int_a^b w(x)\,dx\right)^{1/2} = 2\,E_n(f) \left(\int_a^b w(x)\,dx\right)^{1/2}.$$
Lemma. $\displaystyle \frac{\varphi_n(x)}{Q_n(x)} = \sum_{i=1}^n \frac{A_i^{(n)}}{x - x_i^{(n)}}$.

Proof. Since $\varphi_n$ has degree $< n$ and $\varphi_n(x_i^{(n)}) \ne 0$ for any $i$, we may appeal to partial fractions to write
$$\frac{\varphi_n(x)}{Q_n(x)} = \frac{\varphi_n(x)}{(x - x_1^{(n)}) \cdots (x - x_n^{(n)})} = \sum_{i=1}^n \frac{c_i}{x - x_i^{(n)}},$$
where $c_i$ is given by
$$c_i = \left[\frac{\varphi_n(x)}{Q_n(x)}\,(x - x_i^{(n)})\right]_{x = x_i^{(n)}} = \frac{\varphi_n(x_i^{(n)})}{Q_n'(x_i^{(n)})} = A_i^{(n)}.$$

Now here's where the continued fractions come in: Stieltjes recognized the fact that
$$\frac{\varphi_{n+1}(x)}{Q_{n+1}(x)} = \cfrac{b_0}{(x - a_0) - \cfrac{b_1}{(x - a_1) - \cfrac{b_2}{\ddots\, - \cfrac{b_n}{(x - a_n)}}}}$$
(which can be proved by induction), where $b_0 = \int_a^b w(t)\,dt$. More generally, induction will show that the $n$-th convergent of a continued fraction can be written as
$$\frac{A_n}{B_n} = \cfrac{p_1}{q_1 - \cfrac{p_2}{q_2 - \cfrac{p_3}{\ddots\, - \cfrac{p_n}{q_n}}}}$$
by means of the recurrence formulas
$$A_0 = 0, \quad B_0 = 1, \quad A_1 = p_1, \quad B_1 = q_1,$$
$$A_n = q_n A_{n-1} - p_n A_{n-2}, \qquad B_n = q_n B_{n-1} - p_n B_{n-2},$$
where $n = 2, 3, 4, \dots$ (the minus signs match the minus signs in the fraction). Please note that $A_n$ and $B_n$ satisfy the same recurrence formula, but with different starting values (as is the case with $\varphi_n$ and $Q_n$).
Again using the Chebyshev weight as an example, for $x > 1$ we have
$$\frac{\pi}{\sqrt{x^2 - 1}} = \int_{-1}^1 \frac{dt}{(x - t)\sqrt{1 - t^2}} = \cfrac{\pi}{x - \cfrac{1/2}{x - \cfrac{1/4}{x - \ddots}}}$$
since $a_n = 0$ for all $n$, $b_1 = 1/2$, and $b_n = 1/4$ for $n \ge 2$. In other words, we've just found a continued fraction expansion for $(x^2 - 1)^{-1/2}$.
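A numerical illustration (a sketch of mine, using the convergent recurrence above with $q_k = x$, $p_1 = b_0 = \pi$, $p_2 = 1/2$, and $p_k = 1/4$ for $k \ge 3$):

```python
import math

def cf_convergent(x, n):
    """n-th convergent A_n / B_n of Stieltjes's fraction, Chebyshev data:
    q_k = x - a_{k-1} = x;  p_1 = pi, p_2 = 1/2, p_k = 1/4 for k >= 3."""
    A_prev, A = 0.0, math.pi          # A_0 = 0,  A_1 = p_1
    B_prev, B = 1.0, x                # B_0 = 1,  B_1 = q_1
    for k in range(2, n + 1):
        p = 0.5 if k == 2 else 0.25
        A_prev, A = A, x * A - p * A_prev    # A_k = q_k A_{k-1} - p_k A_{k-2}
        B_prev, B = B, x * B - p * B_prev
    return A / B

x = 2.0
for n in (2, 4, 8, 16):
    print(n, cf_convergent(x, n))
print("target:", math.pi / math.sqrt(x * x - 1))   # pi/sqrt(3) = 1.813799...
```

Since the numerators and denominators here are exactly the $\varphi_n$ and $Q_n$ of the Chebyshev weight, the convergents are the rational functions $\varphi_n/Q_n$ from the lemma above.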
Appendix
Finally, here is a brief review of some of the fancier bits of linear algebra used in this chapter. To begin, we discuss sums and quotients of vector spaces.
Each subspace $M$ of a finite-dimensional $X$ induces an equivalence relation on $X$ by
$$x \sim y \iff x - y \in M.$$
Standard arguments show that the equivalence classes under this relation are the cosets (translates) $x + M$, $x \in X$. That is,
$$x + M = y + M \iff x - y \in M \iff x \sim y.$$
Equally standard is the induced vector arithmetic on cosets: $(x + M) + (y + M) = (x + y) + M$ and $\lambda(x + M) = \lambda x + M$, which makes the collection of cosets a vector space, the quotient space $X/M$.
Müntz Theorems
For several weeks now we've taken advantage of the fact that the monomials $1, x, x^2, \dots$ have dense linear span in $C[0,1]$. What, if anything, is so special about these particular powers? How about if we consider polynomials of the form $\sum_{k=0}^n a_k x^{k^2}$: are they dense, too? More generally, what can be said about the span of a sequence of monomials $(x^{\lambda_n})$, where $\lambda_0 < \lambda_1 < \lambda_2 < \cdots$? Of course, we'll have to assume that $\lambda_0 \ge 0$, but it's not hard to see that we will actually need $\lambda_0 = 0$, for otherwise each of the polynomials $\sum_{k=0}^n a_k x^{\lambda_k}$ vanishes at $x = 0$ (and so has distance at least 1 from the constant 1 function, for example). If the $\lambda_n$'s are integers, it's also clear that we'll have to have $\lambda_n \to \infty$ as $n \to \infty$. But what else is needed? The answer comes to us from Müntz in 1914. (You sometimes see the name Otto Szász associated with Müntz's theorem, because Szász proved a similar theorem at nearly the same time (1916).)
Theorem. Let $0 \le \lambda_0 < \lambda_1 < \lambda_2 < \cdots$. Then, the functions $(x^{\lambda_n})$ have dense linear span in $C[0,1]$ if and only if $\lambda_0 = 0$ and $\sum_{n=1}^\infty \lambda_n^{-1} = \infty$.
What Müntz is trying to tell us here is that the $\lambda_n$'s can't get big too quickly. In particular, the polynomials of the form $\sum_{k=0}^n a_k x^{k^2}$ are evidently not dense in $C[0,1]$. On the other hand, the $\lambda_n$'s don't have to be unbounded; indeed, Müntz's theorem implies an earlier result of Bernstein from 1912: If $0 < \lambda_1 < \lambda_2 < \cdots < K$ (some constant), then $1, x^{\lambda_1}, x^{\lambda_2}, \dots$ have dense linear span in $C[0,1]$.
Before we give the proof of Müntz's theorem, let's invent a bit of notation: We write
$$X_n = \left\{\sum_{k=0}^n a_k x^{\lambda_k} : a_0,\dots,a_n \in \mathbb{R}\right\}$$
and, given $f \in C[0,1]$, we write $\mathrm{dist}(f, X_n)$ to denote the distance from $f$ to the space spanned by $1, x^{\lambda_1},\dots,x^{\lambda_n}$; that is, $X_n$ is the linear span of these monomials. Let's also write $X = \bigcup_{n=0}^\infty X_n$; that is, $X$ is the linear span of the entire sequence $(x^{\lambda_n})_{n=0}^\infty$. The question here is whether $X$ is dense, and we'll address the problem by determining whether $\mathrm{dist}(f, X_n) \to 0$, as $n \to \infty$, for every $f \in C[0,1]$. If we can show that each (fixed) power $x^m$ can be uniformly approximated by a linear combination of $x^{\lambda_n}$'s, then the Weierstrass theorem will tell us that $X$ is dense in $C[0,1]$.
(How?) Surprisingly, the numbers $\mathrm{dist}(x^m, X_n)$ can be estimated. Our proof won't give the best estimate, but it will show how the condition $\sum_{n=1}^\infty \lambda_n^{-1} = \infty$ comes into the picture.

Lemma. Let $m > 0$. Then, $\displaystyle \mathrm{dist}(x^m, X_n) \le \prod_{k=1}^n \left|1 - \frac{m}{\lambda_k}\right|$.
Proof. We may certainly assume that $m \ne \lambda_n$ for any $n$. Given this, we inductively define a sequence of functions by setting $P_0(x) = x^m$ and
$$P_n(x) = (\lambda_n - m)\,x^{\lambda_n} \int_x^1 t^{-1-\lambda_n}\,P_{n-1}(t)\,dt$$
for $n \ge 1$. For example,
$$P_1(x) = (\lambda_1 - m)\,x^{\lambda_1} \int_x^1 t^{-1-\lambda_1}\,t^m\,dt = -x^{\lambda_1}\Bigl[\,t^{m-\lambda_1}\Bigr]_x^1 = x^m - x^{\lambda_1}.$$
By induction, each $P_n$ is of the form $x^m - \sum_{k=0}^n a_k x^{\lambda_k}$ for some scalars $(a_k)$:
$$P_n(x) = (\lambda_n - m)\,x^{\lambda_n} \int_x^1 t^{-1-\lambda_n}\left[t^m - \sum_{k=0}^{n-1} a_k t^{\lambda_k}\right] dt = x^m - x^{\lambda_n} + (\lambda_n - m)\sum_{k=0}^{n-1} \frac{a_k}{\lambda_n - \lambda_k}\,\bigl(x^{\lambda_k} - x^{\lambda_n}\bigr).$$
Finally, $\|P_0\| = 1$ and $\|P_n\| \le \bigl|1 - \frac{m}{\lambda_n}\bigr|\,\|P_{n-1}\|$, because
$$|\lambda_n - m|\,x^{\lambda_n} \int_x^1 t^{-1-\lambda_n}\,dt = |\lambda_n - m|\,\frac{1 - x^{\lambda_n}}{\lambda_n} \le \left|1 - \frac{m}{\lambda_n}\right|.$$
Thus,
$$\mathrm{dist}(x^m, X_n) \le \|P_n\| \le \prod_{k=1}^n \left|1 - \frac{m}{\lambda_k}\right|.$$
The preceding result is due to v. Golitschek. A slightly better estimate, also due to v. Golitschek (1970), is $\mathrm{dist}(x^m, X_n) \le \prod_{k=1}^n \frac{|m - \lambda_k|}{m + \lambda_k}$.

Now a well-known fact about infinite products is that, for positive $a_k$'s, the product $\prod_{k=1}^\infty (1 - a_k)$ diverges (to 0) if and only if the series $\sum_{k=1}^\infty a_k$ diverges (to $\infty$) if and only if the product $\prod_{k=1}^\infty (1 + a_k)$ diverges (to $\infty$). In particular, $\prod_{k=1}^n \bigl|1 - \frac{m}{\lambda_k}\bigr| \to 0$ if and only if $\sum_{k=1}^n \frac{1}{\lambda_k} \to \infty$. That is, $\mathrm{dist}(x^m, X_n) \to 0$ if and only if $\sum_{k=1}^\infty \frac{1}{\lambda_k} = \infty$. This proves the "backward" direction of Müntz's theorem.
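A quick numerical look at this dichotomy (a sketch of mine): the bound $\prod_{k \le n} |1 - m/\lambda_k|$ decays to 0 when $\lambda_k = k$, where $\sum 1/\lambda_k$ diverges, but stabilizes at a nonzero value when $\lambda_k = k^2$.

```python
import math

def product_bound(m, lambdas):
    """Upper bound prod |1 - m/l| for dist(x^m, X_n), from the lemma above."""
    return math.prod(abs(1 - m / l) for l in lambdas)

m = 1.5
for n in (10, 100, 1000):
    dense  = product_bound(m, range(1, n + 1))                    # l_k = k
    sparse = product_bound(m, (k * k for k in range(1, n + 1)))   # l_k = k^2
    print(n, dense, sparse)
# first column -> 0 (sum 1/k diverges); second stabilizes (sum 1/k^2 < infinity)
```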
We'll prove the "forward" direction of Müntz's theorem by proving a version of Müntz's theorem for the space $L_2[0,1]$. For our purposes, $L_2[0,1]$ denotes the space $C[0,1]$ endowed with the norm
$$\|f\|_2 = \left(\int_0^1 |f(x)|^2\,dx\right)^{1/2},$$
although our results are equally valid in the "real" space $L_2[0,1]$ (consisting of square-integrable, Lebesgue measurable functions). In the latter case, we no longer need to assume that $\lambda_0 = 0$, but we do need to assume that each $\lambda_n > -1/2$ (in order that $x^{2\lambda_n}$ be integrable on $[0,1]$).
Remarkably, the distance from $f$ to the span of $x^{\lambda_0}, x^{\lambda_1},\dots,x^{\lambda_n}$ can be computed exactly in the $L_2$ norm. For this we'll need some more notation: Given linearly independent vectors $f_1,\dots,f_n$ in an inner product space, we call
$$G(f_1,\dots,f_n) = \begin{vmatrix} \langle f_1, f_1\rangle & \cdots & \langle f_1, f_n\rangle \\ \vdots & \ddots & \vdots \\ \langle f_n, f_1\rangle & \cdots & \langle f_n, f_n\rangle \end{vmatrix} = \det\bigl(\langle f_i, f_j\rangle\bigr)_{i,j}$$
the Gram determinant of $f_1,\dots,f_n$.

Lemma. Let $f_1,\dots,f_n$ be linearly independent vectors in an inner product space, and let $g$ be any vector. Then the distance $d$ from $g$ to the span of $f_1,\dots,f_n$ satisfies
$$d^2 = \frac{G(g, f_1,\dots,f_n)}{G(f_1,\dots,f_n)}.$$

Proof. Write $f = \sum_{i=1}^n a_i f_i$ for the best approximation to $g$ out of the span of the $f_i$'s. Then $g - f$ is orthogonal to each $f_j$, which gives the $n$ equations
$$\sum_{i=1}^n a_i \langle f_i, f_j\rangle = \langle g, f_j\rangle, \qquad j = 1,\dots,n. \tag{$*$}$$
Moreover,
$$d^2 = \langle g - f, g - f\rangle = \langle g - f, g\rangle = \langle g, g\rangle - \langle f, g\rangle;$$
in other words,
$$d^2 + \sum_{i=1}^n a_i \langle g, f_i\rangle = \langle g, g\rangle. \tag{$**$}$$
Now consider $(*)$ and $(**)$ as a system of $n+1$ equations in the $n+1$ unknowns $a_1,\dots,a_n$ and $d^2$; in matrix form we have
$$\begin{bmatrix} 1 & \langle g, f_1\rangle & \cdots & \langle g, f_n\rangle \\ 0 & \langle f_1, f_1\rangle & \cdots & \langle f_1, f_n\rangle \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \langle f_n, f_1\rangle & \cdots & \langle f_n, f_n\rangle \end{bmatrix} \begin{bmatrix} d^2 \\ a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} \langle g, g\rangle \\ \langle f_1, g\rangle \\ \vdots \\ \langle f_n, g\rangle \end{bmatrix}.$$
Solving for $d^2$ using Cramer's rule gives the desired result: expanding along the first column shows that the matrix of coefficients has determinant $G(f_1,\dots,f_n)$, while the matrix obtained by replacing the "$d^2$ column" by the right-hand side has determinant $G(g, f_1,\dots,f_n)$.
Note: By our last Lemma and induction, every Gram determinant is positive!
In what follows, we will still use $X_n$ to denote the span of $x^{\lambda_0},\dots,x^{\lambda_n}$, but now we'll write $\mathrm{dist}_2(f, X_n)$ to denote the distance from $f$ to $X_n$ in the $L_2$ norm.

Theorem. Let $m$, $\lambda_k > -1/2$ for $k = 0, 1, 2, \dots$. Then,
$$\mathrm{dist}_2(x^m, X_n) = \frac{1}{\sqrt{2m+1}} \prod_{k=0}^n \frac{|m - \lambda_k|}{m + \lambda_k + 1}.$$
Proof. The computation rests on Cauchy's determinant formula (a classical identity):
$$\det\left(\frac{1}{a_i + b_j}\right)_{i,j=1}^n = \frac{\prod_{i>j}(a_i - a_j)(b_i - b_j)}{\prod_{i,j}(a_i + b_j)}.$$
As a consistency check on the constant, clear denominators: multiplying row $i$ of the determinant by $a_i + b_i$ turns the identity into
$$\prod_{i \ne j}(a_i + b_j)\;\det\begin{pmatrix} 1 & \frac{a_1+b_1}{a_1+b_2} & \cdots & \frac{a_1+b_1}{a_1+b_n} \\ \frac{a_2+b_2}{a_2+b_1} & 1 & \cdots & \frac{a_2+b_2}{a_2+b_n} \\ \vdots & & \ddots & \vdots \\ \frac{a_n+b_n}{a_n+b_1} & \cdots & \frac{a_n+b_n}{a_n+b_{n-1}} & 1 \end{pmatrix} = \prod_{i>j}(a_i - a_j)(b_i - b_j),$$
and now take the limit as $b_1 \to -a_1$, $b_2 \to -a_2$, etc. The expression on the left tends to $\prod_{i \ne j}(a_i - a_j)$, as does the right-hand side of Cauchy's formula.
Now, $\langle x^p, x^q\rangle = \int_0^1 x^{p+q}\,dx = \frac{1}{p+q+1}$ for $p$, $q > -1/2$, so
$$G(x^{\lambda_0},\dots,x^{\lambda_n}) = \det\left(\frac{1}{\lambda_i + \lambda_j + 1}\right) = \frac{\prod_{i>j}(\lambda_i - \lambda_j)^2}{\prod_{i,j}(\lambda_i + \lambda_j + 1)},$$
with a similar formula holding for $G(x^m, x^{\lambda_0},\dots,x^{\lambda_n})$. Substituting these expressions into our distance formula and taking square roots finishes the proof.
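The formula is easy to test. Here is a sketch of mine that computes $\mathrm{dist}_2(x^m, X_n)$ both from the Gram-determinant lemma and from the closed formula; the exponents are taken to be integers so that exact rational arithmetic applies to the (badly conditioned) determinants.

```python
import math
from fractions import Fraction

def gram(exps):
    """Gram matrix of the monomials x^e (e in exps) in L2[0,1],
    using <x^p, x^q> = 1/(p + q + 1)."""
    return [[Fraction(1, p + q + 1) for q in exps] for p in exps]

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    n, d = len(M), Fraction(1)
    for i in range(n):
        piv = next(r for r in range(i, n) if M[r][i] != 0)
        if piv != i:
            M[i], M[piv] = M[piv], M[i]
            d = -d
        d *= M[i][i]
        for r in range(i + 1, n):
            t = M[r][i] / M[i][i]
            M[r] = [M[r][c] - t * M[i][c] for c in range(n)]
    return d

m, lambdas = 5, [0, 1, 2, 3]
via_gram = math.sqrt(det(gram([m] + lambdas)) / det(gram(lambdas)))
via_formula = math.prod(abs(m - l) / (m + l + 1) for l in lambdas) / math.sqrt(2 * m + 1)
print(via_gram, via_formula)   # the two values agree (~0.011965)
```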
Now we can determine exactly when $X$ is dense in $L_2[0,1]$. For easier comparison to the $C[0,1]$ case, we suppose that the $\lambda_n$'s are nonnegative.

Theorem. Let $0 \le \lambda_0 < \lambda_1 < \lambda_2 < \cdots$. Then, the functions $(x^{\lambda_n})$ have dense linear span in $L_2[0,1]$ if and only if $\sum_{n=1}^\infty \lambda_n^{-1} = \infty$.

Proof. Note first that each factor in the distance formula can be written
$$\frac{|m - \lambda_k|}{m + \lambda_k + 1} = \frac{|1 - m/\lambda_k|}{1 + (m+1)/\lambda_k}.$$
If $\sum_{n=1}^\infty \lambda_n^{-1} < \infty$, then each of the products $\prod_{k=1}^n \bigl(1 - \frac{m}{\lambda_k}\bigr)$ and $\prod_{k=1}^n \bigl(1 + \frac{m+1}{\lambda_k}\bigr)$ converges to some nonzero limit for any $m$ not equal to any $\lambda_k$. Thus, $\mathrm{dist}_2(x^m, X_n) \not\to 0$, as $n \to \infty$, for any $m \ne \lambda_k$, $k = 0, 1, 2, \dots$. In particular, the functions $(x^{\lambda_n})$ cannot have dense linear span in $L_2[0,1]$.

Conversely, if $\sum_{n=1}^\infty \lambda_n^{-1} = \infty$, then $\prod_{k=1}^n \bigl|1 - \frac{m}{\lambda_k}\bigr|$ diverges to 0 while $\prod_{k=1}^n \bigl(1 + \frac{m+1}{\lambda_k}\bigr)$ diverges to $+\infty$. Thus, $\mathrm{dist}_2(x^m, X_n) \to 0$, as $n \to \infty$, for every $m > -1/2$. Since the polynomials are dense in $L_2[0,1]$, this finishes the proof.
Finally, we can finish the proof of Müntz's theorem in the case of $C[0,1]$. Suppose that the functions $(x^{\lambda_n})$ have dense linear span in $C[0,1]$. Then, since $\|f\|_2 \le \|f\|$, it follows that the functions $(x^{\lambda_n})$ must also have dense linear span in $L_2[0,1]$. (Why?) Hence, $\sum_{n=1}^\infty \lambda_n^{-1} = \infty$.
Just for good measure, here's a second proof of the "backward" direction for $C[0,1]$ based on the $L_2[0,1]$ version. Suppose that $\sum_{n=1}^\infty \lambda_n^{-1} = \infty$, and let $m \ge 1$. Then,
$$x^m - \sum_{k=1}^n a_k x^{\lambda_k} = m\int_0^x t^{m-1}\,dt - \sum_{k=1}^n a_k \lambda_k \int_0^x t^{\lambda_k - 1}\,dt,$$
so that
$$\left\|x^m - \sum_{k=1}^n a_k x^{\lambda_k}\right\| \le \int_0^1 \left|m\,t^{m-1} - \sum_{k=1}^n a_k\lambda_k\,t^{\lambda_k - 1}\right| dt \le \left(\int_0^1 \left|m\,t^{m-1} - \sum_{k=1}^n a_k\lambda_k\,t^{\lambda_k - 1}\right|^2 dt\right)^{1/2},$$
the last step by the Cauchy-Schwarz inequality. Since $\sum_n \lambda_n^{-1} = \infty$ forces $\sum_n (\lambda_n - 1)^{-1} = \infty$ as well, the $L_2[0,1]$ version of Müntz's theorem, applied to the exponents $\lambda_k - 1 > -1/2$, allows us to choose the $a_k$ so that the last expression is as small as we please. Together with the constants (recall $\lambda_0 = 0$), this shows that every power $x^m$, and hence, by the Weierstrass theorem, every $f \in C[0,1]$, can be uniformly approximated by elements of $X$.
Application. Let $0 = \lambda_0 < \lambda_1 < \lambda_2 < \cdots$ with $\sum_{n=1}^\infty \lambda_n^{-1} = \infty$, and let $f$ be a continuous function on $[0,\infty)$ for which $c = \lim_{t\to\infty} f(t)$ exists. Then, $f$ can be uniformly approximated by finite linear combinations of the exponentials $(e^{-\lambda_n t})_{n=0}^\infty$.

Proof. The function $g(x) = f(-\log x)$, for $0 < x \le 1$, and $g(0) = c$, is continuous on $[0,1]$. In other words, $g(e^{-t}) = f(t)$ for each $0 \le t < \infty$. Thus, given $\varepsilon > 0$, we can find $n$ and $a_0,\dots,a_n$ such that
$$\max_{0 \le x \le 1}\left|g(x) - \sum_{k=0}^n a_k x^{\lambda_k}\right| = \max_{0 \le t < \infty}\left|f(t) - \sum_{k=0}^n a_k e^{-\lambda_k t}\right| < \varepsilon.$$
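A crude numerical illustration of the substitution (a sketch of mine, with $\lambda_k = k$ so that ordinary polynomials do the work, and with a least-squares fit standing in for the best uniform approximation):

```python
import numpy as np

# f has limit c = 0 at infinity; g(x) = f(-log x), g(0) = 0, is continuous on [0,1]
f = lambda t: 1.0 / (1.0 + t)
x = np.linspace(0.0, 1.0, 2001)
safe_log = np.log(np.where(x > 0, x, 1.0))            # guard log(0)
g = np.where(x > 0, 1.0 / (1.0 - safe_log), 0.0)      # g(x) = f(-log x)

coeffs = np.polyfit(x, g, deg=8)                      # polynomial in x = e^{-t}

t = np.linspace(0.0, 30.0, 3001)
approx = np.polyval(coeffs, np.exp(-t))               # = sum_k a_k e^{-kt}
print(np.max(np.abs(f(t) - approx)))                  # modest, but uniform on [0, 30]
```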
The Stone-Weierstrass Theorem
To begin, an algebra is a vector space $A$ on which there is a multiplication $(f, g) \mapsto fg$ (from $A \times A$ into $A$) satisfying
(i) $(fg)h = f(gh)$, for all $f$, $g$, $h \in A$;
(ii) $f(g + h) = fg + fh$ and $(f + g)h = fh + gh$, for all $f$, $g$, $h \in A$;
(iii) $\lambda(fg) = (\lambda f)g = f(\lambda g)$, for all scalars $\lambda$ and all $f$, $g \in A$.
In other words, an algebra is a ring under vector addition and multiplication, together with a compatible scalar multiplication. The algebra is commutative if
(iv) $fg = gf$, for all $f$, $g \in A$.
And we say that $A$ has an identity element if there is a vector $e \in A$ such that
(v) $fe = ef = f$, for all $f \in A$.
In case $A$ is a normed vector space, we also require that the norm satisfy
(vi) $\|fg\| \le \|f\|\,\|g\|$
(this simplifies things a bit), and in this case we refer to $A$ as a normed algebra. If a normed algebra is complete, we refer to it as a Banach algebra. Finally, a subset $B$ of an algebra $A$ is called a subalgebra (of $A$) if $B$ is itself an algebra (under the same operations); that is, if $B$ is a (vector) subspace of $A$ which is closed under multiplication.
If $A$ is a normed algebra, then all of the various operations on $A$ (or $A \times A$) are continuous. For example, since
$$\|fg - hk\| = \|fg - fk + fk - hk\| \le \|f\|\,\|g - k\| + \|k\|\,\|f - h\|,$$
it follows that multiplication is continuous. (How?) In particular, if $B$ is a subspace (or subalgebra) of $A$, then $\overline{B}$, the closure of $B$, is also a subspace (or subalgebra) of $A$.
Examples
1. If we define multiplication of vectors "coordinatewise," then $\mathbb{R}^n$ is a commutative Banach algebra with identity (the vector $(1,\dots,1)$) when equipped with the norm $\|x\|_\infty = \max_{1\le i\le n} |x_i|$.
2. It's not hard to identify the subalgebras of $\mathbb{R}^n$ among its subspaces. For example, the subalgebras of $\mathbb{R}^2$ are $\{(x, 0) : x \in \mathbb{R}\}$, $\{(0, y) : y \in \mathbb{R}\}$, and $\{(x, x) : x \in \mathbb{R}\}$, along with $\{(0,0)\}$ and $\mathbb{R}^2$.
3. Given a set $X$, we write $B(X)$ for the space of all bounded, real-valued functions on $X$. If we endow $B(X)$ with the sup norm, and if we define arithmetic with functions pointwise, then $B(X)$ is a commutative Banach algebra with identity (the constant 1 function). The constant functions in $B(X)$ form a subalgebra isomorphic (in every sense of the word) to $\mathbb{R}$.
4. If $X$ is a metric (or topological) space, then we may consider $C(X)$, the space of all continuous, real-valued functions on $X$. If we again define arithmetic with functions pointwise, then $C(X)$ is a commutative algebra with identity (the constant 1 function). The bounded, continuous functions on $X$, written $C_b(X) = C(X) \cap B(X)$, form a closed subalgebra of $B(X)$. If $X$ is compact, then $C_b(X) = C(X)$. In other words, if $X$ is compact, then $C(X)$ is itself a closed subalgebra of $B(X)$ and, in particular, $C(X)$ is a Banach algebra with identity.
5. The polynomials form a dense subalgebra of $C[a,b]$. The trig polynomials form a dense subalgebra of $C^{2\pi}$. These two sentences summarize Weierstrass's two classical theorems in modern parlance and form the basis for Stone's version of the theorem.
Using this new language, we may restate the classical Weierstrass theorem to read: If a subalgebra $A$ of $C[a,b]$ contains the functions $e(x) = 1$ and $f(x) = x$, then $A$ is dense in $C[a,b]$. Any subalgebra of $C[a,b]$ containing 1 and $x$ actually contains all the polynomials; thus, our restatement of Weierstrass's theorem amounts to the observation that any subalgebra containing a dense set is itself dense in $C[a,b]$.
Our goal in this section is to prove an analogue of this new version of the Weierstrass theorem for subalgebras of $C(X)$, where $X$ is a compact metric space. In particular, we will want to extract the essence of the functions 1 and $x$ from this statement. That is, we seek conditions on a subalgebra $A$ of $C(X)$ that will force $A$ to be dense in $C(X)$. The key role played by 1 and $x$, in the case of $C[a,b]$, is that a subalgebra containing these two functions must actually contain a much larger set of functions. But since we can't be assured of anything remotely like polynomials living in the more general $C(X)$ spaces, we might want to change our point of view. What we really need is some requirement on a subalgebra $A$ of $C(X)$ that will allow us to construct a wide variety of functions in $A$. And, if $A$ contains a sufficiently rich variety of functions, it might just be possible to show that $A$ is dense.
Since the two replacement conditions we have in mind make sense in any collection of real-valued functions, we state them in some generality.
Let $A$ be a collection of real-valued functions on some set $X$. We say that $A$ separates points in $X$ if, given $x \ne y \in X$, there is some $f \in A$ such that $f(x) \ne f(y)$. We say that $A$ vanishes at no point of $X$ if, given $x \in X$, there is some $f \in A$ such that $f(x) \ne 0$.
Examples
6. The single function $f(x) = x$ clearly separates points in $[a,b]$, and the function $e(x) = 1$ obviously vanishes at no point in $[a,b]$. Any subalgebra $A$ of $C[a,b]$ containing these two functions will likewise separate points and vanish at no point in $[a,b]$.
7. The set $E$ of even functions in $C[-1,1]$ fails to separate points in $[-1,1]$; indeed, $f(x) = f(-x)$ for any even function. However, since the constant functions are even, $E$ vanishes at no point of $[-1,1]$. It's not hard to see that $E$ is a proper closed subalgebra of $C[-1,1]$. The set of odd functions will separate points (since $f(x) = x$ is odd), but the odd functions all vanish at 0. The set of odd functions is a proper closed subspace of $C[-1,1]$, although not a subalgebra.
8. The set of all functions $f \in C[-1,1]$ for which $f(0) = 0$ is a proper closed subalgebra of $C[-1,1]$. In fact, this set is a maximal (in the sense of containment) proper closed subalgebra of $C[-1,1]$. Note, however, that this set of functions does separate points in $[-1,1]$ (again, because it contains $f(x) = x$).
9. It's easy to construct examples of non-trivial closed subalgebras of $C(X)$. Indeed, given any closed subset $X_0$ of $X$, the set $A(X_0) = \{f \in C(X) : f \text{ vanishes on } X_0\}$ is a non-empty, proper subalgebra of $C(X)$. It's closed in any reasonable topology on $C(X)$ because it's closed under pointwise limits. Subalgebras of the type $A(X_0)$ are of interest because they're actually ideals in the ring $C(X)$. That is, if $f \in C(X)$, and if $g \in A(X_0)$, then $fg \in A(X_0)$.
As these few examples illustrate, neither of our new conditions, taken separately, is enough to force a subalgebra of $C(X)$ to be dense. But both conditions together turn out to be sufficient. In order to better appreciate the utility of these new conditions, let's isolate the key computational tool that they permit within an algebra of functions.

Lemma. Let $A$ be an algebra of real-valued functions on some set $X$, and suppose that $A$ separates points in $X$ and vanishes at no point of $X$. Then, given $x \ne y \in X$ and $a$, $b \in \mathbb{R}$, we can find an $f \in A$ with $f(x) = a$ and $f(y) = b$.

Proof. Given any pair of distinct points $x \ne y \in X$, the set $\widetilde{A} = \{(f(x), f(y)) : f \in A\}$ is a subalgebra of $\mathbb{R}^2$. If $A$ separates points in $X$, then $\widetilde{A}$ is evidently neither $\{(0,0)\}$ nor $\{(x,x) : x \in \mathbb{R}\}$. If $A$ vanishes at no point, then $\{(x,0) : x \in \mathbb{R}\}$ and $\{(0,y) : y \in \mathbb{R}\}$ are both excluded. Thus $\widetilde{A} = \mathbb{R}^2$. That is, for any $a$, $b \in \mathbb{R}$, there is some $f \in A$ for which $(f(x), f(y)) = (a, b)$.
Now we can state Stone's version of the Weierstrass theorem (for compact metric spaces). It should be pointed out that the theorem, as stated, also holds in $C(X)$ when $X$ is a compact Hausdorff topological space (with the same proof), but does not hold for algebras of complex-valued functions over $\mathbb{C}$. More on this later.

Stone-Weierstrass Theorem. (real scalars) Let $X$ be a compact metric space, and let $A$ be a subalgebra of $C(X)$. If $A$ separates points in $X$ and vanishes at no point of $X$, then $A$ is dense in $C(X)$.

What Cheney calls an "embryonic" version of this theorem appeared in 1937, as a small part of a massive 106-page paper! Later versions, appearing in 1948 and 1962, benefitted from the work of the great Japanese mathematician Kakutani and were somewhat more palatable to the general mathematical public. But, no matter which version you consult, you'll find them difficult to read. For more details, I would recommend you first consult Folland's Real Analysis, or Simmons's Topology and Modern Analysis.
As a first step in attacking the proof of Stone's theorem, notice that if $A$ satisfies the conditions of the theorem, then so does its closure $\overline{A}$. (Why?) Thus, we may assume that $A$ is actually a closed subalgebra of $C(X)$ and prove, instead, that $A = C(X)$. Now the closed subalgebras of $C(X)$ inherit more structure than you might first imagine.

Theorem. If $A$ is a subalgebra of $C(X)$, and if $f \in A$, then $|f| \in \overline{A}$. Consequently, $\overline{A}$ is a sublattice of $C(X)$.
Proof. Let $\varepsilon > 0$, and consider the function $|t|$ on the interval $[-\|f\|, \|f\|]$. By the Weierstrass theorem, there is a polynomial $p(t) = \sum_{k=0}^n a_k t^k$ such that $\bigl|\,|t| - p(t)\bigr| < \varepsilon$ for all $|t| \le \|f\|$. In particular, notice that $|p(0)| = |a_0| < \varepsilon$.
Now, since $|f(x)| \le \|f\|$ for all $x \in X$, it follows that $\bigl|\,|f(x)| - p(f(x))\bigr| < \varepsilon$ for all $x \in X$. But $p(f(x)) = (p(f))(x)$, where $p(f) = a_0 1 + a_1 f + \cdots + a_n f^n$, and the function $g = a_1 f + \cdots + a_n f^n \in A$, since $A$ is an algebra. Thus, $\bigl|\,|f(x)| - g(x)\bigr| \le |a_0| + \varepsilon < 2\varepsilon$ for all $x \in X$. In other words, for each $\varepsilon > 0$, we can supply an element $g \in A$ such that $\|\,|f| - g\| < 2\varepsilon$. That is, $|f| \in \overline{A}$.
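Such a polynomial can even be produced constructively. One classical recipe (a standard alternative, not the one used in the proof above) is the iteration $p_{j+1}(t) = p_j(t) + \tfrac{1}{2}\bigl(t^2 - p_j(t)^2\bigr)$, $p_0 = 0$, which converges uniformly to $|t|$ on $[-1,1]$; rescale to $[-\|f\|, \|f\|]$ as needed. A sketch:

```python
def abs_approx(t, iters):
    """Evaluate p_iters(t), where p_0 = 0 and p_{j+1} = p_j + (t^2 - p_j^2)/2.
    Each p_j is a polynomial in t, and p_j -> |t| uniformly on [-1, 1]."""
    p = 0.0
    for _ in range(iters):
        p += (t * t - p * p) / 2
    return p

# uniform error on a grid of [-1, 1]; roughly 2/(iters + 2)
pts = [i / 1000 for i in range(-1000, 1001)]
print(max(abs(abs(t) - abs_approx(t, 50)) for t in pts))   # < 0.04
```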
The statement that $\overline{A}$ is a sublattice of $C(X)$ means that if we're given $f$, $g \in \overline{A}$, then $\max\{f, g\} \in \overline{A}$ and $\min\{f, g\} \in \overline{A}$, too. But this is actually just a statement about real numbers. Indeed, since
$$\max\{a, b\} = \frac{a + b}{2} + \frac{|a - b|}{2} \qquad\text{and}\qquad \min\{a, b\} = \frac{a + b}{2} - \frac{|a - b|}{2},$$
we have $\max\{f, g\} = \frac{1}{2}\bigl(f + g + |f - g|\bigr) \in \overline{A}$ and $\min\{f, g\} = \frac{1}{2}\bigl(f + g - |f - g|\bigr) \in \overline{A}$.

As promised earlier, the theorem fails, as stated, for complex scalars. To see this, consider the function $f(z) = \bar{z}$ on the unit circle $T$, and let $p(z) = \sum_{k=0}^n c_k z^k$ be any polynomial in $z$. Since $\int_0^{2\pi} e^{i(k+1)t}\,dt = 0$ for every $k \ge 0$, we have $\int_0^{2\pi} \overline{f(e^{it})}\,p(e^{it})\,dt = \int_0^{2\pi} e^{it}\,p(e^{it})\,dt = 0$, and so
$$2\pi = \int_0^{2\pi} f(e^{it})\,\overline{f(e^{it})}\,dt = \int_0^{2\pi} \overline{f(e^{it})}\,\bigl[f(e^{it}) - p(e^{it})\bigr]\,dt,$$
because $f(z)\overline{f(z)} = |f(z)|^2 = 1$. Now, taking absolute values, we get
$$2\pi \le \int_0^{2\pi} \bigl|f(e^{it}) - p(e^{it})\bigr|\,dt \le 2\pi\,\|f - p\|.$$
That is, $\|f - p\| \ge 1$ for any polynomial $p$.
We might as well proceed in some generality: Given a compact metric space $X$, we'll write $C_{\mathbb{C}}(X)$ for the set of all continuous, complex-valued functions $f : X \to \mathbb{C}$, and we norm $C_{\mathbb{C}}(X)$ by $\|f\| = \max_{x\in X} |f(x)|$ (where $|f(x)|$ is the modulus of the complex number $f(x)$, of course). $C_{\mathbb{C}}(X)$ is a Banach algebra over $\mathbb{C}$. In order to make it clear which field of scalars is involved, we'll write $C_{\mathbb{R}}(X)$ for the real-valued members of $C_{\mathbb{C}}(X)$. Notice, though, that $C_{\mathbb{R}}(X)$ is nothing other than $C(X)$ with a new name.
More generally, we'll write $A_{\mathbb{C}}$ to denote an algebra, over $\mathbb{C}$, of complex-valued functions and $A_{\mathbb{R}}$ to denote the real-valued members of $A_{\mathbb{C}}$. It's not hard to see that $A_{\mathbb{R}}$ is then an algebra, over $\mathbb{R}$, of real-valued functions.
Now if $f$ is in $C_{\mathbb{C}}(X)$, then so is the function $\bar{f}(x) = \overline{f(x)}$ (the complex-conjugate of $f(x)$). This puts
$$\mathrm{Re}\,f = \tfrac{1}{2}\bigl(f + \bar{f}\,\bigr) \qquad\text{and}\qquad \mathrm{Im}\,f = \tfrac{1}{2i}\bigl(f - \bar{f}\,\bigr),$$
the real and imaginary parts of $f$, in $C_{\mathbb{R}}(X)$ too. Conversely, if $g$, $h \in C_{\mathbb{R}}(X)$, then $g + ih \in C_{\mathbb{C}}(X)$.
This simple observation gives us a hint as to how we might apply the Stone-Weierstrass theorem to subalgebras of $C_{\mathbb{C}}(X)$. Given a subalgebra $A_{\mathbb{C}}$ of $C_{\mathbb{C}}(X)$, suppose that we could prove that $A_{\mathbb{R}}$ is dense in $C_{\mathbb{R}}(X)$. Then, given any $f \in C_{\mathbb{C}}(X)$, we could approximate $\mathrm{Re}\,f$ and $\mathrm{Im}\,f$ by elements $g$, $h \in A_{\mathbb{R}}$. But since $A_{\mathbb{R}} \subset A_{\mathbb{C}}$, this means that $g + ih \in A_{\mathbb{C}}$, and $g + ih$ approximates $f$. That is, $A_{\mathbb{C}}$ is dense in $C_{\mathbb{C}}(X)$. Great! And what did we really use here? Well, we need $A_{\mathbb{R}}$ to contain the real and imaginary parts of "most" functions in $C_{\mathbb{C}}(X)$. If we insist that $A_{\mathbb{C}}$ separate points and vanish at no point, then $A_{\mathbb{R}}$ will contain "most" of $C_{\mathbb{R}}(X)$. And, to be sure that we get both the real and imaginary parts of each element of $A_{\mathbb{C}}$, we'll insist that $A_{\mathbb{C}}$ contain the conjugates of each of its members: $\bar{f} \in A_{\mathbb{C}}$ whenever $f \in A_{\mathbb{C}}$. That is, we'll require that $A_{\mathbb{C}}$ be self-conjugate (or, as some authors say, self-adjoint).

Stone-Weierstrass Theorem. (complex scalars) Let $X$ be a compact metric space, and let $A_{\mathbb{C}}$ be a subalgebra, over $\mathbb{C}$, of $C_{\mathbb{C}}(X)$. If $A_{\mathbb{C}}$ separates points in $X$, vanishes at no point of $X$, and is self-conjugate, then $A_{\mathbb{C}}$ is dense in $C_{\mathbb{C}}(X)$.
Proof. Again, write $A_{\mathbb{R}}$ for the set of real-valued members of $A_{\mathbb{C}}$. Since $A_{\mathbb{C}}$ is self-conjugate, $A_{\mathbb{R}}$ contains the real and imaginary parts of every $f \in A_{\mathbb{C}}$:
$$\mathrm{Re}\,f = \tfrac{1}{2}\bigl(f + \bar{f}\,\bigr) \in A_{\mathbb{R}} \qquad\text{and}\qquad \mathrm{Im}\,f = \tfrac{1}{2i}\bigl(f - \bar{f}\,\bigr) \in A_{\mathbb{R}}.$$
Moreover, $A_{\mathbb{R}}$ is a subalgebra, over $\mathbb{R}$, of $C_{\mathbb{R}}(X)$. In addition, $A_{\mathbb{R}}$ separates points in $X$ and vanishes at no point of $X$. Indeed, given $x \ne y \in X$ and $f \in A_{\mathbb{C}}$ with $f(x) \ne f(y)$, we must have at least one of $\mathrm{Re}\,f(x) \ne \mathrm{Re}\,f(y)$ or $\mathrm{Im}\,f(x) \ne \mathrm{Im}\,f(y)$. Similarly, $f(x) \ne 0$ means that at least one of $\mathrm{Re}\,f(x) \ne 0$ or $\mathrm{Im}\,f(x) \ne 0$ holds. That is, $A_{\mathbb{R}}$ satisfies the hypotheses of the real-scalar version of the Stone-Weierstrass theorem. Consequently, $A_{\mathbb{R}}$ is dense in $C_{\mathbb{R}}(X)$.
Now, given $f \in C_{\mathbb{C}}(X)$ and $\varepsilon > 0$, take $g$, $h \in A_{\mathbb{R}}$ with $\|g - \mathrm{Re}\,f\| < \varepsilon/2$ and $\|h - \mathrm{Im}\,f\| < \varepsilon/2$. Then, $g + ih \in A_{\mathbb{C}}$ and $\|f - (g + ih)\| < \varepsilon$. Thus, $A_{\mathbb{C}}$ is dense in $C_{\mathbb{C}}(X)$.
Corollary. The polynomials, with complex coefficients, in $z$ and $\bar{z}$ are dense in $C_{\mathbb{C}}(T)$. In other words, the complex trig polynomials are dense in $C_{\mathbb{C}}^{2\pi}$.

Note that it follows from the complex-scalar proof that the real parts of the polynomials in $z$ and $\bar{z}$, that is, the real trig polynomials, are dense in $C_{\mathbb{R}}(T) = C_{\mathbb{R}}^{2\pi}$.

Corollary. The real trig polynomials are dense in $C_{\mathbb{R}}^{2\pi}$.
Application: Lipschitz Functions
In most Real Analysis courses, the classical Weierstrass theorem is used to prove that $C[a,b]$ is separable. Likewise, the Stone-Weierstrass theorem can be used to show that $C(X)$ is separable, where $X$ is a compact metric space. While we won't have anything quite so convenient as polynomials at our disposal, we do, at least, have a familiar collection of functions to work with.
Given a metric space $(X, d)$, and $0 \le K < \infty$, we'll write $\mathrm{lip}_K(X)$ to denote the collection of all real-valued Lipschitz functions on $X$ with constant at most $K$; that is, $f : X \to \mathbb{R}$ is in $\mathrm{lip}_K(X)$ if $|f(x) - f(y)| \le K\,d(x,y)$ for all $x$, $y \in X$. And we'll write $\mathrm{lip}(X)$ to denote the set of functions that are in $\mathrm{lip}_K(X)$ for some $K$; in other words, $\mathrm{lip}(X) = \bigcup_{K=1}^\infty \mathrm{lip}_K(X)$. It's easy to see that $\mathrm{lip}(X)$ is a subspace of $C(X)$; in fact, if $X$ is compact, then $\mathrm{lip}(X)$ is even a subalgebra of $C(X)$. Indeed, given $f \in \mathrm{lip}_K(X)$ and $g \in \mathrm{lip}_M(X)$, we have
$$|f(x)g(x) - f(y)g(y)| \le |f(x)g(x) - f(y)g(x)| + |f(y)g(x) - f(y)g(y)| \le K\,\|g\|\,d(x,y) + M\,\|f\|\,d(x,y).$$
Lemma. If $X$ is a compact metric space, then $\mathrm{lip}(X)$ is dense in $C(X)$.

Proof. Clearly, $\mathrm{lip}(X)$ contains the constant functions and so vanishes at no point of $X$. To see that $\mathrm{lip}(X)$ separates points in $X$, we use the fact that the metric $d$ is Lipschitz: Given $x_0 \ne y_0 \in X$, the function $f(x) = d(x, y_0)$ satisfies $f(x_0) > 0 = f(y_0)$; moreover, $f \in \mathrm{lip}_1(X)$ since
$$|f(x) - f(y)| = |d(x, y_0) - d(y, y_0)| \le d(x, y).$$
The result now follows from the Stone-Weierstrass theorem.

It follows that $C(X)$ is separable: for each $K$, consider the set $E_K = \{f \in \mathrm{lip}_K(X) : \|f\| \le K\}$, so that $\mathrm{lip}(X) = \bigcup_{K=1}^\infty E_K$. (Why?) The sets $E_K$ are (uniformly) bounded and equicontinuous. Hence, by the Arzelà-Ascoli theorem, each $E_K$ is compact in $C(X)$. Since compact sets are separable, as are countable unions of compact sets, it follows that $\mathrm{lip}(X)$ is separable.
As it happens, the converse is also true (which is why this is interesting); see Folland's Real Analysis for more details.

Theorem. If $C(X)$ is separable, where $X$ is a compact Hausdorff topological space, then $X$ is metrizable.
A Short List of References
Books
Abramowitz, M. and Stegun, I., eds., Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover, 1965.
Birkhoff, G., A Source Book in Classical Analysis, Harvard, 1973.
Buck, R. C., ed., Studies in Modern Analysis, MAA, 1962.
Carothers, N. L., Real Analysis, Cambridge, 2000.
Cheney, E. W., Introduction to Approximation Theory, Chelsea, 1982.
Davis, P. J., Interpolation and Approximation, Dover, 1975.
DeVore, R. A. and Lorentz, G. G., Constructive Approximation, Springer-Verlag,
1993.
Dudley, R. M., Real Analysis and Probability, Wadsworth & Brooks/Cole, 1989.
Folland, G. B., Real Analysis: modern techniques and their applications, Wiley, 1984.
Fox, L. and Parker, I. B., Chebyshev Polynomials, Oxford University Press, 1968.
Hoffman, K., Analysis in Euclidean Space, Prentice-Hall, 1975.
Jackson, D., Theory of Approximation, AMS, 1930.
Jackson, D., Fourier Series and Orthogonal Polynomials, MAA, 1941.
Körner, T. W., Fourier Analysis, Cambridge, 1988.
Korovkin, P. P., Linear Operators and Approximation Theory, Hindustan Publishing,
1960.
La Vallée Poussin, Ch.-J. de, Leçons sur l'Approximation des Fonctions d'une Variable Réelle, Gauthier-Villars, 1919.
Lorentz, G. G., Bernstein Polynomials, Chelsea, 1986.
Lorentz, G. G., Approximation of Functions, Chelsea, 1986.
Natanson, I., Constructive Function Theory, 3 vols., Ungar, 1964–1965.
Powell, M. J. D., Approximation Theory and Methods, Cambridge, 1981.
Rivlin, T. J., An Introduction to the Approximation of Functions, Dover, 1981.
Rivlin, T. J., The Chebyshev Polynomials, Wiley, 1974.
Rudin, W., Principles of Mathematical Analysis, 3rd ed., McGraw-Hill, 1976.
Simmons, G. F., Introduction to Topology and Modern Analysis, McGraw-Hill, 1963; reprinted by Robert E. Krieger Publishing, 1986.
Articles
Boas, R. P., "Inequalities for the derivatives of polynomials," Mathematics Magazine, 42 (1969), 165–174.
Fisher, S., "Quantitative approximation theory," The American Mathematical Monthly, 85 (1978), 318–332.
Hedrick, E. R., "The significance of Weierstrass's theorem," The American Mathematical Monthly, 20 (1927), 211–213.
Jackson, D., "The general theory of approximation by polynomials and trigonometric sums," Bulletin of the American Mathematical Society, 27 (1920–1921), 415–431.
Lebesgue, H., "Sur l'approximation des fonctions," Bulletin des Sciences Mathématiques, 22 (1898), 278–287.
Shields, A., "Polynomial approximation," The Mathematical Intelligencer, 9 (1987), No. 3, 5–7.
Shohat, J. A., "On the development of functions in series of orthogonal polynomials," Bulletin of the American Mathematical Society, 41 (1935), 49–82.
Stone, M. H., "Applications of the theory of Boolean rings to general topology," Transactions of the American Mathematical Society, 41 (1937), 375–481.
Stone, M. H., "A generalized Weierstrass theorem," in Studies in Modern Analysis, R. C. Buck, ed., MAA, 1962.
Van Vleck, E. B., "The influence of Fourier's series upon the development of mathematics," Science, 39 (1914), 113–124.
Weierstrass, K., "Über die analytische Darstellbarkeit sogenannter willkürlicher Functionen einer reellen Veränderlichen," Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin, (1885), 633–639, 789–805.
Weierstrass, K., "Sur la possibilité d'une représentation analytique des fonctions dites arbitraires d'une variable réelle," Journal de Mathématiques Pures et Appliquées, 2 (1886), 105–138.