NSA Notes
ISAAC GOLDBRING
Contents
1. The hyperreals
1.1. Basic facts about the ordered real field
1.2. The nonstandard extension
1.3. Arithmetic in the hyperreals
1.4. The structure of N∗
1.5. More practice with transfer
1.6. Problems
2. Logical formalisms for nonstandard extensions
2.1. Approach 1: The compactness theorem
2.2. Approach 2: The ultrapower construction
2.3. Problems
3. Sequences and series
3.1. First results about sequences
3.2. Cluster points
3.3. Series
3.4. Problems
4. Continuity
4.1. First results about continuity
4.2. Uniform continuity
4.3. Sequences of functions
4.4. Problems
5. Differentiation
5.1. The derivative
5.2. Continuous differentiability
5.3. Problems
6. Riemann Integration
6.1. Hyperfinite Riemann sums and integrability
6.2. The Peano Existence Theorem
6.3. Problems
7. Weekend Problem Set #1
8. Many-sorted and Higher-Type Structures
8.1. Many-sorted structures
1. The hyperreals
1.1. Basic facts about the ordered real field. The ordered field of real
numbers is the structure (R; +, ·, 0, 1, <). We recall some basic properties:
• (Q is dense in R) for every r ∈ R and every ε ∈ R>0 , there is q ∈ Q
such that |r − q| < ε;
• (Triangle Inequality) for every x, y ∈ R, we have |x + y| ≤ |x| + |y|;
• (Archimedean Property) for every x, y ∈ R>0 , there is n ∈ N such
that nx > y.
Perhaps the most important property of the ordered real field is
Definition 1.1 (Completeness Property). If A ⊆ R is nonempty and bounded
above, then there is a b ∈ R such that:
• for all a ∈ A, we have a ≤ b (b is an upper bound for A);
Suppose that the result is true for a given m. Notice that N − m ≠ 0, else
N = m ∈ N. Applying transfer to the statement “for all n ∈ N, if n ≠ 0,
then n − 1 ∈ N,” we see that (N − m) − 1 = N − (m + 1) ∈ N∗ .
Since we know we have at least one nonstandard natural number, we now
know that we have an entire galaxy of them. Notice that a galaxy looks just
like a copy of Z and that γ(M ) = γ(N ) if and only if |M − N | ∈ N.
Observe that if γ(M ) = γ(M′) and γ(N ) = γ(N′) and γ(M ) ≠ γ(N ),
then M < N if and only if M′ < N′. Consequently, we can define an
ordering on galaxies: if γ(M ) ≠ γ(N ), then we say γ(M ) < γ(N ) if and
only if M < N . When γ(M ) < γ(N ), we think of M as being infinitely less
than N .
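The following short computation is a sketch of why this ordering does not depend on the chosen representatives. It is not part of the original notes; the integers j and k are names introduced here for the offsets inside a galaxy, and the block assumes the amsmath package.

```latex
% Assumptions (introduced for illustration): M' = M + j and N' = N + k with j, k \in \mathbb{Z},
% M < N, and \gamma(M) \neq \gamma(N), so that N - M is a positive infinite element of \mathbb{N}^*.
\begin{align*}
N' - M' &= (N + k) - (M + j) \\
        &= (N - M) + (k - j).
\end{align*}
% N - M is positive infinite and k - j is a standard integer,
% so the sum is still positive; hence M' < N'.
```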
What can be said about the ordering of the set of galaxies? In particular,
are there more than just two galaxies?
Lemma 1.16. The set of infinite galaxies is densely ordered without end-
points, meaning:
(1) there is no largest infinite galaxy, that is, for every M ∈ N∗ \ N,
there is N ∈ N∗ \ N such that γ(M ) < γ(N );
(2) there is no smallest infinite galaxy, that is, for every M ∈ N∗ \ N,
there is N ∈ N∗ \ N such that γ(N ) < γ(M );
(3) between any two infinite galaxies, there is a third (infinite) galaxy,
that is, for every M1 , M2 ∈ N∗ \ N such that γ(M1 ) < γ(M2 ), there
is N ∈ N∗ \ N such that γ(M1 ) < γ(N ) < γ(M2 ).
Proof. (1) Given M ∈ N∗ \ N, we claim that γ(M ) < γ(2M ). Otherwise,
2M = M + m for some m ∈ N, whence M = m, a contradiction.
(2) Since γ(M ) = γ(M − 1), we may as well suppose that M is even. Then
γ(M/2) < γ(M ) from the proof of (1); it remains to note that M/2 ∈ N∗ \ N.
(3) Again, we may as well assume that M1 and M2 are both even. In this
case, arguing as before, one can see that γ(M1 ) < γ((M1 + M2 )/2) < γ(M2 ).
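The phrase “arguing as before” in part (3) can be unpacked as follows. This is a sketch under the stated assumptions, with K a name introduced here for the midpoint; it assumes the amsmath package.

```latex
% Assume M_1, M_2 are even, \gamma(M_1) < \gamma(M_2), and put K := (M_1 + M_2)/2 \in \mathbb{N}^*.
\begin{align*}
K - M_1 &= \frac{M_2 - M_1}{2}, &
M_2 - K &= \frac{M_2 - M_1}{2}.
\end{align*}
% If (M_2 - M_1)/2 were some m \in \mathbb{N}, then M_2 - M_1 = 2m \in \mathbb{N},
% contradicting \gamma(M_1) \neq \gamma(M_2). So both differences are positive infinite,
% which gives \gamma(M_1) < \gamma(K) < \gamma(M_2).
```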
Under suitable richness assumptions on the nonstandard extension (to
be discussed later), one can go even further: if (Nα )α<κ is a descending
sequence of nonstandard natural numbers, then there is N ∈ N∗ \ N such
that N < Nα for all α < κ.
1.5. More practice with transfer. In order to get some practice with the
Transfer Principle, we will prove the assertion made in Remark 1.3. More
precisely:
Theorem 1.17. The statement “every finite element of R∗ has a standard
part” implies the Completeness Property of the ordered real field.
Proof. Suppose that A ⊆ R is nonempty and bounded above. We must
show that sup(A) exists. Let b ∈ R be an upper bound for A. Let’s define
a function f : R → R as follows: If r ∈ R \ N, set f (r) = 0. Otherwise, set
f (n) = the least k ∈ Z such that k/n is an upper bound for A; such a k exists
LECTURE NOTES ON NONSTANDARD ANALYSIS 9
Claim 1: f (N )/N ∈ Rfin : Suppose this is not the case, so f (N )/N ∈ Rinf . Since
f (1) ≤ f (N ), we must have that f (N ) is a positive infinite element. Since
the statement “for all n ∈ N, there is a ∈ A such that (f (n) − 1)/n < a” is true in
R, by the Transfer Principle, the statement “for all n ∈ N∗ , there is a ∈ A∗
1.6. Problems.
Problem 1.1. Let x, x′, y, y′ ∈ Rfin .
(1) Show that x ≈ y if and only if st(x) = st(y).
(2) Show that if x ∈ R, then x = st(x).
(3) Show that x ≤ y implies st(x) ≤ st(y). Show that the converse is
false.
(4) Show that if st(x) < st(y), then x < y. In fact, show that if st(x) <
st(y), then there is r ∈ R such that x < r < y.
(5) Suppose that x ≈ x′ and y ≈ y′. Show that:
(a) x ± y ≈ x′ ± y′;
(b) x · y ≈ x′ · y′;
(c) x/y ≈ x′/y′ if y ≉ 0.
Show that (c) can fail if y, y′ ∈ µ \ {0}.
Problem 1.2. Suppose x, y ∈ R∗ and x ≈ y. Show that if b ∈ Rfin , then
bx ≈ by. Show that this can fail if b ∉ Rfin .
Problem 1.3.
(1) Show that R∗ is not complete by finding a nonempty subset of R∗
which is bounded above that does not have a supremum.
(2) Show that if A ⊆ R is unbounded, then A has no least upper bound
when considered as a subset of R∗ . (This may even be how you
solved part (1).)
Problem 1.4. Let F be an ordered field. F is said to be archimedean if for
any x, y ∈ F with x, y > 0, there is n ∈ N such that y < nx. Show that R∗ is
not archimedean. (It is a fact that complete ordered fields are archimedean,
so this problem strengthens the result of the previous problem.)
Problem 1.5. Construct a sequence of subsets (An ) of R such that
(⋃_{n=1}^{∞} An )∗ ≠ ⋃_{n=1}^{∞} (An )∗ .
If you asked the former question, you can safely skip this section and
discover the wonders of the nonstandard calculus to come in the follow-
ing sections. (But please, at some point, return and read this section!) If
you asked the latter question, we will ease your trepidations by offering
not one, but two, different logical formalisms for nonstandard extensions.
The first formalism will rely heavily on the Compactness Theorem from
first-order logic, but, modulo that prerequisite, this route is the quickest
way to obtain nonstandard extensions. The second formalism is the Ultra-
product Approach, which is the most algebraic and “mainstream” way to
explain nonstandard methods to “ordinary” mathematicians. Of course, at
some point, some logic must be introduced in the form of Łoś’s (pronounced
“Wash’s”) theorem, which will be discussed as well.
Exercise 2.1.
(1) The function h : R → A given by h(r) = (cr )A is an injective
homomorphism of L-structures.
(2) Use (1) to find some L-structure A0 isomorphic to A such that R is
a substructure of A0 .
that, for every r ∈ R>0 , R∗ |= c0 < v < cr ⟦ε⟧. Then ε is a positive infinitesimal,
verifying postulate (NS2). Finally, the fact that R∗ |= Th(R) is the
rigorous, precise meaning of postulate (NS4). We have thus proven:
Theorem 2.2. There is a nonstandard universe, namely R∗ .
Exercise 2.3. Suppose that A ⊆ Rn and f : A → R is a function.
(1) Let f1 : Rn → R be an arbitrary extension of f to all of Rn . Define
f ∗ := f1∗ |A∗ . Show that f ∗ is independent of the choice of extension
f1 . (This provides a way to define the nonstandard extensions of
partial functions.)
(2) Set Γf := {(x, y) ∈ Rn+1 | f (x) = y}. Show that Γ∗f is the graph of
a function g : A∗ → R∗ . Show also that g = f ∗ .
The following observation is very useful.
Proposition 2.4. Suppose that ϕ(x1 , . . . , xm ) is an L-formula. Set S :=
{r⃗ ∈ R^m | R |= ϕ⟦r⃗⟧}. Then S∗ = {r⃗ ∈ (R∗ )^m | R∗ |= ϕ⟦r⃗⟧}.
Proof. Just observe that ∀v⃗ (PS v⃗ ↔ ϕ(v⃗)) belongs to Th(R).
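As a small illustration of how the proposition is used, consider one particular formula. The choice of ϕ and ψ below is ours, not the notes', and it assumes only that the language has symbols for multiplication, order, and the constant 0; the block uses amsmath.

```latex
% Illustrative choice of formula (not from the notes): \varphi(x) := \exists y\,(y \cdot y = x).
\begin{align*}
S   &= \{ r \in \mathbb{R} : \mathbb{R} \models \varphi[\![ r ]\!] \} = [0, \infty), \\
S^* &= \{ r \in \mathbb{R}^* : \mathbb{R}^* \models \varphi[\![ r ]\!] \}
     = \{ r \in \mathbb{R}^* : r = y \cdot y \text{ for some } y \in \mathbb{R}^* \}.
\end{align*}
% Applying the proposition to \psi(x) := (x = 0 \lor 0 < x), which defines the same set S,
% gives S^* = \{ r \in \mathbb{R}^* : r \ge 0 \}; so every nonnegative hyperreal is a square.
```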
Corollary 2.5. N is not a definable set (even with parameters) in R∗ .
Proof. Suppose, towards a contradiction, that there is an L-formula ϕ(x, y⃗)
and r⃗ ∈ R∗ such that N = {a ∈ R∗ | R∗ |= ϕ⟦a, r⃗⟧}. Write down an
L-sentence σ which says that for all y⃗, if ϕ(x, y⃗) defines a nonempty set
of natural numbers that is bounded above, then this set has a maximum.
(Remember you have a symbol PN for the set of natural numbers.) Since
R |= σ, we have R∗ |= σ. Now N is bounded above in R∗ (by an infinite
element). Thus, N should have a maximum in R∗ , which is clearly ridiculous.
Since N∗ is a definable set in R∗ (defined by PN ), we obtain the following
Corollary 2.6. N∗ \ N is not definable in R∗ .
In modern nonstandard analysis parlance, the previous two results would
be phrased as “N and N∗ \ N are not internal sets.” We will discuss internal
sets later in these notes.
2.2. Approach 2: The ultrapower construction. In this approach to
nonstandard analysis, one gives an “explicit” construction of the nonstan-
dard universe in a manner very similar to the explicit construction of the
real numbers from the rational numbers. Recall that a real number can be
viewed as a sequence of rational numbers which we view as better and better
approximations to the real number. Similarly, an element of R∗ should be
viewed as a sequence of real numbers. For example, the sequence (1, 2, 3, . . .)
should represent some infinite element of R∗ .
However, many different sequences of rational numbers represent the same
real number. Thus, a real number is an equivalence class of sequences of
rational numbers (qn ), where (qn ) and (qn′ ) are equivalent if they “represent
the same real number,” or, more formally, if limn→∞ (qn − qn′ ) = 0. We run
into the same issue here: many sequences of real numbers should represent
the same hyperreal number. For instance, it should hopefully be clear that
the sequence (1, 2, 3, . . . , n, n + 1, . . .) and (π, e, −72, 4, 5, 6, . . . , n, n + 1, . . .)
should represent the same (infinite) hyperreal number as they only differ in
a finite number of coordinates.
More generally, we would like to say that two sequences of real numbers
represent the same hyperreal number if they agree on “most” coordinates.
But what is a good notion of “most” coordinates? A first guess might be
that “most” means all but finitely many; it turns out that this guess is
insufficient for our purposes. Instead, we will need a slightly more general
notion of when two sequences agree on a large number of coordinates; this
brings in the notion of a filter.
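One standard way to see why “all but finitely many” is too restrictive is the pair of sequences below. This is an illustration we supply, not an argument taken from the notes; it assumes that sequences are multiplied coordinatewise and uses amsmath.

```latex
% Illustrative sequences (not from the notes): a and b alternate between 0 and 1.
\begin{align*}
a &= (1, 0, 1, 0, \ldots), & b &= (0, 1, 0, 1, \ldots), & a \cdot b &= (0, 0, 0, 0, \ldots).
\end{align*}
% Neither a nor b agrees with the zero sequence on all but finitely many coordinates,
% yet their coordinatewise product is the zero sequence. If "most" meant "cofinitely many",
% the classes of a and b would be nonzero elements whose product is zero,
% so the quotient could not be a field.
```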
Definition 2.7. A (proper) filter on N is a set F of subsets of N (that is,
F ⊆ P(N)) such that:
• ∅ ∉ F, N ∈ F;
• if A, B ∈ F, then A ∩ B ∈ F ;
• if A ∈ F and A ⊆ B, then B ∈ F.
We think of elements of F as “big” sets (because that’s what filters do,
they catch the big objects). The first and third axioms are (hopefully)
intuitive properties of big sets. Perhaps the second axiom is not as intuitive,
but if one thinks of the complement of a big set as a “small” set, then the
second axiom asserts that the union of two small sets is small (which is
hopefully more intuitive).
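The reformulation of the second axiom in terms of small sets is just De Morgan's law, as the following line sketches. The names X and Y are introduced here for the complements of the big sets A and B.

```latex
% X and Y are names introduced here for the "small" complements of big sets A, B \in \mathcal{F}.
\[
X := \mathbb{N} \setminus A, \quad Y := \mathbb{N} \setminus B
\quad\Longrightarrow\quad
X \cup Y = \mathbb{N} \setminus (A \cap B).
\]
% Since A \cap B \in \mathcal{F} by the second axiom, the union X \cup Y is again small.
```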
Exercise 2.8. Set F := {A ⊆ N | N \ A is finite}. Prove that F is a filter
on N, called the Fréchet or cofinite filter on N.
Exercise 2.9. Suppose that D is a set of subsets of N with the finite intersection
property, namely, whenever D1 , . . . , Dn ∈ D, we have D1 ∩ · · · ∩ Dn ≠ ∅.
Set
⟨D⟩ := {E ⊆ N | D1 ∩ · · · ∩ Dn ⊆ E for some D1 , . . . , Dn ∈ D}.
Show that ⟨D⟩ is the smallest filter on N containing D, called the filter
generated by D.
If F is a filter on N, then a subset of N cannot be simultaneously big and
small (that is, both it and its complement belong to F), but there is no
requirement that it be one of the two. It will be desirable (for reasons that
will become clear in a second) to add this as an additional property:
Definition 2.10. If F is a filter on N, then F is an ultrafilter if, for any
A ⊆ N, either A ∈ F or N \ A ∈ F (but not both!).
Ultrafilters are usually denoted by U. Observe that the Fréchet filter on
N is not an ultrafilter since there are sets A ⊆ N such that A and N \ A
(2) Suppose (sn ) is bounded below and nonincreasing. Then (sn ) converges
to inf{sn | n ∈ N}.
Proof. We only prove (1); the proof of (2) is similar. By the previous theo-
rem, it suffices to prove the following
Exercise 3.9. (sn ) is Cauchy if and only if, for all M, N > N, sM ≈ sN .
Proposition 3.10. (sn ) converges if and only if (sn ) is Cauchy.
Proof. The (⇒) direction is an easy exercise, so we only prove the (⇐)
direction. Suppose (sn ) is Cauchy; then (sn ) is bounded. Fix M > N; then
sM ∈ Rfin . Set L := st(sM ). If N > N is another infinite natural number,
then by the previous exercise, sN ≈ sM , so sN ≈ L. Thus, (sn ) converges
to L.
3.2. Cluster points. If (sn ) is a sequence and L ∈ R, then we say that
L is a cluster point of (sn ) if, for each ε ∈ R>0 , the interval (L − ε, L + ε)
contains infinitely many sn ’s. It will be useful for us to write this in another
way: L is a cluster point of (sn ) if and only if, for every ε ∈ R>0 , for every
m ∈ N, there is n ∈ N such that n ≥ m and |sn − L| < ε.
We can also recast this notion in terms of subsequences. A subsequence
of (sn ) is a sequence (tk ) such that there is an increasing function α : N → N
satisfying tk = sα(k) . We often write (snk ) for a subsequence of (sn ), where
nk := α(k).
Exercise 3.11. L is a cluster point of (sn ) if and only if there is a subse-
quence (snk ) of (sn ) that converges to L.
Recall that (sn ) converges to L if sN ≈ L for all N > N. Changing the
quantifier “for all” to “there exists” gives us the notion of cluster point:
Proposition 3.12. L is a cluster point of (sn ) if and only if there is N > N
such that sN ≈ L.
Proof. (⇒): Apply the transfer principle to the definition of cluster point.
Fix ε ∈ µ>0 and M > N. Then there is N ∈ N∗ , N ≥ M , such that
|sN − L| < ε. This is the desired N since N > N and ε is infinitesimal.
(⇐): Fix N > N such that sN ≈ L. Fix ε ∈ R>0 , m ∈ N. Then
R∗ |= (∃n ∈ N∗ )(n ≥ m ∧ |sn − L| < ε);
indeed, N witnesses the truth of this quantifier. It remains to apply transfer
to this statement.
We immediately get the famous:
Corollary 3.13 (Bolzano-Weierstraß). Suppose that (sn ) is bounded. Then
(sn ) has a cluster point.
Proof. Fix N > N. Then since (sn ) is bounded, sN ∈ Rfin . Let L = st(sN );
then L is a cluster point of (sn ) by the last proposition.
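A concrete bounded sequence may make the role of the infinite index N clearer. The sequence below is our own illustration rather than one from the notes; the block uses amsmath.

```latex
% Illustrative sequence (not from the notes): s_n = (-1)^n, which is bounded.
\begin{align*}
N > \mathbb{N},\ N \text{ even} &\implies s_N = 1, \\
N > \mathbb{N},\ N \text{ odd}  &\implies s_N = -1.
\end{align*}
% By transfer every N \in \mathbb{N}^* is even or odd, and both cases occur among
% infinite N (for instance 2N and 2N + 1), so the cluster points of (s_n) are exactly -1 and 1.
```

Different choices of N in the proof can therefore produce different cluster points; the corollary only asserts that at least one exists.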
Suppose s = (sn ) is a bounded sequence. Let C(s) denote the set of
cluster points of s. Then, by the previous proposition, we have
C(s) = {L ∈ R | sN ≈ L for some N > N}.
Exercise 3.14. C(s) is a bounded set.
(2) ∑_{i=0}^{N} ai ∈ Rfin for all N > N;
(3) ∑_{i=0}^{N} ai ∈ Rfin for some N > N.
(Hint: Use the Monotone Convergence Theorem.)
Problem 3.8. Let (sn ) be a sequence. Show that (sn ) is Cauchy if and
only if for every M, N > N, we have sM ≈ sN .
Problem 3.9 (Advanced).
(1) Let ∑_{i=0}^{∞} ai and ∑_{i=0}^{∞} bi be two series, where an , bn ≥ 0 for all
n ∈ N. Suppose that an ≤ bn for all n > N. Further suppose that ∑_{i=0}^{∞} bi
converges. Show that ∑_{i=0}^{∞} ai converges.
(2) Show that the condition “an ≤ bn for all n > N” is equivalent to the
condition “there exists k ∈ N such that an ≤ bn for all n ∈ N with
n ≥ k,” i.e. (bn ) eventually dominates (an ).
The result established in this problem is usually called the Comparison
Test.
4. Continuity
4.1. First results about continuity. Let A ⊆ R, f : A → R a function,
and c ∈ A. We say that f is continuous at c if: for all ε ∈ R>0 , there is δ ∈
R>0 such that, for all x ∈ R, if x ∈ A and |x − c| < δ, then |f (x) − f (c)| < ε.
We say that f is continuous if f is continuous at c for all c ∈ A. Here is the
nonstandard characterization of continuity:
Theorem 4.1. Suppose f : A → R and c ∈ A. The following are equivalent:
(1) f is continuous at c;
(2) if x ∈ A∗ and x ≈ c, then f (x) ≈ f (c);
(3) there is δ ∈ µ>0 such that, for all x ∈ A∗ , if |x − c| < δ, then
f (x) ≈ f (c).
The equivalence between (1) and (2) backs up our usual heuristic that f
is continuous at c if, for all x ∈ A really close to c, we have f (x) is really
close to f (c).
Proof. (1) ⇒ (2): Suppose that x ≈ c; we want f (x) ≈ f (c). Fix ε ∈ R>0 ;
we want |f (x) − f (c)| < ε. By (1), there is δ ∈ R>0 such that
R |= ∀x((x ∈ A ∧ |x − c| < δ) → |f (x) − f (c)| < ε).
Applying transfer to this statement and realizing that x ≈ c implies |x − c| <
δ yields that |f (x) − f (c)| < ε.
(2) ⇒ (3) follows by taking δ ∈ µ>0 arbitrary.
(3) ⇒ (1): Fix δ as in (3). Fix ε ∈ R>0 . Since x ∈ A∗ and |x − c| < δ
implies f (x) ≈ f (c), and, in particular, |f (x) − f (c)| < ε, we have
R∗ |= (∃δ ∈ R>0 )(∀x ∈ A)(|x − c| < δ → |f (x) − f (c)| < ε).
Apply transfer.
Example 4.2. Since | sin x| ≤ |x| for x small, we see that, by transfer,
sin ε ∈ µ for ε ∈ µ. (In other words, sin is continuous at 0.) A similar
argument shows that cos is continuous at 0, that is, cos ε ≈ 1 for ε ∈ µ.
Using this and the transfer of the usual trigonometric identities, we can
prove that sin is continuous: if c ∈ R and x ≈ c, write x = c + ε for ε ∈ µ.
Then
sin x = sin(c + ε) = sin c cos ε + cos c sin ε ≈ sin c · 1 + cos c · 0 = sin c.
Example 4.3. Let
f (x) = sin(1/x) if x ≠ 0,
f (x) = 0 if x = 0.
Then f is not continuous at 0. Indeed, let N > N be odd and set x :=
2/(N π) ≈ 0. Then f (x) = sin(N π/2) = ±1 ≉ f (0).
However, the function g defined by
g(x) = x sin(1/x) if x ≠ 0,
g(x) = 0 if x = 0
is continuous at 0. Indeed, suppose x ≈ 0, x ≠ 0. Then since | sin(1/x)| ≤ 1,
|g(x)| = |x|| sin(1/x)| ≈ 0.
How about the usual connection between continuity and limits? We say
that limx→c f (x) = L if: for all ε ∈ R>0 , there is δ ∈ R>0 such that, for all
x ∈ A, if 0 < |x − c| < δ, then |f (x) − L| < ε.
Exercise 4.4. Prove that limx→c f (x) = L if and only if, for all x ∈ A∗ , if
x ≈ c but x ≠ c, then f (x) ≈ L.
Corollary 4.5. f is continuous at c if and only if limx→c f (x) = f (c).
Proposition 4.6. Suppose that f is continuous at c and g is continuous at
f (c). Then g ◦ f is continuous at c.
Proof. If x ≈ c, then f (x) ≈ f (c), whence g(f (x)) ≈ g(f (c)).
The following result is fundamental:
Theorem 4.7 (Intermediate Value Theorem). Suppose that f : [a, b] → R
is continuous. Let d be a value strictly in between f (a) and f (b). Then there
is c ∈ (a, b) such that f (c) = d.
The nonstandard proof of the Intermediate Value Theorem will be our
first example of using so-called “hyperfinite methods;” in this case, we will
be using hyperfinite partitions. The idea is to partition the interval [a, b]
into subintervals of width 1/N for N > N. Then, logically, this partition will
behave as if it were finite; in particular, we will be able to detect “the last
time” some particular phenomenon happens. However, since we are taking
infinitesimal steps, the change in function value at this turning point will be
infinitesimal. Here are the precise details:
Proof. Without loss of generality, f (a) < f (b), so f (a) < d < f (b). Define a
sequence (sn ) as follows: for n > 0, let {p0 , . . . , pn } denote the partition of
[a, b] into n equal pieces of width (b − a)/n, so p0 = a and pn = b. Since f (p0 ) < d,
we are entitled to define the number sn := max{pk | f (pk ) < d}, so sn is the
“last time” that f (pk ) < d. Observe that sn < b, since f (pn ) = f (b) > d.
We now fix N > N and claim that c := st(sN ) ∈ [a, b] is as desired,
namely, that f (c) = d. (Note that sN ∈ [a, b], whence st(sN ) is defined.)
Indeed, by transfer, sN < b, whence sN + (b − a)/N ≤ b. Again, by transfer,
f (sN ) < d ≤ f (sN + (b − a)/N ). However, sN + (b − a)/N ≈ sN ≈ c. Since f is
continuous at c, we have
f (c) ≈ f (sN ) < d ≤ f (sN + (b − a)/N ) ≈ f (c),
whence f (c) ≈ d. Since f (c), d ∈ R, we get that f (c) = d.
The next fundamental result is also proven using hyperfinite partitions.
Theorem 4.8 (Extreme Value Theorem). Suppose that f : [a, b] → R is
continuous. Then there are c, d ∈ [a, b] such that f (c) ≤ f (x) ≤ f (d) for all
x ∈ [a, b]. (In other words, f achieves its maximum and minimum.)
Proof. We will only prove the existence of the maximum of f . Once again, let
{p0 , p1 , . . . , pn } denote the partition of [a, b] into n equal pieces. This time,
we define sn to be some pk such that f (pj ) ≤ f (pk ) for all j = 0, . . . , n. Fix
N > N and set d := st(sN ) ∈ [a, b]. We claim that f achieves its maximum
at d. (Please appreciate the beauty of this claim: We are partitioning [a, b]
into hyperfinitely many pieces, looking for where the function achieves its
maximum on this hyperfinite set, and claiming that this element is infinitely
close to where the function achieves its maximum on [a, b]. Magic!)
Fix x ∈ [a, b]. We need f (x) ≤ f (d). First, we need to “locate” x in our
hyperfinite partition. Since
R |= (∀n ∈ N)(∃k ∈ N)(0 ≤ k < n ∧ a + k(b − a)/n ≤ x ≤ a + (k + 1)(b − a)/n),
by transfer, we can find k ∈ N∗ , 0 ≤ k < N , such that
a + k(b − a)/N ≤ x ≤ a + (k + 1)(b − a)/N.
Since the interval [a + k(b − a)/N, a + (k + 1)(b − a)/N ] has infinitesimal
width (b − a)/N and f is continuous at x, we see that
f (a + k(b − a)/N ) ≈ f (x) ≈ f (a + (k + 1)(b − a)/N ).
However, by transfer, f (a + k(b − a)/N ) ≤ f (sN ). Since
f (sN ) ≈ f (d) (since f is continuous at d) and both f (x) and f (d) are
standard, it follows that f (x) ≤ f (d).
x. In other words, it is what one gets when one slides the first universal
quantifier in the above display over to the spot after the ∃δ. More formally:
Definition 4.9. f : A → R is uniformly continuous if, for all ε ∈ R>0 , there
exists δ ∈ R>0 such that, for all x, y ∈ A, if |x − y| < δ, then |f (x) − f (y)| < ε.
Using purely symbolic language as above, we see that f : A → R is
uniformly continuous if and only if:
(∀ε ∈ R>0 )(∃δ ∈ R>0 )(∀x, y ∈ A)(|x − y| < δ → |f (x) − f (y)| < ε).
While uniform continuity is no more difficult to state than ordinary con-
tinuity (as it results from a simple permutation of quantifiers), it is often
difficult for students to first digest. For this reason, uniform continuity
is perhaps one of the best examples of elucidating a standard concept by
nonstandard means. Here is the nonstandard equivalence:
Proposition 4.10. f : A → R is uniformly continuous if and only if,
whenever x, y ∈ A∗ are such that x ≈ y, then f (x) ≈ f (y).
Please make sure that you see how this is different from ordinary continu-
ity. Indeed, f : A → R is continuous if and only if, whenever x, y ∈ A∗ are
such that x ≈ y, and at least one of x and y is standard, then f (x) ≈ f (y).
Thus, uniform continuity demands continuity for the extended part of A as
well.
Exercise 4.11. Prove Proposition 4.10.
Example 4.12. It is now easy to see why f : (0, 1] → R given by f (x) = 1/x
is not uniformly continuous. Indeed, fix N > N. Then 1/N, 1/(N + 1) ∈ (0, 1]∗ ,
1/N ≈ 1/(N + 1), but N = f (1/N ) ≉ f (1/(N + 1)) = N + 1. Note that this calculation
does not contradict the fact that f is continuous. Indeed, since 0 is not in
the domain of f , we would never be calculating f (x) for infinitesimal x when
verifying continuity.
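The two facts used in this example can be checked by direct arithmetic, sketched below with amsmath; N is an infinite natural number as in the example.

```latex
% f(x) = 1/x on (0, 1], and N > \mathbb{N} as in Example 4.12.
\begin{align*}
\frac{1}{N} - \frac{1}{N+1} &= \frac{1}{N(N+1)} \approx 0, \\
f\!\left(\frac{1}{N+1}\right) - f\!\left(\frac{1}{N}\right) &= (N+1) - N = 1 \not\approx 0.
\end{align*}
```

So the inputs are infinitely close while the outputs differ by exactly 1.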
Recall our above heuristic, namely that f : A → R is uniformly continuous
when f is continuous on the “extended part” of A. But sometimes A doesn’t
have an extended part. For example, if A = [a, b], then A∗ is such that
every element is infinitely close to an element of A. It is for this reason
that continuous functions whose domains are closed, bounded intervals are
automatically uniformly continuous:
Theorem 4.13. Suppose f : [a, b] → R is continuous. Then f is uniformly
continuous.
Proof. Suppose x, y ∈ [a, b]∗ , x ≈ y. Then c := st(x) = st(y) ∈ [a, b]. Since
f is continuous at c, we have that f (x) ≈ f (c) ≈ f (y).
Please compare the proof of the previous theorem with the usual standard
proof. In particular, compare the lengths of these proofs!
The terminology internally continuous will make more sense later in these
notes when we define internal sets and functions. Notice that being inter-
nally continuous is just like ordinary continuity, but with everything deco-
rated by stars. By the transfer principle, if f : A → R is continuous, then
its nonstandard extension f : A∗ → R∗ is internally continuous. Another
example is provided by the above proof: if each element of the sequence fn
is continuous, then fn : A∗ → R∗ is internally continuous for all n ∈ N∗ .
While we are on the topic, let’s define another notion of continuity for
nonstandard functions.
Definition 4.22. f : A∗ → R∗ is S-continuous if, for all x, y ∈ A∗ , if x ≈ y,
then f (x) ≈ f (y).
So, for example, if f : A → R is a standard function, then f is uniformly
continuous if and only if f : A∗ → R∗ is S-continuous.
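To see that S-continuity really is stronger than internal continuity, one can look at the extension of a continuous function that is not uniformly continuous. The function f(x) = x^2 below is our own choice of illustration, not an example from the notes; the block uses amsmath.

```latex
% Illustrative choice (not from the notes): f(x) = x^2 on A = \mathbb{R}, with N > \mathbb{N}.
\begin{align*}
x &:= N, \qquad y := N + \frac{1}{N}, \qquad y - x = \frac{1}{N} \approx 0, \\
f(y) - f(x) &= \left(N + \frac{1}{N}\right)^{2} - N^{2} = 2 + \frac{1}{N^{2}} \not\approx 0.
\end{align*}
% The extension of f is internally continuous (by transfer) but not S-continuous,
% which matches the fact that x^2 is not uniformly continuous on \mathbb{R}.
```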
Here is another nice convergence theorem:
Theorem 4.23 (Dini). Suppose that fn : [a, b] → R is continuous, f :
[a, b] → R is continuous and fn converges pointwise to f . Further suppose
that fn (x) is nonincreasing for each x ∈ [a, b]. Then fn converges uniformly
to f .
Proof. By subtracting f from all of the functions involved, we may as well
assume that f is the zero function. Fix N > N and c ∈ [a, b]∗ . We need
to show that fN (c) ≈ 0. Fix n ∈ N. By transfer, 0 ≤ fN (c) ≤ fn (c).
Set d := st(c) ∈ [a, b]. Since fn is continuous, we see that fn (c) ≈ fn (d).
Consequently, fN (c) ∈ Rfin . Taking standard parts, we see that
0 ≤ st(fN (c)) ≤ st(fn (c)) = fn (d).
Since fn (d) → 0 as n → ∞, we see that st(fN (c)) = 0.
In Theorem 4.20, if we assumed that each fn was uniformly continuous,
could we conclude that f was uniformly continuous? Unfortunately, this is
not true in general. To explain when this is true, we need to introduce a
further notion. First, suppose that each fn : A → R is uniformly continuous.
Then, in symbols, this means:
(∀n ∈ N)(∀ε ∈ R>0 )(∃δ ∈ R>0 )(∀x, y ∈ A)(|x − y| < δ → |fn (x) − fn (y)| < ε).
Thus, δ can depend on both ε and the particular function fn . Here’s
what happens when we only require the δ to depend on ε:
Definition 4.24. The sequence (fn ) is equicontinuous if: for all ε ∈ R>0 ,
there is δ ∈ R>0 such that, for all x, y ∈ A, for all n ∈ N, if |x − y| < δ, then
|fn (x) − fn (y)| < ε.
In other words, (fn ) is equicontinuous if it is uniformly uniformly continuous
in the sense that each fn is uniformly continuous and, for a given ε,
there is a single δ that witnesses the uniform continuity for each fn .
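A family in which each member is uniformly continuous but no single δ works for all of them may help fix the idea. The family f_n(x) = x^n below is our own illustration, not an example from the notes; the estimate uses the standard inequality (1 - 1/n)^n ≤ 1/e, and the block uses amsmath.

```latex
% Illustrative family (not from the notes): f_n(x) = x^n on A = [0, 1], for n \ge 1.
\begin{align*}
x := 1, \quad y := 1 - \frac{1}{n}: \qquad
|x - y| &= \frac{1}{n}, \\
|f_n(x) - f_n(y)| &= 1 - \left(1 - \frac{1}{n}\right)^{n} \ge 1 - \frac{1}{e} > \frac{1}{2}.
\end{align*}
% Given any \delta > 0, choosing n > 1/\delta makes |x - y| < \delta, so no single \delta
% works for \epsilon = 1/2: the family is not equicontinuous, although each f_n is
% uniformly continuous on [0, 1] by Theorem 4.13.
```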
(2) Show that limx→c− f (x) = L iff f (x) ≈ L for all x ∈ A∗ with x ≈ c
and x < c.
(3) Show that limx→c f (x) = L iff f (x) ≈ L for all x ∈ A∗ with x ≈ c
and x ≠ c.
(4) Show that limx→c+ f (x) = L iff f (x) ≈ L for all x ∈ A∗ with x ≈ c
and x > c.
(5) Show that limx→c f (x) = +∞ iff f (x) > N for all x ∈ A∗ with x ≈ c
and x ≠ c.
(6) Show that limx→c f (x) = −∞ iff −f (x) > N for all x ∈ A∗ with
x ≈ c and x ≠ c.
(7) Show that limx→+∞ f (x) = L iff there is x ∈ A∗ such that x > N
and f (x) ≈ L for all x ∈ A∗ with x > N.
(8) Show that limx→−∞ f (x) = L iff there is x ∈ A∗ such that −x > N
and f (x) ≈ L for all x ∈ A∗ with −x > N.
Problem 4.2. Suppose f, g : A → R and c, L, M ∈ R. Show that:
(1) limx→c f (x) = L iff limx→c+ f (x) = L and limx→c− f (x) = L.
(2) If limx→c f (x) = L and limx→c g(x) = M , then:
(a) limx→c [(f + g)(x)] = L + M ;
(b) limx→c [(f g)(x)] = LM ;
(c) limx→c [(f /g)(x)] = L/M if M ≠ 0.
Problem 4.3. Suppose that ε ∈ µ. Show that:
(1) sin(ε) ≈ 0;
(2) cos(ε) ≈ 1;
(3) tan(ε) ≈ 0.
Problem 4.4. Determine the points of continuity for each of the functions
below.
(1) g(x) = 1 if x ∈ Q,
    g(x) = 0 if x ∉ Q.
(2) h(x) = x if x ∈ Q,
    h(x) = −x if x ∉ Q.
(3) j(x) = 0 if x ∉ Q,
    j(x) = 1/n if x = m/n ∈ Q in simplest form with n ≥ 1.
between f (a) and f (b), there is c ∈ [a, b] such that f (c) = d. Show that f is
continuous.
5. Differentiation
5.1. The derivative. We suppose that f : A → R is a function and c ∈ A
is an interior point, that is, there is an interval around c contained in A.
Definition 5.1. f is differentiable at c if limh→0 (f (c + h) − f (c))/h exists; in this
case, the limit is denoted by f ′(c) and is called the derivative of f at c.
The nonstandard characterization of limits (Exercise 4.4) immediately
gives the following nonstandard characterization of differentiability:
Proposition 5.2. f is differentiable at c with derivative D if and only if
for every ε ∈ µ \ {0}, we have (f (c + ε) − f (c))/ε ≈ D.
Suppose f is differentiable at c. Fix dx ∈ µ \ {0}. (Here we are calling
our infinitesimal dx to match with the usual verbiage of calculus.) Set
df := df (c, dx) = f (c + dx) − f (c). Then f ′(c) = st(df /dx). In this way, the
derivative is, in some sense, an actual fraction. (Recall in calculus we are
warned not to take the notation df /dx too seriously and not to treat this as an
actual fraction.)
If the domain of f is an open (possibly infinite) interval, we say that f is
differentiable if it is differentiable at all points of its domain.
Example 5.3. If f (x) = x², then
df = f (x + dx) − f (x) = (x + dx)² − x² = x² + 2x dx + (dx)² − x² = 2x dx + (dx)².
Thus, df /dx = 2x + dx ≈ 2x, whence f ′(x) exists and f ′(x) = st(df /dx) = 2x.
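The same computation works for other powers. As a further sketch, here is the cubic case, which is our own illustration and not an example from the notes; x is assumed standard and the block uses amsmath.

```latex
% Illustrative choice (not from the notes): f(x) = x^3 at a standard point x, with dx \in \mu \setminus \{0\}.
\begin{align*}
df &= (x + dx)^3 - x^3 = 3x^2\,dx + 3x\,(dx)^2 + (dx)^3, \\
\frac{df}{dx} &= 3x^2 + 3x\,dx + (dx)^2 \approx 3x^2, \\
f'(x) &= \operatorname{st}\!\left(\frac{df}{dx}\right) = 3x^2.
\end{align*}
```

The terms 3x dx and (dx)^2 are infinitesimal because x is finite, so they vanish on taking standard parts.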
Proposition 5.4. If f is differentiable at x, then f is continuous at x.
Proof. Suppose y ≈ x; we need f (y) ≈ f (x). Write y = x + dx with dx ∈ µ.
Without loss of generality, dx ≠ 0. Then
f (y) − f (x) = df ≈ f ′(x)dx ≈ 0.
Here are some fundamental properties of derivatives:
Theorem 5.5. Suppose that f, g are differentiable at x and c ∈ R. Then
f + g, cf , and f g are also differentiable at x. If g(x) ≠ 0, then f /g is also
differentiable at x. Moreover, the derivatives are given by:
(1) (f + g)′(x) = f ′(x) + g′(x);
(2) (cf )′(x) = cf ′(x);
(3) (Product Rule) (f g)′(x) = f ′(x)g(x) + f (x)g′(x);
Proof. We’ll only prove the Product Rule, leaving the others as exercises.
Fix dx ∈ µ \ {0}. Then
d(f g) = (f (x) + df )(g(x) + dg) − f (x)g(x) = f (x)dg + g(x)df + (df )(dg),
so
d(f g)/dx = f (x)(dg/dx) + g(x)(df /dx) + (df )(dg/dx).
Since f is continuous at x, df ≈ 0. Thus, taking standard parts of the above
display yields (f g)′(x) = f (x)g′(x) + g(x)f ′(x).
Now, the Chain Rule is notorious for having many incorrect proofs in
textbooks. Hopefully our nonstandard proof is correct:
Proof. Suppose, towards a contradiction, that f ′(c) > 0. (The case that
f ′(c) < 0 is similar.) Fix ε as in the statement of the theorem and fix
dx ∈ µ>0 . Then c + dx ∈ (c − ε, c + ε)∗ , so f (c + dx) ≤ f (c). However,
f ′(c) ≈ (f (c + dx) − f (c))/dx, whence (f (c + dx) − f (c))/dx > 0. Since dx > 0, this forces
f (c + dx) − f (c) > 0, a contradiction.
The Taylor series for f need not converge at some (or even any) x. Even
if the Taylor series for f does converge at some x, it need not converge to
f (x). For example, suppose
f (x) = e^(−1/x²) if x ≠ 0,
f (x) = 0 if x = 0.
Then f^(n)(0) = 0 for all n ∈ N, whence the Taylor series for f centered at 0
is identically 0, even though f (x) ≠ 0 for x ≠ 0.
Fix n ≥ 0. The nth degree Taylor polynomial for f centered at a is the
polynomial
pn (x) = ∑_{i=0}^{n} (f^(i)(a)/i!) (x − a)^i = f (a) + f ′(a)(x − a) + · · · + (f^(n)(a)/n!) (x − a)^n .
Problem 5.6. For a given x, show that the Taylor series for f at x converges
to f (x) if and only if pn (x) ≈ f (x) for all n > N.
Set Rn (x) := f (x) − pn (x). It follows from the previous problem that
the Taylor series for f at x converges to f (x) if and only if Rn (x) ≈ 0
for all n > N. There is a theorem due to Lagrange that says that if f is
(n + 1)-times differentiable on some open interval I containing a, then for
each x ∈ I, there is a c between a and x such that
Rn (x) = (f^(n+1)(c)/(n + 1)!) (x − a)^(n+1) .
Problem 5.7. Suppose that f^(n)(x) exists for all n ∈ N and x ∈ I (we say
f is infinitely differentiable on I in this situation). Discuss how to make
sense of f^(n)(x) for n ∈ N∗ and x ∈ I∗ . Show that for all x ∈ I∗ and n ∈ N∗
with n ≥ 1, we have
pn (x) − pn−1 (x) = (f^(n)(a)/n!) (x − a)^n .
Problem 5.8. Suppose that f is infinitely differentiable on I and x ∈ I.
(1) Suppose that
(f^(n+1)(c)/(n + 1)!) (x − a)^(n+1)
is infinitesimal for every n > N and every c ∈ R∗ such that c is
between a and x. Show that the Taylor series for f at x converges
to f (x).
(2) Use part (1) to show that the Taylor series for cos x centered at
a = 0 (otherwise known as the Maclaurin series for cos x) converges
to cos x for all x ∈ R.
(3) Use part (1) to show that the Maclaurin series for e^x converges to
e^x for all x ∈ R.
Problem 5.9. Suppose that f (n) exists for all real numbers in some open
interval I. Further suppose that f (n) is continuous at x ∈ I. Show that for
any infinitesimal ∆x, there is an infinitesimal ε such that
\[ f(x+\Delta x) = f(x) + f'(x)\Delta x + \frac{f''(x)}{2}(\Delta x)^2 + \cdots + \frac{f^{(n)}(x)}{n!}(\Delta x)^n + \epsilon(\Delta x)^n. \]
(Hint: Consider the Lagrange form of the remainder Rn .)
Problem 5.10. There is another form of the remainder Rn , which states
that
\[ R_n(x) = \frac{f^{(n)}(c) - f^{(n)}(a)}{(c-a)(n+1)!}(x-a)^{n+1} \]
for some c in between a and x. Apply this form of the remainder to Rn−1 to
prove the result from Problem 5.9 without using the assumption that f (n)
is continuous.
6. Riemann Integration
6.1. Hyperfinite Riemann sums and integrability. The Riemann in-
tegral has a particularly slick description in terms of hyperfinite partitions.
Indeed, one takes a Riemann sum with respect to rectangles of infinitesi-
mal width; this Riemann sum will then be infinitely close to the Riemann
integral.
First, let’s recall some standard notions. Throughout, we assume that
f : [a, b] → R is a bounded function. A partition of [a, b] is a finite ordered
set P = {x0 , . . . , xn } such that a = x0 < x1 < . . . < xn = b. A partition P2
is a refinement of a partition P1 if P1 ⊆ P2 . (So P2 is obtained from P1 by
Set M := sup{f (x) | x ∈ [a, b]} and m := inf{f (x) | x ∈ [a, b]}.
Exercise 6.2.
(1) For any partition P , we have
m(b − a) ≤ L(f, P ) ≤ S(f, P ) ≤ U (f, P ) ≤ M (b − a).
(2) If P2 is a refinement of P1 , then
L(f, P1 ) ≤ L(f, P2 ) ≤ U (f, P2 ) ≤ U (f, P1 ).
(3) For any two partitions P1 and P2 , L(f, P1 ) ≤ U (f, P2 ).
Part (2) of the previous exercise motivates the following
Definition 6.3.
(1) The lower Riemann integral of f is
L(f ) := sup{L(f, P ) | P a partition of [a, b]}.
(2) The upper Riemann integral of f is
U (f ) := inf{U (f, P ) | P a partition of [a, b]}.
By Exercise 6.2(1) and (3), we see that
m(b − a) ≤ L(f ) ≤ U (f ) ≤ M (b − a).
We say that f is Riemann integrable if U (f ) = L(f ). In this case, we set
\[ \int_a^b f\,dx := U(f) = L(f). \]
The following Cauchy-type criterion for integrability is quite useful:
Exercise 6.4 (Riemann Lemma). f is Riemann integrable if and only if, for
every ε ∈ R>0 , there is a partition P of [a, b] such that U (f, P ) − L(f, P ) < ε.
Fix ∆x ∈ R>0 . Set P∆x = {x0 , . . . , xn }, where [a, x1 ], [x1 , x2 ], . . . , [xn−2 , xn−1 ]
all have equal length ∆x and [xn−1 , xn ] is the “leftover” piece. (If ∆x ≥ b−a,
then P∆x = {a, b}.) We thus get a function U (f, ·) : R>0 → R by setting
U (f, ∆x) := U (f, P∆x ). In a similar manner, we get functions L(f, ·) and
S(f, ·).
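The construction of P∆x can be sketched in code as follows. This is only an illustration under stated assumptions: the names `partition` and `upper_lower` are invented here, exact rational arithmetic via `fractions.Fraction` is used to avoid rounding, and the sup and inf on each piece are read off at the endpoints, which is only valid when f is increasing.

```python
from fractions import Fraction

def partition(a, b, dx):
    """Return P_dx: steps of length dx starting at a, then the leftover piece."""
    if dx >= b - a:
        return [a, b]
    full = (b - a) // dx                     # number of full-length pieces
    points = [a + i * dx for i in range(full + 1)]
    if points[-1] < b:
        points.append(b)                     # the leftover piece ends at b
    return points

def upper_lower(f, a, b, dx):
    """U(f, dx) and L(f, dx), ASSUMING f is increasing on [a, b]."""
    pts = partition(a, b, dx)
    pieces = list(zip(pts, pts[1:]))
    upper = sum(f(right) * (right - left) for left, right in pieces)
    lower = sum(f(left) * (right - left) for left, right in pieces)
    return upper, lower

square = lambda x: x * x
U, L = upper_lower(square, Fraction(0), Fraction(1), Fraction(3, 10))
print(U, L)
```

For f(x) = x² on [0, 1] the two sums bracket 1/3, and shrinking ∆x squeezes them together; the nonstandard extension of this function of ∆x is what gets evaluated at infinitesimal ∆x below.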
Here is another Cauchy-type criterion for integrability:
Exercise 6.5. f is Riemann integrable if and only if, for every ε ∈ R>0 ,
there is δ ∈ R>0 such that, for all ∆x ∈ R>0 , if ∆x < δ, then U (f, ∆x) −
L(f, ∆x) < ε.
The functions U (f, ·), L(f, ·), S(f, ·) have nonstandard extensions U (f, ·) :
(R∗ )>0 → R∗ , etc... So, for example, if ∆x ∈ µ>0 , then U (f, ∆x) equals the
upper Riemann sum of f with respect to a hyperfinite partition, where each
interval in the partition has infinitesimal length.
Theorem 6.6. f is Riemann integrable if and only if U (f, ∆x) ≈ L(f, ∆x)
for any ∆x ∈ µ>0 . In this case, for any ∆x ∈ µ>0 , we have
\[ \int_a^b f\,dx = \operatorname{st}(U(f,\Delta x)) = \operatorname{st}(L(f,\Delta x)) = \operatorname{st}(S(f,\Delta x)). \]
Theorem 6.7. If f is continuous on [a, b], then for any ∆x ∈ µ>0 , L(f, ∆x) ≈
U (f, ∆x). Consequently, by the previous theorem, continuous functions are
Riemann integrable.
Proof. Let’s give the idea of the proof first. Note that $U(f,\Delta x) - L(f,\Delta x) = \sum_{i=1}^{n}(M_i - m_i)\Delta x_i$.
We will find an upper bound for this sum of the form
[f (c) − f (d)](b − a), where |c − d| < ∆x. If ∆x ∈ µ>0 , then c ≈ d, so
f (c) ≈ f (d) by the uniform continuity of f . Since b − a ∈ R, this will show
that U (f, ∆x) − L(f, ∆x) ≈ 0.
Now for the details: for ∆x ∈ R>0 , define Mi (∆x) := Mi (P∆x ) and
mi (∆x) := mi (P∆x ). We then define the oscillation of f with respect to ∆x
to be the quantity
ω(∆x) := max{Mi (∆x) − mi (∆x) | i = 1, . . . , n}.
Suppose that j ∈ {1, . . . , n} is such that ω(∆x) = Mj (∆x) − mj (∆x). Fix
c∆x , d∆x ∈ [xj−1 , xj ] such that f (c∆x ) = Mj (∆x) and f (d∆x ) = mj (∆x);
this is possible since continuous functions achieve their max and min. Clearly,
|c∆x − d∆x | ≤ ∆x. Also,
\[ U(f,\Delta x) - L(f,\Delta x) = \sum_{i=1}^{n}\bigl(M_i(\Delta x) - m_i(\Delta x)\bigr)\Delta x_i \le \omega(\Delta x)(b-a). \]
Fix ∆x ∈ µ>0 . By transfer, there are c, d ∈ [a, b]∗ such that |c − d| ≤ ∆x and
U (f, ∆x) − L(f, ∆x) ≤ (f (c) − f (d))(b − a). Since c ≈ d, the uniform continuity
of f gives f (c) ≈ f (d), whence U (f, ∆x) ≈ L(f, ∆x).
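The bound of the gap U − L by the oscillation times (b − a) can be observed numerically. The sketch below is not part of the proof: `oscillation_check` is a hypothetical helper, the sup and inf on each piece are only estimated by sampling, and ordinary floating-point step sizes stand in for an infinitesimal ∆x.

```python
import math

def oscillation_check(f, a, b, dx, samples=20):
    """Return (sum of (M_i - m_i) * width_i, omega * (b - a)),
    with each M_i and m_i estimated by sampling the piece."""
    n_full = int((b - a) / dx)
    pts = [a + i * dx for i in range(n_full + 1)]
    if pts[-1] < b:
        pts.append(b)                        # leftover piece
    total, omega = 0.0, 0.0
    for left, right in zip(pts, pts[1:]):
        values = [f(left + j * (right - left) / samples) for j in range(samples + 1)]
        gap = max(values) - min(values)      # estimate of M_i - m_i
        total += gap * (right - left)
        omega = max(omega, gap)              # estimate of the oscillation
    return total, omega * (b - a)

for dx in (0.5, 0.05, 0.005):
    print(dx, oscillation_check(math.sin, 0.0, 3.0, dx))
```

Both numbers shrink with ∆x, and the first never exceeds the second, mirroring the inequality used in the proof.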
Exercise 6.8. Prove Theorem 6.7 with the assumption of “continuity” re-
placed by “monotonicity.”
Now that integrals are infinitely close to hyperfinite sums, properties of
integrals follow almost immediately from properties of sums. For example:
Proposition 6.9. Suppose that f is integrable and c ∈ R. Then cf is
integrable and $\int_a^b (cf)\,dx = c\int_a^b f\,dx$.
Proof. Fix ∆x ∈ µ>0 . First suppose that c ≥ 0. Then U (cf, ∆x) =
cU (f, ∆x) and L(cf, ∆x) = cL(f, ∆x). Since U (f, ∆x) ≈ L(f, ∆x), we
have that U (cf, ∆x) ≈ L(cf, ∆x), whence cf is integrable. Moreover,
\[ \int_a^b (cf)\,dx = \operatorname{st}(U(cf,\Delta x)) = c\,\operatorname{st}(U(f,\Delta x)) = c\int_a^b f\,dx. \]
If c < 0, then U (cf, ∆x) = cL(f, ∆x) and L(cf, ∆x) = cU (f, ∆x). The
proof then proceeds as in the previous paragraph.
6.2. The Peano Existence Theorem. Here is an application of the non-
standard approach to integration to differential equations:
Theorem 6.10 (Peano Existence Theorem). Suppose that g : [0, 1]×R → R
is a bounded, continuous function. Then for any a ∈ R, there is a differentiable
function f : [0, 1] → R satisfying f (0) = a and f ′(t) = g(t, f (t)) for
all t ∈ [0, 1].
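\[ = s_N \cdot \frac{k+1}{N} \in \mu. \]
Therefore, for any x ∈ [0, 1], writing $x = \operatorname{st}(\frac{k}{N})$ with $\frac{k}{N} < x$, we have
\begin{align*}
f(x) &\approx Z\left(\tfrac{k}{N}\right) \\
&= a + \sum_{n=0}^{k-1} g\left(\tfrac{n}{N}, Z\left(\tfrac{n}{N}\right)\right) \cdot \tfrac{1}{N} \\
&= a + \sum_{n=0}^{k-1} W\left(\tfrac{n}{N}\right) \cdot \tfrac{1}{N} \\
&\approx a + \sum_{n=0}^{k-1} h\left(\tfrac{n}{N}\right) \cdot \tfrac{1}{N} \\
&\approx a + \int_0^x h(t)\,dt.
\end{align*}
The last step follows from Theorem 6.6. Since the beginning and end are
standard numbers, we have $f(x) = a + \int_0^x h(t)\,dt$.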
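The hyperfinite sum defining Z is the familiar Euler polygon method with an infinite number of steps. A finite version can be sketched as follows; the function name `euler_polygon`, the test equation f ′ = cos(f ), and the step count are all choices made here for illustration, with a large finite N standing in for N > N.

```python
import math

def euler_polygon(g, a, N):
    """Return [Z(0), Z(1/N), ..., Z(1)] with Z(0) = a and
    Z((n+1)/N) = Z(n/N) + g(n/N, Z(n/N)) / N."""
    Z = [a]
    for n in range(N):
        Z.append(Z[-1] + g(n / N, Z[-1]) / N)
    return Z

g = lambda t, y: math.cos(y)             # bounded and continuous on [0, 1] x R
Z = euler_polygon(g, 0.0, 10_000)
# f(t) = 2 * atan(tanh(t / 2)) solves f' = cos(f), f(0) = 0; evaluate it at t = 1.
exact = 2 * math.atan(math.tanh(0.5))
print(Z[-1], exact)
```

With this particular g the solution happens to be unique, so the polygon converges to it; the point of the Peano theorem is that taking standard parts still produces a solution when uniqueness fails.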
6.3. Problems.
Problem 6.1. Let f : [a, b] → R be continuous. For n ∈ N>0 and i =
0, . . . , n − 1, define $x_i := a + \frac{i(b-a)}{n}$. Define
\[ \operatorname{Av}(n) := \frac{f(x_0) + f(x_1) + \cdots + f(x_{n-1})}{n}, \]
which is often referred to as a sample average for f . Prove that if n > N,
we have $\operatorname{Av}(n) \approx \frac{1}{b-a}\int_a^b f(x)\,dx$.
Remark. $\frac{1}{b-a}\int_a^b f(x)\,dx$ is often referred to as the average of f on [a, b]. This
exercise illustrates a common phenomenon in nonstandard analysis, namely
approximating continuous things by hyperfinite discrete things.
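A quick numerical experiment makes the sample average concrete. In this sketch `sample_average` is a name chosen here, and large finite n only suggests the behaviour for n > N.

```python
def sample_average(f, a, b, n):
    """Av(n) = (f(x_0) + ... + f(x_{n-1})) / n with x_i = a + i (b - a) / n."""
    return sum(f(a + i * (b - a) / n) for i in range(n)) / n

square = lambda x: x * x
for n in (10, 1000, 100_000):
    # The average of x^2 on [0, 1] is 1/3.
    print(n, sample_average(square, 0.0, 1.0, n))
```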
Problem 6.2 (Both). Suppose that f, g : [a, b] → R are Riemann integrable.
Show that:
(1) f + g is integrable and $\int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.
(2) f is Riemann integrable on both [a, c] and [c, b] for any c ∈ [a, b] and
$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$.
(3) $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$ if f (x) ≤ g(x) for all x ∈ [a, b].
(4) $m(b-a) \le \int_a^b f(x)\,dx \le M(b-a)$ if m ≤ f (x) ≤ M for all x ∈ [a, b].
Problem 6.3. Suppose that f : [a, b] → R is Riemann integrable. Define
F : [a, b] → R by $F(x) := \int_a^x f(t)\,dt$. Prove that F is continuous (even
though f may not be).
Problem 6.4.
We should say that the sequence (σn ) is called the sequence of Cesàro means
of the sequence (sn ). It is possible that (σn ) converges when (sn ) diverges.
When (sn ) is the sequence of partial sums of an infinite series, this leads to
the notion of Cesàro summability, which is useful in the theory of Fourier
series.
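To see a divergent sequence with convergent Cesàro means, one can experiment as follows. The sketch assumes the usual convention that σn is the average of the first n + 1 terms s0 , . . . , sn ; the indexing used in the problem statement may differ by one, and the name `cesaro_means` is ours.

```python
def cesaro_means(s):
    """Return [sigma_0, sigma_1, ...] with sigma_n = (s_0 + ... + s_n) / (n + 1)."""
    means, running = [], 0.0
    for n, term in enumerate(s):
        running += term
        means.append(running / (n + 1))
    return means

alternating = [(-1) ** n for n in range(1000)]            # 1, -1, 1, -1, ...
grandi = [1 if n % 2 == 0 else 0 for n in range(1000)]    # partial sums of 1 - 1 + 1 - ...
print(cesaro_means(alternating)[-1], cesaro_means(grandi)[-1])
```

Neither sequence converges, yet the means settle near 0 and 1/2 respectively.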
Problem 7.5. Suppose f : R → R is a function and f (x + y) = f (x) + f (y)
for all x, y ∈ R (i.e. f is an additive group homomorphism).
(1) Show that f (0) = 0.
(2) Show that f (nx) = nf (x) for all x ∈ R and n ∈ N.
(3) Show that f (−x) = −f (x) for all x ∈ R.
(4) Show that f (kx) = kf (x) for all k ∈ Z.
(5) Show that f (qx) = qf (x) for all q ∈ Q.
(6) (Cauchy) Suppose f is continuous. Show that f (x) = f (1) · x for all
x ∈ R.
Problem 7.6. In this problem, we prove a strengthening of Cauchy’s result
from Problem 7.5(6) by showing that if f : R → R is an additive group
homomorphism and there is an interval I ⊆ R such that f is bounded on I, then
f (x) = f (1) · x for all x ∈ R. (This result is due to Darboux.) Fix x0 ∈ I
and M ∈ R>0 such that |f (x)| ≤ M for all x ∈ I.
(1) Show that if x ≈ 0, then |f (x)| ≤ M + |f (x0 )|.
(2) Show that if x ≈ 0, then f (x) ≈ 0. (Hint: If x ≈ 0, then nx ≈ 0 for
all n ∈ N.)
(3) Show that f (x) = f (1) · x for all x ∈ R. (Hint: Use the fact that
any x ∈ R is infinitely close to an element of Q∗ .)
Problem 7.7.
(1) Suppose f : (a, b) → R is C^1 . Suppose x ∈ (a, b)∗ is such that
st(x) ∈ (a, b). Suppose ∆x ≈ 0. Prove that there exists ε ∈ µ such
that
\[ f(x + \Delta x) = f(x) + f'(x)\Delta x + \epsilon\Delta x. \]
(2) Define f : R → R by
\[ f(x) = \begin{cases} x^2\sin\left(\frac{1}{x}\right) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases} \]
(a) Show that f ′(x) exists for all x but that f ′ is not continuous at
0.
(b) Let N > N and x = 1/(2πN ). Show that there is an infinitesimal
∆x such that there is no ε ∈ µ making the conclusion of (1)
true.
(c) Discuss why parts (1) and (2)(b) don’t contradict the fact that
f ′ is continuous on (0, 1).
Problem 7.8. A Dirac delta function is a definable function D : R∗ → R∗
such that:
• D(x) ≥ 0 for all x ∈ R∗ ;
• $\int_{-\infty}^{+\infty} D(x)\,dx = 1$;
• there is a positive infinitesimal δ > 0 such that $\int_{-\delta}^{\delta} D(x)\,dx \approx 1$.
(1) Make sense of the above properties, i.e. explain how to precisely
state the above properties of a Dirac delta function.
(2) Let D be a Dirac delta function and f : R → R a standard function.
Show that $\operatorname{st}\left(\int_{-\infty}^{+\infty} f(x)D(x)\,dx\right) = f(0)$.
(3) Suppose f : R → R is standard and $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Fix n ∈ N∗ \ N
and define D : R∗ → R∗ by D(x) := nf (nx). Show that D is a Dirac
delta function. In particular, the following functions are Dirac delta
functions:
• $D(x) = \begin{cases} n & \text{if } |x| \le \frac{1}{2n} \\ 0 & \text{otherwise.} \end{cases}$
• $D(x) = \frac{n}{\pi(1 + n^2x^2)}$.
• $D(x) = ne^{-\pi n^2x^2}$.
(4) Suppose that f : R → R is a standard function such that f (x) ≥ 0
for all x ∈ R and $\int_{-\infty}^{+\infty} f(x)\,dx = 1$. For n ∈ N>0 , define $a_n := \int_{-1/n}^{1/n} f(x)\,dx$.
Show that an ≈ 0 for all n > N. Conclude that a Dirac delta function
can never be the nonstandard extension of a standard nonnegative,
integrable function.
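The sampling behaviour asked for in part (2) can be previewed numerically with the box-shaped function from part (3). This is only a heuristic sketch: a large finite n replaces the infinite n, the integral is approximated by a midpoint rule, and `box_delta_pairing` is a name invented here.

```python
import math

def box_delta_pairing(f, n, pieces=1000):
    """Midpoint-rule estimate of the integral of f(x) * D(x),
    where D(x) = n for |x| <= 1/(2n) and D(x) = 0 otherwise."""
    width = 1.0 / n                  # length of the support of D
    h = width / pieces               # step of the midpoint rule
    left = -0.5 / n
    return sum(f(left + (j + 0.5) * h) * n * h for j in range(pieces))

print(box_delta_pairing(math.cos, 10 ** 6))             # close to cos(0)
print(box_delta_pairing(lambda x: 3 + x, 10 ** 6))      # close to 3 + 0
```

As n grows the pairing concentrates on values of f near 0, which is the content of the identity involving f (0) in part (2).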
Problem 7.9. For a, b ∈ N with a ≤ b, we set
[a, b] := {a, a + 1, . . . , b − 1, b} ⊆ N.
Suppose that A ⊆ N. We say that:
• A is thick if for all k ∈ N>0 , there is x ∈ N such that [x+1, x+k] ⊆ A.
• A is syndetic if N \ A is not thick, that is, there is k ∈ N>0 such that,
for all x ∈ N, [x + 1, x + k] ∩ A ≠ ∅.
• A is piecewise syndetic if A = B ∩ C, where B is thick and C is
syndetic.
(1) Prove that A is thick if and only if A∗ contains an infinite interval,
that is, there are M, N ∈ N∗ with N − M ∈ N∗ \ N such that
[M, N ] ⊆ A∗ .
(2) Prove that A is syndetic if and only if A∗ has finite gaps, that is, for
all intervals [M, N ] ⊆ N∗ , if [M, N ] ∩ A∗ = ∅, then N − M ∈ N.
(3) (Standard) Prove that A is piecewise syndetic if and only if there is
a finite set F ⊆ N such that A + F is thick, where
A + F := {a + f : a ∈ A, f ∈ F }.
(4) Prove that A is piecewise syndetic if and only if there is an infinite
interval on which A∗ has only finite gaps.
(5) Use the nonstandard characterization of piecewise syndeticity to
prove that piecewise syndeticity is a partition regular notion, that
is, if A = A1 ∪ · · · ∪ An is piecewise syndetic, then Ai is piecewise
syndetic for some i ∈ {1, . . . , n}.
The notions appearing in the previous problem are present in additive
combinatorics and combinatorial number theory.
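The definitions of thick and syndetic can be explored on a finite window of N. A finite window can never decide either property, so the sketch below only illustrates the shape of the definitions; the helper names and the example sets are ours.

```python
def has_full_window(A, k, limit):
    """True if [x+1, x+k] is contained in A for some x >= 0 with x + k <= limit."""
    run = 0
    for m in range(1, limit + 1):
        run = run + 1 if m in A else 0
        if run >= k:
            return True
    return False

def longest_gap(A, limit):
    """Length of the longest block of consecutive non-members of A in [1, limit]."""
    gap = best = 0
    for m in range(1, limit + 1):
        gap = 0 if m in A else gap + 1
        best = max(best, gap)
    return best

limit = 100_050
evens = set(range(0, limit + 1, 2))
# Blocks [10^j, 10^j + j]: longer and longer intervals, separated by longer and longer gaps.
blocks = {m for j in range(1, 6) for m in range(10 ** j, 10 ** j + j + 1)}

print(longest_gap(evens, limit), has_full_window(evens, 2, limit))
print(longest_gap(blocks, limit), has_full_window(blocks, 6, limit))
```

A longest gap of length g on the window corresponds to the constant k = g + 1 in the definition of syndetic. The evens have gaps of length 1 but contain no two consecutive numbers; the block set contains ever longer intervals but also ever longer gaps.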
Example 8.9. Let us consider (N, P(N)) and its nonstandard extension
(N∗ , P(N)∗ ). We claim that N is an external subset of N∗ . To see this, note
that the following sentence is true in (N, P(N)):
∀A ∈ P(N)((∃x ∈ N(PE (x, A)) ∧ ∃y ∈ N∀z ∈ N(PE (z, A) → z ≤ y))
→ ∃y ∈ N(PE (y, A) ∧ ∀z ∈ N(PE (z, A) → z ≤ y))).
This sentence says that if A ⊆ N is bounded above, then A has a maximum
element. By transfer, the same holds true for any A ∈ P(N)∗ , that is, for
any internal subset of N∗ . If N were internal, then since it is bounded above
(by an infinite element), it would have a maximum, which is clearly not true.
Example 8.10. We continue to work with the set-up of the previous ex-
ample. Since
(N, P(N)) |= ∀n ∈ N∃A ∈ P(N)∀m ∈ N(PE (m, A) ↔ m ≤ n),
by transfer we have
(N∗ , P(N)∗ ) |= ∀n ∈ N∗ ∃A ∈ P(N)∗ ∀m ∈ N∗ (PE (m, A) ↔ m ≤ n).
Fixing N ∈ N∗ , we suggestively let {0, 1, . . . , N } denote the internal subset
of N∗ consisting of all the elements of N∗ that are no greater than N . This
is a prototypical example of a hyperfinite set.
The following principle is useful in practice; it says that sets defined (in
the first-order logic sense) from internal parameters are internal.
Theorem 8.11 (Internal Definition Principle). Let ϕ(x, x1 , . . . , xm , y1 , . . . , yn )
be a formula, where x, x1 , . . . , xm range over the sort for X and y1 , . . . , yn
range over the sort for P(X). Suppose that a1 , . . . , am ∈ X ∗ and A1 , . . . , An ∈
P(X)∗ . Set
B := {b ∈ X ∗ | (X ∗ , P(X)∗ ) |= ϕ(b, a1 , . . . , am , A1 , . . . , An )}.
Then B is internal.
Proof. The following sentence is true in (X, P(X)):
∀x1 , . . . , xm ∀y1 , . . . , yn ∃z∀x(PE (x, z) ↔ ϕ(x, x1 , . . . , xm , y1 , . . . , yn )).
By transfer, this remains true in (X ∗ , P(X)∗ ). Plugging in ai for xi and Aj
for yj , we see that
(X ∗ , P(X)∗ ) |= ∃z∀x(PE (x, z) ↔ ϕ(x, a1 , . . . , am , A1 , . . . , An )).
The set asserted to exist is B, which then belongs to P(X)∗ , that is, B is
internal.
Example 8.12. For any finite collection a1 , . . . , am ∈ X ∗ , the set {a1 , . . . , am }
is internal. Indeed, let ϕ(x, x1 , . . . , xm ) be the formula x = x1 ∨· · ·∨x = xm .
Then
{a1 , . . . , am } = {b ∈ X ∗ | (X ∗ , P(X)∗ ) |= ϕ(b, a1 , . . . , am )}.
Problem 8.2.
(1) Suppose r, s ∈ R∗ and r < s. Set [r, s] := {t ∈ R∗ | r ≤ t ≤ s}. Show
that [r, s] is internal.
(2) Show that µ is external.
(3) Show that Rfin is external.
Problem 8.3. Discuss what it should mean for a function f : A → B to be
internal, where $A \subseteq M_{\vec{s}}$ and $B \subseteq M_{\vec{t}}$.
Problem 8.4. Suppose that f : N∗ × X ∗ → X ∗ is an internal function. Fix
x ∈ X ∗ . Show that there exists a unique internal function F : N∗ → X ∗
such that F (0) = x and F (n + 1) = f (n + 1, F (n)). (This is the principle of
Internal Recursion.)
Problem 8.5. Suppose that the nonstandard extension is κ-saturated. Show
that every infinite internal set has cardinality at least κ.
Problem 8.6.
(1) Suppose that N ∈ N∗ \ N. Fix r ∈ (0, 1) (so r is standard). Show
there is a smallest k ∈ N∗ such that N r ≤ k.
(2) Show that any infinite hyperfinite set has cardinality at least 2^ℵ0 .
(3) Show that any infinite internal set has cardinality at least 2^ℵ0 . (This
improves the result from the previous exercise.)
Problem 8.7. Suppose the nonstandard extension satisfies the Countable
Comprehension Principle. Further suppose that (Kn | n ∈ N) is a sequence
of elements of N∗ such that Kn > N for all n ∈ N. Show that there is
K ∈ N∗ \ N such that K < Kn for all n ∈ N.
Problem 8.8. Show that a nonstandard extension satisfying the Countable
Comprehension Principle must be ℵ1 -saturated. (Hint: you might find the
previous problem useful.)
Problem 8.9. Fix k ∈ N. Suppose that G = (V, E) is a (combinatorial)
graph such that every finite subgraph of G is k-colorable. Prove that G is
k-colorable. (Hint: hyperfinite approximation!)
(1) d(x, y) ≥ 0;
(2) d(x, y) = 0 if and only if x = y;
(3) d(x, y) = d(y, x);
(4) (Triangle Inequality) d(x, z) ≤ d(x, y) + d(y, z).
If (2) in the above list is replaced by the weaker condition
(2’) x = y implies d(x, y) = 0,
then d is called a pseudometric and (X, d) is called a pseudometric space.
As usual, we often speak of “the metric space X,” suppressing mention
of the metric. Keep in mind: the same set X can be equipped with many
different metrics, yielding many different metric spaces. (See the examples
below.)
Example 9.2.
(1) For $\vec{x}, \vec{y} \in \mathbb{R}^n$, define $d(\vec{x}, \vec{y}) := \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}$. Then $(\mathbb{R}^n, d)$ is
a metric space. This metric is usually referred to as the euclidean
metric on $\mathbb{R}^n$.
(2) For ~x, ~y ∈ Rn , define d∞ (~x, ~y ) := maxi=1,...,n |xi −yi |. Then (Rn , d∞ )
is a metric space.
(3) Set C([0, 1], R) to be the set of continuous functions f : [0, 1] → R.
Define a metric d on C([0, 1], R) by setting
\[ d(f, g) := \max_{x \in [0,1]} |f(x) - g(x)|. \]
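The three metrics of this example are easy to experiment with. In the sketch below the function names are ours, and the metric on C([0, 1], R) is only approximated from below by taking the maximum over a finite grid, so it is an estimate rather than the metric itself.

```python
import math

def d_euclid(x, y):
    """Euclidean metric on R^n."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def d_max(x, y):
    """The metric d_infinity on R^n."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))

def d_sup_on_grid(f, g, samples=1001):
    """Grid approximation (a lower bound) of max over [0, 1] of |f - g|."""
    grid = [i / (samples - 1) for i in range(samples)]
    return max(abs(f(t) - g(t)) for t in grid)

print(d_euclid((0, 0), (3, 4)), d_max((0, 0), (3, 4)))
print(d_sup_on_grid(lambda t: t, lambda t: t * t))
```

The same pair of points is at distance 5 in one metric and 4 in the other, which illustrates the remark that one set can carry many different metrics.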
• If U, V ∈ τ , then U ∩ V ∈ τ .
So, for example, a metric space, equipped with its collection of open sets
(as defined above), is a topological space. There are a plethora of topological
spaces not arising from metric spaces and the notion of topological space
is central to most areas of mathematics (even logic!). Given a topological
space X and a ∈ X, we define the monad of a to be µ(a) = ⋂{U ∗ | U ∈
τ, a ∈ U }. (Double-check that this agrees with the definition in the metric
space context.) With a little more effort, the results we have established in
this section (that do not refer to metric notions) hold in the more general
context of topological spaces. In fact, this is true of the majority of the
results to come (at least the ones that do not mention metric notions).
Here’s a question: What is the analog of Rfin for our metric space X? If we
use just the definition of Rfin , then we should make the following definition:
Definition 9.15. The set of finite points of X ∗ is
Xfin = {a ∈ X ∗ | d(a, b) ∈ Rfin for some b ∈ X}.
However, by Theorem 1.9, every element of Rfin is infinitely close to a
(standard) real number. This motivates:
Definition 9.16. The set of nearstandard elements of X ∗ is
Xns := {a ∈ X ∗ | a ≈ b for some b ∈ X}.
In other words, $X_{ns} = \bigcup_{b \in X} \mu(b)$. Some remarks are in order:
Remarks 9.17.
(1) The “for some” in Definition 9.15 can be replaced with “for all,”
that is,
Xfin := {a ∈ X ∗ | d(a, b) ∈ Rfin for all b ∈ X}.
Indeed, suppose that a ∈ X ∗ and b ∈ X are such that d(a, b) ∈ Rfin .
For any other c ∈ X, we have
d(a, c) ≤ d(a, b) + d(b, c) ∈ Rfin + R ⊆ Rfin .
(2) It is immediate to see that Xns ⊆ Xfin . Sometimes we have equality;
for example, Theorem 1.9 says that Rns = Rfin . However, we often
have a strict inclusion Xns ⊊ Xfin . For example, let X = C([0, 1], R)
from Example 9.2. By transfer, an element of X ∗ is a function
Proof. First suppose that X is a Heine-Borel metric space and suppose that
a ∈ Xfin ; we need a ∈ Xns . Fix b ∈ X; then d(a, b) ∈ Rfin , say d(a, b) < r
with r ∈ R>0 . Then a ∈ B̄(b; r)∗ . Since X is Heine-Borel, we have B̄(b; r)
is compact, whence a ≈ c for some c ∈ B̄(b; r). In particular, a ∈ Xns .
Conversely, suppose that Xns = Xfin . Suppose that K ⊆ X is closed and
bounded; we need K to be compact. Fix a ∈ K ∗ ; we need a ≈ b for some
b ∈ K. Since K is bounded, we have K ∗ ⊆ Xfin , whence a ∈ Xfin = Xns .
Thus, there is (unique) b ∈ X such that a ≈ b. It remains to verify that
b ∈ K; but this follows immediately from the fact that K is closed (and the
nonstandard characterization of closed).
Corollary 9.24. Rn is a Heine-Borel metric space. C([0, 1], R) is not a
Heine-Borel metric space.
We can define continuity between metric spaces. Suppose that Y is also
a metric space and f : X → Y is a function. For p ∈ X, we say that f is
continuous at p if whenever O ⊆ Y is open and f (p) ∈ O, then there is an
open O′ ⊆ X such that p ∈ O′ and f (O′ ) ⊆ O. We say that f is continuous if f is
continuous at p for all p ∈ X.
The astute observer will notice that this is not the direct translation of
continuity for functions on R. However, the following exercise will make
them feel better:
Exercise 9.25. The following are equivalent:
(1) f is continuous at p;
(2) For all ε ∈ R>0 , there is δ ∈ R>0 such that, for all q ∈ X, if
d(p, q) < δ, then d(f (p), f (q)) < ε;
(3) f (µ(p)) ⊆ µ(f (p)), that is, if q ≈ p, then f (q) ≈ f (p).
We use the above definition for continuity as it makes sense in an arbitrary
topological space and not just for metric spaces. The equivalence of (1) and
(3) in the previous exercise will still hold in this more general context.
Proposition 9.26. Suppose that f : X → Y is continuous and K ⊆ X is
compact. Then f (K) ⊆ Y is compact.
Proof. Suppose y ∈ f (K)∗ ; we need y ∈ f (K)ns . By transfer, we have
y = f (x) for some x ∈ K ∗ . Since K is compact, st(x) exists and belongs to
K. Since f is continuous at st(x), y = f (x) ∈ µ(f (st(x))), so y ∈ f (K)ns .
Exercise 9.27. Suppose that f : X → Y is a function.
(1) Define what it means for f to be uniformly continuous. Then state
and prove a nonstandard characterization of uniform continuity.
(2) Suppose that f is continuous and X is compact. Prove that f is
uniformly continuous.
For the purpose of the next exercise, define Xinf := X ∗ \ Xfin . A (not
necessarily continuous) function f : X → Y is said to be proper if f −1 (K) ⊆
X is compact for every compact K ⊆ Y .
Exercise 9.28. Suppose that X and Y are Heine-Borel metric spaces and
f : X → Y is continuous. Prove that f is proper if and only if f (Xinf ) ⊆ Yinf .
We can also bring the notions of sequences and convergence of sequences
into the metric space setting. For example, a sequence (an ) from X converges
to a ∈ X if and only if, for every ε ∈ R>0 , there is m ∈ N such that, for all
n ∈ N, if n ≥ m, then d(an , a) < ε.
Here is the metric space version of Bolzano-Weierstrass:
Theorem 9.29. If X is a compact metric space and (an ) is a sequence in
X, then (an ) has a convergent subsequence.
Proof. Fix N > N. Then aN ∈ X ∗ = Xns . Then st(aN ) is a limit point of
(an ).
Definition 9.30. X is a complete metric space if every Cauchy sequence in
X converges.
Corollary 9.31. Compact metric spaces are complete.
Proof. Suppose that (an ) is a Cauchy sequence in X, so aM ≈ aN for all
M, N > N. Since X is compact, aN ∈ Xns for all N > N. Thus, if L =
st(aN ) for N > N, then aM ≈ L for all M > N, whence (an ) converges to
L.
Exercise 9.32. Suppose that X is complete and C ⊆ X is closed. Prove
that C is also complete.
In order to explain the nonstandard characterization of completeness, it
is convenient at this point to introduce another important set of points in
X ∗:
Definition 9.33. The set of pre-nearstandard points of X ∗ is
Xpns := {a ∈ X ∗ | for each ε ∈ R>0 , there is b ∈ X such that d(a, b) < ε}.
Immediately, we see that Xns ⊆ Xpns ⊆ Xfin .
Theorem 9.34. X is complete if and only if Xns = Xpns .
Proof. First suppose that X is complete and p ∈ Xpns . Then, for every
n ≥ 1, there is qn ∈ X such that d(p, qn ) < 1/n. It follows that (qn ) is Cauchy,
whence converges to q ∈ X. It follows that p ≈ q, whence p ∈ Xns .
Towards the converse, suppose that Xns = Xpns and suppose that (xn )
is Cauchy. Fix N > N; it suffices to show that xN ∈ Xns . If not, then
xN ∉ Xpns , whence there is ε ∈ R>0 such that d(xN , q) ≥ ε for all q ∈ X.
In particular, d(xN , xn ) ≥ ε for all n ∈ N. But (xn ) is Cauchy, so for some
n ∈ N big enough, d(xN , xn ) < ε, a contradiction.
Corollary 9.35. If X is Heine-Borel, then X is complete.
The following theorem on “remoteness” will prove useful later in these
notes:
2. Suppose q ∈ X is such that d(p, q) < δ. Then d(f (p), f (q)) < ε/2. Since
f (p) ≈ F (p) and f (q) ≈ F (q), this shows that d(F (p), F (q)) < ε.
Now suppose that p ∈ X ∗ ; we need F (p) ≈ f (p). Let p0 := st(p); this
is possible since X is compact. Then f (p) ≈ f (p0 ) by S-continuity of f .
Meanwhile, F (p0 ) ≈ f (p0 ) by definition of F and F (p0 ) ≈ F (p) by continuity
of F .
Proof. First suppose that f is compact. Fix p ∈ Xfin ; we need f (p) ∈ Yns .
Well, $p \in B^*$, where $B := B(a; r)$ for some a ∈ X and r ∈ R>0 , whence
$f(p) \in \overline{f(B)}^* \subseteq Y_{ns}$ since $\overline{f(B)}$ is compact.
Conversely, suppose that f (Xfin ) ⊆ Yns . Fix B ⊆ X bounded; we must
show that $\overline{f(B)}$ is compact. Take $q \in \overline{f(B)}^*$; we must find $q' \in \overline{f(B)}$
such that q ≈ q′ . Fix ε ∈ µ>0 ; by transfer, there is y ∈ f (B)∗ such that
d(q, y) < ε. Write y = f (x) for x ∈ B ∗ . By assumption, f (x) ∈ Yns , so
f (x) ≈ q′ for some q′ ∈ Y . It remains to show that $q' \in \overline{f(B)}$. Fix δ ∈ R>0 .
By assumption, there is z ∈ f (B)∗ such that d(q, z) < δ, whence it follows
that d(q′ , z) < δ. Applying transfer to this last fact, we see that there is
z ∈ f (B) such that d(q′ , z) < δ.
Corollary 9.45. Suppose that f : X → Y is a function. If Y is compact,
then f is compact.
Corollary 9.46. Suppose that (fn ) is a sequence of compact functions from
X to Y . Further assume that Y is complete and that (fn ) converges uni-
formly to f . Then f is compact.
Proof. Suppose x ∈ Xfin ; we need f (x) ∈ Yns . Since Y is complete, it
suffices to prove that f (x) ∈ Ypns . Fix ε ∈ R>0 . Fix m ∈ N such that
d(fm (p), f (p)) < ε/2 for all p ∈ X. By transfer, d(fm (x), f (x)) < ε/2. Since fm
is compact, we have fm (x) ∈ Yns , say fm (x) ≈ y with y ∈ Y . It follows that
d(f (x), y) < ε. Since ε was arbitrary, this shows that f (x) ∈ Ypns .
9.4. Problems. You may assume any level of saturation that you need in
any given problem.
Problem 9.1. Suppose that X is a metric space and A is a subset of X.
The interior of A, denoted A◦ , is defined by
A◦ := {x ∈ A | there exists r ∈ R>0 such that B(x, r) ⊆ A}.
(1) Show that A is open iff A = A◦ . (Standard reasoning)
(2) Show that A◦ = ⋃{O | O is open and O ⊆ A}. (Standard reasoning)
(3) Show that, for any x ∈ X, we have x ∈ A◦ iff y ∈ A∗ for any y ∈ X ∗
with y ≈ x.
Problem 9.2. Suppose that X is a metric space and A is a subset of X.
The closure of A, denoted Ā, is defined by
Ā := {x ∈ X | for any r ∈ R>0 , there is a ∈ A such that d(x, a) < r}.
(1) Show that Ā = {x ∈ X | there is (an ) from A such that an → x}.
(Standard reasoning)
(2) Show that Ā = ⋂{F | F is closed and A ⊆ F }. (Standard reasoning)
(3) Show that A is closed iff A = Ā. (Standard reasoning)
(4) Show that, for any x ∈ X, we have x ∈ Ā iff there is y ∈ A∗ such
that x ≈ y.
Exercise 10.4. Verify that all of the normed spaces from Example 10.2
are Banach spaces. (Hints: Don’t forget our nonstandard characterization
of completeness. Also, for showing that C(X, F) is a Banach space, don’t
forget about our nonstandard characterization of C(X, F)ns from Problem
9.6.)
Until otherwise stated, we fix normed spaces V and W ; we write d for
both of the associated metrics on V and W . For x ∈ V ∗ , we say that x is
infinitesimal if x ≈ 0, that is, d(x, 0) ∈ µ (equivalently, ‖x‖ ∈ µ). It follows
immediately that x ≈ y if and only if x − y is infinitesimal.
Lemma 10.5. If x, y ∈ V ∗ and x ≈ y, then ‖x‖ ≈ ‖y‖. (The converse fails
miserably!)
Proof. We may suppose that ‖x‖ ≤ ‖y‖. Write y = x + (y − x). Then
‖y‖ ≤ ‖x‖ + ‖y − x‖ ≈ ‖x‖ since y − x is infinitesimal. Thus, ‖x‖ ≈ ‖y‖.
Exercise 10.6.
(1) If α ∈ Ffin and x, y ∈ V ∗ are such that x ≈ y, show that αx ≈ αy.
(2) Prove that the addition and scalar multiplication maps + : V × V →
V and · : F × V → V are continuous (with respect to the metric d).
Please use the nonstandard characterization of continuity.
10.2. Bounded linear maps.
Proposition 10.7. Suppose that T : V → W is a linear transformation and
T is continuous at some x0 ∈ V . Then T is uniformly continuous.
Proof. We use the nonstandard characterization of uniform continuity: sup-
pose x, y ∈ V ∗ and x ≈ y. We show that T x ≈ T y. Well, x0 + (x − y) ≈ x0 ,
so by the continuity of T at x0 , we have T (x0 + x − y) ≈ T (x0 ). Thus,
T (x0 ) + T (x) − T (y) ≈ T (x0 ), whence T (x) ≈ T (y). (We have used the
transfer principle to infer that the nonstandard extension of T is also lin-
ear.)
Exercise 10.8. Suppose that T : V → W is a linear transformation that is
continuous. Prove that ker(T ) := {x ∈ V | T (x) = 0} is a closed subspace
of V .
Definition 10.9. We say that a linear transformation T : V → W is
bounded if there is M ∈ R>0 such that ‖T x‖ ≤ M ‖x‖ for all x ∈ V .
The terminology in the above definition corresponds to the next fact:
Proposition 10.10. T : V → W is bounded if and only if {T (x) | ‖x‖ = 1}
is a bounded subset of W .
Proof. Let A := {T (x) | ‖x‖ = 1}. For the (⇒) direction, if ‖T x‖ ≤ M ‖x‖
for all x ∈ V , then A is contained in the closed ball around 0 (in W ) of radius
M . Conversely, suppose A is contained in the closed ball around 0 of radius
M . We claim that ‖T x‖ ≤ M ‖x‖ for all x ∈ V . Indeed, for x ∈ V \ {0},
‖(1/‖x‖) x‖ = 1, so ‖T ((1/‖x‖) x)‖ ≤ M , whence ‖T x‖ ≤ M ‖x‖. (The case x = 0 is trivial.)
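In finite dimensions the proposition can be checked by hand. The sketch below takes V = W = R² with the sup norm and a linear map given by a matrix; both choices, and all helper names, are assumptions made for the illustration. For the sup norm, the largest absolute row sum of the matrix works as the constant M and is attained on the unit sphere.

```python
import random

def sup_norm(v):
    return max(abs(c) for c in v)

def apply_matrix(T, v):
    """Apply the linear map with matrix T (a list of rows) to the vector v."""
    return [sum(t * c for t, c in zip(row, v)) for row in T]

def row_sum_bound(T):
    """Largest absolute row sum of T."""
    return max(sum(abs(t) for t in row) for row in T)

T = [[1.0, -2.0], [3.0, 0.5]]
M = row_sum_bound(T)

random.seed(0)
worst = 0.0
for _ in range(1000):
    v = [random.uniform(-1, 1) for _ in T[0]]
    worst = max(worst, sup_norm(apply_matrix(T, v)) / sup_norm(v))
print(worst, M)                                   # worst observed ratio never exceeds M
print(sup_norm(apply_matrix(T, [1.0, 1.0])))      # a unit vector attaining M
```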
For the next problem, you will need to use the following:
Fact 10.29. If V is a normed space, then {x ∈ V | ‖x‖ ≤ 1} is compact if
and only if V is finite-dimensional.
Problem 10.3. Suppose that V is a Banach space and T : V → V is a
compact linear operator.
(1) Show that the identity operator I : V → V is compact if and only if
V is finite-dimensional.
(2) Suppose that U : V → V is any bounded linear operator. Show that
T ◦ U and U ◦ T are also compact.
(3) Suppose that T is invertible. Show that T −1 is compact if and only
if V is finite-dimensional.
Problem 10.4. Suppose K : [0, 1] × [0, 1] → R is a continuous function.
Suppose that T : C([0, 1], R) → C([0, 1], R) is defined by
\[ T(f)(s) := \int_0^1 f(t)K(s,t)\,dt. \]
Show that T is a compact linear operator. (Hint: Use our earlier charac-
terization of C([0, 1], R)ns .) Such an operator is called a Fredholm Integral
Operator.
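A discretized version of such an integral operator is easy to write down. The sketch below approximates the integral by a midpoint rule with m nodes; the name `fredholm`, the kernel K(s, t) = st, and the choice of m are illustrative assumptions and play no role in the problem.

```python
def fredholm(K, f, m=1000):
    """Return the function s -> midpoint-rule approximation of
    the integral over [0, 1] of f(t) * K(s, t) dt."""
    nodes = [(j + 0.5) / m for j in range(m)]
    def Tf(s):
        return sum(f(t) * K(s, t) for t in nodes) / m
    return Tf

K = lambda s, t: s * t
Tf = fredholm(K, lambda t: t)
print(Tf(0.5))     # the exact value is 0.5 * (1/3)
```

Replacing the finite m by an infinite hypernatural gives the hyperfinite sum that is infinitely close to T (f )(s).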
Problem 10.5. Suppose that ‖ · ‖1 and ‖ · ‖2 are both norms on a vector
space W . We say that ‖ · ‖1 and ‖ · ‖2 are equivalent if there exist constants
c, d ∈ R>0 such that, for all x ∈ W , we have
c‖x‖1 ≤ ‖x‖2 ≤ d‖x‖1 .
For x, y ∈ W ∗ and i = 1, 2, let us write x ≈i y to mean ‖x − y‖i ≈ 0.
(1) Suppose that, for all x ∈ W ∗ , if x ≈1 0, then x ≈2 0. Show that
{‖x‖2 | x ∈ W, ‖x‖1 ≤ 1} is bounded.
(2) Suppose that {‖x‖2 | x ∈ W, ‖x‖1 ≤ 1} is bounded. Let
d := sup{‖x‖2 | x ∈ W, ‖x‖1 ≤ 1}.
Show that ‖x‖2 ≤ d‖x‖1 for all x ∈ W .
(3) Show that ‖ · ‖1 and ‖ · ‖2 are equivalent iff for all x ∈ W ∗ , we have
x ≈1 0 iff x ≈2 0.
(4) Show that (W, ‖ · ‖1 ) is a Banach space iff (W, ‖ · ‖2 ) is a Banach
space.
(5) Suppose A ⊆ W . For i = 1, 2, say that A is openi if A is open with
respect to the metric associated to ‖ · ‖i . Show that ‖ · ‖1 and ‖ · ‖2
are equivalent iff for all A ⊆ W , we have A is open1 iff A is open2 .
(In fancy language, this exercise says that two norms are equivalent
if and only if they induce the same topology on W .)
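A concrete pair of equivalent norms is the 1-norm and the max-norm on Rⁿ, where the constants can be taken to be 1 and n. The sketch below simply samples random vectors to watch the two-sided inequality hold; the names and the sampling range are ours.

```python
import random

def norm_1(x):
    return sum(abs(c) for c in x)

def norm_max(x):
    return max(abs(c) for c in x)

random.seed(1)
n = 5
ratios = []
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    ratios.append(norm_1(x) / norm_max(x))
print(min(ratios), max(ratios))    # every ratio lies between 1 and n
```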
Problem 10.6.
(1) Let ‖ · ‖ be any norm on Rn (not necessarily the usual norm on Rn ).
Suppose x ∈ (R∗ )n . Show that ‖x‖ ≈ 0 iff |xi | ≈ 0 for i = 1, . . . , n.
Now let m → ∞. It is now easy to verify that the axioms for an i.p.s. hold.
We should remark that of all the ℓp spaces, ℓ2 is the only one that carries the
structure of an i.p.s. and the above inner product on ℓ2 induces the norm
on ℓ2 introduced in the previous section.
Example 11.8. Let V = C([0, 1], F). Then V becomes an i.p.s. when
equipped with the inner product given by $\langle f, g\rangle := \int_0^1 f(x)\overline{g(x)}\,dx$. How
does the norm on V induced by the inner product compare with the norm
placed on V in the previous section?
Definition 11.9. V is called a Hilbert space if the metric associated to V
is complete.
In other words, an i.p.s. is a Hilbert space if the associated normed space
is a Banach space.
Exercise 11.10.
(1) Prove that the inner product spaces in Examples 11.6 and 11.7 are
Hilbert spaces. (Your proof for ℓ2 should probably be standard as
we have yet to characterize (ℓ2 )ns .)
(2) Prove that the inner product space in Example 11.8 is not a Hilbert
space.
Definition 11.13. An orthonormal basis for V is a maximal orthonormal
sequence of vectors for V .
By Zorn’s lemma, every inner product space has an orthonormal basis.
One must be careful with the word basis here: while in finite-dimensional
inner product spaces, an orthonormal basis is a basis (in the usual linear
algebra sense), for infinite-dimensional inner product spaces, an orthonormal
basis is never a basis. (In this setting, the usual notion of “basis” is called
“Hamel basis” to help make the distinction.)
Fact 11.14. Let $(e_n \mid n \in \mathbb{N})$ be an orthonormal set of vectors for the Hilbert
space H. Then the following are equivalent:
(1) $(e_n)$ is an orthonormal basis for H;
(2) If v ∈ H is such that $v \perp e_n$ for each n, then v = 0;
(3) For all v ∈ H, there is a sequence $(\alpha_n)$ from F such that $\sum_{n=0}^{m} \alpha_n e_n$
converges to v as m → ∞.
You will prove the previous fact in the exercises. In (3) of the previous
fact, we write $v = \sum_{n=0}^{\infty} \alpha_n e_n$.
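Since a ≈ b, we have that $\|a - b\|^2 = \sum_{n \in \mathbb{N}^*} |\alpha_n - \beta_n|^2 \approx 0$, whence (†) ≈ 0.
It remains to prove that (††) ≈ 0. However, since b ∈ H, by the first part of
the proof, we know that (††) ≈ 0.
Now suppose that $\sum_{n>k} |\alpha_n|^2 \approx 0$ for every $k > \mathbb{N}$; we must show that $a \in H_{\mathrm{ns}}$.
Since $a \in H_{\mathrm{fin}}$, we know that $\sum_{n \in \mathbb{N}^*} |\alpha_n|^2 \in \mathbb{R}_{\mathrm{fin}}$; say $\sum_{n \in \mathbb{N}^*} |\alpha_n|^2 \le$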
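If k ∈ N, then $\sum_{n \le k} |\alpha_n - \beta_n|^2 \approx 0$; thus, by the Infinitesimal Prolongation
Theorem, there is $k > \mathbb{N}$ such that $\sum_{n \le k} |\alpha_n - \beta_n|^2 \approx 0$. On the other hand,
since $|\alpha_n - \beta_n|^2 \le 2(|\alpha_n|^2 + |\beta_n|^2)$ for each n,
$$\sum_{n>k} |\alpha_n - \beta_n|^2 \le 2\Big(\sum_{n>k} |\alpha_n|^2 + \sum_{n>k} |\beta_n|^2\Big).$$
By assumption, $\sum_{n>k} |\alpha_n|^2 \approx 0$, whilst $\sum_{n>k} |\beta_n|^2 \approx 0$ by the forward
direction of the theorem and the fact that b is standard. Consequently,
$\|a - b\|^2 \approx 0$.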
For $a = \sum_{n=0}^{\infty} \alpha_n e_n \in H$ and m ∈ N, set $P(m, a) := P_m(a) = \sum_{n=0}^{m} \alpha_n e_n \in H$.
We thus have maps P : N × H → H and, for n ∈ N, $P_n$ : H → H.
Exercise 11.22. For n ∈ N, $P_n$ is a bounded linear transformation with
$\|P_n\| = 1$.
Claim: $z \in E_{\mathrm{ns}}$.
Proof of Claim: Suppose, towards a contradiction, $z \notin E_{\mathrm{ns}}$. Since H is
complete and E is closed in H, we have that E is complete. Consequently,
since $z \notin E_{\mathrm{ns}}$, we have that $z \notin E_{\mathrm{pns}}$, whence there is r ∈ R>0 such that
$\|z - w\| \ge r$ for all w ∈ E. Since $\alpha < \sqrt{\alpha^2 + \frac{r^2}{4}}$, we have w ∈ E such that
$\|x - w\| < \sqrt{\alpha^2 + \frac{r^2}{4}}$. By the Parallelogram Identity, we have:
$$
\begin{aligned}
\|w - z\|^2 &= 2(\|x - w\|^2 + \|x - z\|^2) - \|(x - w) + (x - z)\|^2 \\
&< 2\Big(\alpha^2 + \frac{r^2}{4}\Big) + 2\|x - z\|^2 - 4\alpha^2 \\
&< 2\Big(\alpha^2 + \frac{r^2}{4}\Big) + 2\alpha^2 + \frac{r^2}{2} - 4\alpha^2 \\
&= r^2.
\end{aligned}
$$
This contradicts the fact that $\|z - w\| \ge r$.
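The Parallelogram Identity invoked above, $\|u + v\|^2 + \|u - v\|^2 = 2(\|u\|^2 + \|v\|^2)$, can be sanity-checked numerically. The following is a minimal sketch in C^3 with the usual Euclidean norm; the vectors and the helper name norm_sq are illustrative assumptions, not part of the notes. Taking u = x − w and v = x − z recovers the first line of the computation above.

```python
def norm_sq(vec):
    """Squared Euclidean norm of a vector with complex entries."""
    return sum(abs(c) ** 2 for c in vec)

# Arbitrary illustrative vectors in C^3 (hypothetical data).
u = [1 + 2j, -3 + 0j, 0.5j]
v = [2 + 0j, 1 - 1j, 4 + 0j]

total = [a + b for a, b in zip(u, v)]
diff = [a - b for a, b in zip(u, v)]

lhs = norm_sq(total) + norm_sq(diff)   # ||u + v||^2 + ||u - v||^2
rhs = 2 * (norm_sq(u) + norm_sq(v))    # 2(||u||^2 + ||v||^2)
identity_gap = abs(lhs - rhs)          # zero up to floating-point rounding
print(identity_gap < 1e-12)
```

A numerical check of this kind only illustrates the identity for one pair of vectors; the proof itself is a direct expansion of the inner products.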
Lemma 11.39. (E ⊥ )⊥ = E.
Lemma 11.40. Suppose that G = E ⊕ F . Then PG = PE + PF . Conse-
quently, I = PE + PE ⊥ .
Now suppose that E1 , E2 , E3 are closed subspaces such that E1 ⊥ E2 ,
E2 ⊥ E3 and E1 ⊥ E3 . Then E1 ⊥ (E2 ⊕ E3 ), (E1 ⊕ E2 ) ⊥ E3 , and
E1 ⊕ (E2 ⊕ E3 ) = (E1 ⊕ E2 ) ⊕ E3 . We may thus unambiguously write
E1 ⊕ E2 ⊕ E3. Ditto for any finite number of mutually perpendicular closed
subspaces E1, . . . , En of H; we often write the direct sum in the compact
notation $\bigoplus_{i=1}^{n} E_i$.
Corollary 11.41. If $G = \bigoplus_{i=1}^{n} E_i$, then $P_G = P_{E_1} + \cdots + P_{E_n}$.
Returning to the situation preceding the lemmas, suppose that $(E_n \mid n \ge 1)$
are mutually perpendicular closed subspaces of H. Set $G_n := \bigoplus_{i=1}^{n} E_i$,
a closed subspace of H. We then define $\bigoplus_{i=1}^{\infty} E_i := \overline{\bigcup_{n \ge 1} G_n}$, a closed
subspace of H by the previous two lemmas.
Lemma 11.44. Suppose that $(G_n \mid n \ge 1)$ is a sequence of subspaces of H
with $G_n \subseteq G_{n+1}$ for all n ≥ 1 and $G = \overline{\bigcup_{n \ge 1} G_n}$. Then, for all x ∈ H, we
have $P_{G_n}(x) \to P_G(x)$ as n → ∞.
Corollary 11.45. Suppose that $E = \bigoplus_{i=1}^{\infty} E_i$. Then, for all x ∈ H, we
have $\sum_{i=1}^{n} P_{E_i}(x) \to P_E(x)$ as n → ∞.
11.4. Hyperfinite-dimensional subspaces. Once again, H denotes an
arbitrary Hilbert space. Let E denote the set of finite-dimensional subspaces
of H, whence E ⊆ P(H). We will refer to elements of E ∗ as hyperfinite-
dimensional subspaces of H ∗ . We have a map P : E × H → H given by
P (E, x) := PE (x). We thus get a nonstandard extension P : E ∗ × H ∗ → H ∗ ,
whence it makes sense to speak of the orthogonal projection map PE : H ∗ →
H ∗ for E ∈ E ∗ . Similarly, we have a map dim : E → N, whence we get a map
dim : E ∗ → N∗ .
If H is separable and (en | n ≥ 1) is an orthonormal basis for H, then
by transfer, for N ∈ N∗ , we have HN := sp(e1 , . . . , eN ) ∈ E ∗ , the internal
Problem 11.4. Suppose that H is a separable Hilbert space and $x \in H_{\mathrm{ns}}$.
Suppose that $(e_n)$ is an orthonormal basis for H and $x = \sum_{n \in \mathbb{N}^*} x_n e_n$.
Further suppose that $x_n \approx 0$ for all n ∈ N. Show that x ≈ 0.
Problem 11.5. Suppose P : H → H is an idempotent bounded operator on
a Hilbert space H. Show that P is compact if and only if P is a finite-rank
operator.
Problem 11.6. Suppose that E is any linear subspace of any unitary space
U . Show that E ⊥ is a linear subspace of U .
We will need the following notation for the next problem. If w = a + bi ∈
C, then Re(w) := a and Im(w) := b.
Problem 11.7. Suppose that E is a closed linear subspace of the Hilbert
space H. Suppose x ∉ E. We aim to show that $x - P_E(x) \in E^\perp$. Set
$\alpha := \|x - P_E(x)\| > 0$. Fix z ∈ E \ {0}.
(1) Fix λ ∈ R \ {0}. Show that $\lambda^2\|z\|^2 - 2\lambda\,\mathrm{Re}(\langle x - P_E(x), z\rangle) > 0$.
(Hint: Start with $\alpha^2 < \|x - (P_E(x) + \lambda z)\|^2$.)
(2) Considering
$$\lambda := \frac{\mathrm{Re}(\langle x - P_E(x), z\rangle)}{\|z\|^2},$$
conclude that $\mathrm{Re}(\langle x - P_E(x), z\rangle) = 0$.
(3) Show that $\langle x - P_E(x), z\rangle = 0$, and thus $x - P_E(x) \in E^\perp$.
Problem 11.8. Suppose $(e_1, \ldots, e_k)$ is an orthonormal sequence of vectors
in the Hilbert space H. Suppose $E = \mathrm{span}(e_1, \ldots, e_k)$. Show that, for all
v ∈ H, we have
$$P_E(v) = \sum_{i=1}^{k} \langle v, e_i\rangle e_i.$$
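The projection formula in Problem 11.8 can be explored numerically before attempting the proof. The sketch below works in C^3 and assumes an inner product that is linear in the first argument (conjugating the second); the vectors e1, e2, v and the function names are hypothetical choices made for illustration. It computes the sum of ⟨v, e_i⟩ e_i and checks that the residual is perpendicular to each e_i and that projecting twice changes nothing.

```python
import math

def inner(u, w):
    """Inner product on C^n, linear in the first argument (assumed convention)."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, w))

def project(v, basis):
    """Return sum_i <v, e_i> e_i for an orthonormal list of vectors `basis`."""
    result = [0j] * len(v)
    for e in basis:
        coeff = inner(v, e)
        result = [r + coeff * ek for r, ek in zip(result, e)]
    return result

s = 1 / math.sqrt(2)
e1 = [s, s * 1j, 0]        # unit vector
e2 = [0, 0, 1]             # unit vector orthogonal to e1
v = [1 + 2j, 3, -1j]       # arbitrary vector to project

p = project(v, [e1, e2])
residual = [a - b for a, b in zip(v, p)]

# The residual v - P_E(v) should be perpendicular to E = span(e1, e2).
print(abs(inner(residual, e1)) < 1e-12, abs(inner(residual, e2)) < 1e-12)
```

Orthogonality of the residual is exactly the property established in Problem 11.7, which is why the two problems fit together.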
and such that for all i ∈ {1, . . . , n}, all g ∈ G, and all $h \in G_i$, we have
$ghg^{-1}h^{-1} \in G_{i-1}$. If n is the smallest length of such a sequence of subgroups
for G, we call n the nilpotency class of G.
(1) G is nilpotent of nilpotency class 1 if and only if G is nontrivial (i.e.
G ≠ {1}) and abelian (i.e. xy = yx for all x, y ∈ G).
(2) Show that G∗ is a group, as is any subset of G∗ closed under the
extension of the group operations.
(3) Suppose A ⊆ G. The subgroup of G generated by A, denoted ⟨A⟩, is
$$\langle A\rangle := \bigcap\{H \mid H \le G \text{ and } A \subseteq H\}.$$
A subgroup H of G is called finitely generated if H = hAi for some
finite A ⊆ G. Discuss how the nonstandard extension of the set of
finitely generated subgroups of G is the set of hyperfinitely generated
subgroups of G∗ .
(4) Show that there is a hyperfinitely generated subgroup H of G∗ such
that G ≤ H.
(5) Suppose that G is locally nilpotent, that is, suppose that every finitely
generated subgroup of G is nilpotent. Further suppose that the
nilpotency class of any finitely generated subgroup of G is less than
or equal to n. Show that G is nilpotent of nilpotency class less than
or equal to n. (You will have to use the fact that a subgroup of a
nilpotent group of nilpotency class ≤ n is itself a nilpotent group of
nilpotency class ≤ n. Try proving this fact!)
(6) Why doesn’t your proof in (5) work if G is locally nilpotent with un-
bounded nilpotency class? (There are some difficult open problems
about locally nilpotent groups of unbounded nilpotency class that
some group theorists hope might be solved by nonstandard meth-
ods.)
Problem 12.6. (Assume ℵ1 -saturation) Suppose that X is a normed space
over F and E ⊆ X ∗ is an internal subspace, that is, E is internal, 0 ∈ E, E
is closed under addition, and E is closed under multiplication by elements
of F∗ . For example, E := X ∗ is an internal subspace of X ∗ . Define
$$E_{\mathrm{fin}} := \{x \in E \mid \|x\| \in \mathbb{F}_{\mathrm{fin}}\}$$
and
$$\mu_E := \{x \in E \mid \|x\| \in \mu(0)\}.$$
(1) Show that $E_{\mathrm{fin}}$ is a vector space over F and that $\mu_E$ is a subspace
of $E_{\mathrm{fin}}$. We set $\hat{E} := E_{\mathrm{fin}}/\mu_E$ and call it the nonstandard hull of E.
For $x \in E_{\mathrm{fin}}$, we often write $\hat{x}$ instead of $x + \mu_E$.
(2) For $\hat{x} \in \hat{E}$, define $\|\hat{x}\| := \mathrm{st}(\|x\|)$. Show that this definition is
independent of the coset representative and that $\|\cdot\| : \hat{E} \to \mathbb{R}$ is a
norm on $\hat{E}$.
(3) Show that $\hat{E}$ is a Banach space. (Notice as a consequence that even
if X was incomplete, $\widehat{X^*}$ is automatically complete.)
For simplicity, set αj := st(κj ). We can now state the Spectral Theorem
in case W is finite.
Theorem 13.17 (Spectral Theorem, Case 1). If X = {1, . . . , k} for some
k ∈ N, then the eigenvalues of T are $0, \alpha_1, \ldots, \alpha_k$ and
$$T = \alpha_1 P_{\tilde{W}_1} + \cdots + \alpha_k P_{\tilde{W}_k}$$
is the spectral resolution of T.
Proof. Fix x ∈ H; then $T(x) = T'(x)$. Consequently,
$$T(x) = T'(x) = \sum_{j=1}^{\eta} \kappa_j P_{W_j}(x) = \sum_{j=1}^{k} \kappa_j P_{W_j}(x) + \sum_{j=k+1}^{\eta} \kappa_j P_{W_j}(x) = (\dagger) + (\dagger\dagger).$$
By the previous lemma, $(\dagger) \approx \sum_{j=1}^{k} \alpha_j P_{\tilde{W}_j}(x) \in H$. We need to show that
Thus, $\alpha P_{\tilde{W}_0}(x) = 0$ and $\alpha P_{\tilde{W}_j}(x) = \alpha_j P_{\tilde{W}_j}(x)$ for j = 1, . . . , k (since elements
of $\tilde{W}_0, \tilde{W}_1, \ldots, \tilde{W}_k$ are pairwise perpendicular and thus linearly independent).
Since x ≠ 0, there is j ∈ {1, . . . , k} such that $P_{\tilde{W}_j}(x) \ne 0$; for this
j, we have $\alpha = \alpha_j$.
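For intuition, the shape of a spectral resolution can be seen in the simplest finite-dimensional setting. The following is a minimal sketch, assuming a real symmetric 2 × 2 matrix [[a, b], [b, d]] with b ≠ 0; the function names and the sample matrix [[2, 1], [1, 2]] are illustrative assumptions and are not taken from the notes. It uses the closed-form eigenvalues, builds the rank-one orthogonal projections onto the eigenspaces, and recombines them as a sum of eigenvalue times projection.

```python
import math

def spectral_resolution_2x2(a, b, d):
    """Eigenvalues and projections of the real symmetric matrix [[a, b], [b, d]], b != 0."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pieces = []
    for lam in (mean + radius, mean - radius):
        # (b, lam - a) is an eigenvector for lam whenever b != 0.
        vx, vy = b, lam - a
        length = math.hypot(vx, vy)
        ux, uy = vx / length, vy / length
        # Orthogonal projection onto the line spanned by the unit eigenvector.
        proj = [[ux * ux, ux * uy], [uy * ux, uy * uy]]
        pieces.append((lam, proj))
    return pieces

def recombine(pieces):
    """Form the sum of lam * P over all (lam, P) pairs."""
    return [[sum(lam * p[i][j] for lam, p in pieces) for j in range(2)]
            for i in range(2)]

pieces = spectral_resolution_2x2(2.0, 1.0, 2.0)
rebuilt = recombine(pieces)   # should reproduce [[2, 1], [1, 2]] up to rounding
print(rebuilt)
```

In this toy case the two projections also sum to the identity, mirroring the role played by the projections onto the pairwise perpendicular eigenspaces in the theorem.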
In order to deal with the case that X = N>0 , we need one final lemma.
Lemma 13.18. If X = N>0 , then limj→∞ αj = 0.
Proof. Since |αn | is nonincreasing and bounded below, it suffices to prove
that 0 is a limit point of |αn |. We consider the extension of the sequence
(αn | n ∈ N>0 ) to a sequence (αn | n ∈ (N>0 )∗ ). By the Infinitesimal
Prolongation Theorem, there is N ∈ N∗ \ N, N ≤ η such that $\alpha_N \approx \kappa_N$.
But $\kappa_N \approx 0$ since N ∉ X. Thus, $\alpha_N \approx 0$ and hence 0 is a limit point of
$(\alpha_n)$.
We are now ready to state:
Theorem 13.19 (Spectral Theorem, Case 2). If X = N>0, then the nonzero
eigenvalues are $\alpha_1, \alpha_2, \ldots$ and $T = \sum_{j=1}^{\infty} \alpha_j P_{\tilde{W}_j}$ is the spectral resolution of
T, that is, $T(x) = \sum_{j=1}^{\infty} \alpha_j P_{\tilde{W}_j}(x)$ for all x ∈ H.
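Proof. Fix x ∈ H. Set $r_n := \|T(x) - \sum_{j=1}^{n} \alpha_j P_{\tilde{W}_j}(x)\|$; we need to show that
and $T(x) = T'(x) = \sum_{j=1}^{\eta} \kappa_j P_{W_j}(x)$. Thus, $r_n \approx \|\sum_{j=n+1}^{\eta} \kappa_j P_{W_j}(x)\|$ for
all n ∈ N. We play a Pythagorean game again:
$$\Big\|\sum_{j=n+1}^{\eta} \kappa_j P_{W_j}(x)\Big\|^2 = \sum_{j=n+1}^{\eta} |\kappa_j|^2 \|P_{W_j}(x)\|^2 \le |\kappa_{n+1}|^2 \sum_{j=1}^{\eta} \|P_{W_j}(x)\|^2 = |\kappa_{n+1}|^2 \|x\|^2.$$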
13.1. Problems.
Problem 13.1. Suppose that T is a Hermitian operator on a unitary space
V . Show that every eigenvalue of T is a real number.
Problem 13.2. Let V be the unitary space C([0, 1], C) endowed with the
inner product
$$\langle f, g\rangle := \int_0^1 f(t)\overline{g(t)}\,dt.$$
Define T : V → V by T (f ) := tf , i.e. T (f )(t) := tf (t).
(1) Show that T is a bounded Hermitian operator on V .
(2) Show that T has no eigenvalues.
(3) Show that T is not compact. (Don’t just say if it were compact,
it would have eigenvalues by the Spectral Theorem. Show directly
that T is not compact.)
Problem 13.3. For each of the bounded linear operators T given
below, explain why it is compact and Hermitian. Then find the eigenvalues,
eigenspaces, and projections, yielding the spectral decomposition for
T.
(1) Let A be the matrix
$$A = \begin{pmatrix} 3 & -4 \\ -4 & 3 \end{pmatrix}.$$
Define $T : \mathbb{C}^2 \to \mathbb{C}^2$ by T(x) = Ax.
(2) Define $T : \ell^2 \to \ell^2$ by $T((x_k \mid k \ge 1)) = (\frac{x_k}{k} \mid k \ge 1)$.
There is a Spectral Theorem for Hermitian operators which need not be
compact. The statement of the spectral theorem is more involved (and a
little less satisfying), so we will not give it here. However, the results in
the remaining problems are ingredients towards proving this more general
Spectral Theorem. The interested reader can find a complete discussion of
this in Chapter 5, Section 5 of [1].
It will prove desirable later on to know that T and T̃ are “close.” More
precisely, we will want to know that, for y ∈ Hν ∩ Hfin , T (y) ≈ T̃ (y). We
can achieve this goal by choosing ν appropriately, as we proceed to show
now.
First, we would like to know a predictable form for the matrix representation
of $T^n$ with respect to our basis {e1, e2, . . .}. Usually, this is not
possible, but thankfully, $[a_{jk}]$ is almost superdiagonal. Recursively define
$a_{jk}^{(n)}$ as follows: Set $a_{jk}^{(1)} := a_{jk}$ and
$$a_{jk}^{(n+1)} := \sum_{i=1}^{\infty} a_{ji}^{(n)} a_{ik} = \sum_{i=1}^{k+1} a_{ji}^{(n)} a_{ik}.$$
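The truncation of the sum at i = k + 1 can be illustrated on a finite matrix. The sketch below reads "almost superdiagonal" as the condition that the entry in row i and column k vanishes whenever i > k + 1, which is an assumption inferred from the upper limit of the second sum; the 4 × 4 integer matrix and the function names are hypothetical. It compares the truncated recursion with an ordinary full matrix product.

```python
def is_almost_superdiagonal(a):
    """True if a[i][k] == 0 whenever i > k + 1 (assumed reading of the term)."""
    size = len(a)
    return all(a[i][k] == 0 for i in range(size) for k in range(size) if i > k + 1)

def next_power(a_n, a):
    """One recursion step: entry (j, k) sums a_n[j][i] * a[i][k] over i <= k + 1 only."""
    size = len(a)
    return [[sum(a_n[j][i] * a[i][k] for i in range(min(k + 2, size)))
             for k in range(size)] for j in range(size)]

def full_product(p, q):
    """Ordinary matrix product, summing over every index i."""
    size = len(p)
    return [[sum(p[j][i] * q[i][k] for i in range(size))
             for k in range(size)] for j in range(size)]

# Hypothetical finite stand-in for [a_jk]; zeros below the first subdiagonal.
a = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [0, 9, 1, 2],
     [0, 0, 3, 4]]

a2 = next_power(a, a)     # entries a^(2)_jk
a3 = next_power(a2, a)    # entries a^(3)_jk
print(a2 == full_product(a, a), a3 == full_product(a2, a))
```

Only the right-hand factor needs the vanishing condition, which is why the recursion stays valid even though the powers themselves acquire more nonzero entries.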
15.4. Integration. Once one has a measure space (X, Ω, µ), one can integrate
as follows. First, for a function f : X → R, we say that f is measurable
if $f^{-1}(U) \in \Omega$ for any open U ⊆ R.
Next, define a simple function g : X → R to be a measurable function
with finite range. Given a set A ⊆ X, we define $1_A$ : X → R by $1_A(x) = 1$ if
x ∈ A and 0 otherwise. If A ∈ Ω, then $1_A$ is measurable. Any simple function
g can be written as $g = \sum_{i=1}^{n} r_i 1_{A_i}$ with $r_i \in \mathbb{R}$ and $A_i \in \Omega$. For such a
simple function g, we define the integral of g to be $\int g\,d\mu := \sum_{i=1}^{n} r_i \mu(A_i)$.
For an arbitrary positive measurable function f : X → R, we define the
integral of f to be $\int f\,d\mu := \sup\{\int g\,d\mu \mid g \le f,\ g \text{ a simple function}\}$. f
is said to be integrable if $\int f\,d\mu < \infty$. For an arbitrary function f, we set
$f^+ := \max(f, 0)$ and $f^- := \max(-f, 0)$. We say that f is integrable if both
$f^+$ and $f^-$ are integrable, in which case we define $\int f\,d\mu = \int f^+\,d\mu - \int f^-\,d\mu$.
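The definition of the integral of a simple function can be made concrete on a finite measure space. The sketch below assumes X = {1, ..., 6} with the uniform probability measure given by point weights, and a simple function written with pairwise disjoint sets so that its positive and negative parts can be read off term by term; all names and data are illustrative assumptions. Exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def measure(weights, subset):
    """mu(A) as the sum of the point weights of the elements of A."""
    return sum(weights[x] for x in subset)

def integrate_simple(weights, terms):
    """Integral of g = sum_i r_i 1_{A_i}, given as a list of (r_i, A_i) pairs."""
    return sum(r * measure(weights, subset) for r, subset in terms)

# Hypothetical measure space: a fair die, every point has weight 1/6.
weights = {x: Fraction(1, 6) for x in range(1, 7)}

# g = 2 * 1_{1,2,3} - 1 * 1_{4,5}, written with disjoint sets A_i.
g_terms = [(Fraction(2), {1, 2, 3}), (Fraction(-1), {4, 5})]

# Because the A_i are disjoint, g+ and g- are obtained coefficient by coefficient.
g_plus = [(max(r, Fraction(0)), subset) for r, subset in g_terms]
g_minus = [(max(-r, Fraction(0)), subset) for r, subset in g_terms]

total = integrate_simple(weights, g_terms)
split = integrate_simple(weights, g_plus) - integrate_simple(weights, g_minus)
print(total, split)
```

Both computations agree, as they must, since the integral of a simple function is consistent with the decomposition into positive and negative parts.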
In other words, P (g) is an Ω0 -measurable function which has the same in-
tegral as g over subsets of Ω0 . For probability reasons, P (g) is called the
conditional expectation of g with respect to Ω0 and is often denoted E[g|Ω0 ].
that, in some sense, almost all of the pairs of points are in ε-pseudorandom
pairs. We can now state
Theorem 16.1 (Szemerédi’s Regularity Lemma). For any ε ∈ R>0, there
is a constant C(ε) such that any graph (V, E) admits an ε-regular partition
into m ≤ C(ε) pieces.
References
[1] M. Davis, Applied nonstandard analysis, John Wiley and Sons Inc., 1977.
[2] R. Goldblatt, Lectures on the hyperreals: an introduction to nonstandard analysis,
volume 188 of Graduate Texts in Mathematics, Springer-Verlag, 1998.
[3] C. W. Henson, Foundations of Nonstandard Analysis: A Gentle Introduction to
Nonstandard Extensions, in Nonstandard Analysis: Theory and Applications, L.O. Arkeryd,
N.J. Cutland, and C.W. Henson, eds., NATO Science Series C, Springer, 2001.