Linear Methods
Markus Grasmair
Trondheim, Norway
Lecture notes for the course TMA4145 - Linear Methods, Autumn 2024, NTNU
Contents
8. p-norms
9. Sequence Spaces
10. Completions
Chapter 5. Bounded Linear Operators
1. Continuity of linear mappings
2. Norms of bounded linear operators
3. Extension of Linear Mappings
4. Range and Kernel
5. Spaces of Bounded Linear Mappings
Chapter 6. Hilbert Spaces
1. Best approximation and projections
2. Projections on subspaces
3. Riesz’ Representation Theorem
4. Adjoint operators on Hilbert spaces
5. Hilbert space bases
6. Galerkin’s Method
7. Reproducing Kernel Hilbert Spaces
Bibliography
CHAPTER 0
Mathematical Language
When you open a maths book, you will immediately note that it does not
look the same as other books, or even textbooks in other disciplines. There are of
course the obvious differences like the prevalent use of mathematical symbols and
the frequent appearance of equations, both incorporated in the running text and as
separate lines or even paragraphs. Then there is the strangely fragmented structure
of the text with definitions, theorems, lemmas, proofs, remarks, and so on, but often
hardly any text in between. The differences, however, go deeper than that, and you
will notice that the language itself looks weird. Especially when reading through
definitions and theorems, you might get the impression that they are written in
an unnecessarily complicated and convoluted manner, and these complications can
look even worse in the case of proofs. In the following, we will briefly discuss the
differences between mathematical language and everyday language. Many of the
thoughts in this chapter are based on the excellent book [Dev12], which intends to
ease the transition from high school to university mathematics. Even though you
are not the main target group of that book, having already put up with several
advanced mathematics courses, it can still make sense to skim through it.
Mathematical Statements
If you use language for everyday communication, you usually do not have to
be extremely precise. If something is unclear, you can always ask additional questions.
In written communication, this is not always possible, but you still have
some context and common sense at your disposal, which can clear up most misun-
derstandings. In mathematics, you no longer have this luxury, as soon as you are
dealing with more complicated objects than integers or rational numbers, or with
more complex subjects than spatial geometry. Because of this, we have to be very
careful how to use language to talk about abstract mathematical objects.
Modern mathematics is largely concerned with investigating whether state-
ments about mathematical objects are true or false. Examples of such statements
are the following:
(1) The number 7 is a prime number. The number 6 is not a prime number.
(2) The number √2 is irrational.
(3) Every quadratic equation x^2 + bx + c = 0 has at least two solutions.
(4) Every even natural number greater than 2 can be expressed as the sum
of two prime numbers.
The two statements in (1) are true. The statement (2) is true as well. The
statement (3) is wrong. The statement (4) is Goldbach’s conjecture; at the time of
the writing of this note, it was not yet known whether this is true or false.
In order to find the truth value of a mathematical statement, it is necessary to
actually know what the different expressions in the statement mean. That is, we
need a sufficiently precise definition of all the expressions. For instance, in order
to determine whether the statement that 7 is a prime number is true, we first need
to know what a prime number is. One possibility for defining prime numbers is the
And. The statement (ϕ and ψ) is true precisely when both statements ϕ and
ψ are true. If at least one statement (or both) are false, it is false as well.
In everyday language, the word “and” can have a slightly different meaning than
the mathematical connector “and”: it can also convey a temporal or even causal relationship:
In the sentence “The sun went down and it became dark,” we are really talking
about a sequence of events: First, the sun went down, and then it became dark.
We might even imply that the first part of the sentence is the cause of the second
one.3 In mathematical expressions, the word “and” is never used in that sense. The
mathematical statements (ϕ and ψ) and (ψ and ϕ) are precisely the same, and the
order of the components ϕ and ψ does not matter. Consider in contrast the sinister
meaning of the reversed sentence “It became dark and the sun went down.”
Or. The statement (ϕ or ψ) is true precisely when at least one of the state-
ments ϕ or ψ (or both) is true. It is false only when both ϕ and ψ are false. For
instance, the statement
(7 is a prime) or (7 < 0)
is true, as at least one of the component statements (namely the statement that 7
is a prime) is true; it does not matter that the statement 7 < 0 is false.

1 Although there is sometimes doubt about the status of the number 1: Is it a prime or not?
The definition above should clarify this doubt!
2 At least the connectors do so pretty regularly; the appearance of the quantifiers is probably
much more sparse.
3 You should not jump to conclusions that fast, though: Do not fall easy prey to the fallacy
post hoc ergo propter hoc and always remember that correlation does not imply causation!
In addition, we have to be a bit careful when augmenting an “or” statement to
an “either . . . or” statement. In everyday language these two phrases are often, but
not always, used interchangeably, but the “either . . . or” can also be understood as
an exclusive or, which is true whenever precisely one of the component statements is
true, but not both. In this class, I will use the phrase “either . . . or” interchangeably
with “or”. If I need to make use of an exclusive or, I will use more complicated
expressions.4
Not. The negation inverts the truth value of an expression: If the statement
ϕ is true, then the statement (not ϕ) is false; if the statement ϕ is false,
then the statement (not ϕ) is true.
We have the following rules for combining not with and and or (De Morgan’s
laws of propositional logic):
(not(ϕ and ψ)) = ((not ϕ) or (not ψ)),
(not(ϕ or ψ)) = ((not ϕ) and (not ψ)).
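As a quick sanity check of the first of these rules with concrete statements: take ϕ = (7 is a prime) and ψ = (7 < 0). Then (ϕ and ψ) is false, so (not(ϕ and ψ)) is true; and indeed (not ψ) is true (since 7 ≥ 0), so ((not ϕ) or (not ψ)) is true as well, in accordance with the rule.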
If — then. The statement (if ϕ then ψ) is false only in the case where ϕ is
true, but ψ is false. That is, in the case where the statement ϕ is false, the statement
(if ϕ then ψ) is true no matter the truth value of ψ. This fact is known as ex falso
quodlibet, roughly translating to “from the wrong, anything [follows]”: If one starts
with a wrong premise (that is, if ϕ is false), then one cannot conclude anything
about the conclusion (that is, we don’t know whether ψ is true or false), even if the
deduction process (that is, the statement (if ϕ then ψ)) was completely correct.
Many mathematical theorems can be formulated in terms of “if — then” state-
ments. As a consequence, reformulations of such statements are regularly used
for the proof of theorems. The probably most important reformulations are the
following two:
(if ϕ then ψ) = (if (not ψ) then (not ϕ)),
(if ϕ then ψ) = not(ϕ and (not ψ)).
The first reformulation is used in proofs by contraposition: In order to show the
truth of the statement (if ϕ then ψ), we assume that the conclusion ψ does not hold
(that is, we assume (not ψ)) and try to conclude that ϕ does not hold either (that
is, we try to show that (not ϕ) holds). The second reformulation is used in proofs
by contradiction: Again we are trying to show that the statement (if ϕ then ψ)
holds. For that, we assume that ϕ and the negation of ψ are simultaneously true,
and then we try to show that this implies a false statement.
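As an illustration of a proof by contraposition, consider the standard statement “if n^2 is even, then n is even” for natural numbers n. To prove it by contraposition, assume that n is not even, that is, n = 2k + 1 for some non-negative integer k. Then n^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1 is odd, that is, not even. Thus (not ψ) implies (not ϕ), and the original implication follows.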
Be aware that the common meaning of the word “implication” is different from
the truth value of an “if . . . then” expression even in mathematics. For instance,
the statement
(2 + 2 = 4) =⇒ (7 is a prime)
is true as a mathematical statement: Both the antecedent (2 + 2 = 4) and the
consequent (7 is a prime) are true statements. However, we would not say (or
write) that “the assumption that 2 + 2 = 4 implies that 7 is a prime,” but we
rather reserve the notion of “implications” to the case where there is actually a
connection between the two statements.
4Knowing myself, I will invariably use at some point during the lecture a phrase along the
lines of “If a, then either b or c,” even though the phrase calls for a non-exclusive or. This is
intended as a safeguard against accusations of even more errors than really warranted.
For all, there exists. The statement (for all x : ϕ(x)) is true precisely when
the statement ϕ(x) is true for every single admissible argument x. Conversely, it is
false when there exists at least one admissible argument such that ϕ(x) is false. The
statement (there exists x : ϕ(x)) is true when the statement ϕ(x) is true for at least
one (but possibly more) admissible argument x. The negation of (for all x : ϕ(x))
is (there exists x : (not ϕ(x))).
Many mathematical statements are “for all” or “there exists” statements (or
both) in disguise: For instance, the statement “The number 6 is not a prime number.”
can be translated to (or should be interpreted as) the statement “There exist natural
numbers a and b with a < 6 and b < 6, such that ab = 6.” Breaking this down
further, we may arrive at the statement “There exist natural numbers a and b with
the properties a < 6 and b < 6 and ab = 6.”
As a more complicated example, the expression “Every quadratic equation x^2 + bx + c = 0
has a complex solution” can be translated to “For all numbers b and c, there exists
a complex number x with the property x^2 + bx + c = 0.”
This last example also demonstrates the importance of sentence order: The
statement above is true (and you even know an explicit formula for all the solutions
of the equation), but if we reverse the order of the “for all” and the “there exists”
terms, we obtain the blatantly wrong statement “There exists a complex number
x such that all numbers b and c have the property x^2 + bx + c = 0.” (In order to
see that this reversed statement is false, consider for fixed x the numbers b = −x
and c = 1.)
CHAPTER 1
Basic Notions
0. Reading Guide
In this note, definitions, remarks, examples, and exercises end with a ■; proofs
in most mathematical texts including this one end with a □. Theorems, proposi-
tions, lemmas, and corollaries are written in italics, which makes it unnecessary to
delineate their boundaries further.
Some of the sections are marked by a ∗ to indicate that they are of (sometimes
considerably) increased difficulty and/or abstraction. The content of these sections
is not used in the other parts of the notes, and thus you can freely skip over these.
The same convention is used for remarks, exercises, proofs, and so on. You may, of
course, read these parts as well, but only at your own risk. You have been warned.
1. Sets
Although it is possible to develop a precise notion of sets, I will not do so in
these notes. The reason is that such a rigorous introduction would be tedious and
difficult, but at the same time would only contribute little to the main subjects
of our studies in this course, namely linear mappings between vector spaces. This
section is intended to ensure that we have a common understanding about how we
can deal with sets. For a classical (and well written) introduction to set theory, I
refer to [Hal74]. I, in contrast, will rather loosely follow the exposition in the first
pages of [HS75].
Informally, a set is “any identifiable collection of objects of any sort.” The
objects contained in a set are called its elements or members. There are three main
possibilities to introduce a new set: First, we can simply list all of its elements
and, say, define a set X := {“Norwegian”, “blue”, “parrot”}. This is usually done
for small, finite sets, but we can use dots (. . .) for larger or even infinite sets; here
I assume that everybody understands what I mean by the sets {1, . . . , 1729} or
{3, 4, 5, . . .}. Second, we can define a set by a property of its elements. For instance,
we could define a set
Y := {x : x is a dairy product that can be bought in a cheese shop}.
Finally, we can build up a set from other sets using operations like union, intersec-
tion, or product of sets (see below).
The symbol ∈ indicates that an object is a member of a set; the symbol ̸∈
indicates that it is not. For instance, with the definition of X above, we have that
“parrot” ∈ X, but “Norwegian blue” ̸∈ X.
Definition 1.1 (Subset and proper subset). Assume that X and Y are sets.
We say that X is a subset of Y , denoted X ⊂ Y , if for every x ∈ X we have that
x ∈ Y ; otherwise we write X ̸⊂ Y . If X ⊂ Y and Y ⊂ X, then we write X = Y
(and the sets X and Y are equal); else X ̸= Y . If X ⊂ Y but X ̸= Y , then we say
that X is a proper subset of Y , denoted X ⊊ Y . ■
Remark 1.2. There is an alternative school of notation that uses the symbol ⊆
to denote subsets, and ⊂ to denote proper subsets. We do not follow this notation
in these notes. ■
Remark 1.3. If we want to show that two sets, say X and Y , are equal, we have
to show that they contain precisely the same elements. Thus, the basic approach
for a proof is, first to take an arbitrary element x ∈ X and prove that it is also
contained in Y , and then to take an arbitrary element y ∈ Y and prove that it is
also contained in X. ■
Definition 1.4 (Empty set). By ∅ we denote the empty set, which has no
elements. ■
Statements about the empty set always appear somewhat awkward, and it can
be difficult to formulate proofs, as the “standard start” of picking some element
x ∈ ∅ does not really work. Thus it can be helpful to rely on proofs by contradiction.
As an example, we prove the following (really trivial) statement:
Lemma 1.5. For every set X we have ∅ ⊂ X.
Proof. Let X be an arbitrary set, and assume to the contrary that ∅ ̸⊂ X.
Then there exists some y ∈ ∅ such that y ̸∈ X. Because of the definition of the
empty set, however, such a y cannot exist, and thus we have a contradiction. As a
consequence, the assumption ∅ ̸⊂ X is wrong, or, put differently, ∅ ⊂ X. □
Remark 1.6. In many cases, the elements of a set can be sets themselves. As
an example, a line is a set of points. Thus, if we are talking about the set of all lines
in the plane, we are in fact talking about a set consisting of sets. ■
Definition 1.8 (Power set). Let X be a set. By P(X) we denote the power
set of X, which consists of all subsets of X. That is,
S ∈ P(X) :⇐⇒ S ⊂ X.
■
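For instance, for the two-element set X = {a, b} we obtain P(X) = {∅, {a}, {b}, {a, b}}; in particular, ∅ ∈ P(X) and X ∈ P(X) for every set X.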
Remark 1.11. When we are working with sets, we do not care about the “or-
der” in which elements appear in a set, nor the “multiplicity”. Thus the sets
{“blue”, “parrot”}, {“parrot”, “blue”}, and {“blue”, “parrot”, “blue”} are all the
same (as they contain the same elements), even though they look different at first
glance. ■
elements that are themselves sets) and some property p(x) for elements in U , we
are allowed to define a set of the form
{x ∈ U : the property p(x) holds} ⊂ U.
However, the same construction is not allowed without some (explicit or implicit)
specification of the superset U .
Still, one sometimes wants to talk simultaneously about all different sets, or, in
this course rather, simultaneously about all possible real or complex vector spaces.
In order to do so, one can use the notion of a class instead of a set, which is (very
informally) a collection of sets defined by a common property, but is not a set
itself. Thus one can talk about the class of all sets or the class of all vector spaces.
Now, the contradiction we have observed above is no longer there: The class of all
sets that do not contain themselves as elements is a perfectly fine class. Moreover, it
trivially does not contain itself as an element, as it only contains sets, not (proper)
classes. ■
2. Functions
Informally speaking, a function f : X → Y from a set X to a set Y is an operation
that assigns to each element x ∈ X a unique element y := f (x) ∈ Y . Formally,
within the context of set theory, we can define a function as follows:
Definition 1.13 (Function). Let X and Y be sets. A function from X to Y ,
written f : X → Y , is a subset f ⊂ X × Y such that there exists for each x ∈ X
precisely one y ∈ Y with (x, y) ∈ f . Given x ∈ X, we denote by f (x) the unique
element in Y such that (x, f (x)) ∈ f . ■
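To make the set-theoretic definition concrete with a minimal example: for X = {1, 2} and Y = {a, b}, the subset f = {(1, a), (2, b)} ⊂ X × Y is a function from X to Y with f(1) = a and f(2) = b, whereas the subset {(1, a), (1, b)} is not a function, since the element 1 would be assigned two different values.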
Remark 1.14. In this course, we will not make use of the set theoretic definition
(and notation) of a function, but we will rather use the “standard” interpretation as
an operation. Feel therefore free to forget or ignore the set theoretic Definition 1.13.
■
Definition 1.15 (Domain and codomain). Assume that X and Y are sets and
that f : X → Y is a function. We say that X is the domain of f and Y the codomain
of f . ■
Remark 1.16. In this course, we will use the word function interchangeably
with mapping. Sometimes, we will also use the words transformation or operator,
though these will be mostly reserved for linear mappings. ■
Lemma 1.18 (Inverse). Assume that X and Y are sets and that f : X → Y is
a bijective function. Then there exists a unique function f −1 : Y → X, the inverse
of f , such that
f (f −1 (y)) = y for all y ∈ Y
and
f −1 (f (x)) = x for all x ∈ X.
Definition 1.21 (Restriction). Assume that X and Y are sets and that f : X →
Y is a function. Assume moreover that U ⊂ X is a subset. By f |U we denote the
restriction of f to U , defined as the function f |U : U → Y satisfying
f |U (x) := f (x) for all x ∈ U.
■
Definition 1.22 (Image and pre-image). Assume that X and Y are sets and
that f : X → Y is a function. If A ⊂ X is a subset of X we denote by
f (A) := {y ∈ Y : there exists x ∈ A with f (x) = y}
Proof. Exercise! □
In order to differentiate between sets and families, we will use curly brackets
{·} to enclose sets, but round brackets (·) to enclose families.
Remark 1.26. If the index set of a family is the set {1, 2, . . . , n} for some
n ∈ N, we rather speak of a tuple instead of a family. Also, we usually use in this
case either the notation (x1 , x2 , . . . , xn ) or (xi )i=1,...,n . ■
Example 1.27. In order to see the difference between a family and a set, let
us first consider the set
X := {x_k ∈ R : x_k = (−1)^k, k ∈ N} = {(−1)^k : k ∈ N}.
By defining the set X in that way, we might get the impression that X has infinitely
many elements, but this is in fact not the case: A set is defined only through the
distinct elements it contains, and it does not matter how many times the same
element appears in the definition; the notion of “multiplicity” of elements in a set
is not defined at all. Next, we could also write the set X as
X = {−1, +1, −1, +1, −1, . . .}.
Still, this might give a quite wrong impression about X as a set, as it looks like −1
would be the “first” element of X, then +1 would be the “second”, −1 would be
again the “third”, and so on. However, the order in which its elements appear in
the definition plays no role at all, and it does not make sense to talk about “the
first element” of a set. The set X simply is
X = {+1, −1} = {−1, +1}.
Now consider instead the family
Y := ((−1)k )k∈N = (−1, +1, −1, +1, . . .).
Here it is actually true to say that this family has infinitely many members, although
it, obviously, only has 2 distinct ones. Also, −1 is the first term, +1 is the second
term, and so on. ■
In the special case where the index set is the set of natural numbers, we usually
speak of a sequence instead of a family:
Definition 1.28 (Sequence). Assume that X is a set. A sequence in X is a
function x : N → X. Usually, we denote a sequence in the form (xi )i∈N , where
xi := x(i) are the terms of the sequence. ■
4. Real Numbers
There are different approaches to the definition of the real numbers, all of which
are somewhat complicated and not necessarily intuitive. Because of that, we will
abstain from providing a precise definition, but rather assume that everybody has
an intuitive understanding of what we mean by the real numbers. At some point
during the course (specifically, in Section 4), however, we have to make use of the
completeness of real numbers, a concept that we will introduce only then. In this
section, we will therefore introduce the main property of the real numbers that will
be required then.
As an example, −29 is a lower bound of the half-open interval (0, 1], whereas
1 is an example of an upper bound. We have for all subsets S ⊂ R that +∞ is an
upper bound of S and −∞ is a lower bound of S. For the particular case S = ∅,
every element of R ∪ {±∞} is simultaneously a lower bound and an upper bound.
Definition 1.33 (Infimum and supremum). Let S ⊂ R.
• We say that y ∈ R ∪ {±∞} is a supremum of S, if y is an upper bound,
and y ≤ z for all upper bounds z of S.
• We say that y ∈ R ∪ {±∞} is an infimum of S, if y is a lower bound of S
and y ≥ z for all lower bounds z of S.
■
Example 1.38. The infimum of the half-open interval [0, 1) is 0, and the su-
premum is 1. More generally, if I = (a, b) is any interval in R, then inf I = a and
sup I = b, and the same holds if we replace the open interval by the closed interval
or any half-open interval.
Moreover, we have that
inf R = −∞ and sup R = +∞,
whereas¹
inf ∅ = +∞ and sup ∅ = −∞.
■
¹ Do verify the next statement!
5*. Choice
Modern mathematics is built upon the foundation of axiomatic set theory.
However, this foundation is buried so deep under lots of familiar looking definitions
and theorems that you almost never will come in contact with it, unless you want
to specialise in set theory or in logic. There is one major exception, though, in the
form of a set theoretic axiom that consistently shows itself in unexpected places,
namely the axiom of choice. In this class, we will, for instance, meet this axiom
when we will discuss the existence of bases of arbitrary vector spaces.
In this section, I will present a brief overview over a naive approach to axiomatic
set theory up to and including the notorious axiom of choice. I will largely follow
here the basic approach of [Hal74], but will skip over almost all of the details; you
can find them there if you are interested. Let us thus start our journey into the
mathematical underworld with the fitting “lasciate ogne speranza, voi ch’intrate.”
Or, on a slightly more positive note, quoting Paul Halmos [Hal74, p. vi]: “[G]eneral
set theory is pretty trivial stuff really, but, if you want to be a mathematician, you
need some, and here it is; read it, absorb it, and forget it.”
In axiomatic approaches to set theory, one generally ignores the question of
what sets actually are, but rather focusses on what one can do with sets. Essentially,
one specifies operations that are allowed with sets and that result in other sets
being formed.
Axiom 1.40* (Axiom of Extension). Two sets are equal, if and only if they
contain the same elements.
This is just a rewording (and axiomatisation) of Definition 1.1.
Axiom 1.41* (Axiom of Specification). If Y is a set and p(x) is a statement
about the elements x ∈ Y , then there exists a set X ⊂ Y containing precisely the
elements x ∈ Y for which the statement p(x) is true.
In other words, the “object” {x ∈ Y : p(x) is true} is actually a set. Note here
that we explicitly restrict ourselves to subsets of a given set, thereby sidestepping
the issues discussed in Remark 1.12*.
Axiom 1.42* (Axiom of Pairing). If X and Y are sets, there exists a set Z
containing both X and Y .
Note here that the elements of the set Z are the sets X and Y , not the elements
of these sets. The latter is only taken care of in the next axiom:
Axiom 1.43* (Axiom of Unions). For every collection of sets, there exists a set
containing each element of each set in that collection.
Put (slightly) differently, we can form the union of arbitrarily many sets.
Axiom 1.44* (Axiom of Powers). For every set X, there exists a set that con-
tains all the subsets of X as elements.
In particular, this means that the power set P(X) of any set X is again a set.
Note that all axioms until now build on already existing sets, but we have not
yet stated that “sets” actually exist. We will finally do so in the next axiom:
Axiom 1.45* (Axiom of Infinity). The set N of natural numbers exists.
Remark 1.46*. Usually, this axiom is formulated quite a bit differently: One
starts by defining 0 := ∅, then defines 1 := {0} = {∅}, 2 := {0, 1} = {∅, {∅}},
3 := {0, 1, 2}, and so on. The axiom of infinity then is formulated as stating that
the empty set ∅ exists, as well as a set that contains all the sets 0, 1, 2, . . . ,
constructed in that manner.
In particular, we obtain from this construction that, viewed from a set theor-
etical perspective, the set of natural numbers is N = {0, 1, 2, 3, . . .}, starting at 0.
However, a guesstimated majority of mathematicians excludes 0 from belonging to
the natural numbers and defines N = {1, 2, 3, . . .} starting at 1. I personally belong
to the latter camp as a mathematician and to the former camp when programming,
but really do not bother too much. You can therefore assume that the set N does
not contain the number 0 whenever I use it as an index set for a sequence, although
it mostly should not matter. At the same time, I am not offended if you follow the
alternative convention and insist that 0 is natural.2 ■
2As a fine Norwegian compromise position, we could decide to take the average of the two
definitions and define N = {1/2, 3/2, 5/2, . . .}, though I do doubt that this definition would gain
much traction in practice.
3 This construction is somewhat roundabout and unintuitive, though, as we do not have the
notion of pairs at our disposal: If X and Y are sets, we can define the Cartesian product of X
and Y as the set of all sets of the form {{x}, {x, y}} with x ∈ X and y ∈ Y.
the union of this collection. The Cartesian product of the collection of sets is the
set ×_{i∈I} X_i defined as
×_{i∈I} X_i := {x : I → X : x_i ∈ X_i for all i ∈ I}.
Axiom 1.48* (Axiom of Choice). Assume that (X_i)_{i∈I} is a family of non-empty
sets X_i indexed by a non-empty index set I. Then the Cartesian product ×_{i∈I} X_i
is non-empty.
In view of the definition of the Cartesian product, we can reformulate the Axiom
of Choice in terms of functions as follows: If (X_i)_{i∈I} is a family of non-empty sets
X_i indexed by a non-empty index set I, there exists a function f : I → ∪_{i∈I} X_i
such that f(i) ∈ X_i for all i ∈ I. Such a function is called a choice function for
this family. Colloquially, we can say that we may (simultaneously) choose from any
collection of sets one element from each of the sets in that collection.
At first glance, this axiom is obviously true, and one might think that there
is nothing controversial about it: Of course, we can choose elements from sets.
However, the axiom does not put any limits on the number and form of sets from
which we choose, and it does not provide any concrete method for constructing a
choice function. For instance, we can consider the situation where the collection
of sets we are interested in is just the power set of R excluding ∅, that is, the set
of all the non-empty subsets of R. The Axiom of Choice then claims/postulates
that it is possible to select from each non-empty subset X ⊂ R an element x ∈ X.
For certain subsets, such a selection is easily possible: For bounded intervals, for
instance, we could simply take the midpoint. However, the assertion is that a selection
is possible no matter the shape of the subset of R.
We now look at two important equivalent reformulations of the Axiom of Choice
intended to illustrate the controversial nature of said axiom, namely the Well-
ordering Axiom and Zorn’s Lemma. For this, we first have to introduce the notion
of ordered sets.
Definition 1.49* (Partial order). Let X be a non-empty set. A partial order
on X is a subset R ⊂ X × X with the following properties:
• Reflexivity: For all x ∈ X we have that (x, x) ∈ R.
• Anti-symmetry: If (x, y) ∈ R and (y, x) ∈ R, then x = y.
• Transitivity: If (x, y) ∈ R and (y, z) ∈ R, then (x, z) ∈ R.
Given a partial order R on X, we usually indicate the relation (x, y) ∈ R by x ≤R y.
A set X together with a partial order on X is called a partially ordered set. ■
Definition 1.50* (Total order). Let X be a non-empty set and R be a partial
order on X. We say that R is a total order, if for all x, y ∈ X, at least one of the
relations x ≤R y or y ≤R x holds. A set X together with a total order on X is
called a totally ordered set. ■
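By way of comparison (a standard illustration): the usual order ≤ on R is a total order, since any two real numbers are comparable. In contrast, the divisibility relation on N, where m ≤_R n means that m divides n, is a partial order (it is reflexive, anti-symmetric, and transitive) that is not total: for instance, neither 2 divides 3 nor 3 divides 2.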
Example 1.52*. Let Y be any set and let X := P(Y ) be the power set of Y
consisting of all subsets of Y . On X we can define a partial order by (S, T ) ∈ R if
S ⊂ T . In general, this is not a total order.4 ■
Remark 1.54*. It is easy to see (in view of the anti-symmetry of orders) that
smallest elements are necessarily unique, if they exist. Thus we may as well speak of
the smallest element of a set S instead of a smallest element, once we have assured
that such an element actually exists. In contrast, minimal and maximal elements
need not be unique. ■
Example 1.57*. The set N of natural numbers with the standard order is well-
ordered, as every non-empty subset of positive integers has a smallest element. The
set Z of integers is not well-ordered; for instance the whole set Z has no smallest
element, as it is unbounded below. The set R≥0 of non-negative real numbers is not
well ordered: For instance, the open interval (1, 2) contains no smallest element. ■
Theorem 1.58* (Well-ordering Theorem). Every non-empty set can be well-
ordered.
Proof. See [Hal74, Sec. 17]. □
Remark 1.59*. This theorem of course does not state that every non-empty
set X with a total order R is already well-ordered, but only that we can find another
total order S on X that is a well-ordering. This order S may have nothing to do
with R and need not be compatible with any other structure on X. ■
Definition 1.60*. Assume that X is a partially ordered set with partial order
R. A chain in X is a subset Y of X such that the restriction of R to Y defines a
total order on Y . ■
CHAPTER 2
Linear Algebra
Remark 2.3. In order to simplify the notation, we will in the following denote by K either the
set R of real numbers or the set C of complex numbers. There will be some situations
later on, when we will have to distinguish
between real and complex vector spaces, but many of the simpler results are the
same in both cases.1 ■
Example 2.4. The following are some standard examples of vector spaces that
will be used throughout the course. It is left to the reader as an exercise2 to verify
that all the requirements for a vector space (that is, Items (1)–(6) in Definition 2.1)
are indeed satisfied in each example.
1The letter K here stands for field, which in Norwegian is kropp (and in German Körper ).
This is the most general algebraic structure for which it makes sense to define vector spaces. For
more information, please see the course TMA4150 – Algebra.
2 These “exercises for the reader” can actually provide you with suitable training for the exam!
They are not here exclusively because I was too lazy to formulate a complete proof.
Example 2.5 (Function spaces). Assume that S is an arbitrary set and denote by
K^S := {f : S → K}
the set of all functions from S to K. With pointwise addition and scalar multiplication,
this set becomes a vector space.
More generally, let U be a vector space over K and let S be an arbitrary set. We
denote by
U^S := {f : S → U}
the set of all functions from S to U, which again is a vector space with the pointwise
operations. ■
Remark 2.6. The usage of + and · for the addition and scalar multiplication
in Definition 2.1 is intended to be suggestive and follows the standard notation of
these operations. However, it somewhat hides how restrictive these assumptions
actually are.
A more formal (but much less intuitive) way would be to define a vector space
as a set V together with two functions f : V × V → V and g : K × V → V that
satisfy Items (1)–(6). With this notation, commutativity and parts of associativity
and distributivity, for instance, would read as
Commutativity: f(u, v) = f(v, u),
Associativity: f(f(u, v), w) = f(u, f(v, w)),
Distributivity: g(λ + µ, v) = f(g(λ, v), g(µ, v)),
respectively. ■
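To spell this out in a minimal concrete case (V = K^2, used purely as an illustration): the two functions are f((u1, u2), (v1, v2)) = (u1 + v1, u2 + v2) and g(λ, (v1, v2)) = (λv1, λv2), and the identities above then reduce to the familiar componentwise rules of vector addition and scaling.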
1.2. General Properties. In the following, we will prove some general results
concerning vector spaces that follow directly from the definition of a vector space.
Amongst other things, we will show that the additive identity and the additive inverse are
unique, although this seems not to be required in the definition. This is obviously
true in all the examples provided above. What these results show, however, is that
it is impossible to construct a vector space with two different 0-elements. If one
would need such an object for whatever theoretical or practical reason, one would
need to search for it outside the realm of vector spaces.
This section is mainly intended to demonstrate by means of simple examples
what elementary algebraic proofs look like. Feel free to skip over this part if you are
already familiar with this type of proof.
Lemma 2.7. Assume that V is a vector space. Then the additive identity 0 ∈ V
is unique. That is, there exists no vector v ∈ V with v ̸= 0 such that u + v = u for
all u ∈ V .
Proof. Assume that v ∈ V satisfies u + v = u for all u ∈ V. With u = 0 we
have in particular that
0 = 0 + v = v + 0 = v,
where the second equality uses Item (1) and the third uses Item (3) of Definition 2.1.
Thus v = 0, which proves the uniqueness. □
Lemma 2.8. Assume that V is a vector space and that u ∈ V . Then there exists
a unique vector v ∈ V such that u + v = 0.
Proof. Since V is a vector space, we already know that such a vector v exists.
Thus it remains to show that v is unique.
Assume therefore that v1 , v2 ∈ V satisfy u + v1 = 0 and u + v2 = 0. Then
v1 = v1 + 0 = v1 + (u + v2) = (v1 + u) + v2 = (u + v1) + v2 = 0 + v2 = v2 + 0 = v2,
where we have used the defining properties of a vector space (Items (1)–(3) of
Definition 2.1) in the intermediate steps. Thus v1 = v2, which proves the uniqueness. □
This shows that (−1) · v is an additive inverse for v. Now the identity (−1) · v = −v
follows from the uniqueness of the additive inverse. □
Exercise 2.1. Given a matrix A ∈ Mat_{m,n}(C), we define its Hermitian con-
jugate as the transpose of the componentwise complex conjugate of A, that is,
A^H := (Ā)^T ∈ Mat_{n,m}(C).
Show that the mapping T : Mat_{m,n}(C) → Mat_{n,m}(C),
T A := A^H
is not linear. ■
Proposition 2.18. Assume that U , V , and W are vector spaces over K, and
that T : U → V and S : V → W are linear transformations. Then the composition
S ◦ T : U → W is a linear transformation as well.
Proof. Let u, v ∈ U, and λ, µ ∈ K. Then
(S ◦ T)(λu + µv) = S(T(λu + µv)) = S(λT u + µT v)
= λ S(T u) + µ S(T v) = λ (S ◦ T)(u) + µ (S ◦ T)(v),
where the second equality uses the linearity of T and the third the linearity of S.
□
1.4. Subspaces. We will now introduce the notion of subspaces of vector
spaces. Roughly speaking, a subspace of a vector space is a subset which is a vector
space itself when using the same addition and scalar multiplication.
Definition 2.19 (Subspace). Assume that V is a vector space over K and
U ⊂ V a subset of V. We say that U is a subspace of V, if U is in itself a vector
space over K with the same addition and scalar multiplication as on V.
That is, if +V and ·V denote addition and scalar multiplication on V , and +U
and ·U denote addition and scalar multiplication on U ⊂ V , then u +V v = u +U v
for all u, v ∈ U , and λ ·V u = λ ·U u for all λ ∈ K and u ∈ U . ■
u ∈ U . Then Items (2) and (3) imply that 0 = u + (−1) · u ∈ U , and thus (1)
holds. □
Example 2.21. Consider the following examples of vector spaces and subsets,
which in some cases are subspaces and in others are not:3
(1) For every vector space V, the sets {0} and V are subspaces of V.
(2) The set U := {(x1, x2) ∈ R^2 : x2 = −x1} is a subspace of R^2.
(3) Let V = K^n and U := K^{n−1} × {0} = {(x_1, . . . , x_{n−1}, x_n) ∈ K^n : x_n = 0}.
Then U is a subspace of V.
(4) For every n ∈ N, the sets Sym_n(K) := {A ∈ Mat_n(K) : A^T = A} of
symmetric n × n-matrices and Skew_n(K) := {A ∈ Mat_n(K) : A^T = −A}
of skew-symmetric n × n-matrices are subspaces of Mat_n(K).
(5) For every n ∈ N, the set P_n(K) of polynomials of degree at most n is a
subspace of the vector space P(K) of all polynomials.
(6) The set C(K) of continuous K-valued functions f : R → K is a subspace
of the space K^R of all K-valued functions on R.
(7) Denote by V := {f ∈ C(K) : f(7) = 0}. Then V is a subspace of C(K).
(8) Let V = K^n and U := K^{n−1} × {1} = {(x_1, . . . , x_{n−1}, x_n) ∈ K^n : x_n = 1}.
Then U is not a subspace of V.
(9) Let U := {(x1, x2) ∈ K^2 : x1 = 0 or x2 = 0}. Then U is not a subspace
of K^2.
(10) Let n ∈ N, n ≥ 2, and U := {A ∈ Mat_n(K) : det A = 0} the set of
singular matrices. Then U is not a subspace of Mat_n(K).
■
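For instance, to see that the set U in item (9) is not a subspace of K^2: the vectors (1, 0) and (0, 1) both belong to U, but their sum (1, 0) + (0, 1) = (1, 1) has no zero component and therefore does not belong to U, so U is not closed under addition.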
Put differently, the kernel of T is the set of all solutions of the equation T u = 0;
the range of T is the set of all right hand sides v for which the equation T u = v
has a solution.
Lemma 2.23. The range and the kernel of a linear transformation are subspaces
of the respective vector spaces.
Proof. Exercise! □
3As an exercise, you can verify in each of the examples whether this is actually true.
• The range ran(T ) is the set of all matrices B that can be written in the
form B = A + AT for some matrix A. It is easy to see that this is the case
if and only if B is symmetric.
Thus we have that
ker(T ) = Skewn (K) and ran(T ) = Symn (K).
■
Example 2.25. Consider the mapping D : P(K) → P(K), p ↦ Dp := p′. Then
ker(D) = {p ∈ P(K) : p′ = 0} = {p ∈ P(K) : p is constant}
and
ran(D) = P(K).
■
Remark 2.29. It is useful to be able to talk about the span of the empty family
∅, mainly in order to avoid having to deal with annoying special cases in various
theoretical results. For that, we simply define
span(∅) := {0} ⊂ V.
■
Proposition 2.30. Assume that V is a vector space and S = (vi )i∈I is a family
in V . Then span(S) is the smallest subspace of V containing all members of S.
That is, span(S) is a subspace of V , and if U ⊂ V is any other subspace of V with
S ⊂ U , then also span(S) ⊂ U .
Proof. We show first that span(S) is actually a subspace of V . To that end,
we note first that, trivially, 0 ∈ span(S). Next assume that u, w ∈ span(S). Then we can
write
u = ∑_{j=1}^{N} λ_j v_{i_j}   and   w = ∑_{j=N+1}^{N+M} λ_j v_{i_j}
for some N, M ∈ N, indices i_1, . . . , i_{N+M} ∈ I, and scalars λ_1, . . . , λ_{N+M} ∈ K. As
a consequence,
u + w = ∑_{j=1}^{N+M} λ_j v_{i_j} ∈ span(S).
which shows that also αu ∈ span(S). According to Lemma 2.20, this shows that
span(S) is a subspace of V .
Now assume that U ⊂ V is any subspace of V containing S. Let moreover
u ∈ span(S). We have to show that also u ∈ U .
Because u ∈ span(S), we can write
u = ∑_{j=1}^{N} λ_j v_{i_j}
Remark 2.33. We have formulated the notions of linear dependence and in-
dependence for families, but it is straightforward to formulate them for sets: A set
S ⊂ V is linearly independent, if for each finite subset W ⊂ S and each family
(λ_w)_{w∈W} ⊂ K of scalars such that ∑_{w∈W} λ_w w = 0 we have that λ_w = 0 for all
w ∈ W.
There is, however, a difference as to how sets and families handle “duplicate”
members, which can lead to an at first glance counterintuitive result about linear
independence of sets: It is true to say that a family S = (vi )i∈I is linearly dependent,
if it contains the same element twice, that is, if there exist i ̸= j ∈ I such that
vi = vj . In particular, if V is a vector space and v ∈ V \ {0} is any non-zero vector
in V , then the family (v, v) is linearly dependent. However, the set {v, v} is linearly
independent, as, actually, {v, v} = {v} (as a set). Because of this (and because of
the possibility of actually accessing different members by means of their indices) we
will most of the time employ families when talking about linear independence. ■
Proposition 2.34. Assume that V is a vector space and S = (vi )i∈I is a family
in V . Then S is linearly independent, if and only if every vector u ∈ span(S) can
be written in a unique way as a finite linear combination of members of S.
More precisely, S is linearly independent if and only if the following holds: If
u ∈ span(S), then there exists a unique finite subset J ⊂ I and a unique family
(λ_j)_{j∈J} ⊂ K \ {0} of non-zero coefficients, such that u = ∑_{j∈J} λ_j v_j. Here we define
the empty sum to be equal to 0.
Proof. Assume first that S is linearly independent, and let u ∈ span(S).
From the definition of span(S), it follows that u can be written as a finite linear
combination of members of S. Thus we only have to show that this finite linear
combination is unique. Assume therefore that we can write
u = ∑_{j∈J} λ_j v_j = ∑_{k∈K} µ_k v_k
for finite subsets J, K ⊂ I, and families (λ_j)_{j∈J} ⊂ K\{0} and (µ_k)_{k∈K} ⊂ K\{0} of
non-zero coefficients. Define now L := J ∪ K, and define the family (ν_ℓ)_{ℓ∈L} by setting
ν_ℓ := λ_ℓ − µ_ℓ if ℓ ∈ J ∩ K,   ν_ℓ := λ_ℓ if ℓ ∈ J \ K,   ν_ℓ := −µ_ℓ if ℓ ∈ K \ J.
Then
∑_{ℓ∈L} ν_ℓ v_ℓ = ∑_{ℓ∈J} λ_ℓ v_ℓ − ∑_{ℓ∈K} µ_ℓ v_ℓ = u − u = 0.
Now the linear independence of S implies that νℓ = 0 for all ℓ ∈ L. In particular,
λℓ = µℓ for all ℓ ∈ J ∩ K. It remains to show that J = K, that is, that J \ K = ∅
Example 2.35. Consider the space P(K) of all polynomials with coefficients
in K. Let moreover M := (1, x, x^2, . . .) be the family of all monomials. Then M is
linearly independent. ■
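A brief sketch of why this is the case (for K = R or C): if a finite linear combination λ_0 · 1 + λ_1 x + . . . + λ_n x^n is the zero polynomial, then it vanishes for every x ∈ K; since a non-zero polynomial of degree at most n has at most n roots, whereas K has infinitely many elements, all coefficients λ_0, . . . , λ_n must be zero.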
Example 2.38. The set M of monomials is a basis of the vector space of all
polynomials. ■
which implies that −λv v ∈ span(B). Since by assumption v ̸∈ span(B), this further
implies that λv = 0. As a consequence, we have that
0 = ∑_{u∈J\{v}} λ_u u.
The existence of a basis of every vector space is, for pure mathematicians, an
extremely interesting result. For practical applications, however, it turns out to
be largely useless in that generality, as one can actually show that there cannot
be a universally applicable method for actually constructing a basis. (In fact, it is
possible to show that the existence of bases of aribtrary vector spaces is equivalent to
the Axiom of Choice.) Because of that, we will turn instead to finite dimensional
vector spaces, where the situation appears to be much simpler. This, however,
requires us first to define the notion of the dimension of a vector space.
Definition 2.40. A vector space V is called finite dimensional, if V has a
finite basis. Else, V is called infinite dimensional. ■
Theorem 2.41. Assume that V is finite dimensional. Then all bases of V are
finite and contain the same number of members.
Definition 2.42 (Dimension). Assume that V is finite dimensional. The num-
ber of members of any (and therefore every) basis of V is called the dimension of V
and denoted dim V. ■
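As a standard illustration: the space K^n has the basis (e_1, . . . , e_n) consisting of the canonical unit vectors, so dim K^n = n; similarly, the space P_n(K) of polynomials of degree at most n has the basis (1, x, . . . , x^n), and thus dim P_n(K) = n + 1.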
The proof of Theorem 2.41 is surprisingly tricky and relies on another important
result, namely the Steinitz Exchange Lemma 2.43*, which we formulate (and prove)
below. Feel therefore free to skip over the proof.
which again is a finite linear combination of members of B̂. Thus it follows that
v ∈ span(B̂).
We now show that B̂ is linearly independent. Assume therefore that
L ⊂ I is finite and that µ_ℓ ∈ K, ℓ ∈ L, are such that
∑_{ℓ∈L} µ_ℓ v̂_ℓ = 0.
with µ_ℓ ≠ 0 for all ℓ ∈ L′. Now, if i_{m+1} ∉ L′, then v̂_ℓ = ṽ_ℓ for all ℓ ∈ L′ and thus
the linear independence of B̃ implies that µ_ℓ = 0 for all ℓ ∈ L′, which implies that
L′ = ∅. Assume therefore that i_{m+1} ∈ L′. Then we can write
0 = (µ_{i_{m+1}}/λ_{i_{m+1}}) (u_{m+1} − ∑_{j∈J\{i_{m+1}}} λ_j ṽ_j) + ∑_{ℓ∈L′\{i_{m+1}}} µ_ℓ ṽ_ℓ.
As a consequence,
(1)   u_{m+1} = ∑_{j∈J\{i_{m+1}}} λ_j ṽ_j − (λ_{i_{m+1}}/µ_{i_{m+1}}) ∑_{ℓ∈L′\{i_{m+1}}} µ_ℓ ṽ_ℓ.
and that λ_{i_{m+1}} ≠ 0. Since B̃ is a basis of V and the basis representation of every
vector v ∈ V is unique, it follows that the vector ṽ_{i_{m+1}} must also appear (with
a non-zero coefficient) in the representation (1), which is obviously not the case.
Thus i_{m+1} ∉ L′, which in turn shows that L′ = ∅. Thus the set B̂ is linearly
independent, which concludes the proof. □
T x^2, where we have the coefficients (0, 2, 0). In total, we obtain the matrix
        ( 0  1  0  0 )
    A = ( 0  0  2  0 ).
        ( 0  0  0  3 )
■
Proposition 2.46. Let U and V be vector spaces over K. Then their direct
sum U ⊕ V is again a vector space over K.
Proof. Exercise! □
More generally, we can consider several (but finitely many) vector spaces U1, . . . , Un
and define their direct sum U1 ⊕ U2 ⊕ · · · ⊕ Un as the product U1 × U2 × · · · × Un
together with the componentwise addition
(u1 , u2 , . . . , un ) + (v1 , v2 , . . . , vn ) := (u1 + v1 , u2 + v2 , . . . , un + vn )
and componentwise scalar multiplication
λ · (u1 , u2 , . . . , un ) := (λu1 , λu2 , . . . , λun ).
Example 2.47. The direct sum of m copies of K is the space Km . The direct
sum of n copies of the space Km is the space Matm,n (K). ■
Example 2.49. In the previous examples, we only considered the case where all the
involved spaces in the direct sum were identical. However, this need not be the
case in general. A simple, but important, example here is the case where U = Kn
and V = Km for some n, m ∈ N. For this case, we obtain that
Kn ⊕ Km = Kn+m .
■
Lemma 2.51. Assume that U1 ,. . . ,Un are vector spaces and 1 ≤ k ≤ n. The
k-th inclusion ık : Uk → U1 ⊕· · ·⊕Un and the k-th projection πk : U1 ⊕· · ·⊕Un → Uk
are linear transformations.
Proof. Exercise! □
Assume now that U1 , U2 , and V1 , V2 are vector spaces and that T : U1 ⊕ U2 →
V1 ⊕ V2 is a linear transformation. Moreover, let (u1 , u2 ) ∈ U1 ⊕ U2 be arbitrary
(that is, let u1 ∈ U1 and u2 ∈ U2 be arbitrary). Then the linearity of T implies
that
T (u1, u2) = T ((u1, 0) + (0, u2)) = T (u1, 0) + T (0, u2).
Now both T (u1 , 0) and T (0, u2 ) are vectors in V1 ⊕ V2 , and thus we can decompose
both of these vectors into parts
T (u1, 0) =: (T11 u1, T21 u1) ∈ V1 ⊕ V2 and T (0, u2) =: (T12 u2, T22 u2) ∈ V1 ⊕ V2.
Here T11 is the part of T that maps U1 to V1 ; then T21 is the part of T that maps
U1 to V2 ; next, T12 is the part of T that maps U2 to V1 ; finally, T22 is the part of T
that maps U2 to V2 . Thus we can write the transformation T in “(block)-matrix-
vector-form” as
      ( u1 )   ( T11  T12 ) ( u1 )   ( T11 u1 + T12 u2 )
    T (    ) = (           ) (    ) = (                  ).
      ( u2 )   ( T21  T22 ) ( u2 )   ( T21 u1 + T22 u2 )
The block Tij can be expressed explicitly in terms of a composition of the
inclusion ıj : Uj → U1 ⊕ U2 , the transformation T , and the projection πi : V1 ⊕ V2 →
Vi as
(6) Tij = πi ◦ T ◦ ıj : Uj → Vi .
More generally, if U1, . . . , Un and V1, . . . , Vm are vector spaces and T : U1 ⊕ · · · ⊕
Un → V1 ⊕ · · · ⊕ Vm is linear, we can write the operator T in the form
          ( u1 )   ( T11 ... T1n ) ( u1 )   ( T11 u1 + ... + T1n un )
  (7)   T (  ⋮ ) = (  ⋮   ⋱   ⋮  ) (  ⋮ ) = (           ⋮            ),
          ( un )   ( Tm1 ... Tmn ) ( un )   ( Tm1 u1 + ... + Tmn un )
where the blocks Tij : Uj → Vi are again defined as in (6).
Conversely, if Tij : Uj → Vi are arbitrary linear mappings, we can define a
mapping T : U1 ⊕ · · · ⊕ Un → V1 ⊕ · · · ⊕ Vm using the formula (7). Thus providing
the whole mapping T is equivalent to providing the blocks Tij .
Remark 2.52. What we just did looks pretty much like matrix–vector
calculus, and up to a point it really is: The case of linear mappings from Kn
to Km written as matrices A ∈ Km×n can be regarded as a special case, since
we can interpret Kn and Km as the n-fold and m-fold direct sum of copies of K,
respectively. ■
We now consider the special case where U1 ,. . . ,Un are vector spaces and we are
given linear transformations Ti : Ui → Ui for 1 ≤ i ≤ n. Denote U := U1 ⊕ · · · ⊕ Un .
Then we can define a block-diagonal transformation Diag(T1 , . . . , Tn ) : U → U by
setting
                          ( T1  0   ...  0  )
                          ( 0   T2  ...  0  )
  Diag(T1, . . . , Tn) := ( ⋮   ⋮   ⋱   ⋮  )
                          ( 0   0   ...  Tn )
with the notation from above. Note here that a 0 on the ij-th position is actually
the 0-operator from Uj to Ui , which maps every element u ∈ Uj to the 0-vector in
Ui .
In a sense, block-diagonal linear transformations are the simplest possible linear
transformations, both from a numerical and an analytical point of view, as all the
information about the transformation is already contained in the blocks T1 ,. . . ,Tn .
For instance, in order to represent the whole transformation, it is only necessary
to store (or implement) these n blocks; all the off-diagonal 0-blocks can simply
be ignored. Also, it is for instance straight-forward to show that a block-diagonal
transformation is invertible, if and only if each of the blocks is invertible; in this
case, the inverse is again block-diagonal and the blocks are the inverses Ti−1 of the
transformations Ti .
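For instance, in the case of two blocks (a small check of the last claim): Diag(T1, T2) maps (u1, u2) to (T1 u1, T2 u2), so if T1 and T2 are invertible, then Diag(T1^{-1}, T2^{-1}) Diag(T1, T2)(u1, u2) = (T1^{-1} T1 u1, T2^{-1} T2 u2) = (u1, u2), and similarly in the other order; hence Diag(T1, T2)^{-1} = Diag(T1^{-1}, T2^{-1}).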
Remark 2.53* (Infinite direct sums). In the discussion above, we have only
considered direct sums of finitely many vector spaces. It is also possible, and some-
times necessary, to look at direct sums of infinitely many vector spaces. However,
the construction of these infinite direct sums is slightly more complicated than that
of finite direct sums.
Let us therefore assume that we are given a family (Ui )i∈I of vector spaces for
some non-empty (and possibly infinite) index set I. Then we can define the direct sum of
this family as
⊕_{i∈I} U_i := { (u_i)_{i∈I} ∈ ×_{i∈I} U_i : there exists a finite set J ⊂ I with u_i = 0 for i ∉ J }.
That is, we take all the elements in the Cartesian product of the vector space for
which all but finitely many components are 0. As in the finite case, we define the
addition and the scalar multiplication on this set componentwise.
All the results we indicated above concerning the interplay between linear trans-
formations and direct sums still hold in this setting: If we are given two families of
vector spaces (U_j)_{j∈J} and (V_i)_{i∈I} and for each pair (i, j) ∈ I × J a linear transform-
ation T_{ij} : U_j → V_i, we can define a linear transformation T : ⊕_{j∈J} U_j → ⊕_{i∈I} V_i
by
(8)   T (u_j)_{j∈J} := ( ∑_{j∈J} T_{ij} u_j )_{i∈I}.
Note here that the sum ∑_{j∈J} T_{ij} u_j is actually a finite sum for each i ∈ I, as only
finitely many components of (u_j)_{j∈J} ∈ ⊕_{j∈J} U_j are non-zero. Conversely, given a
linear transformation T : ⊕_{j∈J} U_j → ⊕_{i∈I} V_i, we can define T_{ij} := π_i ◦ T ◦ ı_j : U_j → V_i.
Then the equation (8) holds for all (u_j)_{j∈J} ∈ ⊕_{j∈J} U_j. ■
4.2. Internal Direct Sums. In the previous section, we have seen how we
can use the idea of a direct sum to combine different vector spaces to a larger space.
Moreover, we have discussed the connection between a linear transformation T of
the larger space and the mappings Tij between the component spaces. Of particular
interest is the case where the mappings Tij are 0 whenever i ̸= j, and thus T has
a block-diagonal form.
The main goal of this chapter is to investigate whether we can go the opposite
direction as well. That is, we start with an arbitrary vector space U and a linear
transformation T : U → U . Is it then possible to decompose U as a direct sum
U = U1 ⊕ · · · ⊕ Un such that the mapping T has block-diagonal form w.r.t. this
decomposition? Also, to which extent are the spaces Uj unique?
As a first step, we define sums of arbitrary subsets of a vector space.
Definition 2.54. Assume that V is a vector space and that S1 , . . . , SN are
subsets of V . We define
S1 + . . . + SN := {v ∈ V : there exist ui ∈ Si, i = 1, . . . , N, with v = u1 + . . . + uN}.
■
In the case of a sum of two sets, there is a simpler way for checking whether a
sum is direct or not, as the following result shows.
Lemma 2.59. Let V be a vector space and let U1 , U2 ⊂ V be subspaces of V .
The sum U1 + U2 is direct, if and only if U1 ∩ U2 = {0}.
Proof. Assume first that U1 ∩ U2 = {0}, and assume that w ∈ U1 + U2 can
be written as
w = u 1 + u 2 = v 1 + v2
with u1 , v1 ∈ U1 , and u2 , v2 ∈ U2 . Since U1 , U2 are subspaces of V , it follows
that u1 − v1 ∈ U1 and u2 − v2 ∈ U2 , and we also have that −(u1 − v1 ) ∈ U1 and
−(u2 − v2 ) ∈ U2 . However, we also have that
0 = w − w = (u1 − v1 ) + (u2 − v2 ),
and therefore
u1 − v1 = −(u2 − v2 ).
This shows that u1 − v1 = −(u2 − v2 ) ∈ U1 ∩ U2 = {0}, and thus u1 − v1 =
−(u2 − v2 ) = 0, or u1 = v1 and u2 = v2 . Thus the sum U1 + U2 is direct.
Now assume that the sum U1 + U2 is direct, and let v ∈ U1 ∩ U2 . Then we can
write
v =v+0=0+v
in two ways as sum of one vector in U1 and one vector in U2 . Since the sum is
direct, this is only possible for v = 0, which shows that U1 ∩ U2 = {0}. □
Remark 2.60. The result from Lemma 2.59 does not hold in the same/similar
form for sums of more than two subspaces. That is, if we have three subspaces U1 ,
U2 , U3 and we have that all pairwise intersections are trivial, that is, U1 ∩ U2 =
U1 ∩ U3 = U2 ∩ U3 = {0}, then we cannot conclude that the sum U1 + U2 + U3
is direct. As a simple counterexample, consider the spaces U1 = span{(1, 0)},
U2 = span{(0, 1)}, U3 = span{(1, 1)} of K2 . Their pairwise intersection is trivial,
but their sum is not direct: The vector (1, 1), for instance, is an element of U3 , but
it can also be written in the form (1, 1) = (1, 0) + (0, 1) as a sum of vectors in U1
and U2 . ■
and
U2 = {(x, y, z) ∈ R3 : x + y + z = 0}.
We will show
R3 = U1 ⊕ U2.
We start by showing that U1 ∩ U2 = {0}. To that end let (x, y, z) ∈ U1 ∩ U2 .
Because (x, y, z) ∈ U1 , it follows that y = z = 0. Now the fact that (x, y, z) =
(x, 0, 0) ∈ U2 implies that x = 0 as well. Thus (x, y, z) = (0, 0, 0).
Next we need to show that R3 = U1 + U2. Assume therefore that (x, y, z) ∈ R3.
Then we can write
(9)   (x, y, z) = (x + y + z, 0, 0) + (−y − z, y, z).
Since (x + y + z, 0, 0) ∈ U1 and (−y − z, y, z) ∈ U2 , it follows that we can write
an arbitrary vector in R3 as a sum of a vector in U1 and a vector in U2 . Thus
R3 = U1 + U2 . Since U1 ∩ U2 = {0}, it follows that R3 = U1 ⊕ U2 . ■
Proposition 2.63. Assume that V is a vector space and that U1 and U2 are
subspaces such that V = U1 ⊕ U2 . Then the projections π1 and π2 onto U1 along
U2 and onto U2 along U1 , respectively, are linear, and we have that
v = π1 (v) + π2 (v)
for all v ∈ V .
Proof. The decomposition v = π1 (v) + π2 (v) follows immediately from the
definition. Assume now that v, w ∈ V . Then we can uniquely write v = u1 + u2
and w = z1 + z2 with u1 , z1 ∈ U1 , and u2 , z2 ∈ U2 . In particular, we have that
u1 = π1 (v), u2 = π2 (v), z1 = π1 (w), and z2 = π2 (w). As a consequence, we can
write
v + w = u1 + u2 + z1 + z2 = (u1 + z1 ) + (u2 + z2 ).
Since U1 and U2 are subspaces, it follows that u1 + z1 ∈ U1 and u2 + z2 ∈ U2 . Thus
the definition of π1 implies that
π1 (v + w) = u1 + z1 = π1 (v) + π1 (w),
π2 (v + w) = u2 + z2 = π2 (v) + π2 (w).
Next assume that v ∈ V and λ ∈ K. Again, we can write v = u1 + u2 with
u1 ∈ U1 and u2 ∈ U2 , and thus u1 = π1 (v) and u2 = π2 (v). Because U1 and U2
are subspaces, we have that λu1 ∈ U1 and λu2 ∈ U2 . Thus we can decompose
λv = λu1 + λu2 with λu1 ∈ U1 and λu2 ∈ U2 . This implies that
π1 (λv) = λu1 = λπ1 (v) and π2 (λv) = λu2 = λπ2 (v).
This proves the linearity of π1 and π2 . □
Example 2.64. We continue with Example 2.61 and consider again the two
subspaces
U1 = {(x, y, z) ∈ R3 : y = z = 0}
and
U2 = {(x, y, z) ∈ R3 : x + y + z = 0}.
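From the decomposition (9), the two projections in this example can be written down explicitly: π1(x, y, z) = (x + y + z, 0, 0) and π2(x, y, z) = (−y − z, y, z). Indeed, π1(x, y, z) ∈ U1, π2(x, y, z) ∈ U2, and π1(x, y, z) + π2(x, y, z) = (x, y, z), so by the uniqueness of the decomposition these are precisely the projections onto U1 along U2 and onto U2 along U1.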
We now generalise the notion of an oblique projection to the case where the
vector space V is the direct sum of more than two subspaces.
Definition 2.65 (Projection). Assume that V is a vector space and that
U1 ,. . . ,UN are subspaces such that V = U1 ⊕ · · · ⊕ UN . The projections πi : V → V ,
i = 1, . . . , N , are defined as the unique mappings such that πi (v) ∈ Ui for each
i = 1, . . . , N and v ∈ V , and
v = π1 (v) + . . . + πN (v)
for each v ∈ V . ■
Proposition 2.66. Assume that V is a vector space and that U1 ,. . . ,UN are
subspaces such that V = U1 ⊕ · · · ⊕ UN . The projections πi : V → V , i = 1, . . . , N ,
defined in Definition 2.65 are linear transformations.
Proof. This is essentially the same as the proof of Proposition 2.63. □
Proposition 2.67. Assume that V is a vector space and that U1 ,. . . ,UN are
subspaces such that V = U1 ⊕ · · · ⊕ UN . Assume moreover that Bi is a basis of Ui ,
i = 1, . . . , N . Then B := B1 ∪ . . . ∪ BN is a basis of V .
Proof. For i = 1, . . . , N we can write Bi = (vj )j∈Ji for some index set Ji .
Here we may assume without loss of generality that the index sets are pairwise
disjoint, that is, that Ji ∩ Jj = ∅ for i ̸= j. Denote moreover J := J1 ∪ . . . ∪ JN .
Assume now that v ∈ V is an arbitrary vector. Then there exist unique vectors
ui ∈ Ui , i = 1, . . . , N , such that
v = u1 + . . . + uN .
Moreover, for each i = 1, . . . , N there exists a unique finite subset K_i ⊂ J_i and
unique non-zero scalars λ_j ∈ K \ {0}, j ∈ K_i, such that
u_i = ∑_{j∈K_i} λ_j v_j.
Define for i = 1, . . . , N
L_i := L ∩ J_i and w_i := ∑_{ℓ∈L_i} µ_ℓ v_ℓ.
Then w_i ∈ U_i for each i and v = ∑_{i=1}^{N} w_i. Thus the assumption that the sum of
the vector spaces U_i is direct implies that w_i = u_i for each i. Next, the assumption
that B_i is a basis of U_i and the fact that
∑_{j∈K_i} λ_j v_j = u_i = w_i = ∑_{ℓ∈L_i} µ_ℓ v_ℓ
since B ∪ C is a basis of V .
It remains to show that the sum is direct. To that end, assume that v ∈ U ∩ W .
Since (uk )k∈I∪J is a basis of V , there exists a unique way of writing v in the form
v = Σ_{ℓ=1}^N λℓ u_{iℓ} + Σ_{m=1}^M µm v_{jm}
Exercise 2.2. Show that ker(T ) and ran(T ) are always T -invariant subspaces.
■
Lemma 2.73. Assume that V is a vector space and that U1 , U2 ⊂ V are subspaces
such that V = U1 ⊕ U2 . Let moreover T : V → V be a linear transformation such
that U := U1 is T -invariant. Then the block-matrix representation of T with respect
to this decomposition is block upper triangular, that is, T21 = 0.
Proof. We write
\[ T = \begin{pmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{pmatrix} \quad\text{with}\quad T_{ij} = \pi_i \circ T|_{U_j}. \]
We have to show that T21 = 0. Let therefore u ∈ U . Then the assumption that U is
T -invariant implies that T u ∈ U . Thus π1 (T u) = T u, and π2 (T u) = 0. This shows
that (π2 ◦ T )u = 0 for every u ∈ U , which proves that T21 = π2 ◦ T |U = 0. □
Lemma 2.74. Assume that V is a vector space and that U1 ,. . . ,UN ⊂ V are
subspaces such that V = U1 ⊕ · · · ⊕ UN . Let moreover T : V → V be a linear trans-
formation such that Ui is a T -invariant subspace of V for every i = 1, . . . , N . Then
the block-matrix-representation of T with respect to this decomposition is block-
diagonal.
Proof. Exercise! □
Theorem 2.81. Assume that V is a finite dimensional vector space and that
T : V → V is linear. The following are equivalent:
(1) T is bijective.
(2) T is injective.
(3) T is surjective.
Proof. The implications (1) =⇒ (2) and (1) =⇒ (3) hold trivially.
We now show the implication (2) =⇒ (3). Since T is injective and thus
ker T = {0}, we have
dim(V ) = dim(ker T ) + dim(ran T ) = 0 + dim(ran T ) = dim(ran T ).
Thus ran T is a subspace of V of the same dimension as V . This already implies
that V = ran T and thus T is surjective.
Finally we show the implication (3) =⇒ (1). Here it remains to show that
T is injective. Since T is surjective, we have that V = ran T and in particular
dim V = dim(ran T ). Thus
dim(V ) = dim(ker T ) + dim(ran T ) = dim(ker T ) + dim(V ),
which implies that dim(ker T ) = 0. This is only possible if ker T = {0}, that is, T is
injective. □
Definition 2.82. Let V be a vector space. We denote by IdV : V → V the
identity operator on V given by IdV v = v for all v ∈ V . If the space V is clear
from the context, we omit the subscript V and write Id instead. ■
6. Existence of Eigenvalues
We will now show that linear transformations of complex, finite dimensional
vector spaces always have at least one eigenvalue. In order to do so, we need to
introduce the concept of the evaluation of a polynomial on a linear transformation.
Assume to that end that T : V → V is a linear transformation of the space V .
Then we can consider the composition T ◦ T with itself, which is again a linear
transformation of V . Thus we can further consider the composition T ◦ T ◦ T .
Again, this is a linear transformation of V . In a similar way, we can consider
arbitrary (but finite) compositions of T with itself. In order to simplify notation,
we abbreviate in the following T 2 := T ◦ T , and similarly T 3 = T ◦ T ◦ T , and so
on.
Definition 2.85. Assume that V is a vector space, T : V → V is a linear
transformation, and p is a polynomial with coefficients in K, say
p(x) = c0 + c1 x + c2 x2 + . . . + cn xn .
We define the mapping p(T ) : V → V by
p(T )u := c0 u + c1 T u + c2 T 2 u + . . . + cn T n u for all u ∈ V.
■
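For matrices, the evaluation p(T ) of Definition 2.85 can be carried out directly. The following is a minimal Python sketch; the matrix A and the polynomial p are arbitrary examples of ours:

import numpy as np

def poly_of_matrix(coeffs, A):
    """Evaluate p(A) = c0*I + c1*A + ... + cn*A^n for coeffs = [c0, ..., cn]."""
    n = A.shape[0]
    result = np.zeros_like(A, dtype=float)
    power = np.eye(n)                 # A^0
    for c in coeffs:
        result = result + c * power
        power = power @ A             # next power of A
    return result

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
p = [1.0, -5.0, 1.0]                  # p(x) = 1 - 5x + x^2
print(poly_of_matrix(p, A))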
7. Triangularisation
Lemma 2.89. Assume that the matrix
\[ B = \begin{pmatrix} \beta_1 & * & \cdots & * \\ 0 & \beta_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & \beta_n \end{pmatrix} \in \operatorname{Mat}_n(K) \]
is upper triangular. Then the matrix B is invertible, if and only if all diagonal
entries β1 , . . . , βn are different from zero.
Idea of proof. The matrix B is invertible, if and only if its columns are
linearly independent. If all diagonal entries are different from zero, this is clearly
the case (if you want, you can prove this by induction over n).
Conversely, if one of the diagonal entries is equal to zero, say βj = 0, then the
first j columns cannot be linearly independent, as only the first j − 1 entries of each
of these columns are possibly different from zero. □
is upper triangular with diagonal entries α1 , . . . , αn for some αj ∈ K, j = 1, . . . , n.
Now note that the matrix representation of T − λ Id in the same basis is the matrix
\[ A - \lambda\,\mathrm{Id} = \begin{pmatrix} \alpha_1 - \lambda & * & \cdots & * \\ 0 & \alpha_2 - \lambda & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & \alpha_n - \lambda \end{pmatrix}. \]
By Lemma 2.89, this matrix is invertible if and only if its diagonal entries are
non-zero.
Now recall that λ is an eigenvalue of T , if and only if the transformation T −λ Id
is not bijective. This is the case, if and only if its matrix representation (in any
basis) is not invertible, which, as we have just seen, is the case if and only if λ = αj
for some j, which proves the assertion. □
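The fact that the eigenvalues of an upper triangular matrix are exactly its diagonal entries is easy to check numerically; a small sketch with an arbitrarily chosen matrix:

import numpy as np

# An upper triangular matrix; its eigenvalues should be its diagonal entries.
A = np.array([[2.0, 5.0, -1.0],
              [0.0, -3.0, 4.0],
              [0.0, 0.0, 7.0]])

eigenvalues = np.linalg.eigvals(A)
print(sorted(eigenvalues.real))          # [-3.0, 2.0, 7.0]
print(sorted(np.diag(A)))                # the same numbers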
Assume therefore that we have proven the claim for every vector space U with
dim U < n and every mapping S : U → U .
Let λ be an eigenvalue of T (the existence of which follows from Theorem 2.87).
If T = λ Id, then the result holds trivially, as the matrix of T with respect to any
basis is simply the matrix λ Id. Thus we may assume without loss of generality
that T ̸= λ Id. Define now U := ran(T − λ Id). Since λ is an eigenvalue of T , it
follows that T − λ Id is not surjective, which implies that dim(U ) < dim(V ). Since
T − λ Id ̸= 0, we also obtain that dim(U ) ≥ 1.
We now want to use our induction assumption on the space U . For that, we
show first that U is T -invariant: Indeed, assume that u ∈ U . Then there exists
some v ∈ V with (T − λ Id)v = u. Thus
T u = T (T − λ Id)v = (T − λ Id)T v ∈ ran(T − λ Id) = U.
Here the second equality follows from the computation rules for polynomials in
Proposition 2.86.
Define now the transformation S : U → U , u 7→ Su := T u. That is, the
mapping S is the restriction T |U of T to U , but seen as a mapping from U to itself
(which is possible because U is T -invariant). Thus there exists a basis (u1 , . . . , um )
of U for which the matrix representation of S is upper triangular.
Now let W ⊂ V be such that V = U ⊕W , and choose any basis (w1 , . . . , wn−m )
of W . We now consider the matrix representation A of T with respect to the
basis (u1 , . . . , um , w1 , . . . , wn−m ). Following our considerations from Section 4.3
and specifically Lemma 2.73, the fact that U is T -invariant implies that A has a
block upper triangular structure
\[ A = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}. \]
Moreover, the matrix A11 is precisely the matrix representation of S with respect
to the basis (u1 , . . . , um ) and thus upper triangular. Finally, we have for all wj that
T wj = (T wj − λwj ) + λwj = (T − λ Id)wj + λwj .
Since (T − λ Id)wj ∈ ran(T − λ Id) = U , it follows that the matrix A22 (which
represents the part of T that maps W into W ) is A22 = λ Id. Thus the matrix A
as a whole is upper triangular. □
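In numerical practice, a triangularisation of a complex matrix is usually computed in the stronger form of a Schur decomposition A = Q T Q^H with Q unitary and T upper triangular. A minimal sketch using scipy (the matrix is an arbitrary example of ours):

import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])            # arbitrary example matrix

# Complex Schur form: A = Q @ T @ Q^H with T upper triangular.
T, Q = schur(A, output='complex')
print(np.allclose(A, Q @ T @ Q.conj().T))  # True
print(np.allclose(np.tril(T, -1), 0))      # T is upper triangular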
8. Diagonalisation
Definition 2.92. Let V be a finite dimensional vector space and let T : V → V
be linear. We say that T is diagonalisable if there exists a basis of V such that the
corresponding matrix representation of T is diagonal. ■
In the following, we will derive different equivalent conditions for the diagonal-
isability of a linear operator. For that, we show first that eigenvectors for different
eigenvalues are necessarily linearly independent.
Theorem 2.93. Assume that V is a vector space and T : V → V is linear. Let
moreover λ1 , . . . , λm ∈ K be distinct eigenvalues (that is, λi ̸= λj for i ̸= j), and
let v1 , . . . , vm ∈ V be corresponding eigenvectors. Then the family (v1 , . . . , vm ) is
linearly independent.
Proof. We apply induction over m. For m = 1, the result is trivial, as every
eigenvector is non-zero.
Assume now that we have shown the result for m − 1. That is, the family
(v1 , . . . , vm−1 ) is linearly independent.
Let now c1 , . . . , cm ∈ K be such that
0 = c1 v1 + . . . + cm vm .
We have to show that cj = 0 for all j = 1, . . . , m.
First we consider the case where cm = 0. In this case, we have that
0 = c1 v1 + . . . + cm−1 vm−1 .
Since by our induction assumption, the set (v1 , . . . , vm−1 ) is linearly independent, it
follows that the coefficients c1 , . . . , cm−1 are all equal to zero as well, which proves
the claim in this case.
Now assume that cm ̸= 0. Defining dj := −cj /cm for j = 1, . . . , m − 1, we then
obtain the equation
(11) vm = d1 v1 + . . . + dm−1 vm−1 .
Applying T to both sides of (11) and using that every vj is an eigenvector of T for
the eigenvalue λj , we obtain
λm vm = d1 λ1 v1 + . . . + dm−1 λm−1 vm−1 .
Multiplying (11) by λm and subtracting the two equations, we arrive at
0 = d1 (λ1 − λm )v1 + . . . + dm−1 (λm−1 − λm )vm−1 .
By the induction assumption, the family (v1 , . . . , vm−1 ) is linearly independent, and
thus dj (λj − λm ) = 0 for all j. Since the eigenvalues are distinct, λj − λm ̸= 0, and
therefore dj = 0 for all j = 1, . . . , m − 1. But then (11) implies that vm = 0, which
contradicts the fact that eigenvectors are non-zero. Thus the case cm ̸= 0 cannot
occur, which proves the claim. □
9. Generalised Eigenspaces
In the previous section, we have introduced the notion of diagonalisability of a
linear transformation. Moreover, we have seen that a transformation is diagonalis-
able if and only if it has a basis of eigenvectors. Unfortunately, it turns out that
this is not the case for all transformations. As an example, consider the matrix
\[ A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \]
Since this matrix is upper triangular, we can read off its eigenvalues from the
diagonal entries, which are both equal to 0. Thus the only eigenvalue of A is 0.
However, the matrix A cannot be diagonalisable: If it were, it could be transformed
by a basis change into a diagonal matrix where all the diagonal entries are equal to
0; in other words, the matrix would have to be 0.
With the same argumentation we obtain that strictly upper triangular matrices
cannot be diagonalisable, unless they are already equal to 0. Moreover, the same
holds if we add the same constant λ to the diagonal: A matrix of the form
\[ (13)\qquad A = \begin{pmatrix} \lambda & * & \cdots & * \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & \lambda \end{pmatrix} \]
can only be diagonalisable, if all the off-diagonal elements are equal to 0 and A =
λ Id.
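For the simplest matrix of the form (13), the failure of diagonalisability can be seen numerically from the dimension of the eigenspace; a short sketch:

import numpy as np

lam = 2.0
A = np.array([[lam, 1.0],
              [0.0, lam]])

# The only eigenvalue is lam, but the eigenspace ker(A - lam*Id) is only
# one-dimensional, so there is no basis of eigenvectors.
rank = np.linalg.matrix_rank(A - lam * np.eye(2))
print(rank, 2 - rank)    # rank 1, so dim ker(A - lam*Id) = 1 < 2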
In this section, we will show that this is in a sense the most general form
of a non-diagonalisable matrix. More precisely, we will show that every linear
operator T on a finite dimensional, complex vector space has a basis such that the
corresponding matrix of T has a block-diagonal form where every block is of the
form (13). In order to arrive at this representation, we will have to introduce some
notation, though, and then do some hard work.
Nilpotent Transformations.
Definition 2.97. Assume that V is a vector space and T : V → V is a linear
transformation. We say that T is nilpotent, if there exists j ∈ N such that T j =
0. ■
Lemma 2.99. Assume that the matrix A ∈ Matn (K) is strictly upper triangu-
lar, that is, its entries aij satisfy aij = 0 for all i ≥ j. Then An = 0.
Proof. This is a simple calculation. □
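The statement of Lemma 2.99 can also be observed numerically; a sketch with an arbitrary strictly upper triangular 4 × 4 matrix:

import numpy as np

A = np.array([[0.0, 2.0, -1.0, 3.0],
              [0.0, 0.0, 4.0, 1.0],
              [0.0, 0.0, 0.0, 5.0],
              [0.0, 0.0, 0.0, 0.0]])      # strictly upper triangular, n = 4

print(np.allclose(np.linalg.matrix_power(A, 4), 0))   # A^4 = 0
print(np.allclose(np.linalg.matrix_power(A, 3), 0))   # False: A^3 is not yet 0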
Now consider the situation where the operator T has a single eigenvalue λ.
After choosing a suitable basis, we can then write the matrix A of T in the form
given in (13). This, however, implies that the matrix A − λ Id is nilpotent and thus
(A − λ Id)n = 0. Now note that the matrix (A − λ Id)n is precisely the matrix of
the operator (T − λ Id)n in the same basis. Thus the operator T − λ Id is nilpotent
and every element v in the vector space V satisfies (T − λ Id)n v = 0. In particular,
there exists a basis of V consisting of vectors that satisfy (T − λ Id)n v = 0.
Definition 2.101. Let V be a vector space, let T : V → V be linear and λ ∈ K.
A vector v ∈ V , v ̸= 0, is called a generalised eigenvector of T for the eigenvalue λ,
if there exists j ∈ N such that
(T − λ Id)j v = 0.
The linear span of all generalised eigenvectors of T for the eigenvalue λ is called
the generalised eigenspace of T for λ, and denoted by G(λ, T ). ■
Remark 2.102. It is easy to see that the generalised eigenspace for the eigen-
value λ consists of all the generalised eigenvectors for λ together with the vector
0. ■
Proof. Assume that u ∈ ker(p(T )). Then p(T )u = 0 and thus p(T )(T u) =
T (p(T )u) = T (0) = 0, which shows that also T u ∈ ker(p(T )). Thus ker(p(T )) is
T -invariant.
Now assume that u ∈ ran(p(T )), say u = p(T )v for some v ∈ V . Then T u =
T (p(T )v) = p(T )(T v) ∈ ran(p(T )), which shows that ran(p(T )) is T -invariant as
well. □
Lemma 2.105. Assume that V is a vector space and that T : V → V is linear.
Then
{0} ⊂ ker T ⊂ ker T 2 ⊂ ker T 3 ⊂ . . .
Proof. Let k ∈ N and let u ∈ ker T k . Then T k u = 0 and thus T k+1 u =
T (T k u) = T (0) = 0 as well, which shows that u ∈ ker T k+1 . This proves that
ker T k ⊂ ker T k+1 for all k. □
Lemma 2.106. Assume that V is a vector space and that T : V → V is linear.
Assume moreover that for some k ∈ N we have that ker T k = ker T k+1 . Then
ker T k+m = ker T k for all m ≥ 0.
Proof. We prove this result by induction over m. For m = 0, this result is
trivial.
Now assume that ker T k+m = ker T k . We have to show that also ker T k+m+1 =
ker T k . By Lemma 2.105 we know that ker T k ⊂ ker T k+m+1 . Thus we only have to
show the inclusion ker T k+m+1 ⊂ ker T k . Assume therefore that u ∈ ker T k+m+1 .
Then 0 = T k+m+1 u = T k+1 (T m u), and thus T m u ∈ ker T k+1 . Since ker T k+1 =
ker T k , we then obtain that T m u ∈ ker T k , which in turn shows that T k+m u =
T k (T m u) = 0. Thus u ∈ ker T k+m , which shows that ker T k+m+1 ⊂ ker T k+m . □
Lemma 2.107. Assume that V is a finite dimensional vector space and that
T : V → V is linear. Then
ker T dim V = ker T dim V +1 = ker T dim V +2 = ...
Proof. By Lemma 2.106 it is sufficient to show that the equality ker T dim V =
ker T dim V +1 holds. Assume thus that this is not the case, that is, that ker T dim V ⊊
ker T dim V +1 . Then Lemma 2.106 implies that ker T k ⊊ ker T k+1 for all k ≤ dim V .
Thus we have that dim(ker T k+1 ) ≥ dim(ker T k ) + 1 for all k ≤ dim V . This,
however, implies that dim(ker T dim V +1 ) ≥ dim V + 1, which is impossible. □
Proposition 2.108. Assume that V is a finite dimensional vector space and
T : V → V is linear. Then
V = (ker T dim V ) ⊕ (ran T dim V ).
Proof. We show first that (ker T dim V ) ∩ (ran T dim V ) = {0}. Assume to that
end that v ∈ (ker T dim V ) ∩ (ran T dim V ). Then we can write v = T dim V u for some
u ∈ V . Since v ∈ ker T dim V , it then follows that T 2 dim V u = T dim V (T dim V u) =
T dim V v = 0, that is, u ∈ ker T 2 dim V . Now we know from Lemma 2.107 that
ker T 2 dim V = ker T dim V , and thus we actually have that u ∈ ker T dim V . This,
however, implies that v = T dim V u = 0.
We now have shown that the sum of the spaces ker T dim V and ran T dim V is
direct and thus
dim(ker T dim V ) + dim(ran T dim V ) = dim((ker T dim V ) ⊕ (ran T dim V )).
On the other hand, applying the dimension formula dim(ker S) + dim(ran S) = dim V
to S = T dim V , we obtain that dim(ker T dim V ) + dim(ran T dim V ) = dim V . Therefore
(ker T dim V ) ⊕ (ran T dim V ) = V , which proves the assertion. □
Proof. Since G(λj , T ) = ker (T − λj Id)dim V , the generalised eigenspaces
are the kernels of polynomials of T and thus T -invariant. Moreover, once we have
established that V is the direct sum of the generalised eigenspaces, we can find a
basis of each of the spaces G(λj , T ) for which the restriction of T to G(λj , T ) is upper
triangular. After concatenating these bases to a basis of the whole space V , we
then obtain from our previous considerations that the matrix A of T has the above
given form.
Thus it remains to establish the decomposition (15), which we will do by in-
duction over the dimension n = dim V of the space V .
For dim V = 1, the result trivially holds.
Assume now that we have shown (15) for every finite dimensional complex
vector space U of dimension dim U < n, and every linear transformation S : U → U .
Let λ1 ∈ C be an eigenvalue of T , the existence of which follows from The-
orem 2.87. Define moreover
U := ran (T − λ1 Id)n .
If U = {0}, we have the desired decomposition and are done. Assume therefore
that this is not the case.
Since U is the range of a polynomial of T it is T -invariant. Thus we can define
the operator S : U → U , u 7→ Su := T u. That is, S is the restriction of T to U ,
interpreted as a mapping from U to itself. We now note that the eigenvalues of
S are a subset of {λ2 , . . . , λm }. Indeed, if u ∈ U ⊂ V is an eigenvector of S for
G(λj , T ) = {v ∈ V : (T − λj Id)dim V v = 0}.
Since U ⊂ V and dim U < dim V , we immediately obtain that G(λj , S) ⊂ G(λj , T )
for all j = 2, . . . , m.
For the converse inclusion assume that v ∈ G(λj , T ). Because of the decom-
position V = G(λ1 , T ) ⊕ U we can write
v = v1 + u
with v1 ∈ G(λ1 , T ) and u ∈ U . Next, the decomposition (15) implies that
u = u2 + . . . + um
with uj ∈ G(λj , S) for j = 2, . . . , m. Thus
v = v1 + u2 + . . . + um ,
which we can rewrite as
0 = v1 + u2 + . . . + uj−1 + (uj − v) + uj+1 + . . . + um .
Now note that v1 ∈ G(λ1 , T ), uℓ ∈ G(λℓ , S) ⊂ G(λℓ , T ) for ℓ = 2, . . . , m, ℓ ̸= j, and
also (uj − v) ∈ G(λj , T ) + G(λj , S) = G(λj , T ).
Thus we have written 0 as a sum of elements of the generalised eigenspaces
of T , which each consist of the generalised eigenvectors of T for the corresponding
eigenvalue together with the vector 0. Since generalised eigenvectors of T for differ-
ent eigenvalues are linearly independent, they cannot occur on the right hand side
of this sum. Thus we have to conclude that all the terms on the right hand side are
equal to 0, and in particular that (uj − v) = 0. This shows that v = uj ∈ G(λj , S).
Since v ∈ G(λj , T ) was arbitrary, this proves that G(λj , T ) ⊂ G(λj , S), which
concludes the proof. □
Remark 2.116. Assume that the operator T : V → V has for some basis an
upper triangular matrix with not necessarily distinct diagonal entries µ1 , µ2 , . . . , µn
(with n = dim V ). Then the characteristic polynomial of T is
χ(z) = (z − µ1 )(z − µ2 ) · · · (z − µn ).
Note here, though, that the same linear factor may repeat several times. ■
Lemma 2.119. Assume that V is a finite dimensional complex vector space and
that T : V → V is linear. There exists a unique monic polynomial p of minimal
degree such that p(T ) = 0.
Proof. From Theorem 2.117 we know that there exists a polynomial such that
p(T ) = 0 (namely the characteristic polynomial). Thus there also exists a monic
polynomial of minimal degree with this property. It remains to show that this is
unique.
Assume therefore to the contrary that p and q are monic polynomials of minimal
degree such that p(T ) = q(T ) = 0, and that p ̸= q. In particular, deg p = deg q,
and thus we can write
p(z) = z m + am−1 z m−1 + . . . + a1 z + a0 ,
q(z) = z m + bm−1 z m−1 + . . . + b1 z + b0 ,
for some coefficients aj , bj ∈ K. Let k be the largest index such that ak ̸= bk (such
an index exists, as p ̸= q). Now define the polynomial r(z) = (p(z)−q(z))/(ak −bk ).
Then r(T ) = p(T ) − q(T ) = 0 and r is a monic polynomial of degree k < m, which
is a contradiction to the definitions of p and q. □
Definition 2.120. Assume that V is a finite dimensional complex vector space
and that T : V → V is linear. The unique monic polynomial p of minimal degree
satisfying p(T ) = 0 is called the minimal polynomial of T . ■
Next we establish some results about the relation between the minimal poly-
nomial and the characteristic polynomial.
Proposition 2.122. Assume that V is a finite dimensional complex vector
space and that T : V → V is linear, and denote by p(z) the minimal polynomial of
T . Assume moreover that q(z) is another polynomial satisfying q(T ) = 0. Then
q is a polynomial multiple of p. That is, there exists a polynomial s such that
q(z) = s(z)p(z).
Proof. By performing polynomial division with remainder, we can write
q(z) = s(z)p(z) + r(z),
where r and s are polynomials, and deg r < deg p. Now, since q(T ) = 0 and
p(T ) = 0, we obtain that also r(T ) = 0. However, p is the minimal polynomial of
T and deg r < deg p, which is only possible if r(z) = 0 is the 0-polynomial. Thus
we obtain the desired result. □
Proposition 2.123. Assume that V is a finite dimensional complex vector
space and that T : V → V is linear. Then the zeroes of the minimal polynomial p
of T are precisely the eigenvalues of T . Moreover, p is of the form
p(z) = (z − λ1 )s1 (z − λ2 )s2 · · · (z − λm )sm
where λ1 , . . . , λm are the eigenvalues of T , and 1 ≤ sj ≤ dj for all j, with dj being
the algebraic multiplicity of the eigenvalue λj .
Proof. We know already that the characteristic polynomial is a polynomial
multiple of p, which shows that p is of the form
p(z) = (z − λ1 )s1 (z − λ2 )s2 · · · (z − λm )sm
with 0 ≤ sj ≤ dj . It remains to show that sj ≥ 1 for all j. Assume therefore to
the contrary that sj = 0 for some j, that is, that the corresponding linear factor
does not appear in the minimal polynomial. Let now vj be an eigenvector for the
eigenvalue λj . Then a simple calculation (similar to those we needed for the linear
independence of generalised eigenspaces) shows that
p(T )vj = (λj − λ1 )s1 · · · (λj − λj−1 )sj−1 (λj − λj+1 )sj+1 · · · (λj − λm )sm vj .
Since vj ̸= 0 and λk − λj ̸= 0 for k ̸= j, this shows that p(T )vj ̸= 0, which
contradicts the assumption that p(T ) = 0. Therefore sj ≥ 1 for all j, which proves
the claim. □
(or possibly J_j^{(ℓ)} = (λj ) for sub-blocks of size 1), where λ1 , . . . , λm are the distinct
eigenvalues of T . Such a basis is called a Jordan basis of T , and the sub-blocks J_j^{(ℓ)}
are called Jordan blocks of T .
Moreover the following properties are satisfied:
• The blocks Ajj are (dj ×dj )-matrices, where dj is the algebraic multiplicity
of the eigenvalue λj .
• The number kj of different blocks for the eigenvalue λj is the geometric
multiplicity of λj .
• The minimal polynomial p of T has the form
p(z) = (z − λ1 )s1 (z − λ2 )s2 · · · (z − λm )sm ,
where sj is the size of the largest Jordan block J_j^{(ℓ)} for the eigenvalue λj .
• Apart from the order, the Jordan blocks are unique.
Example 2.125. Assume that the operator T : C4 → C4 has the only eigenvalue
λ = 2. Then we have the following possibilities for the Jordan normal form of T :
(1) A decomposition in a single Jordan block of size 4:
\[ A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}. \]
Here the geometric multiplicity of the eigenvalue 2 is 1, and the minimal
polynomial of T is p(z) = (z − 2)4 .
(2) A decomposition in a Jordan block of size 3 and a block of size 1:
\[ A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}. \]
Here the geometric multiplicity of the eigenvalue 2 is 2, and the minimal
polynomial of T is p(z) = (z − 2)3 .
(3) A decomposition in two Jordan blocks of size 2:
\[ A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}. \]
Here the geometric multiplicity of the eigenvalue 2 is 2, and the minimal
polynomial of T is p(z) = (z − 2)2 .
(4) A decomposition in one Jordan block of size 2 and two blocks of size 1:
\[ A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}. \]
Here the geometric multiplicity of the eigenvalue 2 is 3, and the minimal
polynomial of T is p(z) = (z − 2)2 .
(5) A decomposition in four Jordan blocks of size 1:
\[ A = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}. \]
Here the geometric multiplicity of the eigenvalue 2 is 4, and the minimal
polynomial of T is p(z) = z − 2. In this case A = 2 Id, and T is diagonalisable. ■
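For any of these matrices, the geometric multiplicity and the exponent in the minimal polynomial can be read off numerically. A sketch for the matrix in case (2):

import numpy as np

# Case (2): one Jordan block of size 3 and one of size 1 for the eigenvalue 2.
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [0.0, 2.0, 1.0, 0.0],
              [0.0, 0.0, 2.0, 0.0],
              [0.0, 0.0, 0.0, 2.0]])
N = A - 2.0 * np.eye(4)

# Geometric multiplicity = dim ker(A - 2 Id) = 4 - rank(A - 2 Id).
print(4 - np.linalg.matrix_rank(N))                       # 2

# Smallest s with (A - 2 Id)^s = 0 gives the minimal polynomial (z - 2)^s.
for s in range(1, 5):
    if np.allclose(np.linalg.matrix_power(N, s), 0):
        print(s)                                           # 3
        break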
CHAPTER 3

Inner Product Spaces and Singular Value Decomposition

1. Inner Products
Recall that the Euclidean (or standard) inner product on Rn is given as
⟨u, v⟩ = Σ_{i=1}^n ui vi ,
and that the corresponding Euclidean norm is
∥u∥ := ( Σ_{i=1}^n u_i^2 )^{1/2} .
The intuition behind these notions is that ∥u∥ would be the length of the vector
u ∈ Rn , whereas the inner product ⟨u, v⟩ is related to the angle between the vectors
u and v. More precisely, we have
⟨u, v⟩ = ∥u∥ ∥v∥ cos(α),
where α is the angle between the vectors u and v.
In the following, we will develop a generalisation of this notion of inner products
for arbitrary real or complex vector spaces.
Definition 3.1. Let V be a vector space. An inner product on V is a mapping
⟨·, ·⟩ : V × V → K with the following properties:
(1) Linearity in the first component:
⟨λu + µv, w⟩ = λ⟨u, w⟩ + µ⟨v, w⟩
for all u, v, w ∈ V and λ, µ ∈ K.
(2) Conjugate symmetry:
⟨u, v⟩ = \overline{⟨v, u⟩}
for all u, v ∈ V .
(3) Positive definiteness:
⟨v, v⟩ ∈ R≥0
for all v ∈ V . Moreover ⟨v, v⟩ = 0 if and only if v = 0.
A vector space V with an inner product is called an inner product space.1 ■
where
v^H := \overline{v}^T
is the Hermitian conjugate (or: “conjugate transpose”) of v.
Here the positive definiteness follows from the fact that
⟨v, v⟩ = Σ_{i=1}^n vi \overline{vi} = Σ_{i=1}^n |vi |2 ,
which is non-negative (and real) for all v ∈ Cn , and equal to zero if and
only if v = 0.
(2) We can make the space C([0, 1], C) of complex-valued continuous functions
on [0, 1] to an inner product space by defining
⟨u, v⟩ := ∫_0^1 u(x) \overline{v(x)} dx
for u, v ∈ C([0, 1], C).
We obtain here the positive definiteness by writing
⟨u, u⟩ = ∫_0^1 |u(x)|2 dx,
which obviously is real and non-negative. Moreover, since the integrand
|u(x)|2 is a continuous and non-negative function, the integral is equal to
zero if and only if |u(x)| = 0 for all x, which is the case if and only if
u = 0.
■
Lemma 3.3. Assume that V is an inner product space with inner product ⟨·, ·⟩.
Then we have for all u, v, w ∈ V and λ, µ ∈ K that
⟨u, λv + µw⟩ = \overline{λ}⟨u, v⟩ + \overline{µ}⟨u, w⟩.
That is, ⟨·, ·⟩ is conjugate linear in the second component.
Proof. We can write
⟨u, λv + µw⟩ = \overline{⟨λv + µw, u⟩} = \overline{λ⟨v, u⟩ + µ⟨w, u⟩}
= \overline{λ} \overline{⟨v, u⟩} + \overline{µ} \overline{⟨w, u⟩} = \overline{λ}⟨u, v⟩ + \overline{µ}⟨u, w⟩.
□
Remark 3.4. It is also somewhat common, especially in physics, to define
inner products slightly differently by requiring linearity in the second component
and conjugate linearity in the first component. Then one would for instance write
the inner product on Cn as ⟨u, v⟩ = uH v. From a theoretical point of view, this
makes no real difference, but it can be pretty annoying in practice, as all concrete
formulas look slightly different. ■
Remark 3.5. The linearity of an inner product in the first component to-
gether with the conjugate linearity in the second component is sometimes called
sesquilinearity from the Latin word sesqui meaning one-and-a-half. ■
Definition 3.6. Let V be an inner product space with inner product ⟨·, ·⟩.
The associated norm on V is the mapping ∥·∥ : V → R≥0 ,
∥v∥ := ⟨v, v⟩^{1/2} .
■
Lemma 3.7. The norm on an inner product space V has the following proper-
ties:
Lemma 3.14. Assume that V is an inner product space and that the family
(v1 , . . . , vm ) ⊂ V is orthonormal. Then (v1 , . . . , vm ) is linearly independent.
Proof. Assume that c1 , . . . , cm ∈ K are such that
0 = c1 v1 + . . . + cm vm .
A repeated application of Pythagoras’ Theorem implies that
0 = ∥c1 v1 + . . . + cm vm ∥2 = ∥c1 v1 ∥2 + . . . + ∥cm vm ∥2
= |c1 |2 ∥v1 ∥2 + . . . + |cm |2 ∥vm ∥2 = |c1 |2 + . . . + |cm |2 .
Thus c1 = c2 = . . . = cm = 0, which proves the linear independence of the set
{v1 , . . . , vm }. □
Definition 3.15. Let V be an inner product space. An orthonormal basis of
V is an orthonormal family that is a basis of V . ■
Proof. We have
⟨u, w⟩ = ⟨ Σ_{i=1}^n xi vi , Σ_{j=1}^n yj vj ⟩ = Σ_{i=1}^n Σ_{j=1}^n xi \overline{yj} ⟨vi , vj ⟩ = Σ_{i=1}^n xi \overline{yi} .
□
Definition 3.19. Assume that V is a finite dimensional inner product space
and let (v1 , . . . , vn ) be a family of vectors in V . The Gram matrix for this family
is the matrix G ∈ Matn (K) with entries Gij = ⟨vj , vi ⟩. That is,
\[ G := \begin{pmatrix} \langle v_1, v_1\rangle & \langle v_2, v_1\rangle & \cdots & \langle v_n, v_1\rangle \\ \langle v_1, v_2\rangle & \langle v_2, v_2\rangle & \cdots & \langle v_n, v_2\rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle v_1, v_n\rangle & \langle v_2, v_n\rangle & \cdots & \langle v_n, v_n\rangle \end{pmatrix}. \]
■
Remark 3.20. We see immediately that the Gram matrix for a linear system
is the identity matrix, if and only if the system is orthonormal. ■
□
Definition 3.22. A matrix A ∈ Matn (K) is called Hermitian, if A^H = A,
that is, if aij = \overline{aji} for all i, j = 1, . . . , n. ■
Lemma 3.24. Assume that V is a finite dimensional inner product space and
that (v1 , . . . , vn ) is a family in V , and denote by G the corresponding Gram matrix.
Then G is Hermitian and positive semi-definite. Moreover, G is positive definite,
if and only if the family (v1 , . . . , vn ) is linearly independent.
Proof. Since
Gij = ⟨vj , vi ⟩ = \overline{⟨vi , vj ⟩} = \overline{Gji} ,
the matrix G is Hermitian.
Now let x ∈ Kn and let v = Σ_{i=1}^n xi vi be the vector with coordinates (x1 , . . . , xn ).
Then
x^H Gx = ⟨v, v⟩ = ∥v∥2 ≥ 0,
which proves that G is positive semi-definite.
Now assume that (v1 , . . . , vn ) is linearly independent, let x = (x1 , . . . , xn ) ∈
Kn \ {0} and let v = Σ_{i=1}^n xi vi . Since x ̸= 0 and (v1 , . . . , vn ) is linearly independent,
it follows that v ̸= 0. Thus the same computation as above shows that xH Gx =
∥v∥2 > 0. This proves that G is positive definite.
Now assume conversely that G is positive definite, and let (x1 , . . . , xn ) ∈ Kn
be such that
Σ_{i=1}^n xi vi = 0.
Then
0 = ∥0∥2 = xH Gx.
Thus the positive definiteness of G implies that x = 0. This proves that (v1 , . . . , vn )
is linearly independent. □
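The properties in Lemma 3.24 are easy to verify numerically for a concrete family of vectors; the vectors in the following sketch are an arbitrary choice of ours:

import numpy as np

# A family (v1, v2, v3) in R^3, stored as the columns of V.
V = np.array([[1.0, 1.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

# Gram matrix with entries G[i, j] = <v_j, v_i> = v_i^H v_j.
G = V.conj().T @ V

print(np.allclose(G, G.conj().T))                 # Hermitian
print(np.all(np.linalg.eigvalsh(G) >= -1e-12))    # positive semi-definite
# The third column is the sum of the first two, so the family is linearly
# dependent and G is singular, hence not positive definite.
print(np.linalg.matrix_rank(G))                   # 2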
Remark 3.25. Conversely, assume that (v1 , . . . , vn ) is a basis of V and G ∈
Matn (K) is positive definite and Hermitian. Then we can define an inner product
on V by setting
⟨u, v⟩G := y^H Gx,
if x, y ∈ Kn are the coordinates of the vectors u, v ∈ V . (It is straight-forward to
show that this is indeed an inner product.) Thus we have a one-to-one correspond-
ence between positive definite Hermitian matrices and inner products. ■
which proves (7). In addition, we obtain that ∥πU v∥2 ≤ ∥v∥2 with equality if and
only if ∥v − πU v∥ = 0, that is, v = πU v, which in turn is the case if and only if
v ∈ U . This proves (8).
Next, we note that for all u ∈ U we have that
∥v − u∥2 = ∥v − πU v + πU v − u∥2 = ∥v − πU v∥2 + ∥πU v − u∥2 ,
since v − πU v ∈ U ⊥ and πU v − u ∈ U . Thus the minimisation problem min_{u∈U} ∥v − u∥2
is equivalent to the problem
min_{u∈U} ∥πU v − u∥2 ,
3Note here that the solution u can never be unique, as −u is another obvious solution.
Remark 3.36. One can show that the singular values of a mapping are unique.
In addition, one obtains a limited uniqueness result for the singular vectors: If the
singular values satisfy σj−1 > σj = σj+1 = . . . = σj+k > σj+k+1 . . ., then the linear
spans span{uj , . . . , uj+k } and span{vj , . . . , vj+k } are independent of the choice of
the singular vectors. In particular, if all the singular values are distinct, then the
singular vectors are uniquely determined up to orientation. ■
Lemma 3.37. Assume that the mapping T : U → V has the singular value
decomposition
T u = Σ_{i=1}^p σi ⟨u, ui ⟩vi .
Then
ran T = span{v1 , . . . , vp } and ker T = span{u1 , . . . , up }⊥ .
Proof. This is immediately obvious from the decomposition. □
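This is exactly how range and kernel are typically extracted from a numerically computed singular value decomposition; a sketch for a small rank-deficient matrix:

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])            # rank-1 example

# Full SVD: A = U @ diag(s) @ Vh, singular values in s.
U, s, Vh = np.linalg.svd(A, full_matrices=True)
p = np.sum(s > 1e-12)                       # numerical rank

range_basis = U[:, :p]                      # spans ran A
kernel_basis = Vh[p:, :].conj().T           # spans ker A = span{u_1,...,u_p}^perp

print(p)                                    # 1
print(np.allclose(A @ kernel_basis, 0))     # True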
We will come back soon to the singular value decomposition in order to discuss
important properties, but also slightly different representations. For that, we will
need some further notation, though.
Theorem 3.43. Assume that U and V are finite dimensional inner product
spaces and that T : U → V is linear. The adjoint T ∗ : V → U is the unique linear
transformation satisfying
(19) ⟨T u, v⟩V = ⟨u, T ∗ v⟩U for all u ∈ U and v ∈ V.
Proof. The linearity of T ∗ is obvious from its definition.
We next show that (19) holds. Let therefore u ∈ U and v ∈ V . Then
⟨T u, v⟩V = ⟨ Σ_{i=1}^p σi ⟨u, ui ⟩U vi , v ⟩V = Σ_{i=1}^p σi ⟨u, ui ⟩U ⟨vi , v⟩V .
On the other hand, we have that
⟨u, T ∗ v⟩U = ⟨ u, Σ_{i=1}^p σi ⟨v, vi ⟩V ui ⟩U = Σ_{i=1}^p σi \overline{⟨v, vi ⟩V} ⟨u, ui ⟩U = Σ_{i=1}^p σi ⟨vi , v⟩V ⟨u, ui ⟩U .
The resulting expressions are the same in the two cases, which shows that (19)
holds.
It remains to show that T ∗ is the unique linear transformation with this prop-
erty. Assume therefore that S : V → U is another linear transformation such that
⟨T u, v⟩V = ⟨u, Sv⟩U for all u ∈ U and v ∈ V . Then
⟨u, T ∗ v⟩U = ⟨T u, v⟩V = ⟨u, Sv⟩U
for all u ∈ U and v ∈ V , and thus
⟨u, (T ∗ − S)v⟩U = 0
for all u ∈ U and v ∈ V . That is, (T ∗ − S)v ∈ U ⊥ = {0} for all v ∈ V , which is the
same as saying that T ∗ v = Sv for all v ∈ V , that is, T ∗ = S. □
Lemma 3.44. Assume that U and V are inner product spaces and that T : U →
V is linear. Then
T = (T ∗ )∗ .
Proof. Exercise!
(Use Theorem 3.43!) □
Lemma 3.45. Assume that U , V , W are inner product spaces and that T : U →
V and S : V → W are linear. Then
(S ◦ T )∗ = T ∗ ◦ S ∗ .
Proof. Exercise!
(Use Theorem 3.43!) □
Lemma 3.46. Assume that U and V are finite dimensional inner product spaces
and that T : U → V is linear. Then
ker T = (ran T ∗ )⊥ and ran T = (ker T ∗ )⊥ .
Proof. We have that u ∈ ker T if and only if T u = 0. This is equivalent to u
satisfying ⟨T u, v⟩V = 0 for all v ∈ V . Using Theorem 3.43, this in turn is equivalent
to u satisfying ⟨u, T ∗ v⟩V = 0 for all v ∈ V , which is equivalent to the inclusion
u ∈ (ran T ∗ )⊥ . This shows that ker T = (ran T ∗ )⊥ .
By applying this result to T ∗ and recalling that (T ∗ )∗ = T , we obtain that
ker T ∗ = (ran T )⊥ . Taking orthogonal complements on both sides, we finally obtain
that ran T = (ker T ∗ )⊥ as claimed. □
Adjoints of unitary transformations.
Using the notion of the adjoint of a mapping, we can find a different charac-
terisation of unitary transformations:
Proposition 3.47. Assume that U and V are inner product spaces and that
T : U → V is a linear transformation. Then T is unitary, if and only if
T ∗ T = IdU .
Proof. Assume first that T ∗ T = IdU . Then we have for all u, v ∈ U that
⟨T u, T v⟩V = ⟨u, T ∗ T v⟩U = ⟨u, v⟩U ,
which shows that T is unitary.
Conversely, assume that T is unitary. Then
⟨u, T ∗ T v⟩U = ⟨T u, T v⟩V = ⟨u, v⟩U
for all u, v ∈ U , and thus
⟨u, (T ∗ T − IdU )v⟩U = 0
for all u, v ∈ U . This shows that (T ∗ T − IdU )v ∈ U ⊥ = {0} for all v ∈ U , which in
turn shows that T ∗ T = IdU . □
Proposition 3.48. Assume that U and V are inner product spaces and that
T : U → V is unitary. Then
T T ∗ = πran T .
Proof. Let v ∈ V . Then we can decompose v as v = v1 + v2 with v1 ∈ ran T
and v2 ∈ (ran T )⊥ . We have to show that T T ∗ v = v1 .
Since v1 ∈ ran T , there exists u1 ∈ U such that v1 = T u1 . Since T ∗ T = IdU ,
we then have that
T T ∗ v = T T ∗ (v1 + v2 ) = T T ∗ T u1 + T T ∗ v2
= T (T ∗ T u1 ) + T T ∗ v2 = T u1 + T T ∗ v2 = v1 + T T ∗ v2 .
Thus it remains to show that T T ∗ v2 = 0.
By the definition of the adjoint, we have for every w ∈ V that
⟨T T ∗ v2 , w⟩V = ⟨T ∗ v2 , T ∗ w⟩U = ⟨v2 , (T ∗ )∗ T ∗ w⟩V = ⟨v2 , T T ∗ w⟩V .
Since T T ∗ w ∈ ran T and v2 ∈ (ran T )⊥ , we conclude that ⟨T T ∗ v2 , w⟩V = 0. Since
w ∈ V was arbitrary, this shows that T T ∗ v2 = 0, which concludes the proof. □
• The second mapping is the transformation Σ : Kp → Kp ,
\[ \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix} \mapsto \begin{pmatrix} \sigma_1 x_1 \\ \vdots \\ \sigma_p x_p \end{pmatrix} = \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \sigma_p \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix}. \]
Definition 3.53. Assume that U and V are finite dimensional inner product
spaces and that T : U → V has the singular value decomposition
T u = Σ_{i=1}^p σi ⟨u, ui ⟩vi .
Then the Moore–Penrose inverse of T is the mapping T † : V → U defined by
T † v := Σ_{i=1}^p σi^{−1} ⟨v, vi ⟩ui for all v ∈ V . ■
Theorem 3.55. Assume that U and V are finite dimensional inner product
spaces and that T : U → V is linear. Then the Moore–Penrose inverse T † of T has
the following properties:
(1) ker T † = (ran T )⊥ .
(2) ran T † = (ker T )⊥ .
(3) T T † = πran T .
(4) T † T = π(ker T )⊥ .
(5) T †T T † = T †.
(6) T T †T = T .
Proof. Write T = QΣP ∗ with Q : Kp → V and P : Kp → U unitary, and
Σ : Kp → Kp diagonal with positive diagonal entries. Then T † = P Σ−1 Q∗ .
From the results of the singular value decomposition we now obtain that ran T =
ran Q and ker T = (ran P )⊥ . Now note that the decomposition T † = P Σ−1 Q∗ is
actually a singular value decomposition of T † (up to a necessary reordering of the
singular values). Thus we also have that ran T † = ran P and ker T † = (ran Q)⊥ .
Therefore ran T † = ran P = (ker T )⊥ and ker T † = (ran Q)⊥ = (ran T )⊥ .
Recall now that P and Q are unitary, and therefore P ∗ P = Idp and Q∗ Q = Idp .
Thus
T T † = QΣP ∗ P Σ−1 Q∗ = QΣΣ−1 Q∗ = QQ∗ .
This in turn shows that
T T † = QQ∗ = πran Q = πran T .
Next we have that
T † T = P Σ−1 Q∗ QΣP ∗ = P Σ−1 ΣP ∗ = P P ∗ = πran P = π(ker T )⊥ .
Moreover,
T † T T † = P P ∗ P Σ−1 Q∗ = P Σ−1 Q∗ = T †
and
T T † T = QQ∗ QΣP ∗ = QΣP ∗ = T,
which concludes the proof. □
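All of these identities can be checked numerically with the pseudoinverse provided by numpy; a short sketch:

import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])           # arbitrary example, rank 2

A_pinv = np.linalg.pinv(A)                # Moore-Penrose inverse

print(np.allclose(A @ A_pinv @ A, A))                    # T T† T = T
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))          # T† T† ... T† T T† = T†
print(np.allclose((A @ A_pinv).conj().T, A @ A_pinv))    # T T† is an orthogonal projection
print(np.allclose((A_pinv @ A).conj().T, A_pinv @ A))    # T† T is an orthogonal projection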
Theorem 3.56. Assume that U and V are finite dimensional inner product
spaces and that T : U → V is linear. Let v ∈ V . Then
u† := T † v
solves the (bi-level) optimisation problem
(20) min_u ∥u∥2_U s.t. u solves min_u ∥T u − v∥2_V .
In other words, the Moore–Penrose inverse looks at the least squares problem
minu ∥T u − v∥2 and selects from all solutions of this problem the one with the
smallest norm.
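A sketch of this minimal-norm least-squares property for a concrete underdetermined system (the data below are our own example):

import numpy as np

# An underdetermined system: infinitely many least-squares solutions.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

u_dagger = np.linalg.pinv(A) @ b          # minimal-norm solution via T†

# Any other solution of A u = b, shifted along ker A, has a larger norm.
kernel_direction = np.array([1.0, -1.0, 1.0])   # A @ kernel_direction = 0
u_other = u_dagger + 0.7 * kernel_direction

print(np.allclose(A @ u_dagger, b), np.allclose(A @ u_other, b))  # both solve
print(np.linalg.norm(u_dagger) < np.linalg.norm(u_other))         # True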
Proof. The vector u ∈ U solves the problem minu ∥T u − v∥2V , if and only if
the vector v̂ = T u solves the problem
min_{v̂} ∥v̂ − v∥2_V s.t. v̂ ∈ ran T .
4The decision of whether to start with AAH or AH A should be based on which of these
matrices is expected to look simpler. In this case, there is no real difference between them, so it
does not matter. In some cases, one can make one’s life significantly harder, though, by starting
the computations the wrong way.
Lemma 3.64. Assume that U is a finite dimensional inner product space and
T : U → U is Hermitian. Then T is positive semi-definite, if and only if all eigen-
values of T are non-negative.
Moreover, T is positive definite, if and only if all eigenvalues of T are positive.
Proof. Assume first that T is positive semi-definite and that λ is an eigenvalue
of T with eigenvector u. Then
λ∥u∥2 = ⟨λu, u⟩ = ⟨T u, u⟩ ≥ 0,
which shows that λ ≥ 0.
Now assume that all eigenvalues of T are non-negative. Denote by λ1 , . . . , λn ≥
0 the eigenvalues of T , and let (u1 , . . . , un ) be a corresponding orthonormal basis
of eigenvectors. Let moreover u ∈ U . Then we can write
u = x1 u1 + . . . + xn un
for some xi ∈ K, i = 1, . . . , n. Since the vectors ui are eigenvectors of T for the
eigenvalues λi , we have that
T u = λ1 x1 u1 + . . . + λn xn un .
Since (u1 , . . . , un ) is an orthonormal basis of U and λi ≥ 0 for all i, we obtain that
⟨T u, u⟩ = Σ_{i=1}^n λi |xi |2 ≥ 0,
which shows that T is positive semi-definite. The statement about positive
definiteness follows with the same argument, noting that in that case λi > 0 for all
i and that x ̸= 0 whenever u ̸= 0. □
Lemma 3.65. Assume that U and V are finite dimensional inner product spaces
and T : U → V is a linear transformation. Then T ∗ T : U → U and T T ∗ : V → V
are self-adjoint positive semi-definite.
Proof. We note first that (T ∗ T )∗ = T ∗ (T ∗ )∗ = T ∗ T and (T T ∗ )∗ = (T ∗ )∗ T ∗ =
T T ∗ , which shows that T ∗ T and T T ∗ are self-adjoint.
Next we have for all u ∈ U that
⟨u, T ∗ T u⟩U = ⟨T u, T u⟩V = ∥T u∥2V ≥ 0,
and for all v ∈ V that
⟨T T ∗ v, v⟩V = ⟨T ∗ v, T ∗ v⟩U = ∥T ∗ v∥2U ≥ 0,
which proves the positive semi-definiteness of T ∗ T and T T ∗ . □
Theorem 3.67. Assume that U and V are finite dimensional inner product
spaces and that T : U → V is linear. Let (u1 , . . . , un ) be an orthonormal eigenbasis
of T ∗ T for the eigenvalues λ1 ≥ λ2 ≥ . . . ≥ λn ≥ 0.
Denote p = rank T , and define
σi = √λi and vi = T ui / σi
for i = 1, . . . , p. Then
(22) T u = Σ_{i=1}^p σi ⟨u, ui ⟩vi
for all u ∈ U , which shows that the representation of T given in (22) is valid.
It remains to show that the family (v1 , . . . , vp ) is orthonormal as well. For that,
we compute for 1 ≤ i, j ≤ p the inner product
⟨vi , vj ⟩V = (1/(σi σj )) ⟨T ui , T uj ⟩V = (1/(σi σj )) ⟨ui , T ∗ T uj ⟩U
= (1/(σi σj )) ⟨ui , σj2 uj ⟩U = (σj /σi ) ⟨ui , uj ⟩U = { 1 if i = j, 0 if i ̸= j },
as the family (u1 , . . . , up ) is orthonormal. This concludes the proof. □
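The construction in the proof can be carried out numerically: diagonalise T ∗ T , take square roots of the eigenvalues, and normalise the images of the eigenvectors. A sketch for a small full-rank matrix, compared with numpy's built-in SVD:

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])

# Orthonormal eigenbasis of A^H A, eigenvalues sorted in decreasing order.
lam, U = np.linalg.eigh(A.conj().T @ A)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

sigma = np.sqrt(lam)                      # singular values
V = A @ U / sigma                         # v_i = A u_i / sigma_i (rank 2, no zero division)

print(np.allclose(A, V @ np.diag(sigma) @ U.conj().T))        # reconstruction
print(np.allclose(sigma, np.linalg.svd(A, compute_uv=False)))  # matches numpy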
CHAPTER 4

Banach and Hilbert Spaces
Until now, we have treated vector spaces and linear mappings in a purely
algebraic way. That is, all1 the constructions we have performed in the proofs required
finitely many well defined steps, involving only linear operations and at times also
polynomials. Although we were sometimes dealing with infinite dimensional vector
spaces, we were never required to rely on infinite sums of vectors.
This restriction to pure algebra is fine for solving some mathematical problems.
The vast majority of problems that occur in practice in physics and engineering,
however, cannot be reasonably treated in that way. Instead, it is necessary to bring
in methods from analysis as well, in particular the ideas of convergence of sequences
and continuity of mappings.2 In the following, we will introduce the main ideas in
that direction.
1. Normed Spaces
We start by defining the notion of a norm on vector spaces, which is a function
that assigns a length to each vector.
Definition 4.1 (Norm). A norm on a vector space U over K is a function
∥·∥ : U → R with the following properties:
(1) Positivity: ∥u∥ ≥ 0 for all u ∈ U , and ∥u∥ = 0 if and only if u = 0.
(2) Homogeneity: ∥λu∥ = |λ|∥u∥ for all u ∈ U and λ ∈ K.
(3) Triangle inequality: ∥u + v∥ ≤ ∥u∥ + ∥v∥ for all u, v ∈ U .
A vector space together with a norm is called a normed space. ■
Example 4.2. The space Kn with the Euclidean norm ∥x∥2 := ( Σ_{i=1}^n |xi |2 )^{1/2}
is a normed space. ■
The Euclidean norm is by no means the only norm on Kn . Amongst those that
are used regularly in applications are the 1-norm and the ∞-norm defined below.
Example 4.3. On the space Kn we define
∥x∥1 := |x1 | + . . . + |xn |
and
∥x∥∞ := max{|x1 |, . . . , |xn |}
for x ∈ Kn . Both of these are norms on Kn , as positivity and homogeneity are
clearly satisfied and the triangle inequality follows from the triangle inequality on
the real/complex numbers. ■
1Arguably with the exception of the existence of the singular value decomposition, where
we actually employed a result from calculus concerning the existence of solutions of optimisation
problems. Also, the existence of a basis of an arbitrary vector space required a significantly more
complicated argumentation, but this was relegated to the optional part of these notes.
2Differentiability of mappings would be the next important idea, but, alas, we won’t have
time in this course to go that far.
It is obvious that the norms ∥·∥1 and ∥·∥∞ on the space Kn (with n ≥ 2) are
different from the Euclidean norm that is induced by the standard inner product.
However, it might be still possible that there is another inner product on Kn that
induces these norms. We will show in the following that this is not the case by
deducing a property that all norms induced by an inner product need to satisfy.
Lemma 4.5 (Parallelogram law). Assume that U is an inner product space and
that ∥u∥ = ⟨u, u⟩1/2 is the induced norm on U . Then
∥u − v∥2 + ∥u + v∥2 = 2∥u∥2 + 2∥v∥2
for all u, v ∈ U .
Proof. We write
∥u − v∥2 + ∥u + v∥2 = ⟨u − v, u − v⟩ + ⟨u + v, u + v⟩
= ⟨u, u⟩ − ⟨v, u⟩ + ⟨v, v⟩ − ⟨u, v⟩ + ⟨u, u⟩ + ⟨v, u⟩ + ⟨v, v⟩ + ⟨u, v⟩
= 2⟨u, u⟩ + 2⟨v, v⟩ = 2∥u∥2 + 2∥v∥2 .
□
Example 4.6. The norms ∥·∥1 and ∥·∥∞ on Kn with n ≥ 2 are not induced by
an inner product. We show this by demonstrating that the parallelogram law fails
to hold in both cases. For this, we take u = e1 and v = e2 , the first and second
standard basis vector in Kn . Then
∥e1 − e2 ∥21 + ∥e1 + e2 ∥21 = 4 + 4 = 8,
whereas
2∥e1 ∥21 + 2∥e2 ∥21 = 2 + 2 = 4.
Similarly,
∥e1 − e2 ∥2∞ + ∥e1 + e2 ∥2∞ = 1 + 1 = 2,
whereas
2∥e1 ∥2∞ + 2∥e2 ∥2∞ = 2 + 2 = 4.
■
Theorem 4.7 (Jordan–von Neumann). Assume that (U, ∥·∥) is a normed space
and that the parallelogram law
∥u − v∥2 + ∥u + v∥2 = 2∥u∥2 + 2∥v∥2
holds for all u, v ∈ U . Then there exists an inner product ⟨·, ·⟩ on U such that
∥u∥ = ⟨u, u⟩1/2 for all u ∈ U .
• If U is a real vector space, then ⟨·, ·⟩ is given as
⟨u, v⟩ = (1/4) ( ∥u + v∥2 − ∥u − v∥2 ).
• If U is a complex vector space, then ⟨·, ·⟩ is given as
⟨u, v⟩ = (1/4) Σ_{k=1}^4 i^k ∥u + i^k v∥2 .
Proof. I will add the proof for the real case when I find time. (It is not
difficult to check that the requirements for an inner product are satisfied in the
real case, but it is somewhat tedious. In the complex case, the tediousness rises
significantly.) □
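Both the failure of the parallelogram law for ∥·∥1 and ∥·∥∞ and the real polarisation formula for the Euclidean norm can be tested numerically; a sketch:

import numpy as np

def parallelogram_defect(u, v, norm):
    return norm(u - v)**2 + norm(u + v)**2 - 2*norm(u)**2 - 2*norm(v)**2

u, v = np.array([1.0, 0.0]), np.array([0.0, 1.0])

norm1 = lambda x: np.linalg.norm(x, 1)
norm2 = lambda x: np.linalg.norm(x, 2)
norm_inf = lambda x: np.linalg.norm(x, np.inf)

print(parallelogram_defect(u, v, norm1))    # 4.0  (law fails)
print(parallelogram_defect(u, v, norm_inf)) # -2.0 (law fails)
print(parallelogram_defect(u, v, norm2))    # 0.0  (law holds)

# The real polarisation identity recovers the Euclidean inner product.
inner = 0.25 * (norm2(u + v)**2 - norm2(u - v)**2)
print(np.isclose(inner, np.dot(u, v)))      # True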
Analogues of the 1-norm and the ∞-norm can also be defined for function spaces:
Example 4.8. The space C([0, 1]; Cn ) of continuous functions f : [0, 1] → Cn
can be made a normed space with the norm
∥f ∥∞ := max_{x∈[0,1]} |f (x)|2 .
Here we denote by |·|2 the Euclidean norm on Cn (in order to minimise the confusion
possibilities with the norm on C([0, 1]; Cn )). There are (reasonable) alternatives to
this norm, though. For instance, we can define the norm
∥f ∥1 := ∫_0^1 |f (x)|2 dx.
We will see later that these two norms have pretty different properties. ■
We will now state a consequence of the triangle inequality that turns out to be
useful in a large number of estimates.
Lemma 4.9 (Reverse triangle inequality). Assume that (U, ∥·∥) is a normed
space and that u, v ∈ U . Then
|∥u∥ − ∥v∥| ≤ ∥u − v∥.
Proof. We have that
∥u∥ = ∥u − v + v∥ ≤ ∥u − v∥ + ∥v∥
and therefore
∥u∥ − ∥v∥ ≤ ∥u − v∥.
Similarly,
∥v∥ = ∥v − u + u∥ ≤ ∥v − u∥ + ∥u∥
and thus
∥v∥ − ∥u∥ ≤ ∥u − v∥.
Together, these inequalities imply that
|∥u∥ − ∥v∥| ≤ ∥u − v∥,
which proves the assertion. □
Remark 4.11. The statement that for all ε > 0 there exists N ∈ N such that
∥un − u∥ < ε whenever n ≥ N is precisely the same as the convergence of the
sequence of real numbers ∥un − u∥ to 0. That is, we have that
lim_{n→∞} un = u ⇐⇒ lim_{n→∞} ∥un − u∥ = 0.
■
Example 4.12. Consider the space C([0, 1]) of continuous real valued functions
on the interval [0, 1] and the sequence of functions fn (x) := x^n . With respect to the
norm
∥f ∥1 = ∫_0^1 |f (x)| dx
we have ∥fn − 0∥1 = ∫_0^1 x^n dx = 1/(n + 1) → 0 as n → ∞, that is, the sequence
(fn )n∈N converges to 0. If we instead use the norm ∥f ∥∞ = max_{x∈[0,1]} |f (x)|, then
the same sequence does not converge to 0 (in fact, it does not converge at all).
In this case, we have that
∥fn − 0∥∞ = max_{x∈[0,1]} |x^n − 0| = 1
for all n ∈ N. This shows that the notion of convergence of a sequence can change
drastically if we change the norm on the space. ■
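A quick numerical illustration of this example, approximating the two norms of fn (x) = x^n on a uniform grid:

import numpy as np

x = np.linspace(0.0, 1.0, 10_001)

for n in (1, 5, 20, 100):
    fn = x**n
    norm_1 = fn.mean()              # approximates the integral of x^n over [0, 1]
    norm_inf = np.max(fn)           # maximum over [0, 1]
    print(n, norm_1, norm_inf)      # the 1-norm tends to 0, the sup-norm stays 1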
Lemma 4.13 (Uniqueness of the limit). Assume that (U, ∥·∥) is a normed space
and that (un )n∈N ⊂ U is a sequence in U . If the limit limn→∞ un exists, it is
unique.
Proof. Let u = limn→∞ un and v = limn→∞ un . Assume to the contrary that
u ̸= v. Then ∥u − v∥ > 0. Define now ε := ∥u − v∥/2. Because of the convergence
of the sequence (un )n∈N to u and v, there exist N1 , N2 ∈ N such that ∥un − u∥ < ε
whenever n ≥ N1 and ∥un − v∥ < ε whenever n ≥ N2 . For n ≥ max{N1 , N2 } the
triangle inequality then implies that ∥u − v∥ ≤ ∥u − un ∥ + ∥un − v∥ < 2ε = ∥u − v∥,
which is a contradiction. Thus u = v. □
Proposition 4.15. Assume that (U, ∥·∥) is a normed space and that (uk )k∈N ⊂
U is a convergent sequence in U . Then the set {uk : k ∈ N} is bounded.
Proof. Denote u := limk→∞ uk . Then there exists K ∈ N such that ∥u−uk ∥ ≤
1 whenever k ≥ K. (Here we have used the definition of convergence with the choice
ε = 1.) Now define
r := max{∥u1 ∥, ∥u2 ∥, . . . , ∥uK−1 ∥, ∥u∥ + 1} + 1.
Then we obviously have that ∥uk ∥ < r whenever k < K. Moreover, for k ≥ K it
follows from the triangle inequality that
∥uk ∥ ≤ ∥uk − u∥ + ∥u∥ ≤ 1 + ∥u∥ < r.
Thus ∥uk ∥ < r for all k ∈ N, or, put differently, uk ∈ Br (0) for all k ∈ N. This
shows the boundedness of the set {uk : k ∈ N}. □
Having defined convergence of sequences in normed spaces, we can now continue
with the definition of continuity of functions between normed spaces. The definition
is, in a sense, a straight-forward generalisation of continuity of functions on R; we
simply replace each occurrence of an absolute value by the norm on the respective
normed space.
Definition 4.16 (Continuity). Assume that (U, ∥·∥U ) and (V, ∥·∥V ) are normed
spaces and that f : U → V is a mapping. We say that f is continuous at a point
u ∈ U if for every sequence (un )n∈N ⊂ U with limn→∞ un = u we have that
limn→∞ f (un ) = f (u).
We say that f is continuous everywhere or simply continuous, if it is continuous
at every point u ∈ U . ■
Now assume that (23) does not hold. That is, assume that there exists some
ε > 0 such that for every δ > 0 there exists û ∈ U with ∥u − û∥ < δ and ∥f (u) −
f (û)∥V ≥ ε. Applying this with δ = 1/n, this means that we can find a sequence
(un )n∈N such that ∥u−un ∥U < 1/n and ∥f (u)−f (un )∥ ≥ ε. Since ∥u−un ∥U < 1/n,
we then have that u = limn→∞ un . On the other hand, since ∥f (u) − f (un )∥V ≥ ε,
the sequence (f (un ))n∈N does not converge to f (u). This shows that f is not
continuous at u. □
We will now show that norms work well together with the linear structure of a
vector space in that the basic operations of addition and scalar multiplication will
always be continuous on a normed space.
Theorem 4.18. Assume that U is a normed space. Then the following hold:
(1) Continuity of the norm: If un →n→∞ u, then ∥un ∥ →n→∞ ∥u∥.
(2) Continuity of addition: If un →n→∞ u and vn →n→∞ v, then (un +
vn ) →n→∞ u + v.
(3) Continuity of scalar multiplication: If un →n→∞ u and λn →n→∞ λ, then
(λn un ) →n→∞ λu.
Proof. Assume that u = limn→∞ un , that is, ∥u − un ∥ → 0 as n → ∞. Then
the reverse triangle inequality implies that
∥u∥ − ∥un ∥ ≤ ∥u − un ∥ → 0 as n → ∞,
which proves that ∥un ∥ → ∥u∥.
Now assume that u = limn→∞ un and v = limn→∞ vn . Then
∥u + v − (un + vn )∥ = ∥u − un + v − vn ∥ ≤ ∥u − un ∥ + ∥v − vn ∥ → 0 as n → ∞,
which shows that u + v = limn→∞ (un + vn ).
Finally, assume that u = limn→∞ un and λ = limn→∞ λn . Then
∥λu − λn un ∥ = ∥λu − λun + λun − λn un ∥ ≤ ∥λ(u − un )∥ + ∥(λ − λn )un ∥
= |λ|∥u − un ∥ + |λ − λn |∥un ∥ → 0 as n → ∞,
as the sequence (∥un ∥)n∈N converges to ∥u∥ and therefore is bounded. □
Example 4.19. On Kn we define the 0-”norm” ∥·∥0 by
∥u∥0 := #{k : uk ̸= 0}.
That is, ∥u∥0 is the number of non-zero components of the vector u. We stress that
this is not a norm on Kn , because the homogeneity is not satisfied: We have for
instance that ∥e1 ∥0 = 1, but also ∥2e1 ∥0 = 1.
The other properties of a norm are satisfied, though: Non-negativity is clear
from the definition, and ∥u∥0 = 0 if and only if the vector u has no non-zero
components, that is, if u = 0. Also, the number of non-zero components of a vector
u + v cannot be larger than the sum of the numbers of non-zero components of u and v, which proves
the triangle inequality.
However, scalar multiplication is not a continuous operation on (Kn , ∥·∥0 ). Take
for instance the constant sequence un = e1 for all n and λn = 1/n. Then un → e1
and λn → 0 as n → ∞, but
∥λn un − 0∥0 = ∥(1/n) e1 ∥0 = 1 for all n,
that is, the sequence (λn un )n∈N does not converge to 0. ■
There is an alternative characterisation of continuity that can be much easier
to work with in certain cases. For that, however, we will need to introduce some
additional notation, or, rather, generalise some well known notation from Rn to
arbitrary normed spaces.
That is, a set S is open if we can find for each point u in S a small ball around
u that is still completely contained in S.
In particular, open balls are actually open sets according to this definition.
While this statement might seem trivial at first glance, it actually is not and does
deserve a proof: In order to show that Br (u) is an open set, we have to find for
each v ∈ Br (u) some ball Bs (v) such that Bs (v) ⊂ Br (u).
Lemma 4.22. Let (U, ∥·∥) be a normed space, let u ∈ U and r > 0. Then the
open ball Br (u) is an open set.
Proof. Exercise! □
Example 4.23. Consider the space C([0, 1]) with the norm ∥·∥∞ . Then the set
U := {f ∈ C([0, 1]) : f (x) > 0 for all x ∈ [0, 1]}
is open.
In order to see that this is actually true, assume that f ∈ U . Since f is
continuous, the function f attains its minimum on the (bounded and closed) interval
[0, 1]. That is, the optimisation problem minx∈[0,1] f (x) has at least one solution
x∗ ∈ [0, 1]. Define now r := f (x∗ ). Since f ∈ U , we have that r > 0. Also, from
the definition of r it follows that f (x) ≥ r for all x ∈ [0, 1].
We claim that Br (f ) ⊂ U . Assume to that end that g ∈ Br (f ). Then
g(x) ≥ f (x) − |f (x) − g(x)| ≥ f (x) − max_{y∈[0,1]} |f (y) − g(y)| = f (x) − ∥f − g∥∞ ≥ r − ∥f − g∥∞ > 0
for all x ∈ [0, 1]. Thus g ∈ U , which shows that Br (f ) ⊂ U and hence that U is
open. ■
fr (x) := 4x/r − 1 if 0 ≤ x ≤ r/2, and fr (x) := 1 if r/2 ≤ x ≤ 1.
It is easy to see that this function is continuous and not contained in U (since
fr (0) = −1). In addition, we have that
∥fr − f ∥1 = ∫_0^1 |fr (x) − f (x)| dx = ∫_0^{r/2} |fr (x) − f (x)| dx = r/2 < r.
Thus fr ∈ Br (f ) and fr ̸∈ U . Since r was arbitrary, this shows that U is not
open. ■
Now the equivalence of these items follows from the fact that a set is open if and
only if its complement is closed.
It remains to show the equivalence of the first item with any of the others. For
this, we recall that the function f is continuous, if and only if for every u ∈ U
and every ε > 0 there exists δ > 0 such that ∥f (v) − f (u)∥V < ε whenever v ∈ V
satisfies ∥v − u∥U < δ. Now observe that this is equivalent to stating that for every
u ∈ U and every ε > 0 there exists δ > 0 such that
Bδ (u) ⊂ f −1 (Bε (f (u))).
Assume now that Item 2 holds. Let moreover u ∈ U and let ε > 0. Since the set
Bε (f (u)) ⊂ V is open, it follows that also C := f −1 (Bε (f (u))) is open. Moreover,
we obviously have that u ∈ C. Thus, as C is open, there exists δ > 0 such that
Bδ (u) ⊂ C = f −1 (Bε (f (u))). Since u ∈ U and ε > 0 were arbitrary, it follows that
f is continuous.
and the set
{u ∈ U : f (u) ≥ 0} = f −1 ([0, ∞))
is closed. ■
We now investigate open and closed sets a bit further. We show first that
arbitrary unions of open sets are open, and that arbitrary intersections of closed
sets are closed.
Proposition 4.30. Assume that (U, ∥·∥) is a normed space and that Ai , i ∈ I,
are open subsets of U . Then the set A := ∪_{i∈I} Ai is open.
Similarly, if Ki , i ∈ I, are closed subsets of U , then the set K := ∩_{i∈I} Ki is
closed.
Proof. Assume that u ∈ A = ∪_{i∈I} Ai . Then there exists an index j ∈ I such
that u ∈ Aj . Since Aj is open, there exists r > 0 such that Br (u) ⊂ Aj . This
implies that
Br (u) ⊂ Aj ⊂ ∪_{i∈I} Ai = A,
which shows that A is open.
In order to show that K is closed, we note that
K^C = ( ∩_{i∈I} Ki )^C = ∪_{i∈I} Ki^C .
Since each Ki^C is open, this is a union of open sets and therefore open by the first
part of the proof. Thus K is closed. □
Remark 4.33. Since the union of open sets is open, and the intersection of
closed sets is closed, it follows that A◦ is open, and \overline{A} is closed. Moreover, the set
∂A is the intersection of the closed set \overline{A} and the closed set (A◦ )C , and therefore
closed as well.
In addition, we can characterise A◦ as the largest open set contained in A, and
\overline{A} as the smallest closed set containing A. ■
Proposition 4.34. Let (U, ∥·∥) be a normed space, and let A ⊂ U be a set.
Then u ∈ A◦ if and only if there exists r > 0 such that Br (u) ⊂ A.
Proof. Assume first that u ∈ A◦ . By the definition of A◦ , there exists an
open set D ⊂ A such that u ∈ D. Since D is open, there exists r > 0 such that
Br (u) ⊂ D. As a consequence, Br (u) ⊂ D ⊂ A.
Conversely, assume that there exists r > 0 such that Br (u) ⊂ A. Since the set
Br (u) is open and contained in A, it follows that Br (u) ⊂ A◦ . In particular, this
shows that u ∈ A◦ . □
Lemma 4.35. Let (U, ∥·∥) be a normed space, and let A ⊂ U be a set. Then
(A◦ )C = \overline{A^C} and (\overline{A})C = (A^C )◦ .
Proof. The set A◦ is open and A◦ ⊂ A. Thus (A◦ )C is closed and A^C ⊂
(A◦ )C . This shows that \overline{A^C} ⊂ (A◦ )C , as \overline{A^C} is the smallest closed set containing
A^C .
Similarly, \overline{A} is closed and A ⊂ \overline{A}. Thus (\overline{A})C is open and (\overline{A})C ⊂ A^C . This
shows that (\overline{A})C ⊂ (A^C )◦ , as (A^C )◦ is the largest open set contained in A^C .
Now denote D := A^C . Applying the first result we have shown in this proof,
we obtain that \overline{D^C} ⊂ (D◦ )C , which can be rewritten as \overline{A} ⊂ ((A^C )◦ )C , or (A^C )◦ ⊂
(\overline{A})C .
Similarly, applying the second result we have shown in this proof to D = A^C ,
we obtain that (\overline{D})C ⊂ (D^C )◦ , or (\overline{A^C})C ⊂ A◦ , which shows that (A◦ )C ⊂ \overline{A^C}. □
Lemma 4.36. Let (U, ∥·∥) be a normed space, and let A ⊂ U be a set. Then
∂A = \overline{A} ∩ \overline{A^C}.
Proof. By definition, ∂A = \overline{A} \ A◦ . However, by Lemma 4.35 we have that
(A◦ )C = \overline{A^C}. Thus
∂A = \overline{A} \ A◦ = \overline{A} ∩ (A◦ )C = \overline{A} ∩ \overline{A^C}.
□
Lemma 4.37. Let (U, ∥·∥) be a normed space, and let A ⊂ U be a set. Then
u ∈ \overline{A}, if and only if for all r > 0 we have that Br (u) ∩ A ̸= ∅.
til now by moving from the rational numbers to the so called algebraic numbers,
which include all roots of polynomials with rational coefficients.
However, we are now in the process of introducing analytic tools into linear
algebra, and analysis does not work well with rational (or algebraic) numbers, as
many of the well-known results from calculus fail to hold in that setting. For
instance, the intermediate value theorem that states that a continuous function f
on an interval [a, b] takes as function values all the numbers between f (a) and f (b)
does not hold if we only allow for function values that are rational.
A major problem is that many sequences of rational numbers that appear to be
converging, actually don’t. Or, to be more precise, they only converge if we allow
the limit to lie outside the rational numbers. In fact, this is the main reason for
the introduction of the real numbers. One intuition here is that Q is incomplete,
but has “holes” filled with the irrational numbers. Take for instance the sequence
(xk )k∈N of rational numbers defined by
(24) xk := Σ_{ℓ=0}^k 1/ℓ! .
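The partial sums in (24) can be computed exactly as rational numbers; their limit, however, is the irrational number e. A short sketch:

import math
from fractions import Fraction

x = Fraction(0)
for ell in range(0, 15):
    x += Fraction(1, math.factorial(ell))
    # each partial sum x_k is a rational number ...
print(float(x), math.e)   # ... but the limit e = 2.71828... is irrational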
cannot even define ∥f ∥∞,1 , because we would need the derivative of f to be able to
do so.
In the cases discussed above, it appears pretty clear to us that the sequence we
consider should converge, and it is possible for us to find a limit after turning to
a larger space: the space of real numbers in the case of sequences in Q, and some
space of piecewise continuous functions or piecewise differentiable functions in the
second case.
Now there are (at least) two questions: First, can we formally characterise
the situation where this phenomenon occurs? That is, can we formalise a notion of
completeness and incompleteness of normed spaces, and can we do so in an intrinsic
way, without reference to any larger space? Second, if we know that a given normed
space (U, ∥·∥) is incomplete, can we find a larger space (V, ∥·∥) containing U , such
that every sequence in U that should converge, actually converges in V ? In order
to begin to tackle this question, we first have to clarify what we mean when we say
that a sequence “should converge”. Even more, we need to find a suitable definition
that does not rely on the limit of the sequence (which as of now might not exist).
For that, we recall the definition of convergence of a sequence in a normed
space: The sequence (un )n∈N converges in the normed space (U, ∥·∥) to u ∈ U , if
there exists for each ε > 0 some N ∈ N such that ∥un − u∥ < ε for all n ≥ N .
Now assume that this is the case, and take n, m ≥ N . Then the triangle inequality
implies that
∥un − um ∥ ≤ ∥un − u∥ + ∥um − u∥ < 2ε.
Note here that the inequality ∥un −um ∥ < 2ε makes no mention of the limit u of the
sequence. Thus we can use this in a definition of sequences that should converge.
Definition 4.45 (Cauchy sequence). A sequence (un )n∈N in a normed space
(U, ∥·∥) is a Cauchy sequence, if there exists for each ε > 0 some N ∈ N such that
∥un − um ∥ < ε whenever n, m ≥ N.
■
From our discussion before the definition, we immediately get the following
result:
Lemma 4.46. Every convergent sequence in a normed space is a Cauchy se-
quence.
Conversely, the sequence defined in (24) is a Cauchy sequence in Q, but does not
converge in that space. Moreover, within the real numbers, there is no difference
between convergent sequences and Cauchy sequences. As a preparation for this
result, we show that Cauchy sequences, like convergent sequences, are bounded.
Lemma 4.47. Assume that (U, ∥·∥) is a normed space and that (un )n∈N is a
Cauchy sequence in U . Then the set (un )n∈N is bounded.
Proof. Since the sequence (un )n∈N is a Cauchy sequence, there exists N ∈ N
such that ∥un − um ∥ < 1 whenever n, m ≥ N . Define now
r := max{∥u1 ∥, ∥u2 ∥, . . . , ∥uN ∥} + 1.
Then by construction un ∈ Br (0) for all 1 ≤ n ≤ N . Also, for n ≥ N we have that
∥un ∥ ≤ ∥un − uN ∥ + ∥uN ∥ < ∥uN ∥ + 1 ≤ r.
Thus we have that un ∈ Br (0) for all n ∈ N, which shows that the sequence is
bounded. □
Theorem 4.48. Every Cauchy sequence in the real numbers R converges.
Proof∗ . Assume that (xn )n∈N is a Cauchy sequence in R. Then by Lemma 4.47,
the sequence (xn )n∈N is bounded. Define now for N ∈ N
aN := inf{xn : n ≥ N } and bN := sup{xn : n ≥ N }.
Note that the existence of aN and bN follows from Axiom 1.36. Also, we have that
inf{xn : n ∈ N} = a1 ≤ aN ≤ bM ≤ b1 = sup{xn : n ∈ N}
for every N , M ∈ N. Define now
x := sup{aN : N ∈ N}.
Then we have by definition that x ≥ aN for all N . In addition, since aN ≤ bM
for all N and M , it follows that also x ≤ bM for all M ∈ N. We now claim that
x = limn→∞ xn .
Let therefore ε > 0. Since (xn )n∈N is a Cauchy sequence, there exists some
N ∈ N such that |xn− xm | < ε/2 for all n, m ≥ N . Let now n ≥ N be fixed.
Since aN = inf{xk : k ≥ N }, there exists some m ≥ N such that xm <
aN + ε/2. Thus
(xn − x) ≤ xn − aN < xn − (xm − ε/2) ≤ |xn − xm | + ε/2 < ε.
Moreover, since x ≤ bN , there exists some ℓ ≥ N such that xℓ > bN − ε/2. Thus
(x − xn ) ≤ bN − xn < xℓ + ε/2 − xn ≤ |xℓ − xn | + ε/2 < ε.
This shows that |x − xn | < ε for all n ≥ N , which proves that the sequence (xn )n∈N
converges to x. □
Remark 4.49. In the definition of a Cauchy sequence, it is crucial that we
take the distances between all elements in the sequence after the given index. If
we were to restrict ourselves to only the consecutive differences ∥xn − xn+1 ∥, the
definition would not be helpful: The partial sums $S_n := \sum_{k=1}^{n} 1/k$ of the harmonic
series diverge, but |Sn+1 − Sn | = 1/(n + 1) → 0. ■
Thus our Theorem above can be rephrased as saying that the real numbers are
complete.
Lemma 4.51. The spaces Cd and Rd with the Euclidean norm ∥·∥2 are complete.
Proof. We only prove the claim in the (slightly more cumbersome) case of
Cd .
Assume that (zn )n∈N is a Cauchy sequence in Cd . Denote now, for 1 ≤ j ≤ d,
by $z_k^{(j)} \in C$ the j-th component of the vector $z_k$, and write $z_k^{(j)} = a_k^{(j)} + i\, b_k^{(j)}$, where
$a_k^{(j)}$ and $b_k^{(j)}$ are the real and imaginary parts of $z_k^{(j)}$, respectively. Now let ε > 0.
Since (zn )n∈N is a Cauchy sequence in Cd , there exists some N ∈ N such that
∥zn − zm ∥2 < ε whenever n, m ≥ N . Thus we have for every n, m ≥ N that
$|a_n^{(j)} - a_m^{(j)}| \leq |z_n^{(j)} - z_m^{(j)}| \leq \|z_n - z_m\|_2 < \varepsilon ,$
and similarly
$|b_n^{(j)} - b_m^{(j)}| \leq |z_n^{(j)} - z_m^{(j)}| \leq \|z_n - z_m\|_2 < \varepsilon .$
Thus the sequences $(a_n^{(j)})_{n\in N}$ and $(b_n^{(j)})_{n\in N}$ are Cauchy sequences in R and therefore
have limits, say
$a^{(j)} := \lim_{n\to\infty} a_n^{(j)} \quad$ and $\quad b^{(j)} := \lim_{n\to\infty} b_n^{(j)} .$
Define now z ∈ Cd by $z := (z^{(1)}, \ldots, z^{(d)})$ with
$z^{(j)} := a^{(j)} + i\, b^{(j)} .$
We now consider specifically intervals [a, b] = [0, 1/2] and [a, b] = [1/2 + 1/N, 1]
for some fixed N ∈ N.
• Since fn (x) = 0 for all x ∈ [0, 1/2] and all n ∈ N, we have that
$0 = \lim_{n\to\infty} \int_0^{1/2} |f_n(x) - f(x)|\,dx = \lim_{n\to\infty} \int_0^{1/2} |f(x)|\,dx = \int_0^{1/2} |f(x)|\,dx .$
Since f is continuous, this is only possible if f (x) = 0 for all x ∈ [0, 1/2].
• Since fn (x) = 1 for all x ∈ [1/2 + 1/N, 1] and all n ≥ N , we have that
$0 = \lim_{n\to\infty} \int_{1/2+1/N}^{1} |f_n(x) - f(x)|\,dx = \lim_{n\to\infty} \int_{1/2+1/N}^{1} |f(x) - 1|\,dx = \int_{1/2+1/N}^{1} |f(x) - 1|\,dx .$
The above considerations hold for each N ∈ N, and thus it follows that f (x) = 0
for all 0 ≤ x ≤ 1/2, and f (x) = 1 for all 1/2 < x ≤ 1. Therefore the possible limit
of the sequence (fn )n∈N is not continuous, and thus the sequence (fn )n∈N does not
converge in C([0, 1]). Thus the space is incomplete. □
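Numerically, this behaviour is easy to observe. The following Python sketch assumes that fn denotes the usual ramp functions (equal to 0 on [0, 1/2], equal to 1 on [1/2 + 1/n, 1], and linear in between) and approximates ∥fn − f2n∥1 by quadrature: the values tend to 0, even though the pointwise limit is discontinuous.

import numpy as np

# Assumed definition of the ramp functions f_n: 0 on [0, 1/2],
# 1 on [1/2 + 1/n, 1], and linear in between.
def f(n, x):
    return np.clip(n * (x - 0.5), 0.0, 1.0)

x = np.linspace(0.0, 1.0, 200001)

# ||f_n - f_{2n}||_1, with the integral over [0, 1] approximated by the mean
# value on the grid: the differences tend to 0 ...
for n in (10, 100, 1000):
    print(n, np.abs(f(n, x) - f(2 * n, x)).mean())

# ... but the pointwise limit (0 on [0, 1/2], 1 on (1/2, 1]) is discontinuous,
# so it does not belong to C([0, 1]).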
The situation looks different when we regard the space C([0, 1]) with a different
norm.
Theorem 4.53. The space C([0, 1]) with the norm ∥f ∥∞ = maxx∈[0,1] |f (x)| is
complete.
Proof. Let (fn )n∈N be a Cauchy sequence in C([0, 1]). We have to show that
the sequence converges to some function f ∈ C([0, 1]). We start by identifying a
potential candidate for the limit function f . For each x ∈ [0, 1] we have that
$|f_n(x) - f_m(x)| \leq \max_{y\in[0,1]} |f_n(y) - f_m(y)| = \|f_n - f_m\|_\infty .$
Since (fn )n∈N is a Cauchy sequence, this implies that the sequence of real numbers
(fn (x))n∈N is a Cauchy sequence as well. Because R is complete, there exists some
f (x) ∈ R with limn→∞ fn (x) = f (x).
We will now show that the resulting function f : [0, 1] → R is the limit of the
sequence of functions (fn )n∈N . For that, we have to show that f is continuous and
that ∥fn − f ∥∞ → 0 as n → ∞.
We will first show that f is continuous. Let therefore ε > 0. Since (fn )n∈N is
a Cauchy sequence, there exists N ∈ N such that
(26) $\|f_n - f_m\|_\infty = \max_{x\in[0,1]} |f_n(x) - f_m(x)| < \varepsilon/3 \qquad$ for all n, m ≥ N .
Now let y ∈ [0, 1] be arbitrary. The function fN is continuous, and thus there
exists δ > 0 such that |fN (y)−fN (z)| < ε/3 whenever |y −z| < δ. By applying (26)
with n = N (and letting m → ∞) we thus obtain for every z with |y − z| < δ that
$|f(y) - f(z)| \leq |f(y) - f_N(y)| + |f_N(y) - f_N(z)| + |f_N(z) - f(z)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon .$
Since ε was arbitrary, this shows that f is continuous at y ∈ [0, 1]. Since y was
arbitrary, it follows that f is continuous.
Now we use (26) again (letting m → ∞) to conclude that
$\|f_n - f\|_\infty = \max_{x\in[0,1]} |f_n(x) - f(x)| \leq \varepsilon/3 \qquad$ for all n ≥ N .
Since ε > 0 was arbitrary, this shows that ∥fn − f ∥∞ → 0 as n → ∞, which concludes the proof. □
With essentially the same proof, we can show that the space C([0, 1], V ) with this
norm is complete as well provided that V is complete. Also, we can change the
interval [0, 1] to an arbitrary closed and bounded interval I ⊂ R (or a closed and
bounded set Ω ⊂ Rn ).3
Since the space Rd is complete with respect to the Euclidean norm, this implies in
particular that the space C(I; Rd ) is complete with respect to ∥·∥∞ whenever I ⊂ R
is a closed and bounded interval. This is something we will make use of in Section 7
below. ■
Theorem 4.61. Assume that (U, ∥·∥) is a Banach space and that V ⊂ U is a
subspace. Then V is complete (with respect to the restriction of the norm ∥·∥ to V )
if and only if V is closed (as a subspace of U ).
Proof. Assume that V is complete as a normed space. Let moreover (un )n∈N
be a sequence with un ∈ V for all n ∈ N and limn→∞ un = u for some u ∈ U . We
have to show that u ∈ V .
Since the sequence (un )n∈N converges in U , it is a Cauchy sequence in U . This,
however, implies that it is a Cauchy sequence in V , which is complete. Thus the
sequence (un )n∈N converges to some w ∈ V . Because of the uniqueness of the limit
3The reason for requiring the interval (or set) to be closed and bounded is that we want
the maximum actually to exist and, in particular, be finite. If we allow non-closed or unbounded
intervals, this is no longer the case, and the “norm” of a function could become +∞. We could
remedy this problem, though, by restricting ourselves to bounded continuous functions, but we
won’t go that far.
5. Equivalence of Norms
Definition 4.62. Assume that U is a vector space and that ∥·∥a and ∥·∥b
are norms on U . We say that the norms are equivalent, if there exist constants
0 < c < C < ∞ such that
c∥u∥a ≤ ∥u∥b ≤ C∥u∥a
for all u ∈ U . ■
Example 4.63. The norms ∥·∥1 and ∥·∥∞ on Kn are equivalent. Indeed, we
have for every u ∈ Kn the estimate
$\|u\|_\infty = \max_{1\leq i\leq n} |u_i| \leq \sum_{i=1}^{n} |u_i| = \|u\|_1 \leq n \max_{1\leq i\leq n} |u_i| = n \|u\|_\infty .$
Example 4.64. The norms ∥·∥1 and ∥·∥∞ on C([0, 1]) are not equivalent. Al-
though we can estimate
$\|f\|_1 = \int_0^1 |f(x)|\,dx \leq \max_{x\in[0,1]} |f(x)| = \|f\|_\infty$
for all f ∈ C([0, 1]), there exists no constant C > 0 such that ∥f ∥∞ ≤ C∥f ∥1 for
all f ∈ C([0, 1]). This can be seen by regarding the functions fn (x) := xn . We
have ∥fn ∥∞ = 1 for all n ∈ N, but ∥fn ∥1 = 1/(n + 1). Thus such a constant C > 0
would have to satisfy the estimate C ≥ n + 1 for all n ∈ N, which is of course
impossible. ■
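The blow-up of the ratio between the two norms can also be checked numerically; the following Python sketch simply evaluates ∥fn∥∞ and an approximation of ∥fn∥1 for fn(x) = x^n.

import numpy as np

x = np.linspace(0.0, 1.0, 100001)
for n in (1, 10, 100, 1000):
    fn = x ** n
    norm_inf = fn.max()            # equals 1 for every n
    norm_1 = fn.mean()             # approximates the integral, roughly 1/(n + 1)
    print(n, norm_inf, norm_1, norm_inf / norm_1)
# The ratio grows like n + 1, so there is no constant C with
# ||f||_inf <= C ||f||_1 for all f in C([0, 1]).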
If ∥·∥a and ∥·∥b are equivalent norms, then they induce the same convergence
of sequences:
Proposition 4.65. Assume that U is a vector space and ∥·∥a and ∥·∥b are
equivalent norms on U . Assume moreover that (un )n∈N is a sequence in U .
• (un )n∈N is a Cauchy sequence with respect to ∥·∥a if and only if it is a
Cauchy sequence with respect to ∥·∥b .
• (un )n∈N converges with respect to ∥·∥a if and only if it converges with
respect to ∥·∥b . Moreover, in this case the limits are the same.
Proof. Exercise. □
As a consequence, equivalent norms define the same closed (and thus open)
sets on a vector space.
Corollary 4.66. Assume that U is a vector space and ∥·∥a and ∥·∥b are equi-
valent norms on U .
• A subset A ⊂ U is open with respect to ∥·∥a if and only if it is open with
respect to ∥·∥b .
• A subset K ⊂ U is closed with respect to ∥·∥a if and only if it is closed with
respect to ∥·∥b .
Corollary 4.67. Assume that U is a vector space and ∥·∥a and ∥·∥b are equi-
valent norms on U . Then U is complete with respect to ∥·∥a if and only if it is
complete with respect to ∥·∥b .
Of particular importance is the case of a finite dimensional vector space. Here
we can show that in fact all norms are equivalent. Thus the choice of the norm has
no influence on the convergence of sequences or continuity of functions.
Theorem 4.68. Assume that U is a finite dimensional vector space. Then all
norms on U are equivalent.
Proof. Let (u1 , . . . , un ) be a basis of U . We define a norm ∥·∥1 on U by setting
$\|u\|_1 := \sum_{i=1}^{n} |x_i| \quad$ if $\quad u = \sum_{i=1}^{n} x_i u_i .$
That is, ∥u∥1 is the 1-norm of the coordinate representation of u with respect to the
basis (u1 , . . . , un ). It is then sufficient to show that all norms on U are equivalent
to ∥·∥1 .
Assume therefore that ∥·∥ is a norm on U . Define
C := max{∥u1 ∥, . . . , ∥un ∥}.
Assume moreover that
$u = \sum_{i=1}^{n} x_i u_i$
is a vector in U . Then
$\|u\| = \Big\| \sum_{i=1}^{n} x_i u_i \Big\| \leq \sum_{i=1}^{n} |x_i| \|u_i\| \leq \sum_{i=1}^{n} |x_i| \max_{1\leq j\leq n} \|u_j\| = C \|u\|_1 .$
Next define
$c := \inf\{ \|v\| : \|v\|_1 = 1 \} .$
Clearly, c ≥ 0. In fact, we will show that c > 0. For this we note first that we can
equivalently write c as
$c = \inf\Big\{ \Big\| \sum_{i=1}^{n} x_i u_i \Big\| : x \in K^n ,\ \sum_{i=1}^{n} |x_i| = 1 \Big\} .$
Define now the mapping f : Kn → R,
$f(x) := \Big\| \sum_{i=1}^{n} x_i u_i \Big\| .$
We claim that f is continuous. By the reverse triangle inequality for the norm ∥·∥
we have for all x, y ∈ Kn that
$|f(x) - f(y)| = \Big| \Big\| \sum_{i=1}^{n} x_i u_i \Big\| - \Big\| \sum_{i=1}^{n} y_i u_i \Big\| \Big| \leq \Big\| \sum_{i=1}^{n} (x_i - y_i) u_i \Big\|$
$\leq \sum_{i=1}^{n} |x_i - y_i| \|u_i\| \leq C \sum_{i=1}^{n} |x_i - y_i| = C \sum_{i=1}^{n} 1 \cdot |x_i - y_i| \leq C \sqrt{n}\, \|x - y\|_2 .$
Here we have used the Cauchy–Schwarz–Bunyakovsky inequality in the last step.
This, however, shows that f in fact is Lipschitz continuous on Kn . Now the
Extremal Value Theorem implies that the function f attains its minimum (and maximum)
on the closed and bounded set $K := \{ x \in K^n : \sum_{i=1}^{n} |x_i| = 1 \}$. Moreover,
by construction of f we have that f (x) > 0 for all x ∈ K. Thus
$c = \inf_{x\in K} f(x) > 0 .$
Now let u ∈ U \ {0} and denote v := u/∥u∥1 . Then ∥v∥1 = 1 and thus ∥v∥ ≥ c.
Thus
$\|u\| = \|u\|_1 \Big\| \frac{u}{\|u\|_1} \Big\| = \|u\|_1 \|v\| \geq c \|u\|_1 ,$
which concludes the proof. □
Remark 4.70. Sometimes one defines the Lipschitz constant of f as the smal-
lest constant for which (27) holds. We will refrain from doing so, as the smallest
such constant is often difficult to find, whereas finding any such constant is often
feasible in practice. ■
Definition 4.73 (Contraction). Let (U, ∥·∥U ) and (V, ∥·∥V ) be normed spaces
and let f : U → V be a function. We say that f is a contraction, if f is Lipschitz
continuous with a Lipschitz constant L satisfying 0 ≤ L < 1. ■
Theorem 4.75 (Banach’s Fixed Point Theorem). Assume that (U, ∥·∥) is a
Banach space and that f : U → U is a contraction. Then there exists a unique
element u∗ ∈ U (a fixed point of f ) such that
(28) f (u∗ ) = u∗ .
Proof. Let 0 ≤ L < 1 be a Lipschitz constant of f .
We start by showing that the solution of (28), if it exists, is unique. Assume
therefore that u, v ∈ U satisfy f (u) = u and f (v) = v. Then
∥u − v∥ = ∥f (u) − f (v)∥ ≤ L∥u − v∥.
Since L < 1, this is only possible if ∥u − v∥ = 0 and thus u = v. This shows
uniqueness of a fixed point of f . It remains to show that a fixed point actually
exists.
Let now u0 ∈ U be arbitrary and consider the sequence given by
un+1 = f (un ).
We will show that this sequence converges to a fixed point of f .
We start by showing that this is actually a Cauchy sequence. For that, we note
that for all n ∈ N,
∥un+1 − un ∥ = ∥f (un ) − f (un−1 )∥ ≤ L∥un − un−1 ∥.
By induction over n we thus obtain that
∥un+1 − un ∥ ≤ Ln ∥u1 − u0 ∥.
Thus we have for all n ∈ N and k ∈ N that
(29) $\|u_{n+k} - u_n\| \leq \sum_{j=1}^{k} \|u_{n+j} - u_{n+j-1}\| \leq \sum_{j=1}^{k} L^{n+j-1} \|u_1 - u_0\|$
$= L^n \|u_1 - u_0\| \sum_{j=1}^{k} L^{j-1} \leq L^n \|u_1 - u_0\| \sum_{j=0}^{\infty} L^j = \frac{L^n}{1-L}\, \|u_1 - u_0\| .$
un+1 = f (un )
f (u) ∈ K whenever u ∈ K
and let h ∈ C([0, 1]) be a given function. We want to solve the integral equation
(30) $f(x) + \int_0^1 k(x, y)\, f(y)\,dy = h(x) .$
That is, we want to find a function f ∈ C([0, 1]) such that (30) holds for each
x ∈ [0, 1]. Using Banach’s fixed point theorem, we can show that this equation has
a unique solution:
Since k is continuous, it follows that the function $x \mapsto \int_0^1 k(x, y) f(y)\,dy$ is
continuous. Since h is assumed to be continuous as well, we can therefore define
the mapping T : C([0, 1]) → C([0, 1]),
$T f(x) := h(x) - \int_0^1 k(x, y)\, f(y)\,dy .$
= c∥f − g∥∞ .
Thus T is a contraction on the Banach space (C([0, 1]), ∥·∥∞ ), and therefore T
has a unique fixed point. Put differently, the equation (30) has, under the given
assumptions on k and h, a unique continuous solution f .
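To make the fixed point argument concrete, the following Python sketch solves one particular instance of (30) by iterating T, with the integral replaced by a trapezoidal quadrature; the kernel k(x, y) = sin(xy)/4 and the right hand side h(x) = x are ad hoc choices, made only so that T is a contraction.

import numpy as np

# Illustrative data (chosen here, not taken from the text): kernel and right
# hand side such that c = max_x int_0^1 |k(x,y)| dy <= 1/4 < 1.
m = 201
x = np.linspace(0.0, 1.0, m)
K = np.sin(np.outer(x, x)) / 4.0     # K[i, j] = k(x_i, x_j)
h = x.copy()

w = np.full(m, 1.0 / (m - 1))        # trapezoidal quadrature weights
w[0] = w[-1] = 0.5 / (m - 1)

f = np.zeros(m)                      # arbitrary starting point f_0 = 0
for it in range(50):
    f_new = h - K @ (w * f)          # (T f)(x_i) = h(x_i) - int_0^1 k(x_i, y) f(y) dy
    err = np.max(np.abs(f_new - f))
    f = f_new
    if err < 1e-12:
        break

# f now approximates the unique continuous solution of (30);
# the residual below should be close to zero.
print(it, np.max(np.abs(f + K @ (w * f) - h)))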
then y is necessarily differentiable (because the right hand side is) and it satis-
fies (31).
Define therefore the mapping T : C(I; Rd ) → C(I; Rd ),
$T y(t) := y_0 + \int_{t_0}^{t} f(s, y(s))\,ds .$
Then y solves the initial value problem (31) if and only if it satisfies the fixed point
equation T y = y.
Step 2: Restrict T to a suitable subset of C(I; Rd ).
Our goal is to apply Banach’s fixed point theorem to the equation T y = y. For
that, we need to find a suitable subset K ⊂ C(I; Rd ) such that T maps K into itself
and T is a contraction on K. Define therefore
$K := \{ y \in C(I; R^d) : |y(t) - y_0|_2 \leq b \ \text{for all } t \in I \} .$
Assume now that t > t0 . Since the mapping t 7→ |y(t)−y0 |2 is continuous, there
exists a minimal value t0 < t1 < t0 + a such that |y(t1 ) − y0 |2 = b. In particular,
|y(s) − y0 |2 < b for all t0 < s < t1 . As a consequence
$b = |y(t_1) - y_0|_2 = \Big| \int_{t_0}^{t_1} f(s, y(s))\,ds \Big|_2 \leq M |t_1 - t_0| < M a = b ,$
Since
$1 \geq e^{-2L|t - t_0|} \geq e^{-2La}$
for all t ∈ I, we have that
$e^{-2La} \|y\|_\infty \leq \|y\| \leq \|y\|_\infty$
for all y ∈ C(I; Rd ). Thus ∥·∥ and ∥·∥∞ are equivalent norms on C(I; Rd ). Since
C(I; Rd ) with the norm ∥·∥∞ is complete, this implies that it is also complete with
the norm ∥·∥ and thus a Banach space.
Also, a set is closed with respect to ∥·∥ if and only if it is closed with respect to
∥·∥∞ . Since K is precisely the closed ball of radius b around the constant function
y0 with respect to ∥·∥∞ , it is closed for that norm. Thus K is closed for the norm
∥·∥ as well.
Step 6: Collect everything and call it a day.
We have shown that K is a closed subset of the Banach space C(I; Rd ) (Step
5), that T y ∈ K whenever y ∈ K (Step 2), and that the restriction of T to K is a
contraction (Step 4). Thus Banach’s fixed point theorem implies that the mapping
T has a unique fixed point y ∗ in K. Since solving the ODE (31) is equivalent to
finding a fixed point of T in C(I; Rd ) (Step 1), we can conclude that (31)
has at least one solution (namely y ∗ ). Moreover, if z were another solution, it would be
a fixed point of T distinct from y ∗ and thus could not be contained in K. This, however,
is not possible by Step 3. □
Remark 4.79. It is possible to show that the ODE (31) has a solution if the
function f is continuous; the Lipschitz condition we have assumed in the proof is
actually not necessary for that (this is the statement of the Peano existence theorem
for ODEs). However, without the Lipschitz condition, we can no longer guarantee
that the solution is unique. A classical example for this is the ODE
y ′ = y 1/3 , y(0) = 0.
An obvious solution of the initial value problem is the constant function y(t) = 0.
However, the functions
y± (t) = ±(2t/3)3/2
for t ≥ 0 are also solutions of the same ODE. Even worse, for any T > t0 we can
define the functions
$y_{\pm,T}(t) := \begin{cases} 0 & \text{if } t < T, \\ \pm \big( 2(t - T)/3 \big)^{3/2} & \text{if } t \geq T. \end{cases}$
All of these functions solve the same initial value problem.
This does not contradict the Picard–Lindelöf Theorem, though, as the function
y 7→ y 1/3 is not Lipschitz continuous in a neighbourhood of 0. ■
Remark 4.80 (Picard iteration). Since we have proved Theorem 4.78 by show-
ing that the Banach fixed point Theorem is applicable, we obtain in addition that
the fixed point iteration (the Picard iteration)
(34) $y_{k+1}(t) = T y_k(t) = y_0 + \int_{t_0}^{t} f(s, y_k(s))\,ds$
(where we use the initialisation y0 (t) = y0 , although this is not necessary for the
convergence) actually converges to a solution of the ODE (31).
If we apply this to the ODE
y ′ = y, y(0) = 1,
we obtain the iterates
$y_0(t) = 1 ,$
$y_1(t) = 1 + \int_0^t 1\,ds = 1 + t ,$
$y_2(t) = 1 + \int_0^t (1 + s)\,ds = 1 + t + \frac{t^2}{2} ,$
$y_3(t) = 1 + \int_0^t \Big( 1 + s + \frac{s^2}{2} \Big)\,ds = 1 + t + \frac{t^2}{2} + \frac{t^3}{6} ,$
and, in general,
$y_n(t) = \sum_{k=0}^{n} \frac{t^k}{k!} ,$
which happens to be the truncated Taylor series expansion of the exponential func-
tion.
More generally, one can in principle use the iteration (34) in order to define a
numerical solution method for ODEs. A difficulty here is that the integrals that
need to be evaluated in each step of the iteration can in general not be solved
analytically. This problem can, however, be solved in practice by falling back
to numerical quadratures. Still, methods based on Picard iteration appear to be
limited to some niche applications, and in general one uses either some Runge–Kutta
method or some multi-step method for the numerical solution of initial value
problems. ■
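The following Python sketch performs the Picard iteration (34) for the example y′ = y, y(0) = 1, with the integral evaluated by a cumulative trapezoidal rule instead of analytically; the error to e^t on [0, 1] decays rapidly with the number of iterations, in accordance with the computation above.

import numpy as np

# Picard iteration (34) for y' = y, y(0) = 1 on [0, 1], where the integral
# int_0^t y_k(s) ds is approximated by a cumulative trapezoidal rule.
t = np.linspace(0.0, 1.0, 1001)
dt = t[1] - t[0]

def cumtrapz(values):
    # cumulative trapezoidal integral, same length as `values`
    out = np.zeros_like(values)
    out[1:] = np.cumsum((values[1:] + values[:-1]) * dt / 2.0)
    return out

y = np.ones_like(t)                  # initialisation y_0(t) = 1
for k in range(1, 21):
    y = 1.0 + cumtrapz(y)            # y_{k+1}(t) = 1 + int_0^t y_k(s) ds
    print(k, np.max(np.abs(y - np.exp(t))))   # error decays roughly like 1/k!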
8. p-norms
A particularly interesting and useful class of norms on Kn are the so called
p-norms, which are generalisations of the Euclidean norm as well as the norms ∥·∥1
and ∥·∥∞ .
Definition 4.81 (p-norm on Kn ). Let 1 ≤ p < ∞. We define the mapping
∥·∥p : Kn → R,
$\|x\|_p := \big( |x_1|^p + \ldots + |x_n|^p \big)^{1/p}$
for x ∈ Kn .
Similarly, we define ∥·∥∞ : Kn → R by
$\|x\|_\infty := \max\{ |x_1|, \ldots, |x_n| \}$
for x ∈ Kn . ■
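Written out in code, the definition reads as follows (a small Python sketch; the test vector is arbitrary):

import math

def p_norm(x, p):
    # ||x||_p for a finite sequence x of (possibly complex) numbers
    if p == math.inf:
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3, -4, 1j]
print(p_norm(x, 1), p_norm(x, 2), p_norm(x, math.inf))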
We will show in the following that ∥·∥p is indeed a norm on Kn . The symmetry
and positivity are fairly obvious, but the triangle inequality is not, unless we are in
the simpler cases p = 1 or p = ∞. Also, for p = 2, we simply obtain the Euclidean
norm, which we have previously considered in the context of inner products.
For proving the triangle inequality in all other cases, we will need some addi-
tional results.
Definition 4.82 (Conjugate). Let 1 < p < ∞. We define the conjugate (or
Hölder conjugate) of p as the solution 1 < p∗ < ∞ of the equation
1 1
+ = 1.
p p∗
Moreover, for p = 1 we define the conjugate as p∗ := ∞, and for p = ∞ as
p∗ := 1. ■
Proof. For b = 0, this inequality holds obviously. Now assume that b > 0 and
consider the function f : R≥0 → R,
$f(a) := ab - \frac{1}{p}\, a^p .$
Then (35) is equivalent to the statement that $f(a) \leq \frac{1}{p^*}\, b^{p^*}$ for all a ≥ 0. In order to
show this, we compute the maximum of f . Since lima→∞ f (a) = −∞ (because the
term − p1 ap decreases superlinearly towards −∞, whereas the term ab only increases
linearly), this maximum actually exists in R≥0 . Moreover, since f ′ (0) = b > 0, the
point 0 cannot be the maximum of f , and thus the function f attains the maximum
at some point 0 < a < ∞. In particular, we can therefore find this maximum by
computing the derivative of f and setting it to 0. We have that
$f'(a) = b - a^{p-1} ,$
which is equal to 0 for a = b1/(p−1) . Thus a = b1/(p−1) is the unique maximiser of
f , and therefore f (a) ≤ f (b1/(p−1) ) for all a ≥ 0. Thus
$ab - \frac{1}{p}\, a^p = f(a) \leq f(b^{1/(p-1)}) = b \cdot b^{1/(p-1)} - \frac{1}{p}\, b^{p/(p-1)}$
$= b^{p/(p-1)} - \frac{1}{p}\, b^{p/(p-1)} = \frac{p-1}{p}\, b^{p/(p-1)} = \frac{1}{p^*}\, b^{p^*} ,$
which proves (35). □
Lemma 4.84 (Hölder’s inequality). Assume that 1 < p < ∞. Then we have for
all x, y ∈ Kn that
(36) $\Big| \sum_{k=1}^{n} x_k y_k \Big| \leq \sum_{k=1}^{n} |x_k| |y_k| \leq \Big( \sum_{k=1}^{n} |x_k|^p \Big)^{1/p} \Big( \sum_{k=1}^{n} |y_k|^{p^*} \Big)^{1/p^*} = \|x\|_p \|y\|_{p^*} .$
■
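A quick randomised check of (36) (a small Python sketch; the dimension, the exponent p, and the number of trials are arbitrary):

import random

def p_norm(x, p):
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

random.seed(0)
p = 3.0
p_star = p / (p - 1.0)               # the Hölder conjugate, 1/p + 1/p* = 1

for _ in range(5):
    x = [random.uniform(-1, 1) for _ in range(10)]
    y = [random.uniform(-1, 1) for _ in range(10)]
    lhs = abs(sum(xi * yi for xi, yi in zip(x, y)))
    rhs = p_norm(x, p) * p_norm(y, p_star)
    print(lhs <= rhs + 1e-12, lhs, rhs)    # Hölder's inequality (36)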
We now show that this defines a norm on C([0, 1]). The steps towards this
result are essentially the same as for the case (Kn , ∥·∥p ). We will first show Hölder's
inequality for the integral, and then Minkowski’s inequality. Also, the proofs of
these inequalities are essentially the same as for Kn . All we have to do is to replace
every sum by an integral.4
Lemma 4.88 (Hölder’s inequality). Assume that 1 < p < ∞. Then we have for
all f , g ∈ C([0, 1]) that
(39) $\Big| \int_0^1 f(x) g(x)\,dx \Big| \leq \int_0^1 |f(x) g(x)|\,dx$
$\leq \Big( \int_0^1 |f(x)|^p\,dx \Big)^{1/p} \Big( \int_0^1 |g(x)|^{p^*}\,dx \Big)^{1/p^*} = \|f\|_p \|g\|_{p^*} .$
$= \frac{1}{p} \int_0^1 \frac{|f(x)|^p}{\|f\|_p^p}\,dx + \frac{1}{p^*} \int_0^1 \frac{|g(x)|^{p^*}}{\|g\|_{p^*}^{p^*}}\,dx = \frac{1}{p} + \frac{1}{p^*} = 1 .$
□
Lemma 4.89 (Minkowski’s inequality). Assume that 1 < p < ∞. Then
$\|f + g\|_p = \Big( \int_0^1 |f(x) + g(x)|^p\,dx \Big)^{1/p} \leq \Big( \int_0^1 |f(x)|^p\,dx \Big)^{1/p} + \Big( \int_0^1 |g(x)|^p\,dx \Big)^{1/p} = \|f\|_p + \|g\|_p$
for all f , g ∈ C([0, 1]).
Proof. We estimate
$\|f + g\|_p^p = \int_0^1 |f(x) + g(x)|^p\,dx \leq \int_0^1 |f(x) + g(x)|^{p-1} \big( |f(x)| + |g(x)| \big)\,dx$
$\leq \Big( \int_0^1 |f(x) + g(x)|^p\,dx \Big)^{(p-1)/p} \Big[ \Big( \int_0^1 |f(x)|^p\,dx \Big)^{1/p} + \Big( \int_0^1 |g(x)|^p\,dx \Big)^{1/p} \Big] ,$
where we have used Hölder's inequality (39) and the fact that p∗ = p/(p − 1) in the
last step. Then we obtain the result by dividing by $\big( \int_0^1 |f(x) + g(x)|^p\,dx \big)^{(p-1)/p}$. □
Theorem 4.90. For all 1 ≤ p ≤ ∞, the function ∥·∥p : C([0, 1]) → R is a norm
on C([0, 1]).
4The fact that we can use essentially the same proof in two seemingly different situations
should give us a hint that these situations are not that different after all. In fact, both cases can
be seen as special cases of a result that holds on general so called “measure spaces.” For (many)
more results on this topic, I refer to the course TMA4225 – Foundations of Analysis.
9. Sequence Spaces
Assume that f : R → C is a periodic function with period 2π. Assuming enough
regularity of f , we can then represent it as a Fourier series of the form
$f(x) = \sum_{n \in Z} c_n e^{inx} ,$
where
$c_n = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-inx}\,dx$
is the n-th Fourier coefficient of f . This Fourier coefficient indicates the contribution
of oscillations of angular frequency n to the whole function f . For many practical
problems it then can make sense to work directly with the Fourier coefficients
instead of working with the function f itself. In other words, it makes sense to
identify the function f with its Fourier series, which is a sequence (cn )n∈Z . In this
section, we will thus develop a theory of spaces of sequences and norms on such
spaces.
Definition 4.91 (ℓp -spaces). Let 1 ≤ p < ∞. By ℓp (or also ℓp (N)), we denote
the set of all sequences (xk )k∈N such that
$\|x\|_p := \Big( \sum_{k=1}^{\infty} |x_k|^p \Big)^{1/p} < \infty .$
Moreover, we denote by ℓ∞ (or ℓ∞ (N)) the set of all bounded sequences, that is,
all sequences for which
$\|x\|_\infty := \sup_{k\in N} |x_k| < \infty .$
■
In the following we will show that these are indeed vector spaces and that the
mapping ∥·∥p : ℓp → R is a norm on ℓp .
Remark 4.93. One or both sides of the inequality (40) may in principle be
infinite. However, the inequality in this case still is satisfied. That is, if the right
hand side of (40) is finite, then so is the left hand side. In other words, if x ∈ ℓp
and y ∈ ℓp∗ , then the pointwise product z defined by zk = xk yk satisfies z ∈ ℓ1 . ■
Theorem 4.94 (ℓp as normed space). Let 1 ≤ p ≤ ∞. Then (ℓp , ∥·∥p ) is a
normed space.
Proof. We only prove the claim for 1 ≤ p < ∞ and leave the case p = ∞ to
the interested reader.
We have to show that ℓp is a vector space and that ∥·∥p is a norm on ℓp . For
the former, we will show that ℓp is a subspace of the vector space of all sequences
for all n ∈ N. Taking the limit n → ∞ and using that ∥x∥p < ∞ and ∥y∥p < ∞, it
follows that
$\|x+y\|_p = \Big( \sum_{k=1}^{\infty} |x_k + y_k|^p \Big)^{1/p} \leq \Big( \sum_{k=1}^{\infty} |x_k|^p \Big)^{1/p} + \Big( \sum_{k=1}^{\infty} |y_k|^p \Big)^{1/p} = \|x\|_p + \|y\|_p < \infty .$
This shows that x + y ∈ ℓp , and also that ∥·∥p satisfies the triangle inequality.
Thus ℓp is a subspace of the space of all sequences and thus a vector space
itself, and ∥·∥p is a norm on ℓp . □
The space ℓ∞ is the space of all bounded sequences, whereas ℓ1 is the space
of all absolutely summable sequences. Since every summable sequence necessarily
has to be bounded (and, in fact, converge to 0), it follows that ℓ1 is contained
in ℓ∞ . However, the converse inclusion does not hold, as the constant sequence
x = (1, 1, 1, . . .) is contained in ℓ∞ but certainly not in ℓ1 . Thus we have that
ℓ1 ⊊ ℓ∞ . This type of inclusion can be generalised to all ℓp spaces:
Lemma 4.96. Assume that 1 < p < q < ∞. Then
ℓ1 ⊊ ℓp ⊊ ℓq ⊊ ℓ∞ .
Proof. We show first that all the inclusions hold. Here we note first that
all sequences in ℓq necessarily have to converge to 0 and thus are bounded, which
proves the last inclusion. Next assume that x ∈ ℓp . Then
$\|x\|_q^q = \sum_{k=1}^{\infty} |x_k|^q = \sum_{k=1}^{\infty} |x_k|^{q-p} |x_k|^p \leq \sup_{k\in N} |x_k|^{q-p} \sum_{k=1}^{\infty} |x_k|^p = \sup_{k\in N} |x_k|^{q-p}\, \|x\|_p^p .$
Since q > p and the sequence x is bounded, it follows that ∥x∥q < ∞ and thus
x ∈ ℓq . This shows that ℓp ⊂ ℓq . Moreover, the inclusion ℓ1 ⊂ ℓp can be shown
using a similar argument.
Now we show that the inclusions are actually strict. First we note that the
constant sequence x = (1, 1, . . .) is contained in ℓ∞ but not in ℓq , which shows that
ℓq ⊊ ℓ∞ . Next consider the sequence x = (x1 , x2 , . . .) given by
$x_k = \frac{1}{k^{1/p}} .$
Then
$\|x\|_q^q = \sum_{k=1}^{\infty} \frac{1}{k^{q/p}} < \infty$
as q > p and therefore x ∈ ℓq . However,
$\|x\|_p^p = \sum_{k=1}^{\infty} \frac{1}{k} = +\infty ,$
which shows that x ̸∈ ℓp . Thus ℓp ⊊ ℓq . Moreover, the proof of the strict inclusion
ℓ1 ⊊ ℓp is similar. □
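A numerical impression of the strict inclusion ℓp ⊊ ℓq: for xk = k^(−1/p) the truncated q-th power sums stabilise, while the truncated p-th power sums (the harmonic series) keep growing. The truncation indices in the Python sketch below are of course arbitrary.

# Truncated power sums for x_k = k**(-1/p): the q-sums converge (since q/p > 1),
# while the p-sums form the harmonic series and diverge.
p, q = 2.0, 3.0
for N in (10**3, 10**4, 10**5):
    s_q = sum((k ** (-1.0 / p)) ** q for k in range(1, N + 1))
    s_p = sum((k ** (-1.0 / p)) ** p for k in range(1, N + 1))
    print(N, round(s_q, 4), round(s_p, 4))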
Lemma 4.97 (Inner product on ℓ2 ). The norm ∥·∥2 on the space ℓ2 is induced
by the inner product
$\langle x, y \rangle := \sum_{k=1}^{\infty} x_k \overline{y_k} .$
Proof. It is clear from the definition that ∥x∥2 = (⟨x, x⟩)1/2 . Thus we only
have to show that ⟨·, ·⟩ is actually an inner product. However, it is straightforward
to verify that all the requirements for an inner product, that is, linearity in the first
component, conjugate symmetry, and positive definiteness, are satisfied. □
As a final result, we show that all the ℓp -spaces are actually complete.
Theorem 4.98. The space (ℓp , ∥·∥p ) is a Banach space for all 1 ≤ p ≤ ∞, and
a Hilbert space for p = 2.
Proof. We only have to show that (ℓp , ∥·∥p ) is complete. For simplicity, we
will restrict ourselves to the case p = 1. (The case 1 < p < ∞ is similar; the case
p = ∞ is simpler, and actually a problem on the current exercise sheet.)
Assume that (x(n) )n∈N ⊂ ℓ1 is a Cauchy sequence. We have to show that this
sequence convergences with respect to ∥·∥1 to some x ∈ ℓ1 . We start by identifying
a candidate for x.
Since (x(n) )n∈N is a Cauchy sequence, there exists for all ε > 0 some N ∈ N
such that
$\|x^{(n)} - x^{(m)}\|_1 = \sum_{k=1}^{\infty} |x_k^{(n)} - x_k^{(m)}| < \varepsilon$
whenever n, m ≥ N . In particular, this shows that
$|x_k^{(n)} - x_k^{(m)}| < \varepsilon$
for all k ∈ N and all n, m ≥ N , and thus $(x_k^{(n)})_{n\in N} \subset K$ is for all k ∈ N a Cauchy
sequence in K. Since K is complete, it follows that $x_k := \lim_{n\to\infty} x_k^{(n)}$ exists for
all k ∈ N. Define now x := (x1 , x2 , . . .). We will show that x ∈ ℓ1 and that
limn→∞ ∥x(n) − x∥1 = 0, that is, x = limn→∞ x(n) with respect to ∥·∥1 .
Note here that we were able to exchange the order of the limit and the sum, as we
were only dealing with a finite sum. The estimate (41) holds for all K ∈ N. Thus
it still holds in the limit K → ∞, which shows that
$\|x\|_1 = \lim_{K\to\infty} \sum_{k=1}^{K} |x_k| \leq R ,$
and, in particular, x ∈ ℓ1 .
Finally we show that limn→∞ ∥x(n) − x∥1 = 0. Let therefore ε > 0. Since
(x(n) )n∈N is a Cauchy sequence, there exists N ∈ N such that ∥x(n) − x(m) ∥1 < ε
whenever n, m ≥ N . In particular, we have for every K ∈ N that
$\sum_{k=1}^{K} |x_k^{(n)} - x_k^{(m)}| < \varepsilon \qquad$ for all n, m ≥ N.
Taking the limit m → ∞, we obtain that
$\sum_{k=1}^{K} |x_k^{(n)} - x_k| \leq \varepsilon$
for all K ∈ N and all n ≥ N . Now we can again take the limit K → ∞ and obtain
that
$\|x^{(n)} - x\|_1 = \sum_{k=1}^{\infty} |x_k^{(n)} - x_k| \leq \varepsilon$
for all n ≥ N . Since ε > 0 was arbitrary, this proves that ∥x(n) − x∥1 → 0. Thus
(ℓ1 , ∥·∥1 ) is complete. □
10. Completions
We have seen that there exist incomplete normed spaces like, for instance, the
space C([0, 1]) with the norm ∥·∥1 . Now one can ask the question, whether it is
possible to enlarge this space in such a way that it becomes complete. The basic
intuition for this is that we only need to “add all limits of the non-convergent
Cauchy sequences” to the space in order to achieve this goal. In this section, we
will discuss this idea in a somewhat more mathematical manner. For that, we
have to introduce some more notation.
Definition 4.99 (Isomorphism). Let U and V be vector spaces (over the same
field K). An isomorphism between U and V is a bijective linear transformation
T : U → V . The spaces U and V are called isomorphic, if an isomorphism between
U and V exists. ■
Example 4.100. The spaces Kn+1 and Pn (K) (of polynomials of degree ≤ n)
are isomorphic. An example of an isomorphism between Kn+1 and Pn (K) is the
mapping that maps a vector (c0 , c1 , . . . , cn ) to the polynomial p(x) = c0 + c1 x +
. . . + cn xn . ■
Example 4.101. The spaces $K^{n^2}$ and Matn (K) are isomorphic. One example
of an isomorphism between $K^{n^2}$ and Matn (K) is the mapping that maps a vector
$x \in K^{n^2}$ to the matrix A = (aij )1≤i,j≤n ∈ Matn (K) with entries given as
$a_{ij} = x_{n(i-1)+j}$ for 1 ≤ i, j ≤ n. ■
Essentially, we say that the spaces U and V are isomorphic if they “look the
same” as vector spaces. However, we ignore all additional structures like norms or
inner product in this definition.
Definition 4.103 (Isometry). Let (U, ∥·∥U ) and (V, ∥·∥V ) be normed spaces
and let T : U → V be linear. We say that T is an isometry or an embedding, if
∥T u∥V = ∥u∥U for all u ∈ U .
We say that (U, ∥·∥U ) is embedded in (V, ∥·∥V ), if there exists an embedding
T: U →V. ■
Example 4.105. The space (Kn , ∥·∥p ) is embedded in (Kn+1 , ∥·∥p ). One ex-
ample of an embedding is the mapping T : Kn → Kn+1 defined by (x1 , . . . , xn ) 7→
(x1 , . . . , xn , 0).
In general, however, the space (Kn , ∥·∥p ) is not embedded in (Kn+1 , ∥·∥q ) if
q ̸= p. Whereas there exist many linear mappings T : Kn → Kn+1 , in general it is
impossible to find any one that satisfies ∥T u∥q = ∥u∥p for all u ∈ Kn .5 ■
Theorem 4.106 (Completion). Assume that (U, ∥·∥U ) is a normed space. Then
there exists a Banach space (V, ∥·∥V ) and an embedding i : U → V such that ran i
is dense in V .
Moreover, the space V is unique in the following sense: If W is another Banach
space with the same properties, then the spaces V and W are isometrically iso-
morphic. (That is, there exists a bijective isometry T : V → W .)
Definition 4.107. The space (V, ∥·∥V ) from Theorem 4.106 is called the com-
pletion of (U, ∥·∥U ). ■
Intuitively, we can say that U is a dense subspace of V and that the mapping
i : U → V is just the inclusion operator that maps u ∈ U to itself, but now seen as
an element of V . Because of the density of U in V , it now follows that every element
v ∈ V can be approached by a sequence (uk )k∈N ⊂ U such that v = limk→∞ uk .
We thus can say informally that the completion V of U consists of the “limits” of
all Cauchy sequences in the space U .
We now consider the case p = ∞. In this case, it is easy to see that cfin is not
dense in ℓ∞ . Indeed, take x = (1, 1, . . .) ∈ ℓ∞ . Then, if y = (y1 , . . . , yK , 0, . . .) ∈ cfin
is any finite sequence, it follows that ∥x − y∥∞ ≥ 1. Thus it is impossible to find
any sequence of finite sequences that converges to x.
In fact, one can show that the completion of cfin with respect to ∥·∥∞ is the
space c0 of all sequences that converge to 0. For a proof, I have to refer to the
exercises. ■
Remark 4.109. One important example in practice is the case of the space
C([0, 1]) with the norm ∥·∥p , 1 ≤ p < ∞. Here the completion of (C([0, 1]), ∥·∥p ) is
the space Lp ([0, 1]) of Lebesgue p-integrable functions. A detailed discussion of the
construction of these spaces can be found in the course TMA4225 - Foundations of
Analysis. ■
Example 5.7. We now revisit the examples from the start of this section.
• Every mapping A : Kn → Km (defined by a matrix) is bounded and thus
continuous with respect to ∥·∥1 on both spaces. More general, it is possible
to show that every linear mapping between finite dimensional normed
spaces is continuous.
• The mapping T : C([0, 1]) → K, T f = f (1), is bounded and thus continu-
ous with respect to ∥·∥∞ , but unbounded and discontinuous with respect
to ∥·∥1 .
• If k : [0, 1]2 → K is continuous, then the mapping T : C([0, 1]) → C([0, 1]),
$T f(x) = \int_0^1 k(x, y) f(y)\,dy$, is bounded and thus continuous with respect
to ∥·∥∞ .
■
Both of these operators are bounded. Specifically, we have that ∥Rx∥p = ∥x∥p for
all x ∈ ℓp , and ∥Lx∥p ≤ ∥x∥p for all x ∈ ℓp . ■
This mapping is well-defined (in the sense that the infinite series on the right hand
side converges for all x ∈ ℓ1 ), as
$\Big| \sum_{k=1}^{\infty} x_k y_k \Big| \leq \sup_{k} |y_k| \, \sum_{k=1}^{\infty} |x_k| = \|y\|_\infty \|x\|_1 .$
In fact, this last inequality actually shows that T is bounded, the constant C in (42)
being ∥y∥∞ .
More generally, let 1 ≤ p ≤ ∞ and let p∗ be the Hölder conjugate of p. Let
moreover y ∈ ℓp∗ be fixed and define T : ℓp → K,
$T x = \sum_{k=1}^{\infty} x_k y_k .$
Assume now that U and V are normed spaces. Our next goal is to show that
the set of bounded linear operators T : U → V is a subspace of L(U, V ). Moreover,
we will be able to define a natural norm on that space.
Definition 5.11. Let (U, ∥·∥U ) and (V, ∥·∥V ) be normed spaces. By B(U, V )
we denote the set of bounded linear operators T : U → V . ■
Lemma 5.12. Let (U, ∥·∥U ) and (V, ∥·∥V ) be normed spaces. The set B(U, V )
is a vector space.
Theorem 5.16. Let (U, ∥·∥U ) and (V, ∥·∥V ) be normed spaces. Then ∥·∥B(U,V )
as defined in (43) defines a norm on B(U, V ).
Proof. Exercise. □
Lemma 5.17. Assume that (U, ∥·∥U ), (V, ∥·∥V ), and (W, ∥·∥W ) are normed
spaces and that T : U → V and S : V → W are bounded linear. Then the
composition S ◦ T : U → W is bounded linear and
(45) ∥S ◦ T ∥B(U,W ) ≤ ∥S∥B(V,W ) ∥T ∥B(U,V ) .
Proof. For every u ∈ U we have
∥(S ◦ T )u∥W = ∥S(T u)∥W ≤ ∥S∥B(V,W ) ∥T u∥V ≤ ∥S∥B(V,W ) ∥T ∥B(U,V ) ∥u∥U .
This shows that S ◦ T is bounded. Moreover, we can use C = ∥S∥B(V,W ) ∥T ∥B(U,V )
as a constant in the definition (42). Now (45) follows from the definition of ∥S ◦
T ∥B(U,W ) as the smallest such constant. □
the norm of the matrix (or operator) A with respect to the p-norm on both spaces.
We have four different cases:
(1) The case p = 2, that is,
$\|A\|_2 := \sup_{\|x\|_2 = 1} \|Ax\|_2 .$
(4) All other cases, that is, p ̸= 1, 2, ∞. Here no analytic formulas for the p-
norm of a general matrix A ∈ Km×n exist, but it is possible to approximate
solutions of the optimisation problem
$\max_{x \in K^n} \|Ax\|_p \quad$ subject to $\quad \|x\|_p = 1$
numerically.
■
Example 5.19. More generally, we can use different norms on the domain Kn
and the target space Km , say the p-norm on Kn and the q-norm on Km with
p ̸= q. That is, we interpret the matrix A as a mapping between the normed spaces
(Kn , ∥·∥p ) and (Km , ∥·∥q ). The resulting operator norm is defined as
$\|A\|_{p\to q} := \sup_{\|x\|_p = 1} \|Ax\|_q .$
In the specific case p = 1, we again have a simple explicit formula for the resulting
norm, namely
(46) $\|A\|_{1\to q} = \max_{1\leq j\leq n} \Big( \sum_{i=1}^{m} |a_{ij}|^q \Big)^{1/q}$
for 1 ≤ q < ∞, and
$\|A\|_{1\to\infty} = \max_{1\leq i\leq m,\, 1\leq j\leq n} |a_{ij}|$
for q = ∞.
We will show now that (46) holds for 1 ≤ q < ∞. Denote by ej ∈ Kn the j-th
standard basis vector. Then we have for all x ∈ Kn with ∥x∥1 = 1 that
$\|Ax\|_q = \Big\| \sum_{j=1}^{n} A(x_j e_j) \Big\|_q = \Big\| \sum_{j=1}^{n} x_j (A e_j) \Big\|_q \leq \sum_{j=1}^{n} |x_j| \|A e_j\|_q$
$\leq \|x\|_1 \max_{1\leq j\leq n} \|A e_j\|_q = \max_{1\leq j\leq n} \Big( \sum_{i=1}^{m} |a_{ij}|^q \Big)^{1/q} .$
Moreover, if 1 ≤ k ≤ n is an index where the maximum in (46) is attained, then
$\|A e_k\|_q = \Big( \sum_{i=1}^{m} |a_{ik}|^q \Big)^{1/q} = \max_{1\leq j\leq n} \Big( \sum_{i=1}^{m} |a_{ij}|^q \Big)^{1/q} .$
For the case q = ∞, the proof is similar.
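The formula (46) can also be checked numerically against a brute-force search over vectors with ∥x∥1 = 1; the Python sketch below does this for an arbitrary random matrix, and the sampled values stay below the value given by (46), which is attained at a standard basis vector.

import numpy as np

rng = np.random.default_rng(0)
m, n, q = 4, 3, 3.0
A = rng.standard_normal((m, n))

# Formula (46): the maximal q-norm of a column of A.
col_norms = (np.abs(A) ** q).sum(axis=0) ** (1.0 / q)
print("formula:", col_norms.max())

# Brute force: ||Ax||_q over random x with ||x||_1 = 1.
best = 0.0
for _ in range(20000):
    x = rng.standard_normal(n)
    x /= np.abs(x).sum()
    best = max(best, (np.abs(A @ x) ** q).sum() ** (1.0 / q))
print("sampled :", best)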
It turns out, however, that the cases p ̸= 1 are much more difficult to treat,
and there no longer exist any analytical formulas (unless of course p = q = 2 and
Example 5.20. Let 1 ≤ p ≤ ∞ and consider again the left and right shift
operators L, R : ℓp → ℓp ,
L(x1 , x2 , . . .) = (x2 , x3 , . . .) and R(x1 , x2 , . . .) = (0, x1 , x2 , . . .).
We will show that ∥L∥ = ∥R∥ = 1. For simplicity, we only show this in the case
1 ≤ p < ∞, though the proof for p = ∞ is essentially the same (but the definition
of the p-norm looks different).
First we note that
$\|Rx\|_p = \Big( \sum_{k=1}^{\infty} |(Rx)_k|^p \Big)^{1/p} = \Big( 0 + \sum_{k=2}^{\infty} |x_{k-1}|^p \Big)^{1/p} = \Big( \sum_{k=1}^{\infty} |x_k|^p \Big)^{1/p} = \|x\|_p .$
This shows that
$\frac{\|Rx\|_p}{\|x\|_p} = 1$
for all x ∈ ℓp with x ̸= 0, and thus also ∥R∥ = 1.
Concerning the left shift operator L, we have that
$\|Lx\|_p = \Big( \sum_{k=1}^{\infty} |(Lx)_k|^p \Big)^{1/p} = \Big( \sum_{k=1}^{\infty} |x_{k+1}|^p \Big)^{1/p} = \Big( \sum_{k=2}^{\infty} |x_k|^p \Big)^{1/p} \leq \|x\|_p ,$
and thus
$\|L\| = \sup_{x\neq 0} \frac{\|Lx\|_p}{\|x\|_p} \leq 1 .$
Moreover, if we consider for instance the sequence x = (0, 1, 0, 0, . . .), we see that
∥x∥p = 1 and ∥Lx∥p = ∥(1, 0, . . .)∥p = 1 as well. Thus
$\|L\| \geq \frac{\|(1, 0, \ldots)\|_p}{\|(0, 1, 0, \ldots)\|_p} = 1 .$
Together, we obtain again that ∥L∥ = 1. ■
Example 5.21. Let k : [0, 1] × [0, 1] → K be continuous and define the linear
mapping T : C([0, 1]) → C([0, 1]),
$T f(x) = \int_0^1 k(x, y)\, f(y)\,dy .$
We have previously seen that T is bounded with respect to ∥·∥∞ on both copies of
C([0, 1]), and we have also derived the estimate
$\|T f\|_\infty \leq \max_{(x,y)\in[0,1]^2} |k(x, y)| \, \|f\|_\infty .$
However, we cannot conclude that we have equality in this estimate, and, in fact,
in general we do not.
In fact, we can obtain a better estimate of the norm of T as follows:
(47) $\|T f\|_\infty = \max_{x\in[0,1]} |T f(x)| = \max_{x\in[0,1]} \Big| \int_0^1 k(x, y) f(y)\,dy \Big|$
$\leq \max_{x\in[0,1]} \int_0^1 |k(x, y)|\,dy \; \max_{y\in[0,1]} |f(y)| = \Big( \max_{x\in[0,1]} \int_0^1 |k(x, y)|\,dy \Big) \|f\|_\infty .$
Now a question is, whether this also works in infinite dimensional vector spaces.
At first, the answer appears to be yes: First of all, we note that the dimension of
V does not play any role in the consideration above. Now assume that U is infinite
dimensional and that (ui )i∈I is a Hamel basis of U (with some infinite index set I).
Then we can again write each u ∈ U uniquely in the form
u = α1 ui1 + . . . + αN uiN
for some N ∈ N, indices i1 , . . . , iN ∈ I, and coefficients α1 , . . . , αN ∈ K. Thus, if
we know the values of T ui ∈ V for all i ∈ I, then we have that
T u = α1 (T ui1 ) + . . . + αN (T uiN ).
There is, however, a pretty significant catch: All of this only works if we actually
have a basis in the sense of linear algebra, that is, if we have vectors ui , i ∈ I, such
that we can write each element u ∈ U as a finite linear combination of the vectors
ui . In most situations we encounter in practice, this is not the case. Consider for
instance the case U = ℓ2 of all square summable sequences. Similar to the case of
Kn we can define there “unit sequences”
(49) $e_i = (0, \ldots, 0, \underbrace{1}_{i\text{-th position}}, 0, \ldots)$
for i ∈ N. However, these sequences do not form a basis of ℓ2 in the sense above,
as it is impossible to write an infinite sequence as a finite linear combination of the
unit sequences ei .
Note, however, that these unit sequences actually do form a basis of cfin , as we
can write any finite sequence x = (x1 , . . . , xN , 0, . . .) as the finite linear combination
$x = \sum_{i=1}^{N} x_i e_i .$
Now assume that V is a vector space and that we are given vectors vi ∈ V , i ∈ N.
Then we can define a linear mapping
T : cfin → V,
(x1 , . . . , xN , 0, . . .) 7→ x1 v1 + . . . + xN vN .
The question now becomes, whether it is possible to extend this mapping from cfin
to the whole space ℓ2 in a meaningful way.
Definition 5.22 (Extension). Assume that X and Y are sets and that S ⊂ X
is a subset. Assume moreover that T : S → Y is some mapping. An extension of T
to X is a mapping
T̄ : X → Y
such that
T̄ (x) = T (x) for all x ∈ S.
■
The next result states that extensions of bounded linear operators exist.
Theorem 5.23. Assume that (U, ∥·∥U ) is a normed space and that (V, ∥·∥V )
is a Banach space. Assume moreover that M ⊂ U is a dense subspace and that
T : M → V is a bounded linear operator. Then there exists a unique bounded linear
extension T̄ of T to U . In addition, we have that
∥T̄ ∥B(U,V ) = ∥T ∥B(M,V ) .
Proof. We start by constructing the mapping T̄ .
Assume that u ∈ U . Because M ⊂ U is dense, there exists a sequence (un )n∈N
with un ∈ M for all n ∈ N and u = limn→∞ un . Now the boundedness of T implies
that
∥T un − T um ∥V = ∥T (un − um )∥V ≤ ∥T ∥B(M,V ) ∥un − um ∥U .
Since (un )n∈N is a Cauchy sequence (as it is convergent), this implies that the
sequence (T un )n∈N is a Cauchy sequence in V . Now the completeness of V implies
that the limit limn→∞ T un exists in V . Define therefore
T̄ u := lim T un .
n→∞
We now show that the value T̄ u ∈ V does not depend on the choice of the
sequence (un )n∈N converging to u. Indeed, assume that (vn )n∈N is another sequence
with u = limn→∞ vn and vn ∈ M for all n ∈ N. Then we have in particular that
∥un − vn ∥U ≤ ∥un − u∥U + ∥u − vn ∥U →n→∞ 0.
Thus it follows that
∥T̄ u − T vn ∥V ≤ ∥T̄ u − T un ∥V + ∥T un − T vn ∥V
≤ ∥T̄ u − T un ∥V + ∥T ∥∥un − vn ∥U →n→∞ 0,
that is T̄ u = limn→∞ T vn .
In particular, for u ∈ M we can choose un = u for all n, which implies that
T̄ u = T u for all u ∈ M . Thus T̄ is indeed an extension of T .
Next, we will show that T̄ is in fact linear. For that, assume that u, v ∈ U , and
let (un )n∈N and (vn )n∈N be sequences in M converging to u and v, respectively.
Then
T̄ u = lim T un and T̄ v = lim T vn .
n→∞ n→∞
Now the continuity of addition in a normed space implies that
$\lim_{n\to\infty} (u_n + v_n) = \lim_{n\to\infty} u_n + \lim_{n\to\infty} v_n = u + v .$
This shows that T̄ is bounded linear and ∥T̄ ∥B(U,V ) ≤ ∥T ∥B(M,V ) . On the other
hand, we have that
$\|\bar{T}\|_{B(U,V)} = \sup_{u \in U,\, \|u\|_U = 1} \|\bar{T} u\|_V \geq \sup_{u \in M,\, \|u\|_U = 1} \|T u\|_V = \|T\|_{B(M,V)} ,$
Example 5.26. Let 1 ≤ p < ∞. Then cfin is dense in ℓp and the unit se-
quences ei , i ∈ N, as defined in (49) form a basis of cfin . Assume now that V is a
Banach space and let vi , i ∈ N, be such that the sequence (∥v1 ∥V , ∥v2 ∥, ∥v3 ∥, . . .)
is contained in ℓp∗ , where p∗ is the Hölder conjugate of p. Define now the mapping
T : cfin → V by
T (x1 , . . . , xN , 0, . . .) = x1 v1 + . . . + xN vN .
Then we have for all x = (x1 , . . . , xN , 0, . . .) ∈ cfin that
Example 5.27. Denote by CK (R) the set of continuous functions with compact
support. That is, f ∈ CK (R), if f is continuous and there exists R > 0 such that
f (x) = 0 for all x ∈ R with |x| ≥ R. On CK (R) we can consider the norm ∥·∥1
defined by
$\|f\|_1 := \int_{-\infty}^{\infty} |f(x)|\,dx .$
Note here that ∥f ∥1 is finite for all f ∈ CK (R): If f ∈ CK (R), then there exists
R > 0 such that f (x) = 0 for all x ̸∈ [−R, R]. Thus
$\|f\|_1 = \int_{-\infty}^{\infty} |f(x)|\,dx = \int_{-R}^{R} |f(x)|\,dx \leq 2R \max_{x\in[-R,R]} |f(x)| < \infty ,$
as the function |f | is continuous and thus attains its maximum on the bounded and
closed interval [−R, R].
Denote now by Cb (R) the space of bounded continuous mappings on R. It is
possible to show that Cb (R) is a Banach space with the norm
$\|g\|_\infty := \sup_{x\in R} |g(x)| .$
1A concrete construction of this space relies on measure theory and is discussed in the course
"Foundations of Analysis." Essentially, L1 (R) consists of all functions f for which the integral
$\int_{-\infty}^{\infty} |f(x)|\,dx$ can be reasonably defined and is finite.
2Alternatively, if (un )n∈N ⊂ ker T is a convergent sequence with u = limn→∞ un , then
T un = 0 for all n ∈ N (as un ∈ ker T for all n), and thus the continuity of T implies that
T u = limn→∞ T un = 0 as well, which implies that also u ∈ ker T .
which shows that T is bounded. However, the range of T is not closed. Take for
instance the sequence of functions
$g_n(x) := \sqrt{x + \frac{1}{n}} - \frac{1}{\sqrt{n}} .$
For all n ∈ N we have that gn is continuously differentiable on the interval [0, 1]
and gn (0) = 0. Thus gn ∈ ran T for all n ∈ N. In addition, we have limn→∞ gn = g
with
$g(x) = \sqrt{x} .$
Since the derivative g ′ (0) does not exist, the function g is not contained in ran T .
That is, the equation
$\int_0^x f(y)\,dy = \sqrt{x} , \qquad x \in [0, 1],$
has no solution f ∈ C([0, 1]). This shows that ran T is not a closed linear subspace
of C([0, 1]). ■
Theorem 5.30. Assume that (U, ∥·∥U ) and (V, ∥·∥V ) are Banach spaces and
that T : U → V is bounded linear. Assume that there exists C > 0 such that
(50) ∥T u∥V ≥ C∥u∥U
for all u ∈ U . Then ran T ⊂ V is closed and thus a Banach space. Moreover, T
has a bounded linear inverse
T −1 : ran T → U
and we can estimate
$\|T^{-1}\|_{B(\operatorname{ran} T, U)} \leq \frac{1}{C} ,$
where C > 0 is the constant from (50).
Proof. We show first that T is injective. Assume to that end that u ∈ ker T ,
that is, T u = 0. Then
0 = ∥T u∥V ≥ C∥u∥U ,
which shows that ∥u∥U = 0 and thus u = 0. Thus ker T = {0}, which shows the
injectivity of T .
Next we show that ran T is closed. Assume to that end that (vn )n∈N is a
sequence such that vn ∈ ran T for all n ∈ N and that v = limn→∞ vn exists in V .
We have to show that v ∈ ran T . That is, we need to show that there exists u ∈ U
such that T u = v.
Because vn ∈ ran T , there exist un ∈ U , n ∈ N, such that T un = vn . By (50),
we can now estimate
∥T un − T um ∥V = ∥T (un − um )∥V ≥ C∥un − um ∥U
for all n, m ∈ N, or
(51) $\|u_n - u_m\|_U \leq \frac{1}{C} \|T u_n - T u_m\|_V = \frac{1}{C} \|v_n - v_m\|_V .$
By assumption, the sequence (vn )n∈N converges, which implies that it is a Cauchy
sequence. Now (51) implies that the sequence (un )n∈N is a Cauchy sequence as
well. Because U is a Banach space, it is complete, and thus the Cauchy sequence
(un )n∈N converges to some u ∈ U . Now the boundedness of T implies that
T u = T lim un = lim T un = lim vn = v.
n→∞ n→∞ n→∞
We now show that the inequality (50) is also a necessary condition for T to
have a bounded inverse.
Lemma 5.31. Assume that (U, ∥·∥U ) and (V, ∥·∥V ) are normed spaces and that
T : U → V is bounded linear and bijective. Assume in addition that T −1 : V → U
is bounded. Then there exists C > 0 such that
(52) ∥T u∥V ≥ C∥u∥U for all u ∈ U.
Proof. Without loss of generality assume that U , V ̸= {0}. Then T −1 ̸= 0
and thus ∥T −1 ∥B(V,U ) > 0. Now let u ∈ U be arbitrary. Then u = T −1 T u, and
thus we can estimate
∥u∥U = ∥T −1 T u∥U ≤ ∥T −1 ∥B(V,U ) ∥T u∥V .
Thus (52) holds with C := 1/∥T −1 ∥B(V,U ) > 0. □
Remark 5.32. Assume that (U, ∥·∥U ) and (V, ∥·∥V ) are Banach spaces and
that T : U → V is bounded linear. Then Theorem 5.30 implies that T has a
bounded linear inverse T −1 : V → U provided that ran T ⊂ V is dense and the
estimate (50) holds. (The density of ran T implies that the closure of ran T is equal
to V ; because ran T already is closed by Theorem 5.30 it is equal to its closure and
thus ran T = V .)
Conversely, assume that T : U → V is bounded linear and that T is bijective.
Then it is possible to show that (50) holds for some C > 0 and thus T −1 is actually
a bounded linear mapping as well. This result, however, is far from trivial and goes
beyond the scope of this class.3 ■
• We consider the Banach space (C([0, 1]), ∥·∥∞ ). For x ∈ [0, 1] define
δx : C([0, 1]) → K,
δx (f ) := f (x).
Then
$|\delta_x(f)| = |f(x)| \leq \sup_{y\in[0,1]} |f(y)| = \|f\|_\infty ,$
which shows that δx is a bounded linear functional. The mapping δx is
called the point evaluation at x, or also the Dirac δ-distribution.4
• Assume that U is an inner product space and that v ∈ U is fixed. Then
the mapping u 7→ ⟨u, v⟩U is bounded linear and thus a bounded linear
functional. For a proof of this statement, I refer to the exercise sheets.
• Consider again the Banach space (C([0, 1]), ∥·∥∞ ), and let g ∈ C([0, 1]) be
a fixed function. Then the mapping Tg : C([0, 1]) → K,
$T_g(f) := \int_0^1 f(x)\, g(x)\,dx$
is a bounded linear functional on C([0, 1]), as
$|T_g(f)| \leq \int_0^1 |f(x)| |g(x)|\,dx \leq \|f\|_\infty \|g\|_\infty$
for all f ∈ C([0, 1]).
■
Definition 5.36 (dual space). Assume that U is a normed space. The dual
space of U is defined as
U ∗ := B(U, K) = {f : U → K : f is bounded linear}.
Theorem 5.37. Let (U, ∥·∥U ) be a normed space. Then U ∗ is a Banach space
with the norm
$\|f\|_{U^*} = \sup_{u \in U,\, \|u\|_U \leq 1} |f(u)| .$
Proof. This immediately follows from Theorem 5.33 using the completeness
of K. □
Example 5.38. We consider the case U = Kn with the norm ∥·∥p for some
1 ≤ p ≤ ∞. Then every linear mapping T : Kn → K can be written as a matrix-vector
product T x = Ax for some matrix A ∈ K1×n ∼ Kn . Moreover, one can
see that every linear mapping T : Kn → K is actually bounded. Thus we can
“identify” the dual space (Kn )∗ with the space Kn . The norm on the dual space
(Kn )∗ , however, in general is different from the norm on Kn . Indeed, since we are
considering the p-norm ∥·∥p on Kn , we have
$\|A\|_{(K^n)^*} = \sup_{\|x\|_p = 1} |Ax| = \sup_{\|x\|_p = 1} \Big| \sum_{i=1}^{n} a_i x_i \Big| .$
4For more information on distributions and specifically their importance in the solution of
partial differential equations, I refer to the class TMA4305 - Partial Differential Equations.
and thus
∥A∥(K n )∗ ≤ ∥A∥p∗ .
We will show now that we have in fact an equality here, that is, that ∥A∥(K n )∗ =
∥A∥p∗ . For this, we have to find for each A ∈ Kn some x ∈ Kn with x ̸= 0 such that
|Ax| = ∥A∥p∗ ∥x∥p . For simplicity, we will restrict ourselves to the case 1 < p < ∞.
For A = 0 this holds for all x ∈ Kn . Let therefore A ∈ Kn \ {0}. Define x ∈ Kn
by setting
$x_i := \begin{cases} \overline{a_i}\, |a_i|^{p^*-2} & \text{if } a_i \neq 0, \\ 0 & \text{if } a_i = 0. \end{cases}$
Then we have that
$|x_i| = |a_i|^{p^*-1} \quad$ and $\quad a_i x_i = |a_i|^{p^*}$
for all i. Using the identities p∗ = p/(p − 1) and p∗ /p = p∗ − 1 we then can compute
$\|x\|_p = \Big( \sum_{i=1}^{n} |a_i|^{(p^*-1)p} \Big)^{1/p} = \Big( \sum_{i=1}^{n} |a_i|^{p^*} \Big)^{1/p} = \|A\|_{p^*}^{p^*/p} = \|A\|_{p^*}^{p^*-1} .$
Thus
$|Ax| = \Big| \sum_{i=1}^{n} a_i x_i \Big| = \sum_{i=1}^{n} |a_i|^{p^*} = \|A\|_{p^*}^{p^*} = \|A\|_{p^*} \|A\|_{p^*}^{p^*-1} = \|A\|_{p^*} \|x\|_p .$
Together with the previous estimate, this shows that ∥A∥(Kn )∗ = ∥A∥p∗ . Thus the
dual space of (Kn , ∥·∥p ) can be identified with the space (Kn , ∥·∥p∗ ). ■
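The construction of the maximising vector x can be verified numerically in the real case; the Python sketch below checks that |Ax| = ∥A∥p∗ ∥x∥p for the vector defined above, and that randomly sampled competitors with unit p-norm do not exceed ∥A∥p∗.

import numpy as np

rng = np.random.default_rng(1)
p = 3.0
p_star = p / (p - 1.0)
a = rng.standard_normal(5)

def pnorm(v, r):
    return (np.abs(v) ** r).sum() ** (1.0 / r)

# Maximiser from the text (real case, all a_i nonzero almost surely):
x = a * np.abs(a) ** (p_star - 2.0)
print(abs(a @ x), pnorm(a, p_star) * pnorm(x, p))   # these two values agree

# No unit vector does better than ||a||_{p*}:
vals = []
for _ in range(20000):
    z = rng.standard_normal(5)
    z /= pnorm(z, p)
    vals.append(abs(a @ z))
print(max(vals), "<=", pnorm(a, p_star))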
Theorem 5.39. Let 1 ≤ p < ∞ and let f : ℓp → K be linear. The following are
equivalent:
(1) f ∈ (ℓp )∗ .
(2) There exists a sequence y ∈ ℓp∗ such that
$f(x) = \sum_{k=1}^{\infty} x_k y_k$
for all x ∈ ℓp .
In addition, the sequence y ∈ ℓp∗ from (2) is unique and we have
∥f ∥(ℓp )∗ = ∥y∥p∗ .
Proof. (2) =⇒ (1): By Hölder’s inequality, we can estimate
$|f(x)| = \Big| \sum_{k=1}^{\infty} x_k y_k \Big| \leq \|x\|_p \|y\|_{p^*} ,$
We will show now that y ∈ ℓp∗ . Again, we assume for simplicity that 1 < p < ∞.
As in Example 5.38 we define a sequence x = (x1 , x2 , . . .) setting
$x_k = \begin{cases} \overline{y_k}\, |y_k|^{p^*-2} & \text{if } y_k \neq 0, \\ 0 & \text{if } y_k = 0. \end{cases}$
Thus
$\|y^{(N)}\|_{p^*}^{p^*} = \sum_{k=1}^{N} |y_k|^{p^*} = \sum_{k=1}^{N} x_k y_k$
Similar as in the finite dimensional case, this implies that we can “identify” for
1 ≤ p < ∞ the dual space (ℓp )∗ with the space ℓp∗ .
CHAPTER 6
Hilbert Spaces
Put differently, a set K is convex if for all vectors u, v ∈ K the whole line
segment connecting u and v is contained in K.
Example 6.2. Assume K ⊂ U is a linear subspace. Then K is convex. Indeed,
assume that u, v ∈ K. Then λu + µv ∈ K for all λ, µ ∈ K. With 0 < λ < 1 and
µ = 1 − λ we obtain the convexity of K. ■
Example 6.3. Assume (U, ∥·∥U ) is a normed space and let R > 0. Then the
ball
BR (0) = u ∈ U : ∥u∥U < R
is convex. Indeed, assume that u, v ∈ BR (0), that is, ∥u∥U < R and ∥v∥U < R,
and let 0 < λ < 1. Then
∥λu+(1−λ)v∥U ≤ |λ|∥u∥U +|1−λ|∥v∥U = λ∥u∥U +(1−λ)∥v∥U < λR+(1−λ)R = R,
which shows that λu + (1 − λ)v ∈ BR (0). ■
Definition 6.4 (distance to a set). Assume that (U, ∥·∥U ) is a normed space
and that K ⊂ U is a non-empty subset. For u ∈ U we denote by
$\operatorname{dist}(u, K) := \inf_{v \in K} \|u - v\|_U$
the distance of u to the set K. ■
Thus z ∈ K satisfies
∥u − z∥U = dist(u, K).
Remark 6.9. We have stated Theorem 6.6 and Corollary 6.7 in the setting of
closed convex subsets (subspaces) of Hilbert spaces. However, the characteristions
for the projections that we have derived there remain valid for arbitrary convex
subsets (subspaces) of inner product spaces. That is, if U is an inner product space
and K ⊂ U is convex and non-empty, then z = πK (u) solves the optimisation
problem
$\min_{v \in K} \|v - u\| ,$
if and only if
z∈K and ℜ(⟨u − z, v − z⟩) ≤ 0 for all v ∈ K.
Moreover, if W ⊂ U is a subspace, then z = πW (u) solves the problem
$\min_{w \in W} \|u - w\|$
if and only if
z∈W and u − z ∈ W ⊥.
Note, however, that such a vector need not exist. ■
Then ∥z∥ = R and thus z ∈ K. In view of Theorem 6.6 we thus have to show that
(61) ℜ(⟨u − z, v − z⟩) ≤ 0 for all v ∈ K.
Assume therefore that v ∈ K. We can compute
Remark 6.11. The existence and uniqueness proof for the projection onto a
convex set relied strongly on the fact that U is a Hilbert space: During the proof
we relied twice on the parallelogram law, both for proving the existence and for
proving the uniqueness. Indeed, in general Banach spaces neither existence nor
uniqueness need to hold.
Consider first the case of U = R2 with the ∞-norm. In this case, the projection
need not be unique. That is, given a closed and convex set K ⊂ U and a point
u ̸∈ K there might be different points z ∈ K such that ∥u − z∥ = dist(u, K). Take,
for instance, K = Re1 the x-axis, and let u = (0, 1). Then dist(u, K) = 1, but
every point (a, 0) ∈ K with |a| ≤ 1 satisfies ∥(a, 0) − u∥∞ = 1 = dist(u, K). That
is, every point (a, 0) ∈ K with |a| ≤ 1 can be considered a projection of (0, 1) onto
K. In particular, the projection is not unique.
Now we consider the case U = ℓ1 the space of all real-valued summable se-
quences, and we define
$K = \Big\{ x \in \ell^1 : \sum_{n=1}^{\infty} \Big( 1 - \frac{1}{n} \Big) x_n \geq 1 \Big\} .$
Then one can show that K is a non-empty, closed and convex set. However, the
projection of 0 onto K does not exist. That is, there exists no x ∈ K such that
∥x − 0∥1 = dist(0, K). In order to see this, we note first that for all x ∈ K we have
$1 \leq \sum_{n=1}^{\infty} \Big( 1 - \frac{1}{n} \Big) x_n \leq \sum_{n=1}^{\infty} \Big( 1 - \frac{1}{n} \Big) |x_n| < \sum_{n=1}^{\infty} |x_n| = \|x\|_1 .$
That is, ∥x∥1 > 1 for all x ∈ K, which also implies that dist(0, K) ≥ 1. On the
other hand, for all n ≥ 2 we have that
$\frac{n}{n-1}\, e_n \in K ,$
which shows that
$\operatorname{dist}(0, K) \leq \Big\| \frac{n}{n-1}\, e_n \Big\|_1 = \frac{n}{n-1}$
for all n ≥ 2. Taking the infimum over all n ≥ 2, it follows that dist(0, K) ≤ 1.
Together, these estimates show that dist(0, K) = 1. However, as shown above,
∥x∥1 > 1 for all x ∈ K. Thus there exists no x ∈ K such that ∥x∥1 = dist(0, K). ■
2. Projections on subspaces
We now study in more details the properties of projections onto closed sub-
spaces. For this we will exploit the characterisation of the projection derived in
Corollary 6.7.
The results to be derived will be generalisations of our discussion in Section 3
of the finite dimensional case.
Proposition 6.13. Assume that U is a Hilbert space and that W ⊂ U is a
closed subspace. Then the projection πW : U → U is a bounded linear mapping.
Moreover, ker πW = W ⊥ and ran πW = W .
Proof. Linearity: Assume that u, v ∈ U . Then πW (u) ∈ W and πW (v) ∈ W ,
which implies that also πW (u) + πW (v) ∈ W . Moreover, u − πW (u) ∈ W ⊥ and
v − πW (v) ∈ W ⊥ , and thus (u + v) − (πW (u) + πW (v)) ∈ W ⊥ . This shows that
z = πW (u) + πW (v) satisfies the conditions of Corollary 6.7 and thus πW (u) +
πW (v) = πW (u + v).
Similarly, if u ∈ U and λ ∈ K, then λπW (u) ∈ W and
λu − λπW (u) = λ(u − πW (u)) ∈ W ⊥ ,
which shows that λπW (u) = πW (λu).
Boundedness: This is a direct consequence of Proposition 6.12.
Kernel and range: We have that πW (u) = 0 if and only if 0 ∈ W and u − 0 ∈
W ⊥ . This is the case, if and only if u ∈ W ⊥ , which shows that ker πW = W ⊥ . Next,
Proof. The “if” part follows directly from the fact that the mapping u 7→
⟨u, vf ⟩ is bounded linear, as discussed previously.
For the “only if” part, assume that f ∈ U ∗ . If f = 0, then we can choose
vf = 0. Assume therefore that f ̸= 0, and denote W := ker f . Then W is a closed
subspace of U , and thus we can write U = W ⊕ W ⊥ . Moreover, since f ̸= 0,
we have that W = ker f ⊊ U , which implies that W ⊥ ̸= {0}. Choose now some
z ∈ W ⊥ with z ̸= 0. Since z ̸∈ W = ker f , it follows that f (z) ̸= 0. Moreover, we
have for all u ∈ U that
$f\Big( u - \frac{f(u)}{f(z)}\, z \Big) = f(u) - \frac{f(u)}{f(z)}\, f(z) = 0 ,$
and thus
$u - \frac{f(u)}{f(z)}\, z \in \ker f = W .$
Since z ∈ W ⊥ , this implies that
$\Big\langle u - \frac{f(u)}{f(z)}\, z ,\, z \Big\rangle = 0$
for all u ∈ U . This can be rewritten as
$\langle u, z \rangle = \frac{f(u)}{f(z)}\, \|z\|^2 ,$
or, as ∥z∥ =
̸ 0,
f (z) D f (z) E
f (u) = 2
⟨u, z⟩ = u, z
∥z∥ ∥z∥2
for all u ∈ U. Thus, if we define
v_f = (f(z)/∥z∥²) z,
then f(u) = ⟨u, v_f⟩ for all u ∈ U. □
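In finite dimensions the representer can be computed explicitly. The sketch below is an illustration only, not part of the notes: it equips Rⁿ with the weighted inner product ⟨u, v⟩ = uᵀMv for a symmetric positive definite matrix M, takes the functional f(u) = aᵀu, and checks that v_f = M⁻¹a satisfies f(u) = ⟨u, v_f⟩:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    B = rng.standard_normal((n, n))
    M = B @ B.T + n * np.eye(n)      # symmetric positive definite: defines <u, v> = u^T M v
    a = rng.standard_normal(n)       # the functional f(u) = a^T u

    v_f = np.linalg.solve(M, a)      # Riesz representer: <u, v_f> = u^T M v_f = a^T u = f(u)

    u = rng.standard_normal(n)
    print(a @ u, u @ M @ v_f)        # the two numbers agree up to rounding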
Lemma 6.27. Assume that U , V , and W are Hilbert spaces and that T : U → V
and S : V → W are bounded linear. Then
(S ◦ T )∗ = T ∗ ◦ S ∗ .
Proof. For all u ∈ U and w ∈ W we have
⟨(S ◦ T)u, w⟩_W = ⟨Tu, S∗w⟩_V = ⟨u, (T∗ ◦ S∗)w⟩_U,
which shows that (S ◦ T)∗ = T∗ ◦ S∗. □

Consider now the right shift operator R on ℓ², defined by R(x₁, x₂, x₃, . . .) = (0, x₁, x₂, . . .). For all x, y ∈ ℓ² we have
⟨Rx, y⟩ = Σ_{n=1}^∞ x_n y_{n+1}
and
⟨x, R∗y⟩ = Σ_{n=1}^∞ x_n (R∗y)_n.
Thus we have the condition that
Σ_{n=1}^∞ x_n y_{n+1} = Σ_{n=1}^∞ x_n (R∗y)_n
for all x, y ∈ ℓ². From this, we can conclude that
y_{n+1} = (R∗y)_n
for all y ∈ ℓ² and n ∈ N, which implies that R∗ = L, the left shift operator
L(y₁, y₂, y₃, . . .) = (y₂, y₃, . . .).
Obviously, R ̸= L = R∗ and thus R is not self-adjoint. In addition, we note
that (L ◦ R)x = x for all x ∈ ℓ2 , whereas
(R ◦ L)x = (0, x2 , x3 , . . .).
This shows that R ◦ R∗ = R ◦ L ̸= L ◦ R = R∗ ◦ R, and thus R is not normal. ■
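On a truncation of ℓ² to finitely many coordinates, the two shift operators become matrices and the computations above can be reproduced directly (a sketch; the truncation is only an approximation of the operators on ℓ², which shows up in the last coordinate):

    import numpy as np

    n = 6
    R = np.eye(n, k=-1)    # truncated right shift: x -> (0, x1, ..., x_{n-1})
    L = np.eye(n, k=1)     # truncated left shift:  x -> (x2, ..., xn, 0)

    print(np.allclose(R.T, L))   # True: the adjoint of R is the left shift L
    print(np.diag(L @ R))        # [1, ..., 1, 0]: L∘R acts as the identity (up to the truncation)
    print(np.diag(R @ L))        # [0, 1, ..., 1]: R∘L zeroes the first coordinate, so R∘R* differs from R*∘R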
Example 6.29. Recall that L2 ([0, 1]) is the completion of the space C([0, 1])
with respect to the norm ∥·∥2 defined by
∥f∥₂ := ( ∫₀¹ |f(x)|² dx )^{1/2}.
That is, T∗ is again an integral operator, with kernel k∗(x, y) := k(y, x). ■
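Assuming, as is standard for such examples, that T is the integral operator (Tu)(x) = ∫₀¹ k(x, y)u(y) dy, this relation can be checked after discretisation: on a grid, T becomes (approximately) a matrix, and its adjoint is the transpose, which corresponds to flipping the arguments of the kernel. A sketch with a non-symmetric kernel:

    import numpy as np

    m = 200
    x = (np.arange(m) + 0.5) / m                 # midpoint grid on [0, 1]
    h = 1.0 / m

    k = lambda s, t: np.exp(s - 2.0 * t)          # a non-symmetric example kernel
    T = k(x[:, None], x[None, :]) * h             # discretised (Tu)(x_i) ~ sum_j k(x_i, x_j) u(x_j) h
    T_adj = k(x[:, None], x[None, :]).T * h       # kernel with flipped arguments, k*(x, y) = k(y, x)

    u = np.sin(np.pi * x)
    v = np.cos(np.pi * x)
    print(h * (T @ u) @ v, h * u @ (T_adj @ v))   # <Tu, v> = <u, T*v> in the discretised L^2 inner product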
Example 6.30. Consider the linear initial value problem with varying coefficients
(63)    y′(t) = A(t)y(t),    y(0) = x,
where A(t) ∈ R^{n×n} depends continuously on t ∈ [0, 1] and x ∈ Rⁿ. Denote by y(t; x) the solution of (63) and define T : Rⁿ → Rⁿ, Tx := y(1; x), the mapping of the initial value to the value of the solution at time 1. In order to determine the adjoint T∗, denote by v(t; w) the solution of the adjoint equation
(64)    v′(t) = −A(t)∗v(t),    v(1) = w.
Then we have
⟨y(t; x), v(t; w)⟩′ = ⟨y(t; x)′ , v(t; w)⟩ + ⟨y(t; x), v(t; w)′ ⟩
= ⟨A(t)y(t; x), v(t; w)⟩ + ⟨y(t; x), −A(t)∗ v(t; w)⟩
= ⟨A(t)y(t; x), v(t; w)⟩ − ⟨A(t)y(t; x), v(t; w)⟩ = 0
for all t ∈ [0, 1] and all x, w ∈ Rn . Thus
0 = ∫₀¹ ⟨y(t; x), v(t; w)⟩′ dt = ⟨y(1; x), v(1; w)⟩ − ⟨y(0; x), v(0; w)⟩
= ⟨Tx, w⟩ − ⟨x, v(0; w)⟩
for all x, w ∈ Rⁿ. This shows that
T ∗ w = v(0; w)
for all w ∈ Rn . That is, the adjoint of T maps the final value w to the initial value
of the corresponding solution of the adjoint equation (64). ■
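This can also be observed numerically. The following sketch (with a matrix-valued function A(t) chosen only for illustration) integrates the initial value problem (63) forwards and the adjoint equation (64) backwards with a simple Euler scheme and compares ⟨Tx, w⟩ with ⟨x, v(0; w)⟩:

    import numpy as np

    n, steps = 3, 20000
    dt = 1.0 / steps
    A = lambda t: np.array([[0.0, 1.0, t],
                            [-t, 0.0, 1.0],
                            [0.5, np.sin(t), 0.0]])

    rng = np.random.default_rng(1)
    x = rng.standard_normal(n)
    w = rng.standard_normal(n)

    y = x.copy()                         # forward: y' = A(t) y, y(0) = x
    for i in range(steps):
        y = y + dt * A(i * dt) @ y
    Tx = y                               # T x = y(1; x)

    v = w.copy()                         # backward: v' = -A(t)^T v, v(1) = w
    for i in range(steps, 0, -1):
        v = v + dt * A(i * dt).T @ v
    print(Tx @ w, x @ v)                 # approximately equal, up to the discretisation error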
In the finite dimensional case, we have shown the relations ker T = (ran T ∗ )⊥
and ran T = (ker T ∗ )⊥ (see Lemma 3.46). The next result provides a generalisation
to the infinite dimensional case.
Proposition 6.31. Assume that U and V are Hilbert spaces and that T : U →
V is bounded linear. Then
\overline{ran T} = (ker T∗)⊥ and ker T = (ran T∗)⊥.
Proof. We show first that \overline{ran T} = (ker T∗)⊥.
Assume for that first that v ∈ ran T . Then v = T u for some u ∈ U . Now let
w ∈ ker T ∗ . Then T ∗ w = 0 and thus
⟨v, w⟩V = ⟨T u, w⟩V = ⟨u, T ∗ w⟩U = 0.
This shows that v ∈ (ker T ∗ )⊥ . Since this holds for all v ∈ ran T , it follows that
ran T ⊂ (ker T∗)⊥. Since (ker T∗)⊥ is closed, we can conclude that also \overline{ran T} ⊂ (ker T∗)⊥.
Now assume that v ∈ (ran T )⊥ , that is, that ⟨w, v⟩ = 0 for all w ∈ ran T . Then
we have for all u ∈ U that
⟨u, T ∗ v⟩U = ⟨T u, v⟩V = 0,
One problem with Hamel bases in infinite dimensional Banach spaces is that
one can show that they are necessarily uncountable, which makes them essentially
impossible to use in practice.
Proposition 6.38. Let U be an infinite dimensional Banach space. Then every
Hamel basis of U is uncountable.
Remark 6.39. For incomplete normed spaces, the situation is different as we
have already seen. For instance, the family (e₁, e₂, . . .) of unit sequences is a
Hamel basis in the space cfin of finite sequences. However, cfin is incomplete with
respect to all the norms we have discussed. ■
2One has to be slightly careful here, as the decimal expansion of a real number is not always unique (for instance we have 1 = 0.99999 . . .). However, this non-uniqueness issue only occurs for numbers with a finite decimal expansion, and we can easily avoid any potential issues by for instance setting b_j = 5 if a_j^{(j)} = 0 or a_j^{(j)} = 9.
Example 6.41. Let 1 ≤ p < ∞ and consider the family of unit sequences
(e1 , e2 , . . .) in ℓp . Then this set is a Schauder basis of ℓp . In order to see this,
assume that x = (x1 , x2 , . . .) ∈ ℓp and define λn = xn . Then
∥ x − Σ_{n=1}^N λ_n e_n ∥_p = ( Σ_{n=N+1}^∞ |x_n|^p )^{1/p},
which tends to 0 as N → ∞.
for all sequences λ1 , λ2 , . . ., and thus we cannot represent x in the desired form. ■
Definition 6.43 (Separability). A normed space U is called separable, if there
exists a countable dense subset of U . ■
Lemma 6.45. Assume that the normed space U has a Schauder basis. Then U
is separable.
Proof. Exercise! □
Remark 6.46. A long-standing problem in functional analysis was the question whether every separable Banach space has a Schauder basis. In the seventies, this
question was answered negatively. That is, there exist separable Banach spaces,
which do not have any Schauder basis. ■
The space ℓ∞, on the other hand, is not separable. To see this, consider the sequences e^(I) ∈ ℓ∞ defined, for a subset I ⊂ N, by e^(I)_n = 1 if n ∈ I and e^(I)_n = 0 if n ∉ I. One can show³ that the set {I : I ⊂ N} (and thus also the set {e^(I) : I ⊂ N}) is uncountable. Moreover, we have that ∥e^(I) − e^(J)∥_∞ = 1 whenever I ≠ J.
Thus we have an uncountable discrete subset of ℓ∞, which makes it impossible to find a countable dense subset: Assume that {s₁, s₂, . . .} ⊂ ℓ∞ is dense. Then there exists in particular for each I ⊂ N some index n_I such that ∥s_{n_I} − e^(I)∥_∞ < 1/4. Then, however, we have for all I ≠ J that
1 = ∥e^(I) − e^(J)∥_∞ ≤ ∥e^(I) − s_{n_I}∥_∞ + ∥s_{n_I} − e^(J)∥_∞,
and thus n_I ≠ n_J, since otherwise the right hand side would be smaller than 1/2. That is, the mapping I ↦ n_I is injective, which contradicts the uncountability of the set of subsets of N.
We will now discuss in more detail the case where U is a Hilbert space. Recall
to that end that a sequence (u1 , u2 , . . .) in an inner product space is
• orthogonal, if ⟨ui , uj ⟩ = 0 whenever i ̸= j,
• orthonormal, if it is orthogonal and ∥ui ∥ = 1 for all i.
Theorem 6.48. Let U be an inner product space. Assume moreover that
(u1 , u2 , . . . , un ) is a finite orthonormal system in U and denote
W = span({u1 , u2 , . . . , un }).
Then W is a closed subspace of U and we have
Σ_{j=1}^n ⟨u, u_j⟩u_j = πW(u)
and
∥u − πW(u)∥² = ∥u∥² − Σ_{j=1}^n |⟨u, u_j⟩|²
for all u ∈ U .
Proof. Exercise! □
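Both identities in Theorem 6.48 are easy to check numerically for an orthonormal system in Rᵐ (a sketch; the orthonormal vectors are obtained from a QR factorisation of a random matrix):

    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 8, 3
    Q, _ = np.linalg.qr(rng.standard_normal((m, n)))   # columns u_1, ..., u_n are orthonormal
    u = rng.standard_normal(m)

    coeffs = Q.T @ u                 # the coefficients <u, u_j>
    proj = Q @ coeffs                # sum_j <u, u_j> u_j = projection of u onto W

    print(np.linalg.norm(u - proj) ** 2)                   # ||u - pi_W(u)||^2 ...
    print(np.linalg.norm(u) ** 2 - np.sum(coeffs ** 2))    # ... equals ||u||^2 - sum_j |<u, u_j>|^2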
for all u ∈ U .
This shows that (vn )n∈N is a Cauchy sequence. Since U is a Hilbert space and thus
complete, it follows that the sequence (vn )n∈N converges. □
Theorem 6.51. Assume that U is an infinite dimensional Hilbert space and
that (u1 , u2 , . . .) ⊂ U is an orthonormal system. Denote moreover
W = \overline{span{u₁, u₂, . . .}}.
Then
πW(u) = Σ_{j=1}^∞ ⟨u, u_j⟩u_j
for all u ∈ U .
Proof. By Bessel’s inequality we have that
Σ_{j=1}^∞ |⟨u, u_j⟩|² ≤ ∥u∥²,
holds.
Proof. Assume that S is total. Then S⊥ = {0} and thus U = \overline{span S}. Hence we have for all u ∈ U that π_{\overline{span S}}(u) = u, and therefore
u = π_{\overline{span S}}(u) = Σ_{j=1}^∞ ⟨u, u_j⟩u_j,
and thus
∥u∥² = ∥ lim_{n→∞} Σ_{j=1}^n ⟨u, u_j⟩u_j ∥² = lim_{n→∞} ∥ Σ_{j=1}^n ⟨u, u_j⟩u_j ∥²
= lim_{n→∞} Σ_{j=1}^n |⟨u, u_j⟩|² = Σ_{j=1}^∞ |⟨u, u_j⟩|².
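Parseval's identity can be checked in the same way, here with a complete orthonormal basis of Rᵐ (a sketch):

    import numpy as np

    rng = np.random.default_rng(3)
    m = 10
    Q, _ = np.linalg.qr(rng.standard_normal((m, m)))   # columns form an orthonormal basis of R^m
    u = rng.standard_normal(m)

    coeffs = Q.T @ u                                    # generalised Fourier coefficients <u, u_j>
    print(np.linalg.norm(u) ** 2, np.sum(coeffs ** 2))  # the two numbers agree (Parseval's identity)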
Example 6.60. Let L2 ([0, 2π], C) denote the space of square integrable complex
valued functions on [0, 2π]. That is, L2 ([0, 2π], C) is the completion of the space
C([0, 2π], C) of continuous complex-valued functions on [0, 2π] with respect to the norm
∥f∥₂ = ( ∫₀^{2π} |f(x)|² dx )^{1/2},
The structure of general Banach spaces is much more complicated, and there
exists no result similar to the Riesz–Fischer theorem for Banach spaces.
6. Galerkin’s Method
We will now provide a glimpse into the theory of numerical functional analysis,
that is, the (approximate) numerical solution of equations in infinite dimensional
spaces. One central idea there, which builds on everything that we did in this course, is the so-called Galerkin method. The basic idea of this method is to approximate an infinite dimensional problem by a finite dimensional one of sufficiently high dimension. As a preparation, we return once again to least squares solutions of linear systems.
Theorem 6.61. Assume that U and V are Hilbert spaces, that T : U → V is
bounded linear, and that v ∈ V is given. Then u ∈ U solves the least squares
problem
(66)    min_{u∈U} ∥Tu − v∥²_V
if and only if it solves the normal equation
(67)    T∗Tu = T∗v.
Proof. Consider in addition the projection problem
(68)    min_{w∈ran T} ∥w − v∥²_V.
The problems (66) and (68) are equivalent in the sense that w solves (68) if and only if w = Tu for some solution u of (66).
Since (68) is just a projection problem, we know that w ∈ V is a solution of (68)
if and only if
w ∈ ran T and w − v ∈ (ran T )⊥ = ker(T ∗ ).
That is, w = T u for some u ∈ U and T ∗ (w − v) = 0. This, however, is in turn
equivalent to the normal equation T ∗ (T u − v) = 0. □
Remark 6.62. Note that it can happen that neither of the problems (66)
or (67) has a solution. A part of the claim of the theorem is, however, that if one
of the problems has a solution, then so has the other. ■
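For matrices the equivalence of the least squares problem (66) and the normal equation (67) can be observed directly (a sketch comparing numpy's least squares routine with a direct solution of the normal equation):

    import numpy as np

    rng = np.random.default_rng(4)
    T = rng.standard_normal((20, 5))    # a tall matrix, playing the role of T : U -> V
    v = rng.standard_normal(20)

    u_ls, *_ = np.linalg.lstsq(T, v, rcond=None)   # a minimiser of ||Tu - v||^2
    u_ne = np.linalg.solve(T.T @ T, T.T @ v)       # the solution of the normal equation T*Tu = T*v
    print(np.allclose(u_ls, u_ne))                 # True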
Assume now that U and V are infinite dimensional Hilbert spaces, that T : U →
V is bounded linear, and that v ∈ V is given. Our goal is the solution of the equation
T u = v. Since we cannot really do so in this infinite dimensional setting, we have
to apply some form of discretisation. The idea of Galerkin’s method is to do this
by “projecting” the problem T u = v to a finite dimensional subspace of U .
Assume therefore that W ⊂ U is some finite dimensional subspace and denote
by S := T |W : W → V the restriction of T to W . Then we can consider the least
squares problem
min_{u∈W} ∥Su − v∥²_V.
That is, we try to find a “best solution” of the infinite dimensional problem within
the finite dimensional subspace W . In view of the previous result, this is equivalent
to solving the normal equation
(69) S ∗ Su = S ∗ v,
where S ∗ : V → W is the adjoint of S = T |W . Note moreover that this problem
necessarily has a solution, because W , and therefore also ran(S), is finite dimen-
sional.
Assume now that (u1 , u2 , . . . , uN ) is a basis of W . Then u ∈ W solves (69), if
and only if
⟨S ∗ Su, ui ⟩ = ⟨S ∗ v, ui ⟩
for all 1 ≤ i ≤ N . By the definition of the adjoint S ∗ of S, this is equivalent to the
system of equations
⟨Su, Sui ⟩ = ⟨v, Sui ⟩
for all 1 ≤ i ≤ N . Moreover, by definition Su = T u for all u ∈ W and thus we
obtain the system of equations
(70) ⟨T u, T ui ⟩ = ⟨v, T ui ⟩ for all 1 ≤ i ≤ N.
Finally, we will rewrite this system in matrix-vector form. Since (u1 , . . . , uN )
is a basis of W , we can write every solution u of (70) as
u = Σ_{j=1}^N x_j u_j
with a coefficient vector x ∈ K^N. Inserting this representation into (70) and using the linearity of T, we obtain the equivalent linear system
Σ_{j=1}^N ⟨Tu_j, Tu_i⟩ x_j = ⟨v, Tu_i⟩ for all 1 ≤ i ≤ N,
that is, a system of N linear equations in the N unknowns x₁, . . . , x_N.
Theorem 6.63. Assume that U and V are Hilbert spaces, that T : U → V is bounded linear and bijective with bounded inverse T⁻¹ : V → U, and that v ∈ V. Assume moreover that (u₁, u₂, . . .) is a Hilbert basis of U, denote W_N := span{u₁, . . . , u_N} for N ∈ N, and let u^(N) ∈ W_N be a solution of the Galerkin problem
(72)    min_{u∈W_N} ∥Tu − v∥²_V.
Finally, let u† := T⁻¹v denote the solution of the equation Tu = v. Then
u† = lim_{N→∞} u^(N).
Proof. Since T⁻¹ : V → U is bounded linear, we can estimate
∥u^(N) − u†∥_U ≤ ∥T⁻¹∥_{B(V,U)} ∥Tu^(N) − Tu†∥_V = ∥T⁻¹∥_{B(V,U)} ∥Tu^(N) − v∥_V.
Thus it is sufficient to show that ∥T u(N ) − v∥V → 0. Since u(N ) solves the least
squares problem (72) we have that
∥T u(N ) − v∥V ≤ ∥T πWN (u† ) − v∥V
for all N ∈ N.
Since (u1 , u2 , . . .) is a Hilbert basis of U we have that
π_{W_N}(u†) = Σ_{j=1}^N ⟨u†, u_j⟩u_j
and thus
lim_{N→∞} π_{W_N}(u†) = lim_{N→∞} Σ_{j=1}^N ⟨u†, u_j⟩u_j = u†.
Because of the boundedness of T , this implies that
lim T πWN (u† ) = T u† = v.
N →∞
Thus
lim_{N→∞} ∥Tu^(N) − v∥_V ≤ lim_{N→∞} ∥Tπ_{W_N}(u†) − v∥_V = 0.
□
Remark 6.64. In the proof of the previous result, we actually obtain the con-
crete estimate
∥u^(N) − u†∥_U ≤ ∥T∥_{B(U,V)} ∥T⁻¹∥_{B(V,U)} ∥u† − π_{W_N}(u†)∥_U,
which can be rewritten as
∥u^(N) − u†∥_U ≤ ∥T∥_{B(U,V)} ∥T⁻¹∥_{B(V,U)} ( Σ_{n=N+1}^∞ |⟨u†, u_n⟩|² )^{1/2}.
The approximation error thus depends on the decay properties of the generalised
Fourier coefficients of the true solution u† ; the faster these decay, the better the
approximation. ■
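This dependence on the decay of the coefficients can be illustrated with a small finite dimensional surrogate problem (a sketch only: U = V = Rᵐ, the standard basis plays the role of the Hilbert basis, T is a well-conditioned matrix, and W_N = span{e₁, . . . , e_N}):

    import numpy as np

    rng = np.random.default_rng(5)
    m = 50
    T = np.eye(m) + 0.1 * rng.standard_normal((m, m))   # a boundedly invertible "operator"
    u_true = 1.0 / np.arange(1, m + 1) ** 2              # coefficients <u^dagger, e_n> decaying like 1/n^2
    v = T @ u_true

    for N in [5, 10, 20, 40]:
        xN, *_ = np.linalg.lstsq(T[:, :N], v, rcond=None)   # Galerkin solution in W_N = span{e_1, ..., e_N}
        uN = np.zeros(m)
        uN[:N] = xN
        print(N, np.linalg.norm(uN - u_true))               # the error decreases as N grows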
Example 6.66. The Sobolev space H¹₀([0, 1]) is the completion of the space of all functions u ∈ C¹([0, 1]) satisfying u(0) = u(1) = 0 with respect to the inner
product
⟨u, v⟩₁ := ∫₀¹ u′(x)v′(x) dx.
One can show that this space is indeed a space of functions on the interval [0, 1]
and that the point evaluations δ_x are continuous.⁴ Thus H¹₀([0, 1]) is an RKHS.
Moreover, it is possible to show that the kernel k : [0, 1] × [0, 1] → K is the function
k(y, x) = (1 − x)y if x ≥ y,   and   k(y, x) = x(1 − y) if x ≤ y.
Indeed, if u : [0, 1] → K is (for simplicity) continuously differentiable and satisfies
u(0) = u(1) = 0, then
⟨u, k(·, x)⟩_U = ∫₀¹ u′(y) ∂_y k(y, x) dy = ∫₀^x u′(y)(1 − x) dy − ∫_x^1 u′(y)x dy
= u(x)(1 − x) − u(0)(1 − x) − u(1)x + u(x)x = u(x).
■
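The reproducing property in Example 6.66 can be verified numerically by approximating the inner product integral with a quadrature rule (a sketch; the test function u(y) = sin(πy) satisfies u(0) = u(1) = 0):

    import numpy as np

    m = 100000
    y = (np.arange(m) + 0.5) / m               # midpoint quadrature nodes on [0, 1]
    h = 1.0 / m

    du = lambda t: np.pi * np.cos(np.pi * t)    # derivative of u(t) = sin(pi t)

    x = 0.37
    dk = np.where(y < x, 1.0 - x, -x)           # derivative of k(., x): (1 - x) for y < x, -x for y > x
    inner = h * np.sum(du(y) * dk)              # <u, k(., x)>_1 = int_0^1 u'(y) d/dy k(y, x) dy
    print(inner, np.sin(np.pi * x))             # approximately equal: the kernel reproduces u(x)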
4More details will e.g. be discussed in the course TMA4212 Numerical Solution of Differential
Equations by Difference Methods, as Sobolev spaces are a central tool in the modern theory of
partial differential equations and their analytic and numerical solution.
Let α₁, . . . , α_N ∈ K and define f := Σ_{i=1}^N α_i k(·, x_i). Then
0 ≤ ∥f∥²_U = ⟨f, f⟩_U = ⟨ Σ_{i=1}^N α_i k(·, x_i), Σ_{j=1}^N α_j k(·, x_j) ⟩_U
= Σ_{i=1}^N Σ_{j=1}^N α_i α_j ⟨k(·, x_i), k(·, x_j)⟩_U = Σ_{i=1}^N Σ_{j=1}^N α_i α_j k(x_j, x_i).
□
An alternative way of interpreting these last results is the following: Given
distinct points x1 , . . . , xN ∈ Ω we define a matrix K ∈ KN ×N by setting
Kij := k(xi , xj ).
Then Lemma 6.67 states that K is a Hermitian matrix, and Lemma 6.68 states
that K is positive semi-definite.
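For a concrete kernel both properties are easy to observe numerically. The sketch below uses the Gaussian kernel k(x, y) = exp(−(x − y)²) on Ω = R (a standard example, not taken from the notes) and checks that the matrix K is symmetric with non-negative eigenvalues:

    import numpy as np

    rng = np.random.default_rng(6)
    x = np.sort(rng.uniform(0.0, 1.0, size=8))        # distinct points x_1, ..., x_N
    K = np.exp(-(x[:, None] - x[None, :]) ** 2)       # K_ij = k(x_i, x_j)

    print(np.allclose(K, K.T))                        # True: K is (Hermitian) symmetric
    print(np.linalg.eigvalsh(K).min() >= -1e-12)      # True: the eigenvalues are non-negative (up to rounding)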
Remark 6.69. Assume again that x1 , . . . , xN ∈ Ω are distinct points, and
define the space
V := span{kx1 , . . . , kxN } ⊂ U.
Then V is a Hilbert space itself (being a finite dimensional subspace of U) and the family (k_{x_1}, . . . , k_{x_N}) is by construction contained in V. Moreover, we have that
⟨kxj , kxi ⟩U = ⟨k(·, xj ), k(·, xi )⟩U = k(xi , xj ).
Thus the matrix K ∈ KN ×N with Kij = k(xi , xj ) is in fact the Gram matrix of the
family (kx1 , . . . , kxN ) in V .
In particular, we obtain by Lemma 3.24 that the family (k_{x_1}, . . . , k_{x_N}) is linearly
independent (and thus a basis of V ), if and only if the matrix K is positive definite.
■
Conversely, it is possible to show that every function k that satisfies the con-
ditions of Lemmas 6.67 and 6.68 defines a unique RKHS:
Theorem 6.70 (Moore–Aronszajn). Let Ω be a set and let k : Ω × Ω → K be
Hermitian symmetric and positive semi-definite. Then there exists a unique Hilbert
space U of functions on Ω with reproducing kernel k. Moreover, the space
U₀ := span{ k(·, x) : x ∈ Ω }
is a dense subspace of U .
Proof. See [BTA04, Thm. 3]. □
Remark 6.71. The statement in Theorem 6.70 that the space U0 is dense in
U is the same as saying that functions in U can be arbitrarily well approximated
by finite linear combinations of functions k(·, xj ) for some xj ∈ Ω. That is, for
let
W := { u ∈ U : u(x_i) = 0 for all i = 1, . . . , N }
and denote
Ŵ := ŵ + W = { u ∈ U : u(x_i) = z_i for all i = 1, . . . , N }.
which is the same as the problem of computing the projection of the point 0 ∈ U
onto the affine space Ŵ. Note moreover that W is the intersection of the kernels of the continuous point evaluations δ_{x_1}, . . . , δ_{x_N}, and therefore a closed subspace of U. Thus Corollary 6.8 implies that (73) has a unique solution
u† . Moreover, u† is uniquely characterised by the conditions
u† ∈ Ŵ and u† ∈ W ⊥ .
Now define the mapping T : U → K^N,
Tu := ( ⟨u, k_{x_1}⟩_U, . . . , ⟨u, k_{x_N}⟩_U ) = ( u(x_1), . . . , u(x_N) ).
Then W = ker T . As a consequence, since W is closed, we have that
W ⊥ = (ker T )⊥ = ran T ∗ .
We now determine the mapping T ∗ : KN → U . Here we regard KN as a Hilbert
space with the standard Euclidean inner product. Let c ∈ KN and u ∈ U . Then
⟨u, T∗c⟩_U = ⟨Tu, c⟩_{K^N} = Σ_{i=1}^N ⟨u, k_{x_i}⟩_U c_i = ⟨ u, Σ_{i=1}^N c_i k_{x_i} ⟩_U.
Thus
T∗c = Σ_{i=1}^N c_i k_{x_i}.
As a consequence,
ran T ∗ = span{kx1 , . . . , kxN }.
This further implies that u† ∈ U is the unique function of the form
u† = Σ_{j=1}^N c_j k_{x_j}
such that
u† (xi ) = zi for all i = 1, . . . , N.
Now we note that (cf. Proposition 3.26)
z_i = u†(x_i) = ⟨u†, k_{x_i}⟩_U = ⟨ Σ_{j=1}^N c_j k_{x_j}, k_{x_i} ⟩_U = Σ_{j=1}^N c_j k(x_i, x_j) = Σ_{j=1}^N K_{ij} c_j
for all i = 1, . . . , N. In other words, the coefficient vector c ∈ K^N solves the linear system Kc = z, where z := (z₁, . . . , z_N) ∈ K^N.
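Together with the kernel from Example 6.66, this gives a complete recipe for computing the minimum-norm interpolant: assemble the matrix K, solve Kc = z, and evaluate u† = Σ_j c_j k(·, x_j). A sketch (the interpolation points and values are chosen arbitrarily for illustration):

    import numpy as np

    def k(y, x):                                  # the reproducing kernel of H^1_0([0, 1]) from Example 6.66
        return np.where(y <= x, y * (1.0 - x), x * (1.0 - y))

    xs = np.array([0.2, 0.4, 0.7, 0.9])           # interpolation points x_1, ..., x_N
    zs = np.array([1.0, -0.5, 0.3, 0.8])          # prescribed values z_1, ..., z_N

    K = k(xs[:, None], xs[None, :])               # Gram matrix K_ij = k(x_i, x_j)
    c = np.linalg.solve(K, zs)                    # coefficients of the interpolant

    u_dagger = lambda t: k(t, xs) @ c             # evaluate u^dagger(t) = sum_j c_j k(t, x_j)
    print([float(np.round(u_dagger(t), 8)) for t in xs])   # reproduces zs: the interpolation conditions hold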
Bibliography
[Axl24] Sheldon Axler. Linear algebra done right. Undergraduate Texts in Mathematics. Springer, Cham, fourth edition, 2024.
[BTA04] Alain Berlinet and Christine Thomas-Agnan. Reproducing kernel Hilbert spaces in probability and statistics. Kluwer Academic Publishers, Boston, MA, 2004. With a preface by Persi Diaconis.
[Dev12] Keith J. Devlin. Introduction to mathematical thinking. Keith Devlin, Palo Alto, CA, USA, 2012.
[Hal74] Paul R. Halmos. Naive set theory. Undergraduate Texts in Mathematics. Springer-Verlag, New York-Heidelberg, 1974. Reprint of the 1960 edition.
[HS75] Edwin Hewitt and Karl Stromberg. Real and abstract analysis, volume No. 25 of Graduate Texts in Mathematics. Springer-Verlag, New York-Heidelberg, 1975. A modern treatment of the theory of functions of a real variable, Third printing.