
Linear Methods

Markus Grasmair

Trondheim, Norway

Lecture notes for the course TMA4145 - Linear Methods, Autumn 2024, NTNU
Contents

Chapter 0. Mathematical Language 1


Mathematical Statements 1
Connectors and Quantifiers 2

Chapter 1. Basic Notions 5


0. Reading Guide 5
1. Sets 5
2. Functions 8
3. Families and Sequences 10
4. Real Numbers 11
5*. Choice 13

Chapter 2. Linear Algebra 19


1. Vector Spaces and Linear Transformations 19
2. Linear Independence and Bases 25
3. Matrix Representations of Linear Transformations 31
4. Direct Sums and Invariant Subspaces 33
5. Eigenvalues and Eigenspaces 42
6. Existence of Eigenvalues 44
7. Triangularisation 45
8. Diagonalisation 47
9. Generalised Eigenspaces 49
10. Characteristic and Minimal Polynomial 54
11. Jordan’s Normal Form 57

Chapter 3. Inner Product Spaces and Singular Value Decomposition 61


1. Inner Products 61
2. Bases of Inner Product Spaces 64
3. Orthogonal Complements and Projections 67
4. Singular Value Decomposition 69
5. Unitary Transformations and Adjoints 71
6. Singular Value Decomposition Revisited 74
7. The Moore–Penrose Inverse 75
8. Singular Value Decomposition of Matrices 77
9. Self-adjoint and Positive Definite Transformations 80

Chapter 4. Banach and Hilbert Spaces 85


1. Normed Spaces 85
2. Convergence and Continuity 88
3. Open and Closed Sets 91
4. Cauchy–sequences and Completeness 95
5. Equivalence of Norms 102
6. Banach’s Fixed Point Theorem 104
7. Application of the Fixed Point Theorem: Existence for ODEs 107

8. p-norms 111
9. Sequence Spaces 115
10. Completions 118
Chapter 5. Bounded Linear Operators 121
1. Continuity of linear mappings 121
2. Norms of bounded linear operators 124
3. Extension of Linear Mappings 129
4. Range and Kernel 133
5. Spaces of Bounded Linear Mappings 136
Chapter 6. Hilbert Spaces 141
1. Best approximation and projections 141
2. Projections on subspaces 146
3. Riesz’ Representation Theorem 148
4. Adjoint operators on Hilbert spaces 150
5. Hilbert space bases 155
6. Galerkin’s Method 162
7. Reproducing Kernel Hilbert Spaces 164

Bibliography 171
CHAPTER 0

Mathematical Language

When you open a maths book, you will immediately note that it does not
look the same as other books, or even textbooks in other disciplines. There are of
course the obvious differences like the prevalent use of mathematical symbols and
the frequent appearance of equations, both incorporated in the running text and as
separate lines or even paragraphs. Then there is the strangely fragmented structure
of the text with definitions, theorems, lemmas, proofs, remarks, and so on, but often
hardly any text in between. The differences, however, go deeper than that, and you
will notice that the language itself looks weird. Especially when reading through
definitions and theorems, you might get the impression that they are written in
an unnecessarily complicated and convoluted manner, and these complications can
look even worse in the case of proofs. In the following, we will briefly discuss the
differences between mathematical language and everyday language. Many of the
thoughts in this chapter are based on the excellent book [Dev12], which intends to
ease the transition from high school to university mathematics. Even though you
are not the main target group of that book, having already put up with several
advanced mathematics courses, it can still make sense to skim through it.

Mathematical Statements
If you use language for everyday communication, you usually do not have to
be extremely precise. If something is unclear, you can always ask additional questions. In written communication, this is not always possible, but you still have some context and common sense at your disposal, which can clear up most misunderstandings. In mathematics, you no longer have this luxury, as soon as you are
dealing with more complicated objects than integers or rational numbers, or with
more complex subjects than spatial geometry. Because of this, we have to be very
careful how to use language to talk about abstract mathematical objects.
Modern mathematics is largely concerned with investigating whether state-
ments about mathematical objects are true or false. Examples of such statements
are the following:
(1) The number 7 is a prime number. The number 6 is not a prime number.
(2) The number √2 is irrational.
(3) Every quadratic equation x² + bx + c = 0 has at least two solutions.
(4) Every even natural number greater than 2 can be expressed as the sum
of two prime numbers.
The two statements in (1) are true. The statement (2) is true as well. The
statement (3) is wrong. The statement (4) is Goldbach’s conjecture; at the time of
the writing of this note, it was not yet known whether this is true or false.
In order to find the truth value of a mathematical statement, it is necessary to
actually know what the different expressions in the statement mean. That is, we
need a sufficiently precise definition of all the expressions. For instance, in order
to determine whether the statement that 7 is a prime number is true, we first need
to know what a prime number is. One possibility for defining prime numbers is the
following: “A natural number p is a prime number, if there do not exist numbers a and b, both strictly smaller than p, such that p = ab.” With this definition at
hand, we can then actually verify, or prove, that 7 is a prime number: All we have
to do is to simply compute all the different products ab with natural numbers a
and b strictly smaller than 7. Since the number 7 never appears as such a product,
it is prime. Conversely, one possibility for showing that 6 is not a prime number
is to find numbers a and b strictly smaller than 6 such that their product is 6; for
instance, the numbers a = 3 and b = 2 would do.
Of course, everyone reading this text knew in advance what a prime number
was, and in particular that 7 is prime. Thus the necessity of both a precise definition
of primes and a proof that 7 is a prime might look rather superficial and superfluous.1 If we are dealing with less intuitive concepts, however, the situation
is different. Then, in order to avoid misunderstandings, precise definitions are
crucial. An important example here, which you have seen in your first calculus
class, is that of continuity. Although we might think that we have a good intuitive
understanding of what it means for a function to be continuous, this intuition
can get challenged by complicated functions. It is pretty clear that the function
f(x) = x² is continuous, whereas the function f(x) = sgn(x) is not. But what about, say, the function f(x) = sin(1/x), where we define f(0) := 0? In order to be able to make a statement about its continuity, we need a precise definition.

Connectors and Quantifiers


Most mathematical statements are built up of simpler statements using the
connectors “and”, “or”, “not”, and “if . . . then”. Moreover, the quantifiers “for
all” and “there exists” are regularly needed. While these innocent looking terms
also appear in everyday language,2 there are some notable differences between their
everyday meanings and their mathematical meanings. In the following, we will
briefly review these main connectors and quantifiers in terms of their effect on the
truth values of the component expressions.

And. The statement (ϕ and ψ) is true precisely when both statements ϕ and
ψ are true. If at least one (or both) of the statements is false, it is false as well.
In everyday language, the word “and” can have a slightly different meaning than
the mathematical connector “and” and also convey a temporal or final relationship:
In the sentence “The sun went down and it became dark,” we are really talking
about a sequence of events: First, the sun went down, and then it became dark.
We might even imply that the first part of the sentence is the cause of the second
one.3 In mathematical expressions, the word “and” is never used in that sense. The
mathematical statements (ϕ and ψ) and (ψ and ϕ) are precisely the same, and the
order of the components ϕ and ψ does not matter. Consider in contrast the sinister
meaning of the reversed sentence “It became dark and the sun went down.”

Or. The statement (ϕ or ψ) is true precisely when at least one of the state-
ments ϕ or ψ (or both) are true. It is only false, when both ϕ and ψ are false. For
instance, the statement
(7 is a prime) or (7 < 0)

1Although, there is sometimes doubt about the status of the number 1: Is it a prime or not?
The definition above should clarify this doubt!
2At least the connectors do so pretty regularly; the appearance of the quantifiers is probably
much more sparse.
3You should not jump to conclusions that fast, though: Do not fall easy prey to the fallacy
post hoc ergo propter hoc and always remember that correlation does not imply causation!

is true, as at least one of the component statements (namely the statement that 7
is a prime) is true; it does not matter that the statement 7 < 0 is false.
In addition, we have to be a bit careful when augmenting an “or” statement to
an “either . . . or” statement. In everyday language these two phrases are often, but
not always, used interchangeably, but the “either . . . or” can also be understood as
an exclusive or, which is true whenever precisely one of the component statements is
true, but not both. In this class, I will use the phrase “either . . . or” interchangeably
with “or”. If I need to make use of an exclusive or, I will use more complicated
expressions.4

Not. The negation inverts the truth value of an expression: If the statement ϕ is true, then the statement (not ϕ) is false; if the statement ϕ is wrong, then the statement (not ϕ) is true.
We have the following rules for combining not with and and or (De Morgan’s
laws of propositional logic):
(not(ϕ and ψ)) = ((not ϕ) or (not ψ)),
(not(ϕ or ψ)) = ((not ϕ) and (not ψ)).
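These laws can be checked mechanically by running through all four combinations of truth values; the following small Python sketch (an illustration of mine, not part of the notes) does exactly that.

```python
from itertools import product

# Check both of De Morgan's laws on all four truth-value combinations.
for phi, psi in product([True, False], repeat=2):
    assert (not (phi and psi)) == ((not phi) or (not psi))
    assert (not (phi or psi)) == ((not phi) and (not psi))
print("De Morgan's laws hold for every combination of truth values.")
```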

If — then. The statement (if ϕ then ψ) is false only in the case where ϕ is
true, but ψ is false. That is, in the case where the statement ϕ is false, the statement
(if ϕ then ψ) is true no matter the truth value of ψ. This fact is known as ex falso
quodlibet, roughly translating to “from the wrong, anything [follows]”: If one starts
with a wrong premise (that is, if ϕ is false), then one cannot conclude anything
about the conclusion (that is, we don’t know whether ψ is true or false), even if the
deduction process (that is, the statement (if ϕ then ψ)) was completely correct.
Many mathematical theorems can be formulated in terms of “if — then” state-
ments. As a consequence, reformulations of such statements are regularly used
for the proof of theorems. Probably the most important reformulations are the
following two:
(if ϕ then ψ) = (if (not ψ) then (not ϕ)),
(if ϕ then ψ) = not(ϕ and (not ψ)).
The first reformulation is used in proofs by contraposition: In order to show the
truth of the statement (if ϕ then ψ), we assume that the conclusion ψ does not hold
(that is, we assume (not ψ)) and try to conclude that ϕ does not hold either (that
is, we try to show that (not ϕ) holds). The second reformulation is used in proofs
by contradiction: Again we are trying to show that the statement (if ϕ then ψ)
holds. For that, we assume that ϕ and the negation of ψ are simultaneously true,
and then we try to show that this implies a false statement.
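Again, both reformulations can be verified by a brute-force truth-table check. In the following small Python sketch (my addition), the implication (if ϕ then ψ) is encoded as ((not ϕ) or ψ), which has precisely the truth values described above.

```python
from itertools import product

def implies(phi: bool, psi: bool) -> bool:
    # (if phi then psi) is false only when phi is true and psi is false
    return (not phi) or psi

for phi, psi in product([True, False], repeat=2):
    assert implies(phi, psi) == implies(not psi, not phi)  # contraposition
    assert implies(phi, psi) == (not (phi and (not psi)))  # contradiction form
print("Both reformulations have the same truth table as the implication.")
```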
Be aware that the common meaning of the word “implication” is different from
the truth value of an “if . . . then” expression even in mathematics. For instance,
the statement
(2 + 2 = 4) =⇒ (7 is a prime)
is true as a mathematical statement: Both the antecedent (2 + 2 = 4) and the
consequent (7 is a prime) are true statements. However, we would not say (or
write) that “the assumption that 2 + 2 = 4 implies that 7 is a prime,” but we
rather reserve the notion of “implications” to the case where there is actually a
connection between the two statements.

4Knowing myself, I will invariably use at some point during the lecture a phrase along the
lines of “If a, then either b or c,” even though the phrase calls for a non-exclusive or. This is
intended as a safeguard against accusations of even more errors than really warranted.

For all, there exists. The statement (for all x : ϕ(x)) is true precisely when
the statement ϕ(x) is true for every single admissible argument x. Conversely, it is
false when there exists at least one admissible argument such that ϕ(x) is false. The
statement (there exists x : ϕ(x)) is true when the statement ϕ(x) is true for at least
one (but possibly more) admissible argument x. The negation of (for all x : ϕ(x))
is (there exists x : (not ϕ(x))).
Many mathematical statements are “for all” or “there exists” statements (or
both) in disguise: For instance, the statement “The number 6 is not a prime number.”
can be translated to (or should be interpreted as) the statement “There exist natural
numbers a and b with a < 6 and b < 6, such that ab = 6.” Breaking this down
further, we may arrive at the statement “There exist natural numbers a and b with
the properties a < 6 and b < 6 and ab = 6.”
As a more complicated example, the expression “Every quadratic equation x² + bx + c = 0 has a complex solution” can be translated to “For all numbers b and c, there exists a complex number x with the property x² + bx + c = 0.”
This last example also demonstrates the importance of sentence order: The
statement above is true (and you even know an explicit formula for all the solutions
of the equation), but if we reverse the order of the “for all” and the “there exists”
terms, we obtain the blatantly wrong statement “There exists a complex number
x such that all numbers b and c have the property x² + bx + c = 0.” (In order to
see that this reversed statement is false, consider for fixed x the numbers b = −x
and c = 1.)
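The following small Python sketch (my addition, purely illustrative) shows the difference between the two quantifier orders numerically: for every choice of b and c the quadratic formula produces a complex solution, whereas for any fixed x the choice b = −x and c = 1 provides a counterexample to the reversed statement.

```python
import cmath

def solve_quadratic(b: complex, c: complex) -> complex:
    """Return one complex solution of x^2 + b*x + c = 0."""
    return (-b + cmath.sqrt(b * b - 4 * c)) / 2

# "For all b, c there exists x": verify for a few sample coefficients.
for b, c in [(0, 1), (2, 5), (-3, 2)]:
    x = solve_quadratic(b, c)
    assert abs(x * x + b * x + c) < 1e-12

# "There exists x for all b, c" fails: for any fixed x take b = -x, c = 1.
x = 1.2345
b, c = -x, 1
print(x * x + b * x + c)  # 1.0, so this choice of b, c is a counterexample
```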
CHAPTER 1

Basic Notions

0. Reading Guide
In this note, definitions, remarks, examples, and exercises end with a ■; proofs
in most mathematical texts including this one end with a □. Theorems, proposi-
tions, lemmas, and corollaries are written in italics, which makes it unnecessary to
delineate their boundaries further.
Some of the sections are marked by a ∗ to indicate that they are of (sometimes
considerably) increased difficulty and/or abstraction. The content of these sections
is not used in the other parts of the notes, and thus you can freely skip over these.
The same convention is used for remarks, exercises, proofs, and so on. You may, of
course, read these parts as well, but only at your own risk. You have been warned.

1. Sets
Although it is possible to develop a precise notion of sets, I will not do so in
these notes. The reason is that such a rigorous introduction would be tedious and
difficult, but at the same time would only contribute little to the main subjects
of our studies in this course, namely linear mappings between vector spaces. This
section is intended to ensure that we have a common understanding about how we
can deal with sets. For a classical (and well written) introduction to set theory, I
refer to [Hal74]. I, in contrast, will rather loosely follow the exposition in the first
pages of [HS75].
Informally, a set is “any identifiable collection of objects of any sort.”
objects contained in a set are called its elements or members. There are three main
possibilities to introduce a new set: First, we can simply list all of its elements
and, say, define a set X := {“Norwegian”, “blue”, “parrot”}. This is usually done
for small, finite sets, but we can use dots (. . .) for larger or even infinite sets; here
I assume that everybody understands what I mean by the sets {1, . . . , 1729} or
{3, 4, 5, . . .}. Second, we can define a set by a property of its elements. For instance,
we could define a set

Y := {x : x is a dairy product that can be bought in a cheese shop}.
Finally, we can build up a set from other sets using operations like union, intersec-
tion, or product of sets (see below).
The symbol ∈ indicates that an object is a member of a set; the symbol ̸∈
indicates that it is not. For instance, with the definition of X above, we have that
“parrot” ∈ X, but “Norwegian blue” ̸∈ X.
Definition 1.1 (Subset and proper subset). Assume that X and Y are sets.
We say that X is a subset of Y , denoted X ⊂ Y , if for every x ∈ X we have that
x ∈ Y ; otherwise we write X ̸⊂ Y . If X ⊂ Y and Y ⊂ X, then we write X = Y
(and the sets X and Y are equal); else X ̸= Y . If X ⊂ Y but X ̸= Y , then we say
that X is a proper subset of Y , denoted X ⊊ Y . ■

Remark 1.2. There is an alternative school of notation that uses the symbol ⊆
to denote subsets, and ⊂ to denote proper subsets. We do not follow this notation

in this course. However, it is important to know that this alternative notation exists, especially when you are reading additional support material. In this case, it
is important that you check what notation is used there. ■

Remark 1.3. If we want to show that two sets, say X and Y , are equal, we have
to show that they contain precisely the same elements. Thus, the basic approach
for a proof is, first to take an arbitrary element x ∈ X and prove that it is also
contained in Y , and then to take an arbitrary element y ∈ Y and prove that it is
also contained in X. ■

Definition 1.4 (Empty set). By ∅ we denote the empty set, which has no
elements. ■

Statements about the empty set always appear somewhat awkward, and it can
be difficult to formulate proofs, as the “standard start” of picking some element
x ∈ ∅ does not really work. Thus it can be helpful to rely on proofs by contradiction.
As an example, we prove the following (really trivial) statement:
Lemma 1.5. For every set X we have ∅ ⊂ X.
Proof. Let X be an arbitrary set, and assume to the contrary that ∅ ̸⊂ X.
Then there exists some y ∈ ∅ such that y ̸∈ X. Because of the definition of the
empty set, however, such a y cannot exist, and thus we have a contradiction. As a
consequence, the assumption ∅ ̸⊂ X is wrong, or, put differently, ∅ ⊂ X. □

Remark 1.6. In many cases, the elements of a set can be sets themselves. As
an example, a line is a set of points. Thus, if we are talking about the set of all lines
in the plane, we are in fact talking about a set consisting of sets. ■

Definition 1.7 (Union, intersection, difference, and product). Assume that


X and Y are sets.
• The union X ∪ Y is the set of all elements contained in at least one of the
sets X and Y . That is,
x ∈ X ∪ Y : ⇐⇒ x ∈ X or x ∈ Y.
• The intersection X ∩ Y is the set of all elements contained in both of the
sets X and Y . That is,
x ∈ X ∩ Y : ⇐⇒ x ∈ X and x ∈ Y.
• The difference X\Y of the sets X and Y is the set of all elements contained
in X but not in Y . That is,
x ∈ X \ Y : ⇐⇒ x ∈ X and x ̸∈ Y.
• The Cartesian product (or simply product) of the sets X and Y is the set
X × Y consisting of all ordered pairs (x, y) such that
(x, y) ∈ X × Y : ⇐⇒ x ∈ X and y ∈ Y.

Definition 1.8 (Power set). Let X be a set. By P(X) we denote the power
set of X, which consists of all subsets of X. That is,
S ∈ P(X) : ⇐⇒ S ⊂ X.


Example 1.9. Let X := {“Norwegian”, “blue”}. Then



P(X) = {∅, {“Norwegian”}, {“blue”}, {“Norwegian”, “blue”}}.
Note here that the elements of P(X) are themselves sets. In particular, the set
{“blue”} is not the same as the element “blue”. Note also that both statements
∅ ⊂ P(X) and ∅ ∈ P(X) are simultaneously true: the empty set is a subset of
P(X) (as it is a subset of every set), but at the same time, the empty set appears
amongst the elements of P(X). ■
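For small finite sets, the power set can be listed mechanically. The following Python sketch (my addition) uses frozensets so that the elements of P(X) are themselves (hashable) sets, and it confirms both statements about the empty set.

```python
from itertools import combinations

def power_set(X):
    """Return the set of all subsets of X, each subset given as a frozenset."""
    X = list(X)
    return {frozenset(S) for r in range(len(X) + 1) for S in combinations(X, r)}

P = power_set({"Norwegian", "blue"})
print(len(P))             # 4
print(frozenset() in P)   # True: the empty set is an element of P(X)
print(set() <= P)         # True: the empty set is also a subset of P(X)
```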

Exercise 1.1. Assume that X is a set. Determine the following sets:


• The set X ∪ ∅.
• The set X ∩ ∅.
• The set X \ ∅.
• The set ∅ \ X.
• The set X × ∅.
• The set P(∅).

Definition 1.10 (Complement). Assume that X is a set and U ⊂ X is a


subset. By U^C := X \ U we denote the complement of U (in X). ■

Remark 1.11. When we are working with sets, we do not care about the “or-
der” in which elements appear in a set, nor the “multiplicity”. Thus the sets
{“blue”, “parrot”}, {“parrot”, “blue”}, and {“blue”, “parrot”, “blue”} are all the
same (as they contain the same elements), even though they look different at first
glance. ■

Remark 1.12* (Russell’s paradox). When we specify a set by the properties


of its elements, we have to be very careful not to destroy the very fabric of math-
ematics. In fact, if we allow for the construction of arbitrary sets in that way, we
readily run into contradictions: Take for instance the “set”

X := {Y : Y is a set},
that is, the set containing all possible sets (including itself) as elements. Then we
can consider the subset

Z := {Y ∈ X : Y ̸∈ Y }
of all sets that do not contain themselves as an element. Now we can ask the question,
whether the set Z is an element of itself, that is, whether Z ∈ Z holds, or not. The
answer to this question is that neither can logically be the case:
• Assume first that the inclusion Z ∈ Z holds, that is, that Z contains
itself as an element. By the definition of Z as the set of all sets that do
not contain themselves as an element, this would further imply that Z is not
contained in Z. This, however, contradicts the initial assumption Z ∈ Z,
and therefore that assumption cannot hold.
• Now assume conversely that Z ̸∈ Z. Since Z consists of all sets that do
not contain themselves as an element, the statement Z ̸∈ Z then implies that
Z satisfies the description of the elements of Z, and thus we can conclude
that Z ∈ Z. Again, this contradicts the initial assumption Z ̸∈ Z.
Thus neither of the statements Z ∈ Z or Z ̸∈ Z can be true, and we have a
contradiction.
The common solution is to deny the object X the status of a “set.” More
precisely, we allow the specification of sets by the properties of their elements only
in the case of subsets. Given an already existing set U (which may of course contain
elements that are themselves sets) and some property p(x) for elements in U , we
are allowed to define a set of the form

{x ∈ U : the property p(x) holds} ⊂ U.
However, the same construction is not allowed without some (explicit or implicit)
specification of the superset U .
Still, one sometimes wants to talk simultaneously about all different sets, or, in
this course rather, simultaneously about all possible real or complex vector spaces.
In order to do so, one can use the notion of a class instead of a set, which is (very
informally) a collection of sets defined by a common property, but is not a set
itself. Thus one can talk about the class of all sets or the class of all vector spaces.
Now, the contradiction we have observed above is no longer there: The class of all
sets that do not contain themselves as elements is a perfectly fine class. Moreover, it
trivially does not contain itself as an element, as it only contains sets, not (proper)
classes. ■

2. Functions
Informally speaking, a function f : X → Y from a set X to a set Y is an operation
that assigns to each element x ∈ X a unique element y := f (x) ∈ Y . Formally,
within the context of set theory, we can define a function as follows:
Definition 1.13 (Function). Let X and Y be sets. A function from X to Y ,
written f : X → Y , is a subset f ⊂ X × Y such that there exists for each x ∈ X
precisely one y ∈ Y with (x, y) ∈ f . Given x ∈ X, we denote by f (x) the unique
element in Y such that (x, f (x)) ∈ f . ■

Remark 1.14. In this course, we will not make use of the set theoretic definition
(and notation) of a function, but we will rather use the “standard” interpretation as
an operation. Feel therefore free to forget or ignore the set theoretic Definition 1.13.

Definition 1.15 (Domain and codomain). Assume that X and Y are sets and
that f : X → Y is a function. We say that X is the domain of f and Y the codomain
of f . ■

Remark 1.16. In this course, we will use the word function interchangeably
with mapping. Sometimes, we will also use the words transformation or operator,
though these will be mostly reserved for linear mappings. ■

Definition 1.17 (Injectivity, surjectivity, and bijectivity). Assume that X and


Y are sets and that f : X → Y is a function.
• The function f is injective, if there exists for each y ∈ Y at most one
x ∈ X such that f (x) = y.
• The function f is surjective, if there exists for each y ∈ Y at least one
x ∈ X such that f (x) = y.
• The function f is bijective, if there exists for each y ∈ Y precisely one
x ∈ X such that f (x) = y.
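For functions between finite sets, all three properties can be checked simply by counting pre-images. The following Python sketch (an illustration of mine, with the function stored as a dictionary) does exactly that.

```python
def classify(f: dict, Y: set) -> str:
    """Classify a function between finite sets, given as a dict {x: f(x)} with codomain Y."""
    counts = {y: sum(1 for x in f if f[x] == y) for y in Y}
    injective = all(n <= 1 for n in counts.values())   # each y has at most one pre-image
    surjective = all(n >= 1 for n in counts.values())  # each y has at least one pre-image
    if injective and surjective:
        return "bijective"
    if injective:
        return "injective, not surjective"
    if surjective:
        return "surjective, not injective"
    return "neither"

print(classify({1: "a", 2: "b"}, {"a", "b", "c"}))     # injective, not surjective
print(classify({1: "a", 2: "a", 3: "b"}, {"a", "b"}))  # surjective, not injective
print(classify({1: "a", 2: "b"}, {"a", "b"}))          # bijective
```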

Lemma 1.18 (Inverse). Assume that X and Y are sets and that f : X → Y is
a bijective function. Then there exists a unique function f −1 : Y → X, the inverse
of f , such that
f (f −1 (y)) = y for all y ∈ Y
and
f −1 (f (x)) = x for all x ∈ X.

Proof. We have to show that a function f −1 : Y → X with these properties


exists, and that it is unique. We start with the existence of f −1 , which we prove
by actually constructing f −1 .
Let y ∈ Y be arbitrary. Since f is bijective, there exists a unique x ∈ X such
that f (x) = y. Thus we can define f −1 (y) := x. Doing so for each y ∈ Y , we obtain
a function f −1 : Y → X. Moreover, by construction, we have that f (f −1 (y)) = y
for each y ∈ Y .
Now let x ∈ X be arbitrary and denote y := f (x). Then
f (x) = y = f (f −1 (y)) = f (f −1 (f (x))).
Now the injectivity of f implies that x = f −1 (f (x)). Thus the function f −1 has
the desired properties.
Assume now that g : Y → X is another function such that f (g(y)) = y for all
y ∈ Y and g(f (x)) = x for all x ∈ X. Let moreover y ∈ Y . Then f (g(y)) =
y = f (f −1 (y)). Now the injectivity of f implies that g(y) = f −1 (y). Since y was
arbitrary, this proves that g = f −1 . Thus f −1 is the unique function with these
properties. □
Definition 1.19 (Composition). Assume that X, Y , Z are sets, and that
f : X → Y and g : Y → Z are functions. We define the composition of g and f as
the function g ◦ f : X → Z,
g ◦ f (x) := g(f (x)) for all x ∈ X.

Lemma 1.20. Assume that X, Y , Z are sets, and that f : X → Y and g : Y → Z


are functions.
• If g ◦ f is injective, then f is injective.
• If g ◦ f is surjective, then g is surjective.
• If g and f are injective, then g ◦ f is injective.
• If g and f are surjective, then g ◦ f is surjective.
• If g and f are bijective, then g ◦ f is bijective, and we have that
(g ◦ f )−1 = f −1 ◦ g −1 .
Proof. Exercise! □
Exercise 1.2. In Lemma 1.20, some implications are missing—for the excel-
lent reason that they do not hold. In this exercise, you are tasked with finding
counterexamples to these missing implications. That is, find sets X, Y , Z and
functions f : X → Y and g : Y → Z such that:
• g ◦ f is injective, but g is not injective.
• g ◦ f is surjective, but f is not surjective.
• g ◦ f is bijective, but neither g nor f is bijective.

Definition 1.21 (Restriction). Assume that X and Y are sets and that f : X →
Y is a function. Assume moreover that U ⊂ X is a subset. By f |U we denote the
restriction of f to U , defined as the function f |U : U → Y satisfying
f |U (x) := f (x) for all x ∈ U.

Definition 1.22 (Image and pre-image). Assume that X and Y are sets and
that f : X → Y is a function. If A ⊂ X is a subset of X we denote by

f (A) := {y ∈ Y : there exists x ∈ A with f (x) = y}
the image of A under f .


If B ⊂ Y is a subset of Y we denote by
f −1 (B) := {x ∈ X : f (x) ∈ B}


the pre-image of B under f . ■

Remark 1.23. Assume that f : X → Y is a bijective function with inverse f −1 .


Then, if B ⊂ Y is a subset of Y , we can consider both the pre-image of B under
the function f and the image of B under the function f −1 , and we actually use the
same notation for these two sets. Fortunately, this is no problem, because they are
actually the same.
In addition, if B = {y} ⊂ Y is a subset consisting of a single element y ∈ Y ,
we can apply both the inverse f −1 to the element y, and the pre-image to the set
B. In this case, we obtain (after a brief consideration) that
f −1 ({y}) = {f −1 (y)}.
Note here, that the f −1 on the left hand side denotes the pre-image operation,
whereas the f −1 on the right hand side denotes the inverse function.
Because of this, the fact that f −1 denotes two different objects poses no dif-
ficulties in practice. However, it is important to remember that the pre-image op-
eration is also defined for non-invertible functions. In particular, using the notion
“f −1 ” for the pre-image does by no means imply that the function f is actually
invertible. ■
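The following small Python sketch (my addition) illustrates the pre-image operation for a non-invertible function; note that the notation f −1 here refers to the pre-image of a set, not to an inverse function.

```python
def preimage(f: dict, B: set) -> set:
    """Pre-image f^{-1}(B) of a set B; defined even when f is not invertible."""
    return {x for x in f if f[x] in B}

# f(x) = x^2 on the finite domain {-2, -1, 0, 1, 2}; not injective, hence not invertible
f = {x: x * x for x in [-2, -1, 0, 1, 2]}
print(preimage(f, {4}))   # {-2, 2}: two pre-images, so no inverse function exists
print(preimage(f, {3}))   # set(): the pre-image of a set may well be empty
```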

Proposition 1.24. Assume that X and Y are sets and that f : X → Y is a


function. For every subset A ⊂ X we have
A ⊂ f −1 (f (A)).


For every subset B ⊂ Y we have


f (f −1 (B)) ⊂ B.


Proof. Exercise! □

3. Families and Sequences


Recall that sets are unordered collections of objects without repetition. In many
cases, however, we will need “sets” where the order is important and repetitions
might occur. For this type of “sets”, we introduce the notion of a family, or,
informally, an indexed set.
Definition 1.25 (Family). Assume that X is a set. A family in X is a function
x : I → X for some set I. The set I is called the index set of the family and the
values x(i) are the members of the family. Usually, we denote a family in the form
(xi )i∈I , where I is the index set and xi := x(i) are the members or terms of the
family. ■

In order to differentiate between sets and families, we will use curly brackets
{·} to enclose sets, but round brackets (·) to enclose families.
Remark 1.26. If the index set of a family is the set {1, 2, . . . , n} for some
n ∈ N, we rather speak of a tuple instead of a family. Also, we usually use in this
case either the notation (x1 , x2 , . . . , xn ) or (xi )i=1,...,n . ■

Example 1.27. In order to see the difference between a family and a set, let
us first consider the set
X := {xk ∈ R : xk = (−1)^k , k ∈ N} = {(−1)^k : k ∈ N}.

By defining the set X in that way, we might get the impression that X has infinitely
many elements, but this is in fact not the case: A set is defined only through the
distinct elements it contains, and it does not matter how many times the same
element appears in the definition; the notion of “multiplicity” of elements in a set
is not defined at all. Next, we could also write the set X as
X = {−1, +1, −1, +1, −1, . . .}.
Still, this might give a quite wrong impression about X as a set, as it looks like −1
would be the “first” element of X, then +1 would be the “second”, −1 would be
again the “third”, and so on. However, the order in which its elements appear in
the definition plays no role at all, and it does not make sense to talk about “the
first element” of a set. The set X simply is
X = {+1, −1} = {−1, +1}.
Now consider instead the family
Y := ((−1)^k )k∈N = (−1, +1, −1, +1, . . .).
Here it is actually true to say that this family has infinitely many members, although
it, obviously, only has 2 distinct ones. Also, −1 is the first term, +1 is the second
term, and so on. ■
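The distinction mirrors the difference between a set and a tuple in a programming language such as Python, as the following short illustration (my addition) shows.

```python
# A set ignores order and multiplicity ...
X = {(-1) ** k for k in range(1, 11)}
print(X)        # {1, -1}: only two distinct elements

# ... whereas a family (here: a tuple indexed by k = 1, ..., 10) keeps both.
Y = tuple((-1) ** k for k in range(1, 11))
print(Y)        # (-1, 1, -1, 1, ...): ten terms, the first one being -1
print(len(Y))   # 10
```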

In the special case where the index set is the set of natural numbers, we usually
speak of a sequence instead of a family:
Definition 1.28 (Sequence). Assume that X is a set. A sequence in X is a
function x : N → X. Usually, we denote a sequence in the form (xi )i∈N , where
xi := x(i) are the terms of the sequence. ■

Remark 1.29*. Although we won’t need the notion of a subsequence in this


course, we can use the definition of sequences as functions in order to obtain a
precise definition of subsequences as well:
Assume that x : N → X is a sequence in X. A subsequence of x is the composition x ◦ k : N → X of x with a strictly increasing function k : N → N (in the sense
that k(i) > k(j) whenever i > j). Usually, we denote a subsequence in the form
(xk(i) )i∈N , where (xi )i∈N is the original sequence. ■

In a slight abuse of notation, we extend the notion of subsets to families:


Definition 1.30. Assume that X is a set and that (xi )i∈I is a family in X.
Assume moreover that Y ⊂ X is a subset of X. We say that the family is contained
in Y , denoted (xi )i∈I ⊂ Y , if each member of the family is contained in Y , that is,
if xi ∈ Y for all i ∈ I. ■

Definition 1.31 (Subfamily). Assume that X is a set and that S := (xi )i∈I is a


family in X. We say that T is a subfamily of S, if there exists a subset J ⊂ I such
that T = (xj )j∈J . In this case, we write T ⊂ S. ■

4. Real Numbers
There are different approaches to the definition of the real numbers, all of which
are somewhat complicated and not necessarily intuitive. Because of that, we will
abstain from providing a precise definition, but rather assume that everybody has
an intuitive understanding of what we mean by the real numbers. At some point
during the course (specifically, in Section 4), however, we have to make use of the
completeness of real numbers, a concept that we will introduce only then. In this
section, we will therefore introduce the main property of the real numbers that will
be required then.

Definition 1.32 (Lower and upper bound). Let S ⊂ R.


• We say that y ∈ R ∪ {±∞} is a lower bound of S, if y ≤ x for all x ∈ S.
• We say that z ∈ R ∪ {±∞} is an upper bound of S, if z ≥ x for all x ∈ S.
• The set S is bounded, if it has both an upper and lower bound within the
real numbers R.

As an example, −29 is a lower bound of the half-open interval (0, 1], whereas
1 is an example of an upper bound. We have for all subsets S ⊂ R that +∞ is an
upper bound of S and −∞ is a lower bound of S. For the particular case S = ∅,
every element of R ∪ {±∞} is simultaneously a lower bound and an upper bound.
Definition 1.33 (Infimum and supremum). Let S ⊂ R.
• We say that y ∈ R ∪ {±∞} is a supremum of S, if y is an upper bound,
and y ≤ z for all upper bounds z of S.
• We say that y ∈ R ∪ {±∞} is an infimum of S, if y is a lower bound of S
and y ≥ z for all lower bounds z of S.

Put differently, the supremum of a set S is the smallest upper bound of S,


whereas the infimum is the largest lower bound. In the next proposition, we show
that the use of the definite article in the previous sentence was actually justified.
Proposition 1.34. The supremum and infimum of a set S ⊂ R are unique, if
they exist.
Proof. Assume that y and z are two suprema of S. Then we have by definition
that y ≤ z (because y is a supremum) and z ≤ y (because z is a supremum), and
therefore y = z. The proof of the uniqueness of infima is similar. □
Proposition 1.35. Assume that S ⊂ R is a non-empty bounded set.
• We have that x = sup S, if and only if x ≥ s for all s ∈ S and there exists
for every ε > 0 some sε ∈ S such that sε > x − ε.
• We have that y = inf S , if and only if y ≤ s for all s ∈ S and there exists
for every ε > 0 some sε ∈ S such that sε < y + ε.
Proof. We only show the first part of the claim, as the second is similar.
Assume that x = sup S. Then x is an upper bound of S and therefore x ≥ s
for every s ∈ S. Now assume to the contrary that there exists some ε > 0 such
that s ≤ x − ε for all s ∈ S. Then x − ε is also an upper bound of S, but x − ε < x.
This contradicts the assumption that x is the smallest upper bound of S.
Conversely, assume that x ∈ R is such that x ≥ s for all s ∈ S and there exists
for every ε > 0 some sε ∈ S such that sε > x − ε. Then clearly x is an upper bound
of S. Assume now to the contrary that x is not the smallest upper bound of S.
Then there exists some upper bound z of S with z < x. In particular, s ≤ z for all
s ∈ S. Define now ε := x − z. Then there exists sε ∈ S with sε > x − ε = z. This
contradicts the assumption that z was an upper bound of S. Thus x is indeed the
smallest upper bound of S, which proves the assertion. □
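As a small numerical illustration of this characterisation (my addition, using a finite sample of the set, so only an approximation): for S = {1 − 1/n : n ∈ N} the supremum is 1, it is not attained, and for every ε > 0 there are elements of S above 1 − ε.

```python
# A finite sample of S = {1 - 1/n : n in N}; sup S = 1, but 1 is not an element of S.
S = [1 - 1 / n for n in range(1, 10_000)]
x = 1.0

print(all(s <= x for s in S))   # True: x is an upper bound of S
for eps in [0.5, 0.1, 0.001]:
    s_eps = next(s for s in S if s > x - eps)
    print(eps, s_eps)           # an element of S that lies above x - eps
```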
A central property of the real numbers (which we state here as an axiom) is
that infimum and supremum of arbitrary sets actually exist.
Axiom 1.36. Every subset of R has a supremum and an infimum in R∪{±∞}.
Definition 1.37 (Inf and sup). Let S ⊂ R. By inf S and sup S we denote the
infimum and supremum of S, respectively. ■

Example 1.38. The infimum of the half-open interval [0, 1) is 0, and the su-
premum is 1. More generally, if I = (a, b) is any interval in R, then inf I = a and
sup I = b, and the same holds if we replace the open interval by the closed interval
or any half-open interval.
Moreover, we have that
inf R = −∞ and sup R = +∞,
whereas1
inf ∅ = +∞ and sup ∅ = −∞.

Proposition 1.39. Infimum and supremum have the following properties:


• If S ⊂ T ⊂ R, then
inf T ≤ inf S and sup S ≤ sup T.
• If S is a non-empty set and x ∈ S, then
inf S ≤ x ≤ sup S.
• For all sets S ⊂ R we have that
inf(−S) = − sup S,

where −S := {−x : x ∈ S}.
Proof. Exercise! □

5*. Choice
Modern mathematics is built upon the foundation of axiomatic set theory.
However, this foundation is buried so deep under lots of familiar looking definitions
and theorems that you almost never will come in contact with it, unless you want
to specialise in set theory or in logic. There is one major exception, though, in the
form of a set theoretic axiom that consistently shows itself in unexpected places,
namely the axiom of choice. In this class, we will, for instance, meet this axiom
when we will discuss the existence of bases of arbitrary vector spaces.
In this section, I will present a brief overview of a naive approach to axiomatic
set theory up to and including the notorious axiom of choice. I will largely follow
here the basic approach of [Hal74], but will skip over almost all of the details; you
can find them there if you are interested. Let us thus start our journey into the
mathematical underworld with the fitting “lasciate ogne speranza, voi ch’intrate.”
Or, on a slightly more positive note, quoting Paul Halmos [Hal74, p. vi]: “[G]eneral
set theory is pretty trivial stuff really, but, if you want to be a mathematician, you
need some, and here it is; read it, absorb it, and forget it.”
In axiomatic approaches to set theory, one generally ignores the question of
what sets actually are, but rather focusses on what one can do with sets. Essentially,
one specifies operations that are allowed with sets and that result in other sets
being formed.
Axiom 1.40* (Axiom of Extension). Two sets are equal, if and only if they contain the same elements.
This is just a rewording (and axiomatisation) of Definition 1.1.
Axiom 1.41* (Axiom of Specification). If Y is a set and p(x) is a statement
about the elements x ∈ Y , then there exists a set X ⊂ Y containing precisely the
elements x ∈ Y for which the statement p(x) is true.
1Do verify the next statement!


In other words, the “object” {x ∈ Y : p(x) is true} is actually a set. Note here
that we explicitly restrict ourselves to subsets of a given set, thereby sidestepping
the issues discussed in Remark 1.12*.
Axiom 1.42* (Axiom of Pairing). If X and Y are sets, there exists a set Z
containing both X and Y .
Note here that the elements of the set Z are the sets X and Y , not the elements
of these sets. The latter is only taken care of in the next axiom:
Axiom 1.43* (Axiom of Unions). For every collection of sets, there exists a set
containing each element of each set in that collection.
Put (slightly) differently, we can form the union of arbitrarily many sets.
Axiom 1.44* (Axiom of Powers). For every set X, there exists a set that con-
tains all the subsets of X as elements.
In particular, this means that the power set P(X) of any set X is again a set.
Note that all axioms until now build on already existing sets, but we have not
yet stated that “sets” actually exist. We will finally do so in the next axiom:
Axiom 1.45* (Axiom of Infinity). The set N of natural numbers exists.
Remark 1.46*. Usually, this axiom is formulated quite a bit differently: One
starts by defining 0 := ∅, then defines 1 := {0} = {∅}, 2 := {0, 1} = {∅, {∅}},
3 := {0, 1, 2}, and so on. The axiom of infinity then is formulated as stating that
the empty set ∅ exists, as well as a set that contains all the sets 0, 1, 2, . . . ,
constructed in that manner.
In particular, we obtain from this construction that, viewed from a set theoretical perspective, the set of natural numbers is N = {0, 1, 2, 3, . . .}, starting at 0.
However, a guesstimated majority of mathematicians excludes 0 from belonging to
the natural numbers and defines N = {1, 2, 3, . . .} starting at 1. I personally belong
to the latter camp as a mathematician and to the former camp when programming,
but really do not bother too much. You can therefore assume that the set N does
not contain the number 0 whenever I use it as an index set for a sequence, although
it mostly should not matter. At the same time, I am not offended if you follow the
alternative convention and insist that 0 is natural.2 ■

With these axioms, it is already possible to perform most constructions you


will ever need during your mathematical life (without you even ever noticing this).
In particular, it is possible (though by no means simple or intuitive) to define what
the Cartesian product of two sets is,3 which then allows us to define the notions of
functions and families. Now we are able to define the Cartesian product of arbitrary
families of sets.
Definition 1.47*. Assume that (Xi )i∈I is a collection of sets and denote by
X := ⋃i∈I Xi

2As a fine Norwegian compromise position, we could decide to take the average of the two
definitions and define N = {1/2, 3/2, 5/2, . . .}, though I do doubt that this definition would gain
much traction in practice.
3
This construction is somewhat roundabout and unintuitive, though, as we do not have the
notion of pairs at our disposal: If X and Y are sets, we can define the Cartesian product of X
and Y as the set of all sets of the form {x, {x, y}} with x ∈ X and y ∈ Y .

the union of this collection. The Cartesian product of the collection of sets is the
set ×i∈I Xi defined as

×i∈I Xi := {x : I → X : xi ∈ Xi for all i ∈ I}. ■

Axiom 1.48* (Axiom of Choice). Assume that (Xi )i∈I is a family of non-empty
sets Xi indexed by a non-empty index set I. Then the Cartesian product ∏i∈I Xi
is non-empty.
In view of the definition of the Cartesian product, we can reformulate the Axiom
of Choice in terms of functions as follows: If (Xi )i∈I is a family of non-empty sets
Xi indexed by a non-empty index set I, there exists a function f : I → ⋃i∈I Xi
such that f (i) ∈ Xi for all i ∈ I. Such a function is called a choice function for
this family. Colloquially, we can say that we may (simultaneously) choose from any
collection of sets one element from each of the sets in that collection.
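For a finite family of finite sets, a choice function can be written down explicitly, as in the following Python sketch (my addition), which simply picks the smallest element of each set; the point of the Axiom of Choice is that such a function also exists when no explicit recipe of this kind is available.

```python
# A family (X_i)_{i in I} of non-empty finite sets, indexed by I = {"a", "b", "c"}.
family = {"a": {3, 1, 4}, "b": {1, 5}, "c": {9, 2, 6}}

# An explicit choice function f with f(i) in X_i for every i in I:
choice = {i: min(X_i) for i, X_i in family.items()}
print(choice)                                       # {'a': 1, 'b': 1, 'c': 2}
print(all(choice[i] in family[i] for i in family))  # True
```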
At first glance, this axiom is obviously true, and one might think that there
is nothing controversial about it: Of course, we can choose elements from sets.
However, the axiom does not put any limits on the number and form of sets from
which we choose, and it does not provide any concrete method for constructing a
choice function. For instance, we can consider the situation where the collection
of sets we are interested in is just the power set of R excluding ∅, that is, the set
of all the non-empty subsets of R. The Axiom of Choice then claims/postulates
that it is possible to select from each non-empty subset X ⊂ R an element x ∈ X.
For certain subsets, such a selection is easily possible: For bounded intervals, for
instance, we could simply take the midpoint. However, the assertion is that a selection
is possible no matter the shape of the subset of R.
We now look at two important equivalent reformulations of the Axiom of Choice
intended to illustrate the controversial nature of said axiom, namely the Well-ordering Theorem and Zorn’s Lemma. For this, we first have to introduce the notion
of ordered sets.
Definition 1.49* (Partial order). Let X be a non-empty set. A partial order
on X is a subset R ⊂ X × X with the following properties:
• Reflexivity: For all x ∈ X we have that (x, x) ∈ R.
• Anti-symmetry: If (x, y) ∈ R and (y, x) ∈ R, then x = y.
• Transitivity: If (x, y) ∈ R and (y, z) ∈ R, then (x, z) ∈ R.
Given a partial order R on X, we usually indicate the relation (x, y) ∈ R by x ≤R y.
A set X together with a partial order on X is called a partially ordered set. ■
Definition 1.50* (Total order). Let X be a non-empty set and R be a partial
order on X. We say that R is a total order, if for all x, y ∈ X, at least one of the
relations x ≤R y or y ≤R x holds. A set X together with a total order on X is
called a totally ordered set. ■

Example 1.51*. On Rn we can consider the standard, or componentwise, par-


tial order x ≤R y, if and only if xi ≤ yi for all 1 ≤ i ≤ n. This is indeed a partial
order, as the following considerations show:
• If x ∈ Rn , then, trivially, xi ≤ xi for each i, which implies that x ≤R x.
• If x ≤R y and y ≤R x, then xi ≤ yi and yi ≤ xi for each i, which implies
that xi = yi for each i. Thus x = y.
• If x ≤R y and y ≤R z, then we have for each i that xi ≤ yi and yi ≤ zi ,
which implies that xi ≤ zi for each i. Thus we also have that x ≤R z.

In dimensions n ≥ 2, this is not a total order. For instance, the vectors e1 =


(1, 0, 0, . . .) and e2 = (0, 1, 0, . . .) are not comparable: Neither of the inequalities
e1 ≤R e2 or e2 ≤R e1 holds. ■
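A quick computational check of this example (a small Python sketch of mine) confirms reflexivity and the incomparability of e1 and e2.

```python
def leq(x, y):
    """Componentwise partial order on R^n: x <= y iff x_i <= y_i for every i."""
    return all(xi <= yi for xi, yi in zip(x, y))

e1 = (1, 0, 0)
e2 = (0, 1, 0)
print(leq(e1, e1))                # True: reflexivity
print(leq(e1, e2), leq(e2, e1))   # False False: e1 and e2 are not comparable
```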

Example 1.52*. Let Y be any set and let X := P(Y ) be the power set of Y
consisting of all subsets of Y . On X we can define a partial order by (S, T ) ∈ R if
S ⊂ T . In general, this is not a total order.4 ■

Definition 1.53*. Let X be a non-empty set with partial order R. Assume


moreover that S ⊂ X is a non-empty subset of X.
• We say that x ∈ X is a lower bound of S, if x ≤R y for all y ∈ S.
• We say that x ∈ X is a smallest element of S, if x ∈ S and x ≤R y for all
y ∈ S.
• We say that x ∈ X is an upper bound of S, if y ≤R x for all y ∈ S.
• We say that x ∈ X is a largest element of S, if x ∈ S and y ≤R x for all
y ∈ S.
• We say that x ∈ X is a minimal element of S, if x ∈ S and there does
not exist any y ∈ S, y ̸= x, such that y ≤R x.
• We say that x ∈ X is a maximal element of S, if x ∈ S and there does
not exist any y ∈ S, y ̸= x, such that x ≤R y.

Remark 1.54*. It is easy to see (in view of the anti-symmetry of orders) that
smallest elements are necessarily unique, if they exist. Thus we may as well speak of
the smallest element of a set S instead of a smallest element, once we have assured
that such an element actually exists. In contrast, minimal and maximal elements
need not be unique. ■

Example 1.55*. Take the set X = R2 with the componentwise order R and consider the subset S := {(x1 , x2 ) ∈ R2 : x1 ≥ 0, x2 ≥ 0}. Then (0, 0) is the smallest element of S, as (0, 0) ≤R (x1 , x2 ) for all (x1 , x2 ) ∈ S. Now consider the set T := {(x1 , x2 ) ∈ R2 : x2 ≥ −x1 }. Then it is easy to see that there does not exist any (x1 , x2 ) ∈ T different from (0, 0) such that (x1 , x2 ) ≤R (0, 0). Thus (0, 0) is a minimal element of T . However, (0, 0) is not a smallest element of T , as, for instance, (1, −1) ∈ T ,
but (0, 0) ̸≤R (1, −1). In fact, the set T has no smallest element at all. ■

Definition 1.56* (Well-ordering). Let X be a non-empty set with total order


R. We say that R is a well-ordering of X, if every non-empty subset of X has a
least element. A set X with a well-ordering on X is called a well-ordered set. ■

Example 1.57*. The set N of natural numbers with the standard order is well-
ordered, as every non-empty subset of positive integers has a smallest element. The
set Z of integers is not well-ordered; for instance the whole set Z has no smallest
element, as it is unbounded below. The set R≥0 of non-negative real numbers is not
well ordered: For instance, the open interval (1, 2) contains no smallest element. ■
Theorem 1.58* (Well-ordering Theorem). Every non-empty set can be well-
ordered.
Proof. See [Hal74, Sec. 17]. □
Remark 1.59*. This theorem of course does not state that every non-empty
set X with a total order R is already well-ordered, but only that we can find another
total order S on X that is a well-ordering. This order S may have nothing to do
with R and need not be compatible with any other structure on X. ■

4Exercise: Find all cases, where this actually is a total order!



Definition 1.60*. Assume that X is a partially ordered set with partial order
R. A chain in X is a subset Y of X such that the restriction of R to Y defines a
total order on Y . ■

Theorem 1.61* (Zorn’s Lemma). Assume that X is a partially ordered set,


such that every chain in X has an upper bound in X. Then X contains a maximal
element.
Proof. See [Hal74, Sec. 16]. □
In fact, one can show that the Axiom of Choice, the Well-ordering Theorem,
and Zorn’s Lemma are all equivalent in the following sense: If we assume that the
other axioms of set theory hold, then the validity of either of these three statements
implies the validity of the other two. Now, the Axiom of Choice looks, at least at
first sight, like a reasonable proposition, whereas the Well-ordering Theorem is
far less convincing (can you think of a well-ordering on the real numbers?). The
mathematician Jerry L. Bona summed this up as follows: “The axiom of choice is
obviously true, the well-ordering principle obviously false, and who can tell about
Zorn’s lemma?”
CHAPTER 2

Linear Algebra

In this chapter, we will discuss basics of linear algebra up to and including


eigendecompositions of linear transformations of vector spaces. The finite dimensional part of this chapter is largely inspired by, and follows the argumentation of, [Axl24], though we will deviate at different points.

1. Vector Spaces and Linear Transformations


1.1. Basic Definition. We start with a formal definition of vector spaces.
Most if not all of this section has already been discussed in previous mathematics
classes (e.g. Mathematics 3 or Linear Algebra with Applications), though possibly
in less detail and possibly in a slightly less abstract way.
Definition 2.1 (Vector space). A vector space over the real numbers R (the
complex numbers C) is a set V together with the operations + : V × V → V and
· : R × V → V (· : C × V → V ), such that the following properties hold:
(1) Commutativity: For all u, v ∈ V we have u + v = v + u.
(2) Associativity: For all u, v, w ∈ V we have (u + v) + w = u + (v + w), and
for all v ∈ V and all λ, µ ∈ R (λ, µ ∈ C) we have λ(µv) = (λµ)v.
(3) Additive identity: There exists an element 0 ∈ V such that v + 0 = v for
all v ∈ V .
(4) Additive inverse: For every element u ∈ V there exists an element v ∈ V
such that u + v = 0.
(5) Multiplicative identity: For all v ∈ V we have that 1v = v.
(6) Distributivity: For all u, v ∈ V and all λ, µ ∈ R (λ, µ ∈ C) we have that
λ(u + v) = λu + λv, and (λ + µ)v = λv + µv.

Definition 2.2 (Vector). An element of a vector space is called a vector. ■

Remark 2.3. For simplicity, we will in the following denote by K either the
set R of real numbers or the set C of complex numbers in order to simplify the
notation. There will be some situations later on, when we will have to distinguish
between real and complex vector spaces, but many of the simpler results are the
same in both cases.1 ■

Example 2.4. The following are some standard examples of vector spaces that
will be used throughout the course. It is left to the reader as an exercise2 to verify
that all the requirements for a vector space (that is, Items (1)–(6) in Definition 2.1)
are indeed satisfied in each example.

1The letter K here stands for field, which in Norwegian is kropp (and in German Körper ).
This is the most general algebraic structure for which it makes sense to define vector spaces. For
more information, please see the course TMA4150 – Algebra.
2
These “exercises for the reader” can actually provide you with suitable training for the exam!
They are not exclusively here, because I was too lazy to formulate a complete proof.


(1) For every n ∈ N, the set Kn of n-tuples of real (complex) numbers


is a real (complex) vector space. Here we use the componentwise ad-
dition and scalar multiplication, that is, for x = (x1 , x2 , . . . , xn ), y =
(y1 , y2 , . . . , yn ) ∈ Kn and λ ∈ K we define
(x1 , x2 , . . . , xn ) + (y1 , y2 , . . . , yn ) = (x1 + y1 , x2 + y2 , . . . , xn + yn )
and
λ · (x1 , x2 , . . . , xn ) = (λx1 , λx2 , . . . , λxn ).
(2) For all n, m ∈ N, the sets Matm,n (K) = Km×n and Matn (K) = Kn×n of
(m × n)-dimensional and (n × n)-dimensional matrices, respectively, are
vector spaces.
(3) The set P(K) of all polynomials (of arbitrary degree) with coefficients in
K is a vector space.

Example 2.5 (Function spaces). Assume that S is an arbitrary set and denote
by
K^S := {f : S → K}


the set of K-valued functions on S. On K^S we define addition and scalar multiplication of functions by

ation of functions by
(f + g)(s) := f (s) + g(s) and (λf )(s) := λf (s)
for f , g ∈ K^S , λ ∈ K, and s ∈ S. With these operations, the set K^S becomes a vector space.
More generally, let U be a vector space over K and let S be an arbitrary set. We
denote by
U^S := {f : S → U }


the set of functions on S with values in U . Similarly as above, we can define on U^S


addition and scalar multiplication by
(f + g)(s) := f (s) + g(s) and (λf )(s) := λf (s)
for f , g ∈ U^S , λ ∈ K, and s ∈ S. With these operations, the set U^S becomes again a vector
space.
As particular examples, the set R^R is the (real) vector space of all real valued functions on R, and the set Matm,n (C)^R denotes the (complex) vector space of all
functions f : R → Matm,n (C) defined on R and taking values in the vector space
of complex m × n-matrices. ■
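The pointwise operations on K^S can be mimicked directly with functions in a programming language; the following Python sketch (my addition, purely illustrative) does this for two elements of R^R.

```python
def add(f, g):
    """Pointwise sum: (f + g)(s) = f(s) + g(s)."""
    return lambda s: f(s) + g(s)

def scale(lam, f):
    """Pointwise scalar multiple: (lam * f)(s) = lam * f(s)."""
    return lambda s: lam * f(s)

f = lambda s: s ** 2       # an element of R^R
g = lambda s: 3 * s + 1    # another element of R^R
h = add(scale(2.0, f), g)  # the function 2f + g

print(h(1.0))    # 2*1 + 4 = 6.0
print(h(-2.0))   # 2*4 - 5 = 3.0
```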

Remark 2.6. The usage of + and · for the addition and scalar multiplication
in Definition 2.1 is intended to be suggestive and follows the standard notation of
these operations. However, it somewhat hides how restrictive these assumptions
actually are.
A more formal (but much less intuitive) way would be to define a vector space
as a set V together with two functions f : V × V → V and g : K × V → V that
satisfy Items (1)–(6). With this notation, commutativity and parts of associativity
and distributivity, for instance, would read as
Commutativity: f (u, v) = f (v, u),
Associativity: f (f (u, v), w) = f (u, f (v, w)),
Distributivity: g(λ + µ, v) = f (g(λ, v), g(µ, v)),
respectively. ■

1.2. General Properties. In the following, we will prove some general results
concerning vector spaces that follow directly from the definition of a vector space.
Amongst others, we will show that the additive identity and the additive inverse are
unique, although this seems not to be required in the definition. This is obviously
true in all the examples provided above. What these results, however, show, is that
it is impossible to construct a vector space with two different 0-elements. If one
would need such an object for whatever theoretical or practical reason, one would
need to search for it outside the realm of vector spaces.
This section is mainly intended to demonstrate by means of simple examples
what elementary algebraic proofs look like. Feel free to skip over this part, if you are
already familiar with this type of proofs.
Lemma 2.7. Assume that V is a vector space. Then the additive identity 0 ∈ V
is unique. That is, there exists no vector v ∈ V with v ̸= 0 such that u + v = u for
all u ∈ V .
Proof. Assume that v ∈ V satisfies u + v = u for all u ∈ V . With u = 0 we
have in particular that
0 = 0 + v = v + 0 = v,
where the second equality uses commutativity (1) and the third the additive identity (3),

which shows that v = 0. □

Lemma 2.8. Assume that V is a vector space and that u ∈ V . Then there exists
a unique vector v ∈ V such that u + v = 0.
Proof. Since V is a vector space, we already know that such a vector v exists.
Thus it remains to show that v is unique.
Assume therefore that v1 , v2 ∈ V satisfy u + v1 = 0 and u + v2 = 0. Then
v1 = v1 + 0 = v1 + (u + v2 ) = (v1 + u) + v2 = (u + v1 ) + v2 = 0 + v2 = v2 + 0 = v2 ,
using the additive identity (3), associativity (2), and commutativity (1),

that is, v1 = v2 . This proves the uniqueness of the additive inverse. □

Definition 2.9. Let V be a vector space and v ∈ V . By −v ∈ V we denote the


additive inverse of v, that is, the unique vector satisfying v + (−v) = 0. Moreover,
for u ∈ V , we generally abbreviate u − v := u + (−v). ■

Lemma 2.10. Let V be a vector space and let v ∈ V . Then 0 · v = 0.


Proof. We have that
v + 0 · v = 1 · v + 0 · v = (1 + 0) · v = 1 · v = v,
using the multiplicative identity (5), distributivity (6), and again (5),

which shows that 0 · v is an additive identity in V . Because of the uniqueness of


the additive identity (see Lemma 2.7), it follows that 0 · v = 0. □

Lemma 2.11. Let V be a vector space and v ∈ V . Then −v = (−1) · v.


Proof. We have that
    v + (−1) · v = 1 · v + (−1) · v = (1 − 1) · v = 0 · v = 0,
using (5), (6), and Lemma 2.10.

This shows that (−1) · v is an additive inverse for v. Now the identity (−1) · v = −v
follows from the uniqueness of the additive inverse. □

1.3. Linear Transformations.


Definition 2.12. Assume that U , V are vector spaces over K and that T : U →
V is some mapping. We say that T is linear (or a linear transformation) if the
following conditions hold:
• For all u, v ∈ U we have T (u + v) = T (u) + T (v).
• For all u ∈ U and λ ∈ K we have T (λu) = λ T (u).

Remark 2.13. For linear transformations it is common to drop the parentheses


around the argument. That is, if T : U → V is linear, one usually writes T u instead
of T (u). ■

Remark 2.14. If the domain and co-domain of a linear transformation T are


equal, say for instance that T : U → U is a linear transformation from U to U itself,
we usually say that T is a linear transformation on U . ■

Example 2.15. The following mappings are examples of linear transformations:


(1) The mapping T : Matn (K) → Matn (K), A 7→ T A := AT , which takes a
matrix and returns its transpose, is linear.
(2) Let P ∈ Matn (K) be a fixed matrix. Then the mapping T : Matn (K) →
Matn (K), A 7→ T A := P AP is linear.
(3) The mapping D : P(K) → P(K), p 7→ Dp := p′ , which takes a polynomial
and returns its derivative, is linear.

Example 2.16 (Point evaluation). We now assume that S is a set and U a


vector space over K. For fixed s ∈ S, we define the point evaluation in s as the
mapping δs : U^S → U defined by
    δs (f ) := f (s)
for f ∈ U^S . (That is, the input of the mapping is the function f , and the output
is the value of f in the point s.) Then this mapping is a linear transformation. ■
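To verify the linearity concretely: for f , g ∈ U^S and λ ∈ K we have δs (f + g) = (f + g)(s) = f (s) + g(s) = δs (f ) + δs (g) and δs (λf ) = (λf )(s) = λ f (s) = λ δs (f ), which are precisely the two conditions in Definition 2.12.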
The following example shows that we have to be a bit careful when dealing
with complex vector spaces, as the scalars λ in the definition of linearity might also
be imaginary numbers.
Example 2.17. Consider the mapping T : Cn → Cn ,
T (x1 , x2 , . . . , xn ) := (x̄1 , x̄2 , . . . , x̄n ),
where x̄j denotes the complex conjugate of xj . Then T is not linear.
In order to demonstrate that T is not linear, it suffices to find a single example
of a condition for linearity that is not satisfied. Take to that end
x := (1, 1, . . . , 1) ∈ Cn .
Then
T x = x.
However,
T (ix) = T (i, i, . . . , i) = (ī, ī, . . . , ī) = (−i, −i, . . . , −i) = −ix.
Thus
T (ix) = −ix ̸= ix = i T x,
and thus the mapping T is not linear. ■

Exercise 2.1. Given a matrix A ∈ Matm,n (C), we define its Hermitian con-
jugate as the transpose of the componentwise complex conjugate of A, that is,
AH := (Ā)T ∈ Matn,m (C).
Show that the mapping T : Matm,n (C) → Matn,m (C),
T A := AH
is not linear. ■

Proposition 2.18. Assume that U , V , and W are vector spaces over K, and
that T : U → V and S : V → W are linear transformations. Then the composition
S ◦ T : U → W is a linear transformation as well.
Proof. Let u, v ∈ U , and λ, µ ∈ K. Then
    (S ◦ T )(λu + µv) = S(T (λu + µv)) = S(λT u + µT v) = λ S(T u) + µ S(T v) = λ (S ◦ T )(u) + µ (S ◦ T )(v),
where the second equality uses the linearity of T and the third the linearity of S. □

1.4. Subspaces. We will now introduce the notion of subspaces of vector spaces. Roughly speaking, a subspace of a vector space is a subset which is a vector space itself when using the same addition and scalar multiplication.
Definition 2.19 (Subspace). Assume that V is a vector space over K and U ⊂ V a subset of V . We say that U is a subspace of V , if U is in itself a vector space over K with the same addition and scalar multiplication as on V .
That is, if +V and ·V denote addition and scalar multiplication on V , and +U
and ·U denote addition and scalar multiplication on U ⊂ V , then u +V v = u +U v
for all u, v ∈ U , and λ ·V u = λ ·U u for all λ ∈ K and u ∈ U . ■

In the definition of a subspace, we assume that we are already given a vector


space structure on the set U , which, a-priori, might be independent from the struc-
ture of the set V . In most practical applications with subspaces, however, we are
given the vector space V and the subset U , and we want to know whether this sub-
set is actually a subspace when we use the same addition and scalar multiplication
as on the larger space. The next result answers this question.
Lemma 2.20. Assume that V is a vector space and U ⊂ V a subset. Then U is a subspace of V (with the same addition and scalar multiplication as on V ), if and only if the following conditions hold:
(1) Additive identity: 0 ∈ U .
(2) Closedness under addition: If u, v ∈ U , then also u + v ∈ U .
(3) Closedness under scalar multiplication: If u ∈ U and λ ∈ K, then also λu ∈ U .
Proof. Assume first that Items (1)–(3) hold. Items (2) and (3) guarantee that the addition and scalar multiplication indeed map U × U and K × U into U (and not merely into V ). Moreover, Item (1) guarantees that U contains an additive
identity, and Item (3) guarantees that U contains additive inverses (namely the
element (−1) · u for u ∈ U ). The other properties of a vector space are satisfied,
because they already hold for all elements of the larger space V . Thus U is actually
a vector space, which implies that it is a subspace of V .
Conversely, assume that U is a subspace of V . Then, necessarily, the addition and scalar multiplication map U × U and K × U into U , and thus Items (2) and (3)
hold. Next, since U is a vector space itself, it is non-empty. Choose therefore some

u ∈ U . Then Items (2) and (3) imply that 0 = u + (−1) · u ∈ U , and thus (1)
holds. □

Example 2.21. Consider the following examples of vector spaces and subsets, which in some cases are subspaces and in others are not:3
(1) For every vector space V , the sets {0} and V are subspaces of V .
(2) The set U := {(x1 , x2 ) ∈ R2 : x2 = −x1 } is a subspace of R2 .
(3) Let V = Kn and U := Kn−1 × {0} = {(x1 , . . . , xn−1 , xn ) ∈ Kn : xn = 0}. Then U is a subspace of V .
(4) For every n ∈ N, the sets Symn (K) := {A ∈ Matn (K) : AT = A} of symmetric n × n-matrices and Skewn (K) := {A ∈ Matn (K) : AT = −A} of skew-symmetric n × n-matrices are subspaces of Matn (K).
(5) For every n ∈ N, the set Pn (K) of polynomials of degree at most n is a subspace of the vector space P(K) of all polynomials.
(6) The set C(K) of continuous K-valued functions f : R → K is a subspace of the space K^R of all K-valued functions on R.
(7) Denote by V := {f ∈ C(K) : f (7) = 0}. Then V is a subspace of C(K).
(8) Let V = Kn and U := Kn−1 × {1} = {(x1 , . . . , xn−1 , xn ) ∈ Kn : xn = 1}. Then U is not a subspace of V .
(9) Let U := {(x1 , x2 ) ∈ K2 : x1 = 0 or x2 = 0}. Then U is not a subspace of K2 .
(10) Let n ∈ N, n ≥ 2, and U := {A ∈ Matn (K) : det A = 0} be the set of singular matrices. Then U is not a subspace of Matn (K).

1.5. Injectivity and Surjectivity of Linear Transformations. Assume


now that U and V are vector spaces over K and that T : U → V is a linear trans-
formation. Then we have two important associated subspaces of U and V :
Definition 2.22. Let U , V be vector spaces and T : U → V be a linear trans-
formation.
• The kernel of T is defined as
    ker(T ) := {u ∈ U : T u = 0} ⊂ U.
• The range of T is defined as
    ran(T ) := {v ∈ V : there exists u ∈ U with T u = v} ⊂ V.

Put differently, the kernel of T is the set of all solutions of the equation T u = 0;
the range of T is the set of all right hand sides v for which the equation T u = v
has a solution.
Lemma 2.23. The range and the kernel of a linear transformation are subspaces
of the respective vector spaces.
Proof. Exercise! □

Example 2.24. Consider the mapping T : Matn (K) → Matn (K), A 7→ T A :=


AT + A.
• The kernel ker(T ) is the set of all matrices A that satisfy AT + A = 0,
that is, the set of all skew-symmetric matrices.

3As an exercise, you can verify in each of the examples whether this is actually true.

• The range ran(T ) is the set of all matrices B that can be written in the form B = A + AT for some matrix A. This is the case if and only if B is symmetric: indeed, if B = A + AT , then B T = AT + A = B; conversely, if B is symmetric, then B = A + AT with A := B/2.
Thus we have that
ker(T ) = Skewn (K) and ran(T ) = Symn (K).


Example 2.25. Consider the mapping D : P(K) → P(K), p 7→ Dp := p′ . Then
    ker(D) = {p ∈ P(K) : p′ = 0} = {p ∈ P(K) : p is constant}

and
ran(D) = P(K).

Proposition 2.26 (Injectivity and surjectivity). Assume that U and V are


vector spaces over K and that T : U → V is a linear transformation.
• T is injective if and only if ker(T ) = {0}.
• T is surjective if and only if ran(T ) = V .
Proof. In the case of surjectivity, this is a trivial consequence of the definition.
Thus we only discuss the case of injectivity.
Assume first that T is injective, and let u ∈ ker(T ). Then T u = 0. Moreover,
the linearity of T implies that T 0 = 0. Thus we have that T u = 0 = T 0. The
injectivity of T now implies that u = 0, which proves that ker(T ) = {0}.
Conversely, assume that ker(T ) = {0}. Moreover, let u1 , u2 ∈ U be such that
T u1 = T u2 . We have to show that u1 = u2 . Because of the linearity of T we have
that
0 = T u1 − T u2 = T (u1 − u2 ),
which shows that u1 − u2 ∈ ker(T ). Since, by assumption, ker(T ) = {0}, this
implies that u1 − u2 = 0 and thus u1 = u2 , which proves the claim. □
Proposition 2.27. Assume that U and V are vector spaces over K and that
T : U → V is a linear transformation. Assume moreover that T is bijective. Then
the inverse T −1 : V → U is a linear transformation as well.
Proof. Let v1 , v2 ∈ V , and λ, µ ∈ K. Denote moreover u1 := T −1 v1 and
u2 := T −1 v2 . The linearity of T implies that
λv1 + µv2 = λT u1 + µT u2 = T (λu1 + µu2 ).
Applying the function T −1 on both sides of this equation, we obtain that
T −1 (λv1 + µv2 ) = T −1 (T (λu1 + µu2 )) = λu1 + µu2 = λT −1 v1 + µT −1 v2 ,
which proves the linearity of T −1 . □

2. Linear Independence and Bases


2.1. Linear Span.
Definition 2.28. Assume that V is a vector space and that S := (vi )i∈I is a
(possibly infinite) family in V . We define the linear span span(S) ⊂ V of S as the
set of all finite linear combinations of members of S.
That is, a vector u ∈ V is an element of span(S), if and only if there exist
N ∈ N, vectors vi1 , . . . , viN ∈ S, and scalars λ1 , . . . , λN ∈ K such that
    u = Σ_{j=1}^{N} λj vij .

Remark 2.29. It is useful to be able to talk about the span of the empty family
∅, mainly in order to avoid having to deal with annoying special cases in various
theoretical results. For that, we simply define
span(∅) := {0} ⊂ V.

Proposition 2.30. Assume that V is a vector space and S = (vi )i∈I is a family
in V . Then span(S) is the smallest subspace of V containing all members of S.
That is, span(S) is a subspace of V , and if U ⊂ V is any other subspace of V with
S ⊂ U , then also span(S) ⊂ U .
Proof. We show first that span(S) is actually a subspace of V . To that end,
we note first that, trivially, 0 ∈ span(S). Next assume that u, w ∈ span(S). Then we can
write
    u = Σ_{j=1}^{N} λj vij    and    w = Σ_{j=N+1}^{N+M} λj vij
for some N , M ∈ N, indices i1 , . . . , iN +M ∈ I, and scalars λ1 , . . . , λN +M ∈ K. As
a consequence,
    u + w = Σ_{j=1}^{N+M} λj vij ∈ span(S).

Next assume that u ∈ span(S) and α ∈ K. Since u ∈ span(S), we can write
    u = Σ_{j=1}^{N} λj vij

for some N ∈ N, indices i1 , . . . , iN ∈ I, and scalars λ1 , . . . , λN ∈ K. Then
    αu = α Σ_{j=1}^{N} λj vij = Σ_{j=1}^{N} (αλj ) vij ,

which shows that also αu ∈ span(S). According to Lemma 2.20, this shows that
span(S) is a subspace of V .
Now assume that U ⊂ V is any subspace of V containing S. Let moreover
u ∈ span(S). We have to show that also u ∈ U .
Because u ∈ span(S), we can write
    u = Σ_{j=1}^{N} λj vij

for some N ∈ N, indices i1 , . . . , iN ∈ I, and scalars λ1 , . . . , λN ∈ K. By assumption,


vij ∈ U for every j, and therefore also λj vij ∈ U , as U is a subspace of V . Next,
we have that
λ1 vi1 + λ2 vi2 + . . . + λN viN ∈ U,
because each term is contained in U and, again, U is a subspace of V . This shows
that u ∈ U , which concludes the proof. □
Example 2.31. Consider the space P(K) of all polynomials with coefficients in
K. Let moreover M := {1, x, x2 , . . .} be the set of all monomials. Then span(M ) =
P(K), as we can write every polynomial as a linear combination of finitely many monomials. ■

2.2. Linear Independence and Bases.


Definition 2.32 (Linear independence). Assume that V is a vector space over
K and S = (vi )i∈I is a family in V . We say that S is linearly independent, if the
following holds:
For each finite subset of indices J ⊂ I and each family (λj )j∈J ⊂ K of scalars, if
    Σ_{j∈J} λj vj = 0,
then
    λj = 0 for all j ∈ J.
If S is not linearly independent, we call S linearly dependent. ■

Remark 2.33. We have formulated the notions of linear dependence and in-
dependence for families, but it is straightforward to formulate them for sets: A set
S ⊂ V is linearly independent, if for each finite subset W ⊂ S and each family (λw )w∈W ⊂ K of scalars such that Σ_{w∈W} λw w = 0 we have that λw = 0 for all w ∈ W .
There is, however, a difference as to how sets and families handle “duplicate”
members, which can lead to an at first glance counterintuitive result about linear
independence of sets: It is true to say that a family S = (vi )i∈I is linearly dependent,
if it contains the same element twice, that is, if there exist i ̸= j ∈ I such that
vi = vj . In particular, if V is a vector space and v ∈ V \ {0} is any non-zero vector
in V , then the family (v, v) is linearly dependent. However, the set {v, v} is linearly
independent, as, actually, {v, v} = {v} (as a set). Because of this (and because of
the possibility of actually accessing different members by means of their indices) we
will most of the time employ families when talking about linear independence. ■
Proposition 2.34. Assume that V is a vector space and S = (vi )i∈I is a family
in V . Then S is linearly independent, if and only if every vector u ∈ span(S) can
be written in a unique way as a finite linear combination of members of S.
More precisely, S is linearly independent if and only if the following holds: If
u ∈ span(S), then there exists a unique finite subset J ⊂ I and a unique family (λj )j∈J ⊂ K \ {0} of non-zero coefficients, such that u = Σ_{j∈J} λj vj . Here we define the empty sum to be equal to 0.
Proof. Assume first that S is linearly independent, and let u ∈ span(S).
From the definition of span(S), it follows that u can be written as a finite linear
combination of members of S. Thus we only have to show that this finite linear
combination is unique. Assume therefore that we can write
    u = Σ_{j∈J} λj vj = Σ_{k∈K} µk vk

for finite subsets J, K ⊂ I, and families (λj )j∈J ⊂ K\{0} and (µk )k∈K ⊂ K\{0} of
non-zero coefficients. Define now L := J ∪ K, and define the family (νℓ )ℓ∈L by setting
    νℓ := λℓ − µℓ if ℓ ∈ J ∩ K,
    νℓ := λℓ if ℓ ∈ J \ K,
    νℓ := −µℓ if ℓ ∈ K \ J.
Then
    Σ_{ℓ∈L} νℓ vℓ = Σ_{ℓ∈J} λℓ vℓ − Σ_{ℓ∈K} µℓ vℓ = u − u = 0.
Now the linear independence of S implies that νℓ = 0 for all ℓ ∈ L. In particular,
λℓ = µℓ for all ℓ ∈ J ∩ K. It remains to show that J = K, that is, that J \ K = ∅

and K \ J = ∅. Assume therefore that ℓ ∈ J \ K. Then 0 = νℓ = λℓ , which


contradicts the assumption that λℓ ̸= 0 for all ℓ ∈ J. Thus J \ K = ∅. The proof
that K \ J = ∅ is similar.
Assume now that every vector u ∈ span(S) can be written in a unique way as
a finite linear combination of members of S. We have to show that S is linearly
independent. Assume to that end that J ⊂ I is finite and (λj )j∈J ⊂ K are such that
    Σ_{j∈J} λj vj = 0.

Denote by J ′ ⊂ J the subset of indices j ∈ J for which λj ̸= 0. We have to show


that J ′ = ∅. By construction of the set J ′ we have that
    0 = Σ_{j∈J ′} λj vj .
At the same time, we can write 0 as the empty sum Σ_{j∈∅} λj vj . Thus the uniqueness
assumption of the representation of 0 ∈ span(S) implies that J ′ = ∅, which implies
that S is linearly independent. □

Example 2.35. Consider the space P(K) of all polynomials with coefficients
in K. Let moreover M := (1, x, x2 , . . .) be the family of all monomials. Then M is
linearly independent. ■

Definition 2.36 (Basis). Let V be a vector space and B ⊂ V a family in


V . We say that B is a (Hamel) basis of V , if B is linearly independent and
span(B) = V . ■

Remark 2.37. In view of Proposition 2.34, we can equivalently say that B is


a basis of V , if every vector v ∈ V can be written in a unique way as a finite linear
combination of elements of B. ■

Example 2.38. The set M of monomials is a basis of the vector space of all
polynomials. ■

Theorem 2.39. Assume that V is a vector space and S ⊂ V is some linearly


independent family in V . Then there exists a basis B of V with S ⊂ B.
In particular, every vector space has a basis.
The proof of this theorem is by no means simple, and it actually relies on Zorn’s
Lemma (and thus the Axiom of Choice).

Proof∗ . Let L ⊂ P(V ) be the set of all linearly independent subsets of V


containing S. On L we consider the partial order R defined by inclusion, that is,
A ≤R B if A ⊂ B. We will first show that L contains a maximal element with
respect to this order. After that, we will show that such a maximal element is a
basis.
In order to show that L contains a maximal element, we will use Zorn’s Lemma
(Theorem 1.61*). For this, we have to show that every chain in L has an upper
bound. Assume therefore that M is a chain in L. That is, M contains linearly
independent subsets of V , each containing S, and for all A, B ∈ M we have that
either A ⊂ B or B ⊂ A.
Define now M := ∪_{A∈M} A, the union of all the sets in M. We will show that
M is an upper bound for M in L. First we note that, clearly, A ⊂ M for every
A ∈ M. Thus we only have to show that M ∈ L. It is obvious that S ⊂ M . Thus
it remains to show that M is linearly independent.

Assume therefore that J ⊂ M is a finite subset and that λu , u ∈ J, are such


that
    0 = Σ_{u∈J} λu u.
Since J ⊂ M = ∪_{A∈M} A, there exists for each u ∈ J some Au ∈ M such that
u ∈ Au . Now the sets in M are totally ordered by inclusion, and thus we have
for each u, v ∈ J, that at least one of the inclusions Au ⊂ Av or Av ⊂ Au holds.
Since the set J is finite, it follows that there exists v ∈ J such that Au ⊂ Av for
all u ∈ J. In particular, u ∈ Av for all u ∈ J. Since the set Av is by assumption
linearly independent, it follows that λu = 0 for all u ∈ J. This shows that M is
linearly independent.
We have now shown that every chain in L has an upper bound. Zorn’s Lemma
therefore implies that there exists a maximal element B ∈ L. We want to show that
B is a basis containing S. By construction, we have that B is linearly independent
and that S ⊂ B. Thus it remains to show that span(B) = V .
For this, assume to the contrary that span(B) ̸= V . Then there exists v ∈
V \ span(B). We assert that B ∪ {v} is linearly independent as well. Assume
therefore that J ⊂ B ∪ {v} is a finite subset and that λu ∈ K, u ∈ J, are such that
    Σ_{u∈J} λu u = 0.

If J ⊂ B, then the linear independence of B implies that λu = 0 for all u ∈ J.


Assume therefore that v ∈ J. Then we can write
    −λv v = Σ_{u∈J\{v}} λu u,

which implies that −λv v ∈ span(B). Since by assumption v ̸∈ span(B), this further
implies that λv = 0. As a consequence, we have that
    0 = Σ_{u∈J\{v}} λu u.

Now the linear independence of B implies that λu = 0 for all u ∈ J \ {v}. In


total, we therefore have that λu = 0 for all u ∈ J, and thus B ∪ {v} is linearly
independent. This, however, is a contradiction to the maximality of B. Therefore
span(B) = V , which proves that B is a basis of V . □

The existence of a basis of every vector space is, for pure mathematicians, an
extremely interesting result. For practical applications, however, it turns out to
be largely useless in that generality, as one can actually show that there cannot
be a universally applicable method for actually constructing a basis. (In fact, it is
possible to show that the existence of bases of arbitrary vector spaces is equivalent to
the Axiom of Choice.) Because of that, we will turn instead to finite dimensional
vector spaces, where the situation appears to be much simpler. This, however,
requires us first to define the notion of the dimension of a vector space.
Definition 2.40. A vector space V is called finite dimensional, if V has a
finite basis. Else, V is called infinite dimensional. ■

Theorem 2.41. Assume that V is finite dimensional. Then all bases of V are
finite and contain the same number of members.
Definition 2.42 (Dimension). Assume that V is finite dimensional. The number of members of any (and therefore every) basis of V is called the dimension of V and denoted dim V . ■

The proof of Theorem 2.41 is surprisingly tricky and relies on another important
result, namely the Steinitz Exchange Lemma 2.43*, which we formulate (and prove)
below. Feel therefore free to skip over the proof.

Proof∗ . Since V is a finite dimensional space, it has a finite basis, say C =


(u1 , . . . , um ). Assume now that B = (vi )i∈I is an arbitrary basis of V . By
Lemma 2.43* below, we can find distinct indices i1 , . . . , im ∈ I, replace the vec-
tors vi1 , . . . , vim in B by the vectors u1 , . . . , um , and still obtain a basis of V , say
B̃ = (ṽi )i∈I . In particular, this implies that I has at least m members. On the other
hand, if I has more than m members, then there exists some index j ̸∈ {i1 , . . . , im }.
Since C is a basis of V , we can write the corresponding vector vj as a finite linear
combination of the vectors u1 , . . . , um . Since vj = ṽj , this contradicts the linear
independence of the family B̃. As a consequence, B has precisely m members,
which proves the assertion. □

Lemma 2.43* (Steinitz Exchange Lemma). Assume that V is a vector space


over K. If B = (vi )i∈I is a basis of V and (u1 , . . . , um ) is a finite family of
linearly independent vectors, then there exist distinct indices i1 , . . . , im ∈ I, such
that we can replace the basis elements vi1 , . . . , vim by the vectors u1 , . . . , um and
still obtain a basis of V .
Proof∗ . We prove this assertion by induction over m:
For m = 0, the claim is trivial.
Assume therefore that the claim is true for all bases of V and all linearly
independent families with m members. Let moreover B = (vi )i∈I be a basis of V ,
and let (u1 , . . . , um , um+1 ) be a family of m + 1 linearly independent vectors in V .
By our induction hypothesis, we can find distinct indices i1 , . . . , im ∈ I and replace
the basis elements vi1 , . . . , vim by u1 , . . . , um , and still obtain a basis of V . Let us
denote this new basis by B̃ = (ṽi )i∈I . Since this is a basis of V and um+1 ̸= 0,
we can in particular find a unique finite set J ⊂ I and unique non-zero coefficients
λj ∈ K \ {0}, j ∈ J, such that
    um+1 = Σ_{j∈J} λj ṽj .

We claim now that J \ {i1 , . . . , im } ≠ ∅. Indeed, if it were true that J \ {i1 , . . . , im } = ∅ and thus J ⊂ {i1 , . . . , im }, we could write um+1 as a linear combination of the
vectors u1 , . . . , um , which contradicts the assumption that the family (u1 , . . . , um+1 )
was linearly independent. Choose now an element im+1 ∈ J \ {i1 , . . . , im }, replace
vim+1 by um+1 , and denote the resulting family by B̂ := (v̂i )i∈I . We claim that
this is a basis of V , that is, that it is linearly independent and spans V .
We first show that span(B̂) = V . Assume to that end that v ∈ V . Since B̃ is
a basis of V , it spans V and thus there exist a finite subset L ⊂ I and coefficients
µℓ ∈ K, ℓ ∈ L, such that
    v = Σ_{ℓ∈L} µℓ ṽℓ .
Now, if im+1 ̸∈ L we have that ṽℓ = v̂ℓ and we have thus written v as a finite linear
combination of members of B̂. On the other hand, if im+1 ∈ L, we can write
    v = µim+1 ṽim+1 + Σ_{ℓ∈L\{im+1}} µℓ v̂ℓ
      = (µim+1 / λim+1 ) (um+1 − Σ_{j∈J\{im+1}} λj v̂j ) + Σ_{ℓ∈L\{im+1}} µℓ v̂ℓ ,

which again is a finite linear combination of members of B̂. Thus it follows that
v ∈ span(B̂).
We now show that B̂ is linearly independent. Assume therefore that L ⊂ I is finite and that µℓ ∈ K, ℓ ∈ L, are such that
    Σ_{ℓ∈L} µℓ v̂ℓ = 0.
Denote now L′ := {ℓ ∈ L : µℓ ≠ 0}. We have to show that L′ = ∅. We can write




    Σ_{ℓ∈L′} µℓ v̂ℓ = 0

with µℓ ≠ 0 for all ℓ ∈ L′ . Now, if im+1 ∉ L′ , then v̂ℓ = ṽℓ for all ℓ ∈ L′ and thus

the linear independence of B̃ implies that µℓ = 0 for all ℓ ∈ L′ , which implies that
L′ = ∅. Assume therefore that im+1 ∈ L′ . Then we can write
    0 = (µim+1 / λim+1 ) (um+1 − Σ_{j∈J\{im+1}} λj ṽj ) + Σ_{ℓ∈L′\{im+1}} µℓ ṽℓ .

As a consequence,
(1)    um+1 = Σ_{j∈J\{im+1}} λj ṽj − (λim+1 / µim+1 ) Σ_{ℓ∈L′\{im+1}} µℓ ṽℓ .

Now recall that we also have that


    um+1 = Σ_{j∈J} λj ṽj = λim+1 ṽim+1 + Σ_{j∈J\{im+1}} λj ṽj

and that λim+1 ̸= 0. Since B̃ is a basis of V and the basis representation of every
vector v ∈ V is unique, it follows that the vector ṽim+1 must also appear (with
a non-zero coefficient) in the representation (1), which is obviously not the case.
Thus im+1 ̸∈ L′ , which in turn shows that L′ = ∅. Thus the set B̂ is linearly
independent, which concludes the proof. □

3. Matrix Representations of Linear Transformations


Assume that U , V are finite dimensional vector spaces (over the same field
K) with n = dim(U ) and m = dim(V ). Let moreover B = (u1 , . . . , un ) ⊂ U and
C = (v1 , . . . , vm ) ⊂ V be bases of U and V , respectively.
Since B and C are bases of the respective spaces, we can write each vector u ∈ U
uniquely as a linear combination
    u = Σ_{j=1}^{n} αj uj    with αj ∈ K for j = 1, . . . , n,

and similarly each vector v ∈ V as a linear combination
    v = Σ_{i=1}^{m} βi vi    with βi ∈ K for i = 1, . . . , m.

The vector α = (α1 , . . . , αn )T ∈ Kn is called the coordinate vector for u with


respect to the basis B, and the vector β = (β1 , . . . , βm )T ∈ Km the coordinate
vector for v with respect to C. Commonly one says that one “identifies u and v
with their coordinates α and β.” Be mindful, though, that this identification only
works (or is well-defined) once we have fixed the bases on the spaces; if we change
the bases, the coordinate representations of the same vectors will usually change as
well.
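For instance, with respect to the monomial basis B = (1, x, x2 , x3 ) of P3 , the polynomial p(x) = 2 − x + 3x2 has the coordinate vector α = (2, −1, 3, 0)T ∈ K4 , whereas with respect to the reordered basis (x3 , x2 , x, 1) the same polynomial has the coordinate vector (0, 3, −1, 2)T .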

Assume now that T : U → V is a linear transformation and that u ∈ U and


v ∈ V satisfy the relation
T u = v.
In the following, we will discuss how this relation between the vectors translates
into a relation between their coordinate representations α and β.
To start with, we consider the image of each of the basis vectors uj , j = 1, . . . , n,
separately. Since C is a basis of V and T uj ∈ V , there exist, for each j, unique
coefficients aij , i = 1, . . . , m, such that
(2)    T uj = Σ_{i=1}^{m} aij vi .

Now consider the vector
    u = Σ_{j=1}^{n} αj uj

with coordinates (α1 , . . . , αn ). The linearity of T implies that


(3)    T u = Σ_{j=1}^{n} αj T uj = Σ_{j=1}^{n} αj (Σ_{i=1}^{m} aij vi ) = Σ_{j=1}^{n} Σ_{i=1}^{m} αj aij vi = Σ_{i=1}^{m} (Σ_{j=1}^{n} aij αj ) vi .

Now assume that T u = v has the coordinate representation
(4)    T u = v = Σ_{i=1}^{m} βi vi .

Since (v1 , . . . , vm ) is a basis of V , the coefficients in the representation are unique.


By comparing the right hand sides of (3) and (4), we obtain that
(5)    Σ_{j=1}^{n} aij αj = βi    for i = 1, . . . , m.

Now collect the coefficients aij in an (m × n)-dimensional matrix A. Recall-


ing the definition of matrix-vector multiplication, we then see that the system of
equations (5) translates into the matrix-vector system
Aα = β.
In other words, the matrix A ∈ Km×n with entries aij defined by the relation (2)
represents the linear transformation T in the bases B and C.
Example 2.44. Consider the mapping T : P3 → P2 , p 7→ T p := p′ (that is,
differentiation of 3rd degree polynomials). In P3 we may choose the monomial basis
B = (1, x, x2 , x3 ), and in P2 essentially the same basis C = (1, x, x2 ) (though we
are missing the x3 term, as we are only considering 2nd degree polynomials there).
In order to find the matrix representation A of T in the monomial basis, we
apply the operation T to the basis elements in B and write the result in terms of
the elements of C. We have
T 1 = 0 = 0 · 1 + 0 · x + 0 · x2 ,
T x = 1 = 1 · 1 + 0 · x + 0 · x2 ,
T x2 = 2x = 0 · 1 + 2 · x + 0 · x2 ,
T x3 = 3x2 = 0 · 1 + 0 · x + 3 · x2 ,
The column entries of the matrix A can then be directly read off from the coefficients
on the right hand side. For instance, the third column corresponds to the result of

T x2 , where we have the coefficients (0, 2, 0). In total, we obtain the matrix
 
0 1 0 0
A = 0 0 2 0 .
0 0 0 3
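As a check on this representation, consider the polynomial p(x) = 2 + 3x − x2 + 5x3 , whose coordinate vector with respect to B is α = (2, 3, −1, 5)T . Then Aα = (3, −2, 15)T , which is precisely the coordinate vector of T p = p′ (x) = 3 − 2x + 15x2 with respect to C.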

4. Direct Sums and Invariant Subspaces


4.1. External Direct Sums.
Definition 2.45 (Direct sum). Assume that U and V are vector spaces over K
with additions +U and +V and scalar multiplications ·U and ·V , respectively. The
(external) direct sum of U and V , denoted U ⊕V , is the set U ×V together with the
operations +U ⊕V : (U ×V )×(U ×V ) → (U ×V ) and ·U ⊕V : K×(U ×V ) → (U ×V ),
defined by
(u1 , v1 ) +U ⊕V (u2 , v2 ) := (u1 +U u2 , v1 +V v2 ) for all (u1 , v1 ), (u2 , v2 ) ∈ U × V
and
λ ·U ⊕V (u, v) := (λ ·U u, λ ·V v) for all (u, v) ∈ U × V and λ ∈ K.

Proposition 2.46. Let U and V be vector spaces over K. Then their direct
sum U ⊕ V is again a vector space over K.
Proof. Exercise! □

More generally, we can consider several (but finitely many) vector spaces U1 ,. . . ,Un
and define their direct sum U1 ⊕ U2 ⊕ · · · ⊕ Un as the product U1 × U2 × · · · × Un
together with the componentwise addition
(u1 , u2 , . . . , un ) + (v1 , v2 , . . . , vn ) := (u1 + v1 , u2 + v2 , . . . , un + vn )
and componentwise scalar multiplication
λ · (u1 , u2 , . . . , un ) := (λu1 , λu2 , . . . , λun ).
Example 2.47. The direct sum of m copies of K is the space Km . The direct
sum of n copies of the space Km is the space Matm,n (K). ■

Example 2.48. Let U := C(R; K) the vector space of continuous functions on


R with values in K. Then the direct sum U ⊕ U consists of all pairs (f, g), where
both f and g are K-valued continuous functions on R. An alternative interpretation
of such a pair (f, g) of functions on R is as the vector-valued function that maps a
point t ∈ R to the pair (f (t), g(t)) ∈ K2 . Thus we can write the direct sum U ⊕ U
alternatively as
C(R; K) ⊕ C(R; K) = C(R; K2 ).

Example 2.49. In the previous examples, we only have the case where all the
involved spaces in the direct sum were identical. However, this need not be the
case in general. A simple, but important, example here is the case where U = Kn
and V = Km for some n, m ∈ N. For this case, we obtain that
Kn ⊕ Km = Kn+m .


Definition 2.50. Assume that U1 ,. . . ,Un are vector spaces and 1 ≤ k ≤ n. By


ık : Uk → U1 ⊕ · · · ⊕ Un ,
    u 7→ (0, . . . , 0, u, 0, . . . , 0),
with u placed in the k-th component,
and
πk : U1 ⊕ · · · ⊕ Un → Uk ,
(u1 , . . . , un ) 7→ uk ,
we denote the inclusion of Uk in the direct sum U1 ⊕ · · · ⊕ Un and the projection
from the direct sum U1 ⊕ · · · ⊕ Un to the k-th component Uk , respectively. ■

Lemma 2.51. Assume that U1 ,. . . ,Un are vector spaces and 1 ≤ k ≤ n. The
k-th inclusion ık : Uk → U1 ⊕· · ·⊕Un and the k-th projection πk : U1 ⊕· · ·⊕Un → Uk
are linear transformations.
Proof. Exercise! □
Assume now that U1 , U2 , and V1 , V2 are vector spaces and that T : U1 ⊕ U2 →
V1 ⊕ V2 is a linear transformation. Moreover, let (u1 , u2 ) ∈ U1 ⊕ U2 be arbitrary
(that is, let u1 ∈ U1 and u2 ∈ U2 be arbitrary). Then the linearity of T implies
that 
T (u1 , u2 ) = T ((u1 , 0) + (0, u2 )) = T (u1 , 0) + T (0, u2 ).
Now both T (u1 , 0) and T (0, u2 ) are vectors in V1 ⊕ V2 , and thus we can decompose
both of these vectors into parts
T (u1 , 0) =: (T11 u1 , T21 u1 ) ∈ V1 ⊕ V2 and T (0, u2 ) =: (T12 u2 , T22 u2 ) ∈ V1 ⊕ V2 .
Here T11 is the part of T that maps U1 to V1 ; then T21 is the part of T that maps
U1 to V2 ; next, T12 is the part of T that maps U2 to V1 ; finally, T22 is the part of T
that maps U2 to V2 . Thus we can write the transformation T in “(block)-matrix-vector-form” as
        ( u1 )   ( T11  T12 ) ( u1 )   ( T11 u1 + T12 u2 )
      T (    ) = (          ) (    ) = (                 ) .
        ( u2 )   ( T21  T22 ) ( u2 )   ( T21 u1 + T22 u2 )
The block Tij can be expressed explicitly in terms of a composition of the
inclusion ıj : Uj → U1 ⊕ U2 , the transformation T , and the projection πi : V1 ⊕ V2 →
Vi as
(6) Tij = πi ◦ T ◦ ıj : Uj → Vi .
More generally, if U1 ,. . . ,Un and V1 ,. . . ,Vm are vector spaces and T : U1 ⊕ · · · ⊕ Un → V1 ⊕ · · · ⊕ Vm is linear, we can write the operator T in the form
          ( u1 )   ( T11  · · ·  T1n ) ( u1 )   ( T11 u1 + . . . + T1n un )
(7)     T ( .. ) = (  ..          ..  ) ( .. ) = (            ..            ) ,
          ( un )   ( Tm1  · · ·  Tmn ) ( un )   ( Tm1 u1 + . . . + Tmn un )
where the blocks Tij : Uj → Vi are again defined as in (6).
Conversely, if Tij : Uj → Vi are arbitrary linear mappings, we can define a
mapping T : U1 ⊕ · · · ⊕ Un → V1 ⊕ · · · ⊕ Vm using the formula (7). Thus providing
the whole mapping T is equivalent to providing the blocks Tij .
Remark 2.52. What we did just now looks pretty much like matrix–vector
calculus, and up to a point it really is: The case of linear mappings from Kn
to Km written as matrices A ∈ Km×n can be regarded as a special case, since
we can interpret Kn and Km as the n-fold and m-fold direct sum of copies of K,
respectively. ■

We now consider the special case where U1 ,. . . ,Un are vector spaces and we are
given linear transformations Ti : Ui → Ui for 1 ≤ i ≤ n. Denote U := U1 ⊕ · · · ⊕ Un .
Then we can define a block-diagonal transformation Diag(T1 , . . . , Tn ) : U → U by
setting
                              ( T1   0    · · ·    0  )
                              (  0   T2   · · ·    0  )
    Diag(T1 , . . . , Tn ) :=  (  ..         . .     .. )
                              (  0   · · ·    0    Tn )
with the notation from above. Note here that a 0 on the ij-th position is actually
the 0-operator from Uj to Ui , which maps every element u ∈ Uj to the 0-vector in
Ui .
In a sense, block-diagonal linear transformations are the simplest possible linear
transformations, both from a numerical and an analytical point of view, as all the
information about the transformation is already contained in the blocks T1 ,. . . ,Tn .
For instance, in order to represent the whole transformation, it is only necessary
to store (or implement) these n blocks; all the off-diagonal 0-blocks can simply
be ignored. Also, it is for instance straight-forward to show that a block-diagonal
transformation is invertible, if and only if each of the blocks is invertible; in this
case, the inverse is again block-diagonal and the blocks are the inverses Ti−1 of the
transformations Ti .
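For a concrete illustration, take U1 = K2 and U2 = K, so that U1 ⊕ U2 = K3 , let T1 : K2 → K2 be given by the matrix with rows (1, 2) and (0, 1), and let T2 : K → K be multiplication by 5. Then Diag(T1 , T2 ) acts on K3 as the matrix with rows (1, 2, 0), (0, 1, 0), and (0, 0, 5), and its inverse is Diag(T1−1 , T2−1 ), where T1−1 has rows (1, −2) and (0, 1) and T2−1 is multiplication by 1/5.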
Remark 2.53* (Infinite direct sums). In the discussion above, we have only
considered direct sums of finitely many vector spaces. It is also possible, and some-
times necessary, to look at direct sums of infinitely many vector spaces. However,
the construction of these infinite direct sums is slightly more complicated than that
of finite direct sums.
Let us therefore assume that we are given a family (Ui )i∈I of vector spaces for
some non-empty (and possibly infinite) index set I. Then we can define the direct sum of this family as
    ⊕_{i∈I} Ui := { (ui )i∈I ∈ ×_{i∈I} Ui : there exists a finite set J ⊂ I with ui = 0 for i ∉ J }.

That is, we take all the elements in the Cartesian product of the vector space for
which all but finitely many components are 0. As in the finite case, we define the
addition and the scalar multiplication on this set componentwise.
All the results we indicated above concerning the interplay between linear trans-
formations and direct sums still hold in this setting: If we are given two families of
vector spaces (Uj )j∈J and (Vi )i∈I and for each pair (i, j) ∈ I × J a linear transformation Tij : Uj → Vi , we can define a linear transformation T : ⊕_{j∈J} Uj → ⊕_{i∈I} Vi by
(8)    T ((uj )j∈J ) := ( Σ_{j∈J} Tij uj )_{i∈I} .
Note here that the sum Σ_{j∈J} Tij uj is actually a finite sum for each i ∈ I, as only finitely many components of (uj )j∈J ∈ ⊕_{j∈J} Uj are non-zero. Conversely, given a linear transformation T : ⊕_{j∈J} Uj → ⊕_{i∈I} Vi , we can define Tij := πi ◦ T ◦ ıj : Uj → Vi . Then the equation (8) holds for all (uj )j∈J ∈ ⊕_{j∈J} Uj . ■

4.2. Internal Direct Sums. In the previous section, we have seen how we
can use the idea of a direct sum to combine different vector spaces to a larger space.
Moreover, we have discussed the connection between a linear transformation T of

the larger space and the mappings Tij between the component spaces. Of particular
interest is the case where the mappings Tij are 0 whenever i ̸= j, and thus T has
a block-diagonal form.
The main goal of this chapter is to investigate whether we can go the opposite
direction as well. That is, we start with an arbitrary vector space U and a linear
transformation T : U → U . Is it then possible to decompose U as a direct sum
U = U1 ⊕ · · · ⊕ Un such that the mapping T has block-diagonal form w.r.t. this
decomposition? Also, to which extent are the spaces Uj unique?
As a first step, we define sums of arbitrary subsets of a vector space.
Definition 2.54. Assume that V is a vector space and that S1 , . . . , SN are
subsets of V . We define

    S1 + . . . + SN := {v ∈ V : there exist ui ∈ Si , i = 1, . . . , N, with v = u1 + . . . + uN }.

We will mostly consider sums of subspaces of a vector space. In this case, it is


easy to show that this is again a subspace.
Lemma 2.55. Assume that V is a vector space and that U1 , . . . , UN are sub-
spaces of V . Then U1 + . . . + UN is again a subspace of V .
Proof. Exercise! □
Example 2.56. We consider the vector space Matn (K) of n × n-dimensional
matrices over K. We denote by TriU∗n (K) ⊂ Matn (K) the subspace of strictly
upper triangular matrices, that is, matrices A ∈ Kn×n whose entries on and below the diagonal are all equal to 0, so that only the entries strictly above the diagonal may be non-zero.
The sum
W := TriU∗n (K) + Skewn (K)
is the subspace of all matrices that can be written as a sum of a strict upper
triangular matrix and a skew-symmetric matrix. It is easy to see that W consists
of all matrices A where all the diagonal entries are equal to 0.
Now denote by Diagn (K) ⊂ Matn (K) the subspace of all diagonal matrices.
Then we have that
Matn (K) = TriU∗n (K) + Skewn (K) + Diagn (K),
that is, we can write every matrix as a sum of a strict upper triangular matrix, a
skew-symmetric matrix, and a diagonal matrix. ■

The sum of subspaces Ui , i = 1, . . . , N , thus consists of all vectors w that can


in some way be written as a sum w = u1 + . . . + uN , where each of the vectors ui
is an element of the corresponding subspace Ui . These vectors ui , however, need in
general not be unique. That is, there might be different ways of decomposing the
vector w into a sum of vectors in Ui .
Definition 2.57. Let V be a vector space and let U1 , . . . , UN be subspaces of
V . Denote moreover W := U1 + . . . + UN . We say that W is the (internal) direct
sum of the subspaces U1 , . . . , UN , denoted
W = U1 ⊕ . . . ⊕ UN ,
if every vector w ∈ W can be uniquely written in the form w = u1 + . . . + uN with
ui ∈ Ui for i = 1, . . . , N . ■

More explicitly, we have that W = U1 ⊕ . . . ⊕ UN , if we can conclude from the


fact that we can write
w = u1 + . . . + uN with ui ∈ Ui for all i = 1, . . . , N,
w = v1 + . . . + vN with vi ∈ Ui for all i = 1, . . . , N,
that necessarily
ui = vi for all i = 1, . . . , N.
Example 2.58. We have that
Matn (K) = TriU∗n (K) ⊕ Skewn (K) ⊕ Diagn (K),
since for every matrix A ∈ Matn (K) there is one and only one way how we can write
A as a sum of a strict upper triangular matrix, a skew-symmetric matrix, and a
diagonal matrix: The diagonal entries of A need to be taken care of by the diagonal
matrix; the strict lower triangular part needs to go into the skew-symmetric matrix;
what remains of A after subtracting the diagonal and the skew-symmetric part is
strictly upper triangular. ■
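For a concrete 2 × 2 instance, take the matrix A with rows (1, 2) and (3, 4). Its diagonal part is the matrix with diagonal entries 1 and 4, the skew-symmetric part is determined by the strict lower triangle and has rows (0, −3) and (3, 0), and the remaining strict upper triangular part has rows (0, 5) and (0, 0). Adding the three parts indeed returns A, and no other choice of the three summands is possible.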

In the case of a sum of two sets, there is a simpler way for checking whether a
sum is direct or not, as the following result shows.
Lemma 2.59. Let V be a vector space and let U1 , U2 ⊂ V be subspaces of V .
The sum U1 + U2 is direct, if and only if U1 ∩ U2 = {0}.
Proof. Assume first that U1 ∩ U2 = {0}, and assume that w ∈ U1 + U2 can
be written as
w = u 1 + u 2 = v 1 + v2
with u1 , v1 ∈ U1 , and u2 , v2 ∈ U2 . Since U1 , U2 are subspaces of V , it follows
that u1 − v1 ∈ U1 and u2 − v2 ∈ U2 , and we also have that −(u1 − v1 ) ∈ U1 and
−(u2 − v2 ) ∈ U2 . However, we also have that
0 = w − w = (u1 − v1 ) + (u2 − v2 ),
and therefore
u1 − v1 = −(u2 − v2 ).
This shows that u1 − v1 = −(u2 − v2 ) ∈ U1 ∩ U2 = {0}, and thus u1 − v1 =
−(u2 − v2 ) = 0, or u1 = v1 and u2 = v2 . Thus the sum U1 + U2 is direct.
Now assume that the sum U1 + U2 is direct, and let v ∈ U1 ∩ U2 . Then we can
write
v =v+0=0+v
in two ways as sum of one vector in U1 and one vector in U2 . Since the sum is
direct, this is only possible for v = 0, which shows that U1 ∩ U2 = {0}. □

Remark 2.60. The result from Lemma 2.59 does not hold in the same/similar
form for sums of more than two subspaces. That is, if we have three subspaces U1 ,
U2 , U3 and we have that all pairwise intersections are trivial, that is, U1 ∩ U2 =
U1 ∩ U3 = U2 ∩ U3 = {0}, then we cannot conclude that the sum U1 + U2 + U3
is direct. As a simple counterexample, consider the spaces U1 = span{(1, 0)},
U2 = span{(0, 1)}, U3 = span{(1, 1)} of K2 . Their pairwise intersection is trivial,
but their sum is not direct: The vector (1, 1), for instance, is an element of U3 , but
it can also be written in the form (1, 1) = (1, 0) + (0, 1) as a sum of vectors in U1
and U2 . ■

Example 2.61. Define U1 , U2 ⊂ R3 as


U1 = {(x, y, z) ∈ R3 : y = z = 0}
and
U2 = {(x, y, z) ∈ R3 : x + y + z = 0}.


We will show that
    R3 = U1 ⊕ U2 .
We start by showing that U1 ∩ U2 = {0}. To that end let (x, y, z) ∈ U1 ∩ U2 .
Because (x, y, z) ∈ U1 , it follows that y = z = 0. Now the fact that (x, y, z) =
(x, 0, 0) ∈ U2 implies that x = 0 as well. Thus (x, y, z) = (0, 0, 0).
Next we need to show that R3 = U1 + U2 . Assume therefore that (x, y, z) ∈ R3 .
Then we can write
(9)    (x, y, z) = (x + y + z, 0, 0) + (−y − z, y, z).
Since (x + y + z, 0, 0) ∈ U1 and (−y − z, y, z) ∈ U2 , it follows that we can write
an arbitrary vector in R3 as a sum of a vector in U1 and a vector in U2 . Thus
R3 = U1 + U2 . Since U1 ∩ U2 = {0}, it follows that R3 = U1 ⊕ U2 . ■
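For instance, the vector (1, 2, 3) decomposes as (1, 2, 3) = (6, 0, 0) + (−5, 2, 3): the first summand lies in U1 , the second satisfies −5 + 2 + 3 = 0 and thus lies in U2 , and since the sum is direct this is the only such decomposition.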

Definition 2.62 (Projection). Assume that V is a vector space and that U1


and U2 are subspaces such that V = U1 ⊕ U2 . The (oblique) projection onto U1
along U2 is the mapping π1 : V → V defined by the condition that π1 (v) = u1 if
v = u1 + u2 with u1 ∈ U1 and u2 ∈ U2 .
Similarly, the projection onto U2 along U1 is the mapping π2 : V → V defined
by the condition that π2 (v) = u2 if v = u1 + u2 with u1 ∈ U1 and u2 ∈ U2 . ■

Proposition 2.63. Assume that V is a vector space and that U1 and U2 are
subspaces such that V = U1 ⊕ U2 . Then the projections π1 and π2 onto U1 along
U2 and onto U2 along U1 , respectively, are linear, and we have that
v = π1 (v) + π2 (v)
for all v ∈ V .
Proof. The decomposition v = π1 (v) + π2 (v) follows immediately from the
definition. Assume now that v, w ∈ V . Then we can uniquely write v = u1 + u2
and w = z1 + z2 with u1 , z1 ∈ U1 , and u2 , z2 ∈ U2 . In particular, we have that
u1 = π1 (v), u2 = π2 (v), z1 = π1 (w), and z2 = π2 (w). As a consequence, we can
write
v + w = u1 + u2 + z1 + z2 = (u1 + z1 ) + (u2 + z2 ).
Since U1 and U2 are subspaces, it follows that u1 + z1 ∈ U1 and u2 + z2 ∈ U2 . Thus
the definition of π1 implies that
π1 (v + w) = u1 + z1 = π1 (v) + π1 (w),
π2 (v + w) = u2 + z2 = π2 (v) + π2 (w).
Next assume that v ∈ V and λ ∈ K. Again, we can write v = u1 + u2 with
u1 ∈ U1 and u2 ∈ U2 , and thus u1 = π1 (v) and u2 = π2 (v). Because U1 and U2
are subspaces, we have that λu1 ∈ U1 and λu2 ∈ U2 . Thus we can decompose
λv = λu1 + λu2 with λu1 ∈ U1 and λu2 ∈ U2 . This implies that
π1 (λv) = λu1 = λπ1 (v) and π2 (λv) = λu2 = λπ2 (v).
This proves the linearity of π1 and π2 . □

Example 2.64. We continue with Example 2.61 and consider again the two
subspaces
U1 = {(x, y, z) ∈ R3 : y = z = 0}
and
U2 = {(x, y, z) ∈ R3 : x + y + z = 0}.


We have already seen that R3 = U1 ⊕ U2 . Moreover, from (9) we immediately


obtain that
    π1 (x, y, z) = (x + y + z, 0, 0)    and    π2 (x, y, z) = (−y − z, y, z).

We now generalise the notion of an oblique projection to the case where the
vector space V is the direct sum of more than two subspaces.
Definition 2.65 (Projection). Assume that V is a vector space and that
U1 ,. . . ,UN are subspaces such that V = U1 ⊕ · · · ⊕ UN . The projections πi : V → V ,
i = 1, . . . , N , are defined as the unique mappings such that πi (v) ∈ Ui for each
i = 1, . . . , N and v ∈ V , and
v = π1 (v) + . . . + πN (v)
for each v ∈ V . ■

Proposition 2.66. Assume that V is a vector space and that U1 ,. . . ,UN are
subspaces such that V = U1 ⊕ · · · ⊕ UN . The projections πi : V → V , i = 1, . . . , N ,
defined in Definition 2.65 are linear transformations.
Proof. This is essentially the same as the proof of Proposition 2.63. □
Proposition 2.67. Assume that V is a vector space and that U1 ,. . . ,UN are
subspaces such that V = U1 ⊕ · · · ⊕ UN . Assume moreover that Bi is a basis of Ui ,
i = 1, . . . , N . Then B := B1 ∪ . . . ∪ BN is a basis of V .
Proof. For i = 1, . . . , N we can write Bi = (vj )j∈Ji for some index set Ji .
Here we may assume without loss of generality that the index sets are pairwise
disjoint, that is, that Ji ∩ Jj = ∅ for i ̸= j. Denote moreover J := J1 ∪ . . . ∪ JN .
Assume now that v ∈ V is an arbitrary vector. Then there exist unique vectors
ui ∈ Ui , i = 1, . . . , N , such that
v = u1 + . . . + uN .
Moreover, for each i = 1, . . . , N there exists a unique finite subset Ki ⊂ Ji and unique non-zero scalars λj ∈ K \ {0}, j ∈ Ki , such that
    ui = Σ_{j∈Ki} λj vj .

As a consequence, we can write
    v = Σ_{i=1}^{N} ui = Σ_{i=1}^{N} Σ_{j∈Ki} λj vj = Σ_{j∈K1 ∪...∪KN} λj vj .

Thus we can represent the vector v ∈ V as finite linear combination of members of


B. Now assume that L ⊂ J is another finite subset and that µℓ ∈ K \ {0}, ℓ ∈ L,
are such that
    v = Σ_{ℓ∈L} µℓ vℓ .

Define for i = 1, . . . , N
    Li := L ∩ Ji    and    wi := Σ_{ℓ∈Li} µℓ vℓ .
Then wi ∈ Ui for each i and v = Σ_{i=1}^{N} wi . Thus the assumption that the sum of
the vector spaces Ui is direct implies that wi = ui for each i. Next, the assumption
that Bi is a basis of Ui and the fact that
    Σ_{j∈Ki} λj vj = ui = wi = Σ_{ℓ∈Li} µℓ vℓ

imply that Ki = Li and λj = µj for each j ∈ Ki = Li . This further implies


that L = K := ∪_{i=1}^{N} Ki and that λj = µj for each j ∈ K. This shows that the
representation of v is unique, which in turn shows that B indeed is a basis of V . □
As a particular consequence, we obtain that the dimension of a direct sum is
the sum of the dimensions of the subspaces:
Proposition 2.68. Assume that V is a finite dimensional vector space and
that U1 , . . . , UN are subspaces of V . Then
dim(U1 + . . . + UN ) ≤ dim(U1 ) + . . . + dim(UN ),
and we have equality if and only if the sum is direct.
Proof. Exercise! □
Lemma 2.69. Assume that V is a vector space and U ⊂ V is a subspace. Then there exists a subspace W ⊂ V such that V = U ⊕ W .
Proof. Let B = (ui )i∈I be a basis of U . In particular, B is a linearly inde-
pendent subset of V and thus, by Theorem 2.39, can be extended to a basis of
the whole space V . That is, there exists an index set J satisfying J ∩ I = ∅ and
a (linearly independent) family C := (uj )j∈J ⊂ V , such that the combined family
(uk )k∈I∪J forms a basis of V .
Define now W := span(C). Then
    U + W = span(B) + span(C) = span(B ∪ C) = V,
where the middle equality is left as an exercise and the last equality holds since B ∪ C is a basis of V .
It remains to show that the sum is direct. To that end, assume that v ∈ U ∩ W .
Since (uk )k∈I∪J is a basis of V , there exists a unique way of writing v in the form
    v = Σ_{ℓ=1}^{N} λℓ uiℓ + Σ_{m=1}^{M} µm ujm

with indices i1 , . . . , iN ∈ I and j1 , . . . , jM ∈ J, and coefficients λℓ , µm ∈ K. How-


ever, because v ∈ U ∩ W ⊂ U , it follows that the coefficients µm have to be equal to 0 for all m. Similarly, because v ∈ W , it follows that the coefficients λℓ all have to be equal to 0. Thus v = 0, which shows that U ∩ W = {0}. This shows that the
sum is direct, which in turn completes the proof. □
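For instance, for V = R3 and U = span{(1, 0, 0)} we can choose W = span{(0, 1, 0), (0, 0, 1)}, so that R3 = U ⊕ W . Note that such a complement is far from unique: W ′ = span{(1, 1, 0), (1, 0, 1)} works just as well.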
4.3. Invariant Subspaces. Assume now that V is a vector space and T : V →
V is a linear transformation of V . Assume moreover that U1 ,. . . ,UN ⊂ V are
subspaces such that V = U1 ⊕ · · · ⊕ UN . Similar as in Section 4.1, we can then write
the transformation T in block-matrix-form with respect to the given decomposition
of V . To that end, we define the transformations
Tij := πi ◦ T |Uj : Uj → Ui

for i, j = 1, . . . , N . As before, the transformation Tij represents the “part of T


that maps Uj to Ui .” Now identify a vector u ∈ V with the tuple

u = (u1 , . . . , uN )    with ui := πi u for i = 1, . . . , N.
Then we can (again, see (7)) write
            ( T11  · · ·  T1N ) ( u1 )   ( T11 u1 + . . . + T1N uN )
      T u = (  ..          ..  ) ( .. ) = (            ..            )
            ( TN1  · · ·  TNN ) ( uN )   ( TN1 u1 + . . . + TNN uN )
for every u ∈ V .
Our goal, as stated in the beginning of Section 4.2, is to discuss whether we
can choose the subspaces U1 ,. . . ,UN in such a way that the mapping T has block-
diagonal-form with respect to this decomposition. In the next definition, we will
introduce a notion that is central to this goal.

Definition 2.70 (T -invariant subspace). Assume that V is a vector space and


that T : V → V is a linear transformation. Assume moreover that U ⊂ V is a
subspace. We say that U is a T -invariant subspace of V , if T (U ) ⊂ U , that is, for
all u ∈ U we have that T u ∈ U . ■

Exercise 2.2. Show that ker(T ) and ran(T ) are always T -invariant subspaces.

Example 2.71. Consider the transformation T : Matn (K) → Matn (K), A 7→


T A = AT . Then the sets Symn (K) and Skewn (K) are T -invariant subspaces:
If A ∈ Symn (K), then T A = AT = A ∈ Symn (K) as well, and similarly, if
A ∈ Skewn (K), then T A = AT = −A ∈ Skewn (K) as well. ■

Example 2.72. Denote by C ∞ (R) the space of all arbitrarily differentiable


real-valued functions on R, and let P be the space of all real polynomials. Define
moreover T : C ∞ (R) → C ∞ (R), f 7→ T f := f ′ . Then P is a T -invariant subspace of C ∞ (R):
If p is a polynomial, then T p = p′ is a polynomial as well. That is, if p ∈ P, then
also T p ∈ P, which is precisely the definition of T -invariance. ■

Lemma 2.73. Assume that V is a vector space, that T : V → V is a linear


transformation, and that U ⊂ V is a T -invariant subspace. Let W ⊂ V be a
subspace such that V = U ⊕ W . Then the block-matrix-representation of T with
respect to this decomposition is block-upper-triangular.

Proof. We write
          ( T11  T12 )
      T = (          )      with Tij = πi ◦ T |Uj .
          ( T21  T22 )
We have to show that T21 = 0. Let therefore u ∈ U . Then the assumption that U is
T -invariant implies that T u ∈ U . Thus π1 (T u) = T u, and π2 (T u) = 0. This shows
that (π2 ◦ T )u = 0 for every u ∈ U , which proves that T21 = π2 ◦ T |U = 0. □

Lemma 2.74. Assume that V is a vector space and that U1 ,. . . ,UN ⊂ V are
subspaces such that V = U1 ⊕ · · · ⊕ UN . Let moreover T : V → V be a linear trans-
formation such that Ui is a T -invariant subspace of V for every i = 1, . . . , N . Then
the block-matrix-representation of T with respect to this decomposition is block-
diagonal.

Proof. Exercise! □

5. Eigenvalues and Eigenspaces


Recall once again our goal of decomposing a vector space V as a direct sum
V = U1 ⊕ · · · ⊕ UN in such a way that the given linear transformation T : V →
V has block-diagonal form. In the previous section, we have just seen that this
requires the subspaces Uj to be T -invariant. It therefore makes sense to look at the
smallest possible (non-trivial) T -invariant subspaces of V , which are precisely the
eigenspaces of T .
Definition 2.75. Assume that V is a vector space and T : V → V is a linear
transformation. Then λ ∈ K is an eigenvalue of T , if there exists v ∈ V with v ̸= 0
such that
T v = λv.
Such a vector v is called an eigenvector of T (for the eigenvalue λ).
The eigenspace E(λ, T ) of T for the eigenvalue λ is defined as

E(λ, T ) := ker(T − λI) = {v ∈ V : T v = λv}.

Example 2.76. Consider the transformation T : Matn (K) → Matn (K), A 7→


T A := AT . If A ∈ Symn (K) is symmetric, then T A = AT = A, that is, 1 is an
eigenvalue of T , and every symmetric matrix with exception of the 0-matrix is a
corresponding eigenvector.
Similarly, if A ∈ Skewn (K) is skew-symmetric, then T A = AT = −A, and
thus −1 is an eigenvalue as well, and the corresponding eigenvectors are all non-
zero skew-symmetric matrices. ■

Example 2.77. Consider the transformation T : P(K) → P(K), p 7→ T p := p′ .


Then the only eigenvalue of T is 0, and the corresponding eigenvectors are the
constant polynomials p(x) = c for c ∈ K \ {0}.
We can, however, consider “the same” transformation over the space C ∞ (R, K)
of arbitrarily differentiable K-valued functions on R. That is, we consider the
function S : C ∞ (R, K) → C ∞ (R, K), f 7→ Sf := f ′ . In this case, every value
λ ∈ K is an eigenvalue of S with corresponding eigenfunctions f (x) = ceλx with
c ∈ K \ {0}. ■
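Indeed, for f (x) = ceλx with c ∈ K \ {0} we have (Sf )(x) = f ′ (x) = cλeλx = λf (x), so f is an eigenvector (eigenfunction) of S for the eigenvalue λ.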

Lemma 2.78. Assume that V is a vector space and T : V → V is linear. Assume


moreover that U ⊂ V is a one-dimensional subspace of V . Then U is T -invariant,
if and only if there exists an eigenvector v of T with U = span(v).
Proof. Assume first that U is T -invariant. Since U is one-dimensional, we
can write U = span(v) for some v ̸= 0 (with (v) being a basis of U ). Then
T v ∈ U = span(v), which implies that there exists c ∈ K with T v = cv. Thus v is
an eigenvector of T (for the eigenvalue c).
Conversely, assume that v is an eigenvector of T for the eigenvalue λ ∈ K and
that U = span(v). Let u ∈ U . Then there exists c ∈ K with u = cv. Since v is an
eigenvector of T we have that
T u = T (cv) = c T v = cλ v ∈ span(v) = U.
Thus U is T -invariant. □
We will now introduce an alternative characterisation of eigenvalues in finite
dimensional spaces, which, however, requires some technical preparation.
Lemma 2.79. Assume that U , V are vector spaces, U is finite dimensional, and
that T : U → V is bijective. Assume moreover that B = (u1 , . . . , un ) is a basis of U .
Then T B = (T u1 , . . . , T un ) is a basis of V . In particular, V is finite dimensional
and dim(U ) = dim(V ).

Proof. Assume that v ∈ V . Since T is surjective, we have that V = ran(T ),


and thus there exists u ∈ U with v = T u. Since B is a basis of U we can write
u = c1 u1 + . . . + cn un for some c1 , . . . , cn ∈ K. As a consequence,
v = T u = c1 T u1 + . . . + cn T un .
This shows that V = span{T u1 , . . . , T un }.
It remains to show that the family (T u1 , . . . , T un ) is linearly independent.
Assume to that end that
0 = c1 T u1 + . . . + cn T un .
We can rewrite this as

0 = T c1 u1 + . . . + cn un ,
that is,
c1 u1 + . . . + cn un ∈ ker(T ).
Since T is injective, we have that ker(T ) = {0}, and thus
c1 u1 + . . . + cn un = 0.
Now the linear independence of the family (u1 , . . . , un ) implies that cj = 0 for all
j, which in turn proves the linear independence of the set (T u1 , . . . , T un ). □

Theorem 2.80. Assume that U , V are vector spaces, U is finite dimensional,


and that T : U → V is linear. Then
dim U = dim(ker T ) + dim(ran T ).
Proof. By Lemma 2.69, there exists a subspace W ⊂ U such that U =
(ker T ) ⊕ W . Now consider the transformation S : W → ran T , w 7→ Sw := T w.
That is, S is the restriction of T to W , seen as a mapping from W to ran T . We
will show that S is a bijection.
We note first that ker S = W ∩ker T = {0}, as the sum of W and ker T is direct.
Thus S is injective. In order to show that S is surjective, assume that v ∈ ran T .
Then there exists u ∈ U with v = T u. Now decompose u = u0 + w with u0 ∈ ker T
and w ∈ W . That is, u0 = πker T u and w = πW u. Then v = T u = T u0 + T w =
0 + Sw = Sw. Thus v ∈ ran S, which shows that S is surjective as well.
As a consequence, it follows from Lemma 2.79 that dim(W ) = dim(ran T ).
Next, since the sum of ker T and W is direct, we have that dim U = dim(ker T ) +
dim(W ), and thus
dim U = dim(ker T ) + dim(W ) = dim(ker T ) + dim(ran T ). □
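As an illustration of this formula, consider once more the differentiation mapping T : P3 → P2 from Example 2.44. Its kernel consists of the constant polynomials, so dim(ker T ) = 1, and its range is all of P2 , so dim(ran T ) = 3; indeed, dim P3 = 4 = 1 + 3.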

Theorem 2.81. Assume that V is a finite dimensional vector space and that
T : V → V is linear. The following are equivalent:
(1) T is bijective.
(2) T is injective.
(3) T is surjective.
Proof. The implications (1) =⇒ (2) and (1) =⇒ (3) hold trivially.
We now show the implication (2) =⇒ (3). Since T is injective and thus
ker T = {0}, we have
dim(V ) = dim(ker T ) + dim(ran T ) = 0 + dim(ran T ) = dim(ran T ).
Thus ran T is a subspace of V of the same dimension as V . This already implies
that V = ran T and thus T is surjective.

Finally we show the implication (3) =⇒ (1). Here it remains to show that
T is injective. Since T is surjective, we have that V = ran T and in particular
dim V = dim(ran T ). Thus
dim(V ) = dim(ker T ) + dim(ran T ) = dim(ker T ) + dim(V ),
which implies that dim(ker T ) = 0. This is only possible if ker T = {0}, that is, T is
injective. □
Definition 2.82. Let V be a vector space. We denote by IdV : V → V the
identity operator on V given by IdV v = v for all v ∈ V . If the space V is clear
from the context, we omit the subscript V and write Id instead. ■

Theorem 2.83. Assume that V is finite dimensional, T : V → V is a linear


transformation, and that λ ∈ K.
The following are equivalent:
(1) λ is an eigenvalue of T .
(2) T − λ Id is not injective.
(3) T − λ Id is not surjective.
(4) T − λ Id is not bijective.
Proof. The equivalences (2) ⇐⇒ (3) ⇐⇒ (4) follow from Theorem 2.81.
Moreover, we have that λ is an eigenvalue of T , if and only if there exists v ̸= 0
with T v = λv. This is in turn equivalent to stating that there exists v ̸= 0 such
that (T − λ Id)v = 0, which in turn is equivalent to the non-injectivity of T − λ Id.
This proves the equivalence (1) ⇐⇒ (2). □
Remark 2.84. Do note that the characterisation of eigenvalues from The-
orem 2.83 only holds in finite dimensions. In infinite dimensions, however, it is
only true to say that λ is an eigenvalue of T if and only if T − λ Id is not injective.
The other equivalences no longer hold.
In fact, we have already seen an example of this, namely the transformation
T : P(K) → P(K), p 7→ T p := p′ . Here we know that 0 is an eigenvalue, which is
the same as saying that T is not injective. Nevertheless, the transformation T is
surjective. ■

6. Existence of Eigenvalues
We will now show that linear transformations of complex, finite dimensional
vector spaces always have at least one eigenvalue. In order to do so, we need to
introduce the concept of the evaluation of a polynomial on a linear transformation.
Assume to that end that T : V → V is a linear transformation of the space V .
Then we can consider the composition T ◦ T with itself, which is again a linear
transformation of V . Thus we can further consider the composition T ◦ T ◦ T .
Again, this is a linear transformation of V . In a similar way, we can consider
arbitrary (but finite) compositions of T with itself. In order to simplify notation,
we abbreviate in the following T 2 := T ◦ T , and similarly T 3 = T ◦ T ◦ T , and so
on.
Definition 2.85. Assume that V is a vector space, T : V → V is a linear
transformation, and p is a polynomial with coefficients in K, say
p(x) = c0 + c1 x + c2 x2 + . . . + cn xn .
We define the mapping p(T ) : V → V by
p(T )u := c0 u + c1 T u + c2 T 2 u + . . . + cn T n u for all u ∈ V.

7. TRIANGULARISATION 45

Proposition 2.86. Assume that V is a vector space, T : V → V is linear, and


p, q are polynomials with coefficients in K. Denote by r(x) := p(x)q(x) the product
of p and q. Then
r(T ) = p(T ) ◦ q(T ) = q(T ) ◦ p(T ).
Proof. Exercise. □
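For matrices, the evaluation p(T ) of Definition 2.85 simply replaces powers of the variable by matrix powers. The sketch below is illustrative only; it assumes NumPy, and the helper poly_of_matrix is not a library function. It also checks the identity r(T ) = p(T ) q(T ) of Proposition 2.86 for one concrete choice of p, q and T.

    import numpy as np

    def poly_of_matrix(coeffs, T):
        """Evaluate p(T) = c0*Id + c1*T + ... + cn*T^n, with coeffs = [c0, c1, ..., cn]."""
        result = np.zeros_like(T, dtype=float)
        power = np.eye(T.shape[0])               # T^0 = Id
        for c in coeffs:
            result = result + c * power
            power = power @ T                    # next power of T
        return result

    T = np.array([[2.0, 1.0], [0.0, 3.0]])
    p = [1.0, -2.0, 1.0]                         # p(x) = 1 - 2x + x^2
    q = [0.0, 1.0]                               # q(x) = x
    r = np.polynomial.polynomial.polymul(p, q)   # coefficients of r = p*q

    # Proposition 2.86: r(T) = p(T) q(T) = q(T) p(T)
    assert np.allclose(poly_of_matrix(r, T), poly_of_matrix(p, T) @ poly_of_matrix(q, T))
    assert np.allclose(poly_of_matrix(p, T) @ poly_of_matrix(q, T),
                       poly_of_matrix(q, T) @ poly_of_matrix(p, T))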
Theorem 2.87. Assume that V is a finite dimensional complex vector space,
V ̸= {0}, and that T : V → V is a linear transformation. Then T has at least one
eigenvalue λ ∈ C.
Proof. Denote by n := dim(V ) the dimension of V . Let v ∈ V be arbitrary
with v ̸= 0 and consider the vectors
v, T v, T 2 v, . . . , T n v.
These are n + 1 vectors in an n-dimensional space, which implies that they cannot
be linearly independent. That is, there exist numbers c0 , . . . , cn ∈ C, not all of
them equal to 0, such that
(10) c0 v + c1 T v + c2 T 2 v + . . . + cn T n v = 0.
Define now the polynomial
p(x) := c0 + c1 x + . . . + cn xn .
Then we can write (10) as
p(T )v = 0.
Now let m be the largest index for which cm ̸= 0 (it is possible that cn = 0, so it
can happen that m < n). Then we can factorise the polynomial p as
p(x) = cm (x − λ1 )(x − λ2 ) · · · (x − λm )
with λ1 , . . . , λm ∈ C being the complex zeroes of p (possibly with higher multipli-
cities). Thus, after dividing by cm ̸= 0, the equation (10) can be further written
as
(T − λ1 Id) ◦ (T − λ2 Id) ◦ . . . ◦ (T − λm Id)v = 0.
Since v ̸= 0 by assumption, this implies that the composition (T − λ1 Id) ◦ . . . ◦ (T −
λm Id) cannot be injective (as v is in its kernel). This, however, is only possible if
at least one of the factors T − λj Id is not injective. This, in turn, is equivalent to
λj being an eigenvalue of T . □
Remark 2.88. Note that the proof breaks down if we consider real vector
spaces, as in this case the factorisation of the polynomial is no longer possible (and
polynomials need not have real zeroes). Indeed, there exist linear transformations
in real vector spaces that do not have (real!) eigenvalues, the simplest example
being that of a rotation in R2 (by an angle that is not an integer multiple of π). ■
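The rotation example of Remark 2.88 can also be illustrated numerically. The following sketch (assuming NumPy; not part of the notes) shows that a rotation by 60 degrees has no real eigenvalues, while viewed as a transformation of C2 it has the eigenvalues e^{±iθ}, in line with Theorem 2.87.

    import numpy as np

    theta = np.pi / 3                            # rotation by 60 degrees, not a multiple of pi
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    eigvals = np.linalg.eigvals(R)
    print(eigvals)                               # approximately 0.5 +/- 0.866j, i.e. e^{+-i theta}
    print(np.all(np.abs(eigvals.imag) > 1e-12))  # True: no real eigenvalue exists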

7. Triangularisation
Lemma 2.89. Assume that the matrix
 
B = \begin{pmatrix} \beta_1 & * & \cdots & * \\ 0 & \beta_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & \beta_n \end{pmatrix} \in \mathrm{Mat}_n(\mathbb{K})
is upper triangular. Then the matrix B is invertible, if and only if all diagonal
entries β1 , . . . , βn are different from zero.

Idea of proof. The matrix B is invertible, if and only if its columns are
linearly independent. If all diagonal entries are different from zero, this is clearly
the case (if you want, you can prove this by induction over n).
Conversely, if one of the diagonal entries is equal to zero, say βj = 0, then the
first j columns cannot be linearly independent, as only the first j − 1 entries of each
of these columns are possibly different from zero. □

Proposition 2.90. Assume that V is a finite dimensional vector space and


that T : V → V is linear. Assume moreover that V has a basis with respect to
which the matrix representation A of T is upper triangular. Then the eigenvalues
of T are precisely the distinct diagonal entries of A.
Proof. The matrix A is upper triangular, and thus it has the form
 
A = \begin{pmatrix} \alpha_1 & * & \cdots & * \\ 0 & \alpha_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & \alpha_n \end{pmatrix}
for some αj ∈ K, j = 1, . . . , n. Now note that the matrix representation of T − λ Id
in the same basis is the matrix
 
A - \lambda\,\mathrm{Id} = \begin{pmatrix} \alpha_1 - \lambda & * & \cdots & * \\ 0 & \alpha_2 - \lambda & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & \alpha_n - \lambda \end{pmatrix}.
By Lemma 2.89, this matrix is invertible if and only if its diagonal entries are
non-zero.
Now recall that λ is an eigenvalue of T , if and only if the transformation T −λ Id
is not bijective. This is the case, if and only if its matrix representation (in any
basis) is not invertible, which, as we have just seen, is the case if and only if λ = αj
for some j, which proves the assertion. □

Theorem 2.91. Assume that V is a finite dimensional complex vector space


and that T : V → V is linear. Then there exists a basis of V with respect to which
the matrix representation of T is upper triangular.
Proof. We prove this result by induction over n := dim V . For n = 1, this
result is trivial, as every (1 × 1)-matrix is automatically upper triangular (and even
diagonal!).

Assume therefore that we have proven the claim for every vector space U with
dim U < n and every mapping S : U → U .
Let λ be an eigenvalue of T (the existence of which follows from Theorem 2.87).
If T = λ Id, then the result holds trivially, as the matrix of T with respect to any
basis is simply the matrix λ Id. Thus we may assume without loss of generality
that T ̸= λ Id. Define now U := ran(T − λ Id). Since λ is an eigenvalue of T , it
follows that T − λ Id is not surjective, which implies that dim(U ) < dim(V ). Since
T − λ Id ̸= 0, we also obtain that dim(U ) ≥ 1.
We now want to use our induction assumption on the space U . For that, we
show first that U is T -invariant: Indeed, assume that u ∈ U . Then there exists
some v ∈ V with (T − λ Id)v = u. Thus
T u = T (T − λ Id)v = (T − λ Id)T v ∈ ran(T − λ Id) = U.

Here the second equality follows from the computation rules for polynomials in
Proposition 2.86.
Define now the transformation S : U → U , u 7→ Su := T u. That is, the
mapping S is the restriction T |U of T to U , but seen as a mapping from U to itself
(which is possible because U is T -invariant). Thus there exists a basis (u1 , . . . , um )
of U for which the matrix representation of S is upper triangular.
Now let W ⊂ V be such that V = U ⊕W , and choose any basis (w1 , . . . , wn−m )
of W . We now consider the matrix representation A of T with respect to the
basis (u1 , . . . , um , w1 , . . . , wn−m ). Following our considerations from Section 4.3
and specifically Lemma 2.73, the fact that U is T -invariant implies that A has a
block upper triangular structure
 
A11 A12
A= .
0 A22
Moreover, the matrix A11 is precisely the matrix representation of S with respect
to the basis (u1 , . . . , um ) and thus upper triangular. Finally, we have for all wj that
T wj = (T wj − λwj ) + λwj = (T − λ Id)wj + λwj .
Since (T − λ Id)wj ∈ ran(T − λ Id) = U , it follows that the matrix A22 (which
represents the part of T that maps W into W ) is A22 = λ Id. Thus the matrix A
as a whole is upper triangular. □
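Numerically, an upper triangular matrix representation as in Theorem 2.91 is provided by the Schur decomposition, which even uses an orthonormal basis. The following sketch is illustrative only and assumes SciPy; scipy.linalg.schur returns matrices T and Q with A = Q T Q^H, T upper triangular and Q unitary.

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

    T, Q = schur(A, output='complex')     # A = Q T Q^H with T upper triangular
    assert np.allclose(A, Q @ T @ Q.conj().T)
    assert np.allclose(T, np.triu(T))     # T is indeed upper triangular
    # By Proposition 2.90, the diagonal of T consists of the eigenvalues of A.
    print(np.diag(T))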

8. Diagonalisation
Definition 2.92. Let V be a finite dimensional vector space and let T : V → V
be linear. We say that T is diagonalisable if there exists a basis of V such that the
corresponding matrix representation of T is diagonal. ■

In the following, we will derive different equivalent conditions for the diagonal-
isability of a linear operator. For that, we show first that eigenvectors for different
eigenvalues are necessarily linearly independent.
Theorem 2.93. Assume that V is a vector space and T : V → V is linear. Let
moreover λ1 , . . . , λm ∈ K be distinct eigenvalues (that is, λi ̸= λj for i ̸= j), and
let v1 , . . . , vm ∈ V be corresponding eigenvectors. Then the family (v1 , . . . , vm ) is
linearly independent.
Proof. We apply induction over m. For m = 1, the result is trivial, as every
eigenvector is non-zero.

Assume now that we have shown the result for m − 1. That is, the family
(v1 , . . . , vm−1 ) is linearly independent.
Let now c1 , . . . , cm ∈ K be such that
0 = c1 v1 + . . . + cm vm .
We have to show that cj = 0 for all j = 1, . . . , m.
First we consider the case where cm = 0. In this case, we have that
0 = c1 v1 + . . . + cm−1 vm−1 .
Since by our induction assumption, the set (v1 , . . . , vm−1 ) is linearly independent, it
follows that the coefficients c1 , . . . , cm−1 are all equal to zero as well, which proves
the claim in this case.
Now assume that cm ̸= 0. Defining dj := −cj /cm for j = 1, . . . , m − 1, we then
obtain the equation
(11) vm = d1 v1 + . . . + dm−1 vm−1 .

If we apply the transformation T to (11) we then obtain that



λm vm = T vm = T d1 v1 + . . . + dm−1 vm−1
= d1 T v1 + . . . + dm−1 T vm−1 = d1 λ1 v1 + . . . + dm−1 λm−1 vm−1 .
If we now multiply (11) by λm and subtract this last equation, we obtain that
0 = d1 (λm − λ1 )v1 + . . . + dm−1 (λm − λm−1 )vm−1 .
Now the assumed linear independence of the family (v1 , . . . , vm−1 ) implies that all
the coefficients dj (λm − λj ), j = 1, . . . , m − 1, are equal to zero. Since λm ̸= λj
for j = 1, . . . , m − 1, this implies that dj = 0 for all j. Then, however, equation (11)
shows that vm = 0, which contradicts the fact that vm is an eigenvector. Thus the case
cm ̸= 0 cannot occur, which concludes the proof. □
As a consequence of this result, we obtain that the sum of eigenspaces is direct.
Lemma 2.94. Assume that T : V → V is linear and that λ1 , . . . , λm ∈ K are
distinct eigenvalues of T . Then the sum of the eigenspaces E(λ1 , T ), . . . , E(λm , T )
is direct.
Proof. Let v ∈ E(λ1 , T ) + . . . + E(λm , T ). Assume that we can write v in
two ways as
v = u1 + . . . + um ,
v = v1 + . . . + vm ,
such that uj , vj ∈ E(λj , T ) for all j = 1, . . . , m. We have to show that in this case
uj = vj for all j.
By subtracting the two representations of v from each other, we see that
(12) 0 = (u1 − v1 ) + . . . + (um − vm )
with (uj − vj ) ∈ E(λj , T ) for all j = 1, . . . , m. Now recall that we have shown in
Theorem 2.93 that eigenvectors for distinct eigenvalues are linearly independent.
Thus (12) can only hold if the vectors (uj − vj ) are not eigenvectors of T , which is
only possible if they are equal to zero. □
Theorem 2.95. Assume that V is a finite dimensional vector space and that
T : V → V is linear. Denote by λ1 , . . . , λm ∈ K the distinct eigenvalues of T . The
following are equivalent:
(1) T is diagonalisable.
(2) V has a basis of eigenvectors of T .
(3) We can decompose V = U1 ⊕ U2 ⊕ . . . ⊕ Un such that each space Uj is a
one-dimensional T -invariant subspace of V .
(4) V = E(λ1 , T ) ⊕ . . . ⊕ E(λm , T ).
(5) dim V = dim E(λ1 , T ) + . . . + dim E(λm , T ).
Proof. It is easy to see that the matrix A with respect to a basis v1 , . . . , vn of
V is diagonal, if and only if there exist µi ∈ K, i = 1, . . . , n, (the diagonal entries of
A) such that T vi = µi vi for all i. This shows the equivalence (1) ⇐⇒ (2). Also,
the equivalence (1) ⇐⇒ (3) follows from the considerations in Section 4.3.
Next we show the equivalence (2) ⇐⇒ (4): If V has a basis of eigenvectors,
then necessarily V is the sum of the eigenspaces of T . Moreover, this sum is direct
by Lemma 2.94. Conversely, if (4) holds, we can choose a basis of each eigenspace,
and concatenate the different bases to a basis of V .
The implication (4) =⇒ (5) follows from Proposition 2.68. Finally, assume
that (5) holds. By Lemma 2.94, the sum of the eigenspaces is direct, and thus, by
Proposition 2.68,
  
dim(E(λ1 , T ) + . . . + E(λm , T )) = dim E(λ1 , T ) + . . . + dim E(λm , T ) = dim V.

Thus E(λ1 , T ) + . . . + E(λm , T ) is a subspace of V of the same dimension as V , and therefore
equal to V . Since we already know that the sum is direct, it follows that


V = E(λ1 , T ) ⊕ . . . ⊕ E(λm , T ).

In particular, we obtain the following result concerning diagonalisability of


operators:
Corollary 2.96. Assume that dim V = n and that the linear transformation
T : V → V has n distinct eigenvalues. Then T is diagonalisable.
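As a numerical illustration of Corollary 2.96 (a sketch assuming NumPy; not part of the notes), one can diagonalise a matrix with pairwise distinct eigenvalues: the eigenvectors returned by np.linalg.eig then form a basis.

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 3.0, 1.0],
                  [0.0, 0.0, 5.0]])       # upper triangular with distinct eigenvalues 2, 3, 5

    eigvals, V = np.linalg.eig(A)         # columns of V are eigenvectors
    D = np.diag(eigvals)

    # The eigenvalues are distinct, so the eigenvectors form a basis and A = V D V^{-1}.
    assert np.allclose(A, V @ D @ np.linalg.inv(V))
    print(np.sort(eigvals))               # the eigenvalues 2, 3 and 5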

9. Generalised Eigenspaces
In the previous section, we have introduced the notion of diagonalisability of a
linear transformation. Moreover, we have seen that a transformation is diagonalis-
able if and only if it has a basis of eigenvectors. Unfortunately, it turns out that
this is not the case for all transformations. As an example, consider the matrix
 
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.
Since this matrix is upper triangular, we can read off the eigenvalues from its
diagonal, which is 0. Thus the only eigenvalue of A is 0. However, the matrix A
cannot be diagonalisable: If it were, it could be transformed by a basis change into
a diagonal matrix where all the diagonal entries are equal to 0; in other words, the
matrix would have to be 0.
With the same argumentation we obtain that strictly upper triangular matrices
cannot be diagonalisable, unless they are already equal to 0. Moreover, the same
holds if we add the same constant λ to the diagonal: A matrix of the form
 
(13) A = \begin{pmatrix} \lambda & * & \cdots & * \\ 0 & \lambda & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & \lambda \end{pmatrix}
can only be diagonalisable, if all the off-diagonal elements are equal to 0 and A =
λ Id.
In this section, we will show that this is in a sense the most general form
of a non-diagonalisable matrix. More precisely, we will show that every linear
operator T on a finite dimensional, complex vector space has a basis such that the
corresponding matrix of T has a block-diagonal form where every block is of the
form (13). In order to arrive at this representation, we will have to introduce some
notation, though, and then do some hard work.

Nilpotent Transformations.
Definition 2.97. Assume that V is a vector space and T : V → V is a linear
transformation. We say that T is nilpotent, if there exists j ∈ N such that T j =
0. ■

It is easy to show that nilpotent transformations can have no other eigenvalue


than 0:
Lemma 2.98. Assume that V is a vector space and T : V → V is a nilpotent
linear transformation. Then the only eigenvalue of T is 0.

Proof. Assume that λ ∈ K is an eigenvalue of T , and let v ∈ V , v ̸= 0, be


a corresponding eigenvector. Then T v = λv. Moreover, since T is nilpotent, there
exists j ∈ N such that T j = 0, and in particular T j v = 0. Thus we have that
0 = T j v = T j−1 T v = T j−1 λv = λT j−1 v = . . . = λj v.
Since v ̸= 0, this implies that λ = 0. □

Lemma 2.99. Assume that the matrix A ∈ Matn (K) is strictly upper triangu-
lar, that is, its entries aij satisfy aij = 0 for all i ≥ j. Then An = 0.
Proof. This is a simple calculation. □

Proposition 2.100. Assume that V is a complex finite dimensional vector


space and that T : V → V is nilpotent. Then
T dim(V ) = 0.
Proof. By Theorem 2.91, we can find a basis of V such that the matrix A of
T is upper triangular. Next we know from Lemma 2.98 that the only eigenvalue of
T is 0. Thus all the diagonal elements of A (which are the eigenvalues of T ) are
0, which means that the matrix A is actually strictly upper triangular. Now the
claim follows from Lemma 2.99. □
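A strictly upper triangular matrix gives a concrete nilpotent transformation. The following sketch (assuming NumPy; illustrative only) shows that its powers vanish at the latest at the dimension of the space, as stated in Proposition 2.100.

    import numpy as np

    N = np.array([[0.0, 1.0, 2.0],
                  [0.0, 0.0, 3.0],
                  [0.0, 0.0, 0.0]])          # strictly upper triangular, hence nilpotent

    print(np.linalg.matrix_power(N, 2))      # not yet the zero matrix
    print(np.linalg.matrix_power(N, 3))      # N^3 = 0, with 3 = dim(V)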

Now consider the situation where the operator T has a single eigenvalue λ.
After choosing a suitable basis, we can then write the matrix A of T in the form
given in (13). This, however, implies that the matrix A − λ Id is nilpotent and thus
(A − λ Id)n = 0. Now note that the matrix (A − λ Id)n is precisely the matrix of
the operator (T − λ Id)n in the same basis. Thus the operator T − λ Id is nilpotent
and every element v in the vector space V satisfies (T − λ Id)n v = 0. In particular,
there exists a basis of V consisting of vectors that satisfy (T − λ Id)n v = 0.
Definition 2.101. Let V be a vector space, let T : V → V be linear and λ ∈ K.
A vector v ∈ V , v ̸= 0, is called a generalised eigenvector of T for the eigenvalue λ,
if there exists j ∈ N such that
(T − λ Id)j v = 0.
The linear span of all generalised eigenvectors of T for the eigenvalue λ is called
the generalised eigenspace of T for λ, and denoted by G(λ, T ). ■

Remark 2.102. It is easy to see that the generalised eigenspace for the eigen-
value λ consists of all the generalised eigenvectors for λ together with the vector
0. ■

Remark 2.103. If v ̸= 0 satisfies the equation (T − λ Id)j v = 0, then λ neces-


sarily has to be an eigenvalue of T : Let k be the largest non-negative integer such
that w := (T − λ Id)k v ̸= 0. Then
(T − λ Id)w = (T − λ Id)k+1 v = 0,
and thus T w = λw, which in particular shows that λ is an eigenvalue (and w a
corresponding eigenvector). ■

Preparatory Technical Results.


We will in the following show that every vector space has a basis consisting of
generalised eigenvectors. For that, we will need several additional technical results.
Lemma 2.104. Let V be a vector space, let T : V → V be linear, and let p be a
polynomial. Then the subspaces ker(p(T )) and ran(p(T )) are T -invariant.

Proof. Assume that u ∈ ker(p(T )). Then p(T )u = 0 and thus p(T )(T u) =
T (p(T )u) = T (0) = 0, which shows that also T u ∈ ker(p(T )). Thus ker(p(T )) is
T -invariant.
Now assume that u ∈ ran(p(T )), say u = p(T )v for some v ∈ V . Then T u =
T (p(T )v) = p(T )(T v) ∈ ran(p(T )), which shows that ran(p(T )) is T -invariant as
well. □
Lemma 2.105. Assume that V is a vector space and that T : V → V is linear.
Then
{0} ⊂ ker T ⊂ ker T 2 ⊂ ker T 3 ⊂ . . .
Proof. Let k ∈ N and let u ∈ ker T k . Then T k u = 0 and thus T k+1 u =
T (T k u) = T (0) = 0 as well, which shows that u ∈ ker T k+1 . This proves that
ker T k ⊂ ker T k+1 for all k. □
Lemma 2.106. Assume that V is a vector space and that T : V → V is linear.
Assume moreover that for some k ∈ N we have that ker T k = ker T k+1 . Then
ker T k+m = ker T k for all m ≥ 0.
Proof. We prove this result by induction over m. For m = 0, this result is
trivial.
Now assume that ker T k+m = ker T k . We have to show that also ker T k+m+1 =
ker T k . By Lemma 2.105 we know that ker T k ⊂ ker T k+m+1 . Thus we only have to
show the inclusion ker T k+m+1 ⊂ ker T k . Assume therefore that u ∈ ker T k+m+1 .
Then 0 = T k+m+1 u = T k+1 (T m u), and thus T m u ∈ ker T k+1 . Since ker T k+1 =
ker T k , we then obtain that T m u ∈ ker T k , which in turn shows that T k+m u =
T k (T m u) = 0. Thus u ∈ ker T k+m , which shows that ker T k+m+1 ⊂ ker T k+m . □
Lemma 2.107. Assume that V is a finite dimensional vector space and that
T : V → V is linear. Then
ker T dim V = ker T dim V +1 = ker T dim V +2 = ...
Proof. By Lemma 2.106 it is sufficient to show that the equality ker T dim V =
ker T dim V +1 holds. Assume thus that this is not the case, that is, that ker T dim V ⊊
ker T dim V +1 . Then Lemma 2.106 implies that ker T k ⊊ ker T k+1 for all k ≤ dim V .
Thus we have that dim(ker T k+1 ) ≥ dim(ker T k ) + 1 for all k ≤ dim V . This,
however, implies that dim(ker T dim V +1 ) ≥ dim V + 1, which is impossible. □
Proposition 2.108. Assume that V is a finite dimensional vector space and
T : V → V is linear. Then
V = (ker T dim V ) ⊕ (ran T dim V ).
Proof. We show first that (ker T dim V ) ∩ (ran T dim V ) = {0}. Assume to that
end that v ∈ (ker T dim V ) ∩ (ran T dim V ). Then we can write v = T dim V u for some
u ∈ V . Since v ∈ ker T dim V , it then follows that T 2 dim V u = T dim V (T dim V u) =
T dim V v = 0, that is, u ∈ ker T 2 dim V . Now we know from Lemma 2.107 that
ker T 2 dim V = ker T dim V , and thus we actually have that u ∈ ker T dim V . This,
however, implies that v = T dim V u = 0.
We now have shown that the sum of the spaces ker T dim V and ran T dim V is
direct and thus
dim(ker T dim V ) + dim(ran T dim V ) = dim((ker T dim V ) ⊕ (ran T dim V )).


Next we know from Theorem 2.80 that


dim V = dim(ker T dim V ) + dim(ran T dim V ).
Thus (ker T dim V ) ⊕ (ran T dim V ) is a subspace of V of the same dimension as V .
This immediately implies that it is the whole space, which finishes the proof. □

Lemma 2.109. Let V be a complex finite dimensional vector space, T : V → V


linear, and λ ∈ K an eigenvalue of T . Then
G(λ, T ) = ker (T − λ Id)dim V .


Proof. Assume first that v is a generalised eigenvector of T for the eigenvalue λ. Then
there exists j ∈ N such that (T − λ Id)j v = 0, that is, v ∈ ker (T − λ Id)j . From
Lemma 2.107 we know that we may assume without loss of generality that j ≤ dim V .
Then, however, we can use Lemma 2.105 and obtain that v ∈ ker (T − λ Id)dim V . Since
ker (T − λ Id)dim V is a subspace of V , it follows that G(λ, T ) ⊂ ker (T − λ Id)dim V .
Conversely, every non-zero element of ker (T − λ Id)dim V is by definition a generalised
eigenvector of T for the eigenvalue λ. Thus ker (T − λ Id)dim V ⊂ G(λ, T ), which proves
the assertion. □

Linear Independence of Generalised Eigenvectors.
Theorem 2.110. Let V be a vector space and let T : V → V be linear. As-
sume moreover that λ1 , . . . , λm are distinct eigenvalues of T and that v1 , . . . , vm
are corresponding generalised eigenvectors. Then the family (v1 , . . . , vm ) is linearly
independent.
Proof. Assume that c1 , . . . , cm ∈ K are such that
(14) 0 = c1 v1 + . . . + cm vm .
For each j = 1, . . . , m, denote by kj the largest non-negative integer such that
(T − λj Id)kj vj ̸= 0. That is,
(T − λj Id)kj vj ̸= 0 but (T − λj Id)kj +1 vj = 0.
Such an integer exists for each j, as the vectors vj themselves are non-zero, but
(T − λj Id)k vj = 0 for some sufficiently large k.
Now denote
w := (T − λ1 Id)k1 v1 .
Then w ̸= 0, but
(T − λ1 Id)w = (T − λ1 Id)k1 +1 v1 = 0,
which shows that w is an eigenvector of T for the eigenvalue λ1 .
In particular, this implies that
(T − λj Id)w = T w − λj w = λ1 w − λj w = (λ1 − λj )w,
and thus also
(T − λj Id)k w = (λ1 − λj )k w.
Now consider the operator
S = (T − λ1 Id)k1 ◦ (T − λ2 Id)k2 +1 ◦ (T − λ3 Id)k3 +1 ◦ · · · ◦ (T − λm Id)km +1 .
Note that S is a polynomial of T , and thus we can rearrange the factors in any way
we want in order to evaluate S. In particular, we have
Sv1 = (T − λ2 Id)k2 +1 ◦ · · · ◦ (T − λm Id)km +1 ◦ (T − λ1 Id)k1 v1
= (T − λ2 Id)k2 +1 ◦ · · · ◦ (T − λm Id)km +1 w
= (λ1 − λ2 )k2 +1 · · · (λ1 − λm )km +1 w.
On the other hand, we obtain for j = 2, . . . , m that
Svj = (T − λ1 Id)k1 ◦ (T − λ2 Id)k2 +1 ◦ · · · ◦ (T − λj−1 Id)kj−1 +1 ◦
◦ (T − λj+1 Id)kj+1 +1 ◦ · · · ◦ (T − λm Id)km +1 ◦ (T − λj Id)kj +1 vj = 0.
Now apply the operator S to both sides of the equation (14). Then we obtain
that
0 = S(0) = c1 S(v1 ) + c2 S(v2 ) + . . . + cm S(vm )
= c1 (λ1 − λ2 )k2 +1 · · · (λ1 − λm )km +1 w.

Since w ̸= 0 and λ1 ̸= λj for j = 2, . . . , m, this is only possible if c1 = 0.


With a similar argumentation, we now obtain that also c2 = c3 = . . . = cm = 0,
which proves the linear independence of the family (v1 , . . . , vm ). □

Decomposition into Generalised Eigenspaces.


Theorem 2.111. Assume that V is a finite dimensional complex vector space
and that T : V → V is linear. Denote by λ1 , . . . , λm ∈ C the distinct eigenvalues of
T . Then
(15) V = G(λ1 , T ) ⊕ · · · ⊕ G(λm , T )
and each subspace G(λj , T ) is T -invariant.
In particular, there exists a basis of V such that the corresponding matrix A of
T has the blockdiagonal form
 
A = \begin{pmatrix} A_{11} & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & A_{mm} \end{pmatrix},
and each non-zero block Ajj is upper triangular with diagonal elements λj , that is,
 
A_{jj} = \begin{pmatrix} \lambda_j & * & \cdots & * \\ 0 & \lambda_j & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & \lambda_j \end{pmatrix}.

Proof. Since G(λj , T ) = ker (T − λj Id)dim V , the generalised eigenspaces
are the kernels of polynomials of T and thus T -invariant. Moreover, once we have
established that V is the direct sum of the generalised eigenspaces, we can find a
basis of each of the spaces G(λj , T ) for which the restriction of T to G(λj , T ) is upper
triangular. After concatenating these bases to a basis of the whole space V , we
then obtain from our previous considerations that the matrix A of T has the above
given form.
Thus it remains to establish the decomposition (15), which we will do by in-
duction over the dimension n = dim V of the space V .
For dim V = 1, the result trivially holds.
Assume now that we have shown (15) for every finite dimensional complex
vector space U of dimension dim U < n, and every linear transformation S : U → U .
Let λ1 ∈ C be an eigenvalue of T , the existence of which follows from The-
orem 2.87. Define moreover
U := ran (T − λ1 Id)n .


Then we can write


V = ker (T − λ1 Id)n ⊕ ran (T − λ1 Id)n = G(λ1 , T ) ⊕ U.
 

If U = {0}, we have the desired decomposition and are done. Assume therefore
that this is not the case.
Since U is the range of a polynomial of T it is T -invariant. Thus we can define
the operator S : U → U , u 7→ Su := T u. That is, S is the restriction of T to U ,
interpreted as a mapping from U to itself. We now note that the eigenvalues of
S are a subset of {λ2 , . . . , λm }. Indeed, if u ∈ U ⊂ V is an eigenvector of S for

an eigenvalue λ ∈ K, then we have that T u = Su = λu, and thus u is already an


eigenvector of T for the same eigenvalue. However, if Su = λ1 u, then

u ∈ ker(S − λ1 Id) = ker(T − λ1 Id) ∩ U


⊂ ker(T − λ1 Id)n ∩ U = G(λ1 , T ) ∩ U = {0}.
Thus λ1 cannot be an eigenvalue of S.
Since dim U = dim V −dim(G(λ1 , T )) < dim V = n, we can apply the induction
hypothesis to the transformation S : U → U . Thus we can write
(16) U = G(λ2 , S) ⊕ . . . ⊕ G(λm , S),
and consequently
V = G(λ1 , T ) ⊕ G(λ2 , S) ⊕ . . . ⊕ G(λm , S).
It is therefore sufficient to show that G(λj , S) = G(λj , T ) for all j = 2, . . . , m.
We note first that
G(λj , S) = {u ∈ U : (T − λj Id)dim U u = 0},
G(λj , T ) = {v ∈ V : (T − λj Id)dim V v = 0}.


Since U ⊂ V and dim U < dim V , we immediately obtain that G(λj , S) ⊂ G(λj , T )
for all j = 2, . . . , m.
For the converse inclusion assume that v ∈ G(λj , T ). Because of the decom-
position V = G(λ1 , T ) ⊕ U we can write
v = v1 + u
with v1 ∈ G(λ1 , T ) and u ∈ U . Next, the decomposition (16) implies that
u = u2 + . . . + um
with uj ∈ G(λj , S) for j = 2, . . . , m. Thus
v = v1 + u2 + . . . + um ,
which we can rewrite as
0 = v1 + u2 + . . . + uj−1 + (uj − v) + uj+1 + . . . + um .
Now note that v1 ∈ G(λ1 , T ), uℓ ∈ G(λℓ , S) ⊂ G(λℓ , T ) for ℓ = 2, . . . , m, ℓ ̸= j, and
also (uj − v) ∈ G(λj , T ) + G(λj , S) = G(λj , T ).
Thus we have written 0 as a sum of elements of the generalised eigenspaces
of T , which each consist of the generalised eigenvectors of T for the corresponding
eigenvalue together with the vector 0. Since generalised eigenvectors of T for differ-
ent eigenvalues are linearly independent, they cannot occur on the right hand side
of this sum. Thus we have to conclude that all the terms on the right hand side are
equal to 0, and in particular that (uj − v) = 0. This shows that v = uj ∈ G(λj , S).
Since v ∈ G(λj , T ) was arbitrary, this proves that G(λj , T ) ⊂ G(λj , S), which
concludes the proof. □

10. Characteristic and Minimal Polynomial


Definition 2.112 (Multiplicities of eigenvalues). Assume that V is a finite
dimensional vector space, that T : V → V is linear, and that λ ∈ K is an eigenvalue
of T .
• The algebraic multiplicity of λ is defined as dim(G(λ, T )).
• The geometric multiplicity of λ is defined as dim(E(λ, T )).


Remark 2.113. Assume that V is a finite dimensional complex vector space


and T : V → V is linear. We have just shown that V can be decomposed into
the direct sum of the generalised eigenspaces of T . In particular, this shows that
the sum of the algebraic multiplicities of all the eigenvalues of T is equal to the
dimension of V .
In general, though, the geometric multiplicities may be smaller than the algeb-
raic multiplicities, and their sum may be smaller than the dimension of V . More
precisely, we obtain that the geometric multiplicities of all the eigenvalues sum up
to the dimension of V , if and only if T is diagonalisable. ■

Definition 2.114. Assume that V is a finite dimensional complex vector space


and that T : V → V is linear. Denote by λ1 , . . . , λm the distinct eigenvalues of T ,
and let d1 , . . . , dm be their respective algebraic multiplicities. By
χ(z) := (z − λ1 )d1 · · · (z − λm )dm
we denote the characteristic polynomial of T . ■

Remark 2.115. This definition of the characteristic polynomial turns out to


be the same as the one using determinants, which you have seen in previous maths
courses. That is, one can show that
χ(z) = det(z Id −T ).

Remark 2.116. Assume that the operator T : V → V has for some basis an
upper triangular matrix with not necessarily distinct diagonal entries µ1 , µ2 , . . . , µn
(with n = dim V ). Then the characteristic polynomial of T is
χ(z) = (z − µ1 )(z − µ2 ) · · · (z − µn ).
Note here, though, that the same linear factor may repeat several times. ■

Theorem 2.117 (Cayley–Hamilton). Assume that V is a finite dimensional


complex vector space and that T : V → V is linear, and denote by χ the character-
istic polynomial of T . Then
χ(T ) = 0.
Proof. Write
χ(z) := (z − λ1 )d1 · · · (z − λm )dm ,
where λ1 , . . . , λm are the distinct eigenvalues of T and d1 , . . . , dm the corresponding
algebraic multiplicities.
We show first that the restriction of χ(T ) to each of the generalised eigen-
spaces of T is equal to 0. Since G(λj , T ) is T -invariant, we can interpret the
restriction of T to G(λj , T ) as a transformation of the space G(λj , T ). Now we
have defined the generalised eigenspace as G(λj , T ) = ker(T − λj Id)dim V , and thus
(T − λj Id)dim V v = 0 for every v ∈ G(λj , T ). In other words, the restriction of
T − λj Id to G(λj , T ) is nilpotent. This, however, implies that we already have
that (T − λj Id)dim(G(λj ,T )) v = 0 for all v ∈ G(λj , T ). Since dim G(λj , T ) = dj ,
this shows that (T − λj Id)dj v = 0 for all v ∈ G(λj , T ). Now the definition of the
characteristic polynomial (and the fact that we can freely switch the order of the
linear factors) shows that χ(T )v = 0 for all v ∈ G(λj , T ) and all j = 1, . . . , m.
Now let v ∈ V be arbitrary. Since V is the direct sum of its generalised
eigenspaces, we can write
v = v1 + . . . + vm with vj ∈ G(λj , T ) for j = 1, . . . , m.
Thus
χ(T )v = χ(T )v1 + . . . + χ(T )vm = 0.

Since this holds for all v ∈ V , it follows that χ(T ) = 0. □
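The Cayley–Hamilton theorem is easy to test for a concrete matrix. The sketch below (assuming NumPy; np.poly returns the coefficients of det(z Id − A), highest degree first) evaluates the characteristic polynomial on the matrix itself with a Horner scheme; it is an illustration only.

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [3.0, 0.0, 2.0]])

    coeffs = np.poly(A)                  # coefficients of chi(z) = det(z*Id - A)
    chi_of_A = np.zeros_like(A)
    for c in coeffs:                     # Horner evaluation of chi(A)
        chi_of_A = chi_of_A @ A + c * np.eye(3)

    print(np.allclose(chi_of_A, 0.0))    # True: chi(A) = 0 up to rounding errors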


Definition 2.118. A polynomial p is called monic, if its leading order coeffi-
cient is equal to 1. That is, a monic polynomial is of the form
p(z) = z m + cm−1 z m−1 + cm−2 z m−2 + . . . + c1 z + c0 .

Lemma 2.119. Assume that V is a finite dimensional complex vector space and
that T : V → V is linear. There exists a unique monic polynomial p of minimal
degree such that p(T ) = 0.
Proof. From Theorem 2.117 we know that there exists a polynomial such that
p(T ) = 0 (namely the characteristic polynomial). Thus there also exists a monic
polynomial of minimal degree with this property. It remains to show that this is
unique.
Assume therefore to the contrary that p and q are monic polynomials of minimal
degree such that p(T ) = q(T ) = 0, and that p ̸= q. In particular, deg p = deg q,
and thus we can write
p(z) = z m + am−1 z m−1 + . . . + a1 z + a0 ,
q(z) = z m + bm−1 z m−1 + . . . + b1 z + b0 ,
for some coefficients aj , bj ∈ K. Let k be the largest index such that ak ̸= bk (such
an index exists, as p ̸= q). Now define the polynomial r(z) = (p(z)−q(z))/(ak −bk ).
Then r(T ) = (p(T ) − q(T ))/(ak − bk ) = 0 and r is a monic polynomial of degree k < m, which
is a contradiction to the definitions of p and q. □
Definition 2.120. Assume that V is a finite dimensional complex vector space
and that T : V → V is linear. The unique monic polynomial p of minimal degree
satisfying p(T ) = 0 is called the minimal polynomial of T . ■

Example 2.121. Assume that T : C3 → C3 is the linear transformation given


by the matrix  
A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}
with respect to the standard basis on C3 . Since A is triangular, we can immediately
read off the characteristic polynomial of T and obtain
χ(z) = (z − 2)3 .
It turns out, however, that we already have that (A − 2 Id)2 = 0. Thus in this case
the minimal polynomial is different from the characteristic polynomial. In fact, one
can show that the minimal polynomial of T is p(z) = (z − 2)2 . ■
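The claim of Example 2.121 can be checked by a direct computation; the following sketch (assuming SymPy for exact arithmetic; purely illustrative) verifies that A − 2 Id is non-zero while (A − 2 Id)2 vanishes, so that the minimal polynomial is indeed (z − 2)2.

    import sympy as sp

    A = sp.Matrix([[2, 1, 0],
                   [0, 2, 0],
                   [0, 0, 2]])
    I = sp.eye(3)

    print(sp.factor(A.charpoly().as_expr()))   # (lambda - 2)**3, the characteristic polynomial
    print(A - 2 * I == sp.zeros(3, 3))         # False: (z - 2) does not annihilate A
    print((A - 2 * I)**2 == sp.zeros(3, 3))    # True: (z - 2)**2 annihilates A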

Next we establish some results about the relation between the minimal poly-
nomial and the characteristic polynomial.
Proposition 2.122. Assume that V is a finite dimensional complex vector
space and that T : V → V is linear, and denote by p(z) the minimal polynomial of
T . Assume moreover that q(z) is another polynomial satisfying q(T ) = 0. Then
q is a polynomial multiple of p. That is, there exists a polynomial s such that
q(z) = s(z)p(z).
Proof. By performing polynomial division with remainder, we can write
q(z) = s(z)p(z) + r(z),
where r and s are polynomials, and deg r < deg p. Now, since q(T ) = 0 and
p(T ) = 0, we obtain that also r(T ) = 0. However, p is the minimal polynomial of

T and deg r < deg p, which is only possible if r(z) = 0 is the 0-polynomial. Thus
we obtain the desired result. □
Proposition 2.123. Assume that V is a finite dimensional complex vector
space and that T : V → V is linear. Then the zeroes of the minimal polynomial p
of T are precisely the eigenvalues of T . Moreover, p is of the form
p(z) = (z − λ1 )s1 (z − λ2 )s2 · · · (z − λm )sm
where λ1 , . . . , λm are the eigenvalues of T , and 1 ≤ sj ≤ dj for all j, with dj being
the algebraic multiplicity of the eigenvalue λj .
Proof. We know already that the characteristic polynomial is a polynomial
multiple of p, which shows that p is of the form
p(z) = (z − λ1 )s1 (z − λ2 )s2 · · · (z − λm )sm
with 0 ≤ sj ≤ dj . It remains to show that sj ≥ 1 for all j. Assume therefore to
the contrary that sj = 0 for some j, that is, that the corresponding linear factor
does not appear in the minimal polynomial. Let now vj be an eigenvector for the
eigenvalue λj . Then a simple calculation (similar to those we needed for the linear
independence of generalised eigenspaces) shows that
p(T )vj = (λj − λ1 )s1 · · · (λj − λj−1 )sj−1 (λj − λj+1 )sj+1 · · · (λj − λm )sm vj .
Since vj ̸= 0 and λk − λj ̸= 0 for k ̸= j, this shows that p(T )vj ̸= 0, which
contradicts the assumption that p(T ) = 0. Therefore sj ≥ 1 for all j, which proves
the claim. □

11. Jordan’s Normal Form


We will now discuss a further refinement of the decomposition shown in The-
orem 2.111, which describes the, in a sense, simplest possibility of representing an
arbitrary linear transformation of a finite dimensional complex vector space.
Theorem 2.124. Assume that V is a finite dimensional complex vector space
and that T : V → V is linear. Then there exists a basis of V such that the matrix
representation A of T has the blockdiagonal form
 
A = \begin{pmatrix} A_{11} & 0 & \cdots & 0 \\ 0 & A_{22} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & A_{mm} \end{pmatrix},
where each block Ajj itself has a block-diagonal form
A_{jj} = \begin{pmatrix} J_j^{(1)} & 0 & \cdots & 0 \\ 0 & J_j^{(2)} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & J_j^{(k_j)} \end{pmatrix},
and the sub-blocks are upper triangular matrices of the form
 
J_j^{(\ell)} = \begin{pmatrix} \lambda_j & 1 & 0 & \cdots & 0 \\ 0 & \lambda_j & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & \ddots & 1 \\ 0 & \cdots & \cdots & 0 & \lambda_j \end{pmatrix}

(or possibly J_j^{(\ell)} = (λj ) for sub-blocks of size 1), where λ1 , . . . , λm are the distinct
eigenvalues of T . Such a basis is called a Jordan basis of T , and the sub-blocks J_j^{(\ell)}
are called Jordan blocks of T .
Moreover the following properties are satisfied:
• The blocks Ajj are (dj ×dj )-matrices, where dj is the algebraic multiplicity
of the eigenvalue λj .
• The number kj of different blocks for the eigenvalue λj is the geometric
multiplicity of λj .
• The minimal polynomial p of T has the form
p(z) = (z − λ1 )s1 (z − λ2 )s2 · · · (z − λm )sm ,
where sj is the size of the largest Jordan block J_j^{(\ell)} for the eigenvalue λj .
• Apart from the order, the Jordan blocks are unique.
Example 2.125. Assume that the operator T : C4 → C4 has the only eigenvalue
λ = 2. Then we have the following possibilities for the Jordan normal form of T :
(1) A decomposition in a single Jordan block of size 4:
 
A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.
Here the geometric multiplicity of the eigenvalue 2 is 1, and the minimal
polynomial of T is p(z) = (z − 2)4 .
(2) A decomposition in a Jordan block of size 3 and a block of size 1:
 
A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}.
Here the geometric multiplicity of the eigenvalue 2 is 2, and the minimal
polynomial of T is p(z) = (z − 2)3 .
(3) A decomposition in two Jordan blocks of size 2:
 
A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.
Here the geometric multiplicity of the eigenvalue 2 is 2, and the minimal
polynomial of T is p(z) = (z − 2)2 .
(4) A decomposition in one Jordan block of size 2 and two blocks of size 1:
 
A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}.
Here the geometric multiplicity of the eigenvalue 2 is 3, and the minimal
polynomial of T is p(z) = (z − 2)2 .
(5) A decomposition in four Jordan blocks of size 1:
 
A = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}.

Here the geometric multiplicity of the eigenvalue 2 is 4, and the minimal


polynomial of T is p(z) = (z − 2).
The algebraic multiplicity of 2 is in all cases equal to 4, and the characteristic
polynomial is always χ(z) = (z − 2)4 . ■
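For small matrices with exactly representable entries, the Jordan normal form can be computed with a computer algebra system. The sketch below (assuming SymPy; Matrix.jordan_form returns matrices P and J with A = P J P^{-1}) reconstructs case (3) of Example 2.125 from a matrix that is not already in normal form; the conjugating matrix P used here is an arbitrary illustrative choice.

    import sympy as sp

    # Jordan form with two blocks of size 2 for the eigenvalue 2, conjugated by some
    # invertible matrix P so that A is not in normal form to begin with.
    J = sp.Matrix([[2, 1, 0, 0],
                   [0, 2, 0, 0],
                   [0, 0, 2, 1],
                   [0, 0, 0, 2]])
    P = sp.Matrix([[1, 1, 0, 2],
                   [0, 1, 1, 0],
                   [1, 0, 1, 1],
                   [0, 0, 0, 1]])
    A = P * J * P.inv()

    P2, J2 = A.jordan_form()          # A = P2 * J2 * P2^{-1}
    print(J2)                         # two Jordan blocks of size 2 with diagonal entry 2
    print(A == P2 * J2 * P2.inv())    # True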
CHAPTER 3

Inner Product Spaces and Singular Value Decomposition

1. Inner Products
Recall that the Euclidean (or standard) inner product on Rn is given as
\langle u, v\rangle = \sum_{i=1}^n u_i v_i,
and that the corresponding Euclidean norm is
\|u\| := \Bigl( \sum_{i=1}^n u_i^2 \Bigr)^{1/2}.
The intuition behind these notions is that ∥u∥ is the length of the vector
u ∈ Rn , whereas the inner product ⟨u, v⟩ is related to the angle between the vectors
u and v. More precisely, we have
⟨u, v⟩ = ∥u∥ ∥v∥ cos(α),
where α is the angle between the vectors u and v.
In the following, we will develop a generalisation of this notion of inner products
for arbitrary real or complex vector spaces.
Definition 3.1. Let V be a vector space. An inner product on V is a mapping
⟨·, ·⟩ : V × V → K with the following properties:
(1) Linearity in the first component:
⟨λu + µv, w⟩ = λ⟨u, w⟩ + µ⟨v, w⟩
for all u, v, w ∈ V and λ, µ ∈ K.
(2) Conjugate symmetry:
\langle u, v\rangle = \overline{\langle v, u\rangle}
for all u, v ∈ V .
(3) Positive definiteness:
⟨v, v⟩ ∈ R≥0
for all v ∈ V . Moreover ⟨v, v⟩ = 0 if and only if v = 0.
A vector space V with an inner product is called an inner product space.1 ■

Example 3.2. Consider the following examples of inner product spaces:


(1) The space Cn is an inner product space with the inner product
\langle u, v\rangle := \sum_{i=1}^n u_i \overline{v_i},
which we can also write as
⟨u, v⟩ = v H u,
1Sometimes one also uses the name pre-Hilbert space for an inner product space.


where
v^H := (\overline{v})^T
is the Hermitian conjugate (or: “conjugate transpose”) of v.
Here the positive definiteness follows from the fact that
\langle v, v\rangle = \sum_{i=1}^n v_i \overline{v_i} = \sum_{i=1}^n |v_i|^2,
which is non-negative (and real) for all v ∈ Cn , and equal to zero if and
only if v = 0.
(2) We can make the space C([0, 1], C) of complex-valued continuous functions
on [0, 1] into an inner product space by defining
\langle u, v\rangle := \int_0^1 u(x)\overline{v(x)}\, dx
for u, v ∈ C([0, 1], C).
We obtain here the positive definiteness by writing
\langle u, u\rangle = \int_0^1 |u(x)|^2\, dx,
which obviously is real and non-negative. Moreover, since the integrand
|u(x)|2 is a continuous and non-negative function, the integral is equal to
zero if and only if |u(x)| = 0 for all x, which is the case if and only if
u = 0.

Lemma 3.3. Assume that V is an inner product space with inner product ⟨·, ·⟩.
Then we have for all u, v, w ∈ V and λ, µ ∈ K that
\langle u, \lambda v + \mu w\rangle = \overline{\lambda}\, \langle u, v\rangle + \overline{\mu}\, \langle u, w\rangle.
That is, ⟨·, ·⟩ is conjugate linear in the second component.
Proof. We can write
\langle u, \lambda v + \mu w\rangle = \overline{\langle \lambda v + \mu w, u\rangle} = \overline{\lambda \langle v, u\rangle + \mu \langle w, u\rangle}
= \overline{\lambda}\, \overline{\langle v, u\rangle} + \overline{\mu}\, \overline{\langle w, u\rangle} = \overline{\lambda}\, \langle u, v\rangle + \overline{\mu}\, \langle u, w\rangle. □

Remark 3.4. It is also somewhat common, especially in physics, to define
inner products slightly differently by requiring linearity in the second component
and conjugate linearity in the first component. Then one would for instance write
the inner product on Cn as ⟨u, v⟩ = uH v. From a theoretical point of view, this
makes no real difference, but it can be pretty annoying in practice, as all concrete
formulas look slightly different. ■

Remark 3.5. The linearity of an inner product in the first component to-
gether with the conjugate linearity in the second component is sometimes called
sesquilinearity from the Latin word sesqui meaning one-and-a-half. ■

Definition 3.6. Let V be an inner product space with inner product ⟨·, ·⟩.
The associated norm on V is the mapping ∥·∥ : V → R≥0 ,
\|v\| := \langle v, v\rangle^{1/2}.

Lemma 3.7. The norm on an inner product space V has the following proper-
ties:

(1) Positivity: ∥v∥ ≥ 0 for all v ∈ V , and ∥v∥ = 0 if and only if v = 0.


(2) Positive homogeneity: ∥λv∥ = |λ|∥v∥ for all v ∈ V and λ ∈ K.
Proof. The positivity follows immediately from the positive definiteness of
the inner product. For the positive homogeneity we note that
\|\lambda v\|^2 = \langle \lambda v, \lambda v\rangle = \lambda \overline{\lambda}\, \langle v, v\rangle = |\lambda|^2 \|v\|^2.

Definition 3.8. Let V be an inner product space with inner product ⟨·, ·⟩, and
let u, v ∈ V . We say that u and v are orthogonal, if ⟨u, v⟩ = 0. ■

Theorem 3.9 (Pythagoras). Let V be an inner product space, and let u, v ∈ V


be orthogonal. Then
∥u + v∥2 = ∥u∥2 + ∥v∥2 .
Proof. We have that
∥u + v∥2 = ⟨u + v, u + v⟩ = ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩
= ⟨u, u⟩ + ⟨v, v⟩ = ∥u∥2 + ∥v∥2 .

Remark 3.10. In an inner product space we can also write
∥u + v∥2 = ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩ = ∥u∥2 + ∥v∥2 + 2ℜ(⟨u, v⟩),
where ℜ(z) denotes the real part of z ∈ C. Thus Pythagoras’ Theorem actually
holds more generally for vectors u, v for which the inner product ⟨u, v⟩ is purely
imaginary. ■

Theorem 3.11 (Cauchy–Schwarz–Bunyakovsky). Assume that V is an inner


product space and that u, v ∈ V . Then
|⟨u, v⟩| ≤ ∥u∥∥v∥,
and equality holds if and only if u and v are linearly dependent.
Proof. If v = 0, then the result holds (with equality, and u and v are linearly
dependent). We may therefore assume without loss of generality that v ̸= 0.
Define now
w := u - \frac{\langle u, v\rangle}{\|v\|^2}\, v.
Then
\langle w, v\rangle = \langle u, v\rangle - \frac{\langle u, v\rangle}{\|v\|^2}\, \langle v, v\rangle = 0,
and thus
\|u\|^2 = \Bigl\| w + \frac{\langle u, v\rangle}{\|v\|^2}\, v \Bigr\|^2 = \|w\|^2 + \frac{|\langle u, v\rangle|^2}{\|v\|^4}\, \|v\|^2 = \|w\|^2 + \frac{|\langle u, v\rangle|^2}{\|v\|^2}.
Thus
∥u∥2 ∥v∥2 = ∥v∥2 ∥w∥2 + |⟨u, v⟩|2 ≥ |⟨u, v⟩|2 ,
and we have equality if and only if w = 0. In the latter case, however, we have that
u = \frac{\langle u, v\rangle}{\|v\|^2}\, v and thus u and v are linearly dependent.
It remains to show that the linear dependence of u and v implies that |⟨u, v⟩| =
∥u∥∥v∥. Assume therefore that u and v are linearly dependent. If v = 0, then the
claimed equality holds. Thus we may assume without loss of generality that v ̸= 0
and that u = λv for some λ ∈ K. Then
|⟨u, v⟩| = |⟨λv, v⟩| = |λ||⟨v, v⟩| = |λ|∥v∥2 = ∥λv∥∥v∥ = ∥u∥∥v∥,
which concludes the proof. □

Theorem 3.12 (Triangle inequality). Assume that V is an inner product space


and u, v ∈ V . Then
∥u + v∥ ≤ ∥u∥ + ∥v∥,
and we have equality, if and only if either v = 0 or u = λv for some λ ∈ R≥0 .
Proof. We can write
∥u + v∥2 = ∥u∥2 + ∥v∥2 + 2ℜ(⟨u, v⟩) ≤ ∥u∥2 + ∥v∥2 + 2|⟨u, v⟩|
≤ ∥u∥2 + ∥v∥2 + 2∥u∥∥v∥ = (∥u∥ + ∥v∥)2 .
This shows that the inequality ∥u + v∥ ≤ ∥u∥ + ∥v∥ holds.
Moreover, we have an equality, if and only if we have an equality in all the
steps of our estimate above. The third step was the Cauchy–Schwarz–Bunyakovsky
inequality, where we have an equality if and only if u and v are linearly dependent,
that is, if either v = 0 or u = λv for some λ ∈ K. For the second step, we have an
equality if and only if ℜ(⟨u, v⟩) = |⟨u, v⟩|. If v = 0, this trivially holds; if u = λv
for some λ ∈ K, we need the additional condition that λ ∈ R≥0 , which proves the
assertion. □

2. Bases of Inner Product Spaces


Definition 3.13. Let V be an inner product space. A family (v1 , . . . , vm ) ⊂ V
of vectors is called orthonormal, if ∥vi ∥ = 1 for all i = 1, . . . , m, and ⟨vi , vj ⟩ = 0 for
all i, j = 1, . . . , m with i ̸= j. ■

Lemma 3.14. Assume that V is an inner product space and that the family
(v1 , . . . , vm ) ⊂ V is orthonormal. Then (v1 , . . . , vm ) is linearly independent.
Proof. Assume that c1 , . . . , cm ∈ K are such that
0 = c1 v1 + . . . + cm vm .
A repeated application of Pythagoras’ Theorem implies that
0 = ∥c1 v1 + . . . + cm vm ∥2 = ∥c1 v1 ∥2 + . . . + ∥cm vm ∥2
= |c1 |2 ∥v1 ∥2 + . . . + |cm |2 ∥vm ∥2 = |c1 |2 + . . . + |cm |2 .
Thus c1 = c2 = . . . = cm = 0, which proves the linear independence of the set
{v1 , . . . , vm }. □
Definition 3.15. Let V be an inner product space. An orthonormal basis of
V is an orthonormal family that is a basis of V . ■

Theorem 3.16 (Gram–Schmidt Orthogonalisation). Assume that V is a vector


space and that S = (v1 , v2 , . . .) ⊂ V is a linearly independent family in V . Then
we can construct a family E = (e1 , e2 , . . .) as follows:
• Start with setting
e_1 = \frac{v_1}{\|v_1\|}.
• Assume we have already constructed orthonormal vectors e1 , . . . , em . We
define first
fm+1 = vm+1 − ⟨vm+1 , e1 ⟩e1 − ⟨vm+1 , e2 ⟩e2 − . . . − ⟨vm+1 , em ⟩em
and then set
e_{m+1} = \frac{f_{m+1}}{\|f_{m+1}\|}.
Then (e1 , e2 , . . .) is a well-defined orthonormal family. Moreover, we have that
span{v1 , v2 , . . . , vm } = span{e1 , e2 , . . . , em } for all m.

Proof. We show by induction over m that the family (e1 , e2 , . . . , em ) is well-


defined and orthonormal, and satisfies span{v1 , . . . , vm } = span{e1 , . . . , em }.
For m = 1, this is obvious (as v1 ̸= 0).
Now assume that we have shown the result for m. Then we have for all 1 ≤
j ≤ m that
⟨fm+1 , ej ⟩ = ⟨vm+1 , ej ⟩ − ⟨vm+1 , e1 ⟩⟨e1 , ej ⟩ − . . . − ⟨vm+1 , em ⟩⟨em , ej ⟩
= ⟨vm+1 , ej ⟩ − ⟨vm+1 , ej ⟩⟨ej , ej ⟩ = 0.
Here we have used that ⟨ei , ej ⟩ = 0 for i ̸= j and ⟨ej , ej ⟩ = 1.
In order to show that em+1 is well-defined, we need to show that fm+1 ̸= 0.
Assume therefore to the contrary that fm+1 = 0. Then we can write
vm+1 = ⟨vm+1 , e1 ⟩e1 + ⟨vm+1 , e2 ⟩e2 + . . . + ⟨vm+1 , em ⟩em
∈ span{e1 , . . . , em } = span{v1 , . . . , vm },
which contradicts the linear independence of the family (v1 , . . . , vm , vm+1 ). Thus
fm+1 ̸= 0, and em+1 is well-defined and satisfies ∥em+1 ∥ = 1. Therefore the family
(e1 , . . . , em , em+1 ) is orthonormal.
Finally, by construction every vector ej can be written as a linear combination of
the vectors v1 , . . . , vj , and thus span{e1 , . . . , em , em+1 } ⊂ span{v1 , . . . , vm , vm+1 }.
Since those spaces have the same dimension (namely m + 1) they have to be equal.
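The construction of Theorem 3.16 translates directly into code. The following sketch is illustrative only; it assumes NumPy and works with the standard inner product ⟨u, v⟩ = v^H u on C^n (which is what np.vdot(v, u) computes). For a different inner product space one only needs to replace the function inner.

    import numpy as np

    def inner(u, v):
        """Standard inner product on C^n, linear in the first argument: <u, v> = v^H u."""
        return np.vdot(v, u)               # np.vdot conjugates its first argument

    def gram_schmidt(vectors):
        """Orthonormalise a linearly independent family as in Theorem 3.16."""
        basis = []
        for v in vectors:
            f = v - sum(inner(v, e) * e for e in basis)   # subtract the projections
            basis.append(f / np.sqrt(inner(f, f).real))   # f != 0 by linear independence
        return basis

    vs = [np.array([1.0, 1.0, 0.0]),
          np.array([1.0, 0.0, 1.0]),
          np.array([0.0, 1.0, 1.0])]
    es = gram_schmidt(vs)
    G = np.array([[inner(ei, ej) for ej in es] for ei in es])
    print(np.allclose(G, np.eye(3)))        # True: the resulting family is orthonormal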

Corollary 3.17. Every finite dimensional inner product space has an or-
thonormal basis.
Proof. Start with any basis of the vector space, and apply Gram–Schmidt
orthogonalisation. □
Lemma 3.18. Assume that V is a finite dimensional inner product space with
orthonormal basis (v1 , . . . , vn ). Assume that u, w ∈ V have the coordinates x =
(x1 , . . . , xn )T ∈ Kn and y = (y1 , . . . , yn )T ∈ Kn , respectively, that is,
u = x 1 v1 + . . . + x n vn , w = y1 v1 + . . . + yn vn .
Then
\langle u, w\rangle = \sum_{i=1}^n x_i \overline{y_i} = y^H x.
i=1

Proof. We have
\langle u, w\rangle = \Bigl\langle \sum_{i=1}^n x_i v_i , \sum_{j=1}^n y_j v_j \Bigr\rangle = \sum_{i=1}^n \sum_{j=1}^n x_i \overline{y_j}\, \langle v_i , v_j\rangle = \sum_{i=1}^n x_i \overline{y_i} . □


Definition 3.19. Assume that V is a finite dimensional inner product space
and let (v1 , . . . , vn ) be a family of vectors in V . The Gram matrix for this family
is the matrix G ∈ Matn (K) with entries Gij = ⟨vj , vi ⟩. That is,
 
G := \begin{pmatrix} \langle v_1, v_1\rangle & \langle v_2, v_1\rangle & \cdots & \langle v_n, v_1\rangle \\ \langle v_1, v_2\rangle & \langle v_2, v_2\rangle & \cdots & \langle v_n, v_2\rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle v_1, v_n\rangle & \langle v_2, v_n\rangle & \cdots & \langle v_n, v_n\rangle \end{pmatrix}.

Remark 3.20. We see immediately that the Gram matrix of a family of vectors
is the identity matrix, if and only if the family is orthonormal. ■

Proposition 3.21. Assume that V is a finite dimensional inner product space,


that (v1 , . . . , vn ) is a basis of V , and that G is the corresponding Gram matrix.
Assume that u, w ∈ V have the coordinates x = (x1 , . . . , xn )T ∈ Kn and y =
(y1 , . . . , yn )T ∈ Kn , respectively. Then
⟨u, w⟩ = y H Gx.
Proof. We have
\langle u, w\rangle = \sum_{i=1}^n \sum_{j=1}^n x_i \overline{y_j}\, \langle v_i , v_j\rangle = \sum_{j=1}^n \sum_{i=1}^n \overline{y_j}\, G_{ji}\, x_i = y^H G x. □


Definition 3.22. A matrix A ∈ Matn (K) is called Hermitian, if A^H = A,
that is, if a_{ij} = \overline{a_{ji}} for all i, j = 1, . . . , n. ■

Definition 3.23. A matrix A ∈ Matn (K) is called positive semi-definite, if


xH Ax ∈ R≥0
for all x ∈ Kn . It is called positive definite, if
xH Ax ∈ R>0
for all x ∈ Kn with x ̸= 0. ■

Lemma 3.24. Assume that V is a finite dimensional inner product space and
that (v1 , . . . , vn ) is a family in V , and denote by G the corresponding Gram matrix.
Then G is Hermitian and positive semi-definite. Moreover, G is positive definite,
if and only if the family (v1 , . . . , vn ) is linearly independent.
Proof. Since
G_{ij} = \langle v_j , v_i\rangle = \overline{\langle v_i , v_j\rangle} = \overline{G_{ji}},
the matrix G is Hermitian.
Now let x ∈ Kn and let v = \sum_{i=1}^n x_i v_i be the vector with coordinates (x1 , . . . , xn ).
Then
xH Gx = ⟨v, v⟩ = ∥v∥2 ≥ 0,
which proves that G is positive semi-definite.
Now assume that (v1 , . . . , vn ) is linearly independent, let x = (x1 , . . . , xn ) ∈
Kn \ {0} and let v = \sum_{i=1}^n x_i v_i . Since x ̸= 0 and (v1 , . . . , vn ) is linearly independent,
it follows that v ̸= 0. Thus the same computation as above shows that xH Gx =
∥v∥2 > 0. This proves that G is positive definite.
Now assume conversely that G is positive definite, and let (x1 , . . . , xn ) ∈ Kn
be such that
\sum_{i=1}^n x_i v_i = 0.
Then
0 = ∥0∥2 = xH Gx.
Thus the positive definiteness of G implies that x = 0. This proves that (v1 , . . . , vn )
is linearly independent. □
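Lemma 3.24 can also be observed numerically. The sketch below (assuming NumPy and the standard inner product on K^n; the helper gram is not a library function) builds the Gram matrix of a linearly independent family and of a dependent one and compares their definiteness.

    import numpy as np

    def gram(vectors):
        """Gram matrix with entries G_ij = <v_j, v_i> = v_i^H v_j (standard inner product)."""
        return np.array([[np.vdot(vi, vj) for vj in vectors] for vi in vectors])

    independent = [np.array([1.0, 0.0, 1.0]),
                   np.array([0.0, 1.0, 1.0])]
    dependent = independent + [independent[0] + independent[1]]   # third vector is a sum

    G1 = gram(independent)
    G2 = gram(dependent)

    print(np.allclose(G1, G1.conj().T))       # True: G is Hermitian
    print(np.linalg.eigvalsh(G1).min() > 0)   # True: positive definite for an independent family
    print(abs(np.linalg.det(G2)) < 1e-12)     # True: singular, hence not positive definite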
Remark 3.25. Conversely, assume that (v1 , . . . , vn ) is a basis of V and G ∈
Matn (K) is positive definite and Hermitian. Then we can define an inner product
on V by setting
⟨u, v⟩G := y H Gx,
if x, y ∈ Kn are the coordinates of the vectors u, v ∈ V . (It is straight-forward to
show that this is indeed an inner product.) Thus we have a one-to-one correspond-
ence between positive definite Hermitian matrices and inner products. ■

Proposition 3.26. Assume that V is a finite dimensional inner product space


and that (v1 , . . . , vn ) is a basis of V , denote by G the corresponding Gram matrix
and by G−1 its inverse, and let v ∈ V . Then the coordinates x1 , . . . , xn of v are
given as
x_i = \sum_{j=1}^n (G^{-1})_{ij}\, \langle v, v_j\rangle .

Proof. Let x1 , . . . , xn be the coordinates of v with respect to (v1 , . . . , vn ).


Denote moreover yj := ⟨v, vj ⟩ for j = 1, . . . , n. Then
y_j = \langle v, v_j\rangle = \sum_{i=1}^n x_i \langle v_i , v_j\rangle = \sum_{i=1}^n x_i G_{ji} .

Thus the vectors y = (y1 , . . . , yn ) and x = (x1 , . . . , xn ) satisfy the equation


y = Gx.
Solving this equation for x yields the assertion. □
Corollary 3.27. Assume that V is a finite dimensional inner product space
and that (e1 , . . . , en ) is an orthonormal basis of V . Then
v = \sum_{i=1}^n \langle v, e_i\rangle\, e_i

for all v ∈ V . In particular, the coordinate vector of v is (⟨v, e1 ⟩, . . . , ⟨v, en ⟩).

3. Orthogonal Complements and Projections


Definition 3.28. Assume that V is an inner product space and U ⊂ V is a
subset. We denote by
U ⊥ := {v ∈ V : ⟨u, v⟩ = 0 for all u ∈ U }


the orthogonal complement of U . ■

Lemma 3.29. Let V be an inner product space and U ⊂ V a subset. Then U ⊥


is a subspace of V and U ∩ U ⊥ = {0}.
Proof. We note first that 0 ∈ U ⊥ , since ⟨u, 0⟩ = 0 for all u ∈ V . Now assume
that v, w ∈ U ⊥ . Then we have for all u ∈ U that
⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩ = 0 + 0 = 0,
and thus v + w ∈ U ⊥ . Similarly, if v ∈ U ⊥ and λ ∈ K, then

\langle u, \lambda v\rangle = \overline{\lambda}\, \langle u, v\rangle = 0


for all u ∈ U and therefore λv ∈ U ⊥ . This shows that U ⊥ is a subspace.
Finally assume that u ∈ U ∩ U ⊥ . Then we have in particular that ⟨u, u⟩ = 0,
and thus u = 0. □
Lemma 3.30. Let V be an inner product space and U , W be subsets of V
satisfying U ⊂ W . Then W ⊥ ⊂ U ⊥ .
Proof. Assume that v ∈ W ⊥ . Then ⟨w, v⟩ = 0 for all w ∈ W . Since U ⊂ W ,
this implies that, in particular, ⟨u, v⟩ = 0 for all u ∈ U , which in turn shows that
v ∈ U ⊥. □
Proposition 3.31. Assume that V is a finite dimensional inner product space
and that U ⊂ V is a subspace. Then
V = U ⊕ U ⊥.

Proof. Denote n = dim V and m = dim U . Let B = (u1 , . . . , um ) be an


orthonormal basis of U . Using the Gram–Schmidt algorithm, we can extend this to
an orthonormal basis of V . That is, we can find a family C = (v1 , . . . , vn−m ) such
that B ∪ C is an orthonormal basis of V . We claim that U ⊥ = span C.
For this, we note first that C ⊂ U ⊥ by construction (as every vector vj is
orthogonal to all the basis elements of U ). Since U ⊥ is a subspace, this shows that
span C ⊂ U ⊥ . In order to show the converse inclusion, we recall that U ⊥ ∩ U = {0},
which implies that the sum of U and U ⊥ is direct. In particular,
dim U ⊥ = dim(U ⊕ U ⊥ ) − dim U ≤ n − m = dim(span C).
Since span C ⊂ U ⊥ , this implies that U ⊥ = span C.
As a consequence, we have that
V = span B ⊕ span C = U ⊕ U ⊥ ,
which concludes the proof. □
Proposition 3.32. Assume that V is a finite dimensional inner product space
and that U ⊂ V is a subspace. Then (U ⊥ )⊥ = U .
Proof. Assume that u ∈ U . Then we have for every v ∈ U ⊥ that ⟨u, v⟩ = 0.
This, however, implies that u ∈ (U ⊥ )⊥ . This shows that U ⊂ (U ⊥ )⊥ .
Now assume that v ∈ (U ⊥ )⊥ . Since V = U ⊕ U ⊥ , we can write v in the
form v = u + w with u ∈ U and w ∈ U ⊥ . We have to show that w = 0. Since
u ∈ U ⊂ (U ⊥ )⊥ , it follows that w = v − u ∈ (U ⊥ )⊥ . On the other hand, we know
that w ∈ U ⊥ . Since U ⊥ ∩ (U ⊥ )⊥ = {0}, this implies that w = 0. Thus v = u ∈ U ,
which shows that (U ⊥ )⊥ ⊂ U . □
Definition 3.33. Assume that V is a finite dimensional inner product space
and that U ⊂ V is a subspace. The orthogonal projection onto U , denoted πU : V →
V , is the projection onto U along U ⊥ . ■

Theorem 3.34. Assume that V is a finite dimensional inner product space


and that U ⊂ V is a subspace. Then the orthogonal projection πU onto U has the
following properties:
(1) πU : V → V is a linear transformation.
(2) πU u = u for all u ∈ U .
(3) πU w = 0 for all w ∈ U ⊥ .
(4) ran πU = U and ker πU = U ⊥ .
(5) v − πU v ∈ U ⊥ for all v ∈ V .
(6) πU ◦ πU = πU .
(7) ∥v∥2 = ∥πU v∥2 + ∥v − πU v∥2 for all v ∈ V .
(8) ∥πU v∥ ≤ ∥v∥ for all v ∈ V , and we have equality if and only if v ∈ U .
(9) πU v is the unique solution of the optimisation problem
\min_{u \in U} \|v - u\|^2 .

(10) If (e1 , . . . , em ) is an orthonormal basis of U , then


πU v = ⟨v, e1 ⟩e1 + . . . + ⟨v, em ⟩em .
Proof. The linearity of the orthogonal projection is a special case of Pro-
position 2.63 on general (oblique) projections. Moreover, (2)–(6) are immediate
consequences of the definition of the orthogonal projection.
Now let v ∈ V . Then πU v ∈ U and v − πU v ∈ U ⊥ , which implies that
⟨πU v, v − πU v⟩ = 0. Thus we can apply Pythagoras’ Theorem and obtain that
∥v∥2 = ∥πU v∥2 + ∥v − πU v∥2 ,

which proves (7). In addition, we obtain that ∥πU v∥2 ≤ ∥v∥2 with equality if and
only if ∥v − πU v∥ = 0, that is, v = πU v, which in turn is the case if and only if
v ∈ U . This proves (8).
Next, we note that for all u ∈ U we have that
∥v − u∥2 = ∥v − πU v + πU v − u∥2 = ∥v − πU v∥2 + ∥πU v − u∥2 ,
since v − πU v ∈ U ⊥ and πU v − u ∈ U . Thus the minimisation problem minu∈U ∥v −
u∥2 is equivalent to the problem
min_{u∈U} ∥πU v − u∥² ,

which has the obvious unique solution u = πU v. This proves (9).


Finally, let (e1 , . . . , em ) be an orthonormal basis of U , and let (em+1 , . . . , en )
be an orthonormal basis of U ⊥ , such that (e1 , . . . , en ) is an orthonormal basis of
V . Then we can write
v = ∑_{i=1}^n ⟨v, ei ⟩ei = ∑_{i=1}^m ⟨v, ei ⟩ei + ∑_{i=m+1}^n ⟨v, ei ⟩ei ∈ U ⊕ U ⊥ .

From the definition of πU v, it now follows that


πU v = ∑_{i=1}^m ⟨v, ei ⟩ei and v − πU v = ∑_{i=m+1}^n ⟨v, ei ⟩ei ,

which shows (10) and concludes the proof. □
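The projection formula in property (10) is easy to experiment with numerically. The following NumPy sketch (the concrete subspace and vectors are arbitrary choices for illustration; the standard inner product on R⁴ is assumed) checks properties (5), (6) and (7) of Theorem 3.34.

```python
import numpy as np

rng = np.random.default_rng(0)

# U = span of two random vectors in R^4; orthonormalise the spanning vectors
# with a QR factorisation (playing the role of the Gram-Schmidt algorithm).
A = rng.standard_normal((4, 2))
E = np.linalg.qr(A)[0]          # columns e_1, e_2: orthonormal basis of U

def proj_U(v):
    # pi_U v = <v, e_1> e_1 + <v, e_2> e_2, cf. property (10)
    return sum(np.dot(v, e) * e for e in E.T)

v = rng.standard_normal(4)
p = proj_U(v)

print(np.allclose(E.T @ (v - p), 0))     # (5): v - pi_U v is orthogonal to U
print(np.allclose(proj_U(p), p))         # (6): pi_U is idempotent
print(np.isclose(np.dot(v, v),           # (7): Pythagoras
                 np.dot(p, p) + np.dot(v - p, v - p)))
```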

4. Singular Value Decomposition


Theorem 3.35 (Singular Value Decomposition (SVD)). Assume that U and V
are finite dimensional inner product spaces with inner products ⟨·, ·⟩U and ⟨·, ·⟩V ,
respectively, and that T : U → V is linear. Denote by p := rank T = dim(ran T ) the
rank of T . Then there exist orthonormal families (u1 , . . . , up ) ⊂ U and (v1 , . . . , vp ) ⊂
V , and real numbers σ1 ≥ σ2 ≥ . . . ≥ σp > 0 such that
T u = ∑_{i=1}^p σi ⟨u, ui ⟩vi
for all u ∈ U .
The numbers σj are called the singular values of T , and the families (u1 , . . . , up )
and (v1 , . . . , vp ) singular systems of T .
Proof. We use induction over n = dim U . For n = 0, the claim trivially holds
(with T : {0} → V being the zero operator, p = 0, empty orthonormal systems and
no singular values).
Assume therefore that n ≥ 1 and that the claim holds for all linear transform-
ations between inner product spaces, where the dimension of the domain is strictly
smaller than n. If T u = 0 for all u ∈ U , then there is nothing to show (we can
again choose p = 0, empty orthonormal systems, and no singular values). Assume
therefore without loss of generality that there exists some u ∈ U with T u ̸= 0.
Denote now by ∥·∥U and ∥·∥V the norms on U and V , respectively, and consider
the optimisation problem
(17) max_{∥u∥U =1} ∥T u∥V .
The function u ↦ ∥T u∥V is continuous, and the set {u ∈ U : ∥u∥U = 1} is closed
and bounded, and thus this optimisation problem admits a solution.2
2Here we have actually used quite a number of results from calculus in several variables,
which may not seem immediately obvious.

Now choose a solution u1 of (17), define σ1 = ∥T u1 ∥ and v1 = T u1 /σ1 .3 Then


we have in particular that
(18) ∥T u∥V = ∥u∥U ∥T (u/∥u∥U )∥V ≤ σ1 ∥u∥U
for all u ∈ U with u ̸= 0, and the same inequality ∥T u∥V ≤ σ1 ∥u∥U obviously holds
for u = 0.
Define now U1 = span{u1 } and U2 = U1⊥ , and define similarly V1 = span{v1 }
and V2 = V1⊥ . We will show in the following that T w ∈ V2 for all w ∈ U2 .
Assume therefore that w ∈ U2 and let t ≥ 0. Choose α ∈ K with |α| = t such that
ασ1 and ⟨T w, v1 ⟩V have the same complex argument (for instance α = t⟨T w, v1 ⟩V /|⟨T w, v1 ⟩V |
if ⟨T w, v1 ⟩V ̸= 0, and α = t otherwise). Then
T (αu1 + w) = αT u1 + T w = ασ1 v1 + T w.
Since πV1 (T w) = ⟨T w, v1 ⟩V v1 and T w − πV1 (T w) ∈ V1⊥ , Pythagoras' Theorem and the choice of α imply that
∥T (αu1 + w)∥²V = ∥ασ1 v1 + T w∥²V
= ∥ασ1 v1 + πV1 (T w)∥²V + ∥T w − πV1 (T w)∥²V
= |ασ1 + ⟨T w, v1 ⟩V |² + ∥T w − πV1 (T w)∥²V
= σ1² t² + 2σ1 t |⟨T w, v1 ⟩V | + |⟨T w, v1 ⟩V |² + ∥T w − πV1 (T w)∥²V
= σ1² t² + 2σ1 t |⟨T w, v1 ⟩V | + ∥πV1 (T w)∥²V + ∥T w − πV1 (T w)∥²V
= σ1² t² + 2σ1 t |⟨T w, v1 ⟩V | + ∥T w∥²V .
On the other hand, since u1 and w are orthogonal and ∥u1 ∥U = 1, we have by (18) that
∥T (αu1 + w)∥²V ≤ σ1² ∥αu1 + w∥²U = σ1² t² + σ1² ∥w∥²U .
Combining these estimates, we obtain that
σ1² t² + 2σ1 t |⟨T w, v1 ⟩V | + ∥T w∥²V ≤ σ1² t² + σ1² ∥w∥²U ,
or
2σ1 t |⟨T w, v1 ⟩V | ≤ σ1² ∥w∥²U − ∥T w∥²V .
Since t ≥ 0 was arbitrary and σ1 > 0, this is only possible if ⟨T w, v1 ⟩V = 0, that
is, if T w ∈ V1⊥ = V2 .
We have thus shown that T w ∈ V2 whenever w ∈ U2 . Consider therefore the
mapping S : U2 → V2 , w 7→ Sw := T w. Since dim U2 < dim U , we can apply the
induction hypothesis. We can easily see that rank S = p − 1 (one dimension of the
range of T being the space V1 ). Thus there exist orthonormal systems (u2 , . . . , up )
and (v2 , . . . , vp ) in U2 and V2 , respectively, and positive numbers σ2 ≥ . . . ≥ σp > 0
such that
Sw = ∑_{i=2}^p σi ⟨w, ui ⟩vi .
Since U2 and U1 are orthogonal, it follows that (u1 , u2 , . . . , up ) is an orthonormal
system in U , and similarly (v1 , v2 , . . . , vp ) is an orthonormal system in V . Moreover,
we have that
T u = T (πU1 u) + T (πU2 u) = T (⟨u, u1 ⟩u1 ) + S(πU2 u) = σ1 ⟨u, u1 ⟩v1 + ∑_{i=2}^p σi ⟨u, ui ⟩vi

for all u ∈ U , where we have used that ⟨πU2 u, ui ⟩ = ⟨u, ui ⟩ for i = 2, . . . , p.
Finally, we note that T u2 = σ2 v2 and thus, by (18),
σ2 = σ2 ∥v2 ∥V = ∥T u2 ∥V ≤ σ1 ∥u2 ∥U = σ1 .
This shows that σ1 ≥ σ2 ≥ . . . ≥ σp > 0, which concludes the proof. □

3Note here that the solution u1 of (17) can never be unique, as −u1 is another obvious solution.

Remark 3.36. One can show that the singular values of a mapping are unique.
In addition, one obtains a limited uniqueness result for the singular vectors: If the
singular values satisfy σj−1 > σj = σj+1 = . . . = σj+k > σj+k+1 . . ., then the linear
spans span{uj , . . . , uj+k } and span{vj , . . . , vj+k } are independent of the choice of
the singular vectors. In particular, if all the singular values are distinct, then the
singular vectors are uniquely determined up to multiplication by a unimodular scalar (a sign, in the real case), where ui and vi have to be multiplied by the same scalar. ■

Lemma 3.37. Assume that the mapping T : U → V has the singular value
decomposition
T u = ∑_{i=1}^p σi ⟨u, ui ⟩vi .
Then
ran T = span{v1 , . . . , vp } and ker T = span{u1 , . . . , up }⊥ .
Proof. This is immediately obvious from the decomposition. □
We will come back soon to the singular value decomposition in order to discuss
important properties, but also slightly different representations. For that, we will
need some further notation, though.

5. Unitary Transformations and Adjoints


In real vector spaces, we have the notion of orthogonal transformations, which
are linear mappings that preserve inner products between vectors. In complex
vector spaces, we have the same idea, but we call these transformations unitary
instead:
Definition 3.38. Let U and V be inner product spaces, and let T : U → V be
linear. Then T is called unitary, if
⟨T u, T v⟩V = ⟨u, v⟩U
for all u, v ∈ U . ■

Lemma 3.39. Let U and V be inner product spaces, and let T : U → V be


unitary. Then T is injective.
Proof. Assume that u ∈ U is such that T u = 0. Then
0 = ∥T u∥2V = ⟨T u, T u⟩V = ⟨u, u⟩U = ∥u∥2U
and thus u = 0. □
Lemma 3.40. Assume that U and V are finite dimensional inner product spaces
with bases (u1 , . . . , un ) and (v1 , . . . , vm ), respectively, and denote by GU and GV the
corresponding Gram matrices. Let moreover T : U → V be a linear transformation
with matrix representation A ∈ Km×n . Then T is unitary, if and only if
GU = AH GV A.
Proof. Let u, v ∈ U be arbitrary with coordinate vectors x, y ∈ Kn . Then
⟨u, v⟩U = y H GU x,
and
⟨T u, T v⟩V = (Ay)H GV (Ax) = y H (AH GV A)x.
Thus T is unitary, if and only if
y H GU x = y H (AH GV A)x
for all x, y ∈ Kn , which is the case if and only if GU = AH GV A. □

Remark 3.41. If (u1 , . . . , un ) and (v1 , . . . , vm ) are orthonormal bases of U and


V , then the corresponding Gram matrices are just the identity matrices of the
corresponding dimensions. In this case, the transformation is unitary, if and only
if
AH A = Idn ,
that is, the matrix A is unitary. ■
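A quick numerical illustration of this remark (a NumPy sketch; the concrete sizes and the random matrix are arbitrary choices): a matrix with orthonormal columns with respect to the standard inner products satisfies AᴴA = Id, whereas AAᴴ is in general not the identity unless the matrix is square.

```python
import numpy as np

rng = np.random.default_rng(1)

# A 5 x 3 complex matrix with orthonormal columns, taken from a QR factorisation.
A = np.linalg.qr(rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3)))[0]

print(np.allclose(A.conj().T @ A, np.eye(3)))   # True:  A^H A = Id_n
print(np.allclose(A @ A.conj().T, np.eye(5)))   # False: A A^H is only a projection
```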

Definition 3.42 (Adjoint of a linear transformation). Assume that U and V


are finite dimensional inner product spaces and that T : U → V is linear. Assume
moreover that T has the singular value decomposition
T u = ∑_{i=1}^p σi ⟨u, ui ⟩vi
for all u ∈ U . The adjoint transformation T ∗ : V → U is defined as
T ∗ v = ∑_{i=1}^p σi ⟨v, vi ⟩ui
for all v ∈ V . ■

Theorem 3.43. Assume that U and V are finite dimensional inner product
spaces and that T : U → V is linear. The adjoint T ∗ : V → U is the unique linear
transformation satisfying
(19) ⟨T u, v⟩V = ⟨u, T ∗ v⟩U for all u ∈ U and v ∈ V.

Proof. The linearity of T ∗ is obvious from its definition.
We next show that (19) holds. Let therefore u ∈ U and v ∈ V . Then
⟨T u, v⟩V = ⟨∑_{i=1}^p σi ⟨u, ui ⟩U vi , v⟩V = ∑_{i=1}^p σi ⟨u, ui ⟩U ⟨vi , v⟩V .
On the other hand, since the inner product is conjugate linear in its second argument and the σi are real, we have that
⟨u, T ∗ v⟩U = ⟨u, ∑_{i=1}^p σi ⟨v, vi ⟩V ui ⟩U = ∑_{i=1}^p σi ⟨vi , v⟩V ⟨u, ui ⟩U .
The resulting expressions are the same in the two cases, which shows that (19)
holds.
It remains to show that T ∗ is the unique linear transformation with this prop-
erty. Assume therefore that S : V → U is another linear transformation such that
⟨T u, v⟩V = ⟨u, Sv⟩U for all u ∈ U and v ∈ V . Then
⟨u, T ∗ v⟩U = ⟨T u, v⟩V = ⟨u, Sv⟩U
for all u ∈ U and v ∈ V , and thus
⟨u, (T ∗ − S)v⟩U = 0
for all u ∈ U and v ∈ V . That is, (T ∗ − S)v ∈ U ⊥ = {0} for all v ∈ V , which is the
same as saying that T ∗ v = Sv for all v ∈ V , that is, T ∗ = S. □
Lemma 3.44. Assume that U and V are inner product spaces and that T : U →
V is linear. Then
T = (T ∗ )∗ .
Proof. Exercise!
(Use Theorem 3.43!) □

Lemma 3.45. Assume that U , V , W are inner product spaces and that T : U →
V and S : V → W are linear. Then
(S ◦ T )∗ = T ∗ ◦ S ∗ .
Proof. Exercise!
(Use Theorem 3.43!) □
Lemma 3.46. Assume that U and V are finite dimensional inner product spaces
and that T : U → V is linear. Then
ker T = (ran T ∗ )⊥ and ran T = (ker T ∗ )⊥ .
Proof. We have that u ∈ ker T if and only if T u = 0. This is equivalent to u
satisfying ⟨T u, v⟩V = 0 for all v ∈ V . Using Theorem 3.43, this in turn is equivalent
to u satisfying ⟨u, T ∗ v⟩U = 0 for all v ∈ V , which is equivalent to the inclusion
u ∈ (ran T ∗ )⊥ . This shows that ker T = (ran T ∗ )⊥ .
By applying this result to T ∗ and recalling that (T ∗ )∗ = T , we obtain that
ker T ∗ = (ran T )⊥ . Taking orthogonal complements on both sides, we finally obtain
that ran T = (ker T ∗ )⊥ as claimed. □
Adjoints of unitary transformations.
Using the notion of the adjoint of a mapping, we can find a different charac-
terisation of unitary transformations:
Proposition 3.47. Assume that U and V are inner product spaces and that
T : U → V is a linear transformation. Then T is unitary, if and only if
T ∗ T = IdU .
Proof. Assume first that T ∗ T = IdU . Then we have for all u, v ∈ U that
⟨T u, T v⟩V = ⟨u, T ∗ T v⟩U = ⟨u, v⟩U ,
which shows that T is unitary.
Conversely, assume that T is unitary. Then
⟨u, T ∗ T v⟩U = ⟨T u, T v⟩V = ⟨u, v⟩U
for all u, v ∈ U , and thus
⟨u, (T ∗ T − IdU )v⟩U = 0
for all u, v ∈ U . This shows that (T ∗ T − IdU )v ∈ U ⊥ = {0} for all v ∈ U , which in
turn shows that T ∗ T = IdU . □
Proposition 3.48. Assume that U and V are inner product spaces and that
T : U → V is unitary. Then
T T ∗ = πran T .
Proof. Let v ∈ V . Then we can decompose v as v = v1 + v2 with v1 ∈ ran T
and v2 ∈ (ran T )⊥ . We have to show that T T ∗ v = v1 .
Since v1 ∈ ran T , there exists u1 ∈ U such that v1 = T u1 . Since T ∗ T = IdU ,
we then have that
T T ∗ v = T T ∗ (v1 + v2 ) = T T ∗ T u1 + T T ∗ v2
= T (T ∗ T u1 ) + T T ∗ v2 = T u1 + T T ∗ v2 = v1 + T T ∗ v2 .
Thus it remains to show that T T ∗ v2 = 0.
By the definition of the adjoint, we have for every w ∈ V that
⟨T T ∗ v2 , w⟩V = ⟨T ∗ v2 , T ∗ w⟩U = ⟨v2 , (T ∗ )∗ T ∗ w⟩V = ⟨v2 , T T ∗ w⟩V .
Since T T ∗ w ∈ ran T and v2 ∈ (ran T )⊥ , we conclude that ⟨T T ∗ v2 , w⟩V = 0. Since
w ∈ V was arbitrary, this shows that T T ∗ v2 = 0, which concludes the proof. □

Adjoints in matrix form.


Finally, we investigate how the adjoint of a linear transformation looks in matrix
form:
Lemma 3.49. Assume that the linear transformation T : U → V has the mat-
rix representation A ∈ Km×n with respect to bases of U and V , and denote by
GU and GV the Gram matrices of U and V . Then the matrix A∗ of the adjoint
transformation T ∗ is
A∗ = GU−1 AH GV .

Proof. The adjoint T ∗ is uniquely characterised by the condition ⟨T u, v⟩V =


⟨u, T ∗ v⟩U for all u ∈ U and v ∈ V . Translated to coordinates, this means that
y H GV (Ax) = (A∗ y)H GU x
for all x ∈ Kn and y ∈ Km . We can rewrite this to the equation
y H GV Ax = y H (A∗ )H GU x.
Since this has to hold for all x ∈ Kn and y ∈ Km , we obtain that
GV A = (A∗ )H GU ,
or
(A∗ )H = GV A GU−1 .
Finally, we take the Hermitian conjugate on both sides and obtain
A∗ = (GU−1 )H AH (GV )H = GU−1 AH GV ,
as the matrices GU and GV , and thus also their inverses (which exist as they are
positive definite), are Hermitian. □
Remark 3.50. If A ∈ Km×n is the matrix of a linear transformation with respect
to orthonormal bases on the involved spaces, then
A∗ = AH . ■
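For matrices with respect to orthonormal bases, the defining property (19) is therefore easy to verify numerically. A small NumPy sketch (the matrix and vectors are arbitrary; the standard inner products on Kⁿ and Kᵐ are assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))  # T : C^4 -> C^3

u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# <x, y> = sum_i x_i * conj(y_i): linear in the first, conjugate linear in the second argument.
inner = lambda x, y: np.vdot(y, x)

lhs = inner(A @ u, v)                 # <T u, v>_V
rhs = inner(u, A.conj().T @ v)        # <u, T* v>_U with T* represented by A^H
print(np.isclose(lhs, rhs))           # True
```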

6. Singular Value Decomposition Revisited


Assume now again that U and V are inner product spaces, and that the linear
transformation T : U → V has the singular value decomposition
T u = ∑_{i=1}^p σi ⟨u, ui ⟩vi .
We can alternatively interpret this as a composition of three different mappings:
• The first mapping is the transformation
P̃ : U → Kp ,  u ↦ (⟨u, u1 ⟩, . . . , ⟨u, up ⟩)ᵀ .
• The second mapping is the transformation
Σ : Kp → Kp ,  (x1 , . . . , xp )ᵀ ↦ (σ1 x1 , . . . , σp xp )ᵀ ,
that is, multiplication with the diagonal matrix that has diagonal entries σ1 , . . . , σp .
• The third and final mapping is the transformation
Q : Kp → V ,  (y1 , . . . , yp )ᵀ ↦ ∑_{i=1}^p yi vi .

Moreover, if we consider the standard (Euclidean) inner product on Kp , then the


orthonormality of the family (v1 , . . . , vp ) implies that the mapping Q : Kp → V is
unitary.
The mapping P̃ : U → Kp typically is not unitary itself (it actually will not
be injective, if T is not injective). However, it turns out that its adjoint is. More
precisely, consider the mapping
P : Kp → U ,  (x1 , . . . , xp )ᵀ ↦ ∑_{i=1}^p xi ui .

Then P is unitary and P̃ = P ∗ .


In addition, we have that ran Q = span{v1 , . . . , vp } = ran T , and ran P =
span{u1 , . . . , up } = (ker T )⊥ .
All in all, we obtain the following re-interpretation of the singular value decom-
position:
Theorem 3.51 (Singular Value Decomposition, again). Assume that U and V
are finite dimensional inner product spaces and that T : U → V is linear. Denote
by p the rank of T . There exist unitary mappings P : Kp → U and Q : Kp → V and
a diagonal mapping Σ : Kp → Kp with diagonal entries σ1 ≥ σ2 ≥ . . . ≥ σp > 0
such that
T = Q ◦ Σ ◦ P ∗.
Moreover, ran T = ran Q and ker T = (ran P )⊥ .
Finally, we can rewrite this decomposition in terms of matrix products, once
we have chosen orthonormal bases on the spaces U and V .
Theorem 3.52 (Singular Value Decomposition in Matrix Form). Assume that
U and V are finite dimensional inner product spaces and that T : U → V is linear,
and that A ∈ Km×n is the matrix of T with respect to orthonormal bases on U and
V , respectively. Then there exist unitary matrices Q ∈ Km×p and P ∈ Kn×p as
well as a diagonal matrix Σ ∈ Rp×p with positive entries σ1 ≥ σ2 ≥ . . . ≥ σp > 0
such that
A = QΣP H .
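For concrete matrices, this factorisation can be obtained from standard numerical libraries. The following NumPy sketch (the test matrix is an arbitrary rank-deficient example) computes a full SVD with np.linalg.svd, truncates it to the rank p, and checks the statement of Theorem 3.52:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank 2 by construction

Qf, s, PfH = np.linalg.svd(A)          # full SVD of A
p = np.sum(s > 1e-12)                  # numerical rank

Q = Qf[:, :p]                          # Q in K^{m x p}, orthonormal columns
P = PfH[:p, :].conj().T                # P in K^{n x p}, orthonormal columns
Sigma = np.diag(s[:p])                 # sigma_1 >= ... >= sigma_p > 0

print(p)                                          # 2
print(np.allclose(A, Q @ Sigma @ P.conj().T))     # A = Q Sigma P^H
print(np.allclose(Q.conj().T @ Q, np.eye(p)))     # Q^H Q = Id_p
print(np.allclose(P.conj().T @ P, np.eye(p)))     # P^H P = Id_p
```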

7. The Moore–Penrose Inverse


Assume that we want to solve a linear system T u = v. If T is bijective, then
the solution is simply u = T −1 v. Also, this solution can actually be computed
numerically (provided we have sufficient time and memory available), and we can
use any linear solver of our choice to do so. If, however, T is not injective, then
solutions of the equation will not be unique, and if T is not surjective, they might
not exist at all. In that case, we can use the singular value decomposition of T to
define a generalised inverse that comes, in a sense, as close as possible to an actual
inverse of T .

Definition 3.53. Assume that U and V are finite dimensional inner product
spaces and that T : U → V has the singular value decomposition
T u = ∑_{i=1}^p σi ⟨u, ui ⟩vi .

The Moore–Penrose inverse or pseudo-inverse of T is the mapping T † : V → U ,


T † v = ∑_{i=1}^p (1/σi ) ⟨v, vi ⟩ui .

Remark 3.54. Alternatively, if we use the decomposition


T = Q ◦ Σ ◦ P∗
as in Theorem 3.51, we can write the Moore–Penrose inverse as
T † = P ◦ Σ−1 ◦ Q∗ .
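For matrices with respect to orthonormal bases, this construction agrees with the pseudo-inverse computed by np.linalg.pinv. A small NumPy sketch (the rank-deficient test matrix is an arbitrary choice); the identities it checks are those of the theorem below:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank 2

Qf, s, PfH = np.linalg.svd(A)
p = np.sum(s > 1e-12)
Q, P, Sigma = Qf[:, :p], PfH[:p, :].conj().T, np.diag(s[:p])

# Moore-Penrose inverse built from the reduced SVD: A^dagger = P Sigma^{-1} Q^H
A_dag = P @ np.linalg.inv(Sigma) @ Q.conj().T
print(np.allclose(A_dag, np.linalg.pinv(A)))     # True
print(np.allclose(A @ A_dag @ A, A))             # True: T T^dagger T = T
```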

Theorem 3.55. Assume that U and V are finite dimensional inner product
spaces and that T : U → V is linear. Then the Moore–Penrose inverse T † of T has
the following properties:
(1) ker T † = (ran T )⊥ .
(2) ran T † = (ker T )⊥ .
(3) T T † = πran T .
(4) T † T = π(ker T )⊥ .
(5) T †T T † = T †.
(6) T T †T = T .
Proof. Write T = QΣP ∗ with Q : Kp → V and P : Kp → U unitary, and
Σ : Kp → Kp diagonal with positive diagonal entries. Then T † = P Σ−1 Q∗ .
From the results of the singular value decomposition we now obtain that ran T =
ran Q and ker T = (ran P )⊥ . Now note that the decomposition T † = P Σ−1 Q∗ is
actually a singular value decomposition of T † (up to a necessary reordering of the
singular values). Thus we also have that ran T † = ran P and ker T † = (ran Q)⊥ .
Therefore ran T † = ran P = (ker T )⊥ and ker T † = (ran Q)⊥ = (ran T )⊥ .
Recall now that P and Q are unitary, and therefore P ∗ P = Idp and Q∗ Q = Idp .
Thus
T T † = QΣP ∗ P Σ−1 Q∗ = QΣΣ−1 Q∗ = QQ∗ .
This in turn shows that
T T † = QQ∗ = πran Q = πran T .
Next we have that
T † T = P Σ−1 Q∗ QΣP ∗ = P Σ−1 ΣP ∗ = P P ∗ = πran P = π(ker T )⊥ .
Moreover,
T † T T † = P P ∗ P Σ−1 Q∗ = P Σ−1 Q∗ = T †
and
T T † T = QQ∗ QΣP ∗ = QΣP ∗ = T,
which concludes the proof. □

Theorem 3.56. Assume that U and V are finite dimensional inner product
spaces and that T : U → V is linear. Let v ∈ V . Then
u† := T † v
solves the (bi-level) optimisation problem
(20) min_u ∥u∥²U  s.t.  u solves min_u ∥T u − v∥²V .

In other words, the Moore–Penrose inverse looks at the least squares problem
min_u ∥T u − v∥²V and selects from all solutions of this problem the one with the
smallest norm.
Proof. The vector u ∈ U solves the problem minu ∥T u − v∥2V , if and only if
the vector v̂ = T u solves the problem
min_{v̂} ∥v̂ − v∥²V  s.t.  v̂ ∈ ran T.

In other words, T u = v̂ = πran T v. Thus we can reformulate (20) equivalently as


(21) min_u ∥u∥²U  s.t.  T u = πran T v.

We will show in the following that u† = T † v solves (21).


For that, we note first that T u† = T T † v = πran T v, which shows that u†
satisfies the constraint in (21). Now assume that ũ ∈ U is an arbitrary vector
satisfying T ũ = πran T v. Then T (ũ − u† ) = 0 and thus ũ − u† ∈ ker T . Since
u† = T † v ∈ ran T † = (ker T )⊥ , it follows that
∥ũ∥2U = ∥ũ − u† ∥2U + ∥u† ∥2U ≥ ∥u† ∥2U .
Since this inequality holds for all ũ satisfying T ũ = πran T v, it follows that u†
solves (21), which concludes the proof. □
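The minimal-norm property can be observed directly for matrices. A NumPy sketch (the matrix, right-hand side and perturbation are arbitrary; A is chosen without full column rank so that the least squares solution is not unique):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))   # rank 3, not injective
v = rng.standard_normal(6)

u_dag = np.linalg.pinv(A) @ v          # u^dagger = A^dagger v

# Another least squares solution: add an element of ker A to u_dag.
ns = rng.standard_normal(5)
ns -= np.linalg.pinv(A) @ (A @ ns)     # projection onto ker A = (ran A^dagger)^perp
u_other = u_dag + ns

# Same (minimal) residual ...
print(np.isclose(np.linalg.norm(A @ u_dag - v),
                 np.linalg.norm(A @ u_other - v)))               # True
# ... but u_dag has the smaller norm.
print(np.linalg.norm(u_dag) < np.linalg.norm(u_other))           # True
```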

8. Singular Value Decomposition of Matrices


We consider now specifically the case where U = Kn , V = Km , both regarded
with the standard inner product, and T : U → V is a linear mapping, which we can
identify with a matrix A ∈ Km×n . According to Theorem 3.52, we can decompose
A as
A = Q Σ P H ,
where A ∈ Km×n , Q ∈ Km×p , Σ ∈ Kp×p , and P H ∈ Kp×n ,
with Q and P unitary matrices, that is, QH Q = Idp and P H P = Idp , and Σ a
diagonal matrix with non-increasing positive diagonal entries.
Now consider the matrices AH A ∈ Kn×n and AAH ∈ Km×m . We have
AH A = (QΣP H )H QΣP H = P ΣH QH QΣP H = P Σ2 P H ,
and similarly
AAH = QΣP H (QΣP H )H = QΣP H P ΣH QH = QΣ2 QH .
Next we will have a closer look at the decomposition of AH A and provide
a different interpretation of it. The matrix P ∈ Kn×p is unitary, and thus its
columns form an orthonormal system in Kn (the columns of P are precisely the
coordinates of the vectors u1 , . . . , up in the SVD). We can now use Gram–Schmidt
orthogonalisation in order to extend this orthonormal system to an orthonormal
basis of Kn and then collect this basis in a square matrix. That is, we can find a
matrix P ′ ∈ Kn×(n−p) such that
P̂ := P P ′ ∈ Kn×n


is unitary. In addition, we can extend the matrix Σ ∈ Kp×p to a matrix Σ̃ ∈ Kn×n


by adding zeroes as necessary. That is, we define the matrix
Σ̃ := [ Σ  0p×(n−p) ; 0(n−p)×p  0(n−p)×(n−p) ] ∈ Kn×n ,
written in block form, with the semicolon separating the two block rows.
Then
P̂ Σ̃2 P̂ H = (P P ′) [ Σ2  0 ; 0  0 ] [ P H ; (P ′ )H ] = (P P ′) [ Σ2 P H ; 0 ] = P Σ2 P H = AH A.

Now note that P̂ ∈ Kn×n satisfies P̂ H P̂ = Idn . Since P̂ is a square matrix,


this implies that P̂ is actually invertible with
P̂ −1 = P̂ H .
Thus we actually have the decomposition
AH A = P̂ Σ̃2 P̂ −1 .
This means that Σ̃2 is the matrix of the mapping AH A in the basis of Kn given
by the columns of P̂ . Since moreover Σ̃2 ∈ Kn×n is diagonal, this implies that
AH A is diagonalisable with non-zero eigenvalues σ12 , . . . , σp2 and the eigenvalue 0
with multiplicity (n − p). In addition, the columns of P̂ form an (orthonormal)
eigenbasis for AH A. Moreover, the eigenvectors of AH A that correspond to non-
zero eigenvalues are precisely the columns of the matrix P .
We can now apply the same idea to the matrix AAH ∈ Km×m . That is, we
find a matrix Q′ ∈ Km×(m−p) such that Q̂ := (QQ′ ) ∈ Km×m is unitary, and then
obtain that
AAH = Q̂ [ Σ2  0 ; 0  0 ] Q̂−1 ,
where we have again filled up the matrix Σ2 ∈ Kp×p with zeroes to obtain an
(m × m)-matrix. Again, we obtain that σ12 , . . . , σp2 are the non-zero eigenvalues of
AAH , and the columns of Q are corresponding eigenvectors.
The considerations above yield the following result:
Theorem 3.57. Let A ∈ Km×n . The singular values of A are precisely the
square roots of the non-zero eigenvalues of the matrices AH A and AAH , and the
singular vectors of A are corresponding eigenvectors of AH A and AAH , respectively.
Reduced and full SVD.
The decomposition of a matrix A = QΣP H with Q ∈ Km×p , Σ ∈ Kp×p , and
P ∈ Kn×p is in the literature often called the reduced singular value decomposition of
the matrix A. The full singular value decomposition then is of the form A = Q̂Σ̂P̂ H
with Q̂ ∈ Km×m , Σ̂ ∈ Km×n , and P̂ ∈ Kn×n . Here the matrices Q̂ and P̂ are unitary,
and Σ̂ is a rectangular diagonal matrix with ordered non-negative diagonal entries.
That is, Σ̂11 ≥ Σ̂22 ≥ . . . ≥ Σ̂kk ≥ 0, where k = min{m, n}, and all other entries of
Σ̂ are equal to zero.
In order to compute the (full) SVD of a matrix A ∈ Km×n one can proceed
as follows: If n ≤ m, we start by computing AH A ∈ Kn×n and compute the
eigenvalues as well as an orthonormal eigenbasis of AH A. That is, we write
AH A = P̂ Σ̃2 P̂ H
with P̂ ∈ Kn×n unitary and Σ̃2 diagonal (with necessarily non-negative diagonal
entries).

We then define the matrix Σ̂ ∈ Km×n by adding zeroes, that is,


 
Σ̂ = [ Σ̃ ; 0(m−n)×n ] ,
and we write
A = Q̂Σ̂P̂ H
with yet to be determined Q̂ ∈ Km×m . By multiplying this equation from the right
with P̂ , we obtain that

AP̂ = Q̂Σ̂ = ( σ̂1 q̂1  σ̂2 q̂2  . . .  σ̂n q̂n ) ,
where q̂1 , . . . , q̂m are the columns of Q̂. Thus we obtain the j-th column of Q̂ by
dividing the j-th column of AP̂ by σ̂j whenever possible (that is, whenever σ̂j ̸= 0).
Finally, we obtain the remaining columns of Q̂ by completing the computed ones
to an orthonormal basis of Km .
If m < n, we can in theory use the same approach, but it is usually simpler
to start with the matrix AAH = Q̂Σ̃2 Q̂H instead, as this matrix is smaller than
AH A. Thus one computes first an orthonormal eigenvalue decomposition AAH =
Q̂Σ̃2 Q̂H . Then the j-th column of P̂ can be obtained by dividing the j-th column
of AH Q̂ by σ̂j if possible. The remaining columns of P̂ can again be obtained by
orthogonalisation.
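A direct transcription of this procedure (for the case n ≤ m) into NumPy might look as follows. This is only an illustrative sketch: the tolerance and the completion step are ad hoc choices, and in practice one would of course call a library routine such as np.linalg.svd instead.

```python
import numpy as np

def svd_via_eig(A):
    """Full SVD A = Qhat Sigmahat Phat^H for A with n <= m, following the
    procedure above: eigendecomposition of A^H A, then recovery of Qhat."""
    m, n = A.shape
    # Orthonormal eigendecomposition of A^H A; eigh returns ascending eigenvalues.
    lam, Phat = np.linalg.eigh(A.conj().T @ A)
    order = np.argsort(lam)[::-1]                 # largest eigenvalue first
    lam, Phat = np.clip(lam[order], 0, None), Phat[:, order]
    sigma = np.sqrt(lam)                          # singular values, descending

    # Sigmahat in R^{m x n}: Sigma~ padded with zero rows.
    Sigmahat = np.zeros((m, n))
    Sigmahat[:n, :n] = np.diag(sigma)

    # j-th column of Qhat: j-th column of A Phat divided by sigma_j (if nonzero),
    # then completed to an orthonormal basis of K^m.
    p = np.sum(sigma > 1e-12)
    Qhat = np.zeros((m, m), dtype=A.dtype)
    Qhat[:, :p] = (A @ Phat[:, :p]) / sigma[:p]
    Q_full = np.linalg.qr(np.hstack([Qhat[:, :p], np.eye(m, dtype=A.dtype)]))[0]
    Qhat[:, p:] = Q_full[:, p:m]
    return Qhat, Sigmahat, Phat

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))
Qhat, Sigmahat, Phat = svd_via_eig(A)
print(np.allclose(A, Qhat @ Sigmahat @ Phat.conj().T))       # True
print(np.allclose(Qhat.conj().T @ Qhat, np.eye(5)))          # True
print(np.allclose(np.diag(Sigmahat), np.linalg.svd(A)[1]))   # matches np.linalg.svd
```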
Example 3.58. Consider the matrix
 
A = [ 1  c ; 0  1 ]
with c ̸= 0.
We first compute a Jordan decomposition of A in order to demonstrate the
difference to the SVD. This matrix has the single eigenvalue λ1 = 1 with geometric
multiplicity 1 and corresponding eigenvector (1, 0). A possible Jordan decomposition of A reads
A = [ c  0 ; 0  1 ] [ 1  1 ; 0  1 ] [ c  0 ; 0  1 ]−1 .
Next we will compute a singular value decomposition of A. To that end, we
will compute first the eigenvalues and eigenvectors of AAH (doing the computations
with AH A would work fine as well4 ). We have
AAH = [ 1 + c²  c ; c  1 ]
with eigenvalues
λ1,2 = 1 + c²/2 ± √(c² + c⁴/4) .
As a consequence, the singular values of A are √λ1 and √λ2 , or
σ1 = √(1 + c²/2 + √(c² + c⁴/4)) and σ2 = √(1 + c²/2 − √(c² + c⁴/4)) .
In the particular case c = 8/3 (which noticeably simplifies all the following
calculations) we have
σ1 = √λ1 = 3 and σ2 = √λ2 = 1/3.

4The decision of whether to start with AAH or AH A should be based on which of these
matrices is expected to look simpler. In this case, there is no real difference between them, so it
does not matter. In some cases, one can make one’s life significantly harder, though, by starting
the computations the wrong way.

Moreover, the eigenvectors of AAH corresponding to λ1 and λ2 are
q1 = (1/√10) (3, 1)ᵀ and q2 = (1/√10) (−1, 3)ᵀ .
Thus we can write
    
AAH = QΣ²QH = (1/√10) [ 3  −1 ; 1  3 ] [ 9  0 ; 0  1/9 ] (1/√10) [ 3  1 ; −1  3 ] .
Moreover, the matrix Q in this decomposition of AAH can be chosen to be precisely
the matrix Q in the singular value decomposition A = QΣP H of A. Now the
equation A = QΣP H implies that
P = AH QΣ−1 = (1/√10) [ 1  −3 ; 3  1 ] .
We therefore obtain the singular value decomposition
      
[ 1  8/3 ; 0  1 ] = (1/√10) [ 3  −1 ; 1  3 ] [ 3  0 ; 0  1/3 ] (1/√10) [ 1  3 ; −3  1 ] . ■
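The computations of this example can be checked numerically; a small NumPy sketch (the sign comparison accounts for the fact that singular vectors are only determined up to sign):

```python
import numpy as np

c = 8 / 3
A = np.array([[1.0, c], [0.0, 1.0]])

Q, s, PH = np.linalg.svd(A)
print(np.allclose(s, [3.0, 1.0 / 3.0]))           # singular values 3 and 1/3
print(np.allclose(A, Q @ np.diag(s) @ PH))        # A = Q Sigma P^H

# Compare the first left singular vector with q1 = (3, 1)/sqrt(10), up to sign.
q1 = np.array([3.0, 1.0]) / np.sqrt(10)
print(np.isclose(abs(np.dot(Q[:, 0], q1)), 1.0))  # True
```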

9. Self-adjoint and Positive Definite Transformations


We now turn back to the more general case where T is a linear transformation
between the finite dimensional inner product spaces U and V , and try to emulate
the results of the previous section also in this matrix free situation. Since the
matrix AH corresponds to the operator T ∗ , this indicates that we should look at
the eigenvalues and eigenvectors of the transformations T ∗ T and T T ∗ , respectively.
For that, we will first discuss some properties of these types of transformations
independent of the singular value decomposition.
Definition 3.59. A linear transformation T : U → U on an inner product
space U is called self-adjoint or Hermitian if T ∗ = T . ■

Proposition 3.60. Assume that U is a finite dimensional inner product space


and that T : U → U is self-adjoint. Then the eigenvalues of T are real. In addition,
if λ1 ̸= λ2 are different eigenvalues of T with corresponding eigenvectors u1 and
u2 , then ⟨u1 , u2 ⟩ = 0.
Proof. Assume that λ ∈ K is an eigenvalue of T with eigenvector u ∈ U .
Then
λ∥u∥² = ⟨λu, u⟩ = ⟨T u, u⟩ = ⟨u, T ∗ u⟩ = ⟨u, T u⟩ = ⟨u, λu⟩ = λ̄∥u∥².
Since ∥u∥ ̸= 0, it follows that λ = λ̄, and thus λ ∈ R.
Next assume that λ1 ̸= λ2 are distinct eigenvalues of T and that u1 , u2 are
corresponding eigenvectors. Then

λ1 ⟨u1 , u2 ⟩ = ⟨λ1 u1 , u2 ⟩ = ⟨T u1 , u2 ⟩ = ⟨u1 , T ∗ u2 ⟩ = ⟨u1 , T u2 ⟩ = ⟨u1 , λ2 u2 ⟩ = λ̄2 ⟨u1 , u2 ⟩ = λ2 ⟨u1 , u2 ⟩,
where the last equality holds because λ2 is real.
Thus
(λ1 − λ2 )⟨u1 , u2 ⟩ = 0,
which shows that ⟨u1 , u2 ⟩ = 0. □

Even more, it is actually possible to show that self-adjoint transformations are


diagonalisable:

Theorem 3.61. Assume U is a finite dimensional inner product space and


that T : U → U is self-adjoint. Then U has an orthonormal basis consisting of
eigenvectors of T . In particular, T is diagonalisable.
Remark 3.62. One can show that a more general class of linear transformations
has an orthonormal eigenbasis. We say that a linear transformation T : U → U is
normal, if T T ∗ = T ∗ T . One can then show that T is normal, if and only if U has
an orthonormal basis consisting of eigenvectors of T . ■

Definition 3.63. Assume that U is a finite dimensional inner product space


and T : U → U is self-adjoint. We say that T is positive semi-definite, if
⟨T u, u⟩ ≥ 0
for all u ∈ U . We say that T is positive definite, if ⟨T u, u⟩ > 0 for all u ∈ U ,
u ̸= 0. ■

Lemma 3.64. Assume that U is a finite dimensional inner product space and
T : U → U is Hermitian. Then T is positive semi-definite, if and only if all eigen-
values of T are non-negative.
Moreover, T is positive definite, if and only if all eigenvalues of T are positive.
Proof. Assume first that T is positive semi-definite and that λ is an eigenvalue
of T with eigenvector u. Then
λ∥u∥2 = ⟨λu, u⟩ = ⟨T u, u⟩ ≥ 0,
which shows that λ ≥ 0.
Now assume that all eigenvalues of T are non-negative. Denote by λ1 , . . . , λn ≥
0 the eigenvalues of T , and let (u1 , . . . , un ) be a corresponding orthonormal basis
of eigenvectors. Let moreover u ∈ U . Then we can write
u = x1 u1 + . . . + xn un
for some xi ∈ K, i = 1, . . . , n. Since the vectors ui are eigenvectors of T for the
eigenvalues λi , we have that
T u = λ1 x1 u1 + . . . + λn xn un .
Since (u1 , . . . , un ) is an orthonormal basis of U and λi ≥ 0 for all i, we obtain that
⟨T u, u⟩ = ∑_{i=1}^n λi |xi |² ≥ 0,

which shows that T is positive semi-definite.


The proof for positive definiteness is similar. □

Lemma 3.65. Assume that U and V are finite dimensional inner product spaces
and T : U → V is a linear transformation. Then T ∗ T : U → U and T T ∗ : V → V
are self-adjoint positive semi-definite.
Proof. We note first that (T ∗ T )∗ = T ∗ (T ∗ )∗ = T ∗ T and (T T ∗ )∗ = (T ∗ )∗ T ∗ =
T T ∗ , which shows that T ∗ T and T T ∗ are self-adjoint.
Next we have for all u ∈ U that
⟨u, T ∗ T u⟩U = ⟨T u, T u⟩V = ∥T u∥2V ≥ 0,
and for all v ∈ V that
⟨T T ∗ v, v⟩V = ⟨T ∗ v, T ∗ v⟩U = ∥T ∗ v∥2U ≥ 0,
which proves the positive semi-definiteness of T ∗ T and T T ∗ . □

In order to relate the singular value decomposition of T to the properties of


T ∗ T and T T ∗ , we need an additional result that shows that the transformations T
and T ∗ T have the same kernel.
Lemma 3.66. Assume that U and V are finite dimensional inner product spaces
and T : U → V is a linear transformation. Then ker(T ∗ T ) = ker T .
Proof. Assume first that u ∈ ker T . Then T u = 0 and thus also T ∗ T u =
T 0 = 0. This shows that ker T ⊂ ker(T ∗ T ).

Conversely, assume that u ∈ ker(T ∗ T ). Then T ∗ T u = 0, and thus


⟨T ∗ T u, u⟩U = ⟨T u, T u⟩V = ∥T u∥2V = 0,
which shows that T u = 0 and thus u ∈ ker T . □

Theorem 3.67. Assume that U and V are finite dimensional inner product
spaces and that T : U → V is linear. Let (u1 , . . . , un ) be an orthonormal eigenbasis
of T ∗ T for the eigenvalues λ1 ≥ λ2 ≥ . . . ≥ λn ≥ 0.
Denote p = rank T , and define
σi = √λi  and  vi = T ui /σi
for i = 1, . . . , p. Then
(22) T u = ∑_{i=1}^p σi ⟨u, ui ⟩vi

is a singular value decomposition of T . In particular, the singular values of T are


precisely the non-zero eigenvalues of the mapping T ∗ T .
Proof. We have to show that the numbers σi and the vectors vi are well-
defined, and that all the conditions for a singular value decomposition are satisfied.
That is, that the sets (u1 , . . . , up ) and (v1 , . . . , vp ) are orthonormal, σ1 ≥ σ2 ≥
. . . ≥ σp > 0, and that T u actually has the form given in (22).
Since ker(T ∗ T ) = ker T , it follows that

rank(T ∗ T ) = dim(ran T ∗ T ) = dim U − dim(ker T ∗ T )


= dim U − dim(ker T ) = dim(ran T ) = rank T.
Thus the number of non-zero eigenvalues of T ∗ T (with multiplicity) is precisely p.
Since T ∗ T is positive semi-definite, all its non-zero eigenvalues are strictly positive,
and thus the numbers σi > 0 and the vectors vi are well-defined. Since λ1 ≥ λ2 ≥
. . ., we also have that σ1 ≥ σ2 ≥ . . ..
Since (u1 , . . . , un ) is an orthonormal basis of U , the family (u1 , . . . , up ) is or-
thonormal as well. Moreover, we have that
u = ∑_{i=1}^n ⟨u, ui ⟩ui

for all u ∈ U . Now we note that ui ∈ ker(T ∗ T ) = ker T for all i = p + 1, . . . , n,


since the corresponding eigenvalue λi of T ∗ T is by assumption equal to 0. Thus
T ui = 0 for i = p + 1, . . . , n, and thus
T u = ∑_{i=1}^n ⟨u, ui ⟩T ui = ∑_{i=1}^p ⟨u, ui ⟩T ui = ∑_{i=1}^p σi ⟨u, ui ⟩vi

for all u ∈ U , which shows that the representation of T given in (22) is valid.

It remains to show that the family (v1 , . . . , vp ) is orthonormal as well. For that,
we compute for 1 ≤ i, j ≤ p the inner product
⟨vi , vj ⟩V = (1/(σi σj )) ⟨T ui , T uj ⟩V = (1/(σi σj )) ⟨ui , T ∗ T uj ⟩U = (1/(σi σj )) ⟨ui , σj² uj ⟩U = (σj /σi ) ⟨ui , uj ⟩U ,
which equals 1 if i = j and 0 if i ̸= j,
as the family (u1 , . . . , up ) is orthonormal. This concludes the proof. □
CHAPTER 4

Banach and Hilbert Spaces

Until now, we have treated vector spaces and linear mappings in a purely algeb-
raic way. That is, all1 the constructions we have performed in the proofs required
finitely many well defined steps, involving only linear operations and at times also
polynomials. Although we were sometimes dealing with infinite dimensional vector
spaces, we were never required to rely on infinite sums of vectors.
This restriction to pure algebra is fine for solving some mathematical problems.
The vast majority of problems that occur in practice in physics and engineering,
however, cannot be reasonably treated in that way. Instead, it is necessary to bring
in methods from analysis as well, in particular the ideas of convergence of sequences
and continuity of mappings.2 In the following, we will introduce the main ideas in
that direction.

1. Normed Spaces
We start by defining the notion of a norm on vector spaces, which is a function
that assigns a length to each vector.
Definition 4.1 (Norm). A norm on a vector space U over K is a function
∥·∥ : U → R with the following properties:
(1) Positivity: ∥u∥ ≥ 0 for all u ∈ U , and ∥u∥ = 0 if and only if u = 0.
(2) Homogeneity: ∥λu∥ = |λ|∥u∥ for all u ∈ U and λ ∈ K.
(3) Triangle inequality: ∥u + v∥ ≤ ∥u∥ + ∥v∥ for all u, v ∈ U .
A vector space together with a norm is called a normed space. ■
Example 4.2. The space Kn with the Euclidean norm ∥x∥2 := (∑_{i=1}^n |xi |²)1/2
is a normed space. ■

The Euclidean norm is by no means the only norm on Kn . Amongst those that
are used regularly in applications are the 1-norm and the ∞-norm defined below.
Example 4.3. On the space Kn we define
∥x∥1 := |x1 | + . . . + |xn |
and

∥x∥∞ := max{|x1 |, . . . , |xn |}
for x ∈ Kn . Both of these are norms on Kn , as positivity and homogeneity are
clearly satisfied and the triangle inequality follows from the triangle inequality on
the real/complex numbers. ■
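All three norms are directly available numerically; a minimal NumPy sketch with an arbitrary test vector:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

print(np.linalg.norm(x, 1))        # 7.0,  |x_1| + ... + |x_n|
print(np.linalg.norm(x, 2))        # 5.0,  Euclidean norm
print(np.linalg.norm(x, np.inf))   # 4.0,  max_i |x_i|
```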

1Arguably with the exception of the existence of the singular value decomposition, where
we actually employed a result from calculus concerning the existence of solutions of optimisation
problems. Also, the existence of a basis of an arbitrary vector space required a significantly more
complicated argumentation, but this was relegated to the optional part of these notes.
2Differentiability of mappings would be the next important idea, but, alas, we won’t have
time in this course to go that far.


A particular class of normed spaces is that of inner product spaces. There we


have already defined a norm induced by the inner product. We now show that
that “norm” indeed satisfies the requirements put forward in Definition 4.1.
Lemma 4.4. Assume that U is an inner product space with inner product ⟨·, ·⟩.
Then
∥u∥ := ⟨u, u⟩1/2
defines a norm on U .
Proof. Since the inner product is positive definite, it follows that ∥u∥ ≥ 0 for
all u ∈ U with equality if and only if u = 0. Next, if u ∈ U and λ ∈ K, then
∥λu∥ = ⟨λu, λu⟩1/2 = (λλ̄⟨u, u⟩)1/2 = |λ|⟨u, u⟩1/2 = |λ|∥u∥,
which proves the positive homogeneity. Finally, the triangle inequality has been
shown in Theorem 3.12. □

It is obvious that the norms ∥·∥1 and ∥·∥∞ on the space Kn (with n ≥ 2) are
different from the Euclidean norm that is induced by the standard inner product.
However, it might be still possible that there is another inner product on Kn that
induces these norms. We will show in the following that this is not the case by
deducing a property that all norms induced by an inner product need to satisfy.
Lemma 4.5 (Parallelogram law). Assume that U is an inner product space and
that ∥u∥ = ⟨u, u⟩1/2 is the induced norm on U . Then
∥u − v∥2 + ∥u + v∥2 = 2∥u∥2 + 2∥v∥2
for all u, v ∈ U .
Proof. We write

∥u − v∥2 + ∥u + v∥2 = ⟨u − v, u − v⟩ + ⟨u + v, u + v⟩
= ⟨u, u⟩ − ⟨v, u⟩ + ⟨v, v⟩ − ⟨u, v⟩ + ⟨u, u⟩ + ⟨v, u⟩ + ⟨v, v⟩ + ⟨u, v⟩
= 2⟨u, u⟩ + 2⟨v, v⟩ = 2∥u∥2 + 2∥v∥2 .

Example 4.6. The norms ∥·∥1 and ∥·∥∞ on Kn with n ≥ 2 are not induced by
an inner product. We show this by demonstrating that the parallelogram law fails
to hold in both cases. For this, we take u = e1 and v = e2 , the first and second
standard basis vector in Kn . Then
∥e1 − e2 ∥21 + ∥e1 + e2 ∥21 = 4 + 4 = 8,
whereas
2∥e1 ∥21 + 2∥e2 ∥21 = 2 + 2 = 4.
Similarly,
∥e1 − e2 ∥2∞ + ∥e1 + e2 ∥2∞ = 1 + 1 = 2,
whereas
2∥e1 ∥2∞ + 2∥e2 ∥2∞ = 2 + 2 = 4.
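The same computation can be phrased as a quick numerical check; a NumPy sketch (the vectors e1, e2 are as above, the "defect" is the left-hand side minus the right-hand side of the parallelogram law):

```python
import numpy as np

def parallelogram_defect(u, v, ord):
    n = lambda w: np.linalg.norm(w, ord)
    return n(u - v) ** 2 + n(u + v) ** 2 - 2 * n(u) ** 2 - 2 * n(v) ** 2

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

print(parallelogram_defect(e1, e2, 2))        # 0.0:  the law holds for the 2-norm
print(parallelogram_defect(e1, e2, 1))        # 4.0:  8 - 4, as computed above
print(parallelogram_defect(e1, e2, np.inf))   # -2.0: 2 - 4, as computed above
```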

Interestingly, the parallelogram law provides a complete characterisation of


norms induced by inner products:

Theorem 4.7 (Jordan–von Neumann). Assume that (U, ∥·∥) is a normed space
and that the parallelogram law
∥u − v∥2 + ∥u + v∥2 = 2∥u∥2 + 2∥v∥2
holds for all u, v ∈ U . Then there exists an inner product ⟨·, ·⟩ on U such that
∥u∥ = ⟨u, u⟩1/2 for all u ∈ U .
• If U is a real vector space, then ⟨·, ·⟩ is given as
⟨u, v⟩ = (1/4) (∥u + v∥² − ∥u − v∥²) .
• If U is a complex vector space, then ⟨·, ·⟩ is given as
⟨u, v⟩ = (1/4) ∑_{k=1}^4 iᵏ ∥u + iᵏ v∥² .

Proof. I will add the proof for the real case when I find time. (It is not
difficult to check that the requirements for an inner product are satisfied in the
real case, but it is somewhat tedious. In the complex case, the tediousness rises
significantly.) □
Analogues of the 1-norm and ∞-norm can also be defined for function spaces:
Example 4.8. The space C([0, 1]; Cn ) of continuous functions f : [0, 1] → Cn
can be made a normed space with the norm
∥f ∥∞ := max_{x∈[0,1]} |f (x)|2 .

Here we denote by |·|2 the Euclidean norm on Cn (in order to minimise the confusion
possibilities with the norm on C([0, 1]; Cn )). There are (reasonable) alternatives to
this norm, though. For instance, we can define the norm
∥f ∥1 := ∫₀¹ |f (x)|2 dx.
We will see later that these two norms have pretty different properties. ■

We will now state a consequence of the triangle inequality that turns out to be
useful in a large number of estimates.
Lemma 4.9 (Reverse triangle inequality). Assume that (U, ∥·∥) is a normed
space and that u, v ∈ U . Then
| ∥u∥ − ∥v∥ | ≤ ∥u − v∥.
Proof. We have that
∥u∥ = ∥u − v + v∥ ≤ ∥u − v∥ + ∥v∥
and therefore
∥u∥ − ∥v∥ ≤ ∥u − v∥.
Similarly,
∥v∥ = ∥v − u + u∥ ≤ ∥v − u∥ + ∥u∥
and thus
∥v∥ − ∥u∥ ≤ ∥u − v∥.
Together, these inequalities imply that
| ∥u∥ − ∥v∥ | ≤ ∥u − v∥,
which proves the assertion. □

2. Convergence and Continuity


On a normed space, we can say whether two vectors are close to each other, as
the norm gives us a method for calculating distances between vectors. This in turn
gives a method for defining convergence of sequences:
Definition 4.10 (Convergence and limits). Assume that (U, ∥·∥) is a normed
space and that (un )n∈N ⊂ U is a sequence in U . We say that the sequence converges
to u ∈ U , if there exists for every ε > 0 some index N ∈ N such that
∥un − u∥ < ε whenever n ≥ N.
We call u a limit of the sequence (un )n∈N and denote the convergence by
u = lim un .
n→∞

Remark 4.11. The statement that for all ε > 0 there exists N ∈ N such that
∥un − u∥ < ε whenever n ≥ N is precisely the same as the convergence of the
sequence of real numbers ∥un − u∥ to 0. That is, we have that
lim un = u ⇐⇒ lim ∥un − u∥ = 0.
n→∞ n→∞

Example 4.12. Consider the space C([0, 1]) of continuous real valued functions
on the interval [0, 1] with the norm
∥f ∥1 = ∫₀¹ |f (x)| dx,

and define the sequence (fn )n∈N by


fn (x) := xn .
Then this sequence converges to the function f (x) = 0. Indeed, we have that
∥fn − f ∥1 = ∫₀¹ |xn − 0| dx = ∫₀¹ xn dx = 1/(n + 1),
which shows that
lim ∥fn − f ∥1 = 0.
n→∞
However, if we consider the same space with the norm
∥f ∥∞ := max_{x∈[0,1]} |f (x)|,

then the same sequence does not converge to 0 (in fact, it does not converge at all).
In this case, we have that
∥fn − 0∥∞ = max_{x∈[0,1]} |xn − 0| = 1

for all n ∈ N. This shows that the notion of convergence of a sequence can change
drastically if we change the norm on the space. ■
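Both convergence statements can be approximated numerically by discretising the interval [0, 1]; a NumPy sketch (the grid-based quantities only approximate the integral and the maximum):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 10_001)

for n in (1, 5, 25, 125):
    fn = x ** n
    norm_1 = fn.mean()        # grid approximation of ||f_n - 0||_1 = 1/(n+1)
    norm_inf = fn.max()       # grid approximation of ||f_n - 0||_inf = 1
    print(n, round(norm_1, 4), norm_inf)
# The 1-norms tend to 0, while the sup-norms stay equal to 1 for every n.
```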

Lemma 4.13 (Uniqueness of the limit). Assume that (U, ∥·∥) is a normed space
and that (un )n∈N ⊂ U is a sequence in U . If the limit limn→∞ un exists, it is
unique.
Proof. Let u = limn→∞ un and v = limn→∞ un . Assume to the contrary that
u ̸= v. Then ∥u − v∥ > 0. Define now ε := ∥u − v∥/2. Because of the convergence
of the sequence (un )n∈N to u and v, there exist N1 , N2 ∈ N such that ∥un − u∥ < ε

whenever n ≥ N1 and ∥un − v∥ < ε whenever n ≥ N2 . Let now n := max{N1 , N2 }.


Then the triangle inequality and positive homogeneity of the norm imply that
2ε = ∥u − v∥ ≤ ∥u − un ∥ + ∥un − v∥ = ∥un − u∥ + ∥un − v∥ < ε + ε = 2ε,
which is clearly a contradiction. This shows that u = v. □
As a consequence of this uniqueness result, we can indeed talk about the limit
of a convergent sequence (un )n∈N in a normed space.
Definition 4.14 (Bounded sets). Assume that (U, ∥·∥) is a normed space and
that A ⊂ U is some subset. We say that A is bounded, if it is contained in some
ball centered at 0, that is, there exists r > 0 such that A ⊂ Br (0). ■

Proposition 4.15. Assume that (U, ∥·∥) is a normed space and that (uk )k∈N ⊂
U is a convergent sequence in U . Then the set {uk : k ∈ N} is bounded.
Proof. Denote u := limk→∞ uk . Then there exists K ∈ N such that ∥u−uk ∥ ≤
1 whenever k ≥ K. (Here we have used the definition of convergence with the choice
ε = 1.) Now define
r := max{∥u1 ∥, ∥u2 ∥, . . . , ∥uK−1 ∥, ∥u∥ + 1} + 1.
Then we obviously have that ∥uk ∥ < r whenever k < K. Moreover, for k ≥ K it
follows from the triangle inequality that
∥uk ∥ ≤ ∥uk − u∥ + ∥u∥ ≤ 1 + ∥u∥ < r.
Thus ∥uk ∥ < r for all k ∈ N, or, put differently, uk ∈ Br (0) for all k ∈ N. This
shows the boundedness of the set {uk : k ∈ N}. □
Having defined convergence of sequences in normed spaces, we can now continue
with the definition of continuity of functions between normed spaces. The definition
is, in a sense, a straight-forward generalisation of continuity of functions on R; we
simply replace each occurrence of an absolute value by the norm on the respective
normed space.
Definition 4.16 (Continuity). Assume that (U, ∥·∥U ) and (V, ∥·∥V ) are normed
spaces and that f : U → V is a mapping. We say that f is continuous at a point
u ∈ U if for every sequence (un )n∈N ⊂ U with limn→∞ un = u we have that
limn→∞ f (un ) = f (u).
We say that f is continuous everywhere or simply continuous, if it is continuous
at every point u ∈ U . ■

Alternatively to this definition of continuity, we have the ε-δ-characterisation


that everyone loves and likes:
Theorem 4.17. Let (U, ∥·∥U ) and (V, ∥·∥V ) be normed spaces. The function
f : U → V is continuous at u ∈ U if and only if for every ε > 0 there exists δ > 0
such that
(23) ∥f (u) − f (v)∥V < ε whenever v ∈ V satisfies ∥u − v∥U < δ.
Proof. Assume first that for every ε > 0 there exists δ > 0 such that (23) holds.
Now let (un )n∈N be a sequence converging to u. We have to show that (f (un ))n∈N converges to f (u).
That is, we have to show that there exists for every ε > 0 some N ∈ N such that
∥f (un ) − f (u)∥V < ε whenever n ≥ N .
Let therefore ε > 0 and let δ > 0 be such that (23) holds. Since u = limn→∞ un ,
there exists N ∈ N such that ∥un − u∥U < δ for all n ≥ N . According to (23), this
implies that ∥f (un ) − f (u)∥V < ε for all n ≥ N . This proves that f is continuous
at u.

Now assume that (23) does not hold. That is, assume that there exists some
ε > 0 such that for every δ > 0 there exists û ∈ U with ∥u − û∥ < δ and ∥f (u) −
f (û)∥V ≥ ε. Applying this with δ = 1/n, this means that we can find a sequence
(un )n∈N such that ∥u−un ∥U < 1/n and ∥f (u)−f (un )∥ ≥ ε. Since ∥u−un ∥U < 1/n,
we then have that u = limn→∞ un . On the other hand, since ∥f (u) − f (un )∥V ≥ ε,
the sequence (f (un ))n∈N does not converge to f (u). This shows that f is not
continuous at u. □
We will now show that norms work well together with the linear structure of a
vector space in that the basic operations of addition and scalar multiplication will
always be continuous on a normed space.
Theorem 4.18. Assume that U is a normed space. Then the following hold:
(1) Continuity of the norm: If un →n→∞ u, then ∥un ∥ →n→∞ ∥u∥.
(2) Continuity of addition: If un →n→∞ u and vn →n→∞ v, then (un +
vn ) →n→∞ u + v.
(3) Continuity of scalar multiplication: If un →n→∞ u and λn →n→∞ λ, then
(λn un ) →n→∞ λu.
Proof. Assume that u = limn→∞ un , that is, ∥u − un ∥ → 0 as n → ∞. Then
the reverse triangle inequality implies that
| ∥u∥ − ∥un ∥ | ≤ ∥u − un ∥ → 0 as n → ∞,
which proves that ∥un ∥ → ∥u∥.
Now assume that u = limn→∞ un and v = limn→∞ vn . Then
∥u + v − (un + vn )∥ = ∥u − un + v − vn ∥ ≤ ∥u − un ∥ + ∥v − vn ∥ → 0 as n → ∞,
which shows that u + v = limn→∞ (un + vn ).
Finally, assume that u = limn→∞ un and λ = limn→∞ λn . Then
∥λu − λn un ∥ = ∥λu − λun + λun − λn un ∥ ≤ ∥λ(u − un )∥ + ∥(λ − λn )un ∥
= |λ|∥u − un ∥ + |λ − λn |∥un ∥ → 0 as n → ∞,
as the sequence (∥un ∥)n∈N converges to ∥u∥ and therefore is bounded. □
Example 4.19. On Kn we define the 0-“norm” ∥·∥0 by
∥u∥0 := #{k : uk ̸= 0}.
That is, ∥u∥0 is the number of non-zero components of the vector u. We stress that
this is not a norm on Kn , because the homogeneity is not satisfied: We have for
instance that ∥e1 ∥0 = 1, but also ∥2e1 ∥0 = 1.
The other properties of a norm are satisfied, though: Non-negativity is clear
from the definition, and ∥u∥0 = 0 if and only if the vector u has no non-zero
components, that is, if u = 0. Also, the number of non-zero components of a vector
u + v cannot be larger than the sum of the numbers of non-zero components of u and v, which proves
the triangle inequality.
However, scalar multiplication is not a continuous operation on (Kn , ∥·∥0 ). Take
for instance the constant sequence un = e1 for all n and λn = 1/n. Then un → e1
and λn → 0 as n → ∞, but
∥λn un − 0∥0 = ∥(1/n) e1 ∥0 = 1 for all n,
that is, the sequence (λn un )n∈N does not converge to 0. ■
There is an alternative characterisation of continuity that can be much easier
to work with in certain cases. For that, however, we will need to introduce some
additional notation, or, rather, generalise some well known notation from Rn to
arbitrary normed spaces.

3. Open and Closed Sets


Definition 4.20 (Open ball). Assume that (U, ∥·∥) is a normed space, let
u ∈ U and r > 0. By

Br (u) := v ∈ V : ∥u − v∥ < r
we denote the (open) ball of radius r around u. ■

Definition 4.21 (Open set). A subset S ⊂ U of a normed space (U, ∥·∥) is


open if there exists for every u ∈ S some r > 0 such that Br (u) ⊂ S. ■

That is, a set S is open if we can find for each point u in S a small ball around
u that is still completely contained in S.
In particular, open balls are actually open sets according to this definition.
While this statement might seem trivial at first glance, it actually is not and does
deserve a proof: In order to show that Br (u) is an open set, we have to find for
each v ∈ Br (u) some ball Bs (v) such that Bs (v) ⊂ Br (u).
Lemma 4.22. Let (U, ∥·∥) be a normed space, let u ∈ U and r > 0. Then the
open ball Br (u) is an open set.
Proof. Exercise! □
Example 4.23. Consider the space C([0, 1]) with the norm ∥·∥∞ . Then the set

U := {f ∈ C([0, 1]) : f (x) > 0 for all x ∈ [0, 1]}
is open.
In order to see that this is actually true, assume that f ∈ U . Since f is
continuous, the function f admits its minimum on the (bounded and closed) interval
[0, 1]. That is, the optimisation problem minx∈[0,1] f (x) has at least one solution
x∗ ∈ [0, 1]. Define now r := f (x∗ ). Since f ∈ U , we have that r > 0. Also, from
the definition of r it follows that f (x) ≥ r for all x ∈ [0, 1].
We claim that Br (f ) ⊂ U . Assume therefore that g ∈ Br (f ). Then
g(x) ≥ f (x) − |f (x) − g(x)| ≥ f (x) − max_{y∈[0,1]} |f (y) − g(y)| = f (x) − ∥f − g∥∞ > f (x∗ ) − r = 0


for all x ∈ [0, 1], which shows that g ∈ U . Since this holds for every g ∈ Br (f ), it
follows that Br (f ) ⊂ U , which in turn shows that U is open. ■

We now revisit the previous example with a different norm.


Example 4.24. Consider the space C([0, 1]) with the norm ∥·∥1 . Then the set

U := {f ∈ C([0, 1]) : f (x) > 0 for all x ∈ [0, 1]}
is not open.
Take for instance the function f (x) = 1, which is clearly contained in U . For
every r > 0 we can then define the function
fr (x) := −1 + 4x/r for 0 ≤ x < r/2, and fr (x) := 1 for r/2 ≤ x ≤ 1.
It is easy to see that this function is continuous and not contained in U (since
fr (0) = −1). In addition, we have that
∥fr − f ∥1 = ∫₀¹ |fr (x) − f (x)| dx = ∫₀^{r/2} (2 − 4x/r) dx = r/2 < r.
Thus fr ∈ Br (f ) and fr ̸∈ U . Since r was arbitrary, this shows that U is not
open. ■

Definition 4.25 (Closed set). A subset K ⊂ U of a normed space (U, ∥·∥) is


closed, if every sequence in K that converges in U has its limit in K. That is, if
(un )n∈N is a sequence with un ∈ K for all n ∈ N such that u = limn→∞ un exists
in U , then u ∈ K. ■

Proposition 4.26 (Relation between open and closed sets). A set S in a


normed space (U, ∥·∥) is open, if and only if its complement S C := U \ S is closed.
Proof. Assume that S ⊂ U is open. Let moreover (uk )k∈N be a convergent
sequence with uk ∈ S C for all k ∈ N. We have to show that u := limk→∞ uk ∈ S C .
Assume to the contrary that u ∈ S. Since S is open, there exists ε > 0 such that
Bε (u) ⊂ S. Next, since u = limk→∞ uk , there exists K ∈ N such that ∥uk − u∥ < ε
for all k ≥ K. In other words, uk ∈ Bε (u) ⊂ S for all k ≥ K. This is clearly a
contradiction to the assumption that uk ̸∈ S for all k ∈ N. Thus u ∈ S C , which
shows that S C is closed.
Now assume that S is not open. Then there exists u ∈ S such that for every
ε > 0 we have that Bε (u) ̸⊂ S. That is, for every ε > 0 we have that Bε (u)∩S C ̸= ∅.
As a consequence, we can find for each k ∈ N some uk ∈ S C with ∥uk − u∥ < 1/k.
The resulting sequence (uk )k∈N then by construction converges to u and satisfies
that uk ∈ S C for all k ∈ N. However, limk→∞ uk = u ̸∈ S C , which shows that S C
is not closed. □
Lemma 4.27. Assume that (U, ∥·∥) is a normed space, that u ∈ U , and r > 0.
Denote by

B̄r (u) := {v ∈ U : ∥u − v∥ ≤ r}
the closed ball of radius r around u. Then B̄r (u) is a closed set.
Proof. Exercise! □
Now we can formulate an alternative characterisation of continuity:
Theorem 4.28 (Continuity). Let (U, ∥·∥U ) and (V, ∥·∥V ) be normed spaces and
let f : U → V be a function. The following are equivalent:
(1) The function f is continuous.
(2) For every open set A ⊂ V the set f −1 (A) = {u ∈ U : f (u) ∈ A} is open.
(3) For every closed set K ⊂ V the set f −1 (K) = {u ∈ U : f (u) ∈ K} is closed.
Proof. For the equivalence of the last two items, we observe that
f −1 (A)C = {u ∈ U : f (u) ̸∈ A} = f −1 (AC ).
Now the equivalence of these items follows from the fact that a set is open if and
only if its complement is closed.
It remains to show the equivalence of the first item with any of the others. For
this, we recall that the function f is continuous, if and only if for every u ∈ U
and every ε > 0 there exists δ > 0 such that ∥f (v) − f (u)∥V < ε whenever v ∈ V
satisfies ∥v − u∥U < δ. Now observe that this is equivalent to stating that for every
u ∈ U and every ε > 0 there exists δ > 0 such that
Bδ (u) ⊂ f −1 (Bε (f (u))).
Assume now that Item 2 holds. Let moreover u ∈ U and let ε > 0. Since the set
Bε (f (u)) ⊂ V is open, it follows that also C := f −1 (Bε (f (u))) is open. Moreover,
we obviously have that u ∈ C. Thus, as C is open, there exists δ > 0 such that
Bδ (u) ⊂ C = f −1 (Bε (f (u))). Since u ∈ U and ε > 0 were arbitrary, it follows that
f is continuous.

Assume now that f is continuous and let A ⊂ V be open. We have to show


that f −1 (A) is open. Assume therefore that u ∈ f −1 (A). Then f (u) ∈ A. Since
A is open, there exists ε > 0 such that Bε (f (u)) ⊂ A. Now the continuity of f
implies the existence of δ > 0 such that Bδ (u) ⊂ f −1 (Bε (f (u))). Now the fact that
Bε (f (u)) ⊂ A implies that
Bδ (u) ⊂ f −1 (Bε (f (u))) ⊂ f −1 (A).
Since u ∈ f −1 (A) was arbitrary, this proves that f −1 (A) is open. □
Remark 4.29. Assume that (U, ∥·∥) is a normed space and that f : U → R is
continuous. Then the sets
{u ∈ U : f (u) = 0} = f −1 ({0})
and
{u ∈ U : f (u) ≥ 0} = f −1 ([0, ∞))
are closed, and the set
{u ∈ U : f (u) > 0} = f −1 ((0, ∞))
is open. ■

We now investigate open and closed sets a bit further. We show first that
arbitrary unions of open sets are open, and that arbitrary intersections of closed
sets are closed.
Proposition 4.30. Assume that (U, ∥·∥) is a normed space and that Ai , i ∈ I,
are open subsets of U . Then the set A := ∪_{i∈I} Ai is open.
Similarly, if Ki , i ∈ I, are closed subsets of U , then the set K := ∩_{i∈I} Ki is
closed.
Proof. Assume that u ∈ A = ∪_{i∈I} Ai . Then there exists an index j ∈ I such
that u ∈ Aj . Since Aj is open, there exists r > 0 such that Br (u) ⊂ Aj . This
implies that
Br (u) ⊂ Aj ⊂ ∪_{i∈I} Ai = A,
which shows that A is open.
In order to show that K is closed, we note that
K C = ( ∩_{i∈I} Ki )C = ∪_{i∈I} KiC .

Since Ki is closed for each i, the set KiC


is open. Thus K C is the union of open
sets and therefore open. As a consequence K is closed. □
For intersections of open sets, the situation is different. It is still possible to
show that finite intersections of open sets are open, but for infinite intersections,
this is usually not the case.
Proposition 4.31. Assume that (U, ∥·∥) is a normed space and that A1 , . . . , An
are finitely many open sets. Then the set A := ∩_{k=1}^n Ak is open.
Similarly, if K1 , . . . , Kn are finitely many closed sets, then K := ∪_{k=1}^n Kk is
closed.
Proof. Let u ∈ ∩_{k=1}^n Ak . Then u ∈ Ak for each k. Since each set Ak is
open, there exist rk > 0, k = 1, . . . , n, such that Brk (u) ⊂ Ak . Now define
r := min{r1 , . . . , rn }. Then Br (u) ⊂ Brk (u) ⊂ Ak for each k, and thus Br (u) ⊂
∩_{k=1}^n Ak = A. This shows that A is open.

For showing that K is closed, we write again


K C = ( ∪_{k=1}^n Kk )C = ∩_{k=1}^n KkC .

Each set KkC is open, and thus K C is the finite intersection of open sets and
therefore itself open. Thus K is closed. □
Definition 4.32 (Interior, closure, and boundary). Let (U, ∥·∥) be a normed
space, and let A ⊂ U be a set.
• The interior A◦ of A is the union of all open sets contained in A.
• The closure Ā of A is the intersection of all closed sets containing A.
• The boundary ∂A of A is defined as ∂A = Ā \ A◦ .

Remark 4.33. Since the union of open sets is open, and the intersection of
closed sets is closed, it follows that A◦ is open, and Ā is closed. Moreover, the set
∂A is the intersection of the closed set Ā and the closed set (A◦ )C , and therefore
closed as well.
In addition, we can characterise A◦ as the largest open set contained in A, and Ā as
the smallest closed set containing A. ■

Proposition 4.34. Let (U, ∥·∥) be a normed space, and let A ⊂ U be a set.
Then u ∈ A◦ if and only if there exists r > 0 such that Br (u) ⊂ A.
Proof. Assume first that u ∈ A◦ . By the definition of A◦ , there exists an
open set D ⊂ A such that u ∈ D. Since D is open, there exists r > 0 such that
Br (u) ⊂ D. As a consequence, Br (u) ⊂ D ⊂ A.
Conversely, assume that there exists r > 0 such that Br (u) ⊂ A. Since the set
Br (u) is open and contained in A, it follows that Br (u) ⊂ A◦ . In particular, this
shows that u ∈ A◦ . □
Lemma 4.35. Let (U, ∥·∥) be a normed space, and let A ⊂ U be a set. Then
(A◦ )C = A̅C̅ and (Ā)C = (AC )◦ .
Proof. The set A◦ is open and A◦ ⊂ A. Thus (A◦ )C is closed and AC ⊂
(A◦ )C . This shows that A̅C̅ ⊂ (A◦ )C , as A̅C̅ is the smallest closed set containing AC .
Similarly, Ā is closed and A ⊂ Ā. Thus (Ā)C is open and (Ā)C ⊂ AC . This
shows that (Ā)C ⊂ (AC )◦ , as (AC )◦ is the largest open set contained in AC .
Now denote D := AC . Applying the first result we have shown in this proof,
we obtain that D̅C̅ ⊂ (D◦ )C , which can be rewritten as Ā ⊂ ((AC )◦ )C , or (AC )◦ ⊂
(Ā)C .
Similarly, applying the second result we have shown in this proof to D = AC ,
we obtain that (D̄)C ⊂ (DC )◦ , or (A̅C̅)C ⊂ A◦ , which shows that (A◦ )C ⊂ A̅C̅ . □
Lemma 4.36. Let (U, ∥·∥) be a normed space, and let A ⊂ U be a set. Then
∂A = Ā ∩ A̅C̅ .
Proof. By definition, ∂A = Ā \ A◦ . However, by Lemma 4.35 we have that
(A◦ )C = A̅C̅ . Thus
∂A = Ā \ A◦ = Ā ∩ (A◦ )C = Ā ∩ A̅C̅ . □

Lemma 4.37. Let (U, ∥·∥) be a normed space, and let A ⊂ U be a set. Then
u ∈ Ā, if and only if for all r > 0 we have that Br (u) ∩ A ̸= ∅.

Proof. By Lemma 4.35, we have that u ∈ Ā, if and only if u ̸∈ (AC )◦ . By


Proposition 4.34, this is the case if and only if for all r > 0 we have that Br (u) ̸⊂ AC .
This, in turn, is the case if and only if for all r > 0 we have that Br (u) ∩ A ̸= ∅,
which proves the assertion. □
Lemma 4.38. Let (U, ∥·∥) be a normed space, and let A ⊂ U be a set. Then
u ∈ ∂A, if and only if for all r > 0 we have that Br (u) ∩ A ̸= ∅ and Br (u) ∩ AC ̸= ∅.
Proof. By Lemma 4.36, we have that u ∈ ∂A, if and only if u ∈ Ā and
u ∈ A̅C̅ . By applying Lemma 4.37 to both A and AC , we thus obtain the claimed
result. □
Proposition 4.39. Let (U, ∥·∥) be a normed space, and let A ⊂ U be a set. The
closure Ā of A consists of all limits of convergent sequences contained in A. That is,
u ∈ Ā, if and only if there exists a sequence (uk )k∈N ⊂ A such that u = limk→∞ uk .
Proof. Since \overline{A} is closed, it contains the limits of all convergent sequences in
\overline{A}, and thus a fortiori the limits of all convergent sequences in A.
Conversely, assume that u ∈ A. By Lemma 4.37, we have for all k ∈ N that
B1/k (u) ∩ A ̸= ∅. In other words, for every k ∈ N we can find uk ∈ A such that
∥uk − u∥ < 1/k. Thus u = limk→∞ uk , which proves the assertion. □
Definition 4.40 (Dense set). Assume that (U, ∥·∥) is a normed space, that
A ⊂ B ⊂ U are subsets of U and that B is closed. We say that A is dense in B, if
\overline{A} = B. ■
Equivalently, we have that A is dense in B, if there exists for every u ∈ B some
sequence (uk )k∈N with u = limk→∞ uk and uk ∈ A for all k.
Example 4.41. The set Q of rational numbers is dense in R, as we can approx-
imate every real number up to arbitrary (finite) accuracy by a rational number. ■
The following theorems show some non-trivial (and fairly hard to prove) ex-
amples of dense subsets:
Theorem 4.42 (Stone–Weierstraß). The space P of polynomials is dense in
C([0, 1]) with respect to the norm ∥·∥∞ .
Theorem 4.43 (Weierstraß). The space of trigonometric polynomials, that is,
the set of functions of the form f (x) = \sum_{k=−N}^{N} ck e^{ıkx} with N ∈ N and ck ∈ C, is
dense in the space of complex 2π-periodic continuous functions on R with respect
to the norm ∥·∥∞ .
Remark 4.44. The previous result does not state that the Fourier series of an
arbitrary continuous function f converges with respect to ∥·∥∞ to f . In fact, there
exist examples of continuous functions for which the Fourier series does not converge.
However, it is possible to show that running averages (so called “Cesàro means”)
of the Fourier series converge. ■
4. Cauchy–sequences and Completeness
A large portion of linear algebra does not make full use of the real numbers, but
can as well be formulated just for vector spaces over the rational numbers Q. For
instance, if a system of linear equations with only rational coefficients is solvable
within the real numbers, it is already solvable within the rational numbers. The
situation changes a bit if we want to talk about eigenvalues and singular values, and
the corresponding decompositions, as these necessarily involve solutions of algebraic
equations. Still, one can formulate all the theory that we covered in this course up
til now by moving from the rational numbers to the so called algebraic numbers,
which include all roots of polynomials with rational coefficients.
However, we are now in the process of introducing analytic tools into linear
algebra, and analysis does not work well with rational (or algebraic) numbers, as
many of the well-known results from calculus fail to hold in that setting. For
instance, the intermediate value theorem that states that a continuous function f
on an interval [a, b] takes as function values all the numbers between f (a) and f (b)
does not hold if we only allow for function values that are rational.
A major problem is that many sequences of rational numbers that appear to be
converging, actually don’t. Or, to be more precise, they only converge if we allow
the limit to lie outside the rational numbers. In fact, this is the main reason for
the introduction of the real numbers. One intuition here is that Q is incomplete
in the sense that it has “holes”, which are filled by the irrational numbers. Take for instance the sequence
(xk )k∈N of rational numbers defined by
    (24)    xk := \sum_{ℓ=0}^{k} \frac{1}{ℓ!} .
This is a sequence of rational numbers that converges to Euler’s number e. How-
ever, Euler’s number is not rational. Thus, if we restrict ourselves only to the
rational numbers, the sequence does not converge at all, even though it should (the
remainder \sum_{ℓ=k+1}^{∞} 1/ℓ! tends very fast to 0).
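As a small numerical aside (not part of the notes themselves), one can check both claims with exact rational arithmetic in Python: the partial sums xk are rational, their mutual distances become arbitrarily small, and their values approach e, which is irrational.

    from fractions import Fraction
    import math

    def x(k):
        # Partial sum x_k = sum_{l=0}^{k} 1/l! as an exact rational number.
        return sum(Fraction(1, math.factorial(l)) for l in range(k + 1))

    # The differences |x_{n+5} - x_n| become arbitrarily small (Cauchy property),
    for n in [5, 10, 15]:
        print(n, float(abs(x(n + 5) - x(n))))

    # while the values approach Euler's number e, which is not rational.
    print(float(x(20)), math.e)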
In the case of the rational numbers, we get around that problem by introducing
the real numbers and working with them instead when we want to do calculus.
Still, when we start working with (infinite dimensional) normed spaces over the
real numbers, we observe that similar phenomena occur. Take for instance the space
C([0, 1]) of continuous real functions on the interval [0, 1] with the norm ∥f ∥1 =
\int_0^1 |f (x)| dx, and consider the sequence of functions
    (25)    fk (x) = \begin{cases} 0 & \text{if } x < 1/2, \\ k (x − 1/2) & \text{if } 1/2 ≤ x ≤ 1/2 + 1/k, \\ 1 & \text{if } x > 1/2 + 1/k. \end{cases}
This sequence appears to converge to the function
    f (x) = \begin{cases} 0 & \text{if } x ≤ 1/2, \\ 1 & \text{if } x > 1/2. \end{cases}
This function, however, is not continuous and thus not contained in the space
C([0, 1]) we are interested in. Even though the sequence appears to converge, it
does not actually do so within the space of continuous functions.
On the space C 1 ([−1, 1]) of continuously differentiable real-valued functions on
the interval [−1, 1], we can consider the norm
    ∥f ∥∞,1 := \max_{x∈[−1,1]} |f (x)| + \int_{−1}^{1} |f ′ (x)| dx.
Now consider the sequence of functions (fk )k∈N ⊂ C 1 ([−1, 1]), fk (x) := \sqrt{x2 + 1/k}.
Then |fk (x) − |x|| → 0 as k → ∞ for every x, and also fk′ (x) → sgn(x) for all x ̸= 0 (which is
the derivative of the function x 7→ |x| outside of 0), and it therefore appears as if
the sequence (fk )k∈N would converge to the function f (x) = |x|. Again, we have
that f ̸∈ C 1 ([−1, 1]), but the situation is even worse than in the previous case: We
cannot even define ∥f ∥∞,1 , because we would need the derivative of f to be able to
do so.
In the cases discussed above, it appears pretty clear to us that the sequence we
consider should converge, and it is possible for us to find a limit after turning to
a larger space: the space of real numbers in the case of sequences in Q, and some
space of piecewise continuous or piecewise differentiable functions in the
second case.
Now there are (at least) two questions: First, can we formally characterise
the situation where this phenomenon occurs? That is, can we formalise a notion of
completeness and incompleteness of normed spaces, and can we do so in an intrinsic
way without reference to any larger spaces?
space (U, ∥·∥) is incomplete, can we find a larger space (V, ∥·∥) containing U , such
that every sequence in U that should converge, actually converges in V ? In order
to begin to tackle this question, we first have to clarify what we mean when we say
that a sequence “should converge”. Even more, we need to find a suitable definition
that does not rely on the limit of the sequence (which as of now might not exist).
For that, we recall the definition of convergence of a sequence in a normed
space: The sequence (un )n∈N converges in the normed space (U, ∥·∥) to u ∈ U , if
there exists for each ε > 0 some N ∈ N such that ∥un − u∥ < ε for all n ≥ N .
Now assume that this is the case, and take n, m ≥ N . Then the triangle inequality
implies that
∥un − um ∥ ≤ ∥un − u∥ + ∥um − u∥ < 2ε.
Note here that the inequality ∥un −um ∥ < 2ε makes no mention of the limit u of the
sequence. Thus we can use this in a definition of sequences that should converge.
Definition 4.45 (Cauchy sequence). A sequence (un )n∈N in a normed space
(U, ∥·∥) is a Cauchy sequence, if there exists for each ε > 0 some N ∈ N such that
∥un − um ∥ < ε whenever n, m ≥ N.
■
From our discussion before the definition, we immediately get the following
result:
Lemma 4.46. Every convergent sequence in a normed space is a Cauchy se-
quence.
Conversely, the sequence defined in (24) is a Cauchy sequence in Q, but does not
converge in that space. Moreover, within the real numbers, there is no difference
between convergent sequences and Cauchy sequences. As a preparation for this
result, we show that Cauchy sequences, like convergent sequences, are bounded.
Lemma 4.47. Assume that (U, ∥·∥) is a normed space and that (un )n∈N is a
Cauchy sequence in U . Then the sequence (un )n∈N is bounded.
Proof. Since the sequence (un )n∈N is a Cauchy sequence, there exists N ∈ N
such that ∥un − um ∥ < 1 whenever n, m ≥ N . Define now
r := max{∥u1 ∥, ∥u2 ∥, . . . , ∥uN ∥} + 1.
Then by construction un ∈ Br (0) for all 1 ≤ n ≤ N . Also, for n ≥ N we have that
∥un ∥ ≤ ∥un − uN ∥ + ∥uN ∥ < ∥uN ∥ + 1 ≤ r.
Thus we have that un ∈ Br (0) for all n ∈ N, which shows that the sequence is
bounded. □
Theorem 4.48. Every Cauchy sequence in the real numbers R converges.
Proof∗ . Assume that (xn )n∈N is a Cauchy sequence in R. Then by Lemma 4.47,
the sequence (xn )n∈N is bounded. Define now for N ∈ N
    aN := inf{ xn : n ≥ N }    and    bN := sup{ xn : n ≥ N }.
Note that the existence of aN and bN follows from Axiom 1.36. Also, we have that
    inf{ xn : n ∈ N } = a1 ≤ aN ≤ bM ≤ b1 = sup{ xn : n ∈ N }
for every N , M ∈ N. Define now
    x := sup{ aN : N ∈ N }.
Then we have by definition that x ≥ aN for all N . In addition, since aN ≤ bM
for all N and M , it follows that also x ≤ bM for all M ∈ N. We now claim that
x = limn→∞ xn .
Let therefore ε > 0. Since (xn )n∈N is a Cauchy sequence, there exists some
N ∈ N such that |xn − xm | < ε/2 for all n, m ≥ N . Let now n ≥ N be fixed.
Since aN = inf{ xk : k ≥ N }, there exists some m ≥ N such that xm <
aN + ε/2. Thus
(xn − x) ≤ xn − aN < xn − (xm − ε/2) ≤ |xn − xm | + ε/2 < ε.
Moreover, since x ≤ bN , there exists some ℓ ≥ N such that xℓ > bN − ε/2. Thus
(x − xn ) ≤ bN − xn < xℓ + ε/2 − xn ≤ |xℓ − xn | + ε/2 < ε.
This shows that |x − xn | < ε for all n ≥ N , which proves that the sequence (xn )n∈N
converges to x. □
Remark 4.49. In the definition of a Cauchy sequence, it is crucial that we
take the distances between all elements in the sequence after the given index. If
we were to restrict ourselves to only the consecutive differences ∥xn − xn+1 ∥, the
definition would not be helpful: The harmonic series Sn := \sum_{k=1}^{n} 1/k diverges, but
|Sn+1 − Sn | = 1/(n + 1) → 0. ■
Definition 4.50 (Completeness). A normed space (U, ∥·∥) is complete, if every
Cauchy sequence in U converges. ■
Thus our Theorem above can be rephrased as saying that the real numbers are
complete.
Lemma 4.51. The spaces Cd and Rd with the Euclidean norm ∥·∥2 are complete.
Proof. We only prove the claim in the (slightly more cumbersome) case of
Cd .
Assume that (zn )n∈N is a Cauchy sequence in Cd . Denote now, for 1 ≤ j ≤ d,
by z_k^{(j)} ∈ C the j-th component of the vector zk and write z_k^{(j)} = a_k^{(j)} + i b_k^{(j)} , where
a_k^{(j)} and b_k^{(j)} are the real and imaginary parts of z_k^{(j)} , respectively. Now let ε > 0.
Since (zn )n∈N is a Cauchy sequence in Cd , there exists some N ∈ N such that
∥zn − zm ∥2 < ε whenever n, m ≥ N . Thus we have for every n, m ≥ N that
    |a_n^{(j)} − a_m^{(j)} | ≤ |z_n^{(j)} − z_m^{(j)} | ≤ ∥zn − zm ∥2 < ε,
and similarly
    |b_n^{(j)} − b_m^{(j)} | ≤ |z_n^{(j)} − z_m^{(j)} | ≤ ∥zn − zm ∥2 < ε.
Thus the sequences (a_n^{(j)} )n∈N and (b_n^{(j)} )n∈N are Cauchy sequences in R and therefore
have limits, say
    a^{(j)} := \lim_{n→∞} a_n^{(j)}    and    b^{(j)} := \lim_{n→∞} b_n^{(j)} .
Define now z ∈ Cd by z := (z^{(1)} , . . . , z^{(d)} ) with
    z^{(j)} := a^{(j)} + i b^{(j)} .
We claim that z = limn→∞ zn .
Let therefore ε > 0. Since a^{(j)} = \lim_{n→∞} a_n^{(j)} and b^{(j)} = \lim_{n→∞} b_n^{(j)} , there
exist N^{(j)} ∈ N such that
    |a_n^{(j)} − a^{(j)} | < ε/\sqrt{2d}    and    |b_n^{(j)} − b^{(j)} | < ε/\sqrt{2d}
for every n ≥ N^{(j)} . Define now N := max{N^{(1)} , . . . , N^{(d)} } and let n ≥ N . Then
    ∥zn − z∥2 = ( \sum_{j=1}^{d} |z_n^{(j)} − z^{(j)} |^2 )^{1/2} = ( \sum_{j=1}^{d} |a_n^{(j)} − a^{(j)} |^2 + \sum_{j=1}^{d} |b_n^{(j)} − b^{(j)} |^2 )^{1/2}
              ≤ ( \sum_{j=1}^{d} \frac{ε^2}{2d} + \sum_{j=1}^{d} \frac{ε^2}{2d} )^{1/2} = ε.
This proves that z = limn→∞ zn , which in turn proves the completeness of Cd . □
Lemma 4.52. The space C([0, 1]) with the norm ∥f ∥1 = \int_0^1 |f (x)| dx is not
complete.
Proof. Let fk : [0, 1] → R be as in (25). We will show explicitly that the
sequence (fk )k∈N is a Cauchy sequence with respect to ∥·∥1 . For that, let ε > 0
and choose some N > 1/ε. Let moreover n, m ≥ N . Then
    fn (x) = fm (x)    for all x ̸∈ [1/2, 1/2 + 1/N ].
Moreover, for 1/2 ≤ x ≤ 1/2 + 1/N we have that |fn (x) − fm (x)| ≤ 1. Thus
    ∥fn − fm ∥1 = \int_{1/2}^{1/2+1/N} |fn (x) − fm (x)| dx ≤ \int_{1/2}^{1/2+1/N} 1 dx = 1/N < ε.
This shows that the sequence (fn )n∈N is a Cauchy sequence.
Now we will show that the sequence (fn )n∈N does not converge in C([0, 1]).
Assume to the contrary that limn→∞ fn = f ∈ C([0, 1]). Then we have for each
interval [a, b] ⊂ [0, 1] that
    0 = \lim_{n→∞} \int_0^1 |fn (x) − f (x)| dx ≥ \lim_{n→∞} \int_a^b |fn (x) − f (x)| dx.
We now consider specifically intervals [a, b] = [0, 1/2] and [a, b] = [1/2 + 1/N, 1]
for some fixed N ∈ N.
• Since fn (x) = 0 for all x ∈ [0, 1/2] and all n ∈ N, we have that
    0 = \lim_{n→∞} \int_0^{1/2} |fn (x) − f (x)| dx = \lim_{n→∞} \int_0^{1/2} |f (x)| dx = \int_0^{1/2} |f (x)| dx.
Since f is continuous, this is only possible if f (x) = 0 for all x ∈ [0, 1/2].
• Since fn (x) = 1 for all x ∈ [1/2 + 1/N, 1] and all n ≥ N , we have that
    0 = \lim_{n→∞} \int_{1/2+1/N}^{1} |fn (x) − f (x)| dx = \lim_{n→∞} \int_{1/2+1/N}^{1} |f (x) − 1| dx = \int_{1/2+1/N}^{1} |f (x) − 1| dx.
Since f is continuous, this is only possible if f (x) = 1 for all x ∈ [1/2 + 1/N, 1].
The above considerations hold for each N ∈ N, and thus it follows that f (x) = 0
for all 0 ≤ x ≤ 1/2, and f (x) = 1 for all 1/2 < x ≤ 1. Therefore the possible limit
of the sequence (fn )n∈N is not continuous, and thus the sequence (fn )n∈N does not
converge in C([0, 1]). Thus the space is incomplete. □
The situation looks different when we regard the space C([0, 1]) with a different
norm.
Theorem 4.53. The space C([0, 1]) with the norm ∥f ∥∞ = maxx∈[0,1] |f (x)| is
complete.
Proof. Let (fn )n∈N be a Cauchy sequence in C([0, 1]). We have to show that
the sequence converges to some function f ∈ C([0, 1]). We start by identifying a
potential candidate for the limit function f . For each x ∈ [0, 1] we have that
    |fn (x) − fm (x)| ≤ \max_{y∈[0,1]} |fn (y) − fm (y)| = ∥fn − fm ∥∞ .
Since (fn )n∈N is a Cauchy sequence, this implies that the sequence of real numbers
(fn (x))n∈N is a Cauchy sequence as well. Because R is complete, there exists some
f (x) ∈ R with limn→∞ fn (x) = f (x).
We will now show that the resulting function f : [0, 1] → R is the limit of the
sequence of functions (fn )n∈N . For that, we have to show that f is continuous and
that ∥fn − f ∥∞ → 0 as n → ∞.
We will first show that f is continuous. Let therefore ε > 0. Since (fn )n∈N is
a Cauchy sequence, there exists N ∈ N such that
    ∥fn − fm ∥∞ = \max_{x∈[0,1]} |fn (x) − fm (x)| < ε/3
for all n, m ≥ N . Since f (x) = limm→∞ fm (x), we have that
    (26)    |fn (x) − f (x)| = \lim_{m→∞} |fn (x) − fm (x)| ≤ ε/3    for all x ∈ [0, 1] and n ≥ N.
Now let y ∈ [0, 1] be arbitrary. The function fN is continuous, and thus there
exists δ > 0 such that |fN (y)−fN (z)| < ε/3 whenever |y −z| < δ. By applying (26)
with n = N we thus obtain for every z with |y − z| < δ that
    |f (y) − f (z)| ≤ |f (y) − fN (y)| + |fN (y) − fN (z)| + |fN (z) − f (z)| < ε/3 + ε/3 + ε/3 = ε.
Since ε was arbitrary, this shows that f is continuous at y ∈ [0, 1]. Since y was
arbitrary, it follows that f is continuous.
Now we use (26) again to conclude that
    ∥fn − f ∥∞ = \max_{x∈[0,1]} |fn (x) − f (x)| ≤ ε/3    for all n ≥ N.
Since ε was arbitrary, this shows that f = limn→∞ fn .
Thus we have shown that every Cauchy sequence in C([0, 1]) converges, and
thus C([0, 1]) is complete with respect to ∥·∥∞ . □
Because of their importance, complete normed spaces as well as complete inner
product spaces have earned their own names:
Definition 4.54 (Banach space). A Banach space is a complete normed space. ■

Definition 4.55 (Hilbert space). A Hilbert space is a complete inner product
space. ■

Thus we can rephrase the previous results as follows:
Example 4.56. The space (Kd , ∥·∥2 ) is a Hilbert space. ■
Theorem 4.57. The space (C([0, 1]), ∥·∥∞ ) is a Banach space.

Lemma 4.58. The space (C([0, 1]), ∥·∥1 ) is not a Banach space.
Remark 4.59. Assume that (fn )n∈N is a sequence of functions fn : [0, 1] → R,
and assume that f : [0, 1] → R.
• The sequence (fn )n∈N converges pointwise to f , if
    \lim_{n→∞} |fn (x) − f (x)| = 0    for all x ∈ [0, 1].
• The sequence (fn )n∈N converges uniformly to f , if
    \lim_{n→∞} \sup_{x∈[0,1]} |fn (x) − f (x)| = 0.
That is, uniform convergence of a sequence of functions is precisely convergence with
respect to the norm ∥·∥∞ (though it makes sense to use the expression “uniform
convergence” also for discontinuous functions). Thus Theorem 4.53 in particular
implies that the uniform limit of a sequence of continuous functions is again con-
tinuous.
However, from the pointwise convergence of a sequence (fn )n∈N of continuous
functions to some f : [0, 1] → R we cannot conclude that f is continuous as well. An example
is the sequence of continuous functions from (25) that converges pointwise to the
discontinuous function f : [0, 1] → R, f (x) = 0 for 0 ≤ x ≤ 1/2 and f (x) = 1 for
1/2 < x ≤ 1. ■
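A small Python sketch of this distinction (an illustration added here, not from the notes), using the hypothetical example fn (x) = x^n on [0, 1], whose pointwise limit is 0 for x < 1 and 1 at x = 1:

    def f_limit(x):
        # Pointwise limit of f_n(x) = x**n on [0, 1].
        return 0.0 if x < 1.0 else 1.0

    for n in [10, 100, 1000, 10000]:
        # At a fixed point such as x = 0.9 the error tends to 0 ...
        err_fixed = abs(0.9 ** n - f_limit(0.9))
        # ... but at the moving point x_n = 2**(-1/n) we have f_n(x_n) = 1/2,
        # so sup_x |f_n(x) - f(x)| >= 1/2 for every n: no uniform convergence.
        x_n = 2.0 ** (-1.0 / n)
        err_moving = abs(x_n ** n - f_limit(x_n))
        print(n, err_fixed, err_moving)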
Remark 4.60. In Theorem 4.53, we have used the completeness of R to show
that the space C([0, 1]) of continuous R-valued functions is complete as well with
respect to the norm ∥·∥∞ .
Now assume that (V, ∥·∥V ) is an arbitrary normed space. Then we can consider
the space C([0, 1], V ) of continuous functions f : [0, 1] → V with the norm
    ∥f ∥∞ := \max_{x∈[0,1]} ∥f (x)∥V .
With essentially the same proof, we can show that the space C([0, 1], V ) with this
norm is complete as well provided that V is complete. Also, we can change the
interval [0, 1] to an arbitrary closed and bounded interval I ⊂ R (or a closed and
bounded set Ω ⊂ Rn ).3
Since the space Rd is complete with respect to the Euclidean norm, this implies in
particular that the space C(I; Rd ) is complete with respect to ∥·∥∞ whenever I ⊂ R
is a closed and bounded interval. This is something we will make use of in Section 7
below. ■
Theorem 4.61. Assume that (U, ∥·∥) is a Banach space and that V ⊂ U is a
subspace. Then V is complete (with respect to the restriction of the norm ∥·∥ to V )
if and only if V is closed (as a subspace of U ).
Proof. Assume that V is complete as a normed space. Let moreover (un )n∈N
be a sequence with un ∈ V for all n ∈ N and limn→∞ un = u for some u ∈ U . We
have to show that u ∈ V .
Since the sequence (un )n∈N converges in U , it is a Cauchy sequence in U . This,
however, implies that it is a Cauchy sequence in V , which is complete. Thus the
sequence (un )n∈N converges to some w ∈ V . Because of the uniqueness of the limit
3The reason for requiring the interval (or set) to be closed and bounded is that we want
the maximum actually to exist and, in particular, be finite. If we allow non-closed or unbounded
intervals, this is no longer the case, and the “norm” of a function could become +∞. We could
remedy this problem, though, by restricting ourselves to bounded continuous functions, but we
won’t go that far.
of convergent sequences in normed spaces, it follows that u = w ∈ V . Thus V is


closed.
Now assume that V is closed, and let (un )n∈N be a Cauchy sequence in V . Then
this is a Cauchy sequence in U as well, and thus the completeness of U implies that
it converges to some u ∈ U . Because V is closed and u = limn→∞ un , it follows
that u ∈ V . Thus the Cauchy sequence (un )n∈N converges in V , and thus V is
complete. □
5. Equivalence of Norms
Definition 4.62. Assume that U is a vector space and that ∥·∥a and ∥·∥b
are norms on U . We say that the norms are equivalent, if there exist constants
0 < c < C < ∞ such that
c∥u∥a ≤ ∥u∥b ≤ C∥u∥a
for all u ∈ U . ■
Example 4.63. The norms ∥·∥1 and ∥·∥∞ on Kn are equivalent. Indeed, we
have for every u ∈ Kn the estimate
    ∥u∥∞ = \max_{1≤i≤n} |ui | ≤ \sum_{i=1}^{n} |ui | = ∥u∥1 ≤ n \max_{1≤i≤n} |ui | = n∥u∥∞ .
Thus we can use the constants c = 1 and C = n. ■
Example 4.64. The norms ∥·∥1 and ∥·∥∞ on C([0, 1]) are not equivalent. Al-
though we can estimate
    ∥f ∥1 = \int_0^1 |f (x)| dx ≤ \max_{x∈[0,1]} |f (x)| = ∥f ∥∞
for all f ∈ C([0, 1]), there exists no constant C > 0 such that ∥f ∥∞ ≤ C∥f ∥1 for
all f ∈ C([0, 1]). This can be seen by considering the functions fn (x) := xn . We
have ∥fn ∥∞ = 1 for all n ∈ N, but ∥fn ∥1 = 1/(n + 1). Thus such a constant C > 0
would have to satisfy the estimate C ≥ n + 1 for all n ∈ N, which is of course
impossible. ■
If ∥·∥a and ∥·∥b are equivalent norms, then they induce the same convergence
of sequences:
Proposition 4.65. Assume that U is a vector space and ∥·∥a and ∥·∥b are
equivalent norms on U . Assume moreover that (un )n∈N is a sequence in U .
• (un )n∈N is a Cauchy sequence with respect to ∥·∥a if and only if it is a
Cauchy sequence with respect to ∥·∥b .
• (un )n∈N converges with respect to ∥·∥a if and only if it converges with
respect to ∥·∥b . Moreover, in this case the limits are the same.
Proof. Exercise. □
As a consequence, equivalent norms define the same closed (and thus open)
sets on a vector space.
Corollary 4.66. Assume that U is a vector space and ∥·∥a and ∥·∥b are equi-
valent norms on U .
• A subset A ⊂ U is open with respect to ∥·∥a if and only if it is open with
respect to ∥·∥b .
• A subset K ⊂ U is closed with respect to ∥·∥a if and only if it is closed with
respect to ∥·∥b .
Corollary 4.67. Assume that U is a vector space and ∥·∥a and ∥·∥b are equi-
valent norms on U . Then U is complete with respect to ∥·∥a if and only if it is
complete with respect to ∥·∥b .
Of particular importance is the case of a finite dimensional vector space. Here
we can show that in fact all norms are equivalent. Thus the choice of the norm has
no influence on the convergence of sequences or continuity of functions.
Theorem 4.68. Assume that U is a finite dimensional vector space. Then all
norms on U are equivalent.
Proof. Let (u1 , . . . , un ) be a basis of U . We define a norm ∥·∥1 on U by setting
    ∥u∥1 := \sum_{i=1}^{n} |xi |    if u = \sum_{i=1}^{n} xi ui .
That is, ∥u∥1 is the 1-norm of the coordinate representation of u with respect to the
basis (u1 , . . . , un ). It is then sufficient to show that all norms on U are equivalent
to ∥·∥1 .
Assume therefore that ∥·∥ is a norm on U . Define
C := max{∥u1 ∥, . . . , ∥un ∥}.
Assume moreover that
    u = \sum_{i=1}^{n} xi ui
is a vector in U . Then
    ∥u∥ = ∥ \sum_{i=1}^{n} xi ui ∥ ≤ \sum_{i=1}^{n} |xi | ∥ui ∥ ≤ ( \sum_{i=1}^{n} |xi | ) \max_{1≤j≤n} ∥uj ∥ = C∥u∥1 .
Next define
    c := inf{ ∥v∥ : ∥v∥1 = 1 }.
Clearly, c ≥ 0. In fact, we will show that c > 0. For this we note first that we can
equivalently write c as
    c = inf{ ∥ \sum_{i=1}^{n} xi ui ∥ : x ∈ Kn , \sum_{i=1}^{n} |xi | = 1 }.
Define now the mapping f : Kn → R,
    f (x) := ∥ \sum_{i=1}^{n} xi ui ∥ .
We claim that f is continuous. By the reverse triangle inequality for the norm ∥·∥
we have for all x, y ∈ Kn that
    |f (x) − f (y)| = | ∥ \sum_{i=1}^{n} xi ui ∥ − ∥ \sum_{i=1}^{n} yi ui ∥ | ≤ ∥ \sum_{i=1}^{n} (xi − yi ) ui ∥
    ≤ \sum_{i=1}^{n} |xi − yi | ∥ui ∥ ≤ C \sum_{i=1}^{n} |xi − yi | = C \sum_{i=1}^{n} 1 · |xi − yi | ≤ C \sqrt{n} ∥x − y∥2 .
Here we have used the Cauchy–Schwarz–Bunyakovsky inequality in the last step.
This, however, shows that f in fact is Lipschitz continuous on Kn . Now the Ex-
tremal Value Theorem implies that the function
f attains its minimum (and max-
imum) on the closed and bounded set K := { x ∈ Kn : \sum_{i=1}^{n} |xi | = 1 }. Moreover,
by construction of f we have that f (x) > 0 for all x ∈ K. Thus
    c = \inf_{x∈K} f (x) > 0.
Now let u ∈ U \ {0} and denote v := u/∥u∥1 . Then ∥v∥1 = 1 and thus ∥v∥ ≥ c.
Thus
    ∥u∥ = ∥ ∥u∥1 · (u/∥u∥1 ) ∥ = ∥u∥1 ∥v∥ ≥ c∥u∥1 ,
which concludes the proof. □
6. Banach’s Fixed Point Theorem
Definition 4.69 (Lipschitz continuity). Let (U, ∥·∥U ) and (V, ∥·∥V ) be normed
spaces and let f : U → V be a function. We say that f is Lipschitz continuous, if
there exists L ≥ 0 such that
(27) ∥f (u) − f (v)∥V ≤ L∥u − v∥U
for all u, v ∈ U . A constant L with this property is called a Lipschitz constant of
f. ■
Remark 4.70. Sometimes one defines the Lipschitz constant of f as the smal-
lest constant for which (27) holds. We will refrain from doing so, as the smallest
such constant is often difficult to find, whereas finding any such constant is often
feasible in practice. ■
Lemma 4.71 (Continuity of Lipschitz functions). Every Lipschitz continuous
function is continuous.
Proof. Assume that (U, ∥·∥U ) and (V, ∥·∥V ) are normed spaces and let f : U →
V be a Lipschitz continuous function and let L > 0 be a (strictly positive) Lipschitz
constant for f . Let ε > 0 and define δ := ε/L. Let moreover u, v ∈ U be such that
∥u − v∥U < δ. Then
∥f (u) − f (v)∥V ≤ L∥u − v∥U < Lδ = ε.
Thus f is continuous. □
Remark 4.72. Assume that f : Rn → R is continuously differentiable and that
there exists L ≥ 0 such that ∥∇f (x)∥ ≤ L for all x ∈ Rn . Then f is Lipschitz
continuous with Lipschitz constant L. In order to see that, note that
    f (y) = f (x) + \int_0^1 ⟨∇f (x + t(y − x)), y − x⟩ dt
and therefore
    |f (y) − f (x)| = | \int_0^1 ⟨∇f (x + t(y − x)), y − x⟩ dt |
                   ≤ \int_0^1 |⟨∇f (x + t(y − x)), y − x⟩| dt
                   ≤ \int_0^1 ∥∇f (x + t(y − x))∥ ∥y − x∥ dt ≤ L∥y − x∥
for all x, y ∈ Rn . ■
Definition 4.73 (Contraction). Let (U, ∥·∥U ) and (V, ∥·∥V ) be normed spaces
and let f : U → V be a function. We say that f is a contraction, if f is Lipschitz
continuous with a Lipschitz constant L satisfying 0 ≤ L < 1. ■
In other words, f is a contraction, if there exists 0 ≤ L < 1 such that
    ∥f (u) − f (v)∥V ≤ L∥u − v∥U
for all u, v ∈ U .
Remark 4.74. Note that it is not sufficient for a function f to be a contraction
that it satisfies
    ∥f (u) − f (v)∥V < ∥u − v∥U
for all u ̸= v. Instead, we require the Lipschitz constant to be strictly smaller than
1.
For instance, the function f : R → R, f (x) = x − arctan x is not a contraction,
although
    f ′ (x) = 1 − \frac{1}{x^2 + 1} = \frac{x^2}{x^2 + 1} < 1
for all x ∈ R. ■
Theorem 4.75 (Banach’s Fixed Point Theorem). Assume that (U, ∥·∥) is a
Banach space and that f : U → U is a contraction. Then there exists a unique
element u∗ ∈ U (a fixed point of f ) such that
(28) f (u∗ ) = u∗ .
Proof. Let 0 ≤ L < 1 be a Lipschitz constant of f .
We start by showing that the solution of (28), if it exists, is unique. Assume
therefore that u, v ∈ U satisfy f (u) = u and f (v) = v. Then
∥u − v∥ = ∥f (u) − f (v)∥ ≤ L∥u − v∥.
Since L < 1, this is only possible if ∥u − v∥ = 0 and thus u = v. This shows
uniqueness of a fixed point of f . It remains to show that a fixed point actually
exists.
Let now u0 ∈ U be arbitrary and consider the sequence given by
un+1 = f (un ).
We will show that this sequence converges to a fixed point of f .
We start by showing that this is actually a Cauchy sequence. For that, we note
that for all n ∈ N,
∥un+1 − un ∥ = ∥f (un ) − f (un−1 )∥ ≤ L∥un − un−1 ∥.
By induction over n we thus obtain that
∥un+1 − un ∥ ≤ Ln ∥u1 − u0 ∥.
Thus we have for all n ∈ N and k ∈ N that
    (29)    ∥un+k − un ∥ ≤ \sum_{j=1}^{k} ∥un+j − un+j−1 ∥ ≤ \sum_{j=1}^{k} L^{n+j−1} ∥u1 − u0 ∥
                        = L^n ∥u1 − u0 ∥ \sum_{j=1}^{k} L^{j−1} ≤ L^n ∥u1 − u0 ∥ \sum_{j=0}^{∞} L^j = \frac{L^n}{1−L} ∥u1 − u0 ∥.
Now let ε > 0 and let N ∈ N be such that
    \frac{L^N}{1−L} ∥u1 − u0 ∥ < ε.
Let moreover n, m ≥ N . Without loss of generality, assume that m ≥ n. Then we
can apply (29) with k = m − n and obtain that
    ∥um − un ∥ ≤ \frac{L^n}{1−L} ∥u1 − u0 ∥ ≤ \frac{L^N}{1−L} ∥u1 − u0 ∥ < ε.
Thus the sequence (un )n∈N is a Cauchy sequence. Since (U, ∥·∥) is a Banach space,
there exists u∗ ∈ U such that u∗ = limn→∞ un .
It remains to show that u∗ is a fixed point of f . Since f is a contraction, it is
in particular Lipschitz continuous and thus continuous. Since u∗ = limn→∞ un by
definition, we thus have that
    f (u∗ ) = \lim_{n→∞} f (un ) = \lim_{n→∞} un+1 = u∗ ,
which concludes the proof. □
Remark 4.76. It is important to note here that the proof is constructive in
that it provides us with an explicit algorithm, namely fixed point iteration, to ap-
proximate the fixed point u∗ of f . In fact, we have shown that the sequence
    un+1 = f (un )
converges for each starting point u0 ∈ U to u∗ . We have therefore formu-
lated a numerical algorithm for the solution of the fixed point equation under the
assumption that f is a contraction.
Moreover, if we refine some of the analysis of the proof, we can obtain some
estimates for how close we are to u∗ after a finite number of steps:
By letting k tend to infinity in (29), we obtain the a-priori estimate
    ∥u∗ − un ∥ ≤ \frac{L^n}{1−L} ∥u1 − u0 ∥.
This allows us to estimate already after the first step how close we will be to u∗
after n steps.
In addition, we can show the a-posteriori estimate
    ∥u∗ − un ∥ ≤ \frac{L}{1−L} ∥un − un−1 ∥.
This gives us an upper bound for how close we are to the solution by only looking
at the size of the last step we made. ■
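As a sketch of how the iteration and the two error estimates behave in practice (the concrete map is a hypothetical choice, not from the notes), consider the contraction f (x) = cos(x)/2 on R, whose Lipschitz constant is L = 1/2; the printed a-priori and a-posteriori bounds are always at least as large as the true error.

    import math

    def f(x):
        # A contraction on R with Lipschitz constant L = 1/2, since |f'(x)| <= 1/2.
        return 0.5 * math.cos(x)

    L = 0.5
    u0 = 0.0
    c = abs(f(u0) - u0)          # ||u_1 - u_0||, needed for the a-priori bound

    # Accurate reference value for the fixed point u*.
    u_star = u0
    for _ in range(200):
        u_star = f(u_star)

    u_prev, u = u0, f(u0)        # u is now u_1
    for n in range(1, 11):
        true_error = abs(u - u_star)
        a_priori = L**n / (1 - L) * c                  # available after the first step
        a_posteriori = L / (1 - L) * abs(u - u_prev)   # available after step n
        print(n, true_error, a_priori, a_posteriori)
        u_prev, u = u, f(u)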
Sometimes we are given a mapping f that is not a contraction on the whole
space U but only on some subset K ⊂ U . In this case, it is still possible to apply
a variation of Banach’s fixed point theorem, provided that K is closed and f maps
K into itself:
Proposition 4.77. Assume that (U, ∥·∥) is a Banach space, that f : U → U is
a function, and that K ⊂ U is a closed subset. Assume moreover that
    f (u) ∈ K whenever u ∈ K
and that there exists 0 ≤ L < 1 such that
    ∥f (u) − f (v)∥ ≤ L∥u − v∥    for all u, v ∈ K.
Then there exists a unique u∗ ∈ K such that f (u∗ ) = u∗ .
Proof. Choose u0 ∈ K arbitrarily and define again a sequence by setting un+1 :=
f (un ). Then un ∈ K for all n ∈ N. Applying the same estimates as in the proof
of Theorem 4.75, we then obtain that (un )n∈N is a Cauchy sequence and therefore
converges to some u∗ ∈ U . Because K is closed and un ∈ K for all n, it follows
that u∗ ∈ K. Moreover, as in the proof of Theorem 4.75 we obtain that u∗ is a
fixed point of f . For the uniqueness of that fixed point (within K), we can also
apply the same ideas as there. □
Application to Integral Equations. Assume that k : [0, 1] × [0, 1] → R is a
continuous function satisfying
    c := \max_{(x,y)∈[0,1]^2} |k(x, y)| < 1,
and let h ∈ C([0, 1]) be a given function. We want to solve the integral equation
    (30)    f (x) + \int_0^1 k(x, y) f (y) dy = h(x).
That is, we want to find a function f ∈ C([0, 1]) such that (30) holds for each
x ∈ [0, 1]. Using Banach’s fixed point theorem, we can show that this equation has
a unique solution:
Since k is continuous, it follows that the function x 7→ \int_0^1 k(x, y)f (y) dy is
continuous. Since h is assumed to be continuous as well, we can therefore define
the mapping T : C([0, 1]) → C([0, 1]),
    T f (x) := h(x) − \int_0^1 k(x, y) f (y) dy.
Then f ∈ C([0, 1]) solves (30) if and only if f is a fixed point of T .
Now we show that T is a contraction on the space C([0, 1]) with respect to the
norm ∥·∥∞ . For that, we estimate
    ∥T f − T g∥∞ = \max_{x∈[0,1]} |T f (x) − T g(x)|
                 = \max_{x∈[0,1]} | \int_0^1 k(x, y) (f (y) − g(y)) dy |
                 ≤ \max_{x∈[0,1]} \int_0^1 |k(x, y)| |f (y) − g(y)| dy
                 ≤ \max_{x∈[0,1]} \int_0^1 c |f (y) − g(y)| dy
                 = c \int_0^1 |f (y) − g(y)| dy
                 ≤ c \max_{y∈[0,1]} |f (y) − g(y)| = c ∥f − g∥∞ .
Thus T is a contraction on the Banach space (C([0, 1]), ∥·∥∞ ), and therefore T
has a unique fixed point. Put differently, the equation (30) has, under the given
assumptions on k and h, a unique continuous solution f .
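A minimal numerical sketch of this argument (with the hypothetical choices k(x, y) = sin(xy)/2, so that c ≤ 1/2 < 1, and h(x) = 1 + x; the integral is replaced by a midpoint quadrature, which is not part of the notes):

    import numpy as np

    m = 400
    x = (np.arange(m) + 0.5) / m          # midpoints of a uniform grid on [0, 1]
    w = 1.0 / m                           # quadrature weight
    K = 0.5 * np.sin(np.outer(x, x))      # K[i, j] = k(x_i, x_j), with max|k| <= 1/2
    h = 1.0 + x

    def T(f):
        # Discrete version of (T f)(x) = h(x) - int_0^1 k(x, y) f(y) dy.
        return h - w * K @ f

    f = np.zeros(m)
    for n in range(50):
        f_new = T(f)
        step = np.max(np.abs(f_new - f))  # sup-norm of the last fixed-point update
        f = f_new
        if step < 1e-12:
            break

    # Residual of the discrete equation f(x) + int k(x, y) f(y) dy = h(x).
    print(n, np.max(np.abs(f + w * K @ f - h)))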
7. Application of the Fixed Point Theorem: Existence for ODEs
We now consider the application of Banach’s fixed point Theorem to the ques-
tion of existence and uniqueness of solutions of ODEs. For that, we consider an
initial value problem of the form
    (31)    y ′ = f (t, y),    y(t0 ) = y0 ,
where f : R × Rd → Rd is some continuous function, t0 ∈ R, and y0 ∈ Rd is a given
initial value.
Theorem 4.78 (Picard–Lindelöf). Let a > 0 and b > 0, and define the cylinder
    Z := { (t, y) ∈ R × Rd : |t − t0 | ≤ a and |y − y0 |2 ≤ b }.
Assume that the function f : R × Rd → Rd is continuous on Z and that it satisfies
a Lipschitz condition with respect to its second component in the sense that there
exists L ≥ 0 such that
    |f (t, y) − f (t, z)|2 ≤ L|y − z|2
for all (t, y), (t, z) ∈ Z. Assume moreover that
    M := \max_{(t,y)∈Z} |f (t, y)|2 ≤ b/a.
Then the initial value problem has a unique solution on the interval I = [t0 − a, t0 +
a].
Proof. Since the proof is somewhat longish, we split it up into separate steps.
Step 1: Rewrite the ODE as an integral equation.
Assume that y solves the initial value problem (31). Then we have that
    y(t) = y0 + \int_{t0}^{t} y ′ (s) ds = y0 + \int_{t0}^{t} f (s, y(s)) ds
for all t. Conversely, if y is a continuous function that satisfies the equation
    (32)    y(t) = y0 + \int_{t0}^{t} f (s, y(s)) ds,
then y is necessarily differentiable (because the right hand side is) and it satis-
fies (31).
Define therefore the mapping T : C(I; Rd ) → C(I; Rd ),
    T y(t) := y0 + \int_{t0}^{t} f (s, y(s)) ds.
Then y solves the initial value problem (31) if and only if it satisfies the fixed point
equation T y = y.
Step 2: Restrict T to a suitable subset of C(I; Rd ).
Our goal is to apply Banach’s fixed point theorem to the equation T y = y. For
that, we need to find a suitable subset K ⊂ C(I; Rd ) such that T maps K into itself
and T is a contraction on K. Define therefore
    K := { y ∈ C(I; Rd ) : |y(t) − y0 |2 ≤ b for all t ∈ I }
and assume that y ∈ K. Then
    |f (s, y(s))|2 ≤ M
for all s ∈ I and thus
    |T y(t) − y0 |2 = | \int_{t0}^{t} f (s, y(s)) ds |2 ≤ | \int_{t0}^{t} |f (s, y(s))|2 ds | ≤ |t − t0 | M ≤ a · (b/a) = b
for all t ∈ I. Since T y is continuous, it follows that T y ∈ K as well.
Step 3: Show that we do not lose any possible solutions.
Assume now that y ∈ C(I; Rd ) is any solution of the equation (32) (or, equival-
ently, of the ODE (31)). We want to show that, necessarily, y ∈ K. Assume there-
fore to the contrary that y ̸∈ K. Then there exists t ∈ I such that |y(t) − y0 |2 > b.
Since y(t0 ) = y0 by assumption, it follows that t ̸= t0 .
Assume now that t > t0 . Since the mapping t 7→ |y(t)−y0 |2 is continuous, there
exists a minimal value t0 < t1 < t0 + a such that |y(t1 ) − y0 |2 = b. In particular,
|y(s) − y0 |2 < b for all t0 ≤ s < t1 . As a consequence
    b = |y(t1 ) − y0 |2 = | \int_{t0}^{t1} f (s, y(s)) ds |2 ≤ M |t1 − t0 | < M a ≤ b,
which is an obvious contradiction.
We obtain a contradiction in a similar way if we assume that t < t0 . This shows
that such a t cannot exist, and thus y ∈ K.
Step 4: Define a suitable norm on C(I; Rd ).
We now need to define a norm on C(I; Rd ) with respect to which the mapping
T is a contraction on K. For that we define
    (33)    ∥y∥ := \max_{t∈I} e^{−2L|t−t0 |} |y(t)|2 .
Assume now that y, z ∈ K. Then
    ∥T y − T z∥ = \max_{t∈I} e^{−2L|t−t0 |} |T y(t) − T z(t)|2
                = \max_{t∈I} e^{−2L|t−t0 |} | \int_{t0}^{t} f (s, y(s)) ds − \int_{t0}^{t} f (s, z(s)) ds |2
                = \max_{t∈I} e^{−2L|t−t0 |} | \int_{t0}^{t} ( f (s, y(s)) − f (s, z(s)) ) ds |2
                ≤ \max_{t∈I} e^{−2L|t−t0 |} | \int_{t0}^{t} |f (s, y(s)) − f (s, z(s))|2 ds |
                ≤ \max_{t∈I} e^{−2L|t−t0 |} | \int_{t0}^{t} L |y(s) − z(s)|2 ds |
                = \max_{t∈I} e^{−2L|t−t0 |} | \int_{t0}^{t} L e^{2L|s−t0 |} e^{−2L|s−t0 |} |y(s) − z(s)|2 ds |
                ≤ \max_{t∈I} ( e^{−2L|t−t0 |} | \int_{t0}^{t} L e^{2L|s−t0 |} ds | ) \max_{s∈I} e^{−2L|s−t0 |} |y(s) − z(s)|2
                = \max_{t∈I} ( e^{−2L|t−t0 |} | \int_{t0}^{t} L e^{2L|s−t0 |} ds | ) ∥y − z∥
                ≤ \max_{t∈I} ( e^{−2L|t−t0 |} \frac{e^{2L|t−t0 |}}{2} ) ∥y − z∥ = \frac{1}{2} ∥y − z∥.
This shows that T is a contraction with respect to the norm defined in (33).
Step 5: Show that C(I; Rd ) is complete and K closed.
Denote by ∥·∥∞ the standard ∞-norm on C(I; Rd ), that is,
    ∥y∥∞ := \max_{t∈I} |y(t)|2 .
Since
    1 ≥ e^{−2L|t−t0 |} ≥ e^{−2La}
for all t ∈ I, we have that
    e^{−2La} ∥y∥∞ ≤ ∥y∥ ≤ ∥y∥∞
for all y ∈ C(I; Rd ). Thus ∥·∥ and ∥·∥∞ are equivalent norms on C(I; Rd ). Since
C(I; Rd ) with the norm ∥·∥∞ is complete, this implies that it is also complete with
the norm ∥·∥ and thus a Banach space.
Also, a set is closed with respect to ∥·∥ if and only if it is closed with respect to
∥·∥∞ . Since K is precisely the closed ball of radius b around the constant function
y0 with respect to ∥·∥∞ , it is closed for that norm. Thus K is closed for the norm
∥·∥ as well.
Step 6: Collect everything and call it a day.
We have shown that K is a closed subset of the Banach space C(I; Rd ) (Step
5), that T y ∈ K whenever y ∈ K (Step 2), and that the restriction of T to K is a
contraction (Step 4). Thus Banach’s fixed point theorem implies that the mapping
T has a unique fixed point y ∗ in K. Since solving the ODE (31) is equivalent to
finding a fixed point of T in C(I; Rd ) (Step 1), we can conclude that the initial value
problem (31) has at least one solution (namely y ∗ ). Moreover, if z were another solution
different from y ∗ , it could not be contained in K. This, however, is not possible by Step 3. □
Remark 4.79. It is possible to show that the ODE (31) has a solution if the
function f is continuous; the Lipschitz condition we have assumed in the proof is
actually not necessary for that (this is the statement of the Peano existence theorem
for ODEs). However, without the Lipschitz condition, we can no longer guarantee
that the solution is unique. A classical example for this is the ODE
    y ′ = y^{1/3} ,    y(0) = 0.
An obvious solution of the initial value problem is the constant function y(t) = 0.
However, the functions
    y± (t) = ±(2t/3)^{3/2}
for t ≥ 0 are also solutions of the same ODE. Even worse, for any T > 0 we can
define the functions
    y±,T (t) := \begin{cases} 0 & \text{if } t < T, \\ ±(2(t − T )/3)^{3/2} & \text{if } t ≥ T. \end{cases}
All of these functions solve the same initial value problem.
This does not contradict the Picard–Lindelöf Theorem, though, as the function
y 7→ y^{1/3} is not Lipschitz continuous in a neighbourhood of 0. ■
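A quick symbolic check of the non-uniqueness, using sympy (an aside added here, not part of the notes):

    import sympy as sp

    t = sp.symbols('t', positive=True)
    y = (sp.Rational(2, 3) * t) ** sp.Rational(3, 2)

    # y(t) = (2t/3)^(3/2) satisfies y' = y^(1/3) and y(0) = 0, just like the
    # constant solution y = 0, so the initial value problem is not uniquely solvable.
    print(sp.simplify(sp.diff(y, t) - y ** sp.Rational(1, 3)))   # prints 0
    print(sp.limit(y, t, 0))                                     # prints 0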
Remark 4.80 (Picard iteration). Since we have proved Theorem 4.78 by show-
ing that the Banach fixed point Theorem is applicable, we obtain in addition that
the fixed point iteration (the Picard iteration)
    (34)    yk+1 (t) = T yk (t) = y0 + \int_{t0}^{t} f (s, yk (s)) ds
(where we use the initialisation y0 (t) = y0 , although this is not necessary for the
convergence) actually converges to a solution of the ODE (31).
If we apply this to the ODE
y ′ = y, y(0) = 1,
we obtain the iterates
    y0 (t) = 1,
    y1 (t) = 1 + \int_0^t 1 ds = 1 + t,
    y2 (t) = 1 + \int_0^t (1 + s) ds = 1 + t + t^2 /2,
    y3 (t) = 1 + \int_0^t (1 + s + s^2 /2) ds = 1 + t + t^2 /2 + t^3 /6,
and, in general,
    yn (t) = \sum_{k=0}^{n} t^k /k! ,
which happens to be the truncated Taylor series expansion of the exponential func-
tion.
More generally, one can in principle use the iteration (34) in order to define a
numerical solution method for ODEs. A difficulty here is that the integrals that
need to be evaluated in each step of the iteration can in general not be solved
analytically. This problem can, however, be solved in practice by falling back
to numerical quadratures. Still, methods based on Picard iteration appear to be
limited to some niche applications, and in general one uses either some Runge–
Kutta method or some multi-step method for the numerical solution of initial value
problems. ■
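For the example y′ = y, y(0) = 1, the Picard iteration (34) can be carried out symbolically; the following sympy sketch (added here as an illustration) reproduces the truncated Taylor polynomials of the exponential function.

    import sympy as sp

    t, s = sp.symbols('t s')

    y = sp.Integer(1)                                 # y_0(t) = 1
    for k in range(5):
        # One Picard step: y_{k+1}(t) = 1 + int_0^t y_k(s) ds.
        y = 1 + sp.integrate(y.subs(t, s), (s, 0, t))
        print(sp.expand(y))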
8. p-norms
A particularly interesting and useful class of norms on Kn are the so called
p-norms, which are generalisations of the Euclidean norm as well as the norms ∥·∥1
and ∥·∥∞ .
Definition 4.81 (p-norm on Kn ). Let 1 ≤ p < ∞. We define the mapping
∥·∥p : Kn → R,
    ∥x∥p := ( |x1 |p + . . . + |xn |p )^{1/p}
for x ∈ Kn .
Similarly, we define ∥·∥∞ : Kn → R by
    ∥x∥∞ := max{ |x1 |, . . . , |xn | }
for x ∈ Kn . ■
We will show in the following that ∥·∥p is indeed a norm on Kn . The symmetry
and positivity are fairly obvious, but the triangle inequality is not, unless we are in
the simpler cases p = 1 or p = ∞. Also, for p = 2, we simply obtain the Euclidean
norm, which we have previously considered in the context of inner products.
For proving the triangle inequality in all other cases, we will need some addi-
tional results.
Definition 4.82 (Conjugate). Let 1 < p < ∞. We define the conjugate (or
Hölder conjugate) of p as the solution 1 < p∗ < ∞ of the equation
1 1
+ = 1.
p p∗
Moreover, for p = 1 we define the conjugate as p∗ := ∞, and for p = ∞ as
p∗ := 1. ■
Alternatively, we can write the conjugate of 1 < p < ∞ explicitly as
    p∗ = \frac{p}{p − 1} .
Lemma 4.83 (Young’s inequality). Let 1 < p < ∞, and let 1 < p∗ < ∞ be the
conjugate of p. Then for all a, b ≥ 0 the inequality
    (35)    ab ≤ \frac{1}{p} a^p + \frac{1}{p∗} b^{p∗}
holds.
Proof. For b = 0, this inequality holds obviously. Now assume that b > 0 and
consider the function f : R≥0 → R,
    f (a) := ab − \frac{1}{p} a^p .
Then (35) is equivalent to the statement that f (a) ≤ \frac{1}{p∗} b^{p∗} for all a ≥ 0. In order to
show this, we compute the maximum of f . Since lima→∞ f (a) = −∞ (because the
term −\frac{1}{p} a^p decreases superlinearly towards −∞, whereas the term ab only increases
linearly), this maximum actually exists in R≥0 . Moreover, since f ′ (0) = b > 0, the
point 0 cannot be the maximum of f , and thus the function f attains the maximum
at some point 0 < a < ∞. In particular, we can therefore find this maximum by
computing the derivative of f and setting it to 0. We have that
f ′ (a) = b − ap−1 ,
which is equal to 0 for a = b1/(p−1) . Thus a = b1/(p−1) is the unique maximiser of
f , and therefore f (a) ≤ f (b1/(p−1) ) for all a ≥ 0. Thus
    ab − \frac{1}{p} a^p = f (a) ≤ f (b^{1/(p−1)} ) = b · b^{1/(p−1)} − \frac{1}{p} b^{p/(p−1)}
                        = b^{p/(p−1)} − \frac{1}{p} b^{p/(p−1)} = \frac{p − 1}{p} b^{p/(p−1)} = \frac{1}{p∗} b^{p∗} ,
which proves (35). □
Lemma 4.84 (Hölder’s inequality). Assume that 1 < p < ∞. Then we have for
all x, y ∈ Kn that
    (36)    | \sum_{k=1}^{n} xk yk | ≤ \sum_{k=1}^{n} |xk ||yk | ≤ ( \sum_{k=1}^{n} |xk |p )^{1/p} ( \sum_{k=1}^{n} |yk |^{p∗} )^{1/p∗} = ∥x∥p ∥y∥p∗ .
Proof. Assume without loss of generality that x, y ̸= 0, else the claim is
trivial.
The estimate
    | \sum_{k=1}^{n} xk yk | ≤ \sum_{k=1}^{n} |xk ||yk |
is simply the triangle inequality on K. Now define
    ak = |xk | / ∥x∥p    and    bk = |yk | / ∥y∥p∗ .
Then Young’s inequality implies that
    \frac{1}{∥x∥p ∥y∥p∗} \sum_{k=1}^{n} |xk ||yk | = \sum_{k=1}^{n} ak bk ≤ \frac{1}{p} \sum_{k=1}^{n} ak^p + \frac{1}{p∗} \sum_{k=1}^{n} bk^{p∗}
    = \frac{1}{p} \sum_{k=1}^{n} \frac{|xk |^p}{∥x∥p^p} + \frac{1}{p∗} \sum_{k=1}^{n} \frac{|yk |^{p∗}}{∥y∥p∗^{p∗}} = \frac{1}{p} + \frac{1}{p∗} = 1. □
Lemma 4.85 (Minkowski’s inequality). Assume that 1 < p < ∞. Then
    (37)    ∥x + y∥p = ( \sum_{k=1}^{n} |xk + yk |p )^{1/p} ≤ ( \sum_{k=1}^{n} |xk |p )^{1/p} + ( \sum_{k=1}^{n} |yk |p )^{1/p} = ∥x∥p + ∥y∥p
for all x, y ∈ Kn .
Proof. Without loss of generality, assume that x + y ̸= 0, else the claim is
trivial.
We start by estimating the p-th power of the left hand side by
    (38)    \sum_{k=1}^{n} |xk + yk |p ≤ \sum_{k=1}^{n} |xk + yk |^{p−1} ( |xk | + |yk | )
                                = \sum_{k=1}^{n} |xk + yk |^{p−1} |xk | + \sum_{k=1}^{n} |xk + yk |^{p−1} |yk |.
Now we apply Hölder’s inequality and obtain that
    \sum_{k=1}^{n} |xk + yk |^{p−1} |xk | ≤ ( \sum_{k=1}^{n} |xk + yk |^{(p−1)p∗} )^{1/p∗} ( \sum_{k=1}^{n} |xk |p )^{1/p}
                                     = ( \sum_{k=1}^{n} |xk + yk |p )^{(p−1)/p} ( \sum_{k=1}^{n} |xk |p )^{1/p} ,
where we have used that the conjugate p∗ of p satisfies
    p∗ = \frac{p}{p − 1} .
Similarly, we get that
    \sum_{k=1}^{n} |xk + yk |^{p−1} |yk | ≤ ( \sum_{k=1}^{n} |xk + yk |p )^{(p−1)/p} ( \sum_{k=1}^{n} |yk |p )^{1/p} .
Inserting these estimates in (38), we obtain
    \sum_{k=1}^{n} |xk + yk |p ≤ ( \sum_{k=1}^{n} |xk + yk |p )^{(p−1)/p} ( ( \sum_{k=1}^{n} |xk |p )^{1/p} + ( \sum_{k=1}^{n} |yk |p )^{1/p} ).
Now dividing both sides of this inequality by ( \sum_{k=1}^{n} |xk + yk |p )^{(p−1)/p} results
in (37). □
Theorem 4.86. For all 1 ≤ p ≤ ∞, the function ∥·∥p : Kn → R is a norm on
Kn .
Proof. For p = 1 and p = ∞, we have already discussed this earlier. Next,
for 1 < p < ∞, the symmetry and positivity of ∥·∥p are obvious from the defin-
ition. Finally, the triangle inequality for ∥·∥p is precisely Minkowski’s inequality,
Lemma 4.85. □
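The inequalities behind this theorem are also easy to test numerically; the following sketch (an aside with randomly chosen vectors and exponents, not from the notes) checks Hölder’s and Minkowski’s inequalities on Kn.

    import numpy as np

    rng = np.random.default_rng(0)

    def pnorm(v, p):
        return np.max(np.abs(v)) if np.isinf(p) else np.sum(np.abs(v) ** p) ** (1.0 / p)

    # Random sanity check of Hoelder's and Minkowski's inequalities on R^10.
    for _ in range(5):
        x = rng.standard_normal(10)
        y = rng.standard_normal(10)
        p = rng.uniform(1.1, 5.0)
        q = p / (p - 1.0)                        # the Hoelder conjugate p*
        assert np.sum(np.abs(x * y)) <= pnorm(x, p) * pnorm(y, q) + 1e-12
        assert pnorm(x + y, p) <= pnorm(x, p) + pnorm(y, p) + 1e-12
    print("Hoelder and Minkowski hold for the sampled vectors.")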
Similarly, we can define the p-norm of a continuous function:
Definition 4.87. Let 1 ≤ p < ∞. We define the function ∥·∥p : C([0, 1]) → R
by
    ∥f ∥p := ( \int_0^1 |f (x)|p dx )^{1/p}
for all f ∈ C([0, 1]).
Moreover, for p = ∞ we define
    ∥f ∥∞ := \max_{x∈[0,1]} |f (x)|. ■
We now show that this defines a norm on C([0, 1]). The steps towards this
result are really the same as for the case (Kn , ∥·∥p ). We will first show Hölder’s
inequality for the integral, and then Minkowski’s inequality. Also, the proofs of
these inequalities are essentially the same as for Kn . All we have to do is to replace
every sum by an integral.4
Lemma 4.88 (Hölder’s inequality). Assume that 1 < p < ∞. Then we have for
all f , g ∈ C([0, 1]) that
    (39)    | \int_0^1 f (x)g(x) dx | ≤ \int_0^1 |f (x)g(x)| dx
            ≤ ( \int_0^1 |f (x)|p dx )^{1/p} ( \int_0^1 |g(x)|^{p∗} dx )^{1/p∗} = ∥f ∥p ∥g∥p∗ .
Proof. Without loss of generality assume that f , g ̸= 0, else the claim is
trivial. Moreover, the estimate
    | \int_0^1 f (x)g(x) dx | ≤ \int_0^1 |f (x)g(x)| dx
follows from standard properties of the integral. Define now
    f̃ (x) := f (x)/∥f ∥p    and    g̃(x) := g(x)/∥g∥p∗ .
Then Young’s inequality implies that
    \frac{1}{∥f ∥p ∥g∥p∗} \int_0^1 |f (x)g(x)| dx = \int_0^1 |f̃ (x)g̃(x)| dx ≤ \int_0^1 ( \frac{1}{p} |f̃ (x)|^p + \frac{1}{p∗} |g̃(x)|^{p∗} ) dx
    = \frac{1}{p} \int_0^1 \frac{|f (x)|^p}{∥f ∥p^p} dx + \frac{1}{p∗} \int_0^1 \frac{|g(x)|^{p∗}}{∥g∥p∗^{p∗}} dx = \frac{1}{p} + \frac{1}{p∗} = 1. □
Lemma 4.89 (Minkowski’s inequality). Assume that 1 < p < ∞. Then
    ∥f + g∥p = ( \int_0^1 |f (x) + g(x)|p dx )^{1/p} ≤ ( \int_0^1 |f (x)|p dx )^{1/p} + ( \int_0^1 |g(x)|p dx )^{1/p} = ∥f ∥p + ∥g∥p
for all f , g ∈ C([0, 1]).
Proof. We estimate
    ∥f + g∥p^p = \int_0^1 |f (x) + g(x)|p dx ≤ \int_0^1 |f (x) + g(x)|^{p−1} ( |f (x)| + |g(x)| ) dx
    ≤ ( \int_0^1 |f (x) + g(x)|p dx )^{(p−1)/p} ( ( \int_0^1 |f (x)|p dx )^{1/p} + ( \int_0^1 |g(x)|p dx )^{1/p} ),
where we have used Hölder’s inequality (39) and the fact that p∗ = p/(p − 1) in the
last step. Then we obtain the result by dividing by ( \int_0^1 |f (x) + g(x)|p dx )^{(p−1)/p} . □
Theorem 4.90. For all 1 ≤ p ≤ ∞, the function ∥·∥p : C([0, 1]) → R is a norm
on C([0, 1]).
4The fact that we can use essentially the same proof in two seemingly different situations
should give us a hint that these situations are not that different after all. In fact, both cases can
be seen as special cases of a result that holds on general so called “measure spaces.” For (many)
more results on this topic, I refer to the course TMA4225 – Foundations of Analysis.
Proof. For p = 1 and p = ∞, we have already discussed this earlier. Next,


for 1 < p < ∞, the symmetry and positivity of ∥·∥p are obvious from the defin-
ition. Finally, the triangle inequality for ∥·∥p is precisely Minkowski’s inequality,
Lemma 4.89. □
9. Sequence Spaces
Assume that f : R → C is a periodic function with period 2π. Assuming enough
regularity of f , we can then represent it as a Fourier series of the form
    f (x) = \sum_{n∈Z} cn e^{inx} ,
where
    cn = \frac{1}{2π} \int_0^{2π} f (x) e^{−inx} dx
is the n-th Fourier coefficient of f . This Fourier coefficient indicates the contribution
of oscillations of angular frequency n to the whole function f . For many practical
problems it then can make sense to work directly with the Fourier coefficients
instead of working with the function f itself. In other words, it makes sense to
identify the function f with its Fourier series, which is a sequence (cn )n∈Z . In this
section, we will thus develop a theory of spaces of sequences and norms on such
spaces.
Definition 4.91 (ℓp -spaces). Let 1 ≤ p < ∞. By ℓp (or also ℓp (N)), we denote
the set of all sequences (xk )k∈N such that
    ∥x∥p := ( \sum_{k=1}^{∞} |xk |p )^{1/p} < ∞.
Moreover, we denote by ℓ∞ (or ℓ∞ (N)) the set of all bounded sequences, that is,
all sequences for which
    ∥x∥∞ := \sup_{k∈N} |xk | < ∞. ■
Remark 4.92. There is a large conceptual difference between the constructions
of the p-norms on Kn and the construction of the ℓp -spaces. In the former case we
started with the vector space Kn and then defined a norm on this space. In the
latter case the construction is in some sense reversed: We start by defining a “p-
norm” on the set of all sequences, but then say that the space ℓp consists of all
those sequences for which this “norm” is actually finite. ■
In the following we will show that these are indeed vector spaces and that the
mapping ∥·∥p : ℓp → R is a norm on ℓp .
Remark 4.93. One or both sides of the inequality (40) below may in principle be
infinite. However, the inequality in this case still is satisfied. That is, if the right
hand side of (40) is finite, then so is the left hand side. In other words, if x ∈ ℓp
and y ∈ ℓp∗ , then the pointwise product z defined by zk = xk yk satisfies z ∈ ℓ1 . ■
Theorem 4.94 (ℓp as normed space). Let 1 ≤ p ≤ ∞. Then (ℓp , ∥·∥p ) is a
normed space.
Proof. We only prove the claim for 1 ≤ p < ∞ and leave the case p = ∞ to
the interested reader.
We have to show that ℓp is a vector space and that ∥·∥p is a norm on ℓp . For
the former, we will show that ℓp is a subspace of the vector space of all sequences
in K. Following Lemma 2.20 we thus have to show that 0 ∈ ℓp , that x + y ∈ ℓp if
x, y ∈ ℓp , and that λx ∈ ℓp if x ∈ ℓp and λ ∈ K.
The inclusion 0 ∈ ℓp holds trivially. Moreover, it is clear that ∥x∥p ≥ 0 for all
x ∈ ℓp with ∥x∥p = 0 if and only if x = 0.
Now assume that x ∈ ℓp and λ ∈ K. Then
    ∥λx∥p = ( \sum_{k=1}^{∞} |λxk |p )^{1/p} = ( |λ|p \sum_{k=1}^{∞} |xk |p )^{1/p} = |λ| ∥x∥p < ∞.
p
This shows that λx ∈ ℓ and also that ∥·∥p is homogeneous.
Finally, assume that x, y ∈ ℓp . By Minkowski’s inequality in Kn (Lemma 4.85)
we have that
    ( \sum_{k=1}^{n} |xk + yk |p )^{1/p} ≤ ( \sum_{k=1}^{n} |xk |p )^{1/p} + ( \sum_{k=1}^{n} |yk |p )^{1/p}

for all n ∈ N. Taking the limit n → ∞ and using that ∥x∥p < ∞ and ∥y∥p < ∞, it
follows that
X∞ 1/p X ∞ ∞
1/p X 1/p
∥x+y∥p = |xk +yk |p ≤ |xk |p + |yk |p = ∥x∥p +∥y∥p < ∞.
k=1 k=1 k=1
p
This shows that x + y ∈ ℓ , and also that ∥·∥p satisfies the triangle inequality.
Thus ℓp is a subspace of the space of all sequences and thus a vector space
itself, and ∥·∥p is a norm on ℓp . □
In addition, we can also show that Hölder’s inequality holds on ℓp :
Lemma 4.95 (Hölder’s inequality on ℓp ). Let 1 < p < ∞. For all sequences
(xk )k∈N , (yk )k∈N ⊂ K we have that
    (40)    | \sum_{k=1}^{∞} xk yk | ≤ \sum_{k=1}^{∞} |xk yk | ≤ ( \sum_{k=1}^{∞} |xk |p )^{1/p} ( \sum_{k=1}^{∞} |yk |^{p∗} )^{1/p∗} .
Proof. The first inequality in (40) is obvious.
Moreover, we obtain from Hölder’s inequality in Kn (see Lemma 4.84) that
    \sum_{k=1}^{n} |xk yk | ≤ ( \sum_{k=1}^{n} |xk |p )^{1/p} ( \sum_{k=1}^{n} |yk |^{p∗} )^{1/p∗}
for all n ∈ N. Taking the limit n → ∞ now results in (40). □
The space ℓ∞ is the space of all bounded sequences, whereas ℓ1 is the space
of all absolutely summable sequences. Since every summable sequence necessarily
has to be bounded (and, in fact, converge to 0), it follows that ℓ1 is contained
in ℓ∞ . However, the converse inclusion does not hold, as the constant sequence
x = (1, 1, 1, . . .) is contained in ℓ∞ but certainly not in ℓ1 . Thus we have that
ℓ1 ⊊ ℓ∞ . This type of inclusion can be generalised to all ℓp spaces:
Lemma 4.96. Assume that 1 < p < q < ∞. Then
ℓ1 ⊊ ℓp ⊊ ℓq ⊊ ℓ∞ .
Proof. We show first that all the inclusions hold. Here we note first that
all sequences in ℓq necessarily have to converge to 0 and thus are bounded, which
proves the last inclusion. Next assume that x ∈ ℓp . Then
    ∥x∥q^q = \sum_{k=1}^{∞} |xk |^q = \sum_{k=1}^{∞} |xk |^{q−p} |xk |p ≤ ( \sup_{k∈N} |xk |^{q−p} ) \sum_{k=1}^{∞} |xk |p = ( \sup_{k∈N} |xk |^{q−p} ) ∥x∥p^p .
Since q > p and the sequence x is bounded, it follows that ∥x∥q < ∞ and thus
x ∈ ℓq . This shows that ℓp ⊂ ℓq . Moreover, the inclusion ℓ1 ⊂ ℓp can be shown
using a similar argument.
Now we show that the inclusions are actually strict. First we note that the
constant sequence x = (1, 1, . . .) is contained in ℓ∞ but not in ℓq , which shows that
ℓq ⊊ ℓ∞ . Next consider the sequence x = (x1 , x2 , . . .) given by
    xk = \frac{1}{k^{1/p}} .
Then
    ∥x∥q^q = \sum_{k=1}^{∞} \frac{1}{k^{q/p}} < ∞
as q > p and therefore x ∈ ℓq . However,
    ∥x∥p^p = \sum_{k=1}^{∞} \frac{1}{k} = +∞,
which shows that x ̸∈ ℓp . Thus ℓp ⊊ ℓq . Moreover, the proof of the strict inclusion
ℓ1 ⊊ ℓp is similar. □
Lemma 4.97 (Inner product on ℓ2 ). The norm ∥·∥2 on the space ℓ2 is induced
by the inner product
    ⟨x, y⟩ := \sum_{k=1}^{∞} xk \overline{yk} .
Proof. It is clear from the definition that ∥x∥2 = (⟨x, x⟩)1/2 . Thus we only
have to show that ⟨·, ·⟩ is actually an inner product. However, it is straightforward
to verify that all the requirements for an inner product, that is, linearity in the first
component, conjugate symmetry, and positive definiteness, are satisfied. □
As a final result, we show that all the ℓp -spaces are actually complete.
Theorem 4.98. The space (ℓp , ∥·∥p ) is a Banach space for all 1 ≤ p ≤ ∞, and
a Hilbert space for p = 2.
Proof. We only have to show that (ℓp , ∥·∥p ) is complete. For simplicity, we
will restrict ourselves to the case p = 1. (The case 1 < p < ∞ is similar; the case
p = ∞ is simpler, and actually a problem on the current exercise sheet.)
Assume that (x(n) )n∈N ⊂ ℓ1 is a Cauchy sequence. We have to show that this
sequence convergences with respect to ∥·∥1 to some x ∈ ℓ1 . We start by identifying
a candidate for x.
Since (x(n) )n∈N is a Cauchy sequence, there exists for all ε > 0 some N ∈ N
such that
    ∥x^{(n)} − x^{(m)} ∥1 = \sum_{k=1}^{∞} |x_k^{(n)} − x_k^{(m)} | < ε
whenever n, m ≥ N . In particular, this shows that
    |x_k^{(n)} − x_k^{(m)} | < ε
for all k ∈ N and all n, m ≥ N , and thus (x_k^{(n)} )n∈N ⊂ K is for all k ∈ N a Cauchy
sequence in K. Since K is complete, it follows that xk := \lim_{n→∞} x_k^{(n)} exists for
all k ∈ N. Define now x := (x1 , x2 , . . .). We will show that x ∈ ℓ1 and that
lim_{n→∞} ∥x^{(n)} − x∥1 = 0, that is, x = lim_{n→∞} x^{(n)} with respect to ∥·∥1 .
Since (x^{(n)} )n∈N is a Cauchy sequence, it is bounded by Lemma 4.47. Thus
there exists R > 0 such that
    ∥x^{(n)} ∥1 < R    for all n ∈ N.
For all K ∈ N we have that
    (41)    \sum_{k=1}^{K} |xk | = \sum_{k=1}^{K} | \lim_{n→∞} x_k^{(n)} | = \sum_{k=1}^{K} \lim_{n→∞} |x_k^{(n)} | = \lim_{n→∞} \sum_{k=1}^{K} |x_k^{(n)} |
                ≤ \sup_{n∈N} \sum_{k=1}^{K} |x_k^{(n)} | ≤ \sup_{n∈N} \sum_{k=1}^{∞} |x_k^{(n)} | = \sup_{n∈N} ∥x^{(n)} ∥1 ≤ R.
Note here that we were able to exchange the order of the limit and the sum, as we
were only dealing with a finite sum. The estimate (41) holds for all K ∈ N. Thus
it still holds in the limit K → ∞, which shows that
    ∥x∥1 = \lim_{K→∞} \sum_{k=1}^{K} |xk | ≤ R,
and, in particular, x ∈ ℓ1 .
Finally we show that limn→∞ ∥x(n) − x∥1 = 0. Let therefore ε > 0. Since
(x(n) )n∈N is a Cauchy sequence, there exists N ∈ N such that ∥x(n) − x(m) ∥1 < ε
whenever n, m ≥ N . In particular, we have for every K ∈ N that
    \sum_{k=1}^{K} |x_k^{(n)} − x_k^{(m)} | < ε    for all n, m ≥ N.
Taking the limit m → ∞, this implies that
    \sum_{k=1}^{K} |x_k^{(n)} − xk | ≤ ε
for all K ∈ N and all n ≥ N . Now we can again take the limit K → ∞ and obtain
that
    ∥x^{(n)} − x∥1 = \sum_{k=1}^{∞} |x_k^{(n)} − xk | ≤ ε
for all n ≥ N . Since ε > 0 was arbitrary, this proves that ∥x(n) − x∥1 → 0. Thus
(ℓ1 , ∥·∥1 ) is complete. □
10. Completions
We have seen that there exist incomplete normed spaces like, for instance, the
space C([0, 1]) with the norm ∥·∥1 . Now one can ask the question, whether it is
possible to enlarge this space in such a way that it becomes complete. The basic
intuition for this is that we only need to “add all limits of the non-convergent
Cauchy sequences” to the space in order to achieve this goal. In this section, we
will discuss this idea in a somewhat more mathematical manner. For that, we
have to introduce some more notation.
Definition 4.99 (Isomorphism). Let U and V be vector spaces (over the same
field K). An isomorphism between U and V is a bijective linear transformation
T : U → V . The spaces U and V are called isomorphic, if an isomorphism between
U and V exists. ■
Example 4.100. The spaces Kn+1 and Pn (K) (of polynomials of degree ≤ n)
are isomorphic. An example of an isomorphism between Kn+1 and Pn (K) is the
mapping that maps a vector (c0 , c1 , . . . , cn ) to the polynomial p(x) = c0 + c1 x +
. . . + cn xn . ■
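A brief Python sketch of this isomorphism (the concrete coefficient vectors are arbitrary illustrative choices, not from the notes): numpy’s Polynomial class maps a coefficient vector (c0 , . . . , cn ) to the polynomial c0 + c1 x + . . . + cn x^n, and the map is compatible with addition and scalar multiplication.

    import numpy as np

    c = np.array([2.0, -1.0, 3.0])                 # a vector in K^3
    p = np.polynomial.Polynomial(c)                # the polynomial 2 - x + 3x^2

    d = np.array([1.0, 0.0, 5.0])
    q = np.polynomial.Polynomial(d)

    # The map respects addition and scalar multiplication of coefficient vectors.
    print((p + q).coef, c + d)                     # same coefficients
    print((2.0 * p).coef, 2.0 * c)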
Example 4.101. The spaces K^{n²} and Matn (K) are isomorphic. One example
of an isomorphism between K^{n²} and Matn (K) is the mapping that maps a vector
x ∈ K^{n²} to the matrix A = (aij )1≤i,j≤n ∈ Matn (K) with entries given as aij =
x_{n(i−1)+j} for 1 ≤ i, j ≤ n. ■

Example 4.102. Generally, every n-dimensional vector space U over K is iso-


morphic to Kn . If (u1 , . . . , un ) is a basis of U , we can define an isomorphism
T : Kn → U by
T (α1 , . . . , αn ) = α1 u1 + . . . + αn un .

Essentially, we say that the spaces U and V are isomorphic if they “look the
same” as vector spaces. However, we ignore all additional structures like norms or
inner product in this definition.
Definition 4.103 (Isometry). Let (U, ∥·∥U ) and (V, ∥·∥V ) be normed spaces
and let T : U → V be linear. We say that T is an isometry or an embedding, if
∥T u∥V = ∥u∥U for all u ∈ U .
We say that (U, ∥·∥U ) is embedded in (V, ∥·∥V ), if there exists an embedding
T: U →V. ■
Remark 4.104. If T : U → V is an isometry, then it is necessarily injective: If
T u = 0, then ∥T u∥V = 0 and thus also ∥u∥U = 0, which can only happen if u = 0.
This shows that ker T = {0}, which implies the injectivity of T . ■
Example 4.105. The space (Kn , ∥·∥p ) is embedded in (Kn+1 , ∥·∥p ). One ex-
ample of an embedding is the mapping T : Kn → Kn+1 defined by (x1 , . . . , xn ) 7→
(x1 , . . . , xn , 0).
In general, however, the space (Kn , ∥·∥p ) is not embedded in (Kn+1 , ∥·∥q ) if
q ̸= p. Whereas there exist many linear mappings T : Kn → Kn+1 , in general it is
impossible to find any one that satisfies ∥T u∥q = ∥u∥p for all u ∈ Kn .5 ■
Theorem 4.106 (Completion). Assume that (U, ∥·∥U ) is a normed space. Then
there exists a Banach space (V, ∥·∥V ) and an embedding i : U → V such that ran i
is dense in V .
Moreover, the space V is unique in the following sense: If W is another Banach
space with the same properties, then the spaces V and W are isometrically iso-
morphic. (That is, there exists a bijective isometry T : V → W .)
Definition 4.107. The space (V, ∥·∥V ) from Theorem 4.106 is called the com-
pletion of (U, ∥·∥U ). ■
Intuitively, we can say that U is a dense subspace of V and that the mapping
i : U → V is just the inclusion operator that maps u ∈ U to itself, but now seen as
an element of V . Because of the density of U in V , it now follows that every element
v ∈ V can be approached by a sequence (uk )k∈N ⊂ U such that v = limk→∞ uk .
We thus can say informally that the completion V of U consists of the “limits” of
all Cauchy sequences in the space U .

5There are some exceptions to this, though. Try to find them!



Example 4.108. Denote by



cfin := {x = (xk)k∈N : there exists K ∈ N such that xk = 0 for all k > K}
the set of all finite sequences. Then it is easy to see that cfin is a vector space.
Moreover, for each 1 ≤ p ≤ ∞ we can consider the p-norm ∥·∥p on cfin (note that cfin ⊂ ℓp for every p).
We will show in the following that for 1 ≤ p < ∞ the completion of (cfin , ∥·∥p )
is equal to ℓp . For that, we recall first that ℓp is indeed a Banach space. Moreover,
the inclusion
i : cfin → ℓp , ix = x
p
is an embedding of cfin into ℓ . We therefore only have to show that cfin is dense
in ℓp .
Let therefore x = (x_k)_{k∈N} ∈ ℓ^p be arbitrary. We have to show that there exist
x^{(n)} ∈ cfin such that lim_{n→∞} ∥x^{(n)} − x∥_p = 0. Define therefore
x^{(n)} := (x_1, . . . , x_n, 0, 0, . . .) ∈ cfin.


Let moreover ε > 0. Since x ∈ ℓ^p, we have that
∥x∥_p = (∑_{k=1}^{∞} |x_k|^p)^{1/p} < ∞,
and thus there exists N ∈ N such that
(∑_{k=N+1}^{∞} |x_k|^p)^{1/p} < ε.
Let now n ≥ N . Then
∥x − x^{(n)}∥_p = (∑_{k=n+1}^{∞} |x_k|^p)^{1/p} ≤ (∑_{k=N+1}^{∞} |x_k|^p)^{1/p} < ε.
Since this holds for all n ≥ N and ε > 0 was arbitrary, this shows that x =
limn→∞ x(n) . Thus every vector x ∈ ℓp can be approximated (in the p-norm) by a
sequence of vectors in cfin , and thus cfin is dense in ℓp . Together, this implies that
ℓp is the completion of cfin with respect to ∥·∥p for 1 ≤ p < ∞.

We now consider the case p = ∞. In this case, it is easy to see that cfin is not
dense in ℓ∞ . Indeed, take x = (1, 1, . . .) ∈ ℓ∞ . Then, if y = (y1 , . . . , yK , 0, . . .) ∈ cfin
is any finite sequence, it follows that ∥x − y∥∞ ≥ 1. Thus it is impossible to find
any sequence of finite sequences that converges to x.
In fact, one can show that the completion of cfin with respect to ∥·∥∞ is the
space c0 of all sequences that converge to 0. For a proof, I have to refer to the
exercises. ■
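The truncation argument above is easy to see numerically. The following Python sketch is only an illustration (the helper p_norm, the test sequence x_k = 1/k, and the truncation level are my own choices, not part of the notes): it approximates an element of ℓ^2 by its truncations, and shows that, in contrast, no finite sequence gets close to (1, 1, 1, . . .) in the ∞-norm.

    import numpy as np

    def p_norm(x, p):
        # p-norm of a finitely supported sequence, for 1 <= p < infinity
        return np.sum(np.abs(x) ** p) ** (1.0 / p)

    K = 10_000                         # truncation level standing in for "infinity"
    x = 1.0 / np.arange(1, K + 1)      # x_k = 1/k lies in l^2 (but not in l^1)

    for n in [10, 100, 1000]:
        tail = x.copy()
        tail[:n] = 0.0                 # the difference x - x^(n) keeps only the entries k > n
        print(n, p_norm(tail, 2))      # tends to 0 as n grows

    ones = np.ones(K)                  # the constant sequence (1, 1, 1, ...), an element of l^infinity
    y = np.concatenate((np.ones(100), np.zeros(K - 100)))   # some finite sequence
    print(np.max(np.abs(ones - y)))    # equals 1: the sup-distance to any finite sequence never drops below 1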

Remark 4.109. One important example in practice is the case of the space
C([0, 1]) with the norm ∥·∥p , 1 ≤ p < ∞. Here the completion of (C([0, 1]), ∥·∥p ) is
the space Lp ([0, 1]) of Lebesgue p-integrable functions. A detailed discussion of the
construction of these spaces can be found in the course TMA4225 - Foundations of
Analysis. ■

Remark 4.110. If U is an inner product space, then its completion V will
necessarily be a Hilbert space. The reason for this is that the parallelogram law
holds on the dense subspace U ⊂ V because U is an inner product space; by the
continuity of the norm it therefore also holds on the whole space V , so that the
norm on V is again induced by an inner product. ■
CHAPTER 5

Bounded Linear Operators

1. Continuity of linear mappings


In the following, we will study linear mappings between normed spaces. To
start with, we have to address the question of continuity. It is not that difficult
to see that linear mappings T : Kn → Km are necessarily continuous. In the case
of linear mappings between arbitrary (infinite dimensional) normed spaces, how-
ever, the continuity need not always hold. As an example, consider the mapping
T : C([0, 1]) → R, f 7→ T f := f (1). This mapping is not continuous with re-
spect to the norm ∥·∥1 on C([0, 1]). This can be seen by considering the sequence
fn (x) := xn . We have that
∥f_n∥_1 = ∫_0^1 |x^n| dx = 1/(n+1) → 0 as n → ∞,
which shows that 0 = lim_{n→∞} f_n with respect to ∥·∥_1. However, T(f_n) = f_n(1) = 1 for all n ∈ N and
thus
T(lim_{n→∞} f_n) = T(0) = 0 ≠ 1 = lim_{n→∞} T f_n.
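This can also be observed numerically. The sketch below is purely illustrative (the grid and the use of the trapezoidal rule are my own choices): it shows that ∥f_n∥_1 shrinks to 0 while T f_n = f_n(1) stays equal to 1.

    import numpy as np

    xs = np.linspace(0.0, 1.0, 10_001)       # grid on [0, 1] for a simple quadrature

    for n in [1, 5, 20, 100]:
        fn = xs ** n
        one_norm = np.trapz(np.abs(fn), xs)  # approximates the integral of |x^n|, i.e. 1/(n+1)
        print(n, one_norm, fn[-1])           # one_norm -> 0, while f_n(1) = 1 for every n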
Now assume that (U, ∥·∥U ) and (V, ∥·∥V ) are normed spaces and that T : U → V
is an arbitrary (not necessarily linear) mapping. Then T is continuous at a point
u ∈ U if one of the following equivalent conditions holds:
(1) For all sequences (un )n∈N with ∥un −u∥U →n→∞ 0 we have that ∥T (un )−
T (u)∥V →n→∞ 0.
(2) For all ε > 0 there exists δ > 0 such that ∥T (u) − T (v)∥V < ε whenever
v ∈ U satisfies ∥u − v∥U < δ.
In the case of linear mappings, these conditions can be simplified significantly.
Definition 5.1 (Bounded linear mapping). Let (U, ∥·∥U ) and (V, ∥·∥V ) be
normed spaces and let T : U → V be linear. We say that T is bounded, if there
exists C ≥ 0 such that
(42) ∥T u∥V ≤ C∥u∥U for all u ∈ U.

Example 5.2. Let A = (a_{ij}) ∈ K^{m×n} be a matrix and consider the linear
mapping A : K^n → K^m, x ↦ Ax. We will show that A is bounded with respect to
∥·∥_1 on both K^n and K^m. For that we estimate
∥Ax∥_1 = ∑_{i=1}^{m} |(Ax)_i| = ∑_{i=1}^{m} |∑_{j=1}^{n} a_{ij} x_j| ≤ ∑_{i=1}^{m} ∑_{j=1}^{n} |a_{ij}| |x_j|
= ∑_{j=1}^{n} (∑_{i=1}^{m} |a_{ij}|) |x_j| ≤ (max_{1≤j≤n} ∑_{i=1}^{m} |a_{ij}|) ∑_{j=1}^{n} |x_j| = (max_{1≤j≤n} ∑_{i=1}^{m} |a_{ij}|) ∥x∥_1.
Thus (42) holds with
C = max_{1≤j≤n} ∑_{i=1}^{m} |a_{ij}|


and thus the mapping A is bounded. ■

Example 5.3. Let k : [0, 1] × [0, 1] → K be continuous and define
T : C([0, 1]) → C([0, 1]),   T f(x) := ∫_0^1 k(x, y) f(y) dy.
Then T is bounded with respect to ∥·∥_∞ on both spaces. Indeed, we can estimate
∥T f∥_∞ = max_{x∈[0,1]} |T f(x)| = max_{x∈[0,1]} |∫_0^1 k(x, y) f(y) dy|
≤ max_{x∈[0,1]} ∫_0^1 (max_{y∈[0,1]} |k(x, y)|) |f(y)| dy = (max_{(x,y)∈[0,1]^2} |k(x, y)|) ∫_0^1 |f(y)| dy
≤ (max_{(x,y)∈[0,1]^2} |k(x, y)|) ∥f∥_∞.
Thus (42) holds with
C = max_{(x,y)∈[0,1]^2} |k(x, y)|
and T is bounded. ■

Example 5.4. Let T : C([0, 1]) → K, T f = f(1). Then T is bounded with
respect to ∥·∥_∞, because
|T f| = |f(1)| ≤ max_{x∈[0,1]} |f(x)| = ∥f∥_∞
for all f ∈ C([0, 1]).
However, the mapping T is unbounded with respect to ∥·∥_1. Consider e.g. the
functions f_n(x) := (n + 1)x^n. Then
∥f_n∥_1 = ∫_0^1 |(n + 1)x^n| dx = 1
for all n ∈ N, but
|T f_n| = |f_n(1)| = n + 1 = (n + 1)∥f_n∥_1.
Thus it is not possible to find a finite number C ≥ 0 such that (42) holds, and thus
T is unbounded with respect to ∥·∥1 . ■

It is not by accident that we found the same mapping T to be both unbounded


and discontinuous, as the following important theorem shows:
Theorem 5.5 (Boundedness and continuity). Let (U, ∥·∥U ) and (V, ∥·∥V ) be
normed spaces and let T : U → V be linear. The following are equivalent:
(1) T is bounded.
(2) T is continuous.
(3) There exists u0 ∈ U such that T is continuous at u0 .
Proof. We start by showing the implication (1) =⇒ (2). Assume therefore
that T is bounded, that is, that there exists C ≥ 0 such that
∥T u∥V ≤ C∥u∥U for all u ∈ U.
Assume moreover that u ∈ U and that (un )n∈N is a sequence in U satisfying
limn→∞ un = u. Then
∥T (un ) − T (u)∥V = ∥T (un − u)∥V ≤ C∥un − u∥U →n→∞ 0,
and thus T u = limn→∞ T un , which proves that T is continuous at u. Since u was
arbitrary, this proves the continuity of T .

The implication (2) =⇒ (3) is trivial.


For the implication (3) =⇒ (1), assume that u_0 ∈ U is such that T is
continuous at u_0. Then there exists δ > 0 such that
∥T u_0 − T v∥_V < 1 whenever v ∈ U satisfies ∥u_0 − v∥_U < δ.
Now let w ∈ U be arbitrary with w ≠ 0. Define moreover
v = u_0 + (δ / (2∥w∥_U)) w.
Then
∥u_0 − v∥_U = ∥(δ / (2∥w∥_U)) w∥_U = (δ / (2∥w∥_U)) ∥w∥_U = δ/2 < δ,
and thus
1 > ∥T u_0 − T v∥_V = ∥T u_0 − T(u_0 + (δ / (2∥w∥_U)) w)∥_V = ∥T((δ / (2∥w∥_U)) w)∥_V = (δ / (2∥w∥_U)) ∥T w∥_V.
This shows that
∥T w∥_V ≤ (2/δ) ∥w∥_U.
Since w was arbitrary and the constant on the right hand side is independent of w,
it follows that T is bounded. □
We have thus shown that, for linear mappings on normed spaces, continuity is
precisely the same as boundedness and continuous linear mappings are the same as
bounded linear mappings. Because of the comparatively simple definition of bounded-
ness, one usually uses the term “bounded linear mapping” instead of “continuous linear
mapping.”
Remark 5.6. The equivalence of boundedness and continuity does not hold
for general non-linear mappings. There are, however, generalisations to e.g. bilinear
mappings available, which turn out to be extremely useful for the analysis of partial
differential equations. ■

Example 5.7. We now revisit the examples from the start of this section.
• Every mapping A : Kn → Km (defined by a matrix) is bounded and thus
continuous with respect to ∥·∥1 on both spaces. More generally, it is possible
to show that every linear mapping between finite dimensional normed
spaces is continuous.
• The mapping T : C([0, 1]) → K, T f = f (1), is bounded and thus continu-
ous with respect to ∥·∥∞ , but unbounded and discontinuous with respect
to ∥·∥1 .
• If k : [0, 1]2 → K is continuous, then the mapping T : C([0, 1]) → C([0, 1]),
R1
T f (x) = 0 k(x, y)f (y) dy is bounded and thus continuous with respect
to ∥·∥∞ .

We now discuss some further examples on sequence spaces:


Example 5.8. Let 1 ≤ p ≤ ∞. The left shift operator L : ℓp → ℓp is the
mapping defined as
L(x1 , x2 , x3 , . . .) := (x2 , x3 , x4 , . . .).
The right shift operator R : ℓp → ℓp is the mapping defined as
R(x1 , x2 , x3 , . . .) := (0, x1 , x2 , . . .).

Both of these operators are bounded. Specifically, we have that ∥Rx∥p = ∥x∥p for
all x ∈ ℓp , and ∥Lx∥p ≤ ∥x∥p for all x ∈ ℓp . ■
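For finitely supported sequences these relations can be checked directly; the short sketch below is only an illustration (the random test data and the helper p_norm are my own choices).

    import numpy as np

    def p_norm(x, p):
        return np.sum(np.abs(x) ** p) ** (1.0 / p)

    rng = np.random.default_rng(0)
    x = rng.standard_normal(50)               # a finite sequence, viewed as an element of l^p
    p = 3.0

    Lx = x[1:]                                # the left shift drops the first entry
    Rx = np.concatenate(([0.0], x))           # the right shift prepends a zero

    print(p_norm(x, p), p_norm(Rx, p), p_norm(Lx, p))   # ||Rx||_p = ||x||_p and ||Lx||_p <= ||x||_p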

Example 5.9. Assume that y = (y1 , y2 , . . .) ∈ ℓ∞ is some fixed sequence, and


define the mapping T : ℓ^1 → K,
T x := ∑_{k=1}^{∞} x_k y_k.
This mapping is well-defined (in the sense that the infinite series on the right hand
side converges for all x ∈ ℓ^1), as
|∑_{k=1}^{∞} x_k y_k| ≤ (sup_k |y_k|) ∑_{k=1}^{∞} |x_k| = ∥y∥_∞ ∥x∥_1.

In fact, this last inequality actually shows that T is bounded, the constant C in (42)
being ∥y∥∞ .
More generally, let 1 ≤ p ≤ ∞ and let p∗ be the Hölder conjugate of p. Let
moreover y ∈ ℓ^{p∗} be fixed and define T : ℓ^p → K,
T x = ∑_{k=1}^{∞} x_k y_k.
Then Hölder’s inequality implies that
|T x| = |∑_{k=1}^{∞} x_k y_k| ≤ ∥x∥_p ∥y∥_{p∗},

which shows that T is a well-defined bounded linear mapping. ■

2. Norms of bounded linear operators


Assume that U and V are vector spaces. Let moreover T , S : U → V be linear
operators. Then we can define a new operator T + S : U → V by
(T + S)(u) := T u + Su.
Moreover, it is easy to verify that this operator T + S again is linear. For instance,
we have that
(T +S)(u+v) = T (u+v)+S(u+v) = T u+T v +Su+Sv = (T +S)(u)+(T +S)(v).
Moreover, for λ ∈ K we can define the operator λT : U → V by
(λT )(u) := λ(T u).
Again, the linearity of T implies that also λT is linear. Now we can see that the
set of all linear operators between two vector spaces U and V forms itself a vector
space (with the just defined addition and scalar multiplication).
Definition 5.10. Let U and V be vector spaces. By L(U, V ) we denote the
vector space of linear operators T : U → V . ■

Assume now that U and V are normed spaces. Our next goal is to show that
the set of bounded linear operators T : U → V is a subspace of L(U, V ). Moreover,
we will be able to define a natural norm on that space.
Definition 5.11. Let (U, ∥·∥U ) and (V, ∥·∥V ) be normed spaces. By B(U, V )
we denote the set of bounded linear operators T : U → V . ■

Lemma 5.12. Let (U, ∥·∥U ) and (V, ∥·∥V ) be normed spaces. The set B(U, V )
is a vector space.

Proof. We have to show that B(U, V ) is a subspace of L(U, V ). For that,


we have to verify that 0 is bounded linear, that the sum of two bounded linear
operators is again bounded, and that scalar multiples of bounded linear operators
are bounded.
First, the 0-operator is obviously bounded, as
∥0u∥V = ∥0∥V = 0 = 0∥u∥U
for all u ∈ U . Thus 0 ∈ B(U, V ).
Next, assume that T , S ∈ B(U, V ) are bounded linear operators. Then there
exist C, D ≥ 0 such that
∥T u∥V ≤ C∥u∥U and ∥Su∥V ≤ D∥u∥U
for all u ∈ U . Now the triangle inequality implies that
∥(T + S)u∥V = ∥T u + Su∥V ≤ ∥T u∥V + ∥Su∥V ≤ C∥u∥U + D∥u∥U = (C + D)∥u∥U ,
which shows that T + S is bounded as well.
Finally, assume that T ∈ B(U, V ) and λ ∈ K. Because of the boundedness of
T , there exists C ≥ 0 such that
∥T u∥V ≤ C∥u∥U
for all u ∈ U . Now the homogeneity of the norm and the linearity of T imply that
∥(λT )u∥V = ∥λ(T u)∥V = ∥T (λu)∥V ≤ C∥λu∥U = C|λ|∥u∥U
for all u ∈ U , and thus λT is bounded. □
Next, we will use the definition of boundedness of linear operators in order to
define a norm on B(U, V ). By definition, we can find for every T ∈ B(U, V ) some
number C ≥ 0 such that ∥T u∥V ≤ C∥u∥U for all u ∈ U . The smallest such number
C can be used as a norm.
Definition 5.13. Let (U, ∥·∥U ) and (V, ∥·∥V ) be normed spaces. We define the
mapping ∥·∥B(U,V ) : B(U, V ) → R by

(43) ∥T∥_{B(U,V)} := inf{C > 0 : ∥T u∥_V ≤ C∥u∥_U for all u ∈ U}.

The following result provides an alternative characterisation of ∥·∥B(U,V ) that


is more useful in practice.
Lemma 5.14. Let (U, ∥·∥_U) and (V, ∥·∥_V) be normed spaces with U ≠ {0}, and
let T : U → V be bounded linear. Then
(44) ∥T∥_{B(U,V)} = sup_{u∈U, u≠0} ∥T u∥_V / ∥u∥_U = sup_{u∈U, ∥u∥_U=1} ∥T u∥_V.

Proof. We note first that, for all u ∈ U with u ≠ 0, we have
∥T u∥_V / ∥u∥_U = ∥T(u/∥u∥_U)∥_V and ∥u/∥u∥_U∥_U = 1.
This shows the second equality in (44).
Next, assume that C ≥ 0 is such that ∥T u∥_V ≤ C∥u∥_U for all u ∈ U. For every
u ≠ 0 it then follows that C ≥ ∥T u∥_V / ∥u∥_U, and therefore
C ≥ sup_{v∈U, v≠0} ∥T v∥_V / ∥v∥_U.
Taking the infimum over all such C, we obtain that
∥T∥_{B(U,V)} ≥ sup_{v∈U, v≠0} ∥T v∥_V / ∥v∥_U.

Conversely, we have for all u ≠ 0 that
∥T u∥_V = (∥T u∥_V / ∥u∥_U) ∥u∥_U ≤ (sup_{v∈U, v≠0} ∥T v∥_V / ∥v∥_U) ∥u∥_U,
which shows that sup_{v∈U, v≠0} ∥T v∥_V / ∥v∥_U can be used as a constant C in (42).
As a consequence,
∥T∥_{B(U,V)} ≤ sup_{v∈U, v≠0} ∥T v∥_V / ∥v∥_U. □

Remark 5.15. An immediate consequence of Lemma 5.14 is the inequality


∥T u∥V ≤ ∥T ∥B(U,V ) ∥u∥U
for all u ∈ U and all T ∈ B(U, V ). ■

Theorem 5.16. Let (U, ∥·∥U ) and (V, ∥·∥V ) be normed spaces. Then ∥·∥B(U,V )
as defined in (43) defines a norm on B(U, V ).
Proof. Exercise. □

Lemma 5.17. Assume that (U, ∥·∥U ), (V, ∥·∥V ), and (W, ∥·∥W ) are normed
spaces and that T : U → V and S : V → W are bounded linear. Then the com-
position S ◦ T : U → W is bounded linear and
(45) ∥S ◦ T ∥B(U,W ) ≤ ∥S∥B(V,W ) ∥T ∥B(U,V ) .
Proof. For every u ∈ U we have
∥(S ◦ T )u∥W = ∥S(T u)∥W ≤ ∥S∥B(V,W ) ∥T u∥V ≤ ∥S∥B(V,W ) ∥T ∥B(U,V ) ∥u∥U .
This shows that S ◦ T is bounded. Moreover, we can use C = ∥S∥B(V,W ) ∥T ∥B(U,V )
as a constant in the definition (42). Now (45) follows from the definition of ∥S ◦
T ∥B(U,W ) as the smallest such constant. □

Example 5.18. We now consider specifically the case where A ∈ Km×n is a


matrix defining a linear operator A : Kn → Km . Moreover, we consider on Kn and
Km the p-norm for some 1 ≤ p ≤ ∞. For simplicity, we denote
∥A∥p := sup ∥Ax∥p
∥x∥p =1

the norm of the matrix (or operator) A with respect to the p-norm on both spaces.
We have four different cases:
(1) The case p = 2, that is,
∥A∥2 := sup ∥Ax∥2 .
∥x∥2 =1

This is a problem that we have already discussed previously in the chapter


about the singular value decomposition, and we seen in the proof of The-
orem 3.35 that the solution of the optimisation problem on the right hand
side is precisely the largest singular value of A. That is, we have that
∥A∥2 = σ1 .
(2) The case p = 1, that is,
∥A∥1 := sup ∥Ax∥1 .
∥x∥1 =1

This is a case we have discussed previously in Example 5.2. Although we


have not tried to find the optimal constant there, it turns out that we
have actually found it. That is, one can show that
∥A∥_1 = max_{1≤j≤n} ∑_{i=1}^{m} |a_{ij}|.
(3) The case p = ∞, that is,
∥A∥∞ := sup ∥Ax∥∞ .
∥x∥∞ =1

Here one can show that
∥A∥_∞ = max_{1≤i≤m} ∑_{j=1}^{n} |a_{ij}|.

(4) All other cases, that is, p ̸= 1, 2, ∞. Here no analytic formulas for the p-
norm of a general matrix A ∈ Km×n exist, but it is possible to approximate
solutions of the optimisation problem
max ∥Ax∥p subject to ∥x∥p = 1
x∈Kn
numerically.
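The formulas in cases (1)–(3) can be compared against numpy's built-in matrix norms, which implement exactly these characterisations. The following sketch is an illustration only; the random test matrix is my own choice.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 6))

    # case p = 2: the largest singular value
    print(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0])

    # case p = 1: the maximal absolute column sum
    print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())

    # case p = infinity: the maximal absolute row sum
    print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())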

Example 5.19. More generally, we can use different norms on the domain K^n
and the target space K^m, say the p-norm on K^n and the q-norm on K^m with
p ≠ q. That is, we interpret the matrix A as a mapping between the normed spaces
(K^n, ∥·∥_p) and (K^m, ∥·∥_q). The resulting operator norm is defined as
∥A∥_{p→q} := sup_{∥x∥_p=1} ∥Ax∥_q.
In the specific case p = 1, we again have a simple explicit formula for the resulting
norm, namely
(46) ∥A∥_{1→q} = max_{1≤j≤n} (∑_{i=1}^{m} |a_{ij}|^q)^{1/q}
for 1 ≤ q < ∞, and
∥A∥_{1→∞} = max_{1≤i≤m, 1≤j≤n} |a_{ij}|
for q = ∞.
We will show now that (46) holds for 1 ≤ q < ∞. Denote by e_j ∈ K^n the j-th
standard basis vector. Then we have for all x ∈ K^n with ∥x∥_1 = 1 that
∥Ax∥_q = ∥∑_{j=1}^{n} A(x_j e_j)∥_q = ∥∑_{j=1}^{n} x_j (A e_j)∥_q ≤ ∑_{j=1}^{n} |x_j| ∥A e_j∥_q
≤ ∥x∥_1 max_{1≤j≤n} ∥A e_j∥_q = max_{1≤j≤n} (∑_{i=1}^{m} |a_{ij}|^q)^{1/q}.
Moreover, if 1 ≤ k ≤ n is an index where the maximum in (46) is attained, then
∥A e_k∥_q = (∑_{i=1}^{m} |a_{ik}|^q)^{1/q} = max_{1≤j≤n} (∑_{i=1}^{m} |a_{ij}|^q)^{1/q}.
For the case q = ∞, the proof is similar.
It turns out, however, that the cases p ≠ 1 are much more difficult to treat, and
there no longer exist any analytical formulas (unless of course p = q = 2 or p = q = ∞).
Theoretically, the norm ∥A∥_{∞→1} can be computed exactly, but the computation
involves the solution of an NP-hard optimisation problem. For somewhat larger
dimensions, this is not feasible in practice. ■
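Formula (46) states that ∥A∥_{1→q} is the largest q-norm of a column of A. The sketch below is an informal numerical check (the random matrix, the value of q, and the sampling strategy are my own choices): no sampled vector with ∥x∥_1 = 1 exceeds the column bound, which is attained at the corresponding standard basis vector.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 7))
    q = 3.0

    # the right hand side of (46): the largest q-norm of a column of A
    col_bound = max(np.sum(np.abs(A[:, j]) ** q) ** (1.0 / q) for j in range(A.shape[1]))

    worst = 0.0
    for _ in range(10_000):
        x = rng.standard_normal(A.shape[1])
        x /= np.sum(np.abs(x))                              # normalise so that ||x||_1 = 1
        worst = max(worst, np.sum(np.abs(A @ x) ** q) ** (1.0 / q))

    print(worst, col_bound)     # worst stays below col_bound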

Example 5.20. Let 1 ≤ p ≤ ∞ and consider again the left and right shift
operators L, R : ℓp → ℓp ,
L(x1 , x2 , . . .) = (x2 , x3 , . . .) and R(x1 , x2 , . . .) = (0, x1 , x2 , . . .).
We will show that ∥L∥ = ∥R∥ = 1. For simplicity, we only show this in the case
1 ≤ p < ∞, though the proof for p = ∞ is essentially the same (but the definition
of the p-norm looks different).
First we note that
∥Rx∥_p = (∑_{k=1}^{∞} |(Rx)_k|^p)^{1/p} = (0 + ∑_{k=2}^{∞} |x_{k−1}|^p)^{1/p} = (∑_{k=1}^{∞} |x_k|^p)^{1/p} = ∥x∥_p.
This shows that
∥Rx∥_p / ∥x∥_p = 1
for all x ∈ ℓ^p with x ≠ 0, and thus also ∥R∥ = 1.
Concerning the left shift operator L, we have that
∥Lx∥_p = (∑_{k=1}^{∞} |(Lx)_k|^p)^{1/p} = (∑_{k=1}^{∞} |x_{k+1}|^p)^{1/p} = (∑_{k=2}^{∞} |x_k|^p)^{1/p} ≤ ∥x∥_p,
and thus
∥L∥ = sup_{x≠0} ∥Lx∥_p / ∥x∥_p ≤ 1.
Moreover, if we consider for instance the sequence x = (0, 1, 0, 0, . . .), we see that
∥x∥_p = 1 and ∥Lx∥_p = ∥(1, 0, . . .)∥_p = 1 as well. Thus
∥L∥ ≥ ∥(1, 0, . . .)∥_p / ∥(0, 1, 0, . . .)∥_p = 1.
Together, we obtain again that ∥L∥ = 1. ■

Example 5.21. Let k : [0, 1] × [0, 1] → K be continuous and define the linear
mapping T : C([0, 1]) → C([0, 1]),
T f(x) = ∫_0^1 k(x, y) f(y) dy.
We have previously seen that T is bounded with respect to ∥·∥_∞ on both copies of
C([0, 1]), and we have also derived the estimate
∥T f∥_∞ ≤ (max_{(x,y)∈[0,1]^2} |k(x, y)|) ∥f∥_∞
for all f ∈ C([0, 1]). This shows that
∥T∥ ≤ max_{(x,y)∈[0,1]^2} |k(x, y)|.
However, we cannot conclude that we have equality in this estimate, and, in fact,
in general we do not.
In fact, we can obtain a better estimate of the norm of T as follows:
(47) ∥T f∥_∞ = max_{x∈[0,1]} |T f(x)| = max_{x∈[0,1]} |∫_0^1 k(x, y) f(y) dy|
≤ (max_{x∈[0,1]} ∫_0^1 |k(x, y)| dy) (max_{y∈[0,1]} |f(y)|) = (max_{x∈[0,1]} ∫_0^1 |k(x, y)| dy) ∥f∥_∞.
We therefore see that
(48) ∥T∥ ≤ max_{x∈[0,1]} ∫_0^1 |k(x, y)| dy.
In fact, we have equality in (48).
If k(x, y) ≥ 0 for all (x, y) ∈ [0, 1]^2, this is easy to see: we can simply use the
function f(x) = 1 in (47) and obtain an equality at all points. The argument
is a bit more involved, though, if k changes sign. ■
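For a concrete kernel both bounds can be evaluated numerically. The following sketch is an illustration only (the kernel k(x, y) = x·y, the grid, and the quadrature rule are my own choices): for this non-negative kernel the function f = 1 attains the sharper bound (48), while the crude bound max |k| overestimates ∥T∥.

    import numpy as np

    xs = np.linspace(0.0, 1.0, 2001)
    ys = np.linspace(0.0, 1.0, 2001)
    k = np.outer(xs, ys)                               # k(x, y) = x * y, a non-negative kernel

    crude = np.max(np.abs(k))                          # max |k| = 1
    sharp = np.max(np.trapz(np.abs(k), ys, axis=1))    # max over x of the integral of |k(x, y)| dy = 1/2

    f = np.ones_like(ys)                               # the constant function f = 1 with sup-norm 1
    Tf = np.trapz(k * f, ys, axis=1)                   # (T f)(x) evaluated on the grid
    print(crude, sharp, np.max(np.abs(Tf)))            # the last two values agree; the first is larger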

3. Extension of Linear Mappings


Assume that U and V are finite dimensional vector spaces and that T : U → V
is a linear mapping. Then T is uniquely determined by the action of T on a basis
of U . That is, if (u1 , . . . , un ) is a basis of U , then we know T as soon as we know
the values of T u1 , . . . , T un ∈ V : Because (u1 , . . . , un ) is a basis, we can write each
u ∈ U uniquely in the form
u = α1 u1 + . . . + αn un .
The linearity of T then implies that
T u = α1 (T u1 ) + . . . + αn (T un ).
Conversely, if we are given vectors v1 , . . . , vn ∈ V and a basis (u1 , . . . , un ), then
there exists a unique linear mapping T : U → V with T ui = vi for all 1 ≤ i ≤ n,
and this mapping is given by
n
X n
X
Tu = αi vi if u= αi ui .
i=1 i=1

Now a question is, whether this also works in infinite dimensional vector spaces.
At first, the answer appears to be yes: First of all, we note that the dimension of
V does not play any role in the consideration above. Now assume that U is infinite
dimensional and that (ui )i∈I is a Hamel basis of U (with some infinite index set I).
Then we can again write each u ∈ U uniquely in the form
u = α1 ui1 + . . . + αN uiN
for some N ∈ N, indices i1 , . . . , iN ∈ I, and coefficients α1 , . . . , αN ∈ K. Thus, if
we know the values of T ui ∈ V for all i ∈ I, then we have that
T u = α1 (T ui1 ) + . . . + αN (T uiN ).
There is, however, a pretty significant catch: All of this only works if we actually
have a basis in the sense of linear algebra, that is, if we have vectors ui , i ∈ I, such
that we can write each element u ∈ U as a finite linear combination of the vectors
ui . In most situations we encounter in practice, this is not the case. Consider for
instance the case U = ℓ2 of all square summable sequences. Similar to the case of
Kn we can define there “unit sequences”
(49) e_i = (0, . . . , 0, 1, 0, . . .), with the single nonzero entry 1 at the i-th position,

for i ∈ N. However, these sequences do not form a basis of ℓ2 in the sense above,
as it is impossible to write an infinite sequence as a finite linear combination of the
unit sequences ei .

Note, however, that these unit sequences actually do form a basis of cfin , as we
can write any finite sequence x = (x1 , . . . , xN , 0, . . .) as the finite linear combination
x = ∑_{i=1}^{N} x_i e_i.

Now assume that V is a vector space and that we are given vectors vi ∈ V , i ∈ N.
Then we can define a linear mapping
T : cfin → V,
(x1 , . . . , xN , 0, . . .) 7→ x1 v1 + . . . + xN vN .
The question now becomes, whether it is possible to extend this mapping from cfin
to the whole space ℓ2 in a meaningful way.
Definition 5.22 (Extension). Assume that X and Y are sets and that S ⊂ X
is a subset. Assume moreover that T : S → Y is some mapping. An extension of T
to X is a mapping
T̄ : X → Y
such that
T̄ (x) = T (x) for all x ∈ S.

The next result states that extensions of bounded linear operators exist.
Theorem 5.23. Assume that (U, ∥·∥U ) is a normed space and that (V, ∥·∥V )
is a Banach space. Assume moreover that M ⊂ U is a dense subspace and that
T : M → V is a bounded linear operator. Then there exists a unique bounded linear
extension T̄ of T to U . In addition, we have that
∥T̄ ∥B(U,V ) = ∥T ∥B(M,V ) .
Proof. We start by constructing the mapping T̄ .
Assume that u ∈ U . Because M ⊂ U is dense, there exists a sequence (un )n∈N
with un ∈ M for all n ∈ N and u = limn→∞ un . Now the boundedness of T implies
that
∥T un − T um ∥V = ∥T (un − um )∥V ≤ ∥T ∥B(M,V ) ∥un − um ∥U .
Since (un )n∈N is a Cauchy sequence (as it is convergent), this implies that the
sequence (T un )n∈N is a Cauchy sequence in V . Now the completeness of V implies
that the limit limn→∞ T un exists in V . Define therefore
T̄ u := lim T un .
n→∞

We now show that the value T̄ u ∈ V does not depend on the choice of the
sequence (un )n∈N converging to u. Indeed, assume that (vn )n∈N is another sequence
with u = limn→∞ vn and vn ∈ M for all n ∈ N. Then we have in particular that
∥un − vn ∥U ≤ ∥un − u∥U + ∥u − vn ∥U →n→∞ 0.
Thus it follows that

∥T̄ u − T vn ∥V ≤ ∥T̄ u − T un ∥V + ∥T un − T vn ∥V
≤ ∥T̄ u − T un ∥V + ∥T ∥∥un − vn ∥U →n→∞ 0,
that is T̄ u = limn→∞ T vn .
In particular, for u ∈ M we can choose un = u for all n, which implies that
T̄ u = T u for all u ∈ M . Thus T̄ is indeed an extension of T .

Next, we will show that T̄ is in fact linear. For that, assume that u, v ∈ U , and
let (un )n∈N and (vn )n∈N be sequences in M converging to u and v, respectively.
Then
T̄ u = lim T un and T̄ v = lim T vn .
n→∞ n→∞
Now the continuity of addition in a normed space implies that
lim (un + vn ) = lim un + lim vn = u + v
n→∞ n→∞ n→∞

and thus, by the definition of T̄ and the linearity of T ,


T̄ (u + v) = lim T (un + vn ) = lim T un + lim T vn = T̄ u + T̄ v.
n→∞ n→∞ n→∞
Next, assume that u ∈ U and λ ∈ K. Let again (un )n∈N be a sequence in M
converging to u. Then the continuity of the scalar multiplication implies that
lim λun = λ lim un = λu
n→∞ n→∞
and thus
T̄ (λu) = lim T (λun ) = lim λT un = λ lim T un = λT̄ u.
n→∞ n→∞ n→∞
This shows the linearity of T̄ .
Next we will show that T̄ is bounded and that ∥T̄ ∥B(U,V ) = ∥T ∥B(M,V ) . For
that, assume that u ∈ U . Then there exists a sequence (un )n∈N with un ∈ M for
all n and u = limn→∞ un . Then by construction T̄ u = limn→∞ T un and thus the
continuity of the norm implies that
∥T̄ u∥V = lim ∥T un ∥V ≤ ∥T ∥B(M,V ) lim ∥un ∥U = ∥T ∥B(M,V ) ∥u∥U .
n→∞ n→∞

This shows that T̄ is bounded linear and ∥T̄∥_{B(U,V)} ≤ ∥T∥_{B(M,V)}. On the other
hand, we have that
∥T̄∥_{B(U,V)} = sup_{u∈U, ∥u∥_U=1} ∥T̄ u∥_V ≥ sup_{u∈M, ∥u∥_U=1} ∥T u∥_V = ∥T∥_{B(M,V)},
which shows that, in fact, ∥T̄∥_{B(U,V)} = ∥T∥_{B(M,V)}.


Finally we show the uniqueness of T̄ . Assume therefore that S : U → V is
another bounded linear mapping such that Su = T u for all u ∈ M . Assume
moreover that u ∈ U . The density of M in U again implies that there exists a
sequence (un )n∈N in M such that limn→∞ un = u. Since the mappings S and T̄
are bounded linear, they are continuous. Thus we have that
Su = lim Sun = lim T un = lim T̄ un = T̄ u.
n→∞ n→∞ n→∞

This proves the uniqueness of T̄ , which concludes the proof. □


Corollary 5.24. Assume that (U, ∥·∥U ) and (V, ∥·∥V ) are Banach spaces,
M ⊂ U is some subspace of U , and that T : M → V is a bounded linear oper-
ator. Denote by M̄ ⊂ U the closure of M . Then there exists a unique bounded
linear extension T̄ : M̄ → V .
Proof. The closure M̄ of M is by definition closed and thus complete (as a closed subset of the Banach space U ), which
implies that M̄ is a Banach space. Moreover, M is by definition dense in its closure.
Thus we can apply Theorem 5.23. □
Remark 5.25. The construction in the proof of Theorem 5.23 relies heavily on
both the linearity of T and its boundedness. For unbounded linear operators, it is
still possible to show that extensions exist, but they will in general not be unique.
Moreover, the existence proof relies on the axiom of choice, which means that there
is no possibility for actually constructing such an extension. ■

Example 5.26. Let 1 ≤ p < ∞. Then cfin is dense in ℓp and the unit se-
quences ei , i ∈ N, as defined in (49) form a basis of cfin . Assume now that V is a
Banach space and let vi , i ∈ N, be such that the sequence (∥v1 ∥V , ∥v2 ∥, ∥v3 ∥, . . .)
is contained in ℓp∗ , where p∗ is the Hölder conjugate of p. Define now the mapping
T : cfin → V by
T (x1 , . . . , xN , 0, . . .) = x1 v1 + . . . + xN vN .
Then we have for all x = (x1 , . . . , xN , 0, . . .) ∈ cfin that

∥T x∥V = ∥x1 v1 + . . . + xN vN ∥V ≤ |x1 |∥v1 ∥V + . . . + |xN |∥vN ∥V .

Now we can use Hölder’s inequality and obtain that
|x_1|∥v_1∥_V + . . . + |x_N|∥v_N∥_V ≤ (∑_{n=1}^{N} |x_n|^p)^{1/p} (∑_{n=1}^{N} ∥v_n∥_V^{p∗})^{1/p∗}
≤ ∥x∥_p ∥(∥v_1∥_V, ∥v_2∥_V, . . .)∥_{p∗}.

Thus T : cfin → V is bounded linear. Since cfin ⊂ ℓp is dense, it follows that T


can be extended to a bounded linear mapping T̄ : ℓ^p → V by Theorem 5.23. Moreover, following the
construction in the proof of that theorem, we obtain that
T̄ x = ∑_{n=1}^{∞} x_n v_n := lim_{N→∞} ∑_{n=1}^{N} x_n v_n

for all x = (x1 , x2 , . . .) ∈ ℓp . ■

Example 5.27. Denote by CK (R) the set of continuous functions with compact
support. That is, f ∈ CK (R), if f is continuous and there exists R > 0 such that
f (x) = 0 for all x ∈ R with |x| ≥ R. On CK (R) we can consider the norm ∥·∥1
defined by
∥f∥_1 := ∫_{−∞}^{∞} |f(x)| dx.

Note here that ∥f ∥1 is finite for all f ∈ CK (R): If f ∈ CK (R), then there exists
R > 0 such that f (x) = 0 for all x ̸∈ [−R, R]. Thus
∥f∥_1 = ∫_{−∞}^{∞} |f(x)| dx = ∫_{−R}^{R} |f(x)| dx ≤ 2R max_{x∈[−R,R]} |f(x)| < ∞,

as the function |f | is continuous and thus admits its maximum on the bounded and
closed interval [−R, R].
Denote now by Cb (R) the space of bounded continuous mappings on R. It is
possible to show that Cb (R) is a Banach space with the norm

∥g∥∞ := sup|g(x)|.
x∈R

Consider now the Fourier transform


F : CK (R) → Cb (R),
f 7→ Ff =: fˆ,

where fˆ: R → C is defined as


f̂(ω) = (1/√(2π)) ∫_{−∞}^{∞} f(x) e^{−iωx} dx.

Then we can estimate
∥f̂∥_∞ = sup_{ω∈R} |f̂(ω)| ≤ (1/√(2π)) sup_{ω∈R} ∫_{−∞}^{∞} |f(x) e^{−iωx}| dx
= (1/√(2π)) ∫_{−∞}^{∞} |f(x)| dx = (1/√(2π)) ∥f∥_1.
This shows that F : CK (R) → Cb (R) is bounded linear with ∥F∥ ≤ 1/√(2π). As
a consequence, there exists a bounded linear extension of F to the completion of
CK (R), which can be shown to be the space L1 (R) of Lebesgue summable functions.1
That is, we have a natural definition of the Fourier transform of arbitrary summable
functions. Moreover, the Fourier transform fˆ of a summable function f ∈ L1 (R) is
necessarily a (bounded) continuous function. ■
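The bound ∥f̂∥_∞ ≤ (1/√(2π)) ∥f∥_1 can be tested on a concrete compactly supported function. The sketch below is my own illustration (the hat function, the frequency range, and the quadrature are arbitrary choices) and simply evaluates the defining integral numerically.

    import numpy as np

    xs = np.linspace(-1.0, 1.0, 4001)
    f = np.maximum(0.0, 1.0 - np.abs(xs))        # a continuous "hat" function supported in [-1, 1]

    one_norm = np.trapz(np.abs(f), xs)           # ||f||_1 = 1

    omegas = np.linspace(-50.0, 50.0, 2001)
    fhat = np.array([np.trapz(f * np.exp(-1j * w * xs), xs) for w in omegas]) / np.sqrt(2.0 * np.pi)

    print(np.max(np.abs(fhat)), one_norm / np.sqrt(2.0 * np.pi))   # the first value stays below the second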

4. Range and Kernel


Consider again the left and right shift operators L, R : ℓ2 → ℓ2 , that is
L(x1 , x2 , x3 , . . .) = (x2 , x3 , x4 , . . .),
R(x1 , x2 , x3 , . . .) = (0, x1 , x2 , . . .).
We have already seen that these are bounded linear mappings with ∥L∥ = ∥R∥ = 1.
Moreover, it is easy to see that L is surjective but not injective, whereas R is
injective but not surjective.
Now compare this to the finite dimensional case: In Theorem 2.81 we have
shown that, for a linear mapping T : V → V on a finite dimensional vector space,
injectivity, surjectivity, and bijectivity are all equivalent. As the example of the
shift operator shows, this no longer holds in infinite dimensional spaces. Now we
can ask the question, whether we have other possibilities for deciding whether a
linear mapping on an infinite dimensional space is invertible. In this section we will
give some partial answers to that question. For that, we have to discuss the kernel
and the range of bounded linear mappings.
Recall here that, for a linear mapping T : U → V between vector spaces U and
V , the kernel and range are defined as

ker T = {u ∈ U : T u = 0},
ran T = {v ∈ V : there exists u ∈ U with T u = v}.
Lemma 5.28. Assume that (U, ∥·∥U ) and (V, ∥·∥V ) are normed spaces and that
T : U → V is bounded linear. Then ker T is a closed linear subspace of U .
Proof. Since T is bounded linear, it is continuous. Moreover we can write
ker T = T −1 ({0V }) as the pre-image of the closed set {0V } ⊂ V containing only
the zero vector 0V ∈ V . Thus ker T is closed, being the pre-image of a closed set
under a continuous mapping.2 □
We have now seen that the kernel of a bounded linear mapping is always a
closed linear space. Interestingly, and unfortunately, this is not the case for the
range. That is, if T : U → V is bounded linear, then there may exist a sequence
(vn )n∈N ⊂ V converging to some v ∈ V such that vn ∈ ran T for all n ∈ N, whereas

1 A concrete construction of this space relies on measure theory and is discussed in the course
“Foundations of Analysis.” Essentially, L^1(R) consists of all functions f for which the integral
∫_{−∞}^{∞} |f(x)| dx can be reasonably defined and is finite.
2Alternatively, if (u )
n n∈N ⊂ ker T is a convergent sequence with u = limn→∞ un , then
T un = 0 for all n ∈ N (as un ∈ ker T for all n), and thus the continuity of T implies that
T u = limn→∞ T un = 0 as well, which implies that also u ∈ ker T .

v ̸∈ ran T . Put differently, for every n ∈ N, the equation T u = vn has a solution,


but the “limit equation” T u = v is no longer solvable.
Example 5.29. Consider the mapping T : C([0, 1]) → C([0, 1]),
T f(x) := ∫_0^x f(y) dy.
Then ran T consists of all definite integrals of continuous functions f on the unit
interval [0, 1]. The fundamental theorem of calculus now implies that ran T consists
of all continuously differentiable functions g ∈ C^1([0, 1]) with g(0) = 0. Now
consider both copies of C([0, 1]) with the ∞-norm ∥f∥_∞ = max_{x∈[0,1]} |f(x)|. Then
∥T f∥_∞ = max_{x∈[0,1]} |∫_0^x f(y) dy| ≤ ∫_0^1 |f(y)| dy ≤ max_{x∈[0,1]} |f(x)| = ∥f∥_∞,
which shows that T is bounded. However, the range of T is not closed. Take for
instance the sequence of functions
g_n(x) := √(x + 1/n) − 1/√n.
For all n ∈ N we have that g_n is continuously differentiable on the interval [0, 1]
and g_n(0) = 0. Thus g_n ∈ ran T for all n ∈ N. In addition, we have lim_{n→∞} g_n = g
(with respect to ∥·∥_∞) with
g(x) = √x.
Since the derivative g′(0) does not exist, the function g is not contained in ran T.
That is, the equation
∫_0^x f(y) dy = √x,  x ∈ [0, 1],
has no solution f ∈ C([0, 1]). This shows that ran T is not a closed linear subspace
of C([0, 1]). ■

Theorem 5.30. Assume that (U, ∥·∥U ) and (V, ∥·∥V ) are Banach spaces and
that T : U → V is bounded linear. Assume that there exists C > 0 such that
(50) ∥T u∥V ≥ C∥u∥U
for all u ∈ U . Then ran T ⊂ V is closed and thus a Banach space. Moreover, T
has a bounded linear inverse
T −1 : ran T → U
and we can estimate
∥T^{−1}∥_{B(ran T, U)} ≤ 1/C,
where C > 0 is the constant from (50).
Proof. We show first that T is injective. Assume to that end that u ∈ ker T ,
that is, T u = 0. Then
0 = ∥T u∥V ≥ C∥u∥U ,
which shows that ∥u∥U = 0 and thus u = 0. Thus ker T = {0}, which shows the
injectivity of T .
Next we show that ran T is closed. Assume to that end that (vn )n∈N is a
sequence such that vn ∈ ran T for all n ∈ N and that v = limn→∞ vn exists in V .
We have to show that v ∈ ran T . That is, we need to show that there exists u ∈ U
such that T u = v.
Because vn ∈ ran T , there exist un ∈ U , n ∈ N, such that T un = vn . By (50),
we can now estimate
∥T un − T um ∥V = ∥T (un − um )∥V ≥ C∥un − um ∥U

for all n, m ∈ N, or
(51) ∥u_n − u_m∥_U ≤ (1/C) ∥T u_n − T u_m∥_V = (1/C) ∥v_n − v_m∥_V.
By assumption, the sequence (vn )n∈N converges, which implies that it is a Cauchy
sequence. Now (51) implies that the sequence (un )n∈N is a Cauchy sequence as
well. Because U is a Banach space, it is complete, and thus the Cauchy sequence
(un )n∈N converges to some u ∈ U . Now the boundedness of T implies that
 
T u = T lim un = lim T un = lim vn = v.
n→∞ n→∞ n→∞

Thus v ∈ ran T , which implies that ran T is closed.


By definition, the mapping T : U → ran T is surjective. Because it is also
injective, as we have already shown above, it is bijective, and thus the inverse
T −1 : ran T → U exists and is linear. It thus remains to show that T −1 is bounded
linear with norm bounded by 1/C. Let now v ∈ ran T . Then we can write v =
T (T −1 v) and thus
∥v∥V = ∥T (T −1 v)∥V ≥ C∥T −1 v∥U
by (50). Thus
∥T^{−1} v∥_U ≤ (1/C) ∥v∥_V
for all v ∈ ran T , which shows that T −1 : ran T → U is bounded and that we can
estimate ∥T −1 ∥B(ran T,U ) ≤ 1/C. □

We now show that the inequality (50) is also a necessary condition for T to
have a bounded inverse.
Lemma 5.31. Assume that (U, ∥·∥U ) and (V, ∥·∥V ) are normed spaces and that
T : U → V is bounded linear and bijective. Assume in addition that T −1 : V → U
is bounded. Then there exists C > 0 such that
(52) ∥T u∥V ≥ C∥u∥U for all u ∈ U.
Proof. Without loss of generality assume that U , V ≠ {0}. Then T^{−1} ≠ 0
and thus ∥T −1 ∥B(V,U ) > 0. Now let u ∈ U be arbitrary. Then u = T −1 T u, and
thus we can estimate
∥u∥U = ∥T −1 T u∥U ≤ ∥T −1 ∥B(V,U ) ∥T u∥V .

This shows that (52) holds with C = 1/∥T^{−1}∥_{B(V,U)}. □

Remark 5.32. Assume that (U, ∥·∥U ) and (V, ∥·∥V ) are Banach spaces and
that T : U → V is bounded linear. Then Theorem 5.30 implies that T has a
bounded linear inverse T −1 : V → U provided that ran T ⊂ V is dense and the
estimate (50) holds. (The density of ran T implies that the closure of ran T is equal
to V ; because ran T already is closed by Theorem 5.30 it is equal to its closure and
thus ran T = V .)
Conversely, assume that T : U → V is bounded linear and that T is bijective.
Then it is possible to show that (50) holds for some C > 0 and thus T −1 is actually
a bounded linear mapping as well. This result, however, is far from trivial and goes
beyond the scope of this class.3 ■

3I refer to the class TMA4230 – Functional Analysis for more details.



5. Spaces of Bounded Linear Mappings


We have seen previously that the space B(U, V ) is a normed space whenever
U and V are normed spaces. The next question would then be whether B(U, V ) is
complete and thus a Banach space. As we show in the next result, this is actually
the case provided that the space V is a Banach space itself. Note, though, that the
space U need not be complete.
Theorem 5.33. Assume that (U, ∥·∥U ) is a normed space and that (V, ∥·∥V ) is
a Banach space. Then B(U, V ) is a Banach space with the norm ∥·∥B(U,V ) .
Proof. We have to show that the normed space B(U, V ) is complete, that is,
that every Cauchy sequence in B(U, V ) converges. Assume thus that (Tn )n∈N ⊂
B(U, V ) is a Cauchy sequence. We have to show that there exists T ∈ B(U, V )
such that ∥Tn − T ∥B(U,V ) → 0 as n → ∞.
Let ε > 0. Because (Tn )n∈N is a Cauchy sequence, there exists some N ∈ N
such that
(53) ∥Tn − Tm ∥B(U,V ) < ε whenever n, m ≥ N.
Let now u ∈ U . Then
∥Tn u − Tm u∥U ≤ ∥Tn − Tm ∥B(U,V ) ∥u∥U < ε∥u∥U whenever n, m ≥ N.
Thus for every fixed u ∈ U , the sequence (Tn u)n∈N ⊂ V is a Cauchy sequence.
Because V is complete, it follows that lim_{n→∞} T_n u exists in V . Define therefore
T u := lim_{n→∞} T_n u.
With this definition, we obtain a mapping T : U → V . We still have to show
that T is bounded linear, and that ∥Tn − T ∥B(U,V ) → 0 as n → ∞.
For the linearity of T we observe that
T (u + v) = lim Tn (u + v) = lim (Tn u + Tn v) = lim Tn u + lim Tn v = T u + T v
n→∞ n→∞ n→∞ n→∞
for all u, v ∈ U , and that
T(λu) = lim_{n→∞} T_n(λu) = λ lim_{n→∞} T_n u = λ T u
for all u ∈ U and λ ∈ K.
Since (Tn )n∈N is a Cauchy sequence in a normed space, it is bounded. Thus
there exists C ≥ 0 such that ∥Tn ∥B(U,V ) ≤ C for all n ∈ N. Consequently, we can
estimate
∥T u∥_V = lim_{n→∞} ∥T_n u∥_V ≤ C∥u∥_U
for all u ∈ U . This shows that T is bounded linear.
It remains to show that ∥Tn − T ∥B(U,V ) → 0 as n → ∞. Let again ε > 0, and
let N ∈ N be such that (53) holds. Then we have for all u ∈ U and all n ≥ N that
∥(Tn − T )u∥V = ∥Tn u − T u∥V = lim ∥Tn u − Tm u∥V
m→∞
≤ lim ∥Tn − Tm ∥B(U,V ) ∥u∥U ≤ ε∥u∥U .
m→∞
Thus
∥Tn − T ∥B(U,V ) ≤ ε whenever n ≥ N,
which shows that ∥Tn − T ∥B(U,V ) → 0 as n → ∞. □
In particular, we may apply the previous result to the case V = K.
Definition 5.34 (functional). Let (U, ∥·∥U ) be a normed space. A bounded
linear functional on U is a bounded linear mapping f : U → K. ■

Example 5.35. We consider some examples of bounded linear functionals:



• We consider the Banach space (C([0, 1]), ∥·∥∞ ). For x ∈ [0, 1] define
δx : C([0, 1]) → K,
δx (f ) := f (x).
Then
|δx (f )| = |f (x)| ≤ sup |f (y)| = ∥f ∥∞ ,
y∈[0,1]
which shows that δx is a bounded linear functional. The mapping δx is
called the point evaluation at x, or also the Dirac δ-distribution.4
• Assume that U is an inner product space and that v ∈ U is fixed. Then
the mapping u 7→ ⟨u, v⟩U is bounded linear and thus a bounded linear
functional. For a proof of this statement, I refer to the exercise sheets.
• Consider again the Banach space (C([0, 1]), ∥·∥∞ ), and let g ∈ C([0, 1]) be
a fixed function. Then the mapping Tg : C([0, 1]) → K,
Z 1
Tg (f ) := f (x)g(x) dx
0
is a bounded linear functional on C([0, 1]), as
Z 1
|Tg (f )| ≤ |f (x)||g(x)| dx ≤ ∥f ∥∞ ∥g∥∞
0
for all f ∈ C([0, 1]).

Definition 5.36 (dual space). Assume that U is a normed space. The dual
space of U is defined as
U^∗ := B(U, K) = {f : U → K : f is bounded linear}.


Theorem 5.37. Let (U, ∥·∥U ) be a normed space. Then U ∗ is a Banach space
with the norm
∥f ∥U ∗ = sup |f (u)|.
u∈U,
∥u∥U ≤1

Proof. This immediately follows from Theorem 5.33 using the completeness
of K. □
Example 5.38. We consider the case U = K^n with the norm ∥·∥_p for some
1 ≤ p ≤ ∞. Then every linear mapping T : K^n → K can be written as a matrix-
vector product T x = Ax for some matrix A ∈ K^{1×n} ∼ K^n. Moreover, one can
see that every linear mapping T : K^n → K is actually bounded. Thus we can
“identify” the dual space (K^n)^∗ with the space K^n. The norm on the dual space
(K^n)^∗, however, in general is different from the norm on K^n. Indeed, since we are
considering the p-norm ∥·∥_p on K^n, we have
∥A∥_{(K^n)^∗} = sup_{∥x∥_p=1} |Ax| = sup_{∥x∥_p=1} |∑_{i=1}^{n} a_i x_i|.
By Hölder’s inequality, we can now estimate
|∑_{i=1}^{n} a_i x_i| ≤ ∥A∥_{p∗} ∥x∥_p,
and thus
∥A∥_{(K^n)^∗} ≤ ∥A∥_{p∗}.
We will show now that we have in fact an equality here, that is, that ∥A∥_{(K^n)^∗} =
∥A∥_{p∗}. For this, we have to find for each A ∈ K^n some x ∈ K^n with x ≠ 0 such that
|Ax| = ∥A∥_{p∗} ∥x∥_p. For simplicity, we will restrict ourselves to the case 1 < p < ∞.
For A = 0 this holds for all x ∈ K^n. Let therefore A ∈ K^n \ {0}. Define x ∈ K^n
by setting
x_i := ā_i |a_i|^{p∗−2} if a_i ≠ 0, and x_i := 0 if a_i = 0,
where ā_i denotes the complex conjugate of a_i (for K = R this is simply a_i). Then we have that
|x_i| = |a_i|^{p∗−1} and a_i x_i = |a_i|^{p∗}
for all i. Using the identities p∗ = p/(p − 1) and p∗/p = p∗ − 1 we then can compute
∥x∥_p = (∑_{i=1}^{n} |a_i|^{(p∗−1)p})^{1/p} = (∑_{i=1}^{n} |a_i|^{p∗})^{1/p} = ∥A∥_{p∗}^{p∗/p} = ∥A∥_{p∗}^{p∗−1}.
Thus
|Ax| = |∑_{i=1}^{n} a_i x_i| = ∑_{i=1}^{n} |a_i|^{p∗} = ∥A∥_{p∗}^{p∗} = ∥A∥_{p∗} ∥A∥_{p∗}^{p∗−1} = ∥A∥_{p∗} ∥x∥_p.
Together with the previous estimate, this shows that ∥A∥_{(K^n)^∗} = ∥A∥_{p∗}. Thus the
dual space of (K^n, ∥·∥_p) can be identified with the space (K^n, ∥·∥_{p∗}). ■
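The norming vector constructed above can be reproduced numerically. The sketch below is an illustration for the real case K = R (the random functional a, the value of p, and the sample size are my own choices): the vector x satisfies |Ax| = ∥A∥_{p∗} ∥x∥_p, while random unit vectors never exceed ∥A∥_{p∗}.

    import numpy as np

    def p_norm(v, p):
        return np.sum(np.abs(v) ** p) ** (1.0 / p)

    rng = np.random.default_rng(3)
    a = rng.standard_normal(6)                  # the functional x -> sum a_i x_i on R^6
    p = 3.0
    ps = p / (p - 1.0)                          # the Hoelder conjugate p*

    x = a * np.abs(a) ** (ps - 2.0)             # the norming vector from the example (real case)
    print(abs(a @ x), p_norm(a, ps) * p_norm(x, p))          # these two values coincide

    samples = rng.standard_normal((10_000, 6))
    worst = max(abs(a @ (y / p_norm(y, p))) for y in samples)
    print(worst, p_norm(a, ps))                 # random unit vectors stay below ||a||_{p*}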

Theorem 5.39. Let 1 ≤ p < ∞ and let f : ℓp → K be linear. The following are
equivalent:
(1) f ∈ (ℓp )∗ .
(2) There exists a sequence y ∈ ℓ^{p∗} such that
f(x) = ∑_{k=1}^{∞} x_k y_k
for all x ∈ ℓ^p.
In addition, the sequence y ∈ ℓp∗ from (2) is unique and we have
∥f ∥(ℓp )∗ = ∥y∥p∗ .
Proof. (2) =⇒ (1): By Hölder’s inequality, we can estimate
|f(x)| = |∑_{k=1}^{∞} x_k y_k| ≤ ∥x∥_p ∥y∥_{p∗},
which shows that f is bounded linear and that
(54) ∥f∥_{(ℓ^p)^∗} ≤ ∥y∥_{p∗}.

(1) =⇒ (2): Define the sequence y = (y_1, y_2, . . .) by
y_k = f(e_k)
with e_k ∈ ℓ^p being the k-th unit sequence. Then we have for all finite sequences
x = (x_1, . . . , x_N, 0, . . .) ∈ cfin that
f(x) = ∑_{k=1}^{N} x_k f(e_k) = ∑_{k=1}^{N} x_k y_k.
Moreover, we can estimate
|∑_{k=1}^{N} x_k y_k| = |f(x)| ≤ ∥f∥_{(ℓ^p)^∗} ∥x∥_p.

We will show now that y ∈ ℓ^{p∗}. Again, we assume for simplicity that 1 < p < ∞.
As in Example 5.38 we define a sequence x = (x_1, x_2, . . .) by setting
x_k := ȳ_k |y_k|^{p∗−2} if y_k ≠ 0, and x_k := 0 if y_k = 0.
Then x^{(N)} := (x_1, . . . , x_N, 0, . . .) ∈ cfin ⊂ ℓ^p for all N ∈ N. Define now y^{(N)} :=
(y_1, . . . , y_N, 0, . . .). Similarly as in Example 5.38 we have that
∥x^{(N)}∥_p = (∑_{k=1}^{N} |y_k|^{p∗})^{1/p} = ∥y^{(N)}∥_{p∗}^{p∗−1}.
Thus
∥y^{(N)}∥_{p∗}^{p∗} = ∑_{k=1}^{N} |y_k|^{p∗} = ∑_{k=1}^{N} x_k y_k = |f(x^{(N)})| ≤ ∥f∥_{(ℓ^p)^∗} ∥x^{(N)}∥_p = ∥f∥_{(ℓ^p)^∗} ∥y^{(N)}∥_{p∗}^{p∗−1}.
This shows that
∥y (N ) ∥p∗ ≤ ∥f ∥(ℓp )∗
for all N ∈ N. Taking the limit N → ∞, this implies that
(55) ∥y∥p∗ ≤ ∥f ∥(ℓp )∗
and thus y ∈ ℓp∗ .
Since cfin ⊂ ℓ^p is dense, it follows from Theorem 5.23 that the mapping
x ↦ ∑_{k=1}^{∞} x_k y_k
is a well-defined bounded linear mapping on ℓ^p. Moreover, by construction it coincides
with f on the dense linear subspace cfin of ℓ^p. From the uniqueness in
Theorem 5.23 we can thus conclude that
f(x) = ∑_{k=1}^{∞} x_k y_k for all x ∈ ℓ^p.

Assume now that z ∈ ℓp∗ is another sequence such that



f(x) = ∑_{k=1}^{∞} x_k z_k

for all x ∈ ℓp . Then we have in particular that


yk = f (ek ) = zk .
This proves the uniqueness of the sequence y ∈ ℓp∗ .
Finally, the equality ∥f ∥(ℓp )∗ = ∥y∥p∗ follows from the two estimates (54)
and (55). □

Similarly to the finite dimensional case, this implies that we can “identify” for
1 ≤ p < ∞ the dual space (ℓp )∗ with the space ℓp∗ .

Remark 5.40. For p = ∞ the statement of the previous theorem no longer
holds. It is still straightforward to show that every sequence y ∈ ℓ^1 defines a
bounded linear functional f ∈ (ℓ^∞)^∗ by
f(x) := ∑_{k=1}^{∞} x_k y_k.
Thus we can say that ℓ^1 ⊂ (ℓ^∞)^∗. However, there exist in addition bounded linear
functionals on ℓ^∞ that cannot be written in this form. Even more, there exist
non-zero bounded linear functionals f ∈ (ℓ^∞)^∗, f ≠ 0, such that f(e_k) = 0 for all
k ∈ N. This is due to the fact that the space of finite sequences cfin is not dense in ℓ^∞,
and thus we cannot make use of Theorem 5.23. ■
CHAPTER 6

Hilbert Spaces

1. Best approximation and projections


Previously we have introduced the notion of the orthogonal projection onto
a linear subspace in the context of finite dimensional inner product spaces. In
the following, we will discuss to which extent this can be generalised to infinite
dimensional Hilbert spaces. More general, we will discuss the idea of a projection
onto closed convex subsets of Hilbert spaces.
Definition 6.1 (convex set). Assume that U is a vector space and that K ⊂ U
is a subset. We say that K is convex if for all u, v ∈ K and all 0 < λ < 1 we have
that λu + (1 − λ)v ∈ K. ■

Put differently, a set K is convex if for all vectors u, v ∈ K the whole line
segment connecting u and v is contained in K.
Example 6.2. Assume K ⊂ U is a linear subspace. Then K is convex. Indeed,
assume that u, v ∈ K. Then λu + µv ∈ K for all scalars λ, µ. With 0 < λ < 1 and
µ = 1 − λ we obtain the convexity of K. ■

Example 6.3. Assume (U, ∥·∥U ) is a normed space and let R > 0. Then the
ball

B_R(0) = {u ∈ U : ∥u∥_U < R}
is convex. Indeed, assume that u, v ∈ BR (0), that is, ∥u∥U < R and ∥v∥U < R,
and let 0 < λ < 1. Then
∥λu+(1−λ)v∥U ≤ |λ|∥u∥U +|1−λ|∥v∥U = λ∥u∥U +(1−λ)∥v∥U < λR+(1−λ)R = R,
which shows that λu + (1 − λ)v ∈ BR (0). ■

Definition 6.4 (distance to a set). Assume that (U, ∥·∥U ) is a normed space
and that K ⊂ U is a non-empty subset. For u ∈ U we denote by
dist(u, K) := inf ∥u − v∥U
v∈K

the distance between u and K. ■

Assume now that K ⊂ U is some non-empty subset and that u ̸∈ K. One


interesting question is whether there exists a closest point to u in K. Put differently,
can we find some point z ∈ K such that ∥u − z∥U = dist(u, K)? If yes, is that point
unique? The next result gives a positive answer to both of these questions in the
case where U is a Hilbert space.
Theorem 6.5 (approximation theorem). Assume that U is a Hilbert space and
that K ⊂ U is a non-empty closed and convex subset. Then there exists for each
u ∈ U a unique point z = πK (u) ∈ K called the projection of u onto K such that
∥u − z∥U = dist(u, K).

Proof. By definition of the distance, there exists for each n ∈ N some z_n ∈ K
such that
∥u − z_n∥_U^2 ≤ dist(u, K)^2 + 1/n.
Now let n, m ∈ N. Then we can apply the parallelogram law (see Lemma 4.5) to
z_n − z_m and 2u − z_n − z_m and obtain that
2∥z_n − z_m∥_U^2 + 2∥2u − z_n − z_m∥_U^2
= ∥z_n − z_m + 2u − z_n − z_m∥_U^2 + ∥z_n − z_m − (2u − z_n − z_m)∥_U^2
= 4∥u − z_m∥_U^2 + 4∥u − z_n∥_U^2.
Now the convexity of K and the assumption z_n, z_m ∈ K imply that (z_n + z_m)/2 ∈ K
and thus
∥2u − z_n − z_m∥_U^2 = 4∥u − (z_n + z_m)/2∥_U^2 ≥ 4 dist(u, K)^2.
Thus we can estimate
∥z_n − z_m∥_U^2 ≤ 2∥u − z_m∥_U^2 + 2∥u − z_n∥_U^2 − 4 dist(u, K)^2
≤ 2(dist(u, K)^2 + 1/m) + 2(dist(u, K)^2 + 1/n) − 4 dist(u, K)^2 = 2/m + 2/n.
This, however, shows that the sequence (z_n)_{n∈N} is a Cauchy sequence.
Since U is a Hilbert space and thus complete, it follows that the sequence
(z_n)_{n∈N} converges to some z ∈ U. Now the closedness of K and the assumption
z_n ∈ K for all n ∈ N imply that z ∈ K. As a consequence, we have that
∥u − z∥_U^2 ≥ dist(u, K)^2.
On the other hand, the continuity of the norm implies that
∥u − z∥_U^2 = lim_{n→∞} ∥u − z_n∥_U^2 ≤ dist(u, K)^2.
Thus z ∈ K satisfies
∥u − z∥_U = dist(u, K).
We still have to show that z is unique. Assume therefore that y, z ∈ K satisfy
∥u − y∥_U = dist(u, K) = ∥u − z∥_U.
The convexity of K implies again that (y + z)/2 ∈ K. With the same argumentation
as above, we thus obtain that
∥y − z∥_U^2 ≤ 2∥u − y∥_U^2 + 2∥u − z∥_U^2 − 4 dist(u, K)^2 = 0.
This, however, implies that y = z, which proves the uniqueness of z. □
Theorem 6.6 (characterisation of the projection). Assume that U is a Hilbert
space, that K ⊂ U is a non-empty closed and convex subset, and that u ∈ U . Then
z = πK (u) is the projection of u onto K, if and only if z ∈ K and the variational
inequality
(56) ℜ(⟨u − z, v − z⟩) ≤ 0 for all v ∈ K
holds.
Proof. Assume first that z = πK (u) is the projection of u onto K. Then we
have for all v ∈ K that ∥u − z∥2U ≤ ∥u − v∥2U . Even more, because of the convexity
of K we have that
(57) ∥u − z∥2U ≤ ∥u − λv − (1 − λ)z∥2U for all v ∈ K and 0 ≤ λ ≤ 1.

Now we can write


∥u − z∥2U = ∥u∥2U − 2ℜ(⟨u, z⟩) + ∥z∥2U
and
∥u − λv − (1 − λ)z∥2U = ∥u∥2U + λ2 ∥v∥2U + (1 − λ)2 ∥z∥2U
− 2λℜ(⟨u, v⟩) − 2(1 − λ)ℜ(⟨u, z⟩) + 2λ(1 − λ)ℜ(⟨v, z⟩).
Define therefore
fv (λ) := λ2 ∥v∥2U − λ(2 − λ)∥z∥2U − 2λℜ(⟨u, v⟩) + 2λℜ(⟨u, z⟩) + 2λ(1 − λ)ℜ(⟨v, z⟩).
Thus (57) can be reformulated as
(58) fv (λ) ≥ 0 for all v ∈ K and 0 ≤ λ ≤ 1.
Now note that fv (λ) : R → R is a quadratic function with non-negative leading
coefficient and that fv (0) = 0. Thus (58) holds if and only if fv′ (0) ≥ 0 for all
v ∈ K. Now we compute
fv′ (λ) = 2λ∥v∥2U − 2(1 − λ)∥z∥2U − 2ℜ(⟨u, v⟩) + 2ℜ(⟨u, z⟩) + 2(1 − 2λ)ℜ(⟨v, z⟩),
and thus
fv′ (0) = −2∥z∥2U − 2ℜ(⟨u, v⟩) + 2ℜ(⟨u, z⟩) + 2ℜ(⟨v, z⟩) = −2ℜ(⟨z − u, z − v⟩).
Thus (58), and therefore (57), is equivalent to the variational inequality
−f′_v(0)/2 = ℜ(⟨z − u, z − v⟩) = ℜ(⟨u − z, v − z⟩) ≤ 0 for all v ∈ K.
This shows that the assumption z = πK (u) implies (56).
Conversely, assume that z ∈ K and that (56) holds. As seen above, this is
equivalent to z satisfying (57). Together with the assumption z ∈ K, this implies
that z solves the problem minv∈K ∥u − v∥U . Because of the uniqueness of the
projection, this shows that z = πK (u). □
Corollary 6.7. Assume that U is a Hilbert space and that W ⊂ U is a closed
subspace. Then there exists a unique point z = πW (u) such that z ∈ W and
∥z − u∥ = dist(u, W ) = inf ∥u − w∥.
w∈W

Moreover, z is the unique point satisfying


z∈W and u − z ∈ W ⊥,
where
W^⊥ = {v ∈ U : ⟨v, w⟩ = 0 for all w ∈ W}


is the orthogonal complement of W in U .


Proof. The existence and uniqueness follow from Theorem 6.5. Moreover,
Theorem 6.6 implies that z = πW (u) is the unique point satisfying z ∈ W and
(59) ℜ(⟨u − z, v − z⟩) ≤ 0 for all v ∈ W.
We thus have to show that (59) holds if and only if u − z ∈ W ⊥ .
Assume therefore that (59) holds and let w ∈ W . Since W is a subspace of U
and z ∈ W it follows that λw + z ∈ W for all λ ∈ K. Thus (59) implies that
0 ≥ ℜ(⟨u − z, λw + z − z⟩) = ℜ(⟨u − z, λw⟩) for all λ ∈ K.
Now
ℜ(⟨u − z, λw⟩) = ℜ(λ̄⟨u − z, w⟩) = ℜ(λ)ℜ(⟨u − z, w⟩) + ℑ(λ)ℑ(⟨u − z, w⟩).
This can only be non-positive for all λ ∈ K if ⟨u − z, w⟩ = 0 and thus u − z ∈ W ⊥ .

Conversely, assume that u − z ∈ W ⊥ , that is ⟨u − z, w⟩ = 0 for all w ∈ W , and


let v ∈ W . Then v−z ∈ W because W is a subspace of U , and thus ⟨u−z, v−z⟩ = 0.
In particular (59) holds. □
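In a finite-dimensional Hilbert space the characterisation of Corollary 6.7 is precisely the normal equation of a least-squares problem, which gives a quick way to compute and verify projections numerically. The following sketch is only an illustration (the random matrix B and vector u are my own choices): the columns of B span the subspace W, and the residual u − π_W(u) is orthogonal to every column.

    import numpy as np

    rng = np.random.default_rng(4)
    B = rng.standard_normal((8, 3))        # the columns of B span a (closed) subspace W of R^8
    u = rng.standard_normal(8)

    coeffs, *_ = np.linalg.lstsq(B, u, rcond=None)   # minimises ||u - B c|| over all coefficient vectors c
    z = B @ coeffs                                   # the projection of u onto W

    print(np.max(np.abs(B.T @ (u - z))))             # numerically zero: u - z lies in the orthogonal complement of W
    w = B @ rng.standard_normal(3)                   # an arbitrary element of W
    print(np.linalg.norm(u - z) <= np.linalg.norm(u - w))   # z is at least as close to u as w is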

Corollary 6.8. Assume that U is a Hilbert space, that W ⊂ U is a closed


subspace, and that w0 ∈ U . Define Ŵ := w0 + W . Then there exists for each u ∈ U
a unique point z = πŴ (u) such that πŴ (u) ∈ Ŵ and
∥z − u∥ = dist(u, Ŵ ) = inf ∥u − w∥.
w∈Ŵ

Moreover, z is the unique point satisfying


z ∈ Ŵ and u − z ∈ W ⊥.
Proof. The existence and uniqueness of πŴ (u) follow from Theorem 6.5. De-
note now û := u − w0 . Then it is easy to see that πŴ (u) = πW (û) + w0 . Now by
Corollary 6.7, ẑ := πW (û) is characterised by the conditions ẑ ∈ W and ẑ− û ∈ W ⊥ .
Since ẑ = z − w0 and û = u − w0 , these conditions are the same as the conditions
z ∈ w0 + W and z − u ∈ W ⊥ , which proves the assertion. □

Remark 6.9. We have stated Theorem 6.6 and Corollary 6.7 in the setting of
closed convex subsets (subspaces) of Hilbert spaces. However, the characterisations
for the projections that we have derived there remain valid for arbitrary convex
subsets (subspaces) of inner product spaces. That is, if U is an inner product space
and K ⊂ U is convex and non-empty, then z = πK (u) solves the optimisation
problem
min∥v − u∥,
v∈K
if and only if
z∈K and ℜ(⟨u − z, v − z⟩) ≤ 0 for all v ∈ K.
Moreover, if W ⊂ U is a subspace, then z = πW (u) solves the problem
min ∥u − w∥
w∈W

if and only if
z∈W and u − z ∈ W ⊥.
Note, however, that such a vector need not exist. ■

In the following example we make use of the characterisation of the projection


by means of a variational inequality in order to compute projections on balls in
Hilbert spaces.
Example 6.10. Let U be a Hilbert space, let R > 0 and let K = B̄_R(0) ⊂ U
be the closed ball of radius R in U. Then K is a closed and convex set in a
Hilbert space, and thus the projection π_K(u) exists for every u ∈ U. We will now
show that we can explicitly compute this projection as
(60) π_K(u) = u if ∥u∥ ≤ R, and π_K(u) = R u/∥u∥ if ∥u∥ > R; in short, π_K(u) = R u / max{∥u∥, R}.
First assume that u ∈ K, that is, ∥u∥ ≤ R. Then obviously π_K(u) = u.
Thus (60) holds in this case.
Now assume that u ∉ K, that is, ∥u∥ > R, and denote
z = R u/∥u∥.
Then ∥z∥ = R and thus z ∈ K. In view of Theorem 6.6 we thus have to show that
(61) ℜ(⟨u − z, v − z⟩) ≤ 0 for all v ∈ K.
Assume therefore that v ∈ K. We can compute
⟨u − z, v − z⟩ = ⟨u, v⟩ − ⟨z, v⟩ − ⟨u, z⟩ + ⟨z, z⟩
= ⟨u, v⟩ − (R/∥u∥)⟨u, v⟩ − R∥u∥ + R^2 = (∥u∥ − R)(⟨u, v⟩/∥u∥ − R).
By assumption, ∥u∥ > R. Moreover, the Cauchy–Schwarz–Bunyakovsky inequality
implies that
ℜ(⟨u, v⟩) ≤ |⟨u, v⟩| ≤ ∥u∥∥v∥.
Since ∥v∥ ≤ R it follows that
ℜ(⟨u − z, v − z⟩) = (∥u∥ − R)(ℜ(⟨u, v⟩)/∥u∥ − R) ≤ (∥u∥ − R)(∥v∥ − R) ≤ 0.
Thus (61) holds, which in turn shows that π_K(u) = z = R u/∥u∥. ■
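The closed-form projection (60) can be checked against the variational inequality (56), and the same sketch also illustrates the non-expansiveness proved in Proposition 6.12 below. This is an illustration only; the dimension, the radius, and the sample sizes are my own choices.

    import numpy as np

    rng = np.random.default_rng(5)
    R = 1.0

    def proj_ball(u, R):
        # projection onto the closed ball of radius R around 0, formula (60)
        return R * u / max(np.linalg.norm(u), R)

    u = 3.0 * rng.standard_normal(5)
    z = proj_ball(u, R)

    # the variational inequality (56): Re<u - z, v - z> <= 0 for every v in the ball
    vs = rng.standard_normal((10_000, 5))
    vs = R * vs / np.maximum(np.linalg.norm(vs, axis=1, keepdims=True), R)
    print(np.max((vs - z) @ (u - z)))                # non-positive up to rounding errors

    # non-expansiveness (Proposition 6.12): the projection does not increase distances
    w = 3.0 * rng.standard_normal(5)
    print(np.linalg.norm(proj_ball(u, R) - proj_ball(w, R)) <= np.linalg.norm(u - w))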

Remark 6.11. The existence and uniqueness proof for the projection onto a
convex set relied strongly on the fact that U is a Hilbert space: During the proof
we relied twice on the parallelogram law, both for proving the existence and for
proving the uniqueness. Indeed, in general Banach spaces neither existence nor
uniqueness need to hold.
Consider first the case of U = R2 with the ∞-norm. In this case, the projection
need not be unique. That is, given a closed and convex set K ⊂ U and a point
u ̸∈ K there might be different points z ∈ K such that ∥u − z∥ = dist(u, K). Take,
for instance, K = Re1 the x-axis, and let u = (0, 1). Then dist(u, K) = 1, but
every point (a, 0) ∈ K with |a| ≤ 1 satisfies ∥(a, 0) − u∥∞ = 1 = dist(u, K). That
is, every point (a, 0) ∈ K with |a| ≤ 1 can be considered a projection of (0, 1) onto
K. In particular, the projection is not unique.
Now we consider the case U = ℓ^1, the space of all real-valued summable sequences,
and we define
K = {x ∈ ℓ^1 : ∑_{n=1}^{∞} (1 − 1/n) x_n ≥ 1}.
Then one can show that K is a non-empty, closed and convex set. However, the
projection of 0 onto K does not exist. That is, there exists no x ∈ K such that
∥x − 0∥_1 = dist(0, K). In order to see this, we note first that for all x ∈ K we have
1 ≤ ∑_{n=1}^{∞} (1 − 1/n) x_n ≤ ∑_{n=1}^{∞} (1 − 1/n) |x_n| < ∑_{n=1}^{∞} |x_n| = ∥x∥_1,
the strict inequality holding because necessarily x ≠ 0. That is, ∥x∥_1 > 1 for all
x ∈ K, which also implies that dist(0, K) ≥ 1. On the other hand, for all n ≥ 2 we
have that
(n/(n − 1)) e_n ∈ K,
which shows that
dist(0, K) ≤ ∥(n/(n − 1)) e_n∥_1 = n/(n − 1)
for all n ≥ 2. Taking the infimum over all n ≥ 2, it follows that dist(0, K) ≤ 1.
Together, these estimates show that dist(0, K) = 1. However, as shown above,
∥x∥_1 > 1 for all x ∈ K. Thus there exists no x ∈ K such that ∥x∥_1 = dist(0, K). ■

Proposition 6.12. Assume that U is a Hilbert space and that K ⊂ U is non-


empty, closed and convex. Then we have for all u, v ∈ U that
∥πK (u) − πK (v)∥ ≤ ∥u − v∥.
Proof. If πK (u) = πK (v), the claim is trivial, as the left hand side is equal to
zero. Assume therefore that πK (u) ̸= πK (v). We can write
∥πK (u) − πK (v)∥2 = ⟨πK (u) − πK (v), πK (u) − u + u − v + v − πK (v)⟩
= ⟨πK (v) − πK (u), u − πK (u)⟩ + ⟨πK (u) − πK (v), u − v⟩
+ ⟨πK (u) − πK (v), v − πK (v)⟩.
Since πK (v) ∈ K and πK (u) ∈ K, we can now use the variational inequality (56),
which shows that
ℜ(⟨πK (v) − πK (u), u − πK (u)⟩) ≤ 0,
ℜ(⟨πK (u) − πK (v), v − πK (v)⟩) ≤ 0.
Moreover, using the Cauchy–Schwarz–Bunyakovsky inequality, we can estimate
ℜ(⟨π_K(u) − π_K(v), u − v⟩) ≤ |⟨π_K(u) − π_K(v), u − v⟩| ≤ ∥π_K(u) − π_K(v)∥∥u − v∥.
Thus
∥π_K(u) − π_K(v)∥^2 = ℜ(⟨π_K(v) − π_K(u), u − π_K(u)⟩ + ⟨π_K(u) − π_K(v), u − v⟩ + ⟨π_K(u) − π_K(v), v − π_K(v)⟩)
≤ ℜ(⟨π_K(u) − π_K(v), u − v⟩)
≤ ∥π_K(u) − π_K(v)∥∥u − v∥.
Dividing by ∥π_K(u) − π_K(v)∥, we obtain the claimed estimate. □

2. Projections on subspaces
We now study in more detail the properties of projections onto closed subspaces.
For this we will exploit the characterisation of the projection derived in
Corollary 6.7.
The results to be derived are generalisations of our discussion of the finite
dimensional case in Section 3 of Chapter 3.
Proposition 6.13. Assume that U is a Hilbert space and that W ⊂ U is a
closed subspace. Then the projection πW : U → U is a bounded linear mapping.
Moreover, ker πW = W ⊥ and ran πW = W .
Proof. Linearity: Assume that u, v ∈ U . Then πW (u) ∈ W and πW (v) ∈ W ,
which implies that also πW (u) + πW (v) ∈ W . Moreover, u − πW (u) ∈ W ⊥ and
v − πW (v) ∈ W ⊥ , and thus (u + v) − (πW (u) + πW (v)) ∈ W ⊥ . This shows that
z = πW (u) + πW (v) satisfies the conditions of Corollary 6.7 and thus πW (u) +
πW (v) = πW (u + v).
Similarly, if u ∈ U and λ ∈ K, then λπW (u) ∈ W and
λu − λπW (u) = λ(u − πW (u)) ∈ W ⊥ ,
which shows that λπW (u) = πW (λu).
Boundedness: This is a direct consequence of Proposition 6.12.
Kernel and range: We have that πW (u) = 0 if and only if 0 ∈ W and u − 0 ∈
W ⊥ . This is the case, if and only if u ∈ W ⊥ , which shows that ker πW = W ⊥ . Next,
2. PROJECTIONS ON SUBSPACES 147

πW (u) ∈ W for all u ∈ U , which shows that ran πW ⊂ W . Moreover, πW (u) = u


for all u ∈ W , which proves that, actually, ran πW = W . □
Lemma 6.14. Let U be a Hilbert space and let W ⊂ U be a closed subspace.
Then
U = W ⊕ W ⊥.
Proof. We can write every u ∈ U as u = πW u + (u − πW u) with πW u ∈ W
and u − πW u ∈ W ⊥ . Thus U = W + W ⊥ . Now assume that u ∈ W ∩ W ⊥ . Since
u ∈ W , we have u = πW u. Thus
∥u∥2 = ⟨u, u⟩ = ⟨u, πW u⟩ = 0,
the last equality following from the fact that u ∈ W ⊥ and πW u ∈ W . This shows
that W ∩ W ⊥ = {0}. Thus the sum of W and W ⊥ is direct. □
For the next results, we need some topological information about orthogonal
complements.
Lemma 6.15. Let U be an inner product space and let v ∈ U be fixed. Then
the mapping f : U → K, f (u) := ⟨u, v⟩ is bounded linear.
Proof. The linearity of f is simply the linearity of the inner product in the
first component. Moreover, the boundedness follows from the estimate
|f (u)| = |⟨u, v⟩| ≤ ∥u∥∥v∥.

Lemma 6.16. Assume that U is an inner product space and that S ⊂ U is
non-empty. Then S ⊥ ⊂ U is a closed subspace.
Proof. We have already shown previously that S ⊥ is a subspace. Thus we
only have to show that S ⊥ is closed. For that assume that (un )n∈N ⊂ S ⊥ is a
sequence such that u := limn→∞ un exists in U . Since un ∈ S ⊥ for all n ∈ N, we
have that ⟨un , v⟩ = 0 for all v ∈ S. Because of the boundedness of the functional
w 7→ ⟨un , v⟩ it follows that
⟨u, v⟩ = lim ⟨un , v⟩ = 0
n→∞

for all v ∈ S, which proves that u ∈ S ⊥ . □


Lemma 6.17. Let U be a Hilbert space and let W ⊂ U be a closed subspace.
Then
(W ⊥ )⊥ = W.
Proof. Assume first that w ∈ W . Then ⟨w, z⟩ = 0 for all z ∈ W ⊥ . This,
however, implies that w ∈ (W ⊥ )⊥ . Thus W ⊂ (W ⊥ )⊥ .
Now assume that u ∈ (W ⊥ )⊥ . Since W is a closed subspace, we can write
u = w + z with w = πW u ∈ W and z = u − πW u ∈ W ⊥ . By our previous
considerations, we obtain that w ∈ W ⊂ (W ⊥ )⊥ . Since (W ⊥ )⊥ is a subspace of U ,
this implies that z = u − w ∈ (W ⊥ )⊥ . Now, if we apply Lemma 6.14 to the closed
subspace W ⊥ , we obtain that
U = W ⊥ ⊕ (W ⊥ )⊥ .
In particular, W ⊥ ∩ (W ⊥ )⊥ = {0}. Thus
z ∈ W ⊥ ∩ (W ⊥ )⊥ = {0},
which implies that z = 0 and consequently u = w ∈ W . Thus (W ⊥ )⊥ ⊂ W . □
148 6. HILBERT SPACES

Next we will consider the question as to what happens, if the subspace W is


not closed.
Proposition 6.18. Assume that U is a Hilbert space and that W is a (not
necessarily closed) subspace. Then
(W ⊥ )⊥ = W .

Proof. Since W ⊂ W , it follows that W ⊂ W ⊥ , and thus, as W by definition
is closed,

(W ⊥ )⊥ ⊂ (W )⊥ = W .
Conversely, we obtain as in the proof of the previous Lemma that W ⊂ (W ⊥ )⊥ .
Since (W ⊥ )⊥ is closed and W is the smallest closed subspace of U containing W ,
this implies that actually W ⊂ (W ⊥ )⊥ . □
Corollary 6.19. Assume that U is a Hilbert space and W is a subspace. Then
U = W ⊕ W ⊥.
Proof. Since W ⊥ is closed, we can write
U = W ⊥ ⊕ (W ⊥ )⊥ = W ⊥ ⊕ W .

Corollary 6.20. Assume that U is a Hilbert space and S ⊂ U is a non-empty
subset. Then
S ⊥ = (span S)⊥ = (span S)⊥ .
Proof. Since S ⊂ span S, we have that (span S)⊥ ⊂ S ⊥ . Now assume that
u ∈ S ⊥ . Then ⟨u, w⟩ = 0 for all w ∈ S. Thus ⟨u, w⟩ = 0 whenever w is a finite
linear combination of elements in S. In other words, u ∈ (span S)⊥ . This shows
that S ⊥ = (span S)⊥ . Now it follows from Proposition 6.18 that
(span S)⊥ = (((span S)⊥ )⊥ )⊥ = (span S)⊥ .

Corollary 6.21. Assume that U is a Hilbert space and S ⊂ U a non-empty
subset. Then span S is dense in U if and only if S ⊥ = {0}.
Proof. Recall that span S is dense if and only if span S = U . Now we can
write
U = span S ⊕ (span S)⊥ = span S ⊕ S ⊥ .
Thus U = span S if and only if S ⊥ = {0}, which proves the claim. □

3. Riesz’ Representation Theorem


Recall that the dual of a normed space U is the space U ∗ = B(U, K) of all
bounded linear functionals on U . One of the results we have obtained when we
were discussing the dual of a normed space was that the Hilbert space (ℓ2 )∗ can be
identified with the space ℓ2 . We will now show that this identification of a Hilbert
space with its dual works in general.
Theorem 6.22 (Riesz’ Representation Theorem). Assume that U is a Hilbert
space. Then f ∈ U ∗ if and only if there exists vf ∈ U such that
f (u) = ⟨u, vf ⟩ for all u ∈ U.
Moreover, vf is unique, and we have that
∥f ∥U ∗ = ∥vf ∥.
3. RIESZ’ REPRESENTATION THEOREM 149

Proof. The “if” part follows directly from the fact that the mapping u 7→
⟨u, vf ⟩ is bounded linear, as discussed previously.
For the “only if” part, assume that f ∈ U ∗ . If f = 0, then we can choose
vf = 0. Assume therefore that f ̸= 0, and denote W := ker f . Then W is a closed
subspace of U , and thus we can write U = W ⊕ W ⊥ . Moreover, since f ̸= 0,
we have that W = ker f ⊊ U , which implies that W ⊥ ̸= {0}. Choose now some
z ∈ W ⊥ with z ̸= 0. Since z ̸∈ W = ker f , it follows that f (z) ̸= 0. Moreover, we
have for all u ∈ U that
 f (u)  f (u)
f u− z = f (u) − f (z) = 0,
f (z) f (z)
and thus
f (u)
u− z ∈ ker f = W.
f (z)
Since z ∈ W ⊥ , this implies that
D f (u) E
u− z, z = 0
f (z)
for all u ∈ U . This can be rewritten as
f (u)
⟨u, z⟩ = ∥z∥2 ,
f (z)

or, as ∥z∥ =
̸ 0,
f (z) D f (z) E
f (u) = 2
⟨u, z⟩ = u, z
∥z∥ ∥z∥2
for all u ∈ U . Thus, if we define

f (z)
vf = z,
∥z∥2

we obtain that f (u) = ⟨u, vf ⟩ for all u ∈ U .

Uniqueness: Assume now that w, z ∈ U are such that

⟨u, w⟩ = f (u) = ⟨u, z⟩

for all u ∈ U . Then we have in particular that

⟨u, w − z⟩ = 0 for all u ∈ U,

which implies that w − z = 0 or w = z. This proves the uniqueness of vf .

In order to compute the norm of vf , we note first that

|f (u)| = |⟨u, vf ⟩| ≤ ∥u∥∥vf ∥

for all u ∈ U by the Cauchy–Schwarz–Bunyakovsky inequality, which shows that


∥f ∥U ∗ ≤ ∥vf ∥U . Conversely, for u = vf we obtain that

|f (vf )| = |⟨vf , vf ⟩| = ∥vf ∥2 ,

and thus ∥f ∥U ∗ ≥ ∥vf ∥U . □


150 6. HILBERT SPACES

4. Adjoint operators on Hilbert spaces


In Section 5 we have introduced the notion of the adjoint of a linear trans-
formation T between finite dimensional inner product spaces. There, we have first
defined the adjoint T ∗ based on the singular value decomposition of the mapping
T , and then we have shown that T ∗ can also be characterised by the property that
⟨T u, v⟩ = ⟨u, T ∗ v⟩ for all vectors u and v. We now will generalise the notion of
an adjoint to arbitrary Hilbert spaces. In contrast to the finite dimensional case,
however, we will do so by using the characterisation by ⟨T u, v⟩ = ⟨u, T ∗ v⟩ as the
definition. This is necessary, as we have not developed a theory of the singular
value decomposition in infinite dimensional Hilbert spaces.1
In the proofs of the theorems to come, we repeatedly make use of the following
result, which thus deserves its own Lemma:
Lemma 6.23. Assume that U is an inner product space and that v ∈ U is some
vector. Then v = 0 if and only if
⟨u, v⟩ = 0 for all u ∈ U.
Proof. If v = 0, then also ⟨u, v⟩ = 0 for all u ∈ U .
Conversely, assume that ⟨u, v⟩ = 0 for all u ∈ U . Then in particular ∥v∥2 =
⟨v, v⟩ = 0, which proves that v = 0. □
Theorem 6.24 (Adjoint operator). Assume that U and V are Hilbert spaces
and that T : U → V is a bounded linear transformation. Then there exists a unique
bounded linear transformation T ∗ : V → U such that
⟨T u, v⟩V = ⟨u, T ∗ v⟩U for all u ∈ U and v ∈ V.

The operator T is called the adjoint of T .
Proof. Construction of T ∗ : Define for fixed v ∈ V the mapping fv : U → K,
fv (u) := ⟨T u, v⟩V .
The linearity of T together with the linearity of the inner product imply that fv is
linear. Moreover we have the estimate
|fv (u)| = |⟨T u, v⟩V | ≤ ∥T u∥V ∥v∥V ≤ ∥T ∥B(U,V ) ∥v∥V ∥u∥U ,
which shows that fv is a bounded linear functional on U . The Riesz Representation
Theorem now implies that there exists some w ∈ U such that
fv (u) = ⟨u, w⟩U

for all u ∈ U . Define thus T v := w.
Repeating this construction for each v ∈ V yields a mapping T ∗ : V → U with
the property that
⟨T u, v⟩V = ⟨u, T ∗ v⟩U
for all u ∈ U and v ∈ V . We will now show that T ∗ is bounded linear.
Linearity: Let v, w ∈ V . Then
⟨u, T ∗ (v + w)⟩U = ⟨T u, v + w⟩V = ⟨T u, v⟩V + ⟨T u, w⟩V
= ⟨u, T ∗ v⟩U + ⟨u, T ∗ w⟩U = ⟨u, T ∗ v + T ∗ w⟩U
for all u ∈ U . Thus
⟨u, T ∗ (v + w) − T ∗ v − T ∗ w⟩U = 0
for all u ∈ U , which implies that T ∗ (v + w) = T ∗ v + T ∗ w.
1It is actually possible to generalise the singular value decomposition to operators between
Hilbert spaces. This generalisation is quite complex, though, and requires some knowledge of
measure and integration theory.
4. ADJOINT OPERATORS ON HILBERT SPACES 151

Similarly, if v ∈ V and λ ∈ K, then


⟨u, T ∗ (λv)⟩U = ⟨T u, λv⟩U = λ⟨T u, v⟩U = λ⟨u, T ∗ v⟩U = ⟨u, λT ∗ v⟩U
for all u ∈ U , which shows that T ∗ (λv) = λT ∗ v. This proves the linearity of T ∗ .
Boundedness: For all v ∈ V we can estimate
∥T v∥2U = ⟨T ∗ v, T ∗ v⟩U = ⟨T T ∗ v, v⟩V ≤ ∥T T ∗ v∥V ∥v∥V ≤ ∥T ∥B(U,V ) ∥T ∗ v∥U ∥v∥V .

This shows that


(62) ∥T ∗ v∥U ≤ ∥T ∥B(U,V ) ∥v∥V ,
which proves the boundedness of T ∗ .
Uniqueness: Assume now that S ∈ B(V, U ) is another bounded linear trans-
formation such that ⟨T u, v⟩V = ⟨u, Sv⟩U for all u ∈ U and v ∈ V . Then
⟨u, Sv⟩U = ⟨T u, v⟩V = ⟨u, T ∗ v⟩V
for all u ∈ U and v ∈ V , and thus
⟨u, Sv − T ∗ v⟩U = 0
for all u ∈ U and v ∈ V . This implies that Sv = T ∗ v for all v ∈ V , which proves
that S = T ∗ . □
Definition 6.25. Let U be a Hilbert space and let T : U → U be bounded
linear.
• We say that T is self-adjoint, if T ∗ = T .
• We say that T is normal, if T T ∗ = T ∗ T .

Proposition 6.26 (Properties of adjoints). Assume that U and V are Hilbert


spaces and that T : U → V is bounded linear. Then the following hold:
• (T ∗ )∗ = T .
• ∥T ∗ ∥B(V,U ) = ∥T ∥B(U,V ) .
• ∥T ∗ T ∥B(U,U ) = ∥T T ∗ ∥B(V,V ) = ∥T ∥2B(U,V ) .
Proof. Let u ∈ U and v ∈ V . Then
⟨T u, v⟩V = ⟨u, T ∗ v⟩U = ⟨T ∗ v, u⟩U = ⟨v, (T ∗ )∗ u⟩U = ⟨(T ∗ )∗ u, v⟩V .
Since this holds for all v ∈ V , it follows that T u = (T ∗ )∗ u for all u ∈ U , which in
turn implies that T = (T ∗ )∗ .
We next show that ∥T ∗ ∥B(V,U ) = ∥T ∥B(U,V ) . For that, we note first that the
estimate ∥T ∗ ∥B(V,U ) ≤ ∥T ∥B(U,V ) follows from (62). Applying the same estimate
to T ∗ , we obtain moreover that
∥T ∥B(U,V ) = ∥(T ∗ )∗ ∥B(U,V ) ≤ ∥T ∗ ∥B(V,U ) ,
which proves the assertion.
For showing that ∥T ∗ T ∥B(U,U ) = ∥T ∥2B(U,V ) , we note first that
∥T ∗ T ∥B(U,U ) ≤ ∥T ∗ ∥B(V,U ) ∥T ∥B(U,V ) = ∥T ∥2B(U,V ) ,
as ∥T ∗ ∥B(V,U ) = ∥T ∥B(U,V ) . For the opposite inequality we note that, for all u ∈ U ,
∥T u∥2V = ⟨T u, T u⟩V = ⟨u, T ∗ T u⟩U ≤ ∥u∥U ∥T ∗ T u∥U ≤ ∥u∥2U ∥T ∗ T ∥B(U,U ) .
Thus
∥T u∥2V
∥T ∗ T u∥U ≥ sup = ∥T ∥2B(U,V ) .
u∈U ∥u∥2U
u̸=0
152 6. HILBERT SPACES

This shows that ∥T ∗ T ∥B(U,U ) = ∥T ∥2B(U,V ) .


Finally, we have that
∥T T ∗ ∥B(V,V ) = ∥(T ∗ )∗ T ∗ ∥B(V,V ) = ∥T ∗ ∥2B(V,U ) = ∥T ∥2B(U,V ) ,
which concludes the proof. □

Lemma 6.27. Assume that U , V , and W are Hilbert spaces and that T : U → V
and S : V → W are bounded linear. Then
(S ◦ T )∗ = T ∗ ◦ S ∗ .
Proof. We have that

⟨u, (S ◦ T )∗ w⟩U = ⟨(S ◦ T )u, w⟩W = ⟨S(T u), w⟩W


= ⟨T u, S ∗ w⟩V = ⟨u, T ∗ (S ∗ w)⟩U = ⟨u, (T ∗ ◦ S ∗ )w⟩U
for all u ∈ U and w ∈ W , which proves the assertion. □

We now look at some concrete examples of adjoints:


Example 6.28 (Right and left shift). Let U = ℓ2 and let T = R : U → U be
the right shift operator, that is,
Rx = (0, x1 , x2 , . . .)
2 ∗ 2 2
for all x ∈ ℓ . Then R : ℓ → ℓ is defined by the condition
⟨Rx, y⟩ = ⟨x, R∗ y⟩ for all x, y ∈ ℓ2 .
Now we can write

X ∞
X ∞
X
⟨Rx, y⟩ = (Rx)n yn = xn−1 yn = xn yn+1
n=1 n=2 n=1

and

X
⟨x, R∗ y⟩ = xn (R∗ y)n .
n=1
Thus we have the condition that
X∞ ∞
X
xn yn+1 = xn (R∗ y)n
n=1 n=1
2
for all x, y ∈ ℓ . From this, we can conclude that
yn+1 = (R∗ y)n
for all y ∈ ℓ2 and n ∈ N, which implies that R∗ = L the left shift operator
L(y1 , y2 , y3 , . . .) = (y2 , y3 , . . .).
Obviously, R ̸= L = R∗ and thus R is not self-adjoint. In addition, we note
that (L ◦ R)x = x for all x ∈ ℓ2 , whereas
(R ◦ L)x = (0, x2 , x3 , . . .).
This shows that R ◦ R = R ◦ L ̸= L ◦ R = R∗ ◦ R, and thus R is not normal.

Example 6.29. Recall that L2 ([0, 1]) is the completion of the space C([0, 1])
with respect to the norm ∥·∥2 defined by
Z 1 1/2
∥f ∥2 := |f (x)|2 dx .
0
4. ADJOINT OPERATORS ON HILBERT SPACES 153

This norm is induced by the inner product ⟨·, ·⟩ defined by


Z 1
⟨f, g⟩ := f (x)g(x) dx,
0
2
which implies that L ([0, 1]) is a Hilbert space.
Now assume that k : [0, 1] × [0, 1] → K is continuous and define the mapping
T : C([0, 1]) → L2 ([0, 1]),
Z 1
T f (x) := k(x, y)f (y) dy.
0
Then
Z 1 Z 1 Z 1 2
∥T f ∥22 = 2
|T f (x)| dx = k(x, y)f (y) dy dx
0 0 0
Z 1 Z 1 Z 1 
≤ |k(x, y)|2 dy |f (y)|2 dy dx
0 0 0
Z 1 Z 1 
= |k(x, y)|2 dx dy ∥f ∥22 ,
0 0
which proves that T is bounded linear. As a consequence, T has a unique bounded
linear extension (which we denote for simplicity again by T ) to L2 ([0, 1]).
We now compute the adjoint of T . For all f and g we have that
⟨f, T ∗ g⟩ = ⟨T f, g⟩
Z 1
= T f (x)g(x) dx
0
Z 1 Z 1 
= k(x, y)f (y) dy g(x) dx
0 0
Z 1 Z 1
= k(x, y)f (y)g(x) dy dx
0 0
Z 1Z 1
= k(x, y)f (y)g(x) dx dy
0 0
Z 1 Z 1 
= f (y) k(x, y)g(x) dx dy
0 0
= ⟨f, h⟩,
where Z 1
h(y) := k(x, y)g(x) dx.
0
This shows that T ∗ is the operator given by
Z 1
T ∗ g := k(x, y)g(x) dx.
0

That is, T is again an integral functional with kernel k ∗ (x, y) := k(y, x).

From this we see in particular that T is self-adjoint if k is conjugate symmetric


in the sense that k(x, y) = k(y, x) for all x, y ∈ [0, 1]. ■

Example 6.30. Consider the linear initial value problem with varying coeffi-
cients
y ′ (t) = A(t)y(t),
(63)
y(0) = x,
154 6. HILBERT SPACES

where x ∈ Rn is some given initial value and A : [0, 1] → Rn×n is a continu-


ous function. Then we have for each initial value x ∈ Rn a unique solution
y(t; x) : [0, 1] × Rn → Rn . Moreover, the solution y(t; x) depends for every t ∈ [0, 1]
linearly on the initial value x.
Denote know by T : Rn → Rn the mapping
T x := y(1; x)
that maps the initial value x to the final value of the corresponding solution of (63).
In the following, we will discuss the adjoint mapping T ∗ : Rn → Rn .
For that, define the adjoint equation to (63) as the final value problem
v ′ (t) = −A(t)∗ v(t),
(64)
v(1) = w,
where w ∈ Rn is some given final value and A(t)∗ denotes the adjoint of A(t). Then
we have again for each final value w ∈ Rn a unique solution v(t; w) : [0, 1]×Rn → Rn
of (64).
Now the product rule implies that

⟨y(t; x), v(t; w)⟩′ = ⟨y(t; x)′ , v(t; w)⟩ + ⟨y(t; x), v(t; w)′ ⟩
= ⟨A(t)y(t; x), v(t; w)⟩ + ⟨y(t; x), −A(t)∗ v(t; w)⟩
= ⟨A(t)y(t; x), v(t; w)⟩ − ⟨A(t)y(t; x), v(t; w)⟩ = 0
for all t ∈ [0, 1] and all x, w ∈ Rn . Thus
Z 1
0= ⟨y(t; x), v(t; w)⟩′ dt = ⟨y(1; x), v(1; w)⟩ − ⟨y(0; x), v(0; w)⟩
0
= ⟨T x, w⟩ − ⟨x, v(0; w)⟩
n
for all x, w ∈ R . This shows that
T ∗ w = v(0; w)
for all w ∈ Rn . That is, the adjoint of T maps the final value w to the initial value
of the corresponding solution of the adjoint equation (64). ■

In the finite dimensional case, we have shown the relations ker T = (ran T ∗ )⊥
and ran T = (ker T ∗ )⊥ (see Lemma 3.46). The next result provides a generalisation
to the infinite dimensional case.
Proposition 6.31. Assume that U and V are Hilbert spaces and that T : U →
V is bounded linear. Then
ran T = (ker T ∗ )⊥ and ker T = (ran T ∗ )⊥ .
Proof. We show first that ran T = (ker T ∗ )⊥ .
Assume for that first that v ∈ ran T . Then v = T u for some u ∈ U . Now let
w ∈ ker T ∗ . Then T ∗ w = 0 and thus
⟨v, w⟩V = ⟨T u, w⟩V = ⟨u, T ∗ w⟩U = 0.
This shows that v ∈ (ker T ∗ )⊥ . Since this holds for all v ∈ ran T , it follows that
ran T ⊂ (ker T ∗ )⊥ . Since (ker T ∗ )⊥ is closed, we can conclude that also ran T ⊂
(ker T ∗ )⊥ .
Now assume that v ∈ (ran T )⊥ , that is, that ⟨w, v⟩ = 0 for all w ∈ ran T . Then
we have for all u ∈ U that
⟨u, T ∗ v⟩U = ⟨T u, v⟩V = 0,
5. HILBERT SPACE BASES 155

which implies that T ∗ v = 0 and thus v ∈ ker T ∗ . This shows that


(ran T )⊥ ⊂ ker T ∗ .
Thus we can conclude that
(ker T ∗ )⊥ ⊂ ((ran T )⊥ )⊥ = ((ran T )⊥ )⊥ = ran T .
Together, these results show that
ran T = (ker T ∗ )⊥ .
Applying this result to T ∗ shows furthermore that
ran T ∗ = (ker(T ∗ )∗ )⊥ = (ker T )⊥ ,
which is equivalent to
ker T = (ran T ∗ )⊥ = (ran T ∗ )⊥ .

As a consquence of this, we obtain the following results:
ran T = (ker T ∗ )⊥ , ker T ∗ = (ran T )⊥ ,
ran T ∗ = (ker T )⊥ , ker T = (ran T ∗ )⊥ .
In addition, these imply the decompositions
U = ker T ⊕ ran T ∗ and V = ker T ∗ ⊕ ran T .
Apart from the necessity to take the closure of the ranges of T and T ∗ , these are
the same results that we have derived in the finite dimensional case.
Corollary 6.32. Assume that U and V are Hilbert spaces and that T : U → V
is bounded linear. Then T is injective, if and only if ran T ∗ ⊂ U is dense. Similarly,
T ∗ is injective, if and only if ran T ⊂ V is dense.
Proof. We have that T is injective, if and only if ker T = {0}. Using the
decomposition U = ker T ⊕ ran T ∗ , this is equivalent to the equality U = ran T ∗ ,
which is precisely the density of ran T ∗ in U . The second equivalence is similar. □

5. Hilbert space bases


Recall that a basis in a vector space U is a family (ui )i∈I ⊂ U with the property
that every vector u ∈ U can be uniquely written as a finite linear combination of
the form
XN
u= λn uin
n=1
for some N ∈ N, λn ∈ K, and in ∈ I for 1 ≤ n ≤ N . In the context of infinite
dimensional spaces, such a basis is often called a Hamel basis, in order to distinguish
from other basis concepts, some of which we will discuss in the following.
Before introducing alternative basis definitions, we need to discuss some con-
cepts related to measuring the “size” of different infinite sets.
Definition 6.33 (Countable set). An infinite set S is called countable, if there
exists a surjective mapping f : N → S. If an infinite set S is not countable, it is
called uncountable. ■

That is, a set S is countable, if we can write it as a sequence


S = {s1 , s2 , s3 , . . .}.
Example 6.34. The set N of natural numbers is obviously countable. So is the
set Z of integers, where we can use the ordering Z = {0, +1, −1, +2, −2, . . .}. ■
156 6. HILBERT SPACES

Example 6.35. If S is a countable set, then so is the Cartesian product S n


for every n ≥ 1. Indeed, assume that S = {s1 , s2 , s3 , . . .}, and take for simplicity
n = 2. Then we can write
S × S = {(s1 , s1 ), (s1 , s2 ), (s2 , s1 ), (s3 , s1 ), (s2 , s2 ), (s1 , s3 ), . . .}.
A similar construction works for n ≥ 3.
More general, if S1 , . . . , Sn are countable sets, then so is the product S1 × S2 ×
· · · × Sn . ■

Example 6.36. The set Q of rational numbers is countable. In order to see


this, note that we can write every rational number x as a fraction x = p/q with
p ∈ Z and q ∈ N. Since the Cartesian product Z × N is countable, it follows that Q
is countable as well. ■

Example 6.37. The set R of real numbers is uncountable. In order to show


this, we will employ a technique known as Cantor’s diagonal argument.
Assume to the contrary that R were countable. Then we could write R =
{s1 , s2 , s3 , . . .}. We will now construct a real number that is different from each of
the numbers s1 , s2 , . . .. For this, note that we can write each number sn in decimal
expansion in the form
(n) (n) (n) (n) (n) (n)
sn = ±a−Nn a−(Nn −1) . . . a0 , a1 a2 a3 . . .
(n)
with digits aj ∈ {0, 1, . . . , 9}. That is,

X
sn = aj · 10−j .
j=−Nn

Define now the number



X
x= bj · 10−j ,
j=1
(j)
where the digits of x are chosen such that bj ̸= aj . Then x differs from sn at least
in its n-th digit after the comma, and thus it is different from sn .2 This, however,
implies that x is not contained in the set {s1 , s2 , s3 , . . .}, which contradicts the
assumption that this set includes all real numbers. ■

One problem with Hamel bases in infinite dimensional Banach spaces is that
one can show that they are necessarily uncountable, which makes them essentially
impossible to use in practice.
Proposition 6.38. Let U be an infinite dimensional Banach space. Then every
Hamel basis of U is uncountable.
Remark 6.39. For incomplete normed spaces, the situation is different as we
have already seen. For instance, the family (e1 , e2 , . . . , . . .) of unit sequences is a
Hamel basis in the space cfin of finite sequences. However, cfin is incomplete with
respect to all the norms we have discussed. ■

We now introduce an alternative notion of a basis of a normed space.

2One has to be slightly careful here, as the decimal expansion of a real number is not always
unique (for instance we have 1 = 0.99999 . . .). However, this non-uniqueness issue only occurs
for numbers with a finite decimal expansion, and we can easily avoid any potential issues by for
(j) (j)
instance setting bj = 5 if aj = 0 or aj = 9.
5. HILBERT SPACE BASES 157

Definition 6.40 (Schauder basis). Let (U, ∥·∥U ) be an infinite dimensional


normed space. A countable family (u1 , u2 , u3 , . . .) ⊂ U is a Schauder basis of U , if
there exists for every u ∈ U a unique sequence λ1 , λ2 , . . ., such that
N
X
lim u− λn un = 0.
N →∞
n=1

Put differently, if (u1 , u2 , . . .) is a Schauder basis of U , then we can write each


u ∈ U uniquely as an infinite series

X
u= λn un .
n=1

Example 6.41. Let 1 ≤ p < ∞ and consider the family of unit sequences
(e1 , e2 , . . .) in ℓp . Then this set is a Schauder basis of ℓp . In order to see this,
assume that x = (x1 , x2 , . . .) ∈ ℓp and define λn = xn . Then
N
X ∞
 X 1/p
x− λ n en = |xn |p .
p
n=1 n=N +1

Since x ∈ ℓp , it follows that


N
X
lim x− λn en = 0.
N →∞ p
n=1

Moreover, if we choose any other sequence λ1 , λ2 , . . . of coefficients, then the series


P
n λn en does not converge to x. Thus the representation

X
x= λ n en
n=1

is unique, which shows that (e1 , e2 , . . .) is a Schauder basis of ℓp . ■

Example 6.42. The family of unit sequences (e1 , e2 , . . .) is not a Schauder


basis of ℓ∞ . Take for instance x = (1, 1, . . .). Then
N
X
x− λ n en ≥ 1
n=1

for all sequences λ1 , λ2 , . . ., and thus we cannot represent x in the desired form. ■
Definition 6.43 (Separability). A normed space U is called separable, if there
exists a countable dense subset of U . ■

Example 6.44. The set R of real numbers is separable, as it contains Q as


a countable dense subset. Similarly, for every n ∈ N the sets Rn and Cn are
separable. ■

Lemma 6.45. Assume that the normed space U has a Schauder basis. Then U
is separable.
Proof. Exercise! □

Remark 6.46. A long standing problem in functional analysis was the question,
whether every separable Banach space has a Schauder basis. In the seventies, this
question was answered negatively. That is, there exist separable Banach spaces,
which do not have any Schauder basis. ■
158 6. HILBERT SPACES

Example 6.47. The space ℓ∞ is not separable.


The basic idea is to consider the sequences e(I) ∈ ℓ∞ defined by
(
(I) 1 if n ∈ I,
en =
0 if n ̸∈ I,

where I ⊂ N is any subset. One can show3 that the set I : I ⊂ N (and thus also

 (I)
the set e : I ⊂ N ) is uncountable. Moreover, we have that ∥e(I) − e(J) ∥∞ = 1
whenever I ̸= J.
Thus we have an uncountable discrete subset of ℓ∞ , which makes it impossible
to find a countable dense subset: Assume that {s1 , s2 , . . .} ⊂ ℓ∞ is dense. Then
there exists in particular for each I ⊂ N some index nI such that ∥snI − e(I) ∥∞ <
1/4. Then, however, we have for all I ̸= J that

1 = ∥e(I) − e(J) ∥∞ ≤ ∥e(I) − snI ∥∞ + ∥snI − sn−J ∥∞ + ∥snJ − e(J) ∥∞


1
≤ + ∥snI + snJ ∥∞ ,
2
which shows that
1
∥snI + snJ ∥∞ ≥ .
2
In particular, we have that nI ̸= nJ whenever I ̸= J. Since we have uncountably
many subsets I ⊂ N, this implies that we also need uncountably many indices nI ,
which contradicts the assumption that {s1 , s2 , . . .} is countable. ■

We will now discuss in more detail the case where U is a Hilbert space. Recall
to that end that a sequence (u1 , u2 , . . .) in an inner product space is
• orthogonal, if ⟨ui , uj ⟩ = 0 whenever i ̸= j,
• orthonormal, if it is orthogonal and ∥ui ∥ = 1 for all i.
Theorem 6.48. Let U be an inner product space. Assume moreover that
(u1 , u2 , . . . , un ) is a finite orthonormal system in U and denote
W = span({u1 , u2 , . . . , un }).
Then W is a closed subspace of U and we have
n
X
⟨u, uj ⟩uj = πW (u)
j=1

and
n
X
∥u − πW (u)∥2 = ∥u∥2 − |⟨u, uj ⟩|2
j=1

for all u ∈ U .
Proof. Exercise! □

Lemma 6.49 (Bessel’s inequality). Assume that U is an infinite dimensional


inner product space and {u1 , u2 , . . .} is an infinite orthonormal system. Then

X
|⟨u, uj ⟩|2 ≤ ∥u∥2
j=1

for all u ∈ U .

3Try it yourself! This is really only Cantor’s diagonal argument again.


5. HILBERT SPACE BASES 159

Proof. Applying the result of Theorem 6.48 to the spaces


Wn = span({u1 , u2 , . . . , un }),
we obtain that
n
X
|⟨u, uj ⟩|2 = ∥u∥2 − ∥u − πW (u)∥2 ≤ ∥u∥2
j=1

for all u ∈ U . Taking the limit n → ∞, we obtain the claimed estimate. □


Theorem 6.50. Assume that U is an infinite dimensional Hilbert space and
that (u1 , u2 , . . .) is an orthonormal system in U . Assume moreover that λ =
(λ1 , λ2 , . . .) is a sequence in K. Then the sequence (vn )n∈N ⊂ U defined by
Xn
vn = λj uj
j=1
2
converges in U , if and only if λ ∈ ℓ .
Proof. Assume first that the sequence (vn )n∈N converges to some v ∈ U .
Then it is necessarily bounded and thus there exists R > 0 such that ∥vn ∥ ≤ R for
all n ∈ N. Now we note that, by Pythagoras’ theorem, using the orthonormality of
the system (u1 , u2 , . . .),
n
X n
X
R2 ≥ ∥vn ∥2 = |λj |2 ∥uj ∥2 = |λj |2 .
j=1 j=1
2
Since this holds for all n ∈ N, it follows that λ ∈ ℓ .
Assume now that λ ∈ ℓ2 and let ε > 0. Then there exists N ∈ N such that
X∞
|λj |2 < ε2 .
j=N +1

Now assume that n, m ≥ N . Without loss of generality assume that m ≥ n. Then


Xm 2 Xm ∞
X
∥vn − vm ∥2 = λ j uj = |λj |2 ≤ |λj |2 ≤ ε2 .
j=n+1 j=n+1 j=N +1

This shows that (vn )n∈N is a Cauchy sequence. Since U is a Hilbert space and thus
complete, it follows that the sequence (vn )n∈N converges. □
Theorem 6.51. Assume that U is an infinite dimensional Hilbert space and
that (u1 , u2 , . . .) ⊂ U is an orthonormal system. Denote moreover
W = span{u1 , u2 , . . .}.
Then

X
πW (u) = ⟨u, uj ⟩uj
j=1
for all u ∈ U .
Proof. By Bessel’s inequality we have that
X∞
|⟨u, uj ⟩|2 ≤ ∥u∥2 ,
j=1

which shows that the sequence {⟨u, uj ⟩}j∈N is contained in ℓ2 . Thus



X n
X
z := ⟨u, uj ⟩uj = lim ⟨u, uj ⟩uj
n→∞
j=1 j=1
160 6. HILBERT SPACES

exists in U . Moreover, by definition z is the limit of a sequence in span{u1 , u2 , . . .},


which implies that z ∈ W .
It thus remains to show that u − z ∈ W ⊥ . Now let k ∈ N be fixed. Then
Xn Xn
⟨z, uk ⟩ = lim ⟨u, uj ⟩uj , uk = lim ⟨u, uj ⟩⟨uj , uk ⟩ = ⟨u, uk ⟩,
n→∞ n→∞
j=1 j=1

as ⟨uj , uk ⟩ = 0 whenever j ̸= k and ⟨uk , uk ⟩ = 1. Thus


⟨u − z, uk ⟩ = ⟨u, uk ⟩ − ⟨u, uk ⟩
for all k ∈ N and thus

u − z ∈ {u1 , u2 , . . .}⊥ = span{u1 , u2 , . . .} = W ⊥ .
Thus we have shown that z ∈ W and u−z ∈ W ⊥ , which proves that z = πW (u). □
Remark 6.52. Assume that U is a Hilbert space and that (u1 , u2 , . . .) is an or-
thonormal system in U . The numbers ⟨u, uj ⟩ are sometimes called the (generalised)
Fourier coefficients of u with respect to the system (u1 , u2 , . . .). ■

Definition 6.53 (total subset). An orthonormal system (u1 , u2 , . . .) in a Hil-


bert space U is total or maximal, if
span({u1 , u2 , . . .}) = U,
or, equivalently, if
{u1 , u2 , . . .}⊥ = {0}.

Definition 6.54 (Hilbert basis). An orthonormal system (u1 , u2 , . . .) in an


infinite dimensional Hilbert space U is an orthonormal basis or a Hilbert basis of
U , if
X∞
u= ⟨u, uj ⟩uj
j=1
for all u ∈ U . ■

Theorem 6.55. Assume that U is an infinite dimensional Hilbert space and


S = (u1 , u2 , . . .) ⊂ U is an orthonormal system. The following are equivalent:
(1) S is total.
(2) S is a Hilbert basis of U .
(3) For every u ∈ U , Parseval’s identity

X
(65) |⟨u, uj ⟩|2 = ∥u∥2
j=1

holds.
Proof. Assume that S is total. Then S ⊥ = {0} and thus U = span S. Thus
we have for all u ∈ U that πspan S (u) = u. Thus

X
u = πspan S (u) = ⟨u, uj ⟩uj .
j=1

Since this holds for all u ∈ U , it follows that S is a Hilbert basis of U .


Assume now that S is a Hilbert basis of u. Then

X
u= ⟨u, uj ⟩uj
j=1
5. HILBERT SPACE BASES 161

and thus

X 2 ∞
X 2
∥u∥2 = lim ⟨u, uj ⟩uj = lim ⟨u, uj ⟩uj
n→∞ n→∞
j=1 j=1

X ∞
X
= lim |⟨u, uj ⟩|2 = |⟨u, uj ⟩|2 .
n→∞
j=1 j=1

Thus Parseval’s identity holds.


Finally assume that Parseval’s identity holds and let u ∈ S ⊥ . Then ⟨u, uj ⟩ = 0
for all j ∈ N and thus

X
2
∥u∥ = |⟨u, uj ⟩|2 = 0,
j=1

showing that u = 0. Thus S = {0} and therefore S is total. □
Lemma 6.56. Every infinite dimensional Hilbert space contains an infinite or-
thonormal system.
Idea of proof. Choose an infinite, linear independent sequence in U . By
applying Gram–Schmidt orthogonalisation to this sequence, we obtain an infinite
orthonormal system. □
Theorem 6.57. Every infinite dimensional separable Hilbert space has a Hilbert
basis.
Idea of proof. Apply Gram–Schmidt orthogonalisation to a countable dense
subset. □
In particular, we obtain from this result that every separable Hilbert space
“looks like” the sequence space ℓ2 .
Corollary 6.58 (Riesz–Fischer). Every infinite dimensional separable Hilbert
space is isometrically isomorphic to ℓ2 .
Proof. Let U be an infinite dimensional separable Hilbert space, and let
(u1 , u2 , . . .) be a Hilbert space basis of U . Define the mapping T : U → ℓ2 ,

T u = ⟨u, uj ⟩ j∈N .
Then every vector u ∈ U can uniquely be written as
X
u= ⟨u, uj ⟩uj ,
j∈N

which implies in particular that T is injective and surjective. Moreover, by Par-


seval’s identity we have that
X
∥u∥2U = |⟨u, uj ⟩|2 = ∥T u∥ℓ2
j∈N

for all u ∈ U , which proves that T is an isometry. □


2
Example 6.59. In the space ℓ the family (ej )j∈N of unit sequences is a Hilbert
basis. ■

Example 6.60. Let L2 ([0, 2π], C) denote the space of square integrable complex
valued functions on [0, 2π]. That is, L2 ([0, 2π], C) is the completion of the space
C([0, 2π], C) of continuous complex values functions on [0, 2π] with respect to the
norm Z 2π
∥f ∥2 = |f (x)|2 dx,
0
162 6. HILBERT SPACES

which is induced by the inner product


Z 2π
⟨f, g⟩ = f (x)g(x) dx.
0
Then the set of functions
1
fn (x) := √ einx

for n ∈ Z is a Hilbert basis (the Fourier basis) of L2 ([0, 2π], C). ■

The structure of general Banach spaces is much more complicated, and there
exists no result similar to the Riesz–Fischer theorem for Banach spaces.

6. Galerkin’s Method
We will now provide a glimpse into the theory of numerical functional analysis,
that is, the (approximate) numerical solution of equations in infinite dimensional
spaces. One central idea there, which builds on everything that we did in this
course, is the so called Galerkin method. The basic idea in this method is that
one tries to approximate an infinite dimensional problem by a sufficiently high
dimensional finite dimensional one. As a preparation, we return once again to least
squares solutions of linear systems.
Theorem 6.61. Assume that U and V are Hilbert spaces, that T : U → V is
bounded linear, and that v ∈ V is given. Then u ∈ U solves the least squares
problem
(66) min∥T u − v∥2V
u∈U

if and only if u solves the normal equation


(67) T ∗ T u = T ∗ v.
Proof. The least squares problem 66 is equivalent to the problem
(68) min ∥w − v∥2V
w∈ran(T )

in the sense that w solves (68) if and only if w = T u for some solution u of (66).
Since (68) is just a projection problem, we know that w ∈ V is a solution of (68)
if and only if
w ∈ ran T and w − v ∈ (ran T )⊥ = ker(T ∗ ).
That is, w = T u for some u ∈ U and T ∗ (w − v) = 0. This, however, is in turn
equivalent to the normal equation T ∗ (T u − v) = 0. □
Remark 6.62. Note that it can happen that neither of the problems (66)
or (67) has a solution. A part of the claim of the theorem is, however, that if one
of the problems has a solution, then so has the other. ■

Assume now that U and V are infinite dimensional Hilbert spaces, that T : U →
V is bounded linear, and that v ∈ V is given. Our goal is the solution of the equation
T u = v. Since we cannot really do so in this infinite dimensional setting, we have
to apply some form of discretisation. The idea of Galerkin’s method is to do this
by “projecting” the problem T u = v to a finite dimensional subspace of U .
Assume therefore that W ⊂ U is some finite dimensional subspace and denote
by S := T |W : W → V the restriction of T to W . Then we can consider the least
squares problem
min ∥Su − v∥2V .
u∈W
6. GALERKIN’S METHOD 163

That is, we try to find a “best solution” of the infinite dimensional problem within
the finite dimensional subspace W . In view of the previous result, this is equivalent
to solving the normal equation
(69) S ∗ Su = S ∗ v,
where S ∗ : V → W is the adjoint of S = T |W . Note moreover that this problem
necessarily has a solution, because W , and therefore also ran(S), is finite dimen-
sional.
Assume now that (u1 , u2 , . . . , uN ) is a basis of W . Then u ∈ W solves (69), if
and only if
⟨S ∗ Su, ui ⟩ = ⟨S ∗ v, ui ⟩
for all 1 ≤ i ≤ N . By the definition of the adjoint S ∗ of S, this is equivalent to the
system of equations
⟨Su, Sui ⟩ = ⟨v, Sui ⟩
for all 1 ≤ i ≤ N . Moreover, by definition Su = T u for all u ∈ W and thus we
obtain the system of equations
(70) ⟨T u, T ui ⟩ = ⟨v, T ui ⟩ for all 1 ≤ i ≤ N.
Finally, we will rewrite this system in matrix-vector form. Since (u1 , . . . , uN )
is a basis of W , we can write every solution u of (70) as
N
X
u= xj uj
j=1

with unique coordinates x1 , . . . , xN ∈ K. Inserting this form of u in (70), we obtain


the system of equations
N
X
(71) xj ⟨T uj , T ui ⟩ = ⟨v, T ui ⟩ for all 1 ≤ i ≤ N
j=1

for the coordinates x1 , . . . , xN of u. Now define the matrix



A = ⟨T uj , T ui ⟩ i,j=1,...,N
and the vector

b = ⟨v, T ui ⟩ i=1,...,N .
Then the system (71) can be written in matrix-vector form as
Ax = b.
Note moreover that, by construction, A = AH is a Hermitian matrix and that
b ∈ ran A. In particular, the equation has a solution in Kn . Moreover, if T is
injective, then the solution is unique.
Now an important question is, whether this actually yields an approximation
of the solution of the infinite dimensional problem. That is, if we increase the
dimension of the discretisation space W , do we get closer to solving the original
problem? The following results provides an answer to this question:
Theorem 6.63 (Convergence of Galerkin methods). Assume that U and V are
infinite dimensional Hilbert spaces, that T : U → V is bounded linear and invertible
with bounded linear inverse T −1 : V → U . Assume that v ∈ V is given and denote
by u† ∈ U the unique solution of the equation T u = v.
Assume moreover that (u1 , u2 , . . .) is a Hilbert basis of U . For N ∈ N denote
WN := span{u1 , u2 , . . . , uN } ⊂ U
164 6. HILBERT SPACES

and denote by u(N ) ∈ WN the (unique) solution of the problem


(72) min ∥T u − v∥2V .
u∈WN

Then
u† = lim u(N ) .
N →∞
−1
Proof. Since T : V → U is bounded linear, we can estimate
† −1
∥u(N )
− u ∥U ≤ ∥T ∥B(V,U ) ∥T u(N ) − T u† ∥V = ∥T −1 ∥B(V,U ) ∥T u(N ) − v∥V .
Thus it is sufficient to show that ∥T u(N ) − v∥V → 0. Since u(N ) solves the least
squares problem (72) we have that
∥T u(N ) − v∥V ≤ ∥T πWN (u† ) − v∥V
for all N ∈ N.
Since (u1 , u2 , . . .) is a Hilbert basis of U we have that
N
X

πWN (u ) = ⟨u† , uj ⟩uj
j=1

and thus
N
X
lim πWN (u† ) = lim ⟨u† , uj ⟩uj = u† .
N →∞ N →∞
j=1
Because of the boundedness of T , this implies that
lim T πWN (u† ) = T u† = v.
N →∞
Thus
lim ∥T u(N ) − v∥V ≤ lim ∥T πWN (u† ) − v∥V = 0.
N →∞ N →∞

Remark 6.64. In the proof of the previous result, we actually obtain the con-
crete estimate
∥u(N ) − u† ∥U ≤ ∥T ∥B(U,V ) ∥T −1 ∥B(V,U ) ∥u† − πWN (u† )∥,
which can be rewritten as

X
∥u(N ) − u† ∥U ≤ ∥T ∥B(U,V ) ∥T −1 ∥B(V,U ) |⟨u† , ui ⟩|2 .
n=N +1

The approximation error thus depends on the decay properties of the generalised
Fourier coefficients of the true solution u† ; the faster these decay, the better the
approximation. ■

7. Reproducing Kernel Hilbert Spaces


Assume that Ω ⊂ Rd is some subset and that U is a Hilbert space consisting
of functions defined on Ω. That is, every u ∈ U is a function u : Ω → K. We now
assume that for each x ∈ Ω, the point evaluation
δx : U → K, u 7→ δx (u) := u(x),
is a bounded linear functional. That is, there exists for each x ∈ Ω some constant
cx > 0 such that
|u(x)| ≤ cx ∥u∥U for all u ∈ U.
Then the Riesz Representation Theorem implies that there exists for each x ∈ Ω
some element kx ∈ U such that
u(x) = δx (u) = ⟨u, kx ⟩U for all u ∈ U.
7. REPRODUCING KERNEL HILBERT SPACES 165

Now kx ∈ U , and U consists of functions on Ω. Thus the element kx is itself a


function kx : Ω → K. We can therefore define a function k : Ω × Ω → K,
k(y, x) := kx (y).
Then we have that
u(x) = ⟨u, k(·, x)⟩U for all u ∈ U.
These considerations give rise to the following definitions:
Definition 6.65 (RKHS). A reproducing kernel Hilbert space (RKHS) is a
Hilbert space U consisting of K-valued functions on some set Ω such that for each
x ∈ Ω the point evaluation δx : U → K is a bounded linear functional. The function
k : Ω × Ω → K satisfying
u(x) = ⟨u, k(·, x)⟩U for all u ∈ U
is called the kernel of the RKHS U . ■

Example 6.66. The Sobolev space H01 ([0, 1]) is the completion of the space of
all functions u ∈ C 1 ([0, 1]) satisfying u(0) = u(1) = 0 with respect to the inner
product
Z 1
⟨u, v⟩1 := u′ (x)v ′ (x) dx.
0
One can show that this space is indeed a space of functions on the interval [0, 1]
and that the point evaluations δx are continuous.4 Thus H01 ([0, 1]) is an RKHS.
Moreover, it is possible to show that the kernel k : [0, 1] × [0, 1] → K is the function
(
(1 − x)y if x ≥ y,
k(y, x) =
x(1 − y) if x ≤ y.
Indeed, if u : [0, 1] → K is (for simplicity) continuously differentiable and satisfies
u(0) = u(1) = 0, then
Z 1 Z x Z 1
′ ′
⟨u, k(·, x)⟩U = u (y) ∂y k(y, x) dy = u (y)(1 − x) dy − u′ (y)x dy
0 0 x
= u(x)(1 − x) − u(0) − u(1) + u(x)x = u(x).

Lemma 6.67. Assume that U is an RKHS with kernel k : Ω × Ω → K. Then k


is Hermitian symmetric in the sense that
k(y, x) = k(x, y) for all x, y ∈ Ω.
Proof. By definition, the function y 7→ kx (y) := k(y, x) is an element of U .
Thus we have that
k(y, x) = kx (y) = ⟨kx , k(·, y)⟩U = ⟨k(·, x), k(·, y)⟩U .
Similarly, we can write
k(x, y) = ky (x) = ⟨ky , k(·, x)⟩U = ⟨k(·, y), k(·, x)⟩U .
Now the claim follows from the fact that ⟨f, g⟩U = ⟨g, f ⟩U for all f , g ∈ U . □

4More details will e.g. be discussed in the course TMA4212 Numerical Solution of Differential
Equations by Difference Methods, as Sobolev spaces are a central tool in the modern theory of
partial differential equations and their analytic and numerical solution.
166 6. HILBERT SPACES

Lemma 6.68. Assume that U is an RKHS with kernel k : Ω×Ω → K. Then k is


positive semi-definite in the following sense: For all distinct points x1 , . . . , xN ∈ Ω
and all α1 , . . . , αN ∈ K we have that
N X
X N
αi αj k(xj , xi ) ≥ 0.
i=1 j=1

Proof. Define the function f ∈ U ,


N
X
f (y) = αi k(y, xi ).
i=1

Then
N
DX N
X E
0 ≤ ∥f ∥2U = ⟨f, f ⟩U = αi k(·, xi ), αj k(·, xj )
U
i=1 j=1
N X
X N N X
X N
= αi αj k(·, xi ), k(·, xj ) U
= αi αj k(xj , xi ).
i=1 j=1 i=1 j=1


An alternative way of interpreting these last results is the following: Given
distinct points x1 , . . . , xN ∈ Ω we define a matrix K ∈ KN ×N by setting
Kij := k(xi , xj ).
Then Lemma 6.67 states that K is a Hermitian matrix, and Lemma 6.68 states
that K is positive semi-definite.
Remark 6.69. Assume again that x1 , . . . , xN ∈ Ω are distinct points, and
define the space
V := span{kx1 , . . . , kxN } ⊂ U.
Then V is a Hilbert space itself (being a finite dimensional subspace of U ) and the
family (kx1 , . . . , kxN ) by construction contained in V . Moreover, we have that
⟨kxj , kxi ⟩U = ⟨k(·, xj ), k(·, xi )⟩U = k(xi , xj ).
Thus the matrix K ∈ KN ×N with Kij = k(xi , xj ) is in fact the Gram matrix of the
family (kx1 , . . . , kxN ) in V .
In particular, we obtain by Lemma 3.24 that the family (kx1 , . . . , knN ) is linearly
independent (and thus a basis of V ), if and only if the matrix K is positive definite.

Conversely, it is possible to show that every function k that satisfies the con-
ditions of Lemmas 6.67 and 6.68 defines a unique RKHS:
Theorem 6.70 (Moore–Aronszajn). Let Ω be a set and let k : Ω × Ω → K be
Hermitian symmetric and positive semi-definite. Then there exists a unique Hilbert
space U of functions on Ω with reproducing kernel k. Moreover, the space

U0 := span k(·, x) : x ∈ Ω
is a dense subspace of U .
Proof. See [BTA04, Thm. 3]. □
Remark 6.71. The statement in Theorem 6.70 that the space U0 is dense in
U is the same as saying that functions in U can be arbitrarily well approximated
by finite linear combinations of functions k(·, xj ) for some xj ∈ Ω. That is, for
7. REPRODUCING KERNEL HILBERT SPACES 167

each u ∈ U and each ε > 0 there exist points x1 , . . . , xN ∈ Ω and coefficients


c1 , . . . , cN ∈ K such that
N
X
u− cj k(·, xj ) < ε.
U
j=1

d
Example 6.72. In the case Ω = R , so called radial kernels of the form
k(y, x) = κ(∥x∥22 ) for some function κ : R≥0 → R≥0 are of specific interest in view
of their symmetry properties. Particular examples are the following (cf. [BTA04,
p. 43]):
• The Gaussian kernel with bandwidth σ > 0 defined by
∥y−x∥2 2
k(y, x) = e− 2σ 2 .
• The Poisson kernel with bandwidth σ > 0 defined by
∥y−x∥2
k(y, x) = e− σ .
• The inverse multiquadric kernel with bandwidth σ > 0 and parameter
s > 0 defined by
1
k(y, x) = 2 .
(σ + ∥y − x∥22 )s

Interpolation and Regression.


Theorem 6.73. Let U be an RKHS over the set Ω and let k : Ω × Ω → K be
the reproducing kernel of U . Let x1 , . . . , xN be distinct points in Ω and let z =
(z1 , . . . , zN ) ∈ KN be given. Assume that the matrix K ∈ KN ×N , Kij := k(xi , xj ),
is positive definite. Then the problem
(73) min∥u∥2U s.t. u(xi ) = zi for all i = 1, . . . , N,
u∈U

has a unique solution u† ∈ U given by


N
X
u† = ci k(·, xi ),
i=1
where the vector c = (c1 , . . . , cN ) is defined as
c = K −1 z.
Proof. Since the matrix K is positive definite, it follows from the discussion
in Remark 6.69 that the family (kx1 , . . . , kxN ) in U is linearly independent. In
particular, there exists some ĉ ∈ KN such that
N
X
ĉj k(xi , xj ) = zi
j=1

for all i = 1, . . . , N . Denote now


N
X
ŵ := ĉj k(·, xj ) ∈ U,
j=1

let 
W := u ∈ U : u(xi ) = 0 for all i = 1, . . . , N
and denote

Ŵ := ŵ + W = u ∈ U : u(xi ) = zi for all i = 1, . . . , N .
168 6. HILBERT SPACES

Then we can rewrite the problem (73) as


min∥u∥2U s.t. u ∈ Ŵ ,
u∈U

which is the same as the problem of computing the projection of the point 0 ∈ U
onto the affine space Ŵ . Note moreover that W is finite dimensional and therefore
a closed subspace of U . Thus Corollary 6.8 implies that (73) has a unique solution
u† . Moreover, u† is uniquely characterised by the conditions
u† ∈ Ŵ and u† ∈ W ⊥ .
Now define the mapping T : U → Kn ,
   
⟨u, kx1 ⟩U u(x1 )
T u :=  ..   .. 
 =  . .

.
⟨u, kxN ⟩U u(xN )
Then W = ker T . As a consequence, since W is closed, we have that
W ⊥ = (ker T )⊥ = ran T ∗ .
We now determine the mapping T ∗ : KN → U . Here we regard KN as a Hilbert
space with the standard Euclidean inner product. Let c ∈ KN and u ∈ U . Then
N
X D X N E
⟨u, T ∗ c⟩U = ⟨T u, c⟩KN = ⟨u, kxi ⟩ci = u, ci kxi .
i=1 i=1
Thus
N
X
T ∗c = ci kxi .
i=1
As a consequence,
ran T ∗ = span{kx1 , . . . , kxN }.
This further implies that u† ∈ U is the unique function of the form
N
X
u† = cj kxj
j=1

such that
u† (xi ) = zi for all i = 1, . . . , N.
Now we note that (cf. Proposition 3.26)
N
DX E N
X N
X
† †
zi = u (xi ) = ⟨u , kxi ⟩U = cj kxj , kxi = cj k(xi , kj ) = Kij cj .
U
j=1 j=1 j=1

In other words, the coefficients c ∈ KN satisfy the equation Kc = z, which implies


that c = K −1 z. □
Theorem 6.74. Let U be an RKHS over the set Ω and let k : Ω × Ω → K be
the reproducing kernel of U . Let x1 , . . . , xN be (not necessarily distinct) points in
Ω and let z = (z1 , . . . , zN ) ∈ KN be given. Let moreover λ > 0. Then the problem
N
X
(74) min λ∥u∥2U + |u(xi ) − zi |2
u∈U
i=1
has a unique solution uλ ∈ U given by
N
X
uλ = ci k(·, xi ),
i=1
7. REPRODUCING KERNEL HILBERT SPACES 169

where the vector c = (c1 , . . . , cN ) ∈ KN is defined as


c = (λ Id +K)−1 z.
Proof. We can write
N N
 u(xi ) z 2
√ − √i
X X
λ∥u∥2U + |u(xi ) − zi |2 = λ ∥u∥2U + .
i=1 i=1
λ λ
Thus we can rewrite (74) as the problem
N
 X zi 2  u(xi )
min ∥u∥2U + yi − √ s.t. yi = √ for all i = 1, . . . , N.
(u,y)∈U ×KN
i=1
λ λ
Define now the mapping T : U × KN → KN ,
 u(x ) N
i
T (u, y) := √ − yi .
λ i=1

Define moreover on U × KN the inner product


⟨(u, y), (v, w)⟩U ×KN := ⟨u, v⟩U + ⟨y, w⟩KN
with corresponding norm ∥·∥U ×KN . Then this the problem is the same as the
problem
 z 
min (u, y) − 0, √ s.t. (u, y) ∈ ker T.
(u,y)∈U ×KN λ U ×KN
Since ker T is a closed linear subspace of the Hilbert space U × √ KN , this problem
z
has a unique solution (uλ , yλ ), which is the projection of (0, / λ) on the space
ker T . This projection is uniquely characterised by the conditions

(75) (uλ , yλ ) ∈ ker T and (uλ , yλ − z/ λ) ∈ (ker T )⊥ = ran T ∗ .
We now compute the mapping T ∗ . Let therefore (u, y) ∈ U × KN and w ∈ KN .
Then
N 
X u(xi ) 
⟨(u, y), T ∗ w⟩U ×KN = ⟨T (u, y), w⟩KN = √ − yi wi
i=1
λ
N N
X wi D X wi kxi E
= ⟨u, kxi ⟩U √ − ⟨y, w⟩KN = u, √ + ⟨y, −w⟩KN .
i=1
λ i=1
λ U
This shows that
N
X wi kxi 
T ∗w = √ , −w ∈ U × KN .
i=1
λ
We note in particular that ran T ∗ is a finite dimensional subspace of U × KN
and therefore closed. Thus (ker T )⊥ = ran T ∗ = ran T ∗ . As a consequence, the
characterisation (75) of the solution of (74) is equivalent to the system of equations
N
w
√j kxj ,
X
uλ =
j=1
λ
z
yλ − √ = −w,
λ
uλ (xi )
√ = yλ,i for all i = 1, . . . , N.
λ
Denote now
wj
cj := √ .
λ
170 6. HILBERT SPACES

Then we can rewrite this system as


N
X
uλ = cj k x j ,
j=1
uλ (xi ) = zi − λci for all i = 1, . . . , N.
In particular,
N
X N
X N
X
zi − λci = uλ (xi ) = cj kxj (xi ) = cj k(xi , xj ) = Kij cj .
j=1 j=1 j=1

Thus the vector c = (c1 , . . . , cN ) solves the equation


z = Kc + λc.
Since the matrix K is positive semi-definite, the matrix K +λ Id is positive definite.
Thus this equation has a unique solution c = (K + λ Id)−1 z, which concludes the
proof. □
Bibliography

[Axl24] Sheldon Axler. Linear algebra done right. Undergraduate Texts in Mathematics.
Springer, Cham, fourth edition, 2024.
[BTA04] Alain Berlinet and Christine Thomas-Agnan. Reproducing kernel Hilbert spaces in prob-
ability and statistics. Kluwer Academic Publishers, Boston, MA, 2004. With a preface
by Persi Diaconis.
[Dev12] Keith J. Devlin. Introduction to mathematical thinking. Keith Devlin, Palo Alto, CA,
USA, 2012.
[Hal74] Paul R. Halmos. Naive set theory. Undergraduate Texts in Mathematics. Springer-
Verlag, New York-Heidelberg, 1974. Reprint of the 1960 edition.
[HS75] Edwin Hewitt and Karl Stromberg. Real and abstract analysis, volume No. 25 of Gradu-
ate Texts in Mathematics. Springer-Verlag, New York-Heidelberg, 1975. A modern treat-
ment of the theory of functions of a real variable, Third printing.

171

You might also like