
Lecture notes on linear algebra and geometry

Andrea Ferraguti

September 6, 2023
Disclaimer: If you find any mistake and/or typo in these lecture notes, you
can let me know at [email protected].

Contents

Chapter 0: Basics
  0.1 Sets
  0.2 Functions
  0.3 The induction principle
  0.4 Real and complex numbers
  0.5 Polynomials
  0.6 Matrices

Chapter 1: Vector spaces
  1.1 Groups and fields
  1.2 Linear combinations and subspaces
  1.3 Linear dependence and linear independence
  1.4 Bases and dimension
  1.5 Sum and intersection of subspaces

Chapter 2: Determinant and rank
  2.1 Determinant
  2.2 Change of basis
  2.3 Rank

Chapter 3: Linear systems
  3.1 Compatibility of linear systems
  3.2 The rank-nullity theorem
  3.3 How do I solve a linear system?

Chapter 4: Scalar products and orthogonality
  4.1 Bilinear forms and scalar products
  4.2 Positive definite scalar products

Chapter 5: Eigenspaces and diagonalization
  5.1 Eigenvalues, eigenvectors and eigenspaces
  5.2 Real symmetric matrices

Chapter 6: Affine geometry
  6.1 Affine spaces
  6.2 Linear subspaces
  6.3 Relative position of linear subspaces
  6.4 Coordinate systems and equations of subspaces
  6.5 Equations for lines, planes and hyperplanes
  6.6 Relative position of subspaces via equations
  6.7 Pencils and bundles of lines and planes

Chapter 7: Euclidean geometry
  7.1 Euclidean spaces
  7.2 Coordinate systems, orthogonality and distance

Chapter 8: Projective geometry
  8.1 Equivalence relations
  8.2 Projective spaces
  8.3 Linear subspaces
  8.4 Equations of linear subspaces
  8.5 Relative position of linear subspaces
  8.6 Projective pencils and bundles
  8.7 Real and imaginary points

Chapter 9: Conics
  9.1 Algebraic curves, intersection multiplicities and tangents
  9.2 Conics
  9.3 Real conics
  9.4 Conics in E²(R)

Chapter 0: Basics
0.1 Sets
Throughout these lecture notes, a set will be an unordered collection of ob-
jects, without repetitions. The objects contained in a set are called elements.
Elements of a set will be enclosed between curly brackets.

Example 0.1.1. {1, 2, ♥, w} is the set whose elements are 1, 2, ♥ and w.


This is the same as {2, 1, w, ♥}, because sets are unordered collections of
objects, and it is the same as {1, 2, 2, w, w, w, ♥}, because repetitions do
not matter.
There is a special set, called empty set, that is a collection of zero objects.
This is denoted by ∅.
Elements of a set can be listed one by one, as in the example above, or
they can be described by a property that characterizes them.

Example 0.1.2. The set of natural numbers will be denoted by N, and it


is the set {0, 1, 2, . . .}. The set of integers will be denoted by Z, and it
is the set {0, 1, −1, 2, −2, 3, −3, . . .}. The set of rational numbers will be
denoted by Q, and it can be described as {a/b : a, b ∈ Z and b ≠ 0}. The
colon in the description of Q shall be read as "such that".

If s is an element of a set S, we write s ∈ S, and we say that s belongs to


S. If this is not the case, namely if s is not an element of S, we write s ∉ S.
The following symbols are fundamental standard notation in mathematics.

• The symbol ∀ means "for all";

• The symbol ∃ means "there exists";

• The symbol ∃! means "there exists a unique";

• The symbol ∄ means "there does not exist";

• The symbol ⇐⇒ means "if and only if". If P, Q are propositions, we
  write P ⇐⇒ Q to say that if P holds true, then so does Q, and if Q
  holds true, then so does P .


Example 0.1.3. The following propositions hold true.

• ∀x ∈ Z, ∃ y ∈ Z such that x + y > 0.

• ∀x ∈ Z, ∃! y ∈ Z such that x + y = 0.

• ∄x ∈ N such that x² = 2.

• ∀x ∈ Z, x² ≤ 4 ⇐⇒ −2 ≤ x ≤ 2.

If S, T are sets, we say that S is a subset of T , and we write S ⊆ T , if


every element of S is an element of T as well. For instance, N ⊆ Z. If, on the
other hand, there exists an element of S that does not belong to T , we write
S ⊈ T . For instance, Q ⊈ Z.

Remark 0.1.4. Every set S has at least two subsets: the empty set and S
itself.

Definition 0.1.5. Two sets S, T are equal if S ⊆ T and T ⊆ S. If this is


the case, we write S = T .

For S, T sets, the following operations are allowed and produce a new set.
• The intersection of S, T is the set:

S ∩ T := {s : s ∈ S and s ∈ T }.

If S ∩ T = ∅, we say that S and T are disjoint.

• The union of S and T is the set:

S ∪ T := {s : s ∈ S or s ∈ T }.

• The difference of S and T is the set:

S \ T := {s : s ∈ S and s ∉ T }.

Remark 0.1.6. Intersection and union are commutative operations, that


is, S ∩ T = T ∩ S and S ∪ T = T ∪ S. On the other hand, the difference is
not commutative. For example, you can try to show that Q \ Z and Z \ Q


are not equal.

Lemma 0.1.7. If S, T, U are sets, the following hold true.

1. S ∩ S = S ∪ S = S;

2. S ∩ ∅ = ∅;

3. S ∪ ∅ = S;

4. (S ∩ T ) ∩ U = S ∩ (T ∩ U );

5. (S ∪ T ) ∪ U = S ∪ (T ∪ U );

6. S ⊆ T ⇐⇒ S ∩ T = S ⇐⇒ S ∪ T = T ;

7. S ∩ (T ∪ U ) = (S ∩ T ) ∪ (S ∩ U );

8. S ∪ (T ∩ U ) = (S ∪ T ) ∩ (S ∪ U ).

Definition 0.1.8. Given two sets S, T , the cartesian product of S and T


is the set S × T formed by all ordered pairs (s, t) such that s ∈ S and
t ∈ T . In symbols,

S × T := {(s, t) : s ∈ S, t ∈ T }.

Example 0.1.9. If S = {a, b} and T = {1, 2, 3} then

S × T = {(a, 1), (a, 2), (a, 3), (b, 1), (b, 2), (b, 3)}.

If S, T, U are sets, the following properties hold true.


• S × ∅ = ∅ × S = ∅;
• If S and T are both nonempty, S × T = T × S ⇐⇒ S = T ;
• S × (T ∪ U ) = (S × T ) ∪ (S × U );
• S × (T ∩ U ) = (S × T ) ∩ (S × U ).
It is possible to construct the cartesian product of more than two sets.
Namely, if n ≥ 1 is any natural number and S1 , . . . , Sn are sets, we let:
S1 × S2 × . . . × Sn := {(s1 , s2 , . . . , sn ) : si ∈ Si ∀i ∈ {1, 2, . . . , n}}.


If the n sets S1 , S2 , . . . , Sn are all equal to a set S, we write S n for the cartesian
product S1 × . . . × Sn .

0.2 Functions
Let S, T be sets.

Definition 0.2.1. A correspondence between S and T is a subset of S ×T .


A function (or map) between S and T is a correspondence f between S
and T such that for every s ∈ S there exists a unique t ∈ T such that
(s, t) ∈ f .


Example 0.2.2. If S = {0, 1} and T = {a, ♠, 7}, the following are
correspondences between S and T :

1. S × T ;

2. ∅;

3. {(0, ♠), (1, ♠)};



4. {(0, ♠), (1, 7), (1, a)}.

However, only 3. is a function between S and T .

If f ⊆ S × T is a function, we know that for every s ∈ S there is a unique
t ∈ T with (s, t) ∈ f . Since such t only depends on s, we can write t = f (s).
Hence the function f coincides with the set {(s, f (s)) : s ∈ S}. With this
picture in mind, we will write
f : S → T
to denote that f is a function between S and T . In layman's terms, f is a
"rule" that assigns to every s ∈ S a unique element t ∈ T . Such element is
referred to as the image of s via f , and we denote it by f (s).
Note:-
In order to define a function between two sets S, T it is enough, by the
above considerations, to describe the image of every element s ∈ S. This
will be done by using the notation s ↦ f (s). For example, writing:

f : Z → N
x ↦ x²

means that f is the function between Z and N that associates to every
integer x the natural number x².

Definition 0.2.3. Let f : S → T be a function.

1. The set S is called domain of f , while the set T is called codomain.

2. f is called injective if for every s₁, s₂ ∈ S with s₁ ≠ s₂ we have
   f (s₁) ≠ f (s₂).

3. f is called surjective if for every t ∈ T there exists s ∈ S with
   f (s) = t.

4. f is called bijective if it is both injective and surjective.

5. If S′ ⊆ S, the image of S′ is the set f (S′) := {f (s) : s ∈ S′}.

6. If T′ ⊆ T , the preimage of T′ is the set f⁻¹(T′) := {s ∈ S : f (s) ∈ T′}.

Example 0.2.4.

• The function f : Z → Z that sends x ↦ x² is neither injective nor
  surjective.

• The function f : Z → Z that sends x ↦ x³ is injective but not
  surjective.

• The function f : Q → Q that sends x ↦ x/2 is bijective.

If S is a set, a sequence of elements of S is a function a : N → S. Images of
elements in the domain, instead of being denoted by a(0), a(1), a(2), . . ., are
often denoted by a₀, a₁, a₂, . . ..
A set S is infinite if there exists an injective sequence a : N → S. If this is
not the case, then S is finite.

Definition 0.2.5. If S is a set and n ∈ N, an n-tuple of elements of S is


a function a : {1, 2, . . . , n} → S.

Once again, if a is an n-tuple of elements of S, images of the elements in the
domain are denoted by a₁, a₂, . . . , aₙ. Conversely, writing down n elements
of S, say (b₁, . . . , bₙ), automatically defines an n-tuple, that is, the function
b : {1, . . . , n} → S that maps i ↦ bᵢ. From now on, n-tuples of elements of a
set S will be denoted by (a₁, . . . , aₙ). Notice that order matters. For example,
if S = N the triple (12, 27, 32) differs from the triple (27, 12, 32), because the
first one is a function {1, 2, 3} → N that maps 1 to 12, while the second one
is a function {1, 2, 3} → N that maps 1 to 27.
If S is finite and nonempty, then there exists a natural number n ≥ 1 and
a bijective function f : {1, 2, . . . , n} → S. The number n is called the
cardinality of the set S, and we write |S| = n.

Proposition 0.2.6 (Inclusion-exclusion principle). Let S₁, S₂, . . . , Sₙ be
finite sets. Then:

\[ |S_1 \cup S_2 \cup \ldots \cup S_n| = \sum_{k=1}^{n} (-1)^{k+1} \sum_{1 \le i_1 < i_2 < \ldots < i_k \le n} |S_{i_1} \cap S_{i_2} \cap \ldots \cap S_{i_k}| \]

Note:-
What Proposition 0.2.6 says is that in order to compute the cardinality of
a union of finite sets, you must sum the cardinalities of the single sets, then
subtract the cardinalities of all intersections of two of them, then add the
cardinalities of all intersections of three of them, and so on. For example, if
n = 2 then:
|S₁ ∪ S₂| = |S₁| + |S₂| − |S₁ ∩ S₂|,
and if n = 3 then
|S₁ ∪ S₂ ∪ S₃| = |S₁| + |S₂| + |S₃| − |S₁ ∩ S₂| − |S₁ ∩ S₃| − |S₂ ∩ S₃| + |S₁ ∩ S₂ ∩ S₃|.
If S₁, . . . , Sₙ are pairwise disjoint, then |S₁ ∪ . . . ∪ Sₙ| = |S₁| + |S₂| + . . . + |Sₙ|.
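For instance, taking S₁ = {1, 2, 3} and S₂ = {3, 4}, we have S₁ ∩ S₂ = {3} and S₁ ∪ S₂ = {1, 2, 3, 4}, and indeed

|S₁ ∪ S₂| = 4 = 3 + 2 − 1 = |S₁| + |S₂| − |S₁ ∩ S₂|.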

Given two functions f : S → T and g : T → U , one can create a third
function, called the composition of f and g. This is denoted by g ◦ f and it is
defined as

g ◦ f : S → U
s ↦ g(f (s))

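For instance, if f : Z → Z maps x ↦ x + 1 and g : Z → Z maps x ↦ x², then

(g ◦ f )(x) = (x + 1)², while (f ◦ g)(x) = x² + 1,

so composing in the two possible orders generally yields different functions.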

Proposition 0.2.7. Let f : S → T , g : T → U and h : U → W be functions.


Then:
h ◦ (g ◦ f ) = (h ◦ g) ◦ f,
that is, composition of functions is associative.
If S is a set, there always exists a function from S to itself, called the identity
function. This is defined by

idS : S → S
s ↦ s

We say that a function f : S → T is invertible if there exists a function g : T →
S such that g ◦ f = idS and f ◦ g = idT . We call g an inverse function for
f .

Proposition 0.2.8. A function is invertible if and only if it is bijective.
Moreover, the inverse function is unique.

We denote the inverse of a function f by f⁻¹.

0.3 The induction principle


The induction principle is a tool that can be used to prove propositions in-
volving natural numbers. This works as follows. Suppose we want to prove a
proposition P (n) on the n-th natural number. If we can prove the following
two facts:

1. P (0) holds true;

2. if P (k) holds true for a natural number k, then P (k + 1) holds true,

then it follows that P (n) is true for every natural number n. We will now
illustrate this principle with three very classical examples.

Proposition 0.3.1. Let n be a natural number. Then \(\sum_{i=0}^{n} i = \frac{n(n+1)}{2}\).

Proof. The proposition P (n) that we want to prove is:

\[ P(n) : \quad \sum_{i=0}^{n} i = \frac{n(n+1)}{2}. \]

We therefore need to prove two facts. The first one is that P (0) holds
true, i.e. we need to show that

\[ \sum_{i=0}^{0} i = \frac{0(0+1)}{2}. \]

This is clearly true.
Next, we need to prove that if \(\sum_{i=0}^{k} i = \frac{k(k+1)}{2}\) for some integer k, then
\(\sum_{i=0}^{k+1} i = \frac{(k+1)(k+2)}{2}\). That is, our hypothesis is \(\sum_{i=0}^{k} i = \frac{k(k+1)}{2}\)
(we call this the inductive hypothesis) and our thesis is \(\sum_{i=0}^{k+1} i = \frac{(k+1)(k+2)}{2}\).
So let us assume the inductive hypothesis and let us look at \(\sum_{i=0}^{k+1} i\).
We can split this sum in two pieces, writing \(\sum_{i=0}^{k+1} i = \sum_{i=0}^{k} i + (k+1)\).
But now we can use our inductive hypothesis, and get

\[ \sum_{i=0}^{k+1} i = \sum_{i=0}^{k} i + k + 1 = \frac{k(k+1)}{2} + k + 1 = \frac{(k+1)(k+2)}{2}, \]

which is exactly what we needed to prove.

Proposition 0.3.2. Let n be a natural number. Then \(2^{2n} - 1\) is divisible
by 3.

Proof. The proposition we want to prove is: P (n) : \(2^{2n} - 1\) is divisible by
3.
Once again, we need to prove two things. The first one is that \(2^{2 \cdot 0} - 1\)
is divisible by 3. This is obviously true. The second one is that if \(2^{2k} - 1\)
is divisible by 3 for some integer k (inductive hypothesis), then \(2^{2(k+1)} - 1\)
is also divisible by 3. So let us assume the inductive hypothesis and look
at \(2^{2(k+1)} - 1\). We have:

\[ 2^{2(k+1)} - 1 = 4 \cdot 2^{2k} - 1 = 4 \cdot (2^{2k} - 1 + 1) - 1 = 4 \cdot (2^{2k} - 1) + 3. \]

Now the inductive hypothesis implies that \(4 \cdot (2^{2k} - 1)\) is a multiple of 3,
and therefore also \(4 \cdot (2^{2k} - 1) + 3\) is a multiple of 3. This is exactly what
we needed to prove.

Proposition 0.3.3. Let S be a finite set with |S| = n. Then S has exactly
2ⁿ subsets.


Proof. When n = 0, S is a set with 0 elements, and hence it is the empty
set ∅. The only subset of ∅ is ∅ itself, so S has 1 = 2⁰ subsets.
Now we need to show that if a set of cardinality k has 2ᵏ subsets
(inductive hypothesis), then a set of cardinality k + 1 has 2ᵏ⁺¹ subsets.
Let S be a set with |S| = k + 1, and let P(S) be the set of all subsets of
S. Now fix an element s ∈ S (notice that this exists since S is not empty,
since its cardinality is at least 1). Elements of P(S) are of two types:
either they contain s or they do not. So we can write P(S) = T ∪ T′,
where
T = {U ⊆ S : s ∈ U }
and
T′ = {U ⊆ S : s ∉ U }.
The elements of T′ are precisely the subsets of S \ {s}. Since |S \ {s}| = k,
we can use our inductive hypothesis: there are exactly 2ᵏ subsets of
S \ {s}, so |T′| = 2ᵏ. On the other hand, there exists a bijection T′ → T ,
that is, the map sending U ↦ U ∪ {s}. It follows that |T | = |T′| = 2ᵏ. Now
since T ∩ T′ = ∅, by Proposition 0.2.6 we get that |P(S)| = |T | + |T′| =
2ᵏ + 2ᵏ = 2ᵏ⁺¹.

0.4 Real and complex numbers


The set of real numbers, denoted by R, is the set of all numbers of the form
a₀.a₁a₂a₃ . . ., where a₀ ∈ Z and aᵢ ∈ {0, . . . , 9} for every i ≥ 1. We refer to
this form as the decimal expansion. The set R can be defined in a formal way
starting from Q, but we are not interested in such a construction in these notes.
Clearly R contains Z; integers are real numbers of the form a₀.a₁a₂ . . . with
aᵢ = 0 for every i ≥ 1. Moreover, R contains Q. Rational numbers are exactly
those with an eventually periodic decimal expansion, namely they are exactly
the ones whose decimal expansion has the form a₀.a₁ . . . aₙ(aₙ₊₁ . . . aₙ₊ₜ) for
some n ≥ 0, t ≥ 1, where the block of digits aₙ₊₁ . . . aₙ₊ₜ in parentheses keeps
repeating itself, i.e. for every m > n we have aₘ = aₘ₊ₜ.
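For instance, 1/6 = 0.1(6) = 0.1666 . . . is eventually periodic with n = 1 and t = 1, while 1/7 = 0.(142857) = 0.142857142857 . . . is periodic with n = 0 and t = 6.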
Note:-
Although every real number has a decimal expansion, the representation
of a real number as a₀.a₁a₂a₃ . . . is not unique. In fact, for example, the
two expressions 0.(9) = 0.999 . . . and 1.(0) = 1.000 . . . are different, but they
represent the same real number.
The set of complex numbers, denoted by C, is the set {a + bi : a, b ∈ R},


where i is a symbol such that i² = −1. Of course C contains R; the latter is
simply the subset {a + 0i : a ∈ R}.
Note:-
The representation of a complex number as a + bi for some reals a, b is
unique. That is, a + bi = c + di if and only if a = c and b = d.

Complex numbers can be added and multiplied according to the following
rules:

• (a + bi) + (c + di) = (a + c) + (b + d)i.

• (a + bi)(c + di) = (ac − bd) + (ad + bc)i.

One can verify that addition and multiplication are commutative and
associative.
There is a bijective map, called conjugation, defined as follows:

\[ \bar{\cdot} : \mathbb{C} \to \mathbb{C}, \qquad a + bi \mapsto \overline{a+bi} := a - bi. \]

Conjugation has the property that \(\overline{x+y} = \overline{x} + \overline{y}\) and \(\overline{xy} = \overline{x} \cdot \overline{y}\) for every
x, y ∈ C. Moreover,

\[ \mathbb{R} = \{x \in \mathbb{C} : \overline{x} = x\}. \tag{1} \]
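As a quick check of the second property, take x = 2 + 3i and y = 1 − i. Then

xy = (2 + 3i)(1 − i) = 5 + i and x̄ · ȳ = (2 − 3i)(1 + i) = 5 − i,

and 5 − i is indeed the conjugate of 5 + i.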

0.5 Polynomials
Let K be a field (for the definition of field, see Definition 1.1.13). If you are
not familiar yet with the concept of field, just think of K as one among
Q, R or C. A polynomial with coefficients in K is an expression of the form

p(x) = a₀ + a₁x + a₂x² + . . . + aₙxⁿ,

where aᵢ ∈ K for every i ∈ {0, . . . , n}.
The degree of p(x), denoted by deg p(x), is the largest index i such that
aᵢ ≠ 0. If there is no such index, then p(x) = 0, and we set by convention
deg p(x) = −∞.
Note:-
Polynomials of degree 0 are non-zero constants, i.e. they are simply elements
of K different from 0.
The set of all polynomials with coefficients in K will be denoted by K[x].

Polynomials can be added and multiplied. Let \(p(x) = \sum_{i=0}^{n} a_i x^i\) and
\(q(x) = \sum_{j=0}^{m} b_j x^j\) be two polynomials with coefficients in K. Now suppose that n > m
(the case m > n is symmetric); then you can write \(q(x) = \sum_{j=0}^{n} b_j x^j\), where
bⱼ = 0 if j ∈ {m + 1, . . . , n}. Then:

\[ p(x) + q(x) = \sum_{i=0}^{n} (a_i + b_i) x^i \]

and

\[ p(x)q(x) = \sum_{k=0}^{2n} \Big( \sum_{i,j \,:\, i+j=k} a_i b_j \Big) x^k. \]

When we add up two polynomials, the degree of the sum is at most the largest
degree between the two, namely

deg(p(x) + q(x)) ≤ max{deg p(x), deg q(x)},

and equality holds if deg p(x) ≠ deg q(x). When we multiply two polynomials,
their degrees add up:

deg(p(x)q(x)) = deg p(x) + deg q(x).
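For example, over Q, if p(x) = x² + x and q(x) = −x² + 1, then

p(x) + q(x) = x + 1, of degree 1 < 2 = max{deg p(x), deg q(x)},

because the leading terms cancel (the two degrees coincide), while p(x)q(x) = −x⁴ − x³ + x² + x has degree 4 = 2 + 2.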

If p(x) is a polynomial with coefficients in K and z ∈ K, we can evaluate
p(x) at x = z. This simply means computing

p(z) := a₀ + a₁z + a₂z² + . . . + aₙzⁿ,

which is an element of K. Notice that if p(x), q(x) ∈ K[x] and z ∈ K then:

(p(x) + q(x))(z) = p(z) + q(z) and (p(x)q(x))(z) = p(z)q(z).

We say that z is a root of p(x) if p(z) = 0. Polynomials of degree 0 have
no roots.

Theorem 0.5.1. Let p(x) ∈ K[x] have degree n ≥ 1. If z ∈ K is a root
of p(x), then there exists a unique integer k ≥ 1 and a unique polynomial
q(x) of degree n − k such that q(z) ≠ 0 and p(x) = (x − z)ᵏ q(x).

The positive integer k whose existence is granted by Theorem 0.5.1 is called
the multiplicity of the root z.
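For instance, p(x) = x³ − 2x² + x ∈ Q[x] can be written as p(x) = (x − 1)² · x: the root z = 1 has multiplicity k = 2, with q(x) = x and q(1) = 1 ≠ 0, while the root z = 0 has multiplicity 1.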


Corollary 0.5.2. If deg p(x) = n ≥ 1 and z₁, . . . , zᵣ ∈ K are roots of p(x),
with multiplicities k₁, . . . , kᵣ, respectively, then k₁ + . . . + kᵣ ≤ n. In
particular, r ≤ n.

In other words, a polynomial of degree n has at most n roots in K, even
when each root is counted with its multiplicity.

Theorem 0.5.3 (Fundamental theorem of algebra). Let p(x) ∈ C[x] be
a polynomial of degree n. Then there exist unique complex numbers
z₁, . . . , zᵣ and unique positive integers k₁, . . . , kᵣ such that

p(x) = (x − z₁)^{k₁} . . . (x − zᵣ)^{kᵣ}.

In other words, a polynomial with complex coefficients of degree n has exactly
n roots, when each is counted with its multiplicity.
Notice that since Q ⊆ R ⊆ C, every degree n polynomial with coefficients in Q or in R has exactly n complex roots, when each is counted with
its multiplicity.
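For example, x² + 1 has no roots in R, but over C it factors as x² + 1 = (x − i)(x + i): its two roots ±i, each of multiplicity 1, only appear once we pass to the complex numbers.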

Lemma 0.5.4. Let p(x) ∈ R[x] be a polynomial. If λ ∈ C is a root of p(x),
then its conjugate λ̄ is a root of p(x) as well.

0.6 Matrices
Let K be a field (for the definition of field, see Definition 1.1.13). If you are
not familiar yet with the concept of field, just think of K as one among
Q, R or C. Let m, n ≥ 1 be integers.

Definition 0.6.1. An m × n matrix with coefficients in K is a function

A : {1, . . . , m} × {1, . . . , n} → K.

The image of a pair (i, j) ∈ {1, . . . , m} × {1, . . . , n} will be denoted by aᵢⱼ.

For practical purposes, we identify a matrix with its image, and we arrange
the values of the function defining the matrix in a rectangular table with m
rows and n columns. That is, an m × n matrix A is a table of the following
form:

\[ A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \ldots & \ldots & \ldots & \ldots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix} \]

To ease the notation, we will sometimes write \(A = (a_{ij})_{i=1,\ldots,m;\; j=1,\ldots,n}\) to say that A
is an m × n matrix whose entry in row i and column j is aᵢⱼ.
The set of all m × n matrices with coefficients in K is denoted by Mₘ×ₙ(K).
If m = n, we shorten the notation by just writing Mₙ(K). Matrices belonging
to Mₙ(K) are called square matrices of size n.
Any two elements of Mₘ×ₙ(K) can be added, as follows. Given

\[ A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \ldots & \ldots & \ldots & \ldots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} b_{11} & b_{12} & \ldots & b_{1n} \\ b_{21} & b_{22} & \ldots & b_{2n} \\ \ldots & \ldots & \ldots & \ldots \\ b_{m1} & b_{m2} & \ldots & b_{mn} \end{pmatrix} \]

we let

\[ A + B := \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \ldots & a_{1n}+b_{1n} \\ a_{21}+b_{21} & a_{22}+b_{22} & \ldots & a_{2n}+b_{2n} \\ \ldots & \ldots & \ldots & \ldots \\ a_{m1}+b_{m1} & a_{m2}+b_{m2} & \ldots & a_{mn}+b_{mn} \end{pmatrix} \]
Matrices could, in principle, be multiplied entry-by-entry, similarly to the way
we add them. However, this multiplication will not be used anywhere in these
lecture notes. On the other hand, we will now define a matrix multiplication
that can be performed only between an m × n and an n × p matrix, where
m, n and p are positive integers. If A = (aᵢⱼ) ∈ Mₘ×ₙ(K) and B =
(bᵢⱼ) ∈ Mₙ×ₚ(K), the matrix AB is the matrix (cᵢⱼ) ∈ Mₘ×ₚ(K)
where for every i ∈ {1, . . . , m} and every j ∈ {1, . . . , p}:

\[ c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}. \]

In other words, to find the entry in row i and column j of the matrix AB, we
need to take the i-th row of A (that has n entries) and the j-th column of B
(that also has n entries), multiply the corresponding entries (the entry in the
i-th row and k-th column of A must be multiplied with the entry in the k-th
row and j-th column of B), and add up all the results.
Note:-
Given matrices A and B, the product AB makes sense only when the num-
ber of columns of A equals the number of rows of B.

Example 0.6.2.

• Let \(A := \begin{pmatrix} 2 & 3 & 1 \\ -1 & 0 & 1 \end{pmatrix} \in M_{2 \times 3}(\mathbb{R})\) and \(B := \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} \in M_{3 \times 1}(\mathbb{R})\).
  Then \(AB = \begin{pmatrix} 10 \\ 1 \end{pmatrix} \in M_{2 \times 1}(\mathbb{R})\).

• Let \(A := \begin{pmatrix} 2 & 2 \\ 0 & 0 \end{pmatrix} \in M_{2}(\mathbb{Q})\) and \(B := \begin{pmatrix} 1 & 1 & -1 \\ 0 & 0 & 3 \end{pmatrix} \in M_{2 \times 3}(\mathbb{Q})\).
  Then \(AB = \begin{pmatrix} 2 & 2 & 4 \\ 0 & 0 & 0 \end{pmatrix} \in M_{2 \times 3}(\mathbb{Q})\).

• The matrices \(A := \begin{pmatrix} 2 & 2 \\ 0 & 0 \end{pmatrix} \in M_{2}(\mathbb{C})\) and \(B := \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} \in M_{3 \times 1}(\mathbb{C})\)
  cannot be multiplied, since A has two columns and B has 3 rows.

Remark 0.6.3. If A, B ∈ Mₙ(K), then both products AB and BA make
sense. However, in general, the two results are different from each other.
That is, multiplication of matrices is not commutative.
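A minimal instance in M₂(K):

\[ \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \]

so here AB ≠ BA.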
We close this section with a few definitions that will be used later on.

Definition 0.6.4. Let K be a field and n ≥ 1 an integer.


1. The identity matrix is the matrix Iₙ = (aᵢⱼ) ∈ Mₙ(K) defined by
   aᵢⱼ = 1 if i = j and aᵢⱼ = 0 otherwise.

2. A = (aᵢⱼ) ∈ Mₙ(K) is a diagonal matrix if aᵢⱼ = 0 whenever i ≠ j.

3. A = (aᵢⱼ) ∈ Mₙ(K) is upper triangular if aᵢⱼ = 0 whenever i > j.

4. A = (aᵢⱼ) ∈ Mₙ(K) is lower triangular if aᵢⱼ = 0 whenever i < j.

Example 0.6.5.

• The identity matrix I₃ ∈ M₃(K) is:

\[ I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]

• The matrix

\[ \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -\sqrt{2} \end{pmatrix} \]

is diagonal.

• The matrix

\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & -1 \end{pmatrix} \]

is upper triangular.

• The matrix

\[ \begin{pmatrix} 2 & 0 & 0 & 0 \\ -7 & 3 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ \pi & 0 & 22 & 0 \end{pmatrix} \]

is lower triangular.

Remark 0.6.6. If A ∈ Mn (K), then AIn = In A = A.

Definition 0.6.7. Let K be a field.

1. Let A = (aᵢⱼ) ∈ Mₘ×ₙ(K). The transpose matrix is the matrix
   ᵗA = (bⱼᵢ) ∈ Mₙ×ₘ(K) defined by bⱼᵢ = aᵢⱼ for every
   i ∈ {1, . . . , m} and j ∈ {1, . . . , n}.

2. Let A ∈ Mₙ(K) be a square matrix. A is called symmetric if A = ᵗA,
   and is called antisymmetric if A = −ᵗA.

Example 0.6.8.

• Let

\[ A := \begin{pmatrix} 1 & 2 & -3 \\ 0 & 1 & 7 \end{pmatrix}. \quad \text{Then} \quad {}^t\!A = \begin{pmatrix} 1 & 0 \\ 2 & 1 \\ -3 & 7 \end{pmatrix}. \]

• Let

\[ A := \begin{pmatrix} 1 & 2 & 3 & 4 \end{pmatrix}. \quad \text{Then} \quad {}^t\!A = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}. \]

• Let

\[ A := \begin{pmatrix} 1 & 2 & 0 \\ 5 & 6 & 7 \\ \pi & 8 & \sqrt{6} \end{pmatrix}. \quad \text{Then} \quad {}^t\!A = \begin{pmatrix} 1 & 5 & \pi \\ 2 & 6 & 8 \\ 0 & 7 & \sqrt{6} \end{pmatrix}. \]

• The matrix

\[ \begin{pmatrix} 1 & 0 & -2 \\ 0 & 2 & -3 \\ -2 & -3 & 0 \end{pmatrix} \]

is symmetric.

Remark 0.6.9.

1. If A ∈ Mₘ×ₙ(K), then ᵗ(ᵗA) = A.

2. If A, B ∈ Mₘ×ₙ(K), then ᵗ(A + B) = ᵗA + ᵗB.

3. If A ∈ Mₘ×ₙ(K) and B ∈ Mₙ×ₚ(K), then ᵗ(AB) = ᵗB ᵗA.
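A quick check of 3. with a 1 × 2 and a 2 × 1 matrix: if \(A = \begin{pmatrix} 1 & 2 \end{pmatrix}\) and \(B = \begin{pmatrix} 3 \\ 4 \end{pmatrix}\), then AB = (11), so ᵗ(AB) = (11), and indeed

\[ {}^t\!B \; {}^t\!A = \begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = (11). \]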


Chapter 1: Vector spaces


In this chapter we will introduce the fundamental objects of linear algebra,
namely vector spaces.

1.1 Groups and fields


Let S be a set.

Definition 1.1.1. An operation on S is a function ⋆ : S × S → S. An
operation ⋆ on S is called:

1. associative if for every a, b, c ∈ S we have (a ⋆ b) ⋆ c = a ⋆ (b ⋆ c);

2. commutative if for every a, b ∈ S we have a ⋆ b = b ⋆ a.

Example 1.1.2.

• Addition is an associative and commutative operation on N. Sub-


traction is not an operation on N, because the difference of two
natural numbers is not always a natural number.

• Subtraction is an operation on Z, but it is not associative (for example, 1 − (2 − 2) ≠ (1 − 2) − 2) nor commutative (for example,
  1 − 2 ≠ 2 − 1).

Definition 1.1.3. Let S be a set and ⋆ be an operation on S. An element
e ∈ S is called a neutral element for ⋆ if a ⋆ e = e ⋆ a = a for every a ∈ S.

Example 1.1.4. 0 is a neutral element for the operation + on Q. 1 is a


neutral element for the operation · on R.

Lemma 1.1.5. Let S be a set and ⋆ an operation on S. A neutral element
for ⋆, if it exists, is unique.

Proof. Let e, e′ be neutral elements for ⋆. Then e ⋆ e′ = e since e′ is
neutral, but also e ⋆ e′ = e′ since e is neutral. Hence e = e′.


Definition 1.1.6. Let S be a set and ⋆ be an operation on S with neutral
element e. Let a ∈ S. We say that an element b ∈ S is an inverse of a if
a ⋆ b = b ⋆ a = e.

Lemma 1.1.7. Let S be a set and ⋆ be an associative operation on S with
neutral element e. Let a ∈ S. If there exists an inverse of a, then it is unique.

Proof. Let b, b′ ∈ S be such that a ⋆ b = e = a ⋆ b′. Multiplying both
sides of this equality by b we get that b ⋆ (a ⋆ b) = b ⋆ (a ⋆ b′). Since ⋆ is
associative, it follows that (b ⋆ a) ⋆ b = (b ⋆ a) ⋆ b′. Since b is an inverse
of a, we have b ⋆ a = e and hence e ⋆ b = e ⋆ b′. Since e is neutral, b = b′
follows.

Example 1.1.8. Addition is an associative and commutative operation


on C. The neutral element is 0, and every element a + bi has the unique
inverse −a − bi.

Definition 1.1.9. A group is a pair (G, ⋆) where G is a nonempty set and
⋆ is an operation on G that satisfies the following properties:

1. ⋆ is associative;

2. there exists a neutral element e;

3. every element in G has an inverse.

If, in addition, ⋆ is commutative we say that the group (G, ⋆) is abelian.
If ⋆ is not commutative, we say that (G, ⋆) is non-abelian.

Example 1.1.10.

• The pair (Q, +) is an abelian group. The pair (Q, ·) is not a group,
  because 0 does not possess an inverse.

• The pair (C \ {0}, ·) is an abelian group. In fact, every non-zero
  complex number a + bi has the multiplicative inverse (a − bi)/(a² + b²).


• If S is a finite set, the set

  {f : S → S s.t. f is bijective}

  is a non-abelian group when endowed with the operation "composition of functions". In fact, the composition of two bijective functions is bijective, composition is associative by Proposition 0.2.7,
  the identity function is the neutral element, and every element is
  invertible by Proposition 0.2.8.

Remark 1.1.11. The fact that an operation ⋆ on G is associative implies
that we can omit brackets when we apply it several times in a row. Namely,
the writing

g₁ ⋆ g₂ ⋆ . . . ⋆ gₙ

makes sense because we can compute the operations in the order we prefer,
and the result does not change. For example,

g₁ ⋆ (g₂ ⋆ (g₃ ⋆ g₄)) = (g₁ ⋆ g₂) ⋆ (g₃ ⋆ g₄).

Lemma 1.1.12. Let (G, ⋆) be a group and let a, b, c ∈ G. If a ⋆ b = a ⋆ c,
then b = c.

Proof. Let a′ be the inverse of a. Since a ⋆ b = a ⋆ c, it follows that
a′ ⋆ (a ⋆ b) = a′ ⋆ (a ⋆ c). Since ⋆ is associative, we get (a′ ⋆ a) ⋆ b = (a′ ⋆ a) ⋆ c,
and since a′ ⋆ a is the neutral element, it follows that b = c.

From now on, we will denote by + or · operations on groups, where + will
only be used in abelian groups while · can be used in both settings. When the
operation is denoted by +, we denote by 0 the neutral element and by −a the
inverse of a. When the operation is denoted by · we denote by 1 the neutral
element and by a⁻¹ the inverse of a.

Definition 1.1.13. A field is a triple (K, +, ·), where K is a nonempty set
and +, · are two operations on K that satisfy the following properties:

1. (K, +) is an abelian group;

2. (K \ {0}, ·) is an abelian group;

3. for every a, b, c ∈ K we have a · (b + c) = a · b + a · c.

Example 1.1.14. Q, R, C are all fields with the usual operations of sum
and multiplication. Z is not a field with respect to the usual sum and
multiplication because elements different from ±1 do not have a multiplicative inverse.
The set F₂ := {0, 1} is a field when endowed with the following two
operations:

+ | 0 1        · | 0 1
--+----        --+----
0 | 0 1        0 | 0 0
1 | 1 0        1 | 0 1
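Reading the addition table, note that 1 + 1 = 0 in F₂, so every element of F₂ is its own additive inverse. One can also verify property 3. of Definition 1.1.13 directly; for instance, 1 · (1 + 1) = 1 · 0 = 0 and 1 · 1 + 1 · 1 = 1 + 1 = 0.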

Remark 1.1.15. If (K, +, ·) is a field, then K must contain at least the


neutral element for the operation +, that we denote by 0, and the neutral
element for the operation ·, that we denote by 1. These two elements
cannot coincide, because the definition of field requires (K \ {0}, ·) to be
an abelian group, and groups are nonempty sets. Therefore, a field always
contains at least two distinct elements, 0 and 1. The field F2 described in
the above example is therefore the smallest possible example of a field.
From now on, we will denote fields just by the letter K, tacitly implying
that the operations on K are + and ·. Moreover, we will frequently drop the
multiplication sign in fields. That is, if a, b ∈ K we will write ab instead of
a · b.
Note:-
When K is a field, we denote by 0 the neutral element with respect to the
operation + and by 1 the neutral element with respect to the operation ·.

Definition 1.1.16. Let K be a field. A vector space over K or a K-vector


space is an abelian group (V, +) endowed with a map

∗: K × V → V

that satisfies the following properties:


1. for every α ∈ K and every v, w ∈ V , α ∗ (v + w) = α ∗ v + α ∗ w;

2. for every α, β ∈ K and every v ∈ V , (αβ) ∗ v = α ∗ (β ∗ v).

3. for every α, β ∈ K and every v ∈ V , (α + β) ∗ v = α ∗ v + β ∗ v;

4. for every v ∈ V , 1 ∗ v = v.

Elements of V are called vectors, and will be denoted by underlined letters,


such as v, w, u. In particular, the neutral element of (V, +) is denoted by
0 and is called zero vector. Elements of K are called scalars.

Note:-
Addition in K and addition in V are both denoted by +. However, beware of
the fact that these are different operations, since one is a function K × K →
K and the other one is a function V × V → V . Therefore, an expression of
the form α + v, where α ∈ K and v ∈ V , does not make any sense.
Note:-
The zero vector 0 is the neutral element of the group (V, +), while the
scalar 0 is the neutral element of the group (K, +). Hence these
are two very different objects, do not confuse them!

Example 1.1.17. Let V = {0} be the abelian group that possesses only
one element, the neutral element with respect to the operation + on V .
This is a K-vector space over any field K, the operation ∗ being defined
by α ∗ 0 = 0 for every α ∈ K. This is the simplest possible vector space,
although not a very interesting one.

There are several important examples of vector spaces, but for the sake of
these lecture notes the most important is by far the following one.

Example 1.1.18. We let K be any field, n be a positive integer and V :=


K n , that is, V is the set of all n-tuples of elements of K. V is an abelian
group with respect to the following operation:

(a1 , . . . , an ) + (b1 , . . . , bn ) = (a1 + b1 , a2 + b2 , . . . , an + bn )

Now we define the following operation:

∗ : K × V → V
(α, (a₁, . . . , aₙ)) ↦ (αa₁, . . . , αaₙ)


Let us check that, for example, property 1. of the definition of a vector
space is indeed satisfied. In order to do that, we need to take an arbitrary
α ∈ K and two vectors v = (v1 , . . . , vn ) and w = (w1 , . . . , wn ) in V and
prove that α ∗ (v + w) = α ∗ v + α ∗ w. Now

v + w = (v1 + w1 , v2 + w2 , . . . , vn + wn )

and hence

α ∗ (v + w) = (α(v1 + w1 ), α(v2 + w2 ), . . . , α(vn + wn )) =

= (αv1 + αw1 , αv2 + αw2 , . . . , αvn + αwn ).


On the other hand,

α ∗ v = (αv1 , αv2 , . . . , αvn ) and α ∗ w = (αw1 , αw2 , . . . , αwn ).

Hence

αv + αw = (αv1 + αw1 , αv2 + αw2 , . . . , αvn + αwn ) = α ∗ (v + w),

as desired. It is a very useful exercise for the reader to verify that prop-
erties 2., 3. and 4. are satisfied as well.
Notice that when n = 1 we have V = K. When this happens, opera-
tion ∗ coincides with multiplication in K. Therefore, every field K, seen
as an abelian group with respect to the sum, is a vector space over K.

Example 1.1.19. Let n ≥ 1 be an integer and K be a field. Let

K[x]≤n = {p(x) ∈ K[x] : deg p(x) ≤ n}.

This is the set of all polynomials with coefficients in K of degree at most


n. The set K[x]≤n is an abelian group with respect to addition of poly-
nomials, and it is a vector space over K with respect to the operation

∗ : K × K[x]≤n → K[x]≤n

\[ \Big( \alpha, \; \sum_{i=0}^{n} a_i x^i \Big) \mapsto \sum_{i=0}^{n} (\alpha a_i) x^i \]


Example 1.1.20. Let n ≥ 1 be an integer and K be a field. The set


Mn (K) of all n × n matrices with coefficients in K is an abelian group
with respect to the sum of matrices. The group (Mn (K), +) can be given
a structure of vector space over K via the following operation:

∗ : K × Mₙ(K) → Mₙ(K)

(α, (aᵢⱼ)) ↦ (αaᵢⱼ).

Theorem 1.1.21. Let V be a vector space over a field K. Let α ∈ K and


v ∈ V . Then α ∗ v = 0 if and only if v = 0 or α = 0.

Proof. First, we show that if α = 0 or v = 0 then α ∗ v = 0. Let us start


by assuming that α = 0. Since 0 + 0 = 0 (because 0 is the neutral element
with respect to the sum in K), we have that

0 ∗ v = (0 + 0) ∗ v = 0 ∗ v + 0 ∗ v,

using property 3. of Definition 1.1.16.


Now 0 ∗ v = 0 ∗ v + 0, and therefore we can rewrite the above equality
as:
0 ∗ v + 0 = 0 ∗ v + 0 ∗ v.
Now we can apply Lemma 1.1.12 in the group (V, +) and conclude that
0 ∗ v = 0.
Next, suppose that v = 0. Since 0 + 0 = 0 (because 0 is the neutral
element with respect to the sum in V ), we have that

α ∗ 0 = α ∗ (0 + 0) = α ∗ 0 + α ∗ 0,

using property 1. of Definition 1.1.16.


Since α ∗ 0 = α ∗ 0 + 0, the above equality becomes

α ∗ 0 + 0 = α ∗ 0 + α ∗ 0,

and we can apply Lemma 1.1.12 and conclude that α ∗ 0 = 0.


Conversely, we must show that if α ∗ v = 0 then α = 0 or v = 0. So
suppose that α 6= 0; we will show that v = 0. In fact, since α 6= 0 then


there exists a multiplicative inverse α⁻¹ of α. Starting from the equality
α ∗ v = 0, we get:
α⁻¹ ∗ (α ∗ v) = α⁻¹ ∗ 0.
Now on the one hand, in the first part of the proof we proved that α⁻¹ ∗ 0 =
0. On the other hand we can use the properties of vector spaces given by
Definition 1.1.16, and get:

0 = α⁻¹ ∗ 0 = α⁻¹ ∗ (α ∗ v) = (α⁻¹α) ∗ v = 1 ∗ v = v,

that is, v = 0.

Corollary 1.1.22. If K is a field and a, b ∈ K are such that ab = 0, then


a = 0 or b = 0.

Proof. As noticed in Example 1.1.18, every field is a vector space over


itself, with the operation ∗ coinciding with multiplication in K. Hence it
is enough to apply Theorem 1.1.21 to such setting.

Corollary 1.1.23. Let V be a K-vector space. Then for every v ∈ V we


have (−1) ∗ v = −v.

Proof. By Theorem 1.1.21 and the properties of vector spaces, we have


that:

0 = 0 ∗ v = (1 − 1) ∗ v = 1 ∗ v + (−1) ∗ v = v + (−1) ∗ v,

that is, (−1) ∗ v = −v.


From now on, when V is a vector space over a field K with respect to some
operation ∗ : K × V → V , we will denote by · multiplication between scalars
and vectors, and we will often drop the multiplication sign. That is, if α ∈ K
and v ∈ V , we will write α · v or αv instead of α ∗ v.

1.2 Linear combinations and subspaces

Definition 1.2.1. Let V be a vector space over a field K. Let v₁, . . . , vₙ ∈ V
and α₁, . . . , αₙ ∈ K. The linear combination of v₁, . . . , vₙ with coefficients α₁, . . . , αₙ is the vector:

α₁v₁ + α₂v₂ + . . . + αₙvₙ ∈ V.

Example 1.2.2. Let V = R², let v₁ = (0, 1), v₂ = (1, −1) and v₃ = (2, 2)
be vectors. Let α₁ = 1, α₂ = 2, α₃ = 0 be scalars. The linear combination
of v₁, v₂, v₃ with coefficients α₁, α₂, α₃ is the vector

1 · (0, 1) + 2 · (1, −1) + 0 · (2, 2) = (2, −1) ∈ V.

Definition 1.2.3. Let (V, +) be an abelian group.

1. Let W ⊆ V be a nonempty subset. W is called a subgroup of V if


(W, +) is an abelian group.

2. Let K be a field and let ∗ : K × V → V be an operation that makes


V into a vector space over K. A subset W ⊆ V is called a vector
subspace of V if:

(a) (W, +) is an abelian group;


(b) W is a K-vector space with respect to the operation ∗.

Example 1.2.4.

• Let (V, +) := (Z, +). The subset

2Z := {n ∈ Z : 2 divides n}

is a subgroup of Z. In fact, the sum of two multiples of 2 is a


multiple of 2, there is a neutral element, namely 0, and the inverse
of a multiple of 2 is a multiple of 2.
On the other hand, the subset 2Z+1 = {n ∈ Z : 2 does not divide n}
is not a subgroup of Z, since + is not an operation on 2Z + 1: the
sum of two odd numbers is even.

• Let (V, +) := (K 2 , +), with the sum being defined coordinatewise.


As we have seen in Example 1.1.18, this is a K-vector space. The
subset W = {(x, x) : x ∈ K} ⊆ K 2 is a vector subspace of V . Let us


first verify that (W, +) is a subgroup of (V, +). If (x, x), (y, y) ∈ W ,
then (x, x) + (y, y) = (x + y, x + y), that is an element of W since
its two entries coincide. Hence + is an operation on W . Clearly
since + is associative on V it is also associative on W . The neutral
element for the sum on V is (0, 0), that belongs to W . Hence it is
the neutral element for the sum on W . Finally, the inverse of an
element (x, x) ∈ W is (−x, −x), that belongs again to W .
Next, λ(x, x) = (λx, λx) ∈ K 2 for every λ ∈ K and (x, x) ∈ W .
Hence the operation K × V → V given by (λ, (x, y)) 7→ (λx, λy)
restricts to an operation on W that makes the latter into a K-vector
space.
On the other hand, the subset U = {(x, 1) : x ∈ K} ⊆ V is not a
vector subspace of V , since (U, +) is not a subgroup of (V, +): in
fact, it has no neutral element.

Lemma 1.2.5. Let (G, ·) be a group, and let H ⊆ G be nonempty. Then (H, ·) is a
subgroup if and only if for every g, h ∈ H we have g · h⁻¹ ∈ H.

Proof. First, if H is a subgroup then given g, h ∈ H, the inverse of h, that
is h⁻¹, must also belong to H, and the product g · h⁻¹ must belong to H
as well, since · is an operation on H.
Conversely, suppose that for every g, h ∈ H the product g · h⁻¹ belongs
to H. We need to prove that · is an operation on H, that the neutral
element 1 is in H and that if h ∈ H, then h⁻¹ ∈ H. Notice that the
fact that · is associative is obvious, since it is associative on G. If g ∈
H, then the hypothesis guarantees that 1 = g · g⁻¹ ∈ H, and therefore
H contains the neutral element. Therefore, if h ∈ H then the same
hypothesis guarantees that h⁻¹ = 1 · h⁻¹ ∈ H, so H contains inverses of
every element. Finally, if g, h ∈ H then we have just proved that h⁻¹ ∈ H
and so g · h = g · (h⁻¹)⁻¹ ∈ H by the hypothesis, completing the proof.

Theorem 1.2.6. Let V be a vector space over a field K and let W ⊆ V


be a non-empty subset. Then W is a vector subspace of V if and only if
for every v 1 , v 2 ∈ W and every α1 , α2 ∈ K we have:

α1 v 1 + α2 v 2 ∈ W.


Proof. First, suppose that W is a subspace of V . Let v₁, v₂ ∈ W and
α₁, α₂ ∈ K. We need to prove that α₁v₁ + α₂v₂ ∈ W . Since W is a vector
space with respect to the same operation that makes V into a K-vector
space, we have that α₁v₁ ∈ W and α₂v₂ ∈ W . On the other hand (W, +)
space with respect to the same operation that makes V into a K-vector
space, we have that α1 v 1 ∈ W and α2 v 2 ∈ W . On the other hand (W, +)
is a subgroup of (V, +), and therefore for every w1 , w2 ∈ W we have that
w1 + w2 ∈ W . If we pick w1 = α1 v 1 and w2 = α2 v 2 , we get precisely that
α1 v 1 + α2 v 2 ∈ W .
Conversely, suppose that for every v 1 , v 2 ∈ W and every α1 , α2 ∈ K
we have α1 v 1 + α2 v 2 ∈ W . We need to prove that W is a subspace of
V . This amounts to proving that (W, +) is a subgroup of (V, +) and that
multiplication by scalars of K makes W into a K-vector space. We start
by proving that (W, +) is a subgroup of (V, +). By Lemma 1.2.5, it is
enough to show that for every w1 , w2 ∈ W we have w1 − w2 ∈ W . By
hypothesis, given w1 , w2 ∈ W , if we let α1 = 1 and α2 = −1 we have
α1 w1 + α2 w2 = w1 + (−1)w2 ∈ W . By Corollary 1.1.23, it follows that
w1 − w2 ∈ W . Hence (W, +) is a subgroup of (V, +). In order to prove
that W is a vector space, we only need to prove that for every α ∈ K and
every w ∈ W the vector αw belongs to W . In fact, properties 1., . . . , 4. of
Definition 1.1.16 are automatically satisfied for W , since they are satisfied
for V by hypothesis. Now if α ∈ K and w ∈ W , the hypothesis guarantees
that αw = α · w + 0 · w ∈ W , as we needed to prove.

Corollary 1.2.7. If W is a vector subspace of a K-vector space V , then


for every n ≥ 1, every α1 , . . . , αn ∈ K and every v 1 , . . . , v n ∈ W we have:

α1 v 1 + . . . + αn v n ∈ W.

Proof. We use induction on n. For n = 1, the statement is true because


α1 v 1 = α1 v 1 + 0 · v 1 , and the latter belongs to W by Theorem 1.2.6. Now
suppose that the claim is true for n − 1 vectors, and let α₁, . . . , αₙ ∈ K
and v₁, . . . , vₙ ∈ W . Then

α1 v 1 + . . . + αn v n = (α1 v 1 + . . . + αn−1 v n−1 ) + αn v n .

Now α1 v 1 + . . . + αn−1 v n−1 ∈ W by the inductive hypothesis, and hence


there exists w ∈ W such that α1 v 1 + . . . + αn−1 v n−1 = w. Hence

α1 v 1 + . . . + αn v n = w + αn v n ,


and w + αₙvₙ = 1 · w + αₙvₙ belongs to W thanks to Theorem 1.2.6.


Note:-
If V is a K-vector space and W ⊆ V is a vector subspace, then 0 ∈ W . In
fact, W is in particular a subgroup of (V, +) and therefore it must contain
the neutral element for the sum on V . Hence, if W ⊆ V is a subset with
0 ∉ W , then W is not a vector subspace of V .

Example 1.2.8. Using Theorem 1.2.6 it is easy to give examples of vector


subspaces.

• Let K be a field and V := K³. Let

  W = {(x, y, z) ∈ V : x + y + z = 0}.

  Then W is a vector subspace of V . In fact, if α, β ∈ K and
  (x, y, z), (x′, y′, z′) ∈ W , then x + y + z = x′ + y′ + z′ = 0 and
  since

  α(x, y, z) + β(x′, y′, z′) = (αx + βx′, αy + βy′, αz + βz′)

  we get that

  αx + βx′ + αy + βy′ + αz + βz′ = α(x + y + z) + β(x′ + y′ + z′) = 0,

  so that α(x, y, z) + β(x′, y′, z′) ∈ W . Hence W is a vector subspace
  of V .

• Let K be a field, n ≥ 1 be an integer and V := K[x]≤n . The set

W = {p(x) ∈ V : p(0) = 0}

is a vector subspace. In fact, if p(x), q(x) ∈ W and α, β ∈ K


then (αp(x) + βq(x))(0) = αp(0) + βq(0) = 0 + 0 = 0, and hence
αp(x) + βq(x) ∈ W . It follows that W is a vector subspace of V .

• Let K be a field, n ≥ 1 and Mₙ(K) the K-vector space of n × n
  matrices with entries in K. The subset T ⊆ Mₙ(K) of upper triangular
  matrices (see Definition 0.6.4) is a vector subspace. In fact, let
  M = (aᵢⱼ) and M′ = (a′ᵢⱼ) be elements of T . Then
  aᵢⱼ = a′ᵢⱼ = 0 for every i > j. If α, β ∈ K, the matrices αM and
  βM′ are still upper triangular, as αM = (αaᵢⱼ) and βM′ = (βa′ᵢⱼ).
  On the other hand, the sum of two upper triangular matrices is
  clearly upper triangular, and hence αM + βM′ ∈ T .
  Similarly, the set of lower triangular matrices and the set of diagonal
  matrices are vector subspaces of Mₙ(K).

Definition 1.2.9. Let V be a K-vector space and let A ⊆ V be a nonempty
subset. The span of A is the set

\[ \langle A \rangle := \Big\{ \sum_{i=1}^{n} \alpha_i v_i : n \in \mathbb{N}, \; \alpha_i \in K, \; v_i \in A \Big\}. \]

In other words, the span of A is the set of all linear combinations of
elements of A.
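For example, if V = R² then ⟨{(1, 2)}⟩ = {α(1, 2) : α ∈ R}, the line through the origin and the point (1, 2); on the other hand, ⟨{(1, 0), (0, 1)}⟩ = R², since every (a, b) ∈ R² is the linear combination a(1, 0) + b(0, 1).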

Remark 1.2.10. Clearly A ⊆ ⟨A⟩. In fact, every v ∈ A can be seen as the
linear combination 1 · v, and being a linear combination of elements of A
(in this case just one element), it belongs to ⟨A⟩.

Theorem 1.2.11. Let A ⊆ V be a nonempty subset. Then ⟨A⟩ is a vector
subspace of V .

Proof. Let \(\sum_{i=1}^{n} \alpha_i v_i\) and \(\sum_{j=1}^{m} \beta_j w_j\) be two elements of ⟨A⟩, so that
vᵢ, wⱼ ∈ A and αᵢ, βⱼ ∈ K for every i, j. Let λ, µ ∈ K. Then

\[ \lambda \sum_{i=1}^{n} \alpha_i v_i + \mu \sum_{j=1}^{m} \beta_j w_j = (\lambda\alpha_1)v_1 + \ldots + (\lambda\alpha_n)v_n + (\mu\beta_1)w_1 + \ldots + (\mu\beta_m)w_m, \]

which is again a linear combination of vectors of A, and therefore it is by
definition an element of ⟨A⟩. It follows by Theorem 1.2.6 that ⟨A⟩ is a
subspace of V .


Proposition 1.2.12. Let V be a K-vector space and A ⊆ V . Then

\[ \langle A \rangle = \bigcap_{\substack{W \subseteq V \text{ subsp.} \\ A \subseteq W}} W. \]

In other words, the span of A is the intersection of all vector subspaces of
V that contain A.

Proof. In order to prove that two sets are equal, we need to prove that
each is contained in the other one.
So first, suppose that v ∈ ⟨A⟩. We need to show that if W is a
subspace that contains A, then v ∈ W . By definition, \(v = \sum_{i=1}^{n} \alpha_i v_i\) for
some v₁, . . . , vₙ ∈ A and α₁, . . . , αₙ ∈ K. If A ⊆ W , then v₁, . . . , vₙ ∈ W
and since subspaces are closed with respect to linear combinations (see
Corollary 1.2.7), it follows that v ∈ W .
Conversely, let v be a vector belonging to every subspace of V that
contains A. Since ⟨A⟩ is a vector subspace of V by Theorem 1.2.11, and
it contains A by Remark 1.2.10, it follows that v ∈ ⟨A⟩.

What Proposition 1.2.12 says is that the span of a subset A ⊆ V is the smallest
vector subspace of V that contains A.

Proposition 1.2.13. Let V be a K-vector space and let A, B ⊆ V be
non-empty subsets.

1. If A ⊆ B, then ⟨A⟩ ⊆ ⟨B⟩.

2. A = ⟨A⟩ if and only if A is a vector subspace of V .

Proof. 1. The set ⟨B⟩ is a vector subspace of V , and therefore it is closed
under linear combinations of elements of B. Since A ⊆ B and B ⊆ ⟨B⟩,
it follows that ⟨B⟩ must contain all linear combinations of elements of A
as well. That is, ⟨A⟩ ⊆ ⟨B⟩.
2. If A = ⟨A⟩, since the right hand side is a vector subspace by Theorem
1.2.11, then so is the left hand side.
Conversely, if A is a vector subspace then it is closed under linear
combinations of its elements, and hence it contains ⟨A⟩.

Definition 1.2.14. Let V be a K-vector space and let W ⊆ V be a subspace. A subset A ⊆ W is called a system of generators for W if ⟨A⟩ = W .
The space V is said to be finitely generated (f.g. for short) if there exists
a finite system of generators for V .

In other words, A is a system of generators for W if every vector in W can be
written as a linear combination of vectors in A.

Example 1.2.15.

• The vector space Kⁿ is finitely generated, and the set

  {(1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1)}

  is a set of generators. In fact, for every v = (a₁, . . . , aₙ) ∈ Kⁿ we
  can write:

  v = a₁(1, 0, . . . , 0) + a₂(0, 1, 0, . . . , 0) + . . . + aₙ(0, . . . , 0, 1),

  i.e. every vector of Kⁿ is a linear combination of

  (1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1).

• The vector space K[x]≤n of polynomials of degree ≤ n is finitely
  generated. In fact, if \(p(x) = \sum_{i=0}^{n} a_i x^i\) then p(x) is a linear
  combination of the n + 1 polynomials 1, x, x², . . . , xⁿ with coefficients
  a₀, . . . , aₙ.

• The vector space K[x] of all polynomials with coefficients in K is
  not finitely generated. In fact, suppose A = {p₁(x), . . . , pₙ(x)} is a
  set of generators. Then every polynomial of K[x] would be a linear
  combination of the pᵢ(x)'s. But the degree of a linear combination of
  elements of A is at most the maximum of the degrees of the elements
  of A, since deg(αp(x) + βq(x)) ≤ max{deg p(x), deg q(x)}. Hence if
  f (x) ∈ K[x] is a polynomial of degree larger than max{deg pᵢ(x)}
  then f (x) cannot be a linear combination of elements of A.

Proposition 1.2.16. Let V be a K-vector space. Let W ⊆ V be a finitely
generated vector subspace, and let {w₁, . . . , wₙ} ⊆ W be a system of
generators for W . Let v₀ ∈ V and let

U := {v₀ + w : w ∈ W }

be the translate of W by v₀. Then ⟨U⟩ = ⟨v₀, w₁, . . . , wₙ⟩.

Proof. Let u ∈ ⟨U⟩. By definition of span there exist an integer m ≥ 1,
scalars α₁, . . . , αₘ ∈ K and vectors u₁, . . . , uₘ ∈ U such that \(u = \sum_{i=1}^{m} \alpha_i u_i\).
Now by definition of U , every uᵢ has the form v₀ + vᵢ for some vᵢ ∈ W .
Hence,

\[ u = (\alpha_1 + \ldots + \alpha_m)v_0 + \sum_{i=1}^{m} \alpha_i v_i. \tag{2} \]

On the other hand, W is spanned by w₁, . . . , wₙ, and therefore for every i ∈
{1, . . . , m} there exist βᵢ₁, . . . , βᵢₙ ∈ K such that \(v_i = \sum_{j=1}^{n} \beta_{ij} w_j\). Substituting
in (2), we get that:

\[ u = (\alpha_1 + \ldots + \alpha_m)v_0 + \sum_{j=1}^{n} \Big( \sum_{i=1}^{m} \alpha_i \beta_{ij} \Big) w_j, \]

so that u is an element of ⟨v₀, w₁, . . . , wₙ⟩.
Conversely, let \(u = \alpha_0 v_0 + \sum_{i=1}^{n} \alpha_i w_i \in \langle v_0, w_1, \ldots, w_n \rangle\). Notice that
v₀ ∈ U , because we can write v₀ as v₀ + 0, which is in U by definition.
Moreover, for every w ∈ W we have −w ∈ W and hence v₀ − w ∈ U .
Therefore, for every w ∈ W we have that:

w = v₀ − (v₀ − w) ∈ ⟨U⟩,

because vector subspaces are closed under linear combinations. It follows
that every wᵢ is in ⟨U⟩, and since the latter is a vector subspace
then \(\sum_{i=1}^{n} \alpha_i w_i \in \langle U \rangle\) and α₀v₀ ∈ ⟨U⟩. This implies that
\(u = \alpha_0 v_0 + \sum_{i=1}^{n} \alpha_i w_i \in \langle U \rangle\).

1.3 Linear dependence and linear independence

Definition 1.3.1. Let K be a field and V be a K-vector space. Let
v₁, . . . , vₙ ∈ V .

1. We say that the vectors v₁, . . . , vₙ are linearly dependent if there
   exist α₁, . . . , αₙ ∈ K with at least one αᵢ different from 0 such that:

   α₁v₁ + α₂v₂ + . . . + αₙvₙ = 0.

2. We say that the vectors v₁, . . . , vₙ are linearly independent if given
   α₁, . . . , αₙ ∈ K, the equality α₁v₁ + α₂v₂ + . . . + αₙvₙ = 0 implies
   that α₁ = α₂ = . . . = αₙ = 0.

In other words, n vectors v₁, . . . , vₙ are linearly independent if the only
linear combination of them that yields the zero vector is 0v₁ + 0v₂ + . . . +
0vₙ. They are linearly dependent if there exists a linear combination of them
that yields the zero vector with not all the coefficients equal to 0. Therefore,
v₁, . . . , vₙ are either linearly dependent or linearly independent; they cannot
be both dependent and independent at the same time.

Example 1.3.2. Let V = R³. The vectors v₁ = (1, 0, 0), v₂ = (0, 1, 0)
and v₃ = (1, 1, 0) are linearly dependent. In fact, the linear combination
1 · v₁ + 1 · v₂ + (−1) · v₃ equals the zero vector. The vectors v₁ and v₂, on
the other hand, are linearly independent. In fact, suppose that α, β ∈ R
are such that αv₁ + βv₂ = 0. Since αv₁ = (α, 0, 0) and βv₂ = (0, β, 0),
we must have that (α, β, 0) = 0 = (0, 0, 0). This clearly implies that
α = β = 0. Hence, the only linear combination of v₁ and v₂ that yields
the zero vector is the one with both coefficients equal to 0.

Remark 1.3.3.

1. A single vector v ∈ V is linearly dependent if and only if it is the zero
   vector. In fact, if v = 0 then the linear combination 1 · v equals 0,
   and its coefficient is not zero, and therefore v is linearly dependent.
   Conversely, if v is linearly dependent then by definition there exists
   α ∈ K with α ≠ 0 such that αv = 0. By Theorem 1.1.21, it follows
   that v = 0.

2. Of course the notion of linear dependence/independence does not depend on the order of the vectors. That is, if for example v₁, v₂, v₃ are
   linearly dependent then so are v₂, v₃, v₁. Formally, if v₁, . . . , vₙ ∈ V
   and σ : {1, . . . , n} → {1, . . . , n} is a bijection, then v₁, . . . , vₙ are linearly dependent if and only if v_{σ(1)}, . . . , v_{σ(n)} are linearly dependent.


Proposition 1.3.4. Let V be a K-vector space and v₁, . . . , vₙ ∈ V . Then
the following hold.

1. If vᵢ = 0 for some i ∈ {1, . . . , n}, then v₁, . . . , vₙ are linearly dependent.

2. If there exist i, j ∈ {1, . . . , n} with i ≠ j and α ∈ K such that
   vᵢ = αvⱼ, then v₁, . . . , vₙ are linearly dependent.

3. v₁, . . . , vₙ are linearly dependent if and only if there exist i ∈
   {1, . . . , n} and α₁, . . . , αᵢ₋₁, αᵢ₊₁, . . . , αₙ ∈ K such that

   vᵢ = α₁v₁ + α₂v₂ + . . . + αᵢ₋₁vᵢ₋₁ + αᵢ₊₁vᵢ₊₁ + . . . + αₙvₙ.

4. If v₁, . . . , vₙ are linearly independent and v ∈ V , then v₁, . . . , vₙ, v
   are linearly dependent if and only if there exist α₁, . . . , αₙ ∈ K such
   that
   v = α₁v₁ + . . . + αₙvₙ.

5. If v₁, . . . , vₙ are linearly dependent, then given any other m vectors w₁, . . . , wₘ ∈ V , the vectors v₁, . . . , vₙ, w₁, . . . , wₘ are linearly
   dependent.

6. If v₁, . . . , vₙ are linearly independent, then for every choice of indices
   i₁ < i₂ < . . . < iₘ in {1, . . . , n}, the vectors v_{i₁}, . . . , v_{iₘ} are linearly
   independent.

Proof. 1. We have

0 · v₁ + 0 · v₂ + . . . + 0 · vᵢ₋₁ + 1 · vᵢ + 0 · vᵢ₊₁ + . . . + 0 · vₙ = 0,

and hence there is a linear combination of the vectors that gives 0 without
all coefficients being 0.
2. By hypothesis, we have 1 · vᵢ + (−α)vⱼ = 0. Therefore,

\[ 1 \cdot v_i + (-\alpha) v_j + \sum_{\substack{k=1 \\ k \neq i,j}}^{n} 0 \cdot v_k = 0 \]

is a linear combination of the vectors yielding 0 whose coefficients are not
all 0.


3. Suppose that v 1 , . . . , v n are linearly dependent. This means that
there exist α1 , . . . , αn ∈ K not all zero such that ∑_{j=1}^{n} αj v j = 0. Let
i ∈ {1, . . . , n} be an index such that αi ≠ 0. Then

αi v i = −α1 v 1 − . . . − αi−1 v i−1 − αi+1 v i+1 − . . . − αn v n ,

and multiplying both sides by αi^{−1} we obtain:

v i = −(αi^{−1} α1 )v 1 − . . . − (αi^{−1} αi−1 )v i−1 − (αi^{−1} αi+1 )v i+1 − . . . − (αi^{−1} αn )v n .

Conversely, if

v i = α1 v 1 + α2 v 2 + . . . + αi−1 v i−1 + αi+1 v i+1 + . . . + αn v n

for some α1 , . . . , αn ∈ K then

α1 v 1 + α2 v 2 + . . . + αi−1 v i−1 + (−1) · v i + αi+1 v i+1 + . . . + αn v n = 0

is a linear combination of the v i ’s yielding the zero vector without all
coefficients being 0 (since the coefficient of v i is −1).
4. First, suppose that v 1 , . . . , v n , v are linearly dependent. Then we
have:

∑_{i=1}^{n} αi v i + βv = 0        (3)

for some α1 , . . . , αn , β ∈ K not all zero. Now if it was β = 0, then at least
one αi would be non-zero, and hence (3) would imply that v 1 , . . . , v n are
linearly dependent, contradicting the hypothesis. Hence, β ≠ 0. Then (3)
implies that:
v = −(β −1 α1 )v 1 − . . . − (β −1 αn )v n ,
proving the claim.
Conversely, if there exist α1 , . . . , αn such that v = ∑_{i=1}^{n} αi v i , then by
point 3. the vectors v 1 , . . . , v n , v are linearly dependent.
5. Let α1 , . . . , αn ∈ K be not all zero and such that ∑_{i=1}^{n} αi v i = 0.
Then

∑_{i=1}^{n} αi v i + 0 · w1 + . . . + 0 · wm = 0,

proving that v 1 , . . . , v n , w1 , . . . , wm are linearly dependent.


6. Suppose by contradiction that there are vectors v i1 , . . . , v im , with
i1 < . . . < im , that are linearly dependent. Then there exist βi1 , . . . , βim ∈
K not all zero such that ∑_{j=1}^{m} βij v ij = 0. Now for every i ∈ {1, . . . , n} let

αi = 0 if i ∉ {i1 , . . . , im },    αi = βij if i = ij for some j ∈ {1, . . . , m},

and consider the linear combination ∑_{i=1}^{n} αi v i . This clearly yields the
zero vector, since it can be written as ∑_{j=1}^{m} βij v ij + ∑_{i∉{i1 ,...,im }} 0 · v i ,
and its coefficients are not all zero, since at least one of the βij is nonzero.
It follows that v 1 , . . . , v n are linearly dependent, contradicting the hypothesis.

1.4 Bases and dimension


Definition 1.4.1. Let V be a f.g. K-vector space. A basis of V is an
n-tuple of vectors of V that are linearly independent and generate V .

Note:-
Bases are ordered, and therefore will be wrapped in round brackets. That
is, if (v 1 , v 2 , v 3 ) is a basis of a vector space V , then (v 2 , v 1 , v 3 ) is still a basis,
since the concepts of linear dependence and of generation do not depend on
the order, but it is a different basis.
We will now prove that every finitely generated vector space has a basis,
and any two bases have the same cardinality. This requires some preliminary
results.

Lemma 1.4.2. Let V be a f.g. K-vector space. Let {v 1 , . . . , v n } be a
system of generators for V . If v n is a linear combination of v 1 , . . . , v n−1 ,
then {v 1 , . . . , v n−1 } is a system of generators for V .

Proof. Let α1 , . . . , αn−1 ∈ K be such that

v n = ∑_{i=1}^{n−1} αi v i .        (4)

Now let v ∈ V . Since {v 1 , . . . , v n } is a system of generators for V , there
exist β1 , . . . , βn ∈ K such that v = ∑_{i=1}^{n} βi v i . Substituting (4) into this
relation, we obtain that:

v = β1 v 1 + . . . + βn−1 v n−1 + βn ( ∑_{i=1}^{n−1} αi v i ) =

  = (β1 + βn α1 )v 1 + (β2 + βn α2 )v 2 + . . . + (βn−1 + βn αn−1 )v n−1 ,

so that v is a linear combination of v 1 , . . . , v n−1 . This means precisely
that {v 1 , . . . , v n−1 } is a system of generators for V .

Lemma 1.4.3. Let V be a f.g. K-vector space with V ≠ {0}. Let A =
{v 1 , . . . , v n } ⊆ V be a system of generators for V . Then there exists a
subset B ⊆ A that is a linearly independent system of generators for V .

Proof. By induction on n. If n = 1 then A = {v 1 } and it cannot be v 1 = 0
because V ≠ {0}. Hence v 1 ≠ 0, and by Remark 1.3.3, it follows that v 1 is
linearly independent. Hence A is a linearly independent set of generators.
Now suppose that the claim is true for n − 1, and let A = {v 1 , . . . , v n }
be a set of generators for V . If A is linearly independent, there is nothing
to prove. Otherwise, A is linearly dependent, and therefore by Proposition
1.3.4 one of the vectors in A is a linear combination of the others. Up to
permuting the elements of A, we can assume that v n is a linear combi-
nation of v 1 , . . . , v n−1 . By Lemma 1.4.2, the set A0 = {v 1 , . . . , v n−1 } is a
system of generators for V . Since it has n − 1 elements, by the inductive
hypothesis it contains a linearly independent system of generators B. But
B ⊆ A, and hence we are done.

Corollary 1.4.4. Let V be a f.g. K-vector space. Then V has a basis.

Proof. Since V is f.g., there exists a finite system of generators {v 1 , . . . , v n }.
By Lemma 1.4.3 this contains a linearly independent system of generators,
i.e. a basis.

Lemma 1.4.5 (Steinitz lemma). Let V be a f.g. K-vector space with
V ≠ {0} and let A = {v 1 , . . . , v n } be a system of generators for V . If
B = {w1 , . . . , wm } ⊆ V is a set of linearly independent vectors, then
m ≤ n.


Proof. By contradiction, suppose that m > n. Since A generates V , there
exist α1 , . . . , αn such that

w1 = ∑_{i=1}^{n} αi v i .        (5)

Since B is a set of linearly independent vectors, it cannot contain the zero
vector by Proposition 1.3.4. Therefore, the αi ’s cannot be all equal to
zero. Up to permuting the v i ’s, we can assume that α1 ≠ 0. Then by (5)
we get:
v 1 = α1^{−1} w1 − (α1^{−1} α2 )v 2 − . . . − (α1^{−1} αn )v n .
In other words, v 1 is a linear combination of the vectors w1 , v 2 , . . . , v n .
Now since A generates V , then so does A ∪ {w1 }. But then by Lemma
1.4.2 the set {w1 , v 2 , . . . , v n } is a system of generators for V .
Since {w1 , v 2 , . . . , v n } generates V , the vector w2 is a linear combina-
tion of w1 , v 2 , . . . , v n . Hence there exist α1 , . . . , αn such that

w 2 = α1 w 1 + α2 v 2 + . . . + αn v n .

Now if it was α2 = α3 = . . . = αn = 0 then we would have w2 = α1 w1 and
by Proposition 1.3.4 the set B would be linearly dependent, yielding a con-
tradiction. Hence, at least one among α2 , . . . , αn must be non-zero. Again
without loss of generality we can assume that α2 ≠ 0. Then, reasoning as
above, we get

v 2 = −(α2^{−1} α1 )w1 + α2^{−1} w2 − (α2^{−1} α3 )v 3 − . . . − (α2^{−1} αn )v n ,

and it follows that {w1 , w2 , v 3 , . . . , v n } is a system of generators for V .
Repeating this argument for w3 , . . . , wn we end up proving that {w1 , . . . , wn }
generates V . Since we are assuming that m > n, the vector wn+1 is a linear
combination of w1 , . . . , wn , and hence the vectors w1 , . . . , wn+1 are linearly
dependent by Proposition 1.3.4, and consequently so are w1 , . . . , wm . This
contradicts the hypothesis, and therefore it must be m ≤ n.

Corollary 1.4.6. Let V be a f.g. K-vector space. Let B and B 0 be two
bases of V . Then |B| = |B 0 |.

Proof. Let B = (v 1 , . . . , v n ) and B 0 = (w1 , . . . , wm ). By the definition of
basis, {v 1 , . . . , v n } is a system of generators and w1 , . . . , wm are linearly
independent. By Steinitz Lemma, it must be m ≤ n. But one can argue
symmetrically: {w1 , . . . , wm } is a system of generators and the vectors
v 1 , . . . , v n are linearly independent. By Steinitz lemma then we have n ≤
m, and hence n = m.
Note:-
It can be proven that any vector space, not necessarily finitely generated,
has a basis, and that any two bases have the same cardinality. However, the
proof requires the axiom of choice, and is beyond the goal of these lecture
notes.

Definition 1.4.7. Let V be a f.g. K-vector space with V ≠ {0}. The
dimension of V is the cardinality of any basis of V . If V has dimension
n we write dim V = n.
Note:-
By convention, we say that the trivial vector space {0} has dimension 0.
Vector spaces different from this always have dimension ≥ 1.

Example 1.4.8.

• Let V = K n . The sequence (e1 , e2 , . . . , en ) is a basis of V , where
ei = (0, . . . , 0, 1, 0, . . . , 0) is the vector whose i-th entry is 1 and
whose other entries are 0. In fact, as we have seen in Example
1.2.15 the ei ’s generate V . Moreover, they are linearly independent,
because ∑_{i=1}^{n} αi ei = (α1 , . . . , αn ), and hence if the latter is the zero
vector, then clearly αi = 0 for every i. It follows that

dim K n = n.

However, this is not the only basis of K n ; in fact if K is infinite
there are infinitely many bases. For example, when K = R and
n = 3 then the sequence (v 1 , v 2 , v 3 ) with v 1 = (1, 1, 0), v 2 = (0, 1, 1),
v 3 = (1, 0, 0) is a basis of R3 .

• Let V = R2 . The set {v 1 , v 2 , v 3 }, where v 1 = (1, 1), v 2 = (1, 0)
and v 3 = (−1, 1), is a system of generators for V but it is not a
basis. In fact, since dim V = 2 a basis must have 2 elements, and by
Steinitz Lemma three vectors must be linearly dependent. A linear
dependence relation between them is, for example, v 1 − 2v 2 − v 3 = 0.

• Let V = K[x]≤n . As we have seen in Example 1.2.15, the set
{1, x, . . . , xn } is a system of generators for V . It is easy to see that
the sequence (1, x, . . . , xn ) is in fact a basis of V , since these vectors
are linearly independent: if ∑_{i=0}^{n} αi x^i = 0, we must have αi = 0
for every i, because a polynomial is 0 if and only if all of its coefficients
are 0. Hence
dim V = n + 1.

• Let V = Mm×n (K). A basis of V is given by the sequence

(E11 , E12 , . . . , E1n , E21 , E22 , . . . , Emn ),

where Eij is the matrix whose entry in row i and column j is 1 and
all other entries are 0. It follows that

dim V = mn.

Definition 1.4.9. In the vector space K n , the canonical basis is the basis
(e1 , . . . , en ) introduced in Example 1.4.8.

Remark 1.4.10. The K-vector spaces Mn×1 (K) and M1×n (K) of n × 1
and 1 × n matrices, respectively, naturally behave like K n (this claim
can be made formal, but it goes beyond the scope of these notes). In
layman’s terms, a column matrix t (a1 . . . an ) or a row matrix (a1 . . . an )
can be seen as the element (a1 , . . . , an ) of K n . This makes also evident the
fact that dim K n = dim Mn×1 (K) = dim M1×n (K) = n, and that a canonical
basis should also exist for the latter two spaces. We call canonical basis
of M1×n (K) the basis (e1 , . . . , en ) where ei = (0 . . . 1 . . . 0) is the
matrix with a 1 in position (1, i) and 0 elsewhere. We call canonical basis
of Mn×1 (K) the basis constituted of the transposes of the vectors of the
canonical basis of M1×n (K).
The following theorem is of the utmost importance.


Theorem 1.4.11. Let V be a f.g. K-vector space of dimension n ≥ 1.
An n-tuple B = (v 1 , . . . , v n ) of vectors of V is a basis if and only if for
every v ∈ V there exists a unique n-tuple (α1 , . . . , αn ) ∈ K n such that
v = ∑_{i=1}^{n} αi v i .

Proof. First, suppose that B is a basis. Let v ∈ V . Since B generates
V , there exist α1 , . . . , αn ∈ K such that v = ∑_{i=1}^{n} αi v i . We only need
to prove that the sequence (α1 , . . . , αn ) is unique. Suppose that there is a
second one, say (β1 , . . . , βn ). Since v = ∑_{i=1}^{n} βi v i , we get that

∑_{i=1}^{n} αi v i = ∑_{i=1}^{n} βi v i ,

and hence

∑_{i=1}^{n} (αi − βi )v i = 0.

But B is linearly independent, and hence a linear combination of its vectors
that gives the zero vector must have all coefficients equal to 0, i.e. αi = βi
for every i ∈ {1, . . . , n}.
Conversely, suppose that every vector of V can be written in a unique
way as a linear combination of the v i ’s. To prove that B is a basis, we
need to prove that it generates V and that it is linearly independent. That
it generates V is obvious by hypothesis, since every element of V is a
linear combination of elements of B. If ∑_{i=1}^{n} αi v i = 0, then observe that
∑_{i=1}^{n} 0 · v i = 0 by Theorem 1.1.21; since the representation of the vector
0 ∈ V as a linear combination of the v i ’s is unique, it must be αi = 0 for
every i.
Theorem 1.4.11 allows us to give the following definition.

Definition 1.4.12. Let V be a K-vector space of dimension n ≥ 1, and
let B = (v 1 , . . . , v n ) be a basis of V . If v ∈ V , the components of v with
respect to B are the unique a1 , . . . , an ∈ K such that v = ∑_{i=1}^{n} ai v i .

Corollary 1.4.13. Let V be a K-vector space of dimension n. Let B =
(v 1 , . . . , v n ) be a basis of V . The function

ΦB : V → K n ,    v = ∑_{i=1}^{n} ai v i ↦ (a1 , . . . , an )

is a bijection.

Proof. By Theorem 1.4.11, if ΦB (v) = ΦB (w) then it must be v = w,
because the representation of v and w with respect to the basis B is unique.
Hence ΦB is injective. Surjectivity is obvious: if (b1 , . . . , bn ) ∈ K n then
(b1 , . . . , bn ) = ΦB ( ∑_{i=1}^{n} bi v i ).

Note:-
The function ΦB is the unique function that associates to a vector the n-
tuple of its components with respect to the basis B.
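Concretely, computing ΦB (v) amounts to solving a linear system: if the
vectors of B are placed by columns in a matrix M , then the components of
v are the unique solution a of M a = v. A minimal Python sketch (sympy
assumed; the basis is the one of R3 from Example 1.4.8):

    from sympy import Matrix

    # B = ((1,1,0), (0,1,1), (1,0,0)): basis vectors by columns
    M = Matrix([[1, 0, 1],
                [1, 1, 0],
                [0, 1, 0]])

    def phi_B(v):
        # the unique a with M*a = v, i.e. the components of v w.r.t. B
        return M.solve(Matrix(v))

    print(phi_B([2, 3, 1]).T)  # Matrix([[2, 1, 0]]): (2,3,1) = 2v1 + 1v2 + 0v3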

Remark 1.4.14. The function ΦB defined above is not just bijective; it
also has the property that for every α, β ∈ K and every v, w ∈ V we have

ΦB (αv + βw) = αΦB (v) + βΦB (w).

A map with all these properties is called an isomorphism. Being isomor-
phic vector spaces is an equivalence relation; what Corollary 1.4.13 says is
that, in loose terms, all K-vector spaces of dimension n “behave” as K n .

Corollary 1.4.15. Let V be a K-vector space of dimension n ≥ 1.

1. Let v 1 , . . . , v m ∈ V with m > n. Then v 1 , . . . , v m are linearly de-
pendent.

2. Let v 1 , . . . , v m ∈ V with m < n. Then hv 1 , . . . , v m i ≠ V .

3. Let v 1 , . . . , v n ∈ V . Then the following are equivalent:

(a) hv 1 , . . . , v n i = V ;
(b) v 1 , . . . , v n are linearly independent;
(c) (v 1 , . . . , v n ) is a basis of V .


4. For every m ∈ {0, . . . , n}, V has a vector subspace of dimension m.
If m = 0 or m = n such subspace is unique.

5. If W ⊆ V is a vector subspace, then W is f.g. and dim W ≤ dim V ,
with equality holding if and only if W = V .

6. Let v 1 , . . . , v m ∈ V . Then dimhv 1 , . . . , v m i is the maximum number
of linearly independent vectors in the set {v 1 , . . . , v m }.

Proof. 1. Since V is generated by n vectors, by Steinitz lemma m > n
vectors must be linearly dependent.
2. If it was, by contradiction, hv 1 , . . . , v m i = V , then by Lemma 1.4.3
the set {v 1 , . . . , v m } would contain a set of independent generators, i.e. a
basis. But all bases have the same cardinality by Corollary 1.4.6, and this
contradicts the assumption m < n.
3. We will prove that (a) ⇒ (b), (b) ⇒ (c) and (c) ⇒ (a).
(a) ⇒ (b). If v 1 , . . . , v n were linearly dependent, by Lemma 1.4.3 the
set {v 1 , . . . , v n } would contain a basis with less than n elements, contra-
dicting Corollary 1.4.6.
(b) ⇒ (c) If it was hv 1 , . . . , v n i ≠ V , this would mean that at least a
vector w ∈ V is not a linear combination of the v i ’s. But then the vec-
tors v 1 , . . . , v n , w are linearly independent. In fact, if ∑_{i=1}^{n} αi v i + βw = 0
there are two cases: if β = 0 then all the αi ’s must be 0, since v 1 , . . . , v n
are linearly independent. If β ≠ 0 then w = ∑_{i=1}^{n} (−β −1 αi )v i , contra-
dicting the fact that w is not a linear combination of the v i ’s. Hence
v 1 , . . . , v n , w are linearly independent, but this contradicts Steinitz lemma.
Hence hv 1 , . . . , v n i = V .
(c) ⇒ (a) Obvious by definition of basis.
4. The unique K-vector space of dimension 0 is {0} ⊆ V . Next, V
is a vector subspace of V of dimension n. Let us show that it is the
only one. If W ⊆ V is a vector subspace of dimension n, then it has a
basis B = (w1 , . . . , wn ). But the wi ’s belong to V and they are linearly
independent since B is a basis of W . Then by point 3. they are a basis of
V , and therefore hw1 , . . . , wn i = W = V .
Finally, if m ∈ {1, . . . , n − 1}, let (v 1 , . . . , v n ) be a basis of V and
define Wm = hv 1 , . . . , v m i. This is a subspace of dimension m because
(v 1 , . . . , v m ) is a basis of Wm : the vectors v 1 , . . . , v m generate Wm by
construction and they are linearly independent because they are elements
of a basis of V .


5. Suppose by contradiction that W is not f.g. Then W ≠ {0}, that
is, W contains at least a non-zero vector w1 . Since W is not f.g., the
space hw1 i cannot coincide with W . Hence there exists w2 ∈ W such that
w2 ∈ / hw1 i. It follows that w1 and w2 are linearly independent. Now if it
was hw1 , w2 i = W then the latter would be finitely generated. Since we are
assuming that it is not, there exists w3 ∈ W such that w3 ∈ / hw1 , w2 i. It
follows that w1 , w2 , w3 are linearly independent. Repeating this argument,
we can find in W vectors w1 , . . . , wn+1 that are linearly independent. But
these vectors belong to V , and hence they are linearly dependent by 1.
Hence W must be f.g. It follows that there exists a finite basis (w1 , . . . , wm )
for W ; since w1 , . . . , wm are linearly independent then it must be m ≤ n
by point 1. Hence dim W ≤ dim V . If equality holds, then by point 4. it
must be W = V .
6. Let r be the maximum number of linearly independent vectors in
the set A = {v 1 , . . . , v m }, and let v i1 , . . . , v ir ∈ A be linearly independent.
Then, by definition of basis, B = (v i1 , . . . , v ir ) is a basis of hv i1 , . . . , v ir i,
which then has dimension r. Now if v j ∈ A is not one of v i1 , . . . , v ir , the
set {v i1 , . . . , v ir } ∪ {v j } cannot be linearly independent, since otherwise A
would contain r + 1 linearly independent vectors, and this contradicts the
fact that r is the maximum number of linearly independent vectors among
the v i ’s. Hence v j must be a linear combination of v i1 , . . . , v ir , and since
this holds for every such v j , it follows that hAi = hBi, so that dimhAi = r.
We end this section with a very important result on bases.

Theorem 1.4.16. Let V be a f.g. K-vector space of dimension n, and let
B be a basis of V .

1. Let A ⊆ V . If B ⊆ hAi, then hAi = V .

2. Let v 1 , . . . , v m ∈ V be linearly independent, with m < n. Then
there exist v m+1 , . . . , v n ∈ B such that (v 1 , . . . , v n ) is a basis of V .

Proof. 1. By Proposition 1.2.13, since B ⊆ hAi then V = hBi ⊆ hAi ⊆ V .
Hence it must be hAi = V .
2. By Corollary 1.4.15 we must have hv 1 , . . . , v m i ≠ V . Therefore by
point 1. we cannot have B ⊆ hv 1 , . . . , v m i. This means that there exists a
vector v m+1 ∈ B such that v m+1 ∉ hv 1 , . . . , v m i, and therefore v 1 , . . . , v m+1
are linearly independent by Proposition 1.3.4. Now we repeat the same
argument on the set {v 1 , . . . , v m+1 } until we get n linearly independent
vectors; these must form a basis by Corollary 1.4.15.

1.5 Sum and intersection of subspaces

Definition 1.5.1. Let V be a K-vector space and U, W ⊆ V be subspaces.
The sum of U and W is the set:

U + W = {u + w : u ∈ U, w ∈ W }.

We say that the sum U + W is direct if for every v ∈ U + W there exist
unique u ∈ U and w ∈ W such that v = u + w. If the sum U + W is
direct, we write U ⊕ W .

Proposition 1.5.2. Let V be a K-vector space and U, W be subspaces of
V . The sets U ∩ W and U + W are vector subspaces of V .

Proof. First, let u, w ∈ U ∩ W and α, β ∈ K. Since u, w ∈ U , that is a
vector subspace of V , then αu + βw ∈ U . But the same is true for W ,
since this is a vector subspace too. Then αu + βw ∈ W , and it follows
that αu + βw ∈ U ∩ W . Hence U ∩ W is a vector subspace by Theorem
1.2.6.
Next, let u1 + w1 , u2 + w2 ∈ U + W , where u1 , u2 ∈ U and w1 , w2 ∈ W .
Let α1 , α2 ∈ K. Then α1 (u1 +w1 )+α2 (u2 +w2 ) = (α1 u1 +α2 u2 )+(α1 w1 +
α2 w2 ), and since U and W are subspaces, we get that α1 u1 + α2 u2 ∈ U
and α1 w1 + α2 w2 ∈ W . It follows that α1 (u1 + w1 ) + α2 (u2 + w2 ) ∈ U + W ,
and hence the latter is a subspace by Theorem 1.2.6.

Proposition 1.5.3. Let V be a vector space and U, W be vector subspaces
of V . The sum U + W is direct if and only if U ∩ W = {0}.

Proof. First, assume that the sum is direct. We want to show that U ∩
W = {0}. Let v ∈ U ∩ W . Since v ∈ U , a way of expressing v as a sum
of a vector of U and one of W is v = v + 0. On the other hand, v ∈ W
as well, and hence we can also write v = 0 + v, a sum of a vector of U ,
namely 0, and one of W . Since the sum of U and W is direct, there can
be only one way of writing v as a sum of an element of U and one of W ,
and therefore it must be v = 0. That is, U ∩ W = {0}.
Conversely, assume that U ∩ W = {0}. Now let v ∈ U + W , and let
u1 , u2 ∈ U , w1 , w2 ∈ W be such that v = u1 + w1 = u2 + w2 . Then
u1 − u2 = w1 − w2 . This implies that u1 − u2 is both an element of U ,
because U is a subspace, and an element of W , because it is equal to
because U is a subspace, and an element of W , because it is equal to
w1 − w2 that is an element of W . Hence u1 − u2 ∈ U ∩ W = {0}, so that
u1 = u2 . Therefore also w1 − w2 = 0, so that there is only one way of
writing v as a sum of a vector in U and one in W .

Proposition 1.5.4. Let U, W be f.g. subspaces of a K-vector space V .

1. U ∩ W is finitely generated.

2. Let B = (u1 , . . . , um ) be a basis of U and B 0 = (w1 , . . . , wn ) be a
basis of W . Then U + W is finitely generated, and B ∪ B 0 is a system
of generators.

Proof. 1. The space U ∩ W is a subspace of U , that is f.g., and hence it
is f.g. itself (see Corollary 1.4.15).
2. Let v ∈ U + W , so that v = u + w for some u ∈ U and w ∈ W .
There exist α1 , . . . , αm , β1 , . . . , βn ∈ K such that u = ∑_{i=1}^{m} αi ui and
w = ∑_{j=1}^{n} βj wj . Then v = ∑_{i=1}^{m} αi ui + ∑_{j=1}^{n} βj wj . We have just
proved that every vector of U + W is a linear combination of elements of
B ∪ B 0 . Therefore U + W = hB ∪ B 0 i.

Theorem 1.5.5 (Grassmann formula). Let V be a K-vector space and let
U, W ⊆ V be f.g. vector subspaces. Then:

dim(U + W ) = dim U + dim W − dim(U ∩ W ).

Proof. If dim U = 0 or dim W = 0 the formula is obvious, since in that
case U ∩ W = {0} and U + W equals W or U respectively. From now on,
we let B = (u1 , . . . , um ) be a basis of U and B 0 = (w1 , . . . , wn ) be a basis
of W , so that dim U = m and dim W = n and m, n ≥ 1.
Case 1: U ∩ W = {0}. If this happens, then dim(U ∩ W ) = 0, and so
we need to prove that

dim(U + W ) = dim U + dim W. (6)

By Proposition 1.5.4, B ∪ B 0 generates U + W . If we can prove that
u1 , . . . , um , w1 , . . . , wn are linearly independent, it will follow that B ∪ B 0
is a basis of U + W with m + n elements, and therefore dim(U + W ) = m + n,
as we need.
Let α1 , . . . , αm , β1 , . . . , βn ∈ K be such that

α1 u1 + . . . + αm um + β1 w1 + . . . + βn wn = 0.

Then
α1 u1 + . . . + αm um = −β1 w1 − . . . − βn wn . (7)
The left hand side of the above equality is an element of U and it equals the
right hand side that is a vector of W . It follows that α1 u1 + . . . + αm um ∈
U ∩ W = {0}. This means that

α1 u1 + . . . + αm um = 0.

But B is a basis, and therefore α1 = . . . = αm = 0. Substituting this in
(7) it follows that
β1 w1 + . . . + βn wn = 0.
But B 0 is also a basis, and hence β1 = . . . = βn = 0. We proved that the
αi ’s and the βj ’s are all 0, and so that u1 , . . . , um , w1 , . . . , wn are linearly
independent; it follows that B ∪B 0 is a basis of U +W of cardinality m+n,
and (6) is proved.
Case 2: U ∩ W ≠ {0}. By Proposition 1.5.4, the space U ∩ W is f.g.,
and hence it admits a basis B 00 = (v 1 , . . . , v p ), where p = dim(U ∩ W ) ≥ 1
and p ≤ min{m, n}. By Theorem 1.4.16, we can complete B 00 to a basis
of U , using vectors of B, and to a basis of W , using vectors of B 0 . Up to
permuting the elements of B and those of B 0 , we can assume that B̃ =
(v 1 , . . . , v p , u1 , . . . , um−p ) is a basis of U and B̃0 = (v 1 , . . . , v p , w1 , . . . , wn−p )
is a basis of W . If m − p = 0 this means that dim(U ∩ W ) = dim U , and
this in turn means that U = U ∩ W , so that U ⊆ W . Then the formula
we need to prove becomes dim(U + W ) = dim W , which is certainly true
since U + W = W . The case n − p = 0 is analogous. Hence we can assume
that m − p, n − p ≥ 1.
Now we claim that C = (v 1 , . . . , v p , u1 , . . . , um−p , w1 , . . . , wn−p ) is a
basis of U + W . This will conclude the proof, since then U + W has a
basis with m + n − p elements, i.e. dim(U + W ) = m + n − p. We need to
prove that C generates U + W and that it consists of linearly independent
vectors.
The first claim follows immediately by Proposition 1.5.4, because C =
B̃ ∪ B̃0 .


Now let α1 , . . . , αm−p , β1 , . . . , βn−p , γ1 , . . . , γp ∈ K be such that:

α1 u1 +. . .+αm−p um−p +β1 w1 +. . .+βn−p wn−p +γ1 v 1 +. . .+γp v p = 0. (8)

Rewrite this as:

α1 u1 + . . . + αm−p um−p = −β1 w1 − . . . − βn−p wn−p − γ1 v 1 − . . . − γp v p .

The left hand side is a vector of U , and it equals the right hand side that
is a vector of W . Hence the left hand side is a vector of U ∩ W , and
therefore it can be expressed in the basis B 00 . This means that there exist
δ1 , . . . , δp ∈ K such that ∑_{i=1}^{m−p} αi ui = ∑_{i=1}^{p} δi v i . Substituting this in (8),
we get:

β1 w1 + . . . + βn−p wn−p + (γ1 + δ1 )v 1 + . . . + (γp + δp )v p = 0.

Now remember that B̃0 is a basis; it must then be

β1 = . . . = βn−p = γ1 + δ1 = . . . = γp + δp = 0.

Since the βi ’s are all 0, equation (8) becomes:

α1 u1 + . . . + αm−p um−p + γ1 v 1 + . . . + γp v p = 0.

But now remember that also B̃ is a basis, and therefore

α1 = . . . = αm−p = γ1 = . . . = γp = 0.

All in all, we have proved that all the αi ’s, all the βi ’s and all the γi ’s are
0, i.e. that the vectors of C are linearly independent.
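Grassmann’s formula can be tested on concrete subspaces of K n by com-
puting the dimension of a span as the rank of the matrix of generators (a
fact proved in Chapter 2 and used here as a black box). A small Python
sketch, sympy assumed:

    from sympy import Matrix

    # U = <(1,0,0,0), (0,1,0,0)> and W = <(0,1,0,0), (0,0,1,0)> inside R^4
    U = Matrix([[1, 0], [0, 1], [0, 0], [0, 0]])   # generators by columns
    W = Matrix([[0, 0], [1, 0], [0, 1], [0, 0]])

    dim_U, dim_W = U.rank(), W.rank()       # 2 and 2
    dim_sum = Matrix.hstack(U, W).rank()    # dim(U + W) = 3
    print(dim_U + dim_W - dim_sum)          # 1 = dim(U ∩ W), here <(0,1,0,0)>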

Definition 1.5.6. Let V be a K-vector space and let W ⊆ V be a sub-
space. A direct complement for W in V is a subspace W 0 ⊆ V such that
W ⊕ W0 = V .

Theorem 1.5.7. Let V be a f.g. K-vector space, and let W ⊆ V be a
vector subspace. Then there exists a direct complement W 0 for W .

Proof. If W = V then just take W 0 = {0}. If W = {0}, take W 0 =
V . Otherwise, let n = dim V and let B = (v 1 , . . . , v m ) be a basis for
W , with m < n, and B 0 a basis for V . By Theorem 1.4.16, there exist
v m+1 , . . . , v n ∈ B 0 such that (v 1 , . . . , v n ) is a basis of V . We claim that
W 0 = hv m+1 , . . . , v n i is a direct complement for W . We need to prove that
W + W 0 = V and that W ∩ W 0 = {0}.
The fact that W +W 0 = V follows immediately from Proposition 1.5.4.
If v ∈ W ∩ W 0 , then v = ∑_{i=1}^{m} αi v i for some α1 , . . . , αm ∈ K, since
v ∈ W , but also v = ∑_{i=m+1}^{n} βi v i for some βm+1 , . . . , βn ∈ K, since
v ∈ W 0 . Hence

α1 v 1 + . . . + αm v m − βm+1 v m+1 − . . . − βn v n = 0,

and since (v 1 , . . . , v n ) is a basis for V it follows that αi = 0 = βj for every


i, j. Hence W ∩ W 0 = {0}.

Remark 1.5.8. The proof of Theorem 1.5.7 shows clearly that a subspace
does not have just one direct complement. For example, if V = R2 and
W = h(1, 0)i, then W has infinitely many direct complements. In fact, if
v ∈ V \ W then W 0 = hvi is a direct complement for W : by construction
W ∩ W 0 = {0} and dim(W ⊕ W 0 ) = 2 by Grassmann formula. It must
then be W ⊕ W 0 = V , since the unique subspace of V of dimension 2 is
V itself (see Corollary 1.4.15).


Chapter 2: Determinant and rank

2.1 Determinant

Definition 2.1.1. Let K be a field and A ∈ Mm×n (K) an m × n matrix
with entries in K. A submatrix is a matrix obtained by erasing rows
and/or columns of A.

 
Example 2.1.2. Let

A = [ 1   2  4 ]
    [ 0   1  2 ]  ∈ M3 (R).
    [ −1  6  7 ]

Erasing the middle row, we obtain the submatrix

[ 1   2  4 ]
[ −1  6  7 ].

Erasing the first two columns, we obtain the submatrix

[ 4 ]
[ 2 ]
[ 7 ].

Erasing the first and third column and the third row, we obtain the
submatrix

[ 2 ]
[ 1 ].

Definition 2.1.3. Let K be a field and n ≥ 1 be an integer. Let A =
(aij )i,j=1,...,n ∈ Mn (K) be a square matrix. The determinant of A is the
element of K that is defined in the following recursive way:

1. if n = 1 then A = (a11 ) and the determinant of A is a11 ;

2. in general, the determinant of A is defined by:

det A = ∑_{j=1}^{n} (−1)1+j a1j A1j ,


where A1j is the determinant of the submatrix obtained by erasing
from A the first row and the j-th column.

Example 2.1.4.

• If n = 2 then

A = [ a11  a12 ]
    [ a21  a22 ].

Definition 2.1.3 yields:

det A = a11 det(a22 ) − a12 det(a21 ) = a11 a22 − a12 a21 .

• If n = 3 then

A = [ a11  a12  a13 ]
    [ a21  a22  a23 ]
    [ a31  a32  a33 ].

Definition 2.1.3 yields:

det A = a11 det [ a22  a23 ] − a12 det [ a21  a23 ] + a13 det [ a21  a22 ] .
                [ a32  a33 ]           [ a31  a33 ]           [ a31  a32 ]

Now we can use the previous computation of the determinant of a
2 × 2 matrix and get:

det A = a11 (a22 a33 −a23 a32 )−a12 (a21 a33 −a23 a31 )+a13 (a21 a32 −a22 a31 ).

Definition 2.1.5. Given A ∈ Mm×n (K), a minor of A is the determinant
of a square submatrix of A.

Theorem 2.1.6 (Laplace theorem). Let K be a field and A = (aij )i,j=1,...,n ∈
Mn (K). Fix an index k ∈ {1, . . . , n}. Then:

det A = ∑_{j=1}^{n} (−1)k+j akj Akj = ∑_{i=1}^{n} (−1)i+k aik Aik ,

where Aℓm is the minor obtained by erasing the ℓ-th row and the m-th
column from A.
In other words, Laplace theorem says that the determinant can be com-
puted starting from any row or column of A, by multiplying each entry of
the row/column by the determinant of the corresponding submatrix and the
appropriate sign, and then adding up the terms.

 
Example 2.1.7. Let

A = [ 1   2  0 ]
    [ −1  0  1 ]  ∈ M3 (R).
    [ −1  1  1 ]

The determinant of A can be computed by its definition, namely using
the first row:

det A = 1 · det [ 0  1 ] − 2 · det [ −1  1 ] + 0 · det [ −1  0 ] =
                [ 1  1 ]           [ −1  1 ]           [ −1  1 ]

      = 1(0 − 1) − 2(−1 + 1) + 0(−1 − 0) = −1.

On the other hand, Laplace theorem says that we can also compute it
using any row or column. For example, let us choose the second column:

det A = −2 · det [ −1  1 ] + 0 · det [ 1   0 ] − 1 · det [ 1   0 ] =
                 [ −1  1 ]           [ −1  1 ]           [ −1  1 ]

      = −2(−1 + 1) + 0(1 − 0) − 1(1 − 0) = −1.
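Definition 2.1.3 is already an algorithm. Below is a minimal Python sketch
of it (plain lists, no libraries; purely illustrative, since this recursion takes
on the order of n! steps and is not how determinants are computed in
practice):

    def det(A):
        # Laplace expansion along the first row (Definition 2.1.3)
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            # erase the first row and the (j+1)-th column
            minor = [row[:j] + row[j + 1:] for row in A[1:]]
            # (-1)**j here equals (-1)**(1+j) with a 1-based column index
            total += (-1) ** j * A[0][j] * det(minor)
        return total

    print(det([[1, 2, 0], [-1, 0, 1], [-1, 1, 1]]))  # -1, as in Example 2.1.7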

Proposition 2.1.8. Let A ∈ Mn (K). The determinant of A enjoys the
following properties.

1. If A has a row or a column all of whose entries are 0, then det A = 0.

2. det(A) = det(t A).

3. If A0 is the matrix obtained by swapping two adjacent rows or two
adjacent columns of A, then det A = − det A0 .

4. If two rows or columns of A are equal, then det A = 0.


 
5. If

A = [ R1       ]
    [ R2       ]
    [ . . .    ]
    [ Rk + Rk0 ]
    [ . . .    ]
    [ Rn       ],

where R1 , . . . , Rn are the rows of A, then det A is the sum of the
determinants of the two matrices with rows R1 , . . . , Rk , . . . , Rn and
R1 , . . . , Rk0 , . . . , Rn respectively.
Analogously, if A = (C1 | . . . |Ck + Ck0 | . . . |Cn ), where C1 , . . . , Cn are
the columns of A, then:

det A = det(C1 | . . . |Ck | . . . |Cn ) + det(C1 | . . . |Ck0 | . . . |Cn ).

6. If A0 is the matrix obtained by multiplying every entry of a row or
a column of A by the same constant λ ∈ K, then det A0 = λ det A.

7. If a row/column of A is a linear combination of other rows/columns
of A, then det A = 0.

8. If A0 is the matrix obtained by adding to a row/column of A a linear
combination of other rows/columns of A, then det A0 = det A.

Proof. 1. This is clear by Laplace theorem, if we compute the determinant
of A using a row or a column all of whose entries are 0.
2. Again clear by Laplace theorem: computing the determinant of A
using the first row is the same as computing the determinant of t A using
the first column.


Thanks to point 2., it suffices to prove all remaining points for rows,
since the same statement for columns follows by considering the transpose
matrix.
3. Let A = (aij )i,j=1,...,n and suppose A0 = (bij )i,j=1,...,n is obtained by
swapping the i-th row with the (i + 1)-th row of A. By Laplace theorem,

det A = ∑_{j=1}^{n} (−1)i+j aij Aij

by computing the determinant using the i-th row. On the other hand,

det A0 = ∑_{j=1}^{n} (−1)i+1+j b(i+1)j A0(i+1)j

by Laplace theorem, where we computed the determinant using the (i+1)-
th row and A0(i+1)j is the minor of A0 obtained by erasing the (i + 1)-th
row and the j-th column. Now just notice that b(i+1)j = aij for every
j ∈ {1, . . . , n}, since we swapped row i and row i + 1, and A0(i+1)j = Aij
for every j ∈ {1, . . . , n}: deleting the (i + 1)-th row of A0 has the same
effect of deleting the i-th row of A. Hence

det A0 = ∑_{j=1}^{n} (−1)i+1+j aij Aij = − ∑_{j=1}^{n} (−1)i+j aij Aij = − det A.

4. By induction on n. If n = 2 and two rows are equal, we necessarily
have

A = [ a  b ]
    [ a  b ],

so that det A = ab − ab = 0. Now suppose that the claim is true for a
square matrix of size n − 1 and consider A ∈ Mn (K), with n ≥ 3. Suppose
that two rows of A coincide, say Ri = Rj , and fix another row Rk , with
k ≠ i, j. By Laplace theorem,

det A = ∑_{ℓ=1}^{n} (−1)k+ℓ akℓ Akℓ ,

where Akℓ is the minor of A obtained by erasing the k-th row and the ℓ-th
column. Clearly, when we erase the k-th row and the ℓ-th column from
A, the submatrix we obtain has two equal rows, because A had them and
we erased a different row. This means that Akℓ = 0 for every ℓ, by the
inductive hypothesis. It follows that det A = 0.
5. Let Rk = (ak1 , . . . , akn ) and Rk0 = (a0k1 , . . . , a0kn ). Then by Laplace
theorem using the k-th row we get:

det A = ∑_{j=1}^{n} (−1)k+j (akj + a0kj )Akj =

      = ∑_{j=1}^{n} (−1)k+j akj Akj + ∑_{j=1}^{n} (−1)k+j a0kj Akj ,

and by Laplace theorem these two terms are exactly the determinants of
the matrix with rows R1 , . . . , Rk , . . . , Rn and of the matrix with rows
R1 , . . . , Rk0 , . . . , Rn .
6. Let A = (aij )i,j=1,...,n and suppose A0 is obtained from A by multiplying
the k-th row by a constant λ ∈ K. By Laplace theorem we get:

det A0 = ∑_{j=1}^{n} (−1)k+j (λakj )Akj = λ ∑_{j=1}^{n} (−1)k+j akj Akj = λ det A.

7. Let R1 , . . . , Rn be the rows of A and suppose that the k-th row Rk
of A can be written as ∑_{i≠k} λi Ri , for some λi ∈ K. By points 5. and 6.
we get that

det A = ∑_{i≠k} λi det Ai ,

where Ai is the matrix obtained from A by replacing the k-th row with Ri ,
so that Ai has two equal rows (Ri appears both in the i-th and in the k-th
position), and each of the determinants in the right hand side is 0 by
point 4.
8. Suppose A0 is obtained from A by adding to the k-th row Rk a
linear combination of the other rows, so that the k-th row of A0 equals
Rk + ∑_{i≠k} λi Ri . By point 5. we have

det A0 = det A + det A00 ,

where A00 is the matrix obtained from A by replacing the k-th row with
∑_{i≠k} λi Ri . The first term in the right hand side is exactly det A, and
the second term is 0 by point 7.

Remark 2.1.9. If A is upper triangular or lower triangular, then the deter-
minant of A equals the product of the elements on the diagonal. This can
be shown by induction on the size n of the matrix. Of course it is enough
to prove it for upper triangular matrices, since the transpose of a lower
triangular matrix is upper triangular and has the same diagonal. If n = 1,
the claim is obvious. Suppose it is true for upper triangular matrices of
size n − 1 and let

A = [ a11    a12    . . .  a1n ]
    [ 0      a22    . . .  a2n ]
    [ . . .  . . .  . . .  . . . ]
    [ 0      . . .  0      ann ]

be upper triangular of size n. By Laplace theorem, using the first column,
we get:

det A = a11 det [ a22    a23    . . .  a2n ]
                [ 0      a33    . . .  a3n ]
                [ . . .  . . .  . . .  . . . ]
                [ 0      . . .  0      ann ].

In the right hand side of the equality we are computing the determinant
of an upper triangular matrix of size n − 1 and hence by the inductive
hypothesis its determinant is a22 a33 . . . ann . It follows that

det A = a11 a22 . . . ann .
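A quick numerical check of this remark (Python, sympy assumed):

    from sympy import Matrix

    A = Matrix([[2, 5, -1],
                [0, 3, 7],
                [0, 0, 4]])     # upper triangular
    d = 1
    for i in range(A.rows):
        d *= A[i, i]            # product of the diagonal entries
    print(A.det() == d)         # True: both equal 24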

Theorem 2.1.10 (Binet theorem). Let A, B ∈ Mn (K). Then:

det(AB) = det A · det B.

Definition 2.1.11. A matrix A ∈ Mn (K) is invertible if there exists a
matrix B ∈ Mn (K) such that AB = BA = In .

Theorem 2.1.12. A matrix A ∈ Mn (K) is invertible if and only if det A ≠ 0.
If this is the case, the inverse matrix is unique, is denoted by A−1 and
it is given by:

A−1 = (1 / det A) · t ((−1)i+j Aij )i,j=1,...,n ,

where Aij is the minor of A obtained by erasing the i-th row and the j-th
column.
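The formula of Theorem 2.1.12 can be turned into code almost verbatim.
A minimal Python sketch (sympy assumed; sympy’s minor(i, j) returns
exactly the minor Aij appearing in the theorem):

    from sympy import Matrix

    def inverse_via_minors(A):
        # Theorem 2.1.12: A^{-1} = (1/det A) * transpose of ((-1)^(i+j) A_ij)
        d = A.det()
        assert d != 0, "A is not invertible"
        n = A.rows
        cof = Matrix(n, n, lambda i, j: (-1) ** (i + j) * A.minor(i, j))
        return cof.T / d

    A = Matrix([[1, 2, 0], [-1, 0, 1], [-1, 1, 1]])
    print(inverse_via_minors(A) == A.inv())  # True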


2.2 Change of basis

Let V be a f.g. K-vector space, and let B = (v 1 , . . . , v n ) and B 0 = (v 01 , . . . , v 0n )
be two bases for V . For every j ∈ {1, . . . , n}, let

v 0j = ∑_{i=1}^{n} λij v i

be the expression of the vectors of B 0 with respect to the basis B.

Definition 2.2.1. The matrix A = (λij )i,j=1,...,n is called change of basis
matrix from B 0 to B.

Pn 0
Proposition 2.2.2. Let v = i=1 ai v i and let E = (a1 , . . . , an ). Then
t
A · E is the vector of the components of v with respect to the basis B.
Pn Pn
Proof. Let v = j=1 aj v 0j . Now substitute v 0j = i=1 λij v i so that

n n
!
X X
v= aj λij vi
i=1 j=1

is the expression of v with respect to the basis B. Now it is just a matter


of checking that   P 
n
a aλ
 1   j=1 j 1j 
   Pn 
 a2   j=1 aj λ2j 
A  =
  .

. . .   ... 
  P 
n
an j=1 aj λnj

Therefore once we have the change of basis matrix, in order to transform the
expression of a vector in basis B 0 into the expression in basis B we just have
to multiply the column vector of components by the matrix A.

Example 2.2.3. Let V = R3 . The sequence

B = ((1, 1, 1), (1, 0, 1), (−1, 0, 0))


is a basis of V , and so is

B 0 = ((2, 0, 0), (0, 1, 1), (2, 3, 0)).

In order to write down the change of basis matrix, all we need to do is to
express the vectors of B 0 with respect to B, and then write the components
by columns in a 3 × 3 matrix. We have:

(2, 0, 0) = −2 · (−1, 0, 0)

(0, 1, 1) = 1 · (1, 1, 1) + 1 · (−1, 0, 0)


(2, 3, 0) = 3 · (1, 1, 1) − 3 · (1, 0, 1) − 2 · (−1, 0, 0)
so that the change of basis matrix is:

A = [ 0   1  3  ]
    [ 0   0  −3 ]
    [ −2  1  −2 ].

Now if we take any vector of V and we write it with respect to the basis
B 0 , such as for example (4, 4, 1) = 1 · (2, 0, 0) + 1 · (0, 1, 1) + 1 · (2, 3, 0), in
order to find its components with respect to B we just need to compute
     
[ 0   1  3  ]   [ 1 ]   [ 4  ]
[ 0   0  −3 ] · [ 1 ] = [ −3 ].
[ −2  1  −2 ]   [ 1 ]   [ −3 ]

In fact,
(4, 4, 1) = 4 · (1, 1, 1) − 3 · (1, 0, 1) − 3 · (−1, 0, 0).
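In matrix terms, when V = K n and all vectors are written in canonical
coordinates: if MB and MB 0 have the vectors of B and B 0 as columns, the
change of basis matrix from B 0 to B is MB −1 MB 0 . A Python sketch of
Example 2.2.3 (sympy assumed):

    from sympy import Matrix

    MB  = Matrix([[1, 1, -1], [1, 0, 0], [1, 1, 0]])  # vectors of B by columns
    MB2 = Matrix([[2, 0, 2], [0, 1, 3], [0, 1, 0]])   # vectors of B' by columns

    A = MB.inv() * MB2                # change of basis matrix from B' to B
    print(A)                          # [[0, 1, 3], [0, 0, -3], [-2, 1, -2]]
    print((A * Matrix([1, 1, 1])).T)  # (4, -3, -3), as computed above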

Proposition 2.2.4. The change of basis matrix A from B 0 to B is invertible,
and its inverse A−1 is the change of basis matrix from B to B 0 .

Proof. Let A0 be the change of basis matrix from B to B 0 . Then clearly
for any (a1 , . . . , an ) ∈ K n we have

A0 A · t (a1 , . . . , an ) = t (a1 , . . . , an ),

because A · t (a1 , . . . , an ) is the vector of components of the vector ∑_{i=1}^{n} ai v 0i
with respect to B, and hence A0 · (A · t (a1 , . . . , an )) is the vector of com-
ponents of ∑_{i=1}^{n} ai v 0i in basis B 0 , that is just t (a1 , . . . , an ).
Then in particular when (a1 , . . . , an ) = (0, . . . , 0, 1, 0, . . . , 0), with the
only 1 in position k, we see that A0 At (a1 , . . . , an ) on the one hand is the
k-th column of A0 A, and on the other hand is t (0, . . . , 0, 1, 0, . . . , 0). This
means precisely that A0 A = In . On the other hand one can swap the role
of B and B 0 and repeat the same argument; this leads to proving that
AA0 = In . Therefore A is invertible and A0 is its inverse.

2.3 Rank
Definition 2.3.1. Let A ∈ Mm×n (K) be a matrix. If B is a square sub-
matrix of A, the size of B is the number of rows (or columns) of B.
The rank of A is the largest size of a square submatrix of A with
nonzero determinant. If all square submatrices of A have determinant 0,
then we say that A has rank 0. The rank of A is denoted by rk(A).

Remark 2.3.2.

1. A has rank 0 if and only if all of its entries are 0. In fact, if all of
its entries are 0 then clearly there are no square submatrices with
nonzero determinant, and conversely if all square submatrices of A
have determinant 0 then in particular all submatrices of size 1 have
determinant 0. But a square submatrix of size 1 is just an entry of
A.

2. If A ∈ Mn (K), then rk(A) = n if and only if det A ≠ 0. In fact, the
unique square submatrix of A of size n is A itself.

3. If A ∈ Mm×n (K), then rk(A) ≤ min{m, n}, since a square submatrix
of A can have size at most min{m, n}.

4. Since the determinant of a matrix equals the determinant of the
transpose, we have that rk(A) = rk(t A).

5. If B is a submatrix of a matrix A, then rk(A) ≥ rk(B).


Example 2.3.3.

• Let

A = [ 1  2  3 ]
    [ 2  4  6 ]  ∈ M2×3 (R).

Since A is not the zero matrix, rk(A) ≥ 1 and of course rk(A) ≤ 2.
To decide whether rk(A) = 1 or rk(A) = 2 we need to decide if A
has a submatrix of size 2 with non-zero determinant. The submatrices
of size 2 of A are:

[ 1  2 ]   [ 1  3 ]       [ 2  3 ]
[ 2  4 ],  [ 2  6 ]  and  [ 4  6 ].

These all have determinant 0, so rk(A) = 1.

• Let

A = [ 1  2  3 ]
    [ 0  4  6 ]  ∈ M2×3 (R).

The size 2 submatrix of A given by

[ 1  2 ]
[ 0  4 ]

has determinant 4 ≠ 0. Therefore, rk(A) = 2.

If A ∈ Mm×n (K) has a square submatrix of size r with non-zero determinant,
then the rank of A is at least r, by definition. In order to prove that the
rank is exactly r, one needs to prove that every square submatrix of A of size
larger than r has zero determinant. However, it is enough to consider square
submatrices of size r + 1, as shown by the following proposition.

Proposition 2.3.4. Let A ∈ Mm×n (K) and let r ≤ min{m, n}. Suppose
that every square submatrix of A of size r + 1 has determinant 0. Then
rk(A) ≤ r.

Proof. To prove that the rank of A is at most r, we need to prove that
the determinant of every square submatrix of A of size r + k is zero, for
every k = 1, . . . , min{m, n} − r. We do this by induction on k.
The claim for k = 1 is true by hypothesis. Now suppose the claim is
true for k −1, i.e. suppose that every square submatrix of A of size r+k −1
has determinant 0, and let B = (bij )i,j=1,...,r+k be a square submatrix of A
of size r + k. Computing the determinant of B via Laplace theorem along
the first row of B yields:

det B = ∑_{j=1}^{r+k} (−1)1+j b1j det B1j ,        (9)

where B1j is the submatrix obtained from B by erasing the first row and
the j-th column of B. But now for every j = 1, . . . , r + k the matrix B1j
is a square submatrix of B, and hence of A, of size r + k − 1, and hence
it has determinant 0 by the inductive hypothesis. It follows from (9) that
det B = 0.

Corollary 2.3.5. Let A ∈ Mm×n (K) and suppose that A has a size r
submatrix with non-zero determinant. If all submatrices of size r + 1 have
zero determinant, then rk(A) = r.

Proof. Since A has a size r submatrix with non-zero determinant, rk(A) ≥
r. On the other hand, by Proposition 2.3.4 we have rk(A) ≤ r, and
therefore rk(A) = r.
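Definition 2.3.1 together with Corollary 2.3.5 gives a faithful (though very
inefficient) algorithm for the rank: look for a non-vanishing minor of the
largest possible size. A Python sketch (sympy assumed):

    from itertools import combinations
    from sympy import Matrix

    def rank_via_minors(A):
        # largest size of a square submatrix with nonzero determinant
        m, n = A.shape
        for r in range(min(m, n), 0, -1):
            for rows in combinations(range(m), r):
                for cols in combinations(range(n), r):
                    if A.extract(list(rows), list(cols)).det() != 0:
                        return r
        return 0  # only for the zero matrix

    A = Matrix([[1, 2, 3], [2, 4, 6]])
    print(rank_via_minors(A), A.rank())  # 1 1, as in Example 2.3.3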

Definition 2.3.6. Let V be a f.g. K-vector space and let B = (v 1 , . . . , v n )
be a basis of V . Let w1 , . . . , wk ∈ V and write, for every i ∈ {1, . . . , k},
wi = ∑_{j=1}^{n} λji v j with λ1i , . . . , λni ∈ K. The matrix

AB = [ λ11    λ12    . . .  λ1k ]
     [ λ21    λ22    . . .  λ2k ]
     [ . . .  . . .  . . .  . . . ]
     [ λn1    λn2    . . .  λnk ]

is called component matrix of the vectors w1 , . . . , wk with respect to the
basis B.

Example 2.3.7. Let V = R3 and let B = ((1, 1, 1), (1, 0, 1), (0, 0, −1)),
that is a basis of V . Let w1 = (2, 1, 0) and w2 = (−1, 0, 0). We have
that w1 = (1, 1, 1) + (1, 0, 1) + 2(0, 0, −1) and w2 = −(1, 0, 1) − (0, 0, −1).
Hence the component matrix of w1 , w2 with respect to B is:

[ 1  0  ]
[ 1  −1 ].
[ 2  −1 ]

Theorem 2.3.8. Let V be a f.g. K-vector space of dimension n, let B
be a basis of V and let v 1 , . . . , v k ∈ V be k vectors with k ≤ n. Let
AB ∈ Mn×k (K) be the component matrix of such vectors with respect to
B. Then v 1 , . . . , v k are linearly independent if and only if rk(AB ) = k.

Proof. Let A := AB = (λij )i=1,...,n; j=1,...,k be the component matrix.
First, we prove that if v 1 , . . . , v k are linearly dependent then rk(A) <
k. Hence assume that v 1 , . . . , v k are linearly dependent. Then there are
α1 , . . . , αk ∈ K, not all 0, such that

α1 v 1 + . . . + αk v k = 0.        (10)

Let B = (w1 , . . . , wn ) and let v i = ∑_{j=1}^{n} λji wj for every i. Substituting
in (10), we get that

∑_{i=1}^{n} ( ∑_{j=1}^{k} αj λij ) wi = 0.

Since B is a basis, it follows that for every row index i ∈ {1, . . . , n} we
have
α1 λi1 + . . . + αk λik = 0.        (11)
 
Notice that the j-th column Cj of A is given by Cj = t (λ1j , λ2j , . . . , λnj ),
and therefore relation (11) implies that ∑_{j=1}^{k} αj Cj = 0.
Now if we erase any n − k rows from A, obtaining a square submatrix
Ã of size k, relation (11) will still hold on every row of Ã, of course. But
this means precisely that the columns of Ã, when thought as vectors in
Mk×1 (K), are linearly dependent. Hence, by Proposition 2.1.8, det(Ã) =
0. Since this holds for every square submatrix of A of size k, by Proposition
2.3.4 it follows that rk(A) < k.
Conversely, we prove that if v 1 , . . . , v k are linearly independent then
rk(A) = k. Assume then that v 1 , . . . , v k are linearly independent. By
Theorem 1.4.16, there exist v k+1 , . . . , v n ∈ B such that B 0 = (v 1 , . . . , v n )
is a basis of V . Now let H be the change of basis matrix from B 0 to B.
In order to obtain it, we need to express the vectors of B 0 in the basis B.
When doing this process on v 1 , . . . , v k , we obtain precisely the columns of
A, by definition. On the other hand the vectors v k+1 , . . . , v n are already
expressed with respect to the basis B, as they are simply elements of such
basis. Hence the corresponding columns of the change of basis matrix will
have all entries equal to 0 except for one entry equal to 1. All in all, the
matrix H will look like (A|A0 ), where
 
A0 is an n × (n − k) matrix whose columns have one entry equal to 1 and
all the remaining ones equal to 0. Moreover, since v k+1 , . . . , v n are all
different, if C, C 0 are two columns of A0 then the entry 1 will be in two
different positions.
Now the matrix H has non-zero determinant by Proposition 2.2.4. On
the other hand, we can compute det H by Laplace theorem using the last
column of A0 . This has only one non-zero entry, that is 1, in position i,
so erasing the corresponding row of H we get that the determinant of H
is (up to sign) that of (Ã|Ã0 ), where Ã is the submatrix obtained from
A by erasing the i-th row and Ã0 is the submatrix obtained from A0 by
erasing the i-th row and the last column. Now we can repeat this process
using the last column of Ã0 ; notice that such column will have exactly one
non-zero entry 1, since when we erased the i-th row from A0 we did not
erase any other non-zero entry from A0 : this happens because the 1’s in
such matrix are all in different positions. Repeating this process until we
use up all columns of A0 we end up seeing that the determinant of H is, up
to sign, that of a square submatrix of A obtained by erasing n − k rows.
Since det H ≠ 0, this means that A has a square submatrix of size k with
non-zero determinant, and hence rk(A) = k.

Corollary 2.3.9. Let V be a f.g. K-vector space of dimension n. Let
v 1 , . . . , v k ∈ V and let B be a basis of V . Then v 1 , . . . , v k are linearly de-
pendent if and only if in the component matrix AB every square submatrix
of size k has determinant 0.

Proof. By Theorem 2.3.8, v 1 , . . . , v k are linearly dependent if and only if
AB has rank < k, and this happens precisely when every submatrix of size
k has determinant 0, by the definition of rank.

Corollary 2.3.10. Let V be a f.g. K-vector space of dimension n. Let
v 1 , . . . , v n ∈ V and let B be a basis of V . Then (v 1 , . . . , v n ) is a basis of
V if and only if the component matrix of v 1 , . . . , v n with respect to B has
non-zero determinant.

Proof. By Corollary 1.4.15, (v 1 , . . . , v n ) is a basis if and only if v 1 , . . . , v n
are linearly independent. By Theorem 2.3.8, this happens if and only
if the component matrix AB of v 1 , . . . , v n with respect to B has rank n.
Since this is an n × n matrix, it has rank n if and only if its determinant
is non-zero, by Remark 2.3.2.

Example 2.3.11.

• Let V = R3 and let v 1 = (1, 1, 0), v 2 = (−1, 0, 1) and v 3 = (2, 2, 2).
In order to decide whether (v 1 , v 2 , v 3 ) is a basis of V or not, we
can apply Corollary 2.3.10. That is, we choose a basis B of V , we
write down the component matrix of (v 1 , v 2 , v 3 ) with respect to B
and then we compute its determinant: this is non-zero if and only if
(v 1 , v 2 , v 3 ) is a basis. Clearly the easiest choice for B is the canonical
basis B = (e1 , e2 , e3 ); the component matrix is

AB = [ 1  −1  2 ]
     [ 1  0   2 ]
     [ 0  1   2 ]


and its determinant is 2. Therefore (v 1 , v 2 , v 3 ) is a basis of V .

• Let V = R[x]≤3 . Consider the vectors v 1 = 2 − x + 3x2 , v 2 =
1 − x − x2 − x3 and v 3 = −x − 5x2 − 2x3 . In order to decide whether
they are linearly independent or not, we can apply Theorem 2.3.8.
So first we choose a convenient basis of V , such as, for example, B =
(1, x, x2 , x3 ). Next, we write the component matrix of (v 1 , v 2 , v 3 )
with respect to B. This is:

AB = [ 2   1   0  ]
     [ −1  −1  −1 ]
     [ 3   −1  −5 ]
     [ 0   −1  −2 ]

Theorem 2.3.8 says that the vectors are linearly independent if and
only if rk(AB ) = 3. The four submatrices of AB of size 3 are:

[ 2   1   0  ]   [ 2   1   0  ]   [ 2   1   0  ]   [ −1  −1  −1 ]
[ −1  −1  −1 ],  [ −1  −1  −1 ],  [ 3   −1  −5 ],  [ 3   −1  −5 ],
[ 3   −1  −5 ]   [ 0   −1  −2 ]   [ 0   −1  −2 ]   [ 0   −1  −2 ]

and they all have determinant 0. Therefore rk(AB ) < 3 and the
three vectors are linearly dependent.

• Let V = M2 (R), and consider the vectors

v 1 = [ 2  0 ]   v 2 = [ −1  −1 ]   v 3 = [ 0  1 ]   v 4 = [ 1  1 ]
      [ 0  1 ],        [ 0   0  ],        [ 0  1 ],        [ 1  1 ].

Let B be the following basis of V :

( [ 1  0 ]  [ 0  1 ]  [ 0  0 ]  [ 0  0 ] )
( [ 0  0 ], [ 0  0 ], [ 1  0 ], [ 0  1 ] ).

The component matrix of (v 1 , v 2 , v 3 , v 4 ) with respect to B is:

AB = [ 2  −1  0  1 ]
     [ 0  −1  1  1 ]
     [ 0  0   0  1 ]
     [ 1  0   1  1 ]

The determinant of AB is

det AB = − det [ 2  −1  0 ]
               [ 0  −1  1 ]
               [ 1  0   1 ]

        = −2 det [ −1  1 ] − det [ −1  0 ] = 2 + 1 = 3 ≠ 0,
                 [ 0   1 ]       [ −1  1 ]

and therefore (v 1 , v 2 , v 3 , v 4 ) is a basis of V by Corollary 2.3.10.
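All three bullets reduce to a determinant or rank computation on the
component matrix, which is easy to double-check. A Python sketch for
the first two (sympy assumed):

    from sympy import Matrix

    # first bullet: components w.r.t. the canonical basis of R^3, by columns
    A1 = Matrix([[1, -1, 2], [1, 0, 2], [0, 1, 2]])
    print(A1.det())   # 2, nonzero: (v1, v2, v3) is a basis

    # second bullet: components w.r.t. (1, x, x^2, x^3), by columns
    A2 = Matrix([[2, 1, 0], [-1, -1, -1], [3, -1, -5], [0, -1, -2]])
    print(A2.rank())  # 2 < 3: the three polynomials are linearly dependent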
Let A ∈ Mm×n (K). The rows of A can be thought of as elements of K n ,
since they are n-tuples of elements of K. Analogously, the columns of A
can be thought of as elements of K m .

Example 2.3.12. Let

A = [ 1  2   3  ]
    [ 4  √2  −1 ]  ∈ M2×3 (R).

The two rows of A can be seen as the vectors (1, 2, 3) and (4, √2, −1) in
R3 . The three columns of A can be seen as the vectors (1, 4), (2, √2) and
(3, −1) in R2 .


Definition 2.3.13. Let A ∈ Mm×n (K). The space of rows of A, denoted
by R(A), is the subspace of K n generated by the rows of A. The space of
columns of A, denoted by C(A), is the subspace of K m generated by the
columns of A.

Example 2.3.14. If A is the matrix of Example 2.3.12, the space of rows
of A is the subspace of R3 generated by (1, 2, 3) and (4, √2, −1). These
are linearly independent vectors, so such subspace has dimension 2.
The space of columns of A is the subspace of R2 generated by the vec-
tors (1, 4), (2, √2) and (3, −1). Notice that (1, 4) and (2, √2) are linearly
independent, while

(3, −1) = ((3 + √2)/(1 − 4√2)) (1, 4) + (13√2/(8√2 − 2)) (2, √2),

so that (1, 4), (2, √2), (3, −1) are linearly dependent. Hence the space of
columns has dimension 2, too.

Theorem 2.3.15 (Kronecker theorem). Let A ∈ Mm×n (K). Then

dim R(A) = dim C(A) = rk(A).

Proof. Since R(A) = C(t A) and rk(A) = rk(t A), to prove the theorem it
is enough to prove that dim C(A) = rk(A).
Consider the vector space V = K m . The columns C1 , . . . , Cn of A
can be seen as vectors of K m , as in Example 2.3.12. Now consider the
canonical basis (e1 , . . . , em ) of K m . The component matrix of C1 , . . . , Cn
with respect to the canonical basis is exactly the matrix A. Let s =
dim C(A); the matrix A must then have s columns Ci1 , . . . , Cis that are
linearly independent. Let B be the submatrix of A formed by these s
columns. Now we can apply Theorem 2.3.8 to the vector space V and the
matrix B, that is the m × s component matrix of Ci1 , . . . , Cis with respect
to the canonical basis of K m : since these vectors are linearly independent,
the rank of B must be s. Since B is a submatrix of A, we have rk(A) ≥ s.
Now assume by contradiction that r = rk(A) > s. By definition of
rank, this means that there is a square submatrix of A of size r that has
non-zero determinant. Such submatrix is formed by taking r columns
Ci1 , . . . , Cir of A and r rows of A. Now consider the submatrix B of A
formed by the r columns Ci1 , . . . , Cir . This is an m × r matrix that has
a square submatrix of size r with non-zero determinant; hence rk(B) = r.
By Theorem 2.3.8, this means that the vectors Ci1 , . . . , Cir are linearly
independent in K m , but this contradicts the fact that dim C(A) = s <
r.

Corollary 2.3.16.

1. Let A ∈ Mm×n (K). The rank of A is the maximum number of
linearly independent rows or linearly independent columns of A.

2. Let A ∈ Mn (K). Then the following are equivalent:

(a) det A ≠ 0;
(b) A is invertible;
(c) rk(A) = n;
(d) the rows (columns) of A are linearly independent, when seen
as vectors in K n ;
(e) the rows (columns) of A are a basis of K n .

Proof. 1. It follows by Theorem 2.3.15 and Corollary 1.4.15.
2. The equivalence (a) ⇐⇒ (b) follows from Theorem 2.1.12.
The equivalence (a) ⇐⇒ (c) follows from the definition of rank.
The equivalence (c) ⇐⇒ (d) follows from Theorem 2.3.15, together
with Corollary 1.4.15.
The equivalence (d) ⇐⇒ (e) follows from Corollary 1.4.15.
Kronecker’s theorem yields an easier way to compute the rank of a matrix.
By Corollary 2.3.5, if A ∈ Mm×n (K) is a matrix, in order to prove that A
has rank r we need to find a size r square submatrix of A with non-zero
determinant and then show that every square submatrix of A of size r + 1 has
determinant 0. However, this can result in a large number of operations. The
next theorem yields a simpler criterion.

Theorem 2.3.17. Let A ∈ Mm×n (K) be a matrix. Suppose that A has
a square submatrix B of size r with nonzero determinant and that every
square submatrix of A of size r + 1 containing B has determinant 0. Then
rk(A) = r.


Proof. The submatrix B is formed by taking r columns Ci1 , . . . , Cir and
r rows from A, where i1 < i2 < . . . < ir . Now consider the matrix
B̃ = (Ci1 | . . . |Cir ) ∈ Mm×r (K); this has B as a size r square submatrix
with non-zero determinant, and therefore it has rank r. By Theorem 2.3.8,
the columns Ci1 , . . . , Cir are therefore linearly independent when seen as
vectors of K m .
We now claim that if C is a column of A different from Ci1 , . . . , Cir ,
then Ci1 , . . . , Cir , C are linearly dependent vectors of K m . Notice that
this implies that the columns Ci1 , . . . , Cir span the space of columns C(A),
and since they are linearly independent then by Theorem 2.3.15 we have
rk(A) = r.
Let C be a column of A different from Ci1 , . . . , Cir , and consider the
m × (r + 1) submatrix B̃0 of A formed by taking the columns Ci1 , . . . , Cir
and C. This has B as a submatrix, formed by taking the r columns
Ci1 , . . . , Cir and r rows Rj1 , . . . , Rjr . Now consider the matrix B̃00 formed
only by the r rows Rj1 , . . . , Rjr of B̃0 . This is an r × (r + 1) matrix that
has B as a size r square submatrix with non-zero determinant; hence it
has rank r. By Theorem 2.3.15, this implies that Rj1 , . . . , Rjr are linearly
independent when seen as vectors of K r+1 . Now let R0 be a row of B̃0
different from Rj1 , . . . , Rjr . The matrix formed by R0 , Rj1 , . . . , Rjr is an
(r + 1) × (r + 1) submatrix of A that contains B, and hence it has determi-
nant 0 by hypothesis. Therefore, R0 , Rj1 , . . . , Rjr are linearly dependent
vectors of K r+1 . Since this holds for every row of B̃0 , it follows that
Rj1 , . . . , Rjr span R(B̃0 ), and hence the matrix B̃0 has rank r by Theorem
2.3.15. But then, again by Theorem 2.3.15, the r + 1 columns of B̃0 must
be linearly dependent, as desired.

 
Example 2.3.18. Let
$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & -1 & 6 & 7 \\ 7 & -8 & 3 & 2 \end{pmatrix} \in M_{3\times 4}(\mathbb{R}).$$
The submatrix $B = \begin{pmatrix} 1 & 2 \\ 5 & -1 \end{pmatrix}$ has determinant −11 ≠ 0, so rk(A) ≥ 2. The 3 × 3


submatrices of A that contain B are:
$$\begin{pmatrix} 1 & 2 & 3 \\ 5 & -1 & 6 \\ 7 & -8 & 3 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 2 & 4 \\ 5 & -1 & 7 \\ 7 & -8 & 2 \end{pmatrix},$$
and they both have determinant 0. Hence rk(A) = 2 by Theorem 2.3.17.


Notice that A has four different square submatrices of size 3, but thanks
to Theorem 2.3.17 we only need to look at two of them.
Alternatively, one can notice that the first two rows R1 , R2 of A are
linearly independent, while R3 = −3R1 + 2R2 , so that rk(A) = 2 by
Kronecker theorem.
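The bordering strategy of Theorem 2.3.17 is easy to mechanize. The following sketch (not part of the original notes; it assumes Python with the NumPy library, and rank_by_bordering is a hypothetical helper name) certifies rk(A) = r from an invertible r × r submatrix B by testing only the (r + 1) × (r + 1) submatrices that contain B:

```python
import numpy as np

def rank_by_bordering(A, rows, cols, tol=1e-10):
    """Certify rk(A) = r as in Theorem 2.3.17: the r x r submatrix
    B = A[rows, cols] must have non-zero determinant, and every
    (r+1) x (r+1) submatrix of A containing B must be singular."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    if abs(np.linalg.det(A[np.ix_(rows, cols)])) < tol:
        return False                         # B itself is singular
    for i in set(range(m)) - set(rows):      # border with one extra row...
        for j in set(range(n)) - set(cols):  # ...and one extra column
            big = A[np.ix_(sorted(rows + [i]), sorted(cols + [j]))]
            if abs(np.linalg.det(big)) > tol:
                return False                 # a non-singular bordering minor
    return True

A = [[1, 2, 3, 4], [5, -1, 6, 7], [7, -8, 3, 2]]
print(rank_by_bordering(A, [0, 1], [0, 1]))  # True: rk(A) = 2, as in Example 2.3.18
```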


Chapter 3: Linear systems


3.1 Compatibility of linear systems

Definition 3.1.1. Let K be a field, let m, n ≥ 1 be integers. A linear


system of m equations in n variables with coefficients in K is a system of
equations of the following form:


$$\begin{cases} a_{11}x_1 + a_{12}x_2 + \ldots + a_{1n}x_n = b_1 \\ a_{21}x_1 + a_{22}x_2 + \ldots + a_{2n}x_n = b_2 \\ \quad\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \ldots + a_{mn}x_n = b_m \end{cases} \tag{12}$$

where the elements aij , bk are in K and x1 , . . . , xn are variables.


The matrix associated to the linear system is the matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ \ldots & \ldots & \ldots & \ldots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix} \in M_{m\times n}(K).$$
If we let
$$X = \begin{pmatrix} x_1 \\ x_2 \\ \ldots \\ x_n \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} b_1 \\ b_2 \\ \ldots \\ b_m \end{pmatrix},$$
we can write linear system (12) in its matrix form

AX = B, (13)

where AX is the usual matrix multiplication.


 
Definition 3.1.2. A column vector $X = \begin{pmatrix} x_1 \\ x_2 \\ \ldots \\ x_n \end{pmatrix} \in M_{n\times 1}(K)$ is a solution to system (12) if AX = B.
A linear system is compatible if it admits at least one solution.

Note:-
Notice that there is a big difference between the matrix form of a linear
system (13) and the identity AX = B for some X ∈ Mn×1 (K). In fact,
the former is a formal way of expressing (12), that is a system of equations,
while the latter is an identity between matrices with coefficients in K.
When the number of variables is low, typically less than 5, instead of
labeling them as x1 , . . . , xn we will label them as x, y, z, t.

Example 3.1.3.

• The system
$$\begin{cases} 2x + 3z = 1 \\ y - z = -1 \end{cases}$$
is a linear system of 2 equations and 3 variables with coefficients in
R. The matrix associated to the system is
$$A = \begin{pmatrix} 2 & 0 & 3 \\ 0 & 1 & -1 \end{pmatrix},$$
and if we let $X = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$ and $B = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$ then the matrix form of the system is AX = B.




A solution is, for example, the vector $\begin{pmatrix} -4 \\ 2 \\ 3 \end{pmatrix}$, and therefore the system is compatible. Another one is $\begin{pmatrix} -7 \\ 4 \\ 5 \end{pmatrix}$. We will see later on that this system has infinitely many solutions.

• The system of 3 equations in 2 variables
$$\begin{cases} x + y = 1 \\ x - y = 3 \\ 2x - 2y = 4 \end{cases}$$

when considered as a linear system over R, has no solutions, since


clearly the second and the third equations cannot hold true at the
same time for any pair of real numbers (x, y). This system is there-
fore not compatible.

Now let A ∈ Mm×n (K), X ∈ Mn×1 (K) and B ∈ Mm×1 (K). The key obser-
vation that allows us to relate linear systems to vector spaces is the following: if C1 , . . . , Cn are the columns of A, so that A = (C1 |C2 | . . . |Cn ), and $X = \begin{pmatrix} x_1 \\ x_2 \\ \ldots \\ x_n \end{pmatrix}$, then
$$AX = x_1C_1 + x_2C_2 + \ldots + x_nC_n.$$
Namely, multiplying A by X means taking the linear combination of the
columns of A with coefficients x1 , . . . , xn .
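This identity is worth checking numerically at least once. A quick sketch (an illustration only, not part of the notes; it assumes Python with NumPy), using the first system of Example 3.1.3:

```python
import numpy as np

A = np.array([[2., 0., 3.],
              [0., 1., -1.]])
x = np.array([-4., 2., 3.])

# A @ x equals the linear combination x1*C1 + x2*C2 + x3*C3 of the columns of A.
combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))
print(A @ x, np.allclose(A @ x, combo))  # [ 1. -1.] True
```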
Note:-
If AX = B is a linear system, we will denote by (A|B) the matrix obtained by adjoining the column B to the matrix A. For example, if $A = \begin{pmatrix} 2 & 2 \\ 1 & -3 \end{pmatrix}$ and $B = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, the matrix (A|B) is $\begin{pmatrix} 2 & 2 & 1 \\ 1 & -3 & 0 \end{pmatrix}$.

Theorem 3.1.4 (Rouché-Capelli). Let K be a field, A ∈ Mm×n (K), $X = \begin{pmatrix} x_1 \\ x_2 \\ \ldots \\ x_n \end{pmatrix}$ and B ∈ Mm×1 (K). The system AX = B is compatible if and only if rk(A) = rk(A|B).

Proof. The system is compatible if and only if there exists X ∈ Mn×1 (K)
such that AX = B. By what we observed above, this is equivalent to the
existence of coefficients x1 , . . . , xn ∈ K such that

x1 C1 + x2 C2 + . . . + xn Cn = B,

where C1 , . . . , Cn are the columns of A. This is equivalent to saying that


B ∈ hC1 , . . . , Cn i, which in turn is equivalent to saying that

dimhC1 , . . . , Cn i = dimhC1 , . . . , Cn , Bi.

By Kronecker theorem, this is equivalent to asking that rk(A) = rk(A|B).
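Rouché-Capelli turns compatibility into a rank comparison, which is straightforward to test in code. A minimal sketch (not from the notes; it assumes NumPy, and is_compatible is a hypothetical helper name):

```python
import numpy as np

def is_compatible(A, B):
    """Rouché-Capelli test: AX = B is solvable iff rk(A) = rk(A|B)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.asarray(B, dtype=float).reshape(-1, 1)
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.hstack([A, B]))

# The incompatible system from Example 3.1.3: rk(A) = 2 but rk(A|B) = 3.
print(is_compatible([[1, 1], [1, -1], [2, -2]], [1, 3, 4]))  # False
```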

A particular case of Rouché-Capelli theorem is that where m = n, i.e. where


the number of equations equals the number of variables.

Theorem 3.1.5 (Cramer). Let AX = B be a linear system over a field K,


where A ∈ Mn (K). If det A ≠ 0, then the system is compatible and has a


unique solution, given by
$$X = \begin{pmatrix} \det(B_1)/\det(A) \\ \det(B_2)/\det(A) \\ \ldots \\ \det(B_n)/\det(A) \end{pmatrix}.$$

Here Bi is the matrix obtained from A by replacing the i-th column with
B.

Proof. First, since det A ≠ 0 then by Corollary 2.3.16 we have rk(A) = n.


Now (A|B) is an n × (n + 1) matrix, so that its rank is at most n. Since
A is a submatrix of size n of (A|B) with non-zero determinant, it follows
that rk(A|B) = n and hence by Theorem 3.1.4 the system is compatible.
Now suppose that X1 , X2 ∈ Mn×1 (K) are two solutions of the system,
so that AX1 = AX2 . Since det A ≠ 0 then A is invertible, so we can
multiply both sides of the expression by A−1 , obtaining

A−1 (AX1 ) = A−1 (AX2 ).

Since matrix multiplication is associative, we get (A−1 A)X1 = (A−1 A)X2 , and since A−1 A = In it follows that X1 = X2 . Hence the solution of the
system is unique.
Notice that the vector X = A−1 B is a solution, because A(A−1 B) =
(AA−1 )B = B, and therefore it is the unique solution. To end the proof,
we just need to show that this vector has the form claimed in the state-

ment. Let $B = \begin{pmatrix} b_1 \\ b_2 \\ \ldots \\ b_n \end{pmatrix}$. Recall that
 
$$A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} A_{11} & -A_{21} & \ldots & (-1)^{n+1}A_{n1} \\ -A_{12} & A_{22} & \ldots & (-1)^{n+2}A_{n2} \\ \ldots & \ldots & \ldots & \ldots \\ (-1)^{1+n}A_{1n} & (-1)^{2+n}A_{2n} & \ldots & A_{nn} \end{pmatrix},$$
where Aij is the determinant of the (n − 1) × (n − 1) submatrix of A obtained by erasing the i-th row and the j-th column. Therefore we have:
$$A^{-1}B = \frac{1}{\det(A)} \begin{pmatrix} \sum_{i=1}^n b_i(-1)^{i+1}A_{i1} \\ \sum_{i=1}^n b_i(-1)^{i+2}A_{i2} \\ \ldots \\ \sum_{i=1}^n b_i(-1)^{i+n}A_{in} \end{pmatrix}.$$
A moment of reflection, using Laplace theorem on the j-th column, shows that $\sum_{i=1}^n b_i(-1)^{i+j}A_{ij}$ is exactly the determinant of the matrix Bj = (C1 | . . . |Cj−1 |B|Cj+1 | . . . |Cn ), where C1 , . . . , Cn are the columns of A.

     
Example 3.1.6. Let $A = \begin{pmatrix} 1 & 1 & -1 \\ 1 & -1 & 1 \\ 0 & 0 & 3 \end{pmatrix}$, $X = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$ and $B = \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}$, where all coefficients are real. Using Laplace theorem on the last row of A we see that det A = −6 ≠ 0, so by Theorem 3.1.5 the system has a

unique solution. In the notation of the theorem we have:
$$B_1 = \begin{pmatrix} 2 & 1 & -1 \\ 0 & -1 & 1 \\ 1 & 0 & 3 \end{pmatrix},\quad B_2 = \begin{pmatrix} 1 & 2 & -1 \\ 1 & 0 & 1 \\ 0 & 1 & 3 \end{pmatrix},\quad B_3 = \begin{pmatrix} 1 & 1 & 2 \\ 1 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
so that det B1 = −6, det B2 = −8, det B3 = −2. This yields
$$X = \begin{pmatrix} 1 \\ 4/3 \\ 1/3 \end{pmatrix}$$
as the unique solution to the system.
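Cramer's rule translates directly into a few lines of code. A sketch (illustrative only, assuming NumPy; cramer is a hypothetical helper name):

```python
import numpy as np

def cramer(A, B):
    """Solve AX = B via Theorem 3.1.5; requires det A != 0."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    d = np.linalg.det(A)
    X = np.empty(len(B))
    for i in range(len(B)):
        Ai = A.copy()
        Ai[:, i] = B                  # replace the i-th column with B
        X[i] = np.linalg.det(Ai) / d  # x_i = det(B_i)/det(A)
    return X

print(cramer([[1, 1, -1], [1, -1, 1], [0, 0, 3]], [2, 0, 1]))
# [1.         1.33333333 0.33333333], matching Example 3.1.6
```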

3.2 The rank-nullity theorem


The next step in the theory of linear systems is to describe, in a precise sense,
how many solutions a compatible linear system has.

Example 3.2.1.
   
• Let $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \in M_2(\mathbb{R})$. The system $AX = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ has a unique solution, by Theorem 3.1.5.

• Let $A = \begin{pmatrix} 1 & 2 \end{pmatrix} \in M_{1\times 2}(\mathbb{R})$. The system AX = 0 has infinitely many solutions, since all vectors of the form $\begin{pmatrix} -2x \\ x \end{pmatrix}$, with x ∈ R, are solutions.

• Let $A = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \in M_{2\times 1}(\mathbb{R})$. The system $AX = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ has no solutions, since if x ∈ R = M1 (R) is a solution, then the two equations x = 1 and 2x = 1 should hold true at the same time.


Definition 3.2.2. A linear system AX = B is homogeneous if B is the


zero vector.

Remark 3.2.3. A homogeneous linear system is always compatible, as it admits the zero vector as a solution. Namely, if A ∈ Mm×n (K) and $X = \begin{pmatrix} 0 \\ \ldots \\ 0 \end{pmatrix} \in M_{n\times 1}(K)$ then $AX = \begin{pmatrix} 0 \\ \ldots \\ 0 \end{pmatrix} \in M_{m\times 1}(K)$. We will denote by 0 the m × 1 or n × 1 zero matrix, so that a homogeneous linear system will be written as AX = 0.
If A is a square matrix with non-zero determinant, the system AX = 0 only has X = 0 as a solution, by Theorem 3.1.5.
If A is a square matrix with non-zero determinant, the system AX = 0
only has X = 0 as a solution, by Theorem 3.1.5.

Definition 3.2.4. Let K be a field and A ∈ Mm×n (K). The kernel of A


is the set of solutions of the homogeneous linear system AX = 0. The
kernel of A will be denoted by ker A.
A solution of the system AX = 0 is called nontrivial if it is different
from 0.

Remark 3.2.5. Let A ∈ Mm×n (K). Then ker A is a vector subspace of


Mn×1 (K). In fact, if X, Y ∈ ker A and α, β ∈ K then

A(αX + βY ) = A(αX) + A(βY ) = α(AX) + β(AY ) = 0 + 0 = 0,

so that αX + βY ∈ ker A.

Lemma 3.2.6. Let K be a field and A ∈ Mm×n (K). Let p ∈ {1, . . . , n}


be a natural number. Let ei1 , . . . , eip be vectors of the canonical basis
of Mn×1 (K), with 1 ≤ i1 < i2 < . . . < ip ≤ n. Let C1 , . . . , Cn be the
columns of A. Then Ci1 , . . . , Cip are linearly independent if and only if
ker A ∩ hei1 , . . . , eip i = {0}.

Proof. Start by noticing that hei1 , . . . , eip i coincides with the set of n × 1 matrices (ai1 )i=1,...,n ∈ Mn×1 (K) with the property that aj1 = 0 for every j ∉ {i1 , . . . , ip }. It follows that the set {AX : X ∈ hei1 , . . . , eip i} coincides with the set of all linear combinations of Ci1 , . . . , Cip : if α1 , . . . , αp ∈ K, the linear combination $\sum_{j=1}^p \alpha_j C_{i_j}$ corresponds to $A\big(\sum_{j=1}^p \alpha_j e_{i_j}\big)$.
The condition ker A ∩ hei1 , . . . , eip i = {0} means that no non-zero vectors in hei1 , . . . , eip i are solutions to the system AX = 0. By what we said above, this means precisely that no linear combination of Ci1 , . . . , Cip with not all coefficients being zero gives the zero vector; in other words, Ci1 , . . . , Cip are linearly independent.

Theorem 3.2.7 (Rank-nullity theorem). Let K be a field and A ∈ Mm×n (K).


Then
dim(ker A) = n − rk(A).

Proof. Let r = rk(A) and p = dim(ker A). We need to prove that


p = n − r. Since rk(A) = r, by Theorem 2.3.15 there are r linearly
independent columns Ci1 , . . . , Cir in A. By Lemma 3.2.6, it follows that
ker A ∩ hei1 , . . . , eir i = {0}. Therefore, by Grassmann formula we have:

dim(hei1 , . . . , eir i + ker A) = r + p ≤ n,

so that p ≤ n − r. Now we have to prove the converse inequality.


Let B be a basis of ker A in Mn×1 (K). By Theorem 1.4.16, the basis B
can be completed to a basis B 0 of Mn×1 (K) using vectors of the canonical
basis. Thus there are ei1 , . . . , ein−p in the canonical basis such that B ∪
(ei1 , . . . , ein−p ) is a basis of Mn×1 (K). Since the latter set is a basis, we
must have, by Grassmann formula, that

ker A ∩ hei1 , . . . , ein−p i = {0}.

By Lemma 3.2.6, it follows that the columns Ci1 , . . . , Cin−p are linearly
independent. By Theorem 2.3.15, this implies that r ≥ n − p, i.e. p ≥
n − r.

Corollary 3.2.8. Let K be a field and A ∈ Mm×n (K). The homogeneous


system AX = 0 has a nontrivial solution if and only if rk(A) < n. In
particular:

1. if m < n, the homogeneous system AX = 0 always has a nontrivial


solution.


2. if m = n, the homogeneous system AX = 0 has a nontrivial solution


if and only if det A = 0.

Proof. The system AX = 0 has a nontrivial solution if and only if ker A 6=


{0}, that is, if and only if dim ker A > 0. By Theorem 3.2.7, this is
equivalent to asking that n > rk(A).
If m < n then rk(A) ≤ m < n, since rk A ≤ min{m, n}, and hence the
system has a nontrivial solution.
If m = n then the condition rk(A) < n is equivalent to det A = 0 by
Corollary 2.3.16.

Proposition 3.2.9. Let K be a field, A ∈ Mm×n (K) and B ∈ Mm×1 (K).


Suppose that the system AX = B is compatible and let X ∈ Mn×1 (K) be
a solution. Then the set S of all solutions of the system AX = B is given
by:
{X + Z : Z ∈ ker A}.

Proof. Let S̄ = {X + Z : Z ∈ ker A}. We need to prove that S = S̄. First, let Y ∈ S, so that AY = B. Since AX = B as well, subtracting these two equations term by term we get that
$$A(Y - X) = B - B = 0.$$
This means exactly that Y − X ∈ ker A, or, in other words, that there exists Z ∈ ker A such that Y − X = Z. Hence Y = X + Z, that is, Y ∈ S̄.
Conversely, let Y ∈ S̄, so that there exists Z ∈ ker A such that Y = X + Z. Then
$$AY = A(X + Z) = AX + AZ = B + 0 = B,$$
so that Y ∈ S.
Proposition 3.2.9 completely describes all solutions of a compatible linear
system AX = B: once we find one solution X of the system, all solutions are
of the form X + Z, where Z is any element of the kernel of A.

Remark 3.2.10. When K = R, if a system of m equations in n variables is compatible, we will say that it has $\infty^{n-\mathrm{rk}(A)}$ solutions. When n = rk(A), the system has a unique solution.
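The rank-nullity theorem makes this count a one-line computation. For instance (a numerical aside, not from the notes, assuming NumPy), for the 2 × 4 coefficient matrix that will appear in Example 3.3.2:

```python
import numpy as np

A = np.array([[1., 1., 0., 1.],
              [2., -1., 2., 0.]])
n = A.shape[1]
r = np.linalg.matrix_rank(A)
# dim(ker A) = n - rk(A), so a compatible system AX = B has "infinity^2" solutions.
print(n - r)  # 2
```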


3.3 How do I solve a linear system?


To conclude this chapter, we have to explain how to solve a linear system.
Thanks to Theorem 3.2.7 and Proposition 3.2.9, we know the structure of the
set of solutions. Now all that remains to do is to actually find the solutions.
We begin by proving a lemma that allows us to reduce a linear system to a
smaller one with the same solutions.
 
Lemma 3.3.1. Let K be a field, A ∈ Mm×n (K) and $B = \begin{pmatrix} b_1 \\ \ldots \\ b_m \end{pmatrix} \in M_{m\times 1}(K)$. Let the linear system AX = B be compatible and let r = rk(A) = rk(A|B). Let R1 , . . . , Rm be the rows of A and R̃1 , . . . , R̃m be the rows of (A|B), so that deleting the last entry of R̃i one obtains Ri . Let 1 ≤ i1 < i2 < . . . < ir ≤ m be such that $\mathrm{rk}\begin{pmatrix} R_{i_1} \\ R_{i_2} \\ \ldots \\ R_{i_r} \end{pmatrix} = r$, and let $A' = \begin{pmatrix} R_{i_1} \\ R_{i_2} \\ \ldots \\ R_{i_r} \end{pmatrix}$ and $B' = \begin{pmatrix} b_{i_1} \\ b_{i_2} \\ \ldots \\ b_{i_r} \end{pmatrix}$. Then the following hold:

1. rk(A′|B′) = r.

2. An element X ∈ Mn×1 (K) is a solution of the system AX = B if and only if it is a solution of A′X = B′.

Proof. 1. The matrix (A′|B′) is obtained from A′ by adding a final column, whose entries are the i1 , i2 , . . . , ir -th entries of B. Hence (A′|B′) is a submatrix of (A|B), and since A′ is a submatrix of (A′|B′) of rank r and (A|B) has rank r, too, we must have that rk(A′|B′) = r by Remark 2.3.2.


2. An element X ∈ Mn×1 (K) is a solution of AX = B if and only if:
$$R_i X = b_i \text{ for every } i = 1, \ldots, m, \tag{14}$$
where Ri X is the usual matrix multiplication between the i-th row of A and X. On the other hand, X is a solution to A′X = B′ if and only if Rij X = bij for every j = 1, . . . , r. Hence it is obvious that if X is a solution of AX = B then it is also a solution of A′X = B′.
Conversely, suppose that X is a solution of A′X = B′. This means that:
$$R_{i_j} X = b_{i_j} \text{ for every } j = 1, \ldots, r. \tag{15}$$
Since rk(A′|B′) = rk(A′) = rk(A) = rk(A|B) = r, we have by Kronecker theorem that
$$\langle \tilde{R}_{i_1}, \ldots, \tilde{R}_{i_r} \rangle = \langle \tilde{R}_1, \ldots, \tilde{R}_m \rangle,$$
or, in other words, that R̃i1 , . . . , R̃ir generate the space of rows of (A|B). Hence for every k ∈ {1, . . . , m} there exist λ1 , . . . , λr ∈ K such that $\sum_{j=1}^r \lambda_j \tilde{R}_{i_j} = \tilde{R}_k$. This means, in particular, that:
$$\sum_{j=1}^r \lambda_j R_{i_j} = R_k \quad\text{and}\quad \sum_{j=1}^r \lambda_j b_{i_j} = b_k. \tag{16}$$
From (15) and (16) we get that
$$R_k X = \Big(\sum_{j=1}^r \lambda_j R_{i_j}\Big) X = \sum_{j=1}^r (\lambda_j R_{i_j} X) = \sum_{j=1}^r (\lambda_j b_{i_j}) = b_k.$$
Since this holds true for every k ∈ {1, . . . , m}, we have by (14) that X is a solution to AX = B.
What Lemma 3.3.1 is saying is that once we have a compatible linear
system, in order to solve it we can erase equations that are linearly dependent
from the others.


 
Example 3.3.2. Let K = R. Let
$$A = \begin{pmatrix} 1 & 1 & 0 & 1 \\ 1 & -2 & 2 & -1 \\ 2 & -1 & 2 & 0 \\ 1 & 4 & -2 & 3 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 2 \end{pmatrix}.$$
Let R1 , . . . , R4 be the rows of A and R̃1 , . . . , R̃4 be the rows of (A|B). Since R1 , R3 are linearly independent, while R2 = R3 − R1 and R4 = 3R1 − R3 , we have rk(A) = 2. Moreover, R̃2 = R̃3 − R̃1 and R̃4 = 3R̃1 − R̃3 , so that rk(A|B) = 2 and the system is compatible. By Lemma 3.3.1, in order to solve the system we can disregard the second and the fourth equation, i.e. it is enough to solve the system:
$$\begin{pmatrix} 1 & 1 & 0 & 1 \\ 2 & -1 & 2 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ t \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

In consequence of Lemma 3.3.1, we can just look at systems of the form


AX = B with A ∈ Mm×n (K) and rk(A) = rk(A|B) = m. In fact, if the
system is compatible then rk(A) = rk(A|B) ≤ m, and we can erase equations
until we get a basis of the space of rows of (A|B).
Hence suppose that A ∈ Mm×n (K) is such that rk(A) = m and the system
AX = B is compatible. Clearly if n = m then Theorem 3.1.5 already tells
us how to solve it. In principle, this theorem can be always used, even when
n > m, in view of the following observation. Since rk(A) = m, the matrix
A has m linearly independent columns, say Ci1 , . . . , Cim . Now consider the
variables xk , with k ∉ {i1 , . . . , im }. However we fix values xk ∈ K for such


variables, the system
$$\begin{cases} a_{1i_1}x_{i_1} + \ldots + a_{1i_m}x_{i_m} = b_1 - \sum_{k \notin \{i_1,\ldots,i_m\}} a_{1k}x_k \\ a_{2i_1}x_{i_1} + \ldots + a_{2i_m}x_{i_m} = b_2 - \sum_{k \notin \{i_1,\ldots,i_m\}} a_{2k}x_k \\ \quad\vdots \\ a_{mi_1}x_{i_1} + \ldots + a_{mi_m}x_{i_m} = b_m - \sum_{k \notin \{i_1,\ldots,i_m\}} a_{mk}x_k \end{cases}$$
is compatible, because the matrix that represents it is a square matrix of full rank, by construction. Therefore, we can just apply Theorem 3.1.5 to the system
$$\begin{cases} a_{1i_1}x_{i_1} + \ldots + a_{1i_m}x_{i_m} = b_1 - \sum_{k \notin \{i_1,\ldots,i_m\}} a_{1k}x_k \\ a_{2i_1}x_{i_1} + \ldots + a_{2i_m}x_{i_m} = b_2 - \sum_{k \notin \{i_1,\ldots,i_m\}} a_{2k}x_k \\ \quad\vdots \\ a_{mi_1}x_{i_1} + \ldots + a_{mi_m}x_{i_m} = b_m - \sum_{k \notin \{i_1,\ldots,i_m\}} a_{mk}x_k \end{cases} \tag{17}$$
where the variables xk with k ∉ {i1 , . . . , im } will be treated as parameters.


Applying Theorem 3.1.5 we will obtain an expression of the form:
 P


 x i 1 = d1 + / 1 ,...,im } c1k xk
k∈{i

x = d + P
i2 2 / 1 ,...,im } c2k xk
k∈{i
,


 . . .
x = d + P

im m / 1 ,...,im } cmk xk
k∈{i

where the coefficients cij and d` are elements of k. This means that to ob-
tain all solutions to our linear system, we have to let all variables xk with
k ∈/ {i1 , . . . , im } vary over K, and the values of the remaining variables are
determined by the values of the former ones.
Note:-
Formally, solutions of a linear system AX = B are vectors of Mn×1 (K),
so they should be written as column vectors. However, this is annoying in
practice, so we will often write solutions as vectors in K n , namely as row
vectors.

Example 3.3.3. Let us go back to the system of Example 3.3.2. We have


seen, thanks to Lemma 3.3.1, that it is enough to solve
$$\begin{pmatrix} 1 & 1 & 0 & 1 \\ 2 & -1 & 2 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ t \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \tag{18}$$
Now, the matrix $A = \begin{pmatrix} 1 & 1 & 0 & 1 \\ 2 & -1 & 2 & 0 \end{pmatrix}$ has rank 2. Let us choose two linearly independent columns: the first and the second one. These correspond to the variables x and y; treating the other two variables as parameters we can rewrite the system as:
$$\begin{cases} x + y = 1 - t \\ 2x - y = 1 - 2z \end{cases}$$
Now this can be seen as a system of 2 equations in 2 variables, namely x and y, that is represented by the matrix $A' = \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix}$, which has determinant −3. Applying Theorem 3.1.5, we get the solutions:
$$x = -\frac{1}{3}\det\begin{pmatrix} 1-t & 1 \\ 1-2z & -1 \end{pmatrix} = -\frac{1}{3}(2z + t - 2),$$
$$y = -\frac{1}{3}\det\begin{pmatrix} 1 & 1-t \\ 2 & 1-2z \end{pmatrix} = -\frac{1}{3}(2t - 2z - 1).$$

This means that the set S of solutions of (18) is:
$$S = \left\{ \left( -\tfrac{1}{3}(2z + t - 2),\; -\tfrac{1}{3}(2t - 2z - 1),\; z,\; t \right) : z, t \in \mathbb{R} \right\}.$$
Notice how this has exactly the shape predicted by Proposition 3.2.9: a specific solution of the system is (2/3, 1/3, 0, 0), and the kernel of the matrix A is given by:
$$\left\{ \left( -\tfrac{1}{3}(2z + t),\; -\tfrac{1}{3}(2t - 2z),\; z,\; t \right) : z, t \in \mathbb{R} \right\},$$
that is a 2-dimensional subspace of R4 .

Note:-
One is not obliged to use Theorem 3.1.5 to solve the system
$$\begin{cases} x + y = 1 - t \\ 2x - y = 1 - 2z \end{cases}$$
For example, here one can also notice that by adding up the two equations the relation
$$3x = 2 - t - 2z$$
must hold, so that $x = \frac{1}{3}(2 - t - 2z)$, and substituting this for x in any of the two equations gives the value of y.
Moreover, one can choose any two linearly independent columns from the matrix A. For example, a smarter choice here would be to choose the third and the fourth column, so that the system to be solved becomes:
$$\begin{cases} t = 1 - x - y \\ 2z = 1 - 2x + y \end{cases}$$
whose solution in t and z is obvious, and yields the following equivalent form of the set of solutions S of the system (18):
$$S = \left\{ \left( x,\; y,\; \tfrac{1}{2}(1 - 2x + y),\; 1 - x - y \right) : x, y \in \mathbb{R} \right\}.$$

This is exactly the same set we found in Example 3.3.3, but written in a
different way.
We end the section by illustrating another way of solving a system in the
form (17). This is called Gauss elimination method, and it is based on the
following observations.

1. Let A ∈ Mm×n (K), X ∈ Mn×1 (K), B ∈ Mm×1 (K) and U ∈ GLm (K) be an invertible matrix. Then AX = B if and only if (U A)X = U B. In fact, if AX = B then multiplying both sides by U one gets (U A)X = U B,


and conversely if (U A)X = U B then multiplying both sides by U −1 one


gets AX = B.

2. Let AX = B be a linear system, with A ∈ Mm×n (K) and B ∈ Mm×1 (K).


Let i, j ∈ {1, . . . , m} be two distinct indices. If A0 , B 0 are the matrices
obtained from A, B, respectively, by swapping row i and row j, then X
is a solution of AX = B if and only if it is a solution of A0 X = B 0 .

3. Let A ∈ Mm×n (K) and let R1 , . . . , Rm be the rows of A. Let i, j ∈


{1, . . . , m} with j < i and let c ∈ K. Finally, let
$$U = \begin{pmatrix} 1 & 0 & 0 & 0 & \ldots & 0 \\ 0 & 1 & 0 & 0 & \ldots & 0 \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ \ldots & c & \ldots & 1 & \ldots & 0 \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ 0 & 0 & \ldots & \ldots & 0 & 1 \end{pmatrix} \in M_m(K),$$
namely the m × m matrix that differs from the identity matrix just by the (i, j)-th entry, that is equal to c. Then U is invertible, since it is lower triangular and the entries on the diagonal are all 1, and U A is an m × n matrix that coincides with A, except for the fact that the i-th row is replaced by cRj + Ri .

Given a square matrix A of size m, by applying the three operations de-


scribed above, multiple times if necessary, we can find a matrix U ∈ GLm (K)
such that U A is upper triangular. Notice that we do not really need to compute
U , we can just perform the following two types of operations on A: swapping
two rows, or replacing a row Ri with Ri + cRj , for some other row Rj and
some c ∈ K.
Now let us go back to the linear system (17). In order to ease the notation,
we assume that {i1 , . . . , im } = {1, . . . , m}, but the argument works in any
case. Let A0 = (aij )i,j=1,...,m ∈ Mm (K) be the matrix of the coefficients of the
system and write the system as A0 X = B 0 . By our preliminary reductions,
det A0 ≠ 0. Now let U ∈ GLm (K) be such that U A0 is upper triangular. The system (U A0 )X = U B 0 , which has the same solutions as the original system by


point 1., has the form:
$$\begin{cases} a'_{11}x_1 + a'_{12}x_2 + \ldots + a'_{1m}x_m = d'_1 + \sum_{k=m+1}^n c'_{1k}x_k \\ a'_{22}x_2 + a'_{23}x_3 + \ldots + a'_{2m}x_m = d'_2 + \sum_{k=m+1}^n c'_{2k}x_k \\ a'_{33}x_3 + a'_{34}x_4 + \ldots + a'_{3m}x_m = d'_3 + \sum_{k=m+1}^n c'_{3k}x_k \\ \quad\vdots \\ a'_{mm}x_m = d'_m + \sum_{k=m+1}^n c'_{mk}x_k \end{cases}$$

That is, the last equation directly gives us the value of xm . Substituting this into the (m − 1)-th equation we immediately get the value of xm−1 . Substituting these values into the (m − 2)-th equation we get the value of xm−2 , and so on.
Notice that the fact that det A0 ≠ 0 is crucial, since it ensures that a′ii ≠ 0 for every i = 1, . . . , m. Let us show how this algorithm works with two examples.

Example 3.3.4. Consider the linear system with real coefficients:
$$\begin{cases} 3x + 2y + z = 1 \\ x - y + z = 0 \\ x - z = 2 \end{cases}$$
whose associated matrices are:
$$A = \begin{pmatrix} 3 & 2 & 1 \\ 1 & -1 & 1 \\ 1 & 0 & -1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}.$$
The matrix A is square and has nonzero determinant, so we are already in the form of (17). Moreover, by Theorem 3.1.5 we know that the system has a unique solution. Let us bring A into upper triangular form. Let R1 , R2 , R3 be the rows of A. As a first step, we replace R2 with R2 − (1/3)R1 and R3 with R3 − (1/3)R1 , and we do the same on B. We end up with the matrices:
$$A_1 = \begin{pmatrix} 3 & 2 & 1 \\ 0 & -5/3 & 2/3 \\ 0 & -2/3 & -4/3 \end{pmatrix}, \quad B_1 = \begin{pmatrix} 1 \\ -1/3 \\ 5/3 \end{pmatrix}.$$
Next, if R′1 , R′2 , R′3 are the rows of A1 , we replace R′3 with R′3 − (2/5)R′2 , and we do the same on B1 . We end up with:
$$A_2 = \begin{pmatrix} 3 & 2 & 1 \\ 0 & -5/3 & 2/3 \\ 0 & 0 & -24/15 \end{pmatrix}, \quad B_2 = \begin{pmatrix} 1 \\ -1/3 \\ 27/15 \end{pmatrix}.$$
The system A2 X = B2 is:
$$\begin{cases} 3x + 2y + z = 1 \\ -5y + 2z = -1 \\ -8z = 9 \end{cases}$$
which easily gives x = 7/8, y = −1/4 and z = −9/8.
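The whole procedure of Example 3.3.4 — triangularize, then back-substitute — fits in a short program. Below is a minimal sketch (not from the notes, assuming NumPy; gauss_solve is a hypothetical helper, and it adds a row swap to get a non-zero pivot, which is operation 2. above):

```python
import numpy as np

def gauss_solve(A, B):
    """Gauss elimination for a square system with det A != 0:
    reduce (A|B) to upper triangular form, then back-substitute."""
    M = np.hstack([np.asarray(A, float), np.asarray(B, float).reshape(-1, 1)])
    n = len(M)
    for j in range(n):
        p = j + np.argmax(abs(M[j:, j]))  # choose a non-zero pivot row
        M[[j, p]] = M[[p, j]]             # swap rows (operation 2.)
        for i in range(j + 1, n):
            M[i] -= (M[i, j] / M[j, j]) * M[j]  # R_i -> R_i - c*R_j (operation 3.)
    X = np.zeros(n)
    for i in reversed(range(n)):          # back substitution
        X[i] = (M[i, -1] - M[i, i+1:n] @ X[i+1:]) / M[i, i]
    return X

print(gauss_solve([[3, 2, 1], [1, -1, 1], [1, 0, -1]], [1, 0, 2]))
# [ 0.875 -0.25  -1.125], i.e. x = 7/8, y = -1/4, z = -9/8
```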

Example 3.3.5. Consider the system AX = B over R, where:
$$A = \begin{pmatrix} 1 & 1 & 2 & 0 & 0 \\ 1 & 0 & 1 & -1 & -1 \\ 0 & 1 & 1 & -1 & -1 \\ 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 1 \end{pmatrix}, \quad X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}, \quad B = \begin{pmatrix} 1 \\ -1 \\ 1 \\ 0 \\ 2 \end{pmatrix}.$$
One checks that R̃2 = R̃1 − R̃5 and R̃3 = R̃1 − R̃4 , where R̃1 , . . . , R̃5 are the rows of (A|B), while the first, fourth and fifth rows of A are linearly independent. Hence rk(A) = rk(A|B) = 3, so the system is compatible by Theorem 3.1.4 and, by Theorem 3.2.7, it has ∞² solutions.
By Lemma 3.3.1, it is enough to consider the system given by the first,


fourth and fifth equations, namely the system:
$$\begin{cases} x_1 + x_2 + 2x_3 = 1 \\ x_1 + x_3 + x_4 + x_5 = 0 \\ x_2 + x_3 + x_4 + x_5 = 2 \end{cases} \tag{19}$$
whose associated matrices are:
$$A' = \begin{pmatrix} 1 & 1 & 2 & 0 & 0 \\ 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 1 \end{pmatrix} \quad\text{and}\quad B' = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}.$$

The matrix A′ has rank 3, by construction. The next step is to select three linearly independent columns of A′. For example, we can select the first, the second and the fourth. Now in system (19) we move the third and the fifth variables, namely the ones not corresponding to the selected columns, to the right side of the equalities, getting the system:
$$\begin{cases} x_1 + x_2 = 1 - 2x_3 \\ x_1 + x_4 = -x_3 - x_5 \\ x_2 + x_4 = 2 - x_3 - x_5 \end{cases} \tag{20}$$

Now all we need to do is to solve for x1 , x2 , x4 . This is easy and can be done by substitution, or via Cramer's theorem 3.1.5. However, let us show how to use Gauss' elimination. The matrices associated to system (20) are:
$$A'' = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \quad\text{and}\quad B'' = \begin{pmatrix} 1 - 2x_3 \\ -x_3 - x_5 \\ 2 - x_3 - x_5 \end{pmatrix}.$$
In order to bring A″ to upper triangular form, the first step is to replace its second row R2 with R2 − R1 , and to perform the same operation on B″. This yields:
$$A''_1 = \begin{pmatrix} 1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 1 & 1 \end{pmatrix} \quad\text{and}\quad B''_1 = \begin{pmatrix} 1 - 2x_3 \\ -1 + x_3 - x_5 \\ 2 - x_3 - x_5 \end{pmatrix}.$$
Now the second and last step is to replace the third row R3 of A″1 with R3 + R2 , and do the same on B″1 . We get:
$$A''_2 = \begin{pmatrix} 1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & 2 \end{pmatrix} \quad\text{and}\quad B''_2 = \begin{pmatrix} 1 - 2x_3 \\ -1 + x_3 - x_5 \\ 1 - 2x_5 \end{pmatrix}.$$

This corresponds to the system:
$$\begin{cases} x_1 + x_2 = 1 - 2x_3 \\ -x_2 + x_4 = -1 + x_3 - x_5 \\ 2x_4 = 1 - 2x_5 \end{cases} \tag{21}$$
System (21) is easy to solve: the third equation tells us that x4 = 1/2 − x5 ; substituting into the second one we get that x2 = 3/2 − x3 , and substituting the latter into the first tells us that x1 = −1/2 − x3 . This means that the set of solutions to (21), and thus to (19), is
$$S = \{(-1/2 - x_3,\; 3/2 - x_3,\; x_3,\; 1/2 - x_5,\; x_5) : x_3, x_5 \in \mathbb{R}\}.$$

Once again, we can see how the set S matches the prediction of Proposition 3.2.9. A specific solution to the system is (−1/2, 3/2, 0, 1/2, 0), while
$$\ker A = \langle (-1, -1, 1, 0, 0),\; (0, 0, 0, -1, 1) \rangle,$$
which is a two-dimensional subspace of R5 , so that the set S can be written as
$$S = \{(-1/2, 3/2, 0, 1/2, 0) + Z : Z \in \ker A\}.$$


Chapter 4: Scalar products and orthogonality


4.1 Bilinear forms and scalar products

Definition 4.1.1. Let K be a field and V be a K-vector space. A bilinear


form on V is a function
∗: V × V → K
that satisfies the following properties:

1. for every u, v, w ∈ V ,

(u + v) ∗ w = u ∗ w + v ∗ w;

2. for every u, v, w ∈ V ,

u ∗ (v + w) = u ∗ v + u ∗ w;

3. for every v, w ∈ V and every λ ∈ K,

λ(v ∗ w) = (λv) ∗ w = v ∗ (λw).

If, in addition, for every v, w ∈ V we have that

v ∗ w = w ∗ v,

then ∗ is called a symmetric bilinear form or, alternatively, a scalar prod-


uct.

Example 4.1.2.

• Let V = C3 . The function

∗: V × V → C

((x1 , x2 , x3 ), (y1 , y2 , y3 )) 7→ x1 y1 + x1 y2 + x2 y2 + x3 y3
is a bilinear form. You can check, as an exercise, that properties
1., 2., 3. are satisfied. However, this is not a scalar product, since
for example (1, 1, 0) ∗ (1, 0, 0) = 1 while (1, 0, 0) ∗ (1, 1, 0) = 2.


• Let V = R2 . The function

∗: V × V → R

((x1 , x2 ), (y1 , y2 )) 7→ x1 y1 + x1 y2 + x2 y1 + x2 y2
is a scalar product.

Definition 4.1.3. Let V be a K-vector space with a scalar product ∗.


Two vectors v, w ∈ V are orthogonal if v ∗ w = 0.

Remark 4.1.4.

• Let V be a K-vector space and ∗ be a scalar product on V . Then


the zero vector is orthogonal to every vector v. In fact, (0 + 0) ∗ v =
2(0 ∗ v) by property 1. of bilinear forms, but on the other hand
(0 + 0) ∗ v = 0 ∗ v, so that 0 ∗ v = 2(0 ∗ v). This implies that 0 ∗ v = 0,
and since a scalar product is a symmetric bilinear form, it follows
that v ∗ 0 = 0.

• It might happen that v ∗ v = 0 even if v is not the zero vector. For


example, in V = R2 consider the scalar product given by:

(x1 , x2 ) ∗ (y1 , y2 ) = x1 y1 − x2 y1 − x1 y2 + x2 y2 .

Then (1, 1) ∗ (1, 1) = 0.

• If v ∗ w = 0 then for every α, β ∈ K we have:

(αv) ∗ (βw) = (αβ)(v ∗ w) = 0.

Definition 4.1.5. Let V be a K-vector space with a scalar product ∗. Let


A ⊆ V be a non-empty subset. The orthogonal complement of A in V is
the set:
A⊥ = {v ∈ V : w ∗ v = 0, ∀w ∈ A}.

Proposition 4.1.6. Let V be a K-vector space, let ∗ be a scalar product


on V and let A ⊆ V be a non-empty subset.


1. A⊥ is a vector subspace of V .

2. If A ⊆ B ⊆ V , then B ⊥ ⊆ A⊥ .

3. A⊥ = hAi⊥ .

4. hAi ⊆ (A⊥ )⊥ .

5. Let U ⊆ V be a vector subspace and let B be a basis for U . Then


B⊥ = U ⊥ .

Proof. 1. Let v 1 , v 2 ∈ A⊥ , let α1 , α2 ∈ K and let w ∈ A. Then

(α1 v 1 + α2 v 2 ) ∗ w = α1 (v 1 ∗ w) + α2 (v 2 ∗ w) = 0 + 0 = 0,

so that α1 v 1 + α2 v 2 ∈ A⊥ .
2. If v ∈ B ⊥ , then v ∗ w = 0 for every w ∈ B. Since A ⊆ B, it follows
in particular that v ∗ w = 0 for every w ∈ A, and therefore v ∈ A⊥ .
3. Since A ⊆ hAi, by 2. it follows that hAi⊥ ⊆ A⊥ . Conversely, suppose that v ∈ A⊥ . Let w ∈ hAi. Then, by definition of span, there exist α1 , . . . , αn ∈ K and v 1 , . . . , v n ∈ A such that $w = \sum_{i=1}^n \alpha_i v_i$. Then:
$$v * w = v * \Big(\sum_{i=1}^n \alpha_i v_i\Big) = \sum_{i=1}^n \alpha_i (v * v_i) = 0$$
since v ∗ v i = 0 for every i, because v i ∈ A and v ∈ A⊥ .



4. We have that (A⊥ )⊥ = {v ∈ V : v ∗ w = 0 ∀w ∈ A⊥ }, by definition. However, by point 3. we have that A⊥ = hAi⊥ , so that
$$(A^{\perp})^{\perp} = \{v \in V : v * w = 0\ \forall w \in \langle A \rangle^{\perp}\}.$$
Now if v ∈ hAi then by definition v ∗ w = 0 for every w ∈ hAi⊥ , and therefore v ∈ (A⊥ )⊥ .
5. Follows immediately from 3. by setting A = B.

4.2 Positive definite scalar products


Until the end of this chapter, we will only consider vector spaces over the field
R of real numbers.


Definition 4.2.1. Let V be an R-vector space and let ∗ be a scalar product


on V . The product ∗ is positive definite if:

1. for every v ∈ V we have v ∗ v ≥ 0;

2. v ∗ v = 0 if and only if v = 0.

Positive definite scalar products will be denoted using a dot: ·.

Example 4.2.2. The most important example of positive definite scalar


product is the usual scalar product on V = Rn . That is, the function

$$\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$$
$$((x_1, \ldots, x_n), (y_1, \ldots, y_n)) \mapsto \sum_{i=1}^n x_i y_i.$$

This will be called the standard or euclidean scalar product on Rn .


However, there are many other examples. For instance, the function

$$\mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$$
$$((x_1, x_2), (y_1, y_2)) \mapsto 3x_1y_1 - x_2y_1 - x_1y_2 + 3x_2y_2$$


is a positive definite scalar product.

Definition 4.2.3. Let V be an R-vector space and · a positive definite


scalar product on V . The norm of a vector v ∈ V is defined as:

$$\|v\| = \sqrt{v \cdot v}.$$

A vector v ∈ V is called a versor if kvk = 1. If v ∈ V is a non-zero vector, the versor associated to v is the vector $\frac{1}{\|v\|}v$.
the versor associated to v is the vector kvk v.

Proposition 4.2.4. Let V be an R-vector space with a positive definite


scalar product ·. For every α ∈ R and every v ∈ V we have:

kαvk = |α|kvk.


Consequently, if v ∈ V then the versor associated to v has norm 1.

Proof. $\|\alpha v\| = \sqrt{(\alpha v)\cdot(\alpha v)} = \sqrt{\alpha^2 (v\cdot v)} = |\alpha|\,\|v\|$.

Proposition 4.2.5 (Cauchy-Schwarz inequality). Let V be an R-vector space


with a positive definite scalar product ·. Let u, v ∈ V . Then we have:

|u · v| ≤ kukkvk,

and equality holds if and only if u, v are linearly dependent.

Proof. If u = 0, both sides are 0 and equality holds. If u, v are linearly dependent and u ≠ 0, there exists α ∈ R such that αu = v, so that
$$|u \cdot v| = |(\alpha u) \cdot u| = |\alpha|\,\|u\|^2 = |\alpha|\,\|u\|\,\|u\| = \|u\|\,\|v\|,$$
using Proposition 4.2.4, as desired. Hence we can assume that u, v are linearly independent. This is equivalent to saying that for every α ∈ R we have αu + v ≠ 0. Therefore, for every α ∈ R we have:

0 < (αu + v) · (αu + v) = α2 kuk2 + 2α(u · v) + kvk2 .

This means that the degree 2 polynomial with real coefficients:

x2 kuk2 + 2x(u · v) + kvk2

only assumes positive values. Hence, its discriminant must be negative,


i.e.
(u · v)2 − kuk2 kvk2 < 0,
which is precisely the desired inequality.
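The inequality is easy to probe numerically with the euclidean scalar product on R3 (a quick illustration, not part of the notes, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
u, v = rng.normal(size=3), rng.normal(size=3)
print(abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v))  # True for any u, v
# Equality holds exactly when u, v are linearly dependent, e.g. v = 2u:
print(np.isclose(abs(u @ (2 * u)), np.linalg.norm(u) * np.linalg.norm(2 * u)))  # True
```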

Definition 4.2.6. Let V be an R-vector space with a positive definite


scalar product ·. A subset A = {v 1 , . . . , v n } ⊆ V is called an orthogonal
system if v i · v j = 0 for every i 6= j. If in addition every v i is a versor, the
set is called an orthonormal system.
If, in addition, A is a basis then it will be referred to as an orthogonal
basis or an orthonormal basis, respectively.


Theorem 4.2.7. Let {v 1 , . . . , v n } ⊆ V be an orthogonal system not con-


taining 0. Then v 1 , . . . , v n are linearly independent.

Proof. Suppose that α1 v 1 + . . . + αn v n = 0 for some α1 , . . . , αn ∈ R. For every i ∈ {1, . . . , n}, taking the scalar product with v i on both sides of such equality we get:
$$0 = 0 \cdot v_i = (\alpha_1 v_1 + \ldots + \alpha_n v_n) \cdot v_i = \sum_{j=1}^n \alpha_j (v_j \cdot v_i) = \alpha_i \|v_i\|^2,$$
because v i · v j = 0 whenever i ≠ j. Now since v i ≠ 0 and · is positive definite, we have that kv i k2 ≠ 0, and therefore it must be αi = 0. Since this is true for every i, the vectors v 1 , . . . , v n are linearly independent.
The next fundamental theorem to prove is that every f.g. R-vector space en-
dowed with a positive definite scalar product has an orthonormal basis. The
proof is based on the following lemma.

Lemma 4.2.8. Let V be an R-vector space endowed with a positive definite scalar product ·. Suppose that {v 1 , . . . , v n } is an orthogonal system not containing 0 and let w ∈ V be such that w ∉ hv 1 , . . . , v n i. Let
$$v_{n+1} = w - \sum_{i=1}^n \frac{w \cdot v_i}{\|v_i\|^2}\, v_i.$$
Then {v 1 , . . . , v n+1 } is an orthogonal system not containing 0.

Proof. To prove that {v 1 , . . . , v n+1 } is an orthogonal system we only need to prove that v j · v n+1 = 0 for every j ∈ {1, . . . , n}, since {v 1 , . . . , v n } is already orthogonal by hypothesis. This holds true because
$$v_{n+1} \cdot v_j = w \cdot v_j - \sum_{i=1}^n \frac{w \cdot v_i}{\|v_i\|^2}(v_i \cdot v_j) = w \cdot v_j - \frac{w \cdot v_j}{\|v_j\|^2}(v_j \cdot v_j) = w \cdot v_j - w \cdot v_j = 0,$$
using the fact that {v 1 , . . . , v n } is orthogonal.


Now we need to prove that {v 1 , . . . , v n+1 } does not contain 0. Suppose
by contradiction it does; then necessarily v n+1 = 0, since {v 1 , . . . , v n } does


not contain 0 by hypothesis. However, if v n+1 = 0 then:
$$w = \sum_{i=1}^n \frac{w \cdot v_i}{\|v_i\|^2}\, v_i,$$
which is impossible since by hypothesis w ∉ hv 1 , . . . , v n i.

Remark 4.2.9. In the hypotheses of Lemma 4.2.8, if w is orthogonal to every v i , then
$$v_{n+1} = w - \sum_{i=1}^n \frac{w \cdot v_i}{\|v_i\|^2}\, v_i = w.$$
In other words, the vector w stays the same.

Theorem 4.2.10 (Gram-Schmidt orthonormalization algorithm). Let V be


a f.g. R-vector space endowed with a positive definite scalar product ·.
Then there exists an orthonormal basis for V .

Proof. Let B = (v 1 , . . . , v n ) be any basis for V . Define v ′1 = v 1 and recursively, for every j = 2, . . . , n, let:
$$v'_j = v_j - \sum_{i=1}^{j-1} \frac{v_j \cdot v'_i}{\|v'_i\|^2}\, v'_i.$$

We claim that {v ′1 , . . . , v ′n } is an orthogonal basis for V . To see this, first notice that {v ′1 } is an orthogonal system not containing 0. Since {v 1 , . . . , v n } is a basis, it follows, in particular, that v 2 ∉ hv 1 i, and therefore by Lemma 4.2.8 we have that {v ′1 , v ′2 } is an orthogonal system not containing 0. Now again v 3 ∉ hv 1 , v 2 i, and since v ′1 and v ′2 are both linear combinations of v 1 and v 2 , they both belong to hv 1 , v 2 i, so that v 3 ∉ hv ′1 , v ′2 i. Hence {v ′1 , v ′2 , v ′3 } is an orthogonal system not containing 0. Iterating this process, we find that {v ′1 , . . . , v ′n } is an orthogonal system not containing 0. Now by Theorem 4.2.7 such a set is linearly independent, and since dim V = n, it follows that (v ′1 , . . . , v ′n ) is a basis.
To conclude, just replace each v ′i with its associated versor; this way we obtain, by Remark 4.1.4, a sequence of n versors that are still pairwise orthogonal or, in other words, an orthonormal basis.


Example 4.2.11.

• Let V = R3 , endowed with the standard scalar product. The sequence B = ((1, 1, 1), (1, 0, 1), (0, 0, 2)) is a basis of V . Let us see how to obtain from B an orthonormal basis using Theorem 4.2.10. We start by setting v 1 = (1, 1, 1), v 2 = (1, 0, 1), v 3 = (0, 0, 2). The first step is simply to let
$$v'_1 = v_1 = (1, 1, 1).$$
Next, we let
$$v'_2 = v_2 - \frac{v_2 \cdot v'_1}{\|v'_1\|^2}\, v'_1 = (1, 0, 1) - \frac{2}{3}(1, 1, 1) = (1/3, -2/3, 1/3).$$
Finally, we let
$$v'_3 = v_3 - \frac{v_3 \cdot v'_2}{\|v'_2\|^2}\, v'_2 - \frac{v_3 \cdot v'_1}{\|v'_1\|^2}\, v'_1 = (0, 0, 2) - \frac{2/3}{2/3}(1/3, -2/3, 1/3) - \frac{2}{3}(1, 1, 1) = (-1, 0, 1).$$
The sequence ((1, 1, 1), (1/3, −2/3, 1/3), (−1, 0, 1)) is then an orthogonal basis of V . In order to obtain an orthonormal basis, we just need to divide every vector by its norm. This yields:
$$B' = \left( \left(\tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}\right),\; \left(\tfrac{1}{\sqrt{6}}, -\tfrac{2}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}\right),\; \left(-\tfrac{1}{\sqrt{2}}, 0, \tfrac{1}{\sqrt{2}}\right) \right).$$

• Let again V = R3 , but this time let it be endowed with a different positive definite scalar product, namely the function
$$\mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$$
$$(x_1, x_2, x_3) \cdot (y_1, y_2, y_3) = 2x_1y_1 + x_1y_2 + x_2y_1 + 4x_2y_2 + x_2y_3 + x_3y_2 + 2x_3y_3.$$
The reader can verify as an exercise that this is indeed a positive definite scalar product. Now consider the canonical basis B = ((1, 0, 0), (0, 1, 0), (0, 0, 1)). This is of course an orthonormal basis with respect to the standard scalar product on V , but it is no longer orthonormal (and neither orthogonal) with respect to the above scalar product. Hence, we can apply Gram-Schmidt algorithm and obtain an orthonormal one. First, we set
$$v_1 = (1, 0, 0), \quad v_2 = (0, 1, 0), \quad v_3 = (0, 0, 1).$$
Next, we let
$$v'_1 = v_1 = (1, 0, 0).$$
Notice that kv ′1 k2 = 2. Next, we let:
$$v'_2 = v_2 - \frac{v_2 \cdot v'_1}{\|v'_1\|^2}\, v'_1 = (0, 1, 0) - \frac{1}{2}(1, 0, 0) = (-1/2, 1, 0).$$
Notice that kv ′2 k2 = 7/2. Finally,
$$v'_3 = v_3 - \frac{v_3 \cdot v'_2}{\|v'_2\|^2}\, v'_2 - \frac{v_3 \cdot v'_1}{\|v'_1\|^2}\, v'_1 = (0, 0, 1) - \frac{1}{7/2}(-1/2, 1, 0) - \frac{0}{2}(1, 0, 0) = (1/7, -2/7, 1),$$
so that kv ′3 k2 = 12/7. Hence an orthonormal basis for this scalar product is:
$$\left( \left(\tfrac{1}{\sqrt{2}}, 0, 0\right),\; \left(-\tfrac{1}{\sqrt{14}}, \tfrac{2}{\sqrt{14}}, 0\right),\; \left(\tfrac{1}{\sqrt{84}}, -\tfrac{2}{\sqrt{84}}, \tfrac{7}{\sqrt{84}}\right) \right).$$
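Gram-Schmidt works verbatim for any positive definite scalar product once the product is written as x · y = ᵗx G y for a symmetric positive definite matrix G. A minimal sketch (not part of the notes, assuming NumPy; gram_schmidt is a hypothetical helper name), reproducing the second computation above:

```python
import numpy as np

def gram_schmidt(vectors, G):
    """Gram-Schmidt (Theorem 4.2.10) for the scalar product x . y = x^T G y,
    with G symmetric positive definite. Returns an orthonormal list."""
    dot = lambda x, y: x @ G @ y
    basis = []
    for v in map(np.asarray, vectors):
        w = v - sum(dot(v, u) * u for u in basis)  # subtract projections on earlier versors
        basis.append(w / np.sqrt(dot(w, w)))       # normalize
    return basis

# The second scalar product of Example 4.2.11, applied to the canonical basis:
G = np.array([[2., 1., 0.], [1., 4., 1.], [0., 1., 2.]])
for u in gram_schmidt(np.eye(3), G):
    print(u)  # (1/sqrt2, 0, 0), (-1/sqrt14, 2/sqrt14, 0), (1/sqrt84, -2/sqrt84, 7/sqrt84)
```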

Theorem 4.2.12. Let V be an R-vector space of dimension n endowed


with a positive definite scalar product. Let {v 1 , . . . , v k } ⊆ V be a subset
of orthogonal vectors not containing the zero vector. Then there exist
v k+1 , . . . , v n such that (v 1 , . . . , v n ) is an orthogonal basis for V .

Proof. By Theorem 4.2.7, the vectors v 1 , . . . , v k are linearly independent.


Therefore we can apply Theorem 1.4.16, and find vectors wk+1 , . . . , wn
such that B = (v 1 , . . . , v k , wk+1 , . . . , wn ) is a basis of V . Now we can
apply Gram-Schmidt algorithm to the basis B. In order to do that, we


first have to let, for every i ∈ {1, . . . , k}:
$$v'_i = v_i - \sum_{j=1}^{i-1} \frac{v_i \cdot v'_j}{\|v'_j\|^2}\, v'_j.$$

However, as noticed in Remark 4.2.9, when we apply the above operation to a vector that is already orthogonal to the previous ones, we are not actually doing anything. That is, when performing the Gram-Schmidt algorithm on B we obtain a basis of the form (v 1 , . . . , v k , w′k+1 , . . . , w′n ), as required.

Theorem 4.2.13. Let V be an R-vector space of dimension n endowed


with a positive definite scalar product. Let A ⊆ V be a non-empty subset.
Then:
V = hAi ⊕ A⊥ .

Proof. First, we need to show that hAi ∩ A⊥ = {0}. This is easy: first recall that, by Proposition 4.1.6, A⊥ = hAi⊥ . Hence if v ∈ hAi ∩ A⊥ = hAi ∩ hAi⊥ then we must have v · v = 0. But v · v = kvk2 , and hence kvk2 = 0. Since the scalar product is positive definite, this implies that v = 0.
Next, let m = dimhAi. To conclude the proof, we need to show that dim A⊥ = n − m. By Grassmann formula, we have:

dim(hAi ⊕ A⊥ ) = m + dim A⊥ ,

and since hAi ⊕ A⊥ ⊆ V , we must have

dim A⊥ ≤ n − m.

Now let B = (v 1 , . . . , v m ) be an orthogonal basis of hAi. By Theorem


4.2.12, there exist v m+1 , . . . , v n ∈ V such that (v 1 , . . . , v n ) is an orthogonal
basis of V . Since v m+1 , . . . , v n are orthogonal to v 1 , . . . , v m , they must
belong to A⊥ , since A⊥ = B ⊥ . It follows that hv m+1 , . . . , v n i ⊆ A⊥ . Since
v m+1 , . . . , v n are part of a basis then they are linearly independent, and
hence their span has dimension n − m. It follows that

dim A⊥ ≥ n − m,


concluding the proof.

Corollary 4.2.14. Let V be an R-vector space of dimension n endowed


with a positive definite scalar product. Let U ⊆ V be a subspace. Then:

$$(U^{\perp})^{\perp} = U.$$

Proof. Let m = dim U . By Theorem 4.2.13, dim U ⊥ = n−m. By the same


theorem, dim(U ⊥ )⊥ = n − (n − m) = m. On the other hand, U ⊆ (U ⊥ )⊥
by Proposition 4.1.6, and since both spaces have the same dimension m,
equality must hold.


Chapter 5: Eigenspaces and diagonalization


5.1 Eigenvalues, eigenvectors and eigenspaces

Definition 5.1.1. Let K be a field and A ∈ Mn (K). The characteristic


polynomial of the matrix A is the expression:

pA (x) = det(A − xIn ) ∈ K[x].

The eigenvalues of the matrix A are the roots of the characteristic poly-
nomial.

Remark 5.1.2. By developing the expression det(A − xIn ) with Laplace


theorem, one sees easily that deg pA (x) = n.

Example 5.1.3.
 
• Let $A = \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix} \in M_2(\mathbb{R})$. Then
$$p_A(x) = \det\begin{pmatrix} 1-x & 2 \\ 1 & -x \end{pmatrix} = x^2 - x - 2.$$

• Let $A = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ -1 & -1 & 0 \end{pmatrix} \in M_3(\mathbb{R})$. Then
$$p_A(x) = \det\begin{pmatrix} 1-x & 0 & -1 \\ 0 & 1-x & -1 \\ -1 & -1 & -x \end{pmatrix} = -x^3 + 2x^2 + x - 2.$$
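Characteristic polynomials and eigenvalues are quickly checked by machine (a numerical aside, not from the notes, assuming NumPy):

```python
import numpy as np

A = np.array([[1., 2.], [1., 0.]])
# np.poly gives the coefficients of det(xI - A) = (-1)^n p_A(x); here x^2 - x - 2.
print(np.poly(A))            # [ 1. -1. -2.]
print(np.linalg.eigvals(A))  # the eigenvalues 2 and -1, i.e. the roots of p_A
```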

Notice that if λ ∈ K is an eigenvalue of A, then det(A − λIn ) = 0. This means


that dim ker(A − λIn ) ≥ 1, because of Theorem 3.2.7. Hence, there exists a


non-zero X ∈ Mn×1 (K) such that (A − λIn )X = 0, i.e.

AX = λX.

Definition 5.1.4. Let λ ∈ K be an eigenvalue of A. The eigenspace


relative to λ is the space

Vλ = ker(A − λIn ).

The eigenvectors relative to λ are the non-zero elements of Vλ .

Note:-
Technically, an eigenspace is a subspace of Mn×1 (K), so eigenvectors will be
column vectors. However, since practically it is easier to write rows instead
of columns, we will think of eigenspaces as subspaces of K n , i.e. we will write eigenvectors as n-tuples of elements of K.

Definition 5.1.5. Two matrices A, B ∈ Mn (K) are similar if there exists


an invertible matrix P ∈ GLn (K) such that P −1 AP = B.
A matrix A ∈ Mn (K) is diagonalizable if it is similar to a diagonal
matrix. If this is the case, and P −1 AP is diagonal for some P ∈ GLn (K),
the matrix P is called a diagonalizing matrix for A.

 
Remark 5.1.6. If $D = \begin{pmatrix} \lambda_1 & 0 & \ldots & 0 \\ 0 & \lambda_2 & \ldots & 0 \\ \ldots & \ldots & \ldots & \ldots \\ 0 & \ldots & 0 & \lambda_n \end{pmatrix} \in M_n(K)$ is a diagonal matrix then
$$p_D(x) = \det(D - xI_n) = \det\begin{pmatrix} \lambda_1 - x & 0 & \ldots & 0 \\ 0 & \lambda_2 - x & \ldots & 0 \\ \ldots & \ldots & \ldots & \ldots \\ 0 & \ldots & 0 & \lambda_n - x \end{pmatrix} = \prod_{i=1}^n (\lambda_i - x).$$


Therefore, the eigenvalues of D are precisely the elements on the diagonal.

Lemma 5.1.7. If A, B ∈ Mn (K) are similar, then pA (x) = pB (x).

Proof. Let P ∈ GLn (K) be such that P −1 AP = B. Then:

pB (x) = det(B − xIn ) = det(P −1 AP − xIn ) = det(P −1 AP − xP −1 P ) =

= det(P −1 (A − xIn )P ) = det P −1 · pA (x) · det P =


= (det P )−1 · det P · pA (x) = pA (x).

Theorem 5.1.8. A matrix A ∈ Mn (K) is diagonalizable if and only if the


vector space K n has a basis entirely consisting of eigenvectors of A.

Proof. First, suppose that A is diagonalizable. Let P ∈ GLn (K) be such that P −1 AP = D with D ∈ Mn (K) a diagonal matrix. Multiplying both sides on the left by P , we get that
$$AP = PD.$$

Now write P = (P1 |P2 | . . . |Pn ) with P1 , . . . , Pn the columns of P . Then:
$$AP = (AP_1 | AP_2 | \ldots | AP_n),$$
that is, the columns of the matrix AP are AP1 , . . . , APn . On the other hand, if $D = \begin{pmatrix} \lambda_1 & 0 & \ldots & 0 \\ 0 & \lambda_2 & \ldots & 0 \\ \ldots & \ldots & \ldots & \ldots \\ 0 & \ldots & 0 & \lambda_n \end{pmatrix}$ then

P D = (λ1 P1 |λ2 P2 | . . . |λn Pn ),

that is, the columns of P D are λ1 P1 , . . . , λn Pn . Since the two matrices


are equal, it follows that APi = λi Pi for every i = 1, . . . , n. This means
precisely that every Pi is an eigenvector of A (notice that no Pi can be


the zero vector since P is invertible). Moreover, since det P ≠ 0 then the
columns of P are a basis of K n .
Conversely, let P1 , . . . , Pn be a basis of K n consisting of eigenvectors
of A. Let P = (P1 | . . . |Pn ). Since AP = (AP1 | . . . |APn ) and the Pi ’s are
eigenvectors, there exist λ1 , . . . , λn ∈ K such that APi = λi Pi for every
i = 1, . . . , n. Therefore,

$$AP = (\lambda_1 P_1 | \ldots | \lambda_n P_n) = PD,$$
where $D = \begin{pmatrix} \lambda_1 & 0 & \ldots & 0 \\ 0 & \lambda_2 & \ldots & 0 \\ \ldots & \ldots & \ldots & \ldots \\ 0 & \ldots & 0 & \lambda_n \end{pmatrix}$. Since (P1 , . . . , Pn ) is a basis of K n , the matrix P is invertible and therefore
$$P^{-1}AP = D,$$
so that A is diagonalizable.

Remark 5.1.9. The proof of Theorem 5.1.8 shows how to find a diagonalizing matrix for a diagonalizable matrix A ∈ Mn (K): it suffices to find a basis (P1 , . . . , Pn ) of K n consisting of eigenvectors of A and then set P = (P1 |P2 | . . . |Pn ). The matrix P is a diagonalizing matrix for A.

Example 5.1.10.

• Let $A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \in M_2(\mathbb{R})$. The characteristic polynomial of A is:
$$p_A(x) = \det\begin{pmatrix} 1-x & 2 \\ 2 & 1-x \end{pmatrix} = (1-x)^2 - 4,$$
so that the eigenvalues of A are λ1 = −1, λ2 = 3. Let us compute


the eigenspaces. We have:
$$V_{-1} = \ker(A + I_2) = \ker\begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix},$$
and it is immediate to see that this is the subspace
$$\{(\alpha, -\alpha) : \alpha \in \mathbb{R}\} \subseteq \mathbb{R}^2.$$
Next,
$$V_3 = \ker(A - 3I_2) = \ker\begin{pmatrix} -2 & 2 \\ 2 & -2 \end{pmatrix} = \{(\alpha, \alpha) : \alpha \in \mathbb{R}\}.$$
Thus both eigenspaces are 1-dimensional. A basis of V−1 is ((1, −1)), while a basis of V3 is ((1, 1)). Since (1, −1) and (1, 1) are linearly independent, they together constitute a basis of R2 . This means that ((1, −1), (1, 1)) is a basis of R2 consisting of eigenvectors of A. Hence the matrix
$$P = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$$
is a diagonalizing matrix for A. In fact one can verify that:
$$P^{-1}AP = \begin{pmatrix} -1 & 0 \\ 0 & 3 \end{pmatrix}.$$

 
• Let $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \in M_2(\mathbb{R})$. We have pA (x) = (1 − x)2 , so that the unique eigenvalue of A is λ = 1. The relative eigenspace is $\ker(A - I_2) = \ker\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, which is a 1-dimensional subspace of R2 . Therefore, R2 cannot have a basis consisting of eigenvectors of A; it follows that A is not diagonalizable.


• Let A be the second matrix in Example 5.1.3, so that
$$p_A(x) = -x^3 + 2x^2 + x - 2 = -(x+1)(x-1)(x-2).$$
The eigenvalues are therefore: λ1 = −1, λ2 = 1 and λ3 = 2. Hence
$$V_{-1} = \ker(A + I_3) = \ker\begin{pmatrix} 2 & 0 & -1 \\ 0 & 2 & -1 \\ -1 & -1 & 1 \end{pmatrix}.$$
To find such kernel, we need to solve the homogeneous linear system
$$\begin{cases} 2x - z = 0 \\ 2y - z = 0 \\ -x - y + z = 0 \end{cases}$$
The associated matrix is of course A + I3 , which has rank 2. The first two rows are linearly independent, so we can disregard the third equation and solve the system
$$\begin{cases} 2x - z = 0 \\ 2y - z = 0 \end{cases}$$
whose set of solutions is
$$V_{-1} = \{(\alpha, \alpha, 2\alpha) : \alpha \in \mathbb{R}\} = \langle (1, 1, 2) \rangle.$$
Next,
$$V_1 = \ker(A - I_3) = \ker\begin{pmatrix} 0 & 0 & -1 \\ 0 & 0 & -1 \\ -1 & -1 & -1 \end{pmatrix},$$
so that
$$V_1 = \{(\alpha, -\alpha, 0) : \alpha \in \mathbb{R}\} = \langle (1, -1, 0) \rangle.$$
Finally,
$$V_2 = \ker(A - 2I_3) = \ker\begin{pmatrix} -1 & 0 & -1 \\ 0 & -1 & -1 \\ -1 & -1 & -2 \end{pmatrix},$$
so that
$$V_2 = \{(\alpha, \alpha, -\alpha) : \alpha \in \mathbb{R}\} = \langle (1, 1, -1) \rangle.$$
The sequence ((1, 1, 2), (1, −1, 0), (1, 1, −1)) is a basis of R3 , and therefore the matrix A is diagonalizable. If
$$P = \begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 2 & 0 & -1 \end{pmatrix}$$
then
$$P^{-1}AP = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}.$$
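One can verify the last diagonalization numerically (an aside, not from the notes, assuming NumPy):

```python
import numpy as np

A = np.array([[1., 0., -1.], [0., 1., -1.], [-1., -1., 0.]])
P = np.array([[1., 1., 1.], [1., -1., 1.], [2., 0., -1.]])  # eigenvectors as columns
D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))  # diag(-1, 1, 2), as claimed
```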

Definition 5.1.11. Let A ∈ Mn (K) and let λ ∈ K be an eigenvalue of A.


The algebraic multiplicity aλ of λ is the multiplicity of λ as a root of the
characteristic polynomial of A.
The geometric multiplicity gλ of λ is the dimension of the eigenspace
relative to λ.

Remark 5.1.12. We have that aλ , gλ ≥ 1. In fact, this is obvious by


definition for the algebraic multiplicity, and since gλ = dim(ker(A − λIn ))
and det(A − λIn ) = 0 we have that gλ ≥ 1 by Theorem 3.2.7.

Theorem 5.1.13. Let A ∈ Mn (K) and let λ ∈ K be an eigenvalue of A.


Then:
g λ ≤ aλ .

Proof. Let Vλ be the eigenspace relative to λ. Let P1 , . . . , Pm be a basis of Vλ , so that m = gλ . Let us fix a basis B = (P1 , . . . , Pm , Pm+1 , . . . , Pn ) of K n , thanks to Theorem 1.4.16, and let P = (P1 |P2 | . . . |Pn ), where we

think of the Pi ’s as column vectors. Then:
$$P^{-1}AP = (P^{-1}AP_1 | P^{-1}AP_2 | \ldots | P^{-1}AP_n),$$
and since P1 , . . . , Pm ∈ Vλ we have APi = λPi for every i ∈ {1, . . . , m}. Hence:
$$P^{-1}AP = (P^{-1}\lambda P_1 | \ldots | P^{-1}\lambda P_m | P^{-1}AP_{m+1} | \ldots | P^{-1}AP_n) = (\lambda P^{-1}P_1 | \ldots | \lambda P^{-1}P_m | P^{-1}AP_{m+1} | \ldots | P^{-1}AP_n).$$
Now notice that P −1 Pi is simply the i-th vector of the canonical basis, since
$$I_n = P^{-1}P = (P^{-1}P_1 | P^{-1}P_2 | \ldots | P^{-1}P_n).$$
Therefore, the first m columns of P −1 AP are λe1 , λe2 , . . . , λem . It follows
that when we compute the characteristic polynomial of P −1 AP we will
obtain (λ − x)m q(x), for some polynomial q(x) ∈ K[x]. In other words,
(λ − x)m is a factor of the characteristic polynomial of P −1 AP . But this
equals pA (x), thanks to Lemma 5.1.7. Hence the algebraic multiplicity of
λ is at least m, as desired.

Definition 5.1.14. An eigenvalue λ of a matrix A ∈ Mn (K) is called


regular if aλ = gλ .

Remark 5.1.15. If aλ = 1 then λ is regular, by Remark 5.1.12 and Theo-


rem 5.1.13.

Proposition 5.1.16. Let A ∈ Mn (K) and let λ1 , . . . , λm ∈ K be distinct


eigenvalues. Then the sum Vλ1 + Vλ2 + . . . + Vλm is direct.

Proof. By induction on m. For m = 1 there is nothing to prove. Now


assume that the claim is true for m − 1. Let X ∈ Vλ1 + Vλ2 + . . . + Vλm .
We need to show that this can be written in a unique way as a sum of
eigenvectors belonging to different eigenspaces. Suppose then that:

X = X1 + X2 + . . . + Xm = Y1 + Y2 + . . . + Ym

where Xi , Yi ∈ Vλi for every i = 1, . . . , m. Subtracting the two expressions


we get:
(X1 − Y1 ) + (X2 − Y2 ) + . . . + (Xm − Ym ) = 0. (22)
Multiplying by A on the left both sides of (22) we get:

λ1 (X1 − Y1 ) + λ2 (X2 − Y2 ) + . . . + λm (Xm − Ym ) = 0, (23)

since Xi − Yi is an eigenvector relative to λi , for every i. On the other


hand, we can multiply both sides of (22) by λm and get:

λm (X1 − Y1 ) + λm (X2 − Y2 ) + . . . + λm (Xm − Ym ) = 0. (24)

Subtracting (24) from (23), we get:

(λ1 − λm )(X1 − Y1 ) + . . . + (λm−1 − λm )(Xm−1 − Ym−1 ) = 0

or, in other words:

(λ1 −λm )X1 +. . .+(λm−1 −λm )Xm−1 = (λ1 −λm )Y1 +. . .+(λm−1 −λm )Ym−1 .

We have therefore written a vector of Vλ1 + . . . + Vλm−1 in two ways; by


the inductive hypothesis these two ways must coincide. That is, for every
i ∈ {1, . . . , m − 1} we must have:

(λi − λm )Xi = (λi − λm )Yi .

Since the λi ’s are all distinct, the coefficient λi − λm is non-zero, and


therefore we get:

Xi = Yi for every i ∈ {1, . . . , m − 1}.

Substituting in (22), we get that Xm = Ym , too, and the proof is complete.

Theorem 5.1.17. Let A ∈ Mn (K). Then A is diagonalizable if and only


if all eigenvalues belong to K and they are all regular.

Proof. First, suppose that A is diagonalizable. By Theorem 5.1.8, the


space K n has a basis B consisting of eigenvectors of A. Since such eigen-
vectors must belong to certain eigenspaces, there exist pairwise distinct
eigenvalues λ1 , . . . , λm ∈ K of A such that B ⊆ Vλ1 + Vλ2 + . . . + Vλm . By


Proposition 5.1.16, the latter sum is direct and hence

B ⊆ Vλ1 ⊕ Vλ2 ⊕ . . . ⊕ Vλm .

Since hBi = V , we must have Vλ1 ⊕ Vλ2 ⊕ . . . ⊕ Vλm = K n . Therefore,


by Grassmann formula,

n = gλ1 + . . . + gλm .

On the other hand, gλi ≤ aλi for every i by Theorem 5.1.13, and of course
aλ1 +. . .+aλm ≤ n by the fundamental theorem of algebra 0.5.3. It follows
that:
n = gλ1 + . . . + gλm ≤ aλ1 + . . . + aλm ≤ n,
and hence equality must hold everywhere. In particular, aλ1 + . . . + aλm =
n, so that λ1 , . . . , λm are all the eigenvalues of A, and so they all belong
to K, and gλi = aλi for every i, so that all eigenvalues are regular.
Conversely, suppose that all eigenvalues of A belong to K and they
are all regular. Let such eigenvalues be λ1 , . . . , λm . Since aλi = gλi for
every i, we have that gλ1 + . . . + gλm = n because, as we explained above,
aλ1 + . . . + aλm = n by the fundamental theorem of algebra. On the other
hand,
dim(Vλ1 + . . . + Vλm ) = dim(Vλ1 ⊕ . . . ⊕ Vλm )
by Proposition 5.1.16. Altogether, these observations imply that

V λ1 ⊕ . . . ⊕ V λm = K n .

Hence if Bi is a basis of Vλi for every i, we have, by Proposition 1.5.4, that


B1 ∪ B2 ∪ . . . ∪ Bm is a basis of K n consisting of eigenvectors of A, and it
follows that A is diagonalizable by Theorem 5.1.8.

Example 5.1.18.

• We have seen in Example 5.1.10 that the matrix
$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \in M_2(\mathbb{R})$$
is not diagonalizable. In fact, such matrix does not satisfy the hypotheses of Theorem 5.1.17, since 1 is a non-regular eigenvalue: it has algebraic multiplicity 2 but geometric multiplicity 1.

• The matrix $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \in M_2(\mathbb{R})$ is not diagonalizable for a different reason: its characteristic polynomial is x2 + 1, which has no roots in R. So this matrix has no real eigenvalues, and again it does not satisfy the hypotheses of Theorem 5.1.17. However, if we think of A as an element of M2 (C) rather than an element of M2 (R), then the eigenvalues belong to the base field C, since they are ±i, and they are both regular, since they have algebraic multiplicity 1. Hence the matrix A is diagonalizable over C. In fact if $P = \begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix}$ then one can verify that:
$$P^{-1}AP = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}.$$

5.2 Real symmetric matrices

Definition 5.2.1. We recall that a square matrix A ∈ Mn (K) is called


symmetric if A = t A.

If X ∈ Cn , we denote by X̄ the vector whose entries are the complex conjugates of the entries of X. Namely, if X = (x1 , . . . , xn ) then:
$$\overline{X} = (\overline{x_1}, \ldots, \overline{x_n}).$$
X = (x1 , . . . , xn ).

Notice that if X = (x1 , . . . , xn ) ∈ Cn is non-zero then
$$\,^t\overline{X}\, X = \sum_{i=1}^n |x_i|^2 > 0. \tag{25}$$

Theorem 5.2.2. Let A ∈ Mn (R) be a symmetric matrix. Then its eigen-


values are all real.


Proof. Let pA (x) ∈ R[x] be the characteristic polynomial of A. If λ ∈ C is an eigenvalue of A, namely a root of pA (x), then also λ̄ is an eigenvalue of A (cf. Lemma 0.5.4). Moreover, if X is an eigenvector of A relative to λ then
$$AX = \lambda X,$$
and taking conjugates we get:
$$A\overline{X} = \overline{A}\,\overline{X} = \overline{AX} = \overline{\lambda X} = \overline{\lambda}\cdot\overline{X}, \tag{26}$$
using the fact that A = Ā since A has real entries. This means that X̄ is an eigenvector of A relative to the eigenvalue λ̄. Using (26) together with the fact that A = t A we get:
$$\overline{\lambda}(\,^t\overline{X}\cdot X) = \,^t(\overline{\lambda}\,\overline{X})\cdot X = \,^t(A\overline{X})\cdot X = \,^t\overline{X}\,^tA\,X = \,^t\overline{X}(AX) = \,^t\overline{X}\,\lambda X = \lambda(\,^t\overline{X}\, X).$$
Equating the first and the last terms of the above chain of equalities we see that λ̄(t X̄ · X) = λ(t X̄ · X), namely
$$(\lambda - \overline{\lambda})(\,^t\overline{X}\cdot X) = 0.$$
Since X is an eigenvector, it is not the zero vector, and therefore by (25) we have t X̄ · X ≠ 0. It follows that λ = λ̄, or, in other words, that λ ∈ R.
The standard scalar product · : Rn × Rn → R can be extended to a scalar product

• : Cn × Cn → C,   ((x1 , . . . , xn ), (y1 , . . . , yn )) ↦ ∑_{i=1}^n xi yi .

Note:-
The standard scalar product Cn × Cn → C defined above is not positive
definite; in fact it makes no sense to talk about positive complex numbers.
Moreover, it is not even true that (x1 , . . . , xn ) • (x1 , . . . , xn ) = 0 if and only
if (x1 , . . . , xn ) = 0. For example, (1, i) • (1, i) = 1 − 1 = 0.
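A small numpy sketch of this phenomenon (purely illustrative; np.vdot computes the conjugated, Hermitian product, shown only for contrast — it is not the product • defined above):

    import numpy as np

    x = np.array([1.0, 1j])
    print(np.sum(x * x))      # (1, i) • (1, i) = 1 + i**2 = 0: • is not definite
    print(np.vdot(x, x))      # sum(conj(x) * x) = 2: the Hermitian product is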

Proposition 5.2.3. Let A ∈ Mn (R) be a symmetric matrix. Let λ, µ ∈ R


be distinct eigenvalues of A. Let X be an eigenvector relative to λ and Y


an eigenvector relative to µ. Then X • Y = 0.

Proof. By hypothesis, AX = λX and AY = µY . Hence

λ(X • Y ) = λ(t X · Y ) = t (λX) · Y =

= t (AX) · Y = (t X t A)Y = t XAY =


= t XµY = µ(X • Y ),
so that
(λ − µ)(X • Y ) = 0.
Since by hypothesis λ ≠ µ, we get that X • Y = 0.

Definition 5.2.4. A matrix A ∈ Mn (R) is orthogonal if its rows form an


orthonormal basis of Rn and its columns form an orthonormal basis of
Rn .

 √ √ 
Example 5.2.5. The matrix A = ( 1/√2  1/√2 ; 1/√2  −1/√2 ) ∈ M2 (R) is orthogonal, since ((1/√2, 1/√2), (1/√2, −1/√2)) is an orthonormal basis of R2 .

Remark 5.2.6. An orthogonal matrix is necessarily invertible, since its


rows/columns are linearly independent.

Lemma 5.2.7. A matrix A ∈ Mn (R) is orthogonal if and only if t A = A−1 .

Proof. Let R1 , . . . , Rn be the rows of A and C1 , . . . , Cn be the columns of


A. If A is orthogonal, then by definition the following conditions hold:

1. Ri • Rj = 0 for every i 6= j;

2. Ri • Ri = 1 for every i;

3. Ci • Cj = 0 for every i 6= j;


4. Ci • Ci = 1 for every i.

On the other hand, t A = A−1 if and only if the following conditions hold:

(a) A · t A = In ;

(b) t A · A = In .

To conclude the proof, it is enough to look at how the products A · t A


and t A · A are computed. The matrix A · t A is given by (aij )i,j∈{1,...,n} where
aij = Ri • Rj for every i, j. Hence this matrix equals In precisely if
conditions 1. and 2. hold true. Similarly, we have
t A · A = (Ci • Cj )i,j∈{1,...,n} ,

so that t A · A = In if and only if conditions 3. and 4. hold true.
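The criterion of Lemma 5.2.7 is immediate to test numerically. A minimal Python/numpy sketch, using the matrix of Example 5.2.5:

    import numpy as np

    A = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)   # Example 5.2.5

    print(np.allclose(A @ A.T, np.eye(2)))       # True: rows are orthonormal
    print(np.allclose(A.T @ A, np.eye(2)))       # True: columns are orthonormal
    print(np.allclose(A.T, np.linalg.inv(A)))    # True: tA = A^(-1)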

Remark 5.2.8. If A, B ∈ Mn (R) are orthogonal, then AB is orthogonal,


since
(AB)−1 = B −1 A−1 = t B t A = t (AB).

Definition 5.2.9. A matrix A ∈ Mn (R) is orthogonally diagonalizable


if there exists an orthogonal matrix O ∈ GLn (R) such that O−1 AO is
diagonal.

Theorem 5.2.10 (Spectral theorem). A matrix A ∈ Mn (R) is orthogonally


diagonalizable if and only if it is symmetric.

Proof. First, suppose that A is orthogonally diagonalizable, and let O ∈


GLn (R) be such that O−1 AO = D, with D ∈ Mn (R) diagonal. By Lemma
5.2.7, we have that O−1 = t O, and hence t OAO = D. On the other hand,
a diagonal matrix is symmetric, so that D = t D. Hence
t OAO = D = t D = t (t OAO) = t O t A O,

so that t OAO = t Ot AO. Multiplying this equality on the left by O and


on the right by t O yields A = t A, i.e. that A is symmetric.
Conversely, suppose that A is symmetric. We will prove the claim by
induction on n. For n = 1, there is nothing to do because A is itself


diagonal and equals 1 × A × 1, with 1 being an orthogonal (1 × 1)-matrix.


Now suppose that the claim is true for all square matrices of size n − 1
and consider our matrix A ∈ Mn (R). Let λ ∈ R be an eigenvalue (that
exists thanks to Theorem 5.2.2), and let X be an eigenvector relative to
λ, normalized so that X • X = 1. Now choose an orthonormal basis
B = (X, X2 , . . . , Xn ) of Rn and let P = (X|X2 | . . . |Xn ) be the matrix
whose columns are the vectors in B. Then
t P AP = t P (AX|AX2 | . . . |AXn ) = t P (λX|AX2 | . . . |AXn ).

Now notice that since B is an orthonormal basis, when we compute the


product t P (λX) we obtain the column vector t (λ, 0, 0, . . . , 0), and therefore

t P AP = ( λ  v ; t 0  C ),   (27)
where v ∈ Rn−1 , t 0 is the column vector of size n − 1 with only 0 entries
and C ∈ Mn−1 (R). Since the matrix A is symmetric, then so is t P AP ,
and hence expression (27) implies that v is the zero vector in Rn−1 and
C = t C. Now we can use the inductive hypothesis: C ∈ Mn−1 (R) is
symmetric, and hence orthogonally diagonalizable. Let Q ∈ GLn−1 (R) be
an orthogonal matrix such that t QCQ = D, with D diagonal. Now let
 
P ′ = ( 1  0 ; t 0  Q );

this is orthogonal because Q is orthogonal, and O = P P ′ is orthogonal since both P and P ′ are (see Remark 5.2.8). All in all, we have that:
t OAO = t (P P ′)A(P P ′) = t P ′(t P AP )P ′ = t P ′ ( λ  0 ; t 0  C ) P ′ = ( λ  0 ; t 0  D ),
that is diagonal, ending the proof.
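Numerical libraries implement precisely this statement: for a symmetric input, numpy's eigh returns an orthogonal matrix of eigenvectors. A minimal sketch (the test matrix is our own choice, for illustration only):

    import numpy as np

    A = np.array([[2.0, 1.0], [1.0, 2.0]])    # a real symmetric matrix

    eigenvalues, O = np.linalg.eigh(A)        # O is orthogonal for symmetric A
    print(np.allclose(O.T @ O, np.eye(2)))    # True
    print(np.round(O.T @ A @ O, 10))          # diag(1, 3): tO A O is diagonal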


Chapter 6: Affine geometry


6.1 Affine spaces

Definition 6.1.1. An affine space of dimension n over a field K is a triple


(A, V, f ) where:

• A is a non-empty set, whose elements are called points of the affine


space;

• V is a K-vector space of dimension n;

• f : A × A → V is a function with the following properties:

(i) For every P ∈ A and every v ∈ V , there exists a unique Q ∈ A


such that f (P, Q) = v.
(ii) If P, Q, R ∈ A are such that f (P, Q) = v and f (Q, R) = w,
then f (P, R) = v + w.

One shall think of A as the set of points in the affine space, while elements of V
are vectors that represent the "difference" between two points. The following
examples are the keys to understanding the concept of affine space.

Example 6.1.2. Let K = R, let A = R2 , seen as the well-known set of


points in the cartesian plane, and let V be the vector space R2 . Given
two points P = (xP , yP ) and Q = (xQ , yQ ) in A, we define

f (P, Q) := (xQ − xP , yQ − yP ) ∈ V.

Let us check that the triple (A, V, f ) is an affine space over R of dimension


2. First, given a point (xP , yP ) ∈ A and a vector v ∈ V , we can write
v = (v1 , v2 ) for some v1 , v2 ∈ R. Hence if we let Q := (v1 + xP , v2 + yP )
then f (P, Q) = (v1 + xP − xP , v2 + yP − yP ) = v, and it is easy to check
that Q is the unique point of A with this property. Therefore property
(i) holds true. Moreover, if P, Q, R ∈ A are such that f (P, Q) = v and
f (Q, R) = w, then writing P = (xP , yP ), Q = (xQ , yQ ) and R = (xR , yR )
we get that v = (xQ − xP , yQ − yP ) and w = (xR − xQ , yR − yQ ), so that


v + w = (xR − xP , yR − yP ). On the other hand, by definition

f (P, R) = (xR − xP , yR − yP ),

so that f (P, R) = f (P, Q) + f (Q, R) and (ii) holds true as well.

Example 6.1.2 is nothing else than the well-known cartesian plane. The
points of the space are given by pairs of real numbers, and for every pair of
points P, Q in the plane there is a vector connecting them. This vector can be
visualized by drawing a straight arrow from P to Q pointing at Q. However,
one needs to be careful using this graphic representation because vectors with
the same length, direction and verse are the same element of the underlying
vector space V . For example, the vector connecting the points (0, 0) and (1, 0)
is the same vector that connects (0, 1) and (1, 1), although graphically we
represent the two vectors as two separate objects.
The function f of Example 6.1.2 is nothing else than the function that associates to a pair of points P, Q the vector of V = R2 that connects them.
Property (i) then just says that given a point P and a vector v, one can "translate" the point P by v in a unique way, obtaining a new point Q. For
example, if one is given the point P = (1, 1) and the vector v = (−2, 0), the
unique translate of P by v is the point Q = (−1, 1). Finally, property (ii)
simply says that if Q is the translate of P by a vector v and R is the translate
of Q by a vector w then R is the translate of P by the vector v + w.
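These two properties are easy to mirror in code for A2 (R). A minimal Python sketch (the helper names f and translate are ours, chosen for illustration):

    import numpy as np

    def f(P, Q):
        # the vector of V = R^2 connecting P to Q
        return Q - P

    def translate(P, v):
        # the unique point Q with f(P, Q) = v, i.e. property (i)
        return P + v

    P = np.array([1.0, 1.0])
    v = np.array([-2.0, 0.0])
    Q = translate(P, v)
    print(Q)                      # [-1.  1.], as in the example above
    print(f(P, Q) + f(Q, P))      # [0. 0.]: f(P, Q) = -f(Q, P)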
Example 6.1.2 can of course be generalized to any dimension and any field.
We will see later on that this is, in a way, the only relevant example.

Example 6.1.3. Let n ≥ 1 be an integer, K a field, A = K n , seen as a


set of points, and V = K n seen as an n-dimensional K-vector space. Let
f : A × A → V be the function defined by

((x1 , . . . , xn ), (y1 , . . . , yn )) ↦ (y1 − x1 , . . . , yn − xn ).

Then the triple (A, V, f ) is an affine space of dimension n over the field K.
Of course when n = 2 and K = R we recover Example 6.1.2. When n = 3
and K = R this is also a well-known object: it is nothing else than the 3-dimensional space, where every point is identified uniquely by a triple of real numbers, usually referred to as "coordinates". Once again, given
two points in the space it is possible to connect them by a unique vector,
represented as a straight arrow starting at the first point and pointing at


the second one.


The case n > 3 and K = R cannot be drawn on paper of course, since
our world is only 3-dimensional, but it works exactly in the same way: a
point is identified with an n-tuple of real numbers, and there is a unique
vector connecting two distinct points.
When n = 1 and K = R the affine space above is simply the real line:
its points are real numbers, and the vector connecting two real numbers
is just their difference.

Definition 6.1.4. The affine space defined in Example 6.1.3 will be de-
noted by An (K).

From now on, when (A, V, f ) is an affine space and P, Q ∈ A, we will denote
by −→PQ the vector f (P, Q).

Proposition 6.1.5. Let (A, V, f ) be an affine space over a field K, and let
P, Q, R, S ∈ A. The following hold true.
1. −→PQ = −→PR if and only if Q = R;

2. −→PQ = 0 if and only if P = Q;

3. if v = −→PQ, then −v = −→QP ;

4. −→PQ = −→RS if and only if −→PR = −→QS.
Proof. 1. Of course if Q = R then −→PQ = −→PR. Conversely, let v := −→PQ. By property (i) of the function f , there exists a unique point X ∈ A such that −→PX = v. Therefore that point must be Q, and hence if −→PR = v then necessarily R = Q.
2. By property (ii) of the function f , we have −→PP = −→PP + −→PP , and therefore −→PP = 0. Conversely, if −→PQ = 0, then −→PQ = −→PP , and therefore Q = P by point 1.
3. By property (ii) of the function f , we have −→PP = −→PQ + −→QP . By point 2., this implies that −→QP = −(−→PQ).
4. By property (ii) of f , we have −→PQ + −→QS = −→PS = −→PR + −→RS. The claim follows immediately.


Definition 6.1.6. Let (A, V, f ) be an affine space of dimension n over a


field K. Let P ∈ A and v ∈ V . The translate of P by v, denoted by
tv (P ), is the unique point Q ∈ A such that −→PQ = v.
Given a vector v ∈ V , the translation map associated to v is the map

tv : A → A

P ↦ tv (P ).

Note:-
The existence of the translate of P by v is guaranteed by property (i) of the function f associated to an affine space.

Corollary 6.1.7. The translation map tv : A → A is a bijection.

Proof. Suppose tv (P ) = tv (Q) = R. This is the same as saying that


−→PR = v = −→QR. By Proposition 6.1.5, it follows that −→RP = −v = −→RQ and in turn that P = Q. Hence tv is injective.
If Q ∈ A, by the definition of affine space there exists a unique P ∈ A such that −→QP = −v. By Proposition 6.1.5 it follows that −→PQ = v, namely
we have tv (P ) = Q. This shows that tv is surjective.

6.2 Linear subspaces

Definition 6.2.1. Let (A, V, f ) be an affine space of dimension n over a


field K. Let O ∈ A and W ⊆ V be a vector subspace of dimension m.
The linear subspace associated to O and W is the set

[O, W ] := {tv (O) : v ∈ W }.

The vector subspace W is called translation space of [O, W ]. The point


O is called origin of [O, W ].

In other words, a linear subspace of an affine space is the set of all translates
of a point via vectors that lie in a vector subspace of V .


Example 6.2.2. Let A2 (R) be the affine space of Example 6.1.2. Let
W ⊆ V be the subspace generated by the vector (1, 1) and let O =
(0, 0) ∈ A2 (R). The linear subspace [O, W ] is the set of all points in the
plane that are translates of (0, 0) via a multiple of (1, 1). Depicting A2 (R)
as the usual cartesian plane, [O, W ] is nothing else than the bisector of
the first quadrant.
Similarly, let A3 (R) be the affine space of Example 6.1.3 with n = 3
and K = R. Let O = (1, 1, 1) ∈ A3 (R) and let W = ⟨(1, 0, 0), (0, 1, 0)⟩ ⊆
V . Depicting A3 (R) in the usual way, the linear subspace [O, W ] is the plane passing through (1, 1, 1) and parallel to the xy-plane.
In general, we will see that linear subspaces in An (K) are nothing else
than sets of solutions of linear systems!

Proposition 6.2.3. Let (A, V, f ) be an affine space of dimension n over a


field K, and let [O, W ] be a linear subspace. Then the triple

([O, W ], W, f |[O,W ]×[O,W ] )

is an affine space of dimension m.

Proof. In order to ease the notation, we will write f |[O,W ] in place of


f |[O,W ]×[O,W ] .
First, we need to show that f |[O,W ] takes values in W . In other words,
we need to show that if P, Q ∈ [O, W ] then f (P, Q) ∈ W . Since P, Q ∈
[O, W ] then by definition there exist v, w ∈ W such that f (O, P ) = v and
f (O, Q) = w. Then by Proposition 6.1.5 we have that f (P, O) = −v and
since (A, V, f ) is an affine space we have

−v + w = f (P, O) + f (O, Q) = f (P, Q).

It follows that f (P, Q) = w − v, and since W is a vector subspace the


latter difference belongs to W . Hence f |[O,W ] takes values in W .
Next, we need to show that f |[O,W ] satisfies properties (i) and (ii) of Definition 6.1.1. Let then P ∈ [O, W ] and v ∈ W . Since (A, V, f ) is an
affine space, there exists a unique Q ∈ A such that f (P, Q) = v. Showing
that Q ∈ [O, W ] is then equivalent to proving that property (i) holds for f |[O,W ] . Since P ∈ [O, W ], by definition of linear subspace there is a


w ∈ W such that f (O, P ) = w. Then

f (O, Q) = f (O, P ) + f (P, Q) = w + v,

so that f (O, Q) ∈ W . But this means precisely that Q is a translate of O


via a vector of W , namely that Q ∈ [O, W ].
Finally, let P, Q, R ∈ [O, W ]. Then f (P, Q) + f (Q, R) = f (P, R)
because (A, V, f ) is an affine space. On the other hand of course f (P, Q)
equals f |[O,W ] (P, Q), and the same holds true for f (Q, R) and f (P, R).
Property (ii) then holds true for f |[O,W ] .

Remark 6.2.4. One can prove that if (A, V, f ) is an affine space, B ⊆


A is a non-empty subset and W ⊆ V is a vector subspace such that
(B, W, f |B×B ) is an affine space, then for every O ∈ B the set {tw (O) : w ∈
W } is a linear subspace of (A, V, f ).

Proposition 6.2.5. Let (A, V, f ) be an affine space of dimension n over a


field K, and let [O, W ] be a linear subspace. Then we have:

[O, W ] = [O0 , W ] for every O0 ∈ [O, W ].

In other words, every point of a linear subspace can be taken as origin of


the subspace.

Proof. Let O0 ∈ [O, W ]. First, we prove that [O, W ] ⊆ [O0 , W ]. Let Q ∈ [O, W ]. Then there exists w ∈ W such that w = −→OQ. On the other hand O0 ∈ [O, W ] as well, and therefore there exists w0 ∈ W such that w0 = −→OO0 . It follows that

−→O0 Q = −→O0 O + −→OQ = w − w0 ∈ W,

and therefore Q ∈ [O0 , W ], by definition.
To conclude the proof we need to show that [O0 , W ] ⊆ [O, W ]. So let Q ∈ [O0 , W ]. Then there is some w ∈ W such that −→O0 Q = w. Since O0 ∈ [O, W ], there is also some v ∈ W such that −→OO0 = v. It follows that

−→OQ = −→OO0 + −→O0 Q = v + w ∈ W,


so that Q ∈ [O, W ].

Proposition 6.2.6. Let (A, V, f ) be an affine space over a field K and let
[O, W ] and [O0 , W 0 ] be two linear subspaces.

1. If P ∈ [O, W ] ∩ [O0 , W 0 ], then

[O, W ] ∩ [O0 , W 0 ] = [P, W ∩ W 0 ].

2. [O, W ] ⊆ [O0 , W 0 ] if and only if [O, W ] ∩ [O0 , W 0 ] ≠ ∅ and W ⊆ W 0 .

Proof. 1. Since P ∈ [O, W ] ∩ [O0 , W 0 ], by Proposition 6.2.5 we can write


[O, W ] = [P, W ] and [O0 , W 0 ] = [P, W 0 ]. Therefore it is enough to show
that [P, W ] ∩ [P, W 0 ] = [P, W ∩ W 0 ].
First, we show that [P, W ∩ W 0 ] ⊆ [P, W ] ∩ [P, W 0 ]. Let then Q ∈
[P, W ∩ W 0 ], and write w = −→PQ with w ∈ W ∩ W 0 . Since w ∈ W , it
follows that Q ∈ [P, W ], and since w ∈ W 0 , it follows that Q ∈ [P, W 0 ].
Hence Q ∈ [P, W ] ∩ [P, W 0 ].
Conversely, we need to show that [P, W ] ∩ [P, W 0 ] ⊆ [P, W ∩ W 0 ]. Let
Q ∈ [P, W ] ∩ [P, W 0 ]. Then there are vectors w ∈ W and w0 ∈ W 0 such
that −→PQ = w and −→PQ = w0 . But then w = w0 , hence w ∈ W ∩ W 0 and it
follows that Q ∈ [P, W ∩ W 0 ].
2. First assume that [O, W ] ⊆ [O0 , W 0 ]. It is then obvious that [O, W ] ∩ [O0 , W 0 ] ≠ ∅. Moreover, since O ∈ [O0 , W 0 ], by Proposition 6.2.5 we
have [O0 , W 0 ] = [O, W 0 ]. Now we have to show that W ⊆ W 0 . If w ∈ W ,
then Q := tw (O) ∈ [O, W 0 ], but by definition of [O, W 0 ] we must also have
Q = tw0 (O) for some w0 ∈ W 0 . It follows that w = −→OQ = w0 , and so
w ∈ W 0 . This shows that W ⊆ W 0 .
Conversely, assume that [O, W ] ∩ [O0 , W 0 ] ≠ ∅ and W ⊆ W 0 . By point
1. we have that [O, W ] ∩ [O0 , W 0 ] = [P, W ∩ W 0 ] for some P ∈ [O, W ] ∩
[O0 , W 0 ]. Since W ⊆ W 0 , it follows that [O, W ] ∩ [O0 , W 0 ] = [P, W ]. Since
P ∈ [O, W ], we have that −→OP ∈ W , and hence −→PO ∈ W by Proposition
6.1.5. Hence O ∈ [P, W ] and by Proposition 6.2.5 it follows that

[O, W ] ∩ [O0 , W 0 ] = [O, W ].

This implies obviously that [O, W ] ⊆ [O0 , W 0 ].


Definition 6.2.7. Two linear subspaces [O, W ] and [O0 , W 0 ] of an affine


space (A, V, f ) are called parallel if W ⊆ W 0 or W 0 ⊆ W .
To denote parallel linear subspaces we write [O, W ] ∥ [O0 , W 0 ].

Proposition 6.2.8. If [O, W ] and [O0 , W 0 ] are two parallel subspaces of an


affine space (A, V, f ), then one of the following holds true:

1. [O, W ] ⊆ [O0 , W 0 ];

2. [O0 , W 0 ] ⊆ [O, W ];

3. [O, W ] ∩ [O0 , W 0 ] = ∅.

In particular, if dim W = dim W 0 then either [O, W ] = [O0 , W 0 ] or [O, W ]∩


[O0 , W 0 ] = ∅.

Proof. Follows immediately from Proposition 6.2.6.

6.3 Relative position of linear subspaces

Definition 6.3.1. Let (A, V, f ) be an affine space of dimension n, and let


[O, W ] be a linear subspace.

• If W = {0}, then [O, W ] is called a point.

• If dim W = 1, then [O, W ] is called a line.

• If dim W = 2, then [O, W ] is called a plane.

• If dim W = n − 1, then [O, W ] is called a hyperplane.

Remark 6.3.2. Linear subspaces of dimension 0, namely points, are noth-


ing else than elements of A. In fact by definition a point [O, {0}] is the
set {t0 (O)} = {O}.

Definition 6.3.3. Let [O, W ] and [O0 , W 0 ] be two linear subspaces of an


affine space. We say that [O, W ] lies on [O0 , W 0 ] if [O, W ] ⊆ [O0 , W 0 ].


Proposition 6.3.4. Let (A, V, f ) be an affine space of dimension 2, and


let r, s be two lines. If r ∩ s = ∅, then r ∥ s.

Proof. Let r = [P, ⟨v⟩] and s = [Q, ⟨w⟩] for some non-zero vectors v, w ∈ V . By contradiction, suppose that r and s are not parallel. Then w ∉ ⟨v⟩, as otherwise we would have ⟨w⟩ ⊆ ⟨v⟩. Hence v and w are linearly independent, and therefore dim⟨v, w⟩ = 2. Since dim V = 2, it follows that V = ⟨v, w⟩. Hence there exist α, β ∈ K such that −→PQ = αv + βw. Let Q0 ∈ s be such that −→QQ0 = −βw. Then:

−→PQ0 = −→PQ + −→QQ0 = αv + βw − βw = αv.

It follows that Q0 ∈ r, but of course Q0 ∈ s by construction. Hence r ∩ s ≠ ∅, contradicting the hypothesis.

Definition 6.3.5. Let (A, V, f ) be an affine space of dimension n ≥ 3.


Two lines that are not parallel and have empty intersection are called
skew. Two lines that lie on the same plane are called coplanar.

Proposition 6.3.6. Let (A, V, f ) be an affine space of dimension n ≥ 3.

1. Two lines r, s in (A, V, f ) are skew if and only if they are not coplanar.

2. There exist two skew lines in (A, V, f ).

3. Two skew lines lie on parallel planes.

Proof. 1. First, assume that r, s are skew. Let r = [P, ⟨v⟩] and s = [Q, ⟨w⟩]. Suppose by contradiction that they lie on a plane π = [O, W ], where W ⊆ V is a subspace of dimension 2. By Proposition 6.2.3, π is an affine space of dimension 2. Since r ∩ s = ∅, by Proposition 6.3.4 they are parallel, but this contradicts the hypothesis.
Conversely, assume that there is no plane containing r and s. Assume by contradiction that r ∩ s ≠ ∅. Let P ∈ r ∩ s. Write r = [P, ⟨v⟩] and s = [P, ⟨w⟩] for some non-zero v, w ∈ V , thanks to Proposition 6.2.5. The space ⟨v, w⟩ is at most 2-dimensional, hence there exists a subspace W ⊆ V such that dim W = 2 and ⟨v, w⟩ ⊆ W . Then the linear subspace [P, W ] is a plane that clearly contains both lines, contradicting the hypothesis.


Hence it must be r ∩ s = ∅. Now suppose by contradiction that r ∥ s. Then there exists a non-zero vector v ∈ V such that r = [P, ⟨v⟩] and s = [Q, ⟨v⟩]. Since r ∩ s = ∅, it must be r ≠ s and hence we can assume that Q ∉ r. Let π = [P, ⟨−→PQ, v⟩]. We claim that this is a plane containing both r and s. First, −→PQ cannot be proportional to v, for if it were, then Q would be a translate of P via a multiple of v, and hence it would lie on r, which is impossible. Then r lies on π by construction. Let now R ∈ s. Then −→QR = αv for some α ∈ K. On the other hand Q lies on π by construction, since it is the translate of P via −→PQ. Hence R lies on π as well, since it is the translate of Q via a multiple of v. This shows that s lies on π, so that r, s are coplanar, contradicting the hypothesis. Hence r, s must be skew.
2. Since dim V ≥ 3, there exist three linearly independent vectors in V . Let them be u, v, w. Let P ∈ A and let r be the line [P, ⟨u⟩]. Next, let Q = tv (P ) and let s be the line [Q, ⟨w⟩]. Let us prove that r and s are skew lines. By point 1. we just have to prove that there is no plane that contains both of them. Suppose by contradiction that there is one, call it π = [P, W ] with dim W = 2. Since Q ∈ π, then v = −→PQ ∈ W . On the other hand, since π contains both r and s we must have u, w ∈ W . But then ⟨u, v, w⟩ ⊆ W , which is impossible since dim W = 2.
3. Let r = [P, ⟨v⟩] and s = [Q, ⟨w⟩] be skew lines. Then dim⟨v, w⟩ = 2, as otherwise r, s would be parallel. Then the planes π = [P, ⟨v, w⟩] and π 0 = [Q, ⟨v, w⟩] are clearly parallel and contain r and s, respectively.

Proposition 6.3.7. Let (A, V, f ) be an affine space of dimension 3.

1. A line r and a plane π with empty intersection are parallel.

2. Two planes π, σ with empty intersection are parallel.

3. If two distinct planes π, σ intersect in a point P , then their intersec-


tion is a line through P .

4. Every line r is contained in at least two distinct planes.

Proof. 1. Let P be a point of r, so that r = [P, ⟨u⟩] for some non-zero u ∈ V . Let π = [Q, ⟨v, w⟩] for some linearly independent v, w ∈ V . Suppose by contradiction that r and π are not parallel. Then u ∉ ⟨v, w⟩ and hence we have dim⟨u, v, w⟩ = 3. Since dim V = 3 as well, it follows that V = ⟨u, v, w⟩. Therefore −→PQ = αu + βv + γw for some α, β, γ ∈ K. Let Q0 ∈ π be the point such that −→QQ0 = −βv − γw. Then

−→PQ0 = −→PQ + −→QQ0 = αu.

Then Q0 ∈ r, but on the other hand by construction Q0 ∈ π. It follows that Q0 ∈ r ∩ π, contradicting the hypothesis.
2. Let π = [P, W ] and σ = [Q, W 0 ] with dim W = dim W 0 = 2. Suppose by contradiction that π, σ are not parallel. Then W ≠ W 0 , and so there exists a non-zero vector v ∈ W \ W 0 . Then r := [P, ⟨v⟩] is a line contained in π that is not parallel to σ. By 1. it follows that r and σ have non-empty intersection, but then also π and σ do, contradicting the hypothesis.
3. Let π = [Q, W ] and σ = [Q0 , W 0 ] with dim W = dim W 0 = 2. Let P ∈ π ∩ σ. Since π ≠ σ it must be that W ≠ W 0 , as otherwise we could write π = [P, W ] = σ. Then dim(W + W 0 ) = 3, so by the Grassmann formula it follows that dim(W ∩ W 0 ) = 2 + 2 − 3 = 1, and by Proposition 6.2.6 we have

π ∩ σ = [P, W ∩ W 0 ],

which is a line through P .


4. Let r = [P, ⟨u⟩] for some non-zero u ∈ V . Let us complete (u) to a basis of V via Theorem 1.4.16: there exist v, w such that (u, v, w) is a basis of V . Then [P, ⟨u, v⟩] and [P, ⟨u, w⟩] are two planes both containing r. Moreover, they are distinct, because if they were equal then we would have tw (P ) ∈ [P, ⟨u, v⟩] and consequently w ∈ ⟨u, v⟩, which is impossible since u, v, w are linearly independent.

Definition 6.3.8. Let (A, V, f ) be an affine space. Two lines that intersect
in a single point, a line and a plane that intersect in a single point or two
planes that intersect in a single line are called incident.

Thanks to the propositions we have proved, we can give a complete descrip-


tion of the relative position of linear subspaces of affine spaces in dimension 2
and 3.

Theorem 6.3.9. Let (A, V, f ) be an affine space of dimension ≥ 2, and let


r, s be two lines. Then exactly one of the following holds:

• r ∩ s = ∅ and r, s are skew;

• r ∩ s = ∅ and r, s are coplanar (parallel and distinct);

• r ∩ s is a point (r and s are incident);

• r ∩ s is a line (r = s).

Theorem 6.3.10. Let (A, V, f ) be an affine space of dimension 3 and let


π, σ be two planes. Then exactly one of the following holds:

• π ∩ σ = ∅ (π and σ are parallel and distinct);

• π ∩ σ is a line (π and σ are incident);

• π ∩ σ is a plane (π = σ).

Theorem 6.3.11. Let (A, V, f ) be an affine space of dimension 3 and let


r be a line and π be a plane. Then exactly one of the following holds:

• r ∩ π = ∅ (r and π are parallel and r does not lie on π);

• r ∩ π is a point (r and π are incident);

• r ∩ π is a line (r ⊆ π).

6.4 Coordinate systems and equations of subspaces

Definition 6.4.1. Let (A, V, f ) be an affine space. A system of coordinates


on (A, V, f ) is a pair (O, B) where O ∈ A is a point called origin and B
is a basis of V .

Theorem 6.4.2. Let (A, V, f ) be an affine space of dimension n, let O ∈ A


and B = (v 1 , . . . , v n ) be a basis of V .

1. The map
Λ[O,B] : A → K n ,   P ↦ (x1 , . . . , xn ),


where (x1 , . . . , xn ) ∈ K n is the only n-tuple such that −→OP = ∑_{i=1}^n xi v i , is a bijection.

2. Let ΦB : V → K n be the bijection defined by


ΦB ( ∑_{i=1}^n αi v i ) = (α1 , . . . , αn )

(see Corollary 1.4.13). If P, Q ∈ A are such that Λ[O,B] (P ) =


(x1 , . . . , xn ) and Λ[O,B] (Q) = (y1 , . . . , yn ), where Λ[O,B] is the bijec-
tion given by point 1., then
ΦB (−→PQ) = (y1 − x1 , . . . , yn − xn ).

Proof. 1. First, suppose that P, Q ∈ A are such that Λ[O,B] (P ) = Λ[O,B] (Q) = (x1 , . . . , xn ). This means that −→OP = ∑_{i=1}^n xi v i = −→OQ. However, by definition of affine space, given the point O ∈ A and the vector ∑_{i=1}^n xi v i ∈ V , there exists a unique point X ∈ A such that −→OX = ∑_{i=1}^n xi v i . This means that P = X = Q. Therefore Λ[O,B] is injective.
Next, if (x1 , . . . , xn ) ∈ K n , let v = ∑_{i=1}^n xi v i ∈ V . Once again by definition of affine space, there exists a unique P ∈ A such that −→OP = v. But then Λ[O,B] (P ) = (x1 , . . . , xn ). This proves that Λ[O,B] is surjective as well.
2. Since Λ[O,B] (P ) = (x1 , . . . , xn ) and Λ[O,B] (Q) = (y1 , . . . , yn ), the vector −→OP is ∑_{i=1}^n xi v i while −→OQ = ∑_{i=1}^n yi v i . It follows that

−→PQ = −→PO + −→OQ = − ∑_{i=1}^n xi v i + ∑_{i=1}^n yi v i = ∑_{i=1}^n (yi − xi )v i ,

and therefore

ΦB (−→PQ) = (y1 − x1 , . . . , yn − xn ).

Theorem 6.4.2 is of the utmost importance. It shows that every affine space of dimension n over a field K, once a coordinate system is chosen, "becomes" the affine space An (K) described in Example 6.1.3. The bijection Λ[O,B] , which depends on the choice of a coordinate system, yields a dictionary that allows one to translate all properties of an affine space of dimension n over a field K into properties of the very concrete object An (K).
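Concretely, in An (R) the map Λ[O,B] can be computed by solving a linear system, since the matrix whose columns are the basis vectors is invertible. A small numpy sketch (the function name Lambda is our own):

    import numpy as np

    def Lambda(O, B, P):
        # coordinates (x_1, ..., x_n) of P: solve sum_i x_i v_i = OP
        M = np.column_stack(B)           # invertible, since B is a basis
        return np.linalg.solve(M, P - O)

    O = np.array([1.0, 1.0])
    B = [np.array([1.0, 1.0]), np.array([0.0, 1.0])]
    P = np.array([3.0, 0.0])
    print(Lambda(O, B, P))               # [ 2. -3.]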


Note:-
Notice that since −→OO = 0, we have that Λ[O,B] (O) = (0, . . . , 0). Namely,
map Λ[O,B] transforms the origin O of a coordinate system into the point
(0, . . . , 0) ∈ K n .
Moreover, ΦB (0) = (0, . . . , 0).

From now on, with a slight abuse of notation, we will write An (K) to
denote the set of points of the latter affine space, although technically An (K)
denotes a triple of objects, according to the definition of affine space.
The next theorem describes how linear subspaces transform under Λ[O,B] .
First, we prove two preliminary lemmas.

Lemma 6.4.3. Let V be a K-vector space of dimension n and let B =


(v 1 , . . . , v n ) be a basis of V .

1. For every w1 , w2 ∈ V and every α, β ∈ K we have

ΦB (αw1 + βw2 ) = αΦB (w1 ) + βΦB (w2 ).

2. Let W ⊆ V be a subspace of dimension m. Then ΦB (W ) is a


subspace of K n of dimension m.
Proof. 1. Let w1 = ∑_{i=1}^n ai v i and w2 = ∑_{i=1}^n bi v i . Then

αw1 + βw2 = ∑_{i=1}^n (αai + βbi )v i ,

so that

ΦB (αw1 + βw2 ) = (αa1 + βb1 , αa2 + βb2 , . . . , αan + βbn ).

On the other hand, αΦB (w1 ) = α(a1 , . . . , an ) and βΦB (w2 ) = β(b1 , . . . , bn ),
and the claim follows easily.
2. First, we show that ΦB (W ) is a subspace of K n . Let w1 , w2 ∈
ΦB (W ), so that there exist u1 , u2 ∈ W such that ΦB (u1 ) = w1 and
ΦB (u2 ) = w2 . Let α, β ∈ K. Then

ΦB (αu1 + βu2 ) = αΦB (u1 ) + βΦB (u2 ) = αw1 + βw2


by point 1. This proves that αw1 + βw2 ∈ ΦB (W ), and therefore ΦB (W ) is a vector subspace of K n .
To prove that it has dimension m, let (u1 , . . . , um ) be a basis of W . If w ∈ ΦB (W ), then w = ΦB (u) for some u ∈ W . But u = ∑_{i=1}^m αi ui for some α1 , . . . , αm ∈ K, since (u1 , . . . , um ) is a basis of W . Hence

w = ΦB (u) = ΦB ( ∑_{i=1}^m αi ui ) = ∑_{i=1}^m αi ΦB (ui )

by point 1. This proves that ΦB (u1 ), . . . , ΦB (um ) generate ΦB (W ). To prove that these vectors are linearly independent, suppose that

∑_{i=1}^m αi ΦB (ui ) = 0

for some α1 , . . . , αm ∈ K. Then once again by point 1. we get

0 = ΦB ( ∑_{i=1}^m αi ui ).

However ΦB is a bijection, and since ΦB (0) = 0, it follows that ∑_{i=1}^m αi ui = 0. Since (u1 , . . . , um ) is a basis of W , we get that α1 = . . . = αm = 0.

Lemma 6.4.4. Let • be the standard scalar product on K n , i.e. the map

K n × K n → K,   ((x1 , . . . , xn ), (y1 , . . . , yn )) ↦ ∑_{i=1}^n xi yi .

Let W be a subspace of K n .

1. dim W ⊥ = n − dim W .

2. (W ⊥ )⊥ = W .

3. Let dim W = m, and let (u1 , . . . , un−m ) be a basis of W ⊥ , where

ui = (ci1 , ci2 , . . . , cin ) ∈ K n for every i.


Then the matrix

C = ( c11  c12  . . .  c1n ; c21  c22  . . .  c2n ; . . . ; c(n−m)1  c(n−m)2  . . .  c(n−m)n ) ∈ M(n−m)×n (K)

has rank n − m and has the property that ker C = W .

Proof. Let (w1 , . . . , wm ) be a basis of W , so that dim W = m. Write wi = (ai1 , . . . , ain ) ∈ K n . Let A be the matrix of size m × n with coefficients in K whose i-th row is (ai1 . . . ain ), for every i. This is nothing else than the transpose of the matrix AB described in Theorem 2.3.8, with B being the canonical basis of K n .
Now let • be the standard scalar product on K n . We claim that

W ⊥ = ker A. (28)

In fact, by Proposition 4.1.6 we have that W ⊥ = {w1 , . . . , wm }⊥ .


Hence v ∈ W ⊥ if and only if wi • v = 0 for every i = 1, . . . , m, but
on the other hand a moment of reflection shows that

(w1 • v, . . . , wm • v) = Av,

that is, v ∈ W ⊥ if and only if v ∈ ker A.


1. Since the rows of A are linearly independent, we have rk(A) = m by
Theorem 2.3.8. By Theorem 3.2.7, we have dim ker A = n − m, and (28)
implies then that
dim W ⊥ = n − dim W.
2. By point 1. applied to W ⊥ and then to W we have that

dim(W ⊥ )⊥ = n − dim W ⊥ = n − (n − dim W ) = dim W,

and since W ⊆ (W ⊥ )⊥ by Proposition 4.1.6, it must be (W ⊥ )⊥ = W .


3. Redoing the construction described at the beginning of the proof with W ⊥ in place of W , we get by (28) that (W ⊥ )⊥ = ker C, and the claim follows by 2.
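The passage from a basis of W to a matrix of cartesian equations can also be carried out numerically, for instance via the singular value decomposition. A hedged numpy sketch over R (floating-point rank computations are only reliable for well-conditioned data):

    import numpy as np

    # rows of A: a basis of W; rows of C: a basis of ker A, i.e. of W-perp
    A = np.array([[1.0, 0.0, 1.0]])        # W = <(1, 0, 1)> inside R^3

    _, _, Vt = np.linalg.svd(A)
    C = Vt[np.linalg.matrix_rank(A):]      # basis of ker A = W-perp

    print(np.allclose(C @ A.T, 0))         # True: rows of C annihilate W
    print(np.linalg.matrix_rank(C))        # n - m = 2, and ker C = W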


Theorem 6.4.5. Let (A, V, f ) be an affine space of dimension n over a


field K. Let B = (v 1 , . . . , v n ) be a basis of V , let O ∈ A and let [P, W ] be
a linear subspace of dimension m.

1. The subset Λ[O,B] ([P, W ]) ⊆ An (K) is the linear subspace

[Λ[O,B] (P ), ΦB (W )].

2. A subset S ⊆ An (K) is a linear subspace of dimension m if and only if there exist a matrix A ∈ M(n−m)×n (K) of rank n − m and a matrix B ∈ M(n−m)×1 (K) such that S is the set of solutions of the linear system AX = B.

Proof. 1. Notice that thanks to Lemma 6.4.3, clearly [Λ[O,B] (P ), ΦB (W )]


is a linear subspace. In order to ease the notation, we will write Λ for
Λ[O,B] and Φ for ΦB . Moreover, we let Λ(P ) = (x1 , . . . , xn ).
First, let (y1 , . . . , yn ) ∈ Λ([P, W ]). This means that there is some Q ∈ [P, W ] such that Λ(Q) = (y1 , . . . , yn ). Hence −→OQ = ∑_{i=1}^n yi v i . On the other hand, −→OP = ∑_{i=1}^n xi v i , and therefore

−→PQ = −→OQ − −→OP = ∑_{i=1}^n (yi − xi )v i ∈ W.

It follows that

Φ( ∑_{i=1}^n (yi − xi )v i ) = (y1 − x1 , . . . , yn − xn ) ∈ Φ(W ),

and since Λ(Q) = Λ(P )+(y1 −x1 , . . . , yn −xn ), we get precisely that Λ(Q)
is a translate of Λ(P ) via a vector of Φ(W ), i.e. that Λ(Q) ∈ [Λ(P ), Φ(W )].
We have thus proved that Λ([P, W ]) ⊆ [Λ(P ), Φ(W )].
Conversely, let (y1 , . . . , yn ) ∈ [Λ(P ), Φ(W )]. Then

(y1 , . . . , yn ) = (x1 , . . . , xn ) + (a1 , . . . , an ),

where (a1 , . . . , an ) ∈ Φ(W ). Since the map Λ is a bijection by Theorem 6.4.2, there exists a unique Q ∈ A such that Λ(Q) = (y1 , . . . , yn ), or, in other words, such that −→OQ = ∑_{i=1}^n yi v i . Since −→OP = ∑_{i=1}^n xi v i , we get


that

−→PQ = −→OQ − −→OP = ∑_{i=1}^n ai v i .

Now Φ( ∑_{i=1}^n ai v i ) = (a1 , . . . , an ), and since the map Φ is a bijection and (a1 , . . . , an ) ∈ Φ(W ) by hypothesis, we must have that ∑_{i=1}^n ai v i ∈ W . Hence −→PQ ∈ W , that is, Q is a translate of P via a vector of W , i.e.
Q ∈ [P, W ]. Therefore Λ(Q) = (y1 , . . . , yn ) ∈ Λ([P, W ]). It follows that
[Λ(P ), Φ(W )] ⊆ Λ([P, W ]).
2. First, suppose that S is a linear subspace of dimension m. Then S = [P, W ], for some P ∈ K n and W a vector subspace of K n of dimension m. By Lemma 6.4.4, there exists an (n − m) × n matrix A of rank n − m such that W = ker A. Now if P = (x1 , . . . , xn ) then S, as a subset of An (K) = K n , is nothing else than

{(x1 , . . . , xn ) + (z1 , . . . , zn ) : (z1 , . . . , zn ) ∈ ker A}.

Setting B := A · t (x1 , . . . , xn ) and appealing to Proposition 3.2.9, it follows that S is the set of solutions of the linear system AX = B.
Conversely, if AX = B is a linear system of n − m equations in n
variables and rk A = n − m, then by Theorem 3.2.7 and Proposition 3.2.9,
if (x1 , . . . , xn ) ∈ K n is a solution of the system then the set S of all
solutions is

S = {(x1 , . . . , xn ) + (z1 , . . . , zn ) : (z1 , . . . , zn ) ∈ ker A}.

In other words, S is the linear subspace [(x1 , . . . , xn ), ker A].

Theorem 6.4.5 essentially tells us that we can reduce the study of linear sub-
spaces of affine spaces to the study of solutions of linear systems.

Remark 6.4.6. Theorem 6.4.5 shows that given a linear subspace S of


An (K), defined by a linear system of the form AX = B with A an (n −
m) × n matrix of rank n − m, the translation space of S is the kernel of
A.


6.5 Equations for lines, planes and hyperplanes


Recall that a line in An (K) is a linear subspace of dimension 1. Given Theorem
6.4.5, a line corresponds to the set of solutions of a linear system AX = B, with
A ∈ M(n−1)×n (K) a matrix of rank n − 1. This means that a line ` ⊆ An (K)
can be described by a system of equations of the form:


` :   a11 x1 + a12 x2 + . . . + a1n xn = b1
      a21 x1 + a22 x2 + . . . + a2n xn = b2
      . . .
      a(n−1)1 x1 + . . . + a(n−1)n xn = bn−1 ,      (29)

where the matrix associated to the system has rank n − 1.

Definition 6.5.1. A system of the form (29) that describes a line in An (K)
is called a system of cartesian equations for a line.

Example 6.5.2. In A2 (K), a system as (29) becomes

a11 x1 + a12 x2 = b1 ,

where the rank condition simply means that (a11 , a12 ) ≠ (0, 0).
In A3 (K), a system of cartesian equations for a line has the form:

a11 x1 + a12 x2 + a13 x3 = b1
a21 x1 + a22 x2 + a23 x3 = b2 ,

with the condition that

rk ( a11  a12  a13 ; a21  a22  a23 ) = 2

or, in other words, that the two vectors (a11 , a12 , a13 ) and (a21 , a22 , a23 ) are linearly independent.

The description of a line ` via a system of the form (29) is implicit, namely, we do not have a parametrization for the points of the line. To obtain one, it is enough to solve the system. Notice that since rk(A) = n − 1, Theorem 3.2.7 implies that ker A is a 1-dimensional vector subspace of K n , which is therefore

generated by a non-zero vector (a1 , . . . , an ) ∈ K n . Proposition 3.2.9 implies


that the solutions of system (29) all have the form

(p1 , . . . , pn ) + t(a1 , . . . , an )

with t ∈ K and (p1 , . . . , pn ) ∈ K n . In other words, the same line defined by


(29) can be described by the following set of equations:


` :   x1 = p1 + ta1
      x2 = p2 + ta2
      . . .
      xn = pn + tan .      (30)

We stress the fact that (p1 , . . . , pn ) and (a1 , . . . , an ) are two elements of K n ,
the second one being non-zero, while t is a parameter. The point (p1 , . . . , pn )
is a point lying on the line, corresponding to t = 0. All other points can be
found by letting t vary in K.

Definition 6.5.3. A system of the form (30) is called a parametric equation for a line.
In order to pass from a cartesian equation to a parametric one, it is enough to solve the linear system. However, it is also possible to pass from a parametric equation to a cartesian one. In fact, equations (30) can be read in the following way: for all points (x1 , . . . , xn ) ∈ `, the vectors (x1 − p1 , . . . , xn − pn ) and (a1 , . . . , an ) are linearly dependent, as they are proportional. This means that the matrix

( x1 − p1  x2 − p2  . . .  xn − pn ; a1  a2  . . .  an )

has rank 1. In other words, points of ` satisfy the system of equations determined by imposing

rk ( x1 − p1  x2 − p2  . . .  xn − pn ; a1  a2  . . .  an ) = 1.

Since the above matrix has 2 rows and n columns, and the bottom row is non-zero, asking that it has rank 1 is equivalent to asking that every 2 × 2 submatrix has determinant 0. Again, since the bottom row is non-zero, there exists some i ∈ {1, . . . , n} such that ai ≠ 0. Then by Theorem 2.3.17 the condition on


the rank is satisfied if and only if every 2 × 2 submatrix containing the i-th column has determinant 0. Assuming that a1 ≠ 0, we get:

det ( x1 − p1  xi − pi ; a1  ai ) = 0 for every i = 2, . . . , n,

which amounts to the system of cartesian equations for `:

` :   a2 x1 − a1 x2 = a2 p1 − a1 p2
      a3 x1 − a1 x3 = a3 p1 − a1 p3
      . . .
      an x1 − a1 xn = an p1 − a1 pn .

Notice that the matrix corresponding to the above system is:

( a2  −a1  0  0  . . .  0 ;
  a3  0  −a1  0  . . .  0 ;
  . . . ;
  an  0  0  . . .  0  −a1 ),

which clearly has rank n − 1, because if we erase the first column the resulting square matrix has determinant (−a1 )n−1 ≠ 0, since we assumed that a1 ≠ 0.

Example 6.5.4. Consider the line ` ⊆ A3 (R) given by the system of equations:

` :   2x + 3y + z = 1
      x + z = 3 .

In order to find a system of parametric equations for `, all we have to do is to solve the system. Letting

A = ( 2  3  1 ; 1  0  1 )

be the matrix associated with the system, we see that it has rank 2, and


hence the system has ∞1 solutions. Rewriting it as:

2x + 3y = 1 − z
x = 3 − z

and setting z = t, we see immediately that the set of solutions is:

S = {(3 − t, −5/3 + t/3, t) : t ∈ R}

and therefore a system of parametric equations for ` is:

x = 3 − t
y = −5/3 + t/3
z = t .

Notice that the line ` passes through the point (3, −5/3, 0).

Example 6.5.5. Consider the line ` ⊆ A3 (R) given by the system of parametric equations:

x = 1 + t
y = −t
z = 1 + 2t .

In order to find a system of cartesian equations, all we have to do is to consider the matrix

( x − 1  y  z − 1 ; 1  −1  2 )

and to impose that the 2 × 2 submatrices all have determinant 0. As explained, it is enough to fix a column whose bottom entry is non-zero and to look at the two 2 × 2 submatrices containing it. In this case, we can fix the first column and get:

det ( x − 1  y ; 1  −1 ) = det ( x − 1  z − 1 ; 1  2 ) = 0,


getting the system

` :   x + y = 1
      2x − z = 1 .
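Both conversions are easy to automate with a computer algebra system. The following sympy sketch redoes the parametric-to-cartesian computation of this example (sympy is an external tool, not part of the notes):

    import sympy as sp

    x, y, z, t = sp.symbols('x y z t')
    param = {x: 1 + t, y: -t, z: 1 + 2*t}        # the line of this example

    # eliminate t: the matrix ((x-1, y, z-1), (1, -1, 2)) must have rank 1
    M = sp.Matrix([[x - 1, y, z - 1], [1, -1, 2]])
    eqs = [M[:, [0, i]].det() for i in (1, 2)]
    print(eqs)                                   # [-x - y + 1, 2*x - z - 1]

    # sanity check: every parametric point satisfies both equations
    print([sp.simplify(e.subs(param)) for e in eqs])    # [0, 0]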

By Theorem 6.4.5, in general a linear subspace of An (K) of dimension m is simply the set of solutions of a system of the form:

a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
. . .
a(n−m)1 x1 + . . . + a(n−m)n xn = bn−m ,

where the matrix associated to the system has rank n − m. Therefore we immediately get the following proposition.

Proposition 6.5.6. A linear subspace π ⊆ An (K) is a hyperplane if and


only if it is described by an equation of the form:

π : a1 x1 + a2 x2 + . . . + an xn = b,

where (a1 , . . . , an ) ≠ (0, . . . , 0).


Notice that a hyperplane in A2 (K) is simply a line, and a hyperplane in A3 (K)
is simply a plane.
In general a plane in An (K) is a linear subspace of dimension 2. Thanks
to Proposition 6.5.6, we see that in A3 (K) a plane is described by an equation
of the form:
a1 x1 + a2 x2 + a3 x3 = b,
with (a1 , a2 , a3 ) ≠ (0, 0, 0). It then becomes immediately clear that a system of cartesian equations for a line in A3 (K) represents, geometrically, the intersection of two incident planes.

6.6 Relative position of subspaces via equations


Let S, S 0 ⊆ An (K) be two linear subspaces of dimension m and m0 , respec-
tively. These are defined by two linear systems AX = B and A0 X = B 0 ,
respectively, where A ∈ M(n−m)×n (K), A0 ∈ M(n−m0 )×n (K) are matrices of
rank n − m and n − m0 , respectively. As noticed in Remark 6.4.6, the transla-
tion spaces of S and S 0 are nothing else than ker A and ker A0 . Hence we have


that
S is parallel to S 0 ⇐⇒ ker A ⊆ ker A0 or ker A0 ⊆ ker A. (31)

Lemma 6.6.1. Let m, p, n be positive integers with m ≤ p ≤ n. Let A ∈ Mm×n (K) and B ∈ Mp×n (K), and assume that rk(A) = m and rk(B) = p. Then we have that ker(B) ⊆ ker(A) if and only if

rk ( A ; B ) = rk(B),

where ( A ; B ) denotes the (m + p) × n matrix obtained by adjoining the rows of B below those of A.

Proof. We denote by • the standard scalar product on K n . With respect


to this scalar product, the rows of A are orthogonal to vectors in ker A,
and those of B are orthogonal to vectors of ker B. This means that

R(A) ⊆ ker(A)⊥ and R(B) ⊆ ker(B)⊥ , (32)

where R denotes the space generated by the rows.


Now by Lemma 6.4.4 we have that

dim(ker(A)⊥ ) = n − dim(ker(A)) and dim(ker(B)⊥ ) = n − dim(ker(B)),

and since rk(A) = m and rk(B) = p, by Theorem 3.2.7 we have that

dim ker(A)⊥ = m and dim ker(B)⊥ = p. (33)

On the other hand, by Theorem 2.3.15 we have that dim R(A) = m and
dim R(B) = p, so by (32) and (33) we get that

R(A) = ker(A)⊥ and R(B) = ker(B)⊥ . (34)

It follows that the rows of A are a basis of (ker A)⊥ and those of B are
a basis of (ker B)⊥ .

If ker B ⊆ ker A, then (ker A)⊥ ⊆ (ker B)⊥ , so R(A) ⊆ R(B). But then it follows immediately that rk ( A ; B ) = rk(B).
Conversely, if rk ( A ; B ) = rk(B), then each row of A is linearly dependent on the rows of B, and hence if R is a row of A then R ∈ R(B). Hence R(A) ⊆ R(B), and therefore by (34) we have that
(ker A)⊥ ⊆ (ker B)⊥ . It follows that

((ker B)⊥ )⊥ ⊆ ((ker A)⊥ )⊥ ,

and by Lemma 6.4.4 we have that ker B ⊆ ker A.
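Lemma 6.6.1 yields a purely rank-based test that is immediate to implement. A small numpy sketch over R (assuming the numerical ranks are computed exactly):

    import numpy as np

    def kernel_contained(A, B):
        # ker(B) ⊆ ker(A) iff rk of the stacked matrix equals rk(B)
        return np.linalg.matrix_rank(np.vstack([A, B])) == np.linalg.matrix_rank(B)

    A = np.array([[1.0, 1.0, 0.0]])
    B = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    print(kernel_contained(A, B))    # True: ker B = <(1, -1, 0)> ⊆ ker A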

We can now proceed to describe relative positions of lines and planes in A2 (K)
and A3 (K) (as we did in Theorems 6.3.9,6.3.10 and 6.3.11) but using their
equations.

Theorem 6.6.2. Let ` : ax + by = c and `0 : a0 x + b0 y = c0 be two lines in A2 (K).

1. ` ∥ `0 if and only if

rk ( a  b ; a0  b0 ) = 1.

If ` ∥ `0 , then ` = `0 if and only if

rk ( a  b  c ; a0  b0  c0 ) = 1.

2. ` and `0 are incident if and only if

rk ( a  b ; a0  b0 ) = 2.

Proof. By Remark 6.4.6, the lines ` and `0 are associated with the linear systems AX = B and A0 X = B 0 , respectively, where

A = ( a  b ),  B = ( c ),  A0 = ( a0  b0 ),  B 0 = ( c0 ),

and their translation spaces are ker A and ker A0 , respectively. Since rk(A) = rk(A0 ) = 1, by (31) and Lemma 6.6.1, we have that ` ∥ `0 if and only if

rk ( a  b ; a0  b0 ) = 1,

and consequently ` and `0 are incident if and only if

rk ( a  b ; a0  b0 ) = 2.

If ` ∥ `0 , by Proposition 6.2.8 we have ` = `0 if and only if ` ∩ `0 ≠ ∅, that is, if and only if the system

ax + by = c
a0 x + b0 y = c0

is compatible. Since rk ( a  b ; a0  b0 ) = 1, by Theorem 3.1.4 this happens if and only if rk ( a  b  c ; a0  b0  c0 ) = 1.

Theorem 6.6.3. Let π : ax + by + cz = d and π 0 : a0 x + b0 y + c0 z = d0 be two planes in A3 (K).

1. π ∥ π 0 if and only if

rk ( a  b  c ; a0  b0  c0 ) = 1.

If π ∥ π 0 then π = π 0 if and only if

rk ( a  b  c  d ; a0  b0  c0  d0 ) = 1.

2. π and π 0 are incident if and only if

rk ( a  b  c ; a0  b0  c0 ) = 2.

Proof. The translation spaces of the planes π, π 0 are ker A, ker A0 , respectively, where

A = ( a  b  c ) and A0 = ( a0  b0  c0 ).

Since rk(A) = rk(A0 ) = 1, by (31) and Lemma 6.6.1, it follows that π ∥ π 0 if and only if rk ( a  b  c ; a0  b0  c0 ) = 1.
In this case, by Proposition 6.2.8 we have π = π 0 if and only if π ∩ π 0 ≠ ∅, that is, if and only if the system

ax + by + cz = d
a0 x + b0 y + c0 z = d0

is compatible, and since rk ( a  b  c ; a0  b0  c0 ) = 1, this happens precisely when rk ( a  b  c  d ; a0  b0  c0  d0 ) = 1.

Theorem 6.6.4. Let

` :   a1 x + b1 y + c1 z = d1
      a2 x + b2 y + c2 z = d2

and π : a3 x + b3 y + c3 z = d3 be a line and a plane in A3 (R), respectively.

1. ` ∥ π if and only if

rk ( a1  b1  c1 ; a2  b2  c2 ; a3  b3  c3 ) = 2.

If ` and π are parallel, then ` ⊆ π if and only if

rk ( a1  b1  c1  d1 ; a2  b2  c2  d2 ; a3  b3  c3  d3 ) = 2.

2. ` and π are incident if and only if

rk ( a1  b1  c1 ; a2  b2  c2 ; a3  b3  c3 ) = 3.

Proof. The translation spaces of `, π are ker A and ker A0 , respectively, where

A = ( a1  b1  c1 ; a2  b2  c2 ) and A0 = ( a3  b3  c3 ).

Of course, since dim ker A = 1 and dim ker A0 = 2, we have that ` ∥ π if and only if ker A ⊆ ker A0 , and since rk(A) = 2 and rk(A0 ) = 1, by Lemma 6.6.1 this happens precisely when rk ( a1  b1  c1 ; a2  b2  c2 ; a3  b3  c3 ) = 2.
If ` and π are parallel, by Proposition 6.2.8 we have that ` ⊆ π if and only if ` ∩ π ≠ ∅, namely if and only if the system

a1 x + b1 y + c1 z = d1
a2 x + b2 y + c2 z = d2
a3 x + b3 y + c3 z = d3

is compatible, and since rk ( a1  b1  c1 ; a2  b2  c2 ; a3  b3  c3 ) = 2, by Theorem 3.1.4 this happens precisely when rk ( a1  b1  c1  d1 ; a2  b2  c2  d2 ; a3  b3  c3  d3 ) = 2.

Theorem 6.6.5. Let

` :   a1 x + b1 y + c1 z = d1
      a2 x + b2 y + c2 z = d2

and

`0 :   a3 x + b3 y + c3 z = d3
       a4 x + b4 y + c4 z = d4

be two lines in A3 (K).

1. We have that ` ∥ `0 if and only if

rk ( a1  b1  c1 ; a2  b2  c2 ; a3  b3  c3 ; a4  b4  c4 ) = 2.

If this is the case, then ` = `0 if and only if

rk ( a1  b1  c1  d1 ; a2  b2  c2  d2 ; a3  b3  c3  d3 ; a4  b4  c4  d4 ) = 2.

2. ` and `0 are skew if and only if

rk ( a1  b1  c1 ; a2  b2  c2 ; a3  b3  c3 ; a4  b4  c4 ) = 3 and rk ( a1  b1  c1  d1 ; a2  b2  c2  d2 ; a3  b3  c3  d3 ; a4  b4  c4  d4 ) = 4.

3. ` and `0 are incident if and only if

rk ( a1  b1  c1 ; a2  b2  c2 ; a3  b3  c3 ; a4  b4  c4 ) = rk ( a1  b1  c1  d1 ; a2  b2  c2  d2 ; a3  b3  c3  d3 ; a4  b4  c4  d4 ) = 3.

Proof. The translation spaces of `, `0 are ker A and ker A0 , respectively, where

A = ( a1  b1  c1 ; a2  b2  c2 ) and A0 = ( a3  b3  c3 ; a4  b4  c4 ).

Since rk(A) = rk(A0 ) = 2, we have ` ∥ `0 if and only if ker A = ker A0 . By Lemma 6.6.1, ker A = ker A0 if and only if

rk ( a1  b1  c1 ; a2  b2  c2 ; a3  b3  c3 ; a4  b4  c4 ) = 2.

If this is the case, then by Proposition 6.2.8 we have that ` = `0 if and only if ` ∩ `0 ≠ ∅, that is, if and only if the system

a1 x + b1 y + c1 z = d1
a2 x + b2 y + c2 z = d2      (35)
a3 x + b3 y + c3 z = d3
a4 x + b4 y + c4 z = d4

is compatible, and since the coefficient matrix has rank 2, this happens precisely when

rk ( a1  b1  c1  d1 ; a2  b2  c2  d2 ; a3  b3  c3  d3 ; a4  b4  c4  d4 ) = 2.

If ` and `0 are not parallel, then either they are skew or they are incident. This depends on ` ∩ `0 , which is empty in the first case and consists of one point in the second. This is governed by system (35): the system has no solution when the lines are skew and precisely one solution when they are incident. Theorem 3.1.4 then allows us to conclude.
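Theorems 6.6.2–6.6.5 reduce the classification of relative positions to rank computations, which is easy to automate. An illustrative numpy sketch for two lines in A3 (R), assuming each input pair of equations really defines a line (i.e. each pair of rows has rank 2):

    import numpy as np

    def relative_position(M, d):
        # M: the four coefficient rows of Theorem 6.6.5, d: right-hand sides
        r = np.linalg.matrix_rank(M)
        rc = np.linalg.matrix_rank(np.column_stack([M, d]))
        if r == 2:
            return "equal" if rc == 2 else "parallel and distinct"
        return "incident" if rc == 3 else "skew"

    # the z-axis (x = 0, y = 0) and the line z = 0, x = 1
    M = np.array([[1.0, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
    d = np.array([0.0, 0, 0, 1])
    print(relative_position(M, d))    # skew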

6.7 Pencils and bundles of lines and planes

Definition 6.7.1. A pencil of lines in A2 (K) is the set of all lines that
pass through a given point.
An improper pencil of lines in A2 (K) is the set of all lines that are
parallel to a given one.

The next proposition explains how to write down equations for pencils of
lines in A2 (K).

Proposition 6.7.2.

1. Let P = (xP , yP ) ∈ A2 (K). Let r : ax + by + c = 0 and s : a0 x + b0 y +


c0 = 0 be any two distinct lines of A2 (K) passing through P . Let
` : a00 x + b00 y + c00 = 0 be a line. Then ` belongs to the pencil of lines
through P if and only if there exist λ, µ ∈ K with (λ, µ) ≠ (0, 0)
such that:

a00 x + b00 y + c00 = λ(ax + by + c) + µ(a0 x + b0 y + c0 ).

2. Let r : ax + by + c = 0 be a line. A line ` ⊆ A2 (K) belongs to the


improper pencil of lines parallel to r if and only if there exists k ∈ K


such that an equation for ` is:

ax + by + k = 0.

Proof. 1. First, assume that the line ` passes through P . Then a00 xP +
b00 yP + c00 = 0. This means that the vectors (a00 , b00 , c00 ) and (xP , yP , 1) are
orthogonal with respect to the standard scalar product on K 3 . Hence,
(a00 , b00 , c00 ) ∈ ⟨(xP , yP , 1)⟩⊥ . Now since r and s both pass through P , the same reasoning holds true, so that

(a, b, c), (a0 , b0 , c0 ) ∈ ⟨(xP , yP , 1)⟩⊥ .

The space ⟨(xP , yP , 1)⟩⊥ has dimension 2 by Lemma 6.4.4. Since by hypothesis the lines r and s are distinct, the vectors (a, b, c) and (a0 , b0 , c0 ) are linearly independent in K 3 . It follows that these two vectors form a basis of ⟨(xP , yP , 1)⟩⊥ , and therefore there exist λ, µ ∈ K such that

(a00 , b00 , c00 ) = λ(a, b, c) + µ(a0 , b0 , c0 ),

as required.
Conversely, if the equation of ` is λ(ax + by + c) + µ(a0 x + b0 y + c0 ) = 0, then ` passes through P since axP + byP + c = a0 xP + b0 yP + c0 = 0 by hypothesis, and therefore

λ(axP + byP + c) + µ(a0 xP + b0 yP + c0 ) = 0.

2. This follows immediately from Theorem 6.6.2.

Example 6.7.3. Let (1, 1) ∈ A2 (R). In order to find the pencil of lines
through (1, 1) we first need to find two distinct lines through such point.
For example, we can pick x − 1 = 0 and y − 1 = 0. Next, the equation of
the pencil is simply:

λ(x − 1) + µ(y − 1) = 0.
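A two-line sympy check of this pencil (illustrative only):

    import sympy as sp

    x, y, lam, mu = sp.symbols('x y lam mu')
    pencil = lam*(x - 1) + mu*(y - 1)    # the pencil through (1, 1)
    print(pencil.subs({x: 1, y: 1}))     # 0: every member passes through (1, 1)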

Remark 6.7.4. One might think that the pencil of lines through a point
(xP , yP ) is y − yP = m(x − xP ), where m ∈ K. However, this equation
misses one of the lines of the pencil, namely the line x = xP . In fact


this equation comes from the following simplification: given the correct
equation of the pencil of lines through P , namely λr + µs = 0 for some
lines r, s through P , we can divide everything by µ, since proportional
equations give rise to the same line. This way we obtain an equation that
depends only on one parameter, namely λ/µ, but we miss the equation of
the line of the pencil that corresponds to µ = 0.

Definition 6.7.5. A pencil of planes in A3 (K) is the set of all planes that
contain a given line.
An improper pencil of planes in A3 (K) is the set of all planes that are
parallel to a given one.

Proposition 6.7.6.

1. Let ` ⊆ A3 (K) be a line. Let π : ax + by + cz + d = 0 and σ : a0 x +


b0 y + c0 z + d0 = 0 be two distinct planes that contain `. Then a plane
ϑ : a00 x + b00 y + c00 z + d00 = 0 belongs to the pencil of planes through
` if and only if there exist λ, µ ∈ K with (λ, µ) ≠ (0, 0) such that:

a00 x + b00 y + c00 z + d00 = λ(ax + by + cz + d) + µ(a0 x + b0 y + c0 z + d0 ).

2. Let π : ax + by + cz + d = 0 be a plane in A3 (K). Then a plane


σ ⊆ A3 (K) belongs to the improper pencil of planes parallel to π if
and only if there exists k ∈ K such that an equation for σ is:

ax + by + cz + k = 0.

Proof. 1. Let

` :   x = x0 + v1 t
      y = y0 + v2 t
      z = z0 + v3 t

be a parametric equation for `, with (v1 , v2 , v3 ) ≠ (0, 0, 0).
First, assume that ϑ contains `. Then we must have:

a00 (x0 + v1 t) + b00 (y0 + v2 t) + c00 (z0 + v3 t) + d00 = 0 for every t ∈ K,


and this is equivalent to the pair of conditions:


a00 x0 + b00 y0 + c00 z0 + d00 = 0
a00 v1 + b00 v2 + c00 v3 = 0 .

In turn, these two conditions are equivalent to asking that the vector
(a00 , b00 , c00 , d00 ) is orthogonal, via the standard scalar product on K 4 , to
both (x0 , y0 , z0 , 1) and (v1 , v2 , v3 , 0). In other words, if we let W := ⟨(x0 , y0 , z0 , 1), (v1 , v2 , v3 , 0)⟩, we have:

(a00 , b00 , c00 , d00 ) ∈ W ⊥ .

Now, since (v1 , v2 , v3 , 0) ≠ 0, the vectors (x0 , y0 , z0 , 1) and (v1 , v2 , v3 , 0) are


linearly independent, since the last coordinate of the former is 1 and that
of the latter is 0. It follows that dim W = 2 and hence dim W ⊥ = 2 by
Lemma 6.4.4. Now since both π and σ contain `, the same argument we
used above proves that (a, b, c, d) and (a0 , b0 , c0 , d0 ) both belong to W ⊥ . On
the other hand these vectors must be linearly independent, since π ≠ σ.
Hence they form a basis of W ⊥ , and it follows that there exist λ, µ ∈ K
such that:
(a00 , b00 , c00 , d00 ) = λ(a, b, c, d) + µ(a0 , b0 , c0 , d0 ).
Conversely, if the equation of ϑ is of the form λπ + µσ, then necessarily
` ⊆ ϑ, since it is contained in both π and σ.
2. This follows immediately from Theorem 6.6.3.

Definition 6.7.7. A bundle of lines in A3 (K) is the set of all lines that
pass through a given point.
An improper bundle of lines in A3 (K) is the set of all lines that are
parallel to a given one.
A bundle of planes in A3 (K) is the set of all planes that pass through
a given point.
An improper bundle of planes in A3 (K) is the set of all planes that
are parallel to a given line.

Given a point P = (xP , yP , zP ) ∈ A3 (K), the parametric equation for the


bundle of lines through P is:



x = xP + λt
y = yP + µt
z = zP + νt ,

where (λ, µ, ν) ∈ K 3 \ {(0, 0, 0)}. Notice that two non-zero triples (λ0 , µ0 , ν0 )
and (λ00 , µ00 , ν00 ) in K 3 determine the same line in the bundle if and only if they
are proportional. Equivalently, one can write down a cartesian equation for
the bundle of lines through P , that is the following:
µ(x − xP ) − λ(y − yP ) = 0
ν(x − xP ) − λ(z − zP ) = 0 .
Given a line ` ⊆ A3 (K) with translation space generated by a non-zero
vector (v1 , v2 , v3 ) ∈ K 3 , the parametric equation for the bundle of lines parallel
to ` is:

x = λ + v1 t
y = µ + v2 t
z = ν + v3 t ,

where (λ, µ, ν) ∈ K 3 . Two triples (λ0 , µ0 , ν0 ) and (λ00 , µ00 , ν00 ) in K 3 determine
the same line in the bundle if and only if the line through the points (λ0 , µ0 , ν0 )
and (λ00 , µ00 , ν00 ) of A3 (K) has direction (v1 , v2 , v3 ). This happens precisely when
 
rk ( λ0 − λ00  µ0 − µ00  ν0 − ν00 ; v1  v2  v3 ) = 1.
The proof of the following proposition is completely analogous to that of
Proposition 6.7.2, so we omit it. We invite the interested readers to try to
write it down themselves.

Proposition 6.7.8.

1. Let P ∈ A3(K) and let π1, π2, π3 ⊆ A3(K) be three planes through
P such that π1 ∩ π2 ∩ π3 = {P}. Then a plane σ belongs to the
bundle of planes through P if and only if there exist λ, µ, ν ∈ K not
all zero such that
σ = λπ1 + µπ2 + νπ3.

2. Let ℓ ⊆ A3(K) be a line. Let π1, π2, π3 ⊆ A3(K) be three planes
parallel to ℓ such that π1 ∩ π2 ∩ π3 = ∅ and π1, π2, π3 are not all
parallel to each other. Then a plane σ belongs to the bundle of
planes parallel to ℓ if and only if there exist λ, µ, ν ∈ K not all zero
such that
σ = λπ1 + µπ2 + νπ3.
Note:-
A pencil is, roughly speaking, a family that is determined by a pair of parameters, while a bundle is determined by a triple of parameters. However,
since proportional parameters define the same object in the pencil/bundle,
in a pencil there is just 1 “degree of freedom”, while in a bundle there are 2.
When K = R, we sometimes say that a pencil contains ∞¹ objects, while a
bundle contains ∞² objects.
We will see later on that projective geometry yields a framework in which
there is no real distinction between proper and improper pencils and bundles.


Chapter 7: Euclidean geometry


7.1 Euclidean spaces

Definition 7.1.1. A euclidean space of dimension n is a 4-tuple (E, V, f, •),


where V is an R-vector space of dimension n, the triple (E, V, f ) is an
affine space of dimension n over the field R and • is a positive definite
scalar product on V .

Example 7.1.2. The affine space An(R) can be seen as a euclidean space
of dimension n when the underlying vector space Rⁿ is endowed with the
standard scalar product. This euclidean space is denoted by En(R).

Definition 7.1.3. Let (E, V, f, •) be a euclidean space of dimension n.
Two linear subspaces [O, W] and [O′, W′] are orthogonal if W⊥ ⊆ W′ or
W ⊆ W′⊥.
We write [O, W] ⊥ [O′, W′] to denote orthogonality.

Note:-
In this chapter we will appeal several times, without citing it every time,
to the following facts. Let V be an R-vector space of dimension n with a
positive definite scalar product • and W ⊆ V be a vector subspace. Then:

• W ⊕ W⊥ = V ;

• dim W ⊥ = n − dim W ;

• (W ⊥ )⊥ = W .

These facts are the content of Theorem 4.2.13 and Corollary 4.2.14.

Remark 7.1.4. Let (E, V, f, •) be a euclidean space of dimension n. Two
linear subspaces [O, W] and [O′, W′] such that 1 ≤ dim W, dim W′ ≤ n − 1
cannot be parallel and orthogonal at the same time. In fact, if W ∥ W′,
suppose without loss of generality that W ⊆ W′. If it were W⊥ ⊆ W′,
then it would also be W + W⊥ ⊆ W′. But W + W⊥ = V, and therefore
W′ = V, contradicting the fact that dim W′ < n. If on the other hand
it were W ⊆ W′⊥, then it would be W′ ⊆ W⊥ by taking orthogonal
complements, and hence W ⊆ W⊥, which is impossible because dim W > 0.

Proposition 7.1.5. Let (E, V, f, •) be a euclidean space of dimension 2.
Let ℓ ⊆ E be a line and P ∈ E be a point. Then there exists a unique
line ℓ′ such that P ∈ ℓ′ and ℓ ⊥ ℓ′.

Proof. Let W be the translation space of ℓ. Since dim V = 2, we have
that dim W⊥ = 1, and therefore the only possible translation space for a
line orthogonal to ℓ is W⊥. Hence [P, W⊥] is the unique line through P
that is orthogonal to ℓ.

Proposition 7.1.6. Let (E, V, f, •) be a euclidean space of dimension n.
Let ℓ be a line, H be a hyperplane and P be a point.

1. There exists a unique hyperplane through P that is orthogonal to ℓ.

2. There exists a unique line through P that is orthogonal to H.

Proof. 1. Let W be the translation space of ℓ. Then dim W⊥ = n − 1, so
if W′ is the translation space of a hyperplane that is orthogonal to ℓ, we
must necessarily have W′ = W⊥. Hence [P, W⊥] is the unique hyperplane
orthogonal to ℓ passing through P.
2. Let U be the translation space of H. This has dimension n − 1,
and so dim U⊥ = 1. Hence if U′ is the translation space of a line that is
orthogonal to H, we must necessarily have U′ = U⊥. Hence [P, U⊥] is the
unique line through P that is orthogonal to H.

Proposition 7.1.7. Let (E, V, f, •) be a euclidean space of dimension 3.
Let ℓ, π ⊆ E be a line and a plane, respectively.

1. If ℓ ⊥ π, then ℓ is orthogonal to every line ℓ′ ⊆ π.

2. If ℓ is not orthogonal to π, then there exists a unique plane π′ that
is orthogonal to π and such that ℓ ⊆ π′.

Proof. 1. Let W be the translation space of ℓ and let U be the translation
space of π. Since dim W⊥ = 2 and ℓ ⊥ π, it must be that W⊥ = U. If
ℓ′ ⊆ π, then the translation space U′ of ℓ′ is contained in U, and therefore
U′ ⊆ W⊥. But then ℓ ⊥ ℓ′, by definition.


2. Let W be the translation space of π. If σ is another plane with
translation space U, then in order for π and σ to be orthogonal it must
be that W⊥ ⊆ U, since by dimension counting it cannot happen that
W ⊆ U⊥.
Now let W′ be the translation space of ℓ. The space W⊥ is a 1-dimensional
subspace of V, and since ℓ is not orthogonal to π, we cannot
have W⊥ = W′. Since W′, W⊥ are both 1-dimensional, this means
that dim(W′ + W⊥) = 2. On the other hand, any plane containing ℓ
must have a translation space that contains W′. Hence the plane π′ =
[P, W′ + W⊥], where P is any point of ℓ, is the unique plane containing ℓ
that is orthogonal to π.

Proposition 7.1.8. Let ℓ, ℓ′ ⊆ E3(R) be two skew lines. Then there exists
a unique line that is orthogonal and incident to both ℓ and ℓ′.

Proof. Let W, W′ be the translation spaces of ℓ, ℓ′, respectively. Since the
two lines are skew, we have W ≠ W′. Therefore it cannot be W⊥ = W′⊥,
as otherwise taking orthogonal complements we would get that W = W′.
Hence W⊥ and W′⊥ are two distinct 2-dimensional subspaces of V, and
necessarily we must have W⊥ + W′⊥ = V. By the Grassmann formula it
follows that

dim(W⊥ ∩ W′⊥) = 1,

namely, there exists a unique 1-dimensional subspace of V that is orthogonal
to both W and W′ at the same time. Let U := W⊥ ∩ W′⊥ be this
subspace. Now let P ∈ ℓ and consider the plane π = [P, U + W] (notice
that dim(U + W) = 2 since U ⊆ W⊥). Clearly, any line that is orthogonal
and incident to both ℓ and ℓ′ must be contained in π. Now we claim that
π is not parallel to ℓ′. In fact, suppose by contradiction that it is. Then
we necessarily have W′ ⊆ U + W. Let w′ ∈ W′ be a non-zero vector.
Then there exists w ∈ W such that w′ − w ∈ U. But U = W⊥ ∩ W′⊥, and
hence w′ − w is orthogonal to both w and w′, i.e.

w • (w′ − w) = w′ • (w′ − w) = 0,

and subtracting the two equations term by term and using the properties
of the scalar product we get

‖w′ − w‖² = 0,

which implies w′ = w since • is positive definite. Hence w ∈ W ∩ W′, but
since W ≠ W′ we have W ∩ W′ = {0}, so that w = 0, which contradicts
the hypothesis.
Hence π and ℓ′ are not parallel, and the only other possibility is that
they are incident. Let Q = π ∩ ℓ′ be the incidence point. The line [Q, U]
is the only line that can be orthogonal and incident to both ℓ and ℓ′. We
already know that it is orthogonal to both ℓ and ℓ′, since its direction
is U, and it is incident to ℓ′, by construction. On the other hand it is
contained in π, that is a 2-dimensional affine space, and not parallel to ℓ,
and therefore it must be incident to ℓ as well.

Definition 7.1.9. Let (E, V, f, •) be a euclidean space of dimension n and
let P, Q ∈ E be two points.

1. The distance between P and Q is defined as $d(P, Q) = \lVert\overrightarrow{PQ}\rVert$.

2. The segment with endpoints P and Q is defined as

$$PQ = \{\, t_{\lambda\overrightarrow{PQ}}(P) : \lambda \in [0, 1] \,\},$$

namely it is the set of all translates of P by a vector of the form
$\lambda\overrightarrow{PQ}$, where λ is a real number between 0 and 1.

3. The midpoint of the segment of endpoints P and Q is $t_{\frac{1}{2}\overrightarrow{PQ}}(P)$.

4. The axis of the segment PQ is the unique hyperplane through the
midpoint of PQ that is orthogonal to the line through P and Q
(this exists and is unique thanks to Proposition 7.1.6).

5. Let H be a hyperplane. The orthogonal projection of P onto H is
the unique intersection point of H and the line through P that is
orthogonal to H.

7.2 Coordinate systems, orthogonality and distance

Definition 7.2.1. Let (E, V, f, •) be a euclidean space of dimension n. A


coordinate system is a pair (O, B) where O ∈ E is a point called origin
and B is an orthonormal basis of V .
Of course, since a euclidean space is, in particular, an affine space, once a
coordinate system is chosen then Theorems 6.4.2 and 6.4.5 can be applied,
and E can be “transformed” into the well-known affine space An(R). However,
this time we have an additional piece of structure, namely a positive definite
scalar product. Hence it is natural to ask what happens to the scalar product
on V once we apply the map Λ[O,B].

Theorem 7.2.2. Let (E, V, f, •) be a euclidean space of dimension n and let
B = {v1, . . . , vn} be an orthonormal basis of V. Let

$$\Phi_B : V \to \mathbb{R}^n, \qquad v \mapsto (\alpha_1, \ldots, \alpha_n), \quad \text{where } v = \sum_{i=1}^n \alpha_i v_i,$$

be the bijection of Theorem 6.4.2. Then for every v, w ∈ V we have:

$$v \bullet w = \Phi_B(v) \cdot \Phi_B(w),$$

where the scalar product on the right hand side is the standard scalar
product on Rⁿ.

Proof. Let $v = \sum_{i=1}^n \alpha_i v_i$ and $w = \sum_{i=1}^n \beta_i v_i$, where α1, . . . , αn, β1, . . . , βn ∈ R.
Then Φ_B(v) = (α1, . . . , αn) and Φ_B(w) = (β1, . . . , βn), so that

$$\Phi_B(v) \cdot \Phi_B(w) = \sum_{i=1}^n \alpha_i \beta_i.$$

On the other hand,

$$v \bullet w = \Big(\sum_{i=1}^n \alpha_i v_i\Big) \bullet \Big(\sum_{j=1}^n \beta_j v_j\Big) = \sum_{i=1}^n \sum_{j=1}^n (\alpha_i \beta_j)\, v_i \bullet v_j = \sum_{i=1}^n \alpha_i \beta_i,$$

where in the second equality we just used the properties of scalar products
and in the third equality we used the fact that B is an orthonormal basis.

Theorem 7.2.2 essentially says that once we choose a coordinate system, a


euclidean space of dimension n ”becomes” the euclidean space En (R). There-
fore we can now focus on the latter, and see how orthogonality relations are
detectable from equations of linear subspaces.
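Before doing so, we remark that Theorem 7.2.2 is easy to test numerically. The following Python sketch (ours, assuming numpy is available; here • is taken to be the standard product of R³ and B an explicit orthonormal basis) checks that coordinates preserve the scalar product:

import numpy as np

# The rows of B form an orthonormal basis of R^3 for the standard product.
B = np.array([[1.0, 1.0, 0.0],
              [1.0, -1.0, 0.0],
              [0.0, 0.0, 1.0]])
B[0] /= np.linalg.norm(B[0])
B[1] /= np.linalg.norm(B[1])

v = 2*B[0] + 3*B[1] - B[2]    # coordinates (2, 3, -1) with respect to B
w = B[0] - 2*B[1] + 4*B[2]    # coordinates (1, -2, 4) with respect to B

print(np.dot(v, w))                    # -8.0 : this is v . w
print(np.dot([2, 3, -1], [1, -2, 4]))  # -8   : this is Phi_B(v) . Phi_B(w)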


Proposition 7.2.3. Let ℓ : ax + by = c and ℓ′ : a′x + b′y = c′ be two lines
in E2(R). Then ℓ ⊥ ℓ′ if and only if aa′ + bb′ = 0.

Proof. The translation spaces of ℓ, ℓ′ are ker A and ker A′, respectively,
where A = (a b) and A′ = (a′ b′). It is immediate to see that ker A =
⟨(−b, a)⟩ and ker A′ = ⟨(−b′, a′)⟩. Since dim(ker A)⊥ = 1 = dim ker A′,
we have that ℓ ⊥ ℓ′ if and only if (ker A)⊥ = ker A′. Clearly (ker A)⊥ =
⟨(a, b)⟩, and this coincides with ker A′ if and only if the vectors (a, b) and
(−b′, a′) are linearly dependent, namely if and only if

$$\det\begin{pmatrix} a & b \\ -b' & a' \end{pmatrix} = 0,$$

that is precisely the condition aa′ + bb′ = 0.

Proposition 7.2.4.

1. Let π : ax + by + cz + d = 0 be a plane in E3(R) and let W ⊆ R³ be
its translation space. Then W⊥ = ⟨(a, b, c)⟩.

2. Let π : ax + by + cz + d = 0 and π′ : a′x + b′y + c′z + d′ = 0 be two
planes in E3(R). Then π ⊥ π′ if and only if aa′ + bb′ + cc′ = 0.

3. Let π : ax + by + cz + d = 0 be a plane in E3(R) and let ℓ ⊆ E3(R) be
a line with translation space generated by a vector (a′, b′, c′) ∈ R³.
Then π ⊥ ℓ if and only if
$$\mathrm{rk}\begin{pmatrix} a & b & c \\ a' & b' & c' \end{pmatrix} = 1.$$

Proof. 1. The translation space of π is ker A, where A = (a b c). Since
for every v ∈ ker A we have that Av = 0, and clearly Av is the scalar
product of the vectors (a, b, c) and v, it follows that (a, b, c) ∈ (ker A)⊥.
But dim ker A = 2, so dim(ker A)⊥ = 1 and therefore (ker A)⊥ is generated
by (a, b, c).
2. The translation spaces of π and π′ are, respectively, ker A and ker A′,
where A = (a b c) and A′ = (a′ b′ c′). Since they both have dimension 2,
their orthogonal complements both have dimension 1. Hence
π ⊥ π′ if and only if (ker A)⊥ ⊆ ker A′. By point 1., (ker A)⊥ = ⟨(a, b, c)⟩.
Hence (ker A)⊥ is contained in ker A′ if and only if A′ · (a, b, c) = 0, namely
if and only if aa′ + bb′ + cc′ = 0.
3. By 1., the orthogonal complement of the translation space of π is
generated by (a, b, c). Since the translation space of ℓ is also 1-dimensional,
we have that ℓ ⊥ π if and only if (a, b, c) and (a′, b′, c′) generate the same
space, i.e. if and only if they are linearly dependent.

Remark 7.2.5. In general, if H : a1 x1 + a2 x2 + . . . + an xn + a0 = 0 is a


hyperplane in En (R), the orthogonal complement of its translation space
is generated by the vector (a1 , . . . , an ).

Definition 7.2.6. Given a point P ∈ En (R) and a hyperplane H ⊆ En (R),


the distance between P and H is the distance of P from the orthogonal
projection of P onto H.

Theorem 7.2.7. Let H : a1x1 + . . . + anxn + a0 = 0 be a hyperplane in
En(R) and let $P = (x_1^P, x_2^P, \ldots, x_n^P)$ ∈ En(R). Then the distance between
P and H is given by the formula:

$$\frac{|a_1 x_1^P + a_2 x_2^P + \ldots + a_n x_n^P + a_0|}{\sqrt{\sum_{i=1}^{n} a_i^2}}.$$

Proof. We have to compute the orthogonal projection of P onto H. The
direction orthogonal to H is, thanks to Remark 7.2.5, (a1, . . . , an). Hence
the unique line through P that is orthogonal to H is

$$\ell : \begin{cases} x_1 = x_1^P + a_1 t \\ x_2 = x_2^P + a_2 t \\ \quad\vdots \\ x_n = x_n^P + a_n t \end{cases}$$

To find its intersection with H, we substitute the parametric equation
into the equation of H, obtaining

$$a_1(x_1^P + a_1 t) + \ldots + a_n(x_n^P + a_n t) + a_0 = 0,$$

namely $t = -\frac{H(P)}{\sum_{i=1}^n a_i^2}$, where we set $H(P) = a_1 x_1^P + a_2 x_2^P + \ldots + a_n x_n^P + a_0$.
This means that the orthogonal projection Q of P onto H has coordinates

$$\Big( x_1^P - a_1 \tfrac{H(P)}{\sum_{i=1}^n a_i^2},\; x_2^P - a_2 \tfrac{H(P)}{\sum_{i=1}^n a_i^2},\; \ldots,\; x_n^P - a_n \tfrac{H(P)}{\sum_{i=1}^n a_i^2} \Big).$$

Hence the vector $\overrightarrow{PQ}$ is just

$$\overrightarrow{PQ} = -\Big( a_1 \tfrac{H(P)}{\sum_{i=1}^n a_i^2},\; a_2 \tfrac{H(P)}{\sum_{i=1}^n a_i^2},\; \ldots,\; a_n \tfrac{H(P)}{\sum_{i=1}^n a_i^2} \Big)$$

and its norm, which is the distance between P and H, is

$$\sqrt{\sum_{i=1}^n \frac{a_i^2\, H(P)^2}{\big(\sum_{i=1}^n a_i^2\big)^2}} = \frac{|H(P)|}{\sqrt{\sum_{i=1}^n a_i^2}}.$$
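The formula just proved is immediate to implement; here is a short Python sketch (ours, assuming numpy; the point and plane are made-up data):

import numpy as np

def point_hyperplane_distance(a, a0, p):
    # Distance from p to the hyperplane a[0]x_1 + ... + a[n-1]x_n + a0 = 0.
    a, p = np.asarray(a, float), np.asarray(p, float)
    return abs(a @ p + a0) / np.linalg.norm(a)

# Distance from P = (1, 1, 1) to the plane x + 2y - 2z + 1 = 0 in E^3(R):
print(point_hyperplane_distance([1, 2, -2], 1, [1, 1, 1]))  # |1+2-2+1|/3 = 2/3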

Definition 7.2.8.

1. Let π, π′ ⊆ E3(R) be two parallel planes. Their distance is defined
as the distance of any point of π from π′.

2. Let ℓ, π ⊆ E3(R) be a line and a plane, respectively, that are parallel.
Their distance is defined as the distance of any point of ℓ from π.

Remark 7.2.9. The distance between two parallel planes is well-defined,
that is, it does not depend on the choice of a point on π. In fact, suppose
π and π′ are parallel planes. Then by Theorem 6.6.3, they have cartesian
equations of the form π : ax + by + cz + d = 0 and π′ : ax + by + cz + d′ = 0.
Let P = (xP, yP, zP) ∈ π. By Theorem 7.2.7 we have that

$$d(P, \pi') = \frac{|ax_P + by_P + cz_P + d'|}{\sqrt{a^2 + b^2 + c^2}} = \frac{|d' - d|}{\sqrt{a^2 + b^2 + c^2}},$$

which is a formula that does not depend on the coordinates of P. Moreover,
it is symmetric in π and π′, that is, the distance between π and π′ can
also be computed as the distance from π of any point of π′.
If ℓ and π are a line and a plane that are parallel, then there is a unique
plane π′ containing ℓ that is parallel to π, so the distance between ℓ and
π is the same as the distance between π and π′.

Definition 7.2.10. Let ℓ, ℓ′ ⊆ E3(R) be two skew lines. Let r be the
unique line that is orthogonal and incident to both ℓ and ℓ′, whose
existence is granted by Proposition 7.1.8. Let P, Q be the incidence points.
Then the distance of ℓ and ℓ′ is defined as the distance of P and Q.

Proposition 7.2.11. Let ℓ, ℓ′ ⊆ E3(R) be two skew lines. Let π, π′ ⊆
E3(R) be two parallel planes such that ℓ ⊆ π and ℓ′ ⊆ π′. Then

d(ℓ, ℓ′) = d(π, π′).

Proof. Since π ∥ π′, the two planes have the same translation space W.
Since ℓ ⊆ π and ℓ′ ⊆ π′, if U is the translation space of ℓ and U′ is the
translation space of ℓ′ then necessarily U + U′ ⊆ W. On the other hand,
since ℓ and ℓ′ are not parallel, U ≠ U′ and hence U ⊕ U′ = W. In
particular, if (u) is a basis of U and (u′) is a basis of U′, then (u, u′) is a
basis of W.
Let r be the unique line that is incident and orthogonal to both ℓ and
ℓ′, and let Ur = ⟨ur⟩ be its translation space. Since r ⊥ ℓ and r ⊥ ℓ′, we
have that ur • u = 0 and ur • u′ = 0; it follows that ur ∈ W⊥. Namely,
r ⊥ π. Let P = ℓ ∩ r and Q = ℓ′ ∩ r. Then Q is the orthogonal projection
of P onto π′, because it is the intersection of π′ with the unique line
orthogonal to π′ passing through P, that is r. Hence d(P, Q) is both the
distance from ℓ to ℓ′ and that of π from π′.

Example 7.2.12. Let us compute the distance between the two lines

$$\ell : \begin{cases} x + y = 0 \\ x - z = 1 \end{cases} \qquad \text{and} \qquad \ell' : \begin{cases} x - y = 0 \\ 2x + z = 0 \end{cases}$$

By solving the two systems one sees that the translation space W of ℓ is
generated by (1, −1, 1) while the translation space W′ of ℓ′ is generated
by (1, 1, −2). Hence

W⊥ = ⟨(1, 1, 0), (1, 0, −1)⟩ and W′⊥ = ⟨(1, −1, 0), (2, 0, 1)⟩.


To find W⊥ ∩ W′⊥, we need to solve the linear system

a(1, 1, 0) + b(1, 0, −1) = c(1, −1, 0) + d(2, 0, 1),

and we see easily that

W⊥ ∩ W′⊥ = ⟨(1, 3, 2)⟩.

Any line that is orthogonal and incident to ℓ must be contained in the
plane through ℓ that has translation space ⟨(1, −1, 1), (1, 3, 2)⟩. This has
equation:

π : 5x + y − 4z − 4 = 0,

and its intersection with ℓ′ is the point Q = (2/7, 2/7, −4/7). The line r
with direction (1, 3, 2) passing through Q has equation

$$r : \begin{cases} x = 2/7 + t \\ y = 2/7 + 3t \\ z = -4/7 + 2t \end{cases}$$

and its intersection with ℓ is the point P = (1/7, −1/7, −6/7). Hence we
have

$$d(\ell, \ell') = d(P, Q) = \sqrt{(2/7 - 1/7)^2 + (2/7 + 1/7)^2 + (-4/7 + 6/7)^2} = \frac{\sqrt{14}}{7} = \frac{\sqrt{2}}{\sqrt{7}}.$$
Now let us try to compute d(ℓ, ℓ′) using Proposition 7.2.11. First, we
need to find two parallel planes π, π′ such that ℓ ⊆ π and ℓ′ ⊆ π′. To do
this, notice that every plane of the form a(x + y) + b(x − z − 1) = 0, for
a, b ∈ R, contains ℓ, and every plane of the form c(x − y) + d(2x + z) = 0,
for c, d ∈ R, contains ℓ′. Rewrite these equations as:

(a + b)x + ay − bz − b = 0 and (c + 2d)x − cy + dz = 0.

Now impose parallelism, namely:

$$\begin{cases} a + b = c + 2d \\ a = -c \\ -b = d \end{cases}$$

This system yields:

$$\begin{cases} b = -(2/3)a \\ c = -a \\ d = (2/3)a \end{cases}$$

so that the parameters a = 3, b = −2, c = −3, d = 2 yield the two parallel
planes

π : x + 3y + 2z + 2 = 0 and π′ : x + 3y + 2z = 0.

Now pick a point on π′, such as (0, 0, 0), and use Theorem 7.2.7. We get:

$$d(\pi, \pi') = \frac{2}{\sqrt{1 + 9 + 4}} = \frac{2}{\sqrt{14}} = \frac{\sqrt{2}}{\sqrt{7}}.$$
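The agreement of the two methods can also be confirmed numerically with a Python sketch (ours, assuming numpy):

import numpy as np

# Feet of the common perpendicular found with the first method.
P = np.array([1/7, -1/7, -6/7])
Q = np.array([2/7,  2/7, -4/7])
print(np.linalg.norm(Q - P))   # 0.53452... = sqrt(14)/7

# Second method: distance of (0, 0, 0) on pi' from pi : x + 3y + 2z + 2 = 0.
a, a0 = np.array([1.0, 3.0, 2.0]), 2.0
print(abs(a0) / np.linalg.norm(a))   # 2/sqrt(14) = 0.53452..., the same value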


Chapter 8: Projective geometry


8.1 Equivalence relations

Definition 8.1.1. Let S be a set. A relation on S is a subset R of S × S.
A relation R ⊆ S × S is said to be:

1. reflexive if for every s ∈ S we have that (s, s) ∈ R;

2. symmetric if (s, t) ∈ R if and only if (t, s) ∈ R;

3. transitive if (s, t) ∈ R and (t, u) ∈ R implies that (s, u) ∈ R.

A relation that is reflexive, symmetric and transitive is called an equivalence relation.

Example 8.1.2.

• Let S be a set. The relation R = S × S is an equivalence relation.

• Let S be a set. The relation R = {(s, s) : s ∈ S} is an equivalence


relation.

• Let S = N. The relation R = {(s, t) ∈ N × N : s − t ≥ 0} is reflexive,
since s − s = 0 for every s ∈ N; it is transitive because if s − t ≥ 0
and t − u ≥ 0 then adding up the two inequalities it follows that
s − u ≥ 0, so that (s, u) ∈ R; but it is not symmetric, since for
example (2, 1) ∈ R but (1, 2) ∉ R.

• Let S = Z. The relation R = {(s, t) ∈ Z × Z : s · t ≤ 0} is clearly
symmetric, since s · t = t · s, but it is not reflexive, since for example
(2, 2) ∉ R, and it is not transitive since for example (1, −2) ∈ R,
(−2, 3) ∈ R but (1, 3) ∉ R.

• Let S = Z. The relation R = {(s, t) ∈ Z × Z : s + t is even} is an
equivalence relation: (s, s) ∈ R for every s ∈ Z since 2s is always
even; if s + t is even then so is t + s; and if s + t is even and t + u is
even, then (s + t) + (t + u) is also even. Since the latter sum is
s + u + 2t, s + u must be even as well.


Definition 8.1.3. Let S be a set and R be an equivalence relation. If


(s, t) ∈ R we write s ∼R t, or just s ∼ t when there is no risk of ambiguity.
We say that s is in relation with t.
Given s ∈ S, the set

[s] := {t ∈ S : s ∼ t}

is called the equivalence class of s.


If [s] is an equivalence class, an element of [s] is called a representative
of [s].

Remark 8.1.4. Equivalence classes are never empty because equivalence


relations are reflexive, and hence s ∈ [s] for every s ∈ S.

Proposition 8.1.5. Let S be a set and R be an equivalence relation on S.


Let s, t ∈ S. Then either [s] = [t] or [s] ∩ [t] = ∅.

Proof. Suppose that [s] ∩ [t] ≠ ∅, so that there exists u ∈ [s] ∩ [t]. Now
let v ∈ [s]. Then by definition v ∼ s. On the other hand u ∈ [s] as well,
so that u ∼ s. Since the relation R is symmetric and transitive, it follows
that v ∼ u. On the other hand u ∈ [t], and hence u ∼ t. Since the relation
R is transitive, it follows that v ∼ t, and hence v ∈ [t]. This shows that
[s] ⊆ [t]. A completely symmetric argument shows that [t] ⊆ [s], and
hence [s] = [t].

Proposition 8.1.5 shows that an equivalence relation on S defines a partition
of S. That is, we can “slice” S into pairwise disjoint equivalence classes,
so that every element of S belongs to exactly one equivalence class.

Definition 8.1.6. Let S be a set and R an equivalence relation on S. The


set S/ ∼ of equivalence classes with respect to R is called quotient set.

Example 8.1.7.

• Let S be a set and R = S ×S. Then every element of S is in relation


with every other element. Hence there is a single equivalence class,


i.e. S/ ∼ is a set with just one element.

• Let S be a set and R = {(s, s) : s ∈ S}. Then the equivalence class


of an element s contains only s. Hence the set S/ ∼ is in bijection
with S, since equivalence classes are in bijection with elements.

• Let S = Z and R = {(s, t) ∈ Z × Z : s + t is even}. Then if s ∈ Z is


even, it is in relation with every other even number, and it is not in
relation with any odd number. On the other hand if s is odd then it
is in relation with every other odd number but it is not in relation
with any even number. Hence there are just two equivalence classes:
that of even numbers and that of odd numbers. That is S/ ∼ is a
set with two elements (that we conventionally identify with 0 and
1). This quotient set is conventionally denoted by F2 (cf. Example
1.1.14).

8.2 Projective spaces


Let K be a field and let n ≥ 0 be a natural number. Consider the following
equivalence relation on the set S = K n+1 \ {0}. We let
R = {(v, w) ∈ S × S : ∃ λ ∈ K \ {0} s.t. λv = w}.
In other words, we consider two non-zero vectors of K n+1 to be equivalent if
they are proportional. Let us check that this is an equivalence relation.
• If (x1 , . . . , xn+1 ) ∈ S then
(x1 , . . . , xn+1 ) = 1 · (x1 , . . . , xn+1 ),
so the relation is reflexive.
• If λ(x1 , . . . , xn+1 ) = (y1 , . . . , yn+1 ) then since both vectors are non-zero
it must be λ 6= 0, and therefore
λ−1 (y1 , . . . , yn+1 ) = (x1 , . . . , xn+1 ),
so the relation is symmetric.
• If
λ(x1 , . . . , xn+1 ) = (y1 , . . . , yn+1 ) and µ(y1 , . . . , yn+1 ) = (z1 , . . . , zn+1 )
then
λµ(x1 , . . . , xn+1 ) = (z1 , . . . , zn+1 ),
and so the relation is transitive.


Definition 8.2.1. The quotient space K n+1 \ {0}/ ∼ is called projective


space of dimension n, and it is denoted by Pn (K).

Let us try to understand in detail elements of the projective space. The key
observation is the following: suppose that (x1, . . . , xn+1) ∈ Kⁿ⁺¹ \ {0} is such
that xn+1 ≠ 0. Then

$$(x_1, \ldots, x_{n+1}) \sim x_{n+1}^{-1}(x_1, \ldots, x_{n+1}) = (x_{n+1}^{-1}x_1, \ldots, x_{n+1}^{-1}x_n, 1).$$

That is, whenever the last entry of (x1, . . . , xn+1) is non-zero, the equivalence
class of (x1, . . . , xn+1) contains an element whose last coordinate is 1. On the
other hand, suppose that (x1, . . . , xn, 1), (y1, . . . , yn, 1) ∈ Kⁿ⁺¹ \ {0}. Then
these two elements are either equal or they are not in relation with each other.
In fact, if they were then there would be some non-zero λ ∈ K such that

λ(x1, . . . , xn, 1) = (y1, . . . , yn, 1),

but λ(x1, . . . , xn, 1) = (λx1, . . . , λxn, λ), and if this equals (y1, . . . , yn, 1) then
necessarily λ = 1. But then (x1, . . . , xn, 1) = (y1, . . . , yn, 1).
Hence we can easily prove the following proposition.
Hence we can easily prove the following proposition.

Proposition 8.2.2. There exists a bijection between Kⁿ and the set of
equivalence classes of elements (x1, . . . , xn+1) ∈ Kⁿ⁺¹ \ {0} with xn+1 ≠ 0.

Proof. Consider the function

$$\varphi : K^n \to \mathbb{P}^n(K), \qquad (x_1, \ldots, x_n) \mapsto [(x_1, \ldots, x_n, 1)].$$

As we have seen above, this map is injective. On the other hand, if
[X] ∈ Pn(K) is the equivalence class of an element (x1, . . . , xn+1) with
xn+1 ≠ 0, then

$$\varphi(x_{n+1}^{-1}x_1, \ldots, x_{n+1}^{-1}x_n) = [(x_{n+1}^{-1}x_1, \ldots, x_{n+1}^{-1}x_n, 1)] = [X],$$

so every such equivalence class [X] is in the image of φ.
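The normalization in this proof is purely mechanical. A Python sketch (ours, not from the notes) computing the canonical representative with last coordinate 1, for points with rational coordinates:

from fractions import Fraction

def normalize(point):
    # Representative of [point] ending in 1 (integer coordinates assumed).
    *xs, last = point
    if last == 0:
        raise ValueError("improper point: last coordinate is zero")
    return tuple(Fraction(v, last) for v in xs) + (Fraction(1),)

print(normalize((2, 4, 2)))                           # (1, 2, 1)
print(normalize((2, 4, 2)) == normalize((3, 6, 3)))   # True: same point of P^2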

Definition 8.2.3. Equivalence classes [(x1, . . . , xn+1)] ∈ Pn(K) with xn+1 ≠ 0
are called proper points of Pn(K).


Now let us focus our attention on equivalence classes of elements of Kⁿ⁺¹ \
{0} of the form (x1, . . . , xn, 0). Notice that the subset of all such elements is in
bijection with Kⁿ \ {0}, because since the last coordinate is 0 it must be
(x1, . . . , xn) ≠ (0, . . . , 0). Now if (x1, . . . , xn, 0), (y1, . . . , yn, 0) ∈ Kⁿ⁺¹ \ {0},
these are in relation with each other exactly when (x1, . . . , xn) and (y1, . . . , yn)
are in relation with each other as elements of Kⁿ \ {0}, because

λ(x1, . . . , xn, 0) = (y1, . . . , yn, 0) ⟺ λ(x1, . . . , xn) = (y1, . . . , yn).

But equivalence classes of Kⁿ \ {0} are, by definition, elements of Pⁿ⁻¹(K).
We have therefore proved the following proposition.

Proposition 8.2.4. There exists a bijection between Pn−1 (K) and the set
of equivalence classes of elements (x1 , . . . , xn+1 ) ∈ K n+1 \ {0} with xn+1 =
0.

Definition 8.2.5. Equivalence classes [(x1 , . . . , xn+1 )] ∈ Pn (K) with xn+1 =


0 are called improper points of Pn (K).

From now on, we will denote equivalence classes of the form [(x1 , . . . , xn+1 )]
by (x1 : x2 : . . . : xn+1 ).
We have seen that the space Pn (K) decomposes into two parts: the proper
points and the improper points. Let us now understand them more in detail
in the cases n = 1, 2, 3.
For n = 1, the proper points are those of the form (x : 1), where x is an
element of K; hence they are in bijection with K. What
are improper points? They are points of the form (x : 0), where x is a non-zero
element of K. But of course any two elements (x, 0), (y, 0) ∈ K² \ {0}
are in relation with each other, since x⁻¹y(x, 0) = (y, 0). Therefore there is
a unique improper point, that is (1 : 0). Hence P1(K) is the union of K and
an improper point. This should be thought of as a “point at infinity”, and it is
sometimes denoted by ∞. Hence we have, as sets, P1(K) = K ∪ {∞}.
On the other hand, K can be thought of as the set of points of the affine
space A1(K), and hence we can write

P1(K) = A1(K) ∪ {∞}.

In other words, the projective space of dimension 1, which is called the projective
line, is the union of the affine line and an extra point “at infinity”.


Remark 8.2.6. Another way we can think about points in P1(K), without
distinguishing between proper and improper points, is as 1-dimensional
vector subspaces of K². In fact, every non-zero (x, y) ∈ K² generates
a 1-dimensional subspace, which coincides with the subspace generated by
λ(x, y), for every λ ∈ K \ {0}. Conversely, every 1-dimensional subspace is
generated by a non-zero vector of K². Therefore there is a bijection

P1(K) → {1-dimensional vector subspaces of K²}.

The projective space P2(K) is called the projective plane. Its proper points are
in bijection with K², and in turn they are in bijection with points of the form
(x : y : 1), where (x, y) ∈ K². Since K² is the set of points of A2(K), we can
say that proper points are in bijection with A2(K). Improper points are points
of the form (x : y : 0), and they are in bijection with P1(K). Thanks to Remark
8.2.6, we can think of these as 1-dimensional subspaces of K². Now every line
in A2(K) has a translation space, which is simply a 1-dimensional subspace of
K². Hence improper points of P2(K) can be thought of as translation spaces of
the lines in A2(K). In other words there is a bijection

P2(K) → A2(K) ∪ {translation spaces of lines in A2(K)}.

Hence the projective plane should be thought of as the affine plane A2(K) together
with some “extra points” that represent directions of the lines in A2(K).
This view extends to higher dimensions as well, since (Kⁿ \ {0})/∼ can
always be thought of as the set of 1-dimensional subspaces of Kⁿ. Hence for
example there is a bijection

P3(K) → A3(K) ∪ {translation spaces of lines in A3(K)}.

8.3 Linear subspaces

Definition 8.3.1. A linear subspace of dimension m in Pn (K) is a subset


of the form

S = {(x1 : . . . : xn+1 ) ∈ Pn (K) : (x1 , . . . , xn+1 ) ∈ ker A},

where A ∈ M(n+1−m)×(n+1) (K) is a matrix of rank n + 1 − m.

In other words, a linear subspace of dimension m is the set of equivalence


classes of vectors in the kernel of an (n + 1 − m) × (n + 1) matrix of rank


n + 1 − m, with 0 removed. Similarly to what we do for affine spaces, we call
a linear subspace of dimension 1 a line, a linear subspace of dimension 2 a
plane, and a linear subspace of dimension n − 1 a hyperplane.

Example 8.3.2.

• A line in P2(K) is a subset of the form

ℓ = {(x : y : z) ∈ P2(K) : ax + by + cz = 0},

where a, b, c ∈ K are not all 0.

• A plane in P3(K) is a subset of the form

π = {(x : y : z : t) ∈ P3(K) : ax + by + cz + dt = 0},

where a, b, c, d ∈ K are not all 0.

• A line in P3(K) is a subset of the form

ℓ = {(x : y : z : t) ∈ P3(K) : ax + by + cz + dt = a′x + b′y + c′z + d′t = 0},

where a, b, c, d, a′, b′, c′, d′ ∈ K and
$$\mathrm{rk}\begin{pmatrix} a & b & c & d \\ a' & b' & c' & d' \end{pmatrix} = 2.$$
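Membership in such a subspace is a kernel computation, exactly as the definition suggests. A minimal Python sketch (ours, assuming numpy; the coefficient matrix is example data):

import numpy as np

def on_subspace(A, point, tol=1e-12):
    # True if the chosen representative of the point lies in ker A.
    return np.linalg.norm(np.asarray(A, float) @ np.asarray(point, float)) < tol

A = [[1, 1, 0, 1],    # a line in P^3: two independent linear equations
     [1, -1, 0, 1]]
print(on_subspace(A, (1, 0, 0, -1)))   # True
print(on_subspace(A, (1, 1, 1, 1)))    # False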

Geometrically, one can define a surjective map

$$\psi : K^{n+1} \setminus \{0\} \to \mathbb{P}^n(K), \qquad (x_1, \ldots, x_{n+1}) \mapsto (x_1 : \ldots : x_{n+1}).$$

Namely, each non-zero vector in Kⁿ⁺¹ is mapped to its equivalence class in
the quotient set.
Now if we think of Kⁿ⁺¹ as the set of points of the affine space Aⁿ⁺¹(K),
then a linear subspace of Pn(K) of dimension m is simply the image, via ψ, of
a linear subspace of Aⁿ⁺¹(K) of dimension m + 1 passing through (0, 0, . . . , 0),
with (0, 0, . . . , 0) removed.
Let now ℓ : ax + by + cz = 0 be a line in P2(K). We have seen that
P2(K) = A2(K) ∪ {directions of lines in A2(K)}.
Hence we can ask what are the sets
ℓ ∩ A2(K) and ℓ ∩ {directions of lines in A2(K)}
or, in other words, what are the proper points of ℓ and what are the improper
points. Let us start by assuming that (a, b) ≠ (0, 0). To find proper points, we
just have to find which points (x0 : y0 : 1) of P2(K) satisfy ax0 + by0 + c = 0.
These are clearly in bijection with points on the affine line ax + by + c = 0.
On the other hand, improper points of ℓ are points of the form (x0 : y0 : 0)
that satisfy

ax0 + by0 = 0.

It is immediate to see that there is just one such point, and it is (−b : a : 0).
Moreover, the space ⟨(−b, a)⟩ is precisely the translation space of the line
ax + by + c = 0. All in all, we have proven that the line ax + by + cz = 0 has
two types of points: its proper points are essentially points of the affine line
ax + by + c = 0, its unique improper point is (−b : a : 0), and it represents the
direction of the line ax + by + c = 0 in A2(K).
When (a, b) = (0, 0) the line ℓ becomes z = 0.

Definition 8.3.3. The line z = 0 in P2 (K) is called improper line.

The improper line contains no proper points but it contains all improper
points of P2 (K).
An analogous reasoning can be applied to planes in P3(K). If π : ax + by +
cz + dt = 0 is a plane with (a, b, c) ≠ (0, 0, 0), its proper points are essentially
the points on the affine plane ax + by + cz + d = 0, while its improper points
are the points of the form (x0 : y0 : z0 : 0) that satisfy ax0 + by0 + cz0 = 0.
Of course the vectors (x0, y0, z0) ∈ K³ that satisfy such a relation constitute the
kernel of the matrix (a b c), and this is the translation space of the plane
ax + by + cz = 0 in A3(K). Therefore improper points of π correspond to
directions of the lines that are contained in the plane ax + by + cz = 0.
When (a, b, c) = (0, 0, 0), the plane π becomes t = 0.

Definition 8.3.4. The plane t = 0 in P3 (K) is called improper plane.

The improper plane contains no proper points but it contains all improper
points of P3(K).
The above discussion can also be reversed. Namely, given a linear subspace
S of An(K), we can find a linear subspace S′ of Pn(K) such that S′ ∩ An(K) =
S. This is very easy to do: all we need to do is to homogenize the equations of
the subspace. A subspace of An(K) is given by a system of linear equations,
in which all variables x1, . . . , xn appear to the first degree. If we think of the
solutions as the proper points of a linear subspace of Pn(K), this means that each
variable xi has to be thought of as xi/xn+1. For instance, if ax + by + c = 0 is
the equation of a line in A2(K), then all we need to do is to replace x with
x/z and y with y/z. The equation then becomes ax/z + by/z + c = 0, and
multiplying by z we get ax + by + cz = 0, that is the equation of a line in
P2(K).
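This recipe is easy to automate; the following Python sketch (ours, assuming sympy) performs the substitution and clears denominators:

import sympy as sp

x, y, z = sp.symbols("x y z")

def homogenize(f, proj_var=z, affine_vars=(x, y)):
    # Substitute v -> v/z for each affine variable, then take the numerator.
    g = sp.together(f.subs({v: v / proj_var for v in affine_vars}))
    num, _den = sp.fraction(g)
    return sp.expand(num)

print(homogenize(2*x + 3*y - 5))              # 2*x + 3*y - 5*z
print(homogenize(2*x + 3*y - 5).subs(z, 1))   # 2*x + 3*y - 5: the affine line back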

8.4 Equations of linear subspaces


As it happens for affine spaces, projective lines and planes have parametric
equations, too. Given two distinct points P = (xP : yP : zP), Q = (xQ : yQ :
zQ) ∈ P2(K), a parametric equation of the line through P and Q is:

$$\ell : \begin{cases} x = \lambda x_P + \mu x_Q \\ y = \lambda y_P + \mu y_Q \\ z = \lambda z_P + \mu z_Q \end{cases}$$

here one should think of λ and µ as varying parameters that cannot be both
zero. In order to obtain a cartesian equation for ℓ one just has to impose that

$$\det\begin{pmatrix} x & y & z \\ x_P & y_P & z_P \\ x_Q & y_Q & z_Q \end{pmatrix} = 0.$$

Conversely, given a line ax + by + cz = 0, in order to find a parametric equation
we just solve the linear system. Assuming without loss of generality that a ≠ 0
we get that x = −(b/a)y − (c/a)z, so that the resulting parametric equation
is:

$$\begin{cases} x = \lambda \cdot (-b/a) + \mu \cdot (-c/a) \\ y = \lambda \\ z = \mu \end{cases}$$

Given two distinct points P = (xP : yP : zP : tP), Q = (xQ : yQ : zQ : tQ) ∈
P3(K), the parametric equation of the line through them is:

$$\begin{cases} x = \lambda x_P + \mu x_Q \\ y = \lambda y_P + \mu y_Q \\ z = \lambda z_P + \mu z_Q \\ t = \lambda t_P + \mu t_Q \end{cases}$$

to recover a cartesian equation we have to impose that

$$\mathrm{rk}\begin{pmatrix} x & y & z & t \\ x_P & y_P & z_P & t_P \\ x_Q & y_Q & z_Q & t_Q \end{pmatrix} = 2.$$

Conversely, given a cartesian equation of the form ax + by + cz + dt = 0 =
a′x + b′y + c′z + d′t for a line in P3(K), to find a parametric equation we just
need to solve the system.

Example 8.4.1.

• Let x + y − 2z = 0 be a line in P2(R). To find a parametric equation,
simply notice that x = −y + 2z, so we get

$$\begin{cases} x = -\lambda + 2\mu \\ y = \lambda \\ z = \mu \end{cases}$$

• Let

$$\begin{cases} x = \lambda - \mu \\ y = 2\lambda \\ z = -\lambda + 2\mu \end{cases}$$

be a parametric equation for a line in P2(R). To find a cartesian
equation, we let

$$\det\begin{pmatrix} x & y & z \\ 1 & 2 & -1 \\ -1 & 0 & 2 \end{pmatrix} = 4x - y + 2z = 0.$$

• Let

$$\begin{cases} x = 2\lambda + \mu \\ y = 2\lambda - \mu \\ z = \mu \\ t = \lambda \end{cases}$$

be a parametric equation for a line in P3(R). To find a cartesian
equation, we have to impose

$$\mathrm{rk}\begin{pmatrix} x & y & z & t \\ 2 & 2 & 0 & 1 \\ 1 & -1 & 1 & 0 \end{pmatrix} = 2.$$

To do this, we can for example select the invertible 2 × 2 submatrix
formed by the last two rows and the third and fourth columns, and
use Theorem 2.3.17. This way we get:

$$\det\begin{pmatrix} x & z & t \\ 2 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} = \det\begin{pmatrix} y & z & t \\ 2 & 0 & 1 \\ -1 & 1 & 0 \end{pmatrix} = 0,$$

which in turn yields the cartesian equations:

$$\begin{cases} x - z - 2t = 0 \\ y + z - 2t = 0 \end{cases}$$
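The bordered-minor computation of the last bullet point is easily reproduced in Python (a sketch of ours, assuming sympy):

import sympy as sp

x, y, z, t = sp.symbols("x y z t")
M = sp.Matrix([[x, y, z, t],
               [2, 2, 0, 1],
               [1, -1, 1, 0]])

# Border the invertible 2x2 block in the last two rows, columns z and t.
eq1 = M.extract([0, 1, 2], [0, 2, 3]).det()   # columns x, z, t
eq2 = M.extract([0, 1, 2], [1, 2, 3]).det()   # columns y, z, t
print(sp.expand(eq1))   # -x + z + 2*t, i.e. x - z - 2t = 0
print(sp.expand(eq2))   # -y - z + 2*t, i.e. y + z - 2t = 0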
Analogously to the case of lines, given three distinct points P = (xP : yP :
zP : tP), Q = (xQ : yQ : zQ : tQ) and R = (xR : yR : zR : tR) in P3(K) that
do not lie on the same line, a parametric equation of the plane that contains
them is:

$$\begin{cases} x = \lambda x_P + \mu x_Q + \nu x_R \\ y = \lambda y_P + \mu y_Q + \nu y_R \\ z = \lambda z_P + \mu z_Q + \nu z_R \\ t = \lambda t_P + \mu t_Q + \nu t_R \end{cases}$$

To obtain a cartesian equation, just impose

$$\det\begin{pmatrix} x & y & z & t \\ x_P & y_P & z_P & t_P \\ x_Q & y_Q & z_Q & t_Q \\ x_R & y_R & z_R & t_R \end{pmatrix} = 0.$$

Conversely, given a cartesian equation such as ax + by + cz + dt = 0, to obtain a
parametric one we just have to solve the linear system. If a ≠ 0, for example,
we get x = −(b/a)y − (c/a)z − (d/a)t, so that

$$\begin{cases} x = \lambda \cdot (-b/a) + \mu \cdot (-c/a) + \nu \cdot (-d/a) \\ y = \lambda \\ z = \mu \\ t = \nu \end{cases}$$

8.5 Relative position of linear subspaces

Lemma 8.5.1. Two lines in P2(K) always have non-empty intersection.
The same holds true for a plane and a line in P3(K) and for two planes in
P3(K).

Proof. Let ℓ : ax + by + cz = 0 and ℓ′ : a′x + b′y + c′z = 0 be two lines in
P2(K). Their intersection is given by the system

$$\begin{cases} ax + by + cz = 0 \\ a'x + b'y + c'z = 0 \end{cases}$$

This is a homogeneous system of 2 equations in 3 variables, and therefore
the set of its solutions is a vector subspace of dimension at least 1. It
follows that it always contains at least one non-zero solution (x0, y0, z0). This
gives rise to a point (x0 : y0 : z0) ∈ P2(K) that lies on the intersection of
the two lines.
If π : ax + by + cz + dt = 0 and π′ : a′x + b′y + c′z + d′t = 0 are two
planes in P3(K), their intersection is given by the system

$$\begin{cases} ax + by + cz + dt = 0 \\ a'x + b'y + c'z + d't = 0 \end{cases}$$

This is a homogeneous system of 2 equations in 4 variables, and hence the
set of its solutions is a vector subspace of dimension at least 2. Therefore
it always contains a non-zero solution.
Similarly, if ℓ : ax + by + cz + dt = a′x + b′y + c′z + d′t = 0 is a line and
π : a″x + b″y + c″z + d″t = 0 is a plane, the system that determines ℓ ∩ π
is a homogeneous system of 3 equations in 4 variables, and therefore its
set of solutions is a vector space of dimension at least 1.


Example 8.5.2.

• Let ℓ : x + 2y + z = 0 and ℓ′ : x − y − z = 0 be two lines in P2(R),
and let us determine their intersection. To do this, we need to solve
the linear system

$$\begin{cases} x + 2y + z = 0 \\ x - y - z = 0 \end{cases}$$

The set of solutions of this system is

{(s, −2s, 3s) : s ∈ R}

or, in other words, it is the vector subspace of R³ generated by
(1, −2, 3). Notice that since this is 1-dimensional, all non-zero vectors
in this subspace belong to the same equivalence class, that is
that of (1, −2, 3). This simply means that ℓ ∩ ℓ′ = {(1 : −2 : 3)}:
the two lines intersect in a proper point.

• Let π : x + y + t = 0 and π′ : x − y + t = 0 be two planes in P3(R).
To find their intersection, we need to solve the linear system

$$\begin{cases} x + y + t = 0 \\ x - y + t = 0 \end{cases}$$

The matrix representing this linear system, namely
$$\begin{pmatrix} 1 & 1 & 0 & 1 \\ 1 & -1 & 0 & 1 \end{pmatrix},$$
has rank 2. Therefore, this system of two equations represents a line
in P3(R), which is exactly the intersection of π and π′.
Of course one can also find the solutions to the system. The set of
solutions is a 2-dimensional subspace of R⁴ and a basis is, for example,
((1, 0, 0, −1), (0, 0, 1, 0)). It follows that a parametric equation
for π ∩ π′ is:

$$\begin{cases} x = \lambda \\ y = 0 \\ z = \mu \\ t = -\lambda \end{cases}$$


• Let ℓ : x + y + t = x − y + t = 0 be the line of the previous point
and let π : x + 3y + 2t = 0 be a plane. In order to find ℓ ∩ π, we need
to solve the system:

$$\begin{cases} x + y + t = 0 \\ x - y + t = 0 \\ x + 3y + 2t = 0 \end{cases}$$

whose set of solutions is a 1-dimensional vector subspace of R⁴
generated by the vector (0, 0, 1, 0). This means that ℓ ∩ π is the
point (0 : 0 : 1 : 0). Notice that since its last coordinate is
0, this is an improper point. In fact, the affine part of ℓ is the
line ℓ′ : x + y + 1 = x − y + 1 = 0, while the affine part of π is
π′ : x + 3y + 2 = 0. In A3(R), ℓ′ and π′ are parallel and ℓ′ does
not lie on π′, and hence they have no intersection. In the projective
space instead, these two subspaces have an improper intersection
point, that represents the direction of the line ℓ′.
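All three intersections above are nullspace computations, and can be checked with a short Python sketch (ours, assuming sympy):

import sympy as sp

A = sp.Matrix([[1, 2, 1],     # x + 2y + z = 0
               [1, -1, -1]])  # x -  y - z = 0
print(A.nullspace())          # a multiple of (1, -2, 3): the point (1 : -2 : 3)

B = sp.Matrix([[1, 1, 0, 1],   # x +  y +  t = 0
               [1, -1, 0, 1],  # x -  y +  t = 0
               [1, 3, 0, 2]])  # x + 3y + 2t = 0
print(B.nullspace())           # [(0, 0, 1, 0)]: the improper point (0 : 0 : 1 : 0)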

If we start with two parallel linear subspaces in A2(K) or A3(K) that
have empty intersection and we extend them to the projective space P2(K) or
P3(K), Lemma 8.5.1 tells us that the extensions have non-empty intersection. Hence
their intersection points must belong to the improper line of P2(K) or the
improper plane of P3(K). In fact, if for example ax + by + c = 0 and ax +
by + c′ = 0 are two parallel lines with empty intersection in A2(K), once we
homogenize their equations we obtain the projective lines ax + by + cz = 0
and ax + by + c′z = 0. Their intersection is the improper point (−b : a : 0).
That is, two parallel lines in A2(K) meet in P2(K) in the improper point that
represents their direction.
Similarly, two distinct parallel planes π : ax + by + cz + d = 0 and
π′ : ax + by + cz + d′ = 0 in A3(K) meet in P3(K) in the line given by the
equations ax + by + cz = 0, t = 0. This is a line contained in the improper
plane, whose points represent the directions of the lines contained in π (or π′,
which is the same since they are parallel).
Finally, a line and a plane in A3(K) that are parallel and distinct meet in
P3(K) in the improper point that represents the direction of the line.


8.6 Projective pencils and bundles


The concepts of pencils and bundles of lines and planes that we encountered
in Chapter 6 can be revisited in the projective setting.

Definition 8.6.1.

1. A pencil of lines in P2 (K) is the set of all lines through a given point
of P2 (K).

2. A pencil of planes in P3 (K) is the set of all planes that contain a


given line.

3. A bundle of lines in P3 (K) is the set of all lines through a given


point of P3 (K).

4. A bundle of planes in P3 (K) is the set of all planes through a given


point of P3 (K).

Proposition 8.6.2.

1. Let P ∈ P2(K) and let r : ax + by + cz = 0 and s : a′x + b′y + c′z = 0
be two distinct lines through P. Then the pencil of lines through
P ∈ P2(K) is given by the equation

λ(ax + by + cz) + µ(a′x + b′y + c′z) = 0,

where (λ, µ) ≠ (0, 0).

2. Let ℓ ⊆ P3(K) be a line and let r : ax + by + cz + dt = 0 and
s : a′x + b′y + c′z + d′t = 0 be two distinct planes containing ℓ. Then
the pencil of planes in P3(K) containing ℓ is given by the equation

λ(ax + by + cz + dt) + µ(a′x + b′y + c′z + d′t) = 0,

where (λ, µ) ≠ (0, 0).

In the projective space there is no distinction between proper and improper
pencils, since the concept of parallelism does not make sense anymore. On the
other hand, once we have a pencil of lines/planes in the affine space, we can
homogenize the equation of the pencil, obtaining a set of lines or planes in P2(K) or
P3(K). If we start with a proper pencil, we end up with a pencil in the
projective space.


Example 8.6.3.

• Consider the pencil of lines through the point (1, 1) ∈ A2(K). This
is given by the equation:

λ(x − 1) + µ(y − 1) = 0,

and homogenizing this we get:

λ(x − z) + µ(y − z) = 0.   (36)

Since x − z = 0 and y − z = 0 are two lines in P2(K) through
(1 : 1 : 1), by Proposition 8.6.2, equation (36) is that of the pencil
of lines through (1 : 1 : 1).

• Consider the pencil of planes through the line x − y − 1 = z − 2 = 0
in A3(K). This is given by the equation:

λ(x − y − 1) + µ(z − 2) = 0.

Homogenizing, we get

λ(x − y − t) + µ(z − 2t) = 0,

that is the pencil of planes through the line x − y − t = z − 2t = 0
in P3(K).

What happens if, on the other hand, we start with an improper pencil of
lines/planes? Suppose first we have an improper pencil of lines in A2(K), that
is given by the equation:

ax + by + µ = 0,

where (a, b) ∈ K² \ {(0, 0)} are given coefficients and µ ∈ K is a varying
parameter. Homogenizing this, we get:

ax + by + µz = 0.   (37)

Now this closely recalls the equation of a pencil of lines in P2(K), except for
the fact that we have only one parameter µ. However, consider the equation

λ(ax + by) + µz = 0,   (38)

where λ, µ ∈ K are not both zero. This is precisely the equation of a pencil
of lines in P2(K). The lines z = 0 and ax + by = 0 intersect in the point
(−b : a : 0), so (38) is an equation for the pencil of lines through (−b : a : 0).
Notice that this is the improper point corresponding to the direction of the
affine line ax + by + µ = 0!
How different is this from (37)? Not much: if λ ≠ 0, we can divide (38)
by λ, obtaining a line that belongs to the family (37). If λ = 0, on the other
hand, we obtain the line z = 0, which does not appear in (37) for any value of
µ. Hence (38) only differs from (37) by the improper line. We have therefore
understood what improper pencils represent in the projective space: they are
simply pencils of lines through an improper point, whence the name.
Similarly, consider an improper pencil of planes in A3 (K) given by

ax + by + cz + µ = 0,

where (a, b, c) ∈ K 3 is not the zero vector and µ is a varying parameter.


Mimicking the above argument, we can consider the pencil of projective planes:

λ(ax + by + cz) + µt = 0.

This is the pencil defined by the planes ax+by+cz = 0 and t = 0, that intersect
(as already explained in this chapter) in the line ax + by + cz = t = 0, that
lies on the improper plane and contains all points corresponding to directions
of lines lying on ax + by + cz = 0. This explains what an improper pencil of
planes becomes in the projective space: it simply becomes the pencil of planes
that contain the line ax + by + cz = t = 0, plus the improper plane itself.
This is why in the projective space there is no distinction between proper
and improper pencils: they are defined exactly in the same way, but the latter’s
support is contained in the improper line/plane.

Proposition 8.6.4. Let P = (xP : yP : zP : tP) ∈ P3(K).

1. The bundle of lines through P has equation:

$$\begin{cases} x = \lambda x_P + \mu v_1 \\ y = \lambda y_P + \mu v_2 \\ z = \lambda z_P + \mu v_3 \\ t = \lambda t_P + \mu v_4 \end{cases}$$

where (v1, v2, v3, v4) ≠ (0, 0, 0, 0).

2. Let π1, π2, π3 be pairwise distinct planes passing through P such that
π1 ∩ π2 ∩ π3 = {P}. Then the bundle of planes through P is given by

λπ1 + µπ2 + νπ3 = 0,

where (λ, µ, ν) ≠ (0, 0, 0).

Of course, the same argument we used for pencils applies to bundles as well:
in the projective space there is no distinction between proper and improper
bundles because an improper bundle of lines/planes is simply a bundle of
lines/planes passing through an improper point.

8.7 Real and imaginary points


In this section, we focus on the case K = C.

Definition 8.7.1. A point P = (x1 : . . . : xn+1) ∈ Pn(C) is called real if
there exists λ ∈ C such that λxi ∈ R for every i = 1, . . . , n + 1. If P is
not real, then it is imaginary.
The conjugate of P is the point P̄ = (x̄1 : . . . : x̄n+1).

Example 8.7.2. The point (i : i : i) ∈ P2(C) is real, since multiplying its
entries by −i we obtain (1 : 1 : 1) ∈ P2(R). The point (i : 1) ∈ P1(C) is
imaginary, since if there were λ ∈ C such that λi, λ ∈ R, then in particular
λ ∈ R and therefore λi ∉ R.

Note:-
The notion of conjugate is well-defined, that is, it does not depend on the
representative of the equivalence class of P. In fact, suppose P = (x1 : . . . :
xn+1) ∈ Pn(C) and let λ ∈ C \ {0}, so that P = (λx1 : . . . : λxn+1). Then

$$\overline{(\lambda x_1 : \ldots : \lambda x_{n+1})} = (\bar\lambda \bar x_1 : \ldots : \bar\lambda \bar x_{n+1}) = \bar\lambda(\bar x_1 : \ldots : \bar x_{n+1}) = \overline{P}.$$

Lemma 8.7.3. A point P = (x1 : . . . : xn+1) ∈ Pn(C) is real if and only
if P̄ = P.

Proof. If P is real, then by definition we can write P = (y1 : . . . : yn+1)
with yi ∈ R for every i, and therefore

$$\overline{P} = (\bar y_1 : \ldots : \bar y_{n+1}) = (y_1 : \ldots : y_{n+1}) = P.$$

Conversely, suppose that P̄ = P. Let i ∈ {1, . . . , n + 1} be such that
xi ≠ 0. Then

$$P = (x_i^{-1}x_1 : x_i^{-1}x_2 : \ldots : x_i^{-1}x_{i-1} : 1 : x_i^{-1}x_{i+1} : \ldots : x_i^{-1}x_{n+1})$$

and

$$\overline{P} = (\bar x_i^{-1}\bar x_1 : \bar x_i^{-1}\bar x_2 : \ldots : \bar x_i^{-1}\bar x_{i-1} : 1 : \bar x_i^{-1}\bar x_{i+1} : \ldots : \bar x_i^{-1}\bar x_{n+1}).$$

Since these points are equal, by definition there exists λ ∈ C such that

$$\lambda(x_i^{-1}x_1, \ldots, x_i^{-1}x_{i-1}, 1, x_i^{-1}x_{i+1}, \ldots, x_i^{-1}x_{n+1}) = (\bar x_i^{-1}\bar x_1, \ldots, \bar x_i^{-1}\bar x_{i-1}, 1, \bar x_i^{-1}\bar x_{i+1}, \ldots, \bar x_i^{-1}\bar x_{n+1}).$$

The i-th entry of the left hand side is λ, while the i-th entry of the right
hand side is 1. Therefore λ = 1, and $x_i^{-1}x_j = \overline{x_i^{-1}x_j}$ for every j. This
means that $x_i^{-1}x_j$ ∈ R for every j ∈ {1, . . . , n + 1}, and hence P is real.
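Lemma 8.7.3 translates directly into a computational test: normalize by a non-zero coordinate and check that every entry becomes real. A Python sketch (ours, assuming sympy):

import sympy as sp

def is_real_point(coords):
    # True if the projective point with these coordinates is real.
    coords = [sp.sympify(c) for c in coords]
    i = next(k for k, c in enumerate(coords) if c != 0)
    return all(sp.im(c / coords[i]) == 0 for c in coords)

print(is_real_point((sp.I, sp.I, sp.I)))   # True: this point equals (1 : 1 : 1)
print(is_real_point((sp.I, 1)))            # False: imaginary point of P^1(C)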

Definition 8.7.4. Let L ⊆ Pn(C) be a linear subspace of dimension m,
defined by the kernel of an (n + 1 − m) × (n + 1) matrix A of rank n + 1 − m.
We say that L is real if

ker Ā = ker A,

where Ā is the matrix whose entries are the conjugates of the entries of
A. If L is not real, then we say that it is imaginary.
The conjugate of L is the linear subspace L̄ defined by ker Ā.

Lemma 8.7.5. Let L ⊆ Pn(C) be defined by a matrix A. Then L is real
if and only if for every P ∈ Pn(C) we have

P ∈ L ⟺ P̄ ∈ L.

Proof. First suppose that L is real and let P = (x1 : . . . : xn+1) ∈
Pn(C). Then P ∈ L if and only if

$$A\begin{pmatrix} x_1 \\ \vdots \\ x_{n+1} \end{pmatrix} = 0,$$

i.e. if and only if (x1, . . . , xn+1) ∈ ker A. Since L is real, ker Ā = ker A, so this last
condition is equivalent to (x1, . . . , xn+1) ∈ ker Ā. Taking conjugates, this
becomes equivalent to (x̄1, . . . , x̄n+1) ∈ ker A, i.e. to P̄ ∈ L.
Conversely, suppose that P ∈ L if and only if P̄ ∈ L. Suppose that
(x1, . . . , xn+1) ∈ ker A. Then (x1 : . . . : xn+1) ∈ L, and hence (x̄1 :
. . . : x̄n+1) ∈ L. This implies that (x̄1, . . . , x̄n+1) ∈ ker A, and taking
conjugates (x1, . . . , xn+1) ∈ ker Ā. Therefore ker A ⊆ ker Ā. The
symmetric argument shows that ker Ā ⊆ ker A.

Proposition 8.7.6. In P2(C), the following hold true.

1. A line ℓ : ax + by + cz = 0 is real if and only if (a : b : c) is a real
point of P2(C).

2. A line through two imaginary conjugate points is real.

3. If P is imaginary, there exists a unique real line ℓ with P ∈ ℓ.

4. If ℓ is an imaginary line, then ℓ ∩ ℓ̄ is a single real point.

5. If ℓ is an imaginary line, then ℓ contains precisely one real point.


 
Proof. 1. ` is defined by ker A, with A = a b c .
If (a : b : c) is a real point of P2 (C), then λa, λb, λc ∈ R for some

non-zero λ ∈ C. Clearly ker A = ker(λA), where λA = λa λb λc .
Taking conjugates, ker A = ker(λA), but λA is a real matrix, and hence
the latter is ker λA. It follows that ker A = ker A, so that ` is real.
Conversely, suppose that ker A = ker A. Without loss of generality,
suppose that a 6= 0, so that another equation for ` is x + b0 y + c0 z = 0.
Assume by contradiction that (a : b : c) is an imaginary point: then at
least one between b0 and c0 is not real. Suppose it is b0 . Then (−b0 , 1, 0) ∈
ker A, and since ker A = ker A we must have (−b0 , 1, 0) ∈ ker A. But

186
Andrea Ferraguti Chapter 8: Projective geometry

 
A= 1 b0 c0 and hence
 
−b0
 
A  1  = −b0 + b0 ,
 
 
0

which is not 0 since b0 is not real. This contradicts the fact that ker A =
ker A, and hence (a : b : c) must be real.
2. Let P = (x1 : x2 : x3) ∈ P2(C) be an imaginary point, and let P̄ be
its conjugate. Let ℓ : ax + by + cz = 0 be a line through P and P̄. Then

$$\begin{cases} ax_1 + bx_2 + cx_3 = 0 \\ a\bar x_1 + b\bar x_2 + c\bar x_3 = 0 \end{cases}$$

Taking conjugates we get that

$$\begin{cases} \bar a\bar x_1 + \bar b\bar x_2 + \bar c\bar x_3 = 0 \\ \bar a x_1 + \bar b x_2 + \bar c x_3 = 0 \end{cases}$$

and therefore ℓ̄ also passes through P and P̄. However, since P is imaginary,
P and P̄ are two distinct points, and there exists a unique line that passes
through them. Hence ℓ̄ = ℓ, i.e. ℓ is real.
3. If ℓ : ax + by + cz = 0 is a real line (with a, b, c ∈ R) containing the
imaginary point P = (x1 : x2 : x3), then ax1 + bx2 + cx3 = 0, and taking
conjugates we get ax̄1 + bx̄2 + cx̄3 = 0. This means that ℓ passes through
P̄ as well; since P ≠ P̄, ℓ must be the line through P and P̄, and hence it
is the unique real line containing P.
4. Let ℓ : ax + by + cz = 0 be an imaginary line, so that ℓ̄ ≠ ℓ. If P = (x1 :
x2 : x3) ∈ P2(C) belongs to ℓ ∩ ℓ̄ then

$$\begin{cases} ax_1 + bx_2 + cx_3 = 0 \\ \bar a x_1 + \bar b x_2 + \bar c x_3 = 0 \end{cases}$$

and taking conjugates we get

$$\begin{cases} \bar a\bar x_1 + \bar b\bar x_2 + \bar c\bar x_3 = 0 \\ a\bar x_1 + b\bar x_2 + c\bar x_3 = 0 \end{cases}$$

showing that P̄ ∈ ℓ ∩ ℓ̄. But since ℓ ≠ ℓ̄ the intersection point is unique,
and so P̄ = P, that is, the intersection point is real.
5. If P ∈ ℓ is real, then P = P̄ ∈ ℓ̄, so that P ∈ ℓ ∩ ℓ̄; but ℓ ∩ ℓ̄ consists
of a unique point, which is real, by 4.

The proof of the next proposition is very similar to that of Proposition 8.7.6.
We omit it, but encourage the interested reader to write it down themselves.

Proposition 8.7.7. In P3(C), the following hold true.

1. A plane ax + by + cz + dt = 0 is real if and only if (a : b : c : d) is a
real point of P3(C).

2. If P ∈ P3(C) is imaginary, the line through P and P̄ is real.

3. If P ∈ P3(C) is imaginary, there exists a unique real line through P.

4. If a real line or a real plane contains an imaginary point, then it
contains the conjugate as well.

5. Two imaginary conjugate planes intersect in a real line.

6. An imaginary plane contains a unique real line.

7. An imaginary line contains at most one real point and there exists
at most one real plane containing it.


Chapter 9: Conics
9.1 Algebraic curves, intersection multiplicities and tangents
We denote by C[x, y, z] the set of polynomials in 3 variables over the complex
field. That is, elements of C[x, y, z] are finite expressions of the form

$$\sum_{i,j,k \geq 0} a_{ijk}\, x^i y^j z^k.$$

The degree of a non-zero term $a_{ijk} x^i y^j z^k$ is i + j + k.

Definition 9.1.1. A polynomial f ∈ C[x, y, z] is called homogeneous of
degree d, where d is a non-negative integer, if every non-zero term of f
has degree d. That is, f is homogeneous of degree d if it has the form

$$\sum_{i+j+k=d} a_{ijk}\, x^i y^j z^k.$$

Example 9.1.2.

• Homogeneous polynomials of degree 0 are constant polynomials.

• The polynomial x2 + 3xz − iz 2 is homogeneous of degree 2.

• The polynomial x4 + zy 3 + z 2 + y 4 is not homogeneous.

• The polynomial 2x5 + x3 yz − 4y 4 z + 6z 4 x + iz 3 y 2 is homogeneous


of degree 5.

Definition 9.1.3. An algebraic plane curve of degree d is a set of the form

{(x : y : z) ∈ P2 (C) : f (x, y, z) = 0},

where f ∈ C[x, y, z] is a non-zero homogeneous polynomial of degree d.

Since all of our curves will be contained in P2 (C), we will drop the adjective
”plane”.


Note:-
The definition of algebraic curve makes sense only because f is homogeneous.
In fact, if (x1 : x2 : x3) is such that f(x1, x2, x3) = 0, with f
homogeneous of the form $\sum_{i+j+k=d} a_{ijk}x^i y^j z^k$, then for every λ ∈ C we
have

$$f(\lambda x_1, \lambda x_2, \lambda x_3) = \sum_{i+j+k=d} a_{ijk}(\lambda x_1)^i(\lambda x_2)^j(\lambda x_3)^k = \sum_{i+j+k=d} \lambda^{i+j+k} a_{ijk}\, x_1^i x_2^j x_3^k = \lambda^d f(x_1, x_2, x_3) = 0.$$
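Both homogeneity and this scaling identity can be checked symbolically; a Python sketch (ours, assuming sympy):

import sympy as sp

x, y, z, lam = sp.symbols("x y z lambda")
f = x**2 + 3*x*z - sp.I*z**2               # homogeneous of degree 2

P = sp.Poly(f, x, y, z)
print(P.is_homogeneous)                    # True
d = P.total_degree()                       # 2
scaled = f.subs({x: lam*x, y: lam*y, z: lam*z})
print(sp.expand(scaled - lam**d * f))      # 0: f(lx, ly, lz) = l^d f(x, y, z)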

Example 9.1.4. A line in P2 (C) is defined by an equation of the form


ax + by + cz = 0. Therefore, a line is an algebraic curve of degree 1.

Definition 9.1.5. An algebraic curve C : f (x, y, z) = 0 of degree d is re-


ducible if there exist homogeneous polynomials g, h ∈ C[x, y, z], each of
degree > 0, such that f = gh. If C is not reducible, then it is called
irreducible.

Example 9.1.6.

• If f, g are homogeneous polynomials, of degree d, e, respectively,
then fg is a homogeneous polynomial of degree d + e. Hence, since
lines have degree 1, they are also irreducible.

• The curve x2 + xy + xz + yz = 0 is reducible, because

x2 + xy + xz + yz = (x + y)(x + z).

• The curve x2 + y 2 = 0 is reducible, since x2 + y 2 = (x + iy)(x − iy).

• The curve x2 + y 2 + z 2 = 0 is irreducible.

Let C : f(x, y, z) = 0 be an algebraic curve of degree d, and write f =
$\sum_{i+j+k=d} a_{ijk}x^i y^j z^k$. Let P = (x0 : y0 : z0) ∈ P2(C) be a point of C, so that
f(x0, y0, z0) = 0, and let ℓ : ax + by + cz = 0 be a line through P. Since
(a, b, c) ≠ (0, 0, 0) we can assume without loss of generality that c ≠ 0, and
rewrite ℓ as z = αx + βy, where α = −a/c and β = −b/c, so that z0 = αx0 + βy0.


The equation

f(x, y, αx + βy) = 0

certainly has a solution, because f(x0, y0, αx0 + βy0) = f(x0, y0, z0) = 0. On
the other hand, we have that

$$f(x, y, \alpha x + \beta y) = \sum_{i+j+k=d} a_{ijk}\, x^i y^j (\alpha x + \beta y)^k.$$

Notice that $(\alpha x + \beta y)^k = \sum_{h=0}^{k} \binom{k}{h} \alpha^h \beta^{k-h} x^h y^{k-h}$ is a homogeneous polynomial
of degree k. Hence every term $a_{ijk}x^i y^j(\alpha x + \beta y)^k$ of f(x, y, αx + βy)
is homogeneous of degree d, and in turn the polynomial f(x, y, αx + βy) is
homogeneous of degree d as well. Hence we can write

$$f(x, y, \alpha x + \beta y) = \sum_{h=0}^{d} a_h x^h y^{d-h} \qquad (39)$$

for some coefficients a0, . . . , ad ∈ C.


Now we need to make an important assumption: that f(x, y, αx + βy) is
not the zero polynomial.

Remark 9.1.7. Having f(x, y, αx + βy) = 0 is equivalent to saying
that every point of ` solves the equation f(x, y, z) = 0, which is in turn
equivalent to saying that ` ⊆ C. It is not difficult to show that this happens
exactly when there exists a homogeneous polynomial h ∈ C[x, y, z] such
that f(x, y, z) = h(x, y, z)(ax + by + cz).
Next, notice that we cannot have x_0 = y_0 = 0, as otherwise z_0 = 0. Suppose
without loss of generality that y_0 ≠ 0. Since (x_0, y_0) solves the equation
f(x, y, αx + βy) = 0, so does (y_0^{-1} x_0, 1). Hence we can set y = 1 in (39)
(which is equivalent to dividing everything by y^d and then replacing x/y by x),
and look at the equation

\sum_{h=0}^{d} a_h x^h = 0,    (40)

of which y_0^{-1} x_0 is a solution.

Definition 9.1.8. The multiplicity of y_0^{-1} x_0 as a root of (40) is called
the intersection multiplicity of ` and C in P, and it is denoted by m_P(C, `).


Note:-
Since Equation (40) has degree d, the intersection multiplicity is at most d.
Moreover, it is also at least 1.
The definition of intersection multiplicity looks intricate at first, but in fact
computing it is rather easy. Let us see some examples.

Example 9.1.9.

• Let f = x^2 + y^2 + z^2 and let C : f = 0 be the corresponding algebraic
curve. Let P = (1 : 0 : i) ∈ C and let ` : ix + y − z = 0 be a line
through P. First, we rewrite the equation of ` as z = ix + y. Next,
we substitute this into f and we equate to 0, getting

f(x, y, ix + y) = x^2 + y^2 + (ix + y)^2 = 2ixy + 2y^2 = 0.    (41)

Now we look at which coordinate of P, between the x-coordinate
and the y-coordinate, is non-zero. It is the first one, and therefore we
can set x = 1 in (41), getting

iy + y^2 = 0.

This equation has, as a solution, x_0^{-1} y_0, where x_0 = 1 is the x-
coordinate of P and y_0 is the y-coordinate. Notice that x_0^{-1} y_0 = 0,
so we have to look at 0 as a root of the equation. This clearly has
multiplicity 1, and hence m_P(C, `) = 1.

• Let f = x^2 + y^2 + z^2 and let C : f = 0 be the corresponding algebraic
curve. Let P = (1 : 0 : i) ∈ C and let ` : ix − z = 0 be a line through
P. Let us compute the intersection multiplicity of ` and C in P.
First, we rewrite the equation of ` as z = ix. Next, we substitute
in the equation of C, getting

x^2 + y^2 − x^2 = y^2 = 0.    (42)

Now we look at which coordinate of P is non-zero. It is clearly the
x-coordinate, so we need to look at the root y = x_0^{-1} y_0 = 0 of (42).
This has multiplicity 2, and hence m_P(C, `) = 2.

• Let f = x^3 − x^2z − y^2z and C : f = 0 be the corresponding curve.
Let P = (1 : 0 : 1) ∈ C and let ` : x − z = 0 be a line through P. We
write ` as z = x and substitute, getting:

y^2 x = 0.

The non-zero coordinate of P, between x and y, is the x-coordinate,
so we set x = 1 and look at 0 as a root of y^2 = 0. This has
multiplicity 2, so m_P(C, `) = 2.
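
The recipe just illustrated is mechanical, so it can also be delegated to a
computer algebra system. Here is a minimal sympy sketch (an illustration,
not part of the original notes) reproducing the third computation above:

    from sympy import symbols, expand, roots

    x, y, z = symbols('x y z')
    f = x**3 - x**2*z - y**2*z          # the curve C
    g = expand(f.subs(z, x))            # substitute the line z = x: gives -x*y**2
    h = g.subs(x, 1)                    # the x-coordinate of P = (1 : 0 : 1) is non-zero
    print(roots(h, y))                  # {0: 2}, so m_P(C, `) = 2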

Definition 9.1.10. Let C be an algebraic curve, let P ∈ C be a point and


` be a line that passes through P . We say that ` is a tangent in P if
mP (C, `) ≥ 2.

Referring to Example 9.1.9, in the first example the line is not tangent, while
in the other two it is.

Definition 9.1.11. Let C be an algebraic curve. A point P ∈ C is called


smooth or non-singular if there exists a unique tangent in P . If P is not
smooth, then it is singular.

Example 9.1.12.

• Consider the curve x^3 − x^2z − y^2z = 0 and the point P = (0 : 0 : 1).
Let m ∈ C and consider the line y = mx through P. Substituting
in the equation of the curve, we get

x^3 − x^2z − m^2x^2z = 0.

The non-zero coordinate of P is the third one, so to understand
m_P(C, `) we need to look at x = 0 as a solution of

x^3 − x^2 − m^2x^2 = x^3 − (1 + m^2)x^2 = 0.

This has multiplicity at least 2 for every m, so every line through
P of the form y = mx is a tangent. Hence P is singular.

• Consider the curve x^3 − xz^2 − y^2z = 0 and the point P = (0 : 0 : 1).
Let m ∈ C and consider a line y = mx through P. We get

x^3 − xz^2 − m^2x^2z = 0,

and since the z-coordinate of P is non-zero, we need to look at 0 as
a root of

x^3 − m^2x^2 − x = 0.

This has multiplicity 1 for every m, so no line of the form y = mx
is a tangent in P.
The only other line through P is x = 0. Substituting, we get

y^2z = 0,

and we need to look at the multiplicity of y = 0 as a root of y^2 = 0.
This has multiplicity 2, and hence x = 0 is a tangent in P.

Theorem 9.1.13. Let C : f(x, y, z) = 0 be an algebraic curve and let P =
(x_0 : y_0 : z_0) ∈ C. Then P is singular if and only if the following relations
hold true:

f(x_0, y_0, z_0) = 0,
∂f/∂x (x_0, y_0, z_0) = 0,
∂f/∂y (x_0, y_0, z_0) = 0,
∂f/∂z (x_0, y_0, z_0) = 0.

Theorem 9.1.14. Let C : f(x, y, z) = 0 be an algebraic curve and let P =
(x_0 : y_0 : z_0) ∈ C. If P is smooth, then the tangent line in P is given by
the equation

∂f/∂x (x_0, y_0, z_0) x + ∂f/∂y (x_0, y_0, z_0) y + ∂f/∂z (x_0, y_0, z_0) z = 0.
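
Both statements are straightforward to apply in practice. Here is a small
sympy sketch (illustrative, not part of the original notes) that checks the
second curve of Example 9.1.12 at P = (0 : 0 : 1):

    from sympy import symbols, diff

    x, y, z = symbols('x y z')
    f = x**3 - x*z**2 - y**2*z
    P = {x: 0, y: 0, z: 1}
    grad = [diff(f, v).subs(P) for v in (x, y, z)]
    print(grad)                                 # [-1, 0, 0]: not all zero, so P is smooth
    print(grad[0]*x + grad[1]*y + grad[2]*z)    # -x, i.e. the tangent line is x = 0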
We close the section with a theorem that explains, in a precise sense, "how
many" intersections an algebraic curve of degree d has with a line.

Theorem 9.1.15. Let C : f(x, y, z) = 0 be an algebraic curve of degree d
and ` : ax + by + cz = 0 be a line. Assume that ` ⊈ C. Then:

\sum_{P ∈ C∩`} m_P(C, `) = d.


Proof. Write f(x, y, z) = \sum_{i+j+k=d} a_{ijk} x^i y^j z^k. Intersecting C and ` means
studying the system

\sum_{i+j+k=d} a_{ijk} x^i y^j z^k = 0,
ax + by + cz = 0.    (43)

Since (a, b, c) ≠ (0, 0, 0) we can assume without loss of generality that
c ≠ 0, and rewrite ` as z = αx + βy, where α = −a/c and β = −b/c.
Substituting in (43), we get

f(x, y, αx + βy) = 0.

The polynomial f (x, y, αx + βy) cannot be the 0 polynomial, as otherwise


we would have ` ⊆ C, violating the hypotheses. Then, as we have already
noted in (39), the polynomial f (x, y, αx + βy) is homogeneous of degree
d, and hence we can write
f(x, y, αx + βy) = \sum_{h=0}^{d} a_h x^h y^{d−h}

for some coefficients a0 , . . . , ad ∈ C.


We know that in order to find the multiplicity of intersection at a point
(x_0 : y_0 : z_0) ∈ C ∩ `, we need to compute the multiplicity of (x_0, y_0) as a
root of

\sum_{h=0}^{d} a_h x^h y^{d−h} = 0.    (44)

Therefore the theorem is proved if we can show that equation (44) has
exactly d solutions, when counted with multiplicity.
Suppose that (x_0, y_0) is a solution of (44). We can assume that (x_0, y_0) ≠ (0, 0),
because if x_0 = 0 = y_0 then z = αx_0 + βy_0 = 0, which does not
define a point of P^2(C). Now suppose that y_0 ≠ 0. Then (x_0 y_0^{-1}, 1) is also
a solution of (44). On the other hand, if (x_0, 0) is a solution then so is (1, 0).
Since proportional pairs (x_0, y_0) and (λx_0, λy_0) give rise to the
same point P = (x_0 : y_0 : αx_0 + βy_0), we can just count solutions of (44)
of the form (x_0, 1) or (1, 0). Now, if \sum_{h=0}^{d} a_h x^h y^{d−h} = a_0 y^d then we are
done, since (1, 0) is the unique root and it has multiplicity d. Otherwise,
there is a unique e ≥ 0 such that

\sum_{h=0}^{d} a_h x^h y^{d−h} = y^e \left( \sum_{h=0}^{d−e} a_h x^h y^{d−h−e} \right)

with a_{d−e} ≠ 0. Then (1, 0) is a root with multiplicity e, and the equation

\sum_{h=0}^{d−e} a_h x^h y^{d−h−e} = 0

has precisely d − e roots of the form (x_0, 1), when counted with multiplicity.
All in all, we have e + d − e = d roots.

9.2 Conics
Definition 9.2.1. A conic is a plane algebraic curve of degree 2.

Equivalently, a conic is a plane algebraic curve given by an equation of the
form

C : a_11 x^2 + 2a_12 xy + 2a_13 xz + a_22 y^2 + 2a_23 yz + a_33 z^2 = 0,

where the coefficients a_ij ∈ C are not all zero.
There is a convenient way to rewrite the above equation. Namely, a conic
is the set of points (x_0 : y_0 : z_0) ∈ P^2(C) that satisfy

(x_0  y_0  z_0) \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix} = 0.

The matrix

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}

is the matrix associated to the conic C. Setting X = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, we can write
the equation of C as:

^tX A X = 0.


Definition 9.2.2. Let C be a conic. We say that C is:

1. generic (or general) if it has no singular points;

2. simply degenerate if it has exactly one singular point;

3. doubly degenerate if all of its points are singular.

Theorem 9.2.3. Let C : ^tXAX = 0 be a conic. Then C is:

1. generic if det A ≠ 0 or, equivalently, if rk(A) = 3;

2. simply degenerate if rk(A) = 2;

3. doubly degenerate if rk(A) = 1.

Proof. Let

f = a_11 x^2 + 2a_12 xy + 2a_13 xz + a_22 y^2 + 2a_23 yz + a_33 z^2

be the equation of C. By Theorem 9.1.13, a point (x_0 : y_0 : z_0) ∈ C is
singular if and only if:

∂f/∂x (x_0, y_0, z_0) = 0,   ∂f/∂y (x_0, y_0, z_0) = 0,   ∂f/∂z (x_0, y_0, z_0) = 0.

Now notice that:

∂f/∂x = 2a_11 x + 2a_12 y + 2a_13 z,
∂f/∂y = 2a_12 x + 2a_22 y + 2a_23 z,
∂f/∂z = 2a_13 x + 2a_23 y + 2a_33 z,

so that (x_0 : y_0 : z_0) ∈ C is singular if and only if

A \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix} = 0.


Namely, in order to find singular points we need to solve the homogeneous
linear system

AX = 0.    (45)

Notice that any non-zero solution of (45) automatically yields a point of
the conic, since if AX = 0 then clearly ^tXAX = 0.
If det A ≠ 0, then by Theorem 3.1.5, system (45) only has the trivial
solution X = 0. But this does not define a point of P^2(C), and therefore
C has no singular points.
If rk(A) = 2, the solutions of system (45) form a 1-dimensional C-
vector space, generated by a non-zero vector (x0 , y0 , z0 ) ∈ C3 . This means
that (x0 : y0 : z0 ) is the unique singular point of the conic, and hence C is
simply degenerate.
If rk(A) = 1, the solutions of system (45) form a 2-dimensional C-
vector space. This gives rise, in P2 (C), to a line ` made entirely of singular
points. The line ` is entirely contained in C. We claim that the converse
is also true, namely, every point of C belongs to `. In fact, if this was not
true then let P ∈ C \ ` and let Q ∈ `. Now consider the line r through P
and Q: its intersection multiplicity with C at Q is ≥ 2, since Q is singular.
Its intersection multiplicity with C at P is at least 1. But then

\sum_{R ∈ C∩r} m_R(C, r) ≥ 2 + 1 = 3,

contradicting Theorem 9.1.15. Since every point of C is a point of `, and


points of ` are all singular, C is doubly degenerate.
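
In practice, the classification is a single rank computation. A small sympy
sketch (illustrative, not part of the original notes), using the reducible conic
of Example 9.1.6 and the doubly degenerate conic (x + y)^2 = 0:

    from sympy import Matrix, Rational

    # x^2 + xy + xz + yz = 0: here a_12 = a_13 = a_23 = 1/2
    A = Matrix([[1, Rational(1, 2), Rational(1, 2)],
                [Rational(1, 2), 0, Rational(1, 2)],
                [Rational(1, 2), Rational(1, 2), 0]])
    print(A.rank())     # 2: simply degenerate, indeed f = (x + y)(x + z)

    B = Matrix([[1, 1, 0], [1, 1, 0], [0, 0, 0]])   # (x + y)^2 = 0
    print(B.rank())     # 1: doubly degenerate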
Next, we see what it means geometrically to be generic, simply degenerate
and doubly degenerate.

Lemma 9.2.4. Let C : f (x, y, z) = 0 be a conic.

1. C is generic if and only if it is irreducible.

2. C is simply degenerate if and only if there exist two distinct lines
` : ax + by + cz = 0, `' : a'x + b'y + c'z = 0 such that

f(x, y, z) = (ax + by + cz)(a'x + b'y + c'z).

3. C is doubly degenerate if and only if there exists a line ` : ax + by +
cz = 0 such that f(x, y, z) = (ax + by + cz)^2.


Proof. 1. First, let C be irreducible. By contradiction, assume that it is
not generic. Then it has (at least) a singular point P ∈ C. Now let Q be
another point of C. Let ` : ax + by + cz = 0 be the line through P and Q.
If ` ⊈ C, then by Theorem 9.1.15 we have

\sum_{R ∈ `∩C} m_R(C, `) = 2,    (46)

but m_P(C, `) ≥ 2 since P is singular and m_Q(C, `) ≥ 1, and this contradicts
(46). Therefore we must have ` ⊆ C, and by Remark 9.1.7 this implies that
f(x, y, z) = (ax + by + cz)h(x, y, z), so that C is reducible, contradicting
our assumption. Hence C is generic.
Conversely, let C be generic. By contradiction, assume that it is reducible.
Then there are lines ` : ax + by + cz = 0 and `' : a'x + b'y + c'z = 0
such that

f(x, y, z) = (ax + by + cz)(a'x + b'y + c'z).    (47)

Now let P = (x_0 : y_0 : z_0) ∈ ` ∩ `'. Assume without loss of generality that
x_0 ≠ 0 and consider the line r : z_0 x − x_0 z = 0 through P. We claim that
m_P(C, r) ≥ 2. To compute the multiplicity, write r as z = (z_0/x_0)x, substitute
in (47) and equate to 0, getting:

((ax_0 + cz_0)x + bx_0 y)((a'x_0 + c'z_0)x + b'x_0 y) = 0.

Since x_0 ≠ 0, in order to find the intersection multiplicity we have to solve

((ax_0 + cz_0) + bx_0 y)((a'x_0 + c'z_0) + b'x_0 y) = 0.

But both factors have x_0^{-1} y_0 as a root, and hence m_P(C, r) ≥ 2. The same
computation applies to every line through P, so P is a singular point of C,
contradicting the fact that C is generic.
dicting the fact that C is generic.
2. We have proved in 1. that if C decomposes as the product of two
lines, the intersection point is singular. Hence if f(x, y, z) = (ax + by +
cz)(a'x + b'y + c'z) with ax + by + cz = 0 and a'x + b'y + c'z = 0 distinct
lines, their unique intersection point P is singular. On the other hand, no
other point Q can be singular, as otherwise if ` is the line through P, Q
then m_P(C, `) + m_Q(C, `) ≥ 3, contradicting Theorem 9.1.15. Conversely,
suppose that C is simply degenerate. Then by 1. it must be reducible,
and since deg f = 2, we can only have f(x, y, z) = g(x, y, z)h(x, y, z) with
g, h homogeneous polynomials of degree 1. If g = 0 and h = 0 define the
same line, then any point on such line is singular, but then every point of
C would be singular. Hence g = 0 and h = 0 must define distinct lines.
3. If f = (ax + by + cz)^2, then every point of C is singular, and so
C is doubly degenerate. Conversely, if C is doubly degenerate then it is
reducible, and we showed in 2. that if it decomposes as the product of
two distinct lines then there is a unique singular point. Then it must
decompose as the square of a line.
The next lemma shows how to compute tangent lines at smooth points of
conics.

Lemma 9.2.5. Let C : ^tXAX = 0 be a conic, where

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} ∈ M_3(C).

Let P = (x_P : y_P : z_P) ∈ C be a smooth point. Then the unique tangent
in P is the line

(x_P  y_P  z_P) A \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0.

Proof. By Theorem 9.1.14, the tangent line in P is given by the equation

∂f/∂x (P) x + ∂f/∂y (P) y + ∂f/∂z (P) z = 0,

where f(x, y, z) = a_11 x^2 + 2a_12 xy + 2a_13 xz + a_22 y^2 + 2a_23 yz + a_33 z^2. Hence
we just need to verify that this expression coincides with that of the claim.
We compute:

∂f/∂x (P) = 2a_11 x_P + 2a_12 y_P + 2a_13 z_P,
∂f/∂y (P) = 2a_22 y_P + 2a_12 x_P + 2a_23 z_P,
∂f/∂z (P) = 2a_33 z_P + 2a_13 x_P + 2a_23 y_P,


so that the equation of the tangent line in P is given by:

(a_11 x_P + a_12 y_P + a_13 z_P)x + (a_22 y_P + a_12 x_P + a_23 z_P)y + (a_33 z_P + a_13 x_P + a_23 y_P)z = 0.

On the other hand,

(x_P  y_P  z_P) \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} =

= (a_11 x_P + a_12 y_P + a_13 z_P   a_22 y_P + a_12 x_P + a_23 z_P   a_33 z_P + a_13 x_P + a_23 y_P),

so that when we multiply the above row vector by \begin{pmatrix} x \\ y \\ z \end{pmatrix} and we equate
to 0 we obtain precisely the equation of the tangent line.

Definition 9.2.6. Let C : f (x, y, z) = 0 be a conic. The improper points


of C are the intersections of C with the line z = 0.
If C is general, an asymptote of C is a proper tangent in an improper
point of C.

Remark 9.2.7. To find the improper points of C one simply needs to


solve the equation f (x, y, 0) = 0. By Theorem 9.1.15, a conic either
has two distinct improper points or it has a unique improper point with
multiplicity 2; in the latter case, the line z = 0 is tangent in such point.
If C is general and it has 2 improper points, then these are smooth
and z = 0 is not tangent in either of them, since otherwise we would
contradict Theorem 9.1.15. Hence the tangents in the improper points
must be proper, and the conic has two asymptotes.
If C is general and it has only one improper point, then this is smooth
and z = 0 is tangent in such point; it follows that the conic has no
asymptotes.


Example 9.2.8.

• Let C : x^2 + 2y^2 + z^2 = 0. To find the improper points, we need
to solve x^2 + 2y^2 = 0. It follows that the two improper points are:
P = (i√2 : 1 : 0) and P̄ = (−i√2 : 1 : 0). Since C is a general
conic, there is a unique tangent line in both P and P̄. To compute
these tangents, we use Lemma 9.2.5. The tangent in P is given by:

(i√2  1  0) \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0,

that is,

i√2 x + 2y = 0.

Similarly, the line

−i√2 x + 2y = 0

is the tangent in P̄. Since these lines are proper, they are the
asymptotes of C.

• Let C : x^2 + 4xy + 4y^2 − 2xz + z^2 = 0. To find the improper points, we
need to solve x^2 + 4xy + 4y^2 = 0. This is equivalent to (x + 2y)^2 = 0,
so this conic has a unique improper point, that is P = (2 : −1 : 0).
Its tangent line has equation:

(2  −1  0) \begin{pmatrix} 1 & 2 & −1 \\ 2 & 4 & 0 \\ −1 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0,

that is just z = 0. This means that the tangent line in P is improper,
and hence it is not an asymptote.
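
The whole procedure (improper points, then tangents via Lemma 9.2.5) is
easy to automate. A minimal sympy sketch (illustrative, not part of the
original notes) redoing the first bullet:

    from sympy import Matrix, symbols, solve, sqrt, I

    x, y, z = symbols('x y z')
    A = Matrix([[1, 0, 0], [0, 2, 0], [0, 0, 1]])   # C: x^2 + 2y^2 + z^2 = 0
    print(solve(x**2 + 2, x))               # [-sqrt(2)*I, sqrt(2)*I]: improper points (±i√2 : 1 : 0)
    P = Matrix([[I*sqrt(2), 1, 0]])         # one improper point
    print((P * A * Matrix([x, y, z]))[0])   # sqrt(2)*I*x + 2*y: the asymptote through P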

9.3 Real conics


Definition 9.3.1. A real conic is a conic C which has a defining equation
f (x, y, z) = 0 with f (x, y, z) ∈ R[x, y, z].


The conic C is called:

1. an ellipse if it has two imaginary conjugate improper points;

2. a hyperbola if it has two real distinct improper points;

3. a parabola if it has one real improper point with multiplicity 2.

By Remark 9.2.7, ellipses and hyperbolas have two asymptotes, while parabo-
las have no asymptotes.

Theorem 9.3.2. Let C : ^tXAX = 0 be a real general conic, where

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} ∈ M_3(R).

Let

Ã = \begin{pmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{pmatrix}.

Then C is:

1. an ellipse if det Ã > 0;

2. a hyperbola if det Ã < 0;

3. a parabola if det Ã = 0.

Proof. In order to find the improper points of C we need to solve

(x  y  0) \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 0 \end{pmatrix} = 0,

that is,

a_11 x^2 + 2a_12 xy + a_22 y^2 = 0.    (48)
Now we need to consider three cases. First, if a_11 = a_22 = 0 then it
must be a_12 ≠ 0, because if it was also a_12 = 0 then the first two rows
of A would be linearly dependent, so that rk(A) < 3 and C would not
be general. Then the equation becomes xy = 0, so the conic has two
improper points (1 : 0 : 0) and (0 : 1 : 0), hence it is a hyperbola and
det Ã = −a_12^2 < 0, as required.
Next, if a_11 ≠ 0 then y = 0 does not yield a solution of (48). Hence
we can assume that y = 1 and look at the equation

a_11 x^2 + 2a_12 x + a_22 = 0,

which has two real solutions precisely when ∆ = a_12^2 − a_11 a_22 > 0, one
solution with multiplicity two when a_12^2 − a_11 a_22 = 0, and two imaginary
conjugate solutions when a_12^2 − a_11 a_22 < 0. Since a_12^2 − a_11 a_22 = − det Ã,
we are done.
Finally, if a_22 ≠ 0 then x = 0 does not yield any solution of (48), so
we can just set x = 1 and reason like in the previous case.
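
The criterion is just the sign of a 2×2 determinant, as in this short sympy
sketch (illustrative, not part of the original notes; the three matrices Ã
below belong to the conics that will appear in Example 9.4.9):

    from sympy import Matrix

    tests = [Matrix([[1, 0], [0, 2]]),      # x^2 + 2y^2 - 2x - 2y + 3 = 0
             Matrix([[1, -1], [-1, -1]]),   # x^2 - y^2 - 2xy + 3 = 0
             Matrix([[1, 1], [1, 1]])]      # x^2 + 2xy + y^2 + 2y + 3 = 0
    for At in tests:
        d = At.det()
        print(d, 'ellipse' if d > 0 else 'hyperbola' if d < 0 else 'parabola')
    # output: 2 ellipse / -2 hyperbola / 0 parabola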

Definition 9.3.3. Let C : ^tXAX = 0 be a real general conic, where

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} ∈ M_3(R).

Let P = (x_P : y_P : z_P) ∈ P^2(C) be any point. A point Q = (x_Q : y_Q : z_Q) ∈ P^2(C)
is said to be conjugate to P with respect to C if

(x_P  y_P  z_P) A \begin{pmatrix} x_Q \\ y_Q \\ z_Q \end{pmatrix} = 0.

We write the above equation as ^tP A Q = 0, with a slight abuse of notation.

Remark 9.3.4. Notice that P is conjugate to Q with respect to C if and
only if Q is conjugate to P with respect to C. In fact,

^tP A Q = 0 ⟺ ^t(^tP A Q) = 0 ⟺ ^tQ ^tA P = 0 ⟺ ^tQ A P = 0,

using the fact that ^tA = A.

Definition 9.3.5. Let C be a real general conic and let P ∈ P2 (C). The
set of points of P2 (C) that are conjugate to P with respect to C is called
the polar of P with respect to C.

Proposition 9.3.6. Let C : ^tXAX = 0 be a real general conic.

1. Let P ∈ P^2(C). Then the polar of P with respect to C is a line.

2. Let P, Q ∈ P^2(C) be such that P ≠ Q, and let `_P, `_Q be the polars of
P, Q with respect to C, respectively. Then `_P ≠ `_Q.

Proof. 1. The polar is the set of points that solve the equation

(x_P  y_P  z_P) \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0,

where P = (x_P : y_P : z_P). This clearly defines a line unless

(x_P  y_P  z_P) \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} = 0,

namely unless ^tP A = 0. Transposing and using the fact that A is symmetric,
the latter condition is equivalent to AP = 0. Since the vector of
the coordinates of P cannot be the zero vector, because (0 : 0 : 0) is not
a point of the projective plane, in order for this to happen we need ker A
to be non-zero; but this happens precisely when det A = 0, namely when
C is not general.


2. Let P = (x_P : y_P : z_P) and Q = (x_Q : y_Q : z_Q). Since P ≠ Q as
points of P^2(C), the vectors (x_P, y_P, z_P) and (x_Q, y_Q, z_Q) are linearly
independent in C^3. Suppose by contradiction that `_P = `_Q. Then AP and
AQ are proportional vectors, i.e. there exists λ ∈ C such that AP = λAQ,
and hence AP = A(λQ), which implies

A(P − λQ) = 0.

In other words, P − λQ ∈ ker A. But ker A = {0} since C is general, and
hence P = λQ, contradicting the hypothesis.

Definition 9.3.7. Let C be a real general conic and let ` ⊆ P2 (C) be a


line. We say that a point P ∈ P2 (C) is a pole of ` if the polar of P with
respect to C is `.

Proposition 9.3.8. Let C be a real general conic and let ` ⊆ P2 (C) be a


line. Then ` has a unique pole.

Proof. Let ` : ax + by + cz = 0. Then a point P = (x_P : y_P : z_P) ∈ P^2(C)
is a pole of ` if and only if

(x_P  y_P  z_P) \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}

is proportional to (a  b  c) or, equivalently (transposing), if A \begin{pmatrix} x_P \\ y_P \\ z_P \end{pmatrix} is
proportional to B = \begin{pmatrix} a \\ b \\ c \end{pmatrix}. Now the linear system AX = B has exactly
one solution by Theorem 3.1.5 because, C being general, the matrix A has
non-zero determinant. Let then X be the unique solution to AX = B. If


λ ∈ C, we have that A(λX) = λB, and therefore again by Theorem 3.1.5


the vector λX is the unique solution to AX = λB. Therefore we have
proved that AP is proportional to B if and only if P is proportional to
X. But proportional vectors define the same point in P2 (C), and hence `
has a unique pole.
Therefore we have proven that given a general real conic C, the polar of
every point P ∈ P2 (C) with respect to C is a line and every line has a unique
pole in P2 (C).
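
Concretely, finding the pole of a line therefore amounts to one linear solve.
A minimal sympy sketch (illustrative, not part of the original notes), with
the conic of Example 9.2.8 and the line z = 0:

    from sympy import Matrix

    A = Matrix([[1, 0, 0], [0, 2, 0], [0, 0, 1]])   # C: x^2 + 2y^2 + z^2 = 0
    B = Matrix([0, 0, 1])                           # coefficients of the line z = 0
    print(A.solve(B).T)                             # Matrix([[0, 0, 1]]): the pole is (0 : 0 : 1)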

Theorem 9.3.9 (Reciprocity principle). Let C be a general real conic, let


P ∈ P2 (C) and let p be the polar of P with respect to C.

1. If Q ∈ p, the polar of Q with respect to C passes through P .

2. If ` is a line containing P , its pole belongs to p.

Proof. Let C have equation t XAX = 0.


1. Since p is the set of points that are conjugate to P with respect
to C, if Q ∈ p then we have ^tP A Q = 0. Transposing and using the fact
that A is symmetric, we get ^tQ A P = 0, so that P is conjugate to Q with
respect to C. But this means precisely that P belongs to the polar of Q
with respect to C.
2. Let Q be the pole of `. This means that ` is the set of points of
2
P (C) that are conjugate to Q with respect to C. Since P ∈ `, then P
is conjugate to Q with respect to C, i.e. t QAP = 0. Transposing, we get
that t P AQ = 0, namely, Q is conjugate to P with respect to C. But then
by definition Q belongs to the polar of P , that is p.

Proposition 9.3.10. Let C be a general real conic, let P ∈ P2 (C) and let
p be the polar of P with respect to C.

1. P ∈ p if and only if P ∈ C, and in such case the polar of P is the


tangent line in P .

2. If P ∉ C, there exist exactly two lines `1 and `2 through P that are
tangent to C, and p is the line through the two points `1 ∩ C and
`2 ∩ C.

Proof. 1. P ∈ p if and only if P is conjugate to itself with respect to C,


namely if and only if t P AP = 0. But this is equivalent to saying that


P ∈ C. Lemma 9.2.5 shows that in this case the equation of the polar is
precisely the equation of the tangent line.
2. Consider p ∩ C. By Theorem 9.1.15, this consists either of two distinct
points or of a single point with multiplicity 2. Suppose that the latter
holds, and let Q = p ∩ C. Since the intersection multiplicity of p and C in
Q is 2, p is tangent to C in Q, and hence by 1. the line p is the polar of
Q. But a line has a unique pole by Proposition 9.3.8, and hence P = Q.
But then P ∈ C, contradicting the hypothesis. Hence p ∩ C consists of two
distinct points P1 , P2 . Now let `i be the tangent in Pi , for i = 1, 2. Since
Pi ∈ C, by Theorem 9.3.9 we have that P ∈ `i , for i = 1, 2.
So we proved that there exist two lines `1 , `2 through P that are tangent
to C, and p is the line through `1 ∩ C and `2 ∩ C. It remains to show that
there are no other lines through P that are tangent to C. Let `3 be another
line with such property and let P3 = `3 ∩ C. Then `3 is the polar of P3 ,
and since P ∈ `3 , by Theorem 9.3.9 we have that P3 ∈ p. But then
{P1 , P2 , P3 } ⊆ p ∩ C, contradicting Theorem 9.1.15 since the three points
are all distinct.

Definition 9.3.11. Let C be a general real conic. The center of C is the


pole of the improper line z = 0. The diameters of C are the polars of the
improper points of P2 (C).

Remark 9.3.12. Let C be a general real conic. Then every diameter of C


passes through the center of C. In fact, if P is the center of C then by
definition the polar of P is z = 0. Hence by Theorem 9.3.9, the polars of
the points lying on z = 0, that are the diameters, pass through P .

Proposition 9.3.13. Let C be a general real conic with defining matrix

A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}.

1. The center of C is the unique point (x_P : y_P : z_P) ∈ P^2(C) such that:

a_11 x_P + a_12 y_P + a_13 z_P = 0,
a_12 x_P + a_22 y_P + a_23 z_P = 0.

2. The center of C is improper if and only if C is a parabola, and in


this case its center is the unique improper point of C.

3. Suppose C is not a parabola. Then the asymptotes of C are the lines


through the center and the improper points of C.

Proof. 1. By Remark 9.3.12, in order to compute the center we can com-


pute the polars of the points (1 : 0 : 0) and (0 : 1 : 0) and intersect them.
These polars are, respectively,

a11 x + a12 y + a13 z = 0 and a12 x + a22 y + a23 z = 0.

Hence the center is given by the unique solution of the system

a_11 x + a_12 y + a_13 z = 0,
a_12 x + a_22 y + a_23 z = 0    (49)

(the solution is unique as a projective point: since C is general, the first
two rows of A are linearly independent, hence the solutions of (49) form a
1-dimensional space and are all proportional to each other).
2. System (49) has a solution of the form (x_P : y_P : 0) if and only if
(x_P, y_P) solves the system

a_11 x + a_12 y = 0,
a_12 x + a_22 y = 0.    (50)

But this is a homogeneous system of two equations in two indeterminates,
and therefore by Theorem 3.1.5 it has a non-zero solution if and only if
det \begin{pmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{pmatrix} = 0, namely if and only if C is a parabola by
Theorem 9.3.2.
Suppose then that C is a parabola. Its center then is (x_P : y_P : 0),
where (x_P, y_P) solves (50). Since C is a parabola, a_11 a_22 − a_12^2 = 0, so
the two equations of (50) are linearly dependent. If a_11 = a_12 = 0, then
a_22 ≠ 0 (otherwise det A = 0 and C would be degenerate), and both the
center and the unique improper point are (1 : 0 : 0), since the improper
points are then given by a_22 y^2 = 0. Otherwise, the solution of (50) is
(−a_12 : a_11 : 0).
In this case, to find the improper points of C we need to solve

a_11 x^2 + 2a_12 xy + a_22 y^2 = 0,

and plugging x = −a_12 and y = a_11 into the above expression we get
−a_11 a_12^2 + a_22 a_11^2 = a_11(−a_12^2 + a_11 a_22), which is 0 since C is a parabola. This
3. Since C is not a parabola, it has two distinct improper points P1 and
P2 . The polar pi of Pi is the tangent line in Pi for i = 1, 2 by Proposition
9.3.10, and hence p1 and p2 are the asymptotes of C. By Remark 9.3.12,
they pass through the center, and hence they must be the lines through
the center and the improper points.

Remark 9.3.14. The center of a real general conic C is a point of C if and


only if C is a parabola. In fact, if C is a parabola then by Proposition
9.3.13 its center lies on it. If C is not a parabola, by Proposition 9.3.13
the center P is proper. Hence if it was a point of C, by Proposition 9.3.10
its polar would be the tangent in P , that is a proper line. But this is
impossible since the polar of the center is the improper line by definition.

Example 9.3.15. Let C : x^2 − 2xy + 2y^2 + 4xz − 2z^2 = 0. By Proposition
9.3.13, the center is found by solving the system

x − y + 2z = 0,
−x + 2y = 0,

which yields the point (−4 : −2 : 1).
Of course one can also compute the center by using its very definition,
namely that of being the pole of the improper line. So P = (x_P : y_P : z_P)


is the center of C if and only if its polar is z = 0, namely if and only if

(x_P  y_P  z_P) \begin{pmatrix} 1 & −1 & 2 \\ −1 & 2 & 0 \\ 2 & 0 & −2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0

is the improper line. The above line has equation

(x_P − y_P + 2z_P)x + (−x_P + 2y_P)y + (2x_P − 2z_P)z = 0,

which is the equation of the improper line if and only if

x_P − y_P + 2z_P = 0,
−x_P + 2y_P = 0,

which is the very same system we already solved.
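
Either route reduces to a small linear system, which sympy solves directly
(an illustrative sketch, not part of the original notes):

    from sympy import symbols, linsolve

    x, y, z = symbols('x y z')
    print(linsolve([x - y + 2*z, -x + 2*y], x, y, z))
    # {(-4*z, -2*z, z)}: all solutions are proportional to (-4, -2, 1),
    # i.e. the center is the point (-4 : -2 : 1)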

Proposition 9.3.16. Let C be a general real conic. The diameters of C


constitute the pencil of lines through the center of C.

Proof. Every diameter passes through the center, by Remark 9.3.12. Con-
versely, let ` be a line through the center P of C. By Theorem 9.3.9, the
pole of ` must lie on the polar of P . But this is the improper line, by
definition of center. Hence ` is the polar of an improper point.

Remark 9.3.17. When looking at the affine part of a general real conic C, if
C is not a parabola then its center is proper, and therefore the diameters
will be the pencil of affine lines through such proper point. If C is a
parabola, since the center is improper the affine part of the diameters will
constitute a pencil of parallel lines, all with direction (−a12 : a11 : 0) by
Proposition 9.3.13.

9.4 Conics in E2 (R)

Definition 9.4.1. A conic in E2 (R) is the set of solutions of an equation


of the form:

C : a_11 x^2 + 2a_12 xy + 2a_13 x + a_22 y^2 + 2a_23 y + a_33 = 0,

where a_ij ∈ R for every i, j and a_11, a_12, a_22 are not all 0.
Of course, by homogenizing the equation of a conic in E^2(R) one obtains
the equation of a real conic C̃ in P^2(C). However, we keep the two concepts
formally distinct, in order to be able to talk about orthogonality, which is
a notion that makes no sense in P^2.
Every notion we have seen in the previous sections for conics in P^2(C)
carries over to conics in E^2(R); to make sense of such notions we'll think of
them as associated to the conic C̃. For example, if x^2 + y^2 + 1 = 0 is a conic
in E^2(R), we can talk of its improper points: these will be the improper
points in P^2(C) of the conic x^2 + y^2 + z^2 = 0.
In the Euclidean setting we distinguish (for reasons that will not be treated
in these notes) circles from ellipses, although the former are a special case of
the latter.

Definition 9.4.2. A circle in E^2(R) is a conic with equation

C : a_11 x^2 + a_11 y^2 + 2a_13 x + 2a_23 y + a_33 = 0,

with a_11 ≠ 0.

Therefore, in the Euclidean setting we will use the word "ellipse" to denote
an ellipse that is not a circle.

Definition 9.4.3. The cyclic points in P2 (C) are (1 : i : 0) and (1 : −i : 0).

Lemma 9.4.4. A general conic C ⊆ E^2(R) is a circle if and only if C̃ passes
through the cyclic points.

Proof. Simply impose that the cyclic points satisfy the general equation of
a conic. This yields:

a_11 ± 2ia_12 − a_22 = 0,

so that we must have a_11 = a_22 and a_12 = 0.


Definition 9.4.5. Let C ⊆ E2 (R) be a hyperbola. We say that C is equi-


lateral if its asymptotes are orthogonal.

Proposition 9.4.6. Let

C : a_11 x^2 + 2a_12 xy + 2a_13 x + a_22 y^2 + 2a_23 y + a_33 = 0

be a hyperbola in E^2(R). Then C is equilateral if and only if a_11 + a_22 = 0.

Proof. To find the improper points of C we need to solve the equation

a_11 x^2 + 2a_12 xy + a_22 y^2 = 0.    (51)

Since C is a hyperbola, we know that this equation will yield two real
distinct improper points: (x_0 : y_0 : 0) and (x_0' : y_0' : 0). Let P be the
center of C, which is proper by Proposition 9.3.13. Since the asymptotes are
the lines through P and the improper points, again by Proposition 9.3.13,
their directions are (x_0, y_0) and (x_0', y_0'). Hence they are orthogonal if and
only if

x_0 x_0' + y_0 y_0' = 0.    (52)

If, without loss of generality, we assume that a_22 ≠ 0, then x_0 ≠ 0, as
otherwise solving (51) would also force y_0 = 0. For the same reason,
x_0' ≠ 0. Hence (52) can be rewritten as:

(y_0/x_0) · (y_0'/x_0') = −1.

Now since y_0/x_0 and y_0'/x_0' are the roots of the equation

a_22 t^2 + 2a_12 t + a_11 = 0,

their product is −1 if and only if a_11/a_22 = −1, that is, if and only if
a_11 + a_22 = 0.
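
For instance, the hyperbola x^2 − y^2 − 2xy + 3 = 0 (which we will meet
again in Example 9.4.9) has a_11 + a_22 = 1 − 1 = 0, so it is equilateral. The
orthogonality of its asymptotic directions can be checked directly, e.g. with
this sympy sketch (illustrative, not part of the original notes):

    from sympy import symbols, solve, expand

    t = symbols('t')
    # slopes of the improper directions: roots of a_22*t^2 + 2*a_12*t + a_11 = 0
    s1, s2 = solve(-t**2 - 2*t + 1, t)
    print(expand(1 + s1*s2))    # 0: the directions (1, s1) and (1, s2) are orthogonal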

Definition 9.4.7. Let C ⊆ E2 (R) be a general conic. An axis of C is a


proper diameter whose direction is orthogonal to that of its own pole. If
` is an axis of C and P ∈ ` ∩ C is proper, then it is called a vertex of C.


Proposition 9.4.8. Let C ⊆ E2 (R) be a general conic.

1. If C is a circle, all diameters are axes and all proper points of C are
vertices.

2. If C is a hyperbola or an ellipse, then it has 2 axes and 4 vertices.

3. If C is a parabola, there is a unique axis and a unique vertex and


the tangent in the vertex is orthogonal to the axis.
 
Proof. Let A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} be the matrix of the associated
projective conic C̃. Let (x_P : y_P : 0) be an improper point. Its polar is
given by the equation:

(x_P a_11 + y_P a_12)x + (x_P a_12 + y_P a_22)y + (x_P a_13 + y_P a_23)z = 0.

The direction of such a polar is (x_P a_12 + y_P a_22, −x_P a_11 − y_P a_12), while that
of the pole is (x_P, y_P). Hence in order for them to be orthogonal we need
(x_P a_12 + y_P a_22)x_P − (x_P a_11 + y_P a_12)y_P = 0 or, in other words,

a_12 x_P^2 + (a_22 − a_11)x_P y_P − a_12 y_P^2 = 0.    (53)


1. If C is a circle, then a11 = a22 and a12 = 0, so that (53) is satisfied
by every pair (xP , yP ). This is equivalent to saying that every diameter is
an axis.
If Q is the center of C and R is a point of C, by Theorem 9.3.9 the pole
of the line ` through Q and R is improper. Therefore ` is a diameter, and
it is an axis by what we said above. Hence R is a vertex.
2. If a_12 = 0, then the two solutions of (53) yield the points P_1 = (1 :
0 : 0) and P_2 = (0 : 1 : 0). If a_12 ≠ 0, then the solutions to (53) yield two
points P_1 = (x_P : y_P : 0) and P_2 = (−y_P : x_P : 0). These points coincide
if and only if y_P = ±ix_P, and if this happens then it must be x_P ≠ 0. Then
(53) becomes:

x_P^2 (2a_12 ± (a_22 − a_11)i) = 0,

and since the a_ij's are all real and x_P ≠ 0, this forces a_12 = 0, contradicting
our assumption. Therefore


P_1 ≠ P_2.
Let `_i be the polar of P_i, for i = 1, 2. If `_i was the improper line, then
P_i ∈ `_i, which implies that P_i ∈ C by Proposition 9.3.10. But the polar of
a point of C is its tangent, by the same proposition, so that the improper
line would be tangent in an improper point of C. This would imply that
C is a parabola by Remark 9.2.7, contradicting the hypothesis. Therefore,
`_1 and `_2 are axes of C, and they are distinct since they are the polars of
two distinct points, by Proposition 9.3.6.
Now we need to prove that `_i ∩ C consists of two distinct points. Let
i ∈ {1, 2}. If `_i was tangent to C in a point Q, then `_i would be the
polar of Q by Proposition 9.3.10. But since the pole of `_i is improper,
then Q would be improper and its direction would be the same as that
of `_i, which is impossible. Hence `_i ∩ C consists of two distinct points for
i = 1, 2. If `_1 ∩ C contained an improper point P', then by Theorem 9.3.9
the polar of P' would pass through P_1. But the polar of P' is the tangent
line in P', and since P' and P_1 are both improper, this means that the
improper line z = 0 is tangent to C. Hence C would be a parabola,
contradicting the hypothesis. This shows that `_1 ∩ C consists of two distinct
proper points, and by a symmetric argument so does `_2 ∩ C.
Let then `_1 ∩ C = {Q_1, Q_2} and `_2 ∩ C = {R_1, R_2} with Q_1, Q_2, R_1, R_2
proper points, Q_1 ≠ Q_2 and R_1 ≠ R_2. If it was Q_1 = R_1, then
Q_1 would be the center of the conic, since all diameters pass through
the center. But Q_1 ∈ C, and this contradicts Remark 9.3.14. Hence
{Q_1, Q_2} ∩ {R_1, R_2} = ∅, proving that C has 4 vertices.
3. By Proposition 9.3.13, the diameters of a parabola are exactly the lines
that pass through its unique improper point. If a_11 ≠ 0 or a_12 ≠ 0, the
improper point is (a_12 : −a_11 : 0). Otherwise, we must have a_22 ≠ 0, as
otherwise C̃ would be degenerate, and the improper point is (1 : 0 : 0).
Without loss of generality, let us assume that we are in the first case. Then
all diameters have equation

a_11 x + a_12 y + kz = 0

for some k ∈ C. Therefore the unique improper point whose direction is
orthogonal to that of its own diameter is (a_11 : a_12 : 0). Since one between
a_11 and a_12 is non-zero, the corresponding diameter is proper, and it is
therefore the unique axis. One intersection of the axis with C̃ is (a_12 : −a_11 : 0),
and hence the second intersection must be proper, as otherwise the whole
axis would be improper. Hence there is a unique vertex. By Theorem


9.3.9, the polar of the vertex, that is tangent to C therein, passes through
(a12 : −a11 : 0), and hence its direction is orthogonal to that of the
axis.

Example 9.4.9.

• Let C : x^2 + 2y^2 − 2x − 2y + 3 = 0. This is an ellipse, so it has two
axes and four vertices. Let us compute them. To find the axes, we
first need to solve equation (53), which in this case is:

xy = 0.

Hence the points P_1 = (1 : 0 : 0) and P_2 = (0 : 1 : 0) are
such that their polars are the axes of C. Since the matrix of C̃
is \begin{pmatrix} 1 & 0 & −1 \\ 0 & 2 & −1 \\ −1 & −1 & 3 \end{pmatrix}, the axes have equations

x − z = 0 and 2y − z = 0

or, in affine coordinates,

x − 1 = 0 and 2y − 1 = 0.

In order to find the vertices, we simply need to intersect the axes
with C. Hence we have to solve

1 + 2y^2 − 2 − 2y + 3 = 0 with x = 1, and x^2 + 1/2 − 2x − 1 + 3 = 0 with y = 1/2,

which yield the points (1 : (1 ± i√3)/2 : 1) and (1 ± i√(3/2) : 1/2 : 1).
These are the four vertices of C. Notice that they are all imaginary
points! One can prove easily that C̃ has no real points.


• Let C : x^2 − y^2 − 2xy + 3 = 0. This is a hyperbola, so it has two
axes and four vertices. The matrix of C̃ reads as \begin{pmatrix} 1 & −1 & 0 \\ −1 & −1 & 0 \\ 0 & 0 & 3 \end{pmatrix},
and equation (53) becomes:

−x^2 − 2xy + y^2 = 0,

which yields the two points (1 : 1 ± √2 : 0). Now the axes of C are
the polars of these points, namely the lines:

√2 x + (2 + √2)y = 0 and √2 x − (2 − √2)y = 0.

To find the vertices, we need to solve:

((√2 + 1)y)^2 − y^2 + 2y(√2 + 1)y + 3 = 0 with x = −(√2 + 1)y,

and

((√2 − 1)y)^2 − y^2 − 2y(√2 − 1)y + 3 = 0 with x = (√2 − 1)y,

finding the four points

((√2 + 1)√(3 − 3√2) : −√(3 − 3√2) : 2),
(−(√2 + 1)√(3 − 3√2) : √(3 − 3√2) : 2),
((√2 − 1)√(3 + 3√2) : √(3 + 3√2) : 2),
((√2 − 1)√(3 + 3√2) : √(3 + 3√2) : −2).

These are the four vertices of C; notice that the first two are imaginary,
since 3 − 3√2 < 0, while the other two are real.


• Let C : x^2 + 2xy + y^2 + 2y + 3 = 0. This is a parabola, and therefore
it has one axis and one vertex. The matrix of C̃ is \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 3 \end{pmatrix} and
the axes equation (53) is:

x^2 − y^2 = 0,

yielding the points (1 : 1 : 0) and (1 : −1 : 0). Their polars are:

2x + 2y + z = 0 and z = 0,

so the only axis is 2x + 2y + z = 0. Intersecting it with C (in affine
coordinates, y = −x − 1/2) we find

−2x + 9/4 = 0,

and hence the unique vertex is (9 : −13 : 8).
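
This computation can be double-checked symbolically with a short sympy
sketch (illustrative, not part of the original notes):

    from sympy import Matrix, symbols, solve

    x, y, z = symbols('x y z')
    A = Matrix([[1, 1, 0], [1, 1, 1], [0, 1, 3]])
    axis = (Matrix([[1, 1, 0]]) * A * Matrix([x, y, z]))[0]
    print(axis)                                  # 2*x + 2*y + z: the polar of (1 : 1 : 0)
    f = x**2 + 2*x*y + y**2 + 2*y + 3
    print(solve([f, axis.subs(z, 1)], [x, y]))   # [(9/8, -13/8)], i.e. the vertex (9 : -13 : 8)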

