Lectures On Group Theory
Richard Earl
Michaelmas 2019
SYNOPSIS
• The natural numbers and their ordering. Induction as a method of proof, including a proof of
the binomial theorem with non-negative integral coefficients. [1.5]
• Functions: composition, restriction; injective (one-to-one), surjective (onto) and invertible functions; images and preimages. [2]
• Proofs and refutations: standard techniques for constructing proofs; counter-examples. Example of proof by contradiction and more on proof by induction.
RECOMMENDED READING
SET THEORETIC AND LOGICAL NOTATION
IMPORTANT SETS
N — the set of natural numbers {0, 1, 2, . . .}. (Another common convention begins with 1.)
Z — the set of integers {0, ±1, ±2, . . .}.
Q — the set of rational numbers.
R — the set of real numbers.
C — the set of complex numbers.
Zn — the integers modulo n, where n ≥ 2.
Rn — n-dimensional real space — the set of all real n-tuples (x1 , x2, . . . , xn ) .
R [x] — the set of polynomials in x with real coefficients.
∅ — the empty set
For a, b ∈ R with a < b we define
(a, b) = {x ∈ R : a < x < b} .
(a, b] = {x ∈ R : a < x ≤ b} .
[a, b) = {x ∈ R : a ≤ x < b} .
[a, b] = {x ∈ R : a ≤ x ≤ b} .
LOGICAL NOTATION
∀   for all
∃   there exists
∃!  there exists a unique
¬   negation, not
∨   or
∧   and
⇒   implies, is sufficient for, only if
⇐   is implied by, is necessary for, if
⇔   if and only if, is logically equivalent to
: or | or s.t.   such that
□ or QED   found at the end of a proof
The Greek Alphabet
A, α alpha H, η eta N, ν nu T, τ tau
B, β beta Θ, θ theta Ξ, ξ xi Y, υ upsilon
Γ, γ gamma I, ι iota O, o omicron Φ, φ, ϕ phi
∆, δ delta K, κ kappa Π, π pi X, χ chi
E, ǫ epsilon Λ, λ lambda P, ρ, ̺ rho Ψ, ψ psi
Z, ζ zeta M, µ mu Σ, σ, ς sigma Ω, ω omega
1. THE NATURAL NUMBERS AND INDUCTION
We already know intuitively what the natural numbers are. Here is a definition — this first description
does not define them in terms of anything more basic but just says they are what you think they are.
Definition 1 A natural number is a non-negative whole number. That is, it is a member of the
sequence 0, 1, 2, 3, . . . obtained by starting from 0 and adding 1 successively. We write N for the set
{0, 1, 2, 3, . . .} of all natural numbers.
When discussing foundational material it is convenient to include 0 as a natural number. In the
rest of mathematics though, and life more generally, one starts counting at 1, so you will also see N
defined as the set {1, 2, 3, . . .}. Observe here we are using curly brackets or braces to gather together
objects into a set. We will discuss sets in more detail in the next chapter.
Beyond the set N of natural numbers, there are operations and relations associated with N. Given
natural numbers x, y then we can associate their sum x + y in N and their product x × y in N. This
is to say that + and × are binary operations on N. The natural numbers 0 and 1 have special roles
in that
x + 0 = x for all x in N; x × 1 = x for all x in N.
Definition 2 A binary operation ∗ on a set S associates an element x ∗ y in S with each ordered
pair (x, y) where x, y are in S. (Binary operations will be studied in further detail in the Groups and
Group Actions course next term.)
Further, the set N has a natural ordering ≤. (As a mathematical object, ≤ is a binary relation
on N. Binary relations will be discussed in detail in the next chapter.)
Definition 3 Let x, y be natural numbers. We write x ≤ y if there exists a natural number z such
that x + z = y.
Proposition 4 Let x, y, z be natural numbers. Then
(a) x ≤ x.
(b) if x ≤ y and y ≤ x then x = y.
(c) if x ≤ y and y ≤ z then x ≤ z.
(d) either x ≤ y or y ≤ x holds true.
Proof (a) This follows as 0 is a natural number and x + 0 = x.
(b) As x ≤ y then x + a = y for some natural number a and similarly y ≤ x implies y + b = x for
some b. Then
x + (a + b) = (x + a) + b = y + b = x
and hence a + b = 0. This is only possible for natural numbers if a = b = 0 and so x = y.
(c) As x ≤ y and y ≤ z then x + a = y and y + b = z for some a, b. Then
z = y + b = (x + a) + b = x + (a + b),
showing that x ≤ z.
(d) If x ≤ y then we are done. Otherwise x − y is a natural number and
x = y + (x − y),
showing that y ≤ x.
Remark 5 It’s worth taking a moment to note what algebraic laws have been necessary in the proof
of the above. We have certainly used the associativity of +, that is the rule that x + (y + z) = (x + y) + z.
Is this something we need to assume as an axiom, or something we should be able to prove?
Below is a somewhat more rigorous definition of the natural numbers than Definition 1, one which
leads more naturally to proofs. It is possible to examine in finer detail models for the natural numbers
and those interested should consider taking Part B Set Theory in the third year.
For now we shall work with:
Definition 6 N is the smallest set such that
(i) 0 ∈ N
(ii) if n ∈ N then n + 1 ∈ N.
This definition of the natural numbers ties in very readily with proofs by induction.
1.2 Induction
Induction (or more precisely mathematical induction ) is a particularly useful method of proof for
dealing with families of statements, such as the last three statements above, which are indexed by the
natural numbers, the integers or some subset of them. We shall prove statement B using induction
(Example 7). Statements B and C can be approached with induction because, in each case, knowing
that the nth statement is true helps enormously in showing that the (n + 1)th statement is true —
this is the crucial idea behind induction. Statement D, on the other hand, is a famous open problem
known as Goldbach’s Conjecture. If we let D(n) be the statement that 2n can be written as the sum
of two primes, then it is currently known that D(n) is true whenever 2n ≤ 4 × 10^18. What makes statement
D different from B and C, and more intractable to induction, is that in trying to verify D(n + 1)
we can’t generally make much use of knowledge of D(n) and so we can’t build towards a proof. For
example, we can verify D(19) and D(20) by noting that
38 = 7 + 31 = 19 + 19 and 40 = 3 + 37 = 11 + 29 = 17 + 23.
Here, knowing that 38 can be written as a sum of two primes is no great help in verifying that 40
can be, as none of the primes we might use for the latter was previously used in splitting 38.
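Such small cases are easy to check by machine. The following Python sketch (not part of the notes; is_prime and goldbach_pairs are illustrative names) verifies D(n) for a small range of n, though of course no amount of computation amounts to a proof of statement D.

```python
# Checking statement D (Goldbach) for small n; a sketch, not a proof.
def is_prime(m):
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def goldbach_pairs(n):
    """All ways of writing 2n as p + q with p <= q both prime."""
    return [(p, 2 * n - p) for p in range(2, n + 1)
            if is_prime(p) and is_prime(2 * n - p)]

print(goldbach_pairs(19))   # [(7, 31), (19, 19)]  -- the splittings of 38
print(goldbach_pairs(20))   # [(3, 37), (11, 29), (17, 23)]  -- the splittings of 40
print(all(goldbach_pairs(n) for n in range(2, 1000)))   # True: D(n) holds for 2 <= n < 1000
```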
By way of an example, we shall prove statement B by induction before giving a formal definition
of just what induction is.
Example 7 Show that n lines in the plane, no two of which are parallel and no three meeting in a
point, divide the plane into n(n + 1)/2 + 1 regions.
Solution
Proving B(0): When we have no lines in the plane then clearly we have just one region, as expected
from putting n = 0 into the formula n(n + 1)/2 + 1.
Assuming B(n) and proving B(n + 1): Suppose now that we have n such lines dividing the plane into
n(n + 1)/2 + 1 regions and we add an (n + 1)th line. This extra line meets each of the previous n
lines because, by assumption, it is parallel with none of them. Also, it meets each of these n lines in
a distinct point, as we have assumed that no three lines are concurrent. These n points of intersection
divide the new line into n + 1 segments.
(Figure 1: four lines L1, L2, L3, L4 in general position; the fourth line L4 meets the others in the
points P, Q and R.)
For each of these n + 1 segments there are now two regions, one on either side of the segment, where
previously there had been only one region. So by adding this (n + 1)th line we have created n + 1
new regions. For example, in Figure 1, the four segments, ‘below P’, PQ, QR and
‘above R’ on the fourth line L4, divide what were four regions previously into eight new ones. In
total, then, the number of regions we now have is
(n(n + 1)/2 + 1) + (n + 1) = (n + 1)(n + 2)/2 + 1.
So the given formula is correct for the number of regions with n + 1 lines; the result follows by
induction.
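The inductive step above amounts to the recurrence R(n + 1) = R(n) + (n + 1) for the number of regions. A short computational check (not part of the notes) confirms that this recurrence agrees with the closed formula n(n + 1)/2 + 1:

```python
# Verifying that the recurrence R(n+1) = R(n) + (n+1), with R(0) = 1,
# matches the closed form n(n+1)/2 + 1 from Example 7.
regions = 1                          # R(0): no lines, one region
for n in range(50):
    assert regions == n * (n + 1) // 2 + 1
    regions += n + 1                 # the (n+1)th line creates n+1 new regions
print("recurrence and formula agree for n = 0, ..., 50")
```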
Definition 8 The first statement to be proved (B(0) in the above example) is known as the initial
case or initial step. Showing the (n + 1)th statement follows from the nth statement is called
the inductive step, and the assumption that the nth statement is true is called the inductive
hypothesis.
Be sure that you understand Example 7: it contains the important steps common to any proof
by induction.
• In particular, note in the final step that we have retrieved our original formula of n(n+1)/2+1,
but with n + 1 now replacing n everywhere; this was the expression that we always had to be
working towards!
By induction we now know that B is true, i.e. that B(n) is true for any n ≥ 1. How does this
work? Well, suppose we wanted to be sure B(3) is correct — above we have just verified (amongst
other things) the following four statements:
• B(0) is true;
• if B(0) is true then B(1) is true;
• if B(1) is true then B(2) is true;
• if B(2) is true then B(3) is true;
and so putting the four statements together, we see that B(3) is true. The first statement tells us
that B(0) is true and the next three are stepping stones, first to the truth about B(1), on to B(2)
and then on to B(3). A similar chain of logic can be made to show that any B(n) is true.
Formally, then, the principle of induction is as follows:
Theorem 9 (The Principle of Induction) Let P (n) be a family of statements indexed by the
natural numbers n = 0, 1, 2, . . .. Suppose that
• P (0) is true;
• for any n ≥ 0, if P (n) is true then P (n + 1) is true.
Then P (n) is true for all natural numbers n.
It is not hard to see how we might amend the hypotheses of the theorem above to show:
Corollary 10 Let N be an integer and let P (n) be a family of statements for n ≥ N . Suppose that
• P (N ) is true;
• for any n ≥ N , if P (n) is true then P (n + 1) is true.
Then P (n) is true for all integers n ≥ N .
Theorem 11 (Strong Induction) Let P (n) be a family of statements indexed by the natural
numbers n = 0, 1, 2, . . .. Suppose that
• (Initial Step) P (0) is true;
• (Inductive Step) for any n ≥ 0, if P (0), P (1), . . . , P (n) are all true then so is P (n + 1).
Then P (n) is true for all natural numbers n.
Proof For n ≥ 0, let Q(n) be the statement ‘P (k) is true for 0 ≤ k ≤ n’. The assumptions of the
above theorem are equivalent to: (i) Q(0) (which is the same as P (0)) is true and (ii) if Q(n) is true
then Q(n + 1) is true. By induction (as stated in Theorem 9) we know that Q(n) is true for all n.
As P (n) is a consequence of Q(n), then P (n) is true for all n.
As a consequence of induction we can now show:
Proposition 12 Every non-empty subset of the natural numbers has a minimal element.
Proof Suppose, for a contradiction, that S is a non-empty subset of N with no minimal element and define
Q(n) to be the statement that no element of S is less than or equal to n. If 0 were in S then 0 would be a
minimal element, so 0 is not in S and Q(0) is true. If Q(n) is true and n + 1 were in S, then n + 1 would be
a minimal element of S; hence n + 1 is not in S and Q(n + 1) is true. By induction Q(n) is true for every
natural number n, so S is empty, contradicting S being non-empty.
1.3 Examples
LHS = x + (y + 0) = x + y = (x + y) + 0 = RHS,
from the previous definition. So if we assume the proposition is true for z = n and all x, y ∈ N then
Proof This is left as Sheet 1, Exercise 2.
To reinforce the need for proof, and to show how patterns can at first glance deceive us, consider
the following example. Take two points on the circumference of a circle and take a line joining them;
this line then divides the circle’s interior into two regions. If we take three points on the perimeter
then the lines joining them will divide the disc into four regions. Four points can result in a maximum
of eight regions. Surely, then, we can confidently predict that n points will maximally result in 2^{n−1}
regions.
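In fact the prediction fails. The maximal number of regions for n points on the circle is known to be 1 + C(n, 2) + C(n, 4), a closed form quoted here without proof (it is not derived in these notes), and a quick computation shows the first disagreement with 2^{n−1} at n = 6:

```python
from math import comb

# Comparing the known maximal-region count 1 + C(n,2) + C(n,4) with the
# tempting guess 2**(n-1).  (Sketch only; the closed form is quoted, not proved.)
for n in range(1, 8):
    actual = 1 + comb(n, 2) + comb(n, 4)
    print(n, actual, 2 ** (n - 1))
# n = 1, ..., 5 give 1, 2, 4, 8, 16 as expected, but n = 6 gives 31, not 32.
```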
Remark 19 This problem differs from our earlier examples in that our family of statements now
involves two variables n and k, rather than just the one variable. If we write P (n, k) for the statement
in equation (1.1) then we can use induction to prove all of the statements P (n, k) in various ways:
• we could prove P (1, 1) and show how P (n + 1, k) and P (n, k + 1) both follow from P (n, k) for
n, k ≥ 1;
• we could prove P (1, k) for all k ≥ 1 and show knowing P (n, k) for all k leads to the truth of
P (n + 1, k) for all k — this reduces the problem to one application of induction in n, but to a
family of statements at a time;
• we could prove P (n, 1) for all n ≥ 1 and show how knowing P (n, k) for all n leads to proving
P (n, k + 1) for all n — in a similar fashion to the previous method, now inducting through k
and treating n as arbitrary.
What these different approaches rely on is that all the possible pairs (n, k) are somehow linked to
our initial pair (or pairs). Let
S = {(n, k) : n, k ≥ 1}
be the set of all possible pairs (n, k). The first method of proof uses the fact that the only subset T
of S satisfying the properties
• (1, 1) ∈ T ;
• if (n, k) ∈ T then both (n + 1, k) ∈ T and (n, k + 1) ∈ T ;
is S itself. The second and third bullet points above rely on the fact that the whole of S is the only
subset having similar properties.
Solution (of Example 18) In this case the second method of proof seems easiest, that is we will prove
that P (1, k) holds for all k ≥ 1 and show that assuming the statements P (N, k), for a particular N
and all k, is sufficient to prove the statements P (N + 1, k) for all k. Firstly we note
LHS of P (1, k) = 1 × 2 × · · · × k   and   RHS of P (1, k) = [1 × 2 × · · · × (k + 1)] / (k + 1) = 1 × 2 × · · · × k
are equal, proving P (1, k) for all k ≥ 1. Now if P (N, k) holds true, for particular N and all k ≥ 1,
we have
LHS of P (N + 1, k) = \sum_{r=1}^{N+1} r(r + 1)(r + 2) · · · (r + k − 1)
= N(N + 1) · · · (N + k)/(k + 1) + (N + 1)(N + 2) · · · (N + k)   [by hypothesis]
= (N + 1)(N + 2) · · · (N + k) [ N/(k + 1) + 1 ]
= (N + 1)(N + 2) · · · (N + k)(N + k + 1)/(k + 1)
= RHS of P (N + 1, k),
proving P (N + 1, k) simultaneously for each k. This verifies all that is required for the second
method.
(x + y)^2 = x^2 + 2xy + y^2   and   (x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3.
It may even have been pointed out to you that these coefficients 1, 2, 1 and 1, 3, 3, 1 are simply
the numbers that appear in Pascal’s triangle and that more generally the nth row (counting from
n = 0) contains the coefficients in the expansion of (x + y)^n. Pascal’s triangle is the infinite triangle of
numbers that has 1s down both edges and a number internal to some row of the triangle is calculated
by adding the two numbers above it in the previous row. So the triangle grows as follows:
n=0 1
n=1 1 1
n=2 1 2 1
n=3 1 3 3 1
n=4 1 4 6 4 1
n=5 1 5 10 10 5 1
n=6 1 6 15 20 15 6 1
Of course we haven’t proved this identity yet! These identities, for general n, are the subject of
the binomial theorem. We introduce now the binomial coefficients; their connection with Pascal’s
triangle will become clear soon.
Definition 20 The (n, k)th binomial coefficient is the number
\binom{n}{k} = n! / (k!(n − k)!)   where 0 ≤ k ≤ n.
\binom{n}{k} is read as ‘n choose k’ and in some books is denoted as ^nC_k. By convention \binom{n}{k} = 0 if k > n ≥ 0
or n ≥ 0 > k.
The following lemma demonstrates that the binomial coefficients are precisely the numbers from
Pascal’s triangle.
Lemma 21 Let 1 ≤ k ≤ n. Then
\binom{n}{k − 1} + \binom{n}{k} = \binom{n + 1}{k}.
Corollary 22 The kth number in the nth row of Pascal’s triangle is \binom{n}{k} (remembering to count from
n = 0 and k = 0). In particular, the binomial coefficients are whole numbers.
Proof We shall prove this by induction. Note that \binom{0}{0} = 1 gives the 1 at the apex of Pascal’s
triangle, proving the initial step. Suppose now that the numbers \binom{n}{k}, where 0 ≤ k ≤ n, are the
numbers that appear in the nth row of Pascal’s triangle. The first and last entries of the (n + 1)th
row (associated with k = 0 and k = n + 1) are
1 = \binom{n + 1}{0}   and   1 = \binom{n + 1}{n + 1}
as required. For 1 ≤ k ≤ n, the kth entry on the (n + 1)th row is formed by adding the (k − 1)th
and kth entries from the nth row. By the inductive hypothesis and Lemma 21 their sum is
\binom{n}{k − 1} + \binom{n}{k} = \binom{n + 1}{k},
verifying that the (n + 1)th row also consists of the correct binomial coefficients. The result follows
by induction.
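As a quick numerical sanity check of Corollary 22 (a sketch, not part of the notes), one can build the rows of Pascal's triangle by the addition rule and compare them with the factorial formula of Definition 20:

```python
from math import factorial

def binom(n, k):
    # The factorial formula of Definition 20.
    return factorial(n) // (factorial(k) * factorial(n - k))

row = [1]                            # row n = 0 of Pascal's triangle
for n in range(1, 11):
    # Build the next row by the addition rule (1s at the ends, sums inside).
    row = [1] + [row[k - 1] + row[k] for k in range(1, n)] + [1]
    assert row == [binom(n, k) for k in range(n + 1)]
print("rows 0-10 of Pascal's triangle match the binomial coefficients")
```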
Finally, we come to the binomial theorem.
Theorem 23 (Binomial Theorem) Let n be a natural number and x, y be real numbers. Then
(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n−k}.
Proof Let’s check the binomial theorem first for n = 0. We can verify this by noting
LHS = (x + y)^0 = 1;   RHS = \binom{0}{0} x^0 y^0 = 1.
For induction, we aim to show the theorem holds in the (n + 1)th case assuming the nth case to be
true. We have
LHS = (x + y)^{n+1} = (x + y)(x + y)^n = (x + y) \sum_{k=0}^{n} \binom{n}{k} x^k y^{n−k}
writing in our assumed expression for (x + y)^n. Expanding the brackets gives
\sum_{k=0}^{n} \binom{n}{k} x^{k+1} y^{n−k} + \sum_{k=0}^{n} \binom{n}{k} x^k y^{n+1−k} = x^{n+1} + \sum_{k=0}^{n−1} \binom{n}{k} x^{k+1} y^{n−k} + \sum_{k=1}^{n} \binom{n}{k} x^k y^{n+1−k} + y^{n+1},
√(2 + √(2 + √(2 + · · · + √2))) = 2 cos(π / 2^{n+1})     (n root signs)
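This identity is easy to test numerically; the sketch below, not part of the notes, compares the nested radical with 2 cos(π/2^{n+1}) for small n:

```python
from math import sqrt, cos, pi

def nested(n):
    """sqrt(2 + sqrt(2 + ... + sqrt(2))) with n root signs."""
    value = 0.0
    for _ in range(n):
        value = sqrt(2 + value)
    return value

for n in range(1, 10):
    assert abs(nested(n) - 2 * cos(pi / 2 ** (n + 1))) < 1e-12
print("identity verified numerically for n = 1, ..., 9")
```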
Sets are amongst the most primitive objects in mathematics, so primitive in fact that it is somewhat
difficult to give a precise definition of what one means by a set — i.e. a definition which uses words
with entirely unambiguous meanings. For example, here is a description due to Cantor:
By an "aggregate" [set] we are to understand any collection into a whole M of definite and separate
objects m of our intuition or our thought. These objects we call the "elements" of M.
One might now ask exactly what one means by a "collection" or by "objects", but the point is that
we all know intuitively what Cantor is talking about. Cantor’s "aggregate" is what we call a set.
Notation 25 (a) Let S be a set. We then write x ∈ S to denote that x is an element of S. That
is, x is one of the "objects" in S. And we write x ∉ S to denote that x is not an element of S.
(b) Let S and T be sets. We write T ⊆ S to denote that whenever x ∈ T then x ∈ S. That is,
every element of T is an element of S. In this case T is said to be a subset of S.
(c) The symbol ⊆ is also read as "is contained in". Note that the symbol ⊂ typically means the
same and not "is contained in but not equal to", as you might suspect.
At the same time, too liberal an understanding of what a "collection" means can lead to famous
paradoxes.
Remark 26 (Russell’s Paradox) Let
H = {sets S : S ∉ S} .
That is, H is the collection of sets S which are not elements of themselves. This, at first glance, is
an odd choice of set to consider but also currently seems a perfectly valid set for our consideration.
Most sets that we can think of seem to be in H. For example, N is in H, as the elements of N
are single natural numbers, and no element is the set N itself. The problem arises when we ask the
question: is H ∈ H?
There are two possibilities: either H ∈ H or H ∉ H. On the one hand, if H ∉ H then H meets
the precise criterion for being in H and so H ∈ H. On the other hand, if H ∈ H then H ∉ H is
false, and so H does not meet the criterion for being in H and hence H ∉ H.
So we have a contradiction either way. A modern take on Russell’s Paradox is that the set H
is inherently self-contradictory. It would be akin to starting a proof with "let x be the smallest
positive real number" or "let n be the largest natural number". There are no such numbers, so it is
not surprising that contradictory or nonsensical proofs might result from such a beginning. A more
modern take on sets is the ZF axioms, so named after the mathematicians Zermelo and Fraenkel,
who gave a set of axioms for how a set might be constructed — for example by taking unions or
P(A) = {∅, {1} , {2} , {3} , {1, 2} , {1, 3} , {2, 3} , {1, 2, 3}}
Z = {. . . , −2, −1, 0, 1, 2, . . .} .
(c) The set of rational numbers (or just simply rationals) is denoted Q. This is the set comprising
all fractions where the numerator and denominator are both integers. So
Q = { m/n : m, n ∈ Z, n > 0 } .
(d) The set of real numbers R will be formally introduced in Analysis I. For now we simply state that
the real numbers are those numbers with a decimal expansion. This includes the rational numbers
but also includes irrational numbers such as √2 and π.
(e) The set of complex numbers is denoted C, where
C = {a + bi : a, b ∈ R}
and i = √−1.
Example 31 Note
N ⊆ Z ⊆ Q ⊆ R ⊆ C.
Definition 32 Given subsets A and B of a set S then:
(a) the union A ∪ B of A and B is the subset consisting of those elements that are in A or B (or
both), that is:
A ∪ B = {x ∈ S : x ∈ A or x ∈ B}.
(b) the intersection A ∩ B of A and B is the subset consisting of those elements that are in both A
and B, that is:
A ∩ B = {x ∈ S : x ∈ A and x ∈ B}.
(c) the complement of A, written A^c or A′, is the subset consisting of those elements of S that are not
in A, that is:
A^c = {x ∈ S : x ∉ A}.
(d) the complement of B in A, written A\B, or sometimes A − B, is the subset consisting of those
elements that are in A and not in B, that is:
A\B = {x ∈ A : x ∉ B} = A ∩ B^c.
A ∪ B = Z; A ∩ B = ∅; Ac = B = B\A.
Definition 34 Let S and T be sets. Their Cartesian product S × T is the set of all ordered pairs
(s, t) where s ∈ S and t ∈ T. Note that — as the name suggests — order matters in an ordered pair.
So (1, 2) ≠ (2, 1) whilst {1, 2} = {2, 1} as sets.
Definition 35 If n ≥ 1 then we write S^n for the set of all ordered n-tuples, that is vectors of the form
(s1 , s2, . . . , sn ) where si ∈ S for all i.
(A × C) ∩ (B × D) = (A ∩ B) × (C ∩ D).
To appreciate this, note that (s, t) is in the LHS, and likewise in the RHS, precisely when the four
conditions s ∈ A, t ∈ C, s ∈ B, t ∈ D all apply.
Example 37 Note that (A × B)^c ≠ A^c × B^c in general. For example if S = {0, 1} = T and
A = {0} = B then
We can use Cartesian products to define the notion of disjoint unions. If we take the union A ∪ B
of two sets A and B, then any element that is in both A and B appears just once in the union. We
might wish to keep some sense of an element being in both sets and retain a sense in the union of
the x that came from A that is distinct from the notion of the x that came from B.
Definition 38 (Disjoint Union) Let A and B be sets. Their disjoint union A ⊔ B is defined to be
A × {0} ∪ B × {1}.
The set A is now identified with A × {0} and B with B × {1} and any element x that is in both A
and B appears twice in the disjoint union as (x, 0) and (x, 1).
We now need to introduce some logical notation and language to help with our proofs of set identities.
Notation 39 Let P and Q denote logical statements — such as ‘x ≤ y’ or ‘for all a ∈ R, a^2 ≥ 0’.
(a) P ⇒ Q
This reads "P implies Q". This means that whenever the statement P is true then the statement
Q is true. This implication may be strict, meaning that it may be possible for Q to be true and P
false. For example x ≥ 4 ⇒ x ≥ 2, but 4 > 3 ≥ 2 shows that the implication is strict.
We say that P is sufficient for Q and that Q is necessary for P. The statement P ⇒ Q may
also be read as "if P then Q", "Q if P " and, occasionally, "P only if Q".
P ⇒ Q can also be written as Q ⇐ P.
(b) P ⇔ Q
This reads as "P if and only if Q", which is sometimes contracted to "P iff Q". This means that
whenever the statement P is true then the statement Q is true, and vice versa. So P is true precisely when
Q is true, or equally P is necessary and sufficient for Q.
Note that the context of the statement is an important part of its truth or falsity. So in R the
statement x > 2 ⇔ x2 > 4 is false — with x = −3 being a counter-example — but the statement is
true in N.
(c) We write P ∧ Q for the statement "P and Q" which holds when both P and Q are true.
(d) We write P ∨ Q for the statement "P or Q" which holds when either P or Q (or both) is
true. So note this is not an "exclusive or".
x∈A ⇒ x ∈ B,
A=B ⇔ ∀s ∈ S (s ∈ A ⇔ s ∈ B)
⇔ ∀s ∈ S (s ∈ A ⇒ s ∈ B ∧ s ∈ B ⇒ s ∈ A)
⇔ (∀s ∈ S s ∈ A ⇒ s ∈ B) ∧ (∀s ∈ S s ∈ B ⇒ s ∈ A)
⇔ A ⊆ B ∧ B ⊆ A.
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
and
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Proof Suppose x ∈ A ∪ (B ∩ C). This means that x ∈ A or x ∈ B ∩ C. From the definition of union
this leads to a case by case analysis. If x is in A then we have x ∈ A ∪ B and x ∈ A ∪ C. So x is in
the RHS. If instead x ∈ B ∩ C then x ∈ B and x ∈ C, so again x ∈ A ∪ B and x ∈ A ∪ C and x is in
the RHS.
Conversely suppose that x ∈ (A ∪ B) ∩ (A ∪ C) . So x ∈ A ∪ B and x ∈ (A ∪ C) . Now we have
four cases:
(i) x ∈ A and x ∈ A which imply x ∈ A ⊆ A ∪ (B ∩ C);
(ii) x ∈ A and x ∈ C which imply x ∈ A ⊆ A ∪ (B ∩ C);
(iii) x ∈ B and x ∈ A which imply x ∈ A ⊆ A ∪ (B ∩ C);
(iv) x ∈ B and x ∈ C which imply x ∈ B ∩ C ⊆ A ∪ (B ∩ C).
Thus we have shown both LHS ⊆ RHS and RHS ⊆ LHS, and so the sets are equal.
The second equation is left as Sheet 1, Exercise 4(i).
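For small sets these distributive laws can also be confirmed by exhaustive checking. The Python sketch below (not part of the notes) tests both identities over every choice of A, B, C inside a three-element set S:

```python
from itertools import combinations

S = {0, 1, 2}
subsets = [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]

for A in subsets:
    for B in subsets:
        for C in subsets:
            assert A | (B & C) == (A | B) & (A | C)      # union over intersection
            assert A & (B | C) == (A & B) | (A & C)      # intersection over union
print("both distributive laws hold for all subsets of", S)
```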
Remark 42 (Logical equivalents of set-theoretic identities) In a very natural way ‘union’,
‘intersection’ and ‘complement’ for sets correspond to ‘or’, ‘and’ and ‘not’ for logical statements.
For 1 ≤ m, n ≤ 2 let
A_{1,1} = {1, 2},   A_{1,2} = {3, 4},   A_{2,1} = {2, 4},   A_{2,2} = {1, 3}.
Then
⋃_{m=1}^{2} ⋂_{n=1}^{2} A_{m,n} = ⋃_{m=1}^{2} (A_{m,1} ∩ A_{m,2}) = ∅ ∪ ∅ = ∅;
⋂_{n=1}^{2} ⋃_{m=1}^{2} A_{m,n} = ⋂_{n=1}^{2} (A_{1,n} ∪ A_{2,n}) = {1, 2, 4} ∩ {1, 3, 4} = {1, 4}.
∃x ∈ S ∀y ∈ T P (x, y)
∀y ∈ T ∃x ∈ S P (x, y)
the first statement is much stronger than the second. Take care! The following example is chosen to
show how different such superficially similar statements actually are.
Example 47 Let S be the set of capital cities and T be the set of countries. Let P (x, y) be the
statement "x is the capital of y".
The statement
∀y ∈ T ∃x ∈ S P (x, y)
is then true — it says every country has a capital city (and it doesn’t really matter that some countries
have arguably more than one capital). Importantly here the x is permitted to depend on the y as the
quantifier comes second. So for y = Denmark there exists x = Copenhagen and for y = Botswana
there exists x = Gaborone.
However the statement
∃x ∈ S ∀y ∈ T P (x, y)
is far from true. This time the existential quantifier comes first and this single capital city x is
required to be the capital of all countries — there is clearly no such city.
An alternative approach to proving set-theoretic and logical identities is via truth tables. These
provide a systematic means of treating all the different cases that arise. There may be different
numbers of cases to consider depending on the number of sets involved in an identity. So the truth
table for the intersection of two sets involves four cases as an element may independently be in each
of the two sets or not.
Below are listed the truth tables for A∩B, A∪B, Ac , A\B and A∆B, the last denoting symmetric
difference — to be in A∆B an element needs to be in precisely one of A or B.
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
can then be made via these truth tables using a case by case analysis. For example, if an element
x is in A and C, but not in B, then we have enough information to determine whether x is in
the LHS/RHS. In all there would be eight cases to check (three independent choices of whether an
element is in each of A, B, C) and these are listed below. The case just described is the sixth case
below. As the LHS column and RHS column read identically, we have indeed proved the
identity true in all cases.
There are similarly truth tables for logical statements. If P and Q are now logical statements
then below are the truth tables for P ∧ Q, P ∨ Q, ¬P and P ⇒ Q.
P Q   P ∧ Q
F F     F
F T     F
T F     F
T T     T

P Q   P ∨ Q
F F     F
F T     T
T F     T
T T     T

P    ¬P
F     T
T     F

P Q   P ⇒ Q
F F     T
F T     T
T F     F
T T     T
The last truth table for P ⇒ Q may take a little getting used to. It’s also not hard to see that this
is the same truth table as (¬P ) ∨ Q as it is false only when ¬P and Q are both false.
To help us understand why this is the case, consider how we prove P ⇒ Q using proof by
contradiction. In this case we assume that Q is false, whilst still assuming the hypotheses P to be
true, and arrive at a contradiction. That is we show P ∧ (¬Q) to be false — or more succinctly we
prove
¬(P ∧ (¬Q)).
For P ∧ (¬Q) to be true then both P and ¬Q must be true. So for P ∧ (¬Q) to be false, only one
of P and ¬Q need be false. That is, ¬(P ∧ (¬Q)) is equivalent to (¬P ) ∨ Q.
(We have just demonstrated one of De Morgan’s rules. More on this in the next section.) We can
also see that the following truth tables are the same: that of P ⇒ Q and that of ¬Q ⇒ ¬P .
P Q   P ⇒ Q
T T     T
T F     F
F T     T
F F     T

P Q   ¬Q ⇒ ¬P
T T     T
T F     F
F T     T
F F     T
The statement ¬Q ⇒ ¬P is known as the contrapositive of P ⇒ Q and the two statements are
equivalent.
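These truth tables can be generated mechanically. The sketch below (not part of the notes) encodes P ⇒ Q as (¬P) ∨ Q, as discussed above, and checks that it agrees with its contrapositive in all four cases:

```python
from itertools import product

def implies(p, q):
    # P => Q encoded as (not P) or Q, as in the discussion above.
    return (not p) or q

for P, Q in product([False, True], repeat=2):
    assert implies(P, Q) == implies(not Q, not P)     # contrapositive
    assert implies(P, Q) == (not (P and (not Q)))     # the De Morgan-style rewriting
print("P => Q agrees with its contrapositive in all four cases")
```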
Augustus De Morgan was the first president of the London Mathematical Society. Beyond this, he
is mainly remembered for the following two set-theoretic laws (and their logical equivalents).
Theorem 48 (De Morgan’s Laws — finite version) Let A1, . . . , An be a family of subsets of a
set S. Then
( ⋂_{k=1}^{n} A_k )^c = ⋃_{k=1}^{n} A_k^c
and
( ⋃_{k=1}^{n} A_k )^c = ⋂_{k=1}^{n} A_k^c .
Proof We will prove the first identity with n = 2 using a truth table.
We can then prove the general result by induction. Suppose that the first De Morgan law holds true
for n subsets. Then
( ⋂_{k=1}^{n+1} A_k )^c = ( ( ⋂_{k=1}^{n} A_k ) ∩ A_{n+1} )^c = ( ⋃_{k=1}^{n} A_k^c ) ∪ A_{n+1}^c = ⋃_{k=1}^{n+1} A_k^c .
The logical versions of these laws are ¬(P ∧ Q) ⇔ (¬P ) ∨ (¬Q) and ¬(P ∨ Q) ⇔ (¬P ) ∧ (¬Q).
These may be more intuitive equivalents of De Morgan’s laws. The first says that ‘P and Q’ will be
false if either P or Q is false or, put another way, it is sufficient to prove that P and Q aren’t both
true by showing that either one is false. In the second, P ∨ Q is true if either of P, Q holds, so for
this to fail both P and Q must be false.
Example 49 Let A, B ⊆ S and C, D ⊆ T . Then
Solution
(s, t) ∈ (A × B)^c ⇔ ¬((s, t) ∈ A × B)
⇔ ¬(s ∈ A ∧ t ∈ B)
⇔ ¬(s ∈ A) ∨ ¬(t ∈ B)
⇔ s ∈ A^c ∨ t ∈ B^c
⇔ (s, t) ∈ (A^c × T ) ∪ (S × B^c).
More generally for arbitrary unions and intersections, the quantifiers ∀ and ∃ make explicit,
logically, what it means for an element to be in such intersections and unions. Above we consider
finite unions and intersections. It’s not hard to imagine that we might have infinitely many sets
A0 , A1 , A2 ,. . . and wish to consider the intersection and union
∞ ∞
An , An .
n=0 n=0
To be in the intersection requires being in every An and to be in the union means being in some An .
In these examples the sets An are indexed using n ∈ N, and N is called the indexing set. We might
generalize this to the case where we have sets {Ai : i ∈ I}, with I now being the indexing set, where
I could be a yet "larger" set than N, such as R.
Hence we define the arbitrary intersection and arbitrary union by
x ∈ ⋂_{i∈I} A_i ⇔ ∀i ∈ I x ∈ A_i ;
x ∈ ⋃_{i∈I} A_i ⇔ ∃i ∈ I x ∈ A_i .
So if we are to have versions of De Morgan’s laws that apply to arbitrary intersections and unions
we need to be able to negate statements involving these quantifiers.
Proposition 50 (De Morgan’s Laws — logical version) Let P (x) be a family of statements,
indexed by the elements x of some set S. Then
¬ (∀x ∈ S P (x)) ⇔ ∃x ∈ S ¬P (x);
¬ (∃x ∈ S P (x)) ⇔ ∀x ∈ S ¬P (x).
Proof For ∀x ∈ S P (x) to be true it must be the case that P (x) is universally true on the set S. For this
not to be the case, just one counter-example x ∈ S needs to exist where ¬P (x) holds.
The second law follows from the first. If we replace P (x) with ¬P (x) we have
¬ (∀x ∈ S ¬P (x)) ⇔ ∃x ∈ S P (x),
and negating both sides gives the second law.
Remark 51 You should make sure you are comfortable with these logical versions of De Morgan’s
laws. One says that for P (x) not to be universally true means there is some counter-example. The
second says that for P (x) to be nowhere true means that it is everywhere false.
So far that might seem reasonably clear and intuitive. But, as we will see, logical statements in
mathematics can become quite complicated, with many important statements including four quantifiers.
The careful negation of such statements — for example if you wish to prove a result by
contradiction — needs due attention.
Remark 52 (Vacuously true statements) Whatever the statement P (x), the statement
∃x ∈ ∅ P (x)
is always false, no matter what P (or its negation) is, whilst a statement of the form
∀x ∈ ∅ P (x)
is said to be vacuously true. Essentially, as ∅ is empty there is no x for which the statement needs verifying, and so
the statement is true.
Corollary 53 (De Morgan’s Laws — arbitrary set-theoretic version) Let S be a set and for
each i ∈ I let Ai ⊆ S. Then
( ⋂_{i∈I} A_i )^c = ⋃_{i∈I} A_i^c ;
( ⋃_{i∈I} A_i )^c = ⋂_{i∈I} A_i^c .
Proof We have
x ∈ ( ⋂_{i∈I} A_i )^c ⇔ ¬ ( x ∈ ⋂_{i∈I} A_i )
⇔ ¬ (∀i ∈ I x ∈ A_i )
⇔ ∃i ∈ I ¬(x ∈ A_i )
⇔ ∃i ∈ I x ∈ A_i^c
⇔ x ∈ ⋃_{i∈I} A_i^c .
We can prove the second law in a similar way or alternatively set B_i = A_i^c so that
x ∈ ( ⋃_{i∈I} B_i )^c ⇔ ¬ ( x ∈ ⋃_{i∈I} A_i^c ) ⇔ x ∈ ⋂_{i∈I} A_i ⇔ x ∈ ⋂_{i∈I} B_i^c .
Binary relations (which we will usually refer to simply as relations) are common in mathematics
and everyday life, and include the following examples.
• The relation ≤, meaning "less than or equal to", which compares pairs of real numbers.
• The relation |, meaning "divides" or "is a factor of", which compares pairs of positive integers.
• The relation ⊆, meaning "is a subset of", which compares subsets of a set.
• The relation ≤, meaning "is less than or equal to at all points", which compares pairs of
functions from R to R.
• The relation "is at the same height above sea level", which compares points on the earth’s
surface.
(c) Let S = P({1, 2}), the set of subsets of {1, 2} and ⊂ denote the relation "is strictly contained
in". Then
⊂ equals {(∅, {1}), (∅, {2}), (∅, {1, 2}), ({1}, {1, 2}), ({2}, {1, 2})}.
Definition 56 Let S be a set, R a relation on S and s, t, u ∈ S. We say that
(a) R is reflexive if sRs for all s in S.
(b) R is symmetric if whenever sRt then tRs.
(c) R is anti-symmetric if whenever sRt and tRs then s = t.
(d) R is transitive if whenever sRt and tRu then sRu.
Example 57 Define R on N by aRb if b = a^k for some k ≥ 1. Then R is reflexive, anti-symmetric,
transitive but not symmetric.
Solution Reflexivity: R is reflexive as a = a^1 for all a ∈ N and so aRa.
Anti-symmetry: Say now that aRb and bRa. Then b = a^k and a = b^l for k, l ≥ 1. If a = 0 then
a = b = 0 follows, and if a = 1 then b = a^k = 1 = a. Otherwise a ≥ 2 and a = (a^k)^l = a^{kl}, and this
implies a^{kl−1} = 1. Then kl = 1, giving k = l = 1 and implying a = b.
Transitivity: We see that R is transitive also — if aRb and bRc then b = a^k and c = b^l for some
k, l ≥ 1. Then c = (a^k)^l = a^{kl} and so we see that aRc as kl ≥ 1.
Symmetry: Finally R is not symmetric as 2R4 is true but 4R2 is false.
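The claims of Example 57 can also be tested by brute force on a finite initial segment of N. The sketch below is not part of the notes (the bound on k is chosen so that no powers below 30 are missed); it builds the restricted relation as a set of pairs and checks the four properties directly:

```python
def related(a, b):
    """aRb iff b = a**k for some k >= 1 (k < 6 suffices when b < 30)."""
    return any(a ** k == b for k in range(1, 6))

elements = range(30)
R = {(a, b) for a in elements for b in elements if related(a, b)}

reflexive     = all((a, a) in R for a in elements)
symmetric     = all((b, a) in R for (a, b) in R)
antisymmetric = all(a == b for (a, b) in R if (b, a) in R)
transitive    = all((a, c) in R for (a, b) in R for (b2, c) in R if b == b2)
print(reflexive, symmetric, antisymmetric, transitive)   # True False True True
```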
Example 58 We define R on S, the set of n × n real matrices, with ARB if there
exists an invertible matrix P (that is, P ∈ GL(n, R)) such that A = P^{−1}BP. Then R is reflexive, symmetric and transitive
but not anti-symmetric. The relation R is called similarity.
Solution Reflexivity: we see that ARA for all A as A = I −1 AI.
Symmetry: if ARB then A = P^{−1}BP for some P and so B = P AP^{−1} = (P^{−1})^{−1}A(P^{−1}) showing
BRA.
as is
z1 ≼ z2 ⇔ z1 = z2 or |z1 | < |z2 | .
(c) (Lexicographic Order) With S = C then the following is a total order
(e) "Divides" on N is a partial order which is not a total order. e.g. 2 ∤ 3 and 3 ∤ 2.
a − c = (a − b) + (b − c) = (k + l)n
and hence a ∼ c.
Definition 67 Given an equivalence relation ∼ on a set S with a ∈ S, then the equivalence class
of a, written ā or [a] , is the subset
ā = {x ∈ S : x ∼ a} .
Example 68 For Example 64(a), the equivalence classes are the circles centred at the origin with
radius r > 0, and the origin by itself.
For Example 64(c), the equivalence class of 0 is the set of constant functions.
Theorem 69 Let ∼ be an equivalence relation on the set S. The equivalence classes of ∼ form a
partition of S.
Definition 70 Let S be a set and Λ be an indexing set. We say that a collection of subsets Aλ of S
(where λ ∈ Λ) is a partition of S if
(i) Aλ ≠ ∅ for each λ ∈ Λ;
(ii) ⋃_{λ∈Λ} Aλ = S;
(iii) if λ ≠ µ then Aλ ∩ Aµ = ∅, or equivalently: if Aλ ∩ Aµ ≠ ∅ then λ = µ.
Proof (Not on IUM syllabus — this result is proved in the HT Groups and Group Actions course —
but it is included here for anyone interested) Recall that the equivalence class of a ∈ S is
ā = {s ∈ S : s ∼ a} .
In order to show that these partition S we need to show that equivalence classes are non-empty and
that every element of S lies in one, and only one, equivalence class of S. Given a ∈ S then a ∼ a by
c ∼ a and c ∼ b.
Theorem 72 Let S be a set. Given a partition P of S and a ∈ S, we will write Pa for the unique
set in P such that a ∈ Pa .
(a) Given an equivalence relation ∼ on S then the equivalence classes of ∼ form a partition P (∼)
of S (where P (∼)a = ā for each a ∈ S).
(b) Given a partition P of S then the relation ∼P on S defined by
a ∼P b if and only if b ∈ Pa
is an equivalence relation on S.
(c) As given above, (a) and (b) are inverses of one another; that is ∼_{P (∼)} = ∼ and P (∼_P ) = P .
In particular, there are as many equivalence relations on a set S as there are partitions of the set S.
Proof This result is not on the IUM syllabus. A proof of this will appear in the HT Groups and
Group Actions course.
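The correspondence between equivalence relations and partitions is easy to see concretely for congruence modulo n. The sketch below (not part of the notes) groups {0, 1, . . . , 11} into equivalence classes for a ∼ b iff 3 divides a − b, and checks that the classes are non-empty, disjoint and cover the set:

```python
# Equivalence classes of a ~ b iff 3 | (a - b) on {0, ..., 11}.
n, S = 3, list(range(12))
classes = {}
for a in S:
    classes.setdefault(a % n, []).append(a)

print(list(classes.values()))
# [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]: a partition of S into 3 classes
assert sorted(x for c in classes.values() for x in c) == S
```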
Exercise 9 Let A, B ⊆ S. Show that (A^c)^c = A. Show more generally that A\(A\B) = A ∩ B.
Exercise 10 Let A, B, C ⊆ S. The symmetric difference A∆B is defined to be the subset con-
sisting of those elements of S which are in A or are in B, but not both.
(i) Show that
A∆B = (A\B) ∪ (B\A).
(ii) Show further that
A∆(B∆C) = (A∆B)∆C,   A∆B = B∆A,   A∆∅ = A,   A∆A = ∅.
Consequently P(S) is an Abelian group under ∆, with identity ∅ and every element its own inverse.
(iii) Show that
A ∩ (B∆C) = (A ∩ B)∆(A ∩ C).
Together with the facts that
(A ∩ B) ∩ C = A ∩ (B ∩ C),   A ∩ B = B ∩ A,   A ∩ S = A,
this means that P(S) is a commutative ring under ∆ and ∩, with additive identity ∅ and multiplicative identity S.
Exercise 11 Give a sequence of non-empty sets A1 ⊇ A2 ⊇ A3 ⊇ · · · such that their intersection
⋂_{k=1}^{∞} A_k
is empty.
is a total order.
Exercise 14 A partially ordered set (S, ) is said to be a lattice if for each x, y ∈ S there is a least
upper bound x ∨ y and a greatest lower bound x ∧ y. Show that the following partially ordered sets
are lattices, describing ∨ and ∧ in each case. (We say that z is an upper bound for x and y if
x ≤ z and y ≤ z, and z is further a least upper bound if whenever x ≤ w and y ≤ w then z ≤ w.)
(i) (P(X), ⊆) where X is a given set.
(ii) (N, |) .
(iii) (R, ≤) .
(iv) The space of bounded functions f : R → R with f ≤ g if f (x) ≤ g(x) for all real x.
3. FUNCTIONS
Functions now play a key role in mathematics, but fledgling concepts of a function date back
only to the 17th century, mainly down to the introduction of Cartesian co-ordinates and the
development of calculus. Even by Euler’s time, a century later, the working definition of a function
was at best somewhat limited:
and Euler also permitted functions to take multiple values. Fourier series in the nineteenth century
led to the study of arbitrary real-valued functions of a real variable. The more general (naïve)
definition of a function below could not have come about without Cantor’s work on set theory in
the late 19th century and a final rigorous definition of a function was not really in place until the
twentieth century with the work of Zermelo, Fraenkel, Skolem, Von Neumann, Weyl.
Definition 73 Let X and Y be sets. A function f : X → Y is an assignment of a value f (x) ∈ Y
for each x ∈ X. The set X is called the domain of f and the set Y is called the codomain of f.
Functions are also referred to as maps and mappings.
Remark 74 It is an important, if subtle, point to appreciate that a function is the "whole package"
of assignment, domain and codomain. For example the following four functions are all different
functions, despite the assignment looking the same, and are different in some crucial ways as we will
later see.
f1 : R → R given by f1(x) = x2 .
f2 : R → [0, ∞) given by f2(x) = x2 .
f3 : [0, ∞) → R given by f3(x) = x2 .
f4 : [0, ∞) → [0, ∞) given by f4(x) = x2 .
f(X) = {f(x) : x ∈ X} ⊆ Y.
(c) Let T = P(N)\{∅} denote the set of non-empty subsets of N. The function f : T → N given by
f (A) = the largest element of A
is not a well-defined function as some non-empty subsets of N, such as N itself, do not have a largest
element.
(d) The function f : Q → Z given by f (m/n) = n is not a well-defined function as we would have
f (2/3) = 3 and f (4/6) = 6,
yet 2/3 = 4/6. A function cannot map the same point to more than one image point. We could
amend our definition to give a well-defined function by insisting that m/n is the rational in simplest
form with n > 0.
(e) The function f : N → Z defined by f(n) = n has image N. The map f is called inclusion. Given
a set B and a subset A ⊆ B then the inclusion map ι : A → B is defined by ι(a) = a for all a ∈ A.
(f) Given a set X, the identity map id : X → X is defined by id(x) = x for all x ∈ X.
(g) Given a function f : X → Y, and a subset A ⊆ X, the restriction of f to A is the map
f|A : A → Y defined by
f|A (x) = f(x).
So f|A is the same assignment as f but restricted only to the domain of A.
Remark 77 (Well-definedness) The phrase well-defined is common in mathematics — and it is
commonly confusing to undergraduates as checking something is well-defined means checking different
things in different scenarios. "Well-defined" means just that — that a mathematical object has been
properly defined; that is, there is no ambiguity nor omission in the definition of
the mathematical object. But given the varying nature of such objects, what needs checking varies
from situation to situation.
That said, it is common to need to check that a function is well-defined, and there are two main
ways in which we might need to check a function is well-defined.
(a) The assignment needs to be defined for all elements of the domain with the output in the
codomain. The functions below each fail to be well-defined for different reasons.
f1 : R → R,     x ↦ √x;
f2 : C → C,     x ↦ √x;
f3 : P(N) → N,  A ↦ |A| ;
f4 : R → R,     x ↦ min{y : y^3 + y > x};
f5 : R → R,     x ↦ min{y : y^2 + y = x}.
• f1 is not defined on the whole domain — it’s not defined on the negative numbers. One way to
resolve this would be to restrict the domain to [0, ∞).
• f2 is ambiguous. Every complex number has at least one square root, but the definition does
not make clear how f2 (−2i) is defined — would the answer be 1 − i or −1 + i? This problem
could be resolved by finding a way to specify which square root is intended.
• f3 is not defined on the whole domain — some subsets of N are infinite. This could be resolved
by restricting the domain to finite sets or by extending the codomain to N ∪ {∞} .
(b) Functions need to be carefully defined on sets of equivalence classes. Many functions are de-
fined in terms of representatives of equivalence classes, or in terms of language relating to equivalence
classes. We already do this without thinking; for example you will have known for a long time that
the area of a triangle is half its base times its height.
This formula does not relate to specific triangles with precisely known vertices — such a formula exists
but is messy — but rather is defined on the set of congruence classes of triangles.
In other examples a function is not well-defined when it does not "descend" to the set of equiva-
lence classes. Here are two functions that are well-defined on T, the set of triangles in R2.
For t ∈ T , f (t) = the length of the longest side of t.
For t ∈ T , g(t) = the size of the largest angle of t.
Both these are well-defined notions for a particular triangle. Two different equivalence relations on
T are ∼1 which denotes congruence and ∼2 which denotes similarity. Note that the function g still
makes sense on both sets of equivalence classes T / ∼1and T / ∼2 . However the function f only makes
sense on T / ∼1 . We cannot refer to the longest side of a collection of triangles that are similar to
one another, but it still makes sense to refer to their common largest angle.
Definition 78 Let f : X → Y be a function.
(a) Given A ⊆ X, then the image of A, denoted f(A), is the subset
f (A) = {f(x) : x ∈ A} ⊆ Y.
(b) Given C ⊆ Y , then the preimage of C, denoted f^{−1}(C), is the subset
f^{−1}(C) = {x ∈ X : f (x) ∈ C} ⊆ X.
Example 79 (a) Let f : R → R be the function f (x) = x2, and A = [0, ∞) and B = (−∞, 0]. Then
h(A) = h(B) = [−1, 1],   h^{−1}(A) = ⋃_{k∈Z} [−π/2 + 2kπ, π/2 + 2kπ] ,   h^{−1}(B) = ⋃_{k∈Z} [π/2 + 2kπ, 3π/2 + 2kπ].
Figure 1: y = f1 (x)
Solution (a) f1 is neither 1—1 nor onto. It is not 1—1 as f1 (1) = 1 = f1 (−1) and is not onto as −1
is not in the image of f1 . The dashed lines in the graph of y = f1 (x) highlight these facts.
(b) f2 is 1—1. If 2^{x1} = 2^{x2} then 2^{x1 − x2} = 1 and so x1 − x2 = 0. However f2 is not onto as −1 is
not in the image of the function.
(c) f3 is not 1—1 as f3 (1) = 0 = f3 (−1). It is onto, and this is best appreciated by a sketch of the
graph y = x^3 − x (Figure 1) or by knowing that every cubic has a real root. So the cubic equation
x^3 − x − y = 0 has a real solution x for all values of y. Hence y = f3 (x) is in the image of f3 for all
y or equivalently f3 is onto.
(d) That f4′(x) = 3x^2 + 1 > 0 means that f4 is strictly increasing. So if x1 < x2 then f4 (x1) <
f4 (x2) and we see that f4 never takes the same value at distinct inputs — that is f4 is 1—1. Again
sketching the graph y = x^3 + x or knowing x^3 + x − y = 0 has a real solution x for all values of y,
we see that f4 is onto.
Example 87 If we return to the examples f1 , f2 , f3, f4 from Remark 74 we note the following
This reinforces the fact that a function is the whole package of assignment, domain and codomain.
Note that f1 and f3 are not onto as −1 is not in the image, and f1 and f2 are not 1—1 as (−1)^2 = 1^2 .
thus showing that g ◦ f is onto. Again there is a partial converse, which is left to Sheet 2, Exercise
4(iii). But if g ◦ f is onto then f need not be onto as we can see by setting
f (r1 ) = f (r2) .
(g ◦ f ) (r1) = (g ◦ f ) (r2)
and as g ◦ f is 1—1 then r1 = r2 . However, g need not be 1—1 as shown using the same f, g, R, S, T as
above. Here (g ◦ f) (x) = x and so g ◦ f is 1—1, but g (1) = g (−1) = 1 and so g is not 1—1.
Notation 92 Let f : X → Y with A ⊆ Y. The earlier use of the notation f −1 (A) for pre-images
does not in any way imply that f is invertible. WHEN AND IF f is invertible, then the notation
f −1 (A) means the same both as the pre-image of A and also as the image of A under the function
f −1 .
Example 93 The function x ↦ sin x from R to R is not invertible. It is not for example 1—1 as
sin 0 = sin π. So when we consider the function sin^{−1} we are considering the inverse of the function
x ↦ sin x from [−π/2, π/2] to [−1, 1] .
Theorem 94 A function f : S → T is bijective if and only if it is invertible.
Proof Firstly we’ll assume that f has an inverse g : T → S such that
s1 = g (f (s1)) = g (f (s2 )) = s2
f (g (t)) = t
g (t) = s.
A potential problem with this is that there may be many such s and a well-defined g can only assign
one of these to t. But as f is 1—1 then we can show that there is in fact only one such s. If
f (s1 ) = t = f (s2 )
g (f (s)) = s
f (g (t)) = t
In fact, considered another way, injectivity and surjectivity can be rephrased as a function having
a left-inverse or a right-inverse. We state, but do not prove, the following result.
Proposition 95 Let f : R → S be a map between non-empty sets R and S.
(a) f is 1—1 if and only if there is a map g : S → R such that g ◦ f = idR .
(b) f is onto if and only if there is a map g : S → R such that f ◦ g = idS .
3.3 Cardinality
Cardinality is a fancy word for size — given a set X we wish to rigorously define |X|, the cardinality
of X, to be the number of distinct elements in the set X. For finite sets this will not throw up any
surprises — more surprising results will emerge when infinite sets are encountered in the Analysis I
course.
Definition 96 Let n ≥ 1 be a natural number and X be a set. We define the cardinality |X| of
X to be n if there is a bijection from X to the set {1, 2, . . . , n} . The cardinality of the empty set is
defined to be 0.
Definition 97 A set X is said to be finite if its cardinality is some natural number.
Proposition 98 Let m, n ∈ N with m < n. There is no bijection between {1, 2, . . . , m} and {1, 2, . . . , n} .
Consequently the cardinality of a finite set is well-defined.
Proof Suppose, for a contradiction, there is a bijection between some {1, 2, . . . , m} and
{1, 2, . . . , n} where m < n. And further we may assume m to be the smallest such integer for which
there is a bijection
f : {1, 2, . . . , m} → {1, 2, . . . , n} .
Then 1 ≤ f (m) ≤ n and we can restrict f to produce a bijection f̃ from {1, . . . , m − 1} to
{1, . . . , n} \{f (m)} and then a bijection g from {1, . . . , n} \{f (m)} to {1, . . . , n − 1} by
g(k) = k        if 1 ≤ k < f (m),
g(k) = k − 1   if f (m) < k ≤ n.
Proof The definition given in Sheet 1, Exercise 5 is recursive, so the equivalence of these definitions
can be verified using induction. The proof is left as an optional exercise.
Proposition 100 Let n be a positive integer and ∅ ≠ X ⊆ {1, 2, . . . , n} . There is an ordering
The map h : {1, 2, . . . , m} → {1, 2, . . . , n} given by k ↦ k is 1—1 and hence we have an injection
g −1 ◦ h ◦ f from S to T.
Conversely if there is a 1—1 map f from S to T, then this map is a bijection from S to its image
f (S) ⊆ T. By Proposition 100 we then have |S| = |f(S)| ≤ |T | as required.
(b) Say that |S| = m and |T | = n with m ≥ n. Then we have bijections
h(k) = k   if 1 ≤ k ≤ n,
h(k) = n   if n < k ≤ m.
Proof We could prove this by induction, but the proof is probably most transparent by considering
how to construct such a bijection f. There are n choices for the value that f (1) can take. However
once f (1) has been decided upon there are then n − 1 choices for f(2) as we must have f (1) = f(2)
for the map to be 1—1. Similarly, having decided on f(2) there are then n − 2 choices for f (3) etc..
In all there are
n × (n − 1) × (n − 2) × · · · × 2 × 1 = n!
ways to construct a bijection.
Example 104 Let A = {1, 2, 3} and B = {1, 2}. How many maps are there from A to B? How many
of these are 1—1, how many onto, how many bijective? Repeat this question for maps (b) B → A, (c)
A → A, (d) B → B.
Solution (a) There are 2^3 = 8 maps from A to B; to see this we note that each of 1, 2, 3 can
independently map to one of two values in B. None of these maps are 1—1 as we cannot find three
distinct image points in B and so none of these maps are bijective. For a map A → B to be onto at
least one element must map to each of 1 and 2. Say one element maps to 1 and two to 2; there are
three such maps as the choice of what element maps to 1 entirely determines the map. There are
similarly three maps where one element maps to 2. In all there are 6 onto maps from A to B.
(b) There are 3^2 = 9 maps from B to A, none are onto, none are bijective and 6 are injective.
(c) There are 3^3 = 27 maps from A to A. There are six maps that are 1—1, the same six being
the onto maps and the bijections.
(d) There are 2^2 = 4 maps from B to B. Two of these maps are 1—1 and as in (c) (and for the
same reasons) these two maps are also the onto maps and the bijections.
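The counts in Example 104(a) are easily confirmed by listing the maps explicitly; a map A → B is determined by the tuple (f(1), f(2), f(3)) of its values. The sketch below is not part of the notes:

```python
from itertools import product

A, B = [1, 2, 3], [1, 2]
maps = list(product(B, repeat=len(A)))                 # all 2**3 = 8 maps A -> B
injective  = [f for f in maps if len(set(f)) == len(A)]
surjective = [f for f in maps if set(f) == set(B)]
print(len(maps), len(injective), len(surjective))      # 8 0 6, as in the solution
```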
Example 105 List the subsets of {1, 2, 3}.
Solution There are eight subsets of {1, 2, 3} , each listed below:
∅, {1} , {2} , {3} , {1, 2} , {1, 3} , {2, 3} , {1, 2, 3} .
Most of them will be natural enough though it may seem surprising that the empty set, ∅, the
subset with no elements, is a permitted subset.
Note that there are 2^3 subsets of {1, 2, 3}. This isn’t that surprising if we note that each subset
of {1, 2, 3} is entirely determined by how it would ‘respond’ to the following ‘questionnaire’:
Is 1 an element of the subset? Is 2 an element of the subset? Is 3 an element of the subset?
Each element can be independently in a subset or not, irrespective of what else the subset may
contain. Three independent binary (i.e. yes or no) decisions lead to 2^3 possible sets of responses.
Respectively, the eight subsets above correspond to the series of questionnaire responses
(no, no, no), (yes, no, no), (no, yes, no), (no, no, yes), (yes, yes, no), (yes, no, yes), (no, yes, yes), (yes, yes, yes).
If we interpret the subsets of {1, 2, . . . , n} as the possible responses to n yes-or-no questions then
it is not surprising that there are 2^n subsets. Nonetheless we give a careful proof of this fact using
induction.
Proposition 106 (Subsets of a finite set) There are 2^n subsets of the set {1, 2, ..., n}.
Proof The subsets of {1} are ∅ and {1}, thus verifying the proposition for n = 1. Suppose now that
the proposition holds for a particular n and consider the subsets of {1, 2, . . . , n, n + 1}. Such subsets
come in two, mutually exclusive, varieties: they either contain the new element n + 1 or they don’t.
The subsets of {1, 2, . . . , n, n + 1} which don’t include n + 1 are precisely the subsets of {1, 2, ..., n}
and by hypothesis there are 2^n of these. The subsets of {1, 2, . . . , n, n + 1} which do include n + 1
are the previous 2^n subsets of {1, 2, ..., n} together with the new element n + 1 included in each of
them; including n + 1 in these 2^n distinct subsets still leads to 2^n distinct subsets. So in all we have
2^n + 2^n = 2^{n+1} subsets of {1, 2, . . . , n, n + 1}, completing the inductive step.
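The count of Proposition 106 can also be checked directly by listing subsets; the sketch below (not part of the notes) enumerates the subsets of {1, . . . , n} and confirms there are 2^n of them:

```python
from itertools import combinations

def subsets(n):
    elements = range(1, n + 1)
    return [set(c) for r in range(n + 1) for c in combinations(elements, r)]

for n in range(1, 10):
    assert len(subsets(n)) == 2 ** n
print(subsets(3))    # the eight subsets of {1, 2, 3}, including the empty set
```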
Example 107 Let X be a set with |X| = n.
(a) How many binary relations are there on X?
(b) How many of these binary relations are reflexive?
(c) How many are symmetric?
(d) How many are reflexive and symmetric?
Solution (a) A binary relation on X is a subset of X^2. As |X^2| = |X|^2 = n^2 then there are 2^{n^2}
relations on X.
(b) The decision to include (x1, x2) in a relation is one of n^2 independent binary choices. However
for a reflexive relation, n of these decisions are already made as each (x, x) is in the relation. So
n^2 − n independent binary decisions remain and there are 2^{n^2 − n} reflexive, binary relations on X.
(c) For symmetry if (x1, x2) is in the relation then so is (x2, x1). If we identify X with {1, 2, . . . , n}
this means we only have to know whether (i, j) is in the relation where i ≤ j. This leaves n(n + 1)/2
binary, independent decisions and so there are 2^{n(n+1)/2} symmetric, binary relations.
(d) For reflexivity and symmetry we still need to know whether (i, j) is in the relation where i < j.
This leaves n(n − 1)/2 binary, independent decisions and so there are 2^{n(n−1)/2} reflexive, symmetric,
binary relations.
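For n = 3 the counts in Example 107 are small enough to confirm by brute force, encoding each relation as a subset of X × X; a sketch, not part of the notes:

```python
from itertools import product

X = [1, 2, 3]
pairs = [(a, b) for a in X for b in X]
total = reflexive = symmetric = both = 0
for choice in product([False, True], repeat=len(pairs)):
    R = {p for p, keep in zip(pairs, choice) if keep}
    total += 1
    is_refl = all((x, x) in R for x in X)
    is_symm = all((b, a) in R for (a, b) in R)
    reflexive += is_refl
    symmetric += is_symm
    both += is_refl and is_symm
print(total, reflexive, symmetric, both)   # 512 64 64 8, matching 2^9, 2^6, 2^6, 2^3
```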
Example 108 Let A be a subset of {1, 2, 3, . . . , 106} of size 10.
(a) As |A| = 10 then there are 2^{10} = 1024 subsets of A.
(b) For each subset B of A, let sB be the sum of the elements of B (with the convention that the
empty set sums to zero). So the maximum possible value of sB is at most
97 + 98 + · · · + 106 = (10/2)(97 + 106) = 5 × 203 = 1015.
(c) So we can consider s as a map from P(A) to {0, 1, . . . , 1015}. As the first set has 1024 elements
and the second set has 1016 then, by the pigeon-hole principle there exist (at least) two distinct subsets
B, C of A such that sB = sC .
(d) The subsets B and C found in part (c) are distinct, but need not be disjoint. For example, it
might be the case that {1, 2, 3, 6, 9} and {2, 5, 6, 8} were the two distinct subsets of A; these are not
disjoint as 2, 6 are common elements. In this case we could produce disjoint sets, still with equal
sums, by simply removing the common elements.
More generally, if B and C are distinct subsets such that sB = sC, then B\C and C\B are disjoint
subsets of A such that sB\C = sC\B . So there are always disjoint subsets of A with equal sums.
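The whole argument of Example 108 is easy to carry out for any particular choice of A. The sketch below (not part of the notes; the chosen A is arbitrary) finds two distinct subsets with the same sum and then removes the common elements to obtain disjoint subsets with equal sums:

```python
from itertools import combinations

A = [3, 9, 14, 21, 26, 37, 55, 72, 99, 105]     # an arbitrary 10-element subset of {1, ..., 106}
by_sum = {}
for r in range(len(A) + 1):
    for choice in combinations(A, r):
        by_sum.setdefault(sum(choice), []).append(set(choice))

B, C = next(v[:2] for v in by_sum.values() if len(v) >= 2)   # pigeon-hole: a repeated sum exists
print(B, C, sum(B), sum(C))                     # distinct subsets, equal sums
print(B - C, C - B, sum(B - C), sum(C - B))     # disjoint subsets, still equal sums
```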
Recall, from Definition 20, that \binom{n}{k} = n!/(k!(n − k)!) is read as ‘n choose k’; as we shall see,
the reason for this is that there are \binom{n}{k} ways of choosing k elements without repetition from the set
{1, 2, . . . , n} , no interest being shown in the order that the k elements are chosen. By way of an
example:
Example 109 Determine \binom{5}{3} and list the subsets corresponding to this number.
Solution We have \binom{5}{3} = 5!/(3! 2!) = 120/(6 × 2) = 10 and the corresponding subsets are
{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5},
{1, 4, 5}, {2, 3, 4}, {2, 3, 5}, {2, 4, 5}, {3, 4, 5}.
Note that there are 6 = 3! orders of choosing the elements that lead to each subset. Any of the
ordered choices
(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1),
Cantor extended the ideas that we have seen introduced for the cardinality of finite sets to that
of infinite cardinals. The smallest infinite cardinal is ℵ0, pronounced aleph-null, the cardinality of
the set of natural numbers N. It’s not hard to see that Z also has cardinality ℵ0 but perhaps more
surprising that Q also has cardinality ℵ0 . What may be even more surprising is that R has a larger
cardinality — this is because Cantor showed there is no bijection from R to N.
In our earlier results we saw that if A and B are two finite sets then:
• the disjoint union A ⊔ B has cardinality |A| + |B| .
Cantor extended these ideas to infinite sets, defining a way to add, multiply and take powers of
infinite cardinals. We can also extend the idea of order to define the following:
A consequence of this is that we then need a theorem to prove that if |A| ≤ |B| and |B| ≤ |A|
then |A| = |B|, the Cantor-Bernstein-Schröder theorem. Another significant theorem of Cantor’s
showed that any set A has more subsets than it does elements. That is 2^{|A|} > |A|, which shows that
there are infinitely many different infinities! It turns out that |R| = 2^{|N|},
but whether there is an infinity strictly between |N| and |R| turns out to be a very subtle question
— the continuum hypothesis — which can not be decided using just the axioms of ZF set theory.
Some of these results are covered further in the Analysis I course this term.
Exercise 20 Let f : N → N be the map f (n) = 2n. Show that f is 1—1. Define a map g : N → N
such that (g ◦ f)(n) = n for all n. Is there a map h: N → N such that f(h(n)) = n for all n?
Exercise 21 How many differentiable functions f (x) satisfy f (x)^2 = x^2 for all x? Show that there
are infinitely many functions that satisfy f (x)^2 = x^2 for all x.
Exercise 22 Give an example of a bijection between the set (0, 1) and each of the sets (i) (0, ∞),
(ii) (−1, 1), (iii) R, (iv) [0, 1).
Exercise 23 Give an example of an injection from the set of finite subsets of N to N.
Exercise 24 Let S1 and S2 be sets with ≤1 and ≤2 partial orders on S1 and S2 . An order
isomorphism between S1 and S2 is a bijection φ : S1 → S2 such that for all x, y ∈ S1
φ(x) ≤2 φ(y) ⇔ x ≤1 y.
(ii) Show, using the standard orders, that {0} ⊔ N is order isomorphic to N but not order
isomorphic to N ⊔ {0} .
4. WRITING MATHEMATICS, CONSTRUCTING PROOFS
∀t ∈ T ∃r ∈ R (g ◦ f)(r) = t.
• Ask yourself, really ask yourself if you’re comfortable with this line being correct. Is this what
you would have written yourself? g ◦ f has T as its codomain and R as its domain, and being
onto means for every element of the codomain there exists an element of the domain that maps
to it. So yes this line is correct. If any of that was in doubt, then revisit your notes to make
sure you have the definition right. Don’t try to start a proof with a fuzzy sense of what needs
to be proved.
When first meeting such dense language the new symbols may seem daunting, but the quantifiers
∀ and ∃ are in fact "road-signs" for the task at hand and, properly understood, will help construct
a proof and indeed automatically fill in some of the lines of that proof.
One hypothesis takes us forward (or at least gives us more information) when we are presented
with an s ∈ S and the other is helpful when we are presented with a t ∈ T. Returning to the task
at hand, we see that the first line of our draft proof is "Let t ∈ T ". Only one of our two hypotheses
can "connect" to that information, the hypothesis that tells us g is onto. Thus we might expect the
second line of our proof to read: as g is onto, there exists s ∈ S such that g(s) = t.
For a general map g : S → T it would not generally be the case that there exists such an s ∈ S, so it’s
important to include the hypothesis here to make clear the "why" of this intermediate conclusion.
Similarly, as f is onto, there exists r ∈ R with f(r) = s, and then (g ◦ f)(r) = g(f(r)) = g(s) = t.
So our final line is only a notational redraft of where we have so far argued.
• appreciating where the help is coming from, and where the problems lie;
Consider again, for example, the statement we are trying to prove:
∀t ∈ T ∃r ∈ R (g ◦ f)(r) = t.
Here both quantifiers are problematic in the sense that the universal quantifier sets out the general
scope of the task, and the existential quantifier assigns a task for each case within the given scope.
However in the first hypothesis
∀t ∈ T ∃s ∈ S g(s) = t,
the universal quantifier is helpful as we know a certain result holds within a particular scope.
So you need to be able to separate task from help in a useful way. This might seem obvious in
the sense that to prove "if P then Q" it is clearly going to be "P " that is helpful, but once all the
dense logic starts appearing on the page it can become difficult to separate the help from the task.
Finally, mathematical thinking is a lot more than the laying of logical cable that this discussion might suggest. It
is important to build intuition of what definitions mean, and this will only become increasingly
important as you meet more advanced ideas. You might, for example, begin to appreciate that a
function f : X → Y being 1—1 is akin to there being no loss of information in the process of applying
f — all of X is in some sense still present and retrievable in f(X). So it might seem natural that the
composition of two injective functions is injective as no information is lost overall. In itself there is
no harm in having and developing that intuition, but at the same time it is important to recognize how far
from a satisfactory proof "no information is lost overall" is.
Remark 119 (Analyzing proofs) Let's return to the incorrect proof of the false statement:
Statement: Let f : R → S and g : S → T be maps such that g ◦ f is surjective. Then f is
surjective.
Proof: Let t ∈ T and write t = g(s) where s ∈ S. As g ◦ f is onto then there exists r such that
g(f (r)) = t = g(s). Hence f(r) = s and so f is onto.
There is little chance of redeeming this proof, as the proposition itself is wrong. But it is worth
flagging the errors and irrelevances it contains.
• A first tip-off is that the proof begins in the wrong place. We are trying to prove that f is onto
and so, necessarily, the proof has to begin by introducing s ∈ S, as S is the codomain of f. The false
proof doesn't begin that way.
• We then write t = g(s) for some s and there is no hypothesis that allows us to do this. To be
able to guarantee the existence of such an s we would need to know that g is onto as a given
hypothesis.
• A third error is to conclude from g(f (r)) = g(s) that f(r) = s. This is something that we
would know if g is 1—1, but again we haven’t been told this.
Highlighted here these errors may now seem obvious, but passively read they would have seemed
innocuous, and might easily have snuck by unnoticed.
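It can also help to see a concrete counterexample alongside this analysis. The following Python sketch (the small finite sets are chosen purely for illustration) exhibits maps f and g for which g ◦ f is surjective but f is not:

# Maps between small finite sets, chosen so that g o f is surjective while f is not.
R = {0}
S = {0, 1}
T = {0}

f = {0: 0}          # f : R -> S misses 1, so f is not onto
g = {0: 0, 1: 0}    # g : S -> T

gof = {r: g[f[r]] for r in R}    # the composition g o f : R -> T

print(set(gof.values()) == T)    # True: g o f is surjective
print(set(f.values()) == S)      # False: f is not surjective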
The necessary guidance for how to construct and analyze proofs is largely given in the previous
section. Here we apply that guidance to some further examples. The following definitions are not on
the IUM syllabus, but you will meet them soon in the Analysis I course. They are introduced here
to highlight how helpful careful definitions can be in generating a proof, even without any intuition
having developed yet.
Definition 120 (a) A (real) sequence (xn ) is a function x : N → R and we write xn to denote x(n).
(b) A sequence (xn) converges if
∃L ∈ R ∀ε > 0 ∃N ∈ N ∀n ≥ N |xn − L| < ε.
To show that a sequence (xn) does not converge we must therefore prove the negation of this
statement. In the negation,
∀L ∈ R ∃ε > 0 ∀N ∈ N ∃n ≥ N |xn − L| ≥ ε,
we begin with a universal quantifier ∀L ∈ R. The remainder of the task is to show that this particular,
but arbitrary L is not a limit. Let's try to understand just what is involved:
∃ε > 0 ∀N ∈ N ∃n ≥ N |xn − L| ≥ ε.
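As an informal numerical illustration of these quantifiers (a Python sketch, not part of the syllabus), consider the sequence xn = (−1)^n. For any candidate limit L, the choice ε = 1 succeeds in the negation: the values 1 and −1 cannot both lie within distance 1 of L, and both values recur beyond any N.

def x(n):
    # The sequence x_n = (-1)^n, which does not converge.
    return (-1) ** n

def witness(L, eps, N):
    # Look for some n >= N with |x_n - L| >= eps. Checking N and N + 1 is
    # enough here, since consecutive terms take the values 1 and -1.
    for n in (N, N + 1):
        if abs(x(n) - L) >= eps:
            return n
    return None

# For several candidate limits L, the single choice eps = 1 defeats every N tried.
for L in (-1.0, -0.5, 0.0, 0.5, 1.0):
    assert all(witness(L, 1.0, N) is not None for N in range(50))
print("eps = 1 works against every candidate limit tested")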
Example 124 Let α be the unique real root of α³ + α = 1. Show that α is irrational.
Solution Suppose for a contradiction that α = m/n where m and n are coprime, positive integers.
We then have that
m³/n³ + m/n = 1 ⇒ m³ + mn² = n³.
Note that m ≥ 2: if m = 1 the equation would read 1 + n² = n³, that is n²(n − 1) = 1, which no
positive integer n satisfies. Now let p be a prime factor of m. Then p divides the LHS and so also
divides n³. As p is prime, p necessarily divides n. This is the required contradiction, as we assumed
m and n to be coprime.
Review: To what extent did the proof have to look like this? One approach would be to solve
the cubic, find an explicit expression for α and then show that number to be irrational. But if we
don’t know how to solve the equation then contradiction seems the only alternative.
Once we have taken that approach, then we necessarily get to the equation m3 + mn2 = n3 and
our problem is how to get a contradiction from this — knowing the numbers involved are integers we
might look to show that the two sides have different factors. We might note that m divides the LHS
and so divides the RHS. However we can't in general conclude that m then has to divide n just because
it divides n³. This is, though, true if m is prime, so now we can revisit the earlier step and instead
focus on prime factors of m. It is not at all unusual, when constructing a proof, to have to go back
and tighten an earlier assumption so that the direction of the proof can continue several steps later.
Don’t expect a first attempt at a proof to be as slick as the one above or those that appear in books.
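As a sanity check on Example 124 (purely illustrative, and of course no substitute for the proof), a short search confirms that no fraction m/n with small denominator satisfies m³ + mn² = n³:

from math import gcd

# No coprime positive integers m, n with n <= 1000 satisfy m^3 + m*n^2 = n^3.
# Since alpha lies in (0, 1), any rational alpha = m/n would have m < n.
solutions = [(m, n)
             for n in range(2, 1001)
             for m in range(1, n)
             if gcd(m, n) == 1 and m**3 + m * n**2 == n**3]
print(solutions)   # []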
Note that the set {k ∈ N : f(k) = n} = f⁻¹({n}) is non-empty as f is onto, and non-empty subsets
of N have minimal elements, meaning g is well-defined. Finally, as
g(n) ∈ {k ∈ N : f(k) = n},
we have f(g(n)) = n for every n, as required.
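A small Python illustration of this construction; the particular surjection f(k) = k // 2 is chosen here just for the example:

def f(k):
    # A surjection f : N -> N; every n is hit, for instance by k = 2n.
    return k // 2

def g(n):
    # g(n) = min{k in N : f(k) = n}. This is well-defined because f is onto
    # and every non-empty subset of N has a least element; searching upwards
    # from 0, the first k found is that least element.
    k = 0
    while f(k) != n:
        k += 1
    return k

print([g(n) for n in range(5)])                 # [0, 2, 4, 6, 8]
print(all(f(g(n)) == n for n in range(100)))    # True: f(g(n)) = n for all n tested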
Consider, for example, whether 100000003 can be written as a sum of two squares. We might begin
by recording which small numbers can be so written:
0 = 0² + 0²    1 = 1² + 0²    2 = 1² + 1²    3?             4 = 0² + 2²    5 = 1² + 2²
6?             7?             8 = 2² + 2²    9 = 0² + 3²    10 = 1² + 3²   11?
12?            13 = 2² + 3²   14?            15?            16 = 0² + 4²   17 = 1² + 4²
The list of numbers that cannot be expressed as a sum of two squares begins
3, 6, 7, 11, 12, 14, 15, . . .
There doesn't seem to be an obvious pattern here — and in any case by the time we have reached
100000003 we have a lot more square numbers to choose from.
What about the list of square numbers themselves:
0, 1, 4, 9, 16, 25, 36, 49, . . .?
We can see that the even squares are divisible by 4, which is not surprising as (2k)² = 4k². What
about the odd squares? Well in this case
(2k + 1)² = 4k² + 4k + 1 = 4(k² + k) + 1,
so a square number is 0 or 1 more than a multiple of 4. Hence a sum of two squares can only be 0, 1
or 2 more than a multiple of 4. As 100000003 = 4 × 25000000 + 3 is 3 more than a multiple of 4, it is
impossible for 100000003 to be written as a sum of two squares.
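A short computational check of the mod 4 argument (again a Python sketch, purely illustrative):

# Residues mod 4 of squares and of sums of two squares.
square_residues = {(k * k) % 4 for k in range(100)}
print(square_residues)        # {0, 1}

sum_residues = {(a * a + b * b) % 4 for a in range(100) for b in range(100)}
print(sum_residues)           # {0, 1, 2}

print(100000003 % 4)          # 3, so 100000003 cannot be a sum of two squares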
Remark 131 The following is optional reading. Modular arithmetic is not expressly on the
IUM syllabus, but some understanding of modular arithmetic helps during the first year, and it
highlights some of the topics that have arisen in the course, including equivalence relations and
well-definedness.
We now note, for certain values of n, that modular arithmetic can have some unfortunate algebraic
aspects such as
3 × 5 = 0 mod 15, 4 × 3 = 0 mod 6.
It follows that one cannot divide by 3 or 5 in Z15, nor by 3 or 4 in Z6: if 3 had a multiplicative
inverse in Z15, say, then multiplying 3 × 5 = 0 by that inverse would give 5 = 0 in Z15, which is false.
More generally we note:
Proposition 138 Let x̄ ∈ Zn with x̄ ≠ 0̄. Then x̄ has a multiplicative inverse in Zn if and only if
hcf(x, n) = 1. Hence if n is prime, then Zn is in fact a field.
Proof The proofs rely on theory from the Groups and Group Actions course and are omitted here.
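The following Python sketch illustrates Proposition 138 (illustrative only; the three-argument pow with exponent −1 requires Python 3.8 or later):

from math import gcd

def invertible_elements(n):
    # The nonzero elements x of Z_n with hcf(x, n) = 1, i.e. exactly those
    # with a multiplicative inverse modulo n, by Proposition 138.
    return [x for x in range(1, n) if gcd(x, n) == 1]

print(invertible_elements(15))   # [1, 2, 4, 7, 8, 11, 13, 14] -- 3 and 5 are missing
print(invertible_elements(7))    # [1, 2, 3, 4, 5, 6] -- every nonzero element, as 7 is prime

# For an invertible element the inverse can be computed directly (Python 3.8+):
print(pow(7, -1, 15))            # 13, since 7 * 13 = 91 = 1 mod 15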