
MA241 Combinatorics

Derek Holt
revisions by David Mond and Daan Krammer
September 1, 2006

Contents
1 Introduction: 3 Sample Problems

2 Sums
2.1 Notation
2.2 Sums and Recurrences
2.3 The Perturbation Method
2.4 Multiple Sums
2.5 Finite and Infinite Calculus
2.6 Negative Exponents in Rising and Falling powers

3 Integer Functions
3.1 Floors and Ceilings
3.2 Floor and Ceiling Problems
3.3 Spectra
3.4 Division
3.5 Floor and Ceiling Sums

4 Binomial Coefficients
4.1 Homogeneous trees
4.2 Binomial Coefficients
4.3 Easy Identities
4.4 Some more complicated identities
4.5 Derangements
4.6 Multinomial coefficients

5 Special Numbers
5.1 Stirling Numbers
5.2 Harmonic Numbers
5.3 Fibonacci Numbers, Fn

6 Generating Functions
6.1 Basic Manipulation
6.2 Representations of sequences
6.3 Partial Fractions
6.4 Solving Recurrences
6.5 The simplest sort of differential equations
6.6 Exponential Generating Functions
6.7 Generating functions in more variables

7 Discrete Probability
7.1 Sample Spaces and Random Variables
7.2 Probability Generating Functions (PGF’s)
7.3 Tossing Coins

1 Introduction: 3 Sample Problems

Example 1: Tower of Hanoi.

[Figure: three pegs A, B, C, with n discs stacked on peg A.]

Object: Move pile A to B by moving one disc at a time. A disc may never
rest on a smaller one. What is the minimal number of moves, Tn ?

Strategy: for n > 1

1. Move top (n − 1) discs A → C (Tn−1 moves)

2. Move bottom disk A → B (1 move)

3. Move top (n − 1) C → B (Tn−1 moves)

This works and is in fact the quickest solution, though we shall not prove that here.

So we have recurrence equations:

T1 = 1 , Tn = 2Tn−1 + 1 (1)

Technique: Look at some values. Guess the answer. Prove by induction.


T1 = 1 , T2 = 3 , T3 = 7 , T4 = 15. Guess Tn = 2n − 1. We can prove this is
correct (exercise) using (1).
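
As a quick sanity check (a small Python sketch, not a proof), the recurrence (1) can be iterated and compared with the guess T_n = 2^n − 1:

def hanoi_moves(n):
    # T_1 = 1, T_n = 2*T_{n-1} + 1; start from T_0 = 0 for convenience
    t = 0
    for _ in range(n):
        t = 2 * t + 1
    return t

for n in range(1, 11):
    assert hanoi_moves(n) == 2**n - 1   # matches the guessed closed form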

Example 2: Lines in plane.

What is the maximum number of regions that n infinite lines can divide a
plane into? Call this Ln . Get maximum if no lines are parallel and no three
lines meet at a point.

L0 = 1 (no lines),  L1 = 2,  L2 = 4,  L3 = 7.

What happens when we introduce line n which intersects other (n − 1) lines?
It gets divided into n sections. Each section divides an old region into two.
So
Ln = Ln−1 + n . (2)
Hence L_n = L_0 + \sum_{k=1}^{n} k, for example:

L5 = L4 + 5
= L3 + 4 + 5
= L2 + 3 + 4 + 5
= L1 + 2 + 3 + 4 + 5
= L0 + 1 + 2 + 3 + 4 + 5 .

Solution:

    L_n = 1 + n(n + 1)/2 ,

which can be proved formally by induction.

In general, if

    L_n = L_{n-1} + f(n) ,

then

    L_n = L_0 + \sum_{k=1}^{n} f(k) .

Example 3: Josephus Problem.

[Figure: n people numbered 1, 2, ..., n sitting in a circle.]

n people sit in a circle. Going round clockwise, alternate people are removed.
Which person is left last? Call the number of this person Jn . For example,

[Figure: eight people numbered 1–8 sitting in a circle.]

People who go are: 2, 4, 6, 8, 3, 7, 5. So J8 = 1


n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Jn 1 1 3 1 3 5 7 1 3 5 7 9 11 13 15
Guess solution: we can write n = 2^m + L for some 0 ≤ L < 2^m. Then J_n = 2L + 1.

Case 1: n even, say n = 2k.

In the first round all even people are removed. Then k people remain num-
bered 2i − 1 (1 6 i 6 k). Hence the person left at the end will have number
2Jk − 1. So J2k = 2Jk − 1.

Case 2: n odd, say n = 2k + 1 people.

In the first round even people go. Then 1 goes. Then k people are left with
numbers 2i + 1 (1 6 i 6 k). J2k+1 = 2Jk + 1.

The recurrence system to solve is



J1 = 1 
J2k = 2Jk − 1

J2k+1 = 2Jk + 1
We can now prove by induction that Jn = 2L + 1.

Case 1: n = 2k.

If n = 2^m + L then k = 2^{m-1} + L/2 (L is even in this case).

By the inductive hypothesis:

    J_k = 2(L/2) + 1 = L + 1 .

So

    J_n = J_{2k} = 2J_k − 1 = 2(L + 1) − 1 = 2L + 1
as required.

Case 2: n = 2k + 1.

If n = 2^m + L then k = 2^{m-1} + (L − 1)/2 (L is odd in this case).

By the induction hypothesis:

    J_k = 2·(L − 1)/2 + 1 = L .

So

Jn = J2k+1 = 2Jk + 1 = 2L + 1 ,

as required.
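
The closed form can also be checked by simulating the elimination process directly; the following Python sketch compares the simulation with J_n = 2L + 1 for small n.

def josephus_simulate(n):
    people = list(range(1, n + 1))
    idx = 0                                  # start counting at person 1
    while len(people) > 1:
        idx = (idx + 1) % len(people)        # skip one person...
        people.pop(idx)                      # ...and remove the next
        if idx == len(people):
            idx = 0
    return people[0]

def josephus_formula(n):
    m = n.bit_length() - 1                   # 2^m <= n < 2^(m+1)
    L = n - 2**m
    return 2 * L + 1

for n in range(1, 200):
    assert josephus_simulate(n) == josephus_formula(n)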

2 Sums

2.1 Notation

    \sum_{k=1}^{n} a_k = a_1 + a_2 + ... + a_n    and    \sum_{1 ≤ k ≤ n} a_k

mean the same. We can also write things like:


    \sum_{1 ≤ k ≤ 12, k prime} 1/k = 1/2 + 1/3 + 1/5 + 1/7 + 1/11 .
\sum_k a_k means the sum over all integers k. Usually only finitely many a_k are non-zero; otherwise the sum is only defined if the series converges.

“[Statement]” is defined to be 1 if statement is true and 0 if false. For


example:

[5 is prime] = 1
[2 + 3 = 5] = 1
[2 + 3 = 6] = 0
    \sum_{1 ≤ k ≤ 12} [k is prime]/k = 1/2 + 1/3 + 1/5 + 1/7 + 1/11

The function f : R → R with f(x) = 0 for x rational and f(x) = 1 for x irrational is the same as

    f(x) = [x ∉ Q].
The empty sum is zero. That is, if

    S_n = \sum_{k=1}^{n} k

then

    S_0 = (empty sum) = 0 .

2.2 Sums and Recurrences

Discrete Continuous
Sum Integral
Recurrence relation Differential equation

Sums can be converted to recurrences, e.g. S_n = \sum_{k=1}^{n} 2^{-k} is equivalent to S_0 = 0, S_n = S_{n-1} + 2^{-n}. It is sometimes useful to go the other way. This can be done by multiplying by the "summation factor" (which is like the integrating factor in differential equations).

Example 4: (Hanoi)

T0 = 0 , Tn = 2Tn−1 + 1

Multiply by the factor 1/2^n to get

    T_n/2^n = T_{n-1}/2^{n-1} + 1/2^n .

Put S_n = T_n/2^n, so

    S_n = S_{n-1} + 2^{-n} ,

with S_0 = 0. So

    S_n = \sum_{k=1}^{n} 2^{-k} = 1 − (1/2)^n    (sum of a G.P.)

So T_n = 2^n − 1.

(We used

    \sum_{k=1}^{n} a r^k = a(r^{n+1} − r)/(r − 1) ,

the sum of a G.P. (geometric progression).)

More generally, for the recurrence

    a_n T_n = b_n T_{n-1} + c_n    (b_n ≠ 0) ,

we multiply by the summation factor

    Ω_n = (a_{n-1} a_{n-2} ... a_1)/(b_n b_{n-1} ... b_1) ,

or any constant multiple c Ω_n. Put

    S_n = Ω_n a_n T_n

to get

    S_n = S_{n-1} + Ω_n c_n .

So

    S_n = \sum_{k=1}^{n} Ω_k c_k + S_0 .

Example 5: (from the analysis of Quicksort)

    c_0 = 0 ,
    c_n = n + 1 + (2/n) \sum_{k=0}^{n-1} c_k                (3)

so

    n c_n = n^2 + n + 2 \sum_{k=0}^{n-1} c_k                (4)
Replacing n by n − 1 this gives

    (n − 1) c_{n-1} = (n − 1)^2 + (n − 1) + 2 \sum_{k=0}^{n-2} c_k        (5)

Subtract (5) from (4) to get

    n c_n = (n + 1) c_{n-1} + 2n .        (6)
This is a linear recurrence with a_n = n, b_n = n + 1, c_n = 2n. The summation factor is

    Ω_n = (a_{n-1} a_{n-2} ... a_1)/(b_n b_{n-1} ... b_1) = (n − 1)!/(n + 1)! ,

so

    Ω_n = 1/(n(n + 1)) .

Multiply (6) by Ω_n to find

    c_n/(n + 1) = c_{n-1}/n + 2/(n + 1) .

Put

    S_n = c_n/(n + 1) ,

so

    S_n = S_{n-1} + 2/(n + 1)

where S_0 = 0. So

    S_n = \sum_{k=1}^{n} 2/(k + 1) .

Notation: Since there is no closed formula for this sum, we introduce a new notation. Define the harmonic numbers H_n by

    H_n = \sum_{k=1}^{n} 1/k ,    H_0 = 0 .

Note that H_n → ∞ as n → ∞.

The solution to the recurrence is

    S_n = 2(H_{n+1} − 1) ,
    c_n = 2(n + 1)(H_{n+1} − 1) .

In fact

    H_n ∼ log_e n + γ

where γ is Euler's constant ≈ 0.5772156.
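
As a check on the algebra, the original recurrence (3) can be iterated with exact rational arithmetic and compared against c_n = 2(n + 1)(H_{n+1} − 1); a short Python sketch:

from fractions import Fraction

def c_direct(N):
    # c_0 = 0, c_n = n + 1 + (2/n) * sum_{k<n} c_k, computed exactly
    c = [Fraction(0)]
    for n in range(1, N + 1):
        c.append(n + 1 + Fraction(2, n) * sum(c))
    return c

def c_closed(n):
    H = sum(Fraction(1, k) for k in range(1, n + 2))   # H_{n+1}
    return 2 * (n + 1) * (H - 1)

cs = c_direct(20)
assert all(cs[n] == c_closed(n) for n in range(21))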

2.3 The Perturbation Method
Pn
Let Sn = k=0 ak be a sum. Then
n
X
Sn+1 = Sn + an+1 = a0 + ak+1
k=0

The idea is to try and get a formula for the above in terms of Sn . Then solve
for Sn .
Pn
Example 6: Geometric Series. We define Sn = k=0 xk . Then

Sn+1 = Sn + xn+1 .

Also
n
X
Sn+1 = 1 + xk+1 = 1 + xSn
k=0

so
xn+1 − 1
Sn = if x 6= 1
x−1

Pn
Example 7: Tn = k=0 kxk . We have

Tn+1 = Tn + (n + 1)xn+1
n n
X
k+1
X
k+1 xn+2 − x
and Tn+1 = (k + 1)x = xTn + x = xTn +
k=0 k=0
x−1

−xn+2 + x (n + 1)xn+1
so Tn = + .
(x − 1)2 (x − 1)

Another method would be to use the fact that


d(Sn )
Tn = x .
dx

2.4 Multiple Sums

Summing over more than one index. For example:


n X
X n
aj b k
j=1 k=1

factorises to n
µX n
¶µ X ¶
aj bk .
j=1 k=1

We aim to reduce these to single sums. And more generally, the indices are
not independent, for example:
X n
X n
X n X
X k−1
(· · · ) = (· · · ) = (· · · ).
16j<k6n j=1 k=j+1 k=1 j=1

Just like there’s Fubini’s theorem in calculus (interchanging the order of


integration) there is also interchanging the order of summation, which is one
of the most important tricks in combinatorics. An easy example of this is:
b X
X d d X
X b
f (i, j) = f (i, j).
i=a j=c i=c j=a

Here’s a harder example:


n
n X j
n X
X X
f (i, j) = f (i, j). (7)
i=0 j=i j=0 i=0

If you find it hard to find an equation like (7), it can help to draw a picture
consisting of boxes, one box at position (i, j) for each value of (i, j) involved
in the sum. In the case of (7) with n = 3 that would be:

        (0,0) (0,1) (0,2) (0,3)
              (1,1) (1,2) (1,3)
                    (2,2) (2,3)
                          (3,3)

(rows indexed by i, columns by j; summing by rows gives the left side of (7), summing by columns the right side).

Example 8: X
aj ak
16j<k6n

 
a21 a1 a2 . . . a 1 an

 a2 a1 a22 . . . a 2 an 

 .. .. .. 
 . . . 
an a1 an a2 . . . a2n
We are summing terms above the main diagonal of the matrix. Call this sum
Sq . Note that the matrix is symmetric, giving Sq = Sx . So
µX n ¶2 µ X n ¶µ X n ¶ X n X n
ak = aj ak = aj ak
k=1 j=1 k=1 j=1 k=1
n
X X n
= Sx + Sq + a2k = 2 Sq + a2k .
k=1 k=1

So n
·µ X ¶2 n ¸
1 X
Sq = ak − a2k .
2 k=1 k=1

Example 9: If

    S_n = \sum_{1 ≤ j < k ≤ n} 1/(k − j) ,

then

    S_1 = 0 (empty sum),
    S_3 = 1/(2−1) + 1/(3−1) + 1/(3−2) = 5/2 .

For n = 5 the terms 1/(k − j) form the triangular array

    1/1  1/2  1/3  1/4
         1/1  1/2  1/3
              1/1  1/2
                   1/1

and S_5 is the sum of all its entries.

• Sum by rows:

    S_n = \sum_{j=1}^{n} \sum_{k=j+1}^{n} 1/(k − j) .

  Change of variable: put k − j = l. Then

    S_n = \sum_{j=1}^{n} \sum_{l=1}^{n-j} 1/l = \sum_{j=1}^{n} H_{n-j} = \sum_{j=0}^{n-1} H_j .

• Of course, summing by columns gives the same result. However:

• Summing by diagonals:

    S_n = \sum_{l=1}^{n} \sum_{j=1}^{n-l} 1/l = \sum_{j=1}^{n} (n − j)/j = \sum_{j=1}^{n} (n/j − 1) = n H_n − n .

So we have shown that

    \sum_{j=0}^{n-1} H_j = n H_n − n .

Compare this with ∫ ln x dx = x ln x − x + c .
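
The identity \sum_{j=0}^{n-1} H_j = n H_n − n is easy to confirm numerically with exact rationals (an illustrative Python check):

from fractions import Fraction

def H(n):
    return sum(Fraction(1, k) for k in range(1, n + 1))

for n in range(30):
    assert sum(H(j) for j in range(n)) == n * H(n) - n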

2.5 Finite and Infinite Calculus

    S_1(n) = \sum_{k=0}^{n} k = n(n + 1)/2 = n^2/2 + O(n)
    S_2(n) = \sum_{k=0}^{n} k^2 = n^3/3 + O(n^2)

In general,

    S_i(n) = n^{i+1}/(i + 1) + O(n^i)

corresponds roughly to

    ∫ x^i dx = x^{i+1}/(i + 1) ,

but the correspondence is not exact.

Rising and falling powers are easier to sum:

Definition 1: The following notation is not very common, but we will find it quite useful.

• The falling power, x^{\underline{m}} = x(x − 1) ⋯ (x − m + 1)  (m factors), for x ∈ R.

• The rising power, x^{\overline{m}} = x(x + 1) ⋯ (x + m − 1).

Some properties:

• x^{\underline{0}} = x^{\overline{0}} = 1 (empty products are 1).

• For n ∈ N, n^{\underline{m}} = n!/(n − m)! = nPm = m! \binom{n}{m}.

• n^{\underline{n}} = n!.

• n^{\overline{m}} = m! \binom{n + m − 1}{m}.

Then

    (x + 1)^{\underline{m}} − x^{\underline{m}} = x(x − 1) ⋯ (x − m + 2)·((x + 1) − (x − m + 1)) = m x^{\underline{m-1}}        (8)

Similarly,

    x^{\overline{m}} − (x − 1)^{\overline{m}} = m x^{\overline{m-1}} .        (9)

These are similar to differentiation formulae.

Consider \sum_{k=0}^{n} k^{\underline{m}}, with

    k^{\underline{m}} = ((k + 1)^{\underline{m+1}} − k^{\underline{m+1}})/(m + 1)

(by (8), using m + 1 in place of m).

On summing, most terms cancel (the sum telescopes). We get

    \sum_{k=0}^{n} k^{\underline{m}} = ((n + 1)^{\underline{m+1}} − 0^{\underline{m+1}})/(m + 1) = (n + 1)^{\underline{m+1}}/(m + 1) ,    for m ≥ 0,        (10)

which is an exact analogue of integration.

Similarly, summing (9) gives

    \sum_{k=1}^{n} k^{\overline{m}} = n^{\overline{m+1}}/(m + 1) ,    for m ≥ 0.        (11)

For example, when m = 2 and n = 4,

    1·2 + 2·3 + 3·4 + 4·5 = (6 × 5 × 4)/3 = 40 .

We have:
n
X n(n + 1)
k = ,
k=1
2
n
X n(n + 1)(n + 2)
k(k + 1) = , etc . . .
k=1
3

These can be used to get expressions for Sm (n). For example,


n
X
S3 (n) = k3 .
k=1

We must express k 3 in terms of rising (or falling) powers, that is:

k 3 = k(k + 1)(k + 2) − 3k(k + 1) + k .

So
n(n + 1)(n + 2)(n + 3) 3n(n + 1)(n + 2) n(n + 1)
S3 (n) = − +
4 3 2
³ ´ 2 2
n(n + 1) 2 n (n + 1)
= n + 5n + 6 − 4n − 8 + 2 = = (S1 (n))2 .
4 4

Note: Since µ ¶
m k
k = m! ,
m
equation (10) gives
n µ ¶ µ ¶
X k n+1
= , n > 0.
m m+1
k=0

2.6 Negative Exponents in Rising and Falling powers

Motivation:
x3 = x(x − 1)(x − 2) divide by (x − 2)
x2 = x(x − 1) divide by (x − 1)
x1 = x divide by x
x0 = 1 divide by x+1
1
x−1 = x+1
...
1
x−2 = (x+1)(x+2)
...

Definition 2:
1 1
x−m = =
(x + 1) . . . (x + m) (x + 1)m
1 1
x−m = =
(x − 1) . . . (x − m) (x − 1)m

Note: x−m := x−m for typographical reasons.

It can be checked that equations (8) and (9) are still true for m 6 0. So they
are true for all m ∈ Z.

But (10) and (11) become


n
X (n + 1)m+1 − 0m+1
(10) km =
k=0
m+1
n
X nm+1 − 0m+1
(11) km =
k=1
m+1

Therefore true for m ∈ Z , m 6= −1.

Note: Note that


1
0−m =
(−1)(−2) . . . (−m)
1
0−m =
1.2. . . . m
are non-zero.

Example 10:
n n ³ 1 ´ ∞
X X 1 X
k −2 = =− −1 so k −2 = 1
k=0 k=0
(k + 1)(k + 2) n+2 k=0

n ∞
X
−3 1³ 1 1´ X 1
k =− − so k −3 =
k=0
2 (n + 2)(n + 3) 2 k=0
4

What happens for m = −1?
n n
X
−1
X 1
k = = Hn+1 ∼ loge (n + 1)
k=0 k=0
k+1
Z
1
Compare this with dx = loge x.
x

3 Integer Functions

3.1 Floors and Ceilings

Definition 3: For x ∈ R:

Floor(x) := bxc = greatest integer m 6 x,


Ceiling(x) := dxe = least integer m > x.

Example 11:

b3c = bπc = 3 ,
d3e = 3,
dπe = 4,
b−2c = b−1.4c = −2 ,
d−1.4e = −1 ,
d−2e = −2 .

Some easy properties:

bxc = x ⇐⇒ x ∈ Z ⇐⇒ dxe = x
x − 1 < bxc 6 dxe < x + 1
b−xc = −dxe , d−xe = −bxc
bxc = n ⇐⇒ n 6 x < n + 1 ⇐⇒ x − 1 < n 6 x
If n ∈ Z then bx + nc = bxc + n.

Note that bx + yc and bxc + byc are not always equal, for example b3/4 +
3/4c = 1 6= 0 = b3/4c + b3/4c.

For m, n ∈ Z, n mod m is defined as the remainder when n is divided by m.

In general, for x, y ∈ R, y 6= 0, we define

x mod y = x − ybx/yc

Example 12:

−5 mod 3 = 1 (not -2)


5 mod − 3 = −1 (x mod y has same sign as y)
−5 mod − 3 = −2

x mod 1 = x − bxc = fractional part of x. Can define x mod 0 = x

Exercise. Let a < b be real numbers. We have the four types of intervals

[a, b] = {x ∈ R | a 6 x 6 b},
[a, b) = {x ∈ R | a 6 x < b},
(a, b] = {x ∈ R | a < x 6 b},
(a, b) = {x ∈ R | a < x < b}.

Complete the following table, which expresses the number of integers in such
intervals in terms of the floor and ceiling:

|[a, b] ∩ Z| = |[a, b) ∩ Z| =

|(a, b] ∩ Z| = bbc − bac, |(a, b) ∩ Z| =

3.2 Floor and Ceiling Problems


p √
Example 13: Is b bxcc = b xc for x > 0? Yes.

Proof: p p
m=b bxcc ⇐⇒ m 6 bxc < m + 1
⇐⇒ m2 6 bxc < (m + 1)2
⇐⇒ m2 6√x < (m + 1)2
⇐⇒ m 6 √x < (m + 1)
⇐⇒ m = b xc

p √
Example 14: Is d bxce = d xe? Not always.
p
m + 1 = d bxce
p ⇐⇒ m2 < bxc 6 (m + 1)2 . If m2 6 bxc < m2 + 1 then
bxc = m2 and bxc = m.

So this equation fails for x satisfying m2 < x < m2 + 1 for some m, for
example x = 4.3 or x = 9.5.

Example 15: How many integers in the range 1 ≤ n ≤ 1000 satisfy

    ⌊∛n⌋ | n ?        (∗)

Here a | b means that a divides b exactly, where a, b ∈ Z. Notation: [5 ... 9] means {5, 6, 7, 8, 9}.

For n ∈ [1 ... 7],      ⌊∛n⌋ = 1, so (∗) holds for all of them.
For n ∈ [8 ... 26],     ⌊∛n⌋ = 2, so (∗) holds for even n in this range.
For n ∈ [27 ... 63],    ⌊∛n⌋ = 3, so (∗) holds for multiples of 3 in this range.
...
For n ∈ [729 ... 999],  ⌊∛n⌋ = 9, so (∗) holds for multiples of 9 in this range.
For n = 1000,           ⌊∛n⌋ = 10.

The answer is very roughly

    7 + (3^3 − 2^3)/2 + (4^3 − 3^3)/3 + ... + (10^3 − 9^3)/9 + 1 .

(In fact we want ceilings of the terms.)

We can write the problem as that of determining

    A = \sum_{n=1}^{1000} [ ⌊∛n⌋ | n ] .

Put k = ⌊∛n⌋. Then

    A = 1 + \sum_{k=1}^{9} \sum_{n} [k | n][k^3 ≤ n < (k + 1)^3]
      = 1 + \sum_{k=1}^{9} \sum_{n,m} [n = mk][k^3 ≤ n < (k + 1)^3]
      = 1 + \sum_{k=1}^{9} \sum_{n,m} [n = mk][k^3 ≤ mk < (k + 1)^3]
      = 1 + \sum_{k=1}^{9} \sum_{m} [k^2 ≤ m < k^2 + 3k + 3 + 1/k]
      = 1 + \sum_{k=1}^{9} (3k + 4)
      = 1 + 3 × 45 + 36
      = 172

(Here we have used that 3k + 4 is the number of integers m satisfying k^2 ≤ m < k^2 + 3k + 3 + 1/k.)
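
A brute-force count confirms the answer 172; the sketch below uses an integer cube root to avoid floating-point trouble near perfect cubes.

def icbrt(n):
    # largest k with k**3 <= n
    k = round(n ** (1 / 3))
    while k**3 > n:
        k -= 1
    while (k + 1)**3 <= n:
        k += 1
    return k

print(sum(1 for n in range(1, 1001) if n % icbrt(n) == 0))   # 172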

3.3 Spectra

For x ∈ R we define the spectrum, spec(x), of x to be the set of integers


n o
bxc , b2xc , b3xc , b4xc , . . .

Lemma 1 If 1 6 x < y then spec(x) 6= spec(y).

Proof
Since 1 6 x we have bxc < b2xc < b3xc < · · · and similarly for y. So in

order to prove spec(x) 6= spec(y) it is enough to show that there exists n > 0
with bnxc 6= bnyc.

Since x < y, there exists n ∈ N with 1/n < y − x. It follows that 1 < ny − nx
and therefore bnyc > bnxc as required. ¤


Let φ = (√5 + 1)/2, the golden ratio, about 1.618. Then

spec(φ) = {1, 3, 4, 6, 8, 11, 12, 14, . . .}


spec(φ2 ) = {2, 5, 7, 10, 13, . . .}

We notice that

spec(φ) ∪ spec(φ2 ) = N
spec(φ) ∩ spec(φ2 ) = ∅

φ + 1 = φ2 . Hence 1/φ + 1/φ2 = 1.

Proposition 2 Let α, β ∈ R \ Q such that


1 1
+ =1 α, β > 0
α β

Then spec(α) ∪ spec(β) = N and spec(α) ∩ spec(β) = ∅

√ √
For example, take α = 2 and β = 2 + 2.

spec( 2) = {1, 2, 4, 5, 7, 8, 9, 11, . . .}

spec(2 + 2) = {3, 6, 10, 13, 17, . . .}

Proof
For α ∈ R , α > 0 , n ∈ N put

N (α, n) = |{k ∈ N | bkαc 6 n}| .

We claim that if
1 1
+ = 1,
α β
where α and β are irrational, then

N (α, n) + N (β, n) = n , ∀n ∈ N.

This will prove the proposition as changing n to (n+1) must increase exactly
one of the terms N (α, n) , N (β, n) by 1, that is,

either N (α, n + 1) = N (α, n) + 1


or N (β, n + 1) = N (β, n) + 1

but not both. So n + 1 lies in spec(α) or spec(β) but not both.

Proof of claim:

    N(α, n) = \sum_{k>0} [⌊kα⌋ ≤ n] = \sum_{k>0} [⌊kα⌋ < n + 1]
            = \sum_{k>0} [kα < n + 1] = \sum_{k>0} [0 < k < (n + 1)/α]
            = ⌈(n + 1)/α⌉ − 1 .

(In general the number of integers in (0, x) is ⌈x⌉ − 1.)

Since α ∉ Q, we have (n + 1)/α ∉ Z, so

    ⌈(n + 1)/α⌉ = (n + 1)/α + η

for some η with 0 < η < 1. Similarly,

    N(β, n) = ⌈(n + 1)/β⌉ − 1 = (n + 1)/β + θ − 1

for some 0 < θ < 1. Then

    N(α, n) + N(β, n) = (n + 1)/α + η − 1 + (n + 1)/β + θ − 1
                      = n + 1 + η + θ − 1 − 1
                      = n + η + θ − 1 .

Since this is an integer and η, θ ∈ (0, 1), it can only be equal to n. This completes the proof. ∎
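
Proposition 2 can be illustrated numerically. The sketch below takes α = √2 and β = 2 + √2 (so 1/α + 1/β = 1) and checks, with ordinary floating point (adequate at this range), that the two spectra cover 1, ..., 500 without overlapping.

import math

alpha = math.sqrt(2)
beta = 2 + math.sqrt(2)          # 1/alpha + 1/beta = 1

spec_a = {math.floor(k * alpha) for k in range(1, 1000)}
spec_b = {math.floor(k * beta) for k in range(1, 1000)}

up_to = set(range(1, 501))       # stay well below the largest element computed
assert up_to <= (spec_a | spec_b)        # together they cover 1..500
assert not (spec_a & spec_b & up_to)     # and they are disjoint there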

3.4 Division

Given n cakes to share between k people, how many should each get?

Write n = kq + r, where 0 6 r < k and q = bn/kc.

Then r people get q + 1 cakes and k − r people get q cakes. Note that
dn/ke = d(n − 1)/ke = . . . = . . . d(n − r + 1)/ke = q + 1,
d(n − r)/ke = d(n − r − 1)/ke = . . . = d(n − k + 1)/ke = q.
So
dn/ke + d(n − 1)/ke + . . . + d(n − k + 1)/ke = r(q + 1) + (k − r)q
= r + kq = n.
True for all n ∈ Z , k ∈ N.

Similarly,
bn/kc + b(n + 1)/kc + . . . + b(n + k − 1)/kc = n
This generalises to x ∈ R. Replace n by kx to get
bkxc = bxc + bx + 1/kc + bx + 2/kc + . . . + bx + (k − 1)/kc
True for all x ∈ R , k ∈ N.

3.5 Floor and Ceiling Sums

Example 16:
X √
b kc = 0 + 1| +{z
1 + 1} + 2| + 2 +{z
2 + 2 + 2} +3 + 3 + . . .
06k6n 22 −12 terms 32 −22 terms
X
= j((j + 1)2 − j 2 ) + some left over

More precisely, put a = b nc. Then
X √ X √ X √
b kc = b kc + b kc
06k6n 06k<a2 a2 6k6n
| {z }
left over terms
X
= j((j + 1)2 − j 2 ) + (n − a2 + 1)a
06j<a
X
= j(2j + 1) + (n − a2 + 1)a
06j<a
X
= (2j(j − 1) + 3j) + a(n − a2 + 1)
06j<a
2a(a − 1)(a − 2) 3a(a − 1)
= + + a(n − a2 + 1)
3 2
a3 a2 5a
= na − − +
3 2 6

4 Binomial Coefficients

4.1 Homogeneous trees

Let X be a set. A permutation of X is a bijective map f : X → X. The set


of permutations of X will be written Bij(X). We will ignore that this is in
fact a group (the symmetric group ).

The factorial numbers n! for n > 0 are defined by

n! = 1 · 2 · 3 · · · n.

We have the sets Xn = {1, 2, . . . , n} of n elements.

Proposition 3 Counting Permutations

| Bij(Xn )| = n!

Proof. Imagine we want to choose a permutation f ∈ Bij(Xn ), and that we


do this in n steps. First we choose the value of f (1), then the value of f (2),
and so on.
f (1) = 5

f (2) = 3 f (2) = 8

f (3) = 6 f (3) = 8 f (3) = 3 f (3) = 4 (12)

When choosing f (1), all values of Xn are available, which gives n possibilities.

As a permutation f should satisfy f (1) 6= f (2), the value of f (2) can be


chosen from the n − 1 possibilities

Xn \{f (1)}.

In general, at the i-th step there are n + 1 − i possible choices

Xn \{f (1), f (2), . . . , f (i − 1)}.

The answer is obtained by multiplying 1 these numbers, which gives

n · (n − 1) · (n − 2) · · · 1 = n!. 2

The remarkable thing in this proof is that the number of possible values of
f (2) is n − 1, regardless the value of f (1) that’s just been chosen.
1
Compare the number of elements of a Cartesian product: |A × B| = |A| × |B|

An event of choosing an object is a path from top to bottom in a diagram
called a homogeneous (rooted) tree .




   



     

 

[Figure: a rooted tree whose vertices on the same level have different numbers of children — this rooted tree is not homogeneous.]

[Figure: a homogeneous rooted tree: the root (level 0) has a_0 = 2 children, each vertex on level 1 has a_1 = 3 children, each vertex on level 2 has a_2 = 2 children; x is a vertex, y a child of x, z a leaf, and the number of leaves is a_0 a_1 a_2 = 2 · 3 · 2 = 12.]

A rooted tree is homogeneous if any two vertices on the same level have the
same number of children. If a homogeneous tree has n levels, and a vertex
on level i has ai children, then the number of leaves (= vertices with no
children) is
a0 a1 · · · an−2 .
For nonhomogeneous trees there is no such formula.

So now we understand that (12) is a small part of the homogeneous tree


involved in the proof of Proposition 3!

We’ll see another homogeneous tree in the following section, and many more
after that.

4.2 Binomial Coefficients


Let r ∈ R, k ∈ Z. We define the binomial coefficient \binom{r}{k} ("r choose k") by

    \binom{r}{k} = r(r − 1) ⋯ (r − k + 1)/k! = r^{\underline{k}}/k!    if k ≥ 0,
    \binom{r}{k} = 0                                                    if k < 0.

Proposition 4 Counting interpretation of binomial coefficients

Let n, k ∈ Z, 0 6 k 6 n. The number of k-element ¡n¢ subsets of a fixed n-


element set is precisely the binomial coefficient k .

Proof. For the duration of this proof, we write g(n, k) for the number ¡n¢ of
k-element subsets of Xn = {1, . . . , n}. So we want to prove g(n, k) = k .

In Proposition 3 (counting permutations) we saw that | Bij(Xn )| = n!. Let
us compute | Bij(Xn )| another way as follows.

Step 1: Choose {f (1), f (2), . . . , f (k)} (only as a set, not the individual f (i)!).
By definition, there are g(n, k) ways to do this.

Step 2: Choose the individual values f (1), f (2), . . . , f (k). This boils down
to ordering k objects. By Proposition 3 (counting permutations) there are
k! ways of doing this. Note that this number does not depend on what
happened during step 1: the tree is homogeneous so far!

Step 3: Choose the remaining values f (k + 1), . . . , f (n) individually. Again


there are (n − k)! ways of doing this.

We have a homogeneous tree, so

n! = | Bij(Xn )| = g(n, k) k! (n − k)!.

and µ ¶
n! n
g(n, k) = = .
k! (n − k)! k
This finishes the proof. 2

µ ¶ µ ¶ µ ¶
r n n
Note that for all r ∈ R, = 1. Also, = if 0 6 k 6 n are
0 k n−k
integers.
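
Since the definition \binom{r}{k} = r^{\underline{k}}/k! makes sense for any real r, it is easy to compute with; a small illustrative helper in Python:

from math import factorial

def binom(r, k):
    # binomial coefficient for real r and integer k, via falling powers
    if k < 0:
        return 0
    falling = 1
    for i in range(k):
        falling *= r - i
    return falling / factorial(k)

print(binom(7, 3))     # 35.0, as in Pascal's triangle
print(binom(-1, 4))    # 1.0, since binom(-1, k) = (-1)^k
print(binom(0.5, 2))   # -0.125, a non-integer upper index is allowed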

4.3 Easy Identities

Absorption: for k ≠ 0,

    \binom{r}{k} = (r/k) \binom{r − 1}{k − 1} ,

which is clear from the definition.

Proposition 5 Pascal’s Triangle Identity:

    \binom{r}{k} = \binom{r − 1}{k} + \binom{r − 1}{k − 1}        (13)

Proof
First assume n > k > 0, n ∈ Z. Then
µ ¶
n
= number of ways of choosing k things from n things
k

Suppose our set of size n is one red and (n − 1) green balls.


µ ¶
n−1
= subsets of size k which include the red ball
k−1

µ ¶
n−1
= subsets of size k which do not include the red ball
k
So µ ¶ µ ¶ µ ¶
n n−1 n−1
= + .
k k−1 k
Write (13) as µ ¶ µ ¶ µ ¶
x x−1 x−1
= + .
k k k−1
This is a polynomial equation in x of degree k. It is true for all x ∈ Z with
x > 0, so it has more than k roots. Hence (13) must be an identity. That is,
true for all x ∈ R. ¤

Using Pascal’s triangle identity repeatedly, one easily produces the first few
rows of Pascal’s triangle:

0 1 2 3 4 5 6 7 k
0 1
1 1 1
2 1 2 1
3 1 3 3 1 Pascal’s triangle.
4 1 4 6 4 1
5 1 5 10 10 5 1
6 1 6 15 20 15 6 1
7 1 7 21 35 35 21 7 1 ¡ ¢
n
n k

It is not a bad idea to know the first five rows by heart.

Unfold (13), for example:


µ ¶ µ ¶ µ ¶
5 4 4
= +
3 3 2
µ ¶ µ ¶ µ ¶
4 3 3
= + +
3 2 1
µ ¶ µ ¶ µ ¶ µ ¶
4 3 2 1
= + + + .
3 2 1 0
In general (parallel summation),
Xµ r + k ¶ µ r + n + 1 ¶
= for n ∈ Z ,
k n
k6n

from summing down diagonal in Pascal’s Triangle. Alternatively:


µ ¶ µ ¶ µ ¶
5 4 4
= +
3 3 2
µ ¶ µ ¶ µ ¶
3 3 4
= + +
3 2 2
µ ¶ µ ¶ µ ¶
2 3 4
= + + .
2 2 2

In general (upper summation),
X µ k ¶ µ n+1 ¶
= for m, n ∈ Z and m, n > 0 ,
m m+1
06k6n

from summation down column in Pascal’s Triangle.

Combinatorial explanation of upper summation: µ choosing


¶ (m + 1) from
k
(n + 1), numbered as 0, 1, 2 . . . n. There are ways of making the
m
choice of (m + 1) such that the largest number in the chosen set is k. The
result follows.

Negation:

    r^{\underline{k}} = r(r − 1) ⋯ (r − k + 1)
                      = (−1)^k (−r)(1 − r) ⋯ (k − 1 − r)
                      = (−1)^k (k − 1 − r)^{\underline{k}} .

Dividing by k! gives (upper negation)

    \binom{r}{k} = (−1)^k \binom{k − 1 − r}{k} .        (14)

In particular

    \binom{−1}{k} = (−1)^k    for k ≥ 0,
    \binom{−2}{k} = (−1)^k \binom{k + 1}{k} = (−1)^k (k + 1).

From this we obtain a formula for the alternating sum of the first m elements
in the r’th row of Pascal’s triangle:
µ ¶ Xµ k−1−r ¶
X
k r
(−1) = (upper negation)
k k
k6m k6m
µ ¶
−r + m
= (parallel summation)
m
µ ¶
m r−1
= (−1) (upper negation)
m

But there is no known simple expression for the same sum without alternating
signs,
Xµ r ¶
,
k
k6m

although of course the special case m = r is obvious :


r µ ¶
X r
= 2r , the total number of subsets.
k
k=0

Vandermonde Convolution:
Xµ r ¶µ s ¶ µ r + s ¶
= , for n ∈ Z r, s ∈ R.
k n−k n
k∈Z

Combinatorial explanation: suppose we have set of r men, s women. We


want
µ to¶choose
µ n people.
¶ Number of subsets with k men and n − k women
r s
is . The result (for the case r, s ∈ Z, r, s > 0) follows.
k n−k
Again it generalises to all r, s ∈ R by the roots of polynomial argument.

Theorem 6 Binomial Theorem.


Xµ r ¶ X µ r ¶ µ x ¶k
r k r−k r
(x + y) = x y =y
k k y
k∈Z k∈Z

This is valid if either:

1. r ∈ N, series is finite

2. r ∈ R and |x/y| < 1, series is infinite and converges.

Another proof of Vandermonde Convolution: Look at coefficient of xn y r+s−n


in

(x + y)r+s = (x + y)r × (x + y)s


µ ¶ Xµ r ¶µ s ¶
r+s
=
n k n−k
k

This covers all top 10 identities, except “trinomial revision”. Let m, k ∈ Z,


r ∈ R, m > k.
µ ¶µ ¶
r m r(r − 1) . . . (r − m + 1) m!
= ×
m k m! k!(m − k)!
r(r − 1) . . . (r − k + 1) (r − k) . . . (r − m + 1)
= ×
k! (m − k)!
µ ¶µ ¶
r r−k
=
k m−k

(if m < k then both sides are 0).

The top ten binomial coefficient identities.
µ ¶
n n!
Factorial expansion = , n > k > 0 integers
k k! (n − k)!
µ ¶ µ ¶
n n
Symmetry = , n > k > 0 integers
k n−k
µ ¶ µ ¶
r r r−1
Absorption = , k > 0 integer
k k k−1
µ ¶ µ ¶ µ ¶
Pascal’s triangle r r−1 r−1
= + , k integer
identity k k k−1
µ ¶ µ ¶
r k k−r−1
Upper negation = (−1) , k integer
k k
µ ¶µ ¶ µ ¶µ ¶
r m r r−k
Trinomial revision = , m, k integers
m k k m−k
X µr ¶ r > 0 integer
Binomial theorem (x + y)r = xk y r−k ,
k or |x/y| < 1
X µr + k ¶ µk ¶
r+n+1
Parallel summation = , n integer
k n
k6n
X µk¶ µ
n+1

Upper summation = , m, n > 0 integers
m m+1
06k6n
µ ¶ X µ r ¶µ s ¶
Vandermonde r+s
= , n integer.
convolution n k n−k
k

4.4 Some more complicated identities

Example 17: Let


m µ ¶Áµ ¶
X m n
S= , for n > m > 0
k k
k=0

By trinomial revision
µ ¶µ ¶ µ ¶µ ¶
n m n n−k
= ,
m k k m−k

So
µ ¶Áµ ¶ µ ¶Áµ ¶
m n n−k n
=
k k m−k m

Hence µ ¶ m µ ¶
n X n−k
×S = .
m m−k
k=0

This looks like parallel summation. We need to change variable in summa-
tion. Put l = m − k, k = m − l.
µ ¶ m µ ¶
n X n−m+l
×S =
m l
l=0
µ ¶
n−m+m+1
= (parallel summation)
m
µ ¶
n+1
=
m
µ ¶Áµ ¶
n+1 n
⇒S =
m m
n+1
=
n+1−m
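
So Example 17 shows that \sum_{k=0}^{m} \binom{m}{k}/\binom{n}{k} = (n + 1)/(n + 1 − m) for n ≥ m ≥ 0. A quick exact check of this identity for small n, m (an illustrative Python sketch):

from fractions import Fraction
from math import comb

for n in range(12):
    for m in range(n + 1):
        s = sum(Fraction(comb(m, k), comb(n, k)) for k in range(m + 1))
        assert s == Fraction(n + 1, n + 1 - m)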

Example 18:
m µ ¶Áµ ¶
X n−k−1 n
S= k for n > m > 0 .
n−m−1 m
k=0

If the first k in each summand were only (n − k) we could use absorption.


So we can write k = n − (n − k) and then
µ ¶
n
S = S1 − S2
m

where m µ ¶
X n−k−1
S1 = n
n−m−1
k=0

and m µ ¶
X n−k−1
S2 = (n − k) .
n−m−1
k=0

By absorption,
m µ ¶
X n−k
S2 = (n − m)
n−m
k=0

Now we make a change of variable:

l = n−k, k = n−l.
n µ ¶
X l
S2 = (n − m)
n−m
l=n−m
µ ¶
l
But if l < n − m, then = 0. So
n−m
n µ ¶ µ ¶
X l n+1
S2 = (n − m) = (n − m)
n−m n−m+1
l=0

by upper summation.
m µ ¶
X n−k−1
S1 = n
n−m−1
k=0

which is similar to S2 with (n − 1) in place of n. Put

n−k −1 = l, k =n−l−1

n−1 µ ¶ µ ¶
X l n
S1 = n =n by upper summation
n−m−1 n−m
l=0

So
µ ¶
n
S = S1 − S2
m
µ ¶ µ ¶
n n+1
= n − (n − m)
n−m n−m+1
µ ¶ µ ¶
n (n + 1)(n − m) n
= n −
n−m (n − m + 1) n−m
µ ¶
m n
=
n−m+1 n−m
µ ¶
m n
= .
n−m+1 m

So
m
S= .
n−m+1

4.5 Derangements

Theorem 7 Inversion Formula.

Let f, g : N0 → R be functions (here N0 = N ∪ {0}). Then


µ ¶
X
k n
g(n) = (−1) f (k) for all n > 0 ⇐⇒
k
k∈Z
µ ¶
X
k n
f (n) = (−1) g(k) for all n > 0.
k
k∈Z

Pn ¡n¢
Note: The sums are the same as k=0 because k
= 0 otherwise. So they
are finite sums.

Proof
By symmetry, enough to prove LHS ⇒ RHS, so assume LHS. Then
Xµ n ¶ Xµ n ¶ Xµ k ¶
k k
(−1) g(k) = (−1) (−1)j f (j)
k k j
k k j
µ ¶µ ¶
X X
j+k n k
= f (j) (−1)
k j
j k

where we have replaced n by k, and k by j in LHS. Now


µ ¶µ ¶ µ ¶µ ¶
n k n n−j
= ,
k j j k−j
so by trinomial revision the sum is
µ ¶X µ ¶
X n j+k n−j
f (j) (−1)
j k−j
j k
µ ¶ µ ¶
X n X l+2j n−j
= f (j) (−1)
j l
j l+j

where we have replaced k by l + j and k − j by l.


P P P
Since j is constant in l+j term, l+j is the same as l (remember the
range of summation is all of Z). So sum is
µ ¶X µ ¶
X n l n−j
= f (j) (−1)
j l
j l
µ n−j
¶X µ ¶
X n l n−j
= f (j) (−1)
j l
j l=0
µ ¶
n−j
(since = 0 for other terms).
l
But we have shown that
m µ ¶ µ ¶
X r k m r−1
(−1) = (−1)
k m
k=0

So our sum is
µ ¶µ ¶
X n n−j−1
f (j) . (∗)
j n−j
j

Note that for n ∈ Z,


µ ¶ ½
n−1 0 for n 6= 0
=
n 1 for n = 0
That is,
µ ¶
n−1
= [n = 0]
n

— only non-zero term in (∗) is when j = n. So
µ ¶
n
(∗) = f (n) = f (n) , for all n > 0 .
n

So LHS ⇒ RHS as required. ¤

There are n! permutations of a set of size n. A permutation is called a


derangement if it does not fix any point in the set.

Let D(n) be the number of derangements. Can we get a formula for D(n)?

More generally, let h(n, r) be the number of permutations fixing exactly r


points, 0 6 r 6 n.

So D(n) = h(n, 0) (note that h(n, n − 1) = 0).

n h(n, 0) h(n, 1) h(n, 2) h(n, 3) h(n, 4)


1 0 1
2 1 0 1
3 2 3 0 1
4 9 8 6 0 1

Example 19: For n = 4 the permutations (1, 2, 3, 4), (1, 2, 4, 3), (1, 3, 4, 2),
(1, 3, 2, 4), (1, 4, 2, 3), (1, 4, 3, 2), (1, 2)(3, 4), (1, 4)(3, 2), (1, 3)(2, 4) fix 0 points.
So there are 9 derangements.

(1, 2, 3), (1, 3, 2), (1, 2, 4), (1, 4, 2), (1, 3, 4), (2, 3, 4), (2, 4, 3), (1, 4, 3) fix 1
point.

(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4) fix 2 points

Identity fixes 4 points


µ ¶
n
Note that h(n, r) = D(n − r) because we choose r fixed points in
µ ¶ r
n
ways then derange the other n − r points in D(n − r) ways. Clearly
r
the total number of permutations:
n n µ ¶
X X n
n! = h(n, k) = D(n − k)
k
k=0 k=0
Xµ n ¶ Xµ n ¶
= D(n − k) = D(n − k)
k n−k
k k
Xµ n ¶
= D(k) replacing k with n − k .
k
k

We can now use Theorem 7 with g(n) = n! and f(n) = (−1)^n D(n) to conclude that

    f(n) = (−1)^n D(n) = \sum_{k} (−1)^k \binom{n}{k} k! ,

so

    D(n) = \sum_{k=0}^{n} (−1)^{n+k} n!/(n − k)! = \sum_{k=0}^{n} (−1)^{n−k} n!/(n − k)!
         = \sum_{k=0}^{n} (−1)^k n!/k! ,    replacing n − k by k.

Example 20:

    D(1) = 0
    D(2) = 2!(1 − 1/1! + 1/2!) = 1
    D(3) = 3!(1 − 1/1! + 1/2! − 1/3!) = 2
    D(4) = 4!(1 − 1/1! + 1/2! − 1/3! + 1/4!) = 9

Note that D(n)/n! is the proportion of derangements among all permutations. It tends to 1/e rapidly as n → ∞. In fact

    D(n) = ⌊ n!/e + 1/2 ⌋ .
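
Both formulae for D(n) are easy to check against a brute-force count of fixed-point-free permutations (an illustrative sketch; the brute force is only feasible for small n):

from itertools import permutations
from math import e, factorial, floor

def D_formula(n):
    return sum((-1)**k * (factorial(n) // factorial(k)) for k in range(n + 1))

def D_brute(n):
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i for i in range(n)))

for n in range(1, 8):
    assert D_formula(n) == D_brute(n) == floor(factorial(n) / e + 0.5)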

4.6 Multinomial coefficients

These are generalisations of binomial coefficients. We define

    \binom{n_1 + ⋯ + n_k}{n_1, . . . , n_k} := (n_1 + ⋯ + n_k)!/(n_1! ⋯ n_k!) .

Note that

    \binom{k + ℓ}{k, ℓ} = \binom{k + ℓ}{k} ,

the binomial coefficient.
¡ ¢
We’ve seen before that nk is the number of k-element subsets of an n-element
set. The following generalises this to multinomial coefficients, necessarily
with a somewhat different language.

Let X(m) denote a set of m elements, and let A(n1 , . . . , nk ) denote the set
of k-tuples (Y1 , . . . , Yk ) of disjoint sets whose union is X(n1 + · · · + nk ), and
such that |Yi | = ni for all i.

Proposition 8 Combinatorial description of multinomial coefficients.
µ ¶
n1 + · · · + n k
= |A(n1 , . . . , nk )|.
n1 , . . . , n k

Exercise. Prove the above proposition by a homogeneous tree.

The binomial theorem also has its generalisation, which looks as follows.

Theorem 9 Multinomial theorem.

Let k > 1 and m > 0 be integers. Then


Xµ m

m
(x1 + · · · + xk ) = xn1 1 · · · xnk k
n1 , . . . , n k
where the sum is over all k-tuples (n1 , . . . , nk ) of nonnegative integers whose
sum is m. 2

Exercises about multinomial coefficients.

1. Prove µ ¶ µ ¶µ ¶
k+`+m k+`+m k+`
=
k, `, m k + `, m k, `
and deduce trinomial revision.
2. Prove the multinomial theorem, using the binomial theorem and induc-
tion on k.
3. Prove the multinomial theorem, using differentiation and induction on
m.
4. Prove:
µ ¶ µ ¶ µ ¶
n1 + · · · + n k n1 + · · · + n k − 1 n1 + · · · + n k − 1
= +
n1 , . . . , n k n1 − 1, n2 , . . . , nk n1 , n2 − 1, n3 , . . . , nk
µ ¶
n1 + · · · + n k − 1
+··· +
n1 , . . . , nk−2 , nk−1 − 1, nk
µ ¶
n1 + · · · + n k − 1
+ (15)
n1 , . . . , nk−1 , nk − 1

5. Prove the multinomial theorem, using induction on m and (15).


6. Fix n1 , . . . , nk−1 ∈ Z>0 . Prove that the map
µ ¶
n1 + · · · + n k
nk 7→
n1 , . . . , n k
is a polynomial. (This tells us how a more general multinomial coeffi-
cient should be defined where one variable is real and all other variables
are nonnegative integers.)

5 Special Numbers

5.1 Stirling Numbers

Stirling numbers of the first and second kinds are written as


· ¸ ½ ¾
n n
first kind , second kind
k k
They are defined only for n ∈ N0 , k ∈ Z. They are 0 for k < 0 and k > n.
Let Xn = {1, 2, 3, . . . n} be a set of size n.

Stirling numbers of the second kind

Definition 4: A partition of a set X is an© equivalence


ª relation on that
n
set. The equivalence classes are called parts. k is the number of ways of
partitioning Xn into k non-empty subsets.

Example 21: n = 4, k = 2. We have the partitions {1, 2, 3} ∪ {4},


{1, 2, 4} ∪ {3}, {1, 3, 4} ∪ {2}, {2, 3, 4} ∪ {1}, {1, 2} ∪ {3, 4}, {1, 3} ∪ {2, 4},
{1, 4} ∪ {2, 3}.
½ ¾
4
So = 7.
2

Properties:
½ ¾
0
= 1
0
½ ¾
n
= 0 , for n > 0
0
½ ¾
n
= 1 , for n > 0
1
½ ¾
n
= 2n−1 − 1 , for n > 2
2
2n − 2
=
2
(note that 2n − 2 is the number of non empty proper subsets of Xn .)

Proposition 10 Basic Recurrence Relation:


    { n \brace k } = k { n−1 \brace k } + { n−1 \brace k−1 }        (16)

Proof
Consider a partition of Xn into k non-empty subsets (n, k > 0).

Either

© n−1 ª
1. {n} is one of the subsets. Then there are k−1
ways of decomposing
Xn \ {n} into k − 1 non empty subsets.
2. n is in a larger
© n−1subset.
ª Then we decompose Xn \ {n} into k non-empty
subsets in© k ª ways. We can place n in any of these k subsets. So
there are n−1 k
ways of doing this.

Result follows. ¤

    n \ k    0    1    2    3    4    5
      0      1
      1      0    1
      2      0    1    1
      3      0    1    3    1
      4      0    1    7    6    1
      5      0    1   15   25   10    1

(the entry in row n, column k is { n \brace k })
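
The recurrence (16) makes the table easy to extend; a short memoised Python sketch that reproduces the rows above:

from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    # number of partitions of an n-set into k non-empty parts, via (16)
    if n == 0:
        return 1 if k == 0 else 0
    if k <= 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

for n in range(6):
    print([stirling2(n, k) for k in range(n + 1)])
# last row printed: [0, 1, 15, 25, 10, 1]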

Example 22: Unfolding is the analogue of upper and parallel summation.


Unfolding (16) ` times (` 6 k) yields
½ ¾ ½ ¾ ½ ¾
n n−1 n−1
=k +
k k k−1
½ ¾ ½ ¾ ½ ¾
n−1 n−2 n−2
=k + (k − 1) + =
k k−1 k−2
³X 1 ½ ¾ ½ ¾
n−m−1 ´ n−2
= (k − m) + = ···
m=0
k−m k−2
`−1
³X ½ ¾ ½ ¾
n−m−1 ´ n−`
= (k − m) + .
m=0
k − m k − `
A more formal induction to prove this is also possible.

Exercise. Unfold (16) the other way.


P © ª
Note: b(n) = k nk = total number of ways of partitioning Xn , the total
number of equivalence relations on Xn .

b(0) = b(1) = 1 , b(2) = 2 , b(3) = 5 , b(4) = 15 , etc. There is no known


simple formula for b(n).

Stirling numbers of the first kind

Let g be a permutation of a set X. An orbit of g is a set of the form


{g k x | k ∈ Z} where x ∈ X. A cycle of g is a permutation of the form g|Y
(restriction) where Y is an orbit of g.

For example, the orbits of (123)(67) ∈ S7 are {1, 2, 3}, {4}, {5}, {6, 7}.
Its cycles are (123), (4), (5), (67). Cycles are always non-empty. Note:
(1234) = (2341) etc (cyclic permutation).
· ¸
n
Definition 5: is the number of permutations in Sn with k cycles.
k

Example 23: n = 4, k = 2. (1, 2, 3)(4), (1, 3, 2)(4), (1, 2, 4)(3), (1, 4, 2)(3),
(1, 3,
· 4)(2),
¸ (1, 4, 3)(2), (2, 3, 4)(1), (2, 4, 3)(1), (1, 2)(3, 4), (1, 3)(2, 4), (1, 4)(2, 3).
4
So = 11
2

· ¸
0
= 1,
0
· ¸
n
= 0 , for n > 0
0
· ¸
n
= (n − 1)!
1

the last equality because we can write the same n-cycle in precisely n different
ways: given an n-cycle, choose as first digit any one of the n digits in the
cycle, and then the cyclic order determines the order of the remaining digits.

Note that
X· n ¸
= n!
k
k

the total number of permutations of Xn .

Proposition 11 Basic Recurrence Relation:


· ¸ · ¸ · ¸
n n−1 n−1
= (n − 1) + (17)
k k k−1

Proof
Either (n) is a cycle by itself,£ leaving
¤ Xn \ {n} = Xn−1 to be decomposed
n−1
into k − 1 disjoint cycles (in k−1 ways), or we decompose Xn \ {n} into k
£ ¤
non-empty cycles in n−1 k
ways and then insert the n into one of the cycles.
For any given decomposition of Xn−1 into disjoint cycles, there are (n − 1)
ways of inserting n into one of them: put n immediately before any of the
other (n − 1) numbers. ¤

£n¤ £n¤ £n¤ £n¤ £n¤ £n¤
n 0 1 2 3 4 5

0 1
1 0 1
2 0 1 1
3 0 2 3 1
4 0 6 11 6 1
5 0 24 50 35 10 1

Example 24: How many ways are there of inserting 6 into (1, 2, 3)(4, 5)?

(6, 1, 2, 3)(4, 5)
(1, 6, 2, 3)(4, 5)
(1, 2, 6, 3)(4, 5)
(1, 2, 3)(6, 4, 5)
(1, 2, 3)(4, 6, 5)

(Note that (1, 2, 3, 6)(4, 5) is the same as (6, 1, 2, 3)(4, 5), and (1, 2, 3)(4, 5, 6)
is the same as (1, 2, 3)(6, 4, 5), as decompositions into disjoint cycles — and,
thus, of course, as permutations). Gives 6 − 1 = 5 possible ways.
· ¸
X
k n
Exercise. (−1) = 0 if n > 2.
k
k

Recall the rising and falling powers from Assignment 2. We were able to
express ordinary powers in terms both of rising powers, and of falling powers.
For example

x4 = x4 + 6x3 + 7x2 + x1
x4 = x4 − 6x3 + 7x2 − x1

Properties and examples about Stirling numbers


½ ¾
n
P n
Theorem 12 1. x = k xk
k
½ ¾
n
P n−k n
2. x = k (−1) xk
k
· ¸
n
P n−k n
3. x = k (−1) xk
k
· ¸
n
P n
4. x = k xk
k

Proof
We do (1) and (3). (2) and (4) are similar.

1. Induction on n. n = 0 gives 1 = 1, true. So assume true for n − 1.
Note that xk+1 = xk (x − k). So x.xk = xk+1 + kxk . We have

xn = x.x
Ã
n−1
!
X½ n − 1 ¾
= x xk (by induction)
k
k
X½ n − 1 ¾
= (xk+1 + kxk )
k
k
X½ n − 1 ¾ X ½ n−1 ¾
k+1
= x + k xk
k k
k k
X½ n − 1 ¾ X ½ n−1 ¾
= xk + k xk
k−1 k
k k
X½ n ¾
= xk ( by Proposition 23)
k
k

3. Induction on n. n = 0 gives 1 = 1, true. Assume true for n − 1.

xn = xn−1 (x − n + 1)
X· n − 1 ¸
= (x − n + 1) (−1)n−1−k xk (by induction)
k
k
X· n − 1 ¸ X· n − 1 ¸
n−1−k k+1
= (−1) x + (−1)n−k (n − 1)xk
k k
k k
X· n − 1 ¸ X· n − 1 ¸
= (−1)n−k xk + (−1)n−k (n − 1)xk
k−1 k
k k
X· n ¸
= (−1)n−k xk (by Proposition 24)
k
k

Example 25: Using parts (3) and (1) of theorem 12 yields


· ¸ · ¸³ X ½ ¾ ´
n−k n n−k n k `
X X
n (3) k (1)
x = (−1) x = (−1) x .
k
k k
k `
`

But the polynomials {x` : ` ∈ Z>0 } (falling powers) are linear independent.
Therefore, we can compare coeffients of x` which yields
· ¸½ ¾
n−k n k
X
[n = `] = (−1) .
k
k `

Example 26: Here is an example of an identity which can be obtained


bijectively. Let F (k) denote the set of permutations g ∈ Sn+1 of ` + 1 cycles

such that n + 1 is in a (k + 1)-cycle, and f (k) the number of elements of
F (k). So · ¸ X
n+1
= f (k). (18)
`+1 k

We choose an element of F (k) as follows.

Step 1. Choose
¡n¢ the orbit {g m (n + 1) | m ∈ Z} of n + 1. The number of
choices is k .
£ ¤
Step 2. Choose the cycle of n + 1. The number of them is k+1
1
= k!.
£ ¤
Step 3. Finish g. There are n−k`
choices.

We are dealing with a homogeneous tree so


µ ¶ · ¸
n n−k
f (k) = k!
k `

so by (18) we find
· ¸ Xµ ¶ · ¸
n+1 n n−k
= k! .
`+1 k
k `

5.2 Harmonic Numbers

We define

    H_n = \sum_{k=1}^{n} 1/k ,    H_0 = 0.

Recall that

    log x = \int_1^x dx/x

(where log x means log_e x).

Consider the following figure, showing the graph of y = 1/x and of a step
function f whose integral between 1 and n + 1 is Hn .


It is clear that Hn > log(n + 1) > log n, and that log n > Hn − 1 (throw away
the first (square) box and shift all the others one unit to the left). Hence
Hn − log(n + 1) is bounded above by 1. It is an increasing sequence, and so
tends to a limit, known as γ. Since log(n + 1) − log n = log(1 + 1/n) → 0 as
n → ∞,

lim (Hn − log n) = γ, which is approximately = 0.577215 . . .


n→∞

Some problems in which Hn occur.

Example 27: Suppose we have a stack of n cards each of length 2 units


and we try to balance them on edge of table to get maximal overhang. What
is the largest possible overhang?
[Figure: a stack of n cards (numbered 1 at the top) overhanging the edge of a table, with x = 0 at the outer end of card 1 and overhangs d_1, d_2, . . . , d_n marked; d_n is the overhang over the table.]

Let dn be a possible overhang (possible in the sense that the cards don’t fall).
Number the cards 1–n from top to bottom. Since the first card does not fall,
we have
d1 6 1.
The reason for this is that the center of the first card must be above the
second card.

More generally, the center of gravity of the first k cards must be above the
(k + 1)-st card, where the table plays the role of the (n + 1)-st card. In order
to compute this center of gravity, put x = 0 at the end of the first card. Then
the center of gravity of the first k cards is

(1 + d0 ) + (1 + d1 ) + · · · + (1 + dk−1 )
k
and therefore,

(1 + d0 ) + (1 + d1 ) + · · · + (1 + dk−1 )
dk 6 . (19)
k
Here d0 = 0.

Note that, the greater d0 , . . . , dk are, the greater the upper bound for dk
given by (19) is. It follows that dn is greatest precisely when equality holds
in (19), for all k:

(1 + d0 ) + (1 + d1 ) + · · · + (1 + dk−1 )
dk = for all k,
k
or
kdk = k + (d0 + · · · + dk−1 ). (20)
Replacing k by k − 1 gives

(k − 1)dk−1 = (k − 1) + (d0 + · · · + dk−2 ). (21)

Subtracting (21) from (20) gives

    k d_k − (k − 1) d_{k-1} = 1 + d_{k-1} ,

that is,

    d_k = 1/k + d_{k-1} .

Since d_0 = 0 we find that d_k is just the harmonic number: d_k = H_k .

Since Hn → ∞ as n → ∞, we can get arbitrarily large overhang. H4 > 2,


so with 4 cards we can get the top card to be clear of the table. But note
H1000000 = 14.39 so to get an overhang of 7 card lengths we need nearly
1000000 cards.

Example 28: You are collecting football stickers. The complete set has
size n. You buy them one at a time, selected randomly from a complete set.
How many do you expect to have to buy to get a full set?

Suppose you already have k < n distinct stickers. How many do you expect
to have to buy until you get a new sticker number k + 1? At this point, the
probability that a random sticker
P is new is (n − k)/n = p. The expected
time for new sticker k + 1 is r>1 r P (r), where P (r) is the probability that
you get the new sticker on your r’th purchase. This occurs if you get r − 1

old stickers in a row, and then get a new one. So P(r) = (1 − p)^{r-1} p, and the expected time is

    \sum_{r ≥ 1} r (1 − p)^{r-1} p = \sum_{r ≥ 1} r q^{r-1} p = p/(1 − q)^2 = p/p^2 = 1/p = n/(n − k) .

So the total expected waiting time for the full set is

    \sum_{k=0}^{n-1} n/(n − k) = n H_n ∼ n ln n .
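
The answer nH_n can be compared with a simple simulation (an illustrative sketch; the choice n = 50 and 2000 trials is arbitrary):

import random

def collect_once(n):
    # buy random stickers until all n distinct ones are owned
    owned, bought = set(), 0
    while len(owned) < n:
        owned.add(random.randrange(n))
        bought += 1
    return bought

n, trials = 50, 2000
average = sum(collect_once(n) for _ in range(trials)) / trials
expected = n * sum(1 / k for k in range(1, n + 1))   # n * H_n
print(average, expected)   # the two values should be close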

Example 29: Consider a worm on an elastic band. The worm starts at


one end and crawls towards the other at 1cm per second. The elastic band
initially has length 100cm, but after each second, it is stretched by 100cm.

worm
1 cm/sec
A
elastic
100cm
fixed

Does worm ever reach end of elastic? After 1 second 1/100’th journey is over.
Then the elastic is stretched to 200cm long. But worm remains 1/100’th way
along since the stretching is uniform. In 2nd second, worm completes further
1/200’th of journey. So in total 1/100 + 1/200 has been done. This remains
true after stretching to 300cm. In 3rd second 1/300’th of journey done. After
n seconds, the fraction of journey completed is 1/100+1/200+. . .+1/100n =
Hn /100. Since Hn → ∞ the answer to the question is yes.

It takes approximately n seconds, where n is the first integer with H_n > 100. Since H_n ∼ log n + γ,

    n ∼ e^{100−γ} = e^{99.423} seconds ≈ 4.79 × 10^{35} years.

At this stage the length of the elastic is ∼ 10^{25} light years!

We can also do a continuous version: same but elastic is stretched continu-


ously with far end moving at 100cm/s. Length of elastic is l = 100t + 100
where t is time. So we get

dx x
= + 1.
dt t+1
Solution x = log(t + 1)(t + 1).
x log(t + 1)
=
l 100
100
x = l when ln(t + 1) = 100 ⇒ t = e − 1.

5.3 Fibonacci Numbers, Fn

Definition 6: They are defined by F1 = F2 = 1, Fn = Fn−1 + Fn−2 . This


defines Fn for all n ∈ Z.

n −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7
Fn 5 −3 2 −1 1 0 1 1 2 3 5 8 13

In general F−n = (−1)n−1 Fn (proof by (strong) induction on n).

Let

    φ = (√5 + 1)/2 ≈ 1.618

be a root of x^2 − x − 1 = 0. The other root is

    φ̂ = 1 − φ = −φ^{-1} ≈ −0.618 .

Then

    F_n = (φ^n − φ̂^n)/√5 = ⌊ φ^n/√5 + 1/2 ⌋ ,

which will be proved in Chapter 6.

There are lots of identities involving Fn .

Cassini’s Identity:

    F_{n+1} F_{n-1} − F_n^2 = (−1)^n

For example,

    n = 8:  34 × 13 − 21^2 = 1
    n = 6:  13 × 5 − 8^2 = 1

Proof by induction.

Another identity:
Fn+k = Fk Fn+1 + Fk−1 Fn , for all n, k ∈ Z
by easy induction on k. For example,
k = n , F2n = Fn (Fn+1 + Fn−1 ) , so Fn | F2n
k = 2n , F3n = F2n Fn+1 + F2n−1 Fn , so Fn | F3n
By easy induction Fn | Fkn for all k > 1. In fact hcf(Fm , Fn ) = Fhcf(m,n)
(harder).

Proposition 13 Let n ∈ N. Then n has a unique Fibonary representation


n = F k1 + F k2 + . . . + F kr ,
with ki > ki+1 + 1 and kr > 1.

For example, 100 = 89 + 8 + 3 = F_11 + F_6 + F_4

Lemma 14 If r > 1, then

Fr + Fr−2 + Fr−4 + . . . + F(3 or 2) < Fr+1

Proof
Induction on r

r = 2 : F 2 < F3
r = 3 : F 3 < F4

For r > 3, by induction, Fr−2 + . . . + F(3 or 2) < Fr−1 . So Fr + (Fr−2 + . . . +


F(3 or 2) ) < Fr + Fr−1 = Fr+1 . This proves the lemma. ¤

Proof of proposition
Existence: induction on n. For n = 1, 1 = F2 , so OK.

For n > 1, choose k1 , with Fk1 6 n < Fk1 +1 . Apply induction to n − Fk1 .
Since n − Fk1 < Fk1 +1 − Fk1 = Fk1 −1 . So we get k2 < k1 − 1 when we write
n − Fk1 = Fk2 + Fk3 + . . . + Fkn . Since F2 = F1 = 1, we never need to use F1 ,
so get kr > 1.

Uniqueness: to prove uniqueness let n = Fk1 +. . . +Fkr , ki > ki+1 +1, kr > 1.
k1 must be the largest possible such that

Fk1 6 n < Fk1 +1 .

Otherwise, by lemma, sum on right is less than n. Then apply induction to


n − Fk1 . The k2 , . . . , kr are uniquely determined. ¤

Example 30: What is the Fibonary expansion of 3Fn (n > 0)? We begin
by computing a few small cases.

Fibonary ex-
n Fn 3Fn pansion of 3Fn
0 0 0 0
1 1 3 F4
2 1 3 F4
3 2 6 F5 + F2
4 3 9 F6 + F2
5 5 15 F7 + F3

From the last two lines we guess

3Fn = Fn+2 − Fn−2 . (22)

This is true for all n ∈ Z and can easily be proved by induction. Now (22) is
the Fibonary expansion for 3Fn provided n > 4. For 0 6 n < 4 refer to the
table.
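
The existence proof is effectively a greedy algorithm: repeatedly subtract the largest Fibonacci number not exceeding what is left. An illustrative Python sketch:

def fibonary(n):
    # return indices k_1 > k_2 > ... with n = F_{k_1} + ... + F_{k_r}, k_r >= 2
    fib = [0, 1, 1]                     # fib[k] = F_k, with F_1 = F_2 = 1
    while fib[-1] <= n:
        fib.append(fib[-1] + fib[-2])
    indices = []
    k = len(fib) - 2
    while n > 0:
        while fib[k] > n:
            k -= 1
        indices.append(k)
        n -= fib[k]
        k -= 2                          # consecutive indices never occur
    return indices

print(fibonary(100))    # [11, 6, 4]:  100 = F_11 + F_6 + F_4 = 89 + 8 + 3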

Example 31: Game: two players A and B. A chooses n ∈ N, n > 1. Then
players take it in turn to subtract an integer m (which they choose each time)
from n. The player who first gets it to 0 wins. Rules:

1. B must not subtract n on first go.

2. You may never subtract more than 2m, where m was previous number
subtracted.

For example, when n = 12,

n B A B A B A B
−2 −3 −1 −1 −1 −1 −2
12 10 7 5 4 3 2 0

Here B wins.

Winning strategy: A should choose a Fibonacci Number. Otherwise B should


write
n = F k1 + . . . + F kr
as in the proposition and subtract Fkr .

For example, n = 12 , 12 = 8 + 3 + 1. B subtracts 1 to get 11. In the game


above A could have won. 10 = 8 + 2. A subtracts 2 to get Fibonacci number
8.

6 Generating Functions

6.1 Basic Manipulation

Let G = (g_0, g_1, g_2, . . .) be an infinite sequence, g_i ∈ C. Let g_n = 0 for n < 0.

The generating function of the sequence (g_i) is defined to be

    G(z) = \sum_{n ∈ Z} g_n z^n = g_0 + g_1 z + g_2 z^2 + g_3 z^3 + ⋯ .

We will nearly always treat this as a formal power series, and not concern
ourselves with convergence.

Basic Operations:

1. Linear sums: Let F = (f_n) and G = (g_n) be sequences and α, β ∈ R. Then

       (αF + βG)(z) = \sum_n (α f_n + β g_n) z^n .

2. Shifting: for m ∈ N,

       z^m G(z) = \sum_n g_n z^{n+m} = \sum_n g_{n-m} z^n .

3. Scalar multiplication:

       G(cz) = \sum_n g_n c^n z^n    (≠ c G(z))

4. Differentiation:

       G′(z) = \sum_n (n + 1) g_{n+1} z^n ,
       z G′(z) = \sum_n n g_n z^n .

5. Multiplication or convolution:

       F(z) G(z) = \sum_n ( \sum_k f_k g_{n-k} ) z^n
                 = f_0 g_0 + (f_0 g_1 + f_1 g_0) z + (f_0 g_2 + f_1 g_1 + f_2 g_0) z^2 + ⋯

6. Division: Since

       (1 − z)(1 + z + z^2 + z^3 + ⋯) = 1 ,

   we can write

       1/(1 − z) = \sum_{n ≥ 0} z^n .
In general
µX ¶ XµX ¶
G(z) n 2
= gn z (1 + z + z + . . .) = gk z n
1−z n n k6n

Dividing by (1 − z) replaces a sequence by the sequence of its partial


sums. For example, G = (1, 1, 1, 1, . . .),
1
G(z) =
1−z
1 X
= (n + 1)z n
(1 − z)2 n

is the generating function of (1, 2, 3, 4, . . .) (also get by differentiating


1/(1 − z))

7. The generating function of (1, 1!1 , 2!1 , 3!1 , . . .) is


X zn
ez = .
n>0
n!

The generating function of (0, 11 , 21 , 13 , . . .) is


µ ¶ X n
1 z
ln = .
1−z n>1
n

8. Strictly speaking, the above should be taken as definitions. They are


different from the things (definitions or theorems) you’ve seen in analy-
sis, however alike they seem. All results you know in other settings are
true and should, strictly speaking be proved, but we shan’t. To give
an idea what sort of properties we mean — there are many — here are
a few:

z m (z n G(z)) = z m+n G(z)


(F G)H = F (GH) for all GF’s F, G, H,
(F G)0 = F 0 G + F G0 for all GF’s F, G, (23)
d z
e = ez .
dz

Exercise. Prove (23).

6.2 Representations of sequences

There are many ways to define a sequence a0 , a1 , . . . of complex numbers


(usually natural numbers for us). Here are the ones we’re interested in,
illustrated by the example where an = 2n .

(1) Cardinality. Give a set Bn such that an = |Bn |. In our example,


Bn = {1, 2}n .

(2) Recursion. A formula which expresses each an in the previous terms.
For example an = 2an−1 or

an = an−1 + an−2 + · · · + a0 + 1.

(3) Equation
P for the GF. An equation involving the generating function
A(x) = n>0 an xn . We distinguish between the following.

(3a) Algebraic equation. For example

(1 − 2x)2 A(x)2 + (1 − 2x)A(x) = 2.

(3b) Differential equation. This is an equation involving at least one of


A0 (x), A00 (x), . . . (derivatives) and possibly A(x). For example
2 A(x)
A0 (x) = .
1 − 2x

(3c) Functional equation. Different arguments of the GF are combined


in one equation. For example

2 A(2x2 ) = A(x) + A(−x).

(4) Formula for GF. For example


1
A(x) = .
1 − 2x

(5) Partial fraction expansion for GF. For example (again)


1
A(x) = .
1 − 2x

(6) Formula for an . In our example, an = 2n .

Many problems are of the following sort: a sequence an is given by one of


the above methods. You’re then asked to translate it into a specified other
method. Our favourite line is

(1) → (2) → (3) → (4) → (5) → (6)

but anything else is also possible.

6.3 Partial Fractions

You’re supposed to know about partial fractions, but for all clarity we give
the main result and a few examples.

In this section we are interested in the steps

(4) Formula for GF → (5) Partial fraction → (6) Formula for an

By (4) we are given a direct formula for a GF. We suppose it’s a rational
function (a quotient of two polynomials) because only then (5) makes sense.

By C[z] we denote the set of polynomials a0 + a1 z + · · · + an z n in one variable


z. The degree of this polynomial is n, if an 6= 0. The degree of the zero
polynomial is −1. In C[z], we can add, substract and multiply; but the
quotient of two polynomials is not always again a polynomial.

By C(z) we shall denote the set of rational functions, that is, quotients of
polynomials.

In C(z) one can add, substract, multiply and divide by nonzero elements.
In particular, it is a vector space over C. The following proposition gives a
basis.

Proposition 15 Partial Fractions:

(a) C(z) has the basis


n ¯ o n ¯ o
z k ¯ 0 6 k ∪ (z − α)−k ¯ α ∈ C, 0 < k .
¯ ¯

(b) Let q ∈ C[z] be a nonzero polynomial, and let d > 0. Let V = V (q, d) ⊂
C(z) be the set of rational functions whose denominator is q (or a
divisor of it), and such that the degree of the numerator is smaller than
d plus the degree of the denominator. Here is a basis for the complex
vector space V (q, d), which is indeed a subset of the basis of (a):
n ¯ o n ¯ o
z k ¯ 0 6 k < d ∪ (z − α)−k ¯ 0 < k, (z − α)−k q ∈ C[z] .
¯ ¯

Expressing a given rational function in its partial fraction simply means writ-
ing it as a linear combination of the particular basis given in part (a) of the
above proposition. In practice, one calculates a partial fractions expansion
by choosing q and d such that the involved rational function is in V (q, d) –
usually one immediately knows such q and d. Then one needs only consider
the finite basis of V (q, d) given in part (b).

Example 32: Find the partial fraction expansion of

    G(x) = −25x / ((1 − 2x)^2 (1 + 3x)) .

Solution. By Proposition 15 the answer must be of the form

    −25x / ((1 − 2x)^2 (1 + 3x)) = a/(1 − 2x) + b/(1 − 2x)^2 + c/(1 + 3x)

where a, b, c are constants yet to be found. On multiplying both sides by the denominator (1 − 2x)^2 (1 + 3x) we find

    −25x = a(1 − 2x)(1 + 3x) + b(1 + 3x) + c(1 − 2x)^2
         = a(1 + x − 6x^2) + b(1 + 3x) + c(1 − 4x + 4x^2)
         = (a + b + c) + (a + 3b − 4c)x + (−6a + 4c)x^2

so

    a + b + c = 0
    a + 3b − 4c = −25
    −6a + 4c = 0 .

You know how to solve this, and you find (a, b, c) = (2, −5, 3), so

    G(x) = 2/(1 − 2x) − 5/(1 − 2x)^2 + 3/(1 + 3x) .        (24)

The second and last step is to turn partial fraction expansions into a direct
formula for the coefficients. This is aided by the following formula.

Proposition 16 Let n ≥ 0. Then

    1/(1 − x)^n = \sum_{k ≥ 0} \binom{n + k − 1}{k} x^k .

Proof. We have

    (1 − x)^{-n} = \sum_{k ≥ 0} \binom{−n}{k} (−x)^k = \sum_{k ≥ 0} \binom{n + k − 1}{k} x^k .

The first identity is the Binomial Theorem 6, the second identity is upper negation (14). ∎

Example 33: Compute the coefficients in the GF (24).

Solution. By Proposition 16 we have

    G(x) = 2/(1 − 2x) − 5/(1 − 2x)^2 + 3/(1 + 3x)
         = \sum_{k ≥ 0} ( 2·(2x)^k − 5(k + 1)(2x)^k + 3(−3x)^k ) ,

so the coefficient of x^k is

    2^{k+1} − 5(k + 1) 2^k − (−3)^{k+1} .
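
The closed form for the coefficients can be confirmed by expanding G(x) as a power series directly; the sketch below divides the numerator by the denominator with exact rational arithmetic.

from fractions import Fraction

def series(num, den, N):
    # power-series coefficients of num(x)/den(x) up to degree N;
    # num, den are coefficient lists [c0, c1, ...] with den[0] != 0
    num = num + [0] * (N + 1 - len(num))
    out = []
    for n in range(N + 1):
        c = Fraction(num[n])
        for k in range(1, min(n, len(den) - 1) + 1):
            c -= den[k] * out[n - k]
        out.append(c / den[0])
    return out

den = [1, -1, -8, 12]        # (1 - 2x)^2 (1 + 3x) = 1 - x - 8x^2 + 12x^3
coeffs = series([0, -25], den, 10)
closed = [2**(k + 1) - 5 * (k + 1) * 2**k - (-3)**(k + 1) for k in range(11)]
assert coeffs == closed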

Remark. This remark does not belong to the syllabus, but you may find it
helpful because it explains what power series of rational functions look like,
and in practice many generating functions are rational functions. Let G(z)
be the generating function of (g0 , g1 , . . .). Then the following can be shown
to be equivalent:

1. G(z) is a rational function.

2. There is a recurrence a0 gn + a1 gn−1 + . . . + ak gn−k = 0 (ai ∈ C, a0 6= 0),
which is true for all n big enough.

3. There are nonzero ci ∈ C (i in some finite set I) and polynomials


pi (x) ∈ C[x] such that for n big enough, gn is given by
X
gn = pi (n) cni .
i∈I

Exercise: Prove this, using Proposition 15 on partial fractions.


n
X (−1)k k
Exercise. Use partial fractions to evaluate the sum .
k=1
4k 2 − 1

6.4 Solving Recurrences

The examples in this subsection have in common that the recursion formula
for the sequence (gn )n is linear inhomogeneous, that is, of the form

gn = linear combination of the previous ones + cn

where cn is known (and simple).

The procedure for finding a formula for gn from the recursion formula is as
follows.

Step 1. Write down a single equation for a recurrence, true for all n.

Step 2. Multiply by z n and sum over n. This gives an equation in G(z).

Step 3. Solve the equation to get an expression for G(z).

Step 4. Solution gn is coefficient of z n in G(z).

Example 34: Fibonacci Numbers:

g0 = 0 , g1 = 1 , gn = gn−1 + gn−2

But now we must let gn = 0 for all n < 0.

Step 1:
gn = gn−1 + gn−2 , for n > 2 (25)
What happens for n < 2?

n = 1, 1 = g1 = g0 + g−1 + 1
n = 0, 0 = g0 = g−1 + g−2

In fact, (25) is true for all n 6 0. Wrong only for n = 1. Single equation is

gn = gn−1 + gn−2 + [n = 1] , for all n ∈ Z .

Step 2:

    g_n z^n = g_{n-1} z^n + g_{n-2} z^n + [n = 1] z^n .

Sum over n ∈ Z to get

    G(z) = z G(z) + z^2 G(z) + z .

Step 3:

    G(z) = z/(1 − z − z^2) .

Now factorise the denominator and use partial fractions. Note: if x^2 + ax + b = 0 has roots α, β then 1 + az + bz^2 = (1 − αz)(1 − βz). In this case, the roots of x^2 − x − 1 = 0 are φ and φ̂, where φ = (1 + √5)/2. So 1 − z − z^2 = (1 − φz)(1 − φ̂z). By partial fractions,

    z/((1 − φz)(1 − φ̂z)) = (1/√5) ( 1/(1 − φz) − 1/(1 − φ̂z) )

since √5 = φ − φ̂.

Step 4: The coefficient of z^n in 1/(1 − φz) is φ^n for all n ≥ 0. So g_n = (φ^n − φ̂^n)/√5.
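
Numerically, this formula agrees with the recurrence; a quick floating-point check (rounding to the nearest integer) for the first 30 values:

from math import sqrt

phi = (1 + sqrt(5)) / 2
phihat = 1 - phi

def fib_binet(n):
    return round((phi**n - phihat**n) / sqrt(5))

fibs = [0, 1]
for n in range(2, 30):
    fibs.append(fibs[-1] + fibs[-2])

assert all(fib_binet(n) == fibs[n] for n in range(30))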

Example 35: g0 = g1 = 1

gn = gn−1 + 2gn−2 + (−1)n (n > 2)

n 0 1 2 3 4 5 6 7
gn 1 1 4 5 14 23 52 97

Step 1:

n = 1, 1 = g1 = g0 + 2g−1 + (−1)n + 1
n = 0, 1 = g0 = g−1 + 2g−2 + (−1)n
n < 0, gn = gn−1 + 2gn−2

So single equation:

gn = gn−1 + 2gn−2 + (−1)n [n > 0] + [n = 1] for all n ∈ Z

Step 2: X
G(z) = zG(z) + 2z 2 G(z) + (−1)n z n + z
n>0

Step 3: Note that
X 1
(−1)n z n = .
n>0
1+z

1 + z(1 + z)
G(z) =
(1 + z)(1 − z − 2z 2 )
1 + z + z2
=
(1 − 2z)(1 + z)2
A B C
= + +
1 − 2z (1 + z) (1 + z)2

Solve for A, B, C by comparing coefficients to get A = 7/9 , B = −1/9 , C =


1/3.
7 1 1
G(z) = − + .
9(1 − 2z) 9(1 + z) 3(1 + z)2

Step 4: Expressions for each of the three terms can easily be obtained from
the Binomial Theorem. Adding them, we find that the coefficient of z^n is

    (7/9) 2^n − (1/9)(−1)^n + ((n + 1)/3)(−1)^n = (7 · 2^n)/9 + (n/3 + 2/9)(−1)^n .

For example, with n = 4: 7 × 16/9 + 14/9 = 14.
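The same kind of check for this example (again my own sketch, using exact rational arithmetic):

    # Sketch: compare g_n from the recurrence with (7/9)2^n + (n/3 + 2/9)(-1)^n.
    from fractions import Fraction as F

    def g_recurrence(N):
        g = [1, 1]
        for n in range(2, N):
            g.append(g[n-1] + 2*g[n-2] + (-1)**n)
        return g[:N]

    def g_closed(N):
        return [F(7, 9)*2**n + (F(n, 3) + F(2, 9))*(-1)**n for n in range(N)]

    assert g_recurrence(12) == g_closed(12)   # Fractions compare equal to ints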

Example 36:

Part 1. How many ways can you tile a 2 × n rectangle with n dominoes
(1 × 2 rectangles)? Call this number tn.

Small cases: t0 = 1, t1 = 1, t2 = 2, t3 = 3.

For the recurrence, look at the leftmost column: it is covered either by a single
vertical domino, leaving a 2 × (n − 1) rectangle (tn−1 ways of filling the
remainder), or by two horizontal dominoes, leaving a 2 × (n − 2) rectangle
(tn−2 ways of filling the remainder). So

    tn = tn−1 + tn−2  ⇒  tn = Fn+1 .

Part 2. Let un be the number of ways of tiling a 3 × n rectangle with 1 × 2
dominoes. u0 = 1, u1 = 0; in fact un = 0 for n odd since 3n is odd.

    u2 = 3.

The pattern at the left-hand end starts in one of the following 3 ways: (1) a
vertical domino covering the top two squares of the first column together with a
horizontal domino in the bottom row; (2) three horizontal dominoes filling the
first two columns completely; (3) the mirror image of (1).

Let vn−1 be the number of ways of completing a tiling of the 3 × n rectangle
after starting on the left in the first (or, symmetrically, the third) of these
ways. That is, vn−1 is the number of ways of tiling a 3 × (n − 1) rectangle with
one corner square removed.

Then un = un−2 + 2vn−1 (with u0 = 1, u1 = 0).

Moreover, as the corresponding figure shows (the two squares of the incomplete
column are covered either by a vertical domino, giving un−2, or by two horizontal
dominoes, giving vn−3), we get vn−1 = un−2 + vn−3, i.e.

    vn = un−1 + vn−2 ,  for n ≥ 2.

Evidently v0 = 0, v1 = 1.

Let U(z), V(z) be the generating functions of (un), (vn).

Step 1: Equations valid for all n:


un = un−2 + 2vn−1 + [n = 0]
vn = un−1 + vn−2

Step 2:
    U(z) = z²U(z) + 2zV(z) + 1                                  (26)
    V(z) = zU(z) + z²V(z)                                       (27)

Step 3: (27) gives

    V(z) = zU(z)/(1 − z²) .

Substitute into (26):

    U(z) = 2z²U(z)/(1 − z²) + z²U(z) + 1

    U(z) = (1 − z²)/(1 − 4z² + z⁴) = (1 − w)/(1 − 4w + w²) ,   where w = z².

The roots of x² − 4x + 1 are 2 ± √3. So

    U(z) = (1 − w)/((1 − (2 + √3)w)(1 − (2 − √3)w))
         = ((3 + √3)/6) · 1/(1 − (2 + √3)w) + ((3 − √3)/6) · 1/(1 − (2 − √3)w)

(by partial fractions). Note that (3 + √3)/6 = 1/(3 − √3).

Step 4: The coefficient of w^n (that is, of z^{2n}) gives

    u_{2n} = (2 + √3)^n/(3 − √3) + (2 − √3)^n/(3 + √3) ,   and u_{2n+1} = 0.
n 0 1 2 3 4 5 6 7 8
un 1 0 3 0 11 0 41 0 153
vn 0 1 0 4 0 15 0 56 0
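A numerical check of the closed form against the joint recurrence un = un−2 + 2vn−1, vn = un−1 + vn−2 (my own sketch):

    # Sketch: 3 x n domino tilings via the recurrence, compared with the closed form.
    from math import sqrt, isclose

    def u_by_recurrence(N):
        u = [1, 0]        # u0 = 1, u1 = 0
        v = [0, 1]        # v0 = 0, v1 = 1
        for n in range(2, N):
            u.append(u[n-2] + 2*v[n-1])
            v.append(u[n-1] + v[n-2])
        return u[:N]

    def u_closed(n2):
        # u_{2n} = (2+sqrt(3))^n/(3-sqrt(3)) + (2-sqrt(3))^n/(3+sqrt(3))
        r3 = sqrt(3)
        n = n2 // 2
        return (2 + r3)**n / (3 - r3) + (2 - r3)**n / (3 + r3)

    u = u_by_recurrence(14)
    for m in range(0, 14, 2):
        assert isclose(u[m], u_closed(m))
    assert all(u[m] == 0 for m in range(1, 14, 2))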

Example 37: Let pn denote the number of ways to tile a 2 × n rectangle


by dominoes and hooks. Find a recurrence formula for pn .

domino hook

Solution. We let a picture of a region stand for a number, namely the number of
ways to tile that region by dominoes and hooks (or the number of ways to finish
the tiling if a tiling has already begun). In words: pn is the number of tilings
of the 2 × n rectangle, and we also define qn to be the number of tilings of a
2 × n rectangle with one corner square removed.

By considering all possibilities to cover the top left box (a vertical domino
gives pn−1; a horizontal domino forces a second horizontal domino below it and
gives pn−2; and two hook placements each leave the shape counted by qn−1), we find

    pn = pn−1 + pn−2 + 2qn−1 .                                  (28)

Also, covering the single square in the shorter column of the qn-shape by a
horizontal domino or by a hook gives

    qn = qn−1 + pn−2 .                                          (29)

Equation (28) helps us express qn in terms of the pi:

    2qn−1 = pn − pn−1 − pn−2 .

Inserting this into (29) we find

    0 = 2qn − 2qn−1 − 2pn−2
      = (pn+1 − pn − pn−1) − (pn − pn−1 − pn−2) − 2pn−2
      = pn+1 − 2pn − pn−2 ,

that is, pn+1 = 2pn + pn−2.
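As a sanity check (mine, not from the notes), the derived recurrence pn+1 = 2pn + pn−2 can be compared with the joint recurrence in pn and qn; the initial values p0 = 1, p1 = 1, q0 = q1 = 0 used below are easy to verify by hand.

    # Sketch: tilings of 2 x n by dominoes and hooks.
    # Joint recurrence: p_n = p_{n-1} + p_{n-2} + 2 q_{n-1},  q_n = q_{n-1} + p_{n-2}
    # Derived recurrence: p_{n+1} = 2 p_n + p_{n-2}

    def p_joint(N):
        p, q = [1, 1], [0, 0]       # p0, p1 and q0, q1
        for n in range(2, N):
            p.append(p[n-1] + p[n-2] + 2*q[n-1])
            q.append(q[n-1] + p[n-2])
        return p[:N]

    def p_derived(N):
        p = [1, 1, 2]               # p0, p1, p2
        for n in range(2, N - 1):
            p.append(2*p[n] + p[n-2])
        return p[:N]

    print(p_joint(10))              # [1, 1, 2, 5, 11, 24, 53, 117, 258, 569]
    assert p_joint(10) == p_derived(10)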

Example 38: Given a product x0 x1 x2 . . . xn of n + 1 terms. How many
different ways can we bracket this? i.e. how many different ways can we
calculate it using n multiplications? Call it cn .

c0 = c 1 = 1

    n = 2:  (x0 x1)x2 or x0 (x1 x2)                              ⇒ c2 = 2
    n = 3:  x0 (x1 (x2 x3)), x0 ((x1 x2)x3), (x0 x1)(x2 x3),
            ((x0 x1)x2)x3, (x0 (x1 x2))x3                        ⇒ c3 = 5

To get a recurrence, suppose the final multiplication occurs between xk and xk+1:

    ( (x0 . . . )(. . . xk) ) × ( (xk+1 . . .)(. . . xn) ) .

There are ck ways of bracketing the first factor and cn−k−1 ways of bracketing
the second factor. So

    cn = Σ_{k=0}^{n−1} ck cn−k−1 = c0 cn−1 + c1 cn−2 + . . . + cn−1 c0 .

Step 1: For n = 0, c0 = 1 but the RHS is 0, so

    cn = Σ_k ck cn−1−k + [n = 0] ,   for all n ∈ Z .

Step 2: Let C(z) be the generating function of (cn). Then

    C(z) = Σ_n ( Σ_k ck cn−1−k ) z^n + 1 = z Σ_n ( Σ_k ck cn−1−k ) z^{n−1} + 1
         = z Σ_n ( Σ_k ck cn−k ) z^n + 1 = zC(z)² + 1

(we have used convolution of sequences, (5) in the Basic Operations, to get
the last line here).

Hence
    zC(z)² − C(z) + 1 = 0,
and

Step 3:
    C(z) = (1 ± √(1 − 4z)) / (2z) .

With the positive sign we would get a term 1/z in the expansion, so that cannot
be correct. So the negative sign must be correct.

    √(1 − 4z) = (1 − 4z)^{1/2}
              = Σ_{k≥0} (1/2 choose k) (−4z)^k ,                by the binomial theorem
              = 1 + Σ_{k≥1} (1/(2k)) (−1/2 choose k−1) (−4z)^k ,   by absorption.

So

    C(z) = (1 − √(1 − 4z)) / (2z)
         = Σ_{k≥1} (1/k) (−1/2 choose k−1) (−4z)^{k−1}
         = Σ_{k≥0} (−1/2 choose k) (−4z)^k / (k + 1) .

Note that

    (−1/2 choose k) = ( (−1/2)(−3/2)(−5/2) · · · (−(2k − 1)/2) ) / k!
                    = ((−1)^k / (2^k k!)) (1 × 3 × 5 × . . . × (2k − 1))
                    = ((−1)^k / (2^k k!)) · (2k)! / (2 × 4 × 6 × . . . × 2k)
                    = (−1)^k (2k)! / (2^k k! 2^k k!) = ((−1)^k/4^k) (2k choose k)
                    = (1/(−4)^k) (2k choose k) .

So
    C(z) = Σ_{k≥0} (2k choose k) z^k/(k + 1) .

Step 4:
    cn = (2n choose n) · 1/(n + 1) .
These are called the Catalan Numbers.

n 0 1 2 3 4 5 6 7 8 9
cn 1 1 2 5 14 42 132 429 1430 4862

    lim_{n→∞} cn+1/cn = 4

Another example of where these numbers arise: how many solutions are there of

    a1 + a2 + . . . + a2n = 0,   ai = ±1,
    a1 + a2 + . . . + aj ≥ 0,    0 ≤ j ≤ 2n ?

Call this number dn. Let S2k = a1 + a2 + . . . + a2k. Let k be maximal with
k < n and S2k = 0 (possibly k = 0). There are dk possibilities for a1, . . . , a2k,
we must have a2k+1 = 1 and a2n = −1, and there are dn−k−1 possibilities for
a2k+2, . . . , a2n−1. So we get the recurrence dn = Σ_k dk dn−k−1 (n ≥ 1). Since
this is the same recurrence as for (cn), and the initial values c1 = d1, c2 = d2
are the same, cn = dn for all n.
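A brute-force check (my own) that the ±1 sequences above are indeed counted by the Catalan numbers:

    # Sketch: count the +-1 sequences with nonnegative partial sums and total 0,
    # and compare with C(2n, n)/(n + 1).
    from itertools import product
    from math import comb

    def d_bruteforce(n):
        count = 0
        for a in product([1, -1], repeat=2*n):
            partial, ok = 0, True
            for x in a:
                partial += x
                if partial < 0:
                    ok = False
                    break
            if ok and partial == 0:
                count += 1
        return count

    def catalan(n):
        return comb(2*n, n) // (n + 1)

    for n in range(0, 8):
        assert d_bruteforce(n) == catalan(n)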

Example 39: Spanning trees.

To avoid all confusion, let us recall a few definitions about graphs. A graph
is a pair (V, E) where V is a set and E a subset of

    V^(2) = {{x, y} | x, y ∈ V, x ≠ y}.

The elements of V are the vertices, the elements of E the edges. In a picture,
the elements of V are depicted by points and an edge {x, y} by a line connect-
ing the vertices x and y. Thus, the graph ({1, 2, 3, 4}, {{1, 2}, {1, 3}, {1, 4}})
can be depicted as follows.
(Picture: the four vertices drawn as points, with vertex 1 joined by an edge to
each of 2, 3 and 4.)
A graph (V, E) is connected if any two of its vertices x, y can be connected
by a path

    x = z0, z1, . . . , zk = y,

which means that {zi, zi+1} is an edge for all i. This path is called a cycle if
z0, . . . , zk−1 are all different but zk = z0, and k ≥ 3.

A tree is a connected graph without cycles. A spanning tree of a graph (V, E)


is a tree (V, T ) with T ⊂ E.

(Picture: on the left, the almost-wheel G18, with a central vertex joined to the
boundary vertices v1, v2, . . . and consecutive boundary vertices joined except for
one missing boundary edge; on the right, one of its spanning trees.)


For integers n > 0, let Gn denote the almost wheel shown in the picture. It
has n + 1 vertices, one of which is depicted in the middle. Note that one of
the ‘edges’ on the boundary is missing. On the right hand side, we see one
of its spanning trees.

Let v1 , . . ., vn denote the vertices of Gn different from the origin, in this order
starting and ending at the missing edge.

Let tn denote the number of spanning trees of Gn . In order to find a formula


for tn , we begin with a recurrence formula.

The spanning tree in the picture is said to be of type 4; more generally, we


call a spanning tree of Gn of type k if {vi , vi+1 } is an edge for 1 6 i < k but
not for i = k. The result is that v1 , . . . , vk are connected along the boundary
with each other but not with vk+1 .

In order to count the number of type k spanning trees, note that they depend
on two things. Firstly, one among v1 , . . . , vk needs to be connected to the
origin: there are k choices for this. Secondly, all vertices except v1 , . . ., vk
form a smaller almost-wheel Gn−k. A spanning tree of this Gn−k must be chosen,
for which there are tn−k choices.

So there are k tn−k spanning trees of Gn of type k, and

    tn = Σ_{k=1}^n k tn−k = Σ_{k=0}^n k tn−k    if n ≥ 1.

Let us write

    T(z) = Σ_{n≥0} tn z^n = generating function of (tn).

Then

    T(z) = 1 + Σ_{n≥1} tn z^n = 1 + Σ_{n≥0} z^n Σ_{k=0}^n k tn−k
         = 1 + ( Σ_{k≥0} k z^k ) ( Σ_{ℓ≥0} tℓ z^ℓ ) = 1 + (z/(1 − z)²) T(z).

It is easy to solve this for T(z):

    T(z) = 1/(1 − z/(1 − z)²) = (1 − z)²/(1 − 3z + z²).
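As a check (not in the notes), the coefficients of (1 − z)²/(1 − 3z + z²) should agree with the recurrence tn = Σ k tn−k; a short sketch of my own:

    # Sketch: spanning trees of the almost-wheel G_n.
    # Recurrence: t_0 = 1, t_n = sum_{k=1}^{n} k * t_{n-k} for n >= 1.
    # Generating function: T(z) = (1 - z)^2 / (1 - 3z + z^2).
    from fractions import Fraction as F

    def t_recurrence(N):
        t = [1]
        for n in range(1, N):
            t.append(sum(k * t[n - k] for k in range(1, n + 1)))
        return t

    def t_from_gf(N):
        num = [1, -2, 1]            # (1 - z)^2
        den = [1, -3, 1]            # 1 - 3z + z^2
        c = []
        for n in range(N):
            s = F(num[n]) if n < len(num) else F(0)
            for j in range(1, min(n, len(den) - 1) + 1):
                s -= den[j] * c[n - j]
            c.append(s / den[0])
        return c

    print(t_recurrence(8))          # [1, 1, 3, 8, 21, 55, 144, 377]
    assert t_recurrence(12) == t_from_gf(12)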

Unfortunately it is beyond our scope to look at the so-called matrix-tree


theorem which gives the number of spanning trees of any graph as the de-
terminant of a matrix.

6.5 The simplest sort of differential equations

Let f be a function of one variable, which may be a function of a real number,


a function of a complex number, or a generating function. An ordinary
differential equation is an equation involving one of f 0 , f 00 , . . . (derivatives)
and possibly f . Usually the function f in a differential equation is unknown,
the aim being to say something about the functions solving it.

We will be interested in differential equations of the form

    f′(x) = a(x) f(x)                                           (30)

where a(x) is an (explicitly) given function, and one looks for solutions f .
Such equations are called linear homogeneous ordinary differential equations
of order 1.

Proposition 17 Let A(x) denote a primitive of a(x) (that is, a(x) = A′(x)).
Then all solutions of (30) are given by

    f(x) = c e^{A(x)}                                           (31)

for a constant c.

Warning. This proposition needs to be taken with a grain of salt. For


example, some generating functions are not the exponential of any generating
function.

Proof. It is straightforward to show that the functions (31) are solutions
to (30). We will now show that the converse also holds; the argument also gives
a way of recovering the proposition if you forget it.

Suppose f(x) satisfies (30). First we divide both sides by f(x):

    f′(x)/f(x) = a(x).

Then we note that a primitive of the left hand side is known:

    (log f(x))′ = a(x).

Now we use that A(x) is a primitive of a(x). Primitives are unique up to


additive constants so we may write

log f (x) = A(x) + b.

Taking the exponential of both sides gives (31) with c = eb . 2

A typical example of how to apply the proposition to generating functions is


example 42 on Bell numbers.

6.6 Exponential Generating Functions

The EGF (Exponential Generating Function) for the sequence (g0, g1, g2, . . .) is

    Ĝ(z) = Σ_{n≥0} gn z^n/n! .

For example (1, 1, 1, . . .) has EGF e^z, whereas it has GF (1 − z)^{−1}.

This is only a minor variation on the definition of ordinary Generating Func-
tion, but in some situations it is a lot more convenient. Some recurrences are
much easier to solve using EGF’s rather than GF’s.

Example 40:
    g0 = 0 ,   3gn = n gn−1 + 2 · n!    (n ≥ 1)

Step 1: n = 0, LHS = 0, RHS = 2. So

3gn = ngn−1 + 2n! − 2[n = 0]

Step 2: Multiply by z^n/n! and sum:

    3gn z^n/n! = gn−1 z^n/(n − 1)! + 2z^n − 2[n = 0] z^n/n! .

Sum over n for n ≥ 0:

    3Ĝ(z) = zĜ(z) + 2/(1 − z) − 2

    ⇒ Ĝ(z) = 2/((1 − z)(3 − z)) − 2/(3 − z)
            = 1/(1 − z) − 1/(3 − z) − 2/(3 − z)
            = 1/(1 − z) − 1/(1 − z/3) = Σ_n z^n − Σ_n z^n/3^n .

So gn = n! − n!/3^n .
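A quick check of this answer (my own sketch, with exact rational arithmetic):

    # Sketch: verify g_n = n! - n!/3^n against 3 g_n = n g_{n-1} + 2 n!  (g_0 = 0).
    from fractions import Fraction as F
    from math import factorial

    def g_recurrence(N):
        g = [F(0)]
        for n in range(1, N):
            g.append((n * g[n-1] + 2 * factorial(n)) / F(3))
        return g

    def g_closed(N):
        return [F(factorial(n)) - F(factorial(n), 3**n) for n in range(N)]

    assert g_recurrence(12) == g_closed(12)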

Binomial Convolution
Let F̂ (z) and Ĝ(z) be EGF’s of (f0 , f1 , . . .) and (g0 , g1 , . . .).

Let

    Ĥ(z) = F̂(z)Ĝ(z) = ( Σ_{i≥0} fi z^i/i! ) ( Σ_{j≥0} gj z^j/j! )
         = Σ_{n≥0} ( Σ_k fk gn−k/(k!(n − k)!) ) z^n = Σ_{n≥0} hn z^n/n!

where

    hn = Σ_k (n!/(k!(n − k)!)) fk gn−k = Σ_k (n choose k) fk gn−k .

We call the sequence (hn ) defined by this formula the binomial convolution
of the sequences (fn ) and (gn ). Thus, binomial convolution of sequences
corresponds to multiplication of their EGF’s.

Example 41: Bernoulli numbers.

Definition. The Bernoulli numbers Bn (n ≥ 0) are defined by

    z/(e^z − 1) = Σ_{n≥0} Bn z^n/n! .                           (32)

The denominator ez − 1 starts as z + 12 z 2 + · · · , which is why one puts a


z in the numerator (otherwise the sum on the right would have to start at
n = −1).
Exercise. Use binomial convolution to show Σ_{j=0}^m (m+1 choose j) Bj = [m = 0].

Exercise. Show that Bn = 0 if n > 2 is odd. Hint: let f (z) denote the
function defined in (32) and consider f (z) − f (−z).

Here are some values for the Bernoulli numbers.

    n    0    1     2    3     4    5    6     7     8    9    10
    Bn   1  −1/2   1/6   0  −1/30   0  1/42    0  −1/30   0   5/66

Definition. The Bernoulli polynomials Bn(x) (n ≥ 0) are defined by

    z e^{xz}/(e^z − 1) = Σ_{n≥0} Bn(x) z^n/n! .                 (33)

It is clear that Bn(0) = Bn. We will now show that Bn(x) is indeed a polynomial,
and we will express these polynomials in terms of the Bernoulli numbers. We have

    Σ_{n≥0} Bn(x) z^n/n! = z e^{xz}/(e^z − 1) = ( Σ_{k≥0} Bk z^k/k! ) ( Σ_{k≥0} x^k z^k/k! )

by (33), where the last equality is by (32) and the exponential series. By binomial
convolution we find

    Bn(x) = Σ_{k=0}^n (n choose k) Bk x^{n−k} .

Proposition 18
    Σ_{k=0}^n k^{m−1} = (Bm(n + 1) − Bm)/m .

Proof. We have

    Σ_{m≥0} ( Bm(x + 1) − Bm(x) ) z^m/m! = z e^{(x+1)z}/(e^z − 1) − z e^{xz}/(e^z − 1)     by (33)
        = z e^{xz}(e^z − 1)/(e^z − 1) = z e^{xz} = Σ_{m≥0} x^m z^{m+1}/m! = Σ_{m≥1} x^{m−1} z^m/(m − 1)! .

Taking coefficients of z^m on both sides gives

    Bm(x + 1) − Bm(x) = m x^{m−1} .

Write x = k and sum over k ∈ {0, 1, . . . , n}:

    m Σ_{k=0}^n k^{m−1} = Σ_{k=0}^n ( Bm(k + 1) − Bm(k) )
                        = Bm(n + 1) − Bm(0) = Bm(n + 1) − Bm .    2
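These identities are easy to test numerically. The sketch below (mine, not from the notes) computes Bernoulli numbers from the recurrence in the exercise above and then checks Proposition 18 for small m and n.

    # Sketch: Bernoulli numbers via sum_{j=0}^{m} C(m+1, j) B_j = [m = 0],
    # then check  sum_{k=0}^{n} k^(m-1) = (B_m(n+1) - B_m)/m.
    from fractions import Fraction as F
    from math import comb

    def bernoulli(M):
        B = [F(1)]
        for m in range(1, M):
            # C(m+1, m) B_m = -sum_{j<m} C(m+1, j) B_j
            s = sum(comb(m + 1, j) * B[j] for j in range(m))
            B.append(-s / (m + 1))
        return B

    def bernoulli_poly(m, x, B):
        return sum(comb(m, k) * B[k] * F(x)**(m - k) for k in range(m + 1))

    B = bernoulli(12)
    assert B[1] == F(-1, 2) and B[2] == F(1, 6) and B[4] == F(-1, 30)

    for m in range(1, 8):
        for n in range(0, 10):
            lhs = sum(F(k)**(m - 1) for k in range(n + 1))
            rhs = (bernoulli_poly(m, n + 1, B) - B[m]) / m
            assert lhs == rhs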

Example 42: Bell numbers.

The Bell number b(n) is the number of partitions of Xn. So we have

    b(n) = Σ_{k=0}^n {n k} ,

where {n k} is the Stirling number from Section 5.1 (the number of partitions
of Xn into k parts).

We have the following recursion:

    b(n + 1) = Σ_{k≥0} ( number of partitions of Xn+1 such that n + 1 is in a part of size k + 1 )
             = Σ_{k=0}^n (n choose k) b(n − k),    (n ≥ 0).

So if we put

    B(z) = Σ_{n≥0} b(n) z^n/n! = exponential GF of the Bell numbers

then we find, on differentiating,

    B′(z) = Σ_{n≥1} b(n) z^{n−1}/(n − 1)! = Σ_{n≥0} b(n + 1) z^n/n!
          = Σ_{n≥0} Σ_{k=0}^n (n choose k) b(k) z^n/n!
          = Σ_{n≥0} ( Σ_{k=0}^n b(k)/(k!(n − k)!) ) z^n
          = ( Σ_{k≥0} b(k) z^k/k! ) ( Σ_{ℓ≥0} z^ℓ/ℓ! )
          = B(z) e^z .

Therefore,

    d/dz log B(z) = B′(z)/B(z) = e^z ,    log B(z) = e^z + c,

and

    B(z) = e^{e^z + c}

for some constant c. Since e^{1+c} = B(0) = b(0) = 1 we must have c = −1, so

    B(z) = e^{e^z − 1} .

We thus find a closed formula for the exponential generating function of the
Bell numbers, but a closed formula for the Bell numbers is not possible.
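As a check (mine, not from the notes), the recurrence values of b(n) should equal n! times the series coefficients of e^{e^z − 1}; the sketch below computes those coefficients from the relation B′(z) = B(z)e^z.

    # Sketch: Bell numbers two ways -- (i) b(n+1) = sum C(n,k) b(n-k),
    # (ii) n! times the series coefficients of exp(e^z - 1), via B' = B e^z.
    from fractions import Fraction as F
    from math import comb, factorial

    def bell_recurrence(N):
        b = [1]
        for n in range(N - 1):
            b.append(sum(comb(n, k) * b[n - k] for k in range(n + 1)))
        return b

    def bell_from_egf(N):
        # (n+1) c_{n+1} = sum_{k=0}^{n} c_{n-k} / k!   (Cauchy product with e^z)
        c = [F(1)]
        for n in range(N - 1):
            c.append(sum(c[n - k] * F(1, factorial(k)) for k in range(n + 1)) / (n + 1))
        return [int(factorial(n) * c[n]) for n in range(N)]

    assert bell_recurrence(10) == bell_from_egf(10)   # 1, 1, 2, 5, 15, 52, 203, ...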

Exercise. Prove b(n) = e^{−1} Σ_{k≥0} k^n/k! .   (Ignore convergence questions.)

Example 43: Higher Derangements.

In section 4.5 we considered derangements: a derangement of Xn = {1, . . . , n}


is a permutation of Xn without fixed points. We found the following formula
for the number D(n) of derangements of Xn :
    D(n) = n! Σ_{ℓ=0}^n (−1)^ℓ/ℓ! .

A k-derangement of Xn is a permutation g of Xn all of whose orbits have more


than k elements (recall that a g-orbit is a set of the form {g m (x) | m ∈ Z}
where x ∈ Xn ). Let D(n, k) denote the number of k-derangements of Xn .

So a 1-derangement is just a derangement, and D(n, 1) = D(n).

We will find a formula for the exponential generating function

    f(x) := Σ_{n≥0} D(n, k) x^n/n! .

Let E(n + 1, k, p) denote the set of k-derangements in Sn+1 such that n + 1
is in a (p + 1)-cycle. So

    D(n + 1, k) = Σ_{p≥k} #E(n + 1, k, p).                      (34)

We compute #E(n + 1, k, p). One chooses an element g of E(n + 1, k, p) in
three steps.

Step 1. Choose the orbit of n + 1, that is, {g^ℓ(n + 1) : ℓ ∈ Z}. There are
(n choose p) choices because the orbit has p + 1 elements.

Step 2. Choose the cycle of n + 1. There are p! choices.

Step 3. Finish g. There are (n + 1) − (p + 1) = n − p elements left to permute
so the number of choices is D(n − p, k).

We are dealing with a homogeneous tree so

    #E(n + 1, k, p) = (n choose p) p! D(n − p, k) = n! D(n − p, k)/(n − p)!

and by (34)

    D(n + 1, k) = Σ_{p=k}^n n! D(n − p, k)/(n − p)! .           (35)

We find

    f′(x) = Σ_{n≥1} D(n, k) x^{n−1}/(n − 1)! = Σ_{n≥0} D(n + 1, k) x^n/n!
          = Σ_{n≥k} D(n + 1, k) x^n/n!                  because the other terms are zero
          = Σ_{n≥k} (x^n/n!) Σ_{p=k}^n n! D(n − p, k)/(n − p)!          by (35)
          = Σ_{p≥k} Σ_{m≥0} x^{m+p} D(m, k)/m!
          = ( Σ_{p≥k} x^p ) ( Σ_{m≥0} D(m, k) x^m/m! ) = (x^k/(1 − x)) f(x).

In section 6.5 we have learned how to solve a differential equation like the
one here, f′(x) = x^k (1 − x)^{−1} f(x). We find

    f′/f = x^k/(1 − x),
    (log f)′ = x^k + x^{k+1} + · · · = 1/(1 − x) − (1 + x + x² + · · · + x^{k−1}),
    log f = −log(1 − x) − (x + x²/2 + x³/3 + · · · + x^k/k) + c    (and c = 0),

    f(x) = e^{−(x + x²/2 + x³/3 + · · · + x^k/k)} / (1 − x) .
A closed formula for D(n, k) is not possible.
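A numerical check (mine) of this generating function: compute D(n, k) from the recurrence (35) and compare with n! times the series coefficients of the formula just found. The initial values D(0, k) = 1 and D(n, k) = 0 for 1 ≤ n ≤ k are immediate from the definition.

    # Sketch: k-derangements D(n, k) from recurrence (35), compared with n! times
    # the coefficients of exp(-(x + x^2/2 + ... + x^k/k)) / (1 - x).
    from fractions import Fraction as F
    from math import factorial

    def D_recurrence(k, N):
        D = [1] + [0] * k                     # D(0,k)=1, D(n,k)=0 for 1 <= n <= k
        for n in range(k, N - 1):
            D.append(sum(factorial(n) * D[n - p] // factorial(n - p)
                         for p in range(k, n + 1)))
        return D[:N]

    def D_from_egf(k, N):
        g = [F(0)] + [F(-1, j) for j in range(1, k + 1)] + [F(0)] * N
        e = [F(1)]                            # e = exp(g), computed via e' = g' e
        for n in range(N - 1):
            e.append(sum((j + 1) * g[j + 1] * e[n - j] for j in range(n + 1)) / (n + 1))
        f = [sum(e[:n + 1], F(0)) for n in range(N)]   # multiply by 1/(1-x)
        return [int(factorial(n) * f[n]) for n in range(N)]

    for k in (1, 2, 3):
        assert D_recurrence(k, 10) == D_from_egf(k, 10)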

Exercise. Show that


    lim_{n→∞} D(n, k)/n!

exists and compute it. (In this exercise, you may use, without proving it,
that there are a0 , a1 , . . . ∈ C such that
    e^{−(x + x²/2 + x³/3 + · · · + x^k/k)} = Σ_{n≥0} an x^n

for any complex number x, in the sense that the right hand side converges to
the left hand side. By the way, if you know a little complex function theory
then you can prove this by observing that the left hand side is a holomorphic
function in x on the complex plane.)

6.7 Generating functions in more variables

Generating functions in more variables exist too.

Example 44: Define e^x to be the formal power series

    e^x = Σ_{n≥0} x^n/n! .

Our aim is to prove

    e^x e^y = e^{x+y} .
We don’t allow ourselves to use any similar looking result from analysis.
Since the assertion is about formal power series, we don’t need to consider
convergence questions.

The proof uses the binomial theorem 6 and goes as follows:

    e^{x+y} = Σ_{n≥0} (x + y)^n/n! = Σ_{n≥0} Σ_{k=0}^n (n choose k) x^k y^{n−k}/n!
            = Σ_{n≥0} Σ_{k=0}^n (n!/(k!(n − k)!)) x^k y^{n−k}/n!
            = Σ_{n≥0} Σ_{k=0}^n x^k y^{n−k}/(k!(n − k)!)
            = ( Σ_{k≥0} x^k/k! ) ( Σ_{ℓ≥0} y^ℓ/ℓ! ) = e^x e^y .

Exercise. (a) Prove:  Σ_{k,ℓ≥0} (k+ℓ choose k) x^k y^ℓ = 1/(1 − x − y).

(b) Put an := Σ_{m=0}^n (n−m choose m). Use (a) to compute A(t) = Σ_{n≥0} an t^n.

Exercise. Prove

    Σ_{n1,...,nk ≥ 0} min(n1, . . . , nk) x1^{n1} · · · xk^{nk}
        = x1 · · · xk / ((1 − x1) · · · (1 − xk)(1 − x1 · · · xk)) .

7 Discrete Probability

7.1 Sample Spaces and Random Variables

A discrete sample space is a set Ω together with a function P : Ω → [0, 1]
(the probability function) such that Σ_{ω∈Ω} P(ω) = 1. In most examples such
a set is finite or countable; indeed, from the fact that Σ_{ω∈Ω} P(ω) converges
to a finite sum, it follows that P(ω) = 0 for all but countably many ω
(Exercise: prove this. It's not hard: for each n, for how many ω can we
have P(ω) > 1/n?)

The space Ω is uniform if P is constant.

A subset A of Ω is called an event. We define

    P(A) = Σ_{ω∈A} P(ω).

Example 45: Ω = {1, 2, 3, 4, 5, 6}, P(ω) = 1/6 for all ω ∈ Ω (for example,
throwing a die).

    A = {5, 6},   P(A) = 1/6 + 1/6 = 1/3.

If A, B ⊆ Ω and A ∩ B = ∅ then P (A ∪ B) = P (A) + P (B). More generally,

P (A ∪ B) = P (A) + P (B) − P (A ∩ B)

(Exercise).

A random variable (RV), X, is a function defined on a sample space Ω.
Usually, X : Ω → R. For x ∈ Range(X) we have

    P(X = x) = Σ_{ω∈Ω, X(ω)=x} P(ω).

Example 46: Ω = {(i, j) | i, j ∈ [1 . . . 6]}, P(ω) = 1/36 for all ω ∈ Ω
(throwing the die twice). Then

S1 : (i, j) → i
S2 : (i, j) → j
S : (i, j) → i + j

are all random variables defined on Ω and S = S1 + S2 .

    x          2     3     4     5     6     7     8     9    10    11    12
    P(S = x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Range(S) = [2 . . . 12]

Two random variables X : Ω → T1 and Y : Ω → T2 are called independent if

P (X = x and Y = y) = P (X = x)P (Y = y)

for all x ∈ T1 and y ∈ T2 . So in the above example S1 and S2 are independent


but S1 and S1 + S2 are not independent.

We define the mean or expected value E(X) of the RV X to be

    Σ_{x∈X(Ω)} x P(X = x) = Σ_{ω∈Ω} X(ω) P(ω).

For example, for S : (i, j) → i + j as above,

    E(S) = 2 × 1/36 + 3 × 2/36 + . . . + 12 × 1/36 = 7.
In general,

    E(X + Y) = E(X) + E(Y),

and

    E(S1) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 7/2.

So

    E(S) = E(S1) + E(S2) = 7/2 + 7/2 = 7.
What about E(XY)? If X and Y are independent, then we claim that
E(XY) = E(X)E(Y). In the following proof, we use the fact that Ω is
partitioned into disjoint sets {ω ∈ Ω : X(ω) = x, Y(ω) = y} as x and y vary
in X(Ω) and Y(Ω).

    E(XY) = Σ_{ω∈Ω} X(ω)Y(ω)P(ω)
          = Σ_{x∈X(Ω), y∈Y(Ω)} ( Σ_{X(ω)=x, Y(ω)=y} x y P(ω) )
          = Σ_{x∈X(Ω), y∈Y(Ω)} x y P(X = x, Y = y)
          = Σ_{x∈X(Ω), y∈Y(Ω)} x y P(X = x) P(Y = y)
          = ( Σ_{x∈X(Ω)} x P(X = x) ) ( Σ_{y∈Y(Ω)} y P(Y = y) )
          = E(X)E(Y).

For example E(S1 S2 ) = 49/4. We often use µ, or µ(X), to denote the


expected value E(X).
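These dice computations are easy to confirm by enumerating the 36 outcomes (a sketch of my own):

    # Sketch: brute-force check of E(S) = 7, E(S1 S2) = 49/4 and the table of P(S = x).
    from fractions import Fraction as F
    from collections import Counter

    omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
    P = F(1, 36)

    E_S    = sum((i + j) * P for i, j in omega)
    E_S1S2 = sum(i * j * P for i, j in omega)
    dist   = Counter(i + j for i, j in omega)     # 36 * P(S = x)

    assert E_S == 7
    assert E_S1S2 == F(49, 4)
    assert dist[2] == 1 and dist[7] == 6 and dist[12] == 1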

The variance V (X) of X is defined to be E((X − µ)2 ); it is a measure of the


deviation of X from the mean (of the spread of distribution). This has nicer
mathematical properties than other possibilities such as E(|X − µ|)

The standard deviation σ(X) is √V(X).

V (X) = E((X − µ)2 )


= E(X 2 − 2Xµ + µ2 )
= E(X 2 ) − 2µE(X) + µ2
= E(X 2 ) − µ2
= E(X 2 ) − E(X)2

Let X and Y be two independent RVs on Ω.

V (X + Y ) = E((X + Y )2 ) − E(X + Y )2
= E(X 2 ) + E(Y 2 ) + 2E(XY ) − E(X)2 − E(Y )2 − 2E(X)E(Y )

But
E(XY ) = E(X)E(Y )
by independence. So

V (X + Y ) = V (X) + V (Y ) .

Note that
    V(X + X) = V(2X) = 4V(X),
so we do need independence. In the dice example

    V(S1) = (1² + 2² + . . . + 6²)/6 − 49/4 = 35/12,
    V(S) = V(S1 + S2) = 35/6.
Choose any α > 0. Then

    V(X) = Σ_{ω∈Ω} (X(ω) − µ)² P(ω)
         ≥ Σ_{ω∈Ω, (X(ω)−µ)² ≥ α} (X(ω) − µ)² P(ω)
         ≥ Σ_{ω∈Ω, (X(ω)−µ)² ≥ α} α P(ω)
         = α P((X − µ)² ≥ α).

Define c by

    α = c²σ²    (where σ = √V(X) is the standard deviation of X).

Then, provided α > 0,

    P((X − µ)² ≥ α) ≤ V(X)/α
    ⇒ P(|X − µ| ≥ cσ) ≤ 1/c².                                   (36)

So the probability that the difference from the mean is greater than or equal
to c times the standard deviation, is less than or equal to 1/c2 . This is true
for any RV on any sample space. In specific cases, we get a stronger result.
For example, in the normal distribution
P (|X − µ| > 2σ) ∼ 0.05
We next discuss the notion of multiple independent instances of a single
random variable X. This models the idea of repeated independent trials, e.g.
n independent dice throws. Suppose Ω1 is a sample space, with probability
measure P1 : Ω1 → [0, 1]. Define Ωn = Ω1 × · · · × Ω1 (n times). We define a
probability measure Pn : Ωn → [0, 1] by

Pn (ω1 , . . . , ωn ) = P1 (ω1 ) × · · · × P1 (ωn ).

Exercise: Show that this is a probability measure, i.e. that


    Σ_{(ω1,...,ωn)∈Ωn} Pn(ω1, . . . , ωn) = 1.

More generally, show that if Ai ⊂ Ω1 for i = 1, . . . , n then


Pn (A1 × · · · × An ) = P1 (A1 ) × · · · × P1 (An ).

We get our “n independent instances of X”, which we now call X1 , . . . , Xn ,


by defining
Xi (ω1 , . . . , ωn ) = X(ωi ).
Exercise: Show that the random variables Xi and Xj are independent for
i 6= j.

It is more or less obvious that


Pn (Xi = x) = P1 (X = x);
for the set
{(ω1 , . . . , ωn ) : Xi (ω1 , . . . , ωn ) = x}
is equal to
Ω1 × · · · × Ω1 × {ωi : X(ωi ) = x} × Ω1 × · · · × Ω1 .

Given n independent instances of X, say X1, X2, . . . , Xn, let S = X1 + · · · + Xn.
Then

    E(S) = nE(X) = nµ,
    V(S) = nV(X),
    σ(S) = √n σ,
    E(S/n) = µ,
    σ(S/n) = σ/√n.

The spread of averages of n instances is smaller than the spread of single
instances:

    P( |S/n − µ| ≥ cσ/√n ) ≤ 1/c².

Or, putting d = c/√n,

    P( |S/n − µ| ≥ dσ ) ≤ 1/(nd²).

7.2 Probability Generating Functions (PGF’s)

Let Ω be a sample space and X a RV taking values in N0 = N ∪ {0}. Let

    gk = P(X = k),   for k ≥ 0.

Then the PGF of the RV X is just the generating function of the sequence (gk):

    GX(z) = Σ_{k≥0} gk z^k .

Since

    Σ_{k≥0} gk = 1

(for the subsets {X = k} partition the sample space Ω), we must have
GX(1) = 1; in particular the series defining GX(z) converges at z = 1. Moreover,

    E(X) = Σ_k k P(X = k) = G′X(1),   where G′X is the derivative of GX.

This series is not always convergent, that is, E(X) can be infinite:

Example 47: we toss a coin until we get tails, and if we get k heads before
the tail, then you pay me 2^k pennies. The sample space Ω is

    Ω = {t, ht, hht, hhht, . . .}.

Denote h · · · ht (k heads followed by a tail) by h^k t. Provided the coin is
fair, heads and tails each have probability 1/2; we assume the outcomes of
successive tosses are independent, so the probability measure P : Ω → [0, 1]
is given by P(h^k t) = (1/2)^{k+1}. The random variable X is X(h^k t) = 2^k, so

    E(X) = Σ_{k≥0} 2^k/2^{k+1} = Σ_{k≥0} 1/2 = ∞.

For any random variable X,

    E(X²) = Σ_{k≥0} k² P(X = k) = Σ_{k≥0} (k(k − 1) + k) P(X = k)
          = G″X(1) + G′X(1),

    V(X) = E(X²) − E(X)² = G″X(1) + G′X(1) − G′X(1)².

Example 48: A random variable Un : Ω → {0, . . . , n − 1} ⊂ N0 with uniform
probability distribution: Un takes each of the values 0, 1, . . . , n − 1 with
probability 1/n. The PGF is

    GUn(z) = (1 + z + z² + . . . + z^{n−1})/n = (1 − z^n)/(n(1 − z)),   if z ≠ 1,

which is unfortunate, since to find E(Un) and V(Un) we want to be able to
evaluate G′Un and G″Un at z = 1! To get round this, use Taylor's Theorem
around z = 1:

    GUn(1 + z) = GUn(1) + G′Un(1) z + G″Un(1) z²/2 + . . .

We get

    GUn(1 + z) = (1 − (1 + z)^n)/(−nz) = ((1 + z)^n − 1)/(nz)
               = (1/(nz)) ( nz + (n choose 2) z² + (n choose 3) z³ + . . . )
               = 1 + ((n − 1)/2) z + ((n − 1)(n − 2)/6) z² + . . .

So

    GUn(1) = 1,
    G′Un(1) = (n − 1)/2,
    G″Un(1) = (n − 1)(n − 2)/3,

    E(Un) = G′Un(1) = (n − 1)/2,
    V(Un) = G″Un(1) + G′Un(1) − G′Un(1)² = (n² − 1)/12.

For example, if n = 6,

    V(U6) = 35/12.
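A direct check (mine) of these formulas for n = 6, straight from the definitions of mean and variance:

    # Sketch: E(U_6) and V(U_6) computed directly, compared with (n-1)/2 and (n^2-1)/12.
    from fractions import Fraction as F

    n = 6
    values = range(n)                      # U_n takes values 0, ..., n-1
    p = F(1, n)

    mean = sum(x * p for x in values)
    var  = sum((x - mean)**2 * p for x in values)

    assert mean == F(n - 1, 2)
    assert var == F(n*n - 1, 12)           # 35/12 for n = 6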

Let X, Y be independent RVs on Ω, X, Y : Ω → N0. Let gk = P(X = k),
hk = P(Y = k). Consider X + Y:

    P(X + Y = n) = Σ_k P(X = k and Y = n − k)
                 = Σ_k P(X = k) P(Y = n − k),   by independence
                 = Σ_k gk hn−k .

So the PGF of X + Y is

    GX+Y(z) = Σ_{n≥0} ( Σ_k gk hn−k ) z^n = GX(z) GY(z).

In particular, if S is the sum of n independent instances of X, then

GS (z) = (GX (z))n

7.3 Tossing Coins

From now we write X(z) instead of GX (z).

A single coin toss has two outcomes h, t with probabilities p, q with p+q = 1.
A fair coin has p = q = 1/2.

1. Basic example. Ω = {h, t} P (h) = p and P (t) = q

Define an RV H on Ω by H(h) = 1 , H(t) = 0. So PGF H(z) = q + pz.


P(H = 0) = q, P(H = 1) = p. When we toss the coin n times, Hn, the sum
of n instances of H, is the total number of h. The PGF of Hn is

    Hn(z) = H(z)^n = (q + pz)^n = Σ_{k≥0} (n choose k) p^k q^{n−k} z^k .

We can mine this for a great deal of information:

    P(Hn = k) = (n choose k) p^k q^{n−k},
    E(H) = H′(1) = p,
    E(Hn) = nE(H) = np,
    V(H) = H″(1) + H′(1) − H′(1)² = 0 + p − p² = p(1 − p) = pq,
    V(Hn) = nV(H) = npq,
    σ(Hn) = √(npq).

2. Suppose we toss the coin until we get h, then stop. Ω = {h, th, tth, ttth, . . .} =
{t^k h | k ∈ N0}, where t^k h means t k times, then h.

    P(t^k h) = q^k p,
    Σ_{k≥0} q^k p = p/(1 − q) = p/p = 1.

So we do have a sample space, that is, Σ_{ω∈Ω} P(ω) = 1.

Definition 7: Define a RV F on Ω by F (tk h) = k + 1 = total number of


tosses to get h.

Then

    F(z) = Σ_{k≥1} q^{k−1} p z^k = pz/(1 − qz).

Again, Fn = sum of n independent instances of F . This represents the


number of tosses needed to get h n times.

The PGF of Fn is ( pz/(1 − qz) )^n.

So

    F′(z) = ((1 − qz)p + pzq)/(1 − qz)² = p/(1 − qz)²,
    F′(1) = p/(1 − q)² = 1/p,
    E(F) = 1/p,
    E(Fn) = n/p,

    F″(z) = 2pq/(1 − qz)³,
    F″(1) = 2q/p²,
    V(F) = 2q/p² + 1/p − 1/p² = q/p²,
    V(Fn) = nq/p².
3. Now we keep tossing the coin until we get h twice in a row, hh. The sample
space is {hh, thh, hthh, tthh, . . .} with probabilities {p², qp², qp³, q²p², . . .}.
Let G on Ω be an RV equal to the total number of tosses, as in (2).

G(z) = p²z² + qp²z³ + (qp³ + q²p²)z⁴ + . . . We get G(z) by substituting qz for
t and pz for h in each element of Ω and summing. An element of Ω consists
of an arbitrary (possibly empty) string in t and ht followed by hh.

Strings of length 1 in t and ht are t, ht. Length 2: tt, tht, htt, htht. Adding
these up, this could be written as (t + ht)². Adding up strings of length k in
t and ht gives (t + ht)^k. So the sum of the elements of Ω can be written as

    Σ_{k≥0} (t + ht)^k hh.

Now substitute qz for t and pz for h.

So the PGF is

    G(z) = Σ_{k≥0} (qz + pqz²)^k p²z² = p²z²/(1 − qz − pqz²),

    G(1) = p²/(1 − q − pq) = p²/(p − pq) = p²/(p(1 − q)) = 1.

So Σ_{ω∈Ω} P(ω) = 1 and Ω really is a sample space. That is, the sequence hh
occurs with probability 1 if you keep tossing the coin.
    G′(z) = ( 2(1 − qz − pqz²)p²z + p²z²(q + 2pqz) ) / (1 − qz − pqz²)²,

so the expected number of tosses is

    G′(1) = ( 2p⁴ + p²(q + 2pq) )/p⁴ = ( 2p⁴ + p²(1 − 2p² + p) )/p⁴ = 1/p + 1/p².

For example, if p = 21 , G0 (1) = 6.
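This expected value is easy to confirm by simulation (a sketch of mine; with a fair coin the average number of tosses until the first hh should be close to 6):

    # Sketch: Monte Carlo estimate of the expected number of tosses until 'hh' appears.
    import random

    def tosses_until(pattern, p=0.5, rng=random):
        seq = ""
        while not seq.endswith(pattern):
            seq += "h" if rng.random() < p else "t"
        return len(seq)

    random.seed(0)
    trials = 200_000
    avg = sum(tosses_until("hh") for _ in range(trials)) / trials
    print(avg)          # close to 1/p + 1/p^2 = 6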

4. Toss a coin until we get hht.

A sequence ending with the first occurrence of hht is an arbitrary string of
k (≥ 0) t or ht, followed by a string of l (≥ 0) h, followed by hht.

Using the same idea as in the last example,

    Σ_Ω ω = Σ_{k≥0} Σ_{l≥0} (t + ht)^k h^l hht = ( Σ_{k≥0} (t + ht)^k ) ( Σ_{l≥0} h^l ) hht.

For the generating function substitute t → qz, h → pz:

    Σ_{k≥0} (qz + pqz²)^k ( Σ_{ℓ≥0} (pz)^ℓ ) p²qz³ = p²qz³ / ((1 − qz − pqz²)(1 − pz)).

In case p = 1/2,

    G(z) = z³/(z³ − 8z + 8),
    G′(z) = ( (z³ − 8z + 8)·3z² − z³(3z² − 8) ) / (z³ − 8z + 8)²,
    G′(1) = 8,

the expected number of tosses.

If we did thh instead, an element of Ω is l ≥ 0 h's, followed by k ≥ 0 t or ht,
followed by thh. So the generating function is the same, and so is the expected
number.

5. Consider the following game between A and B. We toss a fair coin (p = 1/2)
until either hht occurs or thh. A wins if hht comes first, B if thh comes first.

A wins with pattern h^k t (k ≥ 2) only (that is, A wins only if the first two
tosses are h, with probability 1/4). B wins with (t + ht)^k hh (k ≥ 1). The
generating function for the game is the GF of h^k t plus the GF of (t + ht)^k hh:

    G(z) = z³/(4(2 − z)) + (z/2 + z²/4)(z²/4)/(1 − z/2 − z²/4).

Putting z = 1 gives

    1/4   +   3/4
  (A wins)   (B wins)

So thh is a "better combination" than hht.

6. A new game: now suppose that A wins with hht and B wins with htt.

The winning pattern for A is t^k (ht)^l h^m hht, k, l, m ≥ 0. B wins with
t^k (ht)^l htt, k, l ≥ 0.

The GF is

    z³/(8(1 − z/2)(1 − z²/4)(1 − z/2))  +  z³/(8(1 − z/2)(1 − z²/4))
              (A wins)                            (B wins)

Putting z = 1 gives

    2/3 + 1/3.

A wins with probability 2/3.

So hht is better than htt. We have thh > hht > htt.

But thh and htt are equally likely to come first (by symmetry). So the
relationship ‘better than’ is not transitive.
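A simulation (mine) confirms all three comparisons at once, including the non-transitivity:

    # Sketch: Monte Carlo check of the pattern-game probabilities:
    # thh beats hht with prob 3/4, hht beats htt with prob 2/3, thh vs htt is 1/2.
    import random

    def first_pattern(a, b, rng):
        seq = ""
        while True:
            seq += "h" if rng.random() < 0.5 else "t"
            if seq.endswith(a):
                return a
            if seq.endswith(b):
                return b

    random.seed(1)
    trials = 100_000
    for a, b in [("thh", "hht"), ("hht", "htt"), ("thh", "htt")]:
        wins = sum(first_pattern(a, b, random) == a for _ in range(trials))
        print(a, "beats", b, "with frequency about", wins / trials)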
