An Introduction to Mathematical Methods in Combinatorics
Renzo Sprugnoli
Dipartimento di Sistemi e Informatica
Viale Morgagni, 65 - Firenze (Italy)
Contents

1 Introduction 5
1.1 What is the Analysis of an Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 The Analysis of Sequential Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Binary Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Closed Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 The Landau notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Special numbers 11
2.1 Mappings and powers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 The group structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Counting permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Dispositions and Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6 The Pascal triangle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.7 Harmonic numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.8 Fibonacci numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.9 Walks, trees and Catalan numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.10 Stirling numbers of the first kind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.11 Stirling numbers of the second kind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.12 Bell and Bernoulli numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4 Generating Functions 39
4.1 General Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Some Theorems on Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3 More advanced results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4 Common Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5 The Method of Shifting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.6 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.7 Some special generating functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5 Riordan Arrays 55
5.1 Definitions and basic concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.2 The algebraic structure of Riordan arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.3 The A-sequence for proper Riordan arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.4 Simple binomial coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.5 Other Riordan arrays from binomial coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.6 Binomial coefficients and the LIF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.7 Coloured walks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.8 Stirling numbers and Riordan arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.9 Identities involving the Stirling numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6 Formal methods 67
6.1 Formal languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.2 Context-free languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.3 Formal languages and programming languages . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.4 The symbolic method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.5 The bivariate case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
6.6 The Shift Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.7 The Difference Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.8 Shift and Difference Operators - Example I . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.9 Shift and Difference Operators - Example II . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.10 The Addition Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.11 Definite and Indefinite summation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.12 Definite Summation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.13 The Euler-McLaurin Summation Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.14 Applications of the Euler-McLaurin Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7 Asymptotics 83
7.1 The convergence of power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.2 The method of Darboux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.3 Singularities: poles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.4 Poles and asymptotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.5 Algebraic and logarithmic singularities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.6 Subtracted singularities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.7 The asymptotic behavior of a trinomial square root . . . . . . . . . . . . . . . . . . . . . . . . 89
7.8 Hayman’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.9 Examples of Hayman’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
8 Bibliography 93
Chapter 1
Introduction
1.1 What is the Analysis of an Algorithm

An algorithm is a finite sequence of unambiguous rules for solving a problem. Once the algorithm is started with a particular input, it will always end, obtaining the correct answer or output, which is the solution to the given problem. An algorithm is realized on a computer by means of a program, i.e., a set of instructions which cause the computer to perform the elaborations intended by the algorithm.

So, an algorithm is independent of any computer and, in fact, the word was used for a long time before computers were invented. Leonardo Fibonacci called "algorisms" the rules for performing the four basic operations (addition, subtraction, multiplication and division) with the newly introduced Arabic digits and the positional notation for numbers. "Euclid's algorithm" for evaluating the greatest common divisor of two integer numbers was also well known before the appearance of computers.

Many algorithms can exist which solve the same problem. Some can be very skillful, others can be very simple and straightforward. A natural problem, in these cases, is to choose, if any, the best algorithm, in order to realize it as a program on a particular computer. Simple algorithms are more easily programmed, but they can be very slow or require large amounts of computer memory. The problem is therefore to have some means to judge the speed and the quantity of computer resources a given algorithm uses. The aim of the "Analysis of Algorithms" is just to give the mathematical bases for describing the behavior of algorithms, thus obtaining exact criteria to compare different algorithms performing the same task, or to see whether an algorithm is good enough to be efficiently implemented on a computer.

Let us consider, as a simple example, the problem of searching. Let S be a given set. In practical cases S is the set N of natural numbers, or the set Z of integer numbers, or the set R of real numbers, or also the set A∗ of the words on some alphabet A. However, as a matter of fact, the nature of S is not essential, because we always deal with a suitable binary representation of the elements of S on a computer, and these have therefore to be considered as "words" in the computer memory. The "problem of searching" is as follows: we are given a finite ordered subset T = (a1, a2, ..., an) of S (usually called a table, its elements referred to as keys) and an element s ∈ S, and we wish to know whether s ∈ T or s ∉ T, and in the former case which element ak in T it is.

Although the mathematical problem "s ∈ T or not" has almost no relevance, the searching problem is basic in computer science, and many algorithms have been devised to make the process of searching as fast as possible. Surely, the most straightforward algorithm is sequential searching: we begin by comparing s and a1, and we are finished if they are equal. Otherwise, we compare s and a2, and so on, until we find an element ak = s or reach the end of T. In the former case the search is successful and we have determined the element in T equal to s. In the latter case we are convinced that s ∉ T and the search is unsuccessful.

The analysis of this (simple) algorithm consists in finding one or more mathematical expressions describing in some way the number of operations performed by the algorithm as a function of the number n of the elements in T. This definition is intentionally vague, and the following points should be noted:

• an algorithm can present several aspects and therefore may require several mathematical expressions to be fully understood. For example, as far as the sequential search algorithm is concerned, we are interested in what happens during a successful or an unsuccessful search. Besides, for a successful search, we wish to know what the worst case is (Worst Case Analysis) and what the average case is (Average Case Analysis), with respect to all the tables containing n elements or, for a fixed table, with respect to the n elements it contains;

• the operations performed by an algorithm can be
of many different kinds. In the example above the only operations involved in the algorithm are comparisons, so no doubt is possible. In other algorithms we can have arithmetic or logical operations. Sometimes we can also consider more complex operations, such as square roots, list concatenation or reversal of words. Operations depend on the nature of the algorithm, and we can decide to consider as an "operation" also very complicated manipulations (e.g., extracting a random number or performing the differentiation of a polynomial of a given degree). The important point is that every instance of the "operation" takes about the same time or has the same complexity. If this is not the case, we can give different weights to different operations or to different instances of the same operation.

We observe explicitly that we never consider execution time as a possible parameter for the behavior of an algorithm. As stated before, algorithms are independent of any particular computer and should never be confused with the programs realizing them. Algorithms are only related to the "logic" used to solve the problem; programs can depend on the ability of the programmer or on the characteristics of the computer.

1.2 The Analysis of Sequential Searching

The analysis of the sequential searching algorithm is very simple. For a successful search, the worst case analysis is immediate, since we have to perform n comparisons if s = an is the last element in T. More interesting is the average case analysis, which introduces the first mathematical device of algorithm analysis. To find the average number of comparisons in a successful search we should sum the numbers of comparisons necessary to find each element in T, and then divide by n. It is clear that if s = a1 we only need a single comparison; if s = a2 we need two comparisons, and so on. Consequently, we have:

    Cn = (1/n)(1 + 2 + ··· + n) = (1/n) · n(n + 1)/2 = (n + 1)/2

This result is intuitively clear but important, since it shows in mathematical terms that the number of comparisons performed by the algorithm (and hence the time taken on a computer) grows linearly with the dimension of T.

The concluding step of our analysis was the execution of a sum, a well-known sum in the present case. This is typical of many algorithms and, as a matter of fact, the ability in performing sums is an important technical point for the analyst. A large part of our efforts will be dedicated to this topic.

Let us now consider an unsuccessful search, for which we only have an Average Case analysis. If Uk denotes the number of comparisons necessary for a table with k elements, we can determine Un in the following way. We compare s with the first element a1 in T and obviously we find s ≠ a1, so we should go on with the table T′ = T \ {a1}, which contains n − 1 elements. Consequently, we have:

    Un = 1 + Un−1

This is a recurrence relation, that is, an expression relating the value of Un with other values Uk having k < n. It is clear that if some value, e.g., U0 or U1, is known, then it is possible to find the value of Un for every n ∈ N. In our case, U0 is the number of comparisons necessary to find out that an element s does not belong to a table containing no elements. Hence we have the initial condition U0 = 0 and we can unfold the preceding recurrence relation:

    Un = 1 + Un−1 = 1 + 1 + Un−2 = ··· = 1 + 1 + ··· + 1 + U0 = n    (n ones)

Recurrence relations are the other mathematical device arising in algorithm analysis. In our example the recurrence is easily transformed into a sum, but as we shall see this is not always the case. In general we have the problem of solving a recurrence, i.e., of finding an explicit expression for Un, starting from the recurrence relation and the initial conditions. So, another large part of our efforts will be dedicated to the solution of recurrences.

1.3 Binary Searching

Another simple example of analysis can be performed with the binary search algorithm. Let S be a given ordered set. The ordering must be total, as the numerical order in N, Z or R or the lexicographical order in A∗. If T = (a1, a2, ..., an) is a finite ordered subset of S, i.e., a table, we can always imagine that a1 < a2 < ··· < an, and consider the following algorithm, called binary searching, to search for an element s ∈ S in T. Let ai be the median element in T, i.e., i = ⌊(n + 1)/2⌋, and compare it with s. If s = ai then the search is successful; otherwise, if s < ai, we perform the same algorithm on the subtable T′ = (a1, a2, ..., ai−1); if instead s > ai we perform the same algorithm on the subtable T′′ = (ai+1, ai+2, ..., an). If at any moment the table on which we perform the search is reduced to the empty set ∅, then the search is unsuccessful.
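The binary searching algorithm just described can be sketched as follows (an iterative rendering with a comparison counter; the names are our own). For tables of n = 2^k − 1 keys, the analysis below is exact, and the sketch confirms it:

```python
import math

def binary_search(table, s):
    """Probe the median of the current subtable; return (position,
    comparisons), with position None for an unsuccessful search."""
    comparisons = 0
    lo, hi = 0, len(table) - 1        # current subtable (0-based)
    while lo <= hi:
        mid = (lo + hi) // 2          # median: i = floor((n+1)/2) in 1-based terms
        comparisons += 1
        if table[mid] == s:
            return mid, comparisons
        if s < table[mid]:
            hi = mid - 1              # continue in T' = (a_1, ..., a_{i-1})
        else:
            lo = mid + 1              # continue in T'' = (a_{i+1}, ..., a_n)
    return None, comparisons

k = 10
n = 2 ** k - 1                        # a table of 2^k - 1 keys
table = list(range(1, n + 1))
worst = max(binary_search(table, s)[1] for s in table)
assert worst == k == math.log2(n + 1)   # worst case: B_n = log2(n + 1)
assert binary_search(table, 0)[1] == k  # unsuccessful search: U_n = B_n
```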
Let us consider first the Worst Case analysis of this algorithm. For a successful search, the element s is only found at the last step of the algorithm, i.e., when the subtable on which we search is reduced to a single element. If Bn is the number of comparisons necessary to find s in a table T with n elements, we have the recurrence:

    Bn = 1 + B⌊n/2⌋

In fact, we observe that every step reduces the table to ⌊n/2⌋ or to ⌊(n − 1)/2⌋ elements. Since we are performing a Worst Case analysis, we consider the worse situation. The initial condition is B1 = 1, relative to the table (s), to which we should always reduce. The recurrence is not as simple as in the case of sequential searching, but we can simplify everything by considering a value of n of the form 2^k − 1. In fact, in such a case, we have ⌊n/2⌋ = ⌊(n − 1)/2⌋ = 2^(k−1) − 1, and the recurrence takes on the form:

    B_{2^k − 1} = 1 + B_{2^(k−1) − 1}    or    βk = 1 + βk−1

if we write βk for B_{2^k − 1}. As before, unfolding yields βk = k and, returning to the B's, we find:

    Bn = log2(n + 1)

since by our definition n = 2^k − 1. This is valid for every n of the form 2^k − 1, and for the other values it is an approximation, a rather good approximation, indeed, because of the very slow growth of logarithms.

We observe explicitly that for n = 1,000,000 a sequential search requires about 500,000 comparisons on the average for a successful search, whereas binary searching only requires log2(1,000,000) ≈ 20 comparisons. This accounts for the dramatic improvement of binary searching over sequential searching, and the analysis of algorithms provides a mathematical proof of this fact.

The Average Case analysis for successful searches can be accomplished in the following way. There is only one element that can be found with a single comparison: the median element in T. There are two elements that can be found with two comparisons: the median elements in T′ and in T′′. Continuing in the same way, we find the average number An of comparisons as:

    An = (1/n)(1 + 2 + 2 + 3 + 3 + 3 + 3 + 4 + ··· + (1 + ⌊log2(n)⌋))

The value of this sum can be found explicitly, but the method is rather difficult and we delay it until later (see Section 4.7). When n = 2^k − 1 the expression simplifies:

    A_{2^k − 1} = (1/(2^k − 1)) Σ_{j=1..k} j·2^(j−1) = (k·2^k − 2^k + 1)/(2^k − 1)

This sum, too, is not immediate, but the reader can check it by using mathematical induction. If we now write k·2^k − 2^k + 1 = k(2^k − 1) + k − (2^k − 1), we find:

    An = k + k/n − 1 = log2(n + 1) − 1 + log2(n + 1)/n

which is only a little better than the worst case.

For unsuccessful searches, the analysis is now very simple, since we have to proceed as in the Worst Case analysis, and at the last comparison we have a failure instead of a success. Consequently, Un = Bn.

1.4 Closed Forms

The sign "=" between two numerical expressions denotes their numerical equivalence, as for example:

    Σ_{k=0..n} k = n(n + 1)/2

Although algebraically or numerically equivalent, two expressions can be computationally quite different. In the example, the left-hand expression requires n sums to be evaluated, whereas the right-hand expression only requires a sum, a multiplication and a halving. For n even moderately large (say n ≥ 5), nobody would prefer computing the left-hand expression rather than the right-hand one. A computer evaluates the latter expression in a few nanoseconds, but can require some milliseconds to compute the former if n is greater than 10,000. The important point is that the evaluation of the right-hand expression is independent of n, whilst the left-hand expression requires a number of operations growing linearly with n.

A closed form expression is an expression, depending on some parameter n, the evaluation of which does not require a number of operations depending on n. Another example we have already found is:

    Σ_{k=0..n} k·2^(k−1) = n·2^n − 2^n + 1 = (n − 1)·2^n + 1

Again, the left-hand expression is not in closed form, whereas the right-hand one is. We observe that 2^n = 2 × 2 × ··· × 2 (n times) seems to require n − 1 multiplications. In fact, however, 2^n is a simple shift in a binary computer and, more in general, every power α^n = exp(n ln α) can always be computed with the maximal accuracy allowed by a computer in constant time, i.e., in a time independent of α and n. This is because the two elementary functions exp(x) and ln(x) have the nice property that the cost of their evaluation is independent of their argument. The same property holds true for the most common numerical functions,
as the trigonometric and hyperbolic functions, the Γ and ψ functions (see below), and so on.

As we shall see, in algorithm analysis there appear many kinds of "special" numbers. Most of them can be reduced to the computation of some basic quantities, which are considered to be in closed form, although apparently they depend on some parameter n. The three main quantities of this kind are the factorial, the harmonic numbers and the binomial coefficients. In order to justify the previous sentence, let us anticipate some definitions, which will be discussed in the next chapter, and give a more precise presentation of the Γ and ψ functions.

The Γ-function is defined by a definite integral:

    Γ(x) = ∫₀^∞ t^(x−1) e^(−t) dt.

By integrating by parts, we obtain:

    Γ(x + 1) = ∫₀^∞ t^x e^(−t) dt = [−t^x e^(−t)]₀^∞ + ∫₀^∞ x t^(x−1) e^(−t) dt = x Γ(x)

which is a basic recurrence property of the Γ-function. It allows us to reduce the computation of Γ(x) to the case 1 ≤ x ≤ 2. In this interval we can use a polynomial approximation:

    Γ(x + 1) = 1 + b1 x + b2 x² + ··· + b8 x⁸ + ε(x)

where:

the last integral being the famous Gauss' integral. Finally, from the recurrence relation we obtain:

    Γ(1/2) = Γ(1 − 1/2) = −(1/2) Γ(−1/2)

and therefore:

    Γ(−1/2) = −2 Γ(1/2) = −2√π.

The Γ function is defined for every x ∈ C, except when x is a negative integer, where the function goes to infinity; the following approximation can be important:

    Γ(−n + ε) ≈ ((−1)^n / n!) · (1/ε).

When we unfold the basic recurrence of the Γ-function for x = n an integer, we find Γ(n + 1) = n × (n − 1) × ··· × 2 × 1. The factorial Γ(n + 1) = n! = 1 × 2 × 3 × ··· × n seems to require n − 2 multiplications. However, for n large it can be computed by means of Stirling's formula, which is obtained from the corresponding formula for the Γ-function:

    n! = Γ(n + 1) = n Γ(n) = √(2πn) (n/e)^n (1 + 1/(12n) + 1/(288n²) + ···).

This requires only a fixed amount of operations to reach the desired accuracy.

The function ψ(x), called the ψ-function or digamma function, is defined as the logarithmic derivative of the Γ-function:

    ψ(x) = d/dx ln Γ(x) = Γ′(x)/Γ(x)
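The claim that Stirling's formula reaches the desired accuracy with a fixed number of operations can be checked numerically. A Python sketch (the three-term truncation and the tolerance are our own choices; math.gamma is the standard library's Γ) that also verifies the values derived above:

```python
import math

def stirling_factorial(n, terms=3):
    """Approximate n! by the truncated Stirling series
    sqrt(2*pi*n) * (n/e)**n * (1 + 1/(12n) + 1/(288n^2))."""
    series = (1.0, 1 / (12 * n), 1 / (288 * n * n))
    correction = sum(series[:terms])
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n * correction

# A fixed number of terms already gives small relative error:
for n in (10, 20, 50):
    exact = math.factorial(n)
    assert abs(stirling_factorial(n) - exact) / exact < 1e-4

# The recurrence Gamma(x+1) = x*Gamma(x) links Gamma to the factorial:
assert math.isclose(math.gamma(11), math.factorial(10))
# Gamma(1/2) = sqrt(pi), hence Gamma(-1/2) = -2*sqrt(pi):
assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))
assert math.isclose(math.gamma(-0.5), -2 * math.sqrt(math.pi))
```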
obtain an approximate formula for the Harmonic numbers:

    Hn = ln n + γ + 1/(2n) − 1/(12n²) + ···

which shows that the computation of Hn does not require the n − 1 sums and n − 1 inversions that its definition seems to imply.

Finally, the binomial coefficient:

    (n choose k) = n(n − 1) ··· (n − k + 1) / k! = n! / (k! (n − k)!) = Γ(n + 1) / (Γ(k + 1) Γ(n − k + 1))

can be reduced to the computation of the Γ function, or can be approximated by using Stirling's formula for factorials. The two methods are indeed the same. We observe explicitly that the last expression shows that binomial coefficients can be defined for every n, k ∈ C, except that k cannot be a negative integer.

The reader can, as a very useful exercise, write computer programs to realize the various functions mentioned in the present section.

1.5 The Landau notation

To the mathematician Edmund Landau is ascribed a special notation describing the general behavior of a function f(x) when x approaches some definite value. We are mainly interested in the case x → ∞, but this should not be considered a restriction. Landau notation is also known as O-notation (or big-oh notation), because of the use of the letter O to denote the desired behavior.

Let us consider functions f : N → R (i.e., sequences of real numbers); given two functions f(n) and g(n), we say that f(n) is O(g(n)), or that f(n) is in the order of g(n), if and only if:

    lim_{n→∞} f(n)/g(n) < ∞

In formulas we write f(n) = O(g(n)) or also f(n) ∼ g(n). Besides, if we have at the same time:

    lim_{n→∞} g(n)/f(n) < ∞

we say that f(n) is in the same order as g(n), and write f(n) = Θ(g(n)) or f(n) ≈ g(n).

It is easy to see that "∼" is an order relation between functions f : N → R, and that "≈" is an equivalence relation. We observe explicitly that when f(n) is in the same order as g(n), a constant K ≠ 0 exists such that:

    lim_{n→∞} f(n)/(K g(n)) = 1    or    lim_{n→∞} f(n)/g(n) = K;

the constant K is very important and will often be used.

Before making some important comments on Landau notation, we wish to introduce a last definition: we say that f(n) is of smaller order than g(n), and write f(n) = o(g(n)), iff:

    lim_{n→∞} f(n)/g(n) = 0.

Obviously, this is in accordance with the previous definitions, but the notation introduced (the small-oh notation) is used rather frequently and should be known.

If f(n) and g(n) describe the behavior of two algorithms A and B solving the same problem, we will say that A is asymptotically better than B iff f(n) = o(g(n)); instead, the two algorithms are asymptotically equivalent iff f(n) = Θ(g(n)). This is rather clear, because when f(n) = o(g(n)) the number of operations performed by A is substantially less than the number of operations performed by B. However, when f(n) = Θ(g(n)), the number of operations is the same, except for a constant factor K, which remains the same as n → ∞. The constant K can simply depend on the particular realization of the algorithms A and B, and with two different implementations we may have K < 1 or K > 1. Therefore, in general, when f(n) = Θ(g(n)) we cannot say which algorithm is better, this depending on the particular realization or on the particular computer on which the algorithms are run. Obviously, if A and B are both realized on the same computer and in the best possible way, a value K < 1 tells us that algorithm A is relatively better than B, and vice versa when K > 1.

It is also possible to give an absolute evaluation of the performance of a given algorithm A, whose behavior is described by a sequence of values f(n). This is done by comparing f(n) against an absolute scale of values. The scale most frequently used contains powers of n, logarithms and exponentials:

    O(1) < O(ln n) < O(√n) < O(n) < ··· < O(n ln n) < O(n√n) < O(n²) < ··· < O(n⁵) < ··· < O(e^n) < ··· < O(e^(e^n)) < ···.

This scale reflects well-known properties: the logarithm grows more slowly than any power n^ε, however small ε, while e^n grows faster than any power
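These definitions can be illustrated numerically with the two searching costs obtained earlier in this chapter; a short Python sketch (the sample points and tolerances are our own choices, and finite ratios can only suggest, not prove, the limits):

```python
import math

# f(n) = Theta(g(n)): the ratio f/g tends to a nonzero constant K;
# f(n) = o(g(n)): the ratio tends to 0.
f = lambda n: (n + 1) / 2        # average cost of a successful sequential search
g = lambda n: n                  # reference function
ratios = [f(10 ** e) / g(10 ** e) for e in range(3, 7)]
assert all(abs(r - 0.5) < 0.001 for r in ratios)     # K = 1/2, so f(n) = Theta(n)

b = lambda n: math.log2(n + 1)   # cost of binary searching
small = [b(10 ** e) / 10 ** e for e in range(3, 7)]
assert all(x > y for x, y in zip(small, small[1:]))  # the ratio decreases...
assert small[-1] < 1e-4                              # ...towards 0: b(n) = o(n)
```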
Chapter 2
Special numbers
A sequence is a mapping from the set N of natural numbers into some other set of numbers. If f : N → R, the sequence is called a sequence of real numbers; if f : N → Q, the sequence is called a sequence of rational numbers; and so on. Usually, the image of a k ∈ N is denoted by fk instead of the traditional f(k), and the whole sequence is abbreviated as (f0, f1, f2, ...) = (fk)_{k∈N}. Because of this notation, an element k ∈ N is called an index.

Often we also study double sequences, i.e., mappings f : N × N → R or into some other numeric set. In this case also, instead of writing f(n, k) we will usually write f_{n,k}, and the whole sequence will be denoted by {f_{n,k} | n, k ∈ N} or (f_{n,k})_{n,k∈N}. A double sequence can be displayed as an infinite array of numbers, whose first row is the sequence (f_{0,0}, f_{0,1}, f_{0,2}, ...), the second row is (f_{1,0}, f_{1,1}, f_{1,2}, ...), and so on. The array can also be read by columns, and then (f_{0,0}, f_{1,0}, f_{2,0}, ...) is the first column, (f_{0,1}, f_{1,1}, f_{2,1}, ...) is the second column, and so on. Therefore, the index n is the row index and the index k is the column index.

We wish to describe here some sequences and double sequences of numbers frequently occurring in the analysis of algorithms. In fact, they arise in the study of very simple and basic combinatorial problems, and therefore appear in more complex situations as well, so that they can be considered the fundamental bricks in a solid wall.

2.1 Mappings and powers

In all branches of Mathematics two concepts appear which have to be taken as basic: the concept of a set and the concept of a mapping. A set is a collection of objects and must be considered a primitive concept, i.e., a concept which cannot be defined in terms of other, more elementary concepts. If a, b, c, ... are the objects (or elements) in a set denoted by A, we write A = {a, b, c, ...}, thus giving an extensional definition of this particular set. If a set B is defined through a property P of its elements, we write B = {x | P(x) is true}, thus giving an intensional definition of the particular set B.

If S is a finite set, then |S| denotes its cardinality, or the number of its elements. The order in which we write or consider the elements of S is irrelevant. If we wish to emphasize a particular arrangement or ordering of the elements of S, we write (a1, a2, ..., an), if |S| = n and S = {a1, a2, ..., an} in any order. This is the vector notation and is used to represent arrangements of S. Two arrangements of S are different if and only if an index k exists for which the elements corresponding to that index in the two arrangements are different; obviously, as sets, the two arrangements continue to be the same.

If A, B are two sets, a mapping or function from A into B, denoted f : A → B, is a subset of the Cartesian product of A by B, f ⊆ A × B, such that every element a ∈ A is the first element of one and only one pair (a, b) ∈ f. The usual notation for (a, b) ∈ f is f(a) = b, and b is called the image of a under the mapping f. The set A is the domain of the mapping, while B is the range or codomain. A function for which every pair of elements a1 ≠ a2 ∈ A corresponds to pairs (a1, b1) and (a2, b2) with b1 ≠ b2 is called injective. A function in which every b ∈ B belongs to at least one pair (a, b) is called surjective. A bijection, or 1-1 correspondence, or 1-1 mapping, is any injective function which is also surjective.

If |A| = n and |B| = m, the Cartesian product A × B, i.e., the set of all the couples (a, b) with a ∈ A and b ∈ B, contains exactly nm elements or couples. A more difficult problem is to find out how many mappings from A to B exist. We can observe that every element a ∈ A must have its image in B; this means that we can associate to a any one of the m elements of B. Therefore, we have m · m · ... · m different possibilities, where the product is extended to all the n elements of A. Since all the mappings can be built in this way, we have a total of m^n different mappings from A into B. This also explains why the set of mappings from A into B is often denoted by
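The counting argument for mappings can be checked by brute force for small sets; a Python sketch (the sets A and B are illustrative, and a mapping is encoded as the tuple of images of A's elements):

```python
from itertools import product

A = ["a1", "a2", "a3"]            # |A| = n = 3
B = [0, 1]                        # |B| = m = 2
n, m = len(A), len(B)

# Each mapping f: A -> B independently chooses an image in B for every
# element of A; enumerating them as tuples of images gives m**n mappings.
mappings = list(product(B, repeat=n))
assert len(mappings) == m ** n == 8

# The cartesian product A x B contains n*m couples:
assert len(list(product(A, B))) == n * m == 6
```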
is the sum of the degrees of its cycles. The six permutations have degree 2, 3, 3, 4, 4, 3, respectively. A permutation is even or odd according to whether its degree is even or odd.

The permutation (8, 9, 4, 3, 6, 1, 7, 2, 10, 5), in vector notation, has the cycle representation (1 8 2 9 10 5 6)(3 4), the number 7 being a fixed point. The long cycle (1 8 2 9 10 5 6) has degree 8; therefore the permutation degree is 8 + 3 + 2 = 13 and the permutation is odd.

2.3 The group structure

Let n ∈ N; Pn denotes the set of all the permutations of n elements, i.e., according to the previous sections, the set of 1-1 mappings π : Nn → Nn. If π, ρ ∈ Pn, we can perform their composition, i.e., form a new permutation σ defined as σ(k) = ρ(π(k)) = (π ◦ ρ)(k). An example in P7 is:

    π ◦ ρ = \begin{pmatrix} 1&2&3&4&5&6&7 \\ 1&5&6&7&4&2&3 \end{pmatrix}
            \begin{pmatrix} 1&2&3&4&5&6&7 \\ 4&5&2&1&7&6&3 \end{pmatrix}
          = \begin{pmatrix} 1&2&3&4&5&6&7 \\ 4&7&6&3&1&5&2 \end{pmatrix}.

In fact, for instance, π(2) = 5 and ρ(5) = 7; therefore σ(2) = ρ(π(2)) = ρ(5) = 7, and so on. The vector representation of permutations is not particularly suited for hand evaluation of composition, although it is very convenient for computer implementation. The opposite situation occurs for cycle representation: for example, it is easy to check that the inverse of π = (2 5 4 7 3 6) is (2 6 3 7 4 5); in fact, in cycle notation, we have:

    (2 5 4 7 3 6)(2 6 3 7 4 5) = (2 6 3 7 4 5)(2 5 4 7 3 6) = (1).

A simple observation is that the inverse of a cycle is obtained by writing its first element followed by all the other elements in reverse order. Hence, the inverse of a transposition is the same transposition. Since composition is associative, we have proved that (Pn, ◦) is a group. The group is not commutative because, for example:

    ρ ◦ π = (1 4)(2 5 7 3) ◦ (2 5 4 7 3 6) = (1 7 6 2 4)(3 5) ≠ π ◦ ρ.

An involution is a permutation π such that π² = π ◦ π = (1). An involution can only be composed of fixed points and transpositions because, by the definition, we have π⁻¹ = π, and the above observation on the inversion of cycles shows that a cycle with more than 2 elements has an inverse which cannot coincide with the cycle itself.

Till now, we have supposed that in the cycle representation every number is only considered once. However, if we think of a permutation as the product of cycles, we can imagine that its representation is not unique and that an element k ∈ Nn can appear in more than one cycle. The representation of σ as π ◦ ρ is an example of this statement. In particular, we can obtain the transposition representation of a permutation.
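The vector-notation composition used in the example above is entirely mechanical and easy to check by machine. The following is a small sketch in Python (the book's own code examples are in Pascal; the function name `compose` is ours):

```python
# Compose permutations given in vector notation (values 1..n, lists 0-indexed).
# Following the book's convention, pi ∘ rho means: apply pi first, then rho,
# i.e. (pi ∘ rho)(k) = rho(pi(k)).
def compose(pi, rho):
    return [rho[pi[k - 1] - 1] for k in range(1, len(pi) + 1)]

pi  = [1, 5, 6, 7, 4, 2, 3]   # the cycle (2 5 4 7 3 6)
rho = [4, 5, 2, 1, 7, 6, 3]   # the cycles (1 4)(2 5 7 3)

print(compose(pi, rho))  # → [4, 7, 6, 3, 1, 5, 2], as in the text
```

Composing `pi` with its inverse `[1, 6, 7, 5, 2, 3, 4]` (the cycle (2 6 3 7 4 5)) yields the identity vector, matching the cycle identity displayed above.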
Therefore, we conclude that an even [odd] permutation can be expressed as the composition of an even [odd] number of transpositions. Since the composition of two even permutations is still an even permutation, the set An of even permutations is a subgroup of Pn and is called the alternating subgroup, while the whole group Pn is referred to as the symmetric group.

2.4 Counting permutations

How many permutations are there in Pn? If n = 1, we only have the single permutation (1), and if n = 2 we have two permutations, namely (1, 2) and (2, 1). We have already seen that |P3| = 6, and if n = 0 we consider the empty vector () as the only possible permutation, so that |P0| = 1. In this way we obtain a sequence {1, 1, 2, 6, ...} and we wish to obtain a formula giving us |Pn| for every n ∈ N.

Let π ∈ Pn be a permutation and (a1, a2, ..., an) be its vector representation. We can obtain a permutation in Pn+1 by simply adding the new element n + 1 in any position of the representation of π:

    (n + 1, a1, a2, ..., an)   (a1, n + 1, a2, ..., an)   ···   (a1, a2, ..., an, n + 1)

Therefore, from any permutation in Pn we obtain n + 1 permutations in Pn+1, and they are all different. Vice versa, if we start with a permutation in Pn+1 and eliminate the element n + 1, we obtain one and only one permutation in Pn. Therefore, all permutations in Pn+1 are obtained in the way just described, and each is obtained only once. So we find:

    |Pn+1| = (n + 1)|Pn|

which is a simple recurrence relation. By unfolding this recurrence, i.e., by substituting for |Pn| the analogous expression, and so on, we obtain:

    |Pn+1| = (n + 1)|Pn| = (n + 1)n|Pn−1| = ··· = (n + 1)n(n − 1)···1 × |P0|

Since, as we have seen, |P0| = 1, we have proved that the number of permutations in Pn is given by the product n·(n−1)···2·1. Therefore, our sequence is:

    n    0  1  2  3  4   5    6    7     8
    |Pn| 1  1  2  6  24  120  720  5040  40320

As we mentioned in the Introduction, the number n·(n−1)···2·1 is called n factorial and is denoted by n!. For example, we have 10! = 10·9·8·7·6·5·4·3·2·1 = 3,628,800. Factorials grow very fast, but they are one of the most important quantities in Mathematics.

When n ≥ 2, we can add to every permutation in Pn one transposition, say (1 2). This transforms every even permutation into an odd permutation, and vice versa. On the other hand, since (1 2)⁻¹ = (1 2), the transformation is its own inverse, and therefore defines a 1-1 mapping between even and odd permutations. This proves that the number of even (odd) permutations is n!/2.

Another simple problem is how to determine the number of involutions on n elements. As we have already seen, an involution is only composed of fixed points and transpositions (without repetitions of the elements!). If we denote by In the set of involutions of n elements, we can divide In into two subsets: I′n is the set of involutions in which n is a fixed point, and I″n is the set of involutions in which n belongs to a transposition, say (k n). If we eliminate n from the involutions in I′n, we obtain an involution of n − 1 elements, and vice versa every involution in I′n can be obtained by adding the fixed point n to an involution in In−1. If we eliminate the transposition (k n) from an involution in I″n, we obtain an involution in In−2 which contains the element n − 1 but does not contain the element k. In all cases, however, by eliminating (k n) from all involutions containing it, we obtain a set of involutions in a 1-1 correspondence with In−2. The element k can assume any value 1, 2, ..., n − 1, and therefore we obtain (n − 1) times |In−2| involutions.

We now observe that all the involutions in In are obtained in this way from involutions in In−1 and In−2, and therefore we have:

    |In| = |In−1| + (n − 1)|In−2|

Since |I0| = 1, |I1| = 1 and |I2| = 2, from this recurrence relation we can successively find all the values of |In|. This sequence (see Section 4.9) is therefore:

    n    0  1  2  3  4   5   6   7    8
    In   1  1  2  4  10  26  76  232  764

We conclude this section by giving the classical computer program for generating a random permutation of the numbers {1, 2, ..., n}. The procedure shuffle receives the address of the vector and the number of its elements; it fills the vector with the numbers from 1 to n, then uses the standard procedure random to produce a random permutation, which is returned in the input vector:

    procedure shuffle( var v : vector; n : integer ) ;
    var · · · ;
    begin
      for i := 1 to n do v[ i ] := i ;
      for i := n downto 2 do begin
        j := random( i ) + 1 ;
        t := v[ i ] ; v[ i ] := v[ j ] ; v[ j ] := t
      end
    end ;
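The same algorithm (the classical Fisher-Yates shuffle) can be sketched in Python; this is an illustration, not the book's code, with `random.randrange` playing the role of the standard procedure `random`:

```python
import random

def shuffle(n):
    """Return a uniformly random permutation of 1..n (Fisher-Yates)."""
    v = list(range(1, n + 1))            # fill v with 1, 2, ..., n
    for i in range(n, 1, -1):            # i = n downto 2
        j = random.randrange(i)          # 0 <= j < i  (the "+ 1" of the Pascal
        v[i - 1], v[j] = v[j], v[i - 1]  #  version is absorbed by 0-indexing)
    return v

print(shuffle(10))
```

At step i the element in position i is swapped with a position chosen uniformly among the first i, so each of the n! permutations is produced with the same probability.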
by means of the above formulas. There exists, however, a symmetry formula which is very useful. Let us begin by observing that:

    \binom{n}{k} = n(n−1)···(n−k+1) / k!
                 = [n···(n−k+1)] · [(n−k)···1] / [k! · (n−k)···1]
                 = n! / (k! (n−k)!)

This is a very important formula in its own right, and it shows that:

    \binom{n}{n−k} = n! / ((n−k)! (n−(n−k))!) = \binom{n}{k}

From this formula we immediately obtain \binom{100}{97} = \binom{100}{3}. The most difficult computing problem is the evaluation of the central binomial coefficient \binom{2k}{k}, for which symmetry gives no help.

The reader is invited to produce a computer program to evaluate binomial coefficients. He (or she) is warned not to use the formula n!/(k!(n−k)!), which can produce very large numbers, exceeding the capacity of the computer when n, k are not small.

The definition of a binomial coefficient can be easily expanded to any real numerator:

    \binom{r}{k} = r^{\underline{k}} / k! = r(r−1)···(r−k+1) / k!.

For example we have:

    \binom{1/2}{3} = (1/2)(−1/2)(−3/2) / 3! = 1/16

but in this case the symmetry rule does not make sense. We point out that:

    \binom{−n}{k} = (−n)(−n−1)···(−n−k+1) / k! = (−1)^k (n+k−1)^{\underline{k}} / k! = (−1)^k \binom{n+k−1}{k}

which allows us to express a binomial coefficient with a negative, integer numerator as a binomial coefficient with a positive numerator. This is known as the negation rule and will be used very often.

If in a combination we are allowed to have several copies of the same element, we obtain a combination with repetitions. A useful exercise is to prove that the number of the k by k combinations with repetitions of n elements is:

    R_{n,k} = \binom{n+k−1}{k}.

2.6 The Pascal triangle

Binomial coefficients satisfy a very important recurrence relation, which we are now going to prove. As we know, \binom{n}{k} is the number of subsets with k elements of a set with n elements. We can count the number of such subsets in the following way. Let {a1, a2, ..., an} be the elements of the base set, and let us fix one of these elements, e.g., an. We can distinguish the subsets with k elements into two classes: the subsets containing an and the subsets that do not contain an. Let S+ and S− be these two classes. We now point out that S+ can be seen (by eliminating an) as the set of subsets with k − 1 elements of a set with n − 1 elements; therefore, the number of elements in S+ is \binom{n−1}{k−1}. The class S− can be seen as composed of the subsets with k elements of a set with n − 1 elements, i.e., the base set minus an; their number is therefore \binom{n−1}{k}. By summing these two contributions, we obtain the recurrence relation:

    \binom{n}{k} = \binom{n−1}{k} + \binom{n−1}{k−1}

which can be used with the initial conditions:

    \binom{n}{0} = \binom{n}{n} = 1    ∀n ∈ N.

For example, we have:

    \binom{4}{2} = \binom{3}{2} + \binom{3}{1}
                 = \binom{2}{2} + \binom{2}{1} + \binom{2}{1} + \binom{2}{0}
                 = 2 + 2\binom{2}{1} = 2 + 2(\binom{1}{1} + \binom{1}{0})
                 = 2 + 2 × 2 = 6.

This recurrence is not particularly suitable for numerical computation. However, it gives a simple rule to compute successively all the binomial coefficients. Let us dispose them in an infinite array, whose rows represent the number n and whose columns represent the number k. The recurrence tells us that the element in position (n, k) is obtained by summing two elements in the previous row: the element just above position (n, k), i.e., in position (n − 1, k), and the element on its left, i.e., in position (n − 1, k − 1). The array is initially filled with 1's in the first column (corresponding to the various \binom{n}{0}) and on the main diagonal (corresponding to the numbers \binom{n}{n}). See Table 2.1. We actually obtain an infinite, lower triangular array known as the Pascal triangle (or Tartaglia-Pascal triangle). The symmetry rule is quite evident, and a simple observation is that the sum of the elements in row n is 2^n. The proof of this fact is immediate, since, by the recurrence, every element of row n contributes exactly twice to the sum of the elements in row n + 1.
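Taking up the invitation above, here is one possible program (a Python sketch, ours rather than the book's): it uses the multiplicative definition together with the symmetry rule, so the intermediate values never exceed the final result times the current factor, and no large factorial is ever formed.

```python
def binomial(n, k):
    """Compute \binom{n}{k} for integer n >= 0 without large factorials."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)              # symmetry rule: C(n, k) = C(n, n-k)
    c = 1
    for j in range(1, k + 1):
        # after j steps c = C(n-k+j, j); the division is always exact
        c = c * (n - k + j) // j
    return c

print(binomial(100, 97))  # → 161700, i.e. C(100, 3) by symmetry
```

The division by `j` is exact at every step because `c * (n - k + j)` equals `j * C(n-k+j, j)`, so the computation stays in (small) integers throughout.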
and therefore the sum cannot be limited. On the other hand we can define:

    H_n = 1 + 1/2 + 1/3 + ··· + 1/n

a finite, partial sum of the harmonic series. This number has a well-defined value and is called a harmonic number. Conventionally, we set H0 = 0, and the sequence of harmonic numbers begins:

    n    0  1  2    3     4      5       6      7        8
    Hn   0  1  3/2  11/6  25/12  137/60  49/20  363/140  761/280

Harmonic numbers arise in the analysis of many algorithms, and it is very useful to know an approximate value for them. Let us consider the series:

    1 − 1/2 + 1/3 − 1/4 + 1/5 − ··· = ln 2

and let us define:

    L_n = 1 − 1/2 + 1/3 − 1/4 + ··· + (−1)^{n−1}/n.

Obviously we have:

    H_{2n} − L_{2n} = 2 (1/2 + 1/4 + ··· + 1/(2n)) = H_n

or H_{2n} = L_{2n} + H_n, and since the series for ln 2 is alternating in sign, the error committed by truncating it at any place is less than the first discarded element. Therefore:

    ln 2 − 1/(2n) < L_{2n} < ln 2

and by summing H_n to all members:

    H_n + ln 2 − 1/(2n) < H_{2n} < H_n + ln 2

Let us now consider the two cases n = 2^{k−1} and n = 2^{k−2}:

    H_{2^{k−1}} + ln 2 − 1/2^k < H_{2^k} < H_{2^{k−1}} + ln 2
    H_{2^{k−2}} + ln 2 − 1/2^{k−1} < H_{2^{k−1}} < H_{2^{k−2}} + ln 2.

By summing and simplifying these two expressions, we obtain:

    H_{2^{k−2}} + 2 ln 2 − 1/2^k − 1/2^{k−1} < H_{2^k} < H_{2^{k−2}} + 2 ln 2.

We can now iterate this procedure and eventually find:

    H_{2^0} + k ln 2 − 1/2^k − 1/2^{k−1} − ··· − 1/2 < H_{2^k} < H_{2^0} + k ln 2.

Since H_{2^0} = H_1 = 1, we have the bounds:

    ln 2^k < H_{2^k} < ln 2^k + 1.

These bounds can be extended to every n, and since the values of the H_n's are increasing, this implies that a constant γ should exist (0 < γ < 1) such that:

    H_n → ln n + γ    as n → ∞.

This constant is called the Euler-Mascheroni constant and, as we have already mentioned, its value is γ ≈ 0.5772156649.... Later we will prove the more accurate approximation of the H_n's we quoted in the Introduction.

The generalized harmonic numbers are defined as:

    H_n^{(s)} = 1/1^s + 1/2^s + 1/3^s + ··· + 1/n^s

and H_n^{(1)} = H_n. They are the partial sums of the series defining the Riemann ζ function:

    ζ(s) = 1/1^s + 1/2^s + 1/3^s + ···

which can be defined in such a way that the sum actually converges, except for s = 1 (the harmonic series). In particular we have:

    ζ(2) = 1 + 1/4 + 1/9 + 1/16 + 1/25 + ··· = π²/6
    ζ(3) = 1 + 1/8 + 1/27 + 1/64 + 1/125 + ···
    ζ(4) = 1 + 1/16 + 1/81 + 1/256 + 1/625 + ··· = π⁴/90

and in general:

    ζ(2n) = (2π)^{2n} |B_{2n}| / (2(2n)!)

where the B_n are the Bernoulli numbers (see below). No explicit formula is known for ζ(2n+1), but numerically we have:

    ζ(3) ≈ 1.202056903....

Because the values ζ(s) are finite, we can set, for large values of n:

    H_n^{(s)} ≈ ζ(s).

2.8 Fibonacci numbers

At the beginning of the thirteenth century, Leonardo Fibonacci introduced in Europe the positional notation for numbers, together with the computing algorithms for performing the four basic operations. In fact, Fibonacci was
[Figure 2.4: Rooted Plane Trees]

from the origin and never going below the x-axis; they are called Dyck walks. An obvious 1-1 correspondence exists between Dyck walks and the walks considered above, and again we obtain the sequence of Catalan numbers:

    n    0  1  2  3  4   5   6    7    8
    bn   1  1  2  5  14  42  132  429  1430

Finally, the concept of a rooted plane tree is as follows: let us consider a node, which is the root of the tree; if we recursively add branches to the root or to the nodes generated by previous insertions, what we obtain is a "rooted plane tree". If n denotes the number of branches in a rooted plane tree, in Fig. 2.4 we represent all the trees up to n = 3. Again, rooted plane trees are counted by the Catalan numbers.

2.10 Stirling numbers of the first kind

About 1730, the Scottish mathematician James Stirling was looking for a connection between the powers of a number x, say x^n, and the falling factorials x^{\underline{k}} = x(x − 1)···(x − k + 1). He developed the first instances:

    x^{\underline{1}} = x
    x^{\underline{2}} = x(x − 1) = x² − x
    x^{\underline{3}} = x(x − 1)(x − 2) = x³ − 3x² + 2x
    x^{\underline{4}} = x(x − 1)(x − 2)(x − 3) = x⁴ − 6x³ + 11x² − 6x

and picking the coefficients in their proper order (from the smallest power to the largest) he obtained a table of integer numbers. We are mostly interested in the absolute values of these numbers, which are shown in Table 2.2. After him, these numbers are called Stirling numbers of the first kind and are now denoted by [n k] (n above k), sometimes read "n cycle k", for the reason we are now going to explain.

    n\k  0  1    2    3    4   5   6
    0    1
    1    0  1
    2    0  1    1
    3    0  2    3    1
    4    0  6    11   6    1
    5    0  24   50   35   10  1
    6    0  120  274  225  85  15  1

    Table 2.2: Stirling numbers of the first kind

First note that the above identities can be written:

    x^{\underline{n}} = \sum_{k=0}^{n} [n k] (−1)^{n−k} x^k.

Let us now observe that x^{\underline{n}} = x^{\underline{n−1}} (x − n + 1), and therefore we have:

    x^{\underline{n}} = (x − n + 1) \sum_{k=0}^{n−1} [n−1 k] (−1)^{n−k−1} x^k
      = \sum_{k=0}^{n−1} [n−1 k] (−1)^{n−k−1} x^{k+1} − (n − 1) \sum_{k=0}^{n−1} [n−1 k] (−1)^{n−k−1} x^k
      = \sum_{k=0}^{n} [n−1 k−1] (−1)^{n−k} x^k +
        (n − 1) \sum_{k=0}^{n} [n−1 k] (−1)^{n−k} x^k.

We performed the change of variable k → k − 1 in the first sum and then extended both sums from 0 to n. This identity is valid for every value of x, and therefore we can equate its coefficients to those of the previous, general Stirling identity, thus obtaining the recurrence relation:

    [n k] = (n − 1) [n−1 k] + [n−1 k−1].

This recurrence, together with the initial conditions:

    [n n] = 1, ∀n ∈ N    and    [n 0] = 0, ∀n > 0,

completely defines the Stirling numbers of the first kind.

What is a possible combinatorial interpretation of these numbers? Let us consider the permutations of n elements and count the permutations having exactly k cycles, whose set will be denoted by Sn,k. If we fix any element, say the last element n, we observe that the permutations in Sn,k can have n as a fixed point, or not. When n is a fixed point and we eliminate it, we obtain a permutation with n − 1 elements having exactly k − 1 cycles; vice versa, any such permutation gives a permutation in Sn,k with n as a fixed point if we add (n) to it. Therefore, there are |Sn−1,k−1| such permutations in Sn,k. When n is not a fixed point and we eliminate it from the permutation, we obtain a permutation with n − 1 elements and k cycles. However, the same permutation is obtained several times, exactly n − 1 times, since n can occur after any other element in the standard cycle representation (it can never occur as the first element in a cycle, by our conventions). For example, all the following permutations in S5,2 produce the same permutation in S4,2:

    (1 2 3)(4 5)   (1 2 3 5)(4)   (1 2 5 3)(4)   (1 5 2 3)(4).

The process can be inverted, and therefore we have:

    |Sn,k| = (n − 1)|Sn−1,k| + |Sn−1,k−1|

which is just the recurrence relation for the Stirling numbers of the first kind. If we now prove that the initial conditions are also the same, we can conclude that |Sn,k| = [n k]. First we observe that Sn,n is only composed of the identity, the only permutation having n cycles, i.e., n fixed points; so |Sn,n| = 1, ∀n ∈ N. Moreover, for n ≥ 1, Sn,0 is empty, because every permutation contains at least one cycle, and so |Sn,0| = 0. This concludes the proof.

As an immediate consequence of this reasoning, we find:

    \sum_{k=0}^{n} [n k] = n!,

i.e., the row sums of the Stirling triangle of the first kind equal n!, because they correspond to the total number of permutations of n objects. We also observe that:

• [n 1] = (n − 1)!; in fact, Sn,1 is composed of all the permutations having a single cycle; such a cycle begins with 1 and is followed by any permutation of the n − 1 remaining numbers;

• [n n−1] = \binom{n}{2}; in fact, Sn,n−1 contains the permutations having all fixed points except a single transposition; but this transposition can only be formed by taking two elements among 1, 2, ..., n, which can be done in \binom{n}{2} different ways;

• [n 2] = (n − 1)! H_{n−1}; returning to the numerical definition, the coefficient of x² is a sum of products, in each of which a positive integer is missing.

2.11 Stirling numbers of the second kind

James Stirling also tried to invert the process described in the previous section; that is, he was also interested in expressing ordinary powers in terms of falling factorials. The first instances are:

    x¹ = x^{\underline{1}}
    x² = x^{\underline{1}} + x^{\underline{2}} = x + x(x − 1)
    x³ = x^{\underline{1}} + 3x^{\underline{2}} + x^{\underline{3}} = x + 3x(x − 1) + x(x − 1)(x − 2)
    x⁴ = x^{\underline{1}} + 7x^{\underline{2}} + 6x^{\underline{3}} + x^{\underline{4}}

The coefficients can be arranged into a triangular array, as shown in Table 2.3, and are called Stirling numbers of the second kind. The usual notation for them is {n k} (n above k), often read "n subset k", for the reason we are going to explain.

Stirling's identities can be globally written as:

    x^n = \sum_{k=0}^{n} {n k} x^{\underline{k}}.

We obtain a recurrence relation in the following way:

    x^n = x·x^{n−1} = x \sum_{k=0}^{n−1} {n−1 k} x^{\underline{k}}
        = \sum_{k=0}^{n−1} {n−1 k} (x − k + k) x^{\underline{k}}
        = \sum_{k=0}^{n−1} {n−1 k} x^{\underline{k+1}} + \sum_{k=0}^{n−1} k {n−1 k} x^{\underline{k}}

and, proceeding as for the first kind (changing the variable in the first sum and equating coefficients), we obtain the recurrence relation:

    {n k} = k {n−1 k} + {n−1 k−1}.
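Both recurrences are easy to check numerically. The following small sketch (Python rather than the book's Pascal; the function names are ours) builds both kinds of Stirling numbers and verifies the row properties stated above, together with row 6 of Table 2.2:

```python
from math import comb, factorial

def stirling1(n, k):
    """Unsigned Stirling numbers of the first kind:
       [n k] = (n-1)[n-1 k] + [n-1 k-1], with [0 0] = 1."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return (n - 1) * stirling1(n - 1, k) + stirling1(n - 1, k - 1)

def stirling2(n, k):
    """Stirling numbers of the second kind:
       {n k} = k {n-1 k} + {n-1 k-1}, with {0 0} = 1."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

# row 6 of Table 2.2 and the properties stated in the text
assert [stirling1(6, k) for k in range(7)] == [0, 120, 274, 225, 85, 15, 1]
assert sum(stirling1(6, k) for k in range(7)) == factorial(6)  # row sums = n!
assert stirling1(6, 1) == factorial(5)                         # [n 1] = (n-1)!
assert stirling1(6, 5) == comb(6, 2)                           # [n n-1] = C(n, 2)
# coefficients of x^4 as a combination of falling factorials: 1, 7, 6, 1
assert [stirling2(4, k) for k in range(1, 5)] == [1, 7, 6, 1]
print("all checks passed")
```

Note that [6 2] = 274 = 5! · H₅ = 120 · 137/60, in agreement with the third bullet above.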
    {1} ∪ {2, 3, 4}   {1, 2} ∪ {3, 4}   {1, 3} ∪ {2, 4}
    {1, 4} ∪ {2, 3}   {1, 2, 3} ∪ {4}
    {1, 2, 4} ∪ {3}   {1, 3, 4} ∪ {2}.

If Pn,k is the corresponding set, we now count |Pn,k| by fixing an element in Nn, say the last element n. The partitions in Pn,k can contain n as a singleton (i.e., as a subset with n as its only element) or can contain n as an element of a larger subset. In the former case, by eliminating {n} we obtain a partition in Pn−1,k−1 and, obviously, all partitions in Pn−1,k−1 can be obtained in such a way. When n belongs to a larger set, we can eliminate it, obtaining a partition

2.12 Bell and Bernoulli numbers

If we sum the rows of the Stirling triangle of the second kind, we find a sequence:

    n    0  1  2  3  4   5   6    7    8
    Bn   1  1  2  5  15  52  203  877  4140

which represents the total number of partitions of the set Nn. For example, the five partitions of a set with three elements are:

    {1, 2, 3}   {1} ∪ {2, 3}   {1, 2} ∪ {3}
    {1, 3} ∪ {2}   {1} ∪ {2} ∪ {3}.

The numbers in this sequence are called Bell numbers and are denoted by Bn; by definition we have:

    B_n = \sum_{k=0}^{n} {n k}.

Bell numbers grow very fast; however, since {n k} ≤ [n k] for every value of n and k (a subset in Pn,k corresponds to one or more cycles in Sn,k), we always have Bn ≤ n!, and in fact Bn < n! for every n > 1.

Another frequently occurring sequence is obtained by ordering the subsets appearing in the partitions of Nn. For example, the partition {1} ∪ {2} ∪ {3} can be ordered in 3! = 6 different ways, and {1, 2} ∪ {3} can be ordered in 2! = 2 ways, i.e., {1, 2} ∪ {3} and {3} ∪ {1, 2}. These are called ordered partitions, and their number is denoted by On. By the previous example, we easily see that O3 = 13, and the sequence begins:

    n    0  1  2  3   4   5    6     7
    On   1  1  3  13  75  541  4683  47293

Because of this definition, the numbers On are called ordered Bell numbers and we have:

    O_n = \sum_{k=0}^{n} {n k} k!;

this shows that On ≥ n! and, in fact, On > n!, ∀n > 1.

Another combinatorial interpretation of the ordered Bell numbers is as follows. Let us fix an integer n ∈ N and, for every k ≤ n, let Ak be any multiset with n elements containing at least once all the numbers 1, 2, ..., k. The number of all the possible orderings of the Ak's is just the nth ordered Bell number. For example, when n = 3, the possible multisets are: {1, 1, 1}, {1, 1, 2}, {1, 2, 2}, {1, 2, 3}. Their possible orderings are given by the following 7 vectors:

    (1, 1, 1)   (1, 1, 2)   (1, 2, 1)   (2, 1, 1)   (1, 2, 2)   (2, 1, 2)   (2, 2, 1)

plus the six permutations of the set {1, 2, 3}. These orderings are called preferential arrangements.

We can find a 1-1 correspondence between the orderings of set partitions and preferential arrangements. If (a1, a2, ..., an) is a preferential arrangement, we build the corresponding ordered partition by putting the element 1 in the a1th subset, 2 in the a2th subset, and so on. If k is the largest number in the arrangement, we build exactly k subsets. For example, the partition corresponding to (1, 2, 2, 1) is {1, 4} ∪ {2, 3}, while the partition corresponding to (2, 1, 1, 2) is {2, 3} ∪ {1, 4}, whose ordering is different. This construction can be easily inverted, and since it is injective, we have proved that it is actually a 1-1 correspondence. Because of that, ordered Bell numbers are also called preferential arrangement numbers.

We conclude this section by introducing another important sequence of numbers. These are (positive or negative) rational numbers, and therefore they cannot correspond to any counting problem, i.e., their combinatorial interpretation cannot be direct. However, they arise in many combinatorial problems and therefore should be examined here, for the moment only introducing their definition. The Bernoulli numbers are implicitly defined by the recurrence relation:

    \sum_{k=0}^{n} \binom{n+1}{k} B_k = δ_{n,0}.

No initial condition is necessary, because for n = 0 we have \binom{1}{0} B_0 = 1, i.e., B_0 = 1. This is the starting value, and B_1 is obtained by setting n = 1 in the recurrence relation:

    \binom{2}{0} B_0 + \binom{2}{1} B_1 = 0.

We obtain B_1 = −1/2, and we now have a formula for B_2:

    \binom{3}{0} B_0 + \binom{3}{1} B_1 + \binom{3}{2} B_2 = 0.

By performing the necessary computations, we find B_2 = 1/6, and we can go on successively obtaining all the possible values of the Bn's. The first twelve values are as follows:

    n    0  1     2    3  4      5  6
    Bn   1  −1/2  1/6  0  −1/30  0  1/42

    n    7  8      9  10    11  12
    Bn   0  −1/30  0  5/66  0   −691/2730

Except for B1, all the values of Bn for odd n are zero. Initially, the Bernoulli numbers seem to be small, but as n grows they become extremely large in modulus; apart from the zero values, they are alternately positive and negative. These and other properties of the Bernoulli numbers are not easily proven in a direct way, i.e., from their definition. However, we will see later how we can arrange things in such a way that everything becomes accessible to us.
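The successive-solution procedure described above is directly programmable; since the Bernoulli numbers are rationals, exact arithmetic is needed. A sketch (Python, with `fractions.Fraction`; ours, not the book's code):

```python
from fractions import Fraction
from math import comb

def bernoulli(m):
    """B_0 .. B_m from the implicit recurrence
       sum_{k=0}^{n} C(n+1, k) B_k = [n = 0]."""
    B = []
    for n in range(m + 1):
        if n == 0:
            B.append(Fraction(1))       # C(1,0) B_0 = 1
            continue
        # for n >= 1 the sum is 0; the B_n term has coefficient C(n+1, n) = n+1
        s = sum(Fraction(comb(n + 1, k)) * B[k] for k in range(n))
        B.append(-s / (n + 1))
    return B

B = bernoulli(12)
print(B[12])  # → Fraction(-691, 2730), as in the table
```

Each new B_n is obtained from the single linear equation in which it first appears, exactly as done by hand above for B_1 and B_2.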
Chapter 3

Formal power series
3.1 Definitions for formal power series

Let R be the field of real numbers and let t be any indeterminate over R, i.e., a symbol different from any element in R. A formal power series (f.p.s.) over R in the indeterminate t is an expression:

    f(t) = f_0 + f_1 t + f_2 t² + f_3 t³ + ··· + f_n t^n + ··· = \sum_{k=0}^{∞} f_k t^k

where f0, f1, f2, ... are all real numbers. The same definition applies to every set of numbers, in particular to the field of rational numbers Q and to the field of complex numbers C. The developments we are now going to see depend only on the field structure of the numeric set, and can be easily extended to every field F of characteristic 0. The set of formal power series over F in the indeterminate t is denoted by F[[t]]. The use of a particular indeterminate t is irrelevant, and there exists an obvious 1-1 correspondence between, say, F[[t]] and F[[y]]; it is simple to prove that this correspondence is indeed an isomorphism. In order to stress that our results are substantially independent of the particular field F and of the particular indeterminate t, we denote F[[t]] by F, but the reader can think of F as R[[t]]. In fact, in combinatorial analysis and in the analysis of algorithms, the coefficients f0, f1, f2, ... of a formal power series are mostly used to count objects, and therefore they are positive integer numbers or, in some cases, positive rational numbers (e.g., when they are the coefficients of an exponential generating function; see below and Section 4.1).

If f(t) ∈ F, the order of f(t), denoted by ord(f(t)), is the smallest index r for which f_r ≠ 0. The set of all f.p.s. of order exactly r is denoted by F_r or by F_r[[t]]. The formal power series 0 = 0 + 0t + 0t² + 0t³ + ··· has infinite order.

If (f0, f1, f2, ...) = (f_k)_{k∈N} is a sequence of (real) numbers, there is no substantial difference between the sequence and the f.p.s. \sum_{k=0}^{∞} f_k t^k, which will be called the (ordinary) generating function of the sequence. The term ordinary is used to distinguish these functions from exponential generating functions, which will be introduced in the next chapter. The indeterminate t is used as a "place-marker", i.e., a symbol to denote the place of an element in the sequence. For example, in the f.p.s. 1 + t + t² + t³ + ···, corresponding to the sequence (1, 1, 1, ...), the term t⁵ = 1·t⁵ simply denotes that the element in position 5 (starting from 0) in the sequence is the number 1.

Although our study of f.p.s. is mainly justified by the development of a generating function theory, we dedicate the present chapter to the general theory of f.p.s. and postpone the study of generating functions to the next chapter. There are two main reasons why f.p.s. are more easily studied than sequences:

1. the algebraic structure of f.p.s. is very well understood and can be developed in a standard way;

2. many f.p.s. can be "abbreviated" by expressions easily manipulated by elementary algebra.

The present chapter is devoted to these algebraic aspects of f.p.s.. For example, we will prove that the series 1 + t + t² + t³ + ··· can be conveniently abbreviated as 1/(1 − t), and from this fact we will be able to infer that the series has an inverse f.p.s., which is 1 − t + 0t² + 0t³ + ···.

We conclude this section by defining the concept of a formal Laurent (power) series (f.L.s.) as an expression:

    g(t) = g_{−m} t^{−m} + g_{−m+1} t^{−m+1} + ··· + g_{−1} t^{−1} + g_0 + g_1 t + g_2 t² + ··· = \sum_{k=−m}^{∞} g_k t^k.

The set of f.L.s. strictly contains the set of f.p.s.. For a f.L.s. g(t) the order can be negative; when the order of g(t) is non-negative, then g(t) is actually a f.p.s.. We observe explicitly that an expression such as \sum_{k=−∞}^{∞} f_k t^k does not represent a f.L.s..
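On a computer, a f.p.s. can only be stored truncated to finitely many coefficients; with that caveat, the sequence/series identification and the notion of order are immediate to mirror. A minimal sketch (Python; ours, not the book's):

```python
def order(f):
    """ord(f): smallest index r with f_r != 0, for a truncated f.p.s.
    given as its list of coefficients [f_0, f_1, f_2, ...]."""
    for r, coeff in enumerate(f):
        if coeff != 0:
            return r
    # every stored coefficient is 0: treat as the zero series (infinite order)
    return float('inf')

geometric = [1, 1, 1, 1, 1]        # 1 + t + t^2 + ... truncated at t^4
print(order(geometric))            # → 0 (an invertible series, see Section 3.2)
print(order([0, 0, 3, 1]))         # → 2
```

Truncation is harmless for the algebraic manipulations that follow, since the coefficient of t^k in a sum or product only depends on coefficients of index at most k.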
3.2 The basic algebraic structure

The set F of f.p.s. can be embedded into several algebraic structures. We are now going to define the most common one, which is related to the usual concepts of sum and (Cauchy) product of series. Given two f.p.s. f(t) = \sum_{k=0}^{∞} f_k t^k and g(t) = \sum_{k=0}^{∞} g_k t^k, the sum of f(t) and g(t) is defined as:

    f(t) + g(t) = \sum_{k=0}^{∞} f_k t^k + \sum_{k=0}^{∞} g_k t^k = \sum_{k=0}^{∞} (f_k + g_k) t^k.

From this definition, it immediately follows that F is a commutative group with respect to the sum. The associative and commutative laws directly follow from the analogous properties in the field F; the identity is the f.p.s. 0 = 0 + 0t + 0t² + 0t³ + ···, and the opposite of f(t) = \sum_{k=0}^{∞} f_k t^k is the series −f(t) = \sum_{k=0}^{∞} (−f_k) t^k.

Let us now define the Cauchy product of f(t) by g(t):

    f(t)g(t) = (\sum_{k=0}^{∞} f_k t^k)(\sum_{k=0}^{∞} g_k t^k) = \sum_{k=0}^{∞} (\sum_{j=0}^{k} f_j g_{k−j}) t^k.

Because of the form of the t^k coefficient, this is also called the convolution of f(t) and g(t). It is a good idea to write down explicitly the first terms of the Cauchy product:

    f(t)g(t) = f_0 g_0 + (f_0 g_1 + f_1 g_0) t + (f_0 g_2 + f_1 g_1 + f_2 g_0) t² + (f_0 g_3 + f_1 g_2 + f_2 g_1 + f_3 g_0) t³ + ···

This clearly shows that the product is commutative, and it is a simple matter to prove that the identity is the f.p.s. 1 = 1 + 0t + 0t² + 0t³ + ···. The distributive law is a consequence of the distributive law valid in F. In fact, we have:

    (f(t) + g(t)) h(t) = \sum_{k=0}^{∞} (\sum_{j=0}^{k} (f_j + g_j) h_{k−j}) t^k
      = \sum_{k=0}^{∞} (\sum_{j=0}^{k} f_j h_{k−j}) t^k + \sum_{k=0}^{∞} (\sum_{j=0}^{k} g_j h_{k−j}) t^k
      = f(t)h(t) + g(t)h(t).

Finally, we can prove that F does not contain any zero divisor. If f(t) and g(t) are two f.p.s. different from zero, then we can suppose that ord(f(t)) = k1 and ord(g(t)) = k2, with 0 ≤ k1, k2 < ∞. This means f_{k1} ≠ 0 and g_{k2} ≠ 0; therefore, the product f(t)g(t) has the term of degree k1 + k2 with coefficient f_{k1} g_{k2} ≠ 0, and so it cannot be zero. We conclude that (F, +, ·) is an integrity domain.

The previous reasoning also shows that, in general, we have:

    ord(f(t)g(t)) = ord(f(t)) + ord(g(t)).

The order of the identity 1 is obviously 0; if f(t) is an invertible element of F, we should have f(t)f(t)^{−1} = 1 and therefore ord(f(t)) = 0. On the other hand, if f(t) ∈ F_0, i.e., f(t) = f_0 + f_1 t + f_2 t² + f_3 t³ + ··· with f_0 ≠ 0, we can easily prove that f(t) is invertible. In fact, let g(t) = f(t)^{−1}, so that f(t)g(t) = 1. From the explicit expression for the Cauchy product, we can determine the coefficients of g(t) by solving the infinite system of linear equations:

    f_0 g_0 = 1
    f_0 g_1 + f_1 g_0 = 0
    f_0 g_2 + f_1 g_1 + f_2 g_0 = 0
    ···

The system can be solved in a simple way, starting with the first equation and going on one equation after the other. Explicitly, we obtain:

    g_0 = f_0^{−1}    g_1 = −f_1/f_0²    g_2 = f_1²/f_0³ − f_2/f_0²    ···

and therefore g(t) = f(t)^{−1} is well defined. We conclude by stating the result just obtained: a f.p.s. is invertible if and only if its order is 0. Because of that, F_0 is also called the set of invertible f.p.s.. According to standard terminology, the elements of F_0 are called the units of the integrity domain.

As a simple example, let us compute the inverse of the f.p.s. 1 − t = 1 − t + 0t² + 0t³ + ···. Here we have f_0 = 1, f_1 = −1 and f_k = 0, ∀k > 1. The system becomes:

    g_0 = 1
    g_1 − g_0 = 0
    g_2 − g_1 = 0
    ···

and we easily obtain that all the g_j's (j = 0, 1, 2, ...) are 1. Therefore the inverse f.p.s. we are looking for is 1 + t + t² + t³ + ···. The usual notation for this fact is:

    1/(1 − t) = 1 + t + t² + t³ + ···.

It is well known that this identity is only valid for −1 < t < 1 when t is a variable and f.p.s. are interpreted as functions. In our formal approach, however, these considerations are irrelevant and the identity is valid from a purely formal point of view.
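The triangular system above solves itself coefficient by coefficient, which is exactly how one computes f.p.s. inverses in practice. A sketch with truncated series as coefficient lists (Python; ours, not the book's code):

```python
def cauchy_product(f, g):
    """Convolution of two truncated f.p.s. (lists of coefficients)."""
    n = min(len(f), len(g))
    return [sum(f[j] * g[k - j] for j in range(k + 1)) for k in range(n)]

def inverse(f, n):
    """First n coefficients of f(t)^-1, solving the triangular system
       f0*g0 = 1 and sum_{j<=k} f_j g_{k-j} = 0 for k >= 1 (needs f[0] != 0)."""
    g = [1 / f[0]]
    for k in range(1, n):
        s = sum(f[j] * g[k - j] for j in range(1, min(k, len(f) - 1) + 1))
        g.append(-s / f[0])
    return g

g = inverse([1, -1], 6)                      # inverse of 1 - t
print(g)                                     # → [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(cauchy_product([1, -1, 0, 0, 0, 0], g))  # → [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The first output confirms that 1/(1 − t) = 1 + t + t² + ···, and the second that the product of the two series is the identity 1 (up to the truncation order).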
3.3 Formal Laurent Series

In the first section of this Chapter we introduced the concept of a formal Laurent series as an extension of the concept of a f.p.s.; if $a(t) = \sum_{k=m}^{\infty} a_k t^k$ and $b(t) = \sum_{k=n}^{\infty} b_k t^k$ ($m, n \in \mathbb{Z}$) are two f.L.s., we can define the sum and the Cauchy product:

$$a(t) + b(t) = \sum_{k=m}^{\infty} a_k t^k + \sum_{k=n}^{\infty} b_k t^k = \sum_{k=p}^{\infty} (a_k + b_k) t^k$$

$$a(t)b(t) = \left(\sum_{k=m}^{\infty} a_k t^k\right)\left(\sum_{k=n}^{\infty} b_k t^k\right) = \sum_{k=q}^{\infty} \left(\sum_{i+j=k} a_i b_j\right) t^k$$

where $p = \min(m, n)$ and $q = m + n$. As we did for f.p.s., it is not difficult to find out that these operations enjoy the usual properties of sum and product, and if we denote by $L$ the set of f.L.s., we have that $(L, +, \cdot)$ is a field. The only point we should formally prove is that every f.L.s. $a(t) = \sum_{k=m}^{\infty} a_k t^k \neq 0$ has an inverse f.L.s. $b(t) = \sum_{k=-m}^{\infty} b_k t^k$. However, this is proved in the same way we proved that every f.p.s. in $F_0$ has an inverse. In fact we should have:

$$a_m b_{-m} = 1$$
$$a_m b_{-m+1} + a_{m+1} b_{-m} = 0$$
$$a_m b_{-m+2} + a_{m+1} b_{-m+1} + a_{m+2} b_{-m} = 0$$
$$\cdots$$

By solving the first equation, we find $b_{-m} = a_m^{-1}$; then the system can be solved one equation after the other, by substituting the values obtained up to that moment. Since $a_m b_{-m}$ is the coefficient of $t^0$, we have $a(t)b(t) = 1$ and the proof is complete.

We can now show that $(L, +, \cdot)$ is the smallest field containing the integrity domain $(F, +, \cdot)$, thus characterizing the set of f.L.s. in an algebraic way. From Algebra we know that, given an integrity domain $(K, +, \cdot)$, the smallest field $(F, +, \cdot)$ containing $(K, +, \cdot)$ can be built in the following way: let us define an equivalence relation $\sim$ on the set $K \times K$:

$$(a, b) \sim (c, d) \iff ad = bc;$$

if we now set $F = K \times K / \sim$, the set $F$ with the operations $+$ and $\cdot$ defined as the extension of $+$ and $\cdot$ in $K$ is the field we are searching for. This is just the way in which the field $\mathbb{Q}$ of rational numbers is constructed from the integrity domain $\mathbb{Z}$ of integer numbers, and the field of rational functions is built from the integrity domain of the polynomials.

Our aim is to show that the field $(L, +, \cdot)$ of f.L.s. is isomorphic with the field constructed in the described way starting with the integrity domain of f.p.s.. Let $\widehat{L} = F \times F$ be the set of pairs of f.p.s.; we begin by showing that for every $(f(t), g(t)) \in \widehat{L}$ there exists a pair $(a(t), b(t)) \in \widehat{L}$ such that $(f(t), g(t)) \sim (a(t), b(t))$ (i.e., $f(t)b(t) = g(t)a(t)$) and at least one between $a(t)$ and $b(t)$ belongs to $F_0$. In fact, let $p = \min(\mathrm{ord}(f(t)), \mathrm{ord}(g(t)))$ and let us define $a(t), b(t)$ by $f(t) = t^p a(t)$ and $g(t) = t^p b(t)$; obviously, either $a(t) \in F_0$ or $b(t) \in F_0$, or both are invertible f.p.s.. We now have:

$$b(t)f(t) = b(t)\,t^p a(t) = t^p b(t)a(t) = g(t)a(t)$$

and this shows that $(a(t), b(t)) \sim (f(t), g(t))$.

If $b(t) \in F_0$, then $a(t)/b(t) \in F$ and is uniquely determined by $a(t), b(t)$; in this case, therefore, our assertion is proved. So, let us now suppose that $b(t) \not\in F_0$; then we can write $b(t) = t^m v(t)$, where $v(t) \in F_0$. We have $a(t)/v(t) = \sum_{k=0}^{\infty} d_k t^k \in F_0$, and consequently let us consider the f.L.s. $l(t) = \sum_{k=0}^{\infty} d_k t^{k-m}$; by construction, it is uniquely determined by $a(t), b(t)$, or also by $f(t), g(t)$. It is now easy to see that $l(t)$ is the inverse of the f.p.s. $b(t)/a(t)$ in the sense of f.L.s. as considered above, and our proof is complete. This gives a 1-1 correspondence between $L$ and $\widehat{L}$ preserving the inverse, so it is now obvious that the correspondence is also an isomorphism between $(L, +, \cdot)$ and $(\widehat{L}, +, \cdot)$.

Because of this result, we can identify $\widehat{L}$ and $L$ and assert that $(L, +, \cdot)$ is indeed the smallest field containing $(F, +, \cdot)$. From now on, the set $\widehat{L}$ will be ignored and we will always refer to $L$ as the field of f.L.s..

3.4 Operations on formal power series

Besides the four basic operations (addition, subtraction, multiplication and division), it is possible to consider other operations on $F$, only a few of which can be extended to $L$.

The most important operation is surely taking a power of a f.p.s.; if $p \in \mathbb{N}$ we can recursively define:

$$f(t)^p = \begin{cases} 1 & \text{if } p = 0 \\ f(t)\, f(t)^{p-1} & \text{if } p > 0 \end{cases}$$

and observe that $\mathrm{ord}(f(t)^p) = p\,\mathrm{ord}(f(t))$. Therefore, $f(t)^p \in F_0$ if and only if $f(t) \in F_0$; on the other hand, if $f(t) \not\in F_0$, then the order of $f(t)^p$ becomes larger and larger and goes to $\infty$ when $p \to \infty$. This property will be important in our future developments, when we will reduce many operations to
infinite sums involving the powers $f(t)^p$ with $p \in \mathbb{N}$. If $f(t) \not\in F_0$, i.e., $\mathrm{ord}(f(t)) > 0$, these sums involve elements of larger and larger order, and therefore for every index $k$ we can determine the coefficient of $t^k$ by only a finite number of terms. This assures that our definitions will be good definitions.

We wish also to observe that taking a positive integer power can be easily extended to $L$; in this case, when $\mathrm{ord}(f(t)) < 0$, $\mathrm{ord}(f(t)^p)$ decreases, but always remains finite. In particular, for $g(t) = f(t)^{-1}$ we have $g(t)^p = f(t)^{-p}$, and powers can be extended to all integers $p \in \mathbb{Z}$.

When the exponent $p$ is an arbitrary real or complex number, we should restrict $f(t)^p$ to the case $f(t) \in F_0$; in fact, if $f(t) = t^m g(t)$, we would have $f(t)^p = (t^m g(t))^p = t^{mp} g(t)^p$; however, $t^{mp}$ is an expression without any mathematical sense. Instead, if $f(t) \in F_0$, let us write $f(t) = f_0 + \widehat{v}(t)$, with $\mathrm{ord}(\widehat{v}(t)) > 0$. For $v(t) = \widehat{v}(t)/f_0$, we have by Newton's rule:

$$f(t)^p = (f_0 + \widehat{v}(t))^p = f_0^p (1 + v(t))^p = f_0^p \sum_{k=0}^{\infty} \binom{p}{k} v(t)^k,$$

which can be assumed as a definition. In the last expression, we can observe that: i) $f_0^p \in \mathbb{C}$; ii) $\binom{p}{k}$ is defined for every value of $p$, $k$ being a non-negative integer; iii) $v(t)^k$ is well-defined by the considerations above, and $\mathrm{ord}(v(t)^k)$ grows indefinitely, so that for every $k$ the coefficient of $t^k$ is obtained by a finite sum. We can conclude that $f(t)^p$ is well-defined.

Particular cases are $p = -1$ and $p = 1/2$. In the former case, $f(t)^{-1}$ is the inverse of the f.p.s. $f(t)$. We have already seen a method for computing $f(t)^{-1}$, but now we obtain the following formula:

$$f(t)^{-1} = \frac{1}{f_0} \sum_{k=0}^{\infty} \binom{-1}{k} v(t)^k = \frac{1}{f_0} \sum_{k=0}^{\infty} (-1)^k v(t)^k.$$

For $p = 1/2$, we obtain a formula for the square root of a f.p.s.:

$$f(t)^{1/2} = \sqrt{f(t)} = \sqrt{f_0} \sum_{k=0}^{\infty} \binom{1/2}{k} v(t)^k = \sqrt{f_0} \sum_{k=0}^{\infty} \frac{(-1)^{k-1}}{4^k (2k-1)} \binom{2k}{k} v(t)^k.$$

In Section 3.12 we will see how $f(t)^p$ can be obtained computationally without actually performing the powers $v(t)^k$. We conclude by observing that this more general operation of taking the power $p \in \mathbb{R}$ cannot be extended to f.L.s.: in fact, we would have smaller and smaller terms $t^k$ ($k \to -\infty$), and therefore the resulting expression cannot be considered an actual f.L.s., which requires a term with smallest degree.

By applying well-known rules of the exponential and logarithmic functions, we can easily define the corresponding operations for f.p.s., which, however, as will be apparent, cannot be extended to f.L.s.. For the exponentiation we have, for $f(t) \in F_0$, $f(t) = f_0 + v(t)$:

$$e^{f(t)} = \exp(f_0 + v(t)) = e^{f_0} \sum_{k=0}^{\infty} \frac{v(t)^k}{k!}.$$

Again, since $v(t) \not\in F_0$, the order of $v(t)^k$ increases with $k$, and the sums necessary to compute the coefficient of $t^k$ are always finite. The formula makes clear that exponentiation can be performed on every $f(t) \in F$, and when $f(t) \not\in F_0$ the factor $e^{f_0}$ is not present.

For the logarithm, let us suppose $f(t) \in F_0$, $f(t) = f_0 + \widehat{v}(t)$, $v(t) = \widehat{v}(t)/f_0$; then we have:

$$\ln(f_0 + \widehat{v}(t)) = \ln f_0 + \ln(1 + v(t)) = \ln f_0 + \sum_{k=1}^{\infty} (-1)^{k+1} \frac{v(t)^k}{k}.$$

In this case, for $f(t) \not\in F_0$, we cannot define the logarithm, and this shows an asymmetry between exponential and logarithm.

Another important operation is differentiation:

$$D f(t) = \frac{d}{dt} f(t) = \sum_{k=1}^{\infty} k f_k t^{k-1} = f'(t).$$

This operation can be performed on every $f(t) \in L$, and a very important observation is the following:

Theorem 3.4.1 For every $f(t) \in L$, its derivative $f'(t)$ does not contain any term in $t^{-1}$.

Proof: In fact, by the general rule, the term in $t^{-1}$ could only originate from the constant term (i.e., the term in $t^0$) of $f(t)$, but the multiplication by $k = 0$ reduces it to $0$.

This fact will be the basis for very important results in the theory of f.p.s. and f.L.s. (see Section 3.8). Another operation is integration; because indefinite integration leaves a constant term undefined, we prefer to introduce and use only definite integration; for $f(t) \in F$ this is defined as:

$$\int_0^t f(\tau)\,d\tau = \sum_{k=0}^{\infty} \int_0^t f_k \tau^k\,d\tau = \sum_{k=0}^{\infty} \frac{f_k}{k+1}\, t^{k+1}.$$

Our purely formal approach allows us to exchange the integration and summation signs; in general, as we know, this is only possible when the convergence is uniform. By this definition, $\int_0^t f(\tau)\,d\tau$ never belongs to $F_0$. Integration can be extended to f.L.s. with an
obvious exception: because integration is the inverse operation of differentiation, we cannot apply integration to a f.L.s. containing a term in $t^{-1}$. Formally, from the definition above, such a term would imply a division by $0$, and this is not allowed. In all the other cases, integration does not create any problem.

Composition is always associative, and therefore $(F_1, \circ)$ is a group if we prove that every $f(t) \in F_1$ has a left (or right) inverse, because the theory assures that the other inverse exists and coincides with the previously found one. Let $f(t) = f_1 t + f_2 t^2 + f_3 t^3 + \cdots$ and $g(t) = g_1 t + g_2 t^2 + g_3 t^3 + \cdots$.
(linearity) $[t^n](\alpha f(t) + \beta g(t)) = \alpha [t^n] f(t) + \beta [t^n] g(t)$ (K1)

The symbol $[t^n]$ is called an operator; precisely, it is the "coefficient of" operator or, more simply, the coefficient operator. In Table 3.1 we state formally the main properties of this operator, by collecting what we said in the previous sections. We observe that $\alpha, \beta \in \mathbb{R}$ or $\alpha, \beta \in \mathbb{C}$ are any constants; the use of the indeterminate $y$ is only necessary not to confuse the action on different f.p.s.; because $g(0) = 0$ in composition, the last sum is actually finite. Some points require more lengthy comments. The property of shifting can be easily generalized to $[t^n] t^k f(t) = [t^{n-k}] f(t)$ and also to negative powers: $[t^n] f(t)/t^k = [t^{n+k}] f(t)$. These rules are very important and are often applied in the theory of f.p.s. and f.L.s.. In the former case, some care should be exercised to see whether the properties remain in the realm of $F$ or go beyond it, invading the domain of $L$, which can be not always correct. The property of differentiation for $n = 1$ gives $[t^{-1}] f'(t) = 0$, a situation we already noticed. The operator $[t^{-1}]$ is also called the residue and is denoted by "res"; so, for example, people write $\operatorname{res} f'(t) = 0$, and some authors use the notation $\operatorname{res} f(t)/t^{n+1}$ for $[t^n] f(t)$.

We will have many occasions to apply rules (K1)–(K5) of coefficient extraction. However, just to give a meaningful example, let us find the coefficient of $t^n$ in the series expansion of $(1 + \alpha t)^r$, where $\alpha$ and $r$ are any two real numbers. Rule (K3) can be written in the form $[t^n] f(t) = \frac{1}{n}[t^{n-1}] f'(t)$, and we can successively apply this form to our case:

$$[t^n](1+\alpha t)^r = \frac{r\alpha}{n}[t^{n-1}](1+\alpha t)^{r-1} = \frac{r\alpha}{n}\,\frac{(r-1)\alpha}{n-1}[t^{n-2}](1+\alpha t)^{r-2} = \cdots = \frac{r\alpha}{n}\,\frac{(r-1)\alpha}{n-1}\cdots\frac{(r-n+1)\alpha}{1}[t^0](1+\alpha t)^{r-n} = \binom{r}{n}\alpha^n\,[t^0](1+\alpha t)^{r-n}.$$

We now observe that $[t^0](1+\alpha t)^{r-n} = 1$ because of our observations on f.p.s. operations. Therefore, we conclude with the so-called Newton's rule:

$$[t^n](1+\alpha t)^r = \binom{r}{n}\alpha^n,$$

which is one of the most frequently used results in coefficient extraction. Let us remark explicitly that when $r = -1$ (the geometric series) we have:

$$[t^n]\frac{1}{1+\alpha t} = \binom{-1}{n}\alpha^n = (-1)^n \binom{1+n-1}{n}\alpha^n = (-\alpha)^n.$$

A simple but important use of Newton's rule concerns the extraction of the coefficient of $t^n$ from the inverse of a trinomial $at^2 + bt + c$, in the case it is reducible, i.e., it can be written $(1+\alpha t)(1+\beta t)$; obviously, we can always reduce the constant $c$ to $1$ and, by the linearity rule, take it outside the "coefficient of" operator. Therefore, our aim is to compute:

$$[t^n]\frac{1}{(1+\alpha t)(1+\beta t)}$$

with $\alpha \neq \beta$, otherwise Newton's rule would be immediately applicable. The problem can be solved by using the technique of partial fraction expansion. We look for two constants $A$ and $B$ such that:

$$\frac{1}{(1+\alpha t)(1+\beta t)} = \frac{A}{1+\alpha t} + \frac{B}{1+\beta t} = \frac{A + A\beta t + B + B\alpha t}{(1+\alpha t)(1+\beta t)};$$

if two such constants exist, the numerator in the first expression should equal the numerator in the last one, independently of $t$, or, if one so prefers, for every value of $t$. Therefore, the term $A + B$ should be equal to $1$, while the term $(A\beta + B\alpha)t$ should always be $0$. The values for $A$ and $B$ are therefore the solution of the linear system:

$$\begin{cases} A + B = 1 \\ A\beta + B\alpha = 0. \end{cases}$$
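The partial fraction expansion can be checked numerically. The Python sketch below (exact rational arithmetic; the helper name `inverse` is mine) solves the $2\times 2$ system for $A$ and $B$, expands $A/(1+\alpha t) + B/(1+\beta t)$ with the geometric series of Newton's rule, and compares the result with the direct f.p.s. inverse of $(1+\alpha t)(1+\beta t)$:

```python
from fractions import Fraction

def inverse(f, n):
    """First n coefficients of 1/f(t), with f[0] != 0."""
    b = [Fraction(1) / f[0]]
    for k in range(1, n):
        s = sum(f[j] * b[k - j] for j in range(1, min(k, len(f) - 1) + 1))
        b.append(-s / f[0])
    return b

alpha, beta = Fraction(2), Fraction(5)
# solving A + B = 1, A*beta + B*alpha = 0 (possible since alpha != beta)
A = alpha / (alpha - beta)
B = -beta / (alpha - beta)
# [t^n] of A/(1+alpha t) + B/(1+beta t) via the geometric series
pf = [A * (-alpha) ** n + B * (-beta) ** n for n in range(8)]
tri = inverse([Fraction(1), alpha + beta, alpha * beta], 8)
print(pf == tri)  # True
```

The agreement of the two coefficient lists is exactly the content of the partial fraction expansion.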
The discriminant of this system is $\alpha - \beta$, which is always different from $0$ because of our hypothesis $\alpha \neq \beta$. The system therefore has exactly one solution, which is $A = \alpha/(\alpha-\beta)$ and $B = -\beta/(\alpha-\beta)$. We can now substitute these values in the expression above:

$$[t^n]\frac{1}{(1+\alpha t)(1+\beta t)} = \frac{1}{\alpha-\beta}\,[t^n]\left(\frac{\alpha}{1+\alpha t} - \frac{\beta}{1+\beta t}\right) = \frac{1}{\alpha-\beta}\left([t^n]\frac{\alpha}{1+\alpha t} - [t^n]\frac{\beta}{1+\beta t}\right) = (-1)^n\,\frac{\alpha^{n+1} - \beta^{n+1}}{\alpha - \beta}.$$

Let us now consider a trinomial $1 + bt + ct^2$ for which $\Delta = b^2 - 4c < 0$ and $b \neq 0$. The trinomial is irreducible, but we can write:

$$[t^n]\frac{1}{1+bt+ct^2} = [t^n]\frac{1}{\left(1 - \frac{-b + i\sqrt{|\Delta|}}{2}\,t\right)\left(1 - \frac{-b - i\sqrt{|\Delta|}}{2}\,t\right)}.$$

This time, a partial fraction expansion does not give a simple closed form for the coefficients; however, we can apply the formula above in the form:

$$[t^n]\frac{1}{(1-\alpha t)(1-\beta t)} = \frac{\alpha^{n+1} - \beta^{n+1}}{\alpha - \beta}.$$

Since $\alpha$ and $\beta$ are complex numbers, the resulting expression is not very appealing. We can try to give it a better form. Let us set $\alpha = \left(-b + i\sqrt{|\Delta|}\right)/2$, so that $\alpha$ is always contained in the positive imaginary halfplane. This implies $0 < \arg(\alpha) < \pi$, and we have:

$$\alpha - \beta = -\frac{b}{2} + i\frac{\sqrt{|\Delta|}}{2} + \frac{b}{2} + i\frac{\sqrt{|\Delta|}}{2} = i\sqrt{|\Delta|} = i\sqrt{4c - b^2}.$$

If $\theta = \arg(\alpha)$ and:

$$\rho = |\alpha| = \sqrt{\frac{b^2}{4} + \frac{4c - b^2}{4}} = \sqrt{c},$$

we can set $\alpha = \rho e^{i\theta}$ and $\beta = \rho e^{-i\theta}$. Consequently:

$$\alpha^{n+1} - \beta^{n+1} = \rho^{n+1}\left(e^{i(n+1)\theta} - e^{-i(n+1)\theta}\right) = 2i\rho^{n+1}\sin((n+1)\theta)$$

and therefore:

$$[t^n]\frac{1}{1+bt+ct^2} = \frac{2(\sqrt{c})^{n+1}\sin((n+1)\theta)}{\sqrt{4c-b^2}}.$$

At this point we only have to find the value of $\theta$. Obviously:

$$\theta = \arctan\!\left(\frac{\sqrt{|\Delta|}/2}{-b/2}\right) + k\pi = \arctan\frac{\sqrt{4c-b^2}}{-b} + k\pi.$$

When $b < 0$, we have $0 < \arctan\left(\sqrt{4c-b^2}/(-b)\right) < \pi/2$, and this is the correct value for $\theta$. However, when $b > 0$, the principal branch of the arctangent is negative, and we should set $\theta = \pi + \arctan\left(\sqrt{4c-b^2}/(-b)\right)$. As a consequence, we have:

$$\theta = \arctan\frac{\sqrt{4c-b^2}}{-b} + C$$

where $C = \pi$ if $b > 0$ and $C = 0$ if $b < 0$.

An interesting and non-trivial example is given by:

$$\sigma_n = [t^n]\frac{1}{1 - 3t + 3t^2} = \frac{2(\sqrt{3})^{n+1}\sin\left((n+1)\arctan(\sqrt{3}/3)\right)}{\sqrt{3}} = 2(\sqrt{3})^n \sin\frac{(n+1)\pi}{6}.$$

These coefficients have the following values:

n = 12k       σ_n = (√3)^{12k}     =  729^k
n = 12k + 1   σ_n = (√3)^{12k+2}   =  3 · 729^k
n = 12k + 2   σ_n = 2(√3)^{12k+2}  =  6 · 729^k
n = 12k + 3   σ_n = (√3)^{12k+4}   =  9 · 729^k
n = 12k + 4   σ_n = (√3)^{12k+4}   =  9 · 729^k
n = 12k + 5   σ_n = 0
n = 12k + 6   σ_n = −(√3)^{12k+6}  = −27 · 729^k
n = 12k + 7   σ_n = −(√3)^{12k+8}  = −81 · 729^k
n = 12k + 8   σ_n = −2(√3)^{12k+8} = −162 · 729^k
n = 12k + 9   σ_n = −(√3)^{12k+10} = −243 · 729^k
n = 12k + 10  σ_n = −(√3)^{12k+10} = −243 · 729^k
n = 12k + 11  σ_n = 0

3.7 Matrix representation

Let $f(t) \in F_0$; with the coefficients of $f(t)$ we form the following infinite lower triangular matrix (or array) $D = (d_{n,k})_{n,k\in\mathbb{N}}$: column 0 contains the coefficients $f_0, f_1, f_2, \ldots$ in this order; column 1 contains the same coefficients shifted down by one position, with $d_{0,1} = 0$; in general, column $k$ contains the coefficients of $f(t)$ shifted down $k$ positions, so that the first $k$ positions are $0$. This definition can be summarized in the formula $d_{n,k} = f_{n-k}$, $\forall n, k \in \mathbb{N}$. For a reason which will be apparent only later, the array
$D$ will be denoted by $(f(t), 1)$:

$$D = (f(t), 1) = \begin{pmatrix} f_0 & 0 & 0 & 0 & 0 & \cdots \\ f_1 & f_0 & 0 & 0 & 0 & \cdots \\ f_2 & f_1 & f_0 & 0 & 0 & \cdots \\ f_3 & f_2 & f_1 & f_0 & 0 & \cdots \\ f_4 & f_3 & f_2 & f_1 & f_0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$

If $(f(t),1)$ and $(g(t),1)$ are the matrices corresponding to the two f.p.s. $f(t)$ and $g(t)$, we are interested in finding out what matrix is obtained by multiplying the two matrices with the usual row-by-column product. This product will be denoted by $(f(t),1)\cdot(g(t),1)$, and it is immediate to see what its generic element $d_{n,k}$ is. The row $n$ in $(f(t),1)$ is, by definition, $(f_n, f_{n-1}, f_{n-2}, \ldots)$, and column $k$ in $(g(t),1)$ is $(0, 0, \ldots, 0, g_0, g_1, \ldots)$, where the number of leading $0$'s is just $k$. Therefore we have:

$$d_{n,k} = \sum_{j=0}^{\infty} f_{n-j}\, g_{j-k}$$

if we conventionally set $g_r = 0$ for all $r < 0$. When $k = 0$, we have $d_{n,0} = \sum_{j=0}^{\infty} f_{n-j} g_j = \sum_{j=0}^{n} f_{n-j} g_j$, and therefore column 0 contains the coefficients of the convolution $f(t)g(t)$. When $k = 1$ we have $d_{n,1} = \sum_{j=0}^{\infty} f_{n-j} g_{j-1} = \sum_{j=0}^{n-1} f_{n-1-j} g_j$, and this is the coefficient of $t^{n-1}$ in the convolution $f(t)g(t)$. Proceeding in the same way, we see that column $k$ contains the coefficients of the convolution $f(t)g(t)$ shifted down $k$ positions. Therefore we conclude:

$$(f(t), 1) \cdot (g(t), 1) = (f(t)g(t), 1)$$

and this shows that there exists a group isomorphism between $(F_0, \cdot)$ and the set of matrices $(f(t),1)$ with the row-by-column product. In particular, $(1,1)$ is the identity (in fact, it corresponds to the identity matrix) and $(f(t)^{-1}, 1)$ is the inverse of $(f(t), 1)$.

Let us now consider a f.p.s. $f(t) \in F_1$ and let us build an infinite lower triangular matrix in the following way: column $k$ contains the coefficients of $f(t)^k$ in their proper order:

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & \cdots \\ 0 & f_1 & 0 & 0 & 0 & \cdots \\ 0 & f_2 & f_1^2 & 0 & 0 & \cdots \\ 0 & f_3 & 2f_1 f_2 & f_1^3 & 0 & \cdots \\ 0 & f_4 & 2f_1 f_3 + f_2^2 & 3f_1^2 f_2 & f_1^4 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$

The matrix will be denoted by $(1, f(t)/t)$, and we are interested to see how the matrix $(1, g(t)/t)\cdot(1, f(t)/t)$ is composed when $f(t), g(t) \in F_1$. If $(\widehat{f}_{n,k})_{n,k\in\mathbb{N}} = (1, f(t)/t)$, by definition we have:

$$\widehat{f}_{n,k} = [t^n] f(t)^k$$

and therefore the generic element $d_{n,k}$ of the product is:

$$d_{n,k} = \sum_{j=0}^{\infty} \widehat{g}_{n,j}\, \widehat{f}_{j,k} = \sum_{j=0}^{\infty} [t^n] g(t)^j\, [y^j] f(y)^k = [t^n] \sum_{j=0}^{\infty} \left([y^j] f(y)^k\right) g(t)^j = [t^n] f(g(t))^k.$$

In other words, column $k$ in $(1, g(t)/t)\cdot(1, f(t)/t)$ is the $k$th power of the composition $f(g(t))$, and we can conclude:

$$(1, g(t)/t) \cdot (1, f(t)/t) = (1, f(g(t))/t).$$

Clearly, the identity $t \in F_1$ corresponds to the matrix $(1, t/t) = (1, 1)$, the identity matrix, and this is sufficient to prove that the correspondence $f(t) \leftrightarrow (1, f(t)/t)$ is a group isomorphism.

The row-by-column product is surely the basic operation on matrices, and its extension to infinite lower triangular arrays is straightforward, because the sums involved in the product are actually finite. We have shown that we can associate every f.p.s. $f(t) \in F_0$ to a particular matrix $(f(t), 1)$ (let us denote by $A$ the set of such arrays) in such a way that $(F_0, \cdot)$ is isomorphic to $(A, \cdot)$, and the Cauchy product becomes the row-by-column product. Besides, we can associate every f.p.s. $g(t) \in F_1$ to a matrix $(1, g(t)/t)$ (let us call $B$ the set of such matrices) in such a way that $(F_1, \circ)$ is isomorphic to $(B, \cdot)$, and the composition of f.p.s. becomes again the row-by-column product. This reveals a connection between the Cauchy product and the composition: in the Chapter on Riordan Arrays we will explore this connection more deeply; for the moment, we wish to see how this observation leads to a computational method for evaluating the compositional inverse of a f.p.s. in $F_1$.

3.8 Lagrange inversion theorem

Given an infinite lower triangular array of the form $(1, f(t)/t)$, with $f(t) \in F_1$, the inverse matrix $(1, g(t)/t)$ is such that $(1, g(t)/t)\cdot(1, f(t)/t) = (1, 1)$, and since the product results in $(1, f(g(t))/t)$ we have $f(g(t)) = t$. In other words, because of the isomorphism we have seen, the inverse matrix for $(1, f(t)/t)$ is just the matrix corresponding to the compositional inverse of $f(t)$. As we have already said, Lagrange
found a noteworthy formula for the coefficients of this compositional inverse. We follow the more recent proof of Stanley, which points out the purely formal aspects of Lagrange's formula. Indeed, we will prove something more, by finding the exact form of the matrix $(1, g(t)/t)$, inverse of $(1, f(t)/t)$. As a matter of fact, we state what the form of $(1, g(t)/t)$ should be and then verify that it is actually so.

Let $D = (d_{n,k})_{n,k\in\mathbb{N}}$ be defined as:

$$d_{n,k} = \frac{k}{n}\,[t^{n-k}]\left(\frac{t}{f(t)}\right)^n.$$

Because $f(t)/t \in F_0$, the power $(t/f(t))^n$ is a well-defined f.p.s., and so is every $d_{n,k}$. For the generic element $v_{n,k}$ of the product $D \cdot (1, f(t)/t)$ with $k \neq n$ we find:

$$v_{n,k} = \frac{k}{n}\,[t^n]\, t^{n+1} f(t)^{k-n-1} f'(t) = \frac{k}{n}\,\frac{1}{k-n}\,[t^{-1}]\,\frac{d}{dt}\left(f(t)^{k-n}\right) = 0;$$

in fact, $f(t)^{k-n}$ is a f.L.s. and, as we observed, the residue of its derivative should be zero. This proves that $D \cdot (1, f(t)/t) = (1, 1)$, and therefore $D$ is the inverse of $(1, f(t)/t)$.

If $\overline{f}(t)$ is the compositional inverse of $f(t)$, column 1 gives us the value of its coefficients; by the formula for $d_{n,k}$ we have:

$$\overline{f}_n = [t^n]\,\overline{f}(t) = d_{n,1} = \frac{1}{n}\,[t^{n-1}]\left(\frac{t}{f(t)}\right)^n$$

and this is the celebrated Lagrange Inversion Formula (LIF). The other columns give us the coefficients of the powers $\overline{f}(t)^k$, for which we have:

$$[t^n]\,\overline{f}(t)^k = d_{n,k} = \frac{k}{n}\,[t^{n-k}]\left(\frac{t}{f(t)}\right)^n.$$

3.9 Some examples of the LIF

We found that the number $b_n$ of binary trees with $n$ nodes (and of other combinatorial objects
as well) satisfies the recurrence relation $b_{n+1} = \sum_{k=0}^{n} b_k b_{n-k}$. Let us consider the f.p.s. $b(t) = \sum_{k=0}^{\infty} b_k t^k$; if we multiply the recurrence relation by $t^{n+1}$ and sum for $n$ from 0 to infinity, we find:

$$\sum_{n=0}^{\infty} b_{n+1} t^{n+1} = \sum_{n=0}^{\infty} t^{n+1} \left(\sum_{k=0}^{n} b_k b_{n-k}\right).$$

Since $b_0 = 1$, we can add and subtract $1 = b_0 t^0$ in the left hand member and take $t$ outside the summation sign in the right hand member:

$$\sum_{n=0}^{\infty} b_n t^n - 1 = t \sum_{n=0}^{\infty} \left(\sum_{k=0}^{n} b_k b_{n-k}\right) t^n.$$

In the r.h.s. we recognize a convolution and, substituting $b(t)$ for the corresponding f.p.s., we obtain:

$$b(t) - 1 = t\, b(t)^2.$$

We are interested in evaluating $b_n = [t^n] b(t)$; let us therefore set $w = w(t) = b(t) - 1$, so that $w(t) \in F_1$ and $w_n = b_n$, $\forall n > 0$. The previous relation becomes $w = t(1+w)^2$ and we see that the LIF can be applied (in the form relative to the functional equation) with $\phi(t) = (1+t)^2$. Therefore we have:

$$b_n = [t^n] w(t) = \frac{1}{n}[t^{n-1}](1+t)^{2n} = \frac{1}{n}\binom{2n}{n-1} = \frac{1}{n}\,\frac{(2n)!}{(n-1)!\,(n+1)!} = \frac{1}{n+1}\,\frac{(2n)!}{n!\,n!} = \frac{1}{n+1}\binom{2n}{n}.$$

As we said in the previous chapter, $b_n$ is called the $n$th Catalan number and, under this name, it is often denoted by $C_n$. Now we have its closed form:

$$C_n = \frac{1}{n+1}\binom{2n}{n},$$

also valid in the case $n = 0$, when $C_0 = 1$.

In the same way we can compute the number of $p$-ary trees with $n$ nodes. A $p$-ary tree is a tree in which all the nodes have arity $p$, except for the leaves, which have arity 0. A non-empty $p$-ary tree can be decomposed in the following way: a root whose $p$ ordered subtrees are $T_1, T_2, \ldots, T_p$ (as shown in the figure of the original), which proves that:

$$T_{n+1} = \sum T_{i_1} T_{i_2} \cdots T_{i_p},$$

where the sum is extended to all the $p$-uples $(i_1, i_2, \ldots, i_p)$ such that $i_1 + i_2 + \cdots + i_p = n$. As before, we can multiply both members of the recurrence relation by $t^{n+1}$ and sum for $n$ from 0 to infinity. We find:

$$T(t) - 1 = t\, T(t)^p.$$

This time we have an equation of degree $p$, which cannot be directly solved. However, if we set $w(t) = T(t) - 1$, so that $w(t) \in F_1$, we have:

$$w = t(1+w)^p$$

and the LIF gives:

$$T_n = [t^n] w(t) = \frac{1}{n}[t^{n-1}](1+t)^{pn} = \frac{1}{n}\binom{pn}{n-1} = \frac{1}{n}\,\frac{(pn)!}{(n-1)!\,((p-1)n+1)!} = \frac{1}{(p-1)n+1}\binom{pn}{n},$$

which generalizes the formula for the Catalan numbers.

Finally, let us find the solution of the functional equation $w = t e^w$. The LIF gives:

$$w_n = \frac{1}{n}[t^{n-1}] e^{nt} = \frac{1}{n}\,\frac{n^{n-1}}{(n-1)!} = \frac{n^{n-1}}{n!}.$$

Therefore, the solution we are looking for is the f.p.s.:

$$w(t) = \sum_{n=1}^{\infty} \frac{n^{n-1}}{n!}\, t^n = t + t^2 + \frac{3}{2} t^3 + \frac{8}{3} t^4 + \frac{125}{24} t^5 + \cdots.$$

As noticed, $w(t)$ is the compositional inverse of:

$$f(t) = \frac{t}{\phi(t)} = t e^{-t} = t - t^2 + \frac{t^3}{2!} - \frac{t^4}{3!} + \frac{t^5}{4!} - \cdots.$$

It is a useful exercise to perform the necessary computations to show that $f(w(t)) = t$, for example up to the term of degree 5 or 6, and to verify that $w(f(t)) = t$ as well.

3.10 Formal power series and the computer

When we are dealing with generating functions or, more in general, with formal power series of any kind, we often have to perform numerical computations in order to verify some theoretical result or to experiment with actual cases. In these and other circumstances the computer can help very much with its speed and precision. Nowadays, several Computer Algebra Systems exist, which offer the possibility of
actually working with formal power series, containing formal parameters as well. The use of these tools is recommended, because they can solve a doubt in a few seconds, clarify difficult theoretical points and give useful hints whenever we are faced with particular problems.

However, a Computer Algebra System is not always accessible or, in certain circumstances, one may desire to use less sophisticated tools. For example, programmable pocket computers are now available which can perform quite easily the basic operations on formal power series. The aim of the present and of the following sections is to describe the main algorithms for dealing with formal power series. They can be used to program a computer or simply to understand how an existing system actually works.

The simplest way to represent a formal power series is surely by means of a vector, in which the $k$th component (starting from 0) is the coefficient of $t^k$ in the power series. Obviously, the computer memory can only store a finite number of components, so an upper bound $n_0$ is usually given to the length of the vectors used to represent power series. In other words we have:

$$\mathrm{repr}_{n_0}\!\left(\sum_{k=0}^{\infty} a_k t^k\right) = (a_0, a_1, \ldots, a_n) \qquad (n \leq n_0).$$

Fortunately, most operations on formal power series preserve the number of significant components, so that there is little danger that a number of successive operations could reduce a finite representation to a meaningless sequence of numbers. Differentiation decreases by one the number of useful components; on the contrary, integration and multiplication by $t^r$, say, increase the number of significant elements, at the cost of introducing some $0$ components.

The components $a_0, a_1, \ldots, a_n$ are usually real numbers, represented with the precision allowed by the particular computer. In most combinatorial applications, however, $a_0, a_1, \ldots, a_n$ are rational numbers and, with some extra effort, it is not difficult to realize rational arithmetic on a computer. It is sufficient to represent a rational number as a couple $(m, n)$, whose intended meaning is just $m/n$. So we must have $m \in \mathbb{Z}$, $n \in \mathbb{N}$, and it is a good idea to keep $m$ and $n$ coprime. This can be performed by a routine reduce computing $p = \gcd(m, n)$ by Euclid's algorithm and then dividing both $m$ and $n$ by $p$. The operations on rational numbers are defined in the following way:

$$(m, n) + (m', n') = \mathrm{reduce}(mn' + m'n,\; nn')$$
$$(m, n) \times (m', n') = \mathrm{reduce}(mm',\; nn')$$
$$(m, n)^{-1} = (n, m)$$
$$(m, n)^p = (m^p, n^p) \qquad (p \in \mathbb{N}),$$

provided that $(m, n)$ is a reduced rational number. The size of $m$ and $n$ is limited by the internal representation of integer numbers.

In order to avoid this last problem, Computer Algebra Systems usually implement indefinite precision integer arithmetic. An integer number has a variable length internal representation, and special routines are used to perform the basic operations. These routines can also be realized in a high level programming language (such as C or Java), but they can slow down execution time too much if realized on a programmable pocket computer.

3.11 The internal representation of expressions

The simple representation of a formal power series by a vector of real, or rational, components will be used in the next sections to explain the main algorithms for formal power series operations. However, it is surely not the best way to represent power series, and it becomes completely useless when, for example, the coefficients depend on some formal parameter. In other words, our representation can only deal with purely numerical formal power series.

Because of that, Computer Algebra Systems use a more sophisticated internal representation. In fact, power series are simply a particular case of a general mathematical expression. The aim of the present section is to give a rough idea of how an expression can be represented in the computer memory.

In general, an expression consists of operators and operands. For example, in $a + \sqrt{3}$, the operators are $+$ and $\sqrt{\cdot}$, and the operands are $a$ and $3$. Every operator has its own arity or adicity, i.e., the number of operands on which it acts. The adicity of the sum $+$ is two, because it acts on its two terms; the adicity of the square root $\sqrt{\cdot}$ is one, because it acts on a single term. Operands can be numbers (and it is important that the nature of the number be specified, i.e., whether it is a natural, an integer, a rational, a real or a complex number) or formal parameters, such as $a$. Obviously, if an operator acts on numerical operands, it can be executed, giving a numerical result. But if any of its operands is a formal parameter, the result is a formal expression, which may perhaps be simplified but cannot be evaluated to a numerical result.

An expression can always be transposed into a "tree", whose internal nodes correspond to operators and whose leaves correspond to operands. The simple tree for the previous expression is given in Figure 3.1. Each operator has as many branches as its adicity, and a simple visit to the tree can perform its evaluation, that is, it can execute all the operators.
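The reduce routine and the four operations above translate directly into code; a minimal Python sketch (the function names are mine — in practice Python's `fractions.Fraction` provides the same service ready-made):

```python
from math import gcd

def reduce_pair(m, n):
    """Return the reduced pair: n > 0 and gcd(m, n) = 1."""
    if n < 0:
        m, n = -m, -n
    g = gcd(m, n)
    return m // g, n // g

def add(a, b):
    (m, n), (mp, np_) = a, b
    return reduce_pair(m * np_ + mp * n, n * np_)

def mul(a, b):
    (m, n), (mp, np_) = a, b
    return reduce_pair(m * mp, n * np_)

def inv(a):
    m, n = a
    return reduce_pair(n, m)

def power(a, p):
    m, n = a
    return (m ** p, n ** p)   # already reduced if (m, n) is

print(add((1, 6), (1, 3)))    # (1, 2)
print(mul((2, 3), (3, 4)))    # (1, 2)
print(inv((-2, 3)))           # (-3, 2)
```

Since Python integers are already of indefinite precision, the size limitation mentioned above does not arise here; in C or Java one would need a big-integer library.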
(Figure: the expression tree of a truncated power series $a_0 t^0 + a_1 t^1 + a_2 t^2 + \cdots + a_n t^n + O(t^n)$: a chain of binary $+$ nodes, whose left branches are the product nodes $a_k * t^k$ and whose last leaf is the symbol $O(t^n)$.)
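A minimal sketch of such a tree representation in Python (the class and method names are mine, not those of any actual Computer Algebra System): each internal node stores an operator together with as many children as its adicity, and evaluation is a simple visit of the tree.

```python
import math

class Num:
    """Leaf: a numerical operand."""
    def __init__(self, value):
        self.value = value
    def eval(self, env):
        return self.value

class Param:
    """Leaf: a formal parameter, looked up in an environment."""
    def __init__(self, name):
        self.name = name
    def eval(self, env):
        return env[self.name]

class Op:
    """Internal node: an operator applied to as many children as its adicity."""
    def __init__(self, fn, *children):
        self.fn = fn
        self.children = children
    def eval(self, env):
        return self.fn(*(c.eval(env) for c in self.children))

# the tree for a + sqrt(3): a binary + node over the parameter a
# and a unary square-root node over the number 3
expr = Op(lambda x, y: x + y, Param('a'), Op(math.sqrt, Num(3)))
print(expr.eval({'a': 2.0}))   # 2 + sqrt(3) = 3.7320508...
```

If `a` were left unbound, a real system would return a simplified formal expression instead of a number; this sketch only covers the purely numerical case.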
J. C. P. Miller has devised an algorithm allowing us to perform $(1+g(t))^{\alpha}$ in time $O(r^2)$. In fact, let us write $h(t) = a(t)^{\alpha}$, where $a(t)$ is any formal power series with $a_0 = 1$. By differentiating, we obtain $h'(t) = \alpha a(t)^{\alpha-1} a'(t)$ or, by multiplying everything by $a(t)$, $a(t)h'(t) = \alpha h(t)a'(t)$. Therefore, by extracting the coefficient of $t^{n-1}$ we find:

$$\sum_{k=0}^{n-1} a_k (n-k) h_{n-k} = \alpha \sum_{k=0}^{n-1} (k+1) a_{k+1} h_{n-k-1}.$$

We now isolate the term with $k = 0$ in the left hand member and the term having $k = n-1$ in the right hand member ($a_0 = 1$ by hypothesis):

$$n h_n + \sum_{k=1}^{n-1} a_k (n-k) h_{n-k} = \alpha n a_n + \alpha \sum_{k=1}^{n-1} k a_k h_{n-k}$$

(in the last sum we performed the change of variable $k \to k-1$, in order to have the same indices as in the left hand member). We now have an expression for $h_n$ depending only on $(a_1, \ldots, a_n)$ and $(h_1, \ldots, h_{n-1})$:

$$h_n = \alpha a_n + \frac{1}{n} \sum_{k=1}^{n-1} \left((\alpha+1)k - n\right) a_k h_{n-k} = \alpha a_n + \sum_{k=1}^{n-1} \left(\frac{(\alpha+1)k}{n} - 1\right) a_k h_{n-k}.$$

The computation is now straightforward. We begin by setting $h_0 = 1$, and then we successively compute $h_1, h_2, \ldots, h_r$ ($r = n$, if $n$ is the number of terms in $(a_1, a_2, \ldots, a_n)$). The evaluation of $h_k$ requires a number of operations in the order $O(k)$, and therefore the whole procedure works in time $O(r^2)$, as desired.

The inverse of a series, i.e., $(1+g(t))^{-1}$, is obtained by setting $\alpha = -1$. It is worth noting that in this case the previous formula becomes:

$$h_k = -a_k - \sum_{j=1}^{k-1} a_j h_{k-j} = -\sum_{j=1}^{k} a_j h_{k-j} \qquad (h_0 = 1)$$

and can be used to prove properties of the inverse of a power series. As a simple example, the reader can show that the coefficients in $(1-t)^{-1}$ are all $1$.

3.13 Logarithm and exponential

The idea of Miller can be applied to other operations on formal power series. In the present section we wish to use it to perform the (natural) logarithm and the exponentiation of a series. Let us begin with the logarithm and try to compute $\ln(1+g(t))$. As we know, there is a direct way to perform this operation, i.e.:

$$\ln(1+g(t)) = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}\, g(t)^k$$
and this formula only requires a series of successive products. As for the operation of raising to a power, the procedure needs a time in the order of $O(r^3)$, and it is worth considering an alternative approach. In fact, if we set $h(t) = \ln(1+g(t))$, by differentiating we obtain $h'(t) = g'(t)/(1+g(t))$, or $h'(t) = g'(t) - h'(t)g(t)$. We can now extract the coefficient of $t^{k-1}$ and obtain:

$$k h_k = k g_k - \sum_{j=0}^{k-1} (k-j) h_{k-j}\, g_j$$

that is, since $g_0 = 0$:

$$h_k = g_k - \sum_{j=1}^{k-1} \left(1 - \frac{j}{k}\right) h_{k-j}\, g_j.$$

Again the procedure works in time $O(r^2)$, $r$ being the number of significant terms in $f(t)$ and $g(t)$, respectively.

We conclude this section by sketching the obvious algorithms to compute the differentiation and integration of a formal power series $f(t) = f_0 + f_1 t + f_2 t^2 + f_3 t^3 + \cdots$. If $h(t) = f'(t)$, we have:

$$h_k = (k+1) f_{k+1}$$

and therefore the number of significant terms is reduced by 1. Conversely, if $h(t) = \int_0^t f(\tau)\,d\tau$, we have:

$$h_0 = 0, \qquad h_k = \frac{f_{k-1}}{k} \quad (k \geq 1).$$
Generating Functions
4.1 General Rules

Let us consider a sequence of numbers F = (f_0, f_1, f_2, ...) = (f_k)_{k∈N}; the (ordinary) generating function for the sequence F is defined as f(t) = f_0 + f_1 t + f_2 t^2 + ···, where t is an arbitrary indeterminate. Given the sequence (f_k)_{k∈N}, we introduce the generating function operator G, which applied to (f_k)_{k∈N} produces the ordinary generating function for the sequence, i.e., G(f_k)_{k∈N} = f(t). In this expression t is a bound variable, and a more accurate notation would be G_t(f_k)_{k∈N} = f(t). This notation is essential when (f_k)_{k∈N} depends on some parameter or when we consider multivariate generating functions. In this latter case, for example, we should write G_{t,w}(f_{n,k})_{n,k∈N} = f(t, w) to indicate the fact that f_{n,k} in the double sequence becomes the coefficient of t^n w^k in the function f(t, w). However, whenever no ambiguity can arise, we will use the notation G(f_k) = f(t), understanding also the binding for the variable k. For the sake of completeness, we also define the exponential generating function of the sequence (f_0, f_1, f_2, ...) as:

$$E(f_k) = G\!\left(\frac{f_k}{k!}\right) = \sum_{k=0}^{\infty} f_k \frac{t^k}{k!}.$$

The operator G is clearly linear. The function f(t) can be shifted or differentiated. Two functions f(t) and g(t) can be multiplied and composed. This leads to the properties for the operator G listed in Table 4.1. Note that formula (G5) requires g_0 = 0. The first five formulas are easily verified by using the intended interpretation of the operator G; the last formula can be proved by means of the LIF, in the form relative to the composition F(w(t)). In fact we have:

$$[t^n]\, F(t)\varphi(t)^n = [t^{n-1}]\, \frac{F(t)}{t}\,\varphi(t)^n = n\,[t^n] \left[ \int \frac{F(y)}{y}\,dy \;\Big|\; y = w(t) \right];$$

in the last passage we applied backwards the formula:

$$[t^n]\, F(w(t)) = \frac{1}{n}\,[t^{n-1}]\, F'(t)\,\varphi(t)^n \qquad (w = t\varphi(w))$$

and therefore w = w(t) ∈ F^1 is the unique solution of the functional equation w = tφ(w). By now applying the rule of differentiation for the "coefficient of" operator, we can go on:

$$[t^n]\, F(t)\varphi(t)^n = [t^{n-1}]\, \frac{d}{dt} \left[ \int \frac{F(y)}{y}\,dy \;\Big|\; y = w(t) \right] = [t^{n-1}] \left[ \frac{F(w)}{w}\,\frac{dw}{dt} \;\Big|\; w = t\varphi(w) \right].$$

We have applied the chain rule for differentiation, and from w = tφ(w) we have:

$$\frac{dw}{dt} = \varphi(w) + t \left[ \frac{d\varphi}{dw} \;\Big|\; w = t\varphi(w) \right] \frac{dw}{dt}.$$

We can therefore compute the derivative of w(t):

$$\frac{dw}{dt} = \left[ \frac{\varphi(w)}{1 - t\varphi'(w)} \;\Big|\; w = t\varphi(w) \right]$$

where φ'(w) denotes the derivative of φ(w) with respect to w. We can substitute this expression in the formula above and observe that w/φ(w) = t can be taken outside of the substitution symbol:

$$[t^n]\, F(t)\varphi(t)^n = [t^{n-1}] \left[ \frac{F(w)}{w}\,\frac{\varphi(w)}{1 - t\varphi'(w)} \;\Big|\; w = t\varphi(w) \right] = [t^{n-1}]\, \frac{1}{t} \left[ \frac{F(w)}{1 - t\varphi'(w)} \;\Big|\; w = t\varphi(w) \right] = [t^n] \left[ \frac{F(w)}{1 - t\varphi'(w)} \;\Big|\; w = t\varphi(w) \right]$$

which is our diagonalization rule (G6).

The name "diagonalization" is due to the fact that if we imagine the coefficients of F(t)φ(t)^n as constituting the row n in an infinite matrix, then the sequence $([t^n]F(t)\varphi(t)^n)_{n\in\mathbb{N}}$ appears as the main diagonal of that matrix.
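Rule (G6) can be checked numerically. The following sketch uses the illustrative choices F(t) = 1/(1−t) and φ(t) = 1+t; then w = tφ(w) gives w = t/(1−t), φ'(w) = 1, and F(w)/(1 − tφ'(w)) = 1/(1−2t), so (G6) predicts [t^n] F(t)φ(t)^n = 2^n:

```python
# Check of the diagonalization rule (G6) for F(t) = 1/(1-t), phi(t) = 1+t.
# Left-hand side: [t^n] (1+t)^n / (1-t) = sum_{k<=n} C(n,k).
# Right-hand side (from the rule): [t^n] 1/(1-2t) = 2^n.
from math import comb

def lhs(n):
    return sum(comb(n, k) for k in range(n + 1))

for n in range(10):
    assert lhs(n) == 2 ** n
print("diagonalization check passed")
```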
40 CHAPTER 4. GENERATING FUNCTIONS
The principle is rather obvious from the very definition of the concept of generating functions; however, it is important, because it states the condition under which we can pass from an identity about elements to the corresponding identity about generating functions. If the two sequences disagree even in a single element (e.g., the first one), we cannot infer the equality of the generating functions.

$$G(f_{k+2}) = \frac{G(f_k) - f_0 - f_1 t}{t^2} \tag{4.2.1}$$

Proof: Let g_k = f_{k+1}; by (G2), G(g_k) = (G(f_k) − f_0)/t. Since g_0 = f_1, we have:

$$G(f_{k+2}) = G(g_{k+1}) = \frac{G(g_k) - g_0}{t} = \frac{G(f_k) - f_0 - f_1 t}{t^2}$$

$$G(f_{k-j}) = t^j\, G(f_k) \tag{4.2.3}$$

Proof: We have G(f_k) = G(f_{(k−1)+1}) = t^{−1}(G(f_{k−1}) − f_{−1}), where f_{−1} is the coefficient of t^{−1} in f(t). If f(t) ∈ F, then f_{−1} = 0 and G(f_{k−1}) = tG(f_k). The theorem then follows by mathematical induction.

Property (G3) can be generalized in several ways:

Theorem 4.2.5 Let f(t) = G(f_k) be as above; then:

$$G(k^2 f_k) = tD\,G(f_k) + t^2 D^2\, G(f_k) \tag{4.2.5}$$
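Formula (4.2.5) can be read coefficientwise: tD contributes k·f_k and t²D² contributes k(k−1)·f_k, which together give k²·f_k. A minimal sketch (the sample sequence is an arbitrary choice):

```python
# Coefficientwise check of (4.2.5): [t^k] tDf = k*f_k and
# [t^k] t^2 D^2 f = k*(k-1)*f_k, so their sum is k^2 * f_k.
f = [5, 3, 8, 1, 9, 2, 7]                               # arbitrary sample f_k
tDf   = [k * f[k] for k in range(len(f))]               # coefficients of t f'(t)
t2D2f = [k * (k - 1) * f[k] for k in range(len(f))]     # coefficients of t^2 f''(t)
for k in range(len(f)):
    assert tDf[k] + t2D2f[k] == k * k * f[k]
print("(4.2.5) check passed")
```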
4.3. MORE ADVANCED RESULTS 41
This can be further generalized:

Theorem 4.2.6 Let f(t) = G(f_k) be as above; then:

$$G(k^j f_k) = S_j(tD)\, G(f_k) \tag{4.2.6}$$

where $S_j(w) = \sum_{r=1}^{j} \left\{ {j \atop r} \right\} w^r$ is the jth Stirling polynomial of the second kind (see Section 2.11).

Proof: Formula (4.2.6) is to be understood in the operator sense; so, for example, being S_3(w) = w + 3w^2 + w^3, we have:

$$G(k^3 f_k) = tD\,G(f_k) + 3t^2 D^2\, G(f_k) + t^3 D^3\, G(f_k).$$

The proof proceeds by induction, as (G3) and (4.2.5) are the first two instances. Now:

$$G(k^{j+1} f_k) = G(k (k^j) f_k) = tD\, G(k^j f_k)$$

that is:

$$S_{j+1}(tD) = tD\, S_j(tD) = tD \sum_{r=1}^{j} S(j,r)\, t^r D^r = \sum_{r=1}^{j} S(j,r)\, r\, t^r D^r + \sum_{r=1}^{j} S(j,r)\, t^{r+1} D^{r+1}.$$

The proof of the second formula is analogous.

Proof: Let us consider the sequence (g_k)_{k∈N}, where g_{k+1} = f_k and g_0 = 0. So we have: G(g_{k+1}) = G(f_k) = t^{−1}(G(g_k) − g_0). Finally:

$$G\!\left(\frac{f_k}{k+1}\right) = G\!\left(\frac{g_{k+1}}{k+1}\right) = \frac{1}{t}\, G\!\left(\frac{g_k}{k}\right) = \frac{1}{t} \int_0^t (G(g_k) - g_0)\,\frac{dz}{z} = \frac{1}{t} \int_0^t G(f_k)\, dz$$

In the following theorems, f(t) will always denote the generating function G(f_k).

Theorem 4.2.10 Let f(t) = G(f_k) denote the generating function of the sequence (f_k)_{k∈N}; then:

$$G(p^k f_k) = f(pt) \tag{4.2.10}$$

Proof: By setting g(t) = pt in (G5) we have:

$$f(pt) = \sum_{n=0}^{\infty} f_n (pt)^n = \sum_{n=0}^{\infty} p^n f_n t^n = G(p^k f_k)$$

In particular, for p = −1 we have: G((−1)^k f_k) = f(−t).

The following proof is typical and introduces the use of ordinary differential equations in the calculus of generating functions:

Theorem 4.3.2 Let f(t) = G(f_k) denote the generating function of the sequence (f_k)_{k∈N}; then:

$$G\!\left(\frac{f_k}{2k+1}\right) = \frac{1}{2\sqrt{t}} \int_0^t \frac{G(f_k)}{\sqrt{z}}\, dz \tag{4.3.3}$$

Proof: Let us set g_k = f_k/(2k+1), or 2k g_k + g_k = f_k. If g(t) = G(g_k), by applying (G3) we have the differential equation 2t g'(t) + g(t) = f(t), whose solution having g(0) = f_0 is just formula (4.3.3).

We conclude with two general theorems on sums:

Theorem 4.3.3 (Partial Sum Theorem) Let f(t) = G(f_k) denote the generating function of the sequence (f_k)_{k∈N}; then:

$$G\!\left(\sum_{k=0}^{n} f_k\right) = \frac{1}{1-t}\, G(f_k) \tag{4.3.4}$$

Proof: If we set $s_n = \sum_{k=0}^{n} f_k$, then we have s_{n+1} = s_n + f_{n+1} for every n ∈ N and we can apply the operator G to both members: G(s_{n+1}) = G(s_n) + G(f_{n+1}), i.e.:

$$\frac{G(s_n) - s_0}{t} = G(s_n) + \frac{G(f_n) - f_0}{t}$$

Since s_0 = f_0, we find G(s_n) = tG(s_n) + G(f_n) and from this (4.3.4) follows directly.

The following result is known as the Euler transformation:

Theorem 4.3.4 Let f(t) = G(f_k) denote the generating function of the sequence (f_k)_{k∈N}; then:

$$G\!\left(\sum_{k=0}^{n} \binom{n}{k} f_k\right) = \frac{1}{1-t}\, f\!\left(\frac{t}{1-t}\right) \tag{4.3.5}$$

Proof: By well-known properties of binomial coefficients we have:

$$\binom{n}{k} = \binom{n}{n-k} = (-1)^{n-k} \binom{-k-1}{n-k}$$

and this is the coefficient of t^{n−k} in (1 − t)^{−k−1}. We now observe that the sum in (4.3.5) can be extended to infinity, and by (G5) we have:

$$\sum_{k=0}^{n} \binom{n}{k} f_k = \sum_{k=0}^{n} (-1)^{n-k} \binom{-k-1}{n-k} f_k = \sum_{k=0}^{\infty} [t^{n-k}] (1-t)^{-k-1}\, [y^k] f(y) = [t^n]\, \frac{1}{1-t} \sum_{k=0}^{\infty} [y^k] f(y) \left(\frac{t}{1-t}\right)^k = [t^n]\, \frac{1}{1-t}\, f\!\left(\frac{t}{1-t}\right).$$

Since the last expression does not depend on n, it represents the generating function of the sum.

We observe explicitly that by (K4) we have: $\sum_{k=0}^{n} \binom{n}{k} f_k = [t^n] (1+t)^n f(t)$, but this expression does not represent a generating function because it depends on n. The Euler transformation can be generalized in several ways, as we shall see when dealing with Riordan arrays.

4.4 Common Generating Functions

The aim of the present section is to derive the most common generating functions by using the apparatus of the previous sections. As a first example, let us consider the constant sequence F = (1, 1, 1, ...), for which we have f_{k+1} = f_k for every k ∈ N. By applying the principle of identity, we find: G(f_{k+1}) = G(f_k), that is, by (G2): G(f_k) − f_0 = tG(f_k). Since f_0 = 1, we have immediately:

$$G(1) = \frac{1}{1-t}$$

For any constant sequence F = (c, c, c, ...), by (G1) we find that G(c) = c(1−t)^{−1}. Similarly, by using the basic rules and the theorems of the previous sections we have:

$$G(n) = G(n \cdot 1) = tD\, \frac{1}{1-t} = \frac{t}{(1-t)^2}$$

$$G(n^2) = tD\, G(n) = tD\, \frac{t}{(1-t)^2} = \frac{t + t^2}{(1-t)^3}$$

$$G((-1)^n) = G(1) \circ (-t) = \frac{1}{1+t}$$

$$G\!\left(\frac{1}{n}\right) = G\!\left(\frac{1}{n} \cdot 1\right) = \int_0^t \left(\frac{1}{1-z} - 1\right) \frac{dz}{z} = \int_0^t \frac{dz}{1-z} = \ln \frac{1}{1-t}$$

$$G(H_n) = G\!\left(\sum_{k=1}^{n} \frac{1}{k}\right) = \frac{1}{1-t}\, G\!\left(\frac{1}{n}\right) = \frac{1}{1-t} \ln \frac{1}{1-t}$$

where H_n is the nth harmonic number. Other generating functions can be obtained from the previous formulas:

$$G(n H_n) = tD\, \frac{1}{1-t} \ln \frac{1}{1-t} = \frac{t}{(1-t)^2} \left( \ln \frac{1}{1-t} + 1 \right)$$

$$G\!\left(\frac{1}{n+1} H_n\right) = \frac{1}{t} \int_0^t \ln \frac{1}{1-z}\, \frac{dz}{1-z} = \frac{1}{2t} \left( \ln \frac{1}{1-t} \right)^2$$

$$G(\delta_{0,n}) = G(1) - tG(1) = \frac{1-t}{1-t} = 1$$

where δ_{n,m} is the Kronecker delta. This last relation can be readily generalized to G(δ_{n,m}) = t^m. An interesting example is given by G(1/(n(n+1))).

By (K4), we have the well-known Vandermonde convolution:

$$\binom{m+p}{n} = [t^n](1+t)^{m+p} = [t^n](1+t)^m (1+t)^p = \sum_{k=0}^{n} \binom{m}{k} \binom{p}{n-k}$$

which, for m = p = n, becomes $\sum_{k=0}^{n} \binom{n}{k}^2 = \binom{2n}{n}$. We can also find $G\!\left(\binom{p}{k}\right)$, where p is a parameter; the derivation is purely algebraic.
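The two sum theorems above lend themselves to quick numeric verification. The following sketch (with illustrative choices: H_n for the Partial Sum Theorem, and the arbitrary test sequence f_k = k for the Euler transformation) works with exact rationals on truncated series:

```python
# Partial Sum Theorem: 1/(1-t) * ln(1/(1-t)) must give G(H_n).
# Euler transformation (4.3.5) with f_k = k: compare the coefficients of
# 1/(1-t) * f(t/(1-t)) with sum_k C(n,k)*k.
from fractions import Fraction
from math import comb

N = 12  # truncation order

def mul(a, b):
    c = [Fraction(0)] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

one_over_1mt = [Fraction(1)] * N                       # 1/(1-t)

log_series = [Fraction(0)] + [Fraction(1, k) for k in range(1, N)]
gf = mul(one_over_1mt, log_series)                     # G(H_n)
H = [Fraction(0)]
for n in range(1, N):
    H.append(H[-1] + Fraction(1, n))
assert gf == H

y = [Fraction(0)] + [Fraction(1)] * (N - 1)            # y = t/(1-t)
comp = [Fraction(0)] * N                               # f(y(t)) = sum_k k*y^k
yk = [Fraction(1)] + [Fraction(0)] * (N - 1)
for k in range(N):
    comp = [c + k * p for c, p in zip(comp, yk)]
    yk = mul(yk, y)
rhs = mul(one_over_1mt, comp)
for n in range(N):
    assert rhs[n] == sum(comb(n, k) * k for k in range(n + 1))
print("sum theorem checks passed")
```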
4.7 Some special generating functions

We wish to determine the generating function of the sequence {0, 1, 1, 1, 2, 2, 2, 2, 3, ...}, that is, the sequence whose generic element is ⌊√k⌋. We can think that it is formed by summing an infinite number of simpler sequences {0, 1, 1, 1, 1, 1, 1, ...}, the kth of which begins with k^2 zeroes and therefore has generating function t^{k^2}/(1−t). By the Partial Sum Theorem we then have:

$$\sum_{k=0}^{n} \lfloor\sqrt{k}\rfloor = [t^n]\, \frac{1}{1-t} \sum_{k=1}^{\infty} \frac{t^{k^2}}{1-t} = \sum_{k=1}^{\infty} [t^{n-k^2}]\, \frac{1}{(1-t)^2} = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} \binom{-2}{n-k^2} (-1)^{n-k^2} = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} (n - k^2 + 1) = (n+1)\lfloor\sqrt{n}\rfloor - \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} k^2.$$

The final value of the sum is therefore:

$$(n+1)\lfloor\sqrt{n}\rfloor - \frac{1}{3}\,\lfloor\sqrt{n}\rfloor \left(\lfloor\sqrt{n}\rfloor + 1\right) \left(\lfloor\sqrt{n}\rfloor + \frac{1}{2}\right)$$

whose asymptotic value is $\frac{2}{3} n\sqrt{n}$. Again, for the analogous sum with ⌊log_2 n⌋, we obtain (see Chapter 1):

$$\sum_{k=1}^{n} \lfloor\log_2 k\rfloor = (n+1)\lfloor\log_2 n\rfloor - 2^{\lfloor\log_2 n\rfloor + 1} + 1.$$

A somewhat more difficult sum is the following one:

$$\sum_{k=0}^{n} \binom{n}{k} \lfloor\sqrt{k}\rfloor = [t^n]\, \frac{1}{1-t} \left[ \sum_{k} \frac{y^{k^2}}{1-y} \;\Big|\; y = \frac{t}{1-t} \right] = [t^n] \sum_{k} \frac{t^{k^2}}{(1-t)^{k^2} (1-2t)} = \sum_{k} [t^{n-k^2}]\, \frac{1}{(1-t)^{k^2} (1-2t)}.$$

We can now obtain a semi-closed form for this sum by expanding the generating function into partial fractions:

$$\frac{1}{(1-t)^{k^2}(1-2t)} = \frac{A}{1-2t} + \frac{B}{(1-t)^{k^2}} + \frac{C}{(1-t)^{k^2-1}} + \frac{D}{(1-t)^{k^2-2}} + \cdots + \frac{X}{1-t}.$$

We can show that $A = 2^{k^2}$, B = −1, C = −2, D = −4, ..., $X = -2^{k^2-1}$; in fact, by substituting these values in the previous expression we get:

$$\frac{2^{k^2}}{1-2t} - \frac{1}{(1-t)^{k^2}} - \frac{2(1-t)}{(1-t)^{k^2}} - \frac{4(1-t)^2}{(1-t)^{k^2}} - \cdots - \frac{2^{k^2-1}(1-t)^{k^2-1}}{(1-t)^{k^2}} = \frac{2^{k^2}}{1-2t} - \frac{1}{(1-t)^{k^2}} \cdot \frac{2^{k^2}(1-t)^{k^2} - 1}{2(1-t) - 1} = \frac{1}{(1-t)^{k^2}(1-2t)}.$$

Therefore, we conclude:

$$\sum_{k=0}^{n} \binom{n}{k} \lfloor\sqrt{k}\rfloor = \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} 2^{k^2} 2^{n-k^2} - \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} \binom{n-1}{n-k^2} - 2 \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} \binom{n-2}{n-k^2} - \cdots - \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} 2^{k^2-1} \binom{n-k^2}{n-k^2} =$$
$$= \lfloor\sqrt{n}\rfloor\, 2^n - \sum_{k=1}^{\lfloor\sqrt{n}\rfloor} \left( \binom{n-1}{n-k^2} + 2\binom{n-2}{n-k^2} + \cdots + 2^{k^2-1}\binom{n-k^2}{n-k^2} \right).$$

We observe that for very large values of n the first term dominates all the others, and therefore the asymptotic value of the sum is $\lfloor\sqrt{n}\rfloor\, 2^n$. We can think of the last sum as a "semi-closed" form, because the number of terms is dramatically reduced from n to √n, although it remains depending on n. In the same way we find:

$$\sum_{k} \binom{n}{k} \lfloor\log_2 k\rfloor\, (-1)^k = \sum_{k=1}^{\lfloor\log_2 n\rfloor} \binom{n-1}{n-2^k}.$$

4.8 Linear recurrences with constant coefficients

If (f_k)_{k∈N} is a sequence, it can be defined by means of a recurrence relation, i.e., a relation relating the generic element f_n to other elements f_k having k < n. Usually, the first elements of the sequence must be given explicitly, in order to allow the computation of the successive values; they constitute the initial conditions, and the sequence is well-defined if and only if every element can be computed by starting with the initial conditions and going on with the other elements by means of the recurrence relation. For example, the constant sequence (1, 1, 1, ...) can be defined by the recurrence relation x_n = x_{n−1} and the initial condition x_0 = 1. By changing the initial conditions, the sequence can radically change; if we consider the same relation x_n = x_{n−1} together with the initial condition x_0 = 2, we obtain the constant sequence {2, 2, 2, ...}.

In general, a recurrence relation can be written f_n = F(f_{n−1}, f_{n−2}, ...); when F depends on all the values f_{n−1}, f_{n−2}, ..., f_1, f_0, then the relation is called a full history recurrence. If F only depends on a fixed number of elements f_{n−1}, f_{n−2}, ..., f_{n−p}, then the relation is called a partial history recurrence and p is called the order of the relation. Besides, if F is linear, we have a linear recurrence. Linear recurrences are surely the most common and important type of recurrence relations; if all the coefficients appearing in F are constant, we have a linear recurrence with
constant coefficients, and if the coefficients are polynomials in n, we have a linear recurrence with polynomial coefficients. As we are now going to see, the method of generating functions allows us to find the solution of any linear recurrence with constant coefficients, in the sense that we find a function f(t) such that [t^n]f(t) = f_n, ∀n ∈ N. For linear recurrences with polynomial coefficients, the same method allows us to find a solution on many occasions, but success is not assured. On the other hand, no method is known that solves all the recurrences of this kind, and surely generating functions are the method giving the highest number of positive results. We will discuss this case in the next section.

The Fibonacci recurrence F_n = F_{n−1} + F_{n−2} is an example of a recurrence relation with constant coefficients. When we have a recurrence of this kind, we begin by expressing it in such a way that the relation is valid for every n ∈ N. In the example of the Fibonacci numbers, this is not the case, because for n = 0 we have F_0 = F_{−1} + F_{−2}, and we do not know the values of the two elements in the r.h.s., which have no combinatorial meaning. However, if we write the recurrence as F_{n+2} = F_{n+1} + F_n we have fulfilled the requirement. This first step has a great importance, because it allows us to apply the operator G to both members of the relation; this was not possible beforehand because of the principle of identity for generating functions.

The recurrence being linear with constant coefficients, we can apply the axiom of linearity to the recurrence: G(F_{n+2}) = G(F_{n+1}) + G(F_n); using the shift rules and the initial conditions F_0 = 0, F_1 = 1, and by solving in F(t), we have the explicit expression:

$$F(t) = \frac{t}{1 - t - t^2}.$$

This is the generating function for the Fibonacci numbers. We can now find an explicit expression for F_n in the following way. The denominator of F(t) can be written $1 - t - t^2 = (1 - \phi t)(1 - \hat\phi t)$, where:

$$\phi = \frac{1 + \sqrt{5}}{2} \approx 1.618033989 \qquad\qquad \hat\phi = \frac{1 - \sqrt{5}}{2} \approx -0.618033989.$$

The constant 1/φ ≈ 0.618033989 is known as the golden ratio. By applying the method of partial fraction expansion we find:

$$F(t) = \frac{t}{(1-\phi t)(1-\hat\phi t)} = \frac{A}{1-\phi t} + \frac{B}{1-\hat\phi t} = \frac{A - A\hat\phi t + B - B\phi t}{1 - t - t^2}.$$

We determine the two constants A and B by equating the coefficients in the first and last expression for F(t):

$$\begin{cases} A + B = 0 \\ -A\hat\phi - B\phi = 1 \end{cases} \qquad\Longrightarrow\qquad \begin{cases} A = 1/(\phi - \hat\phi) = 1/\sqrt{5} \\ B = -A = -1/\sqrt{5} \end{cases}$$
recurrence down to 1, and accordingly change the last index 0.

In order to show a non-trivial example, let us discuss the problem of determining the coefficient of t^n in the f.p.s. corresponding to the function $f(t) = \sqrt{1-t}\,\ln(1/(1-t))$. If we expand the function, we find:

$$f(t) = \sqrt{1-t}\,\ln\frac{1}{1-t} = t - \frac{1}{24}t^3 - \frac{1}{24}t^4 - \frac{71}{1920}t^5 - \frac{31}{960}t^6 - \cdots.$$

A method for finding a recurrence relation for the coefficients f_n of this f.p.s. is to derive a differential equation for f(t). By differentiating:

$$f'(t) = -\frac{1}{2\sqrt{1-t}}\,\ln\frac{1}{1-t} + \frac{1}{\sqrt{1-t}}$$

and therefore we have the differential equation:

$$(1-t)\,f'(t) = -\frac{1}{2}\, f(t) + \sqrt{1-t}.$$

By extracting the coefficient of t^n, we have the relation:

$$(n+1)f_{n+1} - n f_n = -\frac{1}{2}\, f_n + (-1)^n \binom{1/2}{n}$$

which can be written as:

$$(n+1)f_{n+1} = \frac{2n-1}{2}\, f_n - \frac{1}{4^n(2n-1)} \binom{2n}{n}.$$

This is a recurrence relation of the first order with the initial condition f_0 = 0. Let us apply the summing factor method, for which we have a_n = n, b_n = (2n−1)/2. Since a_0 = 0, we have:

$$\frac{a_n a_{n-1} \cdots a_1}{b_n b_{n-1} \cdots b_1} = \frac{n(n-1)\cdots 1 \cdot 2^n}{(2n-1)(2n-3)\cdots 1} = \frac{n!\,2^n \cdot 2n(2n-2)\cdots 2}{2n(2n-1)(2n-2)\cdots 1} = \frac{4^n\, n!^2}{(2n)!}.$$

By multiplying the recurrence relation by this summing factor, we find:

$$(n+1)\,\frac{4^n n!^2}{(2n)!}\, f_{n+1} = \frac{2n-1}{2}\,\frac{4^n n!^2}{(2n)!}\, f_n - \frac{1}{2n-1}.$$

We are fortunate and c_n simplifies dramatically; besides, we know that the two coefficients of f_{n+1} and f_n are equal, notwithstanding their appearance. Therefore we have:

$$f_{n+1} = \frac{1}{(n+1)4^n} \binom{2n}{n} \sum_{k=0}^{n} \frac{-1}{2k-1} \qquad (a_0 f_0 = 0).$$

We can somewhat simplify this expression by observing that:

$$\sum_{k=0}^{n} \frac{1}{2k-1} = \frac{1}{2n-1} + \frac{1}{2n-3} + \cdots + 1 - 1 = \frac{1}{2n} + \frac{1}{2n-1} + \frac{1}{2n-2} + \cdots + \frac{1}{2} + 1 - \frac{1}{2n} - \cdots - \frac{1}{2} - 1 =$$
$$= H_{2n+2} - \frac{1}{2n+1} - \frac{1}{2n+2} - \frac{1}{2} H_{n+1} + \frac{1}{2n+2} - 1 = H_{2n+2} - \frac{1}{2} H_{n+1} - \frac{2(n+1)}{2n+1}.$$

Furthermore, we have:

$$\frac{1}{(n+1)4^n}\binom{2n}{n} = \frac{4}{(n+1)4^{n+1}}\binom{2n+2}{n+1}\,\frac{n+1}{2(2n+1)} = \frac{2}{2n+1}\,\frac{1}{4^{n+1}}\binom{2n+2}{n+1}.$$

Therefore:

$$f_{n+1} = \left(\frac{1}{2}H_{n+1} - H_{2n+2} + \frac{2(n+1)}{2n+1}\right) \times \frac{2}{2n+1}\,\frac{1}{4^{n+1}}\binom{2n+2}{n+1}.$$

This expression allows us to obtain a formula for f_n:

$$f_n = \left(\frac{1}{2}H_n - H_{2n} + \frac{2n}{2n-1}\right)\frac{2}{2n-1}\,\frac{1}{4^n}\binom{2n}{n} = \left(H_n - 2H_{2n} + \frac{4n}{2n-1}\right)(-1)^{n+1}\binom{1/2}{n}.$$

The reader can numerically check this expression against the actual values of f_n given above. By using the asymptotic approximation H_n ∼ ln n + γ given in the Introduction, we find:

$$H_n - 2H_{2n} + \frac{4n}{2n-1} \sim \ln n + \gamma - 2(\ln 2 + \ln n + \gamma) + 2 = -\ln n - \gamma - \ln 4 + 2.$$

Besides:

$$\frac{1}{2n-1}\,\frac{1}{4^n}\binom{2n}{n} \sim \frac{1}{2n\sqrt{\pi n}}$$

and we conclude:

$$f_n \sim -(\ln n + \gamma + \ln 4 - 2)\,\frac{1}{2n\sqrt{\pi n}}$$

which shows that |f_n| is in the order of ln n / n^{3/2}.
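The closed form can indeed be checked numerically, as the text suggests. A sketch with exact rationals, writing the sign factor (−1)^{n+1} explicitly as in the last formula above:

```python
# Coefficients of sqrt(1-t) * ln(1/(1-t)) by direct series multiplication,
# compared with the closed form (H_n - 2 H_{2n} + 4n/(2n-1)) (-1)^{n+1} C(1/2,n).
from fractions import Fraction

N = 12

half = [Fraction(1)]                                   # half[n] = C(1/2, n)
for n in range(1, N):
    half.append(half[-1] * (Fraction(1, 2) - (n - 1)) / n)
sqrt1mt = [half[n] * (-1) ** n for n in range(N)]      # sqrt(1-t)

log1 = [Fraction(0)] + [Fraction(1, k) for k in range(1, N)]  # ln(1/(1-t))
f = [sum(sqrt1mt[i] * log1[n - i] for i in range(n + 1)) for n in range(N)]

H = [Fraction(0)]
for n in range(1, 2 * N):
    H.append(H[-1] + Fraction(1, n))

for n in range(1, N):
    closed = (H[n] - 2 * H[2 * n] + Fraction(4 * n, 2 * n - 1)) \
             * (-1) ** (n + 1) * half[n]
    assert f[n] == closed
print("closed-form check passed")
```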
4.11 The internal path length of binary trees

Binary trees are often used as a data structure to retrieve information. A set D of keys is given, taken from an ordered universe U. Therefore D is a permutation of the ordered sequence d_1 < d_2 < ··· < d_n, and as the various elements arrive, they are inserted in a binary tree. As we know, there are n! possible permutations of the keys in D, but there are only $\binom{2n}{n}/(n+1)$ different binary trees with n nodes. When we are looking for some key d, to find whether d ∈ D or not, we perform a search in the binary tree, comparing d against the root and other keys in the tree, until we find d or we arrive at some leaf and cannot go on with our search. In the former case our search is successful, while in the latter case it is unsuccessful. The problem is: how many comparisons should we perform, on the average, to find out that d is present in the tree (successful search)? The answer to this question is very important, because it tells us how good binary trees are for searching information.

The number of nodes along a path in the tree, starting at the root and arriving at a given node K, is called the internal path length for K. It is just the number of comparisons necessary to find the key in K. Therefore, our previous problem can be stated in the following way: what is the average internal path length for binary trees with n nodes? Knuth has found a rather simple way of answering this question; however, we wish to show how the method of generating functions can be used to find the average internal path length in a standard way. The reasoning is as follows: we evaluate the total internal path length for all the trees generated by the n! possible permutations of our key set D, and then divide this number by n!n, the total number of nodes in all the trees.

A non-empty binary tree can be seen as two subtrees connected to the root (see Section 2.9); the left subtree contains k nodes (k = 0, 1, ..., n−1) and the right subtree contains the remaining n−1−k nodes. Let P_n be the total internal path length (i.p.l.) of all the n! possible trees generated by permutations. The left subtrees have therefore a total i.p.l. equal to P_k, but every search in these subtrees has to pass through the root. This increases the total i.p.l. by the total number of nodes, i.e., it actually is P_k + k!k. We now observe that every left subtree is associated to each possible right subtree and therefore it should be counted (n−1−k)! times. Besides, every permutation generating the left and right subtrees is not to be counted only once: the keys can be arranged in all possible ways in the overall permutation, retaining their relative ordering. These possible ways are $\binom{n-1}{k}$, and therefore the total contribution to P_n of the left subtrees is:

$$\binom{n-1}{k}(n-1-k)!\,(P_k + k!k) = (n-1)!\left(\frac{P_k}{k!} + k\right).$$

In a similar way we find the total contribution of the right subtrees:

$$\binom{n-1}{k}\, k!\,\bigl(P_{n-1-k} + (n-1-k)!(n-1-k)\bigr) = (n-1)!\left(\frac{P_{n-1-k}}{(n-1-k)!} + (n-1-k)\right).$$

It only remains to count the contribution of the roots, which obviously amounts to n!, a single comparison for every tree. We therefore have the following recurrence relation, in which the contributions of the left and right subtrees turn out to be the same:

$$P_n = n! + (n-1)! \sum_{k=0}^{n-1} \left( \frac{P_k}{k!} + k + \frac{P_{n-1-k}}{(n-1-k)!} + (n-1-k) \right) = n! + 2(n-1)! \sum_{k=0}^{n-1} \left( \frac{P_k}{k!} + k \right) = n! + 2(n-1)! \left( \sum_{k=0}^{n-1} \frac{P_k}{k!} + \frac{n(n-1)}{2} \right).$$

We used the formula for the sum of the first n−1 integers, and now, by dividing by n!, we have:

$$\frac{P_n}{n!} = 1 + \frac{2}{n} \sum_{k=0}^{n-1} \frac{P_k}{k!} + n - 1 = n + \frac{2}{n} \sum_{k=0}^{n-1} \frac{P_k}{k!}.$$

Let us now set Q_n = P_n/n!, so that Q_n is the average total i.p.l. relative to a single tree. If we succeed in finding Q_n, the average i.p.l. we are looking for will simply be Q_n/n. We can also reformulate the recurrence for n+1, in order to be able to apply the generating function operator:

$$(n+1)Q_{n+1} = (n+1)^2 + 2\sum_{k=0}^{n} Q_k$$

Passing to generating functions, this gives:

$$Q'(t) = \frac{1+t}{(1-t)^3} + 2\,\frac{Q(t)}{1-t} \qquad\text{that is}\qquad Q'(t) - \frac{2}{1-t}\,Q(t) = \frac{1+t}{(1-t)^3}.$$

This differential equation can be easily solved:

$$Q(t) = \frac{1}{(1-t)^2} \left( \int \frac{(1-t)^2 (1+t)}{(1-t)^3}\, dt + C \right) = \frac{1}{(1-t)^2} \left( 2\ln\frac{1}{1-t} - t + C \right).$$
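The recurrence for Q_n can be checked against the closed form Q_n = 2(n+1)H_n − 3n that the extraction of [t^n]Q(t) yields below. A sketch with exact rationals (Q_0 = 0, as for the empty tree):

```python
# (n+1) Q_{n+1} = (n+1)^2 + 2 * sum_{k<=n} Q_k, versus Q_n = 2(n+1)H_n - 3n.
from fractions import Fraction

N = 30
Q = [Fraction(0)]
acc = Fraction(0)                          # running sum Q_0 + ... + Q_n
for n in range(N):
    acc += Q[n]
    Q.append(((n + 1) ** 2 + 2 * acc) / Fraction(n + 1))

H = Fraction(0)
for n in range(1, N):
    H += Fraction(1, n)
    assert Q[n] == 2 * (n + 1) * H - 3 * n
print("internal path length check passed")
```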
Since the i.p.l. of the empty tree is 0, we should have Q_0 = Q(0) = 0 and therefore, by setting t = 0, we find C = 0. The final result is:

$$Q(t) = \frac{2}{(1-t)^2}\, \ln\frac{1}{1-t} - \frac{t}{(1-t)^2}.$$

We can now use the formula for G(nH_n) (see Section 4.4 on Common Generating Functions) to extract the coefficient of t^n:

$$Q_n = [t^n]\left( \frac{2}{(1-t)^2}\left(\ln\frac{1}{1-t} + 1\right) - \frac{2+t}{(1-t)^2} \right) = 2\,[t^{n+1}]\,\frac{t}{(1-t)^2}\left(\ln\frac{1}{1-t}+1\right) - [t^n]\,\frac{2+t}{(1-t)^2} =$$
$$= 2(n+1)H_{n+1} - 2\binom{-2}{n}(-1)^n - \binom{-2}{n-1}(-1)^{n-1} = 2(n+1)H_n + 2 - 2(n+1) - n = 2(n+1)H_n - 3n.$$

Thus we conclude with the average i.p.l.:

$$\frac{P_n}{n!\,n} = \frac{Q_n}{n} = 2\left(1 + \frac{1}{n}\right)H_n - 3.$$

This formula is asymptotic to 2 ln n + 2γ − 3, and shows that the average number of comparisons necessary to retrieve any key in a binary tree is in the order of O(ln n).

4.12 Height balanced binary trees

We have been able to show that binary trees are a "good" retrieving structure, in the sense that if the elements, or keys, of a set {a_1, a_2, ..., a_n} are stored in random order in a binary (search) tree, then the expected average time for retrieving any key in the tree is in the order of ln n. However, this behavior of binary trees is not always assured; for example, if the keys are stored in the tree in their proper order, the resulting structure degenerates into a linear list and the average retrieving time becomes O(n).

To avoid this drawback, at the beginning of the 1960's, two Russian researchers, Adelson-Velskii and Landis, found an algorithm to store keys in a "height balanced" binary tree, a tree for which the height of the left subtree of every node K differs by at most 1 from the height of the right subtree of the same node K. To understand this concept, let us define the height of a tree as the highest level at which a node in the tree is placed. The height is also the maximal number of comparisons necessary to find any key in the tree. Therefore, if we find a limitation for the height of a class of trees, this is also a limitation for the internal path length of the trees in the same class. Formally, a height balanced binary tree is a tree such that for every node K in it, if h'_K and h''_K are the heights of the two subtrees originating from K, then |h'_K − h''_K| ≤ 1.

The algorithm of Adelson-Velskii and Landis is very important because, as we are now going to show, height balanced binary trees assure that the retrieving time for every key in the tree is O(ln n). Because of that, height balanced binary trees are also known as AVL trees, and the algorithm for building AVL trees from a set of n keys can be found in any book on algorithms and data structures. Here we only wish to perform a worst case analysis to prove that the retrieval time in any AVL tree cannot be larger than O(ln n).

In order to perform our analysis, let us consider the worst possible AVL tree. Since, by definition, the height of the left subtree of any node cannot exceed the height of the corresponding right subtree plus 1, let us consider trees in which the height of the left subtree of every node exceeds by exactly 1 the height of the right subtree of the same node. In Figure 4.1 we have drawn the first cases. These trees are built in a very simple way: every tree T_n, of height n, is built by using the preceding tree T_{n−1} as the left subtree and the tree T_{n−2} as the right subtree of the root. Therefore, the number of nodes in T_n is just the sum of the nodes in T_{n−1} and in T_{n−2}, plus 1 (the root), and the condition on the heights of the subtrees of every node is satisfied. Because of this construction, T_n can be considered as the "worst" tree of height n, in the sense that every other AVL tree of height n will have at least as many nodes as T_n. Since the height n is an upper bound for the number of comparisons necessary to retrieve any key in the tree, the average retrieving time for every such tree will be ≤ n.

If we denote by |T_n| the number of nodes in the tree T_n, we have the simple recurrence relation:

$$|T_n| = |T_{n-1}| + |T_{n-2}| + 1.$$

This resembles the Fibonacci recurrence relation, and, in fact, we can easily show that |T_n| = F_{n+1} − 1, as is intuitively apparent from the beginning of the sequence {0, 1, 2, 4, 7, 12, ...}. The proof is done by mathematical induction. For n = 0 we have |T_0| = F_1 − 1 = 1 − 1 = 0, and this is true; similarly we proceed for n = 1. Therefore, let us suppose that for every k < n we have |T_k| = F_{k+1} − 1; this holds for k = n−1 and k = n−2, and because of the recurrence relation for |T_n| we have:

$$|T_n| = |T_{n-1}| + |T_{n-2}| + 1 = F_n - 1 + F_{n-1} - 1 + 1 = F_{n+1} - 1$$

since F_n + F_{n−1} = F_{n+1} by the Fibonacci recurrence.
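The induction can also be checked mechanically. A sketch, using the Fibonacci indexing F_1 = 1, F_2 = 2 that the base cases in this section imply:

```python
# |T_n| = |T_{n-1}| + |T_{n-2}| + 1 versus |T_n| = F_{n+1} - 1,
# with F_0 = F_1 = 1, F_2 = 2, ... (the indexing implied by |T_0| = F_1 - 1).
N = 20
F = [1, 1, 2]
for n in range(3, N + 2):
    F.append(F[-1] + F[-2])

T = [0, 1]                                 # |T_0| = 0, |T_1| = 1
for n in range(2, N + 1):
    T.append(T[n - 1] + T[n - 2] + 1)

for n in range(N + 1):
    assert T[n] == F[n + 1] - 1
print("AVL worst-case tree check passed")
```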
Figure 4.1: the worst-case height balanced trees T_0, T_1, T_2, T_3, T_4, T_5.
We have shown that for large values of n we have F_n ≈ φ^n/√5; therefore we have |T_n| ≈ φ^{n+1}/√5 − 1, or φ^{n+1} ≈ √5(|T_n| + 1). By passing to the logarithms, we have: n ≈ log_φ(√5(|T_n| + 1)) − 1, and since all logarithms are proportional, n = O(ln |T_n|). As we observed, every AVL tree of height n has a number of nodes not less than |T_n|, and this assures that the retrieving time for every AVL tree with at most |T_n| nodes is bounded from above by log_φ(√5(|T_n| + 1)) − 1.

4.13 Some special recurrences

Not all recurrence relations are linear, and we had occasion to deal with a different sort of relation when we studied the Catalan numbers. They satisfy the recurrence $C_n = \sum_{k=0}^{n-1} C_k C_{n-k-1}$, which however, in this particular form, is only valid for n > 0. In order to apply the method of generating functions, we write it for n+1:

$$C_{n+1} = \sum_{k=0}^{n} C_k C_{n-k}.$$

We are now in a position to find the exponential generating function of the Bernoulli numbers, i.e., the function B(t) = G(B_n/n!), and prove some of their properties. The defining relation can be written as:

$$\sum_{k=0}^{n} \binom{n+1}{n-k} B_{n-k} = \sum_{k=0}^{n} (n+1)n\cdots(k+2)\, \frac{B_{n-k}}{(n-k)!} = \sum_{k=0}^{n} \frac{(n+1)!}{(k+1)!}\, \frac{B_{n-k}}{(n-k)!} = \delta_{n,0}.$$

If we divide everything by (n+1)!, we obtain:

$$\sum_{k=0}^{n} \frac{1}{(k+1)!}\, \frac{B_{n-k}}{(n-k)!} = \delta_{n,0}$$

and since this relation holds for every n ∈ N, we can pass to the generating functions. The left hand member is a convolution, whose first factor is the shift of the exponential function, and therefore we obtain:

$$\frac{e^t - 1}{t}\, B(t) = 1 \qquad\text{that is}\qquad B(t) = \frac{t}{e^t - 1}.$$
Riordan Arrays

5.1 Definitions and basic concepts

A Riordan array is a couple of formal power series D = (d(t), h(t)); if both d(t), h(t) ∈ F^0, then the Riordan array is called proper. The Riordan array can be identified with the infinite, lower triangular array (or triangle) (d_{n,k})_{n,k∈N} defined by:

$$d_{n,k} = [t^n]\, d(t)\,(t h(t))^k \tag{5.1.1}$$

Theorem 5.1.1 Let D = (d(t), h(t)) be a Riordan array and let f(t) be the generating function of the sequence (f_k)_{k∈N}; then we have:

$$\sum_{k=0}^{n} d_{n,k} f_k = [t^n]\, d(t)\, f(t h(t)) \tag{5.1.3}$$

Proof: The proof consists in a straightforward computation:

$$\sum_{k=0}^{n} d_{n,k} f_k = \sum_{k=0}^{\infty} f_k\, [t^n]\, d(t)\,(t h(t))^k = [t^n]\, d(t) \sum_{k=0}^{\infty} f_k\, (t h(t))^k = [t^n]\, d(t)\, f(t h(t)).$$
56 CHAPTER 5. RIORDAN ARRAYS
out of every three powers of 2, starting with 2^n and going down to the lowest integer exponent ≥ 0; we have:

$$S_n = \sum_{k=0}^{\lfloor n/3 \rfloor} 2^{n-3k} = [t^n]\, \frac{1}{1-2t}\,\frac{1}{1-t^3}.$$

As we will learn studying asymptotics, an approximate value for this sum can be obtained by extracting the coefficient of the first factor and then by multiplying it by the second factor, in which t is substituted by 1/2. This gives S_n ≈ 2^{n+3}/7, and in fact we have the exact value S_n = ⌊2^{n+3}/7⌋.

In a sense, the theorem on the sums involving the Riordan arrays is a characterization for them; in fact, we can prove a sort of inverse property:

Theorem 5.1.2 Let (d_{n,k})_{n,k∈N} be an infinite triangle such that for every sequence (f_k)_{k∈N} we have $\sum_k d_{n,k} f_k = [t^n]\, d(t)\, f(t h(t))$, where f(t) is the generating function of the sequence and d(t), h(t) are two f.p.s. not depending on f(t). Then the triangle defined by the Riordan array (d(t), h(t)) coincides with (d_{n,k})_{n,k∈N}.

By definition, the last expression denotes the generic element of the Riordan array (f(t), g(t)) where f(t) = d(t)a(th(t)) and g(t) = h(t)b(th(t)). Therefore we have:

$$(d(t), h(t)) \cdot (a(t), b(t)) = \bigl(d(t)\, a(t h(t)),\; h(t)\, b(t h(t))\bigr). \tag{5.2.1}$$

This expression is particularly important and is the basis for many developments of the Riordan array theory.

The product is obviously associative, and we observe that the Riordan array (1, 1) acts as the neutral element or identity. In fact, the array (1, 1) is everywhere 0 except for the elements on the main diagonal, which are 1. Observe that this array is proper.

Let us now suppose that (d(t), h(t)) is a proper Riordan array. By formula (5.2.1), we immediately see that the product of two proper Riordan arrays is proper; therefore, we can look for a proper Riordan array (a(t), b(t)) such that (d(t), h(t)) · (a(t), b(t)) = (1, 1). If this is the case, we should have:

$$d(t)\, a(t h(t)) = 1 \qquad\text{and}\qquad h(t)\, b(t h(t)) = 1.$$
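Definition (5.1.1) and the sum rule (5.1.3) are easy to check on a concrete case. A sketch on the Pascal triangle, which is the Riordan array with d(t) = h(t) = 1/(1−t), so that d_{n,k} = C(n,k); with f_k = 1 the sum rule gives the row sums 2^n:

```python
# d_{n,k} = [t^n] d(t) (t h(t))^k for d = h = 1/(1-t), and the sum rule
# sum_k d_{n,k} f_k = [t^n] d(t) f(t h(t)) with f_k = 1, i.e. f(y) = 1/(1-y).
from math import comb

N = 10

def mul(a, b):
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

d = [1] * N                       # d(t) = 1/(1-t)
th = [0] + [1] * (N - 1)          # t h(t) = t/(1-t)

col = d[:]                        # coefficients of d(t) (t h(t))^k, k = 0
for k in range(N):
    for n in range(N):
        assert col[n] == comb(n, k)
    col = mul(col, th)

f_comp = [0] * N                  # f(t h(t)) = sum_k (t h(t))^k, truncated
yk = [1] + [0] * (N - 1)
for _ in range(N):
    f_comp = [a + b for a, b in zip(f_comp, yk)]
    yk = mul(yk, th)
row_sum = mul(d, f_comp)
for n in range(N):
    assert row_sum[n] == 2 ** n
print("Riordan array checks passed")
```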
By setting y = th(t) we have:

$$a(y) = \left[ d(t)^{-1} \,\middle|\, t = y h(t)^{-1} \right] \qquad\qquad b(y) = \left[ h(t)^{-1} \,\middle|\, t = y h(t)^{-1} \right].$$

Here we are in the hypotheses of the Lagrange Inversion Formula, and therefore there is a unique function t = t(y) such that t(0) = 0 and t = yh(t)^{−1}. Besides, being d(t), h(t) ∈ F^0, the two f.p.s. a(y) and b(y) are uniquely defined. We have therefore proved:

Theorem 5.2.1 The set A of proper Riordan arrays is a group with the operation of row-by-column product defined functionally by relation (5.2.1).

It is a simple matter to show that some important classes of Riordan arrays are subgroups of A:

• the set of the Riordan arrays (f(t), 1) is an invariant subgroup of A; it is called the Appell subgroup;

• the set of the Riordan arrays (1, g(t)) is a subgroup of A and is called the subgroup of associated operators or the Lagrange subgroup;

• the set of the Riordan arrays (f(t), f(t)) is a subgroup of A and is called the Bell subgroup. Its elements are also known as renewal arrays.

The first two subgroups have already been considered in the Chapter on "Formal Power Series" and show the connection between f.p.s. and Riordan arrays. The notations used in that Chapter are thus explained as particular cases of the most general case of (proper) Riordan arrays.

Let us now return to the formulas for a Riordan array inverse. If h(t) is any fixed invertible f.p.s., let us define:

$$d^h(t) = \left[ d(y)^{-1} \,\middle|\, y = t h(y)^{-1} \right]$$

so that we can write (d(t), h(t))^{−1} = (d^h(t), h^h(t)). By the product formula (5.2.1) we immediately find the identities:

$$d(t h^h(t)) = d^h(t)^{-1} \qquad\qquad d^h(t h(t)) = d(t)^{-1}$$
$$h(t h^h(t)) = h^h(t)^{-1} \qquad\qquad h^h(t h(t)) = h(t)^{-1}$$

which can be reduced to the single and basic rule:

$$f(t h^h(t)) = f^h(t)^{-1} \qquad \forall f(t) \in F^0.$$

Observe that obviously (f^h)^h(t) = f(t).

We wish now to find an explicit expression for the generic element d_{n,k} in the inverse Riordan array (d(t), h(t))^{−1} in terms of d(t) and h(t). This will be done by using the LIF. As we observed in the first section, the bivariate generating function for (d(t), h(t)) is d(t)/(1 − tzh(t)), and therefore we have:

$$d_{n,k} = [t^n z^k]\, \frac{d^h(t)}{1 - t z\, h^h(t)} = [z^k][t^n] \left[ \frac{d^h(t)}{1 - zy} \,\middle|\, y = t\, h^h(t) \right].$$

By the formulas above, we have:

$$y = t\, h^h(t) = t\, h(t\, h^h(t))^{-1} = t\, h(y)^{-1}$$

which is the same as t = y h(y). Therefore we find: d^h(t) = d^h(y h(y)) = d(y)^{−1}, and consequently:

$$d_{n,k} = [z^k][t^n] \left[ \frac{d(y)^{-1}}{1 - zy} \,\middle|\, y = t\, h(y)^{-1} \right] = [z^k]\, \frac{1}{n}\, [y^{n-1}]\, \frac{d}{dy}\!\left( \frac{d(y)^{-1}}{1 - zy} \right) \frac{1}{h(y)^n} =$$
$$= [z^k]\, \frac{1}{n}\, [y^{n-1}] \left( \frac{z}{(1-zy)^2} - \frac{d'(y)}{d(y)}\,\frac{1}{1-zy} \right) \frac{1}{d(y)\, h(y)^n} = [z^k]\, \frac{1}{n}\, [y^{n-1}] \left( \sum_{r=0}^{\infty} (r+1)\, z^{r+1} y^r - \frac{d'(y)}{d(y)} \sum_{r=0}^{\infty} z^r y^r \right) \frac{1}{d(y)\, h(y)^n} =$$
$$= \frac{1}{n}\, [y^{n-1}] \left( k\, y^{k-1} - \frac{d'(y)}{d(y)}\, y^k \right) \frac{1}{d(y)\, h(y)^n} = \frac{1}{n}\, [y^{n-k}] \left( k - \frac{y\, d'(y)}{d(y)} \right) \frac{1}{d(y)\, h(y)^n}.$$

This is the formula we were looking for.

5.3 The A-sequence for proper Riordan arrays

Proper Riordan arrays play a very important role in our approach. Let us consider a Riordan array D = (d(t), h(t)) which is not proper, but has d(t) ∈ F^0. Since h(0) = 0, an s > 0 exists such that h(t) = h_s t^s + h_{s+1} t^{s+1} + ··· and h_s ≠ 0. If we define ĥ(t) = h_s + h_{s+1} t + ···, then ĥ(t) ∈ F^0. Consequently, the Riordan array D̂ = (d(t), ĥ(t)) is proper, and the rows of D can be seen as the s-diagonals (d̂_{n−sk,k})_{k∈N} of D̂.

Fortunately, for proper Riordan arrays, Rogers has found an important characterization: every element d_{n+1,k+1}, n, k ∈ N, can be expressed as a linear combination of the elements in the preceding row, i.e.:

$$d_{n+1,k+1} = a_0 d_{n,k} + a_1 d_{n,k+1} + a_2 d_{n,k+2} + \cdots = \sum_{j=0}^{\infty} a_j\, d_{n,k+j}. \tag{5.3.1}$$

The sum is actually finite, and the sequence A = (a_0, a_1, a_2, ...) is such that this uniquely determines A when h(t) is given and, vice versa, h(t) is uniquely determined when A is given. The A-sequence for the Pascal triangle is the so-
(ak )k∈N is fixed. More precisely, we can prove the lution A(y) of the functional equation 1/(1 − t) =
following theorem: A(t/(1 − t)). The simple substitution y = t/(1 − t)
gives A(y) = 1 + y, corresponding to the well-known
¡ ¢
Theorem 5.3.1 An infinite lower triangular array basic recurrence of the Pascal triangle: n+1 =
¡n¢ ¡ n ¢ k+1
D = (dn,k )n,k∈N is a Riordan array if and only if a
k + k+1 . At this point, we realize that we could
sequence A = {a0 6= 0, a1 , a2 , . . .} exists such that for have started with this recurrence relation and directly
every n, k ∈ N relation (5.3.1) holds found A(y) = 1+y. Now, h(t) is defined by (5.3.2) as
Proof: Let us suppose that D is the Riordan the solution of h(t) = 1 + th(t), and this immediately
array (d(t), h(t)) and let us consider the Riordan gives h(t) = 1/(1−t). Furthermore, since column 0 is
array (d(t)h(t), h(t)); we define the Riordan array {1, 1, 1, . . .}, we have proved that the Pascal triangle
(A(t), B(t)) by the relation: corresponds to the Riordan array (1/(1−t), 1/(1−t))
as initially stated.
(A(t), B(t)) = (d(t), h(t))−1 · (d(t)h(t), h(t)) The pair of functions d(t) and A(t) completely
characterize a proper Riordan array. Another type
or: of characterization is obtained through the following
observation:
(d(t), h(t)) · (A(t), B(t)) = (d(t)h(t), h(t)).
Theorem 5.3.2 Let (dn,k )n,k∈N be any infinite,
By performing the product we find: lower triangular array with dn,n 6= 0, ∀n ∈ N (in
particular, let it be a proper Riordan array); then a
d(t)A(th(t)) = d(t)h(t) and h(t)B(th(t)) = h(t). unique sequence Z = (zk )k∈N exists such that every
element in column 0 can be expressed as a linear com-
The latter identity gives B(th(t)) = 1 and this implies bination of all the elements in the preceding row, i.e.:
B(t) = 1. Therefore we have (d(t), h(t)) · (A(t), 1) = ∞
X
(d(t)h(t), h(t)).
P∞ The element fP n,k of the left hand dn+1,0 = z0 dn,0 + z1 dn,1 + z2 dn,2 + · · · = zj dn,j .
∞
member is j=0 dn,j ak−j = j=0 dn,k+j aj , if as j=0
usual we interpret ak−j as 0 when k < j. The same
element in the right hand member is: (5.3.3)
Proof: Let z0 = d1,0 /d0,0 . Now we can uniquely
[tn ]d(t)h(t)(th(t))k = determine the value of z1 by expressing d2,0 in terms
of the elements in row 1, i.e.:
= [tn+1 ]d(t)(th(t))k+1 = dn+1,k+1 .
d0,0 d2,0 − d21,0
By equating these two quantities, we have the iden- d2,0 = z0 d1,0 + z1 d1,1 or z1 = .
d0,0 d1,1
tity (5.3.1). For the converse, let us observe that
(5.3.1) uniquely defines the array D when the ele- In the same way, we determine z2 by expressing d3,0
ments {d0,0 , d1,0 , d2,0 , . . .} of column 0 are given. Let in terms of the elements in row 2, and by substituting
d(t) be the generating function of this column, A(t) the values just obtained for z0 and z1 . By proceeding
the generating function of the sequence A and de- in the same way, we determine the sequence Z in a
fine h(t) as the solution of the functional equation unique way.
h(t) = A(th(t)), which is uniquely determined be- The sequence Z is called the Z-sequence for the
cause of our hypothesis a0 6= 0. We can therefore (Riordan) array; it characterizes column 0, except
consider the proper Riordan array D b = (d(t), h(t));
for the element d0,0 . Therefore, we can say that
by the first part of the theorem, D b satisfies relation the triple (d0,0 , A(t), Z(t)) completely characterizes
(5.3.1), for every n, k ∈ N and therefore, by our previ- a proper Riordan array. To see how the Z-sequence
ous observation, it must coincide with D. This com- is obtained by starting with the usual definition of a
pletes the proof. Riordan array, let us prove the following:
The sequence A = (ak )k∈N is called the A-sequence Theorem 5.3.3 Let (d(t), h(t)) be a proper Riordan
of the Riordan array D = (d(t), h(t)) and it only array and let Z(t) be the generating function of the
depends on h(t). In fact, as we have shown during array’s Z-sequence. We have:
the proof of the theorem, we have: d0,0
d(t) =
h(t) = A(th(t)) (5.3.2) 1 − tZ(th(t))
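Theorem 5.3.1 can be read as an algorithm: given column 0 and the A-sequence, recurrence (5.3.1) rebuilds the whole triangle row by row. A minimal Python sketch (the helper name is ours, not the book's), checked on the Pascal triangle:

```python
from math import comb

def riordan_rows(col0, a_seq, rows):
    """Build a proper Riordan array row by row from its column 0 and its
    A-sequence, using Rogers' recurrence (5.3.1):
    d[n+1][k+1] = a_0*d[n][k] + a_1*d[n][k+1] + a_2*d[n][k+2] + ..."""
    d = [[col0[0]]]
    for n in range(rows - 1):
        prev = d[-1]
        row = [col0[n + 1]]          # column 0 is prescribed separately
        for k in range(len(prev)):
            row.append(sum(a * prev[k + j]
                           for j, a in enumerate(a_seq) if k + j < len(prev)))
        d.append(row)
    return d

# Pascal triangle: column 0 = (1, 1, 1, ...) and A(y) = 1 + y, i.e. A = (1, 1)
pascal = riordan_rows([1] * 8, [1, 1], 8)
assert pascal[5] == [comb(5, k) for k in range(6)]
```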
Proof: By the preceding theorem, the Z-sequence exists and is unique. Therefore, equation (5.3.3) is valid for every $n \in \mathbb{N}$, and we can pass to the generating functions. Since $d(t)(th(t))^k$ is the generating function for column $k$, we have:

$$\frac{d(t) - d_{0,0}}{t} = z_0 d(t) + z_1 d(t)\,th(t) + z_2 d(t)(th(t))^2 + \cdots = d(t)\left(z_0 + z_1 th(t) + z_2(th(t))^2 + \cdots\right) = d(t)Z(th(t)).$$

By solving this equation in $d(t)$, we immediately find the relation desired.

The relation can be inverted and this gives us the formula for the Z-sequence:

$$Z(y) = \left[ \frac{d(t) - d_{0,0}}{t\,d(t)} \,\middle|\, t = yh(t)^{-1} \right].$$

We conclude this section by giving a theorem, which characterizes renewal arrays by means of the A- and Z-sequences:

Theorem 5.3.4  Let $d(0) = h(0) \neq 0$. Then $d(t) = h(t)$ if and only if $A(y) = d(0) + yZ(y)$.

Proof: Let us assume that $A(y) = d(0) + yZ(y)$, or $Z(y) = (A(y) - d(0))/y$. By the previous theorem, we have:

$$d(t) = \frac{d(0)}{1 - tZ(th(t))} = \frac{d(0)}{1 - (tA(th(t)) - d(0)t)/(th(t))} = \frac{d(0)\,th(t)}{d(0)\,t} = h(t),$$

because $A(th(t)) = h(t)$ by formula (5.3.2). Vice versa, by the formula for $Z(y)$, we obtain from the hypothesis $d(t) = h(t)$:

$$d(0) + yZ(y) = d(0) + y\left[ \frac{1}{t} - \frac{d(0)}{th(t)} \,\middle|\, t = yh(t)^{-1} \right] = d(0) + \left[ \frac{th(t)}{t} - \frac{d(0)\,th(t)}{th(t)} \,\middle|\, t = yh(t)^{-1} \right] = \left[ h(t) \,\middle|\, t = yh(t)^{-1} \right] = A(y).$$

5.4 Simple binomial coefficients

Let us consider simple binomial coefficients, i.e., binomial coefficients of the form $\binom{n+ak}{m+bk}$, where $a, b$ are two parameters and $k$ is a non-negative integer variable. Depending on whether we consider $n$ a variable and $m$ a parameter, or vice versa, we have two different infinite arrays $(d_{n,k})$ or $(\hat{d}_{m,k})$, whose elements depend on the parameters $a, b, m$ or $a, b, n$, respectively. In either case, if some conditions on $a, b$ hold, we have Riordan arrays and therefore we can apply formula (5.1.3) to find the value of many sums.

Theorem 5.4.1  Let $d_{n,k}$ and $\hat{d}_{m,k}$ be as above. If $b > a$ and $b - a$ is an integer, then $D = (d_{n,k})$ is a Riordan array. If $b < 0$ is an integer, then $\hat{D} = (\hat{d}_{m,k})$ is a Riordan array. We have:

$$D = \left( \frac{t^m}{(1-t)^{m+1}},\; \frac{t^{b-a-1}}{(1-t)^b} \right), \qquad \hat{D} = \left( (1+t)^n,\; t^{-b-1}(1+t)^a \right).$$

Proof: By using well-known properties of binomial coefficients, we find:

$$d_{n,k} = \binom{n+ak}{m+bk} = \binom{n+ak}{n-m+ak-bk} = \binom{-n-ak+n-m+ak-bk-1}{n-m+ak-bk}(-1)^{n-m+ak-bk} =$$
$$= \binom{-m-bk-1}{(n-m)+(a-b)k}(-1)^{n-m+ak-bk} = [t^{n-m+ak-bk}]\,\frac{1}{(1-t)^{m+1+bk}} = [t^n]\,\frac{t^m}{(1-t)^{m+1}}\left(\frac{t^{b-a}}{(1-t)^b}\right)^{\!k};$$

and:

$$\hat{d}_{m,k} = [t^{m+bk}](1+t)^{n+ak} = [t^m](1+t)^n\left(t^{-b}(1+t)^a\right)^k.$$

The theorem now directly follows from (5.1.1).

For $m = a = 0$ and $b = 1$ we again find the Riordan array of the Pascal triangle. The sum (5.1.3) takes on two specific forms which are worth being stated explicitly:

$$\sum_k \binom{n+ak}{m+bk} f_k = [t^n]\,\frac{t^m}{(1-t)^{m+1}}\,f\!\left(\frac{t^{b-a}}{(1-t)^b}\right) \qquad b > a \tag{5.4.1}$$

$$\sum_k \binom{n+ak}{m+bk} f_k = [t^m](1+t)^n f\!\left(t^{-b}(1+t)^a\right) \qquad b < 0. \tag{5.4.2}$$
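Formulas (5.4.1) and (5.4.2) lend themselves to direct numerical verification. The following sketch checks (5.4.1) in the case $a = 1$, $b = 2$, $m = 1$, $f_k = 1$ (so $f(y) = 1/(1-y)$), using ad-hoc truncated power-series helpers that are ours, purely for illustration:

```python
from math import comb

N = 12  # truncation order for all power series

def mul(p, q):
    """Truncated Cauchy product of two power series (coefficient lists)."""
    r = [0] * N
    for i, a in enumerate(p[:N]):
        if a:
            for j, b in enumerate(q[:N - i]):
                r[i + j] += a * b
    return r

def geom(u):
    """1/(1 - u(t)) for a series u with u(0) = 0, truncated at order N."""
    r = [0] * N; r[0] = 1
    p = list(r)
    for _ in range(N):       # u^j contributes nothing below t^j, so N steps suffice
        p = mul(p, u)
        r = [x + y for x, y in zip(r, p)]
    return r

# sum_k C(n+k, 1+2k) = [t^n] t/(1-t)^2 * 1/(1 - t/(1-t)^2)
th = [j for j in range(N)]            # [t^j] t/(1-t)^2 = j  (this is t*h(t))
d = [comb(j, 1) for j in range(N)]    # [t^j] t^1/(1-t)^2, the d(t) of the array
rhs = mul(d, geom(th))
for n in range(N):
    assert sum(comb(n + k, 1 + 2 * k) for k in range(n + 1)) == rhs[n]
```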
If $m$ and $n$ are independent of each other, these relations can also be stated as generating function identities. The binomial coefficient $\binom{n+ak}{m+bk}$ is so general that a large number of combinatorial sums can be solved by means of the two formulas (5.4.1) and (5.4.2).

Let us begin our set of examples with a simple sum; by the theorem above, the binomial coefficient $\binom{n-k}{m}$ corresponds to the Riordan array $(t^m/(1-t)^{m+1}, 1)$; therefore, by the formula concerning the row sums, we have:

$$\sum_k \binom{n-k}{m} = [t^n]\,\frac{t^m}{(1-t)^{m+1}}\,\frac{1}{1-t} = [t^{n-m}]\,\frac{1}{(1-t)^{m+2}} = \binom{n+1}{m+1}.$$

Another simple example is the sum:

$$\sum_k \binom{n}{2k+1} 5^k = [t^n]\left[ \frac{t}{(1-t)^2}\,\frac{1}{1-5y} \,\middle|\, y = \frac{t^2}{(1-t)^2} \right] = \frac{1}{2}\,[t^n]\,\frac{2t}{1-2t-4t^2} = 2^{n-1}F_n.$$

The following sum is a more interesting case. From the generating function of the Catalan numbers we immediately find:

$$\sum_k \binom{n+k}{m+2k}\binom{2k}{k}\frac{(-1)^k}{k+1} = [t^n]\,\frac{t^m}{(1-t)^{m+1}}\left[ \frac{\sqrt{1+4y}-1}{2y} \,\middle|\, y = \frac{t}{(1-t)^2} \right] =$$
$$= [t^{n-m}]\,\frac{1}{(1-t)^{m+1}}\left(\sqrt{1+\frac{4t}{(1-t)^2}}-1\right)\frac{(1-t)^2}{2t} = [t^{n-m}]\,\frac{1}{(1-t)^m} = \binom{n-1}{m-1}.$$

In the following sum we use the bisection formulas. Because the generating function for $\binom{z+1}{k}2^k$ is $(1+2t)^{z+1}$, we have:

$$\mathcal{G}\!\left(\binom{z+1}{2k+1}2^{2k+1}\right) = \frac{1}{2\sqrt{t}}\left((1+2\sqrt{t})^{z+1} - (1-2\sqrt{t})^{z+1}\right).$$

By applying formula (5.4.2):

$$\sum_k \binom{z+1}{2k+1}\binom{z-2k}{n-k}2^{2k+1} = [t^n](1+t)^z\left[ \frac{(1+2\sqrt{y})^{z+1} - (1-2\sqrt{y})^{z+1}}{2\sqrt{y}} \,\middle|\, y = \frac{t}{(1+t)^2} \right] =$$
$$= [t^n](1+t)^z\,\frac{1+t}{2\sqrt{t}}\,\frac{(1+t+2\sqrt{t})^{z+1} - (1+t-2\sqrt{t})^{z+1}}{(1+t)^{z+1}} = [t^{2n+1}](1+t)^{2z+2} = \binom{2z+2}{2n+1};$$

in the last but one passage, we used the bisection rule backwards, since $(1+t\pm2\sqrt{t})^{z+1} = (1\pm\sqrt{t})^{2z+2}$.

We solve the following sum by using (5.4.2):

$$\sum_k \binom{2n-2k}{m-k}\binom{n}{k}(-2)^k = [t^m](1+t)^{2n}\left[ (1-2y)^n \,\middle|\, y = \frac{t}{(1+t)^2} \right] = [t^m](1+t^2)^n = \binom{n}{m/2},$$

where the binomial coefficient is to be taken as zero when $m$ is odd.

5.5 Other Riordan arrays from binomial coefficients

Other Riordan arrays can be found by using the theorem in the previous section and the rule (valid for $\alpha \neq 0$):

$$\frac{\alpha\pm\beta}{\alpha}\binom{\alpha}{\beta} = \binom{\alpha}{\beta} \pm \binom{\alpha-1}{\beta-1}.$$

For example we find:

$$\frac{2n}{n+k}\binom{n+k}{2k} = \frac{2n}{n+k}\binom{n+k}{n-k} = \binom{n+k}{n-k} + \binom{n+k-1}{n-k-1} = \binom{n+k}{2k} + \binom{n-1+k}{2k}.$$

Hence, by formula (5.4.1), we have:

$$\sum_k \frac{2n}{n+k}\binom{n+k}{n-k}f_k = \sum_k \binom{n+k}{2k}f_k + \sum_k \binom{n-1+k}{2k}f_k =$$
$$= [t^n]\,\frac{1}{1-t}\,f\!\left(\frac{t}{(1-t)^2}\right) + [t^{n-1}]\,\frac{1}{1-t}\,f\!\left(\frac{t}{(1-t)^2}\right) = [t^n]\,\frac{1+t}{1-t}\,f\!\left(\frac{t}{(1-t)^2}\right).$$
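The identity $\sum_k \binom{n}{2k+1}5^k = 2^{n-1}F_n$ derived above is easy to confirm numerically; a quick check, assuming the standard Fibonacci indexing $F_1 = F_2 = 1$:

```python
from math import comb

def fib(n):
    """Fibonacci numbers with F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in range(1, 20):
    s = sum(comb(n, 2 * k + 1) * 5 ** k for k in range(n))
    assert s == 2 ** (n - 1) * fib(n)
```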
This proves that the infinite triangle of the elements $\frac{2n}{n+k}\binom{n+k}{2k}$ is a proper Riordan array and many identities can be proved by means of the previous formula. For example:

$$\sum_k \frac{2n}{n+k}\binom{n+k}{n-k}\binom{2k}{k}(-1)^k = [t^n]\,\frac{1+t}{1-t}\left[ \frac{1}{\sqrt{1+4y}} \,\middle|\, y = \frac{t}{(1-t)^2} \right] = [t^n]\,1 = \delta_{n,0},$$

$$\sum_k \frac{2n}{n+k}\binom{n+k}{n-k}\binom{2k}{k}\frac{(-1)^k}{k+1} = [t^n]\,\frac{1+t}{1-t}\left[ \frac{\sqrt{1+4y}-1}{2y} \,\middle|\, y = \frac{t}{(1-t)^2} \right] = [t^n](1+t) = \delta_{n,0} + \delta_{n,1}.$$

The following is a quite different case. Let $f(t) = \mathcal{G}(f_k)$ and:

$$\hat{f}(t) = \mathcal{G}\!\left(\frac{f_k}{k}\right) = \int_0^t \frac{f(\tau) - f_0}{\tau}\,d\tau.$$

Obviously we have:

$$\frac{n}{n-k}\binom{n-k}{k} = \frac{n}{k}\binom{n-k-1}{k-1}$$

except for $k = 0$, when the left-hand side is 1 and the right-hand side is not defined. By formula (5.4.1):

$$\sum_k \frac{n}{n-k}\binom{n-k}{k}f_k = f_0 + n\sum_{k=1}^{\infty}\binom{n-k-1}{k-1}\frac{f_k}{k} = f_0 + n\,[t^n]\,\hat{f}\!\left(\frac{t^2}{1-t}\right). \tag{5.5.1}$$

This gives an immediate proof of the following formula known as Hardy's identity (valid for $n > 0$):

$$\sum_k \frac{n}{n-k}\binom{n-k}{k}(-1)^k = 1 + n\,[t^n]\ln\frac{1-t^2}{1+t^3} = 1 + n\,[t^n]\left(\ln\frac{1}{1+t^3} - \ln\frac{1}{1-t^2}\right) =$$
$$= \begin{cases} 2(-1)^n & \text{if } 3 \text{ divides } n \\ (-1)^{n-1} & \text{otherwise.} \end{cases}$$

We also immediately obtain:

$$\sum_k \frac{1}{n-k}\binom{n-k}{k} = \frac{\phi^n + \hat{\phi}^n}{n}$$

where $\phi$ is the golden ratio and $\hat{\phi} = -\phi^{-1}$. The reader can generalize formula (5.5.1) by using the change of variable $t \to pt$ and prove other formulas. The following one is known as Riordan's old identity:

$$\sum_k \frac{n}{n-k}\binom{n-k}{k}(a+b)^{n-2k}(-ab)^k = a^n + b^n$$

while this is a generalization of Hardy's identity:

$$\sum_k \frac{n}{n-k}\binom{n-k}{k}x^{n-2k}(-1)^k = \frac{(x+\sqrt{x^2-4})^n + (x-\sqrt{x^2-4})^n}{2^n}.$$

5.6 Binomial coefficients and the LIF

In a few cases only, the formulas of the previous sections give the desired result when the $m$ and $n$ in the numerator and denominator of a binomial coefficient are related to each other. In fact, in that case, we have to extract the coefficient of $t^n$ from a function depending on the same variable $n$ (or $m$). This requires applying the Lagrange Inversion Formula, according to the diagonalization rule. Let us suppose we have the binomial coefficient $\binom{2n-k}{n-k}$ and we wish to know whether it corresponds to a Riordan array or not. We have:

$$\binom{2n-k}{n-k} = [t^{n-k}](1+t)^{2n-k} = [t^n](1+t)^{2n}\left(\frac{t}{1+t}\right)^{\!k}.$$

The function $(1+t)^{2n}$ cannot be assumed as the $d(t)$ function of a Riordan array because it varies as $n$ varies. Therefore, let us suppose that $k$ is fixed; we can apply the diagonalization rule with $F(t) = (t/(1+t))^k$ and $\phi(t) = (1+t)^2$, and try to find a true generating function. We have to solve the equation:

$$w = t\phi(w) \qquad\text{or}\qquad w = t(1+w)^2.$$

This equation is $tw^2 - (1-2t)w + t = 0$ and we are looking for the unique solution $w = w(t)$ such that $w(0) = 0$. This is:

$$w(t) = \frac{1 - 2t - \sqrt{1-4t}}{2t}.$$

We now perform the necessary computations:

$$F(w) = \left(\frac{w}{1+w}\right)^{\!k} = \left(\frac{1-2t-\sqrt{1-4t}}{1-\sqrt{1-4t}}\right)^{\!k} = \left(\frac{1-\sqrt{1-4t}}{2}\right)^{\!k};$$

furthermore:

$$\frac{1}{1-t\phi'(w)} = \frac{1}{1-2t(1+w)} = \frac{1}{\sqrt{1-4t}}.$$

Therefore, the diagonalization gives:

$$\binom{2n-k}{n-k} = [t^n]\,\frac{1}{\sqrt{1-4t}}\left(\frac{1-\sqrt{1-4t}}{2}\right)^{\!k}.$$

This shows that the binomial coefficient is the generic element of the Riordan array:

$$D = \left( \frac{1}{\sqrt{1-4t}},\; \frac{1-\sqrt{1-4t}}{2t} \right).$$

As a check, we observe that column 0 contains all the elements with $k = 0$, i.e., $\binom{2n}{n}$, and this is in accordance with the generating function $d(t) = 1/\sqrt{1-4t}$. A simple example is:

$$\sum_{k=0}^n \binom{2n-k}{n-k}2^k = [t^n]\left[ \frac{1}{\sqrt{1-4t}}\,\frac{1}{1-2y} \,\middle|\, y = \frac{1-\sqrt{1-4t}}{2} \right] = [t^n]\,\frac{1}{\sqrt{1-4t}}\,\frac{1}{\sqrt{1-4t}} = [t^n]\,\frac{1}{1-4t} = 4^n.$$

By using the diagonalization rule as above, we can show that:

$$\left(\binom{2n+ak}{n-ck}\right)_{\!k\in\mathbb{N}} = \left( \frac{1}{\sqrt{1-4t}},\; t^{c-1}\left(\frac{1-\sqrt{1-4t}}{2t}\right)^{\!a+2c} \right).$$

An interesting example is given by the following alternating sum:

$$\sum_k (-1)^k\binom{2n}{n-3k} = [t^n]\,\frac{1}{\sqrt{1-4t}}\left[ \frac{1}{1+y} \,\middle|\, y = t^3\left(\frac{1-\sqrt{1-4t}}{2t}\right)^{\!6}\, \right] =$$
$$= [t^n]\left( \frac{1}{2\sqrt{1-4t}} + \frac{1-t}{2(1-3t)} \right) = \frac{1}{2}\binom{2n}{n} + 3^{n-1} + \frac{\delta_{n,0}}{6}.$$

The reader is invited to solve, in a similar way, the corresponding non-alternating sum. In the same way we can deal with binomial coefficients of the form $\binom{pn+ak}{n-ck}$, but in this case, in order to apply the LIF, we have to solve an equation of degree $p > 2$. This creates many difficulties, and we do not insist on it any longer.
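Several of the sums derived in these two sections can be verified directly for small $n$; a short sketch checking the $4^n$ sum, the alternating sum, and Hardy's identity (rewriting its weight through $\frac{n}{n-k}\binom{n-k}{k} = \binom{n-k}{k} + \binom{n-k-1}{k-1}$):

```python
from math import comb

# sum_k C(2n-k, n-k) 2^k = 4^n
for n in range(15):
    assert sum(comb(2 * n - k, n - k) * 2 ** k for k in range(n + 1)) == 4 ** n

# sum_k (-1)^k C(2n, n-3k) = C(2n, n)/2 + 3^(n-1)   for n >= 1
for n in range(1, 15):
    s = sum((-1) ** k * comb(2 * n, n - 3 * k) for k in range(n // 3 + 1))
    assert s == comb(2 * n, n) // 2 + 3 ** (n - 1)

# Hardy's identity for n >= 1
for n in range(1, 16):
    s = sum((-1) ** k * (comb(n - k, k) + (comb(n - k - 1, k - 1) if k else 0))
            for k in range(n // 2 + 1))
    assert s == (2 * (-1) ** n if n % 3 == 0 else (-1) ** (n - 1))
```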
Theorem 5.7.1  Let $d_{n,k}$ be the number of colored walks of length $n$ reaching a distance $k$ from the main diagonal; then the infinite triangle $(d_{n,k})_{n,k\in\mathbb{N}}$ is a proper Riordan array.

The Pascal, Catalan and Motzkin triangles define walking problems that have different values of $a, b, c$. When $c = 0$, it is easily proved that $d_{n,k} = \binom{n}{k}a^k b^{n-k}$ and so we end up with the Pascal triangle. Consequently, we assume $c \neq 0$. For any given triple $(a, b, c)$ we obtain one type of array from complete walks and another from underdiagonal walks. However, the function $h(t)$, which only depends on the A-sequence, is the same in both cases, and we can find it by means of formula (5.3.2). In fact, $A(t) = a + bt + ct^2$ and $h(t)$ is the solution of the functional equation $h(t) = a + bth(t) + ct^2h(t)^2$ having $h(0) \neq 0$:

$$h(t) = \frac{1 - bt - \sqrt{1 - 2bt + b^2t^2 - 4act^2}}{2ct^2}. \tag{5.7.1}$$

The radicand $1 - 2bt + (b^2 - 4ac)t^2 = (1 - (b + 2\sqrt{ac})t)(1 - (b - 2\sqrt{ac})t)$ will be simply denoted by $\Delta$.

Let us now focus our attention on underdiagonal walks. If we consider $d_{n+1,0}$, we observe that every walk returning to the main diagonal can only be obtained from another walk returning to the main diagonal followed by any diagonal step, or from a walk ending at distance 1 from the main diagonal followed by any north step. Hence we have $d_{n+1,0} = bd_{n,0} + cd_{n,1}$, and in the column generating functions this corresponds to $d(t) - 1 = btd(t) + ct^2d(t)h(t)$. From this relation we easily find $d(t) = (1/a)h(t)$, and therefore by (5.7.1) the Riordan array of underdiagonal colored walks is:

$$(d_{n,k})_{n,k\in\mathbb{N}} = \left( \frac{1 - bt - \sqrt{\Delta}}{2act^2},\; \frac{1 - bt - \sqrt{\Delta}}{2ct^2} \right).$$

In current literature, major importance is usually given to the following three quantities:

1. the number of walks returning to the main diagonal; this is $d_n = [t^n]d(t)$, for every $n$;

2. the total number of walks of length $n$; this is $\alpha_n = \sum_{k=0}^n d_{n,k}$, i.e., the value of the row sums of the Riordan array;

3. the average distance from the main diagonal of all the walks of length $n$; this is $\delta_n = \sum_{k=0}^n kd_{n,k}$, which is the weighted row sum of the Riordan array, divided by $\alpha_n$.

In Chapter 7 we will learn how to find an asymptotic approximation for $d_n$. With regard to the last two points, the formulas for the row sums and the weighted row sums given in the first section allow us to find the generating functions $\alpha(t)$ of the total number $\alpha_n$ of underdiagonal walks of length $n$, and $\delta(t)$ of the total distance $\delta_n$ of these walks from the main diagonal:

$$\alpha(t) = \frac{1}{2at}\,\frac{1 - (b+2a)t - \sqrt{\Delta}}{(a+b+c)t - 1}$$

$$\delta(t) = \frac{1}{4at}\left(\frac{1 - (b+2a)t - \sqrt{\Delta}}{(a+b+c)t - 1}\right)^{\!2}.$$

In the symmetric case these formulas simplify as follows:

$$\alpha(t) = \frac{1}{2at}\left(\sqrt{\frac{1-(b-2a)t}{1-(b+2a)t}} - 1\right)$$

$$\delta(t) = \frac{1}{2at}\left(\frac{1-bt}{1-(b+2a)t} - \sqrt{\frac{1-(b-2a)t}{1-(b+2a)t}}\right).$$

The alternating row sums and the diagonal sums sometimes have some combinatorial significance as well, and so they can be treated in the same way. The study of complete walks follows the same lines and we only have to derive the form of the corresponding Riordan array, which is:

$$(d_{n,k})_{n,k\in\mathbb{N}} = \left( \frac{1}{\sqrt{\Delta}},\; \frac{1 - bt - \sqrt{\Delta}}{2ct^2} \right).$$

The proof is as follows. Since a complete walk can go above the main diagonal, the array $(d_{n,k})_{n,k\in\mathbb{N}}$ is only the right part of an infinite triangle, in which $k$ can also assume negative values. By following the logic of the theorem above, we see that the generating function of the $n$th row is $((c/w) + b + aw)^n$, and therefore the bivariate generating function of the extended triangle is:

$$d(t, w) = \sum_n \left(\frac{c}{w} + b + aw\right)^{\!n} t^n = \frac{1}{1 - (aw + b + c/w)t}.$$

If we expand this expression by partial fractions, we get:

$$d(t, w) = \frac{1}{\sqrt{\Delta}}\left( \frac{1}{1 - \dfrac{1-bt-\sqrt{\Delta}}{2ct}\,w} - \frac{1}{1 - \dfrac{1-bt+\sqrt{\Delta}}{2ct}\,w} \right) =$$
$$= \frac{1}{\sqrt{\Delta}}\left( \frac{1}{1 - \dfrac{1-bt-\sqrt{\Delta}}{2ct}\,w} + \frac{1-bt-\sqrt{\Delta}}{2at}\,\frac{1}{w}\,\frac{1}{1 - \dfrac{1-bt-\sqrt{\Delta}}{2at}\,\dfrac{1}{w}} \right).$$
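The recurrence underlying Theorem 5.7.1, together with the column-0 rule $d_{n+1,0} = bd_{n,0} + cd_{n,1}$, gives a direct way to tabulate underdiagonal walks. For $a = b = c = 1$, column 0 should yield the Motzkin numbers, consistently with $h(t) = a + bth(t) + ct^2h(t)^2$; a small sketch:

```python
# Underdiagonal colored walks with a = b = c = 1 (Motzkin case):
# row recurrence d[n+1][k+1] = a*d[n][k] + b*d[n][k+1] + c*d[n][k+2],
# column 0:      d[n+1][0]   = b*d[n][0] + c*d[n][1].
a, b, c = 1, 1, 1
rows = 11
d = [[1]]
for n in range(rows - 1):
    prev = d[-1] + [0, 0]          # pad so indices k+1, k+2 always exist
    row = [b * prev[0] + c * prev[1]]
    for k in range(n + 1):
        row.append(a * prev[k] + b * prev[k + 1] + c * prev[k + 2])
    d.append(row)

motzkin = [r[0] for r in d]
assert motzkin == [1, 1, 2, 4, 9, 21, 51, 127, 323, 835, 2188]
```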
The first term represents the right part of the extended triangle and this corresponds to $k \geq 0$, whereas the second term corresponds to the left part ($k < 0$). We are interested in the right part, and the expression can be written as:

$$\frac{1}{\sqrt{\Delta}}\,\frac{1}{1 - \dfrac{1-bt-\sqrt{\Delta}}{2ct}\,w} = \frac{1}{\sqrt{\Delta}}\sum_k \left(\frac{1-bt-\sqrt{\Delta}}{2ct}\right)^{\!k} w^k$$

which immediately gives the form of the Riordan array.

5.8 Stirling numbers and Riordan arrays

The connection between Riordan arrays and Stirling numbers is not immediate. If we examine the two infinite triangles of the Stirling numbers of both kinds, we immediately realize that they are not Riordan arrays. It is not difficult to obtain the column generating functions for the Stirling numbers of the second kind; by starting with the recurrence relation:

$$\left\{{n+1 \atop k+1}\right\} = (k+1)\left\{{n \atop k+1}\right\} + \left\{{n \atop k}\right\}$$

and the obvious generating function $S_0(t) = 1$, we can specialize the recurrence (valid for every $n \in \mathbb{N}$) to the case $k = 0$. This gives the relation between generating functions:

$$\frac{S_1(t) - S_1(0)}{t} = S_1(t) + S_0(t);$$

because $S_1(0) = 0$, we immediately obtain $S_1(t) = t/(1-t)$, which is easily checked by looking at column 1 in the array. In a similar way, by specializing the recurrence relation to $k = 1$, we find $S_2(t) = 2tS_2(t) + tS_1(t)$, whose solution is:

$$S_2(t) = \frac{t^2}{(1-t)(1-2t)}.$$

This proves, in an algebraic way, that $\left\{{n \atop 2}\right\} = 2^{n-1} - 1$, and also indicates the form of the generating function for column $m$:

$$S_m(t) = \mathcal{G}\!\left(\left\{{n \atop m}\right\}\right)_{\!n\in\mathbb{N}} = \frac{t^m}{(1-t)(1-2t)\cdots(1-mt)}$$

which is now proved by induction when we specialize the recurrence relation above to $k = m$. This is left to the reader as a simple exercise.

The generating functions for the Stirling numbers of the first kind are not so simple. However, let us go on with the Stirling numbers of the second kind, proceeding in the following way: if we multiply the recurrence relation by $(k+1)!/(n+1)!$ we obtain the new relation:

$$\frac{(k+1)!}{(n+1)!}\left\{{n+1 \atop k+1}\right\} = \frac{(k+1)!}{n!}\left\{{n \atop k+1}\right\}\frac{k+1}{n+1} + \frac{k!}{n!}\left\{{n \atop k}\right\}\frac{k+1}{n+1}.$$

If we denote by $d_{n,k}$ the quantity $k!\left\{{n \atop k}\right\}/n!$, this is a recurrence relation for $d_{n,k}$, which can be written as:

$$(n+1)d_{n+1,k+1} = (k+1)d_{n,k+1} + (k+1)d_{n,k}.$$

Let us now proceed as above and find the column generating functions for the new array $(d_{n,k})_{n,k\in\mathbb{N}}$. Obviously, $d_0(t) = 1$; by setting $k = 0$ in the new recurrence:

$$(n+1)d_{n+1,1} = d_{n,1} + d_{n,0}$$

and passing to generating functions: $d_1'(t) = d_1(t) + 1$. The solution of this simple differential equation is $d_1(t) = e^t - 1$ (the reader can simply check this solution, if he or she prefers). We can now go on by setting $k = 1$ in the recurrence; we obtain $(n+1)d_{n+1,2} = 2d_{n,2} + 2d_{n,1}$, or $d_2'(t) = 2d_2(t) + 2(e^t - 1)$. Again, this differential equation has the solution $d_2(t) = (e^t - 1)^2$, and this suggests that, in general, we have $d_k(t) = (e^t - 1)^k$. A rigorous proof of this fact can be obtained by mathematical induction; the recurrence relation gives $d_{k+1}'(t) = (k+1)d_{k+1}(t) + (k+1)d_k(t)$. By the induction hypothesis, we can substitute $d_k(t) = (e^t - 1)^k$ and solve the differential equation thus obtained. In practice, we can simply verify that $d_{k+1}(t) = (e^t - 1)^{k+1}$; by substituting, we have:

$$(k+1)e^t(e^t - 1)^k = (k+1)(e^t - 1)^{k+1} + (k+1)(e^t - 1)^k$$

and this equality is obviously true.

The form of this generating function:

$$d_k(t) = \mathcal{G}\!\left(\frac{k!}{n!}\left\{{n \atop k}\right\}\right)_{\!n\in\mathbb{N}} = (e^t - 1)^k$$

proves that $(d_{n,k})_{n,k\in\mathbb{N}}$ is a Riordan array having $d(t) = 1$ and $th(t) = e^t - 1$. This fact allows us to prove algebraically a lot of identities concerning the Stirling numbers of the second kind, as we shall see in the next section.

For the Stirling numbers of the first kind we proceed in an analogous way. We multiply the basic recurrence:

$$\left[{n+1 \atop k+1}\right] = n\left[{n \atop k+1}\right] + \left[{n \atop k}\right]$$
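The generating function $d_k(t) = (e^t - 1)^k$ can be cross-checked against the basic recurrence for the Stirling numbers of the second kind, expanding $(e^t-1)^k$ by the binomial theorem so that $n!\,[t^n](e^t-1)^k = \sum_j \binom{k}{j}(-1)^{k-j}j^n$; a small sketch:

```python
from math import comb, factorial

def stirling2(n, k):
    """Stirling numbers of the second kind via the basic recurrence."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def coeff(n, k):
    """n! * [t^n] (e^t - 1)^k, from (e^t-1)^k = sum_j C(k,j)(-1)^(k-j) e^(jt)."""
    return sum(comb(k, j) * (-1) ** (k - j) * j ** n for j in range(k + 1))

for n in range(8):
    for k in range(n + 1):
        assert factorial(k) * stirling2(n, k) == coeff(n, k)
```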
by $(k+1)!/(n+1)!$ and study the quantity $f_{n,k} = k!\left[{n \atop k}\right]/n!$:

$$\frac{(k+1)!}{(n+1)!}\left[{n+1 \atop k+1}\right] = \frac{(k+1)!}{n!}\left[{n \atop k+1}\right]\frac{n}{n+1} + \frac{k!}{n!}\left[{n \atop k}\right]\frac{k+1}{n+1},$$

that is:

$$(n+1)f_{n+1,k+1} = nf_{n,k+1} + (k+1)f_{n,k}.$$

In this case also we have $f_0(t) = 1$ and by specializing the last relation to the case $k = 0$, we obtain:

$$f_1'(t) = tf_1'(t) + f_0(t).$$

This is equivalent to $f_1'(t) = 1/(1-t)$ and because $f_1(0) = 0$ we have:

$$f_1(t) = \ln\frac{1}{1-t}.$$

By setting $k = 1$, we find the simple differential equation $f_2'(t) = tf_2'(t) + 2f_1(t)$, whose solution is:

$$f_2(t) = \left(\ln\frac{1}{1-t}\right)^{\!2}.$$

This suggests the general formula:

$$f_k(t) = \mathcal{G}\!\left(\frac{k!}{n!}\left[{n \atop k}\right]\right)_{\!n\in\mathbb{N}} = \left(\ln\frac{1}{1-t}\right)^{\!k}$$

and again this can be proved by induction. In this case, $(f_{n,k})_{n,k\in\mathbb{N}}$ is the Riordan array having $d(t) = 1$ and $th(t) = \ln(1/(1-t))$.

5.9 Identities involving the Stirling numbers

The two recurrence relations for $d_{n,k}$ and $f_{n,k}$ do not give an immediate evidence that the two triangles are indeed Riordan arrays, because they do not correspond to A-sequences. However, the A-sequences for the two arrays can be easily found, once we know their $h(t)$ function. For the Stirling numbers of the first kind we have to solve the functional equation:

$$\ln\frac{1}{1-t} = tA\!\left(\ln\frac{1}{1-t}\right).$$

By setting $y = \ln(1/(1-t))$, or $t = 1 - e^{-y}$, we have $A(y) = ye^y/(e^y - 1)$ and this is the generating function for the A-sequence we were looking for. In a similar way, we find that the A-sequence for the triangle related to the Stirling numbers of the second kind is:

$$A(t) = \frac{t}{\ln(1+t)}.$$

A first result we obtain by using the correspondence between Stirling numbers and Riordan arrays concerns the row sums of the two triangles. For the Stirling numbers of the first kind we have:

$$\sum_{k=0}^n \left[{n \atop k}\right] = n!\sum_{k=0}^n \frac{k!}{n!}\left[{n \atop k}\right]\frac{1}{k!} = n![t^n]\left[ e^y \,\middle|\, y = \ln\frac{1}{1-t} \right] = n![t^n]\,\frac{1}{1-t} = n!$$

as we observed and proved in a combinatorial way. The row sums of the Stirling numbers of the second kind give, as we know, the Bell numbers; thus we can obtain the (exponential) generating function for these numbers:

$$\sum_{k=0}^n \left\{{n \atop k}\right\} = n!\sum_{k=0}^n \frac{k!}{n!}\left\{{n \atop k}\right\}\frac{1}{k!} = n![t^n]\left[ e^y \,\middle|\, y = e^t - 1 \right] = n![t^n]\exp(e^t - 1);$$

therefore we have:

$$\mathcal{G}\!\left(\frac{B_n}{n!}\right) = \exp(e^t - 1).$$

We also defined the ordered Bell numbers as $O_n = \sum_k \left\{{n \atop k}\right\}k!$; therefore we have:

$$\frac{O_n}{n!} = \sum_{k=0}^n \frac{k!}{n!}\left\{{n \atop k}\right\} = [t^n]\left[ \frac{1}{1-y} \,\middle|\, y = e^t - 1 \right] = [t^n]\,\frac{1}{2-e^t}.$$

We have thus obtained the exponential generating function:

$$\mathcal{G}\!\left(\frac{O_n}{n!}\right) = \frac{1}{2-e^t}.$$

Stirling numbers of the two kinds are related between them in various ways. For example, we have:

$$\sum_k \left[{n \atop k}\right]\left\{{k \atop m}\right\} = \frac{n!}{m!}\sum_k \frac{k!}{n!}\left[{n \atop k}\right]\frac{m!}{k!}\left\{{k \atop m}\right\} = \frac{n!}{m!}[t^n]\left[ (e^y - 1)^m \,\middle|\, y = \ln\frac{1}{1-t} \right] = \frac{n!}{m!}[t^n]\,\frac{t^m}{(1-t)^m} = \frac{n!}{m!}\binom{n-1}{m-1}.$$

Besides, two orthogonality relations exist between Stirling numbers. The first one is proved in this way:

$$\sum_k \left[{n \atop k}\right]\left\{{k \atop m}\right\}(-1)^{n-k} = (-1)^n\frac{n!}{m!}\sum_k \frac{k!}{n!}\left[{n \atop k}\right]\frac{m!}{k!}\left\{{k \atop m}\right\}(-1)^k =$$
$$= (-1)^n\frac{n!}{m!}[t^n]\left[ (e^{-y} - 1)^m \,\middle|\, y = \ln\frac{1}{1-t} \right] = (-1)^n\frac{n!}{m!}[t^n](-t)^m = \delta_{n,m}.$$

The second orthogonality relation is proved in a similar way and reads:

$$\sum_k \left\{{n \atop k}\right\}\left[{k \atop m}\right](-1)^{n-k} = \delta_{n,m}.$$

In the same way we can also prove the two classical identities connecting powers and falling factorials:

$$\sum_k \left[{n \atop k}\right](-1)^{n-k}x^k = (-1)^n n!\sum_{k=0}^n \frac{k!}{n!}\left[{n \atop k}\right]\frac{(-x)^k}{k!} = (-1)^n n![t^n]\left[ e^{-xy} \,\middle|\, y = \ln\frac{1}{1-t} \right] =$$
$$= (-1)^n n![t^n](1-t)^x = n!\binom{x}{n} = x^{\underline{n}}$$

and:

$$\sum_{k=0}^n \left\{{n \atop k}\right\}x^{\underline{k}} = n!\sum_{k=0}^n \frac{k!}{n!}\left\{{n \atop k}\right\}\binom{x}{k} = n![t^n]\left[ (1+y)^x \,\middle|\, y = e^t - 1 \right] = n![t^n]e^{tx} = x^n.$$

Finally, for the Bernoulli numbers, whose exponential generating function is $y/(e^y - 1)$, the same approach gives:

$$\sum_{k=0}^n \left[{n \atop k}\right]B_k = n!\sum_{k=0}^n \frac{k!}{n!}\left[{n \atop k}\right]\frac{B_k}{k!} = n![t^n]\left[ \frac{y}{e^y - 1} \,\middle|\, y = \ln\frac{1}{1-t} \right] =$$
$$= n![t^n]\,\frac{1-t}{t}\ln\frac{1}{1-t} = n!\left(\frac{1}{n+1} - \frac{1}{n}\right) = -\frac{(n-1)!}{n+1}.$$

Clearly, this holds for $n > 0$. For $n = 0$ we have:

$$\sum_{k=0}^n \left[{n \atop k}\right]B_k = B_0 = 1.$$
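The orthogonality relations and the power/falling-factorial identities above are easy to test numerically; a sketch with memoized recurrences (the helper names are ours):

```python
from math import perm
from functools import lru_cache

@lru_cache(maxsize=None)
def s1(n, k):
    """Unsigned Stirling numbers of the first kind."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return (n - 1) * s1(n - 1, k) + s1(n - 1, k - 1)

@lru_cache(maxsize=None)
def s2(n, k):
    """Stirling numbers of the second kind."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * s2(n - 1, k) + s2(n - 1, k - 1)

# first orthogonality relation
for n in range(8):
    for m in range(8):
        tot = sum((-1) ** (n - k) * s1(n, k) * s2(k, m) for k in range(n + 1))
        assert tot == (1 if n == m else 0)

# powers vs. falling factorials, checked at the integer point x = 9
x = 9
for n in range(8):
    assert sum((-1) ** (n - k) * s1(n, k) * x ** k
               for k in range(n + 1)) == perm(x, n)          # x falling-power n
    assert sum(s2(n, k) * perm(x, k) for k in range(n + 1)) == x ** n
```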
Formal methods
6.1 Formal languages say that z occurs in w, and the particular instance
of z in w is called an occurrence of z in w. Observe
During the 1950’s, the linguist Noam Chomski in- that if z is a subword of w, it can have more than one
troduced the concept of a formal language. Several occurrence in w. If w = zw2 , we say that z is a head
definitions have to be provided before a precise state- or prefix of w, and if w = w1 z, we say that z is a tail
ment of the concept can be given. Therefore, let us or suffix of w. Finally, a language on A is any subset
proceed in the following way. L ⊆ A∗ .
First, we recall definitions given in Section 2.1. An The basic definition concerning formal languages
alphabet is a finite set A = {a1 , a2 , . . . , an }, whose is the following: a grammar is a 4-tuple G =
elements are called symbols or letters. A word on A (T, N, σ, P), where:
is a finite sequence of symbols in A; the sequence is
written by juxtaposing the symbols, and therefore a • T = {a1 , a2 , . . . , an } is the alphabet of terminal
word w is denoted by w = a_{i_1} a_{i_2} ... a_{i_r}, and r = |w| is the length of the word. The empty sequence is called the empty word and is conventionally denoted by ǫ; its length is obviously 0, and it is the only word of length 0.

The set of all the words on A, the empty word included, is indicated by A∗, and by A+ if the empty word is excluded. Algebraically, A∗ is the free monoid on A, that is, the monoid freely generated by the symbols in A. To understand this point, let us consider the operation of juxtaposition and recursively apply it starting with the symbols in A. What we get are the words on A, and the juxtaposition can be seen as an operation between them. The algebraic structure thus obtained has the following properties:

1. associativity: w_1(w_2 w_3) = (w_1 w_2)w_3 = w_1 w_2 w_3;

2. ǫ is the identity or neutral element: ǫw = wǫ = w.

It is called a monoid, which, by construction, has been generated by combining the symbols in A in all possible ways. Because of that, (A∗, ·), if · denotes the juxtaposition, is called the "free monoid" generated by A. Observe that a monoid is an algebraic structure more general than a group, in which every element has an inverse as well.

If w ∈ A∗ and z is a word such that w can be decomposed as w = w_1 z w_2 (w_1 and/or w_2 possibly empty), we say that z is a subword of w.

A grammar is a 4-tuple G = (T, N, σ, P), where:

• T is the alphabet of terminal symbols;

• N = {φ_1, φ_2, ..., φ_m} is the alphabet of non-terminal symbols;

• σ ∈ N is the initial symbol;

• P is a finite set of productions.

Usually, the symbols in T are denoted by lower case Latin letters, and the symbols in N by Greek letters or by upper case Latin letters. A production is a pair (z_1, z_2) of words in (T ∪ N)∗, such that z_1 contains at least one symbol in N; the production is often indicated by z_1 → z_2. If w ∈ (T ∪ N)∗, we can apply a production z_1 → z_2 ∈ P to w whenever w can be decomposed as w = w_1 z_1 w_2, and the result is the new word w_1 z_2 w_2 ∈ (T ∪ N)∗; we will write w = w_1 z_1 w_2 ⊢ w_1 z_2 w_2 when w_1 z_1 w_2 is the decomposition of w in which z_1 is the leftmost occurrence of z_1 in w; in other words, if we also have w = ŵ_1 z_1 ŵ_2, then |w_1| < |ŵ_1|.

Given a grammar G = (T, N, σ, P), we define the relation w ⊢ ŵ between words w, ŵ ∈ (T ∪ N)∗: the relation holds if and only if a production z_1 → z_2 ∈ P exists such that z_1 occurs in w, w = w_1 z_1 w_2 is the leftmost occurrence of z_1 in w, and ŵ = w_1 z_2 w_2. We also denote by ⊢∗ the transitive closure of ⊢ and call it generation or derivation; this means that w ⊢∗ ŵ if and only if a sequence (w = w_1, w_2, ..., w_s = ŵ) exists such that w_1 ⊢ w_2, w_2 ⊢ w_3, ..., w_{s−1} ⊢ w_s.
68 CHAPTER 6. FORMAL METHODS
We observe explicitly that, by our condition that in every production z_1 → z_2 the word z_1 should contain at least one symbol in N, if a word w_i ∈ T∗ is produced during a generation, it is terminal, i.e., the generation should stop. By collecting all these definitions, we finally define the language generated by the grammar G as the set:

$$L(G) = \{ w \in T^* \mid \sigma \vdash^* w \}$$

i.e., a word w ∈ T∗ is in L(G) if and only if we can generate it by starting with the initial symbol σ and going on by applying the productions in P until w is generated. At that moment, the generation stops. Note that, sometimes, the generation can go on forever, never generating a word on T; however, this is not a problem: it only means that such generations should be ignored.

6.2 Context-free languages

The definition of a formal language is quite general, and it is possible to show that formal languages coincide with the class of "partially recursive sets", the largest class of sets which can be constructed recursively, i.e., in finite terms. This means that we can give rules to build such sets (e.g., we can give a grammar for them), but their construction can go on forever, so that, looking at them from another point of view, if we wish to know whether a word w belongs to such a set S, we can be unlucky and an infinite process can be necessary to find out that w ∉ S.

Because of that, people have studied more restricted classes of languages, for which a finite process is always possible for finding out whether w belongs to the language or not. Surely, the most important class of this kind is the class of "context-free languages". They are defined in the following way. A context-free grammar is a grammar G = (T, N, σ, P) in which all the productions z_1 → z_2 in P are such that z_1 ∈ N. The naming "context-free" derives from this definition, because a production z_1 → z_2 is applied whenever the non-terminal symbol z_1 is the leftmost non-terminal symbol in a word, irrespective of the context in which it appears.

As a very simple example, let us consider the following grammar. Let T = {a, b} and N = {σ}; σ is the initial symbol of the grammar, being the only non-terminal. The set P is composed of the two productions:

$$\sigma \to \epsilon \qquad\qquad \sigma \to a\sigma b\sigma.$$

This grammar is called the Dyck grammar and the language generated by it the Dyck language. In Figure 6.1 we draw the generation of some words in the Dyck language. The recursive nature of the productions allows us to prove properties of the Dyck language by means of mathematical induction:

Theorem 6.2.1 A word w ∈ {a, b}∗ belongs to the Dyck language D if and only if:

i) the number of a's in w equals the number of b's;

ii) in every prefix z of w the number of a's is not less than the number of b's.

Proof: Let w ∈ D; if w = ǫ nothing has to be proved. Otherwise, w is generated by the second production and w = aw_1 b w_2 with w_1, w_2 ∈ D; therefore, if we suppose that i) holds for w_1 and w_2, it also holds for w. For ii), any prefix z of w must have one of the forms: a, az_1 where z_1 is a prefix of w_1, aw_1 b, or aw_1 b z_2 where z_2 is a prefix of w_2. By the induction hypothesis, ii) should hold for z_1 and z_2, and therefore it is easily proved for w. Vice versa, let us suppose that i) and ii) hold for w ∈ T∗. If w ≠ ǫ, then by ii) w should begin with a. Let us scan w until we find the first occurrence of the symbol b such that w = aw_1 b w_2 and in w_1 the number of b's equals the number of a's. By i) such an occurrence of b must exist, and consequently w_1 and w_2 must satisfy condition i). Besides, if w_1 and w_2 are not empty, then they should satisfy condition ii), by the very construction of w_1 and the fact that w satisfies condition ii) by hypothesis. We have thus obtained a decomposition of w showing that the second production has been used. This completes the proof.

If we substitute the letter a with the symbol '(' and the letter b with the symbol ')', the theorem shows that the words in the Dyck language are the possible parenthetizations of an expression. Therefore, the number of Dyck words with n pairs of parentheses is the Catalan number $\binom{2n}{n}/(n+1)$. We will see how this result can also be obtained by starting with the definition of the Dyck language and applying a suitable and mechanical method, known as the Schützenberger methodology or symbolic method. The method can be applied to every set of objects which are defined through a non-ambiguous context-free grammar.

A context-free grammar G is ambiguous iff there exists a word w ∈ L(G) which can be generated by two different leftmost derivations. In other words, a context-free grammar H is non-ambiguous iff every word w ∈ L(H) can be generated in one and only one way. An example of an ambiguous grammar is G = (T, N, σ, P) where T = {1}, N = {σ} and P contains the two productions:

$$\sigma \to 1 \qquad\qquad \sigma \to \sigma\sigma.$$

For example, the word 111 is generated by the two following leftmost derivations:

$$\sigma \vdash \sigma\sigma \vdash 1\sigma \vdash 1\sigma\sigma \vdash 11\sigma \vdash 111$$
$$\sigma \vdash \sigma\sigma \vdash \sigma\sigma\sigma \vdash 1\sigma\sigma \vdash 11\sigma \vdash 111.$$
Figure 6.1: The generation of some words in the Dyck language, starting from σ and repeatedly applying the two productions σ → ǫ and σ → aσbσ.

Instead, the Dyck grammar is non-ambiguous; in fact, as we have shown in the proof of the previous theorem, given any word w ∈ D, w ≠ ǫ, there is only one decomposition w = aw_1 b w_2 having w_1, w_2 ∈ D; therefore, w can only be generated in a single way. In general, if we show that any word in a context-free language L(G), generated by some grammar G, has a unique decomposition according to the productions in G, then the grammar cannot be ambiguous. Because of the connection between the Schützenberger methodology and non-ambiguous context-free grammars, we are mainly interested in this kind of grammars. For the sake of completeness, a context-free language is called intrinsically ambiguous iff every context-free grammar generating it is ambiguous. This definition stresses the fact that, if a language is generated by an ambiguous grammar, it can also be generated by some non-ambiguous grammar, unless it is intrinsically ambiguous. It is possible to show that intrinsically ambiguous languages actually exist; fortunately, they are not very frequent. For example, the language generated by the previous ambiguous grammar is {1}+, i.e., the set of all the words composed by any sequence of 1's, except the empty word. Actually, it is not an intrinsically ambiguous language, and a non-ambiguous grammar generating it is given by the same T, N, σ and the two productions:

$$\sigma \to 1 \qquad\qquad \sigma \to 1\sigma.$$

It is a simple matter to show that every word 11...1 can be uniquely decomposed according to these productions.

6.3 Formal languages and programming languages

In 1960, the formal definition of the programming language ALGOL'60 was published. ALGOL'60 has surely been the most influential programming language ever created, although it was actually used only by a very limited number of programmers. Most of the concepts we now find in programming languages were introduced by ALGOL'60, of which, for example, PASCAL and C are direct derivations. Here, we are not interested in these aspects of ALGOL'60, but we wish to spend some words on how ALGOL'60 used context-free grammars to define its syntax in a formal and precise way. In practice, a program in ALGOL'60 is a word generated by a (rather complex) context-free grammar, whose initial symbol is ⟨program⟩.

The ALGOL'60 grammar used, as terminal symbol alphabet, the characters available on the standard keyboard of a computer; actually, they were the characters punchable on a card, the input medium used at that time to introduce a program into the computer. The non-terminal symbol notation was one of the most appealing inventions of ALGOL'60: the symbols were composed by entire English sentences enclosed by the two special parentheses ⟨ and ⟩. This allowed one to express clearly the intended meaning of the non-terminal symbols. The previous example concerning ⟨program⟩ surely makes sense. Another technical device used by ALGOL'60 was the compaction of productions: if we had several productions with the same left hand symbol, β → w_1, β → w_2, ..., β → w_k,
they were written as a single rule:

$$\beta ::= w_1 \mid w_2 \mid \cdots \mid w_k$$

where ::= was a metasymbol denoting definition and | was read "or" to denote alternatives. This notation is usually called Backus Normal Form (BNF).

Just to give a very simple example, in Figure 6.1 (lines 1 through 6) we show how integer numbers were defined. This definition avoids leading 0's in numbers, but allows both +0 and −0. The productions can be easily changed to avoid +0 or −0 or both. In the same figure, line 7 shows the definition of the conditional statements.

This kind of definition gives a precise formulation of all the clauses in the programming language. Besides, since the program has a single generation according to the grammar, it is possible to find this derivation starting from the actual program and therefore give its exact structure. This allows precise information to be given to the compiler, which, in a sense, is directed by the formal syntax of the language (syntax directed compilation).

A very interesting aspect is how this context-free grammar definition can avoid ambiguities in the interpretation of a program. Let us consider an expression like a + b ∗ c; according to the rules of Algebra, the multiplication should be executed before the addition, and the computer must follow this convention in order to create no confusion. This is done by the simplified productions given by lines 8 through 11 in Figure 6.1. The derivation of the simple expression a + b ∗ c, or of a more complicated expression, reveals that it is decomposed into the sum of a and b ∗ c; this information is passed to the compiler, and the multiplication is actually performed before the addition. If powers are also present, they are executed before products.

This ability of context-free grammars in designing the syntax of programming languages is very important, and after ALGOL'60 the syntax of every programming language has always been defined by context-free grammars. We conclude by remembering that a more sophisticated approach to the definition of programming languages was tried with ALGOL'68 by means of van Wijngaarden's grammars, but the method revealed itself too complex and was abandoned.

6.4 The symbolic method

Fibonacci words are words on the alphabet {0, 1} beginning and ending with the symbol 1 and never containing two consecutive 0's. For small values of n, Fibonacci words of length n are easily displayed:

n = 1: 1
n = 2: 11
n = 3: 111, 101
n = 4: 1111, 1011, 1101
n = 5: 11111, 10111, 11011, 11101, 10101

If we count them by their length, we obtain the sequence {0, 1, 1, 2, 3, 5, 8, ...}, which is easily recognized as the Fibonacci sequence. In fact, a word of length n is obtained by adding a trailing 1 to a word of length n − 1, or by adding a trailing 01 to a word of length n − 2. This immediately shows, in a combinatorial way, that Fibonacci words are counted by Fibonacci numbers. Besides, we get the productions of a non-ambiguous context-free grammar G = (T, N, σ, P), where T = {0, 1}, N = {φ}, σ = φ and P contains:

$$\varphi \to 1 \qquad \varphi \to \varphi 1 \qquad \varphi \to \varphi 01$$

(these productions could have been written φ ::= 1 | φ1 | φ01 by using the ALGOL'60 notation).

We are now going to obtain the counting generating function for Fibonacci words by applying the Schützenberger method. This consists in the following steps:

1. every non-terminal symbol σ ∈ N is transformed into the name of its counting generating function σ(t);

2. every terminal symbol is transformed into t;

3. the empty word is transformed into 1;

4. every | sign is transformed into a + sign, and ::= is transformed into an equal sign.

After having performed these transformations, we obtain a system of equations, which can be solved in the unknown generating functions introduced in the first step. They are the counting generating functions for the languages generated by the corresponding non-terminal symbols, when we consider them as the initial symbols.

The definition of the Fibonacci words produces:

$$\varphi(t) = t + t\varphi(t) + t^2\varphi(t),$$

whence φ(t) = t/(1 − t − t²), the generating function of the Fibonacci numbers.
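The grammar φ → 1 | φ1 | φ01 also gives a direct way of generating the Fibonacci words and of checking that there are F_n of them of length n; a small sketch (names are mine):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_words(n: int):
    """All words of length n generated by the grammar φ → 1 | φ1 | φ01."""
    if n <= 0:
        return ()
    if n == 1:
        return ("1",)
    # append 1 to a word of length n-1, or 01 to a word of length n-2
    return tuple(w + "1" for w in fib_words(n - 1)) + \
           tuple(w + "01" for w in fib_words(n - 2))

print([len(fib_words(n)) for n in range(1, 8)])  # [1, 1, 2, 3, 5, 8, 13]
```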
1  ⟨digit⟩ ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
2  ⟨non-zero digit⟩ ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
3  ⟨sequence of digits⟩ ::= ⟨digit⟩ | ⟨digit⟩ ⟨sequence of digits⟩
4  ⟨unsigned number⟩ ::= ⟨digit⟩ | ⟨non-zero digit⟩ ⟨sequence of digits⟩
5  ⟨signed number⟩ ::= +⟨unsigned number⟩ | −⟨unsigned number⟩
6  ⟨integer number⟩ ::= ⟨unsigned number⟩ | ⟨signed number⟩
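Since lines 1 through 6 involve no self-embedding other than right recursion, the language of ⟨integer number⟩ is in fact regular, and the definition can be checked quickly with a regular expression (the pattern is mine):

```python
import re

# ⟨integer number⟩: optional sign, then a single digit or a non-zero
# digit followed by more digits -- exactly as in lines 1-6 of Figure 6.1.
INTEGER = re.compile(r"^[+-]?(\d|[1-9]\d+)$")

for s in ["0", "+0", "-0", "42", "-107"]:
    assert INTEGER.match(s)
assert not INTEGER.match("007")   # leading zeroes are not generated
```

As the text remarks, +0 and −0 are both accepted, while numbers with leading zeroes are not.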
The number of Fibonacci words of length n is therefore F_n, as we have already proved by combinatorial arguments.

In the case of the Dyck language, the definition yields:

$$\sigma(t) = 1 + t^2\sigma(t)^2$$

and therefore:

$$\sigma(t) = \frac{1 - \sqrt{1 - 4t^2}}{2t^2}.$$

Since every word in the Dyck language has an even length, the number of Dyck words with 2n symbols is just the nth Catalan number, and this also we knew by combinatorial means.

Another example is given by the Motzkin words; these are words on the alphabet {a, b, c} in which a, b act as parentheses as in the Dyck language, while c is free and can appear everywhere. Therefore, the definition of the language is:

$$\mu ::= \epsilon \mid c\mu \mid a\mu b\mu$$

if μ is the only non-terminal symbol. The Schützenberger method gives the equation:

$$\mu(t) = 1 + t\mu(t) + t^2\mu(t)^2$$

whose solution is easily found:

$$\mu(t) = \frac{1 - t - \sqrt{1 - 2t - 3t^2}}{2t^2}.$$

By expanding this function we find the sequence of Motzkin numbers, beginning:

n    0  1  2  3  4  5   6   7    8    9
M_n  1  1  2  4  9  21  51  127  323  835

These numbers count the so-called unary-binary trees, i.e., trees whose nodes have arity 1 or 2. They can be defined in a pictorial way by means of an object grammar. An object grammar defines combinatorial objects instead of simple letters or words; however, most times it is rather easy to pass from an object grammar to an equivalent context-free grammar, and therefore to obtain counting generating functions by means of the Schützenberger method. For example, the object grammar in Figure 6.2 is obviously equivalent to the context-free grammar for Motzkin words.

6.5 The bivariate case

In the Schützenberger method, the rôle of the indeterminate t is to count the number of letters or symbols occurring in the generated words; because of that, every symbol appearing in a production is transformed into a t. However, we may wish to count other parameters instead of, or in conjunction with, the number of symbols. This is accomplished by modifying the intended meaning of the indeterminate t and/or introducing some other indeterminate to take into account the other parameters.

For example, in the Dyck language, we may wish to count the number of pairs a, b occurring in the words; this means that t no longer counts the single letters, but counts the pairs. Therefore, the Schützenberger method gives the equation σ(t) = 1 + tσ(t)², whose solution is just the generating function of the Catalan numbers.

An interesting application is as follows. Let us suppose we wish to know how many Fibonacci words of length n contain k zeroes. Besides the indeterminate t counting the total number of symbols, we introduce a new indeterminate z counting the number of zeroes. From the productions of the Fibonacci grammar, we derive an equation for the bivariate generating function φ(t, z), in which the coefficient of t^n z^k is just the number we are looking for.
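Comparing coefficients of t^n on both sides of the Motzkin equation μ(t) = 1 + tμ(t) + t²μ(t)² gives the recurrence M_n = M_{n−1} + Σ_{k=0}^{n−2} M_k M_{n−2−k}, which reproduces the table above; a small sketch:

```python
def motzkin(n_max: int) -> list:
    """Motzkin numbers from μ(t) = 1 + tμ(t) + t²μ(t)², by coefficient comparison."""
    M = [1]                                   # M_0 = 1
    for n in range(1, n_max + 1):
        M.append(M[n - 1] + sum(M[k] * M[n - 2 - k] for k in range(n - 1)))
    return M

print(motzkin(9))  # [1, 1, 2, 4, 9, 21, 51, 127, 323, 835]
```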
Figure 6.2: The object grammar for the Motzkin words.
The equation is:

$$\varphi(t, z) = t + t\varphi(t, z) + t^2 z\,\varphi(t, z).$$

In fact, the production φ → φ1 increases by 1 the length of the words, but does not introduce any 0; the production φ → φ01 increases by 2 the length and introduces a zero. By solving this equation, we find:

$$\varphi(t, z) = \frac{t}{1 - t - zt^2}.$$

We can now extract the coefficient φ_{n,k} of t^n z^k in the following way:

$$\varphi_{n,k} = [t^n][z^k]\,\frac{t}{1 - t - zt^2}
= [t^n][z^k]\,\frac{t}{1-t}\,\frac{1}{1 - z\,t^2/(1-t)}
= [t^n][z^k]\,\frac{t}{1-t}\sum_{k=0}^{\infty}\left(\frac{t^2}{1-t}\right)^k z^k
= [t^n]\,\frac{t^{2k+1}}{(1-t)^{k+1}}
= [t^{n-2k-1}]\,\frac{1}{(1-t)^{k+1}}
= \binom{n-k-1}{k}.$$

Therefore, the number of Fibonacci words of length n containing k zeroes is counted by a binomial coefficient. The second expression in the derivation shows that the array (φ_{n,k})_{n,k∈N} is indeed a Riordan array (t/(1 − t), t/(1 − t)), which is the Pascal triangle stretched vertically, i.e., column k is shifted down by k positions (k + 1, in reality). The general formula we know for the row sums of a Riordan array gives:

$$\sum_k \varphi_{n,k} = \sum_k \binom{n-k-1}{k}
= [t^n]\,\frac{t}{1-t}\left[\frac{1}{1-y}\;\Big|\;y=\frac{t^2}{1-t}\right]
= [t^n]\,\frac{t}{1-t-t^2} = F_n$$

as we were expecting. A more interesting problem is to find the average number of zeroes in all the Fibonacci words with n letters. First, we count the total number of zeroes in all the words of length n:

$$\sum_k k\varphi_{n,k} = \sum_k k\binom{n-k-1}{k}
= [t^n]\,\frac{t}{1-t}\left[\frac{y}{(1-y)^2}\;\Big|\;y=\frac{t^2}{1-t}\right]
= [t^n]\,\frac{t^3}{(1-t-t^2)^2}.$$

We extract the coefficient:

$$[t^n]\,\frac{t^3}{(1-t-t^2)^2}
= [t^{n-1}]\left(\frac{1}{\sqrt 5}\left(\frac{1}{1-\phi t} - \frac{1}{1-\hat\phi t}\right)\right)^2 =$$
$$= \frac{1}{5}[t^{n-1}]\frac{1}{(1-\phi t)^2} - \frac{2}{5}[t^n]\frac{t}{(1-\phi t)(1-\hat\phi t)} + \frac{1}{5}[t^{n-1}]\frac{1}{(1-\hat\phi t)^2} =$$
$$= \frac{1}{5}[t^{n-1}]\frac{1}{(1-\phi t)^2} - \frac{2}{5\sqrt 5}[t^n]\frac{1}{1-\phi t} + \frac{2}{5\sqrt 5}[t^n]\frac{1}{1-\hat\phi t} + \frac{1}{5}[t^{n-1}]\frac{1}{(1-\hat\phi t)^2}.$$

The last two terms are negligible because they rapidly tend to 0; therefore we have:

$$\sum_k k\varphi_{n,k} \approx \frac{n}{5}\,\phi^{n-1} - \frac{2}{5\sqrt 5}\,\phi^n.$$

To obtain the average number Z_n of zeroes, we need to divide this quantity by F_n ∼ φ^n/√5, the total number of Fibonacci words of length n:

$$Z_n = \frac{\sum_k k\varphi_{n,k}}{F_n} \sim \frac{n}{\phi\sqrt 5} - \frac{2}{5} = \frac{5-\sqrt 5}{10}\,n - \frac{2}{5}.$$

This shows that the average number of zeroes grows linearly with the length of the words and tends to become 27.64% of this length, because (5 − √5)/10 ≈ 0.2763932022....

6.6 The Shift Operator

In the usual mathematical terminology, an operator is a mapping from some set F_1 of functions into some other set F_2 of functions.
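Before moving on, the refined count of Section 6.5 — there are C(n−k−1, k) Fibonacci words of length n with k zeroes — is easy to verify by brute-force enumeration (a sketch; names are mine):

```python
from itertools import product
from math import comb

def zero_counts(n: int) -> dict:
    """Number of Fibonacci words of length n with k zeroes, by enumeration."""
    counts = {}
    for bits in product("01", repeat=n):
        w = "".join(bits)
        if w.startswith("1") and w.endswith("1") and "00" not in w:
            counts[w.count("0")] = counts.get(w.count("0"), 0) + 1
    return counts

n = 10
print(all(c == comb(n - k - 1, k) for k, c in zero_counts(n).items()))  # True
```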
6.7 The Difference Operator

A basic property of the difference operator ∆ = E − 1 (E being the shift operator, Ef(x) = f(x + 1), so that ∆f(x) = f(x + 1) − f(x)) concerns its behavior with respect to the falling factorial $x^{\underline{m}} = x(x-1)\cdots(x-m+1)$:

$$\Delta x^{\underline{m}} = (x+1)^{\underline{m}} - x^{\underline{m}} = (x+1)x(x-1)\cdots(x-m+2) - x(x-1)\cdots(x-m+1) =$$
$$= x(x-1)\cdots(x-m+2)\,(x+1-x+m-1) = m\,x^{\underline{m-1}}.$$

This is analogous to the usual rule for the differentiation operator applied to x^m:

$$D x^m = m x^{m-1}.$$

As we shall see, many formal properties of the difference operator are similar to the properties of the differentiation operator. The rôle of the powers x^m is however taken by the falling factorials, which therefore assume a central position in the theory of finite operators.

The following general rules are rather obvious:

$$\Delta(\alpha f(x) + \beta g(x)) = \alpha\Delta f(x) + \beta\Delta g(x)$$

$$\Delta(f(x)g(x)) = E(f(x)g(x)) - f(x)g(x) = f(x+1)g(x+1) - f(x)g(x) = f(x+1)\Delta g(x) + g(x)\Delta f(x).$$

From a formal point of view, we have:

$$\Delta^2 = (E-1)^2 = E^2 - 2E + 1$$

and in general:

$$\Delta^n = (E-1)^n = \sum_{k=0}^{n}\binom{n}{k}(-1)^{n-k}E^k = (-1)^n\sum_{k=0}^{n}\binom{n}{k}(-E)^k.$$

This is a very important formula, and it is the first example of the interest of combinatorics and generating functions in the theory of finite operators. In fact, let us iterate ∆ on f(x) = 1/x:

$$\Delta^2\,\frac{1}{x} = \frac{-1}{(x+1)(x+2)} + \frac{1}{x(x+1)} = \frac{-x+x+2}{x(x+1)(x+2)} = \frac{2}{x(x+1)(x+2)}$$

$$\Delta^n\,\frac{1}{x} = \frac{(-1)^n\, n!}{x(x+1)\cdots(x+n)}$$

as we can easily show by mathematical induction. In fact:

$$\Delta^{n+1}\,\frac{1}{x} = \frac{(-1)^n\, n!}{(x+1)\cdots(x+n+1)} - \frac{(-1)^n\, n!}{x(x+1)\cdots(x+n)} = \frac{(-1)^{n+1}(n+1)!}{x(x+1)\cdots(x+n+1)}.$$

The formula for ∆^n now gives the following identity:

$$\Delta^n\,\frac{1}{x} = \sum_{k=0}^{n}\binom{n}{k}(-1)^{n-k}E^k\,\frac{1}{x} = \sum_{k=0}^{n}\binom{n}{k}\frac{(-1)^{n-k}}{x+k}.$$

This rule can be iterated, giving the summation formula:

$$E^n = (\Delta + 1)^n = \sum_{k=0}^{n}\binom{n}{k}\Delta^k$$

which can be seen as the "dual" formula of the one already considered:

$$\Delta^n = \sum_{k=0}^{n}\binom{n}{k}(-1)^{n-k}E^k.$$
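The closed form ∆ⁿ(1/x) = (−1)ⁿ n!/(x(x+1)⋯(x+n)) is easy to confirm with exact rational arithmetic; a small sketch:

```python
from fractions import Fraction
from math import factorial, prod

def delta(f):
    """The finite difference operator: (∆f)(x) = f(x+1) − f(x)."""
    return lambda x: f(x + 1) - f(x)

g = lambda x: Fraction(1, x)
n, x = 4, 3
for _ in range(n):                      # apply ∆ four times
    g = delta(g)
closed = Fraction((-1) ** n * factorial(n), prod(range(x, x + n + 1)))
print(g(x) == closed)  # True
```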
6.8 Shift and Difference Operators - Example I

The evaluation of the successive differences of any function f(x) allows us to state and prove two identities, which may have combinatorial significance. Here we record some typical examples; we mark with an asterisk the cases when ∆⁰f(x) ≠ If(x).

1) The function f(x) = 1/x has already been developed, at least partially:

$$\Delta\,\frac{1}{x} = \frac{-1}{x(x+1)}$$

$$\Delta^n\,\frac{1}{x} = \frac{(-1)^n\, n!}{x(x+1)\cdots(x+n)} = \frac{(-1)^n}{x}\binom{x+n}{n}^{-1}$$

$$\sum_k \binom{n}{k}\frac{(-1)^k}{x+k} = \frac{n!}{x(x+1)\cdots(x+n)} = \frac{1}{x}\binom{x+n}{n}^{-1}$$

$$\sum_k \binom{n}{k}(-1)^k\binom{x+k}{k}^{-1} = \frac{x}{x+n}.$$

2∗) A somewhat similar situation, but a bit more complex:

$$\Delta\,\frac{p+x}{m+x} = \frac{m-p}{(m+x)(m+x+1)}$$

$$\Delta^n\,\frac{p+x}{m+x} = \frac{(m-p)(-1)^{n-1}}{m+x}\binom{m+x+n}{n}^{-1}$$

According to this rule, we should have ∆⁰(p + x)/(m + x) = (p − m)/(m + x); in the second next sum, however, we have to set ∆⁰ = I, and therefore we also have to subtract 1 from both members in order to obtain a true identity; a similar situation arises whenever we have ∆⁰ ≠ I.

$$\sum_k \binom{n}{k}(-1)^k\,\frac{p+k}{m+k} = \frac{p-m}{m}\binom{m+n}{n}^{-1} \qquad (n > 0)$$

$$\sum_k \binom{n}{k}(-1)^k\binom{m+k}{k}^{-1} = \frac{m}{m+n} \qquad \text{(see above)}.$$

3) Another version of the first example:

$$\Delta\,\frac{1}{px+m} = \frac{-p}{(px+m)(px+p+m)}$$

$$\Delta^n\,\frac{1}{px+m} = \frac{(-1)^n\, n!\, p^n}{(px+m)(px+p+m)\cdots(px+np+m)}$$

$$\sum_k \binom{n}{k}\frac{(-1)^k}{pk+m} = \frac{n!\, p^n}{m(m+p)\cdots(m+np)}$$

$$\sum_k \binom{n}{k}\frac{(-1)^k\, k!\, p^k}{m(m+p)\cdots(m+pk)} = \frac{1}{pn+m}.$$

4∗) A case involving the harmonic numbers:

$$\Delta H_x = \frac{1}{x+1}$$

$$\Delta^n H_x = \frac{(-1)^{n-1}}{n}\binom{x+n}{n}^{-1}$$

$$\sum_k \binom{n}{k}(-1)^k H_{x+k} = -\frac{1}{n}\binom{x+n}{n}^{-1} \qquad (n > 0)$$

$$\sum_{k=1}^{n} \binom{n}{k}\frac{(-1)^{k-1}}{k}\binom{x+k}{k}^{-1} = H_{x+n} - H_x,$$

where the case x = 0 is to be noted.

5∗) A more complicated case with the harmonic numbers:

$$\Delta\, xH_x = H_x + 1$$

$$\Delta^n\, xH_x = \frac{(-1)^n}{n-1}\binom{x+n-1}{n-1}^{-1} \qquad (n \ge 2)$$

$$\sum_k \binom{n}{k}(-1)^k (x+k)H_{x+k} = \frac{1}{n-1}\binom{x+n-1}{n-1}^{-1} \qquad (n \ge 2)$$

$$\sum_{k=2}^{n} \binom{n}{k}\frac{(-1)^k}{k-1}\binom{x+k-1}{k-1}^{-1} = (x+n)(H_{x+n} - H_x) - n.$$

6) Harmonic numbers and binomial coefficients:

$$\Delta\,\binom{x}{m}H_x = \binom{x}{m-1}\left(H_x + \frac{1}{m}\right)$$

$$\Delta^n\,\binom{x}{m}H_x = \binom{x}{m-n}(H_x + H_m - H_{m-n})$$

$$\sum_k \binom{n}{k}(-1)^k\binom{x+k}{m}H_{x+k} = (-1)^n\binom{x}{m-n}(H_x + H_m - H_{m-n})$$

$$\sum_k \binom{n}{k}\binom{x}{m-k}(H_x + H_m - H_{m-k}) = \binom{x+n}{m}H_{x+n}$$

and by performing the sums on the left containing H_x and H_m:

$$\sum_k \binom{n}{k}\binom{x}{m-k}H_{m-k} = \binom{x+n}{m}(H_x + H_m - H_{x+n}).$$
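Identities like these are easy to check with exact arithmetic; for instance, the first example's sum Σ_k C(n,k)(−1)^k/(x+k) = (1/x)·C(x+n,n)^(−1) (a sketch):

```python
from fractions import Fraction
from math import comb

def lhs(n: int, x: int) -> Fraction:
    """Left-hand side of the first summation identity of Example I."""
    return sum(Fraction((-1) ** k * comb(n, k), x + k) for k in range(n + 1))

print(all(lhs(n, x) == Fraction(1, x * comb(x + n, n))
          for n in range(8) for x in range(1, 8)))  # True
```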
$$\sum_{k=0}^{n}\binom{n}{k}\sum_{j=0}^{k}\binom{k}{j}(-1)^{k+j}\ln(x+j) = \ln(x+n).$$

Note that the last but one relation is an identity.

6.9 Shift and Difference Operators - Example II

3) Falling factorials are an introduction to binomial coefficients:

$$\sum_k \binom{n}{k}\, m^{\underline{k}}\, x^{\underline{m-k}} = (x+n)^{\underline{m}}.$$

5) Two sums involving the binomial coefficients:

$$\Delta\binom{x}{m} = \binom{x}{m-1}$$

$$\Delta^n\binom{x}{m} = \binom{x}{m-n}$$

$$\sum_k \binom{n}{k}(-1)^k\binom{x+k}{m}^{-1} = \frac{m}{m+n}\binom{x+n}{m+n}^{-1}.$$
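The last identity can be confirmed numerically with exact rationals (a sketch; it is tested for x ≥ m, so that none of the binomials vanishes):

```python
from fractions import Fraction
from math import comb

def check(n: int, m: int, x: int) -> bool:
    lhs = sum(Fraction((-1) ** k * comb(n, k), comb(x + k, m))
              for k in range(n + 1))
    return lhs == Fraction(m, (m + n) * comb(x + n, m + n))

print(all(check(n, m, m + j) for n in range(1, 6)
          for m in range(1, 5) for j in range(0, 4)))  # True
```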
8) Two sums with the central binomial coefficients:

$$\Delta\,\frac{1}{4^x}\binom{2x}{x} = -\frac{1}{2(x+1)}\,\frac{1}{4^x}\binom{2x}{x}$$

$$\Delta^n\,\frac{1}{4^x}\binom{2x}{x} = \frac{(-1)^n\,(2n)!}{4^n\, n!\,(x+1)\cdots(x+n)}\,\frac{1}{4^x}\binom{2x}{x}$$

$$\sum_k \binom{n}{k}(-1)^k\binom{2x+2k}{x+k}\frac{1}{4^k} = \frac{(2n)!}{4^n\, n!\,(x+1)\cdots(x+n)}\binom{2x}{x} = \frac{1}{4^n}\binom{2n}{n}\binom{x+n}{n}^{-1}\binom{2x}{x}$$

$$\sum_k \binom{n}{k}\frac{(-1)^k\,(2k)!}{4^k\, k!\,(x+1)\cdots(x+k)}\binom{2x}{x} = \frac{1}{4^n}\binom{2x+2n}{x+n}.$$

9) Two sums with the inverse of the central binomial coefficients:

$$\Delta\, 4^x\binom{2x}{x}^{-1} = \frac{4^x}{2x+1}\binom{2x}{x}^{-1}$$

$$\Delta^n\, 4^x\binom{2x}{x}^{-1} = \frac{(-1)^{n-1}}{2n-1}\,\frac{(2n)!}{2^n\, n!\,(2x+1)(2x+3)\cdots(2x+2n-1)}\,4^x\binom{2x}{x}^{-1}$$

$$\sum_k \binom{n}{k}(-1)^k\, 4^k\binom{2x+2k}{x+k}^{-1} = -\frac{1}{2n-1}\,\frac{(2n)!}{2^n\, n!\,(2x+1)(2x+3)\cdots(2x+2n-1)}\binom{2x}{x}^{-1}$$

$$\sum_k \binom{n}{k}\frac{(-1)^{k-1}}{2k-1}\,\frac{(2k)!}{2^k\, k!\,(2x+1)(2x+3)\cdots(2x+2k-1)} = 4^n\binom{2x+2n}{x+n}^{-1}\binom{2x}{x}.$$

6.10 The Addition Operator

The addition operator S is analogous to the difference operator:

$$S = E + 1$$

and in fact a simple connection exists between the two operators:

$$S(-1)^x f(x) = (-1)^{x+1} f(x+1) + (-1)^x f(x) = (-1)^{x+1}(f(x+1) - f(x)) = (-1)^{x-1}\Delta f(x).$$

Because of this connection, the addition operator has not been widely considered in the literature, and the symbol S is only used here for convenience. Like the difference operator, the addition operator can be iterated, and it often produces interesting combinatorial sums according to the rules:

$$S^n = (E+1)^n = \sum_k \binom{n}{k} E^k$$

$$E^n = (S-1)^n = \sum_k \binom{n}{k}(-1)^{n-k} S^k.$$

Some examples are in order here:

1) Fibonacci numbers are typical:

$$SF_m = F_{m+1} + F_m = F_{m+2}$$

$$S^n F_m = F_{m+2n}$$

$$\sum_k \binom{n}{k} F_{m+k} = F_{m+2n}$$

$$\sum_k \binom{n}{k}(-1)^k F_{m+2k} = (-1)^n F_{m+n}.$$

2) Here are the binomial coefficients:

$$S\binom{m}{x} = \binom{m+1}{x+1}$$

$$S^n\binom{m}{x} = \binom{m+n}{x+n}$$

$$\sum_k \binom{n}{k}\binom{m}{x+k} = \binom{m+n}{x+n}$$

$$\sum_k \binom{n}{k}(-1)^k\binom{m+k}{x+k} = (-1)^n\binom{m}{x+n}.$$

3) And finally the inverse of binomial coefficients:

$$S\binom{m}{x}^{-1} = \frac{m+1}{m}\binom{m-1}{x}^{-1}$$

$$S^n\binom{m}{x}^{-1} = \frac{m+1}{m-n+1}\binom{m-n}{x}^{-1}$$

$$\sum_k \binom{n}{k}\binom{m}{x+k}^{-1} = \frac{m+1}{m-n+1}\binom{m-n}{x}^{-1}$$

$$\sum_k \binom{n}{k}\frac{(-1)^{n-k}}{m-k+1}\binom{m-k}{x}^{-1} = \frac{1}{m+1}\binom{m}{x+n}^{-1}.$$

We can obviously invent as many expressions as we desire and, correspondingly, may obtain some summation formulas of combinatorial interest. For example:

$$S\Delta = (E+1)(E-1) = E^2 - 1 = (E-1)(E+1) = \Delta S.$$
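The two Fibonacci identities obtained from S are immediate to verify (a sketch):

```python
from math import comb

def fib(n: int) -> int:
    """Fibonacci numbers with F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

ok = all(sum(comb(n, k) * fib(m + k) for k in range(n + 1)) == fib(m + 2 * n)
         and sum(comb(n, k) * (-1) ** k * fib(m + 2 * k) for k in range(n + 1))
             == (-1) ** n * fib(m + n)
         for n in range(8) for m in range(8))
print(ok)  # True
```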
This derivation shows that the two operators S and ∆ commute. We can directly verify this property:

$$S\Delta f(x) = S(f(x+1) - f(x)) = f(x+2) - f(x) = (E^2 - 1)f(x)$$

$$\Delta S f(x) = \Delta(f(x+1) + f(x)) = f(x+2) - f(x) = (E^2 - 1)f(x).$$

Consequently, we have the two summation formulas:

$$\Delta^n S^n = (E^2 - 1)^n = \sum_k \binom{n}{k}(-1)^{n-k} E^{2k}$$

$$E^{2n} = (\Delta S + 1)^n = \sum_k \binom{n}{k}\Delta^k S^k.$$

A simple example is offered by the Fibonacci numbers:

$$\Delta S F_m = F_{m+1} \qquad (\Delta S)^n F_m = \Delta^n S^n F_m = F_{m+n}$$

$$\sum_k \binom{n}{k}(-1)^k F_{m+2k} = (-1)^n F_{m+n}$$

$$\sum_k \binom{n}{k} F_{m+k} = F_{m+2n}$$

but these identities have already been proved using the addition operator S.

Suppose now that we have ∆^{−1}f(x) = g(x); the rule of definite summation immediately gives:

$$\sum_{k=0}^{n} f(x+k) = g(x+n+1) - g(x).$$

This is analogous to the rule of definite integration. In fact, the operator of indefinite integration ∫dx is the inverse of the differentiation operator D, and if f(x) is any function, a primitive function for f(x) is any function ĝ(x) such that Dĝ(x) = f(x), or D^{−1}f(x) = ∫f(x)dx = ĝ(x). The fundamental theorem of the integral calculus relates definite and indefinite integration:

$$\int_a^b f(x)\,dx = \hat g(b) - \hat g(a).$$

The formula for definite summation can be written in a similar way, if we consider the integer variable k and set a = x and b = x + n + 1:

$$\sum_{k=a}^{b-1} f(k) = g(b) - g(a).$$

These facts create an analogy between ∆^{−1} and D^{−1}, or Σ and ∫dx, which can be stressed by considering the formal properties of Σ. First of all, we observe that g(x) = Σf(x) is not uniquely determined. If C(x) is any function periodic of period 1, i.e., C(x + k) = C(x), ∀k ∈ Z, we have ∆(g(x) + C(x)) = ∆g(x) = f(x), so that g(x) + C(x) is another valid indefinite sum of f(x).
This formula (summation by parts: Σ f(x)∆g(x) = f(x)g(x) − Σ Eg(x)∆f(x)) allows us to change a sum of products, of which we know that the second factor is a difference, into a sum involving the difference of the first factor. The transformation can be convenient every time the difference of the first factor is simpler than the difference of the second factor. For example, let us perform the following indefinite summation:

$$\Sigma\, x\binom{x}{m} = \Sigma\, x\,\Delta\binom{x}{m+1}
= x\binom{x}{m+1} - \Sigma\,(\Delta x)\binom{x+1}{m+1}
= x\binom{x}{m+1} - \Sigma\binom{x+1}{m+1}
= x\binom{x}{m+1} - \binom{x+1}{m+2}.$$

Obviously, this indefinite sum can be transformed into a definite sum by using the first result in this section:

$$\sum_{k=a}^{b} k\binom{k}{m} = (b+1)\binom{b+1}{m+1} - a\binom{a}{m+1} - \binom{b+2}{m+2} + \binom{a+1}{m+2}$$

and for a = 0 and b = n:

$$\sum_{k=0}^{n} k\binom{k}{m} = (n+1)\binom{n+1}{m+1} - \binom{n+2}{m+2}.$$

6.12 Definite Summation

In a sense, the rule:

$$\sum_{k=0}^{n} E^k = (E^{n+1} - 1)\Delta^{-1} = (E^{n+1} - 1)\Sigma$$

is the most important result of the operator method. In fact, it reduces the sum of the successive elements in a sequence to the computation of the indefinite sum, and this is just the operator inverse of the difference. Unfortunately, ∆^{−1} is not easy to compute and, apart from a restricted number of cases, there is no general rule allowing us to guess what ∆^{−1}f(x) = Σf(x) might be. In this rather pessimistic sense, the rule is very fine, very general and completely useless.

However, from a more positive point of view, we can say that whenever we know, in some way or another, an expression for ∆^{−1}f(x) = Σf(x), we have solved the problem of finding Σ_{k=0}^{n} f(x + k). For example, we can look at the differences computed in the previous sections and, for each of them, obtain the Σ of some function; in this way we immediately have a number of sums. The negative point is that, sometimes, we do not have a simple function and, therefore, the sum may not have any combinatorial interest.

Here is a number of identities obtained by our previous computations.

1) We have again the partial sums of the geometric series:

$$\Delta^{-1} p^x = \frac{p^x}{p-1} = \Sigma\, p^x$$

$$\sum_{k=0}^{n} p^{x+k} = \frac{p^{x+n+1} - p^x}{p-1}$$

$$\sum_{k=0}^{n} p^k = \frac{p^{n+1} - 1}{p-1} \qquad (x = 0).$$

2) The sum of consecutive Fibonacci numbers:

$$\Delta^{-1} F_x = F_{x+1} = \Sigma F_x$$

$$\sum_{k=0}^{n} F_{x+k} = F_{x+n+2} - F_{x+1}$$

$$\sum_{k=0}^{n} F_k = F_{n+2} - 1 \qquad (x = 0).$$

3) The sum of consecutive binomial coefficients with constant denominator:

$$\Delta^{-1}\binom{x}{m} = \binom{x}{m+1} = \Sigma\binom{x}{m}$$

$$\sum_{k=0}^{n}\binom{x+k}{m} = \binom{x+n+1}{m+1} - \binom{x}{m+1}$$

$$\sum_{k=0}^{n}\binom{k}{m} = \binom{n+1}{m+1} \qquad (x = 0).$$

4) The sum of consecutive binomial coefficients:

$$\Delta^{-1}\binom{p+x}{m+x} = \binom{p+x}{m+x-1} = \Sigma\binom{p+x}{m+x}$$

$$\sum_{k=0}^{n}\binom{p+k}{m+k} = \binom{p+n+1}{m+n} - \binom{p}{m-1}.$$

5) The sum of falling factorials:

$$\Delta^{-1} x^{\underline{m}} = \frac{x^{\underline{m+1}}}{m+1} = \Sigma\, x^{\underline{m}}$$

$$\sum_{k=0}^{n} (x+k)^{\underline{m}} = \frac{(x+n+1)^{\underline{m+1}} - x^{\underline{m+1}}}{m+1}$$

$$\sum_{k=0}^{n} k^{\underline{m}} = \frac{(n+1)^{\underline{m+1}}}{m+1} \qquad (x = 0).$$
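The definite sum obtained above by summation by parts, Σ_{k=0}^{n} k·C(k,m) = (n+1)·C(n+1,m+1) − C(n+2,m+2), checks out numerically (a sketch):

```python
from math import comb

def check(n: int, m: int) -> bool:
    lhs = sum(k * comb(k, m) for k in range(n + 1))
    return lhs == (n + 1) * comb(n + 1, m + 1) - comb(n + 2, m + 2)

print(all(check(n, m) for n in range(25) for m in range(8)))  # True
```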
80 CHAPTER 6. FORMAL METHODS
n
X (x + n)m+1 − (x − 1)m+1 By inverting, we have a formula for the Σ operator:
(x + k)m =
m+1 µ ¶
k=0 1 1 D
n
X 1 Σ= D =
km = nm+1 (x = 0). e −1 D eD − 1
m+1
k=0
Now, we recognize the generating function of the
7) The sum of inverse binomial coefficients: Bernoulli numbers, and therefore we have a devel-
µ ¶−1 µ ¶−1 µ ¶ opment for Σ:
−1 x m x−1 x
∆ = =Σ µ ¶
$$\sum_{k=0}^{n}\binom{x+k}{m}^{-1}=\frac{m}{m-1}\left(\binom{x-1}{m-1}^{-1}-\binom{x+n}{m-1}^{-1}\right).$$

8) The sum of harmonic numbers. Since 1 = ∆x, we have:

$$\Delta^{-1}H_x=\Sigma H_x=xH_x-x$$

$$\sum_{k=0}^{n}H_{x+k}=(x+n+1)H_{x+n+1}-xH_x-(n+1)$$

$$\sum_{k=0}^{n}H_k=(n+1)H_{n+1}-(n+1)=(n+1)H_n-n\qquad(x=0).$$

6.13 The Euler-McLaurin Summation Formula

One of the most striking applications of the finite operator method is the formal proof of the Euler-McLaurin summation formula. The starting point is the Taylor theorem for the series expansion of a function f(x) ∈ C^∞, i.e., a function having derivatives of any order. The usual form of the theorem:

$$f(x+h)=f(x)+\frac{h}{1!}f'(x)+\frac{h^2}{2!}f''(x)+\cdots+\frac{h^n}{n!}f^{(n)}(x)+\cdots$$

can be interpreted in the sense of operators as a result connecting the shift and the differentiation operators. In fact, for h = 1, it can be written as:

$$Ef(x)=If(x)+\frac{Df(x)}{1!}+\frac{D^2f(x)}{2!}+\cdots$$

and therefore as a relation between operators:

$$E=1+\frac{D}{1!}+\frac{D^2}{2!}+\cdots+\frac{D^n}{n!}+\cdots=e^D.$$

This formal identity relates the finite operator E and the infinitesimal operator D, and subtracting 1 from both sides it can be formulated as:

$$\Delta=e^D-1.$$

Since Σ = ∆^{-1}, the generating function of the Bernoulli numbers gives:

$$\Sigma=\frac{1}{e^D-1}=\frac{1}{D}\left(B_0+\frac{B_1}{1!}D+\frac{B_2}{2!}D^2+\cdots\right)=D^{-1}-\frac{1}{2}I+\frac{1}{12}D-\frac{1}{720}D^3+\frac{1}{30240}D^5-\frac{1}{1209600}D^7+\cdots.$$

This is not a series development since, as we know, the Bernoulli numbers diverge to infinity. We have a case of asymptotic development, which is only defined when we consider a limited number of terms, but in general diverges if we let the number of terms go to infinity. The number of terms for which the sum approaches its true value depends on the function f(x) and on the argument x.

From the indefinite sum we can pass to the definite sum by applying the general rule of Section 6.12. Since D^{-1} = ∫ dx, we immediately have:

$$\sum_{k=0}^{n-1}f(k)=\int_0^n f(x)\,dx-\frac{1}{2}\Bigl[f(x)\Bigr]_0^n+\frac{1}{12}\Bigl[f'(x)\Bigr]_0^n-\frac{1}{720}\Bigl[f'''(x)\Bigr]_0^n+\cdots$$

and this is the celebrated Euler-McLaurin summation formula. It expresses a sum as a function of the integral and the successive derivatives of the function f(x). In this sense, the formula can be seen as a method for approximating a sum by means of an integral or, vice versa, for approximating an integral by means of a sum, and this was just the point of view of the mathematicians who first developed it.

As a simple but very important example, let us find an asymptotic development for the harmonic numbers H_n. Since H_n = H_{n-1} + 1/n, the Euler-McLaurin formula applies to H_{n-1} and to the function f(x) = 1/x, giving:

$$H_{n-1}=\int_1^n\frac{dx}{x}-\frac{1}{2}\left[\frac{1}{x}\right]_1^n+\frac{1}{12}\left[-\frac{1}{x^2}\right]_1^n-\frac{1}{720}\left[-\frac{6}{x^4}\right]_1^n+\frac{1}{30240}\left[-\frac{120}{x^6}\right]_1^n-\cdots$$

$$=\ln n-\frac{1}{2n}+\frac{1}{2}-\frac{1}{12n^2}+\frac{1}{12}+\frac{1}{120n^4}-\frac{1}{120}-\frac{1}{252n^6}+\frac{1}{252}+\cdots$$
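The closed form for the sum of harmonic numbers in item 8) can be verified with exact rational arithmetic; a minimal sketch in Python (the helper H is ours, not part of the text):

```python
from fractions import Fraction

def H(n):
    # n-th harmonic number H_n = 1 + 1/2 + ... + 1/n (with H_0 = 0)
    return sum(Fraction(1, k) for k in range(1, n + 1))

for n in (1, 5, 10, 25):
    lhs = sum(H(k) for k in range(n + 1))          # sum_{k=0}^{n} H_k
    rhs = (n + 1) * H(n + 1) - (n + 1)             # (n+1) H_{n+1} - (n+1)
    assert lhs == rhs == (n + 1) * H(n) - n        # = (n+1) H_n - n
```

Since Fractions are exact, the identity holds with equality, not merely approximately.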
In this expression a number of constants appears, and they can be summed together to form a constant γ, provided that the sum actually converges. This constant is the Euler-Mascheroni constant:

$$\lim_{n\to\infty}(H_{n-1}-\ln n)=\gamma=0.577215664902\ldots$$

By adding 1/n to both sides of the previous relation, we eventually find:

$$H_n=\ln n+\gamma+\frac{1}{2n}-\frac{1}{12n^2}+\frac{1}{120n^4}-\frac{1}{252n^6}+\cdots$$

and this is the asymptotic expansion we were looking for.

6.14 Applications of the Euler-McLaurin Formula

As another application of the Euler-McLaurin summation formula, we now show the derivation of Stirling's approximation for n!. The first step consists in taking the logarithm of that quantity:

$$\ln n!=\ln 1+\ln 2+\ln 3+\cdots+\ln n$$

so that we are reduced to computing a sum and hence to applying the Euler-McLaurin formula:

$$\ln(n-1)!=\sum_{k=1}^{n-1}\ln k=\int_1^n\ln x\,dx-\frac{1}{2}\Bigl[\ln x\Bigr]_1^n+\frac{1}{12}\left[\frac{1}{x}\right]_1^n-\frac{1}{720}\left[\frac{2}{x^3}\right]_1^n+\cdots$$

$$=n\ln n-n+1-\frac{1}{2}\ln n+\frac{1}{12n}-\frac{1}{12}-\frac{1}{360n^3}+\frac{1}{360}+\cdots.$$

Here we have used the fact that ∫ ln x dx = x ln x − x. At this point we can add ln n to both sides and introduce a constant σ = 1 − 1/12 + 1/360 − ···. It is not by any means easy to determine directly the value of σ, but by other approaches to the same problem it is known that σ = ln√(2π). Numerically, we can observe that:

$$1-\frac{1}{12}+\frac{1}{360}=0.91944\ldots\qquad\text{and}\qquad\ln\sqrt{2\pi}\approx 0.9189388.$$

We can now go on with our sum:

$$\ln n!=n\ln n-n+\frac{1}{2}\ln n+\ln\sqrt{2\pi}+\frac{1}{12n}-\frac{1}{360n^3}+\cdots$$

To obtain the value of n! we only have to take exponentials:

$$n!=\frac{n^n}{e^n}\sqrt{n}\sqrt{2\pi}\exp\left(\frac{1}{12n}\right)\exp\left(-\frac{1}{360n^3}\right)\cdots$$

$$=\sqrt{2\pi n}\,\frac{n^n}{e^n}\left(1+\frac{1}{12n}+\frac{1}{288n^2}+\cdots\right)\left(1-\frac{1}{360n^3}+\cdots\right)\cdots=\sqrt{2\pi n}\,\frac{n^n}{e^n}\left(1+\frac{1}{12n}+\frac{1}{288n^2}-\cdots\right).$$

This is the well-known Stirling's approximation for n!. By means of this approximation, we can also find the approximation for another important quantity:

$$\binom{2n}{n}=\frac{(2n)!}{n!^2}=\frac{\sqrt{4\pi n}\left(\dfrac{2n}{e}\right)^{2n}}{2\pi n\left(\dfrac{n}{e}\right)^{2n}}\times\frac{1+\dfrac{1}{24n}+\dfrac{1}{1152n^2}-\cdots}{\left(1+\dfrac{1}{12n}+\dfrac{1}{288n^2}-\cdots\right)^2}=\frac{4^n}{\sqrt{\pi n}}\left(1-\frac{1}{8n}+\frac{1}{128n^2}+\cdots\right).$$

Another application of the Euler-McLaurin summation formula is given by the sum $\sum_{k=1}^n k^p$, when p is any integer constant different from −1 (the case p = −1 being that of the harmonic numbers):

$$\sum_{k=0}^{n-1}k^p=\int_0^n x^p\,dx-\frac{1}{2}\Bigl[x^p\Bigr]_0^n+\frac{1}{12}\Bigl[px^{p-1}\Bigr]_0^n-\frac{1}{720}\Bigl[p(p-1)(p-2)x^{p-3}\Bigr]_0^n+\cdots$$

$$=\frac{n^{p+1}}{p+1}-\frac{n^p}{2}+\frac{pn^{p-1}}{12}-\frac{p(p-1)(p-2)n^{p-3}}{720}+\cdots.$$

In this case the evaluation at 0 does not introduce any constant. By adding n^p to both sides, we have the following formula, which only contains a finite number of terms:

$$\sum_{k=0}^{n}k^p=\frac{n^{p+1}}{p+1}+\frac{n^p}{2}+\frac{pn^{p-1}}{12}-\frac{p(p-1)(p-2)n^{p-3}}{720}+\cdots$$

If p is not an integer, after ⌈p⌉ differentiations we obtain x^q, where q < 0, and therefore we cannot consider the limit 0. We proceed with the Euler-McLaurin formula in the following way:

$$\sum_{k=1}^{n-1}k^p=\int_1^n x^p\,dx-\frac{1}{2}\Bigl[x^p\Bigr]_1^n+\frac{1}{12}\Bigl[px^{p-1}\Bigr]_1^n-\frac{1}{720}\Bigl[p(p-1)(p-2)x^{p-3}\Bigr]_1^n+\cdots$$

$$=\frac{n^{p+1}}{p+1}-\frac{1}{p+1}-\frac{n^p}{2}+\frac{1}{2}+\frac{pn^{p-1}}{12}-\frac{p}{12}-\frac{p(p-1)(p-2)n^{p-3}}{720}+\frac{p(p-1)(p-2)}{720}+\cdots.$$
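Both Stirling's approximation and the expansion of the central binomial coefficient derived above are easy to test numerically; a small Python sketch (the tolerances are our choice, not from the text):

```python
import math

def stirling(n):
    # n! ~ sqrt(2 pi n) (n/e)^n (1 + 1/(12 n) + ...)
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n * (1 + 1 / (12 * n))

def central_binomial(n):
    # binom(2n, n) ~ 4^n / sqrt(pi n) (1 - 1/(8 n) + 1/(128 n^2) + ...)
    return 4 ** n / math.sqrt(math.pi * n) * (1 - 1 / (8 * n) + 1 / (128 * n ** 2))

n = 20
print(stirling(n) / math.factorial(n))                                        # close to 1
print(central_binomial(n) * math.factorial(n) ** 2 / math.factorial(2 * n))   # close to 1
```

With only the terms shown, both ratios already agree with 1 to better than three decimal places at n = 20.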
Collecting the powers of n and gathering all the constant terms into a single constant K_p, we obtain:

$$\sum_{k=1}^{n}k^p=\frac{n^{p+1}}{p+1}+\frac{n^p}{2}+\frac{pn^{p-1}}{12}-\frac{p(p-1)(p-2)n^{p-3}}{720}+\cdots+K_p.$$

The constant:

$$K_p=-\frac{1}{p+1}+\frac{1}{2}-\frac{p}{12}+\frac{p(p-1)(p-2)}{720}+\cdots$$

has a fundamental rôle when the leading term n^{p+1}/(p+1) does not increase with n, i.e., when p < −1. In that case, in fact, the sum converges to K_p. When p > −1 the constant is less important. For example, we have:

$$\sum_{k=1}^{n}\sqrt{k}=\frac{2}{3}n\sqrt{n}+\frac{\sqrt{n}}{2}+K_{1/2}+\frac{1}{24\sqrt{n}}+\cdots\qquad K_{1/2}\approx-0.2078862\ldots$$

$$\sum_{k=1}^{n}\frac{1}{\sqrt{k}}=2\sqrt{n}+K_{-1/2}+\frac{1}{2\sqrt{n}}-\frac{1}{24n\sqrt{n}}+\cdots\qquad K_{-1/2}\approx-1.4603545\ldots$$

For p = −2 we find:

$$\sum_{k=1}^{n}\frac{1}{k^2}=K_{-2}-\frac{1}{n}+\frac{1}{2n^2}-\frac{1}{6n^3}+\frac{1}{30n^5}-\cdots$$
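These expansions are easy to test numerically. In the sketch below the constants K_{1/2} and K_{-1/2} are the values quoted above; the text does not name K_{-2}, but it is the convergent value ζ(2) = π²/6, a standard fact we use for the check:

```python
import math

n = 1000
K_half, K_minus_half = -0.2078862, -1.4603545

s = sum(math.sqrt(k) for k in range(1, n + 1))
approx = 2 / 3 * n * math.sqrt(n) + math.sqrt(n) / 2 + K_half + 1 / (24 * math.sqrt(n))
print(s - approx)   # tiny

s = sum(1 / math.sqrt(k) for k in range(1, n + 1))
approx = 2 * math.sqrt(n) + K_minus_half + 1 / (2 * math.sqrt(n)) - 1 / (24 * n * math.sqrt(n))
print(s - approx)   # tiny

# p = -2: the partial sums converge to K_{-2} = zeta(2) = pi^2/6
s = sum(1 / k ** 2 for k in range(1, n + 1))
print(s + 1 / n - 1 / (2 * n ** 2) + 1 / (6 * n ** 3) - math.pi ** 2 / 6)   # tiny
```

The first two differences are limited only by the seven-digit precision of the quoted constants.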
Asymptotics
Proof: If f(t₀) < ∞ then an index N ∈ N exists such that for every n > N we have |f_n t₀ⁿ| ≤ |f_n||t₀|ⁿ < M, for some finite M ∈ R. This means that the terms f_n t₀ⁿ remain bounded, and from this property one obtains the following formula for the radius of convergence:

$$\frac{1}{R}=\limsup_{n\to\infty}\sqrt[n]{|f_n|}.$$
This result is the basis for our considerations on the asymptotics of the coefficients of a power series. In fact, it implies that, as a first approximation, |f_n| grows as 1/Rⁿ. However, this is a rough estimate, because it can also grow as n/Rⁿ or 1/(nRⁿ), and many possibilities arise which can make the basic approximation more precise; the next sections will be dedicated to this problem. We conclude by noticing that if:

$$\lim_{n\to\infty}\left|\frac{f_{n+1}}{f_n}\right|=S$$

then R = 1/S is the radius of convergence of the series.

7.2 The method of Darboux

Newton's rule is the basis for many considerations on asymptotics. In practice, we used it to prove that F_n ∼ φⁿ/√5, and many other proofs can be performed by using Newton's rule together with the following theorem, whose relevance was noted by Bender and which, therefore, will be called Bender's theorem:

Theorem 7.2.1 Let f(t) = g(t)h(t), where f(t), g(t), h(t) are power series and h(t) has a radius of convergence larger than f(t)'s (which therefore equals the radius of convergence of g(t)); if lim_{n→∞} g_n/g_{n+1} = b and h(b) ≠ 0, then:

$$f_n\sim h(b)\,g_n.$$

Let us remember that if g(t) has positive real coefficients, then g_n/g_{n+1} tends to the radius of convergence of g(t). The proof of this theorem is omitted here; instead, we give a simple example. Let us suppose we wish to find the asymptotic value of the Motzkin numbers, whose generating function is:

$$\mu(t)=\frac{1-t-\sqrt{1-2t-3t^2}}{2t^2}.$$

For n ≥ 2 we obviously have:

$$\mu_n=[t^n]\frac{1-t-\sqrt{1-2t-3t^2}}{2t^2}=-\frac{1}{2}[t^{n+2}]\sqrt{1+t}\,(1-3t)^{1/2}.$$

We now observe that the radius of convergence of μ(t) is R = 1/3, which is the same as the radius of g(t) = (1−3t)^{1/2}, while h(t) = √(1+t) has 1 as radius of convergence; therefore we have μ_n/μ_{n+1} → 1/3 as n → ∞. By Bender's theorem we find:

$$\mu_n\sim-\frac{1}{2}\sqrt{\frac{4}{3}}\,[t^{n+2}](1-3t)^{1/2}=-\frac{\sqrt{3}}{3}\binom{1/2}{n+2}(-3)^{n+2}=\frac{\sqrt{3}}{3(2n+3)}\binom{2n+4}{n+2}\left(\frac{3}{4}\right)^{n+2}.$$

This is a particular case of a more general result due to Darboux and known as Darboux' method. First of all, let us show how it is possible to obtain an approximation for the binomial coefficient binom(γ, n), when γ ∈ C is a fixed number and n is large. We begin by proving the following formula for the ratio of two large values of the Γ function (a, b being two parameters small with respect to n):

$$\frac{\Gamma(n+a)}{\Gamma(n+b)}=n^{a-b}\left(1+\frac{(a-b)(a+b-1)}{2n}+O\left(\frac{1}{n^2}\right)\right).$$

Let us apply the Stirling formula for the Γ function:

$$\frac{\Gamma(n+a)}{\Gamma(n+b)}\approx\sqrt{\frac{2\pi}{n+a}}\left(\frac{n+a}{e}\right)^{n+a}\left(1+\frac{1}{12(n+a)}\right)\times\sqrt{\frac{n+b}{2\pi}}\left(\frac{e}{n+b}\right)^{n+b}\left(1-\frac{1}{12(n+b)}\right).$$

If we limit ourselves to the terms in 1/n, the two corrections cancel each other, and therefore we find:

$$\frac{\Gamma(n+a)}{\Gamma(n+b)}\approx\sqrt{\frac{n+b}{n+a}}\,e^{b-a}\,\frac{(n+a)^{n+a}}{(n+b)^{n+b}}=\sqrt{\frac{n+b}{n+a}}\,e^{b-a}\,n^{a-b}\,\frac{(1+a/n)^{n+a}}{(1+b/n)^{n+b}}.$$

We now obtain asymptotic approximations in the following way:

$$\sqrt{\frac{n+b}{n+a}}=\frac{\sqrt{1+b/n}}{\sqrt{1+a/n}}\approx\left(1+\frac{b}{2n}\right)\left(1-\frac{a}{2n}\right)\approx 1+\frac{b-a}{2n}$$

$$\left(1+\frac{x}{n}\right)^{n+x}=\exp\left((n+x)\ln\left(1+\frac{x}{n}\right)\right)=\exp\left((n+x)\left(\frac{x}{n}-\frac{x^2}{2n^2}+\cdots\right)\right)=$$

$$=\exp\left(x+\frac{x^2}{n}-\frac{x^2}{2n}+\cdots\right)=e^x\left(1+\frac{x^2}{2n}+\cdots\right).$$

Therefore, for our expression we have:

$$\frac{\Gamma(n+a)}{\Gamma(n+b)}\approx n^{a-b}\,\frac{e^a}{e^{a-b}e^b}\left(1+\frac{a^2}{2n}\right)\left(1-\frac{b^2}{2n}\right)\left(1+\frac{b-a}{2n}\right)=n^{a-b}\left(1+\frac{a^2-b^2-a+b}{2n}+O\left(\frac{1}{n^2}\right)\right).$$

We are now in a position to prove the following:
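The quality of the Bender-theorem estimate for the Motzkin numbers can be tested against exact values; the recurrence M_{n+1} = M_n + Σ_k M_k M_{n−1−k} used below is the standard one for Motzkin numbers, not derived in the text:

```python
import math

def motzkin(N):
    # Exact Motzkin numbers via M_{n+1} = M_n + sum_{k=0}^{n-1} M_k M_{n-1-k}
    m = [1, 1]
    for n in range(1, N):
        m.append(m[n] + sum(m[k] * m[n - 1 - k] for k in range(n)))
    return m

def motzkin_asymptotic(n):
    # mu_n ~ sqrt(3)/(3(2n+3)) * binom(2n+4, n+2) * (3/4)^(n+2)
    return math.sqrt(3) / (3 * (2 * n + 3)) * math.comb(2 * n + 4, n + 2) * (3 / 4) ** (n + 2)

m = motzkin(60)
print(m[:7])                            # [1, 1, 2, 4, 9, 21, 51]
print(motzkin_asymptotic(60) / m[60])   # close to 1
```

The ratio approaches 1 at the O(1/n) rate that Bender's theorem guarantees.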
Theorem 7.2.2 Let f(t) = h(t)(1−αt)^γ, for some γ which is not a positive integer, and h(t) having a radius of convergence larger than 1/α. Then we have:

$$f_n=[t^n]f(t)\sim h\left(\frac{1}{\alpha}\right)\binom{\gamma}{n}(-\alpha)^n=\frac{\alpha^n\,h(1/\alpha)}{\Gamma(-\gamma)\,n^{1+\gamma}}.$$

Proof: We simply apply Bender's theorem and the formula for approximating the binomial coefficient:

$$\binom{\gamma}{n}=\frac{\gamma(\gamma-1)\cdots(\gamma-n+1)}{n!}=\frac{(-1)^n(n-\gamma-1)(n-\gamma-2)\cdots(1-\gamma)(-\gamma)}{\Gamma(n+1)}.$$

By repeated applications of the recurrence formula for the Γ function, Γ(x+1) = xΓ(x), we find:

$$\Gamma(n-\gamma)=(n-\gamma-1)(n-\gamma-2)\cdots(1-\gamma)(-\gamma)\Gamma(-\gamma)$$

and therefore:

$$\binom{\gamma}{n}=\frac{(-1)^n\,\Gamma(n-\gamma)}{\Gamma(n+1)\Gamma(-\gamma)}=\frac{(-1)^n}{\Gamma(-\gamma)}\,n^{-1-\gamma}\left(1+\frac{\gamma(\gamma+1)}{2n}\right)$$

from which the desired formula follows.

7.3 Singularities: poles

Let us consider the function f(t) = 1/(1−t) and the corresponding power series f̂(t) = 1 + t + t² + ···; for |t| < 1 the series converges and f̂(t) = f(t), while for |t| > 1, f(t) assumes a well-defined value while f̂(t) diverges.

We will call a singularity for f(t) every point t₀ ∈ C such that in every neighborhood of t₀ there is a t for which f(t) and f̂(t) behave differently and a t′ for which f(t′) = f̂(t′). Therefore, t₀ = 1 is a singularity for f(t) = 1/(1−t). Because of our previous considerations, the singularities of f(t) determine its radius of convergence; on the other hand, no singularity can be contained in the circle of convergence, and therefore the radius of convergence is determined by the singularity or singularities of smallest modulus. These will be called dominating singularities, and we observe explicitly that a function can have more than one dominating singularity. For example, f(t) = 1/(1−t²) has t = 1 and t = −1 as dominating singularities, because |1| = |−1|. The radius of convergence is always a non-negative real number, and we have R = |t₀| if t₀ is any one of the dominating singularities of f(t).

An isolated point t₀ for which f(t₀) = ∞ is therefore a singularity for f(t); as we shall see, not every singularity of f(t) is such that f(t) = ∞, but, for the moment, let us limit ourselves to this case. The following situation is very important: if f(t₀) = ∞ and we set α = 1/t₀, we will say that t₀ is a pole for f(t) iff there exists a positive integer m such that:

$$\lim_{t\to t_0}(1-\alpha t)^m f(t)=K<\infty\qquad\text{and}\qquad K\neq 0.$$
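The approximation of the generalized binomial coefficient just proved can be checked directly, for instance with γ = 1/2 (math.gamma supplies the Γ function; the test values are ours):

```python
import math

def binom_gamma(gamma, n):
    # exact generalized binomial coefficient binom(gamma, n)
    v = 1.0
    for k in range(n):
        v *= (gamma - k) / (k + 1)
    return v

def binom_gamma_approx(gamma, n):
    # binom(gamma, n) ~ (-1)^n / Gamma(-gamma) * n^(-1-gamma) * (1 + gamma(gamma+1)/(2n))
    return (-1) ** n / math.gamma(-gamma) * n ** (-1 - gamma) * (1 + gamma * (gamma + 1) / (2 * n))

g, n = 0.5, 200
print(binom_gamma_approx(g, n) / binom_gamma(g, n))   # close to 1
```

With the 1/(2n) correction included, the remaining error is O(1/n²), already invisible at n = 200 to several decimal places.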
This value is indeed a very good approximation for D_n, which can actually be computed as the integer nearest to n!/e.

Let us now see how Bender's theorem is applied to the exponential generating function of the ordered Bell numbers. We have shown that the dominating singularity is a pole at t = ln 2, which has order 1:

$$\lim_{t\to\ln 2}\frac{1-t/\ln 2}{2-e^t}=\lim_{t\to\ln 2}\frac{-1/\ln 2}{-e^t}=\frac{1}{2\ln 2}.$$

At this point we have:

$$[t^n]\frac{1}{2-e^t}=[t^n]\frac{1}{1-t/\ln 2}\cdot\frac{1-t/\ln 2}{2-e^t}\sim\left.\frac{1-t/\ln 2}{2-e^t}\right|_{t=\ln 2}[t^n]\frac{1}{1-t/\ln 2}=\frac{1}{2}\cdot\frac{1}{(\ln 2)^{n+1}}$$

and we conclude with the very good approximation O_n ∼ n!/(2(ln 2)^{n+1}).

Finally, we find the asymptotic approximation for the Bernoulli numbers. The following statement is very important when we have functions with several dominating singularities:

Principle: If t₁, t₂, ..., t_k are all the dominating singularities of a function f(t), then [tⁿ]f(t) can be found by summing all the contributions obtained by independently considering the k singularities.

We already observed that ±2πi are the two dominating singularities for the generating function of the Bernoulli numbers; they are both poles of order 1:

$$\lim_{t\to 2\pi i}\frac{t(1-t/2\pi i)}{e^t-1}=\lim_{t\to 2\pi i}\frac{1-t/\pi i}{e^t}=-1$$

$$\lim_{t\to-2\pi i}\frac{t(1+t/2\pi i)}{e^t-1}=\lim_{t\to-2\pi i}\frac{1+t/\pi i}{e^t}=-1.$$

Therefore we have:

$$[t^n]\frac{t}{e^t-1}=[t^n]\frac{1}{1-t/2\pi i}\cdot\frac{t(1-t/2\pi i)}{e^t-1}\sim-\frac{1}{(2\pi i)^n}.$$

A similar result is obtained for the other pole; thus we have:

$$\frac{B_n}{n!}\sim-\frac{1}{(2\pi i)^n}-\frac{1}{(-2\pi i)^n}.$$

When n is odd, these two values are opposite in sign and the result is 0; this confirms that the Bernoulli numbers of odd index are 0, except for n = 1. When n is even, say n = 2k, we have (2πi)^{2k} = (−2πi)^{2k} = (−1)^k(2π)^{2k}; therefore:

$$B_{2k}\sim-\frac{2(-1)^k(2k)!}{(2\pi)^{2k}}.$$

This formula is a good approximation, also for small values of n, and shows that the Bernoulli numbers become, in modulus, larger and larger as n increases.

7.5 Algebraic and logarithmic singularities

Let us consider the generating function for the Catalan numbers, f(t) = (1−√(1−4t))/(2t), and the corresponding power series f̂(t) = 1 + t + 2t² + 5t³ + 14t⁴ + ···. Our choice of the − sign was motivated by the initial condition of the recurrence C_{n+1} = Σ_{k=0}^{n} C_k C_{n−k} defining the Catalan numbers. This is due to the fact that, when the argument is a positive real number, we can choose the positive value as the result of a square root. In other words, we consider the arithmetic square root instead of the algebraic square root. This allows us to identify the power series f̂(t) with the function f(t), but when we pass to complex numbers this is no longer possible. Actually, in the complex field, a function containing a square root is a two-valued function, and there are two branches defined by the same expression. Only one of these two branches coincides with the function defined by the power series, which is obviously a one-valued function.

The points at which a square root becomes 0 are special points; in them the function is one-valued, but in every neighborhood the function is two-valued. For the smallest in modulus among these points, say t₀, we must have the following situation: for t such that |t| < |t₀|, f̂(t) should coincide with a branch of f(t), while for t such that |t| > |t₀|, f̂(t) cannot converge. In fact, consider a t ∈ R, t > |t₀|: the expression under the square root should be a negative real number and therefore f(t) ∈ C∖R; but f̂(t) can only be a real number, or f̂(t) does not converge. Because we know that when f̂(t) converges we must have f̂(t) = f(t), we conclude that f̂(t) cannot converge. This shows that t₀ is a singularity for f(t).

Every kth root originates the same problem, and the function is actually a k-valued function; every value for which the argument of the root is 0 is a singularity, called an algebraic singularity. These can be treated by Darboux' method or, directly, by means of Bender's theorem, which relies on Newton's rule. Actually, we already used this method to find the asymptotic evaluation of the Motzkin numbers.

The same considerations hold when a function contains a logarithm. In fact, a logarithm is an infinite-valued function, because it is the inverse of the exponential, which, in the complex field C, is a periodic function:

$$e^{t+2k\pi i}=e^t e^{2k\pi i}=e^t(\cos 2k\pi+i\sin 2k\pi)=e^t.$$

The period of e^t is therefore 2πi, and ln t is actually ln t + 2kπi, for k ∈ Z. A point t₀ for which the argument of a logarithm is 0 is a singularity for the function.
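The asymptotic formula for B_{2k} is remarkably accurate; a sketch that computes exact Bernoulli numbers via the standard recurrence Σ_{k=0}^{n} binom(n+1, k) B_k = 0 (not derived in the text) and compares:

```python
import math
from fractions import Fraction

def bernoulli(m):
    # B_0 .. B_m from the recurrence sum_{k=0}^{n} binom(n+1, k) B_k = 0
    B = [Fraction(1)]
    for n in range(1, m + 1):
        B.append(-sum(Fraction(math.comb(n + 1, k)) * B[k] for k in range(n)) / (n + 1))
    return B

B = bernoulli(20)
k = 10                                                              # n = 2k = 20
approx = -2 * (-1) ** k * math.factorial(2 * k) / (2 * math.pi) ** (2 * k)
print(float(B[2 * k]), approx)                                      # both about -529.12
```

The relative error decays like (1/2)^{2k}, since the next singularities of t/(e^t − 1) sit at ±4πi, twice as far from the origin.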
A further correction can now be obtained by considering:

$$k(t)=\frac{1}{2(1-t)}\ln\frac{1}{1-2t}-\ln\frac{1}{1-2t}+(1-2t)\ln\frac{1}{1-2t}=\frac{(1-2t)^2}{2(1-t)}\ln\frac{1}{1-2t}$$

which gives:

$$k_n=[t^n]\frac{(1-2t)^2}{2(1-1/2)}\ln\frac{1}{1-2t}=\frac{2^n}{n}-4\,\frac{2^{n-1}}{n-1}+4\,\frac{2^{n-2}}{n-2}=\frac{2^{n+1}}{n(n-1)(n-2)}.$$

This correction is still smaller, and we can write:

$$S_n\sim\frac{2^n}{n-1}\left(1+\frac{2}{n(n-2)}\right).$$

In general, we can obtain the same results if we expand the function h(t) in f(t) = g(t)h(t) (h(t) having a radius of convergence larger than that of f(t)) around the dominating singularity. This is done in the following way:

$$\frac{1}{2(1-t)}=\frac{1}{1+(1-2t)}=1-(1-2t)+(1-2t)^2-(1-2t)^3+\cdots.$$

This implies:

$$\frac{1}{2(1-t)}\ln\frac{1}{1-2t}=\ln\frac{1}{1-2t}-(1-2t)\ln\frac{1}{1-2t}+(1-2t)^2\ln\frac{1}{1-2t}-\cdots$$

and the result is the same as the one previously obtained by the method of subtracted singularities.

7.7 The asymptotic behavior of a trinomial square root

where m is a small integer. In the second case, g_n is the sum of various terms, as many as there are terms in the polynomial q(t), each one of the form:

$$q_k\,[t^{n-k}]\frac{1}{\sqrt{(1-\alpha t)(1-\beta t)}}.$$

It is therefore interesting to compute, once and for all, the asymptotic value of [tⁿ]((1−αt)(1−βt))^s, where s = 1/2 or s = −1/2.

Let us suppose that |α| > |β|, since the case α = β has no interest and the case α = −β should be approached in another way. This hypothesis means that t = 1/α is the radius of convergence of the function, and we can develop everything around this singularity. In most combinatorial problems we have α > 0, because the coefficients of f(t) are positive numbers, but this is not a limiting factor.

Let us consider s = 1/2; in this case, a minus sign should precede the square root. The evaluation is shown in Table 7.1. The formula so obtained can be considered sufficient for obtaining both the asymptotic evaluation of f_n and a suitable numerical approximation. However, we can use the following developments:

$$\binom{2n}{n}=\frac{4^n}{\sqrt{\pi n}}\left(1-\frac{1}{8n}+\frac{1}{128n^2}+O\left(\frac{1}{n^3}\right)\right)$$

$$\frac{1}{2n-1}=\frac{1}{2n}\left(1+\frac{1}{2n}+\frac{1}{4n^2}+O\left(\frac{1}{n^3}\right)\right)$$

$$\frac{1}{2n-3}=\frac{1}{2n}\left(1+\frac{3}{2n}+O\left(\frac{1}{n^2}\right)\right)$$

and get:

$$f_n=\sqrt{\frac{\alpha-\beta}{\alpha}}\,\frac{\alpha^n}{2n\sqrt{\pi n}}\left(1-\frac{9\beta-3\alpha}{8(\alpha-\beta)n}+\frac{25}{128n^2}-\frac{45\beta}{32(\alpha-\beta)n^2}-\frac{15\beta^2}{32(\alpha-\beta)^2n^2}+O\left(\frac{1}{n^3}\right)\right).$$

The reader is invited to find a similar formula for the case s = −1/2.

Table 7.1: the evaluation for s = 1/2:

$$[t^n]\left(-(1-\alpha t)^{1/2}(1-\beta t)^{1/2}\right)=[t^n]\left(-(1-\alpha t)^{1/2}\left(\frac{\alpha-\beta}{\alpha}\left(1+\frac{\beta}{\alpha-\beta}(1-\alpha t)\right)\right)^{1/2}\right)=$$

$$=-\sqrt{\frac{\alpha-\beta}{\alpha}}\,[t^n]\,(1-\alpha t)^{1/2}\left(1+\frac{\beta}{2(\alpha-\beta)}(1-\alpha t)-\frac{\beta^2}{8(\alpha-\beta)^2}(1-\alpha t)^2+\cdots\right)=$$

$$=-\sqrt{\frac{\alpha-\beta}{\alpha}}\,[t^n]\left((1-\alpha t)^{1/2}+\frac{\beta}{2(\alpha-\beta)}(1-\alpha t)^{3/2}-\frac{\beta^2}{8(\alpha-\beta)^2}(1-\alpha t)^{5/2}+\cdots\right)=$$

$$=-\sqrt{\frac{\alpha-\beta}{\alpha}}\left(\binom{1/2}{n}(-\alpha)^n+\frac{\beta}{2(\alpha-\beta)}\binom{3/2}{n}(-\alpha)^n-\frac{\beta^2}{8(\alpha-\beta)^2}\binom{5/2}{n}(-\alpha)^n+\cdots\right)=$$

$$=-\sqrt{\frac{\alpha-\beta}{\alpha}}\,\frac{(-1)^{n-1}}{4^n(2n-1)}\binom{2n}{n}(-\alpha)^n\left(1-\frac{3\beta}{2(\alpha-\beta)(2n-3)}-\frac{15\beta^2}{8(\alpha-\beta)^2(2n-3)(2n-5)}+\cdots\right)=$$

$$=\sqrt{\frac{\alpha-\beta}{\alpha}}\,\frac{\alpha^n}{4^n(2n-1)}\binom{2n}{n}\left(1-\frac{3\beta}{2(\alpha-\beta)(2n-3)}-\frac{15\beta^2}{8(\alpha-\beta)^2(2n-3)(2n-5)}+O\left(\frac{1}{n^3}\right)\right).$$
In these cases, the only method seems to be the Cauchy theorem, which allows us to evaluate [tⁿ]f(t) by means of an integral:

$$f_n=\frac{1}{2\pi i}\oint_\gamma\frac{f(t)}{t^{n+1}}\,dt$$

where γ is a suitable path enclosing the origin. We do not intend to develop this method here, but we will limit ourselves to sketching a method, derived from the Cauchy theorem, which allows us to find an asymptotic evaluation of f_n in many practical situations. The method can be implemented on a computer in the following sense: given a function f(t), in an algorithmic way we can check whether f(t) belongs to the class of functions for which the method is applicable (the class of "H-admissible" functions) and, if that is the case, we can evaluate the principal value of the asymptotic estimate for f_n. The system ΛΥΩ, by Flajolet, Salvy and Zimmermann, realizes this method. The development of the method was mainly performed by Hayman, and therefore it is known as Hayman's method; this also justifies the use of the letter H in the definition of H-admissibility.

A function is called H-admissible if and only if it belongs to one of the following classes or can be obtained, in a finite number of steps according to the following rules, from other H-admissible functions:

1. if f(t) and g(t) are H-admissible functions and p(t) is a polynomial with real coefficients and positive leading term, then:

$$\exp(f(t))\qquad f(t)+g(t)\qquad f(t)+p(t)\qquad p(f(t))\qquad p(t)f(t)$$

are all H-admissible functions;

2. if p(t) is a polynomial with positive coefficients and not of the form p(t^k) for k > 1, then the function exp(p(t)) is H-admissible;

3. if α, β are positive real numbers and γ, δ are real numbers, then the function:

$$f(t)=\exp\left(\frac{\beta}{(1-t)^{\alpha}}\left(\frac{1}{t}\ln\frac{1}{1-t}\right)^{\gamma}\left(\frac{1}{t}\ln\left(\frac{1}{t}\ln\frac{1}{1-t}\right)\right)^{\delta}\right)$$

is H-admissible.

For example, the following functions are all H-admissible:

$$e^t\qquad\exp\left(t+\frac{t^2}{2}\right)\qquad\exp\left(\frac{t}{1-t}\right)\qquad\exp\left(\frac{1}{t(1-t)^2}\ln\frac{1}{1-t}\right).$$

In particular, for the third function we have:

$$\exp\left(\frac{t}{1-t}\right)=\exp\left(\frac{1}{1-t}-1\right)=\frac{1}{e}\exp\left(\frac{1}{1-t}\right)$$

and naturally a constant does not influence the H-admissibility of a function. In this example we have α = β = 1 and γ = δ = 0.

For H-admissible functions, the following result holds:

Theorem 7.8.1 Let f(t) be an H-admissible function; then:

$$f_n=[t^n]f(t)\sim\frac{f(r)}{r^n\sqrt{2\pi b(r)}}\qquad\text{as }n\to\infty$$

where r = r(n) is the least positive solution of the equation tf′(t)/f(t) = n and b(t) is the function:

$$b(t)=t\,\frac{d}{dt}\left(t\,\frac{f'(t)}{f(t)}\right).$$

As we said before, the proof of this theorem is based on Cauchy's theorem and is beyond the scope of these notes. Instead, let us show some examples to clarify the application of Hayman's method.
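For f(t) = e^t, Hayman's theorem can be carried out by hand: tf′(t)/f(t) = t gives r = n, and b(t) = t d/dt(t) = t gives b(r) = n, so the estimate is eⁿ/(nⁿ√(2πn)), which is exactly Stirling's formula for 1/n! = [tⁿ]e^t. A quick check:

```python
import math

def hayman_exp(n):
    # Hayman's estimate f(r) / (r^n sqrt(2 pi b(r))) for f(t) = e^t,
    # where r = n solves t f'(t)/f(t) = n and b(r) = n
    r = n
    b = n
    return math.exp(r) / (r ** n * math.sqrt(2 * math.pi * b))

n = 30
print(hayman_exp(n) * math.factorial(n))   # close to 1
```

The product differs from 1 by roughly 1/(12n), the first Stirling correction, confirming that Hayman's method delivers the principal value of the asymptotic estimate.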
Bibliography
The birth of Computer Science and the need of analyzing the behavior of algorithms and data structures have given a strong impulse to Combinatorial Analysis and to the mathematical methods used to study combinatorial objects. So, next to the traditional literature on Combinatorics, a number of books and papers have been produced relating Computer Science and the methods of Combinatorial Analysis. The first author who systematically worked in this direction was surely Donald Knuth, who published in 1968 the first volume of his monumental "The Art of Computer Programming", the first part of which is dedicated to the mathematical methods used in Combinatorial Analysis. Without this basic knowledge, there is little hope of understanding the developments of the analysis of algorithms and data structures:

Donald E. Knuth: The Art of Computer Programming: Fundamental Algorithms, Vol. I, Addison-Wesley (1968).

Many additional concepts and techniques are also contained in the third volume:

Donald E. Knuth: The Art of Computer Programming: Sorting and Searching, Vol. III, Addison-Wesley (1973).

Numerical and probabilistic developments are to be found in the central volume:

Donald E. Knuth: The Art of Computer Programming: Seminumerical Algorithms, Vol. II, Addison-Wesley (1973).

The following text deals, in an elementary but rigorous way, with the main topics of Combinatorial Analysis, with an eye to Computer Science. Many exercises (with solutions) are proposed, and the reader is never left alone in front of the many-faceted problems he or she is encouraged to tackle:

Ronald L. Graham, Donald E. Knuth, Oren Patashnik: Concrete Mathematics, Addison-Wesley (1989).

Many texts on Combinatorial Analysis are worthy of consideration, because they contain information on general concepts, both from a combinatorial and a mathematical point of view:

William Feller: An Introduction to Probability Theory and Its Applications, Wiley (1950) (1957) (1968).

John Riordan: An Introduction to Combinatorial Analysis, Wiley (1953) (1958).

Ian P. Goulden, David M. Jackson: Combinatorial Enumeration, Dover Publ. (2004).

Richard P. Stanley: Enumerative Combinatorics, Vol. I, Cambridge Univ. Press (1986) (2000).
Riordan arrays are part of the general method of coefficients and are particularly important in proving combinatorial identities and generating function transformations. They were introduced by:

Louis W. Shapiro, Seyoum Getu, Wen-Jin Woan, Leon C. Woodson: The Riordan group, Discrete Applied Mathematics 34 (1991) 229 – 239.

Doron Zeilberger: A holonomic systems approach to special functions identities, Journal of Computational and Applied Mathematics 32 (1990) 321 – 368.

Doron Zeilberger: A fast algorithm for proving terminating hypergeometric identities, Discrete Mathematics 80 (1990) 207 – 211.

functions), the method is not very general, but is very effective whenever it can be applied. The method was extended by Flajolet to some classes of exponential generating functions and implemented in Maple.

Marco Schützenberger: Context-free languages and pushdown automata, Information and Control 6 (1963) 246 – 264.

covered by the quoted texts, especially Knuth, Wilf and Henrici. The method of Hayman is based on the paper:

Micha Hofri: Probabilistic Analysis of Algorithms, Springer (1987).
***
Chapter 7: Asymptotics
table, 5
tail, 67
Tartaglia triangle, 16
Taylor theorem, 80
terminal word, 68
tractable algorithm, 10
transposition, 12
transposition representation, 13
tree representation, 36
triangle, 55
trinomial coefficient, 46
unary-binary tree, 71
underdiagonal colored walk, 62
underdiagonal walk, 20
unfold a recurrence, 6, 49
uniform convergence, 83
unit, 26
unsuccessful search, 5
walk, 20
word, 12, 67
worst AVL tree, 52
worst case analysis, 5
Z-sequence, 58
zeta function, 18
Zimmermann, Paul, 90