MATHEMATICAL INDUCTION, POWER SUMS, AND DISCRETE CALCULUS

PETE L. CLARK
I have a quite different answer to that question. Here goes: the task in finding
a closed form expression for a sum is to eliminate the “dot dot dot”. This is exactly
what induction does for us. In general, suppose f and g are functions from the
positive integers to the real numbers, and our task is to prove that
f (1) + . . . + f (n) = g(n).
Now let us contemplate proving this by induction.
For the base case we must check that f(1) = g(1). Now assume the identity for a fixed positive integer n, and add f(n + 1) to both sides:
\[ f(1) + \ldots + f(n) + f(n+1) = (f(1) + \ldots + f(n)) + f(n+1) \stackrel{\text{IH}}{=} g(n) + f(n+1). \]
Since we want to get g(n + 1) in the end, what remains to be shown is precisely
that g(n + 1) = g(n) + f (n + 1), or equivalently
(4) g(n + 1) − g(n) = f (n + 1).
If f and g are both simple algebraic functions, then (assuming the result is actually
true!) verifying (4) is a matter of high school algebra. For example, to prove (2) –
after checking that $1^2 = \frac{1(1+1)(2 \cdot 1+1)}{6}$ – the identity we need to verify is
\[ \frac{(n+1)(n+2)(2(n+1)+1)}{6} - \frac{n(n+1)(2n+1)}{6} = (n+1)^2, \]
and we need only expand out both sides and see that we get $n^2 + 2n + 1$ either way.
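To see this verification carried out by machine, here is a small sketch in Python, using the third-party sympy library (the variable names are ours):

    import sympy as sp

    n = sp.symbols('n')
    f = n**2                      # the summand f(n) = n^2
    g = n*(n + 1)*(2*n + 1)/6     # the conjectured closed form g(n)

    # Base case: g(1) = f(1) = 1.
    assert g.subs(n, 1) == 1

    # Inductive step (4): g(n+1) - g(n) - f(n+1) should expand to zero.
    assert sp.expand(g.subs(n, n + 1) - g - (n + 1)**2) == 0

This is exactly the high school algebra step: expanding both sides and checking that the difference vanishes identically.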
I wish to suggest that this procedure is analogous to what happens in calculus when
we have two functions f and g and wish to verify that $\int_1^x f(t)\,dt = g(x)$: it suffices,
by the Fundamental Theorem of Calculus, to show that $g(1) = 0$ and $\frac{dg}{dx} = f$. This
analogy may well seem far-fetched at the moment, so let's leave it aside and press on.
But not just yet. First let us consider a slightly different framework: we have
two functions f, g : Z+ → R, but instead of trying to show f (1) + . . . + f (n) = g(n),
we are trying to show that
f (1) + . . . + f (n) = g(1) + . . . + g(n).
Let us write F (n) = f (1) + . . . + f (n) and G(n) = g(1) + . . . + g(n), so we want
to show that F (n) = G(n) for all n. Suppose we try to prove this by induction.
We must show that F (1) = G(1), and then we get to assume that for a given n,
F (n) = G(n), and we need to show F (n + 1) = G(n + 1). Here’s the point: given
(5) F (n) = G(n),
the desired conclusion G(n + 1) = F (n + 1) is equivalent to
(6) F (n + 1) − F (n) = G(n + 1) − G(n).
Indeed, if (5) and (6) both hold, then adding them together, we get
F (n + 1) = F (n + 1) − F (n) + F (n) = G(n + 1) − G(n) + G(n) = G(n + 1),
and similarly, if we know F (n + 1) = G(n + 1), then subtracting (5), we get (6). So
our application of induction gives that it is necessary and sufficient to show that
F (n + 1) − F (n) = G(n + 1) − G(n). Note however that
F (n + 1) − F (n) = (f (1) + . . . + f (n) + f (n + 1)) − (f (1) + . . . + f (n)) = f (n + 1),
and similarly G(n + 1) − G(n) = g(n + 1). We need to show this for all n, i.e., we
need to know that f (n) = g(n) for all n ≥ 2. Since we also needed this for n = 1,
we see – perhaps somewhat sheepishly – that what we have shown is the following.
Proposition 1. Let f, g : Z+ → R. The following are equivalent:
(i) For all n ∈ Z+ , f (n) = g(n).
(ii) For all n ∈ Z+ , f (1) + . . . + f (n) = g(1) + . . . + g(n).
This is not earth-shattering, but the following minor variation is somewhat inter-
esting. Namely, for any function f , define a new function ∆f , by
(∆f )(n) = f (n + 1) − f (n).
The point here is that
\[ (\Delta f)(1) + \ldots + (\Delta f)(n) = (f(2) - f(1)) + (f(3) - f(2)) + \ldots + (f(n) - f(n-1)) + (f(n+1) - f(n)) = f(n+1) - f(1). \]
Since Proposition 1 holds for all functions f and g, in particular it holds for ∆f
and ∆g, and we get:

Theorem 3. Let $f, g: \mathbb{Z}^+ \to \mathbb{R}$ be such that $f(1) = g(1)$ and $(\Delta f)(n) = (\Delta g)(n)$ for all $n \in \mathbb{Z}^+$. Then $f(n) = g(n)$ for all $n \in \mathbb{Z}^+$.

The analogue of Theorem 3 in conventional calculus is the following result.

Theorem 4. Let $f, g: \mathbb{R} \to \mathbb{R}$ be differentiable functions such that $f(1) = g(1)$ and $f'(x) = g'(x)$ for all $x \in \mathbb{R}$. Then $f(x) = g(x)$ for all $x \in \mathbb{R}$.
Proof. We define the new function h(x) = f(x) − g(x). Our hypotheses are that
$h'(x) = (f(x) - g(x))' = f'(x) - g'(x) = 0$ for all x and that $h(1) = f(1) - g(1) = 0$,
and we want to show that $h(x) = 0$ for all x. So suppose not, i.e., there exists some
$x_0$ with $h(x_0) \neq 0$. Certainly $x_0 \neq 1$, and we may assume without loss of generality
that $x_0 > 1$. Now we apply the Mean Value Theorem to h(x) on the interval $[1, x_0]$:
there exists a real number c, $1 < c < x_0$, such that
\[ h'(c) = \frac{h(x_0) - h(1)}{x_0 - 1} = \frac{h(x_0)}{x_0 - 1}. \]
Thus $h'(c) \neq 0$, contradicting the hypothesis that $h'(x) = 0$ for all $x \in \mathbb{R}$. □
If we think instead of two functions x(t) and y(t) giving the positions of two moving
particles at time t, Theorem 4 states the following physically plausible result: two
moving bodies with identical instantaneous velocity functions and which have the
same position at time t = 1 will have the same position for all times t. Theorem 3
applies to x(t) and y(t) as follows: we look only at positive integer values of t,
and we can interpret (∆x)(n) as the average velocity of x(t) between times n
and n + 1. The result then says that if x(t) and y(t) start out at the same position
at time t = 1 and their average velocities on each interval [n, n + 1] agree, then x
and y have the same positions at all positive integer times.
Note that we needed a deep theorem from calculus to show Theorem 4, but for
the analogous Theorem 3 we only needed mathematical induction.
Again, there is a simple answer which is perfectly good and probably indeed should
be the first answer given. Namely, we should clarify our task: we were not claiming
to be able to find – and still less, asking the student to find! – the right hand side
of these identities. It is important to understand that induction is never used to
discover a result; it is only used to prove a result that one already either suspects
to be true or has been asked to show. In other words, the simple answer to the
question “How do we figure out what goes on the right hand side?” is: we don’t.
It is given to us as part of the problem.
But this is a very disappointing answer. I feel that the disappointment this answer
engenders in students is of pedagogical significance, so forgive me while I digress on
this point (or skip ahead, of course). In other words, it often happens in university
level math classes that we present certain techniques and advertise them as giving
solutions to certain problems, but we often do not discuss the limitations of these
techniques, or more positively, try to identify the range of problems to which the
techniques can be successfully applied. For instance, after learning several inte-
gration techniques, many calculus students become anxious when they realize that
they may not know which technique or combination of techniques to apply to a
given problem. They often ask for simple rules like, “Can you tell us when to use
integration by parts?” I at least have found it tempting as an instructor to brush
off such questions, or answer them by saying that much of the point is for them to
gain enough experience with the various techniques so as to be able to figure out
(or guess) which techniques will work on a given problem. But calculus instructors
know something that the students don't: many functions, like $e^{x^2}$, simply do not
have elementary antiderivatives. It would be a terrible disservice not to point this
out to the students, as well as not to clue them into the truth: we carefully select
the integration problems we give the students so that (i) elementary antiderivatives
exist and (ii) they can indeed be found using the set of tools we have taught them.
There are real dangers that such practices will dampen or kill off students’ math-
ematical curiosity. Most students initially think they are being asked to solve a
robust class of problems – and thus, they think that they should know how to
solve these problems, and are disturbed that they don’t – but eventually they
learn that less knowledge than they thought is actually needed to solve the ex-
ercise. This is intellectually deadening. What use is it to know how to prove
$1^2 + \ldots + n^2 = \frac{n(n+1)(2n+1)}{6}$ without knowing how to figure out what should go
on the right hand side? The answer is that there is no inherent use that I can
see (other than being able to compute $\int_a^b x^2\,dx$ using Riemann sums); it is just an
opportunity to demonstrate a mastery of a very narrow skill, which we identified
in the last section as being able to verify that $g(n+1) - g(n) = f(n+1)$.
Of course, sometimes there are necessary reasons for not answering the natural
questions: the answer may be very complicated! For instance, although I tell my
calculus students that the reason they cannot integrate $e^{x^2}$ is that it is provably
impossible, I do not give any indication why this is true: such arguments are well
beyond the scope of the course.
But such is not the case for $1^2 + \ldots + n^2$, and we now present a simple method to
derive formulas for the power sums
\[ S_d(n) = 1^d + \ldots + n^d. \]
We begin with the sum
\[ S = \sum_{i=1}^{n} \left( (i+1)^{d+1} - i^{d+1} \right), \]
which we evaluate in two different ways. First, writing out the terms gives
\[ S = 2^{d+1} - 1^{d+1} + 3^{d+1} - 2^{d+1} + \ldots + n^{d+1} - (n-1)^{d+1} + (n+1)^{d+1} - n^{d+1} = (n+1)^{d+1} - 1. \]
Second, by first expanding out the binomial $(i+1)^{d+1}$ we get
\[ S = \sum_{i=1}^{n} \left( (i+1)^{d+1} - i^{d+1} \right) = \sum_{i=1}^{n} \left( i^{d+1} + \binom{d+1}{1} i^d + \ldots + \binom{d+1}{d} i + 1 - i^{d+1} \right) \]
\[ = \binom{d+1}{1} \sum_{i=1}^{n} i^d + \ldots + \binom{d+1}{d} \sum_{i=1}^{n} i + \sum_{i=1}^{n} 1 = \sum_{j=0}^{d} \binom{d+1}{d+1-j} S_j(n) = \sum_{j=0}^{d} \binom{d+1}{j} S_j(n). \]
Equating our two expressions for S, we get
\[ (n+1)^{d+1} - 1 = \sum_{j=0}^{d} \binom{d+1}{j} S_j(n). \]
We leave it as an exercise for the reader to derive (3) using this method.
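The recursion is also easy to carry out by machine. Here is a sketch in Python using the third-party sympy library (the function name power_sum is ours):

    import sympy as sp

    n = sp.symbols('n')

    def power_sum(d):
        """Return S_d(n) = 1^d + ... + n^d as a polynomial in n, by solving
        (n+1)^{d+1} - 1 = sum_{j=0}^{d} binom(d+1, j) S_j(n) for S_d(n)."""
        rhs = sp.expand((n + 1)**(d + 1) - 1)
        for j in range(d):
            rhs -= sp.binomial(d + 1, j) * power_sum(j)
        # The remaining coefficient of S_d(n) is binom(d+1, d) = d + 1.
        return sp.expand(rhs / (d + 1))

    for d in range(4):
        print(d, sp.factor(power_sum(d)))

Running this prints $S_0(n) = n$, $S_1(n) = \frac{n(n+1)}{2}$, $S_2(n) = \frac{n(n+1)(2n+1)}{6}$, and $S_3(n) = \frac{n^2(n+1)^2}{4}$.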
Again though, let us not neglect the natural question: the method presented gives
only a recursive procedure for evaluating the power sums $S_d(n)$. Can we find a closed
form expression for $S_d(n)$ in general? The answer is yes, and we will derive it as
an application of the discrete calculus, a topic to which we turn next.
3. Discrete calculus
Let f : Z+ → R be any function. We define the new function ∆f as follows:
(∆f )(n) = f (n + 1) − f (n).
We view ∆ as being an operator on the set V = {f : Z+ → R} of all functions from
Z+ to R. Specifically, it is called the forward difference operator.
(D5) $\frac{d}{dx}\left(\frac{f}{g}\right) = \frac{g \frac{df}{dx} - f \frac{dg}{dx}}{g^2}$.
(D6) $\frac{d x^n}{dx} = n x^{n-1}$.
Similarly, ∇ satisfies (D2) and (D3), but not quite (D1): if f (n) = C for all
n, then for all n ≥ 2 we have (∇f )(n) = f (n) − f (n − 1) = C − C = 0, but
(∇f )(1) = f (1) − f (0) = C − 0 = C.
On the other hand, if you try to verify the direct analogue of (D4) – namely
∆(f g) = ∆(f )g +f ∆(g) – you soon see that it does not work out: the left hand side
has two terms, the right hand side has four terms, and no cancellation is possible.
Another way to see that this formula cannot be correct is as follows: if x denotes
the function n 7→ n, then as with the usual derivative we have
∆(x)(n) = n + 1 − n = 1;
i.e., the discrete derivative of the identity function x is the constant function 1. But
from this and the product rule (D4), the power rule (D6) follows by mathematical
induction. However, let us calculate $\Delta(x^2)$:
\[ \Delta(x^2)(n) = (n+1)^2 - n^2 = n^2 + 2n + 1 - n^2 = 2n + 1, \]
whereas
((∆x)x + x∆x)(n) = n + n = 2n.
Something slightly different does hold:
(∆f g)(n) = (∆f )(n)g(n) + f (n + 1)(∆g)(n).
This formula looks a bit strange: the left hand side is symmetric in f and g, whereas
the right hand side is not. Thus there is another form of the product rule:
(∆f g)(n) = ∆(gf )(n) = f (n)(∆g)(n) + (∆f )(n)g(n + 1).
A more pleasant looking form of the product rule is
(8) (∆f g) = f ∆g + (∆f )g + ∆f ∆g.
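The skeptical reader may wish to test (8) numerically; here is a quick sketch in Python (the helper delta and the test functions are ours), with f(n) = n and g(n) = 2^n:

    def delta(f):
        """The forward difference operator: (delta f)(n) = f(n+1) - f(n)."""
        return lambda n: f(n + 1) - f(n)

    f = lambda n: n
    g = lambda n: 2**n

    lhs = delta(lambda n: f(n) * g(n))
    rhs = lambda n: (f(n) * delta(g)(n) + delta(f)(n) * g(n)
                     + delta(f)(n) * delta(g)(n))
    assert all(lhs(n) == rhs(n) for n in range(1, 25))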
Formulation (8) makes clear the relationship with the usual product rule for $\frac{d}{dx}$:
if $f: \mathbb{R} \to \mathbb{R}$ is a differentiable function, then defining $\Delta f$ to be $f(x+h) - f(x)$,
one checks that (8) remains valid. Now dividing both sides by $\Delta x = h$, we get
\[ \frac{\Delta(fg)}{h} = f \frac{\Delta g}{\Delta x} + \frac{\Delta f}{\Delta x} g + \frac{\Delta f}{\Delta x} \Delta g. \]
As $h \to 0$, $\frac{\Delta f}{\Delta x} \to \frac{df}{dx}$, $\frac{\Delta g}{\Delta x} \to \frac{dg}{dx}$, and the last term approaches $\frac{df}{dx} \cdot 0 = 0$. So the
product rule for $\frac{d}{dx}$ is a simplification of the discrete product rule, an approximation
which becomes valid in the limit as $h \to 0$. Thinking back to calculus, it becomes
clear that many identities in calculus are simplifications of corresponding discrete
identities.
Moral: the conventional calculus is more analytically complex than discrete calcu-
lus: in the former, one must deal rigorously and correctly with limiting processes,
whereas in the latter no such processes exist. Conversely, discrete calculus can be
more algebraically complex than conventional calculus. More on this later.
In the usual calculus, one studies the inverse process to differentiation, namely
antidifferentiation. The plausible candidate for the discrete antiderivative is just
the summation operator Σ : f → Σf defined as
(Σf )(n) = f (1) + . . . + f (n).
Let us now calculate the composite operators $\Delta \circ \Sigma$ and $\Sigma \circ \Delta$ applied to an arbitrary
discrete function f:
\[ (\Delta \circ \Sigma)(f)(n) = \Delta(n \mapsto f(1) + \ldots + f(n)) = (f(1) + \ldots + f(n+1)) - (f(1) + \ldots + f(n)) = f(n+1). \]
Similarly,
\[ (\Sigma \circ \Delta)(f)(n) = \Sigma(n \mapsto f(n+1) - f(n)) = (f(2) - f(1)) + \ldots + (f(n+1) - f(n)) = f(n+1) - f(1). \]
So Σ and ∆ are very close to being inverse operators, but there is something slightly
off with the indexing. Now the ∇ operator proves its worth: we have
\[ (\Sigma \circ \nabla)(f)(n) = \Sigma(n \mapsto f(n) - f(n-1)) = (f(1) - f(0)) + \ldots + (f(n) - f(n-1)) = f(n) - f(0) = f(n) \]
and
\[ (\nabla \circ \Sigma)(f)(n) = \nabla(n \mapsto f(1) + \ldots + f(n)) = (f(1) + \ldots + f(n)) - (f(1) + \ldots + f(n-1)) = f(n). \]
So indeed ∇ and Σ are inverse operators. This is even better than in the usual
calculus, where the antiderivative is only well-determined up to the addition of a
constant. In other words:
Theorem 7. (Fundamental Theorem of Discrete Calculus, v. 1) For functions
$f, g: \mathbb{Z}^+ \to \mathbb{R}$, the following are equivalent:
(i) For all n ∈ Z+ , f (n) = g(n) − g(n − 1) (f = ∇g).
(ii) For all n ∈ Z+ , g(n) = f (1) + . . . + f (n) (Σf = g).
In other words, if we want to find a closed form expression for $\sum_{i=1}^{n} f(i)$, it suffices
to find a function g such that $\nabla g = f$.
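For concreteness, here is a sketch of the three operators in Python (the names delta, nabla, and sigma are ours; as above, we use the convention f(0) = 0 for ∇), together with a numerical check of Theorem 7:

    def delta(f):                  # forward difference
        return lambda n: f(n + 1) - f(n)

    def nabla(f):                  # backward difference, with the convention f(0) = 0
        return lambda n: f(n) - (f(n - 1) if n > 1 else 0)

    def sigma(f):                  # summation operator
        return lambda n: sum(f(k) for k in range(1, n + 1))

    f = lambda n: n**2

    # Theorem 7: nabla and sigma are mutually inverse operators.
    assert all(sigma(nabla(f))(n) == f(n) == nabla(sigma(f))(n)
               for n in range(1, 30))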
1Stopping the sum at f (b − 1) rather than f (b) is the correct normalization for ∆, as we will
shortly see.
First, put $F(n) = f(1) + \ldots + f(n-1)$ (with $F(1) = 0$),
so that F is a function whose discrete derivative – this time ∆ and not ∇ – is equal
to f. Second, for $1 \leq a \leq b$, we have
\[ F(b) - F(a) = f(a) + f(a+1) + \ldots + f(b-1) = \Sigma_a^b f. \]
Remark: Of course in the traditional calculus the definite integral $\int_a^b f(x)\,dx$ has an
area interpretation. This can be given to the discrete definite integral $\Sigma_a^b f$ as well.
Namely, for a discrete function $f: \mathbb{Z}^+ \to \mathbb{R}$, extend it to a function $f: [1, \infty) \to \mathbb{R}$
by $f(x) = f(\lfloor x \rfloor)$. Thus f is the unique step function whose restriction to $\mathbb{Z}^+$ is f
and which is right-continuous at each integer value. Then one has
\[ \int_a^b f\,dx = \Sigma_a^b f, \]
i.e., the area under the step function f on $[1, n]$ is precisely $f(1) + \ldots + f(n-1)$.
Remark: We hope the reader has by now appreciated that the distinction be-
tween a discrete calculus based on ∆ versus one based on ∇ is very minor: in the
former case, we define the antiderivative to be $f \mapsto (n \mapsto f(1) + \ldots + f(n-1))$
(and similarly for the definite integral $\Sigma_a^b f$), and in the latter case we define the
antiderivative to be $f \mapsto (n \mapsto f(1) + \ldots + f(n))$. So we could make do by choosing
either one once and for all as the discrete derivative; on the other hand, there is no
compelling need to do so.
Summing the product rule $(\Delta(fg))(k) = (\Delta f)(k) g(k) + f(k+1)(\Delta g)(k)$ from k = 1 to n yields
\[ f(n+1)g(n+1) - f(1)g(1) = \sum_{k=1}^{n} (\Delta f)(k) g(k) + \sum_{k=1}^{n} f(k+1)(\Delta g)(k), \]
or
\[ \sum_{k=1}^{n} (\Delta f)(k) g(k) = f(n+1)g(n+1) - f(1)g(1) - \sum_{k=1}^{n} f(k+1)(\Delta g)(k). \]
Solution: If instead we were asked to find the antiderivative of $x e^x$, we would use
integration by parts, writing $h(x)\,dx = g(x)\,df(x)$, with $g(x) = x$, $df(x) = e^x\,dx$. Then
\[ \int x e^x\,dx = x e^x - \int e^x\,dx = x e^x - e^x + C = (x-1)e^x + C. \]
Let's try the discrete analogue: put $(\Delta f)(n) = 2^n$, $g(n) = n$, so that $h(n) =
(\Delta f)(n) g(n) = n \cdot 2^n$. Then $f(n) = 2^n$, $\Delta g = 1$, so
\[ H(n) = \sum_{k=1}^{n} (\Delta f)(k) g(k) = (n+1)2^{n+1} - 2 - \sum_{k=1}^{n} 2^{k+1} = (n+1)2^{n+1} - 2 - (2^{n+2} - 4) \]
\[ = 2^{n+1}(n+1-2) + 2 = (n-1)2^{n+1} + 2. \]
Thus the result is closely analogous but not precisely what one might guess: note
the $2^{n+1}$ in place of $e^x$.
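A quick numerical check of this closed form (plain Python; the tested range is of course arbitrary):

    # H(n) = sum_{k=1}^{n} k 2^k should equal (n-1) 2^{n+1} + 2.
    assert all(sum(k * 2**k for k in range(1, n + 1)) == (n - 1) * 2**(n + 1) + 2
               for n in range(1, 20))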
3.5. Some discrete differential equations.
Example: For $\alpha \in \mathbb{R}$, let us find all discrete functions f with $\Delta f = \alpha f$. For
all $n \geq 1$, we have $f(n+1) - f(n) = \alpha f(n)$, or $f(n+1) = (\alpha + 1) f(n)$. The general
solution is then $f(1) = C$ (arbitrary) and $f(n) = (\alpha+1)^{n-1} C$. Here we are using
the convention that $0^0 = 1$, so that the general solution to $\Delta f = -f$ is given by
$C \delta_1$, where $\delta_1(1) = 1$, $\delta_1(n) = 0$ for all $n > 1$.
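A quick numerical check of the general solution (plain Python; the particular choices α = 2, C = 5 are arbitrary):

    alpha, C = 2, 5
    f = lambda n: C * (alpha + 1)**(n - 1)

    # f solves the discrete differential equation (delta f)(n) = alpha * f(n).
    assert all(f(n + 1) - f(n) == alpha * f(n) for n in range(1, 20))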
Moreover, the operators ∆, ∇ and Σ are all linear operators on the vector space V ,
and ∇ and Σ are mutually inverse.
Again, that’s nice, but to compute things we would rather have a finite dimen-
sional vector space. In fact we can define an infinite family of finite dimensional
subspaces Pd ⊂ V , as follows.
2Moreover, this set is not a basis: its span is the set of all functions which are zero for all
sufficiently large n. But this is not a key point for us.
nonzero. Direct calculation shows that this coefficient is $(d+1)a_d - a_d = d \cdot a_d$, which
is nonzero since we assumed that $d > 0$ and $a_d \neq 0$.
An immediate consequence of Proposition 10 is
∆(Pd ) ⊆ Pd−1 ⊆ Pd .
In particular, ∆ is a linear operator on the finite-dimensional vector space Pd .
Moreover its kernel is the set of constant functions. Indeed, if for any discrete
function f we have f (n + 1) − f (n) = 0 for all n, then by induction we have
f (n) = f (1) for all n. (This is the discrete analogue of the fact that a function with
identically zero derivative must be constant.) Therefore the kernel of ∆ on Pd is
one-dimensional. Recall the following fundamental fact of linear algebra: for any
linear operator L on a finite-dimensional vector space W, we have
\[ \dim(\ker(L)) + \dim(L(W)) = \dim W. \]
Therefore we find that dim(∆(Pd )) = d. On the other hand, we know from the
proposition that the image ∆(Pd ) is contained in the d-dimensional subspace Pd−1 .
Therefore we must have equality:
Theorem 11. We have ∆(Pd ) = Pd−1 .
What is the significance of this? Applied to the function $x^{d-1} \in P_{d-1}$, we get that
there exists a degree d polynomial $P_d(x)$ such that for all $n \in \mathbb{Z}^+$,
\[ \Delta(P_d)(n) = P_d(n+1) - P_d(n) = n^{d-1}. \]
Since $\Delta(P_d)(n) = \nabla(P_d)(n+1)$, we have also
\[ \nabla(P_d)(n+1) = n^{d-1}, \]
and applying the summation operator Σ to both sides we get that for all $n \in \mathbb{Z}^+$,
\[ P_d(n+1) = 1^{d-1} + \ldots + n^{d-1}. \]
Thus we've shown that there must be a nice closed form expression for the sum of
the first n dth powers: indeed, it must be a polynomial of degree d + 1.
But first we should address the following concern: given that it is ∇ and not
∆ which is the inverse to Σ on V , why did we work first with ∆ and only at the
end “shift variables” to get back to ∇?
The answer is that the linear map ∇ does not map the space of polynomial func-
tions to itself. It’s close, but remember that ∇ of the nonzero constant function C
is not zero: rather it is the function which is C at n = 1 and 0 for all larger values
of n, but this is not a polynomial function of any degree. (Recall that a degree d
polynomial function can be zero for at most d distinct real numbers.) So ∆ it is.
Note that the usual derivative operator $\frac{d}{dx}$ also carries $P_d$ linearly onto $P_{d-1}$, so
gives a linear operator on $P_d$. In this classical case the corresponding matrix is
especially simple: since $\frac{d}{dx} x^k = k x^{k-1}$, with respect to the basis $1, x, \ldots, x^d$ it has
the entries $1, 2, \ldots, d$ on the superdiagonal and zeros elsewhere.
The linear algebra of ∆ is more complicated (and more interesting!). E.g., since
\[ \Delta(1) = 0, \qquad \Delta(x) = 1, \qquad \Delta(x^2): n \mapsto (n+1)^2 - n^2 = 2n + 1, \qquad \ldots, \]
the matrix M of ∆ with respect to the basis $1, x, \ldots, x^d$ is upper triangular, with
the column corresponding to $x^k$ recording the coefficients of the binomial expansion
of $(n+1)^k - n^k$.
To find the discrete antiderivative of the function $x^{d-1}$, then, it suffices to solve the
matrix equation $M [a_d, a_{d-1}, \ldots, a_0]^t = [0, \ldots, 0, 1, 0]^t$ for $a_0, \ldots, a_d$.
Moreover, because the matrix is in upper triangular form, we can easily solve the
linear system by back substitution. For example, when d = 4 we get:
\[ a_1 + a_2 + a_3 + a_4 = 0, \]
\[ 2a_2 + 3a_3 + 4a_4 = 0, \]
\[ 3a_3 + 6a_4 = 0, \]
\[ 4a_4 = 1, \]
\[ 0 = 0. \]
(The last equation reads 0 = 0.) So we have $a_4 = \frac{1}{4}$, and then
\[ a_3 = \frac{1}{3}(-6a_4) = \frac{-1}{2}, \qquad a_2 = \frac{1}{2}(-3a_3 - 4a_4) = \frac{1}{4}, \qquad a_1 = -a_2 - a_3 - a_4 = 0. \]
Note that the constant term a0 is undetermined, as it should be. It follows from
the above analysis that it doesn’t matter what constant term we take, so we may
as well take $a_0 = 0$. Thus
\[ P_4(x) = \frac{1}{4} x^4 - \frac{1}{2} x^3 + \frac{1}{4} x^2, \]
and we easily calculate
\[ P_4(n+1) = \frac{n^4}{4} + \frac{n^3}{2} + \frac{n^2}{4} = \left( \frac{n(n+1)}{2} \right)^2. \]
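Both calculations are easily confirmed by machine; here is a sketch in Python using the third-party sympy library (names ours):

    import sympy as sp

    x, n = sp.symbols('x n')
    P4 = x**4/4 - x**3/2 + x**2/4

    # The discrete derivative of P4 is x^3 ...
    assert sp.expand(P4.subs(x, x + 1) - P4) == x**3

    # ... so P4(n+1) = 1^3 + ... + n^3 = (n(n+1)/2)^2.
    assert sp.expand(P4.subs(x, n + 1) - (n*(n + 1)/2)**2) == 0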
As we have seen, the basis $1, x, \ldots, x^d$ is a very nice one for the linear operator
$\frac{d}{dx}$ on the space $P_d$: the matrix representation has $1, 2, \ldots, d$ on the superdiagonal
and zeros elsewhere.

There is in fact an even better basis for $\frac{d}{dx}$ on $P_d$: namely $1, x, \frac{x^2}{2!}, \ldots, \frac{x^d}{d!}$, be-
cause with respect to this basis the matrix is a shift operator: each basis vector is
mapped to the preceding one, and 1 is mapped to 0.
An important lesson in linear algebra is to find the best basis for the problem
at hand. More specifically, given a linear operator T on a finite-dimensional vector
space V over a field k, then under the assumption that all of the eigenvalues of
T are elements of k, there exists a Jordan basis for V with respect to T, i.e., a
basis in which V decomposes as a direct sum of T-stable subspaces $W_i$ such that
T restricted to each $W_i$ is the sum of a shift operator and a scalar.
Corollary 13. a) $\sum_{x=1}^{n-1} p_0(x) = p_1(n) - p_1(1) = p_1(n) - 1$.
b) For all $k \geq 1$, $\sum_{x=1}^{n-1} p_k(x) = p_{k+1}(n) - p_{k+1}(1) = p_{k+1}(n)$.
To stress the analogy between ∆ and $\frac{d}{dx}$, it is common to define the falling powers
\[ x^{\underline{k}} = k!\, p_k(x) = x(x-1)\cdots(x-k+1). \]
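In this notation the discrete power rule takes the memorable form $\Delta x^{\underline{k}} = k\, x^{\underline{k-1}}$, the perfect analogue of (D6). A quick numerical check (plain Python; the helper falling is ours):

    def falling(x, k):
        """The falling power x(x-1)...(x-k+1)."""
        out = 1
        for i in range(k):
            out *= x - i
        return out

    # Discrete power rule: delta of the k-th falling power is k times the (k-1)-st.
    assert all(falling(n + 1, k) - falling(n, k) == k * falling(n, k - 1)
               for n in range(1, 12) for k in range(1, 7))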
So we have found what linear algebra tells us is the optimal basis for Pd (and, in
fact, for the infinite dimensional vector space P). What benefits do we reap?
5.2. The discrete Taylor series of a polynomial function.
Another aspect of the philosophy of the optimal basis {b1 , . . . , bn } for a vector space
V is that, upon expressing v as a linear combination of the basis vectors:
v = a1 b1 + . . . + an bn , ai ∈ R,
we expect that the coefficients ai will be natural functions of v.
Example: Suppose that $V = P_d$ and we choose the basis $b_0 = 1$, $b_i = \frac{x^i}{i!}$. We
know there are unique real numbers $a_0, \ldots, a_d$ such that
\[ P(x) = a_0 + a_1 x + a_2 \frac{x^2}{2} + a_3 \frac{x^3}{3!} + \ldots + a_d \frac{x^d}{d!}. \]
But this is just the Taylor series for P(x). Explicitly, repeated differentiation and
evaluation at 0 gives $a_k = \frac{d^k P}{dx^k}(0)$ for all $0 \leq k \leq d$.
Now we keep $V = P_d$ but consider instead the natural basis $p_0(x), \ldots, p_d(x)$ for the
discrete derivative ∆. We get that for any $P \in P_d$ there are unique real numbers $a_0, \ldots, a_d$
such that
\[ P(x) = a_0 + a_1 x + a_2 \frac{x(x-1)}{2} + \ldots + a_d \frac{x(x-1)\cdots(x-d+1)}{d!}. \]
Again, evaluating at x = 0 we find
a0 = P (0).
Taking ∆ of both sides gives
\[ (\Delta P)(x) = a_1 + a_2 p_1(x) + \ldots + a_d p_{d-1}(x), \]
and evaluating at 0 gives $a_1 = (\Delta P)(0)$. Continuing on in this way, we find that
for all $0 \leq k \leq d$, $a_k = (\Delta^k P)(0)$. So we have shown
Theorem 14. For any $P(x) \in P_d$, we have
\[ P(x) = \sum_{k=0}^{d} (\Delta^k P)(0)\, \frac{x(x-1)\cdots(x-k+1)}{k!} = \sum_{k=0}^{d} \left( \sum_{j=0}^{k} (-1)^j \binom{k}{j} P(k-j) \right) p_k(x). \]
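Theorem 14 is easy to test numerically; here is a sketch in Python for P(x) = x³ (names ours; we use the fact that $p_k(x) = \binom{x}{k}$ at nonnegative integers):

    from math import comb

    P = lambda x: x**3
    d = 3

    # Discrete Taylor coefficients a_k = (Delta^k P)(0), via the inner sum above.
    a = [sum((-1)**j * comb(k, j) * P(k - j) for j in range(k + 1))
         for k in range(d + 1)]

    # Theorem 14: P(x) = sum_k a_k p_k(x), with p_k(x) = comb(x, k) for integers x >= 0.
    assert all(P(x) == sum(a[k] * comb(x, k) for k in range(d + 1))
               for x in range(0, 15))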
Taking $\sum_{x=1}^{n-1}$ of both sides and applying Corollary 13, we get at last a closed form
expression for an arbitrary power sum (for $d \geq 1$ the $k = 0$ term contributes nothing,
since $P(0) = 0^d = 0$):
\[ (11) \qquad 1^d + \ldots + (n-1)^d = \sum_{k=1}^{d} \left( \sum_{j=0}^{k} (-1)^j \binom{k}{j} (k-j)^d \right) \frac{n(n-1)\cdots(n-k)}{(k+1)!}. \]
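Finally, formula (11) itself can be checked in exact arithmetic (a sketch in plain Python; the function name is ours):

    from fractions import Fraction
    from math import comb, factorial

    def power_sum_closed(d, n):
        """The right hand side of (11): a closed form for 1^d + ... + (n-1)^d."""
        total = Fraction(0)
        for k in range(1, d + 1):
            inner = sum((-1)**j * comb(k, j) * (k - j)**d for j in range(k + 1))
            falling = 1
            for i in range(k + 1):        # the k+1 factors n(n-1)...(n-k)
                falling *= n - i
            total += Fraction(inner * falling, factorial(k + 1))
        return total

    assert all(power_sum_closed(d, n) == sum(i**d for i in range(1, n))
               for d in range(1, 6) for n in range(2, 12))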