
Spectral Derivatives

Pavel Komarov
January 8, 2025

One of the happiest accidents in all math is the ease of transforming a function to and taking derivatives in the
Fourier (i.e. the frequency) domain. But in order to exploit this extraordinary fact without serious artefacting, and in
order to be able to use a computer, we need quite a bit of extra knowledge and care.
This document sets out the math behind the spectral-derivatives Python package, all the way down to the

bones, as much as I can manage. I try to get into the real whys behind what we're doing here, touching on fundamental
signal processing and calculus concepts as necessary, and building upwards to more general cases.

Contents

1 Bases
  1.1 The Fourier Basis
2 Transforms
  2.1 The Fourier Transform
  2.2 A Whole Family
3 Taking Derivatives in the Fourier Domain
  3.1 Taking Derivatives in the Discrete Case
    3.1.1 The DFT Pair
    3.1.2 Interpolation
    3.1.3 Taking Derivatives of the Interpolant
  3.2 Limitations
4 The Chebyshev Basis
  4.1 The Advantage of Chebyshev
5 Chebyshev Derivatives by Reduction to the Fourier Case
  5.1 The Discrete Cosine Transform
  5.2 Even and Odd Derivatives and the Discrete Sine Transform
  5.3 Transforming Back to the Chebyshev Domain
    5.3.1 Higher Derivatives and Splintering Terms
  5.4 Handling Domain Endpoints
    5.4.1 Endpoints for Higher Derivatives
6 Chebyshev Derivatives via Series Recurrence
  6.1 Chebyshev Series Derivative Rule
7 Multidimensionality
  7.1 Dimensions Together versus In Series
8 Arbitrary Domains
  8.1 Fourier on [a, b)
  8.2 Chebyshev on [a, b]
  8.3 Accounting for Smoosh
9 Differentiating in the Presence of Noise
  9.1 White Noise
  9.2 Filtering with the Fourier Basis
  9.3 The Advantage of Spectral Smoothing
  9.4 Measurement versus Process Uncertainty
  9.5 Filtering with Polynomial Bases (is Terrible)

1 Bases
A basis is a set of functions, call them {ξk }, that can be linearly combined to produce other functions. Often these
are chosen to be orthogonal, meaning that if we take the “inner product” of one function from the set with itself, we
get back a constant (often normalized to be 1), and if we take the inner product of one of these functions with a
different member of the set, we get back 0. In this sense the members of an orthogonal basis set are like perpendicular
directions on a graph.
The inner product between two functions f and g is a generalization of the inner product between vectors, where
instead of summing over a finite number of discrete entries, we integrate over infinitely many infinitesimally-separated
points in the domain. We define it as:

$$\langle f, g\rangle = \int_a^b \overline{f(x)}\,g(x)\,dx$$

where the overbar $\bar\circ$ denotes a complex conjugate.


The inner product is conjugate-symmetric, so

$$\langle f, g\rangle = \overline{\langle g, f\rangle} = \int_a^b \overline{f(x)}\,g(x)\,dx$$

Note that if we set a and b at ±∞, this integral could diverge. If it doesn’t diverge with infinite bounds, we say the
argument is “Lebesgue integrable”[1]. Some of what we’ll do only makes sense for this class of functions, so be aware.

1.1 The Fourier Basis


The most famous basis is the Fourier basis, which is the set of complex exponentials:

$$e^{j\omega} = \cos(\omega) + j\sin(\omega)\qquad(1)$$

where I use j to represent the imaginary unit ($\sqrt{-1}$), because I'm from Electrical Engineering, and because Python
uses j.
Why this identity is true isn’t obvious at first but can be seen by Taylor Expanding[3] the exponential function
and trigonometric functions:

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots = \sum_{n=0}^{\infty}\frac{x^n}{n!}$$

So

$$e^{j\omega} = 1 + j\omega + \frac{(j\omega)^2}{2!} + \frac{(j\omega)^3}{3!} + \dots = 1 + j\omega - \frac{\omega^2}{2!} - j\frac{\omega^3}{3!} + \frac{\omega^4}{4!} - \dots$$

$$\sin(\omega) = \omega - \frac{\omega^3}{3!} + \frac{\omega^5}{5!} - \frac{\omega^7}{7!} + \dots$$

$$\cos(\omega) = 1 - \frac{\omega^2}{2!} + \frac{\omega^4}{4!} - \frac{\omega^6}{6!} + \dots$$

Notice all of the even-power terms appear with alternating sign as in the cosine expansion, and the odd-power
terms appear with alternating sign as in the sine expansion, but with an extra j multiplied in.
The presence of complex numbers to make this work can be confusing at first, but don’t be scared! All we’re really
doing is using a compressed representation of a sine plus a cosine, where the real and imaginary parts (orthogonal in
the complex plane, and therefore independent and non-interfering) allow us to describe the contributions of sine and
cosine simultaneously. In fact, Joseph Fourier originally used only real trigonometric functions[4], and it wasn’t until
later someone decided it would be easier to work with complex exponentials. Later (subsection 2.1) we’ll see that for
real signals all the complex numbers cancel, leaving only a real sine and real cosine, which when added together make
a single, phase-shifted sinusoid! So think of ejω as oscillations at a particular frequency, ω.
If we inner product mismatched wiggles, they misalign and integrate to 0, but if we inner product matched wiggles,
they align, multiply to 1 because of the complex conjugate, and integrate to 2π over a period.

2 Transforms
We can reconstruct a function from a linear combination of basis functions:
$$f(x) = \sum_{k=0}^{M-1} c_k\,\xi_k(x),\qquad x\in[a,b]$$

where M is the number of basis functions we’re using in our reconstruction and k iterates through them. This is
essentially a recipe, which tells us how much of each basis function ξk is present in the signal f on a domain between
a and b.
We can find the quantities {ck } by taking the inner product of the above with each of the basis functions to produce
a system of M equations, then solving. In the special case where {ξk } are orthogonal, the system’s equations untangle,
and we get the simple relationship:

$$c_k = \frac{\langle\xi_k, f\rangle}{\langle\xi_k,\xi_k\rangle} = \frac{\int_a^b\overline{\xi_k(x)}\,f(x)\,dx}{\|\xi_k\|_2^2}$$
The set of numbers {ck } is now an alternative representation of the original function. In some sense it’s equally
descriptive, so long as we know which basis we’re using to reconstruct. The function has been transformed, completely
analogous to changing coordinate systems in linear algebra, where we can express vector $\vec f$ in terms of a new orthogonal basis $(\vec\xi_0, \vec\xi_1)$ instead of axis-aligned unit vectors $(\vec e_0, \vec e_1)$ via $\vec f = \frac{\langle\vec\xi_0,\vec f\rangle}{\|\vec\xi_0\|_2^2}\vec\xi_0 + \frac{\langle\vec\xi_1,\vec f\rangle}{\|\vec\xi_1\|_2^2}\vec\xi_1$:

(Figure: $\vec f$ decomposed into $\mathrm{proj}_{\vec\xi_0}\vec f$ and $\mathrm{proj}_{\vec\xi_1}\vec f$ along the orthogonal basis vectors $\vec\xi_0, \vec\xi_1$, shown against the axis-aligned $\vec e_0, \vec e_1$.)

The {ck } are often said to live in another “domain”, although we have to be careful with this terminology, because
it technically refers to a “connected” set, not just a collection of M things. To be precise, some authors use “series”
to describe {ck } instead. However, it is possible for members of the basis set to be related through a continuous
parameter which in some sense makes the set dense, even in cases where we only take discrete members of this more
general set to be our basis set for a particular scenario. This is the case for the Fourier basis, where we choose ω ∈ R,
and hence ω really can become a new domain.

2.1 The Fourier Transform


Using Fourier’s original real-sinusoid-based formulation, we can write the reconstruction expression as∗ :

$$f(x) = a_0 + \sum_{k=1}^{\infty}\big(a_k\cos(k\omega_0 x) + b_k\sin(k\omega_0 x)\big)$$

where
• f is periodic with fundamental frequency ω0 , so the k th frequency becomes k · ω0 .
• ak and bk are coefficients describing how much cosine and sine to add in, respectively.
• k goes up to ∞ because in general we need an infinite number of ever-higher-frequency sinusoids to reconstruct
the function with perfect fidelity.

∗ It’s worth considering how weird it is this works to express arbitrary functions, even non-smooth ones (so long as they meet the Dirichlet

conditions[11], i.e. aren’t pathological cases), a fact so counter-intuitive that Joseph Lagrange publicly declared Fourier was wrong at a
meeting of the Paris Academy in 1807[5] and rejected Fourier’s paper, which then went unpublished until after Lagrange died![11] It’s
valuable to ask why this works[6] and sift through some analysis.[7]

Let's now use $\cos(x) = \frac{e^{jx}+e^{-jx}}{2}$ and $\sin(x) = \frac{e^{jx}-e^{-jx}}{2j}$, which can be verified by manipulating Euler's formula, Equation 1.

$$f(x) = a_0 + \sum_{k=1}^{\infty}\Big(a_k\frac{e^{jk\omega_0x}+e^{-jk\omega_0x}}{2} + b_k\frac{e^{jk\omega_0x}-e^{-jk\omega_0x}}{2j}\Big)$$

$$= a_0 + \sum_{k=-\infty}^{-1}\Big(\frac{a_{-k}}{2} - \frac{b_{-k}}{2j}\Big)e^{jk\omega_0x} + \sum_{k=1}^{\infty}\Big(\frac{a_k}{2} + \frac{b_k}{2j}\Big)e^{jk\omega_0x} = \sum_{k=-\infty}^{\infty}c_ke^{jk\omega_0x}$$

So if we choose $c_0 = a_0$ and $c_k = \overline{c_{-k}} = \frac{a_k}{2} + \frac{b_k}{2j}$, then the complex exponential formulation is exactly equivalent to
the trigonometric formulation[8]. That is, we can choose complex ck such that when multiplied by complex exponentials,
we get back only real signal! Essentially, the relative balance of real and complex in ck affects how much cosine and
sine are present at the k th frequency, thereby accomplishing a phase shift[9]. Without accounting for phase shifts, we
would only be able to model symmetric signals!
If instead of a fundamental frequency $\omega_0 = \frac{2\pi}{T}$, where $T$ is a period of repetition, the signal contains dense
frequencies (because it has no repetition, T → ∞, ω0 → 0), and if we care about a domain of the entire set of R, then
it makes more sense to express the transformed coefficients as a function in ω and to make both our inner product and
reconstruction expression integrals from −∞ to +∞:
$$\hat f(\omega) = \int_{-\infty}^{\infty}f(x)e^{-j\omega x}\,dx = \mathcal F\{f(x)\}$$
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\hat f(\omega)e^{j\omega x}\,d\omega = \mathcal F^{-1}\{\hat f(\omega)\}\qquad(2)$$

where the hat $\hat\circ$ represents a function in the Fourier domain, and the $\frac{1}{2\pi}$ is a scaling factor that corrects for the fact the inner product of a Fourier basis function with itself integrates to $2\pi$ over a period instead of to 1 as we need for orthonormality.
Just like the {ck }, fˆ(ω) can be complex, but if the original f (x) is real, then fˆ’s complexity will perfectly interact
with the complex exponentials to produce only a real function in the reconstruction.

2.2 A Whole Family


Part of what makes Fourier transforms confusing is the proliferation of different variants for different situations, so it’s
worth categorizing them[10]. First off, are we dealing with a periodic signal (which has an ω0 ) or an aperiodic signal
(which doesn’t)? And second, are we dealing with a continuous function or discrete samples?

                    Periodic                              Aperiodic

  Continuous        Fourier Series (FS):  x(t) ↔ ck       Fourier Transform (FT):  x(t) ↔ X(jω)

  Discrete          DFT:  x[n] ↔ X[k]                     DTFT:  x[n] ↔ X(e^jω)
Note that, following a more signal-processing-ish convention[11], the function we’re transforming is now called x,
and the independent variable, since it can no longer be x, is named t. For discrete signals, we use independent variable
n in square brackets.
Here FS stands for “Fourier Series”, which is the first situation covered above. FT stands for “Fourier Transform”,
which is given by the integral pair, Equation 2. But these are not the only possibilities! DTFT stands for “Discrete
Time Fourier Transform”, where the signal we want to analyze is discrete but the transform is continuous. And finally
DFT stands for “Discrete Fourier Transform”, not to be confused with the DTFT, which we use when both the original
and transformed signals are sampled.
All of these can be considered Fourier transforms, but often when people talk about the canonical “Fourier Trans-
form”, they are referring to the continuous, aperiodic case in the upper righthand cell.
The notation of all these different functions and transforms is easy to mix up and made all the more confusing by
the reuse of symbols. But it’s important to keep straight which situation we’re in. I can only apologize. For more on
all these, see [11].

3 Taking Derivatives in the Fourier Domain


Let’s take a Fourier transform of the derivative of a function[12]:
$$\mathcal F\Big\{\frac{d}{dx}f(x)\Big\} = \int_{-\infty}^{\infty}\underbrace{\frac{df}{dx}}_{dv}\underbrace{e^{-j\omega x}}_{u}\,dx = \underbrace{\Big[f(x)e^{-j\omega x}\Big]_{-\infty}^{\infty}}_{\text{0 for Lebesgue-integrable functions}} - \int_{-\infty}^{\infty}f(x)(-j\omega)e^{-j\omega x}\,dx = j\omega\cdot\hat f(\omega)$$
We can use the inverse transform equation to see the same thing:
$$\frac{d}{dx}f(x) = \frac{d}{dx}\frac{1}{2\pi}\int_{-\infty}^{\infty}\hat f(\omega)e^{j\omega x}\,d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty}\hat f(\omega)\frac{d}{dx}e^{j\omega x}\,d\omega = \mathcal F^{-1}\{j\omega\cdot\hat f(\omega)\}$$

So a derivative in the x domain can be accomplished by a multiplication in the frequency domain. We can raise to
higher derivatives simply by multiplying by jω more times.
This is great because taking derivatives in the spatial domain is actually pretty hard, especially if we’re working with
discrete samples of a signal, whereas taking the derivative this way in the frequency domain, the spectral derivative,
gives us much better fidelity.[13, 14] The cost is that we have to do a Fourier transform and inverse Fourier transform
to sandwich the actual differentiation, but there is an O(N log N ) algorithm to accomplish the DFT (subsection 2.2
and Equation 3) for discrete signals called the Cooley-Tukey algorithm, also known as the Fast Fourier Transform
(FFT)[14].

3.1 Taking Derivatives in the Discrete Case


Because we’re going to want to use a computer, and a computer can only operate on discrete representations, we really
need to talk about the DFT and what it means to take a derivative in this discrete paradigm. It has a connection to
the above continuous case but is far more subtle, worth going into at some length.

3.1.1 The DFT Pair


$$\text{DFT:}\quad Y_k = \sum_{n=0}^{M-1}y_n e^{-j\frac{2\pi}{M}nk}$$
$$\text{DFT}^{-1}\text{:}\quad y_n = \frac{1}{M}\sum_{k=0}^{M-1}Y_k e^{j\frac{2\pi}{M}nk}\qquad(3)$$

where
• n iterates samples in the original domain (often spatial)
• k iterates samples in the frequency domain (wavenumbers)
• M is the number of samples in the signal, often given as N by other sources[15], but I’ll use N for something
else later and want to be consistent
• y denotes the signal in its original domain

• Y denotes the signal in the frequency domain

We can express this as the linear inverse problem:

$$\begin{bmatrix}y_0\\y_1\\\vdots\\y_{M-1}\end{bmatrix} = \frac{1}{M}\begin{bmatrix}e^{j\frac{2\pi}{M}0\cdot0} & e^{j\frac{2\pi}{M}0\cdot1} & \cdots & e^{j\frac{2\pi}{M}0(M-1)}\\ e^{j\frac{2\pi}{M}1\cdot0} & e^{j\frac{2\pi}{M}1\cdot1} & \cdots & e^{j\frac{2\pi}{M}1(M-1)}\\ \vdots & \vdots & \ddots & \vdots\\ e^{j\frac{2\pi}{M}(M-1)0} & e^{j\frac{2\pi}{M}(M-1)1} & \cdots & e^{j\frac{2\pi}{M}(M-1)(M-1)}\end{bmatrix}\begin{bmatrix}Y_0\\Y_1\\\vdots\\Y_{M-1}\end{bmatrix}$$

which, thanks to the special structure of its matrix, can be solved by a divide-and-conquer strategy that recursively builds the solution to a larger problem out of smaller ones (FFT)[14].
For simplicity, we can collect $\frac{2\pi}{M}n$ as a single term, $\theta_n\in[0,2\pi)$, or $\frac{2\pi}{M}k$ as a single term, $\omega_k$. We then get $y_n = y(\theta_n)$ and $Y_k = Y(\omega_k)$. This may help highlight the fact the original signal and transformed signal live on a domain which maps to the unit circle[16] (hence periodicity and aliasing) and are being sampled at equally-spaced angles/angular velocities.
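To make the matrix relationship concrete, here is a small numeric sketch (sizes and values are illustrative, not from the package) showing that numpy's FFT produces exactly the $Y_k$ that solve this inverse problem:

import numpy as np

M = 8
n = np.arange(M).reshape(-1, 1)              # column of sample indices
k = np.arange(M).reshape(1, -1)              # row of wavenumbers
F_inv = np.exp(2j * np.pi * n * k / M) / M   # the matrix from the linear inverse problem above

y = np.random.default_rng(0).standard_normal(M)
Y = np.fft.fft(y)                            # the FFT solves the inverse problem in O(M log M)
print(np.allclose(F_inv @ Y, y))             # True: y = (1/M) F Y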

3.1.2 Interpolation
I now quote Steven Johnson[17], with some of my own symbols and notation sprinkled in:

“In order to compute derivatives like y ′ (θ), we need to do more than express yn . We need to use the
DFT−1 expression to define a continuous interpolation between the samples yn —this is called trigono-
metric interpolation—and then differentiate this interpolation. At first glance, interpolating seems very
straightforward: one simply evaluates the DFT−1 expression at non-integer n ∈ R. This indeed defines an
interpolation, but it is not the only interpolation, nor is it the best interpolation for this purpose. The rea-
son there is more than one interpolation is due to aliasing: any term $e^{+j\theta_nk}Y_k$ in the DFT⁻¹ can be replaced
by $e^{+j\theta_n(k+mM)}Y_k$ for any integer $m$ and still give the same samples $y_n$, since $e^{j\frac{2\pi}{M}nmM} = e^{j2\pi nm} = 1$ for
any integers m and n. Essentially, adding the mM term to k means that the interpolated function y(θ)
just oscillates m extra times between the sample points, which has no effect on yn but has a huge effect
on derivatives. To resolve this ambiguity, one imposes additional criteria—e.g. a bandlimited spectrum
and/or minimizing some derivative of the interpolated y(θ)”

We can now posit a slightly more general formula for the underlying continuous, periodic (over interval length M)
signal:
$$y(\theta) = \frac{1}{M}\sum_{k=0}^{M-1}Y_ke^{j\theta(k+m_kM)},\qquad m_k\in\mathbb Z$$

“In order to uniquely determine the $m_k$, a useful criterion is that we wish to oscillate as little as possible between the sample points $y_n$. One way to express this idea is to assume that $y(\theta)$ is bandlimited to frequencies $|k+m_kM|\le\frac{M}{2}$. Another approach, that gives the same result ... is to minimize the mean-square slope”†

$$\frac{1}{2\pi}\int_0^{2\pi}|y'(\theta)|^2\,d\theta = \frac{1}{2\pi}\int_0^{2\pi}\Big|\frac{1}{M}\sum_{k=0}^{M-1}j(k+m_kM)Y_ke^{j\theta(k+m_kM)}\Big|^2d\theta$$

$$= \frac{1}{2\pi M^2}\int_0^{2\pi}\Big(\sum_{k=0}^{M-1}j(k+m_kM)Y_ke^{j\theta(k+m_kM)}\Big)\overline{\Big(\sum_{k'=0}^{M-1}j(k'+m_{k'}M)Y_{k'}e^{j\theta(k'+m_{k'}M)}\Big)}\,d\theta$$

† It’s due to this ambiguity and constraint that spectral methods are only suitable for smooth functions!

$$= \frac{1}{M^2}\sum_{k=0}^{M-1}\sum_{k'=0}^{M-1}(k+m_kM)(k'+m_{k'}M)\,Y_k\overline{Y_{k'}}\underbrace{\frac{1}{2\pi}\int_0^{2\pi}e^{j\theta(k+m_kM)}e^{-j\theta(k'+m_{k'}M)}\,d\theta}_{\substack{=\,0\text{ if }k+m_kM\ne k'+m_{k'}M\;(\Leftrightarrow\,k\ne k'\text{ for }0\le k,k'<M)\\ =\,1\text{ if }k=k'}}$$

$$= \frac{1}{M^2}\sum_{k=0}^{M-1}|Y_k|^2(k+m_kM)^2$$

We now seek to minimize this by choosing mk for each k. Only the last term depends on mk , so it’s sufficient to
minimize only this:
$$\underset{m_k}{\text{minimize}}\;(k+m_kM)^2\qquad\text{s.t.}\quad 0\le k<M,\;\; m_k\in\mathbb Z$$

This problem actually admits of good ol' calculus plus some common sense:
$$\frac{d}{dm_k}(k+m_kM)^2 = 2(k+m_kM)M = 0\;\longrightarrow\;m_k^* = \frac{-k}{M}\in(-1,0]$$

where $*$ denotes optimality. But we additionally need to choose $m_k\in\mathbb Z$. Let's plot it to see what's going on.

(Figure: the cost parabola $(k+m_kM)^2$ as a function of $m_k$, with the feasible integer costs marked; the continuous minimum sits at $m_k^* = \frac{-k}{M}\in(-1,0]$, between the feasible points $m_k=-1$ and $m_k=0$.)
As we change the values of M and k, the parabola shifts around, getting taller for larger M and shifting leftward
as k → M .
We can see that for $k\in[0,\frac{M}{2})$, the $m_k=0$ solution is lower down the cost curve, and for $k\in(\frac{M}{2},M)$, the $m_k=-1$ solution is more optimal. “If $k=\frac{M}{2}$ (for even $M$), however, there is an ambiguity: either $m_k=0$ or $-1$ gives the same value $(k+m_kM)^2 = (\frac{M}{2})^2$. For this $Y_{M/2}$ term (the “Nyquist” term), we can arbitrarily split up the $Y_{M/2}$ term between $m=0$ [$j\frac{M}{2}\theta$, positive frequency] and $m=-1$ [$j(\frac{M}{2}-M)\theta = -j\frac{M}{2}\theta$, negative frequency]:”
$$Y_{M/2}\big(ue^{j\frac{M}{2}\theta} + (1-u)e^{-j\frac{M}{2}\theta}\big)$$

where $u\in\mathbb C$ s.t. at sample points $\theta_n$ we get $Y_{M/2}\big(ue^{j\frac{M}{2}\frac{2\pi}{M}n} + (1-u)e^{-j\frac{M}{2}\frac{2\pi}{M}n}\big) = Y_{M/2}\big(u\underbrace{e^{j\pi n}}_{(-1)^n} + (1-u)\underbrace{e^{-j\pi n}}_{(-1)^n}\big) = Y_{M/2}(-1)^n$ “and so recover the DFT⁻¹.”

If we use the above in the mean-squared slope derivation instead of $Y_ke^{j\theta(k+m_kM)}$ and $Y_{k'}e^{j\theta(k'+m_{k'}M)}$, then the integral portion becomes:

$$\frac{1}{2\pi}\int_0^{2\pi}Y_{M/2}\overline{Y_{M/2}}\big(ue^{j\frac{M}{2}\theta}+(1-u)e^{-j\frac{M}{2}\theta}\big)\overline{\big(ue^{j\frac{M}{2}\theta}+(1-u)e^{-j\frac{M}{2}\theta}\big)}\,d\theta$$

$$= |Y_{M/2}|^2\frac{1}{2\pi}\Big(u\bar u\int_0^{2\pi}\underbrace{e^{j\frac{M}{2}\theta}e^{-j\frac{M}{2}\theta}}_{=1}d\theta + u\overline{(1-u)}\int_0^{2\pi}\underbrace{e^{j\frac{M}{2}\theta}e^{j\frac{M}{2}\theta}}_{\text{periodic!}}d\theta + (1-u)\bar u\int_0^{2\pi}\underbrace{e^{-j\frac{M}{2}\theta}e^{-j\frac{M}{2}\theta}}_{\text{periodic!}}d\theta + (1-u)\overline{(1-u)}\int_0^{2\pi}\underbrace{e^{-j\frac{M}{2}\theta}e^{j\frac{M}{2}\theta}}_{=1}d\theta\Big)$$

$$= \frac{1}{2\pi}|Y_{M/2}|^2\big(|u|^2 2\pi + |1-u|^2 2\pi\big) = |Y_{M/2}|^2\big(|u|^2+|1-u|^2\big)$$

because integrating something periodic over a multiple of its period yields its mean, which is 0 in this case.
We now know that the contribution to the mean-squared slope from the $\frac{M}{2}$th term is $\propto|u|^2+|1-u|^2$. What's the optimal $u$?
$$\frac{d}{du}\big(|u|^2+|1-u|^2\big) = 2u - 2(1-u) = 0\;\longrightarrow\;u = \frac{1}{2}$$
So “the $Y_{M/2}$ term should be equally split between the frequencies $\pm\frac{M}{2}\theta$, giving a $\cos(\frac{M}{2}\theta)$ term.” Note that if $M$ is odd, there is no troublesome $\frac{M}{2}$ term like this, but later we'll use the Discrete Cosine Transform[20] type I (DCT-I), which is equivalent to the FFT with even $M$ and $Y_k = Y_{M-k}$, so we do have to worry about the Nyquist term.
Now if we put it all together we get “the unique “minimal-oscillation” trigonometric interpolation of order $M$”:

$$y(\theta) = \frac{1}{M}\Big(Y_0 + \sum_{0<k<\frac{M}{2}}\big(Y_ke^{jk\theta}+Y_{M-k}e^{-jk\theta}\big) + Y_{M/2}\cos\big(\tfrac{M}{2}\theta\big)\Big)\qquad(4)$$

“As a useful side effect, this choice of trigonometric interpolation has the property that real-valued samples $y_n$ (for which $Y_0$ is real and $Y_{M-k} = \overline{Y_k}$) will result in a purely real-valued interpolation $y(\theta)$ for all $\theta$.”

3.1.3 Taking Derivatives of the Interpolant


Now at last, with this interpolation between integer n in hand, we can take a derivative w.r.t. the spatial variable:
$$\frac{d}{d\theta}y(\theta) = \frac{1}{M}\Big(\sum_{0<k<\frac{M}{2}}jk\big(Y_ke^{jk\theta}-Y_{M-k}e^{-jk\theta}\big) - \frac{M}{2}Y_{M/2}\sin\big(\tfrac{M}{2}\theta\big)\Big)$$

Evaluating at $\theta_n = \frac{2\pi}{M}n$, $n\in\mathbb Z$, we get:

$$y_n' = \frac{1}{M}\Big(\sum_{0<k<\frac{M}{2}}jk\big(Y_ke^{jk\frac{2\pi}{M}n}-Y_{M-k}e^{-jk\frac{2\pi}{M}n}\big) - \frac{M}{2}Y_{M/2}\underbrace{\sin(\pi n)}_{0}\Big) = \frac{1}{M}\sum_{k=0}^{M-1}Y_k'e^{j\frac{2\pi}{M}kn}$$

where

$$Y_k' = \begin{cases}jk\cdot Y_k & k<\frac{M}{2}\\ 0 & k=\frac{M}{2}\\ j(k-M)\cdot Y_k & k>\frac{M}{2}\end{cases}$$

The $k>\frac{M}{2}$ case comes from the substitution $k_{new} = M-k_{old}$: for $0<k_{old}<\frac{M}{2}$ we get $\frac{M}{2}<k_{new}<M$, and $-jk_{old}\cdot Y_{M-k_{old}} \rightarrow -j(M-k_{new})\cdot Y_{k_{new}}$.
2
Easy! Now let’s do the second derivative:

d2 1  X 2π 2π
 M 2 M 
2
y(θ) = (jk)2 (Yk ejk M n + YM −k e−jk M n ) − YM/2 cos( θ)
dθ M M
2 2
0<k< 2


And again evaluating at θn = M n, n∈ Z:
 M 2 M −1
1  X 2π 2π
 1 X ′′ j 2π kn
yn′′ = (jk)2 (Yk ejk M n + YM −k e−jk M n ) − YM/2 (−1)n = Yk e M
M M
2 M
0<k< 2 k=0

(jk)
 2
· Yk k<M 2
(
M
′′
  2
′′ (jk)2 · Yk k≤ 2
where Yk = jM
2 · Yk k=M 2
or equivalently Yk = 2 M
 (j(k − M )) · Yk k> 2
(j(k − M ))2 · Yk k > M


2

It’s important to realize “this [second derivative] procedure is not equivalent to performing the spectral first-
derivative procedure twice (unless M is odd so that there is no YM/2 term) because the first derivative operation omits
the YM/2 term entirely.”[17]

We can repeat for higher derivatives, but the punchline is that for odd derivatives the $\frac{M}{2}$ term goes away‡, and for even derivatives it comes back. In general:

$$Y_k^{(\nu)} = \begin{cases}(jk)^\nu\cdot Y_k & k<\frac{M}{2}\\ \big(j\frac{M}{2}\big)^\nu\cdot Y_k & k=\frac{M}{2}\text{ and }\nu\text{ even}\\ 0 & k=\frac{M}{2}\text{ and }\nu\text{ odd}\\ \big(j(k-M)\big)^\nu\cdot Y_k & k>\frac{M}{2}\end{cases}\qquad(5)$$

This has definite echoes of the standardly-given, continuous-time case covered in section 3, but it’s emphatically
not as simple as just multiplying by jω or even by jk. However, the final answer is thankfully super compact to
represent in math and in code.
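As a concrete illustration of Equation 5, here is a minimal numpy sketch for a real, periodic signal; the test function, size, and order here are my own illustrative choices, not the package's:

import numpy as np

M = 64
theta = 2 * np.pi * np.arange(M) / M
y = np.exp(np.sin(theta))                     # smooth and 2π-periodic
nu = 2                                        # derivative order

k = np.fft.fftfreq(M, d=1/M)                  # wavenumbers 0..M/2-1, then -M/2..-1
Y = np.fft.fft(y)
mult = (1j * k) ** nu
if M % 2 == 0 and nu % 2 == 1:                # Equation 5: zero the Nyquist term for odd orders;
    mult[M // 2] = 0                          # for even orders (-j*M/2)^nu == (j*M/2)^nu already
d_nu_y = np.fft.ifft(mult * Y).real

truth = (np.cos(theta)**2 - np.sin(theta)) * np.exp(np.sin(theta))   # analytic 2nd derivative
print(np.max(np.abs(d_nu_y - truth)))         # near machine precision for this smooth function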

3.2 Limitations
So far it has all been good news, but there is a serious caveat to using the Fourier basis, especially for derivatives.
Although a Fourier transform tends to have more “mass” at lower frequencies and fall off as we go to higher ones
(otherwise the reconstruction integral would diverge), and therefore we can get really great reconstructions by leaving
off higher modes[14] (or equivalently only sampling and transforming M components), we in fact need all the infinite
modes to reconstruct an arbitrary signal[11]. Even then, the Fourier basis can not represent true discontinuities nor
non-smooth corners, instead converging “almost everywhere”, which is math speak for the “measure” or volume of the
set where it doesn’t work being 0, meaning it only doesn’t work at the discontinuities or corners themselves.[11]
If there are discontinuities or corners, we get what’s called the Gibbs Phenomenon[11], essentially overshoot as the
set of basis functions tries to fit a sudden change. These extra wiggles are bad news for function approximation but
even worse news for taking derivatives: If we end up on one of those oscillations, the slope might wildly disagree with
that of the true function!

An example of the Gibbs phenomenon, from [14]

This is a bigger problem than it may first appear, because when we do this on a computer, we’re using the DFT,
which implicitly periodically extends the function (subsection 2.2, Equation 3). So we not only need the function to
have no jumps or corners internal to its domain; we need it to match up smoothly at the edges of its domain too!
This rules out the above spectral method for all but “periodic boundary conditions”[14]. But if the story ended
right there, I wouldn’t have thought it worth building this package.

4 The Chebyshev Basis


There is another basis which we can use to represent arbitrary functions, called the Chebyshev polynomials[18], which
has a really neat relationship to the Fourier basis.
‡ For real signals it is common (e.g. in the kind of code ChatGPT might generate to do this) not to worry about zeroing out the Nyquist term and instead throw away the imaginary part of the inverse transform. This works because $Y_{M/2} = \sum_{n=0}^{M-1}y_ne^{-j\frac{2\pi}{M}n\frac{M}{2}} = \sum_{n=0}^{M-1}y_n(-1)^n$, which will be purely real for real $y_n$, and when we multiply by $(jk)^\nu$ for odd $\nu$, then $Y_{M/2}^{(\nu)}$ gets a constituent odd power of $j$, which makes it purely imaginary. Then its contribution to the inverse transform (i.e. to each sample of the derivative $y_n^{(\nu)}$) is $+Y_{M/2}^{(\nu)}e^{j\frac{2\pi}{M}n\frac{M}{2}} = +Y_{M/2}^{(\nu)}(-1)^n$, which will also be purely imaginary. Whereas other imaginary components in the transform of a real signal have negated twins at negative frequency to pair with and become real sines in the inverse transform, the Nyquist term has no twin, so it's the only imaginary thing left over. Thus for real signals keeping only the ifft().real is equivalent to zeroing out the Nyquist term.

(Figure: the unit circle in the complex plane, with $z = e^{j\theta}$, its conjugate $\bar z = z^{-1} = e^{-j\theta}$, and $x = \mathrm{Re}\{z\}$ its shadow on the real axis.)

Let
$$x\in[-1,1]\;\;\text{(Chebyshev)}\qquad = \cos(\theta),\;\theta\in[0,\pi]\;\;\text{(Fourier)}\qquad = \frac{1}{2}(z+z^{-1}),\;|z|=1\;\;\text{(Laurent)}\qquad(6)$$

The $k$th Chebyshev polynomial is defined as $T_k(x) = \mathrm{Re}\{z^k\} = \frac{1}{2}(z^k+z^{-k}) = \cos(k\theta)$ by Euler's formula:

$$T_0(x) = \mathrm{Re}\{z^0\} = 1$$
$$T_1(x) = \mathrm{Re}\{z^1\} = \frac{1}{2}(e^{j\theta}+e^{-j\theta}) = \cos(\theta) = x$$
$$T_2(x) = \frac{1}{2}(e^{j2\theta}+e^{-j2\theta}) = \cos(2\theta),\quad\text{but also}\; = \underbrace{\frac{1}{2}(z^2+2+z^{-2})}_{\text{perfect square}} - 1 = 2\Big(\underbrace{\frac{z+z^{-1}}{2}}_{\cos(\theta)}\Big)^2 - 1 = 2x^2-1$$
$$T_3(x) = \frac{1}{2}(e^{j3\theta}+e^{-j3\theta}) = \cos(3\theta),\quad\text{but also}\; = \frac{1}{2}(z+z^{-1})^3 - \frac{3}{2}(z+z^{-1}) = 4x^3-3x$$
$$\dots$$

It turns out there is a recurrent pattern:


$$T_{k+1}(x) = \frac{1}{2}\big(z^{k+1}+z^{-(k+1)}\big) = \frac{1}{2}\big(z^k+z^{-k}\big)\big(z+z^{-1}\big) - \frac{1}{2}\big(z^{k-1}+z^{-(k-1)}\big) = 2xT_k(x) - T_{k-1}(x)$$
Due to the relationship between θ and x on their respective domains, you can think of these polynomials as cosine
waves “wrapped around a cylinder and viewed from the side.”[18]
Essentially, on the domain [−1, 1] each of these polynomials has ever more wiggles in the range [−1, 1], and they
perfectly coincide with the shadows of horizontally-reversed 2π-periodic cosines in the domain [0, π]. If we trace a
function’s value over x = cos(θ) for linearly-increasing θ ∈ [0, π] instead of tracing it for linearly-decreasing x ∈ [−1, 1],
it’s as if we’re walking along the arc of the cylinder instead of along the shadow. We’re effectively moving, horizontally
flipping, and warping the function (by expanding near the edges and compressing in the middle) to a new θ domain.
We can reconstruct a function using the different variables/basis formulations, and as long as our variables are
related as in Equation 6, these reconstructions are equivalent:
$$y(x) = \sum_{k=0}^{N}a_kT_k(x)\;;\qquad y(z) = \sum_{k=0}^{N}\frac{a_k}{2}\big(z^k+z^{-k}\big)\;;\qquad y(\theta) = \sum_{k=0}^{N}a_k\cos(k\theta)\qquad(7)$$

Note the set of {ak } is for k ∈ {0, ...N } and therefore has cardinality N + 1.
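A quick numeric sanity check of the recurrence and of $T_k(\cos\theta) = \cos(k\theta)$; this sketch is illustrative only:

import numpy as np

x = np.linspace(-1, 1, 201)
theta = np.arccos(x)

T = [np.ones_like(x), x.copy()]              # T_0 = 1, T_1 = x
for k in range(1, 6):
    T.append(2 * x * T[k] - T[k - 1])        # build T_2 .. T_6 by the recurrence

print(all(np.allclose(T[k], np.cos(k * theta)) for k in range(7)))   # True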

4.1 The Advantage of Chebyshev


Why might we prefer this basis to the Fourier basis? Well, the advantage of a polynomial basis is we can avert the
need for periodicity at the boundaries. Polynomial fits don't suffer the Gibbs Phenomenon; however, they do suffer
from the also-bad Runge Phenomenon[14]:

Relationship of Chebyshev domain and Fourier Domain, from [19]. Notice
the cosines are horizontally flipped. The authors use n instead of k, which is
common for Chebyshev polynomials (e.g. [18]), but I prefer k to enumerate
basis modes, for consistency.

The Runge phenomenon, demonstrated in (a) and (b), mitigated in (c) and (d), from [14]

However, there is something we can do about the Runge phenomenon: By clustering fit points at the edges of the
domain, the wild wobbles go away.
If we take xn = cos(θn ) with θn equispaced, n ∈ {0, ...N }, then we get a very natural clustering at the edges of
[−1, 1]. What’s more, if we have equispaced θn and a reconstruction expression built up out of sinusoids, we’re back
in a Fourier paradigm (at least in variable θ) and can exploit the efficiency of the FFT, or, better, the discrete cosine
and sine transforms![20, 21]
Notice too that the polynomials/projected cosines are asymmetrical, so we can naturally use this basis to model arbitrary, lopsided functions without having to worry about phase shifts like we did for a Fourier basis of discrete harmonics.

5 Chebyshev Derivatives by Reduction to the Fourier Case


This all suggests a solution procedure:

Chebyshev Derivative via Fourier


1: Sample y at {xn = cos(θn )} rather than at equally spaced {xn }, thereby warping the function over the arc
of a cylinder.
2: Use the DCT to transform to frequency domain.
3: Multiply by appropriate (jk)ν to accomplish differentiation in the θ domain.
4: Inverse transform using the DST if odd function, DCT if even function.
5: Change variables back from θ to x, the Chebyshev variable, taking care that this entails an extra chain rule.

(Figure: $e^x\sin(7x)$ sampled at $x_n = \cos(\theta_n)$ on $[-1,1]$; its flip-and-warp $e^{\cos\theta}\sin(7\cos\theta)$ sampled at $\theta_n = \frac{2\pi}{10}n$ on $[0,\pi]$; and the periodic extension of $e^{\cos\theta}\sin(7\cos\theta)$ over $\theta\in[-\pi,2\pi]$.)

Illustration of implicit function manipulations in the first two steps of the algorithm. The edges of
the aperiodic function can be made to match by “periodic extension”, but this operation alone only
fixes discontinuity; corners are created at 0 and π, resulting in Gibbs phenomenon when we frequency
transform. Warping stretches corners into smooth transitions.

There are still a lot of details left to be worked out here, which we’ll tackle in sequence.

5.1 The Discrete Cosine Transform


Because the reconstruction of y(θ) (Equation 7) only contains cosines, doing a full FFT/DFT, which tries to fit sines
as well as cosines, would be doing extra work. Instead we can use the DCT.
There are actually several possible definitions of the DCT[20], based on different discrete periodic extensions. The
DCT-II is the default in scipy[20], because it most closely resembles the Karhunen–Loève Transform[22], which is
provably optimal to represent signals generated by a stationary (statistical properties not varying over time) Markov
process (where values are related only to immediate previous values, through an autocorrelation coefficient). This class
of signals is not exactly the same as the class of smooth functions, but it’s similar in spirit. Thus we can often get
better “energy compaction” with this basis and represent signals in fewer coefficients[23], especially when they have
steep slopes at their boundaries. Indeed, common compression standards like JPEG choose to use the DCT-II.
However, we are dealing with warped functions with flattened edges, which are best made periodic by direct stacking-and-mirroring around the ends of the domain; they do not need any “wiggle room” between repeats of the endpoints. To achieve this with the DCT-II, we have to sample at “half-index” points, $\theta_{n,\mathrm{II}} = \frac{\pi}{N+1}(n+\frac{1}{2})$, which do not go all the way to the domain boundaries. But for compatibility with boundary value problems, we really want a sampling that includes the boundaries, like $\theta_n = \frac{\pi n}{N}$. The DCT-I implies the periodic extension we want with the sampling we want, so it is actually our best choice here, and it mercifully comes with the added benefit of being the least confusing variant to derive from the DFT:
Say we periodically extend $y$, i.e. stack $y(\theta)$, $\theta\in[0,\pi]$ next to a horizontal flip of itself on $(\pi,2\pi)$, and then sample at the canonical DFT points, $\theta_n = \frac{2\pi}{M}n$, $n\in\{0,...M-1\}$ (Equation 3). We get:

$$\vec y_{ext} = [\underbrace{y_0, y_1, ...y_{N-1}, y_N}_{\text{original vector, length }N+1}, \underbrace{y_{N-1}, ...y_1}_{\text{redundant information}}],\quad\text{that is: } y_n = y_{M-n},\;0\le n\le N,\;\text{ with total length }M = 2N\text{, necessarily even!}$$

Then using $M-k$ for $k$ in the DFT equation, we get:

$$Y_{M-k} = \sum_{n=0}^{M-1}y_ne^{-j\frac{2\pi}{M}n(M-k)} = \sum_{n=0}^{M-1}y_n\underbrace{e^{-j2\pi n}}_{1}e^{j\frac{2\pi}{M}nk}\;\overset{n_{new}=M-n_{old}}{=}\;\sum_{n=M}^{1}y_{M-n}e^{j\frac{2\pi}{M}(M-n)k}$$

$$= \sum_{n=1}^{M}\underbrace{y_{M-n}}_{=y_n}\underbrace{e^{j2\pi k}}_{1}e^{-j\frac{2\pi}{M}nk} = \sum_{n=1}^{M}y_ne^{-j\frac{2\pi}{M}nk} = \sum_{n=0}^{M-1}y_ne^{-j\frac{2\pi}{M}nk} = Y_k\quad\square$$

because $e^{-j\frac{2\pi}{M}Mk} = e^0 = 1$ and $y_M = y_0$.
So when $y_n$ are redundant this way, the $Y_k$ are too, in a very mirror way. We can now use the facts $Y_k = Y_{M-k}$ and $N = \frac{M}{2}$ in the FFT interpolation (Equation 4):

$$y(\theta) = \frac{1}{M}\Big(Y_0 + \sum_{0<k<\frac{M}{2}}\big(Y_ke^{jk\theta}+Y_{M-k}e^{-jk\theta}\big) + Y_{M/2}\cos\big(\tfrac{M}{2}\theta\big)\Big) = \frac{1}{M}\Big(Y_0 + 2\sum_{k=1}^{N-1}Y_k\underbrace{\frac{e^{jk\theta}+e^{-jk\theta}}{2}}_{\cos(k\theta)} + Y_N\cos(N\theta)\Big)\qquad(8)$$

At samples $\theta_n = \frac{2\pi}{M}n = \frac{\pi}{N}n$, this becomes:

$$y(\theta_n) = \frac{1}{M}\Big(Y_0 + Y_N\underbrace{\cos(\pi n)}_{(-1)^n} + 2\sum_{k=1}^{N-1}Y_k\cos\big(\tfrac{\pi nk}{N}\big)\Big)\qquad(\text{DCT-I}^{-1})$$

This is exactly the DCT-I⁻¹, which, except for the $\frac{1}{M}$ term and a flip of $Y$ and $y$, is the same as the forward DCT-I! But the DCT-I and DCT-I⁻¹ operate on the shorter set $\vec Y = [Y_0, ...Y_N]$, without redundant information. Thus

$$\mathrm{FFT}^{-1}(\underbrace{[Y_0,...Y_N,Y_{N-1},...Y_1]}_{\vec Y_{ext}})[:N+1] = \mathrm{DCT\text{-}I}^{-1}([Y_0,...Y_N])\qquad(9)$$

where $[:N+1]$ truncates to only the first $N+1$ elements ($\{0,...N\}$). Given the equality above, we can line up everything we now know in a diagram:

(Diagram: $\vec y_{ext}$ and $\vec y$ are related by truncate / truncate⁻¹, where truncate⁻¹ does periodic extension, stacking back in the redundant information; FFT / FFT⁻¹ map between $\vec y_{ext}$ and $\vec Y_{ext}$; DCT-I / DCT-I⁻¹ map between $\vec y$ and $\vec Y$; and truncate / truncate⁻¹ likewise relate $\vec Y_{ext}$ and $\vec Y$.)

We can now easily see that in addition to the inverse relationship (Equation 9), we also have the forward relationship:

$$\mathrm{FFT}([y_0,...y_N,y_{N-1},...y_1])[:N+1] = \mathrm{DCT\text{-}I}([y_0,...y_N])$$

Notice that the 0th and $N$th terms appear outside the sum, and that the sum is multiplied by 2. In our original conception of the cosine series for $y(\theta)$ (Equation 7), all the cosines appear equally within the sum, so our $Y_k$ are subtly different from the $a_k$ in that formulation (some scaled by a factor of $\frac{1}{2}$ and all scaled by $M$). Both are valid, but it's more computationally convenient to use the DCT-based formulation.

5.2 Even and Odd Derivatives and the Discrete Sine Transform
The DCT can get us into the frequency domain, but we'll need the help of another transform to get back out. We again start with the DCT-I formulation for simplicity.
If we look at the full $\vec Y_{ext}$ (Equation 9), we have a palindromic structure around $N$, but also around 0, because of the repetitions[16], which ensure we can read the values of $Y_k$ at negative $k$ by wrapping around to the end of the vector. This is describing an even function, $f(-x) = f(x)$, which makes sense, because $y(\theta)$ is entirely composed of cosines, which are even functions, and because the forward transform is symmetrical with the inverse transform, the interpolation $Y(\omega)$ between $Y_k$ is also ultimately a bunch of cosines.
The derivative of an even function is an odd function, $f(-x) = -f(x)$, which in principle should be constructible from purely sines, which are odd. And the derivative of an odd function is an even function again.
To see this more granularly, let's look in more detail at the multiplication by $(jk)^\nu$ that produces all the $Y_k^{(\nu)}$ (Equation 5), for $k\in\{0,...M-1\}$:

$$\vec Y_{ext}^{(\nu)} = \big[0,\;j^\nu,\;...(j(N-1))^\nu,\;\underbrace{(0\text{ or }(jN)^\nu)}_{\text{depending on }\nu\text{ odd or even}},\;(-j(N-1))^\nu,\;...(-j)^\nu\big]\odot\vec Y_{ext}$$

$$= j^\nu\cdot\underbrace{\tilde1_\nu}_{\text{constant}}\odot\underbrace{[0,1,...N-1,N,N-1,...1]^\nu}_{\text{palindromic}}\odot\vec Y_{ext}$$

where $\odot$ is a Hadamard, or element-wise, product, raising a vector to a power is also element-wise, and

$$\tilde1_\nu = \begin{cases}[0,1,...1,0,-1,...,-1] & \text{if }\nu\text{ is odd}\\ [0,1,...1,1,1,...1] & \text{if }\nu\text{ is even}\end{cases}$$

$[0,1,...1,0,-1,...,-1]$ is odd around entries 0 and $N$, and $[0,1,...1,1,1,...1]$ is even around entry 0.
(ν)
Let’s now use this to reconstruct samples in the θ domain, yn , for odd and even derivatives:
N −1
1 X 1 X
yn(odd ν) = (jk)ν (Yk ejkθn − YM −k e−jkθn ) = (jk)ν Yk (ejkθn − e−jkθn )
M M
| {z } M | {z }
0<k< from odd- k=1
2 =Yk 2j sin(kθn )
ness of 1̃ν
N −1
1 X πnk
= 2 (jk)ν Yk j sin( ) (10)
M N
k=1
| {z }
= a DST-I of Y ⃗ (ν) · j!
1  X M M 
yn(even ν) = (jk)ν (Yk ejkθn + YM −k e−jkθn ) + (j )ν YM/2 cos( θn )
M M
| {z } 2 2
0<k< 2 from even- =Yk
ness of 1̃ν
N −1
1  X 
= (jN )ν YN cos(πn) + (jk)ν Yk (ejkθn + e−jkθn )
M | {z }
k=1
2 cos(kθn )
N −1
1  X πnk 
= (j0)ν Y0 + (jN )ν YN (−1)n + 2 (jk)ν Yk cos( ) (11)
M N
k=1
| {z }

= a DCT-I of Y !(ν)

Brilliant! So we can use only the non-redundant $Y_k^{(\nu)}$ with a DST-I or DCT-I to convert odd and even functions, respectively, back to the θ domain!
Note that the DCT-I and DST-I definitions given in scipy[20, 21] use slightly different indexing than in my
definitions here, which can be a point of confusion. I consistently take N to be the index of the last element of the
non-redundant yn , not its length, following [18]. Note too that I consistently use n to index samples and k to index
basis domain, whereas scipy uses n for the domain being transformed from and k for the domain being transformed
to, which means these symbols are consistent with mine for forward transforms but flipped for inverse transforms.
Even more confusing, the DST-I only takes the k ∈ {1, ...N − 1} elements, since sines will result in zero crossings
at k = 0 and N (no informational content), whereas the DCT-I takes all k ∈ {0, ...N } elements!

5.3 Transforming Back to the Chebyshev Domain


At this point we’ve accomplished all but the last step of the algorithm, but we’ve been operating with yn = y(θn ) and
(ν)
yn = y (ν) (θn ), which are really samples from the θ domain, when what we really need to do is take derivatives in the
x = cos(θ) domain.
When we do this, we have to employ a chain rule, which introduces a new factor: the derivative of one of our variables w.r.t. the other. For the 1st derivative it looks like:

$$\frac{d}{dx}y(\theta) = \frac{d}{d\theta}y(\theta)\cdot\frac{d\theta}{dx} = y'(\theta)\cdot\frac{d}{dx}\cos^{-1}(x) = y'(\theta)\cdot\frac{-1}{\sqrt{1-x^2}}$$

The $y'(\theta)$ term is actually pretty easy to handle, because we know its value (and for higher orders too) at discretized $\theta_n$ from earlier! (Equations 10 & 11) If we use the sampled $x_n = \cos(\theta_n)$ from step 1 of the algorithm, then our $\{x_n\}$ and $\{\theta_n\}$ align, and we can find samples of the derivative w.r.t. $x$ by plugging $\{x_n\}$ into the new factor(s) and multiplying appropriately (pointwise):

$$\Big[\frac{d}{dx}y(\theta)\Big]_n = \frac{-1}{\sqrt{1-x_n^2}}\odot y_n'$$
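Putting steps 2 through 5 together for the first derivative at the interior points, here is a minimal sketch using scipy's DCT-I and DST-I; the test function and size are illustrative, and the endpoints are handled separately (subsection 5.4):

import numpy as np
from scipy.fft import dct, dst

N = 32
theta = np.pi * np.arange(N + 1) / N
x = np.cos(theta)                              # step 1: cosine-spaced samples on [-1, 1]
y = np.exp(x) * np.sin(5 * x)                  # an illustrative smooth function
M = 2 * N

Y = dct(y, type=1)                             # step 2: Y_k, k = 0..N
k = np.arange(1, N)
dy_dtheta = -dst(k * Y[1:N], type=1) / M       # steps 3-4: y'(θ_n) at n = 1..N-1 (Eq. 10, ν = 1)
dy_dx = dy_dtheta * (-1 / np.sqrt(1 - x[1:N]**2))   # step 5: chain rule back to x

truth = np.exp(x[1:N]) * (np.sin(5 * x[1:N]) + 5 * np.cos(5 * x[1:N]))
print(np.max(np.abs(dy_dx - truth)))           # small for this smooth y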

5.3.1 Higher Derivatives and Splintering Terms


Let’s see it for the second derivative:

√ d ′

d2 d y ′ (θ) − 1 − x2 dx y (θ) − y ′ (θ) dx
d
(− 1 − x2 )
y(θ) = √ =
dx2 dx − 1 − x2 1 − x2
′ x
d ′
− y (θ) · dx dθ y (θ) 1−x2
√ ′′
y (θ) xy ′ (θ)
= dθ √ − = −
1 − x2 1 − x2 1 − x2 (1 − x2 )3/2
d2 1 xn
−→ [ 2 y(θ)]n = 2
⊙ yn′′ − ⊙ yn′
dx 1 − xn (1 − x2n )3/2

Notice that the 2nd derivative in x requires both the 1st and 2nd derivatives in θ! This splintering phenomenon will
be a considerable source of pain as we take higher derivatives: For the ν th derivative in x we require all derivatives up
to order ν in θ.
Let’s see a few more:
d3 −1 3x −2x2 − 1 ′
y(θ) = y ′′′ (θ) + y ′′ (θ) + y (θ)
dx3 (1 − x2 )3/2 (1 − x2 )2 (1 − x2 )5/2
d4 1 −6x 11x2 + 4 ′′ −6x3 − 9x ′
y(θ) = y IV (θ) + y ′′′ (θ) + y (θ) + y (θ) (12)
dx4 (1 − x2 )2 (1 − x2 )5/2 (1 − x2 )3 (1 − x2 )7/2
d5 −1 10x −35x2 − 10 ′′′ 50x2 + 55x ′′ −24x4 − 72x2 − 9 ′
y(θ) = y V (θ) + y IV (θ) + y (θ) + y (θ) + y (θ)
dx5 (1 − x2 )5/2 (1 − x2 )3 (1 − x2 )7/2 (1 − x2 )4 (1 − x2 )9/2
We can see a bit of a pattern here, though. In particular, the function of x multiplying each derivative of y in θ
comes from at most two terms in the preceding derivative, which have a predictable form:

$$\frac{d}{dx}\bigg[\frac{p(x)}{(1-x^2)^{c-1}}y^{(\mu)}(\theta) + \frac{q(x)}{(1-x^2)^{c-\frac12}}y^{(\mu-1)}(\theta)\bigg] =$$
$$\frac{p(x)}{(1-x^2)^{c-1}}y^{(\mu+1)}(\theta)\cdot\frac{d\theta}{dx} + \frac{(1-x^2)^{c-1}\frac{d}{dx}p(x) - p(x)(c-1)(1-x^2)^{c-2}(-2x)}{(1-x^2)^{2c-2}}y^{(\mu)}(\theta)\;+$$
$$\frac{q(x)}{(1-x^2)^{c-\frac12}}y^{(\mu)}(\theta)\cdot\frac{d\theta}{dx} + \frac{(1-x^2)^{c-\frac12}\frac{d}{dx}q(x) - q(x)(c-\frac12)(1-x^2)^{c-\frac32}(-2x)}{(1-x^2)^{2c-1}}y^{(\mu-1)}(\theta)$$

If we now gather the $y^{(\mu)}(\theta)$ terms and use the fact $\frac{d\theta}{dx} = \frac{-1}{\sqrt{1-x^2}}$, we can find its new multiplying factor is equal to:

$$\frac{(1-x^2)p'(x) + 2(c-1)xp(x) - q(x)}{(1-x^2)^c}$$

This relationship holds no matter which $\mu, c, p, q$ we're addressing, which allows us to build up a kind of pyramid of terms:

(Pyramid of numerator polynomials $p$ and $q$, with the value of $c$ at each location in parentheses. Rows go down with increasing $\frac{d^\nu}{dx^\nu}$ of $y$; within a row, columns go from the lowest θ-derivative of $y$ on the left to the highest on the right, i.e. increasing $\mu$.)

$\nu=1$:  $-1$ ($c=\frac12$)
$\nu=2$:  $-x$ ($c=\frac32$),  $1$ ($c=1$)
$\nu=3$:  $-2x^2-1$ ($c=\frac52$),  $3x$ ($c=2$),  $-1$ ($c=\frac32$)
$\nu=4$:  $-6x^3-9x$ ($c=\frac72$),  $11x^2+4$ ($c=3$),  $-6x$ ($c=\frac52$),  $1$ ($c=2$)
$\nu=5$:  $-24x^4-72x^2-9$ ($c=\frac92$),  $50x^3+55x$ ($c=4$),  $-35x^2-10$ ($c=\frac72$),  $10x$ ($c=3$),  $-1$ ($c=\frac52$)

q always refers to the element up and to the left, and p always refers to the element above. If the arrows roll out of
the pyramid, the corresponding p or q is 0. I’ve done the above programmatically in code, such that we can find and
apply the factors—and thereby accomplish the variable transformation back to the Chebyshev domain—for arbitrarily
high derivatives.

5.4 Handling Domain Endpoints


There’s a problem
√ in the above at the edges of the domain: If x = ±1, the denominators of all the factors, which are
powers of 1 − x2 = 0!

However, this doesn’t mean dx ν y can’t have a valid limit value at those points. First remember our reconstruction

y(θ) is composed of cosines, so notice if we take odd derivatives in θ, we get sines, and at the edges of the domain
where x → ±1, θ → 0, π, sine will be 0! However, if we take even derivatives, then cos(0, π) → 1, −1. Then, if we look
closely
√ at the derivatives in x (Equation 12), we can see that even derivatives
√ in θ of y are divided by even powers of
1 − x2 , and the highest power in a denominator is an odd power of 1 − x2 . If we multiply through so everything is
over the highest-power denominator and then combine the expression into a single fraction, we get a situation where
the odd-derivative
√ terms are 0 because sines, and the even-derivative terms are 0 because they’re multiplied by at least
one 1 − x2 .
This means the numerator as well as the denominator is 0 at the domain endpoints. 00 is an indeterminate form,
so we can use L’Hôpital’s rule!
Let’s see it in fine detail for the 1st derivative. The below uses the DCT-I reconstruction (Equation 8) for y(θ).

 PN −1 
1
M − N YN sin(N θ) − 2
N −1
d 1  k=1 kYk sin(kθ)
X  dθ
lim Y0 + YN cos(N θ) + 2 Yk cos(kθ) · = lim √
x→±1 dθ M dx x→±1 − 1 − x2
θ→0,π k=1 θ→0,π
 PN −1 2   PN −1 2 
d
1
− N 2
YN cos(N θ) − 2 k Y k cos(kθ) · √ −1 1
N 2
YN cos(N θ) + 2 k Yk cos(kθ)
dx M k=1 1−x
 2 M k=1
−→ = lim = lim
d x→±1 √ x x→±1 x
dx
θ→0,π
2
1−x θ→0,π

16
  PN −1 
1
 N 2 YN + 2 k=1 k 2 Yk
M at x = 1, θ = 0
=   (1st endpoints)
− 1 N 2 (−1)N YN + 2 PN −1 k 2 (−1)k Yk at x = −1, θ = π
M k=1

And now let’s do it for the 2nd derivative, with some slightly more compact notation, where we can be agnostic
about y(θ)’s exact structure until the end:

$$\lim_{\substack{x\to\pm1\\\theta\to0,\pi}}\frac{\sqrt{1-x^2}\,y''(\theta) - xy'(\theta)}{(1-x^2)^{3/2}}\;\to\;\frac{0}{0}\;\xrightarrow{\;\frac{d}{dx}\;}\;\lim_{\substack{x\to\pm1\\\theta\to0,\pi}}\frac{\sqrt{1-x^2}\,\frac{-1}{\sqrt{1-x^2}}y'''(\theta) + \frac{-x}{\sqrt{1-x^2}}y''(\theta) - \big(\frac{-x}{\sqrt{1-x^2}}y''(\theta) + y'(\theta)\big)}{-3x\sqrt{1-x^2}}$$

$$= \lim_{\substack{x\to\pm1\\\theta\to0,\pi}}\frac{-y'''(\theta) - y'(\theta)}{-3x\sqrt{1-x^2}}\;\to\;\frac{0}{0}\;\xrightarrow{\;\frac{d}{dx}\;}\;\lim_{\substack{x\to\pm1\\\theta\to0,\pi}}\frac{\big(-y^{IV}(\theta) - y''(\theta)\big)\frac{-1}{\sqrt{1-x^2}}}{\frac{6x^2-3}{\sqrt{1-x^2}}} = \frac{1}{6x^2-3}\big(y^{IV}(\theta)+y''(\theta)\big)\Big|_{\theta=0,\pi} = \frac{1}{3}\big(y^{IV}(\theta)+y''(\theta)\big)\Big|_{\theta=0,\pi}$$

We already know $y''(\theta) = \frac{1}{M}\big(-N^2Y_N\cos(N\theta) - 2\sum_{k=1}^{N-1}k^2Y_k\cos(k\theta)\big)$, assuming type I reconstruction. We can easily find

$$y^{IV}(\theta) = \frac{1}{M}\Big(N^4Y_N\cos(N\theta) + 2\sum_{k=1}^{N-1}k^4Y_k\cos(k\theta)\Big)$$

Now we can evaluate these and the factor $\frac{1}{6x^2-3}$ at the limit values and put it all together to find:

$$\begin{cases}\frac{1}{3M}\Big((N^4-N^2)Y_N + 2\sum_{k=1}^{N-1}(k^4-k^2)Y_k\Big) & \text{at }x=1,\;\theta=0\\[6pt] \frac{1}{3M}\Big((N^4-N^2)(-1)^NY_N + 2\sum_{k=1}^{N-1}(k^4-k^2)(-1)^kY_k\Big) & \text{at }x=-1,\;\theta=\pi\end{cases}\qquad(\text{2nd endpoints})$$

5.4.1 Endpoints for Higher Derivatives


We can do the above for higher derivatives too. However, in general finding the endpoints for the $\nu$th derivative involves $\nu$ applications of L'Hôpital's rule, slowly cancelling one power of $\sqrt{1-x^2}$ at a time after each. The algebra gets to be pretty gnarly.
But there is some hope: We can see a pattern like the pyramid scheme from earlier, because the functions multiplying each $y^{(\mu)}(\theta)$ in the numerator of the limit argument depend only on one or two terms from before the latest L'Hôpital. We can additionally use the relationship between variables $x = \cos(\theta)$ to recognize $\sqrt{1-x^2} = \sin(\theta)$ and substitute to put everything in terms of a single variable, and then just as well perform L'Hôpital's derivatives more simply w.r.t. θ rather than $x$ and cancel a $\sin(\theta)$ rather than a $\sqrt{1-x^2}$.
When we proceed, the denominator eventually acquires a single standalone term of the form $D\cos^\nu(\theta)$, which at the domain endpoints 0 and π will be something nonzero, $D_0 = \pm D_\pi$, thereby ending our journey. At the same iteration, the numerator reduces to a set of constants, $C$, multiplying even-order θ-derivatives of $y(\theta)$ up to the $2\nu$th.
Putting it all together, the endpoint formulas can be found as:

$$\begin{cases}\frac{1}{D_0M}\Big((\dots - C_3N^6 + C_2N^4 - C_1N^2)Y_N + 2\sum_{k=1}^{N-1}(\dots - C_3k^6 + C_2k^4 - C_1k^2)Y_k\Big) & \text{at }x=1,\;\theta=0\\[6pt] \frac{1}{D_\pi M}\Big((\dots - C_3N^6 + C_2N^4 - C_1N^2)(-1)^NY_N + 2\sum_{k=1}^{N-1}(\dots - C_3k^6 + C_2k^4 - C_1k^2)(-1)^kY_k\Big) & \text{at }x=-1,\;\theta=\pi\end{cases}$$

where the alternating plus and minus in the $k$ and $N$ terms comes from the fact the 2nd derivative contains −cosines, the 4th +cosines, the 6th −cosines again, and so on.
Because the act of cancellation and the functions containing powers of $\sqrt{1-x^2} = \sin(\theta)$ can't be easily represented in numpy, computing $C$ and $D$ requires a symbolic solver like sympy. I've devised an implementation to construct expressions for the endpoints, up to arbitrary order.
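As a small illustration of the kind of symbolic computation involved (this sketch is my own, not the package's implementation), sympy can recover the 1st-derivative endpoint formula directly from the limit in subsection 5.4:

import sympy as sp

N = 3                                    # a small illustrative order; M = 2N
M = 2 * N
theta = sp.symbols('theta')
Y = sp.symbols('Y0:%d' % (N + 1))        # Y0, Y1, Y2, Y3

# DCT-I reconstruction of y(θ), Equation 8
y = (Y[0] + Y[N] * sp.cos(N * theta)
     + 2 * sum(Y[k] * sp.cos(k * theta) for k in range(1, N))) / M

# dy/dx = y'(θ) · (-1/sin θ); take the limit θ → 0, i.e. x → 1
dy_dx = sp.diff(y, theta) * (-1 / sp.sin(theta))
left = sp.limit(dy_dx, theta, 0)

expected = (N**2 * Y[N] + 2 * sum(k**2 * Y[k] for k in range(1, N))) / M
print(sp.simplify(left - expected))      # 0: matches the "1st endpoints" formula at x = 1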

6 Chebyshev Derivatives via Series Recurrence


The above algorithm is a generalization of the method suggested by Trefethen[18], but it actually overcomplicates
matters considerably. It is possible to sidestep all the ballooningly-difficult variable mappings and higher derivative
limit-evaluations by exploiting a direct relationship[24, 25, 26] between a function’s Chebyshev series and its derivative’s

Chebyshev series. This is analogous to how we can represent a function as a power series (a sum of integer powers of
the independent variable) and find the coefficients of its derivative with Power Rule[27].
If we sandwich this key rule with the realization that the DCT is not only getting Fourier series coefficients of
the warped function, but also (scaled) Chebyshev series coefficients of the original function (because these are exactly
the same modulo constant factors!), and thus the DCT−1 can go from the Chebyshev series representation to a
cosine-spaced sampling of a function, then we have the ingredients to craft an algorithm that stays entirely in the x
domain:

Chebyshev Derivative via Series Rule


1: Sample y at {xn = cos(θn )} for equispaced {θn }.
2: Use the DCT to get Chebyshev basis coefficients.
3: Use the Chebyshev series derivative rule to calculate the derivative’s coefficients in O(N ).
4: Inverse transform with the DCT to resample the derivative function at {xn }.

This algorithm turns out to be essentially numerically identical to the first, and due to its relative simplicity (no
splinters!) and ease of extension to support non-cosine-spaced samples (albeit with considerably greater computational
cost, O(N 3 ) rather than O(N log N )), it is the method of choice implemented in the main library code.
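A minimal sketch of this algorithm using scipy's DCT-I and numpy's chebder; the package's own implementation differs in details (input checking, arbitrary order, axis handling), and the rescaling between DCT-I output and true Chebyshev coefficients is spelled out explicitly here:

import numpy as np
from scipy.fft import dct, idct
from numpy.polynomial.chebyshev import chebder

N = 32
x = np.cos(np.pi * np.arange(N + 1) / N)       # step 1: x_n = cos(θ_n)
y = np.exp(x) * np.sin(5 * x)                  # an illustrative smooth function

a = dct(y, type=1) / (2 * N)                   # step 2: DCT-I, rescaled to Chebyshev coefficients
a[1:N] *= 2                                    # interior coefficients carry a factor of 2 (Eq. 8)

da = chebder(a)                                # step 3: the derivative's coefficients, O(N)

da = np.append(da, 0.0)                        # step 4: undo the rescaling and resample at x_n
da[1:N] /= 2
dy_dx = idct(2 * N * da, type=1)

truth = np.exp(x) * (np.sin(5 * x) + 5 * np.cos(5 * x))
print(np.max(np.abs(dy_dx - truth)))           # small for this smooth y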

6.1 Chebyshev Series Derivative Rule


To most easily explain the rule, we’ll need a little extra machinery: There are actually two different kinds of Chebyshev
polynomials, and so far we’ve only been working with the first kind, denoted Tk (x) = cos(kθ). The second kind is
given as:

$$U_k(x) = \frac{\sin((k+1)\theta)}{\sin(\theta)}$$

Notice that:

$$\frac{d}{dx}T_k(x) = \frac{d}{d\theta}\cos(k\theta)\cdot\frac{d\theta}{dx} = k(-\sin(k\theta))\cdot\frac{-1}{\sqrt{1-x^2}} = \frac{k\sin(k\theta)}{\sin(\theta)} = k\cdot U_{k-1}(x)$$
But we really want the derivative in terms of T , not U . Lucky for us, there is a relationship between the two, based
on trigonometric identities, making particular use of cos(α) sin(β) = 21 (sin(α + β) − sin(α − β)):

$$\sum_{\text{odd }\kappa>0}^{k-1}2\cos(\kappa\theta)\sin(\theta) = \sum_{\text{odd }\kappa>0}^{k-1}\big(\sin((\kappa+1)\theta) - \sin((\kappa-1)\theta)\big)$$
$$= \sin(k\theta) - \sin((k-2)\theta) + \sin((k-2)\theta) - \dots - \sin(2\theta) + \sin(2\theta) - \underbrace{\sin(0\theta)}_{0} = \sin(k\theta)$$
$$\longrightarrow\quad\frac{\sin(k\theta)}{\sin(\theta)} = 2\sum_{\text{odd }\kappa>0}^{k-1}\cos(\kappa\theta)\quad\text{for even }k$$

Similarly:

$$\sum_{\text{even }\kappa\ge0}^{k-1}2\cos(\kappa\theta)\sin(\theta) = \sum_{\text{even }\kappa\ge0}^{k-1}\big(\sin((\kappa+1)\theta) - \sin((\kappa-1)\theta)\big)$$
$$= \sin(k\theta) - \sin((k-2)\theta) + \sin((k-2)\theta) - \dots - \sin(\theta) + \sin(\theta) - \underbrace{\sin(-\theta)}_{=-\sin(\theta)} = \sin(k\theta) + \sin(\theta)$$
$$\longrightarrow\quad\frac{\sin(k\theta)}{\sin(\theta)} = -1 + 2\sum_{\text{even }\kappa\ge0}^{k-1}\cos(\kappa\theta)\quad\text{for odd }k$$

Thus:

$$k\cdot U_{k-1}(x) = \frac{d}{dx}T_k(x) = k\cdot\begin{cases}2\sum_{\text{odd }\kappa>0}^{k-1}T_\kappa(x) & \text{for even }k\\[4pt] -1 + 2\sum_{\text{even }\kappa\ge0}^{k-1}T_\kappa(x) & \text{for odd }k\end{cases}$$
Let’s see this on a couple examples to get a better intuition: In practice we represent a function with N Chebyshev
series coefficients, stored low to high. If N = 5, then T3 (x) would be [0, 0, 0, 1, 0], and T4 (x) would be [0, 0, 0, 0, 1]. If we
d d
differentiate these two, we should get dx T3 (x) = 3·(−1+2·[1, 0, 1, 0, 0]) = [3, 0, 6, 0, 0], and dx T4 (x) = 4·2·[0, 1, 0, 1, 0] =
[0, 8, 0, 8, 0]. Notice the extra constant from the −1 in the first example is factored in to the coefficient of T0 (x) = 1.
d
Because differentiation is linear, we can scale and stack these particular results, e.g. dx (2T3 (x) + T4 (x)) = 6T0 (x) +
8T1 (x) + 12T2 (x) + 8T3 (x). And because each term’s derivative only affects every-other term of lower order, and these
effects are cumulative, it’s possible to calculate the new sequence by starting at the higher-order end and working
downwards, modifying only two numbers at each step[26]. An implementation of this procedure called chebder[24]
lives in numpy.
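We can check the worked examples above directly against chebder (note its output is one coefficient shorter than its input):

import numpy as np
from numpy.polynomial.chebyshev import chebder

print(chebder([0, 0, 0, 1, 0]))    # T_3        -> [3. 0. 6. 0.],   i.e. 3T_0 + 6T_2
print(chebder([0, 0, 0, 0, 1]))    # T_4        -> [0. 8. 0. 8.],   i.e. 8T_1 + 8T_3
print(chebder([0, 0, 0, 2, 1]))    # 2T_3 + T_4 -> [6. 8. 12. 8.]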

7 Multidimensionality
We are now fully equipped to find derivatives for 1-dimensional data. This is technically all we need, because, due
to linearity of the derivative operator, we can find the derivative along a particular dimension of a multidimensional
space by using our 1D solution along each constituent vector running in that direction, and we can find derivatives
along multiple dimensions by applying the above in series along each dimension:

$$\frac{\partial^2}{\partial x_1\partial x_2}y(x_1,x_2) = \mathrm{Algo}\big(\mathrm{Algo}(y_i, 1^{st}, x_1)_j,\; 1^{st},\; x_2\big)\quad\forall\,i,j$$

$$\nabla^2y = \Big(\frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2}\Big)y = \mathrm{Algo}(y_i, 2^{nd}, x_1) + \mathrm{Algo}(y_j, 2^{nd}, x_2)\quad\forall\,i,j$$
where i, j are indexers as in the computing sense and have nothing to do with the imaginary unit, Algo applies the
algorithm to each vector along the dimension given by the third argument, and the 1st and 2nd in the second argument
refer to the derivative order.
Each application to a vector incurs O(N log N ) cost, and fundamentally applying the method to higher-dimensional
data must involve a loop, so the full cost of applying along any given direction is (assuming length N in all dimensions)
O(N D log N ), where D is the dimension of the data. Aside from pushing this loop lower down into numpy to take
advantage of vectorized compute, there can be no cost savings for a derivative in a particular dimension.
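As a sketch of what applying the 1-D recipe along one axis of multidimensional data looks like (the helper name here is illustrative, not the package's interface), the transforms and chebder all accept an axis argument, so the loop over vectors stays inside numpy/scipy:

import numpy as np
from scipy.fft import dct, idct
from numpy.polynomial.chebyshev import chebder

def cheb_diff_axis0(y):
    """First Chebyshev-spectral derivative along axis 0 of 2-D, cosine-sampled data."""
    N = y.shape[0] - 1
    a = dct(y, type=1, axis=0) / (2 * N)       # transform every column at once
    a[1:N] *= 2
    da = chebder(a, axis=0)                    # derivative coefficients, column-wise
    da = np.vstack([da, np.zeros((1, y.shape[1]))])
    da[1:N] /= 2
    return idct(2 * N * da, type=1, axis=0)

N = 32
x = np.cos(np.pi * np.arange(N + 1) / N)
X1, X2 = np.meshgrid(x, x, indexing='ij')      # x1 varies along axis 0
y = np.sin(3 * X1) * np.exp(X2)

dy_dx1 = cheb_diff_axis0(y)                    # ∂y/∂x1 for all columns simultaneously
print(np.max(np.abs(dy_dx1 - 3 * np.cos(3 * X1) * np.exp(X2))))   # small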

7.1 Dimensions Together versus In Series


Can we simplify the situation at all?
Due to the linearity of the Fourier transform (and DCT), transforming along all dimensions, multiplying by ap-
propriate (jk)ν (or calling chebder) along corresponding dimensions of the transformed data, and then inverse trans-
forming along all dimensions is equivalent to transforming, differentiating, and inverse transforming each dimension in
series[17]:

(Diagram, all-at-once route: $y\xrightarrow{\text{FFT along every dimension}}Y\xrightarrow{\odot(jk)^{\nu_d}\text{ along each dimension }d}(jk)^{\tilde\nu}\odot Y\xrightarrow{\text{FFT}^{-1}\text{ along every dimension}}dy$, with costs $O(D\cdot N^D\log N)$, $O(D\cdot N^D)$, and $O(D\cdot N^D\log N)$ respectively.)

That’s neat, but does it save us anything, really? Let’s see it in series:

(Diagram, per-dimension route, repeated for each of the $D$ dimensions: $y\xrightarrow{\text{FFT along }d}Y_d\xrightarrow{\odot(jk)^{\nu_d}}(jk)^{\nu_d}\odot Y_d\xrightarrow{\text{FFT}^{-1}}\partial_dy$, with costs $O(N^D\log N)$, $O(N^D)$, and $O(N^D\log N)$ per dimension.)

If we add up the costs, we can see that it’s actually no more or less efficient to differentiate along all dimensions at
once versus in series.
From a user-friendliness perspective, I judge it to be somewhat more confusing to specify multiple derivative
dimensions at once (although generalizing the order and axis parameters to vectors is possible), so I have chosen to
limit the package to differentiation along a single dimension at a time, which also agrees with the interface of chebder.
Multidimensional data can still be handled, however, via clever indexing and use of fft, dct, and chebder’s axis
parameter.

8 Arbitrary Domains
So far we’ve only used the domain [0, 2π) in the Fourier case, because this is the domain assumed by the DFT, and the
domain [−1, 1] in the Chebyshev case, because this is where a cosine wrapped around a cylinder casts a shadow.
As you may have guessed, this hasn’t curtailed the generality of the methods at all, because we can map any domain
from a to b onto a canonical domain.

8.1 Fourier on [a, b)


Say we have t ∈ [a, b) that we need to map to θ ∈ [0, 2π). We can accomplish this with:
b−a
θ ∈ [0, 2π) ↔ t ∈ [a, b) = [0, 2π) · +a
| {z } 2π
θ
To get a sense this is true, let’s see an example: phase is same
modulo 2π
t = θ 8−4
2π + 4
cos( π2 t + π6 ) + 2 sin( 3π π  + π6 ) + 2 sin(3θ + 
cos(θ + 
2π  + π4 ), θ ∈ [0, 2π)

2 t + 4 ), t ∈ [4, 8)
4 4

2 2

t θ
2 4 6 8 2 4 6 8
−2 −2

−4 −4

In the discrete case, where we have $M$ samples on $[a,b)$, then we can map $t_n$ with:

$$\theta_n\in\frac{\{0,...M-1\}\cdot2\pi}{M}\quad\leftrightarrow\quad t_n\in\frac{\{0,...M-1\}\cdot(b-a)}{M} + a$$

In simple code terms, if we want to take a spectral derivative of a function that’s periodic on [a, b), then we need
to sample it at t_n = np.arange(M)/M * (b - a) + a = np.linspace(a, b, M, endpoint=False).

8.2 Chebyshev on [a, b]
Here both ends are inclusive, so we have $t\in[a,b]$ that we need to map to $x\in[-1,1]$. We can accomplish this with:

$$x\in[-1,1]\;\leftrightarrow\;t\in[a,b] = \underbrace{[-1,1]}_{x}\cdot\frac{b-a}{2} + \frac{b+a}{2}$$

To get a sense this is true, let’s see an example:


t = x 4−1
2 +
4+1
2 3 5
et sin(5t), t ∈ [1, 4) e( 2 x+ 2 ) sin(5( 32 x + 52 )), x ∈ [−1, 1]

40 40

20 20

t x
−1 1 2 3 4 −1 1 2 3 4
−20 −20

In the discrete case, where we have $N+1$ samples on $[a,b]$, then we can map $t_n$ with:

$$x_n\in\cos\Big(\frac{\pi\{0,...N\}}{N}\Big)\quad\leftrightarrow\quad t_n\in\cos\Big(\frac{\pi\{0,...N\}}{N}\Big)\cdot\frac{b-a}{2} + \frac{b+a}{2}$$
In code this is t_n = np.cos(np.arange(N+1)*np.pi/N) * (b - a)/2 + (b + a)/2.

Notice the order has flipped here: counting up in n means we traverse x from +1 → −1. This is actually what
we want; it corresponds to the horizontal flip necessary to make cosine shadows equate with Chebyshev polynomials.
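A quick sketch of that sampling and the flipped ordering (N and the interval are arbitrary choices):

    import numpy as np

    N, a, b = 8, 1, 4
    x_n = np.cos(np.arange(N+1) * np.pi / N)              # runs from +1 down to -1
    t_n = x_n * (b - a)/2 + (b + a)/2                     # runs from b down to a

    print(t_n[0], t_n[-1])                                # 4.0 1.0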

8.3 Accounting for Smoosh


When a function is sampled at one of the t_n above, then it is as if the function lives on the canonical domain. The
actual mapping is purely notional, and the spectral differentiation procedure proceeds completely agnostic to where
the data really came from.
This means the result will actually be the derivative of the smooshed or stretched version of the function on the
canonical domain. As the examples hopefully clarified, the height of this smooshed function is exactly as it was before,
but the width is compressed or expanded by a factor of:
\[
\text{smoosh} = \frac{\text{length of new interval}}{\text{length of old interval}} =
\begin{cases}
\dfrac{2\pi}{b-a} & \text{for Fourier} \\[4pt]
\dfrac{2}{b-a} & \text{for Chebyshev}
\end{cases}
\]
Because a derivative is calculating slope, and slope is rise over run, the answer is effectively now
\[
\frac{dy}{dx \cdot \text{smoosh}} = \frac{dy}{dx} \cdot \underbrace{\frac{b-a}{2 \text{ or } 2\pi}}_{\text{scale}}
\]

In other words, the overall derivative is scaled by the inverse of the width-smoosh. So to recover the true derivative
we want, dy/dx, we have to divide by this scale, which is a familiar term from our variable transformations t ↔ θ or x.
For higher derivatives:
\[
\frac{d^\nu y}{(dx \cdot \text{smoosh})^\nu} = \frac{d^\nu y}{dx^\nu} \cdot \text{scale}^\nu
\]
So we can always correct the derivative by dividing by scaleν .
To enable calculation of the scale, and to double check the user sampled their function at a correct t_n (especially
in the Chebyshev case, since cosine-spacing is easy to flub and especially confusing with the DCT-II), the functions
take the sample locations as a parameter and raise error messages with correct examples if the sampling is invalid.
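Putting the pieces together, here is a minimal sketch of the scale correction in the Fourier case, using numpy's FFT directly rather than the package (the test function is an arbitrary choice, and the wavenumber handling glosses over the Nyquist bin, which is zero here anyway):

    import numpy as np

    M, a, b = 64, 4, 8
    scale = (b - a) / (2*np.pi)                           # inverse of the width-smoosh

    t_n = np.linspace(a, b, M, endpoint=False)            # equispaced samples on [a, b)
    y = np.sin(2*np.pi * t_n / (b - a))                   # a function periodic on [a, b)

    k = np.fft.fftfreq(M, d=1/M)                          # integer wavenumbers for [0, 2*pi)
    dy_dtheta = np.fft.ifft(1j*k * np.fft.fft(y)).real    # derivative of the smooshed function
    dy_dt = dy_dtheta / scale                             # divide by scale to undo the smoosh

    assert np.allclose(dy_dt, 2*np.pi/(b - a) * np.cos(2*np.pi * t_n / (b - a)))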

9 Differentiating in the Presence of Noise
Finding the true derivative of a noisy signal is ill-posed, because random variations cause unknown (and often
dramatic) local deviations of function slope. This problem only gets worse for calculations of curvature and higher order
derivatives, so the only solution is to try to remove the noise before differentiating.

9.1 White Noise


Absent further information about a noise-generating process, we can’t assume anything about the noise’s structure,
which makes it extra challenging to remove. We instead can only rely on the aid of a central observation from the field
of signal processing:

“Every spectrum of real noise falls off reasonably rapidly as you go to infinite frequencies, or else it
would have infinite energy. But the sampling process aliases higher frequencies in lower ones, and the
folding ... tends to produce a flat spectrum. ... white noise. The signal, usually, is mainly in the lower
frequencies.” –Richard Hamming, The Art of Doing Science and Engineering[2], Digital Filters III

By use of the term “frequencies”, Hamming is implying use of the Fourier basis. He’s saying there is band-separation
of signal and noise in the frequency domain, yet another reason for its popularity. This general principle extends to
other noise reduction techniques in spirit: At bottom, all accomplish some kind of smoothing, be it moving average,
FIR filtering, Savitzky-Golay, Kalman filtering, total variation penalty, etc.

9.2 Filtering with the Fourier Basis


It’s helpful to see a picture:

[Figure: typical energy spectrum of a noisy signal, before (left) and after (right) sampling, plotted as energy versus frequency f (Hz). Signal energy sits below a cutoff frequency, noise energy decays toward higher frequencies, and sampling folds (aliases) everything above f_s/2 back into the band below it. f_s is a sampling rate of our choosing.]

In practice we expect to sample a signal frequently enough to capture fluctuations of interest in the data, so signal
energy should be concentrated in frequencies below some cutoff. Noise energy, decaying toward higher frequencies, is
added in linearly.
To reach the frequency domain, we use the FFT, which requires equispaced samples and a periodic signal, because
a discontinuity or corner causes artefacts at higher frequencies, which become impossible to model in finitely many
coefficients and impossible to distinguish from noise. The Nyquist theorem[11] tells us that we need > 2 equispaced
samples per cycle to unambiguously reconstruct a frequency, so this process creates a natural cutoff at the Nyquist
rate, f_s/2. Equispaced samples taken from frequencies slightly over this limit are best matched by frequencies slightly
under the limit, just as unit complex numbers alias: e^{j(π+ϵ)} = e^{−j(π−ϵ)}. This creates a folding pattern in the
bandlimited FFT spectrum, which distributes decaying noise energy somewhat evenly.
Notice the more we sample, the more we can concentrate the legitimate signal's energy in low frequencies, and
the more we can distribute noise energy across higher frequencies. We can then zero out or otherwise dampen the
upper part of the spectrum to chop down the noise that hasn't aliased onto the periodic signal's band! The filter
parameter is designed exactly for this. If we then inverse transform, we get a smoother signal, or we can multiply by
(jk)^ν and then inverse transform to sample the derivative of that smoother signal.
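A minimal sketch of this idea with a hard cutoff (plain numpy with an arbitrarily chosen cutoff wavenumber, not the package's filter parameter):

    import numpy as np

    M = 256
    theta = np.arange(M) * 2*np.pi / M
    rng = np.random.default_rng(0)
    y = np.sin(3*theta) + 0.1*rng.standard_normal(M)      # low-frequency signal plus white noise

    k = np.fft.fftfreq(M, d=1/M)                          # integer wavenumbers
    Y = np.fft.fft(y)
    Y[np.abs(k) > 10] = 0                                 # dampen the upper part of the spectrum

    smoothed = np.fft.ifft(Y).real                        # a smoother signal
    dy = np.fft.ifft(1j*k * Y).real                       # or the derivative of that smoother signal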

9.3 The Advantage of Spectral Smoothing
Most alternative noise-quelling techniques can only take advantage of local information, but a spectral representation
builds a function out of basis functions that span the entire domain, so every point's value takes the holistic fit
into consideration. This makes the reconstruction much more robust to perturbations than one that uses only a few
neighboring points; i.e. it's much harder to corrupt the signal so thoroughly that it can't be successfully recovered.
In the Fourier case, this has an analogy with error-correcting codes[28, 2]; notice the locations corresponding to
Hamming code parity checks constitute selections of different frequency:
[Figure: four copies of a 4 × 4 grid of the positions 1–16, labeled check #1 through check #4, each highlighting the positions covered by one Hamming-code parity check; each check selects positions in a pattern of a different frequency.]

Of course there are distinctions: In the context of continuous signals, “corruption” means (discrete representations
of) continuous numbers have slightly inaccurate values, whereas in error correcting codes each datum is a single discrete
bit which can simply be flipped. Likewise, a Fourier coefficient indicates “We need this much of the k-th basis function”,
whereas the analogous parity check indicates “There is/isn't a parity error among my bits.”
But in both cases a spot-corruption will stand out, because it appears in a particular combination of parity checks
or introduces a value that can't as easily be represented with a finite combination of smooth basis functions. In this
sense, the Fourier basis is the ideal basis for filtering high frequency noise, so long as the signal of interest is periodic,
and other methods are merely trying to approximate a lowpass filter.

9.4 Measurement versus Process Uncertainty


Noise can have a couple different sources. If noise is due to imperfect sensing, then each sample is drawn from a random
variable. By contrast, if there is uncertainty in the underlying dynamics, then we might say there is noise in the process
itself. If we sample our signal uniformly, as we do when using the Fourier basis and most other filtering methods (which
tend to be more computationally expensive to generalize to non-uniform samplings[29]), then we don't really need
to make a distinction between these noise sources, because they look the same. But if we cosine-sample our data for
expediency when using the Chebyshev basis, there will be a difference, because samples taken closer together will have
greater correlation.

[Figure, three panels: cosine-spaced noise (measurement noise in the x domain); equispaced noise (measurement noise in the θ domain, or process noise in the θ domain); cosine-sampling of equispaced noise (process noise in the x domain).]

When we sample at cosine-spaced points, ∆x decreases near the edges. In the case of measurement noise, effects
on ∆y are undiminished in these regions, so the error in ∆y/∆x increases dramatically. In the case of process noise the
situation is a little more hopeful, because its effects on ∆y naturally get smaller as ∆x shrinks.

9.5 Filtering with Polynomial Bases (is Terrible)


To avert the Gibbs phenomenon in the case of aperiodic data, or to achieve better compression (more energy represented
in fewer modes) in case we know something about how our data should look, we may prefer to use another basis, like
Chebyshev polynomials or PCA modes, for representation. All bases have “lower frequency” and “higher frequency”
elements, meaning some modes have fewer transitions between low and high values, and some have more. As with
Fourier basis representations, signal energy empirically tends to cluster in lower modes, and noise tends to be scattered
across modes, similar to the typical spectrum above.
However, smoothing aperiodic data is fundamentally more difficult than the periodic case at the edges of the domain,
because in the periodic case we can smooth across the domain boundary, whereas with an aperiodic function we only
get one-sided information about those regions. And unlike Fourier basis functions, which have uniform frequency
throughout their domain, other bases are typically nonuniform. This may be desirable if the data or noise is known to
fit a particular pattern, but in the naive case, where we don’t know the noise shape or expect white noise, nonuniform
basis functions’ variable expressive power across the domain can mismatch the situation. Chebyshev polynomials
exhibit this characteristic, getting steeper and therefore higher frequency near the boundaries, which makes them
worse at filtering and better at fitting high frequency noise in these regions, i.e. more sensitive to disruptions there.

An example of systematic edge blowup for Chebyshev derivatives in the presence of noise.

This problem is perhaps best shown mathematically in the Chebyshev-via-Fourier method, where we not only
implicitly warp the function to be y(cos(θ)) by taking a DCT (thereby treating measurement noise as if it's evenly
distributed in the θ domain), but we later also explicitly unwarp to get back to the x domain, involving division by
powers of √(1 − x²)[25, 18], which → 0 as x → ±1. In the numerically equivalent Chebyshev series-based method,
“Tight coupling between coefficients enables propagation of errors from high frequency to low frequency modes.”[25] We
see that higher-order coefficients have an impact on every other lower-order coefficient[26], and this effect is cumulative,
so a slight error in coefficient values, especially higher-order ones, compounds in a very nasty way when we differentiate.
Thus “while the representation of a function by a Chebyshev series may be most accurate near x = ±1, these results
indicate that the derivatives computed from a Chebyshev series are least accurate at the edges of the domain.”[25]
In fact, the Chebyshev basis is not the only one to suffer this weakness: other polynomial bases, e.g. the
Legendre polynomials (orthogonal without a weighting function) and the Bernstein polynomials (linearly independent
but not orthogonal), experimented with in the filtering noise notebook, are also more sensitive at the domain edges,
even when sampled at cosine-spaced points to help dampen the Runge phenomenon. This is a fundamental limitation of
polynomial-based methods in the presence of noise. Occasionally differentiation can be made to work okay by filtering
higher modes, but there is always systematic blowup in higher-order derivatives.
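For the curious, an experiment of this kind is easy to sketch with numpy.polynomial.chebyshev (an arbitrary test function and noise level, not the package's own code path):

    import numpy as np
    from numpy.polynomial import chebyshev as C

    N = 64
    x = np.cos(np.arange(N+1) * np.pi / N)                # cosine-spaced points on [-1, 1]
    rng = np.random.default_rng(0)
    y = np.exp(x)*np.sin(5*x) + 1e-3*rng.standard_normal(N+1)   # smooth signal + slight noise

    c = C.chebfit(x, y, deg=N)                            # interpolating Chebyshev series
    dy = C.chebval(x, C.chebder(c))                       # its derivative, sampled back at x

    err = np.abs(dy - np.exp(x)*(np.sin(5*x) + 5*np.cos(5*x)))
    print(err[[0, N//2, N]])                              # error tends to be largest near x = +/-1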

Notes
a. There’s a great passage in Richard Hamming’s book The Art of Doing Science and Engineering[2] where he wonders why we use the
Fourier basis so much:
“It soon became clear to me digital filter theory was dominated by Fourier series, about which theoretically I had learned
in college, and actually I had had a lot of further education during the signal processing I had done for John Tukey, who was
a professor from Princeton, a genius, and a one or two day a week employee of Bell Telephone Laboratories. For about ten
years I was his computing arm much of the time.
Being a mathematician I knew, as all of you do, that any complete set of functions will do about as good as any other
set at representing arbitrary functions. Why, then, the exclusive use of the Fourier series? I asked various electrical engineers
and got no satisfactory answers. One engineer said alternating currents were sinusoidal, hence we used sinusoids, to which I
replied it made no sense to me. So much for the usual residual education of the typical electrical engineer after they have left
school!

So I had to think of basics, just as I told you I had done when using an error-detecting computer. What is really going
on? I suppose many of you know what we want is a time-invariant representation of signals, since there is usually no natural
origin of time. Hence we are led to the trigonometric functions (the eigenfunctions of translation), in the form of both Fourier
series and Fourier integrals, as the tool for representing things.
Second, linear systems, which is what we want at this stage, also have the same eigenfunctions—the complex exponentials
which are equivalent to the real trigonometric functions. Hence a simple rule: if you have either a time-invariant system or a
linear system, then you should use the complex exponentials.
On further digging in to the matter I found yet a third reason for using them in the field of digital filters. There is a
theorem, often called Nyquist’s sampling theorem (though it was known long before and even published by Whittaker, in
a form you can hardly realize what it is saying, even when you know Nyquist’s theorem), which says that if you have a
band-limited signal and sample at equal spaces at a rate of at least two in the highest frequency, then the original signal can
be reconstructed from the samples. Hence the sampling process loses no information when we replace the continuous signal
with the equally spaced samples, provided the samples cover the whole real line. The sampling rate is often known as the
Nyquist rate after Harry Nyquist, also of servo stability fame, as well as other things [also reputed to have been just a really
great guy who often had productive lunches with his colleagues, giving them feedback and asking questions that brought out
the best in them]. If you sample a non-band-limited function, then the higher frequencies are “aliased” into lower ones, a word
devised by Tukey to describe the fact that a single high frequency will appear later as a single low frequency in the Nyquist
band. The same is not true for any other set of functions, say powers of t. Under equally spaced sampling and reconstruction
a single high power of t will go into a polynomial (many terms) of lower powers of t.
Thus there are three good reasons for the Fourier functions: (1) time invariance, (2) linearity, and (3) the reconstruction
of the original function from the equally spaced samples is simple and easy to understand.
Therefore we are going to analyze the signals in terms of the Fourier functions, and I need not discuss with electrical
engineers why we usually use the complex exponents as the frequencies instead of the real trigonometric functions. [It’s down
to convenience, really.] We have a linear operation, and when we put a signal (a stream of numbers) into the filter, then out
comes another stream of numbers. It is natural, if not from your linear algebra course then from other things such as a course
in differential equations, to ask what functions go in and come out exactly the same except for scale. Well, as noted above,
they are the complex exponentials; they are the eigenfunctions of linear, time-invariant, equally spaced sampled systems.
Lo and behold, the famous transfer function [contains] exactly the eigenvalues of the corresponding eigenfunctions! Upon
asking various electrical engineers what the transfer function was, no one has ever told me that! Yes, when pointed out to
them that it is the same idea they have to agree, but the fact it is the same idea never seemed to have crossed their minds!
The same, simple idea, in two or more different disguises in their minds, and they knew of no connection between them! Get
down to the basics every time!”
In that spirit, with Patron Saint Hamming watching over us, let's continue: back to subsection 1.1.

References
[1] Lebesgue Integrable, https://mathworld.wolfram.com/LebesgueIntegrable.html
[2] Hamming, R., 1996, The Art of Doing Science and Engineering
[3] Pego, B., Simplest proof of Taylor's theorem, https://math.stackexchange.com/a/492165/278341
[4] Disintegration By Parts, Why do Fourier transforms use complex numbers?, https://math.stackexchange.com/a/1293127/278341
[5] Yagle, A., 2005, https://web.eecs.umich.edu/~aey/eecs206/lectures/fourier2.pdf
[6] https://math.stackexchange.com/questions/1105265/why-do-fourier-series-work
[7] Xue, S., 2017, Convergence of Fourier Series, https://math.uchicago.edu/~may/REU2017/REUPapers/Xue.pdf
[8] Derivation of Fourier Series, http://lpsa.swarthmore.edu/Fourier/Series/DerFS.html
[9] Sego, D., Demystifying Fourier analysis, https://dsego.github.io/demystifying-fourier/
[10] Nakagome, S., Fourier Transform 101 — Part 4: Discrete Fourier Transform, https://medium.com/sho-jp/fourier-transform-101-part-4-discrete-fourier-transform-8fc3fbb763f3
[11] Oppenheim, A. & Willsky, A., 1996, Signals and Systems, 2nd Ed.
[12] Brunton, S., The Fourier Transform and Derivatives, https://www.youtube.com/watch?v=d5d0ORQHNYs
[13] Trefethen, N., 2000, Spectral Methods in Matlab, Chapter 4, https://epubs.siam.org/doi/epdf/10.1137/1.9780898719598.ch4
[14] Kutz, J.N., 2023, Data-Driven Modeling & Scientific Computation, Ch. 11, https://faculty.washington.edu/kutz/kutz_book_v2.pdf
[15] Discrete Fourier Transform, https://numpy.org/doc/2.1/reference/routines.fft.html
[16] Bristow-Johnson, R., 2014, About Discrete Fourier Transform vs. Discrete Fourier Series, https://dsp.stackexchange.com/a/18931/40873
[17] Johnson, S., 2011, Notes on FFT-based differentiation, https://math.mit.edu/~stevenj/fft-deriv.pdf
[18] Trefethen, N., 2000, Spectral Methods in Matlab, Chapter 8, https://epubs.siam.org/doi/epdf/10.1137/1.9780898719598.ch8
[19] Burns, K., et al., 2020, Dedalus: A flexible framework for numerical simulations with spectral methods, https://www.researchgate.net/publication/340905766_Dedalus_A_flexible_framework_for_numerical_simulations_with_spectral_methods
[20] https://docs.scipy.org/doc/scipy/reference/generated/scipy.fft.dct.html
[21] https://docs.scipy.org/doc/scipy/reference/generated/scipy.fft.dst.html
[22] Giesen, F., DCT-II vs. KLT/PCA, https://www.farbrausch.de/~fg/articles/dct_klt.pdf
[23] Royi, 2025, Why does the DCT-II have better energy compaction than DCT-I?, https://dsp.stackexchange.com/a/96197/40873
[24] Harris, C., 2009, chebder, https://github.com/numpy/numpy/blob/v2.2.0/numpy/polynomial/chebyshev.py#L874-L961
[25] Breuer, K. & Everson, R., 1990, On the errors incurred calculating derivatives using Chebyshev polynomials, https://doi.org/10.1016/0021-9991(92)90274-3
[26] Komarov, P., 2025, Chebyshev Series Derivative in terms of Coefficients, https://scicomp.stackexchange.com/q/44939/48402
[27] Power Rule, https://mathworld.wolfram.com/PowerRule.html
[28] Sanderson, G., 2020, But what are Hamming codes? The origin of error correction, https://www.youtube.com/watch?v=X8jsijhllIA
[29] 2012, Savitzky-Golay smoothing filter for not equally spaced data, https://dsp.stackexchange.com/a/9494/40873
