Spectral Derivatives
Pavel Komarov
January 8, 2025
One of the happiest accidents in all math is the ease of transforming a function to and taking derivatives in the
Fourier (i.e. the frequency) domain. But in order to exploit this extraordinary fact without serious artefacting, and in
order to be able to use a computer, we need quite a bit of extra knowledge and care.
This document sets out the math behind the spectral-derivatives Python package, all the way down to the
bones, as much as I can manage. I try to get into the real whys behind what we're doing here, touching on fundamental
signal processing and calculus concepts as necessary, and building upwards to more general cases.
Contents
1 Bases
 1.1 The Fourier Basis
2 Transforms
 2.1 The Fourier Transform
 2.2 A Whole Family
7 Multidimensionality
 7.1 Dimensions Together versus In Series
8 Arbitrary Domains
 8.1 Fourier on [a, b)
 8.2 Chebyshev on [a, b]
 8.3 Accounting for Smoosh
1 Bases
A basis is a set of functions, call them {ξk }, that can be linearly combined to produce other functions. Often these
are chosen to be orthogonal, meaning that if we take the “inner product” of one function from the set with itself, we
get back a constant (often normalized to be 1), and if we take the inner product of one of these functions with a
different member of the set, we get back 0. In this sense the members of an orthogonal basis set are like perpendicular
directions on a graph.
The inner product between two functions f and g is a generalization of the inner product between vectors, where
instead of summing over a finite number of discrete entries, we integrate over infinitely many infinitesimally-separated
points in the domain. We define it as:
$$\langle f, g\rangle = \int_a^b f(x)\,g(x)\,dx$$
or, for complex-valued functions, with a conjugate so that matched wiggles multiply to something positive:
$$\langle f, g\rangle = \overline{\langle g, f\rangle} = \int_a^b f(x)\,\overline{g(x)}\,dx$$
Note that if we set a and b at ±∞, this integral could diverge. If it doesn’t diverge with infinite bounds, we say the
argument is “Lebesgue integrable”[1]. Some of what we’ll do only makes sense for this class of functions, so be aware.
$$\sin(\omega) = \omega - \frac{\omega^3}{3!} + \frac{\omega^5}{5!} - \frac{\omega^7}{7!} + \dots$$
$$\cos(\omega) = 1 - \frac{\omega^2}{2!} + \frac{\omega^4}{4!} - \frac{\omega^6}{6!} + \dots$$
Notice all of the even-power terms appear with alternating sign as in the cosine expansion, and the odd-power
terms appear with alternating sign as in the sine expansion, but with an extra j multiplied in.
The presence of complex numbers to make this work can be confusing at first, but don’t be scared! All we’re really
doing is using a compressed representation of a sine plus a cosine, where the real and imaginary parts (orthogonal in
the complex plane, and therefore independent and non-interfering) allow us to describe the contributions of sine and
cosine simultaneously. In fact, Joseph Fourier originally used only real trigonometric functions[4], and it wasn’t until
later someone decided it would be easier to work with complex exponentials. Later (subsection 2.1) we’ll see that for
real signals all the complex numbers cancel, leaving only a real sine and real cosine, which when added together make
a single, phase-shifted sinusoid! So think of ejω as oscillations at a particular frequency, ω.
If we inner product mismatched wiggles, they misalign and integrate to 0, but if we inner product matched wiggles,
they align, multiply to 1 because of the complex conjugate, and integrate to 2π over a period.
2 Transforms
We can reconstruct a function from a linear combination of basis functions:
$$f(x) = \sum_{k=0}^{M-1} c_k\,\xi_k(x), \qquad x \in [a, b]$$
where M is the number of basis functions we’re using in our reconstruction and k iterates through them. This is
essentially a recipe, which tells us how much of each basis function ξk is present in the signal f on a domain between
a and b.
We can find the quantities {ck } by taking the inner product of the above with each of the basis functions to produce
a system of M equations, then solving. In the special case where {ξk } are orthogonal, the system’s equations untangle,
and we get the simple relationship:
$$c_k = \frac{\langle \xi_k, f\rangle}{\langle \xi_k, \xi_k\rangle} = \frac{\int_a^b f(x)\,\xi_k(x)\,dx}{\|\xi_k\|_2^2}$$
The set of numbers {ck } is now an alternative representation of the original function. In some sense it’s equally
descriptive, so long as we know which basis we’re using to reconstruct. The function has been transformed, completely
analogous to changing coordinate systems in linear algebra, where we can express vector f⃗ in terms of a new orthogonal
basis $(\vec\xi_0, \vec\xi_1)$ instead of axis-aligned unit vectors $(\vec e_0, \vec e_1)$ via $\vec f = \frac{\langle\vec\xi_0,\vec f\rangle}{\|\vec\xi_0\|_2^2}\vec\xi_0 + \frac{\langle\vec\xi_1,\vec f\rangle}{\|\vec\xi_1\|_2^2}\vec\xi_1$:
[Figure: the vector $\vec f$ decomposed into its projections $\mathrm{proj}_{\vec\xi_0}\vec f$ and $\mathrm{proj}_{\vec\xi_1}\vec f$ onto the orthogonal basis vectors $\vec\xi_0$, $\vec\xi_1$, shown alongside the axis-aligned unit vectors $\vec e_0$, $\vec e_1$.]
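To make that projection recipe concrete, here is a small numerical sketch (plain numpy, with an arbitrarily chosen orthogonal basis):

import numpy as np

# Express f in an orthogonal (but not axis-aligned, not normalized) basis via projections.
f = np.array([3.0, 2.0])
xi0 = np.array([2.0, 1.0])
xi1 = np.array([-1.0, 2.0])                  # orthogonal to xi0
c0 = xi0 @ f / (xi0 @ xi0)                   # <xi_0, f> / ||xi_0||^2
c1 = xi1 @ f / (xi1 @ xi1)
print(np.allclose(c0 * xi0 + c1 * xi1, f))   # True: the recipe reconstructs f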
The {ck } are often said to live in another “domain”, although we have to be careful with this terminology, because
it technically refers to a “connected” set, not just a collection of M things. To be precise, some authors use “series”
to describe {ck } instead. However, it is possible for members of the basis set to be related through a continuous
parameter which in some sense makes the set dense, even in cases where we only take discrete members of this more
general set to be our basis set for a particular scenario. This is the case for the Fourier basis, where we choose ω ∈ R,
and hence ω really can become a new domain.
$$f(x) = a_0 + \sum_{k=1}^{\infty}\left[a_k\cos(k\omega_0 x) + b_k\sin(k\omega_0 x)\right]$$
where
• f is periodic with fundamental frequency ω0 , so the k th frequency becomes k · ω0 .
• ak and bk are coefficients describing how much cosine and sine to add in, respectively.
• k goes up to ∞ because in general we need an infinite number of ever-higher-frequency sinusoids to reconstruct
the function with perfect fidelity.
∗ It’s worth considering how weird it is this works to express arbitrary functions, even non-smooth ones (so long as they meet the Dirichlet
conditions[11], i.e. aren’t pathological cases), a fact so counter-intuitive that Joseph Lagrange publicly declared Fourier was wrong at a
meeting of the Paris Academy in 1807[5] and rejected Fourier’s paper, which then went unpublished until after Lagrange died![11] It’s
valuable to ask why this works[6] and sift through some analysis.[7]
Let's now use $\cos(x) = \frac{e^{jx}+e^{-jx}}{2}$ and $\sin(x) = \frac{e^{jx}-e^{-jx}}{2j}$, which can be verified by manipulating Euler's formula, Equation 1.
$$f(x) = a_0 + \sum_{k=1}^{\infty}\left(a_k\frac{e^{jk\omega_0 x}+e^{-jk\omega_0 x}}{2} + b_k\frac{e^{jk\omega_0 x}-e^{-jk\omega_0 x}}{2j}\right)$$
$$= a_0 + \sum_{k=-\infty}^{-1}\left(\frac{a_{-k}}{2} - \frac{b_{-k}}{2j}\right)e^{jk\omega_0 x} + \sum_{k=1}^{\infty}\left(\frac{a_k}{2} + \frac{b_k}{2j}\right)e^{jk\omega_0 x} = \sum_{k=-\infty}^{\infty}c_k\,e^{jk\omega_0 x}$$
So if we choose $c_0 = a_0$ and $c_k = \overline{c_{-k}} = \frac{a_k}{2} + \frac{b_k}{2j}$, then the complex exponential formulation is exactly equivalent to
the trigonometric formulation[8]. That is, we can choose complex $c_k$ such that when multiplied by complex exponentials,
we get back only real signal! Essentially, the relative balance of real and imaginary parts in $c_k$ affects how much cosine and
sine are present at the $k$th frequency, thereby accomplishing a phase shift[9]. Without accounting for phase shifts, we
would only be able to model symmetric signals!
If instead of a fundamental frequency $\omega_0 = \frac{2\pi}{T}$, where T is a period of repetition, the signal contains dense
frequencies (because it has no repetition, T → ∞, ω0 → 0), and if we care about a domain of the entire set of R, then
it makes more sense to express the transformed coefficients as a function in ω and to make both our inner product and
reconstruction expression integrals from −∞ to +∞:
$$\hat f(\omega) = \int_{-\infty}^{\infty} f(x)\,e^{-j\omega x}\,dx = \mathcal F\{f(x)\} \qquad\qquad (2)$$
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat f(\omega)\,e^{j\omega x}\,d\omega = \mathcal F^{-1}\{\hat f(\omega)\}$$
where the hat $\hat\circ$ represents a function in the Fourier domain, and the $\frac{1}{2\pi}$ is a scaling factor that corrects for the fact
the inner product of a Fourier basis function with itself integrates to $2\pi$ over a period instead of to 1 as we need for
orthonormality.
Just like the {ck }, fˆ(ω) can be complex, but if the original f (x) is real, then fˆ’s complexity will perfectly interact
with the complex exponentials to produce only a real function in the reconstruction.
[Diagram: the family of Fourier transforms. Continuous periodic x(t) ↔ Fourier Series coefficients $c_k$ (FS); continuous aperiodic x(t) ↔ Fourier Transform $X(j\omega)$ (FT, FT$^{-1}$); discrete aperiodic x[n] ↔ DTFT $X(e^{j\omega})$ (DTFT, DTFT$^{-1}$); discrete periodic x[n] ↔ DFT $X[k]$ (DFT, DFT$^{-1}$).]
Note that, following a more signal-processing-ish convention[11], the function we’re transforming is now called x,
and the independent variable, since it can no longer be x, is named t. For discrete signals, we use independent variable
n in square brackets.
Here FS stands for “Fourier Series”, which is the first situation covered above. FT stands for “Fourier Transform”,
which is given by the integral pair, Equation 2. But these are not the only possibilities! DTFT stands for “Discrete
Time Fourier Transform”, where the signal we want to analyze is discrete but the transform is continuous. And finally
DFT stands for “Discrete Fourier Transform”, not to be confused with the DTFT, which we use when both the original
and transformed signals are sampled.
All of these can be considered Fourier transforms, but often when people talk about the canonical “Fourier Trans-
form”, they are referring to the continuous, aperiodic case in the upper righthand cell.
The notation of all these different functions and transforms is easy to mix up and made all the more confusing by
the reuse of symbols. But it’s important to keep straight which situation we’re in. I can only apologize. For more on
all these, see [11].
Differentiating the reconstruction integral in Equation 2 brings down a factor of $j\omega$: $\mathcal F\{\frac{d}{dx}f(x)\} = j\omega\,\hat f(\omega)$. So a derivative in the x domain can be accomplished by a multiplication in the frequency domain. We can raise to
higher derivatives simply by multiplying by $j\omega$ more times.
This is great because taking derivatives in the spatial domain is actually pretty hard, especially if we’re working with
discrete samples of a signal, whereas taking the derivative this way in the frequency domain, the spectral derivative,
gives us much better fidelity.[13, 14] The cost is that we have to do a Fourier transform and inverse Fourier transform
to sandwich the actual differentiation, but there is an O(N log N ) algorithm to accomplish the DFT (subsection 2.2
and Equation 3) for discrete signals called the Cooley-Tukey algorithm, also known as the Fast Fourier Transform
(FFT)[14].
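As a bare-bones sketch of that sandwich (illustrative only, not the package's interface, and glossing over the Nyquist-term care worked out in subsection 3.1), a smooth periodic signal sampled on [0, 2π) can be differentiated in a few lines of numpy:

import numpy as np

M = 64
theta = 2 * np.pi * np.arange(M) / M
y = np.exp(np.sin(theta))                      # a smooth, periodic test signal
Y = np.fft.fft(y)                              # to the frequency domain
k = np.fft.fftfreq(M, d=1.0 / M)               # wavenumbers 0..M/2-1, -M/2..-1
dy = np.fft.ifft(1j * k * Y).real              # multiply by jk, come back
exact = np.cos(theta) * np.exp(np.sin(theta))  # analytic derivative for comparison
print(np.max(np.abs(dy - exact)))              # tiny (machine-precision level): spectral accuracy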
The DFT pair (Equation 3, subsection 2.2) is
$$Y_k = \sum_{n=0}^{M-1} y_n\,e^{-j\frac{2\pi}{M}nk}, \qquad y_n = \frac 1M\sum_{k=0}^{M-1} Y_k\,e^{j\frac{2\pi}{M}nk}$$
where
• n iterates samples in the original domain (often spatial)
• k iterates samples in the frequency domain (wavenumbers)
• M is the number of samples in the signal, often given as N by other sources[15], but I’ll use N for something
else later and want to be consistent
• y denotes the signal in its original domain
• Y denotes the signal in the frequency domain
3.1.2 Interpolation
I now quote Steven Johnson[17], with some of my own symbols and notation sprinkled in:
“In order to compute derivatives like y′(θ), we need to do more than express yn. We need to use the DFT−1 expression to define a continuous interpolation between the samples yn—this is called trigonometric interpolation—and then differentiate this interpolation. At first glance, interpolating seems very straightforward: one simply evaluates the DFT−1 expression at non-integer n ∈ R. This indeed defines an interpolation, but it is not the only interpolation, nor is it the best interpolation for this purpose. The reason there is more than one interpolation is due to aliasing: any term $e^{+j\theta_n k}Y_k$ in the DFT−1 can be replaced by $e^{+j\theta_n(k+mM)}Y_k$ for any integer m and still give the same samples yn, since $e^{j\frac{2\pi}{M}nmM} = e^{j2\pi nm} = 1$ for any integers m and n. Essentially, adding the mM term to k means that the interpolated function y(θ) just oscillates m extra times between the sample points, which has no effect on yn but has a huge effect on derivatives. To resolve this ambiguity, one imposes additional criteria—e.g. a bandlimited spectrum and/or minimizing some derivative of the interpolated y(θ)”
We can now posit a slightly more general formula for the underlying continuous, periodic (over interval length M)
signal:
$$y(\theta) = \frac 1M\sum_{k=0}^{M-1} Y_k\,e^{j\theta(k+m_kM)}, \qquad m_k \in \mathbb Z$$
“In order to uniquely determine the $m_k$, a useful criterion is that we wish to oscillate as little as possible between the sample points $y_n$. One way to express this idea is to assume that y(θ) is bandlimited to frequencies $|k + m_kM| \le \frac M2$. Another approach, that gives the same result ... is to minimize the mean-square slope”†
$$\frac{1}{2\pi}\int_0^{2\pi}|y'(\theta)|^2\,d\theta = \frac{1}{2\pi}\int_0^{2\pi}\left|\frac 1M\sum_{k=0}^{M-1}j(k+m_kM)Y_k\,e^{j\theta(k+m_kM)}\right|^2 d\theta$$
$$= \frac{1}{2\pi M^2}\int_0^{2\pi}\left(\sum_{k=0}^{M-1}j(k+m_kM)Y_k\,e^{j\theta(k+m_kM)}\right)\overline{\left(\sum_{k'=0}^{M-1}j(k'+m_{k'}M)Y_{k'}\,e^{j\theta(k'+m_{k'}M)}\right)}\,d\theta$$
$$= \frac{1}{M^2}\sum_{k=0}^{M-1}\sum_{k'=0}^{M-1}(k+m_kM)(k'+m_{k'}M)\,Y_k\overline{Y_{k'}}\,\underbrace{\frac{1}{2\pi}\int_0^{2\pi}e^{j\theta(k+m_kM)}e^{-j\theta(k'+m_{k'}M)}\,d\theta}_{\substack{0\ \text{if } k+m_kM \ne k'+m_{k'}M \iff k \ne k'\ \text{for } 0\le k,k'<M\\ 1\ \text{if } k=k'}}$$
$$= \frac{1}{M^2}\sum_{k=0}^{M-1}|Y_k|^2(k+m_kM)^2$$
† It's due to this ambiguity and constraint that spectral methods are only suitable for smooth functions!
We now seek to minimize this by choosing mk for each k. Only the last term depends on mk , so it’s sufficient to
minimize only this:
$$\begin{aligned}\underset{m_k}{\text{minimize}}\quad & (k+m_kM)^2\\ \text{s.t.}\quad & 0\le k < M\\ & m_k \in \mathbb Z\end{aligned}$$
This problem actually admits of good ol’ calculus plus some common sense:
$$\frac{d}{dm_k}(k+m_kM)^2 = 2(k+m_kM)M = 0 \longrightarrow m_k^* = \frac{-k}{M} \in (-1, 0]$$
where $^*$ denotes optimality. But we additionally need to choose $m_k \in \mathbb Z$. Let's plot it to see what's going on.
[Figure: the cost $(k+m_kM)^2$ as a parabola in $m_k$, with its vertex at $m_k^* = -k/M \in (-1, 0]$ and the feasible (integer) costs marked at $m_k = -1$ and $m_k = 0$.]
As we change the values of M and k, the parabola shifts around, getting taller for larger M and shifting leftward
as k → M .
We can see that for $k \in [0, \frac M2)$, the $m_k = 0$ solution is lower down the cost curve, and for $k \in (\frac M2, M)$, the $m_k = -1$ solution is more optimal. “If $k = \frac M2$ (for even M), however, there is an ambiguity: either $m_k = 0$ or $-1$ gives the same value $(k + m_kM)^2 = (\frac M2)^2$. For this $Y_{M/2}$ term (the “Nyquist” term), we can arbitrarily split up the $Y_{M/2}$ term between $m = 0$ [$j\frac M2\theta$, positive frequency] and $m = -1$ [$j(\frac M2 - M)\theta = -j\frac M2\theta$, negative frequency]:”
$$Y_{M/2}\left(u\,e^{j\frac M2\theta} + (1-u)\,e^{-j\frac M2\theta}\right)$$
where $u \in \mathbb C$ s.t. at sample points $\theta_n$ we get $Y_{M/2}\big(u\,e^{j\frac M2\frac{2\pi}{M}n} + (1-u)\,e^{-j\frac M2\frac{2\pi}{M}n}\big) = Y_{M/2}\big(u\,\underbrace{e^{j\pi n}}_{(-1)^n} + (1-u)\,\underbrace{e^{-j\pi n}}_{(-1)^n}\big) = Y_{M/2}(-1)^n$ “and so recover the DFT$^{-1}$.”
If we use the above in the mean-squared slope derivation instead of $Y_k e^{j\theta(k+m_kM)}$ and $Y_{k'}e^{j\theta(k'+m_{k'}M)}$, then the integral portion becomes:
$$\frac{1}{2\pi}\int_0^{2\pi}Y_{M/2}\overline{Y_{M/2}}\left(u\,e^{j\frac M2\theta} + (1-u)\,e^{-j\frac M2\theta}\right)\overline{\left(u\,e^{j\frac M2\theta} + (1-u)\,e^{-j\frac M2\theta}\right)}\,d\theta$$
$$= \frac{1}{2\pi}|Y_{M/2}|^2\left(|u|^2\,2\pi + |1-u|^2\,2\pi\right) = |Y_{M/2}|^2\left(|u|^2 + |1-u|^2\right)$$
where the cross terms $u\overline{(1-u)}\,e^{jM\theta}$ and $(1-u)\bar u\,e^{-jM\theta}$ have dropped out
because integrating something periodic over a multiple of its period yields its mean, which is 0 in this case.
We now know that the contribution to the mean-squared slope from the $\frac M2$th term is $\propto |u|^2 + |1-u|^2$. What's the optimal u?
$$\frac{d}{du}\left(|u|^2 + |1-u|^2\right) = 2u - 2(1-u) = 0 \longrightarrow u = \frac12$$
So “the $Y_{M/2}$ term should be equally split between the frequencies $\pm\frac M2\theta$, giving a $\cos(\frac M2\theta)$ term.” Note that if M is odd, there is no troublesome $\frac M2$ term like this, but later we'll use the Discrete Cosine Transform[20] type I (DCT-I), which is equivalent to the FFT with even M and $Y_k = Y_{M-k}$, so we do have to worry about the Nyquist term.
Now if we put it all together we get “the unique “minimal-oscillation” trigonometric interpolation of order M”:
$$y(\theta) = \frac 1M\left[Y_0 + \sum_{0<k<\frac M2}\left(Y_k\,e^{jk\theta} + Y_{M-k}\,e^{-jk\theta}\right) + Y_{M/2}\cos\!\left(\frac M2\theta\right)\right]\qquad(4)$$
“As a useful side effect, this choice of trigonometric interpolation has the property that real-valued samples $y_n$ (for which $Y_0$ is real and $Y_{M-k} = \overline{Y_k}$) will result in a purely real-valued interpolation y(θ) for all θ.”
Differentiating this interpolation with respect to θ and evaluating at $\theta_n = \frac{2\pi}{M}n$, $n \in \mathbb Z$, we get:
$$y_n' = \frac 1M\left[\sum_{0<k<\frac M2} jk\left(Y_k\,e^{jk\frac{2\pi}{M}n} - Y_{M-k}\,e^{-jk\frac{2\pi}{M}n}\right) - \frac M2 Y_{M/2}\sin(\pi n)\right] = \frac 1M\sum_{k=0}^{M-1}Y_k'\,e^{j\frac{2\pi}{M}kn}$$
where
$$Y_k' = \begin{cases} jk\cdot Y_k & k < \frac M2\\ 0 & k = \frac M2\\ j(k-M)\cdot Y_k & k > \frac M2\end{cases}$$
The $k > \frac M2$ case comes from substituting $k_{new} = M - k_{old}$ for $0 < k_{old} < \frac M2$ (so $\frac M2 < k_{new} < M$): $-jk_{old}\cdot Y_{M-k_{old}} \rightarrow -j(M-k_{new})\cdot Y_{k_{new}} = j(k_{new}-M)\cdot Y_{k_{new}}$.
Easy! Now let’s do the second derivative:
$$\frac{d^2}{d\theta^2}y(\theta) = \frac 1M\left[\sum_{0<k<\frac M2}(jk)^2\left(Y_k\,e^{jk\theta} + Y_{M-k}\,e^{-jk\theta}\right) - \left(\frac M2\right)^2 Y_{M/2}\cos\!\left(\frac M2\theta\right)\right]$$
And again evaluating at $\theta_n = \frac{2\pi}{M}n$, $n \in \mathbb Z$:
$$y_n'' = \frac 1M\left[\sum_{0<k<\frac M2}(jk)^2\left(Y_k\,e^{jk\frac{2\pi}{M}n} + Y_{M-k}\,e^{-jk\frac{2\pi}{M}n}\right) - \left(\frac M2\right)^2 Y_{M/2}(-1)^n\right] = \frac 1M\sum_{k=0}^{M-1}Y_k''\,e^{j\frac{2\pi}{M}kn}$$
where
$$Y_k'' = \begin{cases}(jk)^2\cdot Y_k & k < \frac M2\\ \left(j\frac M2\right)^2\cdot Y_k & k = \frac M2\\ (j(k-M))^2\cdot Y_k & k > \frac M2\end{cases}\qquad\text{or equivalently}\qquad Y_k'' = \begin{cases}(jk)^2\cdot Y_k & k \le \frac M2\\ (j(k-M))^2\cdot Y_k & k > \frac M2\end{cases}$$
It’s important to realize “this [second derivative] procedure is not equivalent to performing the spectral first-
derivative procedure twice (unless M is odd so that there is no YM/2 term) because the first derivative operation omits
the YM/2 term entirely.”[17]
We can repeat for higher derivatives, but the punchline is that for odd derivatives the $\frac M2$ term goes away‡, and for even derivatives it comes back. In general:
$$Y_k^{(\nu)} = \begin{cases}(jk)^\nu\cdot Y_k & k < \frac M2\\ \left(j\frac M2\right)^\nu\cdot Y_k & k = \frac M2 \text{ and } \nu \text{ even}\\ 0 & k = \frac M2 \text{ and } \nu \text{ odd}\\ (j(k-M))^\nu\cdot Y_k & k > \frac M2\end{cases}\qquad(5)$$
This has definite echoes of the standardly-given, continuous-time case covered in section 3, but it’s emphatically
not as simple as just multiplying by jω or even by jk. However, the final answer is thankfully super compact to
represent in math and in code.
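For instance, Equation 5 might be wrapped up like the following sketch (fourier_spectral_derivative is a hypothetical helper for illustration, not the package's API):

import numpy as np

def fourier_spectral_derivative(y, nu=1):
    """nu-th spectral derivative of real, periodic samples y taken on [0, 2*pi)."""
    M = len(y)
    Y = np.fft.fft(y)
    k = np.fft.fftfreq(M, d=1.0 / M)          # [0, 1, ..., M/2-1, -M/2, ..., -1], i.e. k or k - M
    factor = (1j * k) ** nu
    if M % 2 == 0:                            # the Nyquist bin only exists for even M
        factor[M // 2] = 0 if nu % 2 else (1j * M / 2) ** nu   # Equation 5's special case
    return np.fft.ifft(factor * Y).real

# e.g. the second derivative of exp(sin(theta)):
M = 64
theta = 2 * np.pi * np.arange(M) / M
d2y = fourier_spectral_derivative(np.exp(np.sin(theta)), nu=2)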
3.2 Limitations
So far it has all been good news, but there is a serious caveat to using the Fourier basis, especially for derivatives.
Although a Fourier transform tends to have more “mass” at lower frequencies and fall off as we go to higher ones
(otherwise the reconstruction integral would diverge), and therefore we can get really great reconstructions by leaving
off higher modes[14] (or equivalently only sampling and transforming M components), we in fact need all the infinite
modes to reconstruct an arbitrary signal[11]. Even then, the Fourier basis can not represent true discontinuities nor
non-smooth corners, instead converging “almost everywhere”, which is math speak for the “measure” or volume of the
set where it doesn’t work being 0, meaning it only doesn’t work at the discontinuities or corners themselves.[11]
If there are discontinuities or corners, we get what’s called the Gibbs Phenomenon[11], essentially overshoot as the
set of basis functions tries to fit a sudden change. These extra wiggles are bad news for function approximation but
even worse news for taking derivatives: If we end up on one of those oscillations, the slope might wildly disagree with
that of the true function!
This is a bigger problem than it may first appear, because when we do this on a computer, we’re using the DFT,
which implicitly periodically extends the function (subsection 2.2, Equation 3). So we not only need the function to
have no jumps or corners internal to its domain; we need it to match up smoothly at the edges of its domain too!
This rules out the above spectral method for all but “periodic boundary conditions”[14]. But if the story ended
right there, I wouldn’t have thought it worth building this package.
[Figure: the unit circle in the complex plane, with $z = e^{j\theta}$ and $\bar z = z^{-1} = e^{-j\theta}$ marked; $x = \cos(\theta) = \mathrm{Re}\{z\}$ is the shadow cast on the real axis.]
$$T_0(x) = \mathrm{Re}\{z^0\} = 1$$
$$T_1(x) = \mathrm{Re}\{z^1\} = \frac12\left(e^{j\theta}+e^{-j\theta}\right) = \cos(\theta) = x$$
$$T_2(x) = \frac12\left(e^{j2\theta}+e^{-j2\theta}\right) = \cos(2\theta)$$
$$\text{but also } = \frac12\underbrace{\left(z^2 + 2 + z^{-2}\right)}_{\text{perfect square}} - 1 = \frac12\left(z+z^{-1}\right)^2 - 1 = 2\Big(\underbrace{\tfrac{z+z^{-1}}{2}}_{\cos(\theta)}\Big)^2 - 1 = 2x^2 - 1$$
$$T_3(x) = \frac12\left(e^{j3\theta}+e^{-j3\theta}\right) = \cos(3\theta)$$
$$\text{but also } = \frac12\left(z+z^{-1}\right)^3 - \frac32\left(z+z^{-1}\right) = 4x^3 - 3x$$
$$\dots$$
Note the set of {ak } is for k ∈ {0, ...N } and therefore has cardinality N + 1.
Relationship of Chebyshev domain and Fourier Domain, from [19]. Notice
the cosines are horizontally flipped. The authors use n instead of k, which is
common for Chebyshev polynomials (e.g. [18]), but I prefer k to enumerate
basis modes, for consistency.
However, there is something we can do about the Runge phenomenon: By clustering fit points at the edges of the
domain, the wild wobbles go away.
If we take xn = cos(θn ) with θn equispaced, n ∈ {0, ...N }, then we get a very natural clustering at the edges of
[−1, 1]. What’s more, if we have equispaced θn and a reconstruction expression built up out of sinusoids, we’re back
in a Fourier paradigm (at least in variable θ) and can exploit the efficiency of the FFT, or, better, the discrete cosine
and sine transforms![20, 21]
Notice too that the polynomials/projected cosines are asymmetrical, so we can naturally use this basis to model arbitrary, lopsided functions without having to worry about phase shifts like we did for a Fourier basis of discrete harmonics.
[Figure: $e^{\cos(\theta)}\sin(7\cos(\theta))$, $\theta \in [-\pi, 2\pi]$ — illustration of implicit function manipulations in the first two steps of the algorithm. The edges of the aperiodic function can be made to match by “periodic extension”, but this operation alone only fixes discontinuity; corners are created at 0 and π, resulting in Gibbs phenomenon when we frequency transform. Warping stretches corners into smooth transitions.]
There are still a lot of details left to be worked out here, which we’ll tackle in sequence.
sampling we want, so it is actually our best choice here, and it mercifully comes with the added benefit of being the
least confusing variant to derive from the DFT:
Say we periodically extend y, i.e. stack y(θ), θ ∈ [0, π] next to a horizontal flip of itself on (π, 2π), and then sample at the canonical DFT points, $\theta_n = \frac{2\pi}{M}n$, n ∈ {0, ...M − 1} (Equation 3). We get:
$$y(\theta) = \frac 1M\left[Y_0 + \sum_{0<k<\frac M2}\left(Y_k\,e^{jk\theta} + Y_{M-k}\,e^{-jk\theta}\right) + Y_{M/2}\cos\!\left(\frac M2\theta\right)\right]$$
$$= \frac 1M\left[Y_0 + 2\sum_{k=1}^{N-1}Y_k\underbrace{\frac{e^{jk\theta}+e^{-jk\theta}}{2}}_{\cos(k\theta)} + Y_N\cos(N\theta)\right]\qquad(8)$$
At samples $\theta_n = \frac{2\pi}{M}n = \frac{\pi}{N}n$, this becomes:
$$y(\theta_n) = \frac 1M\left[Y_0 + Y_N\underbrace{\cos(\pi n)}_{(-1)^n} + 2\sum_{k=1}^{N-1}Y_k\cos\!\left(\frac{\pi nk}{N}\right)\right]\qquad(\text{DCT-I}^{-1})$$
Where [: N + 1] truncates to only the first N + 1 elements ({0, ...N }). Given the equality above, we can line up
everything we now know in a diagram:
[Diagram: $\vec y_{\text{ext}} \leftrightarrow \vec y$ via truncate / truncate$^{-1}$; $\vec y_{\text{ext}} \leftrightarrow \vec Y_{\text{ext}}$ via FFT / FFT$^{-1}$; $\vec y \leftrightarrow \vec Y$ via DCT-I / DCT-I$^{-1}$; $\vec Y_{\text{ext}} \leftrightarrow \vec Y$ via truncate / truncate$^{-1}$ — where truncate$^{-1}$ does periodic extension, stacking back in redundant information.]
We can now easily see that in addition to the inverse relationship (Equation 9), we also have the forward relationship:
FFT([y0 , ...yN , yN −1 , ...y1 ])[: N + 1] = DCT-I([y0 , ...yN ])
Notice that the 0th and Nth terms appear outside the sum, and that the sum is multiplied by 2. In our original conception of the cosine series for y(θ) (Equation 7), all the cosines appear equally within the sum, so our $Y_k$ are subtly different from the $a_k$ in that formulation (some scaled by a factor of $\frac12$ and all scaled by M). Both are valid, but it's more computationally convenient to use the DCT-based formulation.
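A quick numerical check of that forward relationship, under my reading of the indexing above (scipy's default, unnormalized DCT-I assumed):

import numpy as np
from scipy.fft import fft, dct

N = 8
y = np.random.rand(N + 1)                 # samples y_0 ... y_N
y_ext = np.concatenate([y, y[-2:0:-1]])   # [y_0,...,y_N, y_{N-1},...,y_1], length M = 2N
lhs = fft(y_ext)[: N + 1].real            # imaginary parts are ~0 by the even symmetry
rhs = dct(y, type=1)                      # DCT-I of the non-redundant samples
print(np.allclose(lhs, rhs))              # True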
5.2 Even and Odd Derivatives and the Discrete Sine Transform
The DCT can get us in to the frequency domain, but we’ll need the help of another transform to get back out. We
again start with the DCT-I formulation for simplicity.
If we look at the full $\vec Y_{\text{ext}}$ (Equation 9), we have a palindromic structure around N, but also around 0, because of the repetitions[16], which ensure we can read the values of $Y_k$ at negative k by wrapping around to the end of the vector. This is describing an even function, f(−x) = f(x), which makes sense, because y(θ) is entirely composed of cosines, which are even functions, and because the forward transform is symmetrical with the inverse transform, the interpolation Y(ω) between $Y_k$ is also ultimately a bunch of cosines.
vector. This is describing an even function, f (−x) = f (x), which makes sense, because y(θ) is entirely composed of
cosines, which are even functions, and because the forward transform is symmetrical with the inverse transform, the
interpolation Y (ω) between Yk is also ultimately a bunch of cosines.
The derivative of an even function is an odd function, f(−x) = −f(x), which in principle should be constructible from purely sines, which are odd. And the derivative of an odd function is an even function again.
To see this more granularly, let's look in more detail at the multiplication by $(jk)^\nu$ that produces all the $Y_k^{(\nu)}$ (Equation 5), for k ∈ {0, ...M − 1}:
$$\vec Y_{\text{ext}}^{(\nu)} = [\,0,\ j^\nu,\ \dots(j(N-1))^\nu,\ \underbrace{(0 \text{ or } (jN)^\nu)}_{\substack{\text{depending on}\\ \nu\text{ odd or even}}},\ (-j(N-1))^\nu,\ \dots(-j)^\nu\,] \odot \vec Y_{\text{ext}}$$
$$= j^\nu\cdot\underbrace{[0, 1, \dots1, (0\text{ or }1), -1, \dots, -1]}_{\tilde 1_\nu\text{, constant}}\odot\underbrace{[0, 1, \dots N-1, N, N-1, \dots1]^\nu}_{\text{palindromic}}\odot\vec Y_{\text{ext}}$$
where ⊙ is a Hadamard, or element-wise, product, and raising a vector to a power is also element-wise. We can see
$$\tilde 1_\nu = \begin{cases}[0, 1, \dots1, 0, -1, \dots, -1] & \text{if }\nu\text{ is odd}\\ [0, 1, \dots1, 1, 1, \dots1] & \text{if }\nu\text{ is even}\end{cases}$$
[0, 1, ...1, 0, −1, ..., −1] is odd around entries 0 and N , and [0, 1, ...1, 1, 1, ...1] is even around entry 0.
Let's now use this to reconstruct samples in the θ domain, $y_n^{(\nu)}$, for odd and even derivatives:
$$y_n^{(\text{odd }\nu)} = \frac 1M\sum_{0<k<\frac M2}(jk)^\nu\Big(Y_k\,e^{jk\theta_n} - \underbrace{Y_{M-k}}_{=Y_k}\,e^{-jk\theta_n}\Big) = \frac 1M\sum_{k=1}^{N-1}(jk)^\nu Y_k\underbrace{\left(e^{jk\theta_n} - e^{-jk\theta_n}\right)}_{2j\sin(k\theta_n)}$$
$$= \frac 1M\,2\sum_{k=1}^{N-1}(jk)^\nu Y_k\,j\sin\!\left(\frac{\pi nk}{N}\right)\qquad(10)$$
where the minus sign comes from the oddness of $\tilde 1_\nu$ — this is a DST-I of $\vec Y^{(\nu)}\cdot j$!
$$y_n^{(\text{even }\nu)} = \frac 1M\left[\sum_{0<k<\frac M2}(jk)^\nu\Big(Y_k\,e^{jk\theta_n} + \underbrace{Y_{M-k}}_{=Y_k}\,e^{-jk\theta_n}\Big) + \left(j\frac M2\right)^\nu Y_{M/2}\cos\!\left(\frac M2\theta_n\right)\right]$$
$$= \frac 1M\left[(jN)^\nu Y_N\cos(\pi n) + \sum_{k=1}^{N-1}(jk)^\nu Y_k\underbrace{\left(e^{jk\theta_n} + e^{-jk\theta_n}\right)}_{2\cos(k\theta_n)}\right]$$
$$= \frac 1M\left[(j0)^\nu Y_0 + (jN)^\nu Y_N(-1)^n + 2\sum_{k=1}^{N-1}(jk)^\nu Y_k\cos\!\left(\frac{\pi nk}{N}\right)\right]\qquad(11)$$
where the plus sign comes from the evenness of $\tilde 1_\nu$ — this is a DCT-I of $\vec Y^{(\nu)}$!
Brilliant! So we can use only the non-redundant $Y_k^{(\nu)}$ with a DST-I or DCT-I to convert odd and even functions,
respectively, back to the θ domain!
Note that the DCT-I and DST-I definitions given in scipy[20, 21] use slightly different indexing than in my
definitions here, which can be a point of confusion. I consistently take N to be the index of the last element of the
non-redundant yn , not its length, following [18]. Note too that I consistently use n to index samples and k to index
basis domain, whereas scipy uses n for the domain being transformed from and k for the domain being transformed
to, which means these symbols are consistent with mine for forward transforms but flipped for inverse transforms.
Even more confusing, the DST-I only takes the k ∈ {1, ...N − 1} elements, since sines will result in zero crossings
at k = 0 and N (no informational content), whereas the DCT-I takes all k ∈ {0, ...N } elements!
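To make the conventions concrete, here is a sketch of Equation 10 for a first derivative in θ, evaluated at the interior samples (the sine series is identically 0 at θ = 0 and π); the leading factors below are my reading of the scipy definitions just discussed:

import numpy as np
from scipy.fft import dct, dst

N = 32
M = 2 * N
theta = np.pi * np.arange(N + 1) / N
y = np.exp(np.cos(theta))                            # a smooth test function of theta
Y = dct(y, type=1)                                   # = FFT of the periodic extension, k = 0..N
k = np.arange(1, N)                                  # only k = 1..N-1 carry sine content
dy_interior = -dst(k * Y[1:N], type=1) / M           # y'(theta_n) at n = 1..N-1
exact = -np.sin(theta[1:N]) * np.exp(np.cos(theta[1:N]))
print(np.max(np.abs(dy_interior - exact)))           # tiny: spectral accuracy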
$$\frac{d^2}{dx^2}y(\theta) = \frac{d}{dx}\frac{y'(\theta)}{-\sqrt{1-x^2}} = \frac{-\sqrt{1-x^2}\,\frac{d}{dx}y'(\theta) - y'(\theta)\,\frac{d}{dx}\!\left(-\sqrt{1-x^2}\right)}{1-x^2} = \frac{\frac{d\theta}{dx}y''(\theta)}{-\sqrt{1-x^2}} - \frac{y'(\theta)\,\frac{x}{\sqrt{1-x^2}}}{1-x^2} = \frac{y''(\theta)}{1-x^2} - \frac{x\,y'(\theta)}{(1-x^2)^{3/2}}$$
$$\longrightarrow \left[\frac{d^2}{dx^2}y(\theta)\right]_n = \frac{1}{1-x_n^2}\odot y_n'' - \frac{x_n}{(1-x_n^2)^{3/2}}\odot y_n'$$
Notice that the 2nd derivative in x requires both the 1st and 2nd derivatives in θ! This splintering phenomenon will
be a considerable source of pain as we take higher derivatives: For the ν th derivative in x we require all derivatives up
to order ν in θ.
Let’s see a few more:
$$\frac{d^3}{dx^3}y(\theta) = \frac{-1}{(1-x^2)^{3/2}}y'''(\theta) + \frac{3x}{(1-x^2)^2}y''(\theta) + \frac{-2x^2-1}{(1-x^2)^{5/2}}y'(\theta)$$
$$\frac{d^4}{dx^4}y(\theta) = \frac{1}{(1-x^2)^2}y^{IV}(\theta) + \frac{-6x}{(1-x^2)^{5/2}}y'''(\theta) + \frac{11x^2+4}{(1-x^2)^3}y''(\theta) + \frac{-6x^3-9x}{(1-x^2)^{7/2}}y'(\theta)\qquad(12)$$
$$\frac{d^5}{dx^5}y(\theta) = \frac{-1}{(1-x^2)^{5/2}}y^{V}(\theta) + \frac{10x}{(1-x^2)^3}y^{IV}(\theta) + \frac{-35x^2-10}{(1-x^2)^{7/2}}y'''(\theta) + \frac{50x^3+55x}{(1-x^2)^4}y''(\theta) + \frac{-24x^4-72x^2-9}{(1-x^2)^{9/2}}y'(\theta)$$
We can see a bit of a pattern here, though. In particular, the function of x multiplying each derivative of y in θ
comes from at most two terms in the preceding derivative, which have a predictable form:
$$y^{(\mu+1)}(\theta)\cdot\frac{d\theta}{dx}\cdot\frac{p(x)}{(1-x^2)^{c-1}} + y^{(\mu)}(\theta)\cdot\frac{(1-x^2)^{c-1}\frac{d}{dx}p(x) - p(x)(c-1)(1-x^2)^{c-2}(-2x)}{(1-x^2)^{2c-2}}\ +$$
$$y^{(\mu)}(\theta)\cdot\frac{d\theta}{dx}\cdot\frac{q(x)}{(1-x^2)^{c-\frac12}} + y^{(\mu-1)}(\theta)\cdot\frac{(1-x^2)^{c-\frac12}\frac{d}{dx}q(x) - q(x)(c-\frac12)(1-x^2)^{c-\frac32}(-2x)}{(1-x^2)^{2c-1}}$$
If we now gather the $y^{(\mu)}(\theta)$ terms and use the fact $\frac{d\theta}{dx} = \frac{-1}{\sqrt{1-x^2}}$, we can find its new multiplying factor is equal to:
$$\frac{(1-x^2)\frac{d}{dx}p(x) + 2(c-1)x\,p(x) - q(x)}{(1-x^2)^{c}}$$
This relationship holds no matter which, µ, c, p, q we’re addressing, which allows us to build up a kind of pyramid
of terms:
[Pyramid of numerators p and q, with the power c of $(1-x^2)$ in the corresponding denominator given in parentheses at each location. Rows go down with increasing $\frac{d}{dx}$ order ν of y; within a row, lower $\frac{d}{d\theta}$ order of y is to the left and higher order (increasing µ) to the right. Arrows run from each location to its q (up and to the left) and p (above):

ν = 1: $-1$ (1/2)
ν = 2: $-x$ (3/2), $1$ (1)
ν = 3: $-2x^2-1$ (5/2), $3x$ (2), $-1$ (3/2)
ν = 4: $-6x^3-9x$ (7/2), $11x^2+4$ (3), $-6x$ (5/2), $1$ (2)
ν = 5: $-24x^4-72x^2-9$ (9/2), $50x^3+55x$ (4), $-35x^2-10$ (7/2), $10x$ (3), $-1$ (5/2) ]
q always refers to the element up and to the left, and p always refers to the element above. If the arrows roll out of
the pyramid, the corresponding p or q is 0. I’ve done the above programmatically in code, such that we can find and
apply the factors—and thereby accomplish the variable transformation back to the Chebyshev domain—for arbitrarily
high derivatives.
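As a sketch of that idea (not the package's actual code), the factors can be generated with sympy by writing $\frac{d^\nu y}{dx^\nu} = \sum_\mu F_\mu(x)\,y^{(\mu)}(\theta)$ and recursing with the product rule:

import sympy as sp

x = sp.symbols('x')
dtheta_dx = -1 / sp.sqrt(1 - x**2)

def chain_rule_factors(nu):
    """Coefficients F_1..F_nu of y'(theta)..y^(nu)(theta) in d^nu y / dx^nu."""
    F = {1: dtheta_dx}                                   # dy/dx = -y'(theta)/sqrt(1 - x^2)
    for _ in range(nu - 1):
        # new coefficient of y^(mu) = d/dx(old F_mu) + old F_(mu-1) * dtheta/dx
        F = {mu: sp.simplify(sp.diff(F.get(mu, 0), x) + F.get(mu - 1, 0) * dtheta_dx)
             for mu in range(1, max(F) + 2)}
    return F

F3 = chain_rule_factors(3)
print(sp.simplify(F3[2] - 3*x / (1 - x**2)**2) == 0)     # True: matches the pyramid entry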
y(θ) is composed of cosines, so notice if we take odd derivatives in θ, we get sines, and at the edges of the domain where x → ±1, θ → 0, π, sine will be 0! However, if we take even derivatives, then cos(0), cos(π) → 1, −1. Then, if we look closely at the derivatives in x (Equation 12), we can see that even derivatives in θ of y are divided by even powers of $\sqrt{1-x^2}$, and the highest power in a denominator is an odd power of $\sqrt{1-x^2}$. If we multiply through so everything is over the highest-power denominator and then combine the expression into a single fraction, we get a situation where the odd-derivative terms are 0 because of the sines, and the even-derivative terms are 0 because they're multiplied by at least one $\sqrt{1-x^2}$.
This means the numerator as well as the denominator is 0 at the domain endpoints. $\frac00$ is an indeterminate form, so we can use L'Hôpital's rule!
Let’s see it in fine detail for the 1st derivative. The below uses the DCT-I reconstruction (Equation 8) for y(θ).
$$\lim_{\substack{x\to\pm1\\\theta\to0,\pi}}\frac{d}{d\theta}\left\{\frac 1M\left[Y_0 + Y_N\cos(N\theta) + 2\sum_{k=1}^{N-1}Y_k\cos(k\theta)\right]\right\}\cdot\frac{d\theta}{dx} = \lim_{\substack{x\to\pm1\\\theta\to0,\pi}}\frac{\frac 1M\left[-N Y_N\sin(N\theta) - 2\sum_{k=1}^{N-1}kY_k\sin(k\theta)\right]}{-\sqrt{1-x^2}}$$
$$\xrightarrow{\ d/dx\ \text{top and bottom}\ }\lim_{\substack{x\to\pm1\\\theta\to0,\pi}}\frac{\frac 1M\left[-N^2 Y_N\cos(N\theta) - 2\sum_{k=1}^{N-1}k^2Y_k\cos(k\theta)\right]\cdot\frac{-1}{\sqrt{1-x^2}}}{\frac{x}{\sqrt{1-x^2}}} = \lim_{\substack{x\to\pm1\\\theta\to0,\pi}}\frac{\frac 1M\left[N^2 Y_N\cos(N\theta) + 2\sum_{k=1}^{N-1}k^2Y_k\cos(k\theta)\right]}{x}$$
$$= \begin{cases}\frac 1M\left[N^2Y_N + 2\sum_{k=1}^{N-1}k^2Y_k\right] & \text{at } x=1,\ \theta=0\\[4pt] -\frac 1M\left[N^2(-1)^NY_N + 2\sum_{k=1}^{N-1}k^2(-1)^kY_k\right] & \text{at } x=-1,\ \theta=\pi\end{cases}\qquad\text{(1st endpoints)}$$
And now let’s do it for the 2nd derivative, with some slightly more compact notation, where we can be agnostic
about y(θ)’s exact structure until the end:
$$\lim_{\substack{x\to\pm1\\\theta\to0,\pi}}\left[\frac{y''(\theta)}{1-x^2} - \frac{x\,y'(\theta)}{(1-x^2)^{3/2}}\right] = \lim_{\substack{x\to\pm1\\\theta\to0,\pi}}\frac{\sqrt{1-x^2}\,y''(\theta) - x\,y'(\theta)}{(1-x^2)^{3/2}} \to \frac00$$
$$\xrightarrow{\ d/dx\ \text{top and bottom}\ }\lim_{\substack{x\to\pm1\\\theta\to0,\pi}}\frac{\frac{-x}{\sqrt{1-x^2}}y''(\theta) + \sqrt{1-x^2}\,y'''(\theta)\frac{-1}{\sqrt{1-x^2}} - \left(y'(\theta) + x\,y''(\theta)\frac{-1}{\sqrt{1-x^2}}\right)}{-3x\sqrt{1-x^2}} = \lim_{\substack{x\to\pm1\\\theta\to0,\pi}}\frac{-\left(y'''(\theta) + y'(\theta)\right)}{-3x\sqrt{1-x^2}} \to \frac00$$
$$\xrightarrow{\ d/dx\ \text{top and bottom}\ }\lim_{\substack{x\to\pm1\\\theta\to0,\pi}}\frac{-\left(y^{IV}(\theta) + y''(\theta)\right)\frac{-1}{\sqrt{1-x^2}}}{\frac{6x^2-3}{\sqrt{1-x^2}}} = \lim_{\substack{x\to\pm1\\\theta\to0,\pi}}\frac{1}{6x^2-3}\left(y^{IV}(\theta) + y''(\theta)\right)$$
We already know $y''(\theta) = \frac 1M\left[-N^2Y_N\cos(N\theta) - 2\sum_{k=1}^{N-1}k^2Y_k\cos(k\theta)\right]$, assuming type I reconstruction. We can easily find
$$y^{IV}(\theta) = \frac 1M\left[N^4Y_N\cos(N\theta) + 2\sum_{k=1}^{N-1}k^4Y_k\cos(k\theta)\right]$$
Now we can evaluate these and the factor $\frac{1}{6x^2-3}$ at the limit values and put it all together to find:
$$\begin{cases}\frac{1}{3M}\left[(N^4-N^2)Y_N + 2\sum_{k=1}^{N-1}(k^4-k^2)Y_k\right] & \text{at } x=1,\ \theta=0\\[4pt]\frac{1}{3M}\left[(N^4-N^2)(-1)^NY_N + 2\sum_{k=1}^{N-1}(k^4-k^2)(-1)^kY_k\right] & \text{at } x=-1,\ \theta=\pi\end{cases}\qquad\text{(2nd endpoints)}$$
Repeating the procedure for higher derivatives yields endpoint expressions of the same shape, with polynomial weights in k and N and constant denominators:
$$\begin{cases}\frac{1}{D_0M}\left[(\dots - C_3N^6 + C_2N^4 - C_1N^2)Y_N + 2\sum_{k=1}^{N-1}(\dots - C_3k^6 + C_2k^4 - C_1k^2)Y_k\right] & \text{at } x=1,\ \theta=0\\[4pt]\frac{1}{D_\pi M}\left[(\dots - C_3N^6 + C_2N^4 - C_1N^2)(-1)^NY_N + 2\sum_{k=1}^{N-1}(\dots - C_3k^6 + C_2k^4 - C_1k^2)(-1)^kY_k\right] & \text{at } x=-1,\ \theta=\pi\end{cases}$$
where the alternating plus and minus in the k and N terms comes from the fact the 2nd derivative contains −cosines,
the 4th +cosines, the 6th −cosines again, and so on. √
Because the act of cancellation and the functions containing powers of $\sqrt{1-x^2} = \sin(\theta)$ can't be easily represented in numpy, computing the C and D constants requires a symbolic solver like sympy. I've devised an implementation to construct expressions for the endpoints, up to arbitrary order.
Chebyshev series. This is analogous to how we can represent a function as a power series (a sum of integer powers of
the independent variable) and find the coefficients of its derivative with Power Rule[27].
If we sandwich this key rule with the realization that the DCT is not only getting Fourier series coefficients of
the warped function, but also (scaled) Chebyshev series coefficients of the original function (because these are exactly
the same modulo constant factors!), and thus the DCT−1 can go from the Chebyshev series representation to a
cosine-spaced sampling of a function, then we have the ingredients to craft an algorithm that stays entirely in the x
domain:
This algorithm turns out to be essentially numerically identical to the first, and due to its relative simplicity (no
splinters!) and ease of extension to support non-cosine-spaced samples (albeit with considerably greater computational
cost, O(N 3 ) rather than O(N log N )), it is the method of choice implemented in the main library code.
The Chebyshev polynomials of the second kind, $U_k$, can be written
$$U_k(x) = \frac{\sin((k+1)\theta)}{\sin(\theta)}$$
Notice that:
$$\frac{d}{dx}T_k(x) = \frac{d}{d\theta}\cos(k\theta)\cdot\frac{d\theta}{dx} = k(-\sin(k\theta))\cdot\frac{-1}{\sqrt{1-x^2}} = \frac{k\sin(k\theta)}{\sin(\theta)} = k\cdot U_{k-1}(x)$$
But we really want the derivative in terms of T, not U. Lucky for us, there is a relationship between the two, based on trigonometric identities, making particular use of $\cos(\alpha)\sin(\beta) = \frac12\left(\sin(\alpha+\beta) - \sin(\alpha-\beta)\right)$:
Similarly:
Thus:
$$k\cdot U_{k-1}(x) = \frac{d}{dx}T_k(x) = k\cdot\begin{cases}2\sum_{\text{odd }\kappa>0}^{k-1}T_\kappa(x) & \text{for even }k\\ -1 + 2\sum_{\text{even }\kappa\ge0}^{k-1}T_\kappa(x) & \text{for odd }k\end{cases}$$
Let's see this on a couple examples to get a better intuition: In practice we represent a function with N Chebyshev series coefficients, stored low to high. If N = 5, then $T_3(x)$ would be [0, 0, 0, 1, 0], and $T_4(x)$ would be [0, 0, 0, 0, 1]. If we differentiate these two, we should get $\frac{d}{dx}T_3(x) = 3\cdot(-1 + 2\cdot[1, 0, 1, 0, 0]) = [3, 0, 6, 0, 0]$, and $\frac{d}{dx}T_4(x) = 4\cdot2\cdot[0, 1, 0, 1, 0] = [0, 8, 0, 8, 0]$. Notice the extra constant from the −1 in the first example is factored in to the coefficient of $T_0(x) = 1$. Because differentiation is linear, we can scale and stack these particular results, e.g. $\frac{d}{dx}(2T_3(x) + T_4(x)) = 6T_0(x) + 8T_1(x) + 12T_2(x) + 8T_3(x)$. And because each term's derivative only affects every-other term of lower order, and these effects are cumulative, it's possible to calculate the new sequence by starting at the higher-order end and working downwards, modifying only two numbers at each step[26]. An implementation of this procedure called chebder[24] lives in numpy.
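For example, chebder reproduces the little worked example above (coefficients stored low order to high):

from numpy.polynomial.chebyshev import chebder

print(chebder([0, 0, 0, 1, 0]))     # [3., 0., 6., 0.]     i.e. 3*T_0 + 6*T_2
print(chebder([0, 0, 0, 0, 1]))     # [0., 8., 0., 8.]     i.e. 8*T_1 + 8*T_3
print(chebder([0, 0, 0, 2, 1]))     # [6., 8., 12., 8.]    derivative of 2*T_3 + T_4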
7 Multidimensionality
We are now fully equipped to find derivatives for 1-dimensional data. This is technically all we need, because, due
to linearity of the derivative operator, we can find the derivative along a particular dimension of a multidimensional
space by using our 1D solution along each constituent vector running in that direction, and we can find derivatives
along multiple dimensions by applying the above in series along each dimension:
$$\frac{\partial^2}{\partial x_1\partial x_2}y(x_1, x_2) = \text{Algo}(\text{Algo}(y_i, 1\text{st}, x_1)_j, 1\text{st}, x_2)\ \ \forall\,i, j$$
$$\nabla^2 y = \left(\frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2}\right)y = \text{Algo}(y_i, 2\text{nd}, x_1) + \text{Algo}(y_j, 2\text{nd}, x_2)\ \ \forall\,i, j$$
where i, j are indexers as in the computing sense and have nothing to do with the imaginary unit, Algo applies the
algorithm to each vector along the dimension given by the third argument, and the 1st and 2nd in the second argument
refer to the derivative order.
Each application to a vector incurs O(N log N ) cost, and fundamentally applying the method to higher-dimensional
data must involve a loop, so the full cost of applying along any given direction is (assuming length N in all dimensions)
O(N D log N ), where D is the dimension of the data. Aside from pushing this loop lower down into numpy to take
advantage of vectorized compute, there can be no cost savings for a derivative in a particular dimension.
That’s neat, but does it save us anything, really? Let’s see it in series:
[Diagram: differentiating in series — $y \xrightarrow{\text{FFT}} Y_d \xrightarrow{\odot(jk)^{\nu_d}} (jk)^{\nu_d}\odot Y_d \xrightarrow{\text{FFT}^{-1}} \partial_d y \xrightarrow{\text{FFT}} Y_{d'} \xrightarrow{\odot(jk)^{\nu_{d'}}} (jk)^{\nu_{d'}}\odot Y_{d'} \xrightarrow{\text{FFT}^{-1}} \partial_{d'}\partial_d y \rightarrow \dots$, repeated for D dimensions.]
If we add up the costs, we can see that it’s actually no more or less efficient to differentiate along all dimensions at
once versus in series.
From a user-friendliness perspective, I judge it to be somewhat more confusing to specify multiple derivative
dimensions at once (although generalizing the order and axis parameters to vectors is possible), so I have chosen to
limit the package to differentiation along a single dimension at a time, which also agrees with the interface of chebder.
Multidimensional data can still be handled, however, via clever indexing and use of fft, dct, and chebder’s axis
parameter.
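Here is a sketch of what that looks like with plain numpy's axis parameter (spectral_derivative below is a stand-in illustration, not the package's function):

import numpy as np

M = 64
theta = 2 * np.pi * np.arange(M) / M
T1, T2 = np.meshgrid(theta, theta, indexing="ij")
y = np.sin(T1) * np.cos(2 * T2)
k = np.fft.fftfreq(M, d=1.0 / M)

def spectral_derivative(y, order, axis):
    Y = np.fft.fft(y, axis=axis)
    shape = [1, 1]
    shape[axis] = M                                      # broadcast (jk)^order along `axis`
    return np.fft.ifft((1j * k.reshape(shape)) ** order * Y, axis=axis).real

d2y = spectral_derivative(spectral_derivative(y, 1, axis=0), 1, axis=1)
print(np.max(np.abs(d2y - (-2 * np.cos(T1) * np.sin(2 * T2)))))   # tiny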
8 Arbitrary Domains
So far we’ve only used the domain [0, 2π) in the Fourier case, because this is the domain assumed by the DFT, and the
domain [−1, 1] in the Chebyshev case, because this is where a cosine wrapped around a cylinder casts a shadow.
As you may have guessed, this hasn’t curtailed the generality of the methods at all, because we can map any domain
from a to b onto a canonical domain.
[Figure: the same function plotted against its original independent variable t and against the mapped variable θ.]
In the discrete case, where we have M samples on [a, b), we can map $t_n$ with:
$$t_n \in \frac{\{0, \dots M-1\}}{M}\cdot(b-a) + a \ \leftrightarrow\ \theta_n \in \frac{2\pi\{0, \dots M-1\}}{M}$$
8.2 Chebyshev on [a, b]
Here both ends are inclusive, so we have t ∈ [a, b] that we need to map to x ∈ [−1, 1]. We can accomplish this with:
$$x \in [-1, 1]\ \leftrightarrow\ t \in [a, b] = \underbrace{[-1, 1]}_{x}\cdot\frac{b-a}{2} + \frac{b+a}{2}$$
[Figure: the same function plotted against its original independent variable t ∈ [a, b] and against the mapped variable x ∈ [−1, 1].]
In the discrete case, where we have N + 1 samples on [a, b], we can map $t_n$ with:
$$x_n \in \cos\!\left(\frac{\pi\{0, \dots N\}}{N}\right)\ \leftrightarrow\ t_n \in \cos\!\left(\frac{\pi\{0, \dots N\}}{N}\right)\cdot\frac{b-a}{2} + \frac{b+a}{2}$$
In code this is t_n = np.cos(np.arange(N+1)*np.pi/N) * (b - a)/2 + (b + a)/2.
Notice the order has flipped here: counting up in n means we traverse x from +1 → −1. This is actually what we want; it corresponds to the horizontal flip necessary to make cosine shadows equate with Chebyshev polynomials.
In other words, the overall derivative is scaled by the inverse of the width-smoosh. So to recover the true derivative we want, $\frac{dy}{dx}$, we have to divide by this scale, which is a familiar term from our variable transformations t ↔ θ or x. For higher derivatives:
$$\frac{d^\nu y}{(dx\cdot\text{smoosh})^\nu} = \frac{d^\nu y}{dx^\nu}\cdot\text{scale}^\nu$$
So we can always correct the derivative by dividing by scaleν .
To enable calculation of the scale, and to double check the user sampled their function at a correct t_n (especially
in the Chebyshev case, since cosine-spacing is easy to flub and especially confusing with the DCT-II), the functions
take the sample locations as a parameter and raise error messages with correct examples if the sampling is invalid.
9 Differentiating in the Presence of Noise
Finding the true derivative of a noisy signal is ill-posed, because random variations cause unknown (and often dramatic) local deviations of function slope. This problem only gets worse for calculations of curvature and higher order derivatives, so the only solution is to try to remove the noise before differentiating.
“Every spectrum of real noise falls off reasonably rapidly as you go to infinite frequencies, or else it
would have infinite energy. But the sampling process aliases higher frequencies in lower ones, and the
folding ... tends to produce a flat spectrum. ... white noise. The signal, usually, is mainly in the lower
frequencies.” –Richard Hamming, The Art of Doing Science and Engineering[2], Digital Filters III
By use of the term “frequencies”, Hamming is implying use of the Fourier basis. He’s saying there is band-separation
of signal and noise in the frequency domain, yet another reason for its popularity. This general principle extends to
other noise reduction techniques in spirit: At bottom, all accomplish some kind of smoothing, be it moving average,
FIR filtering, Savitzky-Golay, Kalman filtering, total variation penalty, etc.
[Figure: typical energy spectrum of a noisy signal, before and after sampling — energy versus frequency f (Hz), with the signal concentrated below a cutoff frequency and aliasing folding the spectrum about $\frac12 f_s$. $f_s$ is a sampling rate of our choosing.]
In practice we expect to sample a signal frequently enough to capture fluctuations of interest in the data, so signal
energy should be concentrated in frequencies below some cutoff. Noise energy, decaying up to higher frequencies, is
added in linearly.
To reach the frequency domain, we use the FFT, which requires equispaced samples and a periodic signal, because
a discontinuity or corner causes artefacts at higher frequency, which become impossible to model in finitely many
coefficients and impossible to distinguish from noise. Nyquist theorem[11] tells us that we need > 2 equispaced
samples per cycle to unambiguously reconstruct a frequency, so this process creates a natural cutoff at the Nyquist
rate, fs /2. Equispaced samples taken from frequencies slightly over this limit are best matched by frequencies slightly
under the limit, just as unitary complex numbers alias: ej(π+ϵ) = e−j(π−ϵ) . This creates a folding pattern in the
bandlimited FFT spectrum, which distributes decaying noise energy somewhat evenly.
Notice the more we sample, the more we can concentrate the legitimate signal’s energy in low frequencies, and
the more we can distribute noise energy across higher frequencies. We can then zero out or otherwise dampen the
upper part of the spectrum to chop down the noise that hasn’t aliased on to the periodic signal’s band! The filter
parameter is designed exactly for this. If we then inverse transform, we get a smoother signal, or we can multiply by
(jk)ν and then inverse transform to sample the derivative of that smoother signal.
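Here is a sketch of that idea (the cutoff at |k| = 10 and the noise level are arbitrary illustrative choices; the package's filter parameter plays this role):

import numpy as np

rng = np.random.default_rng(0)
M = 256
theta = 2 * np.pi * np.arange(M) / M
y = np.sin(3 * theta) + 0.05 * rng.standard_normal(M)   # periodic signal + white noise
Y = np.fft.fft(y)
k = np.fft.fftfreq(M, d=1.0 / M)
Y[np.abs(k) > 10] = 0                                    # crude lowpass: chop the upper spectrum
dy = np.fft.ifft(1j * k * Y).real                        # derivative of the smoothed signal
print(np.max(np.abs(dy - 3 * np.cos(3 * theta))))        # modest, noise-limited error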
9.3 The Advantage of Spectral Smoothing
Most alternative noise-quelling techniques can only take advantage of local information, but a spectral representation
builds a function out of basis functions that span the entire domain, so every point’s value takes holistic fit under
consideration. This makes the reconstruction much more robust to perturbations than one that uses only a few
neighboring points. I.e. it’s much harder to corrupt the signal so thoroughly that it can’t be successfully recovered.
In the Fourier case, this has an analogy with error-correcting codes[28, 2]; notice the locations corresponding to
Hamming code parity checks constitute selections of different frequency:
[Figure: the bit positions 1–16 laid out in a 4×4 grid, repeated once per Hamming-code parity check; each check covers a different alternating pattern of positions, i.e. a selection of different “frequency”.]
Of course there are distinctions: In the context of continuous signals, “corruption” means (discrete representations of) continuous numbers have slightly inaccurate values, whereas in error correcting codes each datum is a single discrete bit which can simply be flipped. Likewise, a Fourier coefficient indicates “We need this much of the $k$th basis function”, whereas the analogous parity check indicates “There is/isn't a parity error among my bits.”
But in both cases a spot-corruption will stand out, because it appears in a particular combination of parity checks
or introduces a value that can’t as easily be represented with a finite combination of smooth basis functions. In this
sense, the Fourier basis is the ideal basis for filtering high frequency noise, so long as the signal of interest is periodic,
and other methods are merely trying to approximate a lowpass filter.
When we sample at cosine-spaced points, Δx decreases near the edges. In the case of measurement noise, effects on Δy are undiminished in these regions, so the error in $\frac{\Delta y}{\Delta x}$ increases dramatically. In the case of process noise the situation is a little more hopeful, because its effects on Δy naturally get smaller as Δx shrinks.
get one-sided information about those regions. And unlike Fourier basis functions, which have uniform frequency
throughout their domain, other bases are typically nonuniform. This may be desirable if the data or noise is known to
fit a particular pattern, but in the naive case, where we don’t know the noise shape or expect white noise, nonuniform
basis functions’ variable expressive power across the domain can mismatch the situation. Chebyshev polynomials
exhibit this characteristic, getting steeper and therefore higher frequency near the boundaries, which makes them
worse at filtering and better at fitting high frequency noise in these regions, i.e. more sensitive to disruptions there.
[Figure: an example of systematic edge blowup for Chebyshev derivatives in the presence of noise.]
This problem is perhaps best shown mathematically in the Chebyshev-via-Fourier method, where we not only implicitly warp the function to be y(cos(θ)) by taking a DCT (thereby treating measurement noise as if it's evenly distributed in the θ domain), but we later also explicitly unwarp to get back to the x domain, involving division by powers of $\sqrt{1-x^2}$[25, 18], which → 0 as x → ±1. In the numerically equivalent Chebyshev series-based method, “Tight coupling between coefficients enables propagation of errors from high frequency to low frequency modes.”[25] We see that higher-order coefficients have an impact on every-other lower-order coefficient[26], and this effect is cumulative, so a slight error in coefficient values, especially higher-order ones, compounds in a very nasty way when we differentiate. Thus “while the representation of a function by a Chebyshev series may be most accurate near x = ±1, these results indicate that the derivatives computed from a Chebyshev series are least accurate at the edges of the domain.”[25]
In fact, the Chebyshev basis is not the only one to suffer this weakness: other polynomial bases, e.g. the Legendre polynomials (orthogonal without a weighting function) and Bernstein polynomials (linearly independent but not orthogonal), experimented with in the filtering noise notebook, are also more sensitive at the domain edges, even when sampled at cosine-spaced points to help dampen Runge phenomenon. This is a fundamental limitation of polynomial-based methods in the presence of noise. Occasionally differentiation can be made to work okay by filtering higher modes, but there is always systematic blowup in higher-order derivatives.
Notes
a. There’s a great passage in Richard Hamming’s book The Art of Doing Science and Engineering[2] where he wonders why we use the
Fourier basis so much:
“It soon became clear to me digital filter theory was dominated by Fourier series, about which theoretically I had learned
in college, and actually I had had a lot of further education during the signal processing I had done for John Tukey, who was
a professor from Princeton, a genius, and a one or two day a week employee of Bell Telephone Laboratories. For about ten
years I was his computing arm much of the time.
Being a mathematician I knew, as all of you do, that any complete set of functions will do about as good as any other
set at representing arbitrary functions. Why, then, the exclusive use of the Fourier series? I asked various electrical engineers
and got no satisfactory answers. One engineer said alternating currents were sinusoidal, hence we used sinusoids, to which I
replied it made no sense to me. So much for the usual residual education of the typical electrical engineer after they have left
school!
So I had to think of basics, just as I told you I had done when using an error-detecting computer. What is really going
on? I suppose many of you know what we want is a time-invariant representation of signals, since there is usually no natural
origin of time. Hence we are led to the trigonometric functions (the eigenfunctions of translation), in the form of both Fourier
series and Fourier integrals, as the tool for representing things.
Second, linear systems, which is what we want at this stage, also have the same eigenfunctions—the complex exponentials
which are equivalent to the real trigonometric functions. Hence a simple rule: if you have either a time-invariant system or a
linear system, then you should use the complex exponentials.
On further digging in to the matter I found yet a third reason for using them in the field of digital filters. There is a
theorem, often called Nyquist’s sampling theorem (though it was known long before and even published by Whittaker, in
a form you can hardly realize what it is saying, even when you know Nyquist’s theorem), which says that if you have a
band-limited signal and sample at equal spaces at a rate of at least two in the highest frequency, then the original signal can
be reconstructed from the samples. Hence the sampling process loses no information when we replace the continuous signal
with the equally spaced samples, provided the samples cover the whole real line. The sampling rate is often known as the
Nyquist rate after Harry Nyquist, also of servo stability fame, as well as other things [also reputed to have been just a really
great guy who often had productive lunches with his colleagues, giving them feedback and asking questions that brought out
the best in them]. If you sample a non-band-limited function, then the higher frequencies are “aliased” into lower ones, a word
devised by Tukey to describe the fact that a single high frequency will appear later as a single low frequency in the Nyquist
band. The same is not true for any other set of functions, say powers of t. Under equally spaced sampling and reconstruction
a single high power of t will go into a polynomial (many terms) of lower powers of t.
Thus there are three good reasons for the Fourier functions: (1) time invariance, (2) linearity, and (3) the reconstruction
of the original function from the equally spaced samples is simple and easy to understand.
Therefore we are going to analyze the signals in terms of the Fourier functions, and I need not discuss with electrical
engineers why we usually use the complex exponents as the frequencies instead of the real trigonometric functions. [It’s down
to convenience, really.] We have a linear operation, and when we put a signal (a stream of numbers) into the filter, then out
comes another stream of numbers. It is natural, if not from your linear algebra course then from other things such as a course
in differential equations, to ask what functions go in and come out exactly the same except for scale. Well, as noted above,
they are the complex exponentials; they are the eigenfunctions of linear, time-invariant, equally spaced sampled systems.
Lo and behold, the famous transfer function [contains] exactly the eigenvalues of the corresponding eigenfunctions! Upon
asking various electrical engineers what the transfer function was, no one has ever told me that! Yes, when pointed out to
them that it is the same idea they have to agree, but the fact it is the same idea never seemed to have crossed their minds!
The same, simple idea, in two or more different disguises in their minds, and they knew of no connection between them! Get
down to the basics every time!”
In that spirit, with Patron Saint Hamming watching over us, let’s continue: subsection 1.1
References
[1] Lebesgue Integrable, https://mathworld.wolfram.com/LebesgueIntegrable.html
[2] Hamming, R., 1996, The Art of Doing Science and Engineering
[3] Pego, B., Simplest proof of Taylor’s theorem, https://math.stackexchange.com/a/492165/278341
[15] Discrete Fourier Transform, https://numpy.org/doc/2.1/reference/routines.fft.html
[16] Bristow-Johnson, R., 2014, About Discrete Fourier Transform vs. Discrete Fourier Series,
https://dsp.stackexchange.com/a/18931/40873
[17] Johnson, S., 2011, Notes on FFT-based differentiation, https://math.mit.edu/~stevenj/fft-deriv.pdf
[18] Trefethen, N., 2000, Spectral Methods in Matlab, Chapter 8,
https://epubs.siam.org/doi/epdf/10.1137/1.9780898719598.ch8
[19] Burns, K., et al., 2020, Dedalus: A flexible framework for numerical simulations with spectral methods,
https://www.researchgate.net/publication/340905766_Dedalus_A_flexible_framework_for_numerical_simulations_with_spectral_methods
[20] https://docs.scipy.org/doc/scipy/reference/generated/scipy.fft.dct.html
[21] https://docs.scipy.org/doc/scipy/reference/generated/scipy.fft.dst.html
[22] Giesen, F., DCT-II vs. KLT/PCA, https://www.farbrausch.de/%7Efg/articles/dct_klt.pdf
[23] Royi, 2025, Why does the DCT-II have better energy compaction than DCT-I?,
https://dsp.stackexchange.com/a/96197/40873
[24] Harris, C., 2009, chebder,
https://github.com/numpy/numpy/blob/v2.2.0/numpy/polynomial/chebyshev.py#L874-L961
[25] Breuer, K. & Everson, R., 1990, On the errors incurred calculating derivatives using Chebyshev polynomials,
https://doi.org/10.1016/0021-9991(92)90274-3