DSP Mathematics
M. Hoffmann
DESY, Hamburg, Germany
Abstract
1 Introduction
Digital signal processing requires the study of signals in a digital representation and the methods to in-
terpret and utilize these signals. Together with analog signal processing, it composes the more general
modern methodology of signal processing. Although the mathematics that are needed to understand most
of the digital signal processing concepts were developed a long time ago, digital signal processing is still
a relatively new methodology. Many digital signal processing concepts were derived from the analog
signal processing field, so you will find a lot of similarities between the digital and analog signal pro-
cessing. Nevertheless, some new techniques have been necessitated by digital signal processing, hence,
the mathematical concepts treated here have been developed in that direction. The strength of digital
signal processing currently lies in the frequency regimes of audio signal processing, control engineering,
digital image processing, and speech processing. Radar signal processing and communications signal
processing are two other subfields. Last but not least, the digital world has entered the field of accel-
erator technology. Because of its flexibility, digital signal processing and control is superior to analog processing or control in many growing areas.
Around 1990, diagnostic devices in accelerators began to utilize digital signal processing, e.g.,
for spectral analysis. Since then, the processing speed of the hardware [mostly standard computers
and digital signal processors (DSPs)] has increased very quickly, such that fast RF control is now
possible. In the future, direct sampling and processing of all RF signals (up to a few GHz) will be
possible, and many analog control circuits will be replaced by digital ones.
The design of digital signal processing systems without a basic mathematical understanding of the
signals and their properties is hardly possible. The mathematics and physics of the underlying processes need
to be understood, modelled, and finally controlled. To be able to perform these tasks, some knowledge
of trigonometric functions, complex numbers, complex analysis, linear algebra, and statistical methods
is required. The reader may look them up in his undergraduate textbooks if necessary.
The first session covers the following topics: the dynamics of the harmonic oscillator and signal
theory. Here we try to describe what a signal is, how a digital signal is obtained, and what its quality
parameters, accuracy, noise, and precision are. We introduce causal time invariant linear systems and
discuss certain fundamental special functions or signals.
In the second session we are going to go into more detail and introduce the very fundamental
concept of convolution, which is the basis of all digital filter implementations. We are going to treat the
Fourier transformation and finally the Laplace transformation, which are also useful for treating analog
signals.
[Figure: a driven oscillating circuit (R, L, C, driven current I∼, coupling m) and the equivalent driven mass–spring system (mass m, displacement x); not reproduced.]
The third session will make use of the concepts developed for analog signals as they are ap-
plied to digital signals. It will cover digital filters and the very fundamental concept and tool of the
z-transformation, which is the basis of filter design.
The fourth and last session will cover more specialized techniques, like the Kalman filter and the
concept of wavelets. Since each of these topics opens its own field of mathematics, we can just peek at
the surface to get an idea of its power and what it is about.
2 Oscillators
One very fundamental system (out of not so many others) in physics and engineering is the harmonic
oscillator. It is still simple and linear and shows various behaviours like damped oscillations, reso-
nance, bandpass or band-reject characteristics. The harmonic oscillator is, therefore, discussed in many
examples, and also in this lecture, the harmonic oscillator is used as a work system for the afternoon
lab-course.
R I + L\dot{I} + \frac{Q}{C} = m I_\sim
\quad\Leftrightarrow\quad
\ddot{I} + \frac{R}{L}\dot{I} + \frac{1}{LC}\, I = K I_\sim \; . \qquad (1)
\ddot{x} + \frac{k}{m}\dot{x} + \frac{\kappa}{m}\, x = \frac{1}{m} F(t) \; . \qquad (2)
This is also a second-order linear differential equation.
Both can be brought into the generic form of a driven, damped harmonic oscillator,

\ddot{x} + 2\beta\dot{x} + \omega_0^2\, x = T\, e^{i(\omega_\sim t + \xi)} \; , \qquad (3)

where T is the excitation amplitude, ω∼ the frequency of the excitation, ξ the relative phase of the excitation compared to the phase of the oscillation of the system (whose absolute phase is set to zero),
\beta = \frac{R}{2L} \;\text{ or }\; \frac{k}{2m}

is the term which describes the dissipation and which will lead to a damping of the oscillator, and

\omega_0 = \frac{1}{\sqrt{LC}} \;\text{ or }\; \sqrt{\frac{\kappa}{m}}

gives you the eigenfrequency of the resonance of the system.
Also one very often uses the so-called Q-value
Q = \frac{\omega_0}{2\beta} \qquad (4)
which is a measure for the energy dissipation. The higher the Q-value, the less the dissipation, the
narrower the resonance, and the higher the amplitude in the case of resonance.
The stationary solution can be found with the ansatz

x(t) = A\, e^{i(\omega t + \varphi)} \, , \qquad
\dot{x}(t) = i\omega A\, e^{i(\omega t + \varphi)} \, , \qquad
\ddot{x}(t) = -\omega^2 A\, e^{i(\omega t + \varphi)} \; .
Fig. 3: Amplitude and phase of the excited harmonic oscillator in steady state
In the steady state the system oscillates with the excitation frequency. Since we are only interested in the phase difference of the oscillator with respect to the excitation force, we can set ξ = 0.
In this (steady) state, we can look up the solution from a graphic (see Fig. 2). We get one equation
for the amplitude
\left(\frac{T}{A}\right)^2 = (\omega_0^2 - \omega^2)^2 + (2\omega\beta)^2
\quad\Leftrightarrow\quad
A = T\,\frac{1}{\sqrt{(\omega_0^2 - \omega^2)^2 + 4\omega^2\beta^2}} \; .
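To get a feeling for these formulas, the following short numerical sketch (an added illustration, not part of the original text) evaluates the steady-state amplitude A/T and the phase of the response for a few Q values; the eigenfrequency and the frequency grid are arbitrary choices. It reproduces the qualitative behaviour of Fig. 3: the higher Q, the narrower and higher the resonance peak.

```python
import numpy as np

# Steady-state response of the driven, damped harmonic oscillator:
# A/T = 1 / sqrt((w0^2 - w^2)^2 + 4 beta^2 w^2)
omega0 = 2 * np.pi * 1000.0                 # eigenfrequency (arbitrary choice)
omega = np.linspace(1.0, 2 * omega0, 2000)  # excitation frequencies

for Q in (0.5, 2.0, 10.0):                  # a few quality factors
    beta = omega0 / (2 * Q)                 # from Q = omega0 / (2 beta)
    chi = 1.0 / (omega0**2 - omega**2 + 2j * beta * omega)  # complex response
    A_over_T = np.abs(chi)                  # amplitude per unit excitation
    phi = np.angle(chi)                     # phase relative to the excitation
    i = A_over_T.argmax()
    print(f"Q = {Q:4.1f}: peak A/T = {A_over_T[i]:.3e} at omega = {omega[i]:.0f} rad/s")
```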
Fig. 4: Complex vector of the harmonic oscillator moving with frequency ω, for different Q values
Fig. 5: The gravity pendulum. A mass m oscillates in the gravity field.
[Fig. 6: Simulated steady-state response (amplitude A/T and phase) of the mathematical pendulum versus exciting frequency, for different excitation amplitudes T (left) and for different starting amplitudes x0 at fixed T = 1 (right).]
For small amplitudes the pendulum behaves like the harmonic oscillator of the previous section. But what if we have large amplitudes, or even a rotation of the pendulum?
Well, this system is unbounded (rotation can occur instead of oscillation) and so the behaviour is
obviously amplitude dependent. We especially expect the resonance frequency to be a function of the
oscillation amplitude, ω = F(A). At least, we can still assume ω = ω ∼ for the steady state solution; this
means that the system will follow the excitation frequency after some time.
Figure 6 shows the simulated behaviour of the mathematical pendulum in the steady state. You
can see the single resonance peak, which for small amplitudes looks very similar to the one seen in
Fig. 3. For larger amplitudes, however, this peak is more and more bent to the left. When the peak hangs
over, a jump occurs at an amplitude-dependent excitation frequency, where the system can oscillate
with a small amplitude and then suddenly with a large amplitude. To make things even worse, the
decision about which amplitude is taken by the system depends on the amplitude the system already has.
Figure 6 (right) shows that the jump occurs at different frequencies, dependent on the amplitude x0 at the
beginning of the simulation.
Last but not least, coupled systems of that type may have a very complicated dynamic behaviour
and may easily become chaotic.
Fig. 7: The yo-yo. A mass m on the inclined plane. For simplicity, the rotation of the ball is not considered here.
Fig. 8: Simulated frequency response of the yo-yo for different excitation frequencies and amplitudes (left). On
the right you can see different oscillation modes of this system depending on the excitation amplitude for different
excitation frequencies. The system responds with different oscillation frequencies in an unpredictable manner.
where

\mathrm{sgn}(x) := \begin{cases} \dfrac{x}{|x|} & x \neq 0 \\[4pt] 0 & x = 0 \end{cases} \; .
Now let us answer the questions: Is there a resonance? And if so, what is the resonance frequency?
Obviously, the resonance frequency here would also be highly amplitude-dependent (ω0 = f(A)), because it takes longer for the ball to roll down the inclined plane if it starts with a bigger amplitude. But if we look at the simulated frequency response with different excitation amplitudes (see Fig. 8), it looks like there is a resonance at 0 Hz!?
Looking closer at the situation one finds that the oscillation frequency can differ from the excitation frequency: ω ≠ ω∼. Figure 8 (right) shows all possible oscillation frequencies (in relation to the excitation frequency) with different starting amplitudes x0 (colours) under excitation with different amplitudes. The system responds with oscillations in an unpredictable manner.
Now you know why linear systems are so nice and relatively easy to deal with.
3 Signal theory
The fundamental concepts we want to deal with for digital signal processing are signals and systems.
In this section we want to develop the mathematical understanding of a signal in general, and more
specifically look at the digital signals.
3.1 Signals
The signal s(t) which is produced by a measurement device can be seen as a real, time-varying property
(a function of time). The property represents physical observables like voltage, current, temperature, etc.
Its instant power is defined as s²(t) (all proportional constants are set to one).
The signal under investigation should be an energy signal, which is
\int_{-\infty}^{\infty} s^2(t)\, dt < \infty \; . \qquad (5)
This requires that the total energy content of that signal be finite. Most of the elementary functions (e.g.,
sin(), cos(), rect(), . . . ) are not energy signals, because they ideally are infinitely long, and the integral
(5) does not converge. In this case one can treat them as power signals, which requires
\lim_{T\to\infty} \frac{1}{T}\int_{-T/2}^{T/2} s^2(t)\, dt < \infty \; . \qquad (6)
(The energy of the signal is finite for any given time interval.) Obviously sin() and cos() are signals
which fulfil the relation (6).
Now, what is a physical signal that we are likely to see? Well, wherever the signal comes from,
whatever sensor is used to measure whatever quantity, in the end — if it is measured electrically —
we usually get a voltage as a function of time U(t) as (input) signal. This signal can be discrete or
continuous, analog or digital, causal or non-causal. We shall discuss these terms later.
From the mathematical point of view we have the following understanding/definitions:
– Time: t ∈ ℝ (sometimes ∈ ℝ₀⁺)
– Amplitude: s(t) ∈ ℝ (usually a voltage U(t))
– Power: s²(t) ∈ ℝ₀⁺ (constants are renormalized to 1)
Since the goal of digital signal processing is usually to measure or filter continuous, real-world
analog signals, the first step is usually to convert the signal from an analog to a digital form by using
an analog-to-digital converter. Often the required output is another analog signal, so a digital-to-analog
converter is also required.
The algorithms for signal processing are usually performed using specialized electronics, which
either make use of specialized microprocessors called digital signal processors (DSPs) or they process
signals in real time with purpose-designed application-specific integrated circuits (ASICs). When flexi-
bility and rapid development are more important than unit costs at high volume, digital signal processing
algorithms may also be implemented using field-programmable gate arrays (FPGAs).
Signal domains
Signals are usually studied in one of the following domains:
1. time domain,
2. spatial domain,
3. frequency domain,
4. autocorrelation domain, and
5. wavelet domains.
We choose the domain in which to process a signal by making an informed guess (or by trying
different possibilities) as to which domain best represents the essential characteristics of the signal. A
sequence of samples from a measuring device produces a time or spatial domain representation, whereas
a discrete Fourier transform produces the frequency domain information, the frequency spectrum. Au-
tocorrelation is defined as the cross-correlation of the signal with itself over varying intervals of time or
space. Wavelets open various possibilities to create localized bases for decompositions of the signal. All
these topics will be covered in the following sections. First we are going to look at how one can obtain a
(digital) signal and what quantities define its quality. Then we are going to look at special fundamental
signals and linear systems which transform these signals.
Discrete-time signals
Discrete-time signals may be inherently discrete-time (e.g., turn-by-turn beam position at one monitor)
or may have originated from the sampling of a continuous-time signal (digitization). Sampled-data
signals are assumed to have been sampled at periodic intervals T . The sampling rate must be sufficiently
high to extract all the information in the continuous-time signal, otherwise aliasing occurs. We shall
discuss issues relating to amplitude quantization, but, in general, we assume that discrete-time signals
are continuously valued.
3.2 Digitization
The digitization process converts an analog signal s(t) into a series of samples taken at discrete times (s[n] := s(nT), with T the sampling period). Two effects result: the discretization in time and the quantization of the amplitude values. The second effect must not be neglected, although in some cases there is no special problem with this if
you can use a high enough number of bits for the digitization. Modern fast ADCs have 8, 14 or 16 bits
resolution. High-precision ADCs exist with 20 or even more effective bits, but they are usually much
slower. Figure 9 illustrates the digitization process.
Dithering
Because the number of bits of ADCs is a cost issue, there is a technique called dithering which is
frequently used to improve the (amplitude) resolution of the digitization process. Surprisingly, it makes use of noise which is added to the (analog) input signal. The trick is that you can subtract the noise later
from the digital values, assuming you know the exact characteristics of the noise, or even better, you
produce it digitally using a DAC, and therefore know the value of each noise sample. This technique is
illustrated in Fig. 10.
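As an illustration of the dithering idea, here is a minimal simulation sketch (my own, not from the text): a slowly varying signal that stays within one LSB of a coarse quantizer is digitized once directly and once with subtractive dither (the dither is uniform over one LSB); averaging the dithered samples recovers the sub-LSB information. The signal shape and the LSB size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
lsb = 1.0                                            # quantization step of the coarse ADC
t = np.arange(5000)
signal = 3.7 + 0.2 * np.sin(2 * np.pi * t / 5000)    # varies by less than one LSB

def quantize(x):
    return np.round(x / lsb) * lsb                   # ideal mid-tread quantizer

plain = quantize(signal)                             # without dither: stuck on one code

dither = rng.uniform(-0.5 * lsb, 0.5 * lsb, size=signal.shape)
dithered = quantize(signal + dither) - dither        # subtractive dither: noise is known

# averaging many samples recovers sub-LSB information only in the dithered case
print("true mean        :", signal.mean())
print("plain ADC mean   :", plain.mean())
print("dithered ADC mean:", dithered.mean())
```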
Fig. 9: The digitization process is done in two steps: First, samples are taken from the analog input signal (A). The
time discretization is done with the sampling frequency fs. The voltage is stored in a sample-and-hold device (B)
(a simple capacitor can do). Finally the voltage across the capacitor is converted into a digital number (C), usually
represented by n bits of digital logic signals. The digital representation of the input signal is not perfect (as can be
seen on the bottom plots) as it has a limited resolution in both time and amplitude.
The only situation where you may encounter non-causal signals or non-causal algorithms is under
the following circumstances: Say, a whole chunk of data has been recorded (this can be the whole pulse
train in a repetitive process or the trace of a pulse of an RF system). Now you want to calculate a
prediction for the next measurement period from the last period’s data. From some viewpoint, this data
is seen as a non-causal signal: if you process the data sample by sample, you always have access to the whole dataset, which means you can also use samples that lie ahead of the sample currently being processed. You can thereby make use of non-causal algorithms, because from this algorithm's perspective your data
also contains the future. But from the outside view, it is clear that it does not really contain the future,
because the whole chunk of data has been taken in the past and is now processed (with a big delay). A
measurement can not take information from the future! Classically, nature or physical reality has been
considered to be a causal system.
Sampling a continuous signal x(t) = A·cos(ωt) at the rate fs = 1/T yields

\Rightarrow\quad x[n] = A\cos(\omega n T) = A\cos\!\left(n\,\frac{\omega}{f_s}\right) = A\cos\!\left(n\,\frac{2\pi\omega}{\omega_s}\right) =: A\cos(\omega_d\, n) \; ,
Fig. 10: The dithering technique makes use of (random) noise which is added to the analog signal. If this noise
is later removed from the digital signal (e.g. using a digital low pass filter or statistics) the accuracy of the digital
values can be improved. The best method would be the subtractive dither: produce the ‘random’ noise by a DAC and subtract the known numbers later.
where

\omega_d = \frac{2\pi\omega}{\omega_s} = \omega T \qquad (7)

is the discrete time frequency. The units of the discrete-time frequency ωd are radians per sample, with a range of

-\pi < \omega_d \leq \pi \quad\text{or}\quad 0 \leq \omega_d < 2\pi \; .
“A continuous signal can be properly sampled if it does not contain frequency components above fcrit = fs/2, the so-called Nyquist frequency.”
[Fig. 11 panels: proper sampling of a DC signal and of sines at 0.09 and 0.31 of the sampling rate; improper sampling of a sine at 0.95 of the sampling rate (‘aliasing’).]
Fig. 11: Different examples of proper and not proper sampling. If the sampling frequency is too low compared
with the frequency of the signal, a signal reconstruction is not possible anymore.
Frequency components which are larger than this critical frequency (f > fcrit) are aliased to a mirror frequency f* = fs − f.
The sampling theorem has consequences on the choice of the sampling frequency you should use
to sample your signal of interest. The digital signal cannot contain frequencies f > fcrit. Frequencies greater than fcrit will add up to the signal components which are still properly sampled. This results in information loss at the lower frequency components because their signal amplitudes and phases are affected. So except for special cases (see undersampling and down-conversion) you need to remove all frequency components above fcrit before sampling (with an analog low-pass, the anti-aliasing filter).

Sampling a sinusoid at the rate fs = 1/T cannot resolve multiples of the sampling frequency added to its frequency:

x[n] = \sin(2\pi T f\, n + \varphi) = \sin\!\big(2\pi T (f \pm k f_s)\, n + \varphi\big) \; .

This means: when sampling at fs, we cannot distinguish between f and f ± k·fs from the sampled data, where k is an integer.
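A small numerical check of this statement (an illustration added here, not from the original text): sampling sines at f, f + fs and f + 3·fs with the same phase produces exactly the same sample values. The chosen numbers are arbitrary.

```python
import numpy as np

fs = 100.0                 # sampling frequency (assumed value)
T = 1.0 / fs
n = np.arange(32)
f = 7.0                    # a properly sampled frequency
phi = 0.3

x1 = np.sin(2 * np.pi * f * n * T + phi)             # f
x2 = np.sin(2 * np.pi * (f + fs) * n * T + phi)      # f + fs   (k = 1)
x3 = np.sin(2 * np.pi * (f + 3 * fs) * n * T + phi)  # f + 3 fs (k = 3)

print(np.allclose(x1, x2), np.allclose(x1, x3))      # -> True True
```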
Fig. 12: Mapping of the analog frequency components of a continuous signal to the digital frequencies. There is
a good area where the frequencies can be properly reconstructed and several so-called Nyquist bands where the
digital frequency is different. Also the phase jumps from one Nyquist band to the other.
Fig. 13: Aliasing example. In frequency domain the continuous signal has a limited spectrum. The sampled signal
can be seen as a pulse train of sharp (δ-)pulses which are modulated with the input signal. So the resulting spectrum
gets side-bands which correspond to the Nyquist bands seen from inside the digital system. By the way: the same
applies if you want to convert a digital signal back to analog.
[Fig. 14: The baseband (0 to 0.5 fs) and the 2nd to 5th Nyquist zones, extending in steps of 0.5 fs up to 2.5 fs.]
The aliasing can also be seen the other way round: Given a continuous signal with a limited spec-
trum (see Fig. 13). After sampling we cannot distinguish if we originally had a continuous and smooth
signal or a signal consisting of a pulse train of sharp (δ-)pulses which are modulated corresponding to
the input signal. Such a signal has side-bands which correspond to the Nyquist bands seen from inside
the digital system. The same principle applies if you want to convert a digital signal back to analog.
This concept can be further generalized: Consider the sampling process as a time-domain multi-
plication of the continuous-time signal x c (t) with a sampling function p(t), which is a periodic impulse
function (Dirac comb). The frequency-domain representation of the sampled data signal is the convolu-
tion of the frequency domain representation of the two signals, resulting in the situation seen in Fig. 13.
If you do not understand this by now, never mind. We shall discuss the concept of convolution in more
detail later.
3.4.2 Undersampling
Last but not least, I want to mention a technique called undersampling, harmonic sampling or sometimes
also called digital demodulation or downconversion. If your signal is modulated onto a carrier frequency
and the spectral band of the signal is limited around this carrier, then you may take advantage of the
‘aliasing’. By choosing a sampling frequency which is lower than the carrier but synchronized with it
(this means it is exactly a fraction of the carrier), you are able to demodulate the signal. This can be
done with the spectrum of the signal lying in any Nyquist zone given by the sampling frequency (see
Fig. 14). Just keep in mind that the spectral components may be reversed and also the phase of the signal
can be shifted by 180◦ depending on the choice of the zone. And also — of course — any other spectral
components which leak into the neighboring zones need to be filtered out.
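The following sketch (an added illustration with arbitrarily chosen frequencies) shows the effect: an amplitude-modulated carrier at 70 Hz sampled at only 40 Hz appears in the digital spectrum around the alias frequency 10 Hz, with the modulation sidebands preserved.

```python
import numpy as np

fc = 70.0           # carrier frequency (assumption for illustration)
fm = 2.0            # modulation frequency of the band-limited signal
fs = 40.0           # undersampling rate: fc aliases to |fc - 2*fs| = 10 Hz

n = np.arange(4000)
t = n / fs
x = (1.0 + 0.5 * np.cos(2 * np.pi * fm * t)) * np.cos(2 * np.pi * fc * t)

spec = np.abs(np.fft.rfft(x)) / len(x)
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
peak = freqs[np.argmax(spec)]
print(f"strongest digital frequency component: {peak:.2f} Hz")  # ~10 Hz, not 70 Hz
```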
Fig. 15: Frequency response of the zero-order hold (right) which is applied at the DAC and generates the step
function (left)
Fig. 16: Transfer function of the (ideal) reconstruction filter for a DAC with zero-order hold
As you can imagine, this behaviour appears to be unpleasant because now, not only components of
the higher order sidebands of the impulse train spectrum are produced on the output (though attenuated
by H( f )), but also the original spectrum (the baseband) is shaped by it. To overcome this ‘feature’, a
reconstruction filter is used. The reconstruction filter should remove all frequencies above one half of fs (an analog filter will be necessary, which is sometimes already built into commercial DSPs), and boost the frequencies by the reciprocal of the zero-order-hold's effect (1/sinc()). This can be done within the digital process itself! The transfer function of the (ideal) reconstruction filter is shown in Fig. 16.
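A short sketch (added here, not from the text) of the zero-order-hold roll-off |H(f)| = |sinc(f/fs)| and the corresponding 1/sinc() boost an ideal reconstruction filter would have to apply below fs/2:

```python
import numpy as np

fs = 1.0                              # normalize frequencies to the sampling rate
f = np.linspace(1e-6, 0.5 * fs, 6)    # baseband up to fs/2

# zero-order hold: |H(f)| = |sinc(f/fs)|  (np.sinc(x) = sin(pi x)/(pi x))
h_zoh = np.abs(np.sinc(f / fs))
boost = 1.0 / h_zoh                   # ideal reconstruction-filter gain below fs/2

for fi, hz, b in zip(f, h_zoh, boost):
    print(f"f = {fi:4.2f} fs   |H_ZOH| = {hz:5.3f}   required boost = {b:5.3f}")
```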
To design your digital signal processing system, you need to know about (analog) filter design, the
characteristics of anti-aliasing and reconstruction filters, and about limitations of signal processing like
bandwidth and noise of the analog parts and, for the digital parts, sampling frequency and quantization.
4 Noise
The terms error and noise are closely related. Noise is some fluctuation on the input signal which can
come from different sources, can have different spectral components and in many cases (except for the
dithering methods) is unwanted. It can cover the information you want to extract from the signal and
needs to be suppressed with more or less advanced techniques. Usually, some of the noise components
can hardly be avoided and, therefore, we shall have to deal with it. Noise on the signal can cause an error.
But there are also errors which do not come from noise. We therefore distinguish between systematic
(deterministic) errors on the one hand and unsystematic (statistical) errors (or noise) on the other hand.
We are going to take a closer look at this distinction.
Systematic error ←→ accuracy comes from characteristics of the measurement device (ADC/DAC:
offset, gain, linearity-errors). It can be improved by improvements of the apparatus, like calibra-
tion. The only limits here come from the practical usefulness and from quantum mechanics, which
keeps you from measuring certain quantities with absolute accuracy.
Statistical error comes from unforseen random fluctuations, stochastics, and noise. It is impossible to
avoid them completely, but it is possible to estimate the extent and it can be reduced through sta-
tistical methods (averaging), multiple repetitive measurements etc. This determines the precision
of the measurement.
Note that the definition is context dependent: The accuracy of 100 devices can be a matter of precision!
Imagine that you measure the same property with 100 different devices where each device has a slightly
different systematic (calibration) error. The results can now be distributed in much the same way as they
are with a statistical measurement error — and so they can be treated as statistical errors, in this case,
and you might want to use the statistical methods described in the following section.
The distinction above leads to the terms accuracy and precision, which we shall define in the
following sections. Besides this, we want to deal with the basic concepts of statistics which include
– random variables and noise (e.g., white noise, which has a flat spectral distribution; Gaussian noise, whose amplitudes follow a Gaussian distribution; and 1/f or pink noise, whose spectral power density falls off like 1/f),
– the mean and the standard deviation, variance, and
– the normal or Gaussian distribution.
The mean value of a set of N samples x_i is defined as

\hat{x} := \frac{1}{N}\sum_{i=0}^{N-1} x_i \; .

The variance σ² (σ itself is called standard deviation) is a measure of the ‘power of fluctuations’ of the set of N samples. It is a direct measure of the precision of the signal:

\sigma^2 := \frac{1}{N-1}\sum_{i=0}^{N-1}\left(x_i - \hat{x}\right)^2 \; . \qquad (8)
[Fig. 17: A fluctuating (noisy) signal (value versus sample number) and histograms of its values (number of occurrences versus value) for a small and for a large number of samples.]
If the samples are collected in a histogram with M bins, where Hi counts the number of occurrences of value i, the same quantities can be calculated from the histogram:

N = \sum_{i=0}^{M-1} H_i \; , \qquad
\hat{x} := \frac{1}{N}\sum_{i=0}^{M-1} i \cdot H_i \; , \qquad
\sigma^2 := \frac{1}{N-1}\sum_{i=0}^{M-1} (i - \hat{x})^2\, H_i \; .
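As a small illustration (added here, not from the original), the following sketch computes x̂ and σ once directly from a set of samples and once from their histogram, using the formulas above; the Gaussian test data are an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=128.0, scale=10.0, size=10000)    # e.g. ADC codes around 128

# directly from the samples
mean = x.mean()
sigma = x.std(ddof=1)            # ddof=1 gives the 1/(N-1) definition used above

# from a histogram H_i over M bins (the bin centre plays the role of the value i)
M = 256
H, edges = np.histogram(x, bins=M, range=(0, 256))
centers = 0.5 * (edges[:-1] + edges[1:])
N = H.sum()
mean_h = (centers * H).sum() / N
sigma_h = np.sqrt(((centers - mean_h) ** 2 * H).sum() / (N - 1))

print(mean, sigma, mean_h, sigma_h)
```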
As you already saw in Fig. 17, with a large number of samples the histogram becomes smooth and
it will converge in the limit N → ∞ to a distribution which is called the probability mass function. This
is an approximation of the (continuous) probability density distribution. This is illustrated in Fig. 18.
In this case, the fluctuations of the samples have a Gaussian distribution. Examples of probability mass
functions and probability density distributions of common waveforms are shown in Fig. 19.
Fig. 18: Histogram, probability mass function, and probability density distribution
Fig. 19: Probability mass functions and probability density distributions of common waveforms
The probability density distribution P(x) is normalized to unity:

\int_{-\infty}^{+\infty} P(x)\, dx = 1 \; .
Now what is this good for? Imagine that we have N samples of a measured quantity. Then we can
define the
typical error: \quad \Delta A = \frac{\sigma_N}{\sqrt{N}} \; .
Here σN is an estimate of the standard deviation of the underlying process over N samples (e.g., extracted
from the histogram). This is the best information about the underlying process you can extract out of the
Fig. 20: The raw shape and the normalized shape of the Gauss function. The area of one standard deviation ±σ
integrates to 68.3%, the area of ±2σ to 95.4%.
[Fig. 21: Two noisy signals, one with a changing mean (left) and one with changing mean and standard deviation (right).]
sampled signal. In practice, that means that the more samples you take, the smaller the typical error ∆A
is. But this can only be done if the underlying quantity does not change during the time the samples were
taken. In reality, the quantity and also its fluctuations may change, as in Fig. 21, and it is a real issue to
select the proper and useful number of samples to calculate the mean and standard deviation σ N to get a
good approximation of what the real process may look like. There is no such thing as an instant error;
the probability density function cannot be measured, it can only be approximated by collecting a large
number of samples.
The central limit theorem states: the sum of independent random numbers (of any distribution) becomes Gaussian distributed.
The practical importance of the central limit theorem is that the normal distribution can be used as
an approximation of some other distributions. Whether these approximations are sufficiently accurate de-
pends on the application for which they are needed and the rate of convergence to the normal distribution.
It is typically the case that such approximations are less accurate in the tails of the distribution.
[Fig. 22: Consequence of the central limit theorem: summing up more and more equally distributed random numbers will result, to a good approximation, in a Gaussian distributed random variable. Shown are the distributions p(x) of x = RND (x̂ = 0.5, σ = 1/√12 ≈ 0.29), of x = RND + RND (x̂ = 1, σ = 1/√6 ≈ 0.4), and of the sum of 12 RNDs (x̂ = 6, σ = 1).]
It should now be clear why most of your measurements may be Gaussian distributed. This is
simply because the measurement process is a very complicated one with many different and independent
error sources which all together contribute to the final measurement value. They do so without caring
about the details of their mechanisms — as long as there are enough contributors, the result will be
approximately Gaussian.
There is also a practical application of the theorem in computing. Suppose you need to generate
numbers which have a Gaussian distribution. The task is quite easy; you just have to have a function
which generates any kind of (pseudo-) random numbers and then sum up enough of them.
Here is an example: first generate white noise using a function which produces equally distributed
random numbers between zero and one RND := [0; 1[. This is often implemented in the form of a pseudo
random generator which calculates
RND = (a s + b) mod c ,
where s is the seed and a, b and c are appropriately chosen constants. The new random number is used
as a seed for the next calculation and so on.
The distribution of this function is shown in Fig 22, top. If you now add two such random numbers,
the result will have a distribution as shown in the figure in the centre. After adding 12 random numbers
you already get a very good approximation of a Gaussian distribution with a standard deviation of σ = 1
and a mean value of x̂ = 6. If you subtract 6 from this sum, you are done. But do not really implement it
like this, because there is a simpler formula which only uses two random variables and will also do a good job (x̂ = 0, σ = 1):

x = \sqrt{-2\ln(\mathrm{RND}_1)}\;\cdot\;\cos(2\pi\,\mathrm{RND}_2) \; .
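Both recipes can be sketched in a few lines (an added illustration; note that the two-variable formula uses the natural logarithm):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000

# central-limit construction: sum of 12 uniform random numbers, shifted by 6
clt = rng.random((N, 12)).sum(axis=1) - 6.0

# two-variable construction
u1 = 1.0 - rng.random(N)        # maps to (0, 1], avoids log(0)
u2 = rng.random(N)
bm = np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)

print(f"sum of 12 RNDs : mean {clt.mean():+.3f}, sigma {clt.std():.3f}")
print(f"two-RND formula: mean {bm.mean():+.3f}, sigma {bm.std():.3f}")
```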
Quantities which come from ratios are very often — for practical reasons (you avoid multiplication
and division) — expressed in decibels, a logarithmic pseudo-unit:
\mathrm{SNR(dB)} := 10\log_{10}\frac{\bar{P}_{\mathrm{signal}}}{\bar{P}_{\mathrm{noise}}}
= 20\log_{10}\frac{\hat{A}_{\mathrm{signal,rms}}}{\hat{A}_{\mathrm{noise,rms}}}
= P_{\mathrm{signal}}\,[\mathrm{dBm}] - P_{\mathrm{noise}}\,[\mathrm{dBm}] \; .
A similar ‘unit’ is used if you talk about the carrier as reference: [SNR(dB)]=dBc (=‘dB below carrier’),
and so you can also define a CNR = carrier-to-noise ratio.
1. Systematic errors: Most importantly, ADC and DAC distortions: e.g. offset, gain and linearity
errors. These types of errors can be corrected for through calibration.
2. Stochastic errors: quantization noise, quantization distortions, as well as aperture and sampling
errors (clock jitter effects).
3. Intrinsic errors: DAC-transition errors and glitches. They are random, unpredictable, and some-
times systematic, but it is hard to correct the source of these errors, and so they need to be filtered.
The systematic errors can in principle be corrected for through calibration, and this is also the
recommended way to treat them wherever possible. The intrinsic errors are hard to detect, may cause
spurious effects and therefore make life really bad. If they bother you, a complete system analysis and
probably a rework of some components may be required to cure them. There is (nearly) no way to
overcome them with some sort of data processing. Therefore we focus here on the stochastic errors,
because the way we treat them with data processing determines the quality of the results. At least, we
can improve the situation by use of sophisticated algorithms which, in fact, can be implemented in the
digital processing system more easily than in an analog system.
Although this error is not really independent of the input value, from the digital side it actually is, because there is no control over when the least significant bit flips. It is, therefore, best to treat this error as a (quantization) noise source.
For a full-scale sin() signal, the signal-to-noise ratio coming from the quantization noise is

\mathrm{SNR} = 6.02\, n + 1.76\,\mathrm{dB} + 10\log_{10}\!\left(\frac{f_s}{2\,\mathrm{BW}}\right) \; . \qquad (12)
As you can see, it increases with lower BW. This means that doubling the sampling frequency increases
the SNR by 3dB (at the same signal bandwidth). This is effectively used with so-called ‘oversampling’
schemes. Oversampling is just a term describing the fact that with a sampling frequency that is much higher than would be required by the Nyquist criterion, you can compensate for the quantization noise
caused by a low ADC bit resolution. Especially for 1-bit ADCs, this is a major issue.
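A small helper (added here as an illustration of Eq. (12)) makes the oversampling gain explicit; the bandwidth and bit numbers are arbitrary choices.

```python
import numpy as np

def quantization_snr_db(n_bits, fs, bw):
    """SNR of an ideal n-bit ADC for a full-scale sine, per Eq. (12)."""
    return 6.02 * n_bits + 1.76 + 10.0 * np.log10(fs / (2.0 * bw))

bw = 1e6                                  # signal bandwidth of interest (assumed)
for n_bits in (8, 14, 16):
    for fs in (2e6, 4e6, 16e6):           # oversampling by 1x, 2x, 8x
        print(f"{n_bits:2d} bit, fs = {fs/1e6:5.1f} MHz : "
              f"{quantization_snr_db(n_bits, fs, bw):6.2f} dB")
```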
In Eq. (12), it is assumed that the noise is equally distributed over the full bandwidth. This is often
not the case! Instead, the noise is often correlated with the input signal! The lower the signal, the more
correlation. In the case of strong correlation, the noise is concentrated at the various harmonics of the
input signal; this is exactly where you do not want them. Dithering and a broad input signal spectrum
randomize the quantization noise.
Nevertheless, this simple quantization noise is not the only cause of errors in the analog-to-digital
conversion process. There are two common, related effects: missing codes and code transition noise.
These effects are intrinsic to the particular ADC chip in use. Some binary codes will simply not be pro-
duced because of ADC malfunction as a consequence of the hardware architecture and internal algorithm
responsible for the conversion process. Especially for ADCs with many bits, this is an issue. Last but not
least, the ADC may show code transition noise; this means that the output oscillates between two steps
if the input voltage is within a critical range even if the input voltage is constant.
5 Linear systems
You now know some of the main consequences, advantages, and limitations of using digitized signals.
You know how to deal with aliasing, downsampling, and analog signal reconstruction. You know the
concepts of noise and the basic mathematical tools to deal with it.
Next, we are going to look more closely at the systems which transform the (digital) signals.
Of course, there are analog systems as well as digital ones. But, since there are not many conceptual
differences, we can focus mainly on the digital ones. The analogy to analog system concepts will be drawn whenever useful.
We are also going to use different notations in parallel: besides the mathematical notation, we show
the rather symbolic expressions commonly used in engineering fields. In contrast to the mathematical
notation, which is slightly different for analog systems (e.g. y(t) = 2x(t)) and digital systems (e.g.
y[n] = 2x[n]), the latter does not make a formal difference here. Both concepts and notations are in use in
different books on the field. They are, however, easy to understand, so you will quickly become familiar with both notations.
3. and MISO (Multiple-Input-Single-Output) systems; here the adder is the most popular double-
input-single-output system:
x1, x2 →(F) y, i.e. (x1[n], x2[n]) ↦ y[n]. Examples are the adder (y[n] = x1[n] + x2[n]) and the multiplier (y[n] = x1[n] · x2[n]).
Besides this, there is also a way to split signals. This produces a generic Single-Input-Double-Output
system.
Starting from elementary systems, the concept of superposition allows us to combine systems to
create more complex systems of nearly any kind.
5.2 Superposition
Systems may be of any complexity. It is, therefore, convenient to look at them as a composition of simpler
components. If we restrict ourselves to the class of linear systems, it is possible to first decompose the
input signals and then process them with simple systems. In the end, the result will be synthesized
by superposition for prediction of the output. In this way, we can split up the problems into many
pieces of simpler complexity, and even use only a few fundamental systems. Without the concept of
decomposition and linear systems, we would be forced to examine the individual characteristics of many
unrelated systems, but with this approach, we can focus on the traits of the linear system category as a
whole.
Although most real systems found in nature are not linear, most of them can be well approximated
with a linear system, at least for some limited range of ‘small’ input signal amplitudes.
5.3.1 Linearity
Given system F with F(x1 [n]) = y1 [n] and F(x2 [n]) = y2 [n], then F is said to be linear if
F(x1 [n] + x2 [n]) = F(x1 [n]) + F(x2 [n]) ,
(it follows that F(x[n] + x[n]) = F(2 x[n]) = 2 F(x[n])), and for two linear systems F1 and F2
F1 (F2 (x[n])) = F2 (F1 (x[n])) .
5.3.2 Time-invariance
(also ‘shift-invariance’) Given F with F(x[n]) =: y[n] is considered time-invariant if
F(x[n − k]) = y[n − k] ∀k ∈ N .
5.3.3 Causality
The system is causal if the output(s) (and internal states) depend only on the present and past input and
output values.
Causal: y[n] = x[n] + 3x[n − 1] − 2x[n − 2]
Non-causal: y[n] = x[n + 1] + 3x[n] + 2x[n − 1] .
In the latter case the system produces its output y by using an input value of the input signal x which lies ahead of the currently processed time step n.
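As a tiny illustration (added here, not from the text), the causal example above can be implemented directly; feeding it a unit impulse returns its impulse response.

```python
import numpy as np

def causal_example(x):
    """y[n] = x[n] + 3 x[n-1] - 2 x[n-2], with x[n] = 0 for n < 0."""
    xp = np.concatenate(([0.0, 0.0], x))          # pad the two past samples
    return xp[2:] + 3.0 * xp[1:-1] - 2.0 * xp[:-2]

x = np.array([1.0, 0.0, 0.0, 0.0])                # a unit impulse
print(causal_example(x))                          # -> [ 1.  3. -2.  0.] = impulse response
```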
5.3.4 Examples
Which of the following systems are linear and/or time-invariant and/or causal?
Fig. 25: A linear MIMO system composed of linear SISO systems and adders
[Diagram: multiplication of a signal x[n] by a constant is a linear operation; multiplication of two signals x1[n] · x2[n] is nonlinear.]
5.5 Decompositions
An important consequence of the linearity of systems is that there exist algorithms for different ways
of decomposing the input signal. The spectral analysis is based on this, so one can say the concept of
decomposition is really fundamental. The simplest decompositions are
– Pulse decomposition: x[n] = x0[n] + x1[n] + x2[n] + x3[n] + … + x11[n]
– Step decomposition: x[n] = x0[n] + x1[n] + x2[n] + x3[n] + … + x11[n]
– Fourier decomposition (N = 16): x[n] = xc0[n] + xc1[n] + xc2[n] + xc3[n] + … + xc7[n] + xs0[n] + xs1[n] + xs2[n] + xs3[n] + … + xs7[n]
– and many others.
Later, we shall make extensive use of special decompositions and also convolutions (which is the
opposite process). Their applications are in the Fourier transformation, the Laplace and z-transformation,
wavelets and filters.
[Fig. 26: Common signal forms: the DC signal, the δ function δ(t), the δ comb ϖ(t), the Gauss impulse e^(−πt²), the switched cosine step(t)·cos(2πFt), the exponential impulse (1/T)·step(t)·e^(−t/T), the double exponential impulse (1/2T)·e^(−|t|/T) (T > 0), and sgn(t)·(1/2T)·e^(−|t|/T).]
6 Special functions
In this very short section I wish to introduce you to some very common functions and signal forms shown
in Fig. 26. Special focus will be put on the δ-function (or better: the δ-distribution; but in practice the
difference does not play a big role). Other common waveforms are shown in the figure.
continuous:

\delta(x) := \begin{cases} 0 & x \neq 0 \\ \infty & x = 0 \end{cases} \; , \qquad \int_{-\infty}^{\infty} \delta(x)\, dx = 1

discrete:

\delta[k] := \begin{cases} 0 & k \neq 0 \\ 1 & k = 0 \end{cases} \; , \qquad \sum_{i=-\infty}^{\infty} \delta[i] = 1
The definition above can be improved if you look at the δ-function as the limit of a series of func-
tions. Some popular definitions include
sinc functions:

\delta(x) = \lim_{\kappa\to\infty} \frac{\sin(\kappa x)}{\pi x}

Gauss functions:

\delta(x) = \lim_{\varepsilon\to 0} \frac{1}{\sqrt{\pi\varepsilon}}\, e^{-\frac{x^2}{\varepsilon}}

Lorentz functions:

\delta(x) = \frac{1}{\pi}\lim_{\varepsilon\to 0} \frac{\varepsilon}{x^2 + \varepsilon^2}

rectangles:

\delta(x) = \lim_{\varepsilon\to 0} \frac{1}{2\varepsilon}\, r_\varepsilon(x) \; ; \qquad r_\varepsilon(x) := \begin{cases} 0 & |x| \geq \varepsilon \\ 1 & |x| < \varepsilon \end{cases}

Also a complex (Fresnel) definition is possible:

\delta(z) = \lim_{\alpha\to\infty} \sqrt{\frac{\alpha}{i\pi}}\; e^{i\alpha z^2} \; .
More important than the correct definition are the calculation rules of the δ-function, which can
be applied independently of its definition, whether you use ∞ or the limits of series. The most important
ones are given here:
Fourier transform:

\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \delta(t)\, e^{-i\omega t}\, dt = \frac{1}{\sqrt{2\pi}}

Laplace transform:

\int_{0}^{\infty} \delta(t - a)\, e^{-st}\, dt = e^{-as}

Scaling rule:

\delta(\alpha x) = \frac{\delta(x)}{|\alpha|}
Another popular pseudo function is the so-called Dirac comb, which is a combination of an infinite
number of equally shifted δ-functions:
C(x) = \sum_{k\in\mathbb{Z}} \delta(x - k) \; .
7 Convolution
As already mentioned before, decomposition and convolution are the fundamental operations of linear
systems. Here we are going to look more closely at the concept of convolution because the technique is
the basis of all digital filters. The specialized digital signal processors always have special and ready-
made instructions built in to support this operation.
we can now immediately write down the output of the system if we know its impulse response:
y[n] = \sum_{i=0}^{N-1} x_i\, h[n - i] \; .

This arises because the system is linear, and so the sum stays a sum and the product with a scalar (x_i) transforms to a product with a scalar. Only the response to the δ-function needs to be known, but this is just the impulse response! Try to really understand this fundamental fact, recapitulate the linearity criteria if necessary, and make it clear to yourself what x_i·δ[n − i] means. The features you should remember are
– h[n] has all information to process the output of the system for any input signal !
– h[n] is called filter kernel of the system (and can be measured by impulse response).
– The system is ‘causal’ if h[i] = 0 ∀i < 0.
– The output for any input signal x[n] is
y[n] = x[n] ∗ h[n] ,
where ∗ is the convolution operator. The mathematical definition follows.
7.2 Convolution
Given two functions f, g : D → ℂ, where D ⊆ ℝ, the convolution of f with g, written f ∗ g, is defined as the integral of the product of f with a mirrored and shifted version of g:

(f * g)(t) := \int_{D} f(\tau)\, g(t - \tau)\, d\tau \; .
The domain D can be extended either by periodic assumption or by zero, so that g(t − τ) is always
defined.
Given f , g : D → C, where D ⊆ Z, the discrete convolution can be defined in a similar way by the
sum:
(f * g)[n] := \sum_{k\in D} f[k]\, g[n - k] \; .
Two examples of discrete convolutions are shown in Fig. 28 and Fig. 29. As you can see, it is very
simple to realize digital filters with this technique by choosing the appropriate filter kernels. You may ask
where the filter kernels come from. Well, this is the topic of filter design where a practical formalism can
be used which we briefly discuss in the section about the z-transform.
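To make this concrete, here is a short sketch (an added illustration, not the exact kernels of Figs. 28 and 29) that filters a noisy step with a moving-average low-pass kernel and with a two-point difference kernel, using the discrete convolution defined above.

```python
import numpy as np

# a noisy step as input signal
rng = np.random.default_rng(3)
x = np.concatenate((np.zeros(40), np.ones(40))) + 0.2 * rng.standard_normal(80)

# low-pass filter kernel: a normalized moving average (cf. Fig. 28)
h_lp = np.ones(11) / 11.0
y_lp = np.convolve(x, h_lp)            # length len(x) + len(h) - 1

# discrete derivative kernel (cf. Fig. 29)
h_diff = np.array([1.0, -1.0])
y_diff = np.convolve(x, h_diff)

print(len(x), len(y_lp), len(y_diff))  # 80, 90, 81
```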
This property of the convolution (it is commutative and associative) allows you to rearrange systems which are connected in series in arbitrary order. It does not matter if you first pass a differentiator and then a low-pass or vice versa. The result will be the same.
Fig. 28: Realization of a low-pass and a high-pass filter with convolution. The input signal is convoluted with an
appropriate filter kernel and the result is the output signal.
Fig. 29: Realization of a digital attenuator and calculating the derivative of an input signal
[Diagram: a parallel combination of two systems with impulse responses h1[n] and h2[n]; both are fed with x[n] and their outputs are added to give y[n].]
7.3.5 Exercise
Given x[n] a ‘pulse-like’ signal (x[n] = 0 for small and large n), what is the result of
7.4.1 Cross-correlation
Given two functions f , g : D → C, where D ⊆ R, the cross correlation of f with g:
(f \circ g)(t) := K \int_{D} f(\tau)\, g(t + \tau)\, d\tau \; .
The cross-correlation is similar in nature to the convolution of two functions. Whereas convolution
involves reversing a signal, then shifting it and multiplying it by another signal, correlation only involves
shifting it and multiplying (no reversing).
7.4.2 Auto-correlation

A_g(t) := (g \circ g)(t) = K \int_{D} g(\tau)\, g(t + \tau)\, d\tau \; .

The auto-correlation can be used to detect a known waveform in a noisy background, e.g., echoes
of a signal. This can also be used to detect periodicities in a very noisy signal. The auto-correlation
function of a periodic signal is also a periodic signal with the same period (but the phase information is
lost). Because white noise at one time is completely independent of white noise at a different time, the
auto-correlation function of white noise is a δ pulse at zero. So, for the analysis of periodicities, you just
look at the auto-correlation function for bigger time lags and ignore the values around zero, because this
area contains only the information about the strength of the noise contribution.
(f \circ g)[n] := \alpha \sum_{k\in D} f[k]\, g[n + k] \; ,

which is identical to

f[n] \circ g[n] = f[n] * g[-n] \; .
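The following sketch (an added illustration with arbitrary numbers) uses the auto-correlation to recover the period of a sine buried in much stronger white noise, ignoring the lags around zero as described above.

```python
import numpy as np

rng = np.random.default_rng(4)
n = np.arange(2000)
period = 50
x = 0.5 * np.sin(2 * np.pi * n / period) + rng.standard_normal(n.size)  # buried sine

x0 = x - x.mean()
acf = np.correlate(x0, x0, mode='full')[x0.size - 1:]   # lags 0 ... N-1
acf = acf / acf[0]

# skip the lags near zero: they mainly reflect the noise power
lag = 10 + np.argmax(acf[10:80])
print("estimated period:", lag)          # close to 50
```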
8 Fourier transform
The Fourier transform is a linear operator that maps complex functions to other complex functions. It de-
composes a function into a continuous spectrum of its frequency components, and the inverse transform
synthesizes a function from its spectrum of frequency components. The Fourier transform of a signal
x(t) can be thought of as that signal in the frequency domain X(ω).
Information is often hidden in the spectrum of a signal. Figure 30 shows common waveforms and their Fourier transforms. Also, looking at the transfer function of a system shows its frequency response.
The Fourier transform is, therefore, a commonly used tool. As you will see later, a discretized version of
the Fourier transform exists which is the Discrete Fourier Transform.
Given f : D → C, where D ⊆ R, the Fourier transformation of f is:
F(\omega) := \int_{D} f(t)\, e^{-i\omega t}\, dt
[Fig. 30: Common waveforms and their Fourier transforms, e.g. δ(t) ↔ 1, the δ comb ϖ(t) ↔ ϖ(f), the Gauss impulse e^(−πt²) ↔ e^(−πf²), cos(2πFt) ↔ ½[δ(f+F) + δ(f−F)], rect(t) ↔ sinc(πf), sinc(πt) ↔ rect(f), step(t) ↔ ½δ(f) − i/(2πf), and (1/T)·step(t)·e^(−t/T) ↔ 1/(1 + i2πT f).]
For real-valued signals the spectrum obeys X(ω) = X*(−ω) (complex conjugate symmetry).
The Fourier transform of a cos-like signal will be purely real, and the Fourier transform of a sin-
like signal will be purely imaginary. If you apply the Fourier transform twice, you get the time-reversed
input signal: x(t) →(FT) X(ω) →(FT) x(−t). In the following, the most important calculation rules are summarized:
Scaling:

\mathrm{FT}\{x(\lambda t)\} = \frac{1}{|\lambda|}\, X\!\left(\frac{\omega}{\lambda}\right)

Convolution:

\mathrm{FT}\{x_1(t) * x_2(t)\} = X_1(\omega)\cdot X_2(\omega) \; ; \qquad \mathrm{FT}\{x_1(t)\cdot x_2(t)\} = X_1(\omega) * X_2(\omega) \qquad (13)

Integration:

\mathrm{FT}\Big\{\int_{-\infty}^{t} h(\tau)\, d\tau\Big\} = \frac{1}{i\omega}\, H(\omega) + \underbrace{\frac{1}{4\pi}\int_{-\infty}^{\infty} h(\tau)\, d\tau}_{\text{DC offset}}\;\delta(\omega) \qquad (14)

Time-shift:

\mathrm{FT}\{x(t + t_0)\} = e^{i\omega t_0}\, X(\omega)
The output signal will be the convolution of the input signal with each of the impulse response vectors:

y(t) = x(t) * h_1 * h_2 \; . \qquad (15)

If we now look at the spectrum of the output signal Y(ω) by Fourier transforming Eq. (15), we get

Y(\omega) = X(\omega)\cdot H_1(\omega)\cdot H_2(\omega) \; .
Transfer Functions
Here we made use of the calculation rule (13), that the Fourier transform of a convolution of two
signals is the product of the Fourier tansforms of each signal. In this way, we are going to call the Fourier
transforms of the impulse responses transfer functions. The transfer function also completely describes
a (linear) system; it contains as much information as the impulse response (or kernel) of the system. It
is a very handy concept because it describes how the spectrum of a signal is modified by a system. The
transfer function is a complex function, so it not only gives the amplitude relation |H(ω)| of a system’s
output relative to its input, but also the phase relations. The absolute value of the transfer function can
tell you immediately what kind of filter characteristic the system has. For example, a transfer function whose magnitude |H| stays near 1 at low frequencies and falls off above some frequency describes a low-pass filter.
[Diagram: a step function step(t) applied to a linear system with impulse response h(t) gives the step response y(t).]
The spectrum at the output of a multiplier is the convolution of the two input spectra. In the special case, where one input signal consists only of a
single frequency peak, the spectrum of the second input will be moved to this frequency. So a multiplier
(sometimes also called a mixer) can be used to shift sprectra. Exercise: what does the resulting spectrum
look like if you have a single frequency on each of the two inputs? Which frequency components will be
present? Do not forget the negative frequencies!
y(t) = \int_{-\infty}^{t} h(\tau)\, d\tau
The step response is the integral over time of the impulse response.
cross-correlation:

s(t) * g(-t) \;\overset{\mathrm{FT}}{\longleftrightarrow}\; S(\omega)\cdot G^{*}(\omega)
Here, the real part of the spectrum of the cross-correlation of two signals tells us about parts which are
similar, and the imaginary part of the spectrum tells us about parts which are not correlated.
9 Laplace transform
You have seen how handy the Fourier transformation can be in describing (linear) systems with h(t). But
a Fourier transform is not always defined:
– for example, x(t) = e−t has an infinite frequency spectrum X(ω) > 0 everywhere;
– for example, x(t) = et is unbounded and can not even be represented;
– for example, step(t) −→ infinite frequency spectrum;
– a lot of δ-functions appear, etc.
To handle this, we decompose these functions, not only into a set of ordinary cosine and sine functions,
but we also use exponential functions and exponentially damped or growing sine and cosine functions.
It is not so complicated to do. We just substitute the frequency term iω by a general complex number p.
You can look at this as introducing a complex frequency
p = σ + iω ,
where ω is the known real frequency and σ is a (also real) damping term. The functions to deal with now
become
f (t) = e−pt = e−σt · e−iωt .
Instead of the Fourier transform we now introduce a more general transform, called the Laplace trans-
form: Given s : R+ → R, the Laplace transformation of s is:
s(t) \;\overset{\mathcal{L}}{\longmapsto}\; S(p) := \int_{0}^{\infty} s(t)\, e^{-pt}\, dt
We shall come back to the inverse Laplace transform later in Section 9.5.
[Fig. 32: The p plane. The imaginary axis Im(p) = ω is the frequency axis (p = iω; ordinary sine and cosine functions live here), the real axis Re(p) = σ describes exponential damping or growth, and the convergence area lies to the right of some σ0.]
To see what that means in practice, it is useful to visualize the functions and numbers we deal
with in a diagram. This is possible if we look at the complex plane shown in Fig. 32, called the p plane.
Different points on the plane correspond to different types of base functions as shown. The ordinary sine
and cosine functions are also present and live on the imaginary axis. If the imaginary axis lies inside
of the convergence area, then also the Fourier transform exists, and you will get it if you go along the
imaginary axis. In addition, you also get spectral values for other regions of the plane.
What is this good for? Well, it is good for solving differential equations, especially those for
analog filters. Let us see how this works. Formally, we do exactly the same as we did with the Fourier
transform:
x → System → y : transform the input and the impulse response with L, multiply the results, X(p) · H(p) = Y(p), and obtain y(t) from Y(p) with the inverse transform L⁻¹.
We will have the concept of transfer functions slightly extended onto the whole p plane, but the
concept stays the same. So we may get answers to questions like: What filter do I need for getting
a specific output y(t)? Or we can compose the system out of subsystems by multiplying the transfer
functions of each of them, etc. This implies that we will have nearly the same or similar calculation rules
as for the Fourier transformation, and indeed that is exactly true.
x(t) \;\overset{\mathcal{L}}{\longrightarrow}\; X(p) \;\overset{\mathcal{L}^{-1}}{\longrightarrow}\; x(t)

Linearity:

\mathcal{L}\{c_1 x_1(t) + c_2 x_2(t)\} = c_1 X_1(p) + c_2 X_2(p)

Scaling:

\mathcal{L}\{x(\lambda t)\} = \frac{1}{|\lambda|}\, X\!\left(\frac{p}{\lambda}\right) \; ; \quad \lambda > 0

Time-shift:

\mathcal{L}\{x(t - t_0)\} = e^{-p t_0}\, S(p) \; ; \qquad \mathcal{L}^{-1}\{X(p + p_0)\} = e^{-p_0 t}\, s(t) \; ; \quad t_0 > 0

Convolution:

\mathcal{L}\{x_1(t) * x_2(t)\} = X_1(p)\cdot X_2(p) \; ; \qquad \mathcal{L}^{-1}\{X_1(p) * X_2(p)\} = x_1(t)\cdot x_2(t)

Integration:

\mathcal{L}\Big\{\int_{0}^{t} s(\tau)\, d\tau\Big\} = \frac{S(p)}{p} \; ; \qquad \mathcal{L}^{-1}\Big\{\int_{p}^{\infty} S(p')\, dp'\Big\} = \frac{s(t)}{t}

Differentiation:

\mathcal{L}\Big\{\frac{d^n}{dt^n}\, s(t)\Big\} = p^n\, S(p) \quad \text{if} \quad \frac{d^k s}{dt^k}\Big|_{t=0} = 0 \;\; \forall k < n
Fig. 33: Two poles and one zero in the p plane for the complex spectrum of a damped oscillation
and you can see the resonance at ±ω0. If the poles were on the imaginary axis, a δ function would be necessary for expressing the spectrum.
Y(p) = \sum_{k=0}^{M} a_k\, p^k \cdot X(p) + \sum_{k=1}^{N} b_k\, p^k \cdot Y(p)
= \frac{\sum_{k=0}^{M} a_k\, p^k}{1 - \sum_{k=1}^{N} b_k\, p^k}\cdot X(p) =: H(p)\cdot X(p) \; .
Here, the transfer function H(p) is defined for the whole complex plane using the coefficients from the
differential equation. Its general form is
H(p) = \frac{\sum_{k=0}^{M} a_k\, p^k}{1 - \sum_{k=1}^{N} b_k\, p^k}
= \frac{a_M \prod_{k=1}^{M} (p - p_{0k})}{-b_N \prod_{k=1}^{N} (p - p_{pk})} \; .

Factorizing is always possible. The p_{0k} are the zeros and the p_{pk} the poles of the transfer function. The transfer function is fully determined by its poles and zeros (except for a complex factor a_M/b_N)!

|H(p)| = \left|\frac{a_M}{b_N}\right| \cdot \frac{\prod_{k=1}^{M} |p - p_{0k}|}{\prod_{i=1}^{N} |p - p_{pi}|} \; .
Figure 34 illustrates how you can read the frequency response from a small diagram. You scan from zero along the imaginary axis (which gives you the real frequency ω) and, from each point z on this axis, you measure the distances between z and the zeros and between z and the poles, multiply and divide them together, and plot the result as a function of ω, as shown in Fig. 34. This is the way your filter design tools do it (no magic).
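This scan can be written down directly (an added sketch; the pole and zero positions and the gain factor are arbitrary example values, not taken from the text):

```python
import numpy as np

# poles and zeros of an (analog) transfer function in the p plane
# (an illustrative resonant system)
poles = np.array([-50.0 + 1000.0j, -50.0 - 1000.0j])
zeros = np.array([0.0 + 0.0j])
gain = 100.0                                     # the remaining factor a_M / b_N

omega = np.linspace(1.0, 3000.0, 7)
p = 1j * omega                                   # scan along the imaginary axis

num = np.prod(np.abs(p[:, None] - zeros[None, :]), axis=1)
den = np.prod(np.abs(p[:, None] - poles[None, :]), axis=1)
H_mag = gain * num / den

for w, h in zip(omega, H_mag):
    print(f"omega = {w:7.1f}   |H| = {h:8.3f}")
```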
Fig. 34: Calculating the frequency response of a system from the poles and zeros of its transfer function
Fig. 35: Different integration paths around poles for the inverse Laplace transform
Now the question might be, why does the integration go from minus infinity to infinity exactly on the boundary of the convergence area? Indeed it is not necessary to do it this way. But the integration path needs to encircle all poles (exactly once) anticlockwise. From residue theory we know that the contribution of a holomorphic function on an area where there are no poles is zero for any closed integration loop. So we define

s(t) = \frac{1}{2\pi i}\oint_{\text{path}} S(p)\, e^{pt}\, dp

and, with the residue theorem, this evaluates to

s(t) = \sum_{p_{pk}} \mathrm{Res}_{p_{pk}}\!\left(S(p)\, e^{pt}\right) \; .
tells us that the behaviour of S(p) for large |p| should be at least a decay of the order of $|S(p)| < \frac{1}{|p|}$, so that for $p \to \infty$ the contribution of this path of integration is zero.
Examples
Finally, let us do two examples of inverse Laplace transforms to see how it works out:
1. $p_0$ is a single pole of
$$S(p) := \frac{1}{p+a}\,, \qquad k = 1,\ p_0 = -a\,.$$
$$s(t) = \operatorname{Res}_{-a}\!\left(S(p)\, e^{pt}\right)
= \frac{1}{(1-1)!} \cdot \frac{d^0}{dp^0}\left[\frac{1}{p+a}\, e^{pt}\, (p + a)^1\right]_{p=-a} = e^{-at}\,.$$
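Such a residue calculation can be cross-checked symbolically. A minimal sketch with sympy (assuming it is available) reproduces the result $e^{-at}$:

```python
import sympy as sp

p, t = sp.symbols('p t')
a = sp.symbols('a', positive=True)
S = 1 / (p + a)

# Residue of S(p)*exp(p*t) at the single pole p = -a gives s(t)
print(sp.residue(S * sp.exp(p * t), p, -a))        # exp(-a*t)

# Cross-check with the built-in inverse Laplace transform
print(sp.inverse_laplace_transform(S, p, t))       # exp(-a*t)*Heaviside(t)
```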
$$s[n] = \frac{1}{N} \sum_{k=0}^{N-1} S[k]\, e^{2\pi i \frac{nk}{N}}\,.$$
Calculation rules for the DFT are exactly the same as for the continuous Fourier transforms (linearity, symmetry, etc.), just replace ω with the discrete frequency $\omega_d := 2\pi\frac{k}{N}$:
$$S[\omega_d] = \sum_{n=0}^{N-1} s[n]\, e^{-i\omega_d n}$$
and then substitute $k = \frac{\omega_d}{2\pi} \cdot N$, $k \in \mathbb{N}_0$.
But there are also two important differences. One is the
Scaling: ($\lambda \in \mathbb{Z}$)
$$\text{DFT}\{x[\lambda n]\} = \frac{1}{|\lambda|}\, X\!\left(\frac{\omega_d}{\lambda}\right) = X[???]$$
which will not work, because the length of the period itself is modified. A little modification needs to be applied to the
Time-shift:
$$\text{DFT}\{x[n + n_0]\} = e^{i\omega_d n_0}\, X[k]\,.$$
And finally, with the convolution, one needs to pay attention, because if the result has more samples than the period, it needs to be folded back into the period.
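These rules are easy to verify numerically. The following sketch (illustrative values only) checks the time-shift rule and shows that the product of two DFTs corresponds to a circular convolution, i.e., the part of the linear convolution that extends beyond one period is folded back:

```python
import numpy as np

N = 16
n = np.arange(N)
x = np.cos(2 * np.pi * 3 * n / N) + 0.5 * np.sin(2 * np.pi * 5 * n / N)

# Time-shift rule: DFT{x[n + n0]} = exp(i*omega_d*n0) * X[k], with omega_d = 2*pi*k/N
n0 = 4
X = np.fft.fft(x)
lhs = np.fft.fft(np.roll(x, -n0))                          # x[n + n0] for a periodic x
rhs = np.exp(1j * 2 * np.pi * np.arange(N) * n0 / N) * X
print(np.allclose(lhs, rhs))                               # True

# Product of DFTs <-> circular convolution (result folded back into one period)
h = np.exp(-n / 3.0)
circ = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))
lin = np.convolve(x, h)                                    # linear convolution, 2N-1 samples
folded = lin[:N].copy()
folded[:N - 1] += lin[N:]                                  # fold the tail back
print(np.allclose(circ, folded))                           # True
```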
1. zero-padding
2. windowing, also for estimation of FT of an aperiodic signal and time-frequency analysis.
[Fig. 36: block diagram of a digital filter: the input x[n] passes through a chain of unit delays (UD); the delayed input samples are weighted with the direct coefficients a_0 … a_{M−1}, the delayed output samples with the recursive coefficients b_1 … b_{N−1}, and all products are summed to give the output y[n].]
In analogy to the differential equation for analog systems (see Eq. (17))
$$y(t) = \sum_{k=0}^{N} \alpha_k \frac{d^k}{dt^k} x(t) + \sum_{k=1}^{M} \beta_k \frac{d^k}{dt^k} y(t)\,,$$
we can define a similar equation of differences for the digital systems which only consists of the above mentioned three operations (compare with the equivalent notation shown in Fig. 36):
$$\Rightarrow\quad y[n] = \underbrace{\sum_{k=0}^{N-1} a_k\, x[n-k]}_{\text{direct}} + \underbrace{\sum_{k=1}^{M-1} b_k\, y[n-k]}_{\text{recursive}}\,, \qquad (20)$$
where we have two filter kernels, one direct kernel a[M] and one recursive kernel b[N].
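A direct (if inefficient) implementation of Eq. (20) needs nothing more than the three operations mentioned above. The following sketch uses made-up example coefficients, not values from the text:

```python
import numpy as np

def difference_equation(x, a, b):
    """Literal implementation of Eq. (20):
    y[n] = sum_k a[k]*x[n-k] (direct part) + sum_{k>=1} b[k]*y[n-k] (recursive part)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        for k in range(len(a)):                 # direct kernel
            if n - k >= 0:
                y[n] += a[k] * x[n - k]
        for k in range(1, len(b)):              # recursive kernel (b[0] is unused)
            if n - k >= 0:
                y[n] += b[k] * y[n - k]
    return y

# Example: a simple first-order smoother driven by a step input
x = np.ones(10)
a = [0.1]                                       # direct kernel
b = [0.0, 0.9]                                  # recursive kernel with b[1] = 0.9
print(difference_equation(x, a, b))             # output approaches 1.0
```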
It is clear why it is named in this way: for the FIR filter the impulse response has only a finite number of non-zero values, which means that there is an $n_f$ where $h[i] = 0\ \forall i > n_f$. In contrast to this, the impulse response of an IIR filter will (in general) be of infinite length, although only a finite set of coefficients $(a_k, b_k)$ generates it.
Order := max(N, M) .
So the order is the minimum number of coefficients needed to implement it. The order is also a
measure for the maximum latency (or delay) of a filter, because it counts the maximum number of unit
delays needed to complete the output (refer to Fig. 36).
For an FIR filter, the order of the filter is equal to the length of the impulse response. For an IIR
filter this is not the case.
$$y[n] = \sum_{k=0}^{N} a_k\, x[n-k] + \sum_{k=1}^{M} b_k\, y[n-k]$$
Applying the DFT to both sides and using the time-shift rule gives
$$Y(\omega_d) = \sum_{k=0}^{N} a_k\, X(\omega_d)\, e^{-i\omega_d k} + \sum_{k=1}^{M} b_k\, Y(\omega_d)\, e^{-i\omega_d k}$$
$$\stackrel{X(\omega_d)\neq 0\ \forall \omega_d}{\Longrightarrow}\qquad H(\omega_d) := \frac{Y(\omega_d)}{X(\omega_d)} = \frac{\sum_{k=0}^{N} a_k\, (e^{-i\omega_d})^k}{1 - \sum_{k=1}^{M} b_k\, (e^{-i\omega_d})^k}\,. \qquad (21)$$
Remember that the digital frequency $\omega_d$ is periodic with $\omega_s$ ($-\pi < \omega_d < \pi$).
Further, remember that we developed a similar formula in Section 9.3. In that case, we used the Laplace transform (as a more general expression which was extended to complex frequencies) instead of the Fourier transform. It is also possible to do (more or less) the same thing here for the digital systems. We can substitute $z := e^{i\omega_d}$ and extend it to the complex plane by including a damping term σ:
$$z := e^{i\omega_d - \sigma}\,.$$
11 The z transform
Introducing the z-transform, we develop a tool which is as powerful as the Laplace transform mentioned
in Section 9, but also applicable for digital systems and digital signals. The concept is based on the peri-
odicity of the spectra of digital signals. With a suitable transformation, all tools and methods developed
for analog systems and analog signals using the Laplace transform can be adapted for use with digital
ones.
Starting with the discrete transfer function, we simply do the substitution $z := e^{i\omega_d}$ $(= e^{2\pi i \frac{k}{N}})$ in Eq. (21):
$$H(\omega_d) \;\xrightarrow{\text{substitution}}\; H(z) = \frac{\sum_{k=0}^{N} a_k\, z^{-k}}{1 - \sum_{k=1}^{M} b_k\, z^{-k}}\,.$$
This substitution maps the frequency axis to the unit circle in the complex z-plane:
[Illustration: the frequency axis (…, −2f_s, −f_s, 0, f_s, 2f_s, …) is wrapped onto the unit circle in the z-plane; $\omega_d = 0$ (and multiples of $f_s$) map to z = 1, and $\omega_d = \pm\pi$ maps to z = −1.]
This concept is useful because it automatically accounts for the periodicity of $\omega_d$. The z-plane (or the unit circle) is a representation of one period of the digital frequency. Frequencies above the Nyquist frequency are automatically mapped to the place where their aliasing frequencies would be. So there will be no aliasing from now on.
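Once the coefficients $a_k$ and $b_k$ are known, the frequency response follows from evaluating H(z) on the unit circle. A minimal sketch (the coefficients below are the same smoothing kernel as in the FIR example further down and are used purely for illustration):

```python
import numpy as np

def H_unit_circle(a, b, omega_d):
    """Evaluate H(z) = sum_k a[k] z^-k / (1 - sum_{k>=1} b[k] z^-k) at z = exp(i*omega_d)."""
    z = np.exp(1j * omega_d)
    num = sum(a[k] * z ** (-k) for k in range(len(a)))
    den = 1 - sum(b[k] * z ** (-k) for k in range(1, len(b)))
    return num / den

a = [0.5, 1.0, 0.5]                           # non-recursive (FIR) coefficients
b = [0.0]                                     # no recursive part
omega_d = np.linspace(-np.pi, np.pi, 9)
print(np.abs(H_unit_circle(a, b, omega_d)))   # 2 at omega_d = 0, 0 at omega_d = +-pi
```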
Now, we can extend this concept to the whole complex plane $z \in \mathbb{C}$. We therefore add a damping term to the digital frequency $\omega_d$:
$$\omega_d \in \mathbb{R}_{[-\pi,\pi]} \;\longrightarrow\; \omega_{dc} \in \mathbb{C}\,, \qquad \omega_{dc} = \omega_d + i\sigma\,,$$
$$\Rightarrow\quad z = e^{i\omega_{dc}} = e^{i\omega_d - \sigma}\,.$$
As shown in Fig. 37, different points and regions for z correspond to different classes of (sampled)
functions. As with the p-plane for the Laplace transform, besides discrete sine and cosine functions
there are also discrete exponential functions, as well as exponentially damped and growing functions.
Together, this set of functions forms a basis for the decomposition of any discrete signal. In particular,
for the expression of the transfer functions of discrete systems, we can find a very handy way; it is similar
to what we did with the transfer functions of analog systems, factorized in poles and zeros in the p-plane.
[Fig. 37: examples of sampled base functions corresponding to different points of the z-plane, labelled by their digital frequency $\omega_d$ and damping σ: points on the unit circle (σ = 0) give pure sine/cosine sequences, points inside (σ > 0) give damped sequences, and points outside (σ < 0) give growing sequences.]
Fig. 38: Calculation of the frequency response of a digital system from the poles and zeros of its transfer function in the z-plane
All this can be done by a very easy calculation, or even graphically, if you like.
Examples
1. 2nd order non-recursive filter (FIR):
$$a_0 = \tfrac{1}{2}\,;\quad a_1 = 1\,;\quad a_2 = \tfrac{1}{2}\,;\quad b_1 = b_2 = 0$$
$$h[n] = \Big\{\tfrac{1}{2},\, 1,\, \tfrac{1}{2}\Big\}$$
$$\longrightarrow\quad y[n] = \tfrac{1}{2}\,x[n] + x[n-1] + \tfrac{1}{2}\,x[n-2]$$
$$\longrightarrow\quad H(e^{i\omega_d}) = \tfrac{1}{2} + e^{-i\omega_d} + \tfrac{1}{2}\,e^{-2i\omega_d}$$
$$\longrightarrow\quad H(z) = \tfrac{1}{2} + z^{-1} + \tfrac{1}{2}\,z^{-2} = \frac{z^{-2}}{2}\,(z+1)^2$$
Poles: $z_{p1} = z_{p2} = 0$; Zeros: $z_{01} = -1$.
2. Recursive filter (IIR):
$$a_0 = 1\,;\quad a_1 = a_2 = \cdots = a_n = 0\,;\quad b_1 = 0.9\,;\quad b_2 = \cdots = b_m = 0$$
$$h[n] = (0.9)^n\,;\quad n \ge 0$$
$$\longrightarrow\quad H(z) = \frac{1}{1 - 0.9\,z^{-1}} = \frac{z}{z - 0.9}$$
$$h[n] \;\xmapsto{\ \mathcal{Z}\ }\; H(z) := \sum_{n=-\infty}^{\infty} h[n]\, z^{-n}$$
Region of convergence
The region of convergence (Roc) can be defined as follows:
$$\text{Roc} := \Big\{\, z : \Big|\sum_{n=-\infty}^{\infty} h[n]\, z^{-n}\Big| < \infty \,\Big\}$$
(all the poles of H(z) lie inside a circle of |z| < r).
[Figure: the bilinear transformation maps the p-plane of an analog system onto the z-plane of a digital system; corresponding regions (A, B, C) of the two planes are mapped onto each other: analog system ⇔ digital system.]
$$h[n] = \sin(\omega n)\,,\ n \ge 0 \qquad\longmapsto\qquad H(z) = \frac{z\,\sin(\omega)}{z^2 - 2z\cos(\omega) + 1}\,, \qquad |z| > 1\,.$$
If the impulse response of the system decays approximately exponentially or faster, then all poles lie inside a circle of finite radius, and the z-transform exists outside of that circle.
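This convergence can be seen numerically: for the recursive example above, $h[n] = 0.9^n$, the truncated sum of $h[n]\,z^{-n}$ approaches the closed form $z/(z - 0.9)$ for points outside the circle |z| = 0.9 (a small sketch, with illustrative sample points):

```python
import numpy as np

def z_transform_truncated(z, n_max=200):
    """Truncated sum of h[n] * z^-n for h[n] = 0.9**n, n >= 0."""
    n = np.arange(n_max, dtype=float)
    return np.sum(0.9 ** n * z ** (-n))

for z in (1.5 + 0.5j, 1.0, 0.95):               # all outside |z| = 0.9
    print(z, z_transform_truncated(z), z / (z - 0.9))   # the two values agree
```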
where C is an anticlockwise, closed path encircling the origin and lying entirely in the region of convergence. C must encircle all of the poles of X(z). In this case Eq. (22) can be expressed using the calculus of residues:
$$x[n] = \sum_{z_{pk}} \operatorname{Res}_{z_{pk}}\!\left( X(z)\, z^{n-1} \right)\,.$$
Example
$z_0$ is a single pole of
$$X(z) := \frac{1}{z}\,, \qquad k = 1\,,\ z_0 = 0\,.$$
$$x[n] = \operatorname{Res}_0\!\left( X(z)\, z^{n-1} \right)
= \frac{1}{(1-1)!} \cdot \frac{d^0}{dz^0}\left[\frac{1}{z}\, z^{n-1}\, (z - 0)^1\right]_{z=0}
= \begin{cases} 1 & n = 1 \\ 0 & n \neq 1 \end{cases}
\;=\; \delta[n-1]\,.$$
edge frequency, and what is the phase response of the system to be created? This is especially necessary if you design feedback loops and stability is your concern.
Well, this is how you could start:
– Do not specify the $a_i$, $b_i$ directly, but rather specify zeros $z_{0k}$ and poles $z_{pi}$ through the transfer function H(z), H($\omega_d$), the impulse response h[n], or the step response s[n]. Usually, you do this by trial and error: you place some poles and zeros on the z-plane and calculate the frequency response (if you are interested in the frequency domain), or the step response (if you are interested in the time domain), or both. Then you can move these poles around to see how that changes the responses. You could add more zeros or poles and try to cancel out resonances if they bother you, etc.
– Then calculate the $a_i$ and $b_i$ or h[n] for the implementation. The implementation is straightforward and not very difficult; if you keep the order of your filter small, there will not be many surprises later.
To make this trial-and-error job a little more sophisticated, you should know that
1. Because the $a_i$, $b_i$ usually should be real (for implementation), the $z_{0k}$ and $z_{pi}$ need to be real or they must appear in complex conjugate pairs.
2. The filter kernel should be finite, or at least
$$\lim_{n\to\infty} h[n] \overset{!}{=} 0\,,$$
otherwise the filter might be unstable. A consequence of this condition is that $|z_{pk}| < 1$, which means the poles need to be located inside the unit circle.
(A small numerical sketch of this pole/zero placement procedure is given below.)
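Under these assumptions, a minimal numerical sketch of the pole/zero placement procedure could look as follows; the pole and zero locations are made-up example values:

```python
import numpy as np

zeros = np.array([-1.0 + 0.0j])                         # a real zero at z = -1
poles = np.array([0.7 * np.exp(1j * np.pi / 4),         # complex-conjugate pair,
                  0.7 * np.exp(-1j * np.pi / 4)])       # inside the unit circle (stable)

num = np.poly(zeros)      # coefficients of prod_k (z - z0k), highest power of z first
den = np.poly(poles)      # coefficients of prod_k (z - zpk)

# Rewrite H(z) = num(z)/den(z) in the form of Eq. (21): divide by z^N (N = number of
# poles), so that a_k multiplies z^-k and b_k = -den_k for k >= 1.
N = len(poles)
a = np.r_[np.zeros(N + 1 - len(num)), num].real
b = -den.real
print("a_k =", a, " b_k (k >= 1) =", b[1:])             # real coefficients, as required

omega_d = np.linspace(0.0, np.pi, 5)                    # frequency response on the unit circle
z = np.exp(1j * omega_d)
print(np.abs(np.polyval(num, z) / np.polyval(den, z)))
```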
Fig. 42: Filtering a noisy signal with a low-pass filter. The result is time-shifted and the high-frequency components (the sharp edges) of the signal are not well reconstructed.
Fig. 43: Principle of an adaptive and predictive filter (like the Kalman filter). The filter consists of a model of the underlying process which can calculate predicted values from the model parameters (and the latest measured values). The model parameters are adapted from a system identification block. The algorithm is essential for the optimality of the Kalman filter; it always follows the variance of the measured data and the predicted data, based on the rule of 'propagation of error'. In this way, it is guaranteed that the output variance is always smaller than (or equal to) the variance of the input signal.
can improve the knowledge of the position. Assume the uncertainty $\sigma_2$ of this second measurement is smaller than the first one (because now the captain himself did the measurement). You could throw away the first measurement and only use the second one. But this would not be the best solution, because the first measurement also contains information we could benefit from. So the clever way is to combine both measurements:
$$\Rightarrow\quad \text{best estimate:}\quad \hat{x} = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\cdot x_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\cdot x_2$$
$$\text{uncertainty:}\quad \hat{\sigma} = \frac{1}{\sqrt{\dfrac{1}{\sigma_1^2} + \dfrac{1}{\sigma_2^2}}} \;\le\; \min(\sigma_1, \sigma_2)$$
4 For the moment time does not play a role because we assumed the boat to be at rest.
so that the variance σ̂ of the resulting position measurement x̂ is even better than the best of each single
measurement.
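As a small numerical illustration of this weighted combination (the numbers are invented, not from the text):

```python
import numpy as np

def combine(x1, sigma1, x2, sigma2):
    """Variance-weighted combination of two measurements of the same quantity."""
    w = sigma2**2 / (sigma1**2 + sigma2**2)
    x_hat = w * x1 + (1 - w) * x2
    sigma_hat = 1.0 / np.sqrt(1.0 / sigma1**2 + 1.0 / sigma2**2)
    return x_hat, sigma_hat

# Two position fixes of the boat at rest: the combined estimate leans towards the
# more precise measurement, and its uncertainty is smaller than either input.
print(combine(10.0, 2.0, 11.0, 1.0))       # (10.8, 0.894...)
```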
But what if some noticeable/relevant time has passed between the measurements?
To be more general we can say:
$$\hat{x}(t_2) = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\cdot x(t_1) + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\cdot x(t_2)
= x(t_1) + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\cdot \big(x(t_2) - x(t_1)\big)\,.$$
Now we consider a stream of (new) input data $x_{n+1} := x(t_{n+1})$ which should be combined with the latest best value $\hat{x}_n$ to produce a new best value $\hat{x}_{n+1}$. (Remember: the variances $\sigma_{n+1}$ of each measurement are assumed to be known.) This is trivial if the measurement device is the same for all measurements, since $\sigma_{n+1}$ can be assumed to be constant. But even if $\sigma_{n+1}$ is not known in advance, one can estimate it by calculating the variance (e.g., with the method of running statistics) of the input signal stream.
where $v(t) \pm \Delta v(t)$ is assumed to be known by a different measurement (called system identification). Besides this, we also assume that v is constant or changing only adiabatically (slowly compared to the sampling rate of the position measurements). The model also tells us the expected uncertainty of the calculated position value:
$$\sigma(t) = \sqrt{(\Delta v \cdot (t - t_0))^2 + (\sigma(t_0))^2}\,.$$
If, at this moment, you do not understand this formula, read the paragraph about ‘propagation of error’.
Figure 44 shows you what this means: Because the velocity also has a non-zero variance, the variances
of the position derived from it become larger with time (σ is growing!), so the uncertainty is increasing!
Now let us see what this means for our example. Since the boat is moving, we cannot simply combine the latest best value with the new measurement, because some time has passed since the last measurement and we know (by our model) that the position must have changed in the meantime, so we cannot simply combine the two (to produce an average). Instead, we have to consider the position change since the last measurement. This can be done by a prediction of the actual position by our model, $x(t_{n+1}) =: \bar{x}_{n+1}$, based on the model parameter v and the last 'known' position $\hat{x}(t_n)$:
Propagation of error
Consider a function
$$f = f(\alpha_1, \alpha_2, \ldots, \alpha_n)$$
which is a function of one or more (model) parameters $\alpha_i$, each with corresponding errors $\Delta\alpha_i$. Now you want to know the consequence of these errors on the overall error or uncertainty of f:
$$\Rightarrow\quad \Delta f = \sqrt{\sum_i \left(\frac{\partial f}{\partial \alpha_i}\, \Delta\alpha_i\right)^2}\,.$$
Maybe you have seen this before, because this is a very common formula in physics and applies everywhere where measurements are done. In our example this means:
$$x(t) = v \cdot t + x_0$$
$$\Rightarrow\quad \Delta x = \sqrt{\left(\frac{\partial x}{\partial v}\,\Delta v\right)^2 + \left(\frac{\partial x}{\partial x_0}\,\Delta x_0\right)^2 + \left(\frac{\partial x}{\partial t}\,\Delta t\right)^2}
= \sqrt{(\Delta v \cdot t)^2 + (\Delta x_0)^2}\,,$$
(assuming Δt = 0).
This assumes that the individual errors are not correlated and are Gaussian distributed. This is
likely because of the central limit theorem, but not guaranteed!
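For simple models, this propagation of error can also be carried out symbolically; a minimal sketch with sympy for the example above:

```python
import sympy as sp

v, x0, t, dv, dx0 = sp.symbols('v x0 t Delta_v Delta_x0', positive=True)
x = v * t + x0                                   # the model x(t) = v*t + x0

# Gaussian propagation of error (Delta_t = 0, as assumed above)
dx = sp.sqrt((sp.diff(x, v) * dv)**2 + (sp.diff(x, x0) * dx0)**2)
print(sp.simplify(dx))                           # sqrt(Delta_v**2*t**2 + Delta_x0**2)
```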
$$\hat{x}_n = \bar{x}_n + \bar{K}_n\,(x_n - \bar{x}_n)\,; \qquad
\hat{\sigma}_n = \frac{1}{\sqrt{1/\bar{\sigma}_n^2 + 1/\sigma_n^2}}\,,$$
where
$$\bar{K}_n := \frac{\bar{\sigma}_n^2}{\bar{\sigma}_n^2 + \sigma_n^2}$$
is the redefined Kalman gain.
With some additional substitutions, $T := t_{n+1} - t_n$, one can see the general structure (difference equation) of the digital filter:
And this is also the way the Kalman filter could be implemented. Notice that the second term is
a recursive part of the filter. The Kalman gain is the weight which decides how much model and how
much input data goes to the output. If the prediction from the model is bad (the corresponding estimated
variance σ̄ is large), the Kalman gain tends to K̄ = 1 and so the input will be directly passed to the output
without using the model at all, but also without making the output data more noisy than the input. On
the contrary, if the input data occasionally has a lot of noise and the model and its model parameters are
still fine, K̄ will be closer to zero and the output data of the Kalman filter will be dominated by the model
predictions and its statistics.
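Putting the combination step, the prediction step, and the 'propagation of error' together, a one-dimensional Kalman filter for the boat example can be sketched as below. The velocity model, its uncertainty, and all numbers are assumptions for illustration only:

```python
import numpy as np

def kalman_1d(x_meas, sigma_meas, v, dv, T, x0, sigma0):
    """Minimal 1-D Kalman filter sketch: constant-velocity model v +/- dv,
    sampling interval T, measurements x_meas with known uncertainties sigma_meas."""
    x_bar, sigma_bar = x0, sigma0                       # prediction for the first step
    out = []
    for x_n, sigma_n in zip(x_meas, sigma_meas):
        # combine the prediction with the new measurement (Kalman gain K)
        K = sigma_bar**2 / (sigma_bar**2 + sigma_n**2)
        x_hat = x_bar + K * (x_n - x_bar)
        sigma_hat = 1.0 / np.sqrt(1.0 / sigma_bar**2 + 1.0 / sigma_n**2)
        out.append((x_hat, sigma_hat))
        # predict the next position; 'propagation of error' for its uncertainty
        x_bar = x_hat + v * T
        sigma_bar = np.sqrt((dv * T)**2 + sigma_hat**2)
    return out

rng = np.random.default_rng(0)
t = np.arange(20)
true_x = 1.0 * t                                        # boat drifting with 1 m/s
meas = true_x + rng.normal(0.0, 2.0, size=t.size)       # noisy position measurements
for x_hat, s_hat in kalman_1d(meas, [2.0] * t.size, v=1.0, dv=0.1, T=1.0,
                              x0=meas[0], sigma0=2.0)[:5]:
    print(round(x_hat, 2), round(s_hat, 2))             # sigma_hat stays below 2.0
```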
– The Kalman filter makes use of an internal model and model parameters.
– The ‘internal’ system/model parameters (σ, v, ∆v) are calculated from the input data itself.
– Also the variances of the input data stream and the variances of the derived predicted values belong
to the internal parameters.
– The Kalman filter makes use of the ‘propagation of error’ principle.
– The Kalman filter has three fundamental functional blocks:
1. The combination of the model prediction with the input data stream.
2. The prediction block for the next model value.
3. The system identification block for update of the model parameters.
Fig. 45: Internal structure of the Kalman filter. In one block, the model prediction $\bar{x}_{n-1}$ from the last step is combined with the actual input value x[n] and passed to the output y[n]. The second block calculates the prediction for the next time-step based on the model parameters and their variances. In parallel, the model parameters need to be updated by a system identification algorithm.
We discussed a fairly simple example. In more realistic applications, the model can be very
complex. But with more and more model parameters, more error contributions are added; this means
that the optimal model complexity needs to be evaluated. The model should be as complex as necessary
to reduce the noise, but also as simple as possible.
The trick is that the σ of the output will always be smaller than (or in the worst case equal to) the σ of the input⁵. So the output will be optimally noise-filtered (depending on the model). A bad model generates a K̄ near to one, so the input is not corrected much (no effective filtering).
14 Wavelets
As mentioned in the previous section, wavelets can be helpful (among other things) for removing noise with special spectral characteristics from a (non-causal) signal.
A similar method can also be used to select only particularly desired spectral components with special filters; this is also often used for (lossy) data compression, e.g., of audio signals or images. Last
but not least, the wavelet transformation also has applications in solving special classes of differential
equations. We shall not go into these very popular fields, but instead restrict ourselves to the question of
how we can make use of the wavelets for noise removal.
Fig. 46: Example for using the digital Fourier transformation for noise filtering. The noisy signal is transformed to the frequency domain. Then all spectral components which are below a certain threshold (for amplitude) are removed (which means they are set to zero) and finally the data is transformed back into the time domain. Depending on the threshold used, the result is fairly good, or still too noisy if the threshold was too low, or the reconstruction of the signal is bad if the threshold was set too high.
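A minimal numerical sketch of the scheme in Fig. 46 (threshold and signal are invented example values):

```python
import numpy as np

rng = np.random.default_rng(1)
n = np.arange(256)
signal = np.sin(2 * np.pi * 5 * n / 256) + 0.5 * np.sin(2 * np.pi * 12 * n / 256)
noisy = signal + rng.normal(0.0, 0.4, n.size)

spectrum = np.fft.fft(noisy)
threshold = 20.0                                     # threshold on the spectral amplitude
spectrum[np.abs(spectrum) < threshold] = 0.0         # remove small components
denoised = np.real(np.fft.ifft(spectrum))

print(np.std(noisy - signal), np.std(denoised - signal))   # residual noise is reduced
```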
In this way, it is possible to remove high-frequency noise where the original signal is smooth and still conserve the sharp edges of the waveform.
And finally, more technical requirements are needed in particular applications to make the calculation easy:
– smooth wavelets,
– compactly supported wavelets (Daubechies, 1988),
– wavelets with simple mathematical expressions (Haar, 1900, Meyer, Morlet),
6 On the contrary, any function consisting of a δ-function will probably be localized in time but definitely not in frequency.
7 The Gauss function itself would also be localized in frequency, but it does not fulfil the stricter requirement that the spectrum go to zero for ω → 0.
Each wavelet family is generated from a 'mother wavelet' $\Psi_{1,0}$ (which fulfils the requirements mentioned above) by a transformation which is a combination of translation and dilatation:
$$\text{family:}\quad \Psi_{1,0}(x) \;\longmapsto\; \Psi_{a,b} := \frac{1}{\sqrt{a}}\,\Psi_{1,0}\!\left(\frac{x-b}{a}\right); \qquad a \in \mathbb{R}^{+},\ b \in \mathbb{R}\,.$$
If you do a proper selection of a's and b's, you can get a wavelet family which forms a basis (like {sin(nωt), cos(nωt)} do).
With the following set of parameters a and b:
$$a := 2^{-j}\,;\quad b := k \cdot 2^{-j}\,;\qquad j, k \in \mathbb{Z}\,,$$
$$f = \sum_{j,k \in \mathbb{Z}} c_{j,k} \cdot \psi_{j,k}\,.$$
The difference, compared with the ordinary Fourier transformation, is that the coefficients form a two-
dimensional array; and you may ask what the benefit of having even more coefficients than with a Fourier
transformation will be. The answer is that the wavelet decomposition can be done in a tricky way, such that
only (the first) very few coefficients hold most of the information of your function f and almost all other
components can be neglected. Of course, this depends on the class of functions and on the selected
(mother) wavelets.
The second big task—after having selected the wavelet family you want to use—is how to get the
coefficients, or, more generally, how to perform the Discrete Wavelet Transformation
$$f(x) \;\xmapsto{\ \text{DWT}\ }\; \{c_{j,k}\}\,; \qquad c_{j,k} \in \mathbb{R};\ j, k \in \mathbb{Z}\,.$$
The algorithm for the DWT is a bit tricky, but this problem has been solved and a very efficient algorithm exists. Unfortunately, it is out of the scope of this lecture to explain how it works, so please consult the literature. But still one word: it makes use of iterative (digital) filter banks, and the problem can best be understood in the frequency domain. Also, the concept of the scaling function plays an important role here, limiting the number of coefficients to a finite number.
$$\Psi_{00}(x) := \begin{cases} 1 & 0 \le x < 0.5 \\ -1 & 0.5 \le x < 1 \\ 0 & \text{else.} \end{cases}$$
infinity (especially all the coefficients with j < 0) or, and this is of course recommended, you add (at least) one additional function to the set of wavelets which replaces the infinite number of smaller and smaller scale wavelets; namely $\Phi_{j_0,k}$; $k \in \mathbb{Z}$ ($j_0$ is fixed here, so these functions form only a one-dimensional array), the scaling function. The scaling function is not a wavelet, since $\int \Phi(x)\, dx = 1$ is required, but you can prove that the set
$$\big\{\Phi_{j_0,k},\ \Psi_{j,k}\,;\ j \ge j_0,\ k \in \mathbb{Z}\big\}$$
spans the same space as the full basis $\big\{\Psi_{j,k}\,;\ j, k \in \mathbb{Z}\big\}$.
Now you might still be worried about the k within the definition, but consider our example: let us choose $j_0 = 0$. The restriction of the domain to [0; 1[ means that we need only consider wavelets with $0 \le k < 2^j$, and there is a maximal j because of our sampling resolution (j < m). All in all, the number of non-zero wavelet components is limited to a finite number. Finally, the missing scaling function is simply
$$\Phi_{0,k} := \begin{cases} 1 & \text{for } 0 \le x < 1\,, \\ 0 & \text{else} \end{cases}$$
independent of k, so we need only one. Now all functions (with a limited number of samples) can be transformed to a finite set of wavelet coefficients. If the number of non-zero wavelet coefficients is smaller than the number of samples, you might be happy.
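For the Haar family, the discrete wavelet transformation itself is particularly simple; the following sketch computes the (orthonormal) Haar decomposition of a sampled signal by repeated averaging and differencing:

```python
import numpy as np

def haar_dwt(x):
    """Full Haar wavelet decomposition (orthonormal). Returns the single scaling
    coefficient followed by the detail coefficients, coarsest scale first.
    Assumes len(x) is a power of two."""
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2.0)      # scaling-function part
        diff = (x[0::2] - x[1::2]) / np.sqrt(2.0)     # wavelet (detail) part
        details.insert(0, diff)
        x = avg
    return np.concatenate([x] + details)

# For a smooth signal, the fine-scale detail coefficients are comparatively small
x = np.linspace(0.0, 1.0, 8) ** 2
print(np.round(haar_dwt(x), 3))
```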
Unfortunately, the application of the wavelets is limited: Although the discrete wavelet transfor-
mation is well defined, and efficient algorithms have been worked out, the success of using the wavelets
depends on the choice of the wavelet family. If you cannot find a clever wavelet family which fits well
with your particular problem, you will be lost, and there is no generic way to help you out there.
Acknowledgements
I would like to thank Kirsten Hacker for proofreading the manuscript. Thanks to all those who sent
me their comments and also pointed out some bugs and confusions after the presentation at the CERN
school. If this started fruitful discussions, I am happy.
Bibliography
Many ideas for instructive pictures are taken from Smith’s book, which is pretty much a beginner’s
guide to digital signal processing. Figures 12, 14, 23, 25, 28, and 29 have their origin there. There are
many other books on digital signal processing, wavelets, and the Kalman filter. Here, I just list a short
collection of textbooks and similar papers which inspired and taught me the latter. You have to find out
by yourself if they will also be useful to you.
– S.W. Smith, The Scientist and Engineer’s Guide to Digital Signal Processing (California Technical
Pub., San Diego, CA, 1997).
– W. Kester, Mixed-Signal and DSP Design Techniques (Newnes, Amsterdam, 2003).
– W.H. Press, B.P. Flannery, S.A. Teukolsky and W.T. Vetterling, Numerical Recipes in C: The Art
of Scientific Computing, 2nd ed. (Cambridge University Press, 1992).
– B.D.O. Anderson and J.B. Moore, Optimal Filtering (Prentice-Hall, Englewood Cliffs, NJ, 1979).
– G.F. Franklin, J.D. Powell and M.L. Workman, Digital Control of Dynamic Systems, 3rd ed.
(Addison-Wesley, Menlo Park, CA, 1998).
– E. Kreyszig, Advanced Engineering Mathematics, 8th ed. (Wiley, New York, 1999).
– D. Lancaster, Don Lancaster’s Active Filter Cookbook, 2nd ed. (Newnes, Oxford, 1996).
– P.M. Clarkson, Optimal and Adaptive Signal Processing (CRC Press, Boca Raton, 1993).
– G. Strang and T. Nguyen, Wavelets and Filter Banks (Cambridge Univ. Press, Wellesley, MA,
1997).
– B. Widrow and S.D. Stearns, Adaptive Signal Processing (Prentice-Hall, Englewood Cliffs, NJ,
1985).
– G. Welch and G. Bishop, An Introduction to the Kalman Filter, University of North Carolina at Chapel Hill, Department of Computer Science, http://www.cs.unc.edu/~{welch,gb}.
– Wikipedia, The Free Encyclopedia. May 1, 2007, http://en.wikipedia.org/.