Digital Signal Processing
Tenth Edition
Maurice Bellanger
CNAM, Paris
France
Translated by
Benjamin A. Engel
This edition first published 2024
Copyright © 2024 by John Wiley & Sons Ltd. All rights reserved.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any
form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law.
Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Maurice Bellanger to be identified as the author of this work has been asserted in accordance with law.
Originally published in France as: Traitement numérique du signal 10th edition By Maurice BELLANGER
© Dunod 2022, Malakoff
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at
www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears
in standard print versions of this book may not be available in other formats.
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or
its affiliates in the United States and other countries and may not be used without written permission. All other
trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product
or vendor mentioned in this book.
Contents
17 Applications 335
17.1 Frequency Detection 335
17.2 Phase-locked Loop 337
17.3 Differential Coding of Speech 338
17.4 Coding of Sound 339
17.5 Echo Cancellation 340
17.5.1 Data Echo Canceller 340
17.5.1.1 Two-wire Line 340
17.5.2 Acoustic Echo Canceller 342
17.6 Television Image Processing 342
17.7 Multicarrier Transmission – OFDM 344
17.8 Mobile Radiocommunications 347
References 349
Index 363
Foreword (Historical Perspective)
The most important and most impactful technical revolutions are not always those that are most evident to a product's end user. Modern digital signal processing methods fall into this category: their consequences are not immediately perceptible, and they do not make the front page.
It is interesting to reflect, for a moment, on the way in which such techniques emerge. Digital
computation, applied to a signal in the broadest sense, is certainly not a new idea in itself. When
Kepler derived the laws of motion of the planets from the series of observations made by his mentor Tycho Brahe, his was a truly numerical computation of the signal – in this case, the
signal being Brahe’s observations of the planets’ positions over time. In recent decades, though,
digital signal processing has become a discipline in its own right. What has changed is the way it
can now process electrical signals in real time, using digital technologies.
This leap forward is the cumulative result of technical progress in numerous fields – starting,
of course, with the capability of recording the data we wish to process in the form of an electrical
signal. This has been contingent on the gradual development of what are known as information
sensors, which can range in complexity from a simple strain gauge (which, in itself, took a great deal
of research in solid mechanics to make possible) to a radar system.
In addition, with the marvelous progress in micro-electronics came the necessary technological
tools, capable, at the extremely fast rates required for real-time processing, of performing the arith-
metical operations that the earliest computers (the ENIAC was built in 1945, not long ago in the
grand scheme of things) took hours to do, often being interrupted by repeated breakdowns. Today,
these operations can be carried out by microprocessors weighing only a few grams and consuming
only a few milliwatts of power, capable of functioning for over a decade without breakdown.
Finally, we have had to wait for progress in programming techniques – i.e. the optimal use of these
technologies – because though the computational capacities of modern microprocessors are vast, it
is unwise to waste those capacities on performing unnecessary operations. The invention of the fast
Fourier transform algorithms is one of the most striking examples of the importance of program-
ming methods. This convergence of technical progress in fields ranging from physics to electronics
to mathematics has not been unintentional. To a certain extent, every step forward has created a
new problem, which was then solved by new progress in a different field. It would undoubtedly be
helpful, from the standpoint of the history and epistemology of science and technology, to have an
in-depth study of this lengthy and complicated process.
Indeed, the consequences are already considerable. Indisputably, analog processing of electrical
signals came before digital processing, and analog processing will surely continue to have an impor-
tant role to play in certain applications, but the benefits of digital processing can be expressed in two
words: accuracy and reliability. Certain applications have only been made possible by the accuracy
and reliability offered by digital technologies, which go far beyond the sectors of electronics and
telecommunications in which these techniques first emerged. As one example among many, in
X-ray computed tomography, scanners are based on the application of a theorem developed by Johann
Radon in 1917. Only the developments mentioned above have enabled the practical implemen-
tation of this new medical diagnostic tool. It is a safe bet that, in tomorrow’s world, digital signal
processing techniques will be used in increasingly varied products, including consumer electron-
ics. However, it is an equally safe bet that the general public, while benefitting from the lower
prices and higher performance and reliability offered by these techniques, will remain blissfully
unaware of the phenomenal and complex combination of research, technology, and invention
represented by this progress. This shift has already begun in the case of television receivers.
However, when these technical revolutions take place, another problem almost inevitably arises.
We need to train users to get to grips not just with a new tool, but often, an entirely new way of
thinking. If we are not careful, such training can easily become a bottleneck, delaying the intro-
duction of new techniques. Therefore, this book is a particularly important addition to the field.
Its author, Maurice Bellanger, has been teaching for many years at the École Nationale Supérieure
des Télécommunications and the Institut Supérieur d’Électronique de Paris. It is a highly didactic
book, containing relevant exercises as well as in-depth explanations and multiple programs, many of which readers will be able to use exactly as they are. Without a doubt, it will help open the door to a desirable and necessary evolution.
P. Aigrain, 1981
Preface
In signal processing, digital techniques offer a fantastic range of possibilities: rigorous system
design, flexibility, reproducibility of equipment, stability of operating features, and ease of super-
vision and monitoring. However, there is a certain amount of abstractness in these techniques,
and, in order to apply them to real-world cases, we need a set of theoretical knowledge, which
may represent an obstacle to their use. This book aims to break down the barriers and make digital techniques accessible to readers by drawing the connection between theory and practice and putting the most widely used results in the field at their fingertips.
The foundation upon which this book is built is the author’s teaching at engineering
schools – first the École nationale supérieure des télécommunications and the Institut supérieur
d’électronique de Paris, and later, Supélec and CNAM. The book offers a clear and concise
presentation of the main techniques used in digital processing, comparing them on their merits
and giving the most useful results in a form that is directly usable, both for the design and for
the concrete implementation of systems. Theoretical explanations have been condensed to what
is absolutely necessary for a thorough understanding and a correct application of the results.
Bibliographic references are provided, where interested readers will find further information
about the topics discussed herein. At the end of each chapter are a few exercises, often drawn from
real-world examples, to allow readers to test their absorption of the material in the chapter and
familiarize themselves with its application. Answers to these exercises and guidelines are given at
the end of the book.
With respect to previous editions, this new edition offers additional information, simplifi-
cations, and also a new chapter about one of the most important tools in the field of artificial
intelligence – neural networks, as they relate to adaptive systems.
As with the previous editions, this one owes a great deal to the author’s students and colleagues.
Thanks to them all for their contributions and assistance.
Introduction
The discrete Fourier transform enables us to transition from the discrete-time domain to the discrete-frequency domain. It lends itself very well to spectral analysis, with a frequency step that divides the sampling frequency of the signals being analyzed.
Fast computation algorithms offer substantial savings, as they enable operations to be performed in real time, provided certain elementary conditions are met. Thus, the discrete Fourier transform is not only a
fundamental tool in determining the processing characteristics and in the study of the impacts of
those characteristics on the signal, but it is also used in the production of popular devices, such as
mobile radio and digital television. Chapters 2 and 3 are dedicated to these algorithms. To begin
with, they present the elementary properties and the mechanism of fast computation algorithms
and their applications before moving on to a set of variants associated with practical situations.
A significant portion of this book is devoted to the study of one-dimensional time-invariant discrete linear systems, which are easily accessible and highly useful. Multi-dimensional systems, and, in particular, two- and three-dimensional systems, are experiencing significant development. For example, they are applied to images. However, their properties are generally deduced from those of one-dimensional systems, of which they are often merely straightforward extensions. Nonlinear or time-varying systems either contain a significant subset that retains the properties of linearity and time-invariance, or can be analyzed with the same techniques as systems that have those properties.
Linearity and time-invariance lead to the existence of a convolution relation, which governs the
operation of the system or filter having those properties. This convolution relation is defined on
the basis of the system’s response to the elementary signal which represents a pulse – the impulse
response – by an integral in the case of analog signals. Thus, if x(t) denotes the signal to be filtered,
and h(t) is the filter impulse response, the filtered signal y(t) is given by the equation:
y(t) = \int_{-\infty}^{\infty} h(\tau)\, x(t - \tau)\, d\tau
In these conditions, such a relation, which directly expresses the filter’s real operation, offers
limited practical interest. To begin with, it is not particularly easy to determine the impulse
response on the basis of criteria that define the filter’s intended operation. In addition, an equation
that contains an integral cannot easily be used to recognize and check the filter’s behavior. Design
is much easier to address in the frequency domain because the Laplace transform or Fourier
transform can be used to move to a transformed plane where the convolution relations from the
amplitude–time plane become simple products of functions. The Fourier transform associates the system's frequency response with its impulse response, and filtering then corresponds to the product of that frequency response with the Fourier transform, or spectrum, of the signal to be filtered.
In discrete digital systems, the convolution is expressed by a sum. The filter is defined by a series
of numbers, representing its impulse response. Thus, if the series to be filtered is written as x(n),
the filtered series y(n) is expressed by the following sum, where n and m are integers:
y(n) = \sum_{m} h(m)\, x(n - m)
Two scenarios then arise. Firstly, the sum may pertain to a finite number of terms – i.e. the h(m)
values are zero, except for a finite number of values of the integer variable m. The filter is known as
a finite impulse-response filter. In reference to its realization, it is also referred to as non-recursive,
because it does not require a feedback loop from output to input in its implementation. It occupies
finite memory space because it only retains the memory of an elementary signal – an impulse, for
example – for a limited time. The numbers h(m) are called the coefficients of the filter, which they
define completely. They can be calculated directly, in a very simple way – for instance, by means of a
Fourier series development of the frequency response. This type of filter exhibits highly interesting original features (for example, the possibility of a rigorously linear phase response – i.e., a constant group delay): the signals whose components are within the filter's passband are not deformed as they pass through the filter. This possibility is exploited in data transmission systems, or spectral analysis, for example.
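To make the mechanism concrete, here is a minimal sketch (not from the book) of this direct-form computation in Python; the coefficients h form a hypothetical 5-tap moving average, whose symmetric values give the linear phase mentioned above:

```python
import numpy as np

def fir_filter(x, h):
    # Direct evaluation of y(n) = sum_m h(m) x(n - m)
    y = np.zeros(len(x))
    for n in range(len(x)):
        for m in range(len(h)):
            if n - m >= 0:
                y[n] += h[m] * x[n - m]
    return y

h = np.ones(5) / 5.0                           # symmetric coefficients: linear phase
x = np.sin(2 * np.pi * 0.05 * np.arange(50))   # test input series
y = fir_filter(x, h)
```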
Alternatively, the sum may pertain to an infinite number of terms, and the h(m) may have
an infinite number of nonzero values; the filter is called an infinite impulse-response filter, or
recursive, because its memory must be set up as a feedback loop from output to input. Its operation
is governed by an equation whereby an element in the output series y(n) is calculated by the
weighted sum of a number of elements of the input series x(n), and a certain number of elements
of the previous output series. For example, if L and K are integers, the filter’s operation may be
defined by the following equation:
y(n) = \sum_{l=0}^{L} a_l\, x(n - l) - \sum_{k=1}^{K} b_k\, y(n - k)
The a_l (l = 0, 1, …, L) and b_k (k = 1, 2, …, K) are the coefficients. As is the case with analog
filters, this type of filter generally cannot easily be studied directly; it is necessary to go through
a transformed plane. The Laplace or Fourier transforms could be used for this purpose. However,
there is a transform that is much more suitable – the Z transform, which is the equivalent for
discrete systems. A filter is characterized by its Z-transfer function, generally written as H(Z),
which involves the coefficients in the following equation:
H(Z) = \frac{\sum_{l=0}^{L} a_l\, Z^{-l}}{1 + \sum_{k=1}^{K} b_k\, Z^{-k}}
To obtain the filter’s frequency response, in H(Z), we simply need to replace the variable Z with
the following expression, where f denotes the frequency variable, and T the time step between the
signal samples:
Z = e^{j2\pi f T}
In this operation, the imaginary axis in the Laplacian plane corresponds to the circle with unit
radius, centered at the origin in the plane of the variable Z. It is plain that the frequency response
of the filter defined by H(Z) is a periodic function whose period is the sampling frequency.
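As an illustrative sketch (the function name and example coefficients are assumptions, not taken from the book), this substitution can be carried out numerically to evaluate the frequency response:

```python
import numpy as np

def frequency_response(a, b, f, T=1.0):
    # H(Z) = sum_l a_l Z^-l / (1 + sum_k b_k Z^-k), evaluated at Z = exp(j 2 pi f T)
    Z = np.exp(1j * 2 * np.pi * f * T)
    num = sum(al * Z ** (-l) for l, al in enumerate(a))
    den = 1.0 + sum(bk * Z ** (-(k + 1)) for k, bk in enumerate(b))
    return num / den

# First-order recursive filter y(n) = x(n) + 0.5 y(n-1), i.e. a = [1], b = [-0.5]
f = np.linspace(0.0, 2.0, 401)                 # two periods of the response
H = frequency_response([1.0], [-0.5], f)
# |H| repeats with period 1/T = 1, illustrating the periodicity noted above
```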
Another representation of the function H(Z), which is useful in the design of filters and the study of a number of properties, explicitly includes the roots of the numerator, also known as the zeros of the filter, Z_l (l = 1, 2, …, L), and the roots of the denominator, also known as the poles, P_k (k = 1, 2, …, K):
H(Z) = a_0\, \frac{\prod_{l=1}^{L} (1 - Z_l Z^{-1})}{\prod_{k=1}^{K} (1 - P_k Z^{-1})}
The term a0 is a scaling factor which defines the gain of the filter. The filter stability condition
is expressed very simply by the following constraint: all the poles must be within the unit circle.
The position of the poles and zeroes with respect to the unit circle offers a very simple way of
determining the characteristics of the filter; this technique is very widely used in practice.
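A quick numerical check of this stability condition, as a sketch (the denominator coefficients are hypothetical): multiplying the denominator of H(Z) by Z^K gives an ordinary polynomial whose roots are the poles.

```python
import numpy as np

def is_stable(b):
    # Denominator 1 + b_1 Z^-1 + ... + b_K Z^-K  ->  Z^K + b_1 Z^(K-1) + ... + b_K
    poles = np.roots([1.0] + list(b))
    return bool(np.all(np.abs(poles) < 1.0))

print(is_stable([-0.5]))   # True: single pole at Z = 0.5, inside the unit circle
print(is_stable([-1.2]))   # False: pole at Z = 1.2, outside the unit circle
```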
Four chapters are devoted to the study of the characteristics of these digital filters. Chapter 4
presents the properties of time-invariant discrete linear systems, recaps the main properties of
the Z-transform, and lays down the fundamental groundwork necessary for the study of filters.
Chapter 5 discusses finite impulse-response filters – their properties are studied, the techniques
for calculating the coefficients are described, and the structures of real-world filters are examined.
Infinite impulse-response filters are generally produced by cascading first- and second-order ele-
mentary cells, or sections, so Chapter 6 describes these sections and their properties. To begin with,
this makes the study of this type of system considerably easier; in addition, the chapter provides a set
of results that are highly useful in practice. Chapter 7 outlines the methods for calculating the coef-
ficients for infinite impulse-response filters and discusses the problems posed by their real-world
implementation, with the limitations that are encountered and the consequences of those limita-
tions – in particular, computational noise.
As the properties of infinite impulse-response filters are comparable to those of continuous ana-
log filters, it is natural to envisage similar structures for these filters to those generally employed
in analog filtering. This is the subject of Chapter 8, which presents ladder structures. We then take
a diversion to look at switched-capacitor filters, which are not digital in the strictest sense of the
word, but which are sampled, and are highly useful additions to digital filters. To guide users, a
summary of the respective merits of the structures described is given at the end of the chapter.
Certain devices – for example, in instrumentation or telecommunications – work on signals
represented by a series of complex numbers. Out of all signals of this type, one category is of
particular practical interest: analytic signals. Their properties are studied in Chapter 9, as is the design of devices suited to the generation or processing of such signals. Additional concepts relating
to filtering are also explained in this chapter, which, in a unified manner, presents the main
interpolation techniques. Signal restoration is also discussed.
Digital processing machines, when operating in real time, operate at a rate that is closely linked
to the signal sampling frequency. Their complexity depends on the volume of operations being
carried out, and the length of time available in which to perform this processing. The signal
sampling frequency is generally imposed either at system input or at output, but within the system
itself, it is possible to vary this rate in order to adapt it to the characteristics of the signal and the
processing, and thereby reduce the volume of operations and the computation rate. The machines
may be simplified – potentially very significantly – if, over the course of the processing, the
sampling frequency is adapted to suit the usable bandwidth of the signal; this is multirate filtering,
which is presented in Chapter 10. The impacts on the processing characteristics are described,
along with realization methods. Rules are provided on usage and assessment. This technique
produces particularly interesting results for narrow passband filters and for the implementation of sets of filters known as filter banks. In this case, the system combines a set of phase-shifting circuits with a discrete Fourier transform computer.
Filter banks for the decomposition and reconstruction of signals have become a fundamental tool for compression. The way in which they work is described in Chapters 11 and 12, with design methods
and realization structures.
The filters can be determined on the basis of time-domain specifications; such is the case, for
example, with the modeling of a system, as described in Chapter 13. If the characteristics vary, it
may be interesting to adapt the coefficients as a function of changes occurring in the system. This
adaptation may depend on an approximation criterion and take place at a rate that may come to
equal the system’s sampling rate; then, the filter is said to be adaptive. Chapter 14 is devoted to
adaptive filtering, in the simplest of cases, but also the most common and the most useful – where
the approximation criterion chosen is the minimization of the mean squared error, and where the
coefficients vary depending on the gradient algorithm. After recapping details of random signals
and their properties in Chapter 13 – in particular, the autocorrelation function and matrix, whose
eigenvalues play an important role – the gradient algorithm is presented in Chapter 14, and its
convergence conditions are studied. Then, the two main adaptation parameters, the time constant
and the residual error, are analyzed along with the arithmetic complexity. Different structures are
proposed for concrete implementation.
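As a sketch of such a gradient-based adaptation (a least-mean-squares form; the step size mu and the tap count are illustrative assumptions):

```python
import numpy as np

def lms_adapt(x, d, num_taps=8, mu=0.01):
    # Adapt coefficients w so that the filter output tracks the reference d(n),
    # minimizing the mean squared error by a gradient step at each sample.
    w = np.zeros(num_taps)
    for n in range(num_taps, len(x)):
        xn = x[n - num_taps:n][::-1]     # most recent input samples first
        e = d[n] - w @ xn                # instantaneous error
        w = w + mu * e * xn              # gradient update of the coefficients
    return w
```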
Chapter 15 can be viewed as an extension of Chapters 13 and 14 to the domain of neural networks
in artificial intelligence. These devices are characterized by the systematic use of nonlinear circuits
for the functions of modeling, classification, or shape recognition. Adaptive techniques are used
during the learning phases.
Chapter 16 discusses a very specific application: error-correction coding. Indeed, information
processing and transmission systems include error-correction coding techniques, which are gener-
ally introduced by a mathematical approach, though some of the most widely used types of coding
are actually direct applications of the fundamental signal processing techniques. Thus, the chapter
puts forward a signal processing vision of certain types of coding, to facilitate readers' access to and
use of these techniques.
Finally, Chapter 17 briefly describes some applications, showing how the fundamental methods
and techniques are put to use.
1 Signal Digitizing – Sampling and Coding
The conversion of an analog signal to digital form involves a twofold approximation. Firstly, in the time domain, the signal function s(t) is replaced by its values at integer multiples of a time increment T and is thus converted to s(nT). This process is called sampling. Secondly, in the amplitude domain, each
value of s(nT) is approximated by a whole multiple of an elementary quantity. This process is called
quantization. The approximate value thus obtained is then associated with a number. This process
is called coding – a term often used to describe the whole process by which the value of s(nT) is
transformed into the number representing it.
The effect of these two approximations on the signal will be analyzed in this chapter. To achieve
this, two basic tools will be used: Fourier analysis and distribution theory.
Fourier analysis is a method of decomposing a signal into a sum of individual components which
can easily be produced and observed. The importance of this decomposition is that a system’s
response to the signal can be deduced from these individual components using the superposition
principle. These elementary component signals are periodic and complex, so both the amplitude
and phase of the systems can be studied. They are represented by a function s_e(t) such that:

s_e(t) = e^{j2\pi f t}

where f is the inverse of the period – that is, the frequency of the elementary signal.
Since the elementary signals are periodic, clearly, the analysis is simplified when the signal itself
is periodic. This case will be examined first, although it is not the most interesting, since a periodic
signal is completely determinate and carries practically no information.
Under certain conditions, a periodic function s(t) with period T can be expanded in a Fourier series as:
s(t) = \sum_{n=-\infty}^{\infty} C_n\, e^{j2\pi nt/T} \qquad (1.3)
Figure 1.1 Periodic impulse train i_p(t), with period T, amplitude a, and width τ.
The index n is an integer, and the C_n, called the Fourier coefficients, are defined by:

C_n = \frac{1}{T} \int_0^T s(t)\, e^{-j2\pi nt/T}\, dt \qquad (1.4)
In fact, the Fourier coefficients minimize the square of the difference between the function s(t) and the series (1.3). Expression (1.4) is obtained by setting to zero the derivative, with respect to the coefficient of index n, of the quantity:

\int_0^T \left( s(t) - \sum_{m=-\infty}^{\infty} C_m\, e^{j2\pi mt/T} \right)^2 dt
The Bessel–Parseval relation expresses the conservation of power in the expansion:

\sum_{n=-\infty}^{\infty} |C_n|^2 = \frac{1}{T} \int_0^T |s(t)|^2\, dt \qquad (1.7)
The constituent elements resulting from the expansion of a periodic signal have frequencies
which are integer multiples of 1/T (the inverse of the period). They form a discrete set in the space of
all frequencies. In contrast, if the signal is not periodic, the Fourier components form a continuous
domain in the frequency space.
In this case, the signal is expressed as an integral over the frequency domain:

s(t) = \int_{-\infty}^{\infty} S(f)\, e^{j2\pi ft}\, df \qquad (1.8)

where

S(f) = \int_{-\infty}^{\infty} s(t)\, e^{-j2\pi ft}\, dt \qquad (1.9)
The function S(f ) is the Fourier transform of s(t). More commonly, S(f ) is called the spectrum of
signal s(t).
Example: To calculate the Fourier transform I(f ) of an isolated pulse i(t) of width 𝜏 and amplitude
a, centered on the time origin (Figure 1.2):
I(f) = \int_{-\infty}^{\infty} i(t)\, e^{-j2\pi ft}\, dt = a \int_{-\tau/2}^{\tau/2} e^{-j2\pi ft}\, dt

I(f) = a\tau\, \frac{\sin(\pi f\tau)}{\pi f\tau} \qquad (1.10)
Figure 1.3 represents the function I(f), which will be used frequently in this book. It is important to note that it is zero at nonzero frequencies which are whole multiples of the inverse of the impulse width. A table of this function is given in Appendix 1.

Figure 1.2 Isolated impulse i(t), of width τ and amplitude a, centered on the time origin.

This example clearly shows the correspondence between the Fourier coefficients and the spectrum. In effect, by comparing equations (1.6) and (1.10), it can be verified that, apart from the factor 1/T, the coefficients of the Fourier series expansion of an impulse train correspond to the values of the spectrum of the isolated impulse at frequencies which are whole multiples of the inverse of the period of the impulses.
In the case of a nonperiodic function, there is an expression similar to the Bessel–Parseval
relation, but this time the energy in the signal is conserved, instead of the power:
\int_{-\infty}^{\infty} |S(f)|^2\, df = \int_{-\infty}^{\infty} |s(t)|^2\, dt \qquad (1.11)
Let s'(t) be the derivative of the function s(t); its Fourier transform S_d(f) is given by:

S_d(f) = \int_{-\infty}^{\infty} s'(t)\, e^{-j2\pi ft}\, dt = j2\pi f\, S(f) \qquad (1.12)
Thus, taking the derivative of a signal leads to multiplying its spectrum by j2𝜋f .
One essential property of the Fourier transform (in fact, the main reason for its use) is that it
transforms a convolution into a simple product. Consider two time functions, x(t) and h(t), with
Fourier transforms X(f ) and H(f ), respectively. The convolution y(t) is defined by:
y(t) = x(t) * h(t) = \int_{-\infty}^{\infty} x(t - \tau)\, h(\tau)\, d\tau \qquad (1.13)

Its Fourier transform is then the simple product:

Y(f) = X(f)\, H(f) \qquad (1.14)
Figure 1.3 The function I(f) = aτ sin(πfτ)/(πfτ), which is zero at nonzero multiples of 1/τ.
and therefore,

\int_{-\infty}^{\infty} \frac{\sin(\pi\phi\tau)}{\pi\phi\tau} \cdot \frac{\sin(\pi(f - \phi)\tau)}{\pi(f - \phi)\tau}\, d\phi = \frac{1}{\tau}\, \frac{\sin(\pi f\tau)}{\pi f\tau}
Taking f = n/τ, for any nonzero integer n:

\int_{-\infty}^{\infty} \frac{\sin(\pi\phi\tau)}{\pi\phi\tau} \cdot \frac{\sin(\pi(\phi\tau - n))}{\pi(\phi\tau - n)}\, d\phi = 0 \qquad (1.15)
Thus, the functions sin π(x − n)/[π(x − n)], with n an integer, form a set of orthogonal functions.
The definition and properties of the Fourier transform can be extended to multivariate functions.
Let s(x1 , x2 , …, xn ) be a function of n real variables. Its Fourier transform is a function S(𝜆1 , 𝜆2 , …,
𝜆n ) defined by:
S(\lambda_1, \lambda_2, \ldots, \lambda_n) = \int \cdots \int_{\mathbb{R}^n} s(x_1, x_2, \ldots, x_n)\, e^{-j2\pi(\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n)}\, dx_1\, dx_2 \cdots dx_n \qquad (1.16)
If the function s(x_1, x_2, …, x_n) is separable – that is, if:

s(x_1, x_2, \ldots, x_n) = s_1(x_1)\, s_2(x_2) \cdots s_n(x_n)

then:

S(\lambda_1, \lambda_2, \ldots, \lambda_n) = S_1(\lambda_1)\, S_2(\lambda_2) \cdots S_n(\lambda_n)
The variables xi (1 ⩽ i ⩽ n) often represent distances (for example, for two dimensions), and in
that case, 𝜆i are called spatial frequencies.
1.2 Distributions
1.2.1 Definition
A distribution D is defined as a continuous linear functional on the vector space 𝒟 of functions defined in ℝ^n, indefinitely differentiable, and having a bounded support.
With each function 𝜑 belonging to 𝒟 , the distribution D associates a complex number D(𝜑)
which will also be denoted by ⟨D, φ⟩, with the properties:
(1) D(𝜑1 + 𝜑2 ) = D(𝜑1 ) + D(𝜑2 ).
(2) D(𝜆𝜑) = 𝜆D(𝜑) where 𝜆 is a scalar.
(3) If 𝜑j converges to 𝜑 when j tends toward infinity, the sequence D(𝜑j ) converges to D(𝜑).
Examples:
(i) If f (t) is a function which is summable over any bounded ensemble, it defines a distribution
Df by:
\langle D_f, \varphi \rangle = \int_{-\infty}^{\infty} f(t)\, \varphi(t)\, dt \qquad (1.17)
(ii) If φ′ denotes the derivative of φ, the functional:

\langle D, \varphi \rangle = \int_{-\infty}^{\infty} f(t)\, \varphi'(t)\, dt = \langle D_f, \varphi' \rangle \qquad (1.18)

is also a distribution.
(iii) The Dirac distribution 𝛿 is defined by:
⟨𝛿, 𝜙⟩ = 𝜙(0) (1.19)
The Dirac distribution 𝛿 at a real point x is defined by:
⟨𝛿(t − x), 𝜙⟩ = 𝜙(x) (1.20)
This distribution is said to represent a mass of +1 at the point x.
(iv) Consider a pulse i(t) of duration τ, with amplitude a = 1/τ, centered on the origin. It defines a distribution D_i:

\langle D_i, \varphi \rangle = \frac{1}{\tau} \int_{-\tau/2}^{\tau/2} \varphi(t)\, dt
For very small values of τ, this becomes:

\langle D_i, \varphi \rangle \simeq \varphi(0)

that is, the Dirac distribution can be regarded as the limit of the distribution D_i when τ tends toward 0.
Consider now the set of Dirac distributions placed at the points nT, where n is an integer. This set is a distribution of unit mass points separated on the abscissa by whole multiples of T. Its Fourier transform is:
U(f) = \sum_{n=-\infty}^{\infty} e^{-j2\pi fnT} \qquad (1.26)
Moreover, by expanding the periodic impulse train i_p(t) with amplitude a = 1/τ in a Fourier series and letting τ tend toward 0, one obtains:

\lim_{\tau \to 0} i_p(t) = \frac{1}{T} \sum_{n=-\infty}^{\infty} e^{j2\pi nt/T}
The following fundamental property is demonstrated in Ref. [2].
The Fourier transform of the time distribution, represented by unit mass points separated by whole
multiples of T is a frequency distribution of points of mass 1/T separated by whole multiples of 1/T.
That is:

U(f) = \sum_{n=-\infty}^{\infty} e^{-j2\pi fnT} = \frac{1}{T} \sum_{n=-\infty}^{\infty} \delta\left(f - \frac{n}{T}\right) \qquad (1.27)
This result will be used when studying signal sampling. The property of the Fourier transform
whereby it exchanges convolution and multiplication applies equally to distributions.
Before considering the influence of the sampling and quantizing operations on the signal, it is
useful to discuss the characteristics of those signals which are most often studied.
1.3 Some Commonly Studied Signals

A signal is defined as a function of time s(t). This function can be given either as an analytic expression or as the solution of a differential equation, in which case the signal is said to be deterministic.
The simplest deterministic signal is the sinusoidal signal:

s(t) = A \cos(\omega t + \alpha)

where A is the amplitude, ω = 2πf is the angular frequency, and α is the phase of the signal.
Signals of this type are easy to reproduce and recognize at different points of a system. They allow
the various characteristics to be visualized in a simple way. Moreover, as mentioned above, they
serve as the basis for the decomposition of any deterministic signal through the Fourier transform.
If the system is linear and invariant in time, it can be characterized by its frequency response
H(𝜔). For each value of the frequency, H(𝜔) is a complex number whose modulus is the amplitude
of the response. By convention, the function φ(ω) such that:

H(\omega) = |H(\omega)|\, e^{-j\varphi(\omega)} \qquad (1.28)

is defined as the phase. This convention allows the group delay τ(ω), a positive function for real systems, to be expressed as:

\tau(\omega) = \frac{d\varphi}{d\omega} \qquad (1.29)
The group delay refers to transmission lines on which different frequencies of the signal
propagate at different speeds, which leads to energy dispersion in time. As an illustration of the
concept, let us consider two close frequencies 𝜔 ± Δ𝜔 and the corresponding phases per unit
length φ ± Δφ. The sum signal is expressed by:

s(t) = \cos[(\omega - \Delta\omega)t - (\varphi - \Delta\varphi)] + \cos[(\omega + \Delta\omega)t - (\varphi + \Delta\varphi)]

or

s(t) = 2\cos(\Delta\omega\, t - \Delta\varphi)\cos(\omega t - \varphi)
This is a modulated signal, and there is no dispersion if the two factors in the above expres-
sion undergo the same delay per unit length – that is, Δ𝜑/Δ𝜔 is constant. Thus, the group delay
characterizes the dispersion imposed on the signal energy by a transmission line or any equivalent
system.
If the sinusoidal signal s(t) is applied to the system, then an output signal s_r(t) is obtained such that:

s_r(t) = A\,|H(\omega)| \cos(\omega t + \alpha - \varphi(\omega))
Once again, this is a sinusoidal signal, and comparison with the applied signal reveals the
response of the system. The importance of this procedure (for example, for test operations) can
readily be appreciated.
Deterministic signals, meanwhile, do not give a good representation of real signals, because they carry no information beyond the fact of their presence. Real signals are generally
represented by a random function s(t). For testing and analyzing systems, random signals are also
used, but they must have particular characteristics which do not present undue complications for
their generation and use. A study of such signals is given in Vol. 2 of Ref. [2].
A random signal s(t) is stationary if its statistical properties are independent of the time origin; the probability density p(x, t) of its amplitude is then independent of the time t:

p(x, t) = p(x)
It is of second-order if it possesses a first-order moment called the mean value, which is the
mathematical expectation of s(t), denoted by E[s(t)] and defined by:
m_1(t) = E[s(t)] = \int_{-\infty}^{\infty} x\, p(x, t)\, dx \qquad (1.32)
and a second-order moment called the covariance:
E[s(t_1)\, s(t_2)] = m_2(t_1, t_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_1 x_2\, p(x_1, x_2; t_1, t_2)\, dx_1\, dx_2
where p(x1 ,x2 ;t1 ,t2 ) is the probability density of a pair of random variables [s(t1 ),s(t2 )].
The stationarity can be limited to the moments of first and second order; then, the signal is said
to be stationary of order 2 or stationary in the wider sense. For such a signal:
E[s(t)] = \int_{-\infty}^{\infty} x\, p(x)\, dx = m_1
The independence of time is translated for the probability density p(x_1, x_2; t_1, t_2) as follows:

p(x_1, x_2; t_1, t_2) = p(x_1, x_2; \tau)

where

\tau = t_2 - t_1

The covariance then depends only on τ, and it defines the autocorrelation function of the signal:

r_{xx}(\tau) = E[s(t)\, s(t - \tau)] \qquad (1.34)
Consider now the time average:

m_T = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} s(t)\, dt

The ergodicity of this average means that it takes a particular value k with probability 1. For a stationary signal, the ergodicity of the time average implies equality with the average of the amplitudes at a given instant. In effect, taking the expectation of the variable m_T:

E[m_T] = k = E\left[\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} s(t)\, dt\right] = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} E[s(t)]\, dt = m_1
This result has important consequences in practice, as it provides a means of finding the statistical
properties of the signal at a given instant from observation over the time period. The ergodicity of
the covariance in the stationary case is also very interesting because it leads to the relation:
r_{xx}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} s(t)\, s(t - \tau)\, dt \qquad (1.36)
The autocorrelation function r_{xx}(τ) of the signal s(t) is fundamental for the study of ergodic second-order stationary signals. Its principal properties are:
(1) It is an even function: r_{xx}(−τ) = r_{xx}(τ).
(2) Its maximum is at the origin and corresponds to the power P of the signal:

r_{xx}(0) = P; \qquad |r_{xx}(\tau)| \le r_{xx}(0)

(3) The power spectral density is the Fourier transform of the autocorrelation function:

\Phi_{xx}(f) = \int_{-\infty}^{\infty} r_{xx}(\tau)\, e^{-j2\pi f\tau}\, d\tau = 2 \int_0^{\infty} r_{xx}(\tau) \cos(2\pi f\tau)\, d\tau

In effect, r_{xx}(\tau) = s(\tau) * s(-\tau), and if S(f) denotes the Fourier transform of s(t), we obtain:

\Phi_{xx}(f) = |S(f)|^2
This last property is physically translated by the fact that the more rapidly the signal varies (that
is, the more its spectrum tends toward high frequencies), the narrower its autocorrelation function
becomes. In the limit case, the signal is purely random, and the function becomes zero for 𝜏 ≠ 0.
This signal is called white noise and is described by:

r_{xx}(\tau) = P\,\delta(\tau); \qquad \Phi_{xx}(f) = P
In fact, such a signal has no physical reality since its power is infinite, but it does offer a use-
ful mathematical model for signals with a spectral density that is virtually constant over a wide
frequency band.
A particularly important amplitude distribution is the Gaussian, or normal, law, whose probability density is:

p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - m)^2/(2\sigma^2)}

The parameter m is the mean of the variable x; the variance σ² is the second-order moment of the centered random variable (x − m); and σ is the standard deviation. The variable (x − m)/σ has a zero mean and a unit standard deviation. A tabulation is given in Appendix 2.
A random variable is characterized by its amplitude probability law and also by its moments mn ,
such that:
m_n = \int_{-\infty}^{\infty} x^n\, p(x)\, dx \qquad (1.39)
These moments are the coefficients of the series expansion of the function F(u), called the
characteristic function of the random variable x and defined by:
F(u) = \int_{-\infty}^{\infty} e^{jux}\, p(x)\, dx \qquad (1.40)
It is the inverse Fourier transform of the probability density p(x) which is expressed by:
p(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-jux}\, F(u)\, du \qquad (1.41)
On the basis of equation (1.40), the following series expansion is obtained:
F(u) = \sum_{n=0}^{\infty} m_n\, \frac{(ju)^n}{n!} \qquad (1.42)
and for a centered Gaussian variable:
F(u) = e^{-\sigma^2 u^2/2} \qquad (1.43)
The normal law can be generalized to multidimensional random variables [3]. The characteristic
function of a k-dimensional Gaussian variable x(x1 ,x2 , … ,xk ) is expressed by:
F(u_1, u_2, \ldots, u_k) = \exp\left(-\frac{1}{2} \sum_{i=1}^{k} \sum_{j=1}^{k} r_{ij}\, u_i u_j\right) \qquad (1.44)

where

r_{ij} = E(x_i x_j)
The probability density is obtained through Fourier transformation. For two dimensions, we get:
p(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - r^2}} \exp\left\{-\frac{1}{2(1 - r^2)}\left[\frac{x_1^2}{\sigma_1^2} - \frac{2r\, x_1 x_2}{\sigma_1 \sigma_2} + \frac{x_2^2}{\sigma_2^2}\right]\right\} \qquad (1.45)
A stationary signal whose amplitude follows the centered probability density:

p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/(2\sigma^2)} \qquad (1.47)

is an approximation to white Gaussian noise which is used in the analysis of systems or for modeling
signals. It is a stationary signal with a zero average and a spectral density which is not strictly
constant, but which corresponds to a uniform distribution filtered by an RC-type low-pass filter. It
is obtained by amplifying the thermal noise generated at the terminals of a resistor.
The Gaussian distribution can be obtained from a uniform distribution. Let p(y) be the Rayleigh
distribution:
p(y) = \frac{y}{\sigma^2}\, e^{-y^2/(2\sigma^2)}; \quad y \ge 0 \qquad (1.48)
and p(x) the uniform distribution over the interval [0,1]. Changing variables so that:
p(x)dx = p(y)dy
one gets:

p(y) = p(x) \left|\frac{dx}{dy}\right| = \frac{y}{\sigma^2}\, e^{-y^2/(2\sigma^2)}
and therefore,
x = e^{-y^2/(2\sigma^2)}; \qquad y = \sigma \sqrt{2 \ln(1/x)} \qquad (1.49)
The Gaussian distribution is obtained by considering the two independent variables x and y above, and the variables z and z′ given by:

z = y \cos 2\pi x; \qquad z' = y \sin 2\pi x
Hence:

p(z)\, p(z') = \frac{1}{2\pi}\, \frac{1}{\sigma^2}\, e^{-y^2/(2\sigma^2)} = \frac{1}{2\pi\sigma^2}\, e^{-(z^2 + z'^2)/(2\sigma^2)}
and finally,

p(z) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-z^2/(2\sigma^2)}
The method is often used to generate digital Gaussian signals.
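A sketch of the method in Python, following the change of variables above (the generator, seed, and sample count are illustrative assumptions):

```python
import numpy as np

def gaussian_pairs(n, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.random(n)                            # uniform on [0, 1)
    u = 1.0 - rng.random(n)                      # uniform on (0, 1], avoids log(0)
    y = sigma * np.sqrt(2.0 * np.log(1.0 / u))   # Rayleigh amplitude, eq. (1.49)
    z = y * np.cos(2 * np.pi * x)                # two independent Gaussian variables
    z_prime = y * np.sin(2 * np.pi * x)
    return z, z_prime

z, zp = gaussian_pairs(100_000)
print(z.mean(), z.std())                         # approximately 0 and 1
```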
With the convention that the peak amplitude of a signal is the value which is exceeded with a probability of only 10^{-5}, the peak factor – the ratio, in decibels, of the corresponding peak power to the signal power – for a Gaussian signal is 12.9 dB, and when this definition is applied to a sinusoidal signal, a peak factor of 3 dB results.
A stationary model used to represent telephone signals is formed by the random signal whose
amplitude probability density obeys the following exponential, or Laplace, distribution:
p(x) = \frac{1}{\sigma\sqrt{2}}\, e^{-\sqrt{2}\,|x|/\sigma} \qquad (1.52)
The peak factor rises, in this case, to 17.8 dB.
In conclusion, ergodic second-order stationary functions characterized by an amplitude proba-
bility distribution and an autocorrelation function can be used to model the majority of signals of
interest. They are widely used in system analysis.
In addition to the other possibilities for representing signals, it is important to have some
sort of global measure, so as to be able, for example, to follow a signal through the processing
system. Such a measure can be obtained by defining norms on the function which represents
the signal.
The L_p norm of a signal s(t), defined here on the interval [0, 1], is written \|s\|_p = \left[\int_0^1 |s(t)|^p\, dt\right]^{1/p}. In particular:

(1) p = 1:

\|s\|_1 = \int_0^1 |s(t)|\, dt \qquad (1.53a)
(2) p = 2:

\|s\|_2^2 = \int_0^1 |s(t)|^2\, dt \qquad (1.53b)
which is the expression for the energy of the signal s(t).
(3) p = ∞:

\|s\|_\infty = \max_{0 \le t \le 1} |s(t)| \qquad (1.53c)
The L_p norms can be generalized by introducing a real, positive weighting function p(x). The weighted L_p norm of the difference between a function f(x) and its approximation F(x) is then written:

\|f(x) - F(x)\|_p = \left[\int_0^1 |f(x) - F(x)|^p\, p(x)\, dx\right]^{1/p} \qquad (1.53d)
These expressions are applied when calculating filter coefficients and in realization problems – in
particular for the scaling of internal data in memories and for noise estimation.
1.5 Sampling
Sampling consists of representing a signal function s(t) by its values s(nT) taken at whole multiples
of the time interval T, called the sampling period. Such an operation can be analyzed in a simple
and concise way by using distribution theory. In fact, by definition, the distribution of unit masses at
whole multiple points on the axis, with period T, associates with the function s(t) the ensemble of its
values s(nT), where n is a whole number. Conforming to the notation given earlier, this distribution
is denoted by u(t) and is written:
u(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT)
The sampling operation affects the spectrum S(f ) of the signal. Considering the fundamental
relation (1.27), it appears that the spectrum U(f ) of the distribution u(t) is formed of lines of ampli-
tude 1/T at frequencies which are whole multiples of the sampling frequency f s = 1/T. Thus, u(t)
is expressed as a sum of elementary sinusoidal signals, collectively called carriers:
u(t) = \frac{1}{T} \sum_{n=-\infty}^{\infty} e^{j2\pi nt/T} \qquad (1.54)
Hence, the set of values for the signal s(nT) corresponds to the product with the signal s(t) of
the ensemble of component signals which make up u(t). That is, the operation of sampling is an
amplitude modulation of the signal by an infinite number of carriers with frequencies which are
whole multiples of the sampling frequency f s = 1/T. Consequently, the sampled signal spectrum
includes the function S(f ), called the baseband, and other images or sidebands, which correspond
to the translation of the baseband by whole multiples of the sampling frequency. The operation of
sampling and its influence on the signal spectrum are represented in Figure 1.4.
Figure 1.4 The sampling operation: the signal s(t) and its spectrum S(f), the sampled signal and its periodic spectrum, with sampling frequency f_s = 1/T.
The sampled signal spectrum Ss (f ) is expressed as the convolution of S(f ) with U(f ), so that:
S_s(f) = \frac{1}{T} \sum_{n=-\infty}^{\infty} S\left(f - \frac{n}{T}\right) \qquad (1.55)
It is important to note that the function Ss (f ) is periodic – that is, the sampling has introduced a
periodicity into the frequency space, which constitutes a fundamental characteristic of the sampled
signals.
The sampling operation, as described above, which is called ideal sampling, may seem rather
unrealistic in that it would be difficult in practice to manipulate or reconstitute an instantaneous
signal value. Real samplers, or circuits which reconstitute samples, all possess a certain aperture
time. In fact, it can be shown that sampling, or the reconstitution of samples by pulses having a
given width, simply introduces a modification of the signal spectrum.
In effect, when sampling a signal s(t) by a set of pulses separated by the period T, with width τ and amplitude a, the quantity σ_n collected in the nth period is written:

\sigma_n = a \int_{nT - \tau/2}^{nT + \tau/2} s(t)\, dt
This quantity expresses the result of the convolution of the signal s(t) by the elementary pulse
i(t). The function which results in this case at the sampling time nT is the function s*i. That is, the
sampled signal does not have S(f ) for its spectrum, but the product:
S_p(f) = U(f)\, S(f) = \frac{1}{T} \sum_{n=-\infty}^{\infty} S\left(\frac{n}{T}\right) \delta\left(f - \frac{n}{T}\right)
Similar reasoning applies in the case of reconstitution of samples with a duration 𝜏. In fact, it is
the convolution product of the samples s(nT) with the elementary pulse i(t) which is reconstituted.
Thus, we have the proposition:
Sampling or the reconstitution of samples by pulses with width 𝜏 can be treated as ideal sampling or
ideal reconstitution, provided that the signal spectrum is multiplied by the spectrum of the elementary
pulse train.
In practice, however, whenever τ is small in comparison with the period T, the correction is
negligible.
Figure 1.5 The pulse train i_p(t) (period T, width τ, amplitude a) and its spectrum, whose lines follow the envelope I(f) = aτ sin(πfτ)/(πfτ).

1.6 The Sampling Theorem
This theorem establishes the conditions under which the set of samples of a signal correctly repre-
sents that signal. A signal is said to be correctly represented by the set of its samples taken with the
periodicity T, if it is possible, from this set of values, to completely reconstitute the original signal.
The sampling has introduced a periodicity of the spectrum in frequency space. To restore the
original signal, this periodicity must be suppressed – that is, the image bands must be eliminated.
This can be achieved using a low-pass filter with a transfer function H(f ) which is equal to 1/f s up
to the frequency f s /2 and is zero at higher frequencies. At the output of such a filter, a continuous
signal appears, which can be expressed as a function of the values s(nT). The impulse response h(t) of this filter is written, using equation (1.10), as:

h(t) = \frac{\sin(\pi t/T)}{\pi t/T}
The signal at the output of the filter, s(t), corresponds to the convolution product of the set s(nT) with the function h(t). It is written as:

s(t) = \int_{-\infty}^{\infty} \left[\sum_{n=-\infty}^{\infty} s(\theta)\, \delta(\theta - nT)\right] \frac{\sin \pi(t - \theta)/T}{\pi(t - \theta)/T}\, d\theta
and hence,

s(t) = \sum_{n=-\infty}^{\infty} s(nT)\, \frac{\sin \pi(t/T - n)}{\pi(t/T - n)} \qquad (1.57)
This is the formula for interpolating signal values at points sited between the samples. It can
be verified that it reproduces s(nT) for multiples of the period T, and the reconstitution process is
represented in Figure 1.6.
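A numerical sketch of formula (1.57), truncating the infinite sum to the available samples (np.sinc computes sin(πx)/(πx)):

```python
import numpy as np

def sinc_interpolate(samples, t, T=1.0):
    # s(t) = sum_n s(nT) sin(pi(t/T - n)) / (pi(t/T - n)), truncated to len(samples)
    n = np.arange(len(samples))
    return np.sum(samples * np.sinc(t / T - n))

T = 1.0
nT = np.arange(64) * T
samples = np.sin(2 * np.pi * 0.1 * nT)     # 0.1 Hz sine, sampled well above 0.2 Hz
print(sinc_interpolate(samples, 5.5))      # close to the true value below
print(np.sin(2 * np.pi * 0.1 * 5.5))
```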
In order for the calculated signal s(t) to be identical to the original signal, the spectrum S(f ) has to
be identical to the spectrum of the original signal. As shown in Figure 1.6, this condition is satisfied
if and only if the original spectrum does not contain any components with frequencies greater than
or equal to f s /2.
Figure 1.6 Reconstitution of the signal from its samples: the sampled spectrum S_s(f), the restoring low-pass filter H(f) with impulse response h(t), and the restored signal s(t) with spectrum S(f).

Figure 1.7 Overlap of the image bands with the baseband when the spectrum of the signal extends beyond f_s/2.
If this is not the case, and the image bands overlap the baseband, as shown in Figure 1.7, a
foldover distortion or aliasing is introduced and the restoring filter produces a signal that is dif-
ferent from the original one. From this, the sampling theorem or Shannon’s theorem is derived:
A signal which does not contain any component with a frequency equal to or greater than a value
f m is completely determined by the set of its values at regularly spaced intervals of period T = 1/(2f m ).
Thus, the sampling frequency of a signal is determined by the upper limit of its frequency band.
In practice, the signal band is generally limited by filtering to a value below f s /2 before sampling
at frequency f s , in order that the restoring filter can be practically realizable.
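A small numerical illustration of aliasing (a sketch): two sinusoids whose frequencies differ by the sampling frequency produce exactly the same samples, so the restoring filter cannot tell them apart.

```python
import numpy as np

fs = 8.0                                  # sampling frequency
n = np.arange(16)                         # sample indices
s1 = np.cos(2 * np.pi * 1.0 * n / fs)     # 1 Hz component, below fs/2
s2 = np.cos(2 * np.pi * 9.0 * n / fs)     # 9 Hz = 1 Hz + fs: aliases onto 1 Hz
print(np.allclose(s1, s2))                # True: identical sample sets
```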
It is interesting to note that the sampling frequency is determined by the bandwidth of the signal.
The reconstitution illustrated in Figure 1.6 was for a low-frequency signal with which a low-pass
filter was associated. It is apparent that the same reasoning can also be applied to a signal occupying
a restricted region in frequency space by using an associated band-pass filter. In particular, this
property is applicable to modulated signals and is used in certain types of digital filters.
The result given at the end of Section 1.1.2 enables sampling to be presented from another view-
point. Equation (1.57) shows that sampling corresponds to a decomposition of the signal s(t) on the set of orthogonal functions sin π(t/T − n)/[π(t/T − n)], and Shannon's theorem simply expresses the condition for this set to form a decomposition basis for the signal (several decomposition bases may exist).
The properties given above can be clearly illustrated by sampling sinusoidal signals, whose features
are of use in numerous applications.
1.8 Sampling of Sinusoidal and Random Signals
Consider the samples, taken with a unit period, of a sinusoidal signal whose frequency f is a rational number, f = N_1/N_2:

s(n) = \cos(2\pi f n + \varphi)

Then, the set of samples exhibits the periodicity N_2 and comprises at most N_2 different numbers. On the other hand, since the sampling frequency is more than twice the signal frequency, N_1/N_2 < 1/2. The
ensemble of N 2 different samples permits the representation of a number of sinusoidal signals
equal to the largest whole number less than N 2 /2. For example, if N 2 = 8, with the ensemble of
numbers 2 cos(2πn/8 + φ), (n = 0, 1, …, 7), it is possible to represent the samples of three sinusoidal signals:

2 \cos\left(2\pi \frac{N_1}{8}\, t + \varphi\right) \quad \text{with } N_1 = 1, 2, 3
Figure 1.8 represents this sampling for φ = 0, and in this particular case, four numbers are sufficient: ±2 and ±√2.
If we then add to the three sinusoidal signals in Figure 1.8 the continuous signal with the value 1, and the oscillating signal cos(πt), with frequency 1/2 and amplitude 1, the result is that when the composite signal is sampled, zero values appear, except at points which are multiples of 8, where the value 8 is obtained, as shown in Figure 1.9(a). The spectrum of this sum is obtained
directly by applying the relation:
\cos x = \frac{1}{2}\left(e^{jx} + e^{-jx}\right)
It is formed of lines with an amplitude of 1 at frequencies which are multiples of 1/8 (Figure 1.9(b)).
This spectrum has already been studied in Section 1.2, and equation (1.27) again applies.
Spectrum analyzers and digital frequency synthesizers use the fact that it is possible to produce
a range of sinusoidal signals from a limited set of numbers which are stored, for example, in a
computer memory.
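A sketch of that principle: the eight stored numbers of Figure 1.8 suffice to synthesize any of the three sinusoids, by reading the table with different index steps.

```python
import numpy as np

# The 8 stored values 2 cos(2 pi n / 8), n = 0..7 (only +/-2 and +/-sqrt(2) occur)
table = 2.0 * np.cos(2 * np.pi * np.arange(8) / 8)

def synthesize(n1, length):
    # Reading every n1-th entry (modulo 8) yields samples of 2 cos(2 pi n1 t / 8)
    return table[(n1 * np.arange(length)) % 8]

s = synthesize(3, 16)    # samples of the N1 = 3 sinusoid
```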
Figure 1.8 Samples of the three sinusoidal signals 2 cos(2πN_1 t/8), N_1 = 1, 2, 3, for φ = 0; only the four values ±2 and ±√2 occur.
Figure 1.9 (a) Sampling of the signal s(t) = 1 + \sum_{n=1}^{3} 2\cos(2\pi nt/8) + \cos(\pi t); (b) the corresponding spectrum, formed of lines of amplitude 1 at multiples of 1/8.
Sampling a stationary random signal with period T yields a discrete signal whose autocorrelation is the sequence r(n) = r_{xx}(nT); this is the sampling of the autocorrelation function r_{xx}(τ) of the continuous random signal, defined by expression (1.34). Its Fourier transform gives the power spectral density Φ_d(f) of the discrete signal, which is related to the spectral density Φ_xx(f) of the continuous signal by a relation similar to equation (1.55); that is,
similar to equation (1.55); that is,
\Phi_d(f) = \frac{1}{T} \sum_{n=-\infty}^{\infty} \Phi_{xx}\left(f - \frac{n}{T}\right) \qquad (1.59)
If the sampling frequency is not high enough, or if the spectrum Φxx (f ) spans an infinite domain,
aliasing takes place.
The hypothesis of ergodicity for the discrete signal s(n) leads to the relation:
r(n) = \lim_{N \to \infty} \frac{1}{2N + 1} \sum_{i=-N}^{N} s(i)\, s(i - n) \qquad (1.60)
This relation gives the opportunity to extend the concept of autocorrelation function to determin-
istic signals. Then, for a periodic signal with period N 0 , the autocorrelation function is the sequence
r(n) given by:
1 ∑
N0 −1
r(n) = s(i)s(i − n) (1.61)
N0 i=0
Example: for the signal

s(n) = A \sin\left(2\pi \frac{n}{N_0}\right)

the autocorrelation is:

r(n) = \frac{A^2}{N_0} \sum_{i=0}^{N_0 - 1} \sin\left(2\pi \frac{i}{N_0}\right) \sin\left(2\pi \frac{i - n}{N_0}\right)

r(n) = \frac{A^2}{2} \cos\left(2\pi \frac{n}{N_0}\right)
The period of r(n) is N 0 and r(0) is the signal power.
A discrete random signal can also be defined directly. For example, if r(n) = 0 for n ≠ 0, the signal s(n) is a discrete white noise, and its spectral density is constant over the frequency interval [−1/2, 1/2]. This signal represents a physical reality – it is a sequence of noncorrelated random variables, and it can be obtained through an algorithm which produces statistically independent numbers.
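A sketch of estimating r(n) from a finite record, which also verifies the white-noise property for independent samples:

```python
import numpy as np

def autocorrelation(s, max_lag):
    # Time-average estimate of r(n) = E[s(i) s(i - n)] from a finite record
    N = len(s)
    return np.array([np.dot(s[n:], s[:N - n]) / (N - n) for n in range(max_lag)])

rng = np.random.default_rng(0)
s = rng.standard_normal(10_000)     # statistically independent numbers
print(autocorrelation(s, 5))        # r(0) near 1 (the power); r(n) near 0 for n != 0
```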
A set of pseudorandom numbers with an approximately uniform distribution can be produced by a 17-stage shift register with exclusive-OR feedback, as illustrated in Figure 1.10.
Figure 1.10 Pseudorandom generator and the probability distribution after filtering.
By applying to this set a narrowband filter which passes only the band 450–550 Hz, an approx-
imately Gaussian signal is obtained. The signal has a peak factor of 10.5 dB and is an excellent
test signal for digital transmission equipment. If the filtering is performed numerically, the set of
numbers obtained can be used to test digital processing equipment.
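A sketch of such a generator, in the spirit of the 17-stage register with exclusive-OR feedback of Figure 1.10 (the feedback taps chosen here, stages 17 and 14, are an illustrative assumption, not taken from the book):

```python
def lfsr_bits(length, taps=(17, 14), seed=1):
    # 17-stage shift register; the new bit is the exclusive OR of the tap stages
    state = seed & ((1 << 17) - 1)
    out = []
    for _ in range(length):
        out.append(state & 1)                                  # output stage
        fb = ((state >> (taps[0] - 1)) ^ (state >> (taps[1] - 1))) & 1
        state = (state >> 1) | (fb << 16)                      # shift in the feedback
    return out

bits = lfsr_bits(20)    # pseudorandom binary sequence with period 2**17 - 1
```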
1.9 Quantization
Quantization is the approximation of each signal value s(t) by a whole multiple of an elementary
quantity q which is called the quantizing step. If q is constant for all signal amplitudes, the quan-
tization is said to be uniform. This operation is carried out by passing the signal through a device
which has a staircase characteristic and produces the signal sq (t), as shown in Figure 1.11 for q = 1.
The way in which the approximation is made defines the centering of this characteristic. For example, the diagram represents the case (called rounding) where each value of the signal between (n − 1/2)q and (n + 1/2)q is rounded to nq. This approximation minimizes the power of the error signal. It is also possible to have approximation by default, which is obtained by truncation and consists of approximating every value between nq and (n + 1)q by nq; the characteristic is then displaced by q/2 toward the right on the abscissa.
The effect of this approximation is to superimpose on the original signal an error signal e(t), called the quantizing distortion or, more commonly, the quantizing noise. Thus:

s_q(t) = s(t) + e(t)
The case of rounding is illustrated in Figure 1.12. The amplitudes at odd multiples of q/2 are called the decision amplitudes. The amplitude of the error signal lies between −q/2 and q/2, and its power measures the degradation undergone by the signal.
When the variations in the signal are large relative to the quantizing step – that is, when quanti-
zation has been carried out sufficiently finely – the error signal is equivalent to a set of elementary
signals which are each formed from a straight-line segment (Figure 1.13). The power of such an
elementary signal, a ramp between −q/2 and q/2 over an interval of duration τ, is:

B = \frac{1}{\tau} \int_{-\tau/2}^{\tau/2} \left(\frac{q\, t}{\tau}\right)^2 dt = \frac{q^2}{12}

Figure 1.11 Quantization characteristic with rounding (q = 1).

Figure 1.12 Quantization with rounding: the signal s(t), the quantized signal s_q(t), and the error signal e(t).

Figure 1.13 Elementary error signal: a linear ramp between −q/2 and q/2 on the interval (−τ/2, τ/2).

The spectrum E_τ(f) of this elementary signal is:

E_\tau(f) = q\, \frac{1}{j2\pi f} \left[\frac{\sin(\pi f\tau)}{\pi f\tau} - \cos(\pi f\tau)\right] \qquad (1.65)
It appears that the majority of the energy is found around the frequency 1/𝜏. Under these
conditions, the spectral distribution of the error signal depends both on the slope of the elemen-
tary signal – that is, on the statistical distribution of the derivative s′ (t) of the signal – and on
the size of the step q relative to the signal. Reference [7] gives the calculation of this spectrum for a noise signal and shows that, when the quantizing step is sufficiently small, the error spectrum spreads over a range several hundred times the width of the signal band. If the signal
to be quantized is not random, the spectrum of the error signal will be concentrated on certain
frequencies – for example, the harmonics of a sinusoidal signal.
When converting an analog signal into digital form, quantization occurs, along with the sam-
pling, as the two operations are carried out in succession. While sampling is normally carried out
first, it is equally valid to carry out the quantization first and the sampling second, at a frequency
f s which is usually a little over twice the bandwidth of the signal. Under these conditions, the error
signal often has a spectrum which extends beyond the sampling frequency, and since it is actually
the sum of the signal and the error signal which is sampled, aliasing occurs, and the whole energy
of the error signal is recovered in the frequency band ( −f s /2, f s /2). In the majority of cases, the
spectral energy density of the quantizing noise is constant, and the following statement can be
made:
The noise produced during uniform quantization with a step q has a power which is gener-
ally expressed by B = q2 /12 and shows a constant spectral distribution in the frequency band
(−f s /2, f s /2).
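A numerical check of this statement, as a sketch: quantize a signal that is large relative to q by rounding and compare the measured error power with q²/12.

```python
import numpy as np

q = 0.01                                   # quantizing step
rng = np.random.default_rng(1)
s = rng.uniform(-1.0, 1.0, 100_000)        # signal large relative to q
e = s - q * np.round(s / q)                # error introduced by rounding
print(np.mean(e ** 2), q ** 2 / 12)        # both close to 8.33e-6
```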
It should be noted that the quantization of small signals (those with amplitude of the same order
of magnitude as step q) depends critically on the centering of the characteristic. For example, with
the centering in Figure 1.11, a sinusoidal signal with an amplitude of less than q/2 is totally sup-
pressed. It is possible, nevertheless, to suitably code these small signals by superimposing on them a
large-amplitude auxiliary signal which is removed later in the process. Thus, the coding introduces
limits on the small amplitudes of the signal. Equally, however, other limits appear in the case of
large amplitudes, as will be seen below.
1.10 The Coding Dynamic Range

Assume the signal is coded with N bits and a quantizing step q, so that the available amplitude range is 2^N q. A sinusoidal signal occupying this full range has the peak power:

P_c = \frac{1}{2}\left[2^N\, \frac{q}{2}\right]^2 = 2^{2N-3}\, q^2
Figure 1.14 illustrates this signal together with the quantizing step and the decision amplitudes.
The coding dynamic range is defined as the ratio between this peak power and the power of
the quantizing noise; this is, in fact, the maximum value of the signal-to-noise ratio (SNR) for a
sinusoidal signal with uniform coding. The following formula expresses this dynamic range:
P_c/B = (S/B)_{max} = 2^{2N-3} \cdot 12 = \frac{3}{2} \cdot 2^{2N}

Expressed in decibels, (S/B)_{max} ≈ 6.02 N + 1.76 dB – about 6 dB per coding bit.
Figure 1.14 Sinusoidal signal of maximum amplitude A_m occupying the full coding range 2^N·q, with the quantizing step q and the decision amplitudes.

Figure 1.15 Quantization of a Gaussian signal: probability density p(x) and the probabilities of the amplitude subranges.
For the coding of telephone speech, a quasi-logarithmic compression characteristic is applied to the amplitude x before uniform quantization, as shown in Figure 1.16.

Figure 1.16 The 13-segment compression characteristic: compressed amplitude y as a function of the amplitude x, with the 3-bit segment codes and break points at x = 1/64, 1/32, 1/16, 1/8, 1/4, 1/2.
This characteristic causes seven straight-line segments to appear in both the positive and negative
quadrants. As the two segments which encompass the origin are colinear, the characteristic has a
total of 13 segments.
Since the quantization of the y amplitudes is carried out with the quantum q, quantization of
the x amplitudes near the origin is based on a quantum q/16 – that is, the dynamic of the coder
is increased by 24 dB. Amplitudes close to unity are less well quantized as the step is multiplied
by 4. The power of the quantization noise is thus a function of the signal amplitude; to evaluate it, an average has to be calculated for each level, which brings in the statistics of the signal.
Figure 1.17 gives the SNR for a Gaussian signal as a function of the signal level after coding into 8 bits, for linear and nonlinear coding. The reference level for the signal (0 dB) is the peak power of the coder, and it is clearly apparent that the dynamic range is extended by nonlinear coding. For low amplitudes, quantization in fact corresponds to 12 bits. In practice, the signal can be coded by linear quantization into 12 bits, followed by a process which is very close to the conversion of an integer into a floating-point number:
Figure 1.17 Signal-to-noise ratio S/B (dB) versus signal level for 8-bit linear and 13-segment nonlinear coding of a Gaussian signal; the nonlinear law extends the dynamic range by 24 dB.
+ 0 0 0 1 0 1 1 0 1 1 0   (12-bit word: sign and 11 magnitude bits)
+ 1 0 0   0 1 1 0         (8-bit word: sign, 3-bit segment number, 4-bit mantissa)
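The conversion suggested by this example can be sketched as follows; the bit widths follow the example above, but the treatment of the smallest segment and the absence of rounding are simplifying assumptions.

def compress(sign: int, mag11: str) -> str:
    """mag11: 11-character binary magnitude, e.g. '00010110110'."""
    lead = mag11.find('1')                        # number of leading zeros
    if lead < 0 or lead >= 7:                     # smallest segment, kept linear
        seg, mantissa = 0, mag11[7:11]
    else:
        seg = 7 - lead                            # segment number 1..7
        mantissa = mag11[lead + 1:lead + 5]       # 4 bits after the leading 1
    return f"{'+' if sign >= 0 else '-'} {seg:03b} {mantissa}"

print(compress(+1, '00010110110'))                # -> + 100 0110, as above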
The coding can be improved when the probability distribution p(x) of the signal amplitude is
known. For a given number of bits N, an optimal quantizing characteristic can be found, which
minimizes the total quantizing distortion.
The signal amplitude range is divided into M = 2^N subranges (x_{i−1}, x_i) with −(M/2) + 1 ⩽ i ⩽ M/2, and every subrange is represented by a value y_i, as shown in Figure 1.18. The optimization consists of determining the set of values x_i and y_i which minimize the error signal power E² expressed by:

E² = Σ_{i=−(M/2)+1}^{M/2} ∫_{x_{i−1}}^{x_i} (x − y_i)² p(x) dx
Figure 1.18 Optimal quantizing characteristic: the subranges (x_{i−1}, x_i) and their representative values y₁, y₂, …, y_{M/2}.
Taking the derivative with respect to x_i and y_i, it can be shown that the following relations must hold:

x_i = (1/2)(y_i + y_{i+1})   for −(M/2) + 1 ⩽ i ⩽ (M/2) − 1

∫_{x_{i−1}}^{x_i} (x − y_i) p(x) dx = 0   for −(M/2) + 1 ⩽ i ⩽ M/2

p(x_{M/2}) = p(x_{−M/2}) = 0    (1.70)
These relations lead to the determination of the quantizing characteristic. If p(x) is an even
function, x0 = 0, and an iterative procedure is used, starting with an a priori choice for y1 . If
relation (1.70) is not satisfied for M/2, another initial choice is made for y1 and so on [9].
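This iterative optimization (often called the Lloyd–Max procedure) can be sketched with sample averages standing in for the integrals; a hedged illustration with arbitrary initialization (the 2-bit Gaussian case can be compared with Exercise 1.15).

import numpy as np

rng = np.random.default_rng(1)
samples = rng.standard_normal(200_000)           # stand-in for p(x), unit Gaussian
M = 2 ** 2                                       # N = 2 bits

y = np.linspace(-1.5, 1.5, M)                    # a priori choice of the y_i
for _ in range(100):
    x = 0.5 * (y[:-1] + y[1:])                   # x_i = (y_i + y_{i+1})/2
    idx = np.searchsorted(x, samples)            # assign samples to subranges
    y = np.array([samples[idx == i].mean() for i in range(M)])  # centroids

E2 = np.mean((samples - y[np.searchsorted(x, samples)]) ** 2)
print("decision values:", np.round(x, 3))
print("representatives:", np.round(y, 3), " E2 =", round(E2, 4))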
Table 1.1 gives the error signal power obtained with a Gaussian signal of unit power, for
several numbers of bits N, for the optimal coding and for uniform coding with the best
scaling of the quantizing characteristic [9]. Table 1.2, taken from Reference [10], gives, for
the same conditions, the values which correspond to a signal probability density following
expression (1.48).
The coding optimization can also be carried out with respect to the information content, by intro-
ducing the concept of entropy H defined as follows [2, Vol. 3]:
H = − Σ_i p_i log₂(p_i)    (1.71)

with −(M/2) + 1 ⩽ i ⩽ M/2,
and where pi designates the probability that the signal is in the amplitude subrange represented
by yi .
Table 1.1 Error signal power E² (and corresponding entropy) for N = 1 to 5 bits, for optimal and uniform coding of a unit Gaussian signal. Table 1.2 The same quantities for the probability density of expression (1.48).
Considering that Σ_i p_i = 1, the entropy is zero when the amplitude is concentrated in a single subrange. It is maximal when the signal amplitude is uniformly distributed, in which case it takes the value Hmax, equal to the number of bits N of the coder:

Hmax = log₂M = N    (1.72)
In fact, the entropy measures the difference between a given distribution and the uniform distri-
bution. The quantizing characteristic which maximizes the entropy is that which leads to amplitude
subranges corresponding to a uniform probability distribution.
The last row in Table 1.1 shows that for a Gaussian signal, the quantizing law which minimizes
the error signal power leads to entropy values close to the maximum N.
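A quick check of (1.71) and (1.72) on simulated data (all numerical choices here are illustrative):

import numpy as np

def entropy_bits(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(2)
N, M = 4, 16
edges = np.linspace(-4, 4, M - 1)                # M amplitude subranges
for name, x in (("uniform", rng.uniform(-4, 4, 100_000)),
                ("Gaussian", rng.standard_normal(100_000))):
    counts = np.bincount(np.searchsorted(edges, x), minlength=M)
    print(name, round(entropy_bits(counts / counts.sum()), 3), "bits (max", N, ")")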
The results obtained for sampling and quantization can be used, inversely, to evaluate the quantity
of information carried by a signal or to determine the capacity of a transmission channel.
A real channel of bandwidth fm can carry 2fm independent samples per second, as shown in Figure 1.4 by replacing τ with 1/(2fm). The quantity of information per sample depends on the relative powers of the useful signal and the noise, and on their amplitude distributions.
An important particular case is the Gaussian channel [11].
Let us assume a set of M symbols of N bits each is to be transmitted by a channel in the presence of white Gaussian noise of power B = σb².
In an M-dimensional hyperspace, the M symbols occupy the volume VM of a hypersphere of radius R defined by:

VM = ∫₀^R r^(M−1) dr ∫⋯∫ ∏_{i=1,…,M−1} f(θi) dθi = (R^M/M) Fθ    (1.73)

If a uniform distribution of the symbols in the hypersphere with radius R is assumed, the energy of the corresponding signal is:

ES = (1/VM) ∫₀^R r² r^(M−1) dr ∫⋯∫ ∏ f(θi) dθi = [M/(M + 2)] R²    (1.74)
The quantity of transmitted information is M·N bits. In the hypersphere, it is possible to associate with each set of bits a volume Vs expressed by:

Vs = VM/2^(MN) = (1/M)(R/2^N)^M Fθ    (1.75)
Now, an M-component noise with energy Eb = Mσb² is assigned to each set of bits. When M tends toward infinity, the point representing the noise comes close to a sphere of radius √(σb²·M) centered on the point representing the set of bits. In fact, for M Gaussian random variables b(n), the variable r = √(Σ_{n=1}^{M} b²(n)) has the first-order moment m₁ = σb√(M²/(M + 1)), and its variance m₂ − m₁² tends toward zero when M tends toward infinity. The volume of the sphere is:

Vb = [(√M·σb)^M / M] Fθ    (1.76)
The condition for the absence of transmission errors is that the volume of the noise sphere be included in the volume assigned to each set of bits, which leads to:

√M·σb < R/2^N    (1.77)
However, when M tends toward infinity, according to (1.74), R² represents the energy of the sum of the signal, with power S, and the noise:

R² = M(S + σb²)    (1.78)

Combining (1.77) and (1.78) gives 2^(2N) < 1 + S/σb², that is, N < (1/2) log₂(1 + S/B) bits per sample; with 2fm samples per second, the capacity of the channel is C = fm log₂(1 + S/B) bits per second. This limit assumes:
– a distortion-free channel
– white Gaussian noise
– an infinite transmission delay
In practice, channel equalization and error-correcting codes make it achievable to approach that limit with finite transmission delay.
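The capacity formula reduces to a single line; for instance (illustrative numbers only):

import math

def capacity(fm_hz, snr):                 # snr = S/B as a power ratio
    return fm_hz * math.log2(1.0 + snr)

print(capacity(3_000, 10 ** (30 / 10)))   # ~30 kbit/s for 3 kHz at 30 dB SNR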
There are several ways of establishing the correspondence between the set of quantized amplitudes and the set of binary numbers which must represent them. As the signals to be coded generally have both positive and negative amplitudes, the preferred representations are those which preserve the sign information, the most usual being sign-and-magnitude and the complement representations.
Digital processing machines, and particularly general-purpose ones, often use floating-point representations in which each number has three parts: the sign bit, the mantissa, and the exponent. The mantissa represents the fractional part, and the exponent gives the power of the base by which it is multiplied – for example, with base 10: +0.719 × 10⁵.
The dynamic range extension comes from the multiplicative effect introduced by the exponent. For example, in base 2, with a 6-bit exponent and a 16-bit mantissa, the dynamic range is 2^(2⁶) × 2^16 = 2^80 ≃ 10^24 – that is, 24 decimal digits. An additional gain is achieved by choosing a base which is, itself, a power of two, such as 8 or 16, leading to octal or hexadecimal operations (Figure 1.19).
1.A Appendix 1: Values of the Function sin(πx)/(πx)
Each row gives, for n = 0 to 19 (first column), the values of sin(πx)/(πx) at x = n/20, 1 + n/20, 2 + n/20, …, 7 + n/20.
0 1 0 0 0 0 0 0 0
1 0.99589 −0.04742 0.02429 −0.01633 0.01229 −0.00986 0.00823 −0.00706
2 0.98363 −0.08942 0.04684 −0.03173 0.02399 −0.01929 0.01613 −0.01385
3 0.96340 −0.12566 0.06721 −0.04588 0.03482 −0.02806 0.02350 −0.02021
4 0.93549 −0.15591 0.08504 −0.05847 0.04455 −0.03598 0.03018 −0.02599
5 0.90032 −0.18006 0.10004 −0.06926 0.05296 −0.04287 0.03601 −0.03105
6 0.85839 −0.19809 0.11196 −0.07804 0.05989 −0.04850 0.04088 −0.03528
7 0.81033 −0.21009 0.12069 −0.08466 0.06520 −0.05301 0.04466 −0.03859
8 0.75683 −0.21624 0.12614 −0.08904 0.06880 −0.05606 0.04730 −0.04091
9 0.69865 −0.21682 0.12832 −0.09113 0.07065 −0.05769 0.04874 −0.04220
10 0.63662 −0.21221 0.12732 −0.09095 0.07074 −0.05787 0.04897 −0.04244
11 0.57162 −0.20283 0.12329 −0.08856 0.06910 −0.05665 0.04800 −0.04164
12 0.50455 −0.18921 0.11643 −0.08409 0.06581 −0.05406 0.04587 −0.03983
13 0.43633 −0.17189 0.10702 −0.07770 0.06099 −0.05020 0.04265 −0.03707
14 0.36788 −0.15148 0.09538 −0.06960 0.05479 −0.04518 0.03844 −0.03344
15 0.30011 −0.12862 0.08185 −0.06002 0.04739 −0.03914 0.03335 −0.02904
16 0.23387 −0.10394 0.06682 −0.04924 0.03989 −0.03298 0.02751 −0.02399
17 0.17001 −0.07811 0.05071 −0.03753 0.02980 −0.02470 0.02110 −0.01841
18 0.10929 −0.05177 0.03392 −0.02522 0.02007 −0.01667 0.01426 −0.01245
19 0.05242 −0.02544 0.01688 −0.01261 0.01006 −0.00837 0.00716 −0.00626
1.B Appendix 2: The Reduced Normal Distribution

f(x) = (1/√(2π)) e^(−x²/2);   P = (2/√(2π)) ∫_λ^∞ e^(−x²/2) dx

For large λ, the probability P can be approximated by:

P ≈ (3/4)(1/λ) e^(−λ²/2)
Exercises
1.1 Consider the Fourier series expansion of the periodic function i(t) of period T, which is zero
throughout the period except for the range − 𝜏/2 ⩽ t ⩽ 𝜏/2, where it has a value of 1.
Give the value of the coefficients for 𝜏 = T/2 and 𝜏 = T/3.
Verify that the expansion leads to i(0) = 1 and draw the function when the expansion is
limited to 5 terms.
1.2 Analyze the sampling at the frequency f s of the signal s(t) = sin (𝜋f s t + 𝜑) when 𝜑 varies
from 0 to 𝜋/2.
Examine the reconstitution of this signal from the samples.
1.3 Calculate the amplitude distortion introduced into a signal reconstituted by pulses with a
width of half the sampling period.
1.4 A signal occupies the frequency band [f 1 , f 2 ]. What conditions should be imposed on the
frequency f 1 so that this signal can be sampled directly at a frequency between f 2 and 2f 2 ?
1.7 A digital frequency synthesizer is constructed from a read-only memory of 16 kbits with an access time of 500 ns. Knowing that the numbers which represent the samples of the sinusoidal signals have 8 bits, what are the characteristics of the synthesizer, and what frequency range and increment step can be obtained?
1.8 What is the probability distribution of the amplitudes of the sinusoidal signal:
s(t) = A cos(2πt/T)
Give its autocorrelation function. Give the autocorrelation function of a stationary random
Gaussian function whose spectrum has a uniform distribution in the frequency band
(f 1 , f 2 ).
1.9 Calculate the spectrum of a set of impulses with width T/2, separated by T, the occurrence
of each pulse having probability p. In particular, examine the case where p = 1/2.
What happens to the spectrum if these pulses form a pseudorandom sequence of length 2⁴ − 1 = 15, produced by a 4-bit shift register following the polynomial g(x) = x⁴ + x + 1?
1.10 A sinusoidal signal with frequency 1050 Hz is sampled at 8 kHz and coded into 10 bits. What
is the maximum value of the SNR? What is the value of the signal-to-quantization-noise ratio
measured in the frequency band 300–500 Hz? What are the values if the sampling frequency
is increased to 16 kHz?
1.11 The sinusoidal signal sin (2𝜋t/8 + 𝜑) with 0 ⩽ 𝜑 ⩽ 𝜋/2 is sampled with period T = 1 and
coded into 5 bits.
In the case where 𝜑 = 0, calculate the power and the spectrum of the quantization noise.
How does this spectrum appear as a function of the phase 𝜑?
1.12 Consider a coding scale in which the quantizing step has a value q. What is the quantization
of the signal s1 (t) = 𝛼q sin (𝜔1 t) for −1 ⩽ 𝛼 ⩽ 1, as a function of the centering of the quan-
tization characteristic? Show the envelope of the restored signal after decoding and narrow
filtering around the frequency 𝜔1 .
The signal s2 (t) = 10q sin 𝜔2 t is superimposed on s1 (t). Show the envelope of the restored
signal under these conditions.
1.13 Assume a Gaussian signal is to be coded. How many bits would be required to have the
signal-to-quantization-noise ratio greater than 50 dB? Can this number be reduced if signal
clipping is allowed for 1% of the time?
1.14 The signal s(t) = A sin (2𝜋⋅810t) is coded into 8 bits. If the quantization step is q, trace the
curve which shows the signal-to-quantization-noise ratio as a function of the amplitude A
when this amplitude varies from q to 2⁷q. Sketch the corresponding curve for nonlinear
coding following the 13-segment A-law.
1.15 Calculate the limits of the amplitude subranges for the optimal 2-bit coding of a unit
Gaussian signal.
References
1 A. Papoulis, The Fourier Integral and its Applications, McGraw-Hill, New York, 1962.
2 E. Roubine, Introduction à la théorie de la communication, 3 vols, Masson, Paris, 1970.
3 W. B. Davenport, Probability and Random Processes, McGraw-Hill, New York, 1970.
4 J. R. Rice, The Approximation of Functions, vol. 1, Addison-Wesley, Reading, Mass, 1964.
5 B. Picinbono, Principles of Signals and Systems, Artech House Inc., London, 1988.
6 W. Peterson, Error Correcting Codes, MIT Press, 1972.
7 W. R. Bennett, Spectra of quantized signals. The Bell System Technical Journal, 27, 446–472, 1948.
8 CCITT, Digital Networks — transmission systems and multiplexing equipment. Yellow Book,
Vol. III, 3, Geneva, Switzerland, 1981.
9 J. Max, Quantizing for minimum distortion. IRE Transactions on Information Theory 6, 7–12,
1960.
10 M. D. Paez and T. H. Glisson, Minimum mean-squared error quantization in speech PCM and
DPCM systems. IEEE Transactions on Communications 20, 225–30, 1972.
11 C. Shannon, Communication in the presence of noise, Proceedings of I.R.E., Vol. 37, pp. 10–21,
Jan. 1949 (reprinted in: Proceedings of the IEEE, Sept. 1984 and Feb. 1998).
The discrete Fourier transform (DFT) is introduced when the Fourier transform of a function is to be calculated using a digital computer. This type of processor can handle only numbers, and only in a quantity limited by the size of its memory. It follows that the Fourier transform:

S(f) = ∫_{−∞}^{∞} s(t) e^(−j2πft) dt

must be adapted, by replacing the signal s(t) with the numbers s(nT) which represent samples of the signal, and by limiting to a finite value N the set of numbers on which the calculations are carried out. The calculation then provides the numbers S*(f) defined by:

S*(f) = Σ_{n=0}^{N−1} s(nT) e^(−j2πfnT)
As the computer has limited processing power, it can only provide results for a limited number
of values of the frequency f , and it is natural to choose multiples of a certain frequency step Δf .
Thus,
S*(kΔf) = Σ_{n=0}^{N−1} s(nT) e^(−j2πnkΔfT)
The conditions under which the calculated values form a good approximation to the required
values are examined below. An interesting simplifying choice is to take Δf = 1/NT. Then there are
only N different values of S*(k/NT), which is a periodic set of period N since:
S ∗ [(k + N)∕NT] = S ∗ (k∕NT)
On the other hand, the transform thus calculated appears as discrete values and, as shown in
Section 1.6, this property is characteristic of the spectrum of periodic functions. Thus, the set
S*(k/NT) is obtained by the Fourier transform of the set s(nT), which is periodic, with period NT.
The DFT and the inverse transform establish the relations between these two periodic sets.
The definition, properties, methods of calculation, and applications of the DFT have been
discussed in numerous publications. Overviews are given in References [1–4].
If two sets of complex numbers, x(n) and X(k), which are periodic with period N, are chosen, then
the DFT and the inverse transform establish the following relationships between them:
X(k) = (1/N) Σ_{n=0}^{N−1} x(n) e^(−j2πnk/N)    (2.1)

x(n) = Σ_{k=0}^{N−1} X(k) e^(j2πkn/N)    (2.2)
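For reference, definitions (2.1) and (2.2) transcribe directly into a few lines; this is an O(N²) sketch for checking results, not a fast algorithm.

import numpy as np

def dft(x):
    N = len(x)
    n = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(n, n) / N) @ x / N      # (2.1)

def idft(X):
    N = len(X)
    n = np.arange(N)
    return np.exp(2j * np.pi * np.outer(n, n) / N) @ X           # (2.2)

x = np.random.default_rng(3).standard_normal(8)
print(np.allclose(idft(dft(x)), x))                              # True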
The position of the scale factor 1/N is chosen so that the X(k) are the coefficients of the Fourier
series expansion of the set x(n). This transformation has the following properties.
Linearity: If x(n) and y(n) are two sets with the same period and with transforms X(k) and Y(k), respectively, the set v(n) = x(n) + λy(n), where λ is a scalar, has the transform V(k) = X(k) + λY(k).
Translation: A translation of the x(n) implies a rotation of the phase of the X(k). If the transform Xn₀(k) of the set x(n − n₀) is calculated, then:

Xn₀(k) = Σ_{n=0}^{N−1} x(n − n₀) e^(−j2πnk/N) = X(k) e^(−j2πn₀k/N)

A translation of the x(n) by n₀ thus induces on X(k) a rotation of the phase through an angle equal to 2πn₀k/N.
Symmetry: If the set x(n) is real, the numbers X(k) and X(N − k) are complex conjugates:

X(N − k) = Σ_{n=0}^{N−1} x(n) e^(j2πnk/N) = X̄(k)
If the set x(n) is real and even, then so is the set X(k). Indeed, if x(N − n) = x(n) then, for example, for N = 2P + 1:

X(N − k) = x(0) + 2 Σ_{n=1}^{P} x(n) cos(2πnk/N) = X(k)
If the set x(n) is real and odd, the set X(k) is purely imaginary. In this case, x(N − n) = −x(n) and x(0) = x(N) = 0. For example, for N = 2P + 1:

X(k) = −2j Σ_{n=1}^{P} x(n) sin(2πnk/N) = −X(N − k)

It should be noted that X(0) = X(N) = 0.
As any real signal can always be decomposed into odd and even parts, these last two symmetry
properties are important.
Circular convolution: The transform of a convolution product is equal to the product of the
transforms.
If x(n) and h(n) are two sets with period N, the circular convolution y(n) can be defined by the
equation:
y(n) = Σ_{l=0}^{N−1} x(l) h(n − l)    (2.3)
This is a set which has the same period N. Its transform is written as:
Y(k) = Σ_{n=0}^{N−1} [Σ_{l=0}^{N−1} x(l) h(n − l)] e^(−j2πnk/N)

     = Σ_{l=0}^{N−1} x(l) [Σ_{n=0}^{N−1} h(n − l) e^(−j2π(n−l)k/N)] e^(−j2πlk/N)

Y(k) = (Σ_{n=0}^{N−1} h(n) e^(−j2πnk/N)) (Σ_{l=0}^{N−1} x(l) e^(−j2πlk/N)) = H(k) X(k)    (2.4)
This is an important property of the DFT. A direct application will be given later.
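Property (2.4) can be checked numerically. Note that numpy's forward FFT omits the 1/N factor of definition (2.1), so the property holds there in the form written below (arbitrary test data):

import numpy as np

rng = np.random.default_rng(4)
N = 8
x, h = rng.standard_normal(N), rng.standard_normal(N)

# circular convolution by definition (2.3)
y = np.array([sum(x[l] * h[(n - l) % N] for l in range(N)) for n in range(N)])

print(np.allclose(np.fft.fft(y), np.fft.fft(x) * np.fft.fft(h)))  # True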
Parseval’s relation: This relation states that the power of a signal is equal to the sum of the powers
of its harmonics. Thus,
(1/N) Σ_{n=0}^{N−1} x(n) x̄(n) = (1/N) Σ_{n=0}^{N−1} x̄(n) Σ_{k=0}^{N−1} X(k) e^(j2πkn/N)

(1/N) Σ_{n=0}^{N−1} |x(n)|² = Σ_{k=0}^{N−1} X(k) [(1/N) Σ_{n=0}^{N−1} x̄(n) e^(j2πkn/N)]

(1/N) Σ_{n=0}^{N−1} |x(n)|² = Σ_{k=0}^{N−1} |X(k)|²    (2.5)
Relations with Fourier series: Due to the presence of the scale factor 1/N in the definition (2.1) of the DFT, the output terms X(k) represent, except for spectrum aliasing effects, the coefficients of the Fourier series expansion of the periodic signal, when this signal exhibits no discontinuity. If this is not the case, noticeable discrepancies emerge. In fact, it can be shown that, if there is a discontinuity in the function x(t) at time t₀, the Fourier series expansion of x(t) at time t₀ equals the average of the left and right limits of x(t) as t tends toward t₀. By contrast, the inverse DFT restores the exact original signal samples; therefore, the terms X(k) include the DFT of the distribution of the discontinuities, with the inverse sign and a scale factor of 0.5.
Illustration: x(t) is the following triangular signal:
x(t) = t; 0 ≤ t < 1
x(t + 1) = x(t)
Coefficients of the Fourier series:

C₀ = 1/2;  Cₙ = j/(2πn),  n integer, n ≠ 0

The magnitude of the discontinuity at the origin is 1, and the DFT of order N gives the following values:

X(k) = −1/(2N) + X′(k);  X′(k) ≈ Cₖ
This specificity of the DFT is an undesirable effect when a development with the smallest number
of non-negligible coefficients is sought, as in signal compression, for example. If the signal is real,
a symmetric signal is appended, which cancels the discontinuity and leads to the discrete cosine
transform described in Section 3.3.4.
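The −1/(2N) offset of the illustration can be observed numerically; the sketch below uses numpy's FFT (which omits the 1/N factor, so it is divided out) on the sampled ramp x(n) = n/N.

import numpy as np

N = 64
n = np.arange(N)
X = np.fft.fft(n / N) / N                 # DFT with the 1/N convention of (2.1)
for k in (1, 2, 4):
    Ck = 1j / (2 * np.pi * k)
    print(k, X[k] - Ck)                   # close to -1/(2N) = -0.0078125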
However, the most important property of the DFT probably lies in the fact that it lends itself
to efficient calculation techniques. This property has won it a prominent position in digital signal
processing.
The equations defining the DFT provide a relationship between two sets of N complex numbers.
This is conveniently written in matrix form by setting
W = e^(−j2π/N)    (2.6)
The coefficients of the DFT, the numbers Wⁿ, appear on the unit circle in the complex plane as shown in Figure 2.1, and are the roots of the equation Z^N − 1 = 0.
The matrix equation for the direct transform is as follows:

⎡X₀     ⎤         ⎡1  1        1          ⋯  1             ⎤ ⎡x₀     ⎤
⎢X₁     ⎥         ⎢1  W        W²         ⋯  W^(N−1)       ⎥ ⎢x₁     ⎥
⎢X₂     ⎥ = (1/N) ⎢1  W²       W⁴         ⋯  W^(2(N−1))    ⎥ ⎢x₂     ⎥
⎢⋮      ⎥         ⎢⋮  ⋮        ⋮             ⋮             ⎥ ⎢⋮      ⎥
⎣X_(N−1)⎦         ⎣1  W^(N−1)  W^(2(N−1)) ⋯  W^((N−1)(N−1))⎦ ⎣x_(N−1)⎦
For the inverse transform, it is sufficient to remove the factor 1/N and change W n to W −n .
The square matrix T N of order N exhibits obvious features – rows and columns with the same
index have the same elements and these elements are powers of a basic number such that W N = 1.
Significant simplifications can be envisaged under these conditions, leading to algorithms for fast calculation. An FFT (fast Fourier transform) is said to have been carried out when the DFT is calculated using such algorithms.
An important case occurs when N is a power of 2 because it leads to algorithms which are simple
and particularly effective. These algorithms are based on a decomposition of the set to be trans-
formed into a number of interleaved subsets. The case of interleaving in the time domain will be
considered first, which leads to the so-called decimation-in-time algorithms.
Separating the input terms with even and odd indices, the first N/2 elements of the transform are given by:

⎡X₀       ⎤          ⎡x₀     ⎤   ⎡1          1            ⋯  1              ⎤ ⎡x₁     ⎤
⎢X₁       ⎥          ⎢x₂     ⎥   ⎢W          W³           ⋯  W^(N−1)        ⎥ ⎢x₃     ⎥
⎢X₂       ⎥ = T_(N/2)⎢x₄     ⎥ + ⎢W²         W⁶           ⋯  W^(2(N−1))     ⎥ ⎢x₅     ⎥
⎢⋮        ⎥          ⎢⋮      ⎥   ⎢⋮          ⋮               ⋮              ⎥ ⎢⋮      ⎥
⎣X_(N/2−1)⎦          ⎣x_(N−2)⎦   ⎣W^(N/2−1)  W^(3(N/2−1))  ⋯ W^((N/2−1)(N−1))⎦ ⎣x_(N−1)⎦
If the matrix which multiplies the column vector of elements with even indices is denoted by
T N/2 , then the matrix multiplying the vector of elements with odd indices can be factorized into
the product T N/2 and a diagonal matrix so that,
⎡X₀       ⎤          ⎡x₀     ⎤   ⎡1  0  0   ⋯  0        ⎤         ⎡x₁     ⎤
⎢X₁       ⎥          ⎢x₂     ⎥   ⎢0  W  0   ⋯  0        ⎥         ⎢x₃     ⎥
⎢X₂       ⎥ = T_(N/2)⎢x₄     ⎥ + ⎢0  0  W²  ⋯  0        ⎥ T_(N/2) ⎢x₅     ⎥
⎢⋮        ⎥          ⎢⋮      ⎥   ⎢⋮  ⋮         ⋮        ⎥         ⎢⋮      ⎥
⎣X_(N/2−1)⎦          ⎣x_(N−2)⎦   ⎣0  0  0  …  W^(N/2−1) ⎦         ⎣x_(N−1)⎦
Similarly, for the last N/2 elements of the set X(k), remembering that W^N = 1, one can write:

⎡X_(N/2)  ⎤          ⎡x₀     ⎤   ⎡1  0  0   ⋯  0        ⎤         ⎡x₁     ⎤
⎢X_(N/2+1)⎥          ⎢x₂     ⎥   ⎢0  W  0   ⋯  0        ⎥         ⎢x₃     ⎥
⎢X_(N/2+2)⎥ = T_(N/2)⎢x₄     ⎥ − ⎢0  0  W²  ⋯  0        ⎥ T_(N/2) ⎢x₅     ⎥
⎢⋮        ⎥          ⎢⋮      ⎥   ⎢⋮  ⋮         ⋮        ⎥         ⎢⋮      ⎥
⎣X_(N−1)  ⎦          ⎣x_(N−2)⎦   ⎣0  0  0  …  W^(N/2−1) ⎦         ⎣x_(N−1)⎦
It is apparent that the calculation of X(k) and X(k + N/2), for 0 ⩽ k ⩽ N/2 − 1, uses the same quantities, with only a change of sign in the final sum; hence the butterfly structure of the diagrams. The calculation of a Fourier transform of order N thus reduces to the calculation of two transforms of order N/2, to which N/2 complex multiplications are added. By iteration through a number of steps given by log₂N − 1 = log₂(N/2), transforms of order 2 are arrived at. These have the matrix:
T₂ = ⎡1   1⎤
     ⎣1  −1⎦
and no multiplications are required.
As each stage involves N/2 complex multiplications, the complete transform requires Mc complex multiplications, where:

Mc = (N/2) log₂(N/2)    (2.7)

and Ac complex additions, where:

Ac = N log₂N    (2.8)
In practice, the number of complex multiplications can be further reduced, because some powers of W have particular properties. For example, W⁰ = 1 and W^(N/4) = −j do not require any multiplication, and

W^(N/8) = (√2/2)(1 − j)

requires only one half of a complex multiplication. Thus, three multiplications can be saved in the first stage, 3N/8 can be eliminated in the penultimate one, and 2N/4 in the last. The gain over all the stages is 5N/4 − 3, and the minimum number of complex multiplications is given by:

mc = (N/2)[log₂(N/2) − 5/2] + 3    (2.9)
It should be noted that all of these calculation reductions cannot always be easily implemented
either in software or in hardware.
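The derivation above corresponds to the following compact recursive sketch (scale factor 1/N omitted, as in the diagrams; the in-place, iterative organization of a practical implementation is not reproduced):

import numpy as np

def fft_dit(x):
    N = len(x)                        # N must be a power of 2
    if N == 1:
        return x.copy()
    even, odd = fft_dit(x[0::2]), fft_dit(x[1::2])
    w = np.exp(-2j * np.pi * np.arange(N // 2) / N)   # diagonal of the W^k
    t = w * odd                       # the N/2 complex multiplications per stage
    return np.concatenate([even + t, even - t])       # X(k) and X(k + N/2)

x = np.random.default_rng(5).standard_normal(16).astype(complex)
print(np.allclose(fft_dit(x), np.fft.fft(x)))          # True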
The matrix for the fourth-order transform is:

T₄ = ⎡1   1   1   1⎤
     ⎢1  −j  −1  +j⎥    (2.10)
     ⎢1  −1   1  −1⎥
     ⎣1  +j  −1  −j⎦
The diagram for its reduction is shown in Figure 2.2. By convention, the arrows represent multi-
plications and the solid circles to the left of the elementary flow graphs, often called “butterflies,”
represent an addition (upper one) and a subtraction (lower one). The eighth-order transform is
represented in Figure 2.3.
Figure 2.2 Flow graph of the fourth-order transform. Figure 2.3 Flow graph of the eighth-order transform; the inputs appear in the order x₀, x₄, x₂, x₆, x₁, x₅, x₃, x₇.
It is seen that in this treatment, the indices of the X(k) appear in natural order, while those of the x(n) are in a permuted one. This permutation is caused by the successive interleaving and results
in a reversal or inversion of the binary representation of the indices, which is often called “bit
reversal.” For example, for N = 8,
x0 (000) corresponds to x0 (000)
x4 (100) corresponds to x1 (001)
x2 (010) corresponds to x2 (010)
x6 (110) corresponds to x3 (011)
x1 (001) corresponds to x4 (100)
x5 (101) corresponds to x5 (101)
x3 (011) corresponds to x6 (110)
x7 (111) corresponds to x7 (111)
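The permutation is easily generated; a small helper (illustrative, not part of the diagrams above):

def bit_reverse_order(N):
    bits = N.bit_length() - 1
    # read each index's log2(N)-bit binary representation in reverse
    return [int(format(n, f'0{bits}b')[::-1], 2) for n in range(N)]

print(bit_reverse_order(8))    # [0, 4, 2, 6, 1, 5, 3, 7]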
The amount of data memory required to calculate a transform of order N is that needed to hold
N complex positions. Indeed, the calculations are performed on pairs of variables which undergo
the operation represented by a butterfly and preserve their position in the set of variables at the
end of the operation, as is clearly shown in the diagrams. This is called “in-place computation.”
The inverse transform is simply obtained by changing the sign of the exponent of W. The factor
1/N can be introduced, for example, by halving the results of the additions and subtractions made
in the butterflies. This allows scaling of the numbers in the memories.
This type of interleaving can also be applied to the X(k), whereupon a similar algorithm is obtained. It leads to the so-called decimation-in-frequency algorithms.
The even-indexed outputs X(2k) are obtained by applying the transform of order N/2 to the set of the sums x(n) + x(n + N/2), with 0 ⩽ n ⩽ N/2 − 1. For the elements with odd indices, after a similar process, the square matrix obtained is equal to the product of the matrix T_(N/2) and the diagonal matrix whose elements are the powers W^k with 0 ⩽ k ⩽ N/2 − 1. Thus:

⎡X₁     ⎤          ⎡1  0  0   ⋯  0        ⎤ ⎡x₀ − x_(N/2)        ⎤
⎢X₃     ⎥          ⎢0  W  0   ⋯  0        ⎥ ⎢x₁ − x_(N/2+1)      ⎥
⎢X₅     ⎥ = T_(N/2)⎢0  0  W²  ⋯  0        ⎥ ⎢x₂ − x_(N/2+2)      ⎥
⎢⋮      ⎥          ⎢⋮  ⋮         ⋮        ⎥ ⎢⋮                   ⎥
⎣X_(N−1)⎦          ⎣0  0  0  …  W^(N/2−1) ⎦ ⎣x_(N/2−1) − x_(N−1) ⎦
The X(k) with even and odd indices are thus calculated using the square matrix T_(N/2) for the transform of order N/2. By adopting the same notation for the butterflies as was used in the preceding section, similar diagrams are obtained; Figure 2.4 shows the diagram for N = 8.
In decimation-in-frequency algorithms, the number of calculations is the same as with time inter-
leaving. The numbers x(n) to be transformed appear in their natural order, while the transformed
numbers X(k) are permuted.
The algorithms which have been obtained so far are based on a decomposition of the transform
of order N into elementary second-order transforms which do not require multiplications. These
algorithms are said to be radix-2 transforms. Other elementary transforms can also be used, the
most important being the radix-4 one, which uses the elementary matrix T 4 .
Figure 2.4 Flow graph of the eighth-order transform with decimation in frequency: the inputs x₀ … x₇ appear in natural order and the outputs in the order X₀, X₄, X₂, X₆, X₁, X₅, X₃, X₇.
If D₁, D₂, and D₃ denote the diagonal matrices of the terms W^k, W^(2k), and W^(3k) respectively, with 0 ⩽ k ⩽ N/4 − 1, the outputs are obtained in groups of N/4 terms; for the first two groups:

⎡X₀       ⎤          ⎡x₀     ⎤            ⎡x₁     ⎤            ⎡x₂     ⎤            ⎡x₃     ⎤
⎢X₁       ⎥ = T_(N/4)⎢x₄     ⎥ + D₁T_(N/4)⎢x₅     ⎥ + D₂T_(N/4)⎢x₆     ⎥ + D₃T_(N/4)⎢x₇     ⎥
⎢⋮        ⎥          ⎢⋮      ⎥            ⎢⋮      ⎥            ⎢⋮      ⎥            ⎢⋮      ⎥
⎣X_(N/4−1)⎦          ⎣x_(N−4)⎦            ⎣x_(N−3)⎦            ⎣x_(N−2)⎦            ⎣x_(N−1)⎦

⎡X_(N/4)  ⎤          ⎡x₀     ⎤             ⎡x₁     ⎤            ⎡x₂     ⎤             ⎡x₃     ⎤
⎢X_(N/4+1)⎥ = T_(N/4)⎢x₄     ⎥ − jD₁T_(N/4)⎢x₅     ⎥ − D₂T_(N/4)⎢x₆     ⎥ + jD₃T_(N/4)⎢x₇     ⎥
⎢⋮        ⎥          ⎢⋮      ⎥             ⎢⋮      ⎥            ⎢⋮      ⎥             ⎢⋮      ⎥
⎣X_(N/2−1)⎦          ⎣x_(N−4)⎦             ⎣x_(N−3)⎦            ⎣x_(N−2)⎦             ⎣x_(N−1)⎦
This equation involves the same matrix calculations as the previous one, with the addition of the
multiplications by the elements of the second row of the matrix T 4 . It can hence be shown that the
calculation of the transform results in the diagram in Figure 2.5.
Figure 2.5 Reduction of the transform of order N to four transforms of order N/4: row k of T_(N/4) feeds the outputs of ranks k, k + N/4, k + N/2, and k + 3N/4 through the fourth-order butterfly [1 1 1 1; 1 −j −1 j; 1 −1 1 −1; 1 j −1 −j] and the twiddle factors W^k, W^(2k), W^(3k).
This type of transform is carried out in log₄N − 1 = log₄(N/4) stages. Each stage requires 3(N/4) complex multiplications, which results in a total of M_c4 multiplications, where:

M_c4 = (3/4) N log₄(N/4)    (2.11)

The number of complex additions A_c4 is:

A_c4 = 2N log₄N    (2.12)
It is apparent that the number of additions is the same in radix-2 and in radix-4 algorithms,
in contrast to the complex multiplications, where calculation in radix-4 algorithms results in a
saving of over 25%. Other radices can also be envisaged – for example, radix 8. In this case, there
are multiplications in the elementary matrix, and the savings over radix 4 are negligible. Different
radices can also be combined [5].
For a transform of order N, the number of complex multiplications M_c2/4(N) is given by the recurrence derived from the above equations:

M_c2/4(N) = M_c2/4(N/2) + 2M_c2/4(N/4) + N/2    (2.17)
with the initial values M(2) = M(4) = 0. The value obtained in this way is slightly smaller than for
a radix-4 algorithm.
In practice, taking trivial operations into account and implementing complex multiplications
with 3 real multiplications and 5 real additions, as explained later on, it can be shown that
the N-order transform needs Nlog2 (N) − 3N + 4 real multiplications and 3Nlog2 (N) − 3N + 4
additions [6].
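The counts (2.7), (2.11), and the recurrence (2.17) can be tabulated side by side; a short sketch:

import math
from functools import lru_cache

@lru_cache(maxsize=None)
def mc24(N):                                           # recurrence (2.17)
    if N <= 4:
        return 0
    return mc24(N // 2) + 2 * mc24(N // 4) + N // 2

for N in (64, 256, 1024):
    mc2 = (N // 2) * round(math.log2(N // 2))          # (2.7), radix 2
    mc4 = (3 * N // 4) * round(math.log(N // 4, 4))    # (2.11), radix 4
    print(N, mc2, mc4, mc24(N))                        # split radix is smallest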
The algorithms which have been presented for decimation-in-time and -frequency, and for
radix-2 and 4, are elements of a large set of algorithms. A unified presentation of FFT algorithms
is given in the next chapter, so that the most appropriate can be selected for each application. In
actual calculations, however, operations are carried out with limited precision. This results in
some degradation of the signal.
The equipment used introduces limitations caused by the finite precision of arithmetic units and
the limited memory capacity. Firstly, the coefficients are held in a memory, with a limited number
of bits. Thus, the memory contents represent approximations of the actual coefficients, which are
generally obtained by rounding. Secondly, as the calculation proceeds, rounding is performed so
as to keep the data wordlength within the capacities of the memory locations, or of the arithmetic
units. It is important to analyze the degradations introduced by these two types of wordlength
limitation in order to be able to precisely determine the hardware needed to produce a transform
with specific performance.
To begin with, the rounding of the coefficients is considered.
The coefficients actually used by the machine represent an approximation of the theoretical coefficients, whose real and imaginary parts have values within the range [−1, +1].
For the coefficient e^(−j2πn/N), digitization into bc bits involves a quantization error δ(n) = δR(n) + jδI(n) such that, if rounding is employed, |δR(n)| ⩽ 2^(−bc) and |δI(n)| ⩽ 2^(−bc).
The calculation of each transformed number X(k) from the data x(n) is performed with an error Δ(k) such that:

X(k) + Δ(k) = (1/N) Σ_{n=0}^{N−1} x(n)[e^(−j2πnk/N) + δ(nk)]

or

Δ(k) = (1/N) Σ_{n=0}^{N−1} x(n) δ(nk)
As the x(n) and X(k) are related by equation (2.2):

x(n) = Σ_{k=0}^{N−1} X(k) e^(j2πnk/N)

this becomes:

Δ(k) = Σ_{i=0}^{N−1} X(i) ε(i, k)    (2.18)

with

ε(i, k) = (1/N) Σ_{n=0}^{N−1} δ(nk) e^(j2πni/N)
Consequently, for the transformed number X(k), rounding the coefficients of the transform introduces a perturbation Δ(k) obtained by summing elementary perturbations, each of which is equal to the product of a transformed number by a factor representing its contribution. The transformed numbers interact with each other and are no longer strictly independent.
It is possible to calculate the ε(i, k) for each transformation. In general, it is important to know the maximum value εm that the |ε(i, k)| can reach for a given order of transformation and for a given number of bits bc. The inequality |δ(n)| < 2^(−bc)√2 provides a bound:

εm ⩽ 2^(−bc)√2

In practice, the values found for εm are much lower than this maximum. For example, for N = 64, it is found that εm ≃ 0.6 × 2^(−bc), and this value is also found for higher values of N [7].
Another performance degradation arises from computation noise. At DFT computer input, data
have a limited number of bits, and this number grows with every multiplication and addition. In
general, the number of bits assigned to internal data in the computer is fixed, and word lengths must be limited. Most of the time this limitation is obtained simply by rounding; since overflow is generally not acceptable, the scaling is selected at the beginning of the transform so that the whole of the calculation can be performed without any risk of overflow. Two cases can be distinguished for the radix-2 transform:
– Direct transform: Due to the global scale factor 1/N, it is sufficient to halve the results of addi-
tions and subtractions in each butterfly to keep the correct scaling.
– Inverse transform: The scaling at the beginning of the transform is selected so that no overflow
can occur in the calculations.
Regarding round-off noise power, detailed estimation requires identification of the sources, with
corresponding quantization steps, and summations [8].
The calculation of a spectrum by the DFT requires that certain approximations be made and that the parameters be suitably chosen in order to attain the desired performance. Before considering any application, it is therefore useful to look carefully at the function fulfilled by the DFT. Consider first the output of index zero:

X(0) = (1/N) Σ_{n=0}^{N−1} x(n)
The signal X(0) thus defined results from the convolution of the signal x(t) with the distribution φ₀(t) such that:

φ₀(t) = (1/N) Σ_{n=0}^{N−1} δ(t − nT)

The Fourier transform of this distribution is given by:

Φ₀(f) = (1/N) Σ_{n=0}^{N−1} e^(−j2πnfT) = (1/N) (1 − e^(−j2πfNT)) / (1 − e^(−j2πfT))

or

Φ₀(f) = e^(−jπf(N−1)T) Φ(f)

with

Φ(f) = (1/N) sin(πfNT) / sin(πfT)    (2.19)
Now, a convolution operation in the time domain corresponds to a product in the frequency
domain – that is, X(0) is a signal obtained by filtering the input signal by the function Φ0 (f ).
Figure 2.6 shows the function Φ(f ) and the function 𝜙(t) of which it is the Fourier transform. The
function Φ(f ) is zero at points on the frequency axis which are whole multiples of 1/NT, except for
multiples of 1/T. It is periodic and has a period 1/T, conforming to the laws of sampling. It is simply
the spectrum of a sampled impulse of width NT. Similarly, the output X(k) has the corresponding function φ_k(t) such that:

φ_k(t) = (1/N) Σ_{n=0}^{N−1} e^(j2πnk/N) δ(t − nT)

Φ_k(f) = (1/N) Σ_{n=0}^{N−1} e^(j2πnk/N) e^(−j2πfnT)

In compact form, after simplification, this becomes:

Φ_k(f) = (−1)^k e^(−jπf(N−1)T) e^(jπk/N) Φ(f − k/NT)    (2.19b)
Figure 2.6 The function φ(t) – N impulses of weight 1/N spaced T apart over the interval NT – and its Fourier transform Φ(f), which is zero at the multiples of 1/NT except the multiples of 1/T.
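The filter function (2.19) can be evaluated directly; the sketch below (with the arbitrary choices N = 16, T = 1) checks the zeros at the multiples of 1/NT and the level of the first side lobe, about −13 dB for this rectangular window.

import numpy as np

N, T = 16, 1.0
f = np.linspace(1e-6, 0.5 / T, 4000)
phi = np.sin(np.pi * f * N * T) / (N * np.sin(np.pi * f * T))
print(np.abs(np.interp(np.arange(1, 8) / (N * T), f, phi)).round(4))  # ~0 at k/NT
side = np.abs(phi[f > 1.1 / (N * T)]).max()       # just past the main lobe
print(round(20 * np.log10(side), 1))              # first side lobe, about -13 dB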
Figure 2.7 The outputs X(0), X(1), …, X(N − 1) obtained by summing the inputs x(n) after phase shifts which are multiples of 2π/N.
The output X(k) provides the signal filtered according to the function Φ0 (f ), but translated by
k/NT along the frequency axis.
Thus, the DFT forms a set of N identical filters, or a bank of filters, distributed uniformly over the
frequency domain at intervals of 1/NT.
If the input signal is periodic, then, from the definition of the DFT, this bank of filters is frequency
sampled at intervals of 1/NT, and it should be noted that there is no interference between the
outputs X(k). Strictly speaking, this property is lost if the coefficients are rounded, as has been
shown earlier.
The DFT function described above also illustrates the problem of scaling numbers in the
memory of an FFT calculator. Let us suppose that the numbers x(n) to be transformed result from the sampling of a random signal whose amplitude probability distribution has the variance σ².
If the signal has a uniform distribution of its energy spectrum, its power is uniformly distributed
between the X(k), and each has a variance equal to 𝜎 2 /N. By contrast, if the signal has a spectral
distribution which can be concentrated on one X(k), this X(k) has the same probability distribution
as the x(n) – in particular, the variance 𝜎 2 . Scaling the numbers by dividing by 2 at each stage of
an FFT calculation is suitable for the handling of such signals.
A different view of the filtering process is provided by observing that the outputs X(k) of the DFT
are the sums of the inputs x(n) after phase shifting. In effect, the output X(0) is the sum of the x(n)
with zero phase shift, and the output X(k) is the sum of x(n) with phase shifts which are multiples
of 2𝜋(k/N), as shown in Figure 2.7. At each output, in-phase components of the resulting signals
add while the others cancel. For example, if the x(n) are complex numbers with the same phase and
modulus, all the X(k) become zero except for X(0). The input signal is thus found to be decomposed
according to the base represented by the N vectors e−j2𝜋(k/N) with 0 ⩽ k ⩽ N − 1.
This result is useful in studying banks of filters which include a DFT processor.
Periodicity in time is introduced artificially by assuming that the signal is repeated outside the
time interval θ = NT which actually corresponds to the data being processed. Under these conditions, the DFT supplies a sampling of the spectrum with a frequency period Δf equal to the inverse of the duration of the data, Δf = 1/NT, which constitutes the frequency resolution of the analysis. The relation Δf·NT = 1 expresses the Heisenberg uncertainty principle for spectrum analysis. A more accurate analysis can be obtained by increasing the duration of the data collection – for example, by making it N′T (with N′ > N) with zero additional samples. The additional frequency samples are obtained simply by interpolation of the others. This procedure is commonly used to provide a number N′ of data points which is a power of 2, thus allowing fast algorithms to be used. On the other hand,
the fact that the signal is not formed solely of lines at frequencies which are multiples of 1/NT intro-
duces interference between the spectral components obtained. Indeed, the filter function Φ(f ) of
the DFT, which is given in Section 2.4.1, introduces ripples throughout the frequency band and, if
the signal has a spectral component S(f 0 ) at frequency f 0 such that k/NT < f 0 < (k + 1)/NT, then:
X(k) = S(f0 )Φ(k∕NT − f0 ), 0⩽k ⩽N −1 (2.20)
As a result, all the transform outputs can assume nonzero values as shown in Figure 2.8. Thus,
limitations appear for the resolution of the analyzer. This effect can be reduced by modifying the
filter function of the DFT by weighting the signal samples before transforming them.
This operation amounts to replacing the rectangular time window φ(t) with a function whose Fourier transform has smaller ripples. Numerous functions are used, of which the simplest is the raised cosine window:

φ(t) = (1/2)[1 + cos(2πt/NT)]    (2.21)

and the Hamming window:

φ(t) = 0.54 + 0.46 cos(2πt/NT)    (2.22)
The latter function has 99.96% of its energy in the main lobe. The peak side lobe ripple is about
40 dB below the main lobe peak. Other time windows can be used, and several efficient functions
are introduced in Reference [9].
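The effect of the weighting can be quantified by transforming sampled versions of these windows; the sketch below writes them in the shifted form over 0 ⩽ n ⩽ N − 1 (an assumption about how (2.21)–(2.22) are sampled) and locates the peak side lobe of each.

import numpy as np

N, pad = 64, 4096
n = np.arange(N)
windows = {
    "rectangular": np.ones(N),
    "raised cosine": 0.5 * (1 - np.cos(2 * np.pi * n / N)),
    "Hamming": 0.54 - 0.46 * np.cos(2 * np.pi * n / N),
}
start_bins = {"rectangular": 1.1, "raised cosine": 2.1, "Hamming": 2.1}
for name, w in windows.items():
    S = np.abs(np.fft.fft(w, pad))                 # zero-padded spectrum
    S /= S.max()
    lo = int(start_bins[name] * pad / N)           # just past the main lobe
    print(f"{name:13s} peak side lobe: {20 * np.log10(S[lo:pad // 2].max()):6.1f} dB")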
Let Φ(f) be the spectrum of the time window φ(t) after sampling; expression (2.20) can be extended to any signal with spectrum S(f), using the definition of the convolution and taking account of the periodicity of Φ(f):

X(k) = ∫₀^(1/T) [(1/T) Σ_{n=−∞}^{∞} S(u − n/T)] Φ(k/NT − u) du

The aliasing of the signal spectrum due to sampling with period T is apparent.
To cope better with interference between the calculated spectral components, it is necessary to
employ a bank of more selective filters, like that presented in Chapter 10.
The DFT can also be used indirectly in the calculation of convolutions.
Figure 2.8 Analysis of a signal with a frequency which is not a multiple of 1/NT.
The efficiency of FFT algorithms leads to the use of the DFT in cases other than spectrum analysis
and, in particular, in convolutions.
Although, in general, this is not the most efficient of approaches, it can be useful in applications
where an FFT processor is available.
One of the properties of the DFT is that the transform of a convolution product is equal to the product of the transforms. Given two sets x(n) and h(n) of period N, with transforms X(k) and H(k), the circular convolution

y(n) = Σ_{m=0}^{N−1} h(m) x(n − m)

has the transform Y(k) = H(k)X(k). Now, the convolution of a set of N₁ data with a set of N₂ coefficients is a set of finite length, having N₁ + N₂ − 1 terms. The fast convolution is applied by considering that the three sets y(n), x(n), and h(n) have the period N such that N ⩾ N₁ + N₂ − 1. It is then sufficient to complete each set with a suitable number of zero terms. It is of particular interest if a power of 2 is chosen for N.
Nevertheless, in practice, convolution is a filtering operation, where the x(n) represent the signal and the h(n) the coefficients. The set of the x(n) is much longer than that of the h(n), and it is necessary to subdivide the calculation. To do this, the set of the x(n) is regarded as a superposition of elementary sets x_k(n), each of N₃ terms. That is,

x(n) = Σ_k x_k(n)

Each elementary set is convolved with the h(n), and each result y_k(n) contains N₃ + N₂ − 1 nonzero terms; the elementary convolutions thus involve N₂ + N₃ − 1 terms. Figure 2.9 shows the sequence of operations. The sets y_k(n) and y_{k+1}(n) have N₂ − 1 terms which are superposed. The same operations can be performed by decomposing the set x(n) into sets x_k(n) such that N₂ − 1 terms are superposed.
In this process, the number of calculations to be performed on each element of the set y(n) increases as log₂(N₂ + N₃ − 1), and N₃ must not be chosen too large. Also, if N₃ < N₂, no terms in the set y(n) can be obtained directly. Consequently, there is an optimal value for N₃.
Figure 2.9 Sectioned fast convolution: the set of coefficients h(n) has N₂ terms; each section x_k(n) has N₃ terms; each partial result y_k(n) has N₂ + N₃ − 1 terms, and consecutive results overlap over N₂ − 1 terms, which are added to form y(n).
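A sketch of the sectioning procedure of Figure 2.9 (an overlap-add organization, with a hypothetical section length N₃ = 128), using numpy transforms of power-of-2 length as suggested above:

import numpy as np

def fast_convolution(x, h, N3=128):
    N2 = len(h)
    Nfft = 1 << int(np.ceil(np.log2(N2 + N3 - 1)))    # power of 2 >= N2 + N3 - 1
    H = np.fft.fft(h, Nfft)
    y = np.zeros(len(x) + N2 - 1)
    for k in range(0, len(x), N3):
        seg = np.fft.ifft(np.fft.fft(x[k:k + N3], Nfft) * H).real
        n = min(N2 + min(N3, len(x) - k) - 1, len(y) - k)
        y[k:k + n] += seg[:n]                          # N2 - 1 terms overlap
    return y

rng = np.random.default_rng(6)
x, h = rng.standard_normal(1000), rng.standard_normal(31)
print(np.allclose(fast_convolution(x, h), np.convolve(x, h)))   # True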
In certain applications, only those operators which can form convolutions are available to calcu-
late a DFT. Such is the case for circuits using charge transfer devices which allow calculations to
be performed on the sampled signals in analog form at speeds compatible with the frequencies
encountered, for example, in radar applications.
The definition of a DFT can be written as:

X(k) = (1/N) Σ_{n=0}^{N−1} x(n) e^(−j2π(nk/N))

By writing

nk = (1/2)[n² + k² − (n − k)²]  and  W = e^(−j(2π/N))

this becomes

X(k) = (1/N) W^(k²/2) Σ_{n=0}^{N−1} x(n) W^(n²/2) W^(−(n−k)²/2)    (2.24)
This equation expresses the circular convolution product of the sets x(n)W^(n²/2) and W^(−n²/2). It follows that the calculation of the X(k) can be performed in three stages comprising the following operations:
(1) Multiply the data x(n) by the coefficients W^(n²/2).
(2) Form the convolution product with the set of coefficients W^(−n²/2).
(3) Multiply the results by the coefficients W^(k²/2).
This process is represented in Figure 2.10, and the method can be extended to the case where W
is a complex number with a non-unit modulus [10].
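The three stages translate into the following sketch, in which the convolution of stage 2 is itself performed with power-of-2 transforms (the usual motive for this "chirp" formulation); all names are illustrative.

import numpy as np

def chirp_dft(x):
    N = len(x)
    n = np.arange(N)
    chirp = np.exp(-1j * np.pi * n ** 2 / N)          # W**(n**2/2)
    M = 1 << int(np.ceil(np.log2(2 * N - 1)))         # convolution length
    a = np.fft.fft(x * chirp, M)                      # stage 1
    kernel = np.zeros(M, complex)
    kernel[:N] = np.conj(chirp)                       # W**(-n**2/2), n >= 0
    kernel[M - N + 1:] = np.conj(chirp[1:])[::-1]     # ... and n - k < 0
    y = np.fft.ifft(a * np.fft.fft(kernel))           # stage 2
    return chirp * y[:N] / N                          # stage 3, with 1/N

x = np.random.default_rng(7).standard_normal(12)
print(np.allclose(chirp_dft(x), np.fft.fft(x) / 12))  # True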
2.7 Implementation
Exercises
2.1 Calculate the DFT of the set comprising N = 16 terms such that:
x(0) = x(1) = x(2) = x(14) = x(15) = 1
x(n) = 0 for 3 ⩽ n ⩽ 13
and of the set
x(0) = x(1) = x(2) = x(3) = x(4) = 1
x(n) = 0 for 5 ⩽ n ⩽ 15
Compare the results obtained. Carry out the inverse transform of these results.
2.2 Establish the diagram for the FFT algorithm of order 16 with time and frequency
interleaving. What is the minimum number of multiplications and additions that are
required?
2.3 Calculate the DFT of the set comprising N = 128 terms such that:
x(0) = x(1) = x(2) = x(126) = x(127) = 1
x(n) = 0 for 3 ⩽ n ⩽ 125
Compare the results with those in Exercise 2.1.
The set X(k) obtained forms an approximation to the Fourier series expansion of a set
of impulses. Compare the results obtained with the figures in the table in Appendix 1,
Chapter 1. Account for the differences.
2.4 We wish to develop a DFT of order 64 with a minimum of arithmetic operations. Determine
the number of multiplications and additions required with algorithms with radix 2, 4, and 8.
2.5 Analyze the power of the rounding noise produced in a transform of order 32. Using the
results in Section 2.3, show how the results vary at the different outputs. Calculate the dis-
tortion introduced by limiting the coefficients to 8 bits.
2.6 Show that each output of a DFT, X(k), can be obtained from the inputs x(n) by a recurrence
relation. Calculate the number of multiplications that would be required.
2.7 Carry out a DFT of order 64 on data which are 16-bit numbers. Calculate the degradation
of signal-to-noise ratio when a cascade of direct and inverse transforms is used on a 16-bit
machine.
2.8 Assume that the bandwidth occupied by a signal for analysis is from 0 to 10 kHz. The spectral
resolution required is 1 Hz. What length of recording is required in order to carry out such an
analysis? What memory capacity is required to store the data, assuming they are coded into
8 bits? Determine the characteristics of a computer capable of performing such a spectral
analysis – memory capacity, memory cycle, addition, and multiplication times.
2.9 Calculate the DFT of the set x(n) which is defined by:
x(n) = sin(2𝜋n∕3.5) + 0.2 sin(2𝜋n∕6.5) with 0 ⩽ n ⩽ 15
The following windows are used to improve the analysis:
g(n) = (1/2)[1 − cos(2πn/16)]
g(n) = 0.54 − 0.46 cos(2πn/16)    (Hamming)
g(n) = 0.42 − 0.5 cos(2πn/16) + 0.08 cos(4πn/16)    (Blackman)
Compare the results.
References
1 Special issue on fast Fourier transform and applications. IEEE Transactions on Audio and Electroacoustics, 15(2), 1–113, 1967.
2 A. Oppenheim and R. Schafer, Digital Signal Processing, Prentice Hall, Englewood Cliffs, NJ, 1974, Chapters 3 and 6.
3 L. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Prentice Hall, Englewood Cliffs, NJ, 1975, Chapters 6 and 10.
4 C. S. Burrus and T. W. Parks, DFT/FFT and Convolution Algorithms, Wiley, New York, 1985.
5 P. Duhamel and H. Hollmann, Split radix FFT algorithm. Electronics Letters, 20(1), 14–16, 1984.
6 H. Sorensen, M. Heideman and S. Burrus, On computing the split-radix FFT. IEEE Transactions on Acoustics, Speech and Signal Processing, 34(1), 152–156, 1986.
7 D. W. Tufts, H. S. Hersey and W. E. Mosier, Effects of FFT coefficient quantization on bin frequency response. Proceedings of the IEEE, 60(1), 146–147, 1972.
8 T. Thong and B. Liu, Fixed point FFT error analysis. IEEE Transactions on Acoustics, Speech and Signal Processing, 24(6), 563–573, 1976.
9 A. Eberhard, An optimal discrete window for the calculation of power spectra. IEEE Transactions on Audio and Electroacoustics, 21(1), 37–43, 1973.
10 L. Rabiner, R. Schafer and C. Rader, The chirp z-transform algorithm. IEEE Transactions on Audio and Electroacoustics, 17(2), 86–92, 1969.
Algorithms for the fast calculation of a discrete Fourier transform (DFT) are based on factorization
of the matrix of the transform. We have already seen such factorization in the sections on
decimation-in-time and decimation-in-frequency algorithms, in the preceding chapter, which are
particular examples of a large group of algorithms.
In order to use these fast algorithms and thus to exploit to the full both the characteristics of the
signals to be processed and the various technological possibilities, one must use a suitable mathe-
matical tool – the Kronecker product of matrices. By combining this product with the conventional
product, it is possible to factorize the matrix of the DFT in a simple way.
Similarly, the Kronecker product of a diagonal matrix by another diagonal matrix is once again a diagonal matrix.
The Kronecker product can be combined with conventional matrix products, and thus we have the following properties, which will be used in the coming sections, provided that the dimensions are compatible:
(1) The Kronecker product of a product of matrices with the unit matrix is equal to the product of the Kronecker products of each matrix with the unit matrix:

(AB) × I = (A × I)(B × I)

(2) The product of Kronecker products is equal to the Kronecker product of the products:

(A × B)(C × D) = AC × BD

(3) The inverse of a Kronecker product is equal to the Kronecker product of the inverses:

(A × B)⁻¹ = A⁻¹ × B⁻¹

The transpose of the matrix of a Kronecker product is the Kronecker product of the transposes of the matrices:

(A × B × C)ᵗ = Aᵗ × Bᵗ × Cᵗ    (3.6)

These properties can easily be demonstrated using simple examples. They are used to factorize matrices with redundant elements and, in particular, the DFT matrices [2].
Decimation-in-frequency will be considered first.
It should be noted that the scale factor 1/N is ignored throughout the rest of this chapter.
In the algorithms examined in the previous chapter, one of the sets – either the input or the
output – was permuted. The matrix which represents this algorithm is derived from the matrix T N
by permutation of the rows or the columns depending upon whether the decimation in frequency
or decimation in time is considered [3].
Let TN′ denote the matrix corresponding to decimation in frequency. This is obtained by
permutation of the rows of T N as follows. The rows are numbered, and each number is expressed
in binary notation; then the binary numbers are reversed, and the resulting number denotes the
position of that row in the new matrix. For example, for N = 8 we obtain:
     ⎡1   1    1    1    1   1    1    1  ⎤  000 = 0
     ⎢1   W    W²   W³  −1  −W   −W²  −W³ ⎥  001 = 1
     ⎢1   W²  −1   −W²   1   W²  −1   −W² ⎥  010 = 2
T₈ = ⎢1   W³  −W²   W   −1  −W³   W²  −W  ⎥  011 = 3
     ⎢1  −1    1   −1    1  −1    1   −1  ⎥  100 = 4
     ⎢1  −W    W²  −W³  −1   W   −W²   W³ ⎥  101 = 5
     ⎢1  −W²  −1    W²   1  −W²  −1    W² ⎥  110 = 6
     ⎣1  −W³  −W²  −W   −1   W³   W²   W  ⎦  111 = 7
      ⎡1   1    1    1    1   1    1    1  ⎤  000 = 0
      ⎢1  −1    1   −1    1  −1    1   −1  ⎥  100 = 4
      ⎢1   W²  −1   −W²   1   W²  −1   −W² ⎥  010 = 2
T′₈ = ⎢1  −W²  −1    W²   1  −W²  −1    W² ⎥  110 = 6
      ⎢1   W    W²   W³  −1  −W   −W²  −W³ ⎥  001 = 1
      ⎢1  −W    W²  −W³  −1   W   −W²   W³ ⎥  101 = 5
      ⎢1   W³  −W²   W   −1  −W³   W²  −W  ⎥  011 = 3
      ⎣1  −W³  −W²  −W   −1   W³   W²   W  ⎦  111 = 7
Note that for N = 2, the matrix T2′ is equal to T 2 .
The matrix T′_N is factorized by introducing the matrix T′_(N/2) and the diagonal matrix D_(N/2) whose elements are the powers W^k with 0 ⩽ k ⩽ N/2 − 1. Thus,

T′_N = ⎡T′_(N/2)           T′_(N/2)        ⎤
       ⎣T′_(N/2) D_(N/2)  −T′_(N/2) D_(N/2)⎦
This decomposition appears clearly for T′₈. If I_(N/2) denotes the unit matrix of order N/2, we can write:

T′_N = ⎡T′_(N/2)    0     ⎤ ⎡I_(N/2)    I_(N/2) ⎤
       ⎣   0     T′_(N/2) ⎦ ⎣D_(N/2)   −D_(N/2) ⎦

or

T′_N = ⎡T′_(N/2)    0     ⎤ ⎡I_(N/2)    0     ⎤ ⎡I_(N/2)    I_(N/2) ⎤
       ⎣   0     T′_(N/2) ⎦ ⎣   0     D_(N/2) ⎦ ⎣I_(N/2)   −I_(N/2) ⎦

where Δ_N denotes the diagonal square matrix of order N formed by the middle factor, in which the first N/2 elements have the value 1 and the subsequent elements are the powers W^k with 0 ⩽ k ⩽ N/2 − 1.
The complete factorization is obtained by iteration:

T′_N = (T′₂ × I_(N/2)) (Δ₄ × I_(N/4)) (I₂ × T′₂ × I_(N/4)) ⋯ (Δ_(N/2) × I₂) (I_(N/4) × T′₂ × I₂) Δ_N (I_(N/2) × T′₂)

or

T′_N = ∏_{i=1}^{log₂N} (Δ_(2^i) × I_(N/2^i)) (I_(2^(i−1)) × T′₂ × I_(N/2^i))    (3.8)
This expression shows that the transform is calculated in log₂N stages, each containing:
(1) one part involving the ordering of the data, corresponding to the factor (I_(2^(i−1)) × T′₂ × I_(N/2^i)), which contains only additions and subtractions;
(2) one part which involves the multiplications by the coefficients represented in the matrix (Δ_(2^i) × I_(N/2^i)).
The stage corresponding to i = 1 does not involve any multiplications. It can be verified that all the matrices indeed have the dimension N.
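The first level of this factorization can be checked numerically with the Kronecker product; the sketch below uses numpy's kron ordering, in which the relation reads T′_N = (I₂ ⊗ T′_(N/2)) Δ_N (T′₂ ⊗ I_(N/2)) – the ordering of the factors inside each Kronecker product depends on the convention adopted.

import numpy as np

def T_prime(N):
    bits = N.bit_length() - 1
    rev = [int(format(k, f'0{bits}b')[::-1], 2) for k in range(N)]
    n = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(n, n) / N)[rev]   # bit-reversed rows

N = 8
W = np.exp(-2j * np.pi / N)
Delta = np.diag(np.r_[np.ones(N // 2), W ** np.arange(N // 2)])
T2 = np.array([[1, 1], [1, -1]])
lhs = T_prime(N)
rhs = np.kron(np.eye(2), T_prime(N // 2)) @ Delta @ np.kron(T2, np.eye(N // 2))
print(np.allclose(lhs, rhs))                               # True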
In order to see how the factorization is generalized to radix 4, it is interesting to examine the matrix T′₁₆, which is obtained from T₁₆ by the following permutation of the rows. The rows are numbered to base 4 and the order of the digits in the row numbers is reversed; the value obtained gives the number of the row in the new matrix. Following this permutation, we obtain T′₄ = T₄.
If D₄ denotes the diagonal matrix

D₄ = ⎡1  0  0   0 ⎤
     ⎢0  W  0   0 ⎥
     ⎢0  0  W²  0 ⎥
     ⎣0  0  0   W³⎦
then the matrix of the transform of order 16 thus obtained is:

T′₁₆ = ⎡T₄      T₄          T₄          T₄        ⎤
       ⎢T₄D₄   (−j)T₄D₄    (−1)T₄D₄    (+j)T₄D₄   ⎥
       ⎢T₄D₄²  (−1)T₄D₄²    T₄D₄²      (−1)T₄D₄²  ⎥
       ⎣T₄D₄³  (+j)T₄D₄³   (−1)T₄D₄³   (−j)T₄D₄³  ⎦

T′₁₆ = ⎡T₄  0   0   0 ⎤ ⎡I₄  0   0    0  ⎤ ⎡I₄    I₄     I₄    I₄  ⎤
       ⎢0   T₄  0   0 ⎥ ⎢0   D₄  0    0  ⎥ ⎢I₄   −jI₄   −I₄   +jI₄ ⎥
       ⎢0   0   T₄  0 ⎥ ⎢0   0   D₄²  0  ⎥ ⎢I₄   −I₄     I₄   −I₄  ⎥
       ⎣0   0   0   T₄⎦ ⎣0   0   0    D₄³⎦ ⎣I₄   +jI₄   −I₄   −jI₄ ⎦
This expression is written in Kronecker product form as:
′
T16 = (T4 × I4 )Δ16 (I4 × T4 ) (3.9)
where Δ16 is a diagonal matrix in which the first four terms have the value 1, the next four terms
W k with 0 ⩽ k ⩽ 3, and the subsequent terms (W 2 )k and (W 3 )k with 0 ⩽ k ⩽ 3.
Factorization as Kronecker products forms the basis of algorithms which have various
properties – notably the order of presentation/extraction of data, and the linking of operations. It
also applies to partial transforms, which are of great practical importance.
The transforms which have been studied in the above sections relate to sets of N numbers which may be complex. In a fine spectrum analysis, it can happen that the order N of the transform becomes very large, though we are interested in knowing only a reduced number of points of the spectrum. Limiting the calculation to the useful points can then allow for significant savings.
Let us calculate the partial transform defined by the following equation, where r is a factor of N:

⎡X_p      ⎤   ⎡1  W^p        W^(2p)      …  W^((N−1)p)      ⎤ ⎡x₀     ⎤
⎢X_(p+1)  ⎥   ⎢1  W^(p+1)    W^(2(p+1))  …  W^((N−1)(p+1))  ⎥ ⎢x₁     ⎥
⎢X_(p+2)  ⎥ = ⎢1  W^(p+2)    W^(2(p+2))  …  W^((N−1)(p+2))  ⎥ ⎢x₂     ⎥    (3.10)
⎢⋮        ⎥   ⎢⋮  ⋮          ⋮              ⋮               ⎥ ⎢⋮      ⎥
⎣X_(p+r−1)⎦   ⎣1  W^(p+r−1)  …           …  W^((N−1)(p+r−1))⎦ ⎣x_(N−1)⎦
From the whole set of data, one can form N/r subsets, each containing r terms:

(x₀, x_(N/r), x_(2N/r), …, x_((r−1)N/r))
(x₁, x_(N/r+1), …, x_((r−1)N/r+1))
……
(x_(N/r−1), x_(2N/r−1), …, x_(N−1))

Assume D_r is the diagonal matrix of dimension r whose elements are the powers of W, W^k with 0 ⩽ k ⩽ r − 1. The matrix of the partial transform can be separated into N/r submatrices, which can each be applied to one of the sets defined earlier, and the matrix equation of the transform is written as:

[X]_(p,r) = Σ_{i=0}^{(N/r)−1} (W^p)^i D_r^i T_r D_r^(p(N/r)) [x]_(i,r)    (3.11)

where [X]_(p,r) denotes the set of r numbers X_k with p ⩽ k ⩽ p + r − 1 and [x]_(i,r) the set of data x_k with k = nN/r + i and n = 0, 1, …, r − 1. The transform T_r is that of order r.
Consequently, if r is a factor of N, a partial transform relating to r points is calculated using N/r
transforms of order r with which the appropriate diagonal matrices are associated.
If N and r are powers of 2, the number M_P of complex multiplications to be made is given by:

M_P = (N/r)[(r/2) log₂(r/2) + 2r] = N[(1/2) log₂(r/2) + 2]    (3.12)
This result is equally valid when it is the number of points to be transformed which is limited, as
is often the case in spectrum analysis. A common example of a partial transform is that applied to
real data.
The set of N real data x(n) is turned into the set of N/2 complex numbers y(n) = x(2n) + jx(2n + 1), whose transform of order N/2 is Y(k); the transform of the original set is then obtained by:

X(k) = (1/2)[Y(k) + Ȳ(N/2 − k)] + (j/2) e^(−j2π(k/N)) [Ȳ(N/2 − k) − Y(k)]    (3.14)

with 0 ⩽ k ⩽ N/2 and Y(N/2) = Y(0).
The odd transform is defined by:

X(k) = (1/N) Σ_{n=0}^{N−1} x(n) e^(−j2π(2k+1)n/(2N))    (3.18)

x(n) = Σ_{k=0}^{N−1} X(k) e^(j2π(2k+1)n/(2N))
The coefficients of this transform have as their coordinates the points M of the unit circle such that the vector OM forms an angle with the abscissa which is an odd multiple of 2π/2N, as shown in Figure 3.1.
By setting W = e^(−j(π/N)), the matrix of this transform is written:

T^I_N = ⎡1  W         W²   …  W^(N−1)        ⎤
        ⎢1  W³        W⁶   …  W^(3(N−1))     ⎥
        ⎢1  W⁵        W¹⁰  …  W^(5(N−1))     ⎥
        ⎢⋮  ⋮         ⋮       ⋮              ⎥
        ⎣1  W^(2N−1)  …    …  W^((2N−1)(N−1))⎦
Figure 3.1 Coefficients of the odd transform on the unit circle: the vector OM makes an angle which is an odd multiple of 2π/2N with the real axis.
If the x(n) are real numbers:

X(N − 1 − k) = (1/N) Σ_{n=0}^{N−1} x(n) e^(−j2π[2(N−1−k)+1]n/(2N))
             = (1/N) Σ_{n=0}^{N−1} x(n) e^(j2π(2k+1)n/(2N))    (3.19)

Thus,

X(N − 1 − k) = X̄(k)    (3.20)
Consequently, since X(N − 1 − k) = X̄(k), the values with even and odd indices are complex conjugates of each other, and it is sufficient to calculate the X(k) with an even index in order to transform a set of real numbers. Such a transform has the matrix T_R given by:

T_R = ⎡1  W         W²   …  W^(N/2)       …  W^(N−1)        ⎤
      ⎢1  W⁵        W¹⁰  …  W^(5N/2)      …  W^(5(N−1))     ⎥
      ⎢⋮  ⋮         ⋮       ⋮                ⋮              ⎥
      ⎣1  W^(2N−3)  …    …  W^((2N−3)N/2) …  W^((2N−3)(N−1))⎦
Let D be the diagonal matrix whose elements are the Wⁿ with 0 ⩽ n ⩽ N/2 − 1, and let T_(N/2) be the matrix of the transform of order N/2. Allowing for the fact that W^(2N) = 1 and W^(N/2) = −j, this becomes:

T_R = [T_(N/2)D, −jT_(N/2)D] = (T_(N/2)D) × [1, −j]    (3.21)

The odd transform of the real data is then calculated by carrying out a transform of order N/2 on the set of complex numbers:

y(n) = [x(n) − jx(N/2 + n)] Wⁿ  with 0 ⩽ n ⩽ N/2 − 1    (3.22)
The number of calculations is the same as in the method illustrated at the beginning of this
section, but the structure is simpler. It should be noted that the transformed numbers give a fre-
quency sample of the signal spectrum represented by the x(n), displaced by a half step on the
frequency axis.
An important case where significant simplifications are introduced is that of real symmetrical
sets. Reductions in the calculations are illustrated by using the doubly odd transform [6].
It is defined by the equations:

X(k) = (1/N) Σ_{n=0}^{N−1} x(n) e^(−j2π(2k+1)(2n+1)/(4N))    (3.23)

x(n) = Σ_{k=0}^{N−1} X(k) e^(j2π(2k+1)(2n+1)/(4N))    (3.24)
The coefficients of this transform are based on the points M of the unit circle such that the vector OM forms an angle with the abscissa which is an odd multiple of 2π/4N, as shown in Figure 3.2. If the x(n) are real numbers, the following holds:

X(N − 1 − k) = −X̄(k)
Figure 3.2 Coefficients of the doubly odd transform on the unit circle: the angles are odd multiples of 2π/4N.
x(N − 1 − n) = −x(n)
By assuming, as before, that W = e−j(𝜋/N) , the matrix of the transform is written as:
Let us consider the case where the set of data x(n) is real and antisymmetric – that is,
x(n) = −x(N − 1 − n). Then the same applies to the set X(k). The set of the x(n) for even n is equal
to the set of the x(n) for odd n, except for the sign. The situation is the same for the set of X(k).
In order to calculate the transform, it is sufficient in this case to carry out the calculations for the
x(2n) with 0 ⩽ n ⩽ N/2 − 1, since the X(k) are real numbers. Alternatively, it is sufficient to perform
the calculations on the X(2k) with 0 ⩽ k ⩽ N/2 − l.
The corresponding matrix $T_{RR}$ is written as:
$$T_{RR} = W^{1/2}\begin{bmatrix} 1 & & & \\ & W^2 & & \\ & & \ddots & \\ & & & W^{2[(N/2)-1]} \end{bmatrix}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & W^8 & \cdots & W^{8[(N/2)-1]} \\ \vdots & \vdots & & \vdots \\ 1 & W^{8[(N/2)-1]} & \cdots & W^{8[(N/2)-1][(N/2)-1]} \end{bmatrix}\begin{bmatrix} 1 & & & \\ & W^2 & & \\ & & \ddots & \\ & & & W^{2[(N/2)-1]} \end{bmatrix}$$
Allowing for $W^{2N} = 1$, this becomes:
$$T_{RR} = W^{1/2}\, D_{N/2}\begin{bmatrix} T_{N/4} & T_{N/4} \\ T_{N/4} & T_{N/4} \end{bmatrix} D_{N/2} \tag{3.27}$$
and, as $W^{N/2} = -j$, this calculation can be carried out with a single application of the operations represented by the matrix $T_{N/4}$ to the set of numbers $x(2n) - jx(2n + N/2)$ with 0 ⩽ n ⩽ N/4 − 1. The N/4 numbers obtained are complex, and their real parts form the set of the desired X(2k) with 0 ⩽ k ⩽ N/4 − 1. By carrying out the operation defined by $T_{RR}$ for the transformed numbers of rank 2k + N/2 with 0 ⩽ k ⩽ N/4 − 1, it can be verified that we obtain the earlier numbers multiplied by −j. That is, the imaginary part of the numbers obtained previously furnishes the set of the X(2k + N/2). It follows that, if the doubly odd transform is applied to a real and antisymmetric set of N terms, or to a symmetric set which becomes antisymmetric through a suitable change of sign, it reduces to the equation:
$$X(2k) + jX\left(2k + \frac{N}{2}\right) = W^{1/2}\, D_{N/4}\, T_{N/4}\, D_{N/4}\left[x(2n) - jx\left(2n + \frac{N}{2}\right)\right] \tag{3.28}$$
with $0 \leqslant k \leqslant N/4 - 1$, $0 \leqslant n \leqslant N/4 - 1$, and where $D_{N/4}$ is a diagonal matrix whose elements are $W^{2i}$ with $0 \leqslant i \leqslant N/4 - 1$.
The number of complex multiplications Mc which are necessary is:
$$M_c = \frac{N}{8}\log_2\frac{N}{8} + 2\frac{N}{4} = \frac{N}{8}\log_2(2N) \tag{3.29}$$
Comparisons using the different transforms are given in Table 3.1 to illustrate the amount of
calculations for each type.
The importance of the odd transforms can be easily seen. It should, however, be noted that other
algorithms allow greater reductions to be made for real data and for symmetric real data [7], but
these are not as simple to use, especially for practical implementations.
One feature of the doubly odd transform, when applied to a real antisymmetric set, is that it is
identical to the inverse transform. Apart from the scale factor 1/N, there is no distinction, in this
case, between the direct and inverse transforms.
The Fourier transform of a real symmetric set is introduced, for example, when deriving the
power spectrum density of a signal from its autocorrelation function.
(1) The cosine Fourier transform:
$$X_{CF}(k) = \frac{1}{N}\sum_{n=0}^{N-1} x(n)\cos\frac{2\pi nk}{N} \tag{3.30}$$
(2) The sine Fourier transform:
$$X_{SF}(k) = \frac{1}{N}\sum_{n=0}^{N-1} x(n)\sin\frac{2\pi nk}{N} \tag{3.31}$$
(3) The discrete cosine transform (DCT):
$$X_{DC}(0) = \frac{\sqrt{2}}{N}\sum_{n=0}^{N-1} x(n)$$
$$X_{DC}(k) = \frac{2}{N}\sum_{n=0}^{N-1} x(n)\cos\left(\frac{2\pi(2n+1)k}{4N}\right) \tag{3.32}$$
The inverse transform is given by:
$$x(n) = \frac{1}{\sqrt{2}}X_{DC}(0) + \sum_{k=1}^{N-1} X_{DC}(k)\cos\frac{2\pi(2n+1)k}{4N}$$
(4) The discrete sine transform (DST):
$$X_{DS}(k) = \sqrt{\frac{2}{N+1}}\sum_{n=0}^{N-1} x(n)\sin\left[\frac{2\pi(n+1)(k+1)}{2N+2}\right] \tag{3.33}$$
Figure 3.3 Computing 2 inverse DCTs with the help of a single same-size inverse DFT.
$$c(0) = \frac{1}{\sqrt{2}};\qquad c(k) = 1,\ 1 \leqslant k \leqslant N-1$$
Finally, the DCT of order N can be computed with the help of a DFT of the same order.
Taking into account the fact that only the real part is exploited in the above equation, it is even
possible to compute two DCTs, using the imaginary part. Similarly, it is possible to compute two
inverse DCTs with a single inverse DFT, as illustrated in Figure 3.3.
At the input of the inverse DFT, the relations between variables are the following [8]:
$$S_0 = \frac{C_0(x_1) + jC_0(x_2)}{\sqrt{2}};\qquad S_{N/2} = \frac{C_{N/2}(x_1) + jC_{N/2}(x_2)}{\sqrt{2}}$$
$$2S_k = \left[C_k(x_1) + C_{N-k}(x_2)\right]\cos\left(\frac{\pi k}{2N}\right) + \left[C_{N-k}(x_1) - C_{N-k}(x_2)\right]\sin\left(\frac{\pi k}{2N}\right)$$
$$\qquad + j\left\{\left[C_k(x_1) + C_{N-k}(x_2)\right]\sin\left(\frac{\pi k}{2N}\right) + \left[C_k(x_2) - C_{N-k}(x_1)\right]\cos\left(\frac{\pi k}{2N}\right)\right\}$$
$$1 \leqslant k \leqslant N-1;\qquad k \neq N/2$$
Similarly, at the output of the inverse DFT:
x1 (2p) = Re{s(p)}; x1 (2p + 1) = Re{s(N − p − 1)}
and:
$$x(n_1, n_2) = \sum_{k_1=0}^{N-1}\sum_{k_2=0}^{N-1} e(k_1)\,e(k_2)\,X(k_1,k_2)\cos\frac{2\pi(2n_1+1)k_1}{4N}\cos\frac{2\pi(2n_2+1)k_2}{4N}$$
with:
$$e(0) = \frac{1}{\sqrt{2}};\qquad e(k) = 1,\ k \neq 0$$
That transform is separable, and it can be computed as follows:
$$X(k_1,k_2) = \frac{2}{N}\,e(k_2)\sum_{n_2=0}^{N-1}\cos\frac{2\pi(2n_2+1)k_2}{4N}\left[\frac{2}{N}\,e(k_1)\sum_{n_1=0}^{N-1} x(n_1,n_2)\cos\frac{2\pi(2n_1+1)k_1}{4N}\right]$$
Thus, the 2D transform can be computed using 2N one-dimensional DCTs, and the number of real multiplications is of the order of $N^2\log_2(N)$. In fact, that amount can be reached with the help of an algorithm based on the decomposition of a DCT of order N into two DCTs of order N/2 [10]. It is even possible to reach the value $\frac{3}{4}N^2\log_2 N$, through extension of the decimation technique and splitting the set of $(N \times N)$ data into subsets of $(N/2 \times N/2)$ data [11].
The filtering function of the DFT is improved when the transform is performed on overlapping
blocks of data [12].
[Figure 3.4: data blocks at times n and n + 1; the vectors X₁(n), X₂(n) and X₁(n + 1), X₂(n + 1) overlap by half a block.]
Let us consider a data sequence of length 2M and a transform whose matrix is M × 2M. At time n,
the block of processed data can be represented by two M-element vectors denoted X 1 (n) and X 2 (n),
as shown in Figure 3.4.
At time n + 1, a block of data includes half the data of the previous block – that is,
X 1 (n + 1) = X 2 (n). The lapped transform allows the two vectors to be recovered. The block
of data is multiplied by the matrix [A B], A and B being M × M matrices:
$$U_1 = [A, B]\begin{bmatrix} X_1(n) \\ X_2(n) \end{bmatrix};\qquad U_2 = [A, B]\begin{bmatrix} X_1(n+1) \\ X_2(n+1) \end{bmatrix} \tag{3.38}$$
Then, the results, U 1 at time n and U 2 at time n + 1, are multiplied by the transpose matrix.
$$\begin{bmatrix} Y_1(n) \\ Y_2(n) \end{bmatrix} = \begin{bmatrix} A^t \\ B^t \end{bmatrix} U_1;\qquad \begin{bmatrix} Y_1(n+1) \\ Y_2(n+1) \end{bmatrix} = \begin{bmatrix} A^t \\ B^t \end{bmatrix} U_2 \tag{3.39}$$
which yields:
$$Y_2(n) + Y_1(n+1) = B^tAX_1(n) + B^tBX_2(n) + A^tAX_1(n+1) + A^tBX_2(n+1) \tag{3.40}$$
At time n + 1, the input data are restored by addition: $X_2(n) = \frac{1}{2}\left[Y_2(n) + Y_1(n+1)\right]$, provided the following conditions are satisfied:
$$B^tA = A^tB = 0;\qquad \frac{1}{2}\left[B^tB + A^tA\right] = I_M \tag{3.41}$$
For example, these conditions are met if the elements of the matrix [A B] are:
$$a_{nk} = \sqrt{\frac{2}{M}}\, h(n)\cos\left[\left(n + \frac{M+1}{2}\right)\left(k + \frac{1}{2}\right)\frac{\pi}{M}\right]$$
$$0 \leqslant n \leqslant 2M-1;\qquad 0 \leqslant k \leqslant M-1;\qquad h(n) = -\sin\left[\left(n + \frac{1}{2}\right)\frac{\pi}{2M}\right]$$
In fact, a bank of M orthogonal filters has been obtained and the terms h(n) are the coefficients of
the prototype filter, which, in the present case, is a low-pass filter whose frequency response, given
in Section 5.8, is more selective than that of the DFT filter.
In image compression, lapped transforms introduce smoothing and attenuate the so-called blocking effects.
The algorithms described in Chapter 2 adapt well to the operating constraints and to the technological characteristics to be achieved. Hence, they are of great interest for practical applications.
Nevertheless, they are not the only method of fast calculation of a DFT, and algorithms can be
elaborated which involve, at least in certain cases, a shorter calculation time or a lower number of
multiplications, or which are applicable for values of the order N which are not necessarily powers
of two.
A first approach consists of replacing the complex multiplications, which are costly in terms of
circuitry or time, with a set of operations which are simpler to put into operation. Reference [13]
describes a technique which uses one property of the DFT mentioned in Section 2.4.1 – namely the
fact that multiplications by the coefficients W k correspond to phase rotations.
The technique known as CORDIC (coordinate rotation digital computer) enables these rotations to be realized by linking simple operations. To rotate a vector (x, y) through an angle $\theta$ with an accuracy of $\theta/2^n$, a sequence of n elementary rotations of angle $d\theta_i$ is performed such that $\tan d\theta_i = 2^{-i}$, with $0 \leqslant i \leqslant n-1$ and $-\pi/2 \leqslant \theta \leqslant \pi/2$. The coordinates $x_i$ and $y_i$ of the vector at iteration i yield the coordinates at iteration i + 1 by using the expressions:
$$x_{i+1} = x_i + \mathrm{sign}[\theta_i]\, y_i 2^{-i}$$
$$y_{i+1} = y_i - \mathrm{sign}[\theta_i]\, x_i 2^{-i}$$
$$\theta_{i+1} = \theta_i - \mathrm{sign}[\theta_i]\, d\theta_i \tag{3.42}$$
The function $\mathrm{sign}[\theta_i]$ is the sign of $\theta_i$, and $\theta_0 = -\theta$. These operations consist only of additions and shifts, and can therefore be more advantageous than a complex multiplication of the same precision.
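A minimal floating-point sketch of the rotation (Python; here the residual angle is driven to zero directly, which is equivalent to the convention $\theta_0 = -\theta$ above, and K compensates the scale factor accumulated by the unscaled micro-rotations):

```python
import math

def cordic_rotate(x, y, theta, n=24):
    """Rotate (x, y) by theta with n micro-rotations of angle atan(2**-i),
    using only additions and shifts, as in eq. (3.42); K restores the
    magnitude, since each unscaled step multiplies it by sqrt(1 + 2**-2i)."""
    K = 1.0
    for i in range(n):
        d = 1.0 if theta >= 0 else -1.0
        x, y = x - d * y * 2**-i, y + d * x * 2**-i
        theta -= d * math.atan(2**-i)
        K /= math.sqrt(1 + 2**(-2 * i))
    return K * x, K * y

xr, yr = cordic_rotate(1.0, 0.0, math.pi / 5)
assert abs(xr - math.cos(math.pi / 5)) < 1e-6
assert abs(yr - math.sin(math.pi / 5)) < 1e-6
```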
One method which is particularly interesting is used to obtain a multiplication volume of order N,
instead of N log2 N, for a DFT of order N. This method depends on factorizing the matrix T N in a
particular way. It is decomposed into a product of three factors:
TN = BN CN AN
where AN is a J × N matrix, J is a whole number, CN is a diagonal matrix of dimension J, and BN
is an N × J matrix. The special feature of this factorization is that the elements of the matrices AN
and BN are 0, 1, or − 1. Under these conditions, the calculation requires only J multiplications.
For example, the complex multiplication takes the form:
$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \end{bmatrix}\begin{bmatrix} a & 0 & 0 \\ 0 & a+b & 0 \\ 0 & 0 & a-b \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$
which shows that 3 real multiplications are sufficient, as shown in Section 2.7.
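The factorization can be checked directly; a small sketch in Python (function name illustrative):

```python
def cmul3(a, b, x, y):
    """(a + jb)(x + jy) with 3 real multiplications:
    p1 = a(x + y), p2 = (a + b)y, p3 = (a - b)x."""
    p1 = a * (x + y)
    p2 = (a + b) * y
    p3 = (a - b) * x
    return p1 - p2, p1 - p3      # (a*x - b*y, b*x + a*y)

z = complex(2, 3) * complex(5, 7)
assert cmul3(2, 3, 5, 7) == (z.real, z.imag)
```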
This decomposition is obvious for $J = N^2$; for example, for N = 3, we obtain:
$$T_3 = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix} \times \mathrm{diag}\left(1,\,1,\,1,\;1,\,W,\,W^2,\;1,\,W^2,\,W\right) \times \begin{bmatrix} I_3 \\ I_3 \\ I_3 \end{bmatrix}$$
where $I_3$ is the identity matrix of order 3.
With some low values of N, certain factorizations are available in which J is of the order of N, so that the number of multiplications is also of order N. This property can be generalized by using the algebraic properties of the set of exponents of W, which are defined modulo N.
For example, for N = 4, the sequence of operations is as follows:
$$t_1 = x_0 + x_2,\quad t_2 = x_1 + x_3$$
$$m_0 = 1\cdot(t_1 + t_2),\quad m_1 = 1\cdot(t_1 - t_2)$$
$$m_2 = 1\cdot(x_0 - x_2),\quad m_3 = j(x_1 - x_3)$$
$$X_0 = m_0,\quad X_1 = m_2 + m_3,\quad X_2 = m_1,\quad X_3 = m_2 - m_3$$
The numbers of operations required by these low-order transforms are as follows (nontrivial multiplications in parentheses):

N      Multiplications      Additions
2      2 (0)                2
3      3 (2)                6
4      4 (0)                8
5      6 (5)                17
7      9 (8)                36
8      8 (2)                26
9      11 (10)              44
16     18 (10)              74
For N = 8:
$$t_1 = x_0 + x_4,\quad t_2 = x_2 + x_6,\quad t_3 = x_1 + x_5$$
$$t_4 = x_1 - x_5,\quad t_5 = x_3 + x_7,\quad t_6 = x_3 - x_7$$
$$t_7 = t_1 + t_2,\quad t_8 = t_3 + t_5$$
$$m_0 = 1\cdot(t_7 + t_8),\quad m_1 = 1\cdot(t_7 - t_8)$$
$$m_2 = 1\cdot(t_1 - t_2),\quad m_3 = 1\cdot(x_0 - x_4)$$
$$m_4 = \cos\left(\frac{\pi}{4}\right)(t_4 - t_6),\quad m_5 = j(t_3 - t_5)$$
$$m_6 = j(x_2 - x_6),\quad m_7 = j\sin\left(\frac{\pi}{4}\right)(t_4 + t_6)$$
$$s_1 = m_3 + m_4,\quad s_2 = m_3 - m_4,\quad s_3 = m_6 + m_7,\quad s_4 = m_6 - m_7$$
$$X_0 = m_0,\quad X_1 = s_1 + s_3,\quad X_2 = m_2 + m_5,\quad X_3 = s_2 - s_4$$
$$X_4 = m_1,\quad X_5 = s_2 + s_4,\quad X_6 = m_2 - m_5,\quad X_7 = s_1 - s_3$$
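These equations can be transcribed directly; note that, as printed, they yield $\sum_n x(n)e^{+j2\pi nk/8}$ (the conjugate exponent convention), so the sketch below (Python) checks the result against 8·ifft(x):

```python
import numpy as np

def winograd_dft8(x):
    """8-point DFT with 2 nontrivial multiplications, following the sequence
    above; equals sum_n x(n)*exp(+j*2*pi*n*k/8), i.e. 8 * np.fft.ifft(x)."""
    x0, x1, x2, x3, x4, x5, x6, x7 = x
    t1, t2, t3 = x0 + x4, x2 + x6, x1 + x5
    t4, t5, t6 = x1 - x5, x3 + x7, x3 - x7
    t7, t8 = t1 + t2, t3 + t5
    m0, m1, m2, m3 = t7 + t8, t7 - t8, t1 - t2, x0 - x4
    m4 = np.cos(np.pi / 4) * (t4 - t6)          # nontrivial multiplication
    m5 = 1j * (t3 - t5)
    m6 = 1j * (x2 - x6)
    m7 = 1j * np.sin(np.pi / 4) * (t4 + t6)     # nontrivial multiplication
    s1, s2, s3, s4 = m3 + m4, m3 - m4, m6 + m7, m6 - m7
    return np.array([m0, s1 + s3, m2 + m5, s2 - s4,
                     m1, s2 + s4, m2 - m5, s1 - s3])

x = np.arange(8.0)
assert np.allclose(winograd_dft8(x), 8 * np.fft.ifft(x))
```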
In practice, a machine with B bits carries out its operations on the set of $2^B$ integers: 0, 1, …, $2^B - 1$. In this set, the usual laws of addition and multiplication cannot be applied, as shifting and truncation must be introduced, which lead to approximations in the calculations, as was shown in Section 2.3.
The first condition to be fulfilled to ensure exact calculations in a set E is that the product or the
sum of two elements of the set E must also belong to this set. This condition is satisfied in the set
of integers 0, 1, …, M − 1 if the calculations are made modulo M. By appropriate selection of the
modulus M, it is possible to define transformations with properties which are comparable to those
of the DFT and which allow error-free calculation of convolutions with fast calculation algorithms.
The definition of such transformation rests on the algebraic properties of integers modulo M, for
certain values of M. They are called number-theoretic transforms.
The choice of the modulus M is governed by the following considerations:
(1) Simplicity of the calculations in the modular arithmetic. In principle, modular arithmetic implies a division by the modulus M. This division is trivial for $M = 2^m$. It is very simple for $M = 2^m \pm 1$, because the result is obtained by adding a carry bit (1's complement arithmetic) or subtracting it.
(2) The modulus must be sufficiently large. The result of the convolution must be capable of representation without ambiguity in modulo M arithmetic. For example, a convolution with 32 terms with 12-bit data and 8-bit coefficients requires $M > 2^{25}$.
(3) Suitable algebraic properties. The set of modulo M integers should have algebraic properties
allowing the definition of transformations comparable to the DFT.
First, there should be periodic elements in order that the fast algorithms can be elaborated; the set must have an element $\alpha$ such that $\alpha^N = 1$. The transformation is then defined by:
$$X(k) = \sum_{n=0}^{N-1} x(n)\,\alpha^{nk} \tag{3.45}$$
For the existence of the inverse transformation, which is defined by the expression:
$$x(n) = N^{-1}\sum_{k=0}^{N-1} X(k)\,\alpha^{-nk} \tag{3.46}$$
the following condition must be satisfied:
$$\sum_{k=0}^{N-1} \alpha^{ik} = N\delta(i)\quad\text{with}\quad \delta(i) = \begin{cases} 1 & \text{if } i \equiv 0 \pmod{N} \\ 0 & \text{if } i \not\equiv 0 \pmod{N} \end{cases}$$
This condition reflects the fact that each element $(1 - \alpha^i)$ must have an inverse. It can be shown that all of the conditions for the existence of a transformation and its inverse reduce to the following one: for each prime factor P of M, N must be a factor of P − 1. Thus, if M is prime, N must divide M − 1.
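As a small sketch (Python), the transform of Exercise 3.8 below — modulus M = 17, coefficient α = 4, order N = 4, so that α⁴ ≡ 1 and N divides M − 1 = 16 — computes a circular convolution exactly in integer arithmetic:

```python
import numpy as np

M, alpha, N = 17, 4, 4
# Forward and inverse number-theoretic transform matrices modulo M.
T = np.array([[pow(alpha, (i * k) % N, M) for k in range(N)]
              for i in range(N)])
Ninv = pow(N, M - 2, M)                       # N^-1 mod M (M prime)
Tinv = np.array([[Ninv * pow(alpha, (-i * k) % N, M) % M for k in range(N)]
                 for i in range(N)])

x = np.array([2, -2, 1, 0]) % M
h = np.array([1, 2, 0, 0]) % M
y = Tinv @ ((T @ x % M) * (T @ h % M) % M) % M     # circular convolution
print((y + (M + 1) // 2) % M - (M + 1) // 2)       # signed result: [2 2 -3 2]
```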
Exercises
3.1 Find the Kronecker product A × I 3 for the matrix A such that
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$
and the unit matrix I 3 of dimension 3.
Find the product I 3 × A and compare it with the above.
3.2 By taking square matrices of dimension 2, verify the properties of the Kronecker products
given in Section 3.1.
3.3 Give the factorizations of the matrix of a DFT of order 64 on base 2, base 4, and base 8, fol-
lowing the procedure given in Section 3.2.
Calculate the required number of multiplications for each of these three cases and compare
the results.
3.4 Using the decimation-in-time approach, factorize the matrix of the DFT of order 12. What is
the minimum number of multiplications and additions required? Write a computer program
for the calculation.
3.5 Factorize the matrix of a DFT of order 16 on base 2 for the following two cases:
(a) When the data appear at both input and output in natural order.
(b) When the stages in the calculation are identical.
For the latter case, devise an implementation scheme involving the use of shift registers as
memories.
3.7 Calculate a DFT of order 12 by using a factorization of the type given in Section 3.4 and with the given permutations for the data.
Evaluate the number of operations and compare it with the values found in Exercise 3.4.
3.8 To perform a circular convolution of the two sets x = (2, −2, 1, 0) and h = (1, 2, 0, 0), a number-theoretic transform of modulus M = 17 and coefficient α = 4 is used.
As N = 4, verify that 𝛼 N = 1. Give the matrices of the transformation and the inverse
transformation. Prove that the desired result is the set y = (2, 2, −3, 2).
References
7 H. Ziegler, A fast transform algorithm for symmetric real-valued series. IEEE Transactions on Audio and Electroacoustics, 20(5), 1972.
8 C. Diab, M. Oueidat and R. Prost, A new IDCT-DFT relationship reducing the IDCT computational cost. IEEE Transactions on Signal Processing, 50(7), 1681–1684, 2002.
9 R. Bracewell, The fast Hartley transform. Proceedings of the IEEE, 72(8), 1010–1018, 1984.
10 M. Vetterli and H. J. Nussbaumer, Simple FFT and DCT algorithms with reduced number of operations. Signal Processing, 6(4), 267–278, 1984.
11 M. A. Haque, A two-dimensional fast cosine transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(6), 1532–1539, 1985.
12 H. S. Malvar, Signal Processing with Lapped Transforms, Artech House, Norwood, MA, 1992.
13 A. Despain, Very fast Fourier transform algorithms hardware for implementation. IEEE Transactions on Computers, C-28(5), 333–341, 1979.
14 H. Silverman, An introduction to programming the Winograd Fourier transform algorithm. IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(2), 152–165, 1977.
15 H. Nussbaumer, Fast Fourier Transform and Convolution Algorithms, Springer, Berlin, 1981.
16 B. Fino and R. Algazi, Unified matrix treatment of the fast Walsh–Hadamard transform. IEEE Transactions on Computers, C-25(11), 1142–1146, 1976.
17 R. C. Agarwal and C. S. Burrus, Number theoretic transforms to implement fast digital convolution. Proceedings of the IEEE, 63(4), 550–560, 1975.
18 J. H. McClellan, Hardware realization of a Fermat number transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24(3), 216–225, 1976.
4 Time-Invariant Discrete Linear Systems
Time-invariant discrete linear systems represent a very important area for digital signal
processing – digital filters with fixed characteristics. These systems are characterized by the
fact that their behavior is governed by a convolution equation. Their properties are analyzed
using the Z-transform, which plays the same role in discrete systems as the Laplace or Fourier
transforms do in continuous systems. In this chapter, the elements which are most useful for
studying such systems will be briefly introduced. To supplement this discussion, reference should
be made to References [1–5].
Further, if h(n) is the set forming the system’s response to the unit set u0 (n), then h(n − m) corre-
sponds to u0 (n − m) because of the time-invariance. Linearity then implies the following relation:
$$y(n) = \sum_m x(m)h(n-m) = \sum_m h(m)x(n-m) = h(n) * x(n) \tag{4.3}$$
This is the convolution equation which represents the linear time-invariant (LTI) system. Such
a system is completely defined by the values of the set h(n), which is called the system’s impulse
response.
This system has the property of causality if the output with index $n_0$ depends only on inputs with indices $n \leqslant n_0$. This property implies that h(n) = 0 for n < 0, and the output is given by:
$$y(n) = \sum_{m=0}^{\infty} h(m)x(n-m) \tag{4.4}$$
[Figure 4.1: the unit set u₀(n) and the delayed unit set u₀(n − n₀).]
An LTI system is stable if each input with a bounded amplitude has a corresponding bounded
output. A necessary and sufficient condition for stability is given by the inequality:
$$\sum_n |h(n)| < \infty \tag{4.5}$$
To show that the condition is necessary, it is sufficient to apply to the system the input set x(n)
such that:
$$x(n) = \begin{cases} +1 & \text{if } h(n) \geqslant 0 \\ -1 & \text{if } h(n) < 0 \end{cases}$$
If the inequality (4.5) is not satisfied, then y(0) is not bounded, and the system is not stable. If the input set is bounded – that is, $|x(n)| \leqslant M$ for all n – then we have:
$$|y(n)| \leqslant \sum_m |h(m)||x(n-m)| \leqslant M\sum_m |h(m)|$$
If the inequality (4.5) is satisfied, then y(n) is bounded, and the condition is sufficient.
In particular, the LTI system defined by the response:
$$h(m) = a^m \quad\text{with } m \geqslant 0$$
is stable for |a| < 1. The properties of LTI systems will be studied using the Z-transform.
The Z-transform, X(Z), of the set x(n) is defined by the following relation:
$$X(Z) = \sum_{n=-\infty}^{\infty} x(n)Z^{-n} \tag{4.6}$$
Z is a complex variable and the function X(Z) has a convergence region which, in general, is an
annular ring centered on the origin, with radii R1 and R2 . That is, X(Z) is defined for R1 < |Z| < R2 .
The values R1 and R2 depend on the set x(n). If the set x(n) represents the samples of a signal taken
with period T, its Fourier transform is written as:
$$S(f) = \sum_{n=-\infty}^{\infty} x(n)e^{-j2\pi fnT}$$
Consequently, for Z = ej2𝜋fT , the Z-transform of the set x(n) coincides with its Fourier transform.
That is, the analysis of a discrete system can be performed with the Z-transform, and, in order to
find a frequency response, it is sufficient to replace Z with ej2𝜋fT .
This transform has an inverse. Assuming Γ is a closed contour containing the origin and all sin-
gular points, or poles, of X(Z), we can write:
$$Z^{m-1}X(Z) = \sum_{n=-\infty}^{\infty} x(n)Z^{m-1-n} = x(m)Z^{-1} + \sum_{n\neq m} Z^{m-1-n}x(n)$$
A stability condition appears quite simply from observing that the set x(n) is bounded if and only
if |pi | < 1 for 1 ⩽ i ⩽ N – that is, the poles of X(Z) are inside the unit circle.
In these examples, the terms of the set x(n) can be obtained directly by series expansion. When
X(Z) is a rational fraction, a very simple method of obtaining the first values of the set x(n) is direct
division. For example, if
$$X(Z) = \frac{1 + 2Z^{-1} + Z^{-2} + Z^{-3}}{1 - Z^{-1} - 8Z^{-2} + 12Z^{-3}}$$
direct division gives:
$$X(Z) = 1 + 3Z^{-1} + 12Z^{-2} + 25Z^{-3} + \cdots$$
and thus:
$$x(0) = 1,\quad x(1) = 3,\quad x(2) = 12,\quad x(3) = 25,\ \ldots$$
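The division is easy to mechanize; a short sketch (Python, illustrative function name):

```python
def z_series(num, den, terms=6):
    """First values of x(n) by direct division of polynomials in Z^-1;
    num and den list the coefficients of increasing powers of Z^-1."""
    num = list(num) + [0.0] * terms
    out = []
    for _ in range(terms):
        q = num[0] / den[0]
        out.append(q)
        for i, d in enumerate(den):     # subtract q * den from the remainder
            num[i] -= q * d
        num.pop(0)                      # shift to the next power of Z^-1
    return out

# X(Z) = (1 + 2Z^-1 + Z^-2 + Z^-3) / (1 - Z^-1 - 8Z^-2 + 12Z^-3)
print(z_series([1, 2, 1, 1], [1, -1, -8, 12]))   # [1.0, 3.0, 12.0, 25.0, ...]
```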
The Z-transform has the property of linearity. Also, the Z-transform of the delayed set x(n − n0 )
is written:
$$X_{n_0}(Z) = Z^{-n_0}X(Z) \tag{4.8}$$
These two properties are used to calculate the Z-transform, Y (Z), of the set y(n) obtained at the
output of a discrete linear system by convolution of the sets x(n) and h(n) which have transforms
X(Z) and H(Z), respectively.
By calculating the Z-transform for the two components of the convolution equation (4.3):
$$y(n) = \sum_m h(m)x(n-m)$$
we find:
$$Y(Z) = \sum_m h(m)Z^{-m}X(Z) = H(Z)X(Z) \tag{4.9}$$
Consequently, the Z-transform of a convolution product is the product of the transforms. The
function H(Z) is called the Z-transfer function of the LTI system being considered.
The Z-transform of the product of two sets x3 (n) = x1 (n)x2 (n) is the function X 3 (Z) defined by:
$$X_3(Z) = \frac{1}{2\pi j}\int_\Gamma X_1(\upsilon)\,X_2\!\left(\frac{Z}{\upsilon}\right)\frac{d\upsilon}{\upsilon} \tag{4.10}$$
The integration contour is inside the region of convergence of the functions X 1 (v) and X 2 (Z/v).
When applied to causal sets, the one-sided Z-transform is introduced. The one-sided Z-transform
of the set x(n) is written:
$$X(Z) = \sum_{n=0}^{\infty} x(n)Z^{-n} \tag{4.11}$$
The properties are the same as for the transform defined by equation (4.6), except for delayed sets
where the transform of the set x(n − n0 ) is written:
$$X_{n_0}(Z) = \sum_{n=0}^{\infty} x(n-n_0)Z^{-n} = Z^{-n_0}X(Z) + \sum_{n=1}^{n_0} x(-n)Z^{-(n_0-n)} \tag{4.12}$$
The value of this transform in the study of system response is that account can be taken of the initial conditions and that the transient response can be brought to light. It also allows for determination of the extreme values of the set x(n) from X(Z). The initial value x(0) is written as:
$$x(0) = \lim_{Z\to\infty} X(Z) \tag{4.13}$$
More on the Z-transform and its applications can be found in Reference [6]. The above results
can be applied to the calculation of the power of discrete signals.
This is the Bessel–Parseval relation given in Section 1.1.1, which expresses the conservation of
energy for discrete signals. The energy of the signal is equal to the energy of its spectrum.
The calculations above provide an expression which is useful for the norm ||X||2 of the function
X(f ). Indeed, by definition,
$$||X||_2^2 = \int_{-1/2}^{1/2} |X(f)|^2\, df$$
This becomes:
$$||X||_2^2 = \frac{1}{2\pi j}\int_{|Z|=1} X(Z)X(Z^{-1})\frac{dZ}{Z} \tag{4.16}$$
If X(Z) is a holomorphic function of the complex variable in a domain which contains the unit
circle, the integral is calculated by the method of residues and directly yields the value of ||X||2 ,
which is also the L2 norm of the discrete signal x(n).
Let us now assume we want to calculate the energy Ey of the signal y(n) at the output of the LTI
system with impulse response h(n), to which the signal x(n) is applied.
The signal x(n) is assumed to be deterministic. Using equation (4.15), by setting 𝜔 = 2𝜋f , we can
write:
$$E_y = \frac{1}{2\pi}\int_{-\pi}^{\pi} |Y(e^{j\omega})|^2\, d\omega$$
Equation (4.9) directly provides the following result:
$$E_y = \frac{1}{2\pi}\int_{-\pi}^{\pi} |H(e^{j\omega})|^2\, |X(e^{j\omega})|^2\, d\omega \tag{4.17}$$
These results also apply to random signals.
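A quick numerical check of (4.17) for a finite deterministic signal (Python sketch; the FFT grid evaluates the integral exactly here because the sequences are time-limited):

```python
import numpy as np

h = np.array([1.0, -0.5])                # impulse response h(n)
x = np.array([1.0, 2.0, 0.5, -1.0])      # deterministic input x(n)
y = np.convolve(h, x)                    # output signal
Nfft = 256                               # dense grid over one period
H = np.fft.fft(h, Nfft)
X = np.fft.fft(x, Nfft)
Ey = np.mean(np.abs(H)**2 * np.abs(X)**2)    # (1/2pi) integral of |H|^2 |X|^2
assert np.allclose(Ey, np.sum(y**2))         # equals the time-domain energy
```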
These results are of great practical importance and will be frequently used in later sections (for
example, when evaluating the powers of the round-off noise in the filters).
The most interesting LTI systems are those where the input and output sets are related by a linear
difference equation with constant coefficients. On the one hand, they represent simple examples,
and on the other, they offer an excellent representation of many natural systems.
A system of this type of order N is defined by the equation:
$$y(n) = \sum_{i=0}^{N} a_i x(n-i) - \sum_{i=1}^{N} b_i y(n-i) \tag{4.27}$$
By applying the Z-transform to both sides of this equation, and by denoting the transforms of the
sets y(n) and x(n) by Y (Z) and X(Z), we obtain:
$$Y(Z) = \sum_{i=0}^{N} a_i Z^{-i} X(Z) - \sum_{i=1}^{N} b_i Z^{-i} Y(Z) \tag{4.28}$$
Hence:
Y (Z) = H(Z)X(Z)
with:
$$H(Z) = \frac{a_0 + a_1 Z^{-1} + \cdots + a_N Z^{-N}}{1 + b_1 Z^{-1} + \cdots + b_N Z^{-N}} \tag{4.29}$$
The transfer function H(Z) is a rational fraction. The ai and bi are the coefficients of the system.
Some coefficients can be zero, as is the case, for example, when the two summations of expression
(4.27) have different numbers of terms. To find the frequency response, it is sufficient to replace
the variable Z by ej2𝜋f in H(Z).
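Evaluating the frequency response then amounts to substituting $Z = e^{j2\pi f}$ in (4.29); a minimal sketch (Python, illustrative function name):

```python
import numpy as np

def freq_response(a, b, f):
    """H(f) from eq. (4.29); a = [a0..aN], b = [b1..bN] (the leading
    denominator coefficient 1 is implicit), f in units of fs."""
    Z = np.exp(2j * np.pi * np.asarray(f))
    num = sum(ai * Z**-i for i, ai in enumerate(a))
    den = 1 + sum(bi * Z**-i for i, bi in enumerate(b, start=1))
    return num / den

# First-order example: y(n) = x(n) - 0.9*y(n-1)
H = freq_response([1.0], [0.9], [0.0, 0.25, 0.5])
print(np.abs(H))      # amplitude at f = 0, fs/4, fs/2
```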
The function H(Z) is written in the form of a quotient of two polynomials N(Z) and D(Z) of degree
N which have N roots Z i and Pi , respectively, with 1 ⩽ i ⩽ N.
By using these roots, another expression for H(Z) appears:
$$H(Z) = \frac{N(Z)}{D(Z)} = a_0\,\frac{\prod_{i=1}^{N}\left(1 - Z_i Z^{-1}\right)}{\prod_{i=1}^{N}\left(1 - P_i Z^{-1}\right)} \tag{4.30}$$
In the complex plane, Z is the affix of a moving point M, and $P_i$ and $Z_i$ ($1 \leqslant i \leqslant N$) are the affixes of the poles and zeros of the function H(Z). Then:
From this, a graphic interpretation in the complex plane can be developed. The frequency
response of the system is obtained when the moving point M lies on the unit circle. Figure 4.2
shows the example of a system of order N = 2.
[Figure 4.2: poles P₁, P₂ and zeros Z₁, Z₂ of a second-order system in the complex plane, with the moving point M on the unit circle.]
The modulus of the transfer function is thus equal to the ratio of the product of the distances between the moving point M and the roots $Z_i$ to the product of the distances between M and the poles $P_i$. The phase is equal to the difference between the sum of the angles between the vectors $\overrightarrow{P_iM}$ and the real axis, and the sum of the angles between the vectors $\overrightarrow{Z_iM}$ and the real axis, following the convention introduced in Chapter 1. This graphic interpretation is frequently used in practice because it offers a very simple visualization of a system's frequency response.
because it offers a very simple visualization of a system’s frequency response.
Analysis of a system using its frequency response corresponds to steady-state operation. It is ade-
quate only insofar as transient phenomena can be neglected. If this is not the case, initial conditions
have to be introduced, to represent, for example, the status of the equipment and the contents of
its memory at switch on.
Consider the behavior for values with index n ⩾ 0 of the system defined by equation (4.27), to
which the set x(n) (x(n) = 0 for n < 0) is applied. The y(n) are completely determined if the values
y(−i) with 1 ⩽ i ⩽ N are known. These values correspond to the initial conditions, and, in order to
introduce them, the one-sided Z-transform has to be used.
The one-sided Z-transform is applied to both sides of equation (4.27), assuming that the input
x(n) is causal – that is, that x(n) = 0 for n < 0. Allowing for equation (4.12), which gives the transform
Y i (Z) of the delayed set y(n − i):
$$Y_i(Z) = Z^{-i}Y(Z) + \sum_{n=1}^{i} y(-n)Z^{-(i-n)}$$
we obtain:
$$Y(Z) = \sum_{i=0}^{N} a_i Z^{-i}X(Z) - \sum_{i=1}^{N} b_i Z^{-i}Y(Z) - \sum_{i=1}^{N} b_i\sum_{n=1}^{i} y(-n)Z^{-(i-n)}$$
or
$$Y(Z) = H(Z)X(Z) - \frac{\displaystyle\sum_{i=1}^{N} b_i\sum_{n=1}^{i} y(-n)Z^{-(i-n)}}{\displaystyle 1 + \sum_{i=1}^{N} b_i Z^{-i}} \tag{4.33}$$
The system response with index n, y(n), is obtained by series expansion or by inverse
transformation.
It should be noted that the values y(−i) represent the state of the system at switch on, provided
that only the set of output numbers is contained in the system memory. However, the memory often
contains other internal variables which can be introduced into the analysis for generalization and
to provide other features relating, in particular, to the implementation.
The state of a system of order N at time n is defined by a set of at least N internal variables repre-
sented by a vector U(n) called the state vector. Its operation is governed by the relations between
this state vector and the input and output signals. The behavior of a linear system to which the
input set x(n) is applied, and which provides the output set y(n), is described by the following pair
of equations, called state equations [7]:
$$U(n+1) = AU(n) + Bx(n)$$
$$y(n) = C^t U(n) + dx(n) \tag{4.34}$$
A is called the matrix of the system, B is the control vector, C the observation vector, and d is the
transition coefficient. The set x(n) is the innovation and y(n) is the observation. The reasons why
they are so called will be outlined later, particularly in Chapters 6 and 15. The matrix A is a square
matrix of order N. The vectors B and C are N-dimensional.
The state of the system at time n is obtained from the initial state at time zero by the equation:
$$U(n) = A^n U(0) + \sum_{i=1}^{n} A^{n-i}Bx(i-1) \tag{4.35}$$
Consequently, the behavior of such a system depends on successive powers of the matrix A.
The Z-transfer function of the system is obtained by taking the Z-transform of the state equations
(4.34). Thus:
(ZI − A)U(Z) = BX(Z)
Y (Z) = Ct U(Z) + dX(Z)
and consequently:
$$H(Z) = C^t(ZI - A)^{-1}B + d \tag{4.36}$$
The poles of the transfer function thus obtained are the values of Z for which the determinant of
the matrix (ZI − A) is zero. That is, they are the roots of the characteristic polynomial of A. Con-
sequently, the poles of the transfer function are the eigenvalues of the matrix A and have absolute
values less than unity to ensure stability. This result agrees with the equation for the operation
of the system (4.35). Indeed, by diagonalizing the matrix A, it can be seen that it is the condition
whereby the vector U(n) = An U(0) tends toward zero as n tends toward infinity – a situation which
corresponds to the free evolution of the system from the initial state U(0).
Examination of the transfer function of the system (4.36) shows by another route that when a
system is specified by the input–output equation, there is some latitude in the choice of the state
parameters. Indeed, only the eigenvalues of the matrix A are imposed, and the matrix of the system
can be replaced by another matrix A′ = M −1 AM, which has the same eigenvalues. Then, in order
to preserve the same output set, using equation (4.35), the following criteria are necessary:
$$A' = M^{-1}AM;\qquad C'^t = C^tM;\qquad B' = M^{-1}B$$
The matrix A can also be replaced by its transpose $A^t$. The system is then described by a pair of equations, parallel to (4.34), corresponding to the state vector V(n), such that:
$$V(n+1) = A^tV(n) + Cx(n)$$
$$y(n) = B^tV(n) + dx(n) \tag{4.37}$$
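A brief numerical illustration of the state equations (Python sketch, for a second-order recursion in companion form; the coefficient values are illustrative): the simulated output matches the difference equation, and the eigenvalues of A coincide with the poles of the transfer function.

```python
import numpy as np

b1, b2 = -1.0, 0.8                        # y(n) = x(n) - b1*y(n-1) - b2*y(n-2)
A = np.array([[-b1, -b2], [1.0, 0.0]])    # companion matrix of the recursion
B = np.array([1.0, 0.0])
C = np.array([-b1, -b2])                  # state U(n) = [y(n-1), y(n-2)]
d = 1.0

x = np.zeros(20); x[0] = 1.0              # unit set -> impulse response
U = np.zeros(2); y_ss = []
for xn in x:                              # state equations (4.34)
    y_ss.append(C @ U + d * xn)
    U = A @ U + B * xn

y_de = np.zeros(20)                       # same system as a difference equation
for n in range(20):
    y_de[n] = x[n] - b1 * (y_de[n-1] if n >= 1 else 0) \
                   - b2 * (y_de[n-2] if n >= 2 else 0)

assert np.allclose(y_ss, y_de)
poles = np.roots([1, b1, b2])             # zeros of 1 + b1*Z^-1 + b2*Z^-2
assert np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                   np.sort_complex(poles))
```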
Exercises
4.1 Consider an LTI system with an impulse response h(n) such that:
$$h(n) = \begin{cases} 1 & 0 \leqslant n \leqslant 3 \\ 0 & \text{elsewhere} \end{cases}$$
4.2 Show that the Z-transform of the causal sequence x(n) defined by $x(n) = nTe^{-anT}$, $n \geqslant 0$, is:
$$X(Z) = \frac{Te^{-aT}Z^{-1}}{\left(1 - e^{-aT}Z^{-1}\right)^2}$$
Calculate the inverse transform of ln(Z − a), Z/{(Z − a)(Z − b)} and establish the conditions
on a and b such that the resulting sequence converges.
What is the domain of convergence of the function obtained? Plot its poles and zeros in the
Z-plane.
4.5 Use the one-sided Z-transform to calculate the response of the system defined by the
difference equation:
y(n) = x(n) + y(n − 1) − 0.8y(n − 2)
with the initial conditions y(−1) = a and y(−2) = b, to the set x(n) defined by:
x(n) = ejn𝜔 for n ⩾ 0
x(n) = 0 for n < 0
Show the response due to the initial conditions and the steady state response.
References
1 L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing. Chapter II,
Prentice Hall, 1975.
2 A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. Chapter II, Prentice Hall, 1974.
3 M. Kunt, Digital Signal Processing, Artech House, Norwood, MA, 1986.
4 F.J. Taylor, Digital Filter Design Handbook, Dekker, New York, 1983.
5 A. Peled and B. Liu, Digital Signal Processing: Theory, Design and Implementation, Wiley,
New York, 1976.
6 E. I. Jury, Theory and Application of the Z-Transform Method, Wiley, 1964.
7 J. A. Cadzow, Discrete Time Systems, Prentice-Hall, 1973.
5 Finite Impulse Response (FIR) Filters
Digital FIR filters are discrete linear time-invariant systems in which an output number, rep-
resenting a sample of the filtered signal, is obtained by weighted summation of a finite set of
input numbers, representing samples of the signal to be filtered. The coefficients of the weighted
summation constitute the filter’s impulse response and only a finite number of them take nonzero
values. This is a “finite memory” filter – that is, it determines its output as a function of input data
of limited age. It is frequently called a non-recursive filter because, unlike the infinite impulse
response filter, it does not require a feedback loop in its implementation.
The properties of FIR filters will be illustrated by two simple examples.
[Figure 5.1: frequency response of the filter over (−f_s/2, f_s), and impulse response h(t): two pulses of amplitude 0.5 at t = ±T/2.]
The impulse response h(t) which corresponds to the filter transfer function |H(f )| is written as:
$$h(t) = \frac{1}{2}\left[\delta\left(t + \frac{T}{2}\right) + \delta\left(t - \frac{T}{2}\right)\right]$$
Figure 5.1 shows the characteristics of the filter.
Another simple operation associates the set x(nT) with the set y(nT) by:
$$y(nT) = \frac{1}{4}\left[x(nT) + 2x[(n-1)T] + x[(n-2)T]\right] \tag{5.3}$$
As in the previous case, this equation conserves the component with zero frequency and
eliminates the one with frequency f s /2. This corresponds to the transfer function:
$$H(f) = \frac{1}{4}\left(1 + 2e^{-j2\pi fT} + e^{-j2\pi f2T}\right) = \frac{1}{2}e^{-j2\pi fT}(1 + \cos 2\pi fT) \tag{5.4}$$
This is a raised-cosine filter. Its propagation delay is 𝜏 = T, and |H(f )| corresponds to the impulse
response h(t) such that:
$$h(t) = \frac{1}{4}\delta(t+T) + \frac{1}{2}\delta(t) + \frac{1}{4}\delta(t-T)$$
This is a more selective low-pass filter than the preceding one, and it is evident that an even
more selective filtering function can be obtained merely by increasing the number of terms in the
set x(nT) over which the weighted summation is carried out (Figure 5.2).
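A two-line check of (5.4) (Python sketch):

```python
import numpy as np

# Eq. (5.3): y(nT) = [x(nT) + 2x((n-1)T) + x((n-2)T)] / 4.
# Its response (5.4) has modulus (1 + cos(2*pi*f*T)) / 2 (here T = 1).
h = np.array([0.25, 0.5, 0.25])
f = np.linspace(0, 0.5, 6)                 # frequencies in units of fs
H = np.sum(h * np.exp(-2j * np.pi * f[:, None] * np.arange(3)), axis=1)
assert np.allclose(np.abs(H), (1 + np.cos(2 * np.pi * f)) / 2)
```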
[Figure 5.2: frequency response and impulse response of the raised-cosine filter.]
These two examples have served to illustrate the following properties of FIR filters:
(1) The input set x(n) and the output set y(n) are related by an equation of the following type (the
defining relation):
$$y(n) = \sum_{i=0}^{N-1} a_i x(n-i) \tag{5.5}$$
The filter defined in this way comprises a finite number N of coefficients ai . If it is regarded as
a discrete system, its response h(i) to the unit set is:
$$h(i) = \begin{cases} a_i & \text{if } 0 \leqslant i \leqslant N-1 \\ 0 & \text{elsewhere} \end{cases}$$
That is, the impulse response is simply the set of the coefficients.
(2) The transfer function of the filter is:
$$H(f) = \sum_{i=0}^{N-1} a_i e^{-j2\pi fiT} \tag{5.6}$$
(3) The function H(f ), the filter’s frequency response, is periodic with period f s = 1/T. The coeffi-
cients ai (0 ⩽ i ⩽ N − 1) form the Fourier series expansion of this function.
In view of the Bessel–Parseval relation given in Section 1.1.1, the following can be written:
$$\sum_{i=0}^{N-1} |a_i|^2 = \frac{1}{f_s}\int_0^{f_s} |H(f)|^2\, df \tag{5.8}$$
(4) If the coefficients are symmetrical, the transfer function can be written as the product of two
terms, of which one is a real function and the other a complex number with modulus 1, repre-
senting a constant propagation delay 𝜏 which is a whole multiple of half the sampling period.
Such a filter is said to have a linear phase.
A digital filter which processes numbers representing samples of a signal taken with period T
has a periodic frequency response of period f s = 1/T. As a consequence, this function H(f ) can
be expanded into a Fourier series:
$$H(f) = \sum_{n=-\infty}^{\infty} \alpha_n e^{+j2\pi fnT} \tag{5.9}$$
with
$$\alpha_n = \frac{1}{f_s}\int_0^{f_s} H(f)e^{-j2\pi fnT}\, df \tag{5.10}$$
The coefficients 𝛼 n of the expansion are, except for a constant factor, the samples taken with
period T of the Fourier transform of the function H(f ) over a frequency interval of width f s . As they
form the impulse response, the condition (4.5) for the stability of the filter implies that the 𝛼 n tends
toward zero when n tends toward infinity. Consequently, the function H(f ) can be approximated
by an expansion reduced to a limited number of terms:
$$H(f) \simeq \sum_{n=-P}^{Q} \alpha_n e^{\,j2\pi fnT} = H_L(f)$$
where P and Q are finite integers. The approximation improves as P and Q increase.
The property of causality, which expresses the fact that in a real filter, the output cannot precede
the input in time, implies that the impulse response h(n) is zero for n < 0. Using relations (5.5) and
(5.6), if the filter is causal, then Q = 0 and we obtain:
$$H_L(f) = \sum_{n=0}^{P} a_n e^{-j2\pi fnT}$$
As a result, any causal digital filtering function can be approximated by the transfer function of
an FIR filter.
The fact that transfer functions with linear phase can be realized is an important property which
is exploited in spectral analysis or in data-transmission applications. For FIR filters, this capability
leads to a reduction in complexity, because it implies symmetry in the coefficients which is used to
halve the number of multiplications needed to produce each output number.
By definition, a linear-phase filter has the following frequency response:
$$H(f) = R(f)\,e^{-j\phi(f)} \tag{5.11}$$
where R(f) is a real function and the phase $\phi(f)$ is linear: $\phi(f) = \phi_0 + 2\pi f\tau$, where $\tau$ is a constant giving the propagation delay through the filter.
In fact, the phase is not rigorously linear, due to the sign changes of R(f ) which introduce dis-
continuities of 𝜋 in the phase. However, the filter is said to be linear phase.
The impulse response of this filter is written as:
$$h(t) = e^{-j\phi_0}\int_{-\infty}^{\infty} R(f)e^{\,j2\pi f(t-\tau)}\, df \tag{5.12}$$
By assuming 𝜙0 to be zero and by decomposing the real function R(f ) into an even part P(f ) and
an odd part I(f ), this becomes:
$$h(t+\tau) = 2\int_0^{\infty} P(f)\cos(2\pi ft)\, df + 2j\int_0^{\infty} I(f)\sin(2\pi ft)\, df$$
If the condition is imposed that the function h(t) must be real, this becomes:
$$h(t+\tau) = 2\int_0^{\infty} P(f)\cos(2\pi ft)\, df$$
This relation shows that the impulse response is symmetric about the point t = τ on the time axis. Such a condition is satisfied in a filter with real symmetric coefficients. Two configurations are available,
depending upon whether the number of coefficients N is even or odd:
(1) N = 2P + 1: the filter has a propagation time 𝜏 = PT. The transfer function is:
$$H(f) = e^{-j2\pi fPT}\left[h_0 + 2\sum_{i=1}^{P} h_i\cos(2\pi fiT)\right] \tag{5.13}$$
(2) N = 2P: the filter has a propagation delay $\tau = \left(P - \frac{1}{2}\right)T$. The transfer function is:
$$H(f) = e^{-j2\pi f(P-1/2)T}\left(2\sum_{i=1}^{P} h_i\cos\left[2\pi f\left(i - \frac{1}{2}\right)T\right]\right) \tag{5.14}$$
The filter coefficients hi form the digital filter’s response to the unit set. They can also be regarded
as samples, taken with period T, of the continuous impulse response h(t) of a filter which has the
same frequency response as the digital filter in the range (−T/2, T/2) but has no periodicity on the
frequency axis. Figures 5.3 and 5.4 illustrate this for odd and even N.
These filters have the even function P(f ) in their frequency response. With real coefficients, a
frequency response can also be obtained which corresponds to I(f ), the odd part of R(f ).
As the function h(t) must be real, this category of filter has the transfer function:
$$H(f) = -je^{-j2\pi f\tau}I(f) = e^{-j(\pi/2)}\,e^{-j2\pi f\tau}I(f)$$
A fixed phase shift 𝜙0 = 𝜋/2, which corresponds to a quadrature, is added to the frequency-
proportional phase shift. This possibility is important in certain types of modulation and will be
examined later. The impulse response is zero at the point t = 𝜏 on the time axis and is antisymmetric
about it. Figures 5.5 and 5.6 show the configurations when the number of coefficients N is even
or odd.
[Figures 5.3–5.6: impulse responses for symmetric coefficients (durations (2P + 1)T and 2PT) and antisymmetric coefficients (durations (2P + 1)T and 2PT).]
$$H(f) = -je^{-j2\pi f\tau}\,2\sum_{i=1}^{P} h_i\sin\left[2\pi f\left(i - \frac{1}{2}\right)T\right] \tag{5.16}$$
As h0 = 0, the transfer function has the same form in both cases. It is not difficult to envisage
that fixed phase shifts other than 𝜙0 = 0 and 𝜙0 = 𝜋/2 can be obtained for filters with complex
coefficients.
We now turn our attention to the calculation of the coefficients of FIR filters, assuming the phase
to be linear, which applies to a majority of applications, and is the case when specifications are given
for the frequency response.
[Figure 5.7: low-pass filter specification, with ripple δ₁ in the pass band (0, f₁) and ripple δ₂ in the stop band (f₂, f_s/2).]
The table given in Appendix 1 of Chapter 1 can then be used to provide an initial estimation of
the coefficients of an FIR filter. The optimized values are, in fact, not very different.
For the filter to be realizable, it is necessary to limit the number of coefficients to N. This operation
boils down to multiplying the impulse response h(t) by a time window g(t) so that:
$$g(t) = \begin{cases} 1 & \text{for } -\dfrac{NT}{2} \leqslant t \leqslant \dfrac{NT}{2} \\ 0 & \text{elsewhere} \end{cases}$$
Using equation (1.10), the Fourier transform of this function is written as:
$$G(f) = NT\,\frac{\sin(\pi fNT)}{\pi fNT} \tag{5.19}$$
Figure 5.8 shows these functions.
The real filter, with a limited number of coefficients N, has the following convolution product
H R (f ) as its transfer function:
$$H_R(f) = \int_{-\infty}^{\infty} H(f')G(f - f')\, df'$$
By limiting the number of coefficients, ripples are introduced and the steepness of the cutoff of
the filter is limited, as shown in Figure 5.9, which corresponds to an ideal low-pass filter with a
cutoff frequency f c .
[Figure 5.8: the rectangular time window g(t) and its spectrum G(f), with main lobe of width 2/NT. Figure 5.9: frequency response H_R(f) of an ideal low-pass filter with cutoff frequency f_c after truncation of the coefficients.]
The ripples depend on those of the function G(f). In order to reduce their amplitude, it is sufficient to choose time windows whose spectra introduce smaller ripples than that of the rectangular window given above. This situation has been encountered in Section 2.4.2 for spectrum analysis, and the same functions can be employed. One example is the Hamming window, which is defined as:
which is defined as:
g(t) = 0.54 + 0.46 cos(2𝜋i∕NT) for |t| ⩽ NT∕2
0 for |t| > NT∕2
The consequence of reducing the ripples in the pass and stop bands is an increase in the width
of the transition band.
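The whole design procedure — Fourier series coefficients of the ideal response, then a window — fits in a few lines; a sketch (Python; the function name is illustrative, and the ideal low-pass coefficients h(i) = 2f_c sinc(2f_c i), with frequencies in units of f_s, correspond to the expansion used in this section):

```python
import numpy as np

def lowpass_window(N, fc, hamming=True):
    """FIR low-pass by Fourier series expansion of the ideal response,
    optionally weighted by the Hamming window defined above."""
    i = np.arange(N) - (N - 1) / 2           # symmetric index around t = 0
    h = 2 * fc * np.sinc(2 * fc * i)         # ideal impulse response samples
    if hamming:
        h *= 0.54 + 0.46 * np.cos(2 * np.pi * i / N)
    return h

h = lowpass_window(39, 0.2)
H = np.abs(np.fft.fft(h, 1024))
print(H[:8])     # stop-band ripples far smaller than with truncation alone
```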
The function which presents the smallest ripples for a given width of the principal lobe is the
Dolph–Chebyshev function:
$$G(x) = \frac{\cos[K\cos^{-1}(Z_0\cos\pi x)]}{\mathrm{ch}[K\,\mathrm{ch}^{-1}(Z_0)]}\quad\text{for } x_0 \leqslant x \leqslant 1 - x_0$$
$$G(x) = \frac{\mathrm{ch}[K\,\mathrm{ch}^{-1}(Z_0\cos\pi x)]}{\mathrm{ch}[K\,\mathrm{ch}^{-1}(Z_0)]}\quad\text{for } 0 \leqslant x \leqslant x_0 \text{ and } 1 - x_0 \leqslant x \leqslant 1 \tag{5.20}$$
with $x_0 = (1/\pi)\cos^{-1}(1/Z_0)$, K an integer, and $Z_0$ a parameter. This function, as shown in Figure 5.10, presents a main lobe of width B, given by:
$$B = 2x_0 = \frac{2}{\pi}\cos^{-1}\left(\frac{1}{Z_0}\right)$$
and secondary lobes of constant amplitude given by:
$$A = \frac{1}{\mathrm{ch}[K\,\mathrm{ch}^{-1}(Z_0)]}$$
It is periodic, and its inverse Fourier transform is formed of a set of K + 1 discrete nonzero values, which are used to weight the coefficients of the Fourier series expansion of the filter function to be approximated.
It is worth noting that if the ripples in the function G(x) are of constant amplitude, those in the
resulting filter decrease in amplitude, with distance from the pass band and stop band edges. The
ripples in the pass and stop bands, however, are the same.
The technique of Fourier series expansion of the function to be approximated leads to a simple
method of determining the coefficients of the filter, but it involves two important restrictions:
(1) The ripples of the filter in the pass and stop bands are equal.
(2) The amplitude of the ripple is not constant.
The first limitation can be eliminated by using a method which preserves the simplicity of direct
calculation: the least-squares method. Further, this corresponds exactly to the objectives to be
achieved in a number of applications.
[Figure 5.10: the Dolph–Chebyshev function G(x), with main lobe of width B = 2x₀ and secondary lobes of constant amplitude A.]
Let us calculate the N coefficients $h_i$ of an FIR filter according to the least-squares criterion, so that the transfer function approximates a given function.
The calculation can be carried out directly from the relationship between coefficients and fre-
quency response, as shown in Section 5.16 for 2-dimensional filters. However, it is interesting to
provide a more general iterative approach, operating in the frequency domain from an approxi-
mate solution, which can cope with non-quadratic cost functions and can be used for other types
of filters – in particular, IIR filters.
The discrete Fourier transform applied to the set hi , with (0 ⩽ i ⩽ N − 1), produces a set H k such
that:
$$H_k = \frac{1}{N}\sum_{i=0}^{N-1} h_i\, e^{-j2\pi(ik/N)} \tag{5.21}$$
The set of H k , 0 ⩽ k ⩽ N − 1 forms a sample of the filter’s frequency response with period f s /N.
Conversely, the coefficients hi are related to the set H k by the equation:
$$h_i = \sum_{k=0}^{N-1} H_k\, e^{\,j2\pi(ik/N)} \tag{5.22}$$
It should be noted that this expression simply forms a different type of series expansion of the
function H(f ) with a limited number of terms.
Given the function to be approximated, D(f ), a first possibility is to choose the H k such that:
$$H_k = D\left(\frac{k}{N}f_s\right)\quad\text{for } 0 \leqslant k \leqslant N-1$$
The transfer function of the filter H(f ), obtained by interpolation, shows ripples in the pass and
stop bands, as shown in Figure 5.11.
[Figure 5.11: interpolated transfer function H(f), with ripples in the pass and stop bands.]
The divergence between this function and that given represents an error e(f ) = H(f ) − D(f ) which
can be minimized by the least-squares criterion. The procedure begins by evaluating the squared
error E which is the L2 norm of the divergence function. To do this, the response H(f ) is sampled
with a frequency step Δ less than f s /N, in a way that produces interpolated values, for example:
$$\Delta = \frac{f_s}{NL}\quad\text{with } L \text{ an integer greater than } 1$$
The function e(f ) is calculated at frequencies which are multiples of Δ. Generally, when evalu-
ating the squared error E, only one part of the band (0, f s /2) is taken into account. For a low-pass
filter this can be the pass band, the stop band or the set of both bands. In order to demonstrate the
principle of the calculation, it is assumed that the minimization relates to the pass band (0, f 1 ) of
a low-pass filter, whence:
$$E = \sum_{n=0}^{N_0-1} e^2\left(n\frac{f_s}{NL}\right)\quad\text{with}\quad \frac{f_1}{f_s}NL < N_0 \leqslant \frac{f_1}{f_s}NL + 1$$
Further, it is often useful to find a weighting factor P0 (n) for the error element of index n, so that
the frequency response can be modeled. Thus:
$$E = \sum_{n=0}^{N_0-1} P_0^2(n)\,e^2\left(n\frac{f_s}{NL}\right) = \sum_{n=0}^{N_0-1} P_0^2(n)\,e^2(n) \tag{5.24}$$
With the error function obtained from the interpolation formula (5.23), the squared error E is
a function of the set of H k with 0 ⩽ k ⩽ N − 1 and is expressed by E(H). If these samples of the
frequency response are given increases of ΔH k , a new squared error value is obtained:
$$E(H + \Delta H) = E(H) + \sum_{k=0}^{N-1}\frac{\partial E}{\partial H_k}\Delta H_k + \frac{1}{2}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1}\frac{\partial^2 E}{\partial H_k\,\partial H_l}\Delta H_k\,\Delta H_l \tag{5.25}$$
From the defining equation for E and the interpolation equation (5.23), this becomes:
$$\frac{\partial E}{\partial H_k} = 2\sum_{n=0}^{N_0-1} P_0^2(n)\,e(n)\frac{\partial e(n)}{\partial H_k}$$
$$\frac{\partial^2 E}{\partial H_k\,\partial H_l} = 2\sum_{n=0}^{N_0-1} P_0^2(n)\frac{\partial e(n)}{\partial H_l}\frac{\partial e(n)}{\partial H_k}$$
These equations can be written in matrix form where A is the matrix with N rows and N 0 columns
such that:
$$A = \begin{bmatrix} a_{00} & a_{01} & \cdots & a_{0(N_0-1)} \\ a_{10} & a_{11} & \cdots & a_{1(N_0-1)} \\ \vdots & \vdots & & \vdots \\ a_{(N-1)0} & a_{(N-1)1} & \cdots & a_{(N-1)(N_0-1)} \end{bmatrix}\quad\text{with}\quad a_{ij} = \frac{\partial e(j)}{\partial H_i}$$
Let P0 be the diagonal matrix of order N 0 whose elements are the weighting factor P0 (n). Then:
$$\left[\frac{\partial E}{\partial H_k}\right] = 2AP_0^2[e(n)] \tag{5.26}$$
The set of terms $\partial^2 E/\partial H_k\,\partial H_l$ forms a square matrix of order N such that:
$$\left[\frac{\partial^2 E}{\partial H_k\,\partial H_l}\right] = 2AP_0^2A^t \tag{5.27}$$
5.5 Calculation of Coefficient by Discrete Fourier Transform 99
The condition for E(H + ΔH) to be a minimum is that all its derivatives with respect to
H k (0 ⩽ k ⩽ N − 1) be zero. Now,
$$\frac{\partial}{\partial H_k}E(H + \Delta H) = \frac{\partial E}{\partial H_k} + \sum_{l=0}^{N-1}\frac{\partial^2 E}{\partial H_k\,\partial H_l}\Delta H_l$$
Under these conditions, the increments ΔH k (0 ⩽ k ⩽ N − 1) which transfer the initial values
of the samples to the optimal values of the frequency response form a column vector which is
written as:
$$[\Delta H] = -\left[AP_0^2A^t\right]^{-1}AP_0^2[e(n)] \tag{5.29}$$
To summarize, the following operations are required for calculating the coefficients of the filter
by the least-squares method:
The weighting coefficients P0 (n) allow certain constraints to be introduced – for example, to
obtain ripples in the pass and stop bands which are in a given ratio, or to force the frequency
response to a particular value. This latter condition can also be taken into account by reducing
the number of degrees of freedom by 1; this is more elegant but more complicated to program.
Implementation of this calculation does not present any particular difficulty and permits the calculation of a filter in a direct way. However, the filter obtained does not have ripples of constant amplitude. This is a commonly required feature, and to achieve it, an iterative technique is employed.
If the matrix inversion in equation (5.29) is difficult or impossible, the optimum can still be reached by replacing the inverse matrix with a small constant and iterating the process. This is the gradient algorithm, which is dealt with in Chapter 14.
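For linear-phase filters, the weighted least-squares problem can also be solved directly on a dense frequency grid by a standard solver; a sketch under those assumptions (Python; the function name and band format are illustrative, and numpy's lstsq stands in for forming and inverting the normal equations (5.29)):

```python
import numpy as np

def fir_ls(N, bands, desired, weights, grid=512):
    """Linear-phase FIR (odd N, symmetric coefficients) by weighted least
    squares on a dense grid -- a direct version of criterion (5.24)."""
    P = (N - 1) // 2
    f = np.linspace(0, 0.5, grid)              # frequencies in units of fs
    D = np.zeros(grid); W = np.zeros(grid)     # targets and weights P0
    for (f1, f2), d, w in zip(bands, desired, weights):
        sel = (f >= f1) & (f <= f2)
        D[sel], W[sel] = d, w
    sel = W > 0                                # transition band is unconstrained
    # zero-phase response: h0 + 2*sum_i hi*cos(2*pi*f*i), as in eq. (5.13)
    M = np.cos(2 * np.pi * np.outer(f[sel], np.arange(P + 1)))
    M[:, 1:] *= 2
    c, *_ = np.linalg.lstsq(W[sel, None] * M, W[sel] * D[sel], rcond=None)
    return np.concatenate([c[:0:-1], c])       # full symmetric coefficients

h = fir_ls(39, bands=[(0, 0.15), (0.25, 0.5)], desired=[1, 0], weights=[1, 1])
```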
A first iterative approach consists of using the discrete Fourier transform, which is calculated effi-
ciently by a fast algorithm.
Consider calculation of a linear-phase filter with N coefficients meeting the specification of
Figure 5.7. A discrete Fourier transform of order N 0 with N 0 ≃ 10 N will be used.
The procedure consists of taking initial values for the coefficients – for example, the terms h given
by (5.18) for −P ≤ i ≤ P, if N = 2P + 1. This set of N values is completed symmetrically with zeros
to obtain a set of N 0 real values, symmetrical with respect to the origin.
A DFT calculation then gives the response H(f ) at N 0 points on the frequency axis:
H(f ) = Hid (f ) + E(f )
where H id (f ) is the ideal response and E(f ) is the deviation from this response.
A reduction of the deviation E(f ) is then performed, by replacing H(f ) with the function G(f )
such that:
G(f ) = Hid (f ) + EL (f ) if H(f ) > Hid (f ) + EL (f )
G(f ) = Hid (f ) − EL (f ) if H(f ) < Hid (f ) − EL (f )
where EL (f ) represents the limit of the deviation given by the characteristic – for example, 𝛿 1 and
𝛿 2 for the low-pass filter in Figure 5.7.
Calculation of the inverse DFT gives N 0 terms; the N values which encircle the origin are retained
and the others are discarded. The procedure is repeated by taking the DFT of the N 0 values obtained
in this way.
Denoting the sum of the squares of the N 0 − N terms discarded in the time domain at iteration
k by J(k), a decreasing function is obtained if the specification of the filter is compatible with the
number of coefficients N. The procedure is terminated when J(k) falls below a fixed threshold.
By applying the method for different numbers of coefficients N, an optimum solution can be
approached and even achieved in special cases. All types of linear phase filters can be designed in
this way.
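A compact sketch of this iteration (Python; the mask values, grid size, and function name are illustrative):

```python
import numpy as np

def iterative_design(N, N0, Hid, EL, iters=100):
    """FIR design by alternating constraints: clip the real (zero-phase)
    response inside the mask Hid +/- EL in frequency, then keep only the N
    coefficients around the origin in time. Hid and EL are given on the
    N0-point DFT grid."""
    P = (N - 1) // 2                       # N = 2P + 1 symmetric coefficients
    h = np.fft.ifft(Hid).real              # initial values: truncated series
    h[P + 1:N0 - P] = 0.0
    for _ in range(iters):
        H = np.fft.fft(h).real
        G = np.clip(H, Hid - EL, Hid + EL)     # reduce the deviation E(f)
        g = np.fft.ifft(G).real
        J = np.sum(g[P + 1:N0 - P] ** 2)       # energy of discarded terms
        g[P + 1:N0 - P] = 0.0                  # keep N values around the origin
        h = g
    return h, J

N, N0 = 39, 512
f = np.fft.fftfreq(N0)                         # frequencies in units of fs
Hid = np.clip((0.2 - np.abs(f)) / 0.05, 0.0, 1.0)   # low-pass, transition 0.15-0.2
EL = np.where(np.abs(f) <= 0.15, 0.01,
              np.where(np.abs(f) >= 0.2, 0.01, 1.0))  # loose limit in transition
h, J = iterative_design(N, N0, Hid, EL)
print(J)     # decreases when the specification is compatible with N
```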
To obtain the optimum filter, a method based on the Chebyshev approximation is used.
These expressions apply when the number of coefficients is N = 2r − 1, in which case the zero-phase response takes the form $H_R(f) = \sum_{i=0}^{r-1} h_i\cos(2\pi fiT)$.
cases, whether N is even or odd, or whether the coefficients are symmetric or antisymmetric. It is
based on the following theorem [1].
A necessary and sufficient condition for $H_R(f)$ to be the unique best approximation, in the Chebyshev sense, to a given function D(f) over a compact subset A of the range $\left[0, \frac{1}{2}\right]$ is that the error function $e(f) = H_R(f) - D(f)$ presents at least (r + 1) extremal frequencies on A. That is, there exist (r + 1) extremal frequencies $(f_0, f_1, \ldots, f_r)$ such that $e(f_i) = -e(f_{i-1})$, with $1 \leqslant i \leqslant r$, and:
$$|e(f_i)| = \max_{f\in A}|e(f)|$$
This result is still valid if a weighting function P0 (f ) for the error is introduced.
Thus, the problem is equivalent to the solution of the system of (r + 1) equations:
$$D(f_i) = H_R(f_i) + \frac{(-1)^i\delta}{P_0(f_i)},\qquad 0 \leqslant i \leqslant r$$
The unknowns are the coefficients $h_i$ ($0 \leqslant i \leqslant r - 1$) and the maximum of the error function, δ.
In matrix form, by writing the unknowns as a column vector and by normalizing the frequencies
so that f s = 1/T, this becomes:
$$\begin{bmatrix} D(f_0) \\ D(f_1) \\ \vdots \\ D(f_r) \end{bmatrix} = \begin{bmatrix} 1 & \cos(2\pi f_0) & \cdots & \cos[2\pi f_0(r-1)] & \dfrac{1}{P_0(f_0)} \\ 1 & \cos(2\pi f_1) & \cdots & \cos[2\pi f_1(r-1)] & \dfrac{-1}{P_0(f_1)} \\ \vdots & \vdots & & \vdots & \vdots \\ 1 & \cos(2\pi f_r) & \cdots & \cos[2\pi f_r(r-1)] & \dfrac{(-1)^r}{P_0(f_r)} \end{bmatrix}\begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_{r-1} \\ \delta \end{bmatrix}$$
This matrix equation results in the determination of the coefficients of the filter, under the con-
dition that the (r + 1) extremal frequencies f i are known.
An iterative procedure based on an algorithm called the Remez algorithm is used to find the
extremal frequencies. In this algorithm, each stage has the following phases:
(1) Initial values are assigned or are available for the parameters f i (0 ⩽ i ⩽ r).
(2) The corresponding value 𝛿 is calculated by solving the system of equations, which leads to the
following formula:
$$\delta = \frac{a_0 D(f_0) + a_1 D(f_1) + \cdots + a_r D(f_r)}{\dfrac{a_0}{P_0(f_0)} - \dfrac{a_1}{P_0(f_1)} + \cdots + \dfrac{(-1)^r a_r}{P_0(f_r)}}$$
with:
$$a_k = \prod_{\substack{i=0 \\ i\neq k}}^{r}\frac{1}{\cos(2\pi f_k) - \cos(2\pi f_i)}$$
(3) The values of the function $H_R(f)$ are interpolated between the $f_i$ ($0 \leqslant i \leqslant r$) in order to calculate e(f), and the frequencies at which e(f) is extremal are determined.
(4) The extremal frequencies obtained in the previous phase are taken as the initial values for the next stage.
Figure 5.12 shows the evolution of the error function through one stage of the calculation. At each stage, the value δ is calculated on the basis of the new extremal frequencies and compared with the preceding value. When the difference between two consecutive values of δ falls below a certain threshold, the procedure is halted. Usually, this occurs after a few iterations.
The convergence of this procedure is related to the choice of initial values for the frequencies $f_i$. For the first iteration, the extremal frequencies of a filter calculated by a different method can be used. Even more simply, a uniform distribution of the extremal frequencies over the frequency range can be assumed.
As in the least-squares method presented in the previous section, the values $H_R(f)$ have to be interpolated. Because of the non-uniform distribution of the extremal frequencies, it is more convenient to use Lagrange interpolation.
Figure 5.12 Evolution of the error function in a stage of the Remez algorithm.
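In practice, this exchange procedure is available in standard libraries; a minimal sketch using scipy's implementation (the band edges and ripple weighting are illustrative):

```python
import numpy as np
from scipy import signal

# Equiripple low-pass design by the Remez exchange algorithm.
# Bands are in units of fs; the weight vector plays the role of 1/P0.
h = signal.remez(numtaps=39, bands=[0, 0.15, 0.25, 0.5],
                 desired=[1, 0], weight=[1, 1], fs=1.0)
w, H = signal.freqz(h, worN=2048, fs=1.0)
print(np.max(np.abs(H[w >= 0.25])))      # stop-band ripple delta2
```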
This simple estimate is sufficient for most practical cases. It clearly indicates the relative impor-
tance of the various parameters. The transition band Δf is the most sensitive, with the pass-band
and stop-band ripples having less significant impact. For example, if 𝛿 1 = 𝛿 2 = 0.01, dividing one of
these figures by 2 leads to an increase of only 10% in the filter order. Moreover, it is worth emphasiz-
ing that, according to the estimation, the filter complexity is independent of the pass-band width.
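Equation (5.32) itself is not reproduced in this extract; the estimate commonly cited from this text has the form $N_e \approx \frac{2}{3}\log_{10}\left[\frac{1}{10\delta_1\delta_2}\right]\frac{f_s}{\Delta f}$, which is consistent with the sensitivity remarks above and with the examples below. A one-line check (Python; the formula is reconstructed, not quoted):

```python
import math

def fir_order_estimate(d1, d2, df):
    """Estimated FIR order: (2/3)*log10(1/(10*d1*d2)) / df,
    with df the transition band in units of fs (reconstructed form)."""
    return (2 / 3) * math.log10(1 / (10 * d1 * d2)) / df

# Band-pass example later in this section: d1=0.015, d2=0.0015, df=0.075
print(round(fir_order_estimate(0.015, 0.0015, 0.075)))   # 32
```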
Examples:
(1) Calculation for a filter with 39 coefficients (N = 39) leads to the following values:
With these values for the parameters, the estimate gives N e = 40.
(2) A filter with 160 coefficients (N = 160) has the following parameters:
It should be noted that non-negligible differences can occur between the value N which is necessary in practice and the estimated value $N_e$ when the transition-band limits approach 0 and 0.5, or when N is only a few units. A set of more complex formulae is given in Reference [3]. As
indicated in Chapter 7, a high-pass filter can be derived from a low-pass one by inverting the sign
of every second coefficient. Consequently, estimate (5.32) can also apply to high-pass filters. When
the mask specifies different ripples for the pass and stop bands, an overestimation of the number
of coefficients can be obtained by assuming the most rigid constraints on 𝛿 1 and 𝛿 2 in the pass and
stop bands, respectively.
For band-pass filters, it is necessary to introduce several transition bands.
Figure 5.13 gives the characteristic of such a filter, which has two transition bands Δf 1 and
Δf 2 . Experience shows that the number of coefficients N depends essentially on the smallest
bandwidth, Δf m = min(Δf 1 , Δf 2 ). Then estimate (5.32) can be applied, with Δf = Δf m . An upper
[Figure 5.13: band-pass filter specification, with ripples δ₁ and δ₂, band edges f₁, f₂, f₃, f₄, and transition bands Δf₁ and Δf₂.]
bound is obtained by considering the band-pass filter as a cascade of low-pass and high-pass filters
and summing the estimates.
Example: A band-pass filter with 32 coefficients (N = 32) has the following characteristics:
$$\delta_1 = 0.015;\quad \delta_2 = 0.0015;\quad f_1 = 0.1;\quad f_2 = 0.2;\quad f_3 = 0.35;\quad f_4 = 0.425;\quad \Delta f_m = 0.075$$
The estimate using equation (5.32) with Δf = Δf m gives N e = 32. A set of elaborate formulae to
estimate band pass filter orders is given in Reference [3].
The estimation formulae can be used to complete a computer program for the filter coefficient
calculations by determining the number N at the beginning of the program. Expression (5.32) is
very useful in practice for complexity assessment.
When the frequency responses of filters are examined in the transition band, it is evident that they approximate a raised-cosine form; the approximation becomes closer as the pass-band and stop-band ripples become smaller. In fact, this type of response corresponds to the specifications imposed on data transmission by Nyquist filters, and it represents another approach to linear-phase FIR filters.
Figure 5.14 (a) Response with raised-cosine transition; (b) frequency impulse of width 2fc ; (c) frequency
impulse for the transition.
Note that this expression can be applied to any filter and provides a more accurate estimation of
the coefficients than (5.18).
The above results can be generalized to any symmetrical transition band:

H(fc + f) = 1 − H(fc − f);   |f| ≤ Δf/2
These filters, called Nyquist filters, play a crucial role in digital transmission. The impulse response is zero at nonzero multiples of 1/(2fc), which determines the interference-free transmission rate. An important particular case is the half-band filter, such that fc = fs/4. Then, the even-index coefficients are zero and, for N = 4M + 1 coefficients, the input–output relationship is:
y(n) = (1/2) x(n − 2M) + Σ_{i=1}^{M} h_{2i−1} [ x(n − 2M + 2i − 1) + x(n − 2M − 2i + 1) ]
With its reduced number of computations, this filter is a basic building block in multirate
filtering.
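The reduced computational load can be made concrete with a short sketch: only the odd-index coefficients and the centre tap contribute. This is a minimal illustration, not an optimized implementation; the names are ours, and the coefficient values are those quoted later for the half-band filter of Figure 5.24.

import numpy as np

def halfband_filter(x, h_odd, M):
    # y(n) = x(n-2M)/2 + sum_i h_{2i-1} [x(n-2M+2i-1) + x(n-2M-2i+1)]
    y = np.zeros(len(x))
    for n in range(4 * M - 1, len(x)):
        acc = 0.5 * x[n - 2 * M]
        for i in range(1, M + 1):
            acc += h_odd[i - 1] * (x[n - 2 * M + 2 * i - 1] + x[n - 2 * M - 2 * i + 1])
        y[n] = acc
    return y

h_odd = [0.314, -0.094, 0.045, -0.022]     # h1, h3, h5, h7 (M = 4)
y = halfband_filter(np.random.randn(100), h_odd, M=4)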
In communications, the filtering function is generally shared between transmitter and receiver, which leads to the half-Nyquist filter – for example, with cosine transition band:

H²(f) = 1;   |f| ≤ fc − Δf/2

H²(f) = cos[ π(f − fc + Δf/2)/(2Δf) ];   fc − Δf/2 ≤ |f| ≤ fc + Δf/2

H²(f) = 0;   |f| ≥ fc + Δf/2
The impulse response is:

h2(t) = [ (4Δf/π) cos(2πt(fc + Δf/2)) + (1/πt) sin(2πt(fc − Δf/2)) ] / [ 1 − (4Δf t)² ]   (5.37)
As mentioned above, digital filter coefficients are obtained by sampling – that is, by substituting
i/f s for t in (5.37).
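A sketch of this sampling operation follows, under the reconstruction of (5.37) given above; the limit value at t = 0 and the removable singularities of the denominator are handled explicitly (the function name is ours).

import numpy as np

def half_nyquist_coeffs(fc, df, N, fs=1.0):
    # Sample h2(t) of (5.37) at t = i/fs, i = -P..P, with N = 2P + 1
    P = N // 2
    h = np.empty(N)
    for k, i in enumerate(range(-P, P + 1)):
        t = i / fs
        if i == 0:
            h[k] = 2 * fc - df + 4 * df / np.pi     # limit of (5.37) at t = 0
            continue
        if abs(1 - (4 * df * t) ** 2) < 1e-12:      # removable singularity
            t += 1e-6                               # crude offset; the exact limit could be used
        h[k] = ((4 * df / np.pi) * np.cos(2 * np.pi * t * (fc + df / 2))
                + np.sin(2 * np.pi * t * (fc - df / 2)) / (np.pi * t)) \
               / (1 - (4 * df * t) ** 2)
    return h

h = half_nyquist_coeffs(fc=0.25, df=0.1, N=31)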
FIR filters are composed of circuits which carry out the three fundamental operations of storage,
multiplication, and addition. They are arranged to produce from the set of data x(n) an output set
y(n) according to the equation of definition of the filter. As no real operation can be instantaneous,
the equation which is implemented in place of equation (5.5) in Section 5.1 is:
y(n) = Σ_{i=0}^{N−1} ai x(n − i − 1)   (5.38)
N data memories are required, and for each output number, N multiplications and N − 1
additions have to be performed. Various arrangements of the circuits can be devised to put these
operations into effect [4–6].
Figure 5.15(a) outlines the filter in what is called the “direct” structure. The transposition of the
diagram of this scheme produces the “transposed” structure. This is represented in Figure 5.15(b),
where the same operators are arranged differently. This structure allows multiplication of each
piece of data x(n) by all of the coefficients in succession. The memories store partial sums,
and, in effect, at time n, the first memory stores the number aN−1 x(n), the next memory stores
aN−1 x(n − 1) + aN−2 x(n), and the last memory stores the sum y(n). The difference between the two
structures lies in the position of the memories. An intermediate structure having two memories
per coefficient can also be envisaged, the internal data being stored for the duration T/2 in each.
The cascade structure so obtained has local connections only.
In linear-phase filters, the symmetry of the coefficients can be exploited to halve the number of
multiplications to be performed for each output number. This is very important for the complexity
of the filter and justifies the almost universal use of linear-phase filters.
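The direct, transposed, and folded (symmetric) arrangements can be sketched as follows. This is a minimal illustration with names of our choosing, ignoring the one-sample realizability delay of (5.38); all versions produce the same output.

import numpy as np

def fir_direct(x, a):
    # Direct structure: a delay line on the input data
    line = np.zeros(len(a))
    y = np.zeros(len(x))
    for n, xn in enumerate(x):
        line = np.roll(line, 1); line[0] = xn
        y[n] = np.dot(a, line)
    return y

def fir_transposed(x, a):
    # Transposed structure: the memories hold partial sums
    s = np.zeros(len(a))
    y = np.zeros(len(x))
    for n, xn in enumerate(x):
        y[n] = s[0] + a[0] * xn
        s[:-1] = s[1:] + a[1:] * xn; s[-1] = 0.0
    return y

def fir_linear_phase(x, h):
    # Symmetric h (N = 2P + 1): fold the delay line, P + 1 multiplications
    N = len(h); P = N // 2
    line = np.zeros(N)
    y = np.zeros(len(x))
    for n, xn in enumerate(x):
        line = np.roll(line, 1); line[0] = xn
        y[n] = np.dot(h[:P], line[:P] + line[:-(P + 1):-1]) + h[P] * line[P]
    return y

x, a = np.random.randn(64), np.random.randn(9)
assert np.allclose(fir_direct(x, a), fir_transposed(x, a))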
The complexity of the circuits depends on the number of operations to be performed and on
their precision. Thus, the multiplication terms should have the minimum possible number of bits,
which will reduce the amount of memory necessary both for the coefficients and for the data. These
limitations modify the characteristics of the filter.
Figure 5.15 Realization of FIR filters: (a) direct structure; (b) transposed structure.
Limitation of the number of bits in the coefficients of a filter produces a change in the frequency
response, which appears as the superposition of an error function. The consequences will be ana-
lyzed for linear-phase filters, and it is a straightforward matter to extend the results to any FIR
filter.
Consider a linear-phase filter with N = 2P + 1 coefficients, for which the transfer function is
written according to Section 5.2, equation (5.13), as:
H(f) = e^{−j2πfPT} [ h0 + 2 Σ_{i=1}^{P} hi cos(2πfiT) ]
The limitation of the number of bits in the coefficients introduces an error 𝛿hi (0 ⩽ i ⩽ P) in the
coefficient hi which, under the assumption of rounding with a quantization step q, is such that:
|𝛿hi | ≤ q∕2
This results in the superposition on the function H(f ) of an error function e(f ) such that:
e(f) = e^{−j2πfPT} [ δh0 + 2 Σ_{i=1}^{P} δhi cos(2πfiT) ]   (5.39)
The amplitude of this function must be limited so that the response of the real filter is within the
required specification. An upper limit is obtained from:
|e(f)| ≤ |δh0| + 2 Σ_{i=1}^{P} |δhi| |cos(2πfiT)|

|e(f)| ≤ N q/2   (5.40)
In general, this limit is much too large, and a more realistic estimate is obtained by statistical
methods [7].
When a large number of filters with a wide range of specifications are considered and when
general results are required, the variables 𝛿hi (0 ⩽ i ⩽ P) can be viewed as random independent
108 5 Finite Impulse Response (FIR) Filters
variables, uniformly distributed over the range [−q/2, q/2]. In these conditions, they have variance
q2 /12. The function e(f ) can equally be regarded as a random variable. Assume that e0 is the effective
value of the function e(f ) over the frequency range [0, f s ] – that is,
e0² = (1/fs) ∫₀^{fs} |e(f)|² df   (5.41)
In fact, the function e(f ) is periodic and defined by its Fourier series expansion. The
Bessel–Parseval equation allows us to write, according to equation (5.8):
(1/fs) ∫₀^{fs} |e(f)|² df = Σ_{i=0}^{N−1} (δhi)²
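This relation gives a direct way of evaluating the effective error e0 once the coefficients have been rounded. The sketch below (assuming fs = 1, with names of our choosing) computes e0 both from the coefficient errors and from a dense frequency grid, and also returns the worst-case bound (5.40).

import numpy as np

def coefficient_rounding_error(h, bits):
    q = 2.0 ** (-bits)
    dh = q * np.round(h / q) - h                  # errors delta_h_i, |dh| <= q/2
    e0_parseval = np.sqrt(np.sum(dh ** 2))        # Bessel-Parseval value of e0
    f = np.linspace(0.0, 1.0, 4096, endpoint=False)
    e_f = np.exp(-2j * np.pi * np.outer(f, np.arange(len(h)))) @ dh
    e0_grid = np.sqrt(np.mean(np.abs(e_f) ** 2))  # same quantity from the grid
    bound = len(h) * q / 2                        # worst case (5.40)
    return e0_parseval, e0_grid, bound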
If the filter is a low-pass one with a frequency response approaching unity in the pass band and
corresponding to the mask in Figure 5.7, initial approximations of the values of the coefficients can
be calculated by equation (5.18). Under these conditions, the maximum for h0 is obtained with:
h0 = (f1 + f2)/fs   (5.45)
The Z-transfer function for an FIR filter with N coefficients is a polynomial of degree N − 1, which
is written (equation (5.7)) as:
H(Z) = Σ_{i=0}^{N−1} ai Z^{−i}
This polynomial has N − 1 roots Zi (1 ≤ i ≤ N − 1) in the complex plane and can be written as the product:

H(Z) = a0 ∏_{i=1}^{N−1} (1 − Zi Z^{−1})   (5.49)
These roots have certain characteristics because of the properties of FIR filters.
Firstly, if the coefficients are real, each complex root Zi has a corresponding complex conjugate root Z̄i. Hence, H(Z) can be written as a product of first- and second-degree terms with real coefficients. Each second-degree term is thus written as:

1 − 2Re(Zi) Z^{−1} + |Zi|² Z^{−2}
Secondly, the symmetry of the coefficients of a linear-phase filter must appear in the decomposition into products of factors. For a second-degree term with real coefficients, it is necessary that |Zi| = 1 if the roots are complex – that is, the roots must lie on the unit circle. For a fourth-degree term with real coefficients, the four complex roots have to be Zi, Z̄i, 1/Zi, 1/Z̄i. That is:

H4(Z) = 1 − 2Re(Zi + 1/Zi) Z^{−1} + [ |Zi|² + 1/|Zi|² + 4Re(Zi) Re(1/Zi) ] Z^{−2} − 2Re(Zi + 1/Zi) Z^{−3} + Z^{−4}   (5.51)
Under these conditions, an FIR linear-phase filter can be decomposed into a set of elementary
filters of second or fourth degree, having the symmetry properties of the coefficients.
The roots have been calculated for a low-pass filter with 15 coefficients. The coordinates of the
14 roots are:
Their positions in the complex plane are shown in Figure 5.16. This illustrates the characteristics
of the frequency response of the filter. The pairs of roots characteristic of the linear phase can be
clearly seen. The configuration of the roots is modified if this constraint is no longer imposed.
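The root pattern is easy to reproduce numerically. The sketch below uses a crude truncated sin(x)/x low-pass design purely for illustration (it is not the filter of the text, whose coefficients are not reproduced here):

import numpy as np

N, fc = 15, 0.25
i = np.arange(N) - (N - 1) // 2
h = 2 * fc * np.sinc(2 * fc * i)          # truncated ideal low-pass, symmetric

for z in np.roots(h):                     # 14 roots of the degree-14 polynomial
    print(f"|Z| = {abs(z):.4f}, arg/2pi = {np.angle(z) / (2 * np.pi):+.4f}")
# Roots off the unit circle appear in pairs (Z, 1/Z_bar); roots on the
# circle are their own images - the signature of coefficient symmetry.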
It is interesting to observe that the frequency response of a linear phase filter of even length is
zero at half sampling frequency, as shown by expression (5.14). Thus, such a filter has a zero at
f s /2. Similarly, if an odd-length filter has a zero at f s /2, this is a double zero, which ensures the
symmetry of the frequency response about f s /2.
The propagation delay through a linear-phase filter can be excessive for some applications. Also, it
is not always possible or useful to use symmetry of the coefficients of a linear-phase filter to simplify
the calculations [9]. Therefore, if phase linearity is not a strictly imposed constraint, the complexity
of the filter may be reduced by abandoning it. A linear-phase transfer function can be considered
as the product of a minimum-phase function and a pure phase shift. The condition for a Z-transfer
function to be minimum phase is that its roots must be within or on the unit circle. This point is
developed in Chapter 9.
The calculation methods developed for linear-phase filters can be adapted. However, the coef-
ficients for a minimum-phase filter can be obtained in a simple way using the coefficients of an
optimal linear-phase filter.
Let the frequency response of a linear-phase filter with N = 2P + 1 coefficients be written as:
H(f) = e^{−j2πfPT} [ h0 + 2 Σ_{i=1}^{P} hi cos(2πfiT) ]
i=1
The ripples in the pass and stop bands are 𝛿 1 and 𝛿 2 , respectively. Let us consider the filter
obtained by adding 𝛿 2 to the above response and by rescaling it to approach unity in the pass band.
Its response H 2 (f ) will be:
H2(f) = [1/(1 + δ2)] e^{−j2πfPT} [ h0 + δ2 + 2 Σ_{i=1}^{P} hi cos(2πfiT) ]   (5.52)
In the pass band, there are ripples with amplitude 𝛿1′ such that:
δ1′ = δ1/(1 + δ2)
Its response in the stop band is represented in Figure 5.17; the ripples are limited to
𝛿2′ = 2𝛿2 ∕(1 + 𝛿2 ).
This is a linear-phase filter because the symmetry of the coefficients is conserved. In contrast,
it can be noticed that the roots of the Z-transfer function which are on the unit circle are double
because H 2 (f ) never becomes negative. Under these conditions, the configuration of the roots is as
shown in Figure 5.18.
The roots which are not on the unit circle are not double. However, the absolute value of the function H2(f) is not modified, except for a constant factor, if the roots Zi outside the unit circle are replaced by the roots 1/Z̄i, which are inside the unit circle and thus also become double. This operation amounts simply to multiplication by G(Z) such that:

G(Z) = [1 − (Z^{−1}/Zi)][1 − (Z^{−1}/Z̄i)] / [(1 − Zi Z^{−1})(1 − Z̄i Z^{−1})]
Since Z^{−1} = Z̄ on the unit circle, the symmetry with respect to the real axis yields:

| (Z^{−1} − Zi)(Z^{−1} − Z̄i) / [(Z − Zi)(Z − Z̄i)] |_{Z=e^{jω}} = 1   (5.53)
and thus:
|G(e^{j2πf})| = 1/|Zi|²
Under these conditions, we can write:
H2(f) = K Hm²(f)   (K is a constant)
where H m (f ) is the response of a filter which has a Z-transfer function with P roots which are
single and inside or on the unit circle. This filter satisfies the minimum-phase condition, it has
P + 1 coefficients, and the amplitudes of the ripples in the pass and stop band are 𝛿m1 and 𝛿m2 such
that:
δm1 = √(1 + δ1/(1 + δ2)) − 1 ≈ (1/2) · δ1/(1 + δ2) ≈ δ1/2   (5.54)

δm2 = √(2δ2/(1 + δ2)) ≈ √(2δ2)   (5.55)
To design this filter, it is sufficient to start from the linear-phase filter whose parameters 𝛿 1 and
𝛿 2 are determined from 𝛿m1 and 𝛿m2 and to follow the procedure described above. One drawback is
that the extraction of the roots of a polynomial of degree N − 1 is required, which limits the values
that can be envisaged for N. Other procedures can also be used [10, 11].
An estimate Ne′ of the order of the FIR filter with minimum phase shift can be deduced from
equation (5.32). According to the procedure described earlier for the specifications of 𝛿 1 and 𝛿 2 ,
this becomes:
Ne′ ≈ (1/2) · (2/3) log10[ 1/(10 δ1 δ2²) ] · (fs/Δf)

or

Ne′ ≈ Ne − (1/3) log10[ 1/(10 δ1) ] · (fs/Δf)   (5.56)
The validity of this formula is naturally limited to the case where 𝛿 1 ≪ 0.1. The improvement in
the order of the filter with minimum phase is a function of the ripple in the pass band. It generally
remains relatively low.
Example: Assume the following specifications for a low-pass filter:
whence
𝛿1 = 0.0822; 𝛿2 = 0.0000938
The number of coefficients needed for the corresponding linear-phase filter is estimated at
N e = 24, which leads to Ne′ = 12. Actually, it can be shown that the minimum-phase filter satisfying
the characteristic requires 11 coefficients instead of the 15 for the linear-phase filter.
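The mapping of the ripples through (5.54) and (5.55) can be checked directly (the function name is ours):

import math

def minimum_phase_ripples(delta1, delta2):
    dm1 = math.sqrt(1 + delta1 / (1 + delta2)) - 1      # (5.54)
    dm2 = math.sqrt(2 * delta2 / (1 + delta2))          # (5.55)
    return dm1, dm2

dm1, dm2 = minimum_phase_ripples(0.0822, 0.0000938)
print(dm1, dm2)   # ~0.0403 (close to delta1/2) and ~0.0137 (close to sqrt(2*delta2))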
In conclusion, when the symmetries provided by the linear phase cannot be used, it may be
advantageous to resort to minimum-phase filters.
Optimization techniques become difficult to use, or they no longer converge, when the number of
filter coefficients is very large – perhaps one thousand or more – corresponding to extremely nar-
row transition bands of the order of thousandths. One can then use suboptimal techniques which
require only the calculation of filters with a reduced number of coefficients. This is the case in the
method described as frequency masking [12].
Consider a filter H(Z) whose transition band Δf is centered on the cutoff frequency fc. One starts by designing a low-pass filter H0(Z^M) with a reduced sampling frequency fs/M, where M < fs/(4Δf), such that the transition band of one of the spectral images of this filter coincides with the transition band of the required filter, as shown in Figure 5.19(b).
Two complementary filters are then constructed from H 0 (Z M ) as shown in Figure 5.19(c); this
requires an odd number of coefficients, 2P + 1, for H 0 (Z M ).
A diagram with two branches is obtained, to which are applied the filters G1 (Z) and G2 (Z); G1
and G2 are described as interpolators, and they have the responses given in Figure 5.19(c). It is then
sufficient to sum the outputs to obtain the desired filter of Figure 5.19(a). The overall arrangement
is shown in Figure 5.20.
The procedure thus requires three filters having transition bands of M Δf , f c − kf s /M and
(k + 1)f s /M − f c , where k is the integer which permits the cutoff frequency f c to be included.
The transfer function H(Z) of the required filter takes the form:

H(Z) = H0(Z^M) G1(Z) + [ Z^{−PM} − H0(Z^M) ] G2(Z)

Figure 5.19 The principle of frequency masking: (a) desired filter, (b) downsampled filter, (c) interpolating filters.
Figure 5.20 Frequency-masking arrangement: the complementary branches ½Z^{−PM} ± [H0(Z^M) − ½Z^{−PM}] feed the interpolating filters G1(Z) and G2(Z), whose outputs are summed.

Figure 5.21 Simplified arrangement with the filters F1(Z) and F2(Z).
Note that the arrangement shown in Figure 5.20 provides an efficient realization of the overall filter, since the filter H0(Z^M) has M − 1 zero coefficients between consecutive nonzero coefficients. This arrangement can be simplified as shown in Figure 5.21. The interpolating filters can be taken as F1(Z) = G1(Z) + G2(Z) and F2(Z) = G1(Z) − G2(Z), but they can also be derived directly from their specifications, as deduced from Figure 5.19.
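The arrangement of Figure 5.20 can be sketched schematically as follows. The three filters are built here with simple windowed prototypes from scipy only to show the signal flow; the cutoffs, lengths, and names are illustrative assumptions, not the optimal designs discussed in the text.

import numpy as np
from scipy.signal import firwin, lfilter

M = 4                                  # reduction factor of the prototype
h0 = firwin(33, 0.5)                   # H0: 2P + 1 = 33 taps
P = (len(h0) - 1) // 2

h0_up = np.zeros(M * (len(h0) - 1) + 1)
h0_up[::M] = h0                        # H0(Z^M): M - 1 zeros between taps

delay = np.zeros(M * P + 1); delay[-1] = 1.0   # Z^{-PM}

g1 = firwin(41, 0.30)                  # masking (interpolating) filters G1, G2
g2 = firwin(41, 0.20)

x = np.random.randn(1000)
b1 = lfilter(h0_up, 1, x)                       # branch 1: H0(Z^M)
b2 = lfilter(delay, 1, x) - b1                  # branch 2: Z^{-PM} - H0(Z^M)
y = lfilter(g1, 1, b1) + lfilter(g2, 1, b2)     # masked and summed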
The set of coefficients aij takes the form of a N 1 × N 2 matrix denoted AN1 N2 . The corresponding
two-variable transfer function:
H(Z1, Z2) = Σ_{i=0}^{N1−1} Σ_{j=0}^{N2−1} aij Z1^{−i} Z2^{−j}   (5.59)
where:
N1 = 2KP + 1; N2 = 2LP + 1
The function t(k, l) can be chosen so as to map the points in the frequency response of the
one-dimensional filter into contours in the (𝜔1 , 𝜔2 ) plane. For example, the circular symmetry
is approximately achieved by:
cos ω = (1/2) [ cos ω1 + cos ω2 + cos ω1 cos ω2 − 1 ]   (5.66)
as can be seen from a series expansion of cos 𝜔1 , with 𝜔1 being small. The frequency response of a
filter designed that way is shown in Figure 5.22.
The implementation of a two-dimensional filter can be obtained through straight application of
equation (5.58). For filters derived from a one-dimensional function, expression (5.64) suggests an
important simplification: in the one-dimensional FIR filter with P + 1 coefficients gi , the delays are
replaced by two-dimensional sections corresponding to the function H 1 (𝜔1 , 𝜔2 ) [13].
Separable filters are particularly simple to produce; in this case, the coefficient matrix is dyadic – that is:

A_{N1 N2} = V1 V2^t

where V1 and V2 are column vectors. Then, in accordance with equation (5.60), the transfer function factorizes:

H(z1, z2) = H1(z1) H2(z2)   (5.67)
The specifications of such filters are subject to limitations. First, they must have quadrantal
symmetry along the coordinate axes. As shown in Figure 5.23, the useful frequency domain is
divided into four parts: low pass/low pass (LL), low pass/high pass (LH), high pass/low pass (HL),
and high pass/high pass (HH).
Consequently, the ripple specifications must be defined for each of these domains. For a two-dimensional low-pass filter,
the HH domain is subjected to the attenuation of two filters, horizontal and vertical. An illustration
is given in Figure 5.24 which shows the frequency response of a two-dimensional separable filter
based on the half-band filter with coefficients: a = [0.5 0.314 0 −0.094 0 0.045 0 −0.022].
Such filters can be realized by following the definition exactly – that is, a data table representing
an image can be processed row by row with the horizontal filter and column by column with the
vertical filter.
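A minimal sketch of this row/column processing, using the half-band coefficients quoted above extended by symmetry (the names are ours):

import numpy as np

# Full symmetric half-band response built from [0.5 0.314 0 -0.094 ...]
a = np.array([-0.022, 0, 0.045, 0, -0.094, 0, 0.314, 0.5,
              0.314, 0, -0.094, 0, 0.045, 0, -0.022])

def separable_filter(image, h_row, h_col):
    # rows first with the horizontal filter, then columns with the vertical one
    tmp = np.apply_along_axis(lambda r: np.convolve(r, h_row, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, h_col, mode="same"), 0, tmp)

out = separable_filter(np.random.rand(64, 64), a, a)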
When the image is subjected to a horizontal scan as in television, the signal appears as
one-dimensional and can be processed as such. If each row contains N points, the transfer
H(ω1, ω2)
H(ω)
ω2
0 ω ω1
Figure 5.22 Two-dimensional FIR filter designed from a one-dimensional linear-phase filter.
Figure 5.23 Quadrantal division of the useful frequency domain into LL, LH, HL, and HH regions, with cutoffs ω1c and ω2c.
Figure 5.24 Two-dimensional half-band separable filter: (a) low pass/low pass, (b) high pass/high pass.
A′ = [1, 2, 3]^t · [−1, 0, 1]
and the corresponding circuit is given in Figure 5.25. Realization is particularly simple as the cir-
cuits do not contain multipliers.
The method will be developed for an important special case – filters with quadrantal symmetry.
Two types of filters correspond to this category: rectangular and lozenge filters with the frequency
domains shown in Figure 5.26.
The frequency response of a zero-phase filter having (2 M + 1) × (2 N + 1) coefficients with quad-
rantal symmetry is expressed by:
H(ω1, ω2) = h00 + 2 Σ_{i=1}^{M} hi0 cos iω1 + 2 Σ_{j=1}^{N} h0j cos jω2 + 4 Σ_{i=1}^{M} Σ_{j=1}^{N} hij cos iω1 cos jω2   (5.69)
In total, the filter has (1 + M + N + MN) coefficients hij with different values.
The least-squares method with weighting will be applied directly, to approach the desired
response, D(𝜔1 ,𝜔2 ). With an oversampling factor of k, the quadratic deviation function, or cost
function, to be minimized is:
J = Σ_{m=0}^{KM} Σ_{n=0}^{KN} | H(mπ/KM, nπ/KN) − D(mπ/KM, nπ/KN) |² W(mπ/KM, nπ/KN)   (5.70)
with K M = k(M + 0.5) and K N = k(N + 0.5) in order to cover all the useful frequency domains.
The weighting function W(𝜔1 , 𝜔2 ) enables the approximation to be adjusted in accordance with
the ripple specifications, for example.
With simplified notation, this gives:
J = Σ_{m=0}^{KM} Σ_{n=0}^{KN} e²(m, n) W(m, n)   (5.71)
Figure 5.26 Frequency domains with quadrantal symmetry: (a) rectangular filter, (b) lozenge filter.
Designating the coefficient vector by [hij ] and the frequency vector by V(m, n):
V^t(m, n) = [ 1, …, 2 cos(i mπ/KM), …, 2 cos(j nπ/KN), …, 4 cos(i mπ/KM) cos(j nπ/KN), … ]
the solution can be written as:
[hij] = [ Σ_{m=0}^{KM} Σ_{n=0}^{KN} W(m, n) V(m, n) V^t(m, n) ]^{−1} [ Σ_{m=0}^{KM} Σ_{n=0}^{KN} W(m, n) V(m, n) D(m, n) ]   (5.73)
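A compact sketch of (5.73) follows, for odd numbers of coefficients; D and W are supplied as functions of (ω1, ω2), and the names are ours. The returned vector holds the 1 + M + N + MN distinct coefficients in the order of V(m, n).

import numpy as np

def design_quadrantal(M, N, D, W, k=8):
    KM, KN = int(k * (M + 0.5)), int(k * (N + 0.5))
    i, j = np.arange(1, M + 1), np.arange(1, N + 1)
    A, b = 0.0, 0.0
    for m in range(KM + 1):
        for n in range(KN + 1):
            w1, w2 = m * np.pi / KM, n * np.pi / KN
            v = np.concatenate(([1.0], 2 * np.cos(i * w1), 2 * np.cos(j * w2),
                                4 * np.outer(np.cos(i * w1), np.cos(j * w2)).ravel()))
            A = A + W(w1, w2) * np.outer(v, v)    # normal-equation matrix
            b = b + W(w1, w2) * D(w1, w2) * v     # right-hand side
    return np.linalg.solve(A, b)                  # [h00, hi0..., h0j..., hij...]

D = lambda w1, w2: 1.0 if max(w1, w2) < 0.25 * np.pi else 0.0
W = lambda w1, w2: 1.0
h = design_quadrantal(4, 4, D, W)   # 25 distinct coefficients, as in the 9 x 9 example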
If the number of coefficients is even, it is necessary to modify the parameters. For a filter with
(2 M) × (2 N + 1) coefficients, it is necessary to take:
V^t(m, n) = [ …, 2 cos((i − 0.5) mπ/KM), …, 4 cos((i − 0.5) mπ/KM) cos(j nπ/KN), … ]   (5.74)
with K M = kM and K N = k(N + 0.5). The coefficient vector obtained in this case has (M + MN)
elements.
An important characteristic of filters used in image processing is the response to a unit step.
Ringing at the transition can produce repetitions of contours and thus degrade the image. It is
possible to reduce ringing by modifying the desired response D(𝜔1 , 𝜔2 ) using a slope at the end of
the pass band and the start of the attenuation band.
The method is illustrated by the design of a rectangular filter with (2 M + 1) × (2 N + 1) = 9 × 9
coefficients, with 0.125 and 0.25 as the end of the pass band and the start of the attenuation band on
the horizontal frequency axis and 0.0625 and 0.125 on the vertical axis. The 25 different coefficients
obtained are:
hij =
| 0.052427   0.0419028  0.0184534  −0.0002861  −0.006258  |
| 0.0491981  0.0393451  0.0173566  −0.0002629  −0.0059292 |
| 0.041534   0.0332908  0.0147612  −0.000261   −0.005282  |
| 0.0299102  0.0240605  0.0107414  −0.0002704  −0.0041828 |
| 0.0180912  0.0146366  0.0065523  −0.0003836  −0.0031209 |

Figure 5.27 Rectangular filter with 9 × 9 coefficients: (a) impulse response, (b) frequency response, (c) horizontal section of the frequency response, (d) response to a unit step.
and the corresponding frequency response is given in Figure 5.27. Evidently, this response is very
close to that of a separable filter. Considering now a lozenge filter with (2 M + 1) × (2 N) = 9 × 8
coefficients, a pass band ending at 0.125, and an attenuation band starting at 0.25 on the horizontal
and vertical axes, the coefficients for one quadrant are:
Figure 5.28 Lozenge filter with 9 × 8 coefficients: (a) impulse response, (b) frequency response,
(c) horizontal section of the frequency response, (d) response to a unit step.
The frequency response is given in Figure 5.28. Calculation has been guided by attempting to
reduce the unit step response g(i, j) defined by:
g(i, j) = Σ_{i1=−M}^{i} Σ_{j1=−N}^{j} h(i1, j1)   (5.75)
This response is also shown in the figure, where it has been repeated in the four quadrants to
provide a complete picture.
See References [14, 15] for complementary developments in the design techniques for
two-dimensional FIR filters, including those with limited precision coefficients and constraints on
the unit step response.
Exercises
5.1 Consider the 17 coefficients of a low-pass filter with cutoff frequency f c = 0.25f s given in
Figure 5.12. How many of these coefficients assume different values? Give the expression
for the frequency response H(f ). Determine the frequencies for which it is zero and give the
maximum ripple. Determine the zeros of the filter Z-transfer function.
5.2 Consider a filter for which the sampling frequency is taken as the reference (f s = 1) and whose
frequency response H(f ) is such that:
Using the discrete Fourier transform, calculate the 17 coefficients of this filter. Draw the fre-
quency response and determine the zeros of the Z-transfer function.
5.3 Using the equations given in Section 5.7, determine the ripple of a low-pass filter with 17 coef-
ficients for which the upper frequency of the pass band is f 1 = 0.2 and the lower frequency of
the stop band is f 2 = 0.3. Compare the results obtained with those in the preceding exercises.
5.4 Consider a filter with a transfer function H(f ) which, except for the phase shift, is given by
the equation:
H(f) = h0 + 2 Σ_{i=1}^{4} h_{2i−1} cos[2πf(2i − 1)T]
Give the direct and transposed structures which allow this filter to be achieved with a min-
imum number of elements. What simplifications are involved if the sampling frequency of
the output is divided by two?
5.5 How is the frequency response of a filter modified if the coefficients ai are replaced by ai(−1)^i and by ai cos(iπ/2)? How are the Z-transfer function zeros affected?
5.6 Consider a low-pass filter which satisfies Figure 5.7 with the following values for the
parameters
f1 = 0.05; f2 = 0.15; 𝛿1 = 0.01; and 𝛿2 = 0.001
How many coefficients are needed and how many bits are required to represent them? If the
input data have 12 bits and if the signal-to-noise ratio degradation is limited to ΔSN = 0.1 dB,
what is the internal data wordlength?
References
1 T.W. Parks and J.H. McClellan, Chebyshev approximation for nonrecursive digital filters with linear phase. IEEE Transactions on Circuits and Systems, 19(2), 189–194, 1972.
2 J. McClellan, T. Parks and L. Rabiner, A computer program for designing optimum FIR linear phase digital filters. IEEE Transactions on Audio and Electroacoustics, 21(6), 506–526, 1973.
3 J. Shen and G. Strang, The asymptotics of optimal (equiripple) filters. IEEE Transactions on Signal Processing, 47(4), 1087–1098, 1999.
4 R.E. Crochiere and A.V. Oppenheim, Analysis of linear digital networks. Proceedings of the IEEE, 63(4), 581–595, 1975.
5 W. Schüssler, On structures for nonrecursive digital filters. Archiv der Elektrischen Übertragung, 1972.
6 M. Bellanger and G. Bonnerot, Premultiplication scheme for digital FIR filters. IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1), 50–55, 1978.
7 D. Chan and L. Rabiner, Analysis of quantization errors in the direct form for finite impulse response digital filters. IEEE Transactions on Audio and Electroacoustics, 21(4), 354–366, 1973.
8 F. Grenez, Synthèse des filtres numériques non récursifs à coefficients quantifiés. Annales des Télécommunications, 34(1–2), 1979.
9 M. Feldmann, J. Henaff, B. Lacroix and J.C. Rebourg, Design of minimum phase charge-transfer transversal filters. Electronics Letters, 15(8), 1979.
10 R. Boite and H. Leich, A new procedure for the design of high order minimum phase FIR filters. Signal Processing, 3(2), 101–108, 1981.
11 Y. Kamp and C.J. Wellekens, Optimal design of minimum-phase FIR filters. IEEE Transactions on Acoustics, Speech and Signal Processing, 31(4), 922–926, 1983.
12 Y.C. Lim and Y. Lian, The optimum design of one- and two-dimensional FIR filters using the frequency response masking technique. IEEE Transactions on Circuits and Systems II, 40(2), 88–95, 1993.
13 D. Dudgeon and R. Mersereau, Multidimensional Digital Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1984.
14 D.E. Pearson, Image Processing, McGraw-Hill, UK, 1991.
15 V. Ouvrard and P. Siohan, Design of 2D video filters with spatial constraints. Proceedings of EUSIPCO-92, Brussels, August 1992, 1001–1004.
Digital filters with an infinite impulse response (IIR) are discrete linear systems which are governed
by a convolution equation based on an infinite number of terms. In principle, they have infinite
memory. This memory is achieved by feeding the output back to the input, so they are known as
recursive filters. Each element of the set of output numbers is calculated by weighted summation
of a certain number of elements of the input set and of the previous output set.
In general, this infinite memory allows much more selective filtering functions to be obtained than with finite impulse response (FIR) filters of similar complexity. However, the feedback loop complicates the study of the properties and the design of these filters, and leads to parasitic phenomena.
When examining IIR filters, it is simpler initially to consider them in terms of first- and second-order sections. Not only are these simple structures useful in introducing the properties of IIR filters, but they also represent the most frequently used type of implementation. Indeed, even the most complex IIR filters appearing in practice are generally formed from a set of such sections.
Consider a system which, for the set of data x(n), produces the set y(n) such that:

y(n) = x(n) + b y(n − 1)

The response to the unit impulse is the set h(n) = b^n (n ≥ 0). This set constitutes the impulse response of the filter. It is infinite, and the stability condition is written as:

Σ_{n=0}^{∞} |b|^n < ∞

that is, |b| < 1.
Figure 6.1 Step response of the first-order section, rising toward 1/(1 − b) with time constant τ.
The response of the system to the set x(n) such that:

x(n) = 0 for n < 0;   x(n) = 1 for n ≥ 0

is given by y(n) = (1 − b^{n+1})/(1 − b), which tends toward 1/(1 − b) as n tends toward infinity if the system is stable. This response is shown in Figure 6.1.
By analogy with a continuous system with time constant τ, sampled with period T, whose normalized step response is written as:

yc(n) = 1 − e^{−(T/τ)(n+1)}

the identification e^{−T/τ} = b can be made. Writing b = 1 − δ with 0 < δ ≪ 1, we obtain:

τ ≈ T/δ   (6.5)
This situation is encountered in adaptive systems, which are presented in Chapter 14.
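The behaviour is easy to verify by direct recursion; the sketch below assumes the defining equation y(n) = x(n) + b y(n − 1) with T = 1 (the names are ours).

import numpy as np

def first_order(x, b):
    y, prev = np.zeros(len(x)), 0.0
    for n, xn in enumerate(x):
        prev = xn + b * prev           # y(n) = x(n) + b*y(n-1)
        y[n] = prev
    return y

delta = 0.05
y = first_order(np.ones(200), 1 - delta)        # step response
n63 = np.argmax(y > (1 - np.exp(-1)) / delta)   # time to reach 63% of 1/(1-b)
print(n63)                                      # ~ 1/delta = 20 samples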
If the input set x(n) results for n ≥ 0 from the sampling of the signal x(t) = ej2𝜋ft or x(t) = ej𝜔t , with
period T = 1, then:
y(n) = e^{jnω}/(1 − b e^{−jω}) − b^{n+1} e^{−jω}/(1 − b e^{−jω})   (6.6)
This expression exhibits a transient and a steady-state term which corresponds to the frequency
response H(𝜔) of the filter:
H(ω) = 1/(1 − b e^{−jω})   (6.7)
The modulus and the phase of this function are:
|H(ω)|² = 1/(1 − 2b cos ω + b²);   φ(ω) = tan^{−1}[ b sin ω/(1 − b cos ω) ]   (6.8)
The phase can also be written as:
φ(ω) = tan^{−1}[ sin ω/(cos ω − b) ] − ω
and the group delay:
τg(ω) = dφ/dω = (b cos ω − b²)/(1 − 2b cos ω + b²)   (6.9)
It should be noted that for very small values of ω:

|H(ω)|² ≈ 1 / { (1 − b)² [ 1 + (b/(1 − b)²) ω² ] }   (6.10a)
This expression approximates the response H_RC(ω) of an RC circuit, which is written as:

|H_RC(ω)|² = 1/(1 + R²C²ω²)   (6.10b)
It appears that for frequencies which are very small in comparison to the sampling frequency,
the digital circuit has a frequency response similar to that of an RC network. Figure 6.2(a) shows
the form of the frequency response for a digital first-order circuit. Figure 6.2(b) gives the phase
response, and Figure 6.2(c) the group delay.
The phase can be written as:

φ(ω) = tan^{−1}[ sin ω/(cos ω − b) ] − ω,   for cos ω > b

φmax = π/2 − cos^{−1} b,   for cos ω = b

φ(ω) = π + tan^{−1}[ sin ω/(cos ω − b) ] − ω,   for cos ω < b   (6.11)
It passes through a maximum for 𝜔 such that cos 𝜔 = b, which corresponds to cancellation of the
group delay. Thus, the coefficient b directly controls the maximum phase of the section.
The transfer function of the first-order section can also be obtained using the Z-transform.
Assume Y (Z) and X(Z) are the transforms of the output and input sets, respectively. Then:
Y(Z) = X(Z) + b Z^{−1} Y(Z)

and thus the Z-transfer function, H(Z), is:

H(Z) = 1/(1 − b Z^{−1}) = Z/(Z − b)
Figure 6.2 First-order section: (a) frequency response, (b) phase response, (c) group delay, and (d) pole position in the complex plane.
The frequency response is obtained simply by replacing Z with ej𝜔 , in the expression for H(Z),
with 𝜔 = 2𝜋f .
A graphical interpretation is given in Figure 6.2(d), which represents the pole P of this function
in the complex plane. This is a point on the real axis with coordinate b.
Following this figure:

|H| = 1/MP   and   φ = α − ω
The stability condition implies that the pole P is inside the unit circle.
An interesting special case is the narrowband integrator, defined by the following transfer function:

H_int(Z) = ε/(1 − (1 − ε) Z^{−1})   (6.12)
with 𝜀 being small such that 0 < 𝜀 ≪ 1.
It can also be shown that the 3 dB bandwidth is approximately equal to ε and the time constant equal to 1/ε. The squared norm of the frequency response is ||H||₂² ≈ ε/2.
The one-sided Z-transform generates the transient responses and introduces the initial condi-
tions. Indeed:
Σ_{n=0}^{∞} y(n) Z^{−n} = Σ_{n=0}^{∞} x(n) Z^{−n} + b Σ_{n=0}^{∞} y(n − 1) Z^{−n}

Y(Z) = X(Z) + b y(−1) + b Z^{−1} Y(Z)
Hence:

Y(Z) = X(Z)/(1 − b Z^{−1}) + b y(−1)/(1 − b Z^{−1})

If x(n) = e^{jnω}, X(Z) is written:

X(Z) = Σ_{n=0}^{∞} e^{jnω} Z^{−n} = 1/(1 − e^{jω} Z^{−1})   (6.13)
The value y(n) is obtained from the equation for the inverse Z-transform:

y(n) = (1/j2π) ∮_Γ Z^{n−1} [ 1/((1 − e^{jω}Z^{−1})(1 − bZ^{−1})) + b y(−1)/(1 − bZ^{−1}) ] dZ

By taking a circle with radius greater than unity as the contour of integration Γ, the theory of residues gives:

y(n) = e^{jnω}/(1 − b e^{−jω}) − b^{n+1} e^{−jω}/(1 − b e^{−jω}) + y(−1) b^{n+1}   (6.14)
which can also be obtained directly from a series expansion of Y (Z). This expression shows the
steady-state and transient responses, and also the response due to the initial conditions. The last
two items disappear when n increases if |b| < 1 – that is, if the system is stable.
This analysis shows that the first-order filter offers restricted possibilities because it has only
one pole, which must be real if the filter has real coefficients. Further, its frequency response is a
monotonic function. The second-order filter has a wider variety of possibilities. It is the structure
most commonly used in digital filtering because of the modularity it allows, even with the most
complex filters, and because of its properties relating to limitations in the coefficient wordlengths
and the round-off noise. We will first examine the purely recursive filter section.
where b1 and b2 are the coefficients. The corresponding time constant τ12 is given by:

τ12 ≈ 2√(τ1 τ2)   (6.18)

for coefficients sufficiently close to unity. For N identical sections, the time constant τN can be approximated by:

τN ≈ √N τ1   (6.19)
(2) b1² < 4b2: the two poles are complex conjugates, written as P and P̄, with:

P = −b1/2 + j(1/2)√(4b2 − b1²)   (6.20)
Figure 6.3 illustrates this, the most interesting case, and the remainder of this section will con-
centrate on this.
The relation between the position of the poles and the filter coefficients is
very simple:
b1 = −2Re(P) (6.21)
That is, the coefficient of the Z −1 term in the expression for H(Z) is equal in modulus to twice
the real part of the pole and has the opposite sign. Then:
b2 = |OP|2 (6.22)
The coefficient of the Z −2 term is equal to the square of the modulus of the pole or to the square
of the distance from the pole to the origin. As will be seen later, both relations are very useful in
determining filter coefficients.
If M denotes the point of coordinate e^{jω} in the complex plane, the modulus of the transfer function is:

|H(ω)| = 1/(MP · MP̄)

and the phase is:

φ(ω) = α1 + α2 − 2ω

where α1 and α2 denote the angles between the vectors PM and P̄M and the real axis.
The analytical expressions are deduced from H(Z) by letting Z = e^{jω}. By using:

H(Z) = 1/(1 + b1 Z^{−1} + b2 Z^{−2})
Figure 6.3 Complex conjugate poles P and P̄ of the second-order section in the Z-plane.
we have:

|H(ω)|² = 1/[ 1 + b1² + b2² + 2b1(1 + b2) cos ω + 2b2 cos 2ω ]   (6.23a)

φ(ω) = −arctan[ (b1 sin ω + b2 sin 2ω)/(1 + b1 cos ω + b2 cos 2ω) ]   (6.24a)
A very elegant form for the frequency response and the phase is obtained by representing the
poles in polar coordinates, P = rej𝜃 , and expressing H(Z) as a factor product:
H(Z) = 1/[ (1 − P Z^{−1})(1 − P̄ Z^{−1}) ]

The coefficients b1 and b2 then become:

b1 = −2r cos θ;   b2 = r²
For H(ω), we obtain:

H(ω) = 1/{ [1 − r e^{j(θ−ω)}][1 − r e^{−j(θ+ω)}] }   (6.25)
Hence:

|H(ω)|² = 1/{ [1 + r² − 2r cos(θ − ω)][1 + r² − 2r cos(θ + ω)] }   (6.23b)

φ(ω) = arctan[ r sin(θ + ω)/(1 − r cos(θ + ω)) ] − arctan[ r sin(θ − ω)/(1 − r cos(θ − ω)) ]   (6.24b)
These expressions permit the curves for |H(𝜔)| and 𝜙(𝜔) to be plotted as a function of the
frequency 𝜔 = 2𝜋f . It can be shown that |H(𝜔)| is an even function and that 𝜙(𝜔) is an odd
function of 𝜔.
The values corresponding to the extrema of |H(𝜔)| are the roots of the following equation, which
is obtained by taking the derivative of equation (6.23) with respect to 𝜔:
sin ω [ b1(1 + b2) + 4b2 cos ω ] = 0

The extremal frequencies are 0 and 0.5, and another extremal frequency f0 exists if:

| b1(1 + b2)/(4b2) | < 1   (6.26a)
or, in polar coordinates:

|cos θ| < 2r/(1 + r²)   (6.26b)
In this case:

cos(2πf0) = cos ω0 = −b1(1 + b2)/(4b2)   (6.27)
The frequency f0 is the resonance frequency of the filter section. The amplitude at the resonance is written as:

Hm = [1/(1 − b2)] √[ 4b2/(4b2 − b1²) ]   (6.28)

or, in polar coordinates:

Hm = 1/[ (1 − r)(1 + r) sin θ ]   (6.29)
Thus, it appears that the frequency response at resonance is inversely proportional to the distance
from the pole to the unit circle. This is a fundamental expression which will be used frequently in
the following chapters.
It is also important for the second-order section to determine the 3-dB bandwidth, B3 , such that:
B3 = f2 − f1 = (𝜔2 − 𝜔1 )∕2𝜋
with:
|H(ω1)|² = |H(ω2)|² = Hm²/2
For a strongly resonant filter section (r ≈ 1), using equations (6.22) and (6.23), the following
approximation holds in the vicinity of the resonance frequency:
|H(ω1)|² ≈ [1/(4 sin²θ)] · 1/[1 + r² − 2r cos(θ − ω1)] = (1/2) · 1/[(1 − r²)² sin²θ]

whence:

cos(θ − ω1) = (1 + r²)/2r − (1 − r²)²/4r

By expansion and limiting the number of terms, we derive:

|θ − ω1| ≈ 1 − r

Hence, the approximation for a strongly resonant filter section is:

B3 = (1 − r)/π   (6.30a)
This result is used below for calculating the arithmetic complexity of filters.
Another characteristic is sometimes used for a purely recursive second-order section: the equivalent noise bandwidth Bb. This is the bandwidth of a noise source whose spectral density is assumed to be constant within this band and equal to Hm², and whose total noise power is equal to the power obtained at the output of the section when white noise of unit power is applied. By definition:

Bb Hm² = ||H||₂²

Taking account of the expression for ||H||₂² given below in (6.36), and expression (6.29) above, yields:

Bb = (1 − r⁴) sin²θ / (1 + r⁴ − 2r² cos 2θ)   (6.30b)
This expression is useful in spectral analysis, for example.
The main characteristics of a purely recursive second-order filter section can be illustrated by an
example.
Example: Consider a second-order filter section having poles with coordinates:
P = 0.6073 + j0.5355
P̄ = 0.6073 − j0.5355
The various parameters are:
b1 = −2 Re(P) = −1.2146
b2 = |OP|2 = 0.6556
H(Z) = 1/(1 − 1.2146 Z^{−1} + 0.6556 Z^{−2})

|H(ω)|² = 1/[ 2.905 − 4.02 cos ω + 1.31 cos 2ω ]

θ = 2π × 0.1156;   r = 0.81;   f0 = 0.111;   Hm = 4.39;   B3 = 0.06
The modulus of the response is shown in Figure 6.4 as a function of the frequency.
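The numerical values of the example can be retrieved directly from the formulas of this section (a plain check, with names of our choosing):

import numpy as np

b1, b2 = -1.2146, 0.6556
r = np.sqrt(b2)                                           # (6.22)
theta = np.arccos(-b1 / (2 * r))                          # (6.21)

f0 = np.arccos(-b1 * (1 + b2) / (4 * b2)) / (2 * np.pi)   # resonance (6.27)
Hm = 1 / ((1 - r) * (1 + r) * np.sin(theta))              # peak (6.29)
B3 = (1 - r) / np.pi                                      # 3-dB bandwidth (6.30a)
print(f0, Hm, B3)    # ~0.111, ~4.39, ~0.06, as in the text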
The phase response of the second-order section can be considered using equation (6.24) for the
function 𝜙(𝜔). In order to describe the variations in this function, it is useful first to calculate its
derivative using equation (6.24b). Thence,
d𝜙 r cos(𝜃 + 𝜔) − r 2 r cos(𝜃 − 𝜔) − r 2
= + (6.31)
d𝜔 1 − 2r cos(𝜃 + 𝜔) + r 2 1 − 2r cos(𝜃 − 𝜔) + r 2
This derivative is interesting as it is the group delay of the filter. By definition (1.29):
τ(ω) = dφ/dω
Thus, the group delay can be written as:
τ(ω) = r[cos(θ + ω) − r]/[1 − 2r cos(θ + ω) + r²] + r[cos(θ − ω) − r]/[1 − 2r cos(θ − ω) + r²]   (6.32)
The function 𝜏(𝜔) has a maximum in the vicinity of the resonance frequency. At the frequency
of f = 𝜃/2𝜋, this becomes:
τ(θ) = [r/(1 − r)] { 1 + (1 − r)(cos 2θ − r)/(1 − 2r cos 2θ + r²) } ≈ r/(1 − r)   (6.33)
In physical systems, this function is positive and 𝜙(𝜔) is an increasing function which has a value
of 0 at the origin and a multiple of 𝜋 at frequency 0.5.
Example: r = 0.81; 𝜃 = 2𝜋 × 0.1156
Figure 6.5 shows the curve 𝜏(f ) as a function of frequency. This curve has a maximum of 3.8 in
the vicinity of the resonance. The unit of time is the sampling period T. The values obtained should
be multiplied by T if this period is different from unity.
The function 𝜏(f ) can be seen to have negative values. In fact, it is the theoretical group delay of
the filter. The system, as it has been presented, however, cannot actually be realized. Each output
element y(n) is calculated by an addition which involves an input number x(n). This operation
cannot be instantaneous. To enable the system to be realized, y(n) has to be delayed, for example,
by one period. The group delay will then be increased correspondingly, and it is necessary to add
the value 𝜔 to the phase 𝜙(𝜔). The function 𝜙(𝜔) obtained under these conditions is represented
in Figure 6.6, and the curve has maximum slope in the vicinity of the resonance.
The equations which were given for the functions |H(𝜔)|, 𝜙(𝜔), and 𝜏(𝜔) are important
because they allow the corresponding functions to be determined for filters realized by cascading
second-order sections, either by multiplication for the modulus of the frequency response, or by
addition for the phase and the group delay.
To introduce the initial conditions and find the transient responses, the one-sided Z-transform
is used. From the equation of definition of the filter section, the following relation can be found
between the one-sided transforms Y (Z) and X(Z):
Finally:

||H||₂² = [(1 + r²)/(1 − r²)] · 1/(1 + r⁴ − 2r² cos 2θ)   (6.36)
The value ||H||1 is also used:
||H||₁ = Σ_{n=0}^{∞} |h(n)|

||H||₁ = (1/sin θ) Σ_{n=0}^{∞} r^n |sin[(n + 1)θ]| ≤ 1/[(1 − r) sin θ]   (0 < θ < π)   (6.37)
Example: When the poles are located on the imaginary axis in the Z-plane, 𝜃 = 𝜋/2, and the
impulse response is given by:
h(2p) = r^{2p}(−1)^p;   h(2p + 1) = 0

then:

||H||₂² = Σ_{p=0}^{∞} |h(2p)|² = 1/(1 − r⁴)

||H||₁ = Σ_{p=0}^{∞} |h(2p)| = 1/(1 − r²)
The results obtained for the purely recursive second-order section can be extended to the general
second-order section.
The most general second-order filter introduces the input data x(n − 1) and x(n − 2) into the calculation of an element y(n) of the output set at time n. Its equation of definition is written as:

y(n) = a0 x(n) + a1 x(n − 1) + a2 x(n − 2) − b1 y(n − 1) − b2 y(n − 2)

The corresponding transfer function is:

H_T(Z) = (a0 + a1 Z^{−1} + a2 Z^{−2}) / (1 + b1 Z^{−1} + b2 Z^{−2})

or, with the zeros Z0 and Z̄0 placed on the unit circle:

H_T(Z) = a0 (1 − 2Re(Z0) Z^{−1} + Z^{−2}) / (1 − 2Re(P) Z^{−1} + |P|² Z^{−2})
The modulus of the frequency response of the general second-order section, when the zeros are
placed on the unit circle, is expressed by:
|H_T(ω)|² = (a1 + 2a0 cos ω)² / [ 1 + b1² + b2² + 2b1(1 + b2) cos ω + 2b2 cos 2ω ]   (6.40)
Such a filter can be regarded as the cascade of a purely recursive IIR filter section and a
linear-phase FIR filter section. Consequently, the phase characteristics and the group delay of the
complete filter section are sums of the characteristics of the elementary components. That is,
τ_T(ω) = 1 + r[cos(θ + ω) − r]/[1 − 2r cos(θ + ω) + r²] + r[cos(θ − ω) − r]/[1 − 2r cos(θ − ω) + r²]   (6.41)
φ_T(ω) = ω + arctan[ r sin(θ + ω)/(1 − r cos(θ + ω)) ] − arctan[ r sin(θ − ω)/(1 − r cos(θ − ω)) ]   (6.42)
These two expressions give the phase and the group delay of a second-order section when both
zeros are on the unit circle. This filter section is usually called the second-order elliptic section,
from the technique used for calculating the coefficients.
Example: To illustrate the properties of a general second-order filter section, let us use the
example given earlier by completing the filter with two zeros:
Z0 = 0.3325 + j0.943 and Z̄0 = 0.3325 − j0.943

The positions of the singularities in the complex plane are given in Figure 6.8(a). The transfer function H_T(Z) is the quotient of the two second-order polynomials N(Z) and D(Z):

H_T(Z) = a0 N(Z)/D(Z)

with:

N(Z) = 1 − 0.665 Z^{−1} + Z^{−2}
D(Z) = 1 − 1.2146 Z^{−1} + 0.6556 Z^{−2}
Figure 6.8(b) shows the frequency response for the filter. The contributions of the numerator
and the denominator of the transfer are also indicated. The factor a0 corresponds to a scaling factor
which is calculated so that the response of the filter has a specified value at a given frequency. For
example:
HT (0) = 1 results in a0 = 0.33
The group delay and the phase are given by Figures 6.5 and 6.6, respectively. The norm ||H T ||2 of
the function H T (𝜔) is calculated as given earlier, and is:
||H_T||₂² = a0² [ 2 + a1² + a1² b2 − 4 a1 b1 + 2 b1² − 2 b2² ] / { (1 − b2) [ (1 + b2)² − b1² ] }   (6.43)
An important particular case is the notch filter, which is used to remove a line in a spectrum
without disturbing the other components. The transfer function is:
H_N(Z) = (1 + a1 Z^{−1} + Z^{−2}) / [ 1 + a1(1 − ε) Z^{−1} + (1 − ε)² Z^{−2} ]   (6.44)
Figure 6.8 (a) Poles and zeros of a general second-order section. (b) Frequency response of a general second-order section.
where 𝜀 is a small positive real value. As shown in Figure 6.9(a), 𝜀 is the distance of the poles to the
unit circle. For very small value of 𝜀, the 3-dB attenuation bandwidth can be approximated by:
B3N ≈ ε/π
Outside the notch, the poles compensate for the zeros and the frequency response is almost flat
and close to unity. Moreover, such a filter achieves a very small amplification of the input white
noise, since equation (6.43) yields:
||H_N||₂² ≈ (2 − 3ε)/(2 − 5ε) ≈ 1 + ε
If the frequency of the signal to be removed is not precisely known, then the zeros have to be
moved inside the unit circle in order to increase the notch width.
Another class of general second-order sections is that of phase-shifter circuits. The phase-shifter
circuit is characterized by the fact that the numerator and the denominator of the transfer function
have the same coefficients but in the reverse order:
H_D(Z) = (b2 + b1 Z^{−1} + Z^{−2}) / (1 + b1 Z^{−1} + b2 Z^{−2}) = N(Z)/D(Z)   (6.45)
The polynomials N(Z) and D(Z) are mirror-image polynomials. As a result, |H_D(e^{jω})| = 1 – that is, the circuit is a pure phase shifter.
The transfer function H D (Z) is written as a function of the poles and zeros as:
H_D(Z) = (P − Z^{−1})(P̄ − Z^{−1}) / [ (1 − P Z^{−1})(1 − P̄ Z^{−1}) ]
Figure 6.9 (a) Second-order notch filter. (b) Implementation of the notch filter and its complement.
Figure 6.10 Poles and zeros of the second-order phase shifter in the Z-plane.
It is clear that the poles and zeros are harmonic conjugates and Figure 6.10 shows their position
in the Z-plane.
The calculation of the phase and group delay for this circuit can be very simply deduced from
equations (6.24) and (6.32) for the purely recursive element as:
H_D(Z) = N(Z)/D(Z) = Z^{−2} D(Z^{−1})/D(Z)
However, on the unit circle, D(Z^{−1}) is the complex conjugate of D(Z). Hence:

φD(ω) = 2φ(ω) − 2ω
After a few arithmetic manipulations, the group delay 𝜏 g (𝜔) of the phase shifter becomes:
τg(ω) = (1 − r²)/[1 − 2r cos(θ − ω) + r²] + (1 − r²)/[1 − 2r cos(θ + ω) + r²]   (6.46)
It is not difficult to show that, as 𝜔 varies from 0 to 𝜋, the phase 𝜙D (𝜔) varies by 2𝜋:
φD(π) = ∫₀^π τg(ω) dω = 2 ∫₀^π (1 − r²)/(1 + r² − 2r cos α) dα = 2π
An interesting application of this result is that the abovementioned notch filter can be imple-
mented with the help of a phase shifter. In fact, two complementary filters can be obtained with a
single all-pass section, as shown in Figure 6.9(b). The filter zero that sits on the unit circle corre-
sponds to the phase shift 𝜋. In fact, it is even a set of two complementary filters that are obtained
with a single second-order phase shifter [1, 2].
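A sketch of this complementary arrangement follows, with the notch placed at an assumed frequency f0 (the parameter values are purely illustrative):

import numpy as np
from scipy.signal import lfilter

f0, eps = 0.125, 0.05                       # notch frequency, pole-circle distance
a1 = -2 * np.cos(2 * np.pi * f0)
num = [(1 - eps) ** 2, a1 * (1 - eps), 1]   # mirror-image numerator of (6.45)
den = [1, a1 * (1 - eps), (1 - eps) ** 2]

x = np.random.randn(2000)
ap = lfilter(num, den, x)                   # pure phase shifter
y_notch = 0.5 * (x + ap)                    # zeros close to exp(+-j*2*pi*f0)
y_peak = 0.5 * (x - ap)                     # complementary resonator output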
The elements can be implemented by circuits which directly produce the operations represented
in the expression for the transfer functions. The term Z −1 corresponds to a delay of one elementary
period and is achieved by one memory element. The coefficients used in the circuits are those
of the transfer function, with the same sign for the numerator and the opposite sign for the
denominator.
The circuit which corresponds directly to the equation for the definition of the purely recursive
second-order section is given in Figure 6.11. The output numbers y(n) are delayed twice, and mul-
tiplied by the coefficients −b1 and −b2 before being added to the input numbers x(n). The circuit
includes two memory locations for data and two for the coefficients. For each output number, two
multiplications and two additions are required.
The general second-order filter section can be realized to conform with the equation of defini-
tion. However, two data memory locations are required for the input numbers and two for the
output numbers. The structure obtained is not canonical, as it contains more than the minimum
number of components. Indeed, only two data memories are necessary if the transfer function is
factorized as:
H_T(Z) = N(Z)/D(Z) = [1/D(Z)] · N(Z)
That is, the calculations involved in the denominator are performed first, followed by those for the
numerator. This structure, called D–N, is shown in Figure 6.12. It corresponds to the introduction
Figure 6.11 Circuit of a purely recursive section.
6.4 Structures for Implementation 139
Figure 6.12 D–N structure of the general second-order section, with the internal variables u1(n) and u2(n).
of two internal variables u1 (n) and u2 (n) forming a state vector U(n) with N = 2 dimensions. The
system is described by the following equations:
u1 (n + 1) = x(n) − b1 u1 (n) − b2 u2 (n)
u2 (n + 1) = u1 (n)
y(n) = a0 x(n) − a0 b1 u1 (n) − a0 b2 u2 (n) + a1 u1 (n) + a2 u2 (n)
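These equations translate directly into code; a minimal sketch of the D–N section with its two memories (the function name is ours):

def second_order_dn(x, a0, a1, a2, b1, b2):
    u1 = u2 = 0.0
    y = []
    for xn in x:
        u0 = xn - b1 * u1 - b2 * u2            # denominator part first
        y.append(a0 * u0 + a1 * u1 + a2 * u2)  # then the numerator part
        u2, u1 = u1, u0                        # shift the two memories
    return y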
This representation thus results in a canonical realization, which has the minimum number of
internal variables and consequently the minimum number of memory spaces.
Using the results given in Section 4.6, we see that there is a dual structure which corresponds to
the internal variables 𝜐1 (n)and 𝜐2 (n) such that:
v1(n + 1) = −b1 v1(n) + v2(n) + (a1 − a0 b1) x(n)
v2(n + 1) = −b2 v1(n) + (a2 − a0 b2) x(n)
y(n) = v1(n) + a0 x(n)   (6.48)
The limitation of the coefficient wordlength means that the coefficients can have only a limited
number of values. It follows, therefore, that the poles have a limited number of possible positions
inside the unit circle. The same effect occurs for zeros on the unit circle when the filter is elliptic.
Thus, quantization of the absolute value of the coefficients to b bits limits the number of positions
that the poles can take in a quadrant of the unit circle to 22b , and the number of frequencies of
infinite attenuation to 2b . If the transfer function is calculated first, and the number of bits in the
coefficients is then limited, for example, by rounding, the transfer function is modified by perturba-
tions eN (Z) and eD (Z) in the numerator and the denominator [3]. The function H R (Z) is obtained:
H_R(Z) = [N(Z) + eN(Z)] / [D(Z) + eD(Z)]   (6.49)
If the round-off errors in the coefficients are denoted by 𝛿ai and 𝛿bi (0 ≤ i ≤ 2), the perturbation
transfer functions are written as:
eN(f) = Σ_{i=0}^{2} δai e^{−j2πfi};   eD(f) = Σ_{i=1}^{2} δbi e^{−j2πfi}
Let us consider the case of the elliptic filter element whose coefficients are quantized by rounding to bc bits, including the sign. Since the coefficient values can reach 2 in absolute value, the quantization step is:

q = 2 × 2^{1−bc} = 2^{2−bc}

The modifications of the transfer function caused by quantization of the coefficients of the denominator are a maximum for frequencies near the poles, because the function D(f) is then a minimum.
If the quantization step has the value q, the error signal e(n) introduced by the quantization can be regarded as having a spectrum with uniform distribution and a power q²/12. Under these conditions, the round-off noise Nc at the output can be determined, if fs = 1, by using equation (4.25) in Section 4.4, as:
Nc = (q²/12) ∫₀¹ | N(f)/D(f) |² df
or, as a function of the set h(n), the impulse response of the filter:
Nc = (q²/12) Σ_{n=0}^{∞} |h(n)|²
By using the results in the earlier sections, for a purely recursive element with complex poles and
polar coordinates (r, 𝜃), this becomes:
Nc = (q²/12) [(1 + r²)/(1 − r²)] · 1/(1 + r⁴ − 2r² cos 2θ)   (6.53)
and for the elliptic section:
Nc = (q²/12) a0² [ 2 + a1² + a1² b2 − 4 a1 b1 + 2 b1² − 2 b2² ] / { (1 − b2) [ (1 + b2)² − b1² ] }   (6.54)
The quantization step q is related to the number of bits of the internal data memories. This
relation involves the amplitude of the frequency response of the purely recursive part. It is studied
in detail in the following chapter, for a cascade of second-order sections.
In this section, only the D–N structure has been considered. The calculations can be readily
adapted to the N–D structure [4]. The introduction of the quantization device also has consequences
in the absence of a signal.
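Formula (6.53) can be confronted with a simulation of the quantizer. The sketch below (parameter values illustrative, names ours) rounds the internal datum of the D–N recursion before storage and measures the output noise power:

import numpy as np
from scipy.signal import lfilter

r, theta = 0.81, 2 * np.pi * 0.1156
b1, b2 = -2 * r * np.cos(theta), r ** 2
q = 2.0 ** (1 - 12)                            # 12-bit internal data

Nc = (q ** 2 / 12) * (1 + r ** 2) / (1 - r ** 2) \
     / (1 + r ** 4 - 2 * r ** 2 * np.cos(2 * theta))   # (6.53)

x = 0.1 * np.random.randn(200000)
exact = lfilter([1.0], [1.0, b1, b2], x)

yq, u1, u2 = np.zeros(len(x)), 0.0, 0.0
for n, xn in enumerate(x):
    u0 = np.round((xn - b1 * u1 - b2 * u2) / q) * q    # rounding before storage
    yq[n], u2, u1 = u0, u1, u0

print(Nc, np.var(yq - exact))    # the two values should be of the same order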
[Figure: stability domain of the second-order section in the (b1, b2) plane – a triangle bounded by b2 = 1, b2 = b1 − 1 and b2 = −b1 − 1, with the parabola b2 = b1²/4 separating complex poles (above) from real poles (below).]
Nevertheless, even if the stability condition is fulfilled, there may still be a signal at the output
in the absence of an input. This is usually a constant or periodic signal which corresponds to an
auto-oscillation of the filter, and which is often called a limit cycle. Such auto-oscillations can be
produced with large amplitudes if overflow occurs when the capacity of the memories is exceeded
in the absence of a logic saturation device. The equation for the system when there is no input
signal is:
y(n) = −b1 y(n − 1) − b2 y(n − 2)

The natural condition for the absence of oscillations is that the output amplitude should decrease from sample to sample, and thus the condition which is necessary and sufficient for the absence of large-amplitude auto-oscillations is:

|b1| + |b2| < 1

This inequality determines a square in the plane (b1, b2), within the triangle of stability of the
filter element.
To eliminate all possibility of large-amplitude oscillations caused by overflow of the memory, one
can show that it is sufficient to employ a logic saturation device as shown in Section 6.6 [5].
Limit cycles are also produced through quantization before storage in the memory. However,
these have small amplitudes in well-designed systems. They arise through the fact that, in practice,
the input signal is never zero, because, even in the absence of data x(n), the error signal e(n) caused
by quantization of the internal data before storage in the memory is still applied to the filter. An
estimation of the amplitude Aa of the limit cycle is given by the expression:
Aa = (q/2) max_ω |H(ω)|
where H(𝜔) is the transfer function for the filter section.
Application to a purely recursive second-order section with complex poles produces, according
to equations (6.37) and (6.29):
|y(n)| ≤ (q/2) · 1/[(1 − r) sin θ]   (6.57)

Aa = (q/2) · 1/[(1 − r²) sin θ]   (6.58)
These signals often have a spectrum formed of lines with frequencies close to those at which
H(𝜔) is maximum, which are either factors of, or in simple ratios to, the sampling frequency.
When designing filters, the number of bits in the internal data memories must be chosen to be
sufficiently large, and the quantization step q has to be chosen to be sufficiently small to prevent
the limit cycles from being troublesome. It should be noted also that they can be eliminated by
using a type of quantization other than rounding (for example, truncation of the absolute value)
[2]. However, this is only achieved with an increase in the power of the round-off noise in the
presence of a signal [6].
The results obtained in this chapter will be used in the next chapter, where second-order sections
in cascade will be discussed.
Exercises
6.2 Calculate the response of the system which is defined by the following equation:
y(n) = x(n) + x(n − 1) − 0.8y(n − 1)
to the unit set u0 (n) and the set x(n) such that:
x(n) = 0 for n < 0;   x(n) = 1 for n ≥ 0
Give the steady-state frequency response and the transient response.
6.3 Assume a purely recursive second-order section which has the following coefficients:
b1 = −1.56; b2 = 0.8
State the position of the poles. Calculate the frequency response, the phase response, and
the group delay. How are the functions modified if two zeros are added at j and −j? For this
case, show the circuit diagram using the D–N form and count the number of multiplications
required for each output number.
6.4 Give the expression for the impulse response of a purely recursive second-order filter section
which has the following coefficients:
b1 = −1.60; b2 = 0.98
Calculate the resonance frequency and amplitude of the response at resonance. Give the
response H(𝜔) and calculate the norm ||H||2 .
The zeros are added on the unit circle to produce an infinite attenuation at frequency 3f s /8.
What are the coefficients of the filter? Calculate the new expression for H(𝜔) and the new
value of ||H||2 .
For this filter, find the amplitude of the limit cycles, using the D–N form and then the N–D
form. Give an example of a limit cycle.
Does this filter produce large-amplitude oscillations when there is no logic saturation device?
Give an example.
6.5 How many bits are needed to represent the coefficients of the filter with the following
Z-transfer function:
H(Z) = (1 − 0.952 Z^{−1} + Z^{−2}) / (1 − 1.406 Z^{−1} + 0.917 Z^{−2})
in order that the frequency response is not modified by more than 1% in the vicinity of the
poles? Calculate the displacement of the point of infinite attenuation.
6.6 Assume the realization of a second-order phase shifter having poles P1,2 such that:
Calculate the coefficients and give the expression for the function 𝜏 g (𝜔). Show that an imple-
mentation scheme exists which produces a reduced number of multiplications. When there
is no logic saturation device, can this element exhibit large-amplitude oscillations? Can it
produce low-amplitude limit cycles?
References
1 S.K. Mitra and J.F. Kaiser, Handbook for Digital Signal Processing, John Wiley, New York, 1993.
2 P.A. Regalia, S.K. Mitra and P.P. Vaidyanathan, The all-pass filter: a versatile signal processing building block. Proceedings of the IEEE, 76(1), 19–37, 1988.
3 J.B. Knowles and E.M. Olcayto, Coefficient accuracy and digital filter response. IEEE Transactions on Circuit Theory, 1968.
4 L.B. Jackson, On the interaction of round-off noise and dynamic range in digital filters. Bell System Technical Journal, 1970.
5 P. Ebert, J. Mazo and M. Taylor, Overflow oscillations in digital filters. Bell System Technical Journal, 1969.
6 T. Claasen, W. Mecklenbräuker and J. Peek, Effects of quantization and overflow in recursive digital filters. IEEE Transactions on Acoustics, Speech and Signal Processing, 24(6), 1976.
147
Digital filters with an infinite impulse response (IIR), or recursive filters, have properties simi-
lar to those of analog filters, and consequently, their coefficients can be determined by similar
techniques [1–3].
Before discussing the method for calculating the coefficients, it is useful to give some general
expressions for the properties of these filters.
H(Z) = H(Z)
and the frequency response of the filter can be written with the same conventions as in the earlier
chapters:
H(𝜔) = |H(𝜔) |e −j𝜙(𝜔)
The modulus and the phase are expressed in terms of H(Z) by the following equations:
|H(𝜔)| 2 = [H(Z)H(Z −1 )]Z =ejω (7.3)
By squaring H(𝜔) and using equation (7.3),
[ ]
1 H(Z)
𝜙(𝜔) = − j log (7.4)
2 H(Z −1 ) Z=ej𝜔
and, by taking the derivative of 𝜙(Z) with respect to the complex variable Z, we obtain:
[ ]
d𝜙 1 H ′ (Z) 1 H ′ (Z −1 )
=− j + 2
dZ 2 H(Z) Z H(Z −1 )
Digital Signal Processing: Theory and Practice, Tenth Edition. Maurice Bellanger.
© 2024 John Wiley & Sons Ltd. Published 2024 by John Wiley & Sons
148 7 Infinite Impulse Response Filters
∑
N∕2
𝜏(𝜔) = 𝜏i (𝜔)
i=1
The general equations for IIR filters given above are used to calculate the coefficients.
A direct method for calculating the coefficients of an IIR filter is to use a model function, which
is a real function defined on the frequency axis. The model functions which will be considered are
those of Butterworth, Bessel, Chebyshev, and elliptic functions, all of which have known selectivity
properties. They are also used to calculate analog filters and form a model for the square of the
transfer function to be derived. However, one drawback to their use for calculating digital filters
is that they are not periodic when the desired function has period f s . It is therefore necessary to
7.2 Direct Calculations of the Coefficients Using Model Functions 149
establish a map linking the real axis and the range [0, f s ]. Such mapping is supplied by a conformal
transformation in the complex plane with the following properties:
(1) It transforms the imaginary axis onto the unit circle.
(2) It transforms a rational fraction of the complex variable s into a rational fraction of the complex
variable Z.
(3) It conserves stability.
Figure 7.1 illustrates this equation. Note that the warping is negligible for very low frequencies,
which justifies the choice of the scale factor 2/T.
Although other transforms can also be applied to the calculation of digital filter coefficients,
the bilinear transform is the most commonly used. It allows for the calculation of digital filters
1
2T
1 1
–
2 T
0 1 1 fN
2 T
1
–
2T
7.2 Direct Calculations of the Coefficients Using Model Functions 151
from the transfer functions for analog filters, or by using computer programs developed for analog
filters.
It should be remembered, however, that the frequency response is warped, as shown above, since
the analog and digital frequencies 𝜔A and 𝜔D are related by the equation:
[ ]
2 𝜔 T
𝜔A = tan N (7.19)
T 2
The group delay is also modified:
[ ( )]
𝜔A T 2
𝜏N = 𝜏A 1 + (7.20)
2
That is, an analog filter with a constant group delay is transformed into a digital filter which does
not have this property.
In order to design a digital filter from a mask using this method, the mask must first be modified
to take account of the subsequent frequency warping. The analog filter satisfying the new mask can
then be calculated, and finally, the bilinear transform can be applied.
The parameter 𝜔c gives the value of the variable for which the function has the value 12 . Figure 7.2
represents this function for various values of n.
By analytic extension, taking 𝜔c = 1, this can be written as:
| 1
|F(𝜔) |2 = |H(j𝜔)|2 = |H(s)H(−s)|
| s=j𝜔 =
| 1 + 𝜔2n
1 1
H(s)H(−s) = ( )2n = 1 + (−s2 )n
s
1+ j
The poles of this function lie on the unit circle. For example, when n is odd, one can write:
1
H(s)H(−s) = ∏2n
j𝜋(k∕n) )
k=1 (s − e
0.5
n=1
4 2
0 ωc 2 ωc ω
152 7 Infinite Impulse Response Filters
By setting to H(s) the poles which are to the left of the imaginary axis, to obtain a stable filter,
and after proper factorization to obtain first- and second-order sections with real coefficients, one
has:
1 ∏
(n−1)∕2
1
H(s) = ( ( ))
1 + s k=1 s2 + 2 cos 𝜋 k s+1
n
𝜔c = 2𝜋fc
By setting u = 1/tan(𝜋f 0 T) and 𝛼 k = 2cos(𝜋k/n), the Z-transfer function for the digital filter of
odd-order n is obtained:
(n−1)
1 + Z −1 ∏2
(1 + Z −1 )2
k
H(Z) = a 0
(1 + u) + (1 − u)Z −1 k=1 1 + bk1 Z −1 + bk Z −2
2
with:
1
ak0 = ; bk1 = 2ak0 (1 − u2 ); bk2 = ak0 (1 − u𝛼k + u2 )
1 + u𝛼k + u2
For even values of n, with 𝛼 k = 2 cos(𝜋(2k − 1)/2n), this becomes:
∏
n∕2
(1 + Z −1 )2
H(Z) = ak0 (7.22)
k=1 1 + bk1 Z −1 + bk2 Z −2
It would thus appear that the zeros of the Z-transfer function are all found at the point Z = −1,
which can simplify the realization of the filter. On the other hand, the function is completely deter-
mined by the data for the parameters n and u.
The order n is calculated from the specification of the filter. Assume that a filter is to be produced
with a frequency response greater than or equal to 1 − 𝛿 1 in the band [0, f 1 ] and less than or equal to
𝛿 2 in the band [f 2 , f s /2]. In terms of the model function F(𝜔), these constraints imply the following
inequalities:
1 1
( )2n ≥ (1 − 𝛿1 ) and ( )2n ≤ 𝛿2
2 2
𝜔1 𝜔2
1+ 𝜔c
1+ 𝜔c
H(f)
0.5
f2
0 0.1 f1 0.2 0.3 0.4 0.5 f
Thus, the filter order is proportional to the inverse of the transition band, as is the case for FIR
filters. It follows that the selectivity of this kind of filter is rather limited in practice.
In conclusion, Butterworth filters are straightforward to calculate. Significant simplifications can
be made in their realization because of the arrangement of the roots and, in certain cases, of the
poles, but they are much less selective than elliptic filters.
|T(jω)|2
1
1
1 + ε2
1
A2
0 ω1 ω2 ω
Figure 7.4 represents the function T 2 (u) for u = j𝜔, and shows the parameters corresponding to
k1 such that:
𝜀
k1 = √
(A2 − 1)
√
The function sn2 (𝜔, k) oscillates between 0 and 1 for 𝜔 < 𝜔1 and between (A2 − 1)/𝜀 and infinity
for 𝜔 ≥ 𝜔2 .
One can show that the order n of the filter is determined from the parameters k1 and k, the
selectivity factor:
𝜔
k= 1
𝜔2
by the expression:
(√ ( ))
K(k)K 1 − k12
n= √ (7.27a)
K(k1 )K( (1 − k2 ))
where K(k) is the complete elliptic integral of the first type:
𝜋
2 d𝜃
K(k) = (7.27b)
∫0 (1 − k2 sin2 𝜃) 2
1
This integral is calculated by the Chebyshev polynomial approximation method which results in
an error of the order of 10−8 with a polynomial of degree 4. The inverse function to the incomplete
integral of the first type is calculated as the quotient of two rapidly converging series.
A simplified equation for the order n of the filter can be obtained from the general specification
given in Figure 5.7. With the assumption of a ripple in the pass band of between 1 and 1 − 2𝛿 1 , and
with the following parameters (Figure 7.5):
1 1
𝛿2 = ; 2𝛿1 = 1 − √ ;f =1
A (1+𝜀2 ) s
7.2 Direct Calculations of the Coefficients Using Model Functions 155
Z1 P2
P1
0 1
one has:
( ) [ ]
() 8𝜔1
2 2
n≈ In √ In (7.28)
𝜋 2
𝛿2 𝛿1 (𝜔2 − 𝜔1 )
The order N of the digital filter satisfying the mask in Figure 5.7 is then given by:
( )
( ) ⎡ 𝜋f ⎤
( ) ⎢ 8 tan f 1 ⎥
2 2
√ × In ⎢ ( ( ) ( )) ⎥
s
N≈ In
𝜋 2
𝛿2 𝛿1 ⎢ tan 𝜋f2 − tan 𝜋f1 ⎥
⎣ fs fs ⎦
The transition band Δf = f 2 − f 1 is generally narrow, and thus (using logarithms to base 10),
( ) [( ) ( ) ( )]
2 fs 4 2𝜋f1
N ≈ 1.076 log √ log sin (7.29)
𝛿2 𝛿1 Δf 𝜋 fs
This relation should be compared with equation (5.32) for finite impulse response filters. It shows
that, for elliptic IIR filters, the order is proportional to the logarithm of the inverse of the normal-
ized transition band. This leads to much lower values than for FIR filters. Further, equation (7.29)
shows that the width of the band is also involved. The maximum value of N is found for f 1 close to
f s /4 – that is, for a pass band approximating half of the useful band. Further simplification can be
obtained for filters with a narrow pass band. In this case, the order N ′ of the filter is given by:
[ ] ( )
2 8f1
′
N ≈ 1.076 log √ log (7.30)
(𝛿2 𝛿1 ) Δf
and, as in analog filters, it is the steepness of the cutoff which is important here.
Once the filter order has been determined, the calculation procedure involves determining the
poles and zeros of T 2 (u), which show double periodicity in the complex plane. By changing the
variable and then applying the bilinear transform, the configuration of the poles and zeros of
the digital filter in the Z-plane is obtained [4].
With this technique, the filter is specified by:
(1) The peak-to-peak amplitude of the pass band ripples, expressed in dB:
BP = −20 log(1 − 2𝛿1 )
(2) The amplitude of the ripples in the attenuation band, expressed in dB:
( )
1
AT = 20 log
𝛿2
(3) The frequency of the end of the pass band, FB.
156 7 Infinite Impulse Response Filters
1
|H(f)|
dB
50
40
30
20
0.17
FA
0 0.1 FB 0.2 0.3 0.4 0.5 f
Example: Consider the specification given in the previous section: BP = 0.4, AT = 36.5, FS = 1,
FB = 0.1725, FA = 0.2875.
It is found that N = 3.37, and the adopted values in N = 4. The zeros and poles have co-ordinates
(Figure 7.5):
To demonstrate the point of infinite attenuation, the curve 1/|H(f )| giving the attenuation of the
filter as a function of frequency is shown in Figure 7.6.
Figure 7.7 shows the group delay of the filter obtained. The curves can be compared with the
results obtained for the same characteristic using the Butterworth filter. They demonstrate the
advantage of the elliptic filter, which requires an order lower by a factor of 2 and produces a corre-
sponding reduction in the complexity of the circuits.
The methods described allow for calculation of low-pass filters, from which suitable frequency
transformations make it possible to obtain high-pass and band-pass filters.
τ(f)
10
These transforms conserve the ripple in the response of the filter but result in frequency warping.
A more direct method consists of using transforms other than the bilinear transform to reach the
function H(Z). For example, the transform:
−1 −2
1 1 − 2 cos(𝜔0 T)Z + Z
s= −2
(7.31)
T 1−Z
allows a bandpass digital filter to be obtained from a low-pass filter function.
For Z = ej𝜔T this becomes:
1 cos(𝜔0 T) − cos(𝜔T)
s=j (7.32)
T sin(𝜔T)
If the pass band of the digital filter extends from 𝜔B to 𝜔H, 𝜔0 must be chosen so that the abscissae
of the transformed points are equal in absolute value but are of opposite sign:
cos(𝜔o T) − cos(𝜔B T) cos(𝜔o T) − cos(𝜔H T)
=−
sin(𝜔B T) sin(𝜔H T)
and thus:
[ ]
cos (𝜔B + 𝜔H ) T2
cos(𝜔o T) = [ ]
cos (𝜔B − 𝜔H ) T2
This approach avoids adding a stage to the calculation procedure for a band-pass filter.
It is also possible to use transforms in the Z-plane which conserve the unit circle. The simplest
is to transform from Z to − Z, which changes a low-pass filter into a high-pass one.
The transform:
(Z −1 − 𝛼)
Z −1 → (7.33)
(1 − 𝛼Z −1 )
where 𝛼 is a real number, changes a low-pass filter into another low-pass one. In fact, it can be
shown that the most general transform is expressed by [5]:
∏K
Z −1 − 𝛼k
Z −1 → ± (7.34)
k=1
1 − 𝛼k Z −1
with |𝛼k| < 1 to ensure stability.
158 7 Infinite Impulse Response Filters
The value E is a function of the set of 2N + 1 parameters, which are the coefficients of the filter:
( ) N
E = E a0 , ai1 , ai2 , bi1 , bi2 with 1 ⩽ i ⩽
2
The minimum corresponds to the set of 2N + 1 parameters xk such that:
𝜕E
= 0; 1 ⩽ k ⩽ 2N + 1
𝜕xk
For the parameter a0 , one can set H(Z) = a0 H 1 (Z), whence:
𝜕E ∑
NO −1
=0=2 (a0 |H1 (fn )| − |D(fn )|)|H1 (fn )|
𝜕a0 n=0
The procedure consists of taking an initial function H10 (Z), which is found, for example, by the
direct calculation method given in the preceding section for elliptic filters, and then assuming that it
is sufficiently close to the optimum for the function E to be represented by a quadratic function with
2N parameters xk . The desired optimum is then obtained through an increment in the parameters
represented by the vector ΔX with 2N element such that:
∑
2N
𝜕E 1 ∑ ∑ 𝜕2 E
2N 2N
E(X + ΔX) ≈ E(X) + Δxk + Δx Δx
k=1
𝜕xk 2 k=1 l=1 𝜕xk 𝜕xl k l
By using A to denote the matrix with 2N rows and N 0 columns which has elements:
𝜕
aij = 2 [a |H (f )|]
𝜕xi 0 1 j
and by using Δ to denote the column vector en with N 0 terms such that:
the condition of least squares is obtained by requiring E(X + ΔX) to be an extremum. As in Section
5.4, when calculating the coefficients of FIR filters, we have:
ΔX = −[AAt ]−1 AΔ
The calculation is then repeated with the new values for the parameters, which should ultimately
lead to the required optimum. The chances of achieving this and the rate of convergence depend
on the increments given to the parameters, and one of the best strategies is offered by the Fletcher
and Powell algorithm [7].
To ensure stability in the resulting system, either the stability can be controlled at each stage, or
the final system can be modified by replacing the poles Pi outside the unit circle by 1/Pi , which
does not modify the modulus of the frequency response except for a constant factor. In the latter
case, it is generally necessary to return to the optimization procedure to achieve the optimum.
Mean square error minimization can be applied to other functions as well as the frequency
response – for example, the group delay [8].
|N(f)|
|G(f)|
0 f1 f2 0.5 f
Figure 7.8 The N(f) and G(f) functions for a low-pass filter.
more satisfactory, however, to use an adaptation of the calculation techniques employed for analog
filters.
By assuming that the required function H(f ), written as:
N(f )
H(f ) =
D(f )
is such that:
|H(f )| ≤ 1
and hence:
1
|H(f )|2 =
| G(f ) |2
1 + | N(f ) |
| |
Figure 7.8 shows the functions |G(f )| and |N(f )| for a low-pass filter. The zeros of the function
G(Z) lie on the unit circle, and they can be calculated using the algorithm for linear-phase FIR
filters. The weighting function is determined from 1/|N(f )|.
By optimizing the stop and pass bands alternately, the required filter function is obtained after
several iterations and the filter coefficients are then obtained. The stability of the filter requires that
only those poles of H(z) which are inside the unit circle be conserved.
∑
P
H(f ) = an e−j2𝜋fn
n=−P
At RA = 𝜆At A
which is an eigenvalue equation. The filter coefficients are the elements of the eigenvector corre-
sponding to the largest eigenvalue of the matrix R, whose elements are the terms:
sin(n − m)2𝜋fc
(n − m)𝜋
The elements of the eigenvectors of the matrix R are called the discrete prolate spheroidal
sequences [9].
An FIR filter has been obtained; it is also possible to derive an IIR filter. To that end, consider the
following purely recursive function:
1
|H(f )|2 =
|∑N |2
1 + | n=1 bn ej2𝜋fn |
| |
The coefficients can be calculated to minimize the energy of the denominator in the band [−f c ,
f c ], under the condition |H(f c )|2 = 0.5. Then, the same method as above can be applied and the coef-
ficients bn (1 ≤ n ≤ N) can be taken as the elements of the eigenvector associated with the smallest
eigenvalue of the spheroidal matrix. First, the scaling factor of the eigenvector is chosen such that
|H(fc )|2 = 21• . Then, the poles of the analytic expansion of |H(f )|2 are calculated, and the desired
filter transfer function H(Z) is obtained by keeping only those few poles which are inside the unit
circle to ensure stability.
The procedure can be made reasonably simple by using iterative techniques and exploiting the
structural properties of the spheroidal matrix [9].
Example: Assume: N = 4; f s = 1; f c = 0.1. The minimal eigenvector V min is:
t[1.0,−2.773,−2.773,1.0]
Vmin
If T designates the matrix whose elements are the terms ej2𝜋fc (n−m) with 1 ≤ n, m ≤ N, then the
tmin
scaling factor leading to the equality Vmin is 10.46.
After factorization of the analytical expansion H(Z)H(Z −1 ), the transfer function finally
obtained is:
0.0704
H(Z) = (7.40)
(Z − 0.73 + j0.446)(Z − 0.73 − j0.446)(Z − 0.741)
The technique presented above for low-pass filtering can be extended to high-pass filtering.
162 7 Infinite Impulse Response Filters
This structure is the one most frequently used because, in addition to its modularity, it presents
the useful properties of low sensitivity to coefficient wordlength limitation and to round-off noise.
The function H(Z) can also be decomposed into rational fractions:
𝛼i 𝛼j + 𝛽j Z −1
H(Z) = a0 + · · · + + · · · + +··· (7.42)
1 − Pi Z −1 1 − 2Re(Pj )Z −1 + |Pj |2 Z −2
The approach corresponds to connecting the M basic elements in parallel as shown in Figure 7.9.
The numbers y(n) are obtained by summing the outputs from the different elements, to which the
input numbers x(n) are applied.
IIR filters can also be implemented with the help of phase shifters.
Butterworth, Chebyshev, and elliptic transfer functions can be decomposed into a sum of two
phase shifters [10]. For such a function, this gives:
N(Z) 1
H(Z) = = [A1 (Z) + A2 (Z)] (7.43)
D(Z) 2
where A1 (Z) and A2 (Z) are the transfer functions of the phase shifters.
H1
Σ y(n)
HN–1
7.2 Direct Calculations of the Coefficients Using Model Functions 163
Calculation of A1 (Z) and A2 (Z) from H(Z) involves the complementary function G(Z) =
M(Z)/D(Z) such that:
It is assumed that the initial function H(Z) is such that N(Z) is a symmetric polynomial and M(Z)
is asymmetric – that is:
and
Note that the zeros of N(Z) + M(Z) and N(Z) − M(Z) are harmonic conjugates and that the zeros
of D(Z) are their inverses. Designating the poles of the filters by Pi (i = 1, …, N), and hence the zeros
of D(Z), one can write, to within a constant:
∏
r
∏
N
N(Z) + M(Z) = (1 − Z −1 Pi ) (Z −1 − Pi ) (7.48)
i=1 i=r+1
and
∏
r
∏
N
N(Z) − M(Z) = (Z −1 − Pi ) (1 − Z −1 Pi )
i=1 i=r+1
where r is the number of zeros of the polynomial N(Z) + M(Z) within the unit circle. Dividing by
D(Z), one obtains:
∏N −1 − P )
i=r+1 (Z i
H(Z) + G(Z) = ∏N (7.49)
−1
i=r+1 (1 − Z Pi )
and similarly:
∏r
(Z −1 − Pi )
H(Z) − G(Z) = ∏ri=1 (7.50)
−1
i=1 (1 − Z Pi )
The phase shifters A1 (Z) and A2 (Z) have the following expressions:
∏N
Z −1 − Pi ∏r
Z −1 − Pi
A1 (Z) = −1
A2 (Z) = (7.51)
i=r+1
1 − Z Pi i=1
1 − Z −1 Pi
Finally, the filter H(Z) and its complement G(Z) are obtained by the arrangement shown in
Figure 7.10.
A2 (Z) + y 2 (n )
–
164 7 Infinite Impulse Response Filters
The general procedure for designing the phase shifters from an elliptical filter is as follows:
(1) Calculate the transfer function H(Z) = N(Z)/D(Z) of an elliptic filter of odd order N.
(2) Calculate the coefficients of the antisymmetric polynomial M(Z) from N(Z) and D(Z) by using
equation (7.46).
(3) Determine the inverses of the poles of H(Z) which are the roots of the polynomial N(Z) + M(Z).
(4) Calculate A1 (Z) and A2 (Z) using expression (7.51).
A simplified approach, when the order N is not very high, consists of finding A1 (Z) and A2 (Z)
directly by combining poles. Hence, for:
[ ]
1 + 1.8601Z −1 + 2.9148Z −2 + 2.9148Z −3 + 1.8601Z −4 + Z −5
H(Z) = 0.0546
(1 − 0.4099Z −1 )(1 − 0.06611Z −1 + 0.4555Z −2 )(1 − 0.4993Z −1 + 0.8448Z −2 )
this gives:
0.4555 − 0.6611Z −1 + Z −2
A1 (Z) =
1 − 0.6611Z −1 + 0.4555Z −2
(−0.4099 + Z −1 )(0.8448 − 0.4993Z −1 + Z −2 )
A2 (Z) =
(1 − 0.4099Z −1 )(1 − 0.4993Z −1 + 0.8448Z −2 )
The basic structure of phase shifters is useful since it provides two complementary filters with the
same calculations, which is useful in filter banks, as shown in Chapters 11 and 12. Furthermore, it
is less sensitive than other structures to rounding of the coefficients.
Note that filters which can be decomposed as the sum of phase shifters are entirely defined by
their poles.
A sum of phase shifters as in Figure 7.10 is the most efficient realization of an elliptic filter since
it requires a number of multiplications equal to the order of the filter.
In fact, these expressions form the Fourier series expansion of functions which are periodic in
frequency. The Bessel–Parseval equation (1.7) relating the power of a signal to the power of its
components allows the following equation to be written:
1 || ∑N
|e (f )2 df = |𝛿ai |2
∫0 || N
| i=1
ωc
0 1
This condition is generally much more restrictive than the earlier one because the function |D(𝜔)|
has very low values in the pass band and is even more restrictive with a more selective filter. Further,
when the pass band is narrow, the coefficients can have large values. For a low-pass filter such as
that in Figure 7.11, the following equation can be written:
D(Z) ≈ [1 − Z −1 ]N
and thus:
N!
bi ≈ (7.58)
i!(N − i)!
Under these conditions, a very large number of bits is required if both large values of the coeffi-
cients are to be represented and a very low quantization error is required. For this reason, decom-
posed structures are used almost exclusively with first- and second-order sections.
Let us first consider the cascade structure which corresponds to the decomposition (7.41) of the
transfer function. If the order N of the filter is even, this is written as:
∏
N∕2
H(𝜔) = N(𝜔)∕D(𝜔) = Ni (𝜔)∕Di (𝜔)
i=1
The polynomials N i (𝜔) and Di (𝜔) are of the second degree.
In the pass band, if rounding of the coefficients of the polynomials N i (𝜔) is neglected, we obtain:
∏
N∕2
Ni (𝜔)
HR (𝜔) ≈
i=1
Di (𝜔) + ei (𝜔)
or
[ ]
N(𝜔) ∑
N∕2
ei (𝜔)
HR (𝜔) ≈ 1− (7.59)
D(𝜔) i=1
Di (𝜔)
Then, using equation (7.53), we obtain:
| e (𝜔)| ≤ q
| i |
and the overall relative error e(𝜔) in the frequency response is found to be bounded by:
| ∑
N∕2
| 1
| e(𝜔)| ≤ q (7.60)
| |D
| i (𝜔)|
| i=1
This expression demonstrates the benefit of the decomposed structure as the bound of the error
is proportional to:
∑
N∕2
1
i=1
|Di (𝜔)|
7.2 Direct Calculations of the Coefficients Using Model Functions 167
in the stop band can have values much greater than unity, for example, in the vicinity of the zeros
of the filter. The result is that in the stop band, the parallel structure is more sensitive to rounding
errors than the cascade structure.
Finally, the cascade structure allows the coefficients of the IIR filters to be represented using
fewer bits. Thus, this structure is the one most frequently used.
In the presence of a signal – that is, for nonzero values of x(n) – rounding before storage in the
memory with quantization step q is equivalent to superposition
on the input signal of an error signal e(n) such that |e(n)| < q/2, assumed to have uniform spec-
trum and power so that 𝜎 2 = q2 /12.
If other roundings are involved (for example, in the multiplications), it is apparent that the error
signals produced are added either to the input or to the output signal depending upon whether they
correspond to the coefficients of the recursive or non-recursive part. Consequently, to simplify the
analysis, only the case of single quantization is considered. (By modifying the power of the injected
noise, it is always possible to reach a scenario where only single quantization is required).
The error signal applied to the input of the filter undergoes the filtering function and, by applying
equation (4.25), the power of the round-off noise at the output is:
1| |2
q2 | N(f ) | df
Nc = (7.63)
12 ∫0 | D(f ) ||
|
or, as a function of the set h(k), the impulse response of the filter,
q2 ∑
∞
Nc = |h(k)|2 (7.64)
12 k=0
The implementation of the cascade structure presents some possibilities for reducing this noise
power [11].
When the filter is realized as a cascade of N/2 second-order sections, the round-off noise pro-
duced in each section undergoes the filtering function of that section and of the following ones. In
this case, it should be noted that the amplitude, or level, at the input of each section, varies with
the rank of the section and the frequency of the signal being considered.
If the rounding procedure is the same for all sections, the noise produced is the same and the con-
tributions are added to each other. The total noise at the output of the filter under these conditions
has a power of:
( )
q2 ∑ 1∏
N∕2 N∕2
| Ni (f ) |2
Nc = | | df (7.65)
12 j=1 ∫0 l=j || Di (f ) ||
It is important to arrange the cascade of sections in such a way that the total round-off noise is
minimized, and the parameters are available:
(1) The pairing of the poles and zeros to form a section.
(2) The order in which the sections are arranged.
(3) The scaling factor applied to each section.
These three parameters will be examined in turn.
(1) The pairing of the poles and zeros. The products
∏
N∕2
| Ni (f ) |2
Pj (f ) = | |
| Di(f ) |
l=j | |
have to be minimized, which means that each of the factors is minimized and, in particular,
the lowest maximal value for each factor must be obtained. This condition is approximately
fulfilled by the very simple procedure of associating the pole nearest to the unit circle with its
closest zero, then the next pole with the zero which is then closest, and so on.
(2) Determining the order of the sections. The factor making the largest contribution to the
total noise is often the one which has the highest maximal value. It can be worthwhile
7.2 Direct Calculations of the Coefficients Using Model Functions 169
to place it at the beginning of the chain so that its contribution appears only once in the
total sum following equation (7.65) and to connect the sections in decreasing order of their
maxima.
(3) Calculating the scale factors. These are the parameters which control the scaling of the num-
bers in the internal data memories. They are calculated for each section so as to maximize the
amplitude of the signal while avoiding clipping.
Once the scale factors are known, all the elements involved in the implementation are available
and the power of the rounding noise at the output of the filter can be determined for each value of
the number of bits in the internal data memories [12–14].
sections whose properties are given in Section 6.3. Experience has shown that the FIR filter, which
exhibits perfectly linear phase, always requires fewer calculations [15]. It is also easy to implement.
Finally, it is recommended that FIR filters be used when linear phase is required and that IIR
filters be used in other cases.
Nevertheless, the comparison above has been made with the implicit hypothesis that the sam-
pling rate is the same at the input and output of the filter. The bases for the comparison are notice-
ably modified if this constraint disappears, as will be shown in a later chapter.
Exercises
7.1 Using the formulae given in Section 7.1, calculate the frequency and phase response and the
group delay of the filter section defined by the relation:
Using the same formulae, calculate the frequency and phase response and the group delay
of the second-order section with the Z-transfer function:
b2 + b1 Z −1 + Z −2
H(Z) =
1 + b1 Z −1 + b2 Z −2
7.2 It is proposed to use charts of analog filters in order to calculate a digital band-pass filter.
What specifications should be used in order for the digital filter to reject the signal compo-
nents in the bands (0−0.15) and (0.37−0.5) and show no attenuation in the band (0.2−0.33),
assuming fs = 1? Study direct calculation using a low-pass transformation and 7.31.
7.3 Calculate the coefficients of a Butterworth filter of order 4 whose amplitude has the value
2−1/2 at the frequency fc = 0.25. Give the decomposition into second-order sections.
7.4 Use a frequency transformation to transform the band-pass filter in Section 7.2.4 (Figure 7.6)
into a high-pass filter with a pass-band limit of fH = 0.4. How are the poles and zeros changed
in this operation?
7.6 The specification of the above filter is increased by 0.1 dB in order to permit rounding of the
filter coefficients. How many bits are required to represent the coefficients in the cascade
structure? Estimate how many bits are required to represent the coefficients for the parallel
structure. Find an optimum for rounding the coefficients. Can the number of bits found
earlier be reduced?
References 171
7.7 Does the filter given in Section 7.2.3 exhibit auto-oscillations? What are the frequencies and
the amplitudes? Answer the same question for the filter described in Exercise 5.
7.8 How many multiplications are required by the filter in Figure 7.5? How many memory
locations are necessary? How many coefficients are required by an FIR filter for the same
specification? Compare the number of multiplications and the memory capacities.
7.9 It is desired to achieve the channel filtering function in a PCM transmission terminal by
digital methods. The telephone signal is sampled at 32 kHz and coded into 12 bits, and the
filtering is carried out by a low-pass IIR filter. The pass band is 3300 Hz, and the stop band
begins at 4600 Hz.
The ripples in the pass and stop bands have value of:
𝛿1 ≤ 0.015; 𝛿2 ≤ 0.04
A computer program for elliptic filters produces the following results: order of filter: N = 4
zeros: z1 = 0.09896 ± j0.995
z2 = 0.5827 ± j0.8127
poles: P1 = 0.6192 ± j0.2672
P2 = 0.702 ± j0.589
Calculate the transfer function of the filter decomposed into second-order sections.
What is the value of the overall scaling factor, knowing that the amplitude at frequency
0 is 0.99?
The coefficients are quantized into 10 bits. Determine the displacement of the infinite atten-
uation frequencies and evaluate the additional pass-band ripple.
Calculate the scaling factor to be assigned to each section and estimate the round-off noise
produced if the data memories have 16 bits.
Give the complete diagram for the filter.
Evaluate the complexity in terms of:
(1) The number of multiplications and additions per second.
(2) The number of memory bits.
Calculate the filter order. By taking the order N = 6, a margin on the in-band ripple is avail-
able. Determine the coefficient wordlength.
If the signal-to-noise ratio degradation is limited to 0.1 dB, give the increase of the internal
data wordlength with respect to the input data wordlength.
References
1 A. Oppenheim and R. Schafer, Digital Signal Processing, Prentice Hall, Englewood Cliffs, NJ,
1974, Chapters 5 and 9.
2 L. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Prentice Hall,
Englewood Cliffs, NJ, 1975, Chapters 4 and 5.
172 7 Infinite Impulse Response Filters
3 R. Boite and H. Leich, Les filtres numériques: analyse et synthèse des filtres unidimensionnels,
Masson, Paris, 1980.
4 B. Gold and C. Rader, Digital processing of signals, McGraw-Hill, New York, 1969.
5 A. G. Constantinides, Spectral transformation for digital filters. Proceedings of the IEEE, 8, 1970.
6 A. Deczky, Synthesis of recursive digital filters using the minimum p-error criterion. IEEE
Transactions on Audio and Electroacoustics, 20, 1972.
7 R. Fletcher and M. J. D. Powell, A rapidly convergent descent method for minimization.
Computer Journal, 6(2), 1963.
8 J. P. Thiran, Equal ripple delay recursive filters. IEEE Transactions on Circuit Theory, 1971.
9 T. Durrani and R. Chapman, Optimal all-pole filter design based on discrete prolate spheroidal
sequences. IEEE Transactions, ASSP32(4), 716–21, 1984.
10 P. P. Vaidyanathan, S. K. Mitra and Y. Neuvo, A new approach to the realisation of low sensitiv-
ity IIR filters. IEEE Transactions, ASSP34(2), 350–61, 1986.
11 L. B. Jackson, Round-off noise analysis for fixed point digital filters in cascade or parallel form.
IEEE Transactions on Audio and Electroacoustics, 1970.
12 A. Peled and B. Liu, Digital Signal Processing: Theory, Design and Implementation, John Wiley,
New York, 1976.
13 Von E. Lueder, H. Hug and W. Wolf, Minimizing the round-off noise in digital filters by
dynamic programming. Frequenz, 29(7), 211–14, 1975.
14 D. Mitra, Large-amplitude, self-sustained oscillations in difference equations, which describe
digital filter sections using saturation arithmetic. IEEE Transactions, ASSP25(2), 1977.
15 L. Rabiner, J. F. Kaiser, O. Herrmann and M. Dolan, Some comparison between FIR and IIR
digital filters. Bells Systems Technical Journal, 53, 1974.
173
The filter structures presented in the previous chapters are deduced directly from their Z-transfer
functions, with the coefficients applied to the multiplying circuits being those of the powers of Z −1 .
More elaborate structures can be developed.
In analog filtering, structures exist which allow filters with very low ripple and excellent
selectivity to be constructed using passive components of limited precision. In digital filtering,
these properties can be translated into a reduction in the round-off noise and in the number of
bits representing the coefficients.
Analog filter networks are based on cascading two port circuits whose properties will be
considered first [1].
Digital Signal Processing: Theory and Practice, Tenth Edition. Maurice Bellanger.
© 2024 John Wiley & Sons Ltd. Published 2024 by John Wiley & Sons
174 8 Digital Ladder Filters
E
V1 V2 R2
R1
The transmission and reflection coefficients of the circuit can be demonstrated if another
matrix – the distribution matrix – is introduced. If a reference case is defined with unit terminating
resistors, the incident and reflected waves a and b can be written as:
1
a = (𝜐 + i) (8.2)
2
1
b = (𝜐 − i) (8.3)
2
The relations between variables a and b can be obtained using equation (8.1):
1
a = (z + I2 )i
2
[ ]
1 0
İ 2 =
0 1
1
b = (z − I2 )i
2
Thus,
b = Sa (8.4)
where:
[ ]
S11 S12
S=
S21 S22
and:
S = (z − İ 2 )(z + İ 2 )−1 (8.5)
If the circuit is reciprocal, then:
S12 = S21 = 𝜏 (8.6)
where 𝜏 is the transmission coefficient:
2V2
𝜏= (8.7)
E
If the input and output impedances of the circuit are z1 and z2 , respectively, one can write:
z1 − 1 z2 − 1
S11 = 𝜌1 = ; S22 = 𝜌2 = (8.8)
z1 + 1 z2 + 1
The values 𝜌1 and 𝜌2 are the reflection coefficients at the input and output of the two-port circuit.
If the circuit is not dissipative, the power that it absorbs is zero, and it can be shown that the
distribution matrix of such a reciprocal two-port network has the form [2]:
[ ]
1 h f
S= (8.9)
g f ±h∗
where f, g, and h are real polynomials with the following properties:
8.1 Properties of Two-Port Circuits 175
(1) They are linked by a relation which, on the imaginary axis, corresponds to:
|g|2 = |h|2 + |f |2
The notation h* (p) indicates h(− p).
(2) Depending upon whether f is of even or odd degree, the lower or upper sign is taken in
equation (8.9).
(3) Each root of g in the complex plane lies in the left-hand half plane.
The polynomials f , g, and h are the characteristic polynomials of the circuit. The roots of f (p) are
generally on the imaginary axis in the stop band and are transmission zeros. The roots of h(p) are
attenuation zeros, and for a non-dissipative network, they are generally on the imaginary axis in
the pass band.
For the circuit in Figure 8.1, the transmission coefficient is:
√
2V2 R2
S12 = (8.10)
E R1
The attenuation in decibels is denoted by the function Af (𝜔), where:
( )
| g(𝜔) |2
| = 10 log 1 + |h|
2
Af (𝜔) = −10 log |S12 (𝜔)|2 = 10 log || |
| f (𝜔) | |f |2
The relation
| f (𝜔) |2 | h(𝜔) |2
| | | |
| g(𝜔) | + | g(𝜔) | = 1 (8.11)
| | | |
expresses the fact that the non-transmitted power is reflected.
For a cascade arrangement, it is important to pay equal attention to the transfer matrix t
defined by:
[ ] [ ]
b1 a
=t 2 (8.12)
a1 b2
The cascade arrangement is represented by the product of the transfer matrices.
The transfer matrix of non-dissipative two-port circuits takes the form:
[ ]
1 ±g∗ h
t= (8.13)
f ±h∗ g
By way of example, Figure 8.2 gives the transfer matrices of several elementary circuits.
The fact that the circuit element is non-dissipative has important consequences for the atten-
uation Af (𝜔). In the pass band, Af (𝜔) cannot take negative values. Consequently, at frequencies
where h(𝜔) is zero, its derivative with respect to any of the parameters must also be zero. In a filter
with inductances and capacitances, terminated by resistors, variation of the values of the L and S
elements does not affect the attenuation to the first order at frequencies where it is zero.
If the ripple is small, it can be assumed that this property applies over the whole pass band. In
practice, it can be taken that, in a ladder filter, for example, the interactions between the different
branches are such that a perturbation in one element has repercussions for all the other factors of
the attenuation function, having an overall compensating effect which minimizes the incidence of
the perturbation.
Given this behavior, it is of interest to find digital filter structures which have similar properties.
In effect, in a digital filter where the amplitudes of the ripples in the pass and stop bands are similar,
176 8 Digital Ladder Filters
1–LP/2 1 1
LP/2 1 p– 2C 2C
p 1 1
–LP/2 1+LP/2 – p+ 2C
2C
C L
1 1
1–CP/2 –CP/2 p– –
1 2L 2L
p 1 1
CP/2 1+CP/2 2L
p+ 2L
the denominator of the transfer function determines the number of bits required to represent the
coefficients. Structures derived, for example, from ladder analog filters can therefore be expected
to lead to significant gains in the coefficient wordlengths, in the complexity of the multipliers, and
also in the power of the round-off noise.
Ladder structures are the most commonly used type in passive analog filtering. The procedure
for obtaining the elements of such a structure, using a transfer function, is described in detail in
Ref. [2]. It consists of factorizing the overall transfer matrix, defined using the calculated transfer
function H(𝜔) into partial matrices corresponding to the series and parallel arms of the ladder
structure.
The most direct approach for obtaining a digital filter structure from an analog ladder filter is to
simulate its voltage–current flow chart.
V2 V4 VN–2 VN
R1 Z3 ZN–3 ZN–1 R2
I1 I3 IN–1 IN+1
E Y2 Y4 YN–2 YN
–1
–R1–1 –R2–1
E
R1 I1 V2 I3 V4 IN–1
VN
1 Y2–1 1 Z3–1 1 Y4–1 –1
ZN–1 1 YN–1
–1 –1
A particularly simple case occurs when the series impedors Z K−1 are inductors and the paral-
lel branches y𝜅 are capacitors (K = 4, 6, …, N). Such filters are purely recursive and are without
frequencies of infinite attenuation.
The transfer functions to account for them take the form:
−1 R 1
ZK−1 = ; YK−1 = (8.14)
sLK−1 sCK R
where s is the Laplace variable and R is a normalization constant. Both cases involve the transfer
functions of integrators which are easily produced using operational amplifiers and R−C networks.
The diagram in Figure 8.5 then appears, being deduced from Figure 8.4. It represents the functions
to be implemented and shows the circuit diagram using integrators.
R R
R1 R2
ER
R1 – 1 – – 1 VN
Σ Σ Σ R Σ R Σ
– sC2R sL3 – sLN–1 sCNR
– – VN
Σ Σ
R R
R1 R2
1 R R 1
sC2R sL3 sLN–1 sCNR
ER –
Σ Σ Σ
R1 – –
The digital realization consists of replacing each integrator with an equivalent function. In
Ref. [3], it is shown that the only digital integrator circuit, which is simple to realize, and which
is equivalent to an analog integrator, is the one represented by Figure 8.6 and whose Z-transfer
function, i(z), is written as:
aZ−1∕2
I(Z) = (8.15)
1 − Z −1
The equivalence between the analog and digital integrators is obtained in the same way as for
any transfer function, by replacing Z with ej𝜔T . Thus,
ae−j𝜔(T∕2) a 1
I(𝜔) = = (8.16)
1 − e−j𝜔T 2j sin(𝜔(T∕2))
where T is the sampling period of the digital circuit. This function is equivalent to the analog
function a/j𝜔T with frequency warping. If f A denotes the analog frequency and f N is the digital
frequency, we can show:
𝜋 fA T = sin(𝜋 fN T) (8.17)
The frequency warping thus introduced is different from that obtained with the bilinear trans-
form introduced in the previous chapter, as shown in Figure 8.7, and it must be taken into account
when calculating a filter from a specification.
The circuit in Figure 8.6 shows the disadvantage of introducing the function Z −1/2 , which cor-
responds to an additional memory circuit, but the transfer function of a ladder filter is not altered
when the impedances of all branches are multiplied by the same function [3]. This property has
already been used to introduce the normalization constant R. If the impedances are multiplied by
Z −1/2 , this term is eliminated from all of the Z-transfer functions of the integrators in the circuit,
which then become:
TR 1
Ii (Z) = for odd i
Li 1 − Z −1
Z–1
1
2T
1
πT
Sine
transform
0 1 fN
2T
8.2 Simulated Ladder Filters 179
and:
T Z −1
Ii (Z) = for even i
Ci R 1 − Z −1
In contrast, the termination resistances are transformed to R1 Z −1/2 and R2 Z −1/2 ; the terminations
are no longer purely resistive, as they have the transfer functions:
R1 e−j𝜋fT ; Ṙ 2 e−j𝜋fT
This effect can be neglected when the sampling frequency is large compared to the pass band,
as there is an insignificant change in the transfer function of the filter. Further, the resistances R1
and R2 can be chosen as unity, as can the normalization constant R. The circuit of the digital filter
obtained under these conditions is given in Figure 8.8.
The coefficients have the following values, for odd-order N:
T T T N −1
aN = ; a2i−1 = ; a2i = ; i = 1, 2, … (8.18)
CN+1 C2i L2i+1 2
The filter thus realized involves N multiplications and N memories for a transfer function of
order N. For these parameters, the structure is canonic. The number of additions is 2N + 1.
To summarize, the calculation of a simulated ladder digital filter from an imposed specification
involves the following stages:
(1) Transposing the specification by modifying the frequency axis using equation (8.17) above.
(2) Calculating the elements of a filter with passive ladder elements LC, satisfying the transposed
specification.
(3) Using the values of the elements so obtained to calculate the coefficients ai (i = 1, 2, …, N) of
the digital filter using equations (8.18).
The chief attraction of the structure obtained in this way is that the coefficients can be represented
by a very small number of bits. Also, some multiplications can be replaced by simple additions,
and, in certain cases, all the multiplications of the filter can be eliminated, resulting in signifi-
cant savings in the circuitry. To illustrate this property, let us consider a low-pass filter of order
N = 7 [3], with elements having the following values (Figure 8.5):
R = R1 = R9 = 1
C2 = 1.2597 = C8
L3 = 1.5195 = L7
C4 = 2.2382 = C6
L5 = 1.6796
x(n) – – – y(n)
+ + +
a1 Z–1 + Z–1 + aN
+ +
– –
Af(f)
(dB)
0.5
3 bits
0.4
0.3
5 bits
0.2
10 bits
0.1
0 0.5 1 f
Figure 8.9 Ripples in the pass band for various coefficient wordlengths.
The coefficients ai (i = 1, 2, …, N) of the corresponding simulated ladder digital filter are calcu-
lated with a sampling period T = 1/fs = 0.01, from equation (8.18) above. The ripples of the filter in
the pass band are shown in Figure 8.9, where the coefficients are represented by 10, 5, and 3 bits. It
is notable that, when represented with 5 bits, the attenuation zeros are conserved. With 3 bits, they
are also conserved except for the one closest to the transition band. Thus, the insensitivity to the
first order of the attenuation zeros, which was shown in Section 8.1, is demonstrated. In compari-
son with the cascade structure in the previous chapter, this example shows an estimated gain of 4
or 5 bits for representing the coefficients.
The technique described in this section can be extended to filters other than the purely recursive
low-pass filter, but the designs will be more complicated. Also, the need to have a sampling fre-
quency which is large in comparison with the pass band is not conducive to efficient processing.
In practice, the simulated ladder structure is primarily used with a different method of performing
the calculations – that used in switched-capacitor devices.
Strictly speaking, because they do not use arithmetic operations, filters using switched-capacitor
devices are not digital filters. Nevertheless, they do use the same design methods and are comple-
mentary to digital filters. They are frequently used in analog–digital conversion circuits.
The basic principle, which is presented in detail in Reference [4], is as follows. Switching a capac-
itor S between two voltages V 1 and V 2 at frequency f s is equivalent to introducing a resistance R
given by:
1
R= f
C s
between the two potentials. In effect, as is shown in Figure 8.10, the capacitor is charged to voltages
V 1 and V 2 alternately and a charge transfer C(V 1 − V 2 ) results. If the operations are performed at
frequency f s , a current i:
i = C(V1 − V2 )fs
V1 V2 V1 V2
C (Q = CV1) C (Q = CV2)
V0 C2 V0 C2
R
Σ – –
– V2 C1 V2
+ +
V1 V1
dV2 1 ΔV2 C1
=– (V0 –V1) = –fe (V0 –V1)
dt RC2 Δt C2
This equivalent resistor is inserted in an integrator circuit as shown in Figure 8.11. The integrator
being considered has an adder at its input, like those in Figure 8.5. The equation describing the
operation of the analog integrator is also shown in Figure 8.11. In the switched capacitor version,
the capacitor C1 is alternately connected at frequency f s across the input of the operational
amplifier, and between voltages V 0 and V 1 . The equation for the variation ΔV 2 in the output
voltage during the interval Δt, which is assumed to be large in comparison with the period l/f s , is
shown in the diagram.
The condition that the two types of integrators are equivalent is:
1
C1 = (8.19)
fs R
However, in order to completely analyze the switched-capacitor integrator, it is necessary to take
account of the sampling [5] and to calculate its Z-transfer function. Assume that 𝜐e (t) is the input
signal and that 𝜐2 (t) is the output one. The sampling period T is assumed to be divided into two
equal parts. The capacitor C1 is connected to the input, to the integrator for time T/2,
( and)to the
voltage 𝜐e (t) itself for time T/2. Let us assume that this is between times nT and n + 12 T . The
charge transmitted to the integrator is Q(nT) such that:
[( ) ]
1
Q(nT) = C1 𝜐e n + T
2
Under these conditions, at time (n + 1)T, the output voltage is:
C [( ) ]
1
𝜐2 [(n + 1)T] = V2 (nT) − 1 𝜐e n + T
C2 2
By taking the Z-transform of the two components, one has:
V2 (Z) C Z −1∕2
= H(Z) = − 1 (8.20)
Ve (Z) C2 1 − Z −1
One finds the same type of transfer function as was given by equation (8.15) for digital circuits.
The switched-capacitor integrator performs in exactly the same way as the digital circuits described
182 8 Digital Ladder Filters
in the previous section, and the same warping of the frequency axis is involved. It should be noted
that, to ensure that no further delay is introduced and that this function is conserved when cascad-
ing two integrators, the capacitors of the two integrators must be switched in antiphase.
A design involving a switched-capacitor device for a simulated ladder filter, like that in Figure 8.5,
is obtained by substituting integrator circuits and calculating the value to be given to the switched
capacitors in each case.
Example: Assume an implementation using switched capacitors for a Butterworth filter of
order 4, with an analog circuit as shown in Figure 8.12(a).
The procedure described in the previous section results in the design (Figure 8.12(b)) for produc-
ing a filter from integrators, assuming unit terminal resistances. The switched-capacitor design is
shown in Figure 8.12(c). The coefficients ai (i = 1, 2, 3, 4) which define the ratios of the capacitances
are given by equation (8.18).
If the filter has a 3 dB attenuation frequency f c equal to 1 kHz, the analog parameters are as
follows:
R1 = R2 = 1
C2 = 121.8 × 10−6 ; C4 = 294.1 × 10−6
L3 = 294.1 × 10−6 ; L5 = 121.8 × 10−6
R1 L3 L5
E Vs
C2 C4 R2
(a)
– –
Σ Σ Vs
1 R 1 R
jωRC2 jωL3 jωRC4 jωL5
–
E Σ Σ Σ
– –
a2 c2 (b) a4 c4 Vs
C1 C2 C3 C4
E
a1 c1 a3 c3
a1 c1
(c)
The lattice structure occurs in the analysis and synthesis of speech for simulating the vocal tract,
and also more generally in systems for linear prediction. It allows the realization of finite impulse
response (FIR) and infinite impulse response (IIR) filters [6].
Consider the structure with M sections as represented in Figure 8.13. The outputs y1 (n) and u1 (n)
of the first section are related to the input set x(n) by:
Similarly, y2 (n) and u2 (n), the outputs of the second element, are related to the inputs by:
By iteration, yM (n) and uM (n), the outputs produced by the filter are related to the set x(n) by the
following equations, which correspond to FIR filtering:
∑
M
yM (n) = ai x(n − i) (8.23)
i=0
∑
M
uM (n) = aM−i x(n − i) (8.24)
i=0
The two FIR filters thus obtained have the same coefficients but in reverse order. Their Z-transfer
functions, H M (Z) and U M (Z), are image polynomials. Thus,
∑
M
HM (Z) = ai Z −i
i=0
∑ M
UM (Z) = aM−i Z −i = Z −M HM (Z −1 ) (8.25)
i=0
An iterative process is used to determine the coefficients ki of the lattice filter from the coefficients
ai (1 ⩽ i ⩽ M). Firstly, the coefficient a0 is assumed to be equal to unity. Then it is straightforward,
using the equations given earlier (and also directly from Figure 8.13), to prove that:
kM = a M
184 8 Digital Ladder Filters
x(n)
k1 k2 kM
This point is the basis for the calculation. By using H m (Z) and U m (Z) (1 ⩽ m ⩽ M) to denote the
corresponding transfer functions at the outputs of the mth section, the following matrix equation
can be written:
[ ] [ 1 k Z −1 ] [ ]
Hm (Z) m Hm−1 (Z)
=
Um (Z) km Z −1 Um−1 (Z)
kM k2 k1
Similarly, the sets x2 (n), x1 (n), u1 (n), and u2 (n) are related by:
x1 (n) = x2 (n) − k2 u1 (n − 1)
u2 (n) = k2 x1 (n) + u1 (n − 1)
This results in the transfer function H 2 (Z) between the input x2 (n), and the output y(n) given by:
1
H2 (Z) =
1 + k1 (1 + k2 )Z −1 + k2 Z −2
Similarly, the transfer function U 2 (Z) relates u2 (n), and y(n) where:
U2 (Z) = k2 + k1 (1 + k2 )Z −1 + Z −2
By iteration, it can be seen that the sets xM (n) and y(n), on the one hand, and uM (n), and y(n) on
the other, are related by the equations:
∑
M
y(n) = xM (n) − bi y(n − 1) (8.28)
i=1
∑
M−1
uM (n) = bM−i y(n − i) + y(n − M) (8.29)
i=0
The coefficients ki (1 ⩽ i ⩽ M) of the lattice filter are calculated by iteration from the coefficients
bi of the IIR filter, after noting that:
kM = bM
By using H m (Z) and U m (Z) to denote the relating functions for the set of m sections (1 ⩽ m ⩽ M),
it is possible, using the equations of definition:
xm−1 (n) = xm (n) − km um−1 (n − 1)
um (n) = km xm−1 (n) + um−1 (n − 1)
to produce the following matrix equation:
[ ] [ 1 k Z −1 ] [ ]
Dm (Z) m Dm−1 (Z)
Um (Z)
= −1 Um−1 (Z)
km Z
186 8 Digital Ladder Filters
+ Z–1
u'n u'n–1
As in the case of FIR filters, this matrix equation is also written, for km ≠ 1:
[ ] [ ]
Dm−1 (Z) 1 1 −km [Dm (Z) ]
Um−1 (Z)
= 2 Um (Z)
(8.31)
1 − km −km Z Z
As with FIR filters, this expression allows for the calculation of the coefficients ki (1 ⩽ i ⩽ M) of
the IIR lattice filter in M iterations, using the polynomial DM (Z), where:
∑
M
DM (Z) = 1 + bi Z −i
i=1
The lattice structures given in Figures 8.13 and 8.14 are canonical for the data memories but not
for the multiplications. They can be made canonical, for an IIR type filter, for example, by using
the single-multiplication section represented in Figure 8.15.
However, a further addition is then necessary. The equations of this first-order section are as
follows:
(1 + k)x1 (n) = y(n) + ky(n − 1)
(1 + k)u1 (n) = ky(n) + y(n − 1)
To within a factor of (1 + k), they are equivalent to the equations for a two-multiplier lattice.
In contrast with the structures described in the previous sections, lattice filters do not have any
particular advantage for the number of bits needed to represent the coefficients. Nevertheless, in
practice, they have one interesting property in that a necessary and sufficient condition for an IIR
filter to be stable and have its poles inside the unit circle is that the coefficients have a modulus of
less than unity:
1≤i≤M
This property is obvious for k1 in Figure 8.14 if the appropriate section is isolated. It can be
extended to the other coefficients by considering the subcircuits and using recurrence.
This results in a control of the stability which can be realized quite simply and is of particular use
in systems such as adaptive filters, where the values of the coefficients are constantly changing.
The lattice structures considered above are either non-recursive or purely recursive. Note that
the purely recursive structure can be supplemented to make a general filter; it is sufficient to form
a weighted sum of the variables um (n). That is, the expression:
∑
M
vM = 𝛾0 y(n) + 𝛾m um (n)
m=1
defines FIR-type filtering of the signal y(n) by virtue of equation (8.29). As the coefficients
bi (1 ≤ i ≤ M) are fixed, the coefficients 𝛾 i can be determined to obtain any numerator for the
general filter.
8.5 Comparison Elements 187
It is also useful to observe that the purely recursive structure consists of the pure all-pass function.
In fact, equations (8.29) and (8.30) yield:
UM (Z) bM + bM−1 Z −1 + … + Z −M
HD (Z) = =
X(Z) 1 + b1 Z −1 + … + bM Z −M
This expression shows that, as indicated in Section 6.3, the signal uM (n) is the output of an all-pass
network with x(n) as the input. The transfer function H D (Z) can be expressed directly as a function
of the lattice coefficients by a continued fraction:
( 2
) −1
1 − kM Z
HD (Z) = kM + 1
−1
kM Z +
k Z −1
(1−kM−1
2
)Z−1
m−1 kM−1 Z −1 +
⋱ ⋱
1
+
(1−k12 )Z−1
k1 + k1 Z −1 +1
This observation can be used to calculate the poles of the lattice filter directly [7].
An interesting application of the above results is the implementation of the notch filter intro-
duced in Section 6.3. The notch filter output yN (n) is obtained simply by incorporating one more
adder into Figure 8.14 to carry out the following operation:
yN (n) = xM (n) + uM (n) (8.32)
For order 2, the transfer function of the all-pass circuit is:
k2 + k1 (1 + k2 )z−1 + z−2
HD (Z) =
1 + k1 (1 + k2 )z−1 + k2 z−2
A very peculiar property of the approach is that the frequency 𝜔0 and the 3 dB attenuation band
width B3N can be tuned independently [8]. This decoupling effect is due to the relationships:
k1 ≃ − cos 𝜔0
1 − tan 𝜋B3N
k2 ≃ = (1 − 𝜀)2 (8.33)
1 + tan 𝜋B3N
If a subtraction is performed instead of the addition in equation (8.32), the complementary filter
is obtained.
of additions is increased, as is the complexity of the sequence of operations. Extra memories are
required for storing the intermediate results. Also, multiplexing of the operations between several
filters – an important advantage of digital filters – becomes more difficult. In view of these factors,
a detailed evaluation is necessary before this type of structure can be used.
Exercises
8.1 Give the impedance and distribution matrices for the elementary two port networks of
Figure 8.2. Give the impedance, distribution, and transfer matrices when the elements are
resonant LC circuits.
8.2 Consider the Butterworth filter of order 4 given in Figure 8.12(a). Draw the corresponding
flow chart. Show the circuit of the digital simulated ladder filter and calculate the coefficients
from the given values for the analog elements for a sampling frequency of 40 kHz. Study
the modification of the transfer function in the pass band introduced by a reduction in the
sampling frequency from 40 kHz to 10 kHz.
8.3 Let us consider the low-pass Chebyshev filter of order 7, whose analog elements are given in
Section 8.2. The sampling frequency is taken as 10 kHz. Show the design for the correspond-
ing simulated ladder filter.
What is the number of operations to be carried out in each realization?
Give the frequency response when the coefficients are represented by 5 bits.
8.4 Calculate the frequency response of the lattice filter given as the example in Section 8.4. How
does this response evolve when the parameters are represented by 5 bits? Give the circuit of
the filter with single multiplication elements. How should the circuit be modified to produce
the filter using an inverse Z-transfer function?
References
Complex signals in the form of sets of complex numbers are currently used in digital signal anal-
ysis. Some examples of these sets are presented in the chapters on discrete Fourier transforms. In
this chapter, analytic signals – a particular category of complex signal – will be studied. Such sig-
nals exhibit some interesting properties and occur primarily in modulation and multiplexing. The
properties of the Fourier transforms of real causal sets will be examined first [1–3].
9.1 The Fourier Transform of a Real and Causal Set

The Fourier transform of this set is obtained by replacing Z with e^{j2πf} in X(Z):

$$X(f) = \sum_{n=-\infty}^{\infty} x(n)\,e^{-j2\pi nf}$$
The values of X(f ) at negative frequencies are complex conjugates of the values at positive
frequencies. The supplementary condition of causality can be imposed on the set x(n) and the
consequences for X(f ) will now be examined.
The function X(f) can be separated into real and imaginary parts: X(f) = X_R(f) + jX_I(f). If the set x(n) is real, then using equation (9.1), the function X_R(f) is even. Thus, it is the Fourier transform of an even set x_p(n). The function X_I(f) is the Fourier transform of an odd set x_i(n) such that:

$$x_p(n) = x_p(-n); \qquad x_i(n) = -x_i(-n); \qquad x(n) = x_p(n) + x_i(n) \qquad (9.3)$$
Figure 9.1 Decomposition of a causal set into even and odd parts.
and hence,

$$X_R(f) - x(0) = \sum_{n=1}^{\infty} x(n)\cos(2\pi nf) \qquad (9.6)$$

$$X_I(f) = -\sum_{n=1}^{\infty} x(n)\sin(2\pi nf) \qquad (9.7)$$
It can be seen that these two functions are related. To go from one to the other, it is sufficient to
change cos(2𝜋nf ) to − sin(2𝜋nf ) or vice versa. Such an operation is called quadrature, and it will
now be expressed analytically.
By definition, a causal set is one which satisfies the equality x(n) = x(n)Y(n), where Y(n) = 1 for n ≥ 0 and Y(n) = 0 for n < 0. This set is a sampling of the unit step function Y(t) which, in terms of distributions, has a Fourier transform FY given in Reference [2]:

$$FY = \frac{1}{j2\pi}\,\mathrm{vp}\!\left(\frac{1}{f}\right) + \frac{1}{2}\,\delta(f) \qquad (9.8)$$
The equations which relate the real and imaginary parts of X(f) follow from this causality condition. For example, for the unit sample set U_k defined by:

$$U_k(n) = 0 \ \text{for } n \neq k; \qquad U_k(k) = 1$$

the two parts of the transform are:

$$X_R(f) = \cos(2\pi kf); \qquad X_I(f) = -\sin(2\pi kf)$$
9.2 Analytic Signals

Analytic signals correspond to causal signals in which time and frequency are exchanged. Their
spectrum has no negative frequency component, and their name derives from the fact that they
represent the restriction to the real axis of an analytic function of a complex variable – that is, they
can be expanded in series in a region which contains that axis. The properties of analytic signals
are deduced from the properties of causal signals by exchanging time and frequency.
Consider the signal x(t) = xR (t) + jxI (t) such that:
The functions xR (t) and xI (t) are Hilbert transforms of each other:
$$x_R(t) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{x_I(t')}{t - t'}\,dt' \qquad (9.16)$$

$$x_I(t) = -\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{x_R(t')}{t - t'}\,dt' \qquad (9.17)$$
The Fourier transform of the real function:

$$x_R(t) = \frac{1}{2}[x(t) + \bar{x}(t)] \qquad (9.18)$$

is the function X_R(f) such that:

$$X_R(f) = \frac{1}{2}[X(f) + \bar{X}(-f)] \qquad (9.19)$$

that is, X_R(f) = X(f)/2 for positive frequencies and X_R(f) = X̄(−f)/2 for negative ones. Similarly:

$$X_I(f) = -j\,\frac{1}{2}[X(f) - \bar{X}(-f)] \qquad (9.20)$$
Figure 9.2 shows the decomposition of the spectrum of a signal into real and imaginary parts.
Figure 9.2 Decomposition of the spectrum of a signal into real and imaginary parts.
Example:

$$x(t) = e^{j\omega t}; \qquad x_R(t) = \cos\omega t = \frac{1}{2}[e^{j\omega t} + e^{-j\omega t}]; \qquad x_I(t) = \sin\omega t = -j\,\frac{1}{2}[e^{j\omega t} - e^{-j\omega t}]$$
Finally, it can be seen that the following equations hold between X_R(f) and X_I(f):

$$X_I(f) = -jX_R(f) \ \text{for } f > 0; \qquad X_I(f) = jX_R(f) \ \text{for } f < 0$$
That is, X I (f ) is obtained from X R (f ) by a rotation of the components through 𝜋/2. The Hilbert
transform consists of orthogonalizing the signal components. This is a filtering operation, and the
frequency response Q(f ) is represented in Figure 9.3.
Example:

$$x_R(t) = \int_0^{\infty}[A(f)\cos(2\pi ft) - B(f)\sin(2\pi ft)]\,df \qquad (9.21)$$

$$x_I(t) = \int_0^{\infty}[A(f)\sin(2\pi ft) + B(f)\cos(2\pi ft)]\,df \qquad (9.22)$$
The properties of continuous analytic signals can be transferred to discrete signals after certain
modifications.
A discrete signal has a periodic Fourier transform. A discrete analytic signal x(n) deduced from
a real signal is a discrete signal whose Fourier transform X n (f ), which has the period f s = 1, is zero
for − 12 < f < 0 (Figure 9.4).
Figure 9.3 Frequency response Q(f) of the quadrature filter.
Figure 9.4 Spectrum X_n(f) of a discrete analytic signal and response H(f) of the reconstruction filter.
If a discrete signal x(n) is obtained by sampling a continuous analytic signal x(t) at a frequency
f s = 1, it is worth noting that the continuous signal can be reconstructed from the discrete values
by using a reconstruction filter which preserves only the signal components contained in the band
(0, f s ), as is shown in Figure 9.4. The reconstruction formula is:
$$x(t) = \sum_{n=-\infty}^{\infty} x(n)\,\frac{\sin[\pi(t-n)]}{\pi(t-n)}\,e^{j\pi(t-n)} \qquad (9.23)$$
Sampling does not introduce any degradation to an analytic signal x(t) if its spectrum does not
contain any components with frequencies greater than or equal to f s . Thus, the sampling theorem
for an analytic signal is:
An analytic signal which does not contain any components with frequencies greater than or equal to f_m is wholly determined by the set of its values sampled at time intervals of T = 1/f_m.
The set x(n) is decomposed into a real set xR (n) and an imaginary set xI (n), such that:
The corresponding Fourier transforms X nR (f ) and X nI (f ) are obtained from the Fourier transform
X n (f ) by equations (9.19) and (9.20) given above.
$$X_{nI}(f) = -jX_{nR}(f) \ \text{for } 0 < f < \frac{1}{2}; \qquad X_{nI}(f) = jX_{nR}(f) \ \text{for } -\frac{1}{2} < f < 0$$
The relations between the sets x_R(n) and x_I(n) are obtained by considering the quadrature filter whose frequency response is given in Figure 9.5. The impulse response of this filter is the set h(n), such that:
$$h(n) = \int_{-1/2}^{0} j\,e^{j2\pi nf}\,df + \int_{0}^{1/2}(-j)\,e^{j2\pi nf}\,df$$

$$h(n) = \frac{2}{\pi n}\sin^2\left(\frac{n\pi}{2}\right) \ \text{for } n \neq 0; \qquad h(0) = 0 \qquad (9.24)$$
By applying the set x_R(n) to this filter, we obtain the set x_I(n), thus:

$$x_I(n) = \frac{2}{\pi}\sum_{\substack{m=-\infty \\ m\neq 0}}^{\infty}\frac{\sin^2(\pi m/2)}{m}\,x_R(n-m) \qquad (9.25)$$
Figure 9.5 Impulse response h(n) of the quadrature filter.
and similarly:

$$x_R(n) = -\frac{2}{\pi}\sum_{\substack{m=-\infty \\ m\neq 0}}^{\infty}\frac{\sin^2(\pi m/2)}{m}\,x_I(n-m) \qquad (9.26)$$
The sets xR (n) and xI (n) are related by the discrete Hilbert transform [4].
Examination of the elements of the set h(n) leads to several observations. Firstly, the fact that
every other element is zero implies that, if the set xR (n) also has every other element zero, then so
must the set xI (n), and further, the sets xR (n) and xI (n) must be interleaved. An example will be
given later.
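As an illustrative sketch (not from the text), the ideal response (9.24) can be truncated and applied to a cosine; the output then approximates the corresponding sine, as equation (9.25) predicts. The truncation length and test frequency below are arbitrary choices.

```python
import numpy as np

# Truncated ideal quadrature filter, equation (9.24):
# h(n) = (2/(pi n)) sin^2(n pi / 2) for n != 0, h(0) = 0.
M = 31                                    # half-length, arbitrary
n = np.arange(-M, M + 1)
h = np.zeros(2 * M + 1)
nz = n != 0
h[nz] = (2.0 / (np.pi * n[nz])) * np.sin(n[nz] * np.pi / 2) ** 2

# Equation (9.25): convolving x_R with h approximates x_I.
t = np.arange(512)
x_r = np.cos(2 * np.pi * 0.05 * t)
x_i = np.convolve(x_r, h, mode="same")    # zero-phase alignment
err = np.abs(x_i[M:-M] - np.sin(2 * np.pi * 0.05 * t[M:-M])).max()
print(f"maximum deviation from the exact quadrature signal: {err:.4f}")
```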
9.3 Calculating the Coefficients of an FIR Quadrature Filter

The impulse response of the quadrature filter corresponds to a linear-phase FIR filter as described in Section 5.2. Its frequency response is:

$$Q(f) = -j2\sum_{n=1}^{\infty} h(n)\sin(2\pi nf) \qquad (9.27)$$
A realizable quadrature filter is easily obtained by limiting the number of terms over which
summation (9.27) is performed. The frequency response then deviates from the ideal response.
In practice, the filter is specified by setting a limit 𝛿 for the ripple in a frequency band (f 1 , f 2 ),
as shown in Figure 9.6. A satisfactory FIR quadrature filter can be obtained from a low-pass filter, using the results presented in Chapter 5.
One interesting example is that of the filter whose response is given in Figure 9.7. This filter is
called a half-band filter because the pass band represents half of the useful band.
Further, H(0.25) = 0.5, and the response is assumed to be antisymmetric about the point f_c = 0.25 – that is, H(0.25 + f) = 1 − H(0.25 − f). This filter is specified by the transition band
Δf and the ripple in the pass and stop bands, which are equal to δ0. Its coefficients can be calculated using conventional FIR filter design programs.

Figure 9.6 Specification of the quadrature filter: ripple δ in the band (f1, f2).

Figure 9.7 Frequency response of the half-band filter.

This type of filter is of interest because
the symmetry of its frequency response implies that the coefficients hn are zero for even values of
n except for h0 . Thus, if n = 2p, decomposition of H(f ) into the sum of two functions leads to:
$$h_{2p} = \int_0^{0.25}\cos(4\pi pf)\,df + (-1)^p\left[\int_0^{\Delta f/2} H(0.25+f)\cos(4\pi pf)\,df + \int_0^{\Delta f/2}[H(0.25-f)-1]\cos(4\pi pf)\,df\right]$$

and hence,

$$h_{2p} = 0$$
Translation of this response through 0.25 on the frequency axis leads to the function H′(f) such that:

$$H'(f) = H(f - 0.25) = e^{-j4\pi Mf}\,\frac{1}{2}\left[1 - 2\sum_{i=1}^{M}(-1)^i h_{2i-1}\sin[2\pi(2i-1)f]\right]$$
The coefficients of the translated filter take imaginary values. By comparing the expression for H′(f) with equation (9.27) for Q(f), it becomes clear that the set of coefficients a_n derived from the h_{2i−1} represents the coefficients of a quadrature filter whose ripple equals 2δ0 in the band:

$$\left[\frac{\Delta f}{2},\; \frac{1}{2} - \frac{\Delta f}{2}\right]$$

Figure 9.8 Conversion of a real signal into an analytic signal: delay branch and quadrature filter.
Example. For the specification δ0 = 0.01 and Δf = 0.111, it is found that M = 5.

a1 = 0.6283; a3 = 0.1880; a5 = 0.0904; a7 = 0.0443; a9 = 0.0231
The expression H ′ (f ) corresponds to a complex filter which contains two parts: a circuit
producing a delay of 2M elementary periods, and a quadrature filter, as shown in Figure 9.8. The
outputs of these two circuits form the real and imaginary parts of the complex signal. The system
can be said to comprise two branches – one real and the other imaginary. It allows a real signal to
be converted to an analytic signal and is thus an analytic FIR filter.
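A minimal sketch of the two-branch system of Figure 9.8, using the coefficients a1 to a9 of the example above; the indexing and sign conventions below are assumptions of the sketch and should be checked against the intended design.

```python
import numpy as np

a = {1: 0.6283, 3: 0.1880, 5: 0.0904, 7: 0.0443, 9: 0.0231}
M = 5
q = np.zeros(4 * M + 1)                    # quadrature branch, centered at 2M
for m, am in a.items():
    q[2 * M + m] = am                      # antisymmetric impulse response
    q[2 * M - m] = -am

t = np.arange(512)
x = np.cos(2 * np.pi * 0.1 * t)            # real input signal
delayed = np.concatenate([np.zeros(2 * M), x[:-2 * M]])  # delay of 2M samples
quad = np.convolve(x, q)[:len(x)]          # quadrature branch output
analytic = delayed + 1j * quad
# The spectrum should be almost one-sided: negative frequencies attenuated.
spec = np.fft.fft(analytic[4 * M:])        # discard the transient
ratio = np.abs(spec[len(spec) // 2:]).max() / np.abs(spec).max()
print(f"residual negative-frequency content: {ratio:.3f}")
```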
It should be noted that this property holds true even if the fundamental low-pass filter is not
of the half-band type. In this case, the coefficients with even index do not cancel out. In fact,
the translation through a frequency of 0.25 corresponds to multiplication of the coefficients by a
complex factor, so that the coefficients h′_n take the values:
Under these conditions, the real branch of the system is not a simple delay. A filtering function
is achieved at the same time as the analytic signal is generated.
Thus, circuits with finite impulse response allow an ideal quadrature filter to be approximated
without error in the phase shift but with approximation of the amplitude in the pass band. Circuits
with infinite impulse response, or recursive circuits, provide an alternative approach. By using
pure phase shifters, they allow for the approximation of the quadrature filter with no error in the
amplitude, but with an approximation in the phase.
A recursive phase shifter is characterized by the fact that the numerator and denominator of its
Z-transfer function are image polynomials. That is, they have the same coefficients but in reverse
order. The properties of phase shifters were introduced in Section 6.3.
It is possible to design a pair of phase shifters such that the output signals have a phase difference which approximates 90∘ with an error of less than ε, in a given frequency band (f1, f2). The calculation techniques are the same as for IIR filters. The procedure for producing a phase difference with elliptic behavior is as follows [5]:
By taking N = 5, we obtain:

p0 = −0.0395; z0 = 0.9240
p1 = −0.3893; z1 = 0.4396
p2 = −3.8360; z2 = −0.5864
p3 = −1.0039; z3 = −0.00197
p4 = −0.1509; z4 = 0.7377
In forming the circuit, the first three zeros are attributed to one branch and the last two to the other. The variation of the phase difference with frequency is given by the function ϕ(f) represented in Figure 9.9.
Figure 9.9 Phase difference ϕ(f) between the two branches, held between π/2 − ε and π/2 + ε in the band (f1, f2).
Recursive phase shifters allow two orthogonal signals to be obtained. It should be noted that in
performing this operation, they also introduce phase distortion which is the same for both signals.
These circuits can be used in modulation and multiplexing equipment.
9.5 Single Side-Band Modulation

Modulation of a signal results in a displacement of the spectrum on the frequency axis. It is sin-
gle side-band (SSB) modulation if, for a real signal, the part of the spectrum which corresponds
to positive frequencies is displaced toward positive frequencies and the part which corresponds
to negative ones is displaced toward negative frequencies. Thus, the signal s(t) = cos 𝜔t has the
modulated signal:
$$s_m(t) = \cos(\omega + \omega_0)t$$
The modulated signal is obtained in two steps:

(1) Form the analytic signal sa(n) = sR(n) + jsI(n) which corresponds to the real signal represented
by the set s(n).
(2) Multiply the set sa(n) by the set of complex numbers:

$$\cos(2\pi nf_0) - j\sin(2\pi nf_0)$$

and retain the real part s_m(n) of the set thus obtained.
The signal spectrum evolves as shown in Figure 9.10 and the corresponding circuits are shown
in Figure 9.11.
If the analytic filter is of the FIR type, the set sR (n) is simply the delayed s(n), which is different
from the case of 90∘ recursive phase shifters. The set which corresponds to the modulated signal
sm (n) can be added to other modulated sets to provide, for example, a frequency-multiplexed signal
in telephony.
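The two steps can be sketched as follows; to keep the example short, the analytic signal is taken as an ideal complex exponential, and the carrier sign is chosen here so that the spectrum is displaced toward higher frequencies, matching s_m(t) = cos(ω + ω0)t – conventions may differ from the text.

```python
import numpy as np

n = np.arange(1024)
f_sig, f0 = 0.02, 0.20                      # illustrative frequencies (fs = 1)
s_a = np.exp(2j * np.pi * f_sig * n)        # analytic version of cos(2 pi f_sig n)
s_m = np.real(s_a * np.exp(2j * np.pi * f0 * n))   # step 2: shift, keep real part
# s_m = cos(2 pi (f_sig + f0) n): a single displaced line, no image line.
spec = np.abs(np.fft.rfft(s_m))
print("spectral peak at bin", spec.argmax(),
      "expected", round((f_sig + f0) * len(n)))
```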
Figure 9.10 Evolution of the signal spectrum in single side-band modulation: S(f), Sa(f), and Sm(f).

Figure 9.11 Circuit for single side-band modulation.
This is the transfer function of a phase shifter of the second order, which was introduced in
Section 6.3 and whose group delay is given by equation (6.45). One can write:
$$H(Z) = H_1(Z)H_2(Z)$$
where H2(Z) is a function which is zero at Z_0^{−1} and introduces a smaller phase shift than H(Z).
By iteration, it results that minimum phase shift is achieved with the function H m (Z), which is
obtained by replacing all the zeros in H(Z) outside the unit circle with their inverse.
The formulation of the minimum phase condition is that the function:

$$\log[H(Z)] = \log[A(Z)] - j\phi(Z)$$
should not have any poles outside the unit circle.
Under these conditions, the functions log[A(f )] and 𝜙(f ) are related by equations (9.11) and
(9.12), which correspond to the Hilbert transform. They are the Bayard–Bode equations for discrete
systems:
$$\log[A(f)] = K - \int_{-1/2}^{1/2}\phi(f')\cot[\pi(f-f')]\,df' \qquad (9.31)$$

$$\phi(f) = \int_{-1/2}^{1/2}\log[A(f')]\cot[\pi(f-f')]\,df' \qquad (9.32)$$
The constant K is a scaling factor for the amplitude.
Similarly, a function H M (Z) whose zeros are outside the unit circle is said to be maximum phase.
9.7 Differentiator
The desired amplitude response of the differentiator is:

$$D(\omega) = \omega \quad \text{for } 0 \le \omega \le \omega_1 \le \pi$$

If the upper band edge ω1 equals π, the filter is said to be full-band.
The frequency response of the corresponding N coefficient digital filter is expressed by:
( )
𝜋
−j + 𝜔(N−1)
H(𝜔) = R(𝜔)e 2 2 (9.34)
R(𝝎) is the real function:
∑
P
R(𝜔) = hi sin(i𝜔); N = 2P + 1 (9.35)
i=1
∑P (( ) )
1
R(𝜔) = hi sin i− 𝜔 ; N = 2P
i=1
2
The corresponding impulse responses have been given in Section 5.2.
Conventional design techniques apply to these devices – amongst others, the least-squares
method. The full-band case is particularly simple, because N is even, as shown by equation (9.35)
above, and the least-squares method leads to the following set of coefficients [6]:
$$h_i = \frac{8}{\pi}\,\frac{(-1)^{i+1}}{(2i-1)^2} \qquad (9.36)$$
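A quick numerical check of (9.36), as a sketch: with these coefficients, R(ω) of (9.35) for N = 2P should approximate ω over (0, π). The half-length P and the test grid are arbitrary.

```python
import numpy as np

P = 16
i = np.arange(1, P + 1)
h = (8 / np.pi) * (-1.0) ** (i + 1) / (2 * i - 1) ** 2   # equation (9.36)

w = np.linspace(0.05, np.pi - 0.2, 300)                  # avoid the band edge
R = np.sin(np.outer(w, i - 0.5)) @ h                     # R(w), N = 2P case
print(f"max |R(w) - w| on the test grid: {np.abs(R - w).max():.3f}")
```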
If the desired function D(ω) is proportional to a power of the frequency, the differentiator is said to be of higher order.
The conversion of a real signal into a complex signal often involves an interpolation operation,
particularly in analog-to-digital converters in communication transceivers.
Alternatively,

$$e^{-j\tau\omega}\sum_{i=0}^{N-1} h_i\,e^{-j(i-K)\omega} \approx 1 \qquad (9.40)$$

and finally:

$$G(\omega) = \sum_{i=-P}^{P} h_i\,e^{-j(i+\tau)\omega} \approx 1; \qquad |\tau| \le \frac{1}{2} \qquad (9.42)$$
2
x(t)
0 T 2T 3T t
The interpolator coefficients hi are derived from this expression and conventional approximation
techniques such as the least-squares method can be used.
For systems in which the delay may vary in time, such as synchronization loops, it is helpful to be able to relate the coefficient values to the delay values, so that the interpolator can track the delay evolution. Lagrange interpolation is a suitable approach in this case.
For three coefficients, the Lagrange coefficients as functions of the delay τ are:

$$h_{-1} = \frac{\tau(\tau-1)}{2}; \qquad h_0 = (1-\tau)(1+\tau); \qquad h_1 = \frac{\tau(\tau+1)}{2}$$
Figure 9.13 Implementation diagram of the interpolator with adjustable delay τ.
with:

$$C_0 = 1; \qquad C_j(Z) = \sum_{i=-P}^{P} b_{ij}\,Z^{-i}$$
The corresponding implementation diagram is provided in Figure 9.13. It can easily be adapted
to the evolutions of the delay.
The connection with the interpolation formula mentioned in Section 5.6 can be established from
the definition of the filter coefficients.
The general expression of the coefficients h_i, a solution of (9.45), is obtained by noticing that the square matrix is of the Vandermonde type. Determinants and sub-determinants are zero whenever two rows or two columns are identical. It follows that they can be expressed as products.
Generalizing to arbitrary periods of time Δ, the interpolation filter coefficients are written as:

$$a_i = \prod_{j=0;\,j\neq i}^{N-1}\frac{\Delta - j}{i - j}; \qquad 0 \le i \le N-1 \qquad (9.49)$$
and interpolated values:

$$x(nT - \Delta) \approx y(n) = \sum_{i=0}^{N-1} a_i\,x[(n-i)T] \qquad (9.50)$$
The quality of the interpolation depends on the number of coefficients and, when this number
tends toward infinity, the sampling formula (1.57) can be obtained using the identity:
$$\sin\pi t = \pi t\prod_{n\neq 0}\left(1 - \frac{t}{n}\right) \qquad (9.51)$$
which, letting Δ = t/T, leads to:

$$a_i = \prod_{j\neq i}\frac{\Delta - j}{i - j} = \prod_{k\neq 0}\frac{k - (\Delta - i)}{k} = \frac{\sin\pi(\Delta - i)}{\pi(\Delta - i)} \qquad (9.52)$$
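The coefficients (9.49) and the interpolation (9.50) can be sketched directly, as below; the tap count, delay, and test signal are arbitrary choices.

```python
import numpy as np

def lagrange_coeffs(delta, n_taps):
    """Interpolator coefficients a_i of equation (9.49)."""
    a = np.empty(n_taps)
    idx = np.arange(n_taps)
    for i in range(n_taps):
        j = np.delete(idx, i)
        a[i] = np.prod((delta - j) / (i - j))
    return a

x = np.sin(2 * np.pi * 0.02 * np.arange(64))   # samples x(nT), T = 1
delta, n_taps, n = 1.5, 4, 10
a = lagrange_coeffs(delta, n_taps)
y = sum(a[i] * x[n - i] for i in range(n_taps))   # equation (9.50)
print(f"interpolated {y:.6f}  exact {np.sin(2 * np.pi * 0.02 * (n - delta)):.6f}")
```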
In certain applications, such as curve fitting or image processing, data are available in sets or
blocks, and therefore, the values have to be interpolated in the blocks.
$$B_3(t) = \begin{cases} \dfrac{2}{3} - |t|^2 + \dfrac{|t|^3}{2}; & 0 \le |t| < 1 \\[4pt] \dfrac{(2-|t|)^3}{6}; & 1 \le |t| < 2 \\[4pt] 0; & 2 \le |t| \end{cases} \qquad (9.54)$$
Given a set of N sample values s(nT),0 ≤ n ≤ N − 1, inverse filtering is carried out first, so that a
new set x(n) is obtained, to which the interpolation filter is applied.
With the cubic spline, assuming T = 1, the Z-transfer function of the samples is:

$$B_3(Z) = \frac{Z + 4 + Z^{-1}}{6} \qquad (9.55)$$
The inverse is, in product form:

$$B_3^{-1}(Z) = \frac{6}{2+\sqrt{3}}\;\frac{1}{1+(2-\sqrt{3})Z^{-1}}\;\frac{1}{1+(2-\sqrt{3})Z} \qquad (9.56)$$
or else:

$$B_3^{-1}(Z) = \frac{6-3\sqrt{3}}{2\sqrt{3}-3}\left[\frac{1}{1+(2-\sqrt{3})Z^{-1}} + \frac{1}{1+(2-\sqrt{3})Z} - 1\right] \qquad (9.57)$$
Of the two factors in (9.56), one is causal, and one is anti-causal. The sequence s(n) can be processed by the first factor and the output u(n) is:

$$u(n) = s(n) - (2-\sqrt{3})\,u(n-1) \qquad (9.58)$$
The second factor is applied to u(n) and the output is the desired sequence x(n):

$$x(n) = \frac{6}{2+\sqrt{3}}\,u(n) - (2-\sqrt{3})\,x(n+1) \qquad (9.59)$$
In that case, the calculations must be carried out in reverse order.
The two above recurrences require initial values – namely u(0) and x(N−1) – whose determina-
tion depends on limit conditions of the data block. In general, a symmetrical extension outside the
data block is retained – that is, s(−n) = s(n) and s(N−1 + k) = s(N−1−k). The periodicity is 2N − 2
and the series development of the first factor in (9.56) leads to the following initial value:
$$u(0) = \sum_{n=0}^{\infty}(\sqrt{3}-2)^n\,s(n)$$
which is:

$$u(0) = \frac{1}{1-(\sqrt{3}-2)^{2N-2}}\sum_{n=0}^{2N-3}(\sqrt{3}-2)^n\,s(n) \qquad (9.60)$$
For large values of N, depending on the level of accuracy required, the summation may be limited
to the first terms.
In the other direction, x(N−1) can be calculated directly with the following scheme – start from
(9.57), perform series developments of the two terms, use the corresponding expression of u(n)
as a function of s(n) for n = N−1; then, due to the symmetry of s(n) about N−1, the following
initialization is obtained:
$$x(N-1) = \frac{6-3\sqrt{3}}{2\sqrt{3}-3}\,[2u(N-1) - s(N-1)] \qquad (9.61)$$
It is readily verified that the constant signal s(n) = 1 leads to:

$$u(0) = \frac{1}{3-\sqrt{3}} = u(n); \qquad x(N-1) = 1 = x(n)$$
Once the transformed block has been made available, interpolation is carried out:

$$s(t) = \sum_{n} x(n)\,B_m(t - nT) \qquad (9.62)$$
The summation is limited to the terms for which the spline function is not null. As in the previous section, when the degree m of the spline grows, this interpolator tends toward the ideal interpolator.
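The prefiltering recursions (9.58)–(9.61) can be sketched as below; the symmetric extension and the initializations follow the text, and the constant-block check x(n) = 1 is reproduced. Note that (6 − 3√3)/(2√3 − 3) simplifies to √3.

```python
import numpy as np

def spline_prefilter(s):
    """Inverse filtering by B3^{-1}(Z), equations (9.58)-(9.61)."""
    a = 2 - np.sqrt(3)
    N = len(s)
    ext = np.concatenate([s, s[-2:0:-1]])      # symmetric extension, 2N-2 terms
    k = np.arange(2 * N - 2)
    u = np.empty(N)
    u[0] = ((-a) ** k * ext).sum() / (1 - (-a) ** (2 * N - 2))   # eq. (9.60)
    for i in range(1, N):
        u[i] = s[i] - a * u[i - 1]             # causal recursion (9.58)
    x = np.empty(N)
    x[-1] = np.sqrt(3) * (2 * u[-1] - s[-1])   # eq. (9.61)
    for i in range(N - 2, -1, -1):             # anti-causal recursion (9.59)
        x[i] = 6 / (2 + np.sqrt(3)) * u[i] - a * x[i + 1]
    return x

print(spline_prefilter(np.ones(8)).round(6))   # constant block -> all ones
```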
An important special case is the restoration of a block of pixels in an image from the bordering
pixels.
The compression of fixed images or video streams is often based on decomposition into
blocks – for example, blocks of 8 × 8 or 16 × 16 pixels, in combination with the discrete cosine
transform (DCT-2D). During transmission or manipulation, some blocks may be lost or damaged
and some restoration action is needed to minimize the visual impact. Then, the above technique
can apply.
According to the definition given in Section 3.3.4, without the scale factors, considering an M × N matrix of real data x(i, j), the DCT delivers a matrix with the same dimensions:

$$y(k,l) = \sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\cos\left(\frac{\pi(2i+1)k}{2M}\right)\cos\left(\frac{\pi(2j+1)l}{2N}\right)x(i,j) \qquad (9.66)$$
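A direct, unoptimized evaluation of (9.66) as a sketch (loops kept explicit for clarity; no scale factors):

```python
import numpy as np

def dct_2d(x):
    """2-D DCT of equation (9.66), without scale factors."""
    m_dim, n_dim = x.shape
    i, j = np.arange(m_dim), np.arange(n_dim)
    y = np.empty((m_dim, n_dim))
    for k in range(m_dim):
        ck = np.cos(np.pi * (2 * i + 1) * k / (2 * m_dim))
        for l in range(n_dim):
            cl = np.cos(np.pi * (2 * j + 1) * l / (2 * n_dim))
            y[k, l] = ck @ x @ cl              # double cosine-weighted sum
    return y

# A smooth block: the energy concentrates in the low-frequency corner.
block = np.outer(np.arange(4.0), np.ones(4))
print(dct_2d(block).round(3))
```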
The impact of the discontinuities at the beginning and end of the sequence, mentioned in
Section 2.1, is avoided by this transform. A property of most of these images is that they have few
high-frequency components. This property can be exploited in the restoration of unknown blocks
from their surrounding pixels.
The unknown block is an M × N rectangle to be determined from the 2(M + N + 2) surrounding
pixels, as illustrated in Figure 9.14a. In the transformed matrix, the M × N high-frequency elements
are set to zero, as indicated in Figure 9.14b.
Once the matrices have been scanned and data are available in vectors, expressions (9.64) and (9.65) apply. Elements of the matrices T_ij (1 ≤ i, j ≤ 2) are the elements of the DCT. The term G = −T_22^{−1} T_21 is called the interpolation mask.
The approach yields particularly simple results because it turns out that every unknown pixel X is obtained from eight neighboring pixels, as indicated in Figure 9.15, by expression (9.67).
The parameters a and b depend on the block dimensions M and N. Table 9.1 lists a set of
numerical values.
For the details of the calculations leading to expression (9.67), and for examples with test images,
see reference [10].
Figure 9.14 (a) Unknown and known pixels; (b) DCT coefficients.
Figure 9.15 The eight neighboring pixels used to restore the unknown pixel X.
Table 9.1 Numerical values of the parameters a and b as a function of the block dimension M.

M = 2: 0.7071
M = 3: 0.8090; 0.5
M = 4: 0.8660; 0.6340
M = 5: 0.9010; 0.7225; 0.5
M = 6: 0.9239; 0.7832; 0.5995
M = 7: 0.9397; 0.8264; 0.6736; 0.5
M = 8: 0.9511; 0.8580; 0.7298; 0.5792
M = 9: 0.9595; 0.8818; 0.7731; 0.6423; 0.5
M = 10: 0.9659; 0.9001; 0.8070; 0.6930; 0.5658
M = 11: 0.9709; 0.9145; 0.8340; 0.7341; 0.6205
9.12 Conclusion
The transformation of real signals into complex signals is a filtering operation, and the quadrature
filter involved can have any frequency response. In particular, when its response is proportional to
frequency, it is called a differentiator.
In practice, real–complex conversions are efficiently implemented through interpolation, with
a half-band filter. The mask of the filter is defined from the performance objectives in terms of
frequency distortion and image–band residuals.
Interpolation is a fundamental operation linked to sampling. In theory, it is defined by the sam-
pling formula which corresponds to a linear-phase infinite impulse response filter. In practice, a
linear-phase FIR filter is used, in order to keep known samples untouched. A particularly important
case is Lagrange interpolation, which corresponds to a max-flat filter, whose frequency response
derivatives vanish at the origin.
Block interpolation, as in image processing, can use filter responses that do not preserve known
samples, such as spline functions, which constitute another approximation of the ideal interpolator.
Lost samples or lost blocks of samples in 1D and 2D signal sequences can be efficiently restored
by interpolation techniques.
Exercises
9.1 Calculate the Fourier transform X(f) of the real causal set x(n) such that:

x(n) = 0 for n < 0; x(n) = aⁿ for n ≥ 0, with |a| < 1.

Decompose X(f) into real and imaginary parts.
9.3 Starting from a real signal represented by the set x(n), a complex signal is formed whose real
and imaginary parts are given by:
xR (n) = x(n) cos[2𝜋(n∕4)]
xI (n) = x(n) sin[2𝜋(n∕4)]
What comments can be made about the sets xR (n) and xI (n)?
Is the signal obtained an analytic signal?
A half-band filter is applied to each of the sets xR (n) and xI (n) and the complex signal filtered
in this way is multiplied by the complex set e−j2𝜋n/4 . What operation has been carried out on
the real signal x(n)? Perform this set of operations on the signal x(n) = cos (𝜋n/5).
9.4 Study the effect of coefficient wordlength limitation on the quadrature FIR filter. Following
the procedure described in Sections 5.7 and 5.10 for linear phase filters, find an expression
for the estimate of the coefficient wordlength as a function of the quadrature filter parame-
ters – that is, ripple and transition band.
9.5 Give a simplified expression for the order of the 90∘ IIR phase shifter. Study the effect
of the coefficient wordlength limitation and find a formula for estimating the coefficient
wordlength as a function of the parameters. Check the results in the example given in
Section 9.4.
This is a minimum-phase function. Give the expression for the linear-phase function and
the maximum-phase function which have the same amplitude characteristic. Compare the
impulse responses.
9.7 A real signal with sampling frequency 2fs is converted into a complex signal with sampling frequency fs with the following half-band filter:

$$H(Z) = -0.0506 + 0.2954Z^{-2} + 0.5Z^{-3} + 0.2951Z^{-4} - 0.0506Z^{-6}$$

Calculate the filter response at frequencies 0 and fs/8. Derive the ripple and transition bandwidth.

The filter is included in an IQ modulator fed with the sequence x(n) = sin(nπ/4). Give the expression of the complex output y(n) and show that it has frequency components at frequencies fs/4 and 3fs/4. Give the amplitudes of these components.
Using the first values of h(n), justify the values x c (n) in the vicinity of n = 8.
Comment on the accuracy of the approach for quadrature filter design.
References
1 E. A. Guillemin, Theory of Linear Physical Systems. John Wiley, New York, 1963.
2 B. Picinbono, Principles of Signals and Systems. Artech House Inc., London, 1988.
3 A. Oppenheim and R. Schafer, Digital Signal Processing, Chapter 7. Prentice-Hall, Englewood
Cliffs NJ, 1974.
4 B. Gold, A. Oppenheim and C. Rader, Theory and implementation of the discrete Hilbert trans-
form. Proc. of Symp. Computer Processing in Communications, Vol. 19. Polytechnic Press, New
York, 1970.
5 B. Gold and C. Rader, Digital Processing of Signals, Chapter 3. McGraw-Hill, New York, 1969.
6 G. Mollova, Compact formulas for least squares design of digital differentiators. Electronics
Letters, 35(20), 1695–97, 1999.
7 T. I. Laakso, V. Valimaki, M. Karjalainen and U. K. Laine, Splitting the unit delay—tools for
fractional delay filter design. IEEE Signal Processing Magazine, 13(1), 30–60, 1996.
8 J. J. Fuchs and B. Delyon, Minimum L1-norm reconstruction function for oversampled signals: application to time delay estimation. IEEE Transactions, 46, 1666–73, 2000.
9 M. Unser, Splines: a perfect fit for signal and image processing. IEEE Signal Processing
Magazine, 16(6), 22–38, 1999.
10 Z. Al Kachouh, M. Bellanger, Efficient restoration technique for missing blocks in images. IEEE
Transactions on Circuits and Systems for Video Technology, 13(12), 1182–86, 2003.
10
Multirate Filtering
Multirate filtering is a technique for reducing the calculation rate needed in digital filters and, in
particular, the number of multiplications to be performed per second. As will be shown later, this
parameter is generally regarded as a reflection of the complexity of the system.
In a filter, the number of multiplications M_R to be performed per second is given by:

$$M_R = K f_s$$
where f s is the frequency at which the calculations are made. The parameter f s generally corre-
sponds to the sampling frequency of the signal represented by the numbers to be processed. The
factor K depends on the type of filter and on its performance.
In reducing the value of M R , the factor K can be influenced by choosing the most appropriate
type and structure of a filter and by optimizing the order of that filter to suit the constraints and
required characteristics. Also, f s can be influenced by changes in the sampling frequency during
processing. In many practical cases, the advantages thus obtained are considerable.
The sampling frequency for a real signal must be more than twice its bandwidth, which can vary
during processing. For example, a filtering operation eliminates undesirable components, so the
useful bandwidth is reduced. Once the useful bandwidth has been decreased, the sampling fre-
quency of the signal can itself be reduced. As a result, the sampling frequency can be adapted to
the bandwidth of the signal at each stage of processing, so as to minimize the computation rate. Before studying the development and implementation of this fundamental principle,
it is appropriate first to analyze the effect of a change in the sampling frequency on the signal and
its spectrum.
10.1 Decimation and Z-Transform

Figure 10.1 Downsampling by 2. (a) Decimation operations; (b) decimation symbol; (c) spectra and Z-transforms.
The term Z² characterizes the doubling of the sampling period, while the term Z^{−1} reflects the interleaving operation. The Z-transform of s(n) is recovered as:

$$S(Z) = S_0(Z^2) + Z^{-1}S_1(Z^2)$$
The decomposition and reconstruction formulas, expressed above for factor two, can be general-
ized to any whole number M, and this generalization is presented below with the help of Fourier
transforms.
Given the signal s(t) whose spectrum S(f ) has no component with frequency higher than f m , and
assuming that the signal is sampled with period T such that:
where M is a whole number, let us examine the relationship between the Fourier transforms Si (f )
of the sets:
$$s\!\left[\left(n + \frac{i}{M}\right)MT\right]; \qquad i = 0, 1, 2, \dots, M-1$$
From the results in Section 1.2, the Fourier transform of the distribution u0 (t),
$$u_0(t) = \sum_{n=-\infty}^{\infty}\delta(t - nMT)$$
Figure 10.2 Spectra S_i(f) of the interleaved sets of samples for M = 4.
It is interesting to note, in Figure 10.2, that retarding the set of sampling pulses causes phase
rotations through multiples of 2𝜋/M for image bands around multiples of the sampling frequency
1/MT.
Addition of all the sets of retarded pulses causes a cancellation of the image bands except around
frequencies which are multiples of 1/T, which becomes the new sampling frequency. This is an
application of the linearity properties of the Fourier transform.
It is also useful to establish relations between the Z-transfer functions of the sequences involved.
Let S(Z) be the Z-transform of the set s(nT). By definition:

$$S(Z) = \sum_{n=-\infty}^{\infty} s(nT)\,Z^{-n} \qquad (10.6)$$
The spectrum S_M(f) of the signal sampled with period T is obtained by replacing Z with e^{j2πfT}. Then:

$$S_M(f) = S(e^{j2\pi fT})$$
Defining S_i(Z^M) by:

$$S_i(Z^M) = \sum_{n=-\infty}^{\infty} s(nMT + iT)\,Z^{-nM}$$
we obtain:

$$S(Z) = \sum_{i=0}^{M-1} S_i(Z^M)\,Z^{-i} \qquad (10.7)$$

The terms S_i(Z^M) are Z-transforms of the sets s[(n + i/M)MT] for i = 0, 1, …, M − 1. The factor Z^{−i} reflects the interleaving of these sets.
Now, it is necessary to express S_i(Z^M) as a function of S(Z). Substituting Ze^{−j2πm/M} for Z in equation (10.7), we find:

$$S\!\left(Ze^{-j\frac{2\pi m}{M}}\right) = \sum_{i=0}^{M-1} e^{j2\pi mi/M}\,Z^{-i}\,S_i(Z^M)$$
$$\begin{bmatrix} S(Z) \\ S\!\left(Ze^{-j\frac{2\pi}{M}}\right) \\ \vdots \\ S\!\left(Ze^{-j\frac{2\pi(M-1)}{M}}\right) \end{bmatrix} = T_N^{-1}\begin{bmatrix} S_0(Z^M) \\ Z^{-1}S_1(Z^M) \\ \vdots \\ Z^{-(M-1)}S_{M-1}(Z^M) \end{bmatrix}$$
where TN−1 is the matrix of the inverse DFT. Multiplying the two sides of this equation by T N ,
we get:
$$Z^{-i}S_i(Z^M) = \frac{1}{M}\sum_{m=0}^{M-1} e^{-j2\pi im/M}\,S\!\left(Ze^{-j2\pi m/M}\right); \qquad 0 \le i \le M-1 \qquad (10.8a)$$

$$S_i(Z^M) = \frac{1}{M}\sum_{m=0}^{M-1} W^{im}\,Z^{i}\,S(ZW^m); \qquad W = e^{-j2\pi/M} \qquad (10.8b)$$
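A numerical sketch of these relations, with illustrative names: interleaving the decimated components recovers the signal, per (10.7), and in the frequency domain the DFT of a decimated component is the average of M shifted copies of the full DFT, as (10.8a) states.

```python
import numpy as np

M, L = 4, 32
s = np.random.default_rng(0).standard_normal(L)
parts = [s[i::M] for i in range(M)]            # the sets s(nM + i)

rebuilt = np.zeros(L)
for i, p in enumerate(parts):                  # S(Z) = sum_i Z^{-i} S_i(Z^M)
    rebuilt[i::M] = p
assert np.allclose(rebuilt, s)

S = np.fft.fft(s)                              # full-rate spectrum
S0 = np.fft.fft(parts[0])                      # spectrum of the decimated set
alias = sum(S[np.arange(L // M) + m * (L // M)] for m in range(M)) / M
assert np.allclose(S0, alias)                  # aliasing sum of (10.8a)
print("decomposition and aliasing relations verified")
```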
10.2 Decomposition of a Low-Pass FIR Filter

Multirate filtering will be introduced first for FIR filters, where it appears naturally. Consider a
low-pass FIR filter which eliminates components with a frequency greater than or equal to the fre-
quency f c in a signal sampled at frequency f s . The filtered signal only requires a sampling frequency
equal to 2f c , and in fact, it is sufficient to provide the output numbers at this frequency.
Figure 10.3 Filtering with an intermediate reduction of the sampling frequency from fs to f0.
In an FIR filter of order N, the relation which determines the number of the output set y(n) from
the set of input numbers x(n) is written as:
$$y(n) = \sum_{i=0}^{N-1} a_i\,x(n-i) \qquad (10.9)$$
Each output number y(n) is calculated from a set of N input numbers by weighted summation
with the coefficients ai (i = 0, 1, …, N − 1). Under these conditions, the input and output rates are
independent and a decrease in the output sampling rate by a factor k = f s /2f c , assumed to be an
integer, results in a reduction in the computation rate by the same factor.
The same reasoning applies to raising the sampling rate, or interpolation. In this case, the output
rate is greater than the input rate. To show the savings in computation, it is sufficient simply to
regard the rates as being equal by incorporating a suitable amount of null data into the input set.
The independence of the output from the input in FIR filters can be exploited in narrow
band-pass filters, even if the input and output rates are identical, by dividing the filtering operation
into two phases [1]:
(1) Reducing the sampling frequency from the value fs to an intermediate value f0 such that f0 ≥ 2fc.
(2) Filtering and restoring the sampling frequency fs by interpolation.
Figure 10.3 illustrates this decomposition. If the two operations are carried out with similar filters of order N, the number of multiplications M_D to be performed per second is given by:

$$M_D = 2Nf_0 \qquad (10.10)$$
This value is to be compared with direct realization by a single filter, which results in the value M_R such that:

$$M_R = Nf_s \qquad (10.11)$$
This decomposition introduces several degradations:

(1) Aliasing of residual components with frequencies greater than f0/2 into the band below f0/2.
Harmonic distortion results. Its power BR depends on the attenuation of the sampling rate
reduction filter and is calculated from its transfer function H(f ) by using the results given in
the previous chapters.
Figure 10.4 Frequency response of the multirate filter. (a) Direct filter response; (b) sampling rate reduction filter response; (c) multirate filter response.
For example, if the input signal has a uniform spectral distribution and unit power, the total
power BT of the aliased signal is:
$$B_T = \frac{1}{f_s}\int_{f_0/2}^{f_s - f_0/2}|H(f)|^2\,df \qquad (10.12)$$
The distortion can be assumed to have a uniform spectral distribution and only the power in
the pass band is considered. In this case, we obtain:
$$B_R < \frac{2f_1}{f_0}\left[\sum_{i=0}^{N-1} a_i^2 - \frac{2f_1}{f_2}\right] \qquad (10.13)$$
Allowance must be made for this degradation when calculating the sampling rate reduction
filter [2].
(2) The periodicity in frequency of the response of the sampling rate reduction filter, with period
f 0 , introduces a distortion whose power Bi is a function of the attenuation of the interpolation
filter.
If this filter is the same as the sampling rate reduction filter, with the same assumptions, we
obtain:
$$B_i = \frac{1}{f_s}\int_{f_0/2}^{f_s - f_0/2}|H(f)|^2\,df$$
This distortion outside the pass band can be troublesome when other signals need to be added
to the filtered signal.
(3) Cascading two filters increases the ripple in the pass band. For example, the ripple is doubled
if identical filters are used in both operations.
Finally, the subunits of the multirate filter should be designed so that the complete system satis-
fies the overall specifications imposed on the filter [3].
The circuit in Figure 10.3 is simplified if the sampling frequencies of the signal before and after
filtering can be different. The principle can also be applied to high-pass and band-pass filters, pro-
vided, for example, modulation and demodulation stages are introduced.
The principle of decomposition can be extended to the sampling rate reduction subunit and to
the interpolation subunit, which introduces a further advantage. The FIR half-band filter is a par-
ticularly efficient element for implementing these subunits.
10.3 Half-Band FIR Filters

The coefficients h_n are zero for even values of n except for h0. Figure 10.5 illustrates the properties
of this filter. The specification is defined by the ripple (𝛿) in the pass and stop bands, and by the
width Δf of the transition band. Given these parameters, the formulae in Section 5.7 can be used
to estimate the filter parameters. The filter order is:

$$N \simeq \frac{2}{3}\log\left(\frac{1}{10\delta^2}\right)\frac{f_s}{\Delta f}$$

By taking the attenuation A_f as:

$$A_f = 10\log\left(\frac{1}{\delta^2}\right)$$

and bearing in mind the particular significance of the frequency fs/4 in this type of filter, one can write:

$$M \simeq \frac{2}{3}\left(\frac{A_f}{10} - 1\right)\frac{f_s}{4\Delta f}$$
Hence, the relationship between the attenuation (in decibels) and the transition band, for a given number of coefficients, can be expressed simply as:

$$A_f = 10 + 15M\left(\frac{4\Delta f}{f_s}\right) \qquad (10.16)$$
Figure 10.5 Coefficients h_i and frequency response H(f) of the half-band filter.
This approximation is valid when M exceeds a few units. The coefficients are calculated using
the general program for FIR filters, with appropriate data. The relationship between the filter input
and output sets is:
$$y(n) = \frac{1}{2}\left[x(n-2M) + \sum_{i=1}^{M} h_{2i-1}\,[x(n-2M+2i-1) + x(n-2M-2i+1)]\right] \qquad (10.17)$$
and M multiplications have to be performed for each element of the output set y(n). It should be
noted that these operations are performed only on elements of the input set with an odd index. It
follows that if such a filter is used to reduce the sampling frequency from f s to f 0 = f s /2, the number
of multiplications to be performed per second is Mf 0 . The same is true if the sampling frequency f s
is increased from f 0 to 2f 0 , which is achieved simply by calculating a sample between two samples
of the input set.
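A sketch of a decimator built on (10.17): only the coefficients h1, h3, … multiply data, and one output is computed per two input samples. Filter F4 of Table 10.1 below is used, rescaled by its center value so that the implicit center tap is 1/2.

```python
import numpy as np

def halfband_decimate(x, h_odd):
    """Decimation by 2 per equation (10.17); h_odd = [h1, h3, ...]."""
    M = len(h_odd)
    out = []
    for n in range(4 * M, len(x), 2):          # one output per two inputs
        acc = x[n - 2 * M]                     # center tap, 1/2 applied below
        for i, hi in enumerate(h_odd, start=1):
            acc += hi * (x[n - 2 * M + 2 * i - 1] + x[n - 2 * M - 2 * i + 1])
        out.append(0.5 * acc)
    return np.array(out)

h_odd = np.array([19.0, -3.0]) / 32.0          # filter F4, normalized by h0
x = np.sin(2 * np.pi * 0.05 * np.arange(200))  # in-band test tone
print(halfband_decimate(x, h_odd)[:5].round(4))
```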
Table 10.1 Quantized coefficients of a group of half-band filters.

Filter   h0     h1     h3     h5    h7    h9
F1       1      1
F2       2      1
F3       16     9      1
F4       32     19     −3
F5       256    180    −25    3
F6       346    208    −44    9
F7       512    302    −53    7
F8       802    490    −116   33    −6
F9       8192   5042   −1277  429   −116  18
Finally, the number of multiplications to be performed per second in a half-band filter with a change of sampling frequency is:

$$M_R = \frac{2}{3}\log\left(\frac{1}{10\delta^2}\right)\frac{f_s}{4\Delta f}\,\frac{f_s}{2} \qquad (10.18)$$
Example: A group of half-band filters with useful application characteristics [3] is shown in
Table 10.1, which gives the quantized coefficients, the quantization step being taken as unity [4].
The frequency response can be calculated simply from equation (10.15). Filters F4, F6, F8, and F9
correspond to a unit value of the parameter 4Δf /f S , with ripples of 37, 50, 67, and 79 dB, respectively.
Filters F2, F3, and F5 have monotonic responses.
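The responses can be checked numerically, as a sketch: the zero-phase response used below, H(f) = [h0 + 2Σ h_{2i−1} cos(2π(2i−1)f)]/(2h0), gives H(0) = 1 and H(1/4) = 1/2, and the stop band is taken to start at f = 0.375 since 4Δf/fs = 1 for these filters.

```python
import numpy as np

filters = {"F4": (32, [19, -3]),
           "F6": (346, [208, -44, 9]),
           "F8": (802, [490, -116, 33, -6])}
f = np.linspace(0.375, 0.5, 2000)              # stop band for 4*df/fs = 1
for name, (h0, h_odd) in filters.items():
    H = h0 + 2 * sum(h * np.cos(2 * np.pi * (2 * i + 1) * f)
                     for i, h in enumerate(h_odd))
    H = H / (2 * h0)                           # normalized zero-phase response
    att = -20 * np.log10(np.abs(H).max())
    print(f"{name}: stop-band attenuation about {att:.0f} dB")
```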
The advantages of the particular filter structure described in this section can be applied to general
multirate filtering.
The properties of the elementary half-band filter can be used to produce the multirate filter design
shown in Figure 10.6. The intermediate frequency f 0 is:
related to the sampling frequency f s by a power of two:
fs = 2P f0 (10.19)
The sampling frequency is reduced or increased by a cascade of P half-band filters. The complete
unit comprises a basic filter which operates at frequency f 0 and has a cascade of half-band filters
on either side.
The overall low-pass filter is specified by the following parameters:
(1) Ripple in the pass band: 𝛿 1
(2) Ripple in the stop band: 𝛿 2
(3) Width of the transition band: Δf
(4) Edge of the pass band: f 1
(5) Edge of the stop band: f 2
To calculate the half-band filters, their specifications have to be defined. The ripple in the pass band is assumed to be divided between the half-band filters and the basic filter. Also, each filter has to have a ripple in the stop band smaller than δ2. As a result, for each half-band filter, the ripple δ0 is given by:

$$\delta_0 = \min\left\{\frac{\delta_1}{4P},\;\delta_2\right\} \qquad (10.20)$$
Figure 10.6 Multirate filter: cascade of half-band filters on either side of the basic filter operating at f0 = fs/2^P.

Figure 10.7 Specification of the first half-band filter, with transition band Δf1.
The first half-band filter F1 of the cascade can be determined if its transition band Δf1 is fixed (Figure 10.7). To fix Δf1, it must be noted that the role of the first filter is to eliminate those components of the signal which can be folded into the useful band after halving the sampling frequency. Thus:

$$\Delta f_1 = \frac{f_s}{2} - 2f_2 \qquad (10.21)$$
From equation (10.18), the number of multiplications M_{C1} to be performed in the first filter is:

$$M_{C1} = \frac{2}{3}\log\left(\frac{1}{10\delta_0^2}\right)\frac{f_s}{4\Delta f_1}\,\frac{f_s}{2} \qquad (10.22)$$
The same approach can be taken for the other filters in the cascade, and it can be shown that the total number of multiplications is estimated as:

$$M_c \approx \frac{1}{3}\log\left(\frac{1}{10\delta_0^2}\right)f_s \qquad (10.23)$$
To estimate the volume of calculations to be performed per second in the complete filter, the order N of the basic filter must be determined:

$$N \approx D\!\left(\frac{\delta_1}{2},\,\delta_2\right)\frac{1}{\Delta f}\,\frac{f_s}{2^P} \qquad (10.24)$$

with:

$$D\!\left(\frac{\delta_1}{2},\,\delta_2\right) = \frac{2}{3}\log\left(\frac{1}{5\delta_1\delta_2}\right)$$
The values of the parameters showing the complexity of the complete filter are, finally:

$$M_R = f_s\left[D(\delta_0) + \frac{1}{2^{P+1}}\,\frac{f_s}{\Delta f}\,D\!\left(\frac{\delta_1}{2},\,\delta_2\right)\right] \qquad (10.25)$$
Example: Consider a narrow low-pass filter defined by the following:

Thus: M_R = 6.2

A direct realization results in a filter of order N = 110, which corresponds to the value M_R = 55.
10.5 Digital Filtering by Polyphase Network

In order to optimize multirate processing in more general cases than that of a single filter, it is important to emphasize the phase-shift function.
The phase relations between different samples of the same signal were examined in detail in
Section 10.1. The results will be used to analyze multirate filtering from this point of view [5].
Assume that the sampling frequency fs is reduced by a factor N. Let X(Z) be the Z-transform of the input set x(n); the Fourier transform is obtained by replacing Z with e^{j2πf/fs}, forming X(e^{j2πf/fs}).
The output set y(Nn) sampled at frequency f s /N has Y (Z N ), a function of Z N , as its Z-transform.
Consequently, if phase-shifting circuits are involved in this operation, their transfer function is also
a function of the variable Z N and can be calculated from the overall filter function.
The structure of a phase shifter will be discussed initially for the single element making up the
half-band FIR filter, which is defined by equation (10.17). This relation can be rewritten as:
$$y(n) = \frac{1}{2}\left[x(n-2M) + \sum_{i=1}^{2M} a_i\,x(n-2i+1)\right] \qquad (10.26)$$

in which:

$$a_i = h_{2M-2i+1} = a_{2M-i+1} \quad \text{for } 1 \le i \le M$$
The corresponding Z-transfer function is:

$$H(Z) = \frac{1}{2}\left[Z^{-2M} + Z^{-1}\sum_{i=0}^{2M-1} a_{i+1}\,Z^{-2i}\right] \qquad (10.27)$$
or alternatively:

$$H(Z) = \frac{1}{2}\left[H_0(Z^2) + Z^{-1}H_1(Z^2)\right] \qquad (10.28)$$
The corresponding frequency response is:

$$H(f) = \frac{1}{2}\left[e^{-j2\pi(f/f_s)2M} + e^{-j2\pi(f/f_s)}\,H_1(f)\right] \qquad (10.29)$$
The function H 0 (f ) corresponds to a delay, and so is the characteristic of a purely linear phase
shifter.
Because the coefficients are symmetrical, H 1 (f ) is also a linear phase function. Since this part of
the filter is operating at the frequency f s /2, H 1 (f ) displays the periodicity f s /2. As the number of
the coefficients is even, we can use the results from Section 5.2, and write:
$$H_1(f) = \exp\left(-j2\pi f\left(M - \frac{1}{2}\right)\frac{2}{f_s}\right)|H_1(f)| \qquad (10.30)$$

or:

$$H_1(f) = e^{-j2\pi(f/f_s)2M}\,e^{-j\phi(f)}\,|H_1(f)| \qquad (10.31)$$

The function ϕ(f) is linear and has the periodicity fs/2. Consequently, it is expressed by:

$$\phi(f) = \pi\left(\left[\frac{2f}{f_s} + \frac{1}{2}\right] - \frac{2f}{f_s}\right) \qquad (10.32)$$

where the brackets denote the integer part.
$$|H_1(f)| = 2|H(f)| - 1; \qquad 0 \le f \le f_s/4$$

The ripple is double with respect to H(f) and the function is null at f = fs/4 – the frequency which corresponds to a change in phase of π.
Figure 10.8 shows the functions |H 1 (f )| and 𝜙(f ) which characterize the filter.
The phase Φ(f ) such that:
Figure 10.8 Functions |H1(f)| and ϕ(f) which characterize the half-band filter.

Figure 10.9 Two-branch structure of the half-band filter: delay branch and branch Z^{-1}H1(Z²).
Figure 10.10 Half-band filter structure delivering the complementary outputs y(n) and y′(n).

Figure 10.11 Frequency responses H0 and H1 of the two branches.
with:

$$H_n(Z^N) = \sum_{k=0}^{K-1} a_{kN+n}\,(Z^{-N})^k$$
This filter can be implemented by a network with N paths which is called a polyphase network
because each path has a frequency response which approximates that of a pure phase shifter. The
phase shifts are constant in frequency and are whole multiples of 2𝜋/N. When there is a change
in sampling frequency by a factor of N, the circuits in the different paths of the network operate at
the frequency f s /N.
10.6 Multirate Filtering with IIR Elements

As infinite impulse response filters have greater selectivity than finite impulse response filters, it is important to study multirate filters with recursive elements.
The basic technique for calculating a multirate filter with IIR elements is to perform the same type
of decomposition on the transfer function H(Z) of the overall IIR filter as was used to produce
equation (10.36). The function H(Z) is assumed to be a rational fraction in which the denominator
and the numerator have the same degree.
Such a decomposition is obtained by finding the poles of H(Z):

$$H(Z) = a_0\,\frac{\prod_{k=1}^{K}(Z - Z_k)}{\prod_{k=1}^{K}(Z - P_k)} \qquad (10.37)$$
From the identity:

$$Z^N - P_k^N = (Z - P_k)\left(Z^{N-1} + Z^{N-2}P_k + \dots + P_k^{N-1}\right) \qquad (10.38)$$
one can write:

$$H(Z) = a_0\,\frac{\prod_{k=1}^{K}(Z - Z_k)\left(Z^{N-1} + P_k Z^{N-2} + \dots + P_k^{N-1}\right)}{\prod_{k=1}^{K}\left(Z^N - P_k^N\right)}$$

which, in another form, is:

$$H(Z) = \frac{\sum_{i=0}^{KN} a_i Z^{-i}}{1 + \sum_{k=1}^{K} b_k Z^{-Nk}}$$
Thus:

$$H_n(Z^N) = \frac{\sum_{k=0}^{K} a_{kN+n}\,Z^{-Nk}}{1 + \sum_{k=1}^{K} b_k Z^{-Nk}} \qquad (10.39)$$

$$H_n(Z^N) = \frac{N_n(Z^N)}{D(Z^N)} \qquad (10.40)$$
Each path of the polyphase network is thus determined. They all have the same recursive part
and are distinguished by the non-recursive one as shown in equation (10.40). In principle, when
compared with the previous section, the difference is that the individual IIR phase shifters obtained
do not have linear phase.
For realization purposes, it is worth pointing out that the poles are raised to power N, which is
highly advantageous because they are spread inside the unit circle.
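A numerical sketch of the transformation (10.37)–(10.39): each pole factor is multiplied top and bottom by (Z^{N−1} + P_k Z^{N−2} + … + P_k^{N−1}), turning the denominator into a product of factors (Z^N − P_k^N). The pole and zero values below are arbitrary.

```python
import numpy as np

N = 4
poles = np.array([0.8 * np.exp(1j * 0.3), 0.8 * np.exp(-1j * 0.3)])
zeros = np.array([-1.0, -1.0])

num = np.poly(zeros).astype(complex)
for p in poles:                                 # multiply by the expansions
    num = np.polymul(num, p ** np.arange(N))    # Z^{N-1} + p Z^{N-2} + ... + p^{N-1}
den = np.array([1.0 + 0j])
for p in poles:                                 # (Z^N - p^N) factors
    factor = np.zeros(N + 1, dtype=complex)
    factor[0], factor[-1] = 1.0, -p ** N
    den = np.polymul(den, factor)

# Both forms describe the same transfer function on the unit circle.
w = np.exp(1j * np.linspace(0.1, 3.0, 7))
h_orig = np.polyval(np.poly(zeros), w) / np.polyval(np.poly(poles), w)
h_poly = np.polyval(num, w) / np.polyval(den, w)
assert np.allclose(h_orig, h_poly)
print("denominator in powers of Z^N:", den.real.round(6))
```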
10.7 Filter Banks Using Polyphase Networks and DFT

A discrete Fourier transform computer is a bank of filters (Section 2.4) which are suitable for
multirate filtering. However, it should be noted that filtering functions achieved in this way have
significant overlap. To improve the discrimination between the components of the signal, the
numbers are weighted before applying the discrete Fourier transform. The weighting coefficients
are samples of functions called spectral analysis windows. The production of filter banks by
means of a polyphase network and a DFT represents the generalization of such spectral analysis
windows [6, 7].
Assume that a bank is to be created with N filters which cover the band [0, fs] and are obtained by shifting a basic filter function along the frequency axis by mfs/N, with 1 ≤ m ≤ N − 1.
If H(Z) is the basic filter Z-transfer function, a change in frequency by mfs/N appears as a change in the variable from Z to Ze^{j2πm/N}. That is, the filter with index m has a transfer function B_m(Z) given by:

$$B_m(Z) = H(Ze^{j2\pi m/N})$$
By applying the decomposition of H(Z) introduced in the earlier sections, this becomes:

$$B_m(Z) = \sum_{n=0}^{N-1} Z^{-n}\,e^{-j2\pi mn/N}\,H_n(Z^N)$$
By allowing for the fact that the functions H_n(Z^N) are the same for all the filters, a factorization can be introduced which results in the following matrix equation:

$$\begin{bmatrix} B_0 \\ B_1 \\ \vdots \\ B_{N-1} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W & W^2 & \cdots & W^{N-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & W^{N-1} & W^{2(N-1)} & \cdots & W^{(N-1)^2} \end{bmatrix}\begin{bmatrix} H_0(Z^N) \\ Z^{-1}H_1(Z^N) \\ \vdots \\ Z^{-(N-1)}H_{N-1}(Z^N) \end{bmatrix} \qquad (10.41)$$

where W = e^{−j2π/N}.
The square matrix is the matrix of the discrete Fourier transform. The bank of filters is realized
by forming a cascade of the polyphase network described in Figure 10.13 and a discrete Fourier
transform.
The operation of this device is illustrated in Figure 10.12 which shows, for the case where N = 4,
the phase shifts introduced at the various points in the system so as to preserve only the signal in
the band [1/2NT, 3/2NT].
The polyphase network has the effect of correcting, in the useful part of the elementary band
[1/2NT, 3/2NT], for the interleaving of the numbers at the output of the discrete Fourier transform
computer. This prevents overlap between the filters and leads to the filter function in Figure 10.13.
This function depends only on the basic filter H(Z), which can be an FIR or IIR filter, and which can
be specified so that the filters in the bank have no overlap or have a crossover point, for example,
at 3 dB or 6 dB.
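A point-wise check of (10.41), as a sketch with a random FIR prototype: applying the DFT matrix to the delayed polyphase components reproduces the modulated filters B_m(Z) = H(Ze^{j2πm/N}).

```python
import numpy as np

N, L = 4, 16
h = np.random.default_rng(1).standard_normal(L)     # prototype filter
z = np.exp(1j * 0.7)                                # arbitrary point on |Z| = 1

# Direct evaluation of B_m(Z) = H(Z e^{j 2 pi m / N})
k = np.arange(L)
B = np.array([np.sum(h * (z * np.exp(2j * np.pi * m / N)) ** (-k))
              for m in range(N)])

# Delayed polyphase components Z^{-n} H_n(Z^N), then the DFT matrix
v = np.array([z ** (-n) * np.sum(h[n::N] * z ** (-N * np.arange(len(h[n::N]))))
              for n in range(N)])
W = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
assert np.allclose(W @ v, B)
print("polyphase network + DFT reproduces the modulated filter bank")
```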
Figure 10.12 Phase shifts introduced at the various points of the polyphase network and DFT system for N = 4, preserving the band [1/2NT, 3/2NT].
Figure 10.13 Filter function H(f) obtained with the polyphase network and DFT.
The impulse response of the basic filter is the spectral analysis window of the system. If the filters
are specified to have no frequency overlap, the sampling frequency at the output of the filters, or at
the input according to the method of use, can have the value 1/NT and the overall calculation can
be performed at this frequency.
If N is assumed to be a power of 2, the fast Fourier transform algorithm can be used to calculate
the discrete Fourier transform, and the number of real multiplications M R to be performed during
a sampling period in the complete system is:
$$M_R = N \times 2K + 2N\log_2\frac{N}{2} = 2N\left[K + \log_2\frac{N}{2}\right] \qquad (10.42)$$
This value can be compared with the value 2KN 2 required by N IIR filters of the same order
operating at frequency 1/T. An application in telecommunications will be discussed in a later
chapter.
10.8 Conclusion
FIR filters have the property that the sampling frequencies at the input and output are indepen-
dent. This can be exploited to adapt the computation rate to the bandwidth of the signal during
processing. This property can be extended to recursive structures through the use of a suitable
transformation. Phase shifting is the basic function involved.
Multirate filtering can be applied to narrow-band filters. It can provide considerable savings in
computation when there is a factor of more than an order of magnitude between the sampling
frequency and the pass band of the filter, as often occurs in practice.
The use of these techniques requires more detailed analysis of the processing. The limitations
on their use result primarily from complications in the computation sequencing produced by the
cascading of stages operating at different frequencies. This point should be examined carefully for
each potential application of multirate filtering so that an excessive increase in the control unit or
in the instruction program does not outweigh the computational gains.
Exercises
10.1 Give the number of bits assigned to the coefficients of the half-band filters in Table 10.1.
Use the results in Section 5.8 to evaluate the extra ripple introduced in the half-band filters
by limiting the number of bits representing the coefficients. Test this evaluation on the filters
F6, F8, and F9.
10.2 Estimate the number of coefficients of the three filters in the cascade in the example in
Section 10.4.
As each multiplication result is rounded, analyze the round-off noise produced by the reduc-
tion in sampling rate and give an expression for its power. Perform the same analysis for an
increase in sampling rate and the basic filter. Compare the results with the direct realization.
Give an estimate of the aliasing distortion.
10.3 A filter for a telephone channel has a pass band which extends from 0 to 3400 Hz with a rip-
ple of less than 0.25 dB. Above 4000 Hz, the attenuation is greater than 35 dB. For a sampling
frequency f s = 32 kHz, design a multirate filter as a cascade of half-band filters. Compare
the number of multiplications and additions to be performed per second and the size of the
memory with the values obtained for direct FIR filtering.
The signal applied to the filter is composed of numbers with 13 bits, and the computations
are performed using 16-bit registers. Evaluate the power of the round-off noise with a reduc-
tion and an increase in the sampling rate.
10.4 A discrete Fourier transform computer is a bank of uniform filters with characteristics as
shown in Section 2.4. Study the phase shifts introduced in the odd Fourier transforms and in
the doubly odd Fourier transforms in Section 3.3. What are the characteristics of the banks
of filters so obtained?
10.5 Consider a bank of two filters to be produced using the IIR filter of order 4 given as an
example in Section 7.2. The zeros and the poles of the upper half-plane have coordinates
(Figure 7.5):
Z1 = −0.816 + j0.578; Z2 = −0.2987 + j0.954
P1 = 0.407 + j0.313; P2 = 0.335 + j0.776
Using the equations in Section 10.6, calculate the Z-transfer functions of the polyphase net-
work paths. Give the coordinates of the poles and the zeros in the complex plane.
Use the results of Chapter 7 to determine the effect on the frequency response of limiting
the number of bits in the coefficients of the denominator of the transfer function. Compare
with the direct realization.
Draw the circuit diagram of this bank of two filters and determine the number of
multiplications required, assuming that the sampling frequency at the output of each filter
is half the value at the input.
References

11
QMF Filters and Wavelets

The compression of certain signals, such as speech, sound, or images, involves sub-band decomposition combined with sampling rate reduction and reconstruction from sub-bands after storage or transmission. The simplest approach to these operations is to use banks of two filters [1].
Figure 11.1 Bank of two filters: analysis filters H0(Z) and H1(Z) with decimation by 2, and synthesis filters G0(Z) and G1(Z).

Figure 11.2 Analysis and synthesis filter bank with N branches using polyphase networks and DFTs.
$$H(Z) = H_0(Z^2) + Z^{-1}H_1(Z^2) \qquad (11.4)$$
Next, it is necessary to determine the conditions to be met by the prototype filter so that the basic
relations (11.1) and (11.2) hold. The system transfer function is expressed by:
$$T(Z) = Z^{-1}H_1(Z^2)\,H_0(Z^2) \qquad (11.5)$$

The two polyphase components are linked to the prototype filter by (see Section 10.1):

$$H_0(Z^2) = \frac{1}{2}[H(Z) + H(-Z)]; \qquad Z^{-1}H_1(Z^2) = \frac{1}{2}[H(Z) - H(-Z)]$$

and the transfer function is:

$$T(Z) = \frac{1}{4}\left[H^2(Z) - H^2(-Z)\right] \qquad (11.6)$$
It can easily be shown that in order to obtain the sign inversion needed for reconstruction in
(11.2), it is necessary to choose a prototype filter with an even number of coefficients: N = 2P. The
frequency response of such a filter is written as (see Section 5.2):
The prototype is designed by specifying the pass-band edge f1, the stop-band edge f2, the ripples δ1 and δ2, and imposing the amplitude √2/2 at frequency 1/4. In order for H²(f) to meet condition (11.10) with ripple δ, it is sufficient to take:

$$f_2 = \frac{1}{2} - f_1; \qquad H(0) = 1; \qquad \delta_1 = \frac{\delta}{2}; \qquad \delta_2 = \sqrt{\delta}$$
Example: A low-pass filter has the following parameters:

Δf = 0.24; f1 = 0.13; f2 = 0.37; δ = 0.01

Calculated coefficients:

h1 = h8 = 0.015235; h2 = h7 = −0.085187
h3 = h6 = 0.081638; h4 = h5 = 0.486502
∑15
The even coefficients of the filter i=1 h′i Z −i should be null, except for h′8 .
7 ( )2
∑
In fact, we obtain: h′2i = 1.7 × 10−5
i=1; i≠4
Iterative methods can be employed, if necessary, to complete the calculation and achieve a better
approximation of the symmetry.
The quality of the reconstruction depends on the ripple 𝛿 and, therefore, on the number of coef-
ficients. If perfect reconstruction is sought, the option of identical filters must be dropped, and
undersampling is necessary at analysis input [2].
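With the coefficients of the example, the overall function (11.6) can be checked numerically, as a sketch: T(Z) should reduce to approximately a pure delay of amplitude close to 1/4, with spurious terms on the order of the ripple.

```python
import numpy as np

h = np.array([0.015235, -0.085187, 0.081638, 0.486502,
              0.486502, 0.081638, -0.085187, 0.015235])   # prototype H(Z)
h2 = np.convolve(h, h)                          # coefficients of H^2(Z)
h2m = h2 * (-1.0) ** np.arange(len(h2))         # coefficients of H^2(-Z)
t = 0.25 * (h2 - h2m)                           # T(Z), equation (11.6)

center = len(t) // 2                            # the surviving delay term
print(f"main term t[{center}] = {t[center]:.4f} (ideal 0.25 for H(0) = 1)")
print(f"largest spurious term: {np.abs(np.delete(t, center)).max():.5f}")
```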
Figure: amplitude of the frequency responses of the filters H0(Z) and H1(−Z) of the example.
Figure 11.3 Frequency responses of the filters in the standard JPEG 2000 – option lossless.
The two filters obtained are at the basis of the reversible transform used in the still image com-
pression standard JPEG 2000 – option lossless. The frequency responses are given in Figure 11.3.
It is worth pointing out that the two sub-bands are unbalanced, so the filters are not of the
half-Nyquist type.
Decomposition into two equal sub-bands can be achieved provided the linear phase constraint is dropped and the following factorization of the half-band filter is performed:
The filters H 0 (Z) and H 1 (−Z) have the same coefficients but in reverse order and the polynomial
P(Z) has degree 2K and 2K + 1 coefficients.
As a function of M, the number of different coefficients, the equality 2K + 1 = 4M holds, which
implies K = 2M−1. The fact that the whole number K is odd allows relation (11.2) to be satisfied,
and with:
it can readily be verified that conditions (11.1) and (11.3) are met.
Figure 11.4 Frequency response of the minimum phase filter H0(Z).
The coefficient calculation procedure is the minimum phase technique for FIR filters described
in Section 5.13. The ripple is added to the center coefficient of the half-band filter, which makes the
zeros on the unit circle double, and the minimum and maximum phase factors are extracted [3].
As an illustration, let us consider a filter P(Z) with 2K + 1 = 15 coefficients, computed with the specifications f1 = 1/2 − f2 = 0.2. The M = 4 different coefficients are: h1 = 0.62785; h3 = −0.18681; h5 = 0.08822; h7 = −0.05297.
The ripple value is 𝛿 = 0.047 and the central coefficient becomes h0 = 1.047. Taking one of the
zeros that are on the unit circle and those that are inside, the first filter is obtained:
$$H_0(Z) = 0.3704 + 0.5111Z^{-1} + 0.2715Z^{-2} - 0.0885Z^{-3} - 0.1346Z^{-4} + 0.0338Z^{-5} + 0.0973Z^{-6} - 0.0703Z^{-7}$$

The corresponding frequency response is shown in Figure 11.4. With respect to the initial filter, the ripple in the attenuated band is √δ.
Among the characteristics of the factors of P(Z), the regularity of the frequency response is worth
emphasizing, because it is important for the compression of signals and, particularly, of images.
In filtering, this property corresponds to the presence of multiple zeros at point Z = −1 in the
Z-transfer function. On the theoretical side, the approach is justified by the wavelet theory.
11.4 Wavelets
The objective of the wavelet theory is the representation of signals in the time-frequency domain.
It is a representation which is not possible with the Fourier analysis because it assumes periodic
or finite duration signals. Thus, in order to localize a signal in both time and frequency with the
Fourier transform, a sliding window has to be introduced.
11.4 Wavelets 239
H1 2 y2(n)
X(n)
H1 2 y3(n)
As the basis for decomposition, the wavelet transform uses a set of functions called wavelets,
deduced from a generating function through translation and dilation. It allows for analysis of sig-
nals with arbitrary duration [4, 5].
In practice, the discrete wavelet transform is a nonuniform filter bank and an efficient approach
for implementation consists of cascading banks of two filters, such as those described above, with
decimation by factor two at each stage and the same coefficients for all the filters. The operations
of translation and dilation are performed automatically by the sampling rate changes. The tree
structure for analysis is shown in Figure 11.5. By completing the lower part in the figure, a uniform
filter bank can be obtained.
The coefficients are calculated with the objective of maximum regularity – that is, function P(Z)
featuring the maximum number of zeros at point Z = −1.
A half-band FIR filter with N different coefficients has 4N−1 coefficients and degree 4N−2.
With its 2(N−1) null coefficients, the function P(Z) includes a factor of degree 2(N−1). Thus, the
degree of the remaining factor is 2N. Then, the analysis and synthesis filters are derived from the
factorization of P(Z).
In a minimum phase solution, the filters have 2N coefficients and the Z-transfer function of the
low-pass filter has N zeros at Z = −1 in the complex plane. As in the previous sections, numerical
values can be determined directly by combining this constraint with the conditions of canceling
the coefficients of the odd terms in P(Z). Alternatively, P(Z) can be obtained and factorized.
Table 11.1 provides the coefficients of filters H 0 (Z) for the first values of N. The coefficients
of the other filters involved in analysis and synthesis, H 1 (Z), G0 (Z), G1 (Z) as in Figure 11.1, are
given by:
∑
2N
H0 (Z) = h0,i Z −i ; h1,i = (−1)i h0.2N+1−i ; g0,i = h0.2N+1−i ; g1,i = h1.2N+1−i (11.19)
i=1
Frequency responses are provided in Figure 11.6. It is worth mentioning the similarity with
Butterworth filter responses (see Section 7.2.3), which have the same zeros at Z = −1, also have
the property of perfect reconstruction when the cutoff frequency is set in the middle of the useful
band, and have a monotonic frequency response.
It is also possible to obtain linear phase filters, by giving different odd numbers of coefficients to
the low-pass and high-pass filters. For example, Table 11.2 gives the coefficients of the (9, 7) filters
used in the JPEG 2000 standard for highrate lossy compression [6].
The accuracy of the reconstruction is evaluated by computing the impulse response of the
analysis/synthesis system. Multiplying by H 1 (−Z) the polynomial which is obtained by canceling
the coefficient with an odd index in H 0 (Z) and adding the product by H 0 (−Z) of the polynomial
obtained by canceling the coefficient with even index in H 1 (Z), it can readily be verified that the
center coefficient of the resulting polynomial is unity, while other coefficients take on almost
240 11 QMF Filters and Wavelets
Amplitude
1.5
N=1
0.5
N=5
0
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5
Frequency
zero values, the error being smaller than 2 × 10−6 . The discrepancies stem from rounding the
coefficients.
Figure 11.7 shows the frequency responses of the filters H 0 (Z) and H 1 (Z)/2. Again, the partition
of the signal is unbalanced, but less so than in Figure 11.3, due to the higher number of filter coeffi-
cients. These filters have half their zeros at points Z = ± 1 in the complex plane, which brings high
regularity to the frequency responses – an important property in image processing.
Regarding arithmetic complexity, it is worth pointing out the small number of multiplications,
close to that of polyphase techniques, because it is possible to benefit from the coefficient symme-
tries for both the analysis and synthesis.
11.4 Wavelets 241
Zi H 0 (Z) H 1 (Z)
Amplitude
1.4
1.2
H1
H0
1
0.8
0.6
0.4
0.2
0
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5
Frequency
H(f)
1 (a)
0 f
1 1 1 1
2N 4N 2N N
(b)
1
0 f
1 1 1 3
4N 2N N 2N
Figure 11.7 Frequency responses of the filters in the standard lossy JPEG 2000.
242 11 QMF Filters and Wavelets
The factorization (11.8) of the half-band filter can also be realized in lattice representation, cancel-
ing every other coefficient. The corresponding modular structure is shown in Figure 11.8.
The transfer functions of analysis and synthesis filters are obtained through the calculations pro-
vided in Section 8.5. For example, for three cells, K = 3, we find:
H0 (Z) = 1 + 𝛼1 Z −1 − 𝛼1 𝛼2 Z −2 + 𝛼2 Z −3
The delay is 2(K−1) sampling periods. The lattice coefficients are computed from filter
specifications. √ √
For example, imposing a zero at Z = −1 and H 0 (1) = 4, we obtain 𝛼1 = 1 + 2; 𝛼2 = 1 − 2.
Comparison with the previous section reveals that the lattice approach is less efficient than
other factorizations; referring to Table 11.1, the filter with four coefficients has two zeros at
Z = −1. However, the lattice approach has benefits when it comes to implementation, modularity
of the structure, and insensitivity to coefficient rounding, because perfect reconstruction is not
impacted [7].
2:1 1:2 Z– 2 Z
X(Z) – αi αi ˜
X(Z)
x(n) αi – αi ˜
x(n)
Z Z– 2 2:1 1:2
K cells K cells
H(f)
1
f
2N
f
f
2i + 1 2i + 3
4N 4N
Exercises
( )
n𝜋
11.1 The signal x(n) = cos 4
is applied to the filter with transfer function H(Z) = − 0.050 +
0.117 Z −1 + 0.452 Z −2 + 0.452 Z −3 + 0.117 Z −4 − 0.050 Z −5
What is the delay incurred by the filter? Give the impulse response and express the output
sequence.
11.2 Give the diagram of a QMF bank of two filters, assuming H(Z) is the prototype filter.
Provide the expressions of the signals at the output from the analysis. Compute the transfer
function of the analysis/synthesis system and give the expression of the reconstructed
signal.
11.3 In the case of perfect decomposition and reconstruction, the prototype P(Z) has degree 6
and it must be factorized with the same degree factors. Looking for maximum regularity, the
low-pass filter H 0 (Z) must have 3 zeros at Z = −1. Compute the coefficients of the high-pass
filter H 1 (Z) and find its zeros.
11.4 Compare the frequency responses with those in Figure 11.3. Quantization occurs at
analysis output; how is it amplified in the reconstruction phase?
11.5 Linear phase analysis filters are assumed to have three and five coefficients. Compute the
values which give the best possible sub-band separation and perfect reconstruction.
11.6 When a unit power white noise is applied at the inputs of the analysis filters, what is the
power of the output signals?
( )
11.7 The signal x(n) = cos n𝜋 4
is applied to the analysis filters in Table 11.2. Give the expres-
sions of the output signals before and after decimation. In the reconstruction process, give
the expressions of the signals before and after the final addition.
References
1 R. E. Crochière and L. R. Rabiner, Multirate Digital Signal Processing, Prentice-Hall Inc., Engle-
wood Cliffs, New Jersey, 1983.
2 M. Vetterli, Filter banks allowing perfect reconstruction. Signal Processing, 10(3), 1986, 219–244.
244 11 QMF Filters and Wavelets
3 M. Smith and T. Barnwell Exact reconstruction techniques for tree structured sub-band coders,
IEEE Transactions, ASSP-34, 3, 1986, 434–441.
4 I. Daubechies, Orthonormal bases for compactly supported wavelets. Communications on Pure
and Applied Mathematics, 41, 1988, 909–996.
5 S. Mallat, A Wavelet Tour of Signal Processing, 2nd ed., Academic Press, New York, 1999.
6 A. Skodras, C. Christopoulos and T. Ebrahimi, The JPEG 2000 still image compression standard,
IEEE Signal Processing Magazine, 18(5), 36–58, 2001.
7 P. P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice Hall Inc.: Englewood Cliffs, N.J,
1993.
245
12
Filter Banks
The signal decomposition and reconstruction techniques presented in the previous chapter can
be generalized to any number of sub-bands, using banks of more than two filters. In that case, in
principle, decimation can be performed at the output of the analysis filters, but, in general, it is
preferable to decimate at system input in order to benefit from the combination of a polyphase
network and DFT, thus minimizing the computational complexity.
In the realization of filter banks using polyphase networks and DFT, as explained in Section 10.7,
the operations involved are reversible and this leads to the arrangement shown in Figure 12.1 for
the decomposition and reconstruction of a signal [1, 2].
The difficulty, in practice, consists of applying the operations associated with the functions
−1 N
Hi (Z ).
The filter H(Z) which serves as a basis for the process, sometimes called the prototype filter, has
a polyphase decomposition whose elements satisfy relation (10.8):
1 ∑ −j(2𝜋∕N)im
N−1
z−i Hi (Z N ) = e H(Ze−j(2𝜋∕N)m ); 0 ≤ i ≤ −1 (12.1)
N m=0
If the prototype filter H(Z) has a cutoff frequency of less than f s /2N and infinite attenuation for
frequencies greater than or equal to f s /2N – that is, if aliasing of the spectrum due to sampling at
the rate 1/NT is negligible – one can write:
Under these conditions, the following equation is satisfied on the unit circle, except for the factor
Z −N :
1 ∑ 2 −j(2𝜋∕N)m
N−1
Hi (Z N )HN−i (Z N ) = H (Ze ) = H02 (Z N )
N 2 m=0
Hence:
Hi (Z N ) (H)N−i (Z N )
= 1; 0≤i≤N −1 (12.3)
H0 (Z N ) H0 (Z N )
These equations simply convey the phase relations illustrated in Figure 10.12.
Digital Signal Processing: Theory and Practice, Tenth Edition. Maurice Bellanger.
© 2024 John Wiley & Sons Ltd. Published 2024 by John Wiley & Sons
246 12 Filter Banks
H0(Z N) H0–1(Z N)
One can then take H N−i (Z N ) to realize Hi−1 (Z N ). Thus, the same filter bank is used for
decomposition and reconstruction of the signal; the overall operation corresponds to multiplication
by H02 (Z N ).
In certain applications, it is not possible to neglect aliasing; this is the case, for example, when it is
required to decompose a signal, sample it at the rate f s /N, and then reconstruct it with the greatest
possible accuracy over the band (0, f s ). Let G(Z) be the transfer function of the basic filter for the
reconstruction. As the product of a discrete Fourier transform and its inverse is equal to unity, the
overall operation corresponds to decomposition of the signal x(n) into N interleaved sequences
x(pN + i), to which are applied N operators with transfer functions Gi (Z N )H i (Z N ).
Figure 12.2 corresponds to a reduction of sampling frequency by N in the decomposition part,
sometimes called analysis, and an increase by N in the reconstruction part, sometimes called syn-
thesis. All processing in the corresponding device is performed at a rate 1/N, which is a particularly
effective approach.
The condition for reconstruction with a delay D is written as:
The delay D must be the same in all branches of the polyphase network for which the interleaving
corresponds to an increase of sampling frequency by N, and this can delay the initial signal, x(n−D).
To determine the inverse functions Gi (Z N ), it is necessary to perform a detailed analysis of the
frequency response for the elements of the polyphase network.
N H0(Z N) G0(Z N) N
N H1(Z N) G1(Z N) N
x(n) x̂ (n)
DFT DFT–1
X(z) ANALYSIS SYNTHESIS X̂ (z)
N HN – 1(Z N) GN – 1(Z N) N
Figure 12.2 Polyphase filter banks for signal analysis and synthesis.
12.2 Analyzing the Elements of the Polyphase Network 247
The frequency response of the elements H i (Z N ) of the polyphase network follows directly from
equation (12.1). However, simplifications can be made. Indeed, the filters of the bank generally
provide limited coverage, as in Figure 12.3. Under these conditions, a given filter is superimposed
only on its immediate neighbors if the response of the prototype filter H(Z) is such that H(f ) = 0
for |f | > 1/N.
Then, for the branch of index i, one can write:
( )
1
Hi (f ) = H(f ) + e−j(2𝜋∕N)i H f − (12.5)
N
Since the periodicity of this response is 1/N, if the coefficients are real, it is sufficient to consider
the response in the interval 0 ≤ f ≤ 1/2N, and this gives:
( )
1
Hi (f ) = H(f ) + e−j(2𝜋∕N)i H −f (12.6)
N
Assuming that the response of the prototype filter is a monotonically decreasing curve in the
transition band Δf , H i (f ) cannot be zero for f ≠ 1/2N. For the value 1/2N, this gives:
( ) ( )
1 1
Hi =H + (1 + e−j(2𝜋∕N)i ) (12.7)
2N 2N
This response is zero for the branch i = N/2.
Hence, with the decomposition (12.1), the same result as equation (10.36), the branch of index
N/2 cannot be inverted since its Z-transfer function has a zero in the Z-plane at the point − 1. To
obtain a set of invertible branches, it is necessary to use another polyphase decomposition.
In Chapter 5, it was shown that a linear phase FIR filter with an even number of coefficients is
an interpolator at mid-sample period. It can be considered to result from an FIR filter with an odd
number of coefficients, having the same frequency response by downsampling of a factor two. It is
therefore necessary to start with a polyphase decomposition with 2N branches and to retain only
one branch in two. One then obtains this equation for H(Z):
∑
N−1
H(z) = Z −(i+1∕2) Hi+1∕2 (Z N ) (12.8)
i=0
With this decomposition, the minimum amplitude H min in a branch is given by:
| ( )
2 || 𝜋
Hmin = ||H(N+1)∕2 = |(1 − ej𝜋∕N )| = 2 sin (12.9)
| 2N || 2N
As a result, to have an invertible polyphase network, it is sufficient to require that the linear phase
prototype FIR filter have an even number of coefficients.
H Δf
1
0 1 1 3 f
2N N 2N
Note that the zeros of the functions H i+1/2 (Z N ) in the Z N -plane with respect to the unit circle
are divided equally between the interior and the exterior of the unit circle. The reason is that the
elements of H i+1/2 (Z N ) are almost of linear phase. Furthermore, the amplitude of the frequency
response remains close to unity, so the zeros are far from the unit circle, except for the branches
which exhibit high attenuation at the frequency 1/2N when N is large.
By placing the transfer functions in the Z-plane, determination of the inverse function for the
polyphase elements starts with a factorization where the L1 zeros within the unit circle are sep-
arated from the L2 zeros which are outside:
∏
L1
∏
L2
Hi (Z) = hi0 (1 − Zh Z −1 ) (1 − Zl Z −1 ) (12.10)
h=1 l=0
−Z ∑ −1 i i
∞
1
= (Z ) Z (12.11)
1 − Zl Z −1 Zl l=0
and, consequently, the inverse of the second factor of (12.10) can be approached with an
arbitrary accuracy within a finite number of terms. Consider the function Gi (Z) defined by:
∑
L3
∑
L3
al Z −l al Z −l
l=0 l=0
Gi (Z) = = (12.12)
∏
L1
∑
L1
(1 − Zh Z −1 ) 1+ al Z −l
h=1 l=0
where L3 is an integer.
The condition for inversion is satisfied if:
(L )(L )
∑ 3 ∑ 3
−l −l
cl Z al Z = Z −(L2 +L3 ) (12.13)
l=0 l=0
The choice of delay L2 + L3 is justified by the fact that the coefficients of the expansion of (12.11)
are decreasing and that the second factor in (12.10) is of maximum phase. The inversion relation
is written in matrix form:
⎡C0 0 0 ··· 0 ⎤ ⎡ a0 ⎤ ⎡ 0 ⎤
⎢C C0 0 … 0 ⎥⎥ ⎢⎢ al ⎥⎥ ⎢⎢0⎥⎥
MA = ⎢ 1 = (12.14)
⎢⋮ ⋮ ⋮ ⋮ ⎥ ⎢ ⋮ ⎥ ⎢⋮⎥
⎢0 0 0 ··· CL3 ⎦ ⎣aL3 ⎦ ⎢⎣1⎥⎦
⎥ ⎢ ⎥
⎣
where the vector A has the unknown coefficients al as elements. The system is overdetermined and
permits a solution in the least-squares sense, given by the relation:
(M t M)−1 M t (12.15)
The polyphase synthesis elements Gi (Z), have the structure of a general IIR filter. As the poles
Z h are far from the unit circle, a realization with a direct structure is possible.
12.4 Banks of Pseudo-QMF Filters 249
The method of calculating the polyphase synthesis elements described above is general and
applies to all analysis filters, with the sole condition that there are no zeros on the unit circle.
It is necessary to perform one calculation per branch since different coefficients are obtained. This
enables the analysis filter to be specified relatively independently of the synthesis filter. However,
it may be useful to sacrifice a little flexibility in order to obtain a simpler and more systematic
calculation, as in the previous chapter.
This will be described for a uniform bank of N real filters.
The principle relies on the assumption that, for a given filter, the attenuation is such that aliases
originate only in the adjacent bands [3]. Let H(Z) be the transfer function of a prototype low-pass
linear phase filter having the frequency response represented in Figure 12.4. Consideration of a
number of coefficients equal to LN gives:
∑
LN−1
H(Z) = hk Z −k (12.16)
k=0
In the bank, the filter of index i, centered on the frequency (2i + 1)/4N, has a transfer function
H(Ze−j2𝜋(−2i+1)/4N ). During the analysis, a component of the signal at the frequency (2i + 1)/4N + Δf ,
with perhaps 1/4N < Δf < 3/4N, will be attenuated by a factor H(Δf ). Downsampling at a rate 1/N
will produce a replica of this component at a frequency:
( )
2i + 1 3 2i + 1 3
+ − + Δf = − Δf
4N 4N 4N 4N
During synthesis, this component, aliased in the band of the filter of index i, will be found at
the frequency (2i + 1)/4N + 1/2N − Δf , and it will be subject to attenuation by the synthesis fil-
ter of index i – that is, G(1/2N) − Δf , if G(Z) designates the prototype synthesis filter. Finally, the
replicated component will have suffered attenuation:
( )
1
H(Δf)G − Δf
2N
H(f)
1
0 f
1 1 1 1
–
2N 4N 2N N
(a)
1
0 f
1 1 1 3
4N 2N N 2N
(b)
Figure 12.4 (a) Prototype filter, (b) uniform bank of N real filters.
250 12 Filter Banks
H(f)
1
Δf
2N
f
Δf
2i + 1 2i + 3
4N 4N
The same component of the signal will now be processed by the filter of index i+1, since it falls in
the pass band. Sampling then produces an image component which, during synthesis, will be added
to the previous aliased component with attenuation H(1/2N − Δf )G(Δf ). The process is illustrated
in Figure 12.5.
Hence, the condition for these components to compensate each other is:
[ ( )] [ ( ) ]
1 1
H(Δf)G − Δf + H − Δf G(Δf)i+1 = 0 (12.17)
2N i 2N
This condition for the absence of aliasing can be obtained by taking G(f ) = H(f ) and applying a
phase difference of 𝜋/2 between the filters of index i and i + 1, during analysis and synthesis.
The necessary phase difference can be obtained by introducing phase shifts into the modulation
functions – for example, by taking the following values for the coefficients hik of the analysis filter
of index i:
[ ( ) ]
𝜋 LN − 1
hik = 2hk cos (2i + 1) k− + 𝜃i (12.18)
2N 2
with 0 ≤ i ≤ N − 1 and 0 ≤ k ≤ NL − 1.
For the synthesis filter:
[ ( ) ]
𝜋 LN − 1
gik = 2hk cos (2i + 1) k− − 𝜃i (12.19)
2N 2
Setting:
LN − 1
ai = ej𝜃i Ci = e−j(2i+1)𝜋∕2N (12.20)
2
gives the following, for the corresponding transfer functions:
Hi (Z) = ai ci H(Ze−j(2i+1)𝜋∕2N ) + ai ci H(Zej(2i+1)𝜋∕2N ) (12.21)
With the assumption that the filters have sufficient attenuation for only the adjacent bands to
cause significant aliasing, it is now necessary to determine the values for the angles 𝜃 i to obtain the
desired cancellation.
At the output of the filter H i (Z), after downsampling, in accordance with equation (10.8),
the signal is:
1∑
N−1
Xi (Z N ) = H (ZWm )X(ZWm ) (12.25)
N m=0 i
The output signal of the system becomes:
∑
N−1
1∑
N−1
∑
N−1
̂i (Z) =
X Gi (Z)Xi (ZN ) = X(ZWm ) Gi (Z)Xi (ZWm ) (12.26)
i=0
N m=0 i=0
∑
N−1
Gi (Z)Xi (ZWm ) = 0; 1≤m≤N −1 (12.28)
i=0
To make the aliased components appear to cancel one another out, it is necessary to examine the
output signal of each of the synthesis filters. At the output of the filter Gi (Z), the signal X i (Z) using
equation (12.25) can be written as:
∑
N−1
̂i (Z) = Gi (Z) 1
X H (ZWm )X(ZWm ) (12.29)
N m=0 i
However, in light of definition (12.22) and the assumptions concerning attenuation, the filter
Gi (Z) allows passage only of the band centered on frequency (2i + 1)/4N and the two adjacent
bands. Taking account of the distribution of the bands on the frequency axis, the indices m associ-
ated with these adjacent bands correspond to a frequency translation such that:
2i + 1 m 2i + 1 1
± =− ± (12.30)
4M N 4M 2N
In fact, the downsampling leads to frequency translations which are integral multiples of the
frequency 1/N. Under these conditions, the values of m can be written as:
m = ±i and m = ±(i + 1)
For example, considering the case of Figure 12.5, the aliased component arises from a component
at the frequency −(2i + 1)/4N − Δf shifted by (i + 1)/N; hence:
( )
2i + 1 i + 1 2i + 3
− + Δf + = − Δf
4N N 4N
̂i (Z) is limited to the following expansion by using equation (12.21) to define
Consequently, X
H i (Z):
̂i (Z) = Gi (Z) 1 [ai ci H(ZW −(2i+1)∕4 )]X(Z) + ai ci H(ZW (2i+1)∕4 )X(Z)
X
N
+ ai ci H(ZW (2i−1)∕4 )X(ZWi ) + ai ci H(ZW(1−2i)∕4 )X(ZW−i )
+ ai ci H(ZW (2i+3)∕4 )X(ZWi+1 ) + ai ci H(ZW−(2i+3)∕4 )X(ZW−(i+1) ) (12.31)
252 12 Filter Banks
Since:
∑
N−1
̂ (Z) =
X ̂i (Z)
X (12.32)
i=0
The specification of the prototype filter reflects the required separation between the sub-bands.
Several approaches to the design can be envisaged.
It is the design of a half-Nyquist filter, and the first approach consists of taking a cosine transition
band and using (5.37). However, the sampling frequency technique set forth in Section 5.4 may be
more efficient. In particular, when the transition band equals sub-band spacing, the coefficients
are derived from a simple formula [4].
Let K be a whole number and consider a set of KN samples H k (0 ≤ k ≤ KN − 1) in the frequency
domain:
H0 = 1 (12.40)
Hk2 + HK−k
2
= 1; HKN−k = Hk ; 1 ≤ k ≤ K − 1
Hk = 0; K ≤ k ≤ KN − K
∑
K−1
H0 + 2 (−1)k Hk = 0 (12.41)
k=1
Then, the middle coefficient hKN/2 is null, and the filter has an odd number of coefficients. There
is a double zero at half the sampling frequency, as indicated in Section 5.12, and high attenuation
results for high frequencies.
For K = 3 and K = 4, equations (12.40 and 12.41) define the frequency samples:
H1 = 0.911438; H2 = 0.411438
and
1
H1 = 0.971960; H2 = √ ; H3 = 0.235147 (12.42)
2
For example, a bank of N = 16 filters and K = 4 has the coefficients:
( )
∑
K−1
2𝜋ki
hi+1 = 1 + 2 (−1)k Hk cos ; 1 ≤ i ≤ 63 (12.43)
k=1
KN
h1 = 0
The filter obtained has an odd number of coefficients. The frequency response, whose transition
band is equal to 1/16, is shown in Figure 12.6. The attenuation grows with frequency, but it is pos-
sible to obtain a constant attenuation using the method indicated in Section 11.2 with appropriate
specifications. Iterative techniques may be used in addition to the procedure to better approximate
the symmetry condition (12.37).
Once the coefficients are determined, the calculations should be set out and performed so that
the number of arithmetic operations is minimal.
254 12 Filter Banks
0
Amplitude
(dB)
–50
–100
–150
0 0.2 0.4 0.6 0.8 1
Normalized frequency
Figure 12.6 Prototype filter obtained by frequency weighting for a bank of 16 filters and 64 coefficients.
Consider a bank of real filters having the frequency responses shown in Figure 12.4 and an even
number of coefficients 2LN, in which the filter of index i has the coefficients:
[ ( )( )]
2𝜋 1 1
hik = hk cos i+ k+ (12.44)
2N 2 2
with 0 ≤ i ≤ N, 0 ≤ k ≤ 2LN − 1.
A decomposition into a polyphase network and a discrete Fourier transform can be obtained by
setting k = 2Nl + m with 0 ≤ l I ≤ L − 1 and 0 ≤ m ≤ 2N − 1.
The output xi (n) for the filter of index i can be written as:
∑
2LN−1 [ ( )( )]
2𝜋 1 1
xi (n) = x(n − k)hk cos i+ k+ (12.45)
k=0
2N 2 2
Applying the general transfer function decomposition (10.36) to the prototype filter gives:
∑
2N−1
H(Z) = Z −m Hm (Z 2N ) (12.47)
m=0
and the filters H m (Z 2N ) are those which arise in the second summation of expression (12.46). To
take account of the factor (−1)l , it is sufficient to introduce the functions H m (−Z 2N ), and the dia-
gram corresponding to the analysis filters is shown in Figure 12.7. The decomposition produces a
polyphase network with 2N branches, and a cosine transform.
12.6 Realizing a Bank of Real Filters 255
x(n)
Cosine
X(Z ) transform
xN(n)
HN + i (–Z 2N)
x(n)
Z–N
–
Z–(N – I – i) HN – I – i (–Z 2N) + yN – I – i (n)
Figure 12.7 can be further simplified since, in the cosine transform considered, two symmetrical
inputs are subjected to the same operations, apart from the sign. Factorizing yields an odd-time
odd-frequency cosine transform – a special case mentioned in Section 3.3.2. Furthermore, with
downsampling, the system operates at a rate 1/N. As the system consists of 2N branches, a given
x(n) is processed by the filters H i and H i+N at two successive instants. Under these conditions,
the 2N branches of the polyphase network can be regrouped as N/2 subgroups having the lattice
structure shown in Figure 12.8. The overall configuration is therefore as illustrated in Figure 12.9.
In the case of pseudo-QMF filters, this arrangement is applicable with the introduction of phase
shifts. In fact, by taking equation (12.45) with the coefficients of the filter given in equation (12.38),
the output xi (n) of the filter of index i with 2LN coefficients can be written as:
∑
2LN−1 [ ( ) ]
2i + 1 2LN − 1 𝜋
xi (n) = 2 x(n − k)hk cos 2𝜋 k− + (2i + 1) (12.48)
k=0
4N 2 4
∑ ∑
N−1 2L−1 [
𝜋 •
xi (n) = 2 cos (2i + 1)(2m + 1 + N)
m=0 l=0
4N
]
+ (2i + 1)(l − L)𝜋∕2 hlN+mx (n − lN − m) (12.49)
256 12 Filter Banks
H0 HN x0(n)
x(n)
TFDI 2
Z–(N/2 – 1) HN/2 HN + N/2
Z–N/2 HN/2 – I HN + N/2 – 1
xN – 1(n)
∑
N−1 [ ]
𝜋
xi (n) = 2 cos (2i + 1)(2m + 1) y (n) (12.50)
m=0
4N m
with:
1
ym (n) = [−y2,N∕2−1−m (n) + y2,N∕2+m (n)]; 0 ≤ m ≤ N −1
2
ym (n) = [y1,m−N∕2 (n) − y1,3N∕2−M−1 (n)]; N∕2 ≤ m ≤ N − 1
and:
∑
2L−1
𝜋
y1,m (n) = cos(l − L) hlN+m x(n − lN − m)
m=0
2
∑
2L−1
𝜋
y2,m (n) = sin(l − L) hlN+m x(n − lN − m)
m=0
2
The sequences y1,m (n) and y2,m (n) are interleaved, with sampling frequency f s /2N, and the bank
of analysis filters is determined with an odd-time odd-frequency cosine transform. Finally, the
phase shifts introduced by the pseudo-QMF technique have been taken into account simply by
rearranging the data before the transform.
References 257
Exercises
12.1 In a bank of four filters, the prototype is linear phase, and it has the coefficients:
h = [−0.0218 − 0.0200 − 0.0116 0.0160 0.0621 0.1181 0.1691 0.1996]
Give the polyphase-FFT (PPN-FFT) decomposition. The sampling frequency is unity.
Check that the amplitude of the frequency response in the pass band is the same for all four
branches. The phases are linear and the values at frequency 1/8 in radian
phase = [−1.4829 − 1.2459 − 1.0366 0.8002]
Calculate the delay brought by each branch and justify the results.
12.2 A filter bank can be realized with a FFT whose size equals the number of coefficients
of the prototype filter. Justify the approach, using the results in Chapter 2. Compute the
coefficients. How are they placed in the analysis and synthesis parts?
As an illustration, the example in Section 12.5 is considered with K = 4 and N = 16. Give
the values of the seven coefficients to apply in front of the summation that yields the output
of each filter. The base filter, in the synthesis part, is centered on the FFT output of index 0.
At which FFT output are the seven coefficients applied? Answer the same question for the
neighboring filter centered on index 4.
Describe the application to the synthesis filter bank.
12.3 A bank of N = 16 filters is used to decompose a signal. The 2N coefficients of the prototype
filter are the following:
( ( ))
𝜋 1
h(n) = sin n+ ; 0 ≤ n ≤ 2N − 1
2N 2
What is the amplitude H(f ) of the frequency response? (Refer to Section 5.8 and find an
approximate expression corresponding to the continuous case.)
Compare with the filter associated with the DFT of size N = 16, considering the coefficients
hFFT (n) = 1; 0 ≤ n ≤ 15.
In a PPN-FFT realization, give the expression of the coefficients in each of the 16 branches.
An alternative realization is based on a FFT of size 2N = 32. What processing must be intro-
duced at FFT output to realize the prototype filter? Compare the number of multiplications
with the previous approach.
References
1 M. Bellanger and J. Daguet, TDM-FDM transmultiplexer: digital polyphase and FFT. IEEE
Transactions on Communications, 22(9), 1199–1205, 1974.
2 R. E. Crochière and L. R. Rabiner, Multirate Digital Signal Processing, Prentice-Hall Inc.,
Englewood Cliffs, NJ, 1983.
3 N. J. Fliege, Multirate Digital Signal Processing, John Wiley, Chichester, 1994.
4 K. W. Martin, Small side-lobe filter design for multitone data communications. IEEE Transactions
on Circuits and Systems II, 45(8), 1155–1161, 1998.
259
13
The modeling of systems is one of the most important areas of signal processing. Furthermore,
modeling is an alternative approach to signal analysis, with properties differing from those of
the Fourier transform and those of filters defined in the frequency domain. Linear prediction,
in particular, is a simple and efficient tool to characterize some signal types and then compress
them. The processing is specified in the time domain, using statistical parameters and, particularly,
correlation.
1 ∑N
rxx (n) = lim x(i)x(i − n) (13.2)
N→∞ 2N + 1
i=−N
The function r xx (n) is even. Its value at the origin is the power of the signal and, for any n, is:
|r (n)| ≤ r (0) (13.3)
| xx xx
Consider a set of N coefficients ai (1 ≤ i ≤ N). Calculation of the variance of the variable y(n)
such that:
∑
N
y(n) = ai x(n − i)
i=1
Digital Signal Processing: Theory and Practice, Tenth Edition. Maurice Bellanger.
© 2024 John Wiley & Sons Ltd. Published 2024 by John Wiley & Sons
260 13 Signal Analysis and Modeling
results in:
∑ ∑
N N
E[y2 (n)] = ai aj E[x(n − i)x(n − j)]
i=1 j=1
or,
∑ ∑
N N
E[y2 (n)] = ai aj rxx (i − j) (13.4)
i=1 j=1
This property characterizes positive functions. If, in the definition of equation (13.1), x(i − n)
is replaced by another signal, a function is obtained which allows two different signals to be com-
pared. The intercorrelation function between two discrete signals x(n) and y(n) is the set r xy (n) such
that:
1 ∑N
rxy (n) lim x(i)y(i − n) (13.7)
N→∞ 2N + 1
i=−N
Similarly,
For example, if the signals are the input and output of a filter:
∑
∞
y(n) = hj x(n − j)
j=0
or,
Similarly,
and also:
If two random signals are independent, their intercorrelation functions are zero. Further, the
following inequality is always valid:
1
|rxy (n)| ≤ [r (0) + ryy (0)] (13.12)
2 xx
13.2 Correlogram Spectral Analysis 261
It is worth mentioning that the AC and intercorrelation functions can, in some cases, be com-
puted without multiplication, the signals being replaced by their signs. Thus, if x(n) is a Gaussian
signal:
√
𝜋√
rxx (n) = r (0)E[x(i)sing(x(i − n))] (13.13)
2 xx
[ ]
𝜋
rxx (n) = rxx (0) sin E[sing[x(i)x(i − n)]] (13.14)
2
These expressions can greatly simplify the equipment required.
The Fourier transform Φxy (f ) of the intercorrelation function r xy (n) is called the interspectrum:
Φxy (f ) = X(f )Y (f )
where X(f ) denotes the spectrum of the set x(n) and Y (f ) is the conjugate spectrum of the set y(n).
If the set y(n) is the output of a filter with transfer function H(f ), then:
Y (f ) Y (f )X(f )
H(f ) = =
X(f ) X(f )X(f )
Hence,
and finally,
These results apply to the spectrum analysis of random signals in general and are useful for the
study of adaptive systems.
The AC function of the signal is r(p). In practice, the analysis is performed using a limited
number, N 0 , of signal samples. Therefore, r(p) must be estimated first.
An initial estimation of the AC function is:
1 ∑
N0
r1 (p) = x(n)x(n − p) (13.18)
N0 n=p+1
1 ∑
N0
r2 (p) = x(n)x(n − p) (13.19)
N0 − p n=p+1
From P values of the AC function, the so-called correlogram spectral estimation is defined by:
∑
P−1
SCR (f ) = r2 (p)e−j2𝜋pf (13.20)
p=−(P−1)
A single frequency f is to be estimated among a set of spurious components which can be seen
as an additive white Gaussian noise (AWGN). If N samples x(n) (0 ≤ n ≤ N − 1) are available, the
attainable accuracy depends on N and on the signal-to-noise ratio (SNR). A lower bound of the
estimation variance is (see Appendix):
6
BCRf = (13.23)
4𝜋 2 SB N 3
In practice, the theoretical bound above can be closely approximated, provided the SNR remains
greater than a threshold, whose magnitude is roughly:
29
SBmin ≈ (13.24)
N
The simplest estimation technique is based on a DFT of size N, and the procedure is made up of
two steps. First, a rough estimate is derived, taking the index of the output with maximum magni-
tude, and then, it is refined through interpolation [2].
Assuming the sampling frequency is unity, the signal is written as:
A, f , and 𝜑 are the amplitude, frequency, and phase of the component to be identified; the additive
noise is b(n). The first step is based on the decomposition:
k0 1 1
f = + 𝛿f with k0 integer and − < 𝛿f < (13.26)
N 2N 2N
13.3 Single-Frequency Estimation 263
The DFT output y(k) which has the maximum amplitude provides k0 in (13.26). After a shift
about the origin (k0 = 0), we get:
( ( ))
sin N𝜋 𝛿f − Nk
y(k) = Aej𝜑 ( ( )) ej𝜋𝛿f(N−1) + 𝛽(k); 0 ≤ k ≤ N − 1 (13.27)
k
N sin 𝜋 𝛿f − N
𝛽(k) is the noise component in the frequency domain. Since the phase is unknown, in order to
assess the frequency deviation, it is necessary to get rid of the phase terms, and one can use the real
function:
( ( ))
sin N𝜋 𝛿f − Nk
y′ (k) = A ( ( )) + |𝛽(k)|ej𝜋k∕N (13.28)
N sin 𝜋 𝛿f − Nk
which is obtained by restoring the signs of the sin(x)/x function in |y(k)|. Of course, ambiguity is
introduced, and we only know that indices k and N−k are associated with opposite signs. The ambi-
guity can be solved by observing the sign of the difference abs(y(N−1))–abs(y(1)). From (13.28), it
is possible to apply the least squares method to minimize the cost function:
N
−1 [ ]2
∑
2
sin(N𝜋(𝛿f − k∕N))
J= y′ (k) − (13.29)
N sin(𝜋(𝛿f − k∕N))
k=− N2 +1
The minimum can be reached through iteration – for example, using the procedure set forth in
Section 5.4. A direct estimation is obtained by series development of the cardinal sine function.
For example, assuming the amplitude A has been estimated and keeping only 3 terms in the cost
function (13.29), we obtain:
1 abs(y(1)) + abs(y(−1))
𝛿f ≈ (13.30)
N [abs(y(0)) − 1] 𝜋 2 + 2
3
The estimation is valid in the interval [−1/2N 1/2N], and it is degraded in the vicinity of
the bounds. For example, for N = 16, 1/2N = 0.031 and 𝛿f = 0.01, we obtain the estimation
𝛿f estim = 0.0106.
Iterative techniques are needed to approach the bound (13.23). A particularly efficient technique
is called iterated correlations, according to which the amplitude and the frequency of the sine wave
can be estimated independently of the phase. The procedure begins with calculating the AC func-
tion of the signal x(n) by:
{ }
2N |TFD2N (x)|
2
[r1 ] = TFD−1 (13.31)
In order to apply a DFT of size 2N, N zeros are appended to the sequence x(n). The frequency
domain interpolation (see Section 9.10) leads to the AC function because the squaring in the fre-
quency domain amounts to convolution of the signal by its conjugate. Thus, we obtain:
1∑ N − p j2𝜋pf 1 ∑
N−1 N−1
r1 (p) = x(n)x∗ (n − p) = e + b (n); 0≤p≤N −1 (13.32)
N n=p N N n=p 1
∑
N−1
Nr(0) = 𝜆i = N𝜎x2 (13.38)
i=0
13.4 Correlation Matrix 265
That is, if the determinant of the matrix is nonzero, each eigenvalue is nonzero, and their sum is
equal to N times the power of the signal. The positive nature of the matrix RN further implies that
they are all positive:
𝜆i > 0; 0≤i≤N −1 (13.39)
To ensure this, it is necessary and sufficient that the following determinants all be positive:
⎡ r(1) r(1) · · · r(N − 1)⎤
[ ] ⎢ ⎥
r(0) r(1) r(1) r(0) ··· ⋮ ⎥
r(0); det ; … ; det ⎢
r(1) r(0) ⎢ ⋮ ⋮ ⋮ ⋮ ⎥
⎢r(N − 1) ··· · · · r(0) ⎥⎦
⎣
The corresponding matrices are the AC matrices of order less than or equal to N.
Under these conditions, the matrix RN can be diagonalized so that:
RN = M t diag(𝜆i )M (13.40)
where M is a square matrix of dimension N, such that M t = M −1 , and diag(𝜆i ) is the diagonal matrix
of the eigenvalues. M t can be equal to M in some cases.
The matrix can be expressed in terms of its normalized eigenvectors U i (0 ≤ i ≤ N − 1) by:
∑
N−1
RN = 𝜆i Ui Uit (13.41)
i=0
and, if this maximum corresponds to the value for i = 0, one can write:
∏ RN − 𝜆j IN
N−1
RPN ≈ 𝜆Pmax (13.43)
j=1
𝜆max − 𝜆j
It will be shown below that these two extreme eigenvalues, 𝜆max and 𝜆min , condition the behavior
of adaptive systems.
The physical interpretation of the eigenvalues of the AC matrix is not readily apparent from their
definition, but it can be illustrated by comparing them with the spectrum of the signal x(n).
The case where the signal x(n) is periodic and has period N will be considered first. In this case,
the set r(n) is also periodic, and is also symmetrical with:
Under these conditions, the matrix RN is a rotating matrix in which each row is derived from the
preceding one by shifting. If the set Φxx (n) (0 ≤ n ≤ N − 1) denotes the Fourier transform of the set
r(n), then it can be shown directly that:
RN TN = TN diag(Φxx (n))
Comparison with equation (13.26) shows that, in this case, the eigenvalues of the matrix RN are
the discrete Fourier transform of the AC function – that is, the values of the signal power spectral
density. M is the cosine transform matrix.
This relation is also valid for discrete white noise as the spectrum is constant and, since the AC
matrix is a unit matrix (to a factor), the eigenvalues are equal.
Real signals generally have a spectral density with nonconstant power, and their AC function
r(p) decreases as the index p increases. For sufficiently large N, the significant elements of the
N-dimensional matrix can be regrouped around the principal diagonal. Under these conditions, let
R′N be the AC matrix of a signal x(n) which is assumed to be periodic with period N. Its eigenvalues
Φxx (n) form a sample of the power spectral density. The discrepancy between RN and R′N is due to
the fact that R′N is a rotating matrix and the difference appears primarily in the upper right-hand and
lower left-hand corners. Thus, RN can be better approximated by a diagonal matrix than R′N , and
consequently, its eigenvalues are less dispersed. In fact, under certain conditions which commonly
occur in practice, it can be shown that [3]:
In conclusion, it can be considered in practice that, when the dimension of this matrix is
sufficiently large, the extreme eigenvalues of the AC matrix approximate the extreme values of the
power spectral density of the signal.
13.5 Modeling
Digital filters apply to system modeling, as shown in Figure 13.1. The signal x(n) is fed to the
system and the model, and the coefficients are computed so that the difference in the outputs is
minimized.
13.5 Modeling 267
y(n)
System to be modeled
x(n) e(n)
+
–
Digital filter
y(n)
Depending on the knowledge available to begin with, one of a number of filters may be used as a
model. However, FIR filters are chosen in general, due to their ease of design and implementation,
and the output is:
∑
N−1
ỹ (n) = hi x(n − i) = H t X(n) (13.49)
i=0
where X(n) is the vector of the N most recent data. The number of coefficients N is chosen in
accordance with information available about the model. The output error is defined by:
e(n) = y(n) − ỹ (n) (13.50)
Next, the coefficients are calculated using the minimum mean square error (MMSE) criterion
with cost function:
J = E[e2 (n)] (13.51)
Canceling the derivatives of this cost function produces the equation E[e(n)X(n)] = 0, which
defines the decorrelation of the output sequence and the vector of the most recent input data. Then,
we get:
E[y(n)X(n)] − E[X(n)X t (n)]H = 0 (13.52)
Definition (13.1) of the AC function and (13.36) of the AC matrix show that E[X(n)X t (n)] = RN
and the coefficients obtained by:
H = R−1
N ryx (13.53)
Thus, the coefficients of the modeling filter are obtained by multiplying the inverse of the AC
matrix by the cross-correlation vector of system input and output defined by:
⎡ ⎡ x(n) ⎤⎤
⎢ ⎢ ⎥⎥
⎢ ⎢ x(n − 1) ⎥⎥
⎢ ⎢ . ⎥⎥
ryx = E ⎢y(n) ⎢ ⎥⎥ (13.54)
⎢ ⎢ . ⎥⎥
⎢ ⎢ . ⎥⎥
⎢ ⎢x(n + 1 − N)⎥⎥
⎣ ⎣ ⎦⎦
Combining (13.51) and (13.53) yields the MMSE Emin and the three alternative expressions:
Emin = E[y2 (n)] − H t RN H
Equalization is a special case, in which the system to be modeled is the inverse of the system
which has generated the input sequence x(n). For example, in communications, the transfer func-
tion of the equalizer is the inverse of the channel transfer function, in the absence of noise.
Example:
The channel output signal x(n) is related to the emitted data d(n), assumed to be un-correlated and
of unit power, by:
Taking d(n) as the reference signal, y(n) = d(n), the three equalizer coefficients are:
−1
⎡1.29 0.60 0.20⎤ ⎡1⎤ ⎡ 0.9953 ⎤
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
H = ⎢0.60 1.29 0.60⎥ ⎢0⎥ = ⎢−0.4971⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎣0.20 0.60 1.29⎦ ⎣0⎦ ⎣ 00778 ⎦
The output error is Emin = 0.0047 and the Z-transfer function T(Z) of the channel-equalizer cas-
cade is:
It is readily verified that the filter H(Z) is a three-coefficient approximation of the inverse of the
channel, which is a FIR filter.
Linear prediction is a particular case in which the output of the system to be modeled is the signal
itself, as shown in Figure 13.2.
The output error, called prediction error, is written as:
∑
N
e(n) = x(n) − ai x(n − i) (13.56)
i=1
x(n) e(n)
+
AN(Z)
The decorrelation of the prediction error and the input signal leads to N relations:
∑
N
r(p) = ai r(p − i); 1≤p≤N (13.58)
i=1
The above expressions can be combined to yield the matrix linear prediction equation:
[ ] [ ]
1 EaN
RN+1 = (13.60)
−AN 0
A signal is said to be “predictable” if the prediction error is null – that is, the following recurrence
equation is verified:
∑
N
x(n) = ai x(n − i)
i=1
The predictable signal is made of at most N/2 sinusoids, and the filter with transfer function
H(Z) = 1 − AN (Z) has at most N zeros on the unit circle. For example, for N = 2 and a sinusoid of
frequency f 0 , the recurrence is:
x(n) = 2 cos(2𝜋f0 )x(n − 1) − x(n − 2)
In fact, the zeros of H(Z) are either on the unit circle or inside the unit circle because it is mini-
mum phase. The proof is as follows. Assume a zero z0 is outside the unit circle, |Z 0 | < 1, then H ′ (Z)
such that:
( )( )
H(z) z−1 z−1
H ′ (Z) = × 1 − 1 −
(1 − z0 z−1 )(1 − z0 z−1 ) z0 z0
yields a smaller prediction error because:
1
|H ′ (ej𝜔 )| = |H(ej𝜔 )|
|Z0 |2
as indicated in Section 9.6. According to expression (4.24) for the signal power at the output of a
filter, the power at the output of H ′ (Z) is lower than that at the output of H(Z) when the same signal
is fed to both filters. This contradicts the definition of the prediction filter.
Since the filter H(Z) is minimum phase and invertible, linear prediction is employed to analyze
and model signals. In fact, it is possible to retrieve a signal from its prediction error by inverting
expression (13.56):
∑
N
x(n) = e(n) − ai x(n − i) (13.61)
i=1
This is so-called autoregressive (AR) modeling. Assuming the prediction error is a white Gaus-
sian noise of power EaN , the signal spectrum can be estimated by:
EaN
SAR (f ) = (13.62)
| ∑ N |2
|1 − i=1 ai e−j2𝜋if |
| |
If the filter H(Z) is of IIR type, the model is called autoregressive moving average (ARMA), and
the spectrum can again be derived from the model.
270 13 Signal Analysis and Modeling
FIR and IIR filters can be implemented in lattice structures, as indicated in Section 8.5. The
approach is particularly advantageous in linear prediction [4].
The prediction coefficients are defined by (13.57) and, due to the properties of the matrix, the
inversion may be avoided by an iterative procedure.
The Levinson–Durbin procedure provides a solution for the system of equations (13.58) by recur-
sion over N stages. It begins by taking the power of the error signal as:
E0 = r(0)
aii = ki (13.63)
aij = ai−1
j − ki ai−1
i−j ; 1≤j≤i−1
( 2
)
Ei = 1 − ki Ei−1 (13.64)
ai = aNi ; 1≤i≤N
The term Ei corresponds to the power of the residual error with a predictor of order i. However,
the values of the coefficients ki obtained at the preceding stages are not changed in the ith stage. The
procedure is sequential, and the model is improved as the number of stages (and hence the number
of coefficients) increases, because, from equation (13.64), the power of the error is reduced at each
stage if |ki | < 1.
The coefficients ki completely define the filter and indicate the method of realization. The error
signal at stage i is formed by the set ei (n), where:
∑
i
ei (n) = x(n) − aij x(n − j)
j=1
∑
i
Ai (Z) = 1 − aij Z −j
j−1
we obtain:
An alternative decomposition of the prediction filter is provided by the line spectral pair (LSP)
method, which consists of splitting the transfer function into two parts with symmetric and anti-
symmetric coefficients. Taking recurrence (13.65) at order N + 1 and designating by PN (Z) the
polynomial obtained with kN + 1 = 1, we get:
Clearly, this is a decomposition of the polynomial AN (Z), because the sum of the two preceding
relations yields:
1
AN (Z) = [P (Z) + QN (Z)] (13.71)
2 N
The coefficients of PN (Z) and QN (Z) exhibit even and odd symmetries, respectively; they are lin-
ear phase and, since, as predictors, they are not allowed to have zeros outside the unit circle, all their
zeros are on the unit circle. Moreover, if N is even, the relations PN (1) = QN (−1) = 0 hold – hence
the factorization:
N
∑
2
QN (Z) = (1 + Z −1 ) (1 − 2 cos 𝜔i Z −1 + Z −2 )
i=1
x(n)
k1 k2
– –
Z–1 + Z–1 +
b0(n) b1(n) b2(n)
The above approach provides a realization structure for the prediction error filter, as shown in
Figure 13.4. The transfer functions F(Z) and G(Z) are the linear phase factors in (13.72). This struc-
ture is amenable to implementation as a cascade of second-order sections. The overall minimum
phase property is checked by observing the alternation of the Z −1 coefficients.
0 (reference)
x(t) ∆t
y(t)
Σ
θ
(N – 1)∆t
where 𝜆 = c/f is the wavelength associated with frequency f . The delays translate into phase shifts
and, in order to be able to distinguish the arrival angles between −𝜋/2 and 𝜋/2 and carry out
beamforming, the phase differences between elements must be less than 𝜋, which implies:
d sin 𝜃 1 𝜆
≤ ; d≤ (13.75)
𝜆 2 2
The value 𝜆/2 generally retained is the upper bound for the spatial sampling interval.
The interpolators boil down to multiplications by weighting coefficients given by:
wi = ej𝜔 iΔt ; 0≤i≤N −1 (13.76)
The calculation of the weighting coefficients is linked to the system environment.
In the presence of Gaussian white noise, a technique similar to linear prediction can be used. One
of the antenna elements is taken as the reference, from the corresponding signal x0 (n), a weighted
sum of the signals coming from the other elements is subtracted and the following difference is
obtained:
∑
N−1
x0 (n) − ỹ (n) = e(n) = x0 (n) − t
wi xi (n) = x0 (n) − WN−1 XN−1 (n) (13.77)
i=1
h21
h12
Receiver Transmitter
hNN
A system with N antennas at the sender and receiver sides is illustrated in Figure 13.6. The N × N
transfer matrix H is written as:
⎡ h11 h12 .. … … … … h1N ⎤
⎢ ⎥
h21 h22 … … … … h2N
H=⎢ ⎥
⎢ .. … … … … … … … … …⎥
⎢ hN1 hN2 .. … … … hNN ⎥
⎣ ⎦
where the entries hij are complex numbers representing the amplitude and the phase of every
subchannel linking transmitter and receiver antenna elements.
The analysis of the channel relies on the diagonalization of matrix H and the eigen-
decomposition, when it exists:
∑
N
H = M diag (𝜆i )M ∗ ; H= 𝜆i Ui Ui∗ (13.82)
i=1
where 𝜆i and U i (i = 1, …, N) are the eigenvalues and normalized eigenvectors, respectively, and
M the rotation matrix:
M = [U1 . … … UN ]
[ ]
1re−j𝜃
Example: H = ; 𝜆i = 1 ± jr; |𝜆i |2 = 1 + r 2
−rej𝜃 1
Eigenvectors:
[ ] [ ]
1 1 1 1
U1 = √ ; U2 = √
2 jej𝜃 2 −jej𝜃
Rotation matrix:
[ ]
1 11
M= √ j𝜃 j𝜃
2 je − je
The decomposition of the transfer matrix can be used to transmit a vector D of N data, provided a
precoding operation by matrix M is introduced in the transmitter and post-coding by the transpose
matrix M* is introduced in the receiver. The block diagram is given in Figure 13.7 for a Hermitian
channel.
The received signal vector is:
The equivalent of a set of N separated channels has been realized and the capacity of the system
is the sum of the capacities of the N individual channels. It is worth pointing out that the precoding,
which assumes knowledge of the channel at the transmitter side, can be avoided by inverting the
channel matrix at the receiver. However, such a scheme might amplify the noise significantly and
degrade the transmission performance.
13.9 Conclusion
The AC matrix is directly involved in modeling with the MMSE criterion. Although it does not
appear explicitly in the most common adaptive filter algorithms, its eigenvalues – in particular, the
minimum eigenvalue – control the system operation.
The AC matrix is instrumental in high-resolution spectral analysis of signals, through the
harmonic decomposition method. In that method, the signal is modeled by a set of sinusoids
in noise, whose frequencies are linked to the zeros of the minimum eigen filter – that is, the
filter coefficient vector is the eigenvector associated with the minimum eigenvalue. Compared
with linear prediction, harmonic decomposition avoids the bias introduced by the least squares
criterion.
Linear prediction allows for real-time signal analysis and can lead to simple and efficient
compression methods.
The techniques developed for one-dimensional signals can be extended to multidimensional
signals with real-time matrix processing in broad diffusion applications – in particular, mobile
radiocommunications.
where M G is the matrix of the derivatives of the measurements with respect to the parameters.
Taking the derivative of x(n), as defined in (13.A1), with respect to the parameters, we obtain:
⎡ N −jA N(N−1) −jAN ⎤
t ⎢ 2 ⎥
MG M G ⎢jA N(N−1) A2 N(N−1)(2N−1) A2 N(N−1) ⎥
(13.A3)
⎢ 2 6 2 ⎥
⎢ jAN A 2 N(N−1)
A 2 N ⎥
⎣ 2 ⎦
276 13 Signal Analysis and Modeling
Since the three parameters to be estimated are real, their variances are obtained from the real
t
part of the above matrix MG M G only. Assuming the noise is complex, the inverse of the real part of
t
the matrix MG M G must be multiplied by 𝜎b2 ∕2, which yields:
𝜎b2 1 6 1 (N − 1)
var{A} = ; var{𝜔} = ; var{𝜙} = (13.A4)
2N SB N(N 2 − 1) SB N(4N 2 − 3N + 1)
These bounds are called Cramer–Rao bounds.
Exercises
13.1 Real signal estimation. Consider the sequence:
√
x(n) = 2 cos(2𝜋nf); 0 ≤ n ≤ N
Apply definition (13.18) to find the AC function. For f = 1/8 and N = 16, verify that the
value r 1 = 0.618 is obtained.
Clearly, the AC function does not provide an accurate estimation of the frequency. In
order to use the iterated correlation technique, relation (13.31) is modified by canceling the
terms with index greater than N−1 in the sequence y(n)=TFD2N (x). Refer to Section 9.2 for
justification.
For f = 0.1, we find f estim = 0.1014, and if a noise such that SNR = 10 (10 dB) is added
to the signal, the variance becomes var = 4.7 × 10−6 . Compare with the estimation bound
(13.23) for complex signals.
13.2 The signal x(n) = m + e(n), where m is a constant and e(n) is a white noise of power 𝜎e2 , is
applied to a recursive estimator whose output is:
Assuming x(n) = 0 for n < 0, compute y(n). If b = 0.8, how many samples are needed for
y(n) to approach m, in the mean, within 1%?
Compute the output mean square error, E[[y(n) − m]2 ], for n > 0. What is the limit when
n tends toward infinity? Study the evolution and the choice of coefficient b for the three
cases: 𝜎 e ≈ m; 𝜎 e > m; and 𝜎 e < m.
Compare the performance of the recursive estimator with that of the non-recursive estima-
tor defined by:
1 ∑
n
y(n) = x(i)
n + 1 i=0
√ ( )
13.3 Consider the signal sequence x(n) = 2 sin n𝜋 4
. Compute the first three terms of the AC
function. Compute the eigenvalues and eigenvectors of the 3 × 3 AC matrix. Verify expres-
sion (13.40) for the decomposition and reconstruction of the matrix.
H(Z) = 1 − a1 Z −1 − a2 Z −2
References 277
13.6 Give the polynomial decomposition with even and odd symmetry for the following predic-
tion filter:
1 − AN (Z) = (1 − 1.6 Z −1 + 0.9 Z −2 ) (1 − Z −1 + Z −2 )
Give the zeros of polynomials obtained and compare them with those of the initial filter.
13.7 In a radio transmission link, in order to dispose of large jamming signals, a three-antenna
network is employed, and the following sequences are available:
√ √
x1 (n) = d(n) + 2 sin(n𝜋∕4 − 𝜋∕6); x2 (n) = d(n) + 2 sin(n𝜋∕4);
√
x3 (n) = d(n) + 2 sin(n𝜋∕4 + 𝜋∕6)
The data d(n) are independent and have unit power. Compute the 3 × 3 covariance matrix
R3 of the input signals.
A weighted summation of these three signals is performed. Write the matrix equation
needed to compute the weighting coefficient values such that the summation yields the
useful sequence d(n). Deduce that the first and third coefficients are equal and provide the
coefficient values.
13.8 A MIMO system with two antennas at the transmitter side and the receiver side has the
channel coefficient matrix:
[ ]
0.8 0.5ej𝜋∕3
H=
0.7ej𝜋∕4 0.6e−j𝜋∕8
Compute the eigenvalues of that matrix.
The total emitted power is unity, and it is uniformly distributed between the two anten-
nas. The additive white noise at each receiver antenna is 𝜎b2 = 0.1. Compute the theoretical
capacity of the system expressed in bit/s/Hz.
References
1 L. Marple, Digital Spectrum Analysis with Applications, Prentice-Hall, New York, 1987.
2 E. Aboutanios and B. Mulgrew, Iterative frequency estimation by interpolation on Fourier coeffi-
cients, IEEE Transactions on Signal Processing, 53(4), 1237–1242, 2005.
278 13 Signal Analysis and Modeling
3 S.S. Reddy, Eigenvector Properties of Toeplitz Matrices and Their Applications to Spectral Analy-
sis of Time Series, Signal Processing, Vol. 7, North Holland, 1984, pp. 46–56.
4 J. Makhoul, Linear prediction: a tutorial review, Proceedings of the IEEE, Vol. 63, pp. 561–580,
1975.
5 A. Paulraj, R. Nabar and D. Gore, Introduction to Space-Time Wireless Communications, Cam-
bridge University Press, USA, 2003.
6 D. Tse and P. Viswanath, Fundamentals of Wireless Communication, Cambridge University Press,
USA, 2005.
279
14
Adaptive Filtering
Adaptive filtering is used when we need to realize, simulate, or model a system whose char-
acteristics develop over time. It leads to the use of filters whose coefficients change with time.
The variations in the coefficients are defined by an optimization criterion and are realized
according to an adaptation algorithm, both of which are determined depending on the application.
There are many different possible criteria and algorithms [1-4]. This chapter examines the simple
but, in practice, most important case in which the criterion of mean square error minimization is
associated with the gradient algorithm.
While fixed coefficient filtering is generally associated with specifications in the frequency
domain, adaptive filtering corresponds to specifications in time. It is natural to introduce
this subject by considering the calculation of filter coefficients in these conditions. We begin
by examining FIR filters.
where X(p) now designates the column vector of the N most recent input samples at time p:
X t (p) = [x(p), x(p − 1) · · · x(p + 1 − N)]
Applying the results from Section 13.5, we obtain the following expression for the filter
coefficients:
H(n) = R−1
N (n)ryx (n) (14.2)
Digital Signal Processing: Theory and Practice, Tenth Edition. Maurice Bellanger.
© 2024 John Wiley & Sons Ltd. Published 2024 by John Wiley & Sons
280 14 Adaptive Filtering
~ y(n)
y (n)
x(n)
PROGRAMMABLE
+
FILTER
–
Input
signal Reference
e(n)
COEFFICIENT
UPDATING
ALGORITHM
Whenever a new data set {x(n + 1), y(n + 1)} becomes available, the new coefficient vector
H(n + 1) can be recursively computed from H(n). The definition equations (14.3) and (14.4) lead
to the recursions:
RN (n + 1) = RN (n) + X(n + 1)X t (n + 1)
(14.5)
ryx (n + 1) = ryx (n) + X(n + 1)y(n + 1)
Now:
RN (n + 1)H(n + 1) = ryx (n + 1) = ryx (n) + X(n + 1)y(n + 1)
and:
RN (n + 1)H(n + 1) = RN (n)H(n) + X(n + 1)y(n + 1)
and finally:
RN (n + 1)H(n + 1) = [RN (n + 1) − X(n + 1)X t (n + 1)]H(n) + X(n + 1)y(n + 1)
Hence the recursion:
t
H(n + 1) = H(n) + R−1
N X(n + 1)[y(n + 1) − H (n)X(n + 1)] (14.6)
14.1 Principle of Adaptive Filtering 281
x(n)
T T
Xδ
+ + +
e(n)
a0 a1 aN–1
T T T
– + y(n)
+ +
The choice of the value δ in equation (14.8) leads to a compromise between the adaptation speed and the value of the residual error once adaptation is complete. These two properties will be studied later, but it is first necessary to study the convergence conditions.
The case of a well-dimensioned system without noise is considered first, which means
that the residual error is null after convergence. Designating the coefficient deviation by
ΔH(n) = H opt − H(n), we can write the following at time n:
and also:
y(n + 1) = H_opt^t X(n + 1);  e(n + 1) = ΔH^t(n)X(n + 1)    (14.13)
Considering deviation means, taking the expectations of the two terms in (14.14), and assuming independence of e²(n + 1) and X^t(n + 1)X(n + 1), it appears that convergence cannot occur if the following condition is not satisfied:
0 < δ < 2/(Nσx²)    (14.17)
where 𝜎x2 is the input signal power.
The two upper bounds (14.16) and (14.17) are linked by the peak factor F c of the input signal and,
in applications, an intermediate value may be selected – for example (14.17), with some margin.
In the case of Gaussian signals, it can be shown that convergence is guaranteed as soon as the
adaptation step is smaller than:
0 < δ < (1/3) · 2/(Nσx²)    (14.18)
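A sketch of the gradient algorithm operating within this bound (Python; the update H(n + 1) = H(n) + δ e(n + 1)X(n + 1) is the standard gradient, or LMS, form referred to as (14.8), and the signal model is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma_x2 = 8, 1.0
delta = (1.0 / 3.0) * 2.0 / (N * sigma_x2)   # step within the Gaussian bound (14.18)

h_opt = rng.normal(size=N)                   # hypothetical system to identify
H = np.zeros(N)
X = np.zeros(N)

for n in range(5000):
    X = np.roll(X, 1)
    X[0] = rng.normal(scale=np.sqrt(sigma_x2))
    e = h_opt @ X - H @ X                    # a priori error e(n+1), noise free
    H += delta * e * X                       # gradient update, the form of (14.8)

print(np.linalg.norm(h_opt - H))             # small misalignment after adaptation
```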
An alternative way of deriving (14.17) is worth mentioning. A convergence condition can be derived by considering the a posteriori error defined by:
ε(n + 1) = y(n + 1) − H^t(n + 1)X(n + 1)
The adaptive filter works properly if the adaptation is efficient – that is, if the a posteriori error is smaller, on average, than the a priori one:
E[|ε(n + 1)|] < E[|e(n + 1)|]    (14.15)
Substituting the coefficient updating equation (14.8) into the definition of ε(n + 1) yields:
ε(n + 1) = e(n + 1)[1 − δ X^t(n + 1)X(n + 1)]
The two errors are, in fact, proportional, and condition (14.15) follows.
If a measurement noise is introduced, the block diagram is as shown in Figure 14.3.
The output error is:
e(n + 1) = ΔH^t(n)X(n + 1) + b(n + 1)
The noise b(n) is zero mean and its power is σb². Then, the quadratic coefficient deviation satisfies the recurrence:
Since the noise is uncorrelated with the signal, the last term in the right-hand side of the equation
is zero in the mean and the convergence condition is derived by taking the expectations of the two
terms in (14.20):
[δNσx² − 2] E_R + 2σb² < 0    (14.21)
where ER is the mean square error. Then, the convergence stops as soon as the following equality
is satisfied:
E_R = σb² / (1 − (δ/2)Nσx²)    (14.22)
Thus, after convergence, a residual error with power ER remains, and the noise power corre-
sponds to the minimum mean square error Emin , which is reached when the coefficients take on
their optimal value H opt . This residual error is analyzed in Section 14.4.
With the stability ensured, it is interesting to evaluate the adaptation speed and to determine the
time constant of the adaptive filter.
Assuming the initial coefficient values are null, σy² is the power of the reference signal.
In concrete applications, maximum adaptation speed is often sought and, thus, the maximum
adaptation step size is contemplated.
The convergence condition (14.17) provides an upper bound for the adaptation step, and if δ exceeds that bound then the output error will grow. A geometric illustration, based on the error surface representation, assumed to be symmetric, shows that the fastest adaptation is obtained with half that bound, δ = 1/(Nσx²). That result can be proven analytically using the approach described in the next section. In these conditions, the time constant satisfies the following inequality:
τe ≥ N    (14.29)
In order to complete the examination of this adaptive filter, the residual error after adaptation
still has to be evaluated in the general case – that is, with imperfect dimensioning and measurement
noise.
Since the coefficient deviations have the same variance, factorization can take place and, using
(14.35), it appears that the residual error at infinity, ER = E(∞), is given by:
E_R = E_min / (1 − (δ/2)Nσx²)    (14.39)
It is worth pointing out that the above equation leads to the stability condition (14.17) derived by
a different approach.
In practice, due to the margin generally taken on the step size 𝛿, the following approximation
holds:
E_R ≃ E_min (1 + (δ/2)Nσx²)    (14.40)
In terms of the time constant, with equation (14.27), we have:
E_R ≃ E_min (1 + NT/(2τe))    (14.41)
Thus, the relation between the time constant and the residual error is clearly seen. T is the sampling period, taken here as being equal to unity.
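For example, assuming the fastest-adaptation step of Section 14.3, δ = 1/(Nσx²), equation (14.40) gives E_R ≃ 1.5 E_min, about 1.8 dB above the minimum, and (14.41) with T = 1 then corresponds to a time constant τe ≈ N.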
The increase in residual error due to the step size 𝛿 can be viewed as a gradient noise.
The complexity parameters of adaptive filters are the same as those for filters with fixed coeffi-
cients. The most important are the multiplication rate, the number of bits of the coefficients, and
the internal memories. The limitations on the number of bits of the coefficients and internal data
increase the residual error ERT . The specifications are generally given in terms of a minimum gain
of the system – that is, the ratio of the power of the reference signal σy² to the total residual error power E_RT.
If the coefficients are quantized with the step q1 , the round-off errors make a vector which has to
be incorporated into the coefficient evolution equation.
Assuming that round-off errors are independent of other signals, the additional term (q1²/12) I_N has to be introduced in equations (14.33)
and equation (14.34) becomes:
E{[α(∞)][α(∞)]^t} = (δ/2) E(∞) I_N + (1/(2δ)) (q1²/12) diag(1/λi)    (14.43)
Finally, the total residual error ERT is given by:
E_RT = [E_min + N q2²/12 + (N/(2δ)) q1²/12] / (1 − (δ/2)Nσx²)    (14.44)
With bc bits for the coefficients, the quantization step is q1 = h_max 2^{1−bc}, and the system gain G satisfies G² · E_min ≈ σy².
Starting from zero values for the coefficients, in the convergence phase of the filter, it can be taken
that 𝜎 e ≃ 𝜎 y and the time constant 𝜏 s for the sign algorithm can be expressed by:
τs ≈ σy / (Δσx)    (14.55)
After convergence, one can take 𝜎e2 = Emin . If the variation step is sufficiently small, the residual
error ERS in the sign algorithm is:
E_RS ≈ E_min (1 + (NΔ/2) · σx/√E_min)    (14.56)
The residual error is thus found to be larger than with the gradient algorithm. This leads to low
values for Δ. It should also be noted that, remembering equation (14.54), the stability condition
(14.17) is represented by the inequality:
Δ ≤ (2/N) · √E_min/σx    (14.57)
which can be taken as a convergence condition and can result in very small values of Δ. In practice,
equation (14.52) is usually modified to become:
h_i(n + 1) = (1 − ε) h_i(n) + Δ sign(e(n + 1) x(n + 1 − i))    (14.58)
The constant 𝜀, positive and small, introduces a leakage function, which is needed, for example,
in transmission systems which must tolerate a certain error rate. Under these conditions, the coef-
ficients are bounded by:
|h_i(n)| ≤ Δ/ε;  0 ≤ i ≤ N − 1    (14.59)
This modification leads to an increase in the residual error. The coefficients are biased, and
instead of (14.11) for small values of 𝜀 and Δ, we can write:
E[H(∞)] = [(ε/Δ) I + R_N]^{−1} E[y(n)X(n)]    (14.60)
The corresponding increase in residual error can be calculated by an expression similar to
equation (14.36).
The constants 𝜀 and Δ are chosen on the basis of the performance to be achieved in each case.
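A minimal sketch of the leaky update (14.58) in Python (the driver signal and the parameter values are hypothetical):

```python
import numpy as np

def sign_step(h, X, e, Delta, eps):
    """One iteration of the leaky sign algorithm (14.58)."""
    return (1 - eps) * h + Delta * np.sign(e * X)

rng = np.random.default_rng(0)
N, Delta, eps = 8, 1e-3, 1e-4        # leakage bounds the coefficients by Delta/eps
h_opt = rng.normal(size=N)           # hypothetical system
h, X = np.zeros(N), np.zeros(N)
for n in range(20000):
    X = np.roll(X, 1)
    X[0] = rng.normal()
    e = (h_opt - h) @ X              # output error, noise-free reference
    h = sign_step(h, X, e, Delta, eps)
print(np.abs(h_opt - h).max())       # small residual deviation
```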
The adaptive filters considered above are of the direct structure FIR type. It is a simple and robust
approach and is commonly used. However, as with fixed coefficient filters, other structures can be
employed.
Beginning with a given set of values of the coefficients, variations proportional to the gradient of
the error function E(A) must be applied so as to minimize the mean square error. This leads to:
∂E/∂a_ik = −(2/N0) Σ_{n=0}^{N0−1} [y(n) − ỹ(n)] ∂ỹ(n)/∂a_ik;  k = 1, 2;  1 ≤ i ≤ L    (14.61)
In order to calculate the term g_k^i such that:
g_k^i(n) = ∂ỹ(n)/∂a_ik
one can use the expression for y(n) obtained by an inverse Z-transform on the transform X(Z) of
the set x(n) (see Section 4.2). Hence:
ỹ(n) = (1/2πj) ∫_Γ Z^{n−1} ∏_{i=1}^{L} (1 + a_i1 Z^{−1} + a_i2 Z^{−2}) X(Z) dZ
∂ỹ(n)/∂a_ik = (1/2πj) ∫_Γ Z^{n−1} Z^{−k} ∏_{l=1, l≠i}^{L} (1 + a_l1 Z^{−1} + a_l2 Z^{−2}) X(Z) dZ
or,
∂ỹ(n)/∂a_ik = (1/2πj) ∫_Γ Z^{n−1} Z^{−k} [H(Z) / (1 + a_i1 Z^{−1} + a_i2 Z^{−2})] X(Z) dZ    (14.62)
Thus, to form the term gki (n), it is sufficient to apply the set ỹ (n) to a recursive section whose
transfer function is the inverse of that of the initial section of order i. This recursive section has
the same coefficients, but with the opposite sign. The corresponding circuit is given in Figure 14.4.
[Figure 14.4: cascade of second-order FIR sections H1, H2, …, HL producing ỹ(n), compared with y(n) to give the error e(n) weighted by δ; each pair of gradient terms g_i1(n), g_i2(n) is obtained from a recursive section with coefficients −a_i1, −a_i2 and delays Z⁻¹.]
The filter obtained in this way is more complicated than that in the previous section, but it offers
a very simple method of finding the roots, which, due to the presence of a recursive part, should be
inside the unit circle in the Z-plane to ensure the stability of the system. The techniques derived
for FIR filters also apply to IIR filters.
The coefficients of an IIR filter can be calculated for time specifications by using the least mean
squares technique in an iterative procedure, as was done in Section 7.3. Algorithms for adapting
the coefficients to the time evolution of the system can also be deduced.
A linear system can be modeled by a purely recursive IIR filter with a Z-transfer function, G(Z),
such that:
G(Z) = a0 / (1 + Σ_{k=1}^{K} b_k Z^{−k})    (14.64)
In this case, the model is said to be autoregressive (AR). This important and convenient approach
is also appropriate if the best representation of the system corresponds to a Z-transfer function,
H(Z), which is the quotient of two polynomials:
H(Z) = N(Z)/D(Z)
where N(Z) has all its roots inside the unit circle and thus has minimum phase. In this case, for a
suitable integer M, we can write:
1/N(Z) ≃ 1 + Σ_{i=1}^{M} c_i Z^{−i}
It is then sufficient to let the degree K of the denominator of the function G(Z) take a value sufficient for representing H(Z). The presence of zeros inside the unit circle of the system results in an increase in the number of poles of the model [1].
The general IIR filter corresponds to an autoregressive moving average (ARMA) model. This is
the most widely used approach for modeling a linear system. For the IIR filter whose coefficients
must be calculated over a set of N 0 indices in order to approximate a set y(n), the output is written as:
ỹ(n) = Σ_{l=0}^{L} a_l x(n − l) − Σ_{k=1}^{K} b_k ỹ(n − k)    (14.65)
The cost function to be minimized is:
E(A, B) = (1/N0) Σ_{n=0}^{N0−1} [y(n) − ỹ(n)]²    (14.66)
Starting from a set of values for the coefficients, this function can be minimized using the gradient algorithm if the coefficients are given increments proportional to the gradient of E(A, B) and of the opposite sign. The presence of a recursive part causes complications. Calculation of the gradient yields:
∂E/∂a_l = −(2/N0) Σ_{n=0}^{N0−1} [y(n) − ỹ(n)] ∂ỹ(n)/∂a_l;  0 ≤ l ≤ L
∂E/∂b_k = −(2/N0) Σ_{n=0}^{N0−1} [y(n) − ỹ(n)] ∂ỹ(n)/∂b_k;  1 ≤ k ≤ K
with:
∂ỹ(n)/∂a_l = x(n − l) − Σ_{k=1}^{K} b_k ∂ỹ(n − k)/∂a_l    (14.67)
∂ỹ(n)/∂b_k = −ỹ(n − k) − Σ_{j=1}^{K} b_j ∂ỹ(n − j)/∂b_k    (14.68)
To show the method of realizing equations (14.67) and (14.68), we can write:
H(Z) = (Σ_{l=0}^{L} a_l Z^{−l}) / (1 + Σ_{k=1}^{K} b_k Z^{−k}) = N(Z)/D(Z)
Then:
ỹ(n) = (1/2πj) ∫_Γ Z^{n−1} H(Z)X(Z) dZ
and consequently:
∂ỹ(n)/∂a_l = (1/2πj) ∫_Γ Z^{n−1} Z^{−l} [1/D(Z)] X(Z) dZ    (14.69)
∂ỹ(n)/∂b_k = (1/2πj) ∫_Γ Z^{n−1} Z^{−k} [(−1)/D(Z)] H(Z)X(Z) dZ    (14.70)
The gradient is thus calculated from the set obtained by applying x(n) and ỹ (n) to the circuits
corresponding to the transfer function 1/D(Z).
To simplify the implementation, the second terms in equations (14.67) and (14.68) can be ignored,
which leads to the following increments:
For each value of the index n, the coefficients a_l and b_k are incremented by a quantity which is proportional to the product of the error e(n) = y(n) − ỹ(n) with x(n − l) and ỹ(n − k), respectively.
The stability and parameters of this type of filter can be studied in the same manner as for FIR
filters [5]. However, it is not difficult to control the stability of an IIR filter when it is constructed
as a cascade of second-order sections, which, as indicated in previous chapters, also offers other
advantages.
Consider a filter constructed as a cascade of second-order elements and with a transfer function
G(Z). In the AR case, we have:
G(Z) = a0 ∏_{i=1}^{L} 1/(1 + b_i1 Z^{−1} + b_i2 Z^{−2})    (14.72)
To control the stability of such a filter, it is sufficient, as set forth in Section 6.7, to ensure that the
following conditions are fulfilled:
|b_i2| < 1;  |b_i1| < 1 + b_i2;  1 ≤ i ≤ L    (14.73)
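A sketch of this test in Python:

```python
def section_is_stable(b1, b2):
    """Stability conditions (14.73) for a second-order section
    1 / (1 + b1*Z**-1 + b2*Z**-2): both poles inside the unit circle."""
    return abs(b2) < 1 and abs(b1) < 1 + b2

print(section_is_stable(-1.2, 0.5), section_is_stable(-1.2, 0.1))  # True False
```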
As before, the calculation of the gradient of the error function requires knowledge of the term gki ,
where:
g_k^i(n) = ∂ỹ(n)/∂b_ik
Since,
ỹ(n) = (1/2πj) ∫_Γ Z^{n−1} a0 ∏_{i=1}^{L} [1/(1 + b_i1 Z^{−1} + b_i2 Z^{−2})] X(Z) dZ
we obtain:
∂ỹ(n)/∂b_ik = −(1/2πj) ∫_Γ Z^{n−1} [Z^{−k}/(1 + b_i1 Z^{−1} + b_i2 Z^{−2})] G(Z)X(Z) dZ
This expression indicates that the terms gki (n), with k = 1, 2 and 1 ≤ i ≤ L, are obtained by applying
the set ỹ (n) to the ith recursive section. The stability of the system is tested by equation (14.73) for
each value of the index n.
The method which has been discussed is also applicable to an ARMA model, but in this case, the
circuits are rather more complicated.
The techniques used in the previous sections involve overall minimization of the mean square
error. It is also possible to achieve minimization step by step, using lattice structures.
The lattice filter in Figure 13.3 can also be adapted by a gradient algorithm for each value of the
index. Indeed, using the equations:
e_i(n) = e_{i−1}(n) − k_i b_{i−1}(n − 1)
b_i(n) = b_{i−1}(n − 1) − k_i e_{i−1}(n)    (14.74)
one can write the gradients as:
∂e_i²(n)/∂k_i = −2 e_i(n) b_{i−1}(n − 1)
∂b_i²(n)/∂k_i = −2 b_i(n) e_{i−1}(n)    (14.75)
and the following variations can be applied to the coefficients by assuming that the functions E[e_i²(n) + b_i²(n)] for 1 ≤ i ≤ N are to be minimized:
k_i(n + 1) = k_i(n) + δ_i (e_i(n) b_{i−1}(n − 1) + b_i(n) e_{i−1}(n))    (14.76)
As the power of the signals ei (n) and bi (n) decreases with the index i, the variation step 𝛿 i must
be related to this power in order to obtain a certain degree of homogeneity of the time constants.
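A minimal Python sketch of this stage-by-stage adaptation, implementing (14.74) and (14.76); the input signal and the step values are hypothetical:

```python
import numpy as np

def adapt_lattice(x, N, deltas):
    """Adapt the reflection coefficients k_i of a lattice filter with the
    gradient steps (14.76), using the recursions (14.74)."""
    k = np.zeros(N)
    b_prev = np.zeros(N + 1)               # backward errors b_i(n-1), i = 0..N
    for xn in x:
        e = np.empty(N + 1); b = np.empty(N + 1)
        e[0] = b[0] = xn                   # stage 0: e_0(n) = b_0(n) = x(n)
        for i in range(1, N + 1):
            e[i] = e[i - 1] - k[i - 1] * b_prev[i - 1]       # (14.74)
            b[i] = b_prev[i - 1] - k[i - 1] * e[i - 1]
            k[i - 1] += deltas[i - 1] * (e[i] * b_prev[i - 1]
                                         + b[i] * e[i - 1])  # (14.76)
        b_prev = b
    return k

rng = np.random.default_rng(3)
x = np.convolve(rng.normal(size=5000), [1.0, 0.8], mode="same")  # correlated input
print(adapt_lattice(x, 2, deltas=[1e-3, 1e-3]))
```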
14.9 Conclusion
Several techniques for designing and producing adaptive filters have been presented in this chapter.
They are based on the gradient algorithm, which is the simplest and most robust approach for
changing the coefficients. The direct FIR structure has been studied in detail by developing the
adaptation parameters (the time constant and the residual error) and the complexity parameters
(multiplication rate and the number of bits of the coefficients and internal data). This is the struc-
ture most commonly used in practice. In some specific cases, different structures, such as IIR, mixed
FIR–IIR, or lattice structures, can offer significant advantages. Analysis of the stability conditions
in these structures and the study of the adaptation and complexity parameters can be performed
by a method similar to that given for the FIR structure.
The gradient algorithm results in a relatively slow change in the values of the coefficients of the
filter, especially when a low residual error is required and when it is used in its most reduced form:
the sign algorithm. In order to find the most rapid rate of adaptation, all the coefficients can be
recalculated periodically by using fast iterative procedures. The lattice structure is well suited to
this approach and allows for real-time analysis or modeling of signals such as speech with circuits
of moderate complexity.
The gradient algorithm can be improved, for example, by using different adaptation steps for the
coefficients, which are obtained from statistical estimates of the signal characteristics.
It is possible to envisage criteria which, for certain applications, are more appropriate than the
minimization of the mean square error, and algorithms which are more efficient than the gradient
algorithm can be developed [1]. However, these algorithms are generally more complicated to put
into operation, and problems of sensitivity to any imperfection in the realization may arise.
In conclusion, FIR and IIR structures operating according to the least mean square error crite-
rion and using the gradient algorithm, or its simplest form, the sign algorithm, offer a simple and
effective compromise for adaptive filtering applications.
Exercises
14.1 A signal x(n) can be modeled by the output of a filter H(Z) when the input is a unit power
white noise. The Z-transfer function is:
H(Z) = (1/2)(1 + Z^{−1}) / (1 − 0.5 Z^{−1})
Compute the impulse response and show that the first three terms of the AC function take on the values: r0 = 1; r1 = 0.75; r2 = 0.375.
For compression, a two-coefficient predictor P(Z) is used, and the output sequence is:
Compute the optimum coefficient values – those which minimize the output power.
Give the output power value and the prediction gain Gp .
Compute the impulse response of the cascade H(Z)P(Z).
In an adaptive realization with the gradient algorithm, give the coefficient updating equations and the maximum adaptation step value δm. Compute the residual error for adaptation step δ = δm/4 and give the time constant of the adaptive predictor.
Starting from zero initial coefficients, examine the trajectory of the zeros of this filter for
an adaptation step 𝛿 = 0.1.
Give the new values of the prediction coefficients when discrete white noise of power 𝜎 2
is added to the signal.
References
1 B. Widrow and S.D. Stearns, Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1985.
2 M. Bellanger, Adaptive Digital Filters, 2nd Edition, Marcel Dekker Inc., New York, 2001.
3 O. Macchi, Adaptive Processing: The LMS Approach with Applications in Transmission, Wiley, New York, 1995.
4 S. Haykin, Adaptive Filter Theory, 4th Edition, Prentice-Hall, Englewood Cliffs, NJ, 2000.
5 P.A. Regalia, Adaptive IIR Filtering in Signal Processing and Control, Marcel Dekker Inc., New York, 1995.
15
Neural Networks
Adaptive systems are at the heart of digital communication networks. They are particularly critical
for data transmission efficiency, which is achieved through channel equalization. This operation
involves an initial learning phase, an operational phase, and subsequent decisions and improve-
ments throughout the duration of the communication. Thanks to artificial intelligence, these
concepts can be extended to all technical fields, with operational devices that are ever more
complex and sophisticated. Signal processing and adaptive techniques, as presented above, are
profoundly involved in one such device – namely, the neural network.
The present chapter describes how neural networks operate, and how signal-processing tech-
niques, and specifically adaptive techniques, are exploited. As a starting point, we look at a simple
classification operation.
15.1 Classification
Classification is a basic operation in shape recognition. Its complexity depends on the space in
which it is carried out. In a two-dimensional space, objects defined by their coordinates can be
grouped together, and the different groups can be separated by curves. Whenever a new object
appears, it is assigned to an existing group, depending on its position with respect to the separation
curves. An illustration is provided in Figure 15.1.
N objects are separated into two groups (a and b) by a line with equation:
h0 + h1 X1 + h2 X2 = 0 (15.1)
The structure of the corresponding classifier is shown in Figure 15.2.
Assuming the objects are represented by points in the plane, during the learning phase, the coor-
dinates of the two categories of points are fed to the input, the output is subtracted from a reference
d, and the error e obtained is used to determine the coefficients of the separation line.
An iterative procedure can be employed to obtain the coefficient vector H, starting from an initial
vector. As in the previous chapter, we can write:
H(n + 1) = H(n) + [δ / (X^t(n + 1)X(n + 1))] e(n + 1) X(n + 1);  0 ≤ n ≤ N − 1    (15.2)
with:
e(n + 1) = d(n + 1) − X^t(n + 1)H(n)    (15.3)
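A minimal sketch of this learning rule in Python; the sample points and the step value are hypothetical, and the references are assumed adjusted to d = ±1 for the two groups (one possible adjustment, as discussed below):

```python
import numpy as np

def learn_separator(points, refs, delta=0.5, passes=50):
    """Find H = [h0, h1, h2] of the line h0 + h1*x1 + h2*x2 = 0 with the
    normalized iterations (15.2)-(15.3); refs holds d = +/-1 per group."""
    H = np.zeros(3)
    for _ in range(passes):
        for (x1, x2), d in zip(points, refs):
            X = np.array([1.0, x1, x2])        # constant input 1 carries h0
            e = d - X @ H                      # (15.3)
            H += delta * e * X / (X @ X)       # (15.2), step 0 < delta < 1
    return H

pts = [(1, 2), (0, 4), (-3, 3), (1, -2), (2, -1), (4, -1)]   # hypothetical points
refs = [1, 1, 1, -1, -1, -1]
H = learn_separator(pts, refs)
ok = [np.sign(H @ np.array([1.0, x1, x2])) == d for (x1, x2), d in zip(pts, refs)]
print(all(ok))    # expected True for these linearly separable groups
```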
The procedure and the reference must be adjusted to the two types of data (a and b) to achieve
the desired separation. In the absence of prior information, the reference is null (d(n)=0).
The adaptation step 𝛿 = 1 leads to cancellation of the a posteriori error as shown in Section 14.6.
Hence the need to select 0 < 𝛿 < 1.
Once the learning phase is completed, the sign of the output y(n) is used for the classification operation. In fact, a decision device is inserted at the filter output and the whole structure is called a “neuron”.
Now, if two other groups of objects (c and d) are introduced, two lines are needed to divide the
plane into four parts and potentially achieve separation, which leads to the combination of two
neurons, as shown in Figure 15.3. The system has two inputs fed by the coordinates of the objects
and two outputs which, if binary ±1, allow for an object to be assigned to one of the four groups.
Nonbinary outputs can also be exploited; they are interpreted as estimations of the probability for
the objects to belong to the groups.
The approach can be generalized – a neuron can have a large number of inputs and related coef-
ficients, and several such neurons can be connected to make a system with multiple inputs and
multiple outputs, along with their decision devices. Such a system is called a “perceptron” [1].
Now, if the curve needed to separate the groups of objects in the plane is no longer a line,
nonlinear processing is required, which is achieved through the combination of several per-
ceptrons. Then, it can be shown that 2 perceptrons are able to distinguish convex domains and 3 perceptrons can cope with the combination of convex domains – that is, domains of any form.
[Figure 15.3: combination of two neurons – inputs x1 and x2, coefficients h11, h12, h21, h22, biases h01 and h02, outputs y1 and y2.]
Such systems are called multilayer perceptrons.
Besides object classification, neural networks can serve to approximate or estimate nonlinear functions. This stems from the partitioning capability provided by their nonlinear decision, or activation, functions. In theory, any function can be approximated with arbitrary precision, although some constraints may have to be introduced for actual realization [2]. This aspect is dealt with at the end of the chapter.
A system with two intermediary layers, called hidden layers, is shown in Figure 15.4. If all the
connections exist, the network is said to be fully connected.
By definition, in a network made of L + 1 layers, the input and output correspond to layers 0 and L, respectively, and L − 1 hidden layers are present in between. The system may be parameterized as follows:
x_{j,l} = f(u_{j,l});  u_{j,l} = h_{0j,l} + Σ_{i=1}^{N} h_{ij,l} x_{i,l−1}    (15.4)
The term h_{0j,l} in the above summation represents the introduction of a bias, linked to the statistics of the data and the type of activation function. It may be null for zero-mean data.
The nonlinear activation function f(.) may be the sign, as in Figure 15.3, on which a decision is based. However, if the synaptic coefficients in the various branches have to be updated in supervised learning, the activation function must be differentiable, so that gradient techniques can be employed.
For example, the sigmoid:
f(u) = e^u/(1 + e^u);  f′(u) = e^u/(1 + e^u)² = f(u)(1 − f(u))    (15.5)
or the hyperbolic tangent:
f(u) = tanh(u) = (e^u − e^{−u})/(e^u + e^{−u});  f′(u) = 1 − f²(u)
Expression (15.4) and the details of the connections between two stages in the network are illus-
trated in Figure 15.5.
In the supervised-learning phase, a reference sequence is available, and an error signal is derived
from the network output, through subtraction, as in the previous chapter. The presentation of the
learning algorithm is simplified if all the layers have the same width N and the output does not
include a nonlinear function, as in estimation for example.
The cost function is:
J(n + 1) = (1/2) Σ_{k=1}^{N} e_{k,L}²(n + 1)    (15.10)
Since the coefficients evolve in the opposite direction to the gradient, the updating equation
should be:
H_{j,L−1}(n + 1) = H_{j,L−1}(n) + δ_{L−1} X_{L−2}(n + 1) e_{j,L−1}(n + 1);  1 ≤ j ≤ N    (15.11)
Then, it is necessary to determine the “error” at stage L − 1. Let us assume a deviation 𝜀 is intro-
duced on the coefficient hij, L − 1 . Since, by definition, we have:
u_{j,L−1} = Σ_{i=1}^{N} h_{ij,L−1} x_{i,L−2}    (15.12)
the variation of u_{j,L−1} will be ε x_{i,L−2} and, for the output with index k, y_k, it will be f′(u_{j,L−1}) ε x_{i,L−2} h_{jk,L}, which leads to the following derivative:
∂y_k/∂h_{ij,L−1} = f′(u_{j,L−1}) x_{i,L−2} h_{jk,L}    (15.13)
Summing for all the output errors, the gradient of the global cost function is obtained:
∂J/∂h_{ij,L−1} = −Σ_{k=1}^{N} e_{k,L} ∂y_k/∂h_{ij,L−1}    (15.14)
∂J/∂h_{ij,L−1} = (∂J/∂u_{j,L−1}) (∂u_{j,L−1}/∂h_{ij,L−1})    (15.14-bis)
Due to (15.13) and (15.14), the error to be used for the updating of the coefficients of stage L − 1
at time n + 1 is given by:
e_{j,L−1}(n + 1) = f′(u_{j,L−1}(n + 1)) Σ_{k=1}^{N} h_{jk,L}(n) e_k(n + 1)    (15.15)
As concerns other stages, referring to Figure 15.5, the following relation holds:
∂J/∂u_{j,l} = f′(u_{j,l}) Σ_{k=1}^{N} h_{jk,l+1} ∂J/∂u_{k,l+1}    (15.16)
It follows that relation (15.15) can be extended to stages lower than L − 1 and a recurrence
equation is established for the “errors”:
e_{j,l−1}(n + 1) = f′(u_{j,l−1}(n + 1)) Σ_{k=1}^{N} h_{jk,l}(n) e_{k,l}(n + 1)    (15.17)
Therefore, synaptic coefficients can be updated at time n + 1, from stage L − 1 to stage 1, which
is the backpropagation algorithm.
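A compact sketch of these recursions in Python, assuming, as above, equal layer widths N, a tanh activation, and a linear output stage; the random initialization and the target mapping are hypothetical (and differ from the initialization scheme described next):

```python
import numpy as np

def backprop_step(Hs, x0, d, delta):
    """One supervised step: forward pass, backpropagation of the errors with
    (15.15) and (15.17), then updates in the form of (15.11)."""
    f, fp = np.tanh, lambda u: 1.0 - np.tanh(u) ** 2
    us, xs = [], [x0]
    for H in Hs[:-1]:                        # hidden stages with activation
        us.append(H.T @ xs[-1])
        xs.append(f(us[-1]))
    e = d - Hs[-1].T @ xs[-1]                # output errors e_k,L (linear stage)
    errs = [e]
    for H, u in zip(Hs[:0:-1], us[::-1]):    # stages L, L-1, ..., 2
        errs.append(fp(u) * (H @ errs[-1]))  # (15.15), then the recurrence (15.17)
    errs.reverse()                           # errs[0] is now the stage-1 error
    for H, x, el in zip(Hs, xs, errs):
        H += delta * np.outer(x, el)         # update of stage l, cf. (15.11)
    return e

rng = np.random.default_rng(0)
N, L = 4, 3
Hs = [0.5 * rng.normal(size=(N, N)) for _ in range(L)]
for _ in range(2000):
    x0 = rng.normal(size=N)
    backprop_step(Hs, x0, d=np.tanh(x0), delta=0.05)   # hypothetical target map
```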
The algorithm must be initialized so that backpropagation can occur. Unless otherwise specified,
the coefficients of the last stage are set to zero. In order for the algorithm to start, the last state vector
must differ from zero, which is obtained by setting the coefficients of the stages smaller than L to
values such that the first input vector propagates up to stage L − 1. If a bias is present, the set of
updating equations is supplemented by:
h_{0j,l}(n + 1) = h_{0j,l}(n) + δ_l e_{j,l}(n + 1)    (15.11-bis)
As shown in the previous chapter, the adaptation step controls the stability, the convergence
speed, and the residual error after convergence. However, the results cannot be applied directly
due to interdependence of coefficients; some specific analysis is required.
The simplest case is the last stage because each coefficient vector is updated from the related
output error. A key observation is that the values in the state vector are bounded by the activation
function, assumed to be a sigmoid or hyperbolic tangent. If that bound is unity, an initial determi-
nation of the range for the adaptation step in the last stage is:
2
0 < 𝛿L < (15.18)
N
Further in the analysis, hidden layers must be accounted for. To begin with, a system with a single
hidden layer, L = 2, and a single output with index k, is considered. Then, a simple approach is to
refer to the a posteriori error obtained after coefficient updating by:
𝜀k,2 (n + 1) = dk (n + 1) − Hk,2
t
(n + 1)X1 (n + 1) (15.19)
As mentioned in the previous chapter, the stability condition implies that the mean of the amplitude of the a posteriori error must be less than the mean of the amplitude of the a priori error:
E[|ε_{k,2}(n + 1)|] < E[|e_{k,2}(n + 1)|]    (15.20)
Once the coefficients of layers 1 and 2 have been updated, the a posteriori error becomes as
follows, neglecting second-order terms:
ε_{k,2}(n + 1) ≈ e_{k,2}(n + 1) [1 − δ2 Σ_{j=1}^{N} x_{j,1}²(n + 1) − δ1 Σ_{j=1}^{N} Σ_{i=1}^{N} (∂y_k/∂h_{ij,1})²]    (15.21)
Introducing the activation function and for a single adaptation step, 𝛿, the stability condition
becomes:
0 < δ < 2/(A + B)    (15.22)
A = Σ_{j=1}^{N} [f(u_{j,1}(n + 1))]²
B = Σ_{i=1}^{N} x_{i,0}²(n + 1) Σ_{j=1}^{N} [f′(u_{j,1}(n + 1)) h_{jk,2}(n)]²
The importance of the hyperbolic tangent, which is bounded, as is its derivative, clearly appears in the expressions above.
Now, when all the outputs are active, equation (15.21) becomes a matrix equation. Letting H 1 and
H 2 designate the N × N coefficient matrices, it can be shown that the a priori and the a posteriori
output vectors are related by:
ε(n + 1) = [(1 − δ2 |f(U_1(n + 1))|²) I_N − δ1 |X_0(n + 1)|² H_2^t(n) D_u H_2(n)] E(n + 1)    (15.23)
D_u is a diagonal matrix whose N entries are [f′(u_{j,1}(n + 1))]². In order to determine the stability limit, the maximum eigenvalue λ_max of the matrix H_2^t D_u H_2 must be considered and relation (15.22) becomes:
0 < δ < 2/(|f(U_1(n + 1))|² + |X_0(n + 1)|² λ_max)    (15.24)
Note that the maximum eigenvalue of a square matrix is bounded by the sum of the absolute
values of the elements of a row or a column.
The above analysis can be extended to systems having several hidden layers. However, in practice,
and as a first attempt, one can simply rely on relation (15.18).
The system time constant impacts the length of the learning sequence. It is the sum of the time
constants of the different stages, which are proportional to the inverses of the adaptation steps and,
thus, are proportional to the number of coefficients, as indicated in Section 14.3. As concerns the
cascade of the different hidden layers, one can refer to the beginning of Section 6.2. Overall, a high
number of coefficients leads to long learning sequences.
Regarding the residual error after convergence, the results of the previous chapter apply – in
particular, the fact that the residual error grows near the stability limit.
The above processing also applies to nonzero mean signals. It suffices to introduce, in each
neuron, the branch corresponding to coefficient h0 and update, for example, according to relation
(15.11-bis).
It is worth mentioning that, in principle, alternative optimization techniques, which are more
complicated and harder to understand and implement than the gradient, may be employed [3].
[Figure 15.6: network with one hidden layer used in the examples – inputs x1 and x2, coefficients h_ij,k, activation functions f(.), output y1 compared with the reference d1 to give the error e1.]
The standard deviation after convergence is 5%. Invoking the series expansion:
tanh(u) = u − (1/3)u³ + (2/15)u⁵ − …    (15.28)
Estimations a, b of the coefficients of the terms x2 , x1 in (15.25) are given by:
a = h_{21,1} h_{11,2} + h_{22,1} h_{21,2} = 1.23;  b = h_{11,1} h_{11,2} + h_{12,1} h_{21,2} = 2.07    (15.29)
Comparing to the slope of function x2 = f (x1 ) at the origin, which equals 2, we find b/a = 1.7.
Note that the function has a maximum and, in contrast, a monotonic increasing function would
yield a more accurate estimation, as illustrated in Exercise 5.
As for the coefficient of the term −x1³, setting the factor of x2 to unity, we obtain:
c = (1/3) h_{11,1}³ h_{11,2} / a³ = 1.06
In general, accurate estimation of nonlinear functions, with the above activation functions and
a single hidden layer, requires large numbers of neurons [2].
2. Classification. The network shown in Figure 15.6 is able to perform a classification operation
in the plane with its 2 outputs and the curves:
x2 + 2x1 − x1³ = 0;  x2 − 2x1 + x1³ = 0    (15.30)
Four areas are distinguished, and the input data pairs can be assigned to one of those. In the
process, equations (15.26) and (15.27) are supplemented by:
e1 (n + 1) = d1 (n + 1) − y1 (n + 1) (15.31)
Worth noting is the impact of the second output on the coefficients of the first part of the circuit.
After learning, the classification error is less than 2%.
3. Graphic symbol recognition. Image recognition is among the most important applications of
neural networks. In order to provide a simple yet convincing illustration, digit recognition is
considered.
The 10 digits can be represented in a grid of 5×3 pixels, as shown in Figure 15.7.
Weights −1 and +1 are assigned to white and black pixels, respectively. The digits can be identi-
fied through 10 sets of 15 coefficients each. These coefficients take values ±1/15 and the summation
outputs are included in the interval [−1 +1]. Note that digits may differ by just a single pixel. Indeed,
it is not necessary to use a network with hidden layers in that case.
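As a sketch of this correlation-based identification in Python (the two 5×3 templates below are hypothetical renderings, not the book's Figure 15.7):

```python
import numpy as np

# Hypothetical 5x3 templates (+1 black, -1 white), flattened row by row
templates = {
    0: np.array([1,1,1, 1,-1,1, 1,-1,1, 1,-1,1, 1,1,1]),
    1: np.array([-1,1,-1, -1,1,-1, -1,1,-1, -1,1,-1, -1,1,-1]),
}
coeffs = {d: t / 15 for d, t in templates.items()}   # coefficient values ±1/15

def recognize(pixels):
    """Digit whose summation output (in [-1, 1]) is the largest."""
    return max(coeffs, key=lambda d: float(coeffs[d] @ pixels))

print(recognize(templates[1]))   # 1; a perfect match yields output +1
```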
However, for recognition of handwritten digits, redundancy must be introduced to account for
variations in handwriting. A network with 2 hidden layers is represented in Figure 15.8. The input
grid contains N1 pixels.
[Figure 15.8: network with two hidden layers for digit recognition – N1 inputs x1 … xN1, layer 1 outputs x1,1 … xN2,1, layer 2 outputs x1,2 … x15,2, and 10 final outputs for the digits 0 to 9.]
From N1 input values, after learning, layers 1 and 2 yield a set of discriminating values that are
used by the 10 sets of output coefficients to provide 10 final values, to decide which digit was applied
at input. Learning can be performed digit by digit, using only one output error and scanning all the
digits at each iteration.
As an illustration, we take N1 = 8×8, N2 = 32 and consider three digits. Learning is carried out
with 3, 5 and 7, step 𝛿 = 0.01 and 150 iterations. In the coefficient matrices of layers 1 and 2, the
diagonal entries of the square central matrices are set to 1 and a single output error is used for
coefficient updating. The following discriminations are obtained for the digits, in the absence of
noise (a), with SNR = 5 dB (b):
(a)
        3       5       7
3       1       0.11    −0.37
5       0.11    1       −0.22
7       −0.36   −0.22   1
(b)
        3       5       7
3       1       0.14    −0.49
5       0.11    1       −0.36
7       −0.43   −0.30   1
Clearly, recognition is easily achieved, even in the presence of a high level of noise. The impor-
tance of initialization is worth pointing out – a reduction of initial coefficient values leads to an
increase of the system time constant and then, it might be necessary to include all the output errors
in the coefficient updating process.
In order to cope with multiple ways of writing the same digit, with significant differences, the
size of the input grid must be increased, and the learning process must be adjusted accordingly, if
a recognition rate objective is prescribed.
The above example illustrates the role of the hidden layers – in one or more steps, they condense
the input grid onto the grid employed by the output stage. Of course, the computational cost grows
with the size of the input grid and might become prohibitive.
In certain fields, such as image processing or vision, the amount of data to be processed simultane-
ously might be enormous, leading to layers of excessive dimensions in the networks. For example,
take a black-and-white image of size 32 × 32. The dimension of the input vector is 1024 and the
number of coefficients in the layers may reach the millions, which is unrealistic.
However, it is known that compression techniques are particularly efficient with images and
certain categories of signals, which is evidence that these signals are highly redundant. In such
conditions, it appears well advised – for example, in recognition or classification operations – to
introduce one or several compression stages before the neural network. These extra stages can
extract characteristics which may considerably reduce the dimensions of the hidden layers and
facilitate the mastering of networks. Generally, the corresponding devices are banks of FIR fil-
ters, which perform convolutions or correlations, to cancel out redundancies while preserving the
specificities of signals [4–6]. The principle is shown in Figure 15.9.
The coefficients in the filter banks depend on the characteristics that are to be extracted. For example, the filters mentioned in Section 5.14 for the extraction of contours in images are frequently used. At the output, a table of characteristics is available. Several such devices may be cascaded, in combination with sample rate reductions, so as to end up with a neural network input vector of significantly reduced size.
In the approach, taking the characteristics of images into account, it might be well advised to
use activation functions other than sigmoid or hyperbolic tangents. In fact, in classification for
example, we can consider that the desired output is obtained by aligning the input data vector and a
coefficient vector, such that the scalar product is maximized. In a multilayer network, the alignment
is achieved step by step. The outputs of the first layer correspond to the scalar products of the input
vector and the coefficient vectors associated with these outputs. A negative output implies opposite
directions for the vectors involved and, thus, it is useless to keep this coefficient vector in the chain
leading to the desired global alignment, and this output can be dropped. Hence, the nonlinear
activation function ReLU (rectified linear unit), which is zero for negative values of the variable
and linear for positive values. In order to make the procedure even more selective, a threshold can
be introduced which cancels the small positive values. In fact, it can be argued that the operation
of a perceptron layer is similar to a data-clustering operation.
In the implementation of a ReLU-based CNN system, it is worth emphasizing that the derivative
of the activation function is not continuous, and the signal amplitude is not bounded, which
requires stability control. To that end, an upper bound may be imposed on the activation function
output and the step size may be chosen to satisfy the conditions given in Section 15.3. More-
over, initialization may be challenging and, instead of assigning random values to the coefficients,
one can employ data-clustering techniques like “k-means” [6].
Reference [4] describes an application of CNN to handwritten character recognition.
[Figure 15.9: principle – the input X passes through a filter bank before the neural network, which delivers the output Y.]
15.6 Recurrent/Recursive Neural Networks
B_i(n + 1) = B_i(n) + δ2 e_{i,1}(n + 1) X_1(n + 1)
It must be pointed out that, for matrix B, the data are bounded by the nonlinearity, while the
input data do not necessarily have the same bound, which can lead to different adaptation steps 𝛿 1
and 𝛿 2 for the two matrices.
A critical issue for recursive systems is stability. In the long run, the evolution of the coefficients
must be controlled. If matrix B is diagonal, or near diagonal, the recursive part becomes a set of
first-order cells that evolve separately, and the modulus of each single coefficient must remain
smaller than unity.
An important aspect of neural networks in general, and the recursive structure in particular, is
overdimensioning. In fact, the number of coefficients generally exceeds the number of variables
of the system to be modelled and, as a result, we see a drift of the adaptive network coefficients linked to the adaptation parameters. That drift can raise problems in long learning phases and in steady-state operation. Then, it may be advisable, in the updating equations of some of the coefficients, to introduce a regularization term ε which, as described in Section 14.6, limits the amplitude of the coefficients to δ/ε.
[Figure: output error power Perr (dB) versus the length of the learning sequence, with two hidden layers and with no hidden layers.]
It can be observed that the two hidden layers double the time constant and bring a gain of about
12 dB in computation accuracy after convergence. In the absence of the jammer, the output error
power decreases to −65 dB, this threshold being due to the cascading of the two hidden layers, with
their nonlinearities.
The stability limit for the adaptation step is 1/8 without the hidden layers, and it turns out to be
slightly smaller with the two layers.
As concerns numerical complexity, the system has 768 coefficients, which implies that,
disregarding learning operations, 768 multiplications are carried out per DCT, while direct
calculation necessitates fewer than 64 multiplications.
The DCT is employed in image and video compression; a neural network can reduce the
degradations arising from imperfect sensors.
The nonlinear activation functions used in practice result from a compromise between contra-
dictory requirements. On the one hand, decisions must be made; on the other hand, adaptive
techniques (such as learning, in particular) must be implemented. Therefore, a stepwise charac-
teristic is needed for the decision, while a linear characteristic is preferred for adaptivity.
The sigmoid (15.5) is an initial compromise, which provides smooth decision and allows for the
direct use of gradient techniques. However, the output is bounded, amplitude distortions occur in
the vicinity of the bounds, and derivatives tend toward zero.
The ReLU function is an approach that keeps linearity on a part of the domain and cancels the
other part. Adaptive techniques can be applied in the retained part, on the condition that stability
conditions are met. If the cancelled part is unimportant in view of the objectives, ReLU can be
considered optimal. Thus, applied to the digit recognition task set out in Section 15.4 and with the
same parameters, it yields discrimination deviations smaller than those of the hyperbolic tangent.
The ReLU function is appropriate for function approximation because it has the capability to
divide a domain into nonoverlapping elementary parts. As an illustration, let us consider the
approximation of a half-sinusoid on the interval [01] by 4 segments bounded by the 4 lines whose
equations in the plane are:
y = 2√2 x;  y = (4 − 2√2) x + √2 − 1;    (15.38)
y = (2√2 − 4) x + 3 − √2;  y = −2√2 x + 2√2
The block diagram of the network with two hidden layers is shown in Figure 15.12. In fact, the input domain is divided into four parts, with linear interpolation in each part. The coefficients a = −√2/2 and b = 1 − √2/2 are related to the slopes of the lines, as is the value √2/2.
At the output of the first hidden layer, the ReLU functions, represented by rectangles, keep only
positive values and, thus, set the negative values to zero and, in that case, also inhibit the addition
of the terms ± 1. The output y yields the approximation of sin(𝜋x).
In order to improve the approximation, the approach can be extended by dividing the interval
[0 1] into 2N parts with a network of N hidden layers.
A general method to approximate continuous functions on a closed interval using deep ReLU
networks is presented in Reference [9].
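As a numerical check, here is a Python sketch that assembles the same 4-segment approximation from the segment slopes of (15.38), written as a flat combination of ReLU units; this is an equivalent rearrangement of the construction, not the exact two-hidden-layer wiring of Figure 15.12:

```python
import numpy as np

def relu(u):
    return np.maximum(u, 0.0)

def half_sine_approx(x):
    """Piecewise-linear approximation of sin(pi*x) on [0, 1] whose four
    segments are the lines of (15.38); slope changes occur at the knots."""
    slopes = [2 * np.sqrt(2), 4 - 2 * np.sqrt(2),
              2 * np.sqrt(2) - 4, -2 * np.sqrt(2)]
    y = slopes[0] * relu(x)
    for i, knot in enumerate((0.25, 0.5, 0.75)):
        y += (slopes[i + 1] - slopes[i]) * relu(x - knot)
    return y

x = np.linspace(0.0, 1.0, 101)
print(np.max(np.abs(half_sine_approx(x) - np.sin(np.pi * x))))  # about 0.07
```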
15.9 Conclusion
In the field of artificial intelligence, neural networks are a widely used tool which can be lever-
aged in all technical areas and can produce spectacular results. They apply, among other areas, to
the approximation of functions, object classification, and shape recognition. The neural network
offers an alternative to specific techniques which exploit a priori knowledge on a given topic in
cooperation with dedicated algorithms. In fact, the two approaches can be combined, as they are
in convolutional neural networks.
On the operational front, the practical design and implementation of neural networks is a chal-
lenge. Firstly, in a given situation, the structure must be decided upon. One might think that the more neurons are included in the system, the better the performance will be. However, then, besides
the computation requirements, multiple issues manifest themselves in relation to initialization,
response time, and overdimensioning, with related drifts and stability problems. Next, one must
get to grips with a system comprising large numbers of nonlinearities and be able to validate the
results. Finally, though the neural network is a potentially powerful tool, its exploitation requires
great care in order to be efficient and possibly competitive in terms of performance and technolog-
ical resources.
Exercises
Give the equation of a separation line and compute the distances from the points to
that line.
Use an iterative method to find a separation line and check that the separation is achieved.
A (x1 , x2 ): (1,2) (0,4) (−3,3) (−2,−1) (−4,0); B: (1,−2) (2,−1) (4,−1) (4,1) (−3,−3) (5,−2)
C: (4,5) (6,2) (6,6) (4,6)
Through observation of the points, give the equations of 2 lines which separate the
3 groups.
Propose an iterative method to reach that objective and check the results.
15.4 A neuron is added to the network in Figure 15.6, so as to obtain a network with 2 hidden
layers. Give the block diagram. Keeping the reference sequence, perform a simulation and
determine the new estimations a, b, and c. Compare with the case of a single hidden layer.
White noise of amplitude 0.01 is added to the reference signal. Verify the impact on the
estimations and on the drift of the coefficients.
15.5 In the first example in Section 15.4, the reference sequence is changed and replaced by the
following nonlinear function:
d(n) = x2(n) − sin((π/2) x1(n))
Simulations yield the values: a = 1.0744; b = 1.6154, and the standard deviation of the
error after convergence is 2%. Justify these values and the accuracy achieved.
15.6 In the third example in Section 15.4, digit 5 leads to the following values of the second layer
output:
X2 = 0.7 [−1 −1 −1 −1 −1 1 1 −1 −1 −1 1 1 1 1 1 ]
15.7 A source delivers 2 independent signals d1 (n) and d2 (n), uniformly distributed in the inter-
val [−1,1], which are fed to a device performing the processing:
d(n) = d1(n) + j d2(n);  s(n) = d(n) + C d(n − 1);  C = 0.5 (1 + j)
followed by a distortion of the amplitude a, which becomes f(a) = (4/π) sin(πa/4).
To retrieve the source signal, a recursive neural network with 2 inputs, 2 outputs, and 2
nonlinearities is employed. Write the equations of the network.
The coefficients of the input matrix, the recursive part, and the output matrix are
designated by hij, 1 , hij, 2 , and hij, 3 , respectively. A learning sequence yields the following
coefficients:
h_{ij,2} = [−0.57  −0.57  0.56  −0.56];  h_{11,1} = 1.05;  h_{11,3} = 1.25
Justify these values and the output error power E = 0.006.
15.8 The impact of the initial conditions in the example provided in Section 15.7 is investigated.
Give the first value of the variables, xi1 and xi2 , at the output of the activation functions of
the first and the second hidden layers respectively. Answer the same question for the errors
ei1 and ei2 which are involved in updating at the next step. Deduce the impact of dividing
the initial coefficient values by 2.
Justify the values given for the limit of stability. Do likewise for the system time constant.
15.9 It is proposed to extend the block diagram in Figure 15.12 to obtain an approximation of
the half-sinusoid with 8 segments. Provide the corresponding diagram and the coefficient
values. Give an estimate of the maximum value of the approximation error.
References
1 F. Rosenblatt, “The perceptron: a probabilistic model for information storage and organization in the brain”, Psychological Review, vol. 65, no. 6, pp. 386–408, 1958.
2 K. Hornik, M. Stinchcombe and H. White, “Multilayer feedforward networks are universal approximators”, Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
3 B.M. Wilamowski, “Neural networks architectures and learning algorithms”, IEEE Industrial Electronics Magazine, vol. 3, no. 4, pp. 56–63, 2009.
4 Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, “Gradient-based learning applied to document recognition”, Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
5 I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, Cambridge, MA, 2016. http://www.deeplearningbook.org
6 C.-C. Jay Kuo, “The CNN as a guided multilayer RECOS transform”, IEEE Signal Processing Magazine, vol. 34, 2017, pp. 81–89.
7 M. Schuster and K.K. Paliwal, “Bidirectional recurrent neural networks”, IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
8 M. Narwaria, “The transition from white box to black box”, IEEE Signal Processing Magazine, vol. 38, 2021, pp. 163–173.
9 D. Elbrächter, D. Perekrestenko, P. Grohs and H. Bölcskei, “Deep neural network approximation theory”, IEEE Transactions on Information Theory, vol. 67, no. 5, 2021, pp. 2581–2623.
16
Error-Correcting Codes
Systems for information processing and transmission constitute a major field of application for
signal-processing techniques. Error-detection and correction techniques are also widely used in
these systems and, therefore, they coexist and even interact with signal processing in communica-
tion equipment.
Generally, coding is presented and taught with a mathematical approach [1]. However, some
of the most commonly used coding techniques exploit signal-processing concepts, results, and
algorithms [2]. For example, Reed–Solomon coding uses the discrete Fourier transform and linear
prediction, convolutional coding is FIR filtering, and turbo codes are related to IIR filtering.
This chapter provides an introduction to some important error-correcting codes from a
signal-processing perspective, allowing readers to gain an understanding of these codes and assess
their strengths and weaknesses. Moreover, a unified view of communication techniques may
result.
In fact, to cancel x(n), it is sufficient to apply the following FIR filter transfer function:
The signal is predictable, since, as long as two consecutive samples are known, the entire
sequence can be calculated.
Similarly, the complex signal:
x(n) = Σ_{i=1}^{P} A_i e^{jnωi}    (16.4)
When signal samples are available, the P elementary signals making up x(n) can be extracted in a
few steps. First, compute the prediction coefficients ai (1 ≤ i ≤ P) from 2P consecutive samples with
the help of the following matrix, obtained by running the recurrence equation (16.6) P times:
⎡ x(P)        x(P − 1)   …  x(1) ⎤ ⎡ a1 ⎤   ⎡ x(P + 1) ⎤
⎢ x(P + 1)    x(P)       …  x(2) ⎥ ⎢ a2 ⎥   ⎢ x(P + 2) ⎥
⎢    ⋮           ⋮             ⋮  ⎥ ⎢ ⋮  ⎥ = ⎢    ⋮     ⎥    (16.7)
⎣ x(2P − 1)   x(2P − 2)  …  x(P) ⎦ ⎣ aP ⎦   ⎣ x(2P)    ⎦
This linear system is solved efficiently using an order-iterative algorithm, beginning at order 1 and finishing at order P. As for the computation load, 2(P + 1)(P + 2) multiplications and P divisions are needed. For a given order i of the prediction filter, the algorithm computes the prediction error e_{i+1} at order i + 1 and stops when the error is zero. For i = P, if the error e_{P+1} is not zero, the number of components in x(n) is greater than P.
Once the prediction coefficients have been obtained, the frequencies are determined by computing the roots Z_i = e^{jωi} of the prediction filter transfer function H(Z) given by (16.5).
Finally, returning to (16.4), the amplitudes Ai are obtained by expressing P values of the signal
x(n), which yields the equations:
⎡ Z1     Z2    …  ZP   ⎤ ⎡ A1 ⎤   ⎡ x(1) ⎤
⎢ Z1²    Z2²   …  ZP²  ⎥ ⎢ A2 ⎥   ⎢ x(2) ⎥
⎢  ⋮              ⋮    ⎥ ⎢ ⋮  ⎥ = ⎢  ⋮   ⎥    (16.8)
⎣ Z1^P   Z2^P  …  ZP^P ⎦ ⎣ AP ⎦   ⎣ x(P) ⎦
It can be observed that, if the frequencies 𝜔i are known, 2P samples enable us to compute 2P
amplitudes.
As above for (16.7), efficient techniques can be used to solve the matrix equation (16.8) and deter-
mine the amplitudes Ai (1 ≤ i ≤ P) without inverting the matrix.
For example, we can apply the signal sequence x(n) to the filter:
H_j(z) = ∏_{i=1, i≠j}^{P} (1 − e^{jωi} z^{−1})    (16.9)
in order to get amplitude A_j.
Overall, it is verified that 2P signal samples are required to identify P components.
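A numerical sketch of this recovery in Python, following (16.7) and (16.8); the frequencies and amplitudes are hypothetical:

```python
import numpy as np

P = 2
omegas = np.array([0.7, 1.9])            # hypothetical frequencies
amps = np.array([1.0, 0.5])              # hypothetical amplitudes
n = np.arange(1, 2 * P + 1)              # samples x(1) .. x(2P)
x = (amps * np.exp(1j * np.outer(n, omegas))).sum(axis=1)

# (16.7): solve for the prediction coefficients a_1 .. a_P
M = np.array([[x[P - 1 + i - j] for j in range(P)] for i in range(P)])
a = np.linalg.solve(M, x[P:])

# The roots of 1 - a_1*Z^-1 - ... - a_P*Z^-P are Z_i = exp(j*w_i)
Z = np.roots(np.concatenate(([1.0], -a)))
print(np.sort(np.angle(Z)))              # recovers 0.7 and 1.9

# (16.8): Vandermonde system for the amplitudes A_i
V = np.array([Z ** i for i in range(1, P + 1)])   # row i holds Z_1^i .. Z_P^i
A = np.linalg.solve(V, x[:P])
print(np.abs(A))                         # recovers the amplitudes (root order)
```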
From the above development, it appears that two methods exist to generate samples of predictable
signals – one in the frequency domain and one in the time domain, which leads to two variants of
the codes.
[K data | 2P zeros] = x(n)
  ⇓ FFT of order N = K + 2P
X(k) (transmitted signal)
  ⇓ channel
X(k) + E(k) (received signal)
  ⇓ inverse FFT
x(n) + e(n) → syndrome → e(n)
  ⇓ subtraction
x(n) = [K data | 2P zeros]
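A toy numerical sketch of this frequency-domain scheme (Python, complex field; the block sizes and the single channel error are hypothetical, and only error detection is shown – full correction would apply the prediction technique described above to the syndrome):

```python
import numpy as np

K, P = 8, 2
N = K + 2 * P
rng = np.random.default_rng(4)

data = rng.normal(size=K)
x = np.concatenate([data, np.zeros(2 * P)])   # K data followed by 2P zeros
X = np.fft.fft(x)                             # transmitted signal X(k)

E = np.zeros(N, dtype=complex)                # channel error on one symbol
E[5] = 3.0 - 1.0j
received = X + E

v = np.fft.ifft(received)                     # x(n) + e(n)
syndrome = v[K:]                              # 2P samples where x(n) is zero
print(np.max(np.abs(syndrome)) > 1e-9)        # True: the error is detected
```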
A strong point of the approach is the simplicity of the decoder, since there is no need to separate
the components of the error signal. However, it requires DFT calculations and, more importantly,
the coding is not systematic. In systematic coding, the useful data are transmitted as such, and they
are just supplemented with protecting data. In fact, the procedure can be modified to obtain that
property.
[Figure: systematic variant – the transmitted signal carries the K data followed by 2P computed values; at the receiver, the syndrome gives e(n), which is subtracted to recover x(n).]
[Figure: syndrome computation by a bank of 2P recursive filters 1/(1 − W^i Z^{−1}), i = 0, …, 2P − 1, fed with the sequence [x(0), x(1), …, x(N − 1)] and delivering X(0), X(1), …, X(2P − 1).]
On the transmitter side, the processing can also be performed by filtering. Assuming the data to
be transmitted are [x(2P), …, x(N − 1)], the 2P complementary values [x(0), …, x(2P − 1)] must be
computed. The filter with transfer function:
H(z) = 1 / ∏_{i=0}^{2P−1} (1 − W^i z^{−1}) = 1 / (1 + Σ_{i=1}^{2P} a_i z^{−i})    (16.10)
has the following input–output relationship:
y(n) = x(n) − u(n);  u(n) = Σ_{i=1}^{2P} a_i y(n − i)    (16.11)
Taking the index inversion into account, the sequence x(2P − i) = u[N − 1 − (2P − i)], (1 ≤ i ≤ 2P), is fed to the filter and the last 2P outputs – that is, [y(N − 2P), …, y(N − 1)] – are zero. The operation can be checked by taking P = 1, for example.
The above coding and decoding process has been presented assuming the arithmetic operations are carried out in the complex field. However, the data which must be protected are generally binary – that is, the signals x(n) are B-bit numbers, and the same applies to the 2P protecting values. Then, in order for the numerical calculations to be exact, arithmetic operations must be carried out in finite fields with 2^B elements, as indicated in Section 3.7.
Table 16.1 Elements of the field generated with g(x) = x⁴ + x + 1.

Power    Polynomial form       Binary
0        0                     0000
α⁰       1                     0001
α¹       α                     0010
α²       α²                    0100
α³       α³                    1000
α⁴       α + 1                 0011
α⁵       α² + α                0110
α⁶       α³ + α²               1100
α⁷       α³ + α + 1            1011
α⁸       α² + 1                0101
α⁹       α³ + α                1010
α¹⁰      α² + α + 1            0111
α¹¹      α³ + α² + α           1110
α¹²      α³ + α² + α + 1       1111
α¹³      α³ + α² + 1           1101
α¹⁴      α³ + 1                1001
The elements of the field are the successive powers of a primitive element α such that α^M = 1 and M = 2^B − 1. The number N of code values must be less than or equal to M.
Example: B = 4; g(x) = x⁴ + x + 1; M = 15.
Letting α⁴ + α + 1 = 0, the 15 elements of the code are given in Table 16.1.
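The table can be reproduced with a few lines of Python, multiplying by α (a left shift) and reducing with α⁴ = α + 1 whenever a degree-4 term appears:

```python
elem = 0b0001                          # alpha^0 = 1
for k in range(15):
    print(f"alpha^{k:<2} -> {elem:04b}")
    elem <<= 1                         # multiply by alpha
    if elem & 0b10000:                 # reduce modulo g(x) = x^4 + x + 1
        elem ^= 0b10011
print(elem == 0b0001)                  # True: alpha^15 wraps to alpha^0
```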
Coding and decoding consist of running the algorithms described in the above sections, using the
code table for the arithmetic operations. Appropriate and optimized algorithms have been devel-
oped; they are generally presented with the help of the polynomial terminology [1]. Accordingly, a
typical temporal decoding corresponds to the following sequence:
– Compute the syndrome (polynomial s(x) of degree 2P).
– Compute the localizing polynomial using the Berlekamp–Massey algorithm (iterative calcula-
tion of the prediction filter coefficients); extract the localizing zeros by the Chien method (sys-
tematic search among the elements of the field).
– Compute the amplitudes of the error by the Forney algorithm.
– Subtract the errors.
Frequency decoding necessitates the definition of a Fourier transform in the field GF(2^B), using successive powers of a primitive element α.
The number of bits B of each value must be such that N < 2^B. In total, each block contains KB useful bits protected by (N − K)B redundancy bits. The performance of the code is measured by the error probability per bit P_b after decoding, as a function of the error probability per symbol p_s before decoding. In case of scattered errors in the channel, and if the bit error rate is p, the probability for a B-bit symbol to be error free is (1 − p)^B and we get:
p_s = 1 − (1 − p)^B    (16.12)
After decoding, there is no output error when the number of erroneous symbols in the block of
N symbols is less than the correction capability P. For P + 1 erroneous symbols, the number of bit
errors is at most (P + 1)B and the probability is:
P_{P+1} = C_N^{P+1} (p_s)^{P+1} (1 − p_s)^{N−P−1}    (16.13)
The corresponding bit error rate is written as:
BER ≤ ((P + 1)/N) C_N^{P+1} (p_s)^{P+1} (1 − p_s)^{N−P−1}    (16.14)
In total, the case of (P + i) erroneous symbols (1 ≤ i ≤ N − P) must be taken into account and the probabilities must be summed. However, where p_s is small, it might be sufficient to consider only i = 1 and stick to expression (16.14).
Example:
N = 204; K = 188; P = 8; B = 8 bits.
With the line error probability p = 10⁻³, we obtain p_s = 1 − (1 − 10⁻³)⁸ ≈ 0.008 and
BER ≤ (9/204) · (204!/(9! 195!)) · (0.008)⁹ (0.992)¹⁹⁵
which is:
BER ≤ 1.6 × 10⁻⁶
Referring to the signal-to-noise ratio, and assuming Gaussian white noise, probability p = 10⁻³ corresponds to an SNR of about 10 dB, while 1.6 × 10⁻⁶ corresponds to about 14 dB. Thus, the gain brought by the code for this error probability, the coding gain, is about 4 dB.
Now, p = 10⁻⁴ leads to BER ≤ 10⁻¹⁴ and the code cancels the line errors almost completely, in this example, for p ≤ 10⁻⁴.
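The numbers of this example can be checked directly (Python, using the exact binomial term of (16.14)):

```python
from math import comb

N, P, B = 204, 8, 8
p = 1e-3
ps = 1 - (1 - p) ** B                              # (16.12)
ber = (P + 1) / N * comb(N, P + 1) * ps ** (P + 1) * (1 - ps) ** (N - P - 1)
print(f"ps = {ps:.4f}  BER bound = {ber:.1e}")     # about 1.7e-06, the order of
                                                   # the 1.6e-06 bound in the text
```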
Clearly, Reed–Solomon codes are recommended for applications requiring very low error rates.
The above derivations assume scattered errors. When errors occur in packets, we can return to
the case of scattered errors by introducing symbol interleaving.
[Figure: probability distribution of the signal amplitude.]
Assuming uniform distribution of symbols in the hypersphere, the corresponding signal has
energy:
E_S = (1/V_M) ∫₀^R r² r^{M−1} dr ∫ … ∫ ∏_{i=1}^{M−1} f(θi) dθi = (M/(M + 2)) R²    (16.24)
When M tends toward infinity, R² represents the total energy of the received samples and R²/M represents the sum of the signal and noise powers. From (16.22), we obtain:
S + B > 2^{2N} B    (16.25)
Hence, the channel capacity limit is defined by (16.18).
Now, in order to come close to this limit, the points representing the transmitted signal must be
spread in the hypersphere, which implies that the projections on the coordinate axes are quantized
with more than N bits and that relations exist between these projections.
It is worth pointing out that tending toward infinity for the number of symbols of a block means
infinite transmission delay.
In practice, the number of symbols in a block at decoding is limited to M and the loss in performance must be assessed. The error probability P_e must be related to the ratio of the noise standard deviation σ_b to the value (σ_b)_{lim} associated with the limit capacity. Letting α = (σ_b)_{lim}/σ_b with α > 1, the error probability is expressed as a function of the probability distribution P(r) of the radius r of the noise hypersphere by:

P_e = \int_{\alpha\sqrt{M}}^{\infty} P(r)\, dr    (16.26)
To find P(r), consider M unit-variance independent Gaussian random variables b_i. The variable:

r = \left( \sum_{i=1}^{M} b_i^2 \right)^{1/2}    (16.27)
Figure 16.5 Probability distribution of the noise vector modulus.
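The shape of this distribution is easy to reproduce by simulation. The sketch below is illustrative only; it shows how the radius of a Gaussian noise vector concentrates around √M as the dimension grows:

```python
# Sketch: concentration of the noise radius r = ||b|| around sqrt(M),
# the "sphere hardening" behind Figure 16.5 and the bound (16.26).
import numpy as np

rng = np.random.default_rng(0)
for M in (16, 256, 4096):
    r = np.linalg.norm(rng.standard_normal((10000, M)), axis=1)
    print(M, r.mean() / np.sqrt(M), r.std() / r.mean())
# The mean radius approaches sqrt(M) and the relative spread shrinks, so
# exceeding alpha*sqrt(M) with alpha > 1 becomes increasingly unlikely.
```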
Convolutional codes are designed to approach the conditions leading to this limit, and a simple scheme is described first. For large blocks, iterative techniques are required, as with turbo codes.
[Figure: rate R = 1/2 convolutional coder and channel – the data d(n) feed two filters through delays z⁻¹, producing the coded outputs y₁(n) and y₂(n), transmitted as ±1 over the channel with noise samples b₁(n) and b₂(n); the ML decoder delivers d̂(n − M).]
1. Single error

(d(0) = 1): E_1 - E_{min} = \Delta_1 = 20\left[1 + \frac{b_1(0) + b_2(0) + b_1(1) + b_1(2) + b_2(2)}{5}\right]

2. Double error

d(0) and d(1): \Delta_2 = 24\left[1 + \frac{b_1(0) + b_2(0) + b_2(1) + b_2(2) + b_1(3) + b_2(3)}{6}\right]

d(0) and d(2): \Delta_2' = 24\left[1 + \frac{b_1(0) + b_2(0) + b_1(1) + b_1(3) + b_1(4) + b_2(4)}{6}\right]
The average of the noise samples appears in the cost function, with factor 5 for the single-error case and 6 for the double-error case. It can readily be verified that the averaging factor exceeds 6 in the other cases. In the absence of coding, the averaging factor is 2. Thus, with coding, and considering single-error cases only, the coding gain is approximately G_c = 2.5, or 4 dB.
It can be observed that the averaging factor corresponds to the number of "1"s at the filter outputs when the error sequence is fed to the inputs. A single error then produces the impulse response. This number is called the weight of the sequence.
The curves giving the error probability per bit as a function of Eb /N 0 , with and without coding,
are shown in Figure 16.8. The term Eb is the energy per bit and N 0 is the noise power spectral
density. The ratio Eb /N 0 is a theoretical parameter that allows us to obtain generic curves. To get the
practical SNR and determine the rates, it is necessary to apply a factor 2 to account for the symbol
rate equal to twice the channel bandwidth and multiply by the number of bits in each symbol.
With L = 3, the coding gain is close to 4 dB for Pe = 10−6 . In contrast, it is less than 3 dB for
Pe = 10−3 . In fact, it is necessary to consider the vectors in the vicinity of the ideal vector and
add up the error probabilities. Overall, the vectors associated with multiple errors have averaging
factors greater than that of the single-error case and their impact is weak for high-SNR situations
[Figure: bit error probability P_e (logarithmic scale) versus E_b/N_0 in dB, for no coding and for coding with L = 3, L = 7, and L = 9.]
Figure 16.8 Bit error probability for convolutional coding with rate R = 1/2.
and small error probabilities. The error probability is expressed as a function of the averaging factor k by:

p_k = \frac{1}{\sqrt{2\pi}} \int_{\sqrt{k}/\sigma_b}^{\infty} e^{-x^2/2}\, dx \approx 0.4\, e^{-k/2\sigma_b^2}    (16.33)
and for k + 1:

p_{k+1} \approx 0.4 \exp\left( -\frac{k+1}{2\sigma_b^2} \right) \approx p_k \exp\left( -\frac{1}{2\sigma_b^2} \right)    (16.34)
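The quality of approximation (16.33) can be checked against the exact Gaussian tail; a minimal sketch, with an arbitrary illustrative value of σ_b:

```python
# Compare the exact tail integral of (16.33) with 0.4*exp(-k/(2*sigma_b^2)).
from math import erfc, exp, sqrt

def pk_exact(k, sigma_b):
    # (1/sqrt(2*pi)) * integral of exp(-x^2/2) from sqrt(k)/sigma_b to infinity
    return 0.5 * erfc(sqrt(k) / (sigma_b * sqrt(2)))

def pk_approx(k, sigma_b):
    return 0.4 * exp(-k / (2 * sigma_b ** 2))

for k in (5, 6, 7):
    print(k, pk_exact(k, 0.6), pk_approx(k, 0.6))
# Both follow the same exponential decay in k, which is what (16.34) exploits.
```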
The vectors in the vicinity of the ideal vector are obtained from the trellis diagram and the graph of the transitions between associated states, which constitute an efficient realization of the ML principle. The generating function of the convolutional code is expressed by:

T(D, L, N) = D^5 L^3 N + D^6 L^4 (1 + L) N^2 + D^7 L^5 (1 + L)^2 N^3 + D^8 L^6 (1 + L)^3 N^4 + \cdots    (16.35)

where the exponent of D is the averaging factor, the exponent of N is the number of errors, and the terms of the polynomial in L represent the multiple-error configurations. For example, the averaging factor 7 occurs for 3 errors in the following 4 configurations:

[d(0), d(1), d(2)], [d(0), d(1), d(3)], [d(0), d(2), d(3)], [d(0), d(2), d(4)].

The minimum averaging factor is called the "free distance" in coding theory.
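The free distance can be verified by a short search over the trellis. The sketch below assumes the generators 111 and 101 (octal 7 and 5), consistent with the leading term D⁵L³N of (16.35); it is an illustration, not the book's decoder:

```python
# Sketch: free distance of the rate-1/2 convolutional code with generators
# 111 and 101, via a shortest-path search over the weights of trellis paths
# that leave and re-merge with the all-zero state.
from heapq import heappush, heappop

G = (0b111, 0b101)                 # taps of the two coding filters

def step(state, bit):
    reg = (bit << 2) | state       # 3-bit register: new bit + 2-bit state
    weight = sum(bin(reg & g).count("1") % 2 for g in G)
    return reg >> 1, weight        # next state, Hamming weight of the 2 outputs

def free_distance():
    state, w = step(0, 1)          # leave the all-zero path with an input "1"
    heap, best = [(w, state)], {}
    while heap:
        w, s = heappop(heap)
        if s == 0:
            return w               # first re-merge: minimum total weight
        if best.get(s, float("inf")) <= w:
            continue
        best[s] = w
        for bit in (0, 1):
            ns, dw = step(s, bit)
            heappush(heap, (w + dw, ns))

print(free_distance())             # 5: the exponent of D in the first term of (16.35)
```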
For each configuration, the error probability P_k is determined so that the average of the k noise samples exceeds unity, as shown by relations (16.32). The error probability is bounded by:

P_E < \sum_{k=k_{min}}^{\infty} a_k P_k    (16.38)
The inequality reflects the fact that noise samples are involved in several averaging operations. Approximation (16.33) leads to:

P_E < 0.4 \exp\left( -\frac{k_{min}}{2\sigma_b^2} \right) \sum_{k=k_{min}}^{\infty} a_k \exp\left( -\frac{k - k_{min}}{2\sigma_b^2} \right)    (16.39)
This represents the probability that an error will occur. However, an important feature of
convolutional codes is that the output errors often occur in packets, due to the decoder.
Convolutional codes also lend themselves to soft decisions. For each symbol, the decoder proceeds as follows:
● Assuming d(n) = 0, search for the minimum E_0(n) of the error vector norm.
● Assuming d(n) = 1, search for the minimum E_1(n) of the error vector norm.
● Take s_q(n) = E_1(n) − E_0(n).
[Figure: error indicator l(n) as a function of n.]
The binary data are given the sign of the sequence s_q(n). In fact, the sign of s_q(n) represents the sum of the original data and the error signal, and |s_q(n)| is the input noise after averaging, with amplitude shift and folding about the origin in the presence of errors. Low values of |s_q(n)| reflect the low reliability of the recovered data at the corresponding times.
A decoder that delivers the sequence s_q(n) is called a soft-output decoder. This signal may be leveraged in cascaded decoders to extract errors, as in turbo codes.
An approach to cope with error packets consists of cascading the convolutional decoder with an
interleaver and a Reed–Solomon decoder. The interleaver performs a permutation which spreads
the errors over several blocks of the RS code, which allows for correction.
[Figures: a convolutional coder producing y₁(n) and y₂(n) from d(n); a recursive systematic turbo coder in which the data feed two coders, the second through an interleaver; and the iterative turbo decoder, in which decoder 1 and decoder 2 exchange soft information through an interleaver and de-interleaver to deliver d̂(n − M).]
The noise signals must be independent – hence the importance of the interleaving involved in recursive coding. Moreover, it is clear that using systematic coding and keeping the input signal is essential for the realization.
For low input SNR, many errors occur at the first decoding. They are corrected gradually by successive iterations. The procedure is stopped when the output signal variance no longer decreases, and the retrieved data are obtained as the sign of this signal.
The transmission efficiency is improved with rate R = 1/2, by alternately transmitting the outputs of coders 1 and 2.
Example: Let us consider a coder with L = 5 and the IIR filter transfer function:
\frac{H_2(Z)}{H_1(Z)} = \frac{1 + Z^{-4}}{1 + Z^{-1} + Z^{-2} + Z^{-3} + Z^{-4}}
a permutation matrix of size 256 × 256 as interleaver, and the block length 65,536 bits. After 18
iterations, the error probability per bit has become smaller than 10−5 for SNR = 0.7 dB. The SNR
limit value, with rate R = 1/2, equals 0 dB. In this example, the turbo code comes as close as 0.7 dB
to the theoretical limit.
[Figure: coded modulation – eight-level amplitude constellation (000 to 111) with distances 2q and 4q; a rate R = 1/2 convolutional coder produces the coded bits c₂ and c₃ from d₂, while one bit remains noncoded.]
The minimum averaging factor k_min of the convolutional code can be associated with a multiplication of the distance between levels by \sqrt{k_{min}}. Then, for k_min > 16, the distance 4q corresponding to the noncoded bit is dominant, and the system coding gain with respect to the absence of coding can come close to a factor of 2 – that is, 6 dB.
At decoding, the noncoded bit is introduced in the trellis in the form of parallel paths, which
means that the weighting computations are carried out for the 2 possibilities associated with this
bit. The assignment of the amplitudes for the 2 coded bits can also be optimized – the greatest
available distance is associated with transitions in the trellis that start from the same state or reach
the same state. The objective is to maximize the distance between paths.
When the free distance of the code is preponderant, the error probability P_e per symbol is estimated using (16.38) or the simplified version:

P_e \approx 0.4\, a_0 \exp\left( -\frac{k_{min}}{2\sigma_b^2} \right)    (16.40)
where a0 is the number of error configurations leading to the minimum averaging factor kmin as
explained in Section 16.2.3.
16.3 Conclusion
Signal-processing techniques can apply to error detection and correction. Reed–Solomon codes
based on the DFT, in combination with linear prediction, are able to correct errors in blocks so that
extremely low error rates can be reached.
Convolutional codes use digital filters, and they bring the equivalent of SNR improvements in transmission. The decoder is simple in principle and easy to implement. It exploits the ML technique and the Viterbi algorithm, whose complexity depends on the filter order and the length of the code. The delay is proportional to the code length multiplied by a small factor. Convolutional coding can be extended to multi-bit symbols with the coded modulation technique, in which one or several low-weight bits are protected.
Turbo codes combine recursive filtering, interleaving, and an iterative procedure. Their perfor-
mance can come close to the theoretical limit of channel capacity. They process data blocks of large
or very large size, which increases complexity and transmission delay.
The combination of a convolutional coder (inner code) and a Reed–Solomon coder (outer code) is
a very powerful setup which yields extremely low error rates in transmission and radio broadcasting
systems.
Exercises
16.1 A Reed–Solomon code uses a DFT of size 32 and the syndrome consists of the 4 values:
S = [2 + j; −1.69 + 2.23j; −2.12(1 + j); 2.23 − 1.67j]
Compute the prediction coefficients and determine the signal which must be subtracted in
the frequency domain to retrieve the initial signal.
16.2 Cancelling an impulse noise. To protect a block of samples X = [x(0), …,x(5)], a suffix
Sx = [x(6),x(7)] is added, such that, in the DFT of sequence, the terms Y (6) and Y (7) are
null. Give the expression of Sx as a function of X.
After addition of an impulsive noise, we have Y (6) = 2 e−j𝜋/2 , Y (7) = 2 ej3𝜋/4 . Compute the
amplitude of the pulse and its index in the input block. What happens if the spurious pulse
falls between two index values and then, how can its impact be reduced?
16.3 A rate R = 1/2 convolutional code has the coefficients [1111] and [1101]. Give the realization
diagram and show that the free distance is 6. What is the maximum coding gain?
16.4 Assuming binary data, d(n) = ± 1, apply the formula that gives the theoretical capacity
of a channel, in order to obtain the maximum SNR value which allows for error-free
transmission.
The system SNR is assumed to be equal to 8 dB and a rate R = 1/2 convolutional code is
used. If a bit error rate less than 10−8 is targeted, give the coding gain needed. Propose a
code which is able to meet the objective.
References
1 W.W. Peterson and E.J. Weldon, Error Correcting Codes, MIT Press, Cambridge, MA, 1972.
2 R. Blahut, Algebraic Methods for Signal Processing and Communications Coding, Springer-Verlag,
New York, 1992.
3 A. Hocquenghem, Codes Correcteurs d’Erreurs, revue Chiffres, Vol. 2, pp. 147–156, 1959.
4 R.C. Bose and D.K. Ray-Chaudhuri, On a class of error correcting binary group codes,
Information and Control, Vol. 3, 1960, pp. 68–79.
5 A.J. Viterbi and J.K. Omura, Principles of Digital Communications and Coding, McGraw-Hill,
New York, 1979.
6 R. Ziemer and R. Peterson, Introduction to Digital Communication, Chapter 7: Fundamentals of Convolutional Coding, Prentice Hall, NJ, 2001.
7 C. Berrou and A. Glavieux, Near optimum error correcting coding and decoding: turbo-codes, IEEE Transactions on Communications, Vol. 44, No. 10, 1996, pp. 1261–1271.
8 C. Berrou, The ten-year-old turbo codes are entering into service, IEEE Communications Maga-
zine, 2003, pp. 111–116.
9 E. Biglieri, D. Divsalar, P.J. McLane and M.K. Simon, Introduction to Trellis-Coded Modulation,
Macmillan, New York, 1991.
17
Applications
Signal processing is instrumental in the generalization of electronics to all technical fields. A few
examples of applications are presented in this chapter, mainly in the field of communications.
17.1 Frequency Detection

Assume that the amplitude of a signal component with frequency f_0 is to be determined when the
signal is sampled at frequency f s > 2f 0 . Figure 17.1 represents the set of operations to be performed.
The signal is applied to a narrow band-pass filter centered on the frequency f 0 . Rectification is
then performed by taking the absolute value of the numbers obtained. This set of absolute values
is applied to a low-pass filter which provides the desired value of the amplitude. If the frequency
component f 0 which is to be detected is present, threshold logic provides the logic information.
This process can be analyzed as follows. Assume s0 (t) is the signal to be detected, with:
s0 (t) = A sin(𝜔0 t)
Taking the absolute value of the numbers which represent samples of this signal is equivalent to
multiplying by a square wave ip (t) in phase with s0 , and of unit amplitude. Using equation (1.6), we
can write:
i_p(t) = 2 \sum_{n=0}^{\infty} h_{2n+1} \sin[(2n+1)\omega_0 t]    (17.1)

in which:

h_{2n+1} = (-1)^n\, \frac{\sin[\pi(2n+1)/2]}{\pi(2n+1)/2} = \frac{1}{\pi(2n+1)/2}
The signal s∗0 (t) obtained after rectification is:
s_0^*(t) = 2A \sum_{n=0}^{\infty} h_{2n+1} \sin[(2n+1)\omega_0 t] \sin(\omega_0 t)

or:

s_0^*(t) = A h_1 + A \sum_{n=1}^{\infty} (h_{2n+1} - h_{2n-1}) \cos(2n\omega_0 t)    (17.2)
To obtain the amplitude A, terms of the infinite sum have to be eliminated. Above a certain
order, the parasitic products have frequencies greater than half the sampling frequency f s /2, and
are aliased in the useful band. The specifications of the low-pass filter, and, in particular, the
stop-band edge, have to be chosen so as to eliminate the largest parasites. Those occurring in the
pass band result in fluctuations in the measurement of A.
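A minimal simulation of this detector chain behaves as expected. The filter orders, bandwidths, and signal values below are illustrative choices, not values from the text:

```python
# Sketch of Figure 17.1: band-pass filter around f0, rectification by
# absolute value, low-pass filter, then amplitude estimation.
import numpy as np
from scipy import signal

fs, f0, A = 8000.0, 1000.0, 0.5
n = np.arange(4096)
x = A * np.sin(2 * np.pi * f0 / fs * n) \
    + 0.1 * np.random.default_rng(1).standard_normal(n.size)

bp = signal.butter(4, [f0 - 100, f0 + 100], "bandpass", fs=fs, output="sos")
lp = signal.butter(4, 50, "lowpass", fs=fs, output="sos")
rectified = np.abs(signal.sosfilt(bp, x))      # multiplication by the square wave
estimate = signal.sosfilt(lp, rectified)       # keeps the term A*h1 of (17.2)
print(np.pi / 2 * estimate[-500:].mean())      # A*h1 = 2A/pi, so ~0.5 is recovered
```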
In this approach, it is advantageous to use an IIR band-pass filter and an FIR low-pass filter, because the amplitude can be measured at a frequency less than f_s. Another method is available which involves only multirate filters; it is based on modulation by two carriers in quadrature at the frequency f_0 and is shown in Figure 17.2.
The component to be detected is:
s(t) = A sin(𝜔0 t + 𝜑)
where 𝜑 represents the phase of the component relative to the carrier. After low-pass filtering
in the two branches to eliminate the unwanted modulation products, the following signals are
obtained:
S_R = \frac{A}{2} \sin\varphi; \quad S_I = \frac{A}{2} \cos\varphi    (17.3)
The required amplitude is:

A = 2\sqrt{S_R^2 + S_I^2}

Accurate evaluation of X = \sqrt{S_R^2 + S_I^2} is difficult, and one is generally satisfied with an approximation X', which depends on the phase φ.
Table 17.1 gives various approximations and the corresponding relative errors. These errors can be reduced by multiplication by a scaling factor C – that is, by calculating the value X'_C = C X'.
This modulation-based detection generally requires less calculation than the method which uses a band-pass filter, but it does require the availability of suitable carrier signals. The operation of detecting a frequency is used in signal transmission systems and forms the basis of receivers of multifrequency codes.
Table 17.1 Approximations X' of X = \sqrt{S_R^2 + S_I^2}: each entry gives the maximum relative error max|(X' − X)/X|, the scaling factor C, and the resulting error max|(X'_C − X)/X|.
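The kind of trade-off such a table captures can be reproduced with any simple approximation. The sketch below uses X' = max(|S_R|, |S_I|) + min(|S_R|, |S_I|)/2, a classic choice used here purely as an illustration, not an entry of the original table:

```python
# Relative error of X' = max + min/2 over the phase phi, before and after
# scaling by a factor C that centres the error band.
import numpy as np

phi = np.linspace(0, np.pi / 2, 1001)
sr, si = 0.5 * np.sin(phi), 0.5 * np.cos(phi)
x = np.hypot(sr, si)                                   # exact modulus (= 0.5 here)
xp = np.maximum(np.abs(sr), np.abs(si)) + 0.5 * np.minimum(np.abs(sr), np.abs(si))
print((xp / x - 1).max())                              # ~ +12% worst case
C = 2 / (1 + np.sqrt(1.25))                            # centres the error band
print((C * xp / x - 1).min(), (C * xp / x - 1).max())  # ~ -5.6% to +5.6%
```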
17.2 Phase-locked Loop

Phase-locked loops are used for clock recovery in terminals and receivers [1, 2]. The principle
is illustrated in Figure 17.3. When the loop is in equilibrium, the frequency produced by the
voltage-controlled oscillator is equal to the frequency of the input signal and the phase detector
produces a signal whose continuous component is extracted by the narrowband low-pass filter.
The phase detector can be a modulator which forms the product of the oscillator output and the
signal input. If the nominal frequency of the oscillator is equal to the input frequency, the signals
are in quadrature, and the continuous component at the output of the modulator is zero. If not,
then the phase difference with respect to the quadrature signal produces a continuous component
which shifts the oscillator frequency by the required amount for the frequencies to become equal.
The bandwidth of the loop filter determines the capture range, the response time, and the residual
noise level.
This operation can be replicated entirely digitally. However, there is additional flexibility with
respect to where the phase calculation is performed. The digital oscillator can be realized by means
of a phase accumulator connected to a memory which provides samples of the sinusoid. Thus, the
input phase values can be directly processed at the input and output of the loop, and the phase
difference can be obtained by simple subtraction. A model corresponding to a second-order loop
is shown in Figure 17.4. It is a control loop with two coefficients, K 1 and K 2 , corresponding to the
proportional and integral control terms, respectively.
The voltage-controlled oscillator is represented by the integrator which provides the output phase
𝜑s (n). The transfer function between the output and the input can be written as:
H(Z) = \frac{\Phi_s}{\Phi_e} = \frac{K_1 Z^{-1} + (K_2 - K_1) Z^{-2}}{1 - (2 - K_1) Z^{-1} + (1 - K_1 + K_2) Z^{-2}}    (17.4)
This is the transfer function of a low-pass filter with a value of 1 at zero frequency. The characteristics of the filter are determined by the two parameters K_1 and K_2. The region of stability is examined by using the results given in Section 6.7 and setting b_1 = K_1 − 2 and b_2 = 1 − K_1 + K_2. This gives 1 − K_1 + K_2 < 1 and |K_1 − 2| < 2 − K_1 + K_2. In the plane of the coefficients K_1 and K_2, the stability domain is a triangle.
The transfer function between the phase shift and the input can be written as:
\frac{\Phi_e(Z) - \Phi_s(Z)}{\Phi_e(Z)} = \frac{(1 - Z^{-1})^2}{1 - (2 - K_1) Z^{-1} + (1 - K_1 + K_2) Z^{-2}}    (17.5)
The presence of the term (1 − Z −1 )2 in the numerator shows that such a loop is capable of tracking
a phase variation.
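A minimal loop simulation shows this tracking behaviour. The gain values are illustrative choices inside the stability triangle, and this discrete-time arrangement is one possible realization whose exact transfer function may differ slightly from (17.4):

```python
# Sketch of the second-order loop of Figure 17.4: proportional (K1) and
# integral (K2) correction of the phase error; the oscillator is the
# integrator producing the output phase.
import numpy as np

K1, K2 = 0.2, 0.01                 # satisfies K2 < K1 and |K1 - 2| < 2 - K1 + K2
phi_in = 0.05 * np.arange(400)     # frequency offset = linear phase ramp
phi_out, integ = 0.0, 0.0
for n in range(400):
    e = phi_in[n] - phi_out        # phase detector by simple subtraction
    integ += K2 * e                # integral branch
    phi_out += K1 * e + integ      # oscillator accumulates the command
print(e)                           # -> ~0: the double zero (1 - z^-1)^2 at work
```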
17.3 Differential Coding of Speech

Differential coding of speech leads to reduced data rates, with no additional delay and low computational complexity.
The speech signal has a spectral density which, in the long term, decreases rapidly with frequency, starting below 1 kHz. Under these conditions, considering the 8 kHz sampling frequency which is standard in communications, significant prediction gains can be expected [3].
Figure 17.5 shows the principle of differential coding based on linear prediction. At transmission,
the prediction error e(n) is quantized by coder C, and, at the receiver, decoder D delivers the signal
which is fed to the inverse filter to retrieve the speech signal. The set e′ (n) is the result of adding the
quantization error to e(n) and x′ (n) is the set output by the decoder. The signal e(n) is expressed by:
e(n) = x(n) - \tilde{x}(n) = x(n) - \sum_{i=1}^{N} a_i x(n - i)
The order N of the filter and the coefficients ai (1 ≤ i ≤ N) should be chosen to minimize the power
of the signal e(n). Under these conditions, for a given value of N, the coefficients are calculated as
[Figure: differential coder and decoder – at the coder, the prediction error e(n) = x(n) − x̃(n) is quantized by C; at the decoder, D delivers e'(n), which feeds the inverse filter built around the predictor P to produce x'(n).]
indicated in Section 13.57, from the elements r(k) (0 ≤ k ≤ N) of the autocorrelation function of
x(n). The following normalized values have been suggested for speech signals:
R(0) = 1; R(1) = 0.8644; R(2) = 0.5570; R(3) = 0.2274
They show a strong correlation between neighboring samples. The corresponding coefficients
have values of:
a1 = 1.936; a2 = −1.553; a3 = 0.4972
The eigenvalues of the autocorrelation matrix R3 are:
𝜆1 = 2.532; 𝜆2 = 0.443; 𝜆3 = 0.025
and we have:
A_{opt}^t R_3 A_{opt} = 0.947
Thus, the corresponding prediction gain is close to 13 dB.
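These eigenvalues are easy to verify numerically; a short check with numpy and scipy:

```python
# Check of the eigenvalues of the autocorrelation matrix R3 built from
# the normalized speech correlations quoted above.
import numpy as np
from scipy.linalg import toeplitz

R3 = toeplitz([1.0, 0.8644, 0.5570])
print(np.linalg.eigvalsh(R3))      # ~[0.025, 0.443, 2.532], as quoted
```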
Improvements can be made to the basic principle, in order to achieve a high level of performance:
(1) The prediction is carried out using the sequence e(n) transmitted after quantization, which
brings a reduction in quantization distortion power. Moreover, the transmitter and receiver
then operate on the same source of information and nothing else has to be transmitted if some
adaptive procedures are introduced.
(2) The quantizer is made adaptive by relating the quantization step to an evaluation of the signal
power – that is, to take advantage of the fact that speech signals are nonstationary but can be
considered almost stationary over short periods of time (of the order of 10 ms).
(3) The prediction is made adaptive to follow the short-term variations of the speech spectrum.
With an adaptive prediction filter of adequate order (for example, 10th-order FIR), the prediction
gain for speech can range from 6 dB for unvoiced sounds to 16 dB for voiced ones, with an overall
subjective value of about 13 dB.
These techniques are used in communication networks, and a system named adaptive differential pulse code modulation (ADPCM) has been standardized by the ITU (International Telecommunication Union) under Recommendation G.721 [4].
17.4 Coding of Sound

[Figure: subband audio coder – the 14-bit audio signal sampled at 16 kHz is split by QMF filters into two subbands, each coded by ADPCM; the 48 kbit/s and 16 kbit/s streams are multiplexed into a 64 kbit/s output.]
17.5 Echo Cancelation

In communication networks, echoes are created when delayed and attenuated replicas of the signal
transmitted by a local terminal toward a distant terminal reach the local receiver.
Specifically, electrical echoes are produced on the transmission lines in the form of reflected
signals due to impedance mismatching and imperfections in the hybrid transformers which per-
form two-wire to four-wire conversions. In the case of speech, these signals are reflected back to
the subscriber who is speaking, and they become problematic as the distance between the sub-
scribers increases. There are also echoes arising from acoustic coupling between the microphone
and loudspeaker in a telephone set, in which case an adaptive echo canceller can be used to provide
a hands-free set.
Echo canceling consists of modeling the echo path and subtracting the generated synthetic echo
from the real echo [7].
Two different cases can be distinguished, depending on the type of signals involved – namely
speech and data. To begin with the simpler case, data modems are considered.
17.5.1 Data Echo Canceller

[Figure: data echo canceller – terminal A transmits x_A(n) toward terminal B over the two-wire cable; the adaptive filter H(Z) synthesizes the echo, which is subtracted from the received signal.]
The selection of adaptive filter parameters is driven by the context. The number N of coeffi-
cients is derived from the echo impulse response, taking account of the sampling frequency. The
filter must be adaptive because the transmission line characteristics may evolve over time. Regard-
ing input signals, the context is favorable, because the filter input signal is the transmitted data
sequence xA (n), which is generally uncorrelated, unit power, and has the AC matrix RN = I N . Then,
the performance of the gradient algorithm, LMS, is equivalent to that of the recursive least squares
algorithm (RLS). The adaptation step 𝛿 is bounded by 2/N and the time constant is 𝜏 = 1/𝛿. In the
learning phase, the mean output error power is deduced from (14.28). The squared norm of the echo coefficient vector, \|H_{opt}\|_2^2, reflects the power of the echo signal.
During bidirectional transmission, the useful signal y_B(n) in the reference is smaller than the echo r_A(n), and the echo attenuation A_e must satisfy the inequality:

A_e \ge SNR + A_s \quad (in decibels)

where A_s is the echo-to-useful-signal ratio. For example, SNR = 40 dB and A_s = 20 dB leads to A_e = 60 dB, which implies that the residual error after convergence is very small.
In the adaptation process, the useful signal impacts the coefficients, and the consequence is an increase of the output residual error. The filter coefficient variance after convergence is \sigma_y^2 \delta/2, and the residual error is N times larger, N\sigma_y^2 \delta/2. The term \sigma_y^2 is the power of the useful signal – that is, the received data. The SNR objective can be reached if the following inequality is satisfied:

N\, \frac{\delta}{2} < \frac{1}{SNR}    (17.9)

In the above derivations, the output error power is assumed to be close to the useful signal power. As an illustration, take 1/SNR = 10^{-4} (40 dB) and N = 60; then \delta < 3.3 \times 10^{-6}, which is a very small value
and makes the learning phase very long. The impact on the coefficient accuracy is worth pointing
out. Referring to Section 14.5, a simplified expression for the number of bits of the coefficients is
obtained as:
b_c = \log_2\left( \frac{1}{\delta} \right) + \frac{1}{2} \log_2(A_e)    (17.10)
With the above figures, bc = 29. In practice, there is no need to perform multiplications with such
a high degree of accuracy; it is needed only in the coefficient updating operations.
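Relations (17.9) and (17.10) turn directly into a small sizing calculation; a sketch with the figures of the example:

```python
# Sizing the data echo canceller from (17.9) and (17.10):
# adaptation step bound and coefficient wordlength.
from math import ceil, log2

N, snr_db, ae_db = 60, 40.0, 60.0
delta_max = 2 / (N * 10 ** (snr_db / 10))                   # (17.9): N*delta/2 < 1/SNR
bc = log2(1 / delta_max) + 0.5 * log2(10 ** (ae_db / 10))   # (17.10)
print(delta_max, ceil(bc))                                  # ~3.3e-6 and 29 bits
```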
17.5.2 Acoustic Echo Canceler

[Figure: acoustic echo canceller – the adaptive filter AE, fed by x(n), produces the synthetic echo ỹ(n), subtracted from the received signal r(n) to give the error e(n); level detectors on both signals feed a double-speech decision controlling the adaptation.]
17.6 Television Image Processing

The transmission of image signals on the communication network requires very high bit rates.
Therefore, bit rate reduction techniques, based on digital processing, are crucial – particularly for
television signals.
Overall, a television image is a function of four variables, s(x, y, t, λ): two spatial variables, time, and wavelength. For transmission purposes, this signal is transformed into a one-dimensional signal.
The wavelength variable can be dropped by considering that the human visual system basically
consists of three types of receivers, which perform filtering and produce three signals associated
with the primary colors – namely red, green, and blue (R, G, B).
The television scanning process converts these three-dimensional signals into a one-dimensional
signal. The images are scanned 25 times per second, with 625 lines per image. In fact, the odd-
and even-numbered lines form two consecutive frames which are multiplexed in time. Hence the
recurrence of 50 interleaved frames per second.
For transmission, the primary components R, G, and B are replaced by linear combinations called
luminance Y and color differences, or chrominance, U and V.
Y = 0.30R + 0.59G + 0.11B
U = R − Y = 0.70R − 0.59G − 0.11B
V = B − Y = −0.30R − 0.59G + 0.89B
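These linear combinations can be written as a single 3 × 3 matrix applied to (R, G, B); a one-line check:

```python
# The luminance/chrominance conversion above as a matrix product.
import numpy as np

M = np.array([[0.30,  0.59,  0.11],    # Y
              [0.70, -0.59, -0.11],    # U = R - Y
              [-0.30, -0.59, 0.89]])   # V = B - Y
print(M @ np.array([1.0, 0.0, 0.0]))   # pure red: Y = 0.30, U = 0.70, V = -0.30
```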
Digitization is performed with a frequency of 13.5 MHz for the luminance and 6.75 MHz for
the chrominance signals. Since the analog-to-digital conversion is 8 bits, the corresponding data
rate rises to 216 Mbit/s. This format corresponds to Recommendation CCIR 601 of the ITU and is
described as Type 422. It leads to images presented in the form of tables of 8-bit numbers containing
720 points per line and 576 useful lines, in the case of 625-line scanning. Thus, one image corre-
sponds to 414 720 bytes for the luminance and 207 360 bytes for each chrominance component.
Bit rate reduction techniques rely on the fact that a good model of the image signal is provided by the output of a first-order IIR filter to which Gaussian white noise is applied. The corresponding two-dimensional autocorrelation function can be written as:

V(x, y) = r_0\, e^{-(\alpha x + \beta y)}
where 𝛼 and 𝛽 are positive constants. For the associated spectrum, this gives:
S(\omega_1, \omega_2) = r_0\, \frac{4\alpha\beta}{(\alpha^2 + \omega_1^2)(\beta^2 + \omega_2^2)}    (17.11)
The greatest compression in the representation of a signal is obtained with a transformation based
on the eigenvectors of the autocorrelation matrix. In the case of first-order signals, this transforma-
tion is well approximated by the discrete cosine or sine transformation, presented in Sections 3.3.3
and 3.3.4. In the image compression standards, it is the DCT applied to blocks of 8 × 8 picture ele-
ment points, or pixels, which has been retained. The standards formulated for videophones, image
storage, and digital television use the following three techniques [8]:
(1) Motion estimation in order to be able to minimize the difference between the current image
and the preceding one.
(2) The discrete cosine transform to minimize spatial redundancy (see the sketch after this list).
(3) Variable-length statistical coding (VLC).
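A minimal illustration of point (2), using scipy's DCT on a toy 8 × 8 block whose values are arbitrary:

```python
# Sketch: 8x8 block DCT; the energy of a smooth block gathers in the
# low-frequency corner, which is what makes coarse quantization possible.
import numpy as np
from scipy.fft import dctn

block = np.add.outer(np.arange(8.0), np.arange(8.0))   # smooth toy luminance block
coeffs = dctn(block, norm="ortho")
print(abs(coeffs[0, 0]), np.abs(coeffs[4:, 4:]).max()) # large DC, tiny high frequencies
```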
The general structure of an image encoder is shown in Figure 17.10. The quantizer Q operates with thresholds which can be set by a control device, allowing a constant bit rate to be obtained
with the help of a buffer memory. For commercial-quality television, the bit rate can be reduced to
around 4 Mbit/s, which represents a compression factor of the order of 50 [9].
Digital filters are used for interpolation and subsampling operations during changes in image
format or movement estimation. These are separable filters. Digital compression of multimedia
signals, speech, image, and sound allows for considerable reduction of the bit rates required for
broadcasting programs, and in combination with the techniques of digital transmission, it offers
the possibility of high spectral efficiency by transmitting several programs in the channels formerly
used for a single analog program. The resulting saving may be quite significant, as is the case in
satellite television broadcasting.
[Figure 17.10: image encoder – source images pass through motion estimation, DCT, quantizer Q, and variable-length coder VLC; a multiplexer and buffer produce the bitstream, with a regulation loop setting Q and a local decoding path through the inverse quantizer QI and inverse DCT.]
17.7 Multicarrier Transmission – OFDM

Techniques with high spectral efficiency make intensive use of digital processing and they make
the best use of the characteristics of the channels. In this way, multicarrier techniques can lead
to capacities of several bit/s per hertz on channels which are of limited quality or susceptible to
interference.
Referring to Section 2.4, and in particular to Figure 2.8, notice that orthogonality of the signals is
valid only for frequencies which are at the center of the interval of length f_s/2N allocated to each
subchannel; and notice how the subchannels have an area of overlap and that the amplitude of
overlap reduces with increasing frequency difference. On the edges of the transmission channel,
the frequency responses of the subchannels are not symmetric, which can lead to interference. It
is therefore necessary to avoid using extreme subchannels and to provide a margin of at least a few
subchannels on each side of the chosen frequency band.
In the time domain, a practical transmission channel has an impulse response of duration 𝜏. To
avoid superposition of two consecutive OFDM symbols on reception, the symbols must be sepa-
rated by sufficient time; that means it is necessary to introduce a guard interval T g > 𝜏. During this
guard interval, it is necessary to prolong the OFDM symbol, to introduce the circular convolution
mentioned in Section 2.1, and hence to avoid interference between the subchannels. In practice,
the receiver operation is facilitated by the end of the symbol being reproduced at the start after a
time T g (Figure 17.12).
With this device, the received signals are simply multiplied by the DFT of the channel – an effect
which can be compensated for by an equalization in amplitude and phase in each subchannel. To
show this, the Z-transfer function of the channel which contains P ≤ Ng coefficients is defined as
C(Z):
C(Z) = \sum_{p=0}^{P} C_p Z^{-p}    (17.12)
If x(n) is the transmitted signal, the received signal y(n) can be written:
y(n) = \sum_{p=0}^{P} C_p\, x(n - p)
With the transmitted OFDM symbol expressed in terms of the data d_k as:

x(n) = \sum_{k=0}^{N-1} d_k\, e^{j(2\pi/N)kn}    (17.13)
the received signal becomes:

y(n) = \sum_{p=0}^{P} \sum_{k=0}^{N-1} C_p\, d_k\, e^{j(2\pi/N)k(n-p)}    (17.14)
Setting:

H_k = \sum_{p=0}^{P} C_p\, e^{-j(2\pi/N)kp}
finally gives:

y(n) = \sum_{k=0}^{N-1} (d_k H_k)\, e^{j(2\pi/N)kn}    (17.15)
The circular convolution property of the DFT is again found, and the receiver provides the trans-
mitted data multiplied by the channel spectrum H k .
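This property is easily verified numerically. The sketch below builds one OFDM symbol, prepends the cyclic prefix, passes it through a toy channel, and checks relation (17.15); the channel taps and block sizes are illustrative:

```python
# Sketch: the cyclic prefix makes the channel act as a circular
# convolution, so the receiver FFT sees d_k multiplied by H_k.
import numpy as np

rng = np.random.default_rng(0)
N, Ng = 64, 8
c = np.array([1.0, 0.5, 0.2])                  # toy channel, P < Ng
d = rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)
x = N * np.fft.ifft(d)                         # OFDM symbol, relation (17.13)
tx = np.concatenate([x[-Ng:], x])              # prepend the cyclic prefix
rx = np.convolve(tx, c)[Ng:Ng + N]             # channel, then drop the prefix
Hk = np.fft.fft(c, N)
print(np.allclose(np.fft.fft(rx) / N, d * Hk)) # True: y(n) carries d_k * H_k
```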
The redundancy of the transmitted signals can be exploited by the receiver for synchronization.
In fact, by calculating the following correlation function:
r(n) = \sum_{i=n-N_g+1}^{n} y(i)\, y^*(i - N)    (17.16)
peaks appear as shown in Figure 17.12 which characterize the start of each symbol and allow the
temporal analysis window of the receiver to be adjusted and a contribution made to synchroniza-
tion of the clocks.
Synchronization in time and frequency is a major problem in systems with a large number of
carriers, and special reference symbols are introduced or some subchannels are reserved for fixed
signals called pilots.
Figure 17.13 shows the block diagram of a digital television receiver for terrestrial broadcasts
[11]. The analog interfaces carry the signal in the band 0.76–8.37 MHz and the analog-to-digital
conversion is performed at f s = 18.28 MHz. Then a real-to-complex conversion using a quadrature
filter is performed and the sampling frequency is reduced to 9.14 MHz. Before calculation of the
FFT at N = 8192 points, a complex multiplier performs frequency adjustment on the spectrum
of the signal. Temporal synchronization controls the positioning of the window of the FFT. The
transmitted signal contains 6817 active carriers, and 177 of them are dedicated to pilot signals which
allow for exact synchronization of the receiver, an estimate of the frequency response of the channel
for equalization, and a measure of the distortion in each subchannel. The guard interval can reach
20% of the symbol duration. This system must facilitate transmission rates up to 32 Mbit/s in a
channel with 8 MHz spacing, or 4 bit/s/Hz.
Alternative multicarrier transmission techniques require filter banks as described in Chapter 12.
At the cost of an increase in computational complexity, they allow the cyclic prefix to be avoided,
and deliver a high level of out-of-band signal rejection, which may be critical for coexistence in
networks [12].
[Figure 17.13: digital television receiver – frequency synchronization, time synchronization, and pilot extraction blocks around the FFT, followed by the decoder delivering the data d(n).]
17.8 Mobile Radiocommunications

The basic mobile radio channel, called Rayleigh, corresponds to the non-line-of-sight (NLOS) trans-
mission between transmitter and receiver. It is represented by the sum of a large set of independent
paths of equivalent propagation delays, with the addition of Gaussian white noise. As a result, use-
ful signals can be viewed as being multiplied by a complex coefficient whose real and imaginary
parts are independent, centered, Gaussian random variables.
In practice, transmission may happen with multiple reflections and diffractions, with differing
propagation delays. Then, the channel is characterized by multipaths with fading, and it is said
to be doubly dispersive, because it exhibits dispersions in both the time and frequency domains.
These obstacles can be overcome by the multicarrier technique OFDM, which is able to exploit the
channel with high bit rates [13].
Processing design and performance evaluations rely on channel models which have been defined
for three multipath configurations:
Power (dB) 0 −1.5 −1.4 −3.6 −0.6 −9.1 −7.0 −12.0 −16.9
348 17 Applications
Power (dB) −1 −1 −1 0 0 0 −3 −5 −7
For each incident wave, and for each of the multipath channels, mobility entails a frequency shift
due to the Doppler effect given by:
f = \frac{v}{c}\, f_0 \cos\theta    (17.17)
where v is the speed of the mobile, c is the speed of light, f_0 is the carrier frequency, and θ is the
angle of incidence. As a consequence, the signal associated with a given path undergoes a slow
evolution, whose Fourier transform, called the Doppler spectrum, is generally represented by the
following theoretical formula:
S(f) = \frac{1}{\left[ 1 - (f/f_D)^2 \right]^{0.5}}; \quad -f_D < f < f_D    (17.18)
where f_D is the maximum Doppler frequency. In simulations, the fading coefficient C(n) can be generated in two ways:
– Filtering a white noise by a second-order recursive filter whose resonance approximates the Doppler spectrum:

C(n) = b_1 C(n-1) - b_2 C(n-2) + u(n); \quad b_1 = 2 r_d \cos\left( 2\pi \frac{f_D T}{\sqrt{2}} \right); \quad b_2 = r_d^2    (17.19)

The input u(n) is zero-mean Gaussian white noise. The pole is very close to the unit circle – for example: r_d = 0.999 - 0.1 \times 2\pi f_D T.
– Sum of N cissoids uniformly spread on the unit circle, with random initial phases θ(k). The cissoid with index k undergoes the Doppler shift Δf(k) = f_D cos(2πk/N), and the coefficient of the path is obtained by summing the N shifted cissoids.
Next, the amplitudes defined in the tables of the multipath models are multiplied by C(n) to get
the coefficients of the radio channel.
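A hedged sketch of the sum-of-cissoids generator; the unit-power normalization and the number of cissoids are illustrative choices, not values prescribed by the text:

```python
# Sketch: fading coefficient C(n) as a sum of N cissoids with Doppler
# shifts fD*cos(2*pi*k/N) and random initial phases theta(k).
import numpy as np

def fading_coeff(n_samples, fD, T, N=32, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, N)            # random initial phases
    df = fD * np.cos(2 * np.pi * np.arange(N) / N)  # Doppler shift per cissoid
    t = np.arange(n_samples)[:, None] * T
    return np.exp(1j * (2 * np.pi * df * t + theta)).sum(axis=1) / np.sqrt(N)

C = fading_coeff(10000, fD=100.0, T=1e-4)
print(np.mean(np.abs(C) ** 2))                      # ~1: roughly unit-power coefficient
```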
The mobile radio channel so defined entails a set of constraints for OFDM. However, thanks to its
richness in multipaths, it offers noteworthy potential gains in capacity and flexibility. In particular,
the MIMO technique described in Section 13.9 allows for multiuser transmission. The principle is
shown in Figure 17.14 for four antennas at the transmitter, two users and two antennas per user at
the receiver.
The principle consists of decomposing the radio channel into signal and null spaces, so that
each user transmits their data in the other user’s null space. Accordingly, the 2 × 4 channel matrix
between the transmitter and a user, H 1 , is decomposed into singular values to yield the matrices U,
S, and V, as in Figure 17.14. V is the modulation matrix in the transmitter, S is the singular value
matrix, and U t is the demodulation matrix.
The channel matrix decomposition is written as: H 1 = U S V t . Vectors V 1 and V 2 define the
signal space of user 1, while V 3 and V 4 define the null space. If user 1 is alone, the data couples
to be transmitted are applied to the 4 × 2 matrix [V 1 V 2 ], whose outputs are connected to four
antennas through four OFDM interfaces. Now, if user 2 is active, it must use the null space and its
transmission matrix becomes H 2 [V 3 V 4 ]. This 2 × 2 matrix must be decomposed to get the modu-
lation matrix applied to the data couples of user 2. Then, the same procedure is applied to user 1, starting from the decomposition of H_2, with the transmission performed in the corresponding null space. Finally, with the same set of antennas, the radio base station is able to target a set of different users [15].
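A compact numerical sketch of this null-space principle, using a random channel and numpy's SVD (all values illustrative):

```python
# Sketch: SVD of a 2x4 channel H1; the last two right singular vectors
# span the null space, so data precoded on them do not reach user 1.
import numpy as np

rng = np.random.default_rng(3)
H1 = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))
U, S, Vh = np.linalg.svd(H1)               # H1 = U diag(S) Vh
V = Vh.conj().T                            # columns V1..V4
null_space = V[:, 2:]                      # V3, V4
d2 = np.array([1.0, -1.0])                 # a data couple for user 2
print(np.linalg.norm(H1 @ (null_space @ d2)))   # ~0: invisible to user 1
```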
Multiantenna systems, with four and eight antennas, are included in the basic mobile radio
standards.
The constraints imposed by the channel on OFDM are as follows, in particular:
– The subcarrier spacing must account for the Doppler frequency of the fastest mobile.
– Transmission by blocks of a limited number of multicarrier symbols.
– Estimation of the channel through preamble and/or scattered pilot data.
– Some measures to counter fading and noise.
As an illustration, the following parameter values are typical of current mobile radio systems [13]:
– Bandwidth: 10 MHz.
– Sampling frequency: 15.36 MHz.
– FFT size: 1024; cyclic prefix: 72.
– Subcarrier spacing: 15 kHz.
– Number of symbols per block: 14.
Assuming the carrier frequency is 4 GHz and mobile speed is 100 km/h, the maximum Doppler
frequency is 370 Hz, and it has to be compensated for with every transmitted symbol.
Leveraging signal processing techniques and making use of current computation capabilities,
the new generations of wireless and mobile radio systems are able to reach communication quality
levels and bit rates that come close to those of hardwired systems and they contribute significantly
to developing the flexibility of digital networks.
References
1 U. Mengali and A.N. d’Andrea, Synchronization Techniques for Digital Receivers, Plenum Press,
1997.
2 H. Meyr, M. Moeneclay and S.A. Fechtel, Digital Communication Receivers: Synchronization, Channel Estimation and Signal Processing, Wiley, Chichester, 1998.
3 J.D. Gibson, Adaptive Prediction in Speech Differential Encoding Systems. Proceedings of the
IEEE, 68, 1980.
4 CCITT, Digital Networks, Transmission Systems and Multiplexing Equipment Book III.3,
Geneva, 1985.
5 ISO-IEC 13818, Information Technology, Coding of Moving Pictures and Audio, Geneva, 1996.
6 ISO-IEC 14496, Information Technology, Generic Coding of Audio-Visual Objects, Geneva, 1998.
7 C. Breining, et al., Acoustic echo control: application of very-high-order adaptive filters, IEEE
Signal Processing Magazine, 16(4), 42-69, 1999.
8 J. Chen, U. Koc and K.J. Ray Liu, Design of Digital Video Coding Systems, Marcel Dekker Inc.,
New York, 2002.
9 G. Sullivan and T. Wiegand, Video Compression—from concepts to the H264/AVC standard.
Proceedings of the IEEE, 93(1), 18-31, 2005.
10 T. de Couasnon, R. Monnier and J. B. Rault, OFDM for Digital TV Broadcasting, Signal Process-
ing, Vol. 39, 1994, 1-32.
11 U. Ladebusch and C.A. Liss, Terrestrial DVB—a broadcast technology for stationary, portable
and mobile use, Proceedings of the IEEE, 94(1), 183-193, 2006.
12 M. Renfors, X. Mestre, E. Kofidis and F. Bader, Orthogonal Waveforms and Filter Banks for
Future Communication Systems, Academic Press, Cambridge, MA, USA, 2017.
13 Document: 3rd Generation Partnership Project; Technical Specification Group Radio Access
Network; Evolved Universal Terrestrial Radio Access (E-UTRA); Base Station (BS) Radio
Transmission and Reception (Release 15), 3GPP TS 36.104 v15.4.0, 2018.
14 F. Pérez Fontan and P. Marino Espineira, Modeling the Wireless Propagation Channel, Wiley,
Chichester, 2008.
15 E. Dahlman, S. Parkvall and J. Skold, 5G-NR: the Next Generation Wireless Access Technology,
Academic Press, Cambridge, MA, USA, 2018.
Exercises: Solutions and Hints

Chapter 1
1.1 I_L(t) = \frac{1}{2} + \frac{2}{\pi} \sum_{p=1}^{4} \frac{(-1)^{p+1}}{2p-1} \cos\left( 2\pi(2p-1)\frac{t}{T} \right)
1.3 H(f_s/2) = \frac{2\sqrt{2}}{\pi} (0.92 dB).
1.5 s(nT) = s_r(nT) + j s_i(nT) = e^{j(\pi/2)n}\, \frac{\sin(3\pi n/8)}{\sin(\pi n/8)}.
1.9 Periodic part; Fourier coefficient: C_n = p\, \frac{\sin(\pi n/2)}{\pi n}.
Non-periodic part; spectrum: S_2(f) = p(1-p)T\, \frac{1 - \cos(\pi f T)}{\pi^2 f^2 T^2}.
1.14 Linear coding (S/N)max = 50 dB; with nonlinear coding, it varies from 35 to 38 dB when the
signal varies from −36 dB to 0 dB.
Chapter 2
2.1 The DFT of the second set is related to the DFT of the first by X ′ (k) = e−jk(𝜋/4) X(k)
2.3 The small differences come from aliasing and decrease when N increases.
2.5 Maximum noise power at any output: 28 q²/12. With quantization at eight bits of the coefficients: |ε(i, k)| ≤ 0.003.
2.7 Total roundoff noise: N q²/12 + N q²; signal-to-noise ratio degradation: ΔSNR = 11.5 dB (input noise q²/12).
2.8 Recording: 20 000 samples; memory 160 kbits; cycle time per multiplication: 1 μs.
2.9 Cosine, Hamming, and Blackman windows attenuate the secondary lobes but do not allow
for the detection of weak components.
Chapter 3
3.2 It is sufficient to verify the relations 3.3, 3.4, 3.5, and 3.6.
3.3 Number of real multiplications in radices 2, 4, and 8: 384, 284, and 246.
3.5 Use relations (3.18) and (3.21) to get the two factorizations.
3.6 The complex DFT of order eight leads to 24 real multiplications. The odd transform
leads to 26.
3.7 With that approach, the operations in Δ12 vanish, which reduces the number of complex
multiplications to 16.
Chapter 4
4.1 Response to the sequence a^n: y(n) = a^{n-3}\, \frac{1 - a^{9-n}}{1 - a} for 5 ≤ n ≤ 8.
4.3 H(Z) = \frac{1}{(1 - r e^{j\theta} Z^{-1})(1 - r e^{-j\theta} Z^{-1})}.
4.4 Output power: 21; H(𝜔) = 4.41 − 1.536 cos 𝜔 + 0.46 cos 2𝜔.
Chapter 5
5.1 The response is zero for f = 0.288; 0.347; 0.408; 0.469; maximal ripple: 0.08.
Zeros of H(Z): 0.606; 1.651; 0.4292 ± j0.464; 1.073 ± j1.161.
5.2 Coefficients: −0.012; 0; 0.042; 0; −0.093; 0; 0.314; 0.5. Zeros of H(Z): 0.4816; 2.076;
0.3764 ± j0.368; 1.3583 ± j1.328; maximal ripple: 0.03.
5.4 With the sampling frequency f s /2 at the output, the numbers of memories and multiplications
are divided by 2, through interleaving (see Section 10.5).
5.5 In the complex plane, H(Z) is rotated by 𝜋 and ±𝜋/2, which yields a high-pass filter and a
low-pass filter.
Chapter 6
6.1 Follow the derivation in Section 6.1. The difference between filter delay and group delay illus-
trates the nonlinearity of the phase response.
6.2 Unit step response: y(n) = \frac{1}{1.8}\left(1 - (-0.8)^{n+1}\right) + (-0.8)^{n+1} y(-1).
6.3 Poles: P = 0.78 ± j0.438. The zeros do not add multiplications to the circuit.
6.6 H(Z) = \frac{0.796 - 1.42Z^{-1} + Z^{-2}}{1 - 1.42Z^{-1} + 0.796Z^{-2}}; \quad \tau_g(\omega) calculated by (6.45).
Realization possible with three multiplications.
Chapter 7
7.1 First-order section: |H(\omega)|^2 = \frac{1.49 + 1.4\cos\omega}{1.81 - 1.8\cos\omega}
\varphi(\omega) = \tan^{-1}\left(\frac{1.6\sin\omega}{1.63 - 0.2\cos\omega}\right); \quad \tau(\omega) = \frac{1.6(0.37\cos\omega - 0.2)}{(0.37\cos\omega - 0.2)^2 + 2.66\sin^2\omega}
7.6 Coefficient wordlength: bc ≃ 12 bits. The optimum is obtained through systematic search
about rounding. The critical pole 0.9235 ± j0.189 means it is not possible to reduce bc to
11 bits.
7.7 The filter in Section 7.2.3 can have limit cycles of amplitude less than 3q and frequency near
f s /5.
7.8 The IIR filter requires 7 multiplications and 4 memories while the FIR counterpart requires
8 multiplications and 16 memories.
7.10 Theoretic order: N = 5.19; for N = 6, 𝛿 1 becomes very small. Coefficient wordlength: bc ≃ 11
bits. Difference between input and internal data: 7 bits.
Chapter 8
8.1 S = \frac{1}{z+2}\begin{bmatrix} z & 2 \\ 2 & z \end{bmatrix}; \quad t = \begin{bmatrix} 1 - z/2 & z/2 \\ -z/2 & 1 + z/2 \end{bmatrix}
For LC circuits, take z = Lp + \frac{1}{Cp} or z = \frac{Lp}{1 + LCp^2}.
8.2 The diagram is that shown in Figure 8.3, with N = 6 and Y 6 = 0. For
fs = 40 kHz ∶ a1 = a4 = 0.205; a2 = a3 = 0.085;
The coefficients are multiplied by four for f s = 10 kHz.
8.3 Follow the procedure given at the end of Section 8.2 and show the impact of the sampling
frequency. Verify the five-bit curve given in Figure 8.9.
8.4 Lattice filter zeros: 0.6605; 0.6647 ± j0.5020; after rounding the ki to 5 bits: 0.6661;
0.6377 ± j0.5002.
Chapter 9
9.1 X(f) = \frac{1 - a\cos 2\pi f}{1 + a^2 - 2a\cos 2\pi f} + j\, \frac{-a\sin 2\pi f}{1 + a^2 - 2a\cos 2\pi f}.
9.3 The nonzero terms in the sets xR (n) and xI (n) are interleaved. The operation performed is
analytic filtering: y(n) = \frac{1}{2}\, e^{-jn\pi/5}.
9.4 Coefficient wordlength: b_c \simeq 2 + \frac{1}{2}\log_2\left(\frac{f_e}{2\Delta f}\right) + \log_2\left(\frac{1}{\delta}\right).
9.5 Phase shifter order: N \simeq \log\left(\frac{\pi}{\varepsilon}\right)\log\left(\frac{f_s}{f_1}\cdot\frac{f_s}{f_2}\right);
coefficients: b_c \approx \log_2\left(\frac{\pi}{\varepsilon}\right) + \log_2\left(\frac{f_s}{f_1}\right) + \log_2\left(\frac{f_s}{f_2}\right).
For the example in Section 9.4: N = 4.97; bc ≃ 14 bits.
With respect to (9.24), the DFT introduces double periodicity in time and frequency.
The values of xc (n) in the vicinity of n = 8 are approximately restored from the first 3 or 7
values of h(n). Clearly, the DFT is efficient but the accuracy is limited.
9.10 \begin{bmatrix} x(5) \\ x(6) \end{bmatrix} = \begin{bmatrix} -1 - 0.866j \\ 0.612 - 1.319j \end{bmatrix} = -\begin{bmatrix} 1 & j \\ -1 & e^{-j3\pi/4} \end{bmatrix}^{-1} T_{21} X_1
Chapter 10
10.1 Coefficient wordlengths: bc = 1, 2, 5, 6, 9, 10, 10, 11, 14. For the half-band filter:
bc ≈ 2 log2 (1∕𝛿m − 𝛿0 ).
10.2 Filters in the cascade of 3 filters: Δf = 0.4 with M = 2; Δf = 0.15 with M = 3; Δf = 0.025
with M = 8. Roundoff noise at the output of a half-band filter: 2M\, q^2/12.
After three filters: P_N = 20\, q^2/12.
10.3 The function can be carried out with a half-band filter (M = 3) and a low-pass filter with
54 coefficients – hence the computation rate of 264 kmult/s. A direct realization with 100
coefficients leads to 400 kmult/s.
Chapter 11
11.3 Use the procedure described in Section 11.3, without double zero at −1 for H 1 (−Z).
11.5 The reconstruction error is bounded by the quantization step multiplied by twice the sum
of the absolute values of the coefficients.
Chapter 12
12.1 The transfer function of the first branch of the polyphase network is:
B_1(Z) = Z^{-1/2}\,[-0.0218 + 0.0621 Z^{-4} + 0.1996 Z^{-8} + 0.0160 Z^{-12}]
To simplify, let
H_1(Z) = [-0.0218 + 0.0621 Z^{-1} + 0.1996 Z^{-2} + 0.0160 Z^{-3}]
The frequency response is analyzed, taking Z = e^{j2\pi f}.
A delay 𝜏 corresponds to the factor e−j2𝜋f𝜏 ; the delays in the branches are as follows:
𝜏 = [1.8875 1.5863 1.3198 1.0188].
These delays compensate the factors due to interleaving: (1/2, 3/2, 5/2, 7/2).
12.3 Assuming the length of the prototype filter impulse response is unity, with a continuous-time
half-sinusoid, we get:
H(f) = \frac{2}{\pi}\, \frac{\cos(\pi f)}{1 - 4f^2}
To be compared with the FFT response in similar conditions (duration: 0.5):
H_{FFT}(f) = \frac{\sin(\pi f/2)}{\pi f/2}
The frequency response H(f ) decreases with the square of the frequency.
In each branch of the polyphase network, the coefficient values are h(n) and h(n + N).
Realization by 2N-FFT – it suffices to add two adjacent outputs. More multiplications are
needed.
Chapter 13
13.1 AC function:
r_1(p) = \frac{N-p}{N}\left[\cos(2\pi p f) + \cos(2\pi(N-1)f)\, \frac{\sin(2\pi(N-p)f)}{(N-p)\sin(2\pi f)}\right]
r_1(1) = \frac{N-1}{N}\left[\cos(2\pi f) + \frac{\sin(4\pi(N-1)f)}{2(N-1)\sin(2\pi f)}\right]
For f = 1/8 and N = 16, we get r 1 = 0.618.
Cancelling y(n) for n > N−1 corresponds to applying a quadrature filter which, due to
the ripple, introduces a deviation for the frequency values which are non-multiples of
1/2N – hence the difference between the estimated and actual frequencies (1.4%). The
bound is CR = 3.7 × 10−6 .
13.2 Time constant 𝜏 = 1/𝛿. For y(n) to approach m within 1% on average, 23 samples are needed.
After the transition phase, the quadratic residual error is given by (14.39). Recursive and
non-recursive estimators are equivalent for n ≃ 2/𝛿.
13.5 AC function:
r(0) = 1; \quad r(1) = \frac{\cos(\pi/4) + \cos(\pi/3)}{2}; \quad r(2) = \frac{1}{2}\left[\cos\left(\frac{2\pi}{4}\right) + \cos\left(\frac{2\pi}{3}\right)\right]
We verify that the zeros of the predictor are situated in the Z-plane between ej𝜋/4 and ej𝜋/3 .
13.6 The roots of the polynomials sit on the unit circle, and they verify the alternation principle.
Chapter 14
14.4 Give the expression of the output error and search for the coefficients which minimize its
power.
To begin with, in the input–output relation of filter H(Z), substitute the reference y(n) for
the output ỹ (n) and compute the optimal coefficient values in that case.
Chapter 15
Iterative method: The adaptation step in (15.2) is set to 1/2 and inputs are used in the follow-
ing order: A1, B1, A2, B2, A3, B3. Normalizing h2 to unity, we get:
H_1 = [0, -2, 1] (line 0 - A1); H_2 = [0.2, -1.2, 1]; \ldots; H_6 = [-0.45, -1.55, 1]
15.3 Optimum coefficient values: h1opt = 1.4; h2opt = 1. Error power: E0 = 0.023. With respect to
the estimations given in Section 15.4, the degradation is due to the nonlinearity, which is
not taken into account.
The step 𝛿 = 0.1 entails an additional error power of 5%.
15.4 Figure 15.6 is supplemented with the set of coefficients hij2 . The coefficients of the last stage
become hj13 (j = 1, 2). As estimations, we get a = 1.19 and b = 2.19.
With coefficients h11j = [1.23 1.20 1.44], the two nonlinearities are accounted for to deter-
mine c, which yields c = 1.5. In the presence of noise, the drift of the coefficients is reduced.
15.5 The circuit leads to 1.535 for the slope of the curve at the origin instead of 𝜋/2. This result
is justified by the series expansion of the sine function.
15.7 The equations of the circuits are as set forth in Section 15.6. Parameter C determines the
coefficients of the recursive section.
Coefficients h11,1 and h11,3 must, on average, compensate for the attenuation brought by
f (a). Coefficients hij, 2 must compensate for the attenuation of the signals they multiply.
Approximate values can be obtained by simple calculations based on development (15.28).
Reducing the initial values entails a reduction of the errors involved in the initial coefficient
updating and an increase of the system time constant.
As for stability issues and time constant, see (14.16), (15.22), and (6.18).
Chapter 16
16.2 Splitting matrix T_8 into four appropriate blocks A, B, C, and D, we find: S_x = -C^{-1}D\,X.
Spurious impulse: amplitude 2 and index 3.
Between 2 index values, the impulse impacts the whole useful signal, and it is necessary to
subtract the signal provided by linear prediction.
16.3 For a single error, N1 = 7 and for a double error, N2 = 6. Coding gain: 3 (4.7 dB)
16.4 Error-free transmission of binary data requires at least SNR = 3, or 4.77 dB.
If the system signal-to-noise ratio is 8 dB, the code must bring a gain of 7 dB.
A code with length L = 9 provides the gain 7 dB for bit error rate 10−8 .