Digital Signal Processing

Theory and Practice

Tenth Edition

Maurice Bellanger
CNAM, Paris
France

Translated by
Benjamin A. Engel
This edition first published 2024
Copyright © 2024 by John Wiley & Sons Ltd. All rights reserved.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any
form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law.
Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Maurice Bellanger to be identified as the author of this work has been asserted in accordance with law.
Originally published in France as Traitement numérique du signal, 10th edition, by Maurice Bellanger.
© Dunod 2022, Malakoff

Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at
www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears
in standard print versions of this book may not be available in other formats.

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or
its affiliates in the United States and other countries and may not be used without written permission. All other
trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product
or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty


While the publisher and authors have used their best efforts in preparing this work, they make no representations or
warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all
warranties, including without limitation any implied warranties of merchantability or fitness for a particular
purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional
statements for this work. This work is sold with the understanding that the publisher is not engaged in rendering
professional services. The advice and strategies contained herein may not be suitable for your situation. You should
consult with a specialist where appropriate. The fact that an organization, website, or product is referred to in this
work as a citation and/or potential source of further information does not mean that the publisher and authors
endorse the information or services the organization, website, or product may provide or recommendations it may
make. Further, readers should be aware that websites listed in this work may have changed or disappeared between
when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of
profit or any other commercial damages, including but not limited to special, incidental, consequential, or
other damages.

Library of Congress Cataloging-in-Publication Data

Names: Bellanger, Maurice, author. | Engel, Benjamin A., translator.


Title: Digital signal processing : theory and practice / Maurice Bellanger,
CNAM; translated by Benjamin A. Engel.
Other titles: Traitement numérique du signal. English
Description: Tenth edition. | Hoboken, NJ, USA : Wiley, 2024. | Translation
of: Traitement numérique du signal.
Identifiers: LCCN 2023049303 (print) | LCCN 2023049304 (ebook) | ISBN
9781394182664 (cloth) | ISBN 9781394182671 (adobe pdf) | ISBN
9781394182688 (epub)
Subjects: LCSH: Signal processing–Digital techniques.
Classification: LCC TK5102.9 .B4513 2024 (print) | LCC TK5102.9 (ebook) |
DDC 621.382/2–dc23/eng/20240110
LC record available at https://lccn.loc.gov/2023049303
LC ebook record available at https://lccn.loc.gov/2023049304

Cover Design: Wiley


Cover Image: © MR.Cole_Photographer/Getty Images

Set in 9.5/12.5pt STIXTwoText by Straive, Chennai, India



Contents

Foreword (Historical Perspective) xi


Preface xiii
Introduction xv

1 Signal Digitizing – Sampling and Coding 1


1.1 Fourier Analysis 1
1.1.1 Fourier Series Expansion of a Periodic Function 1
1.1.2 Fourier Transform of a Function 2
1.2 Distributions 4
1.2.1 Definition 4
1.2.2 Differentiation of Distributions 5
1.2.2.1 The Fourier Transform of a Distribution 6
1.3 Some Commonly Studied Signals 6
1.3.1 Deterministic Signals 7
1.3.2 Random Signals 8
1.3.3 Gaussian Signals 9
1.3.3.1 Peak Factor of a Random Signal 11
1.4 The Norms of a Function 12
1.5 Sampling 13
1.6 Frequency Sampling 14
1.7 The Sampling Theorem 15
1.8 Sampling of Sinusoidal and Random Signals 16
1.8.1 Sinusoidal Signals 17
1.8.2 Discrete Random Signals 18
1.8.3 Discrete Noise Generation 19
1.9 Quantization 20
1.10 The Coding Dynamic Range 22
1.11 Nonlinear Coding with the 13-segment A-law 24
1.12 Optimal Coding 26
1.13 Quantity of Information and Channel Capacity 28
1.14 Binary Representations 29
1.A Appendix 1: The Function I(x) 31
1.B Appendix 2: The Reduced Normal Distribution 31
Exercises 32
References 34

2 The Discrete Fourier Transform 35


2.1 Definition and Properties of the Discrete Fourier Transform 36
2.2 Fast Fourier Transform (FFT) 38
2.2.1 Decimation-in-time Fast Fourier Transform 39
2.2.2 Decimation-in-frequency Fast Fourier Transform 41
2.2.3 Radix-4 FFT Algorithm 43
2.2.4 Split-radix FFT Algorithm 44
2.3 Degradation Arising from Wordlength Limitation Effects 45
2.4 Calculation of a Spectrum Using the DFT 46
2.4.1 The Filtering Function of the DFT 46
2.4.2 Spectral Resolution 48
2.5 Fast Convolution 50
2.6 Calculations of a DFT Using Convolution 51
2.7 Implementation 52
Exercises 52
References 54

3 Other Fast Algorithms for the FFT 55


3.1 Kronecker Product of Matrices 55
3.2 Factorizing the Matrix of a Decimation-in-Frequency Algorithm 56
3.3 Partial Transforms 58
3.3.1 Transform of Real Data and Odd DFT 59
3.3.2 The Odd-time Odd-frequency DFT 61
3.3.3 Sine and Cosine Transforms 63
3.3.4 The Two-dimensional DCT 66
3.4 Lapped Transform 66
3.5 Other Fast Algorithms 67
3.6 Binary Fourier Transform – Hadamard 71
3.7 Number-Theoretic Transforms 71
Exercises 73
References 74

4 Time-Invariant Discrete Linear Systems 77


4.1 Definition and Properties 77
4.2 The Z-Transform 78
4.3 Energy and Power of Discrete Signals 80
4.4 Filtering of Random Signals 82
4.5 Systems Defined by Difference Equations 83
4.6 State Variable Analysis 85
Exercises 86
References 87

5 Finite Impulse Response (FIR) Filters 89


5.1 FIR Filters 89
5.2 Practical Transfer Functions and Linear Phase Filters 91
5.3 Calculation of Coefficients by Fourier Series Expansion for Frequency
Specifications 94

5.4 Calculation of Coefficients by the Least-Squares Method 97


5.5 Calculation of Coefficients by Discrete Fourier Transform 99
5.6 Calculation of Coefficients by Chebyshev Approximation 100
5.7 Relationships Between the Number of Coefficients and the Filter Characteristic 102
5.8 Raised-Cosine Transition Filter 104
5.9 Structures for Implementing FIR Filters 106
5.10 Limitation of the Number of Bits for Coefficients 107
5.11 Z–Transfer Function of an FIR Filter 109
5.12 Minimum-Phase Filters 111
5.13 Design of Filters with a Large Number of Coefficients 113
5.14 Two-Dimensional FIR Filters 114
5.15 Coefficients of Two-Dimensional FIR Filters by the Least-Squares Method 118
Exercises 121
References 122

6 Infinite Impulse Response (IIR) Filter Sections 123


6.1 First-Order Section 123
6.2 Purely Recursive Second-Order Section 127
6.3 General Second-Order Section 134
6.4 Structures for Implementation 138
6.5 Coefficient Wordlength Limitation 140
6.6 Internal Data Wordlength Limitation 141
6.7 Stability and Limit Cycles 142
Exercises 144
References 145

7 Infinite Impulse Response Filters 147


7.1 General Expressions for the Properties of IIR Filters 147
7.2 Direct Calculations of the Coefficients Using Model Functions 148
7.2.1 Impulse Invariance 149
7.2.2 Bilinear Transform 150
7.2.2.1 Butterworth Filters 151
7.2.2.2 Elliptic Filters 153
7.2.2.3 Calculating any Filter by Transformation of a Low-pass Filter 157
7.2.3 Iterative Techniques for Calculating IIR Filters with Frequency Specifications 158
7.2.3.1 Minimizing the Mean Square Error 158
7.2.3.2 Chebyshev Approximation 159
7.2.4 Filters Based on Spheroidal Sequences 160
7.2.5 Structures Representing the Transfer Function 162
7.2.6 Limiting the Coefficient Wordlength 164
7.2.7 Round-Off Noise 167
7.2.8 Comparison of IIR and FIR Filters 169
Exercises 170
References 171

8 Digital Ladder Filters 173


8.1 Properties of Two-Port Circuits 173

8.2 Simulated Ladder Filters 176


8.3 Switched-Capacitor Filters 180
8.4 Lattice Filters 183
8.5 Comparison Elements 187
Exercises 188
References 188

9 Complex Signals – Quadrature Filters – Interpolators 189


9.1 The Fourier Transform of a Real and Causal Set 189
9.2 Analytic Signals 192
9.3 Calculating the Coefficients of an FIR Quadrature Filter 195
9.4 Recursive 90∘ Phase Shifters 197
9.5 Single Side-Band Modulation 199
9.6 Minimum-Phase Filters 200
9.7 Differentiator 201
9.8 Interpolation Using FIR Filters 202
9.9 Lagrange Interpolation 203
9.10 Interpolation by Blocks – Splines 204
9.11 Interpolations and Signal Restoration 206
9.12 Conclusion 208
Exercises 209
References 211

10 Multirate Filtering 213


10.1 Decimation and Z-Transform 213
10.2 Decomposition of a Low-Pass FIR Filter 217
10.3 Half-Band FIR Filters 220
10.4 Decomposition with Half-Band Filters 222
10.5 Digital Filtering by Polyphase Network 224
10.6 Multirate Filtering with IIR Elements 227
10.7 Filter Banks Using Polyphase Networks and DFT 227
10.8 Conclusion 229
Exercises 229
References 230

11 QMF Filters and Wavelets 233


11.1 Decomposition into Two Sub-Bands and Reconstruction 233
11.2 QMF Filters 233
11.3 Perfect Decomposition and Reconstruction 236
11.4 Wavelets 238
11.5 Lattice Structures 242
Exercises 243
References 243

12 Filter Banks 245


12.1 Decomposition and Reconstruction 245
12.2 Analyzing the Elements of the Polyphase Network 247

12.3 Determining the Inverse Functions 248


12.4 Banks of Pseudo-QMF Filters 249
12.5 Determining the Coefficients of the Prototype Filter 253
12.6 Realizing a Bank of Real Filters 254
Exercises 257
References 257

13 Signal Analysis and Modeling 259


13.1 Autocorrelation and Intercorrelation 259
13.2 Correlogram Spectral Analysis 261
13.3 Single-Frequency Estimation 262
13.4 Correlation Matrix 264
13.5 Modeling 266
13.6 Linear Prediction 268
13.7 Predictor Structures 270
13.7.1 Sensor Networks – Antenna Processing 272
13.8 Multiple Sources – MIMO 273
13.9 Conclusion 275
Appendix: Estimation Bounds 275
Exercises 276
References 277

14 Adaptive Filtering 279


14.1 Principle of Adaptive Filtering 279
14.2 Convergence Conditions 282
14.3 Time Constant 284
14.4 Residual Error 285
14.5 Complexity Parameters 286
14.6 Normalized Algorithms and Sign Algorithms 288
14.7 Adaptive FIR Filtering in Cascade Form 289
14.8 Adaptive IIR Filtering 291
14.9 Conclusion 293
Exercises 294
References 295

15 Neural Networks 297


15.1 Classification 297
15.2 Multilayer Perceptron 299
15.3 The Backpropagation Algorithm 300
15.4 Examples of Application 303
15.5 Convolution Neural Networks 306
15.6 Recurrent/Recursive Neural Networks 307
15.7 Neural Network and Signal Processing 308
15.8 On Activation Functions 309
15.9 Conclusion 310
Exercises 310
References 312

16 Error-Correcting Codes 313


16.1 Reed–Solomon Codes 313
16.1.1 Predictable Signals 313
16.1.2 Reed–Solomon Codes in the Frequency Domain 315
16.1.3 Reed–Solomon Codes in the Time Domain 316
16.1.4 Computing in a Finite Field 317
16.1.5 Performance of Reed–Solomon Codes 318
16.2 Convolutional Codes 319
16.2.1 Channel Capacity 320
16.2.2 Approaching the Capacity Limit 321
16.2.3 A Simple Convolutional Code 323
16.2.4 Coding Gain and Error Probability 326
16.2.5 Decoding and Output Signals 327
16.2.6 Recursive Systematic Coding (RSC) 328
16.2.7 Principle of Turbo Codes 329
16.2.8 Trellis-Coded Modulations 330
16.3 Conclusion 331
Exercises 332
References 332

17 Applications 335
17.1 Frequency Detection 335
17.2 Phase-locked Loop 337
17.3 Differential Coding of Speech 338
17.4 Coding of Sound 339
17.5 Echo Cancelation 340
17.5.1 Data Echo Canceller 340
17.5.1.1 Two-wire Line 340
17.5.2 Acoustic Echo Canceler 342
17.6 Television Image Processing 342
17.7 Multicarrier Transmission – OFDM 344
17.8 Mobile Radiocommunications 347
References 349

Exercises: Solutions and Hints 351

Index 363

Foreword (Historical Perspective)

The most important and most impactful technical revolutions are not always those that are most
evident to a product’s end user. Modern digital signal processing methods fall into the category of
impactful technical revolutions whose consequences are not immediately perceptible, and which
do not make the front page.
It is interesting to reflect, for a moment, on the way in which such techniques emerge. Digital
computation, applied to a signal in the broadest sense, is certainly not a new idea in itself. When
Kepler derived the laws of motion of the planets from the series of observations made by his
father-in-law Tycho Brahe, his was a truly numerical computation of the signal – in this case, the
signal being Brahe’s observations of the planets’ positions over time. In recent decades, though,
digital signal processing has become a discipline in its own right. What has changed is the way it
can now process electrical signals in real time, using digital technologies.
This leap forward is the cumulative result of technical progress in numerous fields – starting,
of course, with the capability of recording the data we wish to process in the form of an electrical
signal. This has been contingent on the gradual development of what are known as information
sensors, which can range in complexity from a simple stress gage (which, in itself, took a great deal
of research in solid mechanics to make possible) to a radar system.
In addition, with the marvelous progress in micro-electronics came the necessary technological
tools, capable, at the extremely fast rates required for real-time processing, of performing the arith-
metical operations that the earliest computers (the ENIAC was built in 1945, not long ago in the
grand scheme of things) took hours to do, often being interrupted by repeated breakdowns. Today,
these operations can be carried out by microprocessors weighing only a few grams and consuming
only a few milliwatts of power, capable of functioning for over a decade without breakdown.
Finally, we have had to wait for progress in programming techniques – i.e. the optimal use of these
technologies – because though the computational capacities of modern microprocessors are vast, it
is unwise to waste those capacities on performing unnecessary operations. The invention of the fast
Fourier transform algorithms is one of the most striking examples of the importance of program-
ming methods. This convergence of technical progress in fields ranging from physics to electronics
to mathematics has not been unintentional. To a certain extent, every step forward has created a
new problem, which was then solved by new progress in a different field. It would undoubtedly be
helpful, from the standpoint of the history and epistemology of science and technology, to have an
in-depth study of this lengthy and complicated process.
Indeed, the consequences are already considerable. Indisputably, analog processing of electrical
signals came before digital processing, and analog processing will surely continue to have an impor-
tant role to play in certain applications, but the benefits of digital processing can be expressed in two
words: accuracy and reliability. Certain applications have only been made possible by the accuracy
and reliability offered by digital technologies, which go far beyond the sectors of electronics and
telecommunications in which these techniques first emerged. As one example among many, in
X-ray tomodensitography, scanners are based on the application of a theorem developed by Johann
Radon in 1917. Only the developments mentioned above have enabled the practical implemen-
tation of this new medical diagnostic tool. It is a safe bet that, in tomorrow’s world, digital signal
processing techniques will be used in increasingly varied products, including consumer electron-
ics. However, it is an equally safe bet that the general public, while benefitting from the lower
prices and higher performance and reliability offered by these techniques, will remain blissfully
unaware of the phenomenal and complex combination of research, technology, and invention
represented by this progress. This shift has already begun in the case of television receivers.
However, when these technical revolutions take place, another problem almost inevitably arises.
We need to train users to get to grips not just with a new tool, but often, an entirely new way of
thinking. If we are not careful, such training can easily become a bottleneck, delaying the intro-
duction of new techniques. Therefore, this book is a particularly important addition to the field.
Its author, Maurice Bellanger, has been teaching for many years at the École Nationale Supérieure
des Télécommunications and the Institut Supérieur d’Électronique de Paris. It is a highly didactic
book, containing relevant exercises as well as in-depth explanations and multiple programs, which
certain people will often be able to make use of exactly as they are. Without a doubt, it will help
open the door to desirable and necessary evolution.

P. Aigrain, 1981

Preface

In signal processing, digital techniques offer a fantastic range of possibilities: rigorous system
design, flexibility, reproducibility of equipment, stability of operating features, and ease of super-
vision and monitoring. However, there is a certain amount of abstractness in these techniques,
and, in order to apply them to real-world cases, we need a set of theoretical knowledge, which
may represent an obstacle to their use. This book aims to break down the barriers and make
digital techniques accessible to readers by drawing the connection between theory and practice
and providing users with the most widely used results in the domain, at their fingertips.
The foundation upon which this book is built is the author’s teaching at engineering
schools – first the École nationale supérieure des télécommunications and the Institut supérieur
d’électronique de Paris, and later, Supélec and CNAM. The book offers a clear and concise
presentation of the main techniques used in digital processing, comparing them on their merits
and giving the most useful results in a form that is directly usable, both for the design and for
the concrete implementation of systems. Theoretical explanations have been condensed to what
is absolutely necessary for a thorough understanding and a correct application of the results.
Bibliographic references are provided, where interested readers will find further information
about the topics discussed herein. At the end of each chapter are a few exercises, often drawn from
real-world examples, to allow readers to test their absorption of the material in the chapter and
familiarize themselves with its application. Answers to these exercises and guidelines are given at
the end of the book.
With respect to previous editions, this new edition offers additional information, simplifi-
cations, and also a new chapter about one of the most important tools in the field of artificial
intelligence – neural networks, as they relate to adaptive systems.
As with the previous editions, this one owes a great deal to the author’s students and colleagues.
Thanks to them all for their contributions and assistance.

Introduction

A signal is the medium carrying information, transmitted by a source to a receiver. In other


words, a signal is the vehicle of intelligence in systems. It transports commands in control and
remote-control equipment; it carries data such as information, spoken words, or images across
networks. It is particularly fragile and needs to be handled with a great deal of care. Signal
processing is applied in order to extract information, alter the message being carried, or adapt
the signal to the transmission techniques being used. It is here that digital techniques come into
play. Indeed, if we imagine substituting the signal with a set of numbers, representing its value or
amplitude at carefully chosen times, then its processing, even in the most elaborate of forms, boils
down to a sequence of logical and arithmetical operations on that set of numbers, committing the
results to memory.
A continuous analog signal is converted into a digital signal by sensors which act on readings,
or directly in the devices producing or receiving the signal. The operations taking place in the
wake of that conversion are carried out by digital computers, tasked or programmed to perform
the sequence of operations by which the desired processing is defined.
Before introducing the content of each chapter of this book, it is wise to precisely define the
processing of which we speak here.
Digital signal processing refers to the set of operations, arithmetic calculations, and number
manipulations, which are applied to a signal to be processed, represented by a series or a set of
numbers, to produce another series or set of numbers, which represent the processed signal. In
this way, an immense variety of functions can be performed, such as spectral analysis, linear or
nonlinear filtration, transcoding, modulation, detection, estimation, and parameter extraction. The
machines used are digital computers.
The systems corresponding to this processing obey the laws of discrete systems. In certain cases,
the numbers to which the processing is applied may be derived from a discrete process. How-
ever, they often represent the amplitude of samples taken from a continuous signal, and, in that
case, the computer must be downstream of an analog-to-digital converter and possibly upstream
of a digital-to-analog converter. In designing such systems, and in studying how they work, signal
digitization is fundamentally important, and the operations of sampling and encoding must be ana-
lyzed in terms of their principles and their consequences. The theory of distributions is a concise,
simple, and effective means of such analysis. Following the presentation of certain fundamental
aspects concerning Fourier analysis, distributions, and signal representation, Chapter 1 contains
the most important and most useful results for sampling and encoding of a signal.
The advent of digital processing dates from the discovery of fast computational algorithms of
the discrete Fourier transform. Indeed, this transform is the basis for the study of discrete systems.
In digital processing, it is the equivalent of the Fourier transform in analog processing, enabling
us to transition from the discrete-time domain to the discrete-frequency domain. It lends itself very
well to spectral analysis, with a frequency step dividing the sampling frequency of the signals being
analyzed.
Fast computation algorithms offer gains, as they enable operations to be performed in real time,
provided certain elementary conditions are met. Thus, the discrete Fourier transform is not only a
fundamental tool in determining the processing characteristics and in the study of the impacts of
those characteristics on the signal, but it is also used in the production of popular devices, such as
mobile radio and digital television. Chapters 2 and 3 are dedicated to these algorithms. To begin
with, they present the elementary properties and the mechanism of fast computation algorithms
and their applications before moving on to a set of variants associated with practical situations.
A significant portion of this book is devoted to the study of one-dimensional invariant linear
time-discrete systems, which are easily accessible and highly useful. Multi-dimensional systems,
and, in particular, two- and three-dimensional systems, are experiencing significant development.
For example, they are applied to images. However, their properties are generally deduced from
those of one-dimensional systems, of which they are often merely simplified extensions. Nonlinear
or time-variable systems either contain a significant subset, retaining the properties of linearity and
time-invariance, or can be analyzed with the same techniques as systems that have those properties.
Linearity and time-invariance lead to the existence of a convolution relation, which governs the
operation of the system or filter having those properties. This convolution relation is defined on
the basis of the system’s response to the elementary signal which represents a pulse – the impulse
response – by an integral in the case of analog signals. Thus, if x(t) denotes the signal to be filtered,
and h(t) is the filter impulse response, the filtered signal y(t) is given by the equation:

$$y(t) = \int_{-\infty}^{+\infty} h(\tau)\,x(t-\tau)\,d\tau$$
In these conditions, such a relation, which directly expresses the filter’s real operation, offers
limited practical interest. To begin with, it is not particularly easy to determine the impulse
response on the basis of criteria that define the filter’s intended operation. In addition, an equation
that contains an integral cannot easily be used to recognize and check the filter’s behavior. Design
is much easier to address in the frequency domain because the Laplace transform or Fourier
transform can be used to move to a transformed plane where the convolution relations from the
amplitude–time plane become simple products of functions. The Fourier transform matches the
system’s frequency response to the impulse response, and the filtration is then the product of that
frequency response by the Fourier transform, or spectrum, of the signal to be filtered.
In discrete digital systems, the convolution is expressed by a sum. The filter is defined by a series
of numbers, representing its impulse response. Thus, if the series to be filtered is written as x(n),
the filtered series y(n) is expressed by the following sum, where n and m are integers:

$$y(n) = \sum_{m} h(m)\,x(n-m)$$

Two scenarios then arise. Firstly, the sum may pertain to a finite number of terms – i.e. the h(m)
values are zero, except for a finite number of values of the integer variable m. The filter is known as
a finite impulse-response filter. In reference to its realization, it is also referred to as non-recursive,
because it does not require a feedback loop from output to input in its implementation. It occupies
finite memory space because it only retains the memory of an elementary signal – an impulse, for
example – for a limited time. The numbers h(m) are called the coefficients of the filter, which they
define completely. They can be calculated directly, in a very simple way – for instance, by means of a
Fourier series development of the frequency response. This type of filter exhibits highly interesting
original features (for example, the possibility of a rigorously linear phase response – i.e., a constant
group delay); the signals whose components are within the filter’s passband are not deformed as
they pass through the filter. This possibility is exploited in data transmission systems, or spectral
analysis, for example.
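To make the convolution sum concrete, here is a minimal sketch in Python with NumPy; the coefficients h(m) and the input sequence x(n) are arbitrary illustrative values, not taken from the text.

```python
import numpy as np

# Illustrative FIR coefficients h(m) (a short smoothing filter)
# and an arbitrary input sequence x(n).
h = np.array([0.25, 0.5, 0.25])
x = np.sin(2 * np.pi * 0.05 * np.arange(32))

# y(n) = sum over m of h(m) x(n - m), with x taken as zero outside its support.
y = np.convolve(x, h)[:len(x)]

# The same finite sum, written out term by term.
y_loop = np.zeros_like(x)
for n in range(len(x)):
    for m in range(len(h)):
        if 0 <= n - m < len(x):
            y_loop[n] += h[m] * x[n - m]

assert np.allclose(y, y_loop)
```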
Alternatively, the sum may pertain to an infinite number of terms, and the h(m) may have
an infinite number of nonzero values; the filter is called an infinite impulse-response filter, or
recursive, because its memory must be set up as a feedback loop from output to input. Its operation
is governed by an equation whereby an element in the output series y(n) is calculated by the
weighted sum of a number of elements of the input series x(n), and a certain number of elements
of the previous output series. For example, if L and K are integers, the filter’s operation may be
defined by the following equation:

$$y(n) = \sum_{l=0}^{L} a_l\,x(n-l) - \sum_{k=1}^{K} b_k\,y(n-k)$$

The $a_l$ ($l = 0, 1, \ldots, L$) and $b_k$ ($k = 1, 2, \ldots, K$) are the coefficients. As is the case with analog
filters, this type of filter generally cannot easily be studied directly; it is necessary to go through
a transformed plane. The Laplace or Fourier transforms could be used for this purpose. However,
there is a transform that is much more suitable – the Z transform, which is the equivalent for
discrete systems. A filter is characterized by its Z-transfer function, generally written as H(Z),
which involves the coefficients in the following equation:

$$H(Z) = \frac{\sum_{l=0}^{L} a_l Z^{-l}}{1 + \sum_{k=1}^{K} b_k Z^{-k}}$$

To obtain the filter’s frequency response, in H(Z), we simply need to replace the variable Z with
the following expression, where f denotes the frequency variable, and T the time step between the
signal samples:

$$Z = e^{j2\pi f T}$$

In this operation, the imaginary axis in the Laplacian plane corresponds to the circle with unit
radius, centered at the origin in the plane of the variable Z. It is plain that the frequency response
of the filter defined by H(Z) is a periodic function whose period is the sampling frequency.
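As an illustration, the short sketch below evaluates such a transfer function on the unit circle, $Z = e^{j2\pi fT}$; the coefficient values and the sampling frequency are assumptions chosen for the example, and scipy.signal.freqz carries out the substitution.

```python
import numpy as np
from scipy import signal

# Illustrative coefficients, both polynomials in powers of Z^-1:
# numerator a_0, a_1, a_2 and denominator 1, b_1, b_2.
num = [0.2, 0.4, 0.2]
den = [1.0, -0.5, 0.25]

fs = 8000.0  # assumed sampling frequency 1/T, in Hz
f, H = signal.freqz(num, den, worN=512, fs=fs)

# H holds H(e^{j 2 pi f T}); the response repeats with period fs.
print(np.abs(H[:3]))
```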
Another representation of the function H(Z), which is useful in the design of filters and the
study of a number of properties, explicitly includes the roots of the numerator, also known as the
zeroes of the filter, $Z_l$ ($l = 1, 2, \ldots, L$), and the roots of the denominator, also known as the poles, $P_k$ ($k = 1, 2, \ldots, K$):

$$H(Z) = a_0\,\frac{\prod_{l=1}^{L}\left(1 - Z_l Z^{-1}\right)}{\prod_{k=1}^{K}\left(1 - P_k Z^{-1}\right)}$$

The term a0 is a scaling factor which defines the gain of the filter. The filter stability condition
is expressed very simply by the following constraint: all the poles must be within the unit circle.
The position of the poles and zeroes with respect to the unit circle offers a very simple way of
determining the characteristics of the filter; this technique is very widely used in practice.
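A minimal numerical check of this stability criterion, reusing the illustrative denominator above: the poles are the roots of the denominator polynomial, and every modulus must be smaller than 1.

```python
import numpy as np

# Denominator 1 + b_1 Z^-1 + ... + b_K Z^-K; multiplying through by Z^K
# gives the polynomial Z^K + b_1 Z^(K-1) + ... + b_K, whose roots are the poles.
den = [1.0, -0.5, 0.25]
poles = np.roots(den)

print(poles, "stable:", bool(np.all(np.abs(poles) < 1.0)))
```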

Four chapters are devoted to the study of the characteristics of these digital filters. Chapter 4
presents the properties of time-invariant discrete linear systems, recaps the main properties of
the Z-transform, and lays down the fundamental groundwork necessary for the study of filters.
Chapter 5 discusses finite impulse-response filters – their properties are studied, the techniques
for calculating the coefficients are described, and the structures of real-world filters are examined.
Infinite impulse-response filters are generally produced by cascading first- and second-order ele-
mentary cells, or sections, so Chapter 6 describes these sections and their properties. To begin with,
this makes the study of this type of system considerably easier; in addition, the chapter provides a set
of results that are highly useful in practice. Chapter 7 outlines the methods for calculating the coef-
ficients for infinite impulse-response filters and discusses the problems posed by their real-world
implementation, with the limitations that are encountered and the consequences of those limita-
tions – in particular, computational noise.
As the properties of infinite impulse-response filters are comparable to those of continuous ana-
log filters, it is natural to envisage similar structures for these filters to those generally employed
in analog filtering. This is the subject of Chapter 8, which presents ladder structures. We then take
a diversion to look at switched-capacitor filters, which are not digital in the strictest sense of the
word, but which are sampled, and are highly useful additions to digital filters. To guide users, a
summary of the respective merits of the structures described is given at the end of the chapter.
Certain devices – for example, in instrumentation or telecommunications – work on signals
represented by a series of complex numbers. Out of all signals of this type, one category is of
particular practical interest: analytic signals. Their properties are studied in Chapter 9, as is the
design of devices apt for the generation or processing of such signals. Additional concepts relating
to filtering are also explained in this chapter, which, in a unified manner, presents the main
interpolation techniques. Signal restoration is also discussed.
Digital processing machines, when operating in real time, operate at a rate that is closely linked
to the signal sampling frequency. Their complexity depends on the volume of operations being
carried out, and the length of time available in which to perform this processing. The signal
sampling frequency is generally imposed either at system input or at output, but within the system
itself, it is possible to vary this rate in order to adapt it to the characteristics of the signal and the
processing, and thereby reduce the volume of operations and the computation rate. The machines
may be simplified – potentially very significantly – if, over the course of the processing, the
sampling frequency is adapted to suit the usable bandwidth of the signal; this is multirate filtering,
which is presented in Chapter 10. The impacts on the processing characteristics are described,
along with realization methods. Rules are provided on usage and assessment. This technique
produces particularly interesting results for narrow passband filters or the implementation of sets
known as filter banks. In this case, the system associates, with a set of phase-shifting circuits, a
discrete Fourier transform calculator.
Filter banks for the breakdown and reconstruction of signals have become a fundamental tool for
compression. The way in which they work is described in Chapters 11 and 12 with design methods
and realization structures.
The filters can be determined on the basis of time-domain specifications; such is the case, for
example, with the modeling of a system, as described in Chapter 13. If the characteristics vary, it
may be interesting to adapt the coefficients as a function of changes occurring in the system. This
adaptation may depend on an approximation criterion and take place at a rate that may come to
equal the system’s sampling rate; then, the filter is said to be adaptive. Chapter 14 is devoted to
adaptive filtering, in the simplest of cases, but also the most common and the most useful – where
the approximation criterion chosen is the minimization of the mean squared error, and where the
coefficients vary depending on the gradient algorithm. After recapping details of random signals
and their properties in Chapter 13 – in particular, the autocorrelation function and matrix, whose
eigenvalues play an important role – the gradient algorithm is presented in Chapter 14, and its
convergence conditions are studied. Then, the two main adaptation parameters, the time constant
and the residual error, are analyzed along with the arithmetic complexity. Different structures are
proposed for concrete implementation.
Chapter 15 can be viewed as an extension of Chapters 13 and 14 to the domain of neural networks
in artificial intelligence. These devices are characterized by the systematic use of nonlinear circuits
for the functions of modeling, classification, or shape recognition. Adaptive techniques are used
during the learning phases.
Chapter 16 discusses a very specific application: error-correction coding. Indeed, information
processing and transmission systems include error-correction coding techniques, which are gener-
ally introduced by a mathematical approach, though some of the most widely used types of coding
are actually direct applications of the fundamental signal processing techniques. Thus, the chapter
puts forward a signal processing vision of certain types of coding, to facilitate readers’ access to and
use of these techniques.
Finally, Chapter 17 briefly describes some applications, showing how the fundamental methods
and techniques are put to use.
1 Signal Digitizing – Sampling and Coding

The conversion of an analog signal to digital form involves a twofold approximation. Firstly, in the
time domain, the signal function s(t) is replaced by its values at integer multiples of a time increment T and is
thus converted to s(nT). This process is called sampling. Secondly, in the amplitude domain, each
value of s(nT) is approximated by a whole multiple of an elementary quantity. This process is called
quantization. The approximate value thus obtained is then associated with a number. This process
is called coding – a term often used to describe the whole process by which the value of s(nT) is
transformed into the number representing it.
The effect of these two approximations on the signal will be analyzed in this chapter. To achieve
this, two basic tools will be used: Fourier analysis and distribution theory.

1.1 Fourier Analysis

Fourier analysis is a method of decomposing a signal into a sum of individual components which
can easily be produced and observed. The importance of this decomposition is that a system’s
response to the signal can be deduced from these individual components using the superposition
principle. These elementary component signals are periodic and complex, so both the amplitude
and phase of the systems can be studied. They are represented by a function $s_e(t)$ such that:

$$s_e(t) = e^{j2\pi ft} = \cos(2\pi ft) + j\sin(2\pi ft) \qquad (1.1)$$

where f is the inverse of the period – that is, the frequency of the elementary signal.
Since the elementary signals are periodic, clearly, the analysis is simplified when the signal itself
is periodic. This case will be examined first, although it is not the most interesting, since a periodic
signal is completely determinate and carries practically no information.

1.1.1 Fourier Series Expansion of a Periodic Function


Let s(t) be a periodic function of the variable t, with period T – that is, satisfying the relation:

s(t + T) = s(t) (1.2)

Under certain conditions, this function can be expanded in a Fourier series as:


$$s(t) = \sum_{n=-\infty}^{+\infty} C_n\,e^{j2\pi nt/T} \qquad (1.3)$$

Figure 1.1 Impulse train (pulses of amplitude a and width τ, repeating with period T).

The index n is an integer, and the $C_n$, called the Fourier coefficients, are defined by:

$$C_n = \frac{1}{T}\int_0^T s(t)\,e^{-j2\pi nt/T}\,dt \qquad (1.4)$$
In fact, the Fourier coefficients minimize the squared difference between the function s(t) and the series (1.3). Expression (1.4) is obtained by differentiating, with respect to the coefficient of index n, the quantity

$$\int_0^T \left(s(t) - \sum_{m=-\infty}^{+\infty} C_m\,e^{j2\pi mt/T}\right)^{\!2} dt$$

and setting that derivative to zero.


Example: Figure 1.1 shows an example of a Fourier expansion of a function ip (t) composed of a
train of impulses, each of width 𝜏 and amplitude a, occurring at time intervals T. The time origin
is taken as being at the center of an impulse.
The coefficients $C_n$ are given by:

$$C_n = \frac{1}{T}\int_{-\tau/2}^{\tau/2} a\,e^{-j2\pi nt/T}\,dt = \frac{a\tau}{T}\,\frac{\sin(\pi n\tau/T)}{\pi n\tau/T} \qquad (1.5)$$

and the Fourier expansion is:

$$i_p(t) = \frac{a\tau}{T}\sum_{n=-\infty}^{+\infty} \frac{\sin(\pi n\tau/T)}{\pi n\tau/T}\,e^{j2\pi nt/T} \qquad (1.6)$$
The importance of this example for the study of sampled systems is readily apparent.
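As a quick numerical check of equation (1.5), the sketch below approximates the integral (1.4) over one period of the impulse train and compares it with the closed form; the values of a, τ, and T are arbitrary.

```python
import numpy as np

a, tau, T = 2.0, 0.2, 1.0  # illustrative amplitude, width, and period
t = np.linspace(-T / 2, T / 2, 200000, endpoint=False)
dt = t[1] - t[0]
ip = np.where(np.abs(t) < tau / 2, a, 0.0)  # one period, pulse centered at t = 0

for n in range(5):
    # C_n = (1/T) * integral over one period of i_p(t) exp(-j 2 pi n t / T) dt
    Cn = np.sum(ip * np.exp(-2j * np.pi * n * t / T)) * dt / T
    closed = (a * tau / T) * np.sinc(n * tau / T)  # np.sinc(x) = sin(pi x)/(pi x)
    print(n, round(Cn.real, 6), round(closed, 6))
```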
The properties of Fourier series expansions are given in Ref. [1]. One important property,
expressed by the Bessel–Parseval equation, is that power is conserved in the expansion of the
signal:



$$\sum_{n=-\infty}^{+\infty} |C_n|^2 = \frac{1}{T}\int_0^T |s(t)|^2\,dt \qquad (1.7)$$
The constituent elements resulting from the expansion of a periodic signal have frequencies
which are integer multiples of 1/T (the inverse of the period). They form a discrete set in the space of
all frequencies. In contrast, if the signal is not periodic, the Fourier components form a continuous
domain in the frequency space.

1.1.2 Fourier Transform of a Function


Let s(t) be a function of t. Under certain conditions, one can write:

$$s(t) = \int_{-\infty}^{+\infty} S(f)\,e^{j2\pi ft}\,df \qquad (1.8)$$

where

$$S(f) = \int_{-\infty}^{+\infty} s(t)\,e^{-j2\pi ft}\,dt \qquad (1.9)$$
The function S(f ) is the Fourier transform of s(t). More commonly, S(f ) is called the spectrum of
signal s(t).
Example: To calculate the Fourier transform I(f) of an isolated pulse i(t) of width τ and amplitude a, centered on the time origin (Figure 1.2):

$$I(f) = \int_{-\infty}^{+\infty} i(t)\,e^{-j2\pi ft}\,dt = a\int_{-\tau/2}^{\tau/2} e^{-j2\pi ft}\,dt$$

$$I(f) = a\tau\,\frac{\sin(\pi f\tau)}{\pi f\tau} \qquad (1.10)$$
Figure 1.2 Isolated impulse (width τ, amplitude a).

Figure 1.3 represents the function I(f), which will be used frequently in this book. It is important to note that it is zero for the nonzero frequencies which are whole multiples of the inverse of the impulse width. A table of this function is given in Appendix 1.

This example clearly shows the correspondence between the Fourier coefficients and the spectrum. In effect, by comparing equations (1.6) and (1.10), it can be verified that, apart from the factor 1/T, the coefficients of the Fourier series expansion of an impulse train correspond to the values of the spectrum of the isolated impulse at frequencies which are whole multiples of the inverse of the period of the impulses.
In the case of a nonperiodic function, there is an expression similar to the Bessel–Parseval
relation, but this time the energy in the signal is conserved, instead of the power:
$$\int_{-\infty}^{+\infty} |S(f)|^2\,df = \int_{-\infty}^{+\infty} |s(t)|^2\,dt \qquad (1.11)$$
Let s′(t) be the derivative of the function s(t); its Fourier transform $S_d(f)$ is given by:

$$S_d(f) = \int_{-\infty}^{+\infty} e^{-j2\pi ft}\,s'(t)\,dt = j2\pi f\,S(f) \qquad (1.12)$$
Thus, taking the derivative of a signal leads to multiplying its spectrum by j2𝜋f .
One essential property of the Fourier transform (in fact, the main reason for its use) is that it
transforms a convolution into a simple product. Consider two time functions, x(t) and h(t), with
Fourier transforms X(f ) and H(f ), respectively. The convolution y(t) is defined by:

y(t) = x(t) ∗ h(t) = x(t − 𝜏)h(𝜏)d𝜏 (1.13)
∫−∞

Figure 1.3 Spectrum of an isolated impulse (zeros at the nonzero multiples of 1/τ).

The Fourier transform of this convolution is:

$$Y(f) = \int_{-\infty}^{+\infty}\left(\int_{-\infty}^{+\infty} x(t-\tau)\,h(\tau)\,d\tau\right) e^{-j2\pi ft}\,dt$$

$$Y(f) = \int_{-\infty}^{+\infty} h(\tau)\,e^{-j2\pi f\tau}\,d\tau \int_{-\infty}^{+\infty} x(u)\,e^{-j2\pi fu}\,du = H(f)\,X(f)$$
Conversely, it can be shown that the Fourier transform of a simple product is a convolution
product.
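The discrete counterpart of this property is easy to verify numerically: for the discrete Fourier transform, a circular convolution in time corresponds to a simple product of the transforms. A minimal check with arbitrary random sequences:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)
h = rng.standard_normal(N)

# Circular convolution y(n) = sum over m of h(m) x((n - m) mod N).
y = np.array([sum(h[m] * x[(n - m) % N] for m in range(N)) for n in range(N)])

# The DFT of the circular convolution equals the product of the two DFTs.
print(np.allclose(np.fft.fft(y), np.fft.fft(x) * np.fft.fft(h)))  # -> True
```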
An interesting result can be derived from the abovementioned properties. Let us consider the
Fourier transform II(f) of the function $i^2(t)$. Because of equations (1.10) and (1.13), we obtain:

$$II(f) = I(f) * I(f) = a\,I(f) \qquad (1.14)$$

and therefore,

$$\int_{-\infty}^{+\infty} \frac{\sin(\pi\phi\tau)}{\pi\phi\tau}\,\frac{\sin(\pi(f-\phi)\tau)}{\pi(f-\phi)\tau}\,d\phi = \frac{1}{\tau}\,\frac{\sin(\pi f\tau)}{\pi f\tau}$$
Taking f = n/𝜏, for any integer n,

$$\int_{-\infty}^{+\infty} \frac{\sin(\pi\phi\tau)}{\pi\phi\tau}\cdot\frac{\sin(\pi(\phi\tau-n))}{\pi(\phi\tau-n)}\,d\phi = 0 \qquad (1.15)$$
Thus, the functions $\sin\pi(x-n)/[\pi(x-n)]$, with n an integer, form a set of orthogonal functions.
The definition and properties of the Fourier transform can be extended to multivariate functions.
Let s(x1 , x2 , …, xn ) be a function of n real variables. Its Fourier transform is a function S(𝜆1 , 𝜆2 , …,
𝜆n ) defined by:

$$S(\lambda_1, \lambda_2, \ldots, \lambda_n) = \int\!\!\cdots\!\!\int_{\mathbb{R}^n} s(x_1, x_2, \ldots, x_n)\,e^{-j2\pi(\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n)}\,dx_1\,dx_2\cdots dx_n \qquad (1.16)$$
If the function s(x1 , x2 , …, xn ) is separable – that is, if:

s(x1 , x2 , … , xn ) = s(x1 )s(x2 ) · · · s(xn )

then:

S(𝜆1 , 𝜆2 , … , 𝜆n ) = S(𝜆1 )S(𝜆2 ) · · · S(𝜆n )

The variables xi (1 ⩽ i ⩽ n) often represent distances (for example, for two dimensions), and in
that case, 𝜆i are called spatial frequencies.

1.2 Distributions

Mathematical distributions constitute a formal mathematical representation of the physical distri-


butions found in experiment [1].

1.2.1 Definition
A distribution D is defined as a continuous linear functional on the vector space 𝒟 of functions defined in ℝⁿ, infinitely differentiable, and having bounded support.
1.2 Distributions 5

With each function 𝜑 belonging to 𝒟 , the distribution D associates a complex number D(𝜑)
which will also be denoted by ⟨D, φ⟩, with the properties:
(1) D(𝜑1 + 𝜑2 ) = D(𝜑1 ) + D(𝜑2 ).
(2) D(𝜆𝜑) = 𝜆D(𝜑) where 𝜆 is a scalar.
(3) If 𝜑j converges to 𝜑 when j tends toward infinity, the sequence D(𝜑j ) converges to D(𝜑).
Examples:
(i) If f (t) is a function which is summable over any bounded ensemble, it defines a distribution
Df by:

$$\langle D_f, \varphi\rangle = \int_{-\infty}^{+\infty} f(t)\,\varphi(t)\,dt \qquad (1.17)$$
(ii) If 𝜑′ denotes the derivative of 𝜑, the functional defined by

$$\langle D, \varphi\rangle = \int_{-\infty}^{+\infty} f(t)\,\varphi'(t)\,dt \qquad (1.18)$$

is also a distribution.
(iii) The Dirac distribution 𝛿 is defined by:
⟨𝛿, 𝜙⟩ = 𝜙(0) (1.19)
The Dirac distribution 𝛿 at a real point x is defined by:
⟨𝛿(t − x), 𝜙⟩ = 𝜙(x) (1.20)
This distribution is said to represent a mass of +1 at the point x.
(iv) Consider a pulse i(t) of duration τ, with amplitude a = 1/τ, centered on the origin. It defines a distribution $D_i$:

$$\langle D_i, \varphi\rangle = \frac{1}{\tau}\int_{-\tau/2}^{\tau/2} \varphi(t)\,dt$$

For very small values of τ, this becomes:

$$\langle D_i, \varphi\rangle \simeq \varphi(0)$$

that is, the Dirac distribution can be regarded as the limit of the distribution $D_i$ when τ tends toward 0.

1.2.2 Differentiation of Distributions


The derivative ∂D/∂t of a distribution D is defined by the relation:

$$\left\langle \frac{\partial D}{\partial t}, \varphi\right\rangle = -\left\langle D, \frac{\partial\varphi}{\partial t}\right\rangle \qquad (1.21)$$
To illustrate this, consider the Heaviside function Y, or unit-step function, which is zero when t < 0 and +1 when t ⩾ 0:

$$\left\langle \frac{\partial Y}{\partial t}, \varphi\right\rangle = -\left\langle Y, \frac{\partial\varphi}{\partial t}\right\rangle = -\int_0^{+\infty} \varphi'(t)\,dt = \varphi(0) = \langle\delta, \varphi\rangle \qquad (1.22)$$
As a result, the discontinuity in Y appears in the derivative as a point of unit mass.
This example illustrates the considerable practical interest of the notion of a distribution, which
means that a number of the concepts and properties of continuous functions can be extended to
discontinuous functions.

1.2.2.1 The Fourier Transform of a Distribution


By definition, the Fourier transform of a distribution D is a distribution denoted by FD such that:

⟨FD, 𝜙⟩ = ⟨D, F𝜙⟩ (1.23)

By applying this definition to distributions with a point support, we obtain:



$$\langle F\delta, \varphi\rangle = \langle\delta, F\varphi\rangle = \int_{-\infty}^{+\infty} \varphi(t)\,dt = \langle 1, \varphi\rangle \qquad (1.24)$$

Consequently, $F\delta = 1$. Similarly, $F\delta(t-a) = e^{-j2\pi fa}$.
A case which is fundamental to the study of sampling is that of the set (u) of Dirac distributions
shifted by T and such that:


$$u(t) = \sum_{n=-\infty}^{+\infty} \delta(t - nT) \qquad (1.25)$$

This set is a distribution of unit mass points separated on the abscissa by whole multiples of T.
Its Fourier transform is:


$$Fu = \sum_{n=-\infty}^{+\infty} e^{-j2\pi fnT} = U(f) \qquad (1.26)$$

and it can be shown that this sum is a point distribution.


A simple demonstration can be obtained from the Fourier series development of the function $i_p(t)$, formed by the train of pulses of period T, with width τ and amplitude 1/τ, centered on the time origin.
One can consider u(t) as the limit of $i_p(t)$ as τ tends toward zero:

$$u(t) = \lim_{\tau\to 0} i_p(t)$$

and by referring to relation (1.6), we find that:

$$\lim_{\tau\to 0} i_p(t) = \frac{1}{T}\sum_{n=-\infty}^{+\infty} e^{j2\pi nt/T}$$
The following fundamental property is demonstrated in Ref. [2].
The Fourier transform of the time distribution, represented by unit mass points separated by whole
multiples of T is a frequency distribution of points of mass 1/T separated by whole multiples of 1/T.
That is:
$$U(f) = \sum_{n=-\infty}^{+\infty} e^{-j2\pi fnT} = \frac{1}{T}\sum_{n=-\infty}^{+\infty} \delta\!\left(f - \frac{n}{T}\right) \qquad (1.27)$$
This result will be used when studying signal sampling. The property of the Fourier transform
whereby it exchanges convolution and multiplication applies equally to distributions.
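A discrete analogue of equation (1.27) can be observed with the DFT: a train of unit samples of period P transforms into a train of equally spaced lines. A small sketch, assuming P divides the frame length N:

```python
import numpy as np

N, P = 64, 8          # frame length and comb period (P divides N)
u = np.zeros(N)
u[::P] = 1.0          # unit samples every P points

U = np.fft.fft(u)
# The nonzero bins fall every N/P points and all have height N/P:
# a comb in time transforms into a comb in frequency.
print(np.nonzero(np.abs(U) > 1e-9)[0])  # -> [ 0  8 16 24 32 40 48 56]
print(U[0].real)                        # -> 8.0, i.e. N/P
```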
Before considering the influence of the sampling and quantizing operations on the signal, it is
useful to discuss the characteristics of those signals which are most often studied.

1.3 Some Commonly Studied Signals

A signal is defined as a function of time s(t). This function can be either an analytic expression or
the solution of a differential equation, in which case the signal is said to be deterministic.

1.3.1 Deterministic Signals


Sine waves are the most frequently used signals of this type. For example,

s(t) = A cos(𝜔t + 𝛼)

where A is the amplitude, 𝜔 = 2𝜋f is the angular frequency, and 𝛼 is the phase of the signal.
Signals of this type are easy to reproduce and recognize at different points of a system. They allow
the various characteristics to be visualized in a simple way. Moreover, as mentioned above, they
serve as the basis for the decomposition of any deterministic signal through the Fourier transform.
If the system is linear and invariant in time, it can be characterized by its frequency response
H(𝜔). For each value of the frequency, H(𝜔) is a complex number whose modulus is the amplitude
of the response. By convention, the function 𝜑(𝜔) such that:

H(𝜔) = |H(𝜔)|e−j𝜙(𝜔) (1.28)

is defined as the phase. This convention allows the group delay 𝜏(𝜔), a positive function for real
systems, to be expressed as:

$$\tau(\omega) = \frac{d\phi}{d\omega} \qquad (1.29)$$
The group delay refers to transmission lines on which different frequencies of the signal
propagate at different speeds, which leads to energy dispersion in time. As an illustration of the
concept, let us consider two close frequencies 𝜔 ± Δ𝜔 and the corresponding phases per unit
length 𝜑 ± Δ𝜑. The sum signal is expressed by:

s(t) = cos[(𝜔 + Δ𝜔)t − (𝜑 + Δ𝜑)] + cos[(𝜔 − Δ𝜔)t − (𝜑 − Δ𝜑)]

or

s(t) = 2 cos(𝜔t − 𝜑) cos(Δ𝜔t − Δ𝜑)

This is a modulated signal, and there is no dispersion if the two factors in the above expres-
sion undergo the same delay per unit length – that is, Δ𝜑/Δ𝜔 is constant. Thus, the group delay
characterizes the dispersion imposed on the signal energy by a transmission line or any equivalent
system.
If the sinusoidal signal s(t) is applied to the system, then an output signal sr (t) is obtained such
that:

sr (t) = A|H(𝜔)| cos[𝜔t + 𝛼 − 𝜙(𝜔)] (1.30)

Once again, this is a sinusoidal signal, and comparison with the applied signal reveals the
response of the system. The importance of this procedure (for example, for test operations) can
readily be appreciated.
Deterministic signals, however, do not give a good representation of real signals because they carry no information beyond the fact of their presence. Real signals are generally
represented by a random function s(t). For testing and analyzing systems, random signals are also
used, but they must have particular characteristics which do not present undue complications for
their generation and use. A study of such signals is given in Vol. 2 of Ref. [2].

1.3.2 Random Signals


A random signal is defined at each time, t, by a probability law for its amplitude. This law can be
expressed by a probability density p(x, t) defined by:
$$p(x,t) = \lim_{\Delta x\to 0} \frac{\text{Prob}[x \le s(t) \le x + \Delta x]}{\Delta x} \qquad (1.31)$$
It is stationary if its statistical properties are independent of time – that is, if the probability den-
sity is independent of time:

p(x, t) = p(x)

It is of second-order if it possesses a first-order moment called the mean value, which is the
mathematical expectation of s(t), denoted by E[s(t)] and defined by:

$$m_1(t) = E[s(t)] = \int_{-\infty}^{+\infty} x\,p(x,t)\,dx \qquad (1.32)$$
and a second-order moment called the covariance:
$$E[s(t_1)s(t_2)] = m_2(t_1, t_2) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x_1 x_2\,p(x_1, x_2; t_1, t_2)\,dx_1\,dx_2$$
where p(x1 ,x2 ;t1 ,t2 ) is the probability density of a pair of random variables [s(t1 ),s(t2 )].
The stationarity can be limited to the moments of first and second order; then, the signal is said
to be stationary of order 2 or stationary in the wider sense. For such a signal:

$$E[s(t)] = \int_{-\infty}^{+\infty} x\,p(x)\,dx = m_1$$
The independence of time is translated for the probability density p(x1 , x2; t1 ,t2 ) as follows:

p(x1 , x2 ; t1 , t2 ) = p(x1 , x2 ; 0, t2 − t1 ) = p(x1 , x2 ; 𝜏)

where,

𝜏 = t2 − t1

Only the difference between the two observation times is involved:

E[s(t1 )s(t2 )] = m2 (𝜏) (1.33)

The function $r_{xx}(\tau)$, such that:

rxx (𝜏) = E[s(t)s(t − 𝜏)] (1.34)

is called the autocorrelation function of the signal.


A random signal s(t) also has a time mean mT . This is a random variable defined by:
$$m_T = \lim_{T\to\infty} \frac{1}{T}\int_{-T/2}^{T/2} s(t)\,dt \qquad (1.35)$$

The ergodicity of this average means that it takes a particular value k with probability 1. For a stationary signal, the ergodicity of the time average implies that it equals the ensemble average of the amplitudes at a given instant. In effect, taking the expectation of the variable $m_T$:

$$E[m_T] = k = E\left[\lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{T/2} s(t)\,dt\right] = \lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{T/2} E[s(t)]\,dt = m_1$$

This result has important consequences in practice, as it provides a means of finding the statistical
properties of the signal at a given instant from observation over the time period. The ergodicity of
the covariance in the stationary case is also very interesting because it leads to the relation:
$$r_{xx}(\tau) = \lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{T/2} s(t)\,s(t-\tau)\,dt \qquad (1.36)$$

The autocorrelation function $r_{xx}(\tau)$ of the signal s(t) is fundamental for the study of ergodic
second-order stationary signals. Its principal properties are:

(1) It is an even function:

rxx (𝜏) = rxx (−𝜏)

(2) Its maximum is at the origin and corresponds to the power P of the signal:

rxx (0) = E[s2 (t)] = P

(3) The power spectral density is the Fourier transform of the autocorrelation function:

$$\Phi_{xx}(f) = \int_{-\infty}^{+\infty} r_{xx}(\tau)\,e^{-j2\pi f\tau}\,d\tau = 2\int_0^{+\infty} r_{xx}(\tau)\cos(2\pi f\tau)\,d\tau$$

In effect, $r_{xx}(\tau) = s(\tau)*s(-\tau)$, and if S(f) denotes the Fourier transform of s(t), we obtain:

$$\Phi_{xx}(f) = S(f)\,\overline{S(f)} = |S(f)|^2 \qquad (1.37)$$

This last property is physically translated by the fact that the more rapidly the signal varies (that
is, the more its spectrum tends toward high frequencies), the narrower its autocorrelation function
becomes. In the limit case, the signal is purely random, and the function becomes zero for 𝜏 ≠ 0.
This signal is called white noise and is described by:

$$r_{xx}(\tau) = P\,\delta(\tau)$$

The spectral density is then constant:

Φxx (f ) = P

In fact, such a signal has no physical reality since its power is infinite, but it does offer a use-
ful mathematical model for signals with a spectral density that is virtually constant over a wide
frequency band.
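To connect these definitions to computation, the following sketch estimates the autocorrelation of an approximately white signal by the time average of equation (1.36); the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.standard_normal(100000)  # approximately white, unit-power signal

def rxx(s, lag):
    # Time-average estimate of r_xx(tau) = E[s(t) s(t - tau)], cf. (1.36).
    return np.mean(s[lag:] * s[:len(s) - lag]) if lag else np.mean(s * s)

print([round(rxx(s, k), 3) for k in range(4)])
# -> roughly [1.0, 0.0, 0.0, 0.0]: the power P at the origin, near zero elsewhere
```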

1.3.3 Gaussian Signals


Among the probability distributions that can be considered for a signal s(t), one particular cate-
gory is of special interest – normal or Gaussian distributions. In effect, normal random distributions
retain their normal character under any linear operation, such as convolution through a certain dis-
tribution, filtering, differentiation, or integration. These random distributions are also frequently
used for modeling real signals and for testing systems.
A random variable x is said to be Gaussian if its probability law has a density p(x) which follows
the normal or Gaussian law:
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-m)^2/(2\sigma^2)} \qquad (1.38)$$

The parameter m is the mean of the variable x; the variance 𝜎 2 is the second-order moment of
the centered random variable (x − m); and 𝜎 is the standard deviation. The variable (x − m)/𝜎 has
a zero average and a unit standard deviation. A tabulation is given in Appendix 2.
A random variable is characterized by its amplitude probability law and also by its moments mn ,
such that:

$$m_n = \int_{-\infty}^{+\infty} x^n\,p(x)\,dx \qquad (1.39)$$
These moments are the coefficients of the series expansion of the function F(u), called the
characteristic function of the random variable x and defined by:

$$F(u) = \int_{-\infty}^{+\infty} e^{jux}\,p(x)\,dx \qquad (1.40)$$
It is the inverse Fourier transform of the probability density p(x) which is expressed by:

$$p(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-jux}\,F(u)\,du \qquad (1.41)$$
On the basis of equation (1.40), the following series expansion is obtained:


$$F(u) = \sum_{n=0}^{\infty} \frac{(ju)^n}{n!}\,m_n \qquad (1.42)$$
and for a centered Gaussian variable:
$$F(u) = e^{-\sigma^2 u^2/2} \qquad (1.43)$$

The normal law can be generalized to multidimensional random variables [3]. The characteristic
function of a k-dimensional Gaussian variable x(x1 ,x2 , … ,xk ) is expressed by:
$$F(u_1, u_2, \ldots, u_k) = \exp\left(-\frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k} r_{ij}\,u_i u_j\right) \qquad (1.44)$$

where $r_{ij} = E(x_i x_j)$.
The probability density is obtained through Fourier transformation. For two dimensions, we get:
p(x1, x2) = (1/(2πσ1σ2√(1 − r²))) exp{−(1/(2(1 − r²))) [x1²/σ1² − 2r x1x2/(σ1σ2) + x2²/σ2²]}        (1.45)

where r is the correlation coefficient:


r = E(x1x2)/(σ1σ2)
A random signal s(t) is said to be Gaussian if, for any set of k times ti (1 ⩽ i ⩽ k), the k-dimensional
random variable s = [s(t1 ),…, s(tk )] is Gaussian. According to equation (1.44), the probability law
of that variable is completely defined by the autocorrelation function rxx(τ) of the signal s(t).
Example: The signal defined by the equations:

rxx(τ) = σ² e^{−|τ|/(RC)}        (1.46)

p(x) = (1/(σ√(2π))) e^{−x²/(2σ²)}        (1.47)

is an approximation to white Gaussian noise which is used in the analysis of systems or for modeling
signals. It is a stationary signal with a zero average and a spectral density which is not strictly
constant, but which corresponds to a uniform distribution filtered by an RC-type low-pass filter. It
is obtained by amplifying the thermal noise generated at the terminals of a resistor.
The Gaussian distribution can be obtained from a uniform distribution. Let p(y) be the Rayleigh
distribution:
p(y) = (y/σ²) e^{−y²/(2σ²)},   y ⩾ 0        (1.48)
and p(x) the uniform distribution over the interval [0,1]. Changing variables so that:

p(x)dx = p(y)dy

one gets:
p(y) = p(x) |dx/dy| = |dx/dy| = (y/σ²) e^{−y²/(2σ²)}
and therefore,
x = e^{−y²/(2σ²)};   y = σ √(2 ln(1/x))        (1.49)
The Gaussian distribution is obtained by considering two independent variables, x and y, and the
variable z, given by:

z = y cos 2𝜋x (1.50)

With the help of the variable:

z′ = y sin 2𝜋x

the following equation is obtained:

p(z, z′) dz dz′ = p(z) p(z′) dz dz′ = p(y) p(x) dx dy = p(z) p(z′) y dy 2π dx

Hence:
p(z) p(z′) = (1/2π)(1/σ²) e^{−y²/(2σ²)} = (1/(2πσ²)) e^{−(z²+z′²)/(2σ²)}
and finally,
p(z) = (1/(σ√(2π))) e^{−z²/(2σ²)}
The method is often used to generate digital Gaussian signals.
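A minimal Python sketch of this generation method (standard library only; the function and variable names are assumptions of this illustration):

    # Two independent uniform variables: one gives the Rayleigh amplitude y
    # via (1.49), the other the phase; z = y cos(2*pi*x) is then Gaussian.
    import math
    import random

    def gaussian_sample(sigma: float) -> float:
        x = random.random()                              # uniform phase variable
        u = 1.0 - random.random()                        # uniform on (0, 1]
        y = sigma * math.sqrt(2.0 * math.log(1.0 / u))   # Rayleigh, eq. (1.49)
        return y * math.cos(2.0 * math.pi * x)           # eq. (1.50)

    samples = [gaussian_sample(1.0) for _ in range(100_000)]
    mean = sum(samples) / len(samples)
    var = sum(v * v for v in samples) / len(samples) - mean**2
    print(mean, var)   # close to 0 and 1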

1.3.3.1 Peak Factor of a Random Signal


A random signal is defined by a probability law of its amplitude, such that this amplitude is often
not bounded. This is the case for Gaussian signals, as can be seen from equation (1.38).
Signal processing can only be performed for a limited range of amplitudes, and restrictions are
necessary. An important parameter is the peak factor defined as the ratio of a certain amplitude
Am to the standard deviation 𝜎. By convention, the amplitude Am is that value which is exceeded
for not more than 10⁻⁵ of the total time. This relationship is often expressed in decibels (dB) by fc,
where:

fc = 20log10 (Am ∕𝜎) (1.51)



Following that convention, the peak factor for a Gaussian signal is 12.9 dB, and when this
definition is applied to a sinusoidal signal, a peak factor of 3 dB results.
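As a check of the 12.9 dB figure (a small Python sketch, standard library only), the amplitude exceeded during 10⁻⁵ of the time follows from the two-sided tail of the normal distribution:

    import math
    from statistics import NormalDist

    # P(|x| > Am) = 1e-5, so each tail carries 0.5e-5
    lam = NormalDist().inv_cdf(1.0 - 0.5e-5)    # Am / sigma
    print(lam, 20.0 * math.log10(lam))          # ~4.42 and ~12.9 dB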
A stationary model used to represent telephone signals is formed by the random signal whose
amplitude probability density obeys the following exponential, or Laplace, distribution:

p(x) = (1/(σ√2)) e^{−√2 |x|/σ}        (1.52)
The peak factor rises, in this case, to 17.8 dB.
In conclusion, ergodic second-order stationary functions characterized by an amplitude proba-
bility distribution and an autocorrelation function can be used to model the majority of signals of
interest. They are widely used in system analysis.
In addition to the other possibilities for representing signals, it is important to have some
sort of global measure, so as to be able, for example, to follow a signal through the processing
system. Such a measure can be obtained by defining norms on the function which represents
the signal.

1.4 The Norms of a Function

A norm is a positive real function which satisfies the relations:

||x|| ≥ 0; k||x|| = ||kx||

where k is real and positive.


One category of norms that is frequently employed is the set called Lp norms [4]. The Lp norm of
a continuous function s(t) defined over the interval [0,1] is denoted by ||s||p and defined by:
||s||p = [∫_{0}^{1} |s(t)|^p dt]^{1/p}        (1.53)
Three values of p are of particular interest:

(1) p = 1:

||s||1 = ∫_{0}^{1} |s(t)| dt        (1.53a)

(2) p = 2:

||s||2² = ∫_{0}^{1} |s(t)|² dt        (1.53b)

which is the expression for the energy of the signal s(t).

(3) p = ∞:

||s||∞ = max_{0⩽t⩽1} |s(t)|        (1.53c)

This is Chebyshev’s norm.


Norms are also used in approximation techniques to measure the discrepancy between a function
f (x) and the function F(x) being approximated. The approximation is made by the least squares
method if the norm L2 is used and, in Chebyshev’s sense, if the norm L∞ is used.

The Lp norms can be generalized by introducing a real, positive weighting function p(x). The
weighted Lp norm of the difference function f(x) − F(x) is then written:

||f(x) − F(x)||p = [∫_{0}^{1} |f(x) − F(x)|^p p(x) dx]^{1/p}        (1.53d)
These expressions are applied when calculating filter coefficients and in realization problems – in
particular for the scaling of internal data in memories and for noise estimation.

1.5 Sampling

Sampling consists of representing a signal function s(t) by its value s(nT) taken at whole multiples
of the time interval T, called the sampling period. Such an operation can be analyzed in a simple
and concise way by using distribution theory. In fact, by definition, the distribution of unit masses at
whole multiple points on the axis, with period T, associates with the function s(t) the ensemble of its
values s(nT), where n is a whole number. Conforming to the notation given earlier, this distribution
is denoted by u(t) and is written:

u(t) = Σ_{n=−∞}^{∞} δ(t − nT)

The sampling operation affects the spectrum S(f ) of the signal. Considering the fundamental
relation (1.27), it appears that the spectrum U(f ) of the distribution u(t) is formed of lines of ampli-
tude 1/T at frequencies which are whole multiples of the sampling frequency f s = 1/T. Thus, u(t)
is expressed as a sum of elementary sinusoidal signals, collectively called carriers:

u(t) = (1/T) Σ_{n=−∞}^{∞} e^{j2πnt/T}        (1.54)
Hence, the set of values for the signal s(nT) corresponds to the product with the signal s(t) of
the ensemble of component signals which make up u(t). That is, the operation of sampling is an
amplitude modulation of the signal by an infinite number of carriers with frequencies which are
whole multiples of the sampling frequency f s = 1/T. Consequently, the sampled signal spectrum
includes the function S(f ), called the baseband, and other images or sidebands, which correspond
to the translation of the baseband by whole multiples of the sampling frequency. The operation of
sampling and its influence on the signal spectrum are represented in Figure 1.4.


Figure 1.4 The spectral incidence of sampling.



The sampled signal spectrum Ss(f) is expressed as the convolution of S(f) with U(f), so that:

Ss(f) = (1/T) Σ_{n=−∞}^{∞} S(f − n/T)        (1.55)
It is important to note that the function Ss (f ) is periodic – that is, the sampling has introduced a
periodicity into the frequency space, which constitutes a fundamental characteristic of the sampled
signals.
The sampling operation, as described above, which is called ideal sampling, may seem rather
unrealistic in that it would be difficult in practice to manipulate or reconstitute an instantaneous
signal value. Real samplers, or circuits which reconstitute samples, all possess a certain aperture
time. In fact, it can be shown that sampling, or the reconstitution of samples by pulses having a
given width, simply introduces a modification of the signal spectrum.
In effect, when sampling a signal x(t) by a set of pulses separated by period T, with width 𝜏 and
amplitude a, it is quite possible for a quantity 𝜎 n to be collected in the nth period, and this is written:
σn = a ∫_{nT−τ/2}^{nT+τ/2} s(t) dt
This quantity expresses the result of the convolution of the signal s(t) by the elementary pulse
i(t). The function which results in this case at the sampling time nT is the function s*i. That is, the
sampled signal does not have S(f ) for its spectrum, but the product:
S(f) I(f)

where I(f) denotes the Fourier transform of the elementary pulse i(t).
Similar reasoning applies in the case of reconstitution of samples with a duration 𝜏. In fact, it is
the convolution product of the samples s(nT) with the elementary pulse i(t) which is reconstituted.
Thus, we have the proposition:
Sampling or the reconstitution of samples by pulses with width 𝜏 can be treated as ideal sampling or
ideal reconstitution, provided that the signal spectrum is multiplied by the spectrum of the elementary
pulse train.
In practice, however, whenever τ is small in comparison with the period T, the correction is
negligible.
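For example, for reconstitution pulses of width τ = T/2, the correcting factor sin(πfτ)/(πfτ) at the edge of the baseband (f = fs/2) equals sin(π/4)/(π/4) ≈ 0.90, an attenuation of only about 0.9 dB (see also Exercise 1.3).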

1.6 Frequency Sampling


The type of sampling considered above is carried out on a temporal basis. Nevertheless, the general
characteristics are also applicable to frequency sampling.
Let us calculate the spectrum of a periodic function sp (t) with period T. Such a function can be
regarded as resulting from the convolution product of the function s(t) which takes values of sp (t)
over one period and is zero elsewhere, and the point distribution u(t). This results in the following
relation between the Fourier transforms:
Sp(f) = U(f) S(f) = (1/T) Σ_{n=−∞}^{∞} S(n/T) δ(f − n/T)        (1.56)
where we again meet the coefficients of the Fourier series expansion of the function sp (t). The case
where s(t) is a square pulse is represented in Figure 1.5.
It is apparent that the spectrum of the periodic function sp (t) is a set of lines forming a sam-
ple of the spectrum of one period of the function. Sampling in frequency space corresponds to a
periodicity in the time space. This is a useful interpretation for digital analysis of spectra.


Figure 1.5 The spectrum of an impulse train.

1.7 The Sampling Theorem

This theorem establishes the conditions under which the set of samples of a signal correctly repre-
sents that signal. A signal is said to be correctly represented by the set of its samples taken with the
periodicity T, if it is possible, from this set of values, to completely reconstitute the original signal.
The sampling has introduced a periodicity of the spectrum in frequency space. To restore the
original signal, this periodicity must be suppressed – that is, the image bands must be eliminated.
This can be achieved using a low-pass filter with a transfer function H(f ) which is equal to 1/f s up
to the frequency f s /2 and is zero at higher frequencies. At the output of such a filter, a continuous
signal appears, which can be expressed as a function of the values s(nT). The square wave impulse
response h(t) of the filter is written, using equation (1.10), as:
h(t) = sin(πt/T)/(πt/T)
The signal at the output of the filter, s(t), corresponds to the convolution product of the set s(nT)
with the function h(t). It is written as:
s(t) = ∫_{−∞}^{∞} [Σ_{n=−∞}^{∞} s(θ) δ(θ − nT)] (sin(π(t − θ)/T)/(π(t − θ)/T)) dθ

and hence,

s(t) = Σ_{n=−∞}^{∞} s(nT) (sin(π(t/T − n))/(π(t/T − n)))        (1.57)

This is the formula for interpolating signal values at points sited between the samples. It can
be verified that it reproduces s(nT) for multiples of the period T, and the reconstitution process is
represented in Figure 1.6.
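A minimal numpy sketch of formula (1.57), truncated to a finite set of samples (an assumption of this illustration; the exact formula uses the complete infinite set):

    import numpy as np

    T = 1.0
    n = np.arange(-40, 41)                      # finite window of samples
    f0 = 0.11                                   # signal frequency, below 1/(2T)
    samples = np.cos(2 * np.pi * f0 * n * T)

    def interpolate(t: float) -> float:
        # s(t) = sum_n s(nT) sinc(t/T - n); np.sinc(x) = sin(pi x)/(pi x)
        return float(np.sum(samples * np.sinc(t / T - n)))

    for t in (0.25, 0.5, 3.7):
        print(t, interpolate(t), np.cos(2 * np.pi * f0 * t))  # values agree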
In order for the calculated signal s(t) to be identical to the original signal, the spectrum S(f ) has to
be identical to the spectrum of the original signal. As shown in Figure 1.6, this condition is satisfied
if and only if the original spectrum does not contain any components with frequencies greater than
or equal to f s /2.

Figure 1.6 Reconstruction of the signal after sampling.

Figure 1.7 Spectrum aliasing.

If this is not the case, and the image bands overlap the baseband, as shown in Figure 1.7, a
foldover distortion or aliasing is introduced and the restoring filter produces a signal that is dif-
ferent from the original one. From this, the sampling theorem or Shannon’s theorem is derived:
A signal which does not contain any component with a frequency equal to or greater than a value
f m is completely determined by the set of its values at regularly spaced intervals of period T = 1/(2f m ).
Thus, the sampling frequency of a signal is determined by the upper limit of its frequency band.
In practice, the signal band is generally limited by filtering to a value below f s /2 before sampling
at frequency f s , in order that the restoring filter can be practically realizable.
It is interesting to note that the sampling frequency is determined by the bandwidth of the signal.
The reconstitution illustrated in Figure 1.6 was for a low-frequency signal with which a low-pass
filter was associated. It is apparent that the same reasoning can also be applied to a signal occupying
a restricted region in frequency space by using an associated band-pass filter. In particular, this
property is applicable to modulated signals and is used in certain types of digital filters.
The result given at the end of Section 1.1.2 enables sampling to be presented from another
viewpoint. Equation (1.57) shows that sampling corresponds to decomposition of the signal s(t) in
accordance with the set of orthogonal functions sin(π(t/T − n))/(π(t/T − n)), and Shannon’s theorem
simply expresses the condition for this set to form a signal decomposition basis (several
decomposition bases may exist).

1.8 Sampling of Sinusoidal and Random Signals

The properties given above can be clearly illustrated by sampling sinusoidal signals, whose features
are of use in numerous applications.

1.8.1 Sinusoidal Signals


Let the signal s(t) = cos(2πft + 𝜑), with 0 ⩽ 𝜑 ⩽ π/2, be sampled with the period T = 1/f s = 1. The
samples are given by the set s(n) such that:

s(n) = cos(2πfn + 𝜙)

If the ratio f/f s = f is a rational number, this becomes:

f = N1 ∕N2 with N1 and N2 as integers

Then,

s(n + N2 ) = cos[2𝜋f(n + N2 ) + 𝜙] = s(n)

The set exhibits the periodicity N 2 and comprises at most N 2 different numbers. On the other
hand, since the sampling frequency is more than twice the signal frequency, N1/N2 < 1/2. The
ensemble of N 2 different samples permits the representation of a number of sinusoidal signals
equal to the largest whole number less than N2/2. For example, if N2 = 8, with the ensemble of
numbers 2 cos(2πn/8 + φ), (n = 0, 1, …, 7), it is possible to represent samples of three sinusoidal
signals:

2 cos(2π(N1/8)t + φ)   with N1 = 1, 2, 3
Figure 1.8 represents this sampling for φ = 0, and in this particular case, four numbers are
sufficient: ±2 and ±√2.
If we then add to the three sinusoidal signals in Figure 1.8 the continuous signal with the value 1,
and the oscillating signal cos(πt) with frequency 1/2 and amplitude 1, the result is that when the
composite signal is sampled, zero values appear. This is true except for points which are multiples
of 8, where the value 8 is obtained as shown in Figure 1.9(a). The spectrum of this sum is obtained
directly by applying the relation:
cos x = (1/2)(e^{jx} + e^{−jx})
It is formed of lines with an amplitude of 1 at frequencies which are multiples of 1/8 (Figure 1.9(b)).
This spectrum has already been studied in Section 1.2, and equation (1.27) again applies.
Spectrum analyzers and digital frequency synthesizers use the fact that it is possible to produce
a range of sinusoidal signals from a limited set of numbers which are stored, for example, in a
computer memory.

Figure 1.8 Sampling the signals cos(2π(N/8)t).



Figure 1.9 (a) Sampling the signal s(t) = 1 + Σ_{n=1}^{3} 2 cos(2πnt/8) + cos(πt). (b) The corresponding spectrum.

1.8.2 Discrete Random Signals


If the random signal s(t) is sampled with the unit period T = 1, a discrete random signal results,
which, by definition, has the same probability distribution of amplitude. The results obtained for
the continuous case can be applied to the discrete one, particularly for the random signals which
are second-order stationary and ergodic [5].
Thus, the autocorrelation function of the discrete signal s(n) is the sequence r(n), such that:

r(n) = E[s(i)s(i − n)] (1.58)

This is the sampling of the autocorrelation function r xx (𝜏) of the continuous random signal
defined by expression (1.34). Its Fourier transform gives the power spectrum density Φd (f ) of the
discrete signal, which is related to the spectral density Φxx (f ) of the continuous signal by a relation
similar to equation (1.55); that is,
Φd(f) = (1/T) Σ_{n=−∞}^{∞} Φxx(f − n/T)        (1.59)

If the sampling frequency is not high enough, or if the spectrum Φxx (f ) spans an infinite domain,
aliasing takes place.
The hypothesis of ergodicity for the discrete signal s(n) leads to the relation:

r(n) = lim_{N→∞} (1/(2N + 1)) Σ_{i=−N}^{N} s(i) s(i − n)        (1.60)

This relation gives the opportunity to extend the concept of autocorrelation function to determin-
istic signals. Then, for a periodic signal with period N 0 , the autocorrelation function is the sequence
r(n) given by:

r(n) = (1/N0) Σ_{i=0}^{N0−1} s(i) s(i − n)        (1.61)

It is a periodic sequence with the same period.



Example:

s(n) = A sin(2πn/N0)

r(n) = (A²/N0) Σ_{i=0}^{N0−1} sin(2πi/N0) sin(2π(i − n)/N0)

r(n) = (A²/2) cos(2πn/N0)

The period of r(n) is N0 and r(0) is the signal power.
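This result is easy to verify numerically (a short numpy sketch; the names are my own):

    import numpy as np

    A, N0 = 3.0, 16
    i = np.arange(N0)
    s = A * np.sin(2 * np.pi * i / N0)

    # periodic autocorrelation (1.61); np.roll gives s(i - n) modulo N0
    r = np.array([np.mean(s * np.roll(s, n)) for n in range(N0)])
    print(np.allclose(r, (A**2 / 2) * np.cos(2 * np.pi * i / N0)))  # True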
A discrete random signal can also be defined directly. For example, if r(n) = 0 for n ≠ 0, the
signal s(n) is a discrete white noise, and the spectral density is constant over the frequency
interval [−1/2, 1/2]. This signal represents a physical reality – it is a sequence of noncorrelated
random variables, and it can be obtained through an algorithm which produces statistically
independent numbers.

1.8.3 Discrete Noise Generation


It has been shown in Section 1.8.1 that the generation of digital sinusoidal signals is particularly
simple. Such signals can be used to simulate noise, for example, by addition of a large number
of sine waves of different frequencies and constant amplitudes, having random or pseudorandom
phases. This approach can lead to particularly efficient realizations, like that which has been
standardized for measuring equipment in digital telephone transmission and is as follows:
A pseudorandom sequence is created, which is a periodic sequence of 2^N − 1 bits comprising
approximately as many “zeros” as “ones” and which simulates a random sequence in which the
bits are independent and have the probability 1/2 of having a value of 0 or 1 (or, in order to center
the variables, a value of ±1/2).
If such a set is filtered (filtering in fact consists of weighted summation), the numbers obtained
after filtering obey a probability distribution which approximates the normal distribution.
Pseudorandom sequences are studied in Ref. [6] and can be obtained easily using a suitably
looped N-bit shift register. Figure 1.10 gives an example used in measuring equipment where
N = 17. The generator polynomial is written:
g(x) = 1 + x³ + x¹⁷        (1.62)
The sequence contains 2^N − 1 = 131071 bits, and it is periodic with a period T = (2^N − 1)τ, where
𝜏 = 1/f H denotes the clock period. The spectrum is formed of lines 1/T apart. For f H = 370 kHz,
the distance between the lines is 2.8 Hz and there are 36 lines per 100 Hz.


Figure 1.10 Pseudorandom generator and the probability distribution after filtering.
20 1 Signal Digitizing – Sampling and Coding

By applying to this set a narrowband filter which passes only the band 450–550 Hz, an approx-
imately Gaussian signal is obtained. The signal has a peak factor of 10.5 dB and is an excellent
test signal for digital transmission equipment. If the filtering is performed numerically, the set of
numbers obtained can be used to test digital processing equipment.
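A minimal Python sketch of the shift register of Figure 1.10 (pure Python; the tap indexing follows the usual convention for g(x) = 1 + x³ + x¹⁷, i.e. the new bit is the exclusive OR of stages 3 and 17, and the names are my own):

    def lfsr_sequence(n_bits: int):
        state = [1] * 17                 # any nonzero initial state works
        out = []
        for _ in range(n_bits):
            new = state[2] ^ state[16]   # taps corresponding to x**3 and x**17
            out.append(state[16])        # output taken from the last stage
            state = [new] + state[:-1]   # shift by one position
        return out

    seq = lfsr_sequence(2**17 - 1)
    print(len(seq), sum(seq))            # 131071 bits: 65536 ones, 65535 zeros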

1.9 Quantization

Quantization is the approximation of each signal value s(t) by a whole multiple of an elementary
quantity q which is called the quantizing step. If q is constant for all signal amplitudes, the quan-
tization is said to be uniform. This operation is carried out by passing the signal through a device
which has a staircase characteristic and produces the signal sq (t), as shown in Figure 1.11 for q = 1.
The way in which the approximation is made defines the centering of this characteristic. For
example, the diagram represents the case (called rounding), where each value of the signal between
(n − 1/2)q and (n + 1/2)q is rounded to nq. This approximation minimizes the power of the error
signal. It is also possible to have approximation by default, which is obtained by truncation and
consists of approximating every value between nq and (n + 1)q by nq; the characteristic is therefore
displaced by q/2 toward the right on the abscissa.
The effect of this approximation is to superimpose on the original signal an error signal e(t) called
the quantizing distortion or, more commonly, the quantizing noise. Thus:

s(t) = sq (t) + e(t) (1.63)

The case of rounding is illustrated in Figure 1.12. The amplitudes at odd multiples of q/2 are
called the decision amplitudes. The amplitude of the error signal lies between −q/2 and q/2, and
its power measures the degradation undergone by the signal.
When the variations in the signal are large relative to the quantizing step – that is, when quanti-
zation has been carried out sufficiently finely – the error signal is equivalent to a set of elementary
signals which are each formed from a straight-line segment (Figure 1.13). The power of such an

Figure 1.11 The quantization operation.


Figure 1.12 Quantization error.


Figure 1.13 Elementary error signal.

elementary signal of width 𝜏 is written:


B = (1/τ) ∫_{−τ/2}^{τ/2} e²(t) dt = (1/τ) ∫_{−τ/2}^{τ/2} (q/τ)² t² dt = q²/12        (1.64)
The value obtained in this way, B = q²/12, is a satisfactory estimate of the power of the quantizing
noise in the majority of actual cases.
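A quick simulation check of this estimate (a numpy sketch; the names are my own): quantize by rounding a signal whose variations are large relative to the step q and measure the error power.

    import numpy as np

    rng = np.random.default_rng(1)
    q = 0.05
    s = rng.normal(0.0, 1.0, 200_000)    # sigma >> q: fine quantization
    sq = q * np.round(s / q)             # rounding characteristic
    e = s - sq
    print(np.mean(e**2), q**2 / 12)      # the two values are close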
The spectral distribution of the error signal is more difficult to discern. The spectrum of
the elementary error signal in Figure 1.13, Eτ (f ), can be derived from its derivative. Using
expressions (1.22) and then (1.12), the following is obtained:

Eτ(f) = q (1/(j2πf)) [sin(πfτ)/(πfτ) − cos(πfτ)]        (1.65)
It appears that the majority of the energy is found around the frequency 1/𝜏. Under these
conditions, the spectral distribution of the error signal depends both on the slope of the elemen-
tary signal – that is, on the statistical distribution of the derivative s′ (t) of the signal – and on
the size of the step q relative to the signal. Reference [7] gives the calculation of this spectrum
for a noise signal and shows the spread as a function of frequency, when the quantizing step is
sufficiently small, to be on a range several hundred times the width of the signal band. If the signal
to be quantized is not random, the spectrum of the error signal will be concentrated on certain
frequencies – for example, the harmonics of a sinusoidal signal.

When converting an analog signal into digital form, quantization occurs, along with the sam-
pling, as the two operations are carried out in succession. While sampling is normally carried out
first, it is equally valid to carry out the quantization first and the sampling second, at a frequency
f s which is usually a little over twice the bandwidth of the signal. Under these conditions, the error
signal often has a spectrum which extends beyond the sampling frequency, and since it is actually
the sum of the signal and the error signal which is sampled, aliasing occurs, and the whole energy
of the error signal is recovered in the frequency band ( −f s /2, f s /2). In the majority of cases, the
spectral energy density of the quantizing noise is constant, and the following statement can be
made:

The noise produced during uniform quantization with a step q has a power which is gener-
ally expressed by B = q2 /12 and shows a constant spectral distribution in the frequency band
(−f s /2, f s /2).

It should be noted that the quantization of small signals (those with amplitude of the same order
of magnitude as step q) depends critically on the centering of the characteristic. For example, with
the centering in Figure 1.11, a sinusoidal signal with an amplitude of less than q/2 is totally sup-
pressed. It is possible, nevertheless, to suitably code these small signals by superimposing on them a
large-amplitude auxiliary signal which is removed later in the process. Thus, the coding introduces
limits on the small amplitudes of the signal. Equally, however, other limits appear in the case of
large amplitudes, as will be seen below.

1.10 The Coding Dynamic Range


The signal which is sampled and quantized in amplitude is represented by a set of numbers which
are almost always in binary form. If each number has N bits, the maximum number of quantized
amplitudes that it is possible to represent is 2^N. Thus, the range of amplitudes that can be coded is
subject to a twofold limitation – at low values, it is limited by the quantum q, and at larger values
by 2^N q. Any amplitude that exceeds this value cannot be represented and the signal is clipped.
This results in degradation, so that, for example, if the signal is sinusoidal, harmonic distortion is
introduced.
If the range of amplitudes to be coded covers the domain [ −Am , Am ], we get:
Am = 2^N q/2        (1.66)

and, with rounding, the error signal e(t) is:

|e(t)| ≤ Am 2^{−N}
By definition, the peak power of a coder is the power of that sinusoidal signal which has the
maximum possible amplitude, Am , which the coder will pass without clipping. It is expressed by:

Pc = (1/2) (2^N q/2)² = 2^{2N−3} q²
Figure 1.14 illustrates this signal together with the quantizing step and the decision amplitudes.
The coding dynamic range is defined as the ratio between this peak power and the power of
the quantizing noise; this is, in fact, the maximum value of the signal-to-noise ratio (SNR) for a
sinusoidal signal with uniform coding. The following formula expresses this dynamic range:
Pc/B = (S/B)max = 2^{2N−3} · 12 = (3/2) · 2^{2N}


Figure 1.14 Peak power of the coder.

This can be expressed more conveniently in dB as:


Pc/B = 6.02 N + 1.76 dB        (1.67)
This formula, which is of great practical use, relates the number of bits in the coding to the range
of amplitudes which can be coded.
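For example, an 8-bit coder offers a dynamic range of 6.02 × 8 + 1.76 ≈ 49.9 dB, and a 16-bit coder about 98.1 dB.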
The signal to be coded is, in general, not sinusoidal. However, it is still possible to treat this case if
an equivalent peak power can be defined for the signal, which can then be taken as the peak power
of the coder. This occurs, for example, with multiplexed telephone signals. The case of random
Gaussian signals is of particular importance because they conveniently represent a large number
of signals which are encountered in practice. The maximum amplitude of the coder must then
be correctly positioned relative to the signal amplitude so that the distortion introduced by peak
limitation remains within the imposed limits.
It is evident from the table given in Appendix 2 that the probability is less than 10⁻³ that a signal
exceeds 3.4σ when it has a zero mean and a power of σ². Figure 1.15 gives an example of coding
where σ = q. It would appear that the probability of clipping is less than 5 × 10⁻⁴ for the chosen
parameters.
Finally, in order to obtain the maximum SNR, it is necessary to consider the peak SNR
expressed by:
(SNR)c = Am² / (q²/12) = 3 · 2^{2N}
and subtract the peak factor F c . Thus, the general expression for the maximum SNR is:
(SNR)max = 6.02 N + 4.77 − Fc dB (1.67bis)
This result is used not only in specifying analog signal coders but also in digital processing for
determining memory sizes and scaling internal data.
The coding dynamic range for a given number of bits can be considerably increased if coding
is carried out with a quantizing step which varies with the amplitude of the signal. This is
called nonlinear coding. Numerous variations can be envisaged, but of particular importance
is the 13-segment coding law which has been standardized by the International Telegraph and
Telephone Consultative Committee (CCITT) for the coding of signals in telecommunications
networks [8].


Figure 1.15 Coding of a Gaussian signal.

1.11 Nonlinear Coding with the 13-segment A-law


In nonlinear coding using the 13-segment A-law, the positive and negative amplitudes which are
to be coded are divided into 7 ranges, each with a quantizing step which is related to an elementary
step q by some power of 2. This operation can be regarded as resulting from linear coding which
was preceded by a compression in which the signal x is transformed into the signal y according to
the following relations:
y = sign(x) (1 + log(A|x|))/(1 + log A)   for 1/A ⩽ |x| ⩽ 1

y = sign(x) A|x|/(1 + log A)              for 0 ⩽ |x| ⩽ 1/A        (1.68)
The parameter A controls the dynamic range increase; the value used is A = 87.6.
The compression characteristic of the 13-segment A-law is shown in Figure 1.16.
The operation can be described as follows:
if 0 ⩽ |x| ⩽ 1/64,     y = 16x
if 1/64 ⩽ |x| ⩽ 1/32,  y = 8x + 1/8
if 1/32 ⩽ |x| ⩽ 1/16,  y = 4x + 1/4
if 1/16 ⩽ |x| ⩽ 1/8,   y = 2x + 3/8
if 1/8 ⩽ |x| ⩽ 1/4,    y = x + 1/2
if 1/4 ⩽ |x| ⩽ 1/2,    y = x/2 + 5/8
if 1/2 ⩽ |x| ⩽ 1,      y = x/4 + 3/4


Figure 1.16 The 13-segment compression A-law.

This characteristic causes seven straight-line segments to appear in both the positive and negative
quadrants. As the two segments which encompass the origin are colinear, the characteristic has a
total of 13 segments.
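A minimal Python sketch of the smooth compression relations (1.68), of which the 13 segments are a piecewise-linear approximation (standard library only; the function name is my own, and log denotes the natural logarithm here):

    import math

    A = 87.6

    def alaw_compress(x: float) -> float:
        ax = abs(x)                                   # x normalized to [-1, 1]
        if ax < 1.0 / A:
            y = A * ax / (1.0 + math.log(A))          # linear part near the origin
        else:
            y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))
        return math.copysign(y, x)

    for x in (1/64, 1/16, 1/4, 1.0):
        print(x, alaw_compress(x))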
Since the quantization of the y amplitudes is carried out with the quantum q, quantization of
the x amplitudes near the origin is based on a quantum q/16 – that is, the dynamic of the coder
is increased by 24 dB. Amplitudes close to unity are less well quantized as the step is multiplied
by 4. The power of the quantization noise is a function of the signal amplitude – it is necessary to
calculate an average for each value, and this interferes with the statistics of the signal.
Figure 1.17 gives the SNR for a Gaussian signal as a function of the signal level after coding into
8 bits for linear and nonlinear coding. The reference level for the signal (0 dB) is the peak power
of the coder, and it is clearly apparent that the dynamic is extended by nonlinear coding. For low
amplitudes, quantization in fact corresponds to 12 bits. In practice, the signal can be coded by linear
quantization into 12 bits, followed by a process which is very close to the conversion of an integer
into a floating-point number:


Figure 1.17 Eight-bit linear and nonlinear coding of a Gaussian signal.



For example, the 12-bit number

+0 0 0 1 0 1 1 0 1 1 0

corresponds to the 8-bit one

+1 0 0 0 1 1 0

after application of the compression law.


The three bits which follow the sign give the code for the exponent; the four following bits give
the position within the segment or the mantissa. The difference in the conversion from a whole
number to a floating-point number appears in the vicinity of the origin.
The implementation can use an array of gates arranged in parallel, or a shift register combined
with a 3-bit counter, for serial realization. Equally, one could use memories holding the conversion
table.
Another nonlinear coding law is also widely used in telecommunications – the 15-segment 𝜇-law,
which follows the relation:
y = sign(x) log(1 + μ|x|)/log(1 + μ)   for −1 ⩽ x ⩽ 1        (1.69)
The compression parameter is 𝜇 = 255.

1.12 Optimal Coding

The coding can be improved when the probability distribution p(x) of the signal amplitude is
known. For a given number of bits N, an optimal quantizing characteristic can be found, which
minimizes the total quantizing distortion.
The signal amplitude range is divided into M = 2^N subsets (x_{i−1}, x_i) with −(M/2) + 1 ⩽ i ⩽ M/2,
and every subset is represented by a value y_i, as shown in Figure 1.18. The optimization consists of
determining the set of values x_i and y_i which minimize the error signal power E² expressed by:

E² = Σ_{i=−M/2+1}^{M/2} ∫_{x_{i−1}}^{x_i} (x − y_i)² p(x) dx


Figure 1.18 Optimal quantization characteristic.



Taking the derivative with respect to xi and yi , it can be shown that the following relations must
hold:
x_i = (y_i + y_{i+1})/2        for −M/2 + 1 ⩽ i ⩽ M/2 − 1

∫_{x_{i−1}}^{x_i} (x − y_i) p(x) dx = 0        for −M/2 + 1 ⩽ i ⩽ M/2

p(x_{M/2}) = p(x_{−M/2}) = 0        (1.70)

These relations lead to the determination of the quantizing characteristic. If p(x) is an even
function, x0 = 0, and an iterative procedure is used, starting with an a priori choice for y1 . If
relation (1.70) is not satisfied for M/2, another initial choice is made for y1 and so on [9].
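A minimal Python sketch of such an optimization for a unit Gaussian signal (standard library only; this alternating fixed-point form of the relations (1.70) is one possible implementation, not necessarily the procedure of Ref. [9]):

    from statistics import NormalDist

    nd = NormalDist()

    def lloyd_max(n_bits: int, iters: int = 200):
        M = 2 ** n_bits
        x = [-4 + 8 * i / M for i in range(M + 1)]   # initial thresholds
        x[0], x[M] = float("-inf"), float("inf")
        for _ in range(iters):
            # centroid condition: y_i = E[x | x_{i-1} < x < x_i]
            y = []
            for a, b in zip(x[:-1], x[1:]):
                pa = nd.pdf(a) if a != float("-inf") else 0.0
                pb = nd.pdf(b) if b != float("inf") else 0.0
                y.append((pa - pb) / (nd.cdf(b) - nd.cdf(a)))
            # midpoint condition: x_i = (y_i + y_{i+1}) / 2
            for i in range(1, M):
                x[i] = 0.5 * (y[i - 1] + y[i])
        return x, y

    x, y = lloyd_max(2)
    print([round(v, 4) for v in x[1:-1]])   # ~[-0.9816, 0.0, 0.9816]
    print([round(v, 4) for v in y])         # ~[-1.510, -0.4528, 0.4528, 1.510]

For N = 2 bits, the resulting quantizer has error power close to 0.1175, in agreement with Table 1.1.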
Table 1.1 gives the error signal power obtained with a Gaussian signal of unit power, for
several numbers of bits N, for the optimal coding and for uniform coding with the best
scaling of the quantizing characteristic [9]. Table 1.2, taken from Reference [10], gives, for
the same conditions, the values which correspond to a signal probability density following
expression (1.52).
The coding optimization can also be carried out with respect to the information content, by intro-
ducing the concept of entropy H defined as follows [2, Vol. 3]:

H = −Σ_i p_i log₂(p_i)        (1.71)

with
−M/2 + 1 ⩽ i ⩽ M/2,
and where pi designates the probability that the signal is in the amplitude subrange represented
by yi .
Considering that:

Σ_i p_i = 1

Table 1.1 Coding of unitary Gaussian signal.

N                       1        2        3         4         5

E² (optimal coding)     0.3634   0.1175   0.03454   0.0095    0.0025
E² (uniform coding)     0.3634   0.1188   0.03744   0.01154   0.00349
Entropy H               1        1.911    2.825     3.765     4.730

Table 1.2 Coding of unitary Laplacian signal.

N                       1        2        3         4         5

E² (optimal coding)     0.5      0.1765   0.0548    0.0154    0.00414
E² (uniform coding)     0.5      0.1963   0.0717    0.0254    0.0087

the entropy is zero when the amplitude is concentrated in a single subrange. It is maximal when
the signal amplitude is uniformly distributed, in which case it takes the value Hmax equal to the
number of bits N of the coder:
Hmax = log2 M = N (1.72)
In fact, the entropy measures the difference between a given distribution and the uniform distri-
bution. The quantizing characteristic which maximizes the entropy is that which leads to amplitude
subranges corresponding to a uniform probability distribution.
The last row in Table 1.1 shows that for a Gaussian signal, the quantizing law which minimizes
the error signal power leads to entropy values close to the maximum N.

1.13 Quantity of Information and Channel Capacity

The results obtained for sampling and quantization can be used, inversely, to evaluate the quantity
of information carried by a signal or to determine the capacity of a transmission channel.
A real channel of bandwidth fm can carry 2fm independent samples per second, as shown in
Figure 1.4, by replacing 1/T with 2fm. The quantity of information per sample depends on the relative
powers of the useful signal, the noise, and their amplitude distributions.
An important particular case is the Gaussian channel [11].
Let us assume a set of M symbols of N bits each is to be transmitted by a channel in the presence
of white Gaussian noise of power B = σb².
In an M-dimensional hyperspace, the M symbols occupy the volume of a hypersphere V_M
defined by:

V_M = ∫_{0}^{R} r^{M−1} dr ∫···∫ Π_{i=1,…,M−1} f(θ_i) dθ_i = (R^M/M) F_θ        (1.73)
If a uniform distribution of the symbols in the hypersphere with radius R is assumed, the energy
of the corresponding signal is:
E_S = (1/V_M) ∫_{0}^{R} r² r^{M−1} dr ∫···∫ Π_{i=1,…,M−1} f(θ_i) dθ_i = (M/(M+2)) R²        (1.74)
The quantity of transmitted information is M⋅N bits. In the hypersphere, it is possible to associate
with each set of bits a volume V s expressed by:
V_s = V_M / 2^{MN} = (1/M) (R/2^N)^M F_θ        (1.75)
Now, an M-component noise with energy E_b = Mσb² is assigned to each set of bits. When M
tends toward infinity, the point representing the noise in the hypersphere comes close to a sphere
with radius σb√M centered on the point representing the set of bits. In fact, for M Gaussian
random variables b(n), the variable r = √(Σ_{n=1}^{M} b²(n)) has the first-order moment
m₁ = σb√(M²/(M+1)), and its variance m₂ − m₁² tends toward zero when M tends toward infinity.
The volume of the sphere is:

V_b = ((√M σb)^M / M) F_θ        (1.76)

The condition for the absence of transmission errors is that the volume of the sphere be included
in the volume assigned to each set of bits, which leads to:
√M σb < R/2^N        (1.77)
However, when M tends toward infinity, according to (1.74), R² represents the energy of the sum
of the signal with power S and the noise:

R² = M(S + σb²)        (1.78)

Then, there are no transmission errors if the following inequality is satisfied:


2^{2N} < (S + σb²)/σb²        (1.79)
Hence:
N < (1/2) log₂(1 + S/σb²)        (1.80)
Assuming a real channel of bandwidth W and no distortion, the symbols may be emitted at the
rate 2W, and the asymptotic capacity of the channel, in bits per second, is:
C = W log₂(1 + S/B)        (1.81)
The assumptions are worth emphasizing:

– distortion-free channel
– white Gaussian noise
– infinite transmission delay

In practice, channel equalization and error-correcting codes make it possible to approach that
limit with finite transmission delay.
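For example, a distortion-free channel of bandwidth W = 3 kHz with a signal-to-noise ratio of 30 dB (S/B = 1000) has a capacity C = 3000 log₂(1001) ≈ 29.9 kbit/s.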

1.14 Binary Representations

There are several ways of establishing the correspondence between the set of quantized amplitudes
and the set of binary numbers which must represent them. As the signals to be coded generally have
both positive and negative amplitudes, the preferred representations are those which preserve the
sign information. The following are the most usual representations:

(1) Sign and absolute value
(2) Offset binary
(3) 1s-complement
(4) 2s-complement

Table 1.3 defines them for three bits.


The most useful representations for analog-to-digital conversion are sign and absolute value,
and offset binary. The remaining two representations are mostly used in arithmetic circuits. As
mentioned in Section 1.11, nonlinear coding brings a considerable increase of the dynamic range.

Table 1.3 Binary representation for linear code.

Number   Sign and value   Offset binary   1s complement   2s complement

+3       011              111             011             011
+2       010              110             010             010
+1       001              101             001             001
+0       000              100             000             000
−0       100              –               111             –
−1       101              011             110             111
−2       110              010             101             110
−3       111              001             100             101
(−4)     –                (000)           –               (100)

Digital processing machines, and particularly general-purpose ones, often use floating-point
representations in which each number has three parts: the sign bit, the mantissa, and the expo-
nent. The mantissa represents the fractional part, and the exponent is a power of the base number,
for example, with base 10: +0.719 × 10⁵.
The dynamic range extension comes from the multiplicative effect introduced by the exponent.
For example, in base 2 for a 6-bit exponent and 16-bit mantissa, the dynamic range is 2^{2^6} × 2^{16} =
2^{80} ≃ 10^{24} – that is, 24 decimal digits. Additional gain is achieved by choosing a base which is,
itself, a power of two, such as 8 or 16, leading to octal or hexadecimal operations.

Figure 1.19 The reduced normal distribution: P as a function of 20 log λ.

1.A Appendix 1: The Function I(x)


The set of values:
I(n) = sin(π(n/20))/(π(n/20))
for 0 ⩽ n ⩽ 159 when n = k + 20N, is given in the table below.

k N=0 N=1 N=2 N=3 N=4 N=5 N=6 N=7

0 1 0 0 0 0 0 0 0
1 0.99589 −0.04742 0.02429 −0.01633 0.01229 −0.00986 0.00823 −0.00706
2 0.98363 −0.08942 0.04684 −0.03173 0.02399 −0.01929 0.01613 −0.01385
3 0.96340 −0.12566 0.06721 −0.04588 0.03482 −0.02806 0.02350 −0.02021
4 0.93549 −0.15591 0.08504 −0.05847 0.04455 −0.03598 0.03018 −0.02599
5 0.90032 −0.18006 0.10004 −0.06926 0.05296 −0.04287 0.03601 −0.03105
6 0.85839 −0.19809 0.11196 −0.07804 0.05989 −0.04850 0.04088 −0.03528
7 0.81033 −0.21009 0.12069 −0.08466 0.06520 −0.05301 0.04466 −0.03859
8 0.75683 −0.21624 0.12614 −0.08904 0.06880 −0.05606 0.04730 −0.04091
9 0.69865 −0.21682 0.12832 −0.09113 0.07065 −0.05769 0.04874 −0.04220
10 0.63662 −0.21221 0.12732 −0.09095 0.07074 −0.05787 0.04897 −0.04244
11 0.57162 −0.20283 0.12329 −0.08856 0.06910 −0.05665 0.04800 −0.04164
12 0.50455 −0.18921 0.11643 −0.08409 0.06581 −0.05406 0.04587 −0.03983
13 0.43633 −0.17189 0.10702 −0.07770 0.06099 −0.05020 0.04265 −0.03707
14 0.36788 −0.15148 0.09538 −0.06960 0.05479 −0.04518 0.03844 −0.03344
15 0.30011 −0.12862 0.08185 −0.06002 0.04739 −0.03914 0.03335 −0.02904
16 0.23387 −0.10394 0.06682 −0.04924 0.03989 −0.03298 0.02751 −0.02399
17 0.17001 −0.07811 0.05071 −0.03753 0.02980 −0.02470 0.02110 −0.01841
18 0.10929 −0.05177 0.03392 −0.02522 0.02007 −0.01667 0.01426 −0.01245
19 0.05242 −0.02544 0.01688 −0.01261 0.01006 −0.00837 0.00716 −0.00626

1.B Appendix 2: The Reduced Normal Distribution

f(x) = (1/√(2π)) e^{−x²/2};        P = (2/√(2π)) ∫_{λ}^{∞} e^{−x²/2} dx

x        10⁵·f(x)        100P        λ        λ        100P

0 39,894 100 0 0 100


0.2 39,104 95 0.0627 0.2 84.148
0.4 36,827 90 0.1257 0.4 68.916
0.6 33,322 85 0.1891 0.6 54.851
0.8 28,969 80 0.2533 0.8 42.371
1 24,197 75 0.3186 1 31.731

1.2 19,419 70 0.3853 1.2 23.014


1.4 14,973 65 0.4538 1.4 16.151
1.6 11,092 60 0.5244 1.6 10.960
1.8 7,895 55 0.5978 1.8 7.186
2 5,399 50 0.6745 2 4.550
2.2 3,547 45 0.7554 2.2 2.781
2.4 2,239 40 0.8416 2.4 1.640
2.6 1,358 35 0.9346 2.6 0.932
2.8 792 30 1.0364 2.8 0.511
3 443 25 1.1503 3 0.270
3.2 238 20 1.2816 3.2 0.137
3.4 123 15 1.4395 3.4 0.067
3.6 61 10 1.6449 3.6 0.032
3.8 29 5 1.9600 3.8 0.014
4 13 1 2.5758 4 0.006
4.2 5.9 0.1 3.2905 4.5 0.00068
4.4 2.5 0.01 3.8906 5 0.000057
4.6 1 0.001 4.4172 5.5 0.000004
0.0001 4.8916
0.00001 5.3267

Approximation for large values of 𝝀:

P ≈ (3/4) (1/λ) e^{−λ²/2}

Exercises

1.1 Consider the Fourier series expansion of the periodic function i(t) of period T, which is zero
throughout the period except for the range − 𝜏/2 ⩽ t ⩽ 𝜏/2, where it has a value of 1.
Give the value of the coefficients for 𝜏 = T/2 and 𝜏 = T/3.
Verify that the expansion leads to i(0) = 1 and draw the function when the expansion is
limited to 5 terms.

1.2 Analyze the sampling at the frequency f s of the signal s(t) = sin (𝜋f s t + 𝜑) when 𝜑 varies
from 0 to 𝜋/2.
Examine the reconstitution of this signal from the samples.
Exercises 33

1.3 Calculate the amplitude distortion introduced into a signal reconstituted by pulses with a
width of half the sampling period.

1.4 A signal occupies the frequency band [f 1 , f 2 ]. What conditions should be imposed on the
frequency f 1 so that this signal can be sampled directly at a frequency between f 2 and 2f 2 ?

1.5 Analyze the sampling of a signal given by:


si(t) = Σ_{n=1}^{3} sin(2πnt/(8T))

and compare it with that of the signal:

sr(t) = Σ_{n=1}^{3} cos(2πnt/(8T))
Show by studying the spectra that the combination of the two sets of samplings forms the
sampling of a complex signal.

1.6 Let s(t) be the signal defined by:


s(t) = 1 + Σ_{k=1}^{3} 2 cos(2πkt/8 + φk) + cos(πt + φ4)
This signal is sampled with the period T = 1. What is the maximum value of s(n), where n
is a whole number? Show that there exists a set of values 𝜑k (k = 1,2,3,4) which minimizes
the maximum value of s(n). Can this property be generalized?

1.7 A digital frequency synthesizer is constructed from read-only memory of 16 kbits with an
access time of 500 ns. Knowing that the numbers which represent the samplings of sinu-
soidal signals total 8 bits, what are the characteristics of the synthesizer, and the frequency
range and increment step that can be obtained?

1.8 What is the probability distribution of the amplitudes of the sinusoidal signal:
s(t) = A cos(2πt/T)
Give its autocorrelation function. Give the autocorrelation function of a stationary random
Gaussian function whose spectrum has a uniform distribution in the frequency band
(f 1 , f 2 ).

1.9 Calculate the spectrum of a set of impulses with width T/2, separated by T, the occurrence
of each pulse having probability p. In particular, examine the case where p = 1/2.
What happens to the spectrum if these pulses form a pseudorandom sequence with a length
2⁴ − 1 = 15 produced by a 4-bit shift register, following the polynomial g(x) = x⁴ + x + 1?

1.10 A sinusoidal signal with frequency 1050 Hz is sampled at 8 kHz and coded into 10 bits. What
is the maximum value of the SNR? What is the value of the signal-to-quantization-noise ratio
measured in the frequency band 300–500 Hz? What are the values if the sampling frequency
is increased to 16 kHz?
34 1 Signal Digitizing – Sampling and Coding

1.11 The sinusoidal signal sin (2𝜋t/8 + 𝜑) with 0 ⩽ 𝜑 ⩽ 𝜋/2 is sampled with period T = 1 and
coded into 5 bits.
In the case where 𝜑 = 0, calculate the power and the spectrum of the quantization noise.
How does this spectrum appear as a function of the phase 𝜑?

1.12 Consider a coding scale in which the quantizing step has a value q. What is the quantization
of the signal s1 (t) = 𝛼q sin (𝜔1 t) for −1 ⩽ 𝛼 ⩽ 1, as a function of the centering of the quan-
tization characteristic? Show the envelope of the restored signal after decoding and narrow
filtering around the frequency 𝜔1 .
The signal s2 (t) = 10q sin 𝜔2 t is superimposed on s1 (t). Show the envelope of the restored
signal under these conditions.

1.13 Assume a Gaussian signal is to be coded. How many bits would be required to have the
signal-to-quantization-noise ratio greater than 50 dB? Can this number be reduced if signal
clipping is allowed for 1% of this time?

1.14 The signal s(t) = A sin (2𝜋⋅810t) is coded into 8 bits. If the quantization step is q, trace the
curve which shows the signal-to-quantization-noise ratio as a function of the amplitude A
when this amplitude varies from q to 2⁷q. Sketch the corresponding curve for nonlinear
coding following the 13-segment A-law.

1.15 Calculate the limits of the amplitude subranges for the optimal 2-bit coding of a unit
Gaussian signal.

References

1 A. Papoulis, The Fourier Integral and its Applications, McGraw-Hill, New York, 1962.
2 E. Roubine, Introduction à la théorie de la communication, 3 vols, Masson, Paris, 1970.
3 W. B. Davenport, Probability and Random Processes, McGraw-Hill, New York, 1970.
4 J. R. Rice, The Approximation of Functions, vol. 1, Addison-Wesley, Reading, Mass, 1964.
5 B. Picinbono, Principles of Signals and Systems, Artech House Inc., London, 1988.
6 W. Peterson, Error Correcting Codes, MIT Press, 1972.
7 W. R. Bennett, Spectra of quantized signals. The Bell System Technical Journal, 1948.
8 CCITT, Digital Networks — transmission systems and multiplexing equipment. Yellow Book,
Vol. III, 3, Geneva, Switzerland, 1981.
9 J. Max, Quantizing for minimum distortion. IRE Transactions on Information Theory 6, 7–12,
1960.
10 M. D. Paez and T. H. Glisson, Minimum mean-squared error quantization in speech PCM and
DPCM systems. IEEE Transactions on Communications 20, 225–30, 1972.
11 C. Shannon, Communication in the presence of noise, Proceedings of I.R.E., Vol. 37, pp. 10–21,
Jan. 1949 (reprinted in: Proceedings of the IEEE, Sept. 1984 and Feb. 1998).

The Discrete Fourier Transform

The discrete Fourier transform (DFT) is introduced when the Fourier transform of a function is to
be calculated using a digital computer. This type of processor can handle only numbers and, in a
quantity limited by the size of its memory. It follows that the Fourier transform:

S(f) = ∫_{−∞}^{∞} s(t) e^{−j2πft} dt
must be adapted, by replacing the signal s(t) with the numbers s(nT) which represent a sample
of the signal, and by limiting to a finite value N the set of numbers on which the calculations are
carried out. The calculation then provides numbers S*(f ) defined by

S*(f) = Σ_{n=0}^{N−1} s(nT) e^{−j2πfnT}

As the computer has limited processing power, it can only provide results for a limited number
of values of the frequency f , and it is natural to choose multiples of a certain frequency step Δf .
Thus,

S*(kΔf) = Σ_{n=0}^{N−1} s(nT) e^{−j2πnkΔfT}

The conditions under which the calculated values form a good approximation to the required
values are examined below. An interesting simplifying choice is to take Δf = 1/NT. Then there are
only N different values of S*(k/NT), which is a periodic set of period N since:
S ∗ [(k + N)∕NT] = S ∗ (k∕NT)
On the other hand, the transform thus calculated appears as discrete values and, as shown in
Section 1.6, this property is characteristic of the spectrum of periodic functions. Thus, the set
S*(k/NT) is obtained by the Fourier transform of the set s(nT), which is periodic, with period NT.
The DFT and the inverse transform establish the relations between these two periodic sets.
The definition, properties, methods of calculation, and applications of the DFT have been
discussed in numerous publications. Overviews are given in References [1–4].


2.1 Definition and Properties of the Discrete Fourier Transform

If two sets of complex numbers, x(n) and X(k), which are periodic with period N, are chosen, then
the DFT and the inverse transform establish the following relationships between them:

X(k) = (1/N) Σ_{n=0}^{N−1} x(n) e^{−j2πnk/N}        (2.1)

x(n) = Σ_{k=0}^{N−1} X(k) e^{j2πkn/N}        (2.2)
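As a point of reference, these definitions translate directly into code (a minimal numpy sketch; the function names are my own, and note that the 1/N factor sits on the forward transform here, unlike the convention of some libraries):

    import numpy as np

    def dft(x):
        N = len(x)
        n = np.arange(N)
        W = np.exp(-2j * np.pi * np.outer(n, n) / N)   # W[k, n] = e^{-j2pi nk/N}
        return (W @ x) / N                             # eq. (2.1)

    def idft(X):
        N = len(X)
        n = np.arange(N)
        W = np.exp(2j * np.pi * np.outer(n, n) / N)
        return W @ X                                   # eq. (2.2), no 1/N

    x = np.random.default_rng(2).normal(size=8)
    print(np.allclose(idft(dft(x)), x))                # True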

The position of the scale factor 1/N is chosen so that the X(k) are the coefficients of the Fourier
series expansion of the set x(n). This transformation has the following properties.
Linearity: If x(n) and y(n) are two sets with the same period and with transforms X(k) and Y (k),
respectively, the set v(n) = x(n) + 𝜆y(n), where 𝜆 is a scalar, has the transform

V(k) = X(k) + 𝜆Y(k)

A translation of the x(n) implies a rotation of the phase of the X(k). If the transform X_{n0}(k) of
the set x(n − n0) is calculated, then,

X_{n0}(k) = (1/N) Σ_{n=0}^{N−1} x(n − n0) e^{−j2πnk/N} = X(k) e^{−j2πn0k/N}

A translation of the x(n) by n0 induces on X(k) a rotation of the phase through an angle equal
to 2𝜋n0 k/N.
Symmetry: If the set x(n) is real, the numbers X(k) and X(N − k) are complex conjugates:

X(N − k) = (1/N) Σ_{n=0}^{N−1} x(n) e^{−j2πn(N−k)/N} = X̄(k)

If the set x(n) is real and even, then so is the set X(k). Indeed, if x(N − n) = x(n) then, for example,
for N = 2P + 1:

X(N − k) = (1/N) [x(0) + 2 Σ_{n=1}^{P} x(n) cos(2πnk/N)] = X(k)

If the set x(n) is real and odd, the set X(k) is purely imaginary. In this case, x(N − n) = −x(n) and
x(0) = x(N) = 0. For example, for N = 2P + 1, this becomes:

X(k) = −(2j/N) Σ_{n=1}^{P} x(n) sin(2πnk/N) = −X(N − k)

It should be noted that X(0) = X(N) = 0.
As any real signal can always be decomposed into odd and even parts, these last two symmetry
properties are important.
Circular convolution: The transform of a convolution product is equal to the product of the
transforms.
If x(n) and h(n) are two sets with period N, the circular convolution y(n) can be defined by the
equation:

y(n) = Σ_{l=0}^{N−1} x(l) h(n − l)        (2.3)

This is a set which has the same period N. Its transform is written as:
Y(k) = Σ_{n=0}^{N−1} [Σ_{l=0}^{N−1} x(l) h(n − l)] e^{−j2πnk/N}

     = Σ_{l=0}^{N−1} x(l) [Σ_{n=0}^{N−1} h(n − l) e^{−j2π(n−l)k/N}] e^{−j2πlk/N}

Y(k) = (Σ_{n=0}^{N−1} h(n − l) e^{−j2π(n−l)k/N}) (Σ_{l=0}^{N−1} x(l) e^{−j2πlk/N}) = H(k) X(k)        (2.4)

This is an important property of the DFT. A direct application will be given later.
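A numerical check of this property (a numpy sketch; note that numpy's FFT places no 1/N on the forward transform, so in that convention the relation reads exactly DFT(y) = DFT(x)·DFT(h), while an extra factor N appears with the normalization of (2.1)):

    import numpy as np

    rng = np.random.default_rng(3)
    N = 8
    x, h = rng.normal(size=N), rng.normal(size=N)

    # circular convolution, eq. (2.3): y(n) = sum_l x(l) h((n - l) mod N)
    y = np.array([sum(x[l] * h[(n - l) % N] for l in range(N))
                  for n in range(N)])

    print(np.allclose(np.fft.fft(y), np.fft.fft(x) * np.fft.fft(h)))  # True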
Parseval’s relation: This relation states that the power of a signal is equal to the sum of the powers
of its harmonics. Thus,
(1/N) Σ_{n=0}^{N−1} x(n) x̄(n) = (1/N) Σ_{n=0}^{N−1} x(n) Σ_{k=0}^{N−1} X̄(k) e^{−j2πkn/N}

(1/N) Σ_{n=0}^{N−1} |x(n)|² = Σ_{k=0}^{N−1} X̄(k) (1/N) Σ_{n=0}^{N−1} x(n) e^{−j2πkn/N}

(1/N) Σ_{n=0}^{N−1} |x(n)|² = Σ_{k=0}^{N−1} |X(k)|²        (2.5)

Relations with Fourier series: Due to the presence of the scale factor 1/N in the definition (2.1)
of the DFT, the output terms X(k) represent, except for spectrum aliasing effects, the coeffi-
cients of the Fourier series development of the periodic signal, when this signal exhibits no
discontinuity. If it is not the case, noticeable discrepancies emerge. In fact, it can be shown
that, if there is a discontinuity in function x(t) at time t0 , the Fourier series development of
x(t) at time t0 equals the average of the left and right limits of x(t) when t tends toward t0 . By
contrast, the inverse DFT restores the exact original signal samples and, therefore, the terms
X(k) include the DFT of the distribution of the discontinuities with the inverse sign and scale
factor 0.5.
Illustration: x(t) is the following sawtooth signal:
x(t) = t; 0 ≤ t < 1

x(t + 1) = x(t)
Coefficients of the Fourier series:
C0 = 1/2;   Cn = j/(2πn);   n integer, n ≠ 0
The magnitude of the discontinuity at the origin is 1 and the DFT of order N gives the following
values:
X(k) = −1/(2N) + X′(k);   X′(k) ≈ Ck
This specificity of the DFT is an undesirable effect when a development with the smallest number
of non-negligible coefficients is sought, as in signal compression, for example. If the signal is real,
a symmetric signal is appended, which cancels the discontinuity and leads to the discrete cosine
transform described in Section 3.3.4.

However, the most important property of the DFT probably lies in the fact that it lends itself
to efficient calculation techniques. This property has won it a prominent position in digital signal
processing.

2.2 Fast Fourier Transform (FFT)

The equations defining the DFT provide a relationship between two sets of N complex numbers.
This is conveniently written in matrix form by setting

W = e^{−j2π/N}        (2.6)

The coordinates of the numbers W^n, and the coefficients of the DFT, appear on the unit circle in
the complex plane as shown in Figure 2.1, and are the roots of the equation Z^N − 1 = 0.
The matrix equation for the direct transform is as follows:

⎡ X0   ⎤         ⎡ 1   1         1          ···  1              ⎤ ⎡ x0   ⎤
⎢ X1   ⎥         ⎢ 1   W         W²         ···  W^{N−1}        ⎥ ⎢ x1   ⎥
⎢ X2   ⎥ = (1/N) ⎢ 1   W²        W⁴         ···  W^{2(N−1)}     ⎥ ⎢ x2   ⎥
⎢ ⋮    ⎥         ⎢ ⋮   ⋮         ⋮               ⋮              ⎥ ⎢ ⋮    ⎥
⎣ XN−1 ⎦         ⎣ 1   W^{N−1}   W^{2(N−1)} ···  W^{(N−1)(N−1)} ⎦ ⎣ xN−1 ⎦
For the inverse transform, it is sufficient to remove the factor 1/N and change W^n to W^{−n}.
The square matrix TN of order N exhibits obvious features – rows and columns with the same
index have the same elements and these elements are powers of a basic number such that W^N = 1.
Significant simplifications can be envisaged under these conditions, leading to algorithms for
fast calculation. An FFT is said to have been carried out when the DFT is calculated using such
algorithms.
An important case occurs when N is a power of 2 because it leads to algorithms which are simple
and particularly effective. These algorithms are based on a decomposition of the set to be trans-
formed into a number of interleaved subsets. The case of interleaving in the time domain will be
considered first, which leads to the so-called decimation-in-time algorithms.

Figure 2.1 Coordinates of the coefficients of a discrete Fourier transform.

2.2.1 Decimation-in-time Fast Fourier Transform


The set of elements x(n) can be decomposed into two interleaved sets – those with even indices
and those with odd ones. Using this decomposition, the first N/2 elements of the set X(k) can be
written as:
[X0, X1, …, X_{N/2−1}]ᵀ = [W^{2nk}] [x0, x2, x4, …, x_{N−2}]ᵀ + [W^{(2n+1)k}] [x1, x3, x5, …, x_{N−1}]ᵀ,
with 0 ⩽ n, k ⩽ N/2 − 1

If the matrix which multiplies the column vector of elements with even indices is denoted by
T N/2 , then the matrix multiplying the vector of elements with odd indices can be factorized into
the product T N/2 and a diagonal matrix so that,

⎡ X0 ⎤ ⎡ x0 ⎤ ⎡1 0 0 ··· 0⎤ ⎡ x1 ⎤
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ X1 ⎥ ⎢ x2 ⎥ ⎢0 W 0 ··· 0⎥ ⎢ x3 ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ X ⎥=T ⎢ x ⎥ + ⎢0 0 W2 ··· 0⎥ TN∕2 ⎢ x5 ⎥
⎢ 2
⎥ N∕2
⎢ 4
⎥ ⎢ ⎥ ⎢ ⎥
⎢ ⋮ ⎥ ⎢ ⋮ ⎥ ⎢⋮ ⋮ ⋮⎥ ⎢ ⋮ ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎣ N∕2−1 ⎦
X ⎣ 2(N∕2−1) ⎦ ⎣0
x 0 … W N∕2−1 ⎦ ⎣ N∕2−1 ⎦
x

Similarly, for the last N/2 elements of the set X(k), remembering that W N = 1, one can write:

⎡ XN∕2 ⎤ ⎡ x0 ⎤ ⎡1 0 0 ··· ⎤ 0 ⎡ x1 ⎤
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢XN∕2+1 ⎥ ⎢ x2 ⎥ ⎢1 W 0 ··· 0 ⎥ ⎢ x3 ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢X ⎥ ⎢ x ⎥ − ⎢0 W2 · · · 0 ⎥ TN∕2 ⎢ x5 ⎥
N∕2+2 = TN∕2 0
⎢ ⎥ ⎢ 4
⎥ ⎢ ⎥ ⎢ ⎥
⎢ ⋮ ⎥ ⎢ ⋮ ⎥ ⎢⋮ ⋮ ⋮ ⋮ ⎥ ⎢ ⋮ ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎣ XN−1 ⎦ ⎣x2(N∕2−1) ⎦ ⎣0 0 0 … W N∕2−1 ⎦ ⎣xN−1 ⎦
It is apparent that the calculation of X(k) and X(k + N/2) for 0 < k(N/2) − 1 uses the same
calculations with only a change in sign in the final sum. Hence the following diagram:
This shows that the calculation of a Fourier transform of order N reduces to the calculation of
two transforms of order N/2, to which M/2 complex multiplications are added. By iteration through
40 2 The Discrete Fourier Transform

a number of steps given by log2 N − 1 = log2 (N/2), transforms of order 2 are arrived at. These have
the matrix:
[ ]
1 1
T2 =
1 −1
and no multiplications are required.
As each stage involves N/2 complex multiplications, the complete transform requires M c complex
multiplications, where:
Mc = (N∕2)log2 (N∕2) (2.7)
and Ac complex additions, where:
Ac = Nlog2 N (2.8)
In practice, the number of complex multiplications can again be reduced because some powers
of W have certain properties. For example, W 0 = 1 and W N/4 = −j do not require complex multipli-
cations, and

N∕8 2
W = (1 − j)
2
only requires one complex half-multiplication each. Thus, three multiplications can be saved in the
first stage, 3N/8 can be eliminated in the penultimate one, and 2N/4 in the last. The gain over all
the stages is 5N/4 − 3 and the minimum number of complex multiplications is given by:
[ ]
5
mc = N∕2 log2 (N∕2) − +3 (2.9)
2
It should be noted that all of these calculation reductions cannot always be easily implemented
either in software or in hardware.
The matrix for the fourth-order transform is:
⎡1 1 1⎤
1
⎢ ⎥
⎢1 −j −1 +j ⎥
T4 = ⎢ ⎥ (2.10)
⎢1 −1 1 −1⎥
⎢ ⎥
⎢1 +j −1 −j ⎥⎦

The diagram for its reduction is shown in Figure 2.2. By convention, the arrows represent multi-
plications and the solid circles to the left of the elementary flow graphs, often called “butterflies,”
represent an addition (upper one) and a subtraction (lower one). The eighth-order transform is
represented in Figure 2.3.
It is seen that in this treatment, the indices of the X(k) appear in natural order, while those of the
x(n) are in a permuted one. This permutation is caused by the successive interleaving and results

X0 = x0 + x2 + x1 + x3 (x0 + x2) x0

X1 = x0 – x2 – j(x1 – x3) (x0 – x2) x2

X2 = x0 + x2 – x1 – x3 (x1 + x3) x1
w0
X3 = x0 – x2 + j(x1 – x3) (x1 – x3) x3
w1

Figure 2.2 Transform of order 4 with decimation in time.


2.2 Fast Fourier Transform (FFT) 41

X0 x0

X1 x4

X2 x2
w0
X3 x6
w2
X4 x1
w0
X5 x5
w1
X6 x3
w2 w0
X7 x7
w3 w2

Figure 2.3 Transform of order 8 with decimation in time.

in a reversal or inversion of the binary representation of the indices, which is often called “bit
reversal.” For example, for N = 8,
x0 (000) corresponds to x0 (000)
x4 (100) corresponds to x1 (001)
x2 (010) corresponds to x2 (010)
x6 (110) corresponds to x3 (011)
x1 (001) corresponds to x4 (100)
x5 (101) corresponds to x5 (101)
x3 (011) corresponds to x6 (110)
x7 (111) corresponds to x7 (111)
The amount of data memory required to calculate a transform of order N is that needed to hold
N complex positions. Indeed, the calculations are performed on pairs of variables which undergo
the operation represented by a butterfly and preserve their position in the set of variables at the
end of the operation, as is clearly shown in the diagrams. This is called “in-place computation.”
The inverse transform is simply obtained by changing the sign of the exponent of W. The factor
1/N can be introduced, for example, by halving the results of the additions and subtractions made
in the butterflies. This allows scaling of the numbers in the memories.
This type of interleaving can also be applied to X(k) when a similar algorithm is obtained. It leads
to the so-called decimation-in-frequency algorithms.

2.2.2 Decimation-in-frequency Fast Fourier Transform


The set of elements X(k) can be decomposed into a pair of interleaved sets – one with even indices
and the other with odd ones. For the even index elements, since W N = 1, the following situation
occurs after elementary factorization:

⎡ X1 ⎤ ⎡ 1 W W2 ··· W N∕2−1 ⎤ ⎡ x0 − xN∕2 ⎤


⎢ X ⎥ ⎢1 W 3 ⎢ ⎥
⎢ 3 ⎥ ⎢ W6 ··· W 3(N∕2−1) ⎥⎥ ⎢ x1 − xN∕2+1 ⎥
⎢ X ⎥ = ⎢1 W 5 ⎢ ⎥
W10 ··· W 5(N∕2−1) ⎥ ⎢ x2 − xN∕2+2 ⎥
⎢ 5 ⎥ ⎢ ⎥⎢ ⎥
⎢ ⋮ ⎥ ⎢⋮ ⋮ ⋮ ··· ⋮ ⎥⎢ ⋮ ⎥
⎢ ⎥ ⎢ ⎥
⎣XN−1 ⎦ ⎣1 W N−1 W 2(N−1) … W (N−1)(N∕2−1) ⎦ ⎢⎣xN∕2−1 − xN−1 ⎥⎦
42 2 The Discrete Fourier Transform

For the elements with odd indices after a similar process, the corresponding equation is:

⎡ X1 ⎤ ⎡ 1 W W2 ··· W N∕2−1 ⎤ ⎡ x0 − xN∕2 ⎤


⎢ X ⎥ ⎢1 W 3 ⎢ ⎥
⎢ 3 ⎥ ⎢ W 6 ··· W 3(N∕2−1) ⎥⎥ ⎢ x1 − xN∕2+1 ⎥
⎢ X ⎥ = ⎢1 W 5 ⎢ ⎥
W ··· W 5(N∕2−1) ⎥ ⎢ x2 − xN∕2+2 ⎥
⎢ 5 ⎥ ⎢ 10
⎥⎢ ⎥
⎢ ⋮ ⎥ ⎢⋮ ⋮ ⋮ ··· ⋮ ⎥⎢ ⋮ ⎥
⎢ ⎥ ⎢ ⎥
⎣XN−1 ⎦ ⎣1 W N−1 W 2(N−1) … W (N−1)(N∕2−1) ⎦ ⎢⎣xN∕2−1 − xN−1 ⎥⎦

In this case, the square matrix obtained is equal to the product of the matrix T N/2 obtained for
the elements with even indices, and the diagonal matrix whose elements are the powers W k with
0 ⩽ k ⩽ N/2 − 1. Thus,

⎡ X1 ⎤ ⎡ 1 0 0 ··· 0⎤ ⎡ x0 − xN∕2 ⎤
⎢ X ⎥ ⎢0 ⎢ ⎥
⎢ 3 ⎥ ⎢ W 0 ··· 0 ⎥⎥ ⎢ x1 − xN∕2+1 ⎥
⎢ X ⎥ = ⎢0 ⎢ ⎥
0 W2 · · · 0 ⎥ ⎢ x2 − xN∕2+2 ⎥
⎢ 5 ⎥ ⎢ ⎥⎢ ⎥
⎢ ⋮ ⎥ ⎢⋮ ⋮ ⋮ ⋮ ⎥⎢ ⋮ ⎥
⎢ ⎥ ⎢ ⎥
⎣XN−1 ⎦ ⎣0 0 0 … W N∕2−1 ⎦ ⎢⎣xN∕2−1 − xN−1 ⎥⎦

The elements x(k) with even and odd indices are calculated using the square matrix T N/2 for the
transform of order N/2, and the following diagram is obtained:
By adopting the same notation for the butterflies as was used in the preceding section, similar
diagrams are obtained. Figure 2.4 shows the diagram for N = 8.
In decimation-in-frequency algorithms, the number of calculations is the same as with time inter-
leaving. The numbers x(n) to be transformed appear in their natural order, while the transformed
numbers X(k) are permuted.
The algorithms which have been obtained so far are based on a decomposition of the transform
of order N into elementary second-order transforms which do not require multiplications. These
algorithms are said to be radix-2 transforms. Other elementary transforms can also be used, the
most important being the radix-4 one, which uses the elementary matrix T 4 .

X0 x0

X4 x1

X2 x2
w0
X6 x3
w2
X1 x4
w0
X5 x5
w1
X3 x6
w0 w2
X7 x7
w2 w3

Figure 2.4 Transform of order 8 with decimation in frequency.


2.2 Fast Fourier Transform (FFT) 43

2.2.3 Radix-4 FFT Algorithm


This algorithm can be used when N is a power of 4. The set of numbers x(k) is decomposed into four
interleaved sets. Let us calculate the first N/4 values of X(k) as an illustration of the decomposition.
If T N/4 denotes the square matrix of the transform of order N/4. and if Di (i = 1, 2, 3) is the diagonal
matrix whose elements are the powers W ik with 0 ⩽ k ⩽ N/4 − 1, then,

⎡ X0 ⎤ ⎡ x0 ⎤ ⎡ x1 ⎤ ⎡ X2 ⎤ ⎡ X3 ⎤
⎢ X ⎥ ⎢ x ⎥ ⎢ x ⎥ ⎢ X ⎥ ⎢ X ⎥
⎢ 1 ⎥ ⎢ 4 ⎥ ⎢ 5 ⎥ ⎢ 6 ⎥ ⎢ 7 ⎥
⎢ X ⎥=T ⎢ x ⎥+D T ⎢ x ⎥+D T ⎢ X ⎥+D T ⎢ X ⎥
⎢ 2
⎥ N∕4
⎢ 8
⎥ 1 N∕4
⎢ 9 ⎥ 2 N∕4
⎢ 10 ⎥ 3 N∕4
⎢ 11 ⎥
⎢ ⋮ ⎥ ⎢ ⋮ ⎥ ⎢ ⋮ ⎥ ⎢ ⋮ ⎥ ⎢ ⋮ ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ N−2 ⎥ ⎢ ⎥
⎣XN∕4−1 ⎦ ⎣x4(N∕4−1) ⎦ ⎣xN−3 ⎦ ⎣X ⎦ ⎣XN−1 ⎦

The next N/4 terms of X(k) are given by:

⎡ XN∕4 ⎤ ⎡ x0 ⎤ ⎡ x1 ⎤ ⎡ x2 ⎤ ⎡ x3 ⎤
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢XN∕4+1 ⎥ ⎢ x4 ⎥ ⎢ x5 ⎥ ⎢ x6 ⎥ ⎢ x7 ⎥
⎢ ⎥ = TN∕4 ⎢ ⎥ − jD1 TN∕4 ⎢ ⎥ − D2 TN∕4 ⎢ ⎥ + jD3 TN∕4 ⎢ ⎥
⎢ ⋮ ⎥ ⎢ ⋮ ⎥ ⎢ ⋮ ⎥ ⎢ ⋮ ⎥ ⎢ ⋮ ⎥
⎢X ⎥ ⎢x ⎥ ⎢x ⎥ ⎢x ⎥ ⎢x ⎥
⎣ N∕2−1 ⎦ ⎣ 4(N∕4−1) ⎦ ⎣ N−3 ⎦ ⎣ N−2 ⎦ ⎣ N−1 ⎦

This equation involves the same matrix calculations as the previous one, with the addition of the
multiplications by the elements of the second row of the matrix T 4 . It can hence be shown that the
calculation of the transform results in the diagram in Figure 2.5.

0 k N/4 – 1
X0 x0
X1 row k x4
=
TN/4
w0
XN/4–1 x4 (N/4–1)

XN/4 x1
XN/4+1 row x5
=
TN/4
k+N/4 1 1 1 1 wk
XN/2–1 1 –j –1 j xN–3
XN/2 1 –1 1 –1 x2
XN/2+1 1 j –1 –j x6
row
= TN/4
k+N/2 w2k
X3 N/4–1 xN–2
X3 N/4 x3
X3 N/4+1 row x7
=
k+3 N/4 TN/4
w3k
XN–1 xN–1

Figure 2.5 Radix-4 transform.


44 2 The Discrete Fourier Transform

This type of transform is carried out in log4 N − 1 = log4 (N/4) stages. Each stage requires 3(N/4)
complex multiplications, which results in a total of M c4 multiplications, where:
( )
3 N
Mc4 = Nlog4 (2.11)
4 4
The number of complex additions Ac4 is:
Ac4 = 2Nlog4 N (2.12)
It is apparent that the number of additions is the same in radix-2 and in radix-4 algorithms,
in contrast to the complex multiplications, where calculation in radix-4 algorithms results in a
saving of over 25%. Other radices can also be envisaged – for example, radix 8. In this case, there
are multiplications in the elementary matrix, and the savings over radix 4 are negligible. Different
radices can also be combined [5].

2.2.4 Split-radix FFT Algorithm


In a transform of order N, the set of odd-index transformed values can be decomposed into two
subsets:
1∑
N−1
X(4k + 1) = x(n)W n W 4kn
N n=0
and:
1∑
N−1
X(4k + 3) = x(n)W 3n W 4kn
N n=0
Using definition (2.6) for W, the summations can also be written as:
N∕4−1 [[ ( )] [ ( ) ( )]]
1 ∑ N N 3N
X(4k + 1) = x(n) − x n + − j x n+ −x n+ W n W 4nk (2.13)
N n=0 2 4 4
and:
N∕4−1 [[ ( )] [ ( ) ( )]]
1 ∑ n N 3N
X(4k + 3) = x(n) − x n + +j x n+ −x n+ W 3n W 4kn (2.14)
N n=0 2 4 4
The set of even index transformed values is:
N∕2−1 [ ( )]
1 ∑ N
X(2k) = x(n) + x n + W 2nk (2.15)
N n=0 2
The above equations show that the first step in the decimation-in-time approach can be replaced
by the calculation of a transform of order N/2 and two transforms of order N/4. The split-radix
algorithm is obtained through repeated applications of the above procedure:
In the decimation-in-frequency approach, the split-radix algorithm is based on the following
decomposition:

N∕2−1

N∕4−1

N∕4−1
X(k) = x(2n)W 2nk + W k x(4n + 1)W 4nk + W 3k x(4n + 3)W 4nk (2.16)
n=0 n=0 n=0

For a transform of order N, the number of complex multiplications M c2/4 (N) is given by the recur-
rence derived from the above equations:
( ) ( )
N N N
Mc2∕4 (N) = Mc2∕4 + 2Mc2∕4 + (2.17)
2 4 2
2.3 Degradation Arising from Wordlength Limitation Effects 45

with the initial values M(2) = M(4) = 0. The value obtained in this way is slightly smaller than for
a radix-4 algorithm.
In practice, taking trivial operations into account and implementing complex multiplications
with 3 real multiplications and 5 real additions, as explained later on, it can be shown that
the N-order transform needs Nlog2 (N) − 3N + 4 real multiplications and 3Nlog2 (N) − 3N + 4
additions [6].
The algorithms which have been presented for decimation-in-time and -frequency, and for
radix-2 and 4, are elements of a large set of algorithms. A unified presentation of FFT algorithms
is given in the next chapter, so that the most appropriate can be selected for each application. In
actual calculations, however, operations are carried out with limited precision. This results in
some degradation of the signal.

2.3 Degradation Arising from Wordlength Limitation Effects

The equipment used introduces limitations caused by the finite precision of arithmetic units and
the limited memory capacity. Firstly, the coefficients are held in a memory, with a limited number
of bits. Thus, the memory contents represent approximations of the actual coefficients, which are
generally obtained by rounding. Secondly, as the calculation proceeds, rounding is performed so
as to keep the data wordlength within the capacities of the memory locations, or of the arithmetic
units. It is important to analyze the degradations introduced by these two types of wordlength
limitation in order to be able to precisely determine the hardware needed to produce a transform
with specific performance.
To begin with, the rounding of coefficients is considered.
The coefficients actually used by the machine represent an approximation of the theoretical coef-
ficients, which have values of the real and imaginary parts within the range [−1, +1].
For the coefficient e−j2𝜋n/N , digitization into bc bits involves a quantization error
𝛿(n) = 𝛿 R (n) + j𝛿 1 (n) such that, if rounding is employed,

|𝛿R (n)| ⩽ 2−bc and |𝛿I (n)| ⩽ 2−bc

The calculation of each transformed number X(k) from the data x(n) is performed with an error
Δ(k) such that:

1∑
N−1
X(k) + Δ(k) = x(n)[e−j2𝜋nk∕N + 𝛿(nk)]
N n=0
or

1∑
N−1
Δ(k) = x(n)𝛿(nk)
N n=0
As the x(n) and X(k) are related by equation (2.2):

N−1
x(n) = X(k)ej2𝜋nk∕N
k=0

this becomes,

N−1
Δ(k) = X(i)𝜀(i, k) (2.18)
i=0
46 2 The Discrete Fourier Transform

with,

1∑
N−1
𝜀(i, k) = 𝛿(nk)ej2𝜋ni∕N
N n=0

Consequently, for the transformed number x(k), rounding the coefficients of the transform
introduces a perturbation Δ(k) obtained by summing the elementary perturbations, each of which
is equal to the product of a transformed number by a factor representing its contribution. The
transformed numbers interact with each other and are no longer strictly independent.
It is possible to calculate the 𝜀(i,k) for each transformation. In general, it is important to know
the maximum value 𝜀m that the |𝜀(i,k)| can have for a given order of transformation and for a given
number of bits bc . √
The inequality |𝛿(n)| < 2−bc 2 provides a maximum for 𝜀m as

𝜀m ⩽ 2bc 2

In practice, the values found for 𝜀m are much lower than this maximum. For example, for N = 64,
it is found that εm ≃ 0.6 × 2 − bc , and this value is also found for higher values of N [7].
Another performance degradation arises from computation noise. At DFT computer input, data
have a limited number of bits, and this number grows with every multiplication and addition. In
general, the number of bits assigned to internal data in the computer is fixed and it is necessary
to limit word lengths. Most of the time, limitations are obtained just by rounding, because, since
overflow is generally not acceptable, the scaling is selected at the beginning of the transform so
that the whole of the calculation can be performed without any risk of overflow. Two cases can be
distinguished for the radix-2 transform:

– Direct transform: Due to the global scale factor 1/N, it is sufficient to halve the results of addi-
tions and subtractions in each butterfly to keep the correct scaling.
– Inverse transform: The scaling at the beginning of the transform is selected so that no overflow
can occur in the calculations.

Regarding round-off noise power, detailed estimation requires identification of the sources, with
corresponding quantization steps, and summations [8].

2.4 Calculation of a Spectrum Using the DFT

The calculation of a spectrum by the DFT requires that certain approximations and a suitable choice
of parameters be made to attain the desired performance. Before considering any application, it is
therefore useful to look carefully at the function fulfilled by the DFT.

2.4.1 The Filtering Function of the DFT


Let us examine the relationship between the outputs X(k) and the inputs x(n), considered as the
result of sampling a signal x(t) with period T, which establishes the DFT. For k = 0 this is:

1∑
N−1
X(0) = x(n)
N n=0
2.4 Calculation of a Spectrum Using the DFT 47

The signal X(0) thus defined results from the convolution of the signal x(t) with the distribution
𝜑0 (t) such that:

1∑
N−1
𝜙0 (t) = 𝛿(t − nT)
N n=0
The Fourier transform of this distribution is given by:

1 ∑ −j2𝜋nf∕T
N−1
1 1 − e−j2𝜋f∕NT
Φ0 (f ) = e =
N n=0 N 1 − ej2𝜋fT
or
Φ0 (f ) = e−j𝜋f(N−1)T 𝜙(f )
with,
1 sin(𝜋fNT)
Φ(f ) = (2.19)
N sin(𝜋fT)
Now, a convolution operation in the time domain corresponds to a product in the frequency
domain – that is, X(0) is a signal obtained by filtering the input signal by the function Φ0 (f ).
Figure 2.6 shows the function Φ(f ) and the function 𝜙(t) of which it is the Fourier transform. The
function Φ(f ) is zero at points on the frequency axis which are whole multiples of 1/NT, except for
multiples of 1/T. It is periodic and has a period 1/T, conforming to the laws of sampling. It is simply
the spectrum of a sampled impulse of width NT. Similarly, the output X(k) has the corresponding
function 𝜙k (t) such that:

1 ∑ j2𝜋nk∕N
N−1
𝜙k (t) = e 𝛿(t − nT)
N n=0

1 ∑ −j2𝜋nk∕N −j2𝜋fnT
N−1
Φk (f ) = e e
N n=0
In compact form, after simplification, this becomes:
( )
k
Φk (f ) = (−1)k e−j𝜋f(N−1)T ej𝜋k∕N 𝜙 f (2.19b)
NT

φ(t)

1/N

0 T 2T t
NT
∅(f)
1

f
0 1 2
NT NT

Figure 2.6 The filtering function of a discrete Fourier transform.


48 2 The Discrete Fourier Transform

x2
x1
x0 2𝜋 x N –1
x1 x0 N x0
xN–1 xN–
1 x1

X0 X1 XN–1

Figure 2.7 Filtering by phase shift in a discrete Fourier transform.

The output X(k) provides the signal filtered according to the function Φ0 (f ), but translated by
k/NT along the frequency axis.
Thus, the DFT forms a set of N identical filters, or a bank of filters, distributed uniformly over the
frequency domain at intervals of 1/NT.
If the input signal is periodic, then, from the definition of the DFT, this bank of filters is frequency
sampled at intervals of 1/NT, and it should be noted that there is no interference between the
outputs X(k). Strictly speaking, this property is lost if the coefficients are rounded, as has been
shown earlier.
The DFT function described above also illustrates the problem of scaling numbers in the
memory of an FFT calculator. Let us suppose that the numbers of transform x(n) result from the
sampling of a random signal for which the amplitude probability distribution has the variance 𝜎 2 .
If the signal has a uniform distribution of its energy spectrum, its power is uniformly distributed
between the X(k), and each has a variance equal to 𝜎 2 /N. By contrast, if the signal has a spectral
distribution which can be concentrated on one X(k), this X(k) has the same probability distribution
as the x(n) – in particular, the variance 𝜎 2 . Scaling the numbers by dividing by 2 at each stage of
an FFT calculation is suitable for the handling of such signals.
A different view of the filtering process is provided by observing that the outputs X(k) of the DFT
are the sums of the inputs x(n) after phase shifting. In effect, the output X(0) is the sum of the x(n)
with zero phase shift, and the output X(k) is the sum of x(n) with phase shifts which are multiples
of 2𝜋(k/N), as shown in Figure 2.7. At each output, in-phase components of the resulting signals
add while the others cancel. For example, if the x(n) are complex numbers with the same phase and
modulus, all the X(k) become zero except for X(0). The input signal is thus found to be decomposed
according to the base represented by the N vectors e−j2𝜋(k/N) with 0 ⩽ k ⩽ N − 1.
This result is useful in studying banks of filters which include a DFT processor.

2.4.2 Spectral Resolution


Spectrum analysis is used in many fields and is often performed on recorded data. By definition,
the DFT establishes a relation between two periodic sets, the x(n) and X(k), each of which
includes N different elements. In order to use it, it is therefore necessary to examine this double
periodicity.
The periodicity in the frequency is introduced by the process of signal sampling. The data to be
processed are either in digital form, with sampling and coding being performed within the signal
source, or in analog form, which then has to be converted into digital. The choice of sampling
frequency f s = 1/T should be such that components of the signal with frequencies greater than or
equal to f s /2 are negligible, and, in any case, less than the tolerable error in the amplitude of the
useful components. Fulfillment of this condition can be ensured by prefiltering the signal.
2.4 Calculation of a Spectrum Using the DFT 49

Periodicity in time is introduced artificially by assuming that the signal is repeated outside the
time interval 𝜃 = NT which actually corresponds to the data being processed. Under these condi-
tions, the DFT supplies a sample of the spectrum with a frequency period Δf , equal to the inverse
of the duration of the data, which constitutes the frequency resolution of the analysis. The relation
Δf = NT expresses the Heisenberg uncertainty principle for spectrum analysis. A more accurate
analysis can be obtained by increasing the duration of the data collection – for example, by making
it N′ T (with N′ > N) with zero additional samples. The additional frequency samples are obtained
simply from an interpolation of the others. This procedure is commonly used to provide a number
N’ of data points which is a power of 2, thus allowing fast algorithms to be used. On the other hand,
the fact that the signal is not formed solely of lines at frequencies which are multiples of 1/NT intro-
duces interference between the spectral components obtained. Indeed, the filter function Φ(f ) of
the DFT, which is given in Section 2.4.1, introduces ripples throughout the frequency band and, if
the signal has a spectral component S(f 0 ) at frequency f 0 such that k/NT < f 0 < (k + 1)/NT, then:
X(k) = S(f0 )Φ(k∕NT − f0 ), 0⩽k ⩽N −1 (2.20)
As a result, all the transform outputs can assume nonzero values as shown in Figure 2.8. Thus,
limitations appear for the resolution of the analyzer. This effect can be reduced by modifying the
filter function of the DFT by weighting the signal samples before transforming them.
This operation amounts to replacing the rectangular time window 𝜙(t) with a function whose
Fourier transform results in smaller ripples. Numerous functions are used, of which the simplest
is the raised cosine window:
( )
1 t
𝜙(t) = 1 + cos 2𝜋 (2.21)
2 NT
and the Hamming window:
( )
t
𝜙(t) = 0.54 + 0.46 cos 2 𝜋 (2.22)
NT
The latter function has 99.96% of its energy in the main lobe. The peak side lobe ripple is about
40 dB below the main lobe peak. Other time windows can be used, and several efficient functions
are introduced in Reference. [9].
Let Φ(f ) be the spectrum of the time window 𝜑(t) after sampling; the expression (2.20) can
be extended to any signal with spectrum S(f ), using the definition of the convolution and taking
account of the periodicity of Φ(f ):
[ ] ( )
( )
1 ∑
1∕T ∞
n k
X(k) = S u− Φ − u du
∫0 T n=−∞ T NT
The signal spectrum aliasing due to sampling with period T is apparent.
To cope better with interference between the calculated spectral components, it is necessary to
employ a bank of more selective filters, like that presented in Chapter 10.
The DFT can also be used indirectly in the calculation of convolutions.

X(k)
1

k
O 1 2 3 4 5

Figure 2.8 Analysis of a signal with a frequency which is not a multiple of 1/NT.
50 2 The Discrete Fourier Transform

2.5 Fast Convolution

The efficiency of FFT algorithms leads to the use of the DFT in cases other than spectrum analysis
and, in particular, in convolutions.
Although, in general, this is not the most efficient of approaches, it can be useful in applications
where an FFT processor is available.
One of the properties of the DFT is that the transform of a convolution product is equal to the
product of its transforms. Given two sets x(n) and h(n) of period N, with transforms X(k) and H(k),
the circular convolution

N−1
y(n) = h(m)x(n − m)
m=0

is a set of the same period whose transform is written as:


Y (k) = H(k)X(k)
Fast convolution consists of calculating the set y(n) by applying a DFT to the set Y (k). As one of
the convolution terms is normally constant, the operation requires one DFT, one product, and one
inverse DFT. This technique is applied to sets of finite length. If x(n) and h(n) are two sets of N 1
and N 2 nonzero terms, the set y(n) defined by

n
y(n) = h(m)x(n − m)
m=0

is a set of finite length, having N 1 + N 2 − 1 terms. The fast convolution is applied by considering
that the three sets y(n), x(n), and h(n) have the period N such that N ⩾ N 1 + N 2 − 1. It is then
sufficient to complete each set with a suitable number of zero terms. It is of particular interest if a
power of 2 is chosen for N.
Nevertheless, in practice, convolution is a filtering operation, where the x(n) represents the signal
and the h(n) the coefficients. The set of the x(n) is much longer than that of the h(n), and it is
necessary to subdivide the calculation. To do this, the set of the x(n) is regarded as a superposition
of elementary sets xk (n), each of N 3 terms. That is,

x(n) = xk (n)
k

with xk (n) = x(n) for kN 3 ⩽ n ⩽ (k + 1)N 3 − 1 and xk (n) = 0 elsewhere.


We can then write:
∑n

y(n) = h(m) xk (n − m)
m=0 k
∑∑
n

y(n) = h(m)xk (n − m) = yk (n)
k m=0 k

Each set yk (n) contains N 3 + N 2 − 1 nonzero terms. Thus, the convolutions involve N 3 + N 2 − 1
terms. Figure 2.9 shows the sequence of operations. The sets yk (n) and yk+1 (n) have N 2 − 1 terms
which are superposed. The same operations can be performed by decomposing the set x(n) into
sets xk (n) such that N 2 − 1 terms are superposed.
In this process, the number of calculations to be performed on each element of the set y(n)
increases as log2 (N 2 + N 3 − 1) and N 3 must not be chosen as too large. Also, if N 3 < N 2 , no terms
in the set y(n) can be obtained directly. Consequently, there is an optimal value for N 3 . The number
2.6 Calculations of a DFT Using Convolution 51

N2
h(n)

N3
xk(n)

N2 + N3 – 1
yk(n)
N3
xk+1(n)

N2 + N3 – 1
yk+1(n)

N2 – 1 N2 – 1 N2 – 1
y(n)

Figure 2.9 Sequence of the operations in fast convolution.

of memory locations needed increases as N 3 + N 2 + 1, and a good compromise is reached by taking


for N 3 the first value above N 2 such that N 3 + N 2 − 1 is a power of 2.

2.6 Calculations of a DFT Using Convolution

In certain applications, only those operators which can form convolutions are available to calcu-
late a DFT. Such is the case for circuits using charge transfer devices which allow calculations to
be performed on the sampled signals in analog form at speeds compatible with the frequencies
encountered, for example, in radar applications.
The definition of a DFT can be written as:

1∑
N−1
X(k) = x(n)e−j2𝜋(nk∕N)
N n=0

By writing
1 2
nk = [n + k2 − (n − k)2 ] and W = e−j(2𝜋∕N)
2
this becomes

2 ∕2

N−1
2 ∕2 2 ∕2
X(k) = W k x(n)W n W −(n−k) (2.24)
n=0
2 2
This equation expresses the circular convolution product of the sets x(n)W n ∕2 and W −n ∕2 . It
follows that the calculation of X(k) can be performed in three stages comprising the following
operations:
2
(1) Multiply the data x(n) by the coefficients W n ∕2
2
(2) Form the convolution product with the set of coefficients W n ∕2
2
(3) Multiply the results by the coefficients W k ∕2

This process is represented in Figure 2.10, and the method can be extended to the case where W
is a complex number with a non-unit modulus [10].
52 2 The Discrete Fourier Transform

Figure 2.10 Calculation of a discrete Fourier


Convolution transform by convolution.
2
x(n) by W–n /2 X(k)
2/2
Wn
2/2
Wk

2.7 Implementation

In order to implement DFT algorithms, the following elements must be employed:


(1) A memory unit to store the input–output data and the intermediate results
(2) A memory unit to store the coefficients of the transform
(3) An arithmetic unit which can add and multiply complex numbers
(4) A control unit to link the various operations
These fundamental elements are found in every machine designed for digital signal processing,
whether it uses hardwired or programmed logic. Implementation of the FFT has two main
characteristics:
(1) A large number of arithmetic calculations have to be performed.
(2) Permutations are required on the data, resulting in complicated calculations for the indices.
The constraints increase with the order of the transform.
The problem of implementation is to find efficient procedures for the algorithms described in
the above sections. Various circuits and methods of organizing the calculations can be devised. An
important possibility is worth mentioning – a complex multiplication can be implemented with the
help of 3 real multiplications and 5 additions according to the equation:
(aR + jaI )(bR + jbI ) = [(aR + aI )bR − aI (bR + bI )] + j[(aR + aI )bR + aR (bI − bR )] (2.25)
This option is frequently invoked in complexity estimations.
When a degree of optimization is being sought, particularly to adapt the circuits to the algo-
rithms, it is preferable to look for algorithms which are compatible with the constraints imposed
on the circuitry by technology. Moreover, in order to further reduce the number of arithmetic calcu-
lations, algorithms can be developed which are more sophisticated than the FFT. Also, significant
simplifications can be made if the data display any particular feature (for example, if they are real
numbers, or if symmetries appear).
These topics will be examined in the following chapter after a unified presentation of FFT
algorithms.

Exercises

2.1 Calculate the DFT of the set comprising N = 16 terms such that:
x(0) = x(1) = x(2) = x(14) = x(15) = 1
x(n) = 0 for 3 ⩽ n ⩽ 13
and of the set
x(0) = x(1) = x(2) = x(3) = x(4) = 1
Exercises 53

x(n) = 0 for 5 ⩽ n ⩽ 15
Compare the results obtained. Carry out the inverse transform of these results.

2.2 Establish the diagram for the FFT algorithm of order 16 with time and frequency
interleaving. What is the minimum number of multiplications and additions that are
required?

2.3 Calculate the DFT of the set comprising N = 128 terms such that:
x(0) = x(1) = x(2) = x(126) = x(127) = 1
x(n) = 0 for 3 ⩽ n ⩽ 125
Compare the results with those in Exercise 2.1.
The set X(k) obtained forms an approximation to the Fourier series expansion of a set
of impulses. Compare the results obtained with the figures in the table in Appendix 1,
Chapter 1. Account for the differences.

2.4 We wish to develop a DFT of order 64 with a minimum of arithmetic operations. Determine
the number of multiplications and additions required with algorithms with radix 2, 4, and 8.

2.5 Analyze the power of the rounding noise produced in a transform of order 32. Using the
results in Section 2.3, show how the results vary at the different outputs. Calculate the dis-
tortion introduced by limiting the coefficients to 8 bits.

2.6 Show that each output of a DFT, X(k), can be obtained from the inputs x(n) by a recurrence
relation. Calculate the number of multiplications that would be required.

2.7 Carry out a DFT of order 64 on data which are 16-bit numbers. Calculate the degradation
of signal-to-noise ratio when a cascade of direct and inverse transforms is used on a 16-bit
machine.

2.8 Assume that the bandwidth occupied by a signal for analysis is from 0 to 10 kHz. The spectral
resolution required is 1 Hz. What length of recording is required in order to carry out such an
analysis? What memory capacity is required to store the data, assuming they are coded into
8 bits? Determine the characteristics of a computer capable of performing such a spectral
analysis – memory capacity, memory cycle, addition, and multiplication times.

2.9 Calculate the DFT of the set x(n) which is defined by:
x(n) = sin(2𝜋n∕3.5) + 0.2 sin(2𝜋n∕6.5) with 0 ⩽ n ⩽ 15
The following windows are used to improve the analysis:
1
g(n) = [1 − cos(2𝜋n∕16)]
2
2𝜋n
g(n) = 0.54 − 0.46 cos (Hamming)
16
2𝜋n 4𝜋n
g(n) = 0.42 − 0.5 cos + 0.08 cos (Blackman)
16 16
Compare the results.
54 2 The Discrete Fourier Transform

References

1 Special issue on FFT and applications. IEEE Transactions, 15(2), 1967, https://ieeexplore.ieee
.org/xpl/tocresult.jsp?isnumber=26059&punumber=8337, 1–113.
2 A. Oppenheim and R. Schafer, Digital Signal Processing, Prentice Hall, Englewood Cliffs NJ,
1974, Chapters 3 and 6.
3 L. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Prentice Hall,
Englewood Cliffs NJ, 1975, Chapters 6 and 10.
4 C. S. Burrus and T. W. Parks, DFT/FFT and Convolution Algorithms, Wiley, New York, 1985.
5 P. Duhamel and H. Hollmann, Split radix FFT algorithm. Electronics Letters, 1984, 20 (1), 14–16.
6 H. Sorensen, M. Heideman and S. Burrus. On computing the split-radix FFT, IEEE Transac-
tions, 34(1), 1986, 152–156.
7 D. W. Tufts, H. S. Hersey and W. E. Mosier, Effects of FFT Coefficient quantization on bin
frequency response. Proceedings of IEEE, 1972, 60, 1, pp. 146–147, 10.1109/PROC.1972.8582.
8 T. Thong and B. Liu, Fixed point FFT error analysis. IEEE Transactions, 24(6), 563–573, 1976.
9 A. Eberhard, An optimal discrete window for the calculation of power spectra. IEEE Transac-
tions, 21(1), 37–43, 1973.
10 L. Rabiner, R. Schafer and C. Rader, The chirp z-transform algorithm. IEEE Transactions on
Audio and Electroacoustics, 17, 1969. 17, 2, pp. 86–92, 10.1109/TAU.1969.1162034
55

Other Fast Algorithms for the FFT

Algorithms for the fast calculation of a discrete Fourier transform (DFT) are based on factorization
of the matrix of the transform. We have already seen such factorization in the sections on
decimation-in-time and decimation-in-frequency algorithms, in the preceding chapter, which are
particular examples of a large group of algorithms.
In order to use these fast algorithms and thus to exploit to the full both the characteristics of the
signals to be processed and the various technological possibilities, one must use a suitable mathe-
matical tool – the Kronecker product of matrices. By combining this product with the conventional
product, it is possible to factorize the matrix of the DFT in a simple way.

3.1 Kronecker Product of Matrices


The Kronecker product is a tensor operation which is a generalization of the multiplication of a
matrix by a scalar [1]. Knowing two matrices A and B with m and p rows and n and q columns
respectively, the Kronecker product of A by B (written A × B) is a new matrix with mp rows and
nq columns, which is obtained by replacing each element bij of the matrix B by the following
array bij A:

bij a11 bij a12 · · · bij a1n


⋮ ⋮
bij am1 bij am2 · · · bij amn
This product is generally not commutative:
A×B≠B×A
As an example of the product, if the matrix B is
[ ]
b11 b12
B=
b21 b22
the Kronecker product of the matrix A by the matrix B is
[ ]
b A b12 A
A × B = 11 (3.1)
b21 A b22 A
In particular, it should be noted that the Kronecker product of the unit matrix I N by the
M-dimensional unit matrix I M is equal to the unit matrix of dimension MN:
IN × IM = INM (3.2)
Digital Signal Processing: Theory and Practice, Tenth Edition. Maurice Bellanger.
© 2024 John Wiley & Sons Ltd. Published 2024 by John Wiley & Sons
56 3 Other Fast Algorithms for the FFT

Similarly, the Kronecker product of a diagonal matrix by another diagonal matrix is once again
a diagonal matrix.
The Kronecker product can be combined with conventional matrix products, and thus we have
the following properties which will be used in the coming sections, provided that the dimensions
are compatible:

(1) The Kronecker product of a product of matrices with the unit matrix is equal to the product of
the Kronecker products of each matrix with the unit matrix:

(ABC) × I = (A × I)(B × I)(C × I) (3.3)

(2) The product of Kronecker products is equal to the Kronecker product of the products:

(A × B × C)(D × E × F) = (AD) × (BE) × (CF) (3.4)

(3) The inverse of a Kronecker product is equal to the Kronecker product of the inverses:

(A × B × C)−1 = A−1 × B−1 × C−1 (3.5)

The property of transposition is similar to that of inversion:

(A × B × C)t = At × Bt × Ct (3.6)

The transpose of the matrix of a Kronecker product is the Kronecker product of the transposes
of the matrices.
These properties can be easily demonstrated using some simple examples. They are used
to factorize matrices with redundant elements and, in particular, for DFT matrices [2].
Decimation-in-frequency will be considered first.
It should be noted that the scale factor 1/N is ignored throughout the rest of this chapter.

3.2 Factorizing the Matrix of a Decimation-in-Frequency Algorithm

In the algorithms examined in the previous chapter, one of the sets – either the input or the
output – was permuted. The matrix which represents this algorithm is derived from the matrix T N
by permutation of the rows or the columns depending upon whether the decimation in frequency
or decimation in time is considered [3].
Let TN′ denote the matrix corresponding to decimation in frequency. This is obtained by
permutation of the rows of T N as follows. The rows are numbered, and each number is expressed
in binary notation; then the binary numbers are reversed, and the resulting number denotes the
position of that row in the new matrix. For example, for N = 8 we obtain:

⎡1 1 1 1 1 1 1 1 ⎤ 0 0 0 = 0
⎢1 W W 2 W 3 −1− W −W 2 −W 3 ⎥ 0 0 1 = 1
⎢ 2 2 2 2⎥
⎢1 W − 1 −W 1 W − 1 −W ⎥ 0 1 0 = 2
⎢1 W 3 −W 2 W −1−W 3 W 2 W ⎥ 0 1 1 = 3
T8 = ⎢
⎢1− 1 1 − 1 1− 1 1 − 1 ⎥⎥ 1 0 0 = 4
⎢1− W W −W −1 W −W 2 W 3 ⎥
2 3 1 0 1 = 5
⎢1−W 2 − 1 W 2 1−W 2 − 1 W 2 ⎥ 1 1 0 = 6
⎢ ⎥
⎣1−W 3 −W 2 − W −1 W 3 W 2 W ⎦ 1 1 1 = 7
3.2 Factorizing the Matrix of a Decimation-in-Frequency Algorithm 57

⎡1 1 1 1 1 1 1 1 ⎤ 0 0 0 = 0
⎢1− 1 1 − 1 1− 1 1 − 1 ⎥ 1 0 0 = 4
⎢ 2 2 2 2⎥
⎢1 W − 1 −W 1 W − 1 −W ⎥ 0 1 0 = 2
⎢1−W 2 − 1 W 2 1−W 2 − 1 W 2 ⎥ 1 1 0 = 6
T8′ = ⎢ 2 3 2 3⎥
⎢1 W W W −1− W −W −W ⎥ 0 0 1 = 1
⎢1− W W 2 −W 3 −1 W −W 2 W 3 ⎥ 1 0 1 = 5
⎢1 W 3 −W 2 W −1−W 3 W 2 − W ⎥ 0 1 1 = 3
⎢ ⎥
⎣1−W 3 −W 2 − W −1 W 3 W 2 W ⎦ 1 1 1 = 7
Note that for N = 2, the matrix T2′ is equal to T 2 .
The matrix T2′ is factorized by finding the matrix TN∕2

and the diagonal matrix DN/2 whose ele-
ments are the numbers W with 0 ⩽ k ⩽ N/2 − 1. Thus,
k

⎡TN∕2
′ ′
TN∕2 ⎤
TN′ = ⎢ ⎥
⎢T ′ D −T ′ D ⎥
⎣ N∕2 N∕2 N∕2 N∕2 ⎦
This decomposition appears clearly for T8′ . If I N/2 denotes the unit matrix of order N/2, we can
write:
[ ]
⎡TN∕2

0 ⎤ IN∕2 IN∕2
TN = ⎢
′ ⎥
⎢0 ′ ⎥
TN∕2
⎣ ⎦ DN∕2 −DN∕2
or
[ ][ ]
⎡TN∕2

0 ⎤ IN∕2 0 IN∕2 IN∕2
TN′ = ⎢ ⎥
⎢ 0 T′ ⎥ 0 D IN∕2 −IN∕2
⎣ N∕2 ⎦ N∕2

By using the Kronecker products of the matrices, for TN′ we obtain:


( ) ( )
TN′ = TN∕2

× I2 ΔN IN∕2 × T2′ (3.7)

where ΔN is a diagonal square matrix of order N, in which the first N/2 elements have the value 1
and the subsequent elements are powers of W, W k with 0⩽k⩽N/2–1.
The complete factorization is obtained by iteration:
( ) ( )
TN′ = T2′ × IN∕2 (Δ4 × IN∕4 ) I2 × T2′ × IN∕4
…………………………………………
( )
(ΔN∕2 × I2 ) IN∕4 × T2′ × I2
( )
ΔN IN∕2 × T2′
or
∏(
log2 N
)( )
TN′ = Δ2 i × IN∕2i I2i−1 × T2′ × IN∕2i (3.8)
i=1

This expression shows that the transform is calculated in log2 (N) stages, each containing:
( )
(1) One part involving the ordering of the data corresponding to the factor I2i−1 × IN∕2i , which
contains only additions and subtractions.
(2) One part which involves the multiplications by the coefficients represented in the matrix
( )
Δ2i × T2′ × IN∕2i . The stage corresponding to i = 1 does not involve any multiplications. It can be
verified that all the matrices indeed have the dimension N.
58 3 Other Fast Algorithms for the FFT

In order to see how factorization is generalized to radix 4, it is interesting to examine the matrix

T16 , which is obtained from T 16 by the following permutation of the rows. The rows are numbered
to base 4 and the order of the digits in the row numbers are reversed. The value obtained shows the
number of the row in the new matrix. Following this permutation, we obtain T4 = T4′ .
If D4 denotes the diagonal matrix
⎡1 0 0 0 ⎤
⎢ ⎥
0 W 0 0 ⎥
D4 = ⎢
⎢0 0 W2 0 ⎥
⎢0 0 0 W 3 ⎥⎦

then the matrix of the transform of order 16 thus obtained is:
⎡ T4 T4 T4 T4 ⎤
⎢ ⎥
⎢T4 D4 T4 (−j)D4 T4 (−1)D4 T4 (+j)D4 ⎥

T16 =⎢ ⎥
⎢T D2 T (−1)D2 T D2 T (−1)D2 ⎥
⎢ 4 4 4 4 4 4 4 4⎥
⎢T D3 T (+j)D3 T (−1)D3 T (−j)D3 ⎥
⎣ 4 4 4 4 4 4 4 4⎦
⎡T4 0 0 0 I4 0 0 0 ⎤ ⎡I4 I4 I4 I4 ⎤
⎢ ⎥ ⎢ ⎥
⎢ 0 T4 0 0 0 D4 0 0 ⎥ ⎢I4 −jI4 −I4 +jI4 ⎥

T16 =⎢ ⎥×⎢ ⎥
⎢ 0 0 T 0 0 0 D2 0 ⎥ ⎢I −I I −I ⎥
⎢ 4 4 ⎥ ⎢ 4 4 4 4 ⎥
⎢ 0 0 0 T 0 0 0 D3 ⎥ ⎢I +jI −I −jI ⎥
⎣ 4 4⎦ ⎣ 4 4 4 4⎦
This expression is written in Kronecker product form as:

T16 = (T4 × I4 )Δ16 (I4 × T4 ) (3.9)

where Δ16 is a diagonal matrix in which the first four terms have the value 1, the next four terms
W k with 0 ⩽ k ⩽ 3, and the subsequent terms (W 2 )k and (W 3 )k with 0 ⩽ k ⩽ 3.
Factorization as Kronecker products forms the basis of algorithms which have various
properties – notably the order of presentation/extraction of data, and the linking of operations. It
also applies to partial transforms, which are of great practical importance.

3.3 Partial Transforms

The transforms which have been studied in the above sections relate to sets of N numbers which
may be complex. In a fine spectrum analysis, it can happen that the order of the transform N
becomes very large, though we are interested in knowing only a reduced number of points in the
spectrum. The limitation of the calculation to useful single points can then allow for significant
savings.
Let us calculate the partial transform defined by the following equation, where r is a factor of N:
⎡ Xp ⎤ ⎡ 1 W P W 2P … W (N−1)P ⎤ ⎡ x0 ⎤
⎢X ⎥ ⎢1 W P+1 W 2(P+1) … W (N−1)(P+1) ⎥ ⎢ x ⎥
⎢ p+1 ⎥ ⎢ P+2 W 2(P+2) … W (N−1)(P+2)
⎥⎢ 1 ⎥
⎢ Xp+2 ⎥ = ⎢1 W ⎥ ⎢ x2 ⎥ (3.10)
⎢ ⋮ ⎥ ⎢⋮ ⋮ ⋮ ⎥⎢ ⋮ ⎥
⎢ ⎥ ⎢ ⎥⎢ ⎥
⎣Xp+r−1 ⎦ ⎣1W (P+r−1) · · · …W (N−1)(P+r−1) ⎦ ⎣xN−1 ⎦
3.3 Partial Transforms 59

From the whole set of data, one can form N/r subsets, each containing r terms:
(x0 , xN∕r , x2N∕r , , x(r−1)N∕r )
(x1 , x(N∕r)+1 , , x(r−1)N∕r+1 )
……………………………
(x(N∕r)−1 , x(2N∕r)−1 , , xN−1 )
Assume Dr is the diagonal matrix of dimension r, whose elements are the powers of W, W k with
0 ⩽ k ⩽ r − 1.
The matrix of the partial transform can be separated into N/r submatrices which can each be
applied to one of the sets which were defined earlier, and the matrix equation of the transform is
written as:

(N∕r)−1
(N∕r)
[X]p,r = Dir Tr (W p )i Dr [x]i,r (3.11)
i=0

where [X]p,r denotes the set of r numbers X k with p ⩽ k ⩽ p + r − 1 and [x]i,r the set of data xk with
k = nN/r + i and n = 0, 1,…, r − 1. The transform T r is that of order r.
Consequently, if r is a factor of N, a partial transform relating to r points is calculated using N/r
transforms of order r with which the appropriate diagonal matrices are associated.
If N and r are powers of 2, the number M P of complex multiplications to be made is given by:
( ) [ ( ) ]
N r r 1 r
MP = log2 + 2r = N log2 +2 (3.12)
r 2 2 2 2
This result is equally valid when it is the number of points to be transformed which is limited, as
is often the case in spectrum analysis. A common example of a partial transform is that applied to
real data.

3.3.1 Transform of Real Data and Odd DFT


If the data to be transformed are real, the properties listed in Chapter 2 show that the transformed
numbers X(k) and X(N − k) are complex conjugates – that is, X(k) = X(N − k). Thus, it is only
necessary to calculate the set of the X k with 0 ⩽ k ⩽ N/2 – 1 and the above result can be applied:
[X]0,N∕2 = TN∕2 [x]0,N∕2 + DN∕2 TN∕2 [x]1,N∕2 (3.13)
In this particular case, the transform T N/2 has only to be calculated once, taking advantage of
the following property of the DFT – if the set to be transformed xk is purely imaginary, and the
transformed set is such that:
X(k) = −X(N − k)
Under these conditions, the procedure for calculating the transform of a real set is as follows:
(1) Using the x(k), form a complex set of N/2 terms y(k) = x(2 k) + jx(2 k + 1) with 0 ⩽ k ⩽ N/2 − 1.
(2) Calculate the transform Y (k) of the set y(k) with 0 ⩽ k ⩽ N/2 − 1.
(3) Calculate the required numbers using the expression

[ ( )] [ ( ) ]
1 N 1 N
X(k) = Y (k) + Y − k + je−j2𝜋(k∕N) Y − k − Y (k) (3.14)
2 2 2 2
( )
N
with 0 ⩽ k ⩽ N/2 and Y 2
= Y (0).
60 3 Other Fast Algorithms for the FFT

It is convenient to rewrite equation (3.14) as follows:


( )
N
X(k) = A(k)Y (k) + B(k)Y −k (3.15)
2
1 1
A(k) = (1 − j W k ); B(k) = (1 + j W k )
2 2
The inverse transform is obtained by, first, calculating:
( )
N N
Y (k) = A (k)X(k) + B(k)X −k ; 0≤k ≤ −1 (3.16)
2 2
then, calculating the inverse DFT of size N/2 and, finally, taking the real parts as odd index data
and the imaginary parts as even index data.
If N is a power of 2, the number of complex multiplications Mc to be performed is
( )
N N N N
Mc = log2 + = log2 N (3.17)
4 4 2 4
Memory locations are required for N real numbers. An algorithm for real data is described in
detail in Reference [4]. Another method of calculating the transforms of real numbers is to use odd
transforms [5].
The odd DFT, by definition, establishes the following relations between two sets of N complex
numbers x(n) and X(k):

1∑
N−1
X(k) = x(n)e−j2𝜋(2k+1)n∕(2N) (3.18)
N n=0

N−1
x(n) = X(k)ej2𝜋(2k+1)n∕(2N)
k=0

The coefficients of this transform have as their coordinates the points M of a unit circle such that
−−→
the vector OM forms an angle with the abscissa which is an odd multiple of 2𝜋/2N, as shown in
Figure 3.1.
By setting W = e−j(𝜋/N) , the matrix of this transform is written:

⎡1 W W 2 … W N−1 ⎤
⎢1 W 3 W 6 … W 3(N−1) ⎥
I ⎢ ⎥
TN = ⎢1 W 5 W 10 … W 5(N−1) ⎥
⎢⋮ ⋮ ⋮ ⎥
⎢ (2N−1) … …W (2N−1)(N−1) ⎥
⎣ 1 W ⎦

Im Figure 3.1 Coefficients of the odd discrete Fourier transform.

M
2𝜋
N
1
0 Re
3.3 Partial Transforms 61

If the x(n) are real numbers, one can write:

1∑
N−1
X(N − 1 − k) = x(n)e−j2𝜋[2(N−1−k)+1]∕(2N)
N n=0

1∑
N−1
= x(n)ej2𝜋(2k+1)n∕(2N) (3.19)
N n=0
Thus,
X(N − 1 − k) = X(k) (3.20)
Consequently, since the X(k) with even and odd indices are complex conjugates, it is sufficient
to calculate the X(k) with an even index in order to perform a transform on real numbers. Such a
transform is the matrix T R given by:
⎡1 W W 2 … W N∕2 … W N−1 ⎤
⎢ ⎥
1 W 5 W 10 … W 5N∕2 … W 5(N−1) ⎥
TR = ⎢
⎢⋮ ⋮ ⋮ ⋮ ⎥
⎢1W 2N−3 … …W (2N−3)N∕2 …W (2N−3)(N−1) ⎥
⎣ ⎦
Let DN/2 be the diagonal matrix whose elements are W k with 0 ⩽ k ⩽ N/2 − 1 and let T N/2 be
the matrix of the transform of order N/2. Allowing for the fact that W 2N = 1 and W N/2 = −j, this
becomes:
TR = [TN∕2 D, −jTN∕2 D] = (TN∕2 D) × [1, −j] (3.21)
The odd transform of the real data is then calculated by carrying out a transform of order N/2 on
the set of complex numbers:
[ ( )]
N N
y(n) = x(n) − jx + n W n with 0 ⩽ n ⩽ −1 (3.22)
2 2
The number of calculations is the same as in the method illustrated at the beginning of this
section, but the structure is simpler. It should be noted that the transformed numbers give a fre-
quency sample of the signal spectrum represented by the x(n), displaced by a half step on the
frequency axis.
An important case where significant simplifications are introduced is that of real symmetrical
sets. Reductions in the calculations are illustrated by using the doubly odd transform [6].

3.3.2 The Odd-time Odd-frequency DFT


The odd-time odd-frequency DFT, by definition, establishes the following relations between two
sets of N complex numbers x(n) and X(k):

1∑
N−1
X(k) = x(n)e−j2𝜋(2k+1)(2n+1)∕(4N) (3.23)
N n=0

N−1
x(n) = X(k)e j2𝜋(2k+1)(2n+1)∕(4N) (3.24)
k=0

The coefficients of this transform are based on the points M of a unit circle such that the vector
−−→
OM forms an angle with the abscissa which is an odd multiple of 2𝜋/4N as shown in Figure 3.2.
If the x(n) are real numbers, the following holds:
X(N − 1 − k) = −X(k)
62 3 Other Fast Algorithms for the FFT

Im Figure 3.2 Coefficients of the doubly odd discrete Fourier


transform.

M
2𝜋
N
1
0 Re

Similarly, if the X(k) are real numbers, then:

x(N − 1 − n) = −x(n)

By assuming, as before, that W = e−j(𝜋/N) , the matrix of the transform is written as:

⎡ W 1∕2 W 3∕2 W 5∕2 … W N−1∕2 ⎤


⎢ W 3∕2 W 9∕2 W 15∕2 … W 3(N−1∕2) ⎥
II ⎢ ⎥
TN = ⎢ W 5∕2 W 15∕2 W 25∕2 … W 5(N−1∕2) ⎥ (3.25)
⎢ ⋮ ⋮ ⋮ ⋮ ⎥
⎢ N−1∕2 3(N−1∕2) ⎥
⎣W W … …W (2N−1)(N−1∕2) ⎦
This transform is factorized as follows:
⎡1 0 0 … 0 ⎤ ⎡1 1 1 … 1 ⎤ ⎡
⎢0 1 0 0 … 0 ⎤
W 0 … 0 ⎥ ⎢1 W 2 W 4 … W 2(N−1) ⎥ ⎢ ⎥
1∕2 ⎢ ⎥⎢ ⎥ 1 W 0 … 0 ⎥
TN = W ⎢⋮
II
⋮ ⋮ ⎥ ⎢1 W 4 W 8 … W 4(N−1) ⎥ × ⎢
⎢⋮ ⎢⋮ ⋮ ⋮ ⎥
⋮ ⋮ ⎥ ⎢⋮ ⋮ ⋮ ⎥ ⎢
⎢ ⎥⎢ ⎥ 0 0 … W N−1 ⎥⎦
⎣0 0 … W N−1 ⎦ ⎣1 W 2(N−1) … W 2(N−1)(N−1) ⎦ ⎣
That is,

TNII = W 1∕2 DN TN DN (3.26)

Let us consider the case where the set of data x(n) is real and antisymmetric – that is,
x(n) = −x(N − 1 − n). Then the same applies to the set X(k). The set of the x(n) for even n is equal
to the set of the x(n) for odd n, except for the sign. The situation is the same for the set of X(k).
In order to calculate the transform, it is sufficient in this case to carry out the calculations for the
x(2n) with 0 ⩽ n ⩽ N/2 − 1, since the X(k) are real numbers. Alternatively, it is sufficient to perform
the calculations on the X(2k) with 0 ⩽ k ⩽ N/2 − l.
The corresponding matrix T RR is written as:

⎤⎡
1 1 … 1 1⎤ ⎡ 0 … 0 ⎤
⎡1 0 0 … 0
⎢ ⎥⎢⎢ 1 W 2 … W 8[(N∕2)−1] 0⎥ ⎢W 2 … 0 ⎥
0 W2 0 … 0 ⎥⎢ ⎥
TRR = W 1∕2 ⎢ ⎥ ⎢1 W 16 … ⋮ ⋮⎥ ⎢ ⋮ ⋮ ⎥
⎢⋮ ⋮ ⋮ ⎥⎢
⎢0 ⋮ ⋮ ⋮ ⋮⎥ ⎢ ⋮ ⋮ ⎥
⎣ 0 … … W 2(N∕2−1) ⎥⎦ ⎢ ⎥⎢ ⎥
⎣1 W 8[(N∕2)−1] W 8[(N∕2)−1][(N∕2)−1] 0⎦ ⎣ 0 W 2[(N∕2)−1]

Allowing for W 2N = 1, this becomes:
[ ]
1∕2 TN∕4 TN∕4
TRR = W DN∕2 D (3.27)
TN∕4 TN∕4 N∕2
3.3 Partial Transforms 63

Table 3.1 Arithmetic complexities of the various fast fourier transforms.

Complex multiplications Complex additions Memory positions


( )
N
Complex DFT log2 N2 N log2 N 2N
2 ( )
N N
Odd DFT – real data log2 (N) log2 N2 N
4 2 ( )
N N
Doubly odd DFT – symmetrical real data 8
log2 (2N) 4
log2 N4 N
2

and, as W 2N/4 = −j, this calculation can be carried out with one performance of the operations
represented by the matrix T N/4 , on the set of numbers x(2n) − jx(2n + N/2) with 0 ⩽ n ⩽ N/4–1.
The N/4 numbers obtained are complex ones whose real parts form the set of the desired X(2 k)
with 0 ⩽ k ⩽ N/4 − 1. By carrying out the operation defined by T RR for the transformed numbers of
rank 2 k + N/2 with 0 ⩽ k ⩽ N/4 − 1, it can be verified that we have obtained the earlier numbers
multiplied by −j. That is, the imaginary part of the numbers obtained previously furnishes the set of
the X(2 k + N/2). It follows that, if the doubly odd transform is applied to a real and antisymmetric
set of N terms, or to a symmetric set which becomes antisymmetric through a suitable change of
sign, it is reduced to the equation:
[ ( )] [ ( )]
N N
X(2k) + jX 2k + = W 1∕2 DN∕4 TN∕4 DN∕4 x(2n) − jx 2n + (3.28)
2 2
with 0 ⩽ k ⩽ N/4 − 1, 0 ⩽ n ⩽ N/4 − 1, and where DN/4 is a diagonal matrix, whose elements are
W 2i with 0 ⩽ i ⩽ N/4 − 1.
The number of complex multiplications Mc which are necessary is:
( )
N N N N
Mc = log2 + 2 = log2 (2n) (3.29)
8 8 4 8
Comparisons using the different transforms are given in Table 3.1 to illustrate the amount of
calculations for each type.
The importance of the odd transforms can be easily seen. It should, however, be noted that other
algorithms allow greater reductions to be made for real data and for symmetric real data [7], but
these are not as simple to use, especially for practical implementations.
One feature of the doubly odd transform, when applied to a real antisymmetric set, is that it is
identical to the inverse transform. Apart from the scale factor 1/N, there is no distinction, in this
case, between the direct and inverse transforms.
The Fourier transform of a real symmetric set is introduced, for example, when deriving the
power spectrum density of a signal from its autocorrelation function.

3.3.3 Sine and Cosine Transforms


The transforms considered so far have complex coefficients. Discrete transforms of the same family
can be obtained using the real and imaginary parts of the complex coefficients.
The following transforms can be defined:

(1) The cosine DFT (cos-DFT):

1∑
N−1
2𝜋nk
XCF (k) = x(n) cos (3.30)
N n=0 N
64 3 Other Fast Algorithms for the FFT

(2) The sine DFT (sin-DFT):

1∑
N−1
2𝜋nk
XSF (k) = x(n) sin (3.31)
N n=0 N
(3) The discrete cosine transform (DCT):


2 N−1
XDC (0) = x(n)
N n=0
( )
2∑
N−1
2𝜋(2n + 1)k
XDC (k) = x(n) cos (3.32)
N n=0 4N
The inverse transform is given by:

1 ∑
N−1
2𝜋(2n + 1)k
x(n) = √ XDC (0) + XDC (k) cos
2 n=1
4N
(4) The discrete sine transform (DST):

√( )N−1 [ ]
2 ∑ 2𝜋(n + 1)(k + 1)
XDS (k) = x(n) sin (3.33)
N + 1 n=0 2N + 2

Through algebraic manipulations, as in the previous sections, it is possible to establish rela-


tionships between the standard DFT and these transforms, and also among these transforms
themselves.
For example, from the definitions, we have:
DFT(N) = cos -DFT(N) − j sin -DFT(N)
Now, considering the cosine DFT:
[ ]

N∕2−1
2𝜋nk ∑
N∕4−1
2𝜋(2n + 1)k
XCF (k) = x(2n) cos + [x(2n + 1) + x(N − 2n − 1)] cos
n=0
N∕2 n=0
4N∕4
it is clear that the cosine DFT of order N can be completed with the help of a cosine DFT of order
N/2 and a DCT of order N/4. In concise form:
cos DFT(N) = cos DFT(N∕2) + DCT(N∕4)
Similarly, the DCT is expressed by:
N∕2−1 [ ]
2 ∑ 2𝜋(4n + 1)k 2𝜋[4(N − n − 1) + 1]k
XDC (k) = x(2n) cos + x(2n + 1) cos
N n=0 4N 4N
and now, taking
N
y(n) = x(2n); 0⩽n⩽ −1
2
y(N − n − 1) = x(2n + 1)
we have:
2∑
N−1
2𝜋(4n + 1)k
XDC (k) = y(n) cos
N n=0 4N
3.3 Partial Transforms 65

DCT (x1) x1

Inverse
S s
DFT

DCT (x2) x2

Figure 3.3 Computing 2 inverse DCTs with the help of a single same-size inverse DFT.

Expanding the cosine function yields:


2𝜋k 2𝜋k
DCT(N) = cos cos DFT(N) − sin sin −DFT(N)
4N 4N
In concise form
{ 𝜋k
}
DCT(x) = 2 c(k) Re DFT(y) e−j 2N (3.34)

1
c(0) = √ ; c(k) = 1; 1 ≤ k ≤ N − 1
2
Finally, the DCT of order N can be completed with the help of a DFT of the same order.
Taking into account the fact that only the real part is exploited in the above equation, it is even
possible to compute two DCTs, using the imaginary part. Similarly, it is possible to compute two
inverse DCTs with a single inverse DFT, as illustrated in Figure 3.3.
At the input of the inverse DFT, the relations between variables are the following [8]:
C0 (x1 ) + jC0 (x2 ) CN∕2 (x1 ) + jCN∕2 (x2 )
S0 = √ ; SN∕2 = √
2 2
{ ( ) ( )}
𝜋k 𝜋k
2Sk = [Ck (x1 ) + CN−k (x2 )] cos + [CN−k (x1 ) − CN−k (x2 )] sin ]
2N 2N
{ ( ) ( )}
𝜋k 𝜋k
+ j [Ck (x1 ) + CN−k (x2 )] sin + [Ck (x2 ) − CN−k (x1 )] cos ]
2N 2N
1 ≤ k ≤ N − 1; k ≠ N∕2
Similarly, at the output of the inverse DFT:
x1 (2p) = Re{s(p)}; x1 (2p + 1) = Re{s(N − p − 1)}

x2 (2p) = Im{s(p)}; x2 (2p + 1) = Im{s(N − p − 1)}


The scheme yields computation gains in image compression, for example.
Among the transforms based on real coefficients, the discrete Hartley transform (DHT) is worth
mentioning. It is defined by [9]:
[ ]
1∑
N−1
nk nk
XDH (k) = x(n) cos 2𝜋 + sin 2𝜋 (3.35)
N n=0 N N
and the inverse transform:
[ ]

N−1
nk nk
x(n) = XDH (k) cos 2𝜋 + sin 2𝜋
k=0
N N
66 3 Other Fast Algorithms for the FFT

The connection with the DFT is given by:


1
X(k) = [X (k) + XDH (N − 1 − k) − j(XDH (k) − XDH (N − 1 − k))] (3.36)
2 DH
The discrete sine and cosine transforms have been introduced for information compression, par-
ticularly in image processing. It is worth pointing out that, for images, they provide reasonably
accurate approximations of the Eigen transform, which yields a signal representation with the min-
imum number of parameters.
This compression power stems from the cancellation of discontinuities at the edge of the DFT,
as pointed out in Section 2.1, because the DFT is performed on a symmetric sequence. However,
this implies that the signal sample with index zero is zero, which is obtained by using the sequence
u(n), defined as follows:
u(2p) = 0; 0 ≤ p ≤ 2N − 1
u(2p + 1) = u(4N − 2p − 1) = x(p); 0≤p≤N −1
The 4N size DFT of the sequence u(n) leads, after simplifications, to expression (3.32).

3.3.4 The Two-dimensional DCT


For a set of (N × N) real data, the two-dimensional DCT (2D-DCT) is defined by:
4e(k1 )e(k2 ) ∑ ∑
N−1 N−1
2𝜋(2n1 + 1)k1 2𝜋(2n2 + 1)k2
X(k1 , k2 ) = 2
x(n1 , n2 ) cos cos (3.37)
N n =0 n =0
4N 4N
1 2

and:
∑ ∑
N−1 N−1
2𝜋(2n1 + 1)k1 2𝜋(2n2 + 1)k2
x(n1 , n2 ) = e(k1 )e(k2 )X(k1 , k2 ) cos cos
k1 =0 k2 =0
4N 4N

with:
1
e(k) = √ ; k=0
2
e(k) = 1; k≠0
That transform is separable, and it can be computed as follows:
[ ]
2 ∑
N−1
2𝜋(2n2 + 1)k2 2 ∑ (
N−1
) 2𝜋(2n1 + 1)k1
X(k1 , k2 ) = e(k2 ) cos × e(k ) x n1 , n2 cos
N n =0
4N N 1 n =0 4N
2 1

Thus, the 2D-transform can be computed using 2N times the 1D-DCT, and the number of real
multiplications is of the order of N 2 log2 (N). In fact, that amount can be reached with the help of
an algorithm based on the decomposition of a DCT of order N into two DCTs of order N/2 [10]. It is
even possible to reach the value 3/4 N 2 log2 N, through extension of the decimation technique and
splitting the set of (N × N) data into subsets of (N/2 × N/2) data [11].

3.4 Lapped Transform

The filtering function of the DFT is improved when the transform is performed on overlapping
blocks of data [12].
3.5 Other Fast Algorithms 67

Time n + 1

X1(n + 1) X2(n + 1)

X1(n) X2(n)

Time n

Figure 3.4 Overlapping data blocks.

Let us consider a data sequence of length 2M and a transform whose matrix is M × 2M. At time n,
the block of processed data can be represented by two M-element vectors denoted X 1 (n) and X 2 (n),
as shown in Figure 3.4.
At time n + 1, a block of data includes half the data of the previous block – that is,
X 1 (n + 1) = X 2 (n). The lapped transform allows the two vectors to be recovered. The block
of data is multiplied by the matrix [A B], A and B being M × M matrices:
U_1 = [A, B] [X_1(n); X_2(n)];   U_2 = [A, B] [X_1(n + 1); X_2(n + 1)]   (3.38)
Then, the results, U 1 at time n and U 2 at time n + 1, are multiplied by the transpose matrix.
[Y_1(n); Y_2(n)] = [Aᵗ; Bᵗ] U_1;   [Y_1(n + 1); Y_2(n + 1)] = [Aᵗ; Bᵗ] U_2   (3.39)
which yields:
Y_2(n) + Y_1(n + 1) = Bᵗ A X_1(n) + Bᵗ B X_2(n) + Aᵗ A X_1(n + 1) + Aᵗ B X_2(n + 1)   (3.40)

At time n + 1, the input data are restored by addition: X_2(n) = (1/2)[Y_2(n) + Y_1(n + 1)], provided the following conditions are satisfied:

Bᵗ A = Aᵗ B = 0;   (1/2)(Bᵗ B + Aᵗ A) = I_M   (3.41)
For example, these conditions are met if the elements of the matrix [A B] are:
a_nk = √(2/M) h(n) cos[(n + (M + 1)/2)(k + 1/2)(π/M)]

0 ⩽ n ⩽ 2M − 1;   0 ⩽ k ⩽ M − 1;   h(n) = −sin[(n + 1/2)(π/2M)]
In fact, a bank of M orthogonal filters has been obtained and the terms h(n) are the coefficients of
the prototype filter, which, in the present case, is a low-pass filter whose frequency response, given
in Section 5.8, is more selective than that of the DFT filter.
In image compression, lapped transforms introduce smoothing and attenuate the so-called blocking effects.
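The perfect-reconstruction conditions (3.41) can be checked numerically. The short sketch below is an illustration added here (M = 8 is an arbitrary choice): it builds the M × 2M matrix [A B] from the cosine basis and the sine prototype window given above, and verifies the three conditions.

import numpy as np

M = 8
n = np.arange(2 * M)                     # 0 <= n <= 2M - 1
k = np.arange(M)                         # 0 <= k <= M - 1

h = -np.sin((n + 0.5) * np.pi / (2 * M))                 # prototype filter
AB = (np.sqrt(2.0 / M) * h[None, :]
      * np.cos((n[None, :] + (M + 1) / 2) * (k[:, None] + 0.5) * np.pi / M))
A, B = AB[:, :M], AB[:, M:]              # [A B] is M x 2M

print(np.allclose(B.T @ A, 0, atol=1e-12))                          # B^t A = 0
print(np.allclose(A.T @ B, 0, atol=1e-12))                          # A^t B = 0
print(np.allclose((B.T @ B + A.T @ A) / 2, np.eye(M), atol=1e-12))  # = I_M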

3.5 Other Fast Algorithms


Fast Fourier transform algorithms form a technique for calculating a DFT of order N using a num-
ber of multiplications of the order of N log2 N. It has been shown in the previous sections that these
algorithms have a relatively simple structure and offer sufficient flexibility for good adaptation to the operating constraints and to the target technology. Hence, they are of great interest for practical applications.
Nevertheless, they are not the only method of fast calculation of a DFT, and algorithms can be
elaborated which involve, at least in certain cases, a shorter calculation time or a lower number of
multiplications, or which are applicable for values of the order N which are not necessarily powers
of two.
A first approach consists of replacing the complex multiplications, which are costly in terms of circuitry or time, with a set of operations which are simpler to implement. Reference [13] describes a technique which uses one property of the DFT mentioned in Section 2.4.1 – namely, the fact that multiplications by the coefficients W^k correspond to phase rotations.
The technique known as CORDIC (coordinate rotation digital computer) enables these rotations to be realized by chaining simple operations: to rotate a vector (x, y) through an angle θ with an accuracy of the order of 2^{−n}, a sequence of n elementary rotations of angle dθ_i is performed such that tan dθ_i = 2^{−i}, with 0 ⩽ i ⩽ n − 1 and −π/2 ⩽ θ ⩽ π/2. The coordinates x_i and y_i of the vector at iteration i yield the coordinates at iteration i + 1 by using the expressions:

x_{i+1} = x_i + sign[θ_i] · y_i 2^{−i}
y_{i+1} = y_i − sign[θ_i] · x_i 2^{−i}
θ_{i+1} = θ_i − sign[θ_i] · dθ_i   (3.42)

The function sign[θ_i] is the sign of θ_i, and θ_0 = −θ. These operations consist only of additions and shifts and can therefore be more advantageous than a complex multiplication of the same precision.
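A minimal Python sketch of these iterations follows (added here for illustration; floating-point is used for readability, and the final correction for the fixed gain ∏√(1 + 2^{−2i}) accumulated by the pseudo-rotations is a standard detail not spelled out in the text):

import math

def cordic_rotate(x, y, theta, n=24):
    # Iterations (3.42): only additions, sign tests and scalings by 2**-i
    # (shifts in fixed-point hardware) are required.
    z, gain = -theta, 1.0                 # theta_0 = -theta, as in the text
    for i in range(n):
        s = 1.0 if z >= 0 else -1.0
        x, y = x + s * y * 2.0**-i, y - s * x * 2.0**-i
        z -= s * math.atan(2.0**-i)       # d(theta_i), with tan(d theta_i) = 2**-i
        gain *= math.sqrt(1.0 + 2.0**(-2 * i))
    return x / gain, y / gain             # remove the fixed pseudo-rotation gain

print(cordic_rotate(1.0, 0.0, math.pi / 5))           # ~ (0.8090, 0.5878)
print(math.cos(math.pi / 5), math.sin(math.pi / 5))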
One method which is particularly interesting is used to obtain a multiplication volume of order N, instead of N log₂ N, for a DFT of order N. This method depends on factorizing the matrix T_N in a particular way. It is decomposed into a product of three factors:

T_N = B_N C_N A_N

where A_N is a J × N matrix, J is a whole number, C_N is a diagonal matrix of dimension J, and B_N is an N × J matrix. The special feature of this factorization is that the elements of the matrices A_N and B_N are 0, 1, or −1. Under these conditions, the calculation requires only J multiplications.
For example, the complex multiplication takes the form:

[a, −b; b, a] = [1, −1, 0; 1, 0, 1] · diag(a, a + b, b − a) · [1, 1; 0, 1; 1, 0]

(rows separated by semicolons), which shows that 3 real multiplications are sufficient, as shown in Section 2.7.
This decomposition is obvious for J = N²; for example, for N = 3, we obtain:

T_3 = [1, 1, 1, 0, 0, 0, 0, 0, 0; 0, 0, 0, 1, 1, 1, 0, 0, 0; 0, 0, 0, 0, 0, 0, 1, 1, 1] · diag(1, 1, 1, 1, W, W², 1, W², W) · [I_3; I_3; I_3]

where the last factor stacks the 3 × 3 identity matrix three times (note that W⁴ = W, since W³ = 1).
With some low values of N, certain factorizations are available in which J is of the order of N, in which case there is the same number of multiplications. In order to generalize this property and to illustrate a suitable factorization of T_N, it is necessary to perform a permutation of the data before and after transformation. For example, for N = 12, by assuming:
X′ = (X_0, X_3, X_6, X_9, X_4, X_7, X_10, X_1, X_8, X_11, X_2, X_5)ᵗ

x′ = (x_0, x_9, x_6, x_3, x_4, x_1, x_10, x_7, x_8, x_5, x_2, x_11)ᵗ

and by using the Kronecker products of the matrices, it can be shown that:

X′ = (T_3 × T_4) x′
Similarly, if N has L factors such that:

N = N_L N_{L−1} ··· N_1

it can be shown that:

X′ = (T_{N_L} × T_{N_{L−1}} × ··· × T_{N_1}) x′   (3.43)

By using the factorization defined earlier for the matrices T_{N_i} and the algebraic properties of the Kronecker products, this becomes:

X′ = (B_{N_L} × ··· × B_{N_1})(C_{N_L} × ··· × C_{N_1})(A_{N_L} × ··· × A_{N_1}) x′
This result defines a type of algorithm called the Winograd algorithm.
It can be clearly seen that the algorithm of order N is deduced from algorithms of order N i with
1 ⩽ i ⩽ L. Herein lies the importance of algorithms with small numbers of multiplications for small
values of N. Reference [14] gives algorithms for N = 2, 3, 4, 5, 7, 8, 9, and 16, where the number
of multiplications is of the order of N, as shown in Table 3.2. In the multiplication column, the
figures in parentheses give the number of multiplications by coefficients different from 1. Further,
these are complex multiplications which correspond to two real multiplications. The number of
additions is comparable to that of FFT algorithms.
The algorithms for low values of N are obtained by calculating the Fourier transform as a set of
correlations:

X_k = Σ_{n=1}^{N−1} (x_n − x_0) W^{nk};   k = 1, …, N − 1

and by using the algebraic properties of the set of exponents of W which are defined modulo N.
For example, for N = 4, the sequence of operations is as follows:
t1 = x0 + x2 , t2 = x1 + x3
m0 = 1(t1 + t2 ), m1 = 1(t1 − t2 )
m2 = 1(x0 − x2 ), m3 = j(x1 − x3 )

Table 3.2 Arithmetic complexities of low-order Winograd algorithms.

Order of the DFT    Multiplications    Additions
 2                   2 (0)              2
 3                   3 (2)              6
 4                   4 (0)              8
 5                   6 (5)             17
 7                   9 (8)             36
 8                   8 (2)             26
 9                  11 (10)            44
16                  18 (10)            74

X0 = m 0
X1 = m 2 + m 3
X2 = m 1
X3 = m 2 − m 3
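This N = 4 scheme transcribes directly into Python (an illustration added here; note that the text's convention W = e^{j2π/N} is the conjugate of the one used by np.fft.fft, so the check is made against an explicit DFT matrix):

import numpy as np

def winograd_dft4(x):
    # Only trivial multiplications (by 1 and by j) are needed.
    x0, x1, x2, x3 = x
    t1, t2 = x0 + x2, x1 + x3
    m0, m1 = t1 + t2, t1 - t2
    m2, m3 = x0 - x2, 1j * (x1 - x3)
    return np.array([m0, m2 + m3, m1, m2 - m3])

x = np.random.randn(4) + 1j * np.random.randn(4)
W = np.exp(2j * np.pi / 4)
F = W ** np.outer(np.arange(4), np.arange(4))   # X_k = sum_n x_n W^{nk}
print(np.allclose(winograd_dft4(x), F @ x))     # True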

For N = 8:
t1 = x0 + x4 , t2 = x2 + x6 , t3 = x1 + x5 ,
t4 = x1 − x5 , t5 = x3 + x7 , t6 = x3 − x7 ,
t7 = t1 + t2 , t8 = t3 + t5 ,

m_0 = 1 · (t_7 + t_8),   m_1 = 1 · (t_7 − t_8),
m_2 = 1 · (t_1 − t_2),   m_3 = 1 · (x_0 − x_4),
m_4 = cos(π/4) · (t_4 − t_6),   m_5 = j(t_3 − t_5),
m_6 = j(x_2 − x_6),   m_7 = j sin(π/4) · (t_4 + t_6),

s_1 = m_3 + m_4,   s_2 = m_3 − m_4,   s_3 = m_6 + m_7,   s_4 = m_6 − m_7

X_0 = m_0,   X_1 = s_1 + s_3,   X_2 = m_2 + m_5,   X_3 = s_2 − s_4,
X_4 = m_1,   X_5 = s_2 + s_4,   X_6 = m_2 − m_5,   X_7 = s_1 − s_3.

Finally, Winograd algorithms generally bring a reduction in the amount of computation, which can be quite significant when compared to FFT algorithms. The situation is similar for other algorithms, such as those using polynomial transforms [15].
These techniques have a broad range of potential applications and are of considerable importance
in certain cases. However, it should be noted that they may require a larger memory capacity and
a more complicated sequence of operations, which results in an increase in the size of the system’s
control unit, or in the volume of the program memory.
Another attractive path toward the optimization of processing and machines consists of avoiding
multiplications, as with the Hadamard transform and number theoretic transforms.

3.6 Binary Fourier Transform – Hadamard


The DFT and its variants require a computation load that can be seen as unaffordable in certain
applications, such as fast real-time processing of large images. This being the case, transforms with
similar properties but without multiplications can be used [16].
The order N = 2^M Hadamard transform is defined by the matrix H_N derived from the size-2 DFT matrix T_2 through Kronecker products:

H_2 = T_2 = [1, 1; 1, −1];   H_4 = T_2 × T_2 = [1, 1, 1, 1; 1, −1, 1, −1; 1, 1, −1, −1; 1, −1, −1, 1];   H_N = T_2 × H_{N/2}   (3.44)

It is real, symmetric, and orthogonal:

(1/N) H_N H_N = I_N
The algorithm structure is the same as that of the FFT, but there are no multiplications in the
butterflies. Regarding filtering, the filter bank is much less selective because of the presence of
harmonics of basic frequencies. As an illustration, Figure 3.5 shows the Fourier transform of row
31 in matrix H64 .
Hadamard matrices have applications in certain types of wireless communication systems that
resort to spread spectrum to cope with adverse environments.
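A compact Python sketch of the multiplication-free butterflies is given below (an illustration added here; fwht is an arbitrary helper name), together with a check against definition (3.44):

import numpy as np

def fwht(x):
    # FFT-like butterflies using additions/subtractions only; len(x) = 2**M.
    x = np.asarray(x, dtype=float).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x

T2 = np.array([[1, 1], [1, -1]])
H = T2
for _ in range(2):                        # H_8 = T2 x T2 x T2, eq. (3.44)
    H = np.kron(T2, H)

x = np.random.randn(8)
print(np.allclose(fwht(x), H @ x))        # butterflies match the definition
print(np.allclose(H @ H / 8, np.eye(8)))  # (1/N) H_N H_N = I_N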

3.7 Number-Theoretic Transforms


Fourier transformation involves performing arithmetic operations on the field of complex numbers.
The machines which carry out these operations generally use binary representations which are
approximations of the data and the coefficients. The precision of the calculations is a function of
the number of bits available in the machine.

Amplitude
0.5
0.45
0.4
0.35
0.3
0.25
0.2
0.15
0.1
0.05
0
0 10 20 30 40 50 60
Index (0 – 63)

Figure 3.5 Spectrum of the Hadamard code 31/64.



In practice, a machine with B bits carries out its operations on the set of 2^B integers: 0, 1, …, 2^B − 1. In this set, the usual laws of addition and multiplication cannot be applied as such: shifting and truncation must be introduced, which lead to approximations in the calculations, as was shown in Section 2.3.
The first condition to be fulfilled to ensure exact calculations in a set E is that the product or the
sum of two elements of the set E must also belong to this set. This condition is satisfied in the set
of integers 0, 1, …, M − 1 if the calculations are made modulo M. By appropriate selection of the
modulus M, it is possible to define transformations with properties which are comparable to those
of the DFT and which allow error-free calculation of convolutions with fast calculation algorithms.
The definition of such transformation rests on the algebraic properties of integers modulo M, for
certain values of M. They are called number-theoretic transforms.
The choice of the modulus M is governed by the following considerations:

(1) Simplicity of the calculations in the modular arithmetic. In principle, modular arithmetic implies a division by the modulus M. This division is trivial for M = 2^m. It is very simple for M = 2^m ± 1, because the result is obtained by adding a carry bit (1's complement arithmetic) or subtracting it.
(2) The modulus must be sufficiently large. The result of the convolution must be capable of representation without ambiguity in modulo M arithmetic. For example, a convolution with 32 terms, with 12-bit data and 8-bit coefficients, requires M > 2^25.
(3) Suitable algebraic properties. The set of modulo M integers should have algebraic properties
allowing the definition of transformations comparable to the DFT.

First, there should be periodic elements in order that the fast algorithms can be elaborated; the set must have an element α such that:

α^N = 1

A transformation can then be defined by the expression:

X(k) = Σ_{n=0}^{N−1} x(n) α^{nk}   (3.45)

For the existence of the inverse transformation, which is defined by the expression:

x(n) = N^{−1} Σ_{k=0}^{N−1} X(k) α^{−nk}   (3.46)

it is first necessary for N and the powers of α to have inverses.


It can be shown that N has an inverse modulo M if N and M are relatively prime. The element α should be relatively prime to M and of order N – that is, α^N = 1 and α^i ≠ 1 for 1 ⩽ i < N.
The existence of the inverse transformation implies a further condition on the α^i, namely:

Σ_{k=0}^{N−1} α^{ik} = N δ(i)   with δ(i) = 1 if i ≡ 0 modulo N, and δ(i) = 0 if i ≢ 0 modulo N

This condition reflects the fact that each element (1 − 𝛼 i ) must have an inverse. It can be shown
that all of the conditions for the existence of a transformation and its inverse reduce to the following
one. For each prime factor P of M, N must be a factor of P − 1. Thus, if M is prime, N must divide
into M − 1.

Fast algorithms can be elaborated if N is a composite number – in particular, if N is a power of 2:

N = 2^m

These algorithms are similar to those of the FFT. The calculations to be performed in the transformation are considerably simplified in the particular case when α = 2.
Finally, an interesting choice for the modulus M is:

M = 2^(2^m) + 1

where M is prime. These are the Fermat numbers.
A transform based on the Fermat numbers is defined as follows:

(1) Modulus: M = 2^(2^m) + 1
(2) Order of the transform: N = 2^(m+1)
(3) Direct transform:

X(k) = Σ_{n=0}^{N−1} x(n) 2^{nk}

(4) Inverse transform:

x(n) = 2^t Σ_{k=0}^{N−1} X(k) 2^{−nk}   with t = 2^(m+1) − m − 1

Example: m = 3, 2^m = 8, M = 257, N = 16, t = 12.
This transform allows the calculation of convolutions of real numbers, as with the DFT, but with the following advantages, illustrated by the sketch below:
(a) The result is obtained without approximation.
(b) The operations relate to real numbers.
(c) The calculation of the transform and its inverse does not require any multiplication; the only multiplications which remain are in the transformed space.
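The following short numerical sketch is added here for illustration, with m = 3, that is M = 257, N = 16, t = 12; the two integer sequences are arbitrary small values kept below M so that the circular convolution is represented without ambiguity:

M, N, t = 257, 16, 12          # Fermat number: m = 3, M = 2**8 + 1

def fnt(x):
    # X(k) = sum_n x(n) 2**(n k)  (mod M); with alpha = 2, the transform
    # needs only modular shifts and additions.
    return [sum(x[n] * pow(2, n * k, M) for n in range(N)) % M for k in range(N)]

def ifnt(X):
    inv2 = pow(2, M - 2, M)    # 2**-1 modulo the prime M (Fermat's theorem)
    return [pow(2, t, M) * sum(X[k] * pow(inv2, n * k, M) for k in range(N)) % M
            for n in range(N)]

x = [1, 2, 3, 4] + [0] * 12
h = [1, 1, 2] + [0] * 13
y = ifnt([a * b % M for a, b in zip(fnt(x), fnt(h))])

# Reference: direct circular convolution modulo M.
ref = [sum(x[m_] * h[(n - m_) % N] for m_ in range(N)) % M for n in range(N)]
print(y == ref, y[:6])         # True [1, 3, 7, 11, 10, 8] -- exact, no rounding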
This technique does, nevertheless, have some significant limitations. As the calculations are exact, the modulus M needs to be sufficiently large, resulting in long word lengths. The relations between the parameters M and N given above require that the calculations be carried out with a number of bits B of the order of N/2. That is, the number of terms in the convolution is approximately twice the number of bits of the data in the calculation. The application is consequently restricted to convolutions which contain a small number of terms.
The field of application of number theoretic transforms can be widened by employing num-
bers other than the Fermat numbers, or by treating the convolutions with many terms as
two-dimensional convolutions [17]. Reference [18] describes a practical example.
Number-theoretic transforms are used in error-correction coding – particularly in Reed–Solomon
codes.

Exercises

3.1 Find the Kronecker product A × I 3 for the matrix A such that
A = [a_11, a_12; a_21, a_22]
and the unit matrix I 3 of dimension 3.
Find the product I 3 × A and compare it with the above.

3.2 By taking square matrices of dimension 2, verify the properties of the Kronecker products
given in Section 3.1.

3.3 Give the factorizations of the matrix of a DFT of order 64 on base 2, base 4, and base 8, fol-
lowing the procedure given in Section 3.2.
Calculate the required number of multiplications for each of these three cases and compare
the results.

3.4 Using the decimation-in-time approach, factorize the matrix of the DFT of order 12. What is
the minimum number of multiplications and additions required? Write a computer program
for the calculation.

3.5 Factorize the matrix of a DFT of order 16 on base 2 for the following two cases:
(a) When the data appear at both input and output in natural order.
(b) When the stages in the calculation are identical.
For the latter case, devise an implementation scheme involving the use of shift registers as
memories.

3.6 Calculate a DFT of order 16 relating to real data.
Give the algorithm which uses a DFT of order 8 for this calculation.
Give the algorithm based on the odd DFT.
Compare these algorithms and the number of operations.

3.7 Calculate a DFT of order 12 by using a factorization of the type given in Section 3.4 and with the given permutations for the data.
Evaluate the number of operations and compare it with the values found in Exercise 3.4.

3.8 To perform a circular convolution of the two sets x = (2, −2, 1, 0) and h = (1, 2, 0, 0), a number-theoretic transform of modulus M = 17 and coefficient α = 4 is used.
As N = 4, verify that α^N = 1 modulo 17. Give the matrices of the transformation and the inverse transformation. Prove that the desired result is the set y = (2, 2, −3, 2).

References

1 M. C. Pease, Methods of Matrix Algebra, Academic Press, New York, 1965.
2 C. S. Burrus and T. W. Parks, DFT/FFT and Convolution Algorithms, Wiley, New York, 1985.
3 H. Sloate, Matrix representations for sorting and the fast Fourier transform, IEEE Transactions, 21(1), 109–116, 1974.
4 G. Bergland, A fast Fourier transform algorithm for real valued series, Communications of the ACM, 11(10), 703–710, 1968.
5 J. L. Vernet, Real signals FFT by means of an odd discrete Fourier transform, Proceedings of the IEEE, 59(10), 1531–1532, 1971.
6 G. Bonnerot and M. Bellanger, Odd-time odd-frequency DFT for symmetric real-valued series, Proceedings of the IEEE, 64(3), 392–393, 1976.
7 H. Ziegler, A fast transform algorithm for symmetric real valued series, IEEE Transactions, 20(5), 1972.
8 C. Diab, M. Oueidat and R. Prost, A new IDCT-DFT relationship reducing the IDCT computational cost, IEEE Transactions on Signal Processing, 50(7), 1681–1684, 2002.
9 R. Bracewell, The fast Hartley transform, Proceedings of the IEEE, 72(8), 1010–1018, 1984.
10 M. Vetterli and H. J. Nussbaumer, Simple FFT and DCT algorithms with reduced number of operations, Signal Processing, 6(4), 267–278, 1984.
11 M. A. Haque, A two-dimensional fast cosine transform, IEEE Transactions, 33(6), 1532–1539, 1985.
12 H. S. Malvar, Signal Processing with Lapped Transforms, Artech House, Norwood, MA, 1992.
13 A. Despain, Very fast Fourier transform algorithms hardware for implementation, IEEE Transactions on Computers, C-28(5), 333–341, 1979.
14 H. Silverman, An introduction to programming the Winograd Fourier transform algorithm, IEEE Transactions, 25(2), 152–165, 1977.
15 H. Nussbaumer, Fast Fourier Transform and Convolution Algorithms, Springer, Berlin, 1981.
16 B. Fino and R. Algazi, Unified matrix treatment of fast Walsh–Hadamard transform, IEEE Transactions on Computers, C-25(11), 1142–1146, 1976.
17 R. C. Agarwal and C. S. Burrus, Number theoretic transforms to implement fast digital convolution, Proceedings of the IEEE, 63(4), 550–560, 1975.
18 J. H. McClellan, Hardware realization of a Fermat number transform, IEEE Transactions, 24(3), 216–225, 1976.

Time-Invariant Discrete Linear Systems

Time-invariant discrete linear systems represent a very important area for digital signal
processing – digital filters with fixed characteristics. These systems are characterized by the
fact that their behavior is governed by a convolution equation. Their properties are analyzed
using the Z-transform, which plays the same role in discrete systems as the Laplace or Fourier
transforms do in continuous systems. In this chapter, the elements which are most useful for
studying such systems will be briefly introduced. To supplement this discussion, reference should
be made to References [1–5].

4.1 Definition and Properties


A discrete system is one which converts a set of input data x(n) into a set of output data y(n). It is
linear if the set x1 (n) + ax2 (n) is converted to the set y1 (n) + ay2 (n), and it is time-invariant if the
set x(n − n0 ) is converted to the set y(n − n0 ) for any integer n0 .
Assume u0 (n) is a unit set, as shown in Figure 4.1, and defined by:
u0 (n) = 1 for n = 0
u0 (n) = 0 for n ≠ 0 (4.1)
The set x(n) can be decomposed into a sum of suitably shifted unit sets:

x(n) = Σ_{m=−∞}^{+∞} x(m) u_0(n − m)   (4.2)

Further, if h(n) is the set forming the system’s response to the unit set u0 (n), then h(n − m) corre-
sponds to u0 (n − m) because of the time-invariance. Linearity then implies the following relation:
y(n) = Σ_m x(m) h(n − m) = Σ_m h(m) x(n − m) = h(n) ∗ x(n)   (4.3)

This is the convolution equation which represents the linear time-invariant (LTI) system. Such
a system is completely defined by the values of the set h(n), which is called the system’s impulse
response.
This system has the property of causality if the output with index n = n_0 depends only on inputs with indices n ⩽ n_0. This property implies that h(n) = 0 for n < 0, and the output is given by:

y(n) = Σ_{m=0}^{∞} h(m) x(n − m)   (4.4)


Figure 4.1 Unit series.

An LTI system is stable if each input with a bounded amplitude has a corresponding bounded output. A necessary and sufficient condition for stability is given by the inequality:

Σ_n |h(n)| < ∞   (4.5)

To show that the condition is necessary, it is sufficient to apply to the system the input set x(n) such that:

x(n) = +1 if h(−n) ⩾ 0;   x(n) = −1 if h(−n) < 0

Then, for n = 0, we obtain:

y(0) = Σ_m |h(m)|

If the inequality (4.5) is not satisfied, then y(0) is not bounded, and the system is not stable. If the input set is bounded – that is,

|x(n)| ⩽ M for all n

then we have:

|y(n)| ⩽ Σ_m |h(m)| |x(n − m)| ⩽ M Σ_m |h(m)|

If the inequality (4.5) is satisfied, then y(n) is bounded, and the condition is sufficient.
In particular, the LTI system defined by the response:

h(m) = a^m with m ⩾ 0

is stable for |a| < 1. The properties of LTI systems will be studied using the Z-transform.

4.2 The Z-Transform

The Z-transform, X(Z), of the set x(n) is defined by the following relation:

X(Z) = Σ_{n=−∞}^{+∞} x(n) Z^{−n}   (4.6)

Z is a complex variable and the function X(Z) has a convergence region which, in general, is an
annular ring centered on the origin, with radii R1 and R2 . That is, X(Z) is defined for R1 < |Z| < R2 .
The values R1 and R2 depend on the set x(n). If the set x(n) represents the samples of a signal taken
with period T, its Fourier transform is written as:

S(f) = Σ_{n=−∞}^{+∞} x(n) e^{−j2πfnT}

Consequently, for Z = ej2𝜋fT , the Z-transform of the set x(n) coincides with its Fourier transform.
That is, the analysis of a discrete system can be performed with the Z-transform, and, in order to
find a frequency response, it is sufficient to replace Z with ej2𝜋fT .
This transform has an inverse. Assuming Γ is a closed contour containing the origin and all singular points, or poles, of X(Z), we can write:

Z^{m−1} X(Z) = Σ_{n=−∞}^{+∞} x(n) Z^{m−1−n} = x(m) Z^{−1} + Σ_{n≠m} x(n) Z^{m−1−n}

and from the theory of residues:

x(m) = (1/2πj) ∮_Γ Z^{m−1} X(Z) dZ   (4.7)
For example, if X(Z) = 1/(1 − pZ^{−1}), direct application of the above equation yields:

x(n) = p^n for n ⩾ 0;   x(n) = 0 for n < 0

Similarly, for X(Z) defined by:

X(Z) = Σ_{i=1}^{N} a_i / (1 − p_i Z^{−1})

there is a corresponding set x(n) such that:

x(n) = Σ_{i=1}^{N} a_i p_i^n for n ⩾ 0;   x(n) = 0 for n < 0

A stability condition appears quite simply from observing that the set x(n) is bounded if and only
if |pi | < 1 for 1 ⩽ i ⩽ N – that is, the poles of X(Z) are inside the unit circle.
In these examples, the terms of the set x(n) can be obtained directly by series expansion. When X(Z) is a rational fraction, a very simple method of obtaining the first values of the set x(n) is direct division. For example, if:

X(Z) = (1 + 2Z^{−1} + Z^{−2} + Z^{−3}) / (1 − Z^{−1} − 8Z^{−2} + 12Z^{−3})
direct division gives:

X(Z) = 1 + 3Z^{−1} + 12Z^{−2} + 25Z^{−3} + ···

and thus:

x(0) = 1; x(1) = 3; x(2) = 12; x(3) = 25
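This direct division is easily mechanized. The short sketch below (an illustration added here; z_series is an arbitrary helper name) computes the series coefficients recursively from the identity X(Z) D(Z) = N(Z):

import numpy as np

def z_series(num, den, terms):
    # x(n) = [num(n) - sum_{k>=1} den(k) x(n-k)] / den(0): term-by-term
    # division of the two polynomials in Z**-1.
    x = np.zeros(terms)
    for n in range(terms):
        acc = num[n] if n < len(num) else 0.0
        for k in range(1, min(n, len(den) - 1) + 1):
            acc -= den[k] * x[n - k]
        x[n] = acc / den[0]
    return x

num = [1, 2, 1, 1]        # 1 + 2Z**-1 + Z**-2 + Z**-3
den = [1, -1, -8, 12]     # 1 - Z**-1 - 8Z**-2 + 12Z**-3
print(z_series(num, den, 4))   # [ 1.  3. 12. 25.]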



The Z-transform has the property of linearity. Also, the Z-transform of the delayed set x(n − n_0) is written:

X_{n_0}(Z) = Z^{−n_0} X(Z)   (4.8)

These two properties are used to calculate the Z-transform, Y(Z), of the set y(n) obtained at the output of a discrete linear system by convolution of the sets x(n) and h(n), which have transforms X(Z) and H(Z), respectively.
By calculating the Z-transform for the two components of the convolution equation (4.3):

y(n) = Σ_m h(m) x(n − m)

we find:

Y(Z) = Σ_m h(m) Z^{−m} X(Z) = H(Z) X(Z)   (4.9)
Consequently, the Z-transform of a convolution product is the product of the transforms. The
function H(Z) is called the Z-transfer function of the LTI system being considered.
The Z-transform of the product of two sets, x_3(n) = x_1(n)x_2(n), is the function X_3(Z) defined by:

X_3(Z) = (1/2πj) ∮_Γ X_1(υ) X_2(Z/υ) (dυ/υ)   (4.10)

The integration contour is inside the region of convergence of the functions X_1(υ) and X_2(Z/υ).
When applied to causal sets, the one-sided Z-transform is introduced. The one-sided Z-transform of the set x(n) is written:

X(Z) = Σ_{n=0}^{∞} x(n) Z^{−n}   (4.11)

The properties are the same as for the transform defined by equation (4.6), except for delayed sets, where the transform of the set x(n − n_0) is written:

X_{n_0}(z) = Σ_{n=0}^{∞} x(n − n_0) z^{−n} = z^{−n_0} X(z) + Σ_{n=1}^{n_0} x(−n) z^{−(n_0−n)}   (4.12)
The value of this transform in the study of system response is that account can be taken of the initial conditions and that the transient response can be brought to light. It also allows determination of the initial and final values of the set x(n) from X(z). The initial value x(0) is written as:

x(0) = lim_{Z→∞} X(Z)   (4.13)

and the final value is:

x(∞) = lim_{Z→1} (Z − 1) X(Z)   (4.14)

More on the Z-transform and its applications can be found in Reference [6]. The above results
can be applied to the calculation of the power of discrete signals.

4.3 Energy and Power of Discrete Signals


Let us calculate the energy E of a signal represented by the set x(n), whose Z-transform is written X(Z). By definition:

E = Σ_{n=−∞}^{+∞} |x(n)|²

The set x_3(n) defined by:

x_3(n) = |x(n)|²

can be regarded as the product of two sets x_1(n) and x_2(n) such that:

x_1(n) = x(n);   x_2(n) = x*(n)

where * denotes complex conjugation. The transform X_3(Z) is calculated from the functions X_1(Z) and X_2(Z) using formula (4.10), given in the preceding section for the Z-transform of the product of two sets:

X_3(Z) = (1/2πj) ∮_Γ X_1(υ) X_2(Z/υ) (dυ/υ)

The evaluation of X_3(Z) at the point Z = 1 leads to the equation:

X_3(1) = Σ_{n=−∞}^{+∞} |x(n)|² = (1/2πj) ∮_Γ X_1(υ) X_2(1/υ) (dυ/υ)

If Γ is the unit circle, 1/υ = υ* and consequently:

X_2(1/υ) = Σ_{n=−∞}^{+∞} x*(n) υ^n = X*(υ)

Since υ = e^{j2πf}, we have:

E = Σ_{n=−∞}^{+∞} |x(n)|² = ∫_{−1/2}^{1/2} |X(e^{j2πf})|² df   (4.15)

This is the Bessel–Parseval relation given in Section 1.1.1, which expresses the conservation of
energy for discrete signals. The energy of the signal is equal to the energy of its spectrum.
The calculations above provide an expression which is useful for the norm ||X||₂ of the function X(f). Indeed, by definition:

||X||₂² = ∫_{−1/2}^{1/2} |X(f)|² df

This becomes:

||X||₂² = (1/2πj) ∮_{|Z|=1} X(Z) X(Z^{−1}) (dZ/Z)   (4.16)
If X(Z) is a holomorphic function of the complex variable in a domain which contains the unit
circle, the integral is calculated by the method of residues and directly yields the value of ||X||2 ,
which is also the L2 norm of the discrete signal x(n).
Let us now assume we want to calculate the energy Ey of the signal y(n) at the output of the LTI
system with impulse response h(n), to which the signal x(n) is applied.
The signal x(n) is assumed to be deterministic. Using equation (4.15), by setting ω = 2πf, we can write:

E_y = (1/2π) ∫_{−π}^{π} |Y(e^{jω})|² dω

Equation (4.9) directly provides the following result:

E_y = (1/2π) ∫_{−π}^{π} |H(e^{jω})|² |X(e^{jω})|² dω   (4.17)
These results also apply to random signals.

4.4 Filtering of Random Signals


If the signal x(n) is random and has a moment of order 1, which is E[x(n)], then the expected value of the output y(n) of the LTI system can be calculated as:

E[y(n)] = Σ_m h(m) E[x(n − m)]   (4.18)
If the expected value of x(n) is stationary, then so is that of y(n), provided that the system is
stable – that is, it satisfies equation (4.5).
If the input signal x(n) is stationary and of the second order, with an autocorrelation function
r xx (n), then the autocorrelation function r yy (n) of the output of the LTI system can be calculated.
Using the equation of definition (1.58), we have:

r_yy(n) = E[y(i)y(i − n)] = Σ_m h(m) E[x(i − m)y(i − n)]

If the correlation function r_xy(n) between x(n) and y(n) is:

r_xy(n) = E[x(i)y(i − n)]   (4.19)

we can write:

r_yy(n) = h(n) ∗ r_xy(n)   (4.20)

Then:

r_xy(n) = Σ_m h(m) E[x(i)x(i − n − m)] = h(−n) ∗ r_xx(n)

Finally, one obtains:

r_yy(n) = h(n) ∗ h(−n) ∗ r_xx(n)   (4.21)
Then, the Z-transforms Φ_xx(Z) and Φ_yy(Z) are related by:

Φ_yy(Z) = H(Z) H(Z^{−1}) Φ_xx(Z)   (4.22)
This expression can provide a more useful approach than equation (4.21) for calculating the autocorrelation function of the output signal by inverse transformation. With the Fourier transform, one has:

r_yy(n) = (1/2π) ∫_{−π}^{π} |H(e^{jω})|² Φ_xx(e^{jω}) e^{jnω} dω   (4.23)

In particular, the output power of the signal can be written as:

P_y = r_yy(0) = (1/2π) ∫_{−π}^{π} |H(e^{jω})|² Φ_xx(e^{jω}) dω   (4.24)
This is equivalent to equation (4.17) for random signals. If the signal x(n) is assumed to be a white noise with variance σ_x², the variance σ_y² of the output signal y(n) is given by:

σ_y² = (σ_x²/2π) ∫_{−π}^{π} |H(e^{jω})|² dω   (4.25)

or, using the equality (4.15):

σ_y² = σ_x² Σ_n h²(n)   (4.26)

These results are of great practical importance and will be frequently used in later sections (for
example, when evaluating the powers of the round-off noise in the filters).
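For instance, relation (4.26) is easily verified by simulation; the sketch below is an illustration added here, with an arbitrary FIR impulse response:

import numpy as np

rng = np.random.default_rng(0)
h = np.array([0.5, 0.3, -0.2, 0.1])       # arbitrary FIR impulse response
x = rng.normal(0.0, 1.0, 1_000_000)       # white noise, sigma_x**2 = 1
y = np.convolve(x, h)[:len(x)]

print(y.var())           # empirical output power, close to 0.39
print(np.sum(h**2))      # sigma_x**2 * sum h(n)**2 = 0.39, as in (4.26)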

4.5 Systems Defined by Difference Equations

The most interesting LTI systems are those where the input and output sets are related by a linear
difference equation with constant coefficients. On the one hand, they represent simple examples,
and on the other, they offer an excellent representation of many natural systems.
A system of this type of order N is defined by the equation:

y(n) = Σ_{i=0}^{N} a_i x(n − i) − Σ_{i=1}^{N} b_i y(n − i)   (4.27)

By applying the Z-transform to both sides of this equation, and by denoting the transforms of the sets y(n) and x(n) by Y(Z) and X(Z), we obtain:

Y(Z) = Σ_{i=0}^{N} a_i Z^{−i} X(Z) − Σ_{i=1}^{N} b_i Z^{−i} Y(Z)   (4.28)

Hence:

Y(Z) = H(Z) X(Z)

with:

H(Z) = (a_0 + a_1 Z^{−1} + ··· + a_N Z^{−N}) / (1 + b_1 Z^{−1} + ··· + b_N Z^{−N})   (4.29)
The transfer function H(Z) is a rational fraction. The ai and bi are the coefficients of the system.
Some coefficients can be zero, as is the case, for example, when the two summations of expression
(4.27) have different numbers of terms. To find the frequency response, it is sufficient to replace
the variable Z by ej2𝜋f in H(Z).
The function H(Z) is written in the form of a quotient of two polynomials N(Z) and D(Z) of degree
N which have N roots Z i and Pi , respectively, with 1 ⩽ i ⩽ N.
By using these roots, another expression for H(Z) appears:

H(Z) = N(Z)/D(Z) = a_0 ∏_{i=1}^{N} (1 − Z_i Z^{−1}) / ∏_{i=1}^{N} (1 − P_i Z^{−1})   (4.30)

where a_0 is a scale factor. Thus, we can write:

H(Z) = a_0 ∏_{i=1}^{N} (Z − Z_i) / ∏_{i=1}^{N} (Z − P_i)   (4.31)

In the complex plane, Z is the affix of a moving point M, and P_i and Z_i (1 ⩽ i ⩽ N) are the affixes of the poles and zeros of the function H(Z). Then:

Z − Z_i = MZ_i e^{jθ_i}   and   Z − P_i = MP_i e^{jφ_i}

Consequently, the transfer function is expressed by:

H(Z) = a_0 (∏_{i=1}^{N} MZ_i / ∏_{i=1}^{N} MP_i) exp[j Σ_{i=1}^{N} (θ_i − φ_i)]   (4.32)

From this, a graphic interpretation in the complex plane can be developed. The frequency
response of the system is obtained when the moving point M lies on the unit circle. Figure 4.2
shows the example of a system of order N = 2.

Figure 4.2 Graphical interpretation of a transfer function.

The modulus of the transfer function is thus equal to the ratio of the product of the distances between the moving point M and the roots Z_i to the product of the distances between M and the poles P_i. The phase is equal to the difference between the sum of the angles between the vectors P_iM and the real axis, and the sum of the angles between the vectors Z_iM and the real axis, following the convention introduced in Chapter 1. This graphic interpretation is frequently used in practice
the convention introduced in Chapter 1. This graphic interpretation is frequently used in practice
because it offers a very simple visualization of a system’s frequency response.
Analysis of a system using its frequency response corresponds to steady-state operation. It is ade-
quate only insofar as transient phenomena can be neglected. If this is not the case, initial conditions
have to be introduced, to represent, for example, the status of the equipment and the contents of
its memory at switch on.
Consider the behavior for values with index n ⩾ 0 of the system defined by equation (4.27), to
which the set x(n) (x(n) = 0 for n < 0) is applied. The y(n) are completely determined if the values
y(−i) with 1 ⩽ i ⩽ N are known. These values correspond to the initial conditions, and, in order to
introduce them, the one-sided Z-transform has to be used.
The one-sided Z-transform is applied to both sides of equation (4.27), assuming that the input x(n) is causal – that is, that x(n) = 0 for n < 0. Allowing for equation (4.12), which gives the transform Y_i(Z) of the delayed set y(n − i):

Y_i(Z) = Z^{−i} Y(Z) + Σ_{n=1}^{i} y(−n) Z^{−(i−n)}

we obtain:

Y(Z) = Σ_{i=0}^{N} a_i Z^{−i} X(Z) − Σ_{i=1}^{N} b_i Z^{−i} Y(Z) − Σ_{i=1}^{N} b_i Σ_{n=1}^{i} y(−n) Z^{−(i−n)}

or:

Y(Z) = H(Z) X(Z) − [Σ_{i=1}^{N} b_i Σ_{n=1}^{i} y(−n) Z^{−(i−n)}] / [1 + Σ_{i=1}^{N} b_i Z^{−i}]   (4.33)

The system response with index n, y(n), is obtained by series expansion or by inverse
transformation.
It should be noted that the values y(−i) represent the state of the system at switch on, provided
that only the set of output numbers is contained in the system memory. However, the memory often
contains other internal variables which can be introduced into the analysis for generalization and
to provide other features relating, in particular, to the implementation.

4.6 State Variable Analysis

The state of a system of order N at time n is defined by a set of at least N internal variables repre-
sented by a vector U(n) called the state vector. Its operation is governed by the relations between
this state vector and the input and output signals. The behavior of a linear system to which the
input set x(n) is applied, and which provides the output set y(n), is described by the following pair
of equations, called state equations [7]:
U(n + 1) = A U(n) + B x(n)
y(n) = Cᵗ U(n) + d x(n)   (4.34)

A is called the matrix of the system, B is the control vector, C the observation vector, and d is the
transition coefficient. The set x(n) is the innovation and y(n) is the observation. The reasons why
they are so called will be outlined later, particularly in Chapters 6 and 15. The matrix A is a square
matrix of order N. The vectors B and C are N-dimensional.
The state of the system at time n is obtained from the initial state at time zero by the equation:

U(n) = Aⁿ U(0) + Σ_{i=1}^{n} A^{n−i} B x(i − 1)   (4.35)

Consequently, the behavior of such a system depends on successive powers of the matrix A.
The Z-transfer function of the system is obtained by taking the Z-transform of the state equations
(4.34). Thus:
(ZI − A) U(Z) = B X(Z)
Y(Z) = Cᵗ U(Z) + d X(Z)

and consequently:

H(Z) = Cᵗ (ZI − A)^{−1} B + d   (4.36)

The poles of the transfer function thus obtained are the values of Z for which the determinant of
the matrix (ZI − A) is zero. That is, they are the roots of the characteristic polynomial of A. Con-
sequently, the poles of the transfer function are the eigenvalues of the matrix A and have absolute
values less than unity to ensure stability. This result agrees with the equation for the operation
of the system (4.35). Indeed, by diagonalizing the matrix A, it can be seen that it is the condition
whereby the vector U(n) = An U(0) tends toward zero as n tends toward infinity – a situation which
corresponds to the free evolution of the system from the initial state U(0).
Examination of the transfer function of the system (4.36) shows by another route that when a
system is specified by the input–output equation, there is some latitude in the choice of the state
parameters. Indeed, only the eigenvalues of the matrix A are imposed, and the matrix of the system
can be replaced by another matrix A′ = M −1 AM, which has the same eigenvalues. Then, in order
to preserve the same output set, using equation (4.35), the following changes are required:

A′ = M^{−1} A M;   C′ᵗ = Cᵗ M;   B′ = M^{−1} B

The matrix A can also be replaced by its transpose Aᵗ. The system is then described by a system of equations, parallel to that of equation (4.34), corresponding to the state vector V(n), such that:

V(n + 1) = Aᵗ V(n) + C x(n)
y(n) = Bᵗ V(n) + d x(n)   (4.37)

This state representation produces another method of realization.


The results obtained in this section are used subsequently for studying the properties of, and
finding the structures for, the realization of LTI systems.
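As an illustration, the sketch below (added here; the second-order system is the one used in Exercise 4.4, realized in companion form, with values chosen for that purpose) shows that the state equations (4.34) reproduce the difference-equation recursion y(n) = x(n) + 1.6 y(n − 1) − 0.92 y(n − 2):

import numpy as np

# Companion ("controllable") realization of H(Z) = 1/(1 - 1.6 Z**-1 + 0.92 Z**-2).
A = np.array([[0.0, 1.0], [-0.92, 1.6]])  # eigenvalues inside the unit circle
B = np.array([0.0, 1.0])
C = np.array([-0.92, 1.6])                # observation vector C^t
d = 1.0

U, y = np.zeros(2), []
for n in range(8):                        # impulse response via eq. (4.34)
    x = 1.0 if n == 0 else 0.0
    y.append(C @ U + d * x)
    U = A @ U + B * x

y_ref = []
for n in range(8):                        # same response from eq. (4.27)
    x = 1.0 if n == 0 else 0.0
    y1 = y_ref[n - 1] if n >= 1 else 0.0
    y2 = y_ref[n - 2] if n >= 2 else 0.0
    y_ref.append(x + 1.6 * y1 - 0.92 * y2)

print(np.allclose(y, y_ref))              # True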

Exercises

4.1 Consider an LTI system with an impulse response h(n) such that:

h(n) = 1 0 ⩽ n ⩽ 3
h(n) = 0 elsewhere

Calculate the response y(n) to the set x(n) given by

x(n) = an with a = 0.7 for 0 ⩽ n ⩽ 5


x(n) = 0 elsewhere

Calculate the response to the set:

x(n) = cos(2𝜋n∕8) for 0 ⩽ n ⩽ 7


x(n) = 0 elsewhere

4.2 Show that the Z-transform of the causal sequence x(n) defined by:

x(n) = nTe−anT for n ⩾ 0


x(n) = 0 for n < 0

is

X(Z) = T e^{−aT} Z^{−1} / (1 − e^{−aT} Z^{−1})²
Calculate the inverse transform of ln(Z − a), Z/{(Z − a)(Z − b)} and establish the conditions
on a and b such that the resulting sequence converges.

4.3 Calculate the Z-transform for the impulse response:

h(n) = rⁿ sin[(n + 1)θ] / sin θ for n ⩾ 0;   h(n) = 0 for n < 0

What is the domain of convergence of the function obtained? Plot its poles and zeros in the
Z-plane.

4.4 Consider an LTI system whose transfer function H(Z) is written:

H(Z) = 1 / (1 − 1.6 Z^{−1} + 0.92 Z^{−2})
and to which a signal with a uniform spectrum and unit power is applied. Calculate the power
of the signal at the output of the system and give the spectral distribution.

4.5 Use the one-sided Z-transform to calculate the response of the system defined by the
difference equation:
y(n) = x(n) + y(n − 1) − 0.8y(n − 2)
with the initial conditions y(−1) = a and y(−2) = b, to the set x(n) defined by:
x(n) = ejn𝜔 for n ⩾ 0
x(n) = 0 for n < 0
Show the response due to the initial conditions and the steady state response.

References

1 L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Chapter II, Prentice-Hall, 1975.
2 A. V. Oppenheim and R. W. Schafer, Digital Signal Processing, Chapter II, Prentice-Hall, 1974.
3 M. Kunt, Digital Signal Processing, Artech House, Norwood, MA, 1986.
4 F. J. Taylor, Digital Filter Design Handbook, Dekker, New York, 1983.
5 A. Peled and B. Liu, Digital Signal Processing: Theory, Design and Implementation, Wiley, New York, 1976.
6 E. I. Jury, Theory and Application of the Z-Transform Method, Wiley, 1964.
7 J. A. Cadzow, Discrete Time Systems, Prentice-Hall, 1973.

Finite Impulse Response (FIR) Filters

Digital FIR filters are discrete linear time-invariant systems in which an output number, rep-
resenting a sample of the filtered signal, is obtained by weighted summation of a finite set of
input numbers, representing samples of the signal to be filtered. The coefficients of the weighted
summation constitute the filter’s impulse response and only a finite number of them take nonzero
values. This is a “finite memory” filter – that is, it determines its output as a function of input data
of limited age. It is frequently called a non-recursive filter because, unlike the infinite impulse
response filter, it does not require a feedback loop in its implementation.
The properties of FIR filters will be illustrated by two simple examples.

5.1 FIR Filters


Consider a signal x(t) represented by its samples x(nT), taken at frequency f s = 1/T, and examine
the effect on its spectrum of replacing the set x(nT) with the set y(nT) defined by the equation:
y(nT) = (1/2)[x(nT) + x((n − 1)T)]   (5.1)
This set is also obtained by sampling the signal y(t) such that:
y(t) = (1/2)[x(t) + x(t − T)]
If Y(f) and X(f) denote the Fourier transforms of the signals y(t) and x(t), then:

Y(f) = (1/2) X(f)(1 + e^{−j2πfT})
Such an operation corresponds to the transfer function:
H(f ) = Y (f )∕X(f )
where
H(f ) = e−j𝜋fT cos(𝜋 fT) (5.2)
This is called cosine filtering, and it conserves the zero frequency component and eliminates that
at f s /2, as can be readily verified.
In the expression for H(f ), the complex term e−j𝜋fT represents a delay 𝜏 = T/2, which is the
propagation time of the signal through the filter.


Figure 5.1 Cosine filtering.

The impulse response h(t) which corresponds to the filter transfer function |H(f)| is written as:

h(t) = (1/2)[δ(t + T/2) + δ(t − T/2)]
Figure 5.1 shows the characteristics of the filter.
Another simple operation associates the set x(nT) with the set y(nT) by:

y(nT) = (1/4)[x(nT) + 2x((n − 1)T) + x((n − 2)T)]   (5.3)

As in the previous case, this equation conserves the component with zero frequency and eliminates the one with frequency f_s/2. This corresponds to the transfer function:

H(f) = (1/4)(1 + 2e^{−j2πfT} + e^{−j4πfT}) = e^{−j2πfT} (1/2)(1 + cos 2πfT)   (5.4)
This is a raised-cosine filter. Its propagation delay is τ = T, and |H(f)| corresponds to the impulse response h(t) such that:

h(t) = (1/4)δ(t + T) + (1/2)δ(t) + (1/4)δ(t − T)
This is a more selective low-pass filter than the preceding one, and it is evident that an even
more selective filtering function can be obtained merely by increasing the number of terms in the
set x(nT) over which the weighted summation is carried out (Figure 5.2).

Figure 5.2 Raised-cosine filtering.

These two examples have served to illustrate the following properties of FIR filters:
(1) The input set x(n) and the output set y(n) are related by an equation of the following type (the defining relation):

y(n) = Σ_{i=0}^{N−1} a_i x(n − i)   (5.5)
The filter defined in this way comprises a finite number N of coefficients a_i. If it is regarded as a discrete system, its response h(i) to the unit set is:

h(i) = a_i if 0 ⩽ i ⩽ N − 1;   h(i) = 0 elsewhere

That is, the impulse response is simply the set of the coefficients.
(2) The transfer function of the filter is:

H(f) = Σ_{i=0}^{N−1} a_i e^{−j2πfiT}   (5.6)

or, expressed in terms of Z:

H(Z) = Σ_{i=0}^{N−1} a_i Z^{−i}   (5.7)

(3) The function H(f), the filter's frequency response, is periodic with period f_s = 1/T. The coefficients a_i (0 ⩽ i ⩽ N − 1) form the Fourier series expansion of this function. In view of the Bessel–Parseval relation given in Section 1.1.1, the following can be written:

Σ_{i=0}^{N−1} |a_i|² = (1/f_s) ∫_0^{f_s} |H(f)|² df   (5.8)
(4) If the coefficients are symmetrical, the transfer function can be written as the product of two terms, of which one is a real function and the other a complex number with modulus 1, representing a constant propagation delay τ which is a whole multiple of half the sampling period. Such a filter is said to have a linear phase, as the sketch below illustrates.
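For instance, with T = 1, the raised-cosine filter of Section 5.1 factors into the real term (1 + cos 2πf)/2 and the pure delay e^{−j2πf} (a minimal illustration added here):

import numpy as np

h = np.array([0.25, 0.5, 0.25])           # raised-cosine coefficients, T = 1
f = np.linspace(0.0, 0.4, 5)
H = np.array([np.sum(h * np.exp(-2j * np.pi * fk * np.arange(3))) for fk in f])

print(np.allclose(np.abs(H), 0.5 * (1 + np.cos(2 * np.pi * f))))  # real factor
print(np.allclose(np.angle(H), -2 * np.pi * f))                   # linear phase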

5.2 Practical Transfer Functions and Linear Phase Filters

A digital filter which processes numbers representing samples of a signal taken with period T
has a periodic frequency response of period f s = 1/T. As a consequence, this function H(f ) can
be expanded into a Fourier series:

H(f) = Σ_{n=−∞}^{+∞} α_n e^{j2πfnT}   (5.9)

with:

α_n = (1/f_s) ∫_0^{f_s} H(f) e^{−j2πfnT} df   (5.10)

The coefficients α_n of the expansion are, except for a constant factor, the samples taken with period T of the Fourier transform of the function H(f) over a frequency interval of width f_s. As they
form the impulse response, the condition (4.5) for the stability of the filter implies that the α_n tend toward zero when n tends toward infinity. Consequently, the function H(f) can be approximated by an expansion reduced to a limited number of terms:

H(f) ≃ Σ_{n=−P}^{Q} α_n e^{j2πfnT} = H_L(f)

where P and Q are finite integers. The approximation improves as P and Q increase.
The property of causality, which expresses the fact that in a real filter the output cannot precede the input in time, implies that the impulse response h(n) is zero for n < 0. Using relations (5.5) and (5.6), if the filter is causal, then Q = 0, and we obtain:

H_L(f) = Σ_{n=0}^{P} a_n e^{−j2πfnT}

As a result, any causal digital filtering function can be approximated by the transfer function of
an FIR filter.
The fact that transfer functions with linear phase can be realized is an important property which
is exploited in spectral analysis or in data-transmission applications. For FIR filters, this capability
leads to a reduction in complexity, because it implies symmetry in the coefficients which is used to
halve the number of multiplications needed to produce each output number.
By definition, a linear-phase filter has the following frequency response:

H(f) = R(f) e^{−jφ(f)}   (5.11)

where R(f) is a real function and the phase φ(f) is linear: φ(f) = φ_0 + 2πfτ, where τ is a constant giving the propagation delay through the filter.
In fact, the phase is not rigorously linear, due to the sign changes of R(f ) which introduce dis-
continuities of 𝜋 in the phase. However, the filter is said to be linear phase.
The impulse response of this filter is written as:

h(t) = e^{−jφ_0} ∫_{−∞}^{+∞} R(f) e^{j2πf(t−τ)} df   (5.12)

By assuming φ_0 to be zero and by decomposing the real function R(f) into an even part P(f) and an odd part I(f), this becomes:

h(t + τ) = 2 ∫_0^{∞} P(f) cos(2πft) df + 2j ∫_0^{∞} I(f) sin(2πft) df

If the condition is imposed that the function h(t) must be real, this becomes:

h(t + τ) = 2 ∫_0^{∞} P(f) cos(2πft) df
This relation shows that the impulse is symmetric about the point t = 𝜏 on the time axis. Such a
condition is satisfied in a filter with real symmetric coefficients. Two configurations are available,
depending upon whether the number of coefficients N is even or odd:

(1) N = 2P + 1: the filter has a propagation delay τ = PT. The transfer function is:

H(f) = e^{−j2πfPT} [h_0 + 2 Σ_{i=1}^{P} h_i cos(2πfiT)]   (5.13)

(2) N = 2P: the filter has a propagation delay τ = (P − 1/2)T. The transfer function is:

H(f) = e^{−j2πf(P−1/2)T} 2 Σ_{i=1}^{P} h_i cos[2πf(i − 1/2)T]   (5.14)

The filter coefficients hi form the digital filter’s response to the unit set. They can also be regarded
as samples, taken with period T, of the continuous impulse response h(t) of a filter which has the
same frequency response as the digital filter in the range (−T/2, T/2) but has no periodicity on the
frequency axis. Figures 5.3 and 5.4 illustrate this for odd and even N.
These filters have the even function P(f ) in their frequency response. With real coefficients, a
frequency response can also be obtained which corresponds to I(f ), the odd part of R(f ).
As the function h(t) must be real, this category of filter has the transfer function:
H(f) = −j e^{−j2πfτ} I(f) = e^{−j(π/2)} e^{−j2πfτ} I(f)
A fixed phase shift 𝜙0 = 𝜋/2, which corresponds to a quadrature, is added to the frequency-
proportional phase shift. This possibility is important in certain types of modulation and will be
examined later. The impulse response is zero at the point t = 𝜏 on the time axis and is antisymmetric
about it. Figures 5.5 and 5.6 show the configurations when the number of coefficients N is even
or odd.

Figure 5.3 Symmetrical filter with odd order.
Figure 5.4 Symmetrical filter with even order.
Figure 5.5 Antisymmetrical filter with odd order.
Figure 5.6 Antisymmetrical filter with even order.

(1) N = 2P + 1: the filter has a propagation delay τ = PT:

H(f) = −j e^{−j2πfτ} 2 Σ_{i=1}^{P} h_i sin(2πfiT)   (5.15)

(2) N = 2P: this becomes τ = (P − 1/2)T:

H(f) = −j e^{−j2πfτ} 2 Σ_{i=1}^{P} h_i sin[2πf(i − 1/2)T]   (5.16)

As h0 = 0, the transfer function has the same form in both cases. It is not difficult to envisage
that fixed phase shifts other than 𝜙0 = 0 and 𝜙0 = 𝜋/2 can be obtained for filters with complex
coefficients.
We now turn our attention to the calculation of the coefficients of FIR filters, assuming the phase
to be linear, which applies to a majority of applications, and is the case when specifications are given
for the frequency response.

5.3 Calculation of Coefficients by Fourier Series Expansion


for Frequency Specifications

The frequency specifications are generally provided through a mask.


For a low-pass filter, for example, the absolute value of the transfer function is required to approx-
imate the value 1 with precision 𝛿 1 , in the frequency band (0, f 1 ), called the pass band, and the
value 0 with precision 𝛿 2 in the band (f 2 , f s /2), which is called the stop (or rejection) band. The
corresponding limits are represented in Figure 5.7. The range Δf = f 2 − f 1 is called the transition
band, and the steepness of the cutoff is described by the parameter Rc such that:
R_c = (f_1 + f_2) / [2(f_2 − f_1)]   (5.17)
A very simple method of obtaining the coefficients h_i is to expand the periodic function H(f) into a Fourier series to produce an approximation:

h_i = (1/f_s) ∫_0^{f_s} H(f) e^{−j2πif/f_s} df

In the case of a low-pass filter corresponding to the mask given in Figure 5.7, expression (1.5) leads to:

h_i = [(f_1 + f_2)/f_s] · sin[πi(f_1 + f_2)/f_s] / [πi(f_1 + f_2)/f_s]   (5.18)

Figure 5.7 Mask for a low-pass filter.

The table given in Appendix 1 of Chapter 1 can then be used to provide an initial estimation of the coefficients of an FIR filter. The optimized values are, in fact, not very different.
For the filter to be realizable, it is necessary to limit the number of coefficients to N. This operation boils down to multiplying the impulse response h(t) by a time window g(t) such that:

g(t) = 1 for −NT/2 ⩽ t ⩽ NT/2;   g(t) = 0 elsewhere

Using equation (1.10), the Fourier transform of this function is written as:

G(f) = NT · sin(πfNT) / (πfNT)   (5.19)
Figure 5.8 shows these functions.
The real filter, with a limited number of coefficients N, has the following convolution product
H R (f ) as its transfer function:

HR (f ) = H(f ′ )G(f − f ′ ) df ′
∫−∞
By limiting the number of coefficients, ripples are introduced and the steepness of the cutoff of
the filter is limited, as shown in Figure 5.9, which corresponds to an ideal low-pass filter with a
cutoff frequency f c .
The ripples depend on those of the function G(f ). In order to reduce their amplitude, it is suffi-
cient to choose as the time window functions whose spectra introduce smaller ripples than that of

Figure 5.8 Rectangular window.

Figure 5.9 Effect of limiting the number of coefficients.

the rectangular window given above. This situation has been encountered in Section 2.4.2 for spec-
trum analysis and the same functions can be employed. One example is the Hamming window,
which is defined as:
g(t) = 0.54 + 0.46 cos(2𝜋i∕NT) for |t| ⩽ NT∕2
0 for |t| > NT∕2
The consequence of reducing the ripples in the pass and stop bands is an increase in the width
of the transition band.
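The whole windowing method fits in a few lines of Python (a sketch added here for illustration; f_s = 1, the cutoff f_c = 0.2 and N = 31 taps are arbitrary choices):

import numpy as np

fc, N = 0.2, 31                                # cutoff and number of taps (fs = 1)
i = np.arange(N) - (N - 1) // 2                # symmetric coefficient indices
h = 2 * fc * np.sinc(2 * fc * i)               # ideal low-pass response, cf. (5.18)
h_w = h * (0.54 + 0.46 * np.cos(2 * np.pi * i / N))   # Hamming weighting

f = np.linspace(0.0, 0.5, 2048)
E = np.exp(-2j * np.pi * np.outer(f, np.arange(N)))
for taps in (h, h_w):
    # Stop-band ripple well beyond the transition band: the Hamming-windowed
    # filter is much smaller, at the price of a wider transition band.
    print(np.abs(E @ taps)[f > 0.3].max())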
The function which presents the smallest ripples for a given width of the principal lobe is the Dolph–Chebyshev function:

G(x) = cos[K cos^{−1}(Z_0 cos πx)] / ch[K ch^{−1}(Z_0)]   for x_0 ⩽ x ⩽ 1 − x_0
G(x) = ch[K ch^{−1}(Z_0 cos πx)] / ch[K ch^{−1}(Z_0)]   for 0 ⩽ x ⩽ x_0 and 1 − x_0 ⩽ x ⩽ 1   (5.20)

with x_0 = (1/π) cos^{−1}(1/Z_0), K an integer, and Z_0 a parameter. This function, as shown in Figure 5.10, presents a main lobe of width B, given by:

B = 2x_0 = (2/π) cos^{−1}(1/Z_0)

and secondary lobes of constant amplitude given by:

A = 1 / ch[K ch^{−1}(Z_0)]

It is periodic, and its inverse Fourier transform is formed of a set of K + 1 discrete nonzero values, used to weight the coefficients of the Fourier series expansion of the filter function to be approximated.
It is worth noting that if the ripples in the function G(x) are of constant amplitude, those in the
resulting filter decrease in amplitude, with distance from the pass band and stop band edges. The
ripples in the pass and stop bands, however, are the same.
The technique of Fourier series expansion of the function to be approximated leads to a simple
method of determining the coefficients of the filter, but it involves two important restrictions:
(1) The ripples of the filter in the pass and stop bands are equal.
(2) The amplitude of the ripple is not constant.
The first limitation can be eliminated by using a method which preserves the simplicity of direct
calculation: the least-squares method. Further, this corresponds exactly to the objectives to be
achieved in a number of applications.

Figure 5.10 The Dolph–Chebyshev function.

5.4 Calculation of Coefficients by the Least-Squares Method

Let us calculate the N coefficients h_i of an FIR filter according to the least-squares criterion, so that the transfer function approximates a given function.
The calculation can be carried out directly from the relationship between coefficients and fre-
quency response, as shown in Section 5.16 for 2-dimensional filters. However, it is interesting to
provide a more general iterative approach, operating in the frequency domain from an approxi-
mate solution, which can cope with non-quadratic cost functions and can be used for other types
of filters – in particular, IIR filters.
The discrete Fourier transform applied to the set h_i, with 0 ⩽ i ⩽ N − 1, produces a set H_k such that:

H_k = (1/N) Σ_{i=0}^{N−1} h_i e^{−j2π(ik/N)}   (5.21)

The set of H_k, 0 ⩽ k ⩽ N − 1, forms a sampling of the filter's frequency response with period f_s/N. Conversely, the coefficients h_i are related to the set H_k by the equation:

h_i = Σ_{k=0}^{N−1} H_k e^{j2π(ik/N)}   (5.22)

Consequently, the problem of calculating the N coefficients is equivalent to that of determining the filter's frequency response at N points in the range (0, f_s). The function H(f) is then obtained by the interpolation formula which expresses the convolution product of the set of the samples H_k δ(f − (k/N)f_s) by the Fourier transform of the rectangular sampling window, calculated in Section 2.4:

H(f) = Σ_{k=0}^{N−1} H_k · sin{πN[(f/f_s) − (k/N)]} / (N sin{π[(f/f_s) − (k/N)]})   (5.23)

It should be noted that this expression simply forms a different type of series expansion of the
function H(f ) with a limited number of terms.
Given the function to be approximated, D(f ), a first possibility is to choose the H k such that:

H_k = D((k/N) f_s) for 0 ⩽ k ⩽ N − 1
The transfer function of the filter H(f ), obtained by interpolation, shows ripples in the pass and
stop bands, as shown in Figure 5.11.

Figure 5.11 Interpolated transfer function.

The divergence between this function and that given represents an error e(f ) = H(f ) − D(f ) which
can be minimized by the least-squares criterion. The procedure begins by evaluating the squared
error E which is the L2 norm of the divergence function. To do this, the response H(f ) is sampled
with a frequency step Δ less than f s /N, in a way that produces interpolated values, for example:
Δ = f_s/(NL), with L an integer greater than 1
The function e(f ) is calculated at frequencies which are multiples of Δ. Generally, when evalu-
ating the squared error E, only one part of the band (0, f s /2) is taken into account. For a low-pass
filter this can be the pass band, the stop band or the set of both bands. In order to demonstrate the
principle of the calculation, it is assumed that the minimization relates to the pass band (0, f 1 ) of
a low-pass filter, whence:
E = Σ_{n=0}^{N_0−1} e²(n f_s/(NL))   with (f_1/f_s) NL < N_0 ⩽ (f_1/f_s) NL + 1
Further, it is often useful to find a weighting factor P0 (n) for the error element of index n, so that
the frequency response can be modeled. Thus:
E = Σ_{n=0}^{N_0−1} P_0²(n) e²(n f_s/(NL)) = Σ_{n=0}^{N_0−1} P_0²(n) e²(n)   (5.24)

With the error function obtained from the interpolation formula (5.23), the squared error E is
a function of the set of H k with 0 ⩽ k ⩽ N − 1 and is expressed by E(H). If these samples of the
frequency response are given increases of ΔH k , a new squared error value is obtained:
∑ 𝜕E
N−1
1 ∑ ∑ 𝜕2 E
N−1N−1
E(H + ΔH) = E(H) + ΔHk + ΔHk ΔH1 (5.25)
K=0
𝜕H 2 k=0 l=0 𝜕HK 𝜕H1
From the defining equation for E and the interpolation equation (5.23), this becomes:

𝜕E ∑
N0 −1
𝜕e(n)
=2 P02 (n)e(n)
𝜕Hk n=0
𝜕Hk

𝜕2 E ∑
N0 −1
𝜕e(n) 𝜕e(n)
=2 P02 (n)
𝜕Hk 𝜕H1 n=0
𝜕H1 𝜕Hk

These equations can be written in matrix form where A is the matrix with N rows and N 0 columns
such that:
⎡ a00 a01 · · · a0(N0 −1) ⎤
⎢ a a11 · · · a1(N0 −1) ⎥⎥ 𝜕e(j)
A = ⎢ 10 with aij =
⎢ ⋮ ⋮ ⋮ ⎥ 𝜕Hi
⎢a a a ⎥
⎣ 0(N−1) 1(N−1) · · · (N−1)(N0 −1) ⎦

Let P0 be the diagonal matrix of order N 0 whose elements are the weighting factor P0 (n). Then:
[ ]
𝜕E
= 2AP20 [e(n)] (5.26)
𝜕Hk
The set of terms 𝜕 2 E/𝜕H k 𝜕H e form a square matrix of order N such that:
𝜕2 E
= 2AP20 At (5.27)
𝜕Hk 𝜕H1
5.5 Calculation of Coefficient by Discrete Fourier Transform 99

The condition for E(H + ΔH) to be a minimum is that all its derivatives with respect to
H k (0 ⩽ k ⩽ N − 1) be zero. Now,

𝜕 𝜕E ∑ 𝜕E 𝜕E
N−1
E(H + ΔH) = + ΔHl
𝜕Hk 𝜕Hk l=0 𝜕Hk 𝜕Hl

The least-squares condition can be written as:

AP20 [e(n)] + AP20 At [ΔH] = 0 (5.28)

Under these conditions, the increments ΔH k (0 ⩽ k ⩽ N − 1) which transfer the initial values
of the samples to the optimal values of the frequency response form a column vector which is
written as:
[ ]−1
[ΔH] = − AP20 At AP20 [e(n)] (5.29)

To summarize, the following operations are required for calculating the coefficients of the filter
by the least-squares method:

(1) Sample the function to be approximated at N points in order to obtain N numbers


H k (0 ⩽ k ⩽ N − 1).
(2) In the frequency band where the error should be minimized, interpolate the response between
the H k in order to obtain N 0 numbers e(n) (0 ⩽ n ⩽ N 0 − 1) which represent the divergence
between the response of the filter and the function to be approximated.
(3) Determine N 0 weighting coefficients P0 (n) as a function of the constraints of the
approximation.
(4) Using the interpolation equation, calculate the elements of the matrix A.
(5) Solve the matrix equation which gives the ΔH k .
(6) Perform an inverse Fourier transformation on the set of numbers (H k + ΔH k ) with 0 ⩽ k ⩽ N − 1
to obtain the coefficients for the filter.

The weighting coefficients P0 (n) allow certain constraints to be introduced – for example, to
obtain ripples in the pass and stop bands which are in a given ratio, or to force the frequency
response to a particular value. This latter condition can also be taken into account by reducing
the number of degrees of freedom by 1; this is more elegant but more complicated to program.
Implementation of this calculation does not present any particular difficulty and permits the
calculation of a filter in a direct way. However, the filter obtained does not have constant amplitude.
This is a commonly required feature, and to achieve it, an iterative technique is employed.
If the matrix inversion in equation (5.29) is difficult or impossible, the optimum can still be
reached by replacing the matrix with a small constant and iterating the process. This is the gradient
algorithm, which is dealt with in Chapter 14.

5.5 Calculation of Coefficient by Discrete Fourier Transform

A first iterative approach consists of using the discrete Fourier transform, which is calculated effi-
ciently by a fast algorithm.
Consider calculation of a linear-phase filter with N coefficients meeting the specification of
Figure 5.7. A discrete Fourier transform of order N 0 with N 0 ≃ 10 N will be used.
100 5 Finite Impulse Response (FIR) Filters

The procedure consists of taking initial values for the coefficients – for example, the terms h given
by (5.18) for −P ≤ i ≤ P, if N = 2P + 1. This set of N values is completed symmetrically with zeros
to obtain a set of N 0 real values, symmetrical with respect to the origin.
A DFT calculation then gives the response H(f ) at N 0 points on the frequency axis:
H(f ) = Hid (f ) + E(f )
where H id (f ) is the ideal response and E(f ) is the deviation from this response.
A reduction of the deviation E(f ) is then performed, by replacing H(f ) with the function G(f )
such that:
G(f ) = Hid (f ) + EL (f ) if H(f ) > Hid (f ) + EL (f )
G(f ) = Hid (f ) − EL (f ) if H(f ) < Hid (f ) − EL (f )
where EL (f ) represents the limit of the deviation given by the characteristic – for example, 𝛿 1 and
𝛿 2 for the low-pass filter in Figure 5.7.
Calculation of the inverse DFT gives N 0 terms; the N values which encircle the origin are retained
and the others are discarded. The procedure is repeated by taking the DFT of the N 0 values obtained
in this way.
Denoting the sum of the squares of the N 0 − N terms discarded in the time domain at iteration
k by J(k), a decreasing function is obtained if the specification of the filter is compatible with the
number of coefficients N. The procedure is terminated when J(k) falls below a fixed threshold.
By applying the method for different numbers of coefficients N, an optimum solution can be
approached and even achieved in special cases. All types of linear phase filters can be designed in
this way.
To obtain the optimum filter, a method based on the Chebyshev approximation is used.

5.6 Calculation of Coefficients by Chebyshev Approximation


The objective is to produce a filter whose frequency response has constant amplitude ripple in
such a way as to best approximate the desired characteristic. An example was given in Figure 5.8
for a low-pass filter whose ripples must not exceed amplitude 𝛿 1 in the pass band and 𝛿 2 in the
stop band. This problem depends upon the approximation of a function by a polynomial in the
Chebyshev sense, and L∞ is the norm to be considered for the error function.
Using the expression for the transfer function of a linear-phase FIR filter, the calculation of the
coefficients corresponds to determining the function H R (f ) which is written as:

r−1
HR (f ) = hi cos(2𝜋fiT) (5.30)
i=0

when the number of coefficients is N = 2r − 1. The technique which will be presented is valid in all
cases, whether N is even or odd, or whether the coefficients are symmetric or antisymmetric. It is
based on the following theorem [1].
A necessary and sufficient condition for H R (f ) to be the unique and, in the Chebyshev [ ] sense, the
1
best approximation to a given function D(f ) over a compact subset A for the range 0, 2 is that the
error function e(f ) = H R (f ) − D(f ) present at least (r + 1) extremal frequencies on A. That is, there
exist (r + 1) extremal frequencies (f 0 , f 1 ,…, f r ) such that e(f i ) = −e(f i − 1 ), with 1 ⩽ i ⩽ r and
|e(fi )| = max |e(f )|
f ∈A
5.6 Calculation of Coefficients by Chebyshev Approximation 101

This result is still valid if a weighting function P0 (f ) for the error is introduced.
Thus, the problem is equivalent to the solution of the system of (r + 1) equations:

P0 (fi )[D(fi ) − HR (fi )] = (−1)i 𝛿

The unknowns are the coefficients hi (0 ⩽ i ⩽ r − 1) and the maximum of the error function: 𝛿.
In matrix form, by writing the unknowns as a column vector and by normalizing the frequencies
so that f s = 1/T, this becomes:
⎡1 cos(2𝜋f ) … cos[2𝜋f (r − 1)] 1 ⎤ ⎡
h ⎤
⎡D(f0 )⎤ ⎢ P0 (f0 ) ⎥ ⎢ 0 ⎥
0 0

⎢ ⎥ ⎢ ⎥
1 ⎥ ⎢ h1 ⎥
⎢D(f1 )⎥ ⎢⎢1 cos(2𝜋f1 ) … cos[2𝜋f1 (r − 1)] ⎢ ⎥
⎢ ⎥=⎢ P0 (f1 ) ⎥ ⎢ ⋮ ⎥
⎢ ⋮ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ ⎥ ⎢⋮ ⋮ ⋮ ⋮ ⎥⎢
⎥ hr−1 ⎥
⎢D(f )⎥ ⎢ ⎢ ⎥
⎣ r ⎦ 1 ⎥⎢
⎢1 cos(2𝜋fr ) … cos[2𝜋fr (r − 1)] ⎥ ⎣ 𝛿 ⎥⎦
⎣ P0 (fr ) ⎦
This matrix equation results in the determination of the coefficients of the filter, under the con-
dition that the (r + 1) extremal frequencies f i are known.
An iterative procedure based on an algorithm called the Remez algorithm is used to find the
extremal frequencies. In this algorithm, each stage has the following phases:

(1) Initial values are assigned or are available for the parameters f i (0 ⩽ i ⩽ r).
(2) The corresponding value 𝛿 is calculated by solving the system of equations, which leads to the
following formula:
a0 D(f0 ) + a1 D(f1 ) + · · · + ar D(fr )
𝛿=
a0 ∕P0 (f0 ) − a1 ∕P0 (f1 ) + · · · + (−1)r ar ∕P0 (fr )
with:

r
1
ak =
cos(2𝜋fk ) − cos(2𝜋fi )
i=0
i≠k
(1) The values of the function H R (f ) are interpolated between the f i (0 ⩽ i ⩽ r) in order to
calculate e(f ).
(2) The extremal frequencies obtained in the previous phase are taken as the initial values for the
next stage.

Figure 5.12 shows the evolution of the error function through one stage of the calculation. The
procedure is halted when the difference between the value 𝛿 is calculated on the basis of the new
extremal frequencies, iteratively. The new 𝛿 value is compared with the preceding value. When the
difference between two consecutive values of 𝛿 falls below a certain threshold, the procedure is
halted. Usually, this occurs after a few iterations.
The convergence of this procedure is related to the choice of initial values for the frequencies f i .
For the first iteration, the extremal frequencies obtained using a different method of calculation
can be used as the filter coefficients. Even more simply, a uniform distribution of the extremal
frequencies over the frequency range can be assumed.
As in the least squares method presented in the previous section, the values H R (f ) have to be inter-
polated. Because of the non-uniform distribution of the extremal frequencies, it is more convenient
102 5 Finite Impulse Response (FIR) Filters

e(f)

step i + 1
δ step i

f1i f1i+1 f3i f3i+1


0
f0 f2i f2i+1 f4i f
f4i+1

Figure 5.12 Evolution of the error function in a stage of the Remez algorithm.

to use the Lagrange interpolation formulae:



r−1
{𝛽k ∕(x − xk )}[D(fk ) − (−1)k 𝛿∕P0 (fk )]
k=0
HR (f ) = (5.31)

r−1
𝛽k (x − xk )
k=0
with:

r−1
1
𝛽k = and x = cos(2𝜋f)
xk − xi
i=0
i≠k
At the end of the iteration, the extremal frequencies obtained are used to produce samples with a
constant frequency increment which, by an inverse discrete Fourier transformation, produces the
coefficients of the filter.
Filters having several hundred coefficients can be designed using this technique, and it applies
to low-pass, high-pass, and band-pass filters with or without fixed phase shift [2].

5.7 Relationships Between the Number of Coefficients and the


Filter Characteristic
In the calculation techniques which have been considered previously, the number of coefficients N
of the filter was assumed to have been given a priori. In practice, then, N is an important parameter
(for example, in projects where the computational complexity needed to implement a digital filter
satisfying given specifications must be determined).
For a low-pass filter, as shown in Figure 5.7, the specifications concern the pass-band ripple 𝛿 1 ,
the stop-band ripple 𝛿 2 , the pass-band edge f 1, and the transition band Δf = f 2 − f 1 . An analysis of
the results obtained in the design of a large number of filters with a wide range of specifications
shows that the number of coefficients is proportional to the logarithms of 1/𝛿 1 and 1/𝛿 2 , and also
to the ratio of the sampling frequency f s to the transition band Δf . The adjustment of parameters
leads to the following estimate N e for the filter order:
( )
2 1
Ne = log f ∕Δf (5.32)
3 10𝛿1 𝛿2 s
5.7 Relationships Between the Number of Coefficients and the Filter Characteristic 103

This simple estimate is sufficient for most practical cases. It clearly indicates the relative impor-
tance of the various parameters. The transition band Δf is the most sensitive, with the pass-band
and stop-band ripples having less significant impact. For example, if 𝛿 1 = 𝛿 2 = 0.01, dividing one of
these figures by 2 leads to an increase of only 10% in the filter order. Moreover, it is worth emphasiz-
ing that, according to the estimation, the filter complexity is independent of the pass-band width.

Examples:

(f s is taken to be 1.0 unless otherwise stated.)

(1) Calculation for a filter with 39 coefficients (N = 39) leads to the following values:

𝛿1 = 0.017; 𝛿2 = 0.034; f1 = 0.14375;


f2 = 0.14375; Δ f = 0.04

With these values for the parameters, the estimate gives N e = 40.
(2) A filter with 160 coefficients (N = 160) has the following parameters:

𝛿1 = 2.24 × 10−2 ; 𝛿2 = 1.12 × 10−4 ; f1 = 0.053125;


f2 = 0.071875; Δf = 0.01875

The estimate gives N e = 164.

It should be noted that non-negligible differences can occur between the value N which is
necessary in practice and the estimated value N e , when the transition-band limits approach 0
and 0.5, or when N is a few units. A set of more complex formulae is given in Reference [3]. As
indicated in Chapter 7, a high-pass filter can be derived from a low-pass one by inverting the sign
of every second coefficient. Consequently, estimate (5.32) can also apply to high-pass filters. When
the mask specifies different ripples for the pass and stop bands, an overestimation of the number
of coefficients can be obtained by assuming the most rigid constraints on 𝛿 1 and 𝛿 2 in the pass and
stop bands, respectively.
For band-pass filters, it is necessary to introduce several transition bands.
Figure 5.13 gives the characteristic of such a filter, which has two transition bands Δf 1 and
Δf 2 . Experience shows that the number of coefficients N depends essentially on the smallest
bandwidth, Δf m = min(Δf 1 , Δf 2 ). Then estimate (5.32) can be applied, with Δf = Δf m . An upper

Ampl.

1 + δ1
1
1 – δ1

δ2
0 f1 f2 f3 f4 0.5 f
–δ2
Δf1 Δf2

Figure 5.13 Mask of a band-pass filter.


104 5 Finite Impulse Response (FIR) Filters

bound is obtained by considering the band-pass filter as a cascade of low-pass and high-pass filters
and summing the estimates.
Example: A band-pass filter with 32 coefficients (N = 32) has the following characteristics:
𝛿1 = 0.015; 𝛿2 = 0.0015; f1 = 0.1; f1 = 0.2;
f3 = 0.35; f4 = 0.425; Δfm = 0.075
The estimate using equation (5.32) with Δf = Δf m gives N e = 32. A set of elaborate formulae to
estimate band pass filter orders is given in Reference [3].
The estimation formulae can be used to complete a computer program for the filter coefficient
calculations by determining the number N at the beginning of the program. Expression (5.32) is
very useful in practice for complexity assessment.
When the frequency responses of filters are examined in the transition band, it is evident that
they approximate to raised-cosine form; the approximation becomes closer as the pass-band and
attenuation-band ripples become closer. In fact, this type of response corresponds to the specifi-
cations imposed on data transmission using Nyquist filters, and it represents another approach to
linear-phase FIR filters.

5.8 Raised-Cosine Transition Filter


The frequency response H(f ) of a filter whose transition band has raised-cosine form is represented
in Figure 5.14(a).

H(f)
Δf
1

0.5

0 fc f

I1(f)

0 fc f

I2(f)
1

Δf 0 Δf f

2 2

Figure 5.14 (a) Response with raised-cosine transition; (b) frequency impulse of width 2fc ; (c) frequency
impulse for the transition.
5.8 Raised-Cosine Transition Filter 105

It is expressed as the following convolution product:


[ ( )]
𝜋 𝜋f
H(f ) = I1 (f ) ∗ I2 (f ) cos (5.33)
2Δf Δf
where I 1 is an impulse of width 2f c and I 2 (f ) is an impulse of width Δf .
Under these conditions, the impulse response h(t) can be written as the product of two impulse
responses i1 (t) and i2 (t) given by:
sin 𝜋2fc t
i1 (t) = 2fc
2𝜋fc t
and
1 1 sin 𝜋Δf(t + 1∕2Δf) 1 1 sin 𝜋Δf(t − 1∕2Δf)
i2 (t) = +
2 2Δf(t + 1∕2∕Δf) 2 2Δf(t − 1∕2∕Δf)
After simplification, this gives:
sin 2𝜋fc t cos 𝜋ft
h(t) = 2fc = (5.34)
2𝜋fc t 1 − 4Δf 2 t2
The total number of coefficients of the filter is determined principally by the function i2 (t) and
the width of its main lobe, equal to 3/Δf .
Hence, in a raised-cosine transition band filter, the number of coefficients can be estimated using:
3fs
N≈ (5.35)
Δf
This estimate can be considered as a first approach, when it is compared with equation (5.32).
The digital filter coefficients are obtained by sampling h(t) with |i| ≤ P and N = 2P + 1:
f Δf
2f sin 2𝜋i fs
c
cos 𝜋i f
hi = c ( )2
s
(5.36)
fs 2𝜋i fc Δf
fs 1−4 f i2
s

Note that this expression can be applied to any filter and provides a more accurate estimation of
the coefficients than (5.18).
The above results can be generalized to any symmetrical transition band:
Δf
H(fc + f ) = 1 − H(fc − f ); |f | ≤
2
These filters, called Nyquist filters, play a crucial role in digital transmission. The impulse
response is zero at multiples of 1/2f c , which determines the interference free transmission rate.
An important particular case is the half-band filter such that f c = f s /4 . Then, even coefficients
are null and, for N = 4 M + 1 coefficients, the input–output relationship is:
[ ]
1 ∑ M
y(n) = x(n − 2M) + h2i−1 [x(n − 2M + 2i − 1) + x(n − 2M − 2i + 1)]
2 i=1

The frequency response is:


[ ]
∑M
−j2𝜋2Mf 1
H(f ) = e 1+2 h2i−1 cos(2𝜋(2i − 1)f )
2 i=1

With its reduced number of computations, this filter is a basic building block in multirate
filtering.
106 5 Finite Impulse Response (FIR) Filters

In communications, the filtering function is generally shared between transmitter and receiver,
which leads to the half-Nyquist filter – for example, with cosine transition band:
1 Δf
H 2 (f ) = 1; |f | ≤ fc −
2
( [ ( )])
Δf
1
cos 𝜋 f − fc − 2 Δf Δf
H 2 (f ) = ; fc − ≤ f ≤ fc +
2Δf 2 2
1 Δf
H 2 (f ) = 0; |f | ≤ fc +
2
The impulse response is:
( ) ( )
Δf Δf 1 Δf
1
4 𝜋
cos 2𝜋t fc + 2
+ 𝜋t
sin 2𝜋t fc − 2
h 2 (t) = (5.37)
1 − (4Δft)2
As mentioned above, digital filter coefficients are obtained by sampling – that is, by substituting
i/f s for t in (5.37).

5.9 Structures for Implementing FIR Filters

FIR filters are composed of circuits which carry out the three fundamental operations of storage,
multiplication, and addition. They are arranged to produce from the set of data x(n) an output set
y(n) according to the equation of definition of the filter. As no real operation can be instantaneous,
the equation which is implemented in place of equation (5.5) in Section 5.1 is:

N−1
y(n) = ai x(n − i − 1) (5.38)
i=0

N data memories are required, and for each output number, N multiplications and N − 1
additions have to be performed. Various arrangements of the circuits can be devised to put these
operations into effect [4–6].
Figure 5.15(a) outlines the filter in what is called the “direct” structure. The transposition of the
diagram of this scheme produces the “transposed” structure. This is represented in Figure 5.15(b),
where the same operators are arranged differently. This structure allows multiplication of each
piece of data x(n) by all of the coefficients in succession. The memories store partial sums,
and, in effect, at time n, the first memory stores the number aN−1 x(n), the next memory stores
aN−1 x(n − 1) + aN−2 x(n), and the last memory stores the sum y(n). The difference between the two
structures lies in the position of the memories. An intermediate structure having two memories
per coefficient can also be envisaged, the internal data being stored for the duration T/2 in each.
The cascade structure so obtained has local connections only.
In linear-phase filters, the symmetry of the coefficients can be exploited to halve the number of
multiplications to be performed for each output number. This is very important for the complexity
of the filter and justifies the almost universal use of linear-phase filters.
The complexity of the circuits depends on the number of operations to be performed and on
their precision. Thus, the multiplication terms should have the minimum possible number of bits,
which will reduce the amount of memory necessary both for the coefficients and for the data. These
limitations modify the characteristics of the filter.
5.10 Limitation of the Number of Bits for Coefficients 107

T T T T
x(n)

xa0 xa1 xa2 xaN–2 xaN–1

+ + + +
y(n)
(a)

x(n)

xaN–1 xaN–2 xaN–3 xa1 xa0

T + T + + T + T
y(n)
(b)

Figure 5.15 Realization of FIR filters, (a) Direct structure; (b) transposed structure.

5.10 Limitation of the Number of Bits for Coefficients

Limitation of the number of bits in the coefficients of a filter produces a change in the frequency
response, which appears as the superposition of an error function. The consequences will be ana-
lyzed for linear-phase filters, and it is a straightforward matter to extend the results to any FIR
filter.
Consider a linear-phase filter with N = 2P + 1 coefficients, for which the transfer function is
written according to Section 5.2, equation (5.13), as:
[ ]

P
−j2𝜋fPT
H(f ) = e h0 + 2 hi cos(2𝜋fiT)
i=1

The limitation of the number of bits in the coefficients introduces an error 𝛿hi (0 ⩽ i ⩽ P) in the
coefficient hi which, under the assumption of rounding with a quantization step q, is such that:
|𝛿hi | ≤ q∕2
This results in the superposition on the function H(f ) of an error function e(f ) such that:
[ ]

P
e(f ) = e−j2𝜋fPT
𝛿h0 + 2 𝛿hi cos(2𝜋fiT) (5.39)
i=1

The amplitude of this function must be limited so that the response of the real filter is within the
required specification. An upper limit is obtained from:

P
|e(f )| ⩽ |𝛿h0 | + 2 |𝛿hi || cos(2𝜋fiT)|
i=1
q
|e(f )| ⩽ N (5.40)
2
In general, this limit is much too large, and a more realistic estimate is obtained by statistical
methods [7].
When a large number of filters with a wide range of specifications are considered and when
general results are required, the variables 𝛿hi (0 ⩽ i ⩽ P) can be viewed as random independent
108 5 Finite Impulse Response (FIR) Filters

variables, uniformly distributed over the range [−q/2, q/2]. In these conditions, they have variance
q2 /12. The function e(f ) can equally be regarded as a random variable. Assume that e0 is the effective
value of the function e(f ) over the frequency range [0, f s ] – that is,
sf
1
e20 = |e(f )|2 df (5.41)

2 0
In fact, the function e(f ) is periodic and defined by its Fourier series expansion. The
Bessel–Parseval equation allows us to write, according to equation (5.8):

1
fs ∑
N−1
|e(f )|2 df = (𝛿hi )2

fs 0 i=0

As a consequence, the variance 𝜎 2 of the random variable e0 can be written as:


[ ]
𝜎 2 = E e20 = N(q2 ∕12)
By assuming the second-order moment to be independent of frequency and by using
equation (5.41), the variable e(f ) can be regarded as a random variable with variance such
that:

𝜎 2 = q∕2 (N∕3) (5.42)
This relation provides an estimate of |e(f )| which is much less than the bound equation (5.40).
In fact, e(f ), resulting from equation (5.39) as a weighted sum of variables which are assumed to be
independent, can be regarded as a Gaussian variable with a zero mean if quantization is performed
by rounding and with variance 𝜎 2 . In order to determine the quantization step q, one can also base
the reasoning on the confidence levels. For example, the probability that |e(f )| will exceed the value
2𝜎 is less than 5% using the table given in Appendix 2 of Chapter 1.
The above results will now be used to estimate the number of bits bc needed to represent the
coefficients of a filter specified by a mask.
Given a filter specification, let 𝛿 m be the required value for the amplitude of the ripple and 𝛿 0
the amplitude of the ripple before the number of bits in the coefficients is limited. The parasitic
function e(f ) must be such that:
|e(f )| < 𝛿m − 𝛿0
The level of confidence in the estimate is deemed sufficient if q is chosen as:

q N 𝛿 − 𝛿0
< m
2 3 2
or

q < (𝛿m − 𝛿0 ) (3∕N) (5.43)
The number of bits bc needed to represent the coefficients depends on the largest of the values
hi (0 ⩽ i ⩽ P), and, allowing for the sign, the quantization step is given by:
q = 21−bc [max|hi |] (5.44)
0⩽i⩽p

If the filter is a low-pass one with a frequency response approaching unity in the pass band and
corresponding to the mask in Figure 5.7, initial approximations of the values of the coefficients can
be calculated by equation (5.18). Under these conditions, the maximum for h0 is obtained with:
h0 = (f1 + f2 )fs (5.45)
5.11 Z–Transfer Function of an FIR Filter 109

and equations (5.41) and (5.42) lead to the following estimate:


[ √ ]
f1 + f2 N 1
bc ≈ 1 + log2 . . (5.46)
fs 3 𝛿m − 𝛿0
where:
bc is the number of bits of the coefficients (including the sign),
N is the number of coefficients of the filter,
f 1 is the pass-band edge,
f 2 is the stop-band edge,
f s is the sampling frequency,
𝛿 m is the limit imposed on the amplitude of the ripple,
𝛿 0 is the amplitude of the ripple of the filter before limitation of the number of bits in the
coefficients.
In practice, equation (5.46) can be simplified. First, the tolerance provided by the filter mask is
generally distributed equally between the ripples before limitation of the number of bits in the coef-
ficients and before the supplementary error caused by this limitation (that is, 𝛿 0 = 𝛿 m /2). Further,
the filters to be realized are generally such that:
0.5fs ∕Δf ≤ N∕3 ≤ 1.5fs ∕Δf (5.47)
which, using equation (5.32), corresponds to a large range of values for the parameters 𝛿 1 and 𝛿 2 .
f
Under these conditions, using estimation (5.35) and substituting Δfs for N/3, a suitable estimate of
the number of bits of the coefficients is given by:

⎡f + f fs 2 ⎤
bc ≈ 1 + log2 ⎢ 1 2
. . ⎥
⎢ fs Δf 𝛿m ⎥
⎣ ⎦
By introducing the steepness of the cutoff of the filter (equation (5.17)) and the normalized
transition band, one finally obtains:
1
bc ≃ 3 + log2 [(f1 + f2 )∕2Δf] −log(fs ∕Δf) + log2 (1∕ min {𝛿1 , 𝛿2 }) (5.48)
2
Thus, the number of bits for the coefficients is directly related to the filter specifications. It should
be stated that filters with a narrow pass band require fewer bits than filters with a wide band.
Equations (5.46) and (5.48) also apply to high-pass and band-pass filters.
The above analysis was performed under the assumption that the two operations of the calcu-
lation of the coefficients and limitation of the number of bits are separate. It is equally possible to
take an overall view of these two operations, and techniques exist for the appropriate calculations,
although they are somewhat complicated [8].

5.11 Z–Transfer Function of an FIR Filter

The Z-transfer function for an FIR filter with N coefficients is a polynomial of degree N − 1, which
is written (equation (5.7)) as:

N−1
H(Z) = ai Z −i
i=0
110 5 Finite Impulse Response (FIR) Filters

This polynomial has N − 1 roots Z i ⋅(1 ⩽ i ⩽ N − 1) in the complex plane and can be written as
the product:

N−1
H(Z) = a0 (1 − Zi Z −1 ) (5.49)
i=1

These roots have certain characteristics because of the properties of FIR filters.
Firstly, if the coefficients are real, each complex root Z i has a corresponding complex conju-
gate root Z i . Hence, H(Z) can be written as a product of first- and second-degree terms with real
coefficients. Each second-degree term is thus written as:

H(Z) = 1 − 2Re(Zi )Z −1 + |Zi |2 Z −2 (5.50)

Secondly, the symmetry of the coefficients of a linear-phase filter must appear in the decompo-
sition into products of factors. For a second-degree term with real coefficients, it is necessary that
|Z i | = 1 if the roots are complex – that is, the root must lie on the unit circle. For a fourth-degree
term with real coefficients, the four complex roots have to be Z i , Zi , Z i , 1∕Zi , 1∕Z i . That is,
( ) [ ] ( )]
1 1 1
H4 (Z) = 1 − 2Re Zi + Z −1 + |Zi |2 + 2 + 4Re Z −2
Zi Zi Zi
( )
1
− 2Re Zi + Z −3 + Z −4 (5.51)
Zi

Under these conditions, an FIR linear-phase filter can be decomposed into a set of elementary
filters of second or fourth degree, having the symmetry properties of the coefficients.
The roots have been calculated for a low-pass filter with 15 coefficients. The coordinates of the
14 roots are:

Z1 = −0.976 ± j0.217 Z5 = 0.492 ± j0.266


Z2 = −0.797 ± j0.603 Z6 = 1.573 ± j0.851
Z3 = −0.512 ± j0.859 Z7 = 0.165
Z4 = −0.271 ± j0.962 Z8 = 6.052

Their positions in the complex plane are shown in Figure 5.16. This illustrates the characteristics
of the frequency response of the filter. The pairs of roots characteristic of the linear phase can be
clearly seen. The configuration of the roots is modified if this constraint is no longer imposed.

J Figure 5.16 Configuration of the zeros in an FIR


j filter.

6.052

1 R
5.12 Minimum-Phase Filters 111

It is interesting to observe that the frequency response of a linear phase filter of even length is
zero at half sampling frequency, as shown by expression (5.14). Thus, such a filter has a zero at
f s /2. Similarly, if an odd-length filter has a zero at f s /2, this is a double zero, which ensures the
symmetry of the frequency response about f s /2.

5.12 Minimum-Phase Filters

The propagation delay through a linear-phase filter can be excessive for some applications. Also, it
is not always possible or useful to use symmetry of the coefficients of a linear-phase filter to simplify
the calculations [9]. Therefore, if phase linearity is not a strictly imposed constraint, the complexity
of the filter may be reduced by abandoning it. A linear-phase transfer function can be considered
as the product of a minimum-phase function and a pure phase shift. The condition for a Z-transfer
function to be minimum phase is that its roots must be within or on the unit circle. This point is
developed in Chapter 9.
The calculation methods developed for linear-phase filters can be adapted. However, the coef-
ficients for a minimum-phase filter can be obtained in a simple way using the coefficients of an
optimal linear-phase filter.
Let the frequency response of a linear-phase filter with N = 2P + 1 coefficients be written as:
[ ]
∑P
−j2𝜋fPT
H(f ) = e h0 + 2 hi cos(2𝜋fiT)
i=1

The ripples in the pass and stop bands are 𝛿 1 and 𝛿 2 , respectively. Let us consider the filter
obtained by adding 𝛿 2 to the above response and by rescaling it to approach unity in the pass band.
Its response H 2 (f ) will be:
[ ]
1 ∑P
−j2𝜋fPT
H2 (f ) = e h0 + 2 hi cos(2𝜋fiT) (5.52)
1 + 𝛿2 i=1

In the pass band, there are ripples with amplitude 𝛿1′ such that:
𝛿1
𝛿1′ =
(1 + 𝛿2 )
Its response in the stop band is represented in Figure 5.17; the ripples are limited to
𝛿2′ = 2𝛿2 ∕(1 + 𝛿2 ).
This is a linear-phase filter because the symmetry of the coefficients is conserved. In contrast,
it can be noticed that the roots of the Z-transfer function which are on the unit circle are double
because H 2 (f ) never becomes negative. Under these conditions, the configuration of the roots is as
shown in Figure 5.18.

H2(f)

δ'2

0 f2 0.5 f

Figure 5.17 Ripple in the stop band of the raised filter.


112 5 Finite Impulse Response (FIR) Filters

j j

1 1

Figure 5.18 Configuration of the zeros of H2 (f) and of a minimum-phase filter.

The roots which are not on the unit circle are not double. However, the absolute value of the
function H 2 (f ) is not modified, except for a constant factor, if the roots Z i outside the unit circle
are replaced by the roots 1/Z i , which are inside the unit circle and thus also become double. This
operation amounts simply to multiplication by G(Z) such that:
[1 − (Z −I ∕Zi )][1 − (Z −1 ∕Zi )]
G(Z) =
(1 − Zi Z −1 )(1 − Zi Z −1 )

Since Z −1 = Z on the unit circle, the symmetry with respect to the real axis yields:
| (Z −1 − Z )(Z −1 − Z ) |
| i i |
| | =1 (5.53)
| (Z − Z )(Z − Z ) | j∞
| i i |Z=e
and thus:
1
|G(ej2𝜋f )| =
|Zi |2
Under these conditions, we can write:
2
H2 (f ) = Hm (f )K (K is a constant)

where H m (f ) is the response of a filter which has a Z-transfer function with P roots which are
single and inside or on the unit circle. This filter satisfies the minimum-phase condition, it has
P + 1 coefficients, and the amplitudes of the ripples in the pass and stop band are 𝛿m1 and 𝛿m2 such
that:

𝛿1 1 𝛿1 𝛿
𝛿m1 = 1 + −1≈ ≈ 1 (5.54)
1 + 𝛿2 2 1 + 𝛿2 2

2𝛿2 √
𝛿m2 = ≈ 2𝛿2 (5.55)
1 + 𝛿2
To design this filter, it is sufficient to start from the linear-phase filter whose parameters 𝛿 1 and
𝛿 2 are determined from 𝛿m1 and 𝛿m2 and to follow the procedure described above. One drawback is
that the extraction of the roots of a polynomial of degree N − 1 is required, which limits the values
that can be envisaged for N. Other procedures can also be used [10, 11].
An estimate Ne′ of the order of the FIR filter with minimum phase shift can be deduced from
equation (5.32). According to the procedure described earlier for the specifications of 𝛿 1 and 𝛿 2 ,
5.13 Design of Filters with a Large Number of Coefficients 113

this becomes:
( )
1 2 1 f3
Ne′ ≈ ⋅ log
2 3 10𝛿1 𝛿22 Δf
or
( )
1 1 fs
Ne′ ≈ Ne − log (5.56)
3 10𝛿1 Δf
The validity of this formula is naturally limited to the case where 𝛿 1 ≪ 0.1. The improvement in
the order of the filter with minimum phase is a function of the ripple in the pass band. It generally
remains relatively low.
Example: Assume the following specifications for a low-pass filter:

𝛿m1 = 0.0411; 𝛿m2 = 0.0137; Δf = 0.0115

whence

𝛿1 = 0.0822; 𝛿2 = 0.0000938

The number of coefficients needed for the corresponding linear-phase filter is estimated at
N e = 24, which leads to Ne′ = 12. Actually, it can be shown that the minimum-phase filter satisfying
the characteristic requires 11 coefficients instead of the 15 for the linear-phase filter.
In conclusion, when the symmetries provided by the linear phase cannot be used, it may be
advantageous to resort to minimum-phase filters.

5.13 Design of Filters with a Large Number of Coefficients

Optimization techniques become difficult to use, or they no longer converge, when the number of
filter coefficients is very large – perhaps one thousand or more – corresponding to extremely nar-
row transition bands of the order of thousandths. One can then use suboptimal techniques which
require only the calculation of filters with a reduced number of coefficients. This is the case in the
method described as frequency masking [12].
Consider a filter H(Z) whose transition band Δf is centered on the cutoff frequency f c . One
starts by designing a low-pass filter H 0 (Z M ) with a reduced sampling frequency of f s /M, where
M < f s /4Δf , such that the transition band of one of the alias frequencies of this filter coincides with
the transition band of the required filter, as shown in Figure 5.19(b).
Two complementary filters are then constructed from H 0 (Z M ) as shown in Figure 5.19(c); this
requires an odd number of coefficients, 2P + 1, for H 0 (Z M ).
A diagram with two branches is obtained, to which are applied the filters G1 (Z) and G2 (Z); G1
and G2 are described as interpolators, and they have the responses given in Figure 5.19(c). It is then
sufficient to sum the outputs to obtain the desired filter of Figure 5.19(a). The overall arrangement
is shown in Figure 5.20.
The procedure thus requires three filters having transition bands of M Δf , f c − kf s /M and
(k + 1)f s /M − f c , where k is the integer which permits the cutoff frequency f c to be included.
The transfer function H(Z) of the required filter takes the form:

H(Z) = H0 (ZM )G1 (Z) + [Z −PM − H0 (ZM )G2 (Z)] (5.57)

which provides the coefficient values.


114 5 Finite Impulse Response (FIR) Filters

H(f) Δf

0 fc f (a)
H0(f)

0 f (b)
G1(f)

G2(f)

0 fc f (c)

Figure 5.19 The principle of frequency masking: (a) desired filter, (b) downsampled filter, (c) interpolating
filters.

1
––Z– PM + G 1 (Z)
2
x (n ) y (n)
+

1 –
H 0 (Z M ) – –– Z PM + G 2 (Z)
2

Figure 5.20 Diagram of filter using the frequency-masking technique.

1
––Z– PM F 1 (Z )
2
x (n ) y (n )
+

1 –
H 0 (Z M ) – ––Z PM F 2 (Z )
2

Figure 5.21 Simplified diagram of the frequency-masking filter.

Note that the arrangement shown in Figure 5.20 provides an efficient realization of the over-
all filter since the filter H 0 (Z M ) has M − 1 zero coefficients among two nonzero coefficients. This
arrangement can be simplified as shown in Figure 5.21. The interpolating filters can be taken as
F 1 (Z) = G1 (Z) + G2 (Z) and F 2 (Z) = G1 (Z) − G2 (Z) but they can also be derived directly from their
specifications, as deduced from Figure 5.19.

5.14 Two-Dimensional FIR Filters


A two-dimensional FIR filter is defined by a relationship between the input x(n, m) and the output
y(n, m) expressed by:
∑ ∑
N1 −1 N2 −1
y(n, m) = aij x(n − i, m − j) (5.58)
i=0 j=0
5.14 Two-Dimensional FIR Filters 115

The set of coefficients aij takes the form of a N 1 × N 2 matrix denoted AN1 N2 . The corresponding
two-variable transfer function:
∑ ∑
N1 −1 N2 −1
−j
H(Z1 , Z2 ) = aij Z1−i Z2 (5.59)
i=0 j=0

is given in vector notation by:


⎡ 𝟏 ⎤
[ ] ⎢ Z−𝟏 ⎥
[AN𝟏 N𝟐 ] ⎢ ⎥
−(N𝟏 −𝟏) 𝟐
H(Z𝟏 , Z𝟐 ) = 𝟏, Z−𝟏
𝟏 , … , Z𝟏 (5.60)
⎢ ⋮ ⎥
⎢Z−(N𝟐 −𝟏) ⎥
⎣ 𝟐 ⎦
The coefficient matrix AN𝟏 N𝟐 is also called the mask. As an illustration, the following high-pass
filters are often used in image processing:
⎡−1 0 1⎤ ⎡1 1 1⎤
A′ = ⎢−1 0 2⎥ ; A′′ = ⎢ 1 −2 1 ⎥
⎢ ⎥ ⎢ ⎥
⎣−1 0 1⎦ ⎣−1 −1 −1⎦
In some edge-extraction processes, the SOBEL filter A′ and the PREVITT filter A′′ are employed
twice – once as above and once after a 90∘ rotation.
The coefficients of the two-dimensional filters can be computed directly from specifications in the
two-dimensional frequency domain. When the impulse response is an even function, with respect
to the two variables, the frequency response and the coefficients can be derived from those of a
linear-phase one-dimensional filter. Let H(𝜔) be the frequency response of such a filter.
Equation (5.13), when the phase term is neglected, yields:

P
H(𝜔) = h0 + 2 hi cos i𝜔
i=1

A polynomial relation exists between cos i𝜔 and cos 𝜔:


cos i𝜔 = Ti (cos 𝜔) (5.61)
where T i (x) is the Chebyshev polynomial of degree i. Under these conditions, H(𝜔) can also be
written as:

P
H(𝜔) = gi (cos 𝜔)i (5.62)
i=0

Changing variables as follows:


∑ ∑
k−1 L−1
cos 𝜔 = H1 (𝜔1 , 𝜔2 ) = t(k, l) cos k𝜔1 cos l𝜔2 (5.63)
k=0 l=0

yields a two-variable function:


( k−1 L−1 )i

P
∑∑
H (e𝜔1 , e𝜔2 ) = gi t(k, l) cos k𝜔1 cos l𝜔2 (5.64)
i=0 k=0 l−0

which can be rewritten as:


∑∑
N1 −1N2 −1
H(𝜔1 , 𝜔2 ) = hij cos i𝜔1 cos j𝜔2 (5.65)
i=0 j=0
116 5 Finite Impulse Response (FIR) Filters

where:
N1 = 2KP + 1; N2 = 2LP + 1
The function t(k, l) can be chosen so as to map the points in the frequency response of the
one-dimensional filter into contours in the (𝜔1 , 𝜔2 ) plane. For example, the circular symmetry
is approximately achieved by:
1
cos 𝜔 = [cos 𝜔1 + cos 𝜔2 + cos 𝜔1 cos 𝜔2 − 1] (5.66)
2
as can be seen from a series expansion of cos 𝜔1 , with 𝜔1 being small. The frequency response of a
filter designed that way is shown in Figure 5.22.
The implementation of a two-dimensional filter can be obtained through straight application of
equation (5.58). For filters derived from a one-dimensional function, expression (5.64) suggests an
important simplification: in the one-dimensional FIR filter with P + 1 coefficients gi , the delays are
replaced by two-dimensional sections corresponding to the function H 1 (𝜔1 , 𝜔2 ) [13].
Separable filters are particularly simple to produce; in this case, the coefficient matrix is
dyadic – that is,
AN1 N2 = V1 V2t
where V 1 and V 2 are the vectors. Then, in accordance with equation (5.60), the transfer function
factorizes:
H(z1 , z2 ) = H1 (z1 )H2 (z2 ) (5.67)
The specifications of such filters are subject to limitations. First, they must have quadrantal
symmetry along the coordinate axes. As shown in Figure 5.23, the useful frequency domain is
divided into four parts: low pass/low pass (LL), low pass/high pass (LH), high pass/low pass (HL),
and high pass/high pass (HH).
Consequently, the ripple specifications must be defined. For a two-dimensional low-pass filter,
the HH domain is subjected to the attenuation of two filters, horizontal and vertical. An illustration
is given in Figure 5.24 which shows the frequency response of a two-dimensional separable filter
based on the half-band filter with coefficients: a = [0.5 0.314 0 −0.094 0 0.045 0 −0.022].
Such filters can be realized by following the definition exactly – that is, a data table representing
an image can be processed row by row with the horizontal filter and column by column with the
vertical filter.
When the image is subjected to a horizontal scan as in television, the signal appears as
one-dimensional and can be processed as such. If each row contains N points, the transfer

H(ω1, ω2)

H(ω)

ω2

0 ω ω1

Figure 5.22 Two-dimensional FIR filter designed from a one-dimensional linear-phase filter.
5.14 Two-Dimensional FIR Filters 117

ω2

BH HH

ω2c

BB HB

0 ω1c π ω1

Figure 5.23 Frequency domains for a two-dimensional separable filter.

f2

f1

(a) (b)

Figure 5.24 Two-dimensional half-band separable filter: (a) low pass/low pass, (b) high pass/high pass.

function can be written as:

H(z1 , z2 ) = H1 (z)H2 (zN ) (5.68)

For example, for the Sobel filter A′ , one has:

⎡1⎤ [ ]
A′ = ⎢2⎥ −1 0 1
⎢ ⎥
⎣3⎦

and the corresponding circuit is given in Figure 5.25. Realization is particularly simple as the cir-
cuits do not contain multipliers.

z–2 z–N z–N


x(n) + + + y(n)

Figure 5.25 Realization of a filter by contour extraction.


118 5 Finite Impulse Response (FIR) Filters

5.15 Coefficients of Two-Dimensional FIR Filters by the


Least-Squares Method

The method will be developed for an important special case – filters with quadrantal symmetry.
Two types of filters correspond to this category: rectangular and lozenge filters with the frequency
domains shown in Figure 5.26.
The frequency response of a zero-phase filter having (2 M + 1) × (2 N + 1) coefficients with quad-
rantal symmetry is expressed by:


M

N
∑ ∑
M N
H(𝜔1 , 𝜔2 ) = h00 + 2 hi0 cos i𝜔1 + 2 h0j cos j𝜔2 + 4 hij cos i𝜔1 cos j𝜔2 (5.69)
i=1 j=1 i=1 j=1

In total, the filter has (1 + M + N + MN) coefficients hij with different values.
The least-squares method with weighting will be applied directly, to approach the desired
response, D(𝜔1 ,𝜔2 ). With an oversampling factor of k, the quadratic deviation function, or cost
function, to be minimized is:
KM KN | ( ) ( )2 ( )
∑ ∑| m𝜋 n𝜋 m𝜋 n𝜋 || m𝜋 n𝜋
J= |H , −D , |W , (5.70)
| KM KN KM KN || KM KN
m=0 n=0 |

with K M = k(M + 0.5) and K N = k(N + 0.5) in order to cover all the useful frequency domains.
The weighting function W(𝜔1 , 𝜔2 ) enables the approximation to be adjusted in accordance with
the ripple specifications, for example.
With simplified notation, this gives:

∑ ∑
KM KN
J= e2 (m, n)W(m, n) (5.71)
m=0 n=0

The minimum of the cost function is obtained for:


∑ ∑
KM KN
𝜕e(m, n)
e2 (m, n)W(m, n) =0 (5.72)
m=0 n=0
𝜕hij

which yields a system with (1 + M + N + MN) equations.

ω2 ω2

π π

π ω1 π ω1

(a) (b)

Figure 5.26 Filters: (a) two-dimensional lozenge and (b) rectangle.


5.15 Coefficients of Two-Dimensional FIR Filters by the Least-Squares Method 119

Designating the coefficient vector by [hij ] and the frequency vector by V(m, n):
[ ( ) ( ) ( ) ( ) ]
m𝜋 n𝜋 n𝜋 n𝜋
V t (m, n) = 1, … , 2 cos i , … , 2 cos j , … , 4 cos i cos j ,…
KM KN KM KN
the solution can be written as:
[K K ]−1 [ K K ]
∑M ∑N ∑ M ∑N
t
[hij ] = W(m, n)V (m, n) V (m, n) W(m, n)V (m, n) D (m, n) (5.73)
m=0 n=0 m=0 n=0

If the number of coefficients is even, it is necessary to modify the parameters. For a filter with
(2 M) × (2 N + 1) coefficients, it is necessary to take:
[ ( ) ( ) ( )]
m𝜋 n𝜋 n𝜋
V (m, n) = … , 2 cos (i − 0.5)
t
, … , 4 cos (i − 0.5) cos j (5.74)
KM KM KN

with K M = kM and K N = k(N + 0.5). The coefficient vector obtained in this case has (M + MN)
elements.
An important characteristic of filters used in image processing is the response to a unit step.
Ringing at the transition can produce repetitions of contours and thus degrade the image. It is
possible to reduce ringing by modifying the desired response D(𝜔1 , 𝜔2 ) using a slope at the end of
the pass band and the start of the attenuation band.
The method is illustrated by the design of a rectangular filter with (2 M + 1) × (2 N + 1) = 9 × 9
coefficients, with 0.125 and 0.25 as the end of the pass band and the start of the attenuation band on

Figure 5.27 Rectangular filter with 9 × 9 coefficients: (a) impulse response, (b) frequency response,
(c) horizontal section of the frequency response, (d) response to a unit step.
120 5 Finite Impulse Response (FIR) Filters

the horizontal frequency axis and 0.0625 and 0.125 on the vertical axis. The 25 different coefficients
obtained are:
⎡ 0.052427 0.0419028 0.0184534 −0.0002861 −0.006258 ⎤
⎢ ⎥
⎢0.0491981 0.0393451 0.0173566 −0.0002629 −0.0059292⎥
⎢ ⎥
hij = ⎢ 0.041534 0.0332908 0.0147612 −0.000261 −0.005282 ⎥
⎢ ⎥
⎢0.0299102 0.0240605 0.107414 −0.0002704 −0.0041828⎥
⎢ ⎥
⎢ ⎥
⎣0.0180912 0.0146366 0.0065523 −0.0003836 −0.0031209⎦

and the corresponding frequency response is given in Figure 5.27. Evidently, this response is very
close to that of a separable filter. Considering now a lozenge filter with (2 M + 1) × (2 N) = 9 × 8
coefficients, a pass band ending at 0.125, and an attenuation band starting at 0.25 on the horizontal
and vertical axes, the coefficients for one quadrant are:

⎡0.0763835 0.0680674 0.0403862 0.0130039 ⎤


0.000071
⎢ ⎥
⎢0.0642979 0.03951 0.0217936 0.0008111 −0.002745 ⎥
hij = ⎢ ⎥
⎢0.0276109 0.0195655 0.0068997 −0.0068997 −0.0110481⎥
⎢ ⎥
⎢0.0065124 0.0011002 0.0085984 0.0085984 −0.0073724⎥⎦

Figure 5.28 Lozenge filter with 9 × 8 coefficients: (a) impulse response, (b) frequency response,
(c) horizontal section of the frequency response, (d) response to a unit step.
Exercises 121

The frequency response is given in Figure 5.28. Calculation has been guided by attempting to
reduce the unit step response g(i, j) defined by:

i

j
g(i, j) = h(i1 , j1 ) (5.75)
i1 =−M j1 =−N

This response is also shown in the figure, where it has been repeated in the four quadrants to
provide a complete picture.
See References [14, 15] for complementary developments in the design techniques for
two-dimensional FIR filters, including those with limited precision coefficients and constraints on
the unit step response.

Exercises

5.1 Consider the 17 coefficients of a low-pass filter with cutoff frequency f c = 0.25f s given in
Figure 5.12. How many of these coefficients assume different values? Give the expression
for the frequency response H(f ). Determine the frequencies for which it is zero and give the
maximum ripple. Determine the zeros of the filter Z-transfer function.

5.2 Consider a filter for which the sampling frequency is taken as the reference (f s = 1) and whose
frequency response H(f ) is such that:

H(k × 0.0625) = 1 for k = 0, 1, 2, 3


H(0.25) = 0.5
H(k × 0.0625) = 0 for k = 5, 6, 7, 8

Using the discrete Fourier transform, calculate the 17 coefficients of this filter. Draw the fre-
quency response and determine the zeros of the Z-transfer function.

5.3 Using the equations given in Section 5.7, determine the ripple of a low-pass filter with 17 coef-
ficients for which the upper frequency of the pass band is f 1 = 0.2 and the lower frequency of
the stop band is f 2 = 0.3. Compare the results obtained with those in the preceding exercises.

5.4 Consider a filter with a transfer function H(f ) which, except for the phase shift, is given by
the equation:

4
H(f ) = h0 + 2 h2i−1 cos[2𝜋f(2i − 1)T]
i=0

Give the direct and transposed structures which allow this filter to be achieved with a min-
imum number of elements. What simplifications are involved if the sampling frequency of
the output is divided by two?

5.5 A narrow-band low-pass filter is defined by the equation



N−1
y(n) = ai x(n − i)
i=0
122 5 Finite Impulse Response (FIR) Filters

How is the frequency response modified if the coefficients ai are replaced by ai (−1)i and by
ai cos (i𝜋/2)? How are the Z-transfer function zeros affected?

5.6 Consider a low-pass filter which satisfies Figure 5.7 with the following values for the
parameters
f1 = 0.05; f2 = 0.15; 𝛿1 = 0.01; and 𝛿2 = 0.001
How many coefficients are needed and how many bits are required to represent them? If the
input data have 12 bits and if the signal-to-noise ratio degradation is limited to ΔSN = 0.1 dB,
what is the internal data wordlength?

References

1 T. W. Parks and J. H. McClellan, Chebyshev approximation for non recursive digital filters with
linear phase. IEEE Transactions on Circuits and Systems, 19(2), 189–194, 1972.
2 J. McClellan, T. Parks and L. Rabiner, “A computer program for designing optimum FIR linear
phase digital filters,” IEEE Transactions on Audio and Electroacoustics, 21, 6, pp. 506–526, 1973,
10.1109/TAU.1973.1162525.
3 J. Shen and G. Strang, The asymptotics of optimal (equiripple) filters. IEEE Transactions, 47(4),
1087–1098, 1999.
4 R. E. Crochiere and A. V. Oppenheim, “Analysis of linear digital networks,” Proceedings of the
IEEE, vol. 63, no. 4, pp. 581–595, 1975, 10.1109/PROC.1975.9793.
5 W. Schussler, On Structures for Non Recursive Digital Filters, Archiv der elektrischen Übertra-
gung, 1972.
6 M. Bellanger and G. Bonnerot, Premultiplication scheme for digital FIR filters. IEEE Transac-
tions, 26(1), 50–55, 1978.
7 D. Chan and L. Rabiner, “Analysis of quantization errors in the direct form for finite impulse
response digital filters,” IEEE Transactions on Audio and Electroacoustics, vol. 21, no. 4,
pp. 354–366, 1973, 10.1109/TAU.1973.1162497.
8 F. Grenez, Synthèse des filtres numériques non récursifs à coefficients quantifiés. Annales des
Télécommunications, 34, 1–2, 1979.
9 M. Feldmann, J. Henaff, B. Lacroix and J. C. Rebourg, Design of minimum phase
charge-transfer transversal filters, Electronic Letters, 15(8), 1979.
10 R. Boite and H. Leich, A new procedure for the design of high order minimum phase FIR
filters. Signal Processing, 3(2), 101–108, 1981.
11 Y. Kamp and C. J. Wellekens, Optimal design of minimum-phase FIR filters. IEEE Transactions,
31(4), 922–926, 1983.
12 Y. C. Lim and Y. Lian, The optimum design of one and two-dimensional FIR filters using the
frequency response masking technique. IEEE Transactions on Circuits and Systems II: Express
Briefs, 40(2), 88–95, 1993.
13 D. Dudgeon and R. Mersereau, Multidimensional Digital Signal Processing, Prentice-Hall,
Englewood Cliffs, NJ, 1984.
14 D. E. Pearson, Image Processing, McGraw-Hill, UK, 1991.
15 V. Ouvrard and P. Siohan, Design of 2D video filters with spatial constraints, Proceedings of
EUSIPCO-92, North Holland, Brussels, August 1992, 1001–1004.
123

Infinite Impulse Response (IIR) Filter Sections

Digital filters with an infinite impulse response (IIR) are discrete linear systems which are governed
by a convolution equation based on an infinite number of terms. In principle, they have infinite
memory. This memory is achieved by feeding the output back to the input, so they are known as
recursive filters. Each element of the set of output numbers is calculated by weighted summation
of a certain number of elements of the input set and of the previous output set.
In general, this IIR allows for much more selective filtering functions to be obtained than with
finite impulse response (FIR) filters of similar complexity. However, the feedback loop complicates
the study of the properties and design of these filters and leads to parasitic phenomena.
When examining IIR filters, it is simpler initially to consider them in terms of first- and
second-order sections. Not only are these simple structures useful in introducing the properties of
IIR filters. They also represent the most frequently used type of implementation. Indeed, even the
most complex IIR filters appearing in practice are generally formed from a set of such sections.

6.1 First-Order Section

Consider a system which, for the set of data x(n), produces the set y(n) such that:

y(n) = x(n) + by(n − 1) (6.1)

where b is a constant. This is a first-order filter section.


This system’s response to the unit set u0 (n) such that:
u0 (n) = 1 for n = 0
0 for n ≠ 0

is the set y0 (n) such that:


y0 (n) = 0 for n < 0
bn for n ≥ 0

This set constitutes the impulse response of the filter. It is infinite and the stability condition is
written as:


|b|n < ∞
n=0

Hence |b| < 1.


Digital Signal Processing: Theory and Practice, Tenth Edition. Maurice Bellanger.
© 2024 John Wiley & Sons Ltd. Published 2024 by John Wiley & Sons
124 6 Infinite Impulse Response (IIR) Filter Sections

y(n)
1
1–b

1–e–1
1–b

–1 0 1 2 3 4 5 6 7 8 9 10 11 n
τ

Figure 6.1 Response of the first-order section to a unit step

The response of the system for the set x(n) such that:
x(n) = 0 for n < 0
1 for n ≥ 0

is the set y(n) such that:

y(n) = 0 for n < 0


(1−bn+1 )
(1−b)
for n ≥ 0 (6.2)

which tends toward 1/(1 − b) as n tends toward infinity if the system is stable. This response is
shown in Figure 6.1.
By analogy with a continuous system with time constant 𝜏, sampled with period T, and whose
response yc (n) is written as:

yc (n) = [1 − e−(T∕𝜏)(n+1) ]

the time constant of the first-order digital filter is defined by:

e−T∕𝜏 = b

for b > 0. Then:


T
𝜏= ( ) (6.3)
ln b1

For b close to unity:

b = 1 − 𝛿 with 0 < 𝛿 ≪ 1

that is, for systems defined by:

y(n) = x(n) + (1 − 𝛿)y(n − 1) (6.4)

we obtain:
T
𝜏≈ (6.5)
𝛿
This situation is encountered in adaptive systems, which are presented in Chapter 14.
6.1 First-Order Section 125

If the input set x(n) results for n ≥ 0 from the sampling of the signal x(t) = ej2𝜋ft or x(t) = ej𝜔t , with
period T = 1, then:
ejn𝜔 bn+1 e−j𝜔
y(n) = − (6.6)
1 − be−j𝜔 1 − be−j𝜔
This expression exhibits a transient and a steady-state term which corresponds to the frequency
response H(𝜔) of the filter:
1
H(𝜔) = (6.7)
(1 − be−j𝜔 )
The modulus and the phase of this function are:
1 b sin 𝜔
|H(𝜔)|2 = ; 𝜙(𝜔) = tan−1 (6.8)
(1 − 2b cos 𝜔 + b2 ) 1 − b cos 𝜔
The phase can also be written as:
( )
sin 𝜔
𝜙(𝜔) = tan−1 −𝜔
(cos 𝜔 − b)
and the group delay:
d𝜙 b cos 𝜔 − b2
𝜏g (𝜔) = = (6.9)
d𝜔 1 − 2b cos 𝜔 + b2
It should be noted that for very small values of 𝜔:
|
| H(𝜔)|2 ≈ 1
|
{ [ ]} (6.10a)
b
(1 − b2 ) 1 + (1−b)2
𝜔2

This expression approximates the response H RC (𝜔) of an RC circuit, which is written as:
1
|HRc (𝜔)|2 = (6.10b)
1 + R2 C2 𝜔2
It appears that for frequencies which are very small in comparison to the sampling frequency,
the digital circuit has a frequency response similar to that of an RC network. Figure 6.2(a) shows
the form of the frequency response for a digital first-order circuit. Figure 6.2(b) gives the phase
response, and Figure 6.2(c) the group delay.
The phase can be written as:
( )
sin 𝜔
𝜑(𝜔) = tan−1 −𝜔 cos 𝜔 > b
cos 𝜔 − b
𝜋
𝜑max cos−1 b cos 𝜔 = b
2
( )
sin 𝜔
𝜑(𝜔) = 𝜋 + tan−1 − 𝜔 cos 𝜔 > b (6.11)
cos 𝜔 − b
It passes through a maximum for 𝜔 such that cos 𝜔 = b, which corresponds to cancellation of the
group delay. Thus, the coefficient b directly controls the maximum phase of the section.
The transfer function of the first-order section can also be obtained using the Z-transform.
Assume Y (Z) and X(Z) are the transforms of the output and input sets, respectively. Then:
Y (Z) = X(Z) + bZ −1 Y (Z)
and thus the Z-transfer function, H(Z), is:
1
H(Z) = Z
−1
(1 − bZ ) = (Z−b)
126 6 Infinite Impulse Response (IIR) Filter Sections

|H(f)|

1
1–b φ(ω)

φmax

–π 0 Arc cos b π ω
1
1+b

0 0.25 0.5 f
(a) (b)

τg(ω) b
j
1–b
M

Arc cos b ω α
0 π ω 0 b P 1
–b
1+b

(c) (d)

Figure 6.2 First-order section: (a) frequency response, (b) phase response, (c) group delay, and (d) pole

The frequency response is obtained simply by replacing Z with ej𝜔 , in the expression for H(Z),
with 𝜔 = 2𝜋f .
A graphical interpretation is given in Figure 6.2(d), which represents the pole P of this function
in the complex plane. This is a point on the real axis with coordinate b.
Following this figure:
1
|H| = and 𝜙 = 𝛼 − 𝜔
MP
The stability condition implies that the pole P is inside the unit circle.
An interesting special case is the narrowband integrator, defined by the following transfer func-
tion:
𝜀
H(𝜏) (6.12)
1 − (1 − 𝜀)Z −1 int
with 𝜀 being small such that 0 < 𝜀 ≪ 1.
It can also be shown that the 3 dB bandwidth is approximately equal to 𝜀 and the time constant
equal to 1/𝜀. The norm of the frequency response is: ||H||22 ≈ 2𝜀 .
The one-sided Z-transform generates the transient responses and introduces the initial condi-
tions. Indeed:






y(n)Z −n = x(n)Z −n + b y(n − 1)Z −n
n=0 n−0 n=0
Y (Z) = X(Z) + by(−1) + bZ −1 Y (Z)
6.2 Purely Recursive Second-Order Section 127

Hence:
X(Z) by(−1)
Y (Z) = +
1 − bZ −1 1 − bZ −1
If x(n) = ejn𝜔 , X(Z) is written:


1
Y (Z) = ejn𝜔 Z −n = (6.13)
n=0 1 − ej𝜔 Z −1
The value y(n) is obtained from the equation for the inverse Z-transform:
[ ]
1 n−1 1 1 by(−1)
y(n) = ΓZ • + dZ
j2𝜋 ∫ 1 − ej𝜔 Z −1 1 − bZ −1 1 − bZ −1
By taking a circle with radius greater than unity as the contour of integration, Γ, the theory of
residues gives:
ejn𝜔 bn+1 e−j𝜔
y(n) = − + y(−1)bn+1 (6.14)
1 − be−j𝜔 1 − be−j𝜔
which can also be obtained directly from a series expansion of Y (Z). This expression shows the
steady-state and transient responses, and also the response due to the initial conditions. The last
two items disappear when n increases if |b| < 1 – that is, if the system is stable.
This analysis shows that the first-order filter offers restricted possibilities because it has only
one pole, which must be real if the filter has real coefficients. Further, its frequency response is a
monotonic function. The second-order filter has a wider variety of possibilities. It is the structure
most commonly used in digital filtering because of the modularity it allows, even with the most
complex filters, and because of its properties relating to limitations in the coefficient wordlengths
and the round-off noise. We will first examine the purely recursive filter section.

6.2 Purely Recursive Second-Order Section


Consider a system which, for the dataset x(n), produces the corresponding set y(n) such that:
y(n) = x(n) − b1 y(n − 1) − b2 y(n − 2) (6.15)
In this expression, the signs of the coefficients b1 and b2 have been changed compared to the
previous section in order to facilitate the writing of the Z-transfer function of the system, H(Z):
1 Z2
H(Z) = =
1 + b1 Z −1 + b2 Z −2 Z 2 + b1 Z + b2
This function has a double root at the origin and two poles P1 and P2 such that:
b1 1 √ ( 2 )
P1,2 = − ± b1 − 4b2 (6.16)
2 2
Two cases occur, depending on the sign of b21 − 4b2 :
(1) b𝟐𝟏 ≥ 𝟒b𝟐 : both poles lie on the real axis of the complex plane. The transfer function is simply
the product of two first-order functions with real coefficients. The corresponding filter is com-
posed of two first-order sections in cascade and its properties are deduced accordingly. The
amplitudes are multiplied, and the phases are added. The step response at the output of the
second section is:
[ ]
1 n+1
b1n+1 − b2n+1
y2 (n) = 1 − b2 − (1 − b2 ) (6.17)
(1 − b1 )(1 − b2 ) b1 − b2
128 6 Infinite Impulse Response (IIR) Filter Sections

where b1 and b2 are the coefficients. The corresponding time constant 𝜏 12 is given by:
√ √
𝜏12 ≈ 2 𝜏1 𝜏2 (6.18)

for coefficients sufficiently close to unity. For N identical sections, the time constant 𝜏 N can be
approximated by:

𝜏N ≈ (N)𝜏1 (6.19)

(2) b𝟐𝟏 < 𝟒b𝟐 : the two poles are complex conjugates written as P and P, with

b1 1√ ( )
P=− +j 4b2 − b21 (6.20)
2 2
Figure 6.3 illustrates this, the most interesting case, and the remainder of this section will con-
centrate on this.
The relation between the position of the poles and the filter coefficients is
very simple:

b1 = −2Re(P) (6.21)

That is, the coefficient of the Z −1 term in the expression for H(Z) is equal in modulus to twice
the real part of the pole and has the opposite sign. Then:

b2 = |OP|2 (6.22)

The coefficient of the Z −2 term is equal to the square of the modulus of the pole or to the square
of the distance from the pole to the origin. As will be seen later, both relations are very useful in
determining filter coefficients.
If M denotes the coordinate ej𝜔 in the complex plane, the modulus of the transfer function is:
1
|H(𝜔)| =
MP•M P
and the phase is:

𝜙(𝜔) = 𝛼1 + 𝛼2 − 2𝜔
−−→ −−→
where 𝛼 1 and 𝛼 2 denote the angles between the vectors PM and PM and the real axis.
The analytical expressions are deduced from H(Z) by letting Z = ej𝜔 . By using
1
H(Z) =
1 + b1 Z −1 + b2 Z −2

Figure 6.3 Second-order section with complex poles


j

α1 P
M
ω
0
1
− α2
P
6.2 Purely Recursive Second-Order Section 129

we have:
1
|H(𝜔)|2 = (6.23a)
1 + b21 + b22 + 2b1 (1 + b2 ) cos 𝜔 + 2b2 cos 2𝜔
[ ]
b1 sin 𝜔 + b2 sin 2𝜔
𝜙(𝜔) = − arctan (6.24a)
1 + b1 cos 𝜔 + b2 cos 2𝜔
A very elegant form for the frequency response and the phase is obtained by representing the
poles in polar coordinates, P = rej𝜃 , and expressing H(Z) as a factor product:
1
H(Z) =
(1 − PZ −1 )(1 − PZ −1 )
The coefficients b1 and b2 then become:
b1 = −2r cos 𝜃; b2 = r 2
For H(𝜔), we obtain:
1
H(𝜔) = (6.25)
[1 − rej(𝜃−𝜔) ][1 − re−j(𝜃+𝜔) ]
Hence:
1
|H(𝜔)|2 = (6.23b)
[1 + r 2 − 2r cos(𝜃 − 𝜔)][1 + r 2 − 2r cos(𝜃 + 𝜔)]
[ ] [ ]
r sin(𝜃 + 𝜔) r sin(𝜃 − 𝜔)
𝜙(𝜔) = arctan − arctan (6.24b)
1 − r cos(𝜃 + 𝜔) 1 − r cos(𝜃 − 𝜔)
These expressions permit the curves for |H(𝜔)| and 𝜙(𝜔) to be plotted as a function of the
frequency 𝜔 = 2𝜋f . It can be shown that |H(𝜔)| is an even function and that 𝜙(𝜔) is an odd
function of 𝜔.
The values corresponding to the extrema of |H(𝜔)| are the roots of the following equation, which
is obtained by taking the derivative of equation (6.23) with respect to 𝜔:
sin 𝜔[b1 (1 + b2 ) + 4b2 cos 𝜔] = 0
The extremum frequencies are 0 and 0.5 and another extremal frequency f 0 exists if:
| b1 (1 + b2 ) |
| |<1 (6.26a)
| 4b |
| 2 |
or, in polar coordinates:
2r
cos𝜃 < (6.26b)
1 + r2
In this case,
b1 (1 + b2 )
cos(2𝜋fo ) = cos 𝜔o = − (6.27)
4b2
The frequency f 0 is the resonance frequency of the filter section. The amplitude at the resonance
is written as:
( )
1 √ 4b2
Hm = (6.28)
1 − b2 4b2 − b21
or, in polar coordinates:
1 1
Hm = (6.29)
1 − r (1 + r) sin 𝜃
130 6 Infinite Impulse Response (IIR) Filter Sections

Thus, it appears that the frequency response at resonance is inversely proportional to the distance
from the pole to the unit circle. This is a fundamental expression which will be used frequently in
the following chapters.
It is also important for the second-order section to determine the 3-dB bandwidth, B3 , such that:
B3 = f2 − f1 = (𝜔2 − 𝜔1 )∕2𝜋
with:
|H(𝜔1 )|2 = |H(𝜔2 )|2 = Hm
2
∕2
For a strongly resonant filter section (r ≈ 1), using equations (6.22) and (6.23), the following
approximation holds in the vicinity of the resonance frequency:
| | 1 1
|H(𝜔1 )2 | ≈
| | 4 sin2 𝜃 1 + r 2 − 2r cos(𝜃 − 𝜔1 )
[ ]
1 1
=
2 (1 − r 2 )2 sin2 𝜃
whence:
cos(𝜃 − 𝜔1 ) = ((1 + r 2 )∕2r) − (1 − r 2 )2 ∕4r
By expansion and limiting the number of terms, we derive:
|𝜃 − 𝜔1 | ≈ 1 − r
Hence, the approximation for a strongly resonant filter section is:
B3 = (1 − r)∕𝜋 (6.30a)
This result is used below for calculating the arithmetic complexity of filters.
Another characteristic is sometimes used for a purely recursive second-order section: the equiva-
lent noise bandwidth B2 . This is the bandwidth of a noise source whose spectral density is assumed
to be constant within this band and equal to Hm 2 and whose total noise power is equal to the power

obtained at the output of the section when white noise of unit power is applied. By definition:
Bb .Hm
2
= ||H||22
Taking account of the expression for Bb .Hm
2 = ||H||2 given below (6.36), and expression (6.29)
2
above, yields:
(1 − r 2 )sin2 𝜃
Bb = (6.30b)
1 + r 4 − 2r 2 cos 2𝜃
This expression is useful in spectral analysis, for example.
The main characteristics of a purely recursive second-order filter section can be illustrated by an
example.
Example: Consider a second-order filter section having poles with coordinates:
P = 0.6073 + j0.5355
P = 0.6073 − j0.5355
The various parameters are:
b1 = −2 Re(P) = −1.2146
b2 = |OP|2 = 0.6556
6.2 Purely Recursive Second-Order Section 131

|H(f)|
5
Hm
4

0 0.1 f0 = 0.111 0.2 0.3 0.4 0.5 f

Figure 6.4 Frequency response of a purely recursive second-order section

1
H(Z) =
1 − 1.2146Z −1 + 0.6556Z −2
1
|H(𝜔)|2 =
2.905 − 4.02 cos 𝜔 + 1.31 cos(2𝜔)
𝜃 = 2𝜋 × 0.1156; r = 0.81; f0 = 0.111; Hm = 4.39; B3 = 0.06
The modulus of the response is shown in Figure 6.4 as a function of the frequency.
The phase response of the second-order section can be considered using equation (6.24) for the
function 𝜙(𝜔). In order to describe the variations in this function, it is useful first to calculate its
derivative using equation (6.24b). Thence,
d𝜙 r cos(𝜃 + 𝜔) − r 2 r cos(𝜃 − 𝜔) − r 2
= + (6.31)
d𝜔 1 − 2r cos(𝜃 + 𝜔) + r 2 1 − 2r cos(𝜃 − 𝜔) + r 2
This derivative is interesting as it is the group delay of the filter. By definition (1.29):
d𝜙
𝜏(𝜔) =
d𝜔
Thus, the group delay can be written as:
r[cos(𝜃 + 𝜔) − r] r[cos(𝜃 − 𝜔) − r]
𝜏(𝜔) = + (6.32)
1 − 2r cos(𝜃 + 𝜔) + r 2 1 − 2r cos(𝜃 − 𝜔) + r 2
The function 𝜏(𝜔) has a maximum in the vicinity of the resonance frequency. At the frequency
of f = 𝜃/2𝜋, this becomes:
[ ]
r (1 − r)[cos 2𝜃 − r] r
𝜏(𝜃) = 1+ ≈ (6.33)
1−r 1 − 2r cos 2𝜃 + r 2 1−r
In physical systems, this function is positive and 𝜙(𝜔) is an increasing function which has a value
of 0 at the origin and a multiple of 𝜋 at frequency 0.5.
Example: r = 0.81; 𝜃 = 2𝜋 × 0.1156
Figure 6.5 shows the curve 𝜏(f ) as a function of frequency. This curve has a maximum of 3.8 in
the vicinity of the resonance. The unit of time is the sampling period T. The values obtained should
be multiplied by T if this period is different from unity.
The function 𝜏(f ) can be seen to have negative values. In fact, it is the theoretical group delay of
the filter. The system, as it has been presented, however, cannot actually be realized. Each output
element y(n) is calculated by an addition which involves an input number x(n). This operation
cannot be instantaneous. To enable the system to be realized, y(n) has to be delayed, for example,
132 6 Infinite Impulse Response (IIR) Filter Sections

τ(f)
5

4 τm = 3.8

0
0.1 f0 0.2 0.3 0.4 0.5 f
–1

Figure 6.5 Theoretical group delay of a purely recursive section

φ(f)
π
3

0 0.1 f0 0.2 0.3 0.4 0.5 f

Figure 6.6 Phase characteristic of a purely recursive section

by one period. The group delay will then be increased correspondingly, and it is necessary to add
the value 𝜔 to the phase 𝜙(𝜔). The function 𝜙(𝜔) obtained under these conditions is represented
in Figure 6.6, and the curve has maximum slope in the vicinity of the resonance.
The equations which were given for the functions |H(𝜔)|, 𝜙(𝜔), and 𝜏(𝜔) are important
because they allow the corresponding functions to be determined for filters realized by cascading
second-order sections, either by multiplication for the modulus of the frequency response, or by
addition for the phase and the group delay.
To introduce the initial conditions and find the transient responses, the one-sided Z-transform
is used. From the equation of definition of the filter section, the following relation can be found
between the one-sided transforms Y (Z) and X(Z):

X(Z) b1 y(−1) + b2 [y(−2) + y(−1)Z −1 ]


Y (Z) = − (6.34a)
1 + b1 Z −1 + b2 Z −2 1 + b1 Z −1 + b2 Z −2
6.2 Purely Recursive Second-Order Section 133

For x(n) = ejn𝜔 , y(n) is given by the equation:


1
y(n) = Z n−1 Y (Z)dZ
j2𝜋 ∫Γ
with:
1
X(Z) =
1 − ej𝜔 Z −1
taking a circle with radius greater than unity as the integration contour Γ.
The study of the purely recursive section has been carried out in the frequency domain. In the
time domain, this section has an impulse response which is a set h(n) determined by examining
the response to the unit set, or by series expansion of the function H(Z). For complex poles, one
has:
1 P 1 P ∑∞
H(Z) = + = h(n)Z −n
1 − PZ −1 P − P 1 − PZ −1 P − P n=0
Hence,
sin(n + 1)𝜃
h(n) = r n (6.35)
sin 𝜃
Figure 6.7 shows the impulse response of the filter described in the above example. The response
to a unit step can be derived from the definition, after several manipulations:
1
g(n) = [1 + b2 h(n) − h(n + 1)]
1 + b1 + b2
This then gives:
[ ]
1 r i+1
g(n) = 1+ [r sin(i + 1)𝜃 − sin(i + 2)𝜃] (6.34b)
1 + r 2 − 2r cos 𝜃 sin 𝜃
The norm ||H||2 of the function H(𝜔) is used, and can be calculated by two methods, as was
shown in Section 4.3. By summation of the series:


1 ∑ 2n 1 − cos [2(n + 1)𝜃]

||H||22 = |h(n)|2 = r
n=0 sin2 𝜃 n=0 2
Using integration following the residues theorem:
1 ZdZ
||H||22 =
j2𝜋 ∫|Z|=1 (Z − P)(Z − P)(1 − PZ)(1 − PZ)

h(n)

0.5

–2 –1 0 1 2 3 4 5 6 7 8 9 10 11 12 n

Figure 6.7 Impulse response of a second-order section


134 6 Infinite Impulse Response (IIR) Filter Sections

Finally,
1 + r2 1
||H||22 = (6.36)
1 − r 2 1 + r 4 − 2r 2 cos 2𝜃
The value ||H||1 is also used:


||H||1 = |h(n)|
n=0

This value is bounded by the inequality:

1 ∑ n

1
||H||1 = r | sin[(n + 1)𝜃]| ≤ (0 < 𝜃 < 𝜋) (6.37)
sin 𝜃 n=0 (1 − r) sin 𝜃
Example: When the poles are located on the imaginary axis in the Z-plane, 𝜃 = 𝜋/2, and the
impulse response is given by:

h(2p) = r 2p (−1)p

then:


1
||H||22 = |h(2p)|2 =
p=0
1 − r4


1
||H||1 = h(2p) =
p=0
1 − r2

The results obtained for the purely recursive second-order section can be extended to the general
second-order section.

6.3 General Second-Order Section

The most general second-order filter introduces the input data x(n − 1) and x(n − 2) to the
calculation of an element y(n) of the output set at time n. Its equation of definition is written as:

y(n) = a0 x(n) + a1 x(n − 1) + a2 x(n − 2) − b1 y(n − 1) − b2 y(n − 2) (6.38)

and it leads to the Z-transfer function:


a0 + a1 Z −1 + a2 Z −2
HT (Z) =
1 + b1 Z −1 + b2 Z −2
which has two zeros which, since the numerator coefficients are real, are either real or complex
conjugates. The position of these zeros, written as Z 0 and Z 0 , is often special, and two cases are
encountered in practice. The first corresponds to a filtering function. The zeros are then almost
always on the unit circle, both to optimize the attenuation of the filter by introducing an infinite
attenuation frequency. and because, under these conditions, symmetry between the coefficients
appears and the calculations can be simplified. The second case corresponds to a pure phase shifter,
when the zeros are harmonic conjugates of the poles.
The filter case will be considered first. The transfer function is now written as:
1 + a1 Z −1 + Z −2 (Z − Z0 )(Z − Z 0 )
HT (Z) = a0 = a0 (6.39)
1 + b1 Z −1 + b2 Z −2 (Z − P)(Z − P)
6.3 General Second-Order Section 135

or:
1 − 2 Re(Z0 )Z −1 + Z −2
HT (Z) = a0
1 − 2 Re(P)Z −1 + |P|2 Z −2
The modulus of the frequency response of the general second-order section, when the zeros are
placed on the unit circle, is expressed by:
(a1 + 2a0 cos 𝜔)2
|HT (𝜔)|2 = (6.40)
1 + b21 + b22 + 2b1 (1 + b2 ) cos 𝜔 + 2b2 cos 2𝜔
Such a filter can be regarded as the cascade of a purely recursive IIR filter section and a
linear-phase FIR filter section. Consequently, the phase characteristics and the group delay of the
complete filter section are sums of the characteristics of the elementary components. That is,
r cos(𝜃 + 𝜔) − r 2 r cos(𝜃 − 𝜔) − r 2
𝜏T (𝜔) = 1 + + (6.41)
1 − 2r cos(𝜃 + 𝜔) + r 2 1 − 2r cos(𝜃 − 𝜔) + r 2
[ ] [ ]
r sin(𝜃 + 𝜔) r sin(𝜃 − 𝜔)
𝜙T (𝜔) = 𝜔 + arctan − arctan (6.42)
1 − r cos(𝜃 + 𝜔) 1 − r sin(𝜃 − 𝜔)
These two expressions give the phase and the group delay of a second-order section when both
zeros are on the unit circle. This filter section is usually called the second-order elliptic section,
from the technique used for calculating the coefficients.
Example: To illustrate the properties of a general second-order filter section, let us use the
example given earlier by completing the filter with two zeros:
Z0 = 0.3325 + j0.943 and Z 0 = 0.3325 − j0.943
The positions of the singularities in the complex plane are given by Figure 6.8(a). The transfer
function H T (Z) is the quotient of the two second-order polynomials N(Z) and D(Z):
N(D)
HT (Z) = a0
D(Z)
with:
N(Z) = (1 − 0.665Z −1 + Z −2 )
D(Z) = 1 − 1.2146Z −1 + 0.6556Z −2
Figure 6.8(b) shows the frequency response for the filter. The contributions of the numerator
and the denominator of the transfer are also indicated. The factor a0 corresponds to a scaling factor
which is calculated so that the response of the filter has a specified value at a given frequency. For
example:
HT (0) = 1 results in a0 = 0.33
The group delay and the phase are given by Figures 6.5 and 6.6, respectively. The norm ||H T ||2 of
the function H T (𝜔) is calculated as given earlier, and is:
2 + a21 + a21 b2 − 4a1 b1 + 2b21 − 2b22
||HT ||22 = a20 [ ] (6.43)
(1 − b2 ) (1 + b2 )2 − b21
An important particular case is the notch filter, which is used to remove a line in a spectrum
without disturbing the other components. The transfer function is:
1 + a1 Z −1 + Z −2
HN (Z) = (6.44)
1 + a1 (1 − 𝜀)Z −1 + (1 − 𝜀)2 Z −2
136 6 Infinite Impulse Response (IIR) Filter Sections

j Z0 = 0.3325 + j 0.943

P = 0.6073 + j 0.5355
x

0 1
x

(a)
HT(f)
5

3
N(f)
2

1
1/D(f)
0 0.1 0.2 0.3 0.4 0.5 f
(b)

Figure 6.8 (a) Poles and zeros of a general second-order section. (b) Frequency response of a general
second-order section

where 𝜀 is a small positive real value. As shown in Figure 6.9(a), 𝜀 is the distance of the poles to the
unit circle. For very small value of 𝜀, the 3-dB attenuation bandwidth can be approximated by:
𝜀
B3N ≈
𝜋
Outside the notch, the poles compensate for the zeros and the frequency response is almost flat
and close to unity. Moreover, such a filter achieves a very small amplification of the input white
noise, since equation (6.43) yields:
2 − 3𝜀
||HN ||22 ≈ ≈1+𝜀
2 − 5𝜀
If the frequency of the signal to be removed is not precisely known, then the zeros have to be
moved inside the unit circle in order to increase the notch width.
Another class of general second-order sections is that of phase-shifter circuits. The phase-shifter
circuit is characterized by the fact that the numerator and the denominator of the transfer function
have the same coefficients but in the reverse order:
b2 + b1 Z −1 + Z −2 N(Z)
HD (Z) = = (6.45)
1 + b1 Z −1 + b2 Z −2 D(Z)
The polynomials N(Z) and D(Z) are image polynomials. As a result, |H D (ej𝜔 )| = 1 – that is, the
circuit is a pure phase shifter.
The transfer function H D (Z) is written as a function of the poles and zeros as:
(P − Z −1 )(P − Z −1 )
HD (Z) =
(1 − PZ −1 )(1 − PZ −1 )
6.3 General Second-Order Section 137

|HE(f)|
1 j

ej2πf0
1–
0 f0 0.5f 1

(a)

+ yE(n)

x(n) Phase
shifter

– yL(n)
(b)

Figure 6.9 (a) Second-order notch filter. (b) Implementation of the notch filter and its complement

j
Z0
P

1

P

Z0

Figure 6.10 Poles and zeros of the second-order phase shifter

It is clear that the poles and zeros are harmonic conjugates and Figure 6.10 shows their position
in the Z-plane.
The calculation of the phase and group delay for this circuit can be very simply deduced from
equations (6.24) and (6.32) for the purely recursive element as:
N(Z) Z −2 (Z −1 )
HD (Z) = =
D(Z) D(Z)
However,

|D(𝜔)| = |D(−𝜔)|; 𝜙(𝜔) = −𝜙(−𝜔)

Hence,

𝜙D (𝜔) = 2𝜙(𝜔) − 2𝜔

After a few arithmetic manipulations, the group delay 𝜏 g (𝜔) of the phase shifter becomes:

1 − r2 1 − r2
𝜏g (𝜔) = + (6.46)
1 + 2r cos(𝜃 − 𝜔) + r 2 1 − 2r cos(𝜃 + 𝜔) + r 2
138 6 Infinite Impulse Response (IIR) Filter Sections

It is not difficult to show that, as 𝜔 varies from 0 to 𝜋, the phase 𝜙D (𝜔) varies by 2𝜋:
𝜋 𝜋
1 − r2
𝜙D (𝜋) = 𝜏(𝜔)d𝜔 = 2 d𝛼 = 2𝜋
∫0 ∫0 1 + r 2 − 2r cos α
An interesting application of this result is that the abovementioned notch filter can be imple-
mented with the help of a phase shifter. In fact, two complementary filters can be obtained with a
single all-pass section, as shown in Figure 6.9(b). The filter zero that sits on the unit circle corre-
sponds to the phase shift 𝜋. In fact, it is even a set of two complementary filters that are obtained
with a single second-order phase shifter [1, 2].

6.4 Structures for Implementation

The elements can be implemented by circuits which directly produce the operations represented
in the expression for the transfer functions. The term Z −1 corresponds to a delay of one elementary
period and is achieved by one memory element. The coefficients used in the circuits are those
of the transfer function, with the same sign for the numerator and the opposite sign for the
denominator.
The circuit which corresponds directly to the equation for the definition of the purely recursive
second-order section is given in Figure 6.11. The output numbers y(n) are delayed twice, and mul-
tiplied by the coefficients −b1 and −b2 before being added to the input numbers x(n). The circuit
includes two memory locations for data and two for the coefficients. For each output number, two
multiplications and two additions are required.
The general second-order filter section can be realized to conform with the equation of defini-
tion. However, two data memory locations are required for the input numbers and two for the
output numbers. The structure obtained is not canonical, as it contains more than the minimum
number of components. Indeed, only two data memories are necessary if the transfer function is
factorized as:
N(Z) 1
HT (Z) = = N(Z)
N(Z) D(Z)
That is, the calculations involved in the denominator are performed first, followed by those for the
numerator. This structure, called D–N, is shown in Figure 6.12. It corresponds to the introduction

x(n) + y(n)
Figure 6.11 Circuit of a purely recursive section

Z –1
–b1

+ y(n–1)

Z –1
–b2

y(n–2)
6.4 Structures for Implementation 139

a0
x(n) + + y(n)

Z–1

–b1 a1
u1(n)
+ +

Z –1

–b2 a2
u2(n)

Figure 6.12 Second-order section in the D–N structure

of two internal variables u1 (n) and u2 (n) forming a state vector U(n) with N = 2 dimensions. The
system is described by the following equations:
u1 (n + 1) = x(n) − b1 u1 (n) − b2 u2 (n)
u2 (n + 1) = u1 (n)
y(n) = a0 x(n) − a0 b1 u1 (n) − a0 b2 u2 (n) + a1 u1 (n) + a2 u2 (n)

or, in matrix form, conforming to equation (4.34):


u1 (n + 1) = x(n) − b1 u1 (n) − b2 u2 (n)
u2 (n + 1) = u1 (n)
y(n) = a0 x(n) − a0 b1 u1 (n) − a0 b2 u2 (n) + a1 u1 (n) + a2 u2 (n) (6.47)

a0
x(n) y(n)
v1(n)

Z–1

a1 –b1
+

v2(n)

Z–1

a2 –b2
+

Figure 6.13 Second-order section in the N–D structure


140 6 Infinite Impulse Response (IIR) Filter Sections

This representation thus results in a canonical realization, which has the minimum number of
internal variables and consequently the minimum number of memory spaces.
Using the results given in Section 4.6, we see that there is a dual structure which corresponds to
the internal variables 𝜐1 (n)and 𝜐2 (n) such that:
[ ] [ ] [ ] [ ]
𝜐1 (n + 1) −b1 1 𝜐1 (n) −a0 b1 + a1
= x(n)
𝜐2 (n + 1) −b2 0 𝜐2 (n) −a0 b2 + a2
y(n) = 𝜐1 (n) + a0 x(n) (6.48)

This alternative canonical structure is represented in Figure 6.13. It corresponds to performing


the operations on the numerator of the Z-transfer function first and is called the N–D structure.
In order to optimize the circuit and to reduce the size of the multiplier, it is important to
minimize the number of bits in each of the factors of the multiplication. The coefficients will be
considered first.

6.5 Coefficient Wordlength Limitation

The limitation of the coefficient wordlength means that the coefficients can have only a limited
number of values. It follows, therefore, that the poles have a limited number of possible positions
inside the unit circle. The same effect occurs for zeros on the unit circle when the filter is elliptic.
Thus, quantization of the absolute value of the coefficients to b bits limits the number of positions
that the poles can take in a quadrant of the unit circle to 22b , and the number of frequencies of
infinite attenuation to 2b . If the transfer function is calculated first, and the number of bits in the
coefficients is then limited, for example, by rounding, the transfer function is modified by perturba-
tions eN (Z) and eD (Z) in the numerator and the denominator [3]. The function H R (Z) is obtained:
N(Z) + eN (Z)
HR (Z) = (6.49)
D(Z) + eD (Z)
If the round-off errors in the coefficients are denoted by 𝛿ai and 𝛿bi (0 ≤ i ≤ 2), the perturbation
transfer functions are written as:

2

2
eN (f ) = 𝛿ai e−j2𝜋fi ; eD (f ) = 𝛿bi e−j2𝜋fi
i=0 i=1

Let us consider the case of the elliptic filter element whose coefficients are quantized by rounding
to bc bits, including the sign. Allowing for the inequalities:

|ai | ≤ 2; |bi | < 2

the quantization step q is written:

q = 2 × 21−bc = 22−bc

Thus,

|eD (f )| ≤ 2(q∕2)22−bc (6.50)

The modifications of the transfer function caused by quantization of the coefficients of the
denominator are a maximum for frequencies near the poles because the function D(f ) is then a
minimum.
6.6 Internal Data Wordlength Limitation 141

Neglecting the effect of rounding the numerator coefficients, we get:


HR (f ) = N(f )∕[D(f ) + eD (f )] ≈ N(f )[1 − {eD (f )∕D(f )}]∕D(f )
The relative error, e(f ) = (H R (f ) − H(f ))/H(f ), is bounded by:
|eD (f )| ≤ q(1∕|D(f )|) (6.51)
This expression allows us to determine the number of bits needed to represent the coefficients
of the denominator as a function of the tolerance of the frequency response and the values of the
coefficients. It is used in the next chapter.
In the numerator, quantization of the coefficient a1 of the elliptic filter section leads to a displace-
ment of the zeros which lie on the unit circle. The displacement df i of the infinite attenuation point
f i is such that:
1 2−bc
|dfi | ≤ (6.52)
2𝜋 | sin2𝜋fi|
Quantization of the coefficient a0 for the elliptic section results simply in a change in the gain of
the filter.

6.6 Internal Data Wordlength Limitation


In the D–N structure, which is the one most frequently used, the second multiplication factor is
the number held in the data memory. By necessity, this memory has a limited capacity; the feed-
back structure (Figure 6.12) implies that even if the input numbers x(n) have a limited number
of bits, and the memories are empty when the operation begins, the number of bits of the data to
be stored in the memory increases indefinitely. Limitation, usually by rounding, is required. On
the other hand, the filter sections can introduce large gains and major instabilities, so that logic
saturation devices must be introduced to limit the amplitude of the data being stored in the mem-
ory. The elliptic second-order section with a logic saturation unit and quantization device is shown
in Figure 6.14 for the D–N structure. For simplification, this figure assumes a single wordlength
limitation device sited immediately before the memory.
The quantization device involves a degradation in the signal, which passes through the filter. This
is the round-off noise. Following the circuit in Figure 6.14, it can be seen that quantization has the
effect of superimposing, on the input signal x(n), an error signal e(n), which also passes through the

a0
x(n) + + y(n)
Q

Z–1
e(n)
–b1 a1
+ +

Z–1

–b2

Figure 6.14 Elliptic section with wordlength-limitation device


142 6 Infinite Impulse Response (IIR) Filter Sections

filter. If the quantization step has the value q, this error signal can be regarded as having a spectrum
with uniform distribution and a power q2 /12. Under these conditions, the round off noise N c at the
output can be determined if f s = 1 by using equation (4.25) in Section 4.4, as:
1| |2
q2 | N(f ) | df
Nc =
12 ∫0 | D(f ) ||
|
or, as a function of the set h(n), the impulse response of the filter:
q2 ∑

Nc = |h(n)|2
12 n=0
By using the results in the earlier sections, for a purely recursive element with complex poles and
polar coordinates (r, 𝜃), this becomes:
q2 1 + r 2 1
Nc = (6.53)
12 1 − r 2 1 + r 4 − 2r 2 cos 2𝜃
and for the elliptic section:
2
( 2 2 2 2
)
q2 a0 2 + a1 + a1 b2 − 4a1 b1 + 2b1 − 2b2
Nc = [ ] (6.54)
12 (1 − b2 ) (1 + b2 )2 − b21
The quantization step q is related to the number of bits of the internal data memories. This
relation involves the amplitude of the frequency response of the purely recursive part. It is studied
in detail in the following chapter, for a cascade of second-order sections.
In this section, only the D–N structure has been considered. The calculations can be readily
adapted to the N–D structure [4]. The introduction of the quantization device also has consequences
in the absence of a signal.

6.7 Stability and Limit Cycles


Even if there is no signal at the input of an IIR filter, there can still be a signal at the output. This is
particularly likely if the coefficients are such that the filter is unstable.
The condition for stability of the filter is that the poles must lie inside the unit circle. This
condition defines a stability domain in the plane (b1 ,b2 ). From the results of Section 6.2, the
domain of the complex poles is limited by the parabola:
b21
b2 =
4
and the stability condition imposes:
0 ≤ b2 < 1
If the poles are real, then:
√ √
b 1 ( 2 ) b 1 ( 2 )
− 1 + b1 − 4b2 < 1; − 1 < − 1 − b1 − 4b2
2 2 2 2
Now, besides b2 < 1, the stability condition is:
b2 > −b1 − 1; b2 > −1 + b1 ; or |b1 | < 1 + b2 (6.55)
The domain of stability is, therefore, a triangle defined by the three straight lines:
b2 = 1; b2 = −b1 − 1; b2 = b1 − 1
as shown in Figure 6.15.
6.7 Stability and Limit Cycles 143

b2 b12
b2
4

1
Complex poles

–1 1 2
0 b1
Real poles

–1
1

b2
1
b

=
=

–b
2
b

1

1

Figure 6.15 Stability domain of the second-order section

Nevertheless, even if the stability condition is fulfilled, there may still be a signal at the output
in the absence of an input. This is usually a constant or periodic signal which corresponds to an
auto-oscillation of the filter, and which is often called a limit cycle. Such auto-oscillations can be
produced with large amplitudes if overflow occurs when the capacity of the memories is exceeded
in the absence of a logic saturation device. The equation for the system when there is no input
signal is:

y(n) + b1 y(n − 1) + b2 y(n − 2) = 0

The natural condition for the absence of oscillations is given by the inequality:

|b1 y(n − 1) + b2 y(n − 2)| < 1

and thus the condition which is necessary and sufficient for the absence of large-amplitude
auto-oscillations is:

|b1 | + |b2 | < 1 (6.56)

This inequality determines a square in the plane (b1 , b2 ), within the triangle of stability of the
filter element.
To eliminate all possibility of large-amplitude oscillations caused by overflow of the memory, one
can show that it is sufficient to employ a logic saturation device as shown in Section 6.6 [5].
Limit cycles are also produced through quantization before storage in the memory. However,
these have small amplitudes in well-designed systems. They arise through the fact that, in practice,
the input signal is never zero, because, even in the absence of data x(n), the error signal e(n) caused
by quantization of the internal data before storage in the memory is still applied to the filter. An
estimation of the amplitude Aa of the limit cycle is given by the expression:
q
Aa = max |H(ω)|
2
where H(𝜔) is the transfer function for the filter section.
144 6 Infinite Impulse Response (IIR) Filter Sections

Application to a purely recursive second-order section with complex poles produces, according
to equations (6.37) and (6.29):
q 1
|y(n)| ≤ (6.57)
2 (1 − r) sin 𝜃
q 1
Aa = (6.58)
2 (1 − r 2 ) sin 𝜃
These signals often have a spectrum formed of lines with frequencies close to those at which
H(𝜔) is maximum, which are either factors of, or in simple ratios to, the sampling frequency.
When designing filters, the number of bits in the internal data memories must be chosen to be
sufficiently large, and the quantization step q has to be chosen to be sufficiently small to prevent
the limit cycles from being troublesome. It should be noted also that they can be eliminated by
using a type of quantization other than rounding (for example, truncation of the absolute value)
[2]. However, this is only achieved with an increase in the power of the round-off noise in the
presence of a signal [6].
The results obtained in this chapter will be used in the next chapter, where second-order sections
in cascade will be discussed.

Exercises

6.1 Study the first-order section


y(n) = x(n) + by(n − 1)
in the following conditions:
x(n) = 0; n<0
π
x(n) = cos n𝜔; n ≥ 0; b = −0.8; 𝜔=
; y(−1) = 0
2
Give the expression of y(n) with transient and steady-state terms. From y(n), compute the
amplitude and phase responses and check with the results given in Section 6.1. Considering
the steady-state term in y(n), give the filter delay. Compare with the value obtained for the
group delay. Justify the difference between the two values.

6.2 Calculate the response of the system which is defined by the following equation:
y(n) = x(n) + x(n − 1) − 0.8y(n − 1)
to the unit set u0 (n) and the set x(n) such that:
x(n) = 0 for n < 0
1 for n ≥ 0
Give the steady-state frequency response and the transient response.

6.3 Assume a purely recursive second-order section which has the following coefficients:
b1 = −1.56; b2 = −0.8
References 145

State the position of the poles. Calculate the frequency response, the phase response, and
the group delay. How are the functions modified if two zeros are added at j and −j? For this
case, show the circuit diagram using the D–N form and count the number of multiplications
required for each output number.

6.4 Give the expression for the impulse response of a purely recursive second-order filter section
which has the following coefficients:

b1 = −1.60; b2 = −0.98

Calculate the resonance frequency and amplitude of the response at resonance. Give the
response H(𝜔) and calculate the norm ||H||2 .
The zeros are added on the unit circle to produce an infinite attenuation at frequency 3f s /8.
What are the coefficients of the filter? Calculate the new expression for H(𝜔) and the new
value of ||H||2 .
For this filter, find the amplitude of the limit cycles, using the D–N form and then the N–D
form. Give an example of a limit cycle.
Does this filter produce large-amplitude oscillations when there is no logic saturation device?
Give an example.

6.5 How many bits are needed to represent the coefficients of the filter with the following
Z-transfer function:
1 − 0.952Z −1 + Z −2
H(Z) =
1 − 1.406Z −1 + 0.9172−2
in order that the frequency response is not modified by more than 1% in the vicinity of the
poles? Calculate the displacement of the point of infinite attenuation.

6.6 Assume the realization of a second-order phase shifter having poles P1,2 such that:

P1,2 = 0.71 ± j0.54

Calculate the coefficients and give the expression for the function 𝜏 g (𝜔). Show that an imple-
mentation scheme exists which produces a reduced number of multiplications. When there
is no logic saturation device, can this element exhibit large-amplitude oscillations? Can it
produce low-amplitude limit cycles?

References

1 S. K. Mitra and J. F. Kaiser, Handbook for Digital Signal Processing, John Wiley, New York, 1993.
2 P. A. Regalia, S. K. Mitra and P. P. Vaidyanathan, All-pass filter: a versatile signal processing
building block. Proceedings of the IEEE, 76(1), 19–37, 1988.
3 J. B. Knowles and E. M. Olcayto, Coefficient accuracy and digital filter response. IEEE Transac-
tions on Circuit Theory, 1968.
4 L. B. Jackson, On the interaction of round-off noise and dynamic range in digital filters. Bell Sys-
tems Technical Journal, 1970.
146 6 Infinite Impulse Response (IIR) Filter Sections

5 P. Ebert, J. Mazo and M. Taylor, Overflow oscillations in digital filters. Bell Systems Technical
Journal, 1969.
6 T. Claasen, W. Mecklenbrauker and J. Peek, Effects of quantizations and overflow in recursive
digital filters. IEEE Transactions, ASSP24(6), 1976.
147

Infinite Impulse Response Filters

Digital filters with an infinite impulse response (IIR), or recursive filters, have properties simi-
lar to those of analog filters, and consequently, their coefficients can be determined by similar
techniques [1–3].
Before discussing the method for calculating the coefficients, it is useful to give some general
expressions for the properties of these filters.

7.1 General Expressions for the Properties of IIR Filters


The general IIR filter is a system which, from the set of data x(n), produces the set y(n) such that:

L

K
y(n) = a1 x(n − 1) − bk y(n − k) (7.1)
l=0 k=1

The Z-transfer function for this system is written:


∑L −1
l=0 a1 Z
H(Z) = ∑K (7.2)
1 + k=1 bk Z −k
This is the quotient of two polynomials in Z, which are often of the same degree.
As the coefficients al and bk are real numbers, H(Z) is a complex number such that:

H(Z) = H(Z)
and the frequency response of the filter can be written with the same conventions as in the earlier
chapters:
H(𝜔) = |H(𝜔) |e −j𝜙(𝜔)
The modulus and the phase are expressed in terms of H(Z) by the following equations:
|H(𝜔)| 2 = [H(Z)H(Z −1 )]Z =ejω (7.3)
By squaring H(𝜔) and using equation (7.3),
[ ]
1 H(Z)
𝜙(𝜔) = − j log (7.4)
2 H(Z −1 ) Z=ej𝜔
and, by taking the derivative of 𝜙(Z) with respect to the complex variable Z, we obtain:
[ ]
d𝜙 1 H ′ (Z) 1 H ′ (Z −1 )
=− j + 2
dZ 2 H(Z) Z H(Z −1 )

Digital Signal Processing: Theory and Practice, Tenth Edition. Maurice Bellanger.
© 2024 John Wiley & Sons Ltd. Published 2024 by John Wiley & Sons
148 7 Infinite Impulse Response Filters

For Z = ej𝜔 this becomes:


[ ]
d𝜙 1 d
= − Re Z log(H(Z))
dZ jZ dZ
Thus, the equation for the group delay becomes:
[ ]
d𝜙 d𝜙 d
𝜏(𝜔) = = jZ = −Re Z log(H(Z)) (7.5)
d𝜔 dZ dZ Z=ej𝜔

Using equations (7.3)–(7.5), it is possible to analyze IIR filters of any order.


Example: Assume,
1 1
H(Z) = =
D(Z) 1 + b1 Z −1 + b2 Z −2
Since
[ ′ ] [ ]
D (Z) b1 Z −1 + 2b2 Z −2
𝜏(𝜔) = Re Z = −Re
D(Z) Z=ej𝜔 1 + b1 Z −1 + b2 Z −2 Z=ej𝜔
one has
1 − b22 + b1 (1 − b2 ) cos 𝜔
𝜏(𝜔) = 1 − (7.6)
1 + b21 + b22 + 2b1 (1 + b2 ) cos 𝜔 + 2b2 cos(2𝜔)
which is equivalent to the expression given in the previous chapter when the poles are complex
with b1 = −2r cos 𝜗 and b2 = r 2 .
Other expressions for IIR filters can be obtained in terms of the poles and zeros of H(Z). If the
numerator and the denominator have the same degree N, and if N is even, then:

N∕2
1 + ai1 Z −1 + ai2 Z −2
H(Z) = a0
i=1 1 + bi1 Z −1 + bi2 Z −2
The square of the modulus of the transfer function is equal to the product of the squares of the
moduli of the elementary functions. The phase and the group delay are the sums of the contribu-
tions of the sections:

N∕2
|H(𝜔)| 2 = a0 |H (𝜔)| 2
| i |
i=1


N∕2
𝜏(𝜔) = 𝜏i (𝜔)
i=1

The general equations for IIR filters given above are used to calculate the coefficients.

7.2 Direct Calculations of the Coefficients Using Model Functions

A direct method for calculating the coefficients of an IIR filter is to use a model function, which
is a real function defined on the frequency axis. The model functions which will be considered are
those of Butterworth, Bessel, Chebyshev, and elliptic functions, all of which have known selectivity
properties. They are also used to calculate analog filters and form a model for the square of the
transfer function to be derived. However, one drawback to their use for calculating digital filters
is that they are not periodic when the desired function has period f s . It is therefore necessary to
7.2 Direct Calculations of the Coefficients Using Model Functions 149

establish a map linking the real axis and the range [0, f s ]. Such mapping is supplied by a conformal
transformation in the complex plane with the following properties:
(1) It transforms the imaginary axis onto the unit circle.
(2) It transforms a rational fraction of the complex variable s into a rational fraction of the complex
variable Z.
(3) It conserves stability.

7.2.1 Impulse Invariance


Consider the analog filter defined by the equation:
y′a (t) = bya (t) + x(t) (7.7)
It has a transfer function:
1
H(s) = (7.8)
s−b
and an impulse response:
h(t) = ebt (7.9)
Sampling this response with period T provides the sequence:
h(nT) = ebTn (7.10)
which has a Z-transform:
1
H(Z) = (7.11)
1 − ebT Z −1
The pole b of the analog filter has become ebT for the digital filter. The method can be generalized
to any number of poles.
This simple method is used in the simulation of analog systems by digital computers. It has a
serious disadvantage for the design of filters due to aliasing of the frequency response. During the
sampling operation, the frequency response of the analog filter is replicated in the useful band of
the digital filter and the amplitude specification cannot be conserved.
Another approach consists of establishing a direct correspondence between the operation of the
analog filter and the digital filter.
Reconsidering the differential equation of the analog filter, at time nT, one can write:
T
ya (nT) = ya (nT − T) + y′a (nT − T + 𝜏)d𝜏 (7.12)
∫0
Evaluating the integral using the trapezium rule leads to:
T[ ′ ]
ya (nT) − ya (nT − T) = y (nT) + y′a (nT − T) (7.13)
2 a
Hence:
T
ya (nT) − ya (nT − T) = [bya (nT) + x(nT) + bya (nT − T) + x(nT − T)]
2
Taking the Z-transform of both sides leads to the following Z-transfer function:
1
H(Z) = ( 2 ) (7.14)
T
(1−Z −1 )
(1+Z −1 )−b

which yields a relation between s and Z, called the bilinear transform.


150 7 Infinite Impulse Response Filters

7.2.2 Bilinear Transform


Assume a transformation which transforms a point Z on the complex plane into the point s where:
2 1 − Z −1
s= (7.15)
T 1 + Z −1
To each point on the unit circle Z = ej𝜔T , there is a corresponding point s such that:
2 1 − e−j𝜔T 2 𝜔T
s= = j tan (7.16)
T 1 + e−j𝜔T T 2
Consequently, the imaginary axis corresponds to the unit circle. The equation which gives Z as
a function of s is:
(2∕T) + s
Z= (7.17)
(2∕T) − s
A rational fraction involving s is transformed into a rational fraction involving Z in which the
numerator and the denominator have the same degree.
If the real part of s is negative, the modulus of Z is less than 1 – that is, the part of the complex
plane of the variable s to the left of the imaginary axis is transformed inside the unit circle. This
property ensures that the stability characteristics of the system are conserved.
In the definition of the transformation, the factor 2/T is a scale factor. T = 1/f s is the sampling
period of the digital system. This factor controls the warping of the frequency axis which is intro-
duced when the bilinear transform is used to obtain the Z-transfer function of a digital system using
a complex function of s. Indeed, the digital system obtained has a frequency response which is a
function of the variable f N , which, in turn, is related by equation (7.16) to the values jf A of the
initial function on the imaginary axis. Thus,

𝜋fA T = tan(𝜋fN T) (7.18)

Figure 7.1 illustrates this equation. Note that the warping is negligible for very low frequencies,
which justifies the choice of the scale factor 2/T.
Although other transforms can also be applied to the calculation of digital filter coefficients,
the bilinear transform is the most commonly used. It allows for the calculation of digital filters

fA Figure 7.1 Frequency warping introduced by the bilinear


transform.

1
2T

1 1

2 T
0 1 1 fN
2 T

1

2T
7.2 Direct Calculations of the Coefficients Using Model Functions 151

from the transfer functions for analog filters, or by using computer programs developed for analog
filters.
It should be remembered, however, that the frequency response is warped, as shown above, since
the analog and digital frequencies 𝜔A and 𝜔D are related by the equation:
[ ]
2 𝜔 T
𝜔A = tan N (7.19)
T 2
The group delay is also modified:
[ ( )]
𝜔A T 2
𝜏N = 𝜏A 1 + (7.20)
2
That is, an analog filter with a constant group delay is transformed into a digital filter which does
not have this property.
In order to design a digital filter from a mask using this method, the mask must first be modified
to take account of the subsequent frequency warping. The analog filter satisfying the new mask can
then be calculated, and finally, the bilinear transform can be applied.

7.2.2.1 Butterworth Filters


Two examples will be used to illustrate the calculation of the coefficients by a model function. The
first is Butterworth filter functions, because of their simplicity, and the second is elliptic functions
because they are the ones most frequently used.
A Butterworth function of order n is defined by:
|
|
| 1
|F(𝜔) |2 = ( )2n (7.21)
|
|
| 1 + 𝜔𝜔
| c

The parameter 𝜔c gives the value of the variable for which the function has the value 12 . Figure 7.2
represents this function for various values of n.
By analytic extension, taking 𝜔c = 1, this can be written as:
| 1
|F(𝜔) |2 = |H(j𝜔)|2 = |H(s)H(−s)|
| s=j𝜔 =
| 1 + 𝜔2n
1 1
H(s)H(−s) = ( )2n = 1 + (−s2 )n
s
1+ j

The poles of this function lie on the unit circle. For example, when n is odd, one can write:
1
H(s)H(−s) = ∏2n
j𝜋(k∕n) )
k=1 (s − e

Figure 7.2 Butterworth functions. |F(ω)|2

0.5

n=1
4 2
0 ωc 2 ωc ω
152 7 Infinite Impulse Response Filters

By setting to H(s) the poles which are to the left of the imaginary axis, to obtain a stable filter,
and after proper factorization to obtain first- and second-order sections with real coefficients, one
has:

1 ∏
(n−1)∕2
1
H(s) = ( ( ))
1 + s k=1 s2 + 2 cos 𝜋 k s+1
n

Similarly, for even values of n, one obtains:


n

2
1
H(s) = [ ]
𝜋(2k−1)
k=1 s2 + 2 cos s+1
2n

The corresponding digital filter is produced by change of variable in accordance with


equation (7.15):
2 1 − Z −1
s=
tan(𝜔c T∕2) 1 + Z −1
The point on the frequency axis where the response of the digital filter has a value of 2−1/2 is f c
with:

𝜔c = 2𝜋fc

By setting u = 1/tan(𝜋f 0 T) and 𝛼 k = 2cos(𝜋k/n), the Z-transfer function for the digital filter of
odd-order n is obtained:
(n−1)

1 + Z −1 ∏2
(1 + Z −1 )2
k
H(Z) = a 0
(1 + u) + (1 − u)Z −1 k=1 1 + bk1 Z −1 + bk Z −2
2

with:
1
ak0 = ; bk1 = 2ak0 (1 − u2 ); bk2 = ak0 (1 − u𝛼k + u2 )
1 + u𝛼k + u2
For even values of n, with 𝛼 k = 2 cos(𝜋(2k − 1)/2n), this becomes:

n∕2
(1 + Z −1 )2
H(Z) = ak0 (7.22)
k=1 1 + bk1 Z −1 + bk2 Z −2
It would thus appear that the zeros of the Z-transfer function are all found at the point Z = −1,
which can simplify the realization of the filter. On the other hand, the function is completely deter-
mined by the data for the parameters n and u.
The order n is calculated from the specification of the filter. Assume that a filter is to be produced
with a frequency response greater than or equal to 1 − 𝛿 1 in the band [0, f 1 ] and less than or equal to
𝛿 2 in the band [f 2 , f s /2]. In terms of the model function F(𝜔), these constraints imply the following
inequalities:
1 1
( )2n ≥ (1 − 𝛿1 ) and ( )2n ≤ 𝛿2
2 2
𝜔1 𝜔2
1+ 𝜔c
1+ 𝜔c

For small 𝛿 1 and 𝛿 2 , the following equation for n results:


1
2
log(2𝛿1 ) + log(𝛿2 )
n⩾
log(𝜔1 ) − log(𝜔2 )
7.2 Direct Calculations of the Coefficients Using Model Functions 153

H(f)

0.5

f2
0 0.1 f1 0.2 0.3 0.4 0.5 f

Figure 7.3 Frequency response of a Butterworth filter of order 8.

The order N of the digital filter is obtained using equation (7.18):


( √ )
log 𝛿1 (2𝛿1 )
N≥
2
(7.23)
log tan(𝜋f2 T) − log tan(𝜋f1 T)
Once n is chosen, the parameter u must lie in the interval:
( )1∕n
1 1 (2𝛿1 )(1∕2)n
≤u≤
tan(𝜋f2 T) 𝛿2 tan(𝜋f1 T)
These parameters allow H(Z) to be calculated.
Example: Assume the following characteristic, which was studied earlier for FIR filters:

𝛿1 = 0.045; 𝛿2 = 0.015; fs = 1; f1 = 0.1725; f2 = 0.2875

It is found that N ≈ 7.3, so the value adopted is N = 8:


(1 + Z −1 )2 (1 + Z −1 )2
H(Z) = 0.00185 −1 −2
×
1 − 0.36Z + 0.04Z 1 − 0.39Z −1 + 0.12Z −2
(1 + Z )−1 2 (1 + Z −1 )2
× −1 −2
×
1 − 0.45Z + 0.31Z 1 − 0.58Z −1 + 0.69Z −2
The frequency response obtained is shown by Figure 7.3. When the filter transition band,
Δf = f 2 − f 1 , is sufficiently small, equation (7.23) can be simplified:
( )
1 2.3 fs f
N ≈ log √ sin 2𝜋 1 (7.24)
𝛿2 (2𝛿1 ) 2𝜋 Δf fs

Thus, the filter order is proportional to the inverse of the transition band, as is the case for FIR
filters. It follows that the selectivity of this kind of filter is rather limited in practice.
In conclusion, Butterworth filters are straightforward to calculate. Significant simplifications can
be made in their realization because of the arrangement of the roots and, in certain cases, of the
poles, but they are much less selective than elliptic filters.

7.2.2.2 Elliptic Filters


The elliptic filter displays ripples in the pass band and in the stop band. It is optimal in that for
a given order n and for fixed ripple amplitudes, it has the smallest transition band. The model
154 7 Infinite Impulse Response Filters

|T(jω)|2

1
1
1 + ε2

1
A2
0 ω1 ω2 ω

Figure 7.4 Elliptic filtering function.

function involves elliptic functions and is written as:


1
T 2 (u) = (7.25)
1 + 𝜀2 sn2 (u, k1 )
where y = sn(u, k) is defined implicitly by the incomplete elliptic function of the first type:
arcsin y
d𝜃
u= (7.26)
∫0 (1 − k2 sin2 𝜃) 2
1

Figure 7.4 represents the function T 2 (u) for u = j𝜔, and shows the parameters corresponding to
k1 such that:
𝜀
k1 = √
(A2 − 1)

The function sn2 (𝜔, k) oscillates between 0 and 1 for 𝜔 < 𝜔1 and between (A2 − 1)/𝜀 and infinity
for 𝜔 ≥ 𝜔2 .
One can show that the order n of the filter is determined from the parameters k1 and k, the
selectivity factor:
𝜔
k= 1
𝜔2
by the expression:
(√ ( ))
K(k)K 1 − k12
n= √ (7.27a)
K(k1 )K( (1 − k2 ))
where K(k) is the complete elliptic integral of the first type:
𝜋
2 d𝜃
K(k) = (7.27b)
∫0 (1 − k2 sin2 𝜃) 2
1

This integral is calculated by the Chebyshev polynomial approximation method which results in
an error of the order of 10−8 with a polynomial of degree 4. The inverse function to the incomplete
integral of the first type is calculated as the quotient of two rapidly converging series.
A simplified equation for the order n of the filter can be obtained from the general specification
given in Figure 5.7. With the assumption of a ripple in the pass band of between 1 and 1 − 2𝛿 1 , and
with the following parameters (Figure 7.5):
1 1
𝛿2 = ; 2𝛿1 = 1 − √ ;f =1
A (1+𝜀2 ) s
7.2 Direct Calculations of the Coefficients Using Model Functions 155

Figure 7.5 Poles and zeros of an elliptic filter of order 4.


Z2 j

Z1 P2
P1

0 1

one has:
( ) [ ]
() 8𝜔1
2 2
n≈ In √ In (7.28)
𝜋 2
𝛿2 𝛿1 (𝜔2 − 𝜔1 )

The order N of the digital filter satisfying the mask in Figure 5.7 is then given by:
( )
( ) ⎡ 𝜋f ⎤
( ) ⎢ 8 tan f 1 ⎥
2 2
√ × In ⎢ ( ( ) ( )) ⎥
s
N≈ In
𝜋 2
𝛿2 𝛿1 ⎢ tan 𝜋f2 − tan 𝜋f1 ⎥
⎣ fs fs ⎦
The transition band Δf = f 2 − f 1 is generally narrow, and thus (using logarithms to base 10),
( ) [( ) ( ) ( )]
2 fs 4 2𝜋f1
N ≈ 1.076 log √ log sin (7.29)
𝛿2 𝛿1 Δf 𝜋 fs

This relation should be compared with equation (5.32) for finite impulse response filters. It shows
that, for elliptic IIR filters, the order is proportional to the logarithm of the inverse of the normal-
ized transition band. This leads to much lower values than for FIR filters. Further, equation (7.29)
shows that the width of the band is also involved. The maximum value of N is found for f 1 close to
f s /4 – that is, for a pass band approximating half of the useful band. Further simplification can be
obtained for filters with a narrow pass band. In this case, the order N ′ of the filter is given by:
[ ] ( )
2 8f1

N ≈ 1.076 log √ log (7.30)
(𝛿2 𝛿1 ) Δf

and, as in analog filters, it is the steepness of the cutoff which is important here.
Once the filter order has been determined, the calculation procedure involves determining the
poles and zeros of T 2 (u), which show double periodicity in the complex plane. By changing the
variable and then applying the bilinear transform, the configuration of the poles and zeros of
the digital filter in the Z-plane is obtained [4].
With this technique, the filter is specified by:
(1) The peak-to-peak amplitude of the pass band ripples, expressed in dB:
BP = −20 log(1 − 2𝛿1 )
(2) The amplitude of the ripples in the attenuation band, expressed in dB:
( )
1
AT = 20 log
𝛿2
(3) The frequency of the end of the pass band, FB.
156 7 Infinite Impulse Response Filters

1
|H(f)|
dB
50

40

30

20

0.17
FA
0 0.1 FB 0.2 0.3 0.4 0.5 f

Figure 7.6 Frequency response of an elliptic filter of order 4.

(4) The frequency of the start of the attenuation band, FA.


(5) The sampling frequency, FS.

Example: Consider the specification given in the previous section: BP = 0.4, AT = 36.5, FS = 1,
FB = 0.1725, FA = 0.2875.
It is found that N = 3.37, and the adopted values in N = 4. The zeros and poles have co-ordinates
(Figure 7.5):

Z1 = −0.816 + j0.578 Z2 = −0.2987 + j0.954


P1 = 0.407 + j0.313 P2 = 0.335 + j0.776

To demonstrate the point of infinite attenuation, the curve 1/|H(f )| giving the attenuation of the
filter as a function of frequency is shown in Figure 7.6.
Figure 7.7 shows the group delay of the filter obtained. The curves can be compared with the
results obtained for the same characteristic using the Butterworth filter. They demonstrate the
advantage of the elliptic filter, which requires an order lower by a factor of 2 and produces a corre-
sponding reduction in the complexity of the circuits.
The methods described allow for calculation of low-pass filters, from which suitable frequency
transformations make it possible to obtain high-pass and band-pass filters.

τ(f)

10

0 0.1 0.2 0.3 0.4 0.5 f

Figure 7.7 Group delay of an elliptic filter of order 4.


7.2 Direct Calculations of the Coefficients Using Model Functions 157

7.2.2.3 Calculating any Filter by Transformation of a Low-pass Filter


The calculation procedures presented in the previous sections result in a function H(s) which pro-
vides, by bilinear transforms, the Z-transfer function for the digital filter. For s = j𝜔, the function
H(𝜔) is a low-pass filter function in the frequency domain which extends from zero to infinity. It
is possible to apply transformations to it which result in other types of filters [5]. For example, in
order to obtain a low-pass filter whose pass band extends from 0 to 𝜔′1 , using a function whose pass
band covers the domain [0, 𝜔1 ], the following transformation can be made:
𝜔
s → s ′1
𝜔1
By starting from a low-pass filter with a pass band limited to 1, the following filters can be
obtained, by denoting the lower and upper limits of the pass band by 𝜔B and 𝜔H:
s
(1) Another low-pass: s → 𝜔H
𝜔B
(2) High-pass: s → s
s2 +𝜔H 𝜔B
(3) Band-pass: s → s(𝜔H −𝜔B )
s(𝜔H −𝜔B )
(4) Band stop: s → s2 +𝜔H 𝜔B

These transforms conserve the ripple in the response of the filter but result in frequency warping.
A more direct method consists of using transforms other than the bilinear transform to reach the
function H(Z). For example, the transform:
−1 −2
1 1 − 2 cos(𝜔0 T)Z + Z
s= −2
(7.31)
T 1−Z
allows a bandpass digital filter to be obtained from a low-pass filter function.
For Z = ej𝜔T this becomes:
1 cos(𝜔0 T) − cos(𝜔T)
s=j (7.32)
T sin(𝜔T)
If the pass band of the digital filter extends from 𝜔B to 𝜔H, 𝜔0 must be chosen so that the abscissae
of the transformed points are equal in absolute value but are of opposite sign:
cos(𝜔o T) − cos(𝜔B T) cos(𝜔o T) − cos(𝜔H T)
=−
sin(𝜔B T) sin(𝜔H T)
and thus:
[ ]
cos (𝜔B + 𝜔H ) T2
cos(𝜔o T) = [ ]
cos (𝜔B − 𝜔H ) T2
This approach avoids adding a stage to the calculation procedure for a band-pass filter.
It is also possible to use transforms in the Z-plane which conserve the unit circle. The simplest
is to transform from Z to − Z, which changes a low-pass filter into a high-pass one.
The transform:
(Z −1 − 𝛼)
Z −1 → (7.33)
(1 − 𝛼Z −1 )
where 𝛼 is a real number, changes a low-pass filter into another low-pass one. In fact, it can be
shown that the most general transform is expressed by [5]:
∏K
Z −1 − 𝛼k
Z −1 → ± (7.34)
k=1
1 − 𝛼k Z −1
with |𝛼k| < 1 to ensure stability.
158 7 Infinite Impulse Response Filters

For example, a low-pass filter is transformed into a band-pass filter by:


Z −2 − 𝛼1 Z 1 + 𝛼2
Z −1 → (7.35)
𝛼2 Z −2 − 𝛼1 Z −1 + 1
It would thus appear that every type of filter can be calculated in a direct way using model
functions. However, there are important limitations. Firstly, for example, for elliptic filters, the
ripples must be constant in the pass and stop bands. Secondly, the methods described do not allow
for possible constraints on the impulse response. To overcome these limitations, optimization
techniques have to be employed.

7.2.3 Iterative Techniques for Calculating IIR Filter with Frequency


As with FIR filters, optimization methods allow for the calculation of IIR filters with any specifica-
tion. Nevertheless, the calculation is somewhat more sensitive than for FIR filters as precautions
have to be taken to avoid producing an unstable system.
Two methods will be presented. These correspond to two different optimization criteria, the first
of which is minimization of the mean square error [6].

7.2.3.1 Minimizing the Mean Square Error


The transfer function of a filter is given in a factorized form by the equation which was introduced
earlier:
N

2
1 + ai1 Z −1 + ai2 Z −2
H(Z) = a0 ; a0 > 0 (7.36)
i=1 1 + bi1 Z −1 + bi2 Z −2
by regarding the numerator and denominator as having the same degree N (even).
Assume D(f ) is the function which approximates the frequency response of the filter H(f ). The
difference between these functions represents an error which can be minimized by least squares
for a number of points, N 0 , on the frequency axis. Thus,
N0 −1 ( )
∑ | ( | 2
E= |H(fn )| − |D fn |
| |
n=0

The value E is a function of the set of 2N + 1 parameters, which are the coefficients of the filter:
( ) N
E = E a0 , ai1 , ai2 , bi1 , bi2 with 1 ⩽ i ⩽
2
The minimum corresponds to the set of 2N + 1 parameters xk such that:
𝜕E
= 0; 1 ⩽ k ⩽ 2N + 1
𝜕xk
For the parameter a0 , one can set H(Z) = a0 H 1 (Z), whence:

𝜕E ∑
NO −1
=0=2 (a0 |H1 (fn )| − |D(fn )|)|H1 (fn )|
𝜕a0 n=0

Thus, the value of a0 is:


∑N0 −1
|D(fn )||H1 (fn )|
a0 = n=0 ∑N0 −1 (7.37)
|2
n=0 H1 (fn )|
The optimization is restricted to 2N variables.
7.2 Direct Calculations of the Coefficients Using Model Functions 159

The procedure consists of taking an initial function H10 (Z), which is found, for example, by the
direct calculation method given in the preceding section for elliptic filters, and then assuming that it
is sufficiently close to the optimum for the function E to be represented by a quadratic function with
2N parameters xk . The desired optimum is then obtained through an increment in the parameters
represented by the vector ΔX with 2N element such that:

E(X + ΔX) ≈ E(X) + Σ_{k=1}^{2N} (∂E/∂x_k)Δx_k + (1/2) Σ_{k=1}^{2N} Σ_{l=1}^{2N} (∂²E/∂x_k∂x_l) Δx_k Δx_l

By using A to denote the matrix with 2N rows and N₀ columns which has elements:

a_{ij} = 2 ∂/∂x_i [a₀|H₁(f_j)|]
and by using Δ to denote the column vector with N₀ terms e_n such that:

e_n = a₀|H₁(f_n)| − |D(f_n)|

the condition of least squares is obtained by requiring E(X + ΔX) to be an extremum. As in Section
5.4, when calculating the coefficients of FIR filters, we have:

ΔX = −[AAᵗ]⁻¹AΔ

The calculation is then repeated with the new values for the parameters, which should ultimately
lead to the required optimum. The chances of achieving this and the rate of convergence depend
on the increments given to the parameters, and one of the best strategies is offered by the Fletcher
and Powell algorithm [7].
To ensure stability in the resulting system, either the stability can be controlled at each stage, or
the final system can be modified by replacing the poles Pi outside the unit circle by 1/Pi , which
does not modify the modulus of the frequency response except for a constant factor. In the latter
case, it is generally necessary to return to the optimization procedure to achieve the optimum.
Mean square error minimization can be applied to other functions as well as the frequency
response – for example, the group delay [8].
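As a rough illustration of this criterion (a sketch only: a single second-order section, a brick-wall target, and a general-purpose optimizer standing in for the Fletcher and Powell algorithm), the error E can be minimized numerically as follows.

```python
# Sketch: minimize E = sum(|H(fn)| - |D(fn)|)^2 over the coefficients of
# one second-order section; assumes NumPy and SciPy are available.
import numpy as np
from scipy.optimize import minimize

fn = np.linspace(0, 0.5, 64)           # N0 points on the frequency axis
D = np.where(fn < 0.15, 1.0, 0.0)      # desired magnitude (ideal low-pass)
zinv = np.exp(-2j * np.pi * fn)

def E(x):
    a0, a1, a2, b1, b2 = x
    H = a0 * (1 + a1*zinv + a2*zinv**2) / (1 + b1*zinv + b2*zinv**2)
    return np.sum((np.abs(H) - D) ** 2)

x0 = np.array([0.1, 0.0, 0.0, -0.5, 0.3])   # crude initial function
res = minimize(E, x0, method="BFGS")
# Stability is not enforced here; in practice the poles are checked at
# each step or reflected inside the unit circle as described above.
print("final error:", res.fun)
print("a0, a11, a12, b11, b12:", np.round(res.x, 4))
```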

7.2.3.2 Chebyshev Approximation


This criterion corresponds to a limitation in the amplitude of the ripple in the frequency response of
the filter in some frequency bands. One elegant approach consists of applying the Remez algorithm,
already used to calculate the coefficients of linear-phase FIR filters. The technique consists of using
an initial filtering function H 0 (Z) which approximates the required H(Z). This function can, for
example, be of the elliptic type calculated by the method shown in Section 7.2.4 using adequate
specifications. It is written as:
H₀(Z) = N₀(Z)/D₀(Z)
As the zeros of the filter functions are generally on the unit circle, the numerator N(Z) can be
regarded as the transfer function of a linear-phase FIR filter.
The first stage of the iterative technique is to calculate a new value for the numerator, N 1 (Z),
using the algorithm for FIR filters. In this calculation, D0 (f ) is the function to be approximated in
the pass band, and 1/|D0 (f )| is used as a weighting factor.
One then looks for a new value for the denominator D1 (Z). A function which approximates
|N 1 (f )| in the pass band can be sought directly by again using the algorithm for FIR filters. It is
Figure 7.8 The N(f) and G(f) functions for a low-pass filter.

more satisfactory, however, to use an adaptation of the calculation techniques employed for analog
filters.
By assuming that the required function H(f ), written as:
H(f) = N(f)/D(f)
is such that:

|H(f )| ≤ 1

one can write:

|G(f )|2 = |D(f )|2 − |N(f )|2

and hence:

|H(f)|² = 1 / (1 + |G(f)/N(f)|²)
Figure 7.8 shows the functions |G(f )| and |N(f )| for a low-pass filter. The zeros of the function
G(Z) lie on the unit circle, and they can be calculated using the algorithm for linear-phase FIR
filters. The weighting function is determined from 1/|N(f )|.
By optimizing the stop and pass bands alternately, the required filter function is obtained after
several iterations and the filter coefficients are then obtained. The stability of the filter requires that
only those poles of H(z) which are inside the unit circle be conserved.

7.2.4 Filters Based on Spheroidal Sequences


The filter design criterion can be the maximization of the energy concentration in a given frequency
band, instead of a set of specifications on the frequency response.
Let 𝜆 be a scalar intended to represent the energy concentration and defined by:
λ = [∫_{−f_c}^{f_c} |H(f)|² df] / [∫_{−1/2}^{1/2} |H(f)|² df]    (7.38)

where [−f_c, f_c] is the band in which the energy has to be concentrated.


7.2 Direct Calculations of the Coefficients Using Model Functions 161

For the response:


H(f) = Σ_{n=−P}^{P} a_n e^{−j2πfn}

a direct calculation yields:


λ = [Σ_{n=−P}^{P} Σ_{m=−P}^{P} a_n a_m · sin((n−m)2πf_c)/((n−m)π)] / [Σ_{n=−P}^{P} a_n²]    (7.39)

and in matrix form:

AᵗRA = λAᵗA

which is an eigenvalue equation. The filter coefficients are the elements of the eigenvector corre-
sponding to the largest eigenvalue of the matrix R, whose elements are the terms:

sin((n − m)2πf_c) / ((n − m)π)

The elements of the eigenvectors of the matrix R are called the discrete prolate spheroidal
sequences [9].
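The eigenvalue formulation lends itself to a direct numerical check. The sketch below (assuming NumPy; P and f_c are arbitrary illustrative values) builds the matrix R, extracts the eigenvector of its largest eigenvalue, and evaluates the resulting concentration λ.

```python
# Sketch: FIR coefficients from the discrete prolate spheroidal sequence.
import numpy as np

P, fc = 8, 0.1                        # filter length 2P+1, band edge (fs = 1)
n = np.arange(-P, P + 1)
d = n[:, None] - n[None, :]
R = 2 * fc * np.sinc(2 * fc * d)      # sin((n-m)*2*pi*fc) / ((n-m)*pi)

w, V = np.linalg.eigh(R)              # eigenvalues in ascending order
a = V[:, -1]                          # eigenvector of the largest eigenvalue
lam = (a @ R @ a) / (a @ a)
print("energy concentration lambda =", lam)   # close to 1
```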
An FIR filter has been obtained; it is also possible to derive an IIR filter. To that end, consider the
following purely recursive function:
|H(f)|² = 1 / (1 + |Σ_{n=1}^{N} b_n e^{j2πfn}|²)
The coefficients can be calculated to minimize the energy of the denominator in the band [−f_c, f_c], under the condition |H(f_c)|² = 0.5. Then, the same method as above can be applied and the coefficients b_n (1 ≤ n ≤ N) can be taken as the elements of the eigenvector associated with the smallest eigenvalue of the spheroidal matrix. First, the scaling factor of the eigenvector is chosen such that |H(f_c)|² = 1/2. Then, the poles of the analytic expansion of |H(f)|² are calculated, and the desired filter transfer function H(Z) is obtained by keeping only those poles which are inside the unit circle to ensure stability.
The procedure can be made reasonably simple by using iterative techniques and exploiting the
structural properties of the spheroidal matrix [9].
Example: Assume: N = 4; f_s = 1; f_c = 0.1. The minimal eigenvector V_min is:

V_min^t = [1.0, −2.773, −2.773, 1.0]

If T designates the matrix whose elements are the terms e^{j2πf_c(n−m)} with 1 ≤ n, m ≤ N, then the scaling factor leading to the equality V_min^t T V_min = 1 is 10.46.
After factorization of the analytical expansion H(Z)H(Z −1 ), the transfer function finally
obtained is:
H(Z) = 0.0704 / [(Z − 0.73 + j0.446)(Z − 0.73 − j0.446)(Z − 0.741)]    (7.40)

The technique presented above for low-pass filtering can be extended to high-pass filtering.
7.2.5 Structures Representing the Transfer Function


IIR filters can be produced using circuits which directly perform the operations represented by
the expression for their transfer function. The term Z −1 corresponds to a delay of one sampling
period and is achieved by storage in the memory. The coefficients to be created by the circuits are
those of the transfer function, with the same sign for the numerator and the opposite sign for the
denominator.
Direct realization of transfer functions may necessitate high-accuracy calculations and it is gen-
erally preferable to resort to structures decomposed into sums or products of first- and second-order
sections.
Only canonical structures – that is, those which require the minimum number of elementary
operators, computing circuits, and memories – will be examined.
Decomposition into a product corresponds to the cascade structure in which the filter is gener-
ated from a set of first- and second-order sections:
H(Z) = a₀ ∏_{i=1}^{N} (1 − Z_i Z⁻¹) / ∏_{i=1}^{N} (1 − P_i Z⁻¹)
     = a₀ ⋯ [(1 − Z_i Z⁻¹)/(1 − P_i Z⁻¹)] ⋯ [(1 − 2Re(Z_j)Z⁻¹ + |Z_j|²Z⁻²)/(1 − 2Re(P_j)Z⁻¹ + |P_j|²Z⁻²)] ⋯    (7.41)

This structure is the one most frequently used because, in addition to its modularity, it presents
the useful properties of low sensitivity to coefficient wordlength limitation and to round-off noise.
The function H(Z) can also be decomposed into rational fractions:

H(Z) = a₀ + ⋯ + α_i/(1 − P_i Z⁻¹) + ⋯ + (α_j + β_j Z⁻¹)/(1 − 2Re(P_j)Z⁻¹ + |P_j|²Z⁻²) + ⋯    (7.42)

The approach corresponds to connecting the M basic elements in parallel as shown in Figure 7.9.
The numbers y(n) are obtained by summing the outputs from the different elements, to which the
input numbers x(n) are applied.
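Both decompositions are readily obtained with standard tools. The sketch below (assuming SciPy; the example transfer function is arbitrary) factorizes H(Z) into second-order sections for the cascade structure and expands it into partial fractions for the parallel structure.

```python
# Sketch: cascade (7.41) and parallel (7.42) forms of a transfer function.
import numpy as np
from scipy.signal import butter, tf2sos, residuez

b, a = butter(4, 0.3)             # example transfer function
sos = tf2sos(b, a)                # cascade: rows [b0 b1 b2, 1 a1 a2]
r, p, k = residuez(b, a)          # parallel: sum of r_i/(1 - p_i*Z^-1) + k

print("second-order sections:\n", np.round(sos, 4))
print("residues:", np.round(r, 4))
print("poles   :", np.round(p, 4))
print("direct term:", k)
```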
IIR filters can also be implemented with the help of phase shifters.
Butterworth, Chebyshev, and elliptic transfer functions can be decomposed into a sum of two
phase shifters [10]. For such a function, this gives:
H(Z) = N(Z)/D(Z) = (1/2)[A₁(Z) + A₂(Z)]    (7.43)
where A1 (Z) and A2 (Z) are the transfer functions of the phase shifters.

Figure 7.9 The parallel structure.
Calculation of A1 (Z) and A2 (Z) from H(Z) involves the complementary function G(Z) =
M(Z)/D(Z) such that:

|G(f )|2 = 1 − |H(f )|2 (7.44)

It is assumed that the initial function H(Z) is such that N(Z) is a symmetric polynomial and M(Z) is antisymmetric – that is:

N(Z) = Z^{−N} N(Z⁻¹);   M(Z) = −Z^{−N} M(Z⁻¹)    (7.45)

Under these conditions, combining equations (7.44) and (7.45) yields:

N(Z)N(Z⁻¹) + M(Z)M(Z⁻¹) = D(Z)D(Z⁻¹)    (7.46)

and

[N(Z) + M(Z)][N(Z) − M(Z)] = Z^{−N} D(Z⁻¹)D(Z)    (7.47)

Note that the zeros of N(Z) + M(Z) and N(Z) − M(Z) are harmonic conjugates and that the zeros
of D(Z) are their inverses. Designating the poles of the filters by Pi (i = 1, …, N), and hence the zeros
of D(Z), one can write, to within a constant:

N(Z) + M(Z) = ∏_{i=1}^{r} (1 − Z⁻¹P_i) ∏_{i=r+1}^{N} (Z⁻¹ − P_i)    (7.48)

and

N(Z) − M(Z) = ∏_{i=1}^{r} (Z⁻¹ − P_i) ∏_{i=r+1}^{N} (1 − Z⁻¹P_i)

where r is the number of zeros of the polynomial N(Z) + M(Z) within the unit circle. Dividing by
D(Z), one obtains:
H(Z) + G(Z) = ∏_{i=r+1}^{N} (Z⁻¹ − P_i) / ∏_{i=r+1}^{N} (1 − Z⁻¹P_i)    (7.49)

and similarly:
H(Z) − G(Z) = ∏_{i=1}^{r} (Z⁻¹ − P_i) / ∏_{i=1}^{r} (1 − Z⁻¹P_i)    (7.50)

The phase shifters A1 (Z) and A2 (Z) have the following expressions:
A₁(Z) = ∏_{i=r+1}^{N} (Z⁻¹ − P_i)/(1 − Z⁻¹P_i);   A₂(Z) = ∏_{i=1}^{r} (Z⁻¹ − P_i)/(1 − Z⁻¹P_i)    (7.51)

Finally, the filter H(Z) and its complement G(Z) are obtained by the arrangement shown in
Figure 7.10.

Figure 7.10 Realization of an IIR filter and the complementary filter using two phase shifters.
The general procedure for designing the phase shifters from an elliptical filter is as follows:
(1) Calculate the transfer function H(Z) = N(Z)/D(Z) of an elliptic filter of odd order N.
(2) Calculate the coefficients of the antisymmetric polynomial M(Z) from N(Z) and D(Z) by using
equation (7.46).
(3) Determine the inverses of the poles of H(Z) which are the roots of the polynomial N(Z) + M(Z).
(4) Calculate A1 (Z) and A2 (Z) using expression (7.51).
A simplified approach, when the order N is not very high, consists of finding A₁(Z) and A₂(Z) directly by combining poles. Hence, for:

H(Z) = 0.0546 [1 + 1.8601Z⁻¹ + 2.9148Z⁻² + 2.9148Z⁻³ + 1.8601Z⁻⁴ + Z⁻⁵] / [(1 − 0.4099Z⁻¹)(1 − 0.6611Z⁻¹ + 0.4555Z⁻²)(1 − 0.4993Z⁻¹ + 0.8448Z⁻²)]

this gives:

A₁(Z) = (0.4555 − 0.6611Z⁻¹ + Z⁻²) / (1 − 0.6611Z⁻¹ + 0.4555Z⁻²)

A₂(Z) = (−0.4099 + Z⁻¹)(0.8448 − 0.4993Z⁻¹ + Z⁻²) / [(1 − 0.4099Z⁻¹)(1 − 0.4993Z⁻¹ + 0.8448Z⁻²)]
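Since A₁(Z) and A₂(Z) are all-pass, the pair H = (A₁ + A₂)/2 and G = (A₁ − A₂)/2 is power complementary, which gives a simple numerical check of the example above (a sketch assuming NumPy; the coefficients are those just printed).

```python
# Sketch: verify |H|^2 + |G|^2 = 1 for the phase-shifter decomposition.
import numpy as np

f = np.linspace(0, 0.5, 501)
z = np.exp(-2j * np.pi * f)        # here z stands for Z^-1

A1 = (0.4555 - 0.6611*z + z**2) / (1 - 0.6611*z + 0.4555*z**2)
A2 = ((-0.4099 + z) * (0.8448 - 0.4993*z + z**2)) \
     / ((1 - 0.4099*z) * (1 - 0.4993*z + 0.8448*z**2))

H = 0.5 * (A1 + A2)
G = 0.5 * (A1 - A2)
err = np.max(np.abs(np.abs(H)**2 + np.abs(G)**2 - 1))
print("max deviation from |H|^2 + |G|^2 = 1:", err)   # machine precision
```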
The basic structure of phase shifters is useful since it provides two complementary filters with the
same calculations, which is useful in filter banks, as shown in Chapters 11 and 12. Furthermore, it
is less sensitive than other structures to rounding of the coefficients.
Note that filters which can be decomposed as the sum of phase shifters are entirely defined by
their poles.
A sum of phase shifters as in Figure 7.10 is the most efficient realization of an elliptic filter since
it requires a number of multiplications equal to the order of the filter.

7.2.6 Limiting the Coefficient Wordlength


Practical implementation of a filter implies limitations of the number of bits in the representation of
filter coefficients, which form one of the multiplication terms. The effect of this on the complexity
is important because multiplication is often the most critical factor. It is, therefore, necessary to
find the minimum number of bits which allows the constraints imposed on the filtering function
to be met.
Limitation of the number of bits of the scale factor a0 appears as a modification in the gain of the
filter, but it does not affect the form of the frequency response. As the filter gain is specified with a
certain tolerance at a given frequency (for example, 800 Hz or 1000 Hz for a telephone channel), it
is necessary to ensure that the binary representation of a0 permits this constraint to be met.
Limitation of the number of bits in the other coefficients modifies the transfer function by intro-
ducing error polynomials eN (Z) and eD (Z) in the numerator and the denominator. The actual trans-
fer function is H R (Z):
H_R(Z) = (N(Z) + e_N(Z)) / (D(Z) + e_D(Z))    (7.52)
If the rounding errors in the coefficients are denoted by 𝛿ai and 𝛿bi , these error functions can be
written as a function of the normalized frequency (f s = 1):

e_N(f) = Σ_{i=1}^{N} δa_i e^{−j2πfi};   e_D(f) = Σ_{i=1}^{N} δb_i e^{−j2πfi}
In fact, these expressions form the Fourier series expansion of functions which are periodic in
frequency. The Bessel–Parseval equation (1.7) relating the power of a signal to the power of its
components allows the following equation to be written:
∫₀¹ |e_N(f)|² df = Σ_{i=1}^{N} |δa_i|²

If q denotes the quantization step,


|δa_i| ≤ q/2

and an upper limit is obtained for |e_N(f)| by:

|e_N(f)| ≤ N q/2    (7.53)
A statistical estimate 𝜎 of |eN (f )| can be obtained by regarding the 𝛿ai as random variables uni-
formly distributed over the range [−q/2, q/2]. It is evaluated from the effective value of the function
eN (f ). Thus,
σ² = ∫₀¹ |e_N(f)|² df = N q²/12

and hence

σ = q√N / (2√3)    (7.54)
This estimate is valid for both |eN (f )| and |eD (f )|. It is clearly less than the bound (7.53) given
above and it is, in fact, more realistic when N exceeds several units.
The consequences of rounding the coefficients can be analyzed separately for the numerator and
denominator of the transfer function by considering the stop band for one and the pass band for
the other. Examination of the configuration of the poles and zeros in the Z-plane shows that the
poles determine the response of the filter in the pass band and the zeros in the stop band.
In the stop band, the denominator coefficient wordlength limitation can be neglected and, in
terms of the variable, 𝜔 = 2𝜋f , one has:
H_R(ω) = (N(ω) + e_N(ω)) / D(ω)
The error on the response is then estimated by:
|H_R(ω) − H(ω)| ≈ σ / |D(ω)|
If the specification requires that the ripples in the stop band be less in modulus than 𝛿 2 , then, by
separating the tolerance into two equal parts – one for the ripple when there is no rounding error
on the coefficients, and the other to allow for the error caused by this rounding – one has:
σ/|D(ω)| < δ₂/2    (7.55)
In the pass band, the numerator coefficient wordlength limitation can be neglected:
H_R(ω) ≈ N(ω)/(D(ω) + e_D(ω)) ≈ [N(ω)/D(ω)][1 − e_D(ω)/D(ω)]    (7.56)
If the ripple in the pass band is less in modulus than 𝛿 1 , then by again dividing the tolerance into
two equal parts,
σ/|D(ω)| < δ₁/2    (7.57)
Figure 7.11 The poles and zeros of a narrow low-pass filter.

This condition is generally much more restrictive than the earlier one because the function |D(𝜔)|
has very low values in the pass band and is even more restrictive with a more selective filter. Further,
when the pass band is narrow, the coefficients can have large values. For a low-pass filter such as
that in Figure 7.11, the following equation can be written:
D(Z) ≈ (1 − Z⁻¹)^N

and thus:

b_i ≈ N! / (i!(N − i)!)    (7.58)
Under these conditions, a very large number of bits is required if both large values of the coeffi-
cients are to be represented and a very low quantization error is required. For this reason, decom-
posed structures are used almost exclusively with first- and second-order sections.
Let us first consider the cascade structure which corresponds to the decomposition (7.41) of the
transfer function. If the order N of the filter is even, this is written as:

H(ω) = N(ω)/D(ω) = ∏_{i=1}^{N/2} N_i(ω)/D_i(ω)
The polynomials N i (𝜔) and Di (𝜔) are of the second degree.
In the pass band, if rounding of the coefficients of the polynomials N i (𝜔) is neglected, we obtain:

H_R(ω) ≈ ∏_{i=1}^{N/2} N_i(ω) / (D_i(ω) + e_i(ω))
or
H_R(ω) ≈ [N(ω)/D(ω)][1 − Σ_{i=1}^{N/2} e_i(ω)/D_i(ω)]    (7.59)
Then, using equation (7.53), we obtain:
|e_i(ω)| ≤ q
and the overall relative error e(𝜔) in the frequency response is found to be bounded by:
|e(ω)| ≤ q Σ_{i=1}^{N/2} 1/|D_i(ω)|    (7.60)

This expression demonstrates the benefit of the decomposed structure as the bound of the error
is proportional to:

Σ_{i=1}^{N/2} 1/|D_i(ω)|
7.2 Direct Calculations of the Coefficients Using Model Functions 167

instead of (see equation (7.56)):



∏_{i=1}^{N/2} 1/|D_i(ω)|
Further, the absolute value of the coefficients of the denominator cannot be greater than 2 if the
filter is stable.
With the parallel structure, also in the pass band, the following equation is obtained:
H_R(ω) ≈ Σ_{i=1}^{N/2} N_i′(ω)/(D_i(ω) + e_i(ω)) ≈ Σ_{i=1}^{N/2} [N_i′(ω)/D_i(ω)][1 − e_i(ω)/D_i(ω)]
Bearing in mind that the terms |N_i′(ω)/D_i(ω)| are close to unity, the error in this case is approximately the same as that with the cascade structure.
In the stop band, away from the infinite attenuation frequencies, the following is obtained for
the cascade structure:
H_R(ω) ≈ ∏_{i=1}^{N/2} (N_i(ω) + e_i(ω))/D_i(ω) ≈ [N(ω)/D(ω)][1 + Σ_{i=1}^{N/2} e_i(ω)/N_i(ω)]    (7.61)
For the parallel structure, the following estimate can be made:

H_R(ω) ≈ Σ_{i=1}^{N/2} (N_i′(ω) + e_i(ω))/D_i(ω) = Σ_{i=1}^{N/2} N_i′(ω)/D_i(ω) + Σ_{i=1}^{N/2} e_i(ω)/D_i(ω)
or,

H_R(ω) ≈ [N(ω)/D(ω)] [1 + Σ_{i=1}^{N/2} (e_i(ω)/N_i(ω)) ∏_{j=1, j≠i}^{N/2} D_j(ω)/N_j(ω)]    (7.62)
By comparing equations (7.61) and (7.62), it appears that the terms:

α_i = ∏_{j=1, j≠i}^{N/2} D_j(ω)/N_j(ω)

in the stop band can have values much greater than unity, for example, in the vicinity of the zeros
of the filter. The result is that in the stop band, the parallel structure is more sensitive to rounding
errors than the cascade structure.
Finally, the cascade structure allows the coefficients of the IIR filters to be represented using
fewer bits. Thus, this structure is the one most frequently used.
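The difference in sensitivity can be observed directly by rounding the coefficients of the same filter in both forms. The sketch below (assuming SciPy; the elliptic specification and the quantization step q = 2⁻⁷ are arbitrary choices) compares the response error of the quantized direct form with that of the quantized cascade of second-order sections.

```python
# Sketch: coefficient rounding in direct form versus cascade form.
import numpy as np
from scipy.signal import ellip, tf2sos, freqz, sosfreqz

b, a = ellip(6, 0.2, 50, 0.3)      # order 6, 0.2 dB ripple, 50 dB stop band
sos = tf2sos(b, a)

q = 2.0 ** -7                      # quantization step of the coefficients
bq, aq, sosq = (np.round(x / q) * q for x in (b, a, sos))

w, H = freqz(b, a, worN=1024)
_, Hd = freqz(bq, aq, worN=1024)       # quantized direct form
_, Hc = sosfreqz(sosq, worN=1024)      # quantized cascade form

print("direct form  max response error:", np.max(np.abs(np.abs(Hd) - np.abs(H))))
print("cascade form max response error:", np.max(np.abs(np.abs(Hc) - np.abs(H))))
```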

7.2.7 Round-Off Noise


Another limitation occurs in the practical realization of IIR filters because of the limits on the
capacity of the data memories. This limit is the origin of the round-off noise. It will be analyzed
for the D–N structure, but the same arguments apply to the N–D one. While this structure has
specific advantages, mainly related to the calculation of partial sums and to the sequencing of the
multiplications, it is the D–N structure which is most often used because it is generally easier to
design, construct, and test.
In the presence of a signal – that is, for nonzero values of x(n) – rounding before storage in the memory with quantization step q is equivalent to superposition on the input signal of an error signal e(n) such that |e(n)| < q/2, assumed to have a uniform spectrum and a power σ² = q²/12.
If other roundings are involved (for example, in the multiplications), it is apparent that the error
signals produced are added either to the input or to the output signal depending upon whether they
correspond to the coefficients of the recursive or non-recursive part. Consequently, to simplify the
analysis, only the case of single quantization is considered. (By modifying the power of the injected
noise, it is always possible to reach a scenario where only single quantization is required).
The error signal applied to the input of the filter undergoes the filtering function and, by applying
equation (4.25), the power of the round-off noise at the output is:
N_c = (q²/12) ∫₀¹ |N(f)/D(f)|² df    (7.63)
or, as a function of the set h(k), the impulse response of the filter,
N_c = (q²/12) Σ_{k=0}^{∞} |h(k)|²    (7.64)
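Equation (7.64) can be evaluated from a truncated impulse response, as in the following sketch (assuming SciPy; the filter and the 16-bit internal wordlength are illustrative assumptions).

```python
# Sketch: output round-off noise power Nc = (q^2/12) * sum |h(k)|^2.
import numpy as np
from scipy.signal import ellip, lfilter

b, a = ellip(4, 0.5, 40, 0.25)     # example filter
x = np.zeros(4096); x[0] = 1.0
h = lfilter(b, a, x)               # impulse response (truncated)

bits = 16                          # internal data wordlength (assumed)
q = 2.0 ** -(bits - 1)             # quantization step for data in [-1, 1)
Nc = q**2 / 12 * np.sum(np.abs(h)**2)
print("round-off noise power at the output:", Nc)
```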
The implementation of the cascade structure presents some possibilities for reducing this noise
power [11].
When the filter is realized as a cascade of N/2 second-order sections, the round-off noise pro-
duced in each section undergoes the filtering function of that section and of the following ones. In
this case, it should be noted that the amplitude, or level, at the input of each section, varies with
the rank of the section and the frequency of the signal being considered.
If the rounding procedure is the same for all sections, the noise produced is the same and the con-
tributions are added to each other. The total noise at the output of the filter under these conditions
has a power of:
N_c = (q²/12) Σ_{j=1}^{N/2} ∫₀¹ ∏_{l=j}^{N/2} |N_l(f)/D_l(f)|² df    (7.65)

It is important to arrange the cascade of sections in such a way that the total round-off noise is
minimized; the following parameters are available:
(1) The pairing of the poles and zeros to form a section.
(2) The order in which the sections are arranged.
(3) The scaling factor applied to each section.
These three parameters will be examined in turn.
(1) The pairing of the poles and zeros. The products

P_j(f) = ∏_{l=j}^{N/2} |N_l(f)/D_l(f)|²
have to be minimized, which means that each of the factors is minimized and, in particular,
the lowest maximal value for each factor must be obtained. This condition is approximately
fulfilled by the very simple procedure of associating the pole nearest to the unit circle with its
closest zero, then the next pole with the zero which is then closest, and so on.
(2) Determining the order of the sections. The factor making the largest contribution to the
total noise is often the one which has the highest maximal value. It can be worthwhile
to place it at the beginning of the chain so that its contribution appears only once in the
total sum following equation (7.65) and to connect the sections in decreasing order of their
maxima.
(3) Calculating the scale factors. These are the parameters which control the scaling of the num-
bers in the internal data memories. They are calculated for each section so as to maximize the
amplitude of the signal while avoiding clipping.
Once the scale factors are known, all the elements involved in the implementation are available
and the power of the rounding noise at the output of the filter can be determined for each value of
the number of bits in the internal data memories [12–14].

7.2.8 Comparison of IIR and FIR Filters


Both IIR and FIR filters allow any given specification to be satisfied, and a systems designer often
has to choose between the two approaches. The criterion is the complexity of the circuits to be
employed. In practice, as will be seen in a later chapter, the comparison is reduced mainly to eval-
uating a single parameter: the number of multiplications to be carried out.
Equations (5.32) and (7.29) give estimates for FIR and IIR filters of order N, which is needed to
satisfy the specifications of low-pass filtering as expressed by the ripple in the pass and stop bands,
and by the widths of the pass band and the transition band.
For an FIR linear phase filter, the coefficients are symmetric and, if N is even, N/2 have different
values. Such a filter, therefore, requires N/2 memories for the coefficients and N memories for the
internal data. For each output number, N/2 multiplications and N additions have to be performed.
If phase linearity is not imposed, the order N can be reduced with minimum-phase filters, as shown
by equation (5.56). This reduction depends on the ripple in the pass band and is less than 50%.
There is a consequent increase in the number of multiplications as the symmetry of the coefficients
disappears. Amongst the advantages of FIR filters, it should be stressed that they are always stable,
and are not difficult to implement.
IIR filters are more difficult to achieve except for the particular case of the elliptic type, which is
the most efficient and the most often used. The estimate of equation (7.29) shows that the order of
the filter is a maximum near the frequency f s /4.
Let n be the filter order. Assuming that the filter is formed of second-order sections, each com-
prising four coefficients and two data memories (Section 6.4), realization of a filter with n data
memories requires 2n memories for the coefficients, and each output number involves 2n additions
and 2n multiplications.
If comparison between FIR and IIR filters is limited to the number of multiplications to be carried
out to produce an output number, then, in the case of the low-pass filter, the IIR type is more
advantageous than the FIR one for values of the parameters such that:
N > 4n (7.66)
Following equations (5.32) and (7.29), it can be seen that the transition band is the most crucial parameter in the comparison; similar numbers of multiplications are obtained in the most adverse conditions for the IIR filter if the transition band is given by (1/3)(f_s/Δf) ≈ 2 log(f_s/Δf), that is, Δf ≃ f_s/3. It follows that the inequality (7.66) is satisfied as soon as Δf is smaller than f_s/3. This is the case in the great majority of applications. For example, for parameter values corresponding to Figure 7.11, this inequality is always valid.
Linearity in the phase can be approximated by an IIR filter over a limited frequency band by sup-
plementing, for example, the basic elliptic filter with a group delay equalizer, made of phase-shifter
sections whose properties are given in Section 6.3. Experience has shown that the FIR filter, which
exhibits perfectly linear phase, always requires fewer calculations [15]. It is also easy to implement.
Finally, it is recommended that FIR filters be used when linear phase is required and that IIR
filters be used in other cases.
Nevertheless, the comparison above has been made with the implicit hypothesis that the sam-
pling rate is the same at the input and output of the filter. The bases for the comparison are notice-
ably modified if this constraint disappears, as will be shown in a later chapter.

Exercises
7.1 Using the formulae given in Section 7.1, calculate the frequency and phase response and the
group delay of the filter section defined by the relation:

y(n) = x(n) + 0.7x(n − 1) + 0.9y(n − 1)

Using the same formulae, calculate the frequency and phase response and the group delay
of the second-order section with the Z-transfer function:
H(Z) = (b₂ + b₁Z⁻¹ + Z⁻²) / (1 + b₁Z⁻¹ + b₂Z⁻²)

7.2 It is proposed to use charts of analog filters in order to calculate a digital band-pass filter.
What specifications should be used in order for the digital filter to reject the signal compo-
nents in the bands (0−0.15) and (0.37−0.5) and show no attenuation in the band (0.2−0.33),
assuming f_s = 1? Study direct calculation using a low-pass transformation and equation (7.31).

7.3 Calculate the coefficients of a Butterworth filter of order 4 whose amplitude has the value 2^{−1/2} at the frequency f_c = 0.25. Give the decomposition into second-order sections.

7.4 Use a frequency transformation to transform the band-pass filter in Section 7.2.4 (Figure 7.6)
into a high-pass filter with a pass-band limit of fH = 0.4. How are the poles and zeros changed
in this operation?

7.5 A filter has the poles:

P1 = 0.9235 ± j0.189; P2 = −0.806 ± j0.354; P3 = 0.7326; P4 = −0.4515

and zeros: Z1 = 0.9953 ± j0.0965; Z2 = Z3 = Z4 = Z5 = −1


Give the frequency response of the decomposition into second-order sections. Calculate the scaling factors of the sections and the internal data wordlength if the round-off noise added by the filter remains below 1/10 of the power of the noise present at the input and if the numbers at the input have 10 bits.

7.6 The specification of the above filter is increased by 0.1 dB in order to permit rounding of the
filter coefficients. How many bits are required to represent the coefficients in the cascade
structure? Estimate how many bits are required to represent the coefficients for the parallel
structure. Find an optimum for rounding the coefficients. Can the number of bits found
earlier be reduced?
References 171

7.7 Does the filter given in Section 7.2.3 exhibit auto-oscillations? What are the frequencies and the amplitudes? Answer the same question for the filter described in Exercise 7.5.

7.8 How many multiplications are required by the filter in Figure 7.5? How many memory
locations are necessary? How many coefficients are required by an FIR filter for the same
specification? Compare the number of multiplications and the memory capacities.

7.9 It is desired to achieve the channel filtering function in a PCM transmission terminal by
digital methods. The telephone signal is sampled at 32 kHz and coded into 12 bits, and the
filtering is carried out by a low-pass IIR filter. The pass band is 3300 Hz, and the stop band
begins at 4600 Hz.
The ripples in the pass and stop bands have values of:

𝛿1 ≤ 0.015; 𝛿2 ≤ 0.04

A computer program for elliptic filters produces the following results:
order of filter: N = 4
zeros: z₁ = 0.09896 ± j0.995; z₂ = 0.5827 ± j0.8127
poles: P₁ = 0.6192 ± j0.2672; P₂ = 0.702 ± j0.589
Calculate the transfer function of the filter decomposed into second-order sections.
What is the value of the overall scaling factor, knowing that the amplitude at frequency
0 is 0.99?
The coefficients are quantized into 10 bits. Determine the displacement of the infinite atten-
uation frequencies and evaluate the additional pass-band ripple.
Calculate the scaling factor to be assigned to each section and estimate the round-off noise
produced if the data memories have 16 bits.
Give the complete diagram for the filter.
Evaluate the complexity in terms of:
(1) The number of multiplications and additions per second.
(2) The number of memory bits.

7.10 The following specifications are given for a low-pass filter:

𝛿1 = 0.01; 𝛿2 = 0.01; f1 = 1700Hz; f2 = 2000Hz; and fs = 8000Hz

Calculate the filter order. By taking the order N = 6, a margin on the in-band ripple is avail-
able. Determine the coefficient wordlength.
If the signal-to-noise ratio degradation is limited to 0.1 dB, give the increase of the internal
data wordlength with respect to the input data wordlength.

References

1 A. Oppenheim and R. Schafer, Digital Signal Processing, Prentice Hall, Englewood Cliffs, NJ,
1974, Chapters 5 and 9.
2 L. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Prentice Hall,
Englewood Cliffs, NJ, 1975, Chapters 4 and 5.
172 7 Infinite Impulse Response Filters

3 R. Boite and H. Leich, Les filtres numériques: analyse et synthèse des filtres unidimensionnels,
Masson, Paris, 1980.
4 B. Gold and C. Rader, Digital processing of signals, McGraw-Hill, New York, 1969.
5 A. G. Constantinides, Spectral transformations for digital filters. Proceedings of the IEE, 117(8), 1970.
6 A. Deczky, Synthesis of recursive digital filters using the minimum p-error criterion. IEEE
Transactions on Audio and Electroacoustics, 20, 1972.
7 R. Fletcher and M. J. D. Powell, A rapidly convergent descent method for minimization.
Computer Journal, 6(2), 1963.
8 J. P. Thiran, Equal ripple delay recursive filters. IEEE Transactions on Circuit Theory, 1971.
9 T. Durrani and R. Chapman, Optimal all-pole filter design based on discrete prolate spheroidal
sequences. IEEE Transactions, ASSP32(4), 716–21, 1984.
10 P. P. Vaidyanathan, S. K. Mitra and Y. Neuvo, A new approach to the realisation of low sensitiv-
ity IIR filters. IEEE Transactions, ASSP34(2), 350–61, 1986.
11 L. B. Jackson, Round-off noise analysis for fixed point digital filters in cascade or parallel form.
IEEE Transactions on Audio and Electroacoustics, 1970.
12 A. Peled and B. Liu, Digital Signal Processing: Theory, Design and Implementation, John Wiley,
New York, 1976.
13 Von E. Lueder, H. Hug and W. Wolf, Minimizing the round-off noise in digital filters by
dynamic programming. Frequenz, 29(7), 211–14, 1975.
14 D. Mitra, Large-amplitude, self-sustained oscillations in difference equations, which describe
digital filter sections using saturation arithmetic. IEEE Transactions, ASSP25(2), 1977.
15 L. Rabiner, J. F. Kaiser, O. Herrmann and M. Dolan, Some comparisons between FIR and IIR digital filters. Bell System Technical Journal, 53, 1974.
8 Digital Ladder Filters

The filter structures presented in the previous chapters are deduced directly from their Z-transfer
functions, with the coefficients applied to the multiplying circuits being those of the powers of Z −1 .
More elaborate structures can be developed.
In analog filtering, structures exist which allow filters with very low ripple and excellent
selectivity to be constructed using passive components of limited precision. In digital filtering,
these properties can be translated into a reduction in the round-off noise and in the number of
bits representing the coefficients.
Analog filter networks are based on cascading two port circuits whose properties will be
considered first [1].

8.1 Properties of Two-Port Circuits


The general two-port circuit terminated by the resistors R1 and R2 is shown in Figure 8.1, together
with the currents I and voltages V at ports 1 and 2. This circuit, assumed to be linear, is defined by
its impedance matrix z, which establishes the relations between the variables, generally written in
reduced form as:
R = R₁ = R₂;   υ = V/√R;   i = I√R
Then,
𝜐 = zi (8.1)
with:
z = [z₁₁  z₁₂; z₂₁  z₂₂]
The values z12 and z21 are the transfer impedances of the two-port circuit. It is reciprocal if
z12 = z21 .
If reversing the circuit does not change the external conditions, it is said to be symmetric, and
we have z11 = z22 .

Figure 8.1 Two-port element with resistive terminations.

The transmission and reflection coefficients of the circuit can be demonstrated if another
matrix – the distribution matrix – is introduced. If a reference case is defined with unit terminating
resistors, the incident and reflected waves a and b can be written as:
a = (1/2)(υ + i)    (8.2)

b = (1/2)(υ − i)    (8.3)
The relations between variables a and b can be obtained using equation (8.1):
a = (1/2)(z + I₂)i

where I₂ = [1  0; 0  1] is the identity matrix, and:

b = (1/2)(z − I₂)i

Thus,

b = Sa    (8.4)
where:
S = [S₁₁  S₁₂; S₂₁  S₂₂]

and:

S = (z − I₂)(z + I₂)⁻¹    (8.5)
If the circuit is reciprocal, then:
S₁₂ = S₂₁ = τ    (8.6)

where τ is the transmission coefficient:

τ = 2V₂/E    (8.7)
If the input and output impedances of the circuit are z1 and z2 , respectively, one can write:
S₁₁ = ρ₁ = (z₁ − 1)/(z₁ + 1);   S₂₂ = ρ₂ = (z₂ − 1)/(z₂ + 1)    (8.8)
The values 𝜌1 and 𝜌2 are the reflection coefficients at the input and output of the two-port circuit.
If the circuit is not dissipative, the power that it absorbs is zero, and it can be shown that the
distribution matrix of such a reciprocal two-port network has the form [2]:
S = (1/g) [h  f; f  ±h*]    (8.9)
where f, g, and h are real polynomials with the following properties:
(1) They are linked by a relation which, on the imaginary axis, corresponds to:
|g|² = |h|² + |f|²
The notation h* (p) indicates h(− p).
(2) Depending upon whether f is of even or odd degree, the lower or upper sign is taken in
equation (8.9).
(3) Each root of g in the complex plane lies in the left-hand half plane.
The polynomials f , g, and h are the characteristic polynomials of the circuit. The roots of f (p) are
generally on the imaginary axis in the stop band and are transmission zeros. The roots of h(p) are
attenuation zeros, and for a non-dissipative network, they are generally on the imaginary axis in
the pass band.
For the circuit in Figure 8.1, the transmission coefficient is:

S₁₂ = (2V₂/E) √(R₁/R₂)    (8.10)
The attenuation in decibels is denoted by the function Af (𝜔), where:
A_f(ω) = −10 log|S₁₂(ω)|² = 10 log|g(ω)/f(ω)|² = 10 log(1 + |h|²/|f|²)
The relation
|f(ω)/g(ω)|² + |h(ω)/g(ω)|² = 1    (8.11)
expresses the fact that the non-transmitted power is reflected.
For a cascade arrangement, it is important to pay equal attention to the transfer matrix t
defined by:
[b₁; a₁] = t [a₂; b₂]    (8.12)
The cascade arrangement is represented by the product of the transfer matrices.
The transfer matrix of non-dissipative two-port circuits takes the form:
t = (1/f) [±g*  h; ±h*  g]    (8.13)
By way of example, Figure 8.2 gives the transfer matrices of several elementary circuits.
The fact that the circuit element is non-dissipative has important consequences for the atten-
uation Af (𝜔). In the pass band, Af (𝜔) cannot take negative values. Consequently, at frequencies
where h(𝜔) is zero, its derivative with respect to any of the parameters must also be zero. In a filter
with inductances and capacitances, terminated by resistors, variation of the values of the L and C elements does not affect the attenuation to the first order at frequencies where it is zero.
If the ripple is small, it can be assumed that this property applies over the whole pass band. In
practice, it can be taken that, in a ladder filter, for example, the interactions between the different
branches are such that a perturbation in one element has repercussions for all the other factors of
the attenuation function, having an overall compensating effect which minimizes the incidence of
the perturbation.
Given this behavior, it is of interest to find digital filter structures which have similar properties.
In effect, in a digital filter where the amplitudes of the ripples in the pass and stop bands are similar,
Figure 8.2 Transfer matrices for two-port elements.

the denominator of the transfer function determines the number of bits required to represent the
coefficients. Structures derived, for example, from ladder analog filters can therefore be expected
to lead to significant gains in the coefficient wordlengths, in the complexity of the multipliers, and
also in the power of the round-off noise.
Ladder structures are the most commonly used type in passive analog filtering. The procedure
for obtaining the elements of such a structure, using a transfer function, is described in detail in
Ref. [2]. It consists of factorizing the overall transfer matrix, defined using the calculated transfer
function H(𝜔) into partial matrices corresponding to the series and parallel arms of the ladder
structure.
The most direct approach for obtaining a digital filter structure from an analog ladder filter is to
simulate its voltage–current flow chart.

8.2 Simulated Ladder Filters


The representation of ladder filters in terms of their flow chart is used to synthesize active filters
from integrator or differentiator circuits. The ladder filter shown in Figure 8.3, and terminated with
the resistors R1 and R2 , will be used to demonstrate the voltage–current flow chart. Application of
Kirchhoff’s laws leads to the following relations:
I₁ = (E − V₂)R₁⁻¹;   I_{K−1} = (V_{K−2} − V_K) Z_{K−1}⁻¹;   I_{N+1} = V_N R₂⁻¹

V₂ = (I₁ − I₃)Y₂⁻¹;   V_K = (I_{K−1} − I_{K+1}) Y_K⁻¹
where the index K takes the values 4, 6, …, N.
The flow chart is composed of arcs, each of which is associated with a coefficient representing an
impedance or an admittance. With each junction, there is associated either a voltage at the node or
the current in a branch. For each arc, there is formed the product of the corresponding coefficient
and the magnitude associated with its origin. The magnitude associated with each junction is the
sum of the products corresponding to its various associated arcs.
The chart for the ladder filter of Figure 8.3 is shown in Figure 8.4. The currents and voltages are
associated with alternate junctions. This topology is defined as “leapfrog.”
Simulated ladder digital filters are the structures obtained by simulating each arc in the group,
or each branch of the ladder, by an element with an equivalent transfer function.
Figure 8.3 Analog ladder filter.

Figure 8.4 Flow graph of a ladder filter.

A particularly simple case occurs when the series impedances Z_{K−1} are inductors and the parallel branches Y_K are capacitors (K = 4, 6, …, N). Such filters are purely recursive and are without frequencies of infinite attenuation.
The transfer functions to account for them take the form:
Z_{K−1}⁻¹ = R/(sL_{K−1});   Y_K⁻¹ = 1/(sC_K R)    (8.14)
where s is the Laplace variable and R is a normalization constant. Both cases involve the transfer
functions of integrators which are easily produced using operational amplifiers and R−C networks.
The diagram in Figure 8.5 then appears, being deduced from Figure 8.4. It represents the functions
to be implemented and shows the circuit diagram using integrators.

Figure 8.5 Filter composed of integrators.


The digital realization consists of replacing each integrator with an equivalent function. In
Ref. [3], it is shown that the only digital integrator circuit, which is simple to realize, and which
is equivalent to an analog integrator, is the one represented by Figure 8.6 and whose Z-transfer
function, I(Z), is written as:

I(Z) = aZ^{−1/2} / (1 − Z⁻¹)    (8.15)
The equivalence between the analog and digital integrators is obtained in the same way as for
any transfer function, by replacing Z with e^{jωT}. Thus,

I(ω) = ae^{−jωT/2} / (1 − e^{−jωT}) = (a/2j) · 1/sin(ωT/2)    (8.16)
where T is the sampling period of the digital circuit. This function is equivalent to the analog
function a/j𝜔T with frequency warping. If f A denotes the analog frequency and f N is the digital
frequency, we can show:
π f_A T = sin(π f_N T)    (8.17)
The frequency warping thus introduced is different from that obtained with the bilinear trans-
form introduced in the previous chapter, as shown in Figure 8.7, and it must be taken into account
when calculating a filter from a specification.
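A short sketch (assuming NumPy) makes the difference between the two warpings explicit; the bilinear relation f_A = tan(πf_N T)/(πT) is recalled from the previous chapter.

```python
# Sketch: sine-transform warping (8.17) versus bilinear warping.
import numpy as np

T = 1.0
fN = np.array([0.05, 0.15, 0.25, 0.35, 0.45])       # digital frequencies
fA_sine = np.sin(np.pi * fN * T) / (np.pi * T)      # equation (8.17)
fA_bilin = np.tan(np.pi * fN * T) / (np.pi * T)     # bilinear transform
for f, s, b in zip(fN, fA_sine, fA_bilin):
    print(f"fN = {f:.2f}   sine fA = {s:.3f}   bilinear fA = {b:.3f}")
```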
The circuit in Figure 8.6 shows the disadvantage of introducing the function Z −1/2 , which cor-
responds to an additional memory circuit, but the transfer function of a ladder filter is not altered
when the impedances of all branches are multiplied by the same function [3]. This property has
already been used to introduce the normalization constant R. If the impedances are multiplied by
Z −1/2 , this term is eliminated from all of the Z-transfer functions of the integrators in the circuit,
which then become:
I_i(Z) = (TR/L_i) · 1/(1 − Z⁻¹)   for odd i

Figure 8.6 Digital integrator circuit.

Figure 8.7 Frequency warping by the sine transform.
and:

I_i(Z) = (T/(C_i R)) · Z⁻¹/(1 − Z⁻¹)   for even i
In contrast, the termination resistances are transformed to R₁Z^{−1/2} and R₂Z^{−1/2}; the terminations are no longer purely resistive, as they have the transfer functions:

R₁e^{−jπfT};   R₂e^{−jπfT}
This effect can be neglected when the sampling frequency is large compared to the pass band,
as there is an insignificant change in the transfer function of the filter. Further, the resistances R1
and R2 can be chosen as unity, as can the normalization constant R. The circuit of the digital filter
obtained under these conditions is given in Figure 8.8.
The coefficients have the following values, for odd-order N:
a_N = T/C_{N+1};   a_{2i−1} = T/C_{2i};   a_{2i} = T/L_{2i+1};   i = 1, 2, …, (N − 1)/2    (8.18)
The filter thus realized involves N multiplications and N memories for a transfer function of
order N. For these parameters, the structure is canonic. The number of additions is 2N + 1.
To summarize, the calculation of a simulated ladder digital filter from an imposed specification
involves the following stages:
(1) Transposing the specification by modifying the frequency axis using equation (8.17) above.
(2) Calculating the elements of a filter with passive ladder elements LC, satisfying the transposed
specification.
(3) Using the values of the elements so obtained to calculate the coefficients ai (i = 1, 2, …, N) of
the digital filter using equations (8.18).
The chief attraction of the structure obtained in this way is that the coefficients can be represented
by a very small number of bits. Also, some multiplications can be replaced by simple additions,
and, in certain cases, all the multiplications of the filter can be eliminated, resulting in signifi-
cant savings in the circuitry. To illustrate this property, let us consider a low-pass filter of order
N = 7 [3], with elements having the following values (Figure 8.5):
R = R1 = R9 = 1
C2 = 1.2597 = C8
L3 = 1.5195 = L7
C4 = 2.2382 = C6
L5 = 1.6796
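Stage (3) of the procedure can be written out directly. The sketch below (plain Python, using the element values above and T = 1/f_s = 0.01 as in the text) computes the seven coefficients from equation (8.18).

```python
# Sketch: coefficients of the order-7 simulated ladder filter, eq. (8.18).
T = 0.01                                    # sampling period, T = 1/fs
C = {2: 1.2597, 4: 2.2382, 6: 2.2382, 8: 1.2597}   # C8 = C2, C6 = C4
L = {3: 1.5195, 5: 1.6796, 7: 1.5195}              # L7 = L3
N = 7

a = {}
for i in range(1, (N - 1) // 2 + 1):
    a[2*i - 1] = T / C[2*i]        # a_(2i-1) = T / C_(2i)
    a[2*i] = T / L[2*i + 1]        # a_(2i)   = T / L_(2i+1)
a[N] = T / C[N + 1]                # a_N = T / C_(N+1)

for i in sorted(a):
    print(f"a{i} = {a[i]:.5f}")
```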

Figure 8.8 Digital simulated ladder filter.


Figure 8.9 Ripples in the pass band for various coefficient wordlengths.

The coefficients ai (i = 1, 2, …, N) of the corresponding simulated ladder digital filter are calcu-
lated with a sampling period T = 1/fs = 0.01, from equation (8.18) above. The ripples of the filter in
the pass band are shown in Figure 8.9, where the coefficients are represented by 10, 5, and 3 bits. It
is notable that, when represented with 5 bits, the attenuation zeros are conserved. With 3 bits, they
are also conserved except for the one closest to the transition band. Thus, the insensitivity to the
first order of the attenuation zeros, which was shown in Section 8.1, is demonstrated. In compari-
son with the cascade structure in the previous chapter, this example shows an estimated gain of 4
or 5 bits for representing the coefficients.
The technique described in this section can be extended to filters other than the purely recursive
low-pass filter, but the designs will be more complicated. Also, the need to have a sampling fre-
quency which is large in comparison with the pass band is not conducive to efficient processing.
In practice, the simulated ladder structure is primarily used with a different method of performing
the calculations – that used in switched-capacitor devices.

8.3 Switched-Capacitor Filters

Strictly speaking, because they do not use arithmetic operations, filters using switched-capacitor
devices are not digital filters. Nevertheless, they do use the same design methods and are comple-
mentary to digital filters. They are frequently used in analog–digital conversion circuits.
The basic principle, which is presented in detail in Reference [4], is as follows. Switching a capacitor C between two voltages V₁ and V₂ at frequency f_s is equivalent to introducing a resistance R given by:

R = 1/(C f_s)
between the two potentials. In effect, as is shown in Figure 8.10, the capacitor is charged to voltages
V 1 and V 2 alternately and a charge transfer C(V 1 − V 2 ) results. If the operations are performed at
frequency f s , a current i:

i = C(V1 − V2 )fs

flows between the voltages V 1 and V 2 .


Figure 8.10 Switching a capacitor between voltages V₁ and V₂.

Figure 8.11 Switched-capacitor integrator: dV₂/dt = −(1/RC₂)(V₀ − V₁) for the analog version; ΔV₂/Δt = −f_s (C₁/C₂)(V₀ − V₁) for the switched-capacitor version.

This equivalent resistor is inserted in an integrator circuit as shown in Figure 8.11. The integrator
being considered has an adder at its input, like those in Figure 8.5. The equation describing the
operation of the analog integrator is also shown in Figure 8.11. In the switched capacitor version,
the capacitor C1 is alternately connected at frequency f s across the input of the operational
amplifier, and between voltages V 0 and V 1 . The equation for the variation ΔV 2 in the output
voltage during the interval Δt, which is assumed to be large in comparison with the period l/f s , is
shown in the diagram.
The condition that the two types of integrators are equivalent is:
C₁ = 1/(f_s R)    (8.19)
However, in order to completely analyze the switched-capacitor integrator, it is necessary to take
account of the sampling [5] and to calculate its Z-transfer function. Assume that 𝜐e (t) is the input
signal and that 𝜐2 (t) is the output one. The sampling period T is assumed to be divided into two
equal parts. The capacitor C₁ is alternately connected across the input of the integrator for time T/2, and to the voltage υ_e(t) itself for time T/2. Let us assume that the latter connection occupies the interval between times nT and (n + 1/2)T. The charge transmitted to the integrator is Q(nT) such that:

Q(nT) = C₁ υ_e[(n + 1/2)T]
Under these conditions, at time (n + 1)T, the output voltage is:
υ₂[(n + 1)T] = υ₂(nT) − (C₁/C₂) υ_e[(n + 1/2)T]
By taking the Z-transform of the two components, one has:
V₂(Z)/V_e(Z) = H(Z) = −(C₁/C₂) · Z^{−1/2}/(1 − Z⁻¹)    (8.20)
One finds the same type of transfer function as was given by equation (8.15) for digital circuits.
The switched-capacitor integrator performs in exactly the same way as the digital circuits described
in the previous section, and the same warping of the frequency axis is involved. It should be noted
that, to ensure that no further delay is introduced and that this function is conserved when cascad-
ing two integrators, the capacitors of the two integrators must be switched in antiphase.
A design involving a switched-capacitor device for a simulated ladder filter, like that in Figure 8.5,
is obtained by substituting integrator circuits and calculating the value to be given to the switched
capacitors in each case.
Example: Assume an implementation using switched capacitors for a Butterworth filter of
order 4, with an analog circuit as shown in Figure 8.12(a).
The procedure described in the previous section results in the design (Figure 8.12(b)) for produc-
ing a filter from integrators, assuming unit terminal resistances. The switched-capacitor design is
shown in Figure 8.12(c). The coefficients ai (i = 1, 2, 3, 4) which define the ratios of the capacitances
are given by equation (8.18).
If the filter has a 3 dB attenuation frequency f c equal to 1 kHz, the analog parameters are as
follows:

R₁ = R₂ = 1
C₂ = 121.8 × 10⁻⁶;   C₄ = 294.1 × 10⁻⁶
L₃ = 294.1 × 10⁻⁶;   L₅ = 121.8 × 10⁻⁶

Figure 8.12 Switched-capacitor filter of order 4: (a) analog prototype; (b) realization with integrators; (c) switched-capacitor realization.


With a sampling frequency of 40 kHz:

a₁ = 1/4.87 = 0.205 = a₄
a₂ = 1/11.76 = 0.085 = a₃
Thus, in switched-capacitor devices, the precision and stability of the integrator time constant
depends on the external sampling frequency and on the ratio of the capacitances. These devices
allow very selective filters to be produced on a silicon chip as a monolithic integrated circuit.

8.4 Lattice Filters

The lattice structure occurs in the analysis and synthesis of speech for simulating the vocal tract,
and also more generally in systems for linear prediction. It allows the realization of finite impulse
response (FIR) and infinite impulse response (IIR) filters [6].
Consider the structure with M sections as represented in Figure 8.13. The outputs y1 (n) and u1 (n)
of the first section are related to the input set x(n) by:

y₁(n) = x(n) + k₁x(n − 1)
u₁(n) = k₁x(n) + x(n − 1)    (8.21)

Similarly, y2 (n) and u2 (n), the outputs of the second element, are related to the inputs by:

y₂(n) = x(n) + k₁(1 + k₂)x(n − 1) + k₂x(n − 2)
u₂(n) = k₂x(n) + k₁(1 + k₂)x(n − 1) + x(n − 2)    (8.22)

By iteration, yM (n) and uM (n), the outputs produced by the filter are related to the set x(n) by the
following equations, which correspond to FIR filtering:


M
yM (n) = ai x(n − i) (8.23)
i=0


M
uM (n) = aM−i x(n − i) (8.24)
i=0

The two FIR filters thus obtained have the same coefficients but in reverse order. Their Z-transfer
functions, H M (Z) and U M (Z), are image polynomials. Thus,


H_M(Z) = Σ_{i=0}^{M} a_i Z^{−i}

U_M(Z) = Σ_{i=0}^{M} a_{M−i} Z^{−i} = Z^{−M} H_M(Z⁻¹)    (8.25)

An iterative process is used to determine the coefficients ki of the lattice filter from the coefficients
ai (1 ⩽ i ⩽ M). Firstly, the coefficient a0 is assumed to be equal to unity. Then it is straightforward,
using the equations given earlier (and also directly from Figure 8.13), to prove that:

kM = a M
Figure 8.13 FIR lattice filter.

This point is the basis for the calculation. By using H m (Z) and U m (Z) (1 ⩽ m ⩽ M) to denote the
corresponding transfer functions at the outputs of the mth section, the following matrix equation
can be written:
[H_m(Z); U_m(Z)] = [1  k_m Z⁻¹; k_m  Z⁻¹] [H_{m−1}(Z); U_{m−1}(Z)]

By assuming |k_m| ≠ 1, this equation can also be written as:

[H_{m−1}(Z); U_{m−1}(Z)] = (1/(1 − k_m²)) [1  −k_m; −k_m Z  Z] [H_m(Z); U_m(Z)]    (8.26)
Thus, the polynomials H m−1 (Z) and U m−1 (Z) are image polynomials of degree m − 1 whose coef-
ficients ai(m−1) (1 ⩽ i ⩽ m − 1) are calculated using the coefficients aim (1 ⩽ i ⩽ m) of the polynomials
H m (Z) and U m (Z).
Under these conditions, this becomes:

a_{mm} = k_m;   a_{(m−1)(m−1)} = k_{m−1}

a_{(m−1)(m−1)} = [a_{(m−1)m} − a_{mm} a_{1m}] / (1 − a_{mm}²)    (8.27)
The coefficients k_m (1 ≤ m ≤ M) are thus calculated in M iterations.
Example: Consider the transfer function H₃(Z) such that:

H₃(Z) = 1 − 1.990Z⁻¹ + 1.572Z⁻² − 0.4583Z⁻³

Then:

k₃ = −0.4583

Using equation (8.27), one can write:

H₂(Z) = 1 − 1.607Z⁻¹ + 0.8355Z⁻²

Thus:

k₂ = 0.8355

By a further application of equation (8.27), we have:

k₁ = −0.8756 and H₁(Z) = 1 − 0.8756Z⁻¹
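The iteration is compact enough to write out in full. The sketch below (plain Python) implements the step-down recursion (8.27) and reproduces the numerical example.

```python
# Sketch: lattice coefficients k_m from the coefficients of H_M(Z), eq. (8.27).
def lattice_coeffs(a):
    """a = [1, a1, ..., aM]  ->  [k1, ..., kM]."""
    a = list(map(float, a))
    M = len(a) - 1
    k = [0.0] * M
    for m in range(M, 0, -1):
        km = a[m]
        k[m - 1] = km
        # step down: a_i(m-1) = (a_im - k_m * a_(m-i)m) / (1 - k_m^2)
        a = [1.0] + [(a[i] - km * a[m - i]) / (1 - km**2)
                     for i in range(1, m)]
    return k

print(lattice_coeffs([1, -1.990, 1.572, -0.4583]))
# -> approximately [-0.8756, 0.8355, -0.4583]
```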
The realization of purely recursive IIR-type filters results in a dual structure, as represented in
Figure 8.14. The sets x₁(n), u₁(n), and y(n) are related by the equations:
y(n) = x₁(n) − k₁y(n − 1)
u₁(n) = k₁y(n) + y(n − 1)
Figure 8.14 Purely recursive IIR lattice filter.

Similarly, the sets x2 (n), x1 (n), u1 (n), and u2 (n) are related by:
x1 (n) = x2 (n) − k2 u1 (n − 1)
u2 (n) = k2 x1 (n) + u1 (n − 1)
This results in the transfer function H₂(Z) between the input x₂(n) and the output y(n) given by:

H₂(Z) = 1 / (1 + k₁(1 + k₂)Z⁻¹ + k₂Z⁻²)
Similarly, the transfer function U₂(Z) relates u₂(n) and y(n), where:

U₂(Z) = k₂ + k₁(1 + k₂)Z⁻¹ + Z⁻²
By iteration, it can be seen that the sets xM (n) and y(n), on the one hand, and uM (n), and y(n) on
the other, are related by the equations:

y(n) = x_M(n) − Σ_{i=1}^{M} b_i y(n − i)    (8.28)

u_M(n) = Σ_{i=0}^{M−1} b_{M−i} y(n − i) + y(n − M)    (8.29)

Consequently, the transfer functions H M (Z) and U M (Z) are:


H_M(Z) = 1 / (1 + Σ_{i=1}^{M} b_i Z^{−i}) = 1/D_M(Z)

U_M(Z) = Σ_{i=0}^{M−1} b_{M−i} Z^{−i} + Z^{−M} = Z^{−M} D_M(Z⁻¹)    (8.30)

The coefficients ki (1 ⩽ i ⩽ M) of the lattice filter are calculated by iteration from the coefficients
bi of the IIR filter, after noting that:
kM = bM
By using H m (Z) and U m (Z) to denote the relating functions for the set of m sections (1 ⩽ m ⩽ M),
it is possible, using the defining equations:

x_{m−1}(n) = x_m(n) − k_m u_{m−1}(n − 1)
u_m(n) = k_m x_{m−1}(n) + u_{m−1}(n − 1)

to produce the following matrix equation:

[D_m(Z); U_m(Z)] = [1  k_m Z⁻¹; k_m  Z⁻¹] [D_{m−1}(Z); U_{m−1}(Z)]
Figure 8.15 Lattice filter cell with single multiplication.

As in the case of FIR filters, this matrix equation can also be written, for |k_m| ≠ 1:

[D_{m−1}(Z); U_{m−1}(Z)] = (1/(1 − k_m²)) [1  −k_m; −k_m Z  Z] [D_m(Z); U_m(Z)]    (8.31)
As with FIR filters, this expression allows for the calculation of the coefficients ki (1 ⩽ i ⩽ M) of
the IIR lattice filter in M iterations, using the polynomial DM (Z), where:

D_M(Z) = 1 + Σ_{i=1}^{M} b_i Z^{−i}

The lattice structures given in Figures 8.13 and 8.14 are canonical for the data memories but not
for the multiplications. They can be made canonical, for an IIR type filter, for example, by using
the single-multiplication section represented in Figure 8.15.
However, a further addition is then necessary. The equations of this first-order section are as
follows:
(1 + k)x1 (n) = y(n) + ky(n − 1)
(1 + k)u1 (n) = ky(n) + y(n − 1)

To within a factor of (1 + k), they are equivalent to the equations for a two-multiplier lattice.
In contrast with the structures described in the previous sections, lattice filters do not have any particular advantage in the number of bits needed to represent the coefficients. Nevertheless, in practice, they have one interesting property: a necessary and sufficient condition for an IIR filter to be stable, with its poles inside the unit circle, is that the coefficients have a modulus of less than unity:

|ki| < 1; 1 ≤ i ≤ M
This property is obvious for k1 in Figure 8.14 if the appropriate section is isolated. It can be
extended to the other coefficients by considering the subcircuits and using recurrence.
This results in a control of the stability which can be realized quite simply and is of particular use
in systems such as adaptive filters, where the values of the coefficients are constantly changing.
The lattice structures considered above are either non-recursive or purely recursive. Note that
the purely recursive structure can be supplemented to make a general filter; it is sufficient to form
a weighted sum of the variables um (n). That is, the expression:

$$v_M = \gamma_0\, y(n) + \sum_{m=1}^{M} \gamma_m\, u_m(n)$$

defines FIR-type filtering of the signal y(n) by virtue of equation (8.29). As the coefficients
bi (1 ≤ i ≤ M) are fixed, the coefficients 𝛾 i can be determined to obtain any numerator for the
general filter.
It is also useful to observe that the purely recursive structure provides a pure all-pass function. In fact, equations (8.29) and (8.30) yield:
$$H_D(Z) = \frac{U_M(Z)}{X(Z)} = \frac{b_M + b_{M-1}Z^{-1} + \dots + Z^{-M}}{1 + b_1 Z^{-1} + \dots + b_M Z^{-M}}$$
This expression shows that, as indicated in Section 6.3, the signal uM (n) is the output of an all-pass
network with x(n) as the input. The transfer function H D (Z) can be expressed directly as a function
of the lattice coefficients by a continued fraction:
$$H_D(Z) = k_M + \cfrac{(1-k_M^2)\,Z^{-1}}{k_M Z^{-1} + \cfrac{1}{k_{M-1} + \cfrac{(1-k_{M-1}^2)\,Z^{-1}}{k_{M-1}Z^{-1} + \cfrac{1}{\ddots\; k_1 + \cfrac{(1-k_1^2)\,Z^{-1}}{k_1 Z^{-1} + 1}}}}}$$

This observation can be used to calculate the poles of the lattice filter directly [7].
An interesting application of the above results is the implementation of the notch filter intro-
duced in Section 6.3. The notch filter output yN (n) is obtained simply by incorporating one more
adder into Figure 8.14 to carry out the following operation:
yN (n) = xM (n) + uM (n) (8.32)
For order 2, the transfer function of the all-pass circuit is:
$$H_D(Z) = \frac{k_2 + k_1(1+k_2)Z^{-1} + Z^{-2}}{1 + k_1(1+k_2)Z^{-1} + k_2 Z^{-2}}$$
A remarkable property of this approach is that the notch frequency ω0 and the 3 dB attenuation bandwidth B3N can be tuned independently [8]. This decoupling is due to the relationships:
$$k_1 \simeq -\cos\omega_0; \qquad k_2 \simeq \frac{1-\tan(\pi B_{3N})}{1+\tan(\pi B_{3N})} = (1-\varepsilon)^2 \qquad (8.33)$$
If a subtraction is performed instead of the addition in equation (8.32), the complementary filter is obtained.
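A short numerical sketch (Python/NumPy assumed; the chosen values of f0 and B3N are arbitrary test values) illustrates the decoupled tuning of equation (8.33):

```python
import numpy as np

f0, B3 = 0.125, 0.01                    # normalized notch frequency and 3 dB width
k1 = -np.cos(2 * np.pi * f0)            # k1 ~ -cos(w0)
k2 = (1 - np.tan(np.pi * B3)) / (1 + np.tan(np.pi * B3))

f = np.linspace(0.0, 0.5, 2001)
z = np.exp(2j * np.pi * f)
num = k2 + k1 * (1 + k2) / z + z**-2    # all-pass H_D(Z), order 2
den = 1 + k1 * (1 + k2) / z + k2 * z**-2
H_notch = (1 + num / den) / 2           # y_N = x_M + u_M (8.32), scaled by 1/2 here
                                        # so that the pass-band gain is unity
print(f[np.argmin(np.abs(H_notch))])    # ~0.125: the notch sits at f0
```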

8.5 Comparison Elements


Having discussed the various structures for digital filters, this is an appropriate point at which to
summarize their properties. Reference [9] gives a detailed comparative analysis.
The cascade structure is the simplest to obtain, as the coefficients correspond to a simple factor-
ization of the Z-transfer function. It produces a minimum number of multiplications, additions,
and memories. However, the representation of the coefficients may require a large number of bits.
The choice between cascade and lattice structures does not normally arise for filters with fixed coefficients, as the lattice structure suits only particular applications.
The structures derived from analog ladder networks and simulated ladder filters offer the pos-
sibility of representing the coefficients with few bits, even for very selective filters. Consequently,
elimination of the multipliers can be envisaged. As these are generally the most complex circuits,
the saving in materials is appreciable. Nevertheless, some complications do occur. The number
of additions is increased, as is the complexity of the sequence of operations. Extra memories are
required for storing the intermediate results. Also, multiplexing of the operations between several
filters – an important advantage of digital filters – becomes more difficult. In view of these factors,
a detailed evaluation is necessary before this type of structure can be used.

Exercises

8.1 Give the impedance and distribution matrices for the elementary two-port networks of Figure 8.2. Give the impedance, distribution, and transfer matrices when the elements are resonant LC circuits.

8.2 Consider the Butterworth filter of order 4 given in Figure 8.12(a). Draw the corresponding
flow chart. Show the circuit of the digital simulated ladder filter and calculate the coefficients
from the given values for the analog elements for a sampling frequency of 40 kHz. Study
the modification of the transfer function in the pass band introduced by a reduction in the
sampling frequency from 40 kHz to 10 kHz.

8.3 Let us consider the low-pass Chebyshev filter of order 7, whose analog elements are given in
Section 8.2. The sampling frequency is taken as 10 kHz. Show the design for the correspond-
ing simulated ladder filter.
What is the number of operations to be carried out in each realization?
Give the frequency response when the coefficients are represented by 5 bits.

8.4 Calculate the frequency response of the lattice filter given as the example in Section 8.4. How
does this response evolve when the parameters are represented by 5 bits? Give the circuit of
the filter with single multiplication elements. How should the circuit be modified to produce
the filter using an inverse Z-transfer function?

References

1 V. Belevitch, Classical Network Theory. Holden-Day, San Francisco, 1968.
2 J. Neirynk and P. van Bastelaer, La synthèse des filtres par factorisation de la matrice de transfert. Revue MBLE, 10(1), 1967.
3 L. T. Bruton, Low sensitivity digital ladder filters. IEEE Transactions, CAS22(3), 1975.
4 B. Hosticka, R. Brodersen and P. Gray, MOS sampled data recursive filters using switched capacitor integrators. IEEE Journal of Solid-State Circuits, 12, 1977.
5 R. Brodersen, P. Gray and D. Hodges, MOS switched-capacitor filters. Proceedings of the IEEE, 67(1), 1979.
6 S. K. Mitra, P. S. Kamat and D. C. Huey, Cascaded lattice realization of digital filters. Circuit Theory and Applications, 5, 1977.
7 W. B. Jones and A. O. Steinhardt, Finding the poles of the lattice filter. IEEE Transactions, ASSP33(4), 1328–31, 1985.
8 T. Saramaki, T. H. Yu and S. K. Mitra, Very low sensitivity realization of IIR digital filters using a cascade of complex all-pass structures. IEEE Transactions, CAS34, 876–86, 1987.
9 R. E. Crochière and A. V. Oppenheim, Analysis of linear digital networks. Proceedings of the IEEE, 63(4), 1975.
9

Complex Signals – Quadrature Filters – Interpolators

Complex signals, in the form of sets of complex numbers, are commonly used in digital signal analysis. Some examples of these sets are presented in the chapters on discrete Fourier transforms. In
this chapter, analytic signals – a particular category of complex signal – will be studied. Such sig-
nals exhibit some interesting properties and occur primarily in modulation and multiplexing. The
properties of the Fourier transforms of real causal sets will be examined first [1–3].

9.1 The Fourier Transform of a Real and Causal Set

Consider a set of elements x(n) whose Z-transform is written as:
$$X(Z) = \sum_{n=-\infty}^{\infty} x(n)\,Z^{-n}$$
The Fourier transform of this set is obtained by replacing Z with e^{j2πf} in X(Z):
$$X(f) = \sum_{n=-\infty}^{\infty} x(n)\,e^{-j2\pi nf}$$

If the elements x(n) are real numbers, we obtain:
$$X(-f) = \overline{X(f)} \qquad (9.1)$$
The values of X(f) at negative frequencies are complex conjugates of the values at positive frequencies. The supplementary condition of causality can be imposed on the set x(n), and the consequences for X(f) will now be examined.
The function X(f) can be separated into real and imaginary parts:
$$X(f) = X_R(f) + jX_I(f) \qquad (9.2)$$
If the set x(n) is real, then using equation (9.1), the function XR(f) is even. Thus, it is the Fourier transform of an even set xp(n). The function XI(f) is the Fourier transform of an odd set xi(n) such that:
$$x_p(n) = x_p(-n); \quad x_i(n) = -x_i(-n); \quad x(n) = x_p(n) + x_i(n) \qquad (9.3)$$


Figure 9.1 Decomposition of a causal set into even and odd parts: x(n), xp(n), xi(n).

Under these conditions, we derive:
$$X_R(f) = x_p(0) + 2\sum_{n=1}^{\infty} x_p(n)\cos(2\pi nf) \qquad (9.4)$$
$$X_I(f) = -2\sum_{n=1}^{\infty} x_i(n)\sin(2\pi nf) \qquad (9.5)$$

If the set x(n) is causal – that is, if:
x(n) = 0 for n < 0
the following equations (Figure 9.1) are derived:
$$x_i(n) = x_p(n) = \tfrac{1}{2}x(n) \ \text{ for } n \geq 1; \qquad x_p(0) = x(0)$$
and hence,
$$X_R(f) - x(0) = \sum_{n=1}^{\infty} x(n)\cos(2\pi nf) \qquad (9.6)$$
$$X_I(f) = -\sum_{n=1}^{\infty} x(n)\sin(2\pi nf) \qquad (9.7)$$

It can be seen that these two functions are related: to go from one to the other, it is sufficient to change cos(2πnf) to −sin(2πnf), or vice versa. Such an operation is called quadrature, and it will now be expressed analytically.
By definition, a causal set is one which satisfies the equality:
x(n) = x(n)Y(n)
where the set Y(n) is such that:
$$Y(n) = \begin{cases} 0 & \text{for } n < 0 \\ 1 & \text{for } n \geq 0 \end{cases}$$

This set is a sampled version of the unit step function Y(t) which, in terms of distributions, has the Fourier transform FY given in Reference [2]:
$$FY = \frac{1}{j2\pi}\,vp\!\left(\frac{1}{f}\right) + \frac{1}{2}\,\delta(f) \qquad (9.8)$$

where vp(1/f) is the distribution defined by:
$$\left\langle vp\!\left(\frac{1}{f}\right), \phi \right\rangle = VP\int_{-\infty}^{\infty} \frac{\phi(f)}{f}\,df \qquad (9.9)$$
The Cauchy principal value is itself defined by:
$$VP\int_{-\infty}^{\infty} \frac{\phi(f)}{f}\,df = \lim_{\varepsilon\to 0}\left[\int_{-\infty}^{-\varepsilon} \frac{\phi(f)}{f}\,df + \int_{\varepsilon}^{\infty} \frac{\phi(f)}{f}\,df\right]$$
As the sampling introduces a periodicity into the spectrum, it can be shown that the Fourier transform of the set Y(n) such that:
$$Y(n) = 0 \ \text{for } n < 0; \qquad Y(0) = \tfrac{1}{2}; \qquad Y(n) = 1 \ \text{for } n > 0$$
is the distribution FYn:
$$FY_n = \frac{1}{2j}\,vp[\cot \pi f] + \frac{1}{2}\,\delta(f) \quad \text{for } -\tfrac{1}{2} \leq f \leq \tfrac{1}{2} \qquad (9.10)$$
The convolution product of the Fourier transforms corresponds to the product of the two sets. Hence,
$$X(f) = \left[\frac{1}{2j}\,vp[\cot \pi f] + \frac{1}{2}\,\delta(f)\right] * X(f) + \frac{1}{2}\,x(0)$$
By separating the equation into real and imaginary parts, one obtains:
$$X_R(f) + jX_I(f) = vp[\cot \pi f] * [X_I(f) - jX_R(f)] + x(0)$$

The equations which relate the real and imaginary parts of X(f) are:
$$X_R(f) = x(0) + VP\int_{-1/2}^{1/2} X_I(f')\cot[\pi(f-f')]\,df' \qquad (9.11)$$
$$X_I(f) = -VP\int_{-1/2}^{1/2} X_R(f')\cot[\pi(f-f')]\,df' \qquad (9.12)$$

or further, in a different form, without introducing Cauchy principal values:
$$X_R(f) = x(0) - \int_{-1/2}^{1/2} [X_I(f) - X_I(f')]\cot[\pi(f-f')]\,df' \qquad (9.13)$$
$$X_I(f) = \int_{-1/2}^{1/2} [X_R(f) - X_R(f')]\cot[\pi(f-f')]\,df' \qquad (9.14)$$
The real and imaginary parts of the Fourier transform of a causal set are related by
equations (9.11) and (9.12), which correspond to the Hilbert transform for continuous signals.
Example: Consider the set Uk(n) such that:
$$U_k(n) = 0 \ \text{for } n \neq k; \quad U_k(k) = 1; \qquad X_R(f) = \cos(2\pi kf); \quad X_I(f) = -\sin(2\pi kf)$$

It can be shown directly that:
$$\int_{-1/2}^{1/2} \cos(2\pi kf')\cot[\pi(f-f')]\,df' = \int_{-1/2}^{1/2} \cos[2\pi k(f-f')]\cot(\pi f')\,df' = \sin(2\pi kf)$$
$$\int_{-1/2}^{1/2} \sin(2\pi kf')\cot[\pi(f-f')]\,df' = \int_{-1/2}^{1/2} \sin[2\pi k(f-f')]\cot(\pi f')\,df' = -\cos(2\pi kf)$$

On the other hand, using Parseval's equation, we can write:
$$\int_{0}^{1/2} X_R^2(f)\,df = \int_{0}^{1/2} X_I^2(f)\,df \qquad (9.15)$$
The real and imaginary parts of X(f ) have the same power.
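These relations are easy to verify numerically. The following Python sketch (NumPy assumed; the value of a and the truncation length are arbitrary) uses the causal set x(n) = aⁿ, whose transform has the closed form X(f) = 1/(1 − a e^{−j2πf}), and checks equations (9.6) and (9.7):

```python
import numpy as np

a, N = 0.7, 4096
n = np.arange(1, 200)                  # truncation of the infinite sums
f = np.linspace(-0.5, 0.5, N, endpoint=False)
x0 = 1.0                               # x(0) = a^0
XR = x0 + np.sum(a**n[:, None] * np.cos(2 * np.pi * np.outer(n, f)), axis=0)
XI = -np.sum(a**n[:, None] * np.sin(2 * np.pi * np.outer(n, f)), axis=0)
X = 1.0 / (1.0 - a * np.exp(-2j * np.pi * f))   # closed-form X(f)
print(np.max(np.abs(XR - X.real)), np.max(np.abs(XI - X.imag)))  # both ~0
```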

9.2 Analytic Signals

Analytic signals correspond to causal signals in which time and frequency are exchanged. Their
spectrum has no negative frequency component, and their name derives from the fact that they
represent the restriction to the real axis of an analytic function of a complex variable – that is, they
can be expanded in series in a region which contains that axis. The properties of analytic signals
are deduced from the properties of causal signals by exchanging time and frequency.
Consider the signal x(t) = xR(t) + jxI(t) such that:
X(f) = 0 for f < 0
The functions xR(t) and xI(t) are Hilbert transforms of each other:
$$x_R(t) = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{x_I(t')}{t-t'}\,dt' \qquad (9.16)$$
$$x_I(t) = -\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{x_R(t')}{t-t'}\,dt' \qquad (9.17)$$
The Fourier transform of the real function:
$$x_R(t) = \frac{1}{2}\left[x(t) + \bar{x}(t)\right] \qquad (9.18)$$
is the function XR(f) such that:
$$X_R(f) = \frac{1}{2}\left[X(f) + \overline{X}(-f)\right] \qquad (9.19)$$
that is, XR(f) = X(f)/2 for positive frequencies and XR(f) = X̄(−f)/2 for negative ones.
Similarly:
$$X_I(f) = -\frac{j}{2}\left[X(f) - \overline{X}(-f)\right] \qquad (9.20)$$
Figure 9.2 shows the decomposition of the spectrum of a signal into real and imaginary parts.

Figure 9.2 Spectrum of an analytic signal: X(f) and its decomposition into XR(f) and jXI(f).
Example:
$$x(t) = e^{j\omega t}; \quad x_R(t) = \cos\omega t = \frac{1}{2}\left[e^{j\omega t} + e^{-j\omega t}\right]; \quad x_I(t) = \sin\omega t = -\frac{j}{2}\left[e^{j\omega t} - e^{-j\omega t}\right]$$
Finally, it can be seen that the following equations hold between XR(f) and XI(f):
$$X_I(f) = -jX_R(f) \ \text{for } f > 0; \qquad X_I(f) = jX_R(f) \ \text{for } f < 0$$
That is, XI(f) is obtained from XR(f) by a rotation of the components through π/2. The Hilbert transform thus orthogonalizes the signal components. This is a filtering operation, and the frequency response Q(f) is represented in Figure 9.3.
Example:
$$x_R(t) = \int_0^{\infty} [A(f)\cos(2\pi ft) - B(f)\sin(2\pi ft)]\,df \qquad (9.21)$$
$$x_I(t) = \int_0^{\infty} [A(f)\sin(2\pi ft) + B(f)\cos(2\pi ft)]\,df \qquad (9.22)$$
The properties of continuous analytic signals can be transferred to discrete signals after certain modifications. A discrete signal has a periodic Fourier transform. A discrete analytic signal x(n) deduced from a real signal is a discrete signal whose Fourier transform Xn(f), which has the period fs = 1, is zero for −1/2 < f < 0 (Figure 9.4).

Figure 9.3 Frequency response of a quadrature filter (Q(f) = −j for f > 0, +j for f < 0).

Figure 9.4 Spectrum of a discrete analytic signal and an interpolation filter.

If a discrete signal x(n) is obtained by sampling a continuous analytic signal x(t) at a frequency fs = 1, it is worth noting that the continuous signal can be reconstructed from the discrete values by using a reconstruction filter which preserves only the signal components contained in the band (0, fs), as shown in Figure 9.4. The reconstruction formula is:
$$x(t) = \sum_{n=-\infty}^{\infty} x(n)\,\frac{\sin[\pi(t-n)]}{\pi(t-n)}\,e^{j\pi(t-n)} \qquad (9.23)$$

Sampling does not introduce any degradation to an analytic signal x(t) if its spectrum does not contain any components with frequencies greater than or equal to fs. Thus, the sampling theorem for an analytic signal is:
An analytic signal which does not contain any components with frequencies greater than or equal to fm is wholly determined by the set of its values sampled at time intervals of T = 1/fm.
The set x(n) is decomposed into a real set xR(n) and an imaginary set xI(n), such that:
x(n) = xR(n) + jxI(n)
The corresponding Fourier transforms XnR(f) and XnI(f) are obtained from the Fourier transform Xn(f) by equations (9.19) and (9.20) given above:
$$X_{nI}(f) = -jX_{nR}(f) \ \text{for } 0 < f < \tfrac{1}{2}; \qquad X_{nI}(f) = jX_{nR}(f) \ \text{for } -\tfrac{1}{2} < f < 0$$
The relations between the sets xR(n) and xI(n) are obtained by considering the quadrature filter whose frequency response is given in Figure 9.5. The impulse response of this filter is the set h(n), such that:
$$h(n) = \int_{-1/2}^{0} j\,e^{j2\pi nf}\,df + \int_{0}^{1/2} (-j)\,e^{j2\pi nf}\,df$$
$$h(n) = \frac{2}{\pi n}\sin^2\!\left(\frac{n\pi}{2}\right) \ \text{for } n \neq 0; \qquad h(0) = 0 \qquad (9.24)$$

By applying the set xR(n) to this filter, we obtain the set xI(n); thus:
$$x_I(n) = \frac{2}{\pi}\sum_{\substack{m=-\infty \\ m \neq 0}}^{\infty} \frac{\sin^2(\pi m/2)}{m}\, x_R(n-m) \qquad (9.25)$$

Figure 9.5 Responses of the quadrature filter: frequency response Q(f) and impulse response h(n).

and similarly:
$$x_R(n) = -\frac{2}{\pi}\sum_{\substack{m=-\infty \\ m \neq 0}}^{\infty} \frac{\sin^2(\pi m/2)}{m}\, x_I(n-m) \qquad (9.26)$$

The sets xR(n) and xI(n) are related by the discrete Hilbert transform [4].
Examination of the elements of the set h(n) leads to several observations. Firstly, the fact that every other element is zero implies that, if the set xR(n) also has every other element zero, then so must the set xI(n); further, the sets xR(n) and xI(n) must be interleaved. An example will be given later.
The impulse response of the quadrature filter corresponds to a linear-phase FIR filter as described in Section 5.2. Its frequency response is:
$$Q(f) = -j2\sum_{n=1}^{\infty} h(n)\sin(2\pi nf) \qquad (9.27)$$
In order to realize this filter, the number of coefficients must be limited.
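As an illustration of the truncation, the following Python sketch (NumPy assumed; the half-length P is an arbitrary choice) builds h(n) from equation (9.24) and applies equation (9.25) to a cosine, which should yield the corresponding sine:

```python
import numpy as np

P = 31                                     # half-length of the truncated filter
m = np.arange(-P, P + 1)
h = np.zeros(2 * P + 1)
odd = m % 2 != 0                           # only odd-index taps are nonzero
h[odd] = (2 / (np.pi * m[odd])) * np.sin(np.pi * m[odd] / 2) ** 2

f0 = 0.1
n = np.arange(400)
xR = np.cos(2 * np.pi * f0 * n)
xI = np.convolve(xR, h, mode="same")       # discrete Hilbert transform (9.25)
err = xI[P:-P] - np.sin(2 * np.pi * f0 * n[P:-P])
print(np.max(np.abs(err)))                 # small, limited by the truncation
```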

9.3 Calculating the Coefficients of an FIR Quadrature Filter

A realizable quadrature filter is easily obtained by limiting the number of terms over which summation (9.27) is performed. The frequency response then deviates from the ideal response. In practice, the filter is specified by setting a limit δ on the ripple in a frequency band (f1, f2), as shown in Figure 9.6. A satisfactory FIR filter can be obtained using a low-pass filter, and the results are presented in Chapter 5.
One interesting example is that of the filter whose response is given in Figure 9.7. This filter is called a half-band filter because the pass band represents half of the useful band.
Further, H(0.25) = 0.5, and the response is antisymmetric about the point fc = 0.25 – that is, H(0.25 + f) = 1 − H(0.25 − f). This filter is specified by the transition band Δf and the ripple in the pass and stop bands, which are equal to δ0. Its coefficients can be

Figure 9.6 Mask for a quadrature filter.

Figure 9.7 Frequency response of a half-band filter.

calculated using conventional FIR filter design programs. This type of filter is of interest because the symmetry of its frequency response implies that the coefficients hn are zero for even values of n, except for h0. Thus, for n = 2p, decomposition of H(f) into a sum of terms leads to:
$$h_{2p} = \int_0^{0.25} \cos(4\pi pf)\,df + (-1)^p\left[\int_0^{\Delta f/2} H(0.25+f)\cos(4\pi pf)\,df + \int_0^{\Delta f/2} [H(0.25-f)-1]\cos(4\pi pf)\,df\right]$$
and hence,
$$h_{2p} = 0$$

For a number of coefficients N = 4M + 1, the frequency response is:
$$H(f) = e^{-j4\pi Mf}\,\frac{1}{2}\left[1 + 2\sum_{i=1}^{M} h_{2i-1}\cos[2\pi(2i-1)f]\right] \qquad (9.28)$$

Translation of this response through 0.25 on the frequency axis leads to the function H′(f) such that:
$$H'(f) = H(f - 0.25) = e^{-j4\pi Mf}\,\frac{1}{2}\left[1 - 2\sum_{i=1}^{M} (-1)^i h_{2i-1}\sin[2\pi(2i-1)f]\right]$$
The coefficients h′n of the corresponding filter are:
$$h'_{2i-1} = j(-1)^i h_{2i-1}; \quad h'_{-(2i-1)} = -j(-1)^i h_{2i-1}; \quad 1 \leq i \leq M$$
They have imaginary values. By comparing the expression for H′(f) with equation (9.27) for Q(f), it becomes clear that the set of coefficients an such that:
$$a_{2i-1} = -2(-1)^i h_{2i-1}; \quad a_{-(2i-1)} = 2(-1)^i h_{2i-1}; \quad 1 \leq i \leq M$$


Figure 9.8 An analytic FIR filter: the real signal feeds a delay of 2M periods (real branch) in parallel with a quadrature filter (imaginary branch) to form the analytic signal.

represents the coefficients of a quadrature filter whose ripple equals 2δ0 in the band [Δf/2, 1/2 − Δf/2].
Example: For the specification δ0 = 0.01 and Δf = 0.111, it is found that M = 5 and:
a1 = 0.6283; a3 = 0.1880; a5 = 0.0904; a7 = 0.0443; a9 = 0.0231
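The ripple of this example can be checked numerically with a few lines of Python (NumPy assumed):

```python
import numpy as np

a = {1: 0.6283, 3: 0.1880, 5: 0.0904, 7: 0.0443, 9: 0.0231}
df = 0.111
f = np.linspace(df / 2, 0.5 - df / 2, 1001)       # band of the specification
Q = 2 * sum(an * np.sin(2 * np.pi * n * f) for n, an in a.items())
print(np.max(np.abs(Q - 1)))                      # ~0.02 = 2 * delta_0
```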

The expression H ′ (f ) corresponds to a complex filter which contains two parts: a circuit
producing a delay of 2M elementary periods, and a quadrature filter, as shown in Figure 9.8. The
outputs of these two circuits form the real and imaginary parts of the complex signal. The system
can be said to comprise two branches – one real and the other imaginary. It allows a real signal to
be converted to an analytic signal and is thus an analytic FIR filter.
It should be noted that this property holds true even if the fundamental low-pass filter is not
of the half-band type. In this case, the coefficients with even index do not cancel out. In fact,
the translation through a frequency of 0.25 corresponds to multiplication of the coefficients by a
complex factor, so that the h′n take the values:
$$h'_n = e^{-j(\pi/2)n}\, h_n \qquad (9.29)$$

Under these conditions, the real branch of the system is not a simple delay. A filtering function
is achieved at the same time as the analytic signal is generated.
Thus, circuits with finite impulse response allow an ideal quadrature filter to be approximated
without error in the phase shift but with approximation of the amplitude in the pass band. Circuits
with infinite impulse response, or recursive circuits, provide an alternative approach. By using
pure phase shifters, they allow for the approximation of the quadrature filter with no error in the
amplitude, but with an approximation in the phase.

9.4 Recursive 90∘ Phase Shifters

A recursive phase shifter is characterized by the fact that the numerator and denominator of its Z-transfer function are image polynomials – that is, they have the same coefficients but in reverse order. The properties of phase shifters were introduced in Section 6.3.
It is possible to design a pair of phase shifters such that the output signals have a phase difference which approximates 90° with an error of less than ε in a given frequency band (f1, f2). The calculation techniques are the same as for IIR filters. The procedure for producing a phase difference with elliptic behavior is as follows [5]:

(1) Determination of the order of the circuit:
$$N = \frac{K(k_1)\,K\!\left(\sqrt{1-k^2}\right)}{K(k)\,K\!\left(\sqrt{1-k_1^2}\right)}$$
with the following values for the parameters:
$$k = \frac{\tan(\pi f_1)}{\tan(\pi f_2)}; \qquad k_1 = \left[\frac{1-\tan(\pi\varepsilon)}{1+\tan(\pi\varepsilon)}\right]^2$$
(2) Determination of the zeros zi of the Z-transfer function:
$$A = Sn\!\left[\frac{(4i+1)\,K\!\left(\sqrt{1-k^2}\right)}{2N},\ \sqrt{1-k^2}\right]$$
(where Sn is the elliptic sine function)
$$p_i = -\tan(\pi f_1)\,\frac{A}{\sqrt{1-A^2}}; \qquad z_i = \frac{1+p_i}{1-p_i} \quad \text{for } 0 \leq i \leq N-1$$
Example: Assuming the specification f1 = 0.028, f2 = 0.33 and ε = 1°, it is found that:
tan(πf1) = 0.0875; k = 0.0505; k1 = 0.9657; N = 4.8
By taking N = 5, we obtain:
p0 = −0.0395, z0 = 0.9240
p1 = −0.3893, z1 = 0.4396
p2 = −3.8360, z2 = −0.5864
p3 = −1.0039, z3 = −0.00197
p4 = −0.1509, z4 = 0.7377
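Step (1) can be reproduced with SciPy's complete elliptic integral, as in the following sketch (scipy.special.ellipk takes the parameter m, i.e. the squared modulus; small differences with the quoted values come from rounding):

```python
import numpy as np
from scipy.special import ellipk

f1, f2, eps = 0.028, 0.33, 1.0 / 360.0   # eps = 1 degree in normalized form
k = np.tan(np.pi * f1) / np.tan(np.pi * f2)
k1 = ((1 - np.tan(np.pi * eps)) / (1 + np.tan(np.pi * eps))) ** 2
N = (ellipk(k1**2) * ellipk(1 - k**2)) / (ellipk(k**2) * ellipk(1 - k1**2))
print(k, k1, N)   # ~0.051, ~0.966, ~4.8: take N = 5 as in the example
```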

In forming the circuit, the first three zeros zi are assigned to one branch and the last two to the other. The variation of the phase difference with frequency is given by the function φ(f) represented in Figure 9.9.

Figure 9.9 Characteristic of the 90° phase shifter: the phase difference φ(f) stays within π/2 ± ε over the band (f1, f2).

Recursive phase shifters allow two orthogonal signals to be obtained. It should be noted that in
performing this operation, they also introduce phase distortion which is the same for both signals.
These circuits can be used in modulation and multiplexing equipment.

9.5 Single Side-Band Modulation

Modulation of a signal results in a displacement of the spectrum on the frequency axis. It is sin-
gle side-band (SSB) modulation if, for a real signal, the part of the spectrum which corresponds
to positive frequencies is displaced toward positive frequencies and the part which corresponds
to negative ones is displaced toward negative frequencies. Thus, the signal s(t) = cos 𝜔t has the
modulated signal:

sm (t) = cos(𝜔 + 𝜔0 )t

This operation can be performed by the following procedure:

(1) Form the analytic signal sa(n) = sR(n) + jsI(n) which corresponds to the real signal represented by the set s(n).
(2) Multiply the set sa(n) by the set of complex numbers:

cos(2πnf0) + j sin(2πnf0)

and retain the real part sm(n) of the set thus obtained (as in Figure 9.11). Then,

sm(n) = sR(n) cos(2πnf0) − sI(n) sin(2πnf0)

The signal spectrum evolves as shown in Figure 9.10 and the corresponding circuits are shown
in Figure 9.11.
If the analytic filter is of the FIR type, the set sR (n) is simply the delayed s(n), which is different
from the case of 90∘ recursive phase shifters. The set which corresponds to the modulated signal
sm (n) can be added to other modulated sets to provide, for example, a frequency-multiplexed signal
in telephony.
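The procedure can be sketched in a few lines of Python (NumPy/SciPy assumed); here scipy.signal.hilbert stands in for the analytic filter of Figure 9.11, and the test frequencies are arbitrary:

```python
import numpy as np
from scipy.signal import hilbert

n = np.arange(1024)
f, f0 = 0.05, 0.2
s = np.cos(2 * np.pi * f * n)
sa = hilbert(s)                                  # sR(n) + j sI(n)
sm = np.real(sa * np.exp(2j * np.pi * f0 * n))   # sR cos - sI sin
S = np.abs(np.fft.rfft(sm * np.hanning(len(n))))
print(np.argmax(S) / len(n))                     # ~0.25 = f + f0: single side band
```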

Figure 9.10 Single side-band modulation: spectra S(f), Sa(f), and Sm(f).



Figure 9.11 Single side-band modulation circuit: the branches H1 and H2 produce sR(n) and sI(n), which are multiplied by cos(2πnf0) and −sin(2πnf0) and summed to give sm(n).

9.6 Minimum-Phase Filters


The properties of causal and analytic signals discussed at the beginning of the chapter help to clarify a point concerning the phase characteristics of filters [3]. The frequency response H(f) of a filter is written as:
$$H(f) = A(f)\,e^{-j\phi(f)} \quad \text{with} \quad A(f) = |H(f)| \ \text{ and } \ \phi(f) = -\arg[H(f)]$$
The term A(f) is the amplitude and φ(f) is the phase shift produced by the filter on a sinusoidal signal at frequency f.
If the filter has real coefficients, its impulse response, the set h(n), is real, and consequently:
$$H(f) = \overline{H(-f)}; \quad A(f) = A(-f); \quad \phi(f) = -\phi(-f)$$
The term h(n) is obtained by:
$$h(n) = 2\int_0^{1/2} A(f)\cos[2\pi nf - \phi(f)]\,df \qquad (9.30)$$
As the response cannot precede the application of the signal to the filter, a realizable filter must be causal, with:
h(n) = 0 for n < 0
It follows that, if the response is decomposed into real and imaginary parts:
H(f) = HR(f) + jHI(f)
the functions HR(f) and HI(f) are related by equations (9.11) and (9.12) given in Section 9.1.
Now, a filter is often specified only by constraints on the amplitude:
A²(f) = HR²(f) + HI²(f)
and this leaves the phase shift undefined.
To express this generally, a certain length of time, corresponding to the propagation time through
the system, is required to process a signal. This is characterized by the variation of the phase shift
with frequency. In order to minimize this propagation time, it is necessary to find the minimum
phase-shift characteristic, which removes the lack of definition in the calculation of the filter.
Another approach is to specify a linear phase shift.
A stable and realizable filter has a Z-transfer function, H(Z), whose poles are inside the unit circle and whose zeros may be outside it. Assume Z0 is such a zero and H1(Z) is the function such that:
$$H_1(Z) = \frac{1 - 2\,\mathrm{Re}(Z_0)Z^{-1} + |Z_0|^2 Z^{-2}}{|Z_0|^2 - 2\,\mathrm{Re}(Z_0)Z^{-1} + Z^{-2}}$$

This is the transfer function of a phase shifter of the second order, which was introduced in
Section 6.3 and whose group delay is given by equation (6.45). One can write:
H(Z) = H1 (Z)H2 (Z)
where H2(Z) is a function which is zero at Z0⁻¹ and introduces a smaller phase shift than H(Z). By iteration, it follows that the minimum phase shift is achieved with the function Hm(Z), which is obtained by replacing all the zeros of H(Z) outside the unit circle with their inverses.
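The zero-reflection construction is illustrated by the following Python sketch (NumPy assumed; the test polynomial is arbitrary), which checks that the amplitude |H(f)| is preserved:

```python
import numpy as np

def minimum_phase(b):
    """b: coefficients of H(Z) = b0 + b1 Z^-1 + ... Reflect outside zeros."""
    zeros = np.roots(b)
    outside = np.abs(zeros) > 1
    gain = b[0] * np.prod(np.abs(zeros[outside]))  # keeps |H(f)| unchanged
    zeros[outside] = 1.0 / np.conj(zeros[outside])
    return np.real(gain * np.poly(zeros))

b = np.array([1.0, -2.5, 1.0])        # zeros at 2 and 0.5 (illustrative)
bm = minimum_phase(b)
f = np.linspace(0, 0.5, 512)
z = np.exp(-2j * np.pi * f)           # Z^-1 on the unit circle
H = np.polyval(b[::-1], z)
Hm = np.polyval(bm[::-1], z)
print(np.max(np.abs(np.abs(H) - np.abs(Hm))))   # ~0: same amplitude response
```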
The formulation of the minimum phase condition is that the function:
log[H(Z)] = log[A(Z)] − jφ(Z)
should not have any poles outside the unit circle.
Under these conditions, the functions log[A(f)] and φ(f) are related by equations (9.11) and (9.12), which correspond to the Hilbert transform. They are the Bayard–Bode equations for discrete systems:
$$\log[A(f)] = K - VP\int_{-1/2}^{1/2} \phi(f')\cot[\pi(f-f')]\,df' \qquad (9.31)$$
$$\phi(f) = VP\int_{-1/2}^{1/2} \log[A(f')]\cot[\pi(f-f')]\,df' \qquad (9.32)$$
The constant K is a scaling factor for the amplitude.
Similarly, a function H M (Z) whose zeros are outside the unit circle is said to be maximum phase.

9.7 Differentiator

A differentiator is a quadrature filter whose frequency response is proportional to frequency:
$$H_d(\omega) = D(\omega)\,e^{-j\pi/2} \qquad (9.33)$$
$$D(\omega) = \omega \quad \text{for } 0 \leq \omega \leq \omega_1 \leq \pi$$
If the upper band edge ω1 equals π, the filter is said to be full-band.
The frequency response of the corresponding N-coefficient digital filter is expressed by:
$$H(\omega) = R(\omega)\,e^{-j\left(\frac{\pi}{2} + \frac{\omega(N-1)}{2}\right)} \qquad (9.34)$$
where R(ω) is the real function:
$$R(\omega) = \sum_{i=1}^{P} h_i \sin(i\omega); \quad N = 2P+1 \qquad (9.35)$$
$$R(\omega) = \sum_{i=1}^{P} h_i \sin\!\left[\left(i - \tfrac{1}{2}\right)\omega\right]; \quad N = 2P$$
The corresponding impulse responses have been given in Section 5.2.
Conventional design techniques apply to these devices – amongst others, the least-squares method. The full-band case is particularly simple because N is even, as shown by equation (9.35) above, and the least-squares method leads to the following set of coefficients [6]:
$$h_i = \frac{8}{\pi}\,\frac{(-1)^{i+1}}{(2i-1)^2} \qquad (9.36)$$
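A quick check of equation (9.36) in Python (NumPy assumed; P = 8 is an arbitrary truncation) shows that R(ω) closely approximates ω, with the largest deviation near the band edge ω = π:

```python
import numpy as np

P = 8
i = np.arange(1, P + 1)
h = (8 / np.pi) * (-1.0) ** (i + 1) / (2 * i - 1) ** 2
w = np.linspace(0, np.pi, 512)
R = np.sum(h[:, None] * np.sin((i[:, None] - 0.5) * w), axis=0)  # type-IV R(w)
print(np.max(np.abs(R - w)))   # ~0.08 for P = 8, largest near w = pi
```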

If the desired function D(ω) is proportional to a power of the frequency, the differentiator is said to be of higher order.
The conversion of a real signal into a complex signal often involves an interpolation operation,
particularly in analog-to-digital converters in communication transceivers.

9.8 Interpolation Using FIR Filters


Interpolation consists of calculating signal samples between known samples. In fact, it is a filtering function which can be efficiently realized by FIR filters [7, 8].
Let us assume, as illustrated in Figure 9.12, that the value x(nT + τ) has to be calculated from the sample sequence x(nT) with the help of an FIR filter having N = 2P + 1 coefficients. The delay τ is such that |τ| ≤ T/2.
The filter itself introduces the delay KT, with K an integer, and the output y(n) is required to be:
$$y(n) \approx x[(n-K)T + \tau] \qquad (9.37)$$
The delay can also be expressed through the Z-transfer function. Taking T = 1:
$$H(Z) = \sum_{i=0}^{N-1} a_i Z^{-i} \approx Z^{-K+\tau} \qquad (9.38)$$
and in the frequency domain:
$$\left(\sum_{i=0}^{N-1} a_i\, e^{-ji\omega}\right) e^{j(K-\tau)\omega} \approx 1 \qquad (9.39)$$
Alternatively,
$$e^{-j\tau\omega} \sum_{i=0}^{N-1} a_i\, e^{-j(i-K)\omega} \approx 1 \qquad (9.40)$$
Letting K = P, after the change of variables hi = ai+P:
$$e^{-j\tau\omega} \sum_{i=-P}^{P} h_i\, e^{-ji\omega} \approx 1 \qquad (9.41)$$
and finally:
$$G(\omega) = \sum_{i=-P}^{P} h_i\, e^{-j(i+\tau)\omega} \approx 1; \qquad |\tau| \leq \tfrac{1}{2} \qquad (9.42)$$

Figure 9.12 Interpolation with delay τ.



The interpolator coefficients hi are derived from this expression, and conventional approximation techniques such as the least-squares method can be used.
For systems in which the delay may vary in time, such as synchronization loops, it is helpful to relate the coefficient values to the delay values, so that the interpolator can track the delay evolution. Lagrange interpolation is a suitable approach in this case.

9.9 Lagrange Interpolation


In the frequency domain, Lagrange interpolation corresponds to "max-flat" filtering – that is, the derivatives of the frequency response vanish at the origin. The filter coefficients are then obtained by solving the following linear system:
$$G(0) = 1; \qquad G^{(p)}(0) = 0, \quad 1 \leq p \leq 2P \qquad (9.43)$$
For P = 1, one obtains:
$$h_{-1} + h_0 + h_1 = 1$$
$$(\tau-1)\,h_{-1} + \tau\, h_0 + (\tau+1)\,h_1 = 0 \qquad (9.44)$$
$$(\tau-1)^2 h_{-1} + \tau^2 h_0 + (\tau+1)^2 h_1 = 0$$
which leads to the matrix equation:
$$\begin{bmatrix} 1 & 1 & 1 \\ \tau-1 & \tau & \tau+1 \\ (\tau-1)^2 & \tau^2 & (\tau+1)^2 \end{bmatrix}\begin{bmatrix} h_{-1} \\ h_0 \\ h_1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \qquad (9.45)$$
whose solution is:
$$h_{-1} = \frac{\tau(\tau+1)}{2}; \qquad h_0 = 1-\tau^2; \qquad h_1 = \frac{(\tau-1)\tau}{2}$$
One can observe that the norm of the function H(ω) can be written as:
$$\|H\|_2^2 = \sum_{i=-P}^{P} h_i^2 = 1 - \frac{3}{2}\tau^2 + \frac{3}{2}\tau^4 \qquad (9.46)$$
This shows that the quadratic interpolation error grows with the delay τ and is maximum for |τ| = 1/2.
An efficient implementation is obtained by noting that the matrix equation (9.45) leads to the following expression for the coefficients:
$$h_i = \sum_{j=0}^{N-1} b_{ij}\,\tau^j \qquad (9.47)$$
with b00 = 1 and bi0 = 0 for i ≠ 0. Then, one obtains:
$$Z^P H(Z) = \sum_{i=-P}^{P}\left(\sum_{j=0}^{N-1} b_{ij}\,\tau^j\right) Z^{-i}$$
and, by inverting the summations:
$$Z^P H(Z) = \sum_{j=0}^{N-1} C_j(Z)\,\tau^j \qquad (9.48)$$

Figure 9.13 Implementation of the Lagrange interpolator (Horner evaluation in τ of the filters CN−1(Z), …, C1(Z) and the delay Z−P).

with:
$$C_0 = 1; \qquad C_j(Z) = \sum_{i=-P}^{P} b_{ij}\,Z^{-i}$$

The corresponding implementation diagram is provided in Figure 9.13. It can easily be adapted
to the evolutions of the delay.
The connection with the interpolation formula mentioned in Section 5.6 can be established from
the definition of the filter coefficients.
The general expression of coefficients hi , a solution of (9.45), is obtained by noticing that the
square matrix is of Vandermonde type. Determinants and sub-determinants are zero whenever 2
rows or 2 columns are identical. It follows that they can be expressed as products.
Generalizing to an arbitrary time shift Δ (expressed in sampling periods), the interpolation filter coefficients are written as:
$$a_i = \prod_{j=0;\, j\neq i}^{N-1} \frac{\Delta - j}{i - j}; \quad 0 \leq i \leq N-1 \qquad (9.49)$$
and the interpolated values are:
$$x[(n-\Delta)T] \approx y(n) = \sum_{i=0}^{N-1} a_i\, x[(n-i)T] \qquad (9.50)$$
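A minimal Python sketch (NumPy assumed; the block length and delay are arbitrary test values) of equations (9.49) and (9.50):

```python
import numpy as np

def lagrange_coeffs(delta, N):
    """a_i = prod_{j != i} (delta - j)/(i - j), 0 <= i <= N-1 (9.49)."""
    i = np.arange(N)
    a = np.ones(N)
    for j in range(N):
        mask = i != j
        a[mask] *= (delta - j) / (i[mask] - j)
    return a

N, delta = 4, 1.3                       # 4-point interpolation, shift of 1.3 samples
a = lagrange_coeffs(delta, N)
x = np.sin(2 * np.pi * 0.05 * np.arange(100))
n = 50
y = np.dot(a, x[n - np.arange(N)])      # y(n) ~ x[(n - delta)T], equation (9.50)
print(y, np.sin(2 * np.pi * 0.05 * (n - delta)))   # close agreement
```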

The quality of the interpolation depends on the number of coefficients and, as this number tends toward infinity, the sampling formula (1.57) is recovered, using the identity:
$$\sin \pi t = \pi t \prod_{n\neq 0}\left(1 - \frac{t}{n}\right) \qquad (9.51)$$
which, letting Δ = t/T, leads to:
$$a_i = \prod_{j\neq i} \frac{\Delta-j}{i-j} = \prod_{k\neq 0} \frac{k - (\Delta-i)}{k} = \frac{\sin \pi(\Delta-i)}{\pi(\Delta-i)} \qquad (9.52)$$

In certain applications, such as curve fitting or image processing, data are available in sets or
blocks, and therefore, the values have to be interpolated in the blocks.

9.10 Interpolation by Blocks – Splines


When processing finite sets of data, filters that do not satisfy the Nyquist condition – that is, filters
whose impulse response does not vanish at all integer multiples of the sampling period – can be
employed. However, then, if known samples have to be preserved, some preprocessing is required,
which, in fact, is an inverse filtering operation. Splines are widely used functions of this type [9].

The spline function of degree m is defined by m successive convolutions:
$$B_m(t) = B_0(t) * B_0(t) * \dots * B_0(t) \qquad (9.53)$$
where B0(t) is the unit-length impulse:
$$B_0(t) = \begin{cases} 1 & |t| < 1/2 \\ 0.5 & |t| = 1/2 \\ 0 & |t| > 1/2 \end{cases}$$
Convolution is an integration; thus, a higher degree yields a smoother function, because higher-order derivatives must be considered to find discontinuities. A widely used function is the cubic spline:
$$B_3(t) = \begin{cases} \dfrac{2}{3} - |t|^2 + \dfrac{|t|^3}{2} & 0 \leq |t| < 1 \\[1mm] \dfrac{(2-|t|)^3}{6} & 1 \leq |t| < 2 \\[1mm] 0 & 2 \leq |t| \end{cases} \qquad (9.54)$$

Given a set of N sample values s(nT), 0 ≤ n ≤ N − 1, inverse filtering is carried out first, so that a new set x(n) is obtained, to which the interpolation filter is applied.
With the cubic spline, assuming T = 1, the Z-transfer function of the samples is:
$$B_3(Z) = \frac{Z + 4 + Z^{-1}}{6} \qquad (9.55)$$
The inverse is, in product form:
$$B_3^{-1}(Z) = \frac{6}{2+\sqrt{3}}\cdot\frac{1}{1+(2-\sqrt{3})Z^{-1}}\cdot\frac{1}{1+(2-\sqrt{3})Z} \qquad (9.56)$$
or else:
$$B_3^{-1}(Z) = \frac{6-3\sqrt{3}}{2\sqrt{3}-3}\left[\frac{1}{1+(2-\sqrt{3})Z^{-1}} + \frac{1}{1+(2-\sqrt{3})Z} - 1\right] \qquad (9.57)$$
Of the two factors in (9.56), one is causal and one is anti-causal. The sequence s(n) can be processed by the first factor, giving the output u(n):
$$u(n) = s(n) - (2-\sqrt{3})\,u(n-1) \qquad (9.58)$$
The second factor is applied to u(n) and the output is the desired sequence x(n):
$$x(n) = \frac{6}{2+\sqrt{3}}\,u(n) - (2-\sqrt{3})\,x(n+1) \qquad (9.59)$$
In this case, the calculations must be carried out in reverse order.
The two above recurrences require initial values – namely u(0) and x(N−1) – whose determination depends on the limit conditions of the data block. In general, a symmetrical extension outside the data block is retained – that is, s(−n) = s(n) and s(N−1+k) = s(N−1−k). The periodicity is then 2N − 2, and the series development of the first factor in (9.56) leads to the initial value:
$$u(0) = \sum_{n=0}^{\infty} (\sqrt{3}-2)^n\, s(n)$$
which is:
$$u(0) = \frac{1}{1-(\sqrt{3}-2)^{2N-2}} \sum_{n=0}^{2N-3} (\sqrt{3}-2)^n\, s(n) \qquad (9.60)$$

For large values of N, depending on the level of accuracy required, the summation may be limited to the first terms.
In the other direction, x(N−1) can be calculated directly with the following scheme: start from (9.57), perform series developments of the two terms, and use the corresponding expression of u(n) as a function of s(n) for n = N−1; then, owing to the symmetry of s(n) about N−1, the following initialization is obtained:
$$x(N-1) = \frac{6-3\sqrt{3}}{2\sqrt{3}-3}\,[2u(N-1) - s(N-1)] \qquad (9.61)$$
It is readily verified that the constant signal s(n) = 1 leads to:
$$u(0) = \frac{1}{3-\sqrt{3}} = u(n); \qquad x(N-1) = 1 = x(n)$$
Once the transformed block is available, the interpolation is carried out as:
$$s(t) = \sum_{n} x(n)\, B_m(t - nT) \qquad (9.62)$$
The summation is limited to the terms for which the spline function is not null. As in the previous section, when the degree m grows, this interpolator tends toward the ideal interpolator.
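The prefiltering recursions (9.58)–(9.61) can be sketched as follows in Python (NumPy assumed); the constant-block test reproduces the verification above:

```python
import numpy as np

def spline_prefilter(s):
    N = len(s)
    r = np.sqrt(3.0) - 2.0                     # pole of the causal factor
    n = np.arange(2 * N - 2)
    se = np.concatenate([s, s[-2:0:-1]])       # symmetric extension, period 2N-2
    u = np.empty(N)
    u[0] = np.sum(r**n * se) / (1 - r**(2 * N - 2))   # initial value (9.60)
    for k in range(1, N):                      # causal recursion (9.58)
        u[k] = s[k] + r * u[k - 1]
    x = np.empty(N)
    c = (6 - 3 * np.sqrt(3)) / (2 * np.sqrt(3) - 3)
    x[-1] = c * (2 * u[-1] - s[-1])            # initial value (9.61)
    for k in range(N - 2, -1, -1):             # anti-causal recursion (9.59)
        x[k] = 6 / (2 + np.sqrt(3)) * u[k] + r * x[k + 1]
    return x

print(spline_prefilter(np.ones(8)))            # all ones, as checked in the text
```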

9.11 Interpolations and Signal Restoration


The loss of several samples, adjacent or otherwise, in a signal can be compensated by an inter-
polation operation exploiting the known characteristics of the signal. In particular, when certain
frequencies are missing or are very small, the DFT can be relied upon to estimate the lost temporal
values.
Consider a block of N samples x(n), 0 ≤ n ≤ N − 1, in which P samples x(p), n0 ≤ p ≤ n0 + P − 1,
are unknown. The signal is known to have few high-frequency components in its spectrum and
the DFT outputs y(k), N − P ≤ k ≤ N − 1 are assumed to be null. If the input vector X is split into a
known vector X 1 and an unknown vector X 2 , the DFT output can be expressed by:
$$Y = T_1 X_1 + T_2 X_2 \qquad (9.63)$$
The entries of the matrices T1 and T2 are those of the DFT, properly arranged. Splitting the DFT output vector leads to:
$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \qquad (9.64)$$
The vector Y2 corresponds to the P frequency components which are either null or negligible. Then, the desired missing input vector X2 is given by:
$$X_2 = -T_{22}^{-1}\, T_{21}\, X_1 \qquad (9.65)$$
The restoration is perfect if the vector Y2 is null, and approximate if it contains small residual values. In the presence of non-negligible values, further processing, such as filtering, must be introduced.
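A numerical sketch in Python (NumPy assumed; the block size, signal, and missing positions are arbitrary test choices) of equations (9.63)–(9.65):

```python
import numpy as np

N, P, n0 = 16, 2, 4                     # block size, lost samples, position
n = np.arange(N)
# low-frequency signal: DFT bins N-P..N-1 are exactly zero
x = np.exp(2j * np.pi * n / 8) + 0.5 * np.exp(2j * np.pi * 3 * n / 16)
F = np.fft.fft(np.eye(N))               # DFT matrix
miss = np.arange(n0, n0 + P)            # unknown time indices (X2)
keep = np.setdiff1d(n, miss)            # known time indices (X1)
rows = np.arange(N - P, N)              # DFT outputs known to be (near) zero
T21, T22 = F[np.ix_(rows, keep)], F[np.ix_(rows, miss)]
x2 = -np.linalg.solve(T22, T21 @ x[keep])      # equation (9.65)
print(np.max(np.abs(x2 - x[miss])))            # ~0: perfect restoration
```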

An important special case is the restoration of a block of pixels in an image from the bordering
pixels.
The compression of fixed images or video streams is often based on decomposition into
blocks – for example, blocks of 8 × 8 or 16 × 16 pixels, in combination with the discrete cosine
transform (DCT-2D). During transmission or manipulation, some blocks may be lost or damaged
and some restoration action is needed to minimize the visual impact. Then, the above technique
can apply.
According to the definition given in Section 3.3.4, without the scale factors, for an M × N matrix of real data x(i, j), the DCT delivers a matrix with the same dimensions:
$$y(k,l) = \sum_{i=0}^{M-1}\sum_{j=0}^{N-1} \cos\!\left(\frac{\pi(2i+1)k}{2M}\right)\cos\!\left(\frac{\pi(2j+1)l}{2N}\right) x(i,j) \qquad (9.66)$$

The impact of the discontinuities at the beginning and end of the sequence, mentioned in Section 2.1, is avoided by this transform. A property of most such images is that they have few high-frequency components. This property can be exploited in the restoration of unknown blocks from their surrounding pixels.
The unknown block is an M × N rectangle to be determined from the 2(M + N + 2) surrounding pixels, as illustrated in Figure 9.14a. In the transformed matrix, the M × N high-frequency elements are set to zero, as indicated in Figure 9.14b.
Once the matrices have been scanned and the data are available in vectors, expressions (9.64) and (9.65) apply. The elements of the matrices Tij (1 ≤ i, j ≤ 2) are the elements of the DCT. The term G = −T22⁻¹T21 is called the interpolation mask.
The approach yields particularly simple results because it turns out that every unknown pixel X is obtained from eight neighboring pixels, as indicated in Figure 9.15, by:
$$X = aX_1 + (1-a)X_3 + bX_4 + (1-b)X_2 - abX_5 - a(1-b)X_6 - (1-a)(1-b)X_7 - (1-a)bX_8 \qquad (9.67)$$

The parameters a and b depend on the block dimensions M and N. Table 9.1 lists a set of
numerical values.
For the details of the calculations leading to expression (9.67), and for examples with test images,
see reference [10].

Figure 9.14 (a) Unknown and known pixels; (b) DCT coefficients.

Figure 9.15 Determination of an unknown pixel X from its eight neighbors, arranged X5 X1 X6 / X4 X X2 / X8 X7 X3, with weights a and b.

Table 9.1 Parameters of the interpolation mask.

M     a, b
2     0.7071
3     0.8090  0.5
4     0.8660  0.6340
5     0.9010  0.7225  0.5
6     0.9239  0.7832  0.5995
7     0.9397  0.8264  0.6736  0.5
8     0.9511  0.8580  0.7298  0.5792
9     0.9595  0.8818  0.7731  0.6423  0.5
10    0.9659  0.9001  0.8070  0.6930  0.5658
11    0.9709  0.9145  0.8340  0.7341  0.6205

9.12 Conclusion

The transformation of real signals into complex signals is a filtering operation, and the quadrature
filter involved can have any frequency response. In particular, when its response is proportional to
frequency, it is called a differentiator.
In practice, real–complex conversions are efficiently implemented through interpolation, with
a half-band filter. The mask of the filter is defined from the performance objectives in terms of
frequency distortion and image–band residuals.
Interpolation is a fundamental operation linked to sampling. In theory, it is defined by the sam-
pling formula which corresponds to a linear-phase infinite impulse response filter. In practice, a
linear-phase FIR filter is used, in order to keep known samples untouched. A particularly important
case is Lagrange interpolation, which corresponds to a max-flat filter, whose frequency response
derivatives vanish at the origin.
Block interpolation, as in image processing, can use filter responses that do not preserve known
samples, such as spline functions, which constitute another approximation of the ideal interpolator.
Lost samples or lost blocks of samples in 1D and 2D signal sequences can be efficiently restored
by interpolation techniques.

Exercises

9.1 Calculate the Fourier transform X(f) of the real causal set x(n) such that:
$$x(n) = \begin{cases} 0 & n < 0 \\ a^n & n \geq 0 \end{cases} \quad \text{with } |a| < 1$$
Decompose X(f) into real and imaginary parts.

9.2 Show that the function X(Z) such that:
$$X(Z) = \frac{1}{1 - aZ^{-1}}$$
can be obtained from its real part XR(ω) on the unit circle, given by:
$$X_R(\omega) = \frac{1 - a\cos\omega}{1 - 2a\cos\omega + a^2} \quad \text{with } |a| < 1$$

9.3 Starting from a real signal represented by the set x(n), a complex signal is formed whose real
and imaginary parts are given by:
xR (n) = x(n) cos[2𝜋(n∕4)]
xI (n) = x(n) sin[2𝜋(n∕4)]
What comments can be made about the sets xR (n) and xI (n)?
Is the signal obtained an analytic signal?
A half-band filter is applied to each of the sets xR (n) and xI (n) and the complex signal filtered
in this way is multiplied by the complex set e−j2𝜋n/4 . What operation has been carried out on
the real signal x(n)? Perform this set of operations on the signal x(n) = cos (𝜋n/5).

9.4 Study the effect of coefficient wordlength limitation on the quadrature FIR filter. Following
the procedure described in Sections 5.7 and 5.10 for linear phase filters, find an expression
for the estimate of the coefficient wordlength as a function of the quadrature filter parame-
ters – that is, ripple and transition band.

9.5 Give a simplified expression for the order of the 90∘ IIR phase shifter. Study the effect
of the coefficient wordlength limitation and find a formula for estimating the coefficient
wordlength as a function of the parameters. Check the results in the example given in
Section 9.4.

9.6 Consider a function H(Z) defined by:
$$H(Z) = \left[(1 - Z_0 Z^{-1})(1 - \overline{Z}_0 Z^{-1})\right]^2 \quad \text{with} \quad Z_0 = 0.5(1+j)$$

This is a minimum-phase function. Give the expression for the linear-phase function and
the maximum-phase function which have the same amplitude characteristic. Compare the
impulse responses.

9.7 A real signal with sampling frequency 2fs is converted into a complex signal with sampling frequency fs using the following half-band filter:
$$H(Z) = -0.0506 + 0.2954\,Z^{-2} + 0.5\,Z^{-3} + 0.2954\,Z^{-4} - 0.0506\,Z^{-6}$$
Calculate the filter response at frequencies 0 and fs/8. Derive the ripple and the transition bandwidth.
The filter is included in an IQ modulator fed with the sequence x(n) = sin(nπ/4). Give the expression of the complex output y(n) and show that it has frequency components at the frequencies fs/4 and 3fs/4. Give the amplitudes of these components.

9.8 In a frequency modulation receiver, the discriminator (amplitude/frequency converter) is a five-coefficient differentiator with coefficient vector:
h = [−0.1766, 0.9696, 0, −0.9696, 0.1766]
Give the filter response. What are the pass-band width and the ripple of this filter?

9.9 Quadrature filter based on the DFT.
The complex sequence associated with the N samples of the real signal xr(n) is to be determined. The DFT of the sequence, X(k), is calculated; the values obtained are multiplied by 2, and X(k) is set to zero for N/2 ≤ k ≤ N − 1. The inverse DFT yields the desired complex sequence xc(n).
Give the expression of the impulse response h(n) of the corresponding filter (refer to Section 2.4.1). Compare with the quadrature filter coefficients given by expression (9.24) and explain the differences. A numerical example starts from the sequence:
$$x_r(n) = \cos\left(2\pi(n-1)\,\frac{3.5}{16}\right); \quad 0 \leq n \leq 15$$
which leads to the following table:

n        5         6         7         8         9         10        11
xr(n)    0.8315    −0.3827   −0.9808   0         0.9808    0.3827    −0.8315
xc(n)    0.9565    −0.3827   −0.8558   0         1.1058    0.3827    0.7065
         +j0.6040  +j0.9158  +j0.1960  −j1.0047  −j0.2457  +j0.9158  +j0.4371

Using the first values of h(n), justify the values xc(n) in the vicinity of n = 8.
Comment on the accuracy of the approach for quadrature filter design.

9.10 A sequence is made of the following terms:
$$x(n) = e^{j\left(\frac{n\pi}{4} + \frac{\pi}{3}\right)} + 0.5\,e^{jn3\pi/4}; \quad 0 \leq n \leq 7$$
Verify the DFT values: y(6) = y(7) = 0.
Assuming x(4) and x(5) are unknown, find them using (9.65). Give the expression of the matrix T22 and apply (9.65).

References

1 E. A. Guillemin, Theory of Linear Physical Systems. John Wiley, New York, 1963.
2 B. Picinbono, Principles of Signals and Systems. Artech House Inc., London, 1988.
3 A. Oppenheim and R. Schafer, Digital Signal Processing, Chapter 7. Prentice-Hall, Englewood
Cliffs NJ, 1974.
4 B. Gold, A. Oppenheim and C. Rader, Theory and implementation of the discrete Hilbert trans-
form. Proc. of Symp. Computer Processing in Communications, Vol. 19. Polytechnic Press, New
York, 1970.
5 B. Gold and C. Rader, Digital Processing of Signals, Chapter 3. McGraw-Hill, New York, 1969.
6 G. Mollova, Compact formulas for least squares design of digital differentiators. Electronics
Letters, 35(20), 1695–97, 1999.
7 T. I. Laakso, V. Valimaki, M. Karjalainen and U. K. Laine, Splitting the unit delay—tools for
fractional delay filter design. IEEE Signal Processing Magazine, 13(1), 30–60, 1996.
8 J. J. Fuchs and B. Delyon, Minimum L1-norm reconstruction function for oversampled signals: application to time delay estimation. IEEE Transactions, 46, 1666–73, 2000.
9 M. Unser, Splines: a perfect fit for signal and image processing. IEEE Signal Processing
Magazine, 16(6), 22–38, 1999.
10 Z. Al Kachouh, M. Bellanger, Efficient restoration technique for missing blocks in images. IEEE
Transactions on Circuits and Systems for Video Technology, 13(12), 1182–86, 2003.

10

Multirate Filtering

Multirate filtering is a technique for reducing the calculation rate needed in digital filters and, in
particular, the number of multiplications to be performed per second. As will be shown later, this
parameter is generally regarded as a reflection of the complexity of the system.
In a filter, the number of multiplications M R to be performed per second is given by:
MR = Kfs
where f s is the frequency at which the calculations are made. The parameter f s generally corre-
sponds to the sampling frequency of the signal represented by the numbers to be processed. The
factor K depends on the type of filter and on its performance.
In reducing the value of M R , the factor K can be influenced by choosing the most appropriate
type and structure of a filter and by optimizing the order of that filter to suit the constraints and
required characteristics. Also, f s can be influenced by changes in the sampling frequency during
processing. In many practical cases, the advantages thus obtained are considerable.
The sampling frequency for a real signal must be more than twice its bandwidth, which can vary
during processing. For example, a filtering operation eliminates undesirable components, so the
useful bandwidth is reduced. Once the useful bandwidth has been decreased, the sampling fre-
quency of the signal can itself be reduced. As a result, the sampling frequency can be adapted to
the bandwidth of the signal at each stage of processing, so as to maximize the filter’s computa-
tion speed. Before studying the development and implementation of this fundamental principle,
it is appropriate first to analyze the effect of a change in the sampling frequency on the signal and
its spectrum.

10.1 Decimation and Z-Transform


Sample rate reduction, or decimation, modifies the Z-transform of the signal, as does sample rate
increase, or interpolation. The reduction by a factor of two is illustrated in Figure 10.1, which shows
the decimation symbol and the signal spectrum. The suppression of every other sample is carried
out by adding the initial sequence in which the sign of every other sample has been inverted to
the non-modified initial sequence. The impact of such a sign inversion is a spectral shift of about
half the sampling frequency. Once the spectra are added, the periodicity in the frequency domain
is divided by two.
When the sequence s(n) is split into two interleaved sequences, the Z-transforms of these
sequences take on the following form:
S0 (Z 2 ) = [S(Z) + S(−Z)]∕2; Z −1 S1 (Z 2 ) = [S(Z) − S(−Z)]∕2

S(n) s(2p) s(n) s(2p)


2
xxxxx x .x .x .
(– 1)n 1/2 (b)

(a)
S(f)
S(Z)

S(f – 1/2)

S(– Z)

S0(f)
S(Z) + S(– Z)

S1(f)
S(Z) – S(– Z)
f

(c)

Figure 10.1 Downsampling by 2. (a) Decimation operations; (b) decimation symbol; (c) spectra and
Z-transforms.

The term Z 2 characterizes the doubling of the sampling period, while the term Z −1 reflects the
interleaving operation. The Z-transform of s(n) is recovered as:

S(Z) = S0 (Z 2 ) + Z −1 S1 (Z 2 )

The decomposition and reconstruction formulas, expressed above for a factor of two, can be generalized to any whole number M; this generalization is presented below with the help of Fourier transforms.
Given a signal s(t) whose spectrum S(f) has no component with frequency higher than fm, and assuming that the signal is sampled with period T such that:
1/MT > 2fm
where M is a whole number, let us examine the relationship between the Fourier transforms Si(f) of the sets:
$$s\left[\left(n + \frac{i}{M}\right)MT\right]; \quad i = 0, 1, 2, \dots, M-1$$
From the results in Section 1.2, the Fourier transform of the distribution u0(t),
$$u_0(t) = \sum_{n=-\infty}^{\infty} \delta(t - nMT)$$
is the distribution U0(f) given by:
$$U_0(f) = \sum_{n=-\infty}^{\infty} e^{-j2\pi fnMT} = \frac{1}{MT}\sum_{n=-\infty}^{\infty} \delta\left(f - \frac{n}{MT}\right)$$
The Fourier transform of the distribution ui(t),
$$u_i(t) = \sum_{n=-\infty}^{\infty} \delta\left[t - \left(n + \frac{i}{M}\right)MT\right]; \quad i = 0, 1, \dots, M-1 \qquad (10.1)$$
is the distribution Ui(f) given by:
$$U_i(f) = \sum_{n=-\infty}^{\infty} e^{-j2\pi f(n+i/M)MT} = \frac{1}{MT}\,e^{-j2\pi fiT}\sum_{n=-\infty}^{\infty} \delta\left(f - \frac{n}{MT}\right)$$
or,
$$U_i(f) = \frac{1}{MT}\sum_{n=-\infty}^{\infty} e^{-j2\pi(in/M)}\,\delta\left(f - \frac{n}{MT}\right) \qquad (10.2)$$
As Si(f) (i = 0, 1, …, M − 1) is the convolution product of S(f) with the distribution Ui(f), we have:
$$S_i(f) = \frac{1}{MT}\sum_{n=-\infty}^{\infty} e^{-j2\pi(in/M)}\, S\left(f - \frac{n}{MT}\right) \qquad (10.3)$$

Let us calculate the spectrum SM(f), where:
$$S_M(f) = \sum_{i=0}^{M-1} S_i(f) = \frac{1}{MT}\sum_{n=-\infty}^{\infty} S\left(f - \frac{n}{MT}\right)\sum_{i=0}^{M-1} e^{-j2\pi(in/M)}$$
As the second summation cancels out except for values of n which are multiples of M, this becomes:
$$S_M(f) = \frac{1}{T}\sum_{n=-\infty}^{\infty} S\left(f - \frac{n}{T}\right) \qquad (10.4)$$
which is the spectrum corresponding to a signal sampled at frequency 1/T.
The Si(f) terms can also be expressed as a function of SM(f); in equation (10.3), the summation can be decomposed as follows:
$$S_i(f) = \frac{1}{MT}\sum_{n=-\infty}^{\infty}\sum_{m=0}^{M-1} S\left[f - \left(n + \frac{m}{M}\right)\frac{1}{T}\right] e^{-j2\pi(im/M)}$$
or:
$$S_i(f) = \frac{1}{M}\sum_{m=0}^{M-1} e^{-j2\pi(im/M)}\,\frac{1}{T}\sum_{n=-\infty}^{\infty} S\left(f - \frac{m}{MT} - \frac{n}{T}\right)$$
and finally:
$$S_i(f) = \frac{1}{M}\sum_{m=0}^{M-1} e^{-j2\pi(im/M)}\, S_M\left(f - \frac{m}{MT}\right) \qquad (10.5)$$
Figure 10.2 illustrates the case for M = 4. The spectra Si (f ) correspond to interleaved samples at
frequency 1/MT and the spectrum SM (f ) corresponds to a sample at frequency 1/T. A change in
the sampling frequency will exchange these spectra.
Figure 10.2 Spectra obtained by interleaved samples: S0(f), S1(f), S2(f), S3(f), and S4(f) for M = 4.

It is interesting to note, in Figure 10.2, that retarding the set of sampling pulses causes phase
rotations through multiples of 2𝜋/M for image bands around multiples of the sampling frequency
1/MT.
Addition of all the sets of retarded pulses causes a cancellation of the image bands except around
frequencies which are multiples of 1/T, which becomes the new sampling frequency. This is an
application of the linearity properties of the Fourier transform.
It is also useful to establish relations between the Z-transfer functions of the sequences involved. Let S(Z) be the Z-transform of the set s(nT). By definition:
$$S(Z) = \sum_{n=-\infty}^{\infty} s(nT)\,Z^{-n} \qquad (10.6)$$
The spectrum SM(f) of the signal sampled with period T is obtained by replacing Z with e^{j2πfT}. Then:
$$S_M(f) = S\!\left(e^{j2\pi fT}\right)$$
Decomposing the summation in S(Z) leads to:
$$S(Z) = \sum_{n=-\infty}^{\infty}\sum_{i=0}^{M-1} s(nMT + iT)\,Z^{-(nM+i)}$$
or:
$$S(Z) = \sum_{i=0}^{M-1} Z^{-i}\sum_{n=-\infty}^{\infty} s(nMT + iT)\,Z^{-nM}$$

Defining Si (Z M ) by:


M
Si (Z ) = S(nMT + iT)Z −nM
n=−∞

we obtain:

M−1
S(Z) = Si (Z M )Z −i (10.7)
i=0

The terms si (Z M ) are Z-transforms of the sets s[(n + i/M)MT] for i = 0, 1, …, M − 1. The factor
Z −i reflects the interleaving of these sets.
Now, it is necessary to express Si (Z M ) as a function of S(Z). Substituting Ze−j2𝜋m/M for Z in
equation (10.7), we find:
( j2𝜋m
) M−1

S Ze− M = ej2𝜋mi∕M Z −i Si (Z M )
i=0

and, in matrix form:

⎡ (S(Z)j2𝜋 ) ⎤ ⎡ S0 (Z M ) ⎤
⎢ − ⎥ ⎢ −1 M ⎥
⎢ S Ze M ⎥ ⎢ Z S1 (Z ) ⎥
⎢ . ⎥ −1 ⎢ . ⎥
⎢ ⎥ = TN ⎢ . ⎥
⎢ . ⎥ ⎢ ⎥
⎢ . ⎥ ⎢ . ⎥
⎢ −j2𝜋(M−1)M )⎥ ⎢Z −(M−1) S (Z M )⎥
⎣S(Ze ⎦ ⎣ M−1 ⎦

where TN−1 is the matrix of the inverse DFT. Multiplying the two sides of this equation by T N ,
we get:
M−1e−j2𝜋(im∕M)

Z −i Si (Z M ) = (1∕M) S(Ze−j2𝜋(m∕M) ); 0 ≤ i ≤ M − 1 (10.8a)
m=0

which corresponds to equation (10.5) for frequency responses.


Setting W = e−j2𝜋/M , as we did in Chapter 2, we find:

1 ∑ im i
M−1
Si (Z M ) = W Z (S(ZWm )) (10.8b)
M m=0

Equations (10.7) and (10.8) are fundamental for multirate filtering.


The results obtained, and particularly equations (10.3) and (10.4), are valid for signals s(t) whose
spectrum is not limited to the frequency 1/2MT, but spectrum aliasing then occurs.

10.2 Decomposition of a Low-Pass FIR Filter

Multirate filtering will be introduced first for FIR filters, where it appears naturally. Consider a
low-pass FIR filter which eliminates components with a frequency greater than or equal to the fre-
quency f c in a signal sampled at frequency f s . The filtered signal only requires a sampling frequency
equal to 2f c , and in fact, it is sufficient to provide the output numbers at this frequency.
Figure 10.3 Filter with sample rate reduction and increase.

In an FIR filter of order N, the relation which determines the numbers of the output set y(n) from the set of input numbers x(n) is written as:
$$y(n) = \sum_{i=0}^{N-1} a_i\, x(n-i) \qquad (10.9)$$
Each output number y(n) is calculated from a set of N input numbers by weighted summation with the coefficients ai (i = 0, 1, …, N − 1). Under these conditions, the input and output rates are independent, and a decrease in the output sampling rate by a factor k = fs/2fc, assumed to be an integer, results in a reduction of the computation rate by the same factor.
The same reasoning applies to raising the sampling rate, or interpolation. In this case, the output
rate is greater than the input rate. To show the savings in computation, it is sufficient simply to
regard the rates as being equal by incorporating a suitable amount of null data into the input set.
The independence of the output from the input in FIR filters can be exploited in narrow
band-pass filters, even if the input and output rates are identical, by dividing the filtering operation
into two phases [1]:

(1) Reducing the sampling frequency from the value f s to an intermediate value f 0 such that:

f0 ≥ 2fc

(2) Raising the sampling frequency or interpolating from f 0 to f s .

Figure 10.3 illustrates this decomposition. If the two operations are carried out with similar filters of order N, the number of multiplications MD to be performed per second is given by:
$$M_D = 2Nf_0 \qquad (10.10)$$
This value is to be compared with direct realization by a single filter, which results in the value MR such that:
$$M_R = Nf_s \qquad (10.11)$$
Consequently, decomposition is advantageous as long as k = fs/2fc is greater than 2 – that is, as long as:
$$f_c < \frac{f_s}{4}$$
This approach, therefore, appears to be well suited to narrow band-pass filters.
This approach, therefore, appears to be well suited to narrow band-pass filters.
It should be noted, however, that the filtering function obtained in the two cases is not exactly
the same, and that distortions are introduced by the decomposition, as shown in Figure 10.4. In
effect, the intermediate subsampling at frequency f 0 ≥ 2f c has three consequences:

(1) Aliasing of residual components with frequencies greater than f 0 /2 into the band below f 0 /2.
Harmonic distortion results. Its power BR depends on the attenuation of the sampling rate
reduction filter and is calculated from its transfer function H(f ) by using the results given in
the previous chapters.

Figure 10.4 Frequency response of multirate filter: (a) direct filter response; (b) sample rate reduction filter response; (c) multirate filter response.

For example, if the input signal has a uniform spectral distribution and unit power, the total power B_T of the aliased signal is:

B_T = \frac{1}{f_s} \int_{f_0/2}^{f_s - f_0/2} |H(f)|^2 \, df \qquad (10.12)

If f_1 is the filter pass-band edge, an upper bound on B_T is provided by:

B_T < \sum_{i=0}^{N-1} a_i^2 - \frac{2f_1}{f_s}

The distortion can be assumed to have a uniform spectral distribution, and only the power in the pass band is considered. In this case, we obtain:

B_R < \frac{2f_1}{f_0} \left[ \sum_{i=0}^{N-1} a_i^2 - \frac{2f_1}{f_s} \right] \qquad (10.13)

Allowance must be made for this degradation when calculating the sampling rate reduction
filter [2].
(2) The periodicity in frequency of the response of the sampling rate reduction filter, with period
f 0 , introduces a distortion whose power Bi is a function of the attenuation of the interpolation
filter.

If this filter is the same as the sampling rate reduction filter, with the same assumptions, we obtain:

B_i = \frac{1}{f_s} \int_{f_0/2}^{f_s - f_0/2} |H(f)|^2 \, df
This distortion outside the pass band can be troublesome when other signals need to be added
to the filtered signal.
(3) Cascading two filters increases the ripple in the pass band. For example, the ripple is doubled
if identical filters are used in both operations.
Finally, the subunits of the multirate filter should be designed so that the complete system satis-
fies the overall specifications imposed on the filter [3].
The circuit in Figure 10.3 is simplified if the sampling frequencies of the signal before and after
filtering can be different. The principle can also be applied to high-pass and band-pass filters, pro-
vided, for example, modulation and demodulation stages are introduced.
The principle of decomposition can be extended to the sampling rate reduction subunit and to
the interpolation subunit, which introduces a further advantage. The FIR half-band filter is a par-
ticularly efficient element for implementing these subunits.

10.3 Half-Band FIR Filters


The half-band FIR filter was introduced in Section 9.3. This is a linear phase filter, and the frequency
response H(f ) takes the value 1/2 at the frequency f s /4.
It is antisymmetric about this point – that is, the function H(f ) satisfies the equations:
H(f_s/4) = 0.5; \quad H\left(\frac{f_s}{4} + f\right) = 1 - H\left(\frac{f_s}{4} - f\right) \qquad (10.14)
For a number of coefficients N = 4M + 1,
H(f) = \frac{1}{2} e^{-j2\pi 2Mf} \left[ 1 + 2 \sum_{i=1}^{M} h_{2i-1} \cos[2\pi(2i-1)f] \right] \qquad (10.15)

The coefficients h_n are zero for even values of n, except for h_0. Figure 10.5 illustrates the properties of this filter. The specification is defined by the ripple 𝛿 in the pass and stop bands, and by the width Δf of the transition band. Given these parameters, the formulae in Section 5.7 can be used to estimate the filter parameters. The filter order is:

N \simeq \frac{2}{3} \log\left(\frac{1}{10\delta^2}\right) \frac{f_s}{\Delta f}

By taking the attenuation A_f as:

A_f = 10 \log(1/\delta^2)

and bearing in mind the particular significance of the frequency f_s/4 in this type of filter, one can write:

M \simeq \frac{2}{3} \left( \frac{A_f}{10} - 1 \right) \frac{f_s}{4\Delta f}

Hence, the relationship between the attenuation (in decibels) and the transition band, for a given number of coefficients, can be expressed simply as:

A_f = 10 + 15M \frac{4\Delta f}{f_s} \qquad (10.16)

Figure 10.5 Half-band FIR filter.

This approximation is valid when M exceeds a few units. The coefficients are calculated using the general program for FIR filters, with appropriate data. The relationship between the filter input and output sets is:

y(n) = \frac{1}{2} \left[ x(n-2M) + \sum_{i=1}^{M} h_{2i-1}\, [x(n-2M+2i-1) + x(n-2M-2i+1)] \right] \qquad (10.17)

and M multiplications have to be performed for each element of the output set y(n). It should be
noted that these operations are performed only on elements of the input set with an odd index. It
follows that if such a filter is used to reduce the sampling frequency from f s to f 0 = f s /2, the number
of multiplications to be performed per second is Mf 0 . The same is true if the sampling frequency f s
is increased from f 0 to 2f 0 , which is achieved simply by calculating a sample between two samples
of the input set.
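A sketch of this half-band decimation in Python is given below; h_odd holds the coefficients h1, h3, ..., and only M multiplications are performed per output sample (the multiplication by 1/2 is a simple shift). The function name and argument layout are illustrative assumptions:

import numpy as np

def halfband_decimate(x, h_odd):
    # Relation (10.17), with the output computed at half the input rate;
    # n indexes the center sample x(n - 2M) of each output computation.
    M = len(h_odd)
    y = []
    for n in range(2 * M, len(x) - 2 * M + 1, 2):
        acc = x[n]
        for i, hi in enumerate(h_odd, start=1):     # M multiplications
            acc += hi * (x[n + 2*i - 1] + x[n - 2*i + 1])
        y.append(0.5 * acc)
    return np.array(y)

With filter F3 of Table 10.1, for example, h_odd = [9/16, -1/16] once the integer coefficients are normalized by the center value h0 = 16, so that H(0) = 1.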

Table 10.1 Half-band FIR filter.

Filter    h0      h1      h3       h5     h7      h9
F1           1       1
F2           2       1
F3          16       9      −1
F4          32      19      −3
F5         256     150     −25       3
F6         346     208     −44       9
F7         512     302     −53       7
F8         802     490    −116      33     −6
F9        8192    5042   −1277     429   −116     18

Finally, the number of multiplications to be performed per second in a half-band filter with a
change of sampling frequency is:
M_R = \left[ \frac{2}{3} \log\left(\frac{1}{10\delta^2}\right) \frac{f_s}{\Delta f}\, \frac{1}{4} \right] \frac{f_s}{2} \qquad (10.18)
Example: A group of half-band filters with useful application characteristics [3] is shown in
Table 10.1, which gives the quantized coefficients, the quantization step being taken as unity [4].
The frequency response can be calculated simply from equation (10.15). Filters F4, F6, F8, and F9
correspond to a unit value of the parameter 4Δf/f_s, with ripples of 37, 50, 67, and 79 dB, respectively.
Filters F2, F3, and F5 have monotonic responses.
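For instance, the response of filter F3 can be evaluated from equation (10.15) with a few lines of Python; the normalization of the integer coefficients by the center value h0 = 16 is an assumption consistent with H(0) = 1:

import numpy as np

h_odd = np.array([9.0, -1.0]) / 16.0      # F3 of Table 10.1, normalized
f = np.linspace(0.0, 0.5, 501)            # normalized frequency, fs = 1
HR = 0.5 * (1 + 2 * sum(h * np.cos(2 * np.pi * (2*i - 1) * f)
                        for i, h in enumerate(h_odd, start=1)))
print(HR[0], HR[250])                     # 1.0 at f = 0 and 0.5 at f = 1/4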
The advantages of the particular filter structure described in this section can be applied to general
multirate filtering.

10.4 Decomposition with Half-Band Filters

The properties of the elementary half-band filter can be used to produce the multirate filter design shown in Figure 10.6. The intermediate frequency f_0 is related to the sampling frequency f_s by a power of two:

f_s = 2^P f_0 \qquad (10.19)
The sampling frequency is reduced or increased by a cascade of P half-band filters. The complete
unit comprises a basic filter which operates at frequency f 0 and has a cascade of half-band filters
on either side.
The overall low-pass filter is specified by the following parameters:
(1) Ripple in the pass band: 𝛿 1
(2) Ripple in the stop band: 𝛿 2
(3) Width of the transition band: Δf
(4) Edge of the pass band: f 1
(5) Edge of the stop band: f 2
To calculate the half-band filters, their specifications have to be defined. The ripple in the pass band is assumed to be divided between the half-band filters and the basic filter. Also, each filter has to have a ripple in the stop band smaller than 𝛿_2. As a result, for each half-band filter, the ripple 𝛿_0 is given by:

\delta_0 = \min\left\{ \frac{\delta_1}{4P},\ \delta_2 \right\} \qquad (10.20)

Figure 10.6 Decomposition with half-band filters.



Figure 10.7 Transition band of the first half-band filter.

The first half-band filter of the cascade can be determined if its transition band Δf_1 is fixed (Figure 10.7). To fix Δf_1, it must be taken that the role of the first filter is to eliminate those components of the signal which can be folded into the useful band after halving the sampling frequency. Thus:

\Delta f_1 = \frac{f_s}{2} - 2f_2 \qquad (10.21)
From equation (10.18), the number of multiplications M_{C1} to be performed per second in the first filter is:

M_{C1} = \frac{2}{3} \log\left(\frac{1}{10\delta_0^2}\right) \frac{f_s}{\Delta f_1}\, \frac{1}{4}\, \frac{f_s}{2} \qquad (10.22)

The same approach can be taken for the other filters in the cascade, and it can be shown that the
total number of multiplications is estimated as:
M_c \approx \frac{1}{3} \log\left(\frac{1}{10\delta_0^2}\right) f_s \qquad (10.23)
To estimate the volume of calculations to be performed per second in the complete filter, the
order N of the basic filter must be determined:
N \approx D\left(\frac{\delta_1}{2}, \delta_2\right) \frac{1}{\Delta f}\, \frac{f_s}{2^P} \qquad (10.24)

with:

D\left(\frac{\delta_1}{2}, \delta_2\right) = \frac{2}{3} \log\left(\frac{1}{5\delta_1\delta_2}\right)
The values of the parameters showing the complexity of the complete filter are, finally:
M_R = f_s \left[ D(\delta_0) + \frac{1}{2^{P+1}}\, \frac{f_s}{\Delta f}\, D\left(\frac{\delta_1}{2}, \delta_2\right) \right] \qquad (10.25)
Example: Consider a narrow low-pass filter defined by the following:

fs = 1; f2 = 0.05; Δf = 0.025; 𝛿1 = 0.01; 𝛿2 = 0.001

The parameters have the values:

P = 3; \quad \delta_0 = \min\{0.01/12;\ 0.001\} = 0.00083; \quad D(\delta_0) = 3.3
D(\delta_1/2, \delta_2) = 2.76; \quad k = 0.8; \quad S(k) = 1.6

Thus: M_R = 6.2. A direct realization results in a filter of order N = 110, which corresponds to the value M_R = 55.

In order to optimize multirate processing in cases more general than that of a single filter, it is important to emphasize the phase-shift function.

10.5 Digital Filtering by Polyphase Network

The phase relations between different samples of the same signal were examined in detail in
Section 10.1. The results will be used to analyze multirate filtering from this point of view [5].
Assume that the sampling frequency f_s is reduced by a factor N. Let X(Z) be the Z-transform of the input set x(n); the Fourier transform is obtained by replacing Z with e^{j2\pi f/f_s}, forming X(e^{j2\pi f/f_s}). The output set y(Nn) sampled at frequency f_s/N has Y(Z^N), a function of Z^N, as its Z-transform. Consequently, if phase-shifting circuits are involved in this operation, their transfer function is also a function of the variable Z^N and can be calculated from the overall filter function.
The structure of a phase shifter will be discussed initially for the single element making up the
half-band FIR filter, which is defined by equation (10.17). This relation can be rewritten as:
y(n) = \frac{1}{2} \left[ x(n-2M) + \sum_{i=1}^{2M} a_i x(n-2i+1) \right] \qquad (10.26)

in which:

a_i = h_{2M-2i+1} = a_{2M-i+1} \quad \text{for } 1 \le i \le M
The corresponding Z-transfer function is:
H(Z) = \frac{1}{2} \left[ Z^{-2M} + Z^{-1} \sum_{i=0}^{2M-1} a_{i+1} Z^{-2i} \right] \qquad (10.27)

or alternatively:
H(Z) = \frac{1}{2} \left[ H_0(Z^2) + Z^{-1} H_1(Z^2) \right] \qquad (10.28)
The corresponding frequency response is:
H(f) = \frac{1}{2} \left[ e^{-j2\pi(f/f_s)2M} + e^{-j2\pi(f/f_s)} H_1(f) \right] \qquad (10.29)
The function H 0 (f ) corresponds to a delay, and so is the characteristic of a purely linear phase
shifter.
Because the coefficients are symmetrical, H 1 (f ) is also a linear phase function. Since this part of
the filter is operating at the frequency f s /2, H 1 (f ) displays the periodicity f s /2. As the number of
the coefficients is even, we can use the results from Section 5.2, and write:
H_1(f) = \exp\left(-j2\pi f \left(M - \frac{1}{2}\right) \frac{2}{f_s}\right) |H_1(f)| \qquad (10.30)
or:

H_1(f) = e^{-j2\pi(f/f_s)2M}\, e^{-j\phi(f)}\, |H_1(f)| \qquad (10.31)

The function 𝜙(f) is linear and has the periodicity f_s/2. Consequently, it is expressed by:

\phi(f) = \pi \left( \left[ \frac{2f}{f_s} + \frac{1}{2} \right] - \frac{2f}{f_s} \right) \qquad (10.32)

where [x] represents the largest whole number contained in x.


Regarding the amplitude, using equation (10.8) and the symmetry of H(f), we get:

e^{-j2\pi f/f_s}\, H_1(f) = H(f) - H\left(\frac{f_s}{2} - f\right) \qquad (10.33)
Introducing the antisymmetry with respect to the frequency f_s/4 shown in Section 10.3, we find:

|H_1(f)| = 2|H(f)| - 1; \quad 0 \le f \le f_s/4

The ripple is double with respect to H(f ) and the function is null at f = f s /4 – the frequency which
corresponds to a change in phase of 𝜋.
Figure 10.8 shows the functions |H 1 (f )| and 𝜙(f ) which characterize the filter.
The phase Φ(f ) such that:

Φ(f ) = 𝜙(f ) + 2𝜋(f ∕fs ) (10.34)

is constant and has the value 0 or 𝜋.


Finally, the circuit whose frequency response is e−j2𝜋f∕fs H1 (f ) approximates a pure linear phase
shifter in the pass and stop bands. Its number of coefficients and its complexity depend on the
degree of approximation – that is, on the transition band Δf and on the amplitude 𝛿 of the ripple.
These results are similar to those of Section 9.3.
The half-band filter appears as a network with two branches, as in Figures 10.9 and 10.10.
The overall response corresponds to the sum of the responses of the two branches as shown in
Figure 10.11.
If the sign of function H 1 (f ) is changed, then a high-pass filter is obtained, as shown in
Figure 10.11. Thus, a pair of filters is determined by the calculations for one of them. Let B0 (z) and
B1 (z) be the transfer functions of these filters. The corresponding system is shown in Figure 10.10.

Figure 10.8 Function H1(f) for half-band filter.

Figure 10.9 Half-band filter with phase shifters.

Figure 10.10 Filter bank with two filters.

Figure 10.11 Amplitude of half-band filter frequency response.

It is characterized by the matrix equation:

\begin{bmatrix} B_0(Z) \\ B_1(Z) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} Z^{-2M} \\ Z^{-1} H_1(Z^2) \end{bmatrix} \qquad (10.35)
It is worth noting that the matrix of the discrete Fourier transform of order 2 appears in the
expression. The generalization to a bank of N filters leads to a DFT of order N. This is presented in
Section 10.8.
It should also be noted that by changing the sign of every other coefficient in H 1 (f ), the frequency
response is displaced by f s /4 and the quadrature filter from Section 9.3 is obtained.
The results obtained for the half-band filter can be generalized quite simply by using an FIR filter to reduce or increase the sampling frequency by a factor of N. Assume H(Z) is the Z-transfer function of such a filter. By assuming that it has KN coefficients, we can write:

H(Z) = \sum_{i=0}^{KN-1} a_i Z^{-i} = \sum_{n=0}^{N-1} Z^{-n} H_n(Z^N) \qquad (10.36)

with:

H_n(Z^N) = \sum_{k=0}^{K-1} a_{kN+n} (Z^{-N})^k

This filter can be implemented by a network with N paths which is called a polyphase network
because each path has a frequency response which approximates that of a pure phase shifter. The
phase shifts are constant in frequency and are whole multiples of 2𝜋/N. When there is a change
in sampling frequency by a factor of N, the circuits in the different paths of the network operate at
the frequency f s /N.
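A sketch of such a polyphase decimator is given below; the helper names are illustrative, and the result is checked against direct filtering followed by decimation:

import numpy as np

def polyphase_decimate(x, a, N):
    # Path n applies the coefficients a_{kN+n} of (10.36) to the
    # sub-sequence x(mN - n); every path works at the low rate fs/N.
    y = None
    for n in range(N):
        h_n = a[n::N]                                   # H_n(Z^N)
        x_n = np.concatenate([np.zeros(n), x])[::N]     # x(mN - n)
        y_n = np.convolve(x_n, h_n)
        y = y_n if y is None else y[:len(y_n)] + y_n[:len(y)]
    return y

rng = np.random.default_rng(1)
x, a = rng.standard_normal(64), rng.standard_normal(12)  # KN taps, N = 4
y_ref = np.convolve(x, a)[::4]                           # direct + decimation
y_pp = polyphase_decimate(x, a, 4)
m = min(len(y_ref), len(y_pp))
print(np.max(np.abs(y_ref[:m] - y_pp[:m])))              # close to zero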

As infinite impulse response filters have greater selectivity than finite impulse response filters, it
is important to study multirate filters with recursive elements.

10.6 Multirate Filtering with IIR Elements

The basic technique for calculating a multirate filter with IIR elements is to perform the same type
of decomposition on the transfer function H(Z) of the overall IIR filter as was used to produce
equation (10.36). The function H(Z) is assumed to be a rational fraction in which the denominator
and the numerator have the same degree.
Such a decomposition is obtained by finding the poles of H(Z):

H(Z) = a_0 \frac{\prod_{k=1}^{K}(Z - Z_k)}{\prod_{k=1}^{K}(Z - P_k)} \qquad (10.37)

From the identity:

Z^N - P_k^N = (Z - P_k)\left(Z^{N-1} + Z^{N-2} P_k + \dots + P_k^{N-1}\right) \qquad (10.38)

one can write:

H(Z) = a_0 \frac{\prod_{k=1}^{K}(Z - Z_k)\left(Z^{N-1} + P_k Z^{N-2} + \dots + P_k^{N-1}\right)}{\prod_{k=1}^{K}\left(Z^N - P_k^N\right)}

which, in another form, is:

H(Z) = \frac{\sum_{i=0}^{KN} a_i Z^{-i}}{1 + \sum_{k=1}^{K} b_k Z^{-Nk}}

Thus:

H_n(Z^N) = \frac{\sum_{k=0}^{K} a_{kN+n} Z^{-Nk}}{1 + \sum_{k=1}^{K} b_k Z^{-Nk}} \qquad (10.39)

H_n(Z^N) = \frac{N_n(Z^N)}{D(Z^N)} \qquad (10.40)
Each path of the polyphase network is thus determined. They all have the same recursive part
and are distinguished by the non-recursive one as shown in equation (10.40). In principle, when
compared with the previous section, the difference is that the individual IIR phase shifters obtained
do not have linear phase.
For realization purposes, it is worth pointing out that the poles are raised to power N, which is
highly advantageous because they are spread inside the unit circle.
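This effect is easily visualized numerically. In the Python fragment below (the pole positions are arbitrary choices), each pole is raised to the power N and the resulting denominator, a polynomial in Z^{-N}, has its roots much closer to the origin:

import numpy as np

N = 3
poles = np.array([0.9 * np.exp(1j * 0.5), 0.9 * np.exp(-1j * 0.5)])
den = np.poly(poles)            # denominator of H(Z): 1 + b1 Z^-1 + b2 Z^-2
den_N = np.poly(poles ** N)     # denominator in Z^-N, identity (10.38)
print(den.real, den_N.real)
print(np.abs(poles[0]) ** N)    # 0.729: the poles move toward the origin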

10.7 Filter Banks Using Polyphase Networks and DFT

A discrete Fourier transform computer is a bank of filters (Section 2.4) which are suitable for
multirate filtering. However, it should be noted that filtering functions achieved in this way have
significant overlap. To improve the discrimination between the components of the signal, the
numbers are weighted before applying the discrete Fourier transform. The weighting coefficients
are samples of functions called spectral analysis windows. The production of filter banks by

means of a polyphase network and a DFT represents the generalization of such spectral analysis
windows [6, 7].
Assume that a bank is to be created with N filters which cover the band [0, f_s] and are obtained by shifting a basic filter function along the frequency axis by mf_s/N, with 0 ≤ m ≤ N − 1.
If H(Z) is the basic filter Z-transfer function, a change in frequency by mf_s/N appears as a change in the variable from Z to Ze^{j2\pi m/N}. That is, the filter with index m has a transfer function B_m(Z) given by:

B_m(Z) = H(Z e^{j2\pi m/N})

By applying the decomposition of H(Z) introduced in the earlier sections, this becomes:

B_m(Z) = \sum_{n=0}^{N-1} Z^{-n} e^{-j2\pi mn/N} H_n(Z^N)

By allowing for the fact that the functions H_n(Z^N) are the same for all the filters, a factorization can be introduced which results in the following matrix equation:

\begin{bmatrix} B_0 \\ B_1 \\ \vdots \\ B_{N-1} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & W & W^2 & \cdots & W^{N-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & W^{N-1} & W^{2(N-1)} & \cdots & W^{(N-1)^2} \end{bmatrix} \begin{bmatrix} H_0(Z^N) \\ Z^{-1} H_1(Z^N) \\ \vdots \\ Z^{-(N-1)} H_{N-1}(Z^N) \end{bmatrix} \qquad (10.41)

where W = e^{-j2\pi/N}.
The square matrix is the matrix of the discrete Fourier transform. The bank of filters is realized by forming a cascade of the polyphase network and a discrete Fourier transform. This is presented in Section 10.8.
The operation of this device is illustrated in Figure 10.12 which shows, for the case where N = 4,
the phase shifts introduced at the various points in the system so as to preserve only the signal in
the band [1/2NT, 3/2NT].
The polyphase network has the effect of correcting, in the useful part of the elementary band
[1/2NT, 3/2NT], for the interleaving of the numbers at the output of the discrete Fourier transform
computer. This prevents overlap between the filters and leads to the filter function in Figure 10.13.
This function depends only on the basic filter H(Z), which can be an FIR or IIR filter, and which can
be specified so that the filters in the bank have no overlap or have a crossover point, for example,
at 3 dB or 6 dB.
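A compact Python sketch of such an analysis bank is given below; the prototype h and the sizes are left as inputs, and np.fft.fft across the branch outputs realizes the DFT matrix of (10.41) with W = e^{-j2π/N}:

import numpy as np

def polyphase_dft_analysis(x, h, N):
    # Branch n applies Z^{-n} H_n(Z^N) at the rate fs/N; the DFT across
    # the N branch outputs yields the N sub-band signals of (10.41).
    L = len(x) // N
    v = np.zeros((L, N))
    for n in range(N):
        h_n = h[n::N]                                   # H_n(Z^N)
        x_n = np.concatenate([np.zeros(n), x])[::N]     # delay + decimation
        v[:, n] = np.convolve(x_n, h_n)[:L]
    return np.fft.fft(v, axis=1)   # rows: low-rate time; columns: sub-bands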

Figure 10.12 Phase shifts in a filter bank.



Figure 10.13 Filtering function of a filter bank.

The impulse response of the basic filter is the spectral analysis window of the system. If the filters
are specified to have no frequency overlap, the sampling frequency at the output of the filters, or at
the input according to the method of use, can have the value 1/NT and the overall calculation can
be performed at this frequency.
If N is assumed to be a power of 2, the fast Fourier transform algorithm can be used to calculate the discrete Fourier transform, and the number of real multiplications M_R to be performed during a sampling period in the complete system is:

M_R = N \cdot 2K + 2N \log_2 \frac{N}{2} = 2N \left[ K + \log_2 \frac{N}{2} \right] \qquad (10.42)
This value can be compared with the value 2KN 2 required by N IIR filters of the same order
operating at frequency 1/T. An application in telecommunications will be discussed in a later
chapter.
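For example, with N = 64 sub-bands and K = 4 coefficients per polyphase path (assumed figures), equation (10.42) gives:

from math import log2

N, K = 64, 4
MR = 2 * N * (K + log2(N / 2))   # equation (10.42), per sampling period
direct = 2 * K * N ** 2          # N IIR filters of the same order at rate 1/T
print(MR, direct)                # 1152 against 32768 multiplications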

10.8 Conclusion

FIR filters have the property that the sampling frequencies at the input and output are indepen-
dent. This can be exploited to adapt the computation rate to the bandwidth of the signal during
processing. This property can be extended to recursive structures through the use of a suitable
transformation. Phase shifting is the basic function involved.
Multirate filtering can be applied to narrow-band filters. It can provide considerable savings in
computation when there is a factor of more than an order of magnitude between the sampling
frequency and the pass band of the filter, as often occurs in practice.
The use of these techniques requires more detailed analysis of the processing. The limitations
on their use result primarily from complications in the computation sequencing produced by the
cascading of stages operating at different frequencies. This point should be examined carefully for
each potential application of multirate filtering so that an excessive increase in the control unit or
in the instruction program does not outweigh the computational gains.

Exercises

10.1 Give the number of bits assigned to the coefficients of the half-band filters in Table 10.1.
Use the results in Section 5.8 to evaluate the extra ripple introduced in the half-band filters
by limiting the number of bits representing the coefficients. Test this evaluation on the filters
F 6 , F 8 , and F 9 .

10.2 Estimate the number of coefficients of the three filters in the cascade in the example in
Section 10.4.
As each multiplication result is rounded, analyze the round-off noise produced by the reduc-
tion in sampling rate and give an expression for its power. Perform the same analysis for an
increase in sampling rate and the basic filter. Compare the results with the direct realization.
Give an estimate of the aliasing distortion.

10.3 A filter for a telephone channel has a pass band which extends from 0 to 3400 Hz with a rip-
ple of less than 0.25 dB. Above 4000 Hz, the attenuation is greater than 35 dB. For a sampling
frequency f s = 32 kHz, design a multirate filter as a cascade of half-band filters. Compare
the number of multiplications and additions to be performed per second and the size of the
memory with the values obtained for direct FIR filtering.
The signal applied to the filter is composed of numbers with 13 bits, and the computations
are performed using 16-bit registers. Evaluate the power of the round-off noise with a reduc-
tion and an increase in the sampling rate.

10.4 A discrete Fourier transform computer is a bank of uniform filters with characteristics as
shown in Section 2.4. Study the phase shifts introduced in the odd Fourier transforms and in
the doubly odd Fourier transforms in Section 3.3. What are the characteristics of the banks
of filters so obtained?

10.5 Consider a bank of two filters to be produced using the IIR filter of order 4 given as an
example in Section 7.2. The zeros and the poles of the upper half-plane have coordinates
(Figure 7.5):
Z1 = −0.816 + j0.578; Z2 = −0.2987 + j0.954
P1 = 0.407 + j0.313; P2 = 0.335 + j0.776
Using the equations in Section 10.6, calculate the Z-transfer functions of the polyphase net-
work paths. Give the coordinates of the poles and the zeros in the complex plane.
Use the results of Chapter 7 to determine the effect on the frequency response of limiting
the number of bits in the coefficients of the denominator of the transfer function. Compare
with the direct realization.
Draw the circuit diagram of this bank of two filters and determine the number of
multiplications required, assuming that the sampling frequency at the output of each filter
is half the value at the input.

References

1 M. Bellanger, J. Daguet and G. Lepagnol, Interpolation, extrapolation and reduction of computation speed in digital filters. IEEE Transactions, 22(4), 1974.
2 F. Mintzer and B. Liu, Aliasing error in the design of multirate filters. IEEE Transactions, 26(2),
1978.
3 R. E. Crochière and L. R. Rabiner, Chapter 5: Multirate Digital Signal Processing, Prentice-Hall
Inc., Englewood Cliffs, NJ, 1983.
4 D. J. Goodman and M. J. Carey, Nine digital filters for decimation and interpolation. IEEE Trans-
actions, 25(4), 1977.

5 M. Bellanger, G. Bonnerot and M. Coudreuse, Digital filtering by polyphase network: application to sample rate alteration and filter banks. IEEE Transactions, 24(2), 1976.
6 P. P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice Hall, 1993.
7 N. J. Fliege, Multirate Digital Signal Processing, John Wiley, Chichester, 1994.

11

QMF Filters and Wavelets

The compression of certain signals, such as speech, sound, or images, involves sub-band decom-
position combined with sampling rate reduction and reconstruction from sub-bands after storage
or transmission. The simplest approach to these operations is to use banks of two filters [1].

11.1 Decomposition into Two Sub-Bands and Reconstruction


The block diagram of the system is given in Figure 11.1. The signal x(n) to be analyzed is fed to
a set of two filters – namely, a low-pass filter H 0 (Z) and a high-pass filter H 1 (Z) whose outputs
are decimated by factor two. They are called analysis filters. The reconstruction is performed from
two sequences in which every other sample is null, filtered by a low-pass filter G0 (Z) for one and a
high-pass filter G1 (Z) for the other, which are called synthesis filters.
As explained in Section 10.1, due to undersampling at analysis filter output, the image bands
which are added to the useful signal are expressed by H 0 (−Z)X(−Z) and H 1 (−Z)X(−Z). These
undesired signals disappear at the output of the synthesis filters if the following condition is met:
G0 (Z)H0 (−Z) + G1 (Z)H1 (−Z) = 0 (11.1)
Then, it is sufficient to have the same filters in both subsets and the relations:
G0 (Z) = H1 (−Z); G1 (Z) = −H0 (−Z) (11.2)
The reconstruction condition takes the form:
H0 (Z) H1 (−Z) − H1 (Z) H0 (−Z) = Z −K (11.3)
K is the delay incurred by the signal in the process.
Next, the coefficients have to be calculated, depending on the filter type and the frequency
domain specifications. The number of coefficients controls the level of sub-band separation.
The simplest approach, which minimizes computational load, consists of selecting identical
linear phase filters.

11.2 QMF Filters


As pointed out in the previous chapter, a two-branch polyphase network yields a low-pass filter
and a high-pass filter as well, with no additional calculations. Then, decimation can take place at
input and the system block diagram is given in Figure 11.2.

Figure 11.1 Bank of 2 filters for decomposition and reconstruction.

Figure 11.2 QMF filter.

The transfer functions H 0 (Z 2 ) and H 1 (Z 2 ) represent the polyphase decomposition of prototype


filter H(Z) – that is,

H(Z) = H0 (Z 2 ) + Z −1 H1 (Z 2 ) (11.4)

Next, it is necessary to determine the conditions to be met by the prototype filter so that the basic relations (11.1) and (11.2) hold. The system transfer function is expressed by:

T(Z) = Z^{-1} H_1(Z^2)\, H_0(Z^2) \qquad (11.5)

The two polyphase components are linked to the prototype filter by (see Section 10.1):

H_0(Z^2) = \frac{1}{2}[H(Z) + H(-Z)]; \quad Z^{-1} H_1(Z^2) = \frac{1}{2}[H(Z) - H(-Z)]

and the transfer function is:

T(Z) = \frac{1}{4}\left[H^2(Z) - H^2(-Z)\right] \qquad (11.6)
It can easily be shown that in order to obtain the sign inversion needed for reconstruction in
(11.2), it is necessary to choose a prototype filter with an even number of coefficients: N = 2P. The
frequency response of such a filter is written as (see Section 5.2):

H(f ) = e−j2𝜋f(P−1∕2) 2HR (f ) (11.7)

where H R (f ) is a real even function. Also:


H\left(f - \frac{1}{2}\right) = e^{-j2\pi f(P-1/2)}\, e^{j\pi(P-1/2)}\, 2H_R\left(f - \frac{1}{2}\right) \qquad (11.8)
In such conditions, on the unit circle, we get:
\frac{1}{4}\left[H^2(Z) - H^2(-Z)\right]_{Z=e^{j2\pi f}} = e^{-j2\pi f(2P-1)} \left[ H_R^2(f) + H_R^2\left(\frac{1}{2} - f\right) \right] \qquad (11.9)
The reconstruction condition is:
H_R^2(f) + H_R^2\left(\frac{1}{2} - f\right) = 1 \qquad (11.10)
The phase term in (11.9) gives the system delay – that is, the delay for decomposition and reconstruction – namely K = 2P − 1.
The term QMF (quadrature mirror filter) stems from the polyphase decomposition of the prototype filter with its even number of coefficients. In fact, as explained in Chapter 5, such a filter is an interpolator, which implies that it introduces a delay that is an odd multiple of half the sampling period. Hence the decomposition (see Section 10.1):

H(Z) = Z −1∕2 H1∕2 (Z 2 ) + Z −3∕2 H3∕2 (Z 2 )

and, in simpler form:

H(Z) = Z −1∕2 [H0 (Z 2 ) + Z −1 H1 (Z 2 )] (11.11)

With respect to polyphase components, we find:

H0 (Z 2 ) = Z 1∕2 [H(Z) + jH(−Z)]; H1 (Z 2 ) = Z 1∕2 [H(Z) − jH(−Z)] (11.12)

and the frequency response:


|H_0(f)| = \left| H(f) + jH\left(\frac{1}{2} - f\right) \right| \qquad (11.13)
The base band and the image band are in quadrature. By extension, the name QMF applies to all
banks of two filters used for decomposition and reconstruction.
Next, the coefficients have to be calculated. Relation (11.10) shows that the transfer function H^2(Z) is a half-band (Nyquist) FIR filter with an odd number of coefficients, and that H(Z) is a low-pass half-Nyquist filter. The calculation starts with a mask specifying the pass-band edge f_1, the stop-band edge f_2, the ripples 𝛿_1 and 𝛿_2, and imposing the amplitude √2/2 at frequency 1/4. In order for H^2(f) to meet condition (11.10) with ripple 𝛿, it is sufficient to take:

f_2 = \frac{1}{2} - f_1; \quad H(0) = 1; \quad \delta_1 = \frac{\delta}{2}; \quad \delta_2 = \sqrt{\delta}
Example:
A low-pass filter has the following parameters:

\Delta f = 0.24; \quad f_1 = 0.13; \quad f_2 = 0.37; \quad \delta = 0.01

Calculated coefficients:

h_1 = h_8 = 0.015235; \quad h_2 = h_7 = -0.085187
h_3 = h_6 = 0.081638; \quad h_4 = h_5 = 0.486502

The two branches of the polyphase network obtained, H_0(Z^2) and H_1(Z^2), have the same coefficients, but in reverse order.
The even-index coefficients of the filter H^2(Z) = \sum_{i=1}^{15} h'_i Z^{-i} should be null, except for h'_8. In fact, we obtain:

\sum_{i=1,\ i \neq 4}^{7} \left(h'_{2i}\right)^2 = 1.7 \times 10^{-5}
Iterative methods can be employed, if necessary, to complete the calculation and achieve a better
approximation of the symmetry.
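This property can be verified numerically; the following Python lines build H^2(Z) from the coefficients above and measure the residual even-index terms (the order of magnitude agrees with the value quoted):

import numpy as np

h = np.array([0.015235, -0.085187, 0.081638, 0.486502,
              0.486502, 0.081638, -0.085187, 0.015235])
p = np.convolve(h, h)                    # the 15 coefficients of H^2(Z)
center = len(p) // 2                     # central coefficient, close to 0.5
residual = [p[center + d] for d in (-6, -4, -2, 2, 4, 6)]
print(p[center], np.sum(np.square(residual)))   # ~0.5 and ~1.7e-5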
The quality of the reconstruction depends on the ripple 𝛿 and, therefore, on the number of coef-
ficients. If perfect reconstruction is sought, the option of identical filters must be dropped, and
undersampling is necessary at analysis input [2].

11.3 Perfect Decomposition and Reconstruction


Letting P(Z) = H 0 (Z) H 1 (−Z), the reconstruction condition (11.3) is written:
P(Z) − P(−Z) = Z −K (11.14)
This means that P(Z) is a low-pass filter with an odd number of coefficients, in which all the coefficients at an even offset from the center are null, except for the center coefficient itself, which is unity. For example, for M = 2 different coefficients, we get:

P(Z) = h_3 + h_1 Z^{-2} + Z^{-3} + h_1 Z^{-4} + h_3 Z^{-6} \qquad (11.15)
The coefficients of the low-pass and high-pass filters, H 0 (Z) and H 1 (−Z), are subject to the con-
straint that, in their product, the coefficients of the terms Z −1 and Z −5 are null. Some degrees of
freedom remain, which are exploited to obtain specific properties.
In a first example, polynomials of different degrees are chosen, with symmetrical coefficients and zeros at Z = −1. Then, taking:

H_1(-Z) = \frac{1}{2}(1 + Z^{-1})^2
H_0(Z) = (1 + Z^{-1})(\alpha_0 + \alpha_1 Z^{-1} + \alpha_1 Z^{-2} + \alpha_0 Z^{-3}) \qquad (11.16)
the perfect reconstruction condition imposes that, in the product P(Z), the coefficient of Z −1 is zero
and the coefficient of Z −3 is unity, which leads to 𝛼 0 = − 1/8 and 𝛼 1 = 3/8. Finally:
H_0(Z) = \frac{1}{8}\left[-1 + 2Z^{-1} + 6Z^{-2} + 2Z^{-3} - Z^{-4}\right] \qquad (11.17)
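The half-band property of the product P(Z) = H0(Z)H1(−Z) can be checked directly. In the Python fragment below, the coefficients at an even offset from the center vanish and P(Z) − P(−Z) reduces to a pure delay (up to a factor 2, a matter of normalization of condition (11.14)):

import numpy as np

h0 = np.array([-1, 2, 6, 2, -1]) / 8.0     # equation (11.17)
h1m = np.array([1, 2, 1]) / 2.0            # H1(-Z) = (1 + Z^-1)^2 / 2
p = np.convolve(h0, h1m)                   # P(Z), degree 6 as in (11.15)
print(p)                                   # [-1, 0, 9, 16, 9, 0, -1] / 16
pm = p * (-1.0) ** np.arange(len(p))       # coefficients of P(-Z)
print(p - pm)                              # 2 Z^-3 only: reconstruction holds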

Figure 11.3 Frequency responses of the filters in the standard JPEG 2000 lossless.

The two filters obtained form the basis of the reversible transform used in the still image compression standard JPEG 2000 – lossless option. The frequency responses are given in Figure 11.3.
It is worth pointing out that the two sub-bands are unbalanced, so the filters are not of the
half-Nyquist type.
Decomposition into two equal sub-bands can be achieved provided the linear phase constraint is dropped and the following factorization of the half-band filter is performed:

P(Z) = H_0(Z)\, Z^{-K} H_0(Z^{-1}) \qquad (11.18)

The filters H_0(Z) and H_1(−Z) have the same coefficients but in reverse order, and the polynomial P(Z) has degree 2K and 2K + 1 coefficients.
As a function of M, the number of different coefficients, the equality 2K + 1 = 4M − 1 holds, which implies K = 2M − 1. The fact that the whole number K is odd allows relation (11.2) to be satisfied, and with:

G_0(Z) = Z^{-K} H_0(Z^{-1}); \quad H_1(Z) = -Z^{-K} H_0(-Z^{-1}); \quad G_1(Z) = -H_0(-Z)

it can readily be verified that conditions (11.1) and (11.3) are met.

Figure 11.4 Frequency response of the low-pass analysis filter.

The coefficient calculation procedure is the minimum phase technique for FIR filters described in Section 5.13. The ripple is added to the center coefficient of the half-band filter, which makes the zeros on the unit circle double, and the minimum and maximum phase factors are extracted [3].
As an illustration, let us consider a filter P(Z) with 2K + 1 = 15 coefficients, computed with the specification f_1 = 1/2 − f_2 = 0.2. The M = 4 different coefficients are: h_1 = 0.62785; h_3 = −0.18681; h_5 = 0.08822; h_7 = −0.05297.
The ripple value is 𝛿 = 0.047 and the central coefficient becomes h_0 = 1.047. Taking one of the zeros that are on the unit circle and those that are inside, the first filter is obtained:

H_0(Z) = 0.3704 + 0.5111Z^{-1} + 0.2715Z^{-2} - 0.0885Z^{-3} - 0.1346Z^{-4} + 0.0338Z^{-5} + 0.0973Z^{-6} - 0.0703Z^{-7}

The corresponding frequency response is shown in Figure 11.4. With respect to the initial filter, the ripple in the attenuated band is √𝛿.
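With these rounded coefficients, conditions (11.1) and (11.3) can be checked numerically; in the sketch below, the conjugate filters are derived as indicated above and the analysis/synthesis response is close to a pure delay Z^{-7} (small residuals remain because the printed coefficients are rounded):

import numpy as np

h0 = np.array([0.3704, 0.5111, 0.2715, -0.0885,
               -0.1346, 0.0338, 0.0973, -0.0703])
K = len(h0) - 1                                   # K = 7, odd as required
sign = (-1.0) ** np.arange(K + 1)
h1 = sign * h0[::-1]                              # H1(Z) = -Z^-K H0(-Z^-1)
# H0(Z) H1(-Z) - H1(Z) H0(-Z), condition (11.3):
t = np.convolve(h0, sign * h1) - np.convolve(h1, sign * h0)
print(np.round(t, 3))   # ~[0, ..., 0, 1.03, 0, ..., 0]: delay of 7 samples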
Among the characteristics of the factors of P(Z), the regularity of the frequency response is worth
emphasizing, because it is important for the compression of signals and, particularly, of images.
In filtering, this property corresponds to the presence of multiple zeros at point Z = −1 in the
Z-transfer function. On the theoretical side, the approach is justified by the wavelet theory.

11.4 Wavelets

The objective of the wavelet theory is the representation of signals in the time-frequency domain.
It is a representation which is not possible with the Fourier analysis because it assumes periodic
or finite duration signals. Thus, in order to localize a signal in both time and frequency with the
Fourier transform, a sliding window has to be introduced.

Figure 11.5 Nonuniform filter bank in tree structure.

As the basis for decomposition, the wavelet transform uses a set of functions called wavelets,
deduced from a generating function through translation and dilation. It allows for analysis of sig-
nals with arbitrary duration [4, 5].
In practice, the discrete wavelet transform is a nonuniform filter bank and an efficient approach
for implementation consists of cascading banks of two filters, such as those described above, with
decimation by factor two at each stage and the same coefficients for all the filters. The operations
of translation and dilation are performed automatically by the sampling rate changes. The tree
structure for analysis is shown in Figure 11.5. By completing the lower part in the figure, a uniform
filter bank can be obtained.
The coefficients are calculated with the objective of maximum regularity – that is, function P(Z)
featuring the maximum number of zeros at point Z = −1.
A half-band FIR filter with N different coefficients has 4N−1 coefficients and degree 4N−2.
With its 2(N−1) null coefficients, the function P(Z) includes a factor of degree 2(N−1). Thus, the
degree of the remaining factor is 2N. Then, the analysis and synthesis filters are derived from the
factorization of P(Z).
In a minimum phase solution, the filters have 2N coefficients and the Z-transfer function of the
low-pass filter has N zeros at Z = −1 in the complex plane. As in the previous sections, numerical
values can be determined directly by combining this constraint with the conditions of canceling
the coefficients of the odd terms in P(Z). Alternatively, P(Z) can be obtained and factorized.
Table 11.1 provides the coefficients of the filters H_0(Z) for the first values of N. The coefficients of the other filters involved in analysis and synthesis, H_1(Z), G_0(Z), G_1(Z) as in Figure 11.1, are given by:

H_0(Z) = \sum_{i=1}^{2N} h_{0,i} Z^{-i}; \quad h_{1,i} = (-1)^i h_{0,2N+1-i}; \quad g_{0,i} = h_{0,2N+1-i}; \quad g_{1,i} = h_{1,2N+1-i} \qquad (11.19)

Frequency responses are provided in Figure 11.6. It is worth mentioning the similarity with
Butterworth filter responses (see Section 7.2.3), which have the same zeros at Z = −1, also have
the property of perfect reconstruction when the cutoff frequency is set in the middle of the useful
band, and have a monotonic frequency response.
It is also possible to obtain linear phase filters, by giving different odd numbers of coefficients to
the low-pass and high-pass filters. For example, Table 11.2 gives the coefficients of the (9, 7) filters
used in the JPEG 2000 standard for highrate lossy compression [6].
The accuracy of the reconstruction is evaluated by computing the impulse response of the analysis/synthesis system. Multiplying by H_1(−Z) the polynomial which is obtained by canceling the coefficients with an odd index in H_0(Z), and adding the product by H_0(−Z) of the polynomial obtained by canceling the coefficients with an even index in H_1(Z), it can readily be verified that the center coefficient of the resulting polynomial is unity, while the other coefficients take on almost zero values, the error being smaller than 2 × 10^{-6}. The discrepancies stem from rounding of the coefficients.

Table 11.1 Coefficients of minimum-phase wavelets.

N=2          N=3          N=4          N=5
0.482963     0.332671     0.230378     0.160102
0.836516     0.806892     0.714847     0.603828
0.224144     0.459878     0.630881     0.724307
−0.129410    −0.135011    −0.027984    0.138427
             −0.085441    −0.187035    −0.242295
             0.035226     0.030841     −0.032245
                          0.032883     0.077571
                          −0.010597    −0.006242
                                       −0.012581
                                       0.003336

Figure 11.6 Frequency responses of minimum phase wavelets.

Figure 11.7 shows the frequency responses of the filters H 0 (Z) and H 1 (Z)/2. Again, the partition
of the signal is unbalanced, but less so than in Figure 11.3, due to the higher number of filter coeffi-
cients. These filters have half their zeros at points Z = ± 1 in the complex plane, which brings high
regularity to the frequency responses – an important property in image processing.
Regarding arithmetic complexity, it is worth pointing out the small number of multiplications,
close to that of polyphase techniques, because it is possible to benefit from the coefficient symme-
tries for both the analysis and synthesis.

Table 11.2 Filters for lossy compression in JPEG 2000.

i         H0(Z)        H1(Z)
i = 0     0.602949     1.115087
i = ±1    0.266864     −0.591272
i = ±2    −0.078223    −0.057544
i = ±3    −0.016864    0.091272
i = ±4    0.026749     –

Figure 11.7 Frequency responses of the filters in the standard lossy JPEG 2000.

11.5 Lattice Structures

The factorization (11.18) of the half-band filter can also be realized in lattice representation, canceling every other coefficient. The corresponding modular structure is shown in Figure 11.8.
The transfer functions of the analysis and synthesis filters are obtained through the calculations provided in Section 8.5. For example, for three cells, K = 3, we find:

H_0(Z) = 1 + \alpha_1 Z^{-1} - \alpha_1\alpha_2 Z^{-2} + \alpha_2 Z^{-3}
H_1(Z) = -\alpha_2 - \alpha_1\alpha_2 Z^{-1} - \alpha_1 Z^{-2} + Z^{-3} \qquad (11.20)


This is equivalent to the previous relation H_1(Z) = -Z^{-K} H_0(-Z^{-1}). The system transfer function is:

T(Z) = Z^{-2(K-1)} \prod_{i=1}^{K} \left(1 + \alpha_i^2\right) \qquad (11.21)

The delay is 2(K−1) sampling periods. The lattice coefficients are computed from the filter specifications.
For example, imposing a zero at Z = −1 and H_0(1) = 4, we obtain \alpha_1 = 1 + \sqrt{2};\ \alpha_2 = 1 - \sqrt{2}.
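These values are easily checked, together with the structural guarantee of perfect reconstruction: in the Python sketch below, the coefficients of P(Z) = H0(Z)Z^{-3}H0(Z^{-1}) at an even offset from the center vanish identically, whatever the lattice coefficients:

import numpy as np

a1, a2 = 1 + np.sqrt(2), 1 - np.sqrt(2)
h0 = np.array([1.0, a1, -a1 * a2, a2])            # H0(Z) of (11.20)
print(sum(h0 * (-1.0) ** np.arange(4)))           # H0(-1) = 0: zero at Z = -1
print(h0.sum())                                   # H0(1) = 4
p = np.convolve(h0, h0[::-1])                     # P(Z) of (11.18), K = 3
print(np.round(p, 12))   # entries at offsets +/-2 from the center are 0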
Comparison with the previous section reveals that the lattice approach is less efficient than
other factorizations; referring to Table 11.1, the filter with four coefficients has two zeros at
Z = −1. However, the lattice approach has benefits when it comes to implementation, modularity
of the structure, and insensitivity to coefficient rounding, because perfect reconstruction is not
impacted [7].

Figure 11.8 Bank of 2 filters in lattice structure.



Exercises
11.1 The signal x(n) = cos(n𝜋/4) is applied to the filter with transfer function:
H(Z) = −0.050 + 0.117Z^{-1} + 0.452Z^{-2} + 0.452Z^{-3} + 0.117Z^{-4} − 0.050Z^{-5}
What is the delay incurred by the filter? Give the impulse response and express the output sequence.

11.2 Give the diagram of a QMF bank of two filters, assuming H(Z) is the prototype filter.
Provide the expressions of the signals at the output from the analysis. Compute the transfer
function of the analysis/synthesis system and give the expression of the reconstructed
signal.

11.3 In the case of perfect decomposition and reconstruction, the prototype P(Z) has degree 6
and it must be factorized with the same degree factors. Looking for maximum regularity, the
low-pass filter H 0 (Z) must have 3 zeros at Z = −1. Compute the coefficients of the high-pass
filter H 1 (Z) and find its zeros.

11.4 Compare the frequency responses with those in Figure 11.3. Quantization occurs at
analysis output; how is it amplified in the reconstruction phase?

11.5 Linear phase analysis filters are assumed to have three and five coefficients. Compute the
values which give the best possible sub-band separation and perfect reconstruction.

11.6 When a unit power white noise is applied at the inputs of the analysis filters, what is the
power of the output signals?
11.7 The signal x(n) = cos(n𝜋/4) is applied to the analysis filters in Table 11.2. Give the expressions of the output signals before and after decimation. In the reconstruction process, give the expressions of the signals before and after the final addition.

11.8 The sensitivity of the decomposition/reconstruction process to coefficient accuracy is to be investigated.
Considering the filters in Table 11.1, give an upper bound of the reconstruction error due to
coefficient rounding. Check the result with the first filter in the table, assuming four coef-
ficients and four-bit representation. Calculate the frequency response of the corresponding
filter.
Work through the same questions in reference to the filters in Table 11.2.

References

1 R. E. Crochière and L. R. Rabiner, Multirate Digital Signal Processing, Prentice-Hall Inc., Engle-
wood Cliffs, New Jersey, 1983.
2 M. Vetterli, Filter banks allowing perfect reconstruction. Signal Processing, 10(3), 1986, 219–244.

3 M. Smith and T. Barnwell, Exact reconstruction techniques for tree structured sub-band coders. IEEE Transactions, ASSP-34(3), 1986, 434–441.
4 I. Daubechies, Orthonormal bases for compactly supported wavelets. Communications on Pure
and Applied Mathematics, 41, 1988, 909–996.
5 S. Mallat, A Wavelet Tour of Signal Processing, 2nd ed., Academic Press, New York, 1999.
6 A. Skodras, C. Christopoulos and T. Ebrahimi, The JPEG 2000 still image compression standard,
IEEE Signal Processing Magazine, 18(5), 36–58, 2001.
7 P. P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice Hall Inc., Englewood Cliffs, NJ, 1993.

12

Filter Banks

The signal decomposition and reconstruction techniques presented in the previous chapter can
be generalized to any number of sub-bands, using banks of more than two filters. In that case, in
principle, decimation can be performed at the output of the analysis filters, but, in general, it is
preferable to decimate at system input in order to benefit from the combination of a polyphase
network and DFT, thus minimizing the computational complexity.

12.1 Decomposition and Reconstruction

In the realization of filter banks using polyphase networks and DFT, as explained in Section 10.7,
the operations involved are reversible and this leads to the arrangement shown in Figure 12.1 for
the decomposition and reconstruction of a signal [1, 2].
The difficulty, in practice, consists of applying the operations associated with the inverse functions H_i^{-1}(Z^N).
The filter H(Z) which serves as a basis for the process, sometimes called the prototype filter, has
a polyphase decomposition whose elements satisfy relation (10.8):

Z^{-i} H_i(Z^N) = \frac{1}{N} \sum_{m=0}^{N-1} e^{-j(2\pi/N)im}\, H(Z e^{-j(2\pi/N)m}); \quad 0 \le i \le N-1 \qquad (12.1)

If the prototype filter H(Z) has a cutoff frequency of less than f s /2N and infinite attenuation for
frequencies greater than or equal to f s /2N – that is, if aliasing of the spectrum due to sampling at
the rate 1/NT is negligible – one can write:

H[exp j2𝜋(f ∕fs − m∕N)]H[exp j2𝜋(f ∕fs − k∕N)] = 0; m≠k (12.2)

Under these conditions, the following equation is satisfied on the unit circle, except for the factor
Z −N :

H_i(Z^N)\, H_{N-i}(Z^N) = \frac{1}{N^2} \sum_{m=0}^{N-1} H^2(Z e^{-j(2\pi/N)m}) = H_0^2(Z^N)

Hence:

\frac{H_i(Z^N)}{H_0(Z^N)} \cdot \frac{H_{N-i}(Z^N)}{H_0(Z^N)} = 1; \quad 0 \le i \le N-1 \qquad (12.3)
These equations simply convey the phase relations illustrated in Figure 10.12.

Figure 12.1 The principle of decomposition and reconstruction.

One can then take H_{N-i}(Z^N) to realize H_i^{-1}(Z^N). Thus, the same filter bank is used for decomposition and reconstruction of the signal; the overall operation corresponds to multiplication by H_0^2(Z^N).
In certain applications, it is not possible to neglect aliasing; this is the case, for example, when it is
required to decompose a signal, sample it at the rate f s /N, and then reconstruct it with the greatest
possible accuracy over the band (0, f s ). Let G(Z) be the transfer function of the basic filter for the
reconstruction. As the product of a discrete Fourier transform and its inverse is equal to unity, the
overall operation corresponds to decomposition of the signal x(n) into N interleaved sequences
x(pN + i), to which are applied N operators with transfer functions Gi (Z N )H i (Z N ).
Figure 12.2 corresponds to a reduction of sampling frequency by N in the decomposition part,
sometimes called analysis, and an increase by N in the reconstruction part, sometimes called syn-
thesis. All processing in the corresponding device is performed at a rate 1/N, which is a particularly
effective approach.
The condition for reconstruction with a delay D is written as:

Gi (Z N )Hi (Z N ) = Z −D ; 0≤i≤N −1 (12.4)

The delay D must be the same in all branches of the polyphase network, whose interleaved outputs produce the increase of sampling frequency by N; the reconstructed signal is then simply the delayed initial signal, x(n − D).
To determine the inverse functions Gi (Z N ), it is necessary to perform a detailed analysis of the
frequency response for the elements of the polyphase network.

Figure 12.2 Polyphase filter banks for signal analysis and synthesis.

12.2 Analyzing the Elements of the Polyphase Network

The frequency response of the elements H i (Z N ) of the polyphase network follows directly from
equation (12.1). However, simplifications can be made. Indeed, the filters of the bank generally
provide limited coverage, as in Figure 12.3. Under these conditions, a given filter is superimposed
only on its immediate neighbors if the response of the prototype filter H(Z) is such that H(f ) = 0
for |f | > 1/N.
Then, for the branch of index i, one can write:
H_i(f) = H(f) + e^{-j(2\pi/N)i} H\left(f - \frac{1}{N}\right) \qquad (12.5)
Since the periodicity of this response is 1/N, if the coefficients are real, it is sufficient to consider
the response in the interval 0 ≤ f ≤ 1/2N, and this gives:
H_i(f) = H(f) + e^{-j(2\pi/N)i} H\left(\frac{1}{N} - f\right) \qquad (12.6)
Assuming that the response of the prototype filter is a monotonically decreasing curve in the
transition band Δf , H i (f ) cannot be zero for f ≠ 1/2N. For the value 1/2N, this gives:
H_i\left(\frac{1}{2N}\right) = H\left(\frac{1}{2N}\right)\left(1 + e^{-j(2\pi/N)i}\right) \qquad (12.7)
This response is zero for the branch i = N/2.
Hence, with the decomposition (12.1) – which is the same as equation (10.36) – the branch of index N/2 cannot be inverted, since its Z-transfer function has a zero in the Z-plane at the point −1. To obtain a set of invertible branches, it is necessary to use another polyphase decomposition.
In Chapter 5, it was shown that a linear phase FIR filter with an even number of coefficients is
an interpolator at mid-sample period. It can be considered to result from an FIR filter with an odd
number of coefficients, having the same frequency response by downsampling of a factor two. It is
therefore necessary to start with a polyphase decomposition with 2N branches and to retain only
one branch in two. One then obtains this equation for H(Z):

H(Z) = \sum_{i=0}^{N-1} Z^{-(i+1/2)} H_{i+1/2}(Z^N) \qquad (12.8)

With this decomposition, the minimum amplitude H min in a branch is given by:
H_{\min} = \left| H_{(N+1)/2}\left(\frac{1}{2N}\right) \right| = \left|1 - e^{j\pi/N}\right| = 2\sin\left(\frac{\pi}{2N}\right) \qquad (12.9)
As a result, to have an invertible polyphase network, it is sufficient to require that the linear phase
prototype FIR filter have an even number of coefficients.

Figure 12.3 Overlap of adjacent filters.



Note that the zeros of the functions H_{i+1/2}(Z^N) in the Z^N-plane are divided equally between the interior and the exterior of the unit circle. The reason is that the elements H_{i+1/2}(Z^N) are almost of linear phase. Furthermore, the amplitude of the frequency
response remains close to unity, so the zeros are far from the unit circle, except for the branches
which exhibit high attenuation at the frequency 1/2N when N is large.

12.3 Determining the Inverse Functions

By placing the transfer functions in the Z-plane, determination of the inverse function for the polyphase elements starts with a factorization where the L_1 zeros within the unit circle are separated from the L_2 zeros which are outside:

H_i(Z) = h_{i0} \prod_{h=1}^{L_1} (1 - Z_h Z^{-1}) \prod_{l=1}^{L_2} (1 - Z_l Z^{-1}) \qquad (12.10)

The term hi0 is a scaling factor.


For all zeros Z l outside the unit circle, one can write:

\frac{1}{1 - Z_l Z^{-1}} = -\frac{Z}{Z_l} \sum_{i=0}^{\infty} \left(\frac{Z}{Z_l}\right)^i \qquad (12.11)

and, consequently, the inverse of the second factor of (12.10) can be approached with arbitrary accuracy with a finite number of terms. Consider the function G_i(Z) defined by:

G_i(Z) = \frac{\sum_{l=0}^{L_3} a_l Z^{-l}}{\prod_{h=1}^{L_1} (1 - Z_h Z^{-1})} \qquad (12.12)

where L_3 is an integer.
The condition for inversion is satisfied if:

\left(\sum_{l=0}^{L_2} c_l Z^{-l}\right) \left(\sum_{l=0}^{L_3} a_l Z^{-l}\right) = Z^{-(L_2+L_3)} \qquad (12.13)

where the c_l are the coefficients of the maximum phase factor of (12.10).

The choice of delay L_2 + L_3 is justified by the fact that the coefficients of the expansion (12.11) are decreasing and that the second factor in (12.10) is of maximum phase. The inversion relation is written in matrix form:

MA = \begin{bmatrix} c_0 & 0 & \cdots & 0 \\ c_1 & c_0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & c_{L_2} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{L_3} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \qquad (12.14)

where the vector A has the unknown coefficients a_l as elements. The system is overdetermined and admits a solution in the least-squares sense, given by:

A = (M^t M)^{-1} M^t B \qquad (12.15)

where B denotes the right-hand side vector of (12.14).

The polyphase synthesis elements G_i(Z) have the structure of a general IIR filter. As the poles Z_h are far from the unit circle, a realization with a direct structure is possible.
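As an illustration of (12.13)-(12.15), the Python sketch below inverts a maximum phase factor in the least-squares sense (the factor, the order L3 and the helper names are illustrative assumptions):

import numpy as np

c = np.array([1.0, -1.8])                 # one zero at Z = 1.8, outside the circle
L2, L3 = len(c) - 1, 12                   # order of the approximation
M = np.zeros((L2 + L3 + 1, L3 + 1))       # convolution matrix of c, as in (12.14)
for col in range(L3 + 1):
    M[col:col + L2 + 1, col] = c
b = np.zeros(L2 + L3 + 1)
b[-1] = 1.0                               # target: the delay Z^-(L2+L3)
a, *_ = np.linalg.lstsq(M, b, rcond=None) # least-squares solution (12.15)
print(np.max(np.abs(M @ a - b)))          # residual decreases as L3 grows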

The method of calculating the polyphase synthesis elements described above is general and
applies to all analysis filters, with the sole condition that there are no zeros on the unit circle.
It is necessary to perform one calculation per branch since different coefficients are obtained. This
enables the analysis filter to be specified relatively independently of the synthesis filter. However,
it may be useful to sacrifice a little flexibility in order to obtain a simpler and more systematic
calculation, as in the previous chapter.
This will be described for a uniform bank of N real filters.

12.4 Banks of Pseudo-QMF Filters

The principle relies on the assumption that, for a given filter, the attenuation is such that aliases
originate only in the adjacent bands [3]. Let H(Z) be the transfer function of a prototype low-pass
linear phase filter having the frequency response represented in Figure 12.4. Consideration of a
number of coefficients equal to LN gives:

H(Z) = \sum_{k=0}^{LN-1} h_k Z^{-k} \qquad (12.16)

In the bank, the filter of index i, centered on the frequency (2i + 1)/4N, has the transfer function H(Ze^{-j2\pi(2i+1)/4N}). During the analysis, a component of the signal at the frequency (2i + 1)/4N + Δf, with 1/4N < Δf < 3/4N, will be attenuated by a factor H(Δf). Downsampling at the rate 1/N will produce a replica of this component at the frequency:

-\left(\frac{2i+1}{4N} + \Delta f\right) + \frac{i+1}{N} = \frac{2i+3}{4N} - \Delta f
During synthesis, this component, aliased into the band of the filter of index i, will be found at the frequency (2i + 1)/4N + 1/2N − Δf, and it will be subject to attenuation by the synthesis filter of index i – that is, G(1/2N − Δf), if G(Z) designates the prototype synthesis filter. Finally, the replicated component will have suffered the attenuation:

H(\Delta f)\, G\left(\frac{1}{2N} - \Delta f\right)

Figure 12.4 (a) Prototype filter; (b) uniform bank of N real filters.

Figure 12.5 Aliasing of a component in the filter bank.

The same component of the signal will now be processed by the filter of index i+1, since it falls in
the pass band. Sampling then produces an image component which, during synthesis, will be added
to the previous aliased component with attenuation H(1/2N − Δf )G(Δf ). The process is illustrated
in Figure 12.5.
Hence, the condition for these components to compensate each other is:
\left[H(\Delta f)\, G\left(\frac{1}{2N} - \Delta f\right)\right]_i + \left[H\left(\frac{1}{2N} - \Delta f\right) G(\Delta f)\right]_{i+1} = 0 \qquad (12.17)
This condition for the absence of aliasing can be obtained by taking G(f ) = H(f ) and applying a
phase difference of 𝜋/2 between the filters of index i and i + 1, during analysis and synthesis.
The necessary phase difference can be obtained by introducing phase shifts into the modulation
functions – for example, by taking the following values for the coefficients hik of the analysis filter
of index i:
h_{ik} = 2h_k \cos\left[\frac{\pi}{2N}(2i+1)\left(k - \frac{LN-1}{2}\right) + \theta_i\right] \qquad (12.18)

with 0 ≤ i ≤ N − 1 and 0 ≤ k ≤ LN − 1. For the synthesis filter:

g_{ik} = 2h_k \cos\left[\frac{\pi}{2N}(2i+1)\left(k - \frac{LN-1}{2}\right) - \theta_i\right] \qquad (12.19)
Setting:

a_i = e^{j\theta_i}; \quad c_i = e^{-j(2i+1)\frac{\pi}{2N}\,\frac{LN-1}{2}} \qquad (12.20)

gives the following, for the corresponding transfer functions (the bar denoting complex conjugation):

H_i(Z) = a_i c_i H(Z e^{-j(2i+1)\pi/2N}) + \bar{a}_i \bar{c}_i H(Z e^{j(2i+1)\pi/2N}) \qquad (12.21)

G_i(Z) = \bar{a}_i c_i H(Z e^{-j(2i+1)\pi/2N}) + a_i \bar{c}_i H(Z e^{j(2i+1)\pi/2N}) \qquad (12.22)


With the symmetry of the coefficients hk and alignment of the modulation functions, the follow-
ing expressions are satisfied:
gik = hi(LN−1−k) Gi (Z) = Z −(LN−1) Hi (Z −1 ) (12.23)
Under these conditions, the total response of the analysis and synthesis systems can be
written as:
̂ (Z) 1∑ Z −(LN−1) N−1 ∑
N−1 N−1
X
T(Z) = = Hi (Z)Gi (Z) = Hi (Z)Hi (Z −1 ) (12.24)
X(Z) N i=0 N i=0
Thus, the global system is of linear phase.

With the assumption that the filters have sufficient attenuation for only the adjacent bands to cause significant aliasing, it is now necessary to determine the values of the angles 𝜃_i which produce the desired cancellation.
At the output of the filter H_i(Z), after downsampling, in accordance with equation (10.8), the signal is:

X_i(Z^N) = \frac{1}{N} \sum_{m=0}^{N-1} H_i(ZW^m)\, X(ZW^m) \qquad (12.25)
The output signal of the system becomes:

\hat{X}(Z) = \sum_{i=0}^{N-1} G_i(Z) X_i(Z^N) = \frac{1}{N} \sum_{m=0}^{N-1} X(ZW^m) \sum_{i=0}^{N-1} G_i(Z) H_i(ZW^m) \qquad (12.26)

The condition for perfect reconstruction can then be written as:

\sum_{i=0}^{N-1} G_i(Z) H_i(Z) = Z^{-k} \qquad (12.27)

\sum_{i=0}^{N-1} G_i(Z) H_i(ZW^m) = 0; \quad 1 \le m \le N-1 \qquad (12.28)

To make the aliased components cancel one another out, it is necessary to examine the output signal of each of the synthesis filters. At the output of the filter G_i(Z), using equation (12.25), the signal can be written as:

\hat{X}_i(Z) = G_i(Z)\, \frac{1}{N} \sum_{m=0}^{N-1} H_i(ZW^m)\, X(ZW^m) \qquad (12.29)
However, in light of definition (12.22) and the assumptions concerning attenuation, the filter G_i(Z) allows passage only of the band centered on frequency (2i + 1)/4N and the two adjacent bands. Taking account of the distribution of the bands on the frequency axis, the indices m associated with these adjacent bands correspond to a frequency translation such that:

$$\frac{2i+1}{4N} \pm \frac{m}{N} = -\frac{2i+1}{4N} \pm \frac{1}{2N} \qquad (12.30)$$
In fact, the downsampling leads to frequency translations which are integral multiples of the
frequency 1/N. Under these conditions, the values of m can be written as:
m = ±i and m = ±(i + 1)
For example, considering the case of Figure 12.5, the aliased component arises from a component at the frequency −(2i + 1)/4N − Δf shifted by (i + 1)/N; hence:

$$-\left(\frac{2i+1}{4N} + \Delta f\right) + \frac{i+1}{N} = \frac{2i+3}{4N} - \Delta f$$
Consequently, by using equation (12.21) to define H_i(Z), X̂_i(Z) is limited to the following expansion:

$$\begin{aligned} \hat{X}_i(Z) = G_i(Z)\frac{1}{N}\Big[ &\bar{a}_i\bar{c}_i H(ZW^{-(2i+1)/4})X(Z) + a_i c_i H(ZW^{(2i+1)/4})X(Z) \\ +\ &\bar{a}_i\bar{c}_i H(ZW^{(2i-1)/4})X(ZW^{i}) + a_i c_i H(ZW^{(1-2i)/4})X(ZW^{-i}) \\ +\ &\bar{a}_i\bar{c}_i H(ZW^{(2i+3)/4})X(ZW^{i+1}) + a_i c_i H(ZW^{-(2i+3)/4})X(ZW^{-(i+1)}) \Big] \end{aligned} \qquad (12.31)$$

Since:

$$\hat{X}(Z) = \sum_{i=0}^{N-1} \hat{X}_i(Z) \qquad (12.32)$$

the aliases in X̂(Z) cancel if the high band of X̂_i(Z) compensates for the low band of X̂_{i+1}(Z). The corresponding condition is obtained by inserting definition (12.22) for G_i(Z) into the expression for X̂_i(Z) and writing the same expression for X̂_{i+1}(Z). The factors of X(ZW^{i+1}) and X(ZW^{-(i+1)}) cancel if the following condition is satisfied:

$$a_i^2 c_i\bar{c}_i H(ZW^{(2i+3)/4})H(ZW^{(2i+1)/4}) + a_{i+1}^2 c_{i+1}\bar{c}_{i+1} H(ZW^{(2i+1)/4})H(ZW^{(2i+3)/4}) = 0$$
that is, if:

$$a_{i+1}^2 = -a_i^2 \qquad (12.33)$$
It is therefore necessary that the phase shifts satisfy:

$$\theta_{i+1} = \theta_i + \frac{\pi}{2}; \qquad 0 \le i \le N-1 \qquad (12.34)$$
The first condition for perfect reconstruction (12.27) can thus be written as:

$$\begin{aligned} \frac{1}{N}\sum_{i=0}^{N-1} &\left\{\left[c_i H(ZW^{(2i+1)/4})\right]^2 + \left[\bar{c}_i H(ZW^{-(2i+1)/4})\right]^2\right\} \\ &+ \left(a_0^2 + \bar{a}_0^{\,2}\right) H(ZW^{1/4})H(ZW^{-1/4}) \\ &+ \left(a_{N-1}^2 + \bar{a}_{N-1}^{\,2}\right) H(ZW^{(2N-1)/4})H(ZW^{-(2N-1)/4}) = 1 \end{aligned} \qquad (12.35)$$
since the cross products do not cancel at the origin and at the sampling half-frequency as the filters
are adjacent.
For the cross products to vanish, it is necessary to take:

$$\theta_i = (-1)^i \frac{\pi}{4} \qquad (12.36)$$
and the imposed condition for calculation of the frequency response of the prototype filter can finally be written as:

$$|H(f)|^2 + \left|H\!\left(\frac{1}{2N} - f\right)\right|^2 = 1; \qquad 0 \le f < \frac{1}{2N}$$
$$H(f) = 0; \qquad \frac{1}{2N} \le f \le \frac{1}{2} \qquad (12.37)$$
Note that the following are also possible for the phase shifts in the analysis and synthesis banks:
(1) 𝜃i = (i + 1)𝜋/2 – the overall system response cancels at frequencies 0 and 1/2.
(2) 𝜃i = i𝜋/2 – the overall response doubles at frequencies 0 and 1/2.
In summary, the design procedure for a bank of N real pseudo-QMF filters consists of the following two operations:
(1) Design of the prototype linear phase filter meeting the specification (12.37), with LN coefficients.
(2) Determination of the transfer functions of the analysis and synthesis filters using the expressions:

$$H_i(Z) = 2\sum_{k=0}^{LN-1} \cos\left[2\pi\frac{2i+1}{4N}\left(k - \frac{LN-1}{2}\right) + (2i+1)\frac{\pi}{4}\right] h_k Z^{-k} \qquad (12.38)$$

$$G_i(Z) = 2\sum_{k=0}^{LN-1} \cos\left[2\pi\frac{2i+1}{4N}\left(k - \frac{LN-1}{2}\right) - (2i+1)\frac{\pi}{4}\right] h_k Z^{-k} \qquad (12.39)$$

The specification of the prototype filter reflects the required separation between the sub-bands.
Several approaches to the design can be envisaged.

12.5 Determining the Coefficients of the Prototype Filter

This is the design of a half-Nyquist filter, and the first approach consists of taking a cosine transition band and using (5.37). However, the frequency sampling technique set forth in Section 5.4 may be more efficient. In particular, when the transition band equals the sub-band spacing, the coefficients are derived from a simple formula [4].
Let K be a whole number and consider a set of KN samples H k (0 ≤ k ≤ KN − 1) in the frequency
domain:

$$H_0 = 1 \qquad (12.40)$$
$$H_k^2 + H_{K-k}^2 = 1; \qquad H_{KN-k} = H_k; \qquad 1 \le k \le K-1$$
$$H_k = 0; \qquad K \le k \le KN-K$$

Generally, the number N of sub-bands is even. The filter coefficients hi (0 ≤ i ≤ KN − 1) are


obtained with the inverse DFT.
The following constraint is imposed:

$$H_0 + 2\sum_{k=1}^{K-1} (-1)^k H_k = 0 \qquad (12.41)$$

Then, the middle coefficient hKN/2 is null, and the filter has an odd number of coefficients. There
is a double zero at half the sampling frequency, as indicated in Section 5.12, and high attenuation
results for high frequencies.
For K = 3 and K = 4, equations (12.40) and (12.41) define the frequency samples:

$$H_1 = 0.911438; \quad H_2 = 0.411438$$

and

$$H_1 = 0.971960; \quad H_2 = \frac{1}{\sqrt{2}}; \quad H_3 = 0.235147 \qquad (12.42)$$
For example, a bank of N = 16 filters and K = 4 has the coefficients:

$$h_{i+1} = 1 + 2\sum_{k=1}^{K-1} (-1)^k H_k \cos\left(\frac{2\pi ki}{KN}\right); \qquad 1 \le i \le 63 \qquad (12.43)$$
$$h_1 = 0$$

The filter obtained has an odd number of coefficients. The frequency response, whose transition
band is equal to 1/16, is shown in Figure 12.6. The attenuation grows with frequency, but it is pos-
sible to obtain a constant attenuation using the method indicated in Section 11.2 with appropriate
specifications. Iterative techniques may be used in addition to the procedure to better approximate
the symmetry condition (12.37).
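As an illustration, the coefficients of the example above can be computed directly from the samples (12.42) and formula (12.43). The following Python sketch is illustrative, not part of the original design procedure; the variable names are arbitrary:

import numpy as np

# Frequency samples (12.42) for K = 4; Hs[k] holds H_k
K, N = 4, 16
Hs = {1: 0.971960, 2: 1.0 / np.sqrt(2), 3: 0.235147}

# Coefficients (12.43): h[i] holds h_{i+1}; h[0] = h_1 = 0 by constraint (12.41)
h = np.zeros(K * N)
for i in range(1, K * N):
    h[i] = 1 + 2 * sum((-1) ** k * Hs[k] * np.cos(2 * np.pi * k * i / (K * N))
                       for k in range(1, K))

# The magnitude response of h reproduces the shape of Figure 12.6
H_f = 20 * np.log10(np.abs(np.fft.fft(h / np.sum(h), 4096)) + 1e-12)

The coefficients are normalized by their sum before evaluating the response, so that the gain at the origin is 0 dB.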
Once the coefficients are determined, the calculations should be set out and performed so that
the number of arithmetic operations is minimal.

Figure 12.6 Prototype filter obtained by frequency weighting for a bank of 16 filters and 64 coefficients (amplitude in dB versus normalized frequency).

12.6 Realizing a Bank of Real Filters

Consider a bank of real filters having the frequency responses shown in Figure 12.4 and an even number of coefficients 2LN, in which the filter of index i has the coefficients:

$$h_{ik} = h_k \cos\left[\frac{2\pi}{2N}\left(i + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\right] \qquad (12.44)$$

with 0 ≤ i ≤ N − 1, 0 ≤ k ≤ 2LN − 1.
A decomposition into a polyphase network and a discrete Fourier transform can be obtained by setting k = 2Nl + m, with 0 ≤ l ≤ L − 1 and 0 ≤ m ≤ 2N − 1.
The output x_i(n) for the filter of index i can be written as:

$$x_i(n) = \sum_{k=0}^{2LN-1} x(n-k)\, h_k \cos\left[\frac{2\pi}{2N}\left(i+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\right] \qquad (12.45)$$

or, by replacing k with 2Nl + m and simplifying:

$$x_i(n) = \sum_{m=0}^{2N-1} \cos\left[\frac{2\pi}{2N}\left(i+\frac{1}{2}\right)\left(m+\frac{1}{2}\right)\right] \sum_{l=0}^{L-1} (-1)^l h_{2Nl+m}\, x(n - 2Nl - m) \qquad (12.46)$$
2N 2 2 l=0

Applying the general transfer function decomposition (10.36) to the prototype filter gives:

$$H(Z) = \sum_{m=0}^{2N-1} Z^{-m} H_m(Z^{2N}) \qquad (12.47)$$

and the filters H m (Z 2N ) are those which arise in the second summation of expression (12.46). To
take account of the factor (−1)l , it is sufficient to introduce the functions H m (−Z 2N ), and the dia-
gram corresponding to the analysis filters is shown in Figure 12.7. The decomposition produces a
polyphase network with 2N branches, and a cosine transform.
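The decomposition (12.46) translates directly into an implementation. The sketch below is illustrative (names are assumed, and the prototype h with 2LN coefficients is taken as given); it computes the N sub-band outputs at one instant:

import numpy as np

def analysis_bank(x_block, h, N):
    # x_block[k] = x(n - k), the 2LN most recent input samples
    L = len(h) // (2 * N)
    # Polyphase branches: u[m] = sum_l (-1)^l h_{2Nl+m} x(n - 2Nl - m)
    u = np.zeros(2 * N)
    for m in range(2 * N):
        for l in range(L):
            u[m] += (-1) ** l * h[2 * N * l + m] * x_block[2 * N * l + m]
    # Cosine transform of (12.46), applied to the 2N branch outputs
    i = np.arange(N)[:, None]
    m = np.arange(2 * N)[None, :]
    C = np.cos(2 * np.pi / (2 * N) * (i + 0.5) * (m + 0.5))
    return C @ u          # the N outputs x_i(n)

In practice, the transform matrix C would itself be factorized into a fast cosine transform, and the branches would operate at the downsampled rate.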

Figure 12.7 Arrangement of a bank of N real filters: a delay line feeding the 2N polyphase filters H_m(−Z^{2N}), followed by a cosine transform.

Figure 12.8 Lattice structure for the polyphase network.

Figure 12.7 can be further simplified since, in the cosine transform considered, two symmetrical
inputs are subjected to the same operations, apart from the sign. Factorizing yields an odd-time
odd-frequency cosine transform – a special case mentioned in Section 3.3.2. Furthermore, with
downsampling, the system operates at a rate 1/N. As the system consists of 2N branches, a given
x(n) is processed by the filters H i and H i+N at two successive instants. Under these conditions,
the 2N branches of the polyphase network can be regrouped as N/2 subgroups having the lattice
structure shown in Figure 12.8. The overall configuration is therefore as illustrated in Figure 12.9.
In the case of pseudo-QMF filters, this arrangement is applicable with the introduction of phase shifts. In fact, by taking equation (12.45) with the coefficients of the filter given in equation (12.38), the output x_i(n) of the filter of index i with 2LN coefficients can be written as:

$$x_i(n) = 2\sum_{k=0}^{2LN-1} x(n-k)\, h_k \cos\left[2\pi\frac{2i+1}{4N}\left(k - \frac{2LN-1}{2}\right) + (2i+1)\frac{\pi}{4}\right] \qquad (12.48)$$

By setting k = lN + m this time, the following double summation is obtained:

$$x_i(n) = 2\sum_{m=0}^{N-1}\sum_{l=0}^{2L-1} \cos\left[\frac{\pi}{4N}(2i+1)(2m+1+N) + (2i+1)(l-L)\frac{\pi}{2}\right] h_{lN+m}\, x(n - lN - m) \qquad (12.49)$$

Figure 12.9 Optimized structure of a bank of N real filters.

A cosine expansion produces these terms:

$$\cos\left[(2i+1)(l-L)\frac{\pi}{2}\right] = \cos\left[(l-L)\frac{\pi}{2}\right]$$

and:

$$\sin\left[(2i+1)(l-L)\frac{\pi}{2}\right] = (-1)^i \sin\left[(l-L)\frac{\pi}{2}\right]$$

Furthermore, the following expression is satisfied:

$$\cos\left[\frac{\pi}{4N}(2i+1)[2(N-1-m)+1]\right] = (-1)^i \sin\left[\frac{\pi}{4N}(2i+1)(2m+1)\right]$$
Finally, to take account of the term N in the cosine argument of (12.49), it is sufficient to shift the data by N/2. By combining all these results, the output x_i(n) is determined as follows:

$$x_i(n) = 2\sum_{m=0}^{N-1} \cos\left[\frac{\pi}{4N}(2i+1)(2m+1)\right] y_m(n) \qquad (12.50)$$

with:

$$y_m(n) = -y_{2,N/2-1-m}(n) + y_{2,N/2+m}(n); \qquad 0 \le m \le \frac{N}{2}-1$$
$$y_m(n) = y_{1,m-N/2}(n) - y_{1,3N/2-m-1}(n); \qquad \frac{N}{2} \le m \le N-1$$

and:

$$y_{1,m}(n) = \sum_{l=0}^{2L-1} \cos\left[(l-L)\frac{\pi}{2}\right] h_{lN+m}\, x(n - lN - m)$$

$$y_{2,m}(n) = \sum_{l=0}^{2L-1} \sin\left[(l-L)\frac{\pi}{2}\right] h_{lN+m}\, x(n - lN - m)$$

The sequences y1,m (n) and y2,m (n) are interleaved, with sampling frequency f s /2N, and the bank
of analysis filters is determined with an odd-time odd-frequency cosine transform. Finally, the
phase shifts introduced by the pseudo-QMF technique have been taken into account simply by
rearranging the data before the transform.
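The computation described above can be summarized in the following sketch, which follows equations (12.50) and the definitions of y_{1,m} and y_{2,m}; the function and variable names are illustrative:

import numpy as np

def pqmf_analysis(x_block, h, N):
    # x_block[k] = x(n - k); h is the 2LN-coefficient prototype
    L2 = len(h) // N                        # 2L taps per polyphase branch
    l = np.arange(L2)
    c = np.cos((l - L2 // 2) * np.pi / 2)   # weights of y_{1,m}
    s = np.sin((l - L2 // 2) * np.pi / 2)   # weights of y_{2,m}
    y1 = np.zeros(N)
    y2 = np.zeros(N)
    for m in range(N):
        taps = h[l * N + m] * x_block[l * N + m]
        y1[m] = np.dot(c, taps)
        y2[m] = np.dot(s, taps)
    # Data rearrangement accounting for the pseudo-QMF phase shifts
    y = np.zeros(N)
    for m in range(N // 2):
        y[m] = -y2[N // 2 - 1 - m] + y2[N // 2 + m]
    for m in range(N // 2, N):
        y[m] = y1[m - N // 2] - y1[3 * N // 2 - m - 1]
    # Odd-time odd-frequency cosine transform (12.50)
    i = np.arange(N)[:, None]
    mm = np.arange(N)[None, :]
    C = 2 * np.cos(np.pi / (4 * N) * (2 * i + 1) * (2 * mm + 1))
    return C @ y                            # the N sub-band samples x_i(n)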

Exercises

12.1 In a bank of four filters, the prototype is linear phase, and it has the coefficients:
h = [−0.0218, −0.0200, −0.0116, 0.0160, 0.0621, 0.1181, 0.1691, 0.1996]
Give the polyphase-FFT (PPN-FFT) decomposition. The sampling frequency is unity. Check that the amplitude of the frequency response in the pass band is the same for all four branches. The phases are linear, and their values at frequency 1/8, in radians, are:
phase = [−1.4829, −1.2459, −1.0366, −0.8002]
Calculate the delay introduced by each branch and justify the results.

12.2 A filter bank can be realized with an FFT whose size equals the number of coefficients
of the prototype filter. Justify the approach, using the results in Chapter 2. Compute the
coefficients. How are they placed in the analysis and synthesis parts?
As an illustration, the example in Section 12.5 is considered with K = 4 and N = 16. Give
the values of the seven coefficients to apply in front of the summation that yields the output
of each filter. The base filter, in the synthesis part, is centered on the FFT output of index 0.
At which FFT output are the seven coefficients applied? Answer the same question for the
neighboring filter centered on index 4.
Describe the application to the synthesis filter bank.

12.3 A bank of N = 16 filters is used to decompose a signal. The 2N coefficients of the prototype
filter are the following:
$$h(n) = \sin\left(\frac{\pi}{2N}\left(n + \frac{1}{2}\right)\right); \qquad 0 \le n \le 2N-1$$
What is the amplitude H(f ) of the frequency response? (Refer to Section 5.8 and find an
approximate expression corresponding to the continuous case.)
Compare with the filter associated with the DFT of size N = 16, considering the coefficients
hFFT (n) = 1; 0 ≤ n ≤ 15.
In a PPN-FFT realization, give the expression of the coefficients in each of the 16 branches.
An alternative realization is based on an FFT of size 2N = 32. What processing must be intro-
duced at FFT output to realize the prototype filter? Compare the number of multiplications
with the previous approach.

References
1 M. Bellanger and J. Daguet, TDM-FDM transmultiplexer: digital polyphase and FFT. IEEE
Transactions on Communications, 22(9), 1199–1205, 1974.
2 R. E. Crochière and L. R. Rabiner, Multirate Digital Signal Processing, Prentice-Hall Inc.,
Englewood Cliffs, NJ, 1983.
3 N. J. Fliege, Multirate Digital Signal Processing, John Wiley, Chichester, 1994.
4 K. W. Martin, Small side-lobe filter design for multitone data communications. IEEE Transactions
on Circuits and Systems II, 45(8), 1155–1161, 1998.

13

Signal Analysis and Modeling

The modeling of systems is one of the most important areas of signal processing. Furthermore,
modeling is an alternative approach to signal analysis, with properties differing from those of
the Fourier transform and those of filters defined in the frequency domain. Linear prediction,
in particular, is a simple and efficient tool to characterize some signal types and then compress
them. The processing is specified in the time domain, using statistical parameters and, particularly,
correlation.

13.1 Autocorrelation and Intercorrelation


The degree of similarity between two signals can be described using a correlation coefficient, which
should logically take the values +1 for two identical signals, zero for two signals which have no rela-
tionship to each other, and −1 for signals in opposition to each other. When time-dependent signals
are compared, the correlation coefficient becomes a function of time. It is called the intercorrelation
function if the signals are different and the autocorrelation (AC) function if they are the same.
Some definitions and properties will now be restated in order to recapitulate and supplement the
discussions in Sections 1.8 and 4.4.
As shown in Section 1.8, the AC function of a random discrete signal x(n) is the set r xx (n) such
that:
rxx (n) = E[x(i)x(i − n)] (13.1)
where E[x] denotes the expectation value of x.
With the assumption of ergodicity, this becomes:

$$r_{xx}(n) = \lim_{N\to\infty} \frac{1}{2N+1}\sum_{i=-N}^{N} x(i)x(i-n) \qquad (13.2)$$

The function r_xx(n) is even. Its value at the origin is the power of the signal and, for any n:

$$|r_{xx}(n)| \le r_{xx}(0) \qquad (13.3)$$

Consider a set of N coefficients a_i (1 ≤ i ≤ N). Calculation of the variance of the variable y(n) such that:

$$y(n) = \sum_{i=1}^{N} a_i x(n-i)$$


results in:

$$E[y^2(n)] = \sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j E[x(n-i)x(n-j)]$$

or,

$$E[y^2(n)] = \sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j r_{xx}(i-j) \qquad (13.4)$$

As this variance is positive or zero, we have:

$$\sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j r_{xx}(i-j) \ge 0 \qquad (13.5)$$

This property characterizes positive functions. If, in the definition of equation (13.1), x(i − n)
is replaced by another signal, a function is obtained which allows two different signals to be com-
pared. The intercorrelation function between two discrete signals x(n) and y(n) is the set r xy (n) such
that:

rxy (n) = E[x(i)y(i − n)] (13.6)

With the assumption of ergodicity,

$$r_{xy}(n) = \lim_{N\to\infty} \frac{1}{2N+1}\sum_{i=-N}^{N} x(i)y(i-n) \qquad (13.7)$$

Similarly,

$$r_{xy}(-n) = E[x(i)y(i+n)] = r_{yx}(n) \qquad (13.8)$$

For example, if the signals are the input and output of a filter:

$$y(n) = \sum_{j=0}^{\infty} h_j x(n-j)$$

then, as shown in Section 4.4, one has:

$$r_{yx}(n) = E[y(i)x(i-n)] = \sum_{j=0}^{\infty} h_j r_{xx}(n-j)$$

or,

$$r_{yx}(n) = r_{xx}(n) * h(n) \qquad (13.9)$$

Similarly,

rxy (n) = rxx (n) ∗ h(−n) (13.10)

and also:

ryy (n) = rxx (n) ∗ h(n) ∗ h(−n) (13.11)

If two random signals are independent, their intercorrelation functions are zero. Further, the following inequality is always valid:

$$|r_{xy}(n)| \le \frac{1}{2}\left[r_{xx}(0) + r_{yy}(0)\right] \qquad (13.12)$$

It is worth mentioning that the AC and intercorrelation functions can, in some cases, be computed without multiplication, the signals being replaced by their signs. Thus, if x(n) is a Gaussian signal:

$$r_{xx}(n) = \sqrt{\frac{\pi}{2} r_{xx}(0)}\; E[x(i)\,\mathrm{sign}(x(i-n))] \qquad (13.13)$$

$$r_{xx}(n) = r_{xx}(0)\sin\left[\frac{\pi}{2} E[\mathrm{sign}(x(i)x(i-n))]\right] \qquad (13.14)$$
These expressions can greatly simplify the equipment required.
The Fourier transform Φ_xy(f) of the intercorrelation function r_xy(n) is called the interspectrum:

$$\Phi_{xy}(f) = X(f)\overline{Y}(f)$$

where X(f) denotes the spectrum of the set x(n) and Ȳ(f) is the conjugate spectrum of the set y(n). If the set y(n) is the output of a filter with transfer function H(f), then:

$$H(f) = \frac{Y(f)}{X(f)} = \frac{Y(f)\overline{X}(f)}{X(f)\overline{X}(f)}$$

Hence,

$$\Phi_{yx}(f) = \Phi_{xx}(f)H(f) \qquad (13.15)$$

which corresponds to equation (13.9). Similarly, for equation (13.10), we have:

$$\Phi_{xy}(f) = \Phi_{xx}(f)\overline{H}(f)$$

and finally,

$$\Phi_{yy}(f) = \Phi_{xx}(f)|H(f)|^2 \qquad (13.16)$$

These results apply to the spectrum analysis of random signals in general and are useful for the
study of adaptive systems.

13.2 Correlogram Spectral Analysis

The Fourier transform of the AC function is the spectral power density:

$$S(f) = \sum_{p=-\infty}^{\infty} r(p)e^{-j2\pi pf} \qquad (13.17)$$

The AC function of the signal is r(p). In practice, the analysis is performed using a limited
number, N 0 , of signal samples. Therefore, r(p) must be estimated first.
An initial estimation of the AC function is:

$$r_1(p) = \frac{1}{N_0}\sum_{n=p+1}^{N_0} x(n)x(n-p) \qquad (13.18)$$

It is biased because its expectation is:

$$E[r_1(p)] = \frac{N_0 - p}{N_0}\, r(p)$$

The non-biased estimation is given by:

$$r_2(p) = \frac{1}{N_0 - p}\sum_{n=p+1}^{N_0} x(n)x(n-p) \qquad (13.19)$$

From P values of the AC function, the so-called correlogram spectral estimation is defined by:

$$S_{CR}(f) = \sum_{p=-(P-1)}^{P-1} r_2(p)e^{-j2\pi pf} \qquad (13.20)$$

as a function of the theoretical spectrum:

$$S_{CR}(f) = S(f) * \frac{\sin \pi f(2P-1)}{\sin \pi f} \qquad (13.21)$$

The variance is approximated as follows:

$$\mathrm{Var}\{S_{CR}(f)\} \approx \frac{2P-1}{N_0}\, S^2(f) \qquad (13.22)$$
It is apparent that the minimum number of AC values should be used to perform the estimation,
which means that only the most significant values of the AC estimation should be retained [1].
In fact, the most direct approach to calculate the signal power spectrum from N 0 samples consists
of applying the DFT and taking the squared output values |X(k)|2 . It can readily be shown that the
biased estimation (13.18) is obtained.
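A minimal sketch of the correlogram estimation follows, using the non-biased AC estimate (13.19) and definition (13.20); the function and parameter names are illustrative:

import numpy as np

def correlogram(x, P, nfft=1024):
    # Non-biased AC estimate (13.19) for lags 0..P-1
    N0 = len(x)
    r2 = np.array([np.dot(x[p:], x[:N0 - p]) / (N0 - p) for p in range(P)])
    # Correlogram (13.20); r2(-p) = r2(p) for a real signal
    f = np.arange(nfft) / nfft
    S = r2[0] + 2 * sum(r2[p] * np.cos(2 * np.pi * p * f) for p in range(1, P))
    return f, S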
An important particular case is the single-frequency signal.

13.3 Single-Frequency Estimation

A single frequency f is to be estimated among a set of spurious components which can be seen as an additive white Gaussian noise (AWGN). If N samples x(n) (0 ≤ n ≤ N − 1) are available, the attainable accuracy depends on N and on the signal-to-noise ratio (SNR, denoted SB). A lower bound of the estimation variance is (see Appendix):

$$BCR_f = \frac{6}{4\pi^2\, SB\, N^3} \qquad (13.23)$$

In practice, the theoretical bound above can be closely approximated, provided the SNR remains greater than a threshold, whose magnitude is roughly:

$$SB_{min} \approx \frac{29}{N} \qquad (13.24)$$
The simplest estimation technique is based on a DFT of size N, and the procedure is made up of
two steps. First, a rough estimate is derived, taking the index of the output with maximum magni-
tude, and then, it is refined through interpolation [2].
Assuming the sampling frequency is unity, the signal is written as:

$$x(n) = Ae^{j(2\pi fn + \varphi)} + b(n) \qquad (13.25)$$

A, f, and φ are the amplitude, frequency, and phase of the component to be identified; the additive noise is b(n). The first step is based on the decomposition:

$$f = \frac{k_0}{N} + \delta f \quad \text{with } k_0 \text{ integer and } -\frac{1}{2N} < \delta f < \frac{1}{2N} \qquad (13.26)$$

The DFT output y(k) which has the maximum amplitude provides k_0 in (13.26). After a shift about the origin (k_0 = 0), we get:

$$y(k) = Ae^{j\varphi}\,\frac{\sin\!\left(N\pi\left(\delta f - \frac{k}{N}\right)\right)}{N\sin\!\left(\pi\left(\delta f - \frac{k}{N}\right)\right)}\, e^{j\pi\delta f(N-1)} + \beta(k); \qquad 0 \le k \le N-1 \qquad (13.27)$$

β(k) is the noise component in the frequency domain. Since the phase is unknown, in order to assess the frequency deviation, it is necessary to get rid of the phase terms, and one can use the real function:

$$y'(k) = A\,\frac{\sin\!\left(N\pi\left(\delta f - \frac{k}{N}\right)\right)}{N\sin\!\left(\pi\left(\delta f - \frac{k}{N}\right)\right)} + |\beta(k)|e^{j\pi k/N} \qquad (13.28)$$

which is obtained by restoring the signs of the sin(x)/x function in |y(k)|. Of course, ambiguity is
introduced, and we only know that indices k and N−k are associated with opposite signs. The ambi-
guity can be solved by observing the sign of the difference abs(y(N−1))–abs(y(1)). From (13.28), it
is possible to apply the least squares method to minimize the cost function:
$$J = \sum_{k=-\frac{N}{2}+1}^{\frac{N}{2}-1} \left[ y'(k) - \frac{\sin(N\pi(\delta f - k/N))}{N\sin(\pi(\delta f - k/N))} \right]^2 \qquad (13.29)$$

The minimum can be reached through iteration – for example, using the procedure set forth in
Section 5.4. A direct estimation is obtained by series development of the cardinal sine function.
For example, assuming the amplitude A has been estimated and keeping only 3 terms in the cost
function (13.29), we obtain:
$$\delta f \approx \frac{1}{N}\cdot\frac{\mathrm{abs}(y(1)) + \mathrm{abs}(y(-1))}{[\mathrm{abs}(y(0)) - 1]\dfrac{\pi^2}{3} + 2} \qquad (13.30)$$

The estimation is valid in the interval [−1/2N 1/2N], and it is degraded in the vicinity of
the bounds. For example, for N = 16, 1/2N = 0.031 and 𝛿f = 0.01, we obtain the estimation
𝛿f estim = 0.0106.
Iterative techniques are needed to approach the bound (13.23). A particularly efficient technique
is called iterated correlations, according to which the amplitude and the frequency of the sine wave
can be estimated independently of the phase. The procedure begins with calculating the AC func-
tion of the signal x(n) by:
$$[r_1] = TFD_{2N}^{-1}\left\{\left|TFD_{2N}(x)\right|^2\right\} \qquad (13.31)$$

In order to apply a DFT of size 2N, N zeros are appended to the sequence x(n). The frequency
domain interpolation (see Section 9.10) leads to the AC function because the squaring in the fre-
quency domain amounts to convolution of the signal by its conjugate. Thus, we obtain:

$$r_1(p) = \frac{1}{N}\sum_{n=p}^{N-1} x(n)x^*(n-p) = A^2\frac{N-p}{N}e^{j2\pi pf} + \frac{1}{N}\sum_{n=p}^{N-1} b_1(n); \qquad 0 \le p \le N-1 \qquad (13.32)$$

The noise component is:

$$b_1(n) = b(n)e^{-j2\pi(n-p)f} + b(n-p)e^{j2\pi pf} + b(n)b^*(n-p) \qquad (13.33)$$



Two observations can be made:

– r_1(0) is an estimation of the signal power.
– The sequence r_1(p) is a new signal with decreasing SNR as p grows.

In such conditions, it is advisable to keep only the first values for the next step:

$$x_1(n) = r_1(n); \qquad x_1\!\left(\frac{N}{2} + n\right) = 0; \qquad 0 \le n \le \frac{N}{2}-1 \qquad (13.34)$$

Then, the following calculation is carried out:

$$[r_2] = TFD_N^{-1}\left\{\left|TFD_N(x_1)\right|^2\right\}$$

The sequence [r_2] has N values, of which only N/4 are kept for the next step. The procedure stops at index q such that [r_q] has only four values. Finally, we find:

$$r_1(0) = A^2; \qquad \frac{r_q(1)}{r_q(0)} = e^{j2\pi f} \qquad (13.35)$$
This algorithm offers very high accuracy, for any frequency to be estimated. For example, if N = 16, f = 0.1, and SNR = 100 (20 dB), the variance of the estimated frequency and the bound are:

$$\mathrm{var} = 4.3 \times 10^{-7}; \qquad BCR_f = 3.7 \times 10^{-7}$$

It can be verified that the SNR threshold is close to SNR_min = 29/16.
With regard to the estimation of a real frequency, refer to exercise 1.
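A sketch of the iterated-correlations procedure in Python is given below; it is illustrative, assumes N is a power of two, and scales the AC so that r_1(0) estimates the power:

import numpy as np

def iterated_correlations(x):
    N = len(x)
    X = np.fft.fft(x, 2 * N)
    r = np.fft.ifft(np.abs(X) ** 2)[:N] / N       # AC function (13.31)
    A2 = r[0].real                                 # power estimate, r_1(0) ~ A^2
    while N > 4:
        x1 = np.concatenate([r[:N // 2], np.zeros(N // 2)])   # (13.34)
        R = np.fft.fft(x1, N)
        r = np.fft.ifft(np.abs(R) ** 2)[:N // 2]   # keep the most reliable lags
        N //= 2
    f = np.angle(r[1] / r[0]) / (2 * np.pi)        # (13.35)
    return A2, f

# Example: N = 16 samples of a complex exponential at f = 0.1
n = np.arange(16)
print(iterated_correlations(np.exp(2j * np.pi * 0.1 * n)))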

13.4 Correlation Matrix


The AC matrix of dimension N of a signal is defined by:

$$R_N = \begin{pmatrix} r(0) & r(1) & \cdots & r(N-1) \\ r(1) & r(0) & \cdots & r(N-2) \\ \vdots & \vdots & & \vdots \\ r(N-1) & r(N-2) & \cdots & r(0) \end{pmatrix} \qquad (13.36)$$

with:
r(p) = E[x(n)x(n − p)]
As the AC function is positive (13.5), the AC matrix is positive and symmetric by definition.
In fact, it has double symmetry as it is also symmetric about the second diagonal. This leads to a
set of fundamental properties for adaptive systems.
The eigenvalues λ_i (0 ≤ i ≤ N − 1) of the AC matrix of order N will be considered first. The characteristic equation:

$$\det(\lambda I_N - R_N) = 0$$

leads to the equations:

$$\det(R_N) = \prod_{i=0}^{N-1} \lambda_i \qquad (13.37)$$

$$Nr(0) = \sum_{i=0}^{N-1} \lambda_i = N\sigma_x^2 \qquad (13.38)$$

That is, if the determinant of the matrix is nonzero, each eigenvalue is nonzero, and their sum is equal to N times the power of the signal. The positive nature of the matrix R_N further implies that they are all positive:

$$\lambda_i > 0; \qquad 0 \le i \le N-1 \qquad (13.39)$$

To ensure this, it is necessary and sufficient that the following determinants all be positive:

$$r(0); \quad \det\begin{pmatrix} r(0) & r(1) \\ r(1) & r(0) \end{pmatrix}; \quad \ldots; \quad \det\begin{pmatrix} r(0) & r(1) & \cdots & r(N-1) \\ r(1) & r(0) & \cdots & \vdots \\ \vdots & \vdots & & \vdots \\ r(N-1) & \cdots & \cdots & r(0) \end{pmatrix}$$

The corresponding matrices are the AC matrices of order less than or equal to N.
Under these conditions, the matrix RN can be diagonalized so that:
RN = M t diag(𝜆i )M (13.40)
where M is a square matrix of dimension N, such that M t = M −1 , and diag(𝜆i ) is the diagonal matrix
of the eigenvalues. M t can be equal to M in some cases.
The matrix can be expressed in terms of its normalized eigenvectors U_i (0 ≤ i ≤ N − 1) by:

$$R_N = \sum_{i=0}^{N-1} \lambda_i U_i U_i^t \qquad (13.41)$$

It is useful to examine successive powers of the matrices R_N and R_N^{-1}. By using both the Cayley–Hamilton theorem (according to which a matrix satisfies its own characteristic equation), and the Lagrange interpolation formula (which has already been used in Section 5.5), the power of a matrix can be expressed as a function of the powers of its eigenvalues:

$$R_N^P = \sum_{i=0}^{N-1} \lambda_i^P \prod_{\substack{j=0 \\ j\ne i}}^{N-1} \frac{R_N - \lambda_j I_N}{\lambda_i - \lambda_j} \qquad (13.42)$$

For large values of the integer P, with:

$$\lambda_{max} = \max_{0 \le i \le N-1}(\lambda_i)$$

and, if this maximum corresponds to the value for i = 0, one can write:

$$R_N^P \approx \lambda_{max}^P \prod_{j=1}^{N-1} \frac{R_N - \lambda_j I_N}{\lambda_{max} - \lambda_j} \qquad (13.43)$$

Consequently, for large values of P, one can make the approximation:

$$R_N^P \approx \lambda_{max}^P K_N \qquad (K_N: \text{square matrix of dimension } N) \qquad (13.44)$$

where K_N is the square matrix of order N of equation (13.43); as the matrix R_N can be diagonalized and satisfies (13.41), K_N can also be expressed more simply as the product of M^{-1} and a matrix deduced from M by setting all rows to zero except those which correspond to the index of the highest eigenvalue. Similarly, using equation (13.41), with the same conditions, one can write:

$$\left(R_N^{-1}\right)^P \approx \left(\frac{1}{\lambda_{min}}\right)^P K_N' \qquad (K_N': \text{square matrix of dimension } N) \qquad (13.45)$$

with:

$$\lambda_{min} = \min_{0 \le i \le N-1}(\lambda_i)$$

It will be shown below that these two extreme eigenvalues, 𝜆max and 𝜆min , condition the behavior
of adaptive systems.
The physical interpretation of the eigenvalues of the AC matrix is not readily apparent from their
definition, but it can be illustrated by comparing them with the spectrum of the signal x(n).
The case where the signal x(n) is periodic and has period N will be considered first. In this case,
the set r(n) is also periodic, and is also symmetrical with:

r(N − i) = r(i); 0≤i≤N −1

Under these conditions, the matrix R_N is a circulant matrix in which each row is derived from the preceding one by shifting. If the set Φ_xx(n) (0 ≤ n ≤ N − 1) denotes the Fourier transform of the set r(n), then it can be shown directly that:

$$R_N T_N = T_N\,\mathrm{diag}(\Phi_{xx}(n))$$

where T_N is the matrix of the discrete Fourier transform of order N. Hence:

$$R_N = T_N\,\mathrm{diag}(\Phi_{xx}(n))\,T_N^{-1} \qquad (13.46)$$

Comparison with equation (13.40) shows that, in this case, the eigenvalues of the matrix R_N are the discrete Fourier transform of the AC function – that is, the values of the signal power spectral density. M is the cosine transform matrix.
This relation is also valid for discrete white noise as the spectrum is constant and, since the AC
matrix is a unit matrix (to a factor), the eigenvalues are equal.
Real signals generally have a spectral density with nonconstant power, and their AC function r(p) decreases as the index p increases. For sufficiently large N, the significant elements of the N-dimensional matrix can be regrouped around the principal diagonal. Under these conditions, let R'_N be the AC matrix of a signal x(n) which is assumed to be periodic with period N. Its eigenvalues Φ_xx(n) form a sample of the power spectral density. The discrepancy between R_N and R'_N is due to the fact that R'_N is a circulant matrix, and the difference appears primarily in the upper right-hand and lower left-hand corners. Thus, R_N can be better approximated by a diagonal matrix than R'_N and, consequently, its eigenvalues are less dispersed. In fact, under certain conditions which commonly occur in practice, it can be shown that [3]:

$$\min_{0 \le n \le N-1} \Phi_{xx}(n) \le \lambda_{min} \le \lambda_{max} \le \max_{0 \le n \le N-1} \Phi_{xx}(n) \qquad (13.47)$$

and for sufficiently large N:

$$\lambda_{min} \approx \min_{0 \le f \le 1} \Phi_{xx}(f); \qquad \lambda_{max} \approx \max_{0 \le f \le 1} \Phi_{xx}(f) \qquad (13.48)$$

In conclusion, it can be considered in practice that, when the dimension of this matrix is
sufficiently large, the extreme eigenvalues of the AC matrix approximate the extreme values of the
power spectral density of the signal.

13.5 Modeling

Digital filters apply to system modeling, as shown in Figure 13.1. The signal x(n) is fed to the
system and the model, and the coefficients are computed so that the difference in the outputs is
minimized.

Figure 13.1 System modeling: the input signal x(n) feeds both the system to be modeled (output y(n)) and the digital filter (output ỹ(n)), and e(n) is their difference.

Depending on the knowledge available to begin with, one of a number of filters may be used as a model. However, FIR filters are chosen in general, due to their ease of design and implementation, and the output is:

$$\tilde{y}(n) = \sum_{i=0}^{N-1} h_i x(n-i) = H^t X(n) \qquad (13.49)$$

where X(n) is the vector of the N most recent data. The number of coefficients N is chosen in
accordance with information available about the model. The output error is defined by:
e(n) = y(n) − ỹ (n) (13.50)
Next, the coefficients are calculated using the minimum mean square error (MMSE) criterion
with cost function:
J = E[e2 (n)] (13.51)
Canceling the derivatives of this cost function produces the equation E[e(n)X(n)] = 0, which defines the decorrelation of the output sequence and the vector of the most recent input data. Then, we get:

$$E[y(n)X(n)] - E[X(n)X^t(n)]H = 0 \qquad (13.52)$$

Definition (13.1) of the AC function and (13.36) of the AC matrix show that E[X(n)X^t(n)] = R_N, and the coefficients are obtained by:

$$H = R_N^{-1}\, r_{yx} \qquad (13.53)$$

Thus, the coefficients of the modeling filter are obtained by multiplying the inverse of the AC matrix by the cross-correlation vector of system input and output defined by:

$$r_{yx} = E\left[y(n)\begin{pmatrix} x(n) \\ x(n-1) \\ \vdots \\ x(n+1-N) \end{pmatrix}\right] \qquad (13.54)$$
Combining (13.51) and (13.53) yields the MMSE E_min and the three alternative expressions:

$$E_{min} = E[y^2(n)] - H^t R_N H$$
$$E_{min} = E[y^2(n)] - H^t r_{yx} \qquad (13.55)$$
$$E_{min} = E[y^2(n)] - r_{yx}^t R_N^{-1} r_{yx}$$

Equalization is a special case, in which the system to be modeled is the inverse of the system
which has generated the input sequence x(n). For example, in communications, the transfer func-
tion of the equalizer is the inverse of the channel transfer function, in the absence of noise.

Example:
The channel output signal x(n) is related to the emitted data d(n), assumed to be uncorrelated and of unit power, by:

x(n) = d(n) + 0.5d(n − 1) + 0.2d(n − 2)

The first three elements of the AC function are:

r(0) = 1.29; r(1) = 0.60; r(2) = 0.20

Taking d(n) as the reference signal, y(n) = d(n), the three equalizer coefficients are:

$$H = \begin{pmatrix} 1.29 & 0.60 & 0.20 \\ 0.60 & 1.29 & 0.60 \\ 0.20 & 0.60 & 1.29 \end{pmatrix}^{-1}\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.9953 \\ -0.4991 \\ 0.0778 \end{pmatrix}$$

The output error is E_min = 0.0047, and the Z-transfer function T(Z) of the channel-equalizer cascade is:

T(Z) = 0.9953 − 0.0015Z^{−1} + 0.0273Z^{−2} − 0.0609Z^{−3} + 0.0156Z^{−4}

It is readily verified that the filter H(Z) is a three-coefficient approximation of the inverse of the channel, which is an FIR filter.
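The example can be checked numerically with a few lines of Python (a simple verification sketch):

import numpy as np

R = np.array([[1.29, 0.60, 0.20],
              [0.60, 1.29, 0.60],
              [0.20, 0.60, 1.29]])
ryx = np.array([1.0, 0.0, 0.0])
H = np.linalg.solve(R, ryx)        # ~ [0.9953, -0.4991, 0.0778]
c = np.array([1.0, 0.5, 0.2])      # channel impulse response
T = np.convolve(c, H)              # coefficients of the cascade T(Z)
Emin = 1.0 - H @ ryx               # ~ 0.0047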

13.6 Linear Prediction

Linear prediction is a particular case in which the output of the system to be modeled is the signal itself, as shown in Figure 13.2. The output error, called prediction error, is written as:

$$e(n) = x(n) - \sum_{i=1}^{N} a_i x(n-i) \qquad (13.56)$$

The prediction coefficients are:

$$\begin{pmatrix} a_1 \\ \vdots \\ a_N \end{pmatrix} = R_N^{-1}\begin{pmatrix} r(1) \\ \vdots \\ r(N) \end{pmatrix} \qquad (13.57)$$

Figure 13.2 Principle of linear prediction.



The decorrelation of the prediction error and the input signal leads to N relations:

$$r(p) = \sum_{i=1}^{N} a_i r(p-i); \qquad 1 \le p \le N \qquad (13.58)$$

As for the MMSE, we get:

$$E_{aN} = r(0) - \sum_{i=1}^{N} a_i r(i) \qquad (13.59)$$

The above expressions can be combined to yield the matrix linear prediction equation:

$$R_{N+1}\begin{pmatrix} 1 \\ -A_N \end{pmatrix} = \begin{pmatrix} E_{aN} \\ 0 \end{pmatrix} \qquad (13.60)$$
A signal is said to be “predictable” if the prediction error is null – that is, the following recurrence equation is verified:

$$x(n) = \sum_{i=1}^{N} a_i x(n-i)$$

The predictable signal is made of at most N/2 sinusoids, and the filter with transfer function H(Z) = 1 − A_N(Z) has at most N zeros on the unit circle. For example, for N = 2 and a sinusoid of frequency f_0, the recurrence is:

$$x(n) = 2\cos(2\pi f_0)x(n-1) - x(n-2)$$
In fact, the zeros of H(Z) are either on or inside the unit circle because it is minimum phase. The proof is as follows. Assume a zero z_0 lies outside the unit circle, |z_0| > 1; then H'(Z) such that:

$$H'(Z) = \frac{H(Z)}{(1 - z_0 Z^{-1})(1 - \bar{z}_0 Z^{-1})} \times \left(1 - \frac{Z^{-1}}{z_0}\right)\left(1 - \frac{Z^{-1}}{\bar{z}_0}\right)$$

yields a smaller prediction error because:

$$|H'(e^{j\omega})| = \frac{1}{|z_0|^2}|H(e^{j\omega})|$$

as indicated in Section 9.6. According to expression (4.24) for the signal power at the output of a filter, the power at the output of H'(Z) is lower than that at the output of H(Z) when the same signal is fed to both filters. This contradicts the definition of the prediction filter.
Since the filter H(Z) is minimum phase and invertible, linear prediction is employed to analyze and model signals. In fact, it is possible to retrieve a signal from its prediction error by inverting expression (13.56):

$$x(n) = e(n) + \sum_{i=1}^{N} a_i x(n-i) \qquad (13.61)$$

This is so-called autoregressive (AR) modeling. Assuming the prediction error is a white Gaussian noise of power E_aN, the signal spectrum can be estimated by:

$$S_{AR}(f) = \frac{E_{aN}}{\left|1 - \sum_{i=1}^{N} a_i e^{-j2\pi if}\right|^2} \qquad (13.62)$$
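A minimal sketch of the AR spectral estimate (13.62), assuming the prediction coefficients a_i and the error power E_aN have already been computed (names are illustrative):

import numpy as np

def ar_spectrum(a, EaN, nfft=1024):
    # a[i-1] holds a_i; returns S_AR on a grid of normalized frequencies
    f = np.arange(nfft) / nfft
    A = 1 - sum(ai * np.exp(-2j * np.pi * (i + 1) * f)
                for i, ai in enumerate(a))
    return f, EaN / np.abs(A) ** 2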
If the filter H(Z) is of IIR type, the model is called autoregressive moving average (ARMA), and
the spectrum can again be derived from the model.

13.7 Predictor Structures

FIR and IIR filters can be implemented in lattice structures, as indicated in Section 8.5. The
approach is particularly advantageous in linear prediction [4].
The prediction coefficients are defined by (13.57) and, due to the properties of the matrix, the
inversion may be avoided by an iterative procedure.
The Levinson–Durbin procedure provides a solution for the system of equations (13.58) by recur-
sion over N stages. It begins by taking the power of the error signal as:

E0 = r(0)

At the ith stage (1 ≤ i ≤ N), the following calculations are performed:

$$k_i = \frac{1}{E_{i-1}}\left[r(i) - \sum_{j=1}^{i-1} a_j^{i-1} r(i-j)\right]; \qquad 1 \le i \le N$$
$$a_i^i = k_i \qquad (13.63)$$
$$a_j^i = a_j^{i-1} - k_i a_{i-j}^{i-1}; \qquad 1 \le j \le i-1$$
$$E_i = \left(1 - k_i^2\right)E_{i-1} \qquad (13.64)$$

At the Nth stage, the N coefficients a_i are obtained by:

$$a_i = a_i^N; \qquad 1 \le i \le N$$

The term Ei corresponds to the power of the residual error with a predictor of order i. However,
the values of the coefficients ki obtained at the preceding stages are not changed in the ith stage. The
procedure is sequential, and the model is improved as the number of stages (and hence the number
of coefficients) increases, because, from equation (13.64), the power of the error is reduced at each
stage if |ki | < 1.
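The recursion translates into a few lines of code. The following Python sketch (illustrative names) implements (13.63) and (13.64) from the AC values r(0), …, r(N):

import numpy as np

def levinson_durbin(r, N):
    # r[p] = r(p), 0 <= p <= N; returns prediction and reflection coefficients
    a = np.zeros(N + 1)            # a[j] holds a_j of the current order
    k = np.zeros(N + 1)
    E = r[0]                       # E_0 = r(0)
    for i in range(1, N + 1):
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])   # r(i) - sum a_j r(i-j)
        k[i] = acc / E
        a_new = a.copy()
        a_new[i] = k[i]                               # a_i^i = k_i
        for j in range(1, i):
            a_new[j] = a[j] - k[i] * a[i - j]         # a_j^i update
        a = a_new
        E = (1 - k[i] ** 2) * E                       # (13.64)
    return a[1:], k[1:], E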
The coefficients k_i completely define the filter and indicate the method of realization. The error signal at stage i is formed by the set e_i(n), where:

$$e_i(n) = x(n) - \sum_{j=1}^{i} a_j^i x(n-j)$$

The transfer function of the corresponding filter is expressed by A_i(Z), where:

$$A_i(Z) = 1 - \sum_{j=1}^{i} a_j^i Z^{-j}$$

Using equation (13.63), this becomes:

$$A_i(Z) = A_{i-1}(Z) - k_i Z^{-i} A_{i-1}(Z^{-1}) \qquad (13.65)$$

From the results of Section 8.5, by assuming:

$$B_{i-1}(Z) = Z^{-(i-1)} A_{i-1}(Z^{-1})$$

we obtain:

$$A_i(Z) = A_{i-1}(Z) - k_i Z^{-1} B_{i-1}(Z) \qquad (13.66)$$



The function B_i(Z) has the corresponding set b_i(n), with:

$$b_i(n) = b_{i-1}(n-1) - k_i e_{i-1}(n) \qquad (13.67)$$

Finally, the coefficients ki result in a lattice structure as shown in Figure 13.3.


The convergence of the procedure is ensured if:

$$|k_i| < 1; \qquad 1 \le i \le N \qquad (13.68)$$

An alternative decomposition of the prediction filter is provided by the line spectral pair (LSP)
method, which consists of splitting the transfer function into two parts with symmetric and anti-
symmetric coefficients. Taking recurrence (13.65) at order N + 1 and designating by PN (Z) the
polynomial obtained with kN + 1 = 1, we get:

PN (Z) = AN (Z) − Z −(N+1) AN (Z −1 ) (13.69)

Similarly, designating by QN (Z) the polynomial obtained with kN+1 = −1:

QN (Z) = AN (Z) + Z −(N+1) AN (Z −1 ) (13.70)

Clearly, this is a decomposition of the polynomial A_N(Z), because the sum of the two preceding relations yields:

$$A_N(Z) = \frac{1}{2}\left[P_N(Z) + Q_N(Z)\right] \qquad (13.71)$$
The coefficients of P_N(Z) and Q_N(Z) exhibit even and odd symmetries, respectively; they are linear phase and, since, as predictors, they are not allowed to have zeros outside the unit circle, all their zeros are on the unit circle. Moreover, if N is even, the relations P_N(1) = Q_N(−1) = 0 hold – hence the factorization:

$$P_N(Z) = (1 - Z^{-1})\prod_{i=1}^{N/2}\left(1 - 2\cos\theta_i\, Z^{-1} + Z^{-2}\right) \qquad (13.72)$$
$$Q_N(Z) = (1 + Z^{-1})\prod_{i=1}^{N/2}\left(1 - 2\cos\omega_i\, Z^{-1} + Z^{-2}\right)$$

The two sets of parameters θ_i, ω_i, 1 ≤ i ≤ N/2, provide another representation of the prediction parameters.
If Z0 = ej𝜔0 is a zero of AN (Z) sitting on the unit circle, it is also a zero of PN (Z) and QN (Z). Now,
if this zero moves inside the unit circle, the corresponding zeros of PN (Z) and QN (Z) move on the
unit circle in opposite directions from 𝜔0 . A necessary and sufficient condition for 1−A(Z) to be
minimum phase is that the zeros of PN (Z) and QN (Z) be simple and alternate on the unit circle.

Figure 13.3 Linear prediction in a lattice structure.



Figure 13.4 Line spectral pair predictor.
The above approach provides a realization structure for the prediction error filter, as shown in
Figure 13.4. The transfer functions F(Z) and G(Z) are the linear phase factors in (13.72). This struc-
ture is amenable to implementation as a cascade of second-order sections. The overall minimum
phase property is checked by observing the alternation of the Z −1 coefficients.
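The decomposition and the minimum phase test can be sketched as follows; the coefficient vector a is illustrative, with a[0] = 1 corresponding to the leading term of the prediction error filter:

import numpy as np

a = np.array([1.0, -1.6, 0.9])            # illustrative error filter A_N(Z)
af = np.concatenate([a, [0.0]])           # A_N(Z), degree raised by one
ar = np.concatenate([[0.0], a[::-1]])     # Z^{-(N+1)} A_N(Z^{-1})
P = af - ar                               # (13.69)
Q = af + ar                               # (13.70)
# Minimum phase <=> the zeros of P and Q are simple and alternate on the
# unit circle; compare the sorted angles of the roots:
zP = np.sort(np.angle(np.roots(P)))
zQ = np.sort(np.angle(np.roots(Q)))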

13.7.1 Sensor Networks – Antenna Processing


The signal to be processed often comes from a single sensor, but it can also be delivered by a set
of sensors, which offers opportunities for improving the performance of existing systems or for
introducing new functions. Such a case is illustrated by multiple antenna devices in radio commu-
nications. The outputs from the elements of an antenna array are combined to produce a far-field
beam pattern which optimizes the reception of a desired signal in a specified direction [5, 6].
A linear network of N elements delivers delayed versions of the signal s(t), as shown in
Figure 13.5.
The elementary delay is:

$$\Delta t = \frac{d\sin\theta}{c} \qquad (13.73)$$
where d is the distance between two elements, 𝜃 is the incidence angle, and c is the celerity of the
signal. The delays can be compensated through N FIR interpolators whose outputs are summed.
However, an important simplification occurs if the antenna elements are narrow-band, which is
a common case in radiocommunication systems where the frequency band used for information
transmission is very low compared to the carrier frequency. Then, the received signals can be
viewed as sine waves, and the propagation equations are given by:
$$s(x,t) = S\, e^{j2\pi\left(ft - \frac{x}{\lambda}\right)} \qquad (13.74)$$

Figure 13.5 Antenna array.



where λ = c/f is the wavelength associated with frequency f. The delays translate into phase shifts and, in order to be able to distinguish the arrival angles between −π/2 and π/2 and carry out beamforming, the phase differences between elements must be less than π, which implies:

$$\frac{d\sin\theta}{\lambda} \le \frac{1}{2}; \qquad d \le \frac{\lambda}{2} \qquad (13.75)$$
The value 𝜆/2 generally retained is the upper bound for the spatial sampling interval.
The interpolators boil down to multiplications by weighting coefficients given by:

$$w_i = e^{j\omega i\Delta t}; \qquad 0 \le i \le N-1 \qquad (13.76)$$
The calculation of the weighting coefficients is linked to the system environment.
In the presence of Gaussian white noise, a technique similar to linear prediction can be used. One of the antenna elements is taken as the reference; from the corresponding signal x_0(n), a weighted sum of the signals coming from the other elements is subtracted, and the following difference is obtained:

$$e(n) = x_0(n) - \tilde{y}(n) = x_0(n) - \sum_{i=1}^{N-1} w_i x_i(n) = x_0(n) - W_{N-1}^t X_{N-1}(n) \qquad (13.77)$$

Minimizing the power of this signal yields:

$$R_N\begin{pmatrix} 1 \\ -W_{N-1} \end{pmatrix} = \begin{pmatrix} E \\ 0 \end{pmatrix} \qquad (13.78)$$

where R_N is the input covariance matrix, W_{N−1} is the coefficient vector, and E is the output error power. In an evolving environment, adaptive algorithms based on least squares or gradient techniques can be employed to determine the coefficients.
niques can be employed to determine the coefficients.
The approach leads to improvements in signal-to-noise ratio. For example, if the noise power is
the same for all received signals, the global SNR is improved by a factor close to N−1.
Radio links are often plagued by jammers. Undesired signals can be canceled or attenuated if the direction of arrival of the useful signal is known, and a unity response is imposed for that direction. For a known direction θ, with a delay Δt, the directional vector is expressed by:

$$F = \left[1, e^{j\omega\Delta t}, \ldots, e^{j\omega(N-1)\Delta t}\right]^t \qquad (13.79)$$

The coefficients are calculated to minimize the output power subject to the constraint:

$$W_N^t F = 1 \qquad (13.80)$$

which yields:

$$W_N = \frac{1}{F^t R_N^{-1} F}\, R_N^{-1} F \qquad (13.81)$$
The corresponding beam provides high attenuation in the direction of jamming signals, which
are associated with the zeros of the equivalent Z-transfer function. N antenna elements can atten-
uate N − 1 undesired signals.
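A sketch of the constrained beamformer computation (13.79)–(13.81) for a half-wavelength array is given below; the values are illustrative, the covariance matrix R would in practice be estimated from the received signals, and the Hermitian form of the constraint is assumed:

import numpy as np

N = 4
theta = 0.3                                   # look direction, radians
phase_step = np.pi * np.sin(theta)            # omega*dt for d = lambda/2
F = np.exp(1j * phase_step * np.arange(N))    # directional vector (13.79)
R = np.eye(N)                                 # placeholder covariance estimate
Ri_F = np.linalg.solve(R, F)                  # R^{-1} F
W = Ri_F / (F.conj() @ Ri_F)                  # (13.81)
# Check the unity response constraint (13.80) in the look direction:
assert np.isclose(W.conj() @ F, 1.0)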

13.8 Multiple Sources – MIMO


The environment might convey several useful source signals, which the receiver has to be able to
distinguish. The corresponding global system has multiple inputs and multiple outputs, referred
to by the abbreviation MIMO.

Figure 13.6 Principle of a MIMO system.
A system with N antennas at the sender and receiver sides is illustrated in Figure 13.6. The N × N transfer matrix H is written as:

$$H = \begin{pmatrix} h_{11} & h_{12} & \cdots & h_{1N} \\ h_{21} & h_{22} & \cdots & h_{2N} \\ \vdots & \vdots & & \vdots \\ h_{N1} & h_{N2} & \cdots & h_{NN} \end{pmatrix}$$
where the entries hij are complex numbers representing the amplitude and the phase of every
subchannel linking transmitter and receiver antenna elements.
The analysis of the channel relies on the diagonalization of matrix H and the eigendecomposition, when it exists:

$$H = M\,\mathrm{diag}(\lambda_i)\,M^*; \qquad H = \sum_{i=1}^{N} \lambda_i U_i U_i^* \qquad (13.82)$$

where λ_i and U_i (i = 1, …, N) are the eigenvalues and normalized eigenvectors, respectively, and M is the rotation matrix:

$$M = [U_1 \ldots U_N]$$

Example:

$$H = \begin{pmatrix} 1 & re^{-j\theta} \\ -re^{j\theta} & 1 \end{pmatrix}; \qquad \lambda_i = 1 \pm jr; \qquad |\lambda_i|^2 = 1 + r^2$$

Eigenvectors:

$$U_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ je^{j\theta} \end{pmatrix}; \qquad U_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -je^{j\theta} \end{pmatrix}$$

Rotation matrix:

$$M = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ je^{j\theta} & -je^{j\theta} \end{pmatrix}$$
The decomposition of the transfer matrix can be used to transmit a vector D of N data, provided a precoding operation by matrix M is introduced in the transmitter and post-coding by the matrix M* (the conjugate transpose) is introduced in the receiver. The block diagram is given in Figure 13.7 for a Hermitian channel. The received signal vector is:

$$Y = M^* H M D = \mathrm{diag}(\lambda_i)\, D = \Lambda D \qquad (13.83)$$



Figure 13.7 MIMO transmission with Hermitian N × N channel: precoding by M at the transmitter, channel H(Z), post-coding by M* at the receiver.

The equivalent of a set of N separated channels has been realized and the capacity of the system
is the sum of the capacities of the N individual channels. It is worth pointing out that the precoding,
which assumes knowledge of the channel at the transmitter side, can be avoided by inverting the
channel matrix at the receiver. However, such a scheme might amplify the noise significantly and
degrade the transmission performance.
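The decomposition of the example above can be verified numerically; r and θ are illustrative values, and since M is unitary here, M* is implemented as the conjugate transpose:

import numpy as np

r, theta = 0.5, np.pi / 3
H = np.array([[1.0, r * np.exp(-1j * theta)],
              [-r * np.exp(1j * theta), 1.0]])
lam = np.array([1 + 1j * r, 1 - 1j * r])
M = np.array([[1.0, 1.0],
              [1j * np.exp(1j * theta), -1j * np.exp(1j * theta)]]) / np.sqrt(2)
print(np.allclose(M @ np.diag(lam) @ M.conj().T, H))   # True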

13.9 Conclusion
The AC matrix is directly involved in modeling with the MMSE criterion. Although it does not
appear explicitly in the most common adaptive filter algorithms, its eigenvalues – in particular, the
minimum eigenvalue – control the system operation.
The AC matrix is instrumental in high-resolution spectral analysis of signals, through the
harmonic decomposition method. In that method, the signal is modeled by a set of sinusoids
in noise, whose frequencies are linked to the zeros of the minimum eigen filter – that is, the
filter coefficient vector is the eigenvector associated with the minimum eigenvalue. Compared
with linear prediction, harmonic decomposition avoids the bias introduced by the least squares
criterion.
Linear prediction allows for real-time signal analysis and can lead to simple and efficient
compression methods.
The techniques developed for one-dimensional signals can be extended to multidimensional signals with real-time matrix processing, in mass-market applications – in particular, mobile radiocommunications.

Appendix: Estimation Bounds


When estimating a frequency from a set of N samples, three parameters are taken into account – namely amplitude A, angular frequency ω, and phase φ. Consider the signal:

$$x(n) = Ae^{j(n\omega + \varphi)} + b(n); \qquad 0 \le n \le N-1 \qquad (13.A1)$$

At optimum, and in the absence of bias, the covariance matrix of the parameter estimation vector θ is related to the power σ_b² of the noise b(n) by:

$$\mathrm{var}\{\theta\} = \left[M_G^t\, \overline{M}_G\right]^{-1} \sigma_b^2 \qquad (13.A2)$$

where M_G is the matrix of the derivatives of the measurements with respect to the parameters. Taking the derivative of x(n), as defined in (13.A1), with respect to the parameters, we obtain:

$$M_G^t\, \overline{M}_G = \begin{pmatrix} N & -jA\dfrac{N(N-1)}{2} & -jAN \\ jA\dfrac{N(N-1)}{2} & A^2\dfrac{N(N-1)(2N-1)}{6} & A^2\dfrac{N(N-1)}{2} \\ jAN & A^2\dfrac{N(N-1)}{2} & A^2N \end{pmatrix} \qquad (13.A3)$$

Since the three parameters to be estimated are real, their variances are obtained from the real part of the above matrix M_G^t M̄_G only. Assuming the noise is complex, the inverse of the real part of the matrix M_G^t M̄_G must be multiplied by σ_b²/2, which yields:

$$\mathrm{var}\{A\} = \frac{\sigma_b^2}{2N}; \qquad \mathrm{var}\{\omega\} = \frac{1}{SB}\,\frac{6}{N(N^2-1)}; \qquad \mathrm{var}\{\varphi\} = \frac{1}{SB}\,\frac{2N-1}{N(N+1)} \qquad (13.A4)$$
These bounds are called Cramer–Rao bounds.

Exercises
13.1 Real signal estimation. Consider the sequence:

x(n) = 2cos(2πnf); 0 ≤ n ≤ N

Apply definition (13.18) to find the AC function. For f = 1/8 and N = 16, verify that the value r_1 = 0.618 is obtained.
Clearly, the AC function does not provide an accurate estimation of the frequency. In order to use the iterated correlation technique, relation (13.31) is modified by canceling the terms with index greater than N − 1 in the sequence y(n) = TFD_{2N}(x). Refer to Section 9.2 for justification.
For f = 0.1, we find f_estim = 0.1014, and if a noise such that SNR = 10 (10 dB) is added to the signal, the variance becomes var = 4.7 × 10^{−6}. Compare with the estimation bound (13.23) for complex signals.

13.2 The signal x(n) = m + e(n), where m is a constant and e(n) is a white noise of power 𝜎e2 , is
applied to a recursive estimator whose output is:

y(n) = (1 − b)y(n − 1) + bx(n)

Assuming x(n) = 0 for n < 0, compute y(n). If b = 0.8, how many samples are needed for
y(n) to approach m, in the mean, within 1%?
Compute the output mean square error, E[[y(n) − m]2 ], for n > 0. What is the limit when
n tends toward infinity? Study the evolution and the choice of coefficient b for the three
cases: 𝜎 e ≈ m; 𝜎 e > m; and 𝜎 e < m.
Compare the performance of the recursive estimator with that of the non-recursive estima-
tor defined by:

$$y(n) = \frac{1}{n+1}\sum_{i=0}^{n} x(i)$$

13.3 Consider the signal sequence x(n) = √2 sin(nπ/4). Compute the first three terms of the AC function. Compute the eigenvalues and eigenvectors of the 3 × 3 AC matrix. Verify expression (13.40) for the decomposition and reconstruction of the matrix.

13.4 A second-order predictor with transfer function:

$$H(Z) = 1 - a_1 Z^{-1} - a_2 Z^{-2}$$

is applied to the signal:

$$x(n) = \sqrt{2}\sin(n\omega) + b(n)$$

where b(n) is a Gaussian white noise with power σ_b². Give the power of the predictor output signal. Through derivation, find the expressions of the prediction coefficients. How do these values evolve when the noise power grows?

13.5 Consider the signal:

$$x(n) = \sin\left(\frac{n\pi}{4}\right) + \cos\left(\frac{n\pi}{3}\right)$$

Compute the first three terms of the AC function. Give the coefficients of the second-order predictor. Place the zeros of the prediction filter in the Z-plane.
Give the expression of the prediction error signal and compute its power.

13.6 Give the polynomial decomposition with even and odd symmetry for the following prediction filter:

$$1 - A_N(Z) = (1 - 1.6Z^{-1} + 0.9Z^{-2})(1 - Z^{-1} + Z^{-2})$$

Give the zeros of the polynomials obtained and compare them with those of the initial filter.

13.7 In a radio transmission link, in order to dispose of large jamming signals, a three-antenna network is employed, and the following sequences are available:

$$x_1(n) = d(n) + \sqrt{2}\sin(n\pi/4 - \pi/6); \qquad x_2(n) = d(n) + \sqrt{2}\sin(n\pi/4);$$
$$x_3(n) = d(n) + \sqrt{2}\sin(n\pi/4 + \pi/6)$$

The data d(n) are independent and have unit power. Compute the 3 × 3 covariance matrix R_3 of the input signals.
A weighted summation of these three signals is performed. Write the matrix equation needed to compute the weighting coefficient values such that the summation yields the useful sequence d(n). Deduce that the first and third coefficients are equal, and provide the coefficient values.

13.8 A MIMO system with two antennas at the transmitter side and the receiver side has the channel coefficient matrix:

$$H = \begin{pmatrix} 0.8 & 0.5e^{j\pi/3} \\ 0.7e^{j\pi/4} & 0.6e^{-j\pi/8} \end{pmatrix}$$

Compute the eigenvalues of that matrix.
The total emitted power is unity, and it is uniformly distributed between the two antennas. The additive white noise power at each receiver antenna is σ_b² = 0.1. Compute the theoretical capacity of the system expressed in bit/s/Hz.

References

1 L. Marple, Digital Spectrum Analysis with Applications, Prentice-Hall, New York, 1987.
2 E. Aboutanios and B. Mulgrew, Iterative frequency estimation by interpolation on Fourier coeffi-
cients, IEEE Transactions on Signal Processing, 53(4), 1237–1242, 2005.

3 S.S. Reddy, Eigenvector Properties of Toeplitz Matrices and Their Applications to Spectral Analy-
sis of Time Series, Signal Processing, Vol. 7, North Holland, 1984, pp. 46–56.
4 J. Makhoul, Linear prediction: a tutorial review, Proceedings of the IEEE, Vol. 63, pp. 561–580,
1975.
5 A. Paulraj, R. Nabar and D. Gore, Introduction to Space-Time Wireless Communications, Cam-
bridge University Press, USA, 2003.
6 D. Tse and P. Viswanath, Fundamentals of Wireless Communication, Cambridge University Press,
USA, 2005.

14

Adaptive Filtering

Adaptive filtering is used when we need to realize, simulate, or model a system whose char-
acteristics develop over time. It leads to the use of filters whose coefficients change with time.
The variations in the coefficients are defined by an optimization criterion and are realized
according to an adaptation algorithm, both of which are determined depending on the application.
There are many different possible criteria and algorithms [1-4]. This chapter examines the simple
but, in practice, most important case in which the criterion of mean square error minimization is
associated with the gradient algorithm.
While fixed coefficient filtering is generally associated with specifications in the frequency
domain, adaptive filtering corresponds to specifications in time. It is natural to introduce
this subject by considering the calculation of filter coefficients in these conditions. We begin
by examining FIR filters.

14.1 Principle of Adaptive Filtering


The principle of adaptive filtering is illustrated in Figure 14.1. It consists of processing the input
signal x(n) to produce an output ỹ (n), whose difference with the reference y(n) is minimized.
For every new set of data, reference, and input signal, the coefficients of the filter, assumed to be
of FIR type and represented by the vector H(n), are updated.
At time n, assuming n data have been received, the cost function for the
optimization procedure is chosen as:

n
J(n) = [y(p) − H t (n)X(p)]2 (14.1)
p=1

where X(p) now designates the column vector of the N most recent input samples at time p:
X t (p) = [x(p), x(p − 1) · · · x(p + 1 − N)]
Applying the results from Section 13.5, we obtain the following expression for the filter
coefficients:
H(n) = R−1
N (n)ryx (n) (14.2)


Figure 14.1 Adaptive filter principle.

The input signal AC matrix can be conveniently expressed as:

$$R_N(n) = \sum_{p=1}^{n} X(p)X^t(p) = \sum_{p=1}^{n}\begin{pmatrix} x(p) \\ x(p-1) \\ \vdots \\ x(p+1-N) \end{pmatrix}[x(p), x(p-1), \ldots, x(p+1-N)] \qquad (14.3)$$

Similarly, the estimation of the cross-correlation vector between input and reference is:

$$r_{yx}(n) = \sum_{p=1}^{n} y(p)X(p) \qquad (14.4)$$

Whenever a new data set {x(n + 1), y(n + 1)} becomes available, the new coefficient vector H(n + 1) can be recursively computed from H(n). The definition equations (14.3) and (14.4) lead to the recursions:

$$R_N(n+1) = R_N(n) + X(n+1)X^t(n+1)$$
$$r_{yx}(n+1) = r_{yx}(n) + X(n+1)y(n+1) \qquad (14.5)$$

Now:

$$R_N(n+1)H(n+1) = r_{yx}(n+1) = r_{yx}(n) + X(n+1)y(n+1)$$

and:

$$R_N(n+1)H(n+1) = R_N(n)H(n) + X(n+1)y(n+1)$$

and finally:

$$R_N(n+1)H(n+1) = \left[R_N(n+1) - X(n+1)X^t(n+1)\right]H(n) + X(n+1)y(n+1)$$

Hence the recursion:

$$H(n+1) = H(n) + R_N^{-1}(n+1)X(n+1)\left[y(n+1) - H^t(n)X(n+1)\right] \qquad (14.6)$$

It is interesting to observe that the term:

$$e(n+1) = y(n+1) - H^t(n)X(n+1) \qquad (14.7)$$

is the error at system output, computed at time (n + 1) with the coefficients H(n) available at time n. This is called the a priori error, while the same calculation with H(n + 1) is called the a posteriori error.
The algorithms in which the coefficients are computed each time, according to equation (14.6), are the least squares algorithms.
Simplified but very useful algorithms are obtained when the matrix R_N^{-1}(n) is replaced by the diagonal matrix δI_N, where δ is a scalar called the adaptation step size. Accordingly, the coefficients are updated by:

$$H(n+1) = H(n) + \delta X(n+1)e(n+1) \qquad (14.8)$$
The algorithm is called the gradient algorithm because the vector −X(n + 1)e(n + 1) is actually the gradient of the function (1/2)e²(n + 1), which is the instantaneous value of the quadratic error [4]. Thus, the change in the coefficient values is carried out in the direction of the error gradient, but with the opposite sign, so as to move toward the minimum. The procedure is similar to steepest descent in optimization.
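A minimal sketch of the gradient (LMS) algorithm applied to system identification is given below; the unknown system h_true and the signals are illustrative:

import numpy as np

rng = np.random.default_rng(0)
N = 8
h_true = rng.standard_normal(N)        # system to be modeled
H = np.zeros(N)                        # adaptive coefficients
delta = 0.05                           # step size, within bound (14.17)
x = rng.standard_normal(5000)          # input signal, unit power
for n in range(N, len(x)):
    X = x[n:n - N:-1]                  # N most recent input samples
    y = h_true @ X                     # reference signal
    e = y - H @ X                      # a priori error (14.7)
    H += delta * X * e                 # coefficient update (14.8)
# After convergence, H approximates h_true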
The above procedure can easily be extended to complex signals. Computing the gradient of the squared norm of the complex output error leads to the updating equation for complex coefficients:

$$H(n+1) = H(n) + \delta\, \overline{X}(n+1)\, e(n+1) \qquad (14.8bis)$$

Indeed, the computational complexity increases fourfold.
In a stationary environment, the coefficient vector converges, in the mean, to the theoretical
solution. Substituting equation (14.7) into equation (14.8) yields:
H(n + 1) = [ IN − 𝛿 X(n + 1)X t (n + 1)]H(n) + 𝛿 X(n + 1)y(n + 1) (14.9)
Now, defining:
RN = E[X(n)X t (n)]; ryx = E[y(n)X(n)] (14.10)
The optimum set of coefficients is reached when n tends toward infinity:

$$E[H(\infty)] = H_{opt} = R_N^{-1}\, r_{yx} \qquad (14.11)$$
The matrix RN is the N × N signal autocorrelation matrix and r yx is the vector of the N first
elements of the inter-correlation between input and reference. Finally, the gradient algorithm
converges in the mean to the optimal solution H opt ; hence also the denomination of stochas-
tic gradient, which is sometimes used. The minimization criterion is the least mean squares
(LMS) criterion.
Once the convergence has been obtained, the optimum coefficient values are given by (14.11).
The minimum value Emin of the quadratic error corresponding to the optimum coefficient val-
ues can be expressed in terms of the signals x(n), y(n), and their cross-correlation, as indicated in
Section 13.5.
The circuit for the resulting adaptive filter is shown in Figure 14.2. The variations of the coefficients are calculated by multiplying each stored input value by the difference e(n) and accumulating.

Figure 14.2 Adaptive FIR filter in direct form.

The choice of the value 𝛿 in equation (14.8) leads to a compromise between the adaptation speed
and the value of the residual error when the adaptation is obtained. These two properties will be
studied later, but it is first necessary to study the convergence conditions.

14.2 Convergence Conditions

The case of a well-dimensioned system without noise is considered first, which means
that the residual error is null after convergence. Designating the coefficient deviation by
ΔH(n) = H opt − H(n), we can write the following at time n:

ΔH(n + 1) = ΔH(n) − 𝛿 X(n + 1) e(n + 1) (14.12)

and also:
y(n + 1) = Hoptᵗ X(n + 1);  e(n + 1) = ΔH t (n)X(n + 1) (14.13)

The norm of the coefficient deviation vector is:

||ΔH(n + 1)||₂² = ||ΔH(n)||₂² + 𝛿² e²(n + 1)X t (n + 1)X(n + 1) − 2𝛿e(n + 1)ΔH t (n)X(n + 1)

and, using (14.13):

||ΔH(n + 1)||₂² = ||ΔH(n)||₂² + 𝛿 e²(n + 1)[𝛿 X t (n + 1)X(n + 1) − 2]   (14.14)

A monotonically decreasing sequence is obtained, and convergence is guaranteed, if the following
condition is satisfied:
0 < 𝛿 < 2 ∕ [X t (n + 1)X(n + 1)]   (14.15)
The constraints for selecting the adaptation step can be deduced:
0 < 𝛿 < 2 ∕ (N max[x²(n)])   (14.16)
For signals with a high peak factor and a number of coefficients N beyond a few units, this
upper bound might be too restrictive and may lead to slow adaptation.

Figure 14.3 Adaptive filter with measurement noise.



Considering deviation means, taking the expectations of the two terms in (14.14), and assuming
independence of e²(n + 1) and X t (n + 1)X(n + 1), it appears that convergence cannot occur unless
the following condition is satisfied:
0 < 𝛿 < 2 ∕ (N𝜎x²)   (14.17)
where 𝜎x² is the input signal power.
The two upper bounds (14.16) and (14.17) are linked by the peak factor F c of the input signal and,
in applications, an intermediate value may be selected – for example (14.17), with some margin.
In the case of Gaussian signals, it can be shown that convergence is guaranteed provided the
adaptation step satisfies:
0 < 𝛿 < (1∕3) · 2 ∕ (N𝜎x²)   (14.18)
An alternative way of deriving (14.17) is worth mentioning. A convergence condition can be
derived by considering the a posteriori error defined by:

𝜀(n + 1) = y(n + 1) − H t (n + 1)X(n + 1) (14.7bis)

The adaptive filter works properly if the adaptation is efficient – that is, if the a posteriori error is
smaller, on average, than the a priori one:

E[|𝜀(n + 1)|] < E[|e(n + 1)|]

Substituting the coefficient updating equation (14.8) into the definition of 𝜀(n + 1) yields:

𝜀(n + 1) = [1 − 𝛿X t (n + 1)X(n + 1)]e(n + 1)

The two errors are, in fact, proportional, and condition (14.15) follows.
If a measurement noise is introduced, the block diagram is as shown in Figure 14.3.
The output error is:

e(n + 1) = ΔH t (n)X(n + 1) + b(n + 1) (14.19)

The noise b(n) is zero mean and its power is 𝜎b². Then, the quadratic coefficient deviation satisfies
the recurrence:

||ΔH(n + 1)||₂² = ||ΔH(n)||₂² + 𝛿 e²(n + 1)[𝛿 X t (n + 1)X(n + 1) − 2]
                + 2𝛿b²(n + 1) + 2𝛿b(n + 1)ΔH t (n)X(n + 1)   (14.20)

Since the noise is uncorrelated with the signal, the last term in the right-hand side of the equation
is zero in the mean and the convergence condition is derived by taking the expectations of the two
terms in (14.20):
[𝛿N𝜎x² − 2] ER + 2𝜎b² < 0   (14.21)

where ER is the mean square error. Then, the convergence stops as soon as the following equality
is satisfied:
ER = 𝜎b² ∕ [1 − (𝛿∕2)N𝜎x²]   (14.22)
Thus, after convergence, a residual error with power ER remains, and the noise power corre-
sponds to the minimum mean square error Emin , which is reached when the coefficients take on
their optimal value H opt . This residual error is analyzed in Section 14.4.
With the stability ensured, it is interesting to evaluate the adaptation speed and to determine the
time constant of the adaptive filter.

14.3 Time Constant


Let us first consider a filter with a single coefficient h1 (n). In these conditions, equation (14.9) can
be written:
h1 (n + 1) = [1 − 𝛿x²(n + 1)] h1 (n) + 𝛿x(n + 1)y(n + 1) (14.23)
The coefficient of this filter has the mean value b = 1 − 𝛿𝜎x². For a global assessment of the char-
acteristics, that filter can be viewed as a first-order IIR section with a fixed coefficient equal to b.
Then, for small values of 𝛿, expression (6.5) yields the time constant:
𝜏 ≈ 1 ∕ (𝛿𝜎x²)
The result can be generalized to a filter with N coefficients, under certain conditions.
Assume [𝛼(n)] is the column vector:
[𝛼(n)] = M [Hopt − H(n)] (14.24)
where M is the square matrix of order N, which appears in the decomposition of RN given by
equation (13.26).
Using (14.11), it can be verified that equation (14.9), which defines the evolution of the coeffi-
cients, can be written as follows, in the transformed space and in the mean:
E[𝛼(n + 1)] = [IN − 𝛿 diag(𝜆i )] E[𝛼(n)] (14.25)
The system corresponds to N time constants:
𝜏i = 1 ∕ (𝛿𝜆i )   (14.26)
and it shows that the smallest eigenvalue of the AC matrix, 𝜆min , determines the convergence time
of the adaptive filter. The most favorable case occurs when the input signal is a white noise, because
all the eigenvalues are equal to the signal power:
𝜏e = 1 ∕ (𝛿𝜎x²)   (14.27)
This expression provides an estimate of the time constant, which turns out to be proportional to
the inverse of the adaptation step. The equations of the coefficient evolution can be simplified in
that case, as can that of the mean square error evolution which is:
E(n) = (1 − 𝛿𝜎x²)²ⁿ 𝜎y²   (14.28)

Assuming the initial coefficient values are null, 𝜎y² is the power of the reference signal.
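As a numerical check of these relations, the following sketch (with assumed, illustrative parameter values) compares the time constant given by (14.27) with the decay of the theoretical error curve (14.28):

import numpy as np

# Assumed, illustrative values.
sigma_x2, sigma_y2 = 1.0, 1.0     # input and reference signal powers
delta = 0.01                      # adaptation step size

tau_e = 1.0 / (delta * sigma_x2)                    # time constant, equation (14.27)
n = np.arange(2000)
E = (1 - delta * sigma_x2) ** (2 * n) * sigma_y2    # error decay, equation (14.28)

# First index at which E(n) has dropped by the factor e^(-2),
# i.e. after approximately one time constant.
n_tau = int(np.argmax(E <= sigma_y2 * np.exp(-2)))
print(tau_e, n_tau)     # both close to 100 for these values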
In concrete applications, maximum adaptation speed is often sought and, thus, the maximum
adaptation step size is contemplated.
The convergence condition (14.17) provides an upper bound for the adaptation step, and if 𝛿
exceeds that bound then the output error will grow. A geometric illustration, based on the error
surface representation, assumed to be symmetric, shows that the fastest adaptation is obtained
with half that bound, 𝛿 = 1 ∕ (N𝜎x²). That result can be proven analytically using the approach
described in the next section. In these conditions, the time constant satisfies the following inequality:
𝜏e ≥ N (14.29)
In order to complete the examination of this adaptive filter, the residual error after adaptation
still has to be evaluated in the general case – that is, with imperfect dimensioning and measurement
noise.

14.4 Residual Error


After a transition phase corresponding to the convergence, the coefficients of the adaptive filter
continually vary about their optimal value, because the adaptation step 𝛿 remains constant.
This, incidentally, is the condition for permanent adaptation of the system. As a result, the residual
error ER , defined as the limit of the expectation of the quadratic error E(n) as n tends toward
infinity, is greater than the minimum value Emin .
The residual error ER is evaluated by considering the vector [𝛼(n)] defined by equation (14.24)
and its evolution, described by the following recursion:
[𝛼(n + 1)] = [𝛼(n)] − 𝛿MX(n + 1)e(n + 1) (14.30)
In order to estimate the variances of the elements of the vector [𝛼(n)], it is convenient to consider
the matrix [𝛼(n)] [𝛼(n)]t whose main diagonal is made of those elements squared:
[𝛼(n + 1)][𝛼(n + 1)]t = [𝛼(n)][𝛼(n)]t − 2𝛿MX(n + 1)e(n + 1)[𝛼(n)]t
                        + 𝛿² e²(n + 1)MX(n + 1)X t (n + 1)M t   (14.31)
In terms of [𝛼(n)], the error is expressed by:
e(n + 1) = y(n + 1) − Hoptᵗ X(n + 1) + X t (n + 1)M t [𝛼(n)]
The evolution of the adaptive system is governed by the above two equations. In order to obtain
useful results, it is necessary to make several simplifying assumptions.
The following variables are assumed to be independent:
(1) The output error when the coefficients take on their optimal values.
(2) The data vector: X(n + 1).
(3) The coefficient deviation: H(n) − H opt .
The consequence is the following equality:
E{[y(n + 1) − Hoptᵗ X(n + 1)] X t (n + 1)M t [𝛼(n)]} = 0   (14.32)
Taking the expectation of both sides of equation (14.31) yields:
E{[𝛼(n + 1)][𝛼(n + 1)]t } = [IN − 2𝛿 diag(𝜆i )] E{[𝛼(n)][𝛼(n)]t } + 𝛿² E(n) diag(𝜆i )   (14.33)

After the transient phase, when n tends toward infinity:


E{[𝛼(∞)][𝛼(∞)]t } = (𝛿∕2) E(∞) IN   (14.34)
Because of the definition (14.24) of [𝛼(n)], we also have:
E[[Hopt − H(∞)][Hopt − H(∞)]t ] = (𝛿∕2) E(∞) IN   (14.35)
Therefore, after convergence, the coefficient deviations are independent, and they all have the
same variance.
Next, the residual error power must be computed.
For the coefficient deviation ΔH(n), the mean square error is given by:

E(n) = Emin + ΔH t (n) RN ΔH(n) (14.36)

Alternatively, using (14.24):

E(n) = Emin + [𝛼(n)]t diag(𝜆i ) [𝛼(n)] (14.37)

Computing the products, we find:



E(n) = Emin + ∑_{i=0}^{N−1} 𝜆i 𝛼i²(n)   (14.38)

Since the coefficient deviations have the same variance, factorization can take place and, using
(14.35), it appears that the residual error at infinity, ER = E(∞), is given by:
ER = Emin ∕ [1 − (𝛿∕2)N𝜎x²]   (14.39)
It is worth pointing out that the above equation leads to the stability condition (14.17) derived by
a different approach.
In practice, due to the margin generally taken on the step size 𝛿, the following approximation
holds:
ER ≃ Emin (1 + (𝛿∕2) N𝜎x²)   (14.40)
In terms of the time constant, with equation (14.27), we have:
ER ≃ Emin (1 + NT ∕ (2𝜏e ))   (14.41)
Thus, the relation between the time constant and the residual error is clearly seen. T is the
sampling period, taken here as being equal to unity.
The increase in residual error due to the step size 𝛿 can be viewed as a gradient noise.
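As a numerical illustration (all values assumed for the example), the exact and approximate expressions of the residual error can be compared directly:

# Assumed, illustrative values.
E_min = 1e-3          # minimum mean square error
delta, N, sigma_x2 = 0.01, 16, 1.0

E_R_exact  = E_min / (1 - (delta / 2) * N * sigma_x2)    # equation (14.39)
E_R_approx = E_min * (1 + (delta / 2) * N * sigma_x2)    # equation (14.40)
tau_e = 1.0 / (delta * sigma_x2)
E_R_tau = E_min * (1 + N / (2 * tau_e))                  # equation (14.41), with T = 1
print(E_R_exact, E_R_approx, E_R_tau)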

14.5 Complexity Parameters

The complexity parameters of adaptive filters are the same as those for filters with fixed coeffi-
cients. The most important are the multiplication rate, the number of bits of the coefficients, and
the internal memories. The limitations on the number of bits of the coefficients and internal data
increase the residual error ERT . The specifications are generally given in terms of a minimum gain
of the system. That is, the ratio of the power of the reference signal 𝜎y2 to the total residual error ERT

should exceed a specified value G²:
𝜎y² ∕ ERT ≥ G²   (14.42)
The time constant 𝜏 e , if it is imposed, should be compatible with the minimum gain of the system
so that the system can be realized using the gradient technique.
The values of the parameters G and 𝜏 e of the adaptive filter can be used to calculate the number
of bits of the coefficients and internal data for each structure. The order of the filter is chosen to be
sufficiently large for the minimum error Emin to be acceptably small and for equation (14.42)
to be satisfied.
In the case of the FIR filter realized as a direct structure as in Figure 14.2, rounding is generally
carried out at the output of the multipliers. The noise generated by rounding the internal data
with the quantization step q2 corresponds to the addition of the power N(q2²∕12) to the minimal
error power Emin.
If the coefficients are quantized with the step q1 , the round-off errors make a vector which has to
be incorporated into the coefficient evolution equation. Assuming that round-off errors are inde-
pendent of other signals, the additional term (q1²∕12) IN has to be introduced in equation (14.33),
and equation (14.34) becomes:
E{[𝛼(∞)][𝛼(∞)]t } = (𝛿∕2) E(∞) IN + (1∕(2𝛿)) (q1²∕12) diag(1∕𝜆i )   (14.43)
Finally, the total residual error ERT is given by:
ERT = [Emin + N (q2²∕12) + (N∕(2𝛿)) (q1²∕12)] ∕ [1 − (𝛿∕2)N𝜎x²]   (14.44)

which, for a small step size 𝛿, can be approximated by:


ERT ≃ Emin (1 + (𝛿N𝜎x²)∕2) + (N∕(2𝛿)) (q1²∕12) + N (q2²∕12)   (14.45)
The relative values of the four terms in this expression can be chosen to suit each application.
One common option is to consider Emin as the most important term and to assume that the addi-
tional residual error caused by the adaptation step 𝛿 is equal to the noise introduced by the internal
rounding – that is,
(1∕2) (𝛿N𝜎x²∕2) Emin = (N∕(2𝛿)) (q1²∕12) = N (q2²∕12)   (14.46)
If bc is the number of bits of the coefficients and hmax is the amplitude of the largest coefficient,
then:
q1 = hmax · 2^(1−bc)

Under these conditions,


2^(2bc) = (2∕3) hmax² ∕ (𝛿² Emin 𝜎x²)   (14.47)
With the assumption that Emin is the dominant term in equation (14.45) – that is,

G² ⋅ Emin ≈ 𝜎y²

and by introducing the time constant, one has approximately:


bc ≈ log2 (𝜏e ) + log2 (G) + log2 (hmax 𝜎x ∕ 𝜎y )   (14.48)
The term hmax 𝜎x ∕𝜎y depends on the system gain, the signals, and the filter order; it must be
determined for every application.
Equation (14.48) shows that the number of bits of the coefficients is directly related to the time
constant and to the gain of the system. It can assist designers in selecting from the available options.
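A small helper, sketched below under assumed parameter values, evaluates the word-length estimate (14.48); the function name and the chosen numbers are illustrative only:

import math

def coefficient_bits(tau_e, G, h_max, sigma_x, sigma_y):
    # Coefficient word-length estimate, equation (14.48).
    return math.log2(tau_e) + math.log2(G) + math.log2(h_max * sigma_x / sigma_y)

# Assumed, illustrative values: time constant of 1000 samples, gain G = 100 (40 dB),
# and h_max * sigma_x / sigma_y = 2.
print(math.ceil(coefficient_bits(1000.0, 100.0, 2.0, 1.0, 1.0)))   # about 18 bits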

14.6 Normalized Algorithms and Sign Algorithms


The time constant of an adaptive filter and its residual error are related to the power of the input
signal x(n). When this power can vary greatly, the adaptation can be modified as follows:
H(n + 1) = H(n) + [𝛿 ∕ (X t (n + 1)X(n + 1))] X(n + 1) e(n + 1)   (14.49)
This is described as a normalized algorithm. It can be verified that the a posteriori error is zero
if 𝛿 = 1. In practice, rather than calculate the scalar product X t (n + 1)X(n + 1), a recursive estimate
of the power can be made, and this leads to:
Px (n + 1) = (1 − 𝜀) Px (n) + 𝜀 x²(n + 1)   (14.50)
H(n + 1) = H(n) + 𝛿 X(n + 1) e(n + 1) ∕ Px (n + 1)   (14.51)
The parameter 𝜀 of the recursive estimate is chosen as a function of the variations of the signal
power. It must be at least of order 1/N, where N is the number of coefficients for the filter.
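A minimal sketch of the normalized algorithm follows; the initial power value guarding against division by zero is an implementation assumption, not part of the text:

import numpy as np

def nlms(x, y, N, delta, eps):
    # Normalized algorithm with recursive power estimate, equations (14.50)-(14.51).
    # eps should be at least of order 1/N, as indicated in the text.
    H = np.zeros(N)
    Px = 1e-6                 # small initial power estimate, avoids division by zero
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        X = x[n - N + 1:n + 1][::-1]
        Px = (1 - eps) * Px + eps * x[n] ** 2     # power estimate, equation (14.50)
        e[n] = y[n] - H @ X
        H = H + delta * X * e[n] / Px             # normalized update, equation (14.51)
    return H, e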
In certain applications, it is important to minimize the operations, and simplified algorithms are
then used in which the variations of the coefficients are a function of the sign of the terms e(n) and
x(n), or even the products e(n)x(n − i); these are sign algorithms. The reduction in complexity thus
obtained comes at the cost of a degradation in some aspects of system performance [2].
Consider the following coefficient adaptation algorithm:
hi (n + 1) = hi (n) + Δe(n + 1) sign(x(n + 1 − i)) (14.52)
For nonzero x:
sign[x] = x ∕ |x|
If the amplitude distribution of x(n) is symmetrical, in a rough approximation, |x(n)| can be
replaced by 𝜎 x . Therefore, expression (14.52) is similar to equation (14.8) with:
𝛿 ≈ Δ ∕ 𝜎x
Proceeding further in that direction, the coefficient variations can be limited to constant values:
hi (n + 1) = hi (n) + Δ sign(e(n + 1) x(n + 1 − i)) (14.53)
Then, as an initial approximation, the coefficient variations in equation (14.53) are comparable
with those in equation (14.8), with:
𝛿 = Δ ∕ (𝜎e 𝜎x )   (14.54)

Starting from zero values for the coefficients, in the convergence phase of the filter, it can be taken
that 𝜎 e ≃ 𝜎 y and the time constant 𝜏 s for the sign algorithm can be expressed by:
𝜏s ≈ 𝜎y ∕ (Δ𝜎x )   (14.55)
After convergence, one can take 𝜎e² = Emin . If the variation step is sufficiently small, the residual
error ERS in the sign algorithm is:
ERS ≈ Emin (1 + (NΔ∕2) 𝜎x ∕ √Emin )   (14.56)
The residual error is thus found to be larger than with the gradient algorithm. This leads to low
values for Δ. It should also be noted that, remembering equation (14.54), the stability condition
(14.17) is represented by the inequality:

Δ ≤ (2∕N) √Emin ∕ 𝜎x   (14.57)
which can be taken as a convergence condition and can result in very small values of Δ. In practice,
equation (14.52) is usually modified to become:
hi (n + 1) = (1 − 𝜀) hi (n) + Δ sign(e(n + 1) x(n + 1 − i)) (14.58)
The constant 𝜀, positive and small, introduces a leakage function, which is needed, for example,
in transmission systems which must tolerate a certain error rate. Under these conditions, the coef-
ficients are bounded by:
Δ
|hi (n)| ≤ ; 0≤i≤N −1 (14.59)
𝜀
This modification leads to an increase in the residual error. The coefficients are biased, and
instead of (14.11) for small values of 𝜀 and Δ, we can write:
E[H(∞)] = [(𝜀∕Δ) I + RN ]⁻¹ E[y(n)X(n)]   (14.60)
The corresponding increase in residual error can be calculated by an expression similar to
equation (14.36).
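A possible sketch of this sign algorithm with leakage is given below; the names step (for Δ) and eps (for 𝜀) are illustrative:

import numpy as np

def sign_sign_lms(x, y, N, step, eps):
    # Sign algorithm with leakage, equation (14.58); the coefficient amplitudes
    # stay bounded by step/eps, in accordance with (14.59).
    H = np.zeros(N)
    e = np.zeros(len(x))
    for n in range(N - 1, len(x)):
        X = x[n - N + 1:n + 1][::-1]
        e[n] = y[n] - H @ X
        H = (1 - eps) * H + step * np.sign(e[n] * X)
    return H, e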
The constants 𝜀 and Δ are chosen on the basis of the performance to be achieved in each case.
The adaptive filters considered above are of the direct structure FIR type. It is a simple and robust
approach and is commonly used. However, as with fixed coefficient filters, other structures can be
employed.

14.7 Adaptive FIR Filtering in Cascade Form


In certain modeling problems, particularly in automatic control, it is important to know the roots
of the Z-transfer function of the adaptive filter. It is then convenient to use a cascade arrangement
of L second-order sections Hi (Z), 1 ≤ i ≤ L, such that:
Hi (Z) = 1 + ai1 Z −1 + ai2 Z −2
From the results of Chapter 6, if the roots Z1i and Z2i are complex, then:
Z2i = Z̄1i ;  ai2 = |Z1i |²;  ai1 = −2 Re(Z1i )

Consider an adaptive filter whose transfer function H(Z) is:



H(Z) = ∏_{i=1}^{L} (1 + ai1 Z −1 + ai2 Z −2 )   (14.60)

Beginning with a given set of values of the coefficients, variations proportional to the gradient of
the error function E(A) must be applied so as to minimize the mean square error. This leads to:

𝜕E∕𝜕aik = −(2∕N0 ) ∑_{n=0}^{N0−1} [y(n) − ỹ(n)] 𝜕ỹ(n)∕𝜕aik ;  k = 1, 2;  1 ≤ i ≤ L   (14.61)
In order to calculate the term gki such that:
gki (n) = 𝜕ỹ(n) ∕ 𝜕aik
one can use the expression for ỹ(n) obtained by an inverse Z-transform on the transform X(Z) of
the set x(n) (see Section 4.2). Hence:
ỹ(n) = (1∕2𝜋j) ∫𝛤 Z^(n−1) ∏_{i=1}^{L} (1 + ai1 Z −1 + ai2 Z −2 ) X(Z) dZ

where Γ is a suitable integration contour. Hence:

𝜕ỹ(n)∕𝜕aik = (1∕2𝜋j) ∫𝛤 Z^(n−1) Z −k ∏_{l=1, l≠i}^{L} (1 + al1 Z −1 + al2 Z −2 ) X(Z) dZ

or,
𝜕ỹ(n)∕𝜕aik = (1∕2𝜋j) ∫𝛤 Z^(n−1) Z −k [H(Z) ∕ (1 + ai1 Z −1 + ai2 Z −2 )] X(Z) dZ   (14.62)

Thus, to form the term gki (n), it is sufficient to apply the set ỹ (n) to a recursive section whose
transfer function is the inverse of that of the initial section of order i. This recursive section has
the same coefficients, but with the opposite sign. The corresponding circuit is given in Figure 14.4.


Figure 14.4 Adaptive FIR filter in cascade form.



The variations in the coefficients are calculated by the expressions:

daik (n) = 𝛿 gki (n)[y(n) − ỹ(n)];  k = 1, 2;  1 ≤ i ≤ L   (14.63)

The filter obtained in this way is more complicated than that in the previous section, but it offers
a very simple method of finding the roots, which, due to the presence of a recursive part, should be
inside the unit circle in the Z-plane to ensure the stability of the system. The techniques derived
for FIR filters also apply to IIR filters.

14.8 Adaptive IIR Filtering

The coefficients of an IIR filter can be calculated for time specifications by using the least mean
squares technique in an iterative procedure, as was done in Section 7.3. Algorithms for adapting
the coefficients to the time evolution of the system can also be deduced.
A linear system can be modeled by a purely recursive IIR filter with a Z-transfer function, G(Z),
such that:
G(Z) = a0 ∕ (1 + ∑_{k=1}^{K} bk Z −k )   (14.64)
In this case, the model is said to be autoregressive (AR). This important and convenient approach
is also appropriate if the best representation of the system corresponds to a Z-transfer function,
H(Z), which is the quotient of two polynomials:
N(Z)
H(Z) =
D(Z)
where N(Z) has all its roots inside the unit circle and thus has minimum phase. In this case, for a
suitable integer M, we can write:

1∕N(Z) ≃ 1 + ∑_{i=1}^{M} ci Z −i

It is then sufficient to let the degree K of the denominator of the function G(Z) take a value
sufficient for representing H(Z). The presence of roots inside the unit circle of the system results
in an increase in the number of poles of the model [1].
The general IIR filter corresponds to an autoregressive moving average (ARMA) model. This is
the most widely used approach for modeling a linear system. For the IIR filter whose coefficients
must be calculated over a set of N 0 indices in order to approximate a set y(n), the output is written as:


ỹ(n) = ∑_{l=0}^{L} al x(n − l) − ∑_{k=1}^{K} bk ỹ(n − k)   (14.65)

The error function E(A, B) is expressed by:

E(A, B) = (1∕N0 ) ∑_{n=0}^{N0−1} [y(n) − ỹ(n)]²   (14.66)

Starting from a set of values for the coefficients, this function can be minimized using the gradient
algorithm if the coefficients are given increments proportional to the gradient of E(A, B) and of the
opposite sign. The presence of a recursive part causes complications. Calculation of the gradient

leads to the following expressions:

𝜕E∕𝜕al = −(2∕N0 ) ∑_{n=0}^{N0−1} [y(n) − ỹ(n)] 𝜕ỹ(n)∕𝜕al ;  0 ≤ l ≤ L
𝜕E∕𝜕bk = −(2∕N0 ) ∑_{n=0}^{N0−1} [y(n) − ỹ(n)] 𝜕ỹ(n)∕𝜕bk ;  1 ≤ k ≤ K

with:
𝜕ỹ(n)∕𝜕al = x(n − l) − ∑_{k=1}^{K} bk 𝜕ỹ(n − k)∕𝜕al   (14.67)
𝜕ỹ(n)∕𝜕bk = −ỹ(n − k) − ∑_{j=1}^{K} bj 𝜕ỹ(n − j)∕𝜕bk   (14.68)

To show the method of realizing equations (14.67) and (14.68), we can write:
H(Z) = (∑_{l=0}^{L} al Z −l ) ∕ (1 + ∑_{k=1}^{K} bk Z −k ) = N(Z) ∕ D(Z)

Then:
ỹ(n) = (1∕2𝜋j) ∫𝛤 Z^(n−1) H(Z)X(Z) dZ
and consequently:
𝜕ỹ(n)∕𝜕al = (1∕2𝜋j) ∫𝛤 Z^(n−1) Z −l [1 ∕ D(Z)] X(Z) dZ   (14.69)
𝜕ỹ(n)∕𝜕bk = −(1∕2𝜋j) ∫𝛤 Z^(n−1) Z −k [1 ∕ D(Z)] H(Z)X(Z) dZ   (14.70)
The gradient is thus calculated from the set obtained by applying x(n) and ỹ (n) to the circuits
corresponding to the transfer function 1/D(Z).
To simplify the implementation, the second terms in equations (14.67) and (14.68) can be ignored,
which leads to the following increments:

dal (n) = 𝛿 [y(n) − ỹ(n)] x(n − l);  0 ≤ l ≤ L   (14.71)
dbk (n) = −𝛿 [y(n) − ỹ(n)] ỹ(n − k);  1 ≤ k ≤ K

For each value of the index n, the coefficients al and bk are incremented by a quantity which is
proportional to the product of the error e(n) = y(n) − ỹ(n) with x(n − l) and ỹ(n − k), respectively.
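The simplified increments (14.71) lend themselves to a compact implementation; the following sketch is illustrative and, as noted in the code, does not monitor the stability of the recursive part:

import numpy as np

def adaptive_iir(x, y, L, K, delta):
    # Simplified gradient adaptation of an IIR filter, equation (14.71):
    # the recursive parts of the exact gradients (14.67)-(14.68) are ignored.
    a = np.zeros(L + 1)            # numerator coefficients a_0 ... a_L
    b = np.zeros(K)                # denominator coefficients b_1 ... b_K
    yt = np.zeros(len(x))          # filter output sequence
    for n in range(max(L, K), len(x)):
        xv = x[n - L:n + 1][::-1]  # x(n), x(n-1), ..., x(n-L)
        yv = yt[n - K:n][::-1]     # outputs y(n-1), ..., y(n-K)
        yt[n] = a @ xv - b @ yv    # output, equation (14.65)
        e = y[n] - yt[n]
        a = a + delta * e * xv     # da_l(n), equation (14.71)
        b = b - delta * e * yv     # db_k(n), equation (14.71)
    # Note: the stability of the recursive part is not monitored in this sketch.
    return a, b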
The stability and parameters of this type of filter can be studied in the same manner as for FIR
filters [5]. However, it is not difficult to control the stability of an IIR filter when it is constructed
as a cascade of second-order sections, which, as indicated in previous chapters, also offers other
advantages.
Consider a filter constructed as a cascade of second-order elements and with a transfer function
G(Z). In the AR case, we have:

G(Z) = a0 ∏_{i=1}^{L} 1 ∕ (1 + bi1 Z −1 + bi2 Z −2 )   (14.72)

To control the stability of such a filter, it is sufficient, as set forth in Section 6.7, to ensure that the
following conditions are fulfilled:
|bi2 | < 1; |bi1 | < 1 + bi2 ; 1 ⩽ i ⩽ L (14.73)
As before, the calculation of the gradient of the error function requires knowledge of the term gki ,
where:
gki (n) = 𝜕ỹ(n) ∕ 𝜕bik
Since,

ỹ(n) = (1∕2𝜋j) ∫𝛤 Z^(n−1) a0 ∏_{i=1}^{L} [1 ∕ (1 + bi1 Z −1 + bi2 Z −2 )] X(Z) dZ
we obtain:
𝜕ỹ(n)∕𝜕bik = −(1∕2𝜋j) ∫𝛤 Z^(n−1) [Z −k ∕ (1 + bi1 Z −1 + bi2 Z −2 )] G(Z)X(Z) dZ

This expression indicates that the terms gki (n), with k = 1, 2 and 1 ≤ i ≤ L, are obtained by applying
the set ỹ (n) to the ith recursive section. The stability of the system is tested by equation (14.73) for
each value of the index n.
The method which has been discussed is also applicable to an ARMA model, but in this case, the
circuits are rather more complicated.
The techniques used in the previous sections involve overall minimization of the mean square
error. It is also possible to achieve minimization step by step, using lattice structures.
The lattice filter in Figure 13.3 can also be adapted by a gradient algorithm for each value of the
index. Indeed, using the equations:
ei (n) = ei−1 (n) − ki bi−1 (n − 1)
bi (n) = bi−1 (n − 1) − ki ei−1 (n)   (14.74)
one can write the gradients as:
𝜕ei²(n) ∕ 𝜕ki = −2ei (n) bi−1 (n − 1)
𝜕bi²(n) ∕ 𝜕ki = −2bi (n) ei−1 (n)   (14.75)
and the following variations can be applied to the coefficients by assuming that the functions
E[ei²(n) + bi²(n)] for 1 ⩽ i ⩽ N are to be minimized:
ki (n + 1) = ki (n) + 𝛿i (ei (n)bi−1 (n − 1) + bi (n)ei−1 (n)) (14.76)
As the power of the signals ei (n) and bi (n) decreases with the index i, the variation step 𝛿 i must
be related to this power in order to obtain a certain degree of homogeneity of the time constants.
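A possible sketch of this lattice adaptation is given below; the per-stage step sizes deltas are left as parameters so that they can be scaled to the stage signal powers, as suggested above:

import numpy as np

def adaptive_lattice(x, N, deltas):
    # Gradient adaptation of the lattice coefficients, equations (14.74) and (14.76).
    k = np.zeros(N)
    b_prev = np.zeros(N + 1)       # backward errors b_i(n-1)
    for sample in x:
        e = np.zeros(N + 1)
        b = np.zeros(N + 1)
        e[0] = b[0] = sample
        for i in range(1, N + 1):
            e[i] = e[i - 1] - k[i - 1] * b_prev[i - 1]        # equation (14.74)
            b[i] = b_prev[i - 1] - k[i - 1] * e[i - 1]
            k[i - 1] += deltas[i - 1] * (e[i] * b_prev[i - 1]
                                         + b[i] * e[i - 1])   # equation (14.76)
        b_prev = b
    return k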

14.9 Conclusion
Several techniques for designing and producing adaptive filters have been presented in this chapter.
They are based on the gradient algorithm, which is the simplest and most robust approach for

changing the coefficients. The direct FIR structure has been studied in detail by developing the
adaptation parameters (the time constant and the residual error) and the complexity parameters
(multiplication rate and the number of bits of the coefficients and internal data). This is the struc-
ture most commonly used in practice. In some specific cases, different structures, such as IIR, mixed
FIR–IIR, or lattice structures, can offer significant advantages. Analysis of the stability conditions
in these structures and the study of the adaptation and complexity parameters can be performed
by a method similar to that given for the FIR structure.
The gradient algorithm results in a relatively slow change in the values of the coefficients of the
filter, especially when a low residual error is required and when it is used in its most reduced form:
the sign algorithm. In order to find the most rapid rate of adaptation, all the coefficients can be
recalculated periodically by using fast iterative procedures. The lattice structure is well suited to
this approach and allows for real-time analysis or modeling of signals such as speech with circuits
of moderate complexity.
The gradient algorithm can be improved, for example, by using different adaptation steps for the
coefficients, which are obtained from statistical estimates of the signal characteristics.
It is possible to envisage criteria which, for certain applications, are more appropriate than the
minimization of the mean square error, and algorithms which are more efficient than the gradient
algorithm can be developed [1]. However, these algorithms are generally more complicated to put
into operation, and problems of sensitivity to any imperfection in the realization may arise.
In conclusion, FIR and IIR structures operating according to the least mean square error crite-
rion and using the gradient algorithm, or its simplest form, the sign algorithm, offer a simple and
effective compromise for adaptive filtering applications.

Exercises
14.1 A signal x(n) can be modeled by the output of a filter H(Z) when the input is a unit power
white noise. The Z-transfer function is:
H(Z) = (1∕2)(1 + Z −1 ) ∕ (1 − 0.5Z −1 )
Compute the impulse response and show that the three first terms of the AC function
take on the values: r 0 = 1; r 1 = 0.75; r 2 = 0.375.
For compression, a two-coefficient predictor P(Z) is used, and the output sequence is:

y(n) = x(n) − a1 x(n − 1) − a2 x(n − 2)

Compute the optimum coefficient values – those which minimize the output power.
Give the output power value and the prediction gain Gp .
Compute the impulse response of the cascade H(Z)P(Z).
In an adaptive realization with the gradient algorithm, give the coefficient updating
equations and the maximum adaptation step value 𝜹m . Compute the residual error for
adaptation step 𝜹 = 𝜹m /4 and give the time constant of the adaptive predictor.

14.2 A sinusoidal signal x(n) is applied to a second-order FIR prediction filter:


x(n) = sin(2𝜋 · 3n∕8)
Calculate the coefficients a1 and a2 of this filter and locate the zeros in the Z-plane.

Starting from zero initial coefficients, examine the trajectory of the zeros of this filter for
an adaptation step 𝛿 = 0.1.
Give the new values of the prediction coefficients when discrete white noise of power 𝜎²
is added to the signal.

14.3 An FIR filter is used to equalize a channel with transfer function:


C(Z) = 0.5 ∕ (1 − 0.5 Z −1 )
Calculate the power of the received signal x(n), assuming uncorrelated unit power data.
For N = 3 coefficients, give the equalizer coefficient values.
A white noise with power 𝜎b² = 0.1 is added to the signal x(n). Calculate the new coeffi-
cient values and explain the difference. Give the noise amplification factor.
Using the channel + equalizer impulse response, compute the power of the inter-symbol
interference.
Compute the eigenvalues of the 3 × 3 AC matrix of the signal x(n). In an adaptive realiza-
tion, the step size is 𝛿 = 0.05. Give the time constant of the equalizer and check by simula-
tion.

14.4 Consider Figure 14.3 with the following filters:


Hopt (Z) = (1 + 0.5Z −1 ) ∕ (1 − 0.5Z −1 );  H(Z) = (a0 + a1 Z −1 ) ∕ (1 − bZ −1 )
The input is a unit power uncorrelated sequence, and the power of the noise b(n) is 𝜎b² = 0.1.
Compute the optimum coefficient values of the adaptive filter.
In an adaptive realization, give the coefficient updating equations and the bounds of the
adaptation step 𝜹. Draw the curves of the evolution of the coefficients for 𝜹 = 0.1 and verify
the time constant and residual error power after convergence.

References

1 B. Widrow and S. D. Stearns, Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, NJ,
1985.
2 M. Bellanger, Adaptive Digital Filters, 2nd Edition, Marcel Dekker Inc., New York, 2001.
3 O. Macchi, Adaptive Processing: The LMS Approach with Applications in Transmission, Wiley,
New York, 1995.
4 S. Haykin, Adaptive Filter Theory, 4th Edition, Prentice-Hall, Englewood Cliffs, NJ, 2000.
5 P. A. Regalia, Adaptive IIR Filtering in Signal Processing and Control, Marcel Dekker Inc.,
New York, 1995.

15

Neural Networks

Adaptive systems are at the heart of digital communication networks. They are particularly critical
for data transmission efficiency, which is achieved through channel equalization. This operation
involves an initial learning phase, an operational phase, and subsequent decisions and improve-
ments throughout the duration of the communication. Thanks to artificial intelligence, these
concepts can be extended to all technical fields, with operational devices that are ever more
complex and sophisticated. Signal processing and adaptive techniques, as presented above, are
profoundly involved in one such device – namely, the neural network.
The present chapter describes how neural networks operate, and how signal-processing tech-
niques, and specifically adaptive techniques, are exploited. As a starting point, we look at a simple
classification operation.

15.1 Classification
Classification is a basic operation in shape recognition. Its complexity depends on the space in
which it is carried out. In a two-dimensional space, objects defined by their coordinates can be
grouped together, and the different groups can be separated by curves. Whenever a new object
appears, it is assigned to an existing group, depending on its position with respect to the separation
curves. An illustration is provided in Figure 15.1.
N objects are separated into two groups (a and b) by a line with equation:
h0 + h1 X1 + h2 X2 = 0 (15.1)
The structure of the corresponding classifier is shown in Figure 15.2.
Assuming the objects are represented by points in the plane, during the learning phase, the coor-
dinates of the two categories of points are fed to the input, the output is subtracted from a reference
d, and the error e obtained is used to determine the coefficients of the separation line.
An iterative procedure can be employed to obtain the coefficient vector H, starting from an initial
vector. As in the previous chapter, we can write:
H(n + 1) = H(n) + [𝛿 ∕ (X t (n + 1)X(n + 1))] e(n + 1) X(n + 1);  0 ≤ n ≤ N − 1   (15.2)
with:
e(n + 1) = d(n + 1) − X t (n + 1)H(n) (15.3)


Figure 15.1 Using a 3-coefficient filter for classification.

Figure 15.2 Filter for distinguishing two groups of objects.

The procedure and the reference must be adjusted to the two types of data (a and b) to achieve
the desired separation. In the absence of prior information, the reference is null (d(n)=0).
The adaptation step 𝛿 = 1 leads to cancellation of the a posteriori error as shown in Section 14.6.
Hence the need to select 0 < 𝛿 < 1.
Once the learning phase is completed, the sign of the output y(n) is used for the classification
operation. In fact, a decision device is inserted at filter output and the whole structure is called a
“neuron”.
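A minimal sketch of such a neuron in Python is given below; here the reference d(n) is taken as the class label ±1, a common supervised choice that is an assumption of the example, and the initial coefficients are illustrative:

import numpy as np

def train_neuron(points, n_passes, delta):
    # points: list of (x1, x2, label) with label = +1 (group a) or -1 (group b);
    # the label plays the role of the reference d(n).
    H = np.array([0.0, 1.0, 1.0])      # illustrative initial coefficients (h0, h1, h2)
    for _ in range(n_passes):
        for x1, x2, label in points:
            X = np.array([1.0, x1, x2])
            e = label - X @ H                   # error, as in (15.3)
            H = H + delta * e * X / (X @ X)     # normalized update, as in (15.2)
    return H

# After learning, sign(H @ [1, x1, x2]) assigns a new object to group a or b.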
Now, if two other groups of objects (c and d) are introduced, two lines are needed to divide the
plane into four parts and potentially achieve separation, which leads to the combination of two
neurons, as shown in Figure 15.3. The system has two inputs fed by the coordinates of the objects
and two outputs which, if binary ±1, allow for an object to be assigned to one of the four groups.
Nonbinary outputs can also be exploited; they are interpreted as estimations of the probability for
the objects to belong to the groups.
The approach can be generalized – a neuron can have a large number of inputs and related coef-
ficients, and several such neurons can be connected to make a system with multiple inputs and
multiple outputs, along with their decision devices. Such a system is called a “perceptron” [1].
Now, if the curve needed to separate the groups of objects in the plane is no longer a line,
nonlinear processing is required, which is achieved through the combination of several per-
ceptrons. Then, it can be shown that 2 perceptrons are able to distinguish convex domains and

Figure 15.3 Classification with two neurons.

3 perceptrons can cope with the combination of convex domains – that is, domains of any form.
Such systems are called multilayer perceptrons.
Besides object classification, neural networks can serve to approximate or estimate nonlinear
functions. This stems from the partitioning capability provided by their nonlinear decision, or
activation, functions. In theory, any function can be approximated with arbitrary precision,
although some constraints may have to be introduced for actual realization [2]. This aspect is dealt
with at the end of the chapter.

15.2 Multilayer Perceptron

A system with two intermediary layers, called hidden layers, is shown in Figure 15.4. If all the
connections exist, the network is said to be fully connected.
By definition, in a network made of L+1 layers, the input and output correspond to layers 0 and L,
respectively, and L−1 hidden layers are present in between. The system may be parameterized as
follows:

– Number of neurons per layer: nl ; 1 ≤ l ≤ L


– So-called synaptic coefficients: hij,l ; 1 ≤ i ≤ nl−1 ; 1 ≤ j ≤ nl
– Nonlinear activation function: xj, l = f (uj, l )
– Synaptic weightings:

uj,l = ∑_{i=1}^{nl−1} hij,l xi,l−1 + h0j,l   (15.4)

The term h0j,l in the above summation represents the introduction of a bias, linked to the statis-
tics of the data and the type of activation function. It may be null for zero-mean data.
The nonlinear activation function f (.) may be the sign, as in Figure 15.3, on which a decision is
based. However, if the synaptic coefficients in the various branches have to be updated in supervised
learning, the activation function must be derivable, so that gradient techniques can be employed.
For example, the sigmoid:
f (u) = eᵘ ∕ (1 + eᵘ );  f ′(u) = eᵘ ∕ (1 + eᵘ )² = f (u)(1 − f (u))   (15.5)
or the hyperbolic tangent:

f (u) = tanh(u);  f ′(u) = 1 − tanh²(u)   (15.6)

are frequently retained.

Figure 15.4 Neural network with 2 hidden layers.

Figure 15.5 Connecting two stages in the network.

Expression (15.4) and the details of the connections between two stages in the network are illus-
trated in Figure 15.5.
In the supervised-learning phase, a reference sequence is available, and an error signal is derived
from the network output, through subtraction, as in the previous chapter. The presentation of the
learning algorithm is simplified if all the layers have the same width N and the output does not
include a nonlinear function, as in estimation for example.

15.3 The Backpropagation Algorithm


Let X 0 (n) be the input vector at time n, Y (n) the output vector, and D(n) the reference vector.
In the different layers, the state vectors U l (n), X l (n) (1 ≤ l ≤ L − 1) are available, as are the synaptic
coefficients hij, l (n), which need to be updated at time n+1 with the gradient technique.
Applying the vector X 0 (n + 1) at network input, the output is Y (n + 1), and the following error
signal is derived:
ej,L (n + 1) = dj (n + 1) − yj (n + 1); 1≤j≤N (15.7)
The output is expressed as a function of the last state vector XL−1 (n + 1) by:
yj (n + 1) = ∑_{i=1}^{N} hij,L (n) xi,L−1 (n + 1) = Hj,Lᵗ(n) XL−1 (n + 1);  1 ≤ j ≤ N   (15.8)

As in Section 14.1, the coefficient updating equation is:


Hj,L (n + 1) = Hj,L (n) + 𝛿L XL−1 (n + 1) ej,L (n + 1); 1≤j≤N (15.9)
𝛿 L is the adaptation step size.
As for stage L − 1, the coefficients impact the whole error signal, which implies that the global
quadratic cost function is involved – that is:

1∑ 2
N
J(n + 1) = e (n + 1) (15.10)
2 k=1 k,L
Since the coefficients evolve in the opposite direction to the gradient, the updating equation
should be:
Hj,L−1 (n + 1) = Hj,L−1 (n) + 𝛿L−1 XL−2 (n + 1) ej,L−1 (n + 1); 1 ≤ j ≤ N (15.11)
Then, it is necessary to determine the “error” at stage L − 1. Let us assume a deviation 𝜀 is intro-
duced on the coefficient hij, L − 1 . Since, by definition, we have:

uj,L−1 = ∑_{i=1}^{N} hij,L−1 xi,L−2   (15.12)

the variation of uj,L−1 will be 𝜀 xi,L−2 and, for the output with index k, yk , it will be
f ′(uj,L−1 ) 𝜀 xi,L−2 hjk,L , which leads to the following derivative:
𝜕yk ∕ 𝜕hij,L−1 = f ′(uj,L−1 ) xi,L−2 hjk,L   (15.13)

Summing for all the output errors, the gradient of the global cost function is obtained:

𝜕J ∕ 𝜕hij,L−1 = −∑_{k=1}^{N} ek,L 𝜕yk ∕ 𝜕hij,L−1   (15.14)

Worth noting also is the following general expression:

𝜕J ∕ 𝜕hij,L−1 = (𝜕J ∕ 𝜕uj,L−1 )(𝜕uj,L−1 ∕ 𝜕hij,L−1 )   (15.14-bis)

Due to (15.13) and (15.14), the error to be used for the updating of the coefficients of stage L − 1
at time n + 1 is given by:


ej,L−1 (n + 1) = f ′(uj,L−1 (n + 1)) ∑_{k=1}^{N} hjk,L (n) ek (n + 1)   (15.15)

As concerns other stages, referring to Figure 15.5, the following relation holds:

𝜕J ∕ 𝜕uj,l = f ′(uj,l ) ∑_{k=1}^{N} hjk,l+1 𝜕J ∕ 𝜕uk,l+1   (15.16)

It follows that relation (15.15) can be extended to stages lower than L − 1 and a recurrence
equation is established for the “errors”:


ej,l−1 (n + 1) = f ′(uj,l−1 (n + 1)) ∑_{k=1}^{N} hjk,l (n) ek,l (n + 1)   (15.17)

Therefore, synaptic coefficients can be updated at time n + 1, from stage L − 1 to stage 1, which
is the backpropagation algorithm.
The algorithm must be initialized so that backpropagation can occur. Unless otherwise specified,
the coefficients of the last stage are set to zero. In order for the algorithm to start, the last state vector
must differ from zero, which is obtained by setting the coefficients of the stages smaller than L to
values such that the first input vector propagates up to stage L − 1. If a bias is present, the set of
updating equations is supplemented by:

h0j,l (n + 1) = h0j,l (n) + 𝛿l ej,l (n + 1); 1 ≤ j ≤ N; 1≤l≤L−1 (15.11-bis)

As shown in the previous chapter, the adaptation step controls the stability, the convergence
speed, and the residual error after convergence. However, the results cannot be applied directly
due to interdependence of coefficients; some specific analysis is required.
The simplest case is the last stage because each coefficient vector is updated from the related
output error. A key observation is that the values in the state vector are bounded by the activation
function, assumed to be a sigmoid or hyperbolic tangent. If that bound is unity, an initial determi-
nation of the range for the adaptation step in the last stage is:
0 < 𝛿L < 2 ∕ N   (15.18)

Further in the analysis, hidden layers must be accounted for. To begin with, a system with a single
hidden layer, L = 2, and a single output with index k, is considered. Then, a simple approach is to
refer to the a posteriori error obtained after coefficient updating by:

𝜀k,2 (n + 1) = dk (n + 1) − Hk,2ᵗ(n + 1) X1 (n + 1)   (15.19)

As mentioned in the previous chapter, the stability condition implies that the mean of the ampli-
tude of the a posteriori error must be less than the mean of the amplitude of the a priori error:

E[ |𝜀k,2 (n + 1)|] < E[|ek,2 (n + 1)|] (15.20)

Once the coefficients of layers 1 and 2 have been updated, the a posteriori error becomes as
follows, neglecting second-order terms:
𝜀k,2 (n + 1) ≈ ek,2 (n + 1) [1 − 𝛿2 ∑_{j=1}^{N} xj,1²(n + 1) − 𝛿1 ∑_{j=1}^{N} ∑_{i=1}^{N} (𝜕yk ∕ 𝜕hij,1 )²]   (15.21)

Introducing the activation function and for a single adaptation step, 𝛿, the stability condition
becomes:
0 < 𝛿 < 2 ∕ (A + B)   (15.22)
A = ∑_{j=1}^{N} [f (uj,1 (n + 1))]²
B = ∑_{i=1}^{N} xi,0²(n + 1) ∑_{j=1}^{N} [f ′(uj,1 (n + 1)) hjk,2 (n)]²

The importance of the hyperbolic tangent, bounded as well as its derivative, clearly appears in
the expressions above.
Now, when all the outputs are active, equation (15.21) becomes a matrix equation. Letting H 1 and
H 2 designate the N × N coefficient matrices, it can be shown that the a priori and the a posteriori
output vectors are related by:
𝜀(n + 1) = [(1 − 𝛿2 |f (U1 (n + 1))|²) IN − 𝛿1 |X0 (n + 1)|² H2ᵗ(n) Du H2 (n)] E(n + 1)   (15.23)

Du is a diagonal matrix whose N entries are [f ′(uj1 (n + 1))]². In order to determine the stability
limit, the maximum eigenvalue 𝜆max of the matrix H2ᵗ Du H2 must be considered, and relation (15.22)
becomes:
0 < 𝛿 < 2 ∕ (|f (U1 (n + 1))|² + |X0 (n + 1)|² 𝜆max )   (15.24)
Note that the maximum eigenvalue of a square matrix is bounded by the sum of the absolute
values of the elements of a row or a column.
The above analysis can be extended to systems having several hidden layers. However, in practice,
and as a first attempt, one can simply rely on relation (15.18).
The system time constant impacts the length of the learning sequence. It is the sum of the time
constants of the different stages, which are proportional to the inverses of the adaptation steps and,
thus, are proportional to the number of coefficients, as indicated in Section 14.3. As concerns the
cascade of the different hidden layers, one can refer to the beginning of Section 6.2. Overall, a high
number of coefficients leads to long learning sequences.

Regarding the residual error after convergence, the results of the previous chapter apply – in
particular, the fact that the residual error grows near the stability limit.
The above processing also applies to nonzero mean signals. It suffices to introduce, in each
neuron, the branch corresponding to coefficient h0 and update, for example, according to relation
(15.11-bis).
It is worth mentioning that, in principle, alternative optimization techniques, which are more
complicated and harder to understand and implement than the gradient, may be employed [3].

15.4 Examples of Application


1. Nonlinear function. In order to illustrate how the adaptation algorithm works, it is applied to
the elementary network shown in Figure 15.6, which corresponds to two inputs, two outputs
and a single hidden layer.
To begin with, it is proposed to approximate a nonlinear function with one output only and the
following reference sequence:
d(n) = x2 (n) + 2x1 (n) − x1³(n)   (15.25)
The data are assumed to have a zero mean. At time n + 1 we get:
e(n + 1) = d(n + 1) − y(n + 1) (15.26)

y(n + 1) = h112 (n) tanh(u1 (n + 1)) + h212 (n) tanh(u2 (n + 1))

u1 (n + 1) = h111 (n)x1 (n + 1) + h211 (n)x2 (n + 1)

u2 (n + 1) = h221 (n)x2 (n + 1) + h121 (n)x1 (n + 1)


For coefficient updating:
hj12 (n + 1) = hj12 (n) + 𝛿 e(n + 1) tanh(uj (n + 1));  j = 1, 2
ej,1 (n + 1) = (1 − tanh²(uj (n + 1))) hj12 (n) e(n + 1);  j = 1, 2   (15.27)
hij1 (n + 1) = hij1 (n) + 𝛿 ej,1 (n + 1) xi (n + 1);  i, j = 1, 2


Initial coefficient values are set to zero, except h111 and h221 , which are set to 1. Using N = 1000
pairs of random data in the interval [−1 1] and the adaptation step 𝛿 = 0.1, we obtain:

h111 h121 h211 h221 h112 h212


1.67 −0.05 0.07 1.07 1.27 1.07
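This experiment can be reproduced with a short script; the random generator, its seed, and the vectorized notation are implementation assumptions:

import numpy as np

rng = np.random.default_rng(0)
delta, n_data = 0.1, 1000
h1 = np.eye(2)          # hidden-layer coefficients h_ij1 (h111 = h221 = 1, others 0)
h2 = np.zeros(2)        # output coefficients h112, h212

for _ in range(n_data):
    x = rng.uniform(-1.0, 1.0, 2)
    d = x[1] + 2 * x[0] - x[0] ** 3            # reference, equation (15.25)
    u = h1.T @ x                               # u1, u2
    t = np.tanh(u)
    e = d - h2 @ t                             # output error, equation (15.26)
    e1 = (1 - t ** 2) * h2 * e                 # backpropagated errors, (15.27)
    h2 = h2 + delta * e * t                    # output-layer update
    h1 = h1 + delta * np.outer(x, e1)          # hidden-layer update

print(h1, h2)   # values of the same order as in the table above (they depend on the random sequence)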

Figure 15.6 Approximation of a nonlinear function.

The standard deviation after convergence is 5%. Invoking the series expansion:
tanh(u) = u − (1∕3)u³ + (2∕15)u⁵ + …   (15.28)
Estimations a, b of the coefficients of the terms x2 , x1 in (15.25) are given by:
a = h211 h112 + h221 h212 = 1.23; b = h111 h112 + h121 h212 = 2.07 (15.29)
Comparing to the slope of function x2 = f (x1 ) at the origin, which equals 2, we find b/a = 1.7.
Note that the function has a maximum and, in contrast, a monotonic increasing function would
yield a more accurate estimation, as illustrated in Exercise 5.
As for the coefficient of the term −x1³(n), setting the factor of x2 to unity, we obtain:
c = (1∕3) h111³ h112 ∕ a³ = 1.06
In general, accurate estimation of nonlinear functions, with the above activation functions and
a single hidden layer, requires large numbers of neurons [2].
2. Classification. The network shown in Figure 15.6 is able to perform a classification operation
in the plane with its 2 outputs and the curves:
x2 + 2x1 − x1³ = 0;  x2 − 2x1 + x1³ = 0   (15.30)
Four areas are distinguished, and the input data pairs can be assigned to one of those. In the
process, equations (15.26) and (15.27) are supplemented by:
e1 (n + 1) = d1 (n + 1) − y1 (n + 1) (15.31)

y1 (n + 1) = h122 (n) tanh(u1 (n + 1)) + h222 (n) tanh(u2 (n + 1))


and coefficient updating:
hj22 (n + 1) = hj22 (n) + 𝛿 e1 (n + 1) tanh(uj (n + 1));  j = 1, 2   (15.32)
ej,1 (n + 1) = (1 − tanh²(uj (n + 1))) (hj12 (n)e(n + 1) + hj22 (n)e1 (n + 1));  j = 1, 2
Using N = 1000 pairs of random data in the interval [−1 1] and the adaptation step 𝛿 = 0.1, we find:

h111 h121 h211 h221 h112 h212 h122 h222


1.82 −0.01 −0.015 1.02 1.22 1.16 −1.18 1.16

Worth noting is the impact of the second output on the coefficients of the first part of the circuit.
After learning, the classification error is less than 2%.
3. Graphic symbol recognition. Image recognition is among the most important applications of
neural networks. In order to provide a simple yet convincing illustration, digit recognition is
considered.
The 10 digits can be represented in a grid of 5×3 pixels, as shown in Figure 15.7.
Weights −1 and +1 are assigned to white and black pixels, respectively. The digits can be identi-
fied through 10 sets of 15 coefficients each. These coefficients take values ±1/15 and the summation
outputs are included in the interval [−1 +1]. Note that digits may differ by just a single pixel. Indeed,
it is not necessary to use a network with hidden layers in that case.
However, for recognition of handwritten digits, redundancy must be introduced to account for
variations in handwriting. A network with 2 hidden layers is represented in Figure 15.8. The input
grid contains N1 pixels.

Figure 15.7 Representing the 10 digits in a grid of 5 × 3 pixels.


Figure 15.8 Recognition of handwritten digits.

From N1 input values, after learning, layers 1 and 2 yield a set of discriminating values that are
used by the 10 sets of output coefficients to provide 10 final values, to decide which digit was applied
at input. Learning can be performed digit by digit, using only one output error and scanning all the
digits at each iteration.
As an illustration, we take N1 = 8×8, N2 = 32 and consider three digits. Learning is carried out
with 3, 5 and 7, step 𝛿 = 0.01 and 150 iterations. In the coefficient matrices of layers 1 and 2, the
diagonal entries of the square central matrices are set to 1 and a single output error is used for
coefficient updating. The following discriminations are obtained for the digits, in the absence of
noise (a), with SNR = 5 dB (b):
(a)
3 5 7
3 1 0.11 −0.37
5 0.11 1 −0.22
7 −0.36 −0.22 1

(b)
3 5 7
3 1 0.14 −0.49
5 0.11 1 −0.36
7 −0.43 −0.30 1

Clearly, recognition is easily achieved, even in the presence of a high level of noise. The impor-
tance of initialization is worth pointing out – a reduction of initial coefficient values leads to an
increase of the system time constant and then, it might be necessary to include all the output errors
in the coefficient updating process.
In order to cope with multiple ways of writing the same digit, with significant differences, the
size of the input grid must be increased, and the learning process must be adjusted accordingly, if
a recognition rate objective is prescribed.
The above example illustrates the role of the hidden layers – in one or more steps, they condense
the input grid onto the grid employed by the output stage. Of course, the computational cost grows
with the size of the input grid and might become prohibitive.

15.5 Convolution Neural Networks

In certain fields, such as image processing or vision, the amount of data to be processed simultane-
ously might be enormous, leading to layers of excessive dimensions in the networks. For example,
take a black-and-white image of size 32 × 32. The dimension of the input vector is 1024 and the
number of coefficients in the layers may reach the millions, which is unrealistic.
However, it is known that compression techniques are particularly efficient with images and
certain categories of signals, which is evidence that these signals are highly redundant. In such
conditions, it appears well advised – for example, in recognition or classification operations – to
introduce one or several compression stages before the neural network. These extra stages can
extract characteristics which may considerably reduce the dimensions of the hidden layers and
facilitate the mastering of networks. Generally, the corresponding devices are banks of FIR fil-
ters, which perform convolutions or correlations, to cancel out redundancies while preserving the
specificities of signals [4–6]. The principle is shown in Figure 15.9.
The coefficients in the filter banks depend on the characteristics that are to be extracted.
For example, the filters mentioned in Section 5.14 for the extraction of contours in images are
frequently used. At the output, a table of characteristics is available. Several such devices may
be cascaded, in combination with sample rate reductions, so as to end up with a neural network
input vector of significantly reduced size.
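A minimal sketch of such a convolution stage is given below; the gradient masks shown are common contour-extraction examples and stand in for whatever filter bank an application would actually use:

import numpy as np

def feature_maps(image, kernels, stride=2):
    # Convolution stage: small FIR masks applied over the image, followed by
    # a sample rate reduction, as a front end to the neural network.
    maps = []
    H, W = image.shape
    for k in kernels:
        kh, kw = k.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
        maps.append(out[::stride, ::stride])    # subsampling
    return maps

# Illustrative contour-extraction masks (horizontal and vertical gradients).
grad_h = np.array([[-1.0, -2.0, -1.0], [0.0, 0.0, 0.0], [1.0, 2.0, 1.0]])
grad_v = grad_h.T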
In the approach, taking the characteristics of images into account, it might be well advised to
use activation functions other than sigmoid or hyperbolic tangents. In fact, in classification for
example, we can consider that the desired output is obtained by aligning the input data vector and a
coefficient vector, such that the scalar product is maximized. In a multilayer network, the alignment
is achieved step by step. The outputs of the first layer correspond to the scalar products of the input
vector and the coefficient vectors associated with these outputs. A negative output implies opposite
directions for the vectors involved and, thus, it is useless to keep this coefficient vector in the chain
leading to the desired global alignment, and this output can be dropped. Hence, the nonlinear
activation function ReLU (rectified linear unit), which is zero for negative values of the variable
and linear for positive values. In order to make the procedure even more selective, a threshold can
be introduced which cancels the small positive values. In fact, it can be argued that the operation
of a perceptron layer is similar to a data-clustering operation.
In the implementation of a ReLU-based CNN system, it is worth emphasizing that the derivative
of the activation function is not continuous, and the signal amplitude is not bounded, which
requires stability control. To that end, an upper bound may be imposed on the activation function
output and the step size may be chosen to satisfy the conditions given in Section 15.3. More-
over, initialization may be challenging and, instead of assigning random values to the coefficients,
one can employ data-clustering techniques like “k-means” [6].
Reference [4] describes an application of CNN to handwritten character recognition.

Figure 15.9 Convolutional neural network (CNN).


15.6 Recurrent/Recursive Neural Networks


The techniques described so far apply to blocks of data and they do not account for the evolution
of signals over time. Whenever streams, such as video, audio, natural language, or temporal series
delivered by sensors, have to be processed, it is necessary to add, into the system, the evolution at a
given time, the previous states and, as in Section 14.8 for adaptive filters, we end up with an infinite
impulse response [5, 7].
A so-called recurrent neural network with a single hidden layer is depicted in Figure 15.10.
Using the terminology of the previous chapter, actually, it is a recursive structure.
The system equations are as follows:
X1 (n) = tanh[AX(n) + BX1 (n − 1) + b1 ] (15.33)
and, if a nonlinear activation function is introduced at output:
Y (n) = f [H X1 (n) + b2 ] (15.34)
The terms b1 and b2 are bias coefficients related to the activation functions. The 3 matrices A, B,
and H must have their coefficients updated.
Assuming the vectors X, X 1 , and Y have dimension N, the coefficients of matrix H can be
updated, as above, using the output errors:
Hj (n + 1) = Hj (n) + 𝛿 X1 (n + 1) ej (n + 1); 1≤j≤N (15.35)
As for the matrices A and B, the derivative of the output error with respect to the coefficients
is more difficult to determine because of the recursive structure and the nonlinearity in the loop.
Taking a simplified approach as in Section 14.8, we can write:

ui (n + 1) = ∑_{k=1}^{N} aki (n) xk (n + 1) + ∑_{k=1}^{N} bki (n) xk,1 (n + 1)   (15.36)
ei,1 (n + 1) = [1 − tanh²(ui (n + 1))] ∑_{j=1}^{N} hij (n) ej (n + 1)
Ai (n + 1) = Ai (n) + 𝛿1 ei,1 (n + 1) X(n + 1)
Bi (n + 1) = Bi (n) + 𝛿2 ei,1 (n + 1) X1 (n + 1)
It must be pointed out that, for matrix B, the data are bounded by the nonlinearity, while the
input data do not necessarily have the same bound, which can lead to different adaptation steps 𝛿 1
and 𝛿 2 for the two matrices.
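A single adaptation step of this recursive network can be sketched as follows; the function and variable names are illustrative, the bias terms are omitted, and the output layer is taken as linear:

import numpy as np

def rnn_step(x, x1_prev, d, A, B, H, delta, delta1, delta2):
    # One adaptation step of the recurrent network of Figure 15.10,
    # following (15.33), (15.35) and (15.36).
    u = A @ x + B @ x1_prev
    x1 = np.tanh(u)                               # state vector X1, equation (15.33)
    e = d - H @ x1                                # output error vector
    e1 = (1 - np.tanh(u) ** 2) * (H.T @ e)        # hidden errors, equation (15.36)
    H = H + delta * np.outer(e, x1)               # equation (15.35)
    A = A + delta1 * np.outer(e1, x)              # input matrix update, (15.36)
    B = B + delta2 * np.outer(e1, x1)             # recursive matrix update, (15.36)
    return x1, A, B, H, e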
A critical issue for recursive systems is stability. In the long run, the evolution of the coefficients
must be controlled. If matrix B is diagonal, or near diagonal, the recursive part becomes a set of
first-order cells that evolve separately, and the modulus of each single coefficient must remain
smaller than unity.
An important aspect of neural networks in general, and the recursive structure in particular, is
overdimensioning. In fact, the number of coefficients generally exceeds the number of variables

Figure 15.10 Recursive neural network with a single hidden layer.

of the system to be modelled and, as a result, we see a drift of the adaptive network coefficients
linked to the adaptation parameters. That drift can raise problems in long learning phases and in
steady-state operation. Then, it may be advisable, in the updating equations of some of the coeffi-
cients, to introduce a regularization term 𝜀 which, as described in Section 14.6, limits the amplitude
of the coefficients to 𝛿∕𝜀.

15.7 Neural Network and Signal Processing


Thanks to their learning capability, neural networks are able to perform all sorts of functions and,
in particular, signal-processing functions, provided a sufficient number of input sequences is avail-
able, as well as the corresponding output sequences. There is no need to know how the inside of
the system works; this system is viewed as a black box [8].
Such an approach might require considerable amounts of calculations, but the same hardware
or software can implement different tasks with or without reconfiguration.
As an illustration, let us consider a discrete cosine transform (DCT) in the presence of an interfer-
ence signal. With a neural network, there is no need to identify the jammer to enhance performance
with respect to the direct calculation.
The DCT outlined in Section 3.3.3 is implemented using a fully connected neural network with
N = 16 inputs, 16 outputs, and 2 hidden layers also of width N. The input vector X is a binary
zero-mean random sequence, and the reference Y is obtained by applying relations (3.32). The tanh
activation functions are used and coefficient updating is performed with step 𝛿 = 0.02. The coeffi-
cients hij1 for layer 1, hij2 for layer 2, and gij for the output layer are initialized as follows:
hii1 = 1;  hii2 = 1;  hij1 = hij2 = 0 for i ≠ j;  gij = 0;  1 ≤ i, j ≤ 16
The jammer is the sinusoid:
s(n) = 0.1 sin(2.25 𝜋n ∕ 16);  1 ≤ n ≤ 16   (15.37)
The evolution of the output error power Perr (m) for a sequence of M = 800 learning blocks is
shown in Figure 15.11, for the network with and without hidden layers.
The absence of hidden layers corresponds to iterative calculation, as in the previous chapter.
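A minimal sketch of this experiment is the following; the orthonormal DCT is assumed here to stand for relations (3.32), the jammer is assumed to be added at the network input, and the data generation is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, delta, M = 16, 0.02, 800
# Orthonormal DCT matrix, assumed to correspond to relations (3.32)
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
C = np.sqrt(2.0 / N) * np.cos(np.pi * k * (2 * n + 1) / (2 * N))
C[0, :] /= np.sqrt(2.0)

H1, H2, G = np.eye(N), np.eye(N), np.zeros((N, N))          # initialization above
jam = 0.1 * np.sin(2.25 * np.pi * np.arange(1, N + 1) / N)  # jammer (15.37)

for m in range(M):                      # M learning blocks
    x = rng.choice([-1.0, 1.0], N)      # binary zero-mean input
    d = C @ x                           # reference: exact DCT of the clean input
    v1 = np.tanh(H1.T @ (x + jam))      # hidden layer 1
    v2 = np.tanh(H2.T @ v1)             # hidden layer 2
    e = d - G.T @ v2                    # output error
    e2 = (1 - v2 ** 2) * (G @ e)        # errors back-propagated through tanh
    e1 = (1 - v1 ** 2) * (H2 @ e2)
    G += delta * np.outer(v2, e)
    H2 += delta * np.outer(v1, e2)
    H1 += delta * np.outer(x + jam, e1)
```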

Figure 15.11 Computation of a DCT in the presence of a jammer: output error power Perr (dB) as a function of the length of the learning sequence, for the network with 2 hidden layers and with no hidden layers.



It can be observed that the two hidden layers double the time constant and bring a gain of about
12 dB in computation accuracy after convergence. In the absence of the jammer, the output error
power decreases to −65 dB, this threshold being due to the cascading of the two hidden layers, with
their nonlinearities.
The stability limit for the adaptation step is 1/8 without the hidden layers, and it turns out to be
slightly smaller with the two layers.
As concerns numerical complexity, the system has 768 coefficients, which implies that,
disregarding learning operations, 768 multiplications are carried out per DCT, while direct
calculation necessitates fewer than 64 multiplications.
The DCT is employed in image and video compression; a neural network can reduce the
degradations arising from imperfect sensors.

15.8 On Activation Functions

The nonlinear activation functions used in practice result from a compromise between contra-
dictory requirements. On the one hand, decisions must be made; on the other hand, adaptive
techniques (such as learning, in particular) must be implemented. Therefore, a stepwise charac-
teristic is needed for the decision, while a linear characteristic is preferred for adaptivity.
The sigmoid (15.5) is an initial compromise, which provides smooth decision and allows for the
direct use of gradient techniques. However, the output is bounded, amplitude distortions occur in
the vicinity of the bounds, and derivatives tend toward zero.
The ReLU function is an approach that keeps linearity on a part of the domain and cancels the
other part. Adaptive techniques can be applied in the retained part, on the condition that stability
conditions are met. If the cancelled part is unimportant in view of the objectives, ReLU can be
considered optimal. Thus, applied to the digit recognition task set out in Section 15.4 and with the
same parameters, it yields discrimination deviations smaller than those of the hyperbolic tangent.
The ReLU function is appropriate for function approximation because it has the capability to
divide a domain into nonoverlapping elementary parts. As an illustration, let us consider the
approximation of a half-sinusoid on the interval [0, 1] by 4 segments bounded by the 4 lines whose equations in the plane are:

$$ y = 2\sqrt{2}\, x; \quad y = (4 - 2\sqrt{2})\, x + \sqrt{2} - 1; \qquad (15.38) $$
$$ y = (2\sqrt{2} - 4)\, x + 3 - \sqrt{2}; \quad y = -2\sqrt{2}\, x + 2\sqrt{2} $$

The block diagram of the network with two hidden layers is shown in Figure 15.12. In fact, the input domain is divided into four parts with linear interpolation in each part. The coefficients a = −√2/2 and b = 1 − √2/2 are related to the slopes of the lines, and c = √2/2.
At the output of the first hidden layer, the ReLU functions, represented by rectangles, keep only
positive values and, thus, set the negative values to zero and, in that case, also inhibit the addition
of the terms ± 1. The output y yields the approximation of sin(𝜋x).
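As a check, the four segments (15.38) can be produced with ReLU functions. The sketch below uses an equivalent form with a single hidden layer, in which each breakpoint contributes one ReLU unit; it is not a literal transcription of the two-layer diagram of Figure 15.12.

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)
r2 = np.sqrt(2.0)

breaks = np.array([0.0, 0.25, 0.5, 0.75])                     # segment boundaries
slopes = np.array([2 * r2, 4 - 2 * r2, 2 * r2 - 4, -2 * r2])  # slopes of (15.38)
weights = np.diff(slopes, prepend=0.0)                        # slope increment at each break

def approx(x):
    # piecewise-linear approximation of the half-sinusoid
    return sum(w * relu(x - b) for w, b in zip(weights, breaks))

x = np.linspace(0.0, 1.0, 1001)
print(np.abs(approx(x) - np.sin(np.pi * x)).max())  # maximum error, about 0.07
```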
In order to improve the approximation, the approach can be extended by dividing the interval [0, 1] into 2^N parts with a network of N hidden layers.
A general method to approximate continuous functions on a closed interval using deep ReLU
networks is presented in Reference [9].
Figure 15.12 Approximation of a half-sinusoid by a network with ReLU and two hidden layers.

15.9 Conclusion

In the field of artificial intelligence, neural networks are a widely used tool which can be lever-
aged in all technical areas and can produce spectacular results. They apply, among other areas, to
the approximation of functions, object classification, and shape recognition. The neural network
offers an alternative to specific techniques which exploit a priori knowledge on a given topic in
cooperation with dedicated algorithms. In fact, the two approaches can be combined, as they are
in convolutional neural networks.
On the operational front, the practical design and implementation of neural networks is a chal-
lenge. Firstly, in a given situation, the structure must be decided upon. One might think that the
more neurons are included in the system, the better the performance will be. However, then, besides
the computation requirements, multiple issues manifest themselves in relation to initialization,
response time, and overdimensioning, with related drifts and stability problems. Next, one must
get to grips with a system comprising large numbers of nonlinearities and be able to validate the
results. Finally, though the neural network is a potentially powerful tool, its exploitation requires
great care in order to be efficient and possibly competitive in terms of performance and technolog-
ical resources.

Exercises

15.1 Consider 2 groups of points in the plane, with coordinates:

A1 = [1, 2]; A2 = [−1, 1]; A3 = [−2, 0]; B1 = [2, 1]; B2 = [−1, −2]; B3 = [2, −2]

Give the equation of a separation line and compute the distances from the points to
that line.
Use an iterative method to find a separation line and check that the separation is achieved.

15.2 Three groups of points in the plane have the coordinates:

A (x1 , x2 ): (1,2) (0,4) (−3,3) (−2,−1) (−4,0); B: (1,−2) (2,−1) (4,−1) (4,1) (−3,−3) (5,−2)
C: (4,5) (6,2) (6,6) (4,6)

Through observation of the points, give the equations of 2 lines which separate the
3 groups.
Propose an iterative method to reach that objective and check the results.

15.3 An adaptive system is defined by the relations:


y(n) = h1(n − 1) x1(n) + h2(n − 1) x2(n); e(n) = d(n) − y(n)

The input signals consist of real random variables uniformly distributed in the interval [−1, 1] and the reference signal is expressed by: d(n) = x2(n) + 2x1(n) − x1³(n).
Give the values of the optimum coefficients h1 and h2 and the corresponding error power.
Compare with the estimations in Section 15.4 and justify the differences.
For adaptation step 𝛿 = 0.1, compute the excess residual error.

15.4 A neuron is added to the network in Figure 15.6, so as to obtain a network with 2 hidden
layers. Give the block diagram. Keeping the reference sequence, perform a simulation and
determine the new estimations a, b, and c. Compare with the case of a single hidden layer.
White noise of amplitude 0.01 is added to the reference signal. Verify the impact on the
estimations and on the drift of the coefficients.

15.5 In the first example in Section 15.4, the reference sequence is changed and replaced by the
following nonlinear function:
$$ d(n) = x_2(n) - \sin\left(\frac{\pi}{2}\, x_1(n)\right) $$
Simulations yield the values: a = 1.0744; b = 1.6154, and the standard deviation of the
error after convergence is 2%. Justify these values and the accuracy achieved.

15.6 In the third example in Section 15.4, digit 5 leads to the following values of the second layer
output:

X2 = 0.7 [−1 −1 −1 −1 −1 1 1 −1 −1 −1 1 1 1 1 1 ]

Give the values of coefficients g(i, 5) (1 ≤ i ≤ 15) in the output layer.


Compute the number of multiplications per iteration in the learning phase.
Based on the possibilities of the 5×3 grid, estimate the number of different forms of writ-
ing that could be taken into account by the 8×8 grid.

15.7 A source delivers 2 independent signals d1 (n) and d2 (n), uniformly distributed in the inter-
val [−1,1], which are fed to a device performing the processing:
d(n) = d1(n) + j d2(n); s(n) = d(n) + C d(n − 1); C = 0.5(1 + j)

followed by a distortion of the amplitude a, which becomes: f(a) = (4/π) sin(πa/4).
To retrieve the source signal, a recursive neural network with 2 inputs, 2 outputs, and 2
nonlinearities is employed. Write the equations of the network.
The coefficients of the input matrix, the recursive part, and the output matrix are
designated by hij, 1 , hij, 2 , and hij, 3 , respectively. A learning sequence yields the following
coefficients:
hij,2 = [−0.57, −0.57, 0.56, −0.56]; h11,1 = 1.05; h11,3 = 1.25
Justify these values and the output error power E = 0.006.

15.8 The impact of the initial conditions in the example provided in Section 15.7 is investigated.
Give the first value of the variables, xi1 and xi2 , at the output of the activation functions of
the first and the second hidden layers respectively. Answer the same question for the errors
ei1 and ei2 which are involved in updating at the next step. Deduce the impact of dividing
the initial coefficient values by 2.
Justify the values given for the limit of stability. Do likewise for the system time constant.

15.9 It is proposed to extend the block diagram in Figure 15.12 to obtain an approximation of
the half-sinusoid with 8 segments. Provide the corresponding diagram and the coefficient
values. Give an estimate of the maximum value of the approximation error.

References

1 F. Rosenblatt, "The perceptron: a probabilistic model for information storage and organization in the brain", Psychological Review, vol. 65, no. 6, pp. 386–408, 1958.
2 K. Hornik, M. Stinchcombe and H. White, "Multilayer feedforward networks are universal approximators", Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
3 B.M. Wilamowski, "Neural network architectures and learning algorithms", IEEE Industrial Electronics Magazine, vol. 3, no. 4, pp. 56–63, 2009.
4 Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-based learning applied to document recognition", Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
5 I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, Cambridge, MA, 2016. http://www.deeplearningbook.org
6 C.-C. Jay Kuo, "The CNN as a guided multilayer RECOS transform", IEEE Signal Processing Magazine, vol. 34, 2017, pp. 81–89.
7 M. Schuster and K.K. Paliwal, "Bidirectional recurrent neural networks", IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
8 M. Narwaria, "The transition from white box to black box", IEEE Signal Processing Magazine, vol. 38, 2021, pp. 163–173.
9 D. Elbrächter, D. Perekrestenko, P. Grohs and H. Bölcskei, "Deep neural network approximation theory", IEEE Transactions on Information Theory, vol. 67, no. 5, 2021, pp. 2581–2623.

16

Error-Correcting Codes

Systems for information processing and transmission constitute a major field of application for
signal-processing techniques. Error-detection and correction techniques are also widely used in
these systems and, therefore, they coexist and even interact with signal processing in communica-
tion equipment.
Generally, coding is presented and taught with a mathematical approach [1]. However, some
of the most commonly used coding techniques exploit signal-processing concepts, results, and
algorithms [2]. For example, Reed–Solomon coding uses the discrete Fourier transform and linear
prediction, convolutional coding is FIR filtering, and turbo codes are related to IIR filtering.
This chapter provides an introduction to some important error-correcting codes from a
signal-processing perspective, allowing readers to gain an understanding of these codes and assess
their strengths and weaknesses. Moreover, a unified view of communication techniques may
result.

16.1 Reed–Solomon Codes


Reed–Solomon codes are extensions of the BCH (Bose–Chaudhuri–Hocquenghem) codes to
multi-bit symbols [3, 4]. They exploit predictable signals generated by line errors to identify these
errors and subtract them from the received signal so as to recover the transmitted signal.
Before the codes are described, some additional information about linear prediction is provided.

16.1.1 Predictable Signals


A signal is said to be predictable if it satisfies a linear recurrence equation. For example, the real
sequence:
$$ x(n) = A \cos(n\omega + \varphi) \qquad (16.1) $$

satisfies the equation:

$$ x(n) - 2\cos\omega\; x(n-1) + x(n-2) = 0 \qquad (16.2) $$


In fact, to cancel x(n), it is sufficient to apply the FIR filter with transfer function:

$$ H(Z) = (1 - e^{j\omega} Z^{-1})(1 - e^{-j\omega} Z^{-1}) = 1 - 2\cos\omega\, Z^{-1} + Z^{-2} \qquad (16.3) $$

The signal is predictable, since, as long as two consecutive samples are known, the entire
sequence can be calculated.
Similarly, the complex signal:

$$ x(n) = \sum_{i=1}^{P} A_i\, e^{jn\omega_i} \qquad (16.4) $$

is canceled by the so-called prediction FIR filter:

$$ H(Z) = \prod_{i=1}^{P} \left(1 - e^{j\omega_i} Z^{-1}\right) = 1 - \sum_{i=1}^{P} a_i Z^{-i} \qquad (16.5) $$

and it satisfies the recurrence equation:

$$ x(n) - \sum_{i=1}^{P} a_i\, x(n-i) = 0 \qquad (16.6) $$

When signal samples are available, the P elementary signals making up x(n) can be extracted in a
few steps. First, compute the prediction coefficients ai (1 ≤ i ≤ P) from 2P consecutive samples with
the help of the following matrix equation, obtained by running the recurrence equation (16.6) P times:

$$ \begin{bmatrix} x(P) & x(P-1) & \cdots & x(1) \\ x(P+1) & x(P) & \cdots & x(2) \\ \vdots & & & \vdots \\ x(2P-1) & x(2P-2) & \cdots & x(P) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_P \end{bmatrix} = \begin{bmatrix} x(P+1) \\ x(P+2) \\ \vdots \\ x(2P) \end{bmatrix} \qquad (16.7) $$
This linear system is solved efficiently using an order-recursive algorithm, beginning at order 1 and
finishing at order P. As for the computation load, 2(P + 1)(P + 2) multiplications and P divisions are
needed. For a given order i of the prediction filter, the algorithm computes the prediction error ei+1
at order i + 1 and stops when the error is zero. For i = P, if the error eP+1 is not zero, the number of
components in x(n) is greater than P.
Once the prediction coefficients have been obtained, the frequencies are determined by comput-
ing the roots Zi = ej𝜔i of the prediction filter transfer function H(Z) given by (16.5).
Finally, returning to (16.4), the amplitudes Ai are obtained by expressing P values of the signal
x(n), which yields the equations:
$$ \begin{bmatrix} Z_1 & Z_2 & \cdots & Z_P \\ Z_1^2 & Z_2^2 & \cdots & Z_P^2 \\ \vdots & & & \vdots \\ Z_1^P & Z_2^P & \cdots & Z_P^P \end{bmatrix} \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_P \end{bmatrix} = \begin{bmatrix} x(1) \\ x(2) \\ \vdots \\ x(P) \end{bmatrix} \qquad (16.8) $$
It can be observed that, if the frequencies 𝜔i are known, 2P samples enable us to compute 2P
amplitudes.
As above for (16.7), efficient techniques can be used to solve the matrix equation (16.8) and deter-
mine the amplitudes Ai (1 ≤ i ≤ P) without inverting the matrix.

For example, we can apply the signal sequence x(n) to the filter:

$$ H_j(z) = \prod_{\substack{i=1 \\ i \neq j}}^{P} \left(1 - e^{j\omega_i} z^{-1}\right) \qquad (16.9) $$

in order to get the amplitude Aj.
Overall, it is verified that 2P signal samples are required to identify P components.
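A minimal NumPy sketch of this identification procedure, with assumed frequencies and amplitudes, is given below; it solves (16.7) directly instead of using the order-recursive algorithm.

```python
import numpy as np

P = 2
w_true = np.array([0.3, 1.1])     # assumed pulsations omega_i
A_true = np.array([1.0, 0.5])     # assumed amplitudes
# 2P samples x(1), ..., x(2P) of the predictable signal (16.4)
x = np.array([np.sum(A_true * np.exp(1j * n * w_true)) for n in range(1, 2 * P + 1)])

# (16.7): linear system giving the prediction coefficients a_1, ..., a_P
M = np.array([[x[P + r - i] for i in range(1, P + 1)] for r in range(P)])
a = np.linalg.solve(M, x[P:])

# Roots of H(Z) = 1 - a_1 Z^-1 - ... - a_P Z^-P give the frequencies
Z = np.roots(np.concatenate(([1.0 + 0j], -a)))
print(np.angle(Z))                # recovers the omega_i (up to ordering)

# (16.8): Vandermonde system giving the amplitudes
V = np.vander(Z, P + 1, increasing=True)[:, 1:].T   # element (k, i) = Z_i^(k+1)
print(np.linalg.solve(V, x[:P])) # recovers the A_i
```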
From the above development, it appears that two methods exist to generate samples of predictable
signals – one in the frequency domain and one in the time domain, which leads to two variants of
the codes.

16.1.2 Reed–Solomon Codes in the Frequency Domain


In order to protect a set of K samples, or symbols, x(n), 2P zeros are appended, which makes a
set of N = K + 2P symbols. A DFT of size N yields N values X(k) (0 ≤ k ≤ N − 1) that are fed to
the transmission channel. At the receiver side, the inverse DFT restores the initial symbols, in the
absence of error. However, if errors occur, the 2P appended samples are no longer zeros: they are
predictable signals, whose frequencies are related to the errors.
The identification of the error signal consists of computing the prediction coefficients ai (1 ≤ i ≤ P)
and, by recurrence, extending the error signal to the complete set of samples. The K initial symbols
are restored by subtraction. The 2P samples of the received signal which are not zero in the presence
of errors are called the “syndrome”. The procedure is illustrated in Figure 16.1.

Figure 16.1 Reed–Solomon coding in the frequency domain: the block of K data x(n) is appended with 2P zeros; an FFT of order N = K + 2P produces the transmitted signal X(k); the channel adds the errors E(k); at the receiver, an inverse FFT yields x(n) + e(n), in which the 2P samples that should be zero constitute the syndrome; once identified, e(n) is subtracted to restore the K data.
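The chain of Figure 16.1 can be sketched in the complex field as follows; the block sizes and error values are arbitrary assumptions.

```python
import numpy as np

K, P = 12, 2
N = K + 2 * P
x = np.zeros(N, dtype=complex)
x[:K] = np.random.default_rng(2).choice([-1.0, 1.0], K)  # K data, then 2P zeros

X = np.fft.fft(x)                  # transmitted signal X(k)
E = np.zeros(N, dtype=complex)
E[[3, 9]] = [1.0 - 0.5j, 0.25j]    # two channel errors (P = 2 are correctable)
r = np.fft.ifft(X + E)             # receiver: inverse DFT of X(k) + E(k)

syndrome = r[K:]                   # the 2P samples that should be zero
print(np.round(syndrome, 3))       # nonzero: a predictable signal due to the errors
```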



A strong point of the approach is the simplicity of the decoder, since there is no need to separate
the components of the error signal. However, it requires DFT calculations and, more importantly,
the coding is not systematic. In systematic coding, the useful data are transmitted as such, and they
are just supplemented with protecting data. In fact, the procedure can be modified to obtain that
property.

16.1.3 Reed–Solomon Codes in the Time Domain


The procedure is shown in Figure 16.2.
The K data symbols are supplemented by 2P values, computed in such a manner that the inverse
DFT of the total set includes a subset of 2P zeros. At the receiver side, an inverse DFT is calculated
and the samples which should be zeros make the syndrome. The 2P samples of the syndrome allow
for the determination of P frequencies, along with the amplitudes of the P corresponding compo-
nents. Next, the error signal is identified and subtracted from the received signal to recover the
initial data.
In general, in order to limit the loss in transmission rate, the length of the syndrome is chosen
such that 2P ≪ K. In such conditions, it is advantageous to replace the inverse DFT with a set of 2P
filters, fed by the sequence [x(N − 1), x(N − 2), …, x(0)] as shown in Figure 16.3, when the syndrome
corresponds to the first 2P values of the inverse DFT.

It can readily be verified that, once the N input values have been processed, the filter outputs are:

$$ X(k) = \sum_{n=0}^{N-1} x(n)\, W^{nk}, \qquad W = e^{j2\pi/N} $$

Figure 16.2 Reed–Solomon coding in the time domain: the K data are supplemented with 2P computed values such that the inverse FFT of order N = K + 2P contains 2P zeros; at the receiver, the inverse FFT of r(n) = x(n) + e(n) provides the syndrome, the errors are identified in position and amplitude, and e(n) is subtracted to recover the transmitted signal x(n).



Figure 16.3 Computing the syndrome by filtering: the received block feeds the bank of 2P recursive filters 1/(1 − W^k Z^{−1}), 0 ≤ k ≤ 2P − 1, whose final outputs X(0), X(1), …, X(2P − 1) constitute the syndrome.

Finally, the processing in the receiver comprises the following operations:

– Compute the syndrome by filtering.
– Compute the prediction coefficients.
– Extract the zeros of the prediction filter.
– Compute the amplitudes of the components.
– Subtract the components of the error signal.
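The first operation, the syndrome computation of Figure 16.3, can be sketched as follows in the complex field; the received block is a random placeholder.

```python
import numpy as np

N, P = 16, 2
W = np.exp(2j * np.pi / N)
r = np.random.default_rng(3).standard_normal(N) + 0j   # received block r(0..N-1)

syndrome = np.zeros(2 * P, dtype=complex)
for k in range(2 * P):
    y = 0j
    for sample in r[::-1]:           # fed with x(N-1), x(N-2), ..., x(0)
        y = sample + W ** k * y      # recursive filter 1/(1 - W^k Z^-1)
    syndrome[k] = y

# Identical to the first 2P DFT values X(k) = sum over n of x(n) W^{nk}
check = np.array([np.sum(r * W ** (k * np.arange(N))) for k in range(2 * P)])
print(np.allclose(syndrome, check))  # True
```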

On the transmitter side, the processing can also be performed by filtering. Assuming the data to
be transmitted are [x(2P), …, x(N − 1)], the 2P complementary values [x(0), …, x(2P − 1)] must be
computed. The filter with transfer function:
$$ H(z) = \frac{1}{\prod_{i=0}^{2P-1} (1 - w_i z^{-1})} = \frac{1}{1 + \sum_{i=1}^{2P} a_i z^{-i}} \qquad (16.10) $$
has the following input–output relationship:

$$ y(n) = x(n) - u(n); \qquad u(n) = \sum_{i=1}^{2P} a_i\, y(n-i) \qquad (16.11) $$

Taking the index inversion into account, the sequence x(2P − i) = u[N − 1 − (2P − i)], (1 ≤ i ≤ 2P), is fed to the filter and the last 2P outputs – that is, [y(N − 2P), …, y(N − 1)] – are zero. The operation can be checked by taking P = 1, for example.
The above coding and decoding process has been presented assuming the arithmetic operations
are carried out in the complex field. However, the data which must be protected are generally
binary – that is, the signals x(n) are B-bit numbers, and the same applies to the 2P protecting values.
Then, in order for the numerical calculations to be exact, arithmetic operations must be carried out
in finite fields with 2^B elements, as indicated in Section 3.7.

16.1.4 Computing in a Finite Field


Galois fields, GF(2^B), possess the required properties; they are algebraic extensions of the field GF(2) = [0, 1]. They are defined from a degree-B polynomial g(x) irreducible over the field [0, 1].

Table 16.1 Galois field GF(2^4)

Polar representation    Polynomial representation    Binary representation
0                       0                            0000
α^0                     1                            0001
α^1                     α                            0010
α^2                     α^2                          0100
α^3                     α^3                          1000
α^4                     α + 1                        0011
α^5                     α^2 + α                      0110
α^6                     α^3 + α^2                    1100
α^7                     α^3 + α + 1                  1011
α^8                     α^2 + 1                      0101
α^9                     α^3 + α                      1010
α^10                    α^2 + α + 1                  0111
α^11                    α^3 + α^2 + α                1110
α^12                    α^3 + α^2 + α + 1            1111
α^13                    α^3 + α^2 + 1                1101
α^14                    α^3 + 1                      1001

The elements of the field are the successive powers of a primitive element α such that α^M = 1 and M = 2^B − 1. The number N of code values must be less than or equal to M.
Example: B = 4; g(x) = x^4 + x + 1; M = 15.
Letting α^4 + α + 1 = 0, the 15 elements of the code are given in Table 16.1.
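For illustration, Table 16.1 can be generated, and multiplication performed by adding exponents, with the following sketch:

```python
# Exp/log tables of GF(2^4) generated by g(x) = x^4 + x + 1
exp_tab = [1]
for _ in range(14):                  # alpha^1 to alpha^14
    v = exp_tab[-1] << 1             # multiplication by alpha
    if v & 0b10000:                  # degree 4 reached: use alpha^4 = alpha + 1
        v ^= 0b10011                 # i.e. subtract x^4 + x + 1 (XOR in GF(2))
    exp_tab.append(v)
log_tab = {v: i for i, v in enumerate(exp_tab)}

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return exp_tab[(log_tab[a] + log_tab[b]) % 15]

# Example: alpha^7 * alpha^9 = alpha^16 = alpha
print(format(gf_mul(0b1011, 0b1010), "04b"))   # '0010'
```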
Coding and decoding consist of running the algorithms described in the above sections, using the
code table for the arithmetic operations. Appropriate and optimized algorithms have been devel-
oped; they are generally presented with the help of the polynomial terminology [1]. Accordingly, a
typical temporal decoding corresponds to the following sequence:
– Compute the syndrome (polynomial s(x) of degree 2P).
– Compute the localizing polynomial using the Berlekamp–Massey algorithm (iterative calcula-
tion of the prediction filter coefficients); extract the localizing zeros by the Chien method (sys-
tematic search among the elements of the field).
– Compute the amplitudes of the error by the Forney algorithm.
– Subtract the errors.
Frequency decoding necessitates the definition of a Fourier transform in the field GF(2B ), using
successive powers of a primitive element 𝛼.

16.1.5 Performance of Reed–Solomon Codes


A code C(N,K) is defined by the number K of data to be protected and the total number N of values
in the block. Such a code is able to detect N − K errors and correct P ≤ (N − K)/2 errors. The number of bits B of each value must be such that N < 2^B. In total, each block contains KB useful bits protected
bits B of each value must be such that N < 2B . In total, each block contains KB useful bits protected
by (N – K)B redundancy bits. The performance of the code is measured by the probability error per
bit Pb after decoding, as a function of the error probability per symbol Ps before decoding. In case
of scattered errors in the channel, and if the bit error rate is P, the probability for a B-bit symbol to
be error free is: (1 − p)B and we get:
ps = 1 − (1 − p)B (16.12)
After decoding, there is no output error when the number of erroneous symbols in the block of
N symbols is less than the correction capability P. For P + 1 erroneous symbols, the number of bit
errors is at most (P + 1)B and the probability is:
$$ P_{P+1} = C_N^{P+1}\, p_s^{P+1} (1 - p_s)^{N-P-1} \qquad (16.13) $$
The corresponding bit error rate is written as:
$$ BER \leq \frac{P+1}{N}\, C_N^{P+1}\, p_s^{P+1} (1 - p_s)^{N-P-1} \qquad (16.14) $$
In total, the case of (P + i) erroneous symbols (1 ≤ i ≤ N − P) must be taken into account and the
probabilities must be summed. However, where Ps is small, it might be sufficient to consider only
i = 1 and stick to expression (16.14).

Example:
N = 204; K = 188; P = 8; B = 8 bits.
With the line error probability p = 10−3, we obtain ps = 1 − (1 − 10−3)^8 ≈ 0.008 and

$$ BER \leq \frac{9}{204} \cdot \frac{204!}{9!\, 195!}\, (0.008)^9 (0.992)^{195} $$

which is:

$$ BER \leq 1.6 \times 10^{-6} $$
Referring to the signal-to-noise ratio, and assuming Gaussian white noise, probability p = 10−3 corresponds to an SNR of about 10 dB, while 1.6 × 10−6 corresponds to about 14 dB. Thus, the gain brought by the code for this error probability – the coding gain – is about 4 dB.
Now, p = 10−4 leads to BER ≤ 10−14 and the code cancels the line errors almost completely, in this example, for p ≤ 10−4.
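The bound (16.14) for this example can be evaluated numerically with a short sketch such as:

```python
from math import comb

def rs_ber_bound(p, N=204, K=188, B=8):
    P = (N - K) // 2                               # correction capability
    ps = 1.0 - (1.0 - p) ** B                      # symbol error probability (16.12)
    return ((P + 1) / N) * comb(N, P + 1) * ps ** (P + 1) * (1 - ps) ** (N - P - 1)

print(f"{rs_ber_bound(1e-3):.1e}")   # about 1.7e-06, the 1.6e-06 above up to rounding
print(f"{rs_ber_bound(1e-4):.1e}")   # below 1e-14: line errors virtually cancelled
```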
Clearly, Reed–Solomon codes are recommended for applications requiring very low error rates.
The above derivations assume scattered errors. When errors occur in packets, we can return to
the case of scattered errors by introducing symbol interleaving.

16.2 Convolutional Codes

Convolutional coding of a sequence of symbols consists of introducing redundancy into the


sequence and correlation between symbols [5, 6]. Correlation is introduced through digital
filtering – thus, convolution at the transmitter side. At the receiver side, the inverse operation is
carried out. The convolution–deconvolution cascade provides the opportunity to approach the
conditions for maximum rate transmission in the channel. These conditions need to be explicitly
stated.

Figure 16.4 Two-level detection: Gaussian probability distributions of the amplitude around the two levels ±1.

16.2.1 Channel Capacity


The number of bits per symbol that can be transmitted by a channel, assumed to be free of distor-
tion but noisy, with a specified error probability, depends on the signal-to-noise ratio and on the
probability distribution law of the amplitudes of the noise.
As an illustration, consider real equiprobable symbols d(n) = ± 1, and a Gaussian law as shown
in Figure 16.4. Then, the error probability (Pe )±1 is expressed by:
$$ (P_e)_{\pm 1} = \frac{1}{\sigma_b \sqrt{2\pi}} \int_1^{\infty} e^{-\frac{x^2}{2\sigma_b^2}}\, dx \qquad (16.15) $$
where σb² stands for the noise power. After a change of variable, we find:

$$ (P_e)_{\pm 1} = \frac{1}{\sqrt{2\pi}} \int_{1/\sigma_b}^{\infty} e^{-x^2/2}\, dx \qquad (16.16) $$
Referring to Annex 2 in Chapter 1, the error probability Pe = 10−5 requires 1/σb = 4.4 and SNR = 12.9 dB. For symbols having more than 2 levels, (16.16) must be modified to account for neighboring levels:
$$ P_e(\Delta) = \frac{2}{\sqrt{2\pi}} \int_{\Delta}^{\infty} e^{-x^2/2}\, dx \qquad (16.17) $$

where Δ = 1/σb.
Next, the error probability must be linked to the theoretical channel capacity presented in
Section 1.13. The upper bound of the number of bits that can be transmitted without error, for a
given SNR, is expressed by:

$$ C = \frac{1}{2} \log_2\left(1 + \frac{S}{B}\right) \qquad (16.18) $$
Thus, to transmit 1 bit, we must have SNR = 3 – that is, 4.77 dB. The difference of 8.1 dB with the
10−5 error probability above represents the loss which can possibly be compensated for by coding.
For symbols with 2^N equiprobable values, uniformly spread in the amplitude range ±2^N, the signal power is:

$$ S = \frac{1}{3}\left(2^{2N} - 1\right) \approx \frac{1}{3}\, 2^{2N} \qquad (16.19) $$
The error probability can be introduced into the expression of the number of bits CN that symbols
can carry. Referring to the peak signal power, 3S, we write:
$$ C_N = \frac{1}{2} \log_2\left(1 + \frac{3}{\Delta^2}\, \frac{S}{B}\right) \qquad (16.20) $$

where the parameter Δ, introduced in (16.17), represents the error probability.
Worth noting is the fact that the capacity limit (16.18) corresponds to Δ = √3, which means that the greatest noise standard deviation that allows CN bits to be transmitted without error is expressed by:

$$ (\sigma_b)_{lim} = \frac{1}{\sqrt{3}} = 0.58 \qquad (16.21) $$
This should be compared with half the distance between neighboring levels, which is unity.
Applying relation (16.17), we note that, in the absence of coding, this noise level would lead to
the error probability of 0.0833.
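These figures can be recovered numerically from (16.17) and (16.18), for instance with the complementary error function:

```python
from math import erfc, sqrt, log2, log10

pe = lambda delta: erfc(delta / sqrt(2.0))        # Pe(Delta) of relation (16.17)

print(f"Pe(4.4) = {pe(4.4):.1e}")                 # about 1e-5
print(f"SNR = {10 * log10(4.4 ** 2):.1f} dB")     # about 12.9 dB

# Capacity (16.18): 1 bit requires SNR = 3, that is 4.77 dB
print(f"C = {0.5 * log2(1 + 3):.0f} bit at {10 * log10(3):.2f} dB")

# Noise at the capacity limit (16.21): error probability without coding
print(f"Pe(sqrt(3)) = {pe(sqrt(3)):.4f}")         # 0.0833
```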

16.2.2 Approaching the Capacity Limit


Before studying the conditions for approaching the limit, it is necessary to revisit the derivation in
Section 1.13.
Instead of decoding symbols independently as above, the global decoding of a set of M sym-
bols of N bits each, in the presence of Gaussian noise power B = 𝜎b2 , is contemplated. Then, an M
component noise vector is added to the useful signal, the energy being Eb = M𝜎b2 .
When M tends toward infinity, the end of the noise vector sits on the hypersphere with radius √M σb. Each set of M symbols can be represented by a point in an M-dimensional hyperspace. In this hyperspace, the received signal samples must occupy a volume greater than 2^{MN} times the volume of the noise hypersphere, because each of the potential 2^{MN} sets of symbols is accompanied by a
noise vector. In order to minimize energy, the symbol volume Vs must be a hypersphere whose radius R must be greater than 2^N times the radius of the noise sphere:

$$ R > 2^N \sqrt{M}\, \sigma_b \qquad (16.22) $$

The volume V M of the hypersphere is calculated as a function of the radius R by:


$$ V_M = \int_0^R r^{M-1}\, dr \int \cdots \int \prod_{i=1}^{M-1} f(\theta_i)\, d\theta_i = \frac{R^M}{M}\, F_\theta \qquad (16.23) $$

where F_θ is a function of π. For example, for M = 3, we have F_θ = 4π.
322 16 Error-Correcting Codes

Assuming uniform distribution of symbols in the hypersphere, the corresponding signal has
energy:
R
1 M
ES = r 2 r M−1 dr … f (𝜃 )d𝜃 = R2 (16.24)
VM ∫0 ∫i=1 ∫M−1 i i M+2
When M tends toward infinity, R2 represents the total energy of the received samples and R2 /M
represents the sum of the signal and noise powers. From (16.22), we obtain:
S + B > 22N B (16.25)
Hence, the channel capacity limit is defined by (16.18).
Now, in order to come close to this limit, the points representing the transmitted signal must be
spread in the hypersphere, which implies that the projections on the coordinate axes are quantized
with more than N bits and that relations exist between these projections.
It is worth pointing out that tending toward infinity for the number of symbols of a block means
infinite transmission delay.
In practice, the number of symbols in a block at decoding is limited to M and the loss in per-
formance must be assessed. The error probability Pe must be related to the ratio of the noise standard deviation σb and the value (σb)lim associated with the limit capacity. Letting α = (σb)lim/σb, with α > 1, the error probability is expressed as a function of the probability distribution P(r) of the radius r of the noise hypersphere by:

$$ P_e = \int_{\alpha \sqrt{M}}^{\infty} P(r)\, dr \qquad (16.26) $$
To find P(r), consider M unit-variance Gaussian independent random variables bi . The variable:
$$ r = \left( \sum_{i=1}^{M} b_i^2 \right)^{1/2} \qquad (16.27) $$

has the distribution:


$$ P(r) = \frac{2}{2^{M/2} \left(\frac{M}{2} - 1\right)!}\, r^{M-1} e^{-r^2/2} \qquad (16.28) $$

with M being even. For example, M = 2 gives the Rayleigh distribution. The function P(r) is maximum for r = √(M − 1). Figure 16.5 provides an illustration for M = 256.
Finally, the 3 decoding parameters are the deviation α with respect to the limit SNR, the tolerated error probability Pe, and the dimension M of the decoded block. For example, to approach the limit at 1 dB – that is, α = 1.1 – with M = 250, we get Pe = 10−2, while with M = 950, we get Pe = 10−5.
In the case of binary symbols, N = 1, the above derivation must be slightly modified. In fact, each set of M symbols can be represented by a point of the hypersphere of radius √M in the M-dimensional hyperspace. The noise hypersphere must be within the cone associated with the solid angle corresponding to 1/2^M of the total angle. It can be shown that the following approximation holds:
$$ M \cdot [10 \log(\alpha^2)] \geq 29 \cdot 10 \log\left(\frac{1}{P_e}\right) \qquad (16.29) $$
Thus, for 𝛼 = 1.1, which is a 1 dB deviation with respect to the limit, and Pe = 10−5 , the block
length must be M = 1450 symbols.

Figure 16.5 Probability distribution of the noise vector modulus: P(r) for M = 256, with the maximum near r = √(M − 1) and the decision threshold at α√M.

Figure 16.6 Transmission system with convolutional coding: the binary data d(n) feed the convolutional coder, the ±1 symbols y(n) cross the channel, the equalizer delivers ŷ(n) to the ML decoder, which outputs d̂(n − K); the binary symbols u(n) serve to form the error e(n) used by the equalizer.

Convolutional codes are designed to draw near to the conditions leading to the limit, and a simple
scheme is described, to begin with. As for large blocks, iterative techniques are required, with turbo
codes.

16.2.3 A Simple Convolutional Code


Convolutional codes are based on FIR filtering. A simple system consists of two filters operating on binary data and doubling the output rate; it is shown in Figure 16.6.
The binary symbols y(n) delivered by the coder are fed to the channel whose distortion is com-
pensated by an equalizer at receiver input. The equalizer output ỹ (n) is applied to a maximum
likelihood (ML) decoder, which provides an estimation d(ñ − K) of the input binary data. The sys-
tem delay is K. The ML decoder has multilevel inputs. The binary symbols u(n) are used to obtain
the error signal e(n) needed for equalization. Assuming perfect equalization, the system can be
modeled as shown in Figure 16.7.

Figure 16.7 System model with convolutional decoding: the data d(n) in [0, 1] are filtered into y1(n) and y2(n), converted to ±1, corrupted by the noise samples b1(n) and b2(n), and the ML decoder processes x1(n) and x2(n) to deliver d̂(n − M).

The filter transfer functions are:


$$ H_1(Z) = 1 + Z^{-1} + Z^{-2}; \qquad H_2(Z) = 1 + Z^{-2} \qquad (16.30) $$
The number of coefficients L = 3 is called the “constraint length” of the code. Since input and
output data are binary, the arithmetic operations are carried out modulo 2, in the field [0, 1]. The
coder output rate is doubled since each input generates 2 outputs; the code rate is R = 1/2.
The channel is assumed to be distortion free but noisy. Noise samples are added to the transmitted
useful samples, and the received signals x1 (n) and x2 (n) are fed to the ML decoder which retrieves
the data with the delay M > L.
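A minimal sketch of the coder defined by (16.30) is, for example:

```python
import numpy as np

def conv_encode(d, g1=(1, 1, 1), g2=(1, 0, 1)):   # H1 = 1+Z^-1+Z^-2, H2 = 1+Z^-2
    d = np.asarray(d, dtype=int)
    mem = np.zeros(len(g1) - 1, dtype=int)        # shift register d(n-1), d(n-2)
    out = []
    for bit in d:
        state = np.concatenate(([bit], mem))
        out.append((state @ g1) % 2)              # y1(n), modulo-2 FIR filtering
        out.append((state @ g2) % 2)              # y2(n)
        mem = state[:-1]
    return np.array(out)

print(conv_encode([1, 0, 1, 1, 0, 0]))
# feeding [1, 0, 0] gives the impulse response 11 10 11, of weight 5
```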
In the decoding process, at time n = M − 1, the data sample to be decoded d(0) = ± 1 appears in
at most 2L = 6 received values, as shown in the sequence:
x1 (0) = d(0) ⊕ d(−1) ⊕ d(−2) + b1 (0)
x2 (0) = d(0) ⊕ d(−2) + b2 (0)
x1 (1) = d(1) ⊕ d(0) ⊕ d(−1) + b1 (1)
x2 (1) = d(1) ⊕ d(−1) + b2 (1)
x1 (2) = d(2) ⊕ d(1) ⊕ d(0) + b1 (2)
x2 (2) = d(2) ⊕ d(0) + b2 (2)
x1 (3) = d(3) ⊕ d(2) ⊕ d(1) + b1 (3)
x2 (3) = d(3) ⊕ d(1) + b2 (3) (16.31)
An error vector of M terms can be built, taking the differences between the received and emit-
ted values. The norm Emin of this vector is taken as the cost function, and it is minimum at time
n = M − 1 if d(0) is the transmitted value in (16.31). Next, the deviations produced by false d(0) val-
ues in (16.31) must be calculated. The impact of modulo additions is worth pointing out because
error compensation may occur which necessitates the examination of cases of multiple errors.
Then, if the exact value is d(0) = 0, we find:
1. Single error

$$ E_1 = \sum_{n=0}^{M-1} [x_1(n) - y_1(n)]^2 + [x_2(n) - y_2(n)]^2 \qquad (16.32) $$

$$ (d(0) = 1): \quad E_1 - E_{min} = \Delta_1 = 20 \left[ 1 + \frac{b_1(0) + b_2(0) + b_1(1) + b_1(2) + b_2(2)}{5} \right] $$

2. Double error

$$ d(0) \text{ and } d(1): \quad \Delta_2 = 24 \left[ 1 + \frac{b_1(0) + b_2(0) + b_2(1) + b_2(2) + b_1(3) + b_2(3)}{6} \right] $$

$$ d(0) \text{ and } d(2): \quad \Delta_2' = 24 \left[ 1 + \frac{b_1(0) + b_2(0) + b_1(1) + b_1(3) + b_1(4) + b_2(4)}{6} \right] $$

The average of noise samples appears in the cost function, with factor 5 for the single-error case
and 6 for the double-error case. It can readily be verified that the averaging factor is greater than 6
for other cases. In the absence of coding, the averaging factor is 2. Thus, with coding, considering
single-error cases only, the coding gain is approximately Gc = 2.5, or 4 dB.
It can be observed that the averaging factor corresponds to the number “1” at the filter outputs
when the error sequence is fed at the inputs. Then, a single error leads to the impulse response.
This number is called the weight of the sequence.
The curves giving the error probability per bit as a function of Eb /N 0 , with and without coding,
are shown in Figure 16.8. The term Eb is the energy per bit and N 0 is the noise power spectral
density. The ratio Eb /N 0 is a theoretical parameter that allows us to obtain generic curves. To get the
practical SNR and determine the rates, it is necessary to apply a factor 2 to account for the symbol
rate equal to twice the channel bandwidth and multiply by the number of bits in each symbol.
With L = 3, the coding gain is close to 4 dB for Pe = 10−6 . In contrast, it is less than 3 dB for
Pe = 10−3 . In fact, it is necessary to consider the vectors in the vicinity of the ideal vector and
add up the error probabilities. Overall, the vectors associated with multiple errors have averaging
factors greater than that of the single-error case and their impact is weak for high-SNR situations and small error probabilities.

Figure 16.8 Bit error probability for convolutional coding with rate R = 1/2: curves without coding and with constraint lengths L = 3, 7, and 9, as a function of Eb/N0 (dB).

The error probability is expressed as a function of the averaging factor
k by:
$$ p_k = \frac{1}{\sqrt{2\pi}} \int_{\sqrt{k}/\sigma_b}^{\infty} e^{-x^2/2}\, dx \approx 0.4\, e^{-\frac{k}{2\sigma_b^2}} \qquad (16.33) $$

and for k + 1:

$$ p_{k+1} \approx 0.4 \exp\left(-\frac{k+1}{2\sigma_b^2}\right) \approx p_k \exp\left(-\frac{1}{2\sigma_b^2}\right) \qquad (16.34) $$
The vectors in the vicinity of the ideal vector are obtained from the trellis diagram and the graph
of the transitions between associated states, which constitute an efficient realization of the ML
principle. The generating function of the convolutional code is expressed by:
$$ T(D, L, N) = D^5 L^3 N + D^6 L^4 (1 + L) N^2 + D^7 L^5 (1 + L)^2 N^3 + D^8 L^6 (1 + L)^3 N^4 \qquad (16.35) $$
where the exponent of D is the averaging factor, the exponent of N is the number of errors, and the terms of the
polynomial in L represent the multiple error configurations. For example, the averaging factor 7
occurs for 3 errors in the following 4 configurations:
[d(0), d(1), d(2)], [d(0), d(1), d(3)], [d(0), d(2), d(3)], [d(0), d(2), d(4)].
The averaging factor is called “free distance” in coding theory.
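For illustration, ML decoding by the Viterbi algorithm for this L = 3 code can be sketched as follows, reusing the conv_encode function given earlier; the noise level and data are arbitrary assumptions.

```python
import numpy as np

def viterbi(rx):                              # rx: pairs of noisy +/-1 values
    metric = np.full(4, np.inf); metric[0] = 0.0   # states s = d(n-1) + 2 d(n-2)
    paths = [[] for _ in range(4)]
    for y1, y2 in rx.reshape(-1, 2):
        new_metric = np.full(4, np.inf)
        new_paths = [None] * 4
        for s in range(4):
            if metric[s] == np.inf:
                continue
            d1, d2 = s & 1, s >> 1
            for b in (0, 1):                  # hypothesis on d(n)
                c1, c2 = (b + d1 + d2) % 2, (b + d2) % 2   # outputs per (16.31)
                dist = (y1 - (2 * c1 - 1)) ** 2 + (y2 - (2 * c2 - 1)) ** 2
                t = b + 2 * d1                # next state
                if metric[s] + dist < new_metric[t]:
                    new_metric[t] = metric[s] + dist
                    new_paths[t] = paths[s] + [b]
        metric, paths = new_metric, new_paths
    return paths[int(np.argmin(metric))]

d = [1, 0, 1, 1, 0, 0]                        # data block ending with zeros
coded = 2.0 * conv_encode(d) - 1.0            # +/-1 symbols
noisy = coded + 0.4 * np.random.default_rng(4).standard_normal(coded.size)
print(viterbi(noisy))                         # retrieves d with high probability
```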

16.2.4 Coding Gain and Error Probability


The codes are determined by a given constraint length and by the search for the maximum value
of the minimum averaging factor. In the single error case, this factor equals the number of nonzero
coefficients N 1 in the set of polynomials H 1 (Z) and H 2 (Z). For 2 consecutive errors, it equals the
number of alternations N 2 of zero and nonzero coefficients. Generally, these two cases account for
the smallest averaging factor values and a good initialization strategy is to start from 2 polynomials
that lead to equivalent factors for the cases of single error and two consecutive errors. If the ones
and zeros alternate in H 1 (Z) and H 2 (Z), then, N 2 = 2L and N 1 = L.
If a zero is replaced by a one, the number of alternations N 2 is reduced by two units and N 1 is
increased by one unit. Equality is reached for N 1 = N 2 = 4L/3 and the coding gain for small error
probability is bounded by:
$$ G_{max} = 10 \log \left[ \frac{1}{2} \cdot \frac{4}{3}\, L \right] \qquad (16.36) $$
For example, for L = 7, a widely used code has coefficients [1111001] and [1011011] and
N 1 = N 2 = 10. The bound (16.36) is Gmax = 6.7 dB, while simulations yield a coding gain of 6.3 dB.
Similarly, for L = 9, the best code has coefficients [111101011] and [101110001] and
N 1 = N 2 = 12 = 9×4/3. As shown by the curve associated with that code in Figure 16.8, for
Pe = 10−8 the coding gain is 7 dB, while bound (16.36) gives Gmax = 7.8 dB. That gain is obtained for Eb/N0 = 5 dB. Note that, for a rate R = 1/2 code, the channel capacity is C = 1/2 = (1/2) log2(1 + 1), which corresponds to SNR = 1 = 2Eb/N0, or Eb/N0 = −3 dB.
The systematic codes keep the initial data sequence, and they are associated with H 1 (Z) = Z −P .
The delay is P samples and the gain, which rests on H 2 (Z), is reduced.
The global error probability of a code is computed by considering all the data vectors in the vicin-
ity of the ideal vector and summing the corresponding probabilities. For the averaging factor k, the
number of error configurations ak is obtained from (16.35) rewritten as:


$$ T(D, 1, 1) = \sum_{k \geq k_{min}} a_k D^k \qquad (16.37) $$

For each configuration, the error probability Pk is determined so that the average of the k noise
samples exceeds unity as shown by relations (16.32). The error probability is bounded by:


$$ P_E < \sum_{k \geq k_{min}} a_k P_k \qquad (16.38) $$

The inequality reflects the fact that noise samples are involved in several averaging operations.
Approximation (16.33) leads to:
$$ P_E < 0.4 \exp\left(-\frac{k_{min}}{2\sigma_b^2}\right) \sum_{k=k_{min}}^{\infty} a_k \exp\left(-\frac{k - k_{min}}{2\sigma_b^2}\right) \qquad (16.39) $$

This represents the probability that an error will occur. However, an important feature of
convolutional codes is that the output errors often occur in packets, due to the decoder.
In summary, convolutional codes have the following properties:

● The coding gain grows with the constraint length.


● The coding gain is reduced for low SNR values.
● The decoder may generate error packets.

16.2.5 Decoding and Output Signals


The ML decoder retrieves the data sample d(n) assuming the L − 1 previous data are known and
searching for the minimum norm of the M-term error vector associated with indices: n, n + 1, …,
n + M − 1. If an error occurs at time n, it is propagated in the decoder memory, and it can generate
new errors. An illustration is provided in Figure 16.9, with L = 3, a block of 1000 bits and 3 dB SNR.
To investigate the generation of these errors, it is helpful to consider the signal which represents
the noise averaging, according to (16.32). This signal sq (n) is obtained by the following operations
at time n:

● Assuming d(n) = 0, search for the minimum E0 (n) of the error vector norm.
● Assuming d(n) = 1, search for the minimum E1 (n) of the error vector norm.
● Take sq (n) = E1 (n) − E0 (n).

Figure 16.9 Errors at the output of a convolutional decoder: error signal l(n) over a block of 1000 bits, with errors appearing in packets.

The binary data are given the sign of the sequence sq (n). In fact, the sign of sq (n) represents the
sum of the original data and the error signal and |sq (n)| is the input noise after averaging, with
amplitude shift and aliasing about the origin in the presence of errors. Low values of |sq (n)| reflect
the low reliability of the recovered data at corresponding times.
A decoder that delivers the sequence sq (n) is called a soft output decoder. This signal may be
leveraged in decoders in cascades to extract errors, as in turbo codes.
An approach to cope with error packets consists of cascading the convolutional decoder with an
interleaver and a Reed–Solomon decoder. The interleaver performs a permutation which spreads
the errors over several blocks of the RS code, which allows for correction.

16.2.6 Recursive Systematic Coding (RSC)


An IIR filter can be employed to provide systematic coding which is equivalent to the
non-systematic coding provided by a FIR filter. It suffices to replace H 1 (Z) with 1 and H 2
(Z) with H 1 (Z)/H 2 (Z), which leads to the diagram shown in Figure 16.10 for L = 3 [7].
The output of the decoder is the same as the output of the FIR filter-based coder when the input signal is filtered by H1(Z).
but it is obtained for 3 consecutive errors, for L = 3. However, the impulse response is infinite,
and, as a consequence, a data sample to be decoded is involved in all the following samples that
are taken into account in the computation of the error vector norm. Modification of one bit in the
input sequence entails a change of the whole following output sequence. The computation of the
soft output sequence is carried out as in the FIR coding.
The recursive code can be made circular and transformed into a long block code by terminating
the block in such a manner that the memory returns to the initial state. Then, the impulse response
is at most equal to the block length. Now, an input sequence with L − 1 zeros appended leads to a
return to zero with FIR filtering, while the probability is 1/2^{L−1} with IIR filtering. Then, due to the
ML calculation, an input sequence that does not bring about a return to zero has a low probability
of being incorrectly recovered by the decoder.
In short, RSC allows for a long block to be processed with a small constraint length and reduced
decoder complexity.
In order to come near to the channel capacity limit mentioned in Section 16.2.2, it is necessary
to combine RSC with an interleaving operation, which spreads the error packets, and an iterative
process [7, 8].

Figure 16.10 Recursive systematic coder: the data d(n) in [0, 1] are transmitted directly as y1(n), while y2(n) is delivered by the recursive filter built around two delays Z−1.



Figure 16.11 Rate R = 1/3 parallel coder: the data y(n) = d(n) are transmitted directly, together with the outputs y1(n) and y2(n) of two recursive systematic coders, the second one being fed through an interleaver.

16.2.7 Principle of Turbo Codes


The data are processed in blocks. Two recursive systematic convolutional coders are deployed in
parallel with an interleaver in between. At the decoder, soft outputs are obtained, and the output of
the second decoder is fed to the input of the first decoder after deinterleaving. The diagram of the
rate R = 1/3 coder, based on the filters described in the previous section, is given in Figure 16.11.
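A sketch of this parallel coder is given below; the recursive systematic filter of the previous section is assumed, with H1(Z) = 1 + Z−1 + Z−2 and H2(Z) = 1 + Z−2, together with a random permutation as interleaver.

```python
import numpy as np

def rsc(d):                            # y2 = d filtered by H1(Z)/H2(Z), modulo 2
    m1 = m2 = 0
    out = []
    for bit in d:
        u = (bit + m2) % 2             # feedback of H2: u(n) = d(n) + u(n-2)
        out.append((u + m1 + m2) % 2)  # feedforward H1: u(n) + u(n-1) + u(n-2)
        m1, m2 = u, m1
    return out

rng = np.random.default_rng(5)
d = rng.integers(0, 2, 16)
perm = rng.permutation(len(d))         # interleaver
y, y1, y2 = d, rsc(d), rsc(d[perm])    # the 3 outputs of the rate R = 1/3 coder
print(list(y), y1, y2)
```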
The principle of the decoder is shown in Figure 16.12. Each decoder delivers a soft output. The
decoded data sample is the sign of the output of decoder 2, sp (n). The decoder outputs can be cal-
culated through a technique similar to that used to derive sq (n) in the previous sections or using
probability computation.
Decoder 1 receives the signal x(n) of the emitted data and the signal x1 (n) which is the output of
the first coder with channel noise added. The soft output is fed to decoder 2, after interleaving, in
order to synchronize with the signal x2 (n) from coder 2. Then, the output of decoder 2 is deinter-
leaved to be fed to the input of decoder 1 after combination with the received signal x(n). The SNR
of the sequence sp (n) is improved at each iteration and the combination with the noisy input signal
is performed in such a manner that the SNRs add up.

Figure 16.12 Principle of the iterative decoder: decoder 1 receives x(n) and x1(n); its soft output is interleaved and fed, together with x2(n), to decoder 2; the soft output sp(n) of decoder 2 is deinterleaved and combined with x(n) at the input of decoder 1, and the decoded data d̂(n − M) is the sign of sp(n).

In order for the addition to be efficient, the noise signals must be independent – hence the importance of the interleaving involved in recursive coding. Moreover, it is obvious that using systematic coding and keeping the input signal is
essential for the realization.
For low input SNR, many errors occur at first decoding. They are corrected gradually by suc-
cessive iterations. The procedure is stopped when the output signal variance is not decreasing any
more and the retrieved data are obtained as the sign of this signal.
The transmission efficiency is improved with rate R = 1/2, alternately transmitting the outputs of coders 1 and 2.
Example: Let us consider a coder with L = 5 and the IIR filter transfer function:
$$ \frac{H_2(Z)}{H_1(Z)} = \frac{1 + Z^{-4}}{1 + Z^{-1} + Z^{-2} + Z^{-3} + Z^{-4}} $$

a permutation matrix of size 256 × 256 as interleaver, and the block length 65,536 bits. After 18
iterations, the error probability per bit has become smaller than 10−5 for SNR = 0.7 dB. The SNR
limit value, with rate R = 1/2, equals 0 dB. In this example, the turbo code comes as close as 0.7 dB
to the theoretical limit.

16.2.8 Trellis-Coded Modulations


In the previous sections, the data and emitted symbols are binary: d(n) = [0, 1] and x(n) = ± 1.
Coding techniques can be extended to multilevel symbols. The principle consists of protecting the
low-weight bits with a convolutional code and distributing the amplitudes in such a way that inter-
vals corresponding to the non-protected bits are maximized [9].
An illustration is given in Figure 16.13. Amplitudes of 4- and 8-level symbols are meant to carry
2 information bits with the same power of the emitted signal.
In the absence of coding, 4 levels are needed and the distance between neighboring levels is 2q.
Applying a rate R = 1/2 convolutional code to one of the 2 bits, as shown in Figure 16.14, a 3-bit
symbol is obtained. The amplitude distribution is such that the noncoded bit is associated with
the distance 4q, as apparent in Figure 16.13.

Figure 16.13 Amplitudes of 4- and 8-level symbols: the 8 levels, labeled 000 to 111, are spaced 2q apart, and levels differing only in the noncoded bit are 4q apart.

Figure 16.14 Transmitter with coded modulation: bit d1 feeds a rate R = 1/2 convolutional coder delivering c1 and c2, bit d2 is kept as c3, and the 3 bits select the amplitude of the symbol x(n).

In order to determine the system coding gain, the minimum averaging factor kmin of the convolutional code can be associated with a multiplication of the distance between levels by √kmin. Then, for kmin > 16, the distance 4q corresponding to the
noncoded bit is dominant and the system coding gain, with respect to the absence of coding, can
come close to factor 2 – that is, 6 dB.
At decoding, the noncoded bit is introduced in the trellis in the form of parallel paths, which
means that the weighting computations are carried out for the 2 possibilities associated with this
bit. The assignment of the amplitudes for the 2 coded bits can also be optimized – the greatest
available distance is associated with transitions in the trellis that start from the same state or reach
the same state. The objective is to maximize the distance between paths.
When the free distance of the code is preponderant, the error probability Pe per symbol is esti-
mated using (16.38) or the simplified version:
$$ P_e \approx 0.4\, a_0 \exp\left(-\frac{k_{min}}{2\sigma_b^2}\right) \qquad (16.40) $$
where a0 is the number of error configurations leading to the minimum averaging factor kmin as
explained in Section 16.2.3.

16.3 Conclusion
Signal-processing techniques can apply to error detection and correction. Reed–Solomon codes
based on the DFT, in combination with linear prediction, are able to correct errors in blocks so that
extremely low error rates can be reached.
Convolutional codes use digital filters, and they bring the equivalent of SNR improvements
in transmission. The decoder is simple in principle and easy to implement. It exploits the ML
technique and the Viterbi algorithm whose complexity depends on the filter order and the length
of the code. The delay is proportional to the code length, multiplied by a factor of a few units.
Convolutional coding can be extended to multi-bit symbols, with the coded modulation technique,
in which one or several low-weight bits are protected.
Turbo codes combine recursive filtering, interleaving, and an iterative procedure. Their perfor-
mance can come close to the theoretical limit of channel capacity. They process data blocks of large
or very large size, which increases complexity and transmission delay.
The combination of a convolutional coder (inner code) and a Reed–Solomon coder (outer code) is
a very powerful setup which yields extremely low error rates in transmission and radio broadcasting
systems.

Exercises

16.1 A Reed–Solomon code uses a DFT of size 32 and the syndrome consists of the 4 values:
S = [2 + j, −1.69 + 2.23j, −2.12(1 + j), 2.23 − 1.67j]
Compute the prediction coefficients and determine the signal which must be subtracted in
the frequency domain to retrieve the initial signal.

16.2 Cancelling an impulse noise. To protect a block of samples X = [x(0), …,x(5)], a suffix
Sx = [x(6), x(7)] is added, such that, in the DFT of the sequence, the terms Y(6) and Y(7) are
null. Give the expression of Sx as a function of X.
After addition of an impulsive noise, we have Y (6) = 2 e−j𝜋/2 , Y (7) = 2 ej3𝜋/4 . Compute the
amplitude of the pulse and its index in the input block. What happens if the spurious pulse
falls between two index values and then, how can its impact be reduced?

16.3 A rate R = 1/2 convolutional code has the coefficients [1111] and [1101]. Give the realization
diagram and show that the free distance is 6. What is the maximum coding gain?

16.4 Assuming binary data, d(n) = ± 1, apply the formula that gives the theoretical capacity
of a channel, in order to obtain the maximum SNR value which allows for error-free
transmission.
The system SNR is assumed to be equal to 8 dB and a rate R = 1/2 convolutional code is
used. If a bit error rate less than 10−8 is targeted, give the coding gain needed. Propose a
code which is able to meet the objective.

16.5 A systematic code is defined by:


H1 (Z) = 1; H2 (Z) = 1 + Z −1
Give the free distance of the code.
Perform the decoding of the sequence: 01101011000110111
The code is made recursive – that is, H1(Z) = 1; H2(Z) = 1/(1 + Z−1).
To use this code, an extra bit is appended to each set of 3 information bits, in order to secure
a return to zero. Give the corresponding rate. What is the free distance?

References

1 W.W. Peterson and E.J. Weldon, Error Correcting Codes, MIT Press, Cambridge, MA, 1972.
2 R. Blahut, Algebraic Methods for Signal Processing and Communications Coding, Springer-Verlag, New York, 1992.
3 A. Hocquenghem, "Codes correcteurs d'erreurs", Chiffres, vol. 2, pp. 147–156, 1959.
4 R.C. Bose and D.K. Ray-Chaudhuri, "On a class of error correcting binary group codes", Information and Control, vol. 3, pp. 68–79, 1960.
5 A.J. Viterbi and J.K. Omura, Principles of Digital Communications and Coding, McGraw-Hill, New York, 1979.
6 R. Ziemer and R. Peterson, Introduction to Digital Communication, Chapter 7: "Fundamentals of Convolutional Coding", Prentice Hall, NJ, 2001.
7 C. Berrou and A. Glavieux, "Near optimum error correcting coding and decoding: turbo-codes", IEEE Transactions on Communications, vol. 44, no. 10, 1996, pp. 1261–1271.
8 C. Berrou, "The ten-year-old turbo codes are entering into service", IEEE Communications Magazine, 2003, pp. 111–116.
9 E. Biglieri, D. Divsalar, P.J. McLane and M.K. Simon, Introduction to Trellis-Coded Modulation, Macmillan, New York, 1991.

17

Applications

Signal processing is instrumental in the generalization of electronics to all technical fields. A few
examples of applications are presented in this chapter, mainly in the field of communications.

17.1 Frequency Detection

Assume that the amplitude of a signal component with frequency f 0 is to be determined when the
signal is sampled at frequency f s > 2f 0 . Figure 17.1 represents the set of operations to be performed.
The signal is applied to a narrow band-pass filter centered on the frequency f 0 . Rectification is
then performed by taking the absolute value of the numbers obtained. This set of absolute values
is applied to a low-pass filter which provides the desired value of the amplitude. If the frequency
component f 0 which is to be detected is present, threshold logic provides the logic information.
This process can be analyzed as follows. Assume s0 (t) is the signal to be detected, with:
s0 (t) = A sin(𝜔0 t)
Taking the absolute value of the numbers which represent samples of this signal is equivalent to
multiplying by a square wave ip (t) in phase with s0 , and of unit amplitude. Using equation (1.6), we
can write:


$$ i_p(t) = 2 \sum_{n=0}^{\infty} h_{2n+1} \sin[(2n+1)\omega_0 t] \qquad (17.1) $$

in which:

$$ h_{2n+1} = (-1)^n\, \frac{\sin[\pi(2n+1)/2]}{\pi(2n+1)/2} = \frac{1}{\pi(2n+1)/2} $$
The signal s∗0 (t) obtained after rectification is:


$$ s_0^*(t) = 2A \sum_{n=0}^{\infty} h_{2n+1} \sin[(2n+1)\omega_0 t] \sin(\omega_0 t) $$

or:

$$ s_0^*(t) = A h_1 + A \sum_{n=1}^{\infty} (h_{2n+1} - h_{2n-1}) \cos(2n\omega_0 t) \qquad (17.2) $$

To obtain the amplitude A, terms of the infinite sum have to be eliminated. Above a certain
order, the parasitic products have frequencies greater than half the sampling frequency f s /2, and

are aliased in the useful band. The specifications of the low-pass filter, and, in particular, the stop-band edge, have to be chosen so as to eliminate the largest parasites. Those occurring in the pass band result in fluctuations in the measurement of A.

Figure 17.1 Frequency detection by band-pass filtering: input signal, band-pass filter, rectifier, low-pass filter, threshold, logical signal.
In this approach, it is advantageous to use an IIR band-pass filter and an FIR low-pass filter,
because the amplitude can be measured at a frequency less than fs. There is another method available, which involves only multirate filters. This is based on modulation by two carriers in quadrature at the frequency f0 and is shown in Figure 17.2.
The component to be detected is:

s(t) = A sin(𝜔0 t + 𝜑)

where 𝜑 represents the phase of the component relative to the carrier. After low-pass filtering
in the two branches to eliminate the unwanted modulation products, the following signals are
obtained:
A A
SR = ; sin 𝜑; SI = cos 𝜑 (17.3)
2 2
The required amplitude is:
√( )
A = 2 SR2 + S12
√( 2 )
Accurate evaluation of X = SR + S12 is difficult, and one is generally satisfied with an approx-
imation X ′ , which depends on the phase 𝜑.
Table 17.1 gives various approximations and the corresponding relative errors. These errors can
be reduced by multiplication by a scaling factor C – that is, by calculating the value Xc′ :
√( 2 )
XC′ = C SR + SI2

Detection of a modulated frequency generally requires less calculation than the method which
uses a band-pass filter, but it does require the availability of suitable carrier signals. The operation
of detecting a frequency is used in signal transmission systems and forms the basis of receivers of
multifrequency codes.
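A minimal sketch of the quadrature method of Figure 17.2, with a simple average standing in for the low-pass filters, is for instance:

```python
import numpy as np

fs, f0, A, phi = 8000.0, 1000.0, 0.7, 0.3       # placeholder values
n = np.arange(400)
s = A * np.sin(2 * np.pi * f0 * n / fs + phi)

SR = np.mean(s * np.cos(2 * np.pi * f0 * n / fs))  # (A/2) sin(phi), per (17.3)
SI = np.mean(s * np.sin(2 * np.pi * f0 * n / fs))  # (A/2) cos(phi)

print(2 * np.hypot(SR, SI))          # exact amplitude A = 0.7
print(np.abs(SR) + np.abs(SI))       # approximation X' of X = A/2 (Table 17.1)
```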

Figure 17.2 Frequency detection by complex filtering: the input signal is multiplied by cos(ω0 t) and sin(ω0 t), each product is low-pass filtered to give SR and SI, and the amplitude A is derived from √(SR² + SI²).


Table 17.1 Approximations of X = √(SR² + SI²)

X′                                         max (X′ − X)/X    C          max (X′C − X)/X
|SR| + |SI|                                0.41421           1.20711    0.17157
max(|SR|, |SI|)                            0.29289           0.85355    0.17157
max(|SR|, |SI|) + ½ min(|SR|, |SI|)        0.11803           1.05803    0.05573

17.2 Phase-locked Loop

Phase-locked loops are used for clock recovery in terminals and receivers [1, 2]. The principle
is illustrated in Figure 17.3. When the loop is in equilibrium, the frequency produced by the
voltage-controlled oscillator is equal to the frequency of the input signal and the phase detector
produces a signal whose continuous component is extracted by the narrowband low-pass filter.
The phase detector can be a modulator which forms the product of the oscillator output and the
signal input. If the nominal frequency of the oscillator is equal to the input frequency, the signals
are in quadrature, and the continuous component at the output of the modulator is zero. If not,
then the phase difference with respect to the quadrature signal produces a continuous component
which shifts the oscillator frequency by the required amount for the frequencies to become equal.
The bandwidth of the loop filter determines the capture range, the response time, and the residual
noise level.
This operation can be replicated entirely digitally. However, there is additional flexibility with
respect to where the phase calculation is performed. The digital oscillator can be realized by means
of a phase accumulator connected to a memory which provides samples of the sinusoid. Thus, the
input phase values can be directly processed at the input and output of the loop, and the phase
difference can be obtained by simple subtraction. A model corresponding to a second-order loop
is shown in Figure 17.4. It is a control loop with two coefficients, K 1 and K 2 , corresponding to the
proportional and integral control terms, respectively.

Figure 17.3 The principle of a phase-locked loop (phase detector → filter/amplifier → voltage-controlled oscillator, with input phase φe(t) and output phase φs(t)).

Figure 17.4 Model of a second-order loop, with proportional coefficient K₁, integral coefficient K₂, input phase φe(n) and output phase φs(n).

The voltage-controlled oscillator is represented by the integrator which provides the output phase
$\varphi_s(n)$. The transfer function between the output and the input can be written as:

$$H(Z) = \frac{\Phi_s}{\Phi_e} = \frac{K_1 Z^{-1} + (K_2 - K_1)Z^{-2}}{1 - (2 - K_1)Z^{-1} + (1 - K_1 + K_2)Z^{-2}} \qquad (17.4)$$
This is the transfer function of a low-pass filter with a value of 1 at zero frequency. The characteristics of the filter are determined by the two parameters $K_1$ and $K_2$. The region of stability is
examined by using the results given in Section 6.7 and setting $b_1 = K_1 - 2$ and $b_2 = 1 - K_1 + K_2$.
This gives $1 - K_1 + K_2 < 1$ and $|K_1 - 2| < 2 - K_1 + K_2$. In the plane of the coefficients $K_1$ and $K_2$, the
stability domain is a triangle.
The transfer function between the phase error and the input can be written as:

$$\frac{\Phi_e(Z) - \Phi_s(Z)}{\Phi_e(Z)} = \frac{(1 - Z^{-1})^2}{1 - (2 - K_1)Z^{-1} + (1 - K_1 + K_2)Z^{-2}} \qquad (17.5)$$

The presence of the term $(1 - Z^{-1})^2$ in the numerator shows that such a loop is capable of tracking
a linearly varying phase, that is, a frequency offset, with no steady-state error.
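
A minimal simulation makes the loop behavior concrete (a sketch; the recurrence below is one realization of the transfer function (17.4), and the gains K₁ and K₂ are arbitrary values taken inside the stability triangle):

```python
# Second-order digital phase-locked loop (model of Figure 17.4).
import numpy as np

K1, K2 = 0.3, 0.02                  # K2 < K1 and |K1 - 2| < 2 - K1 + K2: stable
dphi = 0.01                         # input phase ramp, i.e. a frequency offset
n_steps = 400

phi_e = dphi * np.arange(n_steps)   # input phase
phi_s, w = 0.0, 0.0                 # output phase and integral register
err = np.empty(n_steps)
for n in range(n_steps):
    e = phi_e[n] - phi_s            # phase detector: simple subtraction
    err[n] = e
    phi_s += K1 * e + w             # oscillator phase update (proportional path)
    w += K2 * e                     # integral path (frequency correction)

print(f"error at start {err[0]:.4f}, at end {err[-1]:.2e}")
```

The residual error tends to zero: the double zero $(1 - Z^{-1})^2$ of relation (17.5) lets the loop track the linear phase variation.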

17.3 Differential Coding of Speech

Differential coding of speech leads to reduced data rates, with no additional delay and low computational complexity.
The speech signal has a spectral density which, in the long term, decreases rapidly with frequency
above roughly 1 kHz. Under these conditions, considering the 8 kHz sampling frequency
which is standard in communications, significant prediction gains can be expected [3].
Figure 17.5 shows the principle of differential coding based on linear prediction. At transmission,
the prediction error e(n) is quantized by coder C, and, at the receiver, decoder D delivers the signal
which is fed to the inverse filter to retrieve the speech signal. The set e′(n) is the result of adding the
quantization error to e(n), and x′(n) is the set output by the decoder. The signal e(n) is expressed by:

$$e(n) = x(n) - \tilde{x}(n) = x(n) - \sum_{i=1}^{N} a_i\,x(n-i)$$

The prediction filter P has a transfer function P(Z) such that:

$$P(Z) = \sum_{i=1}^{N} a_i Z^{-i} \qquad (17.6)$$

The order N of the filter and the coefficients $a_i$ (1 ≤ i ≤ N) should be chosen to minimize the power
of the signal e(n). Under these conditions, for a given value of N, the coefficients are calculated as
indicated in Section 13.57, from the elements r(k) (0 ≤ k ≤ N) of the autocorrelation function of
x(n).

Figure 17.5 Principle of differential coding.

The following normalized values have been suggested for speech signals:

$$R(0) = 1; \quad R(1) = 0.8644; \quad R(2) = 0.5570; \quad R(3) = 0.2274$$
They show a strong correlation between neighboring samples. The corresponding coefficients
have values of:
a1 = 1.936; a2 = −1.553; a3 = 0.4972
The eigenvalues of the autocorrelation matrix $R_3$ are:

$$\lambda_1 = 2.532; \quad \lambda_2 = 0.443; \quad \lambda_3 = 0.025$$

and we have:

$$A_{opt}^t R_3 A_{opt} = 0.947$$

At the optimum, the minimum prediction error power is therefore $E_{min} = R(0) - A_{opt}^t R_3 A_{opt} = 0.053$, and the corresponding prediction gain, $G_p = R(0)/E_{min} \approx 19$, is close to 13 dB.
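
These figures are easy to reproduce by solving the normal equations numerically (a sketch using only the autocorrelation values given above):

```python
# Third-order linear prediction of speech from the suggested AC values.
import numpy as np

r = np.array([1.0, 0.8644, 0.5570, 0.2274])
R3 = np.array([[r[0], r[1], r[2]],
               [r[1], r[0], r[1]],
               [r[2], r[1], r[0]]])
a = np.linalg.solve(R3, r[1:])           # optimal predictor coefficients
print(a)                                 # approximately [1.936, -1.553, 0.497]

e_min = r[0] - a @ r[1:]                 # minimum prediction error power
print(f"E_min = {e_min:.3f}, gain = {10 * np.log10(r[0] / e_min):.1f} dB")
```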
Improvements can be made to the basic principle in order to achieve a high level of performance:
(1) The prediction is carried out using the sequence e′(n) transmitted after quantization, which
brings a reduction in quantization distortion power. Moreover, the transmitter and receiver
then operate on the same source of information, and nothing else has to be transmitted if
adaptive procedures are introduced.
(2) The quantizer is made adaptive by relating the quantization step to an evaluation of the signal
power, in order to take advantage of the fact that speech signals are nonstationary but can be
considered almost stationary over short periods of time (of the order of 10 ms).
(3) The prediction is made adaptive, to follow the short-term variations of the speech spectrum.
With an adaptive prediction filter of adequate order (for example, 10th-order FIR), the prediction
gain for speech can range from 6 dB for unvoiced sounds to 16 dB for voiced ones, with an overall
subjective value of about 13 dB.
These techniques are used in communication networks, and a system named adaptive differential
pulse code modulation (ADPCM) has been standardized by the ITU (International Telecommunication Union) under Recommendation G.721 [4].

17.4 Coding of Sound


Filter banks are the basis of digital sound compression, since they are capable of exploiting the
characteristics of the human ear, particularly the effect of masking. The algorithm represented
in Figure 17.6, standardized as ITU-T G.722, permits transmission of sound on the telephone
channel at 64 kbit/s. The audio signal has a bandwidth extending to 7 kHz; it is sampled at 16 kHz
and coded on 14 bits. A bank of two QMF filters yields two sub-bands, sampled at
8 kHz, which are then coded in ADPCM at rates of 48 and 16 kbit/s for the low and high bands,
respectively. A multiplexing operation, with possible insertion of data, provides a transmission
rate of 64 kbit/s.
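
The two-band structure can be sketched as follows (illustrative Haar-type half-band filters; G.722 itself uses longer QMF filters, but the filter-then-downsample mechanism is the same):

```python
# Two-band analysis: QMF pair followed by downsampling by 2.
import numpy as np

def qmf_analysis(x):
    h0 = np.array([0.5, 0.5])            # low-pass prototype
    h1 = np.array([0.5, -0.5])           # mirror (high-pass) filter
    low = np.convolve(x, h0)[1::2]       # filter, then keep one sample in two
    high = np.convolve(x, h1)[1::2]
    return low, high

x = np.sin(2 * np.pi * 0.02 * np.arange(64))   # illustrative input signal
low, high = qmf_analysis(x)
print(low.shape, high.shape)             # each sub-band at half the input rate
```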
Compression of high-quality sound for digital broadcasting or recording is defined by the
ISO/IEC 11172-3 standard [5, 6]. It is based on a bank of 32 filters with 512 coefficients of the
pseudo-QMF type. The signals thus obtained are quantized separately, with a number of bits for
each sub-band such that the quantization noise remains at a level below the masking threshold.
This threshold, illustrated in Figure 17.7, is defined for each sub-band on the basis of a DFT
analysis at 1024 points by applying psycho-acoustic results. The method enables a rate of 128 kbit/s
to be achieved for a high-quality monophonic sound channel (MPEG layer 3/MP3) [6].
Figure 17.6 Coding of an audio signal in two sub-bands: the 14-bit, 16 kHz audio signal is split by QMF filters into two sub-bands which are ADPCM-coded at 48 kbit/s (low band) and 16 kbit/s (high band), then multiplexed into a 64 kbit/s stream.

Figure 17.7 Example of a masking curve for a sound signal (band 0–16 kHz).

17.5 Echo Cancelation

In communication networks, echoes are created when delayed and attenuated replicas of the signal
transmitted by a local terminal toward a distant terminal reach the local receiver.
Specifically, electrical echoes are produced on the transmission lines in the form of reflected
signals due to impedance mismatching and imperfections in the hybrid transformers which per-
form two-wire to four-wire conversions. In the case of speech, these signals are reflected back to
the subscriber who is speaking, and they become problematic as the distance between the sub-
scribers increases. There are also echoes arising from acoustic coupling between the microphone
and loudspeaker in a telephone set, in which case an adaptive echo canceller can be used to provide
a hands-free set.
Echo canceling consists of modeling the echo path and subtracting the generated synthetic echo
from the real echo [7].
Two different cases can be distinguished, depending on the type of signals involved – namely
speech and data. To begin with the simpler case, data modems are considered.

17.5.1 Data Echo Canceller


The most efficient exploitation of two-wire lines is achieved when signals are transmitted simulta-
neously in both directions and in the same frequency band. The transmission is called full-duplex.
The principle is illustrated in Figure 17.8.

17.5.1.1 Two-wire Line


The signal $x_A(n)$ is transmitted from terminal A to terminal B through a two-wire line. At the input
of the receiver of terminal A, the signal y(n) has two components: the useful data signal $y_B(n)$ issued
from terminal B, and the reflected signal, the disturbing echo produced by $x_A(n)$ and designated
by $r_A(n)$. The function of the filter H(Z) is to generate a synthetic echo $\tilde{y}(n)$ as close to $r_A(n)$
as possible, so that, after subtraction, the error signal e(n) is close to $y_B(n)$ and the data
transmission from terminal A to terminal B is of satisfactory quality.
Figure 17.8 Principle of echo canceling: the received signal y(n) = y_B(n) + r_A(n) is compared with the output of the filter H(Z) driven by x_A(n), and the difference e(n) is delivered to the receiver.

The selection of adaptive filter parameters is driven by the context. The number N of coefficients is derived from the echo impulse response, taking account of the sampling frequency. The
filter must be adaptive because the transmission line characteristics may evolve over time. Regarding input signals, the context is favorable, because the filter input signal is the transmitted data
sequence $x_A(n)$, which is generally uncorrelated, of unit power, and has the autocorrelation matrix $R_N = I_N$. Then,
the performance of the gradient algorithm, LMS, is equivalent to that of the recursive least squares
algorithm (RLS). The adaptation step δ is bounded by 2/N and the time constant is τ = 1/δ. In the
learning phase, the mean output error power is deduced from (14.28) as:

$$E_r = \|H_{opt}\|_2^2\,(1 - \delta)^{2n} \qquad (17.7)$$

The squared L₂ norm of the echo coefficient vector, $\|H_{opt}\|_2^2$, reflects the power of the echo signal.
During bidirectional transmission, the useful signal $y_B(n)$ in the reference is smaller than the
echo $r_A(n)$, and the echo attenuation $A_e$ must satisfy the inequality:

$$A_e > A_s + \mathrm{SNR} \ \ (\mathrm{dB}) \qquad (17.8)$$

where $A_s$ is the echo-to-useful-signal ratio. For example, SNR = 40 dB and $A_s$ = 20 dB lead to
$A_e$ = 60 dB, which implies that the residual error after convergence must be very small.
In the adaptation process, the useful signal impacts the coefficients, and the consequence is an
increase of the output residual error. The filter coefficient variance after convergence is $\sigma_y^2\delta/2$, and
the residual error is N times larger, $N\sigma_y^2\delta/2$. The term $\sigma_y^2$ is the power of the useful signal, that is,
the received data. The SNR objective can be reached if the following inequality is satisfied:

$$N\,\frac{\delta}{2} < \frac{1}{\mathrm{SNR}} \qquad (17.9)$$

In the above derivations, the output error power is assumed to be close to the useful signal power.
As an illustration, take SNR = 10⁴ (40 dB) and N = 60; then δ < 3.3 × 10⁻⁶, which is a very small value
and makes the learning phase very long. The impact on the coefficient accuracy is worth pointing
out. Referring to Section 14.5, a simplified expression for the number of bits of the coefficients is
obtained as:

$$b_c = \log_2\left(\frac{1}{\delta}\right) + \frac{1}{2}\log_2(A_e) \qquad (17.10)$$

With the above figures, $b_c$ = 29. In practice, there is no need to perform multiplications with such
a high degree of accuracy; it is needed only in the coefficient updating operations.
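
The behavior described above can be reproduced with a minimal LMS canceller (a sketch; the echo path, signal levels and step size are illustrative, not taken from a standard):

```python
# LMS data echo canceller: identify the echo path and subtract its output.
import numpy as np

rng = np.random.default_rng(0)
N = 60
h_echo = 0.1 * 0.8 ** np.arange(N) * rng.standard_normal(N)  # hypothetical echo path

n_iter = 20000
x = np.sign(rng.standard_normal(n_iter))     # transmitted data, unit power
y_b = 0.01 * rng.standard_normal(n_iter)     # far-end useful signal
delta = 0.01                                 # adaptation step, below 2/N

h, xbuf = np.zeros(N), np.zeros(N)
e = np.empty(n_iter)
for n in range(n_iter):
    xbuf = np.roll(xbuf, 1); xbuf[0] = x[n]
    y = h_echo @ xbuf + y_b[n]               # received signal: echo plus data
    e[n] = y - h @ xbuf                      # output after echo subtraction
    h += delta * e[n] * xbuf                 # LMS coefficient update
print(f"residual power after convergence: {np.mean(e[-2000:]**2):.2e}")
```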
Figure 17.9 Double speech detection: level detectors on the received signal r(n) and on the error e(n) after subtraction of the synthetic echo ỹ(n) feed a yes/no double-speech decision.

17.5.2 Acoustic Echo Canceler


Acoustic echo cancelation leads to very long filters. Taking v = 330 m/s as the propagation speed
of acoustic waves in the air, sampling frequency f s = 8 kHz, and distance 2D = 100 m, we obtain
N = 2D f s /v = 2400. This case is encountered in audioconference rooms.
An additional adverse issue in speech echo cancelation is double speech, produced when the
speech of the local subscriber is superimposed on the echo in the reference signal. For the echo
canceller, this disturbing signal causes a drift of the coefficients which must be stopped as rapidly
as possible: adaptation is therefore frozen as soon as detection has occurred. Detection can simply
take place by comparing the levels of the received signal r(n) and the signal e(n) after the synthetic
echo has been subtracted. The scheme is represented in Figure 17.9.
Figure 17.9.
In the absence of a distant speech signal in r(n), provided cancelation works properly, levels are
very different. In contrast, during double speech, levels come close, and the information can be
exploited to determine and stop the coefficient change.
Note that the parameters of level detection and decision must be chosen with care, to avoid false
decisions and excessive delays. Level detection may be based on power or amplitude estimations.
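
Such a level comparison can be sketched in a few lines (the smoothing constant and the decision ratio are illustrative assumptions):

```python
# Double-speech detection by comparing smoothed powers of r(n) and e(n).
import numpy as np

def smoothed_power(v, alpha=0.99):
    p, out = 0.0, np.empty(len(v))
    for i, s in enumerate(v):
        p = alpha * p + (1 - alpha) * s * s   # recursive power estimate
        out[i] = p
    return out

def double_speech(r, e, ratio=0.25):
    # freeze adaptation wherever the cancelled-signal power comes close
    # to the received-signal power
    return smoothed_power(e) > ratio * smoothed_power(r)
```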

17.6 Television Image Processing

The transmission of image signals on the communication network requires very high bit rates.
Therefore, bit rate reduction techniques, based on digital processing, are crucial – particularly for
television signals.
Overall, a television image is a function of four variables s(x, y, t, λ): two spatial variables x and y,
time t, and wavelength λ. For transmission purposes, this signal is transformed into a
one-dimensional signal.
The wavelength variable can be dropped by considering that the human visual system basically
consists of three types of receivers, which perform filtering and produce three signals associated
with the primary colors – namely red, green, and blue (R, G, B).
The television scanning process converts these three-dimensional signals into a one-dimensional
signal. The images are scanned 25 times per second, with 625 lines per image. In fact, the odd-
and even-numbered lines form two consecutive frames which are multiplexed in time, hence the
recurrence of 50 interleaved frames per second.
For transmission, the primary components R, G, and B are replaced by linear combinations called
luminance Y and color differences, or chrominance, U and V:

$$Y = 0.30R + 0.59G + 0.11B$$
$$U = R - Y = 0.70R - 0.59G - 0.11B$$
$$V = B - Y = -0.30R - 0.59G + 0.89B$$
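
In matrix form, this change of components is a fixed 3 × 3 linear transformation (a sketch using the coefficients above):

```python
# Luminance/chrominance change of basis for (R, G, B) samples.
import numpy as np

M = np.array([[ 0.30,  0.59,  0.11],    # Y
              [ 0.70, -0.59, -0.11],    # U = R - Y
              [-0.30, -0.59,  0.89]])   # V = B - Y

def rgb_to_yuv(rgb):
    return rgb @ M.T                    # rgb: any array of shape (..., 3)

print(rgb_to_yuv(np.array([1.0, 1.0, 1.0])))   # white: Y = 1, U = V = 0
```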
Digitization is performed with a frequency of 13.5 MHz for the luminance and 6.75 MHz for
the chrominance signals. Since the analog-to-digital conversion is 8 bits, the corresponding data
rate rises to 216 Mbit/s. This format corresponds to Recommendation CCIR 601 of the ITU and is
described as Type 422. It leads to images presented in the form of tables of 8-bit numbers containing
720 points per line and 576 useful lines, in the case of 625-line scanning. Thus, one image corre-
sponds to 414 720 bytes for the luminance and 207 360 bytes for each chrominance component.
Bit rate reduction techniques rely on the fact that the image signal is well modeled by the output of a
first-order IIR filter to which Gaussian white noise is applied. The corresponding two-dimensional
autocorrelation function can be written as:

$$V(x, y) = r_0\,e^{-(\alpha x + \beta y)}$$

where α and β are positive constants. For the associated spectrum, this gives:

$$S(\omega_1, \omega_2) = r_0\,\frac{4\alpha\beta}{\left(\alpha^2 + \omega_1^2\right)\left(\beta^2 + \omega_2^2\right)} \qquad (17.11)$$
The greatest compression in the representation of a signal is obtained with a transformation based
on the eigenvectors of the autocorrelation matrix. In the case of first-order signals, this transformation is well approximated by the discrete cosine or sine transformation, presented in Sections 3.3.3
and 3.3.4. In the image compression standards, it is the DCT applied to blocks of 8 × 8 picture element points, or pixels, which has been retained. The standards formulated for videophones, image
storage, and digital television use the following three techniques [8]:
(1) Motion estimation, in order to minimize the difference between the current image
and the preceding one.
(2) The discrete cosine transform, to minimize spatial redundancy (sketched below).
(3) Variable-length statistical coding (VLC).
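
Technique (2) reduces to a separable 8 × 8 transform applied to each block (a sketch; the orthonormal DCT-II matrix is built directly from its definition):

```python
# 2-D DCT of an 8x8 pixel block as a separable transform.
import numpy as np

N = 8
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0, :] /= np.sqrt(2.0)                # first row scaled for orthonormality

def dct2(block):
    return C @ block @ C.T             # rows, then columns

def idct2(coeffs):
    return C.T @ coeffs @ C

block = np.outer(np.arange(N), np.ones(N))     # smooth test block
coeffs = dct2(block)
assert np.allclose(idct2(coeffs), block)       # exact reconstruction
# for such a smooth block, the energy concentrates in a few coefficients
```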
The general structure of an image encoder is shown in Figure 17.10. The quantizer Q operates
with thresholds which can be set by a control device, allowing a constant bit rate to be obtained
with the help of a buffer memory. For commercial-quality television, the bit rate can be reduced to
around 4 Mbit/s, which represents a compression factor of the order of 50 [9].
Digital filters are used for interpolation and subsampling operations during changes in image
format or movement estimation. These are separable filters. Digital compression of multimedia
signals, speech, image, and sound allows for considerable reduction of the bit rates required for
broadcasting programs, and in combination with the techniques of digital transmission, it offers
the possibility of high spectral efficiency by transmitting several programs in the channels formerly
used for a single analog program. The resulting saving may be quite significant, as is the case in
satellite television broadcasting.
Figure 17.10 General structure of a moving image encoder: source images go through motion estimation, DCT, quantization Q and variable-length coding VLC; the inverse quantizer QI, the inverse DCT and an image memory provide the motion-compensated prediction (fixed, intra or inter modes), and a multiplexer followed by a buffer with rate regulation delivers the bitstream.

Techniques with high spectral efficiency make intensive use of digital processing and they make
the best use of the characteristics of the channels. In this way, multicarrier techniques can lead
to capacities of several bit/s per hertz on channels which are of limited quality or susceptible to
interference.

17.7 Multicarrier Transmission – OFDM

The objective of multicarrier transmission techniques is to approach the theoretical capacity of
a channel, firstly by limiting the effect of distortion and secondly by adjusting the data rate in
accordance with the spectral density of the noise. In fact, by dividing a given channel into tens,
hundreds, or thousands of subchannels, the effect of distortion on each subchannel is made negligible, and each subchannel can be allocated the data rate it is able to support. One simple and
effective approach to implementing this procedure involves the fast Fourier transform; known as
OFDM (orthogonal frequency-division multiplexing), its principles are illustrated in Figure 17.11.
The stream of data to be transmitted is converted into N elementary streams, each N times smaller,
which are applied to the input of an inverse DFT computer. In accordance with the definition of
the inverse DFT given in Chapter 2, this operation corresponds to a modulation of N carriers by
the elementary streams, at frequency multiples of f s /N, and the addition of the set of signals mod-
ulated in this way. Thus, the rate of OFDM symbols is f s /N. On reception, after passing through the
channel, a direct DFT performs the set of demodulations and reconstructs the original data, which
merely needs to be serialized to recover the initial stream.
This simple principle is a direct illustration of the definition of the DFT and its inverse. However,
to operate correctly, it requires several precautions and adaptations [10].
Figure 17.11 The principle of OFDM transmission: serial-to-parallel conversion, inverse FFT, channel, direct FFT, parallel-to-serial conversion.

Referring to Section 2.4, and in particular to Figure 2.8, notice that orthogonality of the signals is
valid only for frequencies which are at the center of the interval of length $f_s/2N$ allocated to each
subchannel; and notice how the subchannels have an area of overlap and that the amplitude of
overlap reduces with increasing frequency difference. On the edges of the transmission channel,
the frequency responses of the subchannels are not symmetric, which can lead to interference. It
is therefore necessary to avoid using extreme subchannels and to provide a margin of at least a few
subchannels on each side of the chosen frequency band.
In the time domain, a practical transmission channel has an impulse response of duration τ. To
avoid superposition of two consecutive OFDM symbols on reception, the symbols must be separated by sufficient time, which means it is necessary to introduce a guard interval $T_g > \tau$. During this
guard interval, it is necessary to prolong the OFDM symbol, to introduce the circular convolution
mentioned in Section 2.1, and hence to avoid interference between the subchannels. In practice,
the receiver operation is facilitated by the end of the symbol being reproduced at its start, during
the guard interval $T_g$ (Figure 17.12).
With this device, the received signals are simply multiplied by the DFT of the channel, an effect
which can be compensated for by an equalization in amplitude and phase in each subchannel. To
show this, the Z-transfer function of the channel, which contains $P \le N_g$ coefficients, is defined as
C(Z):

$$C(Z) = \sum_{p=0}^{P} C_p Z^{-p} \qquad (17.12)$$

If x(n) is the transmitted signal, the received signal y(n) can be written:

$$y(n) = \sum_{p=0}^{P} C_p\,x(n-p)$$

Figure 17.12 Introducing a guard interval and the correlation function: each symbol of length N is preceded by a guard interval of N_g samples reproducing its end, and the correlation function r(n) exhibits peaks at the symbol boundaries.



As x(n) is expressed in terms of the data $d_k$ by:

$$x(n) = \sum_{k=0}^{N-1} d_k\,e^{j(2\pi/N)kn} \qquad (17.13)$$

the following double summation is obtained for y(n):

$$y(n) = \sum_{p=0}^{P}\sum_{k=0}^{N-1} C_p\,d_k\,e^{j(2\pi/N)k(n-p)} \qquad (17.14)$$

Setting:

$$H_k = \sum_{p=0}^{P} C_p\,e^{-j(2\pi/N)kp}$$

finally gives:

$$y(n) = \sum_{k=0}^{N-1} (d_k H_k)\,e^{j(2\pi/N)kn} \qquad (17.15)$$

The circular convolution property of the DFT is again found, and the receiver provides the trans-
mitted data multiplied by the channel spectrum H k .
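
This chain (inverse DFT, cyclic prefix, channel, direct DFT) can be checked numerically (a sketch; N, N_g and the channel coefficients are illustrative):

```python
# Cyclic prefix turns the channel into one complex gain per subcarrier.
import numpy as np

rng = np.random.default_rng(1)
N, Ng = 64, 8
c = np.array([1.0, 0.5, 0.2])                  # hypothetical channel, P < Ng

d = rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)   # data symbols
x = np.fft.ifft(d) * N                         # modulation, as in (17.13)
x_cp = np.concatenate([x[-Ng:], x])            # end of symbol copied in front

y = np.convolve(x_cp, c)[Ng:Ng + N]            # channel, then guard removal
received = np.fft.fft(y) / N                   # demodulation

Hk = np.fft.fft(np.concatenate([c, np.zeros(N - len(c))]))
assert np.allclose(received, d * Hk)           # relation (17.15)
print(np.allclose(received / Hk, d))           # one-tap equalization recovers d
```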
The redundancy of the transmitted signals can be exploited by the receiver for synchronization.
In fact, by calculating the following correlation function:

$$r(n) = \sum_{i=n-N_g+1}^{n} y(i)\,y^*(i-N) \qquad (17.16)$$

peaks appear, as shown in Figure 17.12, which characterize the start of each symbol, allow the
temporal analysis window of the receiver to be adjusted, and contribute to the synchronization of
the clocks.
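
Relation (17.16) is a sliding sum and translates directly into code (a sketch; the peak search below illustrates symbol-start detection):

```python
# Guard-interval correlation for OFDM symbol synchronization.
import numpy as np

def cp_correlation(y, N, Ng):
    prod = y[N:] * np.conj(y[:-N])                       # y(i) y*(i - N)
    return np.convolve(prod, np.ones(Ng), mode="valid")  # sum over Ng terms

rng = np.random.default_rng(2)
N, Ng = 64, 8
sym = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = np.concatenate([sym[-Ng:], sym, sym[-Ng:], sym])     # two prefixed symbols
r = np.abs(cp_correlation(y, N, Ng))
print(np.argmax(r))    # |r(n)| peaks where the prefix matches the symbol end
```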
Synchronization in time and frequency is a major problem in systems with a large number of
carriers, and special reference symbols are introduced or some subchannels are reserved for fixed
signals called pilots.
Figure 17.13 shows the block diagram of a digital television receiver for terrestrial broadcasts
[11]. The analog interfaces carry the signal in the band 0.76–8.37 MHz and the analog-to-digital
conversion is performed at f s = 18.28 MHz. Then a real-to-complex conversion using a quadrature
filter is performed and the sampling frequency is reduced to 9.14 MHz. Before calculation of the
FFT at N = 8192 points, a complex multiplier performs frequency adjustment on the spectrum
of the signal. Temporal synchronization controls the positioning of the window of the FFT. The
transmitted signal contains 6817 active carriers, and 177 of them are dedicated to pilot signals which
allow for exact synchronization of the receiver, an estimate of the frequency response of the channel
for equalization, and a measure of the distortion in each subchannel. The guard interval can reach
20% of the symbol duration. This system must facilitate transmission rates up to 32 Mbit/s in a
channel with 8 MHz spacing, or 4 bit/s/Hz.
Alternative multicarrier transmission techniques require filter banks as described in Chapter 12.
At the cost of an increase in computational complexity, they allow the cyclic prefix to be avoided,
and deliver a high level of out-of-band signal rejection, which may be critical for coexistence in
networks [12].
Figure 17.13 Receiver for terrestrial digital television: analog interface, A/D conversion, I/Q demodulation, frequency synchronization, FFT with N = 8192 under time-synchronization control, pilot extraction, subchannel equalization and decoding of the data d(n).

17.8 Mobile Radiocommunications

The basic mobile radio channel, called the Rayleigh channel, corresponds to non-line-of-sight (NLOS)
transmission between transmitter and receiver. It is represented by the sum of a large set of independent
paths of equivalent propagation delays, with the addition of Gaussian white noise. As a result, useful signals can be viewed as being multiplied by a complex coefficient whose real and imaginary
parts are independent, centered, Gaussian random variables.
In practice, transmission may happen with multiple reflections and diffractions, with differing
propagation delays. Then, the channel is characterized by multipaths with fading, and it is said
to be doubly dispersive, because it exhibits dispersions in both the time and frequency domains.
These obstacles can be overcome by the multicarrier technique OFDM, which is able to exploit the
channel with high bit rates [13].
Processing design and performance evaluations rely on channel models which have been defined
for three multipath configurations:

Extended pedestrian A (EPA)

Delay (ns) 0 30 70 90 110 190 410

Power (dB) 0 −1 −2 −3 −8 −17.2 −20.8

Extended vehicular A (EVA)

Delay (ns) 0 30 150 310 370 710 1090 1730 2510

Power (dB) 0 −1.5 −1.4 −3.6 −0.6 −9.1 −7.0 −12.0 −16.9

Extended typical urban (ETU)

Delay (ns) 0 50 120 200 230 500 1600 2300 5000

Power (dB) −1 −1 −1 0 0 0 −3 −5 −7

For each incident wave, and for each of the multipath channels, mobility entails a frequency shift
due to the Doppler effect given by:

$$f = f_0\,\frac{v}{c}\cos(\theta) \qquad (17.17)$$

where v is the speed of the mobile, c is the speed of light, $f_0$ is the carrier frequency, and θ is the
angle of incidence. As a consequence, the signal associated with a given path undergoes a slow
evolution, whose Fourier transform, called the Doppler spectrum, is generally represented by the
following theoretical formula:

$$S(f) = \frac{1}{\left(1 - \left(\frac{f}{f_D}\right)^2\right)^{0.5}}; \qquad -f_D < f < f_D \qquad (17.18)$$

where $f_D$ represents the maximum frequency.


Two approaches can be contemplated to model this Doppler spectrum [14]:

– Purely recursive second-order section. The approximation applies whenever $f_D$ is much smaller
than the frequency step, that is, when the product $f_D T$ is small (T: symbol period). The complex
coefficients of the path are determined by:

$$C(n) = b_1 C(n-1) - b_2 C(n-2) + u(n); \quad b_1 = 2 r_d \cos\left(2\pi\frac{f_D T}{\sqrt{2}}\right); \quad b_2 = r_d^2 \qquad (17.19)$$

The input u(n) is zero-mean Gaussian white noise. The pole is very close to the unit circle, for
example $r_d = 0.999 - 0.1 \times 2\pi f_D T$.

– Sum of N cissoids uniformly spread on the unit circle with random initial phases θ(k). The cissoid
with index k undergoes the Doppler shift $\Delta f(k) = f_D\cos\left(\frac{2\pi k}{N}\right)$ and the coefficient of the path is
expressed by:

$$C(n) = \frac{1}{\sqrt{N}}\sum_{k=1}^{N} e^{j(2\pi\Delta f(k)\,n + \theta(k))} \qquad (17.20)$$

Next, the amplitudes defined in the tables of the multipath models are multiplied by C(n) to get
the coefficients of the radio channel.
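
The sum-of-cissoids model (17.20) translates directly into a fading-path generator (a sketch; the number of cissoids and the normalized Doppler frequency f_D T are illustrative):

```python
# Path coefficient C(n) with Doppler spreading, following relation (17.20).
import numpy as np

def doppler_path(n_samples, fD_T, n_cissoids=16, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    k = np.arange(1, n_cissoids + 1)
    df = fD_T * np.cos(2 * np.pi * k / n_cissoids)       # Doppler shifts
    theta = rng.uniform(0, 2 * np.pi, n_cissoids)        # random phases
    n = np.arange(n_samples)[:, None]
    return np.exp(1j * (2 * np.pi * df * n + theta)).sum(axis=1) / np.sqrt(n_cissoids)

c = doppler_path(10000, fD_T=370 / 15000)    # e.g. 370 Hz at 15 kHz symbol rate
print(np.mean(np.abs(c) ** 2))               # unit average path power
```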
The mobile radio channel so defined entails a set of constraints for OFDM. However, thanks to its
richness in multipaths, it offers noteworthy potential gains in capacity and flexibility. In particular,
the MIMO technique described in Section 13.9 allows for multiuser transmission. The principle is
shown in Figure 17.14 for four antennas at the transmitter, two users and two antennas per user at
the receiver.
The principle consists of decomposing the radio channel into signal and null spaces, so that
each user transmits their data in the other user's null space. Accordingly, the 2 × 4 channel matrix
between the transmitter and a user, $H_1$, is decomposed into singular values to yield the matrices
U, S, and V, as in Figure 17.14: V is the modulation matrix in the transmitter, S is the singular value
matrix, and $U^t$ is the demodulation matrix.

Figure 17.14 Decomposition of the channel matrix of user 1: $H_1 = U \begin{bmatrix} S_1 & 0 & 0 & 0 \\ 0 & S_2 & 0 & 0 \end{bmatrix} [V_1\;V_2\;V_3\;V_4]^t$.
The channel matrix decomposition is written as: H 1 = U S V t . Vectors V 1 and V 2 define the
signal space of user 1, while V 3 and V 4 define the null space. If user 1 is alone, the data couples
to be transmitted are applied to the 4 × 2 matrix [V 1 V 2 ], whose outputs are connected to four
antennas through four OFDM interfaces. Now, if user 2 is active, it must use the null space and its
transmission matrix becomes H 2 [V 3 V 4 ]. This 2 × 2 matrix must be decomposed to get the modu-
lation matrix applied to the data couples of user 2. Then, the same procedure is enforced on user 1,
starting from the decomposition of H2 and performing the transmission in the corresponding null
space. Finally, with the same set of antennas, the radio base station is able to target a set of different
users [15].
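
The null-space principle can be sketched with a singular value decomposition (random 2 × 4 matrices stand in for the channels H₁ and H₂; only the structure is shown, not a full transceiver):

```python
# Block-diagonalization sketch: user 2 transmits in the null space of user 1.
import numpy as np

rng = np.random.default_rng(3)
H1 = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))
H2 = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))

_, _, V1h = np.linalg.svd(H1)          # rows 0-1 of V1h: signal space of user 1
null1 = V1h.conj().T[:, 2:]            # 4x2 basis of the null space of H1

print(np.linalg.norm(H1 @ null1))      # ~0: user 1 sees no interference
eff2 = H2 @ null1                      # 2x2 effective channel of user 2,
# which is decomposed in turn to obtain the modulation matrix of user 2
```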
Multiantenna systems, with four and eight antennas, are included in the basic mobile radio
standards.
The constraints imposed by the channel on OFDM are as follows, in particular:

– The subcarrier spacing must account for the Doppler frequency of the fastest mobile.
– Transmission by blocks of a limited number of multicarrier symbols.
– Estimation of the channel through preamble and/or scattered pilot data.
– Some measures to counter fading and noise.

A typical configuration is as follows:

– Bandwidth: 10 MHz.
– Sampling frequency: 15.36 MHz.
– FFT size: 1024; cyclic prefix: 72.
– Subcarrier spacing: 15 kHz.
– Number of symbols per block: 14.

Assuming the carrier frequency is 4 GHz and mobile speed is 100 km/h, the maximum Doppler
frequency is 370 Hz, and it has to be compensated for with every transmitted symbol.
Leveraging signal processing techniques and making use of current computation capabilities,
the new generations of wireless and mobile radio systems are able to reach communication quality
levels and bit rates that come close to those of hardwired systems and they contribute significantly
to developing the flexibility of digital networks.

References

1 U. Mengali and A.N. d'Andrea, Synchronization Techniques for Digital Receivers, Plenum Press, 1997.
2 H. Meyr, M. Moeneclaey and S.A. Fechtel, Digital Communication Receivers: Synchronization, Channel Estimation and Signal Processing, Wiley, Chichester, 1998.
3 J.D. Gibson, Adaptive prediction in speech differential encoding systems, Proceedings of the IEEE, 68, 1980.
4 CCITT, Digital Networks, Transmission Systems and Multiplexing Equipment, Book III.3, Geneva, 1985.
5 ISO-IEC 13818, Information Technology, Coding of Moving Pictures and Audio, Geneva, 1996.
6 ISO-IEC 14496, Information Technology, Generic Coding of Audio-Visual Objects, Geneva, 1998.
7 C. Breining et al., Acoustic echo control: application of very-high-order adaptive filters, IEEE Signal Processing Magazine, 16(4), 42-69, 1999.
8 J. Chen, U. Koc and K.J. Ray Liu, Design of Digital Video Coding Systems, Marcel Dekker Inc., New York, 2002.
9 G. Sullivan and T. Wiegand, Video compression: from concepts to the H.264/AVC standard, Proceedings of the IEEE, 93(1), 18-31, 2005.
10 T. de Couasnon, R. Monnier and J.B. Rault, OFDM for digital TV broadcasting, Signal Processing, 39, 1-32, 1994.
11 U. Ladebusch and C.A. Liss, Terrestrial DVB: a broadcast technology for stationary, portable and mobile use, Proceedings of the IEEE, 94(1), 183-193, 2006.
12 M. Renfors, X. Mestre, E. Kofidis and F. Bader, Orthogonal Waveforms and Filter Banks for Future Communication Systems, Academic Press, Cambridge, MA, USA, 2017.
13 3rd Generation Partnership Project, Technical Specification Group Radio Access Network, Evolved Universal Terrestrial Radio Access (E-UTRA), Base Station (BS) Radio Transmission and Reception (Release 15), 3GPP TS 36.104 v15.4.0, 2018.
14 F. Pérez Fontán and P. Marino Espineira, Modeling the Wireless Propagation Channel, Wiley, Chichester, 2008.
15 E. Dahlman, S. Parkvall and J. Skold, 5G NR: The Next Generation Wireless Access Technology, Academic Press, Cambridge, MA, USA, 2018.

Exercises: Solutions and Hints

Chapter 1

1.1  $I_L(t) = \frac{1}{2} + \frac{2}{\pi}\sum_{p=1}^{4}\frac{(-1)^{p+1}}{2p-1}\cos\left(2\pi(2p-1)\frac{t}{T}\right)$

1.2  s(nT) = sin(nπ + φ) = (−1)ⁿ sin φ. The possibility of reconstruction depends on φ (φ = π/2: yes; φ = 0: no).

1.3  $H\left(\frac{f_s}{2}\right) = \frac{2\sqrt{2}}{\pi}$ (0.92 dB).

1.4  f₂ < f_s < 2f₁.

1.5  $s(nT) = s_r(nT) + j s_i(nT) = e^{j(\pi/2)n}\,\frac{\sin(3\pi n/8)}{\sin(\pi n/8)}$.

1.6  Maximum value of s(n) = 8; s(n) = 0 for $\varphi_k = -2\pi\frac{k}{8}n + k\pi$.

1.7  f_s = 2 MHz; Δf = 1 kHz.

1.8  $p(s) = \frac{1}{\pi\sqrt{A^2 - s^2}}$; $r(\tau) = 2(f_2 - f_1)\,\frac{\sin\pi(f_2 - f_1)\tau}{\pi(f_2 - f_1)\tau}\cos\pi(f_2 + f_1)\tau$.

1.9  Periodic part, Fourier coefficient: $C_n = p\,\frac{\sin(n\pi/2)}{\pi n}$. Non-periodic part, spectrum: $S_2(f) = p(1-p)T\,\frac{1 - \cos\pi f T}{\pi^2 f^2 T^2}$.

1.10  Signal-to-noise ratio in the band 300–500 Hz: 75 dB (f_s = 16 kHz; gain 3 dB).

1.11  Quantizing distortion: line at $\frac{3}{8}f_s$; power: 0.0195².

1.12  If the characteristic is centered: a₁ = 0 for 0 ≤ |α| ≤ 1/2; $a_1 = \frac{4q}{\pi}\sqrt{1 - \frac{1}{4\alpha^2}}$ for 1/2 ≤ |α| ≤ 1. Centering at q/2: $a_1 = \frac{2q}{\pi}$ for 0 ≤ |α| ≤ 1.

1.13  Without clipping (peak factor): 10 bits; with clipping at 1%: 9 bits.

1.14  Linear coding: (S/N)max = 50 dB; with nonlinear coding, it varies from 35 to 38 dB when the signal varies from −36 dB to 0 dB.

1.15  Optimal values: x₀ = 0; x₁ = 0.9816; y₁ = 0.4528; y₂ = 1.510.

Chapter 2

2.1  The DFT of the second set is related to the DFT of the first by X′(k) = e^{−jk(π/4)} X(k).

2.2  Real multiplications: M_R = 28; real additions: A_R = 84.

2.3  The small differences come from aliasing and decrease when N increases.

2.4  Number of complex multiplications: 160, 96, 72. Additions: 384.

2.5  Maximum noise power at any output: 28 q²/12. With quantization of the coefficients at eight bits: |ε(i, k)| ≤ 0.003.

2.6  Recursion: X₀ = x(N − 1); X_m = x(N − 1 − m) + W X_{m−1} for 1 ≤ m ≤ N − 1. N − 1 complex multiplications are needed.

2.7  Total roundoff noise: N q²/12 + N q²; signal-to-noise ratio degradation: ΔSNR = 11.5 dB (input noise q²/12).

2.8  Recording: 20 000 samples; memory 160 kbits; cycle time per multiplication: 1 μs.

2.9  Cosine, Hamming, and Blackman windows attenuate the secondary lobes but do not allow for the detection of weak components.

Chapter 3

3.1  It is sufficient to verify that the products I₃ × A and A × I₃ are different.

3.2  It is sufficient to verify the relations (3.3), (3.4), (3.5), and (3.6).

3.3  Number of real multiplications in radices 2, 4, and 8: 384, 284, and 246.

3.4  The order-12 DFT requires 20 complex multiplications.

3.5  Use relations (3.18) and (3.21) to get the two factorizations.

3.6  The complex DFT of order eight leads to 24 real multiplications. The odd transform leads to 26.

3.7  With that approach, the operations in Δ₁₂ vanish, which reduces the number of complex multiplications to 16.

3.8  Matrices of the transformation:

$$T = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & 16 & 13 \\ 1 & 16 & 1 & 16 \\ 1 & 13 & 16 & 4 \end{bmatrix}; \qquad T^{-1} = 13\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 13 & 16 & 4 \\ 1 & 16 & 1 & 16 \\ 1 & 4 & 16 & 13 \end{bmatrix}$$

Chapter 4

4.1  Response to the sequence aⁿ: $y(n) = a^{n-3}\,\frac{1 - a^{9-n}}{1 - a}$ for 5 ≤ n ≤ 8.

4.2  Take the derivative, perform a series expansion, and integrate:

$$\log(z - a) = \log z - \sum_{n=1}^{\infty}\frac{1}{n}\,a^n z^{-n}; \qquad \frac{Z^{-1}}{(1 - aZ^{-1})(1 - bZ^{-1})} = \frac{1}{a - b}\sum_{n=0}^{\infty}\left(a^n - b^n\right)Z^{-n}$$

with |a| < 1; |b| < 1.

4.3  $H(Z) = \frac{1}{(1 - re^{j\theta}Z^{-1})(1 - re^{-j\theta}Z^{-1})}$.

4.4  Output power: 21; H(ω) = 4.41 − 1.536 cos ω + 0.46 cos 2ω.

4.5  System response:

$$y(n) = \frac{e^{jn\omega}}{1 - e^{-j\omega} + 0.8e^{-2j\omega}} + r^{n-1}\left[(a - 0.8b)\,\frac{\sin(n+1)\theta}{\sin\theta} - 0.8a\,\frac{\sin(n\theta)}{\sin\theta}\right]$$

Chapter 5

5.1  The response is zero for f = 0.288; 0.347; 0.408; 0.469; maximal ripple: 0.08. Zeros of H(Z): 0.606; 1.651; 0.4292 ± j0.464; 1.073 ± j1.161.

5.2  Coefficients: −0.012; 0; 0.042; 0; −0.093; 0; 0.314; 0.5. Zeros of H(Z): 0.4816; 2.076; 0.3764 ± j0.368; 1.3583 ± j1.328; maximal ripple: 0.03.

5.3  δ = 0.017, which is less than the above values.

5.4  With the sampling frequency f_s/2 at the output, the numbers of memories and multiplications are divided by 2, through interleaving (see Section 10.5).

5.5  In the complex plane, H(Z) is rotated by π and ±π/2, which yields a high-pass filter and a low-pass filter.

5.6  Coefficients: N = 27; computation accuracy: b_c = 12 bits; b_i = 20 bits.

Chapter 6

6.1  Follow the derivation in Section 6.1. The difference between filter delay and group delay illustrates the nonlinearity of the phase response.

6.2  Unit step response: $y(n) = \frac{1}{1.8}\left(1 - (-0.8)^{n+1}\right) + (-0.8)^{n+1}\,y(-1)$.

6.3  Poles: P = 0.78 ± j0.438. The zeros do not add multiplications to the circuit.

6.4  H_m = 85; cos ω₀ = 0.808; ||H||₂ = 8.53. With zeros at 3f_s/8: ||H||₂ = 25.8. Limit-cycle frequency close to f_s/10; amplitude ≃ 42q. Significant amplitude oscillations are possible because (6.56) is not verified.

6.5  b_c ≥ 13 bits; infinite attenuation point displacement: Δf_i ≤ 2.2 × 10⁻⁵ f_s.

6.6  $H(Z) = \frac{0.796 - 1.42Z^{-1} + Z^{-2}}{1 - 1.42Z^{-1} + 0.796Z^{-2}}$; τ_g(ω) calculated by (6.45). Realization possible with three multiplications.

Chapter 7

7.1  First-order section:

$$|H(\omega)|^2 = \frac{1.49 + 1.4\cos\omega}{1.81 - 1.8\cos\omega}; \quad \varphi(\omega) = \tan^{-1}\frac{1.6\sin\omega}{0.37 - 0.2\cos\omega}; \quad \tau(\omega) = \frac{1.6(0.37\cos\omega - 0.2)}{(0.37 - 0.2\cos\omega)^2 + 2.56\sin^2\omega}$$

7.2  Characteristic frequencies: 0.162; 0.231; 0.538; 0.736.

7.3  Transfer function: $H(Z) = 0.094\,\frac{(1 + Z^{-1})^4}{(1 + 0.039Z^{-2})(1 + 0.447Z^{-2})}$.

7.4  Transformation: $Z^{-1} \to \frac{Z^{-1} - \alpha}{1 - \alpha Z^{-1}}$; $\alpha = \frac{\sin\pi(f_1 - f_1')}{\sin\pi(f_1 + f_1')}$ with f₁ = 0.1725; f₁′ = 0.1; α = 0.3.

7.5  Scale factors: a₀₀ = 2⁻⁶; a₁₀ = 2²; a₂₀ = 2. For H(j) = 1, we get a₀ = 0.515 and a₃₀ = 4.12; accuracy: b_i = 16 bits.

7.6  Coefficient wordlength: b_c ≃ 12 bits. The optimum is obtained through a systematic search about the rounded values. The critical pole 0.9235 ± j0.189 means it is not possible to reduce b_c to 11 bits.

7.7  The filter in Section 7.2.3 can have limit cycles of amplitude less than 3q and frequency near f_s/5.

7.8  The IIR filter requires 7 multiplications and 4 memories, while the FIR counterpart requires 8 multiplications and 16 memories.

7.9  Transfer function:

$$H(Z) = 0.0625\,\frac{1 - 1.165Z^{-1} + Z^{-2}}{1 - 1.404Z^{-1} + 0.84Z^{-2}}\cdot\frac{1 - 0.198Z^{-1} + Z^{-2}}{1 - 1.238Z^{-1} + 0.455Z^{-2}}$$

f₁ = 4832 Hz; f₂ = 7495 Hz; Δf₁ = 4 Hz; Δf₂ = −3 Hz. Scale factors: a₀₀ = 2⁻³; a₁₀ = 2⁻¹; a₂₀ = 1.

7.10  Theoretic order: N = 5.19; for N = 6, δ₁ becomes very small. Coefficient wordlength: b_c ≃ 11 bits. Difference between input and internal data: 7 bits.

Chapter 8

8.1  $S = \frac{1}{z+2}\begin{bmatrix} z & 2 \\ 2 & z \end{bmatrix}$; $t = \begin{bmatrix} 1 - z/2 & z/2 \\ -z/2 & 1 + z/2 \end{bmatrix}$. For LC circuits take $z = Lp + \frac{1}{Cp}$ or $z = \frac{1}{Cp + \frac{1}{Lp}}$.

8.2  The diagram is that shown in Figure 8.3, with N = 6 and Y₆ = 0. For f_s = 40 kHz: a₁ = a₄ = 0.205; a₂ = a₃ = 0.085. The coefficients are multiplied by four for f_s = 10 kHz.

8.3  Follow the procedure given at the end of Section 8.2 and show the impact of the sampling frequency. Verify the five-bit curve given in Figure 8.9.

8.4  Lattice filter zeros: 0.6605; 0.6647 ± j0.5020; after rounding the k_i to 5 bits: 0.6661; 0.6377 ± j0.5002.

Chapter 9

9.1  $X(f) = \frac{1 - a\cos 2\pi f}{1 + a^2 - 2a\cos 2\pi f} + j\,\frac{-a\sin 2\pi f}{1 + a^2 - 2a\cos 2\pi f}$.

9.2  Calculate X_I(ω) through the Hilbert transform, or write: $X_R(\omega) = \frac{1}{2}\left[\frac{1}{1 - P} + \frac{1}{1 - \overline{P}}\right]$.

9.3  The nonzero terms in the sets x_R(n) and x_I(n) are interleaved. The operation performed is analytic filtering: $y(n) = \frac{1}{2}e^{-jn\pi/5}$.

9.4  Coefficient wordlength: $b_c \simeq 2 + \frac{1}{2}\log_2\left(\frac{f_s}{2\Delta f}\right) + \log_2\left(\frac{1}{\delta}\right)$.

9.5  Phase shifter order: $N \simeq \log\left(\frac{\pi}{\varepsilon}\right)\Big/\log\left(\frac{f_s}{f_1}\,\frac{f_s}{f_2}\right)$; coefficients: $b_c \approx \log_2\left(\frac{\pi}{\varepsilon}\right) + \log_2\left(\frac{f_s}{f_1}\right) + \log_2\left(\frac{f_s}{f_2}\right)$. For the example in Section 9.4: N = 4.97; b_c ≃ 14 bits.

9.6  Transfer functions:

$$H_m(Z) = 1 - 2Z^{-1} + 2Z^{-2} - Z^{-3} + 0.25Z^{-4}$$
$$H_L(Z) = 0.5 - 1.5Z^{-1} + 2.25Z^{-2} - 1.5Z^{-3} + 0.5Z^{-4}$$
$$H_M(Z) = 0.25 - Z^{-1} + 2Z^{-2} - 2Z^{-3} + Z^{-4}$$

9.7  Filter frequency response: $H(f) = e^{-j2\pi 3f}[0.5 + 0.592\cos 2\pi f - 0.1012\cos 6\pi f]$; $H(0) = H(f_s/8) = 0.989$; δ₁ = δ₂ = 0.011; Δf = f_s/4. Output of the IQ modulator: $y(n) = 0.4945\,e^{-j(n-5)\pi/4} + 0.0055\,e^{j(n-5)\pi/4}$. After undersampling, setting n = 2p + 1: $y(p) = 0.4945\,e^{-jp\pi/2} - 0.0055\,e^{jp\pi/2}$. Two components at frequencies f_s/4 and −f_s/4 = 3f_s/4.

9.8  Pass-band width: [0; 0.25]; ripple: 4 × 10⁻²; delay: 2T.

9.9  Filter impulse response:

$$h(n) = \frac{2}{N}\,\frac{\sin\left(n\frac{\pi}{2}\right)}{\sin\left(\frac{\pi n}{N}\right)}\,e^{j\frac{\pi}{2}\frac{N-1}{N}n}$$

With respect to (9.24), the DFT introduces double periodicity in time and frequency. The values of x_c(n) in the vicinity of n = 8 are approximately restored from the first 3 or 7 values of h(n). Clearly, the DFT is efficient but the accuracy is limited.

9.10  $\begin{bmatrix} x(5) \\ x(6) \end{bmatrix} = \begin{bmatrix} -1 - 0.866j \\ 0.612 - 1.319j \end{bmatrix} = -\begin{bmatrix} 1 & j \\ -1 & e^{-j3\pi/4} \end{bmatrix}^{-1} T_{21} X_1$

Chapter 10

10.1  Coefficient wordlengths: b_c = 1, 2, 5, 6, 9, 10, 10, 11, 14. For the half-band filter: b_c ≈ 2 log₂(1/δ_m − δ₀).

10.2  Filters in the cascade of 3 filters: Δf = 0.4 with M = 2; Δf = 0.15 with M = 3; Δf = 0.025 with M = 8. Roundoff noise at the output of a half-band filter: 2M q²/12. After three filters: P_N = 20 q²/12.

10.3  The function can be carried out with a half-band filter (M = 3) and a low-pass filter with 54 coefficients, hence a computation rate of 264 kmult/s. A direct realization with 100 coefficients leads to 400 kmult/s.

10.4  The odd DFT corresponds to a frequency shift of f_s/2N.

10.5  Polyphase network functions:

$$D(Z) = (1 - 0.1354Z^{-1} + 0.069Z^{-2})(1 + 0.98Z^{-1} + 0.51Z^{-2})$$
$$N_1(Z) = 1 + 7.806Z^{-1} + 9.718Z^{-2} + 3.773Z^{-3} + 0.1883Z^{-4}$$
$$N_2(Z) = 3.713(1 + 2.908Z^{-1} + 2.035Z^{-2} + 0.317Z^{-3})$$

The diagram is that given in Figure 10.10; multiplication rate: 8f_s.

Chapter 11

11.1  Frequency response: $H(f) = e^{-j2\pi f\,2.5}[0.904\cos\pi f + 0.234\cos 3\pi f - 0.1\cos 5\pi f]$. Output signal: y(n) = 0.963 cos((n − 2.5)π/4). Output of the analysis filters, after downsampling:

$$u_1(n) = 0.963\cos\left((n - 2.5)\frac{\pi}{4}\right) + 0.963\cos\left((n - 2.5)\frac{3\pi}{4}\right)$$
$$u_2(n) = 0.037\cos\left((n - 2.5)\frac{\pi}{4}\right) + 0.037\cos\left((n - 2.5)\frac{3\pi}{4}\right)$$

Total transfer function:

$$T(Z) = Z^{-1}[-0.023 + 0.121Z^{-2} + 0.882Z^{-4} + 0.121Z^{-6} - 0.023Z^{-8}]$$

Reconstructed signal: x̃(n) = 0.929 cos((n − 5)π/4).

11.2  Transfer functions of the two factors: H₀(Z) = (1 + Z⁻¹)³/4; H₁(Z) = (−1 − 3Z⁻¹ + 3Z⁻² + Z⁻³)/4. Amplification factor: 20/16 = 1.25.

11.3  Use the procedure described in Section 11.3, without the double zero at −1 for H₁(−Z).

11.4  Output signals: y₁(n) = 0.951 cos((n − 4)π/4); y₂(n) = 0.15 cos((n − 3)π/4). Downsampling introduces image components at frequency 3/8. Cancelation of the image components at reconstruction can be verified.

11.5  The reconstruction error is bounded by the quantization step multiplied by twice the sum of the absolute values of the coefficients.
Chapter 12

12.1  The transfer function of the first branch of the polyphase network is:

$$B_1(Z) = Z^{-1/2}[-0.0218 + 0.0621Z^{-4} + 0.1996Z^{-8} + 0.0160Z^{-12}]$$

To simplify, let:

$$H_1(Z) = -0.0218 + 0.0621Z^{-1} + 0.1996Z^{-2} + 0.0160Z^{-3}$$

The frequency response is analyzed by taking Z = e^{j2πf}. A delay τ corresponds to the factor e^{−j2πfτ}; the delays in the branches are: τ = [1.8875, 1.5863, 1.3198, 1.0188]. These delays compensate the factors due to interleaving: (1/2, 3/2, 5/2, 7/2).

12.2  Convolution in time is multiplication in the frequency domain. The coefficients are obtained by calculating the inverse Fourier transform of the prototype filter impulse response. They are applied at the FFT output in the analysis part and at the inverse FFT input in the synthesis part. The coefficients are given by (12.40) and (12.42). For the first filter of the analysis bank after the basis filter, the outputs 1 to 7 must be used and the multiplication results are summed. For the synthesis, the signal intended for the first filter must be applied to inputs 1 to 7 of the iFFT, after multiplication by the coefficients.

12.3  Assuming the length of the prototype filter impulse response is unity, with a continuous-time half-sinusoid, we get:

$$H(f) = \frac{2}{\pi}\,\frac{\cos(\pi f)}{1 - 4f^2}$$

To be compared with the FFT response in similar conditions (duration: 0.5):

$$H_{FFT}(f) = \frac{\sin\left(\frac{\pi f}{2}\right)}{\pi f/2}$$

The frequency response H(f) decreases with the square of the frequency. In each branch of the polyphase network, the coefficient values are h(n) and h(n + N). Realization by 2N-FFT: it suffices to add two adjacent outputs. More multiplications are needed.

Chapter 13

13.1  AC function:

$$r_1(p) = \frac{N-p}{N}\left[\cos(2\pi pf) + \cos(2\pi(N-1)f)\,\frac{\sin(2\pi(N-p)f)}{(N-p)\sin(2\pi f)}\right]$$

$$r_1(1) = \frac{N-1}{N}\left[\cos(2\pi f) + \frac{\sin(2\pi\,2(N-1)f)}{2(N-1)\sin(2\pi f)}\right]$$

For f = 1/8 and N = 16, we get r₁ = 0.618. Cancelling y(n) for n > N − 1 corresponds to applying a quadrature filter which, due to the ripple, introduces a deviation for the frequency values which are non-multiples of 1/2N, hence the difference between the estimated and actual frequencies (1.4%). The bound is CR = 3.7 × 10⁻⁶.

13.2  Time constant τ = 1/δ. For y(n) to approach m within 1% on average, 23 samples are needed. After the transition phase, the quadratic residual error is given by (14.39). Recursive and non-recursive estimators are equivalent for n ≃ 2/δ.

13.3  AC function: r(0) = 1; r(1) = 0.707; r(2) = 0. Eigenvalues of R₃: [2, 1, 0].

13.4  Output power: $P_s = \left(1 + a_1^2 + a_2^2\right)\sigma_b^2 + \left|1 - a_1 e^{-j\omega} - a_2 e^{-2j\omega}\right|^2$. Setting the derivatives to zero, we obtain the expressions of the coefficients.

13.5  AC function: $r(0) = 1; \quad r(1) = \frac{\cos\frac{\pi}{4} + \cos\frac{\pi}{3}}{2}; \quad r(2) = \frac{1}{2}\left[\cos\frac{2\pi}{4} + \cos\frac{2\pi}{3}\right]$. We verify that the zeros of the predictor are situated in the Z-plane between e^{jπ/4} and e^{jπ/3}.

13.6  The roots of the polynomials sit on the unit circle, and they verify the alternation principle.

13.7  Covariance matrix: R₃ = [2 1.866 1.5; 1.866 2 1.866; 1.5 1.866 2]. Weighting coefficients: A = [3.73 −6.46 3.73].

13.8  Eigenvalues: λ₁ = 1.06 + 0.37j; λ₂ = 0.30 − 0.60j. Signal-to-noise ratio in each channel: |λᵢ|²/σ_b²; theoretical capacity: 6.2 bit/s/Hz.

Chapter 14

14.1  Impulse response: H(Z) = 0.5(1 + 1.5Z⁻¹(1 + 0.5Z⁻¹ + 0.5²Z⁻² + ...)). Prediction coefficients: a₁ = 1.07; a₂ = −0.43. Output power: E_min = 0.357 = 1/G_p.

$$H(Z)P(Z) = 0.5\left(1 + 0.430Z^{-1} - 0.425Z^{-2} + 0.220Z^{-3} + \dots\right)$$

δ_m = 1; E_r = 1.25 E_min; τ = 1/δ = 4.

14.2  The prediction filter exhibits infinite attenuation at frequency f_s/8. Hence: a₁ = √2; a₂ = −1. With the noise σ², one gets:

$$a_1 = \sqrt{2}\,\frac{1 + 2\sigma^2}{1 + 8\sigma^2 + 8\sigma^4} \simeq \sqrt{2}(1 - 6\sigma^2); \qquad a_2 = -\frac{1}{1 + 8\sigma^2 + 8\sigma^4} \approx -(1 - 8\sigma^2)$$

14.3  Signal AC function: r₀ = P_x = 1/3; r₁ = 1/6; r₂ = 1/12. Optimum coefficient values: H^t_opt = [2; −1; 0]. With noise: H^t_opt = [1.35; −0.5; −0.07]. Noise amplification factor: ||H_opt||₂² = 2.09. The residual interference power is derived from the total response (channel + equalizer). Minimum eigenvalue: λ_min = 0.235. Simulations confirm that the time constant is in agreement with the estimation based on λ_min.

14.4  Give the expression of the output error and search for the coefficients which minimize its power. To begin with, in the input–output relation of filter H(Z), substitute the reference y(n) for the output ỹ(n) and compute the optimal coefficient values in that case.

Chapter 15

15.1  Equation of a separation line: −x₁ + x₂ = 0. Distances: A1: 1; A2: 2; A3: 2; B1: −1; B2: −1; B3: −4. Iterative method: the adaptation step in (15.2) is set to 1/2 and the inputs are used in the following order: A1, B1, A2, B2, A3, B3. Normalizing h₂ to unity, we get: H(1) = [0 −2 1] (line 0 − A1); H(2) = [0.2 −1.2 1]; ...; H(6) = [−0.45 −1.55 1].

15.2  Equations of two separation lines: 1 − 3x₁ + x₂ = 0; −5 + (5/7)x₁ + x₂ = 0. Iterative procedure: in a two-step process, for example, a line separating A from B + C is determined first, and then B is separated from C.

15.3  Optimum coefficient values: h₁opt = 1.4; h₂opt = 1. Error power: E₀ = 0.023. With respect to the estimations given in Section 15.4, the degradation is due to the nonlinearity, which is not taken into account. The step δ = 0.1 entails an additional error power of 5%.

15.4  Figure 15.6 is supplemented with the set of coefficients h_{ij2}. The coefficients of the last stage become h_{j13} (j = 1, 2). As estimations, we get a = 1.19 and b = 2.19. With coefficients h_{11j} = [1.23 1.20 1.44], the two nonlinearities are accounted for to determine c, which yields c = 1.5. In the presence of noise, the drift of the coefficients is reduced.

15.5  The circuit leads to 1.535 for the slope of the curve at the origin instead of π/2. This result is justified by the series expansion of the sine function.

15.6  Coefficient values: ±1/(0.7 × 15). Total number of coefficients: 64 × 32 + 32 × 15 + 15 × 10 = 2678. Updating for one digit: 32 × 64 + 15 × 32 + 15 = 2543. With the 8 × 8 grid, one might be able to handle 4 (64/16) different writings of each digit.

15.7  The equations of the circuits are as set forth in Section 15.6. Parameter C determines the coefficients of the recursive section. Coefficients h₁₁,₁ and h₁₁,₃ must, on average, compensate for the attenuation brought by f(a). Coefficients h_{ij,2} must compensate for the attenuation of the signals they multiply. Approximate values can be obtained by simple calculations based on development (15.28).

15.8  Values obtained, with d_k(n) reference values: x_{i1} = ±0.76; x_{i2} = ±0.64; $e_{i2} \approx 0.02 \times 0.64 \times \sum_{k=1}^{16} d_k(1)d_k(2)$; e_{i1} = 0.59 e_{i2}. Reducing the initial values entails a reduction of the errors involved in the initial coefficient updating and an increase of the system time constant. As for stability issues and time constant, see (14.16), (15.22), and (6.18).

15.9  The diagram is extended to a third hidden layer by repeating the first-layer circuit. The slopes of the lines that bear the segments are expressed by $8\left[\sin\left(\frac{k\pi}{8}\right) - \sin\left(\frac{(k-1)\pi}{8}\right)\right]$, 1 ≤ k ≤ 8. The coefficient values follow. To verify, take $x = \frac{k-1}{8} + \varepsilon$ and compute the output y. Maximum error estimation: $\sin\left(\frac{\pi}{16}\right) - \frac{\sin(\pi/8)}{2} = 0.0037$.

Chapter 16

16.1  Prediction coefficients: a₁ = 1.85j; a₂ = 1. Signal to be subtracted in the frequency domain: E(6) = 1; E(10) = 2j.

16.2  Splitting matrix T₈ into four appropriate blocks A, B, C, and D, we find: S_x = −C⁻¹D X. Spurious impulse: amplitude 2 and index 3. Between two index values, the impulse impacts the whole useful signal, and it is necessary to subtract the signal provided by linear prediction.

16.3  For a single error, N₁ = 7 and for a double error, N₂ = 6. Coding gain: 3 (4.7 dB).

16.4  Error-free transmission of binary data requires at least SNR = 3, or 4.77 dB. If the system signal-to-noise ratio is 8 dB, the code must bring a gain of 7 dB. A code with length L = 9 provides the 7 dB gain for a bit error rate of 10⁻⁸.

16.5  Free distance: 3; decoded sequence (systematic coding): 10010101. Recursive code: rate R = 3/8; free distance: 3.
363

Index

Note: Italicized and bold page numbers refer to figures and tables, respectively.

a with odd order 93, 93


acoustic echo cancelation 342 a posteriori error 281, 283, 288, 298, 302
adaptation step size 281, 285 applications of signal processing 335
adaptive differential pulse-code modulation differential coding of speech 338–339
(ADPCM) 339 echo cancelation 340, 341
adaptive filtering 279, 280 acoustic 342
complexity parameters 286–288 data 340–341
convergence conditions 282–284 frequency detection 335–336, 336
with measurement noise 283 mobile radiocommunications 347–349
normalized algorithms 288–289 multicarrier transmission 344–346
principle of 279–282 orthogonal frequency-division multiplexing
residual error 285–286 (OFDM) 344–346, 345
sign algorithms 288–289 phase-locked loop 337–338
time constant 284–285 sound, coding of 339, 340
adaptive FIR filtering television image processing 342–344
in cascade form 289–291, 290 a priori error 27, 102, 281, 283, 302, 310
in direct form 281, 282 autocorrelation (AC) function 8–10, 12, 18,
adaptive IIR filtering 291–293 63, 82, 259, 339, 343
adaptive systems 261, 264, 266, 285, 297 autocorrelation matrix 281, 339, 343
additive white Gaussian noise (AWGN) 262 and intercorrelation 259–261
adjacent filters, overlap of 247 autoregressive (AR) model 269, 291
analog ladder filter 176, 177 see also digital autoregressive moving average (ARMA) model
ladder filters 269, 291, 293
analog signal xv, xvi, 1, 22
analog-to-digital converter xv, 202 b
analytic signals xviii, 189, 192–195, 197 backpropagation algorithm 300–303
continuous 193, 194 band pass filters 16, 103–104, 109, 157, 158,
discrete 193, 194 218, 220, 335
spectrum of 193 frequency detection by 336
antenna processing 272–273, 272 mask of 103
antisymmetrical filter Berlekamp–Massey algorithm 318
with even order 93, 93 Bessel–Parseval equation 2, 3, 81, 91, 108, 165
Digital Signal Processing: Theory and Practice, Tenth Edition. Maurice Bellanger.
© 2024 John Wiley & Sons Ltd. Published 2024 by John Wiley & Sons
364 Index

bilinear transformation 149, 150 quantity of information and channel


elliptic filters 153–156, 154–156 capacity 28–29
frequency warping introduced by 150 of unitary Gaussian signal 27
low-pass filter, calculating any filter by of unitary Laplacian signal 27
transformation of 157–158 coding dynamic range 22–23
binary Fourier transform 71 coding gain and error probability 326–327
binary representations 29–30, 30, 71, 164 coefficients
bit-reversal 41 calculation
blocking effects 67 by Chebyshev approximation 100–102
block interpolation 204–206, 208 by discrete Fourier transform 99–100
Bose–Chaudhury–Hocquenghem (BCH) codes by Fourier series expansion for frequency
313 specifications 94–96
Butterworth filters 151–153, 151, 153, 156, by least-squares method 97–99
239 filter design with a large number of
113–114
c limitation of the number of bits for
cascade form, adaptive FIR filtering in 107–109
289–291, 290 number of coefficients and filter
Cauchy’s principal values 191 characteristic, relationships between
102–104
Cayley–Hamilton theorem 265
of two-dimensional FIR filters by
channel capacity 28–29, 320–322, 328, 331
least-squares method 118–121, 118
characteristic function of random variable 10
coefficient vector 119, 275, 280, 297, 306, 341
Chebyshev approximation 154, 159–160
coefficient wordlength limitation 140–141,
coefficients calculation by 100–102
164–167
Chebyshev’s norm 12
complex signals 189
Chien method 318
analytic signals 192–195
circular convolution 36–37, 50, 51, 345, 346
differentiator 201–202
classification operation 297, 298, 304, 306
FIR filters, interpolation using 202–203,
CNNs see convolutional neural networks
202
(CNNs) FIR quadrature filter, calculating the
coder 25 coefficients of 195–197
binary symbols delivered by 323 Fourier transform of real and causal set
convolutional 331 189–192, 190
FIR filter-based 328 interpolations and signal restoration
peak power of 22, 23, 23 206–207
rate R = 1/3 coder 329 Lagrange interpolation 203–204, 204
recursive systematic 328 minimum-phase filters 200–201
Reed–Solomon 331 recursive phase shifters 197–199, 198
coding see also convolutional codes; single side-band (SSB) modulation 199,
Reed–Solomon codes 199, 200
binary representations 29–30, 30 splines 204–206
defined 1 compression techniques 306
of Gaussian signal 24 “constraint length” of the code 324, 326, 328
nonlinear coding with 13-segment A-law Consultative Committee for International
24–26 Telegraphy and Telephony (CCITT) 23
optimal 26–28, 26 continuous analytic signal 193, 194
Index 365

convolution
  calculations of discrete Fourier transform using 51
  system model with convolutional decoding 326
convolutional codes 319
  capacity limit, approaching 321–323
  channel capacity 320–321
  coding gain and error probability 326–327
  decoding and output signals 327–328
  recursive systematic coding (RSC) 328, 328
  simple convolutional code 323–326
  trellis-coded modulations 330–331
  turbo codes, principle of 329–330
convolutional neural networks (CNNs) 306, 306
coordinate rotation digital computer (CORDIC) 68
correlation matrix 264–266
correlogram spectral analysis 261–262
cosine DFT (cos-DFT) 63, 64
cosine filtering 89, 90
Cramer–Rao bounds 276

d
data blocks 205, 331
  overlapping 67, 67
data echo canceller 340
  two-wire line 340–341
decimation and Z-transform 213–217, 214
decimation-in-frequency algorithms 41
  factorizing the matrix of 56–58
  fast Fourier transform 41–42
decimation-in-time algorithms 38
  fast Fourier transform 39–41
decoding and output signals 327–328
decomposition
  of channel matrix 349, 349
  with half-band filters 222–224, 222
  of low-pass FIR filter 217–220, 218, 219
  perfect decomposition and reconstruction 236–238
  and reconstruction 245–246, 246
  of transfer matrix 274
  into two sub-bands and reconstruction 233–236, 234
degradation arising from wordlength limitation effects 45–46
desired filter 113, 114, 161
deterministic signals 7, 18
difference equations, systems defined by 83–84
differential coding of speech 338–339, 338
differentiation of distributions 5
differentiator 201–202
digital filter coefficients 105, 106, 150
digital filters 91–93, 123, 127, 147, 148, 153, 155, 343
  circuit of 179
  by polyphase network 224–227
digital frequency synthesizers 17
digital integrator circuit 178, 178
digital ladder filters 173 see also analog ladder filter
  comparison elements 187–188
  lattice filters 183–187, 184–186
  simulated ladder filters 176–180
  switched-capacitor filters 180–183, 181, 182
  two-port circuits, properties of 173–176, 174, 176
digital processing machines xviii, 30
digital simulated ladder filter 179
digital television receiver 346, 347
Dirac distribution 5, 6
discrete analytic signal 193, 194
discrete cosine transform (DCT) 37, 64, 65, 207, 308, 308
  inverse 66, 66
  two-dimensional 66
discrete Fourier transform (DFT) 35, 55, 65, 66, 71 see also fast Fourier transform (FFT)
  calculation of, by convolution 51, 52
  coefficient calculation by 99–100
  coordinates of the coefficients of 38
  definition and properties of 36–38
  fast convolution 50–51
  filter banks using 227–229
  filtering by phase shift in 48
  filtering function of 47
  implementation 52
  inverse 65, 65
  odd-time odd-frequency DFT 61–63
  real data and odd DFT, transform of 59–61
  spectrum calculation using 46
    filtering function 46–48, 47
    spectral resolution 48–49
discrete Hartley transform (DHT) 65
discrete noise generation, sampling of 19–20
discrete random signals, sampling of 18–19
discrete signals 193, 194
  autocorrelation function of 18
  energy and power of 80–81
  hypothesis of ergodicity for 18
discrete sine transform (DST) 64
distributions 4
  definition 4–5
  differentiation of 5
  Dirac distribution 5
  Fourier transform of 6
  Gaussian distribution 11
  spectral distribution of error signal 21
D–N structure 141, 142, 158, 167
  second-order section in 139
Dolph–Chebyshev function 96, 96
Doppler spectrum 348
double speech detection 342, 342
doubly odd discrete Fourier transform, coefficients of 61, 62

e
echo cancelation 340, 341
  acoustic echo cancelation 342
  data echo canceller 340
  two-wire line 340–341
Eigen transform 66
elementary error signal 21, 21
elliptic filters 153–156, 154–156
equalization 268, 323, 346
error-correcting codes 313
  convolutional codes 319
    approaching the capacity limit 321–323
    channel capacity 320–321
    coding gain and error probability 326–327
    decoding and output signals 327–328
    principle of turbo codes 329–330
    recursive systematic coding (RSC) 328, 328
    simple convolutional code 323–326
    trellis-coded modulations 330–331
  Reed–Solomon codes 313
    computing in a finite field 317–318
    in frequency domain 315–316, 315
    performance of 318–319
    predictable signals 313–315
    in time domain 316–317, 316
error probability 320–322, 325, 331
  coding gain for 326–327
error signal 20, 22, 141–142, 168, 270, 315–316, 323, 340
  elementary 21, 21
  spectral distribution of 21
estimation bounds 275–276

f
fast convolution 50–51, 51
fast Fourier transform (FFT) 38, 229, 344 see also discrete Fourier transform (DFT)
  algorithms 67
  arithmetic complexities of 63
  decimation-in-frequency 41–42
  decimation-in-time 39–41
  radix-4 FFT algorithm 43–44, 43
  split-radix FFT algorithm 44–45
fast Fourier transform (FFT), fast algorithms for 55
  binary Fourier transform 71
  decimation-in-frequency algorithm, factorizing the matrix of 56–58
  fast algorithms 67–70
  Kronecker product of matrices 55–56
  lapped transform 66–67
  number-theoretic transforms 71–73
  partial transforms 58
    odd-time odd-frequency DFT 61–63
    real data and odd DFT, transform of 59–61
    sine and cosine transforms 63–66
    two-dimensional DCT (2D-DCT) 66
Fermat numbers 73
filter banks xviii, 245, 339
  aliasing of a component in 250
  decomposition and reconstruction 245–246, 246
  filtering function of 229
  inverse functions, determining 248–249
  nonuniform 239
  phase shifts in 228
  polyphase network, analyzing the elements of 247–248
  prototype filter, determining the coefficients of 253, 254
  pseudo-QMF filters, banks of 249–253
  real filters, realizing a bank of 254–256
  with two filters 226
  uniform 239
  using polyphase networks and discrete Fourier transform 227–229
filtering see also adaptive filtering; finite impulse response (FIR) filters; infinite impulse response (IIR) filters; multirate filtering
  analog 173, 176
  cosine 89, 90
  digital 173, 224
  “max flat” filtering 203
  raised-cosine 90
  of random signals 82
filtering function 49, 96, 106, 134, 197
  Butterworth 151
  of discrete Fourier transform 46–48, 47, 66
  elliptic 154
  of filter bank 229
  low-pass 157
filter design with a large number of coefficients 113–114
finite impulse response (FIR) filters xvi, 89–91, 123, 183, 235, 323, 328
  adaptive FIR filtering in cascade form 289–291
  analytic 197
  coefficients calculation
    by Chebyshev approximation 100–102
    by discrete Fourier transform 99–100
    by Fourier series expansion for frequency specifications 94–96
    by least-squares method 97–99
  configuration of the zeros in 110
  design of filters with large number of coefficients 113–114
  direct structure 106, 107
  half-band 220–222
  vs. infinite impulse response (IIR) filters 169–170
  interpolation using 202–203, 202
  least-squares method, coefficients of 2D FIR filters by 118–121, 118
  low-pass FIR filter 217–220
  minimum-phase filters 111–113
  number of bits for coefficients, limitation of 107–109
  number of coefficients and filter characteristic, relationships between 102–104
  practical transfer functions and linear phase filters 91–94
  quadrature filter 195–197
  raised-cosine transition filter 104–106, 104
  realization of 106, 107
    by contour extraction 117
  structures for implementing 106
  transposed structure 106, 107
  two-dimensional 114–117, 116, 117
    coefficients of, by least-squares method 118–121, 118
  Z–transfer function of 109–111
“finite memory” filter 89
first half-band filter 223, 223
first-order filter section 123–127, 124, 126
Fourier coefficients 2, 3
Fourier expansion 2
Fourier series xvi, 1, 2
  relations with 37
Fourier series expansion 3
  coefficients calculation by 94–96
  of periodic function 1–2
Fourier transforms xvi, xvii, 3, 4, 14, 35, 49, 79 see also discrete Fourier transform (DFT); fast Fourier transform (FFT)
  of a distribution 6
  of a function 2–4
  inverse 10
  odd discrete 60, 60
  of real and causal set 189–192, 190
fourth-order transform, matrix for 40
frequency detection 335–336, 336
  by band-pass filtering 336
  by complex filtering 336
frequency domain
  Reed–Solomon codes in 315–316, 315
  for two-dimensional separable filter 116, 117
frequency masking 113, 114
frequency response xvi, xvii, 7, 83, 84, 93, 94, 98, 104, 105, 115, 116, 118, 119, 120, 120, 121, 125–126, 126, 131, 136, 136, 150, 222
  of Butterworth filter 153
  of elliptic filter 156
  of half-band filter 196
  of low-pass analysis filter 258
  of minimum phase wavelets 240
  of multirate filter 219
  of quadrature filter 193, 194, 201
  of the filters 240, 241
frequency sampling 14
full-duplex transmission 340

g
Galois fields 317, 318
Gaussian distribution 9, 11
Gaussian law 9, 320
Gaussian noise power 321
Gaussian random variables 28, 347
Gaussian signals 9, 11–12, 20, 23, 25, 283
  coding of 24
  eight-bit linear and nonlinear coding of 25
  peak factor of random signal 11–12
Gaussian white noise 273, 319, 343, 347, 348
general second-order filter section 134–138, 136, 137
gradient algorithm 99, 279, 281, 294, 341
graphic symbol recognition 304
guard interval 345, 345

h
Hadamard code 71, 71
Hadamard transform 70, 71
half-band filter 195, 225
  decomposition with 222–224, 222, 223
  frequency response of 196
  with phase shifters 225
half-band filter frequency response, amplitude of 226
half-band FIR filters 220–222, 221, 221, 224, 235, 239
half-Nyquist filter 106, 253
Hamming window 49, 96
Heisenberg uncertainty principle 49
hidden layers 299, 302
  neural network with 299

i
image encoder 343, 344
impulse invariance 149
impulse train 2, 3, 15
infinite impulse response (IIR) filters 97, 147, 169, 183, 328
  adaptive 291–293
  vs. finite impulse response (FIR) filters 169–170
  general expressions for the properties of 147–148
  model functions 148
    bilinear transformation 150–158, 150
    Butterworth filters 151–153, 151, 153
    calculating any filter by transformation of a low-pass filter 157–158
    Chebyshev approximation 159–160
    comparison of IIR and FIR filters 169–170
    elliptic filters 153–156, 154–156
    filters based on spheroidal sequences 160–161
    impulse invariance 149
    iterative techniques for calculating IIR filter with frequency 158–160
    limiting the coefficient wordlength 164–167
    minimizing the mean square error 158–159
    round-off noise 167–169
    structures representing the transfer function 162–164
  multirate filtering with 227
  sections 123
    coefficient wordlength limitation 140–141
    first-order section 123–127, 124, 126
    general second-order section 134–138, 136, 137
    internal data wordlength limitation 141–142
    purely recursive second-order section 127–134, 128, 131–133
    realization of 163
    stability and limit cycles 142–144, 143
    structures for implementation 138–140, 138
in-place computation 41
integrators
  analog 178, 181
  digital 178, 178
  filter composed of 177–178, 177
  narrowband 126
  switched-capacitor 181–182, 181
intercorrelation 259–261
intercorrelation function 260–261
interleaved samples, spectra obtained by 215, 216
internal data wordlength limitation 141–142
interpolation filter 114, 114, 204, 205, 219
  spectrum of 194
interpolations
  Lagrange interpolation 203–204
  and signal restoration 206–207
  using FIR filters 202–203, 202
interspectrum 261
inverse Fourier transform of probability density 10
inverse functions, determining 248–249
isolated impulse 3, 3
iterative decoder, principle of 329
iterative techniques for calculating IIR filter with frequency 158
  Chebyshev approximation 159–160
  mean square error, minimizing 158–159

j
JPEG 2000 237, 239
  filters for lossy compression in 241
  frequency responses of the filters in 241

k
K data symbols 316
Kronecker product of matrices 55–58, 69, 71

l
Lagrange interpolation 102, 203–204, 204
Laplace distribution 12
Laplace transform xvii
lapped transform 66–67
lattice filters 183–187, 184–186
lattice structures 186, 242, 242
  linear prediction in 271, 271
leapfrog 176
least mean squares (LMS) 281, 341
least-squares method
  coefficient calculation by 97–99
  coefficients of two-dimensional FIR filters by 118–121
Levinson–Durbin procedure 170
limit cycles 143
linearity 36, 169
linear phase filters 92, 106, 111, 116
linear prediction 259, 268–269, 268, 275
  in lattice structure 271
linear time-invariant (LTI) system 77–78, 82, 83
line spectral pair (LSP) method 271, 272
LMS see least mean squares (LMS)
lossy compression, filters for, in JPEG 2000 236, 241
low-order Winograd algorithms, arithmetic complexities of 70
low-pass analysis filter, frequency response of 238, 238
low-pass filter 15, 94, 94, 160
  calculating any filter by transformation of 157–158
  decomposition of 217–220, 218, 219
  poles and zeros of 166
lozenge filter 118, 118, 120, 120

m
masking curve for a sound signal 339, 340
mathematical distributions 4
maximum likelihood (ML) decoder 323
mean square error 284
  minimization of 158–159, 293, 294
measurement noise 285
  adaptive filter with 283
minimum mean square error (MMSE) 267, 269, 275
minimum mean square error (MMSE) criterion 267, 269, 275
minimum-phase filters 111–113, 200–201
minimum-phase wavelets
  coefficients of 240
  frequency responses of 240
mobile radiocommunications 275, 347–349
model functions, direct calculations of IIR filter coefficients using 148
  bilinear transformation 150, 150
  Butterworth filters 151–153, 151, 153
  elliptic filters 153–156, 154–156
  low-pass filter, calculating any filter by transformation of 157–158
  comparison of IIR and FIR filters 169–170
  filters based on spheroidal sequences 160–161
  impulse invariance 149
  iterative techniques for calculating IIR filter with frequency 158
    Chebyshev approximation 159–160
    minimizing the mean square error 158–159
  limiting the coefficient wordlength 164–167
  round-off noise 167–169
  structures representing the transfer function 162–164
modulated signal 7, 16, 199
moving image encoder 343, 344
multicarrier transmission 344–346
multilayer perceptron 299–300, 300
multiple inputs and multiple outputs (MIMO) system 273–275, 274, 275, 348
multirate filtering 213, 229
  decimation and Z-transform 213–217, 214
  filter banks using polyphase networks and DFT 227–229
  filter bank with two filters 226
  first half-band filter, transition band of 223
  half-band filters, decomposition with 222–224, 222
  half-band FIR filters 220–222, 221, 221
  with IIR elements 227
  interleaved samples, spectra obtained by 215, 216
  low-pass FIR filter, decomposition of 217–220, 218, 219
  polyphase network, digital filtering by 224–227, 225

n
narrowband integrator 126
neural networks 297
  on activation functions 309
  application, examples of 303–305
  backpropagation algorithm 300–303
  classification 297–299
  convolution neural networks (CNN) 306, 306
  with hidden layers 299
  multilayer perceptron 299–300, 300
  neural network and signal processing 308–309
  recurrent/recursive neural networks 307–308, 307
neuron 298, 298, 304
90° phase shifters 197–199, 198
noise vector modulus, probability distribution of 323
nonbinary outputs 298
nonlinear activation function 299, 306, 307
nonlinear coding with 13-segment A-law 24–26
nonlinear function 303–304, 303
nonuniform filter bank in tree structure 239, 239
normal distribution, reduced 30, 31–32
normalized algorithms 288–289
norms of a function 12–13
N real filters 249, 249, 253, 256
number of coefficients 113
  and filter characteristic 102–104
  limiting 95, 95
number-theoretic transforms 71–73
Nyquist filters 104, 105

o
odd discrete Fourier transform, coefficients of 60, 60, 62
one-sided Z-transform 80, 84, 126, 132
optimal coding 26–28, 26
optimization techniques 113, 158, 303
orthogonal frequency-division multiplexing (OFDM) 344–346, 345, 347–349

p
parallel structure 162, 167
Parseval’s relation 37
partial transforms 58
  odd-time odd-frequency DFT 61–63
  real data and odd DFT, transform of 59–61
  sine and cosine transforms 63–66
  two-dimensional DCT (2D-DCT) 66
peak factor of random signal 11–12
peak power of a coder 22–23, 23
perceptron 298–299
periodic function
  Fourier series expansion of 1–2
  spectrum of 14
phase-locked loop 337–338, 337
phase shifter circuit 136
phase shifters 163–164, 163
  half-band filter with 225
phase shifters, recursive 197–199, 198
phase shifts 48, 136, 138, 229
  in filter bank 228
  filtering by 48
polynomial terminology 318
polyphase network
  analyzing the elements of 247–248
  digital filtering by 224–227, 225
  filter banks using 227–229
predictable signal 269, 313
prediction error 268–269, 338
predictor structures 270, 271
  sensor networks 272–273
probability density 8, 10
  inverse Fourier transform of 10
probability distribution
  of noise vector modulus 323
  of signal amplitude 26
prototype filter 235, 249
  determining the coefficients of 253, 254
pseudo-QMF filters 255–256
  banks of 249–253, 254
pseudorandom sequence 19, 19
purely recursive second-order filter section 127–134, 128, 348
  frequency response of 131
  impulse response 133
  phase characteristic of 132
  theoretical group delay of 132

q
quadratic coefficient deviation 283
quadrature filters 194
  frequency response of 193
  mask for 196
  responses of 195
quadrature mirror image filter (QMF) 233–236, 234
  lattice structures 242, 242
  perfect decomposition and reconstruction 236–238, 237
  two sub-bands and reconstruction, decomposition into 233
  wavelets 238–241
quadrature operation 190
quantity of information 28–29
quantization 20–22, 20
  of the coefficients 140, 141
  defined 1
  operation 20
quantization error 20, 21, 45, 338
quantizing distortion 20, 26
quantizing noise 20–22

r
radix-4 FFT algorithm 43–44, 43
raised-cosine filtering 90
raised-cosine transition filter 104–106, 104
raised filter, ripple in the stop band of 111, 111
real filters, realizing a bank of 254–256, 255
receiver for terrestrial digital television 346, 347
rectangular filter 119, 119
rectified linear unit (ReLU) function 306, 309, 310
recurrent/recursive neural networks 307–308
recursive least squares algorithm (RLS) 341
recursive phase shifters 197–199, 198
recursive systematic coding (RSC) 328
reduced normal distribution 30, 31–32
Reed–Solomon codes 73, 313
  finite field, computing in 317–318
  in frequency domain 315–316, 315
  performance of 318–319
  predictable signals 313–315
  in time domain 316–317, 316
Remez algorithm 101, 102, 159
residual error 270, 282, 284–286
ripples in the pass band 180, 180
round-off noise 167–169

s
sampled signal spectrum 13, 14
sampling 13–14, 16
  defined 1
  deterministic signals 7
  of discrete noise generation 19–20
  of discrete random signals 18–19
  distributions 4
    definition 4–5
    differentiation of 5
    Fourier transform of 6
  Fourier series expansion of a periodic function 1–2
  Fourier transform of a function 2–4
  frequency 14
  frequency sampling 14
  Gaussian signals 9
    peak factor of a random signal 11–12
  norms of a function 12–13
  operation 14
  period 13
  quantization 20–22, 20
  random signals 8–9
  signal reconstruction after 15, 16
  of sinusoidal signals 17
  spectral incidence of 13, 13
  theorem 15–16
second-order elliptic section 135
second-order filter section 130
  general 134–138, 136, 137
  purely recursive 127–134, 128
second-order loop, model of 337, 337
sensor networks 272–273
separable filters 116
Shannon’s theorem 16
signal, defined xv
signal analysis and modeling 259
  antenna processing 272–273, 272
  autocorrelation and intercorrelation 259–261
  correlation matrix 264–266
  correlogram spectral analysis 261–262
  estimation bounds 275–276
  linear prediction 268–269, 268
  modeling 266–268, 267
  multiple inputs and multiple outputs (MIMO) system 273–275, 274
  predictor structures 270–273, 271
  single-frequency estimation 262–264
sign algorithms 288–289
signal processing xv, 11, 313, 331, 335
  neural network and 308–309
signal reconstruction after sampling 15, 16
signal restoration, interpolations and 206–207
signal sampling frequency xviii
signal-to-noise ratio (SNR) 22, 23, 25, 273
simple convolutional code 323–326
simulated ladder filters 176–180, 179
sine and cosine transforms 63–66
sine DFT (sin-DFT) 64
sine transform
  discrete sine transform (DST) 64
  frequency warping by 178, 178
single-frequency estimation 262–264
single side-band (SSB) modulation 199, 199, 200
sinusoidal signals 7
  sampling of 17
sound, coding of 339, 340
spectral distribution of error signal 21
spectral incidence of sampling 13, 13
spectral resolution 48–49
spectrum
  aliasing 16, 49, 217
  of analytic signal 193
  of discrete analytic signal 194
  of Hadamard code 31/64 71, 71
  of isolated impulse 3
  of periodic function 14
  of signal 3
spectrum analyzers 17
spectrum calculation using DFT 46
  filtering function 46–48, 47
  spectral resolution 48–49
spectrum of discrete analytic signal and interpolation filter 194
speech, differential coding of 338–339
spheroidal sequences, filters based on 160–161
splines 204–206
split-radix FFT algorithm 44–45
stability and limit cycles 142–144, 143
state equations 83, 85
state variable analysis 85–86
subsampled filter 114
sum signal 7
switched-capacitor filters 180–183, 181, 182
switched-capacitor integrator 181–182, 181
symmetrical filter
  with even order 93, 93
  with odd order 93, 93
symmetry 36, 110, 112

t
television image processing 342–344
terrestrial digital television, receiver for 346, 347
13-segment A-law, nonlinear coding with 24–26, 25
time constant 124, 128, 284–285
time domain 1, 133
  discrete-time domain xvi
  interleaving in 38
  Reed–Solomon codes in 316–317, 316
time-invariant discrete linear systems 77
  definition and properties 77–78
  difference equations, systems defined by 83–84
  energy and power of discrete signals 80–81
  random signals, filtering of 82
  state variable analysis 85–86
  Z-transform 78–80
transfer function 15, 83, 91–94, 113, 127, 135, 136, 138, 158, 159, 161, 177, 179, 185, 187, 197, 202, 205, 216, 218, 234, 242, 270, 272, 292, 338
  of all-pass circuit 187
  of first-order section 123
  graphical interpretation of 84
  interpolated 97, 97
  structures representing 162–164
  Z-transfer function xvii, 80, 85, 112, 123, 127, 134, 140, 147, 149, 150, 152, 157, 173, 178, 224, 228
    of FIR filter 109–111
transition band 94, 96, 102–104, 155, 253
  of first half-band filter 223
translation 36
transmission channel, capacity of 28–29
tree structure, nonuniform filter bank in 239, 239
trellis-coded modulations 330–331
turbo codes, principle of 329–330
two-dimensional DCT (2D-DCT) 66
two-dimensional FIR filters 114–117, 116, 117
  coefficients of, by least-squares method 118–121
  one-dimensional linear phase filter, designed from 116
two-port circuits, properties of 173–176, 174, 176
two-wire line 340–341

u
unitary Gaussian signal, coding of 27
unitary Laplacian signal, coding of 27
unit series 77, 78

w
wavelength variable 342
wavelets 238–241
weight of the sequence 325
white noise 9, 19, 82, 130

z
Z-transfer function xvii, 80, 85, 112, 123, 127, 134, 140, 147, 149, 150, 152, 157, 173, 178, 224, 228
Z-transform xvii, 77–85, 125, 224, 290
  decimation and 213–217, 214
  inverse 127
  one-sided 126, 132