Iterative Design of lp Digital Filters

Ricardo A. Vargas and C. Sidney Burrus
Electrical and Computer Engineering Dept.
Rice University
February 2009
(arXiv:1207.4526v1 [cs.IT] 19 Jul 2012)
Abstract: The design of digital filters is a fundamental process in the context of digital signal processing. The purpose of this paper is to study the use of lp norms (for 2 < p < ∞) as design criteria for digital filters, and to introduce a set of algorithms for the design of Finite (FIR) and Infinite (IIR) Impulse Response digital filters based on the Iterative Reweighted Least Squares (IRLS) algorithm. The proposed algorithms rely on the idea of breaking the lp filter design problem into a sequence of approximations rather than solving the original lp problem directly. It is shown that one can efficiently design filters that arbitrarily approximate a desired lp solution (for 2 < p < ∞), including the commonly used l∞ (or minimax) design problem. A method to design filters with different norms in different bands is presented (allowing the user better control of the signal and noise behavior per band). Among the main contributions of this work is a method for the design of magnitude lp IIR filters. Experimental results show that the algorithms in this work are robust and efficient, improving over traditional off-the-shelf optimization tools. The proposed algorithms form a flexible collection that offers robustness and efficiency for a wide variety of digital filter design applications.
I. INTRODUCTION
The design of digital filters has fundamental importance in digital signal processing. One can find applications of digital filters in many diverse areas of science and engineering, including medical imaging, audio and video processing, oil exploration, and highly sophisticated military applications. Furthermore, each of these applications benefits from digital filters in particular ways, thus requiring different properties from the filters they employ. Therefore it is of critical importance to have efficient design methods that can shape filters according to the user's needs.
In this work we use the discrete lp norm as the criterion for designing efficient digital filters. We also introduce a set of algorithms, all based on the Iterative Reweighted Least Squares (IRLS) method, to solve a variety of relevant digital filter design problems. The proposed family of algorithms has proven to be efficient in practice; these algorithms share theoretical justification for their use and implementation. Finally, the document makes a point about the relevance of the lp norm as a useful tool in filter design applications.

The rest of this chapter is devoted to motivating the problem. Section I-A introduces the general filter design problem and some of the signal processing concepts relevant to this work. Section I-C presents the basic Iterative Reweighted Least Squares method, one of the key concepts in this document. Section I-D introduces Finite Impulse Response (FIR) filters and covers theoretical motivations for lp design, including previous knowledge in lp optimization (both from experiences in filter design as well as other fields of science and engineering). Similarly, Section I-E introduces Infinite Impulse Response (IIR) filters. These last two sections lay down the structure of the proposed algorithms, and provide an outline for the main contributions of this work.
Chapters II and III formally introduce the different lp filter design problems considered in this work and discuss their IRLS-based algorithms and corresponding results. Each of these chapters provides a literature review of related previous work as well as a discussion of the proposed methods and their corresponding results. An important contribution of this work is the extension of known and well understood concepts in lp FIR filter design to the IIR case.
A. Digital filter design

When designing digital filters for signal processing applications one is often interested in creating objects h ∈ R^N in order to alter some of the properties of a given vector x ∈ R^M (where 0 < M, N < ∞). Often the properties of x that we are interested in changing lie in the frequency domain, with X = F(x) being the frequency domain representation of x given by

x —F→ X = A_X e^{jφ_X}

where A_X and φ_X are the amplitude and phase components of x, and F(·) is the Fourier transform operator defined by

F{h} = H(ω) = Σ_{n=0}^{N−1} h_n e^{−jωn},   ω ∈ [−π, π]   (1)

So the idea in filter design is to create filters h such that the Fourier transform H of h possesses desirable amplitude and phase characteristics.
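As a quick numerical illustration of the operator in (1), the frequency response of a short filter can be evaluated on a grid of frequencies. The following NumPy sketch uses our own function and coefficient names (not from the paper):

```python
import numpy as np

def freq_response(h, w):
    """Evaluate H(w) = sum_{n=0}^{N-1} h_n e^{-jwn} of eq. (1)."""
    n = np.arange(len(h))
    return np.exp(-1j * np.outer(w, n)) @ h

h = np.array([0.25, 0.5, 0.25])          # simple 3-tap lowpass example
w = np.linspace(-np.pi, np.pi, 401)
H = freq_response(h, w)
amplitude, phase = np.abs(H), np.angle(H)
```

At ω = 0 this filter has unit gain (the coefficients sum to 1), and at ω = π the response vanishes, which is the desirable lowpass behavior referred to above.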
The filtering operator is the convolution operator (∗) defined by

(x ∗ h)(n) = Σ_m x(m) h(n − m)

An important property of the convolution operator is the Convolution Theorem [1] which states that

x ∗ h —F→ X · H = (A_X · A_H) e^{j(φ_X + φ_H)}   (2)

where {A_X, φ_X} and {A_H, φ_H} represent the amplitude and phase components of X and H respectively. It can be seen that by filtering x with h one can apply a scaling operator to the amplitude of x and a biasing operator to its phase.
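The Convolution Theorem (2) is easy to verify numerically with zero-padded FFTs; a small sketch (the signals are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
h = rng.standard_normal(16)

# time-domain convolution (x * h)(n) = sum_m x(m) h(n - m)
y = np.convolve(x, h)

# frequency domain: multiply X and H on a grid long enough to avoid aliasing
L = len(x) + len(h) - 1
Y = np.fft.fft(x, L) * np.fft.fft(h, L)
y_freq = np.fft.ifft(Y).real

assert np.allclose(y, y_freq)   # both routes agree, as (2) predicts
```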
A common use of digital filters is to remove a certain band of frequencies from the frequency spectrum of x (as typical lowpass filters do). Other types of filters include band-pass, high-pass or band-reject filters, depending on the range of frequencies that they alter.
B. The notion of approximation in lp filter design

Once a filter design concept has been selected, the design problem becomes finding the optimal vector h ∈ R^N that most closely approximates our desired frequency response concept (we will denote such optimal vector by h*). This approximation problem will heavily depend on the measure by which we evaluate all vectors h ∈ R^N to choose h*.
In this document we consider the discrete lp norms defined by

‖a‖_p = ( Σ_k |a_k|^p )^{1/p},   a ∈ R^N   (3)

as measures of optimality, and consider a number of filter design problems based upon this criterion. The work explores the Iterative
Reweighted Least Squares (IRLS) approach as a design tool, and provides a number of algorithms based on this method. Finally, this work considers critical theoretical aspects and evaluates the numerical properties of the proposed algorithms in comparison to commonly used existing general purpose methods. It is the belief of the author (as well as the author's advisor) that the IRLS approach offers a more tailored route to the lp filter design problems considered, and that it contributes an example of a made-for-purpose algorithm best suited to the characteristics of lp filter design.
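The norm in (3) is straightforward to compute, and doing so shows why it is a useful design dial: as p grows, the measure is increasingly dominated by the largest entries, approaching the minimax (l∞) criterion. A short sketch (example vector is ours):

```python
import numpy as np

def lp_norm(a, p):
    """Discrete lp norm of eq. (3): ||a||_p = (sum_k |a_k|^p)^(1/p)."""
    return np.sum(np.abs(a)**p)**(1.0 / p)

a = np.array([0.1, -0.5, 2.0, 1.0])
norms = [lp_norm(a, p) for p in (2, 4, 10, 100)]
# as p grows, ||a||_p approaches max_k |a_k| = 2.0
```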
C. The IRLS algorithm

Iterative Reweighted Least Squares (IRLS) algorithms define a family of iterative methods that solve an otherwise complicated numerical optimization problem by breaking it into a series of weighted least squares (WLS) problems, each one easier in principle than the original problem. At iteration i one must solve a weighted least squares problem of the form

min_{h_i} ‖w(h_{i−1}) f(h_i)‖_2   (4)

where w(·) is a specific weighting function and f(·) is a function of the filter. Obviously a large class of problems could be written in this form (large in the sense that both w(·) and f(·) can be defined arbitrarily). One case worth considering is the linear approximation problem defined by

min_h ‖D − Ch‖   (5)

where D ∈ R^M and C ∈ R^{M×N} are given, and ‖·‖ is an arbitrary measure. One could write f(·) in (4) as

f(h) = D − Ch

and attempt to find a suitable function w(·) to minimize the arbitrary norm ‖·‖ in (5). In vector notation, at iteration i one can write (4) as follows,

min_{h_i} ‖w(h_{i−1}) · (D − Ch_i)‖_2   (6)

One can show that the solution of (6) for any iteration is given by

h = (C^T W C)^{−1} C^T W D

with W = diag(w^2) (where w is the weighting vector). To solve problem (6) above, one could use the following algorithm:
1) Set initial weights w_0.
2) At the i-th iteration find h_i = (C^T W_{i−1} C)^{−1} C^T W_{i−1} D.
3) Update W_i as a function of h_i (i.e. W_i = W(h_i)).
4) Iterate steps 2 and 3 until a certain stopping criterion is reached.
This method will be referred to in this work as the basic IRLS algorithm.
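The four-step procedure above can be sketched directly in code. The following is a minimal NumPy sketch of the basic IRLS loop for the linear lp problem min_h ‖D − Ch‖_p; the function and variable names are ours, and the weight update w = |ε|^{(p−2)/2} is the standard choice that makes the weighted l2 cost match the lp cost at the current iterate (it is not claimed to be the paper's exact implementation):

```python
import numpy as np

def basic_irls(C, D, p=4, iters=50, eps=1e-10):
    """Basic IRLS: approximate min_h ||D - C h||_p by a sequence of
    weighted least squares solves h = (C^T W C)^{-1} C^T W D."""
    M, N = C.shape
    w = np.ones(M)                      # step 1: initial weights w0
    h = np.zeros(N)
    for _ in range(iters):
        W = np.diag(w**2)               # W = diag(w^2)
        # step 2: closed-form weighted least squares solution
        h = np.linalg.solve(C.T @ W @ C, C.T @ W @ D)
        # step 3: update the weights from the current residual
        err = np.abs(D - C @ h) + eps   # eps guards against zero weights
        w = err**((p - 2) / 2)
        w /= w.max()                    # normalize to avoid overflow
    return h                            # step 4: fixed iteration budget

# usage: overdetermined linear fit measured in the l4 norm
rng = np.random.default_rng(0)
C = rng.standard_normal((100, 5))
D = C @ np.ones(5) + 0.1 * rng.standard_normal(100)
h = basic_irls(C, D, p=4)
```

A production version would replace the fixed iteration count in step 4 with a stopping criterion such as ‖h_i − h_{i−1}‖ falling below a tolerance.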
An IRLS algorithm is said to converge if the algorithm produces a sequence of points h_i such that

lim_{i→∞} h_i = h†

where h† is a fixed point defined by

h† = (C^T W† C)^{−1} C^T W† D

with W† = W(h†). In principle one would want h† = h* (as defined in Section I-B).
IRLS algorithms have been used in different areas of science and engineering. Their attractiveness stems from the idea of simplifying a difficult problem into a sequence of weighted least squares problems that can be solved efficiently with programs such as Matlab or LAPACK. However (as was mentioned above) success is determined by the existence of a weighting function that leads to a fixed point that happens to be at least a local solution of the problem in question. This might not be the case for any given problem. In the case of lp optimization one can justify the use of IRLS methods by means of the following theorem:
Theorem 1 (Weight Function Existence theorem): Let g_k(ω) be a Chebyshev set and define

H(h; ω) = Σ_{k=0}^{M} h_k g_k(ω)

where h = (h_0, h_1, . . . , h_M)^T. Then, given D(ω) continuous on [0, π] and 1 < q < p < ∞, the following are identical sets:
• {h | H(h; ω) is a best weighted Lp approximation to D(ω) on [0, π]}.
• {h | H(h; ω) is a best weighted Lq approximation to D(ω) on [0, π]}.
Furthermore, the theorem above is valid if the interval [0, π] is replaced by a finite point set Ω ⊂ [0, π] (this theorem is accredited to Motzkin and Walsh [2], [3]).
Theorem 1 is fundamental since it establishes that weights exist so that the solution of an Lp problem is indeed the solution of a weighted Lq problem (for arbitrary p, q > 1). Furthermore the results of Theorem 1 remain valid for lp and lq. For our purposes, this theorem establishes the existence of a weighting function so that the solution of a weighted l2 problem is indeed the solution of an lp problem; the challenge then is to find the corresponding weighting function. The remainder of this document explores this task for a number of relevant filter design problems and provides a consistent computational framework.
D. Finite Impulse Response (FIR) lp design

A Finite Impulse Response (FIR) filter is an ordered vector h ∈ R^N (where 0 < N < ∞), with a complex polynomial form in the frequency domain given by

H(ω) = Σ_{n=0}^{N−1} h_n e^{−jωn}

The filter H(ω) contains amplitude and phase components {A_H(ω), φ_H(ω)} that can be designed to suit the user's purpose. Given a desired frequency response D(ω), the general lp approximation problem is given by

min_h ‖D(ω) − H(h; ω)‖_p

In the most basic scenario D(ω) would be a complex valued function, and the optimization algorithm would minimize the lp norm of the complex error function ε(ω) = D(ω) − H(ω); we refer to this case as the complex lp design problem (refer to Section II-C).

One of the caveats of solving complex approximation problems is that the user must provide desired magnitude and phase specifications. In many applications one is interested in removing or altering a range of frequencies from a signal; in such instances it might be more convenient to only provide the algorithm with a desired magnitude function while allowing the algorithm to find a phase that corresponds to the optimal magnitude design. The magnitude lp design problem is given by

min_h ‖D(ω) − |H(h; ω)|‖_p

where D(ω) is a real, positive function. This problem is discussed in Section II-D.

Another problem that uses no phase information is the linear phase lp problem. It will be shown in Section II-B that this problem can be formulated so that only real functions are involved in the optimization problem (since the phase component of H(ω) has a specific linear form).
An interesting case results from the idea of combining different norms in different frequency bands of a desired function D(ω). One could assign different p-values for different bands (for example, minimizing the error energy (l2) in the passband while using a minimax error (l∞) approach in the stopband to keep control of noise). The frequency-varying lp problem is formulated as follows,

min_h ‖(D − H)(ω_pb)‖_p + ‖(D − H)(ω_sb)‖_q

where ω_pb and ω_sb are the passband and stopband frequency ranges respectively (and 2 < p, q < ∞).
Perhaps the most relevant problem addressed in this work is the Constrained Least Squares (CLS) problem. In a continuous sense, a CLS problem is defined by

min_h ‖D(ω) − H(ω)‖_2
subject to |D(ω) − H(ω)| ≤ τ

The idea is to minimize the error energy across all frequencies, while first ensuring that the error at each frequency does not exceed a given tolerance τ. Section II-F explains the details of this problem and shows that this type of formulation makes good sense in filter design and can efficiently be solved via IRLS methods.
1) The IRLS algorithm and FIR literature review: A common approach to dealing with highly structured approximation problems consists in breaking a complex problem into a series of simpler, smaller problems. Often, one can even prove important mathematical properties in this way. Consider the lp approximation problem introduced in (3),

min_h ‖f(h)‖_p   (7)

For simplicity at this point we can assume that f(·) : R^N → R^M is linear. It is relevant to mention that (7) is equivalent to

min_h ‖f(h)‖_p^p   (8)

In its most basic form the lp IRLS algorithm works by rewriting (8) into a weighted least squares problem of the form

min_h ‖w(h) f(h)‖_2^2   (9)

Since a linear weighted least squares problem like (9) has a closed form solution, it can be solved in one step. Then the solution is used to update the weighting function, which is kept constant for the next closed form solution and so on (as discussed in Section I-C).
One of the earlier works on the use of IRLS methods for lp approximation was written by Charles Lawson [4]–[6], in part motivated by problems that might not have a suitable l∞ algorithm. He looked at a basic form of the IRLS method to solve l∞ problems and extended it by proposing a multiplicative update of the weighting coefficients at each iteration (that is, w_{k+1}(ω) = f(ω) · w_k(ω)). Lawson's method triggered a number of papers and ideas; however his method is sensitive to the weights becoming numerically zero, in which case the algorithm must restart. A number of ideas [5], [6] have been proposed (some from Lawson himself) to prevent or deal with these occurrences, and in general his method is considered somewhat slow.
John Rice and Karl Usow [5], [7] extended Lawson's method to the general lp problem (2 < p < ∞) by developing an algorithm based on Lawson's that also updates the weights in a multiplicative form. They used the results from Theorem 1 by Motzkin and Walsh [2], [3] to guarantee that a solution indeed exists for the lp problem. They defined

w_{k+1}(ω) = w_k^α(ω) |ε_k(ω)|^β

where

α = γ(p − 2) / (γ(p − 2) + 1)

and

β = α / (2γ) = (p − 2) / (2(γ(p − 2) + 1))

with γ being a convergence parameter and ε(ω) = D(ω) − H(ω). The rest of the algorithm works the same way as the basic IRLS method; however the proper selection of γ could allow for strong convergence (note that for γ = 0 we obtain the basic IRLS algorithm).
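The Rice-Usow multiplicative update amounts to a one-line change in the weight step of the basic IRLS loop. A minimal sketch (function and variable names are ours; α and β follow the expressions given for the update, with γ the convergence parameter):

```python
import numpy as np

def rice_usow_weights(w, err, p, gamma=0.5, eps=1e-10):
    """Multiplicative Lawson-style weight update for the lp problem:
    w_{k+1} = w_k^alpha * |err_k|^beta.  Setting gamma = 0 recovers the
    basic IRLS update |err|^((p-2)/2) (up to normalization)."""
    alpha = gamma * (p - 2) / (gamma * (p - 2) + 1)
    beta = (p - 2) / (2 * (gamma * (p - 2) + 1))
    w_new = w**alpha * (np.abs(err) + eps)**beta
    return w_new / w_new.max()          # normalize for numerical safety

# sanity check: gamma = 0, p = 4 gives weights proportional to |err|
w = np.ones(4)
err = np.array([0.1, 0.2, 0.4, 0.8])
w0 = rice_usow_weights(w, err, p=4, gamma=0.0)
```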
Another approach to solve (7) consists in a partial updating strategy of the filter coefficients rather than the weights, by using a temporary coefficient vector defined by

â_{k+1} = [C^T W_k^T W_k C]^{−1} C^T W_k^T W_k A_d   (10)

The filter coefficients after each iteration are then calculated by

a_{k+1} = λ â_{k+1} + (1 − λ) a_k   (11)

where λ is a convergence parameter (with 0 < λ < 1). This approach is known as the Karlovitz method [8], and it has been claimed that it converges to the global optimal solution for even values of p such that 4 ≤ p < ∞. However, in practice several convergence problems have been found even under such assumptions. One drawback is that the convergence parameter λ has to be optimized at each iteration via an expensive line search process. Therefore the overall execution time becomes rather large.
S. W. Kahng [9] developed an algorithm based on the Newton-Raphson method that uses

λ = 1 / (p − 1)   (12)

to get

a_{k+1} = (â_{k+1} + (p − 2) a_k) / (p − 1)   (13)

This selection of λ is based upon Newton's method to minimize ε (the same result was derived independently by Fletcher, Grant and Hebden [10]). The rest of the algorithm follows Karlovitz's approach; however since λ is fixed there is no need to perform the line search for its best value. Since Kahng's method is based on Newton's method, it converges quadratically to the optimal solution. Kahng proved that his method converges for all cases of p and for any problem (at least in theory). It can be seen that Kahng's method is a particular case of Karlovitz's algorithm, with λ as defined in (12). Newton-Raphson based algorithms are not guaranteed to converge to the optimal solution unless they start somewhat close to the solution, since they require knowing and inverting the Hessian matrix of the objective function (which must be positive definite [11]). However, their associated quadratic convergence makes them an appealing option.

Burrus, Barreto and Selesnick developed a method [7], [12], [13] that combines the powerful quadratic convergence of Newton's methods with the robust initial convergence of the basic IRLS method, thus overcoming the initial sensitivity of Newton-based algorithms and the slow linear convergence of Lawson-based methods. To accelerate initial convergence, their approach to solve (7) initially uses p = 2σ, where σ is a convergence parameter (with 1 < σ ≤ 2). At any given iteration, p increases its value by a factor of σ. This is done at each iteration, so as to satisfy

p_k = min(p_des, σ p_{k−1})   (14)

where p_des corresponds to the desired lp norm. The implementation of each iteration follows Karlovitz's method using the particular selection of p given by (14).
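The p-update rule (14) is easy to embed in the basic IRLS loop. The following is a sketch under our own naming and with the simple |ε|^{(p−2)/2} weight update standing in for the full Karlovitz iteration; it illustrates the gradual homotopy from l2 toward the desired p rather than reproducing the authors' exact algorithm:

```python
import numpy as np

def irls_homotopy(C, D, p_des=6.0, sigma=1.6, iters=60, eps=1e-10):
    """lp IRLS with the gradual p-update of eq. (14):
    p_k = min(p_des, sigma * p_{k-1}), starting from the l2 solution."""
    h = np.linalg.lstsq(C, D, rcond=None)[0]   # p = 2 starting point
    p = 2.0
    for _ in range(iters):
        p = min(p_des, sigma * p)              # eq. (14)
        err = np.abs(D - C @ h) + eps
        w = err**((p - 2) / 2)
        w /= w.max()                           # guard against overflow
        W = np.diag(w**2)
        h = np.linalg.solve(C.T @ W @ C, C.T @ W @ D)
    return h

rng = np.random.default_rng(1)
C = rng.standard_normal((80, 4))
D = C @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.05 * rng.standard_normal(80)
h = irls_homotopy(C, D, p_des=6.0)
```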
Fig. 1. Homotopy approach for IRLS lp filter design.
It is worth noting that the method outlined above combines several ideas into a powerful approach. By not solving the desired lp problem from the first iteration, one avoids the potential issues of Newton-based methods, whose convergence is guaranteed only within a radius of convergence. It is well known that for 2 ≤ p ≤ ∞ there exists a continuum of lp solutions (as shown in Figure 1). By slowly increasing p from iteration to iteration one hopes to follow the continuum of solutions from l2 towards the desired p. By choosing a reasonable σ the method need only spend one iteration at any given p and still remain close enough to the optimal path. Once the algorithm reaches a neighborhood of the desired p, it can be allowed to iterate at that p, in order to converge to the optimal solution. This process is analogous to homotopy, a commonly used family of optimization methods [14].
While l2 and l∞ designs offer meaningful approaches to filter design, the Constrained Least Squares (CLS) problem offers an interesting tradeoff between both approaches [15]. In the context of filter design, the CLS problem seems to have been first presented by John Adams [16] in 1991. The problem Adams posed is a Quadratic Programming (QP) problem, well suited for off-the-shelf QP tools like those based on Lagrange multiplier theory [16]. However, Adams posed the problem in such a way that a transition band is required. Burrus et al. presented a formulation [17]–[19] where only a transition frequency is required; the transition band is induced: it does indeed exist but is not specified (it adjusts itself optimally according to the constraint specifications). The method by Burrus et al. is based on Lagrange multipliers and the Karush-Kuhn-Tucker (KKT) conditions.

An alternative to the KKT-based method mentioned above is the use of IRLS methods where a suitable weighting function serves as the constraining function over frequencies that exceed the constraint tolerance; otherwise no weights are used, effectively forcing a least-squares solution. While this idea has been suggested by Burrus et al., one of the main contributions of this work is a thorough investigation of this approach, as well as proper documentation of numerical results, theoretical findings and proper code.
E. Infinite Impulse Response (IIR) lp design

In contrast to FIR filters, an Infinite Impulse Response (IIR) filter is defined by two ordered vectors a ∈ R^N and b ∈ R^{M+1} (where 0 < M, N < ∞), with frequency response given by

H(ω) = B(ω)/A(ω) = ( Σ_{n=0}^{M} b_n e^{−jωn} ) / ( 1 + Σ_{n=1}^{N} a_n e^{−jωn} )

Hence the general lp approximation problem is

min_{a_n, b_n} ‖ ( Σ_{n=0}^{M} b_n e^{−jωn} ) / ( 1 + Σ_{n=1}^{N} a_n e^{−jωn} ) − D(ω) ‖_p   (15)

which can be posed as a weighted least squares problem of the form

min_{a_n, b_n} ‖ w(ω) [ ( Σ_{n=0}^{M} b_n e^{−jωn} ) / ( 1 + Σ_{n=1}^{N} a_n e^{−jωn} ) − D(ω) ] ‖_2^2   (16)

It is possible to design similar problems to the ones outlined in Section I-D for FIR filters. However, it is worth keeping in mind the additional complications that IIR design involves, including the nonlinear least squares problem presented in Section I-E1 below.
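For concreteness, the rational frequency response above and the lp error it induces can be evaluated numerically. A small sketch with hypothetical example coefficients (the convention a = [a_1, ..., a_N], with the leading 1 implicit, matches the formula for H(ω)):

```python
import numpy as np

def iir_freq_response(b, a, w):
    """H(w) = (sum_{n=0}^{M} b_n e^{-jwn}) / (1 + sum_{n=1}^{N} a_n e^{-jwn}),
    with a = [a_1, ..., a_N] (the leading 1 is implicit)."""
    n_b = np.arange(len(b))
    n_a = np.arange(1, len(a) + 1)
    num = np.exp(-1j * np.outer(w, n_b)) @ b
    den = 1 + np.exp(-1j * np.outer(w, n_a)) @ a
    return num / den

w = np.linspace(0, np.pi, 512)
b = np.array([0.2, 0.2])           # hypothetical numerator coefficients
a = np.array([-0.6])               # hypothetical denominator coefficient a_1
H = iir_freq_response(b, a, w)

D = (w < np.pi / 2).astype(float)  # ideal lowpass target
p = 4
lp_error = np.sum(np.abs(D - H)**p)**(1 / p)   # discretized lp error of (15)
```

For these coefficients H(0) = 0.4/0.4 = 1 and H(π) = 0, so the example behaves as a crude lowpass; minimizing lp_error over (a, b) is exactly the nonlinear problem (15).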
1) Least squares IIR literature review: The weighted nonlinear formulation presented in (16) suggests the possibility of taking advantage of the design flexibility of the FIR problems. However this comes at the expense of having to solve at each iteration a weighted nonlinear l2 problem. Solving least squares approximations with rational functions is a nontrivial problem that has been studied extensively in diverse areas including statistics, applied mathematics and electrical engineering. One of the contributions of this document is a presentation in Section III-B on the subject of l2 IIR filter design that captures and organizes previous relevant work. It also sets the framework for the proposed methods used in this document.
In the context of IIR digital filters there are three main groups of approaches to (16). Section III-B1 presents relevant work in the form of traditional optimization techniques. These are methods derived mainly from the applied mathematics community and are in general efficient and well understood. However the generality of such methods occasionally comes at the expense of being inefficient for some particular problems. Among the methods found in the literature, the Davidon-Fletcher-Powell (DFP) algorithm [20], the damped Gauss-Newton method [21], [22], the Levenberg-Marquardt algorithm [23], [24], and the method of Kumaresan [25], [26] form the basis of a number of methods to solve (15).

A different approach to (15) from traditional optimization methods consists in linearizing (16) by transforming the problem into a simpler, linear form. While in principle this proposition seems inadequate (as the original problem is being transformed), Section III-B2 presents some logical attempts at linearizing (16) and how they connect with the original problem. The concept of equation error (a weighted form of the solution error that one is actually interested in minimizing) has been introduced and employed by a number of authors. In the context of filter design, E. Levy [27] presented an equation error linearization formulation in 1959 applied to analog filters. An alternative equation error approach presented by C. S. Burrus [28] in 1987 is based on the methods by Prony [29] and Padé [30]. The method by Burrus can be applied to frequency domain digital filter design, and is used in selected stages in some of the algorithms presented in this work.
An extension of the equation error methods is the group of iterative prefiltering algorithms presented in Section III-B8. These methods build on equation error methods by weighting (or prefiltering) their equation error formulation iteratively, with the intention of converging to the minimum of the solution error. Sanathanan and Koerner [31] presented in 1963 an algorithm (SK) that builds on an extension of Levy's method by iterating on Levy's formulation. Sid-Ahmed, Chottera and Jullien [32] presented in 1978 an algorithm similar to the SK method but applied to the digital filter problem.

A popular and well understood method is the one by Steiglitz and McBride [33], [34] introduced in 1966. The SMB method is time-domain based, and has been extended to a number of applications, including the frequency domain filter design problem [35]. Steiglitz and McBride used a two-phase method based on linearization. Initially (in Mode-1) their algorithm is essentially that of Sanathanan and Koerner but in the time domain. This approach often diverges when close to the solution; therefore their method can optionally switch to Mode-2, where a more traditional derivative-based approach is used.

A more recent linearization algorithm was presented by L. Jackson [36] in 2008. His approach is an iterative prefiltering method based directly in the frequency domain, and uses diagonalization of certain matrices for efficiency.

While intuitive and relatively efficient, most linearization methods share a common problem: they often diverge close to the solution (this effect has been noted by a number of authors; a thorough review is presented in [35]). Section III-B13 presents the quasilinearization method derived by A. Soewito [35] in 1990. This algorithm is robust, efficient and well-tailored for the least squares IIR problem, and is the method of choice for this work.
II. FINITE IMPULSE RESPONSE FILTERS
This chapter discusses the problem of designing Finite Impulse Response (FIR) digital filters according to the lp error criterion using Iterative Reweighted Least Squares methods. Section II-A gives an introduction to FIR filter design, including an overview of traditional FIR design methods. For the purposes of this work we are particularly interested in l2 and l∞ design methods, and their relation to relevant lp design problems. Section II-B formally introduces the linear phase problem and presents results that are common to most of the problems considered in this work. Finally, Sections II-C through II-E present the application of the Iterative Reweighted Least Squares algorithm to other important problems in FIR digital filter design, including the relevant contributions of this work.
A. Traditional design of FIR filters

Section I-A introduced the notion of digital filters and filter design. In a general sense, an FIR filter design problem has the form

min_h ‖f(h)‖

where f(·) defines an error function that depends on h, and ‖·‖ is an arbitrary norm. While one could come up with a number of error formulations for digital filters, this chapter elaborates on the most commonly used, namely the linear phase and complex problems (both satisfy the linear form f(h) = D − Ch, as will be shown later in this chapter). As far as norms, typically the l2 and l∞ norms are used. One of the contributions of this work is to demonstrate the usefulness of the more general lp norms and their feasibility by using efficient IRLS-based algorithms.
1) Traditional design of least squares (l2) FIR filters: Typically, FIR filters are designed by discretizing a desired frequency response H_d(ω) by taking L frequency samples at {ω_0, ω_1, . . . , ω_{L−1}}. One could simply take the inverse Fourier transform of these samples and obtain L filter coefficients; this approach is known as the Frequency Sampling design method [28], which basically interpolates the frequency spectrum over the samples. However, it is often more desirable to take a large number of samples to design a small filter (large in the sense that L ≫ N, where L is the number of frequency samples and N is the filter order). The weighted least-squares (l2) norm (which considers the error energy) is defined by

ε_2 ≡ ‖ε(ω)‖_2 = ( (1/π) ∫_0^π W(ω) |D(ω) − H(ω)|^2 dω )^{1/2}   (17)

where D(ω) and H(ω) = F(h) are the desired and designed amplitude responses respectively. By acknowledging the convexity of (17), one can drop the root term; therefore a discretized form of (17) is given by

ε_2 = Σ_{k=0}^{L−1} W(ω_k) |D(ω_k) − H(ω_k)|^2   (18)

The solution of Equation (18) is given by

h = (C^T W^T W C)^{−1} C^T W^T W D   (19)

where W = diag(√w) contains the weighting vector w. By solving (19) one obtains an optimal l2 approximation to the desired frequency response D(ω). Further discussion and other variations on least squares FIR design can be found in [28].
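A direct numerical transcription of (18)-(19) is short. The sketch below designs an odd-length linear phase lowpass filter by working with a real cosine amplitude basis (anticipating Section II-B); the function, its parameters, and the brick-wall target are our own illustrative choices, not the paper's code:

```python
import numpy as np

def ls_fir_lowpass(N, L, wc, weight_stop=1.0):
    """Weighted least squares FIR design per eq. (18)-(19):
    h = (C^T W^T W C)^{-1} C^T W^T W D, for an odd-length linear
    phase filter described by its real amplitude A(w)."""
    M = (N - 1) // 2
    w = np.linspace(0, np.pi, L)
    D = (w <= wc).astype(float)              # ideal lowpass amplitude
    # amplitude basis: A(w) = c_0 + sum_{n=1}^{M} c_n * 2 cos(n w)
    C = np.hstack([np.ones((L, 1)),
                   2 * np.cos(np.outer(w, np.arange(1, M + 1)))])
    wt = np.where(w <= wc, 1.0, weight_stop)  # weighting vector
    W = np.diag(np.sqrt(wt))
    WC, WD = W @ C, W @ D
    c = np.linalg.solve(WC.T @ WC, WC.T @ WD)
    # symmetric impulse response: h(M) = c_0, h(M - n) = h(M + n) = c_n
    h = np.concatenate([c[:0:-1], c])
    return h, w, C @ c

h, w, A = ls_fir_lowpass(N=31, L=512, wc=np.pi / 2)
```

Increasing weight_stop trades passband accuracy for extra stopband attenuation, which is exactly the role of W(ω_k) in (18).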
2) Traditional design of minimax (l∞) FIR filters: In contrast to l2 design, an l∞ filter minimizes the maximum error across the designed filter's frequency response. A formal formulation of the problem [37], [38] is given by

min_h max_ω |D(ω) − H(ω; h)|   (20)

A discrete version of (20) is given by

min_h max_k |D(ω_k) − C_k h|   (21)

Within the scope of filter design, the most common approach to solving (21) is the use of the Alternation Theorem [39], in the context of linear phase filters (to be discussed in Section II-B). In a nutshell the alternation theorem states that for a length-N FIR linear phase filter there are at least N + 1 extrema points (or frequencies). The Remez exchange algorithm [28], [37], [38] aims at finding these extrema frequencies iteratively, and is the most commonly used method for the minimax linear phase FIR design problem. Other approaches use more standard linear programming methods including the Simplex algorithm [40], [41] or interior point methods such as Karmarkar's algorithm [42].

The l∞ problem is fundamental in filter design. While this document is not aimed at covering the l∞ problem in depth, portions of this work are devoted to the use of IRLS methods for standard problems as well as some innovative uses of minimax optimization.
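For reference, the Remez exchange method described above is available in standard tooling. A quick illustration using SciPy (standard library usage, not one of the paper's algorithms; band edges and lengths are arbitrary example values):

```python
import numpy as np
from scipy.signal import remez, freqz

# length-31 equiripple linear phase lowpass: passband up to 0.2,
# stopband from 0.3 (band edges in normalized frequency, fs = 1.0)
h = remez(31, [0.0, 0.2, 0.3, 0.5], [1.0, 0.0], fs=1.0)

# evaluate the frequency response on a dense grid
w, H = freqz(h, worN=1024)
```

The resulting coefficients are symmetric (a Type I linear phase filter), and the error |D − |H|| is equiripple in each band, as the Alternation Theorem predicts.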
B. Linear phase lp filter design

Linear phase FIR filters are important tools in signal processing. As will be shown below, they do not require the user to specify a phase response in their design (since the assumption is that the desired phase response is indeed linear). Besides, they satisfy a number of symmetry properties that allow for the reduction of dimensions in the optimization process, making them easier to design computationally. Finally, there are applications where a linear phase is desired as such behavior is more physically meaningful.
1) Four types of linear phase filters: The frequency response of an FIR filter h(n) is given by

H(ω) = Σ_{n=0}^{N−1} h(n) e^{−jωn}

In general, H(ω) = R(ω) + jI(ω) is a periodic complex function of ω (with period 2π). Therefore it can be written as follows,

H(ω) = R(ω) + jI(ω) = A(ω) e^{jφ(ω)}   (22)

where the magnitude response is given by

A(ω) = |H(ω)| = √(R(ω)² + I(ω)²)   (23)

and the phase response is

φ(ω) = arctan( I(ω) / R(ω) )

However A(ω) is not analytic and φ(ω) is not continuous. From a computational point of view (22) would have better properties if both A(ω) and φ(ω) were continuous analytic functions of ω; an important class of filters for which this is true is the class of linear phase filters [28].

Linear phase filters have a frequency response of the form

H(ω) = A(ω) e^{jφ(ω)}   (24)

where A(ω) is the real, continuous amplitude response of H(ω) and

φ(ω) = K1 + K2 ω

is a linear phase function in ω (hence the name); K1 and K2 are constants. The jumps in the phase response correspond to sign reversals in the magnitude as defined in (23).
Consider a length-N FIR filter (assume for the time being that N
is odd). Its frequency response is given by

H(ω) = Σ_{n=0}^{N−1} h(n) e^{−jωn} = e^{−jωM} Σ_{n=0}^{2M} h(n) e^{jω(M−n)}   (25)

where M = (N−1)/2. Equation (25) can be written as follows,

H(ω) = e^{−jωM} [ h(0)e^{jωM} + ... + h(M−1)e^{jω} + h(M)
       + h(M+1)e^{−jω} + ... + h(2M)e^{−jωM} ]   (26)
It is clear that for an odd-length FIR filter to have the linear phase
form described in (24), the term inside brackets in (26) must be a real
function (thus becoming A(ω)). By imposing even symmetry on the
filter coefficients about the midpoint (n = M), that is

h(k) = h(2M − k)

equation (26) becomes

H(ω) = e^{−jωM} [ h(M) + 2 Σ_{n=0}^{M−1} h(n) cos ω(M−n) ]   (27)

Similarly, with odd symmetry (i.e. h(k) = −h(2M − k)) equation (26)
becomes

H(ω) = e^{j(π/2 − ωM)} 2 Σ_{n=0}^{M−1} h(n) sin ω(M−n)   (28)

Note that the term h(M) disappears, as the odd symmetry condition
requires that

h(M) = −h(N − M − 1) = −h(M) = 0
Similar expressions can be obtained for an even-length FIR filter,

H(ω) = Σ_{n=0}^{N−1} h(n) e^{−jωn} = e^{−jωM} Σ_{n=0}^{N−1} h(n) e^{jω(M−n)}   (29)

where M = (N−1)/2 is no longer an integer; imposing even or odd
symmetry about this midpoint pairs the terms and leaves sums over
n = 0, ..., N/2 − 1.
It is clear that depending on the combination of N and the symmetry
of h(n), it is possible to obtain four types of filters [28], [43], [44].
Table I shows the four possible linear phase FIR filters described by
(24), where the second column refers to the type of filter symmetry.
N      Symmetry   A(ω)                                        φ(ω)
Odd    Even       h(M) + 2 Σ_{n=0}^{M−1} h(n) cos ω(M−n)     −Mω
Odd    Odd        2 Σ_{n=0}^{M−1} h(n) sin ω(M−n)            π/2 − Mω
Even   Even       2 Σ_{n=0}^{N/2−1} h(n) cos ω(M−n)          −Mω
Even   Odd        2 Σ_{n=0}^{N/2−1} h(n) sin ω(M−n)          π/2 − Mω

TABLE I
THE FOUR TYPES OF LINEAR PHASE FIR FILTERS.
2) IRLS-based methods: Section II-B1 introduced linear phase
filters in detail. In this section we cover the use of IRLS methods to
design linear phase FIR filters according to the lp optimality criterion.
Recall from Section II-B1 that for any of the four types of linear phase
filters the frequency response can be expressed as

H(ω) = A(ω) e^{j(K1 + K2 ω)}

Since A(ω) is a real continuous function as defined by Table I, one
can write the linear phase lp design problem as follows

min_a ||D(ω) − A(ω; a)||_p^p   (30)
where a relates to h through the symmetry properties outlined
in Table I. Note that the two terms inside the lp norm of the
objective function are real. By sampling (30) one can write the design
problem as follows

min_a Σ_k |D(ω_k) − A(ω_k; a)|^p

or

min_a Σ_k |D_k − C_k a|^p   (31)
where D_k is the k-th element of the vector D representing the
sampled desired frequency response D(ω_k), and C_k is the k-th row
of the trigonometric kernel matrix C defined according to Table I.
One can apply the basic IRLS approach described in Section I-C
to solve (31) by posing this problem as a weighted least squares one:
min_a Σ_k w_k |D_k − C_k a|²   (32)
The main issue becomes iteratively finding suitable weights w for
(32) so that the algorithm converges to the optimal solution a* of
the lp problem (30). Existence of adequate weights is guaranteed by
Theorem 1 as presented in Section I-C; finding these optimal weights
is indeed the difficult part. Clearly a reasonable choice for w is the
one that turns (32) into (31), namely

w = |D − C a|^{p−2}
Therefore the basic IRLS algorithm for problem (31) would be:
1) Initialize the weights w_0 (a reasonable choice is to make them
all equal to one).
2) At the i-th iteration the solution is given by

a_{i+1} = [C^T W_i^T W_i C]^{−1} C^T W_i^T W_i D   (33)

3) Update the weights with

w_{i+1} = |D − C a_{i+1}|^{p−2}

4) Repeat the last two steps until convergence is reached.
It is important to note that W_i = diag(√w_i). In practice it has
been found that this approach has practical deficiencies, since the
inversion required by (33) often leads to an ill-posed problem and,
in most cases, convergence is not achieved.
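The iteration above is easy to prototype. The following sketch is ours, not from the paper; the cosine kernel, frequency grid, and ideal response are illustrative choices, and a small floor guards the weight update against zero residuals. In line with the caveat just made, a mild exponent p = 2.5 is used, where the plain (undamped) iteration typically behaves.

```python
import numpy as np

def basic_irls(C, D, p, iters=20):
    """Basic IRLS for min_a ||D - C a||_p: alternate a weighted LS solve
    with the weight update w = |D - C a|^(p-2). Since W = diag(sqrt(w)),
    the normal equations use C^T diag(w) C."""
    a = np.linalg.lstsq(C, D, rcond=None)[0]      # unweighted l2 start
    for _ in range(iters):
        err = np.maximum(np.abs(D - C @ a), 1e-12)  # guard zero residuals
        w = err ** (p - 2.0)
        a = np.linalg.solve(C.T * w @ C, C.T @ (w * D))
    return a

# Illustrative Type-I setup: A(w_k) = sum_m a_m cos(m w_k) = (C a)_k
wgrid = np.linspace(0, np.pi, 200)
C = np.cos(np.outer(wgrid, np.arange(8)))
D = (wgrid <= 0.4 * np.pi).astype(float)          # ideal low-pass
a = basic_irls(C, D, p=2.5)
```

For larger p this plain iteration can oscillate or diverge, which is exactly what motivates the partial-update variants discussed next.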
As mentioned before, the basic IRLS method has drawbacks that
make it unsuitable for practical implementations. Charles Lawson
considered a version of this algorithm applied to the solution of l∞
problems (for details refer to [4]). His method has linear convergence
and is prone to problems with proportionately small residuals that
could lead to zero weights and the need for restarting the algorithm. In
the context of lp optimization, Rice and Usow [5] built upon Lawson's
method by adapting it to lp problems. Like Lawson's method, the
algorithm by Rice and Usow updates the weights in a multiplicative
manner; their method shares similar drawbacks with Lawson's. Rice
and Usow defined

w_{i+1}(ω) = w_i^α(ω) |ε_i(ω)|^β

where

α = (p − 2) / ((p − 2) + 1)

and

β = α/2 = (p − 2) / (2(p − 2) + 2)

and follow the basic algorithm.
L. A. Karlovitz realized the computational problems associated
with the basic IRLS method and improved on it by partially updating
the filter coefficient vector. He defines

â_{i+1} = [C^T W_i^T W_i C]^{−1} C^T W_i^T W_i D   (34)

and uses â_{i+1} in

a_{i+1} = λ â_{i+1} + (1 − λ) a_i   (35)

where λ ∈ [0, 1] is a partial step parameter that must be adjusted at
each iteration. Karlovitz's method [8] has been shown to converge
globally for even values of p (where 2 ≤ p < ∞). In practice,
convergence problems have been found even under such assumptions.
Karlovitz proposed the use of line searches to find the optimal value
of λ at each iteration, which basically creates an independent optimization
problem nested inside each iteration of the IRLS algorithm.
While this search process for the optimal λ makes Karlovitz's method
computationally impractical, his work indicates the feasibility of
IRLS methods and proves that partial updating indeed overcomes
some of the problems in the basic IRLS method. Furthermore,
Karlovitz's method is the first one to depart from a multiplicative
updating of the weights in favor of an additive updating of the filter
coefficients. In this way some of the problems in the Lawson-Rice-
Usow approach are overcome, especially the need for restarting the
algorithm.
S. W. Kahng built upon the findings of Karlovitz by considering
the process of finding an adequate λ for partial updating. He applied
the Newton-Raphson method to this problem and proposed a closed
form solution for λ, given by

λ = 1 / (p − 1)   (36)

resulting in

a_{i+1} = λ â_{i+1} + (1 − λ) a_i   (37)

The rest of Kahng's algorithm follows Karlovitz's approach. However,
since λ is fixed, there is no need to perform the line search
at each iteration. Kahng's method has an added benefit: since it
uses Newton's method to find λ, the algorithm tends to converge
much faster than previous approaches. It has indeed been shown to
converge quadratically. However, Newton-Raphson-based algorithms
are not guaranteed to converge globally unless at some point the
current solution lies close enough to the true solution, within the
radius of convergence [11]. Fletcher, Grant and Hebden [10] derived
the same results independently.
Burrus, Barreto and Selesnick [7], [12], [13] modified Kahng's
method in several important ways in order to improve on its initial
and final convergence rates and the method's stability (we refer to this
method as BBS). The first improvement is analogous to a homotopy
[14]. Up to this point all efforts in lp filter design attempted to solve
the actual lp problem from the first iteration. In general there is no
reason to believe that an initial guess derived from an unweighted
l2 formulation (that is, the l2 design that one would get by setting
w_0 = 1) will look in any way similar to the actual lp solution that
one is interested in. However, it is known that there exists a continuity
of lp solutions for 1 < p < ∞. In other words, if a*_2 is the optimal
l2 solution, there exists a p for which the optimal lp solution a*_p is
arbitrarily close to a*_2; that is, for a given ε > 0

||a*_2 − a*_p|| ≤ ε   for some p ∈ (2, ∞)

This fact allows one to gradually move from an lp solution to an
lq solution.
To accelerate initial convergence, the BBS method of Burrus et al.
initially solves for l2 by setting p_0 = 2 and then sets p_i = σ p_{i−1},
where σ is a convergence parameter defined by 1 < σ ≤ 2. Therefore
at the i-th iteration

p_i = min(p_des, σ p_{i−1})   (38)

where p_des corresponds to the desired lp solution. The implementation
of each iteration follows Karlovitz's method with Kahng's choice
of λ, using the particular selection of p given by (38).
To summarize, define the class of IRLS algorithms as follows: after
i iterations, given a vector a_i, the IRLS iteration requires two steps:
1) Find w_i = f(a_i)
2) Find a_{i+1} = g(w_i, a_i)
The following is a summary of the IRLS-based algorithms discussed
so far and their corresponding updating functions:
1) Basic IRLS algorithm.
   w_i = |D − C a_i|^{p−2}
   W_i = diag(√w_i)
   a_{i+1} = [C^T W_i^T W_i C]^{−1} C^T W_i^T W_i D

2) Rice-Usow-Lawson (RUL) method.
   w_i = w_{i−1}^α |D − C a_i|^{α/2}
   W_i = diag(w_i)
   a_{i+1} = [C^T W_i^T W_i C]^{−1} C^T W_i^T W_i D
   α = (p−2)/((p−2)+1) constant

3) Karlovitz' method.
   w_i = |D − C a_i|^{p−2}
   W_i = diag(√w_i)
   a_{i+1} = λ [C^T W_i^T W_i C]^{−1} C^T W_i^T W_i D + (1 − λ) a_i
   λ constant

4) Kahng's method.
   w_i = |D − C a_i|^{p−2}
   W_i = diag(√w_i)
   a_{i+1} = (1/(p−1)) [C^T W_i^T W_i C]^{−1} C^T W_i^T W_i D + ((p−2)/(p−1)) a_i

5) BBS method.
   p_i = min(p_des, σ p_{i−1})
   w_i = |D − C a_i|^{p_i−2}
   W_i = diag(√w_i)
   a_{i+1} = (1/(p_i−1)) [C^T W_i^T W_i C]^{−1} C^T W_i^T W_i D + ((p_i−2)/(p_i−1)) a_i
   σ constant
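The updates summarized above can be condensed into one short routine. The following sketch implements the BBS iteration, i.e. the p-stepping of (38) combined with Kahng's step λ = 1/(p_i − 1); the variable names, kernel setup, and residual floor are our illustrative choices, not the paper's code.

```python
import numpy as np

def bbs_irls(C, D, p_des, sigma=1.5, iters=30):
    """BBS method: IRLS with p stepped as p_i = min(p_des, sigma*p_{i-1})
    and Kahng's partial update lam = 1/(p_i - 1)."""
    a = np.linalg.lstsq(C, D, rcond=None)[0]   # l2 initial guess (p0 = 2)
    p = 2.0
    for _ in range(iters):
        p = min(p_des, sigma * p)              # homotopy in p, eq. (38)
        w = np.maximum(np.abs(D - C @ a), 1e-12) ** (p - 2.0)
        a_hat = np.linalg.solve(C.T * w @ C, C.T @ (w * D))
        lam = 1.0 / (p - 1.0)                  # Kahng's step size
        a = lam * a_hat + (1.0 - lam) * a      # Karlovitz partial update
    return a

wgrid = np.linspace(0, np.pi, 200)
C = np.cos(np.outer(wgrid, np.arange(8)))
D = (wgrid <= 0.4 * np.pi).astype(float)
a10 = bbs_irls(C, D, p_des=10)
```

Because p grows geometrically from 2, each iterate starts near the previous, slightly-lower-p solution, which is what gives the method its fast and stable initial convergence.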
3) Modified adaptive IRLS algorithm: Much of the performance
of a method is based upon whether it can actually converge given a
certain error measure. In the case of the methods described above,
both convergence rate and stability play an important role in their
performance. Both the Karlovitz and RUL methods are supposed to
converge linearly, while Kahng's and the BBS methods converge
quadratically, since they both use a Newton-based additive update
of the weights.
Barreto showed in [12] that the modified version of Kahng's
method (the BBS method) typically converges faster than the RUL algorithm.
However, this approach presents some peculiar problems that depend
on the transition bandwidth Δf. For some particular values of Δf, the
BBS method will result in an ill-posed weight matrix that causes the
lp error to increase dramatically after a few iterations, as illustrated
in Figure 2 (where Δf = Δω/2π).

Fig. 2. Error jumps on IRLS methods.

Two facts can be derived from the examples in Figure 2: for this
particular bandwidth the error increased slightly after the fifth and
eleventh iterations, and increased dramatically after the sixteenth.
Also, it is worth noticing that after such an increase, the error started
to decrease quadratically and that, at a certain point, the error
became flat (thus reaching the numerical accuracy limits of the digital
system).
The effects of different values of Δf were studied to find out whether a
relationship between Δf and the error increase could be determined.
Figure 3 shows the lp error for different values of Δf and for σ = 1.7.
It can be seen that some particular bandwidths cause the algorithm
to produce a very large error.
Our studies (as well as previous work by J. A. Barreto [12])
demonstrate that this error explosion occurs only for a small range of
bandwidth specifications.

Fig. 3. Relationship between bandwidth and error jumps (lp error for different bandwidths, fp = 0.2).

Under most circumstances the BBS method
exhibits fast convergence properties to the desired solution. However,
at this point it is not understood what causes the error increase, and
therefore this event cannot be anticipated. In order to avoid this
problem, we propose the use of an adaptive scheme that modifies
the BBS step. As p increases, the step from a current lp guess to the
next also increases, as described in (38). In other words, at the i-th
iteration one approximates the l_{p_i} solution (as long as the algorithm
has not yet reached the desired p); at the next iteration one approximates
the l_{p_{i+1}} solution. There is always a possibility that these two solutions lie far
enough apart that the algorithm takes a descent step such that the l_{p_{i+1}}
guess ends up too far away from the actual l_{p_{i+1}} solution. This is better
illustrated in Figure 4.

Fig. 4. A step too long for IRLS methods.

The conclusions derived above suggest the possibility of using an
adaptive algorithm [45] that changes the value of σ so that the
error always decreases. This idea was implemented by calculating
temporary new weight and filter coefficient vectors that do not
become the updated versions unless their resulting error is smaller
than the previous one. If this is not the case, the algorithm tries
two values of σ, namely

σ_L = σ(1 − δ)   and   σ_H = σ(1 + δ)   (39)

(where δ is an updating variable). The resulting errors for each
attempt are calculated, and σ is updated according to the value that
produced the smallest error. The error of this new σ is compared
to the error of the non-updated weights and coefficients; if the
new σ produces a smaller error, then such vectors are updated,
otherwise another update of σ is performed. The modified adaptive
IRLS algorithm can be summarized as follows:
1) Find the unweighted approximation a_0 = [C^T C]^{−1} C^T D
and use p_0 = 2 (with 1 < σ ≤ 2)
Fig. 5. FIR design example using adaptive method: a) lp error obtained with
the adaptive method (fp = 0.2, Δf = 0.048, σ0 = 1.75); b) change of σ.
2) Iteratively solve (34) and (35) using λ_i = 1/(p_i − 1) and find the
resulting error ε_i for the i-th iteration
3) If ε_i ≥ ε_{i−1},
– Calculate (39)
– Select whichever of σ_L and σ_H produces the smaller error and
compare that error with ε_i, until a value of σ is found that results
in a decreasing error
Otherwise iterate as in the BBS algorithm.
The algorithm described above changes the value of σ whenever it
would cause the algorithm to produce a large error. The value of σ is
updated as many times as necessary without changing the values of the
weights, the filter coefficients, or p. If an optimal value of σ exists, the
algorithm will find it and continue with this new value until another
update in σ becomes necessary.
The algorithm described above was implemented for several combinations
of σ and δ; in all cases the new algorithm converged faster
than the BBS algorithm (unless the values of σ and δ are such that
the error never increases). The results are shown in Figure 5.a for the
specifications from Figure 2. Whereas using the BBS method for this
particular case results in a large error after the sixteenth iteration, the
adaptive method converged before ten iterations.
Figure 5.b illustrates the change of σ per iteration in the adaptive
method, using an update factor of δ = 0.1. The lp error stops
decreasing after the fifth iteration (where the BBS method introduces
the large error); however, the adaptive algorithm adjusts the value of
σ so that the lp error continues to decrease. The algorithm decreased
the initial value of σ from 1.75 to its final value of 1.4175 (at the
expense of only one additional iteration with σ = 1.575).
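The adaptive scheme can be sketched as follows. This is our simplification of the bookkeeping described above: errors computed at neighboring values of p are compared directly, the number of σ retries is capped, and the kernel setup is the same illustrative one used earlier.

```python
import numpy as np

def irls_step(C, D, a, p_new):
    """One Karlovitz/Kahng iteration, eqs. (34)-(35), at exponent p_new."""
    w = np.maximum(np.abs(D - C @ a), 1e-12) ** (p_new - 2.0)
    a_hat = np.linalg.solve(C.T * w @ C, C.T @ (w * D))
    lam = 1.0 / (p_new - 1.0)                  # Kahng's step size
    a_new = lam * a_hat + (1.0 - lam) * a
    return a_new, np.sum(np.abs(D - C @ a_new) ** p_new)

def adaptive_irls(C, D, p_des, sigma=1.75, delta=0.1, iters=30):
    """Adaptive BBS sketch: if the error grows, retry the step with
    sigma*(1 - delta) or sigma*(1 + delta) before accepting an update."""
    a = np.linalg.lstsq(C, D, rcond=None)[0]   # unweighted l2 start
    p, err_prev = 2.0, np.inf
    for _ in range(iters):
        a_try, err = irls_step(C, D, a, min(p_des, sigma * p))
        tries = 0
        while err >= err_prev and tries < 8:   # sigma update loop, eq. (39)
            cands = [sigma * (1 - delta), sigma * (1 + delta)]
            results = [irls_step(C, D, a, min(p_des, s * p)) for s in cands]
            best = 0 if results[0][1] <= results[1][1] else 1
            sigma, (a_try, err) = cands[best], results[best]
            tries += 1
        a, err_prev = a_try, err
        p = min(p_des, sigma * p)
    return a

wgrid = np.linspace(0, np.pi, 200)
C = np.cos(np.outer(wgrid, np.arange(8)))
D = (wgrid <= 0.4 * np.pi).astype(float)
a = adaptive_irls(C, D, p_des=10)
```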
One result worth noting is the relationship between l2 and
l∞ solutions and how they compare to lp designs. Figure 6
shows a comparison of designs for a length-21 Type-I linear
phase low-pass FIR filter with transition band defined by
f = {0.2, 0.24}. The curve shows the l2 versus l∞ errors (namely
ε2 and ε∞); the values of p used to make this curve were p =
{2, 2.2, 2.5, 3, 4, 5, 7, 10, 15, 20, 30, 50, 60, 100, 150, 200, 400}
(Matlab's firls and firpm functions were used to design the l2
and l∞ filters respectively). Note the very small decrease in ε∞ after
p reaches 100. The curve suggests that a better compromise between
ε2 and ε∞ can be reached by choosing 2 < p < ∞. Furthermore,
to get better results one can concentrate on values between p = 5
and p = 20; fortunately, for such low values of p no numerical
complications arise and convergence is reached in a few iterations.

Fig. 6. Relationship between l2 and l∞ errors for lp FIR filter design.
C. Complex lp problem
The design of linear phase filters has been discussed intensively
in the literature. For the two most common error criteria (l2 and l∞),
optimal solution algorithms exist. The least squares filter can
be found by solving an overdetermined system of equations, whereas
the Chebyshev filter is easily found by using either the Remez
algorithm or linear programming. For many typical applications, linear
phase filters are good enough; however, when arbitrary magnitude
and phase constraints are required, a more complicated approach
must be taken, since such a design results in a complex approximation
problem. By replacing C in the linear phase algorithm with a complex
Fourier kernel matrix, and the real desired frequency vector D with
a complex one, one can use the same algorithm from Section II-B3
to design complex lp filters.
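A sketch of that substitution, assuming a complex exponential kernel and a stacked real/imaginary formulation so that the same real-valued least squares core can be reused; the length, grid, and desired response below are illustrative choices, not the paper's.

```python
import numpy as np

N = 15                                           # filter length (illustrative)
wgrid = np.linspace(0, np.pi, 200)
# Complex Fourier kernel: H(w_k; h) = sum_n h(n) e^{-j w_k n} = (C h)_k
C = np.exp(-1j * np.outer(wgrid, np.arange(N)))
# Complex desired response: low-pass magnitude with a linear phase
D = (wgrid <= 0.4 * np.pi) * np.exp(-1j * wgrid * (N - 1) / 2)
# Stack real and imaginary parts so a real-valued LS/IRLS solver applies
C_r = np.vstack([C.real, C.imag])
D_r = np.concatenate([D.real, D.imag])
h = np.linalg.lstsq(C_r, D_r, rcond=None)[0]     # l2 start for the lp iteration
```

The weighted solves of the IRLS loop go through unchanged on (C_r, D_r); only the kernel and desired vector were swapped.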
D. Magnitude lp problem
In some applications, the effects of phase are not a necessary factor
to consider when designing a filter. For these applications, control of
the filter's magnitude response is a priority for the designer. In order
to improve the magnitude response of a filter, one must not explicitly
include a phase, so that the optimization algorithm can look for the
best filter that approximates a specified magnitude without also being
constrained to optimize for a phase response.
1) Power approximation formulation: The magnitude approximation
problem can be formulated as follows:

min_h || D(ω) − |H(ω; h)| ||_p^p   (40)

Unfortunately, the second term inside the norm (namely the absolute
value function) is not differentiable when its argument is zero.
Although one could propose ways to work around this problem,
we propose the use of a different design criterion, namely the
approximation of a desired magnitude squared. The resulting problem
is

min_h || D(ω)² − |H(ω; h)|² ||_p^p
The autocorrelation r(n) of a causal length-N FIR filter h(n) is
given by

r(n) = h(n) * h(−n) = Σ_{k=−(N−1)}^{N−1} h(k) h(n+k)   (41)

The Fourier transform of the autocorrelation r(n) is known as the
Power Spectral Density function [46] R(ω) (or simply the PSD), and
is defined as follows,

R(ω) = Σ_{n=−(N−1)}^{N−1} r(n) e^{−jωn}
     = Σ_{n=−(N−1)}^{N−1} Σ_{k=−(N−1)}^{N−1} h(k) h(n+k) e^{−jωn}

From the properties of the Fourier transform [47, §3.3] one can show
that there exists a frequency domain relationship between h(n) and
r(n) given by

R(ω) = H(ω) H*(ω) = |H(ω)|²
This relationship suggests a way to design magnitude-squared filters,
namely by using the filter's autocorrelation coefficients instead of the
filter coefficients themselves. In this way, one can avoid the use of
the non-differentiable magnitude response.
An important property to note at this point is that since
the filter coefficients are real, one can see from (41) that the
autocorrelation function r(n) is symmetric; thus it is sufficient to
consider its last N values. As a result, the PSD can be written as

R(ω) = Σ_n r(n) e^{−jωn} = r(0) + Σ_{n=1}^{N−1} 2 r(n) cos ωn

in a similar way to the linear phase problem.
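The identities above are easy to verify numerically. The following sketch, with an arbitrary example filter, checks that the cosine form of R(ω) built from only the last N autocorrelation values equals |H(ω)|²:

```python
import numpy as np

h = np.array([0.2, 0.5, 1.0, 0.5, 0.2])          # arbitrary real filter
N = len(h)
r = np.correlate(h, h, mode="full")[N - 1:]      # r(0), ..., r(N-1)

w = np.linspace(0, np.pi, 64)
H = np.exp(-1j * np.outer(w, np.arange(N))) @ h  # frequency response
# Cosine form of the PSD: R(w) = r(0) + sum_{n>=1} 2 r(n) cos(w n)
R = r[0] + 2.0 * np.cos(np.outer(w, np.arange(1, N))) @ r[1:]
```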
The symmetry property introduced above allows for the use of
the lp linear phase algorithm of Section II-B to obtain the
autocorrelation coefficients of h(n). However, there is an important
step missing in this discussion: how to obtain the filter coefficients
from the autocorrelation. To achieve this goal, one can follow a
procedure known as spectral factorization. The objective is to use the
autocorrelation coefficients r ∈ R^N instead of the filter coefficients
h ∈ R^N as the optimization variables. The variable transformation is
done using (41), which is not a one-to-one transformation. As a
consequence, there is a necessary condition for a vector r ∈ R^N to
be a valid autocorrelation vector of a filter. This is summarized [48]
in the spectral factorization theorem, which states that r ∈ R^N is the
autocorrelation function of a filter h(n) if and only if R(ω) ≥ 0 for all
ω ∈ [0, π]. This turns out to be a necessary and sufficient condition
[48] for the existence of r(n). Once the autocorrelation vector r
is found using existing robust interior-point algorithms, the filter
coefficients can be calculated via spectral factorization techniques.
lter h, the problem presented in (40) can be rewritten as
L()
2
R()U()
2
[0, ] (42)
In (42) the existence condition R()0 is redundant since 0L()
2
and, thus, is not included in the problem denition. For each , the
constraints of (42) constitute a pair of linear inequalities in the vector
r; therefore the constraint is convex in r. Thus the change of variable
transforms a nonconvex optimization problem in h into a convex
problem in r.
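Spectral factorization itself can be sketched via polynomial root finding: the roots of the symmetric autocorrelation polynomial come in reciprocal pairs, and keeping the roots inside the unit circle yields a minimum-phase factor. Root-based factorization is one standard technique among several; the step of actually finding r by interior-point methods is omitted here, and the example guarantees validity of r by building it from a known filter.

```python
import numpy as np

def spectral_factor(r):
    """Recover a spectral factor h (minimum-phase choice) from a valid
    autocorrelation r = [r(0), ..., r(N-1)], i.e. one with R(w) >= 0."""
    # Full symmetric sequence r(-(N-1)), ..., r(N-1); as a polynomial
    # its roots come in reciprocal pairs {z, 1/z}.
    rfull = np.concatenate([r[::-1], r[1:]])
    roots = np.roots(rfull)
    inside = roots[np.abs(roots) < 1]        # keep one root of each pair
    h = np.real(np.poly(inside))             # monic polynomial from them
    return h * np.sqrt(r[0] / np.sum(h ** 2))  # rescale so sum h^2 = r(0)

# Round trip: autocorrelation of a known filter, then refactor it
h_true = np.array([1.0, 2.0, 3.0, 2.0, 0.5])
r = np.correlate(h_true, h_true, mode="full")[len(h_true) - 1:]
h_min = spectral_factor(r)
```

The recovered h_min generally differs from the original filter (spectral factorization is unique only up to allpass factors), but its autocorrelation matches r.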
E. lp error as a function of frequency
Previous sections have discussed the importance of the complex
least-squares and Chebyshev error criteria in the context of filter design.
In many applications either of these two approaches provides
adequate results. However, a case could be made where one might
want to minimize the error energy in one range of frequencies while
keeping control of the maximum error in a different band. This idea
is particularly interesting when one considers the use of different
lp norms in different frequency bands. In principle one would be
interested in solving
min_h ||D(ω_pb) − H(ω_pb; h)||_p + ||D(ω_sb) − H(ω_sb; h)||_q   (43)

where ω_pb ∈ Ω_pb and ω_sb ∈ Ω_sb represent the pass and stopband
frequencies respectively. In principle one would want Ω_pb ∪ Ω_sb = Ω.
Therefore problem (43) can be written as

min_h ( Σ_{ω∈Ω_pb} |D(ω) − H(ω; h)|^p )^{1/p} + ( Σ_{ω∈Ω_sb} |D(ω) − H(ω; h)|^q )^{1/q}   (44)
One major obstacle in (44) is the presence of the roots around the
summation terms. These roots prevent us from writing (44) in a
simple vector form. Instead, one can consider the use of a similar
metric function as follows

min_h Σ_{ω∈Ω_pb} |D(ω) − H(ω; h)|^p + Σ_{ω∈Ω_sb} |D(ω) − H(ω; h)|^q   (45)
This expression is similar to (44) but does not include the root terms.
An advantage of using the IRLS approach on (45) is that one can
formulate this problem in the frequency domain and properly separate
residual terms from different bands into different vectors. In this
manner, the lp modified measure given by (45) can be made into
a frequency-dependent function of p(ω) as follows,

min_h ||D(ω) − H(ω; h)||_{p(ω)}^{p(ω)} = Σ_ω |D(ω) − H(ω; h)|^{p(ω)}

Therefore this frequency-varying lp problem can be solved following
the modified IRLS algorithm outlined in Section II-B3 with the
following modification: at the i-th iteration the weights are updated
according to

w_i = |D − C a_i|^{p(ω)−2}
It is fundamental to note that the proposed method does not actually
solve a linear combination of lp norms. In fact, it can be shown
that expression (45) is not a norm but a metric. While from a
theoretical perspective this fact might make (45) a less interesting
distance, as it turns out one can use (45) to solve the far more
interesting CLS problem, as discussed below in Section II-F.
F. Constrained Least Squares (CLS) problem
One of the common obstacles to innovation occurs when knowledge
settles on a particular way of dealing with problems. While
new ideas keep appearing that suggest innovative approaches to designing
digital filters, it is all too common in practice that l2 and l∞ dominate
error criteria specifications. This section is devoted to exploring a
different way of thinking about digital filters. It is important to
note that up to this point we are not discussing an algorithm yet;
the main concern being brought into play here is the specification
(or description) of the design problem. Once the Constrained Least
Squares (CLS) problem formulation is introduced, we will present
an IRLS implementation to solve it and will justify our approach
over other existing approaches. It is the author's belief that under
general conditions one should always use our IRLS implementation
over other methods, especially when considering the associated
management of transition regions.
The CLS problem was introduced in Section I-D and is repeated
here for clarity,

min_h ||D(ω) − H(ω; h)||_2
s.t.  |D(ω) − H(ω; h)| ≤ τ   ∀ω   (46)
To the best of our knowledge this problem was first introduced in the
context of filter design by John Adams [16] in 1991. The main idea
consists of iteratively approximating a desired frequency response in a
least squares sense, except in the event that any frequency exhibits an
error larger than a specified tolerance τ. At each iteration the problem
is adjusted in order to reduce the error at the offending frequencies
(i.e. those which do not meet the constraint specifications). Ideally,
convergence is reached when the altered least squares problem
has a frequency response whose error does not exceed the constraint
specifications. As will be shown below, this goal might not be attained
depending on how the problem is posed.
Adams and some collaborators have worked on this problem and
several variations [15]. However, his main (and original) problem was
illustrated in [16] with the following important assumption: the definition
of a desired frequency response must include a fixed non-zero-width
transition band. His method uses Lagrange multiplier theory
and alternation methods to find frequencies that exceed the constraints
and minimize the error at such locations, with an overall least squares
error criterion.
Burrus, Selesnick and Lang [17] looked at this problem from a
similar perspective, but relaxed the design specifications so that only
a transition frequency needs to be specified. The actual transition
band does indeed exist, and it centers itself around the specified
transition frequency; its width adjusts as the algorithm iterates (constraint
tolerances are still specified). Their solution method is similar
to Adams' approach, and explicitly uses the Karush-Kuhn-Tucker
(KKT) conditions together with an alternation method to minimize
the least squares error while constraining the maximum error to meet
specifications.
C. S. Burrus and the author of this work have been working
on the CLS problem using IRLS methods with positive results.
This document is the first thorough presentation of the method,
contributions, results and code for this approach, and constitutes one
of the main contributions of this work. It is crucial to note that
there are two separate issues in this problem: on one hand there
is the matter of the actual problem formulation, mainly depending
on whether a transition band is specified or not; on the other hand
there is the question of how the selected problem description is
actually met (what algorithm is used). Our approach follows the
problem description by Burrus et al. shown in [17] with an IRLS
implementation.
1) Two problem formulations: As mentioned in Section II-F, one
can address problem (46) in two ways depending on how one views
the role of the transition band in a CLS problem. The original problem
posed by Adams in [16] can be written as follows,

min_h ||D(ω) − H(ω; h)||_2
s.t.  |D(ω) − H(ω; h)| ≤ τ   ∀ω ∈ [0, ω_pb] ∪ [ω_sb, π]   (47)

where 0 < ω_pb < ω_sb < π. From a traditional standpoint this formulation
feels familiar. It assigns fixed frequencies to the transition band
edges, as a number of filter design techniques do. As it turns out,
however, one might not want to do this in CLS design.
An alternate formulation to (47) could implicitly introduce a
transition frequency ω_tb (where ω_pb < ω_tb < ω_sb); the user only
specifies ω_tb. Consider

min_h ||D(ω) − H(ω; h)||_2   ω ∈ [0, π]
s.t.  |D(ω) − H(ω; h)| ≤ τ   ∀ω ∈ [0, ω_pb] ∪ [ω_sb, π]   (48)

The algorithm at each iteration generates an induced transition band
in order to satisfy the constraints in (48). Therefore {ω_pb, ω_sb} vary
at each iteration.
Fig. 7. Two formulations for Constrained Least Squares problems.
It is critical to point out the differences between (47) and (48).
Figure 7.a explains Adams' CLS formulation, where the desired filter
response is only specified at the fixed pass and stop bands. At any
iteration, Adams' method attempts to minimize the least squares error
(ε2) at both bands while trying to satisfy the constraint τ. Note
that one could think of the constraint requirements in terms of the
Chebyshev error by writing (47) as follows,

min_h ||D(ω) − H(ω; h)||_2
s.t.  ||D(ω) − H(ω; h)||_∞ ≤ τ   ω ∈ [0, ω_pb] ∪ [ω_sb, π]

In contrast, Figure 7.b illustrates our proposed problem (48). The idea
is to minimize the least squared error ε2 across all frequencies while
ensuring that the constraints are met in an intelligent manner. At this
point one can think of the interval (ω_pb, ω_sb) as an induced transition
band, useful for the purposes of constraining the filter. Section II-F2
presents the actual algorithms that solve (48), including the process
of finding {ω_pb, ω_sb}.
It is important to note an interesting behavior of transition bands
and extrema points in l2 and l∞ filters. Figure 8 shows l2 and
l∞ length-15 linear phase filters (designed using Matlab's firls
and firpm functions); the transition band was specified at {ω_pb =
0.4π, ω_sb = 0.5π}. The dotted l2 filter illustrates an important
behavior of least squares filters: typically the maximum error of an
l2 filter is located at the transition band. The solid l∞ filter shows
why minimax filters are important: despite their larger error across
most of the bands, the filter shows the same maximum error at all
extrema points, including the transition band edge frequencies. In
a CLS problem, then, an algorithm will typically attempt to iteratively
reduce the maximum error (usually located around the transition
band) of a series of least squares filters.
Another important fact results from the relationship between the
transition band width and the resulting error amplitude in l∞ filters.
Figure 9 shows two l∞ designs; the transition bands were set at
{0.4π, 0.5π} for the solid line design, and at {0.4π, 0.6π} for
the dotted line one. One can see that widening the transition band
induces a decrease in error ripple amplitude.
These two results together illustrate the importance of the transition
bandwidth for a CLS design. Clearly one can decrease maximum
error tolerances by widening the transition band. Yet finding the
perfect balance between a transition bandwidth and a given tolerance
can prove a difficult task, as will be shown in Section II-F2.

Fig. 8. Comparison of l2 and l∞ filters.

Fig. 9. Effects of transition bands in l∞ filters.

Hence
the relevance of a CLS method that is not restricted by two types
of specifications competing against each other. In principle, one
should just determine how much error one can live with, and allow
an algorithm to find the optimal transition band that meets such
tolerance.
2) Two problem solutions: Section II-F1 introduced some important
remarks regarding the behavior of extrema points and transition
bands in l2 and l∞ filters. As one increases the constraints on an l2
filter, the result is a filter whose frequency response looks more and
more like that of an l∞ filter.
Section II-E introduced the frequency-varying problem and an
IRLS-based method to solve it. It was also mentioned that, while the
method does not solve the intended problem (but a similar one), it
could prove to be useful for the CLS problem. As it turns out, in CLS
design one is merely interested in solving an unweighted, constrained
least squares problem. In this work, we achieve this by solving a
sequence of weighted, unconstrained least squares problems, where
the sole role of the weights is to constrain the maximum error of
the frequency response at each iteration. In other words, one would
like to find weights w such that
like to nd weights w such that
min
h
|D() H(; h)|2
s.t. |D() H(; h)| [0,
pb
] [
sb
, ]
is equivalent to
min
h
|w() (D() H(; h))|2
Hence one can revisit the frequency-varying design method and use
it to solve the CLS problem. Assuming that one can reasonably
approximate l∞ by using high values of p, at each iteration the main
idea is to use an lp weighting function only at frequencies where the
constraints are exceeded. A formal formulation of this statement is

w(ε(ω)) = |ε(ω)|^{(p−2)/2}   if |ε(ω)| > τ
          1                  otherwise

Assuming a suitable weighting function existed such that the
specified tolerances are related to the frequency response constraints,
the IRLS method would iterate and assign rather large weights to
frequencies exceeding the constraints, while inactive frequencies get
a weight of one. As the method iterates, frequencies with large errors
move the response closer to the desired tolerance. Ideally, all the
active constraint frequencies would eventually meet the constraints.
Therefore the task becomes to find a suitable weighting function that
penalizes large errors so that all frequencies satisfy the constraints;
once this condition is met, we have reached the desired solution.
Fig. 10. CLS polynomial weighting function. a) Polynomial weighting (p = 100); b) linear-log view (p = 100); c) linear-log view (p = 500).
One proposed way to find adequate weights to meet constraints is given by a polynomial weighting function of the form

w(\omega) = 1 + \left| \frac{\varepsilon(\omega)}{\tau} \right|^{\frac{p-2}{2}}

where τ effectively serves as a threshold to determine whether a weight is dominated by either unity or the familiar lp weighting term. Figure 10 illustrates the behavior of such a curve.
Fig. 11. Original l2 guess for CLS algorithm.
In practice the method outlined above has proven robust, particularly in connection with the specified transition band design. Consider the least squares design in Figure 11 (using a length-21 Type-I linear phase low-pass FIR filter with linear transition frequencies {0.2, 0.25}). This example illustrates the typical effect of CLS methods over l2 designs; the largest error (in an l∞ sense) can be located at the edges of the transition band. Figures 12 and 13 illustrate
Fig. 12. CLS design example using mild constraints (τ = 0.06).
Fig. 13. CLS design example using tight constraints (τ = 0.03).
Fig. 14. CLS design example without transition bands. a) L2 solution; b) intermediate solution; c) CLS solution.
design examples using the proposed approach. Figure 12 shows an example of a mild constraint (τ = 0.06), whereas Figure 13 illustrates an advantage of this method, associated to a hard constraint (τ = 0.03). The method tries iteratively to reduce the maximum error towards the constraint; however, the specified constraint in Figure 13 is such that even at the point where an equiripple response is reached for the specified transition bands the constraint is not met. At this point the method converges to an optimal lp solution that approximates equiripple as p increases (the examples provided use p = 50).
A different behavior occurs when no transition bands are defined. Departing from an initial l2 guess (as shown in Figure 14.a), the proposed IRLS-based CLS algorithm begins weighting frequencies selectively in order to reduce the l∞ error towards the constraints at each iteration. Eventually an equiripple behavior can be observed if the constraints are too harsh (as in Figure 14.b). The algorithm will keep weighting until all frequencies meet the constraints (as in Figure 14.c). The absence of a specified transition band presents some ambiguity in defining valid frequencies for weighting. One cannot (or rather should not) apply weights too close to the specified transition frequency, as this would result in an effort by the algorithm to create a steep transition region (which, as mentioned previously, is counterintuitive to finding an equiripple solution). In a sense, this would mean having two opposite effects working at the same time; the algorithm cannot accommodate both, usually leading to numerical problems.
Fig. 15. Definition of induced transition band.
In order to avoid these issues, an algorithm can be devised that selects a subset of the sampled frequencies for weighting purposes at each iteration. The idea is to identify the largest ripple per band at each iteration (the ripple associated with the largest error for a given band) and select the frequencies within that band with errors equal to or smaller than such ripple error. In this way one avoids weighting frequencies around the transition frequency. This idea is illustrated in Figure 15.
The previous example is fundamental since it illustrates the relevance of this method: since for a particular transition band the tightest constraint that one can get is given by the equiripple (or minimax) design (as shown in Section II-F1), a problem might arise when specifications are tighter than what the minimax design can meet. Adams found this problem (as reported in [16]); his method breaks under these conditions. The method proposed here overcomes an inadequate constraint and relaxes the transition band to meet the constraint.
It is worth noting that the polynomial weighting form works even when no transition bands are specified (this must become evident from Figure 14.c above). However, the user must be aware of some practical issues related to this approach. Figure 16 shows a typical CLS polynomial weighting function. Its spiky character becomes more dramatic as p increases (the method still follows the homotopy and partial updating ideas from previous sections), as shown in Figure
Fig. 16. CLS weights. a) Weights at an early iteration; b) weights near convergence.
16.b. It must be evident that the algorithm will assign heavy weights to frequencies with large errors, but as p increases the difference in weighting exaggerates. At some point the user must make sure that proper sampling is done to ensure that frequencies with large weights (from a theoretical perspective) are being included in the problem, without compromising computational efficiency (by means of massive oversampling, which can lead to ill-conditioning in numerical least squares methods). Also, as p increases, the range of frequencies with significantly large weights becomes narrower, thus reducing the overall weighting effect and affecting convergence speed.
Fig. 17. CLS envelope weighting function.
A second weighting form can be defined where envelopes are used. The envelope weighting function approach works by assigning a weight to all frequencies not meeting a constraint. The values of such weights are assigned as flat intervals, as illustrated in Figure 17. Intervals are determined by the edge frequencies within neighborhoods around peak error frequencies for which constraints are not met. Clearly these neighborhoods could change at each iteration. The weight of the k-th interval is still determined by our typical expression,

w_k(\omega) = |\varepsilon(\omega_k^+)|^{\frac{p-2}{2}}

where \omega_k^+ is the frequency with largest error within the k-th interval.
Envelope weighting has been applied in practice with good results. It is particularly effective at reaching high values of p without ill-conditioning, allowing for a true alternative to minimax design. Figure 18 shows an example using τ = 0.4; the algorithm managed to find a solution for p = 500. By specifying transition bands and unachievable constraints one can produce an almost equiripple solution in an efficient manner, with the added flexibility that milder constraints will result in CLS designs.
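A minimal sketch of constructing such envelope weights from a vector of error samples follows. The function name and the normalization of each interval's peak error by τ are our own choices here (normalizing keeps violating intervals at weights above unity); the flat-per-interval assignment is the point of the envelope approach.

```python
import numpy as np

def envelope_weights(eps, tau, p):
    """Flat weight per contiguous interval where |eps| exceeds tau;
    each interval is weighted by its own peak error."""
    w = np.ones(len(eps))
    exceed = (np.abs(eps) > tau).astype(int)
    # Edges of contiguous violating runs (pad so boundary runs are caught)
    padded = np.concatenate(([0], exceed, [0]))
    edges = np.flatnonzero(np.diff(padded))
    for lo, hi in zip(edges[::2], edges[1::2]):
        peak = np.abs(eps[lo:hi]).max()           # error at omega_k^+
        w[lo:hi] = (peak / tau) ** ((p - 2) / 2)  # flat over the interval
    return w

eps = np.array([0.1, 0.5, 0.9, 0.5, 0.1, 0.1, 0.6, 0.1])
w = envelope_weights(eps, tau=0.4, p=4)
```

Here two violating intervals are found; every frequency in the first one receives the weight determined by its peak error 0.9, regardless of its own error.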
Fig. 18. CLS design example using envelope weights. a) CLS envelope solution; b) envelope weights.
3) Comparison with the lp problem: This chapter presented two problems with similar effects. On one hand, Section II-B3 illustrated the fact (see Figure 6) that as p increases towards infinity, an lp filter will approximate an l∞ one. On the other hand, Section II-F presented the constrained least squares problem, and introduced IRLS-based algorithms that produce filters that approximate equiripple behavior as the constraint specifications tighten.
A natural question arises: how do these methods compare with each other? In principle it should be possible to compare their performances, as long as the necessary assumptions about the problem to be solved are compatible in both methods. Figure 19 shows a comparison of these algorithms with the following specifications:
- Both methods designed length-21 Type-I lowpass linear phase digital filters with fixed transition bands defined by f = {0.2, 0.24} (in normalized linear frequency).
- The lp experiment used the following values of p = {2, 2.2, 2.5, 3, 4, 5, 7, 10, 15, 20, 30, 50, 70, 100, 170, 400}.
- The CLS experiment used the polynomial weighting method with fixed transition bands and a value of p = 60. The error tolerances were τ = {.06, .077, .078, .08, .084, .088, .093, .1, .11, .12, .13, .14, .15, .16, .17, .18}.
Some conclusions can be derived from Figure 19. Even though the two curves seem to meet at their extremes, the CLS curve lies just below the lp curve for most values of p and τ. These two facts should be expected: on one hand, in principle the CLS algorithm gives an l2 filter if the constraints are so mild that they are not active for any frequency after the first iteration (hence the two curves should match
around p = 2). On the other hand, once the constraints become too harsh, the fixed transition band CLS method basically should design an equiripple filter, as only the active constraint frequencies are lp-weighted (this effect is more noticeable with higher values of p). Therefore for tight constraints the CLS filter should approximate an l∞ filter.
The reason why the CLS curve lies under the lp curve is that for a given error tolerance τ (which could be interpreted as a given minimax error) the CLS method finds the optimal l2 filter. An lp filter is optimal in an lp sense; it is not meant to be optimal in either the l2 or l∞ senses. Hence for a given τ it cannot beat the CLS filter in an l2 sense (it can only match it, which happens around p = 2 or p = ∞).
It is important to note that the two curves are not drastically different. While the CLS curve represents optimality in an l2-l∞ sense, not all the problems mentioned in this work can be solved using CLS filters (for example, the magnitude IIR problem presented in Section III-C2). Also, one of the objectives of this work is to motivate the use of lp norms for filter design problems, and the proposed CLS implementations (which depend entirely on IRLS-based lp formulations) are good examples of the flexibility and value of the lp IRLS methods discussed in this work.
Fig. 19. Comparison between CLS and lp problems.
III. INFINITE IMPULSE RESPONSE FILTERS
Chapter II introduced the problem of designing lp FIR filters, along with several design scenarios and their corresponding design algorithms. This chapter considers the design of lp IIR filters and examines the similarities and differences compared to lp FIR filter design. It was mentioned in Section I-D that lp FIR design involves a polynomial approximation. The problem becomes more complicated in the case of IIR filters, as the approximation problem is a ratio of two polynomials. In fact, the case of FIR polynomial approximation is a special form of IIR rational approximation where the denominator is equal to 1.
Infinite Impulse Response (or recursive) digital filters constitute an important analysis tool in many areas of science (such as signal processing, statistics and biology). The problem of designing IIR filters has been the object of extensive study. Several approaches are typically used in designing IIR filters, but a general procedure follows: given a desired filter specification (which may consist of an impulse response or a frequency specification), a predetermined approximation error criterion is optimized. Although one of the most widely used error criteria for Finite Impulse Response (FIR) filters is the least-squares criterion (which in most scenarios merely requires the solution of a linear system), least-squares (l2) approximation for IIR filters requires an optimization over an infinite number of filter coefficients (in the time domain approximation case). Furthermore, optimizing for an IIR frequency response leads to a rational (nonlinear) approximation problem rather than the polynomial problem of FIR design.
As discussed in the previous chapter, a successful IRLS-based lp algorithm depends to a large extent on the solution of a weighted l2 problem. One could argue that one of the most important aspects contrasting FIR and IIR lp filter design lies in the l2 optimization step. This chapter presents the theoretical and computational issues involved in the design of both l2 and lp IIR filters and explores several approaches taken to handle the resulting nonlinear l2 optimization problem. Section III-A introduces the IIR filter formulation and the nonlinear least-squares design problem. Section III-B presents the l2 problem more formally, covering relevant methods as a manner of background and to lay down a framework for the approach proposed in this work. Some of the methods covered here date back to the 1960s, yet others are the result of current active work by a number of research groups; the approach employed in this work is described in Section III-B13. Finally, Section III-C considers different design problems concerning IIR filters in an lp sense, including IIR versions of the complex, frequency-varying and magnitude filter design problems, as well as the proposed algorithms and their corresponding results.
A. IIR filters
An IIR filter describes a system with input x(n) and output y(n), related by the following expression

y(n) = \sum_{k=0}^{M} b(k) x(n-k) - \sum_{k=1}^{N} a(k) y(n-k)

Since the current output y(n) depends on the input as well as on N previous output values, the output of an IIR filter might not be zero well after x(n) becomes zero (hence the name Infinite). Typically IIR filters are described by a rational transfer function of the form

H(z) = \frac{B(z)}{A(z)} = \frac{b_0 + b_1 z^{-1} + \cdots + b_M z^{-M}}{1 + a_1 z^{-1} + \cdots + a_N z^{-N}}    (49)

where

H(z) = \sum_{n=0}^{\infty} h(n) z^{-n}    (50)

and h(n) is the infinite impulse response of the filter. Its frequency response is given by

H(\omega) = H(z)\big|_{z = e^{j\omega}}    (51)
Substituting (49) into (51) we obtain

H(\omega) = \frac{B(\omega)}{A(\omega)} = \frac{\sum_{n=0}^{M} b_n e^{-j\omega n}}{1 + \sum_{n=1}^{N} a_n e^{-j\omega n}}    (52)
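Relations (49)-(52) can be checked numerically: running the difference equation for an impulse input and taking the DFT of the (decaying) response approximates the rational frequency response (52). The first-order filter below is an arbitrary stable example chosen for illustration.

```python
import numpy as np

# Example filter: H(z) = (b0 + b1 z^-1) / (1 + a1 z^-1)
b = np.array([1.0, 0.5])
a = np.array([1.0, -0.9])   # a[0] = 1 by convention

# Impulse response via the difference equation
# y(n) = sum_k b(k) x(n-k) - sum_{k>=1} a(k) y(n-k)
L = 512
x = np.zeros(L); x[0] = 1.0
y = np.zeros(L)
for n in range(L):
    acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
    acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
    y[n] = acc

# Frequency response via (52): rational function of e^{-j w n}
w = 2 * np.pi * np.arange(L) / L
num = sum(b[n] * np.exp(-1j * w * n) for n in range(len(b)))
den = 1 + sum(a[n] * np.exp(-1j * w * n) for n in range(1, len(a)))
H_rational = num / den

# DFT of the truncated impulse response approximates H(w); the pole at
# 0.9 makes the truncation error after 512 samples negligible
H_dft = np.fft.fft(y)
```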
Given a desired frequency response D(ω), the l2 IIR design problem consists of solving the following problem

\min_{a_n, b_n} \left\| \frac{B(\omega)}{A(\omega)} - D(\omega) \right\|_2^2    (53)

for the M + N + 1 real filter coefficients a_n, b_n with ω ∈ Ω (where Ω is the set of frequencies for which the approximation is done). A discrete version of (53) is given by

\min_{a_n, b_n} \sum_{k} \left| \frac{\sum_{n=0}^{M} b_n e^{-j\omega_k n}}{1 + \sum_{n=1}^{N} a_n e^{-j\omega_k n}} - D(\omega_k) \right|^2    (54)

where ω_k are the L frequency samples over which the approximation is made. Clearly, (54) is a nonlinear least squares optimization problem with respect to the filter coefficients.
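To make the nonlinearity of (54) concrete, the following sketch (our own formulation) builds the real-valued residual vector whose squared norm is (54), stacking real and imaginary parts; any general nonlinear least squares routine could be pointed at this function. As a sanity check, the residual vanishes when D(ω) is generated by the model itself.

```python
import numpy as np

def residual(coeffs, wk, D, M, N):
    """Real residual whose squared norm is the discrete l2 error (54).
    coeffs = [b0..bM, a1..aN]."""
    b, a = coeffs[:M + 1], coeffs[M + 1:]
    E = np.exp(-1j * np.outer(wk, np.arange(max(M, N) + 1)))
    B = E[:, :M + 1] @ b               # numerator B(w_k)
    A = 1 + E[:, 1:N + 1] @ a          # denominator A(w_k)
    r = B / A - D
    return np.concatenate([r.real, r.imag])

# Zero residual when D comes from the same rational model
wk = np.linspace(0, np.pi, 64)
b_true, a_true = np.array([1.0, 0.4]), np.array([-0.5])
D = (1.0 + 0.4 * np.exp(-1j * wk)) / (1 - 0.5 * np.exp(-1j * wk))
r = residual(np.concatenate([b_true, a_true]), wk, D, M=1, N=1)
```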
B. Least squares design of IIR filters
Section III-A introduced the IIR least squares design problem, as illustrated in (54). Such a problem cannot be solved in the same manner as in the FIR case; therefore more sophisticated methods must be employed. As will be discussed later in Section III-C, some tradeoffs are desirable for lp optimization. As in the case of FIR design, when designing lp IIR filters one must use l2 methods as internal steps over which one iterates while moving between different values of p. Clearly this internal iteration must not be too demanding computationally since an outer lp loop will invoke it repeatedly (this process will be further illustrated in Section III-C1). With this issue in mind, one needs to select an l2 algorithm that remains accurate within reasonable error bounds while remaining computationally efficient.
This section begins by summarizing some of the traditional approaches that have been employed for l2 rational approximation, both within and outside filter design applications. Amongst the several existing traditional nonlinear optimization approaches, the Davidon-Fletcher-Powell (DFP) and the Gauss-Newton methods have been often used and remain relatively well understood in the filter design community. A brief introduction to both methods is presented in Section III-B1, and their caveats briefly explored.
An alternative to attacking a complex nonlinear problem like (54) with general nonlinear optimization tools consists in linearization, an attempt to linearize a nonlinear problem and to solve it by using linear optimization tools. Multiple efforts have been applied to similar problems in different areas of statistics and systems analysis and design. Section III-B2 introduces the notion of an equation error, a linear expression related to the actual solution error that one is interested in minimizing in l2 design. The equation error formulation is nonetheless important for a number of filter design methods (including the ones presented in this work) such as Levy's method, one of the earliest and most relevant frequency domain linearization approaches. Section III-B4 presents a frequency domain equation error algorithm based on the methods by Prony and Padé. This algorithm illustrates the usefulness of the equation error formulation as it is fundamental to the implementation of the methods proposed later in this work (in Section III-C).
An important class of linearization methods falls under the name of iterative prefiltering algorithms, presented in Section III-B8. The Sanathanan-Koerner (SK) algorithm and the Steiglitz-McBride (SMB) methods are well known and commonly used examples in this category, and their strengths and weaknesses are explored. Another recent development in this area is the method by Jackson, also presented in this section. Finally, Soewito's quasilinearization (the method of choice for least squares IIR approximation in this work) is presented in Section III-B13.
1) Traditional optimization methods: One way to address (54) is to attempt to solve it with general nonlinear optimization tools. One of the most typical approaches in nonlinear optimization is to apply either Newton's method or a Newton-based algorithm. One assumption of Newton's method is that the optimization function resembles a quadratic function near the solution. In order to update a current estimate, Newton's method requires first and second order information through the use of gradient and Hessian matrices. A quasi-Newton method is one that estimates in a certain way the second order information based on gradients (by generalizing the secant method to multiple dimensions).
One of the most commonly used quasi-Newton methods in IIR filter design is the Davidon-Fletcher-Powell (DFP) method [20]. In 1970 K. Steiglitz [49] used the DFP method to solve an IIR magnitude approximation to a desired real frequency response. For stability concerns he used a cascade form of the IIR filter given in (49) through

H(z) = \prod_{r=1}^{M} \frac{1 + a_r z^{-1} + b_r z^{-2}}{1 + c_r z^{-1} + d_r z^{-2}}    (55)
Therefore he considered the following problem,

\min_{a_r, b_r, c_r, d_r} \sum_{k} \left[ \left| \prod_{r=1}^{M} \frac{1 + a_r e^{-j\omega_k} + b_r e^{-2j\omega_k}}{1 + c_r e^{-j\omega_k} + d_r e^{-2j\omega_k}} \right| - D(\omega_k) \right]^2

His method is a direct implementation of the DFP algorithm applied to the problem described above.
In 1972 Andrew Deczky [50] employed the DFP algorithm to solve a complex IIR least-p approximation to a desired frequency response. Like Steiglitz, Deczky chose to employ the cascaded IIR structure of (55), mainly for stability reasons but also because he claims that for this structure it is simpler to derive the first order information required for the DFP method.
The MATLAB Signal Processing Toolbox includes a function called INVFREQZ, originally written by J. Smith and J. Little [22]. Invfreqz uses the algorithm by Levy (see III-B3) as an initial step and then begins an iterative algorithm based on the damped Gauss-Newton method [21] to minimize the solution error according to the least-squares error criterion. This method performs a line search after every iteration to find the optimal direction for the next step. Invfreqz evaluates the roots of A(z) after each iteration to verify that the poles of H(z) lie inside the unit circle; otherwise it will convert such a pole into its reciprocal. This approach guarantees a stable filter.
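The pole-reflection safeguard just described can be sketched as follows (a simplified stand-in, not the toolbox code): any root of A(z) outside the unit circle is replaced by its reciprocal, which preserves the shape of |A(e^{jω})| up to a constant gain factor.

```python
import numpy as np

def stabilize(a):
    """Reflect poles of H(z) = B(z)/A(z) that lie outside the unit circle.
    a = [1, a1, ..., aN] are the denominator coefficients of (49)."""
    poles = np.roots(a)
    outside = np.abs(poles) > 1
    # p -> 1/conj(p) preserves |A(e^{jw})| up to a constant gain factor
    poles[outside] = 1.0 / np.conj(poles[outside])
    a_new = np.poly(poles)
    return np.real(a_new)   # coefficients stay real for conjugate pairs

a = np.array([1.0, -2.5, 1.0])   # poles at 2.0 and 0.5
a_stable = stabilize(a)
```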
Among other Newton-based approaches, Spanos and Mingori [23] use a Newton algorithm combined with the Levenberg-Marquardt technique to improve the algorithm's convergence properties. Their idea is to express the denominator function A(ω) as a sum of second-order rational polynomials. Thus H(ω) can be written as

H(\omega) = \sum_{r=1}^{L-1} \frac{b_r + j\omega \beta_r}{a_r + j\omega \alpha_r - \omega^2} + d

Their global descent approach is similar to the one presented in [24]. As with any Newton-based method, this approach suffers under a poor initial guess, and is not guaranteed to converge to a local minimum. However, when convergence does occur, it is quadratic.
Kumaresan's method [25] considers a three-step approach. It is not clear whether his method attempts to minimize the equation error or the solution error. He uses divided differences [26] to reformulate the solution error in terms of the coefficients a_k. Using Lagrange multiplier theory, he defines

c = y^T C^T [C C^T]^{-1} C y    (56)

where y = [H_0 \; H_1 \; \cdots \; H_{L-1}]^T contains the frequency samples and C is a composition matrix containing the frequency divided differences and the coefficients a_k (a more detailed derivation can be found in [51]). Equation (56) is iterated until convergence of the coefficient vector a is reached. This vector is used as the initial guess in the second step, involving a Newton-Raphson search for the optimal a that minimizes \|c\|_2. Finally the vector b is found by solving a linear system of equations.
2) Equation error linearization methods: Typically, general-use optimization tools prove effective in finding a solution. However, in the context of IIR filter design, they often tend to take a rather large number of iterations, generate large matrices or require complicated steps like solving or estimating (and often inverting) vectors and matrices of first and second order information [35]. Using gradient-based tools for nonlinear problems like (54) certainly seems like a suboptimal approach. Also, typical Newton-based methods tend to converge quickly (quadratically), yet they make assumptions about radii of convergence and initial proximity to the solution (otherwise performance is suboptimal). In the context of filter design one should wonder if better performance could be achieved by exploiting characteristics of the problem. This section introduces the concept of linearization, an alternative to general optimization methods that has proven successful in the context of rational approximation. The main idea behind linearization approaches consists in transforming a complex nonlinear problem into a sequence of linear ones, an idea that is parallel to the approach followed in our development of IRLS lp optimization.
A common notion used in this work (as well as some of the literature related to linearization and filter design) is that there are two different error measures that authors often refer to. It is important to recognize the differences between them as one browses through the literature. Typically one would be interested in minimizing the l2 error given by:

\varepsilon_s = \| c_s(\omega) \|_2^2 = \left\| D(\omega) - \frac{B(\omega)}{A(\omega)} \right\|_2^2    (57)

This quantity is often referred to as the solution error (denoted by \varepsilon_s); we refer to the function c_s(\omega) in (57) as the solution error function. Also, in linearization algorithms the following measure often arises,

\varepsilon_e = \| c_e(\omega) \|_2^2 = \| A(\omega) D(\omega) - B(\omega) \|_2^2    (58)

This measure is often referred to as the equation error (denoted by \varepsilon_e); we refer to the function c_e(\omega) in (58) as the equation error function. Keeping the notation previously introduced, it can be seen that the two errors are related, one being a weighted version of the other,

c_e(\omega) = A(\omega) c_s(\omega)
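The relation between the two error functions is easy to verify numerically on an arbitrary example (the responses below are made up purely for illustration):

```python
import numpy as np

# Numerical check of c_e(w) = A(w) * c_s(w) on a small example
w = np.linspace(0, np.pi, 32)
A = 1 - 0.8 * np.exp(-1j * w)                     # denominator A(w)
B = 1.0 + 0.3 * np.exp(-1j * w)                   # numerator  B(w)
D = np.exp(-2j * w) / (1 + 0.1 * np.cos(w))       # some desired response

cs = D - B / A      # solution error function
ce = A * D - B      # equation error function
```

Minimizing ||ce|| therefore amounts to minimizing a version of ||cs|| weighted by |A(ω)|, which is the key observation behind the linearization methods that follow.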
3) Levy's method: E. C. Levy [27] considered in 1959 the following problem in the context of analog systems (electrical networks, to be more precise): define¹

H(j\omega) = \frac{B_0 + B_1 (j\omega) + B_2 (j\omega)^2 + \cdots}{A_0 + A_1 (j\omega) + A_2 (j\omega)^2 + \cdots} = \frac{B(\omega)}{A(\omega)}    (59)

Given L samples of a desired complex-valued function D(j\omega_k) = R(\omega_k) + jI(\omega_k) (where R, I are both real functions of \omega), Levy defines

\varepsilon(\omega) = D(j\omega) - H(j\omega) = D(j\omega) - \frac{B(\omega)}{A(\omega)}

¹For consistency with the rest of this document, notation has been modified from the author's original paper whenever deemed necessary.
or

\varepsilon = \sum_{k=0}^{L-1} |\varepsilon(\omega_k)|^2 = \sum_{k=0}^{L-1} |A(\omega_k) D(j\omega_k) - B(\omega_k)|^2    (60)
Observing the linear structure (in the coefficients A_k, B_k) of equation (60), Levy proposed minimizing the quantity \varepsilon. He actually realized that this measure (what we would denote as the equation error) was indeed a weighted version of the actual solution error that one might be interested in; in fact, the denominator function A(\omega) became the weighting function.
Levy's proposed method for minimizing (60) begins by writing \varepsilon as follows,
\varepsilon = \sum_{k=0}^{L-1} \left[ (R_k \sigma_k - \omega_k \tau_k I_k - \alpha_k)^2 + (\omega_k \tau_k R_k + \sigma_k I_k - \omega_k \beta_k)^2 \right]    (61)

by recognizing that (59) can be reformulated in terms of its real and imaginary parts,

H(j\omega) = \frac{\alpha + j\omega\beta}{\sigma + j\omega\tau}

with

\alpha + j\omega\beta = (B_0 - B_2\omega^2 + B_4\omega^4 - \cdots) + j\omega(B_1 - B_3\omega^2 + B_5\omega^4 - \cdots)

\sigma + j\omega\tau = (A_0 - A_2\omega^2 + A_4\omega^4 - \cdots) + j\omega(A_1 - A_3\omega^2 + A_5\omega^4 - \cdots)
and performing appropriate manipulations². Note that the optimal set of coefficients A_k, B_k must satisfy

\frac{\partial \varepsilon}{\partial A_0} = \frac{\partial \varepsilon}{\partial A_1} = \cdots = \frac{\partial \varepsilon}{\partial B_0} = \cdots = 0

The conditions introduced above generate a linear system in the filter coefficients. Levy derives the system

C x = y    (62)

where C = [C_1 \; C_2] with
C_1 = \begin{bmatrix}
\lambda_0 & 0 & -\lambda_2 & 0 & \lambda_4 & \cdots \\
0 & \lambda_2 & 0 & -\lambda_4 & 0 & \cdots \\
\lambda_2 & 0 & -\lambda_4 & 0 & \lambda_6 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \\
T_1 & -S_2 & -T_3 & S_4 & T_5 & \cdots \\
S_2 & T_3 & -S_4 & -T_5 & S_6 & \cdots \\
T_3 & -S_4 & -T_5 & S_6 & T_7 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots &
\end{bmatrix}

C_2 = \begin{bmatrix}
T_1 & S_2 & -T_3 & -S_4 & T_5 & \cdots \\
-S_2 & T_3 & S_4 & -T_5 & -S_6 & \cdots \\
T_3 & S_4 & -T_5 & -S_6 & T_7 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \\
U_2 & 0 & -U_4 & 0 & U_6 & \cdots \\
0 & U_4 & 0 & -U_6 & 0 & \cdots \\
U_4 & 0 & -U_6 & 0 & U_8 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots &
\end{bmatrix}
²For further details on the algebraic manipulations involved, the reader should refer to [27].
and

x = \begin{bmatrix} B_0 \\ B_1 \\ B_2 \\ \vdots \\ A_1 \\ A_2 \\ \vdots \end{bmatrix}
\qquad
y = \begin{bmatrix} S_0 \\ T_1 \\ S_2 \\ T_3 \\ \vdots \\ 0 \\ U_2 \\ 0 \\ U_4 \\ \vdots \end{bmatrix}    (63)
with

\lambda_h = \sum_{l=0}^{L-1} \omega_l^h \qquad
S_h = \sum_{l=0}^{L-1} \omega_l^h R_l \qquad
T_h = \sum_{l=0}^{L-1} \omega_l^h I_l \qquad
U_h = \sum_{l=0}^{L-1} \omega_l^h (R_l^2 + I_l^2)
Solving for the vector x from (62) gives the desired coefficients (note the trivial assumption that A_0 = 1). It is important to remember that although Levy's algorithm leads to a linear system of equations in the coefficients, his approach is indeed an equation error method. MATLAB's invfreqz function uses an adaptation of Levy's algorithm for its least-squares equation error solution.
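In the digital-frequency setting used in this work, the same equation error minimizer can be obtained without forming Levy's matrices explicitly: since A(ω)D(ω) − B(ω) is linear in the coefficients, stacking real and imaginary parts yields an ordinary linear least squares problem. The sketch below is our own formulation (a discrete-frequency analogue of (62), not Levy's analog-domain system); it recovers an exactly rational response exactly.

```python
import numpy as np

def eq_error_fit(wk, D, M, N):
    """Equation-error fit: min || A(w) D(w) - B(w) ||_2, linear in the
    coefficients b0..bM, a1..aN (with A normalized so that a0 = 1)."""
    EB = np.exp(-1j * np.outer(wk, np.arange(M + 1)))
    EA = np.exp(-1j * np.outer(wk, np.arange(1, N + 1)))
    # A D - B = 0  <=>  B(w) - (A(w) - 1) D(w) = D(w), i.e. G [b; a] = D
    G = np.hstack([EB, -D[:, None] * EA])
    Gr = np.vstack([G.real, G.imag])          # stack real and imaginary parts
    Dr = np.concatenate([D.real, D.imag])
    x, *_ = np.linalg.lstsq(Gr, Dr, rcond=None)
    return x[:M + 1], x[M + 1:]               # b, then a1..aN

wk = np.linspace(0.1, np.pi - 0.1, 40)
D = (1 + 0.4 * np.exp(-1j * wk)) / (1 - 0.6 * np.exp(-1j * wk))
b, a = eq_error_fit(wk, D, M=1, N=1)
```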
4) Prony-based equation error linearization: A number of algorithms that consider the approximation of functions in a least-squared sense using rational functions relate to Prony's method. This section summarizes these methods, especially in the context of filter design.
5) Prony's method: The first method considered in this section is due to Gaspard Riche, Baron de Prony, a Lyonnais mathematician and physicist who, in 1795, proposed to model the expansion properties of different gases by sums of damped exponentials. His method [29] approximates a sampled function f(n) (where f(n) = 0 for n < 0) with a sum of N exponentials,

f(n) = \sum_{k=1}^{N} c_k e^{s_k n} = \sum_{k=1}^{N} c_k \lambda_k^n    (64)

where \lambda_k = e^{s_k}. The objective is to determine the N parameters c_k and the N parameters s_k in (64) given 2N samples of f(n).
It is possible to express (64) in matrix form as follows,

\begin{bmatrix}
1 & 1 & \cdots & 1 \\
\lambda_1 & \lambda_2 & \cdots & \lambda_N \\
\vdots & \vdots & & \vdots \\
\lambda_1^{N-1} & \lambda_2^{N-1} & \cdots & \lambda_N^{N-1}
\end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{bmatrix}
=
\begin{bmatrix} f(0) \\ f(1) \\ \vdots \\ f(N-1) \end{bmatrix}    (65)
System (65) has a Vandermonde structure with N equations, but 2N unknowns (both c_k and \lambda_k are unknown) and thus it cannot be solved directly. Yet the major contribution of Prony's work is to recognize that f(n) as given in (64) is indeed the solution of a homogeneous order-N Linear Constant Coefficient Difference Equation (LCCDE) [52, ch. 4] given by

\sum_{p=0}^{N} a_p f(m-p) = 0    (66)

with a_0 = 1. Since f(n) is known for 0 \leq n \leq 2N-1, we can extend (66) into an (N \times N) system of the form
\begin{bmatrix}
f(N-1) & f(N-2) & \cdots & f(0) \\
f(N) & f(N-1) & \cdots & f(1) \\
\vdots & \vdots & & \vdots \\
f(2N-2) & f(2N-3) & \cdots & f(N-1)
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{bmatrix}
= -\hat{f}    (67)

where

\hat{f} = \begin{bmatrix} f(N) \\ f(N+1) \\ \vdots \\ f(2N-1) \end{bmatrix}
which we can solve for the coefficients a_p. Such coefficients are then used in the characteristic equation [53, 2.3] of (66),

\lambda^N + a_1 \lambda^{N-1} + \cdots + a_{N-1} \lambda + a_N = 0    (68)

The N roots \lambda_k of (68) are called the characteristic roots of (66). From the \lambda_k we can find the parameters s_k using s_k = \ln \lambda_k. Finally, it is now possible to solve (65) for the parameters c_k.
The method described above is an adequate representation of Prony's original method [29]. More detailed analysis is presented in [54]-[57] and [58, 11.4]. Prony's method is an adequate algorithm for interpolating 2N data samples with N exponentials. Yet it is not a filter design algorithm as it stands. Its connection with IIR filter design, however, exists and will be discussed in the following sections.
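The three steps of the method translate directly into code. The following is a minimal sketch (the function name is ours) that interpolates 2N samples with N exponentials: solve (67) for the LCCDE coefficients, root the characteristic equation (68), then solve the Vandermonde system (65).

```python
import numpy as np

def prony(f, N):
    """Prony's method: fit f(n), n = 0..2N-1, with sum_k c_k lam_k^n."""
    f = np.asarray(f, dtype=float)
    # Step 1: solve (67) for the LCCDE coefficients a_1..a_N
    F = np.array([[f[N - 1 + i - j] for j in range(N)] for i in range(N)])
    a = np.linalg.solve(F, -f[N:2 * N])
    # Step 2: roots of the characteristic equation (68)
    lam = np.roots(np.concatenate(([1.0], a)))
    # Step 3: solve the Vandermonde system (65) for c_k
    V = np.vander(lam, N, increasing=True).T      # row n holds lam_k^n
    c = np.linalg.solve(V, f[:N].astype(complex))
    return c, lam

# Recover two exponentials from 4 samples
n = np.arange(4)
f = 3.0 * 0.5 ** n + 2.0 * 0.25 ** n
c, lam = prony(f, 2)
```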
6) Padé's method: The work by Prony served as inspiration to Henri Padé, a French mathematician who in 1892 published a work [30] discussing the problem of rational approximation. His objective was to approximate a function that could be represented by a power series expansion using a rational function of two polynomials.
Assume that a function f(x) can be represented with a power series expansion of the form

f(x) = \sum_{k=0}^{\infty} c_k x^k    (69)

Padé's idea was to approximate f(x) using the function

\hat{f}(x) = \frac{B(x)}{A(x)}    (70)

where

B(x) = \sum_{k=0}^{M} b_k x^k

and

A(x) = 1 + \sum_{k=1}^{N} a_k x^k
The objective is to determine the coefficients a_k and b_k so that the first M + N + 1 terms of the residual

r(x) = A(x) f(x) - B(x)

disappear (i.e. the first N + M derivatives of f(x) and \hat{f}(x) are equal [59]). That is [60],

r(x) = A(x) \sum_{k=0}^{\infty} c_k x^k - B(x) = x^{M+N+1} \sum_{k=0}^{\infty} d_k x^k

To do this, consider A(x) f(x) = B(x) [56],

(1 + a_1 x + \cdots + a_N x^N)(c_0 + c_1 x + \cdots + c_i x^i + \cdots) = b_0 + b_1 x + \cdots + b_M x^M
By equating the terms with the same exponent up to order M + N + 1, we obtain two sets of equations,

\begin{cases}
c_0 = b_0 \\
a_1 c_0 + c_1 = b_1 \\
a_2 c_0 + a_1 c_1 + c_2 = b_2 \\
a_3 c_0 + a_2 c_1 + a_1 c_2 + c_3 = b_3 \\
\quad \vdots \\
a_N c_{M-N} + a_{N-1} c_{M-N+1} + \cdots + c_M = b_M
\end{cases}    (71)

\begin{cases}
a_N c_{M-N+1} + a_{N-1} c_{M-N+2} + \cdots + c_{M+1} = 0 \\
a_N c_{M-N+2} + a_{N-1} c_{M-N+3} + \cdots + c_{M+2} = 0 \\
\quad \vdots \\
a_N c_M + a_{N-1} c_{M+1} + \cdots + c_{M+N} = 0
\end{cases}    (72)

Equation (72) represents an N \times N system that can be solved for the coefficients a_k given c(n) for 0 \leq n \leq N + M. These values can then be used in (71) to solve for the coefficients b_k. The result is a system whose impulse response matches the first N + M + 1 values of f(n).
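A minimal sketch of the procedure (the function name is ours): solve (72) for the a_k, then back-substitute in (71) for the b_k. Coefficients c with negative index are taken as zero, since the series starts at k = 0.

```python
import numpy as np

def pade(c, M, N):
    """Pade approximation: match the first M+N+1 power series
    coefficients c_k with B(x)/A(x), deg B = M, deg A = N."""
    c = np.asarray(c, dtype=float)
    # Solve (72): row i (i = 1..N) is sum_j a_j c_{M+i-j} = -c_{M+i}
    A_mat = np.array([[c[M + i - j] if M + i - j >= 0 else 0.0
                       for j in range(1, N + 1)] for i in range(1, N + 1)])
    a = np.linalg.solve(A_mat, -c[M + 1:M + N + 1])
    a_full = np.concatenate(([1.0], a))
    # Back-substitute into (71): b_k = sum_j a_j c_{k-j}
    b = np.array([sum(a_full[j] * c[k - j] for j in range(min(k, N) + 1))
                  for k in range(M + 1)])
    return b, a_full

# 1/(1 - 0.5x) has power series sum_k (0.5)^k x^k; a [0/1] Pade recovers it
c = 0.5 ** np.arange(6)
b, a = pade(c, M=0, N=1)
```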
7) Prony-based filter design methods: Both the original methods by Prony and Padé were meant to interpolate data from applications that have little in common with filter design. What is relevant to this work is their use of rational functions of polynomials as models for data, and the linearization process they both employ.
When designing FIR filters, a common approach is to take L samples of the desired frequency response D(ω) and calculate the inverse DFT of the samples. This design approach is known as frequency sampling. It has been shown [28] that by designing a length-L filter h(n) via the frequency sampling method and symmetrically truncating h(n) to N values (N ≪ L) it is possible to obtain a least-squares optimal length-N filter h_N(n). It is not possible, however, to extend this method completely to the IIR problem. This section presents an extension based on the methods by Prony and Padé, and illustrates the shortcomings of its application.
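The quoted FIR result is easy to check numerically: on the L-point DFT grid, symmetric truncation of the frequency-sampled filter coincides with the least squares solution computed directly over the retained taps, because the DFT exponentials are orthogonal over the full grid. The example below (our own construction) uses a zero-phase lowpass specification.

```python
import numpy as np

L, N = 64, 11
k = np.arange(L)
# Real, even desired amplitude on the DFT grid (zero-phase lowpass)
D = ((k <= 6) | (k >= L - 6)).astype(float)

# Frequency sampling: length-L filter via the inverse DFT
h = np.real(np.fft.ifft(D))

# Symmetric truncation to N taps: n = -(N//2), ..., N//2 (mod L)
n = np.concatenate([np.arange(0, N // 2 + 1), np.arange(L - N // 2, L)])
h_trunc = h[n]

# Direct least squares over the same N taps on the same grid
E = np.exp(-2j * np.pi * np.outer(k, n) / L)
A = np.vstack([E.real, E.imag])
d = np.concatenate([D, np.zeros(L)])
h_ls, *_ = np.linalg.lstsq(A, d, rcond=None)
```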
Consider the frequency response defined in (52). One can choose L equally spaced samples of H(\omega) to obtain

H(\omega_k) = H_k = \frac{B_k}{A_k} \quad \text{for } k = 0, 1, \ldots, L-1    (73)

where A_k and B_k represent the length-L DFTs of the filter coefficients a_n and b_n respectively. The division in (73) is done point-by-point over the L values of A_k and B_k. The objective is to use the relationship described in (73) to calculate a_n and b_n.
One can express (73) as B_k = H_k A_k. This operation represents the length-L circular convolution b(n) = h(n) \circledast_L a(n) defined as follows [43, 8.7.5]

b(n) = h(n) \circledast_L a(n) = \sum_{m=0}^{L-1} h[((n-m))_L] \, a(m), \quad 0 \leq n \leq L-1    (74)

where h(n) is the length-L inverse DFT of H_k and the operator ((\cdot))_L represents modulo L. Let
\hat{a} = \begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_N \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\qquad \text{and} \qquad
\hat{b} = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_M \\ 0 \\ \vdots \\ 0 \end{bmatrix}    (75)

Therefore (74) can be posed as a matrix operation [28, 7.4.1] of the form

\hat{H} \hat{a} = \hat{b}    (76)
where H̄ = [H̄_1  H̄_2] with

    H̄_1 = [ h_0      h_{L-1}  ...  h_{L-N}
            h_1      h_0      ...  h_{L-N+1}
             ...
            h_M      h_{M-1}  ...  h_{((L-N+M))_L}
            h_{M+1}  h_M      ...  h_{((L-N+M+1))_L}
             ...
            h_{L-2}  h_{L-3}  ...  h_{L-N-2}
            h_{L-1}  h_{L-2}  ...  h_{L-N-1} ]

    H̄_2 = [ h_{L-N-1}          ...  h_2      h_1
            h_{L-N}            ...  h_3      h_2
             ...
            h_{((L-N+M-1))_L}  ...  h_{M+2}  h_{M+1}
            h_{((L-N+M))_L}    ...  h_{M+3}  h_{M+2}
             ...
            h_{L-N-3}          ...  h_0      h_{L-1}
            h_{L-N-2}          ...  h_1      h_0 ]
Hence H̄ is an L x L matrix. From (75) it is clear that the L-(N+1) rightmost columns of H̄ can be discarded (since the last L-(N+1) values of ā in (75) are equal to 0). Therefore equation (76) can be rewritten as
    [ h_0      h_{L-1}  ...  h_{L-N}           ]             [ b_0 ]
    [ h_1      h_0      ...  h_{L-N+1}         ] [ 1   ]     [ ... ]
    [  ...                                     ] [ a_1 ]     [ b_M ]
    [ h_M      h_{M-1}  ...  h_{((L-N+M))_L}   ] [ ... ]  =  [ 0   ]
    [ h_{M+1}  h_M      ...  h_{((L-N+M+1))_L} ] [ a_N ]     [ ... ]
    [  ...                                     ]             [ 0   ]
    [ h_{L-1}  h_{L-2}  ...  h_{L-N-1}         ]             [ 0   ]                (77)
or in matrix notation,

    H [ 1 ]  =  [ b ]        i.e.        H ā = b̄                (78)
      [ a ]     [ 0 ]

where a and b correspond to the length-N and length-(M+1) filter coefficient vectors respectively and H contains the first N+1 columns of H̄. It is possible to uncouple the calculation of a and b from (78) by breaking H further as follows,
    H = [ H_1 ]        so that        [ H_1 ] ā = [ b ]                (79)
        [ H_2 ]                       [ H_2 ]     [ 0 ]

with ā = [1, a]^T as defined in (78), where H_1 contains the first M+1 rows of H and H_2 the remaining L-(M+1) rows. This formulation allows one to uncouple the calculations for a and b using two systems,
    H_1 ā = b
    H_2 ā = 0

Note that the last equation can be expressed as

    H̃_2 a = -h̃_2                (80)

where H_2 = [h̃_2  H̃_2] (that is, h̃_2 and H̃_2 contain the first and the second through (N+1)-th columns of H_2 respectively).
From (80) one can conclude that if L = N+M+1 and if H̃_2 and H_1 are nonsingular, then they can be inverted³ to solve for the filter coefficient vector a in (80), and then solve for b using H_1 ā = b.
The algorithm described above is an interpolation method rather than an approximation one. If L > N+M+1 and H̃_2 is full column rank, then (80) is an overdetermined linear system for which no exact solution exists; therefore an approximation must be found. From (73) we can define the solution error function ε_s(ω_k) as

    ε_s(ω_k) = B(ω_k)/A(ω_k) - H(ω_k)                (81)

Using this notation, the design objective is to solve the nonlinear problem

    min_{a,b} ||ε_s(ω_k)||_2^2
Consider the system in equation (78). If H_2 is overdetermined, one can define an approximation problem by introducing an error vector e,

    b̄ = H ā - e                (82)

where

    e = [ e_1 ]
        [ e_2 ]

Again, it is possible to uncouple (82) as follows,

    b = H_1 ā - e_1                (83)
    e_2 = h̃_2 + H̃_2 a                (84)
One can minimize the least-squared error norm ||e_2||_2 of the overdetermined system (84) by solving the normal equations [21]

    H̃_2^T H̃_2 a = -H̃_2^T h̃_2

so that

    a = -[H̃_2^T H̃_2]^{-1} H̃_2^T h̃_2

and use this result in (83),

    b = H_1 ā                (85)
Equation (83) represents the following time-domain operation,

    ε(n) = b(n) - h(n) ⊛_L a(n),   0 ≤ n ≤ M

(where ⊛_L denotes circular convolution) and can be interpreted in the frequency domain as follows,

    ε_e(ω_k) = B(ω_k) - H(ω_k) A(ω_k)                (86)

Equation (86) is a weighted version of (81), as follows,

    ε_e(ω_k) = A(ω_k) ε_s(ω_k)
³In practice one should not invert the matrices H_1 and H̃_2 but use a more robust and efficient algorithm. See [61] for details.
Therefore the algorithm presented above will find the filter coefficient vectors a and b that minimize the equation error ε_e in (86) in the least-squares sense. Unfortunately, this error is not what one may want to optimize, since it is a weighted version of the solution error ε_s.
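The whole frequency-sampling construction is short to prototype. A hedged NumPy sketch (function name illustrative; following footnote 3, the systems are handled by a least squares solver rather than explicit inversion):

```python
import numpy as np

def freq_sampling_iir(D, N, M):
    # D: L samples of the desired response on the DFT grid (assumed
    # conjugate-symmetric so the impulse response h is real).
    # Solves H2t a = -h2 (eq. 80) in the least squares sense,
    # then b = H1 @ [1; a] as in (85).
    L = len(D)
    h = np.real(np.fft.ifft(D))                       # length-L impulse response
    # First N+1 columns of the L x L circulant matrix of h
    Hc = np.array([[h[(n - m) % L] for m in range(N + 1)]
                   for n in range(L)])
    H1, H2 = Hc[:M + 1, :], Hc[M + 1:, :]
    h2, H2t = H2[:, 0], H2[:, 1:]
    a, *_ = np.linalg.lstsq(H2t, -h2, rcond=None)
    abar = np.concatenate(([1.0], a))
    return H1 @ abar, abar

# Example: 8 samples of an exact one-pole response are matched exactly
L = 8
w = 2 * np.pi * np.arange(L) / L
D = 1.0 / (1 - 0.5 * np.exp(-1j * w))
b, A = freq_sampling_iir(D, N=1, M=0)
```

As the text notes, for L > N+M+1 this minimizes the equation error ε_e, not the solution error ε_s.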
8) Iterative prefiltering linearization methods: Section III-B2 introduced the equation error formulation and several algorithms that minimize it. In a general sense, however, one is more interested in minimizing the solution error problem from (54). This section presents several algorithms that attempt to minimize the solution error formulation from (53) by prefiltering the desired response D(ω) in (58) with A(ω). A new set of coefficients {a_n, b_n} is then found with an equation error formulation and the prefiltering step is repeated, hence defining an iterative procedure.
9) Sanathanan-Koerner (SK) method: The method by Levy presented in Section III-B3 suggests a relatively easy-to-implement approach to the problem of rational approximation. While interesting in itself, the equation error ε_e does not really represent what in principle one would like to minimize. A natural extension to Levy's method is the one proposed [31] by C. K. Sanathanan and J. Koerner in 1963. The algorithm iteratively prefilters the equation error formulation of Levy with an estimate of A(ω). The SK method considers the solution error function ε_s defined by

    ε_s(ω) = D(ω) - B(ω)/A(ω) = (1/A(ω)) [A(ω)D(ω) - B(ω)] = (1/A(ω)) ε_e(ω)                (87)
Then the solution error problem can be written as

    min_{a_k, b_k} ε_s                (88)

where

    ε_s = Σ_{k=0}^{L} |ε_s(ω_k)|^2 = Σ_{k=0}^{L} (1/|A(ω_k)|^2) |ε_e(ω_k)|^2 = Σ_{k=0}^{L} W(ω_k) |ε_e(ω_k)|^2                (89)

Note that given A(ω), one can obtain an estimate for B(ω) by minimizing ε_e as Levy did. This is only an estimate, though, because one would need to know the optimal value of A(ω) to truly optimize for B(ω). The idea behind this method is that by solving iteratively for A(ω) and B(ω) the algorithm will eventually converge to the solution of the desired solution error problem defined by (88). Since A(ω) is not known from the beginning, it must be initialized with a reasonable value (such as A(ω_k) = 1).
To solve (88) Sanathanan and Koerner defined the same linear system from (62) with the same matrix and vector definitions. However the scalar terms used in the matrix and vectors reflect the presence of the weighting function W(ω) in ε_s as follows,

    λ_h = Σ_{l=0}^{L-1} ω_l^h W(ω_l)
    S_h = Σ_{l=0}^{L-1} ω_l^h R_l W(ω_l)
    T_h = Σ_{l=0}^{L-1} ω_l^h I_l W(ω_l)
    U_h = Σ_{l=0}^{L-1} ω_l^h (R_l^2 + I_l^2) W(ω_l)
Then, given an initial definition of A(ω), at the p-th iteration one sets

    W(ω) = 1 / |A_{p-1}(ω_k)|^2                (90)

and solves (62) using {λ, S, T, U} as defined above until a convergence criterion is reached. Clearly, solving (88) using (89) is equivalent to solving a series of weighted least squares problems where the weighting function consists of the estimated values of A(ω) from the previous iteration. This method is similar to a time-domain method proposed by Steiglitz and McBride [34], presented later in this chapter.
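The SK iteration reduces to a sequence of weighted linear least squares problems. A NumPy sketch (names illustrative; for compactness the equation error is written directly as F h - D over the frequency grid rather than through the {λ, S, T, U} scalars):

```python
import numpy as np

def sanathanan_koerner(D, w, N, M, iters=5):
    # D: desired complex response at frequencies w (rad/sample).
    # Each pass solves the equation-error least squares problem
    # weighted by 1/|A_{p-1}|^2, per eq. (90).
    F1 = np.exp(-1j * np.outer(w, np.arange(M + 1)))            # numerator columns
    F2 = -D[:, None] * np.exp(-1j * np.outer(w, np.arange(1, N + 1)))
    F = np.hstack([F1, F2])                                     # eps_e = F h - D
    A = np.ones(len(w), dtype=complex)                          # A_0 = 1
    for _ in range(iters):
        Wd = 1.0 / np.abs(A)                                    # row weights
        h, *_ = np.linalg.lstsq(Wd[:, None] * F, Wd * D, rcond=None)
        a = np.concatenate(([1.0], h[M + 1:]))
        A = np.exp(-1j * np.outer(w, np.arange(N + 1))) @ a
    return h[:M + 1], a

# Example: an attainable desired response is recovered exactly
L = 16
w = 2 * np.pi * np.arange(L) / L
e1 = np.exp(-1j * w)
D = (1 + 0.5 * e1) / (1 - 0.3 * e1)
b, a = sanathanan_koerner(D, w, N=1, M=1)
```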
10) Method of Sid-Ahmed, Chottera and Jullien: The methods by Levy and by Sanathanan and Koerner arose from an analog analysis problem formulation, and therefore cannot be used directly to design digital filters. However these two methods present important ideas that can be translated to the context of filter design. In 1978 M. Sid-Ahmed, A. Chottera and G. Jullien followed on these two important works and adapted [32] the matrix and vectors used by Levy to account for the design of IIR digital filters, given samples of a desired frequency response. Consider the frequency response H(ω) defined in (52). In parallel with Levy's development, the corresponding equation error can be written as

    ε_e = Σ_{k=0}^{L} |F_k(ω)|^2                (91)
with

    F_k(ω) = (R_k + jI_k) (1 + Σ_{i=1}^{N} a_i e^{-jω_k i}) - Σ_{i=0}^{M} b_i e^{-jω_k i}
One can follow a similar differentiation step as Levy by setting

    ∂ε_e/∂a_1 = ∂ε_e/∂a_2 = ... = ∂ε_e/∂b_0 = ... = 0

with ε_e as defined in (91). Doing so results in a linear system of the form

    C x = y
where the vectors x and y are given by

    x = [b_0, ..., b_M, a_1, ..., a_N]^T
    y = [φ_0 - r_0, ..., φ_M - r_M, β_1, ..., β_N]^T                (92)

The matrix C has a special structure given by

    C = [ Λ    Γ ]
        [ Γ^T  B ]

where Λ and B are symmetric Toeplitz matrices of order M+1 and N respectively, and their first rows are given by

    λ_{1m} = λ_{m-1}   for m = 1, ..., M+1
    β_{1m} = β_{m-1}   for m = 1, ..., N

Matrix Γ has order (M+1) x N and has the property that elements on a given diagonal are identical (i.e. γ_{i,j} = γ_{i+1,j+1}). Its entries are given by

    γ_{1m} = φ_m + r_m           for m = 1, ..., N
    γ_{m1} = φ_{m-2} - r_{m-2}   for m = 2, ..., M+1
The parameters {λ, β, φ, r} are given by

    λ_i = Σ_{k=0}^{L} cos(iω_k)                  for 0 ≤ i ≤ M
    β_i = Σ_{k=0}^{L} |D(ω_k)|^2 cos(iω_k)       for 0 ≤ i ≤ N-1
    φ_i = Σ_{k=0}^{L} R_k cos(iω_k)              for 0 ≤ i ≤ max(N, M-1)
    r_i = Σ_{k=0}^{L} I_k sin(iω_k)              for 0 ≤ i ≤ max(N, M-1)
The rest of the algorithm works the same way as Levy's. For a solution error approach, one must weight each of the parameters mentioned above with the factor from (90), as in the SK method.
There are two important details worth mentioning at this point. On one hand, the methods discussed up to this point (Levy, SK and Sid-Ahmed et al.) do not put any limitation on the spacing of the frequency samples; one can sample as finely or as coarsely as desired in the frequency domain. On the other hand, there is no way to decouple the solution of the numerator and denominator vectors. In other words, from (63) and (92) one can see that the linear systems solve for all the variables in x at once. This is more of an issue for the iterative methods (SK and Sid-Ahmed), since at each iteration one solves for all the variables, yet for the purposes of updating one only needs to keep the denominator variables (they get used in the weighting function); the numerator variables are never used within an iteration. This is in contrast to the Prony-based method of Burrus presented in Section III-B4, which decouples the numerator and denominator computation into two separate linear systems: one only needs to compute the denominator variables until convergence is reached, and only then does it become necessary to compute the numerator variables. Therefore most of its iterations solve a smaller linear system than the methods discussed up to this point.
11) Steiglitz-McBride iterative algorithm: In 1965 K. Steiglitz and L. McBride presented an algorithm [33], [34] that has become quite popular in statistics and engineering applications. The Steiglitz-McBride method (SMB) considers the problem of deriving a transfer function for either an analog or digital system from its input and output data; in essence it is a time-domain method. It is mentioned in this work for completeness, as it closely relates to the methods by Levy, SK and Sid-Ahmed, yet it is far better known and understood.
The derivation of the SMB method follows closely that of SK. In the Z-domain, the transfer function of a digital system is defined by

    H(z) = B(z)/A(z) = (b_0 + b_1 z^{-1} + ... + b_N z^{-N}) / (1 + a_1 z^{-1} + ... + a_N z^{-N})

Furthermore

    Y(z) = H(z) X(z) = (B(z)/A(z)) X(z)
Steiglitz and McBride defined the following problem,

    min ε_s = Σ_i |ε_i(z)|^2 = (1/2πj) ∮ |X(z) B(z)/A(z) - D(z)|^2 dz/z                (93)

where X(z) = Σ_j x_j z^{-j} and D(z) = Σ_j d_j z^{-j} represent the z-transforms of the input and desired signals respectively. Equation (93) is the familiar nonlinear solution error function expressed in the Z-domain. Steiglitz and McBride recognized the complexity of such a function and proposed solving (93) iteratively using a simpler
problem defined by

    min ε_e = Σ_i |ε_i(z)|^2 = (1/2πj) ∮ |X(z)B(z) - D(z)A(z)|^2 dz/z                (94)
This linearized error function is the familiar equation error in the Z-domain. Steiglitz and McBride proposed a two-mode iterative approach. The SMB Mode 1 iteration is similar to the SK method, in that at the k-th iteration a linearized error criterion based on (94) is used,

    ε_k(z) = (B_k(z)/A_{k-1}(z)) X(z) - (A_k(z)/A_{k-1}(z)) D(z)
           = W_k(z) [B_k(z) X(z) - A_k(z) D(z)]                (95)

where

    W_k(z) = 1/A_{k-1}(z)
Their derivation⁴ leads to the familiar linear system

    C x = y

with the following vector definitions,

    x = [b_0, ..., b_N, a_1, ..., a_N]^T
    q_j = [x_j, ..., x_{j-N}, -d_{j-1}, ..., -d_{j-N}]^T

The vector q_j is referred to as the input-output vector. Then

    C = Σ_j q_j q_j^T        y = Σ_j d_j q_j
SMB Mode 2 is an attempt at further reducing the error once Mode 1 produces an estimate close enough to the actual solution. The idea behind Mode 2 is to consider the solution error defined by (93) and equate its partial derivatives with respect to the coefficients to zero. Steiglitz and McBride showed [33], [34] that this can be attained by defining a new vector

    r_j = [x_j, ..., x_{j-N}, -y_{j-1}, ..., -y_{j-N}]^T

Then

    C = Σ_j r_j q_j^T        y = Σ_j d_j r_j

The main difference between Mode 1 and Mode 2 is the fact that Mode 1 uses the desired values to compute its vectors and matrices, whereas Mode 2 uses the actual output values from the filter. The rationale behind this is that at the beginning the output function y(t) is not accurate, so the desired function provides better data for computations. On the other hand, Mode 1 does not really solve the desired problem. Once Mode 1 is deemed to have reached the vicinity of the solution, one can use true partial derivatives to compute the gradient and find the actual solution; this is what Mode 2 does.

⁴For more details the reader should refer to [33], [34].
It has been claimed that under certain conditions the Steiglitz-McBride algorithm converges; however, no guarantee of global convergence exists. A more thorough discussion of the Steiglitz-McBride algorithm and its relationships to other parameter estimation algorithms (such as the Iterative Quadratic Maximum Likelihood algorithm, or IQML) can be found in [62]-[64].
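A time-domain sketch of Mode 1 in NumPy, with the prefiltering by 1/A_{k-1}(z) realized through the direct-form difference equation (helper and function names are illustrative, and equal numerator and denominator orders are assumed as in the original formulation):

```python
import numpy as np

def iir_filter(b, a, x):
    # Direct-form recursion y(n) = sum_k b_k x(n-k) - sum_k a_k y(n-k)
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n >= k)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n >= k)
        y[n] = acc
    return y

def smb_mode1(x, d, N, iters=5):
    # Mode 1: prefilter x and d by 1/A_{k-1}(z), then solve the
    # linearized least squares criterion (95) for b and a.
    a = np.concatenate(([1.0], np.zeros(N)))
    L = len(x)
    for _ in range(iters):
        xf = iir_filter([1.0], a, x)
        df = iir_filter([1.0], a, d)
        Q = np.zeros((L, 2 * N + 1))
        for k in range(N + 1):                  # columns for b: xf(n-k)
            Q[k:, k] = xf[:L - k]
        for k in range(1, N + 1):               # columns for a: -df(n-k)
            Q[k:, N + k] = -df[:L - k]
        h, *_ = np.linalg.lstsq(Q, df, rcond=None)
        a = np.concatenate(([1.0], h[N + 1:]))
    return h[:N + 1], a

# Example: identify b = [1, 0.4], a = [1, -0.5] from its impulse response
x = np.zeros(64); x[0] = 1.0
d = iir_filter([1.0, 0.4], [1.0, -0.5], x)
b, a = smb_mode1(x, d, N=1)
```

With noise-free data from an exactly rational system, the first pass already lands on the true coefficients and later passes leave them unchanged.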
12) Jackson's method: The following is a recent approach (from 2008) by Leland Jackson [36], based in the frequency domain. Consider vectors a ∈ R^N and b ∈ R^M such that

    H(ω) = B(ω)/A(ω)

where H(ω), B(ω) and A(ω) are the Fourier transforms of h, b and a respectively. For a discrete frequency set one can describe the Fourier transform vectors B = W_b b and A = W_a a (where W_b and W_a correspond to the discrete Fourier kernels for b and a respectively).
Define

    H_a(ω_k) = 1/A(ω_k)

In vector notation, let D_a = diag(H_a) = diag(1/A). Then

    H(ω) = B(ω)/A(ω) = H_a(ω) B(ω)     so that     H = D_a B                (96)

Let H_d(ω) be the desired complex frequency response and define D_d = diag(H_d). Then one wants to solve

    min_E E*E = ||E||_2^2

where E = H - H_d. From (96) one can write H = H_d + E as

    H = D_a B = D_a W_b b                (97)

Therefore

    H_d = H - E = D_a W_b b - E                (98)

Solving (98) for b one gets

    b = (D_a W_b) \ H_d                (99)
Also,

    H_d = D_d 1⃗ = D_d D_a A = D_a D_d A = D_a D_d W_a a

where 1⃗ is a unit column vector (note that D_a A = 1⃗). Therefore

    H - E = H_d = D_a D_d W_a a

From (98) we get

    D_a W_b b - E = D_a D_d W_a a

or

    D_a D_d W_a a + E = D_a W_b b

which in a least squares sense results in

    a = (D_a D_d W_a) \ (D_a W_b b)                (100)

From (99) one gets

    a = (D_a D_d W_a) \ (D_a W_b [(D_a W_b) \ H_d])
As a summary, at the i-th iteration one can write (99) and (100) as follows,

    b_i = (diag(1/A_{i-1}) W_b) \ H_d
    a_i = (diag(1/A_{i-1}) diag(H_d) W_a) \ (diag(1/A_{i-1}) W_b b_i)
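Jackson's pair of backslash solves maps directly onto a least squares routine. A NumPy sketch (the a_0 = 1 normalization and the optional initial denominator guess A0 are choices made here for definiteness, not taken from the paper):

```python
import numpy as np

def jackson_iir(Hd, w, N, M, iters=20, A0=None):
    # Alternately solve the two least squares subproblems (99) and (100),
    # with Matlab's backslash realized by numpy's lstsq.
    Wb = np.exp(-1j * np.outer(w, np.arange(M + 1)))   # Fourier kernel for b
    Wa = np.exp(-1j * np.outer(w, np.arange(N + 1)))   # Fourier kernel for a
    A = np.ones(len(w), dtype=complex) if A0 is None else A0
    for _ in range(iters):
        Da = 1.0 / A[:, None]                          # diag(1/A) row scaling
        b, *_ = np.linalg.lstsq(Da * Wb, Hd, rcond=None)                    # (99)
        a, *_ = np.linalg.lstsq(Da * Hd[:, None] * Wa,
                                (1.0 / A) * (Wb @ b), rcond=None)           # (100)
        a = a / a[0]                                   # normalize a_0 = 1
        A = Wa @ a
    return b, a

# Example: starting from the true denominator response, one pass
# reproduces the exact solution (a fixed point of the iteration)
L = 16
w = 2 * np.pi * np.arange(L) / L
e1 = np.exp(-1j * w)
Hd = (1 + 0.4 * e1) / (1 - 0.5 * e1)
b, a = jackson_iir(Hd, w, N=1, M=1, iters=1, A0=1 - 0.5 * e1)
```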
13) Soewito's quasilinearization method: Consider the equation error residual function

    ε_e(ω_k) = B(ω_k) - D(ω_k) A(ω_k)
             = Σ_{n=0}^{M} b_n e^{-jω_k n} - D(ω_k) (1 + Σ_{n=1}^{N} a_n e^{-jω_k n})
             = b_0 + b_1 e^{-jω_k} + ... + b_M e^{-jω_k M} - D_k - D_k a_1 e^{-jω_k} - ... - D_k a_N e^{-jω_k N}
             = (b_0 + ... + b_M e^{-jω_k M}) - D_k (a_1 e^{-jω_k} + ... + a_N e^{-jω_k N}) - D_k

with D_k = D(ω_k). The last equation indicates that one can represent the equation error in matrix form as follows,

    ε_e = F h - D

where F = [F_1  F_2] and
    F_1 = [ 1  e^{-jω_0}      ...  e^{-jω_0 M}
            ...
            1  e^{-jω_{L-1}}  ...  e^{-jω_{L-1} M} ]

    F_2 = [ -D_0 e^{-jω_0}          ...  -D_0 e^{-jω_0 N}
            ...
            -D_{L-1} e^{-jω_{L-1}}  ...  -D_{L-1} e^{-jω_{L-1} N} ]

and

    h = [b_0, b_1, ..., b_M, a_1, ..., a_N]^T        D = [D_0, ..., D_{L-1}]^T
Consider now the solution error residual function

    ε_s(ω_k) = H(ω_k) - D(ω_k) = B(ω_k)/A(ω_k) - D(ω_k)
             = (1/A(ω_k)) [B(ω_k) - D(ω_k) A(ω_k)]
             = W(ω_k) ε_e(ω_k)
Therefore one can write the solution error in matrix form as follows,

    s = W (F h - D)                (101)

where W is a diagonal matrix with 1/A(ω) on its diagonal. From (101) the least-squared solution error ε_s = s*s can be minimized by

    h = (F* W^2 F)^{-1} F* W^2 D                (102)
From (102) an iteration⁵ can be defined as follows,

    h_{i+1} = (F* W_i^2 F)^{-1} F* W_i^2 D

by setting the weights W in (101) from the Fourier transform of the current solution for a.

⁵Soewito refers to this expression as the Steiglitz-McBride Mode-1 in the frequency domain.

A more formal approach to minimizing ε_s consists in using a gradient method (these approaches are often referred to as Newton-like methods). First one needs to compute the Jacobian matrix J of s, where the pq-th term of J is given by J_pq = ∂s_p/∂h_q with s as defined in (101). Note that the p-th element of s is given by

    s_p = H_p - D_p = B_p/A_p - D_p

For simplicity one can consider these reduced-form expressions for the independent components of h,

    ∂s_p/∂b_q = (1/A_p) (∂/∂b_q) Σ_{n=0}^{M} b_n e^{-jω_p n} = W_p e^{-jω_p q}

    ∂s_p/∂a_q = B_p (∂/∂a_q) (1/A_p) = -(B_p/A_p^2) (∂/∂a_q) (1 + Σ_{n=1}^{N} a_n e^{-jω_p n})
              = -(1/A_p) (B_p/A_p) e^{-jω_p q} = -W_p H_p e^{-jω_p q}
Therefore one can express the Jacobian J as follows,

    J = W G                (103)

where G = [G_1  G_2] and

    G_1 = [ 1  e^{-jω_0}      ...  e^{-jω_0 M}
            ...
            1  e^{-jω_{L-1}}  ...  e^{-jω_{L-1} M} ]

    G_2 = [ -H_0 e^{-jω_0}          ...  -H_0 e^{-jω_0 N}
            ...
            -H_{L-1} e^{-jω_{L-1}}  ...  -H_{L-1} e^{-jω_{L-1} N} ]
Consider the solution error least-squares problem given by

    min_h f(h) = s*s

where s is the solution error residual vector as defined in (101) and depends on h. It can be shown [21, pp. 219] that the gradient of the squared error f(h) (namely ∇f) is given by

    ∇f = J* s                (104)

A necessary condition for a vector h to be a local minimizer of f(h) is that the gradient ∇f be zero at such a vector. With this in mind, combining (101) and (103) in (104) one gets

    ∇f = G* W^2 (F h - D) = 0                (105)

Solving the system (105) gives

    h = (G* W^2 F)^{-1} G* W^2 D
An iteration can be defined as follows,⁶

    h_{i+1} = (G_i* W_i^2 F)^{-1} G_i* W_i^2 D                (106)

where the matrices W and G reflect their dependency on the current values of a and b.

Atmadji Soewito [35] extended the quasilinearization method of Bellman and Kalaba [65] to the design of IIR filters. To understand his method consider the first-order Taylor expansion of H_{i+1}(z) near H_i(z), given by

    H_{i+1}(z) = H_i(z) + ([B_{i+1}(z) - B_i(z)] A_i(z) - [A_{i+1}(z) - A_i(z)] B_i(z)) / A_i^2(z)
               = H_i(z) + (B_{i+1}(z) - B_i(z))/A_i(z) - B_i(z) [A_{i+1}(z) - A_i(z)] / A_i^2(z)

⁶Soewito refers to this expression as the Steiglitz-McBride Mode-2 in the frequency domain. Compare to the Mode-1 expression and note the use of G_i instead of F.
Using the last result in the solution error residual function s(ω) and simplifying leads to

    s(ω) = B_{i+1}(ω)/A_i(ω) - H_i(ω) A_{i+1}(ω)/A_i(ω) + B_i(ω)/A_i(ω) - D(ω)
         = (1/A_i(ω)) [B_{i+1}(ω) - H_i(ω) A_{i+1}(ω) + B_i(ω) - A_i(ω) D(ω)]                (107)

Equation (107) can be expressed (dropping the use of ω for simplicity) as

    s = W ( [B_{i+1} - H_i (A_{i+1} - 1)] - H_i + [B_i - D (A_i - 1)] - D )                (108)
One can recognize the two bracketed terms as G h_{i+1} and F h_i respectively. Therefore (108) can be represented in matrix notation as follows,

    s = W [G h_{i+1} - (D + H_i - F h_i)]                (109)

with H = [H_0, H_1, ..., H_{L-1}]^T. Therefore one can minimize s*s from (109) with

    h_{i+1} = (G_i* W_i^2 G_i)^{-1} G_i* W_i^2 (D + H_i - F h_i)                (110)

since all the terms inside the parentheses in (110) are constant at the (i+1)-th iteration. In a sense (110) is similar to (106), except that the desired function is updated from iteration to iteration.
It is important to note that any of the three algorithms can be modified to solve a weighted l2 IIR approximation with a weighting function W(ω) by defining

    V(ω) = W(ω)/A(ω)                (111)

Taking (111) into account, the following is a summary of the three different updates discussed so far:

    SMB Frequency Mode-1:          h_{i+1} = (F* V_i^2 F)^{-1} F* V_i^2 D
    SMB Frequency Mode-2:          h_{i+1} = (G_i* V_i^2 F)^{-1} G_i* V_i^2 D
    Soewito's quasilinearization:  h_{i+1} = (G_i* V_i^2 G_i)^{-1} G_i* V_i^2 (D + H_i - F h_i)
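The quasilinearization update (110) with the weighting (111) fits in a few lines of NumPy. A sketch (initializing with the equation-error solution is a choice made here for the example; all names are illustrative):

```python
import numpy as np

def quasilinearization(D, w, N, M, W=None, iters=10):
    # Each pass solves min || V (G h - (D + H_i - F h_i)) ||_2, i.e.
    # update (110) with V = W/A from (111).
    if W is None:
        W = np.ones(len(w))
    EM = np.exp(-1j * np.outer(w, np.arange(M + 1)))     # numerator kernel
    EN = np.exp(-1j * np.outer(w, np.arange(1, N + 1)))  # denominator kernel
    F = np.hstack([EM, -D[:, None] * EN])                # equation-error matrix
    h, *_ = np.linalg.lstsq(F, D, rcond=None)            # initial (Levy-type) guess
    for _ in range(iters):
        A = 1 + EN @ h[M + 1:]
        H = (EM @ h[:M + 1]) / A                         # current model response
        G = np.hstack([EM, -H[:, None] * EN])            # Jacobian kernel (103)
        V = W / np.abs(A)                                # eq. (111)
        rhs = D + H - F @ h
        h, *_ = np.linalg.lstsq(V[:, None] * G, V * rhs, rcond=None)
    return h[:M + 1], np.concatenate(([1.0], h[M + 1:]))

# Example: an attainable desired response is recovered exactly
L = 16
w = 2 * np.pi * np.arange(L) / L
e1 = np.exp(-1j * w)
D = (1 + 0.4 * e1) / (1 - 0.5 * e1)
b, a = quasilinearization(D, w, N=1, M=1, iters=3)
```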
C. lp approximation

Infinite Impulse Response (IIR) filters are important tools in signal processing. The flexibility they offer with the use of poles and zeros allows relatively small filters to meet specifications that would require somewhat larger FIR filters. Therefore designing IIR filters in an efficient and robust manner is an important problem.
This section covers the design of a number of important lp IIR problems. The methods proposed are consistent with the methods presented for FIR filters, allowing one to build on the lessons learned from FIR design problems. The complex lp IIR problem is first presented in Section III-C1, being an essential tool for other relevant problems. The lp frequency-dependent IIR problem is also introduced in Section III-C1. While the frequency-dependent formulation might not be practical in itself as a filter design formulation, it is fundamental for the more relevant magnitude lp IIR filter design problem, presented in Section III-C2.
Some complications appear when designing IIR filters, among which the intrinsic least squares solving step clearly stands out. Being a nonlinear problem, special handling of this step is required. It was determined after thorough experimentation that the quasilinearization method of Soewito presented in Section III-B13 can be employed successfully to handle this issue.
Fig. 20. Block diagram for complex lp IIR algorithm.

1) Complex and frequency-dependent lp approximation: Chapter II introduced the problem of designing lp complex FIR filters. The complex lp IIR algorithm builds on its FIR counterpart by introducing a nested structure that internally solves an l2 complex IIR problem. Figure 20 illustrates this procedure in more detail. This method was first presented in [66].
Compared to its FIR counterpart, the IIR method only replaces the weighted linear least squares problem with Soewito's quasilinearization algorithm. While this nesting approach might suggest an increase in computational expense, it was found in practice that after the initial l2 iteration, the lp iterations generally require only one to a few internal weighted l2 quasilinearization iterations, thus maintaining the algorithm's efficiency. Figures 21 through 23 present results for a design example using a length-5 IIR filter with p = 100 and transition edge frequencies of 0.2 and 0.24 (in normalized frequency).

Fig. 21. Results for complex l100 IIR design.

Fig. 22. Maximum error for l2 and l100 complex IIR designs.

Fig. 23. Error curve for l100 complex IIR design.

Figure 21 compares the l2 and lp results and includes the desired frequency samples. Note that no transition band was specified. Figure 22 illustrates the effect of increasing p. The largest error for the l2 solution is located at the transition band edges. As p increases the algorithm weights the larger errors more heavily; as a result the largest errors tend to decrease. In this case the magnitude of the frequency response went from 0.155 at the stopband edge (in the l2 case) to 0.07 (for the lp design). Figure 23 shows the error function for the lp design, illustrating the quasi-equiripple behavior for large values of p.
Another fact worth noting from Figure 21 is the increase in the peak on the right-hand side of the passband edge (around f = 0.22). The lp solution increased the amplitude of this peak with respect to the corresponding l2 solution. This is to be expected, since this peak occurs at frequencies not included in the specifications, and the lp algorithm will move poles and zeros around in order to find the optimal lp solution (based on the frequencies included for the filter derivation). The addition of a specified transition band function (such as a spline) would allow control of this effect, depending on the user's preferences.
The frequency-dependent FIR problem was first introduced in Section II-E. Following the FIR approach, one can design IIR frequency-dependent filters by merely replacing the linear weighted least squares step with a nonlinear approach, such as the quasilinearization method presented in Section III-B13 (as in the complex lp IIR case). This problem illustrates the design flexibility of lp IRLS-based methods.
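All of the lp procedures here share the same outer loop: solve a weighted l2 subproblem, update the weights from the current error raised to the power (p-2)/2, and repeat, with p-homotopy and partial updating for robustness. A generic sketch (the callable interface, homotopy schedule and update factor are illustrative choices, not the paper's exact implementation):

```python
import numpy as np

def lp_irls(solve_wls, response, D, p=10, iters=20, lam=0.5):
    # solve_wls(W): solves the weighted l2 subproblem (linear for FIR,
    #               quasilinearization for IIR) for weights W.
    # response(h): evaluates the current model on the design grid.
    W = np.ones(len(D))
    h = solve_wls(W)                         # initial l2 solution
    p_act = 2.0
    for _ in range(iters):
        p_act = min(p, 2.0 * p_act)          # homotopy: 4, 8, ..., p
        e = np.abs(response(h) - D)
        Wnew = e ** ((p_act - 2.0) / 2.0)    # IRLS weight for lp
        Wnew = Wnew / (np.max(Wnew) + 1e-30)
        W = lam * Wnew + (1.0 - lam) * W     # partial updating
        h = solve_wls(W)
    return h

# Illustration on a tiny linear-phase FIR subproblem: fit 3 cosine terms
# to a step-like response; the lp solution trades l2 error for a smaller
# peak error.
wgrid = np.linspace(0, np.pi, 64)
C = np.cos(np.outer(wgrid, np.arange(3)))
Dstep = (wgrid < np.pi / 2).astype(float)
solve = lambda W: np.linalg.lstsq(W[:, None] * C, W * Dstep, rcond=None)[0]
resp = lambda h: C @ h
h2 = solve(np.ones(len(Dstep)))
hp = lp_irls(solve, resp, Dstep, p=10)
```

Larger p pushes the result closer to equiripple, at the cost of slower, more delicate convergence (hence the homotopy on p).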
Fig. 24. Block diagram for magnitude lp IIR method.

2) Magnitude lp IIR design: The previous sections present algorithms that are based on complex specifications; that is, the user must specify both desired magnitude and phase responses. In some cases it might be better to specify a desired magnitude response only, while allowing an algorithm to select the phase that optimally minimizes the magnitude error. Note that if an algorithm is given a phase in addition to a magnitude function, it must then make a compromise between approximating both functions. The magnitude lp IIR approximation problem overcomes this dilemma by posing the problem only in terms of a desired magnitude function. The algorithm then finds the optimal phase that provides the optimal magnitude approximation. A mathematical formulation follows,

    min_{a,b} || |D(ω)| - |B(ω; b)/A(ω; a)| ||_p^p                (112)
A critical idea behind the magnitude approach is to allow the algorithm to find the optimum phase for a magnitude approximation. It is important to recognize that the optimal magnitude filter indeed has a complex frequency response. Atmadji Soewito [35] published in 1990 a theorem, in the context of l2 IIR design, demonstrating that the phase corresponding to an optimal magnitude approximation can be found iteratively by updating the desired phase in a complex approximation scenario. In other words, given a desired complex response D_0 one can solve a complex l2 problem and take the resulting phase to form a new desired response D_+ from the original desired magnitude response and the new phase. That is,

    D_{i+1} = |D_0| e^{jφ_i}

where D_0 represents the original desired magnitude response and e^{jφ_i} is the resulting phase from the previous iteration. This approach was independently suggested [36] by Leland Jackson and Stephen Kay in 2008.
This work introduces an algorithm to solve the magnitude lp IIR problem by combining the IRLS-based complex lp IIR algorithm from Section III-C1 with the phase-updating ideas of Soewito, Jackson and Kay. The resulting algorithm is robust, efficient and flexible, allowing for different orders in the numerator and denominator as well as even or uneven sampling in frequency space, plus the optional use of specified transition bands. A block diagram for this method is presented in Figure 24.
The overall lp IIR magnitude procedure can be summarized as follows,
1) Experimental analysis demonstrated that a reasonable initial solution for each of the three main stages allows for faster convergence. It was found that the frequency-domain Prony method by Burrus [28] (presented in Section III-B4) offers a good initial guess. In Figure 24 this method is iterated to update the specified phase. The outcome of this step is an equation error l2 magnitude design.
2) The equation error l2 magnitude solution from the previous step initializes a second stage in which quasilinearization is used to update the desired phase. Quasilinearization solves the true solution error complex approximation. Therefore by iterating on the phase one finds at convergence a solution error l2 magnitude design.
3) The rest of the algorithm follows the same idea as in the previous step, except that the least squares step becomes a weighted one (to account for the necessary lp homotopy weighting). It is also crucial to include the partial updating introduced in Section II-B2. By iterating on the weights one finds a solution error lp magnitude design.
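The phase-updating principle behind these stages can be demonstrated with a stripped-down loop in which the inner solver is a plain equation-error l2 fit; the full method replaces it by quasilinearization plus lp weighting. A sketch (phase0 and all names are illustrative assumptions):

```python
import numpy as np

def magnitude_design(Dmag, w, N, M, iters=30, phase0=None):
    # Iterate a complex equation-error l2 fit while resetting the desired
    # phase to the phase of the current model: D_{i+1} = |D_0| e^{j phi_i}.
    EM = np.exp(-1j * np.outer(w, np.arange(M + 1)))
    EN = np.exp(-1j * np.outer(w, np.arange(1, N + 1)))
    phase = np.zeros(len(w)) if phase0 is None else phase0
    for _ in range(iters):
        D = Dmag * np.exp(1j * phase)                  # current complex target
        F = np.hstack([EM, -D[:, None] * EN])
        h, *_ = np.linalg.lstsq(F, D, rcond=None)      # equation-error l2 fit
        b, a = h[:M + 1], np.concatenate(([1.0], h[M + 1:]))
        H = (EM @ b) / (1 + EN @ h[M + 1:])
        phase = np.angle(H)                            # updated phase phi_i
    return b, a

# Example: with the optimal phase supplied, one pass returns the optimal
# magnitude filter (a fixed point of the phase update)
L = 16
w = 2 * np.pi * np.arange(L) / L
Htrue = 1.0 / (1 - 0.5 * np.exp(-1j * w))
b, a = magnitude_design(np.abs(Htrue), w, N=1, M=0, iters=1,
                        phase0=np.angle(Htrue))
```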
Figures 25 through 29 illustrate the effectiveness of this algorithm at each of the three different stages for length-5 filters a and b, with transition edge frequencies of 0.2 and 0.24 (in normalized frequency) and p = 30. A linear transition band was specified. Figures 25, 26 and 27 show the equation error l2, solution error l2 and solution error lp designs respectively. Figure 28 shows a comparison of the magnitude error functions for the solution error l2 and lp designs. Figure 29 shows the phase responses for the three designs.

Fig. 25. Equation error l2 magnitude design.
From Figures 28 and 29 one can see that the algorithm has changed the phase response in a way that reduces the maximum magnitude error (located at the stopband edge frequency) by approximately half its value. Furthermore, Figure 28 demonstrates that one can reach quasi-equiripple behavior with relatively low values of p (for the examples shown, p was set to 30).

Fig. 26. Solution error l2 magnitude design.
IV. CONCLUSIONS

Digital filters are essential building blocks for signal processing applications. One of the main goals of this work is to illustrate the versatility and relevance of lp norms in the design of digital filters. While popular and well understood, l2 and l∞ filters tend to accentuate specific benefits of their respective designs; filters designed using lp norms as optimality criteria can offer a tradeoff between the benefits of these two commonly used criteria. This work presented a number of applications of lp norms in both FIR and IIR filter design, and their corresponding design algorithms and software implementation.
The basic workhorse for the methods presented in this document is the Iterative Reweighted Least Squares algorithm, a simple yet powerful method that is naturally suited to the design of lp filters. The notion of converting a mathematically complex problem into a series of significantly easier optimization problems is common in optimization. Nevertheless, the existence results from Theorem 1 strongly motivate the use of IRLS methods to design lp filters. Knowing that there exist optimal weights that turn the solution of a weighted least squares problem into the solution of a least-p problem must at the very least captivate the curiosity of the reader. The challenge lies in finding a robust and efficient method to compute such weights. All the methods presented in this work operate under this basic framework, iteratively updating the weighting function of a least squares problem in order to find the optimal lp filter for a given application. Therefore it is possible to develop a suite of computer programs in a modular way, where with few adjustments one can solve a variety of problems.
Throughout this document one can find examples of the versatility of the IRLS approach. One can change the internal linear objective
function from a complex exponential kernel to a sinusoidal one, solving complex and linear-phase FIR filters respectively with the same algorithm. Further adaptations can be incorporated with ease, such as the proposed adaptive solution to improve robustness.

Fig. 27. Solution error lp magnitude design.

Fig. 28. Comparison of l2 and lp IIR magnitude designs.

Fig. 29. Phase responses for l2 and lp IIR magnitude designs.

Another important design example makes p into a function of frequency, allowing different p-norms in different frequency bands. Such a design merely requires a few changes in the implementation of the algorithm, yet allows fancier, more elegant problems to be solved, such as the Constrained Least Squares (CLS) problem. In the context of FIR filters, this document presents the CLS problem from an lp perspective. While the work by John Adams [16] set a milestone in digital filter design, this work introduces a strong algorithm and a different perspective on the problem from that of Adams and other authors. The IRLS lp-based approach from this work proves to be robust and flexible, allowing for even and uneven sampling. Furthermore, while a user can use fixed transition bands, one benefits much from using a flexible transition band formulation, where the proposed IRLS-based algorithm literally finds the optimal transition band definition based on the constraint specifications. Such flexibility allows for tight constraints that would otherwise cause other algorithms to fail to meet the constraint specifications, or simply not converge at all. Section II-F introduced two problem formulations as well as results that illustrate the method's effectiveness at solving the CLS problem.
While previous work exists in the area of FIR design (or in linear
lp approximation for that matter), the problem of designing lp IIR
lters has been far less explored. A natural reason for this is the
fact that l2 IIR design is in itself an open research area (and a
rather complicated problem as well). Traditional linear optimization
approaches cannot be directly used for either of these problems,
and nonlinear optimization tools often prove either slow or do not
converge.
This work presents the lp IIR design problem as a natural extension of the FIR counterpart, where in a modular fashion the linear weighted l2 section of the algorithms is replaced by a nonlinear weighted l2 version. This problem formulation allows for the IIR implementation of virtually all the IRLS FIR methods presented in Chapter II. Dealing with the weighted nonlinear l2 problem, however, is a different matter.
The problem of rational l2 approximation has been studied for some time. However, the sources of ideas and results related to this problem are scattered across several areas of study. One of the contributions of this work is an organized summary of efforts in rational l2 optimization, particularly as related to the design of IIR
digital filters. The work in Section III-B also lays down a framework for the IIR methods proposed in this work.
As mentioned in Section III-C, some complications arise when designing IIR lp filters. Aside from the intrinsic l2 problem, it is necessary to properly combine a number of ideas that allowed for robust and efficient lp FIR methods. A design algorithm for complex lp IIR filters was presented in Section III-C1; this algorithm combines Soewito's quasilinearization with ideas such as lp homotopy, partial updating, and the adaptive modification. In practice, this combination of ideas proved practical and the resulting algorithm remained robust. It was also found that after a few p-steps the internal l2 algorithm required from one to merely a few iterations on average, thus keeping the algorithm efficient.
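The interplay of lp homotopy and partial updating can be sketched for the linear case as follows. The geometric stage schedule, the step size 1/(p-1), and all names are illustrative assumptions; in the IIR setting the inner least-squares solve would be replaced by the quasilinearized weighted l2 step rather than the plain solve shown here.

```python
import numpy as np

def irls_lp_homotopy(A, d, p_target=10.0, sigma=1.6, inner_iters=30):
    """lp homotopy sketch: grow the working norm gradually from 2 toward
    p_target, warm-starting each stage at the previous stage's solution."""
    h = np.linalg.lstsq(A, d, rcond=None)[0]  # the l2 solution seeds the homotopy
    p = 2.0
    while p < p_target:
        p = min(p_target, sigma * p)           # grow the working norm geometrically
        for _ in range(inner_iters):
            e = A @ h - d
            # Row scaling sqrt(|e|^(p-2)); the floor guards zero residuals
            s = np.maximum(np.abs(e), 1e-12) ** ((p - 2.0) / 2.0)
            h_hat = np.linalg.lstsq(s[:, None] * A, s * d, rcond=None)[0]
            h = h + (h_hat - h) / (p - 1.0)    # partial update keeps p > 3 stable
    return h
```

Warm-starting each stage is what makes the inner solver need only a handful of iterations once the homotopy is under way.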
One of the main contributions of this work is the introduction of an IRLS-based method to solve lp IIR design problems. By properly combining the principle of magnitude approximation via phase updating (from Soewito, Jackson, and Kay) with the complex IIR algorithm, one can find optimal magnitude lp designs. This work also introduced a sequence of steps that improves the efficiency and robustness of this algorithm, by dividing the design process into three stages and by using suitable initial guesses for each stage.
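A minimal linear l2 sketch of the phase-updating principle is shown below, assuming a lowpass magnitude specification on a uniform grid. The grid, filter length, and function name are illustrative; in the method above the inner solve would be the complex lp IIR algorithm rather than a plain least-squares fit.

```python
import numpy as np

def magnitude_fit_fir(mag, w, taps=12, iters=30):
    """Phase-updating sketch: fit |H(w)| to mag by repeatedly rotating
    the desired response onto the phase of the current design."""
    # Complex FIR design matrix: H(w) = sum_n h[n] e^{-jwn}
    A = np.exp(-1j * np.outer(w, np.arange(taps)))
    d = mag.astype(complex)                      # start with zero desired phase
    for _ in range(iters):
        h = np.linalg.lstsq(A, d, rcond=None)[0]
        d = mag * np.exp(1j * np.angle(A @ h))   # update only the phase
    return h

# Lowpass magnitude specification: 1 up to 0.4*pi, ramp to 0 at 0.6*pi
w = np.linspace(0.0, np.pi, 128)
mag = np.clip((0.6 * np.pi - w) / (0.2 * np.pi), 0.0, 1.0)
h = magnitude_fit_fir(mag, w)
```

Because rotating the target onto the current phase can only shrink the pointwise distance, the weighted l2 error is non-increasing across iterations, which is what makes the phase-updating stage so well behaved.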
Some of the examples in this document were designed using Matlab programs. It is worth noting the common elements among these programs, which reflect the modularity of the implementations. An added benefit of this setup is that further advances in any of the topics covered in this work can easily be ported to most, if not all, of the algorithms.
Digital filter design is and will remain an important topic in digital signal processing. It is the hope of the author to have instilled in the reader some curiosity about the use of lp norms as design criteria for applications in FIR and IIR filter design. This work is by no means comprehensive; it is meant to inspire consideration of the flexibility of IRLS algorithms for new lp-related problems.
REFERENCES
[1] R. E. Ziemer, W. H. Tranter, and D. R. Fannin, Signals and Systems:
Continuous and Discrete, 4th ed. Prentice Hall, 1998.
[2] J. L. Walsh and T. S. Motzkin, Polynomials of Best Approximation on an Interval, Proceedings of the National Academy of Sciences, USA, vol. 45, pp. 1523–1528, October 1959.
[3] T. S. Motzkin and J. L. Walsh, Polynomials of Best Approximation on a Real Finite Point Set I, Trans. American Mathematical Society, vol. 91, no. 2, pp. 231–245, May 1959.
[4] C. L. Lawson, Contributions to the theory of linear least maximum
approximations, Ph.D. dissertation, UCLA, 1961.
[5] J. R. Rice and K. H. Usow, The Lawson Algorithm and Extensions, Mathematics of Computation, vol. 22, pp. 118–127, 1968.
[6] J. R. Rice, The Approximation of Functions. Addison-Wesley, 1964,
vol. 1.
[7] C. S. Burrus, J. A. Barreto, and I. W. Selesnick, Iterative Reweighted Least-Squares Design of FIR Filters, IEEE Transactions on Signal Processing, vol. 42, no. 11, pp. 2926–2936, November 1994.
[8] L. A. Karlovitz, Construction of Nearest Points in the Lp, p even, and L∞ Norms. I, Journal of Approximation Theory, vol. 3, pp. 123–127, 1970.
[9] S. W. Kahng, Best Lp Approximations, Mathematics of Computation, vol. 26, no. 118, pp. 505–508, April 1972.
[10] R. Fletcher, J. A. Grant, and M. D. Hebden, The Calculation of Linear Best Lp Approximations, The Computer Journal, vol. 14, no. 118, pp. 276–279, Apr 1972.
[11] M. Aoki, Introduction to Optimization Techniques. The Macmillan
Company, 1971.
[12] J. A. Barreto, Lp Approximation by the Iterative Reweighted Least Squares Method and the Design of Digital FIR Filters in One Dimension, Master's thesis, Rice University, 1992.
[13] C. S. Burrus and J. A. Barreto, Least p-power Error Design of FIR Filters, in Proc. IEEE Int. Symp. Circuits, Syst. ISCAS-92, vol. 2, San Diego, CA, May 1992, pp. 545–548.
[14] J. Nocedal and S. J. Wright, Numerical Optimization, ser. Springer series
in operations research. New York, NY: Springer-Verlag, 1999.
[15] J. W. Adams and J. L. Sullivan, Peak-Constrained Least-Squares Optimization, IEEE Trans. on Signal Processing, vol. 46, no. 2, pp. 306–321, Feb. 1998.
[16] J. W. Adams, FIR Digital Filters with Least-Squares Stopbands Subject to Peak-Gain Constraints, IEEE Trans. on Circuits and Systems, vol. 39, no. 4, pp. 376–388, April 1991.
[17] I. W. Selesnick, M. Lang, and C. S. Burrus, Constrained Least Square Design of FIR Filters without Specified Transition Bands, IEEE Transactions on Signal Processing, vol. 44, no. 8, pp. 1879–1892, August 1996.
[18] M. Lang, I. W. Selesnick, and C. S. Burrus, Constrained Least Square Design of 2-D FIR Filters, IEEE Transactions on Signal Processing, vol. 44, no. 5, pp. 1234–1241, May 1996.
[19] I. W. Selesnick, M. Lang, and C. S. Burrus, A Modified Algorithm for Constrained Least Square Design of Multiband FIR Filters Without Specified Transition Bands, IEEE Transactions on Signal Processing, vol. 46, no. 2, pp. 497–501, Feb. 1998.
[20] R. Fletcher and M. J. D. Powell, A Rapidly Convergent Descent Method for Minimization, Computer Journal, vol. 6, no. 2, pp. 163–168, 1963.
[21] J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained
Optimization and Nonlinear Equations. Philadelphia, PA: SIAM, 1996.
[22] T. P. Krauss et al., Signal Processing Toolbox User's Guide. The MathWorks, 1994, ch. 2, pp. 143–145.
[23] J. T. Spanos and D. L. Mingori, Newton Algorithm for Fitting Transfer Functions to Frequency Measurements, Journal of Guidance, Control and Dynamics, vol. 16, no. 1, pp. 34–39, January 1993.
[24] D. C. Sorensen, Newton's Method with a Model Trust Region Modification, SIAM Journal of Numerical Analysis, vol. 16, pp. 409–426, 1982.
[25] R. Kumaresan and C. S. Burrus, Fitting a Pole-Zero Filter Model to Arbitrary Frequency Response Samples, Proc. ASILOMAR, pp. 1649–1652, 1991.
[26] F. B. Hildebrand, Introduction to Numerical Analysis. McGraw-Hill, 1974.
[27] E. C. Levy, Complex-Curve Fitting, IRE Transactions on Automatic Control, vol. AC-4, no. 1, pp. 37–43, May 1959.
[28] T. W. Parks and C. S. Burrus, Digital Filter Design. John Wiley and
Sons, 1987.
[29] B. G. C. F. M. R. de Prony, Essai Expérimental et Analytique: Sur les lois de la Dilatabilité des fluides élastiques et sur celles de la Force expansive de la vapeur de l'eau et de la vapeur de l'alkool, à différentes températures, Journal de l'École Polytechnique (Paris), vol. 1, no. 2, pp. 24–76, 1795.
[30] H. E. Padé, Sur la Représentation Approchée d'une Fonction par des Fractions Rationnelles, Annales Scientifiques de l'École Normale Supérieure (Paris), vol. 9, no. 3, pp. 1–98, 1892.
[31] C. K. Sanathanan and J. Koerner, Transfer Function Synthesis as a Ratio of Two Complex Polynomials, IEEE Transactions on Automatic Control, vol. AC-8, pp. 56–58, January 1963.
[32] M. A. Sid-Ahmed, A. Chottera, and G. A. Jullien, Computational Techniques for Least-Square Design of Recursive Digital Filters, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-26, no. 5, pp. 477–480, October 1978.
[33] L. E. McBride, H. W. Schaefgen, and K. Steiglitz, Time-Domain Approximation by Iterative Methods, IEEE Transactions on Circuit Theory, vol. CT-13, no. 4, pp. 381–387, December 1966.
[34] K. Steiglitz and L. E. McBride, A Technique for the Identification of Linear Systems, IEEE Transactions on Automatic Control, vol. AC-10, pp. 461–464, October 1965.
[35] A. W. Soewito, Least square digital filter design in the frequency domain, Ph.D. dissertation, Rice University, December 1990.
[36] L. B. Jackson, Frequency-Domain Steiglitz-McBride Method for Least-Squares IIR Filter Design, ARMA Modeling, and Periodogram Smoothing, IEEE Signal Processing Letters, vol. 15, pp. 49–52, 2008.
[37] A. Antoniou, Digital Filters: Analysis, Design, and Applications, 2nd ed.
McGraw-Hill, 1993.
[38] E. Cunningham, Digital Filtering: An Introduction. Houghton-Mifflin, 1992.
[39] E. W. Cheney, Introduction to Approximation Theory, ser. Intl. Series in
Pure and Applied Mathematics. McGraw-Hill, 1966.
[40] V. Chvatal, Linear Programming. Freeman and Co., 1980.
[41] G. Strang, Introduction to Applied Mathematics. Wellesley-Cambridge
Press, 1986.
[42] S. A. Ruzinsky, L1 and L∞ Minimization via a Variant of Karmarkar's Algorithm, IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 37, no. 2, pp. 245–253, Feb. 1989.
[43] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing.
Englewood Cliffs, N.J.: Prentice-Hall, 1989.
[44] C. S. Burrus, J. H. McClellan et al., Computer-based Exercises for Signal
Processing. Prentice Hall, 1994.
[45] R. A. Vargas and C. S. Burrus, Adaptive Iterative Reweighted Least Squares Design of Lp FIR Filters, Proc. ICASSP, vol. 3, Signal Processing, Theory and Methods, pp. 1129–1132, 1999.
[46] K. S. Shanmugan and A. M. Breipohl, Random Signals: Detection,
Estimation and Data Analysis. John Wiley & Sons, 1988.
[47] J. G. Proakis and D. G. Manolakis, Digital Signal Processing. Macmil-
lan Publishing Co., 1988.
[48] S.-P. Wu, S. Boyd, and L. Vandenberghe, FIR Filter Design via Spectral Factorization and Convex Optimization, to appear in Applied Computational Control, Signal and Communications, B. Datta, Ed., Birkhauser, 1997.
[49] K. Steiglitz, Computer-Aided Design of Recursive Digital Filters, IEEE Transactions on Audio and Electroacoustics, vol. AU-18, no. 2, pp. 123–129, June 1970.
[50] A. Deczky, Synthesis of Recursive Digital Filters Using the Minimum p-error Criterion, IEEE Transactions on Audio and Electroacoustics, vol. AU-20, no. 4, pp. 257–263, October 1972.
[51] R. Kumaresan, Identification of Rational Transfer Functions from Frequency Response Samples, IEEE Transactions on Aerospace and Electronic Systems, vol. 26, no. 6, pp. 925–934, November 1990.
[52] R. E. Mickens, Difference equations. Van Nostrand Reinhold, 1987.
[53] S. N. Elaydi, An Introduction to Difference Equations, ser. Undergrad-
uate texts in mathematics. New York: Springer, 1996.
[54] D. F. Tuttle, Jr., On Fluids, Networks, and Engineering Education, in Aspects of Network and System Theory, R. E. Kalman and N. DeClaris, Eds. Holt, Rinehart and Winston, Inc., 1971, pp. 591–612.
[55] M. L. Van Blaricum, A Review of Prony's Method Techniques for Parameter Estimation, in Air Force Statistical Estimation Workshop, May 1978, pp. 125–135.
[56] L. Weiss and R. N. McDonough, Prony's Method, Z-transforms, and Padé Approximation, SIAM Review, vol. 5, no. 2, pp. 145–149, April 1963.
[57] I. Barrodale and D. D. Olesky, Exponential Approximation Using Prony's Method, in The Numerical Solution of Nonlinear Problems, C. T. H. Baker and C. Phillips, Eds. New York: Oxford University Press, 1981, ch. 20, pp. 258–269.
[58] S. L. Marple, Jr., Digital Spectral Analysis with Applications, ser. Signal Processing Series. Englewood Cliffs, NJ: Prentice-Hall, 1987.
[59] C. F. Gerald and P. O. Wheatley, Applied Numerical Analysis. Reading, MA: Addison-Wesley, 1984.
[60] H. Cabannes, Ed., Padé Approximants Method and its Applications to Mechanics, ser. Lecture notes in physics. Springer-Verlag, 1976, no. 47.
[61] G. H. Golub and C. F. Van Loan, Matrix Computations. Johns Hopkins University Press, 1996.
[62] H. Fan and M. Doroslovački, On Global Convergence of Steiglitz-McBride Adaptive Algorithm, IEEE Transactions on Circuits and Systems II, vol. 40, no. 2, pp. 73–87, February 1993.
[63] P. Stoica and T. Söderström, The Steiglitz-McBride Identification Algorithm Revisited: Convergence Analysis and Accuracy Aspects, IEEE Transactions on Automatic Control, vol. AC-26, no. 3, pp. 712–717, June 1981.
[64] J. H. McClellan and D. Lee, Exact Equivalence of the Steiglitz-McBride Iteration and IQML, IEEE Transactions on Signal Processing, vol. 39, no. 2, pp. 509–512, February 1991.
[65] R. E. Bellman and R. E. Kalaba, Quasilinearization and Nonlinear Boundary-Value Problems. New York: American Elsevier, 1965.
[66] R. A. Vargas and C. S. Burrus, On the Design of Lp IIR Filters with Arbitrary Frequency Bands, in Proc. ICASSP, vol. 6, 2001, pp. 3829–3832.