
Digital Sound Labs
Digital Audio Signal Processing

Practical Considerations in Fixed-Point

FIR Filter Implementations
Randy Yates

March 13, 2003 1:20

Typeset using PICTEX


Table of Contents
1 Introduction
1.1 Motivation
1.2 Conventions
2 Scaling FIR Coefficients
3 Choosing the FIR Filter Output Word
4 Quantization Noise in FIR Filters
4.1 Truncation
4.2 Rounding
4.3 Dithering
4.4 Noise-shaping
5 Conclusions
6 References

Suggestions for improvements:
1. Add rightindent to theorem definition.
2. Put “Example” in small caps.
3. The derivation leading up to theorem 3 is unclear - even I had a hard time following it! Also, the use of alternate symbols in theorem 3 is unclear.
4. Expand the warning after theorem 3 to explain coefficient quantization error effects (frequency response).
5. Add references.
6. Change “greatly” to “significantly”.
7. Consider combining this paper and the FP paper into one “book.”
8. Replace comma with period in the equation just prior to equation (2).
9. Clarify that “maximum coefficient length” means the size of the register that holds a coefficient.
10.

1 Introduction

1.1 Motivation

The most basic type of filter in digital signal processing is the Finite Impulse Response (FIR) filter. By
definition, a filter is classified as FIR if it has a z-transform of the form

$$H(z) = \frac{b_0 z^{N-1} + b_1 z^{N-2} + \ldots + b_{N-2} z + b_{N-1}}{z^{M-1}}, \quad b_i \in \mathbb{R},\ N, M \in \mathbb{Z},\ N > 0,\ z \in \mathbb{C},$$
where $\mathbb{R}$ denotes the reals, $\mathbb{Z}$ denotes the integers, and $\mathbb{C}$ denotes the complex numbers. This is referred
to as an $N$-tap FIR filter. In general, an FIR filter can be either causal or non-causal. However, FIR
filters are always stable, and indeed that is the chief reason they are widely utilized.

The difference equation that results from H(z) is

$$\begin{aligned}
y[n] &= b_0 x[n+N-M] + b_1 x[n+N-M-1] + \ldots + b_{N-2} x[n-M+2] + b_{N-1} x[n-M+1] \\
     &= \sum_{i=0}^{N-1} b_i x[n+N-M-i].
\end{aligned}$$

If $N = M$, this simplifies to

$$\begin{aligned}
y[n] &= b_0 x[n-0] + b_1 x[n-1] + \ldots + b_{N-2} x[n-(N-2)] + b_{N-1} x[n-(N-1)] \\
     &= \sum_{i=0}^{N-1} b_i x[n-i].
\end{aligned}$$

This is the familiar result of the discrete convolution of the filter with the input data.
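
As a minimal sketch of the causal case ($N = M$), the convolution sum can be written directly in Python. The function name `fir_filter` and the assumption that samples before the start of the input are zero are illustrative choices and do not come from the text above.

```python
def fir_filter(b, x):
    """Direct-form FIR: y[n] = sum_{i=0}^{N-1} b[i] * x[n - i].

    Assumes x[n] = 0 for n < 0 (an illustrative choice of initial state).
    """
    N = len(b)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i in range(N):
            if n - i >= 0:          # skip samples before the start of the input
                acc += b[i] * x[n - i]
        y.append(acc)
    return y


# A unit impulse at the input reproduces the coefficients at the output.
impulse_response = fir_filter([0.5, 0.25, 0.125], [1.0, 0.0, 0.0, 0.0])
```

Feeding a unit impulse is a quick sanity check: the output should be the coefficient list followed by zeros.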

The equations above are the idealized, mathematical representations of an FIR filter because the
arithmetic operations of addition, subtraction, multiplication, and division are performed over the field of real
numbers $(\mathbb{R}, +, \times)$, i.e., in the real number system (or over the complex field if the data or coefficients
contain imaginary values). In practice, both the coefficients and the data values are constrained to be
fixed-point rationals (see “Fixed Point Arithmetic: An Introduction”), a subset of the rationals. While
this set is closed, it is not “bit bounded”, i.e., the number of bits required to represent a value in the
fixed-point rationals can be arbitrarily large. In a practical system one is limited to a finite number
of bits in the words used for the filter input, coefficients, and filter output. Most current digital signal
processors provide arithmetic logic units and memory architectures to support 16-bit, 24-bit, or 32-bit
wordlengths; however, one may implement arbitrarily long wordlengths by customizing the multiplications
and additions in software and utilizing more processor cycles and memory. Similar choices can be made
in digital hardware implementations. The final choices are governed by many aspects of the design such
as required speed, power consumption, SNR, and cost.

1.2 Conventions
We shall represent scaled quantities using the $U(a, b)$ and $A(a, b)$ notation described in “Fixed Point
Arithmetic: An Introduction”.
There are generally two methods of operating on fixed-point data used today: integer and fractional.
The integer method interprets the data as integers (either natural binary or signed two’s complement)
and performs integer arithmetic. For example, the Texas Instruments TMS320C54x DSP is an integer
machine. The fractional method assumes the data are fixed-point rationals bounded between -1 and +1.
The Motorola 56002 DSP is an example of a machine which uses fractional arithmetic. Except for an
extra left shift performed in fractional multiplies, these two methods can be considered equivalent. In
this article we shall utilize the integer method because I find it simpler and I am more familiar with it.
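
The “extra left shift” can be illustrated with a small sketch. It assumes 16-bit inputs read as Q15 fractions (value = word / 2^15) and a 32-bit product read as Q31 (value = word / 2^31); these formats and the function names are assumptions for illustration and are not taken from either processor's documentation.

```python
def integer_multiply(a, b):
    """Integer method: the 16x16 product is used as-is (32-bit result)."""
    return a * b


def fractional_multiply(a, b):
    """Fractional method: same product with one extra left shift.

    Assumes Q15 inputs (value = word / 2**15) and a Q31 result
    (value = word / 2**31); these formats are assumptions for illustration.
    """
    return (a * b) << 1


A, B = 16384, 8192                  # 0.5 and 0.25 when read as Q15
p_int = integer_multiply(A, B)      # 2**27
p_frac = fractional_multiply(A, B)  # 2**28, i.e. 0.125 when read as Q31
```

The two results carry the same bit pattern apart from the shift, which is why the methods can be treated as equivalent for the purposes of this article.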

2 Scaling FIR Coefficients

Consider an FIR filter with $N$ coefficients $b_0, b_1, \ldots, b_{N-1}$, $b_i \in \mathbb{R}$. From “Fixed Point Arithmetic: An
Introduction”, we see that in fixed-point arithmetic a binary word can be interpreted as an unsigned
or signed fixed-point rational. Although there are a number of situations in which the filter coefficients
could be the same sign (and thus could be represented using unsigned values), let us assume they are
not and hence that we must utilize signed fixed-point rationals for our coefficients. Thus we must find
a way of representing, or more accurately, of estimating, the filter coefficients using signed fixed-point
rationals.
Since a signed fixed-point rational is a number of the form $B_i/2^b$, where $B_i$ and $b$ are integers,
$-2^{M-1} \le B_i \le 2^{M-1} - 1$, and $M$ is the wordlength used for the coefficients, we determine the estimate $b'_i$ of
coefficient $b_i$ by choosing a value for $b$ and then determining $B_i$ as
$$B_i = \mathrm{round}(b_i \cdot 2^b).$$
Then
$$b'_i = B_i/2^b.$$
In general, $b'_i$ is only an estimate of $b_i$ because of the rounding operation. This approximation
phenomenon is referred to as coefficient quantization because, in a real sense, we are quantizing the
coefficients in amplitude just as an A/D converter amplitude-quantizes an analog input signal.
We can determine the “quantization error” $e_i$ between the estimate and the real value by taking their
difference:
$$\begin{aligned}
e_i &= b'_i - b_i \\
    &= B_i/2^b - b_i \\
    &= \frac{\mathrm{round}(b_i \cdot 2^b)}{2^b} - b_i \\
    &= \mathrm{round}(b_i, -b) - b_i,
\end{aligned}$$
where “round(x, y)” denotes rounding at bit y of the binary value x. The value y = 0 rounds at the
units bit, with negative values going to the right of the binary point and positive values going to the left of
the units bit. For example, round(1.0010110, −5) = 1.00110.
The question we have not yet answered is: How do we choose $b$? In order to answer this, note that the
maximum error $e_{i,\max}$ a quantized coefficient can have will be one-half of the weight of the bit being rounded at, i.e.,
$$e_{i,\max} = 2^{-b}/2 = 2^{-b-1}.$$
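
The round(x, y) operation and the error bound can be checked numerically. The sketch below uses the binary example from the text, 1.0010110 (which is 1.171875 in decimal), rounded at bit −5. The helper name `round_at_bit` and the round-half-up convention are assumptions made for illustration.

```python
import math


def round_at_bit(x, y):
    """Round x at bit y (y = 0 is the units bit, negative y is fractional).

    Uses round-half-up via floor(v + 0.5); this convention is an assumption.
    """
    weight = 2.0 ** y
    return math.floor(x / weight + 0.5) * weight


x = 1.171875                      # binary 1.0010110
rounded = round_at_bit(x, -5)     # binary 1.00110
error = rounded - x
bound = 2.0 ** (-5 - 1)           # 2**(-b-1) with b = 5
```

This particular value sits exactly halfway between two representable values, so the error reaches the bound rather than staying below it.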

It is now easy to see that, lacking any other criteria, the ideal value for b is the maximum it can be
since that will result in the least amount of coefficient quantization error. Well just what exactly is the
maximum, anyway? After all, b is from the integers, and the integers go to infinity. So the maximum is
infinity, right?
Well, no. Again, considering the coefficient wordlength to be $M$ (bits), note that a signed, two’s complement
value has a maximum positive value of $2^{M-1} - 1$. Therefore we must be careful not to choose a value
for $b$ which will produce a $B_i$ that has a magnitude bigger than $2^{M-1} - 1$. When a value becomes too big
to be represented by the representation we have chosen (in this case, $M$-bit signed two’s complement),
we say that an overflow has occurred. Thus we must be careful to choose a value for $b$ that will not
overflow the largest magnitude coefficient. We may compute this maximum value for $b$ as
$$b = \left\lfloor \log_2\!\left( (2^{M-1} - 1)/\max(|b_i|) \right) \right\rfloor,$$
where $\lfloor x \rfloor$ denotes the greatest integer less than or equal to $x$.
In summary we see that, lacking any other criteria, the ideal value for $b$ is the maximum value which
can be used without overflowing the coefficients since that provides the minimum coefficient quantization
error. We emphasize this important result by stating the following

1. First Coefficient Scaling Theorem. Let $b_i$ be a set of coefficients with scale factor $b$. Maximum
precision is preserved when $b$ is chosen to be the maximum integer possible without overflowing the
coefficient representation, i.e.,
$$b = \left\lfloor \log_2\!\left( (2^{M-1} - 1)/\max(|b_i|) \right) \right\rfloor, \qquad (1)$$
where $M$ is the coefficient wordsize in bits.

Example 1
Consider a 4-tap FIR filter with the following coefficients:
$$\begin{aligned}
b_0 &= +1.2830074 \\
b_1 &= -2.3994138 \\
b_2 &= +0.1234689 \\
b_3 &= +0.0029153
\end{aligned}$$
Assuming 16-bit wordlengths, find a) the scaling factor $b$, and b) the coefficient estimates $b'_i$ using rule 1.
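Solution:

$$\begin{aligned}
b &= \left\lfloor \log_2\!\left( (2^{M-1} - 1)/\max(|b_i|) \right) \right\rfloor \\
  &= \left\lfloor \log_2\!\left( (2^{16-1} - 1)/2.3994138 \right) \right\rfloor \\
  &= \lfloor 13.73727399 \rfloor \\
  &= 13.
\end{aligned}$$
Since $2^{13} = 8192$,
$$\begin{aligned}
b'_0 &= \mathrm{round}(+1.2830074 \times 8192)/8192 = +1.2829589843750 \\
b'_1 &= \mathrm{round}(-2.3994138 \times 8192)/8192 = -2.3994140625000 \\
b'_2 &= \mathrm{round}(+0.1234689 \times 8192)/8192 = +0.1234130859375 \\
b'_3 &= \mathrm{round}(+0.0029153 \times 8192)/8192 = +0.0029296875000
\end{aligned}$$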
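
The calculation in Example 1 can be reproduced with a short script. This is a sketch only: the function names are hypothetical, and rounding is done half-up with `math.floor(v + 0.5)` because Python's built-in `round()` rounds halves to even, which is a different convention.

```python
import math


def max_scale_factor(coeffs, M):
    """Equation (1): largest b that keeps every B_i within M-bit signed range."""
    largest = max(abs(c) for c in coeffs)
    return math.floor(math.log2((2 ** (M - 1) - 1) / largest))


def quantize_coeffs(coeffs, b):
    """B_i = round(b_i * 2**b), rounding half up (not Python's round())."""
    return [math.floor(c * 2**b + 0.5) for c in coeffs]


coeffs = [+1.2830074, -2.3994138, +0.1234689, +0.0029153]
b = max_scale_factor(coeffs, M=16)          # scale factor from equation (1)
B = quantize_coeffs(coeffs, b)              # integer coefficient words
estimates = [B_i / 2**b for B_i in B]       # b_i' = B_i / 2**b
```

The list `B` holds the integers that would actually be stored in the 16-bit coefficient memory, and `estimates` holds the values they represent.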

So that’s it, right? We now know everything there is to know about coefficient scaling, right?

Well, no. Remember when I said, “...lacking any other criteria...”? Well, guess what - there are other criteria.

Adding two $J$-bit values requires $J + 1$ bits in order to maintain precision and avoid overflow when there is no a
priori knowledge about the values being added. For example, if the 16-bit signed two’s complement values 21,583
and 12,042 are summed, the result is 33,625. Since the maximum value for a 16-bit signed two’s complement
number is 32,767, we must add an extra bit to avoid overflowing. Also, since the result is odd, the least-significant
bit (bit 0) is set, so we cannot simply take the upper 16 bits of the 17-bit result without losing precision. As
a counterexample, consider processing a stream of data in which any two adjacent samples are known to be of
opposite signs. In this case, we would be able to guarantee that the sum of two adjacent $J$-bit samples would
never overflow $J$ bits.
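
To see what the overflow looks like in practice, the sum above can be wrapped to a fixed register width. This is a minimal sketch; `wrap_to_signed` is a hypothetical helper that models a register that silently discards the bits that do not fit, which is an assumption about the hardware.

```python
def wrap_to_signed(value, bits):
    """Interpret value modulo 2**bits as a signed two's complement number."""
    mask = (1 << bits) - 1
    value &= mask
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


total = 21583 + 12042                    # 33625, needs 17 bits
wrapped_16 = wrap_to_signed(total, 16)   # what a 16-bit register would hold
kept_17 = wrap_to_signed(total, 17)      # fits once one extra bit is added
```

In 16 bits the positive sum wraps around to a large negative number, while 17 bits hold it exactly.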

We may easily extend this rule to sums of multiple values and state the result as the

2. Fixed-Point Summation Theorem. The sum of $N$ $J$-bit values requires $J + \lceil \log_2 N \rceil$ bits to
maintain precision and avoid overflow if no information is known about the values.

Let us consider an $N$-tap FIR filter which has $L$-bit data values and $M$-bit coefficients. Then using the relations
above, the final $N$-term sum required at each time interval $n$,

$$y[n] = b'_0 x[n] + b'_1 x[n-1] + b'_2 x[n-2] + \ldots + b'_{N-1} x[n-N+1],$$

requires $L + M + \log_2 N$ bits in order to maintain precision and avoid overflow if no information is known about
the data or the coefficients. For example, a 64-tap FIR filter ($N = 64$) with 16-bit coefficients and data values
($L = M = 16$) requires $L + M + \log_2(N) = 32 + \log_2(64) = 32 + 6 = 38$ bits in order to maintain precision and
avoid overflow.
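
The bit-growth rule is easy to tabulate. The sketch below applies the ceiling from the Fixed-Point Summation Theorem so that tap counts which are not powers of two are also covered; the function names are hypothetical.

```python
import math


def sum_bits(J, N):
    """Fixed-Point Summation Theorem: bits needed for a sum of N J-bit values."""
    return J + math.ceil(math.log2(N))


def fir_output_bits(L, M, N):
    """Bits for an N-tap sum of (L-bit data) x (M-bit coefficient) products."""
    return sum_bits(L + M, N)


bits_needed = fir_output_bits(L=16, M=16, N=64)
```

For the 64-tap example this gives the same 38 bits as the text, and `sum_bits(16, 2)` gives the 17 bits needed for the two-value sum discussed earlier.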

Most processors and hardware components provide the ability to multiply two $M$-bit values together to form
a $2M$-bit result. For example, the Integrated Device Technology 7210 multiplier-accumulator performs 16x16
multiplies to a 32-bit result. Most general purpose and some DSP processors provide an accumulator that is the
same width as the multiplier output. For example, the Texas Instruments TMS320C50 DSP provides a 16x16
multiplier and a 32-bit accumulator. Some DSP processors provide a $(2M + G)$-bit accumulator, where $G$ denotes
“guard bits” (to be explained shortly). For example, the Texas Instruments TMS320C54x DSP provides a 16x16
multiplier with a 32-bit output and a 40-bit accumulator ($M = 16$, $G = 8$).

Therefore another criterion in the design of FIR filters is that the final convolution sum fit within the accumulator.
To put it algebraically, we require that
$$2M + \log_2 N \le 2M + G,$$
where we have assumed that the coefficient wordlength and the data wordlength are the same ($M$ bits), and where
we have assumed we have no information about the data or the coefficients. The key point here is that the
number of bits required for the filter output increases with the length of the filter.

For those situations in which $G = 0$ (e.g., the TMS320C50), we see that we immediately have a problem for
even a two-tap FIR filter since that filter requires $2M + \log_2 2 = 2M + 1$ bits and the accumulator is only $2M$
bits. This is precisely why the extra $G$ bits which are available on some processors are called “guard bits” - they
guard against overflow when performing summations. However, even though the accumulator may have guard
bits, it is still possible to overflow the accumulator if $\log_2 N > G$, i.e., if we attempt to use a filter that is longer
than $2^G$ taps.

The easiest solution is to simply decree that we shall maintain an optimistic outlook. In other words, we will
acknowledge that our filter won’t work for “the most general case” and hope and pray that those cases (i.e.,
those combinations of N data values) which would result in overflow for our filter will never occur. However, this
is rather like sticking one’s head in the sand, because if and when overflows occur, they can be catastrophic. In
signed two’s complement systems, overflows cause abrupt variations in output levels which, in the case of digital
audio, are very audible to say the least and extremely rude to be more accurate.

Another solution is to redesign the filter to use fewer taps. However, if there are no guard bits, then the filter
would be reduced to a gain control (i.e., 1 tap), and even with guard bits, the number of filter taps is usually at
a premium to begin with anyway (i.e., we can almost always use more taps to implement a better filter).

Yet another solution is to scale down the data values by $K$ bits before applying the filter, thus allowing $2^K$
times as many taps in the filter before overflowing. This is, in general, a horrible idea because it degrades the
signal-to-noise ratio of the signal path by about 6 dB per bit.
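
The 6 dB figure can be made explicit. Assuming the noise floor stays fixed while the signal amplitude is reduced by a factor of $2^K$ (an assumption stated here for the sketch), the loss in signal-to-noise ratio is

$$\Delta\mathrm{SNR} = 20 \log_{10}(2^K) = K \cdot 20 \log_{10} 2 \approx 6.02\,K\ \text{dB}.$$

For instance, scaling the data down by $K = 2$ bits to accommodate four times as many taps would cost roughly 12 dB under this assumption.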

A better solution is to modify the way in which we use equation (1) to scale the coefficients. Since the $M$ we use
in equation (1) is effectively the number of bits used for the coefficients, we can simply use an alternate value that
is smaller than the $M$ bits available in our hardware. After all, just because we have an $M$-bit wordlength available
for the coefficients doesn’t mean we have to use all $M$ bits. Therefore let us use $M'$ bits for the coefficients,
where $M' \le M$.

What size shall we make $M'$? Let us (lettuce?) calculate it based on the width of the accumulator:

$$M + M' + \log_2 N = 2M + G \implies M' = \min(M, M + G - \log_2 N).$$

We summarize this section with

3. Second Coefficient Scaling Theorem. If no information is known about the data or the coefficients,
then the coefficient wordlength $M'$ must be

$$M' = \min(M, A - L - \log_2 N), \qquad (2)$$

in order to avoid overflow and preserve precision in an $N$-tap FIR filter output, where $M$ is the maximum
coefficient wordlength, $A$ is the accumulator wordlength, and $L$ is the data wordlength.

WARNING: There is a cost associated with this solution: increased coefficient quantization error. This fact
should not be overlooked when weighing the options.

Example 2

We continue with the 4-tap FIR filter we used in Example 1. Assume the maximum coefficient wordlength is 16
bits, the data wordlength is 16 bits, and the accumulator wordlength is 32 bits.

a. Find the value for $M'$, i.e., the effective coefficient wordlength that will avoid overflow and guarantee precision
is preserved in the filter output using rule 2.

b. Substitute this result into coefficient scaling rule 1 to obtain $b'$, the new coefficient scaling.

Solution:

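a. Simply plug the numbers into equation (2):

$$\begin{aligned}
M' &= \min(M, A - L - \log_2 N) \\
   &= \min(16, 32 - 16 - \log_2 4) \\
   &= \min(16, 32 - 16 - 2) \\
   &= \min(16, 14) \\
   &= 14.
\end{aligned}$$

b. Substitute $M = M' = 14$ into equation (1):

$$\begin{aligned}
b' &= \left\lfloor \log_2\!\left( (2^{M'-1} - 1)/\max(|b_i|) \right) \right\rfloor \\
   &= \left\lfloor \log_2\!\left( (2^{14-1} - 1)/2.3994138 \right) \right\rfloor \\
   &= \lfloor 11.73714\ldots \rfloor \\
   &= 11.
\end{aligned}$$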

We see that the reduction of wordlength by 2 bits in part a also results in a reduction in the coefficient scale
factor by 2 bits and thus increases the coefficient quantization error. This is the price paid for ensuring the result
will not overflow.
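
Example 2 can be checked with a few lines of code. This is a sketch with hypothetical function names; it applies a ceiling to $\log_2 N$ so that the effective wordlength stays an integer when $N$ is not a power of two, which is an assumption added here and not part of equation (2) as stated.

```python
import math


def effective_wordlength(M, A, L, N):
    """Equation (2): M' = min(M, A - L - log2(N)).

    Uses ceil(log2(N)) so that M' stays an integer when N is not a power
    of two; that rounding is an assumption added here.
    """
    return min(M, A - L - math.ceil(math.log2(N)))


def max_scale_factor(largest, M):
    """Equation (1) for a given largest coefficient magnitude."""
    return math.floor(math.log2((2 ** (M - 1) - 1) / largest))


M_eff = effective_wordlength(M=16, A=32, L=16, N=4)   # effective wordlength
b_new = max_scale_factor(2.3994138, M_eff)            # reduced scale factor
```

Comparing `b_new` with the value obtained for the full 16-bit wordlength shows the two-bit loss in coefficient scaling described above.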

So now we’re really done, right? We certainly must now know everything there is to know about coefficient
scaling, right?

Well, no. Remember when I said, “...if no information is known about the data or coefficients...”? It is often the
case that the coefficient values are known at design time (and won’t change). Therefore we do have information
about the coefficients. How can we use this information to improve our filter architecture?

Since we are constantly concerned about overflow in fixed-point digital signal processing, let us begin by
considering what combination (or combinations) of $N$ input data values will provide maximum output from a given
$N$-tap FIR filter. In order to answer this, recall the triangle inequality:

|a + b| ≤ |a| + |b|.

Using the obvious relation a + b ≤ |a + b|, we then have

a + b ≤ |a + b| ≤ |a| + |b| ⇒ a + b ≤ |a| + |b|.

We may generalize this 2-term sum to an $N$-term sum. This means that the signs of $x[k]$ that will make the
terms $b_i x[n-i]$ all positive in the convolution sum

$$y[n] = \sum_{i=0}^{N-1} b_i x[n-i]$$

will result in larger output. This occurs when $\mathrm{sgn}(x[n-i]) = \mathrm{sgn}(b_i)$. We may therefore rewrite the set of
$x[n-i]$ values that maximize the output as

$$x[n-i] = \mathrm{sgn}(b_i)\,|x[n-i]|.$$

Our convolution sum now looks like this:

$$\begin{aligned}
y[n] &= \sum_{i=0}^{N-1} b_i x[n-i] \\
     &= \sum_{i=0}^{N-1} b_i \left( \mathrm{sgn}(b_i)\,|x[n-i]| \right).
\end{aligned}$$

But note that $\mathrm{sgn}(r)\,r = |r|$ for any real value $r$. Therefore $b_i\,\mathrm{sgn}(b_i) = |b_i|$, and we have

$$y[n] = \sum_{i=0}^{N-1} |b_i|\,|x[n-i]|.$$

What further property could we assign to $x[n-i]$ that would maximize this sum? It should be obvious that if
we maximize all the magnitudes of $x[n]$, then we maximize the sum. Therefore let $|x[n-i]| = x_{MAX}$, where
$x_{MAX}$ denotes the maximum magnitude possible for $x[n-i]$. Then

$$\begin{aligned}
y_{MAX}[n] &= \sum_{i=0}^{N-1} |b_i|\,x_{MAX} \\
           &= x_{MAX} \sum_{i=0}^{N-1} |b_i|.
\end{aligned}$$

Also, denote the sum on the right by $\alpha$, i.e.,

$$\alpha = \sum_{i=0}^{N-1} |b_i|.$$
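
The worst-case argument can be checked numerically. The sketch below builds the input window $x[n-i] = \mathrm{sgn}(b_i)\,x_{MAX}$ and confirms that the resulting output equals $x_{MAX}\,\alpha$. The coefficient values and function names are illustrative assumptions, chosen so the arithmetic is exact in floating point.

```python
def coefficient_area(coeffs):
    """alpha = sum of |b_i|."""
    return sum(abs(c) for c in coeffs)


def worst_case_input(coeffs, x_max):
    """Input window x[n - i] = sgn(b_i) * x_max that maximizes y[n]."""
    return [x_max if c >= 0 else -x_max for c in coeffs]


coeffs = [0.5, -0.25, 0.125, -0.125]       # illustrative values
x_max = 32767
window = worst_case_input(coeffs, x_max)
y_max = sum(c * x for c, x in zip(coeffs, window))
alpha = coefficient_area(coeffs)
```

Any other choice of signs in `window` makes at least one product negative and therefore gives a smaller sum.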

Let’s pause and consider our results so far. Basically, we see that the maximum output value of a filter is a
function of the “coefficient area” (α) in the filter. This seems intuitively obvious. Also obvious is the fact that
reducing the overall gain of the filter reduces the maximum filter output as well.

So far we have been operating in the infinite-precision (i.e., real) domain. Now express this result in terms of
the unscaled integers $X$ and $B_i$, where $x$ is scaled $A(a_x, b_x)$ and $b_i$ is scaled $A(a_b, b_b)$ so that $x = X/2^{b_x}$ and
$b_i = B_i/2^{b_b}$:

$$\begin{aligned}
y_{MAX}[n] &= (x_{MAX}) \left( \sum_{i=0}^{N-1} |b_i| \right) \\
           &= (x_{MAX})(\alpha) \\
           &= (X_{MAX}/2^{b_x})(\alpha).
\end{aligned}$$
Let us use the previous notation of $A$ for accumulator wordlength, $L$ for the data wordlength, and $M$ for the
coefficient wordlength. Rules of fixed-point arithmetic dictate that the scaling of the result $y_{MAX}[n]$ will be
$A(A - b_x - b_b - 1,\ b_x + b_b)$. Thus

$$\begin{aligned}
Y_{MAX}[n]/2^{b_x + b_b} &= y_{MAX}[n] \\
                         &= (X_{MAX}/2^{b_x})(\alpha) \\
Y_{MAX}[n] &= 2^{b_b}\,\alpha\,X_{MAX}.
\end{aligned}$$

We know that the maximum of any $T$-bit signed two’s complement integer is $2^{T-1} - 1$, which, when $T \gg 0$,
can be approximated as simply $2^{T-1}$. Therefore we can express the last result as

$$2^{A-1} \ge 2^{b_b}\,\alpha\,2^{L-1}.$$

Note the use of the inequality since $\alpha$ will seldom be an exact power of two. Take the log base 2 of both sides
and solve for $b_b$:
$$\begin{aligned}
A - 1 &\ge b_b + \log_2 \alpha + L - 1 \\
b_b &\le A - L - \log_2 \alpha \\
\implies b_b &= A - L - \lceil \log_2 \alpha \rceil.
\end{aligned}$$
This important result says that in order to avoid overflow in the output the maximum value for the coefficient
scale factor $b_b$ is established by the accumulator wordlength $A$, the data wordlength $L$, and the coefficient area
$\alpha$.

Let us take stock then of the current situation. There are three criteria that the coefficient scale factor $b_b$ seeks
to satisfy:

1. We seek to maximize $b_b$ in order to reduce coefficient quantization error.

2. Given a maximum coefficient wordlength $M$, we seek to constrain $b_b$ so that the coefficient with the
largest magnitude is representable.

3. Given the accumulator wordlength $A$, the data wordlength $L$, and the information about the coefficients we
call the coefficient area $\alpha$, we seek to constrain $b_b$ so that overflows in the convolution sum are avoided.

Hence we see that the value for $b_b$ that meets all three criteria is given by the following function:
$$b_b = \min\!\left( \left\lfloor \log_2\!\left( (2^{M-1} - 1)/\max(|b_i|) \right) \right\rfloor,\ A - L - \lceil \log_2 \alpha \rceil \right) \qquad (3)$$
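
Equation (3) translates directly into code. The following is a minimal sketch with a hypothetical function name; the two coefficient sets are the filters used in Examples 3 to 5, evaluated for 32-bit and 40-bit accumulators.

```python
import math


def coefficient_scale_factor(coeffs, M, A, L):
    """Equation (3): b_b = min(representability limit, accumulator limit)."""
    largest = max(abs(c) for c in coeffs)
    alpha = sum(abs(c) for c in coeffs)                  # coefficient area
    representable = math.floor(math.log2((2 ** (M - 1) - 1) / largest))
    no_overflow = A - L - math.ceil(math.log2(alpha))
    return min(representable, no_overflow)


impulse = [1.0] + [0.0] * 15        # b_0 = 1, all other taps zero
average = [0.0625] * 16             # sixteen equal taps, alpha = 1

b_impulse = coefficient_scale_factor(impulse, M=16, A=32, L=16)
b_average_32 = coefficient_scale_factor(average, M=16, A=32, L=16)
b_average_40 = coefficient_scale_factor(average, M=16, A=40, L=16)
```

Returning the two limits separately instead of only their minimum would also show which constraint is the binding one in each case.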

Example 3

Consider the 16-tap FIR filter $b_0 = 1$ and $b_1, b_2, \ldots, b_{15} = 0$. Assuming an accumulator wordlength of 32 bits,
a data wordlength of 16 bits, and a coefficient wordlength of 16 bits, use equation (3) to establish the optimum
value for the coefficient scale factor $b_b$.

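Solution:
Calculate $\alpha$:
$$\alpha = \sum_{i=0}^{15} |b_i| = 1.$$
Then
$$\begin{aligned}
b_b &= \min\!\left( \left\lfloor \log_2\!\left( (2^{M-1} - 1)/\max(|b_i|) \right) \right\rfloor,\ A - L - \lceil \log_2 \alpha \rceil \right) \\
    &= \min\!\left( \left\lfloor \log_2\!\left( (2^{16-1} - 1)/1 \right) \right\rfloor,\ 32 - 16 - \lceil \log_2 1 \rceil \right) \\
    &= \min(14, 16) \\
    &= 14.
\end{aligned}$$
In this case we see that the limiting factor is that which allows the coefficients to be representable.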

Example 4
Consider the 16-tap FIR filter $b_0, b_1, b_2, \ldots, b_{15} = 0.0625$. Assuming an accumulator wordlength of 32 bits, a
data wordlength of 16 bits, and a coefficient wordlength of 16 bits, use equation (3) to establish the optimum
value for the coefficient scale factor $b_b$.
Solution:
Calculate $\alpha$:
$$\alpha = \sum_{i=0}^{15} |b_i| = 1.$$
Then
$$\begin{aligned}
b_b &= \min\!\left( \left\lfloor \log_2\!\left( (2^{M-1} - 1)/\max(|b_i|) \right) \right\rfloor,\ A - L - \lceil \log_2 \alpha \rceil \right) \\
    &= \min\!\left( \left\lfloor \log_2\!\left( (2^{16-1} - 1)/0.0625 \right) \right\rfloor,\ 32 - 16 - \lceil \log_2 1 \rceil \right) \\
    &= \min(18, 16) \\
    &= 16.
\end{aligned}$$
In this case we see that the limiting factor is that which avoids overflow in the accumulator.

Example 5
Consider the 16-tap FIR filter $b_0, b_1, b_2, \ldots, b_{15} = 0.0625$. Assuming an accumulator wordlength of 40 bits, a
data wordlength of 16 bits, and a coefficient wordlength of 16 bits, use equation (3) to establish the optimum
value for the coefficient scale factor $b_b$.
Solution:
Calculate $\alpha$:
$$\alpha = \sum_{i=0}^{15} |b_i| = 1.$$
Then
$$\begin{aligned}
b_b &= \min\!\left( \left\lfloor \log_2\!\left( (2^{M-1} - 1)/\max(|b_i|) \right) \right\rfloor,\ A - L - \lceil \log_2 \alpha \rceil \right) \\
    &= \min\!\left( \left\lfloor \log_2\!\left( (2^{16-1} - 1)/0.0625 \right) \right\rfloor,\ 40 - 16 - \lceil \log_2 1 \rceil \right) \\
    &= \min(18, 24) \\
    &= 18.
\end{aligned}$$

In this case we see that the limiting factor is that which allows the coefficients to be representable, but only
because this accumulator has 8 guard bits; otherwise overflow in the accumulator would limit $b_b$ as in Example
4. Also note that the extra accumulator guard bits allow the coefficient quantization error to be less than in
Example 4.

3 Choosing the FIR Filter Output Word

As stated earlier, a DSP or hardware multiplier has an output wordsize that is usually two or more times the
size of the input wordsize, e.g., the TI TMS320C54x has a 40-bit accumulator with a 32-bit multiplier output and
typically operates on data and coefficient wordsizes of 16 bits. It is therefore normally the case that a subset of
the accumulator bits must be chosen for the final filter output. How do we choose these bits?
Given a set of $N$ signed, unscaled two’s complement coefficients $B_i$, $i = 0, 1, \ldots, N-1$, and an input wordsize of $L$
bits, the number of bits required to maintain precision while simultaneously avoiding overflow in the convolution
sum is
$$\Gamma = L + \log_2(\alpha),$$
where
$$\alpha = \sum_{i=0}^{N-1} |B_i|.$$

Therefore, in order to avoid overflow when truncating this width $\Gamma$ to the output wordlength $K$, we extract bits
$\Gamma - K$ to $\Gamma - 1$, numbering the LSB of the accumulator as bit 0.
For example, if $L = 16$ and $\alpha = 32$, then
$$\begin{aligned}
\Gamma &= L + \log_2(\alpha) \\
       &= 16 + \log_2(32) \\
       &= 21,
\end{aligned}$$
and if the output wordlength $K = 16$, then you must take bits 5 to 20.
If the accumulator size $A$ is less than $\Gamma$, then the filter may overflow the accumulator during the convolution
sum. In this case, the best choice is to choose the top $K$ bits of the accumulator.
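
The bit selection can be sketched in code as follows. The function names are hypothetical, the output is obtained by plain truncation (no rounding), and a ceiling is applied to $\log_2 \alpha$ for values of $\alpha$ that are not powers of two; that ceiling is an assumption added here.

```python
import math


def output_bit_range(L, alpha, K):
    """Return (low, high) accumulator bit indices for a K-bit output.

    gamma = L + ceil(log2(alpha)); the ceiling is an assumption added here
    for values of alpha that are not powers of two.
    """
    gamma = L + math.ceil(math.log2(alpha))
    return gamma - K, gamma - 1


def extract_output(acc, low, K):
    """Take K bits of acc starting at bit `low`, as a signed value (truncation)."""
    word = (acc >> low) & ((1 << K) - 1)
    if word >= 1 << (K - 1):
        word -= 1 << K
    return word


low, high = output_bit_range(L=16, alpha=32, K=16)
y_pos = extract_output(32767 * 32, low, 16)    # largest positive accumulator value
y_neg = extract_output(-32768 * 32, low, 16)   # most negative accumulator value
```

With $L = 16$ and $\alpha = 32$ the two extreme accumulator values map back onto the full 16-bit output range without overflow.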

4 Quantization Noise in FIR Filters

[under construction]

4.1 Truncation

4.2 Rounding

4.3 Dithering

4.4 Noise-shaping

5 Conclusions
My hope is that this article will allow the FIR filter designer to clearly see the effects that the choices of
wordlength, scaling, and processing architecture have on signal integrity, and that the material is clear and
accurate. Errors, suggestions, etc., should be mailed to [email protected].

6 References

133 pages
We Should Forget About Small Efficiencies, Say About 97% of The Time: Premature Optimization Is The Root of All Evil. - D. Knuth
No ratings yet
We Should Forget About Small Efficiencies, Say About 97% of The Time: Premature Optimization Is The Root of All Evil. - D. Knuth
32 pages
EEE 420 Digital Signal Processing: Instructor: Erhan A. Ince E-Mail
No ratings yet
EEE 420 Digital Signal Processing: Instructor: Erhan A. Ince E-Mail
19 pages
Lecture 7
No ratings yet
Lecture 7
42 pages
Fixed-Point Signal Processing
No ratings yet
Fixed-Point Signal Processing
133 pages
Arduino Signal Processing
No ratings yet
Arduino Signal Processing
7 pages
Digital Signal Processing
No ratings yet
Digital Signal Processing
53 pages
DSP Syllabus
No ratings yet
DSP Syllabus
2 pages
FIR Filter
No ratings yet
FIR Filter
5 pages
DF Lesson 02
No ratings yet
DF Lesson 02
78 pages
Unit V Finite Word Length Effects in Digital Filters
75% (4)
Unit V Finite Word Length Effects in Digital Filters
3 pages
Summary Dig Filt
No ratings yet
Summary Dig Filt
21 pages
Practical Considerations in Fixed-Point FIR Filter Implem
No ratings yet
Practical Considerations in Fixed-Point FIR Filter Implem
15 pages
DSP Ai
No ratings yet
DSP Ai
113 pages
Signal Processing: Understanding Digital
No ratings yet
Signal Processing: Understanding Digital
7 pages
The Just Intonation Primer
From Everand
The Just Intonation Primer
David B Doty
No ratings yet
Introduction To Digital Signal Processing
90% (10)
Introduction To Digital Signal Processing
487 pages
Digital Signal Processing R13 Previous Papers
100% (1)
Digital Signal Processing R13 Previous Papers
5 pages
DSP Book by Nagoorkani
No ratings yet
DSP Book by Nagoorkani
11 pages
Kochar Inderkumar Asst. Professor MPSTME, Mumbai
No ratings yet
Kochar Inderkumar Asst. Professor MPSTME, Mumbai
66 pages
The Linux Terminal for Advanced Users - The Command Line Made Easy: First Edition
From Everand
The Linux Terminal for Advanced Users - The Command Line Made Easy: First Edition
Michael Basler
No ratings yet
Unlocking Statistics for the Social Sciences
From Everand
Unlocking Statistics for the Social Sciences
Norma Sinclair
No ratings yet
ChatGPT for Business: Strategies for Success
From Everand
ChatGPT for Business: Strategies for Success
Matthew C. Smith
1/5 (1)
Audio, Video, and Media in the Ministry
From Everand
Audio, Video, and Media in the Ministry
Clarence Floyd Richmond
No ratings yet
Intrusion Detection Honeypots
From Everand
Intrusion Detection Honeypots
Chris Sanders
3/5 (2)
Gray Hat Hacking the Ethical Hacker's
From Everand
Gray Hat Hacking the Ethical Hacker's
Çağatay Şanlı
5/5 (1)
Securing ChatGPT: Best Practices for Protecting Sensitive Data in AI Language Models
From Everand
Securing ChatGPT: Best Practices for Protecting Sensitive Data in AI Language Models
Matthew C. Smith
No ratings yet
Software Patterns Made Easy
From Everand
Software Patterns Made Easy
Justice Nanhou
No ratings yet