Digital filter design using VHDL

DIGITAL FILTER DESIGN
CONTENTS
1. INTRODUCTION
2. ELECTRICAL FILTER
3. COMPARISON OF IIR & FIR FILTER
   I. BUTTERWORTH FILTER
   II. ELLIPTIC FILTER
   III. CHEBYSHEV FILTER
4. EFFECT OF POLES & ZEROES
5. BI-LINEAR TRANSFORMATION
6. IIR FILTER REALIZATION
   I. DIRECT FILTER REALIZATION
   II. CASCADE FILTER REALIZATION
7. VHDL: THE LANGUAGE
   I. LEVELS OF ABSTRACTION
   II. BIT PARALLEL ARITHMETIC
      A. ADDITION & SUBTRACTION
      B. MULTIPLICATION
   III. BIT SERIAL ARITHMETIC
      A. ADDITION & SUBTRACTION
      B. MULTIPLICATION
      C. SHIFT & ADD MULTIPLIERS
      D. SHIFT & PARALLEL MULTIPLIERS
      E. LATENCY
      F. THROUGHPUT
8. IMPLEMENTATION & ANALYSIS OF SUB-BLOCKS
   I. ADDER
   II. DELAY
   III. SERIAL-PARALLEL MULTIPLIER
   IV. BOOTH MULTIPLIER
   V. MAC
9. IMPLEMENTATION & ANALYSIS OF FIR FILTERS
   I. DIRECT FORM OF REALIZATION
      A. USING BIT PARALLEL ARITHMETIC
      B. USING BIT SERIAL ARITHMETIC
      C. AREA ANALYSIS
      D. POWER ANALYSIS
   II. CASCADE REALIZATION
      A. USING BIT PARALLEL ARITHMETIC
      B. USING BIT SERIAL ARITHMETIC
      C. AREA ANALYSIS
      D. POWER ANALYSIS
10. CONCLUSION
11. FUTURE PLANS
12. VHDL CODES FOR FIR FILTERS
   I. USING BIT PARALLEL ARITHMETIC
      A. 4 BIT COUNTER
      B. BOOTH MULTIPLIER
      C. 16 BIT FULL ADDER
      D. MULTIPLIER
      E. SERIAL PARALLEL CONVERTER
      F. FIR FILTER
      G. ALU DESIGN
   II. USING BIT SERIAL ARITHMETIC
      A. D FLIP-FLOP
      B. FULL ADDER
      C. HALF ADDER
      D. RIGHT SHIFTER
      E. DELAY
      F. PIPELINE
      G. FIR FILTER
13. REFERENCES

LIST OF IMAGES
FIG1: MAGNITUDE RESPONSE OF BUTTERWORTH FILTER
FIG2: MAGNITUDE RESPONSE OF ELLIPTIC FILTER
FIG3: MAGNITUDE RESPONSE OF CHEBYSHEV FILTER
FIG4: EFFECTS OF POLES & ZEROES
FIG5: STABLE TRANSFORMATION
FIG6: IIR FILTER BLOCK
FIG7: DIRECT REALIZATION OF IIR FILTER
FIG8: CASCADE REALIZATION OF IIR FILTER
FIG9: BIT PARALLEL RIPPLE CARRY ADDER
FIG10: MATRIX PRODUCT OF MULTIPLICATION
FIG11: ARRAY MULTIPLIER OF TWO’S COMPLEMENT NUMBERS
FIG12: BIT SERIAL ADDER & SUBTRACTOR
FIG13: SHIFT & ADD MULTIPLIER
FIG14: S/P MULTIPLIER USING SHIFT ACCUMULATOR
FIG15: SIMPLIFIED S/P MULTIPLIER
FIG16: SIMPLIFIED MULTIPLIER STRUCTURE
FIG17: LATENCY & THROUGHPUT OF A PROCESSING ELEMENT
FIG18: INCREASED THROUGHPUT WITHOUT AFFECTING LATENCY
FIG19: TWO INPUT ADDER BLOCK
FIG20: RTL SCHEMATIC OF ADDER BLOCK
FIG21: OUTPUT RESULT OF ADDER BLOCK
FIG22: BIT SERIAL ADDER BLOCK
FIG23: TEST BENCH WAVEFORM OF BIT-SERIAL ADDER
FIG24: BIT-SERIAL & PARALLEL DELAY BLOCK
FIG25: TEST BENCH WAVEFORM OF DELAY BLOCK
FIG26: S/P MULTIPLIER BLOCK
FIG27: OUTPUT RESULT OF S/P MULTIPLIER
FIG28: SIMULATION RESULT OF BOOTH’S MULTIPLIER
FIG29: MAC CIRCUIT
FIG30: DIRECT FORM REALIZATION OF FIR FILTER
FIG31: FIR FILTER DIAGRAM
FIG32: RTL REPRESENTATION OF FIR FILTER
FIG33: DIRECT FORM REALIZATION OF FIR FILTER (BIT PARALLEL)
FIG34: DIRECT FORM REALIZATION OF FIR FILTER (BIT SERIAL)
FIG35: OUTPUT WAVEFORM OF POWER ANALYSIS OF FIR FILTER (BIT PARALLEL)
FIG36: OUTPUT WAVEFORM OF POWER ANALYSIS OF FIR FILTER (BIT SERIAL)
FIG37: CASCADE REALIZATION
FIG38: CASCADE REALIZATION
FIG39: FIR FILTER CASCADE REALIZATION
FIG40: STRUCTURE OF ARITHMETIC OPERATION
FIG41: FIR FILTER CASCADE REALIZATION USING BIT SERIAL ARITHMETIC
FIG42: OUTPUT WAVEFORM OF POWER ANALYSIS OF LATTICE REALIZATION USING BIT PARALLEL ARITHMETIC

LIST OF CHARTS
CHART 1: DESIGN SUMMARY OF DIRECT FORM REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC
CHART 2: DESIGN SUMMARY OF DIRECT FORM REALIZATION OF FIR FILTER USING BIT SERIAL ARITHMETIC
CHART 3: POWER SUMMARY OF DIRECT FORM REALIZATION
CHART 4: DESIGN SUMMARY OF CASCADE FORM REALIZATION USING BIT PARALLEL ARITHMETIC
CHART 5: DESIGN SUMMARY OF CASCADE FORM REALIZATION USING BIT SERIAL ARITHMETIC
CHART 6: POWER SUMMARY OF CASCADE FORM REALIZATION
CHART 7: DESIGN SUMMARY OF LATTICE REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC
CHART 8: DESIGN SUMMARY OF LATTICE REALIZATION OF FIR FILTER USING BIT SERIAL ARITHMETIC
CHART 9: POWER SUMMARY OF LATTICE REALIZATION OF FIR FILTER

INTRODUCTION
Many of today’s electronic applications contain various types of signal processing. This
includes systems used for music, radar, sonar, audio, video, and communication. Some of these
represent small-volume markets, while others are high-volume consumer products like mobile
phones.
There are many reasons behind the increased use of digital signal processing compared to
its analog counterpart. One is the advent of VLSI, where large complex systems can be
manufactured in large quantities at a low cost per unit. Another reason is that the use of digital
circuitry removes the need for tuning, which analog circuits generally require. A stringent
requirement on communication systems to efficiently utilize limited resources such as bandwidth
and transmitter power has led to the use of complex signal processing algorithms that are only
practical to implement using digital signal processing (DSP). Typical DSP operations are
frequency selective and adaptive filtering, time-frequency transformations, and sample rate
changes.
The signals to be processed are obtained either from nature itself or from man-made
machines. The signal processing is generally aimed at extracting information or at transforming
the signal into a form more suited for transmission or storage. Signal processing systems often
have a finite time available to compute the result. Some systems can accept a missed deadline,
while others give an unacceptable error if a deadline is exceeded. The latter are called hard
real-time systems and are the type of signal processing systems discussed here.
The signal processing can be implemented using various signal representations and
circuit techniques. The evolution has gone from analog continuous-time signal processing such
as passive LC filters, through discrete-time analog circuits such as switched-capacitor filters,
to purely digital implementations. Analog and switched-capacitor circuits are, however, still
needed for interfacing analog signals to the digital signal processing system through
anti-aliasing filters, A/D and D/A converters, and for systems with very high bandwidths.
Theory, design, and implementation of high-performance DSP (sub)systems in terms of
throughput, size, and cost are important research and development areas. Also, the increasing use
of portable equipment, together with the cost of cooling electronic equipment, is a strong
incentive to increase the effort put into reducing the power consumption of DSP (sub)systems.
The work presented in this report addresses several important issues in DSP algorithm and
hardware co-design, with the aim of obtaining efficient architectures with respect to design
effort, throughput, chip area, and power consumption. The emphasis is on high-speed and
low-power implementation of recursive digital filters.

ELECTRICAL FILTER
An electrical filter is a system that can be used to modify, reshape, or manipulate the
frequency spectrum of an electrical signal according to some prescribed requirements, e.g. to
attenuate a selected frequency component or to isolate a frequency component.
Digital filters can be designed using analog design methods by following these steps:
1. The filter specifications are given in the digital domain, together with the filter type
(highpass, lowpass, etc.).
2. An equivalent analog lowpass filter is designed that meets these specifications.
3. The analog lowpass filter is transformed using spectral transformations into the
correct type of filter.
4. The analog filter is transformed into a digital filter using a particular mapping.
Analog filters:
Classical theory for analogue filters operating below about 100 MHz is generally based on
"lumped parameter" resistors, capacitors, inductors and operational amplifiers (with feedback)
which obey LTI differential equations:
[ i(t) = C dv(t)/dt, v(t) = L di(t)/dt, v(t) = i(t) R, v0(t) = A vi(t) ].
Analysis of such LTI circuits gives a relationship between input x(t) and output y(t) in the form
of a differential equation:

\[ b_0 y(t) + b_1 \frac{dy(t)}{dt} + b_2 \frac{d^2 y(t)}{dt^2} + \dots = a_0 x(t) + a_1 \frac{dx(t)}{dt} + a_2 \frac{d^2 x(t)}{dt^2} + \dots \]
whose system (or transfer) function is of the form:
\[ H_a(s) = \frac{a_0 + a_1 s + a_2 s^2 + \dots + a_N s^N}{b_0 + b_1 s + b_2 s^2 + \dots + b_M s^M} \]
This is a ratio of polynomials in ‘s’. The order of the system function is max(N,M). Replacing
s by jω gives the frequency response H_a(jω), where ω denotes frequency in radians/second. For
values of s with non-negative real parts, H_a(s) is the Laplace transform of the analogue filter’s
impulse response h_a(t). H_a(s) may be expressed in terms of its poles and zeros as:
\[ H_a(s) = k \, \frac{(s - z_1)(s - z_2) \dots (s - z_N)}{(s - p_1)(s - p_2) \dots (s - p_M)} \]

Virtually all real-life signals that are taken as inputs and processed are analog signals.
Today, however, most systems and their components have been digitized. To be used and
processed in digital computers, the analog signals have to be sampled, processed, and
reconstructed via the digital system. Thus samplers and digital filters are an integral part of
today’s electrical components.
There are many methods for transforming an analog filter into a digital filter. Some
preferred methods are listed below:
i. Backward Difference Method
ii. Impulse Invariance
iii. Bilinear Transformation
iv. Step Invariance, and so on.
There is no single optimum method. The choice depends on the sampling frequency,
the highest frequency component of the system, etc.

COMPARISON OF IIR AND FIR DIGITAL FILTERS
IIR type digital filters have the advantage of being economical in their use of delays,
multipliers and adders. They have the disadvantage of being sensitive to coefficient round-off
inaccuracies and the effects of overflow in fixed point arithmetic. These effects can lead to
instability or serious distortion. Also, an IIR filter cannot be exactly linear phase.
FIR filters may be realized by non-recursive structures which are simpler and more
convenient for programming especially on devices specifically designed for digital signal
processing. These structures are always stable, and because there is no recursion, round-off and
overflow errors are easily controlled. A FIR filter can be exactly linear phase. The main
disadvantage of FIR filters is that large orders can be required to perform fairly simple filtering
tasks.
Note that the frequency response is the transfer function H(z) evaluated around the unit circle
on the Argand diagram of z. Since the shape of the transfer function is determined by
the positions of its poles and zeroes, so is the frequency response.
The frequency response can be estimated by tracing around the unit circle on the Argand
diagram of the z plane:
• project poles and zeroes radially onto the unit circle
• poles cause bumps
• zeroes cause dips
• the closer a pole or zero is to the unit circle, the sharper the feature
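The rules of thumb above can be checked numerically. The following is a minimal sketch, assuming a transfer function given in factored form H(z) = gain · Π(z − q) / Π(z − p); the function name `freq_response` and the example pole and zero positions are illustrative choices, not taken from the report.

```python
import cmath
import math

def freq_response(zeros, poles, gain, omega):
    """Evaluate H(z) = gain * prod(z - q) / prod(z - p) at z = exp(j*omega)."""
    z = cmath.exp(1j * omega)
    num = complex(gain)
    for q in zeros:
        num *= (z - q)
    den = complex(1.0)
    for p in poles:
        den *= (z - p)
    return num / den

# Hypothetical example: a pole pair at radius 0.9, angle +/- pi/4, and a double zero at z = -1.
poles = [0.9 * cmath.exp(1j * math.pi / 4), 0.9 * cmath.exp(-1j * math.pi / 4)]
zeros = [-1.0, -1.0]

mag_at_pole_angle = abs(freq_response(zeros, poles, 1.0, math.pi / 4))  # bump near the pole
mag_at_zero_angle = abs(freq_response(zeros, poles, 1.0, math.pi))      # dip at the zero
mag_at_dc = abs(freq_response(zeros, poles, 1.0, 0.0))
```

With these example values the magnitude is largest at the pole angle and falls to zero at the angle of the zero, which lies exactly on the unit circle.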

IIR filters can be designed using different methods. One of the most commonly used is
via a reference analog prototype filter. This method is well suited to designing all standard
types of filters such as low-pass, high-pass, band-pass and band-stop filters.
Here is a summary of three continuous-time low-pass filters:
BUTTERWORTH FILTERS
The Butterworth filter has a maximally flat response in the passband and an adequate rate of
rolloff. A good "all rounder," the Butterworth filter is simple to understand and suitable for
applications such as audio processing.
FIG1: Magnitude response of Butterworth Filters
ELLIPTIC FILTERS
This filter is equiripple in both the passband and the stopband.
FIG2: Magnitude response of Elliptic Filters

CHEBYSHEV FILTERS
The Chebyshev filter has ripple in the passband of the filter. There is also an Inverse
Chebyshev analog filter, also known as the Chebyshev type II filter, which has ripple in the
stopband.
FIG3: Magnitude response of Chebyshev Filters
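As a rough numerical illustration of the summaries above, the sketch below evaluates the textbook magnitude formulas of the analog Butterworth and Chebyshev type I low-pass prototypes. The function names, the ripple parameter `epsilon` and the example values are assumptions made for illustration; the report itself does not give these formulas.

```python
import math

def butterworth_mag(omega, omega_c, order):
    """|H(j*omega)| of an analog Butterworth low-pass filter with cutoff omega_c."""
    return 1.0 / math.sqrt(1.0 + (omega / omega_c) ** (2 * order))

def chebyshev_poly(order, x):
    """Chebyshev polynomial T_n(x) computed with the three-term recurrence."""
    t_prev, t_curr = 1.0, x
    if order == 0:
        return t_prev
    for _ in range(order - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

def chebyshev1_mag(omega, omega_c, order, epsilon):
    """|H(j*omega)| of an analog Chebyshev type I low-pass filter (passband ripple set by epsilon)."""
    t = chebyshev_poly(order, omega / omega_c)
    return 1.0 / math.sqrt(1.0 + (epsilon * t) ** 2)

# Example values (hypothetical): order 4, cutoff 1 rad/s, epsilon = 0.5.
butter_edge = butterworth_mag(1.0, 1.0, 4)        # 1/sqrt(2) at the cutoff
butter_stop = butterworth_mag(2.0, 1.0, 4)
cheby_stop = chebyshev1_mag(2.0, 1.0, 4, 0.5)     # smaller than butter_stop for these values
```

For these example values the Chebyshev response is lower than the Butterworth response one octave above the cutoff, at the price of ripple inside the passband.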
The z-domain transfer function is of great importance for IIR filters. The location of its
poles in the z plane is used for testing the stability of the designed IIR filter. The poles of
the IIR filter transfer function must be located within the unit circle for the filter to be stable.
The figure illustrates the zeros and poles of the transfer function of a stable IIR filter in the
z plane. Transfer function zeros are denoted by small circles, whereas its poles are denoted by
small crosses.

EFFECTS OF THE POLES AND ZEROS OF THE TRANSFER FUNCTION
The location of the poles and zeros of the transfer function is very important for discrete-time
system analysis and synthesis. For a discrete-time system to be stable, all poles of its
transfer function must be located within the unit circle. The location of the
zeroes doesn’t affect the stability of discrete-time systems. Recall that FIR filters have no
feedback, which makes them inherently stable. However, this doesn’t apply to IIR filters.
Therefore, it is preferable to use the bilinear transformation, because it maps a stable analog
prototype to a stable digital filter.
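The stability condition can be tested directly on the denominator coefficients. This is a minimal sketch for a second-order denominator 1 + a1·z⁻¹ + a2·z⁻²; the helper names and the two coefficient sets are hypothetical examples.

```python
import cmath

def second_order_poles(a1, a2):
    """Roots of z^2 + a1*z + a2, i.e. the poles of 1 / (1 + a1*z^-1 + a2*z^-2)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return ((-a1 + disc) / 2.0, (-a1 - disc) / 2.0)

def is_stable(poles):
    """A causal discrete-time system is stable when every pole lies strictly inside the unit circle."""
    return all(abs(p) < 1.0 for p in poles)

stable_poles = second_order_poles(-1.2, 0.72)    # complex pair 0.6 +/- 0.6j, radius about 0.85
unstable_poles = second_order_poles(-2.1, 1.1)   # real poles near 1.0 and 1.1
```

Here `is_stable(stable_poles)` holds while `is_stable(unstable_poles)` does not, because at least one pole of the second set lies outside the unit circle.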
In the invariance methods, the derived digital filter has exactly the same response (impulse,
unit-step, or sinusoid, depending on the variant chosen) as the original analog filter at the
sampling instants t = nT. Here aliasing may occur. But if the order ‘N’ of the filter is high
enough, the aliasing will be small enough to be acceptable, i.e., within our tolerance.

BILINEAR TRANSFORMATION
A transformation T(z) : z → w is called bilinear if it takes the form
\[ w = \frac{a z + b}{c z + d}, \qquad ad - bc \neq 0 \]
This type of transformation occurs numerous times in electrical engineering, for example in
dielectric hysteresis, mutual impedance coupling between circuits, transmission line calculations,
propagation in a stratified medium, loudspeaker impedance, and many more.
A continuous-time (CT) signal must be appropriately band-limited in order to avoid
frequency aliasing distortions. Additionally, if the number of time samples used in a particular
computation is constrained, the Nyquist approximation may do a poor job of representing the
original signal.
In the 1960’s a basis expansion was proposed implementing a nonlinear frequency warping
between a CT signal and its discrete-time (DT) representation according to the bilinear
transform. Since there is a one-to-one relationship between the two frequency domains, this
bilinear expansion theoretically avoids both the band-limited requirement and the frequency
aliasing distortions associated with Nyquist sampling.
Furthermore, the DT expansion coefficients can be obtained using a cascade of first-order
analog systems. Modern-day integrated circuit technology has made it practical to compute these
coefficients through conventional circuit design techniques. Consequently, the bilinear expansion
is an attractive procedure for filter design in many applications.
In the Bilinear Transformation technique (BLT), we shall compress the analog frequency
scale [0 to ∞] to [0 to π] in the digital filter. That is, we will compress an infinite frequency
span to a finite span.
The philosophy of BLT is the following: If we are given an analog transfer function Ha(s)
we can always simulate Ha(s) in a basic Analog Circuit.
In the simulation of Ha(s), we require summation, multiplication by a constant, and a
dynamic element, namely an integrator. The traditional approach is to use op-amps for integration
and to simulate any given transfer function with op-amps only. Multiplication by a constant alpha
is done with a potentiometer if alpha is less than 1, or with an op-amp if alpha is greater than 1.
Both plus and minus signs can be handled by using inverting and non-inverting op-amp
configurations.

Integration is done by putting a capacitor in the feedback loop and the integration is usually
associated with a negative sign before the integral. The basic fact is that if we have an adder, a
multiplier and a block of transfer function 1/s which describes an integrator, we can simulate any
given analog transfer function. If we simulate the given transfer function by adders, multipliers
and integrators, then we can convert that diagram into a digital filter because in a digital filter
addition and multiplication are the same and there is no change; the only change is that we shall
require a digital integrator.
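The bilinear transform is defined by
\[ H(z) = H_a(s)\Big|_{s = \frac{2}{T} \cdot \frac{z - 1}{z + 1}} \]
which is accomplished by replacing ‘s’ by
\[ \frac{2}{T} \cdot \frac{1 - z^{-1}}{1 + z^{-1}} \]
where T is the sampling interval.
s-plane to z-plane mapping:
Here the entire jω axis maps into one complete revolution of the unit circle.
(By contrast, z = e^{Ts} maps the jω axis into an infinite number of revolutions of the unit circle.)
FIG4: STABLE TRANSFORMATION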

PROCEDURE FOR BILINEAR TRANSFORMATION
Points:
1) The left half of the s-plane maps to the inside of the unit circle in the z-plane, i.e.,
Re(s) < 0 gives |z| < 1.
2) The right half of the s-plane maps to the outside of the unit circle in the z-plane, i.e.,
Re(s) > 0 gives |z| > 1.
Hence, a causal and stable continuous-time system will be mapped to a causal and stable
discrete-time system.
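The two mapping properties can be verified numerically. The sketch below assumes the standard form of the bilinear substitution solved for z, namely z = (1 + sT/2) / (1 − sT/2), with T the sampling interval; the function name and the sample points are illustrative.

```python
def bilinear_map(s, T=1.0):
    """Map an s-plane point to the z-plane via z = (1 + s*T/2) / (1 - s*T/2)."""
    return (1.0 + s * T / 2.0) / (1.0 - s * T / 2.0)

# Hypothetical sample points in the left half-plane, right half-plane and on the imaginary axis.
left_half = [complex(-0.5, 3.0), complex(-10.0, -40.0), complex(-0.01, 1000.0)]
right_half = [complex(0.5, 3.0), complex(10.0, -40.0)]
on_axis = [complex(0.0, w) for w in (0.1, 1.0, 50.0)]

inside = [abs(bilinear_map(s)) for s in left_half]    # all below 1
outside = [abs(bilinear_map(s)) for s in right_half]  # all above 1
on_circle = [abs(bilinear_map(s)) for s in on_axis]   # all equal to 1 up to rounding
```

Points with negative real part land inside the unit circle regardless of how large their imaginary part is, which is why stable analog poles stay stable after the transformation.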
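Setting s = jΩ and z = e^{jω} in the bilinear transform gives the relation between the analog
frequency Ω and the digital frequency ω, where T is the sampling interval:
\[ \Omega = \frac{2}{T} \tan\frac{\omega}{2} \] ..… (1)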

Unlike Impulse Invariant Transformation where the relationship was simply ω = ΩT as
indicated, in BLT, there is a deviation from linearity because the relation between Ω and ω is
nonlinear.
This is how an infinite axis is compressed to a finite axis, that is, 0 to infinity is
compressed to 0 to π; this phenomenon is called warping. The frequency scale is thus warped,
which is a disadvantage. We therefore apply pre-warping (or anti-warping) so that the effect of
warping is ultimately cancelled and we get what we want.
So, the ω’s are transformed to Ω’s by the relationship
\[ \Omega = \frac{2}{T} \tan\frac{\omega}{2} \]
This is pre-warping, that is, the digital filter frequencies are pre-warped to analog frequencies.
Thus we get the specs on the corresponding analog filter.
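A short numerical sketch of pre-warping follows, assuming the usual bilinear frequency relation Ω = (2/T)·tan(ω/2) and its inverse ω = 2·arctan(ΩT/2). The function names `prewarp` and `warp` are hypothetical.

```python
import math

def prewarp(omega_digital, T=1.0):
    """Analog frequency (rad/s) that the bilinear transform maps to omega_digital (rad/sample)."""
    return (2.0 / T) * math.tan(omega_digital / 2.0)

def warp(omega_analog, T=1.0):
    """Digital frequency (rad/sample) reached by an analog frequency under the bilinear transform."""
    return 2.0 * math.atan(omega_analog * T / 2.0)

low = prewarp(0.01)            # almost 0.01: the mapping is nearly linear at low frequency
mid = prewarp(math.pi / 2.0)   # 2.0 for T = 1, noticeably above pi/2
far = warp(1.0e9)              # approaches pi but never reaches it
```

The low-frequency value shows why the linear relation ω = ΩT is a good approximation for small ω, while the other two values show the compression of the infinite analog axis into 0 to π.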
There is no aliasing in the Bilinear Transformation because the entire jΩ axis is mapped
one-to-one onto a single revolution of the unit circle and the complete transfer function is
transformed. In the Impulse Invariant Transformation, only the poles were transformed.
In the Impulse Invariant Transformation, the ratio of the stopband edge to the passband edge
is simply ωs/ωp because the relationship is linear.
IIT is an approximation of the BLT relationship because, for small θ, tan θ can be replaced
by θ, which gives the IIT relation. For small ω or Ω, IIT and BLT are indistinguishable.
Alternatively, if we have an inverse bilinear transform, we can follow these steps:
1. Use the inverse bilinear transform on the filter specifications in the digital domain to
produce equivalent specifications in the analog domain.
2. Construct the analog filter transfer function to meet those specifications.
3. Use the bilinear transform to convert the resultant analog filter into a digital filter.
The inverse bilinear transform can be expressed as z = (1 + s)/(1 - s), which corresponds to a
sampling interval normalized to T = 2.
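The three steps can be illustrated on the simplest possible case. The following is a sketch, not the report's design flow: it assumes a first-order analog prototype H_a(s) = Ωa / (s + Ωa) and the substitution s = (2/T)(1 − z⁻¹)/(1 + z⁻¹); the function names and the chosen cutoff are hypothetical.

```python
import cmath
import math

def design_first_order_lowpass(omega_c, T=1.0):
    """Return (b0, b1, a1) of H(z) = (b0 + b1*z^-1) / (1 + a1*z^-1) for a digital cutoff omega_c."""
    k = 2.0 / T
    omega_a = k * math.tan(omega_c / 2.0)   # step 1: digital spec -> analog spec (pre-warping)
    # step 2: analog prototype H_a(s) = omega_a / (s + omega_a)
    # step 3: substitute s = k * (1 - z^-1) / (1 + z^-1) and normalise the denominator
    b0 = omega_a / (k + omega_a)
    a1 = (omega_a - k) / (k + omega_a)
    return b0, b0, a1

def response(b0, b1, a1, omega):
    """Frequency response of the first-order section at digital frequency omega."""
    z_inv = cmath.exp(-1j * omega)
    return (b0 + b1 * z_inv) / (1.0 + a1 * z_inv)

cutoff = 0.2 * math.pi                       # example cutoff in rad/sample
b0, b1, a1 = design_first_order_lowpass(cutoff)
gain_dc = abs(response(b0, b1, a1, 0.0))
gain_cutoff = abs(response(b0, b1, a1, cutoff))
gain_nyquist = abs(response(b0, b1, a1, math.pi))
```

Because of the pre-warping in step 1, the -3 dB point of the digital filter lands exactly at the requested cutoff, and the analog zero at infinity is mapped to z = -1 (zero gain at the Nyquist frequency).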

Two of the well-known methods, the impulse invariance method and the matched Z-transform
method, are conceptually similar to the familiar sampling of a continuous waveform.
Denoting the inverse Laplace transform by L^{-1} and the Z transform by Z, both these methods
involve calculating the impulse response of the analog filter as a(t) = L^{-1}{A(s)} and
sampling a(t) at a sampling interval T that is short enough to avoid aliasing. The transfer
function of the digital filter is then obtained from the sampled sequence a[n] as
Da(z) = Z{a[n]}
However, there are key differences between the two.
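Impulse invariance method:
In this method, the analog transfer function is expanded into partial fractions as
\[ A(s) = \sum_m \frac{C_m}{s - \alpha_m} \]
where Cm is some constant and αm are the poles. For distinct poles, each term is mapped with
αm → e^{αmT}, giving
\[ D_a(z) = \sum_m \frac{C_m}{1 - e^{\alpha_m T} z^{-1}} \]
Mathematically, any transfer function with a numerator of lesser degree than the denominator
can be expressed as a sum of partial fractions. Low-pass and band-pass filters satisfy this
criterion, whereas high-pass and band-stop filters have a numerator of the same degree as the
denominator; their responses also do not fall off at high frequencies, so sampling the impulse
response aliases heavily. Hence the impulse invariant method cannot be used to design
high-pass or band-stop filters.
Matched Z-transform
In this method, instead of splitting the transfer function into partial fractions, both the
poles and the zeros are transformed in the same (matched) manner,
as βm → e^{βmT} and αm → e^{αmT} (also stability preserving), giving
\[ A(s) = \frac{\prod_m (s - \beta_m)}{\prod_n (s - \alpha_n)} \;\longrightarrow\; D_a(z) = \frac{\prod_m (1 - e^{\beta_m T} z^{-1})}{\prod_n (1 - e^{\alpha_n T} z^{-1})} \]
The limitations of both methods are easy to see. Impulse invariance is applicable only to
band-limited responses (low-pass and band-pass), whereas the matched z-transform method is
also applicable to band-stop filters (and to high-pass filters up to the Nyquist frequency).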
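For a single partial-fraction term the impulse invariance idea can be checked in a few lines. This sketch assumes one analog term c/(s − α) with impulse response c·e^{αt}, mapped to c/(1 − e^{αT}·z⁻¹) without any extra scaling by T; the function names and the numeric values are hypothetical.

```python
import math

def impulse_invariant_one_pole(c, alpha, T):
    """Digital counterpart of c / (s - alpha): gain c and a z-plane pole at exp(alpha*T)."""
    return c, math.exp(alpha * T)

def digital_impulse_response(gain, pole, n_samples):
    """Run y[n] = gain*x[n] + pole*y[n-1] on a unit impulse."""
    y, out = 0.0, []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0
        y = gain * x + pole * y
        out.append(y)
    return out

c, alpha, T = 2.0, -3.0, 0.1                       # example analog term 2 / (s + 3)
gain, pole = impulse_invariant_one_pole(c, alpha, T)
h_digital = digital_impulse_response(gain, pole, 8)
h_sampled = [c * math.exp(alpha * n * T) for n in range(8)]   # analog response at t = nT
```

The two sequences agree sample by sample, and a pole with negative real part always maps to a z-plane pole of magnitude below one.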

Digital filters designed via the bilinear transformation are guaranteed to be stable, provided
the analog prototype is stable. However, the exact values of the coefficients are available only
immediately after the bilinear transformation has been applied. On filter realization, it is
impossible to represent the coefficients without error. In a software realization
(implementation) of a digital filter, the resulting coefficients are quantized, which generates a
certain error. Any error made during the quantization of coefficients affects the frequency
response to a greater or lesser extent, and may cause the stopband attenuation to decrease.
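The effect of coefficient quantization on pole positions can be made concrete. The sketch below rounds the denominator coefficients of a second-order section to a fixed number of fractional bits and recomputes the poles; the rounding scheme, the function names and the coefficient values are assumptions chosen for illustration.

```python
import cmath

def quantize(value, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits (a simple fixed-point coefficient model)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

def poles(a1, a2):
    """Poles of 1 / (1 + a1*z^-1 + a2*z^-2)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return ((-a1 + disc) / 2.0, (-a1 - disc) / 2.0)

a1, a2 = -1.8954, 0.9025          # hypothetical section, pole radius sqrt(0.9025) = 0.95

radius_exact = max(abs(p) for p in poles(a1, a2))
radius_12bit = max(abs(p) for p in poles(quantize(a1, 12), quantize(a2, 12)))
radius_4bit = max(abs(p) for p in poles(quantize(a1, 4), quantize(a2, 4)))
```

With 12 fractional bits the pole radius barely moves, whereas with only 4 bits the coefficients become -1.875 and 0.875 and one pole lands on the unit circle, so the quantized filter is no longer stable.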

IIR FILTER REALIZATION
The IIR filter transfer function can be expressed as:
\[ H(z) = \frac{\sum_{k=0}^{M-1} b_k z^{-k}}{1 + \sum_{k=1}^{N} a_k z^{-k}} \]
where
• N is the filter order,
• bk are the coefficients of the non-recursive part of the IIR filter,
• ak are the coefficients of the feedback (recursive) part of the IIR filter.
The IIR filter difference equation can be expressed as:
y[n] = b0x[n] + b1x[n-1] + … + bM-1x[n-(M-1)]
- a1y[n-1] - a2y[n-2] - … - aNy[n-N]
The block diagram of the IIR filter is as follows:
FIG5: BLOCK DIAGRAM OF IIR FILTER
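The difference equation translates almost literally into code. This is a minimal software sketch, assuming zero initial conditions and the convention that `a[0]` is 1 and is not used; the function name `iir_direct` is hypothetical.

```python
def iir_direct(b, a, x):
    """y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k], with a[0] taken as 1."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):        # non-recursive (feed-forward) part
            if n - k >= 0:
                acc += bk * x[n - k]
        for k in range(1, len(a)):        # recursive (feedback) part
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

impulse = [1.0, 0.0, 0.0, 0.0]
h_iir = iir_direct([1.0], [1.0, -0.5], impulse)       # one feedback tap: 1, 0.5, 0.25, 0.125
h_fir = iir_direct([1.0, 2.0, 3.0], [1.0], impulse)   # no feedback: the response is just b
```

With no feedback coefficients the routine reduces to an FIR filter whose impulse response is the coefficient list itself, while a single feedback tap already produces a response that never exactly dies out.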

DIRECT REALIZATION
Direct realization of IIR filters starts with this expression:
\[ y[n] = \sum_{k=0}^{M-1} b_k x[n-k] - \sum_{k=1}^{N} a_k y[n-k] \]
The first part of the expression refers to the non-recursive part and the other refers to the
recursive part of the IIR filter. In IIR filter direct realization, these two parts are
considered and realized separately.
The realization of the non-recursive part of the IIR filter is identical to the direct
realization of an FIR filter. The figure illustrates the block diagram of the direct realization
of the non-recursive part of the IIR filter.
Realization of the recursive part of the IIR filter is similar to that of the non-recursive
part. The figure illustrates the direct realization of the recursive part of the filter.

As the non-recursive and recursive parts of the IIR filter are realized separately, it doesn’t
matter which of them is applied first in the filtering process.
Direct realization is very convenient for software implementation, and this is where it is most
commonly used.
The disadvantages of this realization are that it has the greatest sensitivity to the accuracy
of the realized coefficients (i.e. the largest finite word-length effect) and the greatest
implementation complexity (i.e. it needs the most resources).
CASCADE REALIZATION
The cascade realization structure is the most difficult to obtain from the transfer function
(compared with the other realization structures discussed here). It is very convenient because of
its modular structure and its lower sensitivity to the accuracy of the realized non-recursive and
recursive coefficients. In cascade IIR filter realization, a filter is divided into several
mutually independent sections of the first or second order.
Since the sections are mutually independent after the design process, the finite word-length
effect on the accuracy of the coefficients, the change in frequency response, and the IIR filter
stability are examined separately for each section. The analysis is simplified this way.

The IIR filter transfer function is expressed as:
\[ H(z) = \frac{B(z)}{A(z)} = \frac{\sum_i b_i z^{-i}}{1 + \sum_j a_j z^{-j}} = H_0 \, \frac{\prod_i (1 - q_i z^{-1})}{\prod_j (1 - p_j z^{-1})} \]
where
• bi are the coefficients of the transfer function numerator;
• aj are the coefficients of the transfer function denominator;
• H0 is a constant;
• qi are the zeros of the transfer function;
• pj are the poles of the transfer function;
• B(z) is the transfer function of the non-recursive part;
• A(z) is the transfer function of the recursive part;
• M is the number of sections in the cascade realization structure.
Cascade realization requires the given expression to be factorized so that the transfer function
is expressed as follows:
\[ H(z) = H_0 \prod_{i=1}^{M} \frac{1 + b[i,1] z^{-1} + b[i,2] z^{-2}}{1 + a[i,1] z^{-1} + a[i,2] z^{-2}} \]
where
a[i, k] are the coefficients of the recursive part of the i-th IIR filter section;
b[i, k] are the coefficients of the non-recursive part of the i-th IIR filter section.
The figure illustrates a second-order section.
FIG6: SECOND ORDER FILTER

The use of the transposed direct realization structure reduces the necessary number of delay
lines and adders as well. Dividing the filter into independent sections reduces the sensitivity
to coefficient quantization and simplifies the stability analysis of the resulting filter.
Besides, the possibility that the IIR filter becomes unstable after quantization is drastically
reduced, because the coefficient quantization is performed after dividing the filter into
sections, so the changes in pole locations are smaller.
Software realization requires M buffers of length 2 or 1. Each section must have its own
buffer for saving samples of intermediate signals. Such complexity and the required
factorization are the two main disadvantages of this realization structure.
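A software cascade with one state buffer per section might look as follows. This is a sketch under assumptions: each section is realized in direct form I with its own delayed samples, first-order sections are written as second-order sections with zero second coefficients, and the names `biquad` and `cascade` are hypothetical.

```python
def biquad(b, a, x):
    """One section: b = (b0, b1, b2), a = (a1, a2); keeps its own delayed samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out

def cascade(sections, x, gain=1.0):
    """Feed the signal through the sections one after another, after applying the overall gain."""
    signal = [gain * v for v in x]
    for b, a in sections:
        signal = biquad(b, a, signal)
    return signal

impulse = [1.0] + [0.0] * 7
# Two first-order sections with poles at 0.5 and 0.25 ...
two_sections = cascade([((1.0, 0.0, 0.0), (-0.5, 0.0)),
                        ((1.0, 0.0, 0.0), (-0.25, 0.0))], impulse)
# ... against the equivalent single section 1 / (1 - 0.75 z^-1 + 0.125 z^-2).
one_section = biquad((1.0, 0.0, 0.0), (-0.75, 0.125), impulse)
```

Both realizations produce the same impulse response, which illustrates that the factorization changes the structure and its numerical behaviour but not the transfer function.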
Figure illustrates the block diagram describing cascade IIR filter structure.
FIG7: CASCADE REALIZATION OF IIR FILTER

VHDL - THE LANGUAGE
The VHSIC Hardware Description Language (VHDL) is an industry standard language
used to describe hardware from the abstract to the concrete level. The language not only defines
the syntax but also defines very clear simulation semantics for each language construct. Although
it provides an extensive range of modelling capabilities, it is possible to quickly assimilate a
core subset of the language that is both easy and simple to understand without learning the more
complex features. It is very useful in teaching top-down design. We can design a system at a high
level and express the algorithm in VHDL. We can then simulate and debug the design at this level
before actually proceeding with detailed logic design. A dataflow level of description offers a
combination of the behavioural and structural levels of description.
LEVELS OF ABSTRACTION
1) Data Flow level :
In this style of modelling the flow of data through the entity is expressed using
concurrent signal assignment statements.
2) Structural level :
In this style of modelling the entity is described as a set of interconnected components.
3) Behavioral level :
This style of modelling specifies the behavior of an entity as a set of statements that are
executed sequentially in the specified order.
The VHDL implementations in this report use the following two types of arithmetic:
1) Bit-Parallel Arithmetic
2) Bit-Serial Arithmetic
1) Inputs to a bit-parallel arithmetic operation are stored in registers. In bit-parallel
arithmetic, all bits are conceptually processed at once, i.e. all bits of the inputs are applied
in parallel, all bits of the output appear simultaneously, and the output is stored in
registers.
An advantage of bit-parallel arithmetic is that the amount of work performed by a
processing element during one clock cycle is relatively large, and the clock frequency can
therefore be kept low. In other words, it achieves a high computational throughput per clock
cycle.

A disadvantage of bit-parallel arithmetic is that it has higher power consumption and chip
area than bit-serial arithmetic.
2) In bit-serial arithmetic one bit of the input data is processed in each clock cycle,
generally starting with the LSB.
An advantage of bit-serial arithmetic is its power consumption. Bit-serial digital filters
consume less power because of the serial-parallel multiplier. They also occupy a smaller area
than bit-parallel filters.
A disadvantage of bit-serial arithmetic is its design complexity. The design time for a
bit-serial system increases due to the higher complexity of timing the bit-serial streams.
The potential performance of bit-serial processing elements may be somewhat degraded
due to practical problems with high frequency clocking.
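The bit-serial principle can be modelled in software as one full adder plus a carry register, consuming one bit of each operand per clock cycle, LSB first. This is a behavioural sketch only, not the report's VHDL; the helper names and the 8-bit word length are assumptions.

```python
def to_bits_lsb_first(value, width):
    """Two's-complement bits of an integer, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits_lsb_first(bits):
    """Interpret LSB-first bits as a two's-complement integer."""
    width = len(bits)
    raw = sum(bit << i for i, bit in enumerate(bits))
    return raw - (1 << width) if bits[-1] else raw

def bit_serial_add(x_bits, y_bits):
    """One full adder and a carry register; each loop pass models one clock cycle."""
    carry = 0                      # carry register, cleared before the LSB arrives
    out = []
    for xb, yb in zip(x_bits, y_bits):
        out.append(xb ^ yb ^ carry)
        carry = (xb & yb) | (xb & carry) | (yb & carry)
    return out

WIDTH = 8
total = from_bits_lsb_first(
    bit_serial_add(to_bits_lsb_first(100, WIDTH), to_bits_lsb_first(-37, WIDTH)))
```

A W-bit addition therefore takes W clock cycles but needs only a single full adder, which is the area versus clock-rate trade-off described above.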
BIT SERIAL ARITHMETIC & BIT PARALLEL ARITHMETIC
Numbers may be described as floating-point or fixed-point numbers. Floating-point numbers use
a signed mantissa M and a signed exponent E to represent a number F = M × β^E, where β is the
base of the number. Fixed-point numbers, on the other hand, have a fixed exponent: the binary
point in the mantissa is always located at the same position, independent of the represented
number.
The variable exponent in the floating-point number representation enables a large number range,
but quantization introduces value-dependent errors which may be troublesome in some
algorithms. Most DSP algorithms do not require the increased number range of floating-point
numbers if appropriate measures are taken to scale the signal levels in the algorithm.
Parasitic oscillations in a system using floating-point arithmetic are in general harder to
suppress than in a system using fixed-point arithmetic. Implementation of fixed-point arithmetic
is also less complex than that of floating-point arithmetic, making fixed-point arithmetic the
preferred number representation in many cases. Floating-point arithmetic is thus slower,
consumes more power, and requires more chip area. Signed fixed-point numbers can be
described using various representations. One is sign-magnitude representation,
where a sign bit denotes the sign of the number and the rest of the bits denote the magnitude.
In this case there are two representations of the number zero, which increases the complexity of
the implementation of additions and subtractions. Other representations include
one’s-complement, two’s-complement, biased, and signed-digit code.
Most fixed-point systems use two’s-complement to represent signed fixed-point numbers.
Signed addition and subtraction are then treated as unsigned addition and subtraction. The most
significant bit has a negative weight, while the other bits have positive weights.
Two’s-complement representation will be assumed throughout the rest of the text.
A number X represented in two’s complement is shown in Eq. (4.1). The number range is here
limited to -1 ≤ X < 1. A larger number range is achieved by scaling the representation by a
factor 2^k, where k is the required number of integer bits.
\[ X = -x_0 + \sum_{i=1}^{W-1} x_i 2^{-i} \] ……………………………………………Eq.4.1.
Here x_0 denotes the most significant (sign) bit and W the word length.
One important property of two’s-complement representation is that a sum of numbers can be
computed in an arbitrary order. An overflow in an intermediate result can be neglected if the
correct sum is within the available number range. This means that the order of the additions is
unimportant with regard to overflow, and it is therefore possible to rearrange the order without
affecting the final result.
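The overflow property can be demonstrated with a small model of a W-bit accumulator. In this sketch every intermediate sum is wrapped back into the two's-complement range; the helper names, the 8-bit word length and the operand values are illustrative assumptions.

```python
def wrap(value, width):
    """Reduce an integer to the two's-complement range of the given word length."""
    mask = (1 << width) - 1
    value &= mask
    return value - (1 << width) if value >> (width - 1) else value

def wrapped_sum(terms, width):
    """Accumulate the terms, storing every intermediate result in `width` bits."""
    acc = 0
    for t in terms:
        acc = wrap(acc + t, width)
    return acc

WIDTH = 8                                        # representable range -128 .. 127
first_partial = wrap(100 + 100, WIDTH)           # intermediate overflow: 200 wraps to -56
order_a = wrapped_sum([100, 100, -120], WIDTH)   # passes through the overflowed value
order_b = wrapped_sum([-120, 100, 100], WIDTH)   # never overflows
```

Both orders give 80, the correct sum, even though the first order temporarily leaves the representable range.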
Besides the ordinary binary number representations there are also redundant representations
[8], with multiple representations of a single number. Operations involving comparison of
numbers in this type of representation are, however, often difficult to implement. Some of these
redundant representations are easy to convert into ordinary binary numbers, e.g., signed-digit
code. Others, like residue number systems, are difficult to encode and decode to and from
ordinary non-redundant binary numbers, but they are efficient for certain operations as long as
conversion between the number systems is not required.

The most common operations in DSP algorithms are additions, subtractions, and multiplications.
Multiplications with fixed coefficients are common, which enables the designer to simplify the
hardware. Such simplifications save resources with a possible speed-up.
1) BIT-PARALLEL ARITHMETIC:-
Typically, inputs and outputs to a bit-parallel arithmetic operation are stored in registers. In
bit-parallel arithmetic, all bits are conceptually processed at once, i.e., all bits in the inputs
are applied in parallel and all of the bits in the output occur simultaneously. In practice,
however, the bits are resolved sequentially, because carries must propagate from one bit position
to the next. An advantage of bit-parallel arithmetic compared to bit-serial arithmetic is that the
amount of work performed by a processing element during one clock cycle is relatively large, and
the clock frequency can therefore be kept low.
ADDITION AND SUBTRACTION:-
A sum Z of two numbers X and Y in two’s-complement representation is computed by adding the bits
pairwise, as shown in Eq. (4.2). Carry values propagate from the least significant bit (LSB) up
to the most significant bit (MSB).
.Eq.4.2.
This can be implemented in parallel using a set of full-adders, each of which adds the bits on
the same significance level together with a carry bit from the lower significance level. A
straightforward implementation is shown in Fig.4.1. The carry input at the LSB is set to zero, and
the carry output from each significance level is connected to the next significance level.
The result bit s_i depends on every input bit of equal or lower significance level. There will
therefore be a combinatorial path from the LSB through all full-adders to the MSB, resulting in a
long propagation delay.

FIG 8: Bit-parallel ripple-carry adder.
The computation of the result will be sequential in the worst case, starting at LSB and generating
carry values up to MSB.
Many techniques have been proposed to avoid this problem of long carry propagation paths, e.g.,
carry look-ahead, carry-save, and carry-select. One common property of these solutions is the
increase of resources that are required to speed up the computation compared to the ripple-carry
implementation.
Unwanted switching in the logic circuits is generated by implementations using the simple
full-adder based structure in Fig.4.1, as incorrect intermediate results are computed before the
correct carry has arrived at a bit-level stage. The number of full-adders, and therefore also the
carry propagation path limiting the addition time, is proportional to the data word length Wd.
Subtraction is carried out using the same structure as for addition. By using the property that the
sign of a two’s complement number is changed by inverting all bits and adding one to the LSB
position, the addition is converted into subtraction by inverting the value to be subtracted, and
setting the input carry bit at the LSB.
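A minimal structural sketch of the ripple-carry adder/subtractor described above is given here. The entity name rca_addsub, the generic WD and the control input sub are assumptions made for illustration, not names from the original design. Note that the code follows the usual VHDL convention where index 0 is the LSB, whereas the text uses x_0 for the sign bit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity rca_addsub is
  generic (WD : positive := 8);  -- data word length (assumed default)
  port (
    x, y : in  std_logic_vector(WD-1 downto 0);
    sub  : in  std_logic;        -- '0': z = x + y, '1': z = x - y
    z    : out std_logic_vector(WD-1 downto 0)
  );
end entity rca_addsub;

architecture rtl of rca_addsub is
  signal c  : std_logic_vector(WD downto 0);    -- c(i) is the carry into bit i
  signal yi : std_logic_vector(WD-1 downto 0);  -- y, inverted when subtracting
begin
  -- Carry input at the LSB: zero for addition, one for subtraction
  c(0) <= sub;

  -- One full-adder per significance level; the carry ripples upwards
  stage : for i in 0 to WD-1 generate
    yi(i)  <= y(i) xor sub;
    z(i)   <= x(i) xor yi(i) xor c(i);
    c(i+1) <= (x(i) and yi(i)) or (x(i) and c(i)) or (yi(i) and c(i));
  end generate stage;
end architecture rtl;
```

The final carry c(WD) is left unconnected, since the result is kept at the same word length as the operands.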
MULTIPLICATION:-
Binary multiplication may be carried out using a scheme similar to common hand calculation.
An array of partial-product terms is generated and then added as shown in Fig. 4.2. Each dot in
the summation array corresponds to two digits multiplied, which in the binary case is equivalent
to a logic AND function of two bits.
FIG 9: Matrix of partial products generated in multiplication.
Summation of the partial-products can be performed in various ways [8]. The straightforward
method of using a full-adder for the addition of each dot will result in the array multiplier shown
in Fig. 4.3, with a multiplication time proportional to the sum of the data word length and
coefficient word length (propagating down and then from right to left). The required area will be
proportional to the product of the data word length and the coefficient word length (Wd×Wc).
FIG 10: Array multiplier for two’s-complement numbers.
Other methods of adding the partial product terms include Wallace trees and similar structures,
where the carry propagation is reduced by changing the addition order of the input data [10].
Such addition schemes use a treelike adder structure to speed up the additions, thereby reducing
the propagation delay. Carry is only propagated from one level to another, resulting in short
combinatorial paths. Only the final step, where the two last intermediate results are to be added,
requires a carry-propagate adder.
2) BIT-SERIAL ARITHMETIC:-
In bit-serial arithmetic one bit of the input data is processed in each clock cycle, generally
starting with the LSB. The complexity of an operation is low as there are few input bits to
operate on in each clock cycle. Combinatorial paths through the logic are short, allowing for high
bit-rates, which will make the total computation time comparable to bit-parallel ripple carry
implementations.
Using bit-serial arithmetic results in small processing elements and short interconnection paths
between them, so the total chip area becomes smaller. This allows for a higher clock frequency
and also reduces the power consumption, as the capacitive loads on the gates are reduced.
ADDITION AND SUBTRACTION:-
A bit-serial adder adds two bits during one clock cycle, generating a sum bit. A carry bit is also
generated, which is added in the next clock cycle, as shown to the left in Fig.4.4. The carry is
saved in a flip-flop, which is reset at the start of the addition. This reset of the D flip-flop
corresponds to the zero at the LSB carry input in the bit-parallel case.
FIG 11: Bit-serial adder and subtractor.
The area of the adder is independent of the data word length, but the number of clock cycles is
proportional to the word length. Power consumption is lower in the bit-serial case compared to a
long bit-parallel ripple carry implementation because the combinatorial depth of the circuit is
smaller, and the output is correctly computed directly without excessive switching.
Subtraction may be implemented as in the bit-parallel case, i.e., by changing the sign of the
subtrahend. This is accomplished by inverting all bits and adding a one to the LSB position. In
the bit-serial case an adder with one inverted input is sufficient to implement the subtraction as
shown to the right in Fig. 4.4. The carry flip-flop is set at the beginning of the subtraction.
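The bit-serial adder/subtractor can be sketched as follows. This is a simplified variant, not the circuit of the figure: instead of resetting or setting the carry flip-flop before each word, it injects the initial carry through a multiplexer during the LSB cycle, which has the same effect. The entity name serial_addsub and the signals start and sub are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity serial_addsub is
  port (
    clk   : in  std_logic;
    start : in  std_logic;  -- '1' during the LSB cycle of each word
    sub   : in  std_logic;  -- '0': x + y, '1': x - y
    x, y  : in  std_logic;  -- serial operands, LSB first
    s     : out std_logic   -- serial result, LSB first
  );
end entity serial_addsub;

architecture rtl of serial_addsub is
  signal carry   : std_logic := '0';  -- carry saved between clock cycles
  signal cin, yi : std_logic;
begin
  -- Inverting y and starting with carry = 1 negates the subtrahend
  yi  <= y xor sub;
  -- In the LSB cycle the carry is 0 (add) or 1 (subtract); afterwards
  -- the carry saved from the previous bit is used
  cin <= sub when start = '1' else carry;
  s   <= x xor yi xor cin;

  process (clk)
  begin
    if rising_edge(clk) then
      carry <= (x and yi) or (x and cin) or (yi and cin);
    end if;
  end process;
end architecture rtl;
```

The logic is a single full-adder and one flip-flop regardless of the word length, which matches the area argument made above.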
MULTIPLICATION:-
Multiplication of two numbers can be accomplished using two bit-serial inputs, generating a bit-
serial output [1]. Many DSP algorithms like digital filters and FFTs only use multiplications of
data and a fixed coefficient. We will only discuss this type of multiplication here.

SHIFT-AND-ADD MULTIPLIERS:-
A common case is multiplication with a fixed coefficient, which may be realized as
multiplication of a bit-serial input of Wd bits with a bit parallel coefficient of Wc bits, generating
a bit-serial output of Wd+Wc-1 bits. Both the input and output bit-serial data streams are in
LSB-first order. This shift-and-add multiplier structure computes the product by adding rows in
the matrix representation, generating a new row of bits after each addition. The input stage of a
shift-and-add multiplier consists of a row of AND gates, which perform a bit-wise multiplication
of the serial input bit with the parallel coefficient. In Fig.4.5 this stage is implemented as a
multiplexer that selects either the coefficient a or zero. The result of this bit-wise
multiplication is then added to the partial product, and the accumulated sum is shifted right one
position. The rightmost bit yields one bit in the product. Once the last addition is completed,
the multiplier is clocked for an additional Wc clock cycles with a zero input in order to shift
out the Wc most significant bits.
FIG 12: Shift-and-add multiplier.
Use of a coefficient in two’s-complement form requires the shifting to be done using arithmetic
shifts, copying the sign-bit, as the intermediate result may be negative.
Serial input data in two’s-complement form requires special treatment compared to positive
binary data. The last bit, the sign-bit, has a bit weight of -1. The sign-bit should therefore be
multiplied with the coefficient, and the resulting partial product should then be subtracted from
the accumulated sum. One approach is to include logic that converts the addition of (x_0×a) to a
subtraction. Finally, the last Wc bits are generated while keeping the serial input at zero.
Another approach to handling the sign-bit is to sign-extend the serial input [4]. The subtraction
in the multiplication of two’s-complement numbers may be eliminated by sign-extending the serial
input as shown in Eq. (4.3). The left part of the last expression is the subtraction operation of
the coefficient. It only contributes to the product in bit-positions with bit-weights above 2^0.
The right part of the last expression only contributes to the product at bit-positions with
bit-weights up to 2^0. The final subtraction is therefore not required, and the multiplier can be
implemented using only additions. The sign-extension logic may consist of a latch.
……………………….Eq.4.3
The multiplication time in a serial/parallel multiplier is Wd+Wc-1 clock cycles, where Wd is the
bit-serial data word length and Wc is the coefficient word length. The maximum clock frequency
will be limited by the addition time in one bit-adder. Only the least significant bit is used as
output at each clock cycle, allowing the rest of the intermediate result to be in an arbitrary
number format. A redundant representation of the intermediate result is therefore acceptable, as
long as the LSB is calculated. Use of carry-save adders will therefore allow for a high clock
frequency since they have a short combinatorial path. Shifting is automatically performed each
clock cycle due to the wiring.

SERIAL/PARALLEL MULTIPLIERS:-
An alternative realization of the shift-and-add algorithm is shown in Fig.4.6. This realization is
referred to as a serial/parallel (S/P) multiplier [4]. It consists of two parts. The first part generates
the partial bit-products and the second part is a so-called shift-accumulator. A serial/parallel
multiplier requires little chip area and can be clocked with high clock frequency [11].
Serial/parallel multipliers are natural building blocks for more complex operations. For example,
a processing element corresponding to a two-port adaptor can be built using a single multiplier,
three bit-serial adders, and a number of D flip-flops. Several digital filters with multiplexed
processing elements of this type have been implemented successfully using both standard-cell and
full-custom layout styles [9, 12, 6].
FIG 13: Serial/parallel multiplier using a shift-accumulator.
S/P MULTIPLICATION WITH FIXED COEFFICIENTS:-
The serial/parallel multiplier structure may be significantly simplified if the coefficient is fixed
[5]. The number of full-adders in a simplified implementation is equal to the number of non-zero
bits in the coefficient minus one if the coefficient is positive, and the number of non-zero bits in
the case of a negative coefficient. Procedures for simplifying serial/parallel multipliers with fixed
coefficients, either in two’s-complement or signed-digit code, are presented in [2]. An example of
a simplified serial/parallel multiplier is shown in Fig.4.7. Here, the logic drawn with dotted lines
can be removed.

FIG 14: Simplified serial/parallel multiplier with coefficient 0.011₂.
Multiplication generates a product that has a larger data word length than the word length of the
serial input. The number of fractional bits in the coefficient determines the number of extra bits
(of lower significance level compared to the input data). These additional bits must be truncated
or rounded in a recursive path.
LATENCY:-
The computational speed is characterized by two parameters, latency and throughput. The
latency for an operation is defined as the time required for an input of a given significance level
to affect the output at the same significance level [7, 3, 4]. It describes how long it takes for
an input value to be transformed into an output value. It is often convenient to measure the
latency in clock cycles instead of real time units.
Latency depends on the function of the processing element (PE). One example is the simplified
serial/parallel multiplier in Fig.4.8, which may be used for multiplication with 0.11₂ or 0.011₂
without changing the structure. The latency is, however, different in the two cases, since the
multiplication with 0.011₂ will generate one more fractional bit compared to the multiplication
with 0.11₂. The 0.011₂ case will therefore require one more clock cycle before a result bit of the
same significance level is available at the output.

FIG 15: Simplified multiplier structures for fixed coefficients 0.11₂ and 0.011₂.
THROUGHPUT:-
Throughput is defined as the reciprocal of the time between successive outputs as illustrated in
Fig.4.9 [7, 3, 4]. The throughput is measured in operations per time unit.
FIG 16: Latency and throughput of a processing element.
Throughput is not directly connected to the latency and it is possible to modify the throughput
without affecting the latency of a system. This is illustrated in Fig. 4.10, which describes how the
throughput of a system consisting of a single multiplier may be doubled by interleaving of two
multipliers. However, the latency has not changed.

FIG 17: Increased throughput without affecting latency.
Upper and lower limits on throughput and latency will depend on the technology used for the
implementation.

IMPLEMENTATION & ANALYSIS OF SUB-BLOCKS
A filter consists of various sub-blocks such as adders, multipliers and delays. To design a filter
it is therefore necessary to design all these sub-blocks first; the filter is then built by
combining the sub-blocks as required. This chapter describes the design, implementation and
analysis of the sub-blocks required for filter design.
IMPLEMENTATION OF ADDER SUB-BLOCKS USING VHDL:-
The figure below illustrates a block diagram of a 15-bit fixed-point adder sub-block.
FIG 18: Two input Adder block.
To design a 15-bit full adder, a single-bit three-input adder is created first. The 15-bit full
adder is then built by port mapping the ports of this three-input adder block. This generic VHDL
code of the 15-bit full adder is used as a library component. In the chain of three-input adder
blocks, the carry output of each stage is fed to the carry input of the next, more significant
stage, as shown in fig.5.1.2.
FIG 19: RTL schematic of adder block.

The figure below shows the output of the 15-bit adder block, where ‘a’ and ‘b’ are 15-bit input
vectors. The output vector ‘yout’ holds the sum of the inputs ‘a’ and ‘b’. The 32-bit and 64-bit
adder blocks are created in the same way. These adder blocks are used in the bit-parallel
implementation of digital filters.
FIG 20: output result of two input adder block.
The figure below shows the block diagram of the bit-serial adder. Implementing this adder requires
memory elements to store the sum and the carry, for which D flip-flops are used. In this adder
circuit the carry output of the present clock cycle is fed back as a carry input in the next
clock cycle. The reset bit is used to reset the output. The output is not available at the output
port until the set bit is in the on state, as shown in fig.5.1.5.

FIG 21: BIT serial adder block.
FIG 22: Test bench waveform of bit serial adder.
IMPLEMENTATION OF DELAY SUB-BLOCKS USING VHDL:-
The figures below show the bit-serial and bit-parallel implementations of the delay sub-blocks.
These delay sub-blocks are used as memory elements that store the data for one clock cycle. The
reset bit is used to reset the output.

FIG 23: Bit serial delay block.
FIG 24: Bit parallel delay block.
Inputs are applied at the rising edge of the clock, and the same value is obtained at the output
after a delay of one clock pulse. This memory block behaves like a D flip-flop. The output, which
follows the input after a delay of one clock pulse, is shown in fig.5.2.3. In this figure, ‘d’ is
the input vector and ‘q’ is the output vector. The input vector ‘d’ appears in the time slot of
90 ns to 180 ns. The output vector ‘q’ appears in the time slot immediately after the first rising
edge of the clock (that is, 180 ns to 360 ns). This memory block holds the output for at least one
clock cycle.
FIG 25: Test bench wave form of delay block.
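A delay sub-block of this kind can be sketched as a register with reset. The entity name delay_reg, the generic W and the choice of a synchronous, active-high reset are assumptions; the text does not say how the reset is implemented. Setting W to 1 gives the bit-serial delay, while larger values give the bit-parallel delay.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delay_reg is
  generic (W : positive := 8);  -- word width (assumed default)
  port (
    clk, rst : in  std_logic;
    d        : in  std_logic_vector(W-1 downto 0);
    q        : out std_logic_vector(W-1 downto 0)
  );
end entity delay_reg;

architecture rtl of delay_reg is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        q <= (others => '0');  -- reset clears the stored value
      else
        q <= d;                -- input appears at the output one clock later
      end if;
    end if;
  end process;
end architecture rtl;
```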

IMPLEMENTATION OF MULTIPLIER SUB-BLOCKS USING VHDL:-
There are different ways of designing a multiplier. Two such design methods are discussed here.
SERIAL PARALLEL MULTIPLIER SUB-BLOCK USING VHDL:-
The figure below shows the RTL schematic of a serial-parallel multiplier. One of the input vectors,
‘a’, is applied serially to the circuit (one bit at a time, starting from the LSB), while the
other, ‘b’, is applied in parallel (all bits simultaneously). Say that ‘a’ has M bits while ‘b’ has
N bits. Then, after all M bits of ‘a’ have been presented to the system, a string of M ‘0’s must
follow in order to complete the (M+N)-bit output product. As can be seen in the figure, the system
is pipelined and is constructed using AND gates, full-adder units and registers. Each unit of the
pipeline (except the leftmost one) requires one adder and two registers, plus an AND gate to
compute one of the inputs.
FIG 26: serial parallel multiplier.
Simulation results are shown in the figure below. ‘a=1100’ (decimal 12) was applied to the serial
input. Notice that this input must start with the LSB (a(0)=‘0’), which appears in the time slot
of 50 ns to 100 ns, while the MSB (a(3)=‘1’) is situated in 350 ns to 400 ns. Recall that four
zeros must then follow. At the parallel input, on the other hand, b=‘1101’ (decimal 13) was
applied. The expected result ‘prod=10011100’ (decimal 156) can be observed in the lower plot.
Recall that the first bit out is the LSB, that is ‘prod(0)=0’, which appears in the time slot
immediately after the first rising edge of the clock (that is, 100 ns to 200 ns), while the last
bit (MSB) of prod is situated in 600 ns to 700 ns. This kind of serial-parallel multiplier is used
as the multiplier in bit-serial arithmetic.
FIG 27: Simulation result of serial parallel multiplier.
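The serial-parallel multiplication can also be sketched behaviourally as a shift-accumulator. This is not the pipelined adder/register structure of the RTL schematic; it is a compact equivalent for unsigned operands, with the entity name sp_mult, the generic N and the synchronous reset chosen here as assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sp_mult is
  generic (N : positive := 4);  -- width of the parallel operand (assumed)
  port (
    clk, rst : in  std_logic;
    a        : in  std_logic;                       -- serial operand, LSB first
    b        : in  std_logic_vector(N-1 downto 0);  -- parallel operand (unsigned)
    prod     : out std_logic                        -- serial product, LSB first
  );
end entity sp_mult;

architecture rtl of sp_mult is
  signal acc : unsigned(N-1 downto 0) := (others => '0');  -- shifted partial sum
begin
  process (clk)
    variable pp, t : unsigned(N downto 0);
  begin
    if rising_edge(clk) then
      if rst = '1' then
        acc  <= (others => '0');
        prod <= '0';
      else
        -- Row of AND gates: b if the serial bit is 1, otherwise zero
        if a = '1' then
          pp := '0' & unsigned(b);
        else
          pp := (others => '0');
        end if;
        t    := ('0' & acc) + pp;  -- add the new row to the partial sum
        prod <= t(0);              -- rightmost bit is the next product bit
        acc  <= t(N downto 1);     -- shift right by one position
      end if;
    end if;
  end process;
end architecture rtl;
```

With b = "1101" and the serial bits 0, 0, 1, 1 followed by four zeros, the product bits come out LSB first as 0, 0, 1, 1, 1, 0, 0, 1, that is 10011100 (decimal 156), each bit appearing one clock after the input bit that completes it.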
BOOTH MULTIPLIER:-
Booth multiplication algorithm for radix 4
One way of realizing high-speed multipliers is to enhance parallelism, which helps to decrease
the number of subsequent calculation stages. The original version of the Booth algorithm
(Radix-2) had two drawbacks: (i) the number of add/subtract operations and the number of shift
operations are variable, which is inconvenient when designing parallel multipliers; (ii) the
algorithm becomes inefficient when there are isolated 1’s. These problems are overcome by the
modified Radix-4 Booth algorithm, which scans strings of three bits using the algorithm given
below:
1) Extend the sign bit by 1 position if necessary to ensure that n is even.
2) Append a 0 to the right of the LSB of the multiplier.
3) According to the value of each vector, each partial product will be 0, +y, -y, +2y or -2y.

The negative values of y are made by taking the 2’s complement, and in this paper carry-look-ahead
(CLA) fast adders are used. The multiplication of y by 2 is done by shifting y one bit to the
left. Thus, in any case, when designing an n-bit parallel multiplier, only n/2 partial products
are generated.
FIG 28: Simulation result for the Booth multiplier.
The Booth multiplier is used in bit-parallel arithmetic. The output of the Booth multiplier is
shown in the figure above.
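The radix-4 recoding steps listed above can be sketched behaviourally as follows. This is an illustration under stated assumptions, not the design used in this work: the entity name booth_r4 and the generic N (which must be even) are hypothetical, and the partial products are summed with the numeric_std "+" operator rather than with explicit carry-look-ahead adders.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity booth_r4 is
  generic (N : positive := 8);  -- operand width, assumed even
  port (
    x : in  signed(N-1 downto 0);    -- multiplier (recoded)
    y : in  signed(N-1 downto 0);    -- multiplicand
    p : out signed(2*N-1 downto 0)   -- product
  );
end entity booth_r4;

architecture behav of booth_r4 is
begin
  process (x, y)
    variable xe      : signed(N downto 0);
    variable ye      : signed(2*N-1 downto 0);
    variable pp, acc : signed(2*N-1 downto 0);
    variable trip    : std_logic_vector(2 downto 0);
  begin
    xe  := x & '0';          -- append a 0 to the right of the LSB
    ye  := resize(y, 2*N);   -- sign-extend the multiplicand
    acc := (others => '0');
    for i in 0 to N/2 - 1 loop
      -- Overlapping group of three bits: x(2i+1), x(2i), x(2i-1)
      trip := std_logic_vector(xe(2*i + 2 downto 2*i));
      case trip is
        when "001" | "010" => pp := ye;                  -- +y
        when "011"         => pp := shift_left(ye, 1);   -- +2y
        when "100"         => pp := -shift_left(ye, 1);  -- -2y
        when "101" | "110" => pp := -ye;                 -- -y
        when others        => pp := (others => '0');     -- 0
      end case;
      acc := acc + shift_left(pp, 2*i);  -- weight the partial product by 4^i
    end loop;
    p <= acc;
  end process;
end architecture behav;
```

Only N/2 partial products are formed, one per loop iteration, which is the reduction the radix-4 scheme is meant to provide.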
MULTIPLY ACCUMULATE SUB-BLOCKS USING VHDL:-
Multiplication followed by accumulation is a common operation in many digital systems,
particularly highly interconnected ones such as digital filters, neural networks, and data
quantizers.

FIG 29: MAC circuit.
One typical MAC (Multiply-Accumulate) architecture is illustrated in Fig. 29. It consists of
multiplying two values and then adding the result to the previously accumulated value, which must
then be stored back in the register for future accumulations. Another feature of a MAC circuit is
that it must check for overflow, which might happen when the number of MAC operations is large.
The design can be built from components, because each of the units shown in the figure has already
been designed. However, as it is a relatively simple circuit, it can also be designed directly. In
any case, the MAC circuit as a whole can be used as a component in applications like digital
filters and neural networks.
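A directly designed MAC with an overflow check might look like the sketch below. The widths (8-bit inputs, 16-bit accumulator), the entity name mac, the ovf flag and the choice to saturate on overflow are all assumptions made for illustration; the text only states that overflow must be checked.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mac is
  port (
    clk, rst : in  std_logic;
    a, b     : in  signed(7 downto 0);
    acc      : out signed(15 downto 0);
    ovf      : out std_logic            -- '1' when the last addition overflowed
  );
end entity mac;

architecture rtl of mac is
  signal reg : signed(15 downto 0) := (others => '0');
begin
  acc <= reg;

  process (clk)
    variable prod, sum : signed(15 downto 0);
  begin
    if rising_edge(clk) then
      if rst = '1' then
        reg <= (others => '0');
        ovf <= '0';
      else
        prod := a * b;       -- 8 x 8 bits gives a 16-bit product
        sum  := reg + prod;  -- wraps around on overflow
        -- Overflow: both operands share a sign that the sum does not have
        if reg(15) = prod(15) and sum(15) /= reg(15) then
          ovf <= '1';
          if reg(15) = '0' then
            sum := to_signed(32767, 16);   -- saturate at the largest value
          else
            sum := to_signed(-32768, 16);  -- saturate at the smallest value
          end if;
        else
          ovf <= '0';
        end if;
        reg <= sum;          -- store back for future accumulations
      end if;
    end if;
  end process;
end architecture rtl;
```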

IMPLEMENTATION & ANALYSIS OF FIR FILTERS
Digital signal processing finds innumerable applications in the fields of audio, video and
communications. Such applications are generally based on LTI (linear time-invariant) systems,
which can be implemented with digital circuitry. An LTI system is represented by the following
equation:
$$\sum_{k=0}^{N} a_k\, y[n-k] = \sum_{k=0}^{M} b_k\, x[n-k]$$
where a_k and b_k are the filter coefficients, and x[n-k], y[n-k] are the current (for k=0) and
earlier (for k>0) input and output values, respectively. To implement this expression, registers
are necessary to store x[n-k] and/or y[n-k] (for k>0), besides multipliers and adders, which are
well-known building blocks in the digital domain.
Digital filters can be divided into two categories according to their impulse response: IIR
(infinite impulse response) and FIR (finite impulse response). The former corresponds to the
general case described by the equation above, while the latter occurs when N=0. Only FIR filters
can exhibit linear phase, so they are indispensable when linear phase is required, as in many
telecom applications. With N=0, the equation above becomes
$$y[n] = \sum_{k=0}^{M} c_k\, x[n-k]$$
where c_k = b_k/a_0 are the coefficients of the FIR filter. This equation can be realized by the
system shown in the figure below, where D (delay) represents a register (flip-flops), a triangle
is a multiplier, and a circle is an adder.

TRANSVERSAL STRUCTURE OR DIRECT FORM REALIZATION OF
FIR FILTER:-
The system function of an FIR filter can be written as
$$H(z) = \sum_{n=0}^{N-1} h(n)\, z^{-n} = h(0) + h(1) z^{-1} + h(2) z^{-2} + \dots + h(N-1) z^{-(N-1)}$$ …………………….Eq.6.1.1.
$$Y(z) = h(0) X(z) + h(1) z^{-1} X(z) + h(2) z^{-2} X(z) + \dots + h(N-1) z^{-(N-1)} X(z)$$
This equation is realized in the figure below. This is known as the transversal structure. It
requires N multipliers, N-1 adders, and N-1 delay elements.
FIG 30: Transversal structure or Direct form realization on FIR Filter (with five coefficients).
An equivalent RTL representation is shown in Fig.6.1.3. As shown, the values of ‘x’ are stored in a
shift register, whose outputs are connected to the multipliers and then to the adders. The
coefficients must be stored on chip. However, if the coefficients are always the same, their
values can be implemented by means of logic gates rather than registers. On the other hand, if it
is a general-purpose filter, then registers are required for the coefficients. In the architecture
of the figure, the output vector ‘y’ is always stored, in order to provide a clean synchronous
output.
FIG 31: FIR Filter diagram (with four coefficients)
FIG 32: RTL representation of FIR Filter
The circuit in the figure can be constructed in many ways. However, if it is intended for future
reuse or sharing, then it should be as generic as possible. The lower section of the filter
contains a MAC (multiply-accumulate) pipeline. This circuit is closely related to the MAC circuit
discussed previously. Here too, overflow can happen, so an add/truncate procedure must be included
in the design. In this circuit the coefficients are arbitrarily chosen constants; no algorithm is
used to generate them. The values chosen are coeff(0)=3, coeff(1)=9, coeff(2)=6, coeff(3)=13.
Simulation results are shown in the figures that follow.

FIR FILTER USING BIT PARALLEL ARITHMETIC:-
FIG 33: FIR Filter Direct form realization using bit parallel arithmetic.
The figure above shows the output of the direct form realization of the FIR filter using
bit-parallel arithmetic. Here the 8-bit input vector ‘x’ is fed in parallel at the rising edge of
the clock. Recall that the coefficients are coeff(0)=3, coeff(1)=9, coeff(2)=6, coeff(3)=13. The
sequence applied to the input was x[0]=4, x[1]=3, x[2]=5, x[3]=2. Therefore, with all the
flip-flops previously reset, at the first positive edge of the clock the expected output is
y[0]=coeff(0)*x[0]=12, which coincides with the first result of the output ‘y’ in Fig.3. At the
next upward transition of the clock, the expected value is y[1]=coeff(0)*x[1]+coeff(1)*x[0]=45,
and one clock cycle later y[2]=coeff(0)*x[2]+coeff(1)*x[1]+coeff(2)*x[0]=66, and so on.
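The worked example above can be reproduced with a small bit-parallel direct form sketch. The coefficient values 3, 9, 6 and 13 are taken from the text; the entity name fir4, the 8-bit signed input, the 16-bit output and the synchronous reset are assumptions, and the overflow handling (add/truncate) mentioned earlier is left out because these word lengths cannot overflow with the given coefficients.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fir4 is
  port (
    clk, rst : in  std_logic;
    x        : in  signed(7 downto 0);
    y        : out signed(15 downto 0)
  );
end entity fir4;

architecture rtl of fir4 is
  type coef_t is array (0 to 3) of integer;
  constant COEF : coef_t := (3, 9, 6, 13);       -- coefficients from the text
  type tap_t is array (1 to 3) of signed(7 downto 0);
  signal taps : tap_t := (others => (others => '0'));  -- x[n-1], x[n-2], x[n-3]
begin
  process (clk)
    variable acc : signed(15 downto 0);
  begin
    if rising_edge(clk) then
      if rst = '1' then
        taps <= (others => (others => '0'));
        y    <= (others => '0');
      else
        -- y[n] = sum over k of coeff(k) * x[n-k]
        acc := x * to_signed(COEF(0), 8);
        for k in 1 to 3 loop
          acc := acc + taps(k) * to_signed(COEF(k), 8);
        end loop;
        y <= acc;           -- registered output
        -- Shift register holding the earlier input samples
        taps(1) <= x;
        taps(2) <= taps(1);
        taps(3) <= taps(2);
      end if;
    end if;
  end process;
end architecture rtl;
```

Applying x = 4, 3, 5 on successive clock edges after reset gives y = 12, 45 and 66, matching the values computed in the text.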

FIR FILTER USING BIT SERIAL ARITHMETIC:-
The figure below shows the output of the direct form realization of the FIR filter using
bit-serial arithmetic. Here five single-bit inputs, ‘x0’ to ‘x4’, are used, which are fed at the
rising edge of the clock. As the data are fed serially, the LSB is applied first and appears in
the time slot of 50 ns to 100 ns. The sequence applied to the input was x(0)=4, x(1)=3, x(2)=5,
x(3)=2. Therefore, with all the flip-flops previously reset, at the first positive edge of the
clock the expected output is y(0)=coeff(0)*x(0)=12. At the output the LSB appears first, in the
time slot immediately after the first rising edge of the clock (that is, 50 ns to 100 ns), while
the last bit (MSB) of ‘y0’ is situated in 200 ns to 250 ns. The expected value of ‘y1’ is
coeff(0)*x1+coeff(1)*x0=45. Here one addition takes place, and each bit-serial addition delays the
output by one clock pulse. The output ‘y1’ therefore appears one clock pulse later than the output
‘y0’: the LSB of ‘y1’ appears in the time slot of 100 ns to 150 ns, an initial latency of one clock
pulse. The same trend is followed by ‘y2’, whose LSB appears in the time slot of 150 ns to 200 ns,
an initial latency of two clock pulses. The next output has an initial latency of three clock
pulses, and the trend continues for the other outputs.
FIG 34: FIR Filter Direct form realization using bit serial arithmetic

SIMULATION TIME ANALYSIS OF TRANSVERSAL STRUCTURE OR
DIRECT FORM REALIZATION FILTER USING BIT PARALLEL
ARITHMETIC & BIT SERIAL ARITHMETIC:-
In bit-serial arithmetic the data are fed serially: first the LSB is applied, then the second bit
in the next clock pulse, and so on. The data of all input variables are fed in this way, and the
output is obtained in the same fashion. This way of entering input data and extracting output data
introduces latency in the output waveform. As the latency increases with every individual output,
it takes time for the last bit (MSB) of each output to appear in the waveform. The last bit (MSB)
of output ‘y0’ is situated in 200 ns to 250 ns and the last bit (MSB) of ‘y1’ is situated in 450 ns
to 500 ns. In bit-parallel arithmetic there is also an initial latency in each output, but the
output bits of a stage (y15 to y0) appear synchronously with the clock pulse, so the LSB-to-MSB
output data of a stage are obtained within a single clock pulse. The bit-parallel result of
Fig.6.1.4 shows that y[2]=coeff(0)*x[2]+coeff(1)*x[1]+coeff(2)*x[0]=66 is situated in 350 ns to
450 ns, while the bit-serial result of Fig.6.1.5 shows that y[2]=66 is situated in 200 ns to 600 ns
(LSB at 200 ns and MSB at 600 ns). In bit-serial arithmetic the LSB to MSB of the output are thus
spread over different clock pulses, which is not the case in bit-parallel arithmetic. For this
reason, bit-serial arithmetic takes longer to deliver the complete output data than bit-parallel
arithmetic. This is one disadvantage of bit-serial arithmetic compared to bit-parallel arithmetic.

AREA ANALYSIS OF TRANSVERSAL STRUCTURE OR DIRECT FORM
REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC
& BIT SERIAL ARITHMETIC:-
According to the design summary of the direct form realization of the FIR filter using
bit-parallel arithmetic (Chart 1), the number of 4-input LUTs is 479, the number of occupied
slices is 271, and the number of bonded inputs/outputs is 26. Fig.6.1.7 shows the design summary of
the direct form realization using bit-serial arithmetic. According to this figure, the number of
4-input LUTs is 73, the number of occupied slices is 47, and the number of bonded inputs/outputs is
13. Comparing the two design summaries, the bit-parallel realization uses more 4-input LUTs, more
occupied slices and more bonded inputs/outputs than the bit-serial realization. The extra number
of LUTs used is (479-73)=406, the extra number of occupied slices is (271-47)=224, and the extra
number of bonded inputs/outputs is (26-13)=13.
Resource                              Count
Number of Slice Flip Flops            34
Number of 4 input LUTs                479
Number of occupied slices             271
Number of bonded INPUT/OUTPUT         26
Chart 1: Design summary of Direct form realization of FIR Filter using bit parallel arithmetic.
From this comparison it is found that the bit-parallel implementation of the direct form
realization needs more chip area than the bit-serial implementation. As modern electronic devices
become smaller and smaller, chip area is an important design parameter for any electronic
circuit. In terms of chip area, the bit-serial implementation of this digital filter is therefore
advantageous compared to the bit-parallel implementation.
Power consumption in a circuit is also related to the chip area: if the chip area increases, the
power consumption of the circuit increases as well.
Chart 2: Design summary of Direct form realization of FIR Filter using bit serial arithmetic.
POWER ANALYSIS OF TRANSVERSAL STRUCTURE OR DIRECT
FORM REALIZATION OF FIR FILTER USING BIT PARALLEL
ARITHMETIC & BIT SERIAL ARITHMETIC:-
A comparative study of the total estimated power consumption for the direct form realization of
the FIR filter reveals that the bit-parallel realization consumes more power than the bit-serial
realization. Fig.6.1.8 shows the XPower analysis data for the direct form realization of the FIR
filters obtained with the Xilinx tool. It indicates that the direct form realization using
bit-serial arithmetic consumes 0.084 W, while the same filter realized using bit-parallel
arithmetic consumes 0.090 W.

Realization                                                            Total estimated power consumption (W)
Direct form realization of FIR filter using bit parallel arithmetic    0.090
Direct form realization of FIR filter using bit serial arithmetic      0.084
Chart 3: Power summary of Direct form realization of FIR filters.
According to the first waveform of Fig.6.1.9, the total power consumption is the sum of the
quiescent power, logic power, IO power and digital clock manager power. Quiescent power (also
called static power) is the power drawn by the device when it is powered up and configured with
user logic, and there is no switching activity. In XPower Analyzer, the value reported for Total
Quiescent Power is composed of these quiescent power components:
• Device static power – This represents power consumed by the device when it is powered
up without programming the user logic. The main contributor to this number is the
junction temperature. Any change affecting the device operating environment will affect
this power.
• Design static power – This represents the power consumed by the user logic when the
device is programmed and without any switching activity. For instance, depending on
the device family and resource configuration, some blocks used in a design (such as
clock management, I/Os, and Multi-Gigabit Transceivers) will consume a set amount of
power regardless of activity.
The Logic power is used to account for the number of CLB resources, including LUTs, SRLs,
LUT-based RAMs, and flip-flops estimated for use in the design. By implementing the pre-
existing blocks that constitute a design, it is possible to accurately estimate resource utilization
for the bulk of a design. These resource utilization estimates help to predict the logic power,
which is typically the larger share of the dynamic power consumed in any design.

With higher switching speeds and capacitive loads, switching I/O power can be a substantial part
of the total power consumption of an FPGA. It is therefore important to define all I/O related
parameters accurately when estimating IO power.
The Digital Clock Manager (DCM) primitive in Xilinx FPGA parts is used to implement, for example,
a delay locked loop, a digital frequency synthesizer, a digital phase shifter or a digital
spread spectrum. The digital clock manager module is a wrapper around the DCM primitive
that allows it to be used in the EDK tool suite.
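The power decomposition described above can be summarized in one expression. The symbols are
shorthand introduced here for illustration only; they are not field names taken from XPower
Analyzer.

```latex
P_{\text{total}} = P_{\text{quiescent}} + P_{\text{logic}} + P_{\text{IO}} + P_{\text{DCM}},
\qquad
P_{\text{quiescent}} = P_{\text{device static}} + P_{\text{design static}}
```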
FIG 35: Output wave form of power analysis of FIR filter (Direct form) using bit parallel
arithmetic.
Bit parallel arithmetic uses more input/output ports than bit serial arithmetic. The first
waveforms of the two power analysis figures show that the bit parallel representation consumes
more power than the bit serial representation because of this larger number of input/output ports.

Junction temperature plays an important part in determining the device static power. A small
change in junction temperature can change the device power consumption considerably. The third
and fourth waveforms of the two figures show how the power changes with junction temperature for
bit parallel and bit serial arithmetic.
This analysis shows that if power is one of the design criteria, it is better to design direct
form FIR filters using bit serial arithmetic. The above results show that the direct form
realization of FIR filters using bit serial arithmetic consumes less power than the bit parallel
arithmetic representation.
FIG 36: Output wave form of power analysis of FIR filter (direct form) using bit serial
arithmetic

CASCADE REALIZATION OF FIR FILTER:-
The transversal structure equation can be realized in Cascade form from the factored form of H(Z).
For odd N,
$$H(Z) = \prod_{k=1}^{(N-1)/2} \left(b_{k0} + b_{k1}z^{-1} + b_{k2}z^{-2}\right)$$
$$= \left(b_{10} + b_{11}z^{-1} + b_{12}z^{-2}\right)\left(b_{20} + b_{21}z^{-1} + b_{22}z^{-2}\right)\cdots\left(b_{\frac{N-1}{2}0} + b_{\frac{N-1}{2}1}z^{-1} + b_{\frac{N-1}{2}2}z^{-2}\right) \qquad \text{(Eq. 6.2.1)}$$
When N is odd, N-1 is even and H(Z) has (N-1)/2 second order factors. Each second order factor of
H(Z) is realized in direct form, and the sections are cascaded to realize H(Z) as shown in
Fig. 6.2.1.
FIG 37: Cascade realization of Eq.6.2.1.
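For even N,
$$H(Z) = \left(1 + b_{10}z^{-1}\right)\prod_{k=2}^{N/2} \left(b_{k0} + b_{k1}z^{-1} + b_{k2}z^{-2}\right) \qquad \text{(Eq. 6.2.2)}$$
When N is even, N-1 is odd and H(Z) has one first order factor and (N-2)/2 second order
factors:
$$H(Z) = \left(1 + b_{10}z^{-1}\right)\left(b_{20} + b_{21}z^{-1} + b_{22}z^{-2}\right)\left(b_{30} + b_{31}z^{-1} + b_{32}z^{-2}\right)\cdots\left(b_{\frac{N}{2}0} + b_{\frac{N}{2}1}z^{-1} + b_{\frac{N}{2}2}z^{-2}\right)$$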

Each factor of H(Z) is then realized in Direct form, and the sections are Cascaded to obtain the
realization of H(Z) as shown in Fig.
FIG 38: Cascade realization of Eq
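One second order factor of the Cascade form can be sketched in VHDL as follows. This is a minimal
illustration, not the filter code used in this work: the entity name fir_section2, the generics
B0, B1, B2 and W, and the use of numeric_std are assumptions made for the example, and overflow is
simply truncated.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- One second order FIR section: y(n) = B0*x(n) + B1*x(n-1) + B2*x(n-2)
entity fir_section2 is
  generic (
    B0, B1, B2 : integer  := 1;   -- hypothetical coefficients b_k0, b_k1, b_k2
    W          : positive := 16   -- assumed word length
  );
  port (
    clk, rst : in  std_logic;
    x        : in  signed(W-1 downto 0);
    y        : out signed(W-1 downto 0)
  );
end entity fir_section2;

architecture rtl of fir_section2 is
  -- delay line holding x(n-1) and x(n-2)
  signal x1, x2 : signed(W-1 downto 0) := (others => '0');
begin
  delay_line : process (clk, rst)
  begin
    if rst = '1' then
      x1 <= (others => '0');
      x2 <= (others => '0');
    elsif rising_edge(clk) then
      x1 <= x;
      x2 <= x1;
    end if;
  end process delay_line;

  -- each product is 2*W bits wide; resize truncates it back to W bits
  y <= resize(B0 * x, W) + resize(B1 * x1, W) + resize(B2 * x2, W);
end architecture rtl;
```

A Cascade realization would connect the y port of one such section to the x port of the next, one
instance per factor of H(Z).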
CASCADE REALIZATION OF FIR FILTER USING BIT PARALLEL
ARITHMETIC:-
The simulation output figure shows the result of the Cascade realization of the FIR filter using
bit parallel arithmetic. The 8 bit input vector 'x' is fed in parallel on the rising edge of the
clock pulse. Recall that the coefficients are coeff(0)=1, coeff(1)=2, coeff(2)=3, coeff(3)=4,
coeff(4)=5, coeff(5)=6, coeff(6)=7, coeff(7)=8, coeff(8)=9. The sequence applied to the input is
x[0]=4, x[1]=3, x[2]=2. Three slices are used for this realization: the first stage output is
stored in the 'Y' vector, the second stage output in the 'Z' vector and the final stage output in
the 'P' vector. With all the flip flops previously reset, at the first positive edge of the clock
the expected output is Y[0]=coeff(0)*x[0]=4. At the next rising edge of the clock the expected
value is Y[1]=coeff(0)*x[1]+coeff(1)*x[0]=11, and one clock cycle later
Y[2]=coeff(0)*x[2]+coeff(1)*x[1]+coeff(2)*x[0]=20.

According to Fig. 6.2.4, for the second slice the expected output at the first positive edge of
the clock is Z[0]=coeff(3)*Y[0]=16. At the next rising edge of the clock,
Z[1]=coeff(3)*Y[1]+coeff(4)*Y[0]=64, and one clock cycle later
Z[2]=coeff(3)*Y[2]+coeff(4)*Y[1]+coeff(5)*Y[0]=159.
FIG 39: FIR Filter Cascade realization using bit parallel arithmetic.
Finally, the output of the third slice (slice 2) is P[0]=coeff(6)*Z[0]=112. At the next rising
edge of the clock, P[1]=coeff(6)*Z[1]+coeff(7)*Z[0]=576, and one clock cycle later
P[2]=coeff(6)*Z[2]+coeff(7)*Z[1]+coeff(8)*Z[0]=1769. P[0] appears in the time slot from 300 ns
to 380 ns.
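The stage-by-stage values quoted above can be cross-checked with a small simulation-only model.
The sketch below is not part of the implemented filter: the entity cascade_check, the type int_vec
and the function fir3 are hypothetical names introduced for this check, and plain integers replace
the 8 bit vectors. Samples before n = 0 are taken as zero, which corresponds to all flip flops
being reset.

```vhdl
entity cascade_check is
end entity cascade_check;

architecture sim of cascade_check is
  type int_vec is array (natural range <>) of integer;

  -- Three-tap FIR stage: y(n) = c(0)*x(n) + c(1)*x(n-1) + c(2)*x(n-2),
  -- with samples before the first one treated as zero.
  function fir3 (x : int_vec; c : int_vec) return int_vec is
    variable y : int_vec(x'range) := (others => 0);
  begin
    for n in x'range loop
      for k in c'range loop
        if n - k >= x'low then
          y(n) := y(n) + c(k) * x(n - k);
        end if;
      end loop;
    end loop;
    return y;
  end function fir3;
begin
  check : process
    constant x : int_vec(0 to 2) := (4, 3, 2);   -- input sequence from the text
    variable y, z, p : int_vec(0 to 2);
  begin
    y := fir3(x, int_vec'(1, 2, 3));   -- slice 0: coeff(0..2)
    z := fir3(y, int_vec'(4, 5, 6));   -- slice 1: coeff(3..5)
    p := fir3(z, int_vec'(7, 8, 9));   -- slice 2: coeff(6..8)

    assert y = int_vec'(4, 11, 20)       report "Y mismatch" severity failure;
    assert z = int_vec'(16, 64, 159)     report "Z mismatch" severity failure;
    assert p = int_vec'(112, 576, 1769)  report "P mismatch" severity failure;
    report "cascade values agree with the worked example" severity note;
    wait;
  end process check;
end architecture sim;
```

The model checks only the arithmetic of the three slices; it does not reproduce the clock timing
of the waveform.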

CASCADE REALIZATION OF FIR FILTER USING BIT SERIAL
ARITHMETIC:-
The bit serial implementation of this filter uses a bit serial adder, a serial-parallel
multiplier and a delay element. The bit serial adder also uses a register, which stores the carry
output bit and feeds it back as the carry input in the next clock cycle. Because digital filters
are built from adder blocks, a delay of at least one clock pulse accumulates for every addition
operation. This is one of the reasons why a bit serial filter has a high initial latency.
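The carry register described above can be sketched as a bit serial adder in VHDL. This is an
illustrative sketch under stated assumptions, not the adder used in this work: the entity name
bit_serial_adder is hypothetical, the operands are assumed to arrive LSB first, rst is assumed to
be pulsed before each new word, and the sum bit is registered so that each addition costs one
clock cycle, in line with the delay described in the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity bit_serial_adder is
  port (
    clk : in  std_logic;
    rst : in  std_logic;   -- clears the stored carry before a new word starts
    a   : in  std_logic;   -- operand bits, LSB first
    b   : in  std_logic;
    sum : out std_logic    -- sum bit, one clock cycle after its operand bits
  );
end entity bit_serial_adder;

architecture rtl of bit_serial_adder is
  signal carry : std_logic := '0';   -- carry out of the previous bit position
begin
  process (clk, rst)
  begin
    if rst = '1' then
      carry <= '0';
      sum   <= '0';
    elsif rising_edge(clk) then
      -- full adder on the current bits and the stored carry
      sum   <= a xor b xor carry;
      carry <= (a and b) or (a and carry) or (b and carry);
    end if;
  end process;
end architecture rtl;
```

Only one full adder and two flip flops are needed regardless of the word length, which is the
source of the area saving, while the registered sum adds one clock cycle of latency per adder in
the signal path.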
The simulation output figure shows the result of the Cascade realization of the FIR filter using
bit serial arithmetic. Three single bit inputs, x0 to x2, are used and are fed on the rising edge
of the clock pulse. Because the data are fed serially, the LSB is applied first and appears in the
time slot from 60 ns to 120 ns. The sequence applied to the input is x(0)=4, x(1)=3, x(2)=2. With
all the flip flops previously reset, the expected output value at the first positive edge of the
clock is y(0)=112. In the output the LSB appears first (180 ns to 240 ns) and the last bit (MSB) of
'y0' lies in 480 ns to 540 ns. The result has to propagate through three slices, and the delay of
the serial addition accumulates in each slice, so 'y0' appears three clock pulses after the first
rising edge of the clock. The expected value 'y1'=576 appears six clock pulses after the first
rising edge of the clock, and the delay keeps increasing: the LSB of 'y2' appears nine clock
pulses after the first rising edge of the clock.

FIG 41: FIR Filter Cascade realization using bit serial arithmetic.
SIMULATION TIME ANALYSIS OF CASCADE REALIZATION OF FIR
FILTER USING BIT PARALLEL ARITHMETIC & BIT SERIAL
ARITHMETIC:-
In bit serial arithmetic the data are fed serially, one bit per clock pulse. As discussed
earlier, the bit serial adder uses a register to store the carry output bit and feed it back as
the carry input in the next clock cycle. Serial adder blocks are an integral part of this filter
design, so a delay of one clock pulse accumulates for every addition operation. This is one of the
reasons why the bit serial filter has a high initial latency. In this Cascade filter example three
slices are used. If one addition takes place in each slice, there is an initial latency of three
clock cycles for each output, because each output passes through each slice only once.
According to the bit parallel simulation figure, Y[0] = 112 lies in 300 ns to 400 ns, after an
initial latency of two clock pulses, while in the bit serial figure Y[0] = 112 appears after an
initial latency of three clock pulses. Y[1] = 576 appears after three clock pulses in bit parallel
arithmetic and after six clock pulses in bit serial arithmetic. Y[2] = 1769 appears after four
clock pulses in bit parallel arithmetic and after nine clock pulses in bit serial arithmetic. This
comparison shows that if the number of slices is increased further, the initial latency of the bit
serial implementation grows much faster than that of the bit parallel implementation. The delay in
bit parallel arithmetic is due only to the registers: to read the output data from a register we
have to wait at least one clock cycle.
So, if simulation time is taken as a design parameter, the Cascade realization of the FIR filter
using bit parallel arithmetic is advantageous compared to bit serial arithmetic.
AREA ANALYSIS OF CASCADE REALIZATION OF FIR FILTER
USING BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-
According to the design summary of the Cascade realization of the FIR filter using bit parallel
arithmetic, the design uses 149 slice flip flops, 729 4-input LUTs, 395 occupied slices and 74
bonded input/outputs. Fig. 6.2.7 shows the design summary of the Cascade realization using bit
serial arithmetic: 80 slice flip flops, 63 4-input LUTs, 42 occupied slices and 9 bonded
input/outputs. Comparing the two design summaries, the bit parallel realization uses more of every
resource than the bit serial realization: (149-80) = 69 more slice flip flops, (729-63) = 666 more
LUTs, (395-42) = 353 more occupied slices and (74-9) = 65 more bonded input/outputs.

This comparison shows that the bit parallel implementation of the Cascade realization needs more
chip area than the bit serial implementation. Chip area is an important design parameter for any
electronic circuit.
If the design is judged by chip area, the bit serial implementation of this digital filter is
advantageous compared to the bit parallel implementation. Power consumption is also related to
chip area: if the chip area increases, the power consumption of the circuit also increases.
Chart 4: Design summary of Cascade realization of FIR Filter using bit parallel arithmetic.
Chart 5: Design summary of Cascade realization of FIR Filter using bit serial arithmetic.

POWER ANALYSIS OF CASCADE REALIZATION OF FIR FILTER
USING BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-
The study of the total estimated power consumption for the Cascade realization of the FIR filter
shows that the bit parallel arithmetic representation consumes more power than the bit serial
arithmetic representation. The XPower analysis data for the Cascade realization of the FIR
filters, obtained with the Xilinx tool, are summarized in Chart 6: the Cascade realization using
bit serial arithmetic consumes 0.083 W, while the same filter built with bit parallel arithmetic
consumes 0.091 W.
Realization                                                  Total estimated power consumption (W)
Cascade realization of FIR filter, bit parallel arithmetic   0.091
Cascade realization of FIR filter, bit serial arithmetic     0.083
Chart 6: Power summary of Cascade realization of FIR filters.
According to the waveform of Fig. 6.2.9, the total power consumption is the sum of quiescent power,
logic power, IO power and digital clock manager power. Each of these components is described in
detail earlier in this chapter.

FIG 42: Output wave form of power analysis of FIR filter (Cascade realization) using bit
parallel arithmetic.
A comparative study of the first waveforms of the two power analysis figures shows that the
quiescent power, logic power and digital clock manager (DCM) power are almost the same in both
cases, but the IO power consumption is higher in the bit parallel case.
Bit parallel arithmetic uses more input/output ports than bit serial arithmetic. This is why the
bit parallel representation of the Cascade filter consumes more IO power than the bit serial
representation.
As a result, the overall power consumption of the bit parallel Cascade FIR filter is higher than
that of the bit serial Cascade filter.

FIG 43: Output wave form of power analysis of FIR filter (Cascade realization) using bit serial
arithmetic.
The third and fourth waveforms of the two figures show how the power changes with junction
temperature for bit parallel and bit serial arithmetic. Junction temperature plays an important
role in determining the device static power: a small change in junction temperature can change the
device power consumption considerably. Here the operating junction temperature is chosen as 27.1
degrees Celsius.
The study therefore shows that if power is one of the design criteria, it is better to design the
Cascade realization of FIR filters using bit serial arithmetic rather than bit parallel
arithmetic.

LATTICE STRUCTURE OF AN FIR FILTER:-
Let us consider an FIR filter with system function
$$H(Z) = A_m(Z) = 1 + \sum_{k=1}^{m} \alpha_m(k)\,z^{-k}, \qquad m \ge 1$$
from which we have
$$Y(Z) = X(Z)\left[1 + \sum_{k=1}^{m} \alpha_m(k)\,z^{-k}\right]$$
Taking the inverse Z-transform on both sides we get
$$y(n) = x(n) + \sum_{k=1}^{m} \alpha_m(k)\,x(n-k) \qquad \text{(Eq. 6.3.1)}$$
Eq. 6.3.1 represents an FIR system with system function H(Z) = A_m(Z).
The lattice structure for an all zero FIR system is obtained by interchanging the roles of input
and output. For an all pole filter the input is $x(n) = f_N(n)$ and the output is $y(n) = f_0(n)$.
For an all zero FIR system of order M-1 the input is $x(n) = f_0(n)$ and the output is
$y(n) = f_{M-1}(n)$.
For m = 1, Eq. 6.3.1 reduces to
$$y(n) = x(n) + \alpha_1(1)\,x(n-1)$$
This output can be obtained from the single stage lattice filter shown in Fig., for which
$$x(n) = f_0(n) = g_0(n)$$
$$y(n) = f_1(n) = f_0(n) + k_1 g_0(n-1) = x(n) + k_1 x(n-1)$$
$$g_1(n) = k_1 f_0(n) + g_0(n-1) = k_1 x(n) + x(n-1)$$
Comparing the two expressions for y(n), we get $\alpha_1(0) = 1$ and $\alpha_1(1) = k_1$.

FIG 44: single stage all zero Lattice Filter.
Now let us consider an FIR filter for which m = 2. Then
$$y(n) = x(n) + \alpha_2(1)\,x(n-1) + \alpha_2(2)\,x(n-2)$$
By cascading two lattice stages as shown in Fig., it is possible to obtain the output y(n).
FIG 45: Two stage all zero Lattice Filter.
From Fig., the outputs of the second stage are
$$y(n) = f_2(n) = f_1(n) + k_2 g_1(n-1)$$
$$g_2(n) = k_2 f_1(n) + g_1(n-1)$$

Substituting for $f_1(n)$ and $g_1(n-1)$ from the single stage equations, we get
$$y(n) = f_2(n) = x(n) + k_1 x(n-1) + k_2\left[k_1 x(n-1) + x(n-2)\right] = x(n) + k_1(1+k_2)\,x(n-1) + k_2\,x(n-2)$$
This is identical to the m = 2 filter equation, from which we have
$$\alpha_2(0) = 1, \qquad \alpha_2(2) = k_2, \qquad \alpha_2(1) = k_1(1+k_2) = \alpha_1(1)\left[1 + \alpha_2(2)\right]$$
Similarly,
$$g_2(n) = \alpha_2(2)\,x(n) + k_1(1+k_2)\,x(n-1) + x(n-2)$$
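The m = 1 and m = 2 results above are instances of a general step-up recursion. It is not written
out in this text, but it is the standard relation between the direct form coefficients and the
lattice coefficients of an FIR filter. As a sketch, for stage m with lattice coefficient $k_m$:

```latex
\alpha_m(0) = 1, \qquad
\alpha_m(m) = k_m, \qquad
\alpha_m(k) = \alpha_{m-1}(k) + k_m\,\alpha_{m-1}(m-k), \quad 1 \le k \le m-1
```

For m = 2 and k = 1 this gives $\alpha_2(1) = \alpha_1(1) + k_2\,\alpha_1(1) = k_1(1+k_2)$, in
agreement with the two stage result.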
LATTICE STRUCTURE OF FIR FILTER USING BIT PARALLEL
ARITHMETIC:-
The simulation output figure shows the result of the single stage Lattice realization of the FIR
filter using bit parallel arithmetic. The 7 bit input vector x is fed in parallel on the rising
edge of the clock pulse. The single coefficient chosen for this stage is k1=3. The block diagram
of the single stage lattice realization of the FIR filter is shown in Fig. The sequence applied to
the input is x[0]=1, x[1]=2, x[2]=3, x[3]=4. With all the flip flops previously reset (so that
x[-1]=0), at the first positive edge of the clock the expected output is y[0]=x[0]+k1*x[-1]=1. At
the next rising edge of the clock the expected value is y[1]=x[1]+k1*x[0]=5, one clock cycle later
y[2]=x[2]+k1*x[1]=9, and finally y[3]=x[3]+k1*x[2]=13.
Because each stage has two outputs, there is a second output sequence 'g'. At the first positive
edge of the clock pulse g[0]=k1*x[0]+x[-1]=3. At the next rising edge of the clock pulse
g[1]=k1*x[1]+x[0]=7, one clock cycle later g[2]=k1*x[2]+x[1]=11, and finally
g[3]=k1*x[3]+x[2]=15.
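The single stage equations f1(n) = x(n) + k1*x(n-1) and g1(n) = k1*x(n) + x(n-1) map directly to
one register, two multipliers and two adders. The sketch below illustrates this under stated
assumptions and is not the code used in this work: the entity name lattice_stage, the generics K1
and W, and the use of numeric_std are introduced for the example, and the outputs are left
combinational.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lattice_stage is
  generic (
    K1 : integer  := 3;    -- lattice coefficient k1 from the example
    W  : positive := 16    -- assumed word length
  );
  port (
    clk, rst : in  std_logic;
    x        : in  signed(W-1 downto 0);   -- x(n) = f0(n) = g0(n)
    f1       : out signed(W-1 downto 0);   -- y(n) = f1(n)
    g1       : out signed(W-1 downto 0)
  );
end entity lattice_stage;

architecture rtl of lattice_stage is
  signal x_d : signed(W-1 downto 0) := (others => '0');   -- g0(n-1) = x(n-1)
begin
  delay : process (clk, rst)
  begin
    if rst = '1' then
      x_d <= (others => '0');
    elsif rising_edge(clk) then
      x_d <= x;
    end if;
  end process delay;

  -- products are 2*W bits wide; resize truncates the sums back to W bits
  f1 <= resize(x + K1 * x_d, W);
  g1 <= resize(K1 * x + x_d, W);
end architecture rtl;
```

With K1 = 3 and the input sequence 1, 2, 3, 4 applied after reset, the equations above give
f1 = 1, 5, 9, 13 and g1 = 3, 7, 11, 15, the values listed in the text.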

FIG 46: FIR Filter Lattice realization using bit parallel arithmetic.
LATTICE STRUCTURE OF FIR FILTER USING BIT SERIAL
ARITHMETIC:-
The bit serial implementation of this filter uses a bit serial adder, a serial-parallel multiplier
and a delay element. Fig. 6.3.4 shows the output of the Lattice realization of the FIR filter
using bit serial arithmetic. Four single bit inputs, x0 to x3, are used and are fed on the rising
edge of the clock pulse.
Because the data are fed serially, the LSB is applied first and appears in the time slot from 50 ns
to 150 ns for all four input data bits. The sequence applied to the input is x0=1, x1=2, x2=3 and
x3=4. With all the flip flops previously reset, the expected outputs at the first positive edge of
the clock are y0=1 and g0=3. In both outputs 'y' and 'g' the LSB appears first, one clock pulse
after the first rising edge of the clock (100 ns to 200 ns). The expected values of 'y1' and 'g1'
appear two clock pulses after the first rising edge of the clock, and y2 and g2 follow in the same
fashion. Finally, the LSBs of y3 and g3 appear four clock pulses after the first rising edge of the
clock.

FIG 47: FIR Filter Lattice realization using bit serial arithmetic.
SIMULATION TIME ANALYSIS OF LATTICE REALIZATION OF FIR
FILTER USING BIT PARALLEL ARITHMETIC & BIT SERIAL
ARITHMETIC:-
As discussed earlier, the bit serial adder uses a register to store the carry output bit and feed
it back as the carry input in the next clock cycle, which introduces a delay of one clock cycle.
Because the filter is built from this adder block, a delay of one clock pulse accumulates for
every addition in the filter. This is one of the reasons why the bit serial filter has a high
initial latency.
In this Lattice filter example one stage is used. Each output depends on the present and previous
values of the input, where the previous value is the input at the earlier clock, so the delay
propagates through the addition operation to each output. In both the bit serial and the bit
parallel realization of this filter, each output has one more clock cycle of initial latency than
the previous output. For example, output 'y1' has one more cycle of initial latency than output
'y0', which means that the LSB of 'y1' appears one clock cycle later than the LSB of 'y0'.

In the bit serial case, however, the LSB and MSB of an input or output do not appear in the same
clock cycle. Because the output LSB appears late, it takes further time for the MSB to appear at
the output. This is not the case in the bit parallel implementation, where the LSB and MSB of any
output appear in the same clock cycle, so less simulation time is needed to obtain the output.
This study shows that the simulation time of the Lattice realization of the FIR filter using bit
serial arithmetic is much higher than that of the bit parallel implementation. From the point of
view of simulation time, bit parallel arithmetic is therefore advantageous for designing the
Lattice filter compared to bit serial arithmetic.
AREA ANALYSIS OF LATTICE REALIZATION OF FIR FILTER USING
BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-
According to the design summary of the Lattice realization of the FIR filter using bit parallel
arithmetic (Fig. 6.3.5), the design uses 5 slice flip flops, 123 4-input LUTs, 65 occupied slices
and 42 bonded input/outputs. Fig. 6.3.6 shows the design summary of the Lattice realization of the
FIR filter using bit serial arithmetic: 24 slice flip flops, 24 4-input LUTs, 14 occupied slices
and 15 bonded input/outputs. Comparing the two design summaries, the bit parallel realization uses
(123-24) = 99 more LUTs, (65-14) = 51 more occupied slices and (42-15) = 27 more bonded
input/outputs than the bit serial realization.

Chart 7: Design summary of Lattice realization of FIR Filter using bit parallel arithmetic.
The bit serial realization, however, uses more slice flip flops (24-5 = 19 more) than the bit
parallel realization, which differs from the previous two cases. Overall, the analysis still shows
that the bit parallel implementation of the Lattice realization needs more chip area than the bit
serial implementation. Chip area is an important design parameter for any electronic circuit.
If the design is judged by chip area, the bit serial implementation of this digital filter is
advantageous compared to the bit parallel implementation.
Chart 8: Design summary of Lattice realization of FIR Filter using bit serial arithmetic.

POWER ANALYSIS OF LATTICE REALIZATION OF FIR FILTER
USING BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-
The XPower analysis data for the Lattice realization of the FIR filters, obtained with the Xilinx
tool, show that the Lattice realization using bit serial arithmetic consumes 0.057 W, while the
same filter built with bit parallel arithmetic consumes 0.068 W in the internal circuitry.
Realization                                                  Total estimated power consumption (W)
Lattice realization of FIR filter, bit parallel arithmetic   0.068
Lattice realization of FIR filter, bit serial arithmetic     0.057
Chart 9: Power summary of lattice realization of FIR filters.
The study of the total estimated power consumption for the Lattice realization of the FIR filter
therefore shows that the bit parallel arithmetic representation consumes more power than the bit
serial arithmetic representation. According to the power analysis waveform, the total power
consumption is the sum of quiescent power, logic power, IO power and digital clock manager power.
Each of these components is described in detail earlier in this chapter.

FIG 48:Output wave form of power analysis of FIR filter (Lattice realization) using bit parallel
arithmetic.
A comparative study of the first waveforms of the two power analysis figures shows that the
quiescent power, logic power and digital clock manager (DCM) power are almost the same in both
cases, but the IO power consumption is higher in the bit parallel case.
Bit parallel arithmetic uses more input/output ports than bit serial arithmetic. This is why the
bit parallel representation of the Lattice filter consumes more IO power than the bit serial
representation.
As a result, the overall power consumption of the bit parallel Lattice FIR filter is higher than
that of the bit serial lattice filter.

FIG 49: Output wave form of power analysis of FIR filter (Lattice realization) using bit serial
arithmetic.
Junction temperature plays an important role in determining the device static power: a small
change in junction temperature can change the device power consumption considerably. The third and
fourth waveforms of the two figures show how the power changes with junction temperature for bit
parallel and bit serial arithmetic. Here the operating junction temperature is chosen as 26.3
degrees Celsius.
The study therefore shows that if power is one of the design criteria, it is better to design the
lattice realization of FIR filters using bit serial arithmetic rather than bit parallel
arithmetic.

CONCLUSION
This work deals with an approach to the design and implementation of very fast fixed-function
digital filters using bit-serial and bit-parallel arithmetic. The main concerns of the filter
designs are high throughput, small chip area and low power consumption. Increased throughput can
be traded for reduced power consumption through power supply voltage scaling. VHDL and FPGA
provided the platform for realizing the Direct form, Cascade form and Lattice structure of digital
filters using bit serial and bit parallel arithmetic.
A comparative study of all these filters in terms of simulation time, chip area and power
consumption led to several important observations:
(i) Simulation time - bit parallel digital filters take less time than the bit serial
implementation.
(ii) Initial latency - bit serial filters have a higher initial latency than bit parallel
filters.
(iii) Chip area - bit parallel implementations of digital filters consume a much larger area than
the same filters realized using bit serial arithmetic.
(iv) Power consumption - bit serial digital filters consume less power than the bit parallel
implementation.
VHDL has been used successfully for designing the filters, with the Xilinx software (version 7.1i)
installed on a PC. A Spartan-3E kit, connected through the USB port of the PC, was chosen for
implementing the design. With this direct connection, however, incompatibilities arise when the
input exceeds 16 bits. Because the Spartan-3E kit has 16 input ports and 8 output ports, our
implementation was restricted to checking the peripheral filter circuitry (such as the adder,
multiplier and subtractor). All of our designed filters have 32 input bits and 32 output bits, so
a proper interface that can handle a larger number of input and output bits still has to be
developed.

FUTURE PLANS
In the present work our study is restricted to three different FIR filter realizations. Other
realizations of the FIR filter, such as direct form II and the linear phase realization, could be
achieved with the same arithmetic. Other arithmetic schemes, such as distributed arithmetic and
digit serial arithmetic, can be incorporated as future work.
The project has certain limitations in measuring the area consumed by the designed filter, owing
to the unavailability of proper simulation tools.
In Chapter 6 the total chip area is measured by counting the number of look up tables, flip flops,
etc. One of our future goals is therefore to develop a simulation tool that can measure the exact
chip area in mm² or µm². In the same chapter, the power consumed was measured indirectly with
software tools. No free simulation tool is available to measure power directly from the designed
filters and to analyse their power performance.
In future we also intend to apply our design expertise to the domain of IIR filters.

VHDL CODES FOR FIR FILTERS
USING BIT-PARALLEL ARITHMETIC
VHDL CODE FOR 4-BIT COUNTER
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
-- 4-bit synchronous up counter built from four D flip-flops
entity counter_4_bit is
Port ( clk : in STD_LOGIC;
rst : in STD_LOGIC;
q : inout STD_LOGIC_VECTOR (4 downto 1);
qbar : inout STD_LOGIC_VECTOR (4 downto 1));
end counter_4_bit;
architecture Behavioral of counter_4_bit is
component d_flip_flop is
Port ( d : in STD_LOGIC;
clk : in STD_LOGIC;
rst : in STD_LOGIC;
q : out STD_LOGIC);
end component;
signal i,j,k,l : STD_LOGIC;

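begin
-- complemented outputs used by the next-state logic
qbar <= not q;
-- next-state equations of a binary up counter: bit n toggles when all lower bits are '1'
i<=qbar(1);
j<=q(1) xor q(2);
k<=(q(1) and q(2) and qbar(3)) or (q(3) and (qbar(1) or qbar(2))) ;
l<=(q(4) and (qbar(1) or qbar(2) or qbar(3))) or (q(1) and q(2) and q(3) and qbar(4));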
d1: d_flip_flop port map(i,clk,rst,q(1));
d2: d_flip_flop port map(j,clk,rst,q(2));
d3: d_flip_flop port map(k,clk,rst,q(3));
d4: d_flip_flop port map(l,clk,rst,q(4));
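end Behavioral;
VHDL CODE FOR BOOTH MULTIPLIER
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;   -- provides sxt
use IEEE.STD_LOGIC_SIGNED.ALL;  -- signed "<" and "+" on std_logic_vector
-- Booth encoder: forms the partial product 0, +a, +2a, -a or -2a selected by arg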
entity encoder is
Port ( a : in std_logic_vector(7 downto 0);
arg : in std_logic_vector(2 downto 0);
pprod : out std_logic_vector(15 downto 0));
end encoder;

architecture Behavioral of encoder is
function encoder(arg1: std_logic_vector(2 downto 0);data:std_logic_vector(7
downto 0))
return std_logic_vector is
variable temp,temp1,temp2: std_logic_vector(8 downto 0);
variable sign: std_logic;
begin
case arg1 is
when "001"|"010" =>   -- +a, sign extended to 9 bits
if data <0 then
temp:='1'& data;
else
temp:='0'&data;
end if;
when "011" =>
if data<0 then
temp1:='1'&data;
temp:=temp1(7 downto 0)&'0';
else
temp:='0'&data(6 downto 0)&'0';
end if ;
when "100" =>
if data<0 then
temp1:='1'&data;
temp2:=(not temp1)+"000000001";
temp:=(temp2(7 downto 0)&'0');
else
temp1:='0'&data;
temp2:=(not temp1)+"000000001";
temp:=(temp2(7 downto 0)&'0');
end if;

when "101"|"110" =>
if data < 0 then
temp1:='1'&data;
temp:=not(temp1)+"000000001";
else
temp1:='0'&data;
temp:=(not temp1)+"000000001";
end if;
when others =>
temp:="000000000";
end case;
return temp;
end encoder;
signal s1: std_logic_vector(8 downto 0);
signal s2: std_logic;
begin
s1<=encoder(arg,a);
pprod<=sxt(s1,16);
end Behavioral;
VHDL CODE FOR SIXTEEN BIT FULL ADDER
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_SIGNED.ALL;
-- 16-bit ripple carry adder generated from one-bit adder cells
entity sixteenbit_fa is
Port ( a : in STD_LOGIC_VECTOR (15 downto 0);
b : in STD_LOGIC_VECTOR (15 downto 0);

yout : out STD_LOGIC_VECTOR (15 downto 0));
end sixteenbit_fa;
architecture Behavioral of sixteenbit_fa is
signal s: std_logic_vector(15 downto 0);
signal carry1: std_logic_vector(16 downto 0);
COMPONENT twobit_add   -- one-bit adder cell with carry in and carry out
PORT(
a : IN std_logic;
b : IN std_logic;
cin : IN std_logic;
sum : OUT std_logic;
cout : OUT std_logic);
END COMPONENT;
begin
carry1(0)<='0';
g1 : for i in 0 to 15 generate
f0 : twobit_add PORT MAP(a(i), b(i),carry1(i),yout(i), carry1(i+1));
-- inter_carr<=carry(i+1);
end generate g1;
--cout<=carry1(16);
end Behavioral;

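VHDL CODE FOR MULTIPLIER
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;  -- "+" on std_logic_vector
-- 4 x 4 unsigned shift-and-add multiplier
entity multiply is
Port ( clk : in STD_LOGIC;  -- reconstructed: port line missing from the listing
a : in STD_LOGIC_VECTOR (4 downto 1);
b : in STD_LOGIC_VECTOR (4 downto 1);  -- reconstructed: port line missing from the listing
y : out STD_LOGIC_VECTOR (8 downto 1));
end multiply;
architecture Behavioral of multiply is
signal x1,x2,x3,x4 : std_logic_vector(8 downto 1);
begin
-- each partial product is a shifted left by the weight of the matching bit of b
process (clk) is
begin
-- edge check added so that the partial products are registered on the clock
if rising_edge(clk) then
if (b(1)='1') then
x1 <= "0000" & a ;
else
x1 <= "00000000";
end if;
if (b(2)='1') then
x2 <= "000" & a & '0' ;
else
x2 <= "00000000";
end if;
if (b(3)='1') then
x3 <= "00" & a & "00" ;
else
x3 <= "00000000";
end if;
if (b(4)='1') then
x4 <= '0' & a & "000" ;
else
x4 <= "00000000";
end if;
end if;
end process;
-- the product is the sum of the four partial products
y<= x1 + x2 + x3 + x4;
end Behavioral;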

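VHDL CODE FOR SERIAL PARALLEL CONVERTER
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
-- loads din while start is '1', otherwise shifts dst out on dout, bit 7 first
entity converter is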
Port ( rst : in STD_LOGIC;
clk : in STD_LOGIC;
start : in STD_LOGIC;
din : in STD_LOGIC_VECTOR (7 downto 0);
dout : out STD_LOGIC);
end converter;
architecture Behavioral of converter is
signal dst :std_logic_vector(7 downto 0):=(others => '0');
signal data,stop: std_logic:= '0';
begin
process (clk,rst)
begin
if rst = '1' then
dst <=(others => '0');
data <= '0';
stop <= '0';

elsif rising_edge(clk) then
if start ='1' then
data <= '1';
stop <= '1';
dst <= din;
else
data <= dst(7);
stop <= '0';
dst <= dst(6 downto 0)&stop;
end if;
end if;
end process;
dout<=data;
end Behavioral;
VHDL CODE FOR FIR FILTERS
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity Filter is
Port ( h0,h1,h2,h3,h4 : in STD_LOGIC_VECTOR (4 downto 1);
cp : in STD_LOGIC;
rst: in STD_LOGIC;
clk: in STD_LOGIC;
xin : in STD_LOGIC_VECTOR (4 downto 1);
y : out STD_LOGIC

);
end Filter;
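architecture Behavioral of Filter is
component multiply is
Port ( clk : in STD_LOGIC;  -- reconstructed: port line missing from the listing
a : in STD_LOGIC_VECTOR (4 downto 1);
b : in STD_LOGIC_VECTOR (4 downto 1);  -- reconstructed: port line missing from the listing
y : out STD_LOGIC_VECTOR (8 downto 1));
end component;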
component converter is
Port ( rst : in STD_LOGIC;
clk : in STD_LOGIC;
start : in STD_LOGIC;
din : in STD_LOGIC_VECTOR (7 downto 0);
dout : out STD_LOGIC);
end component;
component d_flip_flop is
Port ( d : in STD_LOGIC;
clk : in STD_LOGIC;
rst : in STD_LOGIC;
q : out STD_LOGIC);
end component;
signal p1,p2,p3,p4,p5,p6,p7,p8,p9,y1,y2,y3,y4,y5 : STD_LOGIC;
signal yy1,yy2,yy3,yy4,yy5 : STD_LOGIC_VECTOR (8 downto 1);
begin
m1: multiply port map(cp,xin,h4,yy1);

c1: converter port map(rst,clk,'1',yy1,y1);
d1: d_flip_flop port map(y1,clk,rst,p1);
p2 <= p1 or y2;
d2: d_flip_flop port map(p2,clk,rst,p3);
p4 <= p3 or y3;
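-- Note: only the first tap (m1, c1) appears in this listing; the instances
-- that drive y2..y5, p5 and p7 are not shown in the source.
p6 <= p5 or y4;
y <= p7 or y5;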
end Behavioral;

 VHDL Code for ALU-DESIGN
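library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity ALU is
Port ( a : in STD_LOGIC_VECTOR (3 downto 0);
b : in STD_LOGIC_VECTOR (3 downto 0); -- assumed 4 bits wide, like a; the declaration is missing from the listing
ch : in STD_LOGIC_VECTOR (1 downto 0);
y : out STD_LOGIC_VECTOR (7 downto 0);
clk : in STD_LOGIC);
end ALU;
architecture Behavioral of ALU is
begin
-- Registered logic unit: ch selects the operation applied on each rising clock edge.
process (clk) is
begin
if(rising_edge( clk )) then
case ch is
-- The 4-bit result is zero-extended to the 8-bit output (an assumption about the intended width handling).
when "00" =>
y <= "0000" & (a or b);
when "01" =>
y <= "0000" & (a nor b);
when "10" =>
y <= "0000" & (a xor b);
when "11" =>
y <= "0000" & (a nand b);
when others =>
y <= "00000000";
end case;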
end if;
end process;
end Behavioral;
 USING BIT-SERIAL ARITHMETIC
 VHDL CODE FOR D FLIP-FLOP
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity d_flip_flop is
Port ( d : in STD_LOGIC;
clk : in STD_LOGIC;
rst : in STD_LOGIC;
q : out STD_LOGIC);
end d_flip_flop;
architecture Behavioral of d_flip_flop is
begin

dff : process (clk,rst) is
begin
if (rst='1') then
q <= '0';
elsif (rising_edge (clk)) then
q <= d;
end if;
end process dff;
end Behavioral;
 VHDL Code for FULL ADDER
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity fa1 is
Port ( a : in STD_LOGIC;
b : in STD_LOGIC;
ci : in STD_LOGIC;
s : out STD_LOGIC;
co : out STD_LOGIC);
end fa1;
architecture Behavioral of fa1 is
component or1

port(a,b: in std_logic;
y: out std_logic);
end component;
component ha1
port(a,b: in std_logic;
s,co: out std_logic);
end component;
signal s1,c1,c2: std_logic;
begin
a1: ha1 port map(a,b,s1,c1);
a2: ha1 port map(s1,ci,s,c2);
a3: or1 port map(c1,c2,co);
end Behavioral;
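
The full adder above instantiates a component or1 whose listing does not appear in this section. A minimal sketch of such a gate is given below, assuming it is a plain two-input OR. The port names are taken from the component declaration in fa1; the architecture name and the body are assumptions.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical two-input OR gate matching the or1 component declared in fa1.
entity or1 is
    Port ( a : in  STD_LOGIC;
           b : in  STD_LOGIC;
           y : out STD_LOGIC);
end or1;

architecture Behavioral of or1 is
begin
    -- In fa1 this gate merges the carries of the two half adders into co.
    y <= a or b;
end Behavioral;
```

The two half-adder carries can never both be '1' at the same time, so a simple OR is enough to form the carry-out.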

 VHDL Code for HALF ADDER
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity ha1 is
Port ( a : in STD_LOGIC;
b : in STD_LOGIC;
s : out STD_LOGIC;
co : out STD_LOGIC);
end ha1;
architecture Behavioral of ha1 is
begin
s<= a xor b;
co<= a and b;
end Behavioral;
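
The half adder can be checked with a small self-checking testbench. The sketch below is not part of the original project: the names ha1_tb, sim, dut and stimulus are hypothetical, and it assumes ha1 has been compiled into the library work. It applies all four input combinations and checks s and co against the half-adder truth table.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical testbench; assumes ha1 above is compiled into library work.
entity ha1_tb is
end ha1_tb;

architecture sim of ha1_tb is
    signal a, b, s, co : STD_LOGIC := '0';
begin
    -- Direct entity instantiation, positional order (a, b, s, co).
    dut : entity work.ha1 port map (a, b, s, co);

    stimulus : process
    begin
        -- Walk through all four input combinations.
        a <= '0'; b <= '0'; wait for 10 ns;
        assert s = '0' and co = '0' report "0+0 failed" severity error;
        a <= '0'; b <= '1'; wait for 10 ns;
        assert s = '1' and co = '0' report "0+1 failed" severity error;
        a <= '1'; b <= '0'; wait for 10 ns;
        assert s = '1' and co = '0' report "1+0 failed" severity error;
        a <= '1'; b <= '1'; wait for 10 ns;
        assert s = '0' and co = '1' report "1+1 failed" severity error;
        wait;  -- suspend forever once all cases are checked
    end process;
end sim;
```

The same pattern extends to fa1 by adding the ci input and stepping through all eight combinations.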
 VHDL Code for RIGHT SHIFTER
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity rt_shifter is
Port ( clk : in STD_LOGIC;
rst : in STD_LOGIC;
sin : in STD_LOGIC;
y : out STD_LOGIC);
end rt_shifter;
architecture Behavioral of rt_shifter is
component d_flip_flop
Port ( d : in STD_LOGIC;
clk : in STD_LOGIC;
rst : in STD_LOGIC;
q : out STD_LOGIC);
end component;
signal tmp,y4,y3,y2,yo : STD_LOGIC;
begin
-- Input register: samples sin on the rising clock edge while rst is low.
process (clk) is
begin
if rising_edge(clk) then
if rst = '0' then
tmp <= sin;
end if;
end if;
end process;
-- Four-stage shift chain built from D flip-flops.
d1: d_flip_flop port map(tmp,clk,rst,y4);
d2: d_flip_flop port map(y4,clk,rst,y3);
d3: d_flip_flop port map(y3,clk,rst,y2);
d4: d_flip_flop port map(y2,clk,rst,y);
end Behavioral;
 VHDL CODE FOR DELAY
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity reg is
Port ( d : in std_logic;
clk : in std_logic;
rst : in std_logic;
q : out std_logic);
end reg;
architecture reg of reg is
signal state : std_logic;
begin

process(clk,rst)
begin
if (rst='1')
then state<= '0';
elsif (clk'event and clk='1')
then state<=d;
end if;
end process;
q<= state;
end reg;
 VHDL CODE FOR PIPELINE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity pipe is
Port ( a : in std_logic;
b : in std_logic;
clk : in std_logic;
rst : in std_logic;

q : out std_logic);
end pipe;
architecture structural of pipe is
component reg is
Port ( d : in std_logic;
clk : in std_logic;
rst : in std_logic;
q : out std_logic);
end component;
component fau is
Port ( a : in std_logic;
b : in std_logic;
cin : in std_logic;
s : out std_logic;
cout : out std_logic);
end component;
signal s,cin,cout: std_logic;
begin
u1: component fau port map(a ,b,cin, s , cout);
u2: component reg port map(cout ,clk ,rst, cin);
u3: component reg port map(s ,clk ,rst, q);
end structural;

 VHDL CODE FOR FIR FILTER
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity fir is
Port ( x0 : in STD_LOGIC;
x1 : in STD_LOGIC;
clk : in STD_LOGIC;
rst : in STD_LOGIC;
y0 : out STD_LOGIC;
y1 : out STD_LOGIC);
end fir;
architecture Behavioral of fir is
COMPONENT adder
PORT(
a : IN std_logic;
b : IN std_logic;
clk: IN std_logic;
rst: IN std_logic;
s : OUT std_logic);
END COMPONENT;
COMPONENT multiplier
PORT(
a : IN std_logic;
clk: IN std_logic;
rst: IN std_logic;
b : IN std_logic_vector(3 downto 0);
prod : OUT std_logic);
END COMPONENT;
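-- Two coefficients are given, so the array holds two entries; with positional
-- association coef(1) = "1001" and coef(0) = "0011".
type coefficients is array (1 downto 0) of std_logic_vector(3 downto 0);
constant coef : coefficients
:= ("1001","0011");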
signal p1,p2: std_logic;
begin
m1: multiplier port map(x0,clk,rst,coef(0),y0);
m2: multiplier port map(x0,clk,rst,coef(1),p1);
m3: multiplier port map(x1,clk,rst,coef(0),p2);
a1: adder port map(p1,p2,clk,rst,y1);
end Behavioral;

REFERENCES
Articles from published conference proceedings:
[1] M. Vesterbacka, K. Palmkvist, and L. Wanhammar: Serial Squarers and Serial/Serial
Multipliers, National Conference on Radio Science (RVK-96), Lule., Sweden, June 3-6, 1996.
[2] M. Vesterbacka, K. Palmkvist, and L. Wanhammar: Realization of Serial/Parallel Multipliers
with Fixed Coefficients, National Conference on Radio Science (RVK-93), Lund Institute of
Technology, Lund, Sweden, pp. 209-212, April 5-7, 1993.
[3] K. Palmkvist, M. Vesterbacka, P. Sandberg, L. Wanhammar: Scheduling of Data-
Independent Recursive Algorithms, Proc. European Conference on Circuit Theory and Design
(ECCTD’95), Istanbul, Turkey, pp. 855-858, Aug. 27-31, 1995.
Books:
[4] L. Wanhammar: DSP Integrated Circuits, Linkoping University, 1996.
[5] A. P. Chandrakasan and R. W.Brodersen: Low Power Digital CMOS Design, Kluwer
Academic Publ., 1995
[6] P. Sandberg , K. Palmkvist ,L. Wanhammar ,R. Gustavsson : Synthesis of the SIC
Architecture from VHDL, LiTH-ISY-R-1610, Linkoping University, Sweden.
[7] A. Bellaouar and M. Elmasry: Low-Power Digital VLSI Design: Circuits and Systems,
Kluwer Academic Publ., 1995.
[8] I. Koren : Computer Arithmetic Algorithms, Prentice Hall, 1993.
Technicalreports:

101
[9] M. Vesterbacka ,K. Palmkvist ,P. Sandberg , and L. Wanhammar : Implementation of Fast
Bit-Serial Lattice Wave Digital Filters, Proc. IEEE Int. Symposium on Circuits and Systems
(ISCAS’94), Vol. 2, pp. 113-116, London, England, May 29- June 2, 1994.
[10] C.G.Wallace: A Suggestion for a Fast Multiplier, IEEE Trans. Electronic Computers, Vol.
EC-13, pp. 14-17, February, 1964.
[11] M.Vesterbacka : Implementation of Maximally Fast Wave Digital Filters, Linkoping
Studies in Science and Technology, Thesis No. 495, Linkoping University, Sweden, 1995.
[12] P. Sandberg, K. Palmkvist , and L. Wanhammar : Some Experiences From Automatic
Synthesis of Digital Filters, Proc. NorChip-94, G.teborg, Sweden, Nov. 8-9, 1994.
