The Design of an 8-Tap FIR Filter Using a Flexible MAC

ENSC 895 Advanced VLSI Systems Design, Final Project Report, Group 1
Henry Fu, Muyar Htun, Vijayaraghavan Ravi
School of Engineering Science
Simon Fraser University
Burnaby, Canada
Abstract: This project realizes the design of an 8-tap FIR filter
using 8 instances of a flexible MAC (FIRMAC). The FIRMAC is
a black-box module which functions as both a standard MAC and
a FIR tap. The top-level FIR filter (TAPSET) instantiates the
FIRMAC 8 times. Testbenches for both the FIRMAC and the TAPSET
were written, and the functionality was verified with ModelSim.
The design was synthesized using Synopsys DC Shell for various
clock periods to select the target frequency. The generated
Verilog netlist and clock tree synthesis were used in Encounter
for automated place and route (P&R), where both flat and
hierarchical P&R were implemented and compared. The post-P&R
design was tested for power integrity using RedHawk.
Keywords: VLSI; FIR filter; MAC; VHDL; synthesis; place & route

I. Introduction
The goal of this VLSI project is to design and implement
an 8-tap finite impulse response (FIR) filter using a
multiplier-accumulator (MAC) macro. A macro is a
building block designed to suit various applications. For
instance, the FIRMAC macro is suitable for FIR filters,
concurrent MAC arrays, Fast Fourier Transform (FFT)
butterfly constructs, etc. In this project, the layout of the
FIR filter is implemented in the cmos045 StdCells
technology using two different approaches: flat and
hierarchical. In the flat approach, the filter is synthesized
with a standard cell library, and the generated netlist is then
used for P&R. In the hierarchical approach, the FIRMAC macro
is first synthesized, placed and routed, and the P&R parameters
of the macro are then added to the standard cell library. When
the filter is synthesized with this new library, the macro
is used as a sub-block in generating the netlist. The
synthesis and P&R results of the two approaches are
compared, and their power integrity is analyzed.
II. Design Methodology
First, the FIRMAC macro is designed. Figure 1 shows the
FIRMAC, which acts as a FIR tap when the operating mode
op is set to "FIR" and as a MAC when op is set to
"MAC". The FIRMAC has four input pins (clk, resetn, in1
and in2_sum) and one output pin (out1). resetn is used to
reset the flip-flop (FF) and, in the FIR mode, to load the
coefficient into the FF. in2_sum is a flexible pin which
serves two purposes depending on op. The FF holds the
accumulator output in the MAC mode and the coefficient in
the FIR mode.

Figure 1. FIRMAC, the flexible MAC module. On the left is the
multiplier-accumulator configuration. On the right is the FIR tap
configuration.
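
The two operating modes can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the VHDL used in the project: the class name FirMac, the helper load_coefficient (standing in for the coefficient load through resetn) and the choice to return the updated accumulator as out1 in the MAC mode are all hypothetical, since the report does not give the register-level timing.

```python
from dataclasses import dataclass


@dataclass
class FirMac:
    """Behavioral sketch of one FIRMAC cell; op is "MAC" or "FIR"."""

    op: str
    ff: int = 0  # the single flip-flop: accumulator (MAC) or coefficient (FIR)

    def reset(self) -> None:
        """Clear the flip-flop, as an active resetn would."""
        self.ff = 0

    def load_coefficient(self, coeff: int) -> None:
        """Hypothetical helper: models loading the coefficient in FIR mode."""
        if self.op != "FIR":
            raise ValueError("coefficients are only loaded in FIR mode")
        self.ff = coeff

    def clock(self, in1: int, in2_sum: int) -> int:
        """Apply one clock edge and return out1."""
        if self.op == "MAC":
            # in2_sum is the second multiplier operand; the FF accumulates.
            self.ff += in1 * in2_sum
            return self.ff
        # FIR tap: the FF holds the coefficient and in2_sum is the
        # partial sum arriving from the previous tap.
        return in1 * self.ff + in2_sum
```

In this sketch, FirMac(op="MAC") accumulates products over successive calls to clock, while a cell created with op="FIR" adds in1 times its stored coefficient to the incoming partial sum.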

Figure 2 shows the hardware structure of an 8-tap FIR filter.
In the implementation, the top-level FIR filter module TAPSET
is built from 8 FIRMACs connected in series; hence the
leftmost FIRMAC has an extra FF and accumulator. The
extra FF costs one more clock cycle to feed the input
signal, and the extra accumulator can be virtually eliminated
by assigning 0 to its in2_sum input pin.

Figure 2. The configuration of an 8-tap FIR filter.
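
The chained structure can be sketched in a few lines of Python. This is a simplified direct-form model, assuming each tap computes in1 * coefficient + in2_sum and that the first tap's in2_sum is tied to 0; it ignores the extra cycle of input latency mentioned above, and the function name fir_chain is hypothetical.

```python
def fir_chain(samples, coeffs):
    """Direct-form FIR modeled as a chain of taps.

    Each tap adds (delayed input * coefficient) to the partial sum
    received from the previous tap. The first partial sum is 0.
    """
    delay_line = [0] * len(coeffs)  # delay_line[k] holds x[n-k]
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]  # shift the new sample in
        partial = 0  # in2_sum of the first tap is tied to 0
        for tap_input, coeff in zip(delay_line, coeffs):
            partial = tap_input * coeff + partial
        outputs.append(partial)
    return outputs
```

Feeding a unit impulse into an 8-coefficient chain returns the coefficients in order, which is a convenient sanity check for a testbench of this kind.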

A. Simulation and Synthesis


A testbench was written in VHDL to test the MAC operation of
the FIRMAC for the first 100 ns and its FIR operation for
the next 100 ns. The VHDL compiled successfully, and
when the design was simulated in ModelSim, the
waveform conformed to the expected values. Figure 3
shows the ModelSim waveform for the flexible FIRMAC
module.

Figure 3. VHDL simulation of FIRMAC, clock period = 20ns.

The FIRMAC module was then synthesized for different
clock periods using Synopsys DC Shell with the slow
settings (NangateOpenCellLibrary_slow.db) of the library
file. The results are shown in Table 1.

The clock periods and the corresponding results for
the hierarchical design are given in Table 3. The target
frequency of 26.67 MHz (37.5 ns period) was chosen as it is
the fastest clock in the table, with fair area and power.

Table 1. Synthesis results of FIRMAC.

Period  Freq.    Slack   Area       Area      Dynamic  Leakage
(ns)    (MHz)    (ns)    (um2)      (kgates)  (mW)     (uW)
12      83.33    -0.16   6672.61    8.34      0.96     102.52
10      100.00   -0.2    6561.69    8.20      1.13     100.72
9       111.11   -0.14   6588.29    8.24      1.26     101.25
8       125.00   -0.04   6609.30    8.26      1.44     100.92
7       142.86   -0.41   6610.36    8.26      1.66     101.96

Table 3. Synthesis results of the hierarchical design of TAPSET.

Period  Freq.    Slack   Area       Area      Dynamic  Leakage
(ns)    (MHz)    (ns)    (um2)      (kgates)  (uW)     (uW)
42      23.81    4.34    49272.73   61.59     77.86    13.72
40      25.00    2.34    49272.73   61.59     81.74    13.72
38      26.32    0.34    49272.73   61.59     86.02    13.72
37.5    26.67    0.07    49277.52   61.60     86.44    13.87

For the FIRMAC, the target frequency selected is 125.00 MHz
(8 ns period), as its slack is the best at -0.04 ns. The
negative slack could be because the timing constraint
set_max_delay -from in2_sum[*] -to out1[*] 3.5
was set too tight. This constraint was added to the synth.tcl
script to force all the adders to generate a sum within one
clock period. The maximum delay of 3.5 ns was chosen because,
when a value higher than 3.5 ns was set, Synopsys reported
an internal system error, and when a value lower than 3.5 ns
was set, a larger slack violation was found. Although the
FIRMAC slack is negative, this could be compensated for in
the subsequent P&R step.
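
The selection from Table 1 can be reproduced with a short script. It is only an illustrative sketch: the frequency and slack pairs are copied from Table 1, while the names FIRMAC_SYNTHESIS, best_slack and period_ns are hypothetical.

```python
# (frequency in MHz, slack in ns) as reported in Table 1
FIRMAC_SYNTHESIS = [
    (83.33, -0.16),
    (100.00, -0.20),
    (111.11, -0.14),
    (125.00, -0.04),
    (142.86, -0.41),
]


def best_slack(rows):
    """Return the (frequency, slack) row whose slack is largest."""
    return max(rows, key=lambda row: row[1])


def period_ns(freq_mhz):
    """Convert a clock frequency in MHz to a period in ns."""
    return 1000.0 / freq_mhz
```

Calling best_slack(FIRMAC_SYNTHESIS) picks the 125.00 MHz row, and period_ns(125.0) gives the 8 ns period quoted above.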
The VHDL for the TAPSET module compiled successfully.
Figure 4 shows the ModelSim simulation of the TAPSET
module; the result and functionality were verified.

The FIRMAC macro shows a higher dynamic power than
the flat TAPSET because much higher clock frequencies
were used in its synthesis. The hierarchical TAPSET shows
very low dynamic and static power since the power of the
FIRMAC instances was not included.
B. Place and Route [1]
The flat TAPSET P&R was performed using the cmos045
standard cells, with a target frequency of 40 MHz. The
respective Verilog and sdc files were copied to Encounter's
inputs folder, and the backend environment was set up.
The floorplan script file was changed to accommodate the
standard cells with a density of 0.9 and a core-to-edge
spacing of 4 um. The scripts were then run in Encounter to
perform the P&R.
After P&R, the results were checked for violations.
There were no DRC or timing violations. Figure 5 shows
the final layout in Encounter for the flat design.

Figure 4. VHDL simulation of TAPSET, clock period = 10ns.

The TAPSET module was synthesized for both the flat and
hierarchical design cases, again using the slow settings of
the library file. The clock periods and their
corresponding results for the flat design are given in Table
2. The target frequency of 40.00 MHz was chosen as it is the
fastest clock in the table, with fair area and power.
Figure 5. Flat place and route of TAPSET.

Table 2. Synthesis results of the flat design of TAPSET.

Period  Freq.    Slack   Area       Area      Dynamic  Leakage
(ns)    (MHz)    (ns)    (um2)      (kgates)  (uW)     (mW)
35      28.57    0.08    68229.53   85.29     640.71   1.2594
34      29.41    0.02    68122.60   85.15     659.31   1.2563
33.5    29.85    0.05    68207.72   85.26     673.81   1.2589
32      31.25    0.03    66567.83   83.21     698.98   1.2134
30      33.33    0.05    66167.23   82.71     774.01   1.1985
28      35.71    0.01    64113.98   80.14     828.79   1.1391
27      37.04    0.01    64111.85   80.14     884.76   1.1369
25      40.00    0.01    65590.28   81.99     987.02   1.1707
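
The choice of target frequency for the two TAPSET variants can be expressed as a simple rule: take the shortest synthesized period whose slack is still non-negative. The sketch below applies that rule to the period and slack columns of Tables 2 and 3. The rule is an interpretation of the selection described in the text, and the names FLAT_TAPSET, HIER_TAPSET and fastest_passing are hypothetical.

```python
# (clock period in ns, slack in ns), copied from Tables 2 and 3
FLAT_TAPSET = [
    (35, 0.08), (34, 0.02), (33.5, 0.05), (32, 0.03),
    (30, 0.05), (28, 0.01), (27, 0.01), (25, 0.01),
]
HIER_TAPSET = [(42, 4.34), (40, 2.34), (38, 0.34), (37.5, 0.07)]


def fastest_passing(rows):
    """Return (period_ns, freq_mhz) for the shortest period with slack >= 0.

    Raises ValueError if no row meets timing.
    """
    passing = [period for period, slack in rows if slack >= 0]
    period = min(passing)
    return period, round(1000.0 / period, 2)
```

With the tabulated data this yields 25 ns (40.00 MHz) for the flat design and 37.5 ns (26.67 MHz) for the hierarchical design, matching the targets used for P&R.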

The hierarchical TAPSET P&R was performed using the
FIRMAC macro as the sub-block. The target frequency was
26.67 MHz, and the floorplan was changed to allow
Encounter to place the FIRMAC instances at given
locations. A rectangular core was chosen to better
arrange the placement of the FIRMAC blocks and to
conserve chip area. Figure 6 shows the final hierarchical
design layout in Encounter. Table 4 shows the P&R
summary from the 08-finishing step.

Encounter reports a clock period of 37.50 ns (frequency =
26.67 MHz), an output external delay of -0.80 ns, and a
total required time of 36.70 ns. The total sum of delay is
found to be 35.80 ns, and the slack time is 0.149 ns.
Table 6 shows the delay time at each FIRMAC core,
measured using Encounter's Timing Path Analyzer;
firtap_7 is the slowest since it drives the output pins.
Table 6. The delay time of each FIRMAC core.

FIRMAC core   Delay Time (ns)
firtap_0      4.582
firtap_1      4.651
firtap_2      4.342
firtap_3      3.873
firtap_4      3.861
firtap_5      3.873
firtap_6      4.247
firtap_7      5.320

Figure 6. Hierarchical place and route of TAPSET.

Table 4. P&R Summary.

Data from 08-finishing step   Flat P&R      Hierarchical P&R
Area of Chip                  77328 um2     204000 um2
Area of Standard Cells        72829 um2     89047 um2
Area of Macros                0 um2         70413 um2
Chip Density                  94.2%         78.2%
# Instances                   72222         13859
Hold Slack Time               0.26680 ns    0.21500 ns
Setup Slack Time              0.58350 ns    0.14890 ns

The CPU time used to run the P&R scripts was measured
and is shown in Table 5. The overall CPU time spent
on hierarchical P&R was only around one-half of that of
the flat P&R, because P&R had already been done for the
FIRMAC macro beforehand.

Table 5. CPU time used to run the P&R scripts.

Script Step            Flat P&R (sec.)   Hierarchical P&R (sec.)
top.tcl                501.67            266.63
01-importDesign.tcl    7.76              9.51
02-floorplan.tcl       3.25              4.82
03-place.tcl           71.45             18.21
04-postPlaceOpt.tcl    130.07            33.99
05-cts.tcl             24.26             14.54
06-postCTSOpt.tcl      99.97             92.66
07-route.tcl           95.84             32.37
08-finishing.tcl       65.73             59.04
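
A few lines of arithmetic help relate the figures in Tables 5 and 6. The sketch below assumes that the Table 6 delays are the per-core contributions along the critical path and that the top.tcl row of Table 5 is the overall CPU time; the names CORE_DELAY_NS, TOTAL_PATH_DELAY_NS, CPU_SECONDS and summarize are hypothetical.

```python
# Delay of each FIRMAC core in ns (Table 6)
CORE_DELAY_NS = {
    "firtap_0": 4.582, "firtap_1": 4.651, "firtap_2": 4.342,
    "firtap_3": 3.873, "firtap_4": 3.861, "firtap_5": 3.873,
    "firtap_6": 4.247, "firtap_7": 5.320,
}
TOTAL_PATH_DELAY_NS = 35.80  # total sum of delay reported by Encounter
CPU_SECONDS = {"flat": 501.67, "hierarchical": 266.63}  # top.tcl, Table 5


def summarize():
    """Cross-check core delays against the path delay and compare CPU times."""
    core_sum = sum(CORE_DELAY_NS.values())
    return {
        "core_sum_ns": round(core_sum, 3),
        "outside_cores_ns": round(TOTAL_PATH_DELAY_NS - core_sum, 3),
        "slowest_core": max(CORE_DELAY_NS, key=CORE_DELAY_NS.get),
        "cpu_ratio": round(
            CPU_SECONDS["hierarchical"] / CPU_SECONDS["flat"], 3
        ),
    }
```

Under these assumptions the eight cores account for 34.749 ns of the 35.80 ns path, leaving about 1.05 ns outside the cores, and the hierarchical run takes about 0.53 of the flat CPU time.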

C. Power Integrity
Power integrity analysis [2] was performed using RedHawk
(VCD was not used due to the limited time of the project).

1. Static analysis

Static analysis considers only the resistance of the power
network. Figure 8 shows the voltage drop on the flat P&R
instances. A large area in the middle section has a higher
voltage drop due to high cell density.
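
The link between cell density and static voltage drop can be illustrated with a toy resistive model. This is not how RedHawk computes its results: it is a minimal sketch of Ohm's law on a single power rail fed from one end, with hypothetical resistance and current values and a hypothetical function name.

```python
def rail_voltage_drops(segment_resistance_ohm, tap_currents_a):
    """Return the cumulative IR drop seen at each tap along one rail.

    The rail is fed from one end. Every segment has the same resistance
    and carries the current of all taps located further down the rail.
    """
    drops = []
    drop = 0.0
    remaining = sum(tap_currents_a)  # current entering the first segment
    for current in tap_currents_a:
        drop += remaining * segment_resistance_ohm
        drops.append(drop)
        remaining -= current  # this tap's current leaves the rail here
    return drops
```

For three cells drawing 1 mA each through 0.5 ohm segments, the drops are 1.5 mV, 2.5 mV and 3.0 mV: the more current a region draws, the larger the drop its neighbours see.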

Figure 8. Voltage drop on flat P&R instances.

Figure 7 shows the critical path of the hierarchical layout.
The path goes through the following pins:
delay_out reg[0][1]
firtap_0 (in1[1] -> out1[18])
firtap_1 (in2_sum[18] -> out1[27])
firtap_2 (in2_sum[27] -> out1[30])
firtap_3 (in2_sum[30] -> out1[30])
firtap_4 (in2_sum[30] -> out1[30])
firtap_5 (in2_sum[30] -> out1[30])
firtap_6 (in2_sum[30] -> out1[31])
firtap_7 (in2_sum[31] -> out1[56])
FIRout[56]

Unlike the flat P&R, the hierarchical P&R has its higher
voltage drops distributed over different areas. Figure 9 shows
the voltage drops on instances.

Figure 9. Voltage drop on hierarchical P&R instances.

Figure 7. The critical path of the hierarchical design.

2. Dynamic analysis

In dynamic analysis, resistance, inductance and capacitance
all have to be considered. Figures 10 and 11 illustrate the
voltage drop for the flat and hierarchical cases respectively.
For the hierarchical case, some higher voltage drops are
noticed outside the cores, at the flip-flops.

Figure 10. Voltage drop on flat P&R min. vdd-vss over time window.

Table 7. Power analysis of the two designs.

Power Analysis     Flat P&R      Hierarchical P&R
Frequency          40.00 MHz     26.67 MHz
Total Power        3.4134 mW     2.4948 mW
Leakage Power      0.68363 mW    0.68759 mW
Internal Power     1.2322 mW     0.82003 mW
Switching Power    1.4975 mW     0.98717 mW
% Total Power      100%          100%
Instance Count     72222         70699

Table 8. Maximum voltage drop of instances.

Max. Voltage Drop of Instances   Static (mV)   Dynamic (mV)
Flat P&R                         2.2           25.3
Hierarchical P&R                 3.6           37.3

Figure 11. Voltage drop on hierarchical P&R min. vdd-vss over time
window.

Figures 12 and 13 show the total current drawn for the flat
and hierarchical P&R respectively; the flat P&R has a
slightly higher current overall.

III. Results and Interpretation

From the synthesis results in Tables 2 and 3, the flat design
has an advantage in clock frequency due to its more compact
design: it supports a clock of up to 40 MHz, whereas the
hierarchical design is limited to 26.67 MHz. Encounter
shows that the flat design has a chip area of 77,328 um2 with
a density of 94.2%, and the hierarchical design has a chip area
of 204,000 um2 with a density of 78.2%. Table 7 shows that the
flat design has a higher total power than the hierarchical
design, while the leakage power is about the same. The
higher operating clock frequency of the flat design should
account for part of the higher power. Table 8 shows the
maximum voltage drop of instances; the hierarchical
design is higher in both the static and dynamic cases, and
these instances should be the flip-flops. Figures 12 and 13
show that the current at Vdd of the flat design is slightly
higher both at the peaks (rising and falling clock edges) and
in the non-peak areas. Table 6 shows that the FIRMAC module
delay time is above 3.5 ns for each core, which might explain
why the best slack obtained in Table 1 is still negative.
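
The role of clock frequency in the power difference can be checked by normalizing the Table 7 figures. The sketch below treats internal plus switching power as the dynamic power and divides it by the clock frequency to estimate the dynamic energy per clock cycle; this normalization is an added illustration rather than part of the original analysis, and the names POWER_MW and dynamic_energy_per_cycle_pj are hypothetical.

```python
# Power figures in mW and clock frequency in MHz, copied from Table 7
POWER_MW = {
    "flat": {
        "freq_mhz": 40.00, "leakage": 0.68363,
        "internal": 1.2322, "switching": 1.4975,
    },
    "hierarchical": {
        "freq_mhz": 26.67, "leakage": 0.68759,
        "internal": 0.82003, "switching": 0.98717,
    },
}


def dynamic_energy_per_cycle_pj(entry):
    """Estimate the dynamic energy per clock cycle in pJ."""
    dynamic_mw = entry["internal"] + entry["switching"]
    # 1 mW / 1 MHz = 1 nJ per cycle, so multiply by 1000 to get pJ.
    return dynamic_mw / entry["freq_mhz"] * 1000.0
```

Both designs come out at roughly 68 pJ per cycle, which suggests that most of the gap in dynamic power follows from the difference in clock frequency, while leakage is nearly identical.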
Figure 12. Current plot at Vdd of flat P&R, clock period = 25.0 ns.

Figure 13. Current plot at Vdd of hierarchical P&R, clock period = 37.5
ns.

Table 7 shows the power analysis results from
RedHawk's power summary report files. Table 8 shows
the maximum voltage drops of instances from
RedHawk's TAPSET.inst (static) and TAPSET.dvd
(dynamic) files and from Figures 8 to 11.

IV. Conclusions

The design of a FIRMAC macro and its application in an
8-tap FIR filter implementation have proved successful.
The post-P&R hold and setup requirements are all met, and
only a few violations, caused by dangling power wires, are
found in the hierarchical P&R; these wires could be cut in a
real design. The flat and hierarchical P&R approaches both
have their advantages: the flat design has an advantage in
clock speed and area, whereas the hierarchical design has an
advantage in CPU time and re-design time.

Acknowledgment
We gratefully acknowledge the help and support from Dr.
Fabio Campi and Josh Ancill throughout the project.

References
[1] Campi, Fabio, ENSC 450 VLSI Design, Lab Tutorial: Napkin to Chip. SFU.
[2] Campi, Fabio, ENSC 450 VLSI Design, Lab Tutorial: Power Integrity. SFU.
