The Design of An 8-Tap FIR Filter Using A Flexible MAC
ENSC 895 Advanced VLSI Systems Design, Final Project Report, Group 1
Henry Fu, Muyar Htun, Vijayaraghavan Ravi
School of Engineering Science
Simon Fraser University
Burnaby, Canada
Abstract: This project realizes the design of an 8-tap FIR filter
using 8 instances of a flexible MAC (FIRMAC). The FIRMAC is
a black-box module that functions as either a standard MAC or
a FIR tap. The top-level FIR filter (TAPSET) instantiates the
FIRMAC 8 times. Testbenches for both the FIRMAC and TAPSET
were written, and functionality was verified with ModelSim.
The design was synthesized using Synopsys DC Shell at various
clock periods to select the target frequency. The generated
Verilog netlist and clock tree synthesis were used in Encounter
for automated place and route (P&R), where both flat and
hierarchical P&R were implemented and compared. The post-P&R
design was tested for power integrity using RedHawk.
Keywords: VLSI; FIR filter; MAC; VHDL; synthesis; place &
route
I. Introduction
The goal of this VLSI project is to design and implement
an 8-tap finite impulse response (FIR) filter using a
multiplier-accumulator (MAC) macro. A macro is a
building block designed to suit various applications. For
instance, the FIRMAC macro is suitable for FIR filters,
concurrent MAC arrays, Fast Fourier Transform (FFT)
butterfly constructs, etc. In this project, the layout of the
FIR filter is implemented in the cmos045 StdCells
technology using two different approaches: flat and
hierarchical. In the flat approach, the filter is synthesized with a
standard cell library, and the generated netlist is then used
for P&R. In the hierarchical approach, the FIRMAC macro is first
synthesized, placed and routed, and the P&R parameters of
the macro are then added to the standard cell library. When
the filter is synthesized with this new library, the macro
is used as a sub-block in generating the netlist. The
synthesis and P&R results of the two approaches are
compared, and their power integrity is analyzed.
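As a behavioural reference for what an 8-tap FIR filter computes, the following is a minimal Python sketch in which each tap performs one multiply-accumulate per input sample. It is not the FIRMAC or TAPSET RTL: the function name fir_8tap, the direct-form delay line, and the use of unbounded Python integers are assumptions made for illustration, and the pin-level behaviour and bit widths of the FIRMAC are not modelled.

```python
def fir_8tap(samples, coeffs):
    """Direct-form 8-tap FIR: y[n] = sum over k of coeffs[k] * x[n - k].

    Hypothetical behavioural model only; it does not reproduce the
    FIRMAC pin interface or any fixed-point word lengths.
    """
    if len(coeffs) != 8:
        raise ValueError("this sketch expects exactly 8 coefficients")

    delay_line = [0] * 8          # the last 8 input samples, newest first
    outputs = []
    for x in samples:
        # Shift the new sample in and drop the oldest one.
        delay_line = [x] + delay_line[:-1]
        acc = 0
        for h, d in zip(coeffs, delay_line):
            acc += h * d          # one multiply-accumulate per tap
        outputs.append(acc)
    return outputs


# Feeding a unit impulse returns the coefficients in order.
impulse = [1, 0, 0, 0, 0, 0, 0, 0]
taps = [1, 2, 3, 4, 5, 6, 7, 8]
print(fir_8tap(impulse, taps))
```

Driving the model with a unit impulse is a convenient sanity check, since the output sequence must equal the coefficient list.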
II. Design Methodology
First, the FIRMAC macro is designed. Figure 1 shows the
FIRMAC, which acts as a FIR tap when the operating mode
op is set to FIR and acts as a MAC when op is set to
MAC. The FIRMAC has four input pins: clk, resetn, in1
and in2_sum, and one output pin: out1. resetn is used to
reset the flip-flop (FF) and to input the coefficient to the FF
in the FIR mode. in2_sum is a flexible pin which serves
two purposes depending on op. The flip-flop (FF) is used
Period (ns)  Freq. (MHz)  Slack (ns)  Area (um2)  Area (kgates)  Dynamic (mW)  Leakage (uW)
12           83.33        -0.16       6672.61     8.34           0.96          102.52
10           100.00       -0.2        6561.69     8.20           1.13          100.72
9            111.11       -0.14       6588.29     8.24           1.26          101.25
8            125.00       -0.04       6609.30     8.26           1.44          100.92
7            142.86       -0.41       6610.36     8.26           1.66          101.96

Period (ns)  Freq. (MHz)  Slack (ns)  Area (um2)  Area (kgates)  Dynamic (uW)  Leakage (uW)
42           23.81        4.34        49272.73    61.59          77.86         13.72
40           25.00        2.34        49272.73    61.59          81.74         13.72
38           26.32        0.34        49272.73    61.59          86.02         13.72
37.5         26.67        0.07        49277.52    61.60          86.44         13.87
The TAPSET module was synthesized for both the flat and
hierarchical design cases, also using the slow settings of the
library file. The clock periods tried and their
corresponding results for the flat design are given in Table
2. A target frequency of 40.00 MHz (25 ns period) was chosen
because it is the fastest clock speed in Table 2, with reasonable
area and power.
Figure 5. Flat place and route of TAPSET.
Table 2. Synthesis results of the flat design of TAPSET.
Period (ns)  Freq. (MHz)  Slack (ns)  Area (um2)  Area (kgates)  Dynamic (uW)  Leakage (mW)
35           28.57        0.08        68229.53    85.29          640.71        1.2594
34           29.41        0.02        68122.60    85.15          659.31        1.2563
33.5         29.85        0.05        68207.72    85.26          673.81        1.2589
32           31.25        0.03        66567.83    83.21          698.98        1.2134
30           33.33        0.05        66167.23    82.71          774.01        1.1985
28           35.71        0.01        64113.98    80.14          828.79        1.1391
27           37.04        0.01        64111.85    80.14          884.76        1.1369
25           40.00        0.01        65590.28    81.99          987.02        1.1707
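The selection of the target frequency from the flat-design results can be sketched in a few lines of Python. The period and slack values below are copied from Table 2; the helper names freq_mhz and fastest_meeting_timing, and the rule "shortest period with non-negative slack", are assumptions used for illustration rather than the authors' documented procedure.

```python
# (period in ns, slack in ns) pairs copied from Table 2 (flat TAPSET design).
TABLE2 = [
    (35.0, 0.08), (34.0, 0.02), (33.5, 0.05), (32.0, 0.03),
    (30.0, 0.05), (28.0, 0.01), (27.0, 0.01), (25.0, 0.01),
]


def freq_mhz(period_ns):
    """Clock frequency in MHz for a period given in ns."""
    return 1000.0 / period_ns


def fastest_meeting_timing(rows):
    """Return the (period, slack) row with the shortest period whose
    slack is non-negative. Assumed selection rule, for illustration."""
    candidates = [row for row in rows if row[1] >= 0.0]
    return min(candidates, key=lambda row: row[0])


best_period, best_slack = fastest_meeting_timing(TABLE2)
print(f"{best_period} ns -> {freq_mhz(best_period):.2f} MHz (slack {best_slack} ns)")
```

Under this rule the 25 ns row is selected, which matches the 40.00 MHz target stated in the text.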
Figure labels: FIRMAC core; firtap_0 to firtap_7.
Flat P&R      Hierarchical P&R
77328 um2     204000 um2
72829 um2     89047 um2
0 um2         70413 um2
94.2%         78.2%
72222         13859
0.26680 ns    0.21500 ns
0.58350 ns    0.14890 ns
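The percentage reported for each approach above is consistent with a utilization computed from the three area figures that precede it. The short Python check below assumes, since the row labels are not available here, that the first three figures are the core area, the standard-cell area and the macro area; that reading and the function name utilization_percent are assumptions.

```python
def utilization_percent(core_um2, std_cell_um2, macro_um2):
    """Occupied area (standard cells plus macros) as a percentage of the
    core area. The meaning of the three inputs is an assumed reading of
    the unlabeled figures in the comparison above."""
    return 100.0 * (std_cell_um2 + macro_um2) / core_um2


flat_util = utilization_percent(77328, 72829, 0)
hier_util = utilization_percent(204000, 89047, 70413)
print(f"flat: {flat_util:.1f}%  hierarchical: {hier_util:.1f}%")
```

Both results round to the reported 94.2% and 78.2%, which supports this reading without confirming it.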
The CPU time used to run the P&R scripts was measured
and is shown in Table 5. The overall CPU time spent
on hierarchical P&R was only around one-half of that of
the flat P&R, because P&R had already been done for the
FIRMAC macro beforehand.
Table 5. CPU time used to run the P&R scripts.
Script Step
top.tcl
01-importDesign.tcl
02-floorplan.tcl
03-place.tcl
04-postPlaceOpt.tcl
05-cts.tcl
06-postCTSOpt.tcl
07-route.tcl
08-finishing.tcl
C. Power Integrity
Power integrity analysis [2] was performed using RedHawk
(VCD was not used due to the limited time of the project).
1. Static Analysis
Unlike the flat P&R, the hierarchical P&R has its higher
voltage drops distributed over different areas. Figure 9 shows
the voltage drops on instances.
2. Dynamic Analysis
Flat P&R      Hierarchical P&R
40.00 MHz     26.67 MHz
3.4134 mW     2.4948 mW
0.68363 mW    0.68759 mW
1.2322 mW     0.82003 mW
1.4975 mW     0.98717 mW
100%          100%
72222         70699
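In the power figures above, the first value in mW for each approach equals the sum of the three that follow it, which suggests a total followed by its components. The Python check below makes that arithmetic explicit; treating the values as "total" and "components" is an assumption, and which component is which is not known from the figures alone.

```python
# Power values in mW, copied from the comparison above.
# "total" and "components" are assumed labels for the unlabeled figures.
FLAT = {"total": 3.4134, "components": [0.68363, 1.2322, 1.4975]}
HIER = {"total": 2.4948, "components": [0.68759, 0.82003, 0.98717]}


def components_match_total(entry, tol_mw=1e-3):
    """True if the components add up to the total within tol_mw."""
    return abs(sum(entry["components"]) - entry["total"]) <= tol_mw


for name, entry in (("flat", FLAT), ("hierarchical", HIER)):
    print(name, f"{sum(entry['components']):.4f} mW", components_match_total(entry))
```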
Static (mV)     2.2     3.6
Dynamic (mV)    25.3    37.3
Figure 11. Voltage drop on hierarchical P&R min. vdd-vss over time
window.
Figures 12 and 13 show the total current drawn for the flat
and hierarchical P&R, respectively. The flat P&R has a
slightly higher current overall.
Figure 12. Current plot at Vdd of flat P&R, clock period = 25.0 ns.
The design of a FIRMAC macro and its application in an 8-tap FIR filter implementation have proven successful:
the post-P&R setup and hold constraints are all met, and only a
few violations on dangling power wires are found in the
hierarchical P&R, which could be cut in a real design. Both
the flat and hierarchical P&R approaches have
their advantages. The flat design has an advantage in
clock speed and area, whereas the hierarchical design has an
advantage in CPU time and re-design time.
Acknowledgment
We gratefully acknowledge the help and support from Dr.
Fabio Campi and Josh Ancill throughout the project.
Figure 13. Current plot at Vdd of hierarchical P&R, clock period = 37.5
ns.
References
[1] Campi, Fabio, ENSC 450 VLSI Design, Lab Tutorial: Napkin to
Chip. SFU
[2] Campi, Fabio, ENSC 450 VLSI Design, Lab Tutorial: Power
Integrity. SFU