RTL Design Techniques To Reduce The Power Consumption of FPGA Based Circuits
Abstract
This paper addresses a design methodology to reduce
the power consumption of digital circuits implemented in
FPGA devices. An experimental setup to evaluate its effectiveness is presented. Some RTL design techniques were
applied to several studied circuits and the corresponding
results are presented and discussed.
Introduction
The power consumption of general-purpose digital circuits has increasingly become a matter of top concern,
both at the design level and at the commercial level. This is a central issue especially in portable
equipment, and in recent years a great effort has been made to reduce power consumption. As a result,
power has been added as a third criterion to the traditional design space exploration, which considered
only area and performance trade-offs.
The power consumed by digital complementary metal oxide semiconductor (CMOS) circuits results from
several components [2] with different causes. The most important one results from the repeated
charging and discharging of the capacitive loads associated with the logic gates. The other causes
are the short-circuit currents that flow when an output changes, due to the simultaneous conduction
of both the P and N networks, and the leakage current that flows through the transistors [3]. The
latter has gained importance in recent years because the shrinking feature size of manufacturing
technologies increases the leakage currents.
In CMOS circuits the power consumption can be divided into two categories: the static power, which
does not depend on the clock signal, and the dynamic power, which depends on the clock activity.
The static component cannot be reduced during the design process because it depends only on the
amount of hardware present in the circuit and on the manufacturing technology. The dynamic power,
on the other hand, can be reduced by minimizing the number of logic transitions.
One of the main causes of dynamic power consumption is glitches [5, 11]. Glitches are undesirable
switching activity caused by the different signal delays that occur in a datapath. For example, if
the inputs of an AND gate are 0 and 1 at a given moment, the output is 0. If both inputs switch,
the output should remain at 0. However, if the input that was 0 switches before the other one, the
output briefly switches to 1 and returns to the correct value only after the other input has
switched.
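The AND-gate example can be reproduced with a small discrete-time model. The following Python sketch is illustrative only: the function names and the arrival times are assumptions made for the example and are not taken from the studied circuits.

```python
def and_gate_waveform(a_events, b_events, a_init, b_init, t_end):
    """Return the AND output sampled at each integer time step.

    a_events / b_events map a time step to the new input value, i.e. the
    instant at which the (delayed) signal actually reaches the gate.
    """
    a, b = a_init, b_init
    out = []
    for t in range(t_end):
        a = a_events.get(t, a)
        b = b_events.get(t, b)
        out.append(a & b)
    return out


def count_transitions(wave):
    """Count output toggles; each one charges or discharges the load."""
    return sum(1 for x, y in zip(wave, wave[1:]) if x != y)


# Inputs start at a=0, b=1 and both switch (a -> 1, b -> 0).
# Unequal delays (assumed): a arrives at t=2, b arrives at t=4.
skewed = and_gate_waveform({2: 1}, {4: 0}, 0, 1, 6)
# Equalized delays (assumed): both arrive at t=3.
equalized = and_gate_waveform({3: 1}, {3: 0}, 0, 1, 6)

print(skewed, count_transitions(skewed))
print(equalized, count_transitions(equalized))
```

With skewed arrival times the output pulses to 1 and back, giving two unnecessary transitions; with equal arrival times the output never leaves 0.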
This type of undesirable switching activity increases the dynamic component of the power
consumption and therefore the overall power. It can be reduced by equalizing the delays of the
gate input signals. In [11] it is described how reducing glitches decreases the power consumption:
the author reports a reduction of 92% in consumption through delay equalization, using several
circuit types such as adders, multipliers and dividers.
Another way to reduce glitches is to place registers driven by a common clock signal at the gate
outputs, in order to eliminate the differences between the delays of the signals. This technique
is known as pipelining and it can be used to reduce the power consumption of combinational circuits
[16]. A pipelined architecture results from dividing a combinational circuit into several stages,
introducing registers at each stage boundary and operating them synchronously with the clock signal
of the circuit. This way the circuit is able to process more information in less time, because the
processing paths become shorter and each stage processes a different part of the same task.
Furthermore, the registers introduced between the stages block the propagation of glitches and
decrease the capacitive load of the datapath, which reduces the dynamic component of the power
consumption.
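The glitch-blocking effect of a pipeline register can be sketched with a simple behavioural model. This is a minimal Python illustration under stated assumptions: the waveform, the clock period of five time steps and the reset value of 0 are all invented for the example.

```python
def register(wave, period, reset_value=0):
    """Model an edge-triggered register.

    The input is sampled at the end of each clock period and the sampled
    value is held at the output during the following period.
    """
    held = reset_value
    out = []
    for t, value in enumerate(wave):
        out.append(held)
        if (t + 1) % period == 0:   # active clock edge
            held = value
    return out


def count_transitions(wave):
    """Count toggles seen by the logic driven by this signal."""
    return sum(1 for x, y in zip(wave, wave[1:]) if x != y)


# Hypothetical combinational output: it glitches inside each clock period
# and settles before the next clock edge (period = 5 time steps).
glitchy = [0, 1, 0, 1, 1,
           1, 0, 1, 0, 0,
           0, 1, 1, 1, 1]
registered = register(glitchy, period=5)

print(count_transitions(glitchy))     # toggles without the register
print(count_transitions(registered))  # toggles after the register
```

The next stage sees at most one transition per clock period, whatever the number of intermediate glitches produced by the previous stage.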
Pipelining is particularly interesting for circuits implemented in FPGAs. The configurable blocks
that form an FPGA usually contain the flip-flops needed to implement pipelining, which would
otherwise be unused in purely combinational circuits. A pipelined architecture was used by the
authors of [1, 13] in multiplier circuits. The power consumption reduction achieved was 33% for
circuits implemented in an XC3050 FPGA and 58% for circuits implemented in an XC4005.
In circuits whose control is based on finite state machines (FSMs), the state encoding may
influence the power consumption [15, 14]. If there are many states or many state changes occur,
the power consumed by the FSM can be significant. To decrease the power consumption, the state
encoding should be chosen so that it minimizes the number of bits that change between consecutive
states. There are codes conceived specifically for that purpose, in which consecutive codes differ
by only one or two bits; examples are the Gray code and the one-hot code, respectively. Thus, in
circuits where FSMs have a relevant impact, it is possible to minimize the power consumption by
exploring different state encoding strategies.
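The number of state bits that change can be compared across encodings with a short calculation. The Python sketch below assumes a hypothetical FSM that cycles through eight states in order; it counts only state register toggles and ignores the next-state and output logic, so it is not a power estimate.

```python
def binary_code(i):
    """Plain binary encoding of state index i."""
    return i


def gray_code(i):
    """Reflected Gray code of state index i."""
    return i ^ (i >> 1)


def one_hot_code(i):
    """One-hot encoding: one flip-flop per state."""
    return 1 << i


def total_toggles(codes):
    """Sum of bit flips along the cyclic sequence s0 -> s1 -> ... -> s0."""
    successors = codes[1:] + codes[:1]
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, successors))


N_STATES = 8  # assumed number of states
for name, encode in [("binary", binary_code),
                     ("gray", gray_code),
                     ("one-hot", one_hot_code)]:
    codes = [encode(i) for i in range(N_STATES)]
    print(name, total_toggles(codes))
```

For this sequential walk the Gray code flips one bit per transition and the one-hot code flips two, as stated above, while binary encoding flips between one and three bits depending on the transition.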
In [15] the authors state that it is possible to achieve a power reduction of 57% by properly
choosing the FSM state encoding. They conclude that in FSMs with many states (typically more than
sixteen) one-hot encoding gives the best results, and that in FSMs with few states (fewer than
eight) binary encoding leads to a lower power consumption.
Decomposing an FSM into smaller FSMs with low activity between them is also a favorable procedure,
since it makes it possible to reduce the power consumption [14]. The authors report that applying
this technique yielded a power reduction of 46%.
Conclusions about the effectiveness of an architecture can only be drawn if the power consumed by
the circuit in question is known. Two methods can be considered for that: estimating the power
consumption with a software tool, or measuring the current consumed by an implementation of the
circuit. In spite of the additional cost, real measurement yields more precise values. For that
purpose, a prototype based on a Xilinx Spartan3 FPGA is used.
3.1 Power estimation
3.2 Current measurement
The real power consumption can be obtained by simultaneously measuring the current and the voltage
applied to the circuit. In this work this was achieved by measuring the current at the output of
the voltage regulator that supplies the FPGA core. For that purpose, the voltage regulator circuit
(FAN1112 [6]) was removed from the board (Digilent Spartan-3 Starter Kit Board [19]) and mounted
externally, to allow the electrical current to be monitored with an oscilloscope or a true RMS
ammeter. To observe the evolution of the current consumption with an oscilloscope, a resistor with
a very low value (1 Ω) was inserted in series with the supplied circuit.
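The conversion from an oscilloscope reading to power is a direct application of Ohm's law. The Python sketch below is only an illustration: the function name, the 25 mV reading and the 1.2 V core supply are assumptions made for the example, not measured values from this work.

```python
def core_power_mw(v_shunt_mv, r_shunt_ohm=1.0, v_core_v=1.2):
    """Power drawn by the FPGA core, in mW.

    v_shunt_mv  : voltage drop observed across the series resistor (mV)
    r_shunt_ohm : value of the series resistor (1 ohm in the setup above)
    v_core_v    : core supply voltage (1.2 V is an assumed value)
    """
    i_ma = v_shunt_mv / r_shunt_ohm   # Ohm's law: mV / ohm = mA
    return v_core_v * i_ma            # V * mA = mW


# Hypothetical oscilloscope reading of 25 mV across the resistor.
example_mw = core_power_mw(25.0)
print(f"{example_mw:.1f} mW")
```

With a 1 Ω resistor the voltage drop in millivolts is numerically equal to the current in milliamperes, which makes the oscilloscope trace easy to read.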
To measure the dynamic power component, the circuit must be operating at a certain frequency.
Circuit activity is produced by applying random values to its input signals. To avoid the use of
external signals, a stimuli generator was added to the circuit under study. Figure 1 represents
the resources and the implementation of the circuit in the FPGA. The I/O connections are limited
to the clock and enable inputs and the parity output. The enable input is used to control the
circuit activity, so that either only the static component of the power consumption [12] or both
the static and dynamic components are measured. The dynamic component is then calculated as the
difference between the total consumption and the static consumption. The enable input acts as a
clock enable, which makes it possible to stop the activity of the circuit and of the clock tree.
The central block in the FPGA diagram represents the circuit under test (CUT), and the leftmost
block represents the circuit that generates the input stimuli, which is based on a linear feedback
shift register (LFSR).
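The behaviour of such a stimuli generator can be sketched in software. The following Python model is a minimal illustration under stated assumptions: the 16-bit width, the feedback taps, the seed and the placeholder CUT (an 8x8 multiplication) are all hypothetical, since the LFSR parameters are not given in the text.

```python
def lfsr_step(state, enable=1):
    """Advance a 16-bit Fibonacci LFSR by one clock; hold when enable is 0."""
    if not enable:
        return state  # clock enable low: no register toggles
    # Assumed taps 16, 14, 13, 11 (x^16 + x^14 + x^13 + x^11 + 1).
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def parity(word):
    """Reduce a result word to a single output bit (XOR of all its bits)."""
    return bin(word).count("1") & 1


def run(seed, cycles, enable=1):
    """Drive a placeholder CUT and collect the parity output per cycle."""
    state, bits = seed, []
    for _ in range(cycles):
        state = lfsr_step(state, enable)
        a, b = state & 0xFF, state >> 8   # split the stimulus into two operands
        bits.append(parity(a * b))        # hypothetical CUT: 8x8 multiplier
    return state, bits


active_state, active_bits = run(0xACE1, 16, enable=1)
idle_state, idle_bits = run(0xACE1, 16, enable=0)
print(active_bits)
print(idle_bits)
```

With the enable input low the LFSR state is frozen, so the CUT inputs and the parity output stop toggling; this corresponds to the condition used to isolate the static component.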
To illustrate the effectiveness of the RTL design techniques in reducing power consumption,
various circuits were implemented.
One of these circuits implements the md4 algorithm [10], used to generate digital signatures in
data encryption applications. Several multiplier circuits based on a parallel array architecture
were also implemented, including pipelined versions.
The power consumption and area occupation of the auxiliary circuits implemented in the FPGA to
measure the current are presented in Table 1. With these values it is possible to calculate the
power consumption and the area occupation of the circuits under test: they are subtracted from the
results obtained for the whole circuit, which includes the circuit under test and the auxiliary
circuitry.
Table 1. Power consumption and area occupation of the auxiliary circuits.

Bit width    Power (mW)    Area (CLBs)
4 bits       0.324         118
8 bits       0.300         141
16 bits      0.312         182
32 bits      0.276         256
4.1
This circuit uses an FSM to control the sequence of arithmetic and logical operations performed on
the data to encrypt. The FSM is composed of fifty-five states. In forty-eight of them an addition
is performed between the value stored in a register and the result of a logic operation on the
values of three other registers and a constant, which is zero in the first sixteen states.
This circuit was described and validated using the Verilog hardware description language (HDL) [7]
and then implemented in a Xilinx Virtex5 XC5VLX50T FPGA. Several versions were implemented using
different FSM state encoding techniques. The state encodings used are those available to the user
in the Xilinx synthesis tool, XST [17]. The initial description of the circuit was made without
any concern for power consumption; the FSM was coded using binary encoding with an arbitrary
sequence. The encoding referred to as user corresponds to the encoding used in this initial
version of the md4 circuit, which is taken as the reference for the alternative implementations.
The power estimation was performed with a simulation that uses randomly generated stimuli vectors
in the testbench used to simulate the circuit operation. This way the circuit is simulated as
realistically as possible, considering a large set of input vectors.
The power consumption was estimated for this circuit and for its other versions with the
alternative state encodings. The consumption of these circuits was not measured, because they
cannot be implemented in the Spartan3 XC3S200 FPGA used: the occupied area exceeds the resources
available in that FPGA.
Table 2 presents the results of the power consumption estimation for the md4 circuit implemented
with the different FSM state encodings. The speed1 encoding is one of the encodings available in
the XST tool [17]. The dynamic power consumption for each of the encodings is presented as PD. The
area column shows the number of CLBs used by each implementation. The two rightmost columns show
the percentage variation of power and area of each encoding relative to the initial circuit. In
these columns a positive value means an increase and a negative value means a decrease, compared
with the user implementation.
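The percentage variation columns follow from a simple relative difference. The Python sketch below recomputes the power column from the PD values reported in Table 2; the function and variable names are chosen for the example only.

```python
def variation_pct(value, reference):
    """Relative variation of value with respect to reference, in percent."""
    return 100.0 * (value - reference) / reference


# Dynamic power (mW) per encoding, as reported in Table 2.
pd_mw = {
    "User": 96.21,
    "One-hot": 85.41,
    "Compact": 96.59,
    "Sequential": 92.70,
    "Gray": 87.31,
    "Johnson": 86.72,
    "speed1": 83.00,
}

reference = pd_mw["User"]  # the initial (user) implementation
for name, value in pd_mw.items():
    if name != "User":
        print(f"{name:<11}{variation_pct(value, reference):+.2f} %")
```

A negative result means that the encoding consumes less than the user implementation, in agreement with the sign convention described above.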
The md4 circuit was re-implemented using a pipelined architecture, decomposing the arithmetic
operations performed in each state. This new circuit uses an FSM with one hundred and two states.
The power consumption estimation was repeated for this second implementation with the different
FSM encodings. The simulations were performed with the same stimuli vectors, in order to have
precisely the same conditions as for the first version of the md4 circuit and allow their
comparison.
Table 3 presents the results of the power consumption estimation for the pipelined version of the
md4 circuit, implemented with the several FSM encodings. The structure and the included values
have the same meaning as those presented in Table 2.
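Table 2. Power consumption estimation for the md4 circuit with different FSM state encodings.

Encoding      PD (mW)    Area (CLBs)    PD (%)    A (%)
User          96.21      7022           -         -
One-hot       85.41      6966           -11.23    -0.8
Compact       96.59      6966           +0.39     -0.8
Sequential    92.70      7056           -3.64     +0.48
Gray          87.31      7108           -9.25     +1.22
Johnson       86.72      7311           -9.86     +4.12
speed1        83.00      7232           -13.73    +2.99

Table 3. Power consumption estimation for the pipelined md4 circuit with different FSM state encodings.

Encoding      Power (mW)    Area (CLBs)    P (%)     A (%)
User          72.75         7649           -24.38    +9.56
One-hot       58.68         7441           -31.30    +6.81
Compact       63.37         7441           -34.38    +6.81
Sequential    63.37         7682           -31.64    +8.87
Gray          58.54         7951           -32.95    +11.86
Johnson       73.13         7347           -15.71    +0.49
speed1        41.19         7785           -50.37    +7.69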
4.2 Multiplier circuits
References
[1] E. Boemo, G. Rivera, L. Buedo, and J. Meneses. Some
notes on power management on FPGA-based systems. Lecture Notes in Computer Science, 975:149-157, 1995.
[2] A. P. Chandrakasan and R. W. Brodersen. Minimizing power
consumption in digital CMOS circuits. In Proceedings of the
IEEE, volume 83, pages 498-523, 1995.
[3] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen. Low
power CMOS digital design. In IEEE Journal of Solid-State
Circuits, volume 27, pages 473-484, 1992.
[4] D. Chen, J. Cong, and Y. Fan. Low-power high-level synthesis for FPGA architectures. In Low Power Electronics and
Design, pages 134-139, Aug. 2003.
[5] L. Cheng, D. Chen, and M. D. F. Wong. Glitchmap:
an FPGA technology mapper for low power considering
glitches. In DAC '07: Proceedings of the 44th annual conference on Design automation, pages 318-323, New York,
NY, USA, 2007. ACM.
[6] Fairchild. Datasheet FAN1112, 2001.
[7] The Institute of Electrical and Electronics Engineers, New
York. IEEE Standard Verilog hardware description language, Sept. 1995. IEEE Standard 1364-1995.
[8] I. Koren. Computer Arithmetic Algorithms. Brookside Court
Publishers, 1998.
[9] MentorGraphics. ModelSim Reference Manual, 2007.
[10] R. Rivest. The md4 message-digest algorithm, 1992.
[11] N. H. Rollins. Reducing power in FPGA designs through
glitch reduction. Master's thesis, Brigham Young University, 2007.
[12] G. Sutter and E. Boemo. Experiments in low power FPGA
design. Latin American Applied Research, 37(1):99-104,
2007.