RTL design techniques to reduce the power consumption of FPGA based circuits

Joao Miguel R. Meixedo and Antonio Jose D. Araujo


[email protected], [email protected]
Faculdade de Engenharia da Universidade do Porto
Rua Dr. Roberto Frias, 4200-465 Porto, PORTUGAL

Abstract
This paper addresses a design methodology to reduce
the power consumption of digital circuits implemented in
FPGA devices. An experimental setup to evaluate its effectiveness is presented. Some RTL design techniques were
applied to several studied circuits and the corresponding
results are presented and discussed.

1 Introduction

The power consumption of general-purpose digital circuits has become an increasingly important concern, both at the design level and at the commercial level. This is a central issue especially in portable equipment, and in recent years a great effort has been made to reduce power consumption. As a result, power has been added as a third criterion to the traditional design space exploration, which considered the trade-offs between area and performance.
The power consumed by digital complementary metal oxide semiconductor (CMOS) circuits results from several components [2] with different causes. The most important one is the repeated charging and discharging of the capacitive loads associated with the logic gates. The others are the short-circuit current that flows while an output changes, due to the simultaneous conduction of the P and N networks, and the leakage current that flows through the transistors [3]. The latter has gained importance in recent years, because the shrinking feature size of the manufacturing technology increases the leakage currents.
In CMOS circuits the power consumption can be divided into two categories: the static power, which does not depend on the clock signal, and the dynamic power, which depends on the clock activity. The static power cannot be reduced during the design process, because it depends only on the amount of hardware present in the circuit and on the manufacturing technology. The dynamic power, on the other hand, can be reduced by minimizing the number of logic level transitions of the signals in the circuit. This reduction can be achieved in two ways: by decreasing the operating frequency of the circuit, which by itself decreases its performance, or by using different architectural options that eliminate unnecessary logic level transitions while maintaining the proper functionality and performance.
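The two options can be compared with the usual first-order model of CMOS switching power, P = a * C * V^2 * f, where a is the switching activity, C the switched capacitance, V the supply voltage and f the clock frequency. The sketch below is only an illustration: the model is the common textbook approximation, not a formula taken from this work, and every numeric value is a hypothetical operating point.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order CMOS switching power in watts: activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical operating point, chosen only for illustration.
base = dynamic_power(activity=0.20, capacitance_f=1e-9, vdd_v=1.2, freq_hz=50e6)

# Option 1: halve the clock frequency (performance is halved as well).
half_clock = dynamic_power(activity=0.20, capacitance_f=1e-9, vdd_v=1.2, freq_hz=25e6)

# Option 2: keep the clock, but remove half of the signal transitions.
fewer_toggles = dynamic_power(activity=0.10, capacitance_f=1e-9, vdd_v=1.2, freq_hz=50e6)

print(base, half_clock, fewer_toggles)
```

In this model both options halve the dynamic power, but only the second one keeps the clock frequency, and therefore the throughput, unchanged.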
This work aims to establish a design methodology that explores design techniques at the register transfer level (RTL) to reduce the dynamic power consumption of circuits implemented in field programmable gate array (FPGA) devices. Although low power design techniques are known and used in digital CMOS circuits, there are still few experimental results for FPGA implementations.
The following section presents some low power design techniques and reports their application in previously published works. Section 3 describes the experimental setup used to obtain the power consumption of several circuits implemented with those techniques. Section 4 presents and discusses the results obtained for various alternative circuits. The paper finishes with the most important conclusions about the current work.

2 Techniques for dynamic power reduction

Glitches are one of the main causes of dynamic power consumption [5, 11]. They consist of undesirable switching activity caused by the different signal delays that occur in a datapath. For example, if the inputs of an AND gate are 0 and 1 at a given moment, the output is 0. If both inputs switch, the output should remain at 0; but if the input that was 0 switches before the other one, the output briefly rises to 1 and only returns to the correct value after the other input has switched.
This type of undesirable switching activity increases the dynamic component of the power consumption and therefore the overall power. It can be reduced by equalizing the delays of the gate input signals. In [11] it is described how the power consumption is decreased by reducing glitches. The author reports a reduction of 92% in consumption through delay equalization, using several circuit types such as adders, multipliers and dividers.
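The AND gate example can be reproduced with a small discrete-time model. This is a minimal sketch under simplifying assumptions: the gate is ideal, time advances in integer steps, and the switching instants are hypothetical values chosen only to expose the effect.

```python
def and_gate_waveform(a_switch_time, b_switch_time, t_end=10):
    """Sample the output of an ideal AND gate at integer time steps.

    Input a goes 0 -> 1 at a_switch_time; input b goes 1 -> 0 at b_switch_time.
    """
    out = []
    for t in range(t_end):
        a = 1 if t >= a_switch_time else 0
        b = 0 if t >= b_switch_time else 1
        out.append(a & b)
    return out


def count_transitions(waveform):
    """Count the logic level changes between consecutive samples."""
    return sum(1 for prev, cur in zip(waveform, waveform[1:]) if prev != cur)


# Input a arrives three steps earlier than input b: the output glitches to 1.
skewed = and_gate_waveform(a_switch_time=3, b_switch_time=6)

# Equalized delays: both inputs switch together and the output never moves.
equalized = and_gate_waveform(a_switch_time=3, b_switch_time=3)

print(count_transitions(skewed), count_transitions(equalized))
```

With skewed inputs the output makes two transitions that carry no information; with equalized delays it makes none.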
Another solution to reduce glitches is to place registers at the gate outputs, driven by a common clock signal, in order to eliminate the differences between the delays of the signals. This technique is known as pipelining and it can be used to reduce the power consumption of combinational circuits [16]. A pipelined architecture is obtained by dividing a combinational circuit into several stages, introducing registers at each stage boundary and operating them synchronously with the clock signal of the circuit. This way the circuit is able to process more information in less time, because the processing paths become shorter and each stage processes a different part of the same task. Furthermore, the registers introduced between the stages block the propagation of glitches and decrease the capacitive load of the datapath, which decreases the dynamic component of the power consumption.
Pipelining is particularly interesting for circuits implemented in FPGAs. The configurable blocks that form an FPGA usually contain the flip-flops needed to implement pipelining, which are left unused in purely combinational circuits. A pipelined architecture was used by the authors of [1, 13] in multiplier circuits. The power consumption reduction achieved was 33% for circuits implemented in an XC3050 FPGA and 58% for circuits implemented in an XC4005.
In circuits whose control is based on finite state machines (FSMs), the state encoding may influence the power consumption [14, 15]. If there are many states, or if state changes are frequent, the power consumed by the FSM can be significant. To decrease the power consumption, the state encoding should be chosen so that it minimizes the number of bits that change between consecutive states. Some codes are conceived specifically for that purpose, with consecutive codes differing by only one or two bits; the Gray code and the one-hot code, respectively, are examples. Thus, in circuits where the FSMs have a relevant impact, the power consumption can be minimized by exploring different state encoding strategies.
In [15] the authors report that a power reduction of 57% can be achieved by choosing the FSM state encoding properly. They conclude that in FSMs with many states (typically more than sixteen) one-hot encoding gives the best results, and that in FSMs with few states (fewer than eight) binary encoding leads to a smaller power consumption.
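The number of state register bits that change between consecutive states can be counted directly for each code. The sketch below assumes a hypothetical FSM that simply steps through its states in order and wraps around; real FSMs branch, and the count ignores the next-state and output logic, so it should not be expected to reproduce the rankings reported in [15].

```python
def binary_code(i):
    """Plain binary encoding of state index i."""
    return i


def gray_code(i):
    """Reflected Gray code of state index i."""
    return i ^ (i >> 1)


def one_hot_code(i):
    """One-hot encoding: only bit i is set."""
    return 1 << i


def bit_flips(encode, n_states):
    """Total state register bits that toggle while visiting states
    0, 1, ..., n_states - 1 in order and then wrapping back to 0."""
    total = 0
    for i in range(n_states):
        cur = encode(i)
        nxt = encode((i + 1) % n_states)
        total += bin(cur ^ nxt).count("1")
    return total


for name, encode in [("binary", binary_code), ("gray", gray_code), ("one-hot", one_hot_code)]:
    print(name, bit_flips(encode, 16))
```

For sixteen states visited in sequence, this count gives 30 toggles for binary, 16 for Gray (one bit per transition) and 32 for one-hot (two bits per transition).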
Decomposing an FSM into smaller FSMs with low activity between them is also favorable, since it makes it possible to reduce the power consumption [14]. The authors report that applying this technique achieved a power reduction of 46%.

3 Power consumption evaluation

Conclusions about the effectiveness of an architecture can only be drawn if the power consumed by the circuit in question is known. Two methods can be considered for that: estimating the power consumption with a software tool, or measuring the current consumed by an implementation of the circuit. In spite of its additional cost, real measurement yields more precise values. For that purpose, a prototype based on a Xilinx Spartan-3 FPGA is used.

3.1 Power estimation

When the design is considered at the RTL level, estimation proves to be the more appropriate alternative, because it is performed at the same level as the other tools used during the design cycle, such as logic simulation and synthesis. This approach allows conclusions about the power consumption of a digital circuit to be drawn quickly and easily. Throughout this work, power estimation was performed with the Xpower estimator tool included in the Xilinx ISE package [18].
Xpower estimates the power consumption of a digital circuit implemented in an FPGA from the switching rates of each circuit node, which are generated during a post-routing simulation performed with the ModelSim simulator [9]. This simulation should be complete and extensive enough to generate realistic switching activity. The result of the simulation is stored in a .VCD file, which is then imported by Xpower to perform the estimation. From this file, Xpower produces estimates of both the static and the dynamic power consumption of the circuit, as well as the switching rates of each of its signals.
According to [4], estimation with Xpower produces results with an error of 16.2%.

3.2 Current measurement

The real power consumption can be obtained by simultaneously measuring the current and the voltage applied to the circuit. In this work this was achieved by measuring the current at the output of the voltage regulator that supplies the FPGA core. To that end, the voltage regulator circuit (FAN1112 [6]) was removed from the board (Digilent Spartan-3 Starter Kit Board [19]) and mounted externally, so that the current can be monitored with an oscilloscope or a true RMS ammeter. To observe the evolution of the current consumption with an oscilloscope, a resistor with a very low value (1 Ω) was inserted in series with the supplied circuit.
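The conversion from an oscilloscope reading across the series resistor to core power is a direct application of Ohm's law. The sketch below is illustrative only: the 25 mV reading is a hypothetical value, and the 1.2 V core voltage is an assumption (the nominal Spartan-3 core supply), not a figure reported in this work.

```python
def supply_current(shunt_voltage_v, shunt_resistance_ohm=1.0):
    """Current through the series resistor, from the voltage across it."""
    return shunt_voltage_v / shunt_resistance_ohm


def core_power(core_voltage_v, current_a):
    """Power delivered to the FPGA core, in watts."""
    return core_voltage_v * current_a


# Hypothetical oscilloscope reading across the 1 ohm resistor: 25 mV.
current = supply_current(0.025)

# Assumed core supply voltage of 1.2 V (not stated in the text).
power = core_power(1.2, current)

print(current, power)
```

Because the resistor is 1 Ω, the voltage read on the oscilloscope in volts equals the current in amperes, which is what makes such a low value convenient.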
To measure the dynamic power component, the circuit must be operating at a certain frequency. Circuit activity is obtained by applying random values to its input signals. To avoid the use of external signals, it was decided to add a stimuli generator to the circuit under study. Figure 1 represents the resources and the implementation of the circuit in the FPGA. The I/O connections are limited to the clock and enable inputs and the parity output. The enable input controls the circuit activity, so that either only the static component of the power consumption [12] or both the static and dynamic components can be measured. The dynamic component is then calculated as the difference between the total consumption and the static consumption. The enable input acts as a clock enable, which makes it possible to stop the activity of the circuit and of the clock tree. The central block in the FPGA diagram represents the circuit under test (CUT), and the leftmost block represents the circuit that generates the input stimuli, based on a linear feedback shift register (LFSR).

Figure 1. Experimental setup.

The LFSR modules were obtained with the ISE Coregen tool [17], which generates them easily by parametrisation. The output signals are fed to an array of XOR gates, which reduces them to a single output representing the parity bit. This minimizes the number of output pads and so reduces the outside influence on the measurements. This output is needed only so that the circuit is synthesized properly.
The power consumed by the auxiliary modules that generate the input vectors and the output parity bit is determined by implementing a circuit with these two modules alone. This way it was possible to determine the consumption of the circuit under test.
Figure 2 shows the current waveform obtained with an oscilloscope during the operation of the circuit, acquired across the 1 Ω resistor. The small current variation due to the switching activity of the circuit allows the remaining measurements to be made with a true RMS ammeter.

Figure 2. Current during the operation of the circuit.
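The arrangement of stimulus generator, circuit under test and parity output can be modelled in software as follows. This is a behavioural sketch under stated assumptions: the LFSR width, taps and seed are a commonly used 16-bit maximal-length configuration chosen for illustration, not the parameters of the Coregen modules used in this work, and `multiplier_8x8` is a hypothetical stand-in for the circuit under test.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one clock."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def parity(word):
    """XOR-reduce all bits of a word to a single parity bit."""
    return bin(word).count("1") & 1


def multiplier_8x8(a, b):
    """Hypothetical circuit under test: an 8 x 8 bit multiplier."""
    return a * b


def run(cut, seed=0xACE1, cycles=8):
    """Drive the circuit under test with LFSR words and collect one
    parity bit per clock cycle, as the single output pad would."""
    state, bits = seed, []
    for _ in range(cycles):
        state = lfsr16_step(state)
        a, b = state & 0xFF, state >> 8  # split the word into two 8-bit operands
        bits.append(parity(cut(a, b)))
    return bits


print(run(multiplier_8x8))
```

Reducing the result to one parity bit keeps every output of the circuit under test observable, so synthesis cannot trim the logic, while only one pad toggles.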

4 Implementation and results

To illustrate the effectiveness of the RTL design techniques in reducing power consumption, various circuits were implemented.
One of these circuits implements the md4 algorithm [10], used to generate digital signatures in data encryption applications. Several multiplier circuits based on a parallel array architecture were also implemented, including pipelined versions.
The power consumption and area occupation of the auxiliary circuits, implemented in the FPGA to measure the current, are presented in Table 1. With these values it is possible to calculate the power consumption and the area occupation of the circuits under test: they are subtracted from the results obtained for the whole circuit, which includes the circuit under test and the auxiliary circuitry.
Bit width    Power (mW)    Area (CLBs)
4 bits         0.324          118
8 bits         0.300          141
16 bits        0.312          182
32 bits        0.276          256

Table 1. Power and area consumed by auxiliary circuits.
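The two subtractions used in the measurements, static from total consumption and auxiliary circuitry from the whole design, amount to the bookkeeping sketched below. The auxiliary figures are those of Table 1; the whole-design readings passed to the functions are hypothetical values used only to exercise the code.

```python
# Auxiliary circuit figures from Table 1, indexed by operand bit width.
AUX_POWER_MW = {4: 0.324, 8: 0.300, 16: 0.312, 32: 0.276}
AUX_AREA_CLBS = {4: 118, 8: 141, 16: 182, 32: 256}


def dynamic_component(total_mw, static_mw):
    """Dynamic power: total consumption (enable on) minus static (enable off)."""
    return total_mw - static_mw


def circuit_under_test(whole_power_mw, whole_area_clbs, bit_width):
    """Remove the auxiliary circuitry from the whole-design results."""
    return (whole_power_mw - AUX_POWER_MW[bit_width],
            whole_area_clbs - AUX_AREA_CLBS[bit_width])


# Hypothetical whole-design results for an 8-bit circuit: 5.0 mW and 400 CLBs.
cut_power_mw, cut_area_clbs = circuit_under_test(5.0, 400, 8)
print(cut_power_mw, cut_area_clbs)
```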

4.1 The md4 circuit

This circuit uses an FSM to control the sequence of arithmetic and logical operations performed on the data to encrypt. The FSM is composed of fifty-five states. In forty-eight of them an addition is performed between the value stored in a register and the result of a logic operation on the values of three other registers and a constant, which is zero in the first sixteen states.
This circuit was described and validated using the Verilog hardware description language (HDL) [7] and then implemented in a Xilinx Virtex5 XC5VLX50T FPGA. Several versions were implemented using different FSM state encoding techniques. The state encodings used are those made available to the user by the Xilinx synthesis tool, XST [17]. The initial description of the circuit was made without any concern for power consumption; the FSM was coded using binary encoding with an arbitrary sequence. The encoding referred to as user corresponds to the encoding used in this initial version of the md4 circuit, which is taken as the reference for the alternative implementations.
The power estimation was performed with a simulation that uses stimuli vectors generated randomly in the testbench. This way the circuit is simulated as realistically as possible, with a very large set of input vectors.
The power consumption was estimated for this circuit and for the versions of it with the alternative state encodings. The consumption of these circuits was not measured, because they cannot be implemented in the Spartan3 XC3S200 FPGA used: the occupied area exceeds the resources available in that device.
Table 2 presents the power consumption estimates for the md4 circuit implemented with the different FSM state encodings. speed1 is one of the encodings available in the XST tool [17]. The dynamic power consumption for each encoding is given as PD, and the Area column shows the number of CLBs used by each implementation. The two rightmost columns show the percentage variation of power and area of each encoding relative to the initial circuit. In these columns a positive value means an increase and a negative value a decrease, compared with the user implementation.

Encoding      PD (mW)    Area (CLBs)    ΔPD (%)    ΔA (%)
User           96.21        7022
One-hot        85.41        6966         -11.23     -0.8
Compact        96.59        6966          +0.39     -0.8
Sequential     92.70        7056          -3.64     +0.48
Gray           87.31        7108          -9.25     +1.22
Johnson        86.72        7311          -9.86     +4.12
speed1         83.00        7232         -13.73     +2.99

Table 2. Obtained results for different FSM state encodings.

The md4 circuit was re-implemented using a pipelined architecture, based on the decomposition of the arithmetic operations performed in each state. This new circuit uses an FSM with one hundred and two states. The power consumption of this second implementation was estimated again for the different FSM encodings. The simulations used the same stimuli vectors, so that the conditions are precisely those of the first version of the md4 circuit and the two can be compared.
Table 3 presents the power consumption estimates for the pipelined version of the md4 circuit, implemented with the several FSM encodings. Its structure and values have the same meaning as those presented in Table 2. For all of the state encodings, ΔP and ΔA are relative to the user encoding of the initial version of the md4 circuit.
Encoding      Power (mW)    Area (CLBs)    ΔP (%)    ΔA (%)
User            72.75          7649        -24.38     +9.56
One-hot         58.68          7441        -31.30     +6.81
Compact         63.37          7441        -34.38     +6.81
Sequential      63.37          7682        -31.64     +8.87
Gray            58.54          7951        -32.95    +11.86
Johnson         73.13          7347        -15.71     +0.49
speed1          41.19          7785        -50.37     +7.69

Table 3. Obtained results for different FSM state encodings of the pipelined circuit.

The results in Table 2 show that the most efficient state encoding in terms of power consumption is speed1, which saves 13.73% of the dynamic consumption. In the pipelined version of the circuit the average decrease in power consumption was 33.53%, while the circuit area increased by only 7.44%. These results show the effectiveness of this type of architecture in decreasing the power consumed by circuits with a long combinational datapath.
It is important to note that the pipelining technique reduces the power significantly with almost no increase in the occupied area of the FPGA, as explained in Section 2.
The total decrease in dynamic power consumption achieved by combining the two techniques, pipelining and FSM state encoding, was 57.19%, with a corresponding increase in circuit area of only 6.48%.
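The percentage variations can be recomputed from the tabulated power values. The sketch below uses the PD column of Table 2 and the speed1 entry of Table 3 (41.19 mW); the helper name `percent_change` is introduced here only for illustration.

```python
# Dynamic power estimates from Table 2 (initial md4 circuit), in mW.
TABLE2_PD_MW = {
    "User": 96.21,
    "One-hot": 85.41,
    "Compact": 96.59,
    "Sequential": 92.70,
    "Gray": 87.31,
    "Johnson": 86.72,
    "speed1": 83.00,
}


def percent_change(value, reference):
    """Signed variation of value relative to reference, in percent."""
    return 100.0 * (value - reference) / reference


reference = TABLE2_PD_MW["User"]

# Variation of each encoding relative to the user encoding.
delta_pd = {
    name: round(percent_change(power, reference), 2)
    for name, power in TABLE2_PD_MW.items()
    if name != "User"
}

# Pipelined circuit with speed1 encoding (Table 3) against the initial
# user-encoded circuit: both techniques combined.
combined = round(percent_change(41.19, reference), 2)

print(delta_pd)
print(combined)
```

The recomputed values agree with the ΔPD column of Table 2 to within about 0.01 percentage points, and the combined figure reproduces the 57.19% reduction quoted above.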

4.2 Multiplier circuits

Several cellular array multipliers [8] were implemented for 4, 8, 16 and 32 bits. Different numbers of pipeline stages were applied to each one, in order to draw conclusions about the effectiveness of this RTL architecture in reducing power. Here the pipelined architectures were used to reduce the power consumption rather than to increase performance, so the power consumption was measured at the same data rate and the same clock frequency. Figure 3 shows the evolution of the power consumption with the number of pipeline stages for the four multipliers under study.

Figure 3. Power consumption of the multipliers.

Figure 4 shows the resource occupation in the FPGA, expressed in CLBs, for each of the studied multipliers.

Figure 4. Area occupation of the multipliers.

The results in Figure 3 show that the power reduction efficiency of the pipelined architecture increases with the size of the circuit. The explanation is that larger circuits originate more glitches, and their propagation along longer datapaths has more impact than in small circuits. In small circuits such as the 4-bit multiplier, this architecture leads to an increase in power consumption. This result was expected, because the additional power consumed by the pipeline registers is considerable compared with the share consumed by the few LUTs that implement the multiplier. It is also visible in Figure 3, especially for the 32-bit multiplier, that the decrease in power consumption stabilizes from 8 pipeline stages onward. This behavior is due to the balance between the additional power consumed by the flip-flops that implement the pipeline and the savings that they provide.
In terms of FPGA occupation, Figure 4 shows that increasing the number of pipeline stages leads to a slight increase in circuit area. As explained in Section 2, a pipelined architecture applied to circuits implemented in FPGAs does not necessarily lead to an increase in circuit area, because it uses the free flip-flops present in the CLBs that implement the combinational logic. Considering that the multipliers were initially implemented as combinational circuits, Figure 5 shows how the flip-flops left free by that implementation are used by the pipelined versions of the circuits.

Figure 5. Flip-flops used by the 32 bits multipliers.

The most significant results were obtained with the 32-bit multiplier circuit, with a power consumption reduction of 69.3% and an increase of 12.6% in the occupied area. The differences between the values obtained by estimation and the measured values are significant, around 50% on average. In spite of this difference, it is still useful and valid to use Xpower during the design flow to compare the efficiency of different architectures in terms of power consumption.

5 Conclusions and future work

This paper presented a methodology for reducing the power consumption of digital circuits implemented in FPGA devices, which consists of applying different power reduction techniques at the RTL design level. A method to measure the electrical current consumed by the FPGA core when the circuit under test and the pattern generators are implemented was presented and characterized.
The results obtained for the considered circuits were presented. The md4 circuit and the fixed-point multipliers show the effectiveness of the pipelining technique and the relevance of the FSM state encoding used. In particular, pipelining allows a significant power reduction using resources that are freely available in the FPGA.
With the multiplier circuits, it is concluded that pipelined architectures reduce power in large circuits, owing to the relevance of glitch propagation in these cases. In smaller circuits, better results are obtained with combinational architectures or with pipelined architectures with a reduced number of stages.
The presented results reflect the current state of the work. The work is ongoing, and the next steps will be the study of other circuits in order to reinforce the current conclusions. For example, dividers and FIR filters are currently being implemented.

References
[1] E. Boemo, G. Rivera, L. Buedo, and J. Meneses. Some notes on power management on FPGA-based systems. Lecture Notes in Computer Science, 975:149-157, 1995.
[2] A. P. Chandrakasan and R. W. Brodersen. Minimizing power consumption in digital CMOS circuits. In Proceedings of the IEEE, volume 83, pages 498-523, 1995.
[3] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen. Low power CMOS digital design. In IEEE Journal of Solid-State Circuits, volume 27, pages 473-484, 1992.
[4] D. Chen, J. Cong, and Y. Fan. Low-power high-level synthesis for FPGA architectures. In Low Power Electronics and Design, pages 143139, l, Aug. 2003.
[5] L. Cheng, D. Chen, and M. D. F. Wong. Glitchmap: an FPGA technology mapper for low power considering glitches. In DAC '07: Proceedings of the 44th annual conference on Design automation, pages 318-323, New York, NY, USA, 2007. ACM.
[6] Fairchild. Datasheet FAN1112, 2001.
[7] The Institute of Electrical and Electronics Engineers, New York. IEEE Standard Verilog hardware description language, Sept. 1995. IEEE Standard 1364-1995.
[8] I. Koren. Computer Arithmetic Algorithms. Brookside Court Publishers, 1998.
[9] MentorGraphics. ModelSim Reference Manual, 2007.
[10] R. Rivest. The md4 message-digest algorithm, 1992.
[11] N. H. Rollins. Reducing power in FPGA designs through glitch reduction. Master's thesis, Brigham Young University, 2007.
[12] G. Sutter and E. Boemo. Experiments in low power FPGA design. Latin America Applied Research, 37(1):99-104, 2007.
[13] G. Sutter, E. Todorovich, and E. Boemo. Design of power aware FPGA-based systems. In Jornadas de Computación Reconfigurable y Aplicaciones, Sept. 2004.
[14] G. Sutter, E. Todorovich, L. Buedo, and E. Boemo. FSM decomposition for low power in FPGA. Lecture Notes in Computer Science, 2438:350-359, 2002.
[15] G. Sutter, E. Todorovich, L. Buedo, and E. Boemo. Low-power FSMs in FPGA: Encoding alternatives. Lecture Notes in Computer Science, 2451:363-370, 2002.
[16] S. J. Wilton, S.-S. Ang, and W. Luk. The impact of pipelining on energy per operation in field-programmable gate arrays. Lecture Notes in Computer Science, 3203:719-728, 2004.
[17] Xilinx. Development System Reference Guide 9.1i.
[18] Xilinx. Xpower Tutorial: FPGA Design, 2002.
[19] Xilinx. Spartan-3 Starter Kit Board User Guide, 2005.
