Advanced Clock Gating with Power Compiler

Wolfgang Embacher
Christian Bosch
Martin Embacher
Frank Trautmann

National Semiconductor GmbH


Livry-Gargan-Str. 10
D – 82256 Fuerstenfeldbruck, Germany


ABSTRACT

The popularity of wireless devices and the need for longer battery life make low power a highly
important design goal for recent applications. Synopsys Power Compiler is a tool that assists
designers in achieving this goal in a minimum amount of development time.

In synchronous digital designs with ever increasing clock speeds, clock gating is one of the most
efficient design techniques to reduce the dynamic power consumption. With the aid of Power
Compiler's automatic mechanisms, digital designers can instantly apply clock gating to their
existing design. Depending on structure and type of the design, tremendous savings in the
switching-power consumption can be achieved without any negative impact on performance, timing
or area.

This paper provides a trade-off analysis of Power Compiler’s clock gating along with examples and
recommendations for its usage and optimizations.
Table of Contents

1.0 Introduction
2.0 Basic Clock Gating Strategies
2.1 Abstraction Levels of Clock Gating
2.2 Benefits
3.0 Gating Logic and Timing Requirements
3.1 Combinational Gating Logic
3.2 Latch Based Gating Logic
4.0 Power Compiler Clock Gating
4.1 Principle
4.2 Clock Gating Style Controls
4.3 Design Flow
4.4 Limitations and Workarounds
5.0 Analysis and Recommendations
5.1 Power and Area Savings
5.2 Coding Style
5.3 System Level Improvements
6.0 Conclusion
7.0 Acknowledgements
8.0 References

Table of Figures

Figure 2.1: Clock Gating from a System View
Figure 2.2: Clock Gating Principle
Figure 3.1: Illegal Timing (AND gate)
Figure 3.2: Correct Timing (NAND gate)
Figure 3.3: Latch based clock gating
Figure 4.1: Design Flow with Clock Gating
Figure 5.1: Area Comparison
Figure 5.2: Low Activity Power Savings
Figure 5.3: High Activity Power Savings

Table of Tables

Table 3.1: Active enable values and clock gate hold mode
Table 3.2: Required stable period of enable signal
Table 4.1: Clock gating options
Table 5.1: Number of instantiated clock gates
Table 5.2: Switching activity statistics

SNUG Europe 2005 2 Advanced Clock Gating with Power Compiler


1.0 Introduction
In synchronous large-scale integration (LSI) designs, the permanent switching of the huge clock
tree causes continuous power consumption. This dynamic power consumption is proportional to the
switching frequency and the switched capacitive load. Thus, the high capacitance of the clock tree,
in conjunction with its high switching frequency, accounts for a very large part of the total power
consumption in today’s LSIs.

Clock gating makes use of the fact that, typically, not all parts of a digital design are in use
simultaneously, and thus do not need to receive an active clock signal all the time. Gating elements
can be inserted into the clock tree, to split its capacitive load into smaller pieces. So, the separate
branches of the tree can be switched on or off individually, depending on whether they are needed
or not. Effectively, the average switching load is reduced.
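
The effect of splitting the clock tree can be illustrated with the standard first-order CMOS switching-power model, P = a * C * Vdd^2 * f. The following Python sketch is only an illustration under stated assumptions: the function name, the supply voltage, the clock frequency, the branch capacitances and the fraction of cycles in which each branch is enabled are invented for the example and are not taken from the designs analyzed in this paper.

```python
def dynamic_power(cap_farad, vdd, freq_hz, activity=1.0):
    """First-order switching power: P = activity * C * Vdd^2 * f."""
    return activity * cap_farad * vdd ** 2 * freq_hz

# Assumed example values (not from the paper).
VDD = 1.2      # supply voltage in volts
FREQ = 100e6   # clock frequency in hertz

# Clock tree split into branches:
# (capacitance in farads, fraction of cycles in which the branch is enabled)
branches = [(20e-12, 1.0), (30e-12, 0.25), (50e-12, 0.05)]

# Without gating, every branch toggles in every cycle.
ungated = sum(dynamic_power(c, VDD, FREQ) for c, _ in branches)
# With gating, a branch only toggles while it is enabled.
gated = sum(dynamic_power(c, VDD, FREQ, activity=a) for c, a in branches)
saving = 1.0 - gated / ungated

print(f"ungated: {ungated * 1e3:.2f} mW")
print(f"gated:   {gated * 1e3:.2f} mW")
print(f"saving:  {saving:.0%}")
```

In this model the supply voltage and the frequency are unchanged; the saving comes entirely from the reduced average switched capacitance.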

One of the possible negative impacts of clock gating is the increase of the clock skew by the
propagation delay of the gating elements, which needs to be balanced by the clock tree synthesis
(CTS).

2.0 Basic Clock Gating Strategies


Clock gating is a design technique that has developed over the years. This chapter introduces
the basic techniques and discusses the most common clock gating styles. For better readability, the
following discussion assumes that all flip-flops are positive edge triggered and that only
single edge triggered designs are used.

2.1 Abstraction Levels of Clock Gating

Generally, clock gating can be seen from different levels of abstraction, i.e. at different hierarchical
levels. At system level, entire functional blocks are enabled or disabled using system-wide enable
signals. At module level, the clock of single register banks is gated locally inside a block. Thus, the
original synchronous load-enable flip-flops (flip-flops with multiplexed input) are replaced with
simple flip-flops without enable pin. The enable functionality of several flip-flops can be attained
with a single clock gate.

System Level

The system level clock gating has a strong relation to the architecture of a design. In most cases, a
separate block of control logic generates the system-wide available activation signals that are used
to control the branches of the clock tree, and thereby enable logical or functional blocks. This kind
of clock gating, or controlled clock distribution is very power efficient in designs like a CPU’s core,
where only small pieces of a huge amount of available logic are active, depending on the desired
function. Figure 2.1 illustrates the controlled clock distribution provided by system level clock
gating.

Figure 2.1: Clock Gating from a System View

Currently, Power Compiler does not provide the mechanisms needed to perform this kind of
complex system level clock gating. Nevertheless, a design optimized at system level is a very good
starting point for power optimization with Power Compiler when ultra low power consumption is the
goal. Power Compiler itself performs clock gating at register level.

Register Level

In most cases, clock gating at register level means a replacement for synchronous load-enable
flip-flops. The classical implementation of load-enable flip-flops uses a multiplexed data feedback
to keep the existing information in a flip-flop while it is disabled (see Figure 2.2). Even though
the flip-flop is inactive, it adds switching activity to the design with every clock cycle, so
power is consumed in the respective register and in its preceding clock network.

Figure 2.2 also illustrates the clock gated equivalent to the classical implementation. The clock is
provided to the register only if its enable signal is active. Thus, any switching activity during
inactive cycles is stopped at the register and the respective part of the clock network.

Figure 2.2: Clock Gating Principle.
“Classical” register with mux-bank (left) and clock gated register (right)
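
The equivalence of the two implementations in Figure 2.2 can be checked with a small cycle-based model. The Python sketch below is a simplified behavioral illustration, not a netlist: the function name and the stimulus vectors are assumed for the example. It models one register bit in both styles and counts how many clock edges actually reach the register.

```python
def simulate(data, enable):
    """Cycle-based model of one register bit in both implementation styles.

    Returns (trace_mux, trace_gated, edges_mux, edges_gated).
    """
    q_mux = q_gated = 0
    edges_mux = edges_gated = 0
    trace_mux, trace_gated = [], []
    for d, en in zip(data, enable):
        # Classical load-enable flip-flop: clocked in every cycle; the
        # multiplexer feeds the old value back while the enable is inactive.
        edges_mux += 1
        q_mux = d if en else q_mux
        # Clock gated register: it only receives a clock edge when enabled.
        if en:
            edges_gated += 1
            q_gated = d
        trace_mux.append(q_mux)
        trace_gated.append(q_gated)
    return trace_mux, trace_gated, edges_mux, edges_gated

# Assumed stimulus: eight cycles, the register is enabled in three of them.
data   = [1, 0, 1, 1, 0, 0, 1, 0]
enable = [1, 0, 0, 1, 0, 0, 0, 1]
trace_mux, trace_gated, edges_mux, edges_gated = simulate(data, enable)

print("same contents:", trace_mux == trace_gated)
print("clock edges at register (mux / gated):", edges_mux, "/", edges_gated)
```

Both registers hold the same value in every cycle, but the gated register only sees a clock edge in the enabled cycles.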

The use of multi-layer clock gating can further split up the clock tree into smaller pieces if the
design’s structure is suitable. An example of such a structure is any address decoder with a global
write enable and an address signal that selects the register actually to be written, as in a cache
controller module. The first layer of clock gates selects the active address space, the second the
actually addressed registers. As shown later, this is not only more efficient in power reduction, it
also results in additional area savings.

2.2 Benefits

The main benefit of clock gating is dynamic power savings. Both the net-switching power and the
cell-internal power are proportional to the switching frequency and the switched capacitive load. As
clock gating reduces the effective switching load, large power savings can be achieved.
Depending on the design, the savings can be 50 percent or more.

Clock gating does not only save power, it also reduces the die area, because the bank of
multiplexers in front of the register becomes redundant: “One multiplexer per bit” is replaced by
“one gate per bank”. This is the reason why clock gating is not efficient below a minimum number
of bits gated with the same enable signal. The recommended value is typically between three and
five. There is no optimum that is true for all designs, so the user might want to experiment with this
number.

The main downside of clock gating is the additional clock skew caused by the clock gate itself.
Theoretically, this could cause problems in very high speed designs. The second problem with clock
gating occurs where FPGA prototyping is desired. Today’s FPGAs do not provide the necessary
logic to implement extensive use of clock gates. Unlike designs with handcrafted clock gates,
designs using Power Compiler’s clock gating avoid this problem, because the gating can temporarily
be disabled with a simple switch when compiling for FPGA.

3.0 Gating Logic and Timing Requirements


Figures 2.1 and 2.2 use only a generic symbol to represent the clock gate. This chapter introduces
the logic behind this symbol and illustrates its special timing requirements. There are two different
approaches to the clock gating logic: one uses combinational logic only, the other uses latches.

Please note that depending on the phase of the clock, type of flip-flops and the guaranteed stable
period of the enable signal, all four basic types of gates can be used: AND, NAND, OR, NOR.
Power Compiler supports all these possible combinations.

3.1 Combinational Gating Logic

Latch-free clock gating uses a basic two-input gate to control the clock. One of the inputs is
connected to the clock source being gated, the other one to the controlling enable signal. The output
signal is the gated clock, provided to the flip-flop(s).

The enable signal’s value that produces an active clock is HIGH for AND and NAND gates, and
LOW for OR and NOR gates (“active enable value”). The logic value that the output clock is forced
to by an inactive enable signal is LOW for AND and NOR gates, HIGH for NAND and OR gates.
This inactive clock value is also referred to as the “clock gate hold mode”. These considerations are
summarized in Table 3.1.

Gating element    Active enable value    Clock gate hold mode
AND               High                   Low
NAND              High                   High
OR                Low                    High
NOR               Low                    Low

Table 3.1: Active enable value and clock gate hold mode
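
The entries of Table 3.1 follow directly from the truth tables of the four gates. As a cross-check, the Python sketch below derives them by exhaustive evaluation; the dictionary and function names are chosen for this illustration only.

```python
# Two-input gates with the clock on one input and the enable on the other.
GATES = {
    "AND":  lambda clk, en: clk & en,
    "NAND": lambda clk, en: 1 - (clk & en),
    "OR":   lambda clk, en: clk | en,
    "NOR":  lambda clk, en: 1 - (clk | en),
}

def characterize(gate):
    """Return (active_enable_value, hold_value) for a two-input clock gate."""
    for en in (0, 1):
        outputs = {gate(clk, en) for clk in (0, 1)}
        if len(outputs) == 1:
            # The output ignores the clock for this enable value: the gate
            # blocks, and the constant output is the clock gate hold mode.
            hold_value = outputs.pop()
            return 1 - en, hold_value  # the other enable value passes the clock
    raise ValueError("gate never blocks the clock")

table = {name: characterize(fn) for name, fn in GATES.items()}
for name, (active, hold) in table.items():
    print(f"{name:5s} active enable: {'high' if active else 'low':5s}"
          f" hold mode: {'high' if hold else 'low'}")
```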

Because NAND and NOR gates invert the phase of the gated clock, it is advised to use an inverted
input clock for these elements, to provide the original phase of the clock to the register. This is
important for single edge triggered designs.

To ensure a clean, non-glitching output clock, the enable signal must be stable for the duration of
the clock phase that drives the output clock active. Any transition or glitch of the enable signal
during that phase results in an illegal transition on the output clock. Table 3.2 lists, for each of
the possible gating elements, the phase of the clock being gated during which the enable signal
must be stable.

Gating element    Required stable period (of clock being gated)
AND               High phase
NAND              High phase
OR                Low phase
NOR               Low phase

Table 3.2: Required stable period of enable signal

Figure 3.1 illustrates the corrupted waveforms that result from an illegally timed enable signal at an
AND gate. Figure 3.2 shows the respective (clean) waveforms from the same enable signal
controlling a NAND gate. For the AND gate, the enable signal violates the requirements from Table
3.2; for the NAND gate, the requirements are met (the clock actually being gated at the NAND gate
is the inverted clock clk_inv, not clk). Thus, the NAND gate is often preferred over the AND gate
for the gating logic.

Figure 3.1: Illegal timing (AND gate)

Figure 3.2: Correct timing (NAND gate)

The bent arrows from the rising (triggering) clock edge to the transition of the enable signal in
Figure 3.1 and Figure 3.2 indicate the time that is required for the enable signal to be evaluated and
to become stable. For the NAND gate, the enable signal needs to be stable before the falling edge of
clk. In other words, the evaluation of the enable signal may take up to half a clock cycle. For the
AND gate, the enable signal would have to be stable right before the next rising clock edge, but
then keep its value for more than half a clock cycle. In realistic scenarios, these requirements can
hardly be met. This is why latch-free clock gating with AND gates is usually avoided.

For the NAND type clock gate, the half cycle path requirement can be a limitation for the maximum
design speed or the maximum number of logical layers for the enable signal’s evaluation logic. The
half-cycle requirement can be avoided by the use of latches, as explained in the next section.
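
The difference between the two latch-free gates can be reproduced with a sampled waveform model. The Python sketch below is an idealized, zero-delay illustration; the sampling scheme (two samples per clock phase), the helper names and the enable waveform are assumptions made for the example. The enable changes shortly after a rising edge of clk, as indicated by the bent arrows in Figure 3.1 and Figure 3.2, and is meant to enable exactly two clock cycles.

```python
def clock(n_cycles, steps_per_phase=2):
    """Sampled clock: every cycle is a high phase followed by a low phase."""
    return ([1] * steps_per_phase + [0] * steps_per_phase) * n_cycles

def rising_edges(wave, idle_level):
    """Count 0 -> 1 transitions, starting from the gate's idle (hold) level."""
    count, prev = 0, idle_level
    for value in wave:
        if prev == 0 and value == 1:
            count += 1
        prev = value
    return count

clk = clock(4)
clk_inv = [1 - c for c in clk]
# The enable changes one sample after a rising edge of clk, i.e. in the middle
# of the high phase: it goes high in cycle 0 and low again in cycle 2.
en = [0, 1, 1, 1,  1, 1, 1, 1,  1, 0, 0, 0,  0, 0, 0, 0]

and_out = [c & e for c, e in zip(clk, en)]             # AND gate on clk
nand_out = [1 - (c & e) for c, e in zip(clk_inv, en)]  # NAND gate on clk_inv

and_edges = rising_edges(and_out, idle_level=0)    # AND holds the clock low
nand_edges = rising_edges(nand_out, idle_level=1)  # NAND holds the clock high

print("AND :", and_out, "rising edges:", and_edges)
print("NAND:", nand_out, "rising edges:", nand_edges)
```

In this model the AND gate produces one rising edge more than intended, and two of its pulses are shortened, whereas the NAND gate delivers exactly the two intended edges with full-width pulses.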

3.2 Latch Based Gating Logic

Latches in the clock gating logic can help to ensure a clean, glitch-free clock signal, even if the
enable signal is only stable at the clock’s triggering edge (rising for positive edge triggered flip-
flops). Thus, the half-cycle path requirement for the enable signal is no longer valid, and the enable
signal’s timing is relaxed.

Figure 3.3 shows the latch-based clock gating logic with an AND gate. Because the latch is
transparent only during the low phase of the clock clk (high phase of clk_inv), transitions of the
enable during the clk’s high phase (required stable period for AND gates according to Table 3.2)
are not propagated to the AND gate’s enable pin, and thus cannot result in glitches of clk_en.
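
The filtering effect of the latch can be sketched with the same kind of sampled, zero-delay model. In the Python illustration below, the function name, the sampling scheme (two samples per clock phase, high phase first) and the enable waveform are assumptions for the example; the enable deliberately toggles in the middle of the high phase of clk, which is illegal for a latch-free AND gate.

```python
def latch_based_and(clk, en, latch_init=0):
    """AND-type clock gate whose enable passes through a low-transparent latch."""
    out, latched = [], latch_init
    for c, e in zip(clk, en):
        if c == 0:
            latched = e          # latch is transparent while clk is low
        out.append(c & latched)  # latch holds its value while clk is high
    return out

clk = [1, 1, 0, 0] * 4   # four cycles, high phase first
# Enable toggles in the middle of the high phase of cycles 0 and 2.
en = [0, 1, 1, 1,  1, 1, 1, 1,  1, 0, 0, 0,  0, 0, 0, 0]

clk_en = latch_based_and(clk, en)
plain_and = [c & e for c, e in zip(clk, en)]  # latch-free AND for comparison

print("latch-based:", clk_en)
print("latch-free :", plain_and)
```

With the latch, clk_en consists of two full-width pulses; without it, the same enable produces an additional short pulse and a truncated one.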

Figure 3.3: Latch based clock gating

4.0 Power Compiler Clock Gating


Power Compiler provides powerful algorithms to automatically implement clock gating in an
existing design. It supports all the described styles of clock gates including further aspects like
controllability and observability for design for test. This offers full control over all additionally
implemented logic.

4.1 Principle

When a digital design is read in by Power Compiler, it is translated to the GTECH level, a generic
technology library description. At this level, Power Compiler analyses the design’s structure and
performs logically equivalent transformations to simplify, flatten or clock gate the design as
required. The clock gating opportunity is determined by the design’s logical structure. Power
Compiler looks for synchronous load-enable flip-flops (a flip-flop with its d_out fed back to its
input through a multiplexer that is controlled by an enable signal). It analyses flip-flops with
common enable conditions and groups them under one or more clock gates, depending on the clock
gating settings. Clock gates are only inserted if the enable signal meets the required timing
conditions. Power Compiler also takes care of the required observation and control logic.
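
The grouping step described above can be pictured with a strongly simplified model. The Python sketch below is not Power Compiler’s algorithm; it only illustrates the idea of collecting flip-flops with a common enable condition and gating a bank only if it reaches a minimum width. The function name, the register names and the enable names are hypothetical, and timing checks as well as test logic are ignored.

```python
from collections import defaultdict

def plan_clock_gates(registers, minimum_bitwidth=3):
    """Group flip-flops by enable condition; gate only banks that are wide enough.

    registers: iterable of (flip_flop_name, enable_signal_or_None)
    Returns (gated, ungated): a dict mapping each enable signal to the
    flip-flops behind its clock gate, and a list of flip-flops left unchanged.
    """
    banks = defaultdict(list)
    ungated = []
    for name, enable in registers:
        if enable is None:
            ungated.append(name)       # no load-enable: no gating opportunity
        else:
            banks[enable].append(name)
    gated = {}
    for enable, names in banks.items():
        if len(names) >= minimum_bitwidth:
            gated[enable] = names      # one clock gate replaces the mux bank
        else:
            ungated.extend(names)      # bank too small: keep the multiplexers
    return gated, ungated

# Hypothetical register list: an 8-bit bank, a 2-bit bank, one free-running flop.
regs = [(f"data_reg[{i}]", "wr_en") for i in range(8)]
regs += [("flag_a", "flag_en"), ("flag_b", "flag_en"), ("state_reg", None)]

gated, ungated = plan_clock_gates(regs, minimum_bitwidth=3)
print("clock gates:", {en: len(names) for en, names in gated.items()})
print("not gated:  ", ungated)
```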

Until version 2003.12, clock gating opportunities were recognized by analyzing the HDL code
rather than the GTECH description. That is why older versions of Power Compiler are likely to
miss some of the possible opportunities in rare cases.

4.2 Clock Gating Style Controls

All of Power Compiler’s clock gating options are controlled with one specific dc_shell command:
set_clock_gating_style. This command must be called before the clock gate insertion is invoked. It
sets the type of clock gating logic for positive and negative edge triggered flip-flops separately,
independently of whether latch-based or latch-free clock gating is used. Different types of
control and observation points can be added and configured to increase the observability for design
for test. The clock gating can be fine-tuned with the options minimum_bitwidth and num_stages.
Table 4.1 explains the most common options.

Option                     Possible values            Effect
minimum_bitwidth           minsize_value              Minimum number of flip-flops with the same enable condition to be gated.
num_stages                 1, 2, ...                  Number of clock gating stages.
positive_edge_logic        {and}, {nand}, ...         Specifies the two-input clock gate for positive edge triggered flip-flops.
negative_edge_logic        {or}, {nor}, ...           Specifies the two-input clock gate for negative edge triggered flip-flops.
sequential_cell            latch | none               Specifies whether to use latches or not.
control_point              none | before | after      Location of control point.
control_signal             scan_enable | test_mode    Type of test control signal.
observation_point          true | false               Whether to use observation points or not.
observation_logic_depth    depth_value                Depth of XOR-tree in observability circuit.

Table 4.1: Clock Gating Options
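
To show how the options of Table 4.1 combine into a single command, the following Python sketch assembles a set_clock_gating_style command line from a dictionary. The helper function is hypothetical and is not part of any Synopsys tool. The option names are those of Table 4.1, and the values are only one possible choice (latch-based AND gating with a control point, a minimum bit width of five and two stages); the exact syntax accepted by a given tool version should be checked against the tool documentation.

```python
def clock_gating_style_command(options):
    """Assemble a set_clock_gating_style command line from an option dict."""
    parts = ["set_clock_gating_style"]
    for name, value in options.items():
        if isinstance(value, (list, tuple)):
            value = "{" + " ".join(value) + "}"   # Tcl list, e.g. {and}
        elif isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"-{name} {value}")
    return " ".join(parts)

# Example settings (assumed for illustration).
options = {
    "sequential_cell": "latch",
    "positive_edge_logic": ["and"],
    "control_point": "before",
    "control_signal": "scan_enable",
    "minimum_bitwidth": 5,
    "num_stages": 2,
}

cmd = clock_gating_style_command(options)
print(cmd)
```

Keeping the settings in one data structure makes it easier to iterate over different values of minimum_bitwidth and num_stages when tuning the results.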

4.3 Design Flow

Automatic clock gating requires only minimal changes to the common digital design flow. Clock
gating is applied before the (first) compile step, once all the design source code is read in (e.g.
read_verilog) and the clock gating options have been set (set_clock_gating_style). There are several
options to observe the results. The report_power command gives a rough idea of the design’s power
consumption. However, the numbers might be of poor accuracy. For higher accuracy, some kind of
switching information is required. This information can originate from either a gate-level or an RTL
simulation. In both cases, the switching information is stored in a so-called SAIF file (Switching
Activity Interchange Format) that needs to be annotated to the design in the Synopsys environment
prior to the power analysis. With this information, the accuracy is good enough for most common
cases. If the results don’t meet the expectations, the clock gating settings can be further adjusted
(e.g. number of stages, minimum bit width) to optimize the estimated power consumption.

Figure 4.1 shows a complete sample clock gating flow.

Figure 4.1: Design Flow with Clock Gating

The performance of Power Compiler depends strongly on the clock gating settings, in terms of
both runtime and power savings. In particular, the values for the minimum bit width and the number
of stages have to be chosen wisely, as they have a strong influence on the results. Experience shows
that a minimum bit width of at least five and two stages are a good start for most
designs. A few iterations with these variables might be needed if the power or area results
are much worse than expected.

4.4 Limitations and Workarounds

This section points out a few issues and limitations that were found while working
with Power Compiler version 2004.06-SP2. Some of them are already solved in later versions.
Synopsys is aware of all the issues mentioned.

Clock Gate Replacement

Clock gating replacement is an essential feature wherever the style of existing clock gating needs
to be converted into another, for instance when code adapted from an external source needs to be
adjusted to the existing in-house clock gating style. Clock gating replacement is performed with the
switch “insert_clock_gating -module_level”. Power Compiler follows strict rules: after analyzing
the existing clock gating, it determines whether a replacement with the new clock gating settings
can be performed, and it replaces the clock gate only if the result is both logically and clock
equivalent.

Combinational Clock Gating Logic

Power Compiler refuses to apply latch-free clock gating whenever the enable signal originates
from a design input port. The reason is that Power Compiler does not take user constraints into
account when determining enable signals. To guarantee glitch-free signals, it always assumes the
worst case and therefore refuses to add these kinds of clock gates.

In versions before 2004.12 Power Compiler also treats signals that cross module boundaries as
direct inputs. Thus, these signals cannot be used as enable either. A workaround for this behavior is
to get rid of the design hierarchy. Once the design is flattened with an “ungroup -all” command, the
sub-module boundaries vanish and Power Compiler is able to determine the correct enable signals.
However, this problem does not apply in recent versions.

CTS Discrete Clock Gating Cells

The use of discrete clock gating logic (not integrated clock gating cells) with latch-based clock
gating causes problems with some layout tools. Astro/Apollo, for instance, synchronize the latch
with the flip-flops, which leads to hold violations. Furthermore, the latch and the AND gate are not
treated as if they belong together, so they could be placed far apart during layout, which
introduces a high clock skew.

Possible workarounds are the use of integrated clock gating cells or a replacement of the clock
gating cells before layout as described in [6]. Solvnet article 003097 describes some more
suggested solutions.

Overriding of clock gating settings

The command set_clock_gating_registers is used to explicitly include or exclude registers from
clock gating. This command should be used very carefully, as it overrides Power Compiler’s
selection of registers to be gated. This can result in unwanted logic overhead, like a clock gate
before a register that is enabled all the time.

5.0 Analysis and Recommendations


To analyze Power Compiler’s efficiency, the tool was run on three different designs. All designs
exist in a plain version without clock gating, and in a version that provides a manual
implementation of clock gating. Power Compiler’s clock gating was applied to the plain designs
with the gating style set to single and double layer clock gating.

5.1 Power and Area Savings

In the following section, the location and the distribution of automatically inserted clock gates are
compared against their manual counterparts. One important fact for the following considerations is
that all manually inserted clock gates use latch-free gating logic. Because of the limitation
concerning latch-free clock gating with signals from module boundaries, the latch-based style had
to be used for the automatically instantiated clock gates. This has a slight influence on the power
and area consumption of the automatic clock gating results.

Placement and Quantity of Clock Gates

With a size of 976 gates, design one is an example of a small, simply structured design. In this case
the placement of the automatically inserted clock gates coincides with the location of the manually
implemented ones. Manual clock gates are used in this module for those registers that provide an
explicit enable condition in the respective non-gated version. Thus, the correspondence of the two
clock gating approaches is no surprise.

The second design (two), with a size of 16600 gates, is a case of special interest. This module
uses the two-layer clock gating approach in the manual version. Power Compiler is able to
implement a one- or two-layer approach. The manual version uses 326 clock gates, four of which are
in the first layer and the rest in the second. As in the first design, all manually gated registers
provide a load-enable. Power Compiler implements 333 clock gates for the two-layer solution and 326
clock gates for the one-layer solution. The additional gates are inserted at registers where manual
(latch-free) clock gating cannot be applied, because the evaluation logic for the enable signal is
too extensive to meet the half-cycle path requirement of latch-free clock gating. Because of the
latches, the late-arriving (after the trailing edge) enable signals can still be used for clock
gating in the automatic version.

The differences in the clock gate placement at design three, with a size of 29500 gates, are similar
to the ones in design two. Power Compiler uses the advantages of the latch-based clock gating and
inserts slightly more clock gates than the manual approach.

Table 5.1 summarizes the numbers of instantiated clock gates for the tested designs for both the
manual and the automatic approach as discussed above.

Design Nr.    Manual clock gating    Automatic clock gating
                                     One layer    Two layers
1             9                      9            10
2             326                    326          333
3             145                    148          150
Table 5.1: Number of Instantiated Clock Gates

Effects on Design Area

The insertion of clock gates affects the design’s area in several ways. The main factors are:

- Omitted multiplexers in redundant feedback loops (major effect).
- Changes in required driver strength due to omitted multiplexers (medium effect).
- Additional area consumed by clock gates, control points, observation points (minor effect).

The overall effect of clock gating on the design area is positive under all tested conditions.
This is illustrated in Figure 5.1, where the total cell area savings are displayed for all test
cases. The 100% values correspond to the area consumption without clock gating.

Figure 5.1: Area comparison

The change in the design area caused by clock gating depends on various factors like the clock
gating style and the number of registers gated with one clock gate.

In the technology used, the area of a typical clock gate driving 32 flip-flops is 203 µm²
(latch-based, control point before). The area of 32 multiplexers, each driving one single flip-flop,
is 864 µm². In the best case, this means an area reduction of more than 20 µm² per gated flip-flop.
The minimum bit width setting of the automatic clock gating has a great effect here. For instance, if
only three flip-flops are gated per clock gating cell (100 µm²), the area increases by about 6 µm²
per gated flip-flop. Therefore, the bit width setting must be adapted to the internal structure of
the design to obtain the best area result.
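
The per-flip-flop figures quoted above can be retraced with a few lines of arithmetic. The Python sketch below uses only the area values given in the text (864 square micrometers for 32 multiplexers, 203 for a clock gate driving 32 flip-flops, 100 for a clock gate driving three); the function and constant names are chosen for this illustration, and the effects of driver strength and test logic are ignored.

```python
MUX_BANK_AREA_32 = 864.0            # um^2 for 32 multiplexers (from the text)
MUX_AREA = MUX_BANK_AREA_32 / 32    # 27 um^2 per multiplexer

def area_saving_per_flop(gate_area, bits):
    """Area saved per gated flip-flop when `bits` multiplexers are replaced
    by one clock gate of area `gate_area` (negative means an area increase)."""
    return (bits * MUX_AREA - gate_area) / bits

wide = area_saving_per_flop(gate_area=203.0, bits=32)
narrow = area_saving_per_flop(gate_area=100.0, bits=3)

print(f"32-bit bank: {wide:+.2f} um^2 per flip-flop")
print(f" 3-bit bank: {narrow:+.2f} um^2 per flip-flop")
```

The 32-bit bank saves a little more than 20 square micrometers per flip-flop, while the 3-bit bank costs roughly 6 square micrometers per flip-flop, which is why the minimum bit width matters.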

The above calculation is not 100% precise for several reasons. The clock gate’s area differs
depending on the clock gating logic style, and any change in the required driver strength of the
preceding logic. Additionally required control and observation points also increase the area for
clock gating. That’s why the effective savings might differ from the expectations calculated above.

For all designs tested, the savings of manual and automatic gating are approximately identical, as
expected. More than 98% of the manually gated flip-flops provide the load enable, and thus, can
also be gated automatically. The eye-catching huge savings at design two (Figure 5.1) are due to the
high number of flip-flops compared to the size of the design.

The two-layer clock gating of Power Compiler shows positive results only on the area of design
two, where the figure matches that of the manual approach. In design three, almost no effect on
the area is visible: because of the large size of the design, the 5 additional clock gates have
only a very small effect. The area of design one clearly increased due to the additional clock
gate, because the synthesis uses bigger cells with higher drive strengths in the neighborhood of
the clock gates and the respective registers in order to compensate for the additional delay
inserted by the clock gate.

The overall size of the three designs was reduced by more than 12%. These results show clearly the
positive effects of clock gating on the area of a design.

Speed and Timing

The latch-based clock gating (automatic) has an advantage over the latch-free (manual) one with
regard to the design’s timing. This leads to two positive effects:

- Clock gating can also be used in timing critical paths
- The maximum design speed can be increased

The effect is visible at design two and three, where automatic clock gating is able to insert more
clock gates, on paths where the manual, latch-free counterpart cannot be used because of the late
arriving enable signals.

There are designs (not the tested ones) where the half-cycle path of the enable signal is the major
limitation for a design’s maximum speed. In these cases, the design’s speed can dramatically be
increased by avoiding the half-cycle requirements with the use of latches.

Theoretically, manual clock gating could make use of latches, too. However, because of testability
and controllability concerns for DFT, most designers try to avoid latches in their designs. Thus,
latches are usually not used in manual clock gating logic. If the use of latches is
controlled by an automatic tool, however, possible negative side effects are usually avoided or
compensated for, as is done in the case of automatic clock gating.

The negative effects on the design’s timing due to the additional clock skew can easily be
compensated by buffer insertion during the CTS. Thus, it does not have any noticeable effects on
the timing.

Power Saving Aspects

The power savings that can be achieved with automated clock gating depend on a number of
different conditions. Besides the design structure and the number of flip-flops that can be gated, the
stimulation pattern and the activity distribution across the design have the biggest influence on the
resulting power consumption and the possible power savings.

Apart from design one, the static power consumption is less than 1% of the total power
consumption in the tested designs. In design one, the actual savings in static power are less than
0.5% of the total power consumption. This part of the power is therefore within the accuracy of the
analysis and can be neglected. Only dynamic power is considered in the following analysis.

Stimulation Pattern for Power Analysis

A careful choice of the stimulation pattern used for the power analysis is important for the
significance of the resulting power values. It is useful to distinguish between cases of low and
high switching activity, because the efficiency of clock gating can differ widely between them.
Under real circumstances, a mixture of both cases will usually be seen.

Low activity means that there should be as little activity on the design’s inputs as possible.
Additionally, any low-power mode provided by the specific design can be used. Designs two and
three support such low-power modes.

The high activity pattern should cause as much switching activity in the respective design as
possible (clock gating without extra observation/control logic).

Table 5.2 presents the percentage of switching nodes in the tested designs for both the high and the
low activity pattern (implementation without clock gating).

                                      Design 1   Design 2   Design 3
Nodes switching in low activity [%]     0.7        0.01       0.33
Nodes switching in high activity [%]   26.68      28.5       17.5

Table 5.2: Switching Activity Statistics

Low Activity Power Savings

In the case of low switching activity, the potential power savings from clock gating are
comparatively high. Because large parts of the designs are inactive, the clock gates can turn off
the corresponding parts of the clock tree, which would otherwise continuously consume capacitive
power. The potential power savings for low switching activity can be 80% or more.
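
The effect on the clock tree can be estimated with the usual first-order model for capacitive
switching power, P = a * C * Vdd^2 * f. The sketch below applies it to a clock tree that is split
into an always-on part and a gated part. All capacitances, the supply voltage, the frequency and the
enable duty are hypothetical values for illustration, and only clock tree power is modeled, not the
power of the registers or the logic.

```python
def switching_power_mw(cap_pf: float, vdd_v: float, freq_mhz: float,
                       activity: float = 1.0) -> float:
    """First-order capacitive switching power, P = a * C * Vdd^2 * f."""
    # pF * V^2 * MHz yields microwatts; divide by 1000 to get milliwatts.
    return activity * cap_pf * vdd_v ** 2 * freq_mhz / 1000.0


# Hypothetical clock tree: 20 pF always toggling, 80 pF behind clock gates.
ALWAYS_ON_CAP_PF = 20.0
GATED_CAP_PF = 80.0
VDD_V = 1.8
FREQ_MHZ = 50.0
ENABLE_DUTY = 0.05  # gated branches receive the clock in 5% of all cycles

ungated_mw = switching_power_mw(ALWAYS_ON_CAP_PF + GATED_CAP_PF, VDD_V, FREQ_MHZ)
gated_mw = (switching_power_mw(ALWAYS_ON_CAP_PF, VDD_V, FREQ_MHZ)
            + switching_power_mw(GATED_CAP_PF, VDD_V, FREQ_MHZ, ENABLE_DUTY))
saving = 1.0 - gated_mw / ungated_mw

print(f"clock tree power without gating: {ungated_mw:.2f} mW")
print(f"clock tree power with gating   : {gated_mw:.2f} mW")
print(f"relative saving                : {saving:.0%}")
```

The saving in this model depends only on the share of the clock tree capacitance behind the gates
and on how rarely the enables are active, which is why low activity patterns benefit most.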

There are two major reasons for the differences between manual and automatic clock gating. The
less important one is the use of latches in the automatic version, and the power consumed by them.
This effect is primarily noticeable in design one in Figure 5.2, where the automatic version is
approximately 15% worse than the manual one.

The manual savings for designs two and three also appear better than the automatic ones. In design
three, the divergence is caused by a wait signal that switches off large parts of the design. Here
it can help to combine automatic clock gating with some additional manual clock gates that cover
the opportunities not recognized by Power Compiler. The power values of design two show the
advantage of the manual two-layer clock gating. The two-layer clock gating of Power Compiler also
shows much better values than the one-layer version, but it is not as effective as the manual
version.

Figure 5.2: Low Activity Power Analysis

High Activity Savings

The average power savings under high activity circumstances are generally smaller than under low
activity circumstances. This is because the registers can no longer be “turned off” for long periods,
as they are frequently being accessed. Nevertheless, the achieved power savings are still larger than
60% under all tested conditions.

With all designs, the differences between manual and automatic clock gating are still noticeable,
but significantly smaller than in the case of low activity. The main reason for this is that the
effects of system-level clock gating (wait/low-power) do not have any influence if the design is
active. This can be seen in Figure 5.3.

Figure 5.3: High Activity Power Analysis

5.2 Coding Style

Older versions of Power Compiler, which determine the clock gating opportunities by analyzing the
HDL code, show a stronger dependency on the coding style. For instance, these versions do not
recognize the load-enable functionality of a flip-flop if the multiplexer is hidden in a “cloud of
logic”.

However, the only coding style requirement that could be found with the latest version of Power
Compiler is that the multiplexer needs to be directly connected to the flip-flops. Power Compiler
does not recognize the opportunity if the multiplexer is hidden behind a piece of logic, e.g. an
adder with a fixed value. Although such a structure could easily be equipped with clock gating, it
occurs in only a very small fraction of cases.
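
The load-enable structure discussed here, a multiplexer that feeds the register output back to its
input, can be compared with its clock-gated replacement in a small cycle-based model. The sketch
below is a behavioral Python model written for illustration only. It is neither HDL nor a
description of Power Compiler's recognition algorithm, and the function names and stimulus values
are hypothetical.

```python
def run_mux_register(data, enable, reset_value=0):
    """Load-enable register built from a feedback multiplexer."""
    q, trace, clock_events = reset_value, [], 0
    for d, en in zip(data, enable):
        q = d if en else q   # multiplexer: load new data or recirculate Q
        clock_events += 1    # the flip-flop is clocked in every cycle
        trace.append(q)
    return trace, clock_events


def run_gated_register(data, enable, reset_value=0):
    """Same register with the multiplexer replaced by a clock gate."""
    q, trace, clock_events = reset_value, [], 0
    for d, en in zip(data, enable):
        if en:               # the clock gate passes the edge only when enabled
            q = d
            clock_events += 1
        trace.append(q)
    return trace, clock_events


data = [3, 7, 7, 1, 9, 4]      # hypothetical input values per cycle
enable = [1, 0, 0, 1, 0, 1]    # hypothetical load-enable per cycle

mux_trace, mux_clocks = run_mux_register(data, enable)
gated_trace, gated_clocks = run_gated_register(data, enable)

print("identical outputs:", mux_trace == gated_trace)
print("clock events, multiplexer version:", mux_clocks)
print("clock events, gated version      :", gated_clocks)
```

Both versions produce the same register contents, but the gated register is clocked only in the
cycles in which the enable is active.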

5.3 System Level Improvements


The low activity results in Figure 5.2 indicate an advantage of the manual approach over the
automatic one. The in-depth analysis of the clock gating in designs two and three shows that the
manual approach uses internal wait and low-power signals that are not apparent to the algorithms
of Power Compiler. Since all clock gating activities represent a trade-off between effort and power
savings, the combination of the fast automatic Power Compiler clock gating with the most effective
hand-crafted system-level clock gating becomes the main focus.

This approach splits the clock gating of a design into two independent steps. Firstly, the designer
adds a small number of system level clock gates, with sophisticated enable conditions that are
unknown to Power Compiler. Secondly, Power Compiler is run on the system-level gated RTL to
implement clock gates at register level.
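
The benefit of stacking the two levels can be sketched by counting the clock edges that pass each
gate. The model below is a simplified illustration with hypothetical enable sequences. It assumes
that a system-level enable (e.g. derived from a wait signal) gates a whole subtree, and that a
register-level enable gates an individual register bank inside that subtree.

```python
def count_clock_edges(system_enable, register_enable):
    """Count clock edges at the root, after the system gate, and at the register."""
    at_root = len(system_enable)
    after_system_gate = sum(1 for s in system_enable if s)
    at_register = sum(1 for s, r in zip(system_enable, register_enable) if s and r)
    return at_root, after_system_gate, at_register


# Hypothetical 12-cycle window: the block is in a wait state from cycle 4 on,
# while the register-level enable condition still evaluates to true at times.
system_enable = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
register_enable = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]

# Register-level gating only: the subtree above the register gate always toggles.
always_on = [1] * len(register_enable)
_, subtree_single, register_single = count_clock_edges(always_on, register_enable)

# System-level gate in front of the register-level gate.
_, subtree_two_level, register_two_level = count_clock_edges(system_enable, register_enable)

print("register-level only:", subtree_single, "edges in subtree,",
      register_single, "at register")
print("two-level gating   :", subtree_two_level, "edges in subtree,",
      register_two_level, "at register")
```

In this example the system-level gate removes clock activity both in the subtree between the two
gates and at the register itself, which register-level gating alone cannot do when the wait
condition is not visible in the register's enable logic.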

Due to the incremental strategy, the optimum power results are achieved with a minimum of design
time. However, this procedure suffers from two minor drawbacks. Firstly, the additional level of
clock gates increases the clock skew to be balanced in CTS. Secondly, the flexibility of Power
Compiler with regard to removing the clock gating for FPGA use and changing the clock gating
style by simple settings is lost.

The advanced strategy presented above uses the best parts of handcrafted and automatic clock
gating, and thereby satisfies the main driving factors in modern designs: reduced time to market
and optimum power consumption.

6.0 Conclusion
Clock gating is a sophisticated design technique that has been developed over years. Consequently,
it is no surprise that Power Compiler is able to save up to 80 percent of the power dissipation in
certain cases with clock gating. What is outstanding is that Synopsys has developed clock gating
into a push-button technology that can be applied within seconds to any kind of design. Even with
no experience in this area, enormous power savings can be achieved with no risk of violating any
design rule.

Power Compiler is a good aid to speed up the design process and to shorten the time to market,
even for designers who already have experience with clock gating. The results show that Power
Compiler’s automatic clock gating is comparable to a handcrafted implementation. The only area
where a human designer can further enhance automatic clock gating is system-level clock gating.

Because automatic clock gating has several implications, using Power Compiler in an existing
environment with an individual design flow might cause unexpected complications, e.g. with CTS or
layout. The only remedy is an evaluation on individual test cases and close collaboration with
Synopsys.

7.0 Acknowledgments
We would like to thank all friends and colleagues from National Semiconductor who contributed to
our work on this document. Furthermore, we would like to express special thanks to Andy Chaggar,
Dr. Th. Mahnke and Dr. W. Stechele for their support and friendly collaboration.

8.0 References
[1] Analysis of automated Power Saving Techniques using Power Compiler, Wolfgang
Embacher, TU Munich, 2004.
[2] Designing low-power circuits: practical recipes, L. Benini, G. De Micheli, E. Macii, IEEE
Circuits and Systems Magazine, vol. 1, no. 1, 2001.
[3] Low Power ASIC Design Using Voltage Scaling at the Logic Level, Th. Mahnke, TU
Munich, 2003.
[4] Low Power Digital CMOS Design, A. Chandrakasan and R. Brodersen, Kluwer Academic
Publishers, 1995.
[5] How To Successfully Use Gated Clocking in ASIC Design, Darren Jones, SNUG 2002.
[6] Automatic Clock Gating for Power Reduction, Zia Kahn, Guarav Meth, SNUG 1999.
[7] “Power Compiler Reference Manual”, Synopsys
